add readme
129
README.md
@@ -1,5 +1,124 @@
<!-- ------------------------------------------------------------------------------------------- -->
<!-- add the following to root's crontab (sudo crontab -e) -->
<!-- ------------------------------------------------------------------------------------------- -->
@reboot /usr/local/bin/juliar /home/ton/docker-programs/check_yiem_website_reboot/check_yiem_website_reboot.jl >> /var/log/check_reboot.log 2>&1
# *** juliar is root's Julia (sudo crontab -e); it is symlinked as juliar to keep it separate from the user's Julia

# check_and_reboot

An automated monitoring and recovery tool written in Julia. It checks network connectivity and website availability and reboots the system automatically when repeated failures are detected.

## Overview

This project consists of two monitoring scripts:

- **[`check_router_reboot.jl`](check_router_reboot.jl)** - Monitors router connectivity via ICMP ping
- **[`check_yiem_website_reboot.jl`](check_yiem_website_reboot.jl)** - Monitors website availability via HTTP requests

Both scripts run continuously in the background, perform periodic health checks, and reboot the system when the number of consecutive failures reaches the configured threshold (`FAILS_TO_REBOOT`).

## Features

- **Continuous Monitoring**: Runs indefinitely with configurable check intervals
- **Multi-attempt Verification**: Retries failed checks with backoff before declaring failure
- **State Persistence**: Maintains state in a JSON file across restarts
- **Cooldown Period**: Prevents rapid repeated reboots after a reboot event
- **Cross-Platform Support**: Works on Linux, macOS, and Windows with appropriate reboot commands
- **Broadcast Notifications**: Sends system-wide notifications on events (via `wall`, `logger`, or platform equivalents)
- **Log Rotation**: Automatically limits each log file to its last 100 entries to prevent unbounded growth
- **Dry Run Mode**: Test configuration without triggering actual reboots
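
The multi-attempt verification and the consecutive-failure counter listed above can be sketched roughly as follows. This is a minimal illustration, not the scripts' actual code: `check_with_retries`, `next_fail_count`, and `flaky_probe` are hypothetical names, and the probe stands in for the real ping or HTTP request.

```julia
# Hypothetical helper: run `probe` up to `attempts` times, sleeping `backoff`
# seconds between tries, and report whether any attempt succeeded.
function check_with_retries(probe::Function; attempts::Int=3, backoff::Real=60)
    for i in 1:attempts
        ok = try
            probe()
        catch
            false   # treat any exception (timeout, DNS error, ...) as a failed attempt
        end
        ok && return true
        i < attempts && sleep(backoff)   # no wait after the final attempt
    end
    return false
end

# Hypothetical helper: update the consecutive-failure counter after one check.
next_fail_count(fails::Int, check_ok::Bool) = check_ok ? 0 : fails + 1

# Example with a stand-in probe that fails twice and then succeeds.
calls = Ref(0)
flaky_probe() = (calls[] += 1; calls[] >= 3)
result = check_with_retries(flaky_probe; attempts=3, backoff=0)
println("check passed: ", result, " after ", calls[], " attempts")
```

A check only counts as failed when every attempt fails, so a single dropped packet or slow response does not move the counter toward a reboot.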

## Configuration

### Router Monitor Configuration (`check_router_reboot.jl`)

```julia
const ROUTER_IP = "192.168.88.1"          # Target router IP address
const TIMEOUT_SECS = 30                   # Request timeout in seconds
const ATTEMPTS_PER_CHECK = 3              # Number of ping attempts per check
const BACKOFF_BETWEEN_ATTEMPTS = 60       # Seconds between retry attempts
const FAILS_TO_REBOOT = 3                 # Consecutive failures before reboot
const COOLDOWN_AFTER_REBOOT_SECS = 600    # Minimum seconds between reboots
const DRY_RUN = true                      # Set false to enable actual reboots
const CHECK_INTERVAL_SECS = 60            # Check interval in seconds
```

### Website Monitor Configuration (`check_yiem_website_reboot.jl`)

```julia
const URL = "https://www.yiem.cc"         # Target URL to monitor
const TIMEOUT_SECS = 30                   # Request timeout in seconds
const ATTEMPTS_PER_CHECK = 3              # Number of HTTP attempts per check
const BACKOFF_BETWEEN_ATTEMPTS = 60       # Seconds between retry attempts
const FAILS_TO_REBOOT = 3                 # Consecutive failures before reboot
const COOLDOWN_AFTER_REBOOT_SECS = 600    # Minimum seconds between reboots
const DRY_RUN = false                     # Set false to enable actual reboots
const CHECK_INTERVAL_SECS = 60            # Check interval in seconds
```
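
To get a feel for how these values interact, the arithmetic below estimates an upper bound on the time between the start of an outage and a reboot. It is a rough sketch under stated assumptions about the loop (every attempt runs into the full timeout, backoff is applied only between attempts, and one check interval passes between failed checks); the scripts themselves may time things differently.

```julia
# Values copied from the configuration listings above.
timeout_secs  = 30   # TIMEOUT_SECS
attempts      = 3    # ATTEMPTS_PER_CHECK
backoff_secs  = 60   # BACKOFF_BETWEEN_ATTEMPTS
fails_needed  = 3    # FAILS_TO_REBOOT
interval_secs = 60   # CHECK_INTERVAL_SECS

# Assumption: every attempt runs into the full timeout and the backoff
# is applied between attempts only, not after the last one.
worst_check_secs = attempts * timeout_secs + (attempts - 1) * backoff_secs

# Assumption: the script sleeps one check interval between failed checks.
worst_secs_to_reboot = fails_needed * worst_check_secs + (fails_needed - 1) * interval_secs

println("one failed check: up to ", worst_check_secs, " s")
println("outage to reboot: up to ", worst_secs_to_reboot, " s")
```

Under these assumptions a single failed check takes up to 210 seconds and a reboot follows after at most 750 seconds, so lowering `BACKOFF_BETWEEN_ATTEMPTS` shortens recovery time more than lowering `CHECK_INTERVAL_SECS` does.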

## Usage

### Running Manually

```bash
julia check_router_reboot.jl
julia check_yiem_website_reboot.jl
```

### Running at System Boot (Crontab)

Add the following to root's crontab (`sudo crontab -e`):

```
@reboot /usr/local/bin/juliar /path/to/check_router_reboot.jl >> /var/log/check_reboot.log 2>&1
@reboot /usr/local/bin/juliar /path/to/check_yiem_website_reboot.jl >> /var/log/check_reboot.log 2>&1
```

**Note**: The crontab entries call `juliar`, a symlink to root's Julia that is kept separate from the user's Julia installation.

### Required Dependencies

Install the required Julia packages:

```bash
julia -e 'using Pkg; Pkg.add(["HTTP", "Dates", "JSON"])'
```

## Files

| File | Description |
|------|-------------|
| [`check_router_reboot.jl`](check_router_reboot.jl) | Router ping monitor with auto-reboot |
| [`check_yiem_website_reboot.jl`](check_yiem_website_reboot.jl) | Website HTTP monitor with auto-reboot |
| [`check_and_reboot_state.json`](check_and_reboot_state.json) | State persistence file (generated) |
| [`check_router_reboot_log.txt`](check_router_reboot_log.txt) | Router monitor log file |
| [`check_website_reboot_log.txt`](check_website_reboot_log.txt) | Website monitor log file |

## State File

The state is stored in [`check_and_reboot_state.json`](check_and_reboot_state.json) with the following structure:

```json
{
  "consecutive_fails": 0,
  "last_reboot_datetime": "2026-03-11T10:00:00"
}
```

## Log Output

Logs are written to both the console and the respective log files with timestamps:

```
[2026-03-11T10:05:09.123] Starting check loop. Checking router 192.168.88.1 every 60 seconds.
[2026-03-11T10:05:09.456] 192.168.88.1 is reachable; resetting consecutive failure counter.
[2026-03-11T10:06:09.789] 192.168.88.1 is unreachable (last result: no response). Consecutive fails: 1/3.
```
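
One way to produce entries in this shape while keeping the file bounded is sketched below. It is an illustration only: `format_entry` and `append_and_trim` are hypothetical names, and rewriting the whole file on each append is an assumed strategy rather than the scripts' confirmed behavior. The demo uses a temporary file and a cap of 3 instead of 100 to keep the output short.

```julia
using Dates

const MAX_LOG_ENTRIES = 100   # assumed cap, taken from the "last 100 entries" feature

# Format one entry in the "[timestamp] message" shape shown above.
format_entry(msg::AbstractString, ts::DateTime=now()) = "[$(ts)] $(msg)"

# Append an entry, then rewrite the file keeping only the newest entries.
function append_and_trim(path::AbstractString, msg::AbstractString;
                         max_entries::Int=MAX_LOG_ENTRIES)
    entry = format_entry(msg)
    println(entry)                                    # console copy
    lines = isfile(path) ? readlines(path) : String[]
    push!(lines, entry)
    keep = lines[max(1, length(lines) - max_entries + 1):end]
    open(path, "w") do io
        foreach(l -> println(io, l), keep)
    end
    return length(keep)
end

# Demo: write five entries with a cap of three.
log_path = tempname()
for i in 1:5
    append_and_trim(log_path, "entry $i"; max_entries=3)
end
kept = readlines(log_path)
```

After the loop, `kept` holds only the three most recent entries, which is the behavior the log rotation feature describes at a larger scale.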

## Reboot Commands

The scripts automatically select the appropriate reboot command based on the operating system:

- **Linux**: `sudo systemctl reboot` or `sudo reboot`
- **macOS**: `sudo shutdown -r now`
- **Windows**: `shutdown /r /t 0`
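
The selection above could be written along these lines. This is a sketch, not the scripts' code: `reboot_command` and `maybe_reboot` are hypothetical names, and choosing between `systemctl reboot` and `reboot` by looking for `systemctl` on the `PATH` is an assumption, since the source does not say how the Linux variant is picked.

```julia
# Hypothetical helper: pick a reboot command for the current operating system.
function reboot_command()
    if Sys.iswindows()
        return `shutdown /r /t 0`
    elseif Sys.isapple()
        return `sudo shutdown -r now`
    elseif Sys.islinux()
        # Assumption: prefer systemctl when it is on PATH, otherwise plain reboot.
        return Sys.which("systemctl") === nothing ? `sudo reboot` : `sudo systemctl reboot`
    else
        error("unsupported operating system")
    end
end

# With the dry-run flag set, only report the command instead of running it.
function maybe_reboot(; dry_run::Bool=true)
    cmd = reboot_command()
    if dry_run
        println("DRY RUN: would execute ", cmd)
        return false
    end
    run(cmd)
    return true
end

maybe_reboot(dry_run=true)
```

Keeping `dry_run=true` as the default in a helper like this means a reboot only happens when the caller asks for it explicitly.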

## Safety Features

1. **Cooldown Period**: After a reboot, the script waits `COOLDOWN_AFTER_REBOOT_SECS` seconds before performing another check
2. **Consecutive Failures**: Requires multiple consecutive failures before triggering a reboot
3. **Dry Run Mode**: Set `DRY_RUN = true` to test without actually rebooting
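
The first two safeguards can be combined into a single decision, sketched below. `should_reboot` is a hypothetical function, the timestamp format follows the state file shown earlier, and two details are assumptions: that the threshold is met with `>=`, and that the cooldown is measured from the stored last reboot time.

```julia
using Dates

# Hypothetical decision helper: reboot only when the failure threshold is met
# and the cooldown since the last recorded reboot has elapsed.
function should_reboot(consecutive_fails::Int,
                       last_reboot::Union{DateTime,Nothing},
                       now_ts::DateTime;
                       fails_to_reboot::Int=3,
                       cooldown_secs::Int=600)
    consecutive_fails >= fails_to_reboot || return false
    last_reboot === nothing && return true            # no reboot recorded yet
    elapsed_ms = Dates.value(now_ts - last_reboot)    # DateTime difference in milliseconds
    return elapsed_ms >= cooldown_secs * 1000
end

last_reboot = DateTime("2026-03-11T10:00:00")
inside_cooldown = should_reboot(3, last_reboot, DateTime("2026-03-11T10:05:00"))
after_cooldown  = should_reboot(3, last_reboot, DateTime("2026-03-11T10:10:00"))
too_few_fails   = should_reboot(2, last_reboot, DateTime("2026-03-11T10:10:00"))
println((inside_cooldown, after_cooldown, too_few_fails))
```

With the default 600-second cooldown, three failures five minutes after a reboot do not trigger another one, while the same three failures ten minutes after it do.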