add readme

2026-03-11 17:08:28 +07:00
parent 321d04acc7
commit 758b5ebad2

README.md

# check_and_reboot
An automated monitoring and system recovery tool, written in Julia, that checks network connectivity and website availability and automatically reboots the system when repeated failures are detected.
## Overview
This project consists of two monitoring scripts:
- **[`check_router_reboot.jl`](check_router_reboot.jl)** - Monitors router connectivity via ICMP ping
- **[`check_yiem_website_reboot.jl`](check_yiem_website_reboot.jl)** - Monitors website availability via HTTP requests

Both scripts run continuously in the background. They perform periodic health checks and automatically reboot the system once the number of consecutive failed checks reaches a configured threshold (`FAILS_TO_REBOOT`).
## Features
- **Continuous Monitoring**: Runs indefinitely with configurable check intervals
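- **Multi-attempt Verification**: Retries a failed check up to `ATTEMPTS_PER_CHECK` times, waiting `BACKOFF_BETWEEN_ATTEMPTS` seconds between attempts, before counting it as a failure
- **State Persistence**: Stores the consecutive-failure count and the last reboot time in a JSON file so they survive restarts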
- **Cooldown Period**: Prevents rapid repeated reboots after a reboot event
- **Cross-Platform Support**: Works on Linux, macOS, and Windows with appropriate reboot commands
- **Broadcast Notifications**: Sends system-wide notifications on events (via `wall`, `logger`, or platform equivalents)
- **Log Trimming**: Keeps only the last 100 entries in each log file to prevent unbounded growth
- **Dry Run Mode**: Test configuration without triggering actual reboots
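
Multi-attempt verification can be pictured as a small wrapper around a single probe. The sketch below is an assumption about how such a wrapper might look, not the scripts' actual code. `check_with_retries` and its `probe` argument are hypothetical names. `attempts` and `backoff_secs` play the roles of `ATTEMPTS_PER_CHECK` and `BACKOFF_BETWEEN_ATTEMPTS`.

```julia
# Hypothetical helper: run `probe` up to `attempts` times, sleeping
# `backoff_secs` between attempts. Returns true on the first success and
# false if every attempt returns false or throws.
function check_with_retries(probe::Function; attempts::Integer = 3, backoff_secs::Real = 60)
    for i in 1:attempts
        ok = try
            probe()::Bool
        catch err
            @warn "Attempt $i/$attempts raised an error" exception = err
            false
        end
        ok && return true
        # Wait only if another attempt remains.
        i < attempts && sleep(backoff_secs)
    end
    return false
end

# Usage with a fake probe that fails twice, then succeeds.
calls = Ref(0)
flaky() = (calls[] += 1; calls[] >= 3)
println(check_with_retries(flaky; attempts = 3, backoff_secs = 0))  # prints true
```

Treating an exception like a failed attempt means a transient error, such as a DNS hiccup, is retried instead of crashing the monitoring loop.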
## Configuration
### Router Monitor Configuration (`check_router_reboot.jl`)
```julia
const ROUTER_IP = "192.168.88.1" # Target router IP address
const TIMEOUT_SECS = 30 # Request timeout in seconds
const ATTEMPTS_PER_CHECK = 3 # Number of ping attempts per check
const BACKOFF_BETWEEN_ATTEMPTS = 60 # Seconds between retry attempts
const FAILS_TO_REBOOT = 3 # Consecutive failures before reboot
const COOLDOWN_AFTER_REBOOT_SECS = 600 # Minimum seconds between reboots
const DRY_RUN = true # Set false to enable actual reboots
const CHECK_INTERVAL_SECS = 60 # Check interval in seconds
```
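
This README does not say how the router script sends its ICMP ping. One common approach is to shell out to the system `ping` utility with a one-packet count and a timeout. The sketch below assumes that approach. `ping_command`, `ping_ok`, and `current_os` are hypothetical names. The flag spellings follow the usual Linux (iputils), macOS (BSD), and Windows `ping` utilities.

```julia
# Hypothetical: map the running platform to a symbol.
current_os() = Sys.islinux() ? :linux : Sys.isapple() ? :apple : Sys.iswindows() ? :windows : :other

# Hypothetical: build a one-packet ping command for the given platform.
# Linux `-W` = reply timeout in seconds, macOS `-t` = overall timeout in
# seconds, Windows `-w` = reply timeout in milliseconds.
function ping_command(ip::AbstractString, timeout_secs::Integer; os::Symbol = current_os())
    if os === :linux
        return `ping -c 1 -W $timeout_secs $ip`
    elseif os === :apple
        return `ping -c 1 -t $timeout_secs $ip`
    elseif os === :windows
        return `ping -n 1 -w $(timeout_secs * 1000) $ip`
    else
        error("unsupported platform: $os")
    end
end

# Hypothetical: true when ping exits with status 0. `success` returns false
# (rather than throwing) on a non-zero exit; output is discarded.
ping_ok(ip::AbstractString, timeout_secs::Integer) =
    success(pipeline(ping_command(ip, timeout_secs); stdout = devnull, stderr = devnull))
```

A call such as `ping_ok("192.168.88.1", 30)` could then serve as the probe for each attempt.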
### Website Monitor Configuration (`check_yiem_website_reboot.jl`)
```julia
const URL = "https://www.yiem.cc" # Target URL to monitor
const TIMEOUT_SECS = 30 # Request timeout in seconds
const ATTEMPTS_PER_CHECK = 3 # Number of HTTP attempts per check
const BACKOFF_BETWEEN_ATTEMPTS = 60 # Seconds between retry attempts
const FAILS_TO_REBOOT = 3 # Consecutive failures before reboot
const COOLDOWN_AFTER_REBOOT_SECS = 600 # Minimum seconds between reboots
const DRY_RUN = false # Set true to test without rebooting
const CHECK_INTERVAL_SECS = 60 # Check interval in seconds
```
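
The retry, backoff, and interval settings compound, so it can help to estimate how long an outage must last before a reboot fires. The estimate below rests on three assumptions: the loop runs a check, sleeps `CHECK_INTERVAL_SECS`, and repeats; a reboot happens right after the `FAILS_TO_REBOOT`-th failed check; and each failed attempt takes somewhere between 0 seconds (fails fast) and `TIMEOUT_SECS` (hangs). `seconds_until_reboot` is a hypothetical helper, not part of the scripts.

```julia
# Hypothetical estimate of outage duration before a reboot, assuming
# check -> sleep(interval) -> check -> ..., with the reboot issued right after
# the `fails`-th consecutive failed check. `per_attempt_secs` is how long one
# failed attempt takes: up to the timeout if it hangs, about 0 if it fails fast.
function seconds_until_reboot(; attempts, backoff, fails, interval, per_attempt_secs)
    per_check = attempts * per_attempt_secs + (attempts - 1) * backoff
    return fails * per_check + (fails - 1) * interval
end

cfg = (attempts = 3, backoff = 60, fails = 3, interval = 60)
worst = seconds_until_reboot(; cfg..., per_attempt_secs = 30)  # every attempt times out
best  = seconds_until_reboot(; cfg..., per_attempt_secs = 0)   # every attempt fails instantly
println("between $(best / 60) and $(worst / 60) minutes")      # prints "between 8.0 and 12.5 minutes"
```

Under these assumptions, the values in both configuration blocks mean roughly 8 to 12.5 minutes of sustained failure before a reboot.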
## Usage
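### Running Manually
Each script loops until it is stopped, so start them in separate terminals (or append `&` to run one in the background):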
```bash
julia check_router_reboot.jl
julia check_yiem_website_reboot.jl
```
### Running at System Boot (Crontab)
Add the following to root's crontab (`sudo crontab -e`):
```
@reboot /usr/local/bin/juliar /path/to/check_router_reboot.jl >> /var/log/check_reboot.log 2>&1
@reboot /usr/local/bin/juliar /path/to/check_yiem_website_reboot.jl >> /var/log/check_reboot.log 2>&1
```
**Note**: The crontab entries use `juliar`, a symlink to the Julia binary used by root, kept separate from the regular user's Julia installation.
### Required Dependencies
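Install the required Julia packages (`Dates` is part of Julia's standard library). For the crontab setup, run the same command as root with `juliar` in place of `julia`, so the packages are available to the Julia that cron uses: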
```bash
julia -e 'using Pkg; Pkg.add(["HTTP", "Dates", "JSON"])'
```
## Files
| File | Description |
|------|-------------|
| [`check_router_reboot.jl`](check_router_reboot.jl) | Router ping monitor with auto-reboot |
| [`check_yiem_website_reboot.jl`](check_yiem_website_reboot.jl) | Website HTTP monitor with auto-reboot |
| [`check_and_reboot_state.json`](check_and_reboot_state.json) | State persistence file (generated) |
| [`check_router_reboot_log.txt`](check_router_reboot_log.txt) | Router monitor log file |
| [`check_website_reboot_log.txt`](check_website_reboot_log.txt) | Website monitor log file |
## State File
The state is stored in [`check_and_reboot_state.json`](check_and_reboot_state.json) with the following structure:
```json
{
  "consecutive_fails": 0,
  "last_reboot_datetime": "2026-03-11T10:00:00"
}
```
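
The process that writes this file may be cut off by the very reboot it triggers. Writing to a temporary file in the same directory and then moving it over the original avoids leaving a half-written JSON file behind. The sketch below is an assumption, not the scripts' code. `state_json` and `write_state_atomic` are hypothetical. The JSON is built by hand to stay dependency-free, whereas the scripts list JSON.jl as a dependency. Writing `null` for "never rebooted" is also an assumption.

```julia
using Dates

# Hypothetical: serialize the two state fields. `nothing` (never rebooted)
# is written as JSON null; a DateTime is written as an ISO 8601 string.
function state_json(consecutive_fails::Integer, last_reboot::Union{DateTime,Nothing})
    ts = last_reboot === nothing ? "null" : "\"" * string(last_reboot) * "\""
    return "{\n  \"consecutive_fails\": $consecutive_fails,\n  \"last_reboot_datetime\": $ts\n}\n"
end

# Hypothetical: write to a temp file next to `path`, then move it over `path`,
# so an interrupted write never leaves a truncated state file in place.
function write_state_atomic(path::AbstractString, text::AbstractString)
    tmp, io = mktemp(dirname(abspath(path)))
    try
        write(io, text)
    finally
        close(io)
    end
    mv(tmp, path; force = true)
    return path
end
```

A loader paired with this should treat a missing file as the default state (`0` fails, no previous reboot).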
## Log Output
Logs are written to both the console and the respective log files with timestamps:
```
[2026-03-11T10:05:09.123] Starting check loop. Checking router 192.168.88.1 every 60 seconds.
[2026-03-11T10:05:09.456] 192.168.88.1 is reachable; resetting consecutive failure counter.
[2026-03-11T10:06:09.789] 192.168.88.1 is unreachable (last result: no response). Consecutive fails: 1/3.
```
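
A log helper producing lines in this format, and applying the 100-entry limit listed under Features, might look like the following. `log_event` and `max_lines` are hypothetical names. The scripts' own trimming strategy is not described in this README.

```julia
using Dates

# Hypothetical: print a timestamped line to the console and append it to
# `path`, keeping only the newest `max_lines` lines in the file.
function log_event(path::AbstractString, msg::AbstractString; max_lines::Int = 100)
    line = "[$(now())] $msg"
    println(line)
    lines = isfile(path) ? readlines(path) : String[]
    push!(lines, line)
    # Drop the oldest entries once the cap is exceeded.
    keep = lines[max(1, end - max_lines + 1):end]
    open(path, "w") do io
        foreach(l -> println(io, l), keep)
    end
    return nothing
end
```

Rewriting the whole file on each call is simple and cheap at 100 lines. A much larger cap would favor appending and trimming less often.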
## Reboot Commands
The scripts automatically select the appropriate reboot command based on the operating system:
- **Linux**: `sudo systemctl reboot` or `sudo reboot`
- **macOS**: `sudo shutdown -r now`
- **Windows**: `shutdown /r /t 0`
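
The platform selection could be expressed as a small function that returns a `Cmd`, with the dry-run switch deciding whether the command actually runs. This sketch assumes the Linux case prefers `systemctl` when it is on the `PATH` and falls back to `reboot` otherwise. `reboot_command`, `reboot!`, and `current_os` are hypothetical names.

```julia
# Hypothetical: map the running platform to a symbol.
current_os() = Sys.islinux() ? :linux : Sys.isapple() ? :apple : Sys.iswindows() ? :windows : :other

# Hypothetical: choose a reboot command (the commands listed above).
function reboot_command(; os::Symbol = current_os(),
                          has_systemctl::Bool = Sys.which("systemctl") !== nothing)
    if os === :linux
        return has_systemctl ? `sudo systemctl reboot` : `sudo reboot`
    elseif os === :apple
        return `sudo shutdown -r now`
    elseif os === :windows
        return `shutdown /r /t 0`
    else
        error("no reboot command known for platform: $os")
    end
end

# Hypothetical: run the command only when `dry_run` is false; otherwise report it.
function reboot!(; dry_run::Bool = true)
    cmd = reboot_command()
    if dry_run
        println("DRY_RUN: would run ", cmd)
        return false
    end
    run(cmd)
    return true
end
```

Passing `os` and `has_systemctl` explicitly keeps the selection testable on any machine. In normal use, both defaults are detected at call time.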
## Safety Features
1. **Cooldown Period**: After a reboot, the script waits `COOLDOWN_AFTER_REBOOT_SECS` seconds before performing another check
2. **Consecutive Failures**: Requires `FAILS_TO_REBOOT` consecutive failed checks before triggering a reboot
3. **Dry Run Mode**: Set `DRY_RUN = true` to test without actually rebooting
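
The cooldown and consecutive-failure rules can be read as two small checks around each health check. First, skip work while the cooldown is active. Otherwise, update the failure counter. The sketch below is one plausible reading, not the scripts' actual logic. `in_cooldown` and `record_result` are hypothetical names, and resetting the counter to 0 when a reboot is requested is an assumption.

```julia
using Dates

# Hypothetical: true while `at` is within `cooldown_secs` of the last reboot.
in_cooldown(last_reboot::Union{DateTime,Nothing}, at::DateTime; cooldown_secs::Integer = 600) =
    last_reboot !== nothing && at - last_reboot < Second(cooldown_secs)

# Hypothetical: update the failure counter after a check.
# Returns (new_consecutive_fails, should_reboot).
function record_result(fails::Integer, ok::Bool; fails_to_reboot::Integer = 3)
    ok && return (0, false)  # a success resets the counter
    fails += 1
    # Reset to 0 when rebooting so the next boot starts with a clean counter.
    return fails >= fails_to_reboot ? (0, true) : (fails, false)
end
```

With the defaults, a loop would call `in_cooldown(last_reboot, now())` before each check. It would then pass the check result to `record_result` and hand off to the reboot logic only when the second element is `true` and `DRY_RUN` is off.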