# check_and_reboot

An automated network card recovery tool written in Julia that monitors network connectivity and triggers a system reboot to recover from network card hang/stuck conditions.

## Overview

This project consists of two monitoring scripts that detect network failures and automatically reboot the system to recover the network card:

- **[`check_router_reboot.jl`](check_router_reboot.jl)** - Monitors router connectivity via ICMP ping to detect when the network stack becomes unresponsive
- **[`check_yiem_website_reboot.jl`](check_yiem_website_reboot.jl)** - Monitors website availability via HTTP requests as an additional network health indicator

Both scripts run continuously in the background, performing periodic health checks and automatically rebooting the system once the number of consecutive failures reaches a configured threshold. The reboot serves to reset the network card and restore network connectivity.

## Features

- **Continuous Monitoring**: Runs indefinitely with configurable check intervals
- **Multi-attempt Verification**: Retries failed checks with backoff before declaring failure
- **State Persistence**: Maintains state in a JSON file across restarts
- **Cooldown Period**: Prevents rapid repeated reboots after a reboot event
- **Cross-Platform Support**: Works on Linux, macOS, and Windows with appropriate reboot commands
- **Broadcast Notifications**: Sends system-wide notifications on events (via `wall`, `logger`, or platform equivalents)
- **Log Rotation**: Automatically limits the log file to the last 100 entries to prevent unbounded growth
- **Dry Run Mode**: Test configuration without triggering actual reboots

## Configuration

### Router Monitor Configuration (`check_router_reboot.jl`)

```julia
const ROUTER_IP = "192.168.88.1"        # Target router IP address
const TIMEOUT_SECS = 30                 # Request timeout in seconds
const ATTEMPTS_PER_CHECK = 3            # Number of ping attempts per check
const BACKOFF_BETWEEN_ATTEMPTS = 60     # Seconds between retry attempts
const FAILS_TO_REBOOT = 3               # Consecutive failures before reboot
const COOLDOWN_AFTER_REBOOT_SECS = 600  # Minimum seconds between reboots
const DRY_RUN = true                    # Set false to enable actual reboots
const CHECK_INTERVAL_SECS = 60          # Check interval in seconds
```

### Website Monitor Configuration (`check_yiem_website_reboot.jl`)

```julia
const URL = "https://www.yiem.cc"       # Target URL to monitor
const TIMEOUT_SECS = 30                 # Request timeout in seconds
const ATTEMPTS_PER_CHECK = 3            # Number of HTTP attempts per check
const BACKOFF_BETWEEN_ATTEMPTS = 60     # Seconds between retry attempts
const FAILS_TO_REBOOT = 3               # Consecutive failures before reboot
const COOLDOWN_AFTER_REBOOT_SECS = 600  # Minimum seconds between reboots
const DRY_RUN = false                   # Set false to enable actual reboots
const CHECK_INTERVAL_SECS = 60          # Check interval in seconds
```

## Usage

### Running Manually

```bash
julia check_router_reboot.jl
julia check_yiem_website_reboot.jl
```

### Running at System Boot (Crontab)

Add the following to root's crontab (`sudo crontab -e`):

```
@reboot /usr/local/bin/juliar /path/to/check_router_reboot.jl >> /var/log/check_reboot.log 2>&1
@reboot /usr/local/bin/juliar /path/to/check_yiem_website_reboot.jl >> /var/log/check_reboot.log 2>&1
```

**Note**: These crontab entries use `juliar`, which is a symlink to Julia for root (separate from the user's Julia installation).

### Required Dependencies

Install the required Julia packages:

```bash
julia -e 'using Pkg; Pkg.add(["HTTP", "Dates", "JSON"])'
```

## Files

| File | Description |
|------|-------------|
| [`check_router_reboot.jl`](check_router_reboot.jl) | Router ping monitor with auto-reboot |
| [`check_yiem_website_reboot.jl`](check_yiem_website_reboot.jl) | Website HTTP monitor with auto-reboot |
| [`check_and_reboot_state.json`](check_and_reboot_state.json) | State persistence file (generated) |
| [`check_router_reboot_log.txt`](check_router_reboot_log.txt) | Router monitor log file |
| [`check_website_reboot_log.txt`](check_website_reboot_log.txt) | Website monitor log file |

## State File

The state is stored in [`check_and_reboot_state.json`](check_and_reboot_state.json) with the following structure:

```json
{
  "consecutive_fails": 0,
  "last_reboot_datetime": "2026-03-11T10:00:00"
}
```

## Log Output

Logs are written to both the console and the respective log files with timestamps:

```
[2026-03-11T10:05:09.123] Starting check loop. Checking router 192.168.88.1 every 60 seconds.
[2026-03-11T10:05:09.456] 192.168.88.1 is reachable; resetting consecutive failure counter.
[2026-03-11T10:06:09.789] 192.168.88.1 is unreachable (last result: no response). Consecutive fails: 1/3.
```

## Reboot Commands

The scripts automatically select the appropriate reboot command based on the operating system:

- **Linux**: `sudo systemctl reboot` or `sudo reboot`
- **macOS**: `sudo shutdown -r now`
- **Windows**: `shutdown /r /t 0`

## Safety Features

1. **Cooldown Period**: After a reboot, the script waits `COOLDOWN_AFTER_REBOOT_SECS` seconds before performing another check
2. **Consecutive Failures**: Requires multiple consecutive failures before triggering a reboot
3. **Dry Run Mode**: Set `DRY_RUN = true` to test without actually rebooting
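
As a rough illustration of how the three safety features above could interact, here is a minimal sketch. It is not the code from the scripts: `MonitorState` and `decide!` are hypothetical names, the struct simply mirrors the two fields of the state file, and the exact behaviour (for example, resetting the counter in dry run mode) is an assumption.

```julia
using Dates

# Hypothetical in-memory mirror of the two fields in the JSON state file.
mutable struct MonitorState
    consecutive_fails::Int
    last_reboot_datetime::Union{DateTime,Nothing}
end

# Decide what to do after one health check at time `t`.
# Returns :ok, :wait, :cooldown, :dry_run, or :reboot and updates `state`.
# Keyword defaults copy the router monitor configuration.
function decide!(state::MonitorState, reachable::Bool, t::DateTime;
                 fails_to_reboot::Int=3, cooldown_secs::Int=600,
                 dry_run::Bool=true)
    if reachable
        state.consecutive_fails = 0      # any success resets the counter
        return :ok
    end

    state.consecutive_fails += 1
    state.consecutive_fails < fails_to_reboot && return :wait

    # Threshold reached: refuse to reboot again inside the cooldown window.
    prev = state.last_reboot_datetime
    if prev !== nothing && (t - prev) < Millisecond(cooldown_secs * 1000)
        return :cooldown
    end

    # Assumption: the reboot time is recorded even in dry run mode,
    # so a dry run exercises the same state transitions.
    state.last_reboot_datetime = t
    state.consecutive_fails = 0
    return dry_run ? :dry_run : :reboot
end

# Usage: three consecutive failures one minute apart.
state = MonitorState(0, nothing)
t0 = DateTime(2026, 3, 11, 10, 0, 0)
for i in 0:2
    println(decide!(state, false, t0 + Minute(i)))
end
```

With the defaults, the loop prints `wait`, `wait`, and then `dry_run`; passing `dry_run=false` would return `:reboot` on the third failure instead, at which point a real script would invoke the platform-specific reboot command.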