2026-03-11 17:08:28 +07:00

check_and_reboot

An automated monitoring and system recovery tool written in Julia that checks network connectivity and website availability, with automatic reboot capabilities when failures are detected.

Overview

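This project consists of two monitoring scripts:

  • check_router_reboot.jl: Pings the router to verify network connectivity
  • check_yiem_website_reboot.jl: Sends HTTP requests to the target URL to verify website availability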

Both scripts run continuously in the background, performing periodic health checks and automatically rebooting the system if consecutive failures exceed a configured threshold.

Features

  • Continuous Monitoring: Runs indefinitely with configurable check intervals
  • Multi-attempt Verification: Retries failed checks with backoff before declaring failure
  • State Persistence: Maintains state in a JSON file across restarts
  • Cooldown Period: Prevents rapid repeated reboots after a reboot event
  • Cross-Platform Support: Works on Linux, macOS, and Windows with appropriate reboot commands
  • Broadcast Notifications: Sends system-wide notifications on events (via wall, logger, or platform equivalents)
  • Log Rotation: Automatically trims each log file to its last 100 entries to prevent unbounded growth
  • Dry Run Mode: Test configuration without triggering actual reboots
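
The multi-attempt verification listed above can be sketched roughly as follows. This is a minimal illustration, not the scripts' actual code: `check_with_retries` and its `probe` argument are hypothetical names, and the assumption is that the backoff pause happens only between attempts.

```julia
# Hypothetical helper: `probe` is any zero-argument function that returns
# true when the target (router or website) responds.
function check_with_retries(probe; attempts::Int = 3, backoff_secs::Real = 60)
    for attempt in 1:attempts
        ok = try
            probe()
        catch
            false   # an exception (for example a timeout) counts as a failed attempt
        end
        ok === true && return true
        # Assumption: pause only between attempts, not after the last one.
        attempt < attempts && sleep(backoff_secs)
    end
    return false    # every attempt failed, so the whole check fails
end
```

Passing the probe as a function keeps the retry logic identical for the ping check and the HTTP check; only a `false` return here would count toward the consecutive-failure threshold.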

Configuration

Router Monitor Configuration (check_router_reboot.jl)

const ROUTER_IP = "192.168.88.1"      # Target router IP address
const TIMEOUT_SECS = 30                # Request timeout in seconds
const ATTEMPTS_PER_CHECK = 3           # Number of ping attempts per check
const BACKOFF_BETWEEN_ATTEMPTS = 60    # Seconds between retry attempts
const FAILS_TO_REBOOT = 3              # Consecutive failures before reboot
const COOLDOWN_AFTER_REBOOT_SECS = 600 # Minimum seconds between reboots
const DRY_RUN = true                   # Set false to enable actual reboots
const CHECK_INTERVAL_SECS = 60         # Check interval in seconds

Website Monitor Configuration (check_yiem_website_reboot.jl)

const URL = "https://www.yiem.cc"      # Target URL to monitor
const TIMEOUT_SECS = 30                # Request timeout in seconds
const ATTEMPTS_PER_CHECK = 3           # Number of HTTP attempts per check
const BACKOFF_BETWEEN_ATTEMPTS = 60    # Seconds between retry attempts
const FAILS_TO_REBOOT = 3              # Consecutive failures before reboot
const COOLDOWN_AFTER_REBOOT_SECS = 600 # Minimum seconds between reboots
const DRY_RUN = false                  # Set false to enable actual reboots
const CHECK_INTERVAL_SECS = 60         # Check interval in seconds
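
To see how these values interact, the following sketch estimates a rough upper bound on the time between the first failure and a reboot. The function names are hypothetical, and the timing model is an assumption: every attempt runs to its full timeout, the backoff is applied only between attempts, and the check interval is slept between checks.

```julia
# Upper bound on one failed check: each attempt times out, with a backoff
# pause between attempts (assumed: no pause after the last attempt).
failed_check_secs(; timeout = 30, attempts = 3, backoff = 60) =
    attempts * timeout + (attempts - 1) * backoff

# Upper bound from the start of the first failing check to the reboot:
# `fails` failed checks separated by `interval`-second sleeps.
function worst_case_secs_to_reboot(; fails = 3, interval = 60, kwargs...)
    return fails * failed_check_secs(; kwargs...) + (fails - 1) * interval
end

println(failed_check_secs())           # 210
println(worst_case_secs_to_reboot())   # 750
```

Under these assumptions the defaults allow up to about 12.5 minutes of outage before a reboot is triggered; lowering BACKOFF_BETWEEN_ATTEMPTS shortens that window the most.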

Usage

Running Manually

julia check_router_reboot.jl
julia check_yiem_website_reboot.jl

Running at System Boot (Crontab)

Add the following to root's crontab (sudo crontab -e):

@reboot /usr/local/bin/juliar /path/to/check_router_reboot.jl >> /var/log/check_reboot.log 2>&1
@reboot /usr/local/bin/juliar /path/to/check_yiem_website_reboot.jl >> /var/log/check_reboot.log 2>&1

Note: The crontab entries invoke juliar, a symlink to the Julia binary set up for root (separate from the user's Julia installation).

Required Dependencies

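Install the required Julia packages (Dates ships with Julia's standard library, so including it in the list is harmless but not strictly needed):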

julia -e 'using Pkg; Pkg.add(["HTTP", "Dates", "JSON"])'

Files

| File | Description |
|------|-------------|
| check_router_reboot.jl | Router ping monitor with auto-reboot |
| check_yiem_website_reboot.jl | Website HTTP monitor with auto-reboot |
| check_and_reboot_state.json | State persistence file (generated) |
| check_router_reboot_log.txt | Router monitor log file |
| check_website_reboot_log.txt | Website monitor log file |

State File

The state is stored in check_and_reboot_state.json with the following structure:

{
  "consecutive_fails": 0,
  "last_reboot_datetime": "2026-03-11T10:00:00"
}
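
A minimal sketch of turning that structure into typed values is shown below. `MonitorState` and `state_from_dict` are hypothetical names; the sketch assumes the file has already been parsed into a dictionary with string keys (for example by the JSON package), and that a missing or malformed field should fall back to a default instead of stopping the monitor.

```julia
using Dates

# Hypothetical in-memory form of the state file.
struct MonitorState
    consecutive_fails::Int
    last_reboot_datetime::Union{DateTime, Nothing}
end

# Build a MonitorState from an already-parsed JSON object,
# falling back to defaults when a key is missing or malformed.
function state_from_dict(d::AbstractDict)
    fails = get(d, "consecutive_fails", 0)
    fails = fails isa Integer ? Int(fails) : 0

    raw = get(d, "last_reboot_datetime", nothing)
    last_reboot = nothing
    if raw isa AbstractString
        last_reboot = try
            DateTime(raw)   # parses ISO 8601 strings such as "2026-03-11T10:00:00"
        catch
            nothing         # unreadable timestamp: treat as "never rebooted"
        end
    end
    return MonitorState(fails, last_reboot)
end

state = state_from_dict(Dict("consecutive_fails" => 0,
                             "last_reboot_datetime" => "2026-03-11T10:00:00"))
println(state.last_reboot_datetime)
```

Tolerating a damaged state file matters here because the file is rewritten around reboots, when an interrupted write is most likely.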

Log Output

Logs are written to both the console and the respective log files with timestamps:

[2026-03-11T10:05:09.123] Starting check loop. Checking router 192.168.88.1 every 60 seconds.
[2026-03-11T10:05:09.456] 192.168.88.1 is reachable; resetting consecutive failure counter.
[2026-03-11T10:06:09.789] 192.168.88.1 is unreachable (last result: no response). Consecutive fails: 1/3.
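
Timestamped logging combined with the 100-entry limit could look roughly like this. `log_line` is a hypothetical helper, and rewriting the whole file on every call is an assumption chosen for simplicity, not necessarily how the scripts rotate their logs.

```julia
using Dates

# Append one timestamped line to `path`, echo it to the console, then trim
# the file so that only the newest `max_lines` lines remain.
function log_line(path::AbstractString, msg::AbstractString; max_lines::Int = 100)
    line = "[$(now())] $msg"          # now() prints in ISO 8601 form
    println(line)

    lines = isfile(path) ? readlines(path) : String[]
    push!(lines, line)
    if length(lines) > max_lines
        lines = lines[end - max_lines + 1:end]   # keep only the newest entries
    end
    write(path, join(lines, "\n") * "\n")
    return line
end
```

Because the log is capped at a small number of lines, re-reading and rewriting it each time stays cheap.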

Reboot Commands

The scripts automatically select the appropriate reboot command based on the operating system:

  • Linux: sudo systemctl reboot or sudo reboot
  • macOS: sudo shutdown -r now
  • Windows: shutdown /r /t 0
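
The selection can be sketched with Julia's built-in platform checks. The function names below are hypothetical, and preferring systemctl whenever it is found on PATH is an assumption about how the Linux choice is made.

```julia
# Pick a reboot command for the current OS without running it.
function reboot_command()
    if Sys.iswindows()
        return `shutdown /r /t 0`
    elseif Sys.isapple()
        return `sudo shutdown -r now`
    elseif Sys.islinux()
        # Assumption: use systemctl when available, otherwise plain reboot.
        return Sys.which("systemctl") !== nothing ? `sudo systemctl reboot` : `sudo reboot`
    else
        error("unsupported platform: $(Sys.KERNEL)")
    end
end

# Run the command only when dry-run mode is off; return a message for the log.
function maybe_reboot(; dry_run::Bool = true)
    cmd = reboot_command()
    dry_run && return "DRY RUN: would execute $cmd"
    run(cmd)
    return "executed $cmd"
end

println(maybe_reboot(dry_run = true))
```

Building the command as a `Cmd` object and executing it in one separate step makes it easy to log exactly what would run while DRY_RUN is enabled.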

Safety Features

  1. Cooldown Period: After a reboot, the script waits COOLDOWN_AFTER_REBOOT_SECS seconds before performing another check
  2. Consecutive Failures: Requires multiple consecutive failures before triggering a reboot
  3. Dry Run Mode: Set DRY_RUN = true to test without actually rebooting
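
The first two safeguards can be combined into a single decision function, sketched below. `should_reboot` is a hypothetical name, and the sketch reads the cooldown as a minimum time between reboots, measured from the timestamp stored in the state file; the scripts may apply it differently.

```julia
using Dates

# Decide whether a reboot should be triggered at time `current`.
# `last_reboot` is `nothing` when no reboot has been recorded yet.
function should_reboot(consecutive_fails::Int,
                       last_reboot::Union{DateTime, Nothing},
                       current::DateTime;
                       fails_to_reboot::Int = 3,
                       cooldown_secs::Int = 600)
    # Safeguard 2: not enough consecutive failures yet.
    consecutive_fails >= fails_to_reboot || return false
    # No recorded reboot means no cooldown applies.
    last_reboot === nothing && return true
    # Safeguard 1: still inside the cooldown window after the last reboot.
    return (current - last_reboot) >= Millisecond(1000 * cooldown_secs)
end

last = DateTime("2026-03-11T10:00:00")
println(should_reboot(3, last, last + Minute(5)))    # false: cooldown still active
println(should_reboot(3, last, last + Minute(10)))   # true
```

Passing the current time in as an argument, instead of calling `now()` inside the function, keeps the decision easy to test.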