
check_and_reboot

An automated network card recovery tool written in Julia. It monitors network connectivity and reboots the system to recover from a hung or stuck network card.

Overview

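This project consists of two monitoring scripts that detect network failures and automatically reboot the system to recover the network card:

  • check_router_reboot.jl: pings the local router (192.168.88.1 by default)
  • check_yiem_website_reboot.jl: sends HTTP requests to a website (https://www.yiem.cc by default)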

Both scripts run continuously in the background, performing periodic health checks and automatically rebooting the system if consecutive failures exceed a configured threshold. The reboot serves to reset the network card and restore network connectivity.
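The failure-counting rule described above can be sketched as a small pure function. This is an illustration only: the names update_failures and simulate are hypothetical and not taken from the scripts, and it assumes a reboot is triggered as soon as the counter reaches FAILS_TO_REBOOT.

```julia
# Hypothetical helper (not from the scripts): returns the updated failure
# count and whether a reboot should be triggered.
# Assumption: the reboot fires when the count reaches `fails_to_reboot`.
function update_failures(consecutive_fails::Int, check_ok::Bool, fails_to_reboot::Int)
    if check_ok
        return (fails = 0, reboot = false)      # a successful check resets the counter
    end
    fails = consecutive_fails + 1
    return (fails = fails, reboot = fails >= fails_to_reboot)
end

# Replay a sequence of check results and return the index of the check
# that would trigger a reboot, or `nothing` if none does.
function simulate(results::Vector{Bool}, fails_to_reboot::Int)
    fails = 0
    for (i, ok) in enumerate(results)
        step = update_failures(fails, ok, fails_to_reboot)
        fails = step.fails
        step.reboot && return i
    end
    return nothing
end
```

Keeping the decision separate from the ping or HTTP call makes it possible to test the threshold behaviour without touching the network.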

Features

  • Continuous Monitoring: Runs indefinitely with configurable check intervals
  • Multi-attempt Verification: Retries failed checks with backoff before declaring failure
  • State Persistence: Maintains state in a JSON file across restarts
  • Cooldown Period: Prevents rapid repeated reboots after a reboot event
  • Cross-Platform Support: Works on Linux, macOS, and Windows with appropriate reboot commands
  • Broadcast Notifications: Sends system-wide notifications on events (via wall, logger, or platform equivalents)
  • Log Rotation: Automatically limits log file to last 100 entries to prevent unbounded growth
  • Dry Run Mode: Test configuration without triggering actual reboots

Configuration

Router Monitor Configuration (check_router_reboot.jl)

const ROUTER_IP = "192.168.88.1"      # Target router IP address
const TIMEOUT_SECS = 30                # Request timeout in seconds
const ATTEMPTS_PER_CHECK = 1           # Number of ping attempts per check
const BACKOFF_BETWEEN_ATTEMPTS = 1     # Seconds between retry attempts
const FAILS_TO_REBOOT = 3              # Consecutive failures before reboot
const COOLDOWN_AFTER_REBOOT_SECS = 600 # Minimum seconds between reboots
const DRY_RUN = false                  # Set false to enable actual reboots
const CHECK_INTERVAL_SECS = 60         # Check interval in seconds
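As a rough sketch of how a single router check might use ROUTER_IP and TIMEOUT_SECS, the snippet below builds a one-packet ping command for the current platform. The function names ping_cmd and is_reachable are hypothetical, and using the system ping binary is an assumption; the actual script may probe the router differently.

```julia
# Build a single-packet ping command for the current OS.
# Assumption: the system `ping` binary is used. Timeout units differ per platform.
function ping_cmd(ip::AbstractString, timeout_secs::Integer)
    if Sys.iswindows()
        return `ping -n 1 -w $(timeout_secs * 1000) $ip`   # -w takes milliseconds
    elseif Sys.isapple()
        return `ping -c 1 -W $(timeout_secs * 1000) $ip`   # -W takes milliseconds on macOS
    else
        return `ping -c 1 -W $timeout_secs $ip`            # -W takes seconds on Linux (iputils)
    end
end

# True if the host answered. A non-zero exit status or a spawn error
# (for example, ping not installed) both count as "unreachable".
function is_reachable(ip::AbstractString, timeout_secs::Integer)
    try
        return success(pipeline(ping_cmd(ip, timeout_secs); stdout = devnull, stderr = devnull))
    catch
        return false
    end
end
```

With the configuration above, a check would be is_reachable(ROUTER_IP, TIMEOUT_SECS).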

Website Monitor Configuration (check_yiem_website_reboot.jl)

const URL = "https://www.yiem.cc"      # Target URL to monitor
const TIMEOUT_SECS = 30                # Request timeout in seconds
const ATTEMPTS_PER_CHECK = 3           # Number of HTTP attempts per check
const BACKOFF_BETWEEN_ATTEMPTS = 60    # Seconds between retry attempts
const FAILS_TO_REBOOT = 3              # Consecutive failures before reboot
const COOLDOWN_AFTER_REBOOT_SECS = 600 # Minimum seconds between reboots
const DRY_RUN = false                  # Set false to enable actual reboots
const CHECK_INTERVAL_SECS = 60         # Check interval in seconds
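ATTEMPTS_PER_CHECK and BACKOFF_BETWEEN_ATTEMPTS suggest a retry loop along the following lines. This is a minimal sketch, not the script's code: check_with_retries is a hypothetical name, and the probe is passed in as a function so the loop works for either monitor.

```julia
# Run `probe` up to `attempts` times, sleeping `backoff_secs` between tries.
# `probe` is any zero-argument function that returns true on success.
function check_with_retries(probe; attempts::Integer, backoff_secs::Real)
    for attempt in 1:attempts
        ok = try
            probe()
        catch
            false        # an exception (timeout, DNS error, ...) counts as a failed attempt
        end
        ok && return true
        attempt < attempts && sleep(backoff_secs)   # no sleep after the final attempt
    end
    return false
end
```

For the website monitor, the probe could plausibly be () -> HTTP.get(URL; readtimeout = TIMEOUT_SECS, retry = false).status == 200 with HTTP.jl. HTTP.get throws on non-2xx responses by default, which the loop treats as a failed attempt.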

Usage

Running Manually

julia check_router_reboot.jl
julia check_yiem_website_reboot.jl

Running at System Boot (Crontab)

Add the following to root's crontab (sudo crontab -e):

@reboot /usr/local/bin/juliar /path/to/check_router_reboot.jl >> /var/log/check_reboot.log 2>&1
@reboot /usr/local/bin/juliar /path/to/check_yiem_website_reboot.jl >> /var/log/check_reboot.log 2>&1

Note: The crontab entries invoke juliar, a symlink to the Julia binary set up for root (separate from the regular user's Julia installation).

Required Dependencies

Install the required Julia packages (Dates is part of Julia's standard library, so only HTTP and JSON need to be downloaded):

julia -e 'using Pkg; Pkg.add(["HTTP", "Dates", "JSON"])'

Files

File                            Description
check_router_reboot.jl          Router ping monitor with auto-reboot
check_yiem_website_reboot.jl    Website HTTP monitor with auto-reboot
check_and_reboot_state.json     State persistence file (generated)
check_router_reboot_log.txt     Router monitor log file
check_website_reboot_log.txt    Website monitor log file

State File

The state is stored in check_and_reboot_state.json with the following structure:

{
  "consecutive_fails": 0,
  "last_reboot_datetime": "2026-03-11T10:00:00"
}
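The last_reboot_datetime field is what makes a cooldown check possible after a restart. A minimal sketch, assuming the state has already been parsed into a Dict (for example with the JSON package) and that a missing or empty timestamp means the system has never been rebooted by the script; in_cooldown is a hypothetical name:

```julia
using Dates

# `state` mirrors the JSON structure after parsing into a Dict.
# Assumption: a missing or empty timestamp means "never rebooted".
function in_cooldown(state::AbstractDict, now_time::DateTime, cooldown_secs::Integer)
    raw = get(state, "last_reboot_datetime", nothing)
    (raw === nothing || isempty(raw)) && return false
    last_reboot = DateTime(raw)      # parses ISO 8601, e.g. "2026-03-11T10:00:00"
    return now_time - last_reboot < Millisecond(1000 * cooldown_secs)
end
```

A caller would pass Dates.now() as now_time and COOLDOWN_AFTER_REBOOT_SECS as cooldown_secs.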

Log Output

Logs are written to both the console and the respective log files with timestamps:

[2026-03-11T10:05:09.123] Starting check loop. Checking router 192.168.88.1 every 60 seconds.
[2026-03-11T10:05:09.456] 192.168.88.1 is reachable; resetting consecutive failure counter.
[2026-03-11T10:06:09.789] 192.168.88.1 is unreachable (last result: no response). Consecutive fails: 1/3.
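A logger that produces entries in this format and keeps the file bounded could look roughly like the sketch below. The function names are hypothetical, and trimming by rereading the whole file after each write is an assumption chosen for simplicity, not necessarily how the scripts rotate their logs.

```julia
using Dates

# Format one entry as "[timestamp] message".
format_entry(msg::AbstractString, ts::DateTime = now()) = "[$(ts)] $(msg)"

# Print to the console, append to the log file, then trim the file so that
# only the newest `max_entries` lines remain (100 matches the documented limit).
function log_message(path::AbstractString, msg::AbstractString; max_entries::Integer = 100)
    entry = format_entry(msg)
    println(entry)
    open(path, "a") do io
        println(io, entry)
    end
    lines = readlines(path)
    if length(lines) > max_entries
        write(path, join(lines[end-max_entries+1:end], "\n") * "\n")
    end
    return entry
end
```

Rereading the file on every write is acceptable here because the file never holds more than about 100 short lines.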

Reboot Commands

The scripts automatically select the appropriate reboot command based on the operating system:

  • Linux: sudo systemctl reboot or sudo reboot
  • macOS: sudo shutdown -r now
  • Windows: shutdown /r /t 0
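The selection above might be implemented along these lines. This is a sketch under assumptions: reboot_cmd and trigger_reboot are hypothetical names, and preferring systemctl whenever it is found on the PATH is a guess at how the Linux choice is made.

```julia
# Pick a reboot command for the current OS (commands taken from the list above).
# Assumption: on Linux, `systemctl reboot` is preferred when systemctl is on the PATH.
function reboot_cmd()
    if Sys.iswindows()
        return `shutdown /r /t 0`
    elseif Sys.isapple()
        return `sudo shutdown -r now`
    elseif Sys.which("systemctl") !== nothing
        return `sudo systemctl reboot`
    else
        return `sudo reboot`
    end
end

# With dry_run = true the command is only reported, never executed.
# The default is deliberately the safe one in this sketch.
function trigger_reboot(; dry_run::Bool = true)
    cmd = reboot_cmd()
    if dry_run
        println("DRY RUN: would execute ", cmd)
        return cmd
    end
    run(cmd)
    return cmd
end
```

Calling trigger_reboot(dry_run = DRY_RUN) would tie the behaviour to the configuration constant.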

Safety Features

  1. Cooldown Period: After a reboot, the script waits COOLDOWN_AFTER_REBOOT_SECS seconds before performing another check
  2. Consecutive Failures: Requires multiple consecutive failures before triggering a reboot
  3. Dry Run Mode: Set DRY_RUN = true to test without actually rebooting