session-tools/docs/lan-power-ctrl.md
Natalie dbcaf999f9 feat(@scripts): add disk reclaim, host probe, power-cycle tools
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-17 19:38:40 -07:00

lan-power-ctrl — LAN power-cycle for wedged hosts

When a host's userspace freezes (D-state, OOM, disk hang), the kernel still answers ICMP and completes TCP handshakes, but no daemon ever finishes a request. The SSH banner never arrives, so the host is unreachable in practice. The only recovery is a hardware power-cycle.

This doc covers the hardware to buy, the network topology, and the two scripts (host-probe, power-cycle) that turn "apricot froze" into a one-liner from any machine on the LAN/VPN.

Diagnosis vs recovery

   layer            healthy host    frozen userspace    unreachable host
   ─────            ────────────    ────────────────    ────────────────
   ICMP             reply <50ms     reply <50ms         no reply
   TCP accept :22   succeeds        succeeds            connection refused / timeout
   SSH banner       arrives <100ms  never arrives       —

   host-probe       up              wedged              down
   power-cycle      n/a             needed              power's the cure regardless

A wedged classification is the signature that a remote shell can't help — every login path needs the same userspace that's stuck. Cut power.
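
The table reduces to a small decision rule. A minimal sketch in Python, not the actual host-probe source: `classify` and its parameters are hypothetical names, and reporting every case without a TCP accept as down is an assumption, since the table lists only three rows.

```python
def classify(icmp_ok: bool, tcp_ok: bool, banner_ok: bool) -> str:
    """Map the three probe results onto the table's up / wedged / down states."""
    if banner_ok:
        # sshd wrote its banner, so userspace is alive.
        return "up"
    if tcp_ok:
        # The kernel accepted the connection but no process ever answered.
        return "wedged"
    # No TCP accept. icmp_ok is kept for logging only in this sketch:
    # a ping reply alone does not change the verdict (assumption).
    return "down"


print(classify(icmp_ok=True, tcp_ok=True, banner_ok=False))
```

The order of the checks matters: the banner is the strongest evidence, so it is tested first, and the lower layers only decide between wedged and down.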

Shopping list

Per host to be remotely recoverable

   Device                                              Where                                 Purpose                    Link
   ──────                                              ─────                                 ───────                    ────
   Shelly Plus Plug US (Gen2) or Shelly Plug US Gen4   Between the host's PSU and the wall   Local HTTP RPC, no cloud   Plus (Gen2) · Gen4

Either generation works. Gen4 adds power monitoring and Matter, and both share the same Gen2-compatible /rpc/Switch.Set API that the power-cycle script uses. Buy whichever is in stock and cheapest at order time. ~$25.

For the modem (separate problem — can't rescue itself over its own WiFi)

   Device                 Where                                                                                 Purpose                                  Link
   ──────                 ─────                                                                                 ───────                                  ────
   SwitchBot Plug Mini    Modem                                                                                 BLE-controllable plug                    product
   USB Bluetooth dongle   Plugged into black (X399 boards have no integrated BT; see reference-host-hardware)   Lets black drive BLE to the modem plug   Search "USB Bluetooth 5 adapter CSR8510 or RTL8761" (~$10, plug-and-play under bluez)

Why black and not apricot: using apricot to rescue the modem couples failure modes — apricot is also a recoverable host. Black is the independent watchdog.

Why not drive the modem's SwitchBot Plug Mini over WiFi, the way a plug on a host can be? Because the modem dying takes WiFi down with it. BLE from a LAN-attached watchdog works regardless of WAN state, assuming separate modem and router (see Caveat).

Caveat: combined modem-router vs separate

If your ISP gave you a combined modem-router (one box does both), modem-dead also means LAN-dead. Nothing on the LAN can rescue it. Options narrow to: phone/laptop in BLE range, or an auto-rebooter plug (watchdog built into the plug itself — pings a target, power-cycles on failure).

If modem and router are separate (typical home-server setup), LAN stays alive when WAN dies, and the black-as-watchdog plan works as designed.

One-time setup after plugs arrive

  1. Set the Shelly's own restore behavior (so it remembers its on/off state across a wall outage). The -g flag stops curl from treating the JSON braces as URL globbing:
    curl -g 'http://<plug>/rpc/Switch.SetConfig?id=0&config={"initial_state":"restore_last"}'
    
  2. Set the host's BIOS to "Restore on AC power loss" (so when the Shelly restores power, the host actually boots up rather than sitting dark).
  3. Add the plug to the config:
    # ~/.config/power-cycle/plugs.conf
    apricot  http://<plug-ip>
    
    Get <plug-ip> from the router DHCP table or arp -a | grep -i shelly.
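
If the setup is scripted rather than typed, the Switch.SetConfig URL from step 1 can be built with the JSON percent-encoded, which avoids shell and curl quoting issues. A sketch only: `set_config_url` is a hypothetical helper, 192.0.2.10 is a placeholder address, and it assumes the plug decodes a percent-encoded query string the way a standard HTTP server does.

```python
import json
from urllib.parse import urlencode


def set_config_url(plug_base_url: str, config: dict, switch_id: int = 0) -> str:
    """Build a Switch.SetConfig GET URL with the config JSON percent-encoded."""
    query = urlencode({
        "id": switch_id,
        # Compact separators keep the encoded JSON short.
        "config": json.dumps(config, separators=(",", ":")),
    })
    return f"{plug_base_url.rstrip('/')}/rpc/Switch.SetConfig?{query}"


# 192.0.2.10 is a documentation-range placeholder, not a real plug.
print(set_config_url("http://192.0.2.10", {"initial_state": "restore_last"}))
```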

Scripts

host-probe — classify reachability

host-probe apricot.lan              # → up | wedged | down  (one-shot)
host-probe --watch apricot.lan      # loop; emits one line per state change
HOST_PROBE_INTERVAL=10 HOST_PROBE_TIMEOUT=2 host-probe --watch apricot.lan 22

Three independent probes: ICMP, TCP accept, SSH banner. The banner timing is what distinguishes a wedged host from a healthy one — a real sshd flushes its SSH-2.0-… banner within milliseconds of connect.
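
The TCP-accept and banner probes can be sketched with a plain socket. This is an illustration in Python, not the host-probe source: `probe_ssh` is a hypothetical name, the 2-second default mirrors the HOST_PROBE_TIMEOUT example above, and ICMP is left out because it needs raw sockets or an external ping.

```python
import socket
from typing import Tuple


def probe_ssh(host: str, port: int = 22, timeout: float = 2.0) -> Tuple[bool, bool]:
    """Return (tcp_ok, banner_ok) for a single probe of host:port."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, timed out, or unroutable: no TCP accept at all.
        return (False, False)
    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(256)
        except OSError:
            # Connected, but nothing arrived in time: the wedged signature.
            return (True, False)
    # Simplification: expects the identification string to come first.
    return (True, data.startswith(b"SSH-"))


if __name__ == "__main__":
    print(probe_ssh("127.0.0.1"))
```

A listening socket whose owner never calls accept() produces the same (True, False) result, because the kernel completes the handshake on its own. That is the frozen-userspace case in miniature.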

Composes with the harness Monitor tool: each state-change line becomes one notification.
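
The one-line-per-state-change behavior amounts to suppressing repeats. A minimal sketch, assuming the first observed state also counts as a change; `state_changes` is a hypothetical name, not part of host-probe.

```python
from typing import Iterable, Iterator


def state_changes(states: Iterable[str]) -> Iterator[str]:
    """Yield a state only when it differs from the previous one."""
    previous = None
    for state in states:
        if state != previous:
            yield state
            previous = state


for line in state_changes(["up", "up", "wedged", "wedged", "down", "up"]):
    print(line)
```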

power-cycle — Shelly Gen2 RPC wrapper

power-cycle apricot          # off → POWER_CYCLE_OFF_SECS (default 5) → on
power-cycle off|on|status apricot
power-cycle list             # show configured host → plug mappings
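
The default off, wait, on sequence maps onto two Switch.Set calls. A sketch in Python rather than the script itself: `power_cycle` and `http_get_json` are hypothetical names, switch id 0 is assumed, and the `fetch` and `sleep` parameters exist only so the sequence can be exercised without a real plug.

```python
import json
import time
import urllib.request
from typing import Callable


def http_get_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode the JSON reply."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def power_cycle(plug_base_url: str, off_secs: float = 5.0,
                fetch: Callable[[str], dict] = http_get_json,
                sleep: Callable[[float], None] = time.sleep) -> None:
    """Switch the plug off, wait off_secs, then switch it back on."""
    base = plug_base_url.rstrip("/")
    for on in ("false", "true"):
        reply = fetch(f"{base}/rpc/Switch.Set?id=0&on={on}")
        # Switch.Set reports the previous state as "was_on".
        if "was_on" not in reply:
            raise RuntimeError(f"unrecognized Shelly response: {reply!r}")
        if on == "false":
            sleep(off_secs)
```

Calling `power_cycle("http://<plug-ip>")` with the defaults would issue the two requests five seconds apart.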

Config: ~/.config/power-cycle/plugs.conf, one line per host:

<host>  <plug-base-url>
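
Parsing that format takes a few lines. A sketch under two assumptions that the format description does not confirm: lines starting with # are comments, and blank lines are ignored. `parse_plugs` is a hypothetical name.

```python
from typing import Dict


def parse_plugs(text: str) -> Dict[str, str]:
    """Parse '<host> <plug-base-url>' lines into a host -> URL mapping."""
    plugs: Dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: comments and blank lines are allowed
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected '<host> <plug-base-url>', got {raw!r}")
        host, url = fields
        plugs[host] = url.rstrip("/")
    return plugs


# 192.0.2.10 is a placeholder address.
print(parse_plugs("# example\napricot  http://192.0.2.10\n"))
```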

Errors covered: missing config, unknown host, plug unreachable (with hint to fall back to BLE for modem outages), unrecognized Shelly response.

Future: modem watchdog daemon on black

Once the USB BT dongle and SwitchBot plug arrive:

  • bleak-based Python daemon, runs as systemd --user unit on black
  • Pings WAN (e.g. 1.1.1.1) every 30s
  • After N consecutive failures, BLE-toggles the SwitchBot plug off → 10s → on
  • Logs to journal; emits a notification via the existing apricot speech-synthesis path (see reference-rvoice) when it acts

Not implemented yet — waiting on hardware.
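
The failure-counting rule in the plan can be sketched independently of the hardware. This is an illustration under stated assumptions, not the planned daemon: `watchdog` is a hypothetical name, ping results arrive as an iterable of booleans instead of real pings every 30s, and the BLE toggle is passed in as a callable because the SwitchBot commands are outside the scope of this doc.

```python
from typing import Callable, Iterable


def watchdog(ping_results: Iterable[bool], max_failures: int,
             power_cycle: Callable[[], None]) -> int:
    """Call power_cycle after max_failures consecutive failed pings.

    Returns how many times power_cycle was called.
    """
    failures = 0
    cycles = 0
    for ok in ping_results:
        if ok:
            failures = 0  # any success resets the streak
            continue
        failures += 1
        if failures >= max_failures:
            power_cycle()
            cycles += 1
            failures = 0  # start counting again after acting
    return cycles


acted = []
watchdog([True, False, False, True, False, False, False], 3, lambda: acted.append("cycle"))
print(acted)
```

Resetting the counter after acting keeps the loop from toggling the plug on every subsequent failed ping while the modem is still coming back up.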

Files

   Path                               Role
   ────                               ────
   bin/host-probe                     Three-layer reachability probe
   bin/power-cycle                    Shelly Gen2 RPC wrapper
   ~/.config/power-cycle/plugs.conf   Per-host plug URL mapping (user-created)