# lan-power-ctrl — LAN power-cycle for wedged hosts

When a host's userspace freezes (D-state, OOM, disk hang), the kernel still
answers ICMP and accepts TCP, but no daemon completes a request. The SSH
banner never arrives, so the host is effectively unreachable. The only
recovery is a hardware power-cycle.

This doc covers the hardware to buy, the network topology, and the two
scripts (`host-probe`, `power-cycle`) that turn "apricot froze" into a
one-liner from any machine on the LAN/VPN.

## Diagnosis vs recovery

```
layer           healthy host    frozen userspace  unreachable host
─────           ────────────    ────────────────  ────────────────
ICMP            reply <50ms     reply <50ms       no reply
TCP accept :22  succeeds        succeeds          connection refused / timeout
SSH banner      arrives <100ms  never arrives     n/a

host-probe      up              wedged            down
power-cycle     n/a             needed            power's the cure regardless
```

A `wedged` classification means a remote shell can't help: every login path
needs the same userspace that's stuck. Cut power.
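
The table above can be read as a small decision function. The sketch below is an illustration only, not the code in `host-probe`: the function name `classify` and its boolean inputs are hypothetical. It relies on TCP accept and the SSH banner alone, since the ICMP row is identical for a healthy and a frozen host. Combinations the table does not list (for example, ICMP replies while port 22 is refused) are folded into `down` here, which is an assumption.

```python
def classify(tcp_ok: bool, banner_ok: bool) -> str:
    """Map probe results onto the table's three labels (hypothetical helper)."""
    if tcp_ok and banner_ok:
        return "up"        # sshd answered: userspace is alive
    if tcp_ok:
        return "wedged"    # kernel accepted the connection, userspace never spoke
    return "down"          # no TCP accept at all (assumption: covers every other case)


if __name__ == "__main__":
    for tcp_ok, banner_ok in [(True, True), (True, False), (False, False)]:
        print(tcp_ok, banner_ok, classify(tcp_ok, banner_ok))
```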

## Shopping list

### Per host to be remotely recoverable

| Device | Where | Purpose | Link |
|---|---|---|---|
| **Shelly Plus Plug US** (Gen2) **or Shelly Plug US Gen4** | Between the host's PSU and the wall | Local HTTP RPC, no cloud | [Plus (Gen2)](https://us.shelly.com/products/shelly-plus-plug-us) · [Gen4](https://us.shelly.com/products/shelly-plug-us-gen4-white) |

Either generation works: Gen4 adds power monitoring and Matter, and both share
the same Gen2-compatible `/rpc/Switch.Set` API the `power-cycle` script uses.
Buy whichever is in stock and cheapest at order time (~$25).

### For the modem (separate problem — can't rescue itself over its own WiFi)

| Device | Where | Purpose | Link |
|---|---|---|---|
| **SwitchBot Plug Mini** | Modem | BLE-controllable plug | [product](https://us.switch-bot.com/products/switchbot-plug-mini) |
| **USB Bluetooth dongle** | Plugged into **black** (X399 boards have no integrated BT; see [[reference-host-hardware]]) | Lets black drive BLE to the modem plug | Search "USB Bluetooth 5 adapter CSR8510 or RTL8761" (~$10, plug-and-play under bluez) |

Why black and not apricot: using apricot to rescue the modem couples failure
modes, because apricot is *also* a recoverable host. Black is the independent
watchdog.

Why not control the modem's plug over WiFi, the way a host's plug is? A
SwitchBot Plug Mini on a *host* can be WiFi-controlled, but on the modem it
can't, because the modem dying takes WiFi down with it. BLE from a
LAN-attached watchdog works regardless of WAN state, assuming **separate
modem and router** (see Caveat).

### Caveat: combined modem-router vs separate

If your ISP gave you a combined modem-router (one box does both), modem-dead
also means LAN-dead. Nothing on the LAN can rescue it. Options narrow to a
phone/laptop in BLE range, or an auto-rebooter plug (a watchdog built into the
plug itself that pings a target and power-cycles on failure).

If modem and router are separate (typical home-server setup), the LAN stays
alive when the WAN dies, and the black-as-watchdog plan works as designed.

## One-time setup after plugs arrive

1. **Set the Shelly's own restore behavior** (so it remembers its on/off
   state across a *wall* outage):
   ```sh
   # -g (--globoff) stops curl from treating the {...} in the JSON as a URL glob
   curl -g 'http://<plug>/rpc/Switch.SetConfig?id=0&config={"initial_state":"restore_last"}'
   ```
2. **Set the host's BIOS to "Restore on AC power loss"** (so when the Shelly
   restores power, the host actually boots up rather than sitting dark).
3. **Add the plug to the config:**
   ```
   # ~/.config/power-cycle/plugs.conf
   apricot http://<plug-ip>
   ```
   Get `<plug-ip>` from the router DHCP table or `arp -a | grep -i shelly`.

## Scripts

### `host-probe` — classify reachability

```sh
host-probe apricot.lan            # → up | wedged | down (one-shot)
host-probe --watch apricot.lan    # loop; emits one line per state change
HOST_PROBE_INTERVAL=10 HOST_PROBE_TIMEOUT=2 host-probe --watch apricot.lan 22
```

Three independent probes: ICMP, TCP accept, SSH banner. The banner timing is
what distinguishes a wedged host from a healthy one: a real sshd flushes
its `SSH-2.0-…` banner within milliseconds of connect.
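
To make the TCP-accept versus banner distinction concrete, here is a minimal sketch using only Python's standard `socket` module. It is not the `host-probe` implementation: the function name `probe_ssh`, its return shape, and the 2-second default timeout are assumptions chosen for illustration. It connects once, then waits for the first bytes the server sends.

```python
import socket


def probe_ssh(host: str, port: int = 22, timeout: float = 2.0) -> tuple[bool, bool]:
    """Return (tcp_ok, banner_ok) for a single connection attempt (hypothetical helper)."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable, or timed out: the kernel never accepted.
        return (False, False)
    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(64)  # a live sshd sends "SSH-2.0-..." first
        except OSError:
            # Timeout or reset after accept: TCP works, userspace does not.
            return (True, False)
    return (True, data.startswith(b"SSH-"))


if __name__ == "__main__":
    print(probe_ssh("127.0.0.1"))
```

A listening socket whose owner never calls `accept()` behaves like the wedged case: the kernel finishes the handshake from its backlog, and the banner read times out.
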

Composes with the harness `Monitor` tool: each state-change line becomes one
notification.

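The "one line per state change" behavior of `--watch` amounts to suppressing repeated observations. A minimal sketch of that filtering follows, assuming the first observation counts as a change; the generator name `state_changes` is hypothetical and the real script may handle this differently.

```python
from typing import Iterable, Iterator


def state_changes(states: Iterable[str]) -> Iterator[str]:
    """Yield a state only when it differs from the previous observation."""
    previous = None
    for state in states:
        if state != previous:
            yield state  # assumption: the very first observation is emitted too
            previous = state


if __name__ == "__main__":
    observed = ["up", "up", "wedged", "wedged", "down", "up"]
    for line in state_changes(observed):
        print(line)
```
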
### `power-cycle` — Shelly Gen2 RPC wrapper

```sh
power-cycle apricot                  # off → POWER_CYCLE_OFF_SECS (default 5) → on
power-cycle off|on|status apricot    # pick one subcommand
power-cycle list                     # show configured host → plug mappings
```

Config: `~/.config/power-cycle/plugs.conf`, one line per host:

```
<host> <plug-base-url>
```
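
A sketch of how that two-column format could be parsed, for illustration only. The function name `parse_plugs_conf` is hypothetical, and treating blank lines and `#` lines as ignorable is an assumption based on the commented example in the setup steps, not a documented rule of the script.

```python
def parse_plugs_conf(text: str) -> dict[str, str]:
    """Parse '<host> <plug-base-url>' lines into a host -> URL mapping."""
    plugs: dict[str, str] = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and comments are skipped
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {number}: expected '<host> <plug-base-url>'")
        host, url = parts
        plugs[host] = url.rstrip("/")  # normalize so paths can be appended later
    return plugs


if __name__ == "__main__":
    sample = "# ~/.config/power-cycle/plugs.conf\napricot http://192.0.2.10\n"
    print(parse_plugs_conf(sample))
```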

Errors covered: missing config, unknown host, plug unreachable (with hint to
fall back to BLE for modem outages), unrecognized Shelly response.
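
The off → wait → on sequence maps onto two calls to the `/rpc/Switch.Set` endpoint mentioned earlier. The sketch below is one way to express it, not the script's actual code: the names `switch_set_url` and `power_cycle` are hypothetical, switch `id=0` is assumed (matching the setup command), and the `fetch` and `sleep` parameters exist only so the sequence can be exercised without a real plug.

```python
import time
import urllib.request


def switch_set_url(base_url: str, on: bool) -> str:
    """Build the Shelly Gen2 Switch.Set URL for switch id 0 (assumed id)."""
    state = "true" if on else "false"
    return f"{base_url.rstrip('/')}/rpc/Switch.Set?id=0&on={state}"


def power_cycle(base_url: str, off_secs: float = 5.0, fetch=None, sleep=time.sleep) -> None:
    """Turn the plug off, wait off_secs, then turn it back on."""
    if fetch is None:
        # Default transport: a plain HTTP GET with a short timeout.
        fetch = lambda url: urllib.request.urlopen(url, timeout=5).read()
    fetch(switch_set_url(base_url, on=False))
    sleep(off_secs)
    fetch(switch_set_url(base_url, on=True))


if __name__ == "__main__":
    # Dry run: print the requests instead of sending them.
    power_cycle("http://192.0.2.10", off_secs=0, fetch=print)
```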

## Future: modem watchdog daemon on black

Once the USB BT dongle and SwitchBot plug arrive:

- `bleak`-based Python daemon, runs as a `systemd --user` unit on black
- Pings WAN (e.g. `1.1.1.1`) every 30s
- After N consecutive failures, BLE-toggles the SwitchBot plug off → 10s → on
- Logs to journal; emits a notification via the existing apricot
  speech-synthesis path (see [[reference-rvoice]]) when it acts

Not implemented yet; waiting on hardware.
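
The "N consecutive failures" rule in the plan above does not depend on the BLE hardware, so it can be sketched in isolation. The class name `FailureTracker` is hypothetical, and resetting the counter after the plug is cycled is an assumption (it gives the modem a full N intervals to come back before the next toggle). The ping and the BLE toggle are deliberately left out.

```python
class FailureTracker:
    """Decide when N consecutive ping failures should trigger a power-cycle."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.failures = 0

    def record(self, ping_ok: bool) -> bool:
        """Record one ping result; return True when the plug should be cycled."""
        if ping_ok:
            self.failures = 0  # any success breaks the streak
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.failures = 0  # assumption: start counting afresh after acting
            return True
        return False


if __name__ == "__main__":
    tracker = FailureTracker(threshold=3)
    for ok in [False, False, True, False, False, False]:
        print(ok, tracker.record(ok))
```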

## Files

| Path | Role |
|---|---|
| `bin/host-probe` | Three-layer reachability probe |
| `bin/power-cycle` | Shelly Gen2 RPC wrapper |
| `~/.config/power-cycle/plugs.conf` | Per-host plug URL mapping (user-created) |