net-tools/docs/topology.md
Natalie 68c848dc56 feat(@tools/net-tools): add tray icon system
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-06-10 02:20:23 -07:00


Mesh topology

Networks

              ┌─────────────────────────────────────────────┐
              │  yuzu (vps, quinn-vps) — 1984, Iceland       │
              │  WireGuard hub   wg 10.9.0.1                 │
              │  public 89.127.233.145:51820                 │
              └───────────────┬─────────────────────────────┘
                              │  wg1 (AllowedIPs 10.9.0.0/24, 10.0.0.0/24)
          ┌───────────────┬───┴───────────────┬──────────────┐
          │               │                   │              │
   ┌──────┴──────┐ ┌──────┴──────┐     ┌──────┴──────┐ ┌─────┴───────┐
   │ apricot     │ │ pear (black)│     │ fennel      │ │ strawberry  │
   │ wg 10.9.0.2 │ │ wg 10.9.0.4 │     │ (plum)      │ │ (phone-     │
   │ lan: DHCP,  │─│ lan 10.0.0. │     │ wg 10.9.0.3 │ │  quinn) ios │
   │ discovered  │L│     11      │     │ macOS,      │ │ wg 10.9.0.5 │
   │ mesh DNS    │A│ LAN DNS     │     │ roams       │ │ DNS client  │
   └─────────────┘N└─────────────┘     └─────────────┘ └─────────────┘
        apricot + pear share the home LAN (10.0.0.0/24); fennel joins it
        when physically home; phones ride the tunnel with DNS=10.9.0.2.
  • Mesh 10.9.0.0/24 — full WireGuard overlay via the Iceland hub. Every host reaches every other by <host>.wg while the tunnel is up.
  • LAN 10.0.0.0/24 — apricot + pear (black), plus fennel (plum) when home. The tunnel also routes this /24, so 10.0.0.x addresses work off-LAN through the hub (at higher latency).
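A minimal Python sketch of the two address spaces above, using only the standard `ipaddress` module. It classifies an address as mesh or LAN and checks whether the hub peer's AllowedIPs (both /24s, per the wg1 line in the diagram) cover it. The function names are illustrative and are not part of net-tools.

```python
import ipaddress

# The two subnets from the diagram.
MESH = ipaddress.ip_network("10.9.0.0/24")  # WireGuard overlay via the Iceland hub
LAN = ipaddress.ip_network("10.0.0.0/24")   # home LAN

# wg1's AllowedIPs for the hub peer: both /24s can ride the tunnel.
HUB_ALLOWED_IPS = (MESH, LAN)


def network_of(addr: str) -> str:
    """Return "mesh", "lan", or "other" for an IPv4 address string."""
    ip = ipaddress.ip_address(addr)
    if ip in MESH:
        return "mesh"
    if ip in LAN:
        return "lan"
    return "other"


def tunnel_covers(addr: str) -> bool:
    """True if the hub peer's AllowedIPs include addr, i.e. it is routable via the tunnel."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in HUB_ALLOWED_IPS)


if __name__ == "__main__":
    for a in ("10.9.0.3", "10.0.0.11", "89.127.233.145"):
        print(a, network_of(a), tunnel_covers(a))
```

The LAN check is why 10.0.0.x still works off-LAN: the address is covered by AllowedIPs, so it falls back to the tunnel path.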

DNS responsibilities — and how .wg actually resolves

Two delivery paths, and they serve different consumers. This distinction is load-bearing (a config that renders a record is not the same as a client that can resolve it):

  • apricot runs dnsmasq bound to 10.9.0.2:53 (the mesh view). It serves the host .wg + .lan records that wg-dns-sync writes from mesh-hosts.json. These records are consumed only by clients whose WireGuard config sets DNS=10.9.0.2, i.e. phones. The named hosts (apricot/pear/fennel) do not point their resolver at 10.9.0.2, so dnsmasq never answers for them.
  • For the named hosts, names are delivered by the managed /etc/hosts block from mesh-hosts-render --install (bare + .lan at current IPs, .wg, service vhosts). Every node's agent regenerates it automatically on drift.
  • fennel roams off-LAN where dnsmasq is unreachable, so the managed /etc/hosts block is its only resolution path then.
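The managed /etc/hosts block can be pictured roughly as below. This is a hedged Python sketch, not mesh-hosts-render itself. The marker comments, the host dict shape (`name`, `wg`, `aliases`) and the `lan_state` mapping are assumptions, and service vhosts and per-vantage rules are left out. The point is the splice: only the text between the markers is regenerated, so re-running it is idempotent and hand-written entries outside the block survive.

```python
# Hypothetical markers; the real mesh-hosts-render may use different text.
BEGIN = "# BEGIN net-tools managed block"
END = "# END net-tools managed block"


def render_block(hosts, lan_state):
    """Build the managed block.

    hosts: list of dicts like {"name": "pear", "wg": "10.9.0.4", "aliases": ["black"]}
           (an assumed shape, loosely modelled on mesh-hosts.json).
    lan_state: {name: current LAN IP}, e.g. as discovered into data/lan-state.json.
    """
    lines = [BEGIN]
    for host in hosts:
        names = [host["name"], *host.get("aliases", [])]
        lan_ip = lan_state.get(host["name"])
        if lan_ip:
            # bare + .lan names at the host's current DHCP address
            lan_names = [n for base in names for n in (base, f"{base}.lan")]
            lines.append(f"{lan_ip}\t{' '.join(lan_names)}")
        # .wg names always point at the mesh address
        lines.append(f"{host['wg']}\t{' '.join(n + '.wg' for n in names)}")
    lines.append(END)
    return "\n".join(lines)


def splice(hosts_text, block):
    """Replace an existing managed block in hosts_text, or append one."""
    start = hosts_text.find(BEGIN)
    end = hosts_text.find(END)
    if start != -1 and end > start:
        return hosts_text[:start] + block + hosts_text[end + len(END):]
    sep = "\n" if hosts_text and not hosts_text.endswith("\n") else ""
    return hosts_text + sep + block + "\n"
```

Usage would be `splice(open("/etc/hosts").read(), render_block(hosts, lan_state))`, written back only when the result differs. That comparison is one way to implement "regenerates it automatically on drift".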

The old *.local platform scheme is retired (platform → .com, infra → .lan); net-tools renders no .local.

Reachability matrix

| from ↓ \ to → | apricot | pear | fennel | yuzu |
|---|---|---|---|---|
| apricot | | .lan ✦ · .wg | fennel.wg only | .wg |
| pear | .lan ✦ · .wg | | fennel.wg only | .wg |
| fennel | .lan ✦ · .wg | .lan ✦ · .wg | | .wg only |
| yuzu | .wg only | .wg only | fennel.wg only | |

✦ preferred when co-located on the home LAN · ⚑ fennel falls back to .wg when it roams · fennel and yuzu are only ever reachable inbound via .wg (fennel has no stable LAN IP; yuzu has no LAN leg) · strawberry is reachable at strawberry.wg (10.9.0.5) when its tunnel is up, but runs no services.

.wg in this matrix resolves via each node's managed /etc/hosts block, which every agent maintains — the dnsmasq .wg records are the phones-only path (see DNS responsibilities above).
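The matrix boils down to a small naming rule. Below is a hedged Python sketch of that rule. The `on_home_lan` set is an assumed input (something the agent could derive from its location check) and is not a known net-tools interface.

```python
# Hosts the matrix says are only ever reached inbound via .wg.
WG_ONLY_TARGETS = {"fennel", "yuzu"}


def preferred_name(src, dst, on_home_lan):
    """Name that src should use to reach dst.

    on_home_lan: set of host names currently on the home LAN (assumed input).
    """
    if dst in WG_ONLY_TARGETS:
        return f"{dst}.wg"   # no stable LAN IP (fennel) or no LAN leg (yuzu)
    if src in on_home_lan and dst in on_home_lan:
        return f"{dst}.lan"  # ✦ co-located: direct LAN path
    return f"{dst}.wg"       # otherwise go through the mesh
```

For example, `preferred_name("fennel", "apricot", {"apricot", "pear"})` yields `apricot.wg` while fennel roams, and `apricot.lan` once fennel is added to the home set.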

Hub IP note

fennel's (plum's) live wg1.conf endpoint is 89.127.233.145:51820. An older magic-civilization/scripts/lan/README.md also lists 93.95.231.174 for the Iceland hub; treat that as stale/secondary unless confirmed against the hub's own WireGuard config. mesh-hosts.json records only the live .145 address.

The fleet agent

smart-lan-router/smart-lan-router.py runs as a root service on every node (launchd on darwin, systemd on linux — install-agent.sh picks). One codebase; each node derives its roles from its own mesh-hosts.json entry:

| Role | Who | What |
|---|---|---|
| pull | all | git pull as the repo owner (never root); exit-and-restart when its own code changes, so pushing to the forge updates the fleet |
| hostname | all (fleet.enforce_hostname) | converge the OS hostname to the canonical name; the fleet renames hosts, humans don't |
| discover | LAN nodes | declared MAC → current DHCP IP via ARP/ip neigh, written to data/lan-state.json (each LAN node discovers independently) |
| route | laptop, darwin | the home/away subnet switch below |
| render | all | regenerate /etc/hosts + ssh config on any change, at this node's vantage (mesh-only nodes resolve everything via .wg IPs) |
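Role derivation from a node's own entry might look like the Python sketch below. Only `fleet.enforce_hostname` and the per-host `mac` come from this document. The `class` and `os` keys are hypothetical, and "laptop, darwin" is read here as "a darwin laptop".

```python
def roles_for(entry, fleet):
    """Derive this node's roles from its own mesh-hosts.json entry.

    entry keys "class" and "os" are hypothetical; "mac" follows the hosts[]
    entries described under the route role. fleet holds fleet-wide settings.
    """
    roles = {"pull", "render"}           # every node
    if fleet.get("enforce_hostname"):
        roles.add("hostname")
    if entry.get("mac"):                 # a declared MAC marks a LAN node
        roles.add("discover")
    if entry.get("class") == "laptop" and entry.get("os") == "darwin":
        roles.add("route")               # home/away subnet switch
    return roles
```

Keeping this a pure function of the entry is what lets one codebase run everywhere: a node changes role by changing its JSON, not its install.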

The route role solves the original laptop problem: the wg config's AllowedIPs includes 10.0.0.0/24, so the tunnel installs a route capturing the entire home LAN. While home, traffic to home hosts hairpins through the Iceland hub (~350ms) instead of going out the LAN interface (~5ms). (Measured: apricot 351ms via tunnel → 5.6ms via en0.)

What it does, each cycle:

  1. Detect location — read the default route's gateway + interface. It's HOME iff the gateway is lan.gateway and its ARP MAC == lan.gateway_mac (the home gateway's fingerprint). The MAC check is what distinguishes the real home LAN from a visited café network that also happens to use 10.0.0.0/24.
  2. Switch the subnet route — HOME → route 10.0.0.0/24 via the LAN interface (direct); AWAY → via the wg mesh interface (so home stays reachable through the tunnel). Re-asserted every cycle, because wg-quick re-adds the tunnel /24 on reconnect.
  3. Name-sync (discover role) — keep ssh + hosts in sync with reality. Each LAN host's MAC is stable while its DHCP IP drifts, and the neighbour table (ARP / ip neigh) maps MAC↔IP. The agent reads that table (with a rate-limited ping-sweep of the /24 when a host is missing), resolves every hosts[] entry that has a mac to its current IP, and on any change writes data/lan-state.json ({name: ip}; gitignored because it is volatile and per-device) and regenerates both views: mesh-hosts-render --install (/etc/hosts) and, as the node's render user (its ssh_user), host-apply --ssh-apply (~/.ssh/config). Proven live: when apricot rebooted and came back on .118 instead of .116, ssh apricot and quinn.apricot.lan followed automatically, with no DHCP reservations and no hand-edits.
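Steps 1 and 3 both hinge on reading the neighbour table. A hedged Python sketch: it parses Linux `ip neigh` lines (`10.0.0.118 dev eth0 lladdr … REACHABLE`) and macOS `arp -an` lines (`? (10.0.0.1) at … on en0 …`), then applies the HOME test and the MAC → IP resolution. The `lan` dict keys mirror `lan.gateway` / `lan.gateway_mac` from the text; everything else, including the parsing approach, is an assumption rather than the agent's actual code. macOS `arp` may print octets without leading zeros, so MACs are normalised before comparing.

```python
import re

# Linux `ip neigh`: "<ip> dev <if> lladdr <mac> <STATE>"; FAILED entries lack lladdr.
IP_NEIGH = re.compile(r"^(\d+\.\d+\.\d+\.\d+) dev \S+ lladdr ([0-9a-fA-F:]+)")
# macOS `arp -an`: "? (<ip>) at <mac> on <if> ..."; "(incomplete)" does not match.
ARP_AN = re.compile(r"\((\d+\.\d+\.\d+\.\d+)\) at ([0-9a-fA-F:]+) ")


def norm_mac(mac):
    """Lower-case and zero-pad each octet, so "a:b:c:d:e:f" == "0a:0b:0c:0d:0e:0f"."""
    return ":".join(part.zfill(2) for part in mac.lower().split(":"))


def parse_neighbours(text):
    """Return {mac: ip} from `ip neigh` or `arp -an` output."""
    table = {}
    for line in text.splitlines():
        m = IP_NEIGH.match(line) or ARP_AN.search(line)
        if m:
            ip, mac = m.groups()
            table[norm_mac(mac)] = ip
    return table


def is_home(gateway, gateway_mac_seen, lan):
    """Step 1: HOME iff the default gateway is lan.gateway and its ARP MAC matches."""
    return (
        gateway == lan["gateway"]
        and gateway_mac_seen is not None
        and norm_mac(gateway_mac_seen) == norm_mac(lan["gateway_mac"])
    )


def resolve_hosts(hosts, neighbours):
    """Step 3: map every hosts[] entry that declares a mac to its current IP."""
    state = {}
    for host in hosts:
        mac = host.get("mac")
        if mac and norm_mac(mac) in neighbours:
            state[host["name"]] = neighbours[norm_mac(mac)]
    return state
```

The result of `resolve_hosts` is the `{name: ip}` shape described for data/lan-state.json; comparing it with the previous cycle's value is the "on any change" trigger.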

Why a subnet route, not per-host /32 pins (the old design): a /32 -interface route on macOS creates a self-MAC ARP entry that blackholes the host. A subnet route uses normal ARP, so every home host — at whatever DHCP address it currently holds — just works. This is drift-immune (apricot moving .116→.118 needs no config change) and free of the self-MAC bug. --status prints location + current route.
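The per-cycle route switch (step 2) can be expressed as the commands below. This Python sketch only builds argv lists; the platform strings and interface names are assumptions, and the agent's exact invocation is not documented here. On macOS, `route -n change` assumes the tunnel's /24 route already exists (wg-quick adds it on connect). On Linux, `ip route replace` adds or overwrites, which suits re-asserting every cycle.

```python
SUBNET = "10.0.0.0/24"  # the home LAN, also present in the tunnel's AllowedIPs


def route_argv(platform, home, lan_iface, wg_iface):
    """Argv that points the home /24 at the LAN (HOME) or the tunnel (AWAY)."""
    iface = lan_iface if home else wg_iface
    if platform == "darwin":
        # one interface-bound subnet route: hosts resolve via normal ARP,
        # avoiding the per-host /32 self-MAC problem
        return ["route", "-n", "change", "-net", SUBNET, "-interface", iface]
    # linux: `replace` is idempotent, so re-running it every cycle is harmless
    return ["ip", "route", "replace", SUBNET, "dev", iface]
```

A caller would pass the result to `subprocess.run(argv, check=True)`. Because only the interface changes between HOME and AWAY, a DHCP move such as .116→.118 never touches the route.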

The agent re-reads mesh-hosts.json each cycle; a bad read keeps the last-good copy and never tears down routing (it is a KeepAlive root daemon running over an autocommit-written repo).
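The keep-last-good behaviour is a small pattern. Here is a hedged Python sketch. The class name and the "must contain hosts" validity check are assumptions; the real agent may validate differently.

```python
import json


class MeshConfig:
    """Re-read mesh-hosts.json on every call; fall back to the last good copy."""

    def __init__(self, path):
        self.path = path
        self.last_good = None

    def load(self):
        try:
            with open(self.path, encoding="utf-8") as f:
                data = json.load(f)  # JSONDecodeError is a ValueError subclass
            if not isinstance(data, dict) or "hosts" not in data:
                raise ValueError("mesh-hosts.json has no hosts[]")
        except (OSError, ValueError):
            # missing, half-written or invalid file: keep routing on the last good copy
            if self.last_good is None:
                raise  # nothing to fall back to; the caller should skip this cycle
            return self.last_good
        self.last_good = data
        return data
```

Failing closed on the very first read, and open (to the last good copy) afterwards, means a truncated file from a mid-write autocommit can never become an empty routing plan.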

Supersedes both the old per-host identity-probe pinner and the wg-route-watchdog system daemon. The watchdog unconditionally forced 10.0.0.0/24 through the tunnel; the home branch is the new, smarter behavior, and the away branch preserves the watchdog's original purpose. The watchdog is retired (/Library/LaunchDaemons/com.natalie.wg-route-watchdog.plist and /usr/local/sbin/wg-route-watchdog.sh removed).

Fleet rename

Names follow fruit family = machine class (apricot=GPU stone fruit, pear=CPU/storage pome, yuzu=cloud citrus, fennel=laptop vegetable, strawberry=phone berry), executed alias-first: the fruit name is canonical, the old name lives in aliases[] forever, and every renderer emits both — pear.wg+black.wg, forge.pear.lan+forge.black.lan, ssh black keeps working. Old names are never retired; nothing that says "black" ever breaks.
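Alias-first emission means every renderer expands one host into all of its names. A hedged Python sketch, assuming a host dict with `name` and `aliases` keys (mirroring the aliases[] list described above); the helper names are hypothetical.

```python
def all_names(host, suffix=""):
    """Canonical (fruit) name first, then every alias, each with suffix."""
    return [f"{name}{suffix}" for name in (host["name"], *host.get("aliases", []))]


def vhost_names(service, host):
    """Service vhosts under every name, e.g. forge.pear.lan + forge.black.lan."""
    return [f"{service}.{name}.lan" for name in all_names(host)]


def ssh_host_line(host):
    """An ssh_config Host line matching every name, so `ssh black` keeps working."""
    return "Host " + " ".join(all_names(host))
```

Because aliases are only ever appended, never dropped, every name a renderer has emitted in the past stays valid.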

OS hostnames converge automatically: with fleet.enforce_hostname: true, each node's agent renames its own OS (scutil ×3 / hostnamectl) to the canonical name on its next cycle — this is how the relic FQDNs (plum.voyager.nasty.sh, 0.vps.1984.uvlava.com) die. Never run the rename by hand. String-identity consumers stay untouched on purpose: the Forgejo runner label stays black (workflows reference it), the forge URL and NFS exports keep their old names as permanent aliases.
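The "scutil ×3 / hostnamectl" step maps to the commands below. This Python sketch only builds argv lists; `scutil --set` with ComputerName, HostName and LocalHostName and `hostnamectl set-hostname` are the standard macOS and systemd tools, but how the agent actually sequences them is an assumption. The journal line mirrors the format quoted in the pending list.

```python
def hostname_argv(platform, name):
    """Commands that converge the OS hostname to the canonical name."""
    if platform == "darwin":
        # macOS tracks three names separately, hence "scutil ×3"
        return [["scutil", "--set", key, name]
                for key in ("ComputerName", "HostName", "LocalHostName")]
    return [["hostnamectl", "set-hostname", name]]


def converge_message(current, canonical):
    """Journal line for a rename, or None when the host is already converged."""
    if current == canonical:
        return None
    return f"hostname converged: {current} → {canonical}"
```

A relic FQDN such as plum.voyager.nasty.sh simply compares unequal to `fennel`, so it gets renamed on the next cycle like any other drift.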

Migration

This repo replaces tooling scattered across four places:

| Was | Now | Status |
|---|---|---|
| session-tools/data/wg-mesh-hosts.json | data/mesh-hosts.json (expanded: .wg view, hosts[], mac, identity, fruit names) | here |
| session-tools/bin/wg-dns-sync | bin/wg-dns-sync (robust symlink path resolution) | here + fixed |
| magic-civilization/scripts/lan/subscribe-black-dns.sh | none (retired: *.local scheme is dead) | removed |
| setup-lan-dns.sh (not in ~/Code; drifted) | bin/mesh-hosts-render | replaced |
| | bin/host-apply (per-device ssh view) | new here |
| ~/bin/smart-lan-router.py (loose) | smart-lan-router/smart-lan-router.py (JSON-driven, self-heal) | here + fixed |
| ~/{install-agent.sh,com.lilith…plist} (loose) | smart-lan-router/ | here |

Done (2026-06-09): agent installed + verified on all four nodes (launchd on fennel; systemd on pear/apricot/yuzu); all three remote nodes are real git clones of origin/main (repo public on the LAN-only forge for credential-less pulls); mesh-hosts-render --install + host-apply --ssh-apply live on all four; fennel's hostname converged; the old wg-route-watchdog, setup-lan-dns block, /etc/resolver/*.lan files, loose ~/bin/smart-lan-router.py, and the stale self-MAC ARP entry are all retired.

Still pending:

  1. apricot mesh-DNS cutover — run sudo bin/wg-dns-sync on apricot from this repo (serves phones the .wg/.lan names); verify dig @10.9.0.2 apricot.wg. Then update the two session-tools consumers that call the old absolute path (bin/apricot-doctor, bin/quinn-phone-bootstrap) and delete the originals from session-tools/{data,bin}.
  2. pear/yuzu hostname convergence — automatic on the next pull cycle after the fleet.enforce_hostname commit lands on the forge (the agents do it; watch for hostname converged: black → pear in the journal).
  3. yuzu → home ssh auth — yuzu reaches pear/apricot by name but its key is not authorized there. Deliberate: internet-facing node, least-privilege. Grant only if actually needed.