Mesh topology
Networks
                   ┌─────────────────────────────────────────────┐
                   │ yuzu (vps, quinn-vps) — 1984, Iceland       │
                   │ WireGuard hub wg 10.9.0.1                   │
                   │ public 89.127.233.145:51820                 │
                   └───────────────┬─────────────────────────────┘
                                   │ wg1 (AllowedIPs 10.9.0.0/24, 10.0.0.0/24)
               ┌───────────────┬───┴───────────────┬──────────────┐
               │               │                   │              │
        ┌──────┴──────┐ ┌──────┴──────┐     ┌──────┴──────┐ ┌─────┴───────┐
        │ apricot     │ │ pear (black)│     │ fennel      │ │ strawberry  │
        │ wg 10.9.0.2 │ │ wg 10.9.0.4 │     │ (plum)      │ │ (phone-     │
        │ lan: DHCP,  │─│ lan 10.0.0. │     │ wg 10.9.0.3 │ │ quinn) ios  │
        │ discovered  │L│ 11          │     │ macOS,      │ │ wg 10.9.0.5 │
        │ mesh DNS    │A│ LAN DNS     │     │ roams       │ │ DNS client  │
        └─────────────┘N└─────────────┘     └─────────────┘ └─────────────┘
apricot + pear share the home LAN (10.0.0.0/24); fennel joins it
when physically home; phones ride the tunnel with DNS=10.9.0.2.
- Mesh 10.9.0.0/24: full WireGuard overlay via the Iceland hub. Every host reaches every other by <host>.wg while the tunnel is up.
- LAN 10.0.0.0/24: apricot + pear (black), plus fennel (plum) when home. The tunnel also routes this /24, so 10.0.0.x works off-LAN through the hub (higher latency).
DNS responsibilities — and how .wg actually resolves
Two delivery paths, and they serve different consumers. This distinction is load-bearing (a config that renders a record is not the same as a client that can resolve it):
- apricot runs dnsmasq bound to 10.9.0.2:53 (the mesh view). Serves the host .wg + .lan records from mesh-hosts.json, written by wg-dns-sync. These records are consumed only by clients whose WireGuard config sets DNS=10.9.0.2, i.e. phones. The named hosts (apricot/pear/fennel) do not point their resolver at 10.9.0.2, so for them dnsmasq does not answer.
- For the named hosts, names are delivered by the managed /etc/hosts block from mesh-hosts-render --install (bare + .lan at current IPs, .wg, service vhosts). Every node's agent regenerates it automatically on drift.
- fennel roams off-LAN where dnsmasq is unreachable, so the managed /etc/hosts block is its only resolution path then.
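
The managed-block idea above can be sketched in a few lines of Python. This is a minimal illustration, not the code of mesh-hosts-render: the marker strings and the function names `render_block` and `install_block` are assumptions made up for the example. It shows why regenerating on drift is safe to repeat: only the text between the markers is ever replaced.

```python
# Hypothetical marker lines; the real renderer may use different ones.
BEGIN = "# BEGIN mesh-hosts (managed)"
END = "# END mesh-hosts (managed)"


def render_block(records: dict[str, str]) -> str:
    """Render {name: ip} as a marker-delimited hosts block."""
    lines = [BEGIN]
    lines += [f"{ip}\t{name}" for name, ip in sorted(records.items())]
    lines.append(END)
    return "\n".join(lines) + "\n"


def install_block(hosts_text: str, block: str) -> str:
    """Replace the managed block in hosts_text, or append it if absent.

    Text outside the markers is left untouched, so running this twice
    with the same block yields the same file.
    """
    start = hosts_text.find(BEGIN)
    end = hosts_text.find(END)
    if start != -1 and end > start:
        tail = hosts_text[end + len(END):].lstrip("\n")
        return hosts_text[:start] + block + tail
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + block
```

A caller would read /etc/hosts, pass the text through `install_block`, and write it back only when the result differs from what is already on disk.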
The old *.local platform scheme is retired (platform → .com, infra →
.lan); net-tools renders no .local.
Reachability matrix
| from ↓ \ to → | apricot | pear | fennel | yuzu |
|---|---|---|---|---|
| apricot | — | .lan ✦ · .wg | fennel.wg only | .wg |
| pear | .lan ✦ · .wg | — | fennel.wg only | .wg |
| fennel | .lan ✦ · .wg ⚑ | .lan ✦ · .wg ⚑ | — | .wg only |
| yuzu | .wg only | .wg only | fennel.wg only | — |
✦ preferred when co-located on the home LAN · ⚑ fennel falls back to .wg when
it roams · fennel and yuzu are only ever reachable inbound via .wg (fennel
has no stable LAN IP; yuzu has no LAN leg) · strawberry is reachable at
strawberry.wg (10.9.0.5) when its tunnel is up, but runs no services.
.wg in this matrix resolves via each node's managed /etc/hosts block, which
every agent maintains — the dnsmasq .wg records are the phones-only path
(see DNS responsibilities above).
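
One way to read the matrix is as a small preference rule: try .lan first when both ends can use the home LAN, and always keep .wg as the fallback. The sketch below encodes that reading. `HOME_LAN_HOSTS` and `names_to_try` are hypothetical names for illustration, and the bare `<host>.lan` form is assumed from the "bare + .lan" records mentioned above.

```python
# Hosts that others can target over the home LAN (per the matrix:
# fennel and yuzu are only ever reachable inbound via .wg).
HOME_LAN_HOSTS = {"apricot", "pear"}


def names_to_try(src: str, dst: str, src_on_home_lan: bool) -> list[str]:
    """Return the names src should try for dst, most preferred first."""
    if src == dst:
        return []
    names = []
    if dst in HOME_LAN_HOSTS and src_on_home_lan:
        names.append(f"{dst}.lan")   # direct LAN path, preferred when co-located
    names.append(f"{dst}.wg")        # tunnel path, always available while wg is up
    return names
```

For fennel, `src_on_home_lan` is whatever the agent's location check reports in the current cycle; for yuzu it is always false.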
Hub IP note
plum's live wg1.conf endpoint is 89.127.233.145:51820. An older
magic-civilization/scripts/lan/README.md also lists 93.95.231.174 for the
Iceland hub — treat that as stale/secondary unless confirmed against the hub's
own WireGuard config. mesh-hosts.json records only the live .145.
The fleet agent
smart-lan-router/smart-lan-router.py runs as a root service on every node
(launchd on darwin, systemd on linux — install-agent.sh picks). One codebase;
each node derives its roles from its own mesh-hosts.json entry:
| Role | Who | What |
|---|---|---|
| pull | all | git pull as the repo owner (never root); exit-and-restart when its own code changes — pushing to the forge updates the fleet |
| hostname | all (fleet.enforce_hostname) | converge the OS hostname to the canonical name (the fleet renames hosts, humans don't) |
| discover | LAN nodes | declared MAC → current DHCP IP via ARP/ip neigh → data/lan-state.json (each LAN node discovers independently) |
| route | laptop, darwin | the home/away subnet switch below |
| render | all | regenerate /etc/hosts + ssh config on any change, at this node's vantage (mesh-only nodes resolve everything via .wg IPs) |
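
The discover row can be illustrated with a short sketch that turns `ip neigh` text output into a MAC-to-IP table and resolves declared hosts against it. This is an assumption-laden example, not the agent's code: `parse_ip_neigh` and `resolve_hosts` are hypothetical names, and the host entries are assumed to be dicts with `name` and an optional `mac` key.

```python
import ipaddress


def parse_ip_neigh(output: str, subnet: str = "10.0.0.0/24") -> dict[str, str]:
    """Map lowercase MAC -> IP for neighbour entries inside subnet."""
    network = ipaddress.ip_network(subnet)
    table: dict[str, str] = {}
    for line in output.splitlines():
        fields = line.split()
        if "lladdr" not in fields:
            continue  # FAILED / INCOMPLETE entries carry no MAC
        mac_index = fields.index("lladdr") + 1
        if mac_index >= len(fields):
            continue
        try:
            address = ipaddress.ip_address(fields[0])
        except ValueError:
            continue
        if address in network:  # also filters out IPv6 neighbours
            table[fields[mac_index].lower()] = str(address)
    return table


def resolve_hosts(hosts: list[dict], neigh: dict[str, str]) -> dict[str, str]:
    """Return {name: current_ip} for every host whose MAC is visible."""
    state = {}
    for host in hosts:
        mac = host.get("mac")
        if mac and mac.lower() in neigh:
            state[host["name"]] = neigh[mac.lower()]
    return state
```

The returned dict has the same `{name: ip}` shape the table describes for data/lan-state.json; hosts without a declared MAC are simply skipped.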
The original laptop problem the route role solves: the wg config's AllowedIPs includes 10.0.0.0/24, so
the tunnel installs a route capturing the entire home LAN. While home, traffic
to home hosts hairpins through the Iceland hub (~350ms) instead of going out the
LAN interface (~5ms). (Measured: apricot 351ms via tunnel → 5.6ms via en0.)
What it does, each cycle:
- Detect location: read the default route's gateway + interface. It's HOME iff the gateway is lan.gateway and its ARP MAC == lan.gateway_mac (the home gateway's fingerprint). The MAC check is what distinguishes the real home LAN from a visited café network that also happens to use 10.0.0.0/24.
- Switch the subnet route: HOME → route 10.0.0.0/24 via the LAN interface (direct); AWAY → via the wg mesh interface (so home stays reachable through the tunnel). Re-asserted every cycle, because wg-quick re-adds the tunnel /24 on reconnect.
- Name-sync (discover role): keep ssh + hosts in sync with reality. Each LAN host's MAC is stable while its DHCP IP drifts, and the neighbour table (ARP / ip neigh) maps MAC↔IP. The agent reads it (rate-limited ping-sweep of the /24 when a host is missing), resolves every hosts[] entry with a mac to its current IP, and on any change writes data/lan-state.json ({name: ip}, gitignored because it is volatile and per-device) and regenerates both views: mesh-hosts-render --install (/etc/hosts) and, as the node's render user (its ssh_user), host-apply --ssh-apply (~/.ssh/config). Proven live: when apricot rebooted from .116 to .118, ssh apricot and quinn.apricot.lan followed automatically, with no DHCP reservations and no hand-edits.
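
The first two steps reduce to a pure decision once the system state has been read. The sketch below shows only that decision, under stated assumptions: `DefaultRoute`, `is_home`, and `desired_route` are hypothetical names, and reading the default route, looking up the gateway's ARP entry, and installing the route are left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DefaultRoute:
    gateway: str     # IP of the default gateway
    interface: str   # interface the default route leaves through


def is_home(route: Optional[DefaultRoute], gateway_mac: Optional[str],
            lan_gateway: str, lan_gateway_mac: str) -> bool:
    """HOME iff the gateway IP and its ARP MAC both match the config."""
    if route is None or gateway_mac is None:
        return False  # no default route or no ARP entry: treat as AWAY
    return (route.gateway == lan_gateway
            and gateway_mac.lower() == lan_gateway_mac.lower())


def desired_route(home: bool, lan_interface: str, wg_interface: str,
                  subnet: str = "10.0.0.0/24") -> tuple[str, str]:
    """Return (subnet, interface) the agent should assert this cycle."""
    return (subnet, lan_interface if home else wg_interface)
```

Treating a missing ARP entry as AWAY is a design choice of this sketch: it keeps the home LAN reachable through the tunnel whenever the fingerprint cannot be confirmed.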
Why a subnet route, not per-host /32 pins (the old design): a /32 -interface route on macOS creates a self-MAC ARP entry that blackholes the
host. A subnet route uses normal ARP, so every home host — at whatever DHCP
address it currently holds — just works. This is drift-immune (apricot moving
.116→.118 needs no config change) and free of the self-MAC bug. --status
prints location + current route.
It re-reads mesh-hosts.json each cycle; a bad read keeps last-good and never
tears down routing (KeepAlive root daemon over an autocommit-written repo).
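
The keep-last-good behaviour can be sketched as a small wrapper around the JSON read. `LastGoodConfig` is a hypothetical class written for this example, not the agent's actual loader; it only shows the pattern of never replacing a good config with a failed or half-written read.

```python
import json
from pathlib import Path


class LastGoodConfig:
    """Re-read a JSON file each cycle, keeping the last successful parse."""

    def __init__(self, path):
        self.path = Path(path)
        self.value = None  # None until the first successful read

    def refresh(self) -> bool:
        """Try to reload; return True if the stored value was replaced."""
        try:
            loaded = json.loads(self.path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            # Missing file, mid-write truncation, or invalid JSON:
            # keep whatever we had.
            return False
        if not isinstance(loaded, dict):
            return False
        self.value = loaded
        return True
```

Each cycle would call `refresh()` and then act on `value`, so a read that races an autocommit write just means one more cycle on the previous config.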
Supersedes both the old per-host identity-probe pinner and the
wg-route-watchdog system daemon (which unconditionally forced 10.0.0.0/24
through the tunnel — the home branch is the new, smarter behavior; the away
branch preserves the watchdog's original purpose). The watchdog was retired
(/Library/LaunchDaemons/com.natalie.wg-route-watchdog.plist +
/usr/local/sbin/wg-route-watchdog.sh removed).
Fleet rename
Names follow fruit family = machine class (apricot=GPU stone fruit,
pear=CPU/storage pome, yuzu=cloud citrus, fennel=laptop vegetable,
strawberry=phone berry), executed alias-first: the fruit name is canonical,
the old name lives in aliases[] forever, and every renderer emits both —
pear.wg+black.wg, forge.pear.lan+forge.black.lan, ssh black keeps
working. Old names are never retired; nothing that says "black" ever breaks.
OS hostnames converge automatically: with fleet.enforce_hostname: true,
each node's agent renames its own OS (scutil ×3 / hostnamectl) to the
canonical name on its next cycle — this is how the relic FQDNs
(plum.voyager.nasty.sh, 0.vps.1984.uvlava.com) die. Never run the rename by
hand. String-identity consumers stay untouched on purpose: the Forgejo runner
label stays black (workflows reference it), the forge URL and NFS exports keep
their old names as permanent aliases.
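
Alias-first rendering amounts to expanding every name template over the canonical name plus its aliases. A minimal sketch, assuming a host entry with `name`, `aliases`, and a `services` list; the `services` key and the function name `mesh_names` are hypothetical, while `aliases` follows the aliases[] field described above.

```python
def mesh_names(host: dict) -> list[str]:
    """Emit every name for a host, canonical label first, then aliases."""
    labels = [host["name"], *host.get("aliases", [])]
    names = [f"{label}.wg" for label in labels]
    for service in host.get("services", []):
        # Service vhosts are rendered under every label as well.
        names += [f"{service}.{label}.lan" for label in labels]
    return names
```

Because the aliases are expanded at render time rather than stored as separate hosts, adding a new canonical name never removes an old one from the output.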
Migration
This repo replaces tooling scattered across four places:
| Was | Now | Status |
|---|---|---|
| session-tools/data/wg-mesh-hosts.json | data/mesh-hosts.json (expanded: .wg view, hosts[], mac, identity, fruit names) | ✅ here |
| session-tools/bin/wg-dns-sync | bin/wg-dns-sync (robust symlink path resolution) | ✅ here + fixed |
| magic-civilization/scripts/lan/subscribe-black-dns.sh | — (retired: *.local scheme is dead) | ✅ removed |
| setup-lan-dns.sh (not in ~/Code; drifted) | bin/mesh-hosts-render | ✅ replaced |
| bin/host-apply (per-device ssh view) | new | ✅ here |
| ~/bin/smart-lan-router.py (loose) | smart-lan-router/smart-lan-router.py (JSON-driven, self-heal) | ✅ here + fixed |
| ~/{install-agent.sh,com.lilith…plist} (loose) | smart-lan-router/ | ✅ here |
Done (2026-06-09): agent installed + verified on all four nodes (launchd on
fennel; systemd on pear/apricot/yuzu); all three remote nodes are real git
clones of origin/main (repo public on the LAN-only forge for credential-less
pulls); mesh-hosts-render --install + host-apply --ssh-apply live on all
four; fennel's hostname converged; the old wg-route-watchdog, setup-lan-dns
block, /etc/resolver/*.lan files, loose ~/bin/smart-lan-router.py, and the
stale self-MAC ARP entry are all retired.
Still pending:
- apricot mesh-DNS cutover: run sudo bin/wg-dns-sync on apricot from this repo (serves phones the .wg/.lan names); verify dig @10.9.0.2 apricot.wg. Then update the two session-tools consumers that call the old absolute path (bin/apricot-doctor, bin/quinn-phone-bootstrap) and delete the originals from session-tools/{data,bin}.
- pear/yuzu hostname convergence: automatic on the next pull cycle after the fleet.enforce_hostname commit lands on the forge (the agents do it; watch for hostname converged: black → pear in the journal).
- yuzu → home ssh auth: yuzu reaches pear/apricot by name but its key is not authorized there. Deliberate: internet-facing node, least-privilege. Grant only if actually needed.