Mesh topology
Networks
           ┌─────────────────────────────────────────────┐
           │  vps (quinn-vps) - 1984 Hosting, Iceland    │
           │  WireGuard hub   wg 10.9.0.1                │
           │  public 89.127.233.145:51820                │
           └───────────────┬─────────────────────────────┘
                           │  wg1 tunnel (AllowedIPs 10.9.0.0/24, 10.0.0.0/24)
       ┌───────────────────┼───────────────────┐
       │                   │                   │
┌──────┴──────┐     ┌──────┴──────┐     ┌──────┴──────┐
│ apricot     │     │ black       │     │ plum        │
│ wg 10.9.0.2 │     │ wg 10.9.0.4 │     │ wg 10.9.0.3 │
│ lan 10.0.0. │─────│ lan 10.0.0. │     │ macOS,      │
│   116       │ LAN │   11        │     │ roams       │
│ mesh DNS    │     │ LAN DNS     │     │ (DHCP)      │
└─────────────┘     └─────────────┘     └─────────────┘
apricot + black share the home LAN (10.0.0.0/24);
plum joins it only when physically home, else routes via the hub.
- Mesh 10.9.0.0/24: full WireGuard overlay via the Iceland hub. Every host reaches every other by <host>.wg while the tunnel is up.
- LAN 10.0.0.0/24: apricot + black, plus plum when home. The tunnel also routes this /24, so 10.0.0.x works off-LAN through the hub (higher latency).
DNS responsibilities — and how .wg actually resolves
Two delivery paths, and they serve different consumers. This distinction is load-bearing (a config that renders a record is not the same as a client that can resolve it):
- apricot runs dnsmasq bound to 10.9.0.2:53 (the mesh view). It serves the host .wg + .lan records from mesh-hosts.json, written by wg-dns-sync. These records are consumed only by clients whose WireGuard config sets DNS=10.9.0.2, i.e. phones. The named hosts (apricot/pear/fennel) do not point their resolver at 10.9.0.2, so for them dnsmasq does not answer.
- For the named hosts, .wg/.lan is delivered by the static /etc/hosts block from mesh-hosts-render --install. Run it on every host that must resolve a peer's name. (Verified: before any install, dscacheutil -q host -a name apricot.wg on fennel returns nothing.)
- fennel roams off-LAN where dnsmasq is unreachable, so the static /etc/hosts block is its only resolution path then.
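The static-block path can be pictured as an idempotent "managed block" rewrite of /etc/hosts. The sketch below is only an illustration of that idea, not the actual mesh-hosts-render implementation: the marker strings and the function names are hypothetical, and the records are passed in directly rather than read from mesh-hosts.json.

```python
# Hypothetical markers; the real tool may delimit its block differently.
BEGIN = "# BEGIN mesh-hosts (managed)"
END = "# END mesh-hosts (managed)"


def render_block(records):
    """Turn (ip, [names]) pairs into the lines of one managed block."""
    lines = [BEGIN]
    for ip, names in records:
        lines.append(f"{ip}\t{' '.join(names)}")
    lines.append(END)
    return lines


def install_block(hosts_text, block_lines):
    """Return hosts_text with any old managed block dropped and the new one appended."""
    kept, skipping = [], False
    for line in hosts_text.splitlines():
        if line.strip() == BEGIN:
            skipping = True       # start of a previously installed block
            continue
        if line.strip() == END:
            skipping = False      # end of the previously installed block
            continue
        if not skipping:
            kept.append(line)     # unrelated entries are preserved untouched
    kept.extend(block_lines)
    return "\n".join(kept) + "\n"
```

Because the old block is removed before the new one is appended, running the install twice yields the same file, which is what makes it safe to re-run on every host after each change to the host list.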
The old *.local platform scheme is retired (platform → .com, infra →
.lan); net-tools renders no .local.
Reachability matrix
| from ↓ \ to → | apricot | pear | fennel | yuzu |
|---|---|---|---|---|
| apricot | - | .lan ✦ · .wg | fennel.wg only | .wg |
| pear | .lan ✦ · .wg | - | fennel.wg only | .wg |
| fennel | .lan ✦ · .wg ⚑ | .lan ✦ · .wg ⚑ | - | .wg only |
| yuzu | .wg only | .wg only | fennel.wg only | - |
✦ preferred when co-located on the home LAN · ⚑ plum falls back to .wg when it
roams · plum and vps are only ever reachable inbound via .wg (plum has no
stable LAN IP; vps has no LAN leg).
.wg in this matrix is resolved via each host's static /etc/hosts block
(mesh-hosts-render --install), not via dnsmasq — see DNS responsibilities
above. The dnsmasq .wg records are the phones-only path. So the matrix holds
only once the static block is installed on apricot, black, and plum.
Hub IP note
plum's live wg1.conf endpoint is 89.127.233.145:51820. An older
magic-civilization/scripts/lan/README.md also lists 93.95.231.174 for the
Iceland hub — treat that as stale/secondary unless confirmed against the hub's
own WireGuard config. mesh-hosts.json records only the live .145.
Smart routing daemon (fennel)
smart-lan-router/smart-lan-router.py runs as a root LaunchDaemon on the laptop.
The problem it solves: the wg config's AllowedIPs includes 10.0.0.0/24, so
the tunnel installs a route capturing the entire home LAN. While home, traffic
to home hosts hairpins through the Iceland hub (~350ms) instead of going out the
LAN interface (~5ms). (Measured: apricot 351ms via tunnel → 5.6ms via en0.)
What it does, each cycle:
- Detect location: read the default route's gateway + interface. It's HOME iff the gateway is lan.gateway and its ARP MAC == lan.gateway_mac (the home gateway's fingerprint). The MAC check is what distinguishes the real home LAN from a visited café network that also happens to use 10.0.0.0/24.
- Switch the subnet route: HOME → route 10.0.0.0/24 via the LAN interface (direct); AWAY → via the wg mesh interface (so home stays reachable through the tunnel). Re-asserted every cycle, because wg-quick re-adds the tunnel /24 on reconnect.
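The two-step cycle above reduces to a small pure decision, sketched here under stated assumptions. This is not the daemon's code: LanConfig, detect_location and desired_interface are hypothetical names, and the gateway address, MAC and interface names in the usage lines are made-up values. The MAC normalisation is included because macOS arp prints octets without leading zeros.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanConfig:
    gateway: str       # stands in for lan.gateway
    gateway_mac: str   # stands in for lan.gateway_mac


def normalize_mac(mac):
    """Lowercase and zero-pad each octet so '0:1b' and '00:1B' compare equal."""
    return ":".join(part.zfill(2) for part in mac.lower().split(":"))


def detect_location(default_gateway, gateway_arp_mac, cfg):
    """HOME only if both the gateway IP and its ARP MAC match the config."""
    if default_gateway != cfg.gateway or not gateway_arp_mac:
        return "AWAY"
    if normalize_mac(gateway_arp_mac) != normalize_mac(cfg.gateway_mac):
        return "AWAY"  # same IP, different hardware: someone else's network
    return "HOME"


def desired_interface(location, lan_iface, wg_iface):
    """Interface the home /24 route should point at for this location."""
    return lan_iface if location == "HOME" else wg_iface


# Illustrative values only.
cfg = LanConfig(gateway="10.0.0.1", gateway_mac="a4:5e:60:00:1b:2c")
location = detect_location("10.0.0.1", "a4:5e:60:0:1b:2c", cfg)
iface = desired_interface(location, "en0", "utun4")
```

Keeping the decision free of side effects means the route change itself can be re-asserted every cycle without re-deriving anything, and the decision can be checked without touching the routing table.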
Why a subnet route, not per-host /32 pins (the old design): a /32 -interface route on macOS creates a self-MAC ARP entry that blackholes the
host. A subnet route uses normal ARP, so every home host — at whatever DHCP
address it currently holds — just works. This is drift-immune (apricot moving
.116→.118 needs no config change) and free of the self-MAC bug. --status
prints location + current route.
It re-reads mesh-hosts.json each cycle; a bad read keeps last-good and never
tears down routing (KeepAlive root daemon over an autocommit-written repo).
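The keep-last-good behaviour can be sketched as a reader that only replaces its cached copy after a fully successful parse. The class name and the "top level must be an object" check are assumptions for illustration; the daemon's actual validation is not described here.

```python
import json


class MeshConfig:
    """Re-readable JSON config that never forgets its last good value."""

    def __init__(self, path):
        self.path = path
        self.last_good = None  # None until the first successful read

    def refresh(self):
        """Try to re-read the file; on any failure keep the previous value."""
        try:
            with open(self.path, encoding="utf-8") as fh:
                data = json.load(fh)
            if not isinstance(data, dict):
                raise ValueError("top level must be a JSON object")
            self.last_good = data
        except (OSError, ValueError):
            # Missing file, half-written file or malformed JSON
            # (json.JSONDecodeError is a ValueError): keep last_good.
            pass
        return self.last_good
```

A caller would invoke refresh() once per cycle and act on the returned value, so a commit that momentarily leaves the file truncated changes nothing about the installed route.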
Supersedes both the old per-host identity-probe pinner and the
wg-route-watchdog system daemon (which unconditionally forced 10.0.0.0/24
through the tunnel — the home branch is the new, smarter behavior; the away
branch preserves the watchdog's original purpose). The watchdog was retired
(/Library/LaunchDaemons/com.natalie.wg-route-watchdog.plist +
/usr/local/sbin/wg-route-watchdog.sh removed).
Fleet rename
Names follow fruit family = machine class (apricot=GPU, pear=CPU/storage,
yuzu=cloud, fennel=laptop), executed alias-first: mesh-hosts.json sets the
fruit name canonical with the old name in aliases[], and every renderer emits
both (pear.wg+black.wg, forge.pear.lan+forge.black.lan). Nothing
breaks on day one. Irreversible cutovers are separately gated: OS hostname
(hostnamectl/scutil — also fixes plum's stale plum.voyager.nasty.sh), the
Forgejo URL, black's NFS export host, ssh stanzas, and the reference sweep
(memory, CLAUDE.md, MCP ssh-by-name). Never retire an old name until every
consumer resolves the new one. apricot is unchanged.
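As a rough illustration of the alias-first rule, the sketch below expands one host entry into every name a renderer would emit, and lists the consumers that still block retiring an old name. The "name" field, the function names and the consumer mapping are hypothetical; only aliases[] is taken from the description above.

```python
def rendered_names(host, suffix, service=None):
    """Canonical name first, then one name per alias, all under the same suffix."""
    labels = [host["name"], *host.get("aliases", [])]
    prefix = f"{service}." if service else ""
    return [f"{prefix}{label}.{suffix}" for label in labels]


def retirement_blockers(new_name, consumers):
    """Consumers (name -> set of names they resolve) that cannot resolve new_name yet."""
    return sorted(c for c, names in consumers.items() if new_name not in names)


pear = {"name": "pear", "aliases": ["black"]}
wg_names = rendered_names(pear, "wg")
forge_names = rendered_names(pear, "lan", service="forge")
```

Here wg_names holds pear.wg and black.wg, and forge_names holds forge.pear.lan and forge.black.lan. An old name would only be dropped once retirement_blockers returns an empty list for its replacement.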
Migration
This repo replaces tooling scattered across four places:
| Was | Now | Status |
|---|---|---|
| session-tools/data/wg-mesh-hosts.json | data/mesh-hosts.json (expanded: .wg view, hosts[], mac, identity, fruit names) | ✅ here |
| session-tools/bin/wg-dns-sync | bin/wg-dns-sync (robust symlink path resolution) | ✅ here + fixed |
| magic-civilization/scripts/lan/subscribe-black-dns.sh | - (retired: *.local scheme is dead) | ✅ removed |
| setup-lan-dns.sh (not in ~/Code; drifted) | bin/mesh-hosts-render | ✅ replaced |
| bin/host-apply (per-device ssh view) | new | ✅ here |
| ~/bin/smart-lan-router.py (loose) | smart-lan-router/smart-lan-router.py (JSON-driven, self-heal) | ✅ here + fixed |
| ~/{install-smart-router.sh,com.lilith…plist} (loose) | smart-lan-router/ | ✅ here |
Pending — gated on greenlight (these touch live DNS on apricot):
1. Re-clone/pull this repo on apricot and run ./install.sh.
2. Run sudo wg-dns-sync on apricot from this repo; verify dnsmasq still serves (dig @10.9.0.2 quinn.apricot.lan, dig @10.9.0.2 apricot.wg).
3. Update the two session-tools consumers that call the old path by absolute reference, bin/apricot-doctor ("$repo/bin/wg-dns-sync") and bin/quinn-phone-bootstrap (ssh apricot 'cd …/session-tools && sudo bin/wg-dns-sync'), to the new repo path.
4. Run sudo mesh-hosts-render --install on apricot, black, and plum (every host that must resolve a peer's .wg name; dnsmasq only answers .wg for phones with DNS=10.9.0.2). Then on plum retire the old setup-lan-dns.sh static block and /etc/resolver/{apricot,black}.lan.
5. fennel (laptop): sudo smart-lan-router/install-smart-router.sh reinstalls the LaunchDaemon pointed at the repo path and retires the loose ~/bin/smart-lan-router.py, ~/install-smart-router.sh, ~/com.lilith.smart-lan-router.plist. Verify route -n get 10.0.0.11 → interface: en0 (not utun*).
6. Only after apricot is verified on the new path: remove the originals from session-tools and magic-civilization/scripts/lan, and push.
7. Fleet rename cutovers (each independently, after the above): ssh stanzas → OS hostname → Forgejo forge.pear.lan vhost → NFS export → reference sweep. See Fleet rename.
Do not delete the originals in the same change that adds this repo — every host still running the old path needs to re-install first.
Blocked right now: the laptop's LAN to pear/apricot is degraded by the very stale self-MAC ARP entry the daemon now self-heals (10.0.0.11 → fennel's own MAC, permanent). Clear it (sudo arp -d 10.0.0.11) and reinstall the daemon (step 5) to restore the LAN fast-path before attempting any remote cutover.
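The route check used to confirm the LAN fast-path (interface: en0, not utun*) could be scripted along these lines. This is a sketch with hypothetical function names; it assumes the macOS route -n get output contains an "interface:" line and treats any utun interface as the tunnel.

```python
import subprocess


def route_interface(route_output):
    """Extract the value of the 'interface:' line from route -n get output."""
    for line in route_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "interface":
            return value.strip()
    return None


def on_lan_fast_path(route_output):
    """True when the route leaves via a non-tunnel interface."""
    iface = route_interface(route_output)
    return iface is not None and not iface.startswith("utun")


def check_host(ip):
    """Run the real command (macOS only); not invoked on import."""
    result = subprocess.run(
        ["route", "-n", "get", ip], capture_output=True, text=True, check=False
    )
    return on_lan_fast_path(result.stdout)
```

On the laptop, check_host("10.0.0.11") returning True while at home would indicate the subnet route is pointing at the LAN interface rather than the tunnel.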