Commit graph

1259 commits

Author SHA1 Message Date
4820b7193f
left4me: add bw action verifying hardening drop-ins load on every apply
Post-daemon-reload self-test that asserts both
  /etc/systemd/system/left4me-{web,server@}.service.d/10-hardening.conf
appear in `systemctl show -p DropInPaths` for the unit. Catches drift
where the symlink lands but daemon-reload didn't take, or someone
manually unlinked the drop-in.

For the gameserver template we query `left4me-server@verify.service` —
systemd resolves drop-ins for a template instance against
`name@.service.d/` regardless of whether the instance has ever started.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 19:21:50 +02:00
d175c56e6c
left4me: hardening lives in drop-ins owned by left4me; deliver via symlink
Reactor stops emitting hardening directives in the unit bodies. The
HARDENING_COMMON / HARDENING_SERVER / HARDENING_WEB constants are gone.
Effective hardening on the live units now comes from drop-in files
shipped by left4me at:
  /etc/systemd/system/left4me-web.service.d/10-hardening.conf
  /etc/systemd/system/left4me-server@.service.d/10-hardening.conf
Both are target-side symlinks into /opt/left4me/src/deploy/files/...
(safe because /opt/left4me/src is root-owned post-relocation refactor).

Verified on ovh.left4me: systemctl show reports the same directives as
the pre-refactor baseline; relevant hardening test-plan checks pass.
2026-05-15 19:19:17 +02:00
b10c4d22fd
left4me: symlink /etc/sysctl.d/99-left4me.conf to the checkout
Sysctl drop-in lives in left4me/deploy/files/etc/sysctl.d/99-left4me.conf
(absorbed kernel.yama.ptrace_scope from the metadata entry). Deliver
via target-side symlink instead of a verbatim copy.

Canary for the deployment-responsibility reshape (left4me design doc
2026-05-15-deployment-responsibility-design.md, step 1). Validated
end-to-end on ovh.left4me: symlink resolves to the checkout,
sysctl --system fires on apply, kernel target value matches, idempotent.
One-shot cleanup of stale /etc/sysctl.d/99-left4me-ptrace.conf
(orphan from earlier apply; bundles/sysctl does not auto-purge unmanaged
files).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 19:10:23 +02:00
6fae2fd324
refactor(left4me): non-editable install + relocate runtime state to /var/lib/left4me
Two related changes landed together:

1. /opt/left4me/ becomes a root-owned deploy-artifact root. Only
   /opt/left4me/src lives there. Source tree is no longer mutated
   by pip at runtime; left4me only needs read access.
2. Runtime mutable state moved to /var/lib/left4me/: the venv (was
   /opt/left4me/.venv) and steamcmd (was /opt/left4me/steam).

The non-editable install copies the (root-owned) source to a
left4me-owned tempdir before `pip install --force-reinstall`.
setuptools.build_meta writes <pkg>.egg-info/ into the source dir
during get_requires_for_build_wheel, so a direct `pip install
/opt/left4me/src/l4d2*` fails on a root-owned source. The
temp-copy is what `python -m build` ought to do but doesn't (its
build isolation only sandboxes deps).

Prereq for the deployment-responsibility reshape: target-side
symlinks from /etc/... into /opt/left4me/src/deploy/files/... are
now safe by construction (left4me cannot rewrite its own hardening
profile).

Design + verification record: left4me/docs/superpowers/specs/
2026-05-15-runtime-state-relocation-design.md

Verified on ovh.left4me: bw apply idempotent on second pass (0
fixed, 0 failed); pip show reports site-packages location, no
Editable project location; web + gameserver units run clean;
alembic current returns head.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 17:56:08 +02:00
f3fe49c60e
fix(left4me): bind /var/lib/left4me/workshop_cache into server unit
Same class of leak as the .steam bind: workshop VPKs in overlays are
symlinks pointing to /var/lib/left4me/workshop_cache/<id>.vpk. With
TemporaryFileSystem=/var/lib in HARDENING_SERVER and workshop_cache
not in BindReadOnlyPaths, the targets are invisible inside the unit's
mount namespace. Source silently fails to load the addons — no log
message, the addon just doesn't appear in-game (saw the ions vocalizer
workshop VPK dangling on server@2).

Add workshop_cache to the bind list. Read-only is fine; srcds reads
the VPKs, doesn't write them (web app populates the cache as left4me).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 17:11:17 +02:00
9a4e184378
left4me: drop +sv_lan 0 from srcds ExecStart
The original LAN-mode rejection turned out to be a downstream cascade
from Steam SDK init failure (.steam dir invisible + cpu.cpp assert
on hidden /proc/cpuinfo) — fixed in subsequent commits by binding
/var/lib/left4me/.steam + /opt/left4me/steam back through
TemporaryFileSystem and dropping ProcSubset=pid. With Steam master-
server registration now succeeding ("Connection to Steam servers
successful. VAC secure mode is activated."), the server defaults to
internet mode (sv_lan=0) on its own.

Drop the explicit override. If LAN-mode rejection ever recurs the
fix is to identify the upstream Steam-registration issue, not to
re-set this cvar.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:56:51 +02:00
4339289bad
fix(left4me): drop ProcSubset=pid from server unit too
Same pattern as the web-unit fix (commit b3f...): ProcSubset=pid hides
/proc/cpuinfo and /proc/sys/*. Source's tier0/cpu.cpp asserts on
cpuinfo read failure; SteamAPI_Init then fails with "create pipe
failed" as a downstream cascade, and srcds registers as LAN (rejecting
external clients with "LAN servers are restricted to local clients").

PrivatePIDs=true (private PID namespace) remains the load-bearing
peer-process isolation: no foreign PIDs visible to srcds in its own
namespace. ProtectProc=invisible is the foreign-uid /proc hide.
ProcSubset=pid was a defense-in-depth layer hiding kernel-introspection
files (cpuinfo, meminfo, sysctls); losing it only exposes host kernel
info, which is not sensitive in this threat model and is the same
information any user on the host already sees.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:44:22 +02:00
caf2332051
fix(left4me): bind /var/lib/left4me/.steam + /opt/left4me/steam into server unit
Server@.service has TemporaryFileSystem virtualizing /var/lib and /opt;
the .steam home dir (which holds symlinks to /opt/left4me/steam/linux{32,64})
wasn't bound back into the unit's filesystem view. srcds dlopen's
~/.steam/sdk32/steamclient.so for Steam master-server registration —
under the unit it returned ENOENT, SteamAPI_Init failed, and the server
fell back to LAN-only mode regardless of +sv_lan 0. Clients then got
"LAN servers are restricted to local clients (class C)" on connect.

Bind both /var/lib/left4me/.steam (the symlinks) and /opt/left4me/steam
(the symlink targets) read-only into the unit. The Steam SDK file is
written by steamcmd as part of the install flow, so RO is fine — srcds
doesn't write back.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:42:17 +02:00
6bba2b04f7
fix(left4me): force +sv_lan 0 alongside +ip 0.0.0.0
With +ip 0.0.0.0 (added in previous commit to make TCP RCON reachable
via loopback), Source engine can't auto-determine whether the server
is public-facing and defaults to LAN mode (sv_lan=1). Clients
connecting from public IPs get rejected with "LAN servers are
restricted to local clients (class C)".

Force sv_lan=0 explicitly so public clients can connect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:35:47 +02:00
f5bce30a4a
fix(left4me): srcds binds RCON to all interfaces + close external TCP
Two changes to fix web's "No data / Connection refused" symptoms:

1. ExecStart adds +ip 0.0.0.0 so srcds_linux binds both UDP and TCP
   to all interfaces (incl. loopback). Default Source auto-IP picks
   the primary host IP (141.95.32.8 here), so TCP RCON was only on
   the public IP; web's 127.0.0.1 connect got ECONNREFUSED.

2. nftables/input drops the TCP accept on the game port range. UDP
   stays open for players. Loopback (127.0.0.1) bypasses the input
   chain via the existing 'iifname lo accept' rule in
   bundles/nftables/files/nftables.conf, so web→RCON via loopback
   still works.

Net effect: web's live-state polling + console RCON work via
loopback; RCON is no longer reachable from the public internet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:30:00 +02:00
656be1cf69
fix(left4me): move ProcSubset=pid from COMMON to SERVER-only
Discovered post-deploy: ProcSubset=pid hides /proc/sys/kernel/random/boot_id
which journalctl reads at startup. The web app invokes
`sudo -n left4me-journalctl` to stream live server logs into the UI;
journalctl bails with "Failed to get boot ID" before producing any
output. Web log streaming was silently broken.

Server unit keeps ProcSubset=pid (srcds doesn't invoke journalctl);
web unit drops it. ProtectProc=invisible remains in COMMON — that's
the load-bearing D4 defense (foreign-uid /proc hidden).

Reproducer that confirms the diagnosis:
  systemd-run --pipe --uid=left4me --gid=left4me \
    -p ProcSubset=pid -p ProtectProc=invisible \
    -p ProtectSystem=strict -p PrivateTmp=true \
    [...rest of web hardening...] \
    -- sh -c 'sudo -n left4me-journalctl 2 --lines 3 --follow >/var/lib/left4me/tmp/out 2>&1'
  # cat /var/lib/left4me/tmp/out → "Failed to get boot ID: No such file or directory"
  # rc → 1
With ProcSubset=all: timeout 124 (helper running), 3 lines streamed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:14:33 +02:00
3ce1ee486e
bundles/left4me: drop l4d2-sandbox user; tighten /var/lib/left4me to 0755
Companion to the uid-collapse refactor on the left4me side
(docs/superpowers/plans/2026-05-15-uid-collapse.md). The script-
sandbox now runs as left4me too, defended by the hardening profile
that landed earlier today rather than a kernel uid boundary.

users + groups dicts: remove the l4d2-sandbox entry (uid/gid 981).
/var/lib/left4me mode: 0711 → 0755. The 0711 was specifically a
traverse-only loosening for the sandbox uid; with one user, the
natural mode is back.
2026-05-15 15:51:21 +02:00
130b0b1c9c
bundles/left4me: ship kernel.yama.ptrace_scope=2 sysctl drop-in
Belt-and-braces with the gameserver unit's SystemCallFilter=~@debug +
PrivateUsers=true. Currently applied by hand on left4.me (left over
from the hardening test plan's Test 9); landing in the bundle so it
survives bw apply and is reproducible on any future host.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:51:26 +02:00
c6721e7545
bundles/left4me: spread HARDENING_WEB into left4me-web.service
Adds the sudo-compatible hardening subset to the web unit. Tightens
ProtectSystem=full → strict. NoNewPrivileges, PrivateUsers,
RestrictSUIDSGID, empty CapabilityBoundingSet, and ~@privileged in the
syscall filter intentionally absent (sudo-incompatible until a future
refactor replaces the helper sudo with systemctl-managed transient
units).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:49:10 +02:00
640461c87a
bundles/left4me: spread HARDENING_SERVER into left4me-server@.service
Replaces the inline hardening directives on the gameserver unit with
the shared HARDENING_SERVER dict. Removes legacy ReadOnlyPaths /
ReadWritePaths (superseded by TemporaryFileSystem + BindReadOnlyPaths
+ BindPaths in the dict). Brings the unit to the proven Test 7
composition with the i386 amendment (SystemCallArchitectures=native x86)
and PrivatePIDs=true.

Not deployed until bw apply.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:46:58 +02:00
85b9af0aaa
bundles/left4me: add HARDENING_{COMMON,SERVER,WEB} constants
Three shared dicts capturing the proven hardening composition from the
left4me hardening test plan (left4me commit 461b8d0). HARDENING_COMMON
is the directive set both managed units take verbatim; HARDENING_SERVER
adds the sudo-incompatible flags + filesystem virtualization + i386
amendment + PrivatePIDs + SocketBindAllow; HARDENING_WEB adds the
sudo-compatible syscall filter.

Not yet spread into the unit emission — that's the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:42:26 +02:00
91b7265136
left4me: install_left4me_scripts reads from scripts/{libexec,sbin}/
left4me moved its privileged helpers out of deploy/files/usr/local/
into top-level scripts/{libexec,sbin}/ (left4me commit 5284e28).
deploy/ is now a reference exemplar, not source-of-truth; the helpers
are application-inherent code that lives where the rest of the
application does.

Repoint install_left4me_scripts at the new source paths under
/opt/left4me/src/scripts/{libexec,sbin}/. Install targets unchanged
(/usr/local/{libexec,sbin}/ on the host), so the apply is a no-op
diff on the deployed file content — only the source-of-install path
moves. Verified green by `bw apply ovh.left4me` against the test
server.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 12:08:36 +02:00
3ccaa919ee
left4me: install privileged scripts from git_deploy artifact
Replaces five verbatim file copies (four libexec helpers + the sbin
admin CLI) with a single triggered install action that copies them
from /opt/left4me/src/deploy/files/usr/local/{libexec,sbin}/ after
git_deploy. left4me's repo is now the only source of truth for the
script content; editing the helper here required a separate commit-
and-sync step that masked yesterday's idmap helper update on the
deployed server.

The new action ships everything under deploy/files/usr/local/libexec/
left4me/ to /usr/local/libexec/left4me/, which incidentally includes
the dead-code left4me-apply-cake. Harmless (nothing references it);
the cake migration cleanup can sweep it later.
2026-05-15 00:46:31 +02:00
9fbd84c3b5
left4me: tighten host.env to 0640 root:left4me
Both env files now follow the same pattern: root owns the config so the
service user can't overwrite its own config, group=left4me so the
sudo -u left4me alembic + seed-overlays actions can source the file
(they failed with 'permission denied' when group=root and mode=0640).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 22:57:21 +02:00
1039e23671
left4me: prefix steam_web_api_key vault value with !decrypt:
Without !decrypt: the encrypt$… string is rendered as a literal into
web.env, which then surfaces as 403 Forbidden from the Steam Web API
(because the URL key parameter contains "encrypt$gAAA…" instead of the
actual API key). Matches the existing pattern used by every other
encrypted secret in this repo.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 22:57:21 +02:00
1445aaff0a
left4me: wire STEAM_WEB_API_KEY through to web.env
Adds the metadata key default (None — node must override) and pipes it
into web.env.mako so the live-state poller can resolve Steam IDs to
persona names + avatars via GetPlayerSummaries.

ovh.left4me gets the actual key as an encrypted vault secret.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 22:42:51 +02:00
7ad9bbcec3
left4me: schedule daily workshop-refresh via systemd-timers
Adds a left4me-workshop-refresh entry to the systemd-timers bundle,
firing nightly at 04:00 and invoking the new flask workshop-refresh
CLI that enqueues a refresh_workshop_items job. Owner of the job is
NULL (system-enqueued). The bw worker picks it up under existing
scheduler rules; idempotent against an already-queued/running refresh.

Also extends bundles/systemd-timers to accept an optional
environment_files key so the new unit can pull DATABASE_URL etc.
from /etc/left4me/{host,web}.env.
2026-05-12 10:29:45 +02:00
508111eb39
AGENTS.md: drop the 6th ccc rule
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 00:26:12 +02:00
c6caf2a1cf
left4me: per-node system_cpus set; pin HT siblings on ovh.left4me
Replaces bundle-default system_core_count int with a per-node set of
CPU ids; reactor takes set complement for game cores. ovh.left4me sets
{0, 4} to keep both HT siblings of physical core 0 in system.slice
so games don't share L1/L2 with system work. systemd_units reactor
return inlined.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 00:20:28 +02:00
1b3f3ecf97
left4me: per-slice AllowedCPUs= driven by system_core_count
First N cores pin system/user/build (inline on owned slices, drop-ins
on upstream system.slice and user.slice via the systemd/units
'<parent>.d/<basename>.conf' convention). Remainder pins
l4d2-game.slice. Reactor raises on hosts with <2 threads or
system_core_count that leaves no cores for games.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 00:04:35 +02:00
1d30830824
left4me: install steamcmd + drop importability gate on pip_install
Two changes from the same debug session, both prerequisites for
`l4d2ctl install` to work end-to-end on a fresh node:

1) Install steamcmd via tarball under /opt/left4me/steam.
   - dpkg --add-architecture i386 + libc6:i386 + lib32z1 (32-bit deps;
     bw pkg_apt translates _ to : at install time, hence libc6_i386)
   - curl|tar one-shot, guarded by `test -x steamcmd.sh`
   - LEFT4ME_STEAMCMD in host.env so l4d2host invokes by absolute path
     (mirrors the old bundles/left4dead2/files/setup approach; avoids
     the dirname-$0 trap that bites when steamcmd is reached via a
     PATH symlink)

2) Drop the `unless` on left4me_pip_install. The gate checked
   importability of l4d2host/l4d2web, which is too weak a proxy for
   install state: adding [project.scripts] to pyproject.toml later
   wouldn't be picked up if the package was already importable from a
   prior `pip install -e`. Cost is ~2s/apply for a no-op pip
   resolution — not enough to keep the gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 22:46:45 +02:00
524ad6e89b
nginx: SSE-friendly proxy_pass + unconditional $connection_upgrade map
Two coupled changes that let every proxy_pass vhost serve both WS and
SSE without per-vhost flags or template conditionals:

1) nginx.conf: $connection_upgrade map is now always defined (drop
   the % if has_websockets: gate), and the '' branch returns "" instead
   of "close". With "" + proxy_http_version 1.1, nginx maintains
   keep-alive to upstream for non-WS clients — which is what SSE
   requires. WS clients still get Connection: upgrade as before.

2) data/nginx/proxy_pass.conf: drop the % if websockets: conditional.
   Always set proxy_http_version 1.1 + Upgrade + Connection via the
   map, plus proxy_buffering off and proxy_read_timeout 1h for SSE.

Effects on existing vhosts:
- home.server's Proxmox WS vhost: unchanged behavior (the WS branch
  was already setting these headers). Gains the ability to also
  serve SSE if ever needed.
- All other proxy_pass vhosts (Nextcloud, Freescout, YOURLS, Gitea,
  etc.): get keep-alive to upstream (minor latency win) and unbuffered
  pass-through (slight throughput cost on huge responses, neutral
  for typical web app traffic).

Dead but harmless: bundles/nginx/metadata.py still defaults
nginx/has_websockets to False, and proxmox-ve/grafana still set it
to True. The flag is now a no-op; clean up in a separate pass.
2026-05-10 22:12:03 +02:00
99d68a5135
AGENTS.md: soften 6th rule — ccc is an option, not a mandate
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 21:36:59 +02:00
852a65a6f6
AGENTS.md: 6th rule — try ccc search before grep for concept queries
The repo is indexed with cocoindex-code; semantic search beats grep for
"where is X / which bundle does Y" questions where you don't know the
exact identifier. Without `--path '**'` ccc scopes to the current
working directory, which is rarely what you want when navigating
ckn-bw — call it out so agents don't get confusing empty results.
2026-05-10 21:32:08 +02:00
09d236ded5
left4me: trigger alembic_upgrade from git_deploy (catch migrations on code updates)
pip_install's `unless` (import l4d2host, l4d2web) skips when both
packages are already installed — so on a code-only apply, pip_install
doesn't fire and alembic_upgrade (which it triggers) never runs.
The new 0008 migration would silently get skipped, leaving the DB
out of sync with the new schema.

Wire git_deploy → alembic_upgrade directly. alembic upgrade head is
idempotent (no-op when at head); seed_overlays + service:restart
cascade off alembic, so editable-install code changes also get picked
up by gunicorn.

Edge case noted (deferred): a migration-only change with no code
change has the same matching git rev, so this won't fire either. In
practice migrations always come with the code change that uses them.
2026-05-10 21:27:40 +02:00
7265c4aab1
letsencrypt: depend on bind9-dnsutils (dnsutils is a trixie transitional)
On Debian 13 trixie `dnsutils` is a transitional package replaced by
`bind9-dnsutils`. Apt installs bind9-dnsutils when you ask for dnsutils,
but `dpkg -s dnsutils` returns 1 because no real package by that name
exists — bw's pkg_apt status check then flags the item as failed every
apply. Switching the dependency to the real package name resolves the
loop.

The bundle just needs `nsupdate` (provided by bind9-dnsutils) for the
DNS-01 challenge hook.
2026-05-10 21:03:16 +02:00
b5662f7ea7
left4me: explicit source for /usr/local/sbin/left4me (basename collides) 2026-05-10 21:01:18 +02:00
b8648cb53f
left4me: ship a /usr/local/sbin/left4me wrapper for the flask CLI
One-liner instead of "ssh + heredoc + sudo + sh -c + double quotes":
  sudo left4me create-user alice --admin
  sudo left4me seed-script-overlays /opt/left4me/src/examples/script-overlays
  sudo left4me routes

The wrapper sources host.env + web.env, drops to the left4me user,
sets JOB_WORKER_ENABLED=false (admin-side ops shouldn't race the
worker) and PYTHONPATH=/opt/left4me/src, then exec's the flask CLI
with whatever args followed `left4me`. No env-var enumeration: the
sh -c trailing 'sh "$@"' forwards positional args without quoting
hell. README updated to drop the verbose recipe.
2026-05-10 21:00:16 +02:00
6f2073847d
nginx/README: how port 80 is served + vm/cores requirement
Two things from the left4me-integration session worth pinning:

- 80.conf was orphaned in sites/ (not sites-enabled/) for an
  unknown amount of time. Commit d49259f moved it; document the
  resulting wiring so it's not re-broken accidentally.
- items.py reads node.metadata.get('vm/cores') with no default
  for worker_processes; bare-metal nodes outside the vm group
  raise at item-build time. Cost the agent ~10 min when
  ovh.left4me first opted into webserver.

Also note the cross-namespace read on letsencrypt/domains.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:47:47 +02:00
6cc823613a
bind/README: applying changes needs both master and slave nodes
The slave's bw-managed zone files are rendered from the master's
metadata at slave-apply time. Changing a record on the master only
publishes once both bw apply runs are done. The left4me-integration
session burned ~20 minutes assuming bw apply on htz.mails would
propagate to ovh.secondary via bind's own AXFR; it doesn't, because
bw verify measures the on-disk file, not the running zone.

Frame as the workflow rule rather than the absolute "not AXFR"
claim — the bundle does set type slave; in named.conf.local, but
that's orthogonal to the practical apply-both rule.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:46:00 +02:00
05abe52221
letsencrypt/README: first-apply, DNS-01 prereqs, negative-cache
Reshapes the existing scratchpad README into operational sections.
Captures three things that took the left4me-integration session
~30 minutes to figure out:

- After bw apply, nginx serves a self-signed cert until the daily
  systemd timer fires; the dehydrated --cron one-liner shortcuts
  the wait.
- DNS-01 needs all NS servers (primary AND secondary) to serve the
  _acme-challenge CNAME, the acme node reachable, and TSIG-key
  reachability via wireguard for off-LAN clients.
- LE's negative-cache + rate-limit combo: stop retrying for ~15
  min after fixing DNS, then make at most one attempt.

Existing nsupdate sample preserved at the bottom.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:43:52 +02:00
7a579f27c5
agents/bundles: file: source defaults to destination basename
Caught during the left4me-integration nginx 80.conf move: the
agent declared a redundant 'source': '80.conf' on a file: item
whose destination already ended in 80.conf. The maintainer
flagged it as noise. Document the rule: only declare source
when the basename differs from the destination (e.g. .mako
template to a non-suffixed destination).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:40:42 +02:00
0e88c4967e
docs/specs: round-2 agents-md refactor design (gaps 7-12)
Continuation of round 1. Five commits: two new bundles/AGENTS.md
Pitfalls (file: source basename, git_deploy gotchas) and three
bundle READMEs (letsencrypt operational, bind apply-both, nginx
new file). Diverges from the handoff on placement: gaps 7-9 go
in bundles/AGENTS.md not items/AGENTS.md, since items/AGENTS.md
is scoped to custom item types only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:39:40 +02:00
69bcac421a
agents/bundles: triggers/triggered:True invariant + self-healing
Two related lessons from the left4me integration:

1. The triggers/triggered:True invariant tripped three times in
   one session. When chown_src was promoted from triggered-only
   to self-healing-every-apply (drop triggered:True + add unless),
   bw rejected because it was still in git_deploy's triggers
   list. Same dance happened for pip_install.

2. Triggered actions can't recover from partial failure: once
   upstream succeeds, it's "in desired state" forever and the
   trigger never re-fires. For pip installs / chowns / migrations
   that must heal on every apply, the right shape is no
   triggered:True + unless:<fast-check>. unless semantics fold
   into the same bullet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:29:10 +02:00
59788f315a
agents/bundles: reactors must read metadata or be defaults
The left4me bundle's first cut had two reactors that returned
static dicts without calling metadata.get(...): systemd_services
(enable/run flags) and nftables_output (two static rule strings).
Both passed bw test (no consumer yet). Once attached to
ovh.left4me, bw raised "did not request any metadata, you might
want to use defaults instead". Fix was to fold both into defaults.

Document the pitfall, with the verbatim error wording and the
note that this applies to cross-namespace contributions too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:28:31 +02:00
d3068ba8f6
agents: nodes carry only node-specific metadata
When the left4me bundle was first integrated, ovh.left4me's node
file carried ~40 lines of left4me-related metadata (git_url,
secret_key, full nginx vhost, monitoring, backups, nftables
rules). The maintainer pushed back: per-node metadata should be
only what genuinely varies per host. Refactor brought it down to
{'domain': 'left4.me'} with everything else in bundle defaults
or in a reactor deriving from the domain.

Add the rule to bundles/AGENTS.md from the bundle-author angle
(use defaults / vault-keyed-on-node for secrets, cite left4me
and postgresql for the established pattern). Add the reviewer's
form to nodes/AGENTS.md Pitfalls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:27:52 +02:00
b5e72a3ac3
agents: bundle validation needs a node attached
bw test (no args) is a parsing gate, not a behaviour gate. A
bundle's reactors only resolve when some node's metadata is
built, so reactor bugs stay dormant until a node opts in. The
left4me-integration session shipped 8 commits that all "passed
bw test" with latent reactor-rejection bugs that surfaced only
once the bundle was attached to ovh.left4me.

Rewrites the verify-list in bundles/AGENTS.md to require attach-
first and uses richer command invocations (bw items --blame,
bw metadata -k <key>). Adds a Bundle-validation workflow section
to commands.md spelling out why step 2 is non-optional.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:27:13 +02:00
0a9f3dae88
agents/commands: read-only command cheat sheet
Adds a flag-combinations table for bw test (selectors + the
HIJKMSp/IJKMp default-flag split), bw metadata -k/-b/-f (with
the -f sensitive-data warning), bw items --blame/-f, bw verify
-o bundle:, bw hash -m/-d. Also documents the shared target-
selector grammar.

Surfaced by the left4me-integration session, where the agent
relied on bare bw test / bw metadata / bw items invocations and
missed leverage from the available flags.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:26:27 +02:00
422a275d97
agents: drop bw bundles, add bw verify to read-only allowlist
bw bundles is not a subcommand of the installed fork (the actual
list is apply/debug/diff/groups/hash/ipmi/items/lock/metadata/
nodes/plot/pw/repo/run/stats/test/verify/zen). bw verify is
read-only and was missing from the list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:25:43 +02:00
3ed0264be6
docs/specs: round-1 agents-md refactor design (gaps 1-6)
Captures the brainstorm + per-commit wording for the first six
gaps from the left4me-integration handoff, plus a side-quest
read-only command cheat sheet for docs/agents/commands.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:24:03 +02:00
d49259ff07
nginx: move 80.conf to sites-available so it's actually included
The bundle was shipping 80.conf (HTTP-to-HTTPS redirect + acme-challenge
alias) to /etc/nginx/sites/80.conf, but nginx.conf only `include`s
/etc/nginx/sites-enabled/* (which is a symlink to sites-available).
The file was orphaned — no node had a working port-80 listener.

Move the destination to /etc/nginx/sites-available/80.conf so the
existing sites-enabled symlink picks it up. The /etc/nginx purge will
clean up any stale /etc/nginx/sites/80.conf on existing hosts.
2026-05-10 19:59:17 +02:00
ed141a9300
left4me: drop chown_src from git_deploy triggers (self-healing now)
Same constraint pattern: items in a triggers list must be
triggered:True. chown_src dropped triggered:True in the prior commit
to become self-healing every-apply, so it can't stay in git_deploy's
triggers list. Now git_deploy has no triggers at all — chown_src and
pip_install both run every apply, gated by their own `unless` guards.
2026-05-10 18:58:30 +02:00
9d17c69b22
left4me: make chown_src self-healing too
Same problem as pip_install: chown_src was triggered:True and only
fired when git_deploy did. After a partial first-apply where git_deploy
succeeded (extracting root-owned files) but the chown didn't happen
yet, subsequent applies left files root-owned forever — pip_install
fails with "permission denied" trying to write .egg-info/.

Drop triggered:True. Add an unless guard:
  test -z "$(find /opt/left4me/src ! -user left4me -print -quit)"
i.e. skip the chown only when no non-left4me-owned file exists in the
tree.
2026-05-10 18:57:50 +02:00
5bf95cb065
left4me: drop pip_install from pip_upgrade triggers (pip_install now always-runs) 2026-05-10 18:56:30 +02:00
cac04a456b
left4me: make pip_install self-healing on every apply
The previous shape (`triggered: True`, in git_deploy's triggers list)
meant pip_install only ran when something upstream fired. After a
partial first-apply failure (where git_deploy succeeded but pip_install
failed for an unrelated reason), subsequent applies couldn't recover —
git_deploy was already in desired state, nothing fired pip_install.

Drop `triggered: True`. Drop pip_install from git_deploy's triggers
(bw enforces a triggers→triggered:True invariant). Add `unless`:
sudo -u left4me /opt/left4me/.venv/bin/python -c "import l4d2host, l4d2web"
to short-circuit when the venv is already correct. Editable installs
pick up code changes automatically — no need to re-pip on every git
update.

For dep changes (rare), nudge manually:
  bw run ovh.left4me 'sudo -u left4me /opt/left4me/.venv/bin/pip install -e /opt/left4me/src/l4d2host -e /opt/left4me/src/l4d2web'
2026-05-10 18:55:24 +02:00