Compare commits

57 commits

Author SHA1 Message Date
c6caf2a1cf
left4me: per-node system_cpus set; pin HT siblings on ovh.left4me
Replaces bundle-default system_core_count int with a per-node set of
CPU ids; reactor takes set complement for game cores. ovh.left4me sets
{0, 4} to keep both HT siblings of physical core 0 in system.slice
so games don't share L1/L2 with system work. systemd_units reactor
return inlined.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 00:20:28 +02:00
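The set-complement split described in this commit can be sketched roughly as follows. This is an illustrative sketch, not the bundle's reactor: the function name `split_cpus` and the returned dict shape are assumptions, and the thread count of 8 is taken from the `vm/threads=8` pin mentioned elsewhere in this log.

```python
def split_cpus(total_threads, system_cpus):
    """Return AllowedCPUs= strings for the system side and the game side.

    Hypothetical helper: game cores are the complement of the per-node
    system_cpus set within 0..total_threads-1.
    """
    all_cpus = set(range(total_threads))
    if not system_cpus <= all_cpus:
        raise ValueError(f"system_cpus {sorted(system_cpus)} outside 0..{total_threads - 1}")
    game_cpus = all_cpus - system_cpus
    if not game_cpus:
        raise ValueError("system_cpus leaves no cores for games")

    def fmt(cpus):
        # AllowedCPUs= accepts a comma-separated list of CPU indices.
        return ",".join(str(c) for c in sorted(cpus))

    return {"system.slice": fmt(system_cpus), "l4d2-game.slice": fmt(game_cpus)}


# ovh.left4me: both HT siblings of physical core 0 stay with system work.
slices = split_cpus(8, {0, 4})
print(slices)
```

Passing a set rather than a count is what allows non-contiguous ids such as `{0, 4}`; a plain "first N cores" integer cannot express sibling pairs.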
1b3f3ecf97
left4me: per-slice AllowedCPUs= driven by system_core_count
First N cores pin system/user/build (inline on owned slices, drop-ins
on upstream system.slice and user.slice via the systemd/units
'<parent>.d/<basename>.conf' convention). Remainder pins
l4d2-game.slice. Reactor raises on hosts with <2 threads or
system_core_count that leaves no cores for games.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 00:04:35 +02:00
1d30830824
left4me: install steamcmd + drop importability gate on pip_install
Two changes from the same debug session, both prerequisites for
`l4d2ctl install` to work end-to-end on a fresh node:

1) Install steamcmd via tarball under /opt/left4me/steam.
   - dpkg --add-architecture i386 + libc6:i386 + lib32z1 (32-bit deps;
     bw pkg_apt translates _ to : at install time, hence libc6_i386)
   - curl|tar one-shot, guarded by `test -x steamcmd.sh`
   - LEFT4ME_STEAMCMD in host.env so l4d2host invokes by absolute path
     (mirrors the old bundles/left4dead2/files/setup approach; avoids
     the dirname-$0 trap that bites when steamcmd is reached via a
     PATH symlink)

2) Drop the `unless` on left4me_pip_install. The gate checked
   importability of l4d2host/l4d2web, which is too weak a proxy for
   install state: adding [project.scripts] to pyproject.toml later
   wouldn't be picked up if the package was already importable from a
   prior `pip install -e`. Cost is ~2s/apply for a no-op pip
   resolution — not enough to keep the gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 22:46:45 +02:00
524ad6e89b
nginx: SSE-friendly proxy_pass + unconditional $connection_upgrade map
Two coupled changes that let every proxy_pass vhost serve both WS and
SSE without per-vhost flags or template conditionals:

1) nginx.conf: $connection_upgrade map is now always defined (drop
   the % if has_websockets: gate), and the '' branch returns "" instead
   of "close". With "" + proxy_http_version 1.1, nginx maintains
   keep-alive to upstream for non-WS clients — which is what SSE
   requires. WS clients still get Connection: upgrade as before.

2) data/nginx/proxy_pass.conf: drop the % if websockets: conditional.
   Always set proxy_http_version 1.1 + Upgrade + Connection via the
   map, plus proxy_buffering off and proxy_read_timeout 1h for SSE.

Effects on existing vhosts:
- home.server's Proxmox WS vhost: unchanged behavior (the WS branch
  was already setting these headers). Gains the ability to also
  serve SSE if ever needed.
- All other proxy_pass vhosts (Nextcloud, Freescout, YOURLS, Gitea,
  etc.): get keep-alive to upstream (minor latency win) and unbuffered
  pass-through (slight throughput cost on huge responses, neutral
  for typical web app traffic).

Dead but harmless: bundles/nginx/metadata.py still defaults
nginx/has_websockets to False, and proxmox-ve/grafana still set it
to True. The flag is now a no-op; clean up in a separate pass.
2026-05-10 22:12:03 +02:00
99d68a5135
AGENTS.md: soften 6th rule — ccc is an option, not a mandate
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 21:36:59 +02:00
852a65a6f6
AGENTS.md: 6th rule — try ccc search before grep for concept queries
The repo is indexed with cocoindex-code; semantic search beats grep for
"where is X / which bundle does Y" questions where you don't know the
exact identifier. Without `--path '**'` ccc scopes to the current
working directory, which is rarely what you want when navigating
ckn-bw — call it out so agents don't get confusing empty results.
2026-05-10 21:32:08 +02:00
09d236ded5
left4me: trigger alembic_upgrade from git_deploy (catch migrations on code updates)
pip_install's `unless` (import l4d2host, l4d2web) skips when both
packages are already installed — so on a code-only apply, pip_install
doesn't fire and alembic_upgrade (which it triggers) never runs.
The new 0008 migration would silently get skipped, leaving the DB
out of sync with the new schema.

Wire git_deploy → alembic_upgrade directly. alembic upgrade head is
idempotent (no-op when at head); seed_overlays + service:restart
cascade off alembic, so editable-install code changes also get picked
up by gunicorn.

Edge case noted (deferred): a migration-only change with no code
change has the same matching git rev, so this won't fire either. In
practice migrations always come with the code change that uses them.
2026-05-10 21:27:40 +02:00
7265c4aab1
letsencrypt: depend on bind9-dnsutils (dnsutils is a trixie transitional)
On Debian 13 trixie `dnsutils` is a transitional package replaced by
`bind9-dnsutils`. Apt installs bind9-dnsutils when you ask for dnsutils,
but `dpkg -s dnsutils` returns 1 because no real package by that name
exists — bw's pkg_apt status check then flags the item as failed every
apply. Switching the dependency to the real package name resolves the
loop.

The bundle just needs `nsupdate` (provided by bind9-dnsutils) for the
DNS-01 challenge hook.
2026-05-10 21:03:16 +02:00
b5662f7ea7
left4me: explicit source for /usr/local/sbin/left4me (basename collides)
2026-05-10 21:01:18 +02:00
b8648cb53f
left4me: ship a /usr/local/sbin/left4me wrapper for the flask CLI
One-liner instead of "ssh + heredoc + sudo + sh -c + double quotes":
  sudo left4me create-user alice --admin
  sudo left4me seed-script-overlays /opt/left4me/src/examples/script-overlays
  sudo left4me routes

The wrapper sources host.env + web.env, drops to the left4me user,
sets JOB_WORKER_ENABLED=false (admin-side ops shouldn't race the
worker) and PYTHONPATH=/opt/left4me/src, then exec's the flask CLI
with whatever args followed `left4me`. No env-var enumeration: the
sh -c trailing 'sh "$@"' forwards positional args without quoting
hell. README updated to drop the verbose recipe.
2026-05-10 21:00:16 +02:00
6f2073847d
nginx/README: how port 80 is served + vm/cores requirement
Two things from the left4me-integration session worth pinning:

- 80.conf was orphaned in sites/ (not sites-enabled/) for an
  unknown amount of time. Commit d49259f moved it; document the
  resulting wiring so it's not re-broken accidentally.
- items.py reads node.metadata.get('vm/cores') with no default
  for worker_processes; bare-metal nodes outside the vm group
  raise at item-build time. Cost the agent ~10 min when
  ovh.left4me first opted into webserver.

Also note the cross-namespace read on letsencrypt/domains.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:47:47 +02:00
6cc823613a
bind/README: applying changes needs both master and slave nodes
The slave's bw-managed zone files are rendered from the master's
metadata at slave-apply time. Changing a record on the master only
publishes once both bw apply runs are done. The left4me-integration
session burned ~20 minutes assuming bw apply on htz.mails would
propagate to ovh.secondary via bind's own AXFR; it doesn't, because
bw verify measures the on-disk file, not the running zone.

Frame as the workflow rule rather than the absolute "not AXFR"
claim — the bundle does set type slave; in named.conf.local, but
that's orthogonal to the practical apply-both rule.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:46:00 +02:00
05abe52221
letsencrypt/README: first-apply, DNS-01 prereqs, negative-cache
Reshapes the existing scratchpad README into operational sections.
Captures three things that took the left4me-integration session
~30 minutes to figure out:

- After bw apply, nginx serves a self-signed cert until the daily
  systemd timer fires; the dehydrated --cron one-liner shortcuts
  the wait.
- DNS-01 needs all NS servers (primary AND secondary) to serve the
  _acme-challenge CNAME, the acme node reachable, and TSIG-key
  reachability via wireguard for off-LAN clients.
- LE's negative-cache + rate-limit combo: stop retrying for ~15
  min after fixing DNS, then make at most one attempt.

Existing nsupdate sample preserved at the bottom.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:43:52 +02:00
7a579f27c5
agents/bundles: file: source defaults to destination basename
Caught during the left4me-integration nginx 80.conf move: the
agent declared a redundant 'source': '80.conf' on a file: item
whose destination already ended in 80.conf. The maintainer
flagged it as noise. Document the rule: only declare source
when the basename differs from the destination (e.g. .mako
template to a non-suffixed destination).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:40:42 +02:00
0e88c4967e
docs/specs: round-2 agents-md refactor design (gaps 7-12)
Continuation of round 1. Five commits: two new bundles/AGENTS.md
Pitfalls (file: source basename, git_deploy gotchas) and three
bundle READMEs (letsencrypt operational, bind apply-both, nginx
new file). Diverges from the handoff on placement: gaps 7-9 go
in bundles/AGENTS.md not items/AGENTS.md, since items/AGENTS.md
is scoped to custom item types only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:39:40 +02:00
69bcac421a
agents/bundles: triggers/triggered:True invariant + self-healing
Two related lessons from the left4me integration:

1. The triggers/triggered:True invariant tripped three times in
   one session. When chown_src was promoted from triggered-only
   to self-healing-every-apply (drop triggered:True + add unless),
   bw rejected because it was still in git_deploy's triggers
   list. Same dance happened for pip_install.

2. Triggered actions can't recover from partial failure: once
   upstream succeeds, it's "in desired state" forever and the
   trigger never re-fires. For pip installs / chowns / migrations
   that must heal on every apply, the right shape is no
   triggered:True + unless:<fast-check>. unless semantics fold
   into the same bullet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:29:10 +02:00
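A small sketch of the two lessons above, using plain dicts in place of real bundle items. The checker `triggers_violations` is a hypothetical stand-in for the invariant bw enforces, the `action:left4me_alembic_upgrade` item id and the `chown -R left4me:left4me` command are assumptions, and the `unless` command is the one quoted in the chown_src commit.

```python
def triggers_violations(items):
    """List every triggers entry that points at an item lacking triggered: True."""
    violations = []
    for item_id, attrs in items.items():
        for target in attrs.get("triggers", []):
            if not items.get(target, {}).get("triggered", False):
                violations.append(f"{item_id} triggers {target}, which is not triggered: True")
    return violations


items = {
    "git_deploy:/opt/left4me/src": {
        "triggers": ["action:left4me_alembic_upgrade"],
    },
    # Self-healing shape: no triggered: True, runs on every apply,
    # skipped only when the fast check succeeds.
    "action:left4me_chown_src": {
        "command": "chown -R left4me:left4me /opt/left4me/src",  # assumed command
        "unless": 'test -z "$(find /opt/left4me/src ! -user left4me -print -quit)"',
    },
    # Triggered-only shape: fires only when an upstream item changes.
    "action:left4me_alembic_upgrade": {
        "command": "true",  # placeholder command
        "triggered": True,
    },
}

ok = triggers_violations(items)

# Re-adding the self-healing action to a triggers list breaks the invariant.
broken = dict(items)
broken["git_deploy:/opt/left4me/src"] = {"triggers": ["action:left4me_chown_src"]}
bad = triggers_violations(broken)
print(ok, bad)
```

The point of the second shape is recovery: an action gated only by `unless` gets another chance on every apply, while a triggered action depends on its upstream item changing again.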
59788f315a
agents/bundles: reactors must read metadata or be defaults
The left4me bundle's first cut had two reactors that returned
static dicts without calling metadata.get(...): systemd_services
(enable/run flags) and nftables_output (two static rule strings).
Both passed bw test (no consumer yet). Once attached to
ovh.left4me, bw raised "did not request any metadata, you might
want to use defaults instead". Fix was to fold both into defaults.

Document the pitfall, with the verbatim error wording and the
note that this applies to cross-namespace contributions too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:28:31 +02:00
d3068ba8f6
agents: nodes carry only node-specific metadata
When the left4me bundle was first integrated, ovh.left4me's node
file carried ~40 lines of left4me-related metadata (git_url,
secret_key, full nginx vhost, monitoring, backups, nftables
rules). The maintainer pushed back: per-node metadata should be
only what genuinely varies per host. Refactor brought it down to
{'domain': 'left4.me'} with everything else in bundle defaults
or in a reactor deriving from the domain.

Add the rule to bundles/AGENTS.md from the bundle-author angle
(use defaults / vault-keyed-on-node for secrets, cite left4me
and postgresql for the established pattern). Add the reviewer's
form to nodes/AGENTS.md Pitfalls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:27:52 +02:00
b5e72a3ac3
agents: bundle validation needs a node attached
bw test (no args) is a parsing gate, not a behaviour gate. A
bundle's reactors only resolve when some node's metadata is
built, so reactor bugs stay dormant until a node opts in. The
left4me-integration session shipped 8 commits that all "passed
bw test" with latent reactor-rejection bugs that surfaced only
once the bundle was attached to ovh.left4me.

Rewrites the verify-list in bundles/AGENTS.md to require attach-
first and uses richer command invocations (bw items --blame,
bw metadata -k <key>). Adds a Bundle-validation workflow section
to commands.md spelling out why step 2 is non-optional.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:27:13 +02:00
0a9f3dae88
agents/commands: read-only command cheat sheet
Adds a flag-combinations table for bw test (selectors + the
HIJKMSp/IJKMp default-flag split), bw metadata -k/-b/-f (with
the -f sensitive-data warning), bw items --blame/-f, bw verify
-o bundle:, bw hash -m/-d. Also documents the shared target-
selector grammar.

Surfaced by the left4me-integration session, where the agent
relied on bare bw test / bw metadata / bw items invocations and
missed leverage from the available flags.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:26:27 +02:00
422a275d97
agents: drop bw bundles, add bw verify to read-only allowlist
bw bundles is not a subcommand of the installed fork (the actual
list is apply/debug/diff/groups/hash/ipmi/items/lock/metadata/
nodes/plot/pw/repo/run/stats/test/verify/zen). bw verify is
read-only and was missing from the list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:25:43 +02:00
3ed0264be6
docs/specs: round-1 agents-md refactor design (gaps 1-6)
Captures the brainstorm + per-commit wording for the first six
gaps from the left4me-integration handoff, plus a side-quest
read-only command cheat sheet for docs/agents/commands.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:24:03 +02:00
d49259ff07
nginx: move 80.conf to sites-available so it's actually included
The bundle was shipping 80.conf (HTTP-to-HTTPS redirect + acme-challenge
alias) to /etc/nginx/sites/80.conf, but nginx.conf only `include`s
/etc/nginx/sites-enabled/* (which is a symlink to sites-available).
The file was orphaned — no node had a working port-80 listener.

Move the destination to /etc/nginx/sites-available/80.conf so the
existing sites-enabled symlink picks it up. The /etc/nginx purge will
clean up any stale /etc/nginx/sites/80.conf on existing hosts.
2026-05-10 19:59:17 +02:00
ed141a9300
left4me: drop chown_src from git_deploy triggers (self-healing now)
Same constraint pattern: items in a triggers list must be
triggered:True. chown_src dropped triggered:True in the prior commit
to become self-healing every-apply, so it can't stay in git_deploy's
triggers list. Now git_deploy has no triggers at all — chown_src and
pip_install both run every apply, gated by their own `unless` guards.
2026-05-10 18:58:30 +02:00
9d17c69b22
left4me: make chown_src self-healing too
Same problem as pip_install: chown_src was triggered:True and only
fired when git_deploy did. After a partial first-apply where git_deploy
succeeded (extracting root-owned files) but the chown didn't happen
yet, subsequent applies left files root-owned forever — pip_install
fails with "permission denied" trying to write .egg-info/.

Drop triggered:True. Add an unless guard:
  test -z "$(find /opt/left4me/src ! -user left4me -print -quit)"
i.e. skip the chown only when no non-left4me-owned file exists in the
tree.
2026-05-10 18:57:50 +02:00
5bf95cb065
left4me: drop pip_install from pip_upgrade triggers (pip_install now always-runs)
2026-05-10 18:56:30 +02:00
cac04a456b
left4me: make pip_install self-healing on every apply
The previous shape (`triggered: True`, in git_deploy's triggers list)
meant pip_install only ran when something upstream fired. After a
partial first-apply failure (where git_deploy succeeded but pip_install
failed for an unrelated reason), subsequent applies couldn't recover —
git_deploy was already in desired state, nothing fired pip_install.

Drop `triggered: True`. Drop pip_install from git_deploy's triggers
(bw enforces a triggers→triggered:True invariant). Add `unless`:
sudo -u left4me /opt/left4me/.venv/bin/python -c "import l4d2host, l4d2web"
to short-circuit when the venv is already correct. Editable installs
pick up code changes automatically — no need to re-pip on every git
update.

For dep changes (rare), nudge manually:
  bw run ovh.left4me 'sudo -u left4me /opt/left4me/.venv/bin/pip install -e /opt/left4me/src/l4d2host -e /opt/left4me/src/l4d2web'
2026-05-10 18:55:24 +02:00
c2cc3866f3
left4me: chown /opt/left4me/src after git_deploy
bw's git_deploy extracts the git archive as the connecting user (root
after sudo), so files end up root-owned. The subsequent pip install
runs as left4me and needs to write .egg-info/ inside each editable
package, which fails with "permission denied".

Add action:left4me_chown_src triggered by git_deploy and required by
pip_install. Idempotent (chown -R is fine to re-run).
2026-05-10 18:52:37 +02:00
d548235dfe
left4me: declare /opt/left4me/src as a directory: item
bw's git_deploy item assumes the destination directory exists on the
host — its fix path runs `find <dest> -mindepth 1 -delete` to clear
existing contents before unpacking the new archive, which fails on a
fresh box where the directory was never created. Flask follows the
same pattern (bundles/flask/items.py:13).
2026-05-10 18:51:05 +02:00
149ce6c870
left4me: use https git URL so bw clones locally per-apply
bw's git_deploy.py:103 falls into a per-apply temp clone path when the
repo URL contains '://' (HTTPS, ssh://, …). Without that, it requires
a static git_deploy_repos map file pointing at a long-lived local
clone — which is the wrong shape for left4me, where the source of
truth is git.sublimity.de.

Switching the default to the HTTPS URL means anyone with the bundle
gets a working clone-from-source on `bw apply`, no operator-side
mirror map required.

Note: the host will pull whatever is pushed to git.sublimity.de
master. Push local commits before applying.
2026-05-10 18:49:10 +02:00
0479c96ae9
gitignore: add bundlewrap git_deploy_repos map (operator-specific paths)
2026-05-10 18:43:59 +02:00
5d69180466
left4me: terse bundle-membership asserts
2026-05-10 18:34:09 +02:00
7d3554f8a5
left4me: split derived_from_domain into one reactor per consumer
Each reactor now scopes to a single downstream bundle:
  nginx_vhosts    -> nginx/vhosts
  nftables_input  -> nftables/input

Easier to grep "what writes nginx/vhosts" and harder to accidentally
couple unrelated keys together. Same merged metadata.
2026-05-10 18:33:11 +02:00
fc66267656
left4me: reuse nginx bundle's auto-monitoring via check_path
bundles/nginx/metadata.py:91-104 already creates a monitoring/services
entry per nginx/vhost using the vhost's check_protocol/check_path. Set
check_path: '/health' on the left4me vhost so the auto-check hits the
Flask health endpoint, drop the explicit monitoring/services/left4me-web
block from this reactor.

Net effect: same curl command lands in monitoring as before, but the
service name is now 'left4.me' (the hostname, per the nginx reactor's
naming convention) instead of 'left4me-web'.
2026-05-10 18:31:52 +02:00
758660b131
left4me: drop redundant letsencrypt/domains from reactor
bundles/nginx/metadata.py auto-populates letsencrypt/domains from
nginx/vhosts.keys(). Declaring it again in the left4me reactor was a
no-op duplication. Removed; bw metadata still shows the same merged
state (left4.me with reload: [nginx]).
2026-05-10 18:29:15 +02:00
7b291acca1
left4me: refresh README + opt ovh.left4me in via groups
README:
  Updated metadata example to show domain as the only required key.
  Documented the bundle's derived_from_domain reactor as the source of
  nginx/letsencrypt/monitoring/nftables-input wiring, and the
  bundle-defaults source of backup/paths.

nodes/ovh.left4me.py:
  - groups: + backup, + left4me, + webserver
  - bundles: dropped 'left4me' and 'nftables' (come via groups now;
    nftables ships with debian-13).
  - metadata: pinned vm/cores=4, vm/threads=8 (4-core HT box) so the
    nginx bundle's worker_processes resolves; left4me block reduced to
    {'domain': 'left4.me'} — git_url, git_branch, secret_key, and the
    nginx/letsencrypt/monitoring/nftables/backup blocks now come from
    bundle defaults / the derived_from_domain reactor.
2026-05-10 18:24:03 +02:00
90f14b69e4
left4me: pull node-agnostic metadata into the bundle
Nodes should only carry node-specific metadata. Previously each node
running left4me had to declare git_url, git_branch, secret_key, plus
nginx vhost / letsencrypt / monitoring / nftables-input blocks for
every game port. All of those are derivable from one truly node-
specific value: the domain.

Move into the bundle:
  - git_url + git_branch as defaults (override per-node only if needed).
  - secret_key as a per-node vault-derived value
    (random_bytes_as_base64_for f'{node.name} left4me secret_key',
    same convention as postgresql/mosquitto/etc.).
  - backup/paths defaults (set-merged with backup group / node paths).

Add a `derived_from_domain` reactor that reads left4me/domain and
emits:
  - nginx/vhosts/<domain> proxying 127.0.0.1:8000
  - letsencrypt/domains/<domain>
  - monitoring/services/left4me-web (curl /health)
  - nftables/input rules for the configured port range
    (defaults 27015-27115, derived from left4me/port_range_*).

Net effect: a node opting into left4me declares only
  metadata.left4me.domain = 'whatever.tld'
plus the universal node-level stuff (id, vm/cores, network, …).
2026-05-10 18:23:34 +02:00
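The shape of such a reactor can be sketched as a pure function over a metadata object. Everything here is illustrative: `FakeMetadata` only imitates slash-path lookups, the vhost field `target` and the nftables rule string are hypothetical, and in a real `metadata.py` the function would be registered as a metadata reactor rather than called directly.

```python
class FakeMetadata:
    """Minimal stand-in for slash-separated metadata lookups (not bw's class)."""

    def __init__(self, data):
        self._data = data

    def get(self, path):
        node = self._data
        for key in path.split("/"):
            node = node[key]
        return node


def derived_from_domain(metadata):
    # Reading metadata here is what makes this a reactor and not a default.
    domain = metadata.get("left4me/domain")
    start = metadata.get("left4me/port_range_start")
    end = metadata.get("left4me/port_range_end")
    return {
        "nginx": {"vhosts": {domain: {"target": "http://127.0.0.1:8000"}}},
        "letsencrypt": {"domains": {domain: {}}},
        # Hypothetical rule format for the game port range.
        "nftables": {"input": {f"udp dport {start}-{end} accept"}},
    }


node_metadata = FakeMetadata({
    "left4me": {
        "domain": "left4.me",        # the only per-node value
        "port_range_start": 27015,   # bundle defaults
        "port_range_end": 27115,
    },
})
derived = derived_from_domain(node_metadata)
print(derived)
```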
3bffd7b8f5
bind-acme: guard against letsencrypt clients without internal LAN
The acme_zone reactor's first ACL branch iterates nodes that have
letsencrypt/domains and reads their network/internal/ipv4. Until now
that crashed for any node with letsencrypt but no internal LAN — the
node had to either fake a network/internal/ipv4 or skip TLS.

Add a `metadata.get(..., None)` guard to filter such nodes out of this
branch. The wireguard branch below already covers them (any node with
the wireguard bundle gets its wireguard/my_ip into the ACL), so ACME
DNS-01 reachability still works for cross-Internet nodes that join the
fleet via wireguard.

Surfaced by ovh.left4me: dedicated server with no Hetzner/internal
network, reachable from the bind-acme node only via wireguard.
2026-05-10 18:23:21 +02:00
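The guard can be pictured with plain dicts standing in for node metadata. This is a sketch under assumptions: `mget` imitates a `metadata.get(path, None)` lookup, the function name `acme_acl` is invented, and all IP addresses are made up for illustration.

```python
def mget(data, path, default=None):
    """Slash-path lookup with a default, imitating metadata.get(path, default)."""
    node = data
    for key in path.split("/"):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def acme_acl(nodes):
    acl = set()
    for metadata in nodes.values():
        # Branch 1: letsencrypt clients on the internal LAN.
        # Nodes without network/internal/ipv4 are filtered out instead of crashing.
        if mget(metadata, "letsencrypt/domains") and mget(metadata, "network/internal/ipv4"):
            acl.add(mget(metadata, "network/internal/ipv4"))
        # Branch 2: any node with wireguard contributes its wireguard/my_ip.
        if mget(metadata, "wireguard/my_ip"):
            acl.add(mget(metadata, "wireguard/my_ip"))
    return acl


nodes = {
    "htz.mails": {
        "letsencrypt": {"domains": {"example.org": {}}},
        "network": {"internal": {"ipv4": "10.0.0.2"}},   # made-up address
    },
    "ovh.left4me": {
        "letsencrypt": {"domains": {"left4.me": {}}},
        "wireguard": {"my_ip": "172.30.0.5"},            # made-up address
    },
}
acl = acme_acl(nodes)
print(sorted(acl))
```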
43f0c57438
groups: add applications/left4me
Single bundle group; pulls in bundles/left4me. Joined by nodes that run
the L4D2 game-server platform. nftables and systemd come in via the
debian-13 group on Debian-13 nodes, so this group needs only the
left4me bundle itself.
2026-05-10 18:08:36 +02:00
d425afad02
left4me: write bundle README
2026-05-10 18:07:58 +02:00
f9bf289ef0
left4me: assert nftables + systemd bundle membership
Catches misconfiguration at bw test time if a node attaches left4me
without those two bundles. Both contribute load-bearing metadata
materializers (nftables/output rules; systemd/units → unit files).
2026-05-10 18:06:35 +02:00
a8fc3f2298
left4me: fix bundle defects surfaced by real-node validation
Three issues caught once `bw test ovh.left4me` ran with the bundle
actually attached (vs. the earlier `bw test` with no node opting in,
which only checks parsing):

1. systemd_services + nftables_output reactors didn't read any metadata.
   bw rejects this with "did not request any metadata, you might want
   to use defaults instead". Both contributions are static, so they
   belong in `defaults` — moved.

2. git_deploy:/opt/left4me/src triggered action:left4me_create_venv,
   but create_venv lacked `triggered: True`. bw enforces that any
   action in a triggers list must be `triggered: True`. Removed
   create_venv from the trigger list — it's gated by `unless` for
   idempotency and doesn't need to refire on git updates anyway
   (the venv persists). pip_install stays in triggers so editable
   installs pick up new code.
2026-05-10 18:05:38 +02:00
c82737b162
left4me: contribute uid-based DSCP/priority marks to nftables/output
Replaces the per-app inet left4me_mark table from
deploy/files/usr/local/lib/left4me/nft/left4me-mark.nft with two rules
in the central bundles/nftables/ inet filter table's output chain.
Same selectors (skuid left4me + l4proto udp), same actions (DSCP EF +
priority 6) for both v4 and v6.
2026-05-10 17:53:17 +02:00
b1edcac3c7
left4me: enable+start left4me-web.service via systemd/services
The server@ template intentionally has no svc_systemd entry — instances
are started on-demand by the web app through the left4me-systemctl
helper. Slices are activated implicitly when units use Slice=.
2026-05-10 17:49:50 +02:00
72da6c0a8d
left4me: pin EnvironmentFile order via tuples (was sets)
Sets in libs/systemd.py:18 are sorted alphabetically. The current
output is correct by accident — host.env < web.env, host.env < /var.
Adding a third path later would silently reorder. Tuples preserve
insertion order; generate_unitfile() iterates them the same way.

Environment (HOME=, PATH=) stays a set: each line is an independent
KEY=VALUE assignment, order is irrelevant.
2026-05-10 17:48:03 +02:00
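The ordering hazard is easy to reproduce in isolation. The renderer below is a simplified stand-in for the behaviour the commit describes (sets emitted sorted, tuples in insertion order), not the code in `libs/systemd.py`, and `local.env` is a hypothetical third file.

```python
def render_environment_files(paths):
    # Sets carry no order, so a renderer has to sort them to be deterministic;
    # tuples are emitted exactly as written.
    ordered = sorted(paths) if isinstance(paths, (set, frozenset)) else list(paths)
    return [f"EnvironmentFile={p}" for p in ordered]


wanted = (
    "/etc/left4me/host.env",
    "/etc/left4me/web.env",
    "/etc/left4me/local.env",  # hypothetical override file, must load last
)

from_tuple = render_environment_files(wanted)
from_set = render_environment_files(set(wanted))
print(from_tuple)
print(from_set)
```

With the set, `local.env` sorts between `host.env` and `web.env`, so a later file could silently stop overriding an earlier one.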
6965441e9a
left4me: emit server@ template + game/build slice units
Translates the remaining three unit files from left4me/deploy/files/.
Server template carries the full hardening + cgroup/IO/Mem keys
verbatim. Slices need the bundles/systemd .slice support added in
prior commit.
2026-05-10 17:43:25 +02:00
6bf46ce9a4
left4me: emit left4me-web.service via systemd/units reactor
Translates left4me/deploy/files/usr/local/lib/systemd/system/left4me-web.service
into a Python dict consumed by bundles/systemd/. Two changes vs. the
shell-deploy unit:
  - --bind 0.0.0.0:8000 -> 127.0.0.1:8000 (nginx terminates TLS in front)
  - workers/threads are templated from left4me/gunicorn_{workers,threads}
    (defaults: 1 worker + 32 threads — same as the static unit)
2026-05-10 17:38:15 +02:00
def010c976
left4me: git_deploy + venv/pip/alembic/seed action chain
Mirrors deploy-test-server.sh:233-242 + :329-333. Single pip command
installs both editable packages (l4d2host + l4d2web) from the same
checkout. Alembic and seed-overlays run as the left4me user with
JOB_WORKER_ENABLED=false sourced from web.env.
2026-05-10 17:32:19 +02:00
433c403ddc
left4me: validate sudoers file with visudo before install
A malformed /etc/sudoers.d/left4me would lock sudo on the target
(blast radius: every other bundle using sudo at apply time). bw's
file: items support test_with, which runs the supplied command on the
locally-rendered file before transfer. Use it to gate the sudoers
file on visudo -cf — analogous to the visudo -cf check the original
deploy script ran inline (deploy-test-server.sh:186).
2026-05-10 17:29:01 +02:00
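As a rough sketch, the gated item could look like the dict below. The mode and owner come from the file table in the neighbouring commit; the temporary path is hypothetical, and the last two lines only illustrate how a `{}` placeholder would be filled, not how bw invokes the command internally.

```python
import shlex

# items.py-style declaration: the rendered file must pass visudo before install.
files = {
    "/etc/sudoers.d/left4me": {
        "mode": "0440",
        "owner": "root",
        "group": "root",
        "test_with": "visudo -cf {}",
    },
}

# Illustration only: substitute a (hypothetical) rendered-file path for {}.
rendered_path = "/tmp/bw-rendered-sudoers"
check_cmd = files["/etc/sudoers.d/left4me"]["test_with"].format(shlex.quote(rendered_path))
print(check_cmd)
```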
80d2a79b97
left4me: declare directories, users, files, sysctl-reload action
Modes/owners match the upstream left4me deploy script:
  helpers          0755 root:root
  sudoers.d/left4me 0440 root:root (validated with visudo -cf)
  sysctl conf      0644 root:root  (triggers sysctl --system)
  sandbox-resolv   0644 root:root
  /etc/left4me/host.env  0644 root:root  (Mako)
  /etc/left4me/web.env   0640 root:left4me (Mako, contains SECRET_KEY)
  /var/lib/left4me 0711 left4me:left4me (l4d2-sandbox traversal)
UIDs/GIDs pinned at 980/981 for deterministic ownership.
2026-05-10 17:23:03 +02:00
e842e7caa6
left4me: wire LEFT4ME_PORT_RANGE_{START,END} into web.env
Bundle metadata declares port_range_start/end in defaults, but the
running app (l4d2web/config.py:34-35) reads them from
LEFT4ME_PORT_RANGE_START/END env vars. Without these in web.env, the
bundle's metadata values were dead code and the app fell back to its
own hardcoded defaults. Wiring them through closes the loop.
2026-05-10 17:19:02 +02:00
3afd4d60cc
left4me: add Mako templates for host.env and web.env
SECRET_KEY pulled from node metadata (set via !32_random_bytes_as_base64_for:
in the node file). SESSION_COOKIE_SECURE flips to true since nginx fronts
gunicorn with TLS.
2026-05-10 17:14:36 +02:00
6db792ce6a
left4me: vendor privileged helpers + sudoers/sysctl/sandbox-resolv
Copied verbatim from left4me/deploy/files/. Helpers are the trust unit
the sudoers rules grant access to; left as static files (not generated)
so the audit trail stays grep-able. Modes/owners are set via items.py
in the next commit.
2026-05-10 17:10:17 +02:00
7547d041a2
left4me: scaffold bundle (items/metadata/README stubs)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 17:05:13 +02:00
cc1c6a5767
systemd: accept .slice extension in unit-file routing
Slices are a standard systemd unit type; the existing routing only
covered timer/service/mount/swap/target and raised on .slice. Same
install path (/usr/local/lib/systemd/system/<name>) and same
systemd-reload trigger as the other unit kinds.
2026-05-10 17:00:45 +02:00
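A minimal sketch of extension-based routing with `.slice` added, assuming (as the commit states) one shared install directory for all supported unit kinds. The names `SUPPORTED_EXTENSIONS` and `unit_path` are invented for illustration and are not the bundle's identifiers.

```python
UNIT_DIR = "/usr/local/lib/systemd/system"
SUPPORTED_EXTENSIONS = {"timer", "service", "mount", "swap", "target", "slice"}


def unit_path(name):
    """Map a unit name to its install path, rejecting unknown unit kinds."""
    extension = name.rsplit(".", 1)[-1]
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported unit type: {name}")
    return f"{UNIT_DIR}/{name}"


slice_path = unit_path("l4d2-game.slice")
print(slice_path)
```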
af78e40fda
left4me wireguard
2026-05-10 16:57:52 +02:00
c6bf2e0fc8
spec: banner stale sections so partial readers see the pivot
§0 Revisions notes that §3 and §7 Phase 2 are pre-pivot, but a reader
deep-linking into either section bypasses §0. Add a section-level
banner at the top of each that points back to §0 and to bundles/AGENTS.md
for the current per-bundle convention. Content is preserved as a record
of the original design.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 16:14:12 +02:00
33 changed files with 2011 additions and 23 deletions

2
.gitignore vendored
@@ -5,3 +5,5 @@
 .bw_debug_history
 # CocoIndex Code (ccc)
 /.cocoindex_code/
+# bundlewrap git_deploy local-mirror map (operator-specific paths)
+git_deploy_repos

@@ -12,12 +12,12 @@ not project documentation. Onboarding lives **here**, in `AGENTS.md`.
 ## Quickstart for agents
-Five rules; follow these and you won't break things:
+Six rules; follow these and you won't break things:
 1. **Read-only by default.** Never run `bw apply`, `bw run`, or
 `bw lock` without explicit user request — even with `-i`. Stick
-to `bw test`, `bw nodes`, `bw groups`, `bw bundles`,
-`bw items`, `bw metadata`, `bw hash`, `bw debug`. See
+to `bw test`, `bw nodes`, `bw groups`, `bw items`,
+`bw metadata`, `bw hash`, `bw verify`, `bw debug`. See
 [`docs/agents/commands.md`](docs/agents/commands.md) and the
 fork's [safety envelope](https://github.com/CroneKorkN/bundlewrap/blob/main/AGENTS.md).
 2. **Never echo decrypted secrets.** Don't print, paste, or log the
@ -38,6 +38,15 @@ Five rules; follow these and you won't break things:
5. **Prefer adding helpers to `libs/`** over duplicating logic across 5. **Prefer adding helpers to `libs/`** over duplicating logic across
bundles. Repo-wide helpers go in bundles. Repo-wide helpers go in
[`libs/`](libs/AGENTS.md), reachable as `repo.libs.<x>`. [`libs/`](libs/AGENTS.md), reachable as `repo.libs.<x>`.
6. **`ccc` is available for semantic search.** This repo is indexed
with [`ccc`](https://github.com/cocoindex-io/cocoindex-code).
Reach for it on conceptual questions ("where is X used / which
bundles do Y / what are the contexts of Z"), where a keyword
grep would miss indirect usage:
`ccc search '<concept>' --path '**'`. Pass `--path '**'`
without it, results are filtered to the current working
directory's subtree. `grep`/`rg`/`find` remain fine for
exact-string lookups; pick whichever fits the question.
## Layout


@ -41,6 +41,16 @@ bundles/<name>/
more than one bundle. Don't duplicate logic across bundles.
- **Custom item types** (e.g. `download:`) live in
[`items/`](../items/AGENTS.md), not per-bundle.
- **Bundles own application-wide knowledge; nodes carry only the few
per-host knobs the bundle actually needs.** When designing a bundle,
identify the per-node knobs (e.g. domain, uplink interface, a
vault-id suffix) and put everything else in `defaults`, or in a
reactor that derives from those knobs. Per-node random secrets
belong in `defaults` via `repo.vault.random_bytes_as_base64_for(...)`
keyed on the node — not in the node file. See
`bundles/left4me/metadata.py:10` (`secret_key` derived in defaults)
and `bundles/postgresql/metadata.py:4` (vault-derived `password_for`
at module scope).
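
The split described above can be sketched in plain Python. This is an illustration under stated assumptions, not the repo's actual `metadata.py`: `derive_secret` is a hypothetical stand-in for `repo.vault.random_bytes_as_base64_for(...)` that only mimics its "same key, same secret" behaviour, and the `example` namespace and its keys are made up.

```python
import base64
import hashlib


def derive_secret(key: str) -> str:
    """Hypothetical stand-in for a vault helper: same key, same secret."""
    digest = hashlib.sha256(key.encode()).digest()
    return base64.b64encode(digest).decode()


def bundle_defaults(node_name: str) -> dict:
    """What the bundle contributes on its own, keyed on the node name."""
    return {
        'example': {
            'workers': 1,
            'secret_key': derive_secret(f'{node_name} example secret_key'),
        },
    }


def effective_metadata(node_name: str, node_knobs: dict) -> dict:
    """Overlay the few per-node knobs on top of the bundle defaults."""
    merged = bundle_defaults(node_name)
    merged['example'].update(node_knobs)
    return merged


# The node file only has to carry the knob the bundle cannot guess.
meta = effective_metadata('node-a', {'domain': 'a.example.org'})
```

The node supplies `domain` and nothing else; the secret is stable across runs for one node and differs between nodes without ever appearing in a node file.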
## How to add a new bundle
@ -56,12 +66,22 @@ bundles/<name>/
[`groups/<axis>/<x>.py`](../groups/AGENTS.md) (preferred for shared
bundles) or to the node's `bundles` list directly
([`nodes/AGENTS.md`](../nodes/AGENTS.md)).
-5. Verify, in this order:
-- `bw test` — sanity (loaders + reactors).
-- `bw items <node>` — confirm new items appear on a node that opts in.
-- `bw hash <node>` — confirm the change is what you expected. See
-[`docs/agents/commands.md`](../docs/agents/commands.md) and the
-fork's hash-diff workflow.
+5. **Verify, in this order:**
+- `bw test` — repo-wide parse + cross-cutting hooks. Loads every
+bundle, but reactors don't fire for nodes that haven't opted into
+the bundle yet — bugs in new reactors stay hidden here.
+- **Attach the bundle to a node** (via the node's `bundles` list, or
+a group it belongs to). Until you do, the next steps don't actually
+exercise the bundle.
+- `bw test <node>` — exercises every reactor and item-graph edge for
+that node. This is where most new-bundle bugs surface.
+- `bw items <node> --blame` — confirm items materialise with the
+right paths, authored by the expected bundle.
+- `bw metadata <node> -k <a/b>` — spot-check derived metadata.
+- `bw hash <node>` — preview vs current host state.
+See [`docs/agents/commands.md#bundle-validation-workflow`](../docs/agents/commands.md#bundle-validation-workflow)
+for the rationale.
6. Add a `bundles/<name>/README.md`. See "Per-bundle README" below
for what to cover.
@ -82,6 +102,12 @@ bundles/<name>/
unless the matching `file:` item declares `content_type='mako'`
(or a templating extension triggers it). To check, read the matching
`file:` entry in `items.py`.
- **`file:` `source` defaults to the destination basename.** For a
destination of `/etc/foo/bar.conf` with no `source` key, bw looks
for `bundles/<bundle>/files/bar.conf`. Only declare `source`
explicitly when the basename you want differs (e.g. shipping a Mako
template named `bar.conf.mako` to a destination of
`/etc/foo/bar.conf`).
- **Reactors writing across namespaces.** Some bundles' reactors write
into other bundles' metadata namespaces (e.g. `nextcloud` writes
into `apt.packages`, `archive.paths`). When you change such a bundle,
@ -90,6 +116,28 @@ bundles/<name>/
itself; grep `'<other-bundle>':` in the reactors when in doubt.
- **`bw hash` doesn't accept selectors.** Use `bw hash <node>` per
literal name; see the fork's runbook.
- **Reactors must read metadata.** If a reactor body returns a static
dict without calling `metadata.get(...)`, bw raises
`ValueError: <reactor> on <node> did not request any metadata, you
might want to use defaults instead` once a node consumes the bundle.
Fix: fold the contribution into `defaults`. The rule applies even
when the reactor writes into another bundle's namespace — a static
contribution to e.g. `nftables/output` belongs in `defaults`, where
bw merges it with other bundles' contributions.
- **`triggers` and `triggered: True` invariant.** Any item listed in
another's `triggers` list must declare `triggered: True`. bw
enforces this at `bw test` time: *"…triggered by …, but missing
'triggered' attribute"*. Corollary: an action can't be both in an
upstream `triggers` list AND self-healing every apply — pick one.
- **Triggered actions don't recover from partial failure.** When an
upstream item's apply succeeds but its triggered downstream action
fails, subsequent applies can't recover via the trigger chain —
upstream is "already in desired state" and never re-triggers. For
actions that must self-heal (pip installs, chowns, migrations),
drop `triggered: True` and gate the command with `unless: <fast-check>`.
`unless` is a shell command on the target host whose exit status
decides whether the main command runs (exit 0 = skip); it's checked
at fire time, after `triggered:` filtering.
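
The trigger invariant and the self-healing alternative from the last two bullets can be sketched as follows. This is a hedged illustration, not bw's own check: the flat `items` dict keyed by `type:name` and the `missing_triggered` helper are hypothetical, and the paths are made up. Only the attribute names (`triggers`, `triggered`, `command`, `unless`) come from the text above.

```python
def missing_triggered(items: dict) -> list:
    """Return ids that appear in some `triggers` list but lack `triggered: True`."""
    targets = {
        target
        for attrs in items.values()
        for target in attrs.get('triggers', [])
    }
    return sorted(
        target for target in targets
        if not items.get(target, {}).get('triggered', False)
    )


# Hypothetical item graph, flattened to 'type:name' keys for the example.
items = {
    'file:/etc/sysctl.d/99-example.conf': {
        'triggers': ['action:example_sysctl_reload'],
    },
    # Fires only when the file above changes, so it must be `triggered`.
    'action:example_sysctl_reload': {
        'command': 'sysctl --system >/dev/null',
        'triggered': True,
    },
    # Self-healing: not in anyone's `triggers`, considered on every apply,
    # skipped when the `unless` check exits 0.
    'action:example_pip_install': {
        'command': '/opt/example/.venv/bin/pip install -e /opt/example/src',
        'unless': '/opt/example/.venv/bin/python -c "import example"',
    },
}

# A graph that violates the invariant: the target lacks `triggered: True`.
broken = {
    'file:/etc/example.conf': {'triggers': ['action:example_reload']},
    'action:example_reload': {'command': 'true'},
}
```

`missing_triggered(items)` is empty for the first graph and names `action:example_reload` for the second, which is the situation `bw test` rejects.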
## Per-bundle README


@ -33,6 +33,7 @@ def acme_zone(metadata):
str(ip_interface(other_node.metadata.get('network/internal/ipv4')).ip)
for other_node in repo.nodes
if other_node.metadata.get('letsencrypt/domains', {})
and other_node.metadata.get('network/internal/ipv4', None)
},
*{
str(ip_interface(other_node.metadata.get('wireguard/my_ip')).ip)

bundles/bind/README.md Normal file

@ -0,0 +1,30 @@
# bind
Authoritative DNS — primary plus optional `bind/master_node` slaves.
## Applying changes needs both nodes
The slave's bw-managed zone files are rendered from the master's
metadata at slave-apply time (see `bundles/bind/items.py:100`). When
you change a record on the master (adding a `letsencrypt/domains`
entry, a new vhost, etc.), the change is only published once you
apply BOTH:
```sh
bw apply htz.mails # primary (where the source records live)
bw apply ovh.secondary # secondary (renders its own zone files)
```
Until both have been applied, `bw verify ovh.secondary` will show
stale zones and consumers that hit the secondary (Let's Encrypt's
secondary-region validators in particular) will see NXDOMAIN. Even
though the slave's named.conf.local declares `type slave;`, don't
rely on bind's own AXFR catching up — the bw-rendered file on disk
is what `bw verify` measures.
## See also
- `bundles/bind-acme/` — the in-house ACME-update receiver.
- `bundles/letsencrypt/README.md` — DNS-01 prerequisites and the
negative-cache penalty (the most common operational consequence
of forgetting to apply the secondary).

bundles/left4me/README.md Normal file

@ -0,0 +1,114 @@
# left4me
L4D2 game-server management platform: a Flask web UI on gunicorn that
provisions per-instance srcds servers via templated systemd units, with
kernel-overlayfs layering for shared installations + per-overlay maps,
and uid-based DSCP/priority marking on the egress path so CAKE on the
external interface prioritizes srcds UDP over bulk traffic.
## Metadata
```python
'metadata': {
    'left4me': {
        'domain': 'whatever.tld',  # required per-node knob
        'system_cpus': {0, 4},     # required per-node knob; see "CPU isolation" under Gotchas
        # Everything below is optional and has a sensible default in the
        # bundle. Override per-node only if the default is wrong:
        # 'git_url': 'git@git.sublimity.de:cronekorkn/left4me',
        # 'git_branch': 'master',
        # 'gunicorn_workers': 1,
        # 'gunicorn_threads': 32,
        # 'job_worker_threads': 4,
        # 'port_range_start': 27015,
        # 'port_range_end': 27115,
        # secret_key is auto-derived per node
        # (repo.vault.random_bytes_as_base64_for f'{node.name} left4me secret_key').
    },
},
```
The bundle's `derived_from_domain` reactor reads `left4me/domain` and
emits the corresponding `nginx/vhosts`, `letsencrypt/domains`,
`monitoring/services/left4me-web` (HTTPS health check), and the game-
port `nftables/input` accept rules. Backup paths
(`/var/lib/left4me`, `/etc/left4me`) are set-merged into `backup/paths`
from defaults. None of these need to be declared per-node.
## What this bundle does
- Creates system users `left4me` (uid/gid 980, home `/var/lib/left4me`,
mode 0711) and `l4d2-sandbox` (uid/gid 981, no home, used by bwrap
script-overlay builds).
- Drops privileged helpers under `/usr/local/libexec/left4me/`
(`left4me-systemctl`, `left4me-journalctl`, `left4me-overlay`,
`left4me-script-sandbox`) plus a tight sudoers file (validated with
`visudo -cf` before install).
- `git_deploy`s the left4me repo to `/opt/left4me/src`, builds a venv at
`/opt/left4me/.venv`, `pip install -e`s both `l4d2host` and `l4d2web`,
runs `alembic upgrade head` and `flask seed-script-overlays`, then
enables `left4me-web.service`.
- Emits four systemd units via `systemd/units` metadata (consumed by
`bundles/systemd/`):
  - `left4me-web.service` — gunicorn on `127.0.0.1:8000` (TLS terminates upstream).
  - `left4me-server@.service` — per-instance srcds template, started on
    demand by the web app via the `left4me-systemctl` helper.
  - `l4d2-game.slice` / `l4d2-build.slice` — cgroup slices for the
    perf-baseline (CPU/IO weights, memory caps).
- Contributes uid-based DSCP/priority marks for srcds UDP egress to
`nftables/output` (via `defaults`).
## Gotchas
- **Requires `bundles/nftables` and `bundles/systemd` on the node.** The
bundle asserts membership at `bw test` time. On Debian-13 these ride
in via the `debian-13` group, so attaching the bundle to a Debian-13
node is enough.
- **`left4me-web.service` does not have `NoNewPrivileges=true`.** This is
intentional — workers `sudo` the privileged helpers; `NoNewPrivileges`
would block setuid escalation. Per-instance `server@.service` units
*do* have it.
- **CAKE shaping is configured separately**, via
`network/<iface>/cake` on the node (consumed by `bundles/network/`),
not by this bundle.
- **First-run admin user is manual.** After `bw apply`, ssh to the host and
bootstrap the admin via the `left4me` wrapper (it sources the env files,
drops to the `left4me` user, and runs the flask CLI):
`sudo left4me create-user <username> --admin` (prompts for password via
the flask CLI, or set `LEFT4ME_ADMIN_PASSWORD` first). The bundle
deliberately doesn't seed an admin to keep credentials out of the
metadata pipeline. The same `left4me` wrapper accepts any other flask
subcommand: `sudo left4me seed-script-overlays <dir>`,
`sudo left4me routes`, `sudo left4me shell`, etc.
- **CPU isolation is managed by this bundle**, driven by one required
per-node knob: `left4me/system_cpus` — a set of int CPU ids that
pins `system.slice` / `user.slice` / `l4d2-build.slice`. The
complement (`set(range(vm/threads)) - system_cpus`) pins
`l4d2-game.slice`. On HT hosts, list both SMT siblings of every
physical core you want to reserve for system, otherwise games end
up sharing L1/L2 with system. Find pairings via
`/sys/devices/system/cpu/cpu<n>/topology/thread_siblings_list`. On
the prod node (`ovh.left4me`, 4 physical / 8 threads, pairings
(0,4) (1,5) (2,6) (3,7)) the node sets `'system_cpus': {0, 4}` to
reserve physical core 0 entirely. `l4d2-game.slice` and
`l4d2-build.slice` carry `AllowedCPUs=` inline on their unit
definitions; `system.slice` and `user.slice` get drop-ins registered
under `systemd/units` with the `'<parent>.d/<basename>.conf'` key
convention (same shape nginx and autologin use), landing at
`/usr/local/lib/systemd/system/<slice>.d/99-left4me-cpuset.conf`.
The reactor raises if `system_cpus` includes CPUs outside
`[0, vm/threads)` or leaves no cores for games.
- **Kernel feature requirement:** kernel-overlayfs (`CONFIG_OVERLAY_FS`).
Standard on debian-13.
- **Game ports** are opened by the web app on demand in the range 27015-27115
(UDP+TCP). Add corresponding accept rules to `nftables/input` per
node if the host's policy is default-drop on input.
- **Pinned UIDs/GIDs (980/981).** Chosen for deterministic ownership
across rebuilds and backup restores. If you add another bundle that
pins UIDs in this repo, make sure it doesn't collide.
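
The CPU split from the "CPU isolation" bullet can be sketched as below. It is a minimal illustration of the set complement and the two failure cases the README names, not the bundle's actual reactor: `game_cpus_for` and `allowed_cpus_value` are hypothetical names, and the space-separated rendering is one of the list forms systemd accepts for `AllowedCPUs=`.

```python
def game_cpus_for(threads: int, system_cpus: set) -> set:
    """Return the CPUs left for l4d2-game.slice after reserving system_cpus."""
    all_cpus = set(range(threads))
    stray = set(system_cpus) - all_cpus
    if stray:
        raise ValueError(f'system_cpus outside [0, {threads}): {sorted(stray)}')
    game_cpus = all_cpus - set(system_cpus)
    if not game_cpus:
        raise ValueError('system_cpus leaves no cores for games')
    return game_cpus


def allowed_cpus_value(cpus: set) -> str:
    """Render a CPU set as a space-separated list for AllowedCPUs=."""
    return ' '.join(str(cpu) for cpu in sorted(cpus))


# The prod layout described above: 8 threads, physical core 0 = CPUs 0 and 4.
game = game_cpus_for(8, {0, 4})
system_line = f'AllowedCPUs={allowed_cpus_value({0, 4})}'
game_line = f'AllowedCPUs={allowed_cpus_value(game)}'
```

Reserving both siblings `{0, 4}` leaves CPUs 1, 2, 3, 5, 6 and 7 for games, so no game thread shares a physical core with system work.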
## Slice support requires `bundles/systemd` ≥ commit cc1c6a5
This bundle's `l4d2-game.slice` and `l4d2-build.slice` units rely on
`bundles/systemd/items.py` accepting the `.slice` extension. Older
revisions raised `Exception(f'unknown type slice')` at apply time.
The repo-wide `bw test` will catch this if it regresses.


@ -0,0 +1,6 @@
# Managed by ckn-bw bundles/left4me. Local edits will be reverted.
# Deployment units use fixed /var/lib/left4me paths; regenerate units if this changes.
LEFT4ME_ROOT=/var/lib/left4me
# l4d2host invokes steamcmd by absolute path — bypasses PATH lookup so the
# script's `cd "$(dirname "$0")"` resolves next to the real install dir.
LEFT4ME_STEAMCMD=/opt/left4me/steam/steamcmd.sh


@ -0,0 +1,6 @@
# Sandbox-only resolver config — bind-mounted into script-overlay sandboxes
# at /etc/resolv.conf. The host's resolver (often a private/LAN DNS server)
# is unreachable from inside the sandbox because IPAddressDeny= blocks
# egress to RFC1918 / loopback. Public resolvers keep DNS working.
nameserver 1.1.1.1
nameserver 8.8.8.8


@ -0,0 +1,7 @@
# Managed by ckn-bw bundles/left4me. Local edits will be reverted.
DATABASE_URL=sqlite:////var/lib/left4me/left4me.db
SECRET_KEY=${node.metadata.get('left4me/secret_key')}
JOB_WORKER_THREADS=${node.metadata.get('left4me/job_worker_threads')}
SESSION_COOKIE_SECURE=true
LEFT4ME_PORT_RANGE_START=${node.metadata.get('left4me/port_range_start')}
LEFT4ME_PORT_RANGE_END=${node.metadata.get('left4me/port_range_end')}


@ -0,0 +1,5 @@
Defaults:left4me !requiretty
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-systemctl *
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-journalctl *
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-overlay mount *, /usr/local/libexec/left4me/left4me-overlay umount *
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox


@ -0,0 +1,36 @@
# Host-side perf baseline for left4me — see
# docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md
#
# UDP socket buffers: distro defaults of ~128 KiB are too small for sustained
# Source-engine UDP across multiple instances. 8 MiB matches the standard
# 1 Gbit recommendation; rmem_default/wmem_default protect sockets that don't
# explicitly enlarge their buffers.
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.core.rmem_default = 524288
net.core.wmem_default = 524288
# Kernel softirq UDP path: the per-CPU backlog queue starts dropping packets
# at the default 1000 under multi-instance burst; 5000 absorbs realistic peaks.
# netdev_budget = 600 gives softirq more drain headroom per pass.
net.core.netdev_max_backlog = 5000
net.core.netdev_budget = 600
# Latency-sensitive default: avoid swap unless the box is really under
# pressure. Harmless on swapless hosts.
vm.swappiness = 10
# Per-socket UDP buffer floors: protect game-server sockets that don't bump
# their own SO_RCVBUF/SO_SNDBUF when softirq drains lag briefly.
net.ipv4.udp_rmem_min = 16384
net.ipv4.udp_wmem_min = 16384
# Default qdisc for ifaces we don't explicitly shape with CAKE. Debian Trixie
# already defaults to fq_codel; setting it explicitly is belt-and-suspenders
# and survives kernel-default churn.
net.core.default_qdisc = fq_codel
# TCP congestion control: BBR for any bulk TCP egress on the host (admin SSH,
# backups, package fetches, web-app responses) so a long flow does not push
# the bottleneck queue ahead of game UDP. UDP srcds is unaffected.
net.ipv4.tcp_congestion_control = bbr


@ -0,0 +1,53 @@
#!/bin/sh
set -eu
usage() {
printf '%s\n' "usage: left4me-journalctl <server-name> --lines <n> --follow|--no-follow" >&2
exit 2
}
validate_name() {
name=$1
[ -n "$name" ] || usage
case "$name" in
.*|*..*|*/*|*\\*) usage ;;
esac
case "$name" in
*[!A-Za-z0-9_.-]*) usage ;;
esac
}
[ "$#" -eq 4 ] || usage
name=$1
lines_flag=$2
lines=$3
follow_flag=$4
validate_name "$name"
[ "$lines_flag" = "--lines" ] || usage
case "$lines" in
''|*[!0-9]*) usage ;;
esac
follow_arg=
case "$follow_flag" in
--follow) follow_arg=-f ;;
--no-follow) ;;
*) usage ;;
esac
unit="left4me-server@${name}.service"
if [ -x /bin/journalctl ]; then
journalctl=/bin/journalctl
elif [ -x /usr/bin/journalctl ]; then
journalctl=/usr/bin/journalctl
else
printf '%s\n' 'journalctl not found at /bin/journalctl or /usr/bin/journalctl' >&2
exit 69
fi
if [ -n "$follow_arg" ]; then
exec "$journalctl" -u "$unit" -n "$lines" -o cat "$follow_arg"
fi
exec "$journalctl" -u "$unit" -n "$lines" -o cat


@ -0,0 +1,242 @@
#!/usr/bin/python3
"""Privileged overlay mount helper for left4me.

Invoked from the systemd unit's ExecStartPre / ExecStopPost via
`+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- …`. The unit-level
nsenter is what makes this work: it runs the helper Python interpreter
inside PID 1's mount namespace. Without it, the `+` Exec prefix
removes the sandbox/credentials but does NOT detach from the unit's
per-service mount namespace, and the helper process itself would pin
that namespace alive — turning every umount into a multi-second EBUSY
race with the kernel's deferred namespace cleanup. With the unit-level
nsenter the helper has no such reference and umount succeeds first try.

Validates inputs strictly, then performs `mount -t overlay` /
`umount` directly — no internal nsenter, since the helper is already
running where the syscalls need to take effect.

Verbs:
    mount <name>    Reads ${LEFT4ME_ROOT}/instances/<name>/instance.env
                    for L4D2_LOWERDIRS, validates every lowerdir is
                    under one of installation/overlays/workshop_cache/
                    global_overlay_cache, then mounts the kernel
                    overlay at runtime/<name>/merged.
    umount <name>   Unmounts runtime/<name>/merged and cleans up the
                    kernel-overlayfs `work/work` orphan.

Set LEFT4ME_OVERLAY_PRINT_ONLY=1 to print the would-be argv (one line,
shell-quoted) and exit 0 instead of execv. Used by tests.
"""
import os
import re
import shlex
import shutil
import subprocess
import sys
from pathlib import Path
NAME_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,63}$")
DEFAULT_ROOT = "/var/lib/left4me"
LOWERDIR_ALLOWLIST = (
"installation",
"overlays",
"global_overlay_cache",
"workshop_cache",
)
MAX_LOWERDIRS = 500
MOUNT_BIN = "/bin/mount"
UMOUNT_BIN = "/bin/umount"


def die(msg: str) -> None:
    sys.stderr.write(f"left4me-overlay: {msg}\n")
    sys.exit(1)


def root() -> Path:
    return Path(os.environ.get("LEFT4ME_ROOT") or DEFAULT_ROOT)


def validate_name(name: str) -> str:
    if not NAME_RE.fullmatch(name):
        die(f"invalid instance name: {name!r}")
    return name


def parse_lowerdirs(env_path: Path) -> list[str]:
    if not env_path.is_file():
        die(f"instance.env not found: {env_path}")
    raw = None
    for line in env_path.read_text().splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "L4D2_LOWERDIRS":
            raw = value
            break
    if raw is None:
        die(f"L4D2_LOWERDIRS not set in {env_path}")
    if raw == "":
        die(f"L4D2_LOWERDIRS is empty in {env_path}")
    parts = raw.split(":")
    if any(p == "" for p in parts):
        die(f"L4D2_LOWERDIRS contains an empty entry: {raw!r}")
    if len(parts) > MAX_LOWERDIRS:
        die(f"L4D2_LOWERDIRS has {len(parts)} entries (cap {MAX_LOWERDIRS})")
    return parts


def canonical_under(allowed_roots: list[Path], path: Path) -> Path:
    try:
        canonical = path.resolve(strict=True)
    except (FileNotFoundError, RuntimeError):
        die(f"path does not exist or has a symlink loop: {path}")
    for r in allowed_roots:
        if canonical == r or r in canonical.parents:
            return canonical
    die(f"path is outside the permitted roots: {path} (resolved: {canonical})")


_LISTXATTR = getattr(os, "listxattr", None)


def _entry_has_fuse_xattr(path: str) -> str | None:
    if _LISTXATTR is None:
        return None
    try:
        attrs = _LISTXATTR(path, follow_symlinks=False)
    except OSError:
        return None
    for a in attrs:
        if a.startswith("user.fuseoverlayfs."):
            return a
    return None


def assert_no_fuse_xattrs(upper: Path) -> None:
    if not upper.exists() or _LISTXATTR is None:
        return
    for dirpath, dirnames, filenames in os.walk(upper):
        for entry in (dirpath, *(os.path.join(dirpath, n) for n in dirnames),
                      *(os.path.join(dirpath, n) for n in filenames)):
            tainted = _entry_has_fuse_xattr(entry)
            if tainted:
                die(
                    f"upperdir contains fuse-overlayfs xattr {tainted!r} on {entry}; "
                    "wipe upper/ and work/ before mounting"
                )


def exec_or_print(argv: list[str]) -> None:
    if os.environ.get("LEFT4ME_OVERLAY_PRINT_ONLY") == "1":
        print(" ".join(shlex.quote(a) for a in argv))
        sys.exit(0)
    os.execv(argv[0], argv)


def cmd_mount(name: str) -> None:
    name = validate_name(name)
    r = root()
    runtime_name_dir = (r / "runtime" / name).resolve(strict=True)
    merged_for_check = (runtime_name_dir / "merged").resolve(strict=True)
    # Idempotency for unit restart cycles: if a previous start mounted
    # successfully but ExecStart failed afterwards (and Restart=on-failure
    # fires another cycle), the second ExecStartPre would otherwise refuse
    # to mount-on-top. Short-circuit here so the second cycle just gets
    # straight to ExecStart. PRINT_ONLY (test mode) bypasses this so the
    # tests can exercise the full mount argv regardless of mount state.
    if (
        os.environ.get("LEFT4ME_OVERLAY_PRINT_ONLY") != "1"
        and os.path.ismount(merged_for_check)
    ):
        return
    instance_env = r / "instances" / name / "instance.env"
    raw_lowerdirs = parse_lowerdirs(instance_env)
    allowed_roots = [(r / sub).resolve() for sub in LOWERDIR_ALLOWLIST]
    canonical_lowerdirs = [str(canonical_under(allowed_roots, Path(p))) for p in raw_lowerdirs]
    upper = (runtime_name_dir / "upper").resolve(strict=True)
    work = (runtime_name_dir / "work").resolve(strict=True)
    merged = merged_for_check
    for label, path in (("upper", upper), ("work", work), ("merged", merged)):
        if path.parent != runtime_name_dir:
            die(f"{label} resolved outside runtime/{name}: {path}")
    assert_no_fuse_xattrs(upper)
    options = f"lowerdir={':'.join(canonical_lowerdirs)},upperdir={upper},workdir={work}"
    argv = [
        MOUNT_BIN,
        "-t", "overlay",
        "overlay",
        "-o", options,
        str(merged),
    ]
    exec_or_print(argv)


def cmd_umount(name: str) -> None:
    name = validate_name(name)
    r = root()
    runtime_name_dir = (r / "runtime" / name).resolve(strict=True)
    merged_path = runtime_name_dir / "merged"
    work_inner = runtime_name_dir / "work" / "work"
    argv = [
        UMOUNT_BIN,
        # Resolve only if it exists; PRINT_ONLY tests always pre-create it.
        str(merged_path.resolve(strict=True) if merged_path.exists() else merged_path),
    ]
    # PRINT_ONLY: emit the umount argv and exit. Tests assert exact shape
    # of this dry-run; the post-umount cleanup of work_inner is a runtime
    # behaviour exercised on the host, not in unit tests.
    if os.environ.get("LEFT4ME_OVERLAY_PRINT_ONLY") == "1":
        print(" ".join(shlex.quote(a) for a in argv))
        sys.exit(0)
    if merged_path.exists():
        merged = merged_path.resolve(strict=True)
        if merged.parent != runtime_name_dir:
            die(f"merged resolved outside runtime/{name}: {merged}")
        # Idempotency: only umount if currently a mount point. Mirrors
        # cmd_mount's symmetric check; a redundant cleanup pass — or a
        # call after a partial _purge_instance — must be a no-op.
        #
        # No retry loop here: with the helper running in PID 1's mount
        # namespace (via the unit-level `nsenter --mount=/proc/1/ns/mnt`
        # in ExecStopPost), it holds no reference to the unit's
        # per-service mount namespace, so the cgroup-empty → namespace
        # reaped → umount-clears sequence happens without any race
        # window for us to ride out. EBUSY here is a real error.
        if os.path.ismount(merged):
            subprocess.run(argv, check=True)
    # Kernel-overlayfs creates work_inner during mount with root:root mode
    # 0/0. After unmount it's an orphan that the unit's User= (left4me)
    # cannot traverse via shutil.rmtree, so reset/delete in instances.py
    # blows up with EACCES on `runtime/<name>/work/work`. The helper is
    # the only code path with root that knows about this directory, so
    # the cleanup belongs here. Safe to nuke — the kernel re-creates it
    # on the next mount. Run unconditionally — covers both "we just
    # unmounted" and "previous teardown didn't finish" cases.
    if work_inner.exists():
        shutil.rmtree(work_inner)


def main(argv: list[str]) -> None:
    if len(argv) != 3 or argv[1] not in ("mount", "umount"):
        sys.stderr.write("usage: left4me-overlay mount|umount <name>\n")
        sys.exit(2)
    if argv[1] == "mount":
        cmd_mount(argv[2])
    else:
        cmd_umount(argv[2])


if __name__ == "__main__":
    main(sys.argv)


@ -0,0 +1,82 @@
#!/bin/bash
# Privileged sandbox launcher for left4me script overlays.
#
# Invoked via sudo by the web user with two arguments:
# <overlay_id> numeric overlay id; bind-mounts /var/lib/left4me/overlays/<id>
# read-write at /overlay inside the sandbox.
# <script_path> absolute path to a bash file already written by the web app;
# bind-mounted read-only at /script.sh inside the sandbox.
#
# The script runs as a transient systemd .service with the full hardening
# surface: cgroup limits + walltime kill, NoNewPrivileges, ProtectSystem,
# ProtectHome, kernel-tunable / -module / -log protection, namespace
# restriction, address-family restriction, capability bounding (empty),
# seccomp filter (@system-service @network-io), MemoryDenyWriteExecute,
# LockPersonality, RestrictSUIDSGID. Network namespace is *not* restricted —
# scripts must reach the public internet to download workshop / l4d2center
# / cedapug content. PID namespace is shared with the host (no
# PrivatePID= directive in systemd); host PIDs are visible via /proc but
# not signal-able due to UID mismatch.
set -euo pipefail
[[ $# -eq 2 ]] || { echo "usage: $0 <overlay_id> <script>" >&2; exit 64; }
OVERLAY_ID=$1
SCRIPT=$2
[[ "$OVERLAY_ID" =~ ^[0-9]+$ ]] || { echo "bad overlay id" >&2; exit 64; }
OVERLAY_DIR=/var/lib/left4me/overlays/$OVERLAY_ID
[[ -d $OVERLAY_DIR ]] || { echo "no overlay dir at $OVERLAY_DIR" >&2; exit 65; }
[[ -f $SCRIPT ]] || { echo "no script at $SCRIPT" >&2; exit 65; }
if [[ "${LEFT4ME_SCRIPT_SANDBOX_DRY_RUN:-}" == "1" ]]; then
echo "DRY RUN: overlay_id=$OVERLAY_ID script=$SCRIPT overlay_dir=$OVERLAY_DIR"
exit 0
fi
# Make sure the sandbox UID owns the overlay dir so the script can write there.
# Idempotent: a no-op when the dir is already l4d2-sandbox-owned (re-run case),
# and corrects the ownership the first time the dir was created by the web app
# under the left4me UID. World-readable so the gameserver process (left4me)
# can read the overlay contents via the kernel-overlayfs lowerdir at runtime.
chown -R l4d2-sandbox:l4d2-sandbox "$OVERLAY_DIR"
chmod 0755 "$OVERLAY_DIR"
SCRIPT_RC=0
systemd-run --quiet --collect --wait --pipe \
--unit="left4me-script-${OVERLAY_ID}-$$" \
--slice=l4d2-build.slice \
-p OOMScoreAdjust=500 \
-p User=l4d2-sandbox -p Group=l4d2-sandbox \
-p UMask=0022 \
-p NoNewPrivileges=yes \
-p ProtectSystem=strict -p ProtectHome=yes \
-p PrivateTmp=yes -p PrivateDevices=yes -p PrivateIPC=yes \
-p ProtectKernelTunables=yes -p ProtectKernelModules=yes \
-p ProtectKernelLogs=yes -p ProtectControlGroups=yes \
-p RestrictNamespaces=yes \
-p RestrictAddressFamilies="AF_INET AF_INET6 AF_UNIX" \
-p RestrictSUIDSGID=yes -p LockPersonality=yes \
-p MemoryDenyWriteExecute=yes \
-p SystemCallFilter="@system-service @network-io" \
-p SystemCallArchitectures=native \
-p CapabilityBoundingSet= -p AmbientCapabilities= \
-p IPAddressDeny="127.0.0.0/8 ::1/128 169.254.0.0/16 fe80::/10 224.0.0.0/4 ff00::/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 100.64.0.0/10 fc00::/7" \
-p TemporaryFileSystem="/etc /var/lib" \
-p BindReadOnlyPaths="/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf /etc/ssl /etc/ca-certificates /etc/nsswitch.conf /etc/alternatives ${SCRIPT}:/script.sh" \
-p BindPaths="${OVERLAY_DIR}:/overlay" \
-p WorkingDirectory=/overlay \
-p Environment="HOME=/tmp PATH=/usr/bin:/usr/sbin OVERLAY=/overlay" \
-p MemoryMax=4G -p MemorySwapMax=0 -p TasksMax=512 \
-p CPUQuota=200% -p RuntimeMaxSec=3600 \
-- /bin/bash /script.sh || SCRIPT_RC=$?
# Normalize perms so the web service (left4me uid) can read overlay files
# directly via Python open() — needed by the file tree's download endpoint.
# UMask=0022 above takes care of *new* writes; this catches anything the
# script created with a tighter mode (e.g. cedapug_maps writes its
# .cedapug/manifest.tsv as 0600 by default).
find "$OVERLAY_DIR" -type f ! -perm -o+r -exec chmod o+r {} + 2>/dev/null || true
find "$OVERLAY_DIR" -type d ! -perm -o+rx -exec chmod o+rx {} + 2>/dev/null || true
exit $SCRIPT_RC


@ -0,0 +1,44 @@
#!/bin/sh
set -eu
usage() {
printf '%s\n' "usage: left4me-systemctl enable|disable|show <server-name>" >&2
exit 2
}
validate_name() {
name=$1
[ -n "$name" ] || usage
case "$name" in
.*|*..*|*/*|*\\*) usage ;;
esac
case "$name" in
*[!A-Za-z0-9_.-]*) usage ;;
esac
}
[ "$#" -eq 2 ] || usage
action=$1
name=$2
case "$action" in
enable|disable|show) ;;
*) usage ;;
esac
validate_name "$name"
unit="left4me-server@${name}.service"
if [ -x /bin/systemctl ]; then
systemctl=/bin/systemctl
elif [ -x /usr/bin/systemctl ]; then
systemctl=/usr/bin/systemctl
else
printf '%s\n' 'systemctl not found at /bin/systemctl or /usr/bin/systemctl' >&2
exit 69
fi
case "$action" in
enable) exec "$systemctl" enable --now "$unit" ;;
disable) exec "$systemctl" disable --now "$unit" ;;
show) exec "$systemctl" show --property=ActiveState --property=SubState "$unit" ;;
esac


@ -0,0 +1,17 @@
#!/bin/sh
# Run l4d2web flask CLI commands as the left4me user with the deploy env loaded.
# Usage: left4me <flask-subcommand> [args...]
# Examples:
# left4me create-user alice --admin
# left4me seed-script-overlays /opt/left4me/src/examples/script-overlays
# left4me routes
set -eu
exec sudo -u left4me sh -c '
set -a
. /etc/left4me/host.env
. /etc/left4me/web.env
set +a
export JOB_WORKER_ENABLED=false
export PYTHONPATH=/opt/left4me/src
exec /opt/left4me/.venv/bin/flask --app l4d2web.app:create_app "$@"
' sh "$@"

bundles/left4me/items.py Normal file

@ -0,0 +1,293 @@
# Items for the left4me bundle.
# Systemd units come from metadata via bundles/systemd/ — there are no
# .service or .slice files in this bundle's files/ tree. Cpuset drop-ins
# for system.slice / user.slice are likewise emitted via systemd/units
# in metadata.py (key: '<parent>.d/<basename>.conf').
directories = {
'/opt/left4me': {
'owner': 'left4me',
'group': 'left4me',
},
'/opt/left4me/src': {
'owner': 'left4me',
'group': 'left4me',
},
'/etc/left4me': {
'owner': 'root',
'group': 'root',
'mode': '0755',
},
'/var/lib/left4me': {
# left4me's home dir — useradd creates with 0700; loosen to 0711 so
# l4d2-sandbox can traverse (but not list) for bwrap bind-mounts.
'owner': 'left4me',
'group': 'left4me',
'mode': '0711',
},
'/var/lib/left4me/installation': {'owner': 'left4me', 'group': 'left4me'},
'/var/lib/left4me/overlays': {'owner': 'left4me', 'group': 'left4me'},
'/var/lib/left4me/instances': {'owner': 'left4me', 'group': 'left4me'},
'/var/lib/left4me/runtime': {'owner': 'left4me', 'group': 'left4me'},
'/var/lib/left4me/workshop_cache': {'owner': 'left4me', 'group': 'left4me'},
'/var/lib/left4me/tmp': {'owner': 'left4me', 'group': 'left4me'},
'/opt/left4me/steam': {'owner': 'left4me', 'group': 'left4me'},
'/usr/local/libexec/left4me': {
'owner': 'root',
'group': 'root',
'mode': '0755',
},
}
groups = {
'left4me': {'gid': 980},
'l4d2-sandbox': {'gid': 981},
}
users = {
'left4me': {
'uid': 980,
'gid': 980,
'home': '/var/lib/left4me',
'shell': '/usr/sbin/nologin',
},
'l4d2-sandbox': {
'uid': 981,
'gid': 981,
'shell': '/usr/sbin/nologin',
},
}
# UIDs/GIDs pinned in the system-package range (100-999, per Debian
# policy) so file ownership is deterministic across rebuilds and
# backup restores. 980/981 are unused elsewhere in this repo.
# Privileged helpers (mode 0755 root:root). Listed by sudoers as the only
# commands left4me can invoke as root NOPASSWD.
HELPERS = (
'left4me-systemctl',
'left4me-journalctl',
'left4me-overlay',
'left4me-script-sandbox',
)
files = {
'/usr/local/sbin/left4me': {
'source': 'usr/local/sbin/left4me', # explicit — basename collides with sudoers
'mode': '0755',
'owner': 'root',
'group': 'root',
},
**{
f'/usr/local/libexec/left4me/{h}': {
'source': f'usr/local/libexec/left4me/{h}',
'mode': '0755',
'owner': 'root',
'group': 'root',
}
for h in HELPERS
},
'/etc/left4me/sandbox-resolv.conf': {
'source': 'etc/left4me/sandbox-resolv.conf',
'mode': '0644',
'owner': 'root',
'group': 'root',
},
'/etc/sudoers.d/left4me': {
'source': 'etc/sudoers.d/left4me',
'mode': '0440',
'owner': 'root',
'group': 'root',
'test_with': 'visudo -cf {}',
},
'/etc/sysctl.d/99-left4me.conf': {
'source': 'etc/sysctl.d/99-left4me.conf',
'mode': '0644',
'owner': 'root',
'group': 'root',
'triggers': [
'action:left4me_sysctl_reload',
],
},
'/etc/left4me/host.env': {
'source': 'etc/left4me/host.env.mako',
'content_type': 'mako',
'mode': '0644',
'owner': 'root',
'group': 'root',
},
'/etc/left4me/web.env': {
'source': 'etc/left4me/web.env.mako',
'content_type': 'mako',
'mode': '0640',
'owner': 'root',
'group': 'left4me',
'needs': [
'group:left4me',
],
},
}
actions = {
'left4me_sysctl_reload': {
'command': 'sysctl --system >/dev/null',
'triggered': True,
},
'left4me_dpkg_add_i386_arch': {
# steamcmd is 32-bit and pulls libc6:i386 + lib32z1 from the i386 arch.
# apt-get update is part of this action because newly-added foreign
# archs need a fresh package list before any :i386 package resolves.
'command': 'dpkg --add-architecture i386 && apt-get update',
'unless': 'dpkg --print-foreign-architectures | grep -qx i386',
'cascade_skip': False,
},
'left4me_install_steamcmd': {
# Steam's tarball is rolling with no published checksum, so we can't
# use download: (which requires a hash). Guard with a presence check
# on steamcmd.sh — steamcmd self-updates at runtime, so chasing the
# tarball version from bw isn't useful.
'command': (
'sudo -u left4me sh -c "'
'cd /opt/left4me/steam && '
'curl -fsSL https://media.steampowered.com/installer/steamcmd_linux.tar.gz | '
'tar -xz'
'"'
),
'unless': 'test -x /opt/left4me/steam/steamcmd.sh',
'cascade_skip': False,
'needs': [
'directory:/opt/left4me/steam',
'pkg_apt:curl',
'pkg_apt:libc6_i386', # bw pkg_apt convention: _ → :
'pkg_apt:lib32z1',
'user:left4me',
],
},
}
# steamcmd is invoked by absolute path (LEFT4ME_STEAMCMD in host.env),
# not via PATH lookup — see l4d2host/cli.py:install. We don't need to put
# anything in /usr/local/bin for it.
git_deploy = {
'/opt/left4me/src': {
'repo': node.metadata.get('left4me/git_url'),
'rev': node.metadata.get('left4me/git_branch'),
'triggers': [
# On a code-update apply, refresh the DB schema. pip_install also
# triggers alembic, but wiring git_deploy → alembic directly ensures
# new migrations land whenever new code lands, however pip_install
# is gated. alembic upgrade head is idempotent (no-op when already
# at head), so this is safe to fire on every code update; the
# seed_overlays + service:restart cascade off alembic also covers
# picking up the new code in gunicorn.
'action:left4me_alembic_upgrade',
],
# chown_src and pip_install are NOT in triggers: they run every
# apply (chown_src gated by its own `unless` guard), which makes the
# chain self-healing after a partial failure. (Items in a triggers
# list must be triggered:True, which would lose that property.)
},
}
actions['left4me_chown_src'] = {
# Runs every apply (cheap — chown -R on a small tree). Self-heals
# whenever git_deploy extracts a new tarball as root-owned files.
# Not in any triggers list so doesn't need triggered:True.
'command': 'chown -R left4me:left4me /opt/left4me/src',
'unless': 'test -z "$(find /opt/left4me/src \\! -user left4me -print -quit 2>/dev/null)"',
'cascade_skip': False,
'needs': [
'git_deploy:/opt/left4me/src',
'user:left4me',
'group:left4me',
],
}
actions['left4me_create_venv'] = {
'command': 'sudo -u left4me /usr/bin/python3 -m venv /opt/left4me/.venv',
'unless': 'test -x /opt/left4me/.venv/bin/python',
'cascade_skip': False,
'needs': [
'directory:/opt/left4me',
'pkg_apt:python3-venv',
'user:left4me',
],
'triggers': [
'action:left4me_pip_upgrade',
],
}
actions['left4me_pip_upgrade'] = {
'command': 'sudo -u left4me /opt/left4me/.venv/bin/python -m pip install --upgrade pip',
'triggered': True,
'cascade_skip': False,
'needs': [
'pkg_apt:python3-pip',
],
# No triggers: pip_install runs on every apply (it has no `unless`
# gate) rather than being chained from here. Keeps pip_upgrade scoped
# to exactly its purpose.
}
actions['left4me_pip_install'] = {
# Single pip invocation installs both editable packages from the same
# checkout. Runs on every apply: pip install -e is fast on no-op, and
# any gate weaker than "egg-info matches pyproject.toml" can mask
# script regeneration — e.g. adding [project.scripts] later wouldn't
# be picked up if `unless` only checks importability.
'command': 'sudo -u left4me /opt/left4me/.venv/bin/pip install -e /opt/left4me/src/l4d2host -e /opt/left4me/src/l4d2web',
'cascade_skip': False,
'needs': [
'git_deploy:/opt/left4me/src',
'action:left4me_create_venv',
'action:left4me_chown_src',
],
'triggers': [
'action:left4me_alembic_upgrade',
],
}
actions['left4me_alembic_upgrade'] = {
# Mirrors deploy-test-server.sh:239-242. Runs as left4me with both env
# files sourced; JOB_WORKER_ENABLED=false so a stray worker doesn't race
# with the migration.
'command': (
'sudo -u left4me sh -c "'
'cd /opt/left4me/src/l4d2web && '
'set -a && . /etc/left4me/host.env && . /etc/left4me/web.env && set +a && '
'env JOB_WORKER_ENABLED=false PYTHONPATH=/opt/left4me/src '
'/opt/left4me/.venv/bin/alembic -c /opt/left4me/src/l4d2web/alembic.ini upgrade head'
'"'
),
'triggered': True,
'cascade_skip': False,
'needs': [
'action:left4me_pip_install',
'file:/etc/left4me/host.env',
'file:/etc/left4me/web.env',
],
'triggers': [
'action:left4me_seed_overlays',
'svc_systemd:left4me-web.service:restart',
],
}
actions['left4me_seed_overlays'] = {
# Idempotent: refreshes script bodies in place; existing overlay rows keep their ids.
'command': (
'sudo -u left4me sh -c "'
'set -a && . /etc/left4me/host.env && . /etc/left4me/web.env && set +a && '
'env JOB_WORKER_ENABLED=false PYTHONPATH=/opt/left4me/src '
'/opt/left4me/.venv/bin/flask --app l4d2web.app:create_app '
'seed-script-overlays /opt/left4me/src/examples/script-overlays'
'"'
),
'triggered': True,
'cascade_skip': False,
'needs': [
'action:left4me_alembic_upgrade',
],
}
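
The `needs` edges between the actions above fix the order in which they can run. As a rough illustration only (this is a plain topological sort from the Python standard library, not bundlewrap's own resolver, and `apply_order` is a hypothetical name), the chain can be sketched as:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# item -> items it `needs`, copied from the actions above. Only the
# action/git_deploy edges are kept; package, user, directory and file
# items are left out to keep the sketch short.
NEEDS = {
    'action:left4me_chown_src': {'git_deploy:/opt/left4me/src'},
    'action:left4me_pip_install': {
        'git_deploy:/opt/left4me/src',
        'action:left4me_create_venv',
        'action:left4me_chown_src',
    },
    'action:left4me_alembic_upgrade': {'action:left4me_pip_install'},
    'action:left4me_seed_overlays': {'action:left4me_alembic_upgrade'},
}


def apply_order(needs):
    """Return one valid execution order in which every item follows its needs."""
    return list(TopologicalSorter(needs).static_order())


order = apply_order(NEEDS)
for item in order:
    print(item)
```

Several orders are valid (for example, `create_venv` and `chown_src` are independent of each other), so the sketch only guarantees that each item comes after everything it needs.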

275
bundles/left4me/metadata.py Normal file

@ -0,0 +1,275 @@
assert node.has_bundle('nftables')
assert node.has_bundle('systemd')
defaults = {
'left4me': {
# Application-wide defaults; node only overrides if it really needs to.
'git_url': 'https://git.sublimity.de/cronekorkn/left4me.git',
'git_branch': 'master',
'secret_key': repo.vault.random_bytes_as_base64_for(f'{node.name} left4me secret_key', length=32).value,
'gunicorn_workers': 1,
'gunicorn_threads': 32,
'job_worker_threads': 4,
# Whole 27000-block: covers Steam's defaults (27015 game, 27005
# client/RCON) plus headroom for ad-hoc ports without further
# nftables changes. Mirrored into LEFT4ME_PORT_RANGE_{START,END}
# by web.env.mako and into the nftables input rule by the
# nftables_input reactor below.
'port_range_start': 27000,
'port_range_end': 27999,
},
'apt': {
'packages': {
'p7zip-full': {},
'nftables': {},
'iproute2': {},
'curl': {},
'ca-certificates': {},
'python3': {},
'python3-venv': {},
'python3-pip': {},
'python3-dev': {},
# steamcmd is a 32-bit ELF; needs i386 multiarch + these libs.
# `_` → `:` is bundlewrap's pkg_apt convention for multiarch
# names (see pkg_apt.py:48).
'libc6_i386': { # installs libc6:i386
'needs': ['action:left4me_dpkg_add_i386_arch'],
},
'lib32z1': {
'needs': ['action:left4me_dpkg_add_i386_arch'],
},
},
},
'nftables': {
# Match deploy/files/usr/local/lib/left4me/nft/left4me-mark.nft.
# Mark srcds UDP egress (uid left4me) with DSCP EF + skb priority 6
# so CAKE classifies it into the priority tin.
'output': {
'meta skuid "left4me" meta l4proto udp ip dscp set ef meta priority set 0006:0000',
'meta skuid "left4me" meta l4proto udp ip6 dscp set ef meta priority set 0006:0000',
},
},
'systemd': {
'services': {
'left4me-web.service': {
'enabled': True,
'running': True,
'needs': [
'action:left4me_alembic_upgrade',
'file:/etc/left4me/host.env',
'file:/etc/left4me/web.env',
],
},
# Note: left4me-server@.service is a TEMPLATE — instances are
# started on-demand by the web app via the left4me-systemctl
# helper. Don't enable/start it from here.
# The slices are installed (file present) but don't need
# enable/start — they're activated implicitly when a unit
# uses Slice=.
},
},
'backup': {
# Application-owned paths. Set-merged with backup group / node-level paths.
'paths': {
'/var/lib/left4me',
'/etc/left4me',
},
},
}
@metadata_reactor.provides(
'nginx/vhosts',
)
def nginx_vhosts(metadata):
    # letsencrypt/domains and monitoring/services for the vhost are auto-
    # populated by bundles/nginx/metadata.py. We just declare check_path:
    # '/health' so the auto-check hits the Flask health endpoint, not '/'.
    domain = metadata.get('left4me/domain')
    return {
        'nginx': {
            'vhosts': {
                domain: {
                    'content': 'nginx/proxy_pass.conf',
                    'context': {
                        'target': 'http://127.0.0.1:8000',
                    },
                    'check_path': '/health',
                },
            },
        },
    }
@metadata_reactor.provides(
'nftables/input',
)
def nftables_input(metadata):
    port_start = metadata.get('left4me/port_range_start')
    port_end = metadata.get('left4me/port_range_end')
    return {
        'nftables': {
            'input': {
                f'udp dport {port_start}-{port_end} accept',
                f'tcp dport {port_start}-{port_end} accept',
            },
        },
    }
@metadata_reactor.provides(
'systemd/units',
)
def systemd_units(metadata):
    workers = metadata.get('left4me/gunicorn_workers')
    threads = metadata.get('left4me/gunicorn_threads')
    # cgroup-v2 cpuset. `system_cpus` (set of int CPU ids, declared per
    # node) pins system/user/build; the complement pins l4d2-game. On HT
    # hosts, list both siblings of a physical core so games don't share
    # L1/L2 with system work — pairings via
    # /sys/devices/system/cpu/cpu<n>/topology/thread_siblings_list.
    vm_threads = metadata.get('vm/threads', metadata.get('vm/cores'))
    all_cpus = set(range(vm_threads))
    system_cpus = metadata.get('left4me/system_cpus')
    if not system_cpus <= all_cpus:
        raise Exception(
            f'left4me/system_cpus={sorted(system_cpus)} on {vm_threads}-thread host '
            f'includes CPUs outside [0, {vm_threads})'
        )
    game_cpus = all_cpus - system_cpus
    if not game_cpus:
        raise Exception(
            f'left4me/system_cpus={sorted(system_cpus)} on {vm_threads}-thread host '
            f'leaves no cores for games'
        )
    system_cpus_string = ','.join(str(t) for t in sorted(system_cpus))
    game_cpus_string = ','.join(str(t) for t in sorted(game_cpus))
    # Drop-in for upstream system.slice / user.slice (units we don't own).
    # Same '<parent>.d/<basename>.conf' convention as nginx and autologin.
    cpuset_dropin = {'Slice': {'AllowedCPUs': system_cpus_string}}
    return {
'systemd': {
'units': {
'left4me-web.service': {
'Unit': {
'Description': 'left4me web application',
'After': 'network-online.target',
'Wants': 'network-online.target',
},
'Service': {
'Type': 'simple',
'User': 'left4me',
'Group': 'left4me',
'WorkingDirectory': '/opt/left4me/src',
'Environment': {
'HOME=/var/lib/left4me',
'PATH=/opt/left4me/.venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin',
},
'EnvironmentFile': (
'/etc/left4me/host.env',
'/etc/left4me/web.env',
),
'ExecStart': (
'/opt/left4me/.venv/bin/gunicorn '
f'--workers {workers} --threads {threads} '
"--bind 127.0.0.1:8000 'l4d2web.app:create_app()'"
),
'Restart': 'on-failure',
'RestartSec': '3',
# NoNewPrivileges intentionally NOT set: workers sudo to the helpers.
'ProtectSystem': 'full',
'ReadWritePaths': '/var/lib/left4me',
'PrivateTmp': 'true',
},
'Install': {
'WantedBy': {'multi-user.target'},
},
},
'left4me-server@.service': {
'Unit': {
'Description': 'left4me server instance %i',
'After': 'network-online.target',
'Wants': 'network-online.target',
'StartLimitBurst': '5',
'StartLimitIntervalSec': '60s',
},
'Service': {
'Type': 'simple',
'User': 'left4me',
'Group': 'left4me',
'EnvironmentFile': (
'/etc/left4me/host.env',
'/var/lib/left4me/instances/%i/instance.env',
),
'WorkingDirectory': '-/var/lib/left4me/runtime/%i/merged/left4dead2',
'ExecStartPre': (
'+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- '
'/usr/local/libexec/left4me/left4me-overlay mount %i'
),
'ExecStart': (
'/var/lib/left4me/runtime/%i/merged/srcds_run '
'-game left4dead2 +hostport ${L4D2_PORT} $L4D2_ARGS'
),
'ExecStopPost': (
'+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- '
'/usr/local/libexec/left4me/left4me-overlay umount %i'
),
'Restart': 'on-failure',
'RestartSec': '5',
'Slice': 'l4d2-game.slice',
'Nice': '-5',
'IOSchedulingClass': 'best-effort',
'IOSchedulingPriority': '4',
'OOMScoreAdjust': '-200',
'MemoryHigh': '1.5G',
'MemoryMax': '2G',
'TasksMax': '256',
'LimitNOFILE': '65536',
'KillSignal': 'SIGINT',
'TimeoutStopSec': '15s',
'LogRateLimitIntervalSec': '0',
'NoNewPrivileges': 'true',
'PrivateTmp': 'true',
'PrivateDevices': 'true',
'ProtectHome': 'true',
'ProtectSystem': 'strict',
'ReadOnlyPaths': '/var/lib/left4me/installation /var/lib/left4me/overlays',
'ReadWritePaths': '/var/lib/left4me/runtime/%i',
'RestrictSUIDSGID': 'true',
'LockPersonality': 'true',
},
'Install': {
'WantedBy': {'multi-user.target'},
},
},
'l4d2-game.slice': {
'Unit': {
'Description': 'left4me game-server slice',
'Before': 'slices.target',
},
'Slice': {
'CPUWeight': '1000',
'IOWeight': '1000',
'AllowedCPUs': game_cpus_string,
},
},
'l4d2-build.slice': {
'Unit': {
'Description': 'left4me script-sandbox build slice',
'Before': 'slices.target',
},
'Slice': {
'CPUWeight': '10',
'IOWeight': '10',
'AllowedCPUs': system_cpus_string,
},
},
'system.slice.d/99-left4me-cpuset.conf': cpuset_dropin,
'user.slice.d/99-left4me-cpuset.conf': cpuset_dropin,
},
},
}
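
The cpuset split in `systemd_units` can be tried outside bundlewrap. The following is a minimal standalone sketch, assuming an 8-thread host for the example call (the real thread count of `ovh.left4me` is not stated here); `parse_cpu_list` and `split_cpus` are hypothetical helper names, and `ValueError` stands in for the reactor's generic `Exception`:

```python
def parse_cpu_list(text):
    """Parse a kernel cpulist string such as '0,4' or '0-1,4-5' into a set of ints.

    This is the format found in
    /sys/devices/system/cpu/cpu<n>/topology/thread_siblings_list.
    """
    cpus = set()
    for part in text.strip().split(','):
        if not part:
            continue
        if '-' in part:
            first, last = part.split('-')
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def split_cpus(thread_count, system_cpus):
    """Return (system, game) AllowedCPUs= strings; game is the complement."""
    all_cpus = set(range(thread_count))
    system_cpus = set(system_cpus)
    if not system_cpus <= all_cpus:
        raise ValueError(f'{sorted(system_cpus)} includes CPUs outside [0, {thread_count})')
    game_cpus = all_cpus - system_cpus
    if not game_cpus:
        raise ValueError(f'{sorted(system_cpus)} leaves no cores for games')

    def fmt(cpus):
        return ','.join(str(c) for c in sorted(cpus))

    return fmt(system_cpus), fmt(game_cpus)


# Both HT siblings of physical core 0, as they might appear in sysfs.
siblings_of_cpu0 = parse_cpu_list('0,4\n')
system_string, game_string = split_cpus(8, siblings_of_cpu0)
print(system_string)
print(game_string)
```

Feeding the sibling list straight from sysfs into `system_cpus` is one way to avoid pinning only half of a physical core to the system slices.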


@ -1,9 +1,60 @@
-https://github.com/dehydrated-io/dehydrated/wiki/example-dns-01-nsupdate-script
+# letsencrypt
Issues and renews Let's Encrypt certs via [dehydrated][upstream] with
DNS-01 against the in-house bind-acme server.
[upstream]: https://github.com/dehydrated-io/dehydrated/wiki/example-dns-01-nsupdate-script
## First-apply behaviour
Immediately after `bw apply <node>`, nginx serves a **self-signed
cert** for each declared domain — generated by
`/etc/dehydrated/letsencrypt-ensure-some-certificate` so nginx has
something to start with. The real Let's Encrypt cert arrives at most
24h later when the systemd timer fires
(`/usr/bin/dehydrated --cron --accept-terms --challenge dns-01`). To
shortcut the wait:
```sh
ssh <node> 'sudo /usr/bin/dehydrated --cron --accept-terms --challenge dns-01'
ssh <node> 'sudo systemctl reload nginx'
```
## DNS-01 prerequisites
`hook.sh` does `nsupdate` against the bind-acme server (referenced
by `letsencrypt/acme_node`). For the challenge to succeed:
1. The acme node must be in the same metadata graph (so
`bw metadata <node> -k letsencrypt/acme_node` resolves).
2. **All NS servers** for the validated domain must serve the
`_acme-challenge.<domain>` CNAME — Let's Encrypt validates from
primary AND secondary geographic regions; both authoritative
servers must agree. If a secondary NS is also a bw-managed node,
`bw apply` it after adding the domain (see e.g. `ovh.secondary`).
3. The bind-acme node's TSIG key must be reachable. `hook.sh` is
rendered with the bind-acme server's `network/internal/ipv4`;
for clients outside that LAN, the route must exist (typically via
wireguard `s2s` peer membership).
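
Prerequisite 2 can be checked by hand before triggering an issuance. The sketch below only builds the `dig` command lines, one per authoritative server; it does not run them. The function name and the nameserver hostnames are placeholders, not values from this repo:

```python
import shlex


def acme_challenge_queries(domain, nameservers):
    """Build one `dig` argv per nameserver for the _acme-challenge CNAME."""
    record = f'_acme-challenge.{domain}'
    return [['dig', '+short', f'@{ns}', record, 'CNAME'] for ns in nameservers]


# Hypothetical primary and secondary NS for the validated domain.
queries = acme_challenge_queries('ckn.li', ['ns1.example.org', 'ns2.example.org'])
for argv in queries:
    print(shlex.join(argv))
```

Every printed command should return the same CNAME target when run; a server that returns nothing is the one that still needs a `bw apply`.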
## Negative-cache penalty
If the first DNS-01 attempt fails (e.g. zone not yet applied to the
secondary NS), Let's Encrypt's resolvers cache NXDOMAIN for the SOA's
negative TTL (often 900s = 15 min). Subsequent attempts during that
window also fail and refresh the cache. Combined with LE's rate limit
of **5 failed authorisations per account, per hostname, per hour**, recovery requires
you to **stop retrying** for ~15 minutes after fixing the DNS, then
make at most one attempt.
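
The waiting rule is simple arithmetic. A small sketch, following the model described above (the negative cache is counted from the last failed attempt, default TTL 900s, limit window one hour); the function names and the timestamps are illustrative assumptions:

```python
from datetime import datetime, timedelta, timezone


def earliest_retry(last_failed_attempt, negative_ttl_seconds=900):
    """Earliest time at which the cached NXDOMAIN should have expired."""
    return last_failed_attempt + timedelta(seconds=negative_ttl_seconds)


def failures_in_window(failed_attempts, now, window=timedelta(hours=1)):
    """Count failed attempts that still fall inside the rate-limit window."""
    return sum(1 for attempt in failed_attempts if now - attempt < window)


noon = datetime(2026, 5, 11, 12, 0, tzinfo=timezone.utc)
failed = [noon - timedelta(minutes=20), noon - timedelta(minutes=5), noon]

retry_at = earliest_retry(failed[-1])
used = failures_in_window(failed, now=retry_at)
print(retry_at.isoformat(), f'{used} of 5 failures used')
```

With three failures already inside the hour, only two more mistakes are affordable, which is why a single, well-timed attempt is the safer move.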
## nsupdate sample
For interactive testing of the bind-acme TSIG path:
```sh
printf "server 127.0.0.1
zone acme.resolver.name.
-update add _acme-challenge.ckn.li.acme.resolver.name. 600 IN TXT "hello"
+update add _acme-challenge.ckn.li.acme.resolver.name. 600 IN TXT \"hello\"
send
" | nsupdate -y hmac-sha512:acme:XXXXXX
```


@ -2,7 +2,7 @@ defaults = {
'apt': {
'packages': {
'dehydrated': {},
-'dnsutils': {},
+'bind9-dnsutils': {},
},
},
'letsencrypt': {

36
bundles/nginx/README.md Normal file

@ -0,0 +1,36 @@
# nginx
Webserver. Per-node vhosts in `nginx/vhosts`; per-vhost templates in
`data/nginx/*.conf`.
## How port 80 is served
The bundle ships a fixed `80.conf` to
`/etc/nginx/sites-available/80.conf` (picked up by the
`sites-enabled/` symlink) that handles **all** port-80 traffic
across vhosts:
1. ACME HTTP-01 challenges (`/.well-known/acme-challenge/`) are
served from `/var/lib/dehydrated/acme-challenges/`.
2. All other port-80 requests are 301-redirected to
`https://$host$request_uri`.
Per-vhost templates only declare `listen 443 ssl http2;`, so they
don't need their own port-80 server blocks. If you need vhost-
specific port-80 behaviour (e.g. plain-HTTP without redirect),
override 80.conf or add a per-vhost block.
## Required metadata
- `vm/cores` — read directly by `items.py` for `worker_processes`.
No default; `bw items <node>` raises at item-build time if missing.
Typically supplied by the `vm` bundle / hetzner-vm group; double-
check on bare-metal hosts.
- `nginx/vhosts` — dict of vhost-name → vhost-config.
- `nginx/modules` — list of dynamic modules to load.
## Cross-namespace
`items.py` reads `letsencrypt/domains` to skip emitting a per-vhost
HTTPS block when LE hasn't declared the domain yet — keeps the
bundle loadable on a node where letsencrypt isn't fully wired up.


@ -32,12 +32,13 @@ http {
% endif
-% if has_websockets:
+# Always defined: serves both WS-enabled vhosts (Connection: upgrade for
+# ws clients) and SSE/keep-alive vhosts (Connection: "" lets nginx manage
+# the upstream connection for keep-alive, instead of forcing "close").
map $http_upgrade $connection_upgrade {
default upgrade;
-'' close;
+'' '';
}
-% endif
include /etc/nginx/sites-enabled/*;
}


@ -64,7 +64,7 @@ files = {
'svc_systemd:nginx:restart',
},
},
-'/etc/nginx/sites/80.conf': {
+'/etc/nginx/sites-available/80.conf': {
'triggers': {
'svc_systemd:nginx:restart',


@ -33,7 +33,7 @@ for name, unit in node.metadata.get('systemd/units').items():
'svc_systemd:systemd-networkd.service:restart',
],
}
-elif extension in ['timer', 'service', 'mount', 'swap', 'target']:
+elif extension in ['timer', 'service', 'mount', 'swap', 'target', 'slice']:
path = f'/usr/local/lib/systemd/system/{name}'
dependencies = {
'triggers': [


@ -8,10 +8,16 @@ server {
location / {
proxy_set_header X-Real-IP $remote_addr;
-% if websockets:
+# Always set Upgrade + Connection via the $connection_upgrade map:
+# WS client (Upgrade header sent) -> Connection: upgrade
+# non-WS client (no Upgrade) -> Connection: "" (keep-alive)
+# Lets every vhost serve both WS and SSE without per-vhost flags.
+proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
-% endif
+# SSE-safe pass-through (also fine for non-SSE traffic):
+proxy_buffering off;
+proxy_read_timeout 1h;
proxy_pass ${target};
}
}


@ -48,3 +48,51 @@ instead.
See [`conventions.md#secrets`](conventions.md#secrets) for the
demagify magic-string list and the rule's full rationale.
## Read-only commands — useful flag combinations
The fork's [`AGENTS.md`][fork] documents the canonical safety envelope.
These are the flag combinations agents reach for most often in this repo:
| Want to … | Run |
|---|---|
| Sanity-check the whole repo (parse + cross-cutting hooks) | `bw test` (defaults to `-HIJKMSp`) |
| Exercise reactors and item-graph for one node | `bw test <node>` (defaults to `-IJKMp`) |
| Same, but every node that has a given bundle | `bw test bundle:<name>` |
| Print one metadata key for one node | `bw metadata <node> -k <a/b>` (repeat `-k` for more) |
| Show where each metadata value comes from | `bw metadata <node> -b` |
| Resolve Faults (vault values) into the dump | `bw metadata <node> -f` (**may print secrets, avoid**) |
| List a node's items, with the bundle that defines each | `bw items <node> --blame` |
| Preview a rendered file's content | `bw items <node> file:<path> -f` |
| Verify against the live host, scoped to one bundle | `bw verify <node> -o bundle:<name>` |
| Hash metadata only (faster than full config hash) | `bw hash <node> -m` |
| Inspect the data backing a hash | `bw hash <node> -d` |
`bw test`, `bw verify`, `bw nodes`, `bw metadata` all share a target-
selector grammar: bare node name, group name, `bundle:<name>`,
`!bundle:<name>`, or `"lambda:node.metadata_get('foo/bar', 0) < 3"`.
[fork]: https://github.com/CroneKorkN/bundlewrap/blob/main/AGENTS.md
## Bundle-validation workflow
`bw test` (no args) is a *parsing* gate, not a *behaviour* gate. It
loads every bundle, but a bundle's reactors only resolve when a node's
metadata is actually built — and that happens only for nodes that
opt in. Until then, reactor bugs stay dormant. bw rejects reactors
that don't read any metadata, but the rejection only fires once *some*
node consumes the bundle.
When developing a new bundle:
1. Scaffold + `bw test` — confirms parsing.
2. **Attach the bundle to one node** (or a stub node) by adding it to
`nodes/<n>.py`'s `bundles` list, or to a group the node is in.
3. `bw test <node>` — now reactors fire. This is where bundle bugs
surface.
4. `bw items <node> --blame` and `bw metadata <node> -k <key>`
confirm items materialise and derived metadata looks right.
5. `bw hash <node>` — preview against the live host.
Step 2 is non-optional. A bundle that "passes `bw test`" with no
consumer is proven only to parse.
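
Step 2 might look like the following for the `left4me` bundle. This is a sketch under assumptions: the node name, domain and thread count are placeholders, and the exact layout of node files in this repo may differ from a plain dict. The metadata keys are the ones the bundle's reactors read.

```python
# Hypothetical node definition; node name, domain and CPU count are placeholders.
nodes = {
    'example.left4me': {
        'bundles': [
            'nftables',  # asserted at the top of bundles/left4me/metadata.py
            'systemd',   # asserted at the top of bundles/left4me/metadata.py
            'left4me',   # the bundle under test; reactors only fire once it is listed
        ],
        'metadata': {
            'vm': {'threads': 8},
            'left4me': {
                'domain': 'left4me.example.org',
                'system_cpus': {0, 4},
            },
        },
    },
}

for name, node in nodes.items():
    print(name, sorted(node['bundles']))
```

Only the per-host knobs appear in the node; everything else comes from the bundle's `defaults` and reactors.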


@ -127,6 +127,12 @@ bundle.
## 3. Per-bundle `AGENTS.md` template
> **Status: replaced — pre-pivot intent only.** Per-bundle docs are plain
> `README.md` with no fixed structure. See §0 Revisions and the
> "Per-bundle README" section in [`bundles/AGENTS.md`](../../../bundles/AGENTS.md)
> for the current convention. The template below is kept as a record of
> the original design.
One balanced doc serving both audiences. Prose where prose helps, structure
where structure helps. Sections in order:
@ -339,6 +345,12 @@ in 30–120 lines each; root `AGENTS.md` is ~150 lines.
### Phase 2 — seed bundles (10)
> **Status: dropped — pre-pivot intent only.** Phase 2 didn't ship. After
> Phase 1 landed, the maintainer pulled the per-bundle `AGENTS.md`
> migration: the rigid template proved a poor fit for the heterogeneous
> existing READMEs. See §0 Revisions. The seed list and migration plan
> below are kept as a record of how the work was scoped.
Bundles selected empirically (node+group references and recent commit
activity, validated 2026-05-10):


@ -0,0 +1,253 @@
# Round 1 — agent-doc refactor (gaps 1–6 + cmd cheat sheet)
## Why
A previous session integrated `bundles/left4me/` and brought
`ovh.left4me` live. The integration produced a handoff (at
`~/.claude/plans/2026-05-10-ckn-bw-docs-improvements-handoff.md`)
listing 12 documentation gaps surfaced by the work. This spec covers
the first six (the cross-cutting ones) plus a useful side-quest:
adding a read-only command cheat sheet to `docs/agents/commands.md`.
Gaps 7–12 (item-specific, bundle READMEs) are deferred to a follow-up
round.
## Scope
In:
- Gap 1 — drop `bw bundles` (doesn't exist), add `bw verify` to the
read-only allowlist.
- Gap 2 — bundle-validation workflow needs a node attached.
- Gap 3 — nodes carry only node-specific metadata (split across
`bundles/AGENTS.md` and `nodes/AGENTS.md`).
- Gap 4 — reactors must read metadata or be defaults.
- Gap 5 — `triggers` → `triggered: True` invariant + self-healing
pattern.
- Gap 6 — `unless` semantics (folded into Gap 5's second bullet).
- Side-quest: read-only command cheat sheet in `commands.md` (`bw
test` flag matrix + selectors, `bw metadata -k/-b/-f`, `bw items
--blame/-f`, `bw verify -o bundle:`, `bw hash -m/-d`).
Out:
- Gaps 7–12 (`source` implicit, `git_deploy` chown, `git_deploy` URL
form, letsencrypt/bind/nginx READMEs).
- Any change to bundle behaviour. This is pure docs; if a doc claim
feels wrong, push back to the maintainer rather than editing
`.py`.
## Verification approach
For each gap, find current line numbers in the target doc (handoff
line numbers are May 2026; some have drifted). Verify code-level
claims against the fork source under `.venv/src/bundlewrap/` before
quoting them.
Already verified during brainstorm:
- Gap 1: `bw bundles` is not a subcommand of the installed fork
(`.venv/bin/bw --help` lists only
`apply, debug, diff, groups, hash, ipmi, items, lock, metadata,
nodes, plot, pw, repo, run, stats, test, verify, zen`). `bw verify`
is read-only.
- Gap 2: `bw test` default flag set differs by mode. Whole-repo:
`-HIJKMSp`. Node-targeted: `-IJKMp`. Repo mode adds `-H`
(repo hooks) and `-S` (subgroup loops) to the node-mode set; `-J`
(node hooks) is part of both. Reactors only resolve when a node's metadata is
built, which only happens when a node opts into the bundle.
- Gap 4: exact wording at `metagen.py:428`:
`"{reactor_name} on {node_name} did not request any metadata, you
might want to use defaults instead"`.
- Gap 5: exact wording at `deps.py:340`:
`"'{item1}' in bundle '{bundle1}' triggered by '{item2}' in bundle
'{bundle2}', but missing 'triggered' attribute"`.
- Gap 3 precedent: `bundles/left4me/metadata.py:10` is the canonical
random-bytes-in-defaults example. `bundles/postgresql/metadata.py:4`
is the password_for-at-module-scope example. (The handoff cites
postgresql for the random-bytes pattern; that's a misattribution —
postgresql uses `password_for`.)
After every commit: `.venv/bin/bw test` must pass with the same
output as before. Pure-docs edits cannot break it unless a `.py` is
touched accidentally.
## Commits
Six iterative commits, matching repo style.
### Commit 1 — drop `bw bundles`, add `bw verify` (Gap 1)
`AGENTS.md` rule 1 only. The handoff also flagged
`bundles/AGENTS.md:60-64`, but that list no longer references
`bw bundles` (it currently reads `bw test` / `bw items` / `bw hash`).
That section gets rewritten in commit 3, not here.
```diff
- to `bw test`, `bw nodes`, `bw groups`, `bw bundles`,
- `bw items`, `bw metadata`, `bw hash`, `bw debug`. See
+ to `bw test`, `bw nodes`, `bw groups`, `bw items`,
+ `bw metadata`, `bw hash`, `bw verify`, `bw debug`. See
```
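
The allowlist after this change can be written down as a small guard. This is an illustrative sketch only: `is_read_only` is a hypothetical helper, not part of bundlewrap or this repo, and it assumes the subcommand is the first argument after `bw`.

```python
# Subcommands from the revised rule 1 (after dropping `bw bundles` and
# adding `bw verify`).
READ_ONLY_SUBCOMMANDS = frozenset({
    'test', 'nodes', 'groups', 'items', 'metadata', 'hash', 'verify', 'debug',
})


def is_read_only(argv):
    """Return True if argv is a `bw` call whose subcommand is on the allowlist."""
    if len(argv) < 2 or argv[0] != 'bw':
        return False
    return argv[1] in READ_ONLY_SUBCOMMANDS


for command in (['bw', 'verify', 'ovh.left4me'], ['bw', 'apply', 'ovh.left4me'], ['bw', 'bundles']):
    print(' '.join(command), '->', 'allowed' if is_read_only(command) else 'blocked')
```

A check on the subcommand alone does not look at flags, so it would not catch cases such as `bw metadata <node> -f`, which can print secrets.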
### Commit 2 — read-only command cheat sheet
Append to `docs/agents/commands.md`. New H2 section, table format
to match the existing voice.
```markdown
## Read-only commands — useful flag combinations
The fork's [`AGENTS.md`][fork] documents the canonical safety envelope.
These are the flag combinations agents reach for most often in this repo:
| Want to … | Run |
|---|---|
| Sanity-check the whole repo (parse + cross-cutting hooks) | `bw test` (defaults to `-HIJKMSp`) |
| Exercise reactors and item-graph for one node | `bw test <node>` (defaults to `-IJKMp`) |
| Same, but every node that has a given bundle | `bw test bundle:<name>` |
| Print one metadata key for one node | `bw metadata <node> -k <a/b>` (repeat `-k` for more) |
| Show where each metadata value comes from | `bw metadata <node> -b` |
| Resolve Faults (vault values) into the dump | `bw metadata <node> -f` (**may print secrets, avoid**) |
| List a node's items, with the bundle that defines each | `bw items <node> --blame` |
| Preview a rendered file's content | `bw items <node> file:<path> -f` |
| Verify against the live host, scoped to one bundle | `bw verify <node> -o bundle:<name>` |
| Hash metadata only (faster than full config hash) | `bw hash <node> -m` |
| Inspect the data backing a hash | `bw hash <node> -d` |
`bw test`, `bw verify`, `bw nodes`, `bw metadata` all share a target-
selector grammar: bare node name, group name, `bundle:<name>`,
`!bundle:<name>`, or `"lambda:node.metadata_get('foo/bar', 0) < 3"`.
[fork]: https://github.com/CroneKorkN/bundlewrap/blob/main/AGENTS.md
```
### Commit 3 — bundle validation needs a node attached (Gap 2)
Two file changes.
**`bundles/AGENTS.md` lines 59-64** — replace the Verify list:
```markdown
5. **Verify, in this order:**
- `bw test` — repo-wide parse + cross-cutting hooks. Loads every
bundle, but reactors don't fire for nodes that haven't opted into
the bundle yet — bugs in new reactors stay hidden here.
- **Attach the bundle to a node** (via the node's `bundles` list, or
a group it belongs to). Until you do, the next steps don't actually
exercise the bundle.
- `bw test <node>` — exercises every reactor and item-graph edge for
that node. This is where most new-bundle bugs surface.
- `bw items <node> --blame` — confirm items materialise with the right
paths, authored by the expected bundle.
- `bw metadata <node> -k <a/b>` — spot-check derived metadata.
- `bw hash <node>` — preview vs current host state.
See [`docs/agents/commands.md#bundle-validation-workflow`](../docs/agents/commands.md#bundle-validation-workflow)
for the rationale.
```
**`docs/agents/commands.md`** — new section after the cheat sheet:
```markdown
## Bundle-validation workflow
`bw test` (no args) is a *parsing* gate, not a *behaviour* gate. It
loads every bundle, but a bundle's reactors only resolve when a node's
metadata is actually built — and that happens only for nodes that
opt in. Until then, reactor bugs stay dormant. bw rejects reactors that
don't read any metadata, but the rejection only fires once *some* node
consumes the bundle.
When developing a new bundle:
1. Scaffold + `bw test` — confirms parsing.
2. **Attach the bundle to one node** (or a stub node) by adding it to
`nodes/<n>.py`'s `bundles` list, or to a group the node is in.
3. `bw test <node>` — now reactors fire. This is where bundle bugs
surface.
4. `bw items <node> --blame` and `bw metadata <node> -k <key>` — confirm
items materialise and derived metadata looks right.
5. `bw hash <node>` — preview against the live host.
Step 2 is non-optional. A bundle that "passes `bw test`" with no consumer
is proven only to parse.
```
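As a rough illustration of the order above, the read-only commands can be generated by a small helper. This is a sketch only: `verification_commands` and the example node name and metadata key are illustrative assumptions, and the helper builds argument lists without running anything.

```python
def verification_commands(node, metadata_key):
    """Return the read-only bw commands from the workflow, in order.

    Hypothetical helper: it only builds argv lists, it does not run them.
    """
    return [
        ["bw", "test"],                                # repo-wide parse gate
        ["bw", "test", node],                          # reactors fire for this node
        ["bw", "items", node, "--blame"],              # which bundle authored each item
        ["bw", "metadata", node, "-k", metadata_key],  # spot-check derived metadata
        ["bw", "hash", node],                          # preview vs current host state
    ]


if __name__ == "__main__":
    # Example node and key are placeholders for whatever you are developing.
    for argv in verification_commands("ovh.left4me", "left4me/domain"):
        print(" ".join(argv))
```

Attaching the bundle to a node (step 2) stays a manual edit, so it has no entry in the list.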
### Commit 4 — nodes carry only node-specific metadata (Gap 3)
**`bundles/AGENTS.md` Conventions** — new bullet:
```markdown
- **Bundles own application-wide knowledge; nodes carry only the few
per-host knobs the bundle actually needs.** When designing a bundle,
identify the per-node knobs (e.g. domain, uplink interface, a
vault-id suffix) and put everything else in `defaults`, or in a
reactor that derives from those knobs. Per-node random secrets
belong in `defaults` via `repo.vault.random_bytes_as_base64_for(...)`
keyed on the node — not in the node file. See
`bundles/left4me/metadata.py:10` (`secret_key` derived in defaults)
and `bundles/postgresql/metadata.py:4` (vault-derived `password_for`
at module scope).
```
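The split between bundle-owned defaults and per-node knobs can be sketched in plain Python. Everything here is illustrative: the `myapp` bundle, its keys, and `stub_random_bytes_as_base64_for` are assumptions standing in for a real bundle and for `repo.vault.random_bytes_as_base64_for(...)`, whose actual behaviour is not reproduced.

```python
import base64
import hashlib


def stub_random_bytes_as_base64_for(identifier, length=32):
    """Stand-in for repo.vault.random_bytes_as_base64_for(...).

    Assumption: the value is stable per identifier, which is the only
    property this sketch relies on.
    """
    digest = hashlib.sha256(identifier.encode()).digest()
    return base64.b64encode(digest[:length]).decode()


def bundle_defaults(node_name):
    """What a hypothetical `myapp` bundle would keep in `defaults`."""
    return {
        "myapp": {
            "listen_port": 8080,  # application-wide, same on every node
            # Per-node secret, keyed on the node, but owned by the bundle.
            "secret_key": stub_random_bytes_as_base64_for(
                f"{node_name} myapp secret_key"
            ),
        },
    }


# The node file then carries only the per-host knob.
node_metadata = {"myapp": {"domain": "example.org"}}


def merged(node_name):
    """Per-namespace merge; node values win over bundle defaults."""
    result = bundle_defaults(node_name)
    for namespace, values in node_metadata.items():
        result.setdefault(namespace, {}).update(values)
    return result
```

The node file stays at one key while the secret still differs per node, because the identifier passed to the vault helper includes the node name.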
**`nodes/AGENTS.md` Pitfalls** — new bullet:
```markdown
- **Bloated per-node metadata is usually a bundle smell.** If a
bundle's metadata block in the node file has more than 3-5 keys,
the bundle is probably under-using `defaults` / reactors. Push the
contribution into the bundle (see
[`bundles/AGENTS.md`](../bundles/AGENTS.md#conventions)) rather than
growing the node file.
```
### Commit 5 — reactors must read metadata or be defaults (Gap 4)
**`bundles/AGENTS.md` Pitfalls** — new bullet:
```markdown
- **Reactors must read metadata.** If a reactor body returns a static
dict without calling `metadata.get(...)`, bw raises
`ValueError: <reactor> on <node> did not request any metadata, you
might want to use defaults instead` once a node consumes the bundle.
Fix: fold the contribution into `defaults`. The rule applies even
when the reactor writes into another bundle's namespace — a static
contribution to e.g. `nftables/output` belongs in `defaults`, where
bw merges it with other bundles' contributions.
```
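The rule can be emulated outside bw to make the failure mode concrete. This is a simplified model, not bundlewrap's implementation: `TrackingMetadata`, `run_reactor`, the `myapp` keys, and the nftables rule string are all hypothetical.

```python
class TrackingMetadata:
    """Minimal stand-in for the metadata object passed to a reactor."""

    def __init__(self, data):
        self._data = data
        self.requested = False

    def get(self, path):
        # Record that the reactor read something, then walk the a/b/c path.
        self.requested = True
        value = self._data
        for part in path.split("/"):
            value = value[part]
        return value


def run_reactor(reactor, data):
    """Reject a reactor that never calls metadata.get()."""
    metadata = TrackingMetadata(data)
    result = reactor(metadata)
    if not metadata.requested:
        raise ValueError(
            f"{reactor.__name__} did not request any metadata, "
            "you might want to use defaults instead"
        )
    return result


def static_reactor(metadata):
    # Wrong: nothing is read, so this contribution belongs in `defaults`.
    return {"nftables": {"output": {"udp dport 27015 accept"}}}


def derived_reactor(metadata):
    # Fine: the result depends on a per-node knob.
    return {"myapp": {"url": "https://" + metadata.get("myapp/domain")}}
```

Moving the body of `static_reactor` into `defaults` unchanged is the whole fix; nothing about the contributed data has to change.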
### Commit 6 — `triggers` invariant + self-healing + `unless` (Gaps 5+6)
**`bundles/AGENTS.md` Pitfalls** — two new bullets (Gap 6's `unless`
semantics fold into the second; cleaner than three bullets):
```markdown
- **`triggers` / `triggered: True` invariant.** Any item listed in
another's `triggers` list must declare `triggered: True`. bw
enforces this at `bw test` time: *"…triggered by …, but missing
'triggered' attribute"*. Corollary: an action can't be both in an
upstream `triggers` list AND self-healing every apply — pick one.
- **Triggered actions don't recover from partial failure.** When an
upstream item's apply succeeds but its triggered downstream action
fails, subsequent applies can't recover via the trigger chain —
upstream is "already in desired state" and never re-triggers. For
actions that must self-heal (pip installs, chowns, migrations),
drop `triggered: True` and gate the command with `unless:
<fast-check>`. `unless` is a shell command on the target host whose
exit status decides whether the main command runs (exit 0 = skip);
it's checked at fire time, after `triggered:` filtering.
```
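Both bullets can be illustrated with a small checker over an item dict. This is a sketch under assumptions: `missing_triggered`, the `myapp` paths, and the item names are hypothetical, and the check only models the invariant rather than reproducing bw's own validation.

```python
def missing_triggered(items):
    """Return (upstream, downstream) pairs that violate the invariant."""
    violations = []
    for name, attrs in items.items():
        for target in attrs.get("triggers", []):
            if not items.get(target, {}).get("triggered", False):
                violations.append((name, target))
    return violations


items = {
    "git_deploy:/opt/myapp/src": {
        "triggers": ["action:myapp_reload_config"],
    },
    # Triggered: runs only when the upstream item changed something.
    "action:myapp_reload_config": {
        "command": "/opt/myapp/bin/reload-config",
        "triggered": True,
    },
    # Self-healing: no `triggered`, gated by `unless` on every apply.
    "action:myapp_chown": {
        "command": "chown -R myapp:myapp /opt/myapp/src",
        "unless": 'test -z "$(find /opt/myapp/src ! -user myapp -print -quit)"',
    },
}
```

Listing `action:myapp_chown` in the upstream `triggers` list would make the checker report it, which is the "pick one" corollary in practice.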
## Out of scope
- Gaps 7-12: deferred. The maintainer re-engages after this round.
- Bundle behaviour changes. Pure docs.
- `bw apply` / `bw run` — not authorised this session.
## Constraints
- Don't echo decrypted secrets in commit messages or new doc text.
- Don't restore `*.py_` parked nodes.
- After each commit, `.venv/bin/bw test` must pass.
- No push.


@ -0,0 +1,286 @@
# Round 2: agent-doc refactor (gaps 7-12)
## Why
Continuation of round 1 (spec at
`2026-05-10-ckn-bw-agents-md-refactor-round-1-design.md`). Round 1
landed the cross-cutting lessons (read-only allowlist, bundle
validation needs a node, nodes-carry-only-node-specific-metadata,
reactors-must-read-metadata, triggers/triggered:True invariant,
self-healing pattern). Round 2 covers the remaining six gaps: built-in
item-type gotchas and three bundle READMEs.
## Scope
In:
- Gap 7 — `file:`'s `source` defaults to the basename of the destination.
- Gap 8 — `git_deploy` extracts as the connecting user (root after
sudo); chown action needed for non-root downstream consumers.
- Gap 9 — `git_deploy` URL form: `://` triggers per-apply clone, no `://`
requires a `git_deploy_repos` map at the repo root.
- Gap 10 — `bundles/letsencrypt`: first-apply behaviour, DNS-01
prerequisites, negative-cache penalty.
- Gap 11 — `bundles/bind`: applying changes to a `master_node`-linked
pair needs `bw apply` on both ends.
- Gap 12 — `bundles/nginx`: how port 80 is served, `vm/cores`
requirement.
Out:
- Bundle behaviour changes. Pure docs.
- `bw apply` / `bw run` — not authorised this session.
## Placement decision (diverges from the handoff)
The handoff suggests `items/AGENTS.md` for gaps 7, 8, 9. But
`items/AGENTS.md` is scoped to **custom** item types in the `items/`
directory — its first sentence: *"Custom item types — each `*.py` is
a `bundlewrap.items.Item` subclass…"*. Built-in gotchas (`file:`,
`git_deploy:`) don't fit there.
Round-1 lessons about built-in mechanics (reactors must read metadata,
`triggers` invariant, self-healing pattern) all landed in
`bundles/AGENTS.md` Pitfalls. Gaps 7, 8, 9 are the same shape, so
they go in the same place.
## Validation findings
- Gap 7: well-known bw built-in semantics. Trusting the handoff.
- Gap 8: confirmed at `.venv/src/bundlewrap/bundlewrap/items/git_deploy.py`'s
`fix()` method — uses `self.node.upload(...)` which writes as the sudo
user (root). Files end up root-owned.
- Gap 9: confirmed in round 1 (`git_deploy.py:103` —
`if "://" in self.attributes['repo']:`).
- Gap 10: confirmed `/etc/dehydrated/letsencrypt-ensure-some-certificate`
exists in the bundle; runs on every domain with idempotent `unless`.
The daily timer runs `/usr/bin/dehydrated --cron --accept-terms --challenge dns-01`.
- Gap 11: nuanced. The bundle DOES set `bind/type = 'slave'` and renders
different named.conf.local for slaves, so bind itself may AXFR at
runtime. But the slave's *bw-managed* zone files are statically
rendered from the master's metadata at slave-apply time
(`bundles/bind/items.py:100`). The practical workflow rule — "apply
both" — is correct regardless. I'll frame the README as the workflow
rule, not the absolute "not AXFR slaving" claim from the handoff.
- Gap 12: confirmed `nginx.conf:42` includes `/etc/nginx/sites-enabled/*`;
`nginx/items.py:35` reads `node.metadata.get('vm/cores')` with no
default. README does not exist.
## Existing README states
- `bundles/letsencrypt/README.md` — 9 lines: upstream link + nsupdate
snippet. Reshape into an operational README; keep the nsupdate snippet.
- `bundles/bind/README.md` — does not exist. Create.
- `bundles/nginx/README.md` — does not exist. Create.
## Commits
### Commit 7 — `file:` source defaults to destination basename (Gap 7)
`bundles/AGENTS.md` Pitfalls — new bullet:
```markdown
- **`file:` `source` defaults to the destination basename.** For a
destination of `/etc/foo/bar.conf` with no `source` key, bw looks for
`bundles/<bundle>/files/bar.conf`. Only declare `source` explicitly
when the basename you want differs (e.g. shipping a Mako template
named `bar.conf.mako` to a destination of `/etc/foo/bar.conf`).
```
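The default described in the bullet amounts to a basename lookup. A minimal sketch, assuming a hypothetical helper name `effective_source`; it models the documented behaviour and is not bundlewrap's own code:

```python
import posixpath


def effective_source(destination, attributes):
    """File name bw would look up under bundles/<bundle>/files/."""
    return attributes.get("source", posixpath.basename(destination))


files = {
    # No `source`: resolves to files/bar.conf.
    "/etc/foo/bar.conf": {},
    # Template name differs from the destination basename, so say so.
    "/etc/foo/baz.conf": {"source": "baz.conf.mako", "content_type": "mako"},
}

for destination, attributes in files.items():
    print(destination, "<-", effective_source(destination, attributes))
```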
### Commit 8 — `git_deploy` gotchas (Gaps 8 + 9)
`bundles/AGENTS.md` Pitfalls — two new bullets.
```markdown
- **`git_deploy` extracts as the connecting (sudo) user — files end up
root-owned.** A downstream action that runs as a non-root app user
(typical: editable pip install, Rails bundle install) will hit
`Permission denied` on `.egg-info` or similar. The fix is a
self-healing chown action between `git_deploy` and the downstream
action:
```python
actions['<bundle>_chown_src'] = {
'command': 'chown -R <user>:<group> <path>',
'unless': 'test -z "$(find <path> ! -user <user> -print -quit)"',
'cascade_skip': False,
'needs': ['git_deploy:<path>', 'user:<user>', 'group:<group>'],
}
```
See `bundles/left4me/items.py` for an in-tree example.
- **`git_deploy` URL form matters.** A URL containing `://` (HTTP/HTTPS,
`ssh://`) makes bw clone to a temp dir per-apply — no operator-side
state needed. Without `://` (SCP-style `git@host:path`), bw expects a
`git_deploy_repos` map file at the repo root pointing at a long-lived
local clone, and raises `RepositoryError('missing repo map for
git_deploy')` if it isn't there. For HTTPS-reachable repos use the
HTTPS form; for SSH-only, prefer the explicit `ssh://user@host/path`
form so the map isn't needed.
```
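The URL-form rule reduces to the `"://"` substring test quoted in the validation findings. A small sketch, with a hypothetical helper name and placeholder repository URLs:

```python
def needs_repo_map(repo):
    """True when the repo string lacks `://`, so the map file is required."""
    return "://" not in repo


# Placeholder URLs, not real repositories.
examples = [
    "https://git.example.org/org/app.git",     # cloned per apply
    "ssh://git@git.example.org/org/app.git",   # cloned per apply
    "git@git.example.org:org/app.git",         # SCP style: needs git_deploy_repos
]

for repo in examples:
    print(repo, "->", "map required" if needs_repo_map(repo) else "temp clone")

# A git_deploy item using the form that avoids the map file.
git_deploy = {
    "/opt/myapp/src": {
        "repo": "https://git.example.org/org/app.git",
        "rev": "main",
    },
}
```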
### Commit 9 — letsencrypt README (Gap 10)
Reshape `bundles/letsencrypt/README.md`. Keep the upstream link and
nsupdate snippet at the top; add three structured sections.
```markdown
# letsencrypt
Issues and renews Let's Encrypt certs via [dehydrated][upstream] with
DNS-01 against the in-house bind-acme server.
[upstream]: https://github.com/dehydrated-io/dehydrated/wiki/example-dns-01-nsupdate-script
## First-apply behaviour
Immediately after `bw apply <node>`, nginx serves a **self-signed
cert** for each declared domain — generated by
`/etc/dehydrated/letsencrypt-ensure-some-certificate` so nginx has
something to start with. The real Let's Encrypt cert arrives at most
24h later when the systemd timer fires
(`/usr/bin/dehydrated --cron --accept-terms --challenge dns-01`). To
shortcut the wait:
```sh
ssh <node> 'sudo /usr/bin/dehydrated --cron --accept-terms --challenge dns-01'
ssh <node> 'sudo systemctl reload nginx'
```
## DNS-01 prerequisites
`hook.sh` does `nsupdate` against the bind-acme server (referenced
by `letsencrypt/acme_node`). For the challenge to succeed:
1. The acme node must be in the same metadata graph (so
`bw metadata <node> -k letsencrypt/acme_node` resolves).
2. **All NS servers** for the validated domain must serve the
`_acme-challenge.<domain>` CNAME — Let's Encrypt validates from
primary AND secondary geographic regions; both authoritative
servers must agree. If a secondary NS is also a bw-managed node,
`bw apply` it after adding the domain (see e.g. `ovh.secondary`).
3. The bind-acme node's TSIG key must be reachable. `hook.sh` is
rendered with the bind-acme server's `network/internal/ipv4`;
for clients outside that LAN, the route must exist (typically via
wireguard `s2s` peer membership).
## Negative-cache penalty
If the first DNS-01 attempt fails (e.g. zone not yet applied to the
secondary NS), Let's Encrypt's resolvers cache NXDOMAIN for the SOA's
negative TTL (often 900s = 15 min). Subsequent attempts during that
window also fail and refresh the cache. Combined with LE's rate limit
of **5 failed authorisations per domain per hour**, recovery requires
you to **stop retrying** for ~15 minutes after fixing the DNS, then
make at most one attempt.
## nsupdate sample
For interactive testing of the bind-acme TSIG path:
```sh
printf "server 127.0.0.1
zone acme.resolver.name.
update add _acme-challenge.ckn.li.acme.resolver.name. 600 IN TXT \"hello\"
send
" | nsupdate -y hmac-sha512:acme:<TSIG_KEY_REDACTED>
```
```
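The "stop retrying" advice in the negative-cache section is simple arithmetic on the SOA negative TTL. A minimal sketch, assuming a hypothetical helper `earliest_retry` and the 900 s TTL quoted above; counting from the time of the DNS fix is a conservative choice:

```python
from datetime import datetime, timedelta


def earliest_retry(dns_fixed_at, negative_ttl_seconds=900):
    """Earliest time a single new DNS-01 attempt is worth making."""
    return dns_fixed_at + timedelta(seconds=negative_ttl_seconds)


if __name__ == "__main__":
    fixed = datetime(2026, 5, 10, 12, 0)
    print("retry no earlier than", earliest_retry(fixed))
```

Read the real negative TTL from the zone's SOA record before relying on the default.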
### Commit 10 — bind README (Gap 11, reframed)
Create `bundles/bind/README.md`. Frame as the workflow rule, not the
absolute "not AXFR" claim.
```markdown
# bind
Authoritative DNS — primary plus optional `bind/master_node` slaves.
## Applying changes needs both nodes
The slave's bw-managed zone files are rendered from the master's
metadata at slave-apply time (see `bundles/bind/items.py:100`). When
you change a record on the master (adding a `letsencrypt/domains`
entry, a new vhost, etc.), the change is only published once you
apply BOTH:
```sh
bw apply htz.mails # primary (where the source records live)
bw apply ovh.secondary # secondary (renders its own zone files)
```
Until both have been applied, `bw verify ovh.secondary` will show
stale zones and consumers that hit the secondary (Let's Encrypt's
secondary-region validators in particular) will see NXDOMAIN. Even
though the slave's named.conf.local declares `type slave;`, don't
rely on bind's own AXFR catching up — the bw-rendered file on disk
is what `bw verify` measures.
## See also
- `bundles/bind-acme/` — the in-house ACME-update receiver.
- `bundles/letsencrypt/README.md` — DNS-01 prerequisites and the
negative-cache penalty (the most common operational consequence of
forgetting to apply the secondary).
```
### Commit 11 — nginx README (Gap 12)
Create `bundles/nginx/README.md`.
```markdown
# nginx
Webserver. Per-node vhosts in `nginx/vhosts`; per-vhost templates in
`data/nginx/*.conf`.
## How port 80 is served
The bundle ships a fixed `80.conf` to
`/etc/nginx/sites-available/80.conf` (picked up by the
`sites-enabled/` symlink) that handles **all** port-80 traffic
across vhosts:
1. ACME HTTP-01 challenges (`/.well-known/acme-challenge/`) are
served from `/var/lib/dehydrated/acme-challenges/`.
2. All other port-80 requests are 301-redirected to
`https://$host$request_uri`.
Per-vhost templates only declare `listen 443 ssl http2;`, so they
don't need their own port-80 server blocks. If you need vhost-
specific port-80 behaviour (e.g. plain-HTTP without redirect), you'll
need to override 80.conf or add a per-vhost block.
## Required metadata
- `vm/cores` — read directly by `items.py` for `worker_processes`.
No default; `bw items <node>` raises at item-build time if missing.
Typically supplied by the `vm` bundle / hetzner-vm group; double-
check on bare-metal hosts.
- `nginx/vhosts` — dict of vhost-name → vhost-config.
- `nginx/modules` — list of dynamic modules to load.
## Cross-namespace
`items.py` reads `letsencrypt/domains` to skip emitting a per-vhost
HTTPS block when LE hasn't declared the domain yet — keeps the bundle
loadable on a node where letsencrypt isn't fully wired up.
```
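Because `vm/cores` has no default, a pre-flight check on a node's metadata can catch the gap before item build. A sketch only: `lookup` and `missing_keys` are hypothetical helpers operating on a plain dict, not on bw's metadata object.

```python
def lookup(metadata, path):
    """Walk an a/b/c path through nested dicts; raises KeyError if absent."""
    value = metadata
    for part in path.split("/"):
        value = value[part]
    return value


def missing_keys(metadata, required=("vm/cores",)):
    """Return the required paths that the metadata dict does not provide."""
    missing = []
    for path in required:
        try:
            lookup(metadata, path)
        except (KeyError, TypeError):
            missing.append(path)
    return missing


if __name__ == "__main__":
    bare_metal = {"nginx": {"vhosts": {}}}
    print(missing_keys(bare_metal))
```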
## Out of scope
- Bundle behaviour changes. Pure docs.
- `bw apply` / `bw run`.
- Reformatting the existing two-line bundle READMEs into the new
shape — bundles/AGENTS.md explicitly says don't do that
("uneven quality is part of what we accept in exchange for not
blocking other work").
## Constraints
- Don't echo decrypted secrets. The TSIG-key example in the
letsencrypt nsupdate snippet uses `<TSIG_KEY_REDACTED>`.
- After each commit, `.venv/bin/bw test` must pass.
- No push.


@ -0,0 +1,5 @@
{
'bundles': {
'left4me',
},
}


@ -81,6 +81,12 @@ This loader shape has consequences:
These are intentional parks/buffers, not bugs.
- **`id` must be unique.** A pre-apply hook (`hooks/unique_node_ids.py`)
enforces this; duplicate IDs fail `bw test` and `bw apply`.
- **Bloated per-node metadata is usually a bundle smell.** If a
bundle's metadata block in the node file has more than 3-5 keys,
the bundle is probably under-using `defaults` / reactors. Push the
contribution into the bundle (see
[`bundles/AGENTS.md`](../bundles/AGENTS.md#conventions)) rather than
growing the node file.
## See also


@ -233,6 +233,7 @@
'10.0.229.0/24',
],
},
'ovh.left4me': {},
},
'clients': {
'macbook': {


@ -1,15 +1,21 @@
{
'hostname': '141.95.32.8',
'username': 'debian',
'groups': [
'backup',
'debian-13',
'left4me',
'monitored',
'webserver',
],
'bundles': [
-#'wireguard',
+'wireguard',
],
'metadata': {
'id': '14d2abc-3855-4bb7-99e2-d4e3eb0344dd',
'vm': {
'cores': 4, # 4 physical, 8 with HT
'threads': 8,
},
'network': {
'external': {
'interface': 'enp3s0f0',
@ -35,5 +41,12 @@
},
},
},
'left4me': {
'domain': 'left4.me',
# Both HT siblings of physical core 0 (cpu0+cpu4 per
# /sys/devices/system/cpu/cpu0/topology/thread_siblings_list).
# Keeps system work off the physical cores running game ticks.
'system_cpus': {0, 4},
},
},
}
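
The `system_cpus` knob above is meant to be complemented into the set of CPUs left for games. A minimal sketch of that derivation, assuming the thread count comes from `vm/threads`; the function names are hypothetical and this is not the bundle's actual reactor:

```python
def game_cpus(threads, system_cpus):
    """Complement of system_cpus within 0..threads-1."""
    if threads < 2:
        raise ValueError("need at least 2 threads to split system and game CPUs")
    remaining = set(range(threads)) - set(system_cpus)
    if not remaining:
        raise ValueError("system_cpus leaves no CPUs for games")
    return remaining


def allowed_cpus(cpus):
    """Format a CPU set as a space-separated AllowedCPUs= value."""
    return " ".join(str(cpu) for cpu in sorted(cpus))


if __name__ == "__main__":
    # Values from the node file: 8 threads, HT siblings 0 and 4 kept for system work.
    print("system:", allowed_cpus({0, 4}))
    print("game:  ", allowed_cpus(game_cpus(8, {0, 4})))
```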