Both env files now follow the same pattern: root owns the config so the
service user can't overwrite its own config, group=left4me so the
sudo -u left4me alembic + seed-overlays actions can source the file
(they failed with 'permission denied' when group=root and mode=0640).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without !decrypt: the encrypt$… string is rendered as a literal into
web.env, which then surfaces as 403 Forbidden from the Steam Web API
(because the URL key parameter contains "encrypt$gAAA…" instead of the
actual API key). Matches the existing pattern used by every other
encrypted secret in this repo.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds the metadata key default (None — node must override) and pipes it
into web.env.mako so the live-state poller can resolve Steam IDs to
persona names + avatars via GetPlayerSummaries.
ovh.left4me gets the actual key as an encrypted vault secret.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a left4me-workshop-refresh entry to the systemd-timers bundle,
firing nightly at 04:00 and invoking the new flask workshop-refresh
CLI that enqueues a refresh_workshop_items job. Owner of the job is
NULL (system-enqueued). The bw worker picks it up under existing
scheduler rules; idempotent against an already-queued/running refresh.
Also extends bundles/systemd-timers to accept an optional
environment_files key so the new unit can pull DATABASE_URL etc.
from /etc/left4me/{host,web}.env.
Replaces bundle-default system_core_count int with a per-node set of
CPU ids; reactor takes set complement for game cores. ovh.left4me sets
{0, 4} to keep both HT siblings of physical core 0 in system.slice
so games don't share L1/L2 with system work. systemd_units reactor
return inlined.
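
The set-complement step can be sketched as below. This is an illustration only, not the reactor's code: `split_cores` is a hypothetical helper, and the 8-thread CPU range is assumed from ovh.left4me being a 4-core HT box.

```python
from __future__ import annotations


def split_cores(all_cpus: set[int], system_cpus: set[int]) -> tuple[set[int], set[int]]:
    """Return (system, game) CPU id sets; game cores are the set complement."""
    unknown = system_cpus - all_cpus
    if unknown:
        raise ValueError(f"system CPU ids not present on host: {sorted(unknown)}")
    game_cpus = all_cpus - system_cpus
    if not game_cpus:
        raise ValueError("system CPU set leaves no cores for games")
    return system_cpus, game_cpus


# Assumed topology: 8 threads, with CPUs 0 and 4 as the HT siblings of core 0.
system, game = split_cores(set(range(8)), {0, 4})
```

With the per-node set {0, 4}, the game slice ends up on the remaining six CPU ids.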
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First N cores pin system/user/build (inline on owned slices, drop-ins
on upstream system.slice and user.slice via the systemd/units
'<parent>.d/<basename>.conf' convention). Remainder pins
l4d2-game.slice. Reactor raises on hosts with <2 threads or
system_core_count that leaves no cores for games.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two changes from the same debug session, both prerequisites for
`l4d2ctl install` to work end-to-end on a fresh node:
1) Install steamcmd via tarball under /opt/left4me/steam.
   - dpkg --add-architecture i386 + libc6:i386 + lib32z1 (32-bit deps;
     bw pkg_apt translates _ to : at install time, hence libc6_i386)
   - curl|tar one-shot, guarded by `test -x steamcmd.sh`
   - LEFT4ME_STEAMCMD in host.env so l4d2host invokes by absolute path
     (mirrors the old bundles/left4dead2/files/setup approach; avoids
     the dirname-$0 trap that bites when steamcmd is reached via a
     PATH symlink)
2) Drop the `unless` on left4me_pip_install. The gate checked
   importability of l4d2host/l4d2web, which is too weak a proxy for
   install state: adding [project.scripts] to pyproject.toml later
   wouldn't be picked up if the package was already importable from a
   prior `pip install -e`. Cost is ~2s/apply for a no-op pip
   resolution, too small to justify keeping the gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two coupled changes that let every proxy_pass vhost serve both WS and
SSE without per-vhost flags or template conditionals:
1) nginx.conf: $connection_upgrade map is now always defined (drop
   the % if has_websockets: gate), and the '' branch returns "" instead
   of "close". With "" + proxy_http_version 1.1, nginx maintains
   keep-alive to upstream for non-WS clients, which is what SSE
   requires. WS clients still get Connection: upgrade as before.
2) data/nginx/proxy_pass.conf: drop the % if websockets: conditional.
   Always set proxy_http_version 1.1 + Upgrade + Connection via the
   map, plus proxy_buffering off and proxy_read_timeout 1h for SSE.
Effects on existing vhosts:
- home.server's Proxmox WS vhost: unchanged behavior (the WS branch
  was already setting these headers). Gains the ability to also
  serve SSE if ever needed.
- All other proxy_pass vhosts (Nextcloud, Freescout, YOURLS, Gitea,
  etc.): get keep-alive to upstream (minor latency win) and unbuffered
  pass-through (slight throughput cost on huge responses, neutral
  for typical web app traffic).
Dead but harmless: bundles/nginx/metadata.py still defaults
nginx/has_websockets to False, and proxmox-ve/grafana still set it
to True. The flag is now a no-op; clean up in a separate pass.
The repo is indexed with cocoindex-code; semantic search beats grep for
"where is X / which bundle does Y" questions where you don't know the
exact identifier. Without `--path '**'` ccc scopes to the current
working directory, which is rarely what you want when navigating
ckn-bw — call it out so agents don't get confusing empty results.
pip_install's `unless` (import l4d2host, l4d2web) skips when both
packages are already installed — so on a code-only apply, pip_install
doesn't fire and alembic_upgrade (which it triggers) never runs.
The new 0008 migration would silently get skipped, leaving the DB
out of sync with the new schema.
Wire git_deploy → alembic_upgrade directly. alembic upgrade head is
idempotent (no-op when at head); seed_overlays + service:restart
cascade off alembic, so editable-install code changes also get picked
up by gunicorn.
Edge case noted (deferred): a migration-only change with no code
change has the same matching git rev, so this won't fire either. In
practice migrations always come with the code change that uses them.
On Debian 13 trixie `dnsutils` is a transitional package replaced by
`bind9-dnsutils`. Apt installs bind9-dnsutils when you ask for dnsutils,
but `dpkg -s dnsutils` returns 1 because no real package by that name
exists — bw's pkg_apt status check then flags the item as failed every
apply. Switching the dependency to the real package name resolves the
loop.
The bundle just needs `nsupdate` (provided by bind9-dnsutils) for the
DNS-01 challenge hook.
One-liner instead of "ssh + heredoc + sudo + sh -c + double quotes":
    sudo left4me create-user alice --admin
    sudo left4me seed-script-overlays /opt/left4me/src/examples/script-overlays
    sudo left4me routes
The wrapper sources host.env + web.env, drops to the left4me user,
sets JOB_WORKER_ENABLED=false (admin-side ops shouldn't race the
worker) and PYTHONPATH=/opt/left4me/src, then exec's the flask CLI
with whatever args followed `left4me`. No env-var enumeration: the
sh -c trailing 'sh "$@"' forwards positional args without quoting
hell. README updated to drop the verbose recipe.
Two things from the left4me-integration session worth pinning:
- 80.conf was orphaned in sites/ (not sites-enabled/) for an
unknown amount of time. Commit d49259f moved it; document the
resulting wiring so it's not re-broken accidentally.
- items.py reads node.metadata.get('vm/cores') with no default
for worker_processes; bare-metal nodes outside the vm group
raise at item-build time. Cost the agent ~10 min when
ovh.left4me first opted into webserver.
Also note the cross-namespace read on letsencrypt/domains.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The slave's bw-managed zone files are rendered from the master's
metadata at slave-apply time. Changing a record on the master only
publishes once both bw apply runs are done. The left4me-integration
session burned ~20 minutes assuming bw apply on htz.mails would
propagate to ovh.secondary via bind's own AXFR; it doesn't, because
bw verify measures the on-disk file, not the running zone.
Frame as the workflow rule rather than the absolute "not AXFR"
claim: the bundle does set `type slave;` in named.conf.local, but
that's orthogonal to the practical apply-both rule.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reshapes the existing scratchpad README into operational sections.
Captures three things that took the left4me-integration session
~30 minutes to figure out:
- After bw apply, nginx serves a self-signed cert until the daily
systemd timer fires; the dehydrated --cron one-liner shortcuts
the wait.
- DNS-01 needs all NS servers (primary AND secondary) to serve the
_acme-challenge CNAME, the acme node reachable, and TSIG-key
reachability via wireguard for off-LAN clients.
- LE's negative-cache + rate-limit combo: stop retrying for ~15
min after fixing DNS, then make at most one attempt.
Existing nsupdate sample preserved at the bottom.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Caught during the left4me-integration nginx 80.conf move: the
agent declared a redundant 'source': '80.conf' on a file: item
whose destination already ended in 80.conf. The maintainer
flagged it as noise. Document the rule: only declare source
when the basename differs from the destination (e.g. .mako
template to a non-suffixed destination).
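
A minimal sketch of the rule, assuming bw's default of taking the source name from the destination's basename. `needs_explicit_source` is a hypothetical helper for illustration, and the web.env entry is an assumed example of the .mako case.

```python
import posixpath


def needs_explicit_source(destination: str, source: str) -> bool:
    """True only when the source file name differs from the destination basename."""
    return posixpath.basename(destination) != source


files = {
    # Source defaults to '80.conf', so declaring it again is noise.
    '/etc/nginx/sites-available/80.conf': {},
    # Template name differs from the destination basename: declare it.
    '/etc/left4me/web.env': {
        'source': 'web.env.mako',
        'content_type': 'mako',
    },
}
```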
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Continuation of round 1. Five commits: two new bundles/AGENTS.md
Pitfalls (file: source basename, git_deploy gotchas) and three
bundle READMEs (letsencrypt operational, bind apply-both, nginx
new file). Diverges from the handoff on placement: gaps 7-9 go
in bundles/AGENTS.md not items/AGENTS.md, since items/AGENTS.md
is scoped to custom item types only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related lessons from the left4me integration:
1. The triggers/triggered:True invariant tripped three times in
one session. When chown_src was promoted from triggered-only
to self-healing-every-apply (drop triggered:True + add unless),
bw rejected because it was still in git_deploy's triggers
list. Same dance happened for pip_install.
2. Triggered actions can't recover from partial failure: once
upstream succeeds, it's "in desired state" forever and the
trigger never re-fires. For pip installs / chowns / migrations
that must heal on every apply, the right shape is no
triggered:True + unless:<fast-check>. unless semantics fold
into the same bullet.
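
The invariant from lesson 1 can be modelled in a few lines. This is a stand-in for illustration, not bw's own validation code: `find_trigger_violations` is hypothetical and items are reduced to plain dicts.

```python
from __future__ import annotations

UNLESS = 'test -z "$(find /opt/left4me/src ! -user left4me -print -quit)"'


def find_trigger_violations(items: dict[str, dict]) -> list[tuple[str, str]]:
    """Return (item, target) pairs where a triggered target lacks triggered: True."""
    violations = []
    for item_id, attrs in items.items():
        for target in attrs.get('triggers', []):
            if not items.get(target, {}).get('triggered', False):
                violations.append((item_id, target))
    return violations


# chown_src promoted to self-healing, but still listed in git_deploy's triggers.
rejected = {
    'git_deploy:/opt/left4me/src': {'triggers': ['action:left4me_chown_src']},
    'action:left4me_chown_src': {'unless': UNLESS},
}

# Same action, removed from the triggers list: runs every apply, gated by unless.
accepted = {
    'git_deploy:/opt/left4me/src': {},
    'action:left4me_chown_src': {'unless': UNLESS},
}
```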
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The left4me bundle's first cut had two reactors that returned
static dicts without calling metadata.get(...): systemd_services
(enable/run flags) and nftables_output (two static rule strings).
Both passed bw test (no consumer yet). Once attached to
ovh.left4me, bw raised "did not request any metadata, you might
want to use defaults instead". Fix was to fold both into defaults.
Document the pitfall, with the verbatim error wording and the
note that this applies to cross-namespace contributions too.
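
A rough model of the check, for illustration. `RecordingMetadata` and `run_reactor` are hypothetical stand-ins and not bw internals, and the `example.service` keys are made up. Only the error wording is taken from bw.

```python
class RecordingMetadata:
    """Stand-in for node metadata that records which paths were requested."""

    def __init__(self, data):
        self._data = data
        self.requested = []

    def get(self, path, default=None):
        # Simplification: always falls back to `default` instead of raising.
        self.requested.append(path)
        node = self._data
        for part in path.split('/'):
            if not isinstance(node, dict) or part not in node:
                return default
            node = node[part]
        return node


def run_reactor(reactor, metadata):
    """Run a reactor and reject it if it never read any metadata."""
    result = reactor(metadata)
    if not metadata.requested:
        raise ValueError(
            f"{reactor.__name__} did not request any metadata, "
            "you might want to use defaults instead"
        )
    return result


def static_reactor(metadata):
    # Returns a static dict without calling metadata.get(): gets rejected.
    return {'systemd': {'services': {'example.service': {'enabled': True}}}}


# The fix: the same static contribution expressed as plain defaults.
defaults = {'systemd': {'services': {'example.service': {'enabled': True}}}}

try:
    run_reactor(static_reactor, RecordingMetadata({}))
    error = None
except ValueError as exc:
    error = str(exc)
```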
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the left4me bundle was first integrated, ovh.left4me's node
file carried ~40 lines of left4me-related metadata (git_url,
secret_key, full nginx vhost, monitoring, backups, nftables
rules). The maintainer pushed back: per-node metadata should be
only what genuinely varies per host. Refactor brought it down to
{'domain': 'left4.me'} with everything else in bundle defaults
or in a reactor deriving from the domain.
Add the rule to bundles/AGENTS.md from the bundle-author angle
(use defaults / vault-keyed-on-node for secrets, cite left4me
and postgresql for the established pattern). Add the reviewer's
form to nodes/AGENTS.md Pitfalls.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bw test (no args) is a parsing gate, not a behaviour gate. A
bundle's reactors only resolve when some node's metadata is
built, so reactor bugs stay dormant until a node opts in. The
left4me-integration session shipped 8 commits that all "passed
bw test" with latent reactor-rejection bugs that surfaced only
once the bundle was attached to ovh.left4me.
Rewrites the verify-list in bundles/AGENTS.md to require attach-
first and uses richer command invocations (bw items --blame,
bw metadata -k <key>). Adds a Bundle-validation workflow section
to commands.md spelling out why step 2 is non-optional.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a flag-combinations table for bw test (selectors + the
HIJKMSp/IJKMp default-flag split), bw metadata -k/-b/-f (with
the -f sensitive-data warning), bw items --blame/-f, bw verify
-o bundle:, bw hash -m/-d. Also documents the shared target-
selector grammar.
Surfaced by the left4me-integration session, where the agent
relied on bare bw test / bw metadata / bw items invocations and
missed leverage from the available flags.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bw bundles is not a subcommand of the installed fork (the actual
list is apply/debug/diff/groups/hash/ipmi/items/lock/metadata/
nodes/plot/pw/repo/run/stats/test/verify/zen). bw verify is
read-only and was missing from the list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the brainstorm + per-commit wording for the first six
gaps from the left4me-integration handoff, plus a side-quest
read-only command cheat sheet for docs/agents/commands.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bundle was shipping 80.conf (HTTP-to-HTTPS redirect + acme-challenge
alias) to /etc/nginx/sites/80.conf, but nginx.conf only `include`s
/etc/nginx/sites-enabled/* (which is a symlink to sites-available).
The file was orphaned — no node had a working port-80 listener.
Move the destination to /etc/nginx/sites-available/80.conf so the
existing sites-enabled symlink picks it up. The /etc/nginx purge will
clean up any stale /etc/nginx/sites/80.conf on existing hosts.
Same constraint pattern: items in a triggers list must be
triggered:True. chown_src dropped triggered:True in the prior commit
to become self-healing every-apply, so it can't stay in git_deploy's
triggers list. Now git_deploy has no triggers at all — chown_src and
pip_install both run every apply, gated by their own `unless` guards.
Same problem as pip_install: chown_src was triggered:True and only
fired when git_deploy did. After a partial first-apply where git_deploy
succeeded (extracting root-owned files) but the chown didn't happen
yet, subsequent applies left files root-owned forever — pip_install
fails with "permission denied" trying to write .egg-info/.
Drop triggered:True. Add an unless guard:
    test -z "$(find /opt/left4me/src ! -user left4me -print -quit)"
i.e. skip the chown only when no non-left4me-owned file exists in the
tree.
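
Sketch of the resulting action shape, plus a Python rendering of what the guard checks. The chown command line is an assumption (the exact owner:group arguments aren't stated here), and `has_foreign_owner` is a hypothetical helper that mirrors the find expression.

```python
import os
import tempfile

actions = {
    'left4me_chown_src': {
        # Assumed command; only "chown -R" on the source tree is certain.
        'command': 'chown -R left4me /opt/left4me/src',
        # No triggered: True, so this runs every apply unless the guard passes.
        'unless': 'test -z "$(find /opt/left4me/src ! -user left4me -print -quit)"',
    },
}


def has_foreign_owner(root: str, uid: int) -> bool:
    """True if root or any entry below it is not owned by uid."""
    if os.lstat(root).st_uid != uid:
        return True
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if os.lstat(os.path.join(dirpath, name)).st_uid != uid:
                return True
    return False


# Demo on a throwaway tree owned by the current user: the chown would be skipped.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, 'file'), 'w').close()
    skip_chown = not has_foreign_owner(tmp, os.getuid())
```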
The previous shape (`triggered: True`, in git_deploy's triggers list)
meant pip_install only ran when something upstream fired. After a
partial first-apply failure (where git_deploy succeeded but pip_install
failed for an unrelated reason), subsequent applies couldn't recover —
git_deploy was already in desired state, nothing fired pip_install.
Drop `triggered: True`. Drop pip_install from git_deploy's triggers
(bw enforces a triggers→triggered:True invariant). Add `unless`:
    sudo -u left4me /opt/left4me/.venv/bin/python -c "import l4d2host, l4d2web"
to short-circuit when the venv is already correct. Editable installs
pick up code changes automatically — no need to re-pip on every git
update.
For dep changes (rare), nudge manually:
    bw run ovh.left4me 'sudo -u left4me /opt/left4me/.venv/bin/pip install -e /opt/left4me/src/l4d2host -e /opt/left4me/src/l4d2web'
bw's git_deploy extracts the git archive as the connecting user (root
after sudo), so files end up root-owned. The subsequent pip install
runs as left4me and needs to write .egg-info/ inside each editable
package, which fails with "permission denied".
Add action:left4me_chown_src triggered by git_deploy and required by
pip_install. Idempotent (chown -R is fine to re-run).
bw's git_deploy item assumes the destination directory exists on the
host — its fix path runs `find <dest> -mindepth 1 -delete` to clear
existing contents before unpacking the new archive, which fails on a
fresh box where the directory was never created. Flask follows the
same pattern (bundles/flask/items.py:13).
bw's git_deploy.py:103 falls into a per-apply temp clone path when the
repo URL contains '://' (HTTPS, ssh://, …). Without that, it requires
a static git_deploy_repos map file pointing at a long-lived local
clone — which is the wrong shape for left4me, where the source of
truth is git.sublimity.de.
Switching the default to the HTTPS URL means anyone with the bundle
gets a working clone-from-source on `bw apply`, no operator-side
mirror map required.
Note: the host will pull whatever is pushed to git.sublimity.de
master. Push local commits before applying.
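
The branch described above reduces to a substring test. Sketch only: `uses_temp_clone` is a hypothetical name, and the URL path is a placeholder rather than the real repo location.

```python
def uses_temp_clone(repo: str) -> bool:
    """True when the repo value looks like a URL and gets a per-apply temp clone."""
    return '://' in repo


# Placeholder path; only the host name comes from this repo.
https_repo = 'https://git.sublimity.de/<owner>/<repo>.git'
# A bare name would instead need an entry in the git_deploy_repos map file.
mapped_repo = 'left4me'
```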
Each reactor now scopes to a single downstream bundle:
    nginx_vhosts   -> nginx/vhosts
    nftables_input -> nftables/input
Easier to grep "what writes nginx/vhosts" and harder to accidentally
couple unrelated keys together. Same merged metadata.
bundles/nginx/metadata.py:91-104 already creates a monitoring/services
entry per nginx/vhost using the vhost's check_protocol/check_path. Set
check_path: '/health' on the left4me vhost so the auto-check hits the
Flask health endpoint, drop the explicit monitoring/services/left4me-web
block from this reactor.
Net effect: same curl command lands in monitoring as before, but the
service name is now 'left4.me' (the hostname, per the nginx reactor's
naming convention) instead of 'left4me-web'.
bundles/nginx/metadata.py auto-populates letsencrypt/domains from
nginx/vhosts.keys(). Declaring it again in the left4me reactor was a
no-op duplication. Removed; bw metadata still shows the same merged
state (left4.me with reload: [nginx]).
README:
Updated metadata example to show domain as the only required key.
Documented the bundle's derived_from_domain reactor as the source of
nginx/letsencrypt/monitoring/nftables-input wiring, and the
bundle-defaults source of backup/paths.
nodes/ovh.left4me.py:
- groups: + backup, + left4me, + webserver
- bundles: dropped 'left4me' and 'nftables' (come via groups now;
nftables ships with debian-13).
- metadata: pinned vm/cores=4, vm/threads=8 (4-core HT box) so the
nginx bundle's worker_processes resolves; left4me block reduced to
{'domain': 'left4.me'} — git_url, git_branch, secret_key, and the
nginx/letsencrypt/monitoring/nftables/backup blocks now come from
bundle defaults / the derived_from_domain reactor.
Nodes should only carry node-specific metadata. Previously each node
running left4me had to declare git_url, git_branch, secret_key, plus
nginx vhost / letsencrypt / monitoring / nftables-input blocks for
every game port. All of those are derivable from one truly node-
specific value: the domain.
Move into the bundle:
- git_url + git_branch as defaults (override per-node only if needed).
- secret_key as a per-node vault-derived value
(random_bytes_as_base64_for f'{node.name} left4me secret_key',
same convention as postgresql/mosquitto/etc.).
- backup/paths defaults (set-merged with backup group / node paths).
Add a `derived_from_domain` reactor that reads left4me/domain and
emits:
- nginx/vhosts/<domain> proxying 127.0.0.1:8000
- letsencrypt/domains/<domain>
- monitoring/services/left4me-web (curl /health)
- nftables/input rules for the configured port range
(defaults 27015-27115, derived from left4me/port_range_*).
Net effect: a node opting into left4me declares only
    metadata.left4me.domain = 'whatever.tld'
plus the universal node-level stuff (id, vm/cores, network, …).
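
Roughly, the reactor's shape looks like the sketch below. It is written as a plain function over a dict so it runs without bw. The value shapes under each key (vhost attributes, check command, rule string, udp-only) are assumptions for illustration, and the port bounds are passed as arguments because the exact left4me/port_range_* key names aren't spelled out here.

```python
def derived_from_domain(metadata: dict, first_port: int = 27015, last_port: int = 27115) -> dict:
    """Derive downstream metadata from the single node-specific value."""
    domain = metadata['left4me']['domain']
    return {
        'nginx': {
            'vhosts': {
                # Assumed vhost shape; the real bundle's attributes may differ.
                domain: {'proxy_pass': 'http://127.0.0.1:8000'},
            },
        },
        'letsencrypt': {'domains': {domain: {}}},
        'monitoring': {
            'services': {
                # Assumed check shape: a curl against the health endpoint.
                'left4me-web': {'command': f'curl -sf https://{domain}/health'},
            },
        },
        'nftables': {
            # Assumed rule string; udp is a guess at the protocol.
            'input': {f'udp dport {first_port}-{last_port} accept'},
        },
    }


result = derived_from_domain({'left4me': {'domain': 'left4.me'}})
```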
The acme_zone reactor's first ACL branch iterates nodes that have
letsencrypt/domains and reads their network/internal/ipv4. Until now
that crashed for any node with letsencrypt but no internal LAN — the
node had to either fake a network/internal/ipv4 or skip TLS.
Add a `metadata.get(..., None)` guard to filter such nodes out of this
branch. The wireguard branch below already covers them (any node with
the wireguard bundle gets its wireguard/my_ip into the ACL), so ACME
DNS-01 reachability still works for cross-Internet nodes that join the
fleet via wireguard.
Surfaced by ovh.left4me: dedicated server with no Hetzner/internal
network, reachable from the bind-acme node only via wireguard.
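
The guard can be sketched like this. Nodes are reduced to plain dicts, `lookup` stands in for metadata.get with a default, and `lan.node` plus both addresses are made-up examples.

```python
def lookup(metadata: dict, path: str, default=None):
    """Slash-path lookup that returns default when any segment is missing."""
    node = metadata
    for part in path.split('/'):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def acme_acl_addresses(nodes: dict) -> list:
    acl = []
    # Branch 1: letsencrypt nodes, skipping those without an internal LAN address.
    for metadata in nodes.values():
        if lookup(metadata, 'letsencrypt/domains', None) is None:
            continue
        internal = lookup(metadata, 'network/internal/ipv4', None)
        if internal is not None:
            acl.append(internal)
    # Branch 2: any node with wireguard contributes its wireguard address.
    for metadata in nodes.values():
        wg_ip = lookup(metadata, 'wireguard/my_ip', None)
        if wg_ip is not None:
            acl.append(wg_ip)
    return acl


nodes = {
    'lan.node': {
        'letsencrypt': {'domains': {'example.org': {}}},
        'network': {'internal': {'ipv4': '10.0.0.2'}},
    },
    'ovh.left4me': {
        'letsencrypt': {'domains': {'left4.me': {}}},
        'wireguard': {'my_ip': '172.30.0.5'},
    },
}
acl = acme_acl_addresses(nodes)
```

ovh.left4me no longer crashes the first branch and still lands in the ACL through its wireguard address.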
Single bundle group; pulls in bundles/left4me. Joined by nodes that run
the L4D2 game-server platform. nftables and systemd come in via the
debian-13 group on Debian-13 nodes, so this group needs only the
left4me bundle itself.
Catches misconfiguration at bw test time if a node attaches left4me
without those two bundles. Both contribute load-bearing metadata
materializers (nftables/output rules; systemd/units → unit files).
Three issues caught once `bw test ovh.left4me` ran with the bundle
actually attached (vs. the earlier `bw test` with no node opting in,
which only checks parsing):
1. systemd_services + nftables_output reactors didn't read any metadata.
bw rejects this with "did not request any metadata, you might want
to use defaults instead". Both contributions are static, so they
belong in `defaults` — moved.
2. git_deploy:/opt/left4me/src triggered action:left4me_create_venv,
but create_venv lacked `triggered: True`. bw enforces that any
action in a triggers list must be `triggered: True`. Removed
create_venv from the trigger list — it's gated by `unless` for
idempotency and doesn't need to refire on git updates anyway
(the venv persists). pip_install stays in triggers so editable
installs pick up new code.
Replaces the per-app inet left4me_mark table from
deploy/files/usr/local/lib/left4me/nft/left4me-mark.nft with two rules
in the central bundles/nftables/ inet filter table's output chain.
Same selectors (skuid left4me + l4proto udp), same actions (DSCP EF +
priority 6) for both v4 and v6.
The server@ template intentionally has no svc_systemd entry — instances
are started on-demand by the web app through the left4me-systemctl
helper. Slices are activated implicitly when units use Slice=.
Sets in libs/systemd.py:18 are sorted alphabetically. The current
output is correct by accident — host.env < web.env, host.env < /var.
Adding a third path later would silently reorder. Tuples preserve
insertion order; generate_unitfile() iterates them the same way.
Environment (HOME=, PATH=) stays a set: each line is an independent
KEY=VALUE assignment, order is irrelevant.
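
A small demonstration of the ordering hazard. The third path, extra.env, is hypothetical, and `sorted()` stands in for the alphabetical sorting applied to sets.

```python
# Today: alphabetical order happens to match the intended order.
env_files = ('/etc/left4me/host.env', '/etc/left4me/web.env')
current_sorted = sorted(set(env_files))

# Append a hypothetical third file that should be read last.
extended = env_files + ('/etc/left4me/extra.env',)
extended_sorted = sorted(set(extended))  # extra.env jumps to the front

# The tuple keeps the order it was written in.
extended_in_order = list(extended)
```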