left4me: per-node system_cpus set; pin HT siblings on ovh.left4me

Replaces bundle-default system_core_count int with a per-node set of CPU ids; reactor takes set complement for game cores. ovh.left4me sets {0, 4} to keep both HT siblings of physical core 0 in system.slice so games don't share L1/L2 with system work. systemd_units reactor return inlined.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
left4me: per-slice AllowedCPUs= driven by system_core_count
2026-05-11 00:20:28 +02:00 · 2026-05-11 00:04:35 +02:00 · 2026-05-10 22:46:45 +02:00 · 2026-05-10 22:12:03 +02:00 · 2026-05-10 21:36:59 +02:00 · 2026-05-10 21:32:08 +02:00
33 changed files with 2011 additions and 23 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -5,3 +5,5 @@
 .bw_debug_history
 # CocoIndex Code (ccc)
 /.cocoindex_code/
 # bundlewrap git_deploy local-mirror map (operator-specific paths)
 git_deploy_repos
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -12,12 +12,12 @@ not project documentation. Onboarding lives **here**, in `AGENTS.md`.
 ## Quickstart for agents
-Five rules; follow these and you won't break things:
+Six rules; follow these and you won't break things:
 1. **Read-only by default.** Never run `bw apply`, `bw run`, or
   `bw lock` without explicit user request — even with `-i`. Stick
-   to `bw test`, `bw nodes`, `bw groups`, `bw bundles`,
-   `bw items`, `bw metadata`, `bw hash`, `bw debug`. See
+   to `bw test`, `bw nodes`, `bw groups`, `bw items`,
+   `bw metadata`, `bw hash`, `bw verify`, `bw debug`. See
   [`docs/agents/commands.md`](docs/agents/commands.md) and the
   fork's [safety envelope](https://github.com/CroneKorkN/bundlewrap/blob/main/AGENTS.md).
 2. **Never echo decrypted secrets.** Don't print, paste, or log the
@@ -38,6 +38,15 @@ Five rules; follow these and you won't break things:
 5. **Prefer adding helpers to `libs/`** over duplicating logic across
   bundles. Repo-wide helpers go in
   [`libs/`](libs/AGENTS.md), reachable as `repo.libs.<x>`.
 6. **`ccc` is available for semantic search.** This repo is indexed
   with [`ccc`](https://github.com/cocoindex-io/cocoindex-code).
   Reach for it on conceptual questions ("where is X used / which
   bundles do Y / what are the contexts of Z"), where a keyword
   grep would miss indirect usage:
   `ccc search '<concept>' --path '**'`. Pass `--path '**'` —
   without it, results are filtered to the current working
   directory's subtree. `grep`/`rg`/`find` remain fine for
   exact-string lookups; pick whichever fits the question.
 ## Layout
--- a/bundles/AGENTS.md
+++ b/bundles/AGENTS.md
@@ -41,6 +41,16 @@ bundles/<name>/
  more than one bundle. Don't duplicate logic across bundles.
 - **Custom item types** (e.g. `download:`) live in
  [`items/`](../items/AGENTS.md), not per-bundle.
 - **Bundles own application-wide knowledge; nodes carry only the few
  per-host knobs the bundle actually needs.** When designing a bundle,
  identify the per-node knobs (e.g. domain, uplink interface, a
  vault-id suffix) and put everything else in `defaults`, or in a
  reactor that derives from those knobs. Per-node random secrets
  belong in `defaults` via `repo.vault.random_bytes_as_base64_for(...)`
  keyed on the node — not in the node file. See
  `bundles/left4me/metadata.py:10` (`secret_key` derived in defaults)
  and `bundles/postgresql/metadata.py:4` (vault-derived `password_for`
  at module scope).
 ## How to add a new bundle
@@ -56,12 +66,22 @@ bundles/<name>/
   [`groups/<axis>/<x>.py`](../groups/AGENTS.md) (preferred for shared
   bundles) or to the node's `bundles` list directly
   ([`nodes/AGENTS.md`](../nodes/AGENTS.md)).
-5. Verify, in this order:
-   - `bw test` — sanity (loaders + reactors).
-   - `bw items <node>` — confirm new items appear on a node that opts in.
-   - `bw hash <node>` — confirm the change is what you expected. See
-     [`docs/agents/commands.md`](../docs/agents/commands.md) and the
-     fork's hash-diff workflow.
+5. **Verify, in this order:**
+   - `bw test` — repo-wide parse + cross-cutting hooks. Loads every
+     bundle, but reactors don't fire for nodes that haven't opted into
+     the bundle yet — bugs in new reactors stay hidden here.
+   - **Attach the bundle to a node** (via the node's `bundles` list, or
+     a group it belongs to). Until you do, the next steps don't actually
     exercise the bundle.
   - `bw test <node>` — exercises every reactor and item-graph edge for
     that node. This is where most new-bundle bugs surface.
   - `bw items <node> --blame` — confirm items materialise with the
     right paths, authored by the expected bundle.
   - `bw metadata <node> -k <a/b>` — spot-check derived metadata.
   - `bw hash <node>` — preview vs current host state.
   See [`docs/agents/commands.md#bundle-validation-workflow`](../docs/agents/commands.md#bundle-validation-workflow)
   for the rationale.
 6. Add a `bundles/<name>/README.md`. See "Per-bundle README" below
   for what to cover.
@@ -82,6 +102,12 @@ bundles/<name>/
  unless the matching `file:` item declares `content_type='mako'`
  (or a templating extension triggers it). To check, read the matching
  `file:` entry in `items.py`.
 - **`file:` `source` defaults to the destination basename.** For a
  destination of `/etc/foo/bar.conf` with no `source` key, bw looks
  for `bundles/<bundle>/files/bar.conf`. Only declare `source`
  explicitly when the basename you want differs (e.g. shipping a Mako
  template named `bar.conf.mako` to a destination of
  `/etc/foo/bar.conf`).
 - **Reactors writing across namespaces.** Some bundles' reactors write
  into other bundles' metadata namespaces (e.g. `nextcloud` writes
  into `apt.packages`, `archive.paths`). When you change such a bundle,
@@ -90,6 +116,28 @@ bundles/<name>/
  itself; grep `'<other-bundle>':` in the reactors when in doubt.
 - **`bw hash` doesn't accept selectors.** Use `bw hash <node>` per
  literal name; see the fork's runbook.
 - **Reactors must read metadata.** If a reactor body returns a static
  dict without calling `metadata.get(...)`, bw raises
  `ValueError: <reactor> on <node> did not request any metadata, you
  might want to use defaults instead` once a node consumes the bundle.
  Fix: fold the contribution into `defaults`. The rule applies even
  when the reactor writes into another bundle's namespace — a static
  contribution to e.g. `nftables/output` belongs in `defaults`, where
  bw merges it with other bundles' contributions.
 - **`triggers` ↔ `triggered: True` invariant.** Any item listed in
  another's `triggers` list must declare `triggered: True`. bw
  enforces this at `bw test` time: *"…triggered by …, but missing
  'triggered' attribute"*. Corollary: an action can't be both in an
  upstream `triggers` list AND self-healing every apply — pick one.
 - **Triggered actions don't recover from partial failure.** When an
  upstream item's apply succeeds but its triggered downstream action
  fails, subsequent applies can't recover via the trigger chain —
  upstream is "already in desired state" and never re-triggers. For
  actions that must self-heal (pip installs, chowns, migrations),
  drop `triggered: True` and gate the command with `unless: <fast-check>`.
  `unless` is a shell command on the target host whose exit status
  decides whether the main command runs (exit 0 = skip); it's checked
  at fire time, after `triggered:` filtering.
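
The last two gotchas can be sketched as an `items.py` fragment. This is a minimal illustration under stated assumptions, not code from this repo: the item names, paths, and the `unless` check are hypothetical. Only the `triggers`, `triggered`, `command`, and `unless` attributes are bundlewrap's own.

```python
# Hypothetical items.py fragment; names and paths are illustrative only.

files = {
    '/etc/example/app.conf': {
        # Every item listed under `triggers` must declare `triggered: True`.
        'triggers': ['action:example_reload'],
    },
}

actions = {
    # Trigger-only: runs when the file above changes, never on its own.
    'example_reload': {
        'command': 'systemctl reload example.service',
        'triggered': True,
    },
    # Self-healing: considered on every apply, skipped when `unless` exits 0.
    # It appears in no `triggers` list, so it must not set `triggered`.
    'example_pip_install': {
        'command': '/opt/example/.venv/bin/pip install -e /opt/example/src',
        'unless': '/opt/example/.venv/bin/python -c "import example"',
    },
}
```

If the pip install fails on one apply, the next apply runs the `unless` check again and retries, which a purely triggered action would not do.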
 ## Per-bundle README
--- a/bundles/bind-acme/metadata.py
+++ b/bundles/bind-acme/metadata.py
@@ -33,6 +33,7 @@ def acme_zone(metadata):
            str(ip_interface(other_node.metadata.get('network/internal/ipv4')).ip)
                for other_node in repo.nodes
                if other_node.metadata.get('letsencrypt/domains', {})
                and other_node.metadata.get('network/internal/ipv4', None)
        },
        *{
            str(ip_interface(other_node.metadata.get('wireguard/my_ip')).ip)
--- a/bundles/bind/README.md
+++ b/bundles/bind/README.md
@@ -0,0 +1,30 @@
 # bind
 Authoritative DNS — primary plus optional `bind/master_node` slaves.
 ## Applying changes needs both nodes
 The slave's bw-managed zone files are rendered from the master's
 metadata at slave-apply time (see `bundles/bind/items.py:100`). When
 you change a record on the master (adding a `letsencrypt/domains`
 entry, a new vhost, etc.), the change is only published once you
 apply BOTH:
 ```sh
 bw apply htz.mails        # primary (where the source records live)
 bw apply ovh.secondary    # secondary (renders its own zone files)
 ```
 Until both have been applied, `bw verify ovh.secondary` will show
 stale zones and consumers that hit the secondary (Let's Encrypt's
 secondary-region validators in particular) will see NXDOMAIN. Even
 though the slave's named.conf.local declares `type slave;`, don't
 rely on bind's own AXFR catching up — the bw-rendered file on disk
 is what `bw verify` measures.
 ## See also
 - `bundles/bind-acme/` — the in-house ACME-update receiver.
 - `bundles/letsencrypt/README.md` — DNS-01 prerequisites and the
  negative-cache penalty (the most common operational consequence
  of forgetting to apply the secondary).
--- a/bundles/left4me/README.md
+++ b/bundles/left4me/README.md
@@ -0,0 +1,114 @@
 # left4me
 L4D2 game-server management platform: a Flask web UI on gunicorn that
 provisions per-instance srcds servers via templated systemd units, with
 kernel-overlayfs layering for shared installations + per-overlay maps,
 and uid-based DSCP/priority marking on the egress path so CAKE on the
 external interface prioritizes srcds UDP over bulk traffic.
 ## Metadata
 ```python
 'metadata': {
    'left4me': {
         'domain': 'whatever.tld',  # required
         'system_cpus': {0, 4},     # required; see "CPU isolation" under Gotchas
        # Everything below is optional and has a sensible default in the
        # bundle. Override per-node only if the default is wrong:
        # 'git_url': 'git@git.sublimity.de:cronekorkn/left4me',
        # 'git_branch': 'master',
        # 'gunicorn_workers': 1,
        # 'gunicorn_threads': 32,
        # 'job_worker_threads': 4,
        # 'port_range_start': 27015,
        # 'port_range_end': 27115,
        # secret_key is auto-derived per node
        # (repo.vault.random_bytes_as_base64_for f'{node.name} left4me secret_key').
    },
 },
 ```
 The bundle's `derived_from_domain` reactor reads `left4me/domain` and
 emits the corresponding `nginx/vhosts`, `letsencrypt/domains`,
 `monitoring/services/left4me-web` (HTTPS health check), and the game-
 port `nftables/input` accept rules. Backup paths
 (`/var/lib/left4me`, `/etc/left4me`) are set-merged into `backup/paths`
 from defaults. None of these need to be declared per-node.
 ## What this bundle does
 - Creates system users `left4me` (uid/gid 980, home `/var/lib/left4me`,
  mode 0711) and `l4d2-sandbox` (uid/gid 981, no home, used by bwrap
  script-overlay builds).
 - Drops privileged helpers under `/usr/local/libexec/left4me/`
  (`left4me-systemctl`, `left4me-journalctl`, `left4me-overlay`,
  `left4me-script-sandbox`) plus a tight sudoers file (validated with
  `visudo -cf` before install).
 - `git_deploy`s the left4me repo to `/opt/left4me/src`, builds a venv at
  `/opt/left4me/.venv`, `pip install -e`s both `l4d2host` and `l4d2web`,
  runs `alembic upgrade head` and `flask seed-script-overlays`, then
  enables `left4me-web.service`.
 - Emits four systemd units via `systemd/units` metadata (consumed by
  `bundles/systemd/`):
  - `left4me-web.service` — gunicorn on `127.0.0.1:8000` (TLS terminates upstream).
  - `left4me-server@.service` — per-instance srcds template, started on
    demand by the web app via the `left4me-systemctl` helper.
  - `l4d2-game.slice` / `l4d2-build.slice` — cgroup slices for the
    perf-baseline (CPU/IO weights, memory caps).
 - Contributes uid-based DSCP/priority marks for srcds UDP egress to
  `nftables/output` (via `defaults`).
 ## Gotchas
 - **Requires `bundles/nftables` and `bundles/systemd` on the node.** The
  bundle asserts membership at `bw test` time. On Debian-13 these ride
  in via the `debian-13` group, so attaching the bundle to a Debian-13
  node is enough.
 - **`left4me-web.service` does not have `NoNewPrivileges=true`.** This is
  intentional — workers `sudo` the privileged helpers; `NoNewPrivileges`
  would block setuid escalation. Per-instance `server@.service` units
  *do* have it.
 - **CAKE shaping is configured separately**, via
  `network/<iface>/cake` on the node (consumed by `bundles/network/`),
  not by this bundle.
 - **First-run admin user is manual.** After `bw apply`, ssh to the host and
  bootstrap the admin via the `left4me` wrapper (it sources the env files,
  drops to the `left4me` user, and runs the flask CLI):
  `sudo left4me create-user <username> --admin` (prompts for password via
  the flask CLI, or set `LEFT4ME_ADMIN_PASSWORD` first). The bundle
  deliberately doesn't seed an admin to keep credentials out of the
  metadata pipeline. The same `left4me` wrapper accepts any other flask
  subcommand: `sudo left4me seed-script-overlays <dir>`,
  `sudo left4me routes`, `sudo left4me shell`, etc.
 - **CPU isolation is managed by this bundle**, driven by one required
  per-node knob: `left4me/system_cpus` — a set of int CPU ids that
  pins `system.slice` / `user.slice` / `l4d2-build.slice`. The
  complement (`set(range(vm/threads)) - system_cpus`) pins
  `l4d2-game.slice`. On HT hosts, list both SMT siblings of every
  physical core you want to reserve for system, otherwise games end
  up sharing L1/L2 with system. Find pairings via
  `/sys/devices/system/cpu/cpu<n>/topology/thread_siblings_list`. On
  the prod node (`ovh.left4me`, 4 physical / 8 threads, pairings
  (0,4) (1,5) (2,6) (3,7)) the node sets `'system_cpus': {0, 4}` to
  reserve physical core 0 entirely. `l4d2-game.slice` and
  `l4d2-build.slice` carry `AllowedCPUs=` inline on their unit
  definitions; `system.slice` and `user.slice` get drop-ins registered
  under `systemd/units` with the `'<parent>.d/<basename>.conf'` key
  convention (same shape nginx and autologin use), landing at
  `/usr/local/lib/systemd/system/<slice>.d/99-left4me-cpuset.conf`.
  The reactor raises if `system_cpus` includes CPUs outside
  `[0, vm/threads)` or leaves no cores for games.
 - **Kernel feature requirement:** kernel-overlayfs (`CONFIG_OVERLAY_FS`).
  Standard on debian-13.
 - **Game ports** are opened by the web app on demand in the range 27015-27115
  (UDP+TCP). Add corresponding accept rules to `nftables/input` per
  node if the host's policy is default-drop on input.
 - **Pinned UIDs/GIDs (980/981).** Chosen for deterministic ownership
  across rebuilds and backup restores. If you add another bundle that
  pins UIDs in this repo, make sure it doesn't collide.
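
The set arithmetic behind the CPU isolation gotcha can be sketched in plain Python. This is a minimal illustration, not the bundle's reactor: `parse_cpu_list`, `split_cpus`, and `allowed_cpus` are hypothetical helper names, and the thread count is passed in directly instead of being read from `vm/threads`.

```python
def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0,4' or '0-1,4-5' into a set of ints."""
    cpus = set()
    for part in text.strip().split(','):
        if '-' in part:
            lo, hi = part.split('-', 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def split_cpus(threads, system_cpus):
    """Return (system, game) CPU sets; the game set is the complement."""
    all_cpus = set(range(threads))
    stray = set(system_cpus) - all_cpus
    if stray:
        raise ValueError(f'system_cpus outside [0, {threads}): {sorted(stray)}')
    game_cpus = all_cpus - set(system_cpus)
    if not game_cpus:
        raise ValueError('system_cpus leaves no cores for games')
    return set(system_cpus), game_cpus


def allowed_cpus(cpus):
    """Render a CPU set as a space-separated list for AllowedCPUs=."""
    return ' '.join(str(cpu) for cpu in sorted(cpus))


# ovh.left4me: 8 threads; '0,4' is what cpu0's thread_siblings_list
# would contain for the (0,4) pairing described above.
system, game = split_cpus(8, parse_cpu_list('0,4'))
print('system:', allowed_cpus(system))
print('game:', allowed_cpus(game))
```

For `ovh.left4me` this yields `0 4` for the system-side slices and `1 2 3 5 6 7` for `l4d2-game.slice`. Listing only one sibling (for example `{0}`) would leave CPU 4 in the game set, which is the shared-L1/L2 case the gotcha warns about.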
 ## Slice support requires `bundles/systemd` ≥ commit cc1c6a5
 This bundle's `l4d2-game.slice` and `l4d2-build.slice` units rely on
 `bundles/systemd/items.py` accepting the `.slice` extension. Older
 revisions raised `Exception(f'unknown type slice')` at apply time.
 The repo-wide `bw test` will catch this if it regresses.
--- a/bundles/left4me/files/etc/left4me/host.env.mako
+++ b/bundles/left4me/files/etc/left4me/host.env.mako
@@ -0,0 +1,6 @@
 # Managed by ckn-bw bundles/left4me. Local edits will be reverted.
 # Deployment units use fixed /var/lib/left4me paths; regenerate units if this changes.
 LEFT4ME_ROOT=/var/lib/left4me
 # l4d2host invokes steamcmd by absolute path — bypasses PATH lookup so the
 # script's `cd "$(dirname "$0")"` resolves next to the real install dir.
 LEFT4ME_STEAMCMD=/opt/left4me/steam/steamcmd.sh
--- a/bundles/left4me/files/etc/left4me/sandbox-resolv.conf
+++ b/bundles/left4me/files/etc/left4me/sandbox-resolv.conf
@@ -0,0 +1,6 @@
 # Sandbox-only resolver config — bind-mounted into script-overlay sandboxes
 # at /etc/resolv.conf. The host's resolver (often a private/LAN DNS server)
 # is unreachable from inside the sandbox because IPAddressDeny= blocks
 # egress to RFC1918 / loopback. Public resolvers keep DNS working.
 nameserver 1.1.1.1
 nameserver 8.8.8.8
--- a/bundles/left4me/files/etc/left4me/web.env.mako
+++ b/bundles/left4me/files/etc/left4me/web.env.mako
@@ -0,0 +1,7 @@
 # Managed by ckn-bw bundles/left4me. Local edits will be reverted.
 DATABASE_URL=sqlite:////var/lib/left4me/left4me.db
 SECRET_KEY=${node.metadata.get('left4me/secret_key')}
 JOB_WORKER_THREADS=${node.metadata.get('left4me/job_worker_threads')}
 SESSION_COOKIE_SECURE=true
 LEFT4ME_PORT_RANGE_START=${node.metadata.get('left4me/port_range_start')}
 LEFT4ME_PORT_RANGE_END=${node.metadata.get('left4me/port_range_end')}
--- a/bundles/left4me/files/etc/sudoers.d/left4me
+++ b/bundles/left4me/files/etc/sudoers.d/left4me
@@ -0,0 +1,5 @@
 Defaults:left4me !requiretty
 left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-systemctl *
 left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-journalctl *
 left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-overlay mount *, /usr/local/libexec/left4me/left4me-overlay umount *
 left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox
--- a/bundles/left4me/files/etc/sysctl.d/99-left4me.conf
+++ b/bundles/left4me/files/etc/sysctl.d/99-left4me.conf
@@ -0,0 +1,36 @@
 # Host-side perf baseline for left4me — see
 # docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md
 #
 # UDP socket buffers: distro defaults of ~128 KiB are too small for sustained
 # Source-engine UDP across multiple instances. 8 MiB matches the standard
 # 1 Gbit recommendation; rmem_default/wmem_default protect sockets that don't
 # explicitly enlarge their buffers.
 net.core.rmem_max = 8388608
 net.core.wmem_max = 8388608
 net.core.rmem_default = 524288
 net.core.wmem_default = 524288
 # Kernel softirq UDP path: the per-CPU backlog queue starts dropping packets
 # at the default 1000 under multi-instance burst; 5000 absorbs realistic peaks.
 # netdev_budget = 600 gives softirq more drain headroom per pass.
 net.core.netdev_max_backlog = 5000
 net.core.netdev_budget = 600
 # Latency-sensitive default: avoid swap unless the box is really under
 # pressure. Harmless on swapless hosts.
 vm.swappiness = 10
 # Per-socket UDP buffer floors: protect game-server sockets that don't bump
 # their own SO_RCVBUF/SO_SNDBUF when softirq drains lag briefly.
 net.ipv4.udp_rmem_min = 16384
 net.ipv4.udp_wmem_min = 16384
 # Default qdisc for ifaces we don't explicitly shape with CAKE. Debian Trixie
 # already defaults to fq_codel; setting it explicitly is belt-and-suspenders
 # and survives kernel-default churn.
 net.core.default_qdisc = fq_codel
 # TCP congestion control: BBR for any bulk TCP egress on the host (admin SSH,
 # backups, package fetches, web-app responses) so a long flow does not push
 # the bottleneck queue ahead of game UDP. UDP srcds is unaffected.
 net.ipv4.tcp_congestion_control = bbr
--- a/bundles/left4me/files/usr/local/libexec/left4me/left4me-journalctl
+++ b/bundles/left4me/files/usr/local/libexec/left4me/left4me-journalctl
@@ -0,0 +1,53 @@
 #!/bin/sh
 set -eu
 usage() {
    printf '%s\n' "usage: left4me-journalctl <server-name> --lines <n> --follow|--no-follow" >&2
    exit 2
 }
 validate_name() {
    name=$1
    [ -n "$name" ] || usage
    case "$name" in
        .*|*..*|*/*|*\\*) usage ;;
    esac
    case "$name" in
        *[!A-Za-z0-9_.-]*) usage ;;
    esac
 }
 [ "$#" -eq 4 ] || usage
 name=$1
 lines_flag=$2
 lines=$3
 follow_flag=$4
 validate_name "$name"
 [ "$lines_flag" = "--lines" ] || usage
 case "$lines" in
    ''|*[!0-9]*) usage ;;
 esac
 follow_arg=
 case "$follow_flag" in
    --follow) follow_arg=-f ;;
    --no-follow) ;;
    *) usage ;;
 esac
 unit="left4me-server@${name}.service"
 if [ -x /bin/journalctl ]; then
    journalctl=/bin/journalctl
 elif [ -x /usr/bin/journalctl ]; then
    journalctl=/usr/bin/journalctl
 else
    printf '%s\n' 'journalctl not found at /bin/journalctl or /usr/bin/journalctl' >&2
    exit 69
 fi
 if [ -n "$follow_arg" ]; then
    exec "$journalctl" -u "$unit" -n "$lines" -o cat "$follow_arg"
 fi
 exec "$journalctl" -u "$unit" -n "$lines" -o cat
--- a/bundles/left4me/files/usr/local/libexec/left4me/left4me-overlay
+++ b/bundles/left4me/files/usr/local/libexec/left4me/left4me-overlay
@@ -0,0 +1,242 @@
 #!/usr/bin/python3
 """Privileged overlay mount helper for left4me.
 Invoked from the systemd unit's ExecStartPre / ExecStopPost via
 `+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- …`. The unit-level
 nsenter is what makes this work: it runs the helper Python interpreter
 inside PID 1's mount namespace. Without it, the `+` Exec prefix
 removes the sandbox/credentials but does NOT detach from the unit's
 per-service mount namespace, and the helper process itself would pin
 that namespace alive — turning every umount into a multi-second EBUSY
 race with the kernel's deferred namespace cleanup. With the unit-level
 nsenter the helper has no such reference and umount succeeds first try.
 Validates inputs strictly, then performs `mount -t overlay` /
 `umount` directly — no internal nsenter, since the helper is already
 running where the syscalls need to take effect.
 Verbs:
    mount <name>    Reads ${LEFT4ME_ROOT}/instances/<name>/instance.env
                    for L4D2_LOWERDIRS, validates every lowerdir is
                    under one of installation/overlays/workshop_cache/
                    global_overlay_cache, then mounts the kernel
                    overlay at runtime/<name>/merged.
    umount <name>   Unmounts runtime/<name>/merged and cleans up the
                    kernel-overlayfs `work/work` orphan.
 Set LEFT4ME_OVERLAY_PRINT_ONLY=1 to print the would-be argv (one line,
 shell-quoted) and exit 0 instead of execv. Used by tests.
 """
 import os
 import re
 import shlex
 import shutil
 import subprocess
 import sys
 from pathlib import Path
 NAME_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,63}$")
 DEFAULT_ROOT = "/var/lib/left4me"
 LOWERDIR_ALLOWLIST = (
    "installation",
    "overlays",
    "global_overlay_cache",
    "workshop_cache",
 )
 MAX_LOWERDIRS = 500
 MOUNT_BIN = "/bin/mount"
 UMOUNT_BIN = "/bin/umount"
 def die(msg: str) -> None:
    sys.stderr.write(f"left4me-overlay: {msg}\n")
    sys.exit(1)
 def root() -> Path:
    return Path(os.environ.get("LEFT4ME_ROOT") or DEFAULT_ROOT)
 def validate_name(name: str) -> str:
    if not NAME_RE.fullmatch(name):
        die(f"invalid instance name: {name!r}")
    return name
 def parse_lowerdirs(env_path: Path) -> list[str]:
    if not env_path.is_file():
        die(f"instance.env not found: {env_path}")
    raw = None
    for line in env_path.read_text().splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "L4D2_LOWERDIRS":
            raw = value
            break
    if raw is None:
        die(f"L4D2_LOWERDIRS not set in {env_path}")
    if raw == "":
        die(f"L4D2_LOWERDIRS is empty in {env_path}")
    parts = raw.split(":")
    if any(p == "" for p in parts):
        die(f"L4D2_LOWERDIRS contains an empty entry: {raw!r}")
    if len(parts) > MAX_LOWERDIRS:
        die(f"L4D2_LOWERDIRS has {len(parts)} entries (cap {MAX_LOWERDIRS})")
    return parts
 def canonical_under(allowed_roots: list[Path], path: Path) -> Path:
    try:
        canonical = path.resolve(strict=True)
    except (FileNotFoundError, RuntimeError):
        die(f"path does not exist or has a symlink loop: {path}")
    for r in allowed_roots:
        if canonical == r or r in canonical.parents:
            return canonical
    die(f"path is outside the permitted roots: {path} (resolved: {canonical})")
 _LISTXATTR = getattr(os, "listxattr", None)
 def _entry_has_fuse_xattr(path: str) -> str | None:
    if _LISTXATTR is None:
        return None
    try:
        attrs = _LISTXATTR(path, follow_symlinks=False)
    except OSError:
        return None
    for a in attrs:
        if a.startswith("user.fuseoverlayfs."):
            return a
    return None
 def assert_no_fuse_xattrs(upper: Path) -> None:
    if not upper.exists() or _LISTXATTR is None:
        return
    for dirpath, dirnames, filenames in os.walk(upper):
        for entry in (dirpath, *(os.path.join(dirpath, n) for n in dirnames),
                      *(os.path.join(dirpath, n) for n in filenames)):
            tainted = _entry_has_fuse_xattr(entry)
            if tainted:
                die(
                    f"upperdir contains fuse-overlayfs xattr {tainted!r} on {entry}; "
                    "wipe upper/ and work/ before mounting"
                )
 def exec_or_print(argv: list[str]) -> None:
    if os.environ.get("LEFT4ME_OVERLAY_PRINT_ONLY") == "1":
        print(" ".join(shlex.quote(a) for a in argv))
        sys.exit(0)
    os.execv(argv[0], argv)
 def cmd_mount(name: str) -> None:
    name = validate_name(name)
    r = root()
    runtime_name_dir = (r / "runtime" / name).resolve(strict=True)
    merged_for_check = (runtime_name_dir / "merged").resolve(strict=True)
    # Idempotency for unit restart cycles: if a previous start mounted
    # successfully but ExecStart failed afterwards (and Restart=on-failure
    # fires another cycle), the second ExecStartPre would otherwise refuse
    # to mount-on-top. Short-circuit here so the second cycle just gets
    # straight to ExecStart. PRINT_ONLY (test mode) bypasses this so the
     # tests can exercise the full mount argv regardless of mount state.
    if (
        os.environ.get("LEFT4ME_OVERLAY_PRINT_ONLY") != "1"
        and os.path.ismount(merged_for_check)
    ):
        return
    instance_env = r / "instances" / name / "instance.env"
    raw_lowerdirs = parse_lowerdirs(instance_env)
    allowed_roots = [(r / sub).resolve() for sub in LOWERDIR_ALLOWLIST]
    canonical_lowerdirs = [str(canonical_under(allowed_roots, Path(p))) for p in raw_lowerdirs]
    upper = (runtime_name_dir / "upper").resolve(strict=True)
    work = (runtime_name_dir / "work").resolve(strict=True)
    merged = merged_for_check
    for label, path in (("upper", upper), ("work", work), ("merged", merged)):
        if path.parent != runtime_name_dir:
            die(f"{label} resolved outside runtime/{name}: {path}")
    assert_no_fuse_xattrs(upper)
    options = f"lowerdir={':'.join(canonical_lowerdirs)},upperdir={upper},workdir={work}"
    argv = [
        MOUNT_BIN,
        "-t", "overlay",
        "overlay",
        "-o", options,
        str(merged),
    ]
    exec_or_print(argv)
 def cmd_umount(name: str) -> None:
    name = validate_name(name)
    r = root()
    runtime_name_dir = (r / "runtime" / name).resolve(strict=True)
    merged_path = runtime_name_dir / "merged"
    work_inner = runtime_name_dir / "work" / "work"
    argv = [
        UMOUNT_BIN,
        # Resolve only if it exists; PRINT_ONLY tests always pre-create it.
        str(merged_path.resolve(strict=True) if merged_path.exists() else merged_path),
    ]
    # PRINT_ONLY: emit the umount argv and exit. Tests assert exact shape
    # of this dry-run; the post-umount cleanup of work_inner is a runtime
    # behaviour exercised on the host, not in unit tests.
    if os.environ.get("LEFT4ME_OVERLAY_PRINT_ONLY") == "1":
        print(" ".join(shlex.quote(a) for a in argv))
        sys.exit(0)
    if merged_path.exists():
        merged = merged_path.resolve(strict=True)
        if merged.parent != runtime_name_dir:
            die(f"merged resolved outside runtime/{name}: {merged}")
        # Idempotency: only umount if currently a mount point. Mirrors
        # cmd_mount's symmetric check; a redundant cleanup pass — or a
        # call after a partial _purge_instance — must be a no-op.
        #
        # No retry loop here: with the helper running in PID 1's mount
        # namespace (via the unit-level `nsenter --mount=/proc/1/ns/mnt`
        # in ExecStopPost), it holds no reference to the unit's
        # per-service mount namespace, so the cgroup-empty → namespace
        # reaped → umount-clears sequence happens without any race
        # window for us to ride out. EBUSY here is a real error.
        if os.path.ismount(merged):
            subprocess.run(argv, check=True)
    # Kernel-overlayfs creates work_inner during mount with root:root mode
    # 0/0. After unmount it's an orphan that the unit's User= (left4me)
    # cannot traverse via shutil.rmtree, so reset/delete in instances.py
    # blows up with EACCES on `runtime/<name>/work/work`. The helper is
    # the only code path with root that knows about this directory, so
    # the cleanup belongs here. Safe to nuke — the kernel re-creates it
    # on the next mount. Run unconditionally — covers both "we just
    # unmounted" and "previous teardown didn't finish" cases.
    if work_inner.exists():
        shutil.rmtree(work_inner)
 def main(argv: list[str]) -> None:
    if len(argv) != 3 or argv[1] not in ("mount", "umount"):
        sys.stderr.write("usage: left4me-overlay mount|umount <name>\n")
        sys.exit(2)
    if argv[1] == "mount":
        cmd_mount(argv[2])
    else:
        cmd_umount(argv[2])
 if __name__ == "__main__":
    main(sys.argv)
--- a/bundles/left4me/files/usr/local/libexec/left4me/left4me-script-sandbox
+++ b/bundles/left4me/files/usr/local/libexec/left4me/left4me-script-sandbox
@@ -0,0 +1,82 @@
 #!/bin/bash
 # Privileged sandbox launcher for left4me script overlays.
 #
 # Invoked via sudo by the web user with two arguments:
 #   <overlay_id>   numeric overlay id; bind-mounts /var/lib/left4me/overlays/<id>
 #                  read-write at /overlay inside the sandbox.
 #   <script_path>  absolute path to a bash file already written by the web app;
 #                  bind-mounted read-only at /script.sh inside the sandbox.
 #
 # The script runs as a transient systemd .service with the full hardening
 # surface: cgroup limits + walltime kill, NoNewPrivileges, ProtectSystem,
 # ProtectHome, kernel-tunable / -module / -log protection, namespace
 # restriction, address-family restriction, capability bounding (empty),
 # seccomp filter (@system-service @network-io), MemoryDenyWriteExecute,
 # LockPersonality, RestrictSUIDSGID. Network namespace is *not* restricted —
 # scripts must reach the public internet to download workshop / l4d2center
 # / cedapug content. PID namespace is shared with the host (no
 # PrivatePID= directive in systemd); host PIDs are visible via /proc but
 # not signal-able due to UID mismatch.
 set -euo pipefail
 [[ $# -eq 2 ]] || { echo "usage: $0 <overlay_id> <script>" >&2; exit 64; }
 OVERLAY_ID=$1
 SCRIPT=$2
 [[ "$OVERLAY_ID" =~ ^[0-9]+$ ]] || { echo "bad overlay id" >&2; exit 64; }
 OVERLAY_DIR=/var/lib/left4me/overlays/$OVERLAY_ID
 [[ -d $OVERLAY_DIR ]] || { echo "no overlay dir at $OVERLAY_DIR" >&2; exit 65; }
 [[ -f $SCRIPT ]] || { echo "no script at $SCRIPT" >&2; exit 65; }
 if [[ "${LEFT4ME_SCRIPT_SANDBOX_DRY_RUN:-}" == "1" ]]; then
    echo "DRY RUN: overlay_id=$OVERLAY_ID script=$SCRIPT overlay_dir=$OVERLAY_DIR"
    exit 0
 fi
 # Make sure the sandbox UID owns the overlay dir so the script can write there.
 # Idempotent: a no-op when the dir is already l4d2-sandbox-owned (re-run case),
 # and corrects the ownership the first time the dir was created by the web app
 # under the left4me UID. World-readable so the gameserver process (left4me)
 # can read the overlay contents via the kernel-overlayfs lowerdir at runtime.
 chown -R l4d2-sandbox:l4d2-sandbox "$OVERLAY_DIR"
 chmod 0755 "$OVERLAY_DIR"
 SCRIPT_RC=0
 systemd-run --quiet --collect --wait --pipe \
    --unit="left4me-script-${OVERLAY_ID}-$$" \
    --slice=l4d2-build.slice \
    -p OOMScoreAdjust=500 \
    -p User=l4d2-sandbox -p Group=l4d2-sandbox \
    -p UMask=0022 \
    -p NoNewPrivileges=yes \
    -p ProtectSystem=strict -p ProtectHome=yes \
    -p PrivateTmp=yes -p PrivateDevices=yes -p PrivateIPC=yes \
    -p ProtectKernelTunables=yes -p ProtectKernelModules=yes \
    -p ProtectKernelLogs=yes -p ProtectControlGroups=yes \
    -p RestrictNamespaces=yes \
    -p RestrictAddressFamilies="AF_INET AF_INET6 AF_UNIX" \
    -p RestrictSUIDSGID=yes -p LockPersonality=yes \
    -p MemoryDenyWriteExecute=yes \
    -p SystemCallFilter="@system-service @network-io" \
    -p SystemCallArchitectures=native \
    -p CapabilityBoundingSet= -p AmbientCapabilities= \
    -p IPAddressDeny="127.0.0.0/8 ::1/128 169.254.0.0/16 fe80::/10 224.0.0.0/4 ff00::/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 100.64.0.0/10 fc00::/7" \
    -p TemporaryFileSystem="/etc /var/lib" \
    -p BindReadOnlyPaths="/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf /etc/ssl /etc/ca-certificates /etc/nsswitch.conf /etc/alternatives ${SCRIPT}:/script.sh" \
    -p BindPaths="${OVERLAY_DIR}:/overlay" \
    -p WorkingDirectory=/overlay \
    -p Environment="HOME=/tmp PATH=/usr/bin:/usr/sbin OVERLAY=/overlay" \
    -p MemoryMax=4G -p MemorySwapMax=0 -p TasksMax=512 \
    -p CPUQuota=200% -p RuntimeMaxSec=3600 \
    -- /bin/bash /script.sh || SCRIPT_RC=$?
 # Normalize perms so the web service (left4me uid) can read overlay files
 # directly via Python open() — needed by the file tree's download endpoint.
 # UMask=0022 above takes care of *new* writes; this catches anything the
 # script created with a tighter mode (e.g. cedapug_maps writes its
 # .cedapug/manifest.tsv as 0600 by default).
 find "$OVERLAY_DIR" -type f ! -perm -o+r -exec chmod o+r {} + 2>/dev/null || true
 find "$OVERLAY_DIR" -type d ! -perm -o+rx -exec chmod o+rx {} + 2>/dev/null || true
 exit $SCRIPT_RC
--- a/bundles/left4me/files/usr/local/libexec/left4me/left4me-systemctl
+++ b/bundles/left4me/files/usr/local/libexec/left4me/left4me-systemctl
@ -0,0 +1,44 @@
 #!/bin/sh
 set -eu
 usage() {
    printf '%s\n' "usage: left4me-systemctl enable|disable|show <server-name>" >&2
    exit 2
 }
 validate_name() {
    name=$1
    [ -n "$name" ] || usage
    case "$name" in
        .*|*..*|*/*|*\\*) usage ;;
    esac
    case "$name" in
        *[!A-Za-z0-9_.-]*) usage ;;
    esac
 }
 [ "$#" -eq 2 ] || usage
 action=$1
 name=$2
 case "$action" in
    enable|disable|show) ;;
    *) usage ;;
 esac
 validate_name "$name"
 unit="left4me-server@${name}.service"
 if [ -x /bin/systemctl ]; then
    systemctl=/bin/systemctl
 elif [ -x /usr/bin/systemctl ]; then
    systemctl=/usr/bin/systemctl
 else
    printf '%s\n' 'systemctl not found at /bin/systemctl or /usr/bin/systemctl' >&2
    exit 69
 fi
 case "$action" in
    enable) exec "$systemctl" enable --now "$unit" ;;
    disable) exec "$systemctl" disable --now "$unit" ;;
    show) exec "$systemctl" show --property=ActiveState --property=SubState "$unit" ;;
 esac
--- a/bundles/left4me/files/usr/local/sbin/left4me
+++ b/bundles/left4me/files/usr/local/sbin/left4me
@ -0,0 +1,17 @@
 #!/bin/sh
 # Run l4d2web flask CLI commands as the left4me user with the deploy env loaded.
 # Usage: left4me <flask-subcommand> [args...]
 # Examples:
 #   left4me create-user alice --admin
 #   left4me seed-script-overlays /opt/left4me/src/examples/script-overlays
 #   left4me routes
 set -eu
 exec sudo -u left4me sh -c '
    set -a
    . /etc/left4me/host.env
    . /etc/left4me/web.env
    set +a
    export JOB_WORKER_ENABLED=false
    export PYTHONPATH=/opt/left4me/src
    exec /opt/left4me/.venv/bin/flask --app l4d2web.app:create_app "$@"
 ' sh "$@"
--- a/bundles/left4me/items.py
+++ b/bundles/left4me/items.py
@ -0,0 +1,293 @@
 # Items for the left4me bundle.
 # Systemd units come from metadata via bundles/systemd/ — there are no
 # .service or .slice files in this bundle's files/ tree. Cpuset drop-ins
 # for system.slice / user.slice are likewise emitted via systemd/units
 # in metadata.py (key: '<parent>.d/<basename>.conf').
 directories = {
    '/opt/left4me': {
        'owner': 'left4me',
        'group': 'left4me',
    },
    '/opt/left4me/src': {
        'owner': 'left4me',
        'group': 'left4me',
    },
    '/etc/left4me': {
        'owner': 'root',
        'group': 'root',
        'mode': '0755',
    },
    '/var/lib/left4me': {
        # left4me's home dir — useradd creates with 0700; loosen to 0711 so
        # l4d2-sandbox can traverse (but not list) for bwrap bind-mounts.
        'owner': 'left4me',
        'group': 'left4me',
        'mode': '0711',
    },
    '/var/lib/left4me/installation':   {'owner': 'left4me', 'group': 'left4me'},
    '/var/lib/left4me/overlays':       {'owner': 'left4me', 'group': 'left4me'},
    '/var/lib/left4me/instances':      {'owner': 'left4me', 'group': 'left4me'},
    '/var/lib/left4me/runtime':        {'owner': 'left4me', 'group': 'left4me'},
    '/var/lib/left4me/workshop_cache': {'owner': 'left4me', 'group': 'left4me'},
    '/var/lib/left4me/tmp':            {'owner': 'left4me', 'group': 'left4me'},
    '/opt/left4me/steam':              {'owner': 'left4me', 'group': 'left4me'},
    '/usr/local/libexec/left4me': {
        'owner': 'root',
        'group': 'root',
        'mode': '0755',
    },
 }
 groups = {
    'left4me':      {'gid': 980},
    'l4d2-sandbox': {'gid': 981},
 }
 users = {
    'left4me': {
        'uid': 980,
        'gid': 980,
        'home': '/var/lib/left4me',
        'shell': '/usr/sbin/nologin',
    },
    'l4d2-sandbox': {
        'uid': 981,
        'gid': 981,
        'shell': '/usr/sbin/nologin',
    },
 }
 # UIDs/GIDs pinned in the system-package range (100-999, per Debian
 # policy) so file ownership is deterministic across rebuilds and
 # backup restores. 980/981 are unused elsewhere in this repo.
 # Privileged helpers (mode 0755 root:root). Listed by sudoers as the only
 # commands left4me can invoke as root NOPASSWD.
 HELPERS = (
    'left4me-systemctl',
    'left4me-journalctl',
    'left4me-overlay',
    'left4me-script-sandbox',
 )
 files = {
    '/usr/local/sbin/left4me': {
        'source': 'usr/local/sbin/left4me',  # explicit — basename collides with sudoers
        'mode': '0755',
        'owner': 'root',
        'group': 'root',
    },
    **{
        f'/usr/local/libexec/left4me/{h}': {
            'source': f'usr/local/libexec/left4me/{h}',
            'mode': '0755',
            'owner': 'root',
            'group': 'root',
        }
        for h in HELPERS
    },
    '/etc/left4me/sandbox-resolv.conf': {
        'source': 'etc/left4me/sandbox-resolv.conf',
        'mode': '0644',
        'owner': 'root',
        'group': 'root',
    },
    '/etc/sudoers.d/left4me': {
        'source': 'etc/sudoers.d/left4me',
        'mode': '0440',
        'owner': 'root',
        'group': 'root',
        'test_with': 'visudo -cf {}',
    },
    '/etc/sysctl.d/99-left4me.conf': {
        'source': 'etc/sysctl.d/99-left4me.conf',
        'mode': '0644',
        'owner': 'root',
        'group': 'root',
        'triggers': [
            'action:left4me_sysctl_reload',
        ],
    },
    '/etc/left4me/host.env': {
        'source': 'etc/left4me/host.env.mako',
        'content_type': 'mako',
        'mode': '0644',
        'owner': 'root',
        'group': 'root',
    },
    '/etc/left4me/web.env': {
        'source': 'etc/left4me/web.env.mako',
        'content_type': 'mako',
        'mode': '0640',
        'owner': 'root',
        'group': 'left4me',
        'needs': [
            'group:left4me',
        ],
    },
 }
 actions = {
    'left4me_sysctl_reload': {
        'command': 'sysctl --system >/dev/null',
        'triggered': True,
    },
    'left4me_dpkg_add_i386_arch': {
        # steamcmd is 32-bit and pulls libc6:i386 + lib32z1 from the i386 arch.
        # apt-get update is part of this action because newly-added foreign
        # archs need a fresh package list before any :i386 package resolves.
        'command': 'dpkg --add-architecture i386 && apt-get update',
        'unless': 'dpkg --print-foreign-architectures | grep -qx i386',
        'cascade_skip': False,
    },
    'left4me_install_steamcmd': {
        # Steam's tarball is rolling with no published checksum, so we can't
        # use download: (which requires a hash). Guard with a presence check
        # on steamcmd.sh — steamcmd self-updates at runtime, so chasing the
        # tarball version from bw isn't useful.
        'command': (
            'sudo -u left4me sh -c "'
            'cd /opt/left4me/steam && '
            'curl -fsSL https://media.steampowered.com/installer/steamcmd_linux.tar.gz | '
            'tar -xz'
            '"'
        ),
        'unless': 'test -x /opt/left4me/steam/steamcmd.sh',
        'cascade_skip': False,
        'needs': [
            'directory:/opt/left4me/steam',
            'pkg_apt:curl',
            'pkg_apt:libc6_i386',  # bw pkg_apt convention: _ → :
            'pkg_apt:lib32z1',
            'user:left4me',
        ],
    },
 }
 # steamcmd is invoked by absolute path (LEFT4ME_STEAMCMD in host.env),
 # not via PATH lookup — see l4d2host/cli.py:install. We don't need to put
 # anything in /usr/local/bin for it.
 git_deploy = {
    '/opt/left4me/src': {
        'repo': node.metadata.get('left4me/git_url'),
        'rev': node.metadata.get('left4me/git_branch'),
        'triggers': [
            # On a code-update apply, refresh the DB schema. pip_install
            # also triggers alembic, but wiring git_deploy → alembic
            # directly ensures new migrations land whenever new code
            # lands, independent of how pip_install is gated. alembic
            # upgrade head is idempotent (no-op when already at head), so
            # this is safe to fire on every code update; the seed_overlays
            # + service:restart cascade off alembic also covers picking up
            # the new code in gunicorn.
            'action:left4me_alembic_upgrade',
        ],
        # chown_src and pip_install are NOT in triggers: both run on every
        # apply (chown_src gated by its `unless` guard, pip_install
        # ungated), which makes the chain self-healing after a partial
        # failure. (Items in a triggers list must be triggered:True, which
        # would lose that property.)
    },
 }
 actions['left4me_chown_src'] = {
    # Runs every apply (cheap — chown -R on a small tree). Self-heals
    # whenever git_deploy extracts a new tarball as root-owned files.
    # Not in any triggers list so doesn't need triggered:True.
    'command': 'chown -R left4me:left4me /opt/left4me/src',
    'unless': 'test -z "$(find /opt/left4me/src \\! -user left4me -print -quit 2>/dev/null)"',
    'cascade_skip': False,
    'needs': [
        'git_deploy:/opt/left4me/src',
        'user:left4me',
        'group:left4me',
    ],
 }
 actions['left4me_create_venv'] = {
    'command': 'sudo -u left4me /usr/bin/python3 -m venv /opt/left4me/.venv',
    'unless':  'test -x /opt/left4me/.venv/bin/python',
    'cascade_skip': False,
    'needs': [
        'directory:/opt/left4me',
        'pkg_apt:python3-venv',
        'user:left4me',
    ],
    'triggers': [
        'action:left4me_pip_upgrade',
    ],
 }
 actions['left4me_pip_upgrade'] = {
    'command': 'sudo -u left4me /opt/left4me/.venv/bin/python -m pip install --upgrade pip',
    'triggered': True,
    'cascade_skip': False,
    'needs': [
        'pkg_apt:python3-pip',
    ],
    # No triggers: pip_install runs on every apply (it has no `unless`
    # gate) rather than being chained from here. Keeps pip_upgrade scoped
    # to exactly its purpose.
 }
 actions['left4me_pip_install'] = {
    # Single pip invocation installs both editable packages from the same
    # checkout. Runs on every apply: pip install -e is fast on no-op, and
    # any gate weaker than "egg-info matches pyproject.toml" can mask
    # script regeneration — e.g. adding [project.scripts] later wouldn't
    # be picked up if `unless` only checks importability.
    'command': 'sudo -u left4me /opt/left4me/.venv/bin/pip install -e /opt/left4me/src/l4d2host -e /opt/left4me/src/l4d2web',
    'cascade_skip': False,
    'needs': [
        'git_deploy:/opt/left4me/src',
        'action:left4me_create_venv',
        'action:left4me_chown_src',
    ],
    'triggers': [
        'action:left4me_alembic_upgrade',
    ],
 }
 actions['left4me_alembic_upgrade'] = {
    # Mirrors deploy-test-server.sh:239-242. Runs as left4me with both env
    # files sourced; JOB_WORKER_ENABLED=false so a stray worker doesn't race
    # with the migration.
    'command': (
        'sudo -u left4me sh -c "'
        'cd /opt/left4me/src/l4d2web && '
        'set -a && . /etc/left4me/host.env && . /etc/left4me/web.env && set +a && '
        'env JOB_WORKER_ENABLED=false PYTHONPATH=/opt/left4me/src '
        '/opt/left4me/.venv/bin/alembic -c /opt/left4me/src/l4d2web/alembic.ini upgrade head'
        '"'
    ),
    'triggered': True,
    'cascade_skip': False,
    'needs': [
        'action:left4me_pip_install',
        'file:/etc/left4me/host.env',
        'file:/etc/left4me/web.env',
    ],
    'triggers': [
        'action:left4me_seed_overlays',
        'svc_systemd:left4me-web.service:restart',
    ],
 }
 actions['left4me_seed_overlays'] = {
    # Idempotent: refreshes script bodies in place; existing overlay rows keep their ids.
    'command': (
        'sudo -u left4me sh -c "'
        'set -a && . /etc/left4me/host.env && . /etc/left4me/web.env && set +a && '
        'env JOB_WORKER_ENABLED=false PYTHONPATH=/opt/left4me/src '
        '/opt/left4me/.venv/bin/flask --app l4d2web.app:create_app '
        'seed-script-overlays /opt/left4me/src/examples/script-overlays'
        '"'
    ),
    'triggered': True,
    'cascade_skip': False,
    'needs': [
        'action:left4me_alembic_upgrade',
    ],
 }
--- a/bundles/left4me/metadata.py
+++ b/bundles/left4me/metadata.py
@ -0,0 +1,275 @@
 assert node.has_bundle('nftables')
 assert node.has_bundle('systemd')
 defaults = {
    'left4me': {
        # Application-wide defaults; node only overrides if it really needs to.
        'git_url': 'https://git.sublimity.de/cronekorkn/left4me.git',
        'git_branch': 'master',
        'secret_key': repo.vault.random_bytes_as_base64_for(f'{node.name} left4me secret_key', length=32).value,
        'gunicorn_workers': 1,
        'gunicorn_threads': 32,
        'job_worker_threads': 4,
        # Whole 27000-block: covers Steam's defaults (27015 game, 27005
        # client/RCON) plus headroom for ad-hoc ports without further
        # nftables changes. Mirrored into LEFT4ME_PORT_RANGE_{START,END}
        # by web.env.mako and into the nftables input rule by the
        # nftables_input reactor below.
        'port_range_start': 27000,
        'port_range_end': 27999,
    },
    'apt': {
        'packages': {
            'p7zip-full': {},
            'nftables': {},
            'iproute2': {},
            'curl': {},
            'ca-certificates': {},
            'python3': {},
            'python3-venv': {},
            'python3-pip': {},
            'python3-dev': {},
            # steamcmd is a 32-bit ELF; needs i386 multiarch + these libs.
            # `_` → `:` is bundlewrap's pkg_apt convention for multiarch
            # names (see pkg_apt.py:48).
            'libc6_i386': {  # installs libc6:i386
                'needs': ['action:left4me_dpkg_add_i386_arch'],
            },
            'lib32z1': {
                'needs': ['action:left4me_dpkg_add_i386_arch'],
            },
        },
    },
    'nftables': {
        # Match deploy/files/usr/local/lib/left4me/nft/left4me-mark.nft.
        # Mark srcds UDP egress (uid left4me) with DSCP EF + skb priority 6
        # so CAKE classifies it into the priority tin.
        'output': {
            'meta skuid "left4me" meta l4proto udp ip dscp set ef meta priority set 0006:0000',
            'meta skuid "left4me" meta l4proto udp ip6 dscp set ef meta priority set 0006:0000',
        },
    },
    'systemd': {
        'services': {
            'left4me-web.service': {
                'enabled': True,
                'running': True,
                'needs': [
                    'action:left4me_alembic_upgrade',
                    'file:/etc/left4me/host.env',
                    'file:/etc/left4me/web.env',
                ],
            },
            # Note: left4me-server@.service is a TEMPLATE — instances are
            # started on-demand by the web app via the left4me-systemctl
            # helper. Don't enable/start it from here.
            # The slices are installed (file present) but don't need
            # enable/start — they're activated implicitly when a unit
            # uses Slice=.
        },
    },
    'backup': {
        # Application-owned paths. Set-merged with backup group / node-level paths.
        'paths': {
            '/var/lib/left4me',
            '/etc/left4me',
        },
    },
 }
@metadata_reactor.provides(
    'nginx/vhosts',
 )
 def nginx_vhosts(metadata):
    # letsencrypt/domains and monitoring/services for the vhost are auto-
    # populated by bundles/nginx/metadata.py. We just declare check_path:
    # '/health' so the auto-check hits the Flask health endpoint, not '/'.
    domain = metadata.get('left4me/domain')
    return {
        'nginx': {
            'vhosts': {
                domain: {
                    'content': 'nginx/proxy_pass.conf',
                    'context': {
                        'target': 'http://127.0.0.1:8000',
                    },
                    'check_path': '/health',
                },
            },
        },
    }
@metadata_reactor.provides(
    'nftables/input',
 )
 def nftables_input(metadata):
    port_start = metadata.get('left4me/port_range_start')
    port_end = metadata.get('left4me/port_range_end')
    return {
        'nftables': {
            'input': {
                f'udp dport {port_start}-{port_end} accept',
                f'tcp dport {port_start}-{port_end} accept',
            },
        },
    }
@metadata_reactor.provides(
    'systemd/units',
 )
 def systemd_units(metadata):
    workers = metadata.get('left4me/gunicorn_workers')
    threads = metadata.get('left4me/gunicorn_threads')
    # cgroup-v2 cpuset. `system_cpus` (set of int CPU ids, declared per
    # node) pins system/user/build; the complement pins l4d2-game. On HT
    # hosts, list both siblings of a physical core so games don't share
    # L1/L2 with system work — pairings via
    # /sys/devices/system/cpu/cpu<n>/topology/thread_siblings_list.
    vm_threads = metadata.get('vm/threads', metadata.get('vm/cores'))
    all_cpus = set(range(vm_threads))
    system_cpus = metadata.get('left4me/system_cpus')
    if not system_cpus <= all_cpus:
        raise Exception(
            f'left4me/system_cpus={sorted(system_cpus)} on {vm_threads}-thread host '
            f'includes CPUs outside [0, {vm_threads})'
        )
    game_cpus = all_cpus - system_cpus
    if not game_cpus:
        raise Exception(
            f'left4me/system_cpus={sorted(system_cpus)} on {vm_threads}-thread host '
            f'leaves no cores for games'
        )
    system_cpus_string = ','.join(str(t) for t in sorted(system_cpus))
    game_cpus_string = ','.join(str(t) for t in sorted(game_cpus))
    # Drop-in for upstream system.slice / user.slice (units we don't own).
    # Same '<parent>.d/<basename>.conf' convention as nginx and autologin.
    cpuset_dropin = {'Slice': {'AllowedCPUs': system_cpus_string}}
    return {
        'systemd': {
            'units': {
                'left4me-web.service': {
                    'Unit': {
                        'Description': 'left4me web application',
                        'After': 'network-online.target',
                        'Wants': 'network-online.target',
                    },
                    'Service': {
                        'Type': 'simple',
                        'User': 'left4me',
                        'Group': 'left4me',
                        'WorkingDirectory': '/opt/left4me/src',
                        'Environment': {
                            'HOME=/var/lib/left4me',
                            'PATH=/opt/left4me/.venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin',
                        },
                        'EnvironmentFile': (
                            '/etc/left4me/host.env',
                            '/etc/left4me/web.env',
                        ),
                        'ExecStart': (
                            '/opt/left4me/.venv/bin/gunicorn '
                            f'--workers {workers} --threads {threads} '
                            "--bind 127.0.0.1:8000 'l4d2web.app:create_app()'"
                        ),
                        'Restart': 'on-failure',
                        'RestartSec': '3',
                        # NoNewPrivileges intentionally NOT set: workers sudo to the helpers.
                        'ProtectSystem': 'full',
                        'ReadWritePaths': '/var/lib/left4me',
                        'PrivateTmp': 'true',
                    },
                    'Install': {
                        'WantedBy': {'multi-user.target'},
                    },
                },
                'left4me-server@.service': {
                    'Unit': {
                        'Description': 'left4me server instance %i',
                        'After': 'network-online.target',
                        'Wants': 'network-online.target',
                        'StartLimitBurst': '5',
                        'StartLimitIntervalSec': '60s',
                    },
                    'Service': {
                        'Type': 'simple',
                        'User': 'left4me',
                        'Group': 'left4me',
                        'EnvironmentFile': (
                            '/etc/left4me/host.env',
                            '/var/lib/left4me/instances/%i/instance.env',
                        ),
                        'WorkingDirectory': '-/var/lib/left4me/runtime/%i/merged/left4dead2',
                        'ExecStartPre': (
                            '+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- '
                            '/usr/local/libexec/left4me/left4me-overlay mount %i'
                        ),
                        'ExecStart': (
                            '/var/lib/left4me/runtime/%i/merged/srcds_run '
                            '-game left4dead2 +hostport ${L4D2_PORT} $L4D2_ARGS'
                        ),
                        'ExecStopPost': (
                            '+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- '
                            '/usr/local/libexec/left4me/left4me-overlay umount %i'
                        ),
                        'Restart': 'on-failure',
                        'RestartSec': '5',
                        'Slice': 'l4d2-game.slice',
                        'Nice': '-5',
                        'IOSchedulingClass': 'best-effort',
                        'IOSchedulingPriority': '4',
                        'OOMScoreAdjust': '-200',
                        'MemoryHigh': '1.5G',
                        'MemoryMax': '2G',
                        'TasksMax': '256',
                        'LimitNOFILE': '65536',
                        'KillSignal': 'SIGINT',
                        'TimeoutStopSec': '15s',
                        'LogRateLimitIntervalSec': '0',
                        'NoNewPrivileges': 'true',
                        'PrivateTmp': 'true',
                        'PrivateDevices': 'true',
                        'ProtectHome': 'true',
                        'ProtectSystem': 'strict',
                        'ReadOnlyPaths': '/var/lib/left4me/installation /var/lib/left4me/overlays',
                        'ReadWritePaths': '/var/lib/left4me/runtime/%i',
                        'RestrictSUIDSGID': 'true',
                        'LockPersonality': 'true',
                    },
                    'Install': {
                        'WantedBy': {'multi-user.target'},
                    },
                },
                'l4d2-game.slice': {
                    'Unit': {
                        'Description': 'left4me game-server slice',
                        'Before': 'slices.target',
                    },
                    'Slice': {
                        'CPUWeight': '1000',
                        'IOWeight': '1000',
                        'AllowedCPUs': game_cpus_string,
                    },
                },
                'l4d2-build.slice': {
                    'Unit': {
                        'Description': 'left4me script-sandbox build slice',
                        'Before': 'slices.target',
                    },
                    'Slice': {
                        'CPUWeight': '10',
                        'IOWeight': '10',
                        'AllowedCPUs': system_cpus_string,
                    },
                },
                'system.slice.d/99-left4me-cpuset.conf': cpuset_dropin,
                'user.slice.d/99-left4me-cpuset.conf':   cpuset_dropin,
            },
        },
    }
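
The cpuset split computed in `systemd_units` can be exercised outside bundlewrap. The following is a minimal standalone sketch, not the bundle's code: `parse_cpu_list` and `split_cpus` are hypothetical helper names, and it assumes `thread_siblings_list` uses the usual kernel cpulist syntax (`0,4` or `0-1`).

```python
def parse_cpu_list(text):
    """Parse a kernel cpulist string such as '0,4' or '0-1,8' into a set of ints."""
    cpus = set()
    for part in text.strip().split(','):
        if not part:
            continue
        if '-' in part:
            first, last = part.split('-', 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def split_cpus(total_threads, system_cpus):
    """Return (system, game) AllowedCPUs= strings; game is the set complement."""
    all_cpus = set(range(total_threads))
    system_cpus = set(system_cpus)
    if not system_cpus <= all_cpus:
        raise ValueError(
            f'system_cpus={sorted(system_cpus)} includes CPUs outside [0, {total_threads})'
        )
    game_cpus = all_cpus - system_cpus
    if not game_cpus:
        raise ValueError(f'system_cpus={sorted(system_cpus)} leaves no cores for games')

    def fmt(cpus):
        return ','.join(str(c) for c in sorted(cpus))

    return fmt(system_cpus), fmt(game_cpus)
```

For example, on an assumed 8-thread host where CPUs 0 and 4 are siblings, `split_cpus(8, parse_cpu_list('0,4'))` returns `('0,4', '1,2,3,5,6,7')`.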
--- a/bundles/letsencrypt/README.md
+++ b/bundles/letsencrypt/README.md
@ -1,9 +1,60 @@
-https://github.com/dehydrated-io/dehydrated/wiki/example-dns-01-nsupdate-script
+# letsencrypt
 Issues and renews Let's Encrypt certs via [dehydrated][upstream] with
 DNS-01 against the in-house bind-acme server.
 [upstream]: https://github.com/dehydrated-io/dehydrated/wiki/example-dns-01-nsupdate-script
 ## First-apply behaviour
 Immediately after `bw apply <node>`, nginx serves a **self-signed
 cert** for each declared domain — generated by
 `/etc/dehydrated/letsencrypt-ensure-some-certificate` so nginx has
 something to start with. The real Let's Encrypt cert arrives at most
 24h later when the systemd timer fires
 (`/usr/bin/dehydrated --cron --accept-terms --challenge dns-01`). To
 shortcut the wait:
 ```sh
 ssh <node> 'sudo /usr/bin/dehydrated --cron --accept-terms --challenge dns-01'
 ssh <node> 'sudo systemctl reload nginx'
 ```
 ## DNS-01 prerequisites
 `hook.sh` does `nsupdate` against the bind-acme server (referenced
 by `letsencrypt/acme_node`). For the challenge to succeed:
 1. The acme node must be in the same metadata graph (so
   `bw metadata <node> -k letsencrypt/acme_node` resolves).
 2. **All NS servers** for the validated domain must serve the
   `_acme-challenge.<domain>` CNAME — Let's Encrypt validates from
   primary AND secondary geographic regions; both authoritative
   servers must agree. If a secondary NS is also a bw-managed node,
   `bw apply` it after adding the domain (see e.g. `ovh.secondary`).
 3. The bind-acme node's TSIG key must be reachable. `hook.sh` is
   rendered with the bind-acme server's `network/internal/ipv4` —
   for clients outside that LAN, the route must exist (typically via
   wireguard `s2s` peer membership).
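
Prerequisite 2 can be checked by asking every authoritative server for the challenge CNAME and comparing the answers. A minimal sketch under stated assumptions: both helper names are hypothetical, and the code only builds `dig` command lines and compares answers you have already collected; it does not query anything itself.

```python
from collections import Counter


def dig_command(domain, nameserver):
    """Build a dig invocation asking one name server for the challenge CNAME."""
    return ['dig', '+short', 'CNAME', f'_acme-challenge.{domain}', f'@{nameserver}']


def disagreeing_servers(answers):
    """Given {nameserver: dig_output}, return the servers whose answer is empty
    or differs from the most common non-empty answer (sorted by name)."""
    cleaned = {ns: text.strip() for ns, text in answers.items()}
    non_empty = [answer for answer in cleaned.values() if answer]
    if not non_empty:
        # Nobody serves the record: every server is a problem.
        return sorted(cleaned)
    expected, _count = Counter(non_empty).most_common(1)[0]
    return sorted(ns for ns, answer in cleaned.items() if answer != expected)
```

Run each command from `dig_command` (for instance via `subprocess.run`), feed the outputs into `disagreeing_servers`, and only trigger dehydrated once the returned list is empty.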
 ## Negative-cache penalty
 If the first DNS-01 attempt fails (e.g. zone not yet applied to the
 secondary NS), Let's Encrypt's resolvers cache NXDOMAIN for the SOA's
 negative TTL (often 900s = 15 min). Subsequent attempts during that
 window also fail and refresh the cache. Combined with LE's rate limit
 of **5 failed validations per account, per hostname, per hour**,
 recovery requires you to **stop retrying** for ~15 minutes after
 fixing the DNS, then make at most one attempt.
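
As a rough illustration of the waiting rule, the earliest sensible retry time can be derived from the last failed attempt, the time the DNS was fixed, and the zone's negative TTL. This is a sketch only: `earliest_retry` is a hypothetical helper, and the 900-second default merely mirrors the example TTL above, so check the real value in your zone's SOA.

```python
from datetime import datetime, timedelta


def earliest_retry(last_failed_attempt, dns_fixed_at, negative_ttl_seconds=900):
    """Return the time by which a cached NXDOMAIN should have expired.

    Conservative: waits one full negative TTL after whichever happened
    later, the last failed attempt or the DNS fix.
    """
    start = max(last_failed_attempt, dns_fixed_at)
    return start + timedelta(seconds=negative_ttl_seconds)
```

With a last failure at 21:30 and a fix at 21:40, the function returns 21:55; make a single attempt after that point.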
 ## nsupdate sample
 For interactive testing of the bind-acme TSIG path:
 ```sh
 printf "server 127.0.0.1
 zone acme.resolver.name.
-update add _acme-challenge.ckn.li.acme.resolver.name. 600 IN TXT "hello"
+update add _acme-challenge.ckn.li.acme.resolver.name. 600 IN TXT \"hello\"
 send
 " | nsupdate -y hmac-sha512:acme:XXXXXX
 ```
--- a/bundles/letsencrypt/metadata.py
+++ b/bundles/letsencrypt/metadata.py
@ -2,7 +2,7 @@ defaults = {
    'apt': {
        'packages': {
            'dehydrated': {},
-            'dnsutils': {},
+            'bind9-dnsutils': {},
        },
    },
    'letsencrypt': {
--- a/bundles/nginx/README.md
+++ b/bundles/nginx/README.md
@ -0,0 +1,36 @@
 # nginx
 Webserver. Per-node vhosts in `nginx/vhosts`; per-vhost templates in
 `data/nginx/*.conf`.
 ## How port 80 is served
 The bundle ships a fixed `80.conf` to
 `/etc/nginx/sites-available/80.conf` (picked up by the
 `sites-enabled/` symlink) that handles **all** port-80 traffic
 across vhosts:
 1. ACME HTTP-01 challenges (`/.well-known/acme-challenge/`) are
   served from `/var/lib/dehydrated/acme-challenges/`.
 2. All other port-80 requests are 301-redirected to
   `https://$host$request_uri`.
 Per-vhost templates only declare `listen 443 ssl http2;`, so they
 don't need their own port-80 server blocks. If you need vhost-
 specific port-80 behaviour (e.g. plain-HTTP without redirect),
 override 80.conf or add a per-vhost block.
 ## Required metadata
 - `vm/cores` — read directly by `items.py` for `worker_processes`.
  No default; `bw items <node>` raises at item-build time if missing.
  Typically supplied by the `vm` bundle / hetzner-vm group; double-
  check on bare-metal hosts.
 - `nginx/vhosts` — dict of vhost-name → vhost-config.
 - `nginx/modules` — list of dynamic modules to load.
 ## Cross-namespace
 `items.py` reads `letsencrypt/domains` to skip emitting a per-vhost
 HTTPS block when LE hasn't declared the domain yet — keeps the
 bundle loadable on a node where letsencrypt isn't fully wired up.
--- a/bundles/nginx/files/nginx.conf
+++ b/bundles/nginx/files/nginx.conf
@ -32,12 +32,13 @@ http {
    % endif
-    % if has_websockets:
+    # Always defined: serves both WS-enabled vhosts (Connection: upgrade for
    # ws clients) and SSE/keep-alive vhosts (Connection: "" lets nginx manage
    # the upstream connection for keep-alive, instead of forcing "close").
    map $http_upgrade $connection_upgrade {
        default upgrade;
-        '' close;
+        '' '';
    }
    % endif
    include /etc/nginx/sites-enabled/*;
 }
--- a/bundles/nginx/items.py
+++ b/bundles/nginx/items.py
@ -64,7 +64,7 @@ files = {
            'svc_systemd:nginx:restart',
        },
    },
-    '/etc/nginx/sites/80.conf': {
+    '/etc/nginx/sites-available/80.conf': {
        'triggers': {
            'svc_systemd:nginx:restart',
        },
--- a/bundles/systemd/items.py
+++ b/bundles/systemd/items.py
@ -33,7 +33,7 @@ for name, unit in node.metadata.get('systemd/units').items():
                'svc_systemd:systemd-networkd.service:restart',
            ],
        }
-    elif extension in ['timer', 'service', 'mount', 'swap', 'target']:
+    elif extension in ['timer', 'service', 'mount', 'swap', 'target', 'slice']:
        path = f'/usr/local/lib/systemd/system/{name}'
        dependencies = {
            'triggers': [
--- a/data/nginx/proxy_pass.conf
+++ b/data/nginx/proxy_pass.conf
@ -8,10 +8,16 @@ server {
    location / {
        proxy_set_header   X-Real-IP          $remote_addr;
-% if websockets:
+        # Always set Upgrade + Connection via the $connection_upgrade map:
        #   WS client (Upgrade header sent)  -> Connection: upgrade
        #   non-WS client (no Upgrade)       -> Connection: "" (keep-alive)
        # Lets every vhost serve both WS and SSE without per-vhost flags.
        proxy_http_version 1.1;
        proxy_set_header   Upgrade            $http_upgrade;
        proxy_set_header   Connection         $connection_upgrade;
-% endif
+        # SSE-safe pass-through (also fine for non-SSE traffic):
        proxy_buffering    off;
        proxy_read_timeout 1h;
        proxy_pass ${target};
    }
 }
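
The `$connection_upgrade` map this template relies on reduces to a two-way choice on the client's `Upgrade` header. A small Python model of it, purely illustrative (`connection_header` is a hypothetical name, not part of nginx or this repo):

```python
def connection_header(http_upgrade):
    """Model of: map $http_upgrade $connection_upgrade { default upgrade; '' ''; }

    An empty Upgrade header maps to an empty Connection value; anything
    else maps to 'upgrade'.
    """
    return '' if http_upgrade == '' else 'upgrade'
```

nginx does not forward a header whose `proxy_set_header` value is an empty string, so in the non-WebSocket case no `Connection` header reaches the upstream and the connection can be kept alive.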
--- a/docs/agents/commands.md
+++ b/docs/agents/commands.md
@ -48,3 +48,51 @@ instead.
 See [`conventions.md#secrets`](conventions.md#secrets) for the
 demagify magic-string list and the rule's full rationale.
 ## Read-only commands — useful flag combinations
 The fork's [`AGENTS.md`][fork] documents the canonical safety envelope.
 These are the flag combinations agents reach for most often in this repo:
 | Want to … | Run |
 |---|---|
 | Sanity-check the whole repo (parse + cross-cutting hooks)        | `bw test` (defaults to `-HIJKMSp`) |
 | Exercise reactors and item-graph for one node                    | `bw test <node>` (defaults to `-IJKMp`) |
 | Same, but every node that has a given bundle                     | `bw test bundle:<name>` |
 | Print one metadata key for one node                              | `bw metadata <node> -k <a/b>` (repeat `-k` for more) |
 | Show where each metadata value comes from                        | `bw metadata <node> -b` |
 | Resolve Faults (vault values) into the dump                      | `bw metadata <node> -f` — **may print secrets, avoid** |
 | List a node's items, with the bundle that defines each           | `bw items <node> --blame` |
 | Preview a rendered file's content                                | `bw items <node> file:<path> -f` |
 | Verify against the live host, scoped to one bundle               | `bw verify <node> -o bundle:<name>` |
 | Hash metadata only (faster than full config hash)                | `bw hash <node> -m` |
 | Inspect the data backing a hash                                  | `bw hash <node> -d` |
 `bw test`, `bw verify`, `bw nodes`, `bw metadata` all share a target-
 selector grammar: bare node name, group name, `bundle:<name>`,
 `!bundle:<name>`, or `"lambda:node.metadata_get('foo/bar', 0) < 3"`.
 [fork]: https://github.com/CroneKorkN/bundlewrap/blob/main/AGENTS.md
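A lambda selector is a Python expression evaluated once per node, with `node` bound to the candidate. The sketch below imitates that filtering so the semantics are easier to see. It is not bundlewrap's implementation: `FakeNode`, `select`, and the sample metadata are hypothetical, and the `metadata_get` stand-in only mirrors the path-with-default lookup used in the selector example.

```python
class FakeNode:
    """Hypothetical stand-in for a bundlewrap node (illustration only)."""

    def __init__(self, name, metadata):
        self.name = name
        self._metadata = metadata

    def metadata_get(self, path, default=None):
        # Walk a slash-separated path; fall back to default when a key is missing.
        value = self._metadata
        for key in path.split('/'):
            if not isinstance(value, dict) or key not in value:
                return default
            value = value[key]
        return value


def select(nodes, expression):
    """Return the names of nodes for which the lambda body is truthy."""
    return [n.name for n in nodes if eval(expression, {}, {'node': n})]


nodes = [
    FakeNode('a', {'foo': {'bar': 1}}),
    FakeNode('b', {'foo': {'bar': 7}}),
    FakeNode('c', {}),  # key missing, so the default 0 applies
]
selected = select(nodes, "node.metadata_get('foo/bar', 0) < 3")
print(selected)
```

Node `c` matches only because the selector passes a default of `0`; a selector should always supply one when a key may be absent on some nodes.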
 ## Bundle-validation workflow
 `bw test` (no args) is a *parsing* gate, not a *behaviour* gate. It
 loads every bundle, but a bundle's reactors only resolve when a node's
 metadata is actually built — and that happens only for nodes that
 opt in. Until then, reactor bugs stay dormant. bw rejects reactors
 that don't read any metadata, but the rejection only fires once *some*
 node consumes the bundle.
 When developing a new bundle:
 1. Scaffold + `bw test` — confirms parsing.
 2. **Attach the bundle to one node** (or a stub node) by adding it to
   `nodes/<n>.py`'s `bundles` list, or to a group the node is in.
 3. `bw test <node>` — now reactors fire. This is where bundle bugs
   surface.
 4. `bw items <node> --blame` and `bw metadata <node> -k <key>` —
   confirm items materialise and derived metadata looks right.
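 5. `bw hash <node>` — fingerprint the node's rendered config and
    compare it before and after your change. `bw hash` works from the
    repo alone; use `bw verify <node>` to check the live host.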
 Step 2 is non-optional. A bundle that "passes `bw test`" with no
 consumer is proven only to parse.
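The read-only part of the workflow can be written down as an ordered command list. This is a minimal sketch, assuming a node named `stub-node` and a metadata key `mybundle/some_key` (both hypothetical). It only builds the argument lists and runs nothing; step 2, attaching the bundle, stays a manual edit.

```python
def validation_commands(node, key):
    """Return the read-only checks from the workflow, in order.

    Nothing is executed here; the caller decides whether to run them.
    """
    return [
        ['bw', 'test'],                        # step 1: parsing gate
        ['bw', 'test', node],                  # step 3: reactors fire for this node
        ['bw', 'items', node, '--blame'],      # step 4: items and their bundles
        ['bw', 'metadata', node, '-k', key],   # step 4: derived metadata
        ['bw', 'hash', node],                  # step 5
    ]


for argv in validation_commands('stub-node', 'mybundle/some_key'):
    print(' '.join(argv))
```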
--- a/docs/superpowers/specs/2026-05-10-agent-friendliness-design.md
+++ b/docs/superpowers/specs/2026-05-10-agent-friendliness-design.md
@ -127,6 +127,12 @@ bundle.
 ## 3. Per-bundle `AGENTS.md` template
 > **Status: replaced — pre-pivot intent only.** Per-bundle docs are plain
 > `README.md` with no fixed structure. See §0 Revisions and the
 > "Per-bundle README" section in [`bundles/AGENTS.md`](../../../bundles/AGENTS.md)
 > for the current convention. The template below is kept as a record of
 > the original design.
 One balanced doc serving both audiences. Prose where prose helps, structure
 where structure helps. Sections in order:
@ -339,6 +345,12 @@ in 30–120 lines each; root `AGENTS.md` is ~150 lines.
 ### Phase 2 — seed bundles (10)
 > **Status: dropped — pre-pivot intent only.** Phase 2 didn't ship. After
 > Phase 1 landed, the maintainer pulled the per-bundle `AGENTS.md`
 > migration: the rigid template proved a poor fit for the heterogeneous
 > existing READMEs. See §0 Revisions. The seed list and migration plan
 > below are kept as a record of how the work was scoped.
 Bundles selected empirically (node+group references and recent commit
 activity, validated 2026-05-10):
--- a/docs/superpowers/specs/2026-05-10-ckn-bw-agents-md-refactor-round-1-design.md
+++ b/docs/superpowers/specs/2026-05-10-ckn-bw-agents-md-refactor-round-1-design.md
@ -0,0 +1,253 @@
 # Round 1 — agent-doc refactor (gaps 1–6 + cmd cheat sheet)
 ## Why
 A previous session integrated `bundles/left4me/` and brought
 `ovh.left4me` live. The integration produced a handoff (at
 `~/.claude/plans/2026-05-10-ckn-bw-docs-improvements-handoff.md`)
 listing 12 documentation gaps surfaced by the work. This spec covers
 the first six (the cross-cutting ones) plus a useful side-quest:
 adding a read-only command cheat sheet to `docs/agents/commands.md`.
 Gaps 7–12 (item-specific, bundle READMEs) are deferred to a follow-up
 round.
 ## Scope
 In:
 - Gap 1 — drop `bw bundles` (doesn't exist), add `bw verify` to the
  read-only allowlist.
 - Gap 2 — bundle-validation workflow needs a node attached.
 - Gap 3 — nodes carry only node-specific metadata (split across
  `bundles/AGENTS.md` and `nodes/AGENTS.md`).
 - Gap 4 — reactors must read metadata or be defaults.
 - Gap 5 — `triggers` ↔ `triggered: True` invariant + self-healing
  pattern.
 - Gap 6 — `unless` semantics (folded into Gap 5's second bullet).
 - Side-quest: read-only command cheat sheet in `commands.md` (`bw
  test` flag matrix + selectors, `bw metadata -k/-b/-f`, `bw items
  --blame/-f`, `bw verify -o bundle:`, `bw hash -m/-d`).
 Out:
 - Gaps 7–12 (`source` implicit, `git_deploy` chown, `git_deploy` URL
  form, letsencrypt/bind/nginx READMEs).
 - Any change to bundle behaviour. This is pure docs; if a doc claim
  feels wrong, push back to the maintainer rather than editing
  `.py`.
 ## Verification approach
 For each gap, find the current line numbers in the target doc (the
 handoff's line numbers date from May 2026 and some have drifted). Verify code-level
 claims against the fork source under `.venv/src/bundlewrap/` before
 quoting them.
 Already verified during brainstorm:
 - Gap 1: `bw bundles` is not a subcommand of the installed fork
  (`.venv/bin/bw --help` lists only
  `apply, debug, diff, groups, hash, ipmi, items, lock, metadata,
  nodes, plot, pw, repo, run, stats, test, verify, zen`). `bw verify`
  is read-only.
 - Gap 2: `bw test` default flag set differs by mode. Whole-repo:
  `-HIJKMSp`. Node-targeted: `-IJKMp`. Repo mode adds `-H`
  (repo hooks) and `-S` (subgroup loops); `-J` (node hooks) is in
  both sets. Reactors only resolve when a node's metadata is
  built, which only happens when a node opts into the bundle.
 - Gap 4: exact wording at `metagen.py:428`:
  `"{reactor_name} on {node_name} did not request any metadata, you
  might want to use defaults instead"`.
 - Gap 5: exact wording at `deps.py:340`:
  `"'{item1}' in bundle '{bundle1}' triggered by '{item2}' in bundle
  '{bundle2}', but missing 'triggered' attribute"`.
 - Gap 3 precedent: `bundles/left4me/metadata.py:10` is the canonical
  random-bytes-in-defaults example. `bundles/postgresql/metadata.py:4`
  is the password_for-at-module-scope example. (The handoff cites
  postgresql for the random-bytes pattern; that's a misattribution —
  postgresql uses `password_for`.)
 After every commit: `.venv/bin/bw test` must pass with the same
 output as before. Pure-docs edits cannot break it unless a `.py` is
 touched accidentally.
 ## Commits
 Six iterative commits, matching repo style.
 ### Commit 1 — drop `bw bundles`, add `bw verify` (Gap 1)
 `AGENTS.md` rule 1 only. The handoff also flagged
 `bundles/AGENTS.md:60-64`, but that list no longer references
 `bw bundles` (it currently reads `bw test` / `bw items` / `bw hash`).
 That section gets rewritten in commit 3, not here.
 ```diff
 - to `bw test`, `bw nodes`, `bw groups`, `bw bundles`,
 - `bw items`, `bw metadata`, `bw hash`, `bw debug`. See
 + to `bw test`, `bw nodes`, `bw groups`, `bw items`,
 + `bw metadata`, `bw hash`, `bw verify`, `bw debug`. See
 ```
 ### Commit 2 — read-only command cheat sheet
 Append to `docs/agents/commands.md`. New H2 section, table format
 to match the existing voice.
 ```markdown
 ## Read-only commands — useful flag combinations
 The fork's [`AGENTS.md`][fork] documents the canonical safety envelope.
 These are the flag combinations agents reach for most often in this repo:
 | Want to … | Run |
 |---|---|
 | Sanity-check the whole repo (parse + cross-cutting hooks)        | `bw test` (defaults to `-HIJKMSp`) |
 | Exercise reactors and item-graph for one node                    | `bw test <node>` (defaults to `-IJKMp`) |
 | Same, but every node that has a given bundle                     | `bw test bundle:<name>` |
 | Print one metadata key for one node                              | `bw metadata <node> -k <a/b>` (repeat `-k` for more) |
 | Show where each metadata value comes from                        | `bw metadata <node> -b` |
 | Resolve Faults (vault values) into the dump                      | `bw metadata <node> -f` — **may print secrets, avoid** |
 | List a node's items, with the bundle that defines each           | `bw items <node> --blame` |
 | Preview a rendered file's content                                | `bw items <node> file:<path> -f` |
 | Verify against the live host, scoped to one bundle               | `bw verify <node> -o bundle:<name>` |
 | Hash metadata only (faster than full config hash)                | `bw hash <node> -m` |
 | Inspect the data backing a hash                                  | `bw hash <node> -d` |
 `bw test`, `bw verify`, `bw nodes`, `bw metadata` all share a target-
 selector grammar: bare node name, group name, `bundle:<name>`,
 `!bundle:<name>`, or `"lambda:node.metadata_get('foo/bar', 0) < 3"`.
 [fork]: https://github.com/CroneKorkN/bundlewrap/blob/main/AGENTS.md
 ```
 ### Commit 3 — bundle validation needs a node attached (Gap 2)
 Two file changes.
 **`bundles/AGENTS.md` lines 59-64** — replace the Verify list:
 ```markdown
 5. **Verify, in this order:**
   - `bw test` — repo-wide parse + cross-cutting hooks. Loads every
     bundle, but reactors don't fire for nodes that haven't opted into
     the bundle yet — bugs in new reactors stay hidden here.
   - **Attach the bundle to a node** (via the node's `bundles` list, or
     a group it belongs to). Until you do, the next steps don't actually
     exercise the bundle.
   - `bw test <node>` — exercises every reactor and item-graph edge for
     that node. This is where most new-bundle bugs surface.
   - `bw items <node> --blame` — confirm items materialise with the right
     paths, authored by the expected bundle.
   - `bw metadata <node> -k <a/b>` — spot-check derived metadata.
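   - `bw hash <node>` — fingerprint the node's rendered config and
     compare it before and after your change (use `bw verify <node>`
     to check the live host).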
   See [`docs/agents/commands.md#bundle-validation-workflow`](../docs/agents/commands.md#bundle-validation-workflow)
   for the rationale.
 ```
 **`docs/agents/commands.md`** — new section after the cheat sheet:
 ```markdown
 ## Bundle-validation workflow
 `bw test` (no args) is a *parsing* gate, not a *behaviour* gate. It
 loads every bundle, but a bundle's reactors only resolve when a node's
 metadata is actually built — and that happens only for nodes that
 opt in. Until then, reactor bugs stay dormant. bw rejects reactors that
 don't read any metadata, but the rejection only fires once *some* node
 consumes the bundle.
 When developing a new bundle:
 1. Scaffold + `bw test` — confirms parsing.
 2. **Attach the bundle to one node** (or a stub node) by adding it to
   `nodes/<n>.py`'s `bundles` list, or to a group the node is in.
 3. `bw test <node>` — now reactors fire. This is where bundle bugs
   surface.
 4. `bw items <node> --blame` and `bw metadata <node> -k <key>` — confirm
   items materialise and derived metadata looks right.
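 5. `bw hash <node>` — fingerprint the node's rendered config and
    compare it before and after your change. `bw hash` works from the
    repo alone; use `bw verify <node>` to check the live host.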
 Step 2 is non-optional. A bundle that "passes `bw test`" with no consumer
 is proven only to parse.
 ```
 ### Commit 4 — nodes carry only node-specific metadata (Gap 3)
 **`bundles/AGENTS.md` Conventions** — new bullet:
 ```markdown
 - **Bundles own application-wide knowledge; nodes carry only the few
  per-host knobs the bundle actually needs.** When designing a bundle,
  identify the per-node knobs (e.g. domain, uplink interface, a
  vault-id suffix) and put everything else in `defaults`, or in a
  reactor that derives from those knobs. Per-node random secrets
  belong in `defaults` via `repo.vault.random_bytes_as_base64_for(...)`
  keyed on the node — not in the node file. See
  `bundles/left4me/metadata.py:10` (`secret_key` derived in defaults)
  and `bundles/postgresql/metadata.py:4` (vault-derived `password_for`
  at module scope).
 ```
 **`nodes/AGENTS.md` Pitfalls** — new bullet:
 ```markdown
 - **Bloated per-node metadata is usually a bundle smell.** If a
  bundle's metadata block in the node file has more than 3-5 keys,
  the bundle is probably under-using `defaults` / reactors. Push the
  contribution into the bundle (see
  [`bundles/AGENTS.md`](../bundles/AGENTS.md#conventions)) rather than
  growing the node file.
 ```
 ### Commit 5 — reactors must read metadata or be defaults (Gap 4)
 **`bundles/AGENTS.md` Pitfalls** — new bullet:
 ```markdown
 - **Reactors must read metadata.** If a reactor body returns a static
  dict without calling `metadata.get(...)`, bw raises
  `ValueError: <reactor> on <node> did not request any metadata, you
  might want to use defaults instead` once a node consumes the bundle.
  Fix: fold the contribution into `defaults`. The rule applies even
  when the reactor writes into another bundle's namespace — a static
  contribution to e.g. `nftables/output` belongs in `defaults`, where
  bw merges it with other bundles' contributions.
 ```
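The rule in this bullet can be illustrated with a small simulation. This is a sketch under stated assumptions, not bundlewrap's code: `TrackingMetadata` and `run_reactor` are hypothetical stand-ins that only reproduce the "did the reactor read anything?" check and the quoted error text, and the `nftables/output` and `nginx/vhosts` values are made up for the example.

```python
class TrackingMetadata:
    """Hypothetical wrapper that records whether a reactor read anything."""

    def __init__(self, data):
        self._data = data
        self.was_read = False

    def get(self, path, default=None):
        self.was_read = True
        value = self._data
        for key in path.split('/'):
            if not isinstance(value, dict) or key not in value:
                return default
            value = value[key]
        return value


def run_reactor(reactor, node_name, data):
    """Call a reactor and reject it if it never requested metadata."""
    metadata = TrackingMetadata(data)
    result = reactor(metadata)
    if not metadata.was_read:
        raise ValueError(
            f"{reactor.__name__} on {node_name} did not request any metadata, "
            "you might want to use defaults instead"
        )
    return result


def static_reactor(metadata):
    # Returns a constant, so this contribution belongs in `defaults`.
    return {'nftables': {'output': {'tcp dport 443 accept'}}}


def derived_reactor(metadata):
    # Reads a per-node knob and derives a value from it.
    domain = metadata.get('left4me/domain')
    return {'nginx': {'vhosts': {domain: {}}}}


data = {'left4me': {'domain': 'left4.me'}}
print(run_reactor(derived_reactor, 'ovh.left4me', data))
try:
    run_reactor(static_reactor, 'ovh.left4me', data)
except ValueError as error:
    print(error)
```

The static reactor fails only when it is actually run for a node, which matches the observation that the error stays hidden until some node consumes the bundle.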
 ### Commit 6 — `triggers` invariant + self-healing + `unless` (Gaps 5+6)
 **`bundles/AGENTS.md` Pitfalls** — two new bullets (Gap 6's `unless`
 semantics fold into the second; cleaner than three bullets):
 ```markdown
 - **`triggers` ↔ `triggered: True` invariant.** Any item listed in
  another's `triggers` list must declare `triggered: True`. bw
  enforces this at `bw test` time: *"…triggered by …, but missing
  'triggered' attribute"*. Corollary: an action can't be both in an
  upstream `triggers` list AND self-healing every apply — pick one.
 - **Triggered actions don't recover from partial failure.** When an
  upstream item's apply succeeds but its triggered downstream action
  fails, subsequent applies can't recover via the trigger chain —
  upstream is "already in desired state" and never re-triggers. For
  actions that must self-heal (pip installs, chowns, migrations),
  drop `triggered: True` and gate the command with `unless:
  <fast-check>`. `unless` is a shell command on the target host whose
  exit status decides whether the main command runs (exit 0 = skip);
  it's checked at fire time, after `triggered:` filtering.
 ```
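Both bullets can be seen in one small example. The sketch below is an illustration only: it keeps items in a single flat dict keyed by item id, which is a simplification of how bundles declare them, and `missing_triggered` is a hypothetical helper, not a bundlewrap function. The paths, commands, and item names are invented.

```python
def missing_triggered(items):
    """Return (target, source) pairs where a triggered item lacks `triggered: True`."""
    problems = []
    for source, attrs in items.items():
        for target in attrs.get('triggers', []):
            if not items.get(target, {}).get('triggered', False):
                problems.append((target, source))
    return problems


items = {
    'git_deploy:/opt/app/src': {
        'triggers': ['action:app_migrate'],
    },
    # Listed in a triggers list, so it must declare triggered: True.
    'action:app_migrate': {
        'command': '/opt/app/bin/migrate',
        'triggered': True,
    },
    # Self-healing: not triggered by anything, gated by a fast check instead.
    'action:app_pip_install': {
        'command': '/opt/app/venv/bin/pip install -e /opt/app/src',
        'unless': '/opt/app/venv/bin/pip show app',
        'needs': ['git_deploy:/opt/app/src'],
    },
}
print(missing_triggered(items))
```

The pip install action is considered on every apply and skipped when its `unless` check exits 0, so a failed run is retried next time without depending on the upstream item changing again.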
 ## Out of scope
 - Gaps 7–12 — deferred. The maintainer re-engages after this round.
 - Bundle behaviour changes. Pure docs.
 - `bw apply` / `bw run` — not authorised this session.
 ## Constraints
 - Don't echo decrypted secrets in commit messages or new doc text.
 - Don't restore `*.py_` parked nodes.
 - After each commit, `.venv/bin/bw test` must pass.
 - No push.
--- a/docs/superpowers/specs/2026-05-10-ckn-bw-agents-md-refactor-round-2-design.md
+++ b/docs/superpowers/specs/2026-05-10-ckn-bw-agents-md-refactor-round-2-design.md
@ -0,0 +1,286 @@
 # Round 2 — agent-doc refactor (gaps 7–12)
 ## Why
 Continuation of round 1 (spec at
 `2026-05-10-ckn-bw-agents-md-refactor-round-1-design.md`). Round 1
 landed the cross-cutting lessons (read-only allowlist, bundle
 validation needs a node, nodes-carry-only-node-specific-metadata,
 reactors-must-read-metadata, triggers/triggered:True invariant,
 self-healing pattern). Round 2 covers the remaining six gaps: built-in
 item-type gotchas and three bundle READMEs.
 ## Scope
 In:
 - Gap 7 — `file:`'s `source` defaults to the basename of the destination.
 - Gap 8 — `git_deploy` extracts as the connecting user (root after
  sudo); chown action needed for non-root downstream consumers.
 - Gap 9 — `git_deploy` URL form: `://` triggers per-apply clone, no `://`
  requires a `git_deploy_repos` map at the repo root.
 - Gap 10 — `bundles/letsencrypt`: first-apply behaviour, DNS-01
  prerequisites, negative-cache penalty.
 - Gap 11 — `bundles/bind`: applying changes to a `master_node`-linked
  pair needs `bw apply` on both ends.
 - Gap 12 — `bundles/nginx`: how port 80 is served, `vm/cores`
  requirement.
 Out:
 - Bundle behaviour changes. Pure docs.
 - `bw apply` / `bw run` — not authorised this session.
 ## Placement decision (diverges from the handoff)
 The handoff suggests `items/AGENTS.md` for gaps 7, 8, 9. But
 `items/AGENTS.md` is scoped to **custom** item types in the `items/`
 directory — its first sentence: *"Custom item types — each `*.py` is
 a `bundlewrap.items.Item` subclass…"*. Built-in gotchas (`file:`,
 `git_deploy:`) don't fit there.
 Round-1 lessons about built-in mechanics (reactors must read metadata,
 `triggers` invariant, self-healing pattern) all landed in
 `bundles/AGENTS.md` Pitfalls. Gaps 7, 8, 9 are the same shape, so
 they go in the same place.
 ## Validation findings
 - Gap 7: well-known bw built-in semantics. Trusting the handoff.
 - Gap 8: confirmed at `.venv/src/bundlewrap/bundlewrap/items/git_deploy.py`'s
  `fix()` method — uses `self.node.upload(...)` which writes as the sudo
  user (root). Files end up root-owned.
 - Gap 9: confirmed in round 1 (`git_deploy.py:103` —
  `if "://" in self.attributes['repo']:`).
 - Gap 10: confirmed `/etc/dehydrated/letsencrypt-ensure-some-certificate`
  exists in the bundle; it runs for every domain, guarded by an idempotent `unless`.
  A daily timer runs `/usr/bin/dehydrated --cron --accept-terms --challenge dns-01`.
 - Gap 11: nuanced. The bundle DOES set `bind/type = 'slave'` and renders
  different named.conf.local for slaves, so bind itself may AXFR at
  runtime. But the slave's *bw-managed* zone files are statically
  rendered from the master's metadata at slave-apply time
  (`bundles/bind/items.py:100`). The practical workflow rule — "apply
  both" — is correct regardless. I'll frame the README as the workflow
  rule, not the absolute "not AXFR slaving" claim from the handoff.
 - Gap 12: confirmed `nginx.conf:42` includes `/etc/nginx/sites-enabled/*`;
  `nginx/items.py:35` reads `node.metadata.get('vm/cores')` with no
  default. README does not exist.
 ## Existing README states
 - `bundles/letsencrypt/README.md` — 9 lines: upstream link + nsupdate
  snippet. Reshape into an operational README; keep the nsupdate snippet.
 - `bundles/bind/README.md` — does not exist. Create.
 - `bundles/nginx/README.md` — does not exist. Create.
 ## Commits
 ### Commit 7 — `file:` source defaults to destination basename (Gap 7)
 `bundles/AGENTS.md` Pitfalls — new bullet:
 ```markdown
 - **`file:` `source` defaults to the destination basename.** For a
  destination of `/etc/foo/bar.conf` with no `source` key, bw looks for
  `bundles/<bundle>/files/bar.conf`. Only declare `source` explicitly
  when the basename you want differs (e.g. shipping a Mako template
  named `bar.conf.mako` to a destination of `/etc/foo/bar.conf`).
 ```
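The defaulting rule fits in a few lines. This is a minimal sketch of the lookup described in the bullet; `effective_source` is a hypothetical helper written for illustration, and the second path is invented.

```python
import posixpath


def effective_source(destination, attrs):
    """Return the template name used for a file item, per the rule above."""
    return attrs.get('source', posixpath.basename(destination))


files = {
    '/etc/foo/bar.conf': {},                            # looks for files/bar.conf
    '/etc/foo/baz.conf': {'source': 'baz.conf.mako'},   # explicit override
}
for destination, attrs in files.items():
    print(destination, '<-', effective_source(destination, attrs))
```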
 ### Commit 8 — `git_deploy` gotchas (Gaps 8 + 9)
 `bundles/AGENTS.md` Pitfalls — two new bullets.
 ```markdown
 - **`git_deploy` extracts as the connecting (sudo) user — files end up
  root-owned.** A downstream action that runs as a non-root app user
  (typical: editable pip install, Rails bundle install) will hit
  `Permission denied` on `.egg-info` or similar. The fix is a
  self-healing chown action between `git_deploy` and the downstream
  action:
  ```python
  actions['<bundle>_chown_src'] = {
      'command': 'chown -R <user>:<group> <path>',
      'unless': 'test -z "$(find <path> ! -user <user> -print -quit)"',
      'cascade_skip': False,
      'needs': ['git_deploy:<path>', 'user:<user>', 'group:<group>'],
  }
  ```
  See `bundles/left4me/items.py` for an in-tree example.
 - **`git_deploy` URL form matters.** A URL containing `://` (HTTP/HTTPS,
  `ssh://`) makes bw clone to a temp dir per-apply — no operator-side
  state needed. Without `://` (SCP-style `git@host:path`), bw expects a
  `git_deploy_repos` map file at the repo root pointing at a long-lived
  local clone, and raises `RepositoryError('missing repo map for
  git_deploy')` if it isn't there. For HTTPS-reachable repos use the
  HTTPS form; for SSH-only, prefer the explicit `ssh://user@host/path`
  form so the map isn't needed.
 ```
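The URL-form rule reduces to the `"://"` substring test quoted under Validation findings. The sketch below applies that test to three example URLs; `needs_repo_map` is a hypothetical helper name and the hosts are placeholders.

```python
def needs_repo_map(repo):
    """True when the repo value has no '://' and so relies on git_deploy_repos."""
    return '://' not in repo


for repo in (
    'https://git.example.org/team/app.git',
    'ssh://git@git.example.org/team/app.git',
    'git@git.example.org:team/app.git',
):
    print(repo, '->', 'repo map' if needs_repo_map(repo) else 'clone per apply')
```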
 ### Commit 9 — letsencrypt README (Gap 10)
 Reshape `bundles/letsencrypt/README.md`. Keep the upstream link and
 nsupdate snippet at the top; add three structured sections.
 ```markdown
 # letsencrypt
 Issues and renews Let's Encrypt certs via [dehydrated][upstream] with
 DNS-01 against the in-house bind-acme server.
 [upstream]: https://github.com/dehydrated-io/dehydrated/wiki/example-dns-01-nsupdate-script
 ## First-apply behaviour
 Immediately after `bw apply <node>`, nginx serves a **self-signed
 cert** for each declared domain — generated by
 `/etc/dehydrated/letsencrypt-ensure-some-certificate` so nginx has
 something to start with. The real Let's Encrypt cert arrives at most
 24h later when the systemd timer fires
 (`/usr/bin/dehydrated --cron --accept-terms --challenge dns-01`). To
 shortcut the wait:
 ```sh
 ssh <node> 'sudo /usr/bin/dehydrated --cron --accept-terms --challenge dns-01'
 ssh <node> 'sudo systemctl reload nginx'
 ```
 ## DNS-01 prerequisites
 `hook.sh` does `nsupdate` against the bind-acme server (referenced
 by `letsencrypt/acme_node`). For the challenge to succeed:
 1. The acme node must be in the same metadata graph (so
   `bw metadata <node> -k letsencrypt/acme_node` resolves).
 2. **All NS servers** for the validated domain must serve the
   `_acme-challenge.<domain>` CNAME — Let's Encrypt validates from
   primary AND secondary geographic regions; both authoritative
   servers must agree. If a secondary NS is also a bw-managed node,
   `bw apply` it after adding the domain (see e.g. `ovh.secondary`).
 3. The bind-acme node's TSIG key must be reachable. `hook.sh` is
   rendered with the bind-acme server's `network/internal/ipv4` —
   for clients outside that LAN, the route must exist (typically via
   wireguard `s2s` peer membership).
 ## Negative-cache penalty
 If the first DNS-01 attempt fails (e.g. zone not yet applied to the
 secondary NS), Let's Encrypt's resolvers cache NXDOMAIN for the SOA's
 negative TTL (often 900s = 15 min). Subsequent attempts during that
 window also fail and refresh the cache. Combined with LE's limit of
 **5 failed validations per hostname, per account, per hour**, recovery
 requires you to **stop retrying** for ~15 minutes after fixing the
 DNS, then make at most one attempt.
 ## nsupdate sample
 For interactive testing of the bind-acme TSIG path:
 ```sh
 printf "server 127.0.0.1
 zone acme.resolver.name.
 update add _acme-challenge.ckn.li.acme.resolver.name. 600 IN TXT \"hello\"
 send
 " | nsupdate -y hmac-sha512:acme:<TSIG_KEY_REDACTED>
 ```
 ```
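The waiting rule in the negative-cache section is simple arithmetic. As a rough sketch, assuming the 900 s negative TTL mentioned in the README and no further attempts after the DNS fix, the earliest sensible retry time is the fix time plus the TTL. The function name and the timestamp are hypothetical.

```python
from datetime import datetime, timedelta


def earliest_retry(dns_fixed_at, negative_ttl_seconds=900):
    """Earliest time a new DNS-01 attempt should avoid the cached NXDOMAIN."""
    return dns_fixed_at + timedelta(seconds=negative_ttl_seconds)


fixed = datetime(2026, 5, 10, 22, 0, 0)
print(earliest_retry(fixed))
```

Check the zone's actual SOA values before relying on the 900 s default.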
 ### Commit 10 — bind README (Gap 11, reframed)
 Create `bundles/bind/README.md`. Frame as the workflow rule, not the
 absolute "not AXFR" claim.
 ```markdown
 # bind
 Authoritative DNS — primary plus optional `bind/master_node` slaves.
 ## Applying changes needs both nodes
 The slave's bw-managed zone files are rendered from the master's
 metadata at slave-apply time (see `bundles/bind/items.py:100`). When
 you change a record on the master (adding a `letsencrypt/domains`
 entry, a new vhost, etc.), the change is only published once you
 apply BOTH:
 ```sh
 bw apply htz.mails        # primary (where the source records live)
 bw apply ovh.secondary    # secondary (renders its own zone files)
 ```
 Until both have been applied, `bw verify ovh.secondary` will show
 stale zones and consumers that hit the secondary (Let's Encrypt's
 secondary-region validators in particular) will see NXDOMAIN. Even
 though the slave's named.conf.local declares `type slave;`, don't
 rely on bind's own AXFR catching up — the bw-rendered file on disk
 is what `bw verify` measures.
 ## See also
 - `bundles/bind-acme/` — the in-house ACME-update receiver.
 - `bundles/letsencrypt/README.md` — DNS-01 prerequisites and the
  negative-cache penalty (the most common operational consequence of
  forgetting to apply the secondary).
 ```
 ### Commit 11 — nginx README (Gap 12)
 Create `bundles/nginx/README.md`.
 ```markdown
 # nginx
 Webserver. Per-node vhosts in `nginx/vhosts`; per-vhost templates in
 `data/nginx/*.conf`.
 ## How port 80 is served
 The bundle ships a fixed `80.conf` to
 `/etc/nginx/sites-available/80.conf` (picked up by the
 `sites-enabled/` symlink) that handles **all** port-80 traffic
 across vhosts:
 1. ACME HTTP-01 challenges (`/.well-known/acme-challenge/`) are
   served from `/var/lib/dehydrated/acme-challenges/`.
 2. All other port-80 requests are 301-redirected to
   `https://$host$request_uri`.
 Per-vhost templates only declare `listen 443 ssl http2;`, so they
 don't need their own port-80 server blocks. If you need
 vhost-specific port-80 behaviour (e.g. plain-HTTP without redirect),
 you'll need to override `80.conf` or add a per-vhost block.
 ## Required metadata
 - `vm/cores` — read directly by `items.py` for `worker_processes`.
  No default; `bw items <node>` raises at item-build time if missing.
  Typically supplied by the `vm` bundle / hetzner-vm group; double-
  check on bare-metal hosts.
 - `nginx/vhosts` — dict of vhost-name → vhost-config.
 - `nginx/modules` — list of dynamic modules to load.
 ## Cross-namespace
 `items.py` reads `letsencrypt/domains` to skip emitting a per-vhost
 HTTPS block when LE hasn't declared the domain yet — keeps the bundle
 loadable on a node where letsencrypt isn't fully wired up.
 ```
 ## Out of scope
 - Bundle behaviour changes. Pure docs.
 - `bw apply` / `bw run`.
 - Reformatting the existing two-line bundle READMEs into the new
  shape — bundles/AGENTS.md explicitly says don't do that
  ("uneven quality is part of what we accept in exchange for not
  blocking other work").
 ## Constraints
 - Don't echo decrypted secrets. The TSIG-key example in the
  letsencrypt nsupdate snippet uses `<TSIG_KEY_REDACTED>`.
 - After each commit, `.venv/bin/bw test` must pass.
 - No push.
--- a/groups/applications/left4me.py
+++ b/groups/applications/left4me.py
@ -0,0 +1,5 @@
 {
    'bundles': {
        'left4me',
    },
 }
--- a/nodes/AGENTS.md
+++ b/nodes/AGENTS.md
@ -81,6 +81,12 @@ This loader shape has consequences:
  These are intentional parks/buffers, not bugs.
 - **`id` must be unique.** A pre-apply hook (`hooks/unique_node_ids.py`)
  enforces this; duplicate IDs fail `bw test` and `bw apply`.
 - **Bloated per-node metadata is usually a bundle smell.** If a
  bundle's metadata block in the node file has more than 3-5 keys,
  the bundle is probably under-using `defaults` / reactors. Push the
  contribution into the bundle (see
  [`bundles/AGENTS.md`](../bundles/AGENTS.md#conventions)) rather than
  growing the node file.
 ## See also
--- a/nodes/htz.mails.py
+++ b/nodes/htz.mails.py
@ -233,6 +233,7 @@
                        '10.0.229.0/24',
                    ],
                },
                'ovh.left4me': {},
            },
            'clients': {
                'macbook': {
--- a/nodes/ovh.left4me.py
+++ b/nodes/ovh.left4me.py
@ -1,15 +1,21 @@
 {
    'hostname': '141.95.32.8',
    'username': 'debian',
    'groups': [
        'backup',
        'debian-13',
        'left4me',
        'monitored',
        'webserver',
    ],
    'bundles': [
-        #'wireguard',
+        'wireguard',
    ],
    'metadata': {
        'id': '14d2abc-3855-4bb7-99e2-d4e3eb0344dd',
        'vm': {
            'cores': 4,    # 4 physical, 8 with HT
            'threads': 8,
        },
        'network': {
            'external': {
                'interface': 'enp3s0f0',
@ -35,5 +41,12 @@
                },
            },
        },
        'left4me': {
            'domain': 'left4.me',
            # Both HT siblings of physical core 0 (cpu0+cpu4 per
            # /sys/devices/system/cpu/cpu0/topology/thread_siblings_list).
            # Keeps system work off the physical cores running game ticks.
            'system_cpus': {0, 4},
        },
    },
 }
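The commit message says the reactor takes the set complement of `system_cpus` to get the game cores. The reactor itself is not shown in this part of the diff, so the following is only a sketch of that arithmetic. It assumes CPU ids run from 0 to `vm/threads` minus 1 and that the result is rendered as a comma-separated `AllowedCPUs=` value; `cpu_list` is a hypothetical helper.

```python
def cpu_list(cpus):
    """Format CPU ids as a comma-separated list, sorted ascending."""
    return ','.join(str(cpu) for cpu in sorted(cpus))


threads = 8              # vm/threads on ovh.left4me
system_cpus = {0, 4}     # left4me/system_cpus on ovh.left4me

# Everything that is not reserved for system work goes to the game slice.
game_cpus = set(range(threads)) - system_cpus

print('system AllowedCPUs=' + cpu_list(system_cpus))
print('game   AllowedCPUs=' + cpu_list(game_cpus))
```

With `{0, 4}` reserved, the game slice gets the three remaining physical cores and both of their HT siblings.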