Compare commits

..

No commits in common. "c6caf2a1cf532cc675778840689daaba28891621" and "d4dedde0ad8af5d384df8653fa0e26d25a89bdf8" have entirely different histories.

33 changed files with 23 additions and 2011 deletions

2
.gitignore vendored
View file

@ -5,5 +5,3 @@
.bw_debug_history
# CocoIndex Code (ccc)
/.cocoindex_code/
# bundlewrap git_deploy local-mirror map (operator-specific paths)
git_deploy_repos

View file

@ -12,12 +12,12 @@ not project documentation. Onboarding lives **here**, in `AGENTS.md`.
## Quickstart for agents
Six rules; follow these and you won't break things:
Five rules; follow these and you won't break things:
1. **Read-only by default.** Never run `bw apply`, `bw run`, or
`bw lock` without explicit user request — even with `-i`. Stick
to `bw test`, `bw nodes`, `bw groups`, `bw items`,
`bw metadata`, `bw hash`, `bw verify`, `bw debug`. See
to `bw test`, `bw nodes`, `bw groups`, `bw bundles`,
`bw items`, `bw metadata`, `bw hash`, `bw debug`. See
[`docs/agents/commands.md`](docs/agents/commands.md) and the
fork's [safety envelope](https://github.com/CroneKorkN/bundlewrap/blob/main/AGENTS.md).
2. **Never echo decrypted secrets.** Don't print, paste, or log the
@ -38,15 +38,6 @@ Six rules; follow these and you won't break things:
5. **Prefer adding helpers to `libs/`** over duplicating logic across
bundles. Repo-wide helpers go in
[`libs/`](libs/AGENTS.md), reachable as `repo.libs.<x>`.
6. **`ccc` is available for semantic search.** This repo is indexed
with [`ccc`](https://github.com/cocoindex-io/cocoindex-code).
Reach for it on conceptual questions ("where is X used / which
bundles do Y / what are the contexts of Z"), where a keyword
grep would miss indirect usage:
`ccc search '<concept>' --path '**'`. Pass `--path '**'`
without it, results are filtered to the current working
directory's subtree. `grep`/`rg`/`find` remain fine for
exact-string lookups; pick whichever fits the question.
## Layout

View file

@ -41,16 +41,6 @@ bundles/<name>/
more than one bundle. Don't duplicate logic across bundles.
- **Custom item types** (e.g. `download:`) live in
[`items/`](../items/AGENTS.md), not per-bundle.
- **Bundles own application-wide knowledge; nodes carry only the few
per-host knobs the bundle actually needs.** When designing a bundle,
identify the per-node knobs (e.g. domain, uplink interface, a
vault-id suffix) and put everything else in `defaults`, or in a
reactor that derives from those knobs. Per-node random secrets
belong in `defaults` via `repo.vault.random_bytes_as_base64_for(...)`
keyed on the node — not in the node file. See
`bundles/left4me/metadata.py:10` (`secret_key` derived in defaults)
and `bundles/postgresql/metadata.py:4` (vault-derived `password_for`
at module scope).
## How to add a new bundle
@ -66,22 +56,12 @@ bundles/<name>/
[`groups/<axis>/<x>.py`](../groups/AGENTS.md) (preferred for shared
bundles) or to the node's `bundles` list directly
([`nodes/AGENTS.md`](../nodes/AGENTS.md)).
5. **Verify, in this order:**
- `bw test` — repo-wide parse + cross-cutting hooks. Loads every
bundle, but reactors don't fire for nodes that haven't opted into
the bundle yet — bugs in new reactors stay hidden here.
- **Attach the bundle to a node** (via the node's `bundles` list, or
a group it belongs to). Until you do, the next steps don't actually
exercise the bundle.
- `bw test <node>` — exercises every reactor and item-graph edge for
that node. This is where most new-bundle bugs surface.
- `bw items <node> --blame` — confirm items materialise with the
right paths, authored by the expected bundle.
- `bw metadata <node> -k <a/b>` — spot-check derived metadata.
- `bw hash <node>` — preview vs current host state.
See [`docs/agents/commands.md#bundle-validation-workflow`](../docs/agents/commands.md#bundle-validation-workflow)
for the rationale.
5. Verify, in this order:
- `bw test` — sanity (loaders + reactors).
- `bw items <node>` — confirm new items appear on a node that opts in.
- `bw hash <node>` — confirm the change is what you expected. See
[`docs/agents/commands.md`](../docs/agents/commands.md) and the
fork's hash-diff workflow.
6. Add a `bundles/<name>/README.md`. See "Per-bundle README" below
for what to cover.
@ -102,12 +82,6 @@ bundles/<name>/
unless the matching `file:` item declares `content_type='mako'`
(or a templating extension triggers it). To check, read the matching
`file:` entry in `items.py`.
- **`file:` `source` defaults to the destination basename.** For a
destination of `/etc/foo/bar.conf` with no `source` key, bw looks
for `bundles/<bundle>/files/bar.conf`. Only declare `source`
explicitly when the basename you want differs (e.g. shipping a Mako
template named `bar.conf.mako` to a destination of
`/etc/foo/bar.conf`).
- **Reactors writing across namespaces.** Some bundles' reactors write
into other bundles' metadata namespaces (e.g. `nextcloud` writes
into `apt.packages`, `archive.paths`). When you change such a bundle,
@ -116,28 +90,6 @@ bundles/<name>/
itself; grep `'<other-bundle>':` in the reactors when in doubt.
- **`bw hash` doesn't accept selectors.** Use `bw hash <node>` per
literal name; see the fork's runbook.
- **Reactors must read metadata.** If a reactor body returns a static
dict without calling `metadata.get(...)`, bw raises
`ValueError: <reactor> on <node> did not request any metadata, you
might want to use defaults instead` once a node consumes the bundle.
Fix: fold the contribution into `defaults`. The rule applies even
when the reactor writes into another bundle's namespace — a static
contribution to e.g. `nftables/output` belongs in `defaults`, where
bw merges it with other bundles' contributions.
- **`triggers``triggered: True` invariant.** Any item listed in
another's `triggers` list must declare `triggered: True`. bw
enforces this at `bw test` time: *"…triggered by …, but missing
'triggered' attribute"*. Corollary: an action can't be both in an
upstream `triggers` list AND self-healing every apply — pick one.
- **Triggered actions don't recover from partial failure.** When an
upstream item's apply succeeds but its triggered downstream action
fails, subsequent applies can't recover via the trigger chain —
upstream is "already in desired state" and never re-triggers. For
actions that must self-heal (pip installs, chowns, migrations),
drop `triggered: True` and gate the command with `unless: <fast-check>`.
`unless` is a shell command on the target host whose exit status
decides whether the main command runs (exit 0 = skip); it's checked
at fire time, after `triggered:` filtering.
## Per-bundle README

View file

@ -33,7 +33,6 @@ def acme_zone(metadata):
str(ip_interface(other_node.metadata.get('network/internal/ipv4')).ip)
for other_node in repo.nodes
if other_node.metadata.get('letsencrypt/domains', {})
and other_node.metadata.get('network/internal/ipv4', None)
},
*{
str(ip_interface(other_node.metadata.get('wireguard/my_ip')).ip)

View file

@ -1,30 +0,0 @@
# bind
Authoritative DNS — primary plus optional `bind/master_node` slaves.
## Applying changes needs both nodes
The slave's bw-managed zone files are rendered from the master's
metadata at slave-apply time (see `bundles/bind/items.py:100`). When
you change a record on the master (adding a `letsencrypt/domains`
entry, a new vhost, etc.), the change is only published once you
apply BOTH:
```sh
bw apply htz.mails # primary (where the source records live)
bw apply ovh.secondary # secondary (renders its own zone files)
```
Until both have been applied, `bw verify ovh.secondary` will show
stale zones and consumers that hit the secondary (Let's Encrypt's
secondary-region validators in particular) will see NXDOMAIN. Even
though the slave's named.conf.local declares `type slave;`, don't
rely on bind's own AXFR catching up — the bw-rendered file on disk
is what `bw verify` measures.
## See also
- `bundles/bind-acme/` — the in-house ACME-update receiver.
- `bundles/letsencrypt/README.md` — DNS-01 prerequisites and the
negative-cache penalty (the most common operational consequence
of forgetting to apply the secondary).

View file

@ -1,114 +0,0 @@
# left4me
L4D2 game-server management platform: a Flask web UI on gunicorn that
provisions per-instance srcds servers via templated systemd units, with
kernel-overlayfs layering for shared installations + per-overlay maps,
and uid-based DSCP/priority marking on the egress path so CAKE on the
external interface prioritizes srcds UDP over bulk traffic.
## Metadata
```python
'metadata': {
'left4me': {
'domain': 'whatever.tld', # required — the only per-node knob
# Everything below is optional and has a sensible default in the
# bundle. Override per-node only if the default is wrong:
# 'git_url': 'git@git.sublimity.de:cronekorkn/left4me',
# 'git_branch': 'master',
# 'gunicorn_workers': 1,
# 'gunicorn_threads': 32,
# 'job_worker_threads': 4,
# 'port_range_start': 27015,
# 'port_range_end': 27115,
# secret_key is auto-derived per node
# (repo.vault.random_bytes_as_base64_for f'{node.name} left4me secret_key').
},
},
```
The bundle's `derived_from_domain` reactor reads `left4me/domain` and
emits the corresponding `nginx/vhosts`, `letsencrypt/domains`,
`monitoring/services/left4me-web` (HTTPS health check), and the game-
port `nftables/input` accept rules. Backup paths
(`/var/lib/left4me`, `/etc/left4me`) are set-merged into `backup/paths`
from defaults. None of these need to be declared per-node.
## What this bundle does
- Creates system users `left4me` (uid/gid 980, home `/var/lib/left4me`,
mode 0711) and `l4d2-sandbox` (uid/gid 981, no home, used by bwrap
script-overlay builds).
- Drops privileged helpers under `/usr/local/libexec/left4me/`
(`left4me-systemctl`, `left4me-journalctl`, `left4me-overlay`,
`left4me-script-sandbox`) plus a tight sudoers file (validated with
`visudo -cf` before install).
- `git_deploy`s the left4me repo to `/opt/left4me/src`, builds a venv at
`/opt/left4me/.venv`, `pip install -e`s both `l4d2host` and `l4d2web`,
runs `alembic upgrade head` and `flask seed-script-overlays`, then
enables `left4me-web.service`.
- Emits four systemd units via `systemd/units` metadata (consumed by
`bundles/systemd/`):
- `left4me-web.service` — gunicorn on `127.0.0.1:8000` (TLS terminates upstream).
- `left4me-server@.service` — per-instance srcds template, started on
demand by the web app via the `left4me-systemctl` helper.
- `l4d2-game.slice` / `l4d2-build.slice` — cgroup slices for the
perf-baseline (CPU/IO weights, memory caps).
- Contributes uid-based DSCP/priority marks for srcds UDP egress to
`nftables/output` (via `defaults`).
## Gotchas
- **Requires `bundles/nftables` and `bundles/systemd` on the node.** The
bundle asserts membership at `bw test` time. On Debian-13 these ride
in via the `debian-13` group, so attaching the bundle to a Debian-13
node is enough.
- **`left4me-web.service` does not have `NoNewPrivileges=true`.** This is
intentional — workers `sudo` the privileged helpers; `NoNewPrivileges`
would block setuid escalation. Per-instance `server@.service` units
*do* have it.
- **CAKE shaping is configured separately**, via
`network/<iface>/cake` on the node (consumed by `bundles/network/`),
not by this bundle.
- **First-run admin user is manual.** After `bw apply`, ssh to the host and
bootstrap the admin via the `left4me` wrapper (it sources the env files,
drops to the `left4me` user, and runs the flask CLI):
`sudo left4me create-user <username> --admin` (prompts for password via
the flask CLI, or set `LEFT4ME_ADMIN_PASSWORD` first). The bundle
deliberately doesn't seed an admin to keep credentials out of the
metadata pipeline. The same `left4me` wrapper accepts any other flask
subcommand: `sudo left4me seed-script-overlays <dir>`,
`sudo left4me routes`, `sudo left4me shell`, etc.
- **CPU isolation is managed by this bundle**, driven by one required
per-node knob: `left4me/system_cpus` — a set of int CPU ids that
pins `system.slice` / `user.slice` / `l4d2-build.slice`. The
complement (`set(range(vm/threads)) - system_cpus`) pins
`l4d2-game.slice`. On HT hosts, list both SMT siblings of every
physical core you want to reserve for system, otherwise games end
up sharing L1/L2 with system. Find pairings via
`/sys/devices/system/cpu/cpu<n>/topology/thread_siblings_list`. On
the prod node (`ovh.left4me`, 4 physical / 8 threads, pairings
(0,4) (1,5) (2,6) (3,7)) the node sets `'system_cpus': {0, 4}` to
reserve physical core 0 entirely. `l4d2-game.slice` and
`l4d2-build.slice` carry `AllowedCPUs=` inline on their unit
definitions; `system.slice` and `user.slice` get drop-ins registered
under `systemd/units` with the `'<parent>.d/<basename>.conf'` key
convention (same shape nginx and autologin use), landing at
`/usr/local/lib/systemd/system/<slice>.d/99-left4me-cpuset.conf`.
The reactor raises if `system_cpus` includes CPUs outside
`[0, vm/threads)` or leaves no cores for games.
- **Kernel feature requirement:** kernel-overlayfs (`CONFIG_OVERLAY_FS`).
Standard on debian-13.
- **Game ports** open by the web app on demand in the range 27015-27115
(UDP+TCP). Add corresponding accept rules to `nftables/input` per
node if the host's policy is default-drop on input.
- **Pinned UIDs/GIDs (980/981).** Chosen for deterministic ownership
across rebuilds and backup restores. If you add another bundle that
pins UIDs in this repo, make sure it doesn't collide.
## Slice support requires `bundles/systemd` ≥ commit cc1c6a5
This bundle's `l4d2-game.slice` and `l4d2-build.slice` units rely on
`bundles/systemd/items.py` accepting the `.slice` extension. Older
revisions raised `Exception(f'unknown type slice')` at apply time.
The repo-wide `bw test` will catch this if it regresses.

View file

@ -1,6 +0,0 @@
# Managed by ckn-bw bundles/left4me. Local edits will be reverted.
# Deployment units use fixed /var/lib/left4me paths; regenerate units if this changes.
LEFT4ME_ROOT=/var/lib/left4me
# l4d2host invokes steamcmd by absolute path — bypasses PATH lookup so the
# script's `cd "$(dirname "$0")"` resolves next to the real install dir.
LEFT4ME_STEAMCMD=/opt/left4me/steam/steamcmd.sh

View file

@ -1,6 +0,0 @@
# Sandbox-only resolver config — bind-mounted into script-overlay sandboxes
# at /etc/resolv.conf. The host's resolver (often a private/LAN DNS server)
# is unreachable from inside the sandbox because IPAddressDeny= blocks
# egress to RFC1918 / loopback. Public resolvers keep DNS working.
nameserver 1.1.1.1
nameserver 8.8.8.8

View file

@ -1,7 +0,0 @@
# Managed by ckn-bw bundles/left4me. Local edits will be reverted.
DATABASE_URL=sqlite:////var/lib/left4me/left4me.db
SECRET_KEY=${node.metadata.get('left4me/secret_key')}
JOB_WORKER_THREADS=${node.metadata.get('left4me/job_worker_threads')}
SESSION_COOKIE_SECURE=true
LEFT4ME_PORT_RANGE_START=${node.metadata.get('left4me/port_range_start')}
LEFT4ME_PORT_RANGE_END=${node.metadata.get('left4me/port_range_end')}

View file

@ -1,5 +0,0 @@
Defaults:left4me !requiretty
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-systemctl *
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-journalctl *
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-overlay mount *, /usr/local/libexec/left4me/left4me-overlay umount *
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox

View file

@ -1,36 +0,0 @@
# Host-side perf baseline for left4me — see
# docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md
#
# UDP socket buffers: distro defaults of ~128 KiB are too small for sustained
# Source-engine UDP across multiple instances. 8 MiB matches the standard
# 1 Gbit recommendation; rmem_default/wmem_default protect sockets that don't
# explicitly enlarge their buffers.
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.core.rmem_default = 524288
net.core.wmem_default = 524288
# Kernel softirq UDP path: the per-CPU backlog queue starts dropping packets
# at the default 1000 under multi-instance burst; 5000 absorbs realistic peaks.
# netdev_budget = 600 gives softirq more drain headroom per pass.
net.core.netdev_max_backlog = 5000
net.core.netdev_budget = 600
# Latency-sensitive default: avoid swap unless the box is really under
# pressure. Harmless on swapless hosts.
vm.swappiness = 10
# Per-socket UDP buffer floors: protect game-server sockets that don't bump
# their own SO_RCVBUF/SO_SNDBUF when softirq drains lag briefly.
net.ipv4.udp_rmem_min = 16384
net.ipv4.udp_wmem_min = 16384
# Default qdisc for ifaces we don't explicitly shape with CAKE. Debian Trixie
# already defaults to fq_codel; setting it explicitly is belt-and-suspenders
# and survives kernel-default churn.
net.core.default_qdisc = fq_codel
# TCP congestion control: BBR for any bulk TCP egress on the host (admin SSH,
# backups, package fetches, web-app responses) so a long flow does not push
# the bottleneck queue ahead of game UDP. UDP srcds is unaffected.
net.ipv4.tcp_congestion_control = bbr

View file

@ -1,53 +0,0 @@
#!/bin/sh
set -eu
usage() {
printf '%s\n' "usage: left4me-journalctl <server-name> --lines <n> --follow|--no-follow" >&2
exit 2
}
validate_name() {
name=$1
[ -n "$name" ] || usage
case "$name" in
.*|*..*|*/*|*\\*) usage ;;
esac
case "$name" in
*[!A-Za-z0-9_.-]*) usage ;;
esac
}
[ "$#" -eq 4 ] || usage
name=$1
lines_flag=$2
lines=$3
follow_flag=$4
validate_name "$name"
[ "$lines_flag" = "--lines" ] || usage
case "$lines" in
''|*[!0-9]*) usage ;;
esac
follow_arg=
case "$follow_flag" in
--follow) follow_arg=-f ;;
--no-follow) ;;
*) usage ;;
esac
unit="left4me-server@${name}.service"
if [ -x /bin/journalctl ]; then
journalctl=/bin/journalctl
elif [ -x /usr/bin/journalctl ]; then
journalctl=/usr/bin/journalctl
else
printf '%s\n' 'journalctl not found at /bin/journalctl or /usr/bin/journalctl' >&2
exit 69
fi
if [ -n "$follow_arg" ]; then
exec "$journalctl" -u "$unit" -n "$lines" -o cat "$follow_arg"
fi
exec "$journalctl" -u "$unit" -n "$lines" -o cat

View file

@ -1,242 +0,0 @@
#!/usr/bin/python3
"""Privileged overlay mount helper for left4me.
Invoked from the systemd unit's ExecStartPre / ExecStopPost via
`+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- …`. The unit-level
nsenter is what makes this work: it runs the helper Python interpreter
inside PID 1's mount namespace. Without it, the `+` Exec prefix
removes the sandbox/credentials but does NOT detach from the unit's
per-service mount namespace, and the helper process itself would pin
that namespace alive — turning every umount into a multi-second EBUSY
race with the kernel's deferred namespace cleanup. With the unit-level
nsenter the helper has no such reference and umount succeeds first try.
Validates inputs strictly, then performs `mount -t overlay` /
`umount` directly — no internal nsenter, since the helper is already
running where the syscalls need to take effect.
Verbs:
mount <name> Reads ${LEFT4ME_ROOT}/instances/<name>/instance.env
for L4D2_LOWERDIRS, validates every lowerdir is
under one of installation/overlays/workshop_cache/
global_overlay_cache, then mounts the kernel
overlay at runtime/<name>/merged.
umount <name> Unmounts runtime/<name>/merged and cleans up the
kernel-overlayfs `work/work` orphan.
Set LEFT4ME_OVERLAY_PRINT_ONLY=1 to print the would-be argv (one line,
shell-quoted) and exit 0 instead of execv. Used by tests.
"""
import os
import re
import shlex
import shutil
import subprocess
import sys
from pathlib import Path
NAME_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,63}$")
DEFAULT_ROOT = "/var/lib/left4me"
LOWERDIR_ALLOWLIST = (
"installation",
"overlays",
"global_overlay_cache",
"workshop_cache",
)
MAX_LOWERDIRS = 500
MOUNT_BIN = "/bin/mount"
UMOUNT_BIN = "/bin/umount"
def die(msg: str) -> None:
sys.stderr.write(f"left4me-overlay: {msg}\n")
sys.exit(1)
def root() -> Path:
return Path(os.environ.get("LEFT4ME_ROOT") or DEFAULT_ROOT)
def validate_name(name: str) -> str:
if not NAME_RE.fullmatch(name):
die(f"invalid instance name: {name!r}")
return name
def parse_lowerdirs(env_path: Path) -> list[str]:
if not env_path.is_file():
die(f"instance.env not found: {env_path}")
raw = None
for line in env_path.read_text().splitlines():
if "=" not in line:
continue
key, value = line.split("=", 1)
if key.strip() == "L4D2_LOWERDIRS":
raw = value
break
if raw is None:
die(f"L4D2_LOWERDIRS not set in {env_path}")
if raw == "":
die(f"L4D2_LOWERDIRS is empty in {env_path}")
parts = raw.split(":")
if any(p == "" for p in parts):
die(f"L4D2_LOWERDIRS contains an empty entry: {raw!r}")
if len(parts) > MAX_LOWERDIRS:
die(f"L4D2_LOWERDIRS has {len(parts)} entries (cap {MAX_LOWERDIRS})")
return parts
def canonical_under(allowed_roots: list[Path], path: Path) -> Path:
try:
canonical = path.resolve(strict=True)
except (FileNotFoundError, RuntimeError):
die(f"path does not exist or has a symlink loop: {path}")
for r in allowed_roots:
if canonical == r or r in canonical.parents:
return canonical
die(f"path is outside the permitted roots: {path} (resolved: {canonical})")
_LISTXATTR = getattr(os, "listxattr", None)
def _entry_has_fuse_xattr(path: str) -> str | None:
if _LISTXATTR is None:
return None
try:
attrs = _LISTXATTR(path, follow_symlinks=False)
except OSError:
return None
for a in attrs:
if a.startswith("user.fuseoverlayfs."):
return a
return None
def assert_no_fuse_xattrs(upper: Path) -> None:
if not upper.exists() or _LISTXATTR is None:
return
for dirpath, dirnames, filenames in os.walk(upper):
for entry in (dirpath, *(os.path.join(dirpath, n) for n in dirnames),
*(os.path.join(dirpath, n) for n in filenames)):
tainted = _entry_has_fuse_xattr(entry)
if tainted:
die(
f"upperdir contains fuse-overlayfs xattr {tainted!r} on {entry}; "
"wipe upper/ and work/ before mounting"
)
def exec_or_print(argv: list[str]) -> None:
if os.environ.get("LEFT4ME_OVERLAY_PRINT_ONLY") == "1":
print(" ".join(shlex.quote(a) for a in argv))
sys.exit(0)
os.execv(argv[0], argv)
def cmd_mount(name: str) -> None:
name = validate_name(name)
r = root()
runtime_name_dir = (r / "runtime" / name).resolve(strict=True)
merged_for_check = (runtime_name_dir / "merged").resolve(strict=True)
# Idempotency for unit restart cycles: if a previous start mounted
# successfully but ExecStart failed afterwards (and Restart=on-failure
# fires another cycle), the second ExecStartPre would otherwise refuse
# to mount-on-top. Short-circuit here so the second cycle just gets
# straight to ExecStart. PRINT_ONLY (test mode) bypasses this so the
# tests can exercise the full nsenter argv regardless of mount state.
if (
os.environ.get("LEFT4ME_OVERLAY_PRINT_ONLY") != "1"
and os.path.ismount(merged_for_check)
):
return
instance_env = r / "instances" / name / "instance.env"
raw_lowerdirs = parse_lowerdirs(instance_env)
allowed_roots = [(r / sub).resolve() for sub in LOWERDIR_ALLOWLIST]
canonical_lowerdirs = [str(canonical_under(allowed_roots, Path(p))) for p in raw_lowerdirs]
upper = (runtime_name_dir / "upper").resolve(strict=True)
work = (runtime_name_dir / "work").resolve(strict=True)
merged = merged_for_check
for label, path in (("upper", upper), ("work", work), ("merged", merged)):
if path.parent != runtime_name_dir:
die(f"{label} resolved outside runtime/{name}: {path}")
assert_no_fuse_xattrs(upper)
options = f"lowerdir={':'.join(canonical_lowerdirs)},upperdir={upper},workdir={work}"
argv = [
MOUNT_BIN,
"-t", "overlay",
"overlay",
"-o", options,
str(merged),
]
exec_or_print(argv)
def cmd_umount(name: str) -> None:
name = validate_name(name)
r = root()
runtime_name_dir = (r / "runtime" / name).resolve(strict=True)
merged_path = runtime_name_dir / "merged"
work_inner = runtime_name_dir / "work" / "work"
argv = [
UMOUNT_BIN,
# Resolve only if it exists; PRINT_ONLY tests always pre-create it.
str(merged_path.resolve(strict=True) if merged_path.exists() else merged_path),
]
# PRINT_ONLY: emit the umount argv and exit. Tests assert exact shape
# of this dry-run; the post-umount cleanup of work_inner is a runtime
# behaviour exercised on the host, not in unit tests.
if os.environ.get("LEFT4ME_OVERLAY_PRINT_ONLY") == "1":
print(" ".join(shlex.quote(a) for a in argv))
sys.exit(0)
if merged_path.exists():
merged = merged_path.resolve(strict=True)
if merged.parent != runtime_name_dir:
die(f"merged resolved outside runtime/{name}: {merged}")
# Idempotency: only umount if currently a mount point. Mirrors
# cmd_mount's symmetric check; a redundant cleanup pass — or a
# call after a partial _purge_instance — must be a no-op.
#
# No retry loop here: with the helper running in PID 1's mount
# namespace (via the unit-level `nsenter --mount=/proc/1/ns/mnt`
# in ExecStopPost), it holds no reference to the unit's
# per-service mount namespace, so the cgroup-empty → namespace
# reaped → umount-clears sequence happens without any race
# window for us to ride out. EBUSY here is a real error.
if os.path.ismount(merged):
subprocess.run(argv, check=True)
# Kernel-overlayfs creates work_inner during mount with root:root mode
# 0/0. After unmount it's an orphan that the unit's User= (left4me)
# cannot traverse via shutil.rmtree, so reset/delete in instances.py
# blows up with EACCES on `runtime/<name>/work/work`. The helper is
# the only code path with root that knows about this directory, so
# the cleanup belongs here. Safe to nuke — the kernel re-creates it
# on the next mount. Run unconditionally — covers both "we just
# unmounted" and "previous teardown didn't finish" cases.
if work_inner.exists():
shutil.rmtree(work_inner)
def main(argv: list[str]) -> None:
if len(argv) != 3 or argv[1] not in ("mount", "umount"):
sys.stderr.write("usage: left4me-overlay mount|umount <name>\n")
sys.exit(2)
if argv[1] == "mount":
cmd_mount(argv[2])
else:
cmd_umount(argv[2])
if __name__ == "__main__":
main(sys.argv)

View file

@ -1,82 +0,0 @@
#!/bin/bash
# Privileged sandbox launcher for left4me script overlays.
#
# Invoked via sudo by the web user with two arguments:
# <overlay_id> numeric overlay id; bind-mounts /var/lib/left4me/overlays/<id>
# read-write at /overlay inside the sandbox.
# <script_path> absolute path to a bash file already written by the web app;
# bind-mounted read-only at /script.sh inside the sandbox.
#
# The script runs as a transient systemd .service with the full hardening
# surface: cgroup limits + walltime kill, NoNewPrivileges, ProtectSystem,
# ProtectHome, kernel-tunable / -module / -log protection, namespace
# restriction, address-family restriction, capability bounding (empty),
# seccomp filter (@system-service @network-io), MemoryDenyWriteExecute,
# LockPersonality, RestrictSUIDSGID. Network namespace is *not* restricted —
# scripts must reach the public internet to download workshop / l4d2center
# / cedapug content. PID namespace is shared with the host (no
# PrivatePID= directive in systemd); host PIDs are visible via /proc but
# not signal-able due to UID mismatch.
set -euo pipefail
[[ $# -eq 2 ]] || { echo "usage: $0 <overlay_id> <script>" >&2; exit 64; }
OVERLAY_ID=$1
SCRIPT=$2
[[ "$OVERLAY_ID" =~ ^[0-9]+$ ]] || { echo "bad overlay id" >&2; exit 64; }
OVERLAY_DIR=/var/lib/left4me/overlays/$OVERLAY_ID
[[ -d $OVERLAY_DIR ]] || { echo "no overlay dir at $OVERLAY_DIR" >&2; exit 65; }
[[ -f $SCRIPT ]] || { echo "no script at $SCRIPT" >&2; exit 65; }
if [[ "${LEFT4ME_SCRIPT_SANDBOX_DRY_RUN:-}" == "1" ]]; then
echo "DRY RUN: overlay_id=$OVERLAY_ID script=$SCRIPT overlay_dir=$OVERLAY_DIR"
exit 0
fi
# Make sure the sandbox UID owns the overlay dir so the script can write there.
# Idempotent: a no-op when the dir is already l4d2-sandbox-owned (re-run case),
# and corrects the ownership the first time the dir was created by the web app
# under the left4me UID. World-readable so the gameserver process (left4me)
# can read the overlay contents via the kernel-overlayfs lowerdir at runtime.
chown -R l4d2-sandbox:l4d2-sandbox "$OVERLAY_DIR"
chmod 0755 "$OVERLAY_DIR"
SCRIPT_RC=0
systemd-run --quiet --collect --wait --pipe \
--unit="left4me-script-${OVERLAY_ID}-$$" \
--slice=l4d2-build.slice \
-p OOMScoreAdjust=500 \
-p User=l4d2-sandbox -p Group=l4d2-sandbox \
-p UMask=0022 \
-p NoNewPrivileges=yes \
-p ProtectSystem=strict -p ProtectHome=yes \
-p PrivateTmp=yes -p PrivateDevices=yes -p PrivateIPC=yes \
-p ProtectKernelTunables=yes -p ProtectKernelModules=yes \
-p ProtectKernelLogs=yes -p ProtectControlGroups=yes \
-p RestrictNamespaces=yes \
-p RestrictAddressFamilies="AF_INET AF_INET6 AF_UNIX" \
-p RestrictSUIDSGID=yes -p LockPersonality=yes \
-p MemoryDenyWriteExecute=yes \
-p SystemCallFilter="@system-service @network-io" \
-p SystemCallArchitectures=native \
-p CapabilityBoundingSet= -p AmbientCapabilities= \
-p IPAddressDeny="127.0.0.0/8 ::1/128 169.254.0.0/16 fe80::/10 224.0.0.0/4 ff00::/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 100.64.0.0/10 fc00::/7" \
-p TemporaryFileSystem="/etc /var/lib" \
-p BindReadOnlyPaths="/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf /etc/ssl /etc/ca-certificates /etc/nsswitch.conf /etc/alternatives ${SCRIPT}:/script.sh" \
-p BindPaths="${OVERLAY_DIR}:/overlay" \
-p WorkingDirectory=/overlay \
-p Environment="HOME=/tmp PATH=/usr/bin:/usr/sbin OVERLAY=/overlay" \
-p MemoryMax=4G -p MemorySwapMax=0 -p TasksMax=512 \
-p CPUQuota=200% -p RuntimeMaxSec=3600 \
-- /bin/bash /script.sh || SCRIPT_RC=$?
# Normalize perms so the web service (left4me uid) can read overlay files
# directly via Python open() — needed by the file tree's download endpoint.
# UMask=0022 above takes care of *new* writes; this catches anything the
# script created with a tighter mode (e.g. cedapug_maps writes its
# .cedapug/manifest.tsv as 0600 by default).
find "$OVERLAY_DIR" -type f ! -perm -o+r -exec chmod o+r {} + 2>/dev/null || true
find "$OVERLAY_DIR" -type d ! -perm -o+rx -exec chmod o+rx {} + 2>/dev/null || true
exit $SCRIPT_RC

View file

@ -1,44 +0,0 @@
#!/bin/sh
set -eu
usage() {
printf '%s\n' "usage: left4me-systemctl enable|disable|show <server-name>" >&2
exit 2
}
validate_name() {
name=$1
[ -n "$name" ] || usage
case "$name" in
.*|*..*|*/*|*\\*) usage ;;
esac
case "$name" in
*[!A-Za-z0-9_.-]*) usage ;;
esac
}
[ "$#" -eq 2 ] || usage
action=$1
name=$2
case "$action" in
enable|disable|show) ;;
*) usage ;;
esac
validate_name "$name"
unit="left4me-server@${name}.service"
if [ -x /bin/systemctl ]; then
systemctl=/bin/systemctl
elif [ -x /usr/bin/systemctl ]; then
systemctl=/usr/bin/systemctl
else
printf '%s\n' 'systemctl not found at /bin/systemctl or /usr/bin/systemctl' >&2
exit 69
fi
case "$action" in
enable) exec "$systemctl" enable --now "$unit" ;;
disable) exec "$systemctl" disable --now "$unit" ;;
show) exec "$systemctl" show --property=ActiveState --property=SubState "$unit" ;;
esac

View file

@ -1,17 +0,0 @@
#!/bin/sh
# Run l4d2web flask CLI commands as the left4me user with the deploy env loaded.
# Usage: left4me <flask-subcommand> [args...]
# Examples:
# left4me create-user alice --admin
# left4me seed-script-overlays /opt/left4me/src/examples/script-overlays
# left4me routes
set -eu
exec sudo -u left4me sh -c '
set -a
. /etc/left4me/host.env
. /etc/left4me/web.env
set +a
export JOB_WORKER_ENABLED=false
export PYTHONPATH=/opt/left4me/src
exec /opt/left4me/.venv/bin/flask --app l4d2web.app:create_app "$@"
' sh "$@"

View file

@ -1,293 +0,0 @@
# Items for the left4me bundle.
# Systemd units come from metadata via bundles/systemd/ — there are no
# .service or .slice files in this bundle's files/ tree. Cpuset drop-ins
# for system.slice / user.slice are likewise emitted via systemd/units
# in metadata.py (key: '<parent>.d/<basename>.conf').
directories = {
'/opt/left4me': {
'owner': 'left4me',
'group': 'left4me',
},
'/opt/left4me/src': {
'owner': 'left4me',
'group': 'left4me',
},
'/etc/left4me': {
'owner': 'root',
'group': 'root',
'mode': '0755',
},
'/var/lib/left4me': {
# left4me's home dir — useradd creates with 0700; loosen to 0711 so
# l4d2-sandbox can traverse (but not list) for bwrap bind-mounts.
'owner': 'left4me',
'group': 'left4me',
'mode': '0711',
},
'/var/lib/left4me/installation': {'owner': 'left4me', 'group': 'left4me'},
'/var/lib/left4me/overlays': {'owner': 'left4me', 'group': 'left4me'},
'/var/lib/left4me/instances': {'owner': 'left4me', 'group': 'left4me'},
'/var/lib/left4me/runtime': {'owner': 'left4me', 'group': 'left4me'},
'/var/lib/left4me/workshop_cache': {'owner': 'left4me', 'group': 'left4me'},
'/var/lib/left4me/tmp': {'owner': 'left4me', 'group': 'left4me'},
'/opt/left4me/steam': {'owner': 'left4me', 'group': 'left4me'},
'/usr/local/libexec/left4me': {
'owner': 'root',
'group': 'root',
'mode': '0755',
},
}
groups = {
'left4me': {'gid': 980},
'l4d2-sandbox': {'gid': 981},
}
users = {
'left4me': {
'uid': 980,
'gid': 980,
'home': '/var/lib/left4me',
'shell': '/usr/sbin/nologin',
},
'l4d2-sandbox': {
'uid': 981,
'gid': 981,
'shell': '/usr/sbin/nologin',
},
}
# UIDs/GIDs pinned in the system-package range (100-999, per Debian
# policy) so file ownership is deterministic across rebuilds and
# backup restores. 980/981 are unused elsewhere in this repo.
# Privileged helpers (mode 0755 root:root). Listed by sudoers as the only
# commands left4me can invoke as root NOPASSWD.
HELPERS = (
'left4me-systemctl',
'left4me-journalctl',
'left4me-overlay',
'left4me-script-sandbox',
)
files = {
'/usr/local/sbin/left4me': {
'source': 'usr/local/sbin/left4me', # explicit — basename collides with sudoers
'mode': '0755',
'owner': 'root',
'group': 'root',
},
**{
f'/usr/local/libexec/left4me/{h}': {
'source': f'usr/local/libexec/left4me/{h}',
'mode': '0755',
'owner': 'root',
'group': 'root',
}
for h in HELPERS
},
'/etc/left4me/sandbox-resolv.conf': {
'source': 'etc/left4me/sandbox-resolv.conf',
'mode': '0644',
'owner': 'root',
'group': 'root',
},
'/etc/sudoers.d/left4me': {
'source': 'etc/sudoers.d/left4me',
'mode': '0440',
'owner': 'root',
'group': 'root',
'test_with': 'visudo -cf {}',
},
'/etc/sysctl.d/99-left4me.conf': {
'source': 'etc/sysctl.d/99-left4me.conf',
'mode': '0644',
'owner': 'root',
'group': 'root',
'triggers': [
'action:left4me_sysctl_reload',
],
},
'/etc/left4me/host.env': {
'source': 'etc/left4me/host.env.mako',
'content_type': 'mako',
'mode': '0644',
'owner': 'root',
'group': 'root',
},
'/etc/left4me/web.env': {
'source': 'etc/left4me/web.env.mako',
'content_type': 'mako',
'mode': '0640',
'owner': 'root',
'group': 'left4me',
'needs': [
'group:left4me',
],
},
}
actions = {
'left4me_sysctl_reload': {
'command': 'sysctl --system >/dev/null',
'triggered': True,
},
'left4me_dpkg_add_i386_arch': {
# steamcmd is 32-bit and pulls libc6:i386 + lib32z1 from the i386 arch.
# apt-get update is part of this action because newly-added foreign
# archs need a fresh package list before any :i386 package resolves.
'command': 'dpkg --add-architecture i386 && apt-get update',
'unless': 'dpkg --print-foreign-architectures | grep -qx i386',
'cascade_skip': False,
},
'left4me_install_steamcmd': {
# Steam's tarball is rolling with no published checksum, so we can't
# use download: (which requires a hash). Guard with a presence check
# on steamcmd.sh — steamcmd self-updates at runtime, so chasing the
# tarball version from bw isn't useful.
'command': (
'sudo -u left4me sh -c "'
'cd /opt/left4me/steam && '
'curl -fsSL https://media.steampowered.com/installer/steamcmd_linux.tar.gz | '
'tar -xz'
'"'
),
'unless': 'test -x /opt/left4me/steam/steamcmd.sh',
'cascade_skip': False,
'needs': [
'directory:/opt/left4me/steam',
'pkg_apt:curl',
'pkg_apt:libc6_i386', # bw pkg_apt convention: _ → :
'pkg_apt:lib32z1',
'user:left4me',
],
},
}
# steamcmd is invoked by absolute path (LEFT4ME_STEAMCMD in host.env),
# not via PATH lookup — see l4d2host/cli.py:install. We don't need to put
# anything in /usr/local/bin for it.
git_deploy = {
'/opt/left4me/src': {
'repo': node.metadata.get('left4me/git_url'),
'rev': node.metadata.get('left4me/git_branch'),
'triggers': [
# On a code-update apply, refresh the DB schema. pip_install
# would have triggered alembic in the create_venv path, but on
# a normal apply pip_install's `unless` skips (packages still
# importable from the previous editable install), and that
# would leave alembic_upgrade dormant. Wiring git_deploy →
# alembic directly ensures new migrations land whenever new
# code lands. alembic upgrade head is idempotent (no-op when
# already at head), so this is safe to fire on every code
# update; the seed_overlays + service:restart cascade off
# alembic also covers picking up the new code in gunicorn.
'action:left4me_alembic_upgrade',
],
# chown_src and pip_install are NOT in triggers — they run every
# apply gated by their own `unless` guards, which makes the chain
# self-healing after a partial failure. (Items in a triggers list
# must be triggered:True, which would lose that property.)
},
}
actions['left4me_chown_src'] = {
# Runs every apply (cheap — chown -R on a small tree). Self-heals
# whenever git_deploy extracts a new tarball as root-owned files.
# Not in any triggers list so doesn't need triggered:True.
'command': 'chown -R left4me:left4me /opt/left4me/src',
'unless': 'test -z "$(find /opt/left4me/src \\! -user left4me -print -quit 2>/dev/null)"',
'cascade_skip': False,
'needs': [
'git_deploy:/opt/left4me/src',
'user:left4me',
'group:left4me',
],
}
actions['left4me_create_venv'] = {
'command': 'sudo -u left4me /usr/bin/python3 -m venv /opt/left4me/.venv',
'unless': 'test -x /opt/left4me/.venv/bin/python',
'cascade_skip': False,
'needs': [
'directory:/opt/left4me',
'pkg_apt:python3-venv',
'user:left4me',
],
'triggers': [
'action:left4me_pip_upgrade',
],
}
actions['left4me_pip_upgrade'] = {
'command': 'sudo -u left4me /opt/left4me/.venv/bin/python -m pip install --upgrade pip',
'triggered': True,
'cascade_skip': False,
'needs': [
'pkg_apt:python3-pip',
],
# No triggers — pip_install runs on every apply (gated by `unless`)
# rather than being chained from here. Keeps pip_upgrade scoped to
# exactly its purpose.
}
actions['left4me_pip_install'] = {
# Single pip invocation installs both editable packages from the same
# checkout. Runs on every apply: pip install -e is fast on no-op, and
# any gate weaker than "egg-info matches pyproject.toml" can mask
# script regeneration — e.g. adding [project.scripts] later wouldn't
# be picked up if `unless` only checks importability.
'command': 'sudo -u left4me /opt/left4me/.venv/bin/pip install -e /opt/left4me/src/l4d2host -e /opt/left4me/src/l4d2web',
'cascade_skip': False,
'needs': [
'git_deploy:/opt/left4me/src',
'action:left4me_create_venv',
'action:left4me_chown_src',
],
'triggers': [
'action:left4me_alembic_upgrade',
],
}
actions['left4me_alembic_upgrade'] = {
# Mirrors deploy-test-server.sh:239-242. Runs as left4me with both env
# files sourced; JOB_WORKER_ENABLED=false so a stray worker doesn't race
# with the migration.
'command': (
'sudo -u left4me sh -c "'
'cd /opt/left4me/src/l4d2web && '
'set -a && . /etc/left4me/host.env && . /etc/left4me/web.env && set +a && '
'env JOB_WORKER_ENABLED=false PYTHONPATH=/opt/left4me/src '
'/opt/left4me/.venv/bin/alembic -c /opt/left4me/src/l4d2web/alembic.ini upgrade head'
'"'
),
'triggered': True,
'cascade_skip': False,
'needs': [
'action:left4me_pip_install',
'file:/etc/left4me/host.env',
'file:/etc/left4me/web.env',
],
'triggers': [
'action:left4me_seed_overlays',
'svc_systemd:left4me-web.service:restart',
],
}
actions['left4me_seed_overlays'] = {
# Idempotent: refreshes script bodies in place; existing overlay rows keep their ids.
'command': (
'sudo -u left4me sh -c "'
'set -a && . /etc/left4me/host.env && . /etc/left4me/web.env && set +a && '
'env JOB_WORKER_ENABLED=false PYTHONPATH=/opt/left4me/src '
'/opt/left4me/.venv/bin/flask --app l4d2web.app:create_app '
'seed-script-overlays /opt/left4me/src/examples/script-overlays'
'"'
),
'triggered': True,
'cascade_skip': False,
'needs': [
'action:left4me_alembic_upgrade',
],
}

View file

@ -1,275 +0,0 @@
assert node.has_bundle('nftables')
assert node.has_bundle('systemd')
defaults = {
'left4me': {
# Application-wide defaults; node only overrides if it really needs to.
'git_url': 'https://git.sublimity.de/cronekorkn/left4me.git',
'git_branch': 'master',
'secret_key': repo.vault.random_bytes_as_base64_for(f'{node.name} left4me secret_key', length=32).value,
'gunicorn_workers': 1,
'gunicorn_threads': 32,
'job_worker_threads': 4,
# Whole 27000-block: covers Steam's defaults (27015 game, 27005
# client/RCON) plus headroom for ad-hoc ports without further
# nftables changes. Mirrored into LEFT4ME_PORT_RANGE_{START,END}
# by web.env.mako and into the nftables input rule by the
# nftables_input reactor below.
'port_range_start': 27000,
'port_range_end': 27999,
},
'apt': {
'packages': {
'p7zip-full': {},
'nftables': {},
'iproute2': {},
'curl': {},
'ca-certificates': {},
'python3': {},
'python3-venv': {},
'python3-pip': {},
'python3-dev': {},
# steamcmd is a 32-bit ELF; needs i386 multiarch + these libs.
# `_` → `:` is bundlewrap's pkg_apt convention for multiarch
# names (see pkg_apt.py:48).
'libc6_i386': { # installs libc6:i386
'needs': ['action:left4me_dpkg_add_i386_arch'],
},
'lib32z1': {
'needs': ['action:left4me_dpkg_add_i386_arch'],
},
},
},
'nftables': {
# Match deploy/files/usr/local/lib/left4me/nft/left4me-mark.nft.
# Mark srcds UDP egress (uid left4me) with DSCP EF + skb priority 6
# so CAKE classifies it into the priority tin.
'output': {
'meta skuid "left4me" meta l4proto udp ip dscp set ef meta priority set 0006:0000',
'meta skuid "left4me" meta l4proto udp ip6 dscp set ef meta priority set 0006:0000',
},
},
'systemd': {
'services': {
'left4me-web.service': {
'enabled': True,
'running': True,
'needs': [
'action:left4me_alembic_upgrade',
'file:/etc/left4me/host.env',
'file:/etc/left4me/web.env',
],
},
# Note: left4me-server@.service is a TEMPLATE — instances are
# started on-demand by the web app via the left4me-systemctl
# helper. Don't enable/start it from here.
# The slices are installed (file present) but don't need
# enable/start — they're activated implicitly when a unit
# uses Slice=.
},
},
'backup': {
# Application-owned paths. Set-merged with backup group / node-level paths.
'paths': {
'/var/lib/left4me',
'/etc/left4me',
},
},
}
@metadata_reactor.provides(
'nginx/vhosts',
)
def nginx_vhosts(metadata):
# letsencrypt/domains and monitoring/services for the vhost are auto-
# populated by bundles/nginx/metadata.py. We just declare check_path:
# '/health' so the auto-check hits the Flask health endpoint, not '/'.
domain = metadata.get('left4me/domain')
return {
'nginx': {
'vhosts': {
domain: {
'content': 'nginx/proxy_pass.conf',
'context': {
'target': 'http://127.0.0.1:8000',
},
'check_path': '/health',
},
},
},
}
@metadata_reactor.provides(
'nftables/input',
)
def nftables_input(metadata):
port_start = metadata.get('left4me/port_range_start')
port_end = metadata.get('left4me/port_range_end')
return {
'nftables': {
'input': {
f'udp dport {port_start}-{port_end} accept',
f'tcp dport {port_start}-{port_end} accept',
},
},
}
@metadata_reactor.provides(
'systemd/units',
)
def systemd_units(metadata):
workers = metadata.get('left4me/gunicorn_workers')
threads = metadata.get('left4me/gunicorn_threads')
# cgroup-v2 cpuset. `system_cpus` (set of int CPU ids, declared per
# node) pins system/user/build; the complement pins l4d2-game. On HT
# hosts, list both siblings of a physical core so games don't share
# L1/L2 with system work — pairings via
# /sys/devices/system/cpu/cpu<n>/topology/thread_siblings_list.
vm_threads = metadata.get('vm/threads', metadata.get('vm/cores'))
all_cpus = set(range(vm_threads))
system_cpus = metadata.get('left4me/system_cpus')
if not system_cpus <= all_cpus:
raise Exception(
f'left4me/system_cpus={sorted(system_cpus)} on {vm_threads}-thread host '
f'includes CPUs outside [0, {vm_threads})'
)
game_cpus = all_cpus - system_cpus
if not game_cpus:
raise Exception(
f'left4me/system_cpus={sorted(system_cpus)} on {vm_threads}-thread host '
f'leaves no cores for games'
)
system_cpus_string = ','.join(str(t) for t in sorted(system_cpus))
game_cpus_string = ','.join(str(t) for t in sorted(game_cpus))
# Drop-in for upstream system.slice / user.slice (units we don't own).
# Same '<parent>.d/<basename>.conf' convention as nginx and autologin.
cpuset_dropin = {'Slice': {'AllowedCPUs': system_cpus_string}}
return {
'systemd': {
'units': {
'left4me-web.service': {
'Unit': {
'Description': 'left4me web application',
'After': 'network-online.target',
'Wants': 'network-online.target',
},
'Service': {
'Type': 'simple',
'User': 'left4me',
'Group': 'left4me',
'WorkingDirectory': '/opt/left4me/src',
'Environment': {
'HOME=/var/lib/left4me',
'PATH=/opt/left4me/.venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin',
},
'EnvironmentFile': (
'/etc/left4me/host.env',
'/etc/left4me/web.env',
),
'ExecStart': (
'/opt/left4me/.venv/bin/gunicorn '
f'--workers {workers} --threads {threads} '
"--bind 127.0.0.1:8000 'l4d2web.app:create_app()'"
),
'Restart': 'on-failure',
'RestartSec': '3',
# NoNewPrivileges intentionally NOT set: workers sudo to the helpers.
'ProtectSystem': 'full',
'ReadWritePaths': '/var/lib/left4me',
'PrivateTmp': 'true',
},
'Install': {
'WantedBy': {'multi-user.target'},
},
},
'left4me-server@.service': {
'Unit': {
'Description': 'left4me server instance %i',
'After': 'network-online.target',
'Wants': 'network-online.target',
'StartLimitBurst': '5',
'StartLimitIntervalSec': '60s',
},
'Service': {
'Type': 'simple',
'User': 'left4me',
'Group': 'left4me',
'EnvironmentFile': (
'/etc/left4me/host.env',
'/var/lib/left4me/instances/%i/instance.env',
),
'WorkingDirectory': '-/var/lib/left4me/runtime/%i/merged/left4dead2',
'ExecStartPre': (
'+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- '
'/usr/local/libexec/left4me/left4me-overlay mount %i'
),
'ExecStart': (
'/var/lib/left4me/runtime/%i/merged/srcds_run '
'-game left4dead2 +hostport ${L4D2_PORT} $L4D2_ARGS'
),
'ExecStopPost': (
'+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- '
'/usr/local/libexec/left4me/left4me-overlay umount %i'
),
'Restart': 'on-failure',
'RestartSec': '5',
'Slice': 'l4d2-game.slice',
'Nice': '-5',
'IOSchedulingClass': 'best-effort',
'IOSchedulingPriority': '4',
'OOMScoreAdjust': '-200',
'MemoryHigh': '1.5G',
'MemoryMax': '2G',
'TasksMax': '256',
'LimitNOFILE': '65536',
'KillSignal': 'SIGINT',
'TimeoutStopSec': '15s',
'LogRateLimitIntervalSec': '0',
'NoNewPrivileges': 'true',
'PrivateTmp': 'true',
'PrivateDevices': 'true',
'ProtectHome': 'true',
'ProtectSystem': 'strict',
'ReadOnlyPaths': '/var/lib/left4me/installation /var/lib/left4me/overlays',
'ReadWritePaths': '/var/lib/left4me/runtime/%i',
'RestrictSUIDSGID': 'true',
'LockPersonality': 'true',
},
'Install': {
'WantedBy': {'multi-user.target'},
},
},
'l4d2-game.slice': {
'Unit': {
'Description': 'left4me game-server slice',
'Before': 'slices.target',
},
'Slice': {
'CPUWeight': '1000',
'IOWeight': '1000',
'AllowedCPUs': game_cpus_string,
},
},
'l4d2-build.slice': {
'Unit': {
'Description': 'left4me script-sandbox build slice',
'Before': 'slices.target',
},
'Slice': {
'CPUWeight': '10',
'IOWeight': '10',
'AllowedCPUs': system_cpus_string,
},
},
'system.slice.d/99-left4me-cpuset.conf': cpuset_dropin,
'user.slice.d/99-left4me-cpuset.conf': cpuset_dropin,
},
},
}

View file

@ -1,60 +1,9 @@
# letsencrypt
Issues and renews Let's Encrypt certs via [dehydrated][upstream] with
DNS-01 against the in-house bind-acme server.
[upstream]: https://github.com/dehydrated-io/dehydrated/wiki/example-dns-01-nsupdate-script
## First-apply behaviour
Immediately after `bw apply <node>`, nginx serves a **self-signed
cert** for each declared domain — generated by
`/etc/dehydrated/letsencrypt-ensure-some-certificate` so nginx has
something to start with. The real Let's Encrypt cert arrives at most
24h later when the systemd timer fires
(`/usr/bin/dehydrated --cron --accept-terms --challenge dns-01`). To
shortcut the wait:
```sh
ssh <node> 'sudo /usr/bin/dehydrated --cron --accept-terms --challenge dns-01'
ssh <node> 'sudo systemctl reload nginx'
```
## DNS-01 prerequisites
`hook.sh` does `nsupdate` against the bind-acme server (referenced
by `letsencrypt/acme_node`). For the challenge to succeed:
1. The acme node must be in the same metadata graph (so
`bw metadata <node> -k letsencrypt/acme_node` resolves).
2. **All NS servers** for the validated domain must serve the
`_acme-challenge.<domain>` CNAME — Let's Encrypt validates from
primary AND secondary geographic regions; both authoritative
servers must agree. If a secondary NS is also a bw-managed node,
`bw apply` it after adding the domain (see e.g. `ovh.secondary`).
3. The bind-acme node's TSIG key must be reachable. `hook.sh` is
rendered with the bind-acme server's `network/internal/ipv4`
for clients outside that LAN, the route must exist (typically via
wireguard `s2s` peer membership).
## Negative-cache penalty
If the first DNS-01 attempt fails (e.g. zone not yet applied to the
secondary NS), Let's Encrypt's resolvers cache NXDOMAIN for the SOA's
negative TTL (often 900s = 15 min). Subsequent attempts during that
window also fail and refresh the cache. Combined with LE's rate limit
of **5 failed authorisations per domain per hour**, recovery requires
you to **stop retrying** for ~15 minutes after fixing the DNS, then
make at most one attempt.
## nsupdate sample
For interactive testing of the bind-acme TSIG path:
https://github.com/dehydrated-io/dehydrated/wiki/example-dns-01-nsupdate-script
```sh
printf "server 127.0.0.1
zone acme.resolver.name.
update add _acme-challenge.ckn.li.acme.resolver.name. 600 IN TXT \"hello\"
update add _acme-challenge.ckn.li.acme.resolver.name. 600 IN TXT "hello"
send
" | nsupdate -y hmac-sha512:acme:XXXXXX
```

View file

@ -2,7 +2,7 @@ defaults = {
'apt': {
'packages': {
'dehydrated': {},
'bind9-dnsutils': {},
'dnsutils': {},
},
},
'letsencrypt': {

View file

@ -1,36 +0,0 @@
# nginx
Webserver. Per-node vhosts in `nginx/vhosts`; per-vhost templates in
`data/nginx/*.conf`.
## How port 80 is served
The bundle ships a fixed `80.conf` to
`/etc/nginx/sites-available/80.conf` (picked up by the
`sites-enabled/` symlink) that handles **all** port-80 traffic
across vhosts:
1. ACME HTTP-01 challenges (`/.well-known/acme-challenge/`) are
served from `/var/lib/dehydrated/acme-challenges/`.
2. All other port-80 requests are 301-redirected to
`https://$host$request_uri`.
Per-vhost templates only declare `listen 443 ssl http2;`, so they
don't need their own port-80 server blocks. If you need vhost-
specific port-80 behaviour (e.g. plain-HTTP without redirect),
override 80.conf or add a per-vhost block.
## Required metadata
- `vm/cores` — read directly by `items.py` for `worker_processes`.
No default; `bw items <node>` raises at item-build time if missing.
Typically supplied by the `vm` bundle / hetzner-vm group; double-
check on bare-metal hosts.
- `nginx/vhosts` — dict of vhost-name → vhost-config.
- `nginx/modules` — list of dynamic modules to load.
## Cross-namespace
`items.py` reads `letsencrypt/domains` to skip emitting a per-vhost
HTTPS block when LE hasn't declared the domain yet — keeps the
bundle loadable on a node where letsencrypt isn't fully wired up.

View file

@ -32,13 +32,12 @@ http {
% endif
# Always defined: serves both WS-enabled vhosts (Connection: upgrade for
# ws clients) and SSE/keep-alive vhosts (Connection: "" lets nginx manage
# the upstream connection for keep-alive, instead of forcing "close").
% if has_websockets:
map $http_upgrade $connection_upgrade {
default upgrade;
'' '';
'' close;
}
% endif
include /etc/nginx/sites-enabled/*;
}

View file

@ -64,7 +64,7 @@ files = {
'svc_systemd:nginx:restart',
},
},
'/etc/nginx/sites-available/80.conf': {
'/etc/nginx/sites/80.conf': {
'triggers': {
'svc_systemd:nginx:restart',
},

View file

@ -33,7 +33,7 @@ for name, unit in node.metadata.get('systemd/units').items():
'svc_systemd:systemd-networkd.service:restart',
],
}
elif extension in ['timer', 'service', 'mount', 'swap', 'target', 'slice']:
elif extension in ['timer', 'service', 'mount', 'swap', 'target']:
path = f'/usr/local/lib/systemd/system/{name}'
dependencies = {
'triggers': [

View file

@ -8,16 +8,10 @@ server {
location / {
proxy_set_header X-Real-IP $remote_addr;
# Always set Upgrade + Connection via the $connection_upgrade map:
# WS client (Upgrade header sent) -> Connection: upgrade
# non-WS client (no Upgrade) -> Connection: "" (keep-alive)
# Lets every vhost serve both WS and SSE without per-vhost flags.
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
# SSE-safe pass-through (also fine for non-SSE traffic):
proxy_buffering off;
proxy_read_timeout 1h;
% if websockets:
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
% endif
proxy_pass ${target};
}
}

View file

@ -48,51 +48,3 @@ instead.
See [`conventions.md#secrets`](conventions.md#secrets) for the
demagify magic-string list and the rule's full rationale.
## Read-only commands — useful flag combinations
The fork's [`AGENTS.md`][fork] documents the canonical safety envelope.
These are the flag combinations agents reach for most often in this repo:
| Want to … | Run |
|---|---|
| Sanity-check the whole repo (parse + cross-cutting hooks) | `bw test` (defaults to `-HIJKMSp`) |
| Exercise reactors and item-graph for one node | `bw test <node>` (defaults to `-IJKMp`) |
| Same, but every node that has a given bundle | `bw test bundle:<name>` |
| Print one metadata key for one node | `bw metadata <node> -k <a/b>` (repeat `-k` for more) |
| Show where each metadata value comes from | `bw metadata <node> -b` |
| Resolve Faults (vault values) into the dump | `bw metadata <node> -f`**may print secrets, avoid** |
| List a node's items, with the bundle that defines each | `bw items <node> --blame` |
| Preview a rendered file's content | `bw items <node> file:<path> -f` |
| Verify against the live host, scoped to one bundle | `bw verify <node> -o bundle:<name>` |
| Hash metadata only (faster than full config hash) | `bw hash <node> -m` |
| Inspect the data backing a hash | `bw hash <node> -d` |
`bw test`, `bw verify`, `bw nodes`, `bw metadata` all share a target-
selector grammar: bare node name, group name, `bundle:<name>`,
`!bundle:<name>`, or `"lambda:node.metadata_get('foo/bar', 0) < 3"`.
[fork]: https://github.com/CroneKorkN/bundlewrap/blob/main/AGENTS.md
## Bundle-validation workflow
`bw test` (no args) is a *parsing* gate, not a *behaviour* gate. It
loads every bundle, but a bundle's reactors only resolve when a node's
metadata is actually built — and that happens only for nodes that
opt in. Until then, reactor bugs stay dormant. bw rejects reactors
that don't read any metadata, but the rejection only fires once *some*
node consumes the bundle.
When developing a new bundle:
1. Scaffold + `bw test` — confirms parsing.
2. **Attach the bundle to one node** (or a stub node) by adding it to
`nodes/<n>.py`'s `bundles` list, or to a group the node is in.
3. `bw test <node>` — now reactors fire. This is where bundle bugs
surface.
4. `bw items <node> --blame` and `bw metadata <node> -k <key>`
confirm items materialise and derived metadata looks right.
5. `bw hash <node>` — preview against the live host.
Step 2 is non-optional. A bundle that "passes `bw test`" with no
consumer is proven only to parse.

View file

@ -127,12 +127,6 @@ bundle.
## 3. Per-bundle `AGENTS.md` template
> **Status: replaced — pre-pivot intent only.** Per-bundle docs are plain
> `README.md` with no fixed structure. See §0 Revisions and the
> "Per-bundle README" section in [`bundles/AGENTS.md`](../../../bundles/AGENTS.md)
> for the current convention. The template below is kept as a record of
> the original design.
One balanced doc serving both audiences. Prose where prose helps, structure
where structure helps. Sections in order:
@ -345,12 +339,6 @@ in 30120 lines each; root `AGENTS.md` is ~150 lines.
### Phase 2 — seed bundles (10)
> **Status: dropped — pre-pivot intent only.** Phase 2 didn't ship. After
> Phase 1 landed, the maintainer pulled the per-bundle `AGENTS.md`
> migration: the rigid template proved a poor fit for the heterogeneous
> existing READMEs. See §0 Revisions. The seed list and migration plan
> below are kept as a record of how the work was scoped.
Bundles selected empirically (node+group references and recent commit
activity, validated 2026-05-10):

View file

@ -1,253 +0,0 @@
# Round 1 — agent-doc refactor (gaps 16 + cmd cheat sheet)
## Why
A previous session integrated `bundles/left4me/` and brought
`ovh.left4me` live. The integration produced a handoff (at
`~/.claude/plans/2026-05-10-ckn-bw-docs-improvements-handoff.md`)
listing 12 documentation gaps surfaced by the work. This spec covers
the first six (the cross-cutting ones) plus a useful side-quest:
adding a read-only command cheat sheet to `docs/agents/commands.md`.
Gaps 712 (item-specific, bundle READMEs) are deferred to a follow-up
round.
## Scope
In:
- Gap 1 — drop `bw bundles` (doesn't exist), add `bw verify` to the
read-only allowlist.
- Gap 2 — bundle-validation workflow needs a node attached.
- Gap 3 — nodes carry only node-specific metadata (split across
`bundles/AGENTS.md` and `nodes/AGENTS.md`).
- Gap 4 — reactors must read metadata or be defaults.
- Gap 5 — `triggers``triggered: True` invariant + self-healing
pattern.
- Gap 6 — `unless` semantics (folded into Gap 5's second bullet).
- Side-quest: read-only command cheat sheet in `commands.md` (`bw
test` flag matrix + selectors, `bw metadata -k/-b/-f`, `bw items
--blame/-f`, `bw verify -o bundle:`, `bw hash -m/-d`).
Out:
- Gaps 712 (`source` implicit, `git_deploy` chown, `git_deploy` URL
form, letsencrypt/bind/nginx READMEs).
- Any change to bundle behaviour. This is pure docs; if a doc claim
feels wrong, push back to the maintainer rather than editing
`.py`.
## Verification approach
For each gap, find current line numbers in the target doc (handoff
line numbers are May 2026; some have drifted). Verify code-level
claims against the fork source under `.venv/src/bundlewrap/` before
quoting them.
Already verified during brainstorm:
- Gap 1: `bw bundles` is not a subcommand of the installed fork
(`.venv/bin/bw --help` lists only
`apply, debug, diff, groups, hash, ipmi, items, lock, metadata,
nodes, plot, pw, repo, run, stats, test, verify, zen`). `bw verify`
is read-only.
- Gap 2: `bw test` default flag set differs by mode. Whole-repo:
`-HIJKMSp`. Node-targeted: `-IJKMp`. The repo-mode adds `-H`
(repo hooks) and `-S` (subgroup-loops); the node-mode adds `-J`
(node hooks). Reactors only resolve when a node's metadata is
built, which only happens when a node opts into the bundle.
- Gap 4: exact wording at `metagen.py:428`:
`"{reactor_name} on {node_name} did not request any metadata, you
might want to use defaults instead"`.
- Gap 5: exact wording at `deps.py:340`:
`"'{item1}' in bundle '{bundle1}' triggered by '{item2}' in bundle
'{bundle2}', but missing 'triggered' attribute"`.
- Gap 3 precedent: `bundles/left4me/metadata.py:10` is the canonical
random-bytes-in-defaults example. `bundles/postgresql/metadata.py:4`
is the password_for-at-module-scope example. (The handoff cites
postgresql for the random-bytes pattern; that's a misattribution —
postgresql uses `password_for`.)
After every commit: `.venv/bin/bw test` must pass with the same
output as before. Pure-docs edits cannot break it unless a `.py` is
touched accidentally.
## Commits
Six iterative commits, matching repo style.
### Commit 1 — drop `bw bundles`, add `bw verify` (Gap 1)
`AGENTS.md` rule 1 only. The handoff also flagged
`bundles/AGENTS.md:60-64`, but that list no longer references
`bw bundles` (it currently reads `bw test` / `bw items` / `bw hash`).
That section gets rewritten in commit 3, not here.
```diff
- to `bw test`, `bw nodes`, `bw groups`, `bw bundles`,
- `bw items`, `bw metadata`, `bw hash`, `bw debug`. See
+ to `bw test`, `bw nodes`, `bw groups`, `bw items`,
+ `bw metadata`, `bw hash`, `bw verify`, `bw debug`. See
```
### Commit 2 — read-only command cheat sheet
Append to `docs/agents/commands.md`. New H2 section, table format
to match the existing voice.
```markdown
## Read-only commands — useful flag combinations
The fork's [`AGENTS.md`][fork] documents the canonical safety envelope.
These are the flag combinations agents reach for most often in this repo:
| Want to … | Run |
|---|---|
| Sanity-check the whole repo (parse + cross-cutting hooks) | `bw test` (defaults to `-HIJKMSp`) |
| Exercise reactors and item-graph for one node | `bw test <node>` (defaults to `-IJKMp`) |
| Same, but every node that has a given bundle | `bw test bundle:<name>` |
| Print one metadata key for one node | `bw metadata <node> -k <a/b>` (repeat `-k` for more) |
| Show where each metadata value comes from | `bw metadata <node> -b` |
| Resolve Faults (vault values) into the dump | `bw metadata <node> -f`**may print secrets, avoid** |
| List a node's items, with the bundle that defines each | `bw items <node> --blame` |
| Preview a rendered file's content | `bw items <node> file:<path> -f` |
| Verify against the live host, scoped to one bundle | `bw verify <node> -o bundle:<name>` |
| Hash metadata only (faster than full config hash) | `bw hash <node> -m` |
| Inspect the data backing a hash | `bw hash <node> -d` |
`bw test`, `bw verify`, `bw nodes`, `bw metadata` all share a target-
selector grammar: bare node name, group name, `bundle:<name>`,
`!bundle:<name>`, or `"lambda:node.metadata_get('foo/bar', 0) < 3"`.
[fork]: https://github.com/CroneKorkN/bundlewrap/blob/main/AGENTS.md
```
### Commit 3 — bundle validation needs a node attached (Gap 2)
Two file changes.
**`bundles/AGENTS.md` lines 59-64** — replace the Verify list:
```markdown
5. **Verify, in this order:**
- `bw test` — repo-wide parse + cross-cutting hooks. Loads every
bundle, but reactors don't fire for nodes that haven't opted into
the bundle yet — bugs in new reactors stay hidden here.
- **Attach the bundle to a node** (via the node's `bundles` list, or
a group it belongs to). Until you do, the next steps don't actually
exercise the bundle.
- `bw test <node>` — exercises every reactor and item-graph edge for
that node. This is where most new-bundle bugs surface.
- `bw items <node> --blame` — confirm items materialise with the right
paths, authored by the expected bundle.
- `bw metadata <node> -k <a/b>` — spot-check derived metadata.
- `bw hash <node>` — preview vs current host state.
See [`docs/agents/commands.md#bundle-validation-workflow`](../docs/agents/commands.md#bundle-validation-workflow)
for the rationale.
```
**`docs/agents/commands.md`** — new section after the cheat sheet:
```markdown
## Bundle-validation workflow
`bw test` (no args) is a *parsing* gate, not a *behaviour* gate. It
loads every bundle, but a bundle's reactors only resolve when a node's
metadata is actually built — and that happens only for nodes that
opt in. Until then, reactor bugs stay dormant. bw rejects reactors that
don't read any metadata, but the rejection only fires once *some* node
consumes the bundle.
When developing a new bundle:
1. Scaffold + `bw test` — confirms parsing.
2. **Attach the bundle to one node** (or a stub node) by adding it to
`nodes/<n>.py`'s `bundles` list, or to a group the node is in.
3. `bw test <node>` — now reactors fire. This is where bundle bugs
surface.
4. `bw items <node> --blame` and `bw metadata <node> -k <key>` — confirm
items materialise and derived metadata looks right.
5. `bw hash <node>` — preview against the live host.
Step 2 is non-optional. A bundle that "passes `bw test`" with no consumer
is proven only to parse.
```
### Commit 4 — nodes carry only node-specific metadata (Gap 3)
**`bundles/AGENTS.md` Conventions** — new bullet:
```markdown
- **Bundles own application-wide knowledge; nodes carry only the few
per-host knobs the bundle actually needs.** When designing a bundle,
identify the per-node knobs (e.g. domain, uplink interface, a
vault-id suffix) and put everything else in `defaults`, or in a
reactor that derives from those knobs. Per-node random secrets
belong in `defaults` via `repo.vault.random_bytes_as_base64_for(...)`
keyed on the node — not in the node file. See
`bundles/left4me/metadata.py:10` (`secret_key` derived in defaults)
and `bundles/postgresql/metadata.py:4` (vault-derived `password_for`
at module scope).
```
**`nodes/AGENTS.md` Pitfalls** — new bullet:
```markdown
- **Bloated per-node metadata is usually a bundle smell.** If a
bundle's metadata block in the node file has more than 3-5 keys,
the bundle is probably under-using `defaults` / reactors. Push the
contribution into the bundle (see
[`bundles/AGENTS.md`](../bundles/AGENTS.md#conventions)) rather than
growing the node file.
```
### Commit 5 — reactors must read metadata or be defaults (Gap 4)
**`bundles/AGENTS.md` Pitfalls** — new bullet:
```markdown
- **Reactors must read metadata.** If a reactor body returns a static
dict without calling `metadata.get(...)`, bw raises
`ValueError: <reactor> on <node> did not request any metadata, you
might want to use defaults instead` once a node consumes the bundle.
Fix: fold the contribution into `defaults`. The rule applies even
when the reactor writes into another bundle's namespace — a static
contribution to e.g. `nftables/output` belongs in `defaults`, where
bw merges it with other bundles' contributions.
```
### Commit 6 — `triggers` invariant + self-healing + `unless` (Gaps 5+6)
**`bundles/AGENTS.md` Pitfalls** — two new bullets (Gap 6's `unless`
semantics fold into the second; cleaner than three bullets):
```markdown
- **`triggers``triggered: True` invariant.** Any item listed in
another's `triggers` list must declare `triggered: True`. bw
enforces this at `bw test` time: *"…triggered by …, but missing
'triggered' attribute"*. Corollary: an action can't be both in an
upstream `triggers` list AND self-healing every apply — pick one.
- **Triggered actions don't recover from partial failure.** When an
upstream item's apply succeeds but its triggered downstream action
fails, subsequent applies can't recover via the trigger chain —
upstream is "already in desired state" and never re-triggers. For
actions that must self-heal (pip installs, chowns, migrations),
drop `triggered: True` and gate the command with `unless:
<fast-check>`. `unless` is a shell command on the target host whose
exit status decides whether the main command runs (exit 0 = skip);
it's checked at fire time, after `triggered:` filtering.
```
## Out of scope
- Gaps 712 — deferred. The maintainer re-engages after this round.
- Bundle behaviour changes. Pure docs.
- `bw apply` / `bw run` — not authorised this session.
## Constraints
- Don't echo decrypted secrets in commit messages or new doc text.
- Don't restore `*.py_` parked nodes.
- After each commit, `.venv/bin/bw test` must pass.
- No push.

View file

@ -1,286 +0,0 @@
# Round 2 — agent-doc refactor (gaps 712)
## Why
Continuation of round 1 (spec at
`2026-05-10-ckn-bw-agents-md-refactor-round-1-design.md`). Round 1
landed the cross-cutting lessons (read-only allowlist, bundle
validation needs a node, nodes-carry-only-node-specific-metadata,
reactors-must-read-metadata, triggers/triggered:True invariant,
self-healing pattern). Round 2 covers the remaining six gaps: built-in
item-type gotchas and three bundle READMEs.
## Scope
In:
- Gap 7 — `file:`'s `source` defaults to the basename of the destination.
- Gap 8 — `git_deploy` extracts as the connecting user (root after
sudo); chown action needed for non-root downstream consumers.
- Gap 9 — `git_deploy` URL form: `://` triggers per-apply clone, no `://`
requires a `git_deploy_repos` map at the repo root.
- Gap 10 — `bundles/letsencrypt`: first-apply behaviour, DNS-01
prerequisites, negative-cache penalty.
- Gap 11 — `bundles/bind`: applying changes to a `master_node`-linked
pair needs `bw apply` on both ends.
- Gap 12 — `bundles/nginx`: how port 80 is served, `vm/cores`
requirement.
Out:
- Bundle behaviour changes. Pure docs.
- `bw apply` / `bw run` — not authorised this session.
## Placement decision (diverges from the handoff)
The handoff suggests `items/AGENTS.md` for gaps 7, 8, 9. But
`items/AGENTS.md` is scoped to **custom** item types in the `items/`
directory — its first sentence: *"Custom item types — each `*.py` is
a `bundlewrap.items.Item` subclass…"*. Built-in gotchas (`file:`,
`git_deploy:`) don't fit there.
Round-1 lessons about built-in mechanics (reactors must read metadata,
`triggers` invariant, self-healing pattern) all landed in
`bundles/AGENTS.md` Pitfalls. Gaps 7, 8, 9 are the same shape, so
they go in the same place.
## Validation findings
- Gap 7: well-known bw built-in semantics. Trusting the handoff.
- Gap 8: confirmed at `.venv/src/bundlewrap/bundlewrap/items/git_deploy.py`'s
`fix()` method — uses `self.node.upload(...)` which writes as the sudo
user (root). Files end up root-owned.
- Gap 9: confirmed in round 1 (`git_deploy.py:103` —
`if "://" in self.attributes['repo']:`).
- Gap 10: confirmed `/etc/dehydrated/letsencrypt-ensure-some-certificate`
exists in the bundle; runs on every domain with idempotent `unless`.
Daily timer at `/usr/bin/dehydrated --cron --accept-terms --challenge dns-01`.
- Gap 11: nuanced. The bundle DOES set `bind/type = 'slave'` and renders
different named.conf.local for slaves, so bind itself may AXFR at
runtime. But the slave's *bw-managed* zone files are statically
rendered from the master's metadata at slave-apply time
(`bundles/bind/items.py:100`). The practical workflow rule — "apply
both" — is correct regardless. I'll frame the README as the workflow
rule, not the absolute "not AXFR slaving" claim from the handoff.
- Gap 12: confirmed `nginx.conf:42` includes `/etc/nginx/sites-enabled/*`;
`nginx/items.py:35` reads `node.metadata.get('vm/cores')` with no
default. README does not exist.
## Existing README states
- `bundles/letsencrypt/README.md` — 9 lines: upstream link + nsupdate
snippet. Reshape into an operational README; keep the nsupdate snippet.
- `bundles/bind/README.md` — does not exist. Create.
- `bundles/nginx/README.md` — does not exist. Create.
## Commits
### Commit 7 — `file:` source defaults to destination basename (Gap 7)
`bundles/AGENTS.md` Pitfalls — new bullet:
```markdown
- **`file:` `source` defaults to the destination basename.** For a
destination of `/etc/foo/bar.conf` with no `source` key, bw looks for
`bundles/<bundle>/files/bar.conf`. Only declare `source` explicitly
when the basename you want differs (e.g. shipping a Mako template
named `bar.conf.mako` to a destination of `/etc/foo/bar.conf`).
```
### Commit 8 — `git_deploy` gotchas (Gaps 8 + 9)
`bundles/AGENTS.md` Pitfalls — two new bullets.
```markdown
- **`git_deploy` extracts as the connecting (sudo) user — files end up
root-owned.** A downstream action that runs as a non-root app user
(typical: editable pip install, Rails bundle install) will hit
`Permission denied` on `.egg-info` or similar. The fix is a
self-healing chown action between `git_deploy` and the downstream
action:
```python
actions['<bundle>_chown_src'] = {
'command': 'chown -R <user>:<group> <path>',
'unless': 'test -z "$(find <path> ! -user <user> -print -quit)"',
'cascade_skip': False,
'needs': ['git_deploy:<path>', 'user:<user>', 'group:<group>'],
}
```
See `bundles/left4me/items.py` for an in-tree example.
- **`git_deploy` URL form matters.** A URL containing `://` (HTTP/HTTPS,
`ssh://`) makes bw clone to a temp dir per-apply — no operator-side
state needed. Without `://` (SCP-style `git@host:path`), bw expects a
`git_deploy_repos` map file at the repo root pointing at a long-lived
local clone, and raises `RepositoryError('missing repo map for
git_deploy')` if it isn't there. For HTTPS-reachable repos use the
HTTPS form; for SSH-only, prefer the explicit `ssh://user@host/path`
form so the map isn't needed.
```
### Commit 9 — letsencrypt README (Gap 10)
Reshape `bundles/letsencrypt/README.md`. Keep the upstream link and
nsupdate snippet at the top; add three structured sections.
```markdown
# letsencrypt
Issues and renews Let's Encrypt certs via [dehydrated][upstream] with
DNS-01 against the in-house bind-acme server.
[upstream]: https://github.com/dehydrated-io/dehydrated/wiki/example-dns-01-nsupdate-script
## First-apply behaviour
Immediately after `bw apply <node>`, nginx serves a **self-signed
cert** for each declared domain — generated by
`/etc/dehydrated/letsencrypt-ensure-some-certificate` so nginx has
something to start with. The real Let's Encrypt cert arrives at most
24h later when the systemd timer fires
(`/usr/bin/dehydrated --cron --accept-terms --challenge dns-01`). To
shortcut the wait:
```sh
ssh <node> 'sudo /usr/bin/dehydrated --cron --accept-terms --challenge dns-01'
ssh <node> 'sudo systemctl reload nginx'
```
## DNS-01 prerequisites
`hook.sh` does `nsupdate` against the bind-acme server (referenced
by `letsencrypt/acme_node`). For the challenge to succeed:
1. The acme node must be in the same metadata graph (so
`bw metadata <node> -k letsencrypt/acme_node` resolves).
2. **All NS servers** for the validated domain must serve the
`_acme-challenge.<domain>` CNAME — Let's Encrypt validates from
primary AND secondary geographic regions; both authoritative
servers must agree. If a secondary NS is also a bw-managed node,
`bw apply` it after adding the domain (see e.g. `ovh.secondary`).
3. The bind-acme node's TSIG key must be reachable. `hook.sh` is
rendered with the bind-acme server's `network/internal/ipv4`
for clients outside that LAN, the route must exist (typically via
wireguard `s2s` peer membership).
## Negative-cache penalty
If the first DNS-01 attempt fails (e.g. zone not yet applied to the
secondary NS), Let's Encrypt's resolvers cache NXDOMAIN for the SOA's
negative TTL (often 900s = 15 min). Subsequent attempts during that
window also fail and refresh the cache. Combined with LE's rate limit
of **5 failed authorisations per domain per hour**, recovery requires
you to **stop retrying** for ~15 minutes after fixing the DNS, then
make at most one attempt.
## nsupdate sample
For interactive testing of the bind-acme TSIG path:
```sh
printf "server 127.0.0.1
zone acme.resolver.name.
update add _acme-challenge.ckn.li.acme.resolver.name. 600 IN TXT \"hello\"
send
" | nsupdate -y hmac-sha512:acme:<TSIG_KEY_REDACTED>
```
```
### Commit 10 — bind README (Gap 11, reframed)
Create `bundles/bind/README.md`. Frame as the workflow rule, not the
absolute "not AXFR" claim.
```markdown
# bind
Authoritative DNS — primary plus optional `bind/master_node` slaves.
## Applying changes needs both nodes
The slave's bw-managed zone files are rendered from the master's
metadata at slave-apply time (see `bundles/bind/items.py:100`). When
you change a record on the master (adding a `letsencrypt/domains`
entry, a new vhost, etc.), the change is only published once you
apply BOTH:
```sh
bw apply htz.mails # primary (where the source records live)
bw apply ovh.secondary # secondary (renders its own zone files)
```
Until both have been applied, `bw verify ovh.secondary` will show
stale zones and consumers that hit the secondary (Let's Encrypt's
secondary-region validators in particular) will see NXDOMAIN. Even
though the slave's named.conf.local declares `type slave;`, don't
rely on bind's own AXFR catching up — the bw-rendered file on disk
is what `bw verify` measures.
## See also
- `bundles/bind-acme/` — the in-house ACME-update receiver.
- `bundles/letsencrypt/README.md` — DNS-01 prerequisites and the
negative-cache penalty (the most common operational consequence of
forgetting to apply the secondary).
```
### Commit 11 — nginx README (Gap 12)
Create `bundles/nginx/README.md`.
```markdown
# nginx
Webserver. Per-node vhosts in `nginx/vhosts`; per-vhost templates in
`data/nginx/*.conf`.
## How port 80 is served
The bundle ships a fixed `80.conf` to
`/etc/nginx/sites-available/80.conf` (picked up by the
`sites-enabled/` symlink) that handles **all** port-80 traffic
across vhosts:
1. ACME HTTP-01 challenges (`/.well-known/acme-challenge/`) are
served from `/var/lib/dehydrated/acme-challenges/`.
2. All other port-80 requests are 301-redirected to
`https://$host$request_uri`.
Per-vhost templates only declare `listen 443 ssl http2;`, so they
don't need their own port-80 server blocks. If you need vhost-
specific port-80 behaviour (e.g. plain-HTTP without redirect), you'll
need to override 80.conf or add a per-vhost block.
## Required metadata
- `vm/cores` — read directly by `items.py` for `worker_processes`.
No default; `bw items <node>` raises at item-build time if missing.
Typically supplied by the `vm` bundle / hetzner-vm group; double-
check on bare-metal hosts.
- `nginx/vhosts` — dict of vhost-name → vhost-config.
- `nginx/modules` — list of dynamic modules to load.
## Cross-namespace
`items.py` reads `letsencrypt/domains` to skip emitting a per-vhost
HTTPS block when LE hasn't declared the domain yet — keeps the bundle
loadable on a node where letsencrypt isn't fully wired up.
```
## Out of scope
- Bundle behaviour changes. Pure docs.
- `bw apply` / `bw run`.
- Reformatting the existing two-line bundle READMEs into the new
shape — bundles/AGENTS.md explicitly says don't do that
("uneven quality is part of what we accept in exchange for not
blocking other work").
## Constraints
- Don't echo decrypted secrets. The TSIG-key example in the
letsencrypt nsupdate snippet uses `<TSIG_KEY_REDACTED>`.
- After each commit, `.venv/bin/bw test` must pass.
- No push.

View file

@ -1,5 +0,0 @@
{
'bundles': {
'left4me',
},
}

View file

@ -81,12 +81,6 @@ This loader shape has consequences:
These are intentional parks/buffers, not bugs.
- **`id` must be unique.** A pre-apply hook (`hooks/unique_node_ids.py`)
enforces this; duplicate IDs fail `bw test` and `bw apply`.
- **Bloated per-node metadata is usually a bundle smell.** If a
bundle's metadata block in the node file has more than 3-5 keys,
the bundle is probably under-using `defaults` / reactors. Push the
contribution into the bundle (see
[`bundles/AGENTS.md`](../bundles/AGENTS.md#conventions)) rather than
growing the node file.
## See also

View file

@ -233,7 +233,6 @@
'10.0.229.0/24',
],
},
'ovh.left4me': {},
},
'clients': {
'macbook': {

View file

@ -1,21 +1,15 @@
{
'hostname': '141.95.32.8',
'username': 'debian',
'groups': [
'backup',
'debian-13',
'left4me',
'monitored',
'webserver',
],
'bundles': [
'wireguard',
#'wireguard',
],
'metadata': {
'id': '14d2abc-3855-4bb7-99e2-d4e3eb0344dd',
'vm': {
'cores': 4, # 4 physical, 8 with HT
'threads': 8,
},
'network': {
'external': {
'interface': 'enp3s0f0',
@ -41,12 +35,5 @@
},
},
},
'left4me': {
'domain': 'left4.me',
# Both HT siblings of physical core 0 (cpu0+cpu4 per
# /sys/devices/system/cpu/cpu0/topology/thread_siblings_list).
# Keeps system work off the physical cores running game ticks.
'system_cpus': {0, 4},
},
},
}