Compare commits

No commits in common. "c6caf2a1cf532cc675778840689daaba28891621" and "d4dedde0ad8af5d384df8653fa0e26d25a89bdf8" have entirely different histories.

c6caf2a1cf...d4dedde0ad

33 changed files with 23 additions and 2011 deletions

.gitignore (2 changed lines)

@@ -5,5 +5,3 @@
.bw_debug_history
# CocoIndex Code (ccc)
/.cocoindex_code/
# bundlewrap git_deploy local-mirror map (operator-specific paths)
git_deploy_repos
AGENTS.md (15 changed lines)

@@ -12,12 +12,12 @@ not project documentation. Onboarding lives **here**, in `AGENTS.md`.

## Quickstart for agents

Six rules; follow these and you won't break things:
Five rules; follow these and you won't break things:

1. **Read-only by default.** Never run `bw apply`, `bw run`, or
   `bw lock` without explicit user request — even with `-i`. Stick
   to `bw test`, `bw nodes`, `bw groups`, `bw items`,
   `bw metadata`, `bw hash`, `bw verify`, `bw debug`. See
   to `bw test`, `bw nodes`, `bw groups`, `bw bundles`,
   `bw items`, `bw metadata`, `bw hash`, `bw debug`. See
   [`docs/agents/commands.md`](docs/agents/commands.md) and the
   fork's [safety envelope](https://github.com/CroneKorkN/bundlewrap/blob/main/AGENTS.md).
2. **Never echo decrypted secrets.** Don't print, paste, or log the
@@ -38,15 +38,6 @@ Six rules; follow these and you won't break things:
5. **Prefer adding helpers to `libs/`** over duplicating logic across
   bundles. Repo-wide helpers go in
   [`libs/`](libs/AGENTS.md), reachable as `repo.libs.<x>`.
6. **`ccc` is available for semantic search.** This repo is indexed
   with [`ccc`](https://github.com/cocoindex-io/cocoindex-code).
   Reach for it on conceptual questions ("where is X used / which
   bundles do Y / what are the contexts of Z"), where a keyword
   grep would miss indirect usage:
   `ccc search '<concept>' --path '**'`. Pass `--path '**'` —
   without it, results are filtered to the current working
   directory's subtree. `grep`/`rg`/`find` remain fine for
   exact-string lookups; pick whichever fits the question.

## Layout

@@ -41,16 +41,6 @@ bundles/<name>/
  more than one bundle. Don't duplicate logic across bundles.
- **Custom item types** (e.g. `download:`) live in
  [`items/`](../items/AGENTS.md), not per-bundle.
- **Bundles own application-wide knowledge; nodes carry only the few
  per-host knobs the bundle actually needs.** When designing a bundle,
  identify the per-node knobs (e.g. domain, uplink interface, a
  vault-id suffix) and put everything else in `defaults`, or in a
  reactor that derives from those knobs. Per-node random secrets
  belong in `defaults` via `repo.vault.random_bytes_as_base64_for(...)`
  keyed on the node — not in the node file. See
  `bundles/left4me/metadata.py:10` (`secret_key` derived in defaults)
  and `bundles/postgresql/metadata.py:4` (vault-derived `password_for`
  at module scope).

## How to add a new bundle

@@ -66,22 +56,12 @@ bundles/<name>/
   [`groups/<axis>/<x>.py`](../groups/AGENTS.md) (preferred for shared
   bundles) or to the node's `bundles` list directly
   ([`nodes/AGENTS.md`](../nodes/AGENTS.md)).
5. **Verify, in this order:**
   - `bw test` — repo-wide parse + cross-cutting hooks. Loads every
     bundle, but reactors don't fire for nodes that haven't opted into
     the bundle yet — bugs in new reactors stay hidden here.
   - **Attach the bundle to a node** (via the node's `bundles` list, or
     a group it belongs to). Until you do, the next steps don't actually
     exercise the bundle.
   - `bw test <node>` — exercises every reactor and item-graph edge for
     that node. This is where most new-bundle bugs surface.
   - `bw items <node> --blame` — confirm items materialise with the
     right paths, authored by the expected bundle.
   - `bw metadata <node> -k <a/b>` — spot-check derived metadata.
   - `bw hash <node>` — preview vs current host state.

   See [`docs/agents/commands.md#bundle-validation-workflow`](../docs/agents/commands.md#bundle-validation-workflow)
   for the rationale.
5. Verify, in this order:
   - `bw test` — sanity (loaders + reactors).
   - `bw items <node>` — confirm new items appear on a node that opts in.
   - `bw hash <node>` — confirm the change is what you expected. See
     [`docs/agents/commands.md`](../docs/agents/commands.md) and the
     fork's hash-diff workflow.
6. Add a `bundles/<name>/README.md`. See "Per-bundle README" below
   for what to cover.

@@ -102,12 +82,6 @@ bundles/<name>/
  unless the matching `file:` item declares `content_type='mako'`
  (or a templating extension triggers it). To check, read the matching
  `file:` entry in `items.py`.
- **`file:` `source` defaults to the destination basename.** For a
  destination of `/etc/foo/bar.conf` with no `source` key, bw looks
  for `bundles/<bundle>/files/bar.conf`. Only declare `source`
  explicitly when the basename you want differs (e.g. shipping a Mako
  template named `bar.conf.mako` to a destination of
  `/etc/foo/bar.conf`).
- **Reactors writing across namespaces.** Some bundles' reactors write
  into other bundles' metadata namespaces (e.g. `nextcloud` writes
  into `apt.packages`, `archive.paths`). When you change such a bundle,

@@ -116,28 +90,6 @@ bundles/<name>/
  itself; grep `'<other-bundle>':` in the reactors when in doubt.
- **`bw hash` doesn't accept selectors.** Use `bw hash <node>` per
  literal name; see the fork's runbook.
- **Reactors must read metadata.** If a reactor body returns a static
  dict without calling `metadata.get(...)`, bw raises
  `ValueError: <reactor> on <node> did not request any metadata, you
  might want to use defaults instead` once a node consumes the bundle.
  Fix: fold the contribution into `defaults`. The rule applies even
  when the reactor writes into another bundle's namespace — a static
  contribution to e.g. `nftables/output` belongs in `defaults`, where
  bw merges it with other bundles' contributions.
- **`triggers` ↔ `triggered: True` invariant.** Any item listed in
  another's `triggers` list must declare `triggered: True`. bw
  enforces this at `bw test` time: *"…triggered by …, but missing
  'triggered' attribute"*. Corollary: an action can't be both in an
  upstream `triggers` list AND self-healing every apply — pick one.
- **Triggered actions don't recover from partial failure.** When an
  upstream item's apply succeeds but its triggered downstream action
  fails, subsequent applies can't recover via the trigger chain —
  upstream is "already in desired state" and never re-triggers. For
  actions that must self-heal (pip installs, chowns, migrations),
  drop `triggered: True` and gate the command with `unless: <fast-check>`.
  `unless` is a shell command on the target host whose exit status
  decides whether the main command runs (exit 0 = skip); it's checked
  at fire time, after `triggered:` filtering.
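The `triggers` ↔ `triggered: True` invariant is mechanical enough to check outside bw. The sketch below is a hypothetical, stand-alone mirror of the rule. It works over plain dicts keyed by bw item IDs (`action:…`, `file:…`). bw's real check runs inside `bw test`, and the item names and commands here are invented for illustration. The `fixed` dict also shows the self-healing alternative from the last bullet: an action with no `triggered` flag, gated by `unless`.

```python
# Hypothetical mirror of bw's triggers/triggered rule; not bundlewrap code.


def missing_triggered(items: dict[str, dict]) -> list[tuple[str, str]]:
    """Return (source, target) pairs where `target` appears in `source`'s
    `triggers` list but does not declare `triggered: True` (a target that is
    absent from `items` is reported too)."""
    problems = []
    for source, attrs in items.items():
        for target in attrs.get("triggers", []):
            if items.get(target, {}).get("triggered") is not True:
                problems.append((source, target))
    return problems


# Violates the invariant: the reload action is triggered but not marked.
broken = {
    "file:/etc/foo/bar.conf": {"triggers": ["action:foo_reload"]},
    "action:foo_reload": {"command": "systemctl reload foo"},
}

# Two valid shapes: a triggered action, and a self-healing action that runs
# on every apply unless its `unless` check exits 0 ("already done, skip").
fixed = {
    "file:/etc/foo/bar.conf": {"triggers": ["action:foo_reload"]},
    "action:foo_reload": {"command": "systemctl reload foo", "triggered": True},
    "action:foo_pip_install": {
        "command": "/opt/foo/.venv/bin/pip install -e /opt/foo/src",
        "unless": "/opt/foo/.venv/bin/python -c 'import foo'",
    },
}

print(missing_triggered(broken))  # [('file:/etc/foo/bar.conf', 'action:foo_reload')]
print(missing_triggered(fixed))   # []
```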

## Per-bundle README

@@ -33,7 +33,6 @@ def acme_zone(metadata):
            str(ip_interface(other_node.metadata.get('network/internal/ipv4')).ip)
            for other_node in repo.nodes
            if other_node.metadata.get('letsencrypt/domains', {})
            and other_node.metadata.get('network/internal/ipv4', None)
        },
        *{
            str(ip_interface(other_node.metadata.get('wireguard/my_ip')).ip)
@@ -1,30 +0,0 @@
# bind

Authoritative DNS — primary plus optional `bind/master_node` slaves.

## Applying changes needs both nodes

The slave's bw-managed zone files are rendered from the master's
metadata at slave-apply time (see `bundles/bind/items.py:100`). When
you change a record on the master (adding a `letsencrypt/domains`
entry, a new vhost, etc.), the change is only published once you
apply BOTH:

```sh
bw apply htz.mails       # primary (where the source records live)
bw apply ovh.secondary   # secondary (renders its own zone files)
```

Until both have been applied, `bw verify ovh.secondary` will show
stale zones and consumers that hit the secondary (Let's Encrypt's
secondary-region validators in particular) will see NXDOMAIN. Even
though the slave's named.conf.local declares `type slave;`, don't
rely on bind's own AXFR catching up — the bw-rendered file on disk
is what `bw verify` measures.

## See also

- `bundles/bind-acme/` — the in-house ACME-update receiver.
- `bundles/letsencrypt/README.md` — DNS-01 prerequisites and the
  negative-cache penalty (the most common operational consequence
  of forgetting to apply the secondary).
@@ -1,114 +0,0 @@
# left4me

L4D2 game-server management platform: a Flask web UI on gunicorn that
provisions per-instance srcds servers via templated systemd units, with
kernel-overlayfs layering for shared installations + per-overlay maps,
and uid-based DSCP/priority marking on the egress path so CAKE on the
external interface prioritizes srcds UDP over bulk traffic.

## Metadata

```python
'metadata': {
    'left4me': {
        'domain': 'whatever.tld',  # required — the only per-node knob
        # Everything below is optional and has a sensible default in the
        # bundle. Override per-node only if the default is wrong:
        # 'git_url': 'git@git.sublimity.de:cronekorkn/left4me',
        # 'git_branch': 'master',
        # 'gunicorn_workers': 1,
        # 'gunicorn_threads': 32,
        # 'job_worker_threads': 4,
        # 'port_range_start': 27015,
        # 'port_range_end': 27115,
        # secret_key is auto-derived per node
        # (repo.vault.random_bytes_as_base64_for f'{node.name} left4me secret_key').
    },
},
```

The bundle's `derived_from_domain` reactor reads `left4me/domain` and
emits the corresponding `nginx/vhosts`, `letsencrypt/domains`,
`monitoring/services/left4me-web` (HTTPS health check), and the game-
port `nftables/input` accept rules. Backup paths
(`/var/lib/left4me`, `/etc/left4me`) are set-merged into `backup/paths`
from defaults. None of these need to be declared per-node.

## What this bundle does

- Creates system users `left4me` (uid/gid 980, home `/var/lib/left4me`,
  mode 0711) and `l4d2-sandbox` (uid/gid 981, no home, used by bwrap
  script-overlay builds).
- Drops privileged helpers under `/usr/local/libexec/left4me/`
  (`left4me-systemctl`, `left4me-journalctl`, `left4me-overlay`,
  `left4me-script-sandbox`) plus a tight sudoers file (validated with
  `visudo -cf` before install).
- `git_deploy`s the left4me repo to `/opt/left4me/src`, builds a venv at
  `/opt/left4me/.venv`, `pip install -e`s both `l4d2host` and `l4d2web`,
  runs `alembic upgrade head` and `flask seed-script-overlays`, then
  enables `left4me-web.service`.
- Emits four systemd units via `systemd/units` metadata (consumed by
  `bundles/systemd/`):
  - `left4me-web.service` — gunicorn on `127.0.0.1:8000` (TLS terminates upstream).
  - `left4me-server@.service` — per-instance srcds template, started on
    demand by the web app via the `left4me-systemctl` helper.
  - `l4d2-game.slice` / `l4d2-build.slice` — cgroup slices for the
    perf-baseline (CPU/IO weights, memory caps).
- Contributes uid-based DSCP/priority marks for srcds UDP egress to
  `nftables/output` (via `defaults`).

## Gotchas

- **Requires `bundles/nftables` and `bundles/systemd` on the node.** The
  bundle asserts membership at `bw test` time. On Debian-13 these ride
  in via the `debian-13` group, so attaching the bundle to a Debian-13
  node is enough.
- **`left4me-web.service` does not have `NoNewPrivileges=true`.** This is
  intentional — workers `sudo` the privileged helpers; `NoNewPrivileges`
  would block setuid escalation. Per-instance `server@.service` units
  *do* have it.
- **CAKE shaping is configured separately**, via
  `network/<iface>/cake` on the node (consumed by `bundles/network/`),
  not by this bundle.
- **First-run admin user is manual.** After `bw apply`, ssh to the host and
  bootstrap the admin via the `left4me` wrapper (it sources the env files,
  drops to the `left4me` user, and runs the flask CLI):
  `sudo left4me create-user <username> --admin` (prompts for password via
  the flask CLI, or set `LEFT4ME_ADMIN_PASSWORD` first). The bundle
  deliberately doesn't seed an admin to keep credentials out of the
  metadata pipeline. The same `left4me` wrapper accepts any other flask
  subcommand: `sudo left4me seed-script-overlays <dir>`,
  `sudo left4me routes`, `sudo left4me shell`, etc.
- **CPU isolation is managed by this bundle**, driven by one required
  per-node knob: `left4me/system_cpus` — a set of int CPU ids that
  pins `system.slice` / `user.slice` / `l4d2-build.slice`. The
  complement (`set(range(vm/threads)) - system_cpus`) pins
  `l4d2-game.slice`. On HT hosts, list both SMT siblings of every
  physical core you want to reserve for system, otherwise games end
  up sharing L1/L2 with system. Find pairings via
  `/sys/devices/system/cpu/cpu<n>/topology/thread_siblings_list`. On
  the prod node (`ovh.left4me`, 4 physical / 8 threads, pairings
  (0,4) (1,5) (2,6) (3,7)) the node sets `'system_cpus': {0, 4}` to
  reserve physical core 0 entirely. `l4d2-game.slice` and
  `l4d2-build.slice` carry `AllowedCPUs=` inline on their unit
  definitions; `system.slice` and `user.slice` get drop-ins registered
  under `systemd/units` with the `'<parent>.d/<basename>.conf'` key
  convention (same shape nginx and autologin use), landing at
  `/usr/local/lib/systemd/system/<slice>.d/99-left4me-cpuset.conf`.
  The reactor raises if `system_cpus` includes CPUs outside
  `[0, vm/threads)` or leaves no cores for games.
- **Kernel feature requirement:** kernel-overlayfs (`CONFIG_OVERLAY_FS`).
  Standard on debian-13.
- **Game ports** open by the web app on demand in the range 27015-27115
  (UDP+TCP). Add corresponding accept rules to `nftables/input` per
  node if the host's policy is default-drop on input.
- **Pinned UIDs/GIDs (980/981).** Chosen for deterministic ownership
  across rebuilds and backup restores. If you add another bundle that
  pins UIDs in this repo, make sure it doesn't collide.
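The CPU-isolation arithmetic in the bullet above is small enough to sketch. The helper names below are hypothetical, and this is not the bundle's reactor. The sketch assumes `vm/threads` is a plain int. It also assumes that a space-separated list of `lo-hi` ranges is an acceptable `AllowedCPUs=` value, which matches the syntax documented in `systemd.resource-control(5)`. The sibling parser reads the kernel "cpulist" format used by `thread_siblings_list` (e.g. `0,4` or `0-1`).

```python
def parse_cpu_list(text: str) -> set[int]:
    """Parse a kernel cpulist such as "0,4" or "0-1,8-9" (the format of
    .../topology/thread_siblings_list) into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, sep, hi = part.partition("-")
        if sep:
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(lo))
    return cpus


def game_cpus(system_cpus: set[int], threads: int) -> set[int]:
    """Complement of system_cpus within [0, threads), rejecting the two cases
    the README says the reactor refuses."""
    outside = sorted(c for c in system_cpus if not 0 <= c < threads)
    if outside:
        raise ValueError(f"system_cpus outside [0, {threads}): {outside}")
    games = set(range(threads)) - set(system_cpus)
    if not games:
        raise ValueError("system_cpus leaves no CPUs for l4d2-game.slice")
    return games


def allowed_cpus_value(cpus: set[int]) -> str:
    """Render a non-empty CPU set as space-separated ranges, e.g. {1, 2, 3, 5} -> "1-3 5"."""
    ordered = sorted(cpus)
    ranges = []
    start = prev = ordered[0]
    for cpu in ordered[1:] + [None]:  # None flushes the final run
        if cpu is not None and cpu == prev + 1:
            prev = cpu
            continue
        ranges.append(str(start) if start == prev else f"{start}-{prev}")
        if cpu is not None:
            start = prev = cpu
    return " ".join(ranges)


# ovh.left4me per the README: 8 threads, SMT pairs (0,4) (1,5) (2,6) (3,7).
system = parse_cpu_list("0,4")  # both siblings of physical core 0
games = game_cpus(system, threads=8)
print(sorted(games))               # [1, 2, 3, 5, 6, 7]
print(allowed_cpus_value(system))  # 0 4
print(allowed_cpus_value(games))   # 1-3 5-7
```

The check raises instead of clamping. A typo in `system_cpus` then fails at `bw test` time rather than silently pinning games to the wrong cores.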

## Slice support requires `bundles/systemd` ≥ commit cc1c6a5

This bundle's `l4d2-game.slice` and `l4d2-build.slice` units rely on
`bundles/systemd/items.py` accepting the `.slice` extension. Older
revisions raised `Exception(f'unknown type slice')` at apply time.
The repo-wide `bw test` will catch this if it regresses.
@@ -1,6 +0,0 @@
# Managed by ckn-bw bundles/left4me. Local edits will be reverted.
# Deployment units use fixed /var/lib/left4me paths; regenerate units if this changes.
LEFT4ME_ROOT=/var/lib/left4me
# l4d2host invokes steamcmd by absolute path — bypasses PATH lookup so the
# script's `cd "$(dirname "$0")"` resolves next to the real install dir.
LEFT4ME_STEAMCMD=/opt/left4me/steam/steamcmd.sh
@@ -1,6 +0,0 @@
# Sandbox-only resolver config — bind-mounted into script-overlay sandboxes
# at /etc/resolv.conf. The host's resolver (often a private/LAN DNS server)
# is unreachable from inside the sandbox because IPAddressDeny= blocks
# egress to RFC1918 / loopback. Public resolvers keep DNS working.
nameserver 1.1.1.1
nameserver 8.8.8.8
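The resolver choice above follows from the deny list. The sketch below takes the `IPAddressDeny=` prefixes that `left4me-script-sandbox` passes to `systemd-run` and evaluates them with Python's `ipaddress` module. It shows that a loopback stub (such as systemd-resolved's `127.0.0.53`) or a LAN resolver is blocked, while `1.1.1.1` and `8.8.8.8` are not. This is an offline approximation only. systemd enforces `IPAddressDeny=` in the kernel via a cgroup BPF filter, and the function name is hypothetical.

```python
import ipaddress

# Copied from the IPAddressDeny= property in left4me-script-sandbox.
SANDBOX_DENY = [
    ipaddress.ip_network(net)
    for net in (
        "127.0.0.0/8 ::1/128 169.254.0.0/16 fe80::/10 224.0.0.0/4 ff00::/8 "
        "10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 100.64.0.0/10 fc00::/7"
    ).split()
]


def sandbox_may_reach(address: str) -> bool:
    """True if `address` lies outside every denied prefix.

    Mixed-version membership tests (an IPv4 address against an IPv6
    network) simply return False in `ipaddress`, so one list covers both."""
    ip = ipaddress.ip_address(address)
    return not any(ip in net for net in SANDBOX_DENY)


for resolver in ("1.1.1.1", "8.8.8.8", "127.0.0.53", "192.168.1.1", "fd00::53"):
    print(resolver, sandbox_may_reach(resolver))
```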
@@ -1,7 +0,0 @@
# Managed by ckn-bw bundles/left4me. Local edits will be reverted.
DATABASE_URL=sqlite:////var/lib/left4me/left4me.db
SECRET_KEY=${node.metadata.get('left4me/secret_key')}
JOB_WORKER_THREADS=${node.metadata.get('left4me/job_worker_threads')}
SESSION_COOKIE_SECURE=true
LEFT4ME_PORT_RANGE_START=${node.metadata.get('left4me/port_range_start')}
LEFT4ME_PORT_RANGE_END=${node.metadata.get('left4me/port_range_end')}
@@ -1,5 +0,0 @@
Defaults:left4me !requiretty
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-systemctl *
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-journalctl *
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-overlay mount *, /usr/local/libexec/left4me/left4me-overlay umount *
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox
@@ -1,36 +0,0 @@
# Host-side perf baseline for left4me — see
# docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md
#
# UDP socket buffers: distro defaults of ~128 KiB are too small for sustained
# Source-engine UDP across multiple instances. 8 MiB matches the standard
# 1 Gbit recommendation; rmem_default/wmem_default protect sockets that don't
# explicitly enlarge their buffers.
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.core.rmem_default = 524288
net.core.wmem_default = 524288

# Kernel softirq UDP path: the per-CPU backlog queue starts dropping packets
# at the default 1000 under multi-instance burst; 5000 absorbs realistic peaks.
# netdev_budget = 600 gives softirq more drain headroom per pass.
net.core.netdev_max_backlog = 5000
net.core.netdev_budget = 600

# Latency-sensitive default: avoid swap unless the box is really under
# pressure. Harmless on swapless hosts.
vm.swappiness = 10

# Per-socket UDP buffer floors: protect game-server sockets that don't bump
# their own SO_RCVBUF/SO_SNDBUF when softirq drains lag briefly.
net.ipv4.udp_rmem_min = 16384
net.ipv4.udp_wmem_min = 16384

# Default qdisc for ifaces we don't explicitly shape with CAKE. Debian Trixie
# already defaults to fq_codel; setting it explicitly is belt-and-suspenders
# and survives kernel-default churn.
net.core.default_qdisc = fq_codel

# TCP congestion control: BBR for any bulk TCP egress on the host (admin SSH,
# backups, package fetches, web-app responses) so a long flow does not push
# the bottleneck queue ahead of game UDP. UDP srcds is unaffected.
net.ipv4.tcp_congestion_control = bbr
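After an apply, one way to confirm that a sysctl drop-in like the one above took effect is to compare each key with its live value under `/proc/sys`. The sketch below is a minimal, hypothetical checker. Its parser is simplified: it skips `#`/`;` comments but ignores sysctl.d's `-` prefix and `/`-separated keys. The `read` callable is injected so the logic can be exercised without a live host. On a host, you would pass something like `lambda p: Path(p).read_text()`.

```python
from pathlib import PurePosixPath
from typing import Callable


def parse_sysctl_conf(text: str) -> dict[str, str]:
    """Parse `key = value` lines, skipping blank lines and #/; comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def proc_path(key: str) -> PurePosixPath:
    """net.core.rmem_max -> /proc/sys/net/core/rmem_max"""
    return PurePosixPath("/proc/sys", *key.split("."))


def drift(expected: dict[str, str], read: Callable[[str], str]) -> dict[str, tuple[str, str]]:
    """Return {key: (expected, live)} for every key whose live value differs.

    Whitespace is normalised because some entries (e.g. tcp_rmem) are
    tab-separated vectors in /proc/sys."""
    out = {}
    for key, want in expected.items():
        have = " ".join(read(str(proc_path(key))).split())
        if have != " ".join(want.split()):
            out[key] = (want, have)
    return out


sample = """\
# UDP socket buffers
net.core.rmem_max = 8388608
net.ipv4.tcp_congestion_control = bbr
"""
live = {  # illustrative stand-in for reading /proc/sys on a host
    "/proc/sys/net/core/rmem_max": "212992\n",
    "/proc/sys/net/ipv4/tcp_congestion_control": "bbr\n",
}
print(drift(parse_sysctl_conf(sample), live.__getitem__))
# {'net.core.rmem_max': ('8388608', '212992')}
```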
@@ -1,53 +0,0 @@
#!/bin/sh
set -eu

usage() {
    printf '%s\n' "usage: left4me-journalctl <server-name> --lines <n> --follow|--no-follow" >&2
    exit 2
}

validate_name() {
    name=$1
    [ -n "$name" ] || usage
    case "$name" in
        .*|*..*|*/*|*\\*) usage ;;
    esac
    case "$name" in
        *[!A-Za-z0-9_.-]*) usage ;;
    esac
}

[ "$#" -eq 4 ] || usage
name=$1
lines_flag=$2
lines=$3
follow_flag=$4

validate_name "$name"
[ "$lines_flag" = "--lines" ] || usage
case "$lines" in
    ''|*[!0-9]*) usage ;;
esac

follow_arg=
case "$follow_flag" in
    --follow) follow_arg=-f ;;
    --no-follow) ;;
    *) usage ;;
esac

unit="left4me-server@${name}.service"
if [ -x /bin/journalctl ]; then
    journalctl=/bin/journalctl
elif [ -x /usr/bin/journalctl ]; then
    journalctl=/usr/bin/journalctl
else
    printf '%s\n' 'journalctl not found at /bin/journalctl or /usr/bin/journalctl' >&2
    exit 69
fi

if [ -n "$follow_arg" ]; then
    exec "$journalctl" -u "$unit" -n "$lines" -o cat "$follow_arg"
fi

exec "$journalctl" -u "$unit" -n "$lines" -o cat
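This helper and `left4me-overlay` validate the instance name differently. The shell `case` patterns above admit uppercase letters, dots (just not leading or doubled), a leading hyphen, and unbounded length. The overlay helper's `NAME_RE` admits none of those. The sketch below transcribes both checks into Python for side-by-side comparison. The function names are hypothetical, and the shell transcription assumes the C locale for the `A-Z` bracket ranges.

```python
import re

# NAME_RE from left4me-overlay, verbatim.
OVERLAY_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,63}$")


def journalctl_accepts(name: str) -> bool:
    """Transcription of validate_name() in left4me-journalctl."""
    if not name:
        return False
    # case .*|*..*|*/*|*\\*  -> leading dot, "..", slash or backslash
    if name.startswith(".") or ".." in name or "/" in name or "\\" in name:
        return False
    # case *[!A-Za-z0-9_.-]*  -> any character outside the allowed set
    return re.fullmatch(r"[A-Za-z0-9_.-]+", name) is not None


def overlay_accepts(name: str) -> bool:
    return OVERLAY_NAME_RE.fullmatch(name) is not None


for name in ("coop-1", "Coop.1", "-x", "a" * 65, "../etc"):
    print(f"{name[:12]!r:16} journalctl={journalctl_accepts(name)} overlay={overlay_accepts(name)}")
```

Any name the web app actually creates must pass the stricter overlay check to be mountable. So the journalctl helper's extra latitude only matters for names that could never have a running unit.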
@@ -1,242 +0,0 @@
#!/usr/bin/python3
"""Privileged overlay mount helper for left4me.

Invoked from the systemd unit's ExecStartPre / ExecStopPost via
`+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- …`. The unit-level
nsenter is what makes this work: it runs the helper Python interpreter
inside PID 1's mount namespace. Without it, the `+` Exec prefix
removes the sandbox/credentials but does NOT detach from the unit's
per-service mount namespace, and the helper process itself would pin
that namespace alive — turning every umount into a multi-second EBUSY
race with the kernel's deferred namespace cleanup. With the unit-level
nsenter the helper has no such reference and umount succeeds first try.

Validates inputs strictly, then performs `mount -t overlay` /
`umount` directly — no internal nsenter, since the helper is already
running where the syscalls need to take effect.

Verbs:
    mount <name>     Reads ${LEFT4ME_ROOT}/instances/<name>/instance.env
                     for L4D2_LOWERDIRS, validates every lowerdir is
                     under one of installation/overlays/workshop_cache/
                     global_overlay_cache, then mounts the kernel
                     overlay at runtime/<name>/merged.
    umount <name>    Unmounts runtime/<name>/merged and cleans up the
                     kernel-overlayfs `work/work` orphan.

Set LEFT4ME_OVERLAY_PRINT_ONLY=1 to print the would-be argv (one line,
shell-quoted) and exit 0 instead of execv. Used by tests.
"""

import os
import re
import shlex
import shutil
import subprocess
import sys
from pathlib import Path

NAME_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,63}$")
DEFAULT_ROOT = "/var/lib/left4me"
LOWERDIR_ALLOWLIST = (
    "installation",
    "overlays",
    "global_overlay_cache",
    "workshop_cache",
)
MAX_LOWERDIRS = 500
MOUNT_BIN = "/bin/mount"
UMOUNT_BIN = "/bin/umount"


def die(msg: str) -> None:
    sys.stderr.write(f"left4me-overlay: {msg}\n")
    sys.exit(1)


def root() -> Path:
    return Path(os.environ.get("LEFT4ME_ROOT") or DEFAULT_ROOT)


def validate_name(name: str) -> str:
    if not NAME_RE.fullmatch(name):
        die(f"invalid instance name: {name!r}")
    return name


def parse_lowerdirs(env_path: Path) -> list[str]:
    if not env_path.is_file():
        die(f"instance.env not found: {env_path}")
    raw = None
    for line in env_path.read_text().splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "L4D2_LOWERDIRS":
            raw = value
            break
    if raw is None:
        die(f"L4D2_LOWERDIRS not set in {env_path}")
    if raw == "":
        die(f"L4D2_LOWERDIRS is empty in {env_path}")
    parts = raw.split(":")
    if any(p == "" for p in parts):
        die(f"L4D2_LOWERDIRS contains an empty entry: {raw!r}")
    if len(parts) > MAX_LOWERDIRS:
        die(f"L4D2_LOWERDIRS has {len(parts)} entries (cap {MAX_LOWERDIRS})")
    return parts


def canonical_under(allowed_roots: list[Path], path: Path) -> Path:
    try:
        canonical = path.resolve(strict=True)
    except (FileNotFoundError, RuntimeError):
        die(f"path does not exist or has a symlink loop: {path}")
    for r in allowed_roots:
        if canonical == r or r in canonical.parents:
            return canonical
    die(f"path is outside the permitted roots: {path} (resolved: {canonical})")


_LISTXATTR = getattr(os, "listxattr", None)


def _entry_has_fuse_xattr(path: str) -> str | None:
    if _LISTXATTR is None:
        return None
    try:
        attrs = _LISTXATTR(path, follow_symlinks=False)
    except OSError:
        return None
    for a in attrs:
        if a.startswith("user.fuseoverlayfs."):
            return a
    return None


def assert_no_fuse_xattrs(upper: Path) -> None:
    if not upper.exists() or _LISTXATTR is None:
        return
    for dirpath, dirnames, filenames in os.walk(upper):
        for entry in (dirpath, *(os.path.join(dirpath, n) for n in dirnames),
                      *(os.path.join(dirpath, n) for n in filenames)):
            tainted = _entry_has_fuse_xattr(entry)
            if tainted:
                die(
                    f"upperdir contains fuse-overlayfs xattr {tainted!r} on {entry}; "
                    "wipe upper/ and work/ before mounting"
                )


def exec_or_print(argv: list[str]) -> None:
    if os.environ.get("LEFT4ME_OVERLAY_PRINT_ONLY") == "1":
        print(" ".join(shlex.quote(a) for a in argv))
        sys.exit(0)
    os.execv(argv[0], argv)


def cmd_mount(name: str) -> None:
    name = validate_name(name)
    r = root()
    runtime_name_dir = (r / "runtime" / name).resolve(strict=True)
    merged_for_check = (runtime_name_dir / "merged").resolve(strict=True)

    # Idempotency for unit restart cycles: if a previous start mounted
    # successfully but ExecStart failed afterwards (and Restart=on-failure
    # fires another cycle), the second ExecStartPre would otherwise refuse
    # to mount-on-top. Short-circuit here so the second cycle just gets
    # straight to ExecStart. PRINT_ONLY (test mode) bypasses this so the
    # tests can exercise the full nsenter argv regardless of mount state.
    if (
        os.environ.get("LEFT4ME_OVERLAY_PRINT_ONLY") != "1"
        and os.path.ismount(merged_for_check)
    ):
        return

    instance_env = r / "instances" / name / "instance.env"
    raw_lowerdirs = parse_lowerdirs(instance_env)

    allowed_roots = [(r / sub).resolve() for sub in LOWERDIR_ALLOWLIST]
    canonical_lowerdirs = [str(canonical_under(allowed_roots, Path(p))) for p in raw_lowerdirs]

    upper = (runtime_name_dir / "upper").resolve(strict=True)
    work = (runtime_name_dir / "work").resolve(strict=True)
    merged = merged_for_check
    for label, path in (("upper", upper), ("work", work), ("merged", merged)):
        if path.parent != runtime_name_dir:
            die(f"{label} resolved outside runtime/{name}: {path}")

    assert_no_fuse_xattrs(upper)

    options = f"lowerdir={':'.join(canonical_lowerdirs)},upperdir={upper},workdir={work}"
    argv = [
        MOUNT_BIN,
        "-t", "overlay",
        "overlay",
        "-o", options,
        str(merged),
    ]
    exec_or_print(argv)


def cmd_umount(name: str) -> None:
    name = validate_name(name)
    r = root()
    runtime_name_dir = (r / "runtime" / name).resolve(strict=True)
    merged_path = runtime_name_dir / "merged"
    work_inner = runtime_name_dir / "work" / "work"

    argv = [
        UMOUNT_BIN,
        # Resolve only if it exists; PRINT_ONLY tests always pre-create it.
        str(merged_path.resolve(strict=True) if merged_path.exists() else merged_path),
    ]

    # PRINT_ONLY: emit the umount argv and exit. Tests assert exact shape
    # of this dry-run; the post-umount cleanup of work_inner is a runtime
    # behaviour exercised on the host, not in unit tests.
    if os.environ.get("LEFT4ME_OVERLAY_PRINT_ONLY") == "1":
        print(" ".join(shlex.quote(a) for a in argv))
        sys.exit(0)

    if merged_path.exists():
        merged = merged_path.resolve(strict=True)
        if merged.parent != runtime_name_dir:
            die(f"merged resolved outside runtime/{name}: {merged}")
        # Idempotency: only umount if currently a mount point. Mirrors
        # cmd_mount's symmetric check; a redundant cleanup pass — or a
        # call after a partial _purge_instance — must be a no-op.
        #
        # No retry loop here: with the helper running in PID 1's mount
        # namespace (via the unit-level `nsenter --mount=/proc/1/ns/mnt`
        # in ExecStopPost), it holds no reference to the unit's
        # per-service mount namespace, so the cgroup-empty → namespace
        # reaped → umount-clears sequence happens without any race
        # window for us to ride out. EBUSY here is a real error.
        if os.path.ismount(merged):
            subprocess.run(argv, check=True)

    # Kernel-overlayfs creates work_inner during mount with root:root mode
    # 0/0. After unmount it's an orphan that the unit's User= (left4me)
    # cannot traverse via shutil.rmtree, so reset/delete in instances.py
    # blows up with EACCES on `runtime/<name>/work/work`. The helper is
    # the only code path with root that knows about this directory, so
    # the cleanup belongs here. Safe to nuke — the kernel re-creates it
    # on the next mount. Run unconditionally — covers both "we just
    # unmounted" and "previous teardown didn't finish" cases.
    if work_inner.exists():
        shutil.rmtree(work_inner)


def main(argv: list[str]) -> None:
    if len(argv) != 3 or argv[1] not in ("mount", "umount"):
        sys.stderr.write("usage: left4me-overlay mount|umount <name>\n")
        sys.exit(2)
    if argv[1] == "mount":
        cmd_mount(argv[2])
    else:
        cmd_umount(argv[2])


if __name__ == "__main__":
    main(sys.argv)
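`canonical_under()` tests containment with `Path.parents` on a path that `resolve(strict=True)` has already canonicalised. Both halves matter. The sketch below uses `PurePosixPath`, so it needs no filesystem. It shows that a plain string-prefix check wrongly admits a sibling directory such as `overlays-evil`. It also shows that the `parents` test alone still admits an unresolved `..` escape. `posixpath.normpath` appears only as a lexical stand-in for resolution; unlike `Path.resolve`, it does not follow symlinks, which is what the helper actually needs. Function names are hypothetical.

```python
import posixpath
from pathlib import PurePosixPath


def is_under(root: PurePosixPath, candidate: PurePosixPath) -> bool:
    """The containment test canonical_under() applies after resolving."""
    return candidate == root or root in candidate.parents


def naive_is_under(root: str, candidate: str) -> bool:
    """The string-prefix check that the parents test avoids."""
    return candidate.startswith(root)


root = PurePosixPath("/var/lib/left4me/overlays")
sibling = "/var/lib/left4me/overlays-evil/x"
escape = "/var/lib/left4me/overlays/../../../etc"

print(naive_is_under(str(root), sibling))      # True: prefix match on a sibling dir
print(is_under(root, PurePosixPath(sibling)))  # False
print(is_under(root, PurePosixPath(escape)))   # True: ".." is not collapsed
print(is_under(root, PurePosixPath(posixpath.normpath(escape))))  # False: /var/etc
```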
|
|

@@ -1,82 +0,0 @@
#!/bin/bash
# Privileged sandbox launcher for left4me script overlays.
#
# Invoked via sudo by the web user with two arguments:
#   <overlay_id>   numeric overlay id; bind-mounts /var/lib/left4me/overlays/<id>
#                  read-write at /overlay inside the sandbox.
#   <script_path>  absolute path to a bash file already written by the web app;
#                  bind-mounted read-only at /script.sh inside the sandbox.
#
# The script runs as a transient systemd .service with the full hardening
# surface: cgroup limits + walltime kill, NoNewPrivileges, ProtectSystem,
# ProtectHome, kernel-tunable / -module / -log protection, namespace
# restriction, address-family restriction, capability bounding (empty),
# seccomp filter (@system-service @network-io), MemoryDenyWriteExecute,
# LockPersonality, RestrictSUIDSGID. Network namespace is *not* restricted —
# scripts must reach the public internet to download workshop / l4d2center
# / cedapug content. PID namespace is shared with the host (no
# PrivatePID= directive in systemd); host PIDs are visible via /proc but
# not signal-able due to UID mismatch.
set -euo pipefail

[[ $# -eq 2 ]] || { echo "usage: $0 <overlay_id> <script>" >&2; exit 64; }

OVERLAY_ID=$1
SCRIPT=$2

[[ "$OVERLAY_ID" =~ ^[0-9]+$ ]] || { echo "bad overlay id" >&2; exit 64; }
OVERLAY_DIR=/var/lib/left4me/overlays/$OVERLAY_ID
[[ -d $OVERLAY_DIR ]] || { echo "no overlay dir at $OVERLAY_DIR" >&2; exit 65; }
[[ -f $SCRIPT ]] || { echo "no script at $SCRIPT" >&2; exit 65; }

if [[ "${LEFT4ME_SCRIPT_SANDBOX_DRY_RUN:-}" == "1" ]]; then
    echo "DRY RUN: overlay_id=$OVERLAY_ID script=$SCRIPT overlay_dir=$OVERLAY_DIR"
    exit 0
fi

# Make sure the sandbox UID owns the overlay dir so the script can write there.
# Idempotent: a no-op when the dir is already l4d2-sandbox-owned (re-run case),
# and corrects the ownership the first time the dir was created by the web app
# under the left4me UID. World-readable so the gameserver process (left4me)
# can read the overlay contents via the kernel-overlayfs lowerdir at runtime.
chown -R l4d2-sandbox:l4d2-sandbox "$OVERLAY_DIR"
chmod 0755 "$OVERLAY_DIR"

SCRIPT_RC=0
systemd-run --quiet --collect --wait --pipe \
    --unit="left4me-script-${OVERLAY_ID}-$$" \
    --slice=l4d2-build.slice \
    -p OOMScoreAdjust=500 \
    -p User=l4d2-sandbox -p Group=l4d2-sandbox \
    -p UMask=0022 \
    -p NoNewPrivileges=yes \
    -p ProtectSystem=strict -p ProtectHome=yes \
    -p PrivateTmp=yes -p PrivateDevices=yes -p PrivateIPC=yes \
    -p ProtectKernelTunables=yes -p ProtectKernelModules=yes \
    -p ProtectKernelLogs=yes -p ProtectControlGroups=yes \
    -p RestrictNamespaces=yes \
    -p RestrictAddressFamilies="AF_INET AF_INET6 AF_UNIX" \
    -p RestrictSUIDSGID=yes -p LockPersonality=yes \
    -p MemoryDenyWriteExecute=yes \
    -p SystemCallFilter="@system-service @network-io" \
    -p SystemCallArchitectures=native \
    -p CapabilityBoundingSet= -p AmbientCapabilities= \
    -p IPAddressDeny="127.0.0.0/8 ::1/128 169.254.0.0/16 fe80::/10 224.0.0.0/4 ff00::/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 100.64.0.0/10 fc00::/7" \
    -p TemporaryFileSystem="/etc /var/lib" \
    -p BindReadOnlyPaths="/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf /etc/ssl /etc/ca-certificates /etc/nsswitch.conf /etc/alternatives ${SCRIPT}:/script.sh" \
    -p BindPaths="${OVERLAY_DIR}:/overlay" \
    -p WorkingDirectory=/overlay \
    -p Environment="HOME=/tmp PATH=/usr/bin:/usr/sbin OVERLAY=/overlay" \
    -p MemoryMax=4G -p MemorySwapMax=0 -p TasksMax=512 \
    -p CPUQuota=200% -p RuntimeMaxSec=3600 \
    -- /bin/bash /script.sh || SCRIPT_RC=$?

# Normalize perms so the web service (left4me uid) can read overlay files
# directly via Python open() — needed by the file tree's download endpoint.
# UMask=0022 above takes care of *new* writes; this catches anything the
# script created with a tighter mode (e.g. cedapug_maps writes its
# .cedapug/manifest.tsv as 0600 by default).
find "$OVERLAY_DIR" -type f ! -perm -o+r -exec chmod o+r {} + 2>/dev/null || true
find "$OVERLAY_DIR" -type d ! -perm -o+rx -exec chmod o+rx {} + 2>/dev/null || true

exit $SCRIPT_RC
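The `IPAddressDeny=` list above keeps sandboxed scripts off loopback, link-local, multicast, RFC 1918, CGNAT (100.64.0.0/10) and IPv6 ULA ranges, while public addresses stay reachable. The Python sketch below uses only the standard-library `ipaddress` module to check which side of that line a destination falls on. `DENIED` is copied from the script, and `is_denied` is a hypothetical helper that is not part of left4me. The sketch models prefix matching only, not systemd's cgroup-level enforcement.

```python
import ipaddress

# Prefixes copied verbatim from the IPAddressDeny= line in left4me-script-sandbox.
DENIED = [
    ipaddress.ip_network(prefix)
    for prefix in (
        "127.0.0.0/8 ::1/128 169.254.0.0/16 fe80::/10 224.0.0.0/4 ff00::/8 "
        "10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 100.64.0.0/10 fc00::/7"
    ).split()
]


def is_denied(address: str) -> bool:
    """Return True if `address` falls inside any denied prefix (hypothetical helper)."""
    ip = ipaddress.ip_address(address)
    # An IPv4 address tested against an IPv6 network (or vice versa) is simply
    # "not contained" rather than an error, so one mixed list works.
    return any(ip in net for net in DENIED)


if __name__ == "__main__":
    for addr in ("127.0.0.1", "10.1.2.3", "100.64.0.1", "fd00::1", "1.1.1.1", "2606:4700::1111"):
        print(addr, "denied" if is_denied(addr) else "allowed")
```

Loopback is denied, so a loopback stub resolver on the host would be unreachable from inside the unit. That is presumably why the launcher bind-mounts a dedicated `sandbox-resolv.conf` over `/etc/resolv.conf`.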

@@ -1,44 +0,0 @@
#!/bin/sh
set -eu

usage() {
    printf '%s\n' "usage: left4me-systemctl enable|disable|show <server-name>" >&2
    exit 2
}

validate_name() {
    name=$1
    [ -n "$name" ] || usage
    case "$name" in
        .*|*..*|*/*|*\\*) usage ;;
    esac
    case "$name" in
        *[!A-Za-z0-9_.-]*) usage ;;
    esac
}

[ "$#" -eq 2 ] || usage
action=$1
name=$2

case "$action" in
    enable|disable|show) ;;
    *) usage ;;
esac

validate_name "$name"
unit="left4me-server@${name}.service"
if [ -x /bin/systemctl ]; then
    systemctl=/bin/systemctl
elif [ -x /usr/bin/systemctl ]; then
    systemctl=/usr/bin/systemctl
else
    printf '%s\n' 'systemctl not found at /bin/systemctl or /usr/bin/systemctl' >&2
    exit 69
fi

case "$action" in
    enable) exec "$systemctl" enable --now "$unit" ;;
    disable) exec "$systemctl" disable --now "$unit" ;;
    show) exec "$systemctl" show --property=ActiveState --property=SubState "$unit" ;;
esac
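`validate_name` in `left4me-systemctl` guards the server name before it is interpolated into `left4me-server@<name>.service`. One use for a Python port of its two `case` checks would be testing the same rules on the web side. The port below is hypothetical: the names `valid_server_name` and `unit_for` are illustrative only.

```python
import re

# Characters the shell pattern `*[!A-Za-z0-9_.-]*` lets through (ASCII only).
_ALLOWED = re.compile(r"[A-Za-z0-9_.-]+")


def valid_server_name(name: str) -> bool:
    """Mirror validate_name() from left4me-systemctl (hypothetical port)."""
    if not name:
        return False
    # case .*|*..*|*/*|*\\*  -> leading dot, "..", slash or backslash
    if name.startswith(".") or ".." in name or "/" in name or "\\" in name:
        return False
    # case *[!A-Za-z0-9_.-]*  -> any character outside the allow-list
    return _ALLOWED.fullmatch(name) is not None


def unit_for(name: str) -> str:
    """Build the templated unit name, refusing anything the helper would reject."""
    if not valid_server_name(name):
        raise ValueError(f"invalid server name: {name!r}")
    return f"left4me-server@{name}.service"
```

The first check rejects anything that could be read as a hidden file or a path component. The second is a strict allow-list, which makes the slash and backslash tests redundant, but they are kept to mirror the shell version one-to-one.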

@@ -1,17 +0,0 @@
#!/bin/sh
# Run l4d2web flask CLI commands as the left4me user with the deploy env loaded.
# Usage: left4me <flask-subcommand> [args...]
# Examples:
#   left4me create-user alice --admin
#   left4me seed-script-overlays /opt/left4me/src/examples/script-overlays
#   left4me routes
set -eu
exec sudo -u left4me sh -c '
    set -a
    . /etc/left4me/host.env
    . /etc/left4me/web.env
    set +a
    export JOB_WORKER_ENABLED=false
    export PYTHONPATH=/opt/left4me/src
    exec /opt/left4me/.venv/bin/flask --app l4d2web.app:create_app "$@"
' sh "$@"

@@ -1,293 +0,0 @@
# Items for the left4me bundle.
# Systemd units come from metadata via bundles/systemd/ — there are no
# .service or .slice files in this bundle's files/ tree. Cpuset drop-ins
# for system.slice / user.slice are likewise emitted via systemd/units
# in metadata.py (key: '<parent>.d/<basename>.conf').

directories = {
    '/opt/left4me': {
        'owner': 'left4me',
        'group': 'left4me',
    },
    '/opt/left4me/src': {
        'owner': 'left4me',
        'group': 'left4me',
    },
    '/etc/left4me': {
        'owner': 'root',
        'group': 'root',
        'mode': '0755',
    },
    '/var/lib/left4me': {
        # left4me's home dir — useradd creates with 0700; loosen to 0711 so
        # l4d2-sandbox can traverse (but not list) for bwrap bind-mounts.
        'owner': 'left4me',
        'group': 'left4me',
        'mode': '0711',
    },
    '/var/lib/left4me/installation': {'owner': 'left4me', 'group': 'left4me'},
    '/var/lib/left4me/overlays': {'owner': 'left4me', 'group': 'left4me'},
    '/var/lib/left4me/instances': {'owner': 'left4me', 'group': 'left4me'},
    '/var/lib/left4me/runtime': {'owner': 'left4me', 'group': 'left4me'},
    '/var/lib/left4me/workshop_cache': {'owner': 'left4me', 'group': 'left4me'},
    '/var/lib/left4me/tmp': {'owner': 'left4me', 'group': 'left4me'},
    '/opt/left4me/steam': {'owner': 'left4me', 'group': 'left4me'},
    '/usr/local/libexec/left4me': {
        'owner': 'root',
        'group': 'root',
        'mode': '0755',
    },
}

groups = {
    'left4me': {'gid': 980},
    'l4d2-sandbox': {'gid': 981},
}

users = {
    'left4me': {
        'uid': 980,
        'gid': 980,
        'home': '/var/lib/left4me',
        'shell': '/usr/sbin/nologin',
    },
    'l4d2-sandbox': {
        'uid': 981,
        'gid': 981,
        'shell': '/usr/sbin/nologin',
    },
}
# UIDs/GIDs pinned in the system-package range (100-999, per Debian
# policy) so file ownership is deterministic across rebuilds and
# backup restores. 980/981 are unused elsewhere in this repo.

# Privileged helpers (mode 0755 root:root). Listed by sudoers as the only
# commands left4me can invoke as root NOPASSWD.
HELPERS = (
    'left4me-systemctl',
    'left4me-journalctl',
    'left4me-overlay',
    'left4me-script-sandbox',
)

files = {
    '/usr/local/sbin/left4me': {
        'source': 'usr/local/sbin/left4me',  # explicit — basename collides with sudoers
        'mode': '0755',
        'owner': 'root',
        'group': 'root',
    },
    **{
        f'/usr/local/libexec/left4me/{h}': {
            'source': f'usr/local/libexec/left4me/{h}',
            'mode': '0755',
            'owner': 'root',
            'group': 'root',
        }
        for h in HELPERS
    },
    '/etc/left4me/sandbox-resolv.conf': {
        'source': 'etc/left4me/sandbox-resolv.conf',
        'mode': '0644',
        'owner': 'root',
        'group': 'root',
    },
    '/etc/sudoers.d/left4me': {
        'source': 'etc/sudoers.d/left4me',
        'mode': '0440',
        'owner': 'root',
        'group': 'root',
        'test_with': 'visudo -cf {}',
    },
    '/etc/sysctl.d/99-left4me.conf': {
        'source': 'etc/sysctl.d/99-left4me.conf',
        'mode': '0644',
        'owner': 'root',
        'group': 'root',
        'triggers': [
            'action:left4me_sysctl_reload',
        ],
    },
    '/etc/left4me/host.env': {
        'source': 'etc/left4me/host.env.mako',
        'content_type': 'mako',
        'mode': '0644',
        'owner': 'root',
        'group': 'root',
    },
    '/etc/left4me/web.env': {
        'source': 'etc/left4me/web.env.mako',
        'content_type': 'mako',
        'mode': '0640',
        'owner': 'root',
        'group': 'left4me',
        'needs': [
            'group:left4me',
        ],
    },
}

actions = {
    'left4me_sysctl_reload': {
        'command': 'sysctl --system >/dev/null',
        'triggered': True,
    },
    'left4me_dpkg_add_i386_arch': {
        # steamcmd is 32-bit and pulls libc6:i386 + lib32z1 from the i386 arch.
        # apt-get update is part of this action because newly-added foreign
        # archs need a fresh package list before any :i386 package resolves.
        'command': 'dpkg --add-architecture i386 && apt-get update',
        'unless': 'dpkg --print-foreign-architectures | grep -qx i386',
        'cascade_skip': False,
    },
    'left4me_install_steamcmd': {
        # Steam's tarball is rolling with no published checksum, so we can't
        # use download: (which requires a hash). Guard with a presence check
        # on steamcmd.sh — steamcmd self-updates at runtime, so chasing the
        # tarball version from bw isn't useful.
        'command': (
            'sudo -u left4me sh -c "'
            'cd /opt/left4me/steam && '
            'curl -fsSL https://media.steampowered.com/installer/steamcmd_linux.tar.gz | '
            'tar -xz'
            '"'
        ),
        'unless': 'test -x /opt/left4me/steam/steamcmd.sh',
        'cascade_skip': False,
        'needs': [
            'directory:/opt/left4me/steam',
            'pkg_apt:curl',
            'pkg_apt:libc6_i386',  # bw pkg_apt convention: _ → :
            'pkg_apt:lib32z1',
            'user:left4me',
        ],
    },
}

# steamcmd is invoked by absolute path (LEFT4ME_STEAMCMD in host.env),
# not via PATH lookup — see l4d2host/cli.py:install. We don't need to put
# anything in /usr/local/bin for it.

git_deploy = {
    '/opt/left4me/src': {
        'repo': node.metadata.get('left4me/git_url'),
        'rev': node.metadata.get('left4me/git_branch'),
        'triggers': [
            # On a code-update apply, refresh the DB schema. pip_install
            # would have triggered alembic in the create_venv path, but on
            # a normal apply pip_install's `unless` skips (packages still
            # importable from the previous editable install), and that
            # would leave alembic_upgrade dormant. Wiring git_deploy →
            # alembic directly ensures new migrations land whenever new
            # code lands. alembic upgrade head is idempotent (no-op when
            # already at head), so this is safe to fire on every code
            # update; the seed_overlays + service:restart cascade off
            # alembic also covers picking up the new code in gunicorn.
            'action:left4me_alembic_upgrade',
        ],
        # chown_src and pip_install are NOT in triggers — they run every
        # apply gated by their own `unless` guards, which makes the chain
        # self-healing after a partial failure. (Items in a triggers list
        # must be triggered:True, which would lose that property.)
    },
}

actions['left4me_chown_src'] = {
    # Runs every apply (cheap — chown -R on a small tree). Self-heals
    # whenever git_deploy extracts a new tarball as root-owned files.
    # Not in any triggers list so doesn't need triggered:True.
    'command': 'chown -R left4me:left4me /opt/left4me/src',
    'unless': 'test -z "$(find /opt/left4me/src \\! -user left4me -print -quit 2>/dev/null)"',
    'cascade_skip': False,
    'needs': [
        'git_deploy:/opt/left4me/src',
        'user:left4me',
        'group:left4me',
    ],
}

actions['left4me_create_venv'] = {
    'command': 'sudo -u left4me /usr/bin/python3 -m venv /opt/left4me/.venv',
    'unless': 'test -x /opt/left4me/.venv/bin/python',
    'cascade_skip': False,
    'needs': [
        'directory:/opt/left4me',
        'pkg_apt:python3-venv',
        'user:left4me',
    ],
    'triggers': [
        'action:left4me_pip_upgrade',
    ],
}

actions['left4me_pip_upgrade'] = {
    'command': 'sudo -u left4me /opt/left4me/.venv/bin/python -m pip install --upgrade pip',
    'triggered': True,
    'cascade_skip': False,
    'needs': [
        'pkg_apt:python3-pip',
    ],
    # No triggers — pip_install runs on every apply (gated by `unless`)
    # rather than being chained from here. Keeps pip_upgrade scoped to
    # exactly its purpose.
}

actions['left4me_pip_install'] = {
    # Single pip invocation installs both editable packages from the same
    # checkout. Runs on every apply: pip install -e is fast on no-op, and
    # any gate weaker than "egg-info matches pyproject.toml" can mask
    # script regeneration — e.g. adding [project.scripts] later wouldn't
    # be picked up if `unless` only checks importability.
    'command': 'sudo -u left4me /opt/left4me/.venv/bin/pip install -e /opt/left4me/src/l4d2host -e /opt/left4me/src/l4d2web',
    'cascade_skip': False,
    'needs': [
        'git_deploy:/opt/left4me/src',
        'action:left4me_create_venv',
        'action:left4me_chown_src',
    ],
    'triggers': [
        'action:left4me_alembic_upgrade',
    ],
}

actions['left4me_alembic_upgrade'] = {
    # Mirrors deploy-test-server.sh:239-242. Runs as left4me with both env
    # files sourced; JOB_WORKER_ENABLED=false so a stray worker doesn't race
    # with the migration.
    'command': (
        'sudo -u left4me sh -c "'
        'cd /opt/left4me/src/l4d2web && '
        'set -a && . /etc/left4me/host.env && . /etc/left4me/web.env && set +a && '
        'env JOB_WORKER_ENABLED=false PYTHONPATH=/opt/left4me/src '
        '/opt/left4me/.venv/bin/alembic -c /opt/left4me/src/l4d2web/alembic.ini upgrade head'
        '"'
    ),
    'triggered': True,
    'cascade_skip': False,
    'needs': [
        'action:left4me_pip_install',
        'file:/etc/left4me/host.env',
        'file:/etc/left4me/web.env',
    ],
    'triggers': [
        'action:left4me_seed_overlays',
        'svc_systemd:left4me-web.service:restart',
    ],
}

actions['left4me_seed_overlays'] = {
    # Idempotent: refreshes script bodies in place; existing overlay rows keep their ids.
    'command': (
        'sudo -u left4me sh -c "'
        'set -a && . /etc/left4me/host.env && . /etc/left4me/web.env && set +a && '
        'env JOB_WORKER_ENABLED=false PYTHONPATH=/opt/left4me/src '
        '/opt/left4me/.venv/bin/flask --app l4d2web.app:create_app '
        'seed-script-overlays /opt/left4me/src/examples/script-overlays'
        '"'
    ),
    'triggered': True,
    'cascade_skip': False,
    'needs': [
        'action:left4me_alembic_upgrade',
    ],
}

@@ -1,275 +0,0 @@
assert node.has_bundle('nftables')
assert node.has_bundle('systemd')


defaults = {
    'left4me': {
        # Application-wide defaults; node only overrides if it really needs to.
        'git_url': 'https://git.sublimity.de/cronekorkn/left4me.git',
        'git_branch': 'master',
        'secret_key': repo.vault.random_bytes_as_base64_for(f'{node.name} left4me secret_key', length=32).value,
        'gunicorn_workers': 1,
        'gunicorn_threads': 32,
        'job_worker_threads': 4,
        # Whole 27000-block: covers Steam's defaults (27015 game, 27005
        # client/RCON) plus headroom for ad-hoc ports without further
        # nftables changes. Mirrored into LEFT4ME_PORT_RANGE_{START,END}
        # by web.env.mako and into the nftables input rule by the
        # nftables_input reactor below.
        'port_range_start': 27000,
        'port_range_end': 27999,
    },
    'apt': {
        'packages': {
            'p7zip-full': {},
            'nftables': {},
            'iproute2': {},
            'curl': {},
            'ca-certificates': {},
            'python3': {},
            'python3-venv': {},
            'python3-pip': {},
            'python3-dev': {},
            # steamcmd is a 32-bit ELF; needs i386 multiarch + these libs.
            # `_` → `:` is bundlewrap's pkg_apt convention for multiarch
            # names (see pkg_apt.py:48).
            'libc6_i386': {  # installs libc6:i386
                'needs': ['action:left4me_dpkg_add_i386_arch'],
            },
            'lib32z1': {
                'needs': ['action:left4me_dpkg_add_i386_arch'],
            },
        },
    },
    'nftables': {
        # Match deploy/files/usr/local/lib/left4me/nft/left4me-mark.nft.
        # Mark srcds UDP egress (uid left4me) with DSCP EF + skb priority 6
        # so CAKE classifies it into the priority tin.
        'output': {
            'meta skuid "left4me" meta l4proto udp ip dscp set ef meta priority set 0006:0000',
            'meta skuid "left4me" meta l4proto udp ip6 dscp set ef meta priority set 0006:0000',
        },
    },
    'systemd': {
        'services': {
            'left4me-web.service': {
                'enabled': True,
                'running': True,
                'needs': [
                    'action:left4me_alembic_upgrade',
                    'file:/etc/left4me/host.env',
                    'file:/etc/left4me/web.env',
                ],
            },
            # Note: left4me-server@.service is a TEMPLATE — instances are
            # started on-demand by the web app via the left4me-systemctl
            # helper. Don't enable/start it from here.
            # The slices are installed (file present) but don't need
            # enable/start — they're activated implicitly when a unit
            # uses Slice=.
        },
    },
    'backup': {
        # Application-owned paths. Set-merged with backup group / node-level paths.
        'paths': {
            '/var/lib/left4me',
            '/etc/left4me',
        },
    },
}


@metadata_reactor.provides(
    'nginx/vhosts',
)
def nginx_vhosts(metadata):
    # letsencrypt/domains and monitoring/services for the vhost are auto-
    # populated by bundles/nginx/metadata.py. We just declare check_path:
    # '/health' so the auto-check hits the Flask health endpoint, not '/'.
    domain = metadata.get('left4me/domain')
    return {
        'nginx': {
            'vhosts': {
                domain: {
                    'content': 'nginx/proxy_pass.conf',
                    'context': {
                        'target': 'http://127.0.0.1:8000',
                    },
                    'check_path': '/health',
                },
            },
        },
    }


@metadata_reactor.provides(
    'nftables/input',
)
def nftables_input(metadata):
    port_start = metadata.get('left4me/port_range_start')
    port_end = metadata.get('left4me/port_range_end')
    return {
        'nftables': {
            'input': {
                f'udp dport {port_start}-{port_end} accept',
                f'tcp dport {port_start}-{port_end} accept',
            },
        },
    }


@metadata_reactor.provides(
    'systemd/units',
)
def systemd_units(metadata):
    workers = metadata.get('left4me/gunicorn_workers')
    threads = metadata.get('left4me/gunicorn_threads')

    # cgroup-v2 cpuset. `system_cpus` (set of int CPU ids, declared per
    # node) pins system/user/build; the complement pins l4d2-game. On HT
    # hosts, list both siblings of a physical core so games don't share
    # L1/L2 with system work — pairings via
    # /sys/devices/system/cpu/cpu<n>/topology/thread_siblings_list.
    vm_threads = metadata.get('vm/threads', metadata.get('vm/cores'))
    all_cpus = set(range(vm_threads))
    system_cpus = metadata.get('left4me/system_cpus')
    if not system_cpus <= all_cpus:
        raise Exception(
            f'left4me/system_cpus={sorted(system_cpus)} on {vm_threads}-thread host '
            f'includes CPUs outside [0, {vm_threads})'
        )
    game_cpus = all_cpus - system_cpus
    if not game_cpus:
        raise Exception(
            f'left4me/system_cpus={sorted(system_cpus)} on {vm_threads}-thread host '
            f'leaves no cores for games'
        )
    system_cpus_string = ','.join(str(t) for t in sorted(system_cpus))
    game_cpus_string = ','.join(str(t) for t in sorted(game_cpus))

    # Drop-in for upstream system.slice / user.slice (units we don't own).
    # Same '<parent>.d/<basename>.conf' convention as nginx and autologin.
    cpuset_dropin = {'Slice': {'AllowedCPUs': system_cpus_string}}

    return {
        'systemd': {
            'units': {
                'left4me-web.service': {
                    'Unit': {
                        'Description': 'left4me web application',
                        'After': 'network-online.target',
                        'Wants': 'network-online.target',
                    },
                    'Service': {
                        'Type': 'simple',
                        'User': 'left4me',
                        'Group': 'left4me',
                        'WorkingDirectory': '/opt/left4me/src',
                        'Environment': {
                            'HOME=/var/lib/left4me',
                            'PATH=/opt/left4me/.venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin',
                        },
                        'EnvironmentFile': (
                            '/etc/left4me/host.env',
                            '/etc/left4me/web.env',
                        ),
                        'ExecStart': (
                            '/opt/left4me/.venv/bin/gunicorn '
                            f'--workers {workers} --threads {threads} '
                            "--bind 127.0.0.1:8000 'l4d2web.app:create_app()'"
                        ),
                        'Restart': 'on-failure',
                        'RestartSec': '3',
                        # NoNewPrivileges intentionally NOT set: workers sudo to the helpers.
                        'ProtectSystem': 'full',
                        'ReadWritePaths': '/var/lib/left4me',
                        'PrivateTmp': 'true',
                    },
                    'Install': {
                        'WantedBy': {'multi-user.target'},
                    },
                },
                'left4me-server@.service': {
                    'Unit': {
                        'Description': 'left4me server instance %i',
                        'After': 'network-online.target',
                        'Wants': 'network-online.target',
                        'StartLimitBurst': '5',
                        'StartLimitIntervalSec': '60s',
                    },
                    'Service': {
                        'Type': 'simple',
                        'User': 'left4me',
                        'Group': 'left4me',
                        'EnvironmentFile': (
                            '/etc/left4me/host.env',
                            '/var/lib/left4me/instances/%i/instance.env',
                        ),
                        'WorkingDirectory': '-/var/lib/left4me/runtime/%i/merged/left4dead2',
                        'ExecStartPre': (
                            '+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- '
                            '/usr/local/libexec/left4me/left4me-overlay mount %i'
                        ),
                        'ExecStart': (
                            '/var/lib/left4me/runtime/%i/merged/srcds_run '
                            '-game left4dead2 +hostport ${L4D2_PORT} $L4D2_ARGS'
                        ),
                        'ExecStopPost': (
                            '+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- '
                            '/usr/local/libexec/left4me/left4me-overlay umount %i'
                        ),
                        'Restart': 'on-failure',
                        'RestartSec': '5',
                        'Slice': 'l4d2-game.slice',
                        'Nice': '-5',
                        'IOSchedulingClass': 'best-effort',
                        'IOSchedulingPriority': '4',
                        'OOMScoreAdjust': '-200',
                        'MemoryHigh': '1.5G',
                        'MemoryMax': '2G',
                        'TasksMax': '256',
                        'LimitNOFILE': '65536',
                        'KillSignal': 'SIGINT',
                        'TimeoutStopSec': '15s',
                        'LogRateLimitIntervalSec': '0',
                        'NoNewPrivileges': 'true',
                        'PrivateTmp': 'true',
                        'PrivateDevices': 'true',
                        'ProtectHome': 'true',
                        'ProtectSystem': 'strict',
                        'ReadOnlyPaths': '/var/lib/left4me/installation /var/lib/left4me/overlays',
                        'ReadWritePaths': '/var/lib/left4me/runtime/%i',
                        'RestrictSUIDSGID': 'true',
                        'LockPersonality': 'true',
                    },
                    'Install': {
                        'WantedBy': {'multi-user.target'},
                    },
                },
                'l4d2-game.slice': {
                    'Unit': {
                        'Description': 'left4me game-server slice',
                        'Before': 'slices.target',
                    },
                    'Slice': {
                        'CPUWeight': '1000',
                        'IOWeight': '1000',
                        'AllowedCPUs': game_cpus_string,
                    },
                },
                'l4d2-build.slice': {
                    'Unit': {
                        'Description': 'left4me script-sandbox build slice',
                        'Before': 'slices.target',
                    },
                    'Slice': {
                        'CPUWeight': '10',
                        'IOWeight': '10',
                        'AllowedCPUs': system_cpus_string,
                    },
                },
                'system.slice.d/99-left4me-cpuset.conf': cpuset_dropin,
                'user.slice.d/99-left4me-cpuset.conf': cpuset_dropin,
            },
        },
    }
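The `systemd_units` reactor takes `left4me/system_cpus` as given and pins games to the complement. The comment asks operators to include both hyperthread siblings of each physical core. The sketch below parses the kernel's cpulist format used by `thread_siblings_list` (for example `0,4` or `0-1`), expands a chosen CPU set to whole cores, and reproduces the reactor's system/game split with the same two sanity checks. All names here are hypothetical helpers, not part of the bundle.

```python
from __future__ import annotations


def parse_cpulist(text: str) -> set[int]:
    """Parse the kernel cpulist format, e.g. "0,4" or "0-1,8-9"."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_siblings(cpu: int) -> str:
    """Read one CPU's sibling list from sysfs (Linux only)."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    with open(path) as fh:
        return fh.read()


def with_siblings(system_cpus: set[int], siblings: dict[int, str]) -> set[int]:
    """Expand system_cpus so every chosen CPU brings its HT siblings along."""
    expanded: set[int] = set()
    for cpu in system_cpus:
        expanded |= parse_cpulist(siblings[cpu])
    return expanded


def cpuset_strings(vm_threads: int, system_cpus: set[int]) -> tuple[str, str]:
    """Same split as the reactor: (system AllowedCPUs, game AllowedCPUs)."""
    all_cpus = set(range(vm_threads))
    if not system_cpus <= all_cpus:
        raise ValueError("system_cpus includes CPUs outside the host range")
    game_cpus = all_cpus - system_cpus
    if not game_cpus:
        raise ValueError("system_cpus leaves no CPUs for games")

    def fmt(cpus: set[int]) -> str:
        return ",".join(str(c) for c in sorted(cpus))

    return fmt(system_cpus), fmt(game_cpus)
```

For example, on an 8-thread guest where CPUs 0 and 4 are siblings, picking `{0}` expands to `{0, 4}`. That gives `AllowedCPUs=0,4` for system, user and build work and `1,2,3,5,6,7` for `l4d2-game.slice`.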

@@ -1,60 +1,9 @@
# letsencrypt

Issues and renews Let's Encrypt certs via [dehydrated][upstream] with
DNS-01 against the in-house bind-acme server.

[upstream]: https://github.com/dehydrated-io/dehydrated/wiki/example-dns-01-nsupdate-script

## First-apply behaviour

Immediately after `bw apply <node>`, nginx serves a **self-signed
cert** for each declared domain — generated by
`/etc/dehydrated/letsencrypt-ensure-some-certificate` so nginx has
something to start with. The real Let's Encrypt cert arrives at most
24h later when the systemd timer fires
(`/usr/bin/dehydrated --cron --accept-terms --challenge dns-01`). To
shortcut the wait:

```sh
ssh <node> 'sudo /usr/bin/dehydrated --cron --accept-terms --challenge dns-01'
ssh <node> 'sudo systemctl reload nginx'
```

## DNS-01 prerequisites

`hook.sh` does `nsupdate` against the bind-acme server (referenced
by `letsencrypt/acme_node`). For the challenge to succeed:

1. The acme node must be in the same metadata graph (so
   `bw metadata <node> -k letsencrypt/acme_node` resolves).
2. **All NS servers** for the validated domain must serve the
   `_acme-challenge.<domain>` CNAME — Let's Encrypt validates from
   primary AND secondary geographic regions; both authoritative
   servers must agree. If a secondary NS is also a bw-managed node,
   `bw apply` it after adding the domain (see e.g. `ovh.secondary`).
3. The bind-acme server must be reachable for the TSIG-signed update.
   `hook.sh` is rendered with the bind-acme server's
   `network/internal/ipv4` — for clients outside that LAN, the route
   must exist (typically via wireguard `s2s` peer membership).

## Negative-cache penalty

If the first DNS-01 attempt fails (e.g. zone not yet applied to the
secondary NS), Let's Encrypt's resolvers cache NXDOMAIN for the SOA's
negative TTL (often 900s = 15 min). Subsequent attempts during that
window also fail and refresh the cache. Combined with LE's rate limit
of **5 failed authorisations per domain per hour**, recovery requires
you to **stop retrying** for ~15 minutes after fixing the DNS, then
make at most one attempt.
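The recovery rule above comes down to timing arithmetic. The Python sketch below assumes the RFC 2308 rule that a negative answer is cached for the smaller of the SOA record's TTL and its MINIMUM field. It also treats the 5-per-hour failure budget from the text as a parameter rather than a hard-coded Let's Encrypt value. The function names are illustrative only.

```python
from datetime import datetime, timedelta


def negative_ttl(soa_record_ttl: int, soa_minimum: int) -> int:
    """RFC 2308: negative answers are cached for min(SOA TTL, SOA MINIMUM) seconds."""
    return min(soa_record_ttl, soa_minimum)


def earliest_retry(last_failure: datetime, soa_record_ttl: int, soa_minimum: int) -> datetime:
    """First moment the negative answer cached by the last failed attempt has expired."""
    return last_failure + timedelta(seconds=negative_ttl(soa_record_ttl, soa_minimum))


def may_retry(now, failed_attempts, soa_record_ttl, soa_minimum, limit=5):
    """Retry only once the cache has expired and the hourly failure budget isn't spent."""
    if not failed_attempts:
        return True
    recent = [t for t in failed_attempts if now - t < timedelta(hours=1)]
    if len(recent) >= limit:
        return False
    return now >= earliest_retry(max(failed_attempts), soa_record_ttl, soa_minimum)
```

Take an SOA TTL of 3600 s, a MINIMUM of 900 s and one failed attempt at 12:00. With those values, `may_retry` first returns `True` at 12:15. Each further failure both pushes that moment back and eats into the hourly budget.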

## nsupdate sample

For interactive testing of the bind-acme TSIG path:
https://github.com/dehydrated-io/dehydrated/wiki/example-dns-01-nsupdate-script

```sh
printf "server 127.0.0.1
zone acme.resolver.name.
update add _acme-challenge.ckn.li.acme.resolver.name. 600 IN TXT \"hello\"
update add _acme-challenge.ckn.li.acme.resolver.name. 600 IN TXT "hello"
send
" | nsupdate -y hmac-sha512:acme:XXXXXX
```

@@ -2,7 +2,7 @@ defaults = {
    'apt': {
        'packages': {
            'dehydrated': {},
            'bind9-dnsutils': {},
            'dnsutils': {},
        },
    },
    'letsencrypt': {

@@ -1,36 +0,0 @@
# nginx

Webserver. Per-node vhosts in `nginx/vhosts`; per-vhost templates in
`data/nginx/*.conf`.

## How port 80 is served

The bundle ships a fixed `80.conf` to
`/etc/nginx/sites-available/80.conf` (picked up by the
`sites-enabled/` symlink) that handles **all** port-80 traffic
across vhosts:

1. ACME HTTP-01 challenges (`/.well-known/acme-challenge/`) are
   served from `/var/lib/dehydrated/acme-challenges/`.
2. All other port-80 requests are 301-redirected to
   `https://$host$request_uri`.

Per-vhost templates only declare `listen 443 ssl http2;`, so they
don't need their own port-80 server blocks. If you need
vhost-specific port-80 behaviour (e.g. plain-HTTP without redirect),
override 80.conf or add a per-vhost block.
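The two rules in `80.conf` amount to a single branch on the request path. The Python model of that decision below assumes the challenge directory is mapped alias-style, with the token appended to `/var/lib/dehydrated/acme-challenges/`. `route_port80` is a hypothetical name, and this is not how nginx evaluates `location` blocks internally.

```python
from urllib.parse import urlsplit

ACME_PREFIX = "/.well-known/acme-challenge/"
ACME_ROOT = "/var/lib/dehydrated/acme-challenges/"


def route_port80(host: str, request_uri: str) -> tuple[str, str]:
    """Model what 80.conf does with one plain-HTTP request (sketch only)."""
    path = urlsplit(request_uri).path
    if path.startswith(ACME_PREFIX):
        # HTTP-01: serve the token file straight from dehydrated's challenge dir.
        return ("file", ACME_ROOT + path[len(ACME_PREFIX):])
    # Everything else: 301 to the same host and full URI (query included) over HTTPS.
    return ("redirect", f"https://{host}{request_uri}")
```

For example, `route_port80("example.org", "/login?next=/")` yields a redirect to `https://example.org/login?next=/`, while a challenge path maps to a file under the dehydrated directory.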

## Required metadata

- `vm/cores` — read directly by `items.py` for `worker_processes`.
  No default; `bw items <node>` raises at item-build time if missing.
  Typically supplied by the `vm` bundle / hetzner-vm group;
  double-check on bare-metal hosts.
- `nginx/vhosts` — dict of vhost-name → vhost-config.
- `nginx/modules` — list of dynamic modules to load.
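The sketch below checks the list of required keys above in one pass, for example from a test that inspects metadata you have already dumped into a plain nested dict. It assumes bundlewrap's slash-separated key paths. `get_path` and `missing_keys` are hypothetical helpers, not bundlewrap API.

```python
from __future__ import annotations

REQUIRED = ("vm/cores", "nginx/vhosts", "nginx/modules")


def get_path(metadata: dict, path: str):
    """Walk a nested dict by a slash-separated key such as "vm/cores"."""
    node = metadata
    for part in path.split("/"):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(path)
        node = node[part]
    return node


def missing_keys(metadata: dict, required=REQUIRED) -> list[str]:
    """Return every required key that is absent, in the order given."""
    missing = []
    for key in required:
        try:
            get_path(metadata, key)
        except KeyError:
            missing.append(key)
    return missing
```

An empty `nginx/vhosts` dict counts as present. The check only flags keys that are missing outright.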

## Cross-namespace

`items.py` reads `letsencrypt/domains` to skip emitting a per-vhost
HTTPS block when LE hasn't declared the domain yet — keeps the
bundle loadable on a node where letsencrypt isn't fully wired up.
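The skip rule amounts to a set intersection between the vhost names and the domains letsencrypt knows about. The sketch below is hypothetical: `split_vhosts` is illustrative, not the actual code in `items.py`. It accepts `letsencrypt/domains` as either a dict or a set, since iterating over either yields the domain names.

```python
from __future__ import annotations


def split_vhosts(vhosts: dict, le_domains) -> tuple[dict, list[str]]:
    """Return (vhosts that get an HTTPS block, names skipped because LE lacks them)."""
    declared = set(le_domains)
    emit = {name: cfg for name, cfg in vhosts.items() if name in declared}
    skipped = sorted(name for name in vhosts if name not in declared)
    return emit, skipped
```

Returning the skipped names as well makes it easy to warn about a vhost that silently has no HTTPS block yet.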

@@ -32,13 +32,12 @@ http {

% endif

    # Always defined: serves both WS-enabled vhosts (Connection: upgrade for
    # ws clients) and SSE/keep-alive vhosts (Connection: "" lets nginx manage
    # the upstream connection for keep-alive, instead of forcing "close").
% if has_websockets:
    map $http_upgrade $connection_upgrade {
        default upgrade;
        '' '';
        '' close;
    }
% endif

    include /etc/nginx/sites-enabled/*;
}

@@ -64,7 +64,7 @@ files = {
            'svc_systemd:nginx:restart',
        },
    },
    '/etc/nginx/sites-available/80.conf': {
    '/etc/nginx/sites/80.conf': {
        'triggers': {
            'svc_systemd:nginx:restart',
        },

@@ -33,7 +33,7 @@ for name, unit in node.metadata.get('systemd/units').items():
                'svc_systemd:systemd-networkd.service:restart',
            ],
        }
    elif extension in ['timer', 'service', 'mount', 'swap', 'target', 'slice']:
    elif extension in ['timer', 'service', 'mount', 'swap', 'target']:
        path = f'/usr/local/lib/systemd/system/{name}'
        dependencies = {
            'triggers': [

@@ -8,16 +8,10 @@ server {

    location / {
        proxy_set_header X-Real-IP $remote_addr;
        # Always set Upgrade + Connection via the $connection_upgrade map:
        #   WS client (Upgrade header sent) -> Connection: upgrade
        #   non-WS client (no Upgrade) -> Connection: "" (keep-alive)
        # Lets every vhost serve both WS and SSE without per-vhost flags.
        proxy_http_version 1.1;
% if websockets:
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        # SSE-safe pass-through (also fine for non-SSE traffic):
        proxy_buffering off;
        proxy_read_timeout 1h;
% endif
        proxy_pass ${target};
    }
}
@@ -48,51 +48,3 @@ instead.

See [`conventions.md#secrets`](conventions.md#secrets) for the
demagify magic-string list and the rule's full rationale.

## Read-only commands — useful flag combinations

The fork's [`AGENTS.md`][fork] documents the canonical safety envelope.
These are the flag combinations agents reach for most often in this repo:

| Want to … | Run |
|---|---|
| Sanity-check the whole repo (parse + cross-cutting hooks) | `bw test` (defaults to `-HIJKMSp`) |
| Exercise reactors and item-graph for one node | `bw test <node>` (defaults to `-IJKMp`) |
| Same, but every node that has a given bundle | `bw test bundle:<name>` |
| Print one metadata key for one node | `bw metadata <node> -k <a/b>` (repeat `-k` for more) |
| Show where each metadata value comes from | `bw metadata <node> -b` |
| Resolve Faults (vault values) into the dump | `bw metadata <node> -f` — **may print secrets, avoid** |
| List a node's items, with the bundle that defines each | `bw items <node> --blame` |
| Preview a rendered file's content | `bw items <node> file:<path> -f` |
| Verify against the live host, scoped to one bundle | `bw verify <node> -o bundle:<name>` |
| Hash metadata only (faster than full config hash) | `bw hash <node> -m` |
| Inspect the data backing a hash | `bw hash <node> -d` |

`bw test`, `bw verify`, `bw nodes`, `bw metadata` all share a target-
selector grammar: bare node name, group name, `bundle:<name>`,
`!bundle:<name>`, or `"lambda:node.metadata_get('foo/bar', 0) < 3"`.

[fork]: https://github.com/CroneKorkN/bundlewrap/blob/main/AGENTS.md

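The selector forms listed above can be told apart by prefix alone. The classifier below is a hypothetical illustration and is not bw's own parser (`classify_selector` is an invented name). It only shows which form a given target string falls under. A bare name stays ambiguous on purpose, because only the repository knows whether it names a node or a group.

```python
from typing import Tuple


def classify_selector(target: str) -> Tuple[str, str]:
    """Return (kind, value) for one target-selector string (illustrative only)."""
    if target.startswith('!bundle:'):
        return ('not-bundle', target[len('!bundle:'):])
    if target.startswith('bundle:'):
        return ('bundle', target[len('bundle:'):])
    if target.startswith('lambda:'):
        # The remainder is a Python expression over `node`.
        return ('lambda', target[len('lambda:'):])
    # Node or group: resolving which one needs the repo itself.
    return ('node-or-group', target)


for t in ['ovh.left4me', 'bundle:nginx', '!bundle:nginx',
          "lambda:node.metadata_get('foo/bar', 0) < 3"]:
    print(t, '->', classify_selector(t))
```

On the shell, quote the `lambda:` form as the table does. Otherwise the `<` and parentheses are interpreted by the shell before bw sees them.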
## Bundle-validation workflow

`bw test` (no args) is a *parsing* gate, not a *behaviour* gate. It
loads every bundle, but a bundle's reactors only resolve when a node's
metadata is actually built — and that happens only for nodes that
opt in. Until then, reactor bugs stay dormant. bw rejects reactors
that don't read any metadata, but the rejection only fires once *some*
node consumes the bundle.

When developing a new bundle:

1. Scaffold + `bw test` — confirms parsing.
2. **Attach the bundle to one node** (or a stub node) by adding it to
`nodes/<n>.py`'s `bundles` list, or to a group the node is in.
3. `bw test <node>` — now reactors fire. This is where bundle bugs
surface.
4. `bw items <node> --blame` and `bw metadata <node> -k <key>` —
confirm items materialise and derived metadata looks right.
5. `bw hash <node>` — preview against the live host.

Step 2 is non-optional. A bundle that "passes `bw test`" with no
consumer is proven only to parse.

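The workflow above can be scripted as a fixed sequence of read-only commands. This is a sketch under two assumptions: `bw` is on `PATH` (this repo runs `.venv/bin/bw`), and the helper names `validation_steps` and `run_steps` are hypothetical. Step 2 is a manual edit to a node or group file, so it has no command. Every command used appears in the read-only allowlist.

```python
import shlex
import subprocess
from typing import List


def validation_steps(node: str, metadata_key: str) -> List[List[str]]:
    """Commands for steps 1 and 3-5; step 2 (attach the bundle) is a manual edit."""
    return [
        ['bw', 'test'],                                # 1. parse gate
        ['bw', 'test', node],                          # 3. reactors fire
        ['bw', 'items', node, '--blame'],              # 4. items + defining bundle
        ['bw', 'metadata', node, '-k', metadata_key],  # 4. derived metadata
        ['bw', 'hash', node],                          # 5. preview against the live host
    ]


def run_steps(node: str, metadata_key: str) -> None:
    """Run each step in order and stop at the first non-zero exit."""
    for cmd in validation_steps(node, metadata_key):
        print('$', shlex.join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

For example, after adding the bundle to a node you would call `run_steps('<node>', '<a/b>')` with real values. Stopping at the first failure keeps the step order meaningful, because a parse error in step 1 makes the later steps noise.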
@@ -127,12 +127,6 @@ bundle.

## 3. Per-bundle `AGENTS.md` template

> **Status: replaced — pre-pivot intent only.** Per-bundle docs are plain
> `README.md` with no fixed structure. See §0 Revisions and the
> "Per-bundle README" section in [`bundles/AGENTS.md`](../../../bundles/AGENTS.md)
> for the current convention. The template below is kept as a record of
> the original design.

One balanced doc serving both audiences. Prose where prose helps, structure
where structure helps. Sections in order:

@@ -345,12 +339,6 @@ in 30–120 lines each; root `AGENTS.md` is ~150 lines.

### Phase 2 — seed bundles (10)

> **Status: dropped — pre-pivot intent only.** Phase 2 didn't ship. After
> Phase 1 landed, the maintainer pulled the per-bundle `AGENTS.md`
> migration: the rigid template proved a poor fit for the heterogeneous
> existing READMEs. See §0 Revisions. The seed list and migration plan
> below are kept as a record of how the work was scoped.

Bundles selected empirically (node+group references and recent commit
activity, validated 2026-05-10):

@@ -1,253 +0,0 @@
# Round 1 — agent-doc refactor (gaps 1–6 + cmd cheat sheet)

## Why

A previous session integrated `bundles/left4me/` and brought
`ovh.left4me` live. The integration produced a handoff (at
`~/.claude/plans/2026-05-10-ckn-bw-docs-improvements-handoff.md`)
listing 12 documentation gaps surfaced by the work. This spec covers
the first six (the cross-cutting ones) plus a useful side-quest:
adding a read-only command cheat sheet to `docs/agents/commands.md`.
Gaps 7–12 (item-specific, bundle READMEs) are deferred to a follow-up
round.

## Scope

In:

- Gap 1 — drop `bw bundles` (doesn't exist), add `bw verify` to the
read-only allowlist.
- Gap 2 — bundle-validation workflow needs a node attached.
- Gap 3 — nodes carry only node-specific metadata (split across
`bundles/AGENTS.md` and `nodes/AGENTS.md`).
- Gap 4 — reactors must read metadata or be defaults.
- Gap 5 — `triggers` ↔ `triggered: True` invariant + self-healing
pattern.
- Gap 6 — `unless` semantics (folded into Gap 5's second bullet).
- Side-quest: read-only command cheat sheet in `commands.md` (`bw
test` flag matrix + selectors, `bw metadata -k/-b/-f`, `bw items
--blame/-f`, `bw verify -o bundle:`, `bw hash -m/-d`).

Out:

- Gaps 7–12 (`source` implicit, `git_deploy` chown, `git_deploy` URL
form, letsencrypt/bind/nginx READMEs).
- Any change to bundle behaviour. This is pure docs; if a doc claim
feels wrong, push back to the maintainer rather than editing
`.py`.

## Verification approach

For each gap, find current line numbers in the target doc (handoff
line numbers are May 2026; some have drifted). Verify code-level
claims against the fork source under `.venv/src/bundlewrap/` before
quoting them.

Already verified during brainstorm:

- Gap 1: `bw bundles` is not a subcommand of the installed fork
(`.venv/bin/bw --help` lists only
`apply, debug, diff, groups, hash, ipmi, items, lock, metadata,
nodes, plot, pw, repo, run, stats, test, verify, zen`). `bw verify`
is read-only.
- Gap 2: `bw test` default flag set differs by mode. Whole-repo:
`-HIJKMSp`. Node-targeted: `-IJKMp`. The repo-mode adds `-H`
(repo hooks) and `-S` (subgroup-loops); the node-mode adds `-J`
(node hooks). Reactors only resolve when a node's metadata is
built, which only happens when a node opts into the bundle.
- Gap 4: exact wording at `metagen.py:428`:
`"{reactor_name} on {node_name} did not request any metadata, you
might want to use defaults instead"`.
- Gap 5: exact wording at `deps.py:340`:
`"'{item1}' in bundle '{bundle1}' triggered by '{item2}' in bundle
'{bundle2}', but missing 'triggered' attribute"`.
- Gap 3 precedent: `bundles/left4me/metadata.py:10` is the canonical
random-bytes-in-defaults example. `bundles/postgresql/metadata.py:4`
is the password_for-at-module-scope example. (The handoff cites
postgresql for the random-bytes pattern; that's a misattribution —
postgresql uses `password_for`.)

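The two default flag strings quoted in the Gap 2 bullet can be compared as sets of single-letter flags. This is arithmetic on the strings as written in the spec. It does not query bw.

```python
# Default `bw test` flag strings, as quoted in the Gap 2 bullet.
REPO_DEFAULTS = set('HIJKMSp')  # `bw test` with no target
NODE_DEFAULTS = set('IJKMp')    # `bw test <node>`

only_repo = sorted(REPO_DEFAULTS - NODE_DEFAULTS)
only_node = sorted(NODE_DEFAULTS - REPO_DEFAULTS)

print('only in whole-repo mode:', only_repo)    # ['H', 'S']
print('only in node-targeted mode:', only_node)  # []
```

Taken literally, the strings make the node-targeted default a strict subset of the whole-repo default. `-H` and `-S` are the only differences, and `-J` appears in both.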
After every commit: `.venv/bin/bw test` must pass with the same
output as before. Pure-docs edits cannot break it unless a `.py` is
touched accidentally.

## Commits

Six iterative commits, matching repo style.

### Commit 1 — drop `bw bundles`, add `bw verify` (Gap 1)

`AGENTS.md` rule 1 only. The handoff also flagged
`bundles/AGENTS.md:60-64`, but that list no longer references
`bw bundles` (it currently reads `bw test` / `bw items` / `bw hash`).
That section gets rewritten in commit 3, not here.

```diff
- to `bw test`, `bw nodes`, `bw groups`, `bw bundles`,
- `bw items`, `bw metadata`, `bw hash`, `bw debug`. See
+ to `bw test`, `bw nodes`, `bw groups`, `bw items`,
+ `bw metadata`, `bw hash`, `bw verify`, `bw debug`. See
```

### Commit 2 — read-only command cheat sheet

Append to `docs/agents/commands.md`. New H2 section, table format
to match the existing voice.

```markdown
## Read-only commands — useful flag combinations

The fork's [`AGENTS.md`][fork] documents the canonical safety envelope.
These are the flag combinations agents reach for most often in this repo:

| Want to … | Run |
|---|---|
| Sanity-check the whole repo (parse + cross-cutting hooks) | `bw test` (defaults to `-HIJKMSp`) |
| Exercise reactors and item-graph for one node | `bw test <node>` (defaults to `-IJKMp`) |
| Same, but every node that has a given bundle | `bw test bundle:<name>` |
| Print one metadata key for one node | `bw metadata <node> -k <a/b>` (repeat `-k` for more) |
| Show where each metadata value comes from | `bw metadata <node> -b` |
| Resolve Faults (vault values) into the dump | `bw metadata <node> -f` — **may print secrets, avoid** |
| List a node's items, with the bundle that defines each | `bw items <node> --blame` |
| Preview a rendered file's content | `bw items <node> file:<path> -f` |
| Verify against the live host, scoped to one bundle | `bw verify <node> -o bundle:<name>` |
| Hash metadata only (faster than full config hash) | `bw hash <node> -m` |
| Inspect the data backing a hash | `bw hash <node> -d` |

`bw test`, `bw verify`, `bw nodes`, `bw metadata` all share a target-
selector grammar: bare node name, group name, `bundle:<name>`,
`!bundle:<name>`, or `"lambda:node.metadata_get('foo/bar', 0) < 3"`.

[fork]: https://github.com/CroneKorkN/bundlewrap/blob/main/AGENTS.md
```

### Commit 3 — bundle validation needs a node attached (Gap 2)

Two file changes.

**`bundles/AGENTS.md` lines 59-64** — replace the Verify list:

```markdown
5. **Verify, in this order:**
- `bw test` — repo-wide parse + cross-cutting hooks. Loads every
bundle, but reactors don't fire for nodes that haven't opted into
the bundle yet — bugs in new reactors stay hidden here.
- **Attach the bundle to a node** (via the node's `bundles` list, or
a group it belongs to). Until you do, the next steps don't actually
exercise the bundle.
- `bw test <node>` — exercises every reactor and item-graph edge for
that node. This is where most new-bundle bugs surface.
- `bw items <node> --blame` — confirm items materialise with the right
paths, authored by the expected bundle.
- `bw metadata <node> -k <a/b>` — spot-check derived metadata.
- `bw hash <node>` — preview vs current host state.

See [`docs/agents/commands.md#bundle-validation-workflow`](../docs/agents/commands.md#bundle-validation-workflow)
for the rationale.
```

**`docs/agents/commands.md`** — new section after the cheat sheet:

```markdown
## Bundle-validation workflow

`bw test` (no args) is a *parsing* gate, not a *behaviour* gate. It
loads every bundle, but a bundle's reactors only resolve when a node's
metadata is actually built — and that happens only for nodes that
opt in. Until then, reactor bugs stay dormant. bw rejects reactors that
don't read any metadata, but the rejection only fires once *some* node
consumes the bundle.

When developing a new bundle:

1. Scaffold + `bw test` — confirms parsing.
2. **Attach the bundle to one node** (or a stub node) by adding it to
`nodes/<n>.py`'s `bundles` list, or to a group the node is in.
3. `bw test <node>` — now reactors fire. This is where bundle bugs
surface.
4. `bw items <node> --blame` and `bw metadata <node> -k <key>` — confirm
items materialise and derived metadata looks right.
5. `bw hash <node>` — preview against the live host.

Step 2 is non-optional. A bundle that "passes `bw test`" with no consumer
is proven only to parse.
```

### Commit 4 — nodes carry only node-specific metadata (Gap 3)

**`bundles/AGENTS.md` Conventions** — new bullet:

```markdown
- **Bundles own application-wide knowledge; nodes carry only the few
per-host knobs the bundle actually needs.** When designing a bundle,
identify the per-node knobs (e.g. domain, uplink interface, a
vault-id suffix) and put everything else in `defaults`, or in a
reactor that derives from those knobs. Per-node random secrets
belong in `defaults` via `repo.vault.random_bytes_as_base64_for(...)`
keyed on the node — not in the node file. See
`bundles/left4me/metadata.py:10` (`secret_key` derived in defaults)
and `bundles/postgresql/metadata.py:4` (vault-derived `password_for`
at module scope).
```

**`nodes/AGENTS.md` Pitfalls** — new bullet:

```markdown
- **Bloated per-node metadata is usually a bundle smell.** If a
bundle's metadata block in the node file has more than 3-5 keys,
the bundle is probably under-using `defaults` / reactors. Push the
contribution into the bundle (see
[`bundles/AGENTS.md`](../bundles/AGENTS.md#conventions)) rather than
growing the node file.
```

### Commit 5 — reactors must read metadata or be defaults (Gap 4)

**`bundles/AGENTS.md` Pitfalls** — new bullet:

```markdown
- **Reactors must read metadata.** If a reactor body returns a static
dict without calling `metadata.get(...)`, bw raises
`ValueError: <reactor> on <node> did not request any metadata, you
might want to use defaults instead` once a node consumes the bundle.
Fix: fold the contribution into `defaults`. The rule applies even
when the reactor writes into another bundle's namespace — a static
contribution to e.g. `nftables/output` belongs in `defaults`, where
bw merges it with other bundles' contributions.
```

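The Commit 5 rule can be modelled in a few lines. This sketch is not bw's implementation. It wraps the metadata in a recorder that notes every key a reactor reads, then raises a `ValueError` with the text quoted from `metagen.py:428` when nothing was read. `RecordingMetadata`, `run_reactor`, both reactor bodies, and the contribution shapes they return are hypothetical. The flat `{'a/b': value}` dict is a simplification of real node metadata.

```python
from typing import Any, Callable, Dict, List


class RecordingMetadata:
    """Wraps a flat {'a/b': value} dict and remembers which keys were read."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._data = data
        self.requested: List[str] = []

    def get(self, path: str, default: Any = None) -> Any:
        self.requested.append(path)
        return self._data.get(path, default)


def run_reactor(reactor: Callable[[RecordingMetadata], dict],
                node_name: str, data: Dict[str, Any]) -> dict:
    metadata = RecordingMetadata(data)
    result = reactor(metadata)
    if not metadata.requested:
        raise ValueError(
            f"{reactor.__name__} on {node_name} did not request any metadata, "
            "you might want to use defaults instead"
        )
    return result


def static_rule(metadata):
    # Hypothetical static contribution: reads nothing, so it belongs in `defaults`.
    return {'nftables': {'output': {'hypothetical static rule'}}}


def derived_rule(metadata):
    # Hypothetical contribution derived from a per-node knob, so a reactor fits.
    domain = metadata.get('left4me/domain')
    return {'nginx': {'vhosts': {domain: {}}}}
```

Running `static_rule` through `run_reactor` fails, while `derived_rule` passes because it reads `left4me/domain`. Moving the static dict into `defaults` removes the failure without any reactor at all.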
### Commit 6 — `triggers` invariant + self-healing + `unless` (Gaps 5+6)

**`bundles/AGENTS.md` Pitfalls** — two new bullets (Gap 6's `unless`
semantics fold into the second; cleaner than three bullets):

```markdown
- **`triggers` ↔ `triggered: True` invariant.** Any item listed in
another's `triggers` list must declare `triggered: True`. bw
enforces this at `bw test` time: *"…triggered by …, but missing
'triggered' attribute"*. Corollary: an action can't be both in an
upstream `triggers` list AND self-healing every apply — pick one.

- **Triggered actions don't recover from partial failure.** When an
upstream item's apply succeeds but its triggered downstream action
fails, subsequent applies can't recover via the trigger chain —
upstream is "already in desired state" and never re-triggers. For
actions that must self-heal (pip installs, chowns, migrations),
drop `triggered: True` and gate the command with `unless:
<fast-check>`. `unless` is a shell command on the target host whose
exit status decides whether the main command runs (exit 0 = skip);
it's checked at fire time, after `triggered:` filtering.
```

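Both Commit 6 rules can be modelled outside bw, using a plain dict in place of a bundle's items. The item ids such as `action:restart_app` are hypothetical. `check_triggers` reports every known trigger target that lacks `triggered: True`, using the message text quoted from `deps.py:340`. `should_run` shows the `unless` gate, where exit status 0 means the main command is skipped. It runs the check locally with `subprocess`, whereas bw runs it on the target host.

```python
import subprocess
from typing import Dict, List


def check_triggers(items: Dict[str, dict], bundle: str) -> List[str]:
    """Return one error per trigger target that lacks `triggered: True`."""
    errors = []
    for name, attrs in items.items():
        for target in attrs.get('triggers', []):
            # Unknown targets are out of scope for this sketch.
            if target in items and not items[target].get('triggered', False):
                errors.append(
                    f"'{target}' in bundle '{bundle}' triggered by "
                    f"'{name}' in bundle '{bundle}', but missing 'triggered' attribute"
                )
    return errors


def should_run(unless: str) -> bool:
    """`unless` gate: exit status 0 means 'already in desired state, skip'."""
    return subprocess.run(unless, shell=True).returncode != 0


items = {
    'file:/etc/app.conf': {'triggers': ['action:restart_app']},
    'action:restart_app': {'command': 'systemctl restart app'},  # no triggered: True
}
print(check_triggers(items, 'app'))
```

A self-healing action is the other branch of the "pick one" corollary. It drops `triggered: True`, stays out of every `triggers` list, and relies on `should_run` returning `False` once the fast check passes.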
## Out of scope

- Gaps 7–12 — deferred. The maintainer re-engages after this round.
- Bundle behaviour changes. Pure docs.
- `bw apply` / `bw run` — not authorised this session.

## Constraints

- Don't echo decrypted secrets in commit messages or new doc text.
- Don't restore `*.py_` parked nodes.
- After each commit, `.venv/bin/bw test` must pass.
- No push.

@@ -1,286 +0,0 @@
# Round 2 — agent-doc refactor (gaps 7–12)

## Why

Continuation of round 1 (spec at
`2026-05-10-ckn-bw-agents-md-refactor-round-1-design.md`). Round 1
landed the cross-cutting lessons (read-only allowlist, bundle
validation needs a node, nodes-carry-only-node-specific-metadata,
reactors-must-read-metadata, triggers/triggered:True invariant,
self-healing pattern). Round 2 covers the remaining six gaps: built-in
item-type gotchas and three bundle READMEs.

## Scope

In:

- Gap 7 — `file:`'s `source` defaults to the basename of the destination.
- Gap 8 — `git_deploy` extracts as the connecting user (root after
sudo); chown action needed for non-root downstream consumers.
- Gap 9 — `git_deploy` URL form: `://` triggers per-apply clone, no `://`
requires a `git_deploy_repos` map at the repo root.
- Gap 10 — `bundles/letsencrypt`: first-apply behaviour, DNS-01
prerequisites, negative-cache penalty.
- Gap 11 — `bundles/bind`: applying changes to a `master_node`-linked
pair needs `bw apply` on both ends.
- Gap 12 — `bundles/nginx`: how port 80 is served, `vm/cores`
requirement.

Out:

- Bundle behaviour changes. Pure docs.
- `bw apply` / `bw run` — not authorised this session.

## Placement decision (diverges from the handoff)

The handoff suggests `items/AGENTS.md` for gaps 7, 8, 9. But
`items/AGENTS.md` is scoped to **custom** item types in the `items/`
directory — its first sentence: *"Custom item types — each `*.py` is
a `bundlewrap.items.Item` subclass…"*. Built-in gotchas (`file:`,
`git_deploy:`) don't fit there.

Round-1 lessons about built-in mechanics (reactors must read metadata,
`triggers` invariant, self-healing pattern) all landed in
`bundles/AGENTS.md` Pitfalls. Gaps 7, 8, 9 are the same shape, so
they go in the same place.

## Validation findings

- Gap 7: well-known bw built-in semantics. Trusting the handoff.
- Gap 8: confirmed at `.venv/src/bundlewrap/bundlewrap/items/git_deploy.py`'s
`fix()` method — uses `self.node.upload(...)` which writes as the sudo
user (root). Files end up root-owned.
- Gap 9: confirmed in round 1 (`git_deploy.py:103` —
`if "://" in self.attributes['repo']:`).
- Gap 10: confirmed `/etc/dehydrated/letsencrypt-ensure-some-certificate`
exists in the bundle; runs on every domain with idempotent `unless`.
Daily timer at `/usr/bin/dehydrated --cron --accept-terms --challenge dns-01`.
- Gap 11: nuanced. The bundle DOES set `bind/type = 'slave'` and renders
different named.conf.local for slaves, so bind itself may AXFR at
runtime. But the slave's *bw-managed* zone files are statically
rendered from the master's metadata at slave-apply time
(`bundles/bind/items.py:100`). The practical workflow rule — "apply
both" — is correct regardless. I'll frame the README as the workflow
rule, not the absolute "not AXFR slaving" claim from the handoff.
- Gap 12: confirmed `nginx.conf:42` includes `/etc/nginx/sites-enabled/*`;
`nginx/items.py:35` reads `node.metadata.get('vm/cores')` with no
default. README does not exist.

## Existing README states

- `bundles/letsencrypt/README.md` — 9 lines: upstream link + nsupdate
snippet. Reshape into an operational README; keep the nsupdate snippet.
- `bundles/bind/README.md` — does not exist. Create.
- `bundles/nginx/README.md` — does not exist. Create.

## Commits

### Commit 7 — `file:` source defaults to destination basename (Gap 7)

`bundles/AGENTS.md` Pitfalls — new bullet:

```markdown
- **`file:` `source` defaults to the destination basename.** For a
destination of `/etc/foo/bar.conf` with no `source` key, bw looks for
`bundles/<bundle>/files/bar.conf`. Only declare `source` explicitly
when the basename you want differs (e.g. shipping a Mako template
named `bar.conf.mako` to a destination of `/etc/foo/bar.conf`).
```

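The lookup rule in Commit 7 reduces to one line of path handling. The helper below is hypothetical and covers only the rule as stated: use `source` if given, otherwise the destination's basename, resolved under the bundle's `files/` directory. bw's own lookup may involve more than this.

```python
from pathlib import PurePosixPath
from typing import Optional


def source_file(bundle: str, destination: str, source: Optional[str] = None) -> str:
    """Where a `file:` item's content is looked up, per the rule above."""
    name = source if source is not None else PurePosixPath(destination).name
    return f'bundles/{bundle}/files/{name}'


print(source_file('foo', '/etc/foo/bar.conf'))                   # bundles/foo/files/bar.conf
print(source_file('foo', '/etc/foo/bar.conf', 'bar.conf.mako'))  # bundles/foo/files/bar.conf.mako
```

The directory part of the destination never affects the lookup. Two `file:` items with the same basename in different directories therefore need an explicit `source` on at least one of them.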
### Commit 8 — `git_deploy` gotchas (Gaps 8 + 9)

`bundles/AGENTS.md` Pitfalls — two new bullets.

```markdown
- **`git_deploy` extracts as the connecting (sudo) user — files end up
root-owned.** A downstream action that runs as a non-root app user
(typical: editable pip install, Rails bundle install) will hit
`Permission denied` on `.egg-info` or similar. The fix is a
self-healing chown action between `git_deploy` and the downstream
action:

```python
actions['<bundle>_chown_src'] = {
'command': 'chown -R <user>:<group> <path>',
'unless': 'test -z "$(find <path> ! -user <user> -print -quit)"',
'cascade_skip': False,
'needs': ['git_deploy:<path>', 'user:<user>', 'group:<group>'],
}
```

See `bundles/left4me/items.py` for an in-tree example.

- **`git_deploy` URL form matters.** A URL containing `://` (HTTP/HTTPS,
`ssh://`) makes bw clone to a temp dir per-apply — no operator-side
state needed. Without `://` (SCP-style `git@host:path`), bw expects a
`git_deploy_repos` map file at the repo root pointing at a long-lived
local clone, and raises `RepositoryError('missing repo map for
git_deploy')` if it isn't there. For HTTPS-reachable repos use the
HTTPS form; for SSH-only, prefer the explicit `ssh://user@host/path`
form so the map isn't needed.
```

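The URL-form rule hinges on a single substring test, quoted in the validation findings as `if "://" in self.attributes['repo']:` from `git_deploy.py:103`. The function below mirrors only that condition. Its name and the example URLs are placeholders.

```python
def needs_repo_map(repo: str) -> bool:
    """True if git_deploy would look this repo up in `git_deploy_repos`.

    A repo string containing '://' is cloned to a temp dir per apply;
    anything else (e.g. SCP-style 'git@host:path') needs the map file.
    """
    return '://' not in repo


for url in ['https://git.example.org/app.git',
            'ssh://git@git.example.org/app.git',
            'git@git.example.org:app.git']:
    print(url, '-> needs map' if needs_repo_map(url) else '-> per-apply clone')
```

The check is purely syntactic, so the same remote can land in either branch depending on how the URL is spelled.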
### Commit 9 — letsencrypt README (Gap 10)

Reshape `bundles/letsencrypt/README.md`. Keep the upstream link and
nsupdate snippet at the top; add three structured sections.

```markdown
# letsencrypt

Issues and renews Let's Encrypt certs via [dehydrated][upstream] with
DNS-01 against the in-house bind-acme server.

[upstream]: https://github.com/dehydrated-io/dehydrated/wiki/example-dns-01-nsupdate-script

## First-apply behaviour

Immediately after `bw apply <node>`, nginx serves a **self-signed
cert** for each declared domain — generated by
`/etc/dehydrated/letsencrypt-ensure-some-certificate` so nginx has
something to start with. The real Let's Encrypt cert arrives at most
24h later when the systemd timer fires
(`/usr/bin/dehydrated --cron --accept-terms --challenge dns-01`). To
shortcut the wait:

```sh
ssh <node> 'sudo /usr/bin/dehydrated --cron --accept-terms --challenge dns-01'
ssh <node> 'sudo systemctl reload nginx'
```

## DNS-01 prerequisites

`hook.sh` does `nsupdate` against the bind-acme server (referenced
by `letsencrypt/acme_node`). For the challenge to succeed:

1. The acme node must be in the same metadata graph (so
`bw metadata <node> -k letsencrypt/acme_node` resolves).
2. **All NS servers** for the validated domain must serve the
`_acme-challenge.<domain>` CNAME — Let's Encrypt validates from
primary AND secondary geographic regions; both authoritative
servers must agree. If a secondary NS is also a bw-managed node,
`bw apply` it after adding the domain (see e.g. `ovh.secondary`).
3. The bind-acme node's TSIG key must be reachable. `hook.sh` is
rendered with the bind-acme server's `network/internal/ipv4` —
for clients outside that LAN, the route must exist (typically via
wireguard `s2s` peer membership).

## Negative-cache penalty

If the first DNS-01 attempt fails (e.g. zone not yet applied to the
secondary NS), Let's Encrypt's resolvers cache NXDOMAIN for the SOA's
negative TTL (often 900s = 15 min). Subsequent attempts during that
window also fail and refresh the cache. Combined with LE's rate limit
of **5 failed authorisations per domain per hour**, recovery requires
you to **stop retrying** for ~15 minutes after fixing the DNS, then
make at most one attempt.

## nsupdate sample

For interactive testing of the bind-acme TSIG path:

```sh
printf "server 127.0.0.1
zone acme.resolver.name.
update add _acme-challenge.ckn.li.acme.resolver.name. 600 IN TXT \"hello\"
send
" | nsupdate -y hmac-sha512:acme:<TSIG_KEY_REDACTED>
```
```

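The negative-cache arithmetic in Commit 9 can be made concrete. The two constants come from the README text: a 900 s negative TTL ("often", so read the real value from the zone's SOA) and 5 failed authorisations per domain per hour. Counting failures in a trailing one-hour window is a simplifying assumption about how the limit is applied. Both helper names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List

NEGATIVE_TTL = timedelta(seconds=900)  # "often 900s"; check the SOA's negative TTL
FAILED_AUTHZ_LIMIT = 5                 # failed authorisations per domain per hour
WINDOW = timedelta(hours=1)            # assumed trailing window


def earliest_retry(dns_fixed_at: datetime, negative_ttl: timedelta = NEGATIVE_TTL) -> datetime:
    """Any NXDOMAIN cached before the fix has expired by this moment."""
    return dns_fixed_at + negative_ttl


def attempts_left(failures: List[datetime], now: datetime) -> int:
    """Failed attempts still allowed inside the trailing one-hour window."""
    recent = [t for t in failures if now - t < WINDOW]
    return max(0, FAILED_AUTHZ_LIMIT - len(recent))


fixed = datetime(2026, 5, 10, 12, 0)
print(earliest_retry(fixed))  # 2026-05-10 12:15:00
```

Waiting until `earliest_retry` is safe because every NXDOMAIN cached before the fix was cached at or before the fix time, so it has expired by then. Each premature retry only spends one of the hourly attempts.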
### Commit 10 — bind README (Gap 11, reframed)

Create `bundles/bind/README.md`. Frame as the workflow rule, not the
absolute "not AXFR" claim.

```markdown
# bind

Authoritative DNS — primary plus optional `bind/master_node` slaves.

## Applying changes needs both nodes

The slave's bw-managed zone files are rendered from the master's
metadata at slave-apply time (see `bundles/bind/items.py:100`). When
you change a record on the master (adding a `letsencrypt/domains`
entry, a new vhost, etc.), the change is only published once you
apply BOTH:

```sh
bw apply htz.mails # primary (where the source records live)
bw apply ovh.secondary # secondary (renders its own zone files)
```

Until both have been applied, `bw verify ovh.secondary` will show
stale zones and consumers that hit the secondary (Let's Encrypt's
secondary-region validators in particular) will see NXDOMAIN. Even
though the slave's named.conf.local declares `type slave;`, don't
rely on bind's own AXFR catching up — the bw-rendered file on disk
is what `bw verify` measures.

## See also

- `bundles/bind-acme/` — the in-house ACME-update receiver.
- `bundles/letsencrypt/README.md` — DNS-01 prerequisites and the
negative-cache penalty (the most common operational consequence of
forgetting to apply the secondary).
```

### Commit 11 — nginx README (Gap 12)

Create `bundles/nginx/README.md`.

```markdown
# nginx

Webserver. Per-node vhosts in `nginx/vhosts`; per-vhost templates in
`data/nginx/*.conf`.

## How port 80 is served

The bundle ships a fixed `80.conf` to
`/etc/nginx/sites-available/80.conf` (picked up by the
`sites-enabled/` symlink) that handles **all** port-80 traffic
across vhosts:

1. ACME HTTP-01 challenges (`/.well-known/acme-challenge/`) are
served from `/var/lib/dehydrated/acme-challenges/`.
2. All other port-80 requests are 301-redirected to
`https://$host$request_uri`.

Per-vhost templates only declare `listen 443 ssl http2;`, so they
don't need their own port-80 server blocks. If you need vhost-
specific port-80 behaviour (e.g. plain-HTTP without redirect), you'll
need to override 80.conf or add a per-vhost block.

## Required metadata

- `vm/cores` — read directly by `items.py` for `worker_processes`.
No default; `bw items <node>` raises at item-build time if missing.
Typically supplied by the `vm` bundle / hetzner-vm group; double-
check on bare-metal hosts.
- `nginx/vhosts` — dict of vhost-name → vhost-config.
- `nginx/modules` — list of dynamic modules to load.

## Cross-namespace

`items.py` reads `letsencrypt/domains` to skip emitting a per-vhost
HTTPS block when LE hasn't declared the domain yet — keeps the bundle
loadable on a node where letsencrypt isn't fully wired up.
```

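The two port-80 rules in the nginx README can be modelled as one routing function. This is a sketch, not the contents of `80.conf`. It assumes the challenge location maps the token directly into `/var/lib/dehydrated/acme-challenges/` (an `alias`-style mapping, which is an assumption). `handle_port_80` and the returned labels are hypothetical.

```python
from typing import Tuple

CHALLENGE_PREFIX = '/.well-known/acme-challenge/'
CHALLENGE_DIR = '/var/lib/dehydrated/acme-challenges/'


def handle_port_80(host: str, request_uri: str) -> Tuple[str, str]:
    """Return ('serve', file path) for ACME challenges, else ('redirect-301', URL)."""
    path = request_uri.split('?', 1)[0]
    if path.startswith(CHALLENGE_PREFIX):
        # Assumed alias-style mapping: the token becomes a file in CHALLENGE_DIR.
        return ('serve', CHALLENGE_DIR + path[len(CHALLENGE_PREFIX):])
    # $request_uri keeps the query string, so the redirect keeps it too.
    return ('redirect-301', f'https://{host}{request_uri}')


print(handle_port_80('left4.me', '/.well-known/acme-challenge/abc123'))
print(handle_port_80('left4.me', '/login?next=/'))
```

Because the redirect is built from `$host`, one server block covers every vhost on the node. That is why per-vhost templates can stay HTTPS-only.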
## Out of scope

- Bundle behaviour changes. Pure docs.
- `bw apply` / `bw run`.
- Reformatting the existing two-line bundle READMEs into the new
shape — bundles/AGENTS.md explicitly says don't do that
("uneven quality is part of what we accept in exchange for not
blocking other work").

## Constraints

- Don't echo decrypted secrets. The TSIG-key example in the
letsencrypt nsupdate snippet uses `<TSIG_KEY_REDACTED>`.
- After each commit, `.venv/bin/bw test` must pass.
- No push.

@@ -1,5 +0,0 @@
{
'bundles': {
'left4me',
},
}

@@ -81,12 +81,6 @@ This loader shape has consequences:
These are intentional parks/buffers, not bugs.
- **`id` must be unique.** A pre-apply hook (`hooks/unique_node_ids.py`)
enforces this; duplicate IDs fail `bw test` and `bw apply`.
- **Bloated per-node metadata is usually a bundle smell.** If a
bundle's metadata block in the node file has more than 3-5 keys,
the bundle is probably under-using `defaults` / reactors. Push the
contribution into the bundle (see
[`bundles/AGENTS.md`](../bundles/AGENTS.md#conventions)) rather than
growing the node file.

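The uniqueness rule enforced by `hooks/unique_node_ids.py` comes down to grouping nodes by their `id`. The hook's source is not part of this diff. The function below is a hypothetical reimplementation of the check it is described as performing. It takes a `{node name: id}` mapping and leaves aside how the hook collects those ids from the repo.

```python
from collections import defaultdict
from typing import Dict, List


def duplicate_ids(node_ids: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each id used by more than one node to the sorted names of those nodes."""
    by_id: Dict[str, List[str]] = defaultdict(list)
    for node, node_id in node_ids.items():
        by_id[node_id].append(node)
    return {i: sorted(nodes) for i, nodes in by_id.items() if len(nodes) > 1}


print(duplicate_ids({'a': 'x', 'b': 'y', 'c': 'x'}))  # {'x': ['a', 'c']}
```

Reporting every node that shares an id, rather than failing on the first clash, tells the operator which files to edit in one pass.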
## See also

@@ -233,7 +233,6 @@
'10.0.229.0/24',
],
},
'ovh.left4me': {},
},
'clients': {
'macbook': {

@@ -1,21 +1,15 @@
{
'hostname': '141.95.32.8',
'username': 'debian',
'groups': [
'backup',
'debian-13',
'left4me',
'monitored',
'webserver',
],
'bundles': [
'wireguard',
#'wireguard',
],
'metadata': {
'id': '14d2abc-3855-4bb7-99e2-d4e3eb0344dd',
'vm': {
'cores': 4, # 4 physical, 8 with HT
'threads': 8,
},
'network': {
'external': {
'interface': 'enp3s0f0',

@@ -41,12 +35,5 @@
},
},
},
'left4me': {
'domain': 'left4.me',
# Both HT siblings of physical core 0 (cpu0+cpu4 per
# /sys/devices/system/cpu/cpu0/topology/thread_siblings_list).
# Keeps system work off the physical cores running game ticks.
'system_cpus': {0, 4},
},
},
}

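The `system_cpus` value above was read off `thread_siblings_list`, which uses the kernel's CPU-list format (comma-separated CPUs or ranges such as `0,4` or `0-1`). A small parser run on the target host would reproduce it. `parse_cpu_list` and `thread_siblings` are hypothetical helpers, not part of the left4me bundle.

```python
from pathlib import Path
from typing import Set


def parse_cpu_list(text: str) -> Set[int]:
    """Parse the kernel CPU-list format, e.g. '0,4', '0-1' or '0-3,8-11'."""
    cpus: Set[int] = set()
    for part in text.strip().split(','):
        if not part:
            continue
        if '-' in part:
            start, end = part.split('-', 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def thread_siblings(cpu: int = 0) -> Set[int]:
    """HT siblings of `cpu` on the machine this runs on (Linux sysfs only)."""
    path = Path(f'/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list')
    return parse_cpu_list(path.read_text())


print(parse_cpu_list('0,4\n'))  # {0, 4}
```

Run on the node itself, `thread_siblings(0)` should return the same set as `system_cpus` if the topology comment still holds. On a host whose kernel numbers siblings adjacently, it would return something like `{0, 1}` instead.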