diff --git a/docs/superpowers/specs/2026-05-15-session-handoff.md b/docs/superpowers/specs/2026-05-15-session-handoff.md
index 0043f64..005372f 100644
--- a/docs/superpowers/specs/2026-05-15-session-handoff.md
+++ b/docs/superpowers/specs/2026-05-15-session-handoff.md
@@ -1,178 +1,126 @@
-# Session handoff — next: write the hardening-refactor implementation plan
+# Session handoff — hardening refactor landed
 
-Short handoff. The hardening test plan was executed end-to-end on
-`left4.me` this session. Results are recorded inline in the spec at
-`docs/superpowers/specs/2026-05-15-hardening-test-plan.md` (commit
-`461b8d0`). The next session writes the implementation plan that lands
-the proven composition in ckn-bw.
+The hardening refactor planned at
+`docs/superpowers/plans/2026-05-15-hardening-refactor.md` is deployed
+to `left4.me` and verified. This session executed all 12 tasks
+subagent-driven; no follow-up implementation work is queued.
 
-## What just happened
+## What landed
 
-Ran all 11 tests from the hardening test plan on
-`left4me-server@1` (canary) and `left4me-web` against the live host
-at `left4.me` / `left4me.ovh.ckn.li` (Debian 13, systemd 257). All
-drop-ins cleaned up at session end; the Test 9 sysctl
-(`kernel.yama.ptrace_scope=2`) is the one persistent host change.
-`gdb` + `seccomp` packages left installed.
+**left4me commits** (this session, in order; all on `master`, pushed):
+- `7c64910` — `spec(hardening-refactor): resolve emitter open items`
+  (verified ckn-bw systemd-bundle emitter handles tuples + empty values)
+- `8e678b6` — `deploy/files: annotate reference units with per-directive hardening comments`
+- `37309ba` — `spec(hardening-test-plan): fix four bugs surfaced by executor`
+- `f615d0d` — `spec(user-uid-split): mark superseded by the hardening refactor`
 
-Headline numbers:
-- `left4me-server@1.service`: **7.5 EXPOSED → 1.3 OK** (systemd-analyze)
-- `left4me-web.service`: **8.7 EXPOSED → 4.1 OK**
-- Test 8 attack matrix: all 8 vectors (D1.a/b/c, D2.a/b/c, D3, D5) blocked.
+**ckn-bw commits** (this session, in order; all on `master`, pushed):
+- `85b9af0` — `bundles/left4me: add HARDENING_{COMMON,SERVER,WEB} constants`
+- `640461c` — `bundles/left4me: spread HARDENING_SERVER into left4me-server@.service`
+- `c6721e7` — `bundles/left4me: spread HARDENING_WEB into left4me-web.service`
+- `130b0b1` — `bundles/left4me: ship kernel.yama.ptrace_scope=2 sysctl drop-in`
 
-Three things the test surfaced that change what the refactor must look like:
-- **`SystemCallArchitectures=native x86`**, not bare `native`.
-  `srcds_linux` is 32-bit i386; with `native=AUDIT_ARCH_X86_64` only,
-  every i386 syscall is killed and srcds_run respawns every 10 s.
-- **Add `PrivatePIDs=true`** to the composition. `ProtectProc=invisible`
-  alone cannot hide gunicorn from srcds because they share uid 980;
-  PrivatePIDs gives each instance its own PID namespace and closes
-  D2.b without needing the uid split.
-- **Exclude `MemoryDenyWriteExecute=true`.** Source engine i386 `.so`
-  files have text relocations; MDW returns EPERM on the relocation
-  `mprotect`, dlopen aborts, srcds enters the respawn loop. Permanent
-  exclusion — not fixable without rebuilding Valve's closed-source
-  binary.
+**Deploy:** `bw apply ovh.left4me` ran clean in 10 s (194 OK, 4 fixed,
+0 failed). `left4me-web.service` restarted automatically by `bw`;
+`left4me-server@1` and `@2` restarted manually post-apply.
 
-Full per-test detail is in the spec's "Results" section.
+## What's live on `left4.me`
 
-## What's next: write the refactor plan
+| Unit | systemd-analyze score | State |
+|---|---|---|
+| `left4me-server@1.service` | **1.3 OK** (was 7.5 baseline) | active since 13:13:39 UTC |
+| `left4me-server@2.service` | 1.3 OK | active since 13:14:40 UTC |
+| `left4me-web.service` | **4.1 OK** (was 8.7 baseline) | active since 13:01:06 UTC |
 
-Target file: `docs/superpowers/plans/2026-05-16-hardening-refactor.md`
-(or whatever date the next session opens).
+Sysctl: `kernel.yama.ptrace_scope = 2` (managed by ckn-bw bundle now,
+not hand-applied).
 
-Scope:
+Composition matches Test 7 of the test plan with two amendments
+(`SystemCallArchitectures=native x86`, `PrivatePIDs=true`) and one
+addition (`SocketBindAllow=udp:27000-27999 tcp:27000-27999`).
+`MemoryDenyWriteExecute=true` permanently excluded.
 
-1. **Land the proven composition in ckn-bw.** Live source for the
-   unit emission is `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`.
-   The reactor emits `left4me-server@.service` and `left4me-web.service`
-   — both need the new directives. Copy the Test 7 drop-in (from the
-   spec) into the reactor's unit body, with the two amendments above.
+## Attack vectors blocked (Test 8 subset rerun post-deploy)
 
-2. **Land the web composition** (sudo-compatible subset from Test 10)
-   in the same reactor.
+- **D1.a — srcds reads DB**: `cat /var/lib/left4me/left4me.db` from
+  inside the unit's mount namespace → `No such file or directory`
+- **D1.b — srcds reads web.env**: `cat /etc/left4me/web.env` →
+  `No such file or directory`
+- **D1.c — srcds sees /opt**: empty listing
+- **D2.b — srcds sees gunicorn PID via /proc**: `cannot access /proc/`
+  (PrivatePIDs in effect; PID doesn't exist in the namespace)
+- **D5 — cross-instance ptrace**: `cannot access /proc/`
+  (cross-instance PID isolation)
+- **Syscall filter compiled correctly**: `ptrace` and `process_vm_*`
+  not in the compiled allow list (verified via
+  `systemd-analyze syscall-filter`)
 
-3. **Land the sysctl drop-in in ckn-bw.** Currently
-   `/etc/sysctl.d/99-left4me-ptrace.conf` is host-only — if ckn-bw
-   later enforces unmanaged-file removal, this would disappear. Add
-   `pkg_files:` entry (or whatever the bundle convention is) for
-   `kernel.yama.ptrace_scope=2`.
+## Known acceptable noise
 
-4. **Update reference units** in
-   `deploy/files/usr/local/lib/systemd/system/{left4me-server@,left4me-web}.service`
-   to mirror the new emission (these are reference-only post the
-   deploy-dir-rethink, but should not drift from the live source).
+- **One SECCOMP audit line per gameserver restart** (`type=1326`,
+  i386 syscall 26 = `ptrace`, sig=31 SIGSYS, code=0x80000000
+  SECCOMP_RET_KILL_PROCESS). Source: srcds's Breakpad crash-reporter
+  init forks a child that attempts `ptrace`; we block it by design.
+  The child gets killed; the main srcds process is unaffected. Net
+  effect: Valve doesn't get crash minidumps from this host.
+  Acceptable trade-off given the threat model. If the audit-log noise
+  becomes a problem, switch the SECCOMP filter's action from
+  `KILL_PROCESS` to `EPERM` via `SystemCallErrorNumber=EPERM` (would
+  let breakpad fail cleanly instead of getting killed; same security
+  outcome).
 
-5. **Decide on `SocketBindAllow=`** for game port range (27000–27999
-   per `LEFT4ME_PORT_RANGE_*`). Worth adding to lock srcds's bindable
-   sockets; not tested in this session.
+## Host cleanup done
 
-6. **Resolve the deferred specs:**
-   - `docs/superpowers/specs/2026-05-15-user-uid-split-design.md` —
-     **mark as superseded.** PrivatePIDs + PrivateUsers close the
-     same-uid /proc gap that motivated it. Note the residual app-level
-     same-uid surface (DB ACLs, web.env mode) is a separate concern
-     not addressed by uid split anyway.
-   - AppArmor follow-up — defer further; defenses survey lists it.
-     Revisit if directive-only hardening leaves observable gaps.
+`gdb`, `libseccomp-dev`, `seccomp` removed via `apt remove --purge`.
+Test tooling was installed during the test-plan execution session
+(commit `461b8d0`); not needed in steady state. ~13 MB freed.
 
-7. **Fix the four spec bugs documented at the bottom of the test plan**
-   (PID-lookup races, gdb-from-outside-NS verification flaw, D5
-   pgrep pattern, scmp_sys_resolver package name).
+## What's next
 
-### Recommendation on sequencing
+No queued follow-up from this work. Adjacent open work:
 
-Before touching ckn-bw, run **superpowers:brainstorming** on the
-refactor — there's a real design choice on emission shape. The
-test-plan drop-in is ~50 lines of new directives; the existing
-reactor emits a smaller unit. Options:
+- **`build-overlay-unit` refactor**
+  (`docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md`).
+  Will reuse `HARDENING_COMMON` (or a sandbox-class variant) when it
+  lands. Sequenced after this; not blocked.
+- **Broader configmgmt-responsibility reshape** — hardening as drop-in
+  files living in left4me with ckn-bw as a thin file-shipper. Real
+  direction, deliberately deferred to a dedicated session in this
+  refactor's design doc.
+- **Stale RCON port app bug** flagged in the prior executor's handoff.
+  Not a hardening issue; separate scope.
 
-- **A. Inline.** All directives land directly in the
-  `[Service]` block emitted by the reactor. Simple, ckn-bw-idiomatic.
-- **B. Profile-as-reusable-fragment.** Put the directive block in a
-  shared bundle (so the future build-overlay-unit refactor can reuse
-  it). Better factoring, more upfront design.
-- **C. Drop-in pattern preserved.** Reactor emits the base unit
-  unchanged, plus a separate `*.service.d/hardening.conf` drop-in.
-  Mirrors the test methodology; easier to revert by removing the
-  drop-in.
+## Open items the operator should sanity-check manually
 
-My weak preference is **A** for the first pass — get the production
-state hardened, then refactor into shared shape (B) when the
-build-overlay-unit work needs it. **C** is operationally clean but
-introduces a new emission pattern just for this. Worth 10 minutes of
-brainstorming before committing.
+I executed everything programmatically that I could. The following
+need an eyeballed check via the web UI from your laptop:
 
-## Decision-relevant context
+1. Login to the web UI; confirm session works (would catch a SECRET_KEY
+   regression or session-cookie issue).
+2. Start/stop a server from the UI (exercises the sudo path on the web
+   unit; if the SystemCallFilter or any other web hardening broke
+   sudo, this would fail).
+3. View live logs for a server (uses `sudo left4me-journalctl`).
+4. Trigger an overlay rebuild for a script overlay (exercises the
+   sandbox; unchanged by this refactor, but a smoke against the
+   full chain).
 
-- **Source of truth is ckn-bw.** `deploy/files/.../*.service` copies
-  are reference-only post-deploy-dir-rethink. Don't edit them as the
-  primary change — emit-then-mirror.
-- **Sandbox `l4d2-sandbox` unit is out of scope.** Verified during
-  prior build-time-idmap work; do not weaken.
-- **Web sudo helpers must keep working.** `NoNewPrivileges` and
-  `PrivateUsers` are NOT in the web composition (Test 10 confirmed
-  the sudo-compat subset). The "replace sudo with systemctl-managed
-  unit triggering" refactor (build-overlay-unit spec is a step
-  toward it) would unlock deeper web hardening later.
-- **App-level stale RCON port bug** surfaced during testing: each
-  srcds restart picks a new ephemeral RCON port; the web app
-  caches the previous one and logs `Connection refused`. Pre-exists
-  hardening (repro'd before any drop-in was applied). Separate issue,
-  not for this refactor. Mention to operator; track in whatever
-  issue-tracking the project uses.
-- **gdb + seccomp packages on left4.me** are installed but not in
-  ckn-bw. Either add them to the bundle (so they're reproducible)
-  or `apt remove` them after the refactor lands — operator
-  preference.
-
-## Host state at end of session
-
-- `left4me-server@1`, `@2`, `left4me-web`: all `active`, baseline
-  (no drop-ins).
-- `/etc/sysctl.d/99-left4me-ptrace.conf` present, `ptrace_scope=2`
-  effective.
-- `gdb`, `seccomp` (provides `scmp_sys_resolver`), `libseccomp-dev`
-  installed.
-- `/tmp/sec-{baseline,after}-{server,web}.txt`, `/tmp/unit-baseline-*.conf`,
-  `/tmp/sysctl-baseline.txt` retained (next session can pull diffs from
-  these if needed).
-
-## What's NOT next
-
-- **Don't re-run the test plan.** Already done; results committed.
-- **Don't push to origin yet.** Repo is 3 ahead of
-  `origin/master` (the three hardening specs + this commit). User
-  said "commit locally" this session; they'll push when ready.
-- **Don't fix the stale-RCON-port app bug as part of the refactor.**
-  Different system, different scope.
-- **Don't do AppArmor.** Still deferred.
-- **build-overlay-unit refactor** (`docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md`)
-  remains sequenced behind this; not next.
-
-## Open questions for the next session
-
-- Should the refactor be a single PR/commit, or split into
-  "ckn-bw emission" + "reference unit mirror" + "sysctl drop-in"?
-  Operator preference. Recommend single bundle if the changes are
-  small.
-- Should we land Test 7's composition on `@2` first as a longer
-  canary before rolling to all instances? Or trust the symmetric
-  emission and roll everywhere at once? Currently both are running
-  baseline; @1 was the only canary.
-- `SocketBindAllow=` for the 27000–27999 game port range — include
-  in the first pass, or defer to a follow-up commit? Survey lists
-  it, test plan didn't exercise it.
+If any of those break, the most likely cause is the web unit's
+`SystemCallFilter`. Drop-in override at
+`/etc/systemd/system/left4me-web.service.d/00-debug.conf` with
+`SystemCallLog=...` instead of `SystemCallFilter` to identify the
+offending syscall, then narrow the filter.
 
 ## Pointers
 
-- Test plan (executed; **read the Results section first**):
-  `docs/superpowers/specs/2026-05-15-hardening-test-plan.md`
 - Threat model: `docs/superpowers/specs/2026-05-15-hardening-threat-model.md`
 - Defenses survey: `docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md`
-- Original uid-split (to be marked superseded):
-  `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
-- Live unit emission: `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`
-- Reference units: `deploy/files/usr/local/lib/systemd/system/`
-- Recent commit on this work: `461b8d0`
-- Host SSH: `ssh left4.me` (config at `~/.ssh/config`, 1Password agent)
+- Test plan (with executor results + this session's bug fixes):
+  `docs/superpowers/specs/2026-05-15-hardening-test-plan.md`
+- Design doc: `docs/superpowers/specs/2026-05-15-hardening-refactor-design.md`
+- Implementation plan: `docs/superpowers/plans/2026-05-15-hardening-refactor.md`
+- uid-split spec (marked superseded): `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
+- Live unit emission: `~/Projekte/ckn-bw/bundles/left4me/metadata.py`
+  (`HARDENING_COMMON` etc. near top; spreads at the
+  `left4me-server@.service` and `left4me-web.service` entries)
+- Reference units (annotated): `deploy/files/usr/local/lib/systemd/system/`