left4me/docs/superpowers/specs/2026-05-15-session-handoff.md
mwiegand 152c313315
spec(session-handoff): point next session at hardening-refactor plan
The prior handoff pointed this session at running the test plan; that's
done (commit 461b8d0). Update the handoff to point the next session at
writing docs/superpowers/plans/2026-MM-DD-hardening-refactor.md against
the proven composition, including the two amendments (x86 arch,
PrivatePIDs) and the MDW permanent exclusion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 13:43:37 +02:00


# Session handoff — next: write the hardening-refactor implementation plan
Short handoff. The hardening test plan was executed end-to-end on
`left4.me` this session. Results are recorded inline in the spec at
`docs/superpowers/specs/2026-05-15-hardening-test-plan.md` (commit
`461b8d0`). The next session writes the implementation plan that lands
the proven composition in ckn-bw.
## What just happened
Ran all 11 tests from the hardening test plan on
`left4me-server@1` (canary) and `left4me-web` against the live host
at `left4.me` / `left4me.ovh.ckn.li` (Debian 13, systemd 257). All
drop-ins were cleaned up at session end; the Test 9 sysctl
(`kernel.yama.ptrace_scope=2`) is the only persistent configuration
change. The `gdb` and `seccomp` packages were also left installed.
Headline numbers (overall exposure level from `systemd-analyze security`):
- `left4me-server@1.service`: **7.5 EXPOSED → 1.3 OK**
- `left4me-web.service`: **8.7 EXPOSED → 4.1 OK**
- Test 8 attack matrix: all 8 vectors (D1.a/b/c, D2.a/b/c, D3, D5) blocked.
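
The headline scores come from the summary line that `systemd-analyze security <unit>` prints after its table. Below is a minimal Python sketch for pulling the score and verdict out of saved output. It assumes the summary line has the form `Overall exposure level for <unit>: <score> <VERDICT>`. The exact line, including any leading arrow or trailing emoji, may vary between systemd versions, so the regex only anchors on the stable middle part.

```python
import re

# Assumed summary-line shape; systemd may prefix an arrow and append an emoji,
# so the pattern uses search() and ignores anything around the match.
_EXPOSURE_RE = re.compile(
    r"Overall exposure level for (?P<unit>\S+): "
    r"(?P<score>\d+(?:\.\d+)?) (?P<verdict>[A-Z]+)"
)


def parse_exposure(text: str) -> tuple[str, float, str]:
    """Return (unit, score, verdict) from `systemd-analyze security` output."""
    match = _EXPOSURE_RE.search(text)
    if match is None:
        raise ValueError("no 'Overall exposure level' line found")
    return match["unit"], float(match["score"]), match["verdict"]


def improvement(before: str, after: str) -> float:
    """Exposure reduction between two saved outputs (lower score is better)."""
    _, old, _ = parse_exposure(before)
    _, new, _ = parse_exposure(after)
    return round(old - new, 1)


if __name__ == "__main__":
    before = "→ Overall exposure level for left4me-web.service: 8.7 EXPOSED"
    after = "→ Overall exposure level for left4me-web.service: 4.1 OK"
    print(parse_exposure(after))       # ('left4me-web.service', 4.1, 'OK')
    print(improvement(before, after))  # 4.6
```
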
Three things the test surfaced that change what the refactor must look like:
- **`SystemCallArchitectures=native x86`**, not bare `native`.
`srcds_linux` is a 32-bit i386 binary. Bare `native` admits only
x86-64 (`AUDIT_ARCH_X86_64`) syscalls, so every i386 syscall is killed
and srcds_run respawns srcds every 10 s.
- **Add `PrivatePIDs=true`** to the composition. `ProtectProc=invisible`
alone cannot hide gunicorn from srcds because they share uid 980;
PrivatePIDs gives each instance its own PID namespace and closes
D2.b without needing the uid split.
- **Exclude `MemoryDenyWriteExecute=true`.** Source engine i386 `.so`
files have text relocations; MDW returns EPERM on the relocation
`mprotect`, dlopen aborts, srcds enters the respawn loop. Permanent
exclusion — not fixable without rebuilding Valve's closed-source
binary.
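
The three amendments can be encoded as a guard that the reactor or a test runs against the emitted `[Service]` directives. This is a hypothetical Python sketch, not the reactor's actual code. It lists only the directives this handoff names; the full Test 7 drop-in lives in the test-plan spec. Two of the entries are assumptions: `NoNewPrivileges=true` and `PrivateUsers=true` are inferred from their being called out as absent from the web composition.

```python
# Hypothetical sketch: only the directives this handoff names. The full
# Test 7 drop-in (roughly 50 lines) lives in the test-plan spec.
SERVER_HARDENING: dict[str, str] = {
    "SystemCallArchitectures": "native x86",  # srcds_linux is 32-bit i386
    "PrivatePIDs": "true",                    # per-instance PID namespace (closes D2.b)
    "ProtectProc": "invisible",
    "PrivateUsers": "true",                   # assumed, see lead-in
    "NoNewPrivileges": "true",                # assumed, see lead-in
}

# Permanently excluded: Valve's i386 .so files need text relocations.
EXCLUDED = {"MemoryDenyWriteExecute"}


def check_server_composition(directives: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the amendments hold.

    Simplification: PrivatePIDs is compared literally against "true".
    """
    problems = []
    archs = directives.get("SystemCallArchitectures", "").split()
    if "x86" not in archs:
        problems.append("SystemCallArchitectures must include x86 for i386 srcds")
    if directives.get("PrivatePIDs") != "true":
        problems.append("PrivatePIDs=true is required")
    for name in sorted(directives.keys() & EXCLUDED):
        problems.append(f"{name} is permanently excluded")
    return problems


def render_service_section(directives: dict[str, str]) -> str:
    """Render a [Service] section, one Key=value per line."""
    lines = ["[Service]"] + [f"{key}={value}" for key, value in directives.items()]
    return "\n".join(lines) + "\n"
```

A dict cannot hold repeated keys. A directive that legitimately appears several times (for example, more than one `SocketBindAllow=` line) would need a list of pairs instead.
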
Full per-test detail is in the spec's "Results" section.
## What's next: write the refactor plan
Target file: `docs/superpowers/plans/2026-05-16-hardening-refactor.md`
(or whatever date the next session opens).
Scope:
1. **Land the proven composition in ckn-bw.** Live source for the
unit emission is `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`.
The reactor emits `left4me-server@.service` and `left4me-web.service`
— both need the new directives. Copy the Test 7 drop-in (from the
spec) into the reactor's unit body, with the two amendments above.
2. **Land the web composition** (sudo-compatible subset from Test 10)
in the same reactor.
3. **Land the sysctl drop-in in ckn-bw.** Currently
`/etc/sysctl.d/99-left4me-ptrace.conf` exists only on the host; if ckn-bw
later enforces removal of unmanaged files, it would disappear. Add a
`pkg_files:` entry (or whatever the bundle convention is) for
`kernel.yama.ptrace_scope=2`.
4. **Update reference units** in
`deploy/files/usr/local/lib/systemd/system/{left4me-server@,left4me-web}.service`
to mirror the new emission (these are reference-only since the
deploy-dir-rethink, but should not drift from the live source).
5. **Decide on `SocketBindAllow=`** for the game port range (27000-27999
per `LEFT4ME_PORT_RANGE_*`). Worth adding to restrict which ports srcds
can bind; not tested in this session.
6. **Resolve the deferred specs:**
- `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
**mark as superseded.** PrivatePIDs + PrivateUsers close the
same-uid /proc gap that motivated it. Note the residual app-level
same-uid surface (DB ACLs, web.env mode) is a separate concern
not addressed by uid split anyway.
- AppArmor follow-up: keep deferred; the defenses survey lists it.
Revisit if directive-only hardening leaves observable gaps.
7. **Fix the four spec bugs documented at the bottom of the test plan**
(PID-lookup races, gdb-from-outside-NS verification flaw, D5
pgrep pattern, scmp_sys_resolver package name).
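
Scope items 3 and 5 both come down to a few lines of emitted text. The Python sketch below renders them. The path and the port range are taken from this handoff. The exact sysctl file content and the helper names are assumptions.

The `SocketBindAllow=`/`SocketBindDeny=` pairing follows the allow-then-`any`-deny pattern from the systemd.resource-control documentation; it is untested here, like item 5 itself.

```python
# Hypothetical helpers for scope items 3 and 5. File path and port range come
# from this handoff; the rendered file content is an assumption.
PTRACE_SYSCTL_PATH = "/etc/sysctl.d/99-left4me-ptrace.conf"


def render_ptrace_sysctl(scope: int = 2) -> str:
    """sysctl.d body; Yama scope 2 limits ptrace to CAP_SYS_PTRACE holders."""
    if scope not in (0, 1, 2, 3):
        raise ValueError("kernel.yama.ptrace_scope must be 0-3")
    return f"kernel.yama.ptrace_scope = {scope}\n"


def socket_bind_directives(low: int, high: int) -> list[str]:
    """Allow binds only inside [low, high]; deny everything else.

    Untested in this session: confirm srcds's RCON port lands inside the
    range before shipping, since SocketBindDeny=any blocks any bind that
    no allow rule matches.
    """
    if not 1 <= low <= high <= 65535:
        raise ValueError(f"bad port range {low}-{high}")
    return [f"SocketBindAllow={low}-{high}", "SocketBindDeny=any"]


if __name__ == "__main__":
    print(PTRACE_SYSCTL_PATH)
    print(render_ptrace_sysctl(), end="")
    print("\n".join(socket_bind_directives(27000, 27999)))
```
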
### Recommendation on sequencing
Before touching ckn-bw, run **superpowers:brainstorming** on the
refactor — there's a real design choice on emission shape. The
test-plan drop-in is ~50 lines of new directives; the existing
reactor emits a smaller unit. Options:
- **A. Inline.** All directives land directly in the
`[Service]` block emitted by the reactor. Simple, ckn-bw-idiomatic.
- **B. Profile-as-reusable-fragment.** Put the directive block in a
shared bundle (so the future build-overlay-unit refactor can reuse
it). Better factoring, more upfront design.
- **C. Drop-in pattern preserved.** Reactor emits the base unit
unchanged, plus a separate `*.service.d/hardening.conf` drop-in.
Mirrors the test methodology; easier to revert by removing the
drop-in.
My weak preference is **A** for the first pass — get the production
state hardened, then refactor into shared shape (B) when the
build-overlay-unit work needs it. **C** is operationally clean but
introduces a new emission pattern just for this. Worth 10 minutes of
brainstorming before committing.
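
The practical difference between options A and C is which files the reactor writes. The hypothetical Python sketch below makes that concrete. Directives are kept as (key, value) pairs so repeated keys survive. The base-unit contents in the demo are placeholders, not what the reactor emits today.

```python
# Hypothetical sketch of options A and C. Base-unit contents are placeholders.
Directives = list[tuple[str, str]]


def render(section: str, directives: Directives) -> str:
    """Render one unit-file section from (key, value) pairs."""
    body = "\n".join(f"{key}={value}" for key, value in directives)
    return f"[{section}]\n{body}\n"


def emit_inline(unit: str, base: Directives, hardening: Directives) -> dict[str, str]:
    """Option A: one [Service] block carrying base and hardening directives."""
    return {unit: render("Service", base + hardening)}


def emit_dropin(unit: str, base: Directives, hardening: Directives) -> dict[str, str]:
    """Option C: base unit unchanged, plus <unit>.d/hardening.conf."""
    return {
        unit: render("Service", base),
        f"{unit}.d/hardening.conf": render("Service", hardening),
    }


if __name__ == "__main__":
    base = [("Type", "simple"), ("ExecStart", "/hypothetical/srcds_run")]
    hardening = [("PrivatePIDs", "true"), ("SystemCallArchitectures", "native x86")]
    for path, text in emit_dropin("left4me-server@.service", base, hardening).items():
        print(f"# {path}\n{text}")
```

Under option C, a drop-in directory for the template (`left4me-server@.service.d/`) applies to every instance. Reverting means deleting one file and running `systemctl daemon-reload`. Under option A, the same revert is a reactor edit.
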
## Decision-relevant context
- **Source of truth is ckn-bw.** `deploy/files/.../*.service` copies
are reference-only post-deploy-dir-rethink. Don't edit them as the
primary change — emit-then-mirror.
- **Sandbox `l4d2-sandbox` unit is out of scope.** Verified during
prior build-time-idmap work; do not weaken.
- **Web sudo helpers must keep working.** `NoNewPrivileges` and
`PrivateUsers` are NOT in the web composition (Test 10 confirmed
the sudo-compat subset). The "replace sudo with systemctl-managed
unit triggering" refactor (build-overlay-unit spec is a step
toward it) would unlock deeper web hardening later.
- **App-level stale RCON port bug** surfaced during testing: each
srcds restart picks a new ephemeral RCON port; the web app
caches the previous one and logs `Connection refused`. This predates
the hardening (reproduced before any drop-in was applied). Separate
issue, not for this refactor. Mention it to the operator; track it in
whatever issue tracker the project uses.
- **gdb + seccomp packages on left4.me** are installed but not in
ckn-bw. Either add them to the bundle (so they're reproducible)
or `apt remove` them after the refactor lands — operator
preference.
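
The web-unit constraint can be checked mechanically before emission. The hypothetical Python sketch below flags the two directives that break the sudo helpers:

- `NoNewPrivileges` stops the setuid `sudo` binary from gaining privileges.
- `PrivateUsers` runs the service in a user namespace whose root holds no capabilities in the host namespace.

Newer systemd versions also accept non-boolean `PrivateUsers=` modes, so anything other than an explicit false is treated as on.

```python
# Hypothetical guard for the web unit's sudo-compatible subset.
SUDO_INCOMPATIBLE = {"NoNewPrivileges", "PrivateUsers"}

# Spellings systemd's boolean parser treats as false; "" resets to default (off).
_FALSE = {"", "0", "no", "n", "false", "f", "off"}


def sudo_conflicts(directives: list[tuple[str, str]]) -> list[str]:
    """Return the sudo-breaking directives that are switched on, sorted."""
    return sorted({
        key
        for key, value in directives
        if key in SUDO_INCOMPATIBLE and value.strip().lower() not in _FALSE
    })
```
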
## Host state at end of session
- `left4me-server@1`, `@2`, `left4me-web`: all `active`, baseline
(no drop-ins).
- `/etc/sysctl.d/99-left4me-ptrace.conf` present, `ptrace_scope=2`
effective.
- `gdb`, `seccomp` (provides `scmp_sys_resolver`), `libseccomp-dev`
installed.
- `/tmp/sec-{baseline,after}-{server,web}.txt`, `/tmp/unit-baseline-*.conf`,
`/tmp/sysctl-baseline.txt` retained (next session can pull diffs from
these if needed).
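
To pull diffs from the retained files, a Python sketch using `difflib` is enough. It assumes the `sec-*` files are plain-text captures of `systemd-analyze security` output; their names suggest that, but this handoff does not say so. On Debian 13, `/tmp` may be a tmpfs, so the files may not survive a reboot; copy them off the host first if they matter.

```python
import difflib
from pathlib import Path


def security_diff(unit: str, directory: Path = Path("/tmp")) -> str:
    """Unified diff of sec-baseline-<unit>.txt against sec-after-<unit>.txt.

    `unit` is "server" or "web", matching the file names left on the host.
    """
    before = directory / f"sec-baseline-{unit}.txt"
    after = directory / f"sec-after-{unit}.txt"
    return "".join(
        difflib.unified_diff(
            before.read_text().splitlines(keepends=True),
            after.read_text().splitlines(keepends=True),
            fromfile=str(before),
            tofile=str(after),
        )
    )


if __name__ == "__main__":
    for unit in ("server", "web"):
        print(security_diff(unit))
```
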
## What's NOT next
- **Don't re-run the test plan.** Already done; results committed.
- **Don't push to origin yet.** Repo is 3 commits ahead of
`origin/master` (the three hardening specs + this commit). User
said "commit locally" this session; they'll push when ready.
- **Don't fix the stale-RCON-port app bug as part of the refactor.**
Different system, different scope.
- **Don't do AppArmor.** Still deferred.
- **build-overlay-unit refactor** (`docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md`)
remains sequenced behind this; not next.
## Open questions for the next session
- Should the refactor be a single PR/commit, or split into
"ckn-bw emission" + "reference unit mirror" + "sysctl drop-in"?
Operator preference. Recommend single bundle if the changes are
small.
- Should we land Test 7's composition on `@2` first as a longer
canary before rolling to all instances? Or trust the symmetric
emission and roll everywhere at once? Currently both are running
baseline; @1 was the only canary.
- `SocketBindAllow=` for the 27000-27999 game port range: include it
in the first pass, or defer to a follow-up commit? The survey lists
it; the test plan didn't exercise it.
## Pointers
- Test plan (executed; **read the Results section first**):
`docs/superpowers/specs/2026-05-15-hardening-test-plan.md`
- Threat model: `docs/superpowers/specs/2026-05-15-hardening-threat-model.md`
- Defenses survey: `docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md`
- Original uid-split (to be marked superseded):
`docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
- Live unit emission: `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`
- Reference units: `deploy/files/usr/local/lib/systemd/system/`
- Recent commit on this work: `461b8d0`
- Host SSH: `ssh left4.me` (config at `~/.ssh/config`, 1Password agent)