From 146cb0145092bbd50edefe146e7baeb8c1357fc9 Mon Sep 17 00:00:00 2001
From: mwiegand
Date: Fri, 15 May 2026 15:39:51 +0200
Subject: [PATCH] plan(uid-collapse): drop l4d2-sandbox user; handoff to next session

Approved-but-not-executed plan to collapse the two-user model (left4me +
l4d2-sandbox) into one. The build-time-idmap that translates sandbox
writes back to left4me uid becomes a no-op when source uid == target
uid, so it's removed along with ~30 lines of helper plumbing. Hardening
already covers the same-uid attack surface the sandbox uid was defending
against, so collapsing makes the architecture consistent with the
web/server hardening-only decision.

Plan: docs/superpowers/plans/2026-05-15-uid-collapse.md
Handoff: docs/superpowers/specs/2026-05-15-session-handoff.md

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../plans/2026-05-15-uid-collapse.md          | 226 ++++++++++++++++++
 .../specs/2026-05-15-session-handoff.md       | 201 ++++++++--------
 2 files changed, 324 insertions(+), 103 deletions(-)
 create mode 100644 docs/superpowers/plans/2026-05-15-uid-collapse.md

diff --git a/docs/superpowers/plans/2026-05-15-uid-collapse.md b/docs/superpowers/plans/2026-05-15-uid-collapse.md
new file mode 100644
index 0000000..6bad1bd
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-15-uid-collapse.md
@@ -0,0 +1,226 @@
+# UID collapse — remove `l4d2-sandbox` user
+
+## Context
+
+The hardening refactor landed earlier today
+(`docs/superpowers/plans/2026-05-15-hardening-refactor.md`) deployed
+the systemd-directive composition that covers all same-uid attack
+vectors for the gameserver + web units running as `left4me`.
+
+The script-sandbox unit still runs as a separate uid `l4d2-sandbox`
+(981) with a build-time idmap (`mount --bind --map-users=980:981:1`)
+translating sandbox-side writes to land on disk as `left4me`. After
+the hardening refactor, the same-uid attack vectors the sandbox uid
+defends against (FS-view access, ptrace, /proc, signals) are
+already closed by the sandbox's own systemd-run hardening profile.
+The separate uid is now defense-in-depth only — and it's
+inconsistent with the decision *not* to split the web/server uid.
+
+Pick one principle. Option C from the discussion: **one user**.
+Delete `l4d2-sandbox`, simplify the sandbox helper, remove the
+idmap. Architecture gets smaller (one fewer uid, no idmap binds,
+~30 lines deleted from the helper). Trade: if sandbox hardening
+regresses, kernel uid boundary no longer helps — consistent with
+what we already accepted for server/web.
+
+## Approach
+
+1. **Edit `scripts/libexec/left4me-script-sandbox`** (left4me repo):
+   delete the idmap block (lines 49-78 per Phase 1 exploration —
+   the `LEFT4ME_UID`/`SANDBOX_UID` lookups, `STAGING` setup,
+   `cleanup_staging` trap, `mount --bind --map-users=…` call).
+   Change `User=l4d2-sandbox -p Group=l4d2-sandbox` (line 85)
+   to `User=left4me -p Group=left4me`. Change
+   `BindPaths="${STAGING}:/overlay"` (line 102) to
+   `BindPaths="${OVERLAY_DIR}:/overlay"`. Keep the
+   `nsenter --mount=/proc/1/ns/mnt` self-wrap at the top — it's
+   about namespace escape, not uid.
+
+2. **Update `scripts/tests/test_script_sandbox.py`** (left4me repo):
+   - Lines 36-37: change `User=l4d2-sandbox`/`Group=l4d2-sandbox`
+     assertions → `User=left4me`/`Group=left4me`.
+   - Delete `test_script_sandbox_uses_idmap_staging` (lines 114-133)
+     entirely — it asserts the idmap and staging exist; after
+     refactor neither does.
+   - Update line 165-166 comments to drop the sandbox-uid reference.
+
+3. **Update inline comments** referencing the sandbox uid:
+   - `l4d2web/services/overlay_builders.py:342` (or near 100 — agents
+     reported different lines; locate via grep) — "as l4d2-sandbox"
+     → "as left4me".
+   - `l4d2host/instances.py:80` — comment about l4d2-sandbox-owned
+     lower-layer files → reflect that all overlay content is now
+     left4me-owned end-to-end.
+
+4. **Mark the build-time-idmap plan superseded**:
+   `docs/superpowers/plans/2026-05-15-build-time-idmap.md` — add a
+   top-line status note: "SUPERSEDED 2026-05-15 by the uid-collapse
+   refactor (this plan). The idmap pattern this plan introduced is
+   removed because source uid (`left4me`) now equals target uid
+   (`left4me`) — translation is a no-op." Same one-line treatment
+   for `docs/superpowers/plans/2026-05-14-overlay-idmap.md`.
+
+5. **Update the user-uid-split spec's existing superseded header**:
+   `docs/superpowers/specs/2026-05-15-user-uid-split-design.md` —
+   currently says "2 users (current state) is correct"; revise to
+   say "1 user (after uid-collapse refactor) is correct" and update
+   the reasoning paragraph.
+
+6. **Light-touch updates to other docs** that reference
+   `l4d2-sandbox` for accuracy. Pragmatic scope — add a top-line
+   note instead of rewriting body content:
+   - `deploy/README.md` — drop the `l4d2-sandbox` bullet (line 84),
+     fix the paragraph at line 141 to reflect no-idmap state.
+   - `docs/superpowers/specs/2026-05-15-hardening-refactor-design.md`
+     and `2026-05-15-hardening-threat-model.md` — add a one-line
+     "Updated 2026-05-15: l4d2-sandbox collapsed into left4me; see
+     plans/2026-05-15-uid-collapse.md" note in the relevant context
+     section.
+   - `docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md`
+     — same one-line note (the spec's hardening profile sketch
+     references the old `User=l4d2-sandbox`; the new build-overlay-unit
+     refactor when it lands will inherit `User=left4me` from this
+     change).
+   - **Leave the 2026-05-08-* design specs alone.** They describe
+     historical design at the time; rewriting them obscures the
+     evolution. Anyone reading them sees the date and the
+     superseded-note chain leads forward.
+
+7. **Remove `l4d2-sandbox` from the ckn-bw bundle**
+   (`~/Projekte/ckn-bw/bundles/left4me/items.py`):
+   - Delete the `l4d2-sandbox` entry from the `users` dict
+     (lines 54-58 per Phase 1).
+   - Delete the `l4d2-sandbox` entry from the `groups` dict
+     (line 44).
+   - Update the `/var/lib/left4me` mode comment + decide whether to
+     change `0711` → `0755`. The `0711` was specifically to let
+     `l4d2-sandbox` traverse (not list) the dir; with sandbox gone,
+     `0755` is the natural choice. Pick `0755`.
+
+8. **On-host pre-flight**: before `bw apply`, chown any remaining
+   uid-981 files to `left4me`:
+   ```bash
+   ssh left4.me 'sudo find /var/lib/left4me /opt/left4me -uid 981 -print \
+     | head -50'
+   # If any results, chown them:
+   ssh left4.me 'sudo find /var/lib/left4me /opt/left4me -uid 981 \
+     -exec chown left4me:left4me {} +'
+   ```
+   Per the build-time-idmap plan that landed earlier, new sandbox
+   writes already land as `left4me`, so the result should be small
+   or empty. The chown catches any stragglers.
+
+9. **Cross-repo push + bw apply**:
+   - Commit left4me changes (helper, tests, doc updates) on master.
+   - Commit ckn-bw changes (users/groups deletion, mode change) on
+     master.
+   - Push both.
+   - `bw apply ovh.left4me`.
+
+10. **Verify**:
+    - `getent passwd l4d2-sandbox` on the host → no result (user
+      removed).
+    - `sudo find /var/lib/left4me /opt/left4me -uid 981 -print` →
+      empty.
+    - Trigger a sandbox build via the web UI; observe in
+      `journalctl -u 'left4me-script-*'` that the transient unit
+      runs as `left4me`, completes successfully, and the resulting
+      overlay files in `/var/lib/left4me/overlays//` are
+      `left4me:left4me`.
+    - `pytest scripts/tests/test_script_sandbox.py` locally passes
+      with updated assertions.
+
+## Files to modify
+
+**Left4me repo (`~/Projekte/left4me`):**
+- `scripts/libexec/left4me-script-sandbox` — helper changes (step 1)
+- `scripts/tests/test_script_sandbox.py` — test updates (step 2)
+- `l4d2web/services/overlay_builders.py` — comment update (step 3)
+- `l4d2host/instances.py` — comment update (step 3)
+- `docs/superpowers/plans/2026-05-15-build-time-idmap.md` —
+  SUPERSEDED header (step 4)
+- `docs/superpowers/plans/2026-05-14-overlay-idmap.md` —
+  SUPERSEDED header (step 4)
+- `docs/superpowers/specs/2026-05-15-user-uid-split-design.md` —
+  update existing superseded header (step 5)
+- `docs/superpowers/specs/2026-05-15-hardening-refactor-design.md` —
+  one-line note (step 6)
+- `docs/superpowers/specs/2026-05-15-hardening-threat-model.md` —
+  one-line note (step 6)
+- `docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md` —
+  one-line note (step 6)
+- `deploy/README.md` — drop sandbox bullet, update idmap paragraph
+  (step 6)
+
+**Ckn-bw repo (`~/Projekte/ckn-bw`):**
+- `bundles/left4me/items.py` — drop `l4d2-sandbox` user + group;
+  relax mode `0711` → `0755` (step 7)
+
+**Host actions (no commits):**
+- pre-flight chown of orphan-981 files (step 8)
+- `bw apply ovh.left4me` (step 9)
+
+## Verification
+
+End-to-end on `left4.me`:
+
+```bash
+# User removed
+ssh left4.me 'getent passwd l4d2-sandbox; getent group l4d2-sandbox'
+# Expect: empty (both)
+
+# No orphan-uid files
+ssh left4.me 'sudo find /var/lib/left4me /opt/left4me -uid 981 -print 2>/dev/null'
+# Expect: empty
+
+# Sandbox build runs as left4me end-to-end
+# (Trigger via web UI; then check)
+ssh left4.me 'sudo journalctl --since "5 minutes ago" -u "left4me-script-*" | head -30'
+# Expect: clean run, no permission errors
+
+ssh left4.me 'sudo ls -ln /var/lib/left4me/overlays// | head -5'
+# Expect: uid 980 (left4me), not 981
+
+# Local tests
+cd ~/Projekte/left4me && pytest scripts/tests/test_script_sandbox.py -q
+# Expect: all green (one fewer test — the idmap test was deleted)
+```
+
+## Rollback
+
+If the deploy goes wrong:
+- `git revert` the left4me commits + the ckn-bw commit, push,
+  `bw apply` again.
+- ckn-bw will recreate the `l4d2-sandbox` user on the host.
+- The old helper script comes back via `git_deploy`.
+- Any files chown'd from 981→980 in the pre-flight stay at 980 —
+  that's fine because the new helper would have written them as 980
+  anyway.
+
+## Risks
+
+- **Sandbox build running during `bw apply`**: ckn-bw's user-removal
+  step might fail if a `l4d2-sandbox`-uid process is alive.
+  Mitigation: don't apply during a build. Quick check before apply:
+  `ssh left4.me 'sudo systemctl list-units --type=service "left4me-script-*"'`
+  → expect "0 loaded units".
+- **Orphan files not caught by the pre-flight find**: if any uid-981
+  file exists outside `/var/lib/left4me` or `/opt/left4me`, the user
+  removal succeeds but the file becomes orphan-uid. Practically these
+  paths are exhaustive; if paranoid, expand the find to `/`.
+- **The `nsenter` self-wrap exists because the web unit sets
+  `PrivateTmp=true`**. If the web unit's PrivateTmp ever goes away,
+  the wrap becomes unnecessary. Not affected by this refactor; flag
+  for future cleanup.
+
+## Out of scope
+
+- Renaming `left4me` to something else (e.g., `l4d2-app`). Cosmetic
+  only; not worth the migration cost.
+- The broader configmgmt responsibility reshape (drop-ins owned by
+  left4me, ckn-bw as thin file-shipper). Deferred per the
+  hardening-refactor design.
+- `build-overlay-unit` template refactor
+  (`docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md`)
+  — still queued; will inherit `User=left4me` cleanly from this work.
+- Rewriting historical 2026-05-08-* design specs.
diff --git a/docs/superpowers/specs/2026-05-15-session-handoff.md b/docs/superpowers/specs/2026-05-15-session-handoff.md
index 005372f..a891fa0 100644
--- a/docs/superpowers/specs/2026-05-15-session-handoff.md
+++ b/docs/superpowers/specs/2026-05-15-session-handoff.md
@@ -1,126 +1,121 @@
-# Session handoff — hardening refactor landed
+# Session handoff — next: execute uid-collapse plan

-The hardening refactor planned at
-`docs/superpowers/plans/2026-05-15-hardening-refactor.md` is deployed
-to `left4.me` and verified. This session executed all 12 tasks
-subagent-driven; no follow-up implementation work is queued.
+The hardening refactor landed and was verified on `left4.me` earlier
+today. A follow-up question surfaced — the
+two-user model (`left4me` + `l4d2-sandbox`) is inconsistent now
+that systemd hardening covers the same-uid attack surface. The
+asymmetry was hashed out and **Option C** (collapse to one user)
+chosen. A plan was written but **not executed**. The next session
+picks it up.

-## What landed
+## What just landed (committed + pushed earlier today)
+The hardening refactor — full directive composition deployed to
+`left4.me`. server@1 went 7.5 → 1.3 systemd-analyze; web 8.7 → 4.1;
+all Test 8 attack vectors blocked. See the prior session-handoff
+content in this file's git history (`git log --oneline -- this-file`)
+and the close-out commits.
-**ckn-bw commits** (this session, in order; all on `master`, pushed): -- `85b9af0` — `bundles/left4me: add HARDENING_{COMMON,SERVER,WEB} constants` -- `640461c` — `bundles/left4me: spread HARDENING_SERVER into left4me-server@.service` -- `c6721e7` — `bundles/left4me: spread HARDENING_WEB into left4me-web.service` -- `130b0b1` — `bundles/left4me: ship kernel.yama.ptrace_scope=2 sysctl drop-in` +## What's next: execute `2026-05-15-uid-collapse.md` -**Deploy:** `bw apply ovh.left4me` ran clean in 10 s (194 OK, 4 fixed, -0 failed). `left4me-web.service` restarted automatically by `bw`; -`left4me-server@1` and `@2` restarted manually post-apply. +Plan: `docs/superpowers/plans/2026-05-15-uid-collapse.md`. Approved +in plan-mode this session; not executed. -## What's live on `left4.me` +Scope (10 steps; see plan for detail): -| Unit | systemd-analyze score | State | -|---|---|---| -| `left4me-server@1.service` | **1.3 OK** (was 7.5 baseline) | active since 13:13:39 UTC | -| `left4me-server@2.service` | 1.3 OK | active since 13:14:40 UTC | -| `left4me-web.service` | **4.1 OK** (was 8.7 baseline) | active since 13:01:06 UTC | +1. Strip the idmap block from `scripts/libexec/left4me-script-sandbox` + (~30 lines deleted), change `User=l4d2-sandbox` → `User=left4me`, + `BindPaths="${STAGING}:/overlay"` → `BindPaths="${OVERLAY_DIR}:/overlay"`. + Keep the `nsenter` self-wrap (it's about namespace escape, not + uid — unaffected). +2. Update `scripts/tests/test_script_sandbox.py` — assertion changes + + delete the `test_script_sandbox_uses_idmap_staging` test. +3. Update two inline comments referencing `l4d2-sandbox`. +4-6. Doc updates: mark `2026-05-15-build-time-idmap.md` and + `2026-05-14-overlay-idmap.md` superseded; revise the + user-uid-split superseded header to say "1 user" instead of + "2"; one-line notes in the hardening specs. +7. Remove `l4d2-sandbox` from `~/Projekte/ckn-bw/bundles/left4me/items.py` + (users + groups dicts). 
Tighten `/var/lib/left4me` mode from + `0711` → `0755`. +8. **On-host pre-flight**: `ssh left4.me` + `sudo find -uid 981`, + chown any stragglers to `left4me` BEFORE applying. ckn-bw won't + remove a user whose files (or processes) are still on disk + gracefully. +9. Push both repos; `bw apply ovh.left4me`. +10. Verify: `getent passwd l4d2-sandbox` empty, no uid-981 files, + sandbox build runs as left4me end-to-end via the web UI. -Sysctl: `kernel.yama.ptrace_scope = 2` (managed by ckn-bw bundle now, -not hand-applied). +Rollback path documented in the plan (git revert + bw apply +recreates the user). -Composition matches Test 7 of the test plan with two amendments -(`SystemCallArchitectures=native x86`, `PrivatePIDs=true`) and one -addition (`SocketBindAllow=udp:27000-27999 tcp:27000-27999`). -`MemoryDenyWriteExecute=true` permanently excluded. +## Why we're doing this -## Attack vectors blocked (Test 8 subset rerun post-deploy) +The two-user setup was the inconsistent middle ground: +- Server + web run as `left4me` because hardening covers the + threat — uid split would be 1-2 days of cross-repo migration for + marginal kernel-enforcement benefit. +- Sandbox runs as `l4d2-sandbox` for historical reasons — the + build-time-idmap design baked it in. 
-- **D1.a — srcds reads DB**: `cat /var/lib/left4me/left4me.db` from
-  inside the unit's mount namespace → `No such file or directory`
-- **D1.b — srcds reads web.env**: `cat /etc/left4me/web.env` →
-  `No such file or directory`
-- **D1.c — srcds sees /opt**: empty listing
-- **D2.b — srcds sees gunicorn PID via /proc**: `cannot access /proc/`
-  (PrivatePIDs in effect; PID doesn't exist in the namespace)
-- **D5 — cross-instance ptrace**: `cannot access /proc/`
-  (cross-instance PID isolation)
-- **Syscall filter compiled correctly**: `ptrace` and `process_vm_*`
-  not in the compiled allow list (verified via
-  `systemd-analyze syscall-filter`)
+The hardening composition on the sandbox unit (which the
+script-sandbox helper applies via `systemd-run -p ...`) already
+gives the same protection profile as the gameserver unit. The
+separate uid is defense-in-depth only.

-## Known acceptable noise
+Picking one principle:
+- **C** (collapse to one): cheap, deletes ~30 lines of helper code,
+  removes the build-time-idmap concern entirely. Architecture
+  simpler. Consistent with the web/server hardening-only decision.
+- A (status quo): inconsistent. Documented but not principled.
+- B (split fully): 1-2 days of work; we already rejected this for
+  server/web.

-- **One SECCOMP audit line per gameserver restart** (`type=1326`,
-  i386 syscall 26 = `ptrace`, sig=31 SIGSYS, code=0x80000000
-  SECCOMP_RET_KILL_PROCESS). Source: srcds's Breakpad crash-reporter
-  init forks a child that attempts `ptrace`; we block it by design.
-  The child gets killed; the main srcds process is unaffected. Net
-  effect: Valve doesn't get crash minidumps from this host.
-  Acceptable trade-off given the threat model. If the audit-log noise
-  becomes a problem, switch the SECCOMP filter's action from
-  `KILL_PROCESS` to `EPERM` via `SystemCallErrorNumber=EPERM` (would
-  let breakpad fail cleanly instead of getting killed; same security
-  outcome).
+Operator picked C.
-## Host cleanup done
-`gdb`, `libseccomp-dev`, `seccomp` removed via `apt remove --purge`.
-Test tooling was installed during the test-plan execution session
-(commit `461b8d0`); not needed in steady state. ~13 MB freed.
+## Decision-relevant context already on the host
+- After the hardening refactor + bw apply earlier, `left4me-server@*`
+  and `left4me-web` are running with the full hardening profile.
+  `kernel.yama.ptrace_scope=2` is set system-wide via the bundle.
+- The sandbox unit is currently inactive (it's transient — only
+  exists during a build). Per the build-time-idmap plan, the
+  staging path lives at `/var/lib/left4me/tmp/sandbox-idmap-`
+  during a build.
+- ckn-bw's `users` bundle handles the removal mechanically; no
+  custom dance needed beyond the pre-flight chown.

-## What's next
+## Open questions to clarify with the operator before/during execution

-No queued follow-up from this work. Adjacent open work:
+- Whether to expand the pre-flight `find -uid 981` from
+  `/var/lib/left4me` + `/opt/left4me` to all of `/` for paranoia.
+  Probably not needed; flag for the implementer's judgement.
+- Whether to combine left4me + ckn-bw into a single PR-equivalent
+  cross-repo commit pair, or push left4me first then ckn-bw. Plan
+  assumes both pushed before `bw apply`.
+
+## What's NOT next

 - **`build-overlay-unit` refactor**
   (`docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md`).
-  Will reuse `HARDENING_COMMON` (or a sandbox-class variant) when it
-  lands. Sequenced after this; not blocked.
-- **Broader configmgmt-responsibility reshape** — hardening as drop-in
-  files living in left4me with ckn-bw as a thin file-shipper. Real
-  direction, deliberately deferred to a dedicated session in this
-  refactor's design doc.
-- **Stale RCON port app bug** flagged in the prior executor's handoff.
-  Not a hardening issue; separate scope.
-## Open items the operator should sanity-check manually
-I executed everything programmatically that I could. The following
-need an eyeballed check via the web UI from your laptop:
-1. Login to the web UI; confirm session works (would catch a SECRET_KEY
-   regression or session-cookie issue).
-2. Start/stop a server from the UI (exercises the sudo path on the web
-   unit; if the SystemCallFilter or any other web hardening broke
-   sudo, this would fail).
-3. View live logs for a server (uses `sudo left4me-journalctl`).
-4. Trigger an overlay rebuild for a script overlay (exercises the
-   sandbox; unchanged by this refactor, but a smoke against the
-   full chain).
-If any of those break, the most likely cause is the web unit's
-`SystemCallFilter`. Drop-in override at
-`/etc/systemd/system/left4me-web.service.d/00-debug.conf` with
-`SystemCallLog=...` instead of `SystemCallFilter` to identify the
-offending syscall, then narrow the filter.
+  Sequenced after this; will inherit `User=left4me` cleanly.
+- **Broader configmgmt responsibility reshape** (drop-ins owned by
+  left4me, ckn-bw as thin file-shipper). Deliberately deferred.
+- **Stale RCON port app bug** flagged in the earlier executor's
+  handoff. Separate scope.
+- Renaming `left4me` to anything else. Cosmetic.

 ## Pointers

-- Threat model: `docs/superpowers/specs/2026-05-15-hardening-threat-model.md`
-- Defenses survey: `docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md`
-- Test plan (with executor results + this session's bug fixes):
-  `docs/superpowers/specs/2026-05-15-hardening-test-plan.md`
-- Design doc: `docs/superpowers/specs/2026-05-15-hardening-refactor-design.md`
-- Implementation plan: `docs/superpowers/plans/2026-05-15-hardening-refactor.md`
-- uid-split spec (marked superseded): `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
-- Live unit emission: `~/Projekte/ckn-bw/bundles/left4me/metadata.py`
-  (`HARDENING_COMMON` etc. near top; spreads at the
-  `left4me-server@.service` and `left4me-web.service` entries)
-- Reference units (annotated): `deploy/files/usr/local/lib/systemd/system/`
+- The plan to execute: `docs/superpowers/plans/2026-05-15-uid-collapse.md`
+- Hardening refactor that just landed: `docs/superpowers/plans/2026-05-15-hardening-refactor.md`
+- Hardening threat model + defenses survey + test plan (commit
+  `461b8d0` recorded the test results inline):
+  `docs/superpowers/specs/2026-05-15-hardening-{threat-model,defenses-survey,test-plan}.md`
+- Build-time-idmap plan (about to be marked superseded):
+  `docs/superpowers/plans/2026-05-15-build-time-idmap.md`
+- uid-split spec (also affected — answer revises from "stay at 2"
+  to "collapse to 1"):
+  `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
+- Live source for unit emission: `~/Projekte/ckn-bw/bundles/left4me/metadata.py`
+- Live source for users/groups: `~/Projekte/ckn-bw/bundles/left4me/items.py`
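
Reviewer note (not part of the patch): the pre-flight in step 8 and the "build running during `bw apply`" risk could be folded into one guard that is run on the host before applying. The following is a minimal sketch under stated assumptions, not the plan's actual tooling: the function names `count_uid_entries` and `count_sandbox_units` and the `--check` switch are hypothetical, uid 981 and the two search roots are taken from the plan, and the `left4me-script-*` unit pattern is the one the plan itself greps for.

```bash
#!/usr/bin/env bash
# Hypothetical pre-apply guard. Function names and the --check flag are
# illustrative only; they do not exist in the left4me or ckn-bw repos.

# Count filesystem entries (files and directories) owned by a numeric
# uid under the given roots. Unreadable or missing paths are ignored.
count_uid_entries() {
    local uid="$1"; shift
    find "$@" -uid "$uid" -print 2>/dev/null | wc -l | tr -d '[:space:]'
}

# Count loaded transient sandbox units, assuming the left4me-script-*
# naming used in the plan's journalctl and systemctl checks.
count_sandbox_units() {
    systemctl list-units --type=service --no-legend 'left4me-script-*' \
        2>/dev/null | wc -l | tr -d '[:space:]'
}

# The checks run only when the script is invoked with --check, so the
# functions can be sourced or tested without touching the host.
if [ "${1:-}" = "--check" ]; then
    stragglers=$(count_uid_entries 981 /var/lib/left4me /opt/left4me)
    units=$(count_sandbox_units)
    echo "uid-981 entries: ${stragglers}; sandbox units loaded: ${units}"
    if [ "$stragglers" -eq 0 ] && [ "$units" -eq 0 ]; then
        echo "OK to run: bw apply ovh.left4me"
    else
        echo "Hold: chown stragglers or wait for the build to finish" >&2
        exit 1
    fi
fi
```

It would need to run with root privileges on the host (for example under `sudo`) so that `find` can read every directory; a non-zero exit means the apply should wait. Widening the roots passed to `count_uid_entries` to `/` would cover the "paranoid" variant left open in the handoff.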