plan(uid-collapse): drop l4d2-sandbox user; handoff to next session

Approved-but-not-executed plan to collapse the two-user model
(left4me + l4d2-sandbox) into one. The build-time-idmap that
translates sandbox writes back to left4me uid becomes a no-op when
source uid == target uid, so it's removed along with ~30 lines of
helper plumbing. Hardening already covers the same-uid attack
surface the sandbox uid was defending against, so collapsing makes
the architecture consistent with the web/server hardening-only
decision.

Plan: docs/superpowers/plans/2026-05-15-uid-collapse.md
Handoff: docs/superpowers/specs/2026-05-15-session-handoff.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mwiegand 2026-05-15 15:39:51 +02:00
parent f5f8db84ef
commit 146cb01450
2 changed files with 324 additions and 103 deletions

@@ -0,0 +1,226 @@
# UID collapse — remove `l4d2-sandbox` user
## Context
The hardening refactor that landed earlier today
(`docs/superpowers/plans/2026-05-15-hardening-refactor.md`) deployed
the systemd-directive composition that covers all same-uid attack
vectors for the gameserver + web units running as `left4me`.
The script-sandbox unit still runs as a separate uid `l4d2-sandbox`
(981) with a build-time idmap (`mount --bind --map-users=980:981:1`)
translating sandbox-side writes to land on disk as `left4me`. After
the hardening refactor, the same-uid attack vectors the sandbox uid
defends against (FS-view access, ptrace, /proc, signals) are
already closed by the sandbox's own systemd-run hardening profile.
The separate uid is now defense-in-depth only — and it's
inconsistent with the decision *not* to split the web/server uid.
Pick one principle. Option C from the discussion: **one user**.
Delete `l4d2-sandbox`, simplify the sandbox helper, remove the
idmap. The architecture gets smaller (one fewer uid, no idmap binds,
~30 lines deleted from the helper). Trade-off: if sandbox hardening
regresses, the kernel uid boundary no longer helps, which is
consistent with what we already accepted for server/web.
## Approach
1. **Edit `scripts/libexec/left4me-script-sandbox`** (left4me repo):
delete the idmap block (lines 49-78 per Phase 1 exploration —
the `LEFT4ME_UID`/`SANDBOX_UID` lookups, `STAGING` setup,
`cleanup_staging` trap, `mount --bind --map-users=…` call).
Change `User=l4d2-sandbox -p Group=l4d2-sandbox` (line 85)
to `User=left4me -p Group=left4me`. Change
`BindPaths="${STAGING}:/overlay"` (line 102) to
`BindPaths="${OVERLAY_DIR}:/overlay"`. Keep the
`nsenter --mount=/proc/1/ns/mnt` self-wrap at the top — it's
about namespace escape, not uid.
2. **Update `scripts/tests/test_script_sandbox.py`** (left4me repo):
- Lines 36-37: change `User=l4d2-sandbox`/`Group=l4d2-sandbox`
assertions → `User=left4me`/`Group=left4me`.
- Delete `test_script_sandbox_uses_idmap_staging` (lines 114-133)
entirely — it asserts the idmap and staging exist; after
refactor neither does.
- Update the comments on lines 165-166 to drop the sandbox-uid reference.
3. **Update inline comments** referencing the sandbox uid:
- `l4d2web/services/overlay_builders.py:342` (or near 100 — agents
reported different lines; locate via grep) — "as l4d2-sandbox"
→ "as left4me".
- `l4d2host/instances.py:80` — comment about l4d2-sandbox-owned
lower-layer files → reflect that all overlay content is now
left4me-owned end-to-end.
4. **Mark the build-time-idmap plan superseded**:
`docs/superpowers/plans/2026-05-15-build-time-idmap.md` — add a
top-line status note: "SUPERSEDED 2026-05-15 by the uid-collapse
refactor (this plan). The idmap pattern this plan introduced is
removed because source uid (`left4me`) now equals target uid
(`left4me`) — translation is a no-op." Same one-line treatment
for `docs/superpowers/plans/2026-05-14-overlay-idmap.md`.
5. **Update the user-uid-split spec's existing superseded header**:
`docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
currently says "2 users (current state) is correct"; revise to
say "1 user (after uid-collapse refactor) is correct" and update
the reasoning paragraph.
6. **Light-touch updates to other docs** that reference
`l4d2-sandbox` for accuracy. Pragmatic scope — add a top-line
note instead of rewriting body content:
- `deploy/README.md` — drop the `l4d2-sandbox` bullet (line 84),
fix the paragraph at line 141 to reflect no-idmap state.
- `docs/superpowers/specs/2026-05-15-hardening-refactor-design.md`
and `2026-05-15-hardening-threat-model.md` — add a one-line
"Updated 2026-05-15: l4d2-sandbox collapsed into left4me; see
plans/2026-05-15-uid-collapse.md" note in the relevant context
section.
- `docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md`
— same one-line note (the spec's hardening profile sketch
references the old `User=l4d2-sandbox`; the new build-overlay-unit
refactor when it lands will inherit `User=left4me` from this
change).
- **Leave the 2026-05-08-* design specs alone.** They describe
historical design at the time; rewriting them obscures the
evolution. Anyone reading them sees the date and the
superseded-note chain leads forward.
7. **Remove `l4d2-sandbox` from the ckn-bw bundle**
(`~/Projekte/ckn-bw/bundles/left4me/items.py`):
- Delete the `l4d2-sandbox` entry from the `users` dict
(lines 54-58 per Phase 1).
- Delete the `l4d2-sandbox` entry from the `groups` dict
(line 44).
- Update the `/var/lib/left4me` mode comment + decide whether to
change `0711` → `0755`. The `0711` was specifically to let
`l4d2-sandbox` traverse (not list) the dir; with sandbox gone,
`0755` is the natural choice. Pick `0755`.
8. **On-host pre-flight**: before `bw apply`, chown any remaining
uid-981 files to `left4me`:
```bash
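# Keep each remote command on one line: a newline inside the quotes
# would split it into two separate commands on the remote shell.
ssh left4.me 'sudo find /var/lib/left4me /opt/left4me -uid 981 -print | head -50'
# If any results, chown them:
ssh left4.me 'sudo find /var/lib/left4me /opt/left4me -uid 981 -exec chown left4me:left4me {} +'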
```
Per the build-time-idmap plan that landed earlier, new sandbox
writes already land as `left4me`, so the result should be small
or empty. The chown catches any stragglers.
9. **Cross-repo push + bw apply**:
- Commit left4me changes (helper, tests, doc updates) on master.
- Commit ckn-bw changes (users/groups deletion, mode change) on
master.
- Push both.
- `bw apply ovh.left4me`.
10. **Verify**:
- `getent passwd l4d2-sandbox` on the host → no result (user
removed).
- `sudo find /var/lib/left4me /opt/left4me -uid 981 -print` →
empty.
- Trigger a sandbox build via the web UI; observe in
`journalctl -u 'left4me-script-*'` that the transient unit
runs as `left4me`, completes successfully, and the resulting
overlay files in `/var/lib/left4me/overlays/<id>/` are
`left4me:left4me`.
- `pytest scripts/tests/test_script_sandbox.py` locally passes
with updated assertions.
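The assertion changes in steps 1 and 2 can be sketched as below. This is a minimal illustration, not the contents of `scripts/tests/test_script_sandbox.py`: the helper names (`systemd_properties`, `check_single_user`) and the example argv are hypothetical, and it assumes the helper passes properties to `systemd-run` as separate `-p VALUE` arguments, as the `User=… -p Group=…` fragment in step 1 suggests.

```python
from typing import List


def systemd_properties(argv: List[str]) -> List[str]:
    """Collect the value that follows each `-p` flag in a systemd-run argv."""
    props = []
    for flag, value in zip(argv, argv[1:]):
        if flag == "-p":
            props.append(value)
    return props


def check_single_user(argv: List[str]) -> None:
    """Assert the post-collapse invariants described in steps 1 and 2."""
    props = systemd_properties(argv)
    assert "User=left4me" in props
    assert "Group=left4me" in props
    # No property may still name the removed account.
    assert not any("l4d2-sandbox" in p for p in props)
    # The idmap call is gone, so no argument should mention --map-users.
    assert not any("--map-users" in arg for arg in argv)


# Hypothetical argv for illustration; the overlay id "42" is made up.
example_argv = [
    "systemd-run", "--wait",
    "-p", "User=left4me",
    "-p", "Group=left4me",
    "-p", "BindPaths=/var/lib/left4me/overlays/42:/overlay",
    "/bin/true",
]
check_single_user(example_argv)
```

Checking for the absence of `l4d2-sandbox` and `--map-users`, rather than only the presence of `User=left4me`, is what would catch a partial edit where the idmap block survives.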
## Files to modify
**Left4me repo (`~/Projekte/left4me`):**
- `scripts/libexec/left4me-script-sandbox`: helper changes (step 1)
- `scripts/tests/test_script_sandbox.py`: test updates (step 2)
- `l4d2web/services/overlay_builders.py`: comment update (step 3)
- `l4d2host/instances.py`: comment update (step 3)
- `docs/superpowers/plans/2026-05-15-build-time-idmap.md`:
SUPERSEDED header (step 4)
- `docs/superpowers/plans/2026-05-14-overlay-idmap.md`:
SUPERSEDED header (step 4)
- `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`:
update existing superseded header (step 5)
- `docs/superpowers/specs/2026-05-15-hardening-refactor-design.md`:
one-line note (step 6)
- `docs/superpowers/specs/2026-05-15-hardening-threat-model.md`:
one-line note (step 6)
- `docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md`:
one-line note (step 6)
- `deploy/README.md`: drop sandbox bullet, update idmap paragraph
(step 6)
**Ckn-bw repo (`~/Projekte/ckn-bw`):**
- `bundles/left4me/items.py`: drop `l4d2-sandbox` user + group;
relax mode from `0711` to `0755` (step 7)
**Host actions (no commits):**
- pre-flight chown of orphan-981 files (step 8)
- `bw apply ovh.left4me` (step 9)
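For orientation, the ckn-bw change in step 7 might end up looking roughly like the excerpt below. This is a hypothetical sketch of `bundles/left4me/items.py`, not its real contents: only uid 980, the `/var/lib/left4me` path, and the `0711` to `0755` mode change come from this plan; the gid and home values are placeholders.

```python
# Hypothetical excerpt of bundles/left4me/items.py after step 7.
# Values marked "assumption" are placeholders, not taken from the repo.

groups = {
    "left4me": {
        "gid": 980,  # assumption: gid mirrors the uid
    },
    # the "l4d2-sandbox" entry is deleted
}

users = {
    "left4me": {
        "uid": 980,
        "gid": 980,                  # assumption
        "home": "/var/lib/left4me",  # assumption
    },
    # the "l4d2-sandbox" entry is deleted
}

directories = {
    "/var/lib/left4me": {
        "owner": "left4me",
        "group": "left4me",
        # Was 0711 so that l4d2-sandbox could traverse but not list the
        # directory; with a single user, 0755 is the natural choice.
        "mode": "0755",
    },
}
```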
## Verification
End-to-end on `left4.me`:
```bash
# User removed
ssh left4.me 'getent passwd l4d2-sandbox; getent group l4d2-sandbox'
# Expect: empty (both)
# No orphan-uid files
ssh left4.me 'sudo find /var/lib/left4me /opt/left4me -uid 981 -print 2>/dev/null'
# Expect: empty
# Sandbox build runs as left4me end-to-end
# (Trigger via web UI; then check)
ssh left4.me 'sudo journalctl --since "5 minutes ago" -u "left4me-script-*" | head -30'
# Expect: clean run, no permission errors
ssh left4.me 'sudo ls -ln /var/lib/left4me/overlays/<id>/ | head -5'
# Expect: uid 980 (left4me), not 981
# Local tests
cd ~/Projekte/left4me && pytest scripts/tests/test_script_sandbox.py -q
# Expect: all green (one fewer test — the idmap test was deleted)
```
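If the orphan-uid check needs to run somewhere `find` output is awkward to consume (for example inside a test or a small script), a rough Python equivalent of `find ROOT -uid N` is sketched below. The function name `files_owned_by` is hypothetical; the uid 981 and the two roots are the ones used above.

```python
import os
from typing import Iterator


def files_owned_by(root: str, uid: int) -> Iterator[str]:
    """Yield every path under root (root included) whose owner is uid."""
    # lstat() inspects a symlink itself rather than its target, which
    # matches the default behaviour of `find ... -uid N`.
    if os.lstat(root).st_uid == uid:
        yield root
    # os.walk() does not descend into symlinked directories by default and
    # silently skips directories it cannot read, so run this as root.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_uid == uid:
                    yield path
            except FileNotFoundError:
                continue  # entry vanished while scanning


if __name__ == "__main__":
    SANDBOX_UID = 981  # the l4d2-sandbox uid named in this plan
    for scan_root in ("/var/lib/left4me", "/opt/left4me"):
        if os.path.isdir(scan_root):
            for orphan in files_owned_by(scan_root, SANDBOX_UID):
                print(orphan)
```

An empty output corresponds to the "Expect: empty" line in the shell check.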
## Rollback
If the deploy goes wrong:
- `git revert` the left4me commits + the ckn-bw commit, push,
`bw apply` again.
- ckn-bw will recreate the `l4d2-sandbox` user on the host.
- The old helper script comes back via `git_deploy`.
- Any files chown'd from 981→980 in the pre-flight stay at 980 —
that's fine because the new helper would have written them as 980
anyway.
## Risks
- **Sandbox build running during `bw apply`**: ckn-bw's user-removal
step might fail if a `l4d2-sandbox`-uid process is alive.
Mitigation: don't apply during a build. Quick check before apply:
`ssh left4.me 'sudo systemctl list-units --type=service "left4me-script-*"'`
→ expect "0 loaded units".
- **Orphan files not caught by the pre-flight find**: if any uid-981
file exists outside `/var/lib/left4me` or `/opt/left4me`, the user
removal succeeds but the file becomes orphan-uid. Practically these
paths are exhaustive; if paranoid, expand the find to `/`.
- **The `nsenter` self-wrap exists only because the web unit sets
`PrivateTmp=true`.** If the web unit's PrivateTmp ever goes away,
the wrap becomes unnecessary. Not affected by this refactor; flag
for future cleanup.
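The first risk's pre-apply check could be automated along these lines. This is a sketch under assumptions: the function names are hypothetical, and it relies on `systemctl list-units --plain --no-legend` printing one unit per line with the unit name in the first column. `running_sandbox_units()` has to run on the host itself.

```python
import subprocess
from typing import List


def parse_unit_names(list_units_output: str) -> List[str]:
    """Extract unit names from `systemctl list-units --plain --no-legend` output."""
    names = []
    for line in list_units_output.splitlines():
        fields = line.split()
        if fields:
            names.append(fields[0])  # first column is the unit name
    return names


def running_sandbox_units() -> List[str]:
    """Return loaded left4me-script-* service units (run on the host)."""
    result = subprocess.run(
        ["systemctl", "list-units", "--type=service", "--plain",
         "--no-legend", "left4me-script-*"],
        capture_output=True, text=True, check=True,
    )
    return parse_unit_names(result.stdout)
```

An empty list corresponds to the "0 loaded units" expectation; anything else means a build is still in flight and `bw apply` should wait.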
## Out of scope
- Renaming `left4me` to something else (e.g., `l4d2-app`). Cosmetic
only; not worth the migration cost.
- The broader configmgmt responsibility reshape (drop-ins owned by
left4me, ckn-bw as thin file-shipper). Deferred per the
hardening-refactor design.
- `build-overlay-unit` template refactor
(`docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md`)
— still queued; will inherit `User=left4me` cleanly from this work.
- Rewriting historical 2026-05-08-* design specs.

@@ -1,126 +1,121 @@
# Session handoff — hardening refactor landed
# Session handoff — next: execute uid-collapse plan
The hardening refactor planned at
`docs/superpowers/plans/2026-05-15-hardening-refactor.md` is deployed
to `left4.me` and verified. This session executed all 12 tasks
subagent-driven; no follow-up implementation work is queued.
The hardening refactor landed and was verified on `left4.me` earlier
today. A follow-up question surfaced: the two-user model
(`left4me` + `l4d2-sandbox`) is inconsistent now that systemd
hardening covers the same-uid attack surface. The asymmetry was
hashed out and **Option C** (collapse to one user) was chosen. A plan
was written but **not executed**. The next session picks it up.
## What landed
## What just landed (committed + pushed earlier today)
**left4me commits** (this session, in order; all on `master`, pushed):
- `7c64910`: `spec(hardening-refactor): resolve emitter open items`
(verified ckn-bw systemd-bundle emitter handles tuples + empty values)
- `8e678b6`: `deploy/files: annotate reference units with per-directive hardening comments`
- `37309ba`: `spec(hardening-test-plan): fix four bugs surfaced by executor`
- `f615d0d`: `spec(user-uid-split): mark superseded by the hardening refactor`
The hardening refactor — full directive composition deployed to
`left4.me`. server@1 went 7.5 → 1.3 systemd-analyze; web 8.7 → 4.1;
all Test 8 attack vectors blocked. See the prior session-handoff
content in this file's git history (`git log --oneline -- this-file`)
and the close-out commits.
**ckn-bw commits** (this session, in order; all on `master`, pushed):
- `85b9af0`: `bundles/left4me: add HARDENING_{COMMON,SERVER,WEB} constants`
- `640461c`: `bundles/left4me: spread HARDENING_SERVER into left4me-server@.service`
- `c6721e7`: `bundles/left4me: spread HARDENING_WEB into left4me-web.service`
- `130b0b1`: `bundles/left4me: ship kernel.yama.ptrace_scope=2 sysctl drop-in`
## What's next: execute `2026-05-15-uid-collapse.md`
**Deploy:** `bw apply ovh.left4me` ran clean in 10 s (194 OK, 4 fixed,
0 failed). `left4me-web.service` restarted automatically by `bw`;
`left4me-server@1` and `@2` restarted manually post-apply.
Plan: `docs/superpowers/plans/2026-05-15-uid-collapse.md`. Approved
in plan-mode this session; not executed.
## What's live on `left4.me`
Scope (10 steps; see plan for detail):
| Unit | systemd-analyze score | State |
|---|---|---|
| `left4me-server@1.service` | **1.3 OK** (was 7.5 baseline) | active since 13:13:39 UTC |
| `left4me-server@2.service` | 1.3 OK | active since 13:14:40 UTC |
| `left4me-web.service` | **4.1 OK** (was 8.7 baseline) | active since 13:01:06 UTC |
1. Strip the idmap block from `scripts/libexec/left4me-script-sandbox`
(~30 lines deleted), change `User=l4d2-sandbox` → `User=left4me`,
`BindPaths="${STAGING}:/overlay"` → `BindPaths="${OVERLAY_DIR}:/overlay"`.
Keep the `nsenter` self-wrap (it's about namespace escape, not
uid — unaffected).
2. Update `scripts/tests/test_script_sandbox.py` — assertion changes
+ delete the `test_script_sandbox_uses_idmap_staging` test.
3. Update two inline comments referencing `l4d2-sandbox`.
4-6. Doc updates: mark `2026-05-15-build-time-idmap.md` and
`2026-05-14-overlay-idmap.md` superseded; revise the
user-uid-split superseded header to say "1 user" instead of
"2"; one-line notes in the hardening specs.
7. Remove `l4d2-sandbox` from `~/Projekte/ckn-bw/bundles/left4me/items.py`
(users + groups dicts). Relax `/var/lib/left4me` mode from
`0711` → `0755`.
8. **On-host pre-flight**: `ssh left4.me` + `sudo find -uid 981`,
chown any stragglers to `left4me` BEFORE applying. ckn-bw won't
gracefully remove a user that still owns files on disk or has
running processes.
9. Push both repos; `bw apply ovh.left4me`.
10. Verify: `getent passwd l4d2-sandbox` empty, no uid-981 files,
sandbox build runs as left4me end-to-end via the web UI.
Sysctl: `kernel.yama.ptrace_scope = 2` (managed by ckn-bw bundle now,
not hand-applied).
Rollback path documented in the plan (git revert + bw apply
recreates the user).
Composition matches Test 7 of the test plan with two amendments
(`SystemCallArchitectures=native x86`, `PrivatePIDs=true`) and one
addition (`SocketBindAllow=udp:27000-27999 tcp:27000-27999`).
`MemoryDenyWriteExecute=true` permanently excluded.
## Why we're doing this
## Attack vectors blocked (Test 8 subset rerun post-deploy)
The two-user setup was the inconsistent middle ground:
- Server + web run as `left4me` because hardening covers the
threat — uid split would be 1-2 days of cross-repo migration for
marginal kernel-enforcement benefit.
- Sandbox runs as `l4d2-sandbox` for historical reasons — the
build-time-idmap design baked it in.
- **D1.a — srcds reads DB**: `cat /var/lib/left4me/left4me.db` from
inside the unit's mount namespace → `No such file or directory`
- **D1.b — srcds reads web.env**: `cat /etc/left4me/web.env`
→ `No such file or directory`
- **D1.c — srcds sees /opt**: empty listing
- **D2.b — srcds sees gunicorn PID via /proc**: `cannot access /proc/<pid>`
(PrivatePIDs in effect; PID doesn't exist in the namespace)
- **D5 — cross-instance ptrace**: `cannot access /proc/<peer-srcds-pid>`
(cross-instance PID isolation)
- **Syscall filter compiled correctly**: `ptrace` and `process_vm_*`
not in the compiled allow list (verified via
`systemd-analyze syscall-filter`)
The hardening composition on the sandbox unit (which the
script-sandbox helper applies via `systemd-run -p ...`) already
gives the same protection profile as the gameserver unit. The
separate uid is defense-in-depth only.
## Known acceptable noise
Picking one principle:
- **C** (collapse to one): cheap, deletes ~30 lines of helper code,
removes the build-time-idmap concern entirely. Architecture
simpler. Consistent with the web/server hardening-only decision.
- A (status quo): inconsistent. Documented but not principled.
- B (split fully): 1-2 days of work; we already rejected this for
server/web.
- **One SECCOMP audit line per gameserver restart** (`type=1326`,
i386 syscall 26 = `ptrace`, sig=31 SIGSYS, code=0x80000000
SECCOMP_RET_KILL_PROCESS). Source: srcds's Breakpad crash-reporter
init forks a child that attempts `ptrace`; we block it by design.
The child gets killed; the main srcds process is unaffected. Net
effect: Valve doesn't get crash minidumps from this host.
Acceptable trade-off given the threat model. If the audit-log noise
becomes a problem, switch the SECCOMP filter's action from
`KILL_PROCESS` to `EPERM` via `SystemCallErrorNumber=EPERM` (would
let breakpad fail cleanly instead of getting killed; same security
outcome).
Operator picked C.
## Host cleanup done
## Decision-relevant context already on the host
`gdb`, `libseccomp-dev`, `seccomp` removed via `apt remove --purge`.
Test tooling was installed during the test-plan execution session
(commit `461b8d0`); not needed in steady state. ~13 MB freed.
- After the hardening refactor + bw apply earlier, `left4me-server@*`
and `left4me-web` are running with the full hardening profile.
`kernel.yama.ptrace_scope=2` is set system-wide via the bundle.
- The sandbox unit is currently inactive (it's transient — only
exists during a build). Per the build-time-idmap plan, the
staging path lives at `/var/lib/left4me/tmp/sandbox-idmap-<id>`
during a build.
- ckn-bw's `users` bundle handles the removal mechanically; no
custom dance needed beyond the pre-flight chown.
## What's next
## Open questions to clarify with the operator before/during execution
No queued follow-up from this work. Adjacent open work:
- Whether to expand the pre-flight `find -uid 981` from
`/var/lib/left4me` + `/opt/left4me` to all of `/` for paranoia.
Probably not needed; flag for the implementer's judgement.
- Whether to combine left4me + ckn-bw into a single PR-equivalent
cross-repo commit pair, or push left4me first then ckn-bw. Plan
assumes both pushed before `bw apply`.
## What's NOT next
- **`build-overlay-unit` refactor**
(`docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md`).
Will reuse `HARDENING_COMMON` (or a sandbox-class variant) when it
lands. Sequenced after this; not blocked.
- **Broader configmgmt-responsibility reshape** — hardening as drop-in
files living in left4me with ckn-bw as a thin file-shipper. Real
direction, deliberately deferred to a dedicated session in this
refactor's design doc.
- **Stale RCON port app bug** flagged in the prior executor's handoff.
Not a hardening issue; separate scope.
## Open items the operator should sanity-check manually
I executed everything programmatically that I could. The following
need an eyeballed check via the web UI from your laptop:
1. Login to the web UI; confirm session works (would catch a SECRET_KEY
regression or session-cookie issue).
2. Start/stop a server from the UI (exercises the sudo path on the web
unit; if the SystemCallFilter or any other web hardening broke
sudo, this would fail).
3. View live logs for a server (uses `sudo left4me-journalctl`).
4. Trigger an overlay rebuild for a script overlay (exercises the
sandbox; unchanged by this refactor, but a smoke against the
full chain).
If any of those break, the most likely cause is the web unit's
`SystemCallFilter`. Add a drop-in override at
`/etc/systemd/system/left4me-web.service.d/00-debug.conf` that uses
`SystemCallLog=...` instead of `SystemCallFilter` to identify the
offending syscall, then narrow the filter.
Sequenced after this; will inherit `User=left4me` cleanly.
- **Broader configmgmt responsibility reshape** (drop-ins owned by
left4me, ckn-bw as thin file-shipper). Deliberately deferred.
- **Stale RCON port app bug** flagged in the earlier executor's
handoff. Separate scope.
- Renaming `left4me` to anything else. Cosmetic.
## Pointers
- Threat model: `docs/superpowers/specs/2026-05-15-hardening-threat-model.md`
- Defenses survey: `docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md`
- Test plan (with executor results + this session's bug fixes):
`docs/superpowers/specs/2026-05-15-hardening-test-plan.md`
- Design doc: `docs/superpowers/specs/2026-05-15-hardening-refactor-design.md`
- Implementation plan: `docs/superpowers/plans/2026-05-15-hardening-refactor.md`
- uid-split spec (marked superseded): `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
- Live unit emission: `~/Projekte/ckn-bw/bundles/left4me/metadata.py`
(`HARDENING_COMMON` etc. near top; spreads at the
`left4me-server@.service` and `left4me-web.service` entries)
- Reference units (annotated): `deploy/files/usr/local/lib/systemd/system/`
- The plan to execute: `docs/superpowers/plans/2026-05-15-uid-collapse.md`
- Hardening refactor that just landed: `docs/superpowers/plans/2026-05-15-hardening-refactor.md`
- Hardening threat model + defenses survey + test plan (commit
`461b8d0` recorded the test results inline):
`docs/superpowers/specs/2026-05-15-hardening-{threat-model,defenses-survey,test-plan}.md`
- Build-time-idmap plan (about to be marked superseded):
`docs/superpowers/plans/2026-05-15-build-time-idmap.md`
- uid-split spec (also affected — answer revises from "stay at 2"
to "collapse to 1"):
`docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
- Live source for unit emission: `~/Projekte/ckn-bw/bundles/left4me/metadata.py`
- Live source for users/groups: `~/Projekte/ckn-bw/bundles/left4me/items.py`