refactor(sandbox): collapse l4d2-sandbox user into left4me

The hardening refactor that just landed closes the same-uid attack surface (FS view, ptrace, /proc visibility, signals) for the web + gameserver units via systemd directives plus system-wide kernel.yama.ptrace_scope=2. Keeping the script-sandbox on a separate uid was the inconsistent half-step: defense-in-depth only, with build-time-idmap complexity attached. One principle wins: harden once, share the uid.

scripts/libexec/left4me-script-sandbox: drop the idmap block (uid lookups, STAGING setup, cleanup_staging trap, mount --bind --map-users), switch User=/Group= to left4me, point BindPaths at $OVERLAY_DIR directly. Header comment updated to reflect hardening-not-uid as the same-uid defense. nsenter self-wrap kept; it's about mount-namespace escape, not uid.

Tests + comments + companion docs updated. Build-time-idmap and overlay-idmap plans marked SUPERSEDED; user-uid-split spec revised to "1 user is correct"; short update notes on the hardening specs and the build-overlay-unit-design.

Companion ckn-bw commit removes the l4d2-sandbox user + group and changes /var/lib/left4me from 0711 to 0755 (the traverse-only mode was specifically for the sandbox uid).
2026-05-15 15:50:57 +02:00
commit 8971b23617
parent 146cb01450
11 changed files with 80 additions and 93 deletions
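Because the collapsed uid now relies on `kernel.yama.ptrace_scope=2` rather than a uid boundary, a deploy-time check of that sysctl may be worth having. The following is a minimal sketch, not part of this commit: the function names are hypothetical, and it assumes the value is readable at the standard procfs path.

```python
from pathlib import Path
from typing import Optional

# Standard procfs location of the Yama setting. The path is a parameter so
# the check can be exercised against a fixture file instead of the live host.
YAMA_PTRACE_SCOPE = Path("/proc/sys/kernel/yama/ptrace_scope")


def ptrace_scope(path: Path = YAMA_PTRACE_SCOPE) -> Optional[int]:
    """Return the Yama ptrace scope, or None if it cannot be read or parsed."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def same_uid_ptrace_blocked(path: Path = YAMA_PTRACE_SCOPE) -> bool:
    """True when the scope is 2 (admin-only attach) or 3 (no attach).

    Scopes 0 and 1 still allow some same-uid attaches, so they do not
    satisfy the assumption this commit depends on.
    """
    scope = ptrace_scope(path)
    return scope is not None and scope >= 2
```

A deploy script could call `same_uid_ptrace_blocked()` and refuse to continue when it returns `False`; whether a hard failure is appropriate is a judgment call for the deployment.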
--- a/deploy/README.md
+++ b/deploy/README.md
@@ -77,21 +77,20 @@ The deployment uses these on-host paths (FHS-aligned):
 ## Runtime users
-Two system users are involved:
+One system user does everything:
 - **`left4me`** (home `/var/lib/left4me`, shell `/usr/sbin/nologin`):
-  web app, host library, and gameserver runtime.
+  web app, host library, gameserver runtime, and script-overlay
-- **`l4d2-sandbox`** (no home, shell `/usr/sbin/nologin`): unprivileged
+  sandbox. The sandbox unit drops privileges via `systemd-run` and
-  uid the script-overlay sandbox drops into via `systemd-run`. The
+  runs the user-authored bash inside a fully hardened transient
-  `left4me-script-sandbox` helper sets up an idmapped bind from the
+  service (see `scripts/libexec/left4me-script-sandbox`). Same-uid
-  sandbox uid back to `left4me` on a staging path so overlay writes
+  attack surface — sandbox escape reaching `web.env`, the SQLite DB,
-  land on disk as `left4me`-owned. The split is load-bearing: a
+  or running gameservers — is closed by that hardening profile plus
-  sandbox escape would otherwise see `web.env`, the SQLite DB, and
+  system-wide `kernel.yama.ptrace_scope=2`, rather than by a uid
-  running gameservers.
+  boundary.
-(Whether the gameserver runtime should be split off into a third uid is
+The user-count decision and its history live in
-an open design question — see
+`docs/superpowers/specs/2026-05-15-user-uid-split-design.md`.
-`docs/superpowers/specs/2026-05-15-user-uid-split-design.md`.)
 ## Deployment
@@ -137,10 +136,10 @@ The web app currently supports two overlay surfaces:
  symlinks under
  `${LEFT4ME_ROOT}/overlays/{overlay_id}/left4dead2/addons/{steam_id}.vpk`.
 - **`script` overlays** — populated by an arbitrary user-authored bash
-  script that runs inside `systemd-run` as the unprivileged
+  script that runs inside `systemd-run` as `left4me` (under a fully
-  `l4d2-sandbox` UID, with the overlay directory bind-mounted RW at
+  hardened transient service unit), with the overlay directory
-  `/overlay`. Resource caps: 1h walltime, 4 GB RAM, 512 tasks, 200% CPU,
+  bind-mounted RW at `/overlay`. Resource caps: 1h walltime, 4 GB RAM,
-  20 GB post-build disk cap.
+  512 tasks, 200% CPU, 20 GB post-build disk cap.
 Both caches and overlay directories are owned by `left4me`. If the web
 service ever runs as a different uid, ensure it shares a group with the
--- a/docs/superpowers/plans/2026-05-14-overlay-idmap.md
+++ b/docs/superpowers/plans/2026-05-14-overlay-idmap.md
@@ -1,5 +1,11 @@
 # Idmapped lowerdirs for left4me kernel-overlayfs
+> **SUPERSEDED 2026-05-15** by the uid-collapse refactor
+> ([`2026-05-15-uid-collapse.md`](2026-05-15-uid-collapse.md)). With
+> `l4d2-sandbox` collapsed into `left4me`, all overlay content is
+> uniformly `left4me`-owned end-to-end and no idmap is needed at
+> mount time either. Kept for design-evolution context.
 ## Context
 Kernel-overlayfs copy-up preserves the lower-layer file's owner and mode in the
--- a/docs/superpowers/plans/2026-05-15-build-time-idmap.md
+++ b/docs/superpowers/plans/2026-05-15-build-time-idmap.md
@@ -1,6 +1,12 @@
 # Build-time idmap: move the uid translation from the gameserver mount
 into the script sandbox
+> **SUPERSEDED 2026-05-15** by the uid-collapse refactor
+> ([`2026-05-15-uid-collapse.md`](2026-05-15-uid-collapse.md)). The
+> idmap pattern this plan introduced is removed because source uid
+> (`left4me`) now equals target uid (`left4me`) — the translation is
+> a no-op. Kept for design-evolution context.
 ## Context
 The current idmap implementation translates uids at **gameserver mount
--- a/docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md
+++ b/docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md
@@ -9,6 +9,13 @@ The same pattern is already established in the codebase for
 gameservers (`left4me-server@.service`). A future session should
 evaluate whether to refactor and, if so, follow the steps below.
+> **Updated 2026-05-15:** `l4d2-sandbox` was collapsed into `left4me`
+> — see `docs/superpowers/plans/2026-05-15-uid-collapse.md`. The
+> idmap bind setup + trap cleanup are gone, so the remaining
+> complexity in the helper is just the nsenter self-wrap. References
+> below to `User=l4d2-sandbox` should read as `User=left4me`; the
+> template refactor will inherit that cleanly.
 ## Why this came up
 While verifying the build-time idmap refactor, the first 5 build jobs
--- a/docs/superpowers/specs/2026-05-15-hardening-refactor-design.md
+++ b/docs/superpowers/specs/2026-05-15-hardening-refactor-design.md
@@ -6,6 +6,12 @@ Companion: `2026-05-15-hardening-threat-model.md`,
 `2026-05-15-hardening-defenses-survey.md`,
 `2026-05-15-hardening-test-plan.md` (executed 2026-05-15, results inline).
+> **Updated 2026-05-15:** `l4d2-sandbox` was collapsed into `left4me`
+> after this refactor landed — see
+> `docs/superpowers/plans/2026-05-15-uid-collapse.md`. References below
+> to the sandbox running as a separate uid describe the pre-collapse
+> state; the directive composition this doc establishes is unchanged.
 This doc records the *shape* of the refactor — where the artifacts live,
 how they're factored, what's in scope. The implementation plan lays out
 the steps.
--- a/docs/superpowers/specs/2026-05-15-hardening-threat-model.md
+++ b/docs/superpowers/specs/2026-05-15-hardening-threat-model.md
@@ -4,6 +4,13 @@
 Paired with `2026-05-15-hardening-defenses-survey.md` and
 `2026-05-15-hardening-test-plan.md`.
+> **Updated 2026-05-15:** `l4d2-sandbox` was collapsed into `left4me`
+> after the hardening refactor landed — see
+> `docs/superpowers/plans/2026-05-15-uid-collapse.md`. The same-uid
+> threat surface that doc accepts is the same surface this model
+> already documents for server/web; the sandbox is now in scope of
+> the same hardening profile.
 This document establishes *what we defend against and what we accept losing*.
 The defenses survey and test plan operationalize this against the codebase.
--- a/docs/superpowers/specs/2026-05-15-user-uid-split-design.md
+++ b/docs/superpowers/specs/2026-05-15-user-uid-split-design.md
@@ -1,10 +1,10 @@
 # How many system users should left4me have? — 1, 2, or 3
-**Status: SUPERSEDED 2026-05-15 by the hardening refactor.**
+**Status: SUPERSEDED 2026-05-15 by the hardening refactor + uid-collapse.**
 The original question — should left4me have 1, 2, or 3 system users — is
-now answered: **2 users (current state) is correct.** The
+now answered: **1 user (after the uid-collapse refactor) is correct.**
-defenses that motivated a 3-user split (DB readability from srcds,
+The defenses that motivated a multi-user split (DB readability from srcds,
 cross-server ptrace, same-uid /proc visibility, web-side reach into
 gameserver state) are closed by the systemd hardening composition
 landed in the hardening-refactor plan (`docs/superpowers/plans/2026-05-15-hardening-refactor.md`):
@@ -14,12 +14,22 @@ landed in the hardening-refactor plan (`docs/superpowers/plans/2026-05-15-harden
  srcds entirely.
 - `SystemCallFilter=~@debug` + empty `CapabilityBoundingSet=` block
  ptrace at the syscall layer.
 - System-wide `kernel.yama.ptrace_scope=2` blocks same-uid ptrace.
+The interim state (`left4me` + `l4d2-sandbox`) recorded earlier in this
+doc was the principled middle ground — script-sandbox builds keeping a
+separate uid for kernel-enforced isolation. After the hardening
+refactor closed the same-uid attack surface for server/web, leaving
+the sandbox on a separate uid was the inconsistent half-step. The
+uid-collapse refactor (`docs/superpowers/plans/2026-05-15-uid-collapse.md`)
+removed `l4d2-sandbox` so the sandbox now runs as `left4me`, defended
+by the same hardening profile. One principle: hardening covers it.
 The residual filesystem-ACL surface (DB at `0640 root:left4me`,
 web.env same) is a separate concern: a uid split would close it via
 kernel ACLs, but for the current deployment shape it's covered by the
 systemd-imposed FS view. If the deployment shape changes (multi-tenant
-host, shell logins as the service uids, additional services running
+host, shell logins as the service uid, additional services running
 as `left4me` outside these units) the uid split should be revisited.
 The original content of this spec is preserved below for context.
--- a/l4d2host/instances.py
+++ b/l4d2host/instances.py
@@ -73,13 +73,12 @@ def start_instance(
    runtime_dir = root / "runtime" / name
    # Stage cfg files in the upper layer. Writing here goes straight to the
-    # upper dir on the host filesystem with the worker's uid; the unit's
+    # upper dir on the host filesystem; the unit's ExecStartPre then mounts
-    # ExecStartPre then mounts the overlay (single source of truth for the
+    # the overlay (single source of truth for the mount), and the kernel
-    # mount), and the kernel surfaces these files at the top of the merged
+    # surfaces these files at the top of the merged stack. All overlay
-    # stack. A script-sandbox-built lower-layer `server.cfg` is owned by
+    # content (script-built lowers + this upper stage) is left4me-owned
-    # `l4d2-sandbox`, not the worker — staging in upper sidesteps the
+    # end-to-end, so the kernel's overlay copy-up is uniform — no
-    # ownership-preserving copy-up that would happen if we wrote through
+    # ownership crossings to reason about.
-    # merged post-mount.
    emit_step("staging server.cfg + per-overlay aliases in upper layer...", on_stdout, passthrough)
    upper_cfg_dir = runtime_dir / "upper" / "left4dead2" / "cfg"
    upper_cfg_dir.mkdir(parents=True, exist_ok=True)
--- a/l4d2web/services/overlay_builders.py
+++ b/l4d2web/services/overlay_builders.py
@@ -339,7 +339,7 @@ def run_sandboxed_script(
        f.write(script_text or "")
        script_path = f.name
    # NamedTemporaryFile creates 0600 owned by the web user; the sandbox runs
-    # as l4d2-sandbox and needs to read it (bind-mounted at /script.sh inside
+    # as left4me and needs to read it (bind-mounted at /script.sh inside
    # the sandbox). Script content is not a secret — it's plain bash stored
    # in the DB and editable by the user — so 0644 is appropriate.
    os.chmod(script_path, 0o644)
--- a/scripts/libexec/left4me-script-sandbox
+++ b/scripts/libexec/left4me-script-sandbox
@@ -15,8 +15,11 @@
 # LockPersonality, RestrictSUIDSGID. Network namespace is *not* restricted —
 # scripts must reach the public internet to download workshop / l4d2center
 # / cedapug content. PID namespace is shared with the host (no
-# PrivatePID= directive in systemd); host PIDs are visible via /proc but
+# PrivatePID= directive in systemd); host PIDs are visible via /proc.
-# not signal-able due to UID mismatch.
+# Same-uid attack surface (the sandbox runs as left4me, so do the
+# gameservers and the web app) is covered by the hardening profile plus
+# system-wide kernel.yama.ptrace_scope=2 — see
+# docs/superpowers/specs/2026-05-15-hardening-threat-model.md.
 set -euo pipefail
 # Self-wrap into PID 1's mount namespace before doing anything mount-related.
@@ -46,43 +49,12 @@ if [[ "${LEFT4ME_SCRIPT_SANDBOX_DRY_RUN:-}" == "1" ]]; then
    exit 0
 fi
-# Pre-create an idmapped bind of the overlay dir, then point the sandbox's
-# BindPaths at that staging path. The bind translates the sandbox's writing
-# uid (l4d2-sandbox) back to left4me on disk, so all overlay content
-# (script-built and workshop) is uniformly left4me-owned. Map direction:
-# `--map-users=<disk_uid>:<mount_uid>:1` with disk=left4me, mount=sandbox —
-# a process inside the bind with uid sandbox sees its uid as itself, and
-# writes get translated to disk-uid left4me. Verified on kernel 6.12 that
-# idmap propagates through systemd-run's plain re-bind of the staging path.
-LEFT4ME_UID=$(id -u left4me)
-LEFT4ME_GID=$(id -g left4me)
-SANDBOX_UID=$(id -u l4d2-sandbox)
-SANDBOX_GID=$(id -g l4d2-sandbox)
-STAGING=/var/lib/left4me/tmp/sandbox-idmap-${OVERLAY_ID}
-# trap fires even on errors / signals so the staging bind doesn't outlive
-# this invocation. Idempotent if the staging is already gone.
-cleanup_staging() {
-    umount "$STAGING" 2>/dev/null || true
-    rmdir "$STAGING" 2>/dev/null || true
-}
-trap cleanup_staging EXIT
-# A leftover staging mount from a SIGKILLed prior run can be reset by
-# umounting first, then re-binding fresh on the same path.
-umount "$STAGING" 2>/dev/null || true
-mkdir -p "$STAGING"
-mount --bind \
-    --map-users="${LEFT4ME_UID}:${SANDBOX_UID}:1" \
-    --map-groups="${LEFT4ME_GID}:${SANDBOX_GID}:1" \
-    "$OVERLAY_DIR" "$STAGING"
 SCRIPT_RC=0
 systemd-run --quiet --collect --wait --pipe \
    --unit="left4me-script-${OVERLAY_ID}-$$" \
    --slice=l4d2-build.slice \
    -p OOMScoreAdjust=500 \
-    -p User=l4d2-sandbox -p Group=l4d2-sandbox \
+    -p User=left4me -p Group=left4me \
    -p UMask=0022 \
    -p NoNewPrivileges=yes \
    -p ProtectSystem=strict -p ProtectHome=yes \
@@ -99,7 +71,7 @@ systemd-run --quiet --collect --wait --pipe \
    -p IPAddressDeny="127.0.0.0/8 ::1/128 169.254.0.0/16 fe80::/10 224.0.0.0/4 ff00::/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 100.64.0.0/10 fc00::/7" \
    -p TemporaryFileSystem="/etc /var/lib" \
    -p BindReadOnlyPaths="/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf /etc/ssl /etc/ca-certificates /etc/nsswitch.conf /etc/alternatives ${SCRIPT}:/script.sh" \
-    -p BindPaths="${STAGING}:/overlay" \
+    -p BindPaths="${OVERLAY_DIR}:/overlay" \
    -p WorkingDirectory=/overlay \
    -p Environment="HOME=/tmp PATH=/usr/bin:/usr/sbin OVERLAY=/overlay" \
    -p MemoryMax=4G -p MemorySwapMax=0 -p TasksMax=512 \
--- a/scripts/tests/test_script_sandbox.py
+++ b/scripts/tests/test_script_sandbox.py
@@ -33,8 +33,8 @@ def test_script_sandbox_helper_invokes_systemd_run_with_hardening():
    assert "bubblewrap" not in text
    # UID drop via systemd directives.
-    assert "User=l4d2-sandbox" in text
+    assert "User=left4me" in text
-    assert "Group=l4d2-sandbox" in text
+    assert "Group=left4me" in text
    # Cgroup limits unchanged from v1.
    assert "MemoryMax=4G" in text
@@ -80,7 +80,7 @@ def test_script_sandbox_helper_invokes_systemd_run_with_hardening():
    assert "/etc/nsswitch.conf" in text
    assert "/etc/alternatives" in text
    assert "${SCRIPT}:/script.sh" in text
-    assert 'BindPaths="${STAGING}:/overlay"' in text
+    assert 'BindPaths="${OVERLAY_DIR}:/overlay"' in text
    # IP egress filter: allow public, deny localhost / RFC1918 / link-local /
    # multicast / CGNAT / ULA. systemd's "more specific rule wins" semantics
@@ -110,29 +110,6 @@ def test_script_sandbox_helper_invokes_systemd_run_with_hardening():
        assert token in text, f"missing {token!r} in IPAddressDeny set"
-def test_script_sandbox_uses_idmap_staging():
-    """The sandbox runs as l4d2-sandbox but writes need to land on disk as
-    left4me, so all overlay content (workshop + script-built) is uniformly
-    left4me-owned. The helper pre-creates an idmapped bind on a staging
-    path and points the sandbox's BindPaths at the staging, not at the raw
-    overlay dir. trap cleans up the staging bind on exit.
-    """
-    text = SCRIPT_SANDBOX_HELPER.read_text()
-    # Idmap mount setup uses --map-users / --map-groups.
-    assert "--map-users=" in text
-    assert "--map-groups=" in text
-    # Staging path lives under /var/lib/left4me/tmp/sandbox-idmap-<id>.
-    assert "/var/lib/left4me/tmp/sandbox-idmap-" in text
-    # BindPaths into the sandbox points at the staging path, not the
-    # raw overlay dir.
-    assert 'BindPaths="${STAGING}:/overlay"' in text
-    # trap registers cleanup so the staging bind doesn't outlive the helper.
-    assert "trap " in text and "cleanup_staging" in text
-    # The previous chown-to-l4d2-sandbox approach is gone; overlay dirs
-    # stay left4me-owned end-to-end.
-    assert "chown -R l4d2-sandbox" not in text
 def test_script_sandbox_in_build_slice_with_oom_adjust():
    text = SCRIPT_SANDBOX_HELPER.read_text()
@@ -162,10 +139,8 @@ def test_script_sandbox_helper_dry_run_mode(tmp_path):
    fake_script = tmp_path / "fake.sh"
    fake_script.write_text("echo hi")
-    # Run in DRY_RUN mode against a fake l4d2-sandbox UID via a tiny shim that
-    # simulates `id -u l4d2-sandbox` resolving to a valid number.
    helper_text = SCRIPT_SANDBOX_HELPER.read_text()
-    # We can't actually exec this without root + a real sandbox user; just
+    # We can't actually exec this without root; just verify the dry-run
-    # verify the dry-run guard short-circuits before systemd-run runs.
+    # guard short-circuits before systemd-run runs.
    assert 'LEFT4ME_SCRIPT_SANDBOX_DRY_RUN' in helper_text
    assert 'exit 0' in helper_text
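
Review note: the diff removes `test_script_sandbox_uses_idmap_staging` and only asserts the new positive strings (`User=left4me`, `BindPaths="${OVERLAY_DIR}:/overlay"`). A negative guard against the idmap block creeping back could look like the sketch below. It is not part of this commit; the helper name `find_idmap_remnants` and the token list are hypothetical, with tokens taken from the lines this diff removes.

```python
from __future__ import annotations

# Strings that only appeared in the removed idmap block or the old
# User=/Group= directives. Hypothetical list, derived from this diff.
IDMAP_REMNANTS = (
    "--map-users=",
    "--map-groups=",
    "sandbox-idmap-",
    "cleanup_staging",
    "User=l4d2-sandbox",
    "Group=l4d2-sandbox",
)


def find_idmap_remnants(helper_text: str) -> list[str]:
    """Return every remnant token still present, in IDMAP_REMNANTS order."""
    return [token for token in IDMAP_REMNANTS if token in helper_text]
```

In `scripts/tests/test_script_sandbox.py` this could be used as `assert find_idmap_remnants(SCRIPT_SANDBOX_HELPER.read_text()) == []`, assuming the helper's comments do not quote any of these tokens for historical context.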