refactor(sandbox): collapse l4d2-sandbox user into left4me

The hardening refactor that just landed closes the same-uid attack
surface (FS view, ptrace, /proc visibility, signals) for the web +
gameserver units via systemd directives plus system-wide
kernel.yama.ptrace_scope=2. Keeping the script-sandbox on a separate
uid was the inconsistent half-step — defense-in-depth only, with
build-time-idmap complexity attached. One principle wins: harden
once, share the uid.

scripts/libexec/left4me-script-sandbox: drop the idmap block (uid
lookups, STAGING setup, cleanup_staging trap, mount --bind
--map-users), switch User=/Group= to left4me, point BindPaths at
$OVERLAY_DIR directly. Header comment updated to reflect
hardening-not-uid as the same-uid defense. nsenter self-wrap kept —
it's about mount-namespace escape, not uid.

Tests + comments + companion docs updated. Build-time-idmap and
overlay-idmap plans marked SUPERSEDED; user-uid-split spec revised
to "1 user is correct"; one-line update notes added to the hardening
specs and the build-overlay-unit-design.

Companion ckn-bw commit removes the l4d2-sandbox user + group and
relaxes /var/lib/left4me from 0711 → 0755 (the traverse-only mode
existed specifically for the sandbox uid).
mwiegand 2026-05-15 15:50:57 +02:00
parent 146cb01450
commit 8971b23617
11 changed files with 80 additions and 93 deletions

@ -77,21 +77,20 @@ The deployment uses these on-host paths (FHS-aligned):
## Runtime users
Two system users are involved:
One system user does everything:
- **`left4me`** (home `/var/lib/left4me`, shell `/usr/sbin/nologin`):
web app, host library, and gameserver runtime.
- **`l4d2-sandbox`** (no home, shell `/usr/sbin/nologin`): unprivileged
uid the script-overlay sandbox drops into via `systemd-run`. The
`left4me-script-sandbox` helper sets up an idmapped bind from the
sandbox uid back to `left4me` on a staging path so overlay writes
land on disk as `left4me`-owned. The split is load-bearing: a
sandbox escape would otherwise see `web.env`, the SQLite DB, and
running gameservers.
web app, host library, gameserver runtime, and script-overlay
sandbox. The sandbox unit drops privileges via `systemd-run` and
runs the user-authored bash inside a fully hardened transient
service (see `scripts/libexec/left4me-script-sandbox`). Same-uid
attack surface — sandbox escape reaching `web.env`, the SQLite DB,
or running gameservers — is closed by that hardening profile plus
system-wide `kernel.yama.ptrace_scope=2`, rather than by a uid
boundary.
(Whether the gameserver runtime should be split off into a third uid is
an open design question — see
`docs/superpowers/specs/2026-05-15-user-uid-split-design.md`.)
The user-count decision and its history live in
`docs/superpowers/specs/2026-05-15-user-uid-split-design.md`.
## Deployment
@ -137,10 +136,10 @@ The web app currently supports two overlay surfaces:
symlinks under
`${LEFT4ME_ROOT}/overlays/{overlay_id}/left4dead2/addons/{steam_id}.vpk`.
- **`script` overlays** — populated by an arbitrary user-authored bash
script that runs inside `systemd-run` as the unprivileged
`l4d2-sandbox` UID, with the overlay directory bind-mounted RW at
`/overlay`. Resource caps: 1h walltime, 4 GB RAM, 512 tasks, 200% CPU,
20 GB post-build disk cap.
script that runs inside `systemd-run` as `left4me` (under a fully
hardened transient service unit), with the overlay directory
bind-mounted RW at `/overlay`. Resource caps: 1h walltime, 4 GB RAM,
512 tasks, 200% CPU, 20 GB post-build disk cap.
Both caches and overlay directories are owned by `left4me`. If the web
service ever runs as a different uid, ensure it shares a group with the

@ -1,5 +1,11 @@
# Idmapped lowerdirs for left4me kernel-overlayfs
> **SUPERSEDED 2026-05-15** by the uid-collapse refactor
> ([`2026-05-15-uid-collapse.md`](2026-05-15-uid-collapse.md)). With
> `l4d2-sandbox` collapsed into `left4me`, all overlay content is
> uniformly `left4me`-owned end-to-end and no idmap is needed at
> mount time either. Kept for design-evolution context.
## Context
Kernel-overlayfs copy-up preserves the lower-layer file's owner and mode in the

@ -1,6 +1,12 @@
# Build-time idmap: move the uid translation from the gameserver mount
into the script sandbox
> **SUPERSEDED 2026-05-15** by the uid-collapse refactor
> ([`2026-05-15-uid-collapse.md`](2026-05-15-uid-collapse.md)). The
> idmap pattern this plan introduced is removed because source uid
> (`left4me`) now equals target uid (`left4me`) — the translation is
> a no-op. Kept for design-evolution context.
## Context
The current idmap implementation translates uids at **gameserver mount

@ -9,6 +9,13 @@ The same pattern is already established in the codebase for
gameservers (`left4me-server@.service`). A future session should
evaluate whether to refactor and, if so, follow the steps below.
> **Updated 2026-05-15:** `l4d2-sandbox` was collapsed into `left4me`
> — see `docs/superpowers/plans/2026-05-15-uid-collapse.md`. The
> idmap bind setup + trap cleanup are gone, so the remaining
> complexity in the helper is just the nsenter self-wrap. References
> below to `User=l4d2-sandbox` should read as `User=left4me`; the
> template refactor will inherit that cleanly.
## Why this came up
While verifying the build-time idmap refactor, the first 5 build jobs

@ -6,6 +6,12 @@ Companion: `2026-05-15-hardening-threat-model.md`,
`2026-05-15-hardening-defenses-survey.md`,
`2026-05-15-hardening-test-plan.md` (executed 2026-05-15, results inline).
> **Updated 2026-05-15:** `l4d2-sandbox` was collapsed into `left4me`
> after this refactor landed — see
> `docs/superpowers/plans/2026-05-15-uid-collapse.md`. References below
> to the sandbox running as a separate uid describe the pre-collapse
> state; the directive composition this doc establishes is unchanged.
This doc records the *shape* of the refactor — where the artifacts live,
how they're factored, what's in scope. The implementation plan lays out
the steps.

@ -4,6 +4,13 @@
Paired with `2026-05-15-hardening-defenses-survey.md` and
`2026-05-15-hardening-test-plan.md`.
> **Updated 2026-05-15:** `l4d2-sandbox` was collapsed into `left4me`
> after the hardening refactor landed — see
> `docs/superpowers/plans/2026-05-15-uid-collapse.md`. The same-uid
> threat surface that doc accepts is the same surface this model
> already documents for server/web; the sandbox is now in scope of
> the same hardening profile.
This document establishes *what we defend against and what we accept losing*.
The defenses survey and test plan operationalize this against the codebase.

@ -1,10 +1,10 @@
# How many system users should left4me have? — 1, 2, or 3
**Status: SUPERSEDED 2026-05-15 by the hardening refactor.**
**Status: SUPERSEDED 2026-05-15 by the hardening refactor + uid-collapse.**
The original question — should left4me have 1, 2, or 3 system users — is
now answered: **2 users (current state) is correct.** The
defenses that motivated a 3-user split (DB readability from srcds,
now answered: **1 user (after the uid-collapse refactor) is correct.**
The defenses that motivated a multi-user split (DB readability from srcds,
cross-server ptrace, same-uid /proc visibility, web-side reach into
gameserver state) are closed by the systemd hardening composition
landed in the hardening-refactor plan (`docs/superpowers/plans/2026-05-15-hardening-refactor.md`):
@ -14,12 +14,22 @@ landed in the hardening-refactor plan (`docs/superpowers/plans/2026-05-15-harden
srcds entirely.
- `SystemCallFilter=~@debug` + empty `CapabilityBoundingSet=` block
ptrace at the syscall layer.
- System-wide `kernel.yama.ptrace_scope=2` blocks same-uid ptrace.
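
The last bullet can be checked on a host with a few lines of Python. This is a minimal sketch, not part of the commit: the helper names `read_ptrace_scope` and `same_uid_ptrace_blocked` are hypothetical, and the only real interface assumed is the standard Yama file `/proc/sys/kernel/yama/ptrace_scope`.

```python
from pathlib import Path
from typing import Optional

# Exposed by the Yama LSM; absent on kernels built without Yama.
YAMA_PTRACE_SCOPE = Path("/proc/sys/kernel/yama/ptrace_scope")


def read_ptrace_scope(path: Path = YAMA_PTRACE_SCOPE) -> Optional[int]:
    """Return the current ptrace_scope value, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def same_uid_ptrace_blocked(scope: Optional[int]) -> bool:
    """True when attaching to a same-uid peer requires CAP_SYS_PTRACE or is off.

    Yama levels: 0 = classic same-uid attach, 1 = descendants only,
    2 = CAP_SYS_PTRACE required, 3 = attach disabled entirely.
    """
    return scope is not None and scope >= 2
```

A deployment check could call `same_uid_ptrace_blocked(read_ptrace_scope())` and fail loudly when it returns `False`, since the single-uid design leans on this setting.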
The interim state (`left4me` + `l4d2-sandbox`) recorded earlier in this
doc was the principled middle ground — script-sandbox builds keeping a
separate uid for kernel-enforced isolation. After the hardening
refactor closed the same-uid attack surface for server/web, leaving
the sandbox on a separate uid was the inconsistent half-step. The
uid-collapse refactor (`docs/superpowers/plans/2026-05-15-uid-collapse.md`)
removed `l4d2-sandbox` so the sandbox now runs as `left4me`, defended
by the same hardening profile. One principle: hardening covers it.
The residual filesystem-ACL surface (DB at `0640 root:left4me`,
web.env same) is a separate concern: a uid split would close it via
kernel ACLs, but for the current deployment shape it's covered by the
systemd-imposed FS view. If the deployment shape changes (multi-tenant
host, shell logins as the service uids, additional services running
host, shell logins as the service uid, additional services running
as `left4me` outside these units) the uid split should be revisited.
The original content of this spec is preserved below for context.

@ -73,13 +73,12 @@ def start_instance(
runtime_dir = root / "runtime" / name
# Stage cfg files in the upper layer. Writing here goes straight to the
# upper dir on the host filesystem with the worker's uid; the unit's
# ExecStartPre then mounts the overlay (single source of truth for the
# mount), and the kernel surfaces these files at the top of the merged
# stack. A script-sandbox-built lower-layer `server.cfg` is owned by
# `l4d2-sandbox`, not the worker — staging in upper sidesteps the
# ownership-preserving copy-up that would happen if we wrote through
# merged post-mount.
# upper dir on the host filesystem; the unit's ExecStartPre then mounts
# the overlay (single source of truth for the mount), and the kernel
# surfaces these files at the top of the merged stack. All overlay
# content (script-built lowers + this upper stage) is left4me-owned
# end-to-end, so the kernel's overlay copy-up is uniform — no
# ownership crossings to reason about.
emit_step("staging server.cfg + per-overlay aliases in upper layer...", on_stdout, passthrough)
upper_cfg_dir = runtime_dir / "upper" / "left4dead2" / "cfg"
upper_cfg_dir.mkdir(parents=True, exist_ok=True)

@ -339,7 +339,7 @@ def run_sandboxed_script(
f.write(script_text or "")
script_path = f.name
# NamedTemporaryFile creates 0600 owned by the web user; the sandbox runs
# as l4d2-sandbox and needs to read it (bind-mounted at /script.sh inside
# as left4me and needs to read it (bind-mounted at /script.sh inside
# the sandbox). Script content is not a secret — it's plain bash stored
# in the DB and editable by the user — so 0644 is appropriate.
os.chmod(script_path, 0o644)

@ -15,8 +15,11 @@
# LockPersonality, RestrictSUIDSGID. Network namespace is *not* restricted —
# scripts must reach the public internet to download workshop / l4d2center
# / cedapug content. PID namespace is shared with the host (no
# PrivatePID= directive in systemd); host PIDs are visible via /proc but
# not signal-able due to UID mismatch.
# PrivatePID= directive in systemd); host PIDs are visible via /proc.
# Same-uid attack surface (the sandbox runs as left4me, so do the
# gameservers and the web app) is covered by the hardening profile plus
# system-wide kernel.yama.ptrace_scope=2 — see
# docs/superpowers/specs/2026-05-15-hardening-threat-model.md.
set -euo pipefail
# Self-wrap into PID 1's mount namespace before doing anything mount-related.
@ -46,43 +49,12 @@ if [[ "${LEFT4ME_SCRIPT_SANDBOX_DRY_RUN:-}" == "1" ]]; then
exit 0
fi
# Pre-create an idmapped bind of the overlay dir, then point the sandbox's
# BindPaths at that staging path. The bind translates the sandbox's writing
# uid (l4d2-sandbox) back to left4me on disk, so all overlay content
# (script-built and workshop) is uniformly left4me-owned. Map direction:
# `--map-users=<disk_uid>:<mount_uid>:1` with disk=left4me, mount=sandbox —
# a process inside the bind with uid sandbox sees its uid as itself, and
# writes get translated to disk-uid left4me. Verified on kernel 6.12 that
# idmap propagates through systemd-run's plain re-bind of the staging path.
LEFT4ME_UID=$(id -u left4me)
LEFT4ME_GID=$(id -g left4me)
SANDBOX_UID=$(id -u l4d2-sandbox)
SANDBOX_GID=$(id -g l4d2-sandbox)
STAGING=/var/lib/left4me/tmp/sandbox-idmap-${OVERLAY_ID}
# trap fires even on errors / signals so the staging bind doesn't outlive
# this invocation. Idempotent if the staging is already gone.
cleanup_staging() {
umount "$STAGING" 2>/dev/null || true
rmdir "$STAGING" 2>/dev/null || true
}
trap cleanup_staging EXIT
# A leftover staging mount from a SIGKILLed prior run can be reset by
# umounting first, then re-binding fresh on the same path.
umount "$STAGING" 2>/dev/null || true
mkdir -p "$STAGING"
mount --bind \
--map-users="${LEFT4ME_UID}:${SANDBOX_UID}:1" \
--map-groups="${LEFT4ME_GID}:${SANDBOX_GID}:1" \
"$OVERLAY_DIR" "$STAGING"
SCRIPT_RC=0
systemd-run --quiet --collect --wait --pipe \
--unit="left4me-script-${OVERLAY_ID}-$$" \
--slice=l4d2-build.slice \
-p OOMScoreAdjust=500 \
-p User=l4d2-sandbox -p Group=l4d2-sandbox \
-p User=left4me -p Group=left4me \
-p UMask=0022 \
-p NoNewPrivileges=yes \
-p ProtectSystem=strict -p ProtectHome=yes \
@ -99,7 +71,7 @@ systemd-run --quiet --collect --wait --pipe \
-p IPAddressDeny="127.0.0.0/8 ::1/128 169.254.0.0/16 fe80::/10 224.0.0.0/4 ff00::/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 100.64.0.0/10 fc00::/7" \
-p TemporaryFileSystem="/etc /var/lib" \
-p BindReadOnlyPaths="/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf /etc/ssl /etc/ca-certificates /etc/nsswitch.conf /etc/alternatives ${SCRIPT}:/script.sh" \
-p BindPaths="${STAGING}:/overlay" \
-p BindPaths="${OVERLAY_DIR}:/overlay" \
-p WorkingDirectory=/overlay \
-p Environment="HOME=/tmp PATH=/usr/bin:/usr/sbin OVERLAY=/overlay" \
-p MemoryMax=4G -p MemorySwapMax=0 -p TasksMax=512 \

@ -33,8 +33,8 @@ def test_script_sandbox_helper_invokes_systemd_run_with_hardening():
assert "bubblewrap" not in text
# UID drop via systemd directives.
assert "User=l4d2-sandbox" in text
assert "Group=l4d2-sandbox" in text
assert "User=left4me" in text
assert "Group=left4me" in text
# Cgroup limits unchanged from v1.
assert "MemoryMax=4G" in text
@ -80,7 +80,7 @@ def test_script_sandbox_helper_invokes_systemd_run_with_hardening():
assert "/etc/nsswitch.conf" in text
assert "/etc/alternatives" in text
assert "${SCRIPT}:/script.sh" in text
assert 'BindPaths="${STAGING}:/overlay"' in text
assert 'BindPaths="${OVERLAY_DIR}:/overlay"' in text
# IP egress filter: allow public, deny localhost / RFC1918 / link-local /
# multicast / CGNAT / ULA. systemd's "more specific rule wins" semantics
@ -110,29 +110,6 @@ def test_script_sandbox_helper_invokes_systemd_run_with_hardening():
assert token in text, f"missing {token!r} in IPAddressDeny set"
def test_script_sandbox_uses_idmap_staging():
"""The sandbox runs as l4d2-sandbox but writes need to land on disk as
left4me, so all overlay content (workshop + script-built) is uniformly
left4me-owned. The helper pre-creates an idmapped bind on a staging
path and points the sandbox's BindPaths at the staging, not at the raw
overlay dir. trap cleans up the staging bind on exit.
"""
text = SCRIPT_SANDBOX_HELPER.read_text()
# Idmap mount setup uses --map-users / --map-groups.
assert "--map-users=" in text
assert "--map-groups=" in text
# Staging path lives under /var/lib/left4me/tmp/sandbox-idmap-<id>.
assert "/var/lib/left4me/tmp/sandbox-idmap-" in text
# BindPaths into the sandbox points at the staging path, not the
# raw overlay dir.
assert 'BindPaths="${STAGING}:/overlay"' in text
# trap registers cleanup so the staging bind doesn't outlive the helper.
assert "trap " in text and "cleanup_staging" in text
# The previous chown-to-l4d2-sandbox approach is gone; overlay dirs
# stay left4me-owned end-to-end.
assert "chown -R l4d2-sandbox" not in text
def test_script_sandbox_in_build_slice_with_oom_adjust():
text = SCRIPT_SANDBOX_HELPER.read_text()
@ -162,10 +139,8 @@ def test_script_sandbox_helper_dry_run_mode(tmp_path):
fake_script = tmp_path / "fake.sh"
fake_script.write_text("echo hi")
# Run in DRY_RUN mode against a fake l4d2-sandbox UID via a tiny shim that
# simulates `id -u l4d2-sandbox` resolving to a valid number.
helper_text = SCRIPT_SANDBOX_HELPER.read_text()
# We can't actually exec this without root + a real sandbox user; just
# verify the dry-run guard short-circuits before systemd-run runs.
# We can't actually exec this without root; just verify the dry-run
# guard short-circuits before systemd-run runs.
assert 'LEFT4ME_SCRIPT_SANDBOX_DRY_RUN' in helper_text
assert 'exit 0' in helper_text
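
With `test_script_sandbox_uses_idmap_staging` deleted, nothing in the suite asserts that the idmap machinery stays gone. A regression guard in the same text-matching style might look like the sketch below. It is not part of this commit; `IDMAP_REMNANTS` and `find_idmap_remnants` are hypothetical names, and the token list is taken from the lines this diff removes.

```python
# Tokens that only appeared in the removed idmap block or the old uid drop.
IDMAP_REMNANTS = (
    "--map-users=",
    "--map-groups=",
    "sandbox-idmap-",
    "cleanup_staging",
    "${STAGING}",
    "User=l4d2-sandbox",
)


def find_idmap_remnants(helper_text: str) -> list:
    """Return every idmap-era token still present in the helper script text."""
    return [token for token in IDMAP_REMNANTS if token in helper_text]
```

Inside the existing test module this would reduce to one assertion, `assert find_idmap_remnants(SCRIPT_SANDBOX_HELPER.read_text()) == []`, assuming the helper carries none of these tokens in the lines not shown in this diff.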