refactor(sandbox): collapse l4d2-sandbox user into left4me

The hardening refactor that just landed closes the same-uid attack
surface (FS view, ptrace, /proc visibility, signals) for the web +
gameserver units via systemd directives plus system-wide
kernel.yama.ptrace_scope=2. Keeping the script-sandbox on a separate
uid was the inconsistent half-step — defense-in-depth only, with
build-time-idmap complexity attached. One principle wins: harden
once, share the uid.

scripts/libexec/left4me-script-sandbox: drop the idmap block (uid
lookups, STAGING setup, cleanup_staging trap, mount --bind
--map-users), switch User=/Group= to left4me, point BindPaths at
$OVERLAY_DIR directly. Header comment updated to reflect
hardening-not-uid as the same-uid defense. nsenter self-wrap kept —
it's about mount-namespace escape, not uid.

Tests + comments + companion docs updated. Build-time-idmap and
overlay-idmap plans marked SUPERSEDED; user-uid-split spec revised
to "1 user is correct"; one-line update notes on the hardening
specs and the build-overlay-unit-design.

Companion ckn-bw commit removes the l4d2-sandbox user + group and
tightens /var/lib/left4me from 0711 → 0755 (the traverse-only mode
was specifically for the sandbox uid).
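
A negative check can complement the deleted idmap test by confirming that nothing from the removed block survives in the helper. The sketch below is hypothetical and not part of this commit: `IDMAP_REMNANTS` and `find_idmap_remnants` are illustrative names, and the token list is assumed from the idmap block described above.

```python
# Hypothetical regression check, not part of the commit. The tokens are
# assumed from the idmap block that the commit message says was dropped.
IDMAP_REMNANTS = (
    "--map-users=",
    "--map-groups=",
    "cleanup_staging",
    "sandbox-idmap-",
    "l4d2-sandbox",
)


def find_idmap_remnants(helper_text: str) -> list[str]:
    """Return every removed-idmap token that still appears in helper_text."""
    return [token for token in IDMAP_REMNANTS if token in helper_text]


if __name__ == "__main__":
    # Stand-in for the post-collapse helper text; a real test would read
    # scripts/libexec/left4me-script-sandbox instead.
    collapsed = (
        "-p User=left4me -p Group=left4me \\\n"
        '-p BindPaths="${OVERLAY_DIR}:/overlay" \\\n'
    )
    print(find_idmap_remnants(collapsed))
```

In a test module this could be used as `assert find_idmap_remnants(SCRIPT_SANDBOX_HELPER.read_text()) == []`, assuming the helper's header comment does not mention the old names.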
mwiegand 2026-05-15 15:50:57 +02:00
parent 146cb01450
commit 8971b23617
11 changed files with 80 additions and 93 deletions


@@ -77,21 +77,20 @@ The deployment uses these on-host paths (FHS-aligned):
 ## Runtime users
-Two system users are involved:
+One system user does everything:
 - **`left4me`** (home `/var/lib/left4me`, shell `/usr/sbin/nologin`):
-  web app, host library, and gameserver runtime.
-- **`l4d2-sandbox`** (no home, shell `/usr/sbin/nologin`): unprivileged
-  uid the script-overlay sandbox drops into via `systemd-run`. The
-  `left4me-script-sandbox` helper sets up an idmapped bind from the
-  sandbox uid back to `left4me` on a staging path so overlay writes
-  land on disk as `left4me`-owned. The split is load-bearing: a
-  sandbox escape would otherwise see `web.env`, the SQLite DB, and
-  running gameservers.
-(Whether the gameserver runtime should be split off into a third uid is
-an open design question — see
-`docs/superpowers/specs/2026-05-15-user-uid-split-design.md`.)
+  web app, host library, gameserver runtime, and script-overlay
+  sandbox. The sandbox unit drops privileges via `systemd-run` and
+  runs the user-authored bash inside a fully hardened transient
+  service (see `scripts/libexec/left4me-script-sandbox`). Same-uid
+  attack surface — sandbox escape reaching `web.env`, the SQLite DB,
+  or running gameservers — is closed by that hardening profile plus
+  system-wide `kernel.yama.ptrace_scope=2`, rather than by a uid
+  boundary.
+The user-count decision and its history live in
+`docs/superpowers/specs/2026-05-15-user-uid-split-design.md`.
 ## Deployment
@@ -137,10 +136,10 @@ The web app currently supports two overlay surfaces:
   symlinks under
   `${LEFT4ME_ROOT}/overlays/{overlay_id}/left4dead2/addons/{steam_id}.vpk`.
 - **`script` overlays** — populated by an arbitrary user-authored bash
-  script that runs inside `systemd-run` as the unprivileged
-  `l4d2-sandbox` UID, with the overlay directory bind-mounted RW at
-  `/overlay`. Resource caps: 1h walltime, 4 GB RAM, 512 tasks, 200% CPU,
-  20 GB post-build disk cap.
+  script that runs inside `systemd-run` as `left4me` (under a fully
+  hardened transient service unit), with the overlay directory
+  bind-mounted RW at `/overlay`. Resource caps: 1h walltime, 4 GB RAM,
+  512 tasks, 200% CPU, 20 GB post-build disk cap.
 Both caches and overlay directories are owned by `left4me`. If the web
 service ever runs as a different uid, ensure it shares a group with the


@@ -1,5 +1,11 @@
 # Idmapped lowerdirs for left4me kernel-overlayfs
+> **SUPERSEDED 2026-05-15** by the uid-collapse refactor
+> ([`2026-05-15-uid-collapse.md`](2026-05-15-uid-collapse.md)). With
+> `l4d2-sandbox` collapsed into `left4me`, all overlay content is
+> uniformly `left4me`-owned end-to-end and no idmap is needed at
+> mount time either. Kept for design-evolution context.
 ## Context
 Kernel-overlayfs copy-up preserves the lower-layer file's owner and mode in the


@@ -1,6 +1,12 @@
 # Build-time idmap: move the uid translation from the gameserver mount
 into the script sandbox
+> **SUPERSEDED 2026-05-15** by the uid-collapse refactor
+> ([`2026-05-15-uid-collapse.md`](2026-05-15-uid-collapse.md)). The
+> idmap pattern this plan introduced is removed because source uid
+> (`left4me`) now equals target uid (`left4me`) — the translation is
+> a no-op. Kept for design-evolution context.
 ## Context
 The current idmap implementation translates uids at **gameserver mount


@@ -9,6 +9,13 @@ The same pattern is already established in the codebase for
 gameservers (`left4me-server@.service`). A future session should
 evaluate whether to refactor and, if so, follow the steps below.
+> **Updated 2026-05-15:** `l4d2-sandbox` was collapsed into `left4me`
+> — see `docs/superpowers/plans/2026-05-15-uid-collapse.md`. The
+> idmap bind setup + trap cleanup are gone, so the remaining
+> complexity in the helper is just the nsenter self-wrap. References
+> below to `User=l4d2-sandbox` should read as `User=left4me`; the
+> template refactor will inherit that cleanly.
 ## Why this came up
 While verifying the build-time idmap refactor, the first 5 build jobs


@@ -6,6 +6,12 @@ Companion: `2026-05-15-hardening-threat-model.md`,
 `2026-05-15-hardening-defenses-survey.md`,
 `2026-05-15-hardening-test-plan.md` (executed 2026-05-15, results inline).
+> **Updated 2026-05-15:** `l4d2-sandbox` was collapsed into `left4me`
+> after this refactor landed — see
+> `docs/superpowers/plans/2026-05-15-uid-collapse.md`. References below
+> to the sandbox running as a separate uid describe the pre-collapse
+> state; the directive composition this doc establishes is unchanged.
 This doc records the *shape* of the refactor — where the artifacts live,
 how they're factored, what's in scope. The implementation plan lays out
 the steps.


@@ -4,6 +4,13 @@
 Paired with `2026-05-15-hardening-defenses-survey.md` and
 `2026-05-15-hardening-test-plan.md`.
+> **Updated 2026-05-15:** `l4d2-sandbox` was collapsed into `left4me`
+> after the hardening refactor landed — see
+> `docs/superpowers/plans/2026-05-15-uid-collapse.md`. The same-uid
+> threat surface that doc accepts is the same surface this model
+> already documents for server/web; the sandbox is now in scope of
+> the same hardening profile.
 This document establishes *what we defend against and what we accept losing*.
 The defenses survey and test plan operationalize this against the codebase.


@@ -1,10 +1,10 @@
 # How many system users should left4me have? — 1, 2, or 3
-**Status: SUPERSEDED 2026-05-15 by the hardening refactor.**
+**Status: SUPERSEDED 2026-05-15 by the hardening refactor + uid-collapse.**
 The original question — should left4me have 1, 2, or 3 system users — is
-now answered: **2 users (current state) is correct.** The
-defenses that motivated a 3-user split (DB readability from srcds,
+now answered: **1 user (after the uid-collapse refactor) is correct.**
+The defenses that motivated a multi-user split (DB readability from srcds,
 cross-server ptrace, same-uid /proc visibility, web-side reach into
 gameserver state) are closed by the systemd hardening composition
 landed in the hardening-refactor plan (`docs/superpowers/plans/2026-05-15-hardening-refactor.md`):
@@ -14,12 +14,22 @@ landed in the hardening-refactor plan (`docs/superpowers/plans/2026-05-15-harden
   srcds entirely.
 - `SystemCallFilter=~@debug` + empty `CapabilityBoundingSet=` block
   ptrace at the syscall layer.
+- System-wide `kernel.yama.ptrace_scope=2` blocks same-uid ptrace.
+The interim state (`left4me` + `l4d2-sandbox`) recorded earlier in this
+doc was the principled middle ground — script-sandbox builds keeping a
+separate uid for kernel-enforced isolation. After the hardening
+refactor closed the same-uid attack surface for server/web, leaving
+the sandbox on a separate uid was the inconsistent half-step. The
+uid-collapse refactor (`docs/superpowers/plans/2026-05-15-uid-collapse.md`)
+removed `l4d2-sandbox` so the sandbox now runs as `left4me`, defended
+by the same hardening profile. One principle: hardening covers it.
 The residual filesystem-ACL surface (DB at `0640 root:left4me`,
 web.env same) is a separate concern: a uid split would close it via
 kernel ACLs, but for the current deployment shape it's covered by the
 systemd-imposed FS view. If the deployment shape changes (multi-tenant
-host, shell logins as the service uids, additional services running
+host, shell logins as the service uid, additional services running
 as `left4me` outside these units) the uid split should be revisited.
 The original content of this spec is preserved below for context.


@@ -73,13 +73,12 @@ def start_instance(
     runtime_dir = root / "runtime" / name
     # Stage cfg files in the upper layer. Writing here goes straight to the
-    # upper dir on the host filesystem with the worker's uid; the unit's
-    # ExecStartPre then mounts the overlay (single source of truth for the
-    # mount), and the kernel surfaces these files at the top of the merged
-    # stack. A script-sandbox-built lower-layer `server.cfg` is owned by
-    # `l4d2-sandbox`, not the worker — staging in upper sidesteps the
-    # ownership-preserving copy-up that would happen if we wrote through
-    # merged post-mount.
+    # upper dir on the host filesystem; the unit's ExecStartPre then mounts
+    # the overlay (single source of truth for the mount), and the kernel
+    # surfaces these files at the top of the merged stack. All overlay
+    # content (script-built lowers + this upper stage) is left4me-owned
+    # end-to-end, so the kernel's overlay copy-up is uniform — no
+    # ownership crossings to reason about.
     emit_step("staging server.cfg + per-overlay aliases in upper layer...", on_stdout, passthrough)
     upper_cfg_dir = runtime_dir / "upper" / "left4dead2" / "cfg"
     upper_cfg_dir.mkdir(parents=True, exist_ok=True)


@@ -339,7 +339,7 @@ def run_sandboxed_script(
         f.write(script_text or "")
         script_path = f.name
     # NamedTemporaryFile creates 0600 owned by the web user; the sandbox runs
-    # as l4d2-sandbox and needs to read it (bind-mounted at /script.sh inside
+    # as left4me and needs to read it (bind-mounted at /script.sh inside
     # the sandbox). Script content is not a secret — it's plain bash stored
     # in the DB and editable by the user — so 0644 is appropriate.
     os.chmod(script_path, 0o644)


@@ -15,8 +15,11 @@
 # LockPersonality, RestrictSUIDSGID. Network namespace is *not* restricted —
 # scripts must reach the public internet to download workshop / l4d2center
 # / cedapug content. PID namespace is shared with the host (no
-# PrivatePID= directive in systemd); host PIDs are visible via /proc but
-# not signal-able due to UID mismatch.
+# PrivatePID= directive in systemd); host PIDs are visible via /proc.
+# Same-uid attack surface (the sandbox runs as left4me, so do the
+# gameservers and the web app) is covered by the hardening profile plus
+# system-wide kernel.yama.ptrace_scope=2 — see
+# docs/superpowers/specs/2026-05-15-hardening-threat-model.md.
 set -euo pipefail
 # Self-wrap into PID 1's mount namespace before doing anything mount-related.
@@ -46,43 +49,12 @@ if [[ "${LEFT4ME_SCRIPT_SANDBOX_DRY_RUN:-}" == "1" ]]; then
     exit 0
 fi
-# Pre-create an idmapped bind of the overlay dir, then point the sandbox's
-# BindPaths at that staging path. The bind translates the sandbox's writing
-# uid (l4d2-sandbox) back to left4me on disk, so all overlay content
-# (script-built and workshop) is uniformly left4me-owned. Map direction:
-# `--map-users=<disk_uid>:<mount_uid>:1` with disk=left4me, mount=sandbox —
-# a process inside the bind with uid sandbox sees its uid as itself, and
-# writes get translated to disk-uid left4me. Verified on kernel 6.12 that
-# idmap propagates through systemd-run's plain re-bind of the staging path.
-LEFT4ME_UID=$(id -u left4me)
-LEFT4ME_GID=$(id -g left4me)
-SANDBOX_UID=$(id -u l4d2-sandbox)
-SANDBOX_GID=$(id -g l4d2-sandbox)
-STAGING=/var/lib/left4me/tmp/sandbox-idmap-${OVERLAY_ID}
-# trap fires even on errors / signals so the staging bind doesn't outlive
-# this invocation. Idempotent if the staging is already gone.
-cleanup_staging() {
-    umount "$STAGING" 2>/dev/null || true
-    rmdir "$STAGING" 2>/dev/null || true
-}
-trap cleanup_staging EXIT
-# A leftover staging mount from a SIGKILLed prior run can be reset by
-# umounting first, then re-binding fresh on the same path.
-umount "$STAGING" 2>/dev/null || true
-mkdir -p "$STAGING"
-mount --bind \
-    --map-users="${LEFT4ME_UID}:${SANDBOX_UID}:1" \
-    --map-groups="${LEFT4ME_GID}:${SANDBOX_GID}:1" \
-    "$OVERLAY_DIR" "$STAGING"
 SCRIPT_RC=0
 systemd-run --quiet --collect --wait --pipe \
     --unit="left4me-script-${OVERLAY_ID}-$$" \
     --slice=l4d2-build.slice \
     -p OOMScoreAdjust=500 \
-    -p User=l4d2-sandbox -p Group=l4d2-sandbox \
+    -p User=left4me -p Group=left4me \
     -p UMask=0022 \
     -p NoNewPrivileges=yes \
     -p ProtectSystem=strict -p ProtectHome=yes \
@@ -99,7 +71,7 @@ systemd-run --quiet --collect --wait --pipe \
     -p IPAddressDeny="127.0.0.0/8 ::1/128 169.254.0.0/16 fe80::/10 224.0.0.0/4 ff00::/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 100.64.0.0/10 fc00::/7" \
     -p TemporaryFileSystem="/etc /var/lib" \
     -p BindReadOnlyPaths="/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf /etc/ssl /etc/ca-certificates /etc/nsswitch.conf /etc/alternatives ${SCRIPT}:/script.sh" \
-    -p BindPaths="${STAGING}:/overlay" \
+    -p BindPaths="${OVERLAY_DIR}:/overlay" \
     -p WorkingDirectory=/overlay \
     -p Environment="HOME=/tmp PATH=/usr/bin:/usr/sbin OVERLAY=/overlay" \
     -p MemoryMax=4G -p MemorySwapMax=0 -p TasksMax=512 \


@@ -33,8 +33,8 @@ def test_script_sandbox_helper_invokes_systemd_run_with_hardening():
     assert "bubblewrap" not in text
     # UID drop via systemd directives.
-    assert "User=l4d2-sandbox" in text
-    assert "Group=l4d2-sandbox" in text
+    assert "User=left4me" in text
+    assert "Group=left4me" in text
     # Cgroup limits unchanged from v1.
     assert "MemoryMax=4G" in text
@@ -80,7 +80,7 @@ def test_script_sandbox_helper_invokes_systemd_run_with_hardening():
     assert "/etc/nsswitch.conf" in text
     assert "/etc/alternatives" in text
     assert "${SCRIPT}:/script.sh" in text
-    assert 'BindPaths="${STAGING}:/overlay"' in text
+    assert 'BindPaths="${OVERLAY_DIR}:/overlay"' in text
     # IP egress filter: allow public, deny localhost / RFC1918 / link-local /
     # multicast / CGNAT / ULA. systemd's "more specific rule wins" semantics
@@ -110,29 +110,6 @@ def test_script_sandbox_helper_invokes_systemd_run_with_hardening():
         assert token in text, f"missing {token!r} in IPAddressDeny set"
-def test_script_sandbox_uses_idmap_staging():
-    """The sandbox runs as l4d2-sandbox but writes need to land on disk as
-    left4me, so all overlay content (workshop + script-built) is uniformly
-    left4me-owned. The helper pre-creates an idmapped bind on a staging
-    path and points the sandbox's BindPaths at the staging, not at the raw
-    overlay dir. trap cleans up the staging bind on exit.
-    """
-    text = SCRIPT_SANDBOX_HELPER.read_text()
-    # Idmap mount setup uses --map-users / --map-groups.
-    assert "--map-users=" in text
-    assert "--map-groups=" in text
-    # Staging path lives under /var/lib/left4me/tmp/sandbox-idmap-<id>.
-    assert "/var/lib/left4me/tmp/sandbox-idmap-" in text
-    # BindPaths into the sandbox points at the staging path, not the
-    # raw overlay dir.
-    assert 'BindPaths="${STAGING}:/overlay"' in text
-    # trap registers cleanup so the staging bind doesn't outlive the helper.
-    assert "trap " in text and "cleanup_staging" in text
-    # The previous chown-to-l4d2-sandbox approach is gone; overlay dirs
-    # stay left4me-owned end-to-end.
-    assert "chown -R l4d2-sandbox" not in text
 def test_script_sandbox_in_build_slice_with_oom_adjust():
     text = SCRIPT_SANDBOX_HELPER.read_text()
@@ -162,10 +139,8 @@ def test_script_sandbox_helper_dry_run_mode(tmp_path):
     fake_script = tmp_path / "fake.sh"
     fake_script.write_text("echo hi")
-    # Run in DRY_RUN mode against a fake l4d2-sandbox UID via a tiny shim that
-    # simulates `id -u l4d2-sandbox` resolving to a valid number.
     helper_text = SCRIPT_SANDBOX_HELPER.read_text()
-    # We can't actually exec this without root + a real sandbox user; just
-    # verify the dry-run guard short-circuits before systemd-run runs.
+    # We can't actually exec this without root; just verify the dry-run
+    # guard short-circuits before systemd-run runs.
     assert 'LEFT4ME_SCRIPT_SANDBOX_DRY_RUN' in helper_text
     assert 'exit 0' in helper_text