refactor(sandbox): collapse l4d2-sandbox user into left4me
The hardening refactor that just landed closes the same-uid attack surface (FS view, ptrace, /proc visibility, signals) for the web + gameserver units via systemd directives plus system-wide kernel.yama.ptrace_scope=2. Keeping the script-sandbox on a separate uid was the inconsistent half-step: defense-in-depth only, with build-time-idmap complexity attached. One principle wins: harden once, share the uid.

scripts/libexec/left4me-script-sandbox: drop the idmap block (uid lookups, STAGING setup, cleanup_staging trap, mount --bind --map-users), switch User=/Group= to left4me, point BindPaths at $OVERLAY_DIR directly. Header comment updated to reflect hardening-not-uid as the same-uid defense. The nsenter self-wrap is kept; it's about mount-namespace escape, not uid.

Tests + comments + companion docs updated. Build-time-idmap and overlay-idmap plans marked SUPERSEDED; user-uid-split spec revised to "1 user is correct"; one-line update notes on the hardening specs and the build-overlay-unit-design.

Companion ckn-bw commit removes the l4d2-sandbox user + group and changes /var/lib/left4me from 0711 to 0755 (the traverse-only mode was specifically for the sandbox uid).
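
For readers comparing the two modes the message names for /var/lib/left4me, here is a small Python sketch (not part of the commit) that decodes them with the standard `stat` module. Only the values 0711 and 0755 come from the message; the helper names `describe_dir_mode` and `others_can_list` are hypothetical.

```python
import stat


def describe_dir_mode(mode: int) -> str:
    """Render a directory permission mode the way `ls -l` would."""
    return stat.filemode(stat.S_IFDIR | mode)


def others_can_list(mode: int) -> bool:
    """True if users outside owner and group may read (list) the directory."""
    return bool(mode & stat.S_IROTH)


for mode in (0o711, 0o755):
    print(oct(mode), describe_dir_mode(mode), "others can list:", others_can_list(mode))
```

Under 0711, other users can traverse into paths they already know but cannot list the directory; 0755 additionally lets them list it.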
parent 146cb01450
commit 8971b23617
11 changed files with 80 additions and 93 deletions
@@ -77,21 +77,20 @@ The deployment uses these on-host paths (FHS-aligned):
 
 ## Runtime users
 
-Two system users are involved:
+One system user does everything:
 
 - **`left4me`** (home `/var/lib/left4me`, shell `/usr/sbin/nologin`):
-  web app, host library, and gameserver runtime.
-- **`l4d2-sandbox`** (no home, shell `/usr/sbin/nologin`): unprivileged
-  uid the script-overlay sandbox drops into via `systemd-run`. The
-  `left4me-script-sandbox` helper sets up an idmapped bind from the
-  sandbox uid back to `left4me` on a staging path so overlay writes
-  land on disk as `left4me`-owned. The split is load-bearing: a
-  sandbox escape would otherwise see `web.env`, the SQLite DB, and
-  running gameservers.
+  web app, host library, gameserver runtime, and script-overlay
+  sandbox. The sandbox unit drops privileges via `systemd-run` and
+  runs the user-authored bash inside a fully hardened transient
+  service (see `scripts/libexec/left4me-script-sandbox`). Same-uid
+  attack surface — sandbox escape reaching `web.env`, the SQLite DB,
+  or running gameservers — is closed by that hardening profile plus
+  system-wide `kernel.yama.ptrace_scope=2`, rather than by a uid
+  boundary.
 
-(Whether the gameserver runtime should be split off into a third uid is
-an open design question — see
-`docs/superpowers/specs/2026-05-15-user-uid-split-design.md`.)
+The user-count decision and its history live in
+`docs/superpowers/specs/2026-05-15-user-uid-split-design.md`.
 
 ## Deployment
 
@@ -137,10 +136,10 @@ The web app currently supports two overlay surfaces:
   symlinks under
   `${LEFT4ME_ROOT}/overlays/{overlay_id}/left4dead2/addons/{steam_id}.vpk`.
 - **`script` overlays** — populated by an arbitrary user-authored bash
-  script that runs inside `systemd-run` as the unprivileged
-  `l4d2-sandbox` UID, with the overlay directory bind-mounted RW at
-  `/overlay`. Resource caps: 1h walltime, 4 GB RAM, 512 tasks, 200% CPU,
-  20 GB post-build disk cap.
+  script that runs inside `systemd-run` as `left4me` (under a fully
+  hardened transient service unit), with the overlay directory
+  bind-mounted RW at `/overlay`. Resource caps: 1h walltime, 4 GB RAM,
+  512 tasks, 200% CPU, 20 GB post-build disk cap.
 
 Both caches and overlay directories are owned by `left4me`. If the web
 service ever runs as a different uid, ensure it shares a group with the

@@ -1,5 +1,11 @@
 # Idmapped lowerdirs for left4me kernel-overlayfs
 
+> **SUPERSEDED 2026-05-15** by the uid-collapse refactor
+> ([`2026-05-15-uid-collapse.md`](2026-05-15-uid-collapse.md)). With
+> `l4d2-sandbox` collapsed into `left4me`, all overlay content is
+> uniformly `left4me`-owned end-to-end and no idmap is needed at
+> mount time either. Kept for design-evolution context.
+
 ## Context
 
 Kernel-overlayfs copy-up preserves the lower-layer file's owner and mode in the

@@ -1,6 +1,12 @@
 # Build-time idmap: move the uid translation from the gameserver mount
 into the script sandbox
 
+> **SUPERSEDED 2026-05-15** by the uid-collapse refactor
+> ([`2026-05-15-uid-collapse.md`](2026-05-15-uid-collapse.md)). The
+> idmap pattern this plan introduced is removed because source uid
+> (`left4me`) now equals target uid (`left4me`) — the translation is
+> a no-op. Kept for design-evolution context.
+
 ## Context
 
 The current idmap implementation translates uids at **gameserver mount

@@ -9,6 +9,13 @@ The same pattern is already established in the codebase for
 gameservers (`left4me-server@.service`). A future session should
 evaluate whether to refactor and, if so, follow the steps below.
 
+> **Updated 2026-05-15:** `l4d2-sandbox` was collapsed into `left4me`
+> — see `docs/superpowers/plans/2026-05-15-uid-collapse.md`. The
+> idmap bind setup + trap cleanup are gone, so the remaining
+> complexity in the helper is just the nsenter self-wrap. References
+> below to `User=l4d2-sandbox` should read as `User=left4me`; the
+> template refactor will inherit that cleanly.
+
 ## Why this came up
 
 While verifying the build-time idmap refactor, the first 5 build jobs

@@ -6,6 +6,12 @@ Companion: `2026-05-15-hardening-threat-model.md`,
 `2026-05-15-hardening-defenses-survey.md`,
 `2026-05-15-hardening-test-plan.md` (executed 2026-05-15, results inline).
 
+> **Updated 2026-05-15:** `l4d2-sandbox` was collapsed into `left4me`
+> after this refactor landed — see
+> `docs/superpowers/plans/2026-05-15-uid-collapse.md`. References below
+> to the sandbox running as a separate uid describe the pre-collapse
+> state; the directive composition this doc establishes is unchanged.
+
 This doc records the *shape* of the refactor — where the artifacts live,
 how they're factored, what's in scope. The implementation plan lays out
 the steps.

@@ -4,6 +4,13 @@
 Paired with `2026-05-15-hardening-defenses-survey.md` and
 `2026-05-15-hardening-test-plan.md`.
 
+> **Updated 2026-05-15:** `l4d2-sandbox` was collapsed into `left4me`
+> after the hardening refactor landed — see
+> `docs/superpowers/plans/2026-05-15-uid-collapse.md`. The same-uid
+> threat surface that doc accepts is the same surface this model
+> already documents for server/web; the sandbox is now in scope of
+> the same hardening profile.
+
 This document establishes *what we defend against and what we accept losing*.
 The defenses survey and test plan operationalize this against the codebase.
 

@@ -1,10 +1,10 @@
 # How many system users should left4me have? — 1, 2, or 3
 
-**Status: SUPERSEDED 2026-05-15 by the hardening refactor.**
+**Status: SUPERSEDED 2026-05-15 by the hardening refactor + uid-collapse.**
 
 The original question — should left4me have 1, 2, or 3 system users — is
-now answered: **2 users (current state) is correct.** The
-defenses that motivated a 3-user split (DB readability from srcds,
+now answered: **1 user (after the uid-collapse refactor) is correct.**
+The defenses that motivated a multi-user split (DB readability from srcds,
 cross-server ptrace, same-uid /proc visibility, web-side reach into
 gameserver state) are closed by the systemd hardening composition
 landed in the hardening-refactor plan (`docs/superpowers/plans/2026-05-15-hardening-refactor.md`):
@@ -14,12 +14,22 @@ landed in the hardening-refactor plan (`docs/superpowers/plans/2026-05-15-harden
   srcds entirely.
 - `SystemCallFilter=~@debug` + empty `CapabilityBoundingSet=` block
   ptrace at the syscall layer.
+- System-wide `kernel.yama.ptrace_scope=2` blocks same-uid ptrace.
 
+The interim state (`left4me` + `l4d2-sandbox`) recorded earlier in this
+doc was the principled middle ground — script-sandbox builds keeping a
+separate uid for kernel-enforced isolation. After the hardening
+refactor closed the same-uid attack surface for server/web, leaving
+the sandbox on a separate uid was the inconsistent half-step. The
+uid-collapse refactor (`docs/superpowers/plans/2026-05-15-uid-collapse.md`)
+removed `l4d2-sandbox` so the sandbox now runs as `left4me`, defended
+by the same hardening profile. One principle: hardening covers it.
+
 The residual filesystem-ACL surface (DB at `0640 root:left4me`,
 web.env same) is a separate concern: a uid split would close it via
 kernel ACLs, but for the current deployment shape it's covered by the
 systemd-imposed FS view. If the deployment shape changes (multi-tenant
-host, shell logins as the service uids, additional services running
+host, shell logins as the service uid, additional services running
 as `left4me` outside these units) the uid split should be revisited.
 
 The original content of this spec is preserved below for context.

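The spec above leans on system-wide `kernel.yama.ptrace_scope=2`. As a hedged illustration (not part of this commit), the following sketch reads the Yama sysctl through its procfs path and reports whether unprivileged same-uid attach is blocked. The function names are hypothetical, and the level descriptions paraphrase the kernel's Yama documentation.

```python
from pathlib import Path
from typing import Optional

# Standard procfs location of the Yama ptrace_scope sysctl.
YAMA_PTRACE_SCOPE = Path("/proc/sys/kernel/yama/ptrace_scope")

# Paraphrased meanings of the four Yama levels.
SCOPE_MEANING = {
    0: "classic: same-uid processes may attach to each other",
    1: "restricted: only descendants or declared tracers may be traced",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: ptrace attach is disabled entirely",
}


def read_ptrace_scope(path: Path = YAMA_PTRACE_SCOPE) -> Optional[int]:
    """Return the current scope, or None if Yama is absent or unreadable."""
    try:
        return int(path.read_text().strip())
    except (FileNotFoundError, PermissionError, ValueError):
        return None


def same_uid_ptrace_blocked(scope: Optional[int]) -> bool:
    """True when a process without CAP_SYS_PTRACE cannot attach to same-uid peers."""
    return scope is not None and scope >= 2


if __name__ == "__main__":
    scope = read_ptrace_scope()
    print(scope, SCOPE_MEANING.get(scope, "unknown or Yama not available"))
```

At level 2 a process holding `CAP_SYS_PTRACE` can still attach, which is why the empty `CapabilityBoundingSet=` listed above matters alongside the sysctl.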
@@ -73,13 +73,12 @@ def start_instance(
     runtime_dir = root / "runtime" / name
 
     # Stage cfg files in the upper layer. Writing here goes straight to the
-    # upper dir on the host filesystem with the worker's uid; the unit's
-    # ExecStartPre then mounts the overlay (single source of truth for the
-    # mount), and the kernel surfaces these files at the top of the merged
-    # stack. A script-sandbox-built lower-layer `server.cfg` is owned by
-    # `l4d2-sandbox`, not the worker — staging in upper sidesteps the
-    # ownership-preserving copy-up that would happen if we wrote through
-    # merged post-mount.
+    # upper dir on the host filesystem; the unit's ExecStartPre then mounts
+    # the overlay (single source of truth for the mount), and the kernel
+    # surfaces these files at the top of the merged stack. All overlay
+    # content (script-built lowers + this upper stage) is left4me-owned
+    # end-to-end, so the kernel's overlay copy-up is uniform — no
+    # ownership crossings to reason about.
     emit_step("staging server.cfg + per-overlay aliases in upper layer...", on_stdout, passthrough)
     upper_cfg_dir = runtime_dir / "upper" / "left4dead2" / "cfg"
     upper_cfg_dir.mkdir(parents=True, exist_ok=True)

@@ -339,7 +339,7 @@ def run_sandboxed_script(
         f.write(script_text or "")
         script_path = f.name
     # NamedTemporaryFile creates 0600 owned by the web user; the sandbox runs
-    # as l4d2-sandbox and needs to read it (bind-mounted at /script.sh inside
+    # as left4me and needs to read it (bind-mounted at /script.sh inside
     # the sandbox). Script content is not a secret — it's plain bash stored
     # in the DB and editable by the user — so 0644 is appropriate.
     os.chmod(script_path, 0o644)

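The comment in the hunk above says `NamedTemporaryFile` creates the file as 0600 and that the code then widens it to 0644. A minimal standalone sketch of that sequence follows. It is an illustration rather than the project's code: the function name `write_world_readable_script` and the `.sh` suffix are assumptions, and it assumes a POSIX host with a umask that leaves owner bits intact.

```python
import os
import stat
import tempfile
from typing import Tuple


def write_world_readable_script(script_text: str) -> Tuple[str, int]:
    """Write script_text to a temp file, then widen its mode to 0644.

    Returns the file path and the permission bits the file had right
    after creation, so the before/after difference can be inspected.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script_text or "")
        script_path = f.name
    # Permission bits as created by tempfile (owner read/write only).
    created_mode = stat.S_IMODE(os.stat(script_path).st_mode)
    # chmod is not affected by the umask, so this yields exactly 0644.
    os.chmod(script_path, 0o644)
    return script_path, created_mode


if __name__ == "__main__":
    path, created = write_world_readable_script("echo hi\n")
    print(oct(created), "->", oct(stat.S_IMODE(os.stat(path).st_mode)))
    os.unlink(path)
```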
@@ -15,8 +15,11 @@
 # LockPersonality, RestrictSUIDSGID. Network namespace is *not* restricted —
 # scripts must reach the public internet to download workshop / l4d2center
 # / cedapug content. PID namespace is shared with the host (no
-# PrivatePID= directive in systemd); host PIDs are visible via /proc but
-# not signal-able due to UID mismatch.
+# PrivatePID= directive in systemd); host PIDs are visible via /proc.
+# Same-uid attack surface (the sandbox runs as left4me, so do the
+# gameservers and the web app) is covered by the hardening profile plus
+# system-wide kernel.yama.ptrace_scope=2 — see
+# docs/superpowers/specs/2026-05-15-hardening-threat-model.md.
 set -euo pipefail
 
 # Self-wrap into PID 1's mount namespace before doing anything mount-related.
@@ -46,43 +49,12 @@ if [[ "${LEFT4ME_SCRIPT_SANDBOX_DRY_RUN:-}" == "1" ]]; then
     exit 0
 fi
 
-# Pre-create an idmapped bind of the overlay dir, then point the sandbox's
-# BindPaths at that staging path. The bind translates the sandbox's writing
-# uid (l4d2-sandbox) back to left4me on disk, so all overlay content
-# (script-built and workshop) is uniformly left4me-owned. Map direction:
-# `--map-users=<disk_uid>:<mount_uid>:1` with disk=left4me, mount=sandbox —
-# a process inside the bind with uid sandbox sees its uid as itself, and
-# writes get translated to disk-uid left4me. Verified on kernel 6.12 that
-# idmap propagates through systemd-run's plain re-bind of the staging path.
-LEFT4ME_UID=$(id -u left4me)
-LEFT4ME_GID=$(id -g left4me)
-SANDBOX_UID=$(id -u l4d2-sandbox)
-SANDBOX_GID=$(id -g l4d2-sandbox)
-STAGING=/var/lib/left4me/tmp/sandbox-idmap-${OVERLAY_ID}
-
-# trap fires even on errors / signals so the staging bind doesn't outlive
-# this invocation. Idempotent if the staging is already gone.
-cleanup_staging() {
-    umount "$STAGING" 2>/dev/null || true
-    rmdir "$STAGING" 2>/dev/null || true
-}
-trap cleanup_staging EXIT
-
-# A leftover staging mount from a SIGKILLed prior run can be reset by
-# umounting first, then re-binding fresh on the same path.
-umount "$STAGING" 2>/dev/null || true
-mkdir -p "$STAGING"
-mount --bind \
-    --map-users="${LEFT4ME_UID}:${SANDBOX_UID}:1" \
-    --map-groups="${LEFT4ME_GID}:${SANDBOX_GID}:1" \
-    "$OVERLAY_DIR" "$STAGING"
-
 SCRIPT_RC=0
 systemd-run --quiet --collect --wait --pipe \
     --unit="left4me-script-${OVERLAY_ID}-$$" \
     --slice=l4d2-build.slice \
     -p OOMScoreAdjust=500 \
-    -p User=l4d2-sandbox -p Group=l4d2-sandbox \
+    -p User=left4me -p Group=left4me \
     -p UMask=0022 \
     -p NoNewPrivileges=yes \
     -p ProtectSystem=strict -p ProtectHome=yes \
@@ -99,7 +71,7 @@ systemd-run --quiet --collect --wait --pipe \
     -p IPAddressDeny="127.0.0.0/8 ::1/128 169.254.0.0/16 fe80::/10 224.0.0.0/4 ff00::/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 100.64.0.0/10 fc00::/7" \
     -p TemporaryFileSystem="/etc /var/lib" \
     -p BindReadOnlyPaths="/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf /etc/ssl /etc/ca-certificates /etc/nsswitch.conf /etc/alternatives ${SCRIPT}:/script.sh" \
-    -p BindPaths="${STAGING}:/overlay" \
+    -p BindPaths="${OVERLAY_DIR}:/overlay" \
     -p WorkingDirectory=/overlay \
     -p Environment="HOME=/tmp PATH=/usr/bin:/usr/sbin OVERLAY=/overlay" \
     -p MemoryMax=4G -p MemorySwapMax=0 -p TasksMax=512 \

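To make the `IPAddressDeny=` list in the hunk above easier to reason about, here is a hedged Python sketch that checks whether a destination address falls inside any of the listed prefixes. The prefix string is copied from the diff; the function name `is_denied` is hypothetical. The sketch models only membership in the deny list and does not model how systemd combines it with any `IPAddressAllow=` setting.

```python
import ipaddress

# Prefixes copied verbatim from the IPAddressDeny= property in the diff.
IP_ADDRESS_DENY = (
    "127.0.0.0/8 ::1/128 169.254.0.0/16 fe80::/10 224.0.0.0/4 ff00::/8 "
    "10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 100.64.0.0/10 fc00::/7"
)
DENY_NETWORKS = [ipaddress.ip_network(token) for token in IP_ADDRESS_DENY.split()]


def is_denied(address: str) -> bool:
    """True if the address lies inside one of the denied prefixes."""
    ip = ipaddress.ip_address(address)
    return any(ip.version == net.version and ip in net for net in DENY_NETWORKS)


if __name__ == "__main__":
    for candidate in ("127.0.0.1", "192.168.1.10", "8.8.8.8", "fd00::1"):
        print(candidate, "denied" if is_denied(candidate) else "not in deny list")
```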
@@ -33,8 +33,8 @@ def test_script_sandbox_helper_invokes_systemd_run_with_hardening():
     assert "bubblewrap" not in text
 
     # UID drop via systemd directives.
-    assert "User=l4d2-sandbox" in text
-    assert "Group=l4d2-sandbox" in text
+    assert "User=left4me" in text
+    assert "Group=left4me" in text
 
     # Cgroup limits unchanged from v1.
     assert "MemoryMax=4G" in text
@@ -80,7 +80,7 @@ def test_script_sandbox_helper_invokes_systemd_run_with_hardening():
     assert "/etc/nsswitch.conf" in text
     assert "/etc/alternatives" in text
     assert "${SCRIPT}:/script.sh" in text
-    assert 'BindPaths="${STAGING}:/overlay"' in text
+    assert 'BindPaths="${OVERLAY_DIR}:/overlay"' in text
 
     # IP egress filter: allow public, deny localhost / RFC1918 / link-local /
     # multicast / CGNAT / ULA. systemd's "more specific rule wins" semantics
@@ -110,29 +110,6 @@ def test_script_sandbox_helper_invokes_systemd_run_with_hardening():
         assert token in text, f"missing {token!r} in IPAddressDeny set"
 
 
-def test_script_sandbox_uses_idmap_staging():
-    """The sandbox runs as l4d2-sandbox but writes need to land on disk as
-    left4me, so all overlay content (workshop + script-built) is uniformly
-    left4me-owned. The helper pre-creates an idmapped bind on a staging
-    path and points the sandbox's BindPaths at the staging, not at the raw
-    overlay dir. trap cleans up the staging bind on exit.
-    """
-    text = SCRIPT_SANDBOX_HELPER.read_text()
-    # Idmap mount setup uses --map-users / --map-groups.
-    assert "--map-users=" in text
-    assert "--map-groups=" in text
-    # Staging path lives under /var/lib/left4me/tmp/sandbox-idmap-<id>.
-    assert "/var/lib/left4me/tmp/sandbox-idmap-" in text
-    # BindPaths into the sandbox points at the staging path, not the
-    # raw overlay dir.
-    assert 'BindPaths="${STAGING}:/overlay"' in text
-    # trap registers cleanup so the staging bind doesn't outlive the helper.
-    assert "trap " in text and "cleanup_staging" in text
-    # The previous chown-to-l4d2-sandbox approach is gone; overlay dirs
-    # stay left4me-owned end-to-end.
-    assert "chown -R l4d2-sandbox" not in text
-
-
 def test_script_sandbox_in_build_slice_with_oom_adjust():
     text = SCRIPT_SANDBOX_HELPER.read_text()
 
@@ -162,10 +139,8 @@ def test_script_sandbox_helper_dry_run_mode(tmp_path):
     fake_script = tmp_path / "fake.sh"
     fake_script.write_text("echo hi")
 
-    # Run in DRY_RUN mode against a fake l4d2-sandbox UID via a tiny shim that
-    # simulates `id -u l4d2-sandbox` resolving to a valid number.
     helper_text = SCRIPT_SANDBOX_HELPER.read_text()
-    # We can't actually exec this without root + a real sandbox user; just
-    # verify the dry-run guard short-circuits before systemd-run runs.
+    # We can't actually exec this without root; just verify the dry-run
+    # guard short-circuits before systemd-run runs.
     assert 'LEFT4ME_SCRIPT_SANDBOX_DRY_RUN' in helper_text
     assert 'exit 0' in helper_text

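The test hunks above delete the idmap-staging test without adding an inverse check. As a hedged sketch (not part of this commit), a regression guard could look for leftover idmap tokens in the helper text. The token list is assembled from the removed lines in this diff; the function name `find_idmap_remnants` and the sample strings are hypothetical.

```python
from typing import List

# Tokens that appeared only in the removed idmap code path of this diff.
IDMAP_REMNANTS = (
    "--map-users=",
    "--map-groups=",
    "cleanup_staging",
    "sandbox-idmap-",
    "l4d2-sandbox",
)


def find_idmap_remnants(helper_text: str) -> List[str]:
    """Return every removed-idmap token still present in helper_text."""
    return [token for token in IDMAP_REMNANTS if token in helper_text]


if __name__ == "__main__":
    # Hypothetical excerpts standing in for the helper before and after.
    old_excerpt = '--map-users="${LEFT4ME_UID}:${SANDBOX_UID}:1"\n-p User=l4d2-sandbox'
    new_excerpt = '-p User=left4me -p Group=left4me\n-p BindPaths="${OVERLAY_DIR}:/overlay"'
    print(find_idmap_remnants(old_excerpt))
    print(find_idmap_remnants(new_excerpt))
```

In a real test this would presumably run against `SCRIPT_SANDBOX_HELPER.read_text()` and assert that the returned list is empty.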