Compare commits

...

3 commits

Author SHA1 Message Date
mwiegand
ae443299c8
chore(deploy): drop bubblewrap apt dep + tighten left4me.db mode
bubblewrap is no longer used now that left4me-script-sandbox runs as a
systemd service unit. Remove it from the apt-get and dnf install lines.

Also tighten the application database file mode after the alembic
upgrade step: chown root:left4me, chmod 0640. The DB had been created
at default 0644 by SQLite's open() call inside the web service, which
made it world-readable on the host — i.e. readable by any uid that can
traverse /var/lib/left4me, including the sandbox's l4d2-sandbox uid.
Smoke-testing the v2 sandbox prototype on ckn@10.0.4.128 surfaced this:
the sandbox could read "SQLite format 3" from the DB until the parent
dir was masked with TemporaryFileSystem=. Tightening the file mode is
the host-level fix; the sandbox-level mask is defense in depth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:48:26 +02:00
mwiegand
4ee8f6af44
refactor(deploy): rewrite left4me-script-sandbox to systemd-only — drop bwrap
Replaces the systemd-run --scope + bwrap composition with systemd-run in
service-unit mode (--pipe --wait, transient .service unit). Same cgroup
limits and walltime kill, plus the hardening directives that --scope
units cannot carry: NoNewPrivileges, ProtectSystem=strict, ProtectHome,
ProtectKernel{Tunables,Modules,Logs,ControlGroups}, RestrictNamespaces,
RestrictAddressFamilies, RestrictSUIDSGID, LockPersonality,
MemoryDenyWriteExecute, SystemCallFilter (seccomp), and an empty
CapabilityBoundingSet (drops all caps). UID drop via User=/Group=.

The TemporaryFileSystem="/etc /var/lib" pair is the gotcha:
ProtectSystem=strict makes /var/lib *read-only* but visible, so the host
DB at /var/lib/left4me/left4me.db (mode 0644) was readable from inside.
Masking /var/lib with tmpfs hides the entire subtree; the BindPaths bind
to /overlay is at a different path and unaffected.

The Python side (ScriptBuilder, run_sandboxed_script, routes) is
unchanged — same sudo-helper invocation, same argv shape.

Loses PID-namespace isolation (no PrivatePID= directive in systemd).
Host PIDs are visible via /proc and ps -ef but not signal-able due to
UID mismatch — information disclosure only, not a privilege boundary.

Smoke-tested on ckn@10.0.4.128 prior to this commit; all isolation
invariants reproduced and the hardening directives provably blocked
unshare(2), mount(2), personality(2), bpf(2), and sysctl writes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:47:30 +02:00
mwiegand
efaaf84cd9
docs(specs): script sandbox v2 — systemd-only design + plan
Spec captures the v2 architecture (systemd-run service mode with full
hardening directives, no bwrap), the two surfaces in scope (helper
rewrite + bubblewrap dep removal + left4me.db mode tightening), and the
gotchas surfaced by smoke-testing the prototype on ckn@10.0.4.128:
- ProtectSystem=strict makes /var/lib/left4me visible (not invisible);
  must add TemporaryFileSystem=/var/lib to mask it.
- Script bind via BindReadOnlyPaths uses ${SCRIPT}:/script.sh syntax.
- No PrivatePID= directive in systemd; host PIDs visible via /proc.
  Information disclosure only — kernel UID-mismatch blocks signals.

Plan breaks the migration into 4 tasks (helper rewrite, deploy-script
deps + DB mode, host smoke-test, drift sweep) with explicit rollback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:46:13 +02:00
5 changed files with 393 additions and 45 deletions

View file

@ -85,9 +85,9 @@ fi
if command -v apt-get >/dev/null 2>&1; then
$sudo_cmd apt-get update
$sudo_cmd apt-get install -y python3 python3-venv python3-pip curl ca-certificates tar gzip util-linux sudo bubblewrap
$sudo_cmd apt-get install -y python3 python3-venv python3-pip curl ca-certificates tar gzip util-linux sudo
elif command -v dnf >/dev/null 2>&1; then
$sudo_cmd dnf install -y python3 python3-pip curl ca-certificates tar gzip util-linux sudo bubblewrap
$sudo_cmd dnf install -y python3 python3-pip curl ca-certificates tar gzip util-linux sudo
else
printf 'Unsupported package manager: expected apt-get or dnf\n' >&2
exit 1
@ -171,6 +171,16 @@ run_as_left4me sh -c "cd /opt/left4me/l4d2web && set -a; . /etc/left4me/host.env
PYTHONPATH=/opt/left4me \
/opt/left4me/.venv/bin/alembic -c /opt/left4me/l4d2web/alembic.ini upgrade head"
# Tighten the application database to root:left4me 0640. The DB is created
# by the web app at first start with the default 0644 umask, which makes it
# world-readable on the host. The script-overlay sandbox runs as a separate
# system uid (l4d2-sandbox) — without this clamp, a sandboxed script could
# read the entire app DB. Idempotent on rerun.
if [ -f /var/lib/left4me/left4me.db ]; then
$sudo_cmd chown root:left4me /var/lib/left4me/left4me.db
$sudo_cmd chmod 0640 /var/lib/left4me/left4me.db
fi
if [ -f "$remote_tmp/admin_username" ] && [ -f "$remote_tmp/admin_password" ]; then
LEFT4ME_ADMIN_USERNAME=$(cat "$remote_tmp/admin_username")
LEFT4ME_ADMIN_PASSWORD=$(cat "$remote_tmp/admin_password")

View file

@ -7,12 +7,16 @@
# <script_path> absolute path to a bash file already written by the web app;
# bind-mounted read-only at /script.sh inside the sandbox.
#
# The script runs under bubblewrap inside a transient systemd scope so we get
# cgroup-v2 limits (memory / tasks / cpu) and a wallclock kill via
# RuntimeMaxSec. The sandbox drops to the unprivileged l4d2-sandbox UID;
# host filesystems are exposed read-only except /overlay (rw) and tmpfs
# /tmp + /run. Network namespace is *not* unshared — scripts must reach the
# public internet to download workshop / l4d2center / cedapug content.
# The script runs as a transient systemd .service with the full hardening
# surface: cgroup limits + walltime kill, NoNewPrivileges, ProtectSystem,
# ProtectHome, kernel-tunable / -module / -log protection, namespace
# restriction, address-family restriction, capability bounding (empty),
# seccomp filter (@system-service @network-io), MemoryDenyWriteExecute,
# LockPersonality, RestrictSUIDSGID. Network namespace is *not* restricted —
# scripts must reach the public internet to download workshop / l4d2center
# / cedapug content. PID namespace is shared with the host (no
# PrivatePID= directive in systemd); host PIDs are visible via /proc but
# not signal-able due to UID mismatch.
set -euo pipefail
[[ $# -eq 2 ]] || { echo "usage: $0 <overlay_id> <script>" >&2; exit 64; }
@ -33,33 +37,31 @@ fi
# Make sure the sandbox UID owns the overlay dir so the script can write there.
# Idempotent: a no-op when the dir is already l4d2-sandbox-owned (re-run case),
# and corrects the ownership the first time the dir was created by the web app
# under the left4me UID. Group-readable so the gameserver process (left4me)
# under the left4me UID. World-readable so the gameserver process (left4me)
# can read the overlay contents via the kernel-overlayfs lowerdir at runtime.
chown -R l4d2-sandbox:l4d2-sandbox "$OVERLAY_DIR"
chmod 0755 "$OVERLAY_DIR"
# UID/GID drop happens via systemd-run --uid/--gid before bwrap is invoked.
# bwrap then runs unprivileged as l4d2-sandbox; --unshare-user-try gives it
# the user-namespace context it needs for bind-mounts as a regular user.
exec systemd-run --quiet --scope --collect \
--uid=l4d2-sandbox --gid=l4d2-sandbox \
exec systemd-run --quiet --collect --wait --pipe \
--unit="left4me-script-${OVERLAY_ID}-$$" \
-p User=l4d2-sandbox -p Group=l4d2-sandbox \
-p NoNewPrivileges=yes \
-p ProtectSystem=strict -p ProtectHome=yes \
-p PrivateTmp=yes -p PrivateDevices=yes -p PrivateIPC=yes \
-p ProtectKernelTunables=yes -p ProtectKernelModules=yes \
-p ProtectKernelLogs=yes -p ProtectControlGroups=yes \
-p RestrictNamespaces=yes \
-p RestrictAddressFamilies="AF_INET AF_INET6 AF_UNIX" \
-p RestrictSUIDSGID=yes -p LockPersonality=yes \
-p MemoryDenyWriteExecute=yes \
-p SystemCallFilter="@system-service @network-io" \
-p SystemCallArchitectures=native \
-p CapabilityBoundingSet= -p AmbientCapabilities= \
-p TemporaryFileSystem="/etc /var/lib" \
-p BindReadOnlyPaths="/etc/resolv.conf /etc/ssl /etc/ca-certificates /etc/nsswitch.conf /etc/alternatives ${SCRIPT}:/script.sh" \
-p BindPaths="${OVERLAY_DIR}:/overlay" \
-p WorkingDirectory=/overlay \
-p Environment="HOME=/tmp PATH=/usr/bin:/usr/sbin OVERLAY=/overlay" \
-p MemoryMax=4G -p MemorySwapMax=0 -p TasksMax=512 \
-p CPUQuota=200% -p RuntimeMaxSec=3600 \
-- bwrap \
--die-with-parent --new-session \
--unshare-user-try \
--unshare-pid --unshare-ipc --unshare-uts --unshare-cgroup \
--proc /proc --dev /dev --tmpfs /tmp --tmpfs /run \
--ro-bind /usr /usr --ro-bind /lib /lib --ro-bind /lib64 /lib64 \
--symlink usr/bin /bin --symlink usr/sbin /sbin \
--ro-bind /etc/resolv.conf /etc/resolv.conf \
--ro-bind /etc/ssl /etc/ssl \
--ro-bind /etc/ca-certificates /etc/ca-certificates \
--ro-bind /etc/nsswitch.conf /etc/nsswitch.conf \
--ro-bind /etc/alternatives /etc/alternatives \
--bind "$OVERLAY_DIR" /overlay \
--chdir /overlay \
--setenv HOME /tmp --setenv PATH /usr/bin:/usr/sbin \
--setenv OVERLAY /overlay \
--ro-bind "$SCRIPT" /script.sh \
/bin/bash /script.sh
-- /bin/bash /script.sh

View file

@ -294,14 +294,21 @@ def test_deploy_script_provisions_sandbox_user():
assert "useradd --system --no-create-home --shell /usr/sbin/nologin l4d2-sandbox" in script
def test_deploy_script_installs_bubblewrap():
def test_deploy_script_does_not_install_bubblewrap():
install_lines = [
line for line in DEPLOY_SCRIPT.read_text().splitlines()
if ("apt-get install" in line or "dnf install" in line)
]
assert install_lines, "expected at least one apt/dnf install line"
for line in install_lines:
assert "bubblewrap" in line, f"missing bubblewrap in install line: {line}"
assert "bubblewrap" not in line, line
assert "bwrap" not in line, line
def test_deploy_script_tightens_left4me_db_permissions():
script = DEPLOY_SCRIPT.read_text()
assert "chown root:left4me /var/lib/left4me/left4me.db" in script
assert "chmod 0640 /var/lib/left4me/left4me.db" in script
def test_deploy_script_installs_script_sandbox_helper():
@ -321,22 +328,67 @@ def test_script_sandbox_helper_passes_shell_syntax_check():
subprocess.run(["bash", "-n", str(SCRIPT_SANDBOX_HELPER)], check=True)
def test_script_sandbox_helper_invokes_systemd_run_and_bwrap():
def test_script_sandbox_helper_invokes_systemd_run_with_hardening():
text = SCRIPT_SANDBOX_HELPER.read_text()
# systemd-run service mode (no --scope), with synchronous I/O to caller.
assert "systemd-run" in text
assert "--scope" in text
assert "--scope" not in text, "v2 uses transient service units, not scopes"
assert "--pipe" in text
assert "--wait" in text
assert "--collect" in text
assert "--unit=" in text
# No bwrap.
assert "bwrap" not in text
assert "bubblewrap" not in text
# UID drop via systemd directives.
assert "User=l4d2-sandbox" in text
assert "Group=l4d2-sandbox" in text
# Cgroup limits unchanged from v1.
assert "MemoryMax=4G" in text
assert "RuntimeMaxSec=3600" in text
assert "MemorySwapMax=0" in text
assert "TasksMax=512" in text
assert "bwrap" in text
assert "--unshare-pid" in text
assert "--unshare-net" not in text, "scripts must keep host network access"
# UID drop happens at systemd-run, not inside bwrap (modern bwrap requires
# --unshare-user for --uid; doing the drop earlier keeps file ownership
# straight on the host bind-mount).
assert "--uid=l4d2-sandbox" in text
assert "--gid=l4d2-sandbox" in text
assert "CPUQuota=200%" in text
assert "RuntimeMaxSec=3600" in text
# Hardening directives that v1 (scope mode) couldn't carry.
assert "NoNewPrivileges=yes" in text
assert "ProtectSystem=strict" in text
assert "ProtectHome=yes" in text
assert "PrivateTmp=yes" in text
assert "PrivateDevices=yes" in text
assert "PrivateIPC=yes" in text
assert "ProtectKernelTunables=yes" in text
assert "ProtectKernelModules=yes" in text
assert "ProtectKernelLogs=yes" in text
assert "ProtectControlGroups=yes" in text
assert "RestrictNamespaces=yes" in text
assert "RestrictSUIDSGID=yes" in text
assert "LockPersonality=yes" in text
assert "MemoryDenyWriteExecute=yes" in text
assert "SystemCallFilter=" in text
assert "@system-service" in text
assert "@network-io" in text
assert "CapabilityBoundingSet=" in text
assert "AmbientCapabilities=" in text
assert 'RestrictAddressFamilies="AF_INET AF_INET6 AF_UNIX"' in text
# Network namespace stays shared with host.
assert "PrivateNetwork=" not in text
# Mount setup: /etc and /var/lib masked with tmpfs; selective binds back.
assert 'TemporaryFileSystem="/etc /var/lib"' in text
assert "BindReadOnlyPaths=" in text
assert "/etc/resolv.conf" in text
assert "/etc/ssl" in text
assert "/etc/ca-certificates" in text
assert "/etc/nsswitch.conf" in text
assert "/etc/alternatives" in text
assert "${SCRIPT}:/script.sh" in text
assert 'BindPaths="${OVERLAY_DIR}:/overlay"' in text
def test_script_sandbox_helper_validates_overlay_id():

View file

@ -0,0 +1,146 @@
# L4D2 Script Sandbox v2 Implementation Plan
> **Approval status:** User-approved 2026-05-08 after smoke-testing the v2 prototype on `ckn@10.0.4.128`.
**Goal:** Replace the bwrap-based sandbox helper with a systemd-only one per `docs/superpowers/specs/2026-05-08-l4d2-script-sandbox-v2-systemd.md`. Drop the `bubblewrap` apt dep. Tighten `left4me.db` file mode to 0640 root:left4me. Update the deploy-artifact tests to assert the new helper shape.
**Architecture:** See spec. Helper invokes `systemd-run --pipe --wait` in service-unit mode with full hardening directives. No bwrap. Web-app side (`ScriptBuilder`, `run_sandboxed_script`, routes) is unchanged.
---
## Locked Decisions
See spec §Locked Decisions for rationale. Implementation summary:
- Helper file at the same path (`deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`) is rewritten in place.
- The sudoers rule is unchanged.
- `bubblewrap` dropped from `apt-get install` / `dnf install` lines.
- `left4me.db` chmod 0640 added to deploy script as a post-init step.
- Sandbox UID, system user, overlay-dir chown logic, and ScriptBuilder API stay the same.
---
## Current Gap
- `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` invokes `systemd-run --scope ... -- bwrap [namespace flags] /bin/bash /script.sh`.
- `deploy/deploy-test-server.sh` line ~84 installs `bubblewrap` via apt/dnf.
- `deploy/tests/test_deploy_artifacts.py::test_script_sandbox_helper_invokes_systemd_run_and_bwrap` asserts `bwrap`, `--unshare-pid`, `--uid=l4d2-sandbox`, etc.
- `deploy/tests/test_deploy_artifacts.py::test_deploy_script_installs_bubblewrap` asserts `bubblewrap` is in apt/dnf install lines.
- `left4me.db` is created at deploy time with the default 0644 permissions; any host user can read it.
---
## Task 1: Rewrite the sandbox helper to be systemd-only
**Files:**
- Modify: `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` — replace the `systemd-run --scope … bwrap …` invocation with `systemd-run --service --pipe --wait …` carrying the hardening directives.
Test plan:
1. `bash -n` syntax check (already covered by `test_script_sandbox_helper_passes_shell_syntax_check`).
2. `test_deploy_artifacts.py::test_script_sandbox_helper_invokes_systemd_run_and_bwrap` is replaced by a new pin: `test_script_sandbox_helper_invokes_systemd_run_with_hardening`. Asserts:
- No `bwrap` reference remains.
- `systemd-run` is invoked with `--pipe`, `--wait`, `--collect`, `--unit=` (transient service unit form, no `--scope`).
- All hardening directives present: `NoNewPrivileges=yes`, `ProtectSystem=strict`, `ProtectHome=yes`, `PrivateTmp=yes`, `PrivateDevices=yes`, `PrivateIPC=yes`, `ProtectKernelTunables=yes`, `ProtectKernelModules=yes`, `ProtectKernelLogs=yes`, `ProtectControlGroups=yes`, `RestrictNamespaces=yes`, `RestrictSUIDSGID=yes`, `LockPersonality=yes`, `MemoryDenyWriteExecute=yes`, `SystemCallFilter=`, `CapabilityBoundingSet=` (empty), `User=l4d2-sandbox`, `Group=l4d2-sandbox`.
- `TemporaryFileSystem=` covers `/etc` and `/var/lib`.
- `BindReadOnlyPaths=` includes `/etc/resolv.conf /etc/ssl /etc/ca-certificates /etc/nsswitch.conf /etc/alternatives` and the script bind `${SCRIPT}:/script.sh`.
- `BindPaths=` carries the overlay bind.
- Cgroup limits unchanged (`MemoryMax=4G`, `MemorySwapMax=0`, `TasksMax=512`, `CPUQuota=200%`, `RuntimeMaxSec=3600`).
3. Existing `test_script_sandbox_helper_dry_run_mode` keeps passing — the dry-run guard still short-circuits before `systemd-run`.
4. Existing `test_script_sandbox_helper_validates_overlay_id` keeps passing — argument validation is unchanged.
Implementation: helper body verbatim from the spec §Helper.
**Verification:**
```
python3 -m pytest deploy/tests/test_deploy_artifacts.py -q
bash -n deploy/files/usr/local/libexec/left4me/left4me-script-sandbox
```
**Commit:** `refactor(deploy): rewrite left4me-script-sandbox to systemd-only — drop bwrap`
---
## Task 2: Drop bubblewrap apt/dnf dep + tighten left4me.db mode
**Files:**
- Modify: `deploy/deploy-test-server.sh` — remove `bubblewrap` from `apt-get install` / `dnf install` package lists; add a post-init step that ensures `left4me.db` is mode 0640 owned `root:left4me`.
- Modify: `deploy/tests/test_deploy_artifacts.py` — replace `test_deploy_script_installs_bubblewrap` with `test_deploy_script_does_not_install_bubblewrap`; add `test_deploy_script_tightens_left4me_db_permissions`.
Test plan:
1. `test_deploy_script_does_not_install_bubblewrap` — for each `apt-get install` / `dnf install` line, `bubblewrap` is absent.
2. `test_deploy_script_tightens_left4me_db_permissions` — script contains `chmod 0640 /var/lib/left4me/left4me.db` and `chown root:left4me /var/lib/left4me/left4me.db` (in either order).
3. `test_deploy_script_shell_syntax` keeps passing (`sh -n`).
Implementation:
- Remove the bare `bubblewrap` token from the two install lines.
- After the `alembic upgrade head` step (which creates the DB if missing), add:
```
$sudo_cmd chown root:left4me /var/lib/left4me/left4me.db
$sudo_cmd chmod 0640 /var/lib/left4me/left4me.db
```
Idempotent — re-runs are no-ops.
**Verification:**
```
python3 -m pytest deploy/tests/test_deploy_artifacts.py -q
sh -n deploy/deploy-test-server.sh
```
**Commit:** `chore(deploy): drop bubblewrap apt dep + tighten left4me.db mode to 0640 root:left4me`
---
## Task 3: Deploy + smoke-test on the test host
**Files:** none.
This is an operational verification step, not a code change. Run `deploy/deploy-test-server.sh ckn@10.0.4.128`, then on the host re-run the same smoke battery used to validate the prototype:
1. **Identity / privileges**: `id` returns `uid=996 gid=985`; `/proc/self/status` shows `NoNewPrivs: 1` and `CapBnd: 0000000000000000`.
2. **Filesystem isolation**: `/etc/passwd` absent, `/etc/alternatives/awk` present, `/var/lib/left4me/left4me.db` absent, `/home` inaccessible, `/usr` not writable, `/overlay` writable.
3. **Tools + network**: `awk` resolves through `/etc/alternatives`; `curl https://steamcommunity.com/` returns 200.
4. **Cgroup limits**: while a 5s-sleep script runs, `cat /sys/fs/cgroup/.../memory.max` returns `4294967296`; `pids.max` `512`; `cpu.max` `200000 100000`.
5. **Memory cap**: 5 GB Python alloc raises `MemoryError`.
6. **Wipe**: `find /overlay -mindepth 1 -delete` empties the overlay dir.
7. **Seccomp / restriction probes**: `unshare -U`, `mount -t tmpfs`, `setarch -X`, `bpf` setsockopt all fail with EPERM/EINVAL.
8. **Build via web UI**: log in as admin, create a script overlay with `echo "hi" > foo`, click Save, confirm job succeeds and `foo` appears in `/var/lib/left4me/overlays/{id}/foo`.
9. **DB hardening**: `stat -c "%a %U:%G" /var/lib/left4me/left4me.db` returns `640 root:left4me`.
Mark this task complete only after every check passes on the live host.
**Commit:** none (operational verification — record results in conversation/PR description).
---
## Task 4: Drift sweep + push
**Files:** as needed across the repo.
Run the full test suite for all three packages; chase any drift caused by the helper rewrite or deploy-script changes.
```
python3 -m pytest l4d2web/tests/ -q
python3 -m pytest l4d2host/tests/ -q
python3 -m pytest deploy/tests/ -q
```
Implementation: fix what breaks. Expected: nothing new should break, since the Python-side contract is unchanged. If something does, treat it as a sign of an unintended coupling and address.
Push the commits to `origin/master`.
**Verification:** all three suites green; `git status` clean; commits visible on `git.sublimity.de/cronekorkn/left4me`.
**Commit:** none unless drift fixes are needed.
---
## Rollback plan
If Task 3 surfaces a blocker (a hardening directive breaks a real-world script class, seccomp filter is too narrow, BindPaths semantics differ on the host's systemd version), roll back via `git revert` of Tasks 1+2 and redeploy. Git history preserves both the v1 and v2 helper. The Python side never changed, so reverting only the deploy artifacts is sufficient — no DB migration to undo, no template change to roll back.

View file

@ -0,0 +1,138 @@
# L4D2 Script Sandbox v2 — Systemd-Only
**Goal:** Replace the bwrap-based `left4me-script-sandbox` helper with one that uses `systemd-run` in **service-unit mode** alone. Drop `bubblewrap` as a system dependency. Gain capability bounding, seccomp filtering, kernel-tunable / -module / -log protection, address-family restriction, `LockPersonality`, `MemoryDenyWriteExecute`, and `RestrictSUIDSGID` — none of which the bwrap+systemd-run-scope composition could provide. Lose PID-namespace isolation (no `PrivatePID=` directive in systemd) — judged acceptable for the current trust model.
**Approval status:** User-approved 2026-05-08 after smoke testing on `ckn@10.0.4.128`.
## Context
The v1 sandbox (see `2026-05-08-l4d2-script-overlays-design.md`) layers `bubblewrap` for namespacing inside `systemd-run --scope` for cgroup limits. That works, but `--scope` units register an existing process tree and so cannot accept service-only directives like `NoNewPrivileges=`, `ProtectSystem=`, `SystemCallFilter=`, `CapabilityBoundingSet=`, etc. Smoke testing on the deployed host confirmed bwrap covers mount/PID/IPC/UTS namespacing well, but leaves capability bounding, seccomp, and kernel-surface protection unenforced.
A switch to `systemd-run` in default (transient service) mode unlocks the full hardening surface. Smoke testing of a v2 prototype against the deployed test host confirmed:
- Every isolation invariant the bwrap version provides (filesystem masking, UID drop, network reachability, `/overlay` RW bind, host-side `l4d2-sandbox` ownership, host secret hiding) is reproducible with systemd directives.
- All cgroup limits (`memory.max=4G`, `memory.swap.max=0`, `pids.max=512`, `cpu.max=200%`, `RuntimeMaxSec=3600`) apply identically.
- `MemoryError` fires at the 4 GB cap (cgroup-enforced).
- The wipe path (`find /overlay -mindepth 1 -delete`) succeeds.
- Hardening directives the v1 design couldn't express enforce real syscall blocks: `unshare(CLONE_NEWUSER)`, `mount(2)`, `personality(2)`, `bpf(2)`, `swapoff(2)`, `sysctl -w` are all blocked.
The single behavioral regression: host process IDs are visible via `/proc` and `ps -ef` because systemd has no `PrivatePID=` directive. Sending signals to those processes is still blocked by the kernel's UID-mismatch check (`l4d2-sandbox` cannot signal `root`-owned processes). Information disclosure is the only leak; signal capability is intact.
## Locked Decisions
1. **Replace the helper body wholesale.** No `bwrap` invocation. `systemd-run` in service mode does both isolation and resource limits.
2. **Helper path, sudoers rule, ScriptBuilder API, and `l4d2-sandbox` UID are unchanged.** The Python side (`run_sandboxed_script`, route handlers, tests) does not change.
3. **`bubblewrap` apt dependency dropped from `deploy-test-server.sh`.**
4. **`left4me.db` file mode tightened to 0640 root:left4me at deploy time.** This is a host-hygiene fix that is independent of the sandbox change but was surfaced by smoke testing — without it, *any* host user (and, transitively, the sandbox) could read the application database.
5. **`TemporaryFileSystem=/var/lib` is required.** `ProtectSystem=strict` makes `/var/lib/left4me` read-only but visible; the only way to reliably hide its contents from the unit is to mask the parent with a tmpfs. The `BindPaths=…/overlays/{id}:/overlay` mount is unaffected because `/overlay` is at a different path.
6. **`PrivatePID=` is not configured.** systemd has no such directive. `ps -ef` from inside the sandbox shows host processes. The kernel's UID-based signal restriction blocks any actual interaction with them. Acceptable for the current trust model.
7. **Walltime kill remains `RuntimeMaxSec=3600`.** Same as v1.
8. **Network namespace remains shared with the host.** No `PrivateNetwork=`. Scripts must reach Steam / l4d2center / GitHub / etc.
9. **`SystemCallFilter=@system-service @network-io`** is the seccomp baseline. systemd's curated `@system-service` group is "everything a normal service does"; adding `@network-io` is explicit even though it overlaps. Build failures revealing missing syscall classes are surfaced via `journalctl` and addressed by widening the filter (`@process`, etc.) on demand.
10. **Single helper file replaces v1.** Not adding a `-v2` variant. The v1 implementation is removed in the same change.
## Architecture
```text
sudo helper
└─ systemd-run --service (default) --pipe --wait
(transient .service unit, full hardening directives)
└─ /bin/bash /script.sh
```
systemd-run in service mode:
- Opens a transient service unit on the system bus.
- Applies all `-p` properties as the unit's exec context.
- Forks; the child sets up the unit's namespaces (mount, IPC, user), drops privileges to `User=l4d2-sandbox`, applies the seccomp filter, and `execve()`s `/bin/bash /script.sh`.
- `--pipe` connects the unit's stdin/stdout/stderr to the calling helper's stdio (so the existing `run_command` harness in `ScriptBuilder` continues to capture line-by-line).
- `--wait` blocks until the unit terminates and propagates the exit code.
- `--collect` removes the unit on exit even if it failed.
- The cgroup carries the resource limits; the systemd timer enforces `RuntimeMaxSec=3600`.
### Helper
`deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`, mode 0755, owned root:
```bash
#!/bin/bash
set -euo pipefail
[[ $# -eq 2 ]] || { echo "usage: $0 <overlay_id> <script>" >&2; exit 64; }
OVERLAY_ID=$1; SCRIPT=$2
[[ "$OVERLAY_ID" =~ ^[0-9]+$ ]] || { echo "bad overlay id" >&2; exit 64; }
OVERLAY_DIR=/var/lib/left4me/overlays/$OVERLAY_ID
[[ -d $OVERLAY_DIR ]] || { echo "no overlay dir at $OVERLAY_DIR" >&2; exit 65; }
[[ -f $SCRIPT ]] || { echo "no script at $SCRIPT" >&2; exit 65; }
if [[ "${LEFT4ME_SCRIPT_SANDBOX_DRY_RUN:-}" == "1" ]]; then
echo "DRY RUN: overlay_id=$OVERLAY_ID script=$SCRIPT overlay_dir=$OVERLAY_DIR"
exit 0
fi
chown -R l4d2-sandbox:l4d2-sandbox "$OVERLAY_DIR"
chmod 0755 "$OVERLAY_DIR"
exec systemd-run --quiet --collect --wait --pipe \
--unit="left4me-script-${OVERLAY_ID}-$$" \
-p User=l4d2-sandbox -p Group=l4d2-sandbox \
-p NoNewPrivileges=yes \
-p ProtectSystem=strict -p ProtectHome=yes \
-p PrivateTmp=yes -p PrivateDevices=yes -p PrivateIPC=yes \
-p ProtectKernelTunables=yes -p ProtectKernelModules=yes \
-p ProtectKernelLogs=yes -p ProtectControlGroups=yes \
-p RestrictNamespaces=yes \
-p RestrictAddressFamilies="AF_INET AF_INET6 AF_UNIX" \
-p RestrictSUIDSGID=yes -p LockPersonality=yes \
-p MemoryDenyWriteExecute=yes \
-p SystemCallFilter="@system-service @network-io" \
-p SystemCallArchitectures=native \
-p CapabilityBoundingSet= -p AmbientCapabilities= \
-p TemporaryFileSystem="/etc /var/lib" \
-p BindReadOnlyPaths="/etc/resolv.conf /etc/ssl /etc/ca-certificates /etc/nsswitch.conf /etc/alternatives ${SCRIPT}:/script.sh" \
-p BindPaths="${OVERLAY_DIR}:/overlay" \
-p WorkingDirectory=/overlay \
-p Environment="HOME=/tmp PATH=/usr/bin:/usr/sbin OVERLAY=/overlay" \
-p MemoryMax=4G -p MemorySwapMax=0 -p TasksMax=512 \
-p CPUQuota=200% -p RuntimeMaxSec=3600 \
-- /bin/bash /script.sh
```
### Sudoers fragment
Unchanged from v1: `left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox`.
### System user
Unchanged from v1: `l4d2-sandbox` (`useradd --system --no-create-home --shell /usr/sbin/nologin`).
### Filesystem expectations
- `/var/lib/left4me` must be mode 0711 (left4me-owned). Already provisioned by v1 deploy script.
- `/var/lib/left4me/left4me.db` mode 0640 root:left4me. **New** — added by this change.
- Overlay directory `/var/lib/left4me/overlays/{id}/` chowned to `l4d2-sandbox:l4d2-sandbox` 0755 by the helper before each run. Unchanged from v1.
## Build Lifecycle (unchanged from v1)
`ScriptBuilder.build()` writes the script to a 0644 tmpfile, exec's `sudo -n /usr/local/libexec/left4me/left4me-script-sandbox <id> <tmpfile>` via `run_command`, then runs `_enforce_disk_budget`. The helper's internal mechanism changes; the wrapper API is identical. `Overlay.last_build_status` is written by the job worker on completion.
## Risks
- **systemd CVE landing in our directive set.** Single-tool migration removes one isolation layer. Mitigated by uid drop + cgroup limits + `NoNewPrivileges=yes` (kernel-enforced state independent of namespace setup). The escape would be an unprivileged process with no filesystem isolation but still capped on resources; same severity envelope as a hypothetical bwrap CVE in v1. The trust model (registered users) makes a single isolation layer acceptable.
- **`SystemCallFilter` rejecting a syscall a user script unexpectedly needs.** Symptom: build fails with SIGSYS. Diagnosis: `journalctl --since "1 min ago" | grep SECCOMP`. Resolution: widen the filter (`+@process`, `+@privileged` if the script genuinely needs more than a normal service). v1 had no syscall filter, so this is a new failure class.
- **`ProtectSystem=strict` masking something a script wanted to write to.** Only `/overlay`, `/tmp`, `/run` are writable inside the sandbox. Same as v1.
- **Host PID visibility (no `PrivatePID=`).** Information disclosure; not a privilege boundary.
- **`MemoryDenyWriteExecute=yes` blocking JITs.** A script that launches `node` / a JIT runtime would fail because W+X mappings are blocked. None of the recipe set the user has historically used (curl + tar + cp) needs a JIT; revisit if a real script trips this.
- **`RestrictAddressFamilies` blocking some download tools.** `curl`, `wget`, `git over https` use `AF_INET`/`AF_INET6`; `getent hosts` uses `AF_UNIX` (nss). Smoke-tested as working. A script that wanted raw sockets (`AF_PACKET`) or netlink (`AF_NETLINK`) would fail; neither is plausible for build recipes.
## Out Of Scope
- **Per-overlay UID isolation.** Cross-script-overlay write access is still possible after a hypothetical sandbox bypass (every script overlay's dir is owned by `l4d2-sandbox`). A per-overlay UID pool was discussed as the next-step hardening but is deferred.
- **`PrivateNetwork=` / egress filtering.** No change from v1.
- **systemd-nspawn or LXC.** Researched; both are heavier than necessary for transient bash builds.
- **`PrivatePID=` workaround via `unshare`.** Not pursued — would require re-introducing a wrapper inside the unit, defeating the simplification.
## Implementation Boundaries
- **Web app code is unchanged.** `ScriptBuilder`, `run_sandboxed_script`, route handlers, models, migrations — all untouched. The migration is purely in the deployed helper script and adjacent deploy artifacts.
- **`bubblewrap` apt package removed.** Already absent from production paths after this change; deploy script updated.
- **No new systemd unit files.** Each invocation is a transient unit named `left4me-script-{overlay_id}-{pid}.service`.
- **No application-level dependency changes.** No new Python packages, no template changes, no DB migration.