# L4D2 Script Sandbox v2 — Systemd-Only **Goal:** Replace the bwrap-based `left4me-script-sandbox` helper with one that uses `systemd-run` in **service-unit mode** alone. Drop `bubblewrap` as a system dependency. Gain capability bounding, seccomp filtering, kernel-tunable / -module / -log protection, address-family restriction, `LockPersonality`, `MemoryDenyWriteExecute`, and `RestrictSUIDSGID` — none of which the bwrap+systemd-run-scope composition could provide. Lose PID-namespace isolation (no `PrivatePID=` directive in systemd) — judged acceptable for the current trust model. **Approval status:** User-approved 2026-05-08 after smoke testing on `ckn@10.0.4.128`. ## Context The v1 sandbox (see `2026-05-08-l4d2-script-overlays-design.md`) layers `bubblewrap` for namespacing inside `systemd-run --scope` for cgroup limits. That works, but `--scope` units register an existing process tree and so cannot accept service-only directives like `NoNewPrivileges=`, `ProtectSystem=`, `SystemCallFilter=`, `CapabilityBoundingSet=`, etc. Smoke testing on the deployed host confirmed bwrap covers mount/PID/IPC/UTS namespacing well, but leaves capability bounding, seccomp, and kernel-surface protection unenforced. A switch to `systemd-run` in default (transient service) mode unlocks the full hardening surface. Smoke testing of a v2 prototype against the deployed test host confirmed: - Every isolation invariant the bwrap version provides (filesystem masking, UID drop, network reachability, `/overlay` RW bind, host-side `l4d2-sandbox` ownership, host secret hiding) is reproducible with systemd directives. - All cgroup limits (`memory.max=4G`, `memory.swap.max=0`, `pids.max=512`, `cpu.max=200%`, `RuntimeMaxSec=3600`) apply identically. - `MemoryError` fires at the 4 GB cap (cgroup-enforced). - The wipe path (`find /overlay -mindepth 1 -delete`) succeeds. - Hardening directives the v1 design couldn't express enforce real syscall blocks: `unshare(CLONE_NEWUSER)`, `mount(2)`, `personality(2)`, `bpf(2)`, `swapoff(2)`, `sysctl -w` are all blocked. The single behavioral regression: host process IDs are visible via `/proc` and `ps -ef` because systemd has no `PrivatePID=` directive. Sending signals to those processes is still blocked by the kernel's UID-mismatch check (`l4d2-sandbox` cannot signal `root`-owned processes). Information disclosure is the only leak; signal capability is intact. ## Locked Decisions 1. **Replace the helper body wholesale.** No `bwrap` invocation. `systemd-run` in service mode does both isolation and resource limits. 2. **Helper path, sudoers rule, ScriptBuilder API, and `l4d2-sandbox` UID are unchanged.** The Python side (`run_sandboxed_script`, route handlers, tests) does not change. 3. **`bubblewrap` apt dependency dropped from `deploy-test-server.sh`.** 4. **`left4me.db` file mode tightened to 0640 root:left4me at deploy time.** This is a host-hygiene fix that is independent of the sandbox change but was surfaced by smoke testing — without it, *any* host user (and, transitively, the sandbox) could read the application database. 5. **`TemporaryFileSystem=/var/lib` is required.** `ProtectSystem=strict` makes `/var/lib/left4me` read-only but visible; the only way to reliably hide its contents from the unit is to mask the parent with a tmpfs. The `BindPaths=…/overlays/{id}:/overlay` mount is unaffected because `/overlay` is at a different path. 6. **`PrivatePID=` is not configured.** systemd has no such directive. `ps -ef` from inside the sandbox shows host processes. The kernel's UID-based signal restriction blocks any actual interaction with them. Acceptable for the current trust model. 7. **Walltime kill remains `RuntimeMaxSec=3600`.** Same as v1. 8. **Network namespace remains shared with the host.** No `PrivateNetwork=`. Scripts must reach Steam / l4d2center / GitHub / etc. 9. **`SystemCallFilter=@system-service @network-io`** is the seccomp baseline. systemd's curated `@system-service` group is "everything a normal service does"; adding `@network-io` is explicit even though it overlaps. Build failures revealing missing syscall classes are surfaced via `journalctl` and addressed by widening the filter (`@process`, etc.) on demand. 10. **Single helper file replaces v1.** Not adding a `-v2` variant. The v1 implementation is removed in the same change. ## Architecture ```text sudo helper └─ systemd-run --service (default) --pipe --wait (transient .service unit, full hardening directives) └─ /bin/bash /script.sh ``` systemd-run in service mode: - Opens a transient service unit on the system bus. - Applies all `-p` properties as the unit's exec context. - Forks; the child sets up the unit's namespaces (mount, IPC, user), drops privileges to `User=l4d2-sandbox`, applies the seccomp filter, and `execve()`s `/bin/bash /script.sh`. - `--pipe` connects the unit's stdin/stdout/stderr to the calling helper's stdio (so the existing `run_command` harness in `ScriptBuilder` continues to capture line-by-line). - `--wait` blocks until the unit terminates and propagates the exit code. - `--collect` removes the unit on exit even if it failed. - The cgroup carries the resource limits; the systemd timer enforces `RuntimeMaxSec=3600`. ### Helper `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`, mode 0755, owned root: ```bash #!/bin/bash set -euo pipefail [[ $# -eq 2 ]] || { echo "usage: $0