From efaaf84cd99c99d0f1f27e294048806f19152a3b Mon Sep 17 00:00:00 2001
From: mwiegand
Date: Fri, 8 May 2026 16:46:13 +0200
Subject: [PATCH] =?UTF-8?q?docs(specs):=20script=20sandbox=20v2=20?= =?UTF-8?q?=E2=80=94=20systemd-only=20design=20+=20plan?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Spec captures the v2 architecture (systemd-run service mode with full hardening directives, no bwrap), the two surfaces in scope (helper rewrite + bubblewrap dep removal + left4me.db mode tightening), and the gotchas surfaced by smoke-testing the prototype on ckn@10.0.4.128:

- ProtectSystem=strict makes /var/lib/left4me visible (not invisible); must add TemporaryFileSystem=/var/lib to mask it.
- Script bind via BindReadOnlyPaths uses ${SCRIPT}:/script.sh syntax.
- No PrivatePID= directive in systemd; host PIDs visible via /proc. Information disclosure only — kernel UID-mismatch blocks signals.

Plan breaks the migration into 4 tasks (helper rewrite, deploy-script deps + DB mode, host smoke-test, drift sweep) with explicit rollback.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 ...26-05-08-l4d2-script-sandbox-v2-systemd.md | 146 ++++++++++++++++++
 ...26-05-08-l4d2-script-sandbox-v2-systemd.md | 138 +++++++++++++++++
 2 files changed, 284 insertions(+)
 create mode 100644 docs/superpowers/plans/2026-05-08-l4d2-script-sandbox-v2-systemd.md
 create mode 100644 docs/superpowers/specs/2026-05-08-l4d2-script-sandbox-v2-systemd.md

diff --git a/docs/superpowers/plans/2026-05-08-l4d2-script-sandbox-v2-systemd.md b/docs/superpowers/plans/2026-05-08-l4d2-script-sandbox-v2-systemd.md
new file mode 100644
index 0000000..d0f8b5a
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-08-l4d2-script-sandbox-v2-systemd.md
@@ -0,0 +1,146 @@
+# L4D2 Script Sandbox v2 Implementation Plan
+
+> **Approval status:** User-approved 2026-05-08 after smoke-testing the v2 prototype on `ckn@10.0.4.128`.
+
+**Goal:** Replace the bwrap-based sandbox helper with a systemd-only one per `docs/superpowers/specs/2026-05-08-l4d2-script-sandbox-v2-systemd.md`. Drop the `bubblewrap` apt dep. Tighten `left4me.db` file mode to 0640 root:left4me. Update the deploy-artifact tests to assert the new helper shape.
+
+**Architecture:** See spec. Helper invokes `systemd-run --pipe --wait` in service-unit mode with full hardening directives. No bwrap. Web-app side (`ScriptBuilder`, `run_sandboxed_script`, routes) is unchanged.
+
+---
+
+## Locked Decisions
+
+See spec §Locked Decisions for rationale. Implementation summary:
+
+- Helper file at the same path (`deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`) is rewritten in place.
+- The sudoers rule is unchanged.
+- `bubblewrap` dropped from `apt-get install` / `dnf install` lines.
+- `left4me.db` chmod 0640 added to deploy script as a post-init step.
+- Sandbox UID, system user, overlay-dir chown logic, and ScriptBuilder API stay the same.
+
+---
+
+## Current Gap
+
+- `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` invokes `systemd-run --scope ... -- bwrap [namespace flags] /bin/bash /script.sh`.
+- `deploy/deploy-test-server.sh` line ~84 installs `bubblewrap` via apt/dnf.
+- `deploy/tests/test_deploy_artifacts.py::test_script_sandbox_helper_invokes_systemd_run_and_bwrap` asserts `bwrap`, `--unshare-pid`, `--uid=l4d2-sandbox`, etc.
+- `deploy/tests/test_deploy_artifacts.py::test_deploy_script_installs_bubblewrap` asserts `bubblewrap` is in apt/dnf install lines.
+- `left4me.db` is created at deploy time with the default 0644 permissions; any host user can read it.
+
+---
+
+## Task 1: Rewrite the sandbox helper to be systemd-only
+
+**Files:**
+
+- Modify: `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` — replace the `systemd-run --scope … bwrap …` invocation with a service-mode `systemd-run --pipe --wait …` invocation (service mode is the default when `--scope` is omitted) carrying the hardening directives.
+
+Test plan:
+
+1. `bash -n` syntax check (already covered by `test_script_sandbox_helper_passes_shell_syntax_check`).
+2. `test_deploy_artifacts.py::test_script_sandbox_helper_invokes_systemd_run_and_bwrap` is replaced by a new pin: `test_script_sandbox_helper_invokes_systemd_run_with_hardening`. Asserts:
+   - No `bwrap` reference remains.
+   - `systemd-run` is invoked with `--pipe`, `--wait`, `--collect`, `--unit=` (transient service unit form, no `--scope`).
+   - All hardening directives present: `NoNewPrivileges=yes`, `ProtectSystem=strict`, `ProtectHome=yes`, `PrivateTmp=yes`, `PrivateDevices=yes`, `PrivateIPC=yes`, `ProtectKernelTunables=yes`, `ProtectKernelModules=yes`, `ProtectKernelLogs=yes`, `ProtectControlGroups=yes`, `RestrictNamespaces=yes`, `RestrictSUIDSGID=yes`, `LockPersonality=yes`, `MemoryDenyWriteExecute=yes`, `SystemCallFilter=`, `CapabilityBoundingSet=` (empty), `User=l4d2-sandbox`, `Group=l4d2-sandbox`.
+   - `TemporaryFileSystem=` covers `/etc` and `/var/lib`.
+   - `BindReadOnlyPaths=` includes `/etc/resolv.conf /etc/ssl /etc/ca-certificates /etc/nsswitch.conf /etc/alternatives` and the script bind `${SCRIPT}:/script.sh`.
+   - `BindPaths=` carries the overlay bind.
+   - Cgroup limits unchanged (`MemoryMax=4G`, `MemorySwapMax=0`, `TasksMax=512`, `CPUQuota=200%`, `RuntimeMaxSec=3600`).
+3. Existing `test_script_sandbox_helper_dry_run_mode` keeps passing — the dry-run guard still short-circuits before `systemd-run`.
+4. Existing `test_script_sandbox_helper_validates_overlay_id` keeps passing — argument validation is unchanged.
+
+Implementation: helper body verbatim from the spec §Helper.
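As a rough illustration of the pin described in the Task 1 test plan, the sketch below checks a helper's text for the required flags and directives. The function name `missing_hardening`, the constant names, and the two sample strings are hypothetical and not taken from the repository; the real test in `deploy/tests/test_deploy_artifacts.py` may be structured differently.

```python
import re

# Flags and directives copied from the Task 1 test plan.
REQUIRED_FLAGS = ["--pipe", "--wait", "--collect", "--unit="]
REQUIRED_DIRECTIVES = [
    "NoNewPrivileges=yes", "ProtectSystem=strict", "ProtectHome=yes",
    "PrivateTmp=yes", "PrivateDevices=yes", "PrivateIPC=yes",
    "ProtectKernelTunables=yes", "ProtectKernelModules=yes",
    "ProtectKernelLogs=yes", "ProtectControlGroups=yes",
    "RestrictNamespaces=yes", "RestrictSUIDSGID=yes",
    "LockPersonality=yes", "MemoryDenyWriteExecute=yes",
    "SystemCallFilter=", "CapabilityBoundingSet=",
    "User=l4d2-sandbox", "Group=l4d2-sandbox",
    "MemoryMax=4G", "MemorySwapMax=0", "TasksMax=512",
    "CPUQuota=200%", "RuntimeMaxSec=3600",
]


def missing_hardening(helper_text: str) -> list[str]:
    """Return a list of problems; an empty list means the text matches the plan."""
    problems = []
    if re.search(r"\bbwrap\b", helper_text):
        problems.append("bwrap still referenced")
    if "--scope" in helper_text:
        problems.append("--scope still used")
    if "systemd-run" not in helper_text:
        problems.append("systemd-run not invoked")
    for flag in REQUIRED_FLAGS:
        if flag not in helper_text:
            problems.append(f"missing flag {flag}")
    for directive in REQUIRED_DIRECTIVES:
        if directive not in helper_text:
            problems.append(f"missing directive {directive}")
    return problems


# Hypothetical stand-ins for the v1 and v2 helper bodies (not the real files).
SAMPLE_V1 = "systemd-run --scope -p MemoryMax=4G -- bwrap --unshare-pid /bin/bash /script.sh"
SAMPLE_V2 = "systemd-run --pipe --wait --collect --unit=demo " + " ".join(
    f"-p {directive}" for directive in REQUIRED_DIRECTIVES
)

if __name__ == "__main__":
    print(missing_hardening(SAMPLE_V1)[:2])
    print(missing_hardening(SAMPLE_V2))
```

A substring check like this only pins the helper's shape; it does not prove the directives are accepted by the host's systemd version, which is what the Task 3 smoke test covers.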
+
+**Verification:**
+
+```
+python3 -m pytest deploy/tests/test_deploy_artifacts.py -q
+bash -n deploy/files/usr/local/libexec/left4me/left4me-script-sandbox
+```
+
+**Commit:** `refactor(deploy): rewrite left4me-script-sandbox to systemd-only — drop bwrap`
+
+---
+
+## Task 2: Drop bubblewrap apt/dnf dep + tighten left4me.db mode
+
+**Files:**
+
+- Modify: `deploy/deploy-test-server.sh` — remove `bubblewrap` from `apt-get install` / `dnf install` package lists; add a post-init step that ensures `left4me.db` is mode 0640 owned `root:left4me`.
+- Modify: `deploy/tests/test_deploy_artifacts.py` — replace `test_deploy_script_installs_bubblewrap` with `test_deploy_script_does_not_install_bubblewrap`; add `test_deploy_script_tightens_left4me_db_permissions`.
+
+Test plan:
+
+1. `test_deploy_script_does_not_install_bubblewrap` — for each `apt-get install` / `dnf install` line, `bubblewrap` is absent.
+2. `test_deploy_script_tightens_left4me_db_permissions` — script contains `chmod 0640 /var/lib/left4me/left4me.db` and `chown root:left4me /var/lib/left4me/left4me.db` (in either order).
+3. `test_deploy_script_shell_syntax` keeps passing (`sh -n`).
+
+Implementation:
+
+- Remove the bare `bubblewrap` token from the two install lines.
+- After the `alembic upgrade head` step (which creates the DB if missing), add:
+  ```
+  $sudo_cmd chown root:left4me /var/lib/left4me/left4me.db
+  $sudo_cmd chmod 0640 /var/lib/left4me/left4me.db
+  ```
+  Idempotent — re-runs are no-ops.
+
+**Verification:**
+
+```
+python3 -m pytest deploy/tests/test_deploy_artifacts.py -q
+sh -n deploy/deploy-test-server.sh
+```
+
+**Commit:** `chore(deploy): drop bubblewrap apt dep + tighten left4me.db mode to 0640 root:left4me`
+
+---
+
+## Task 3: Deploy + smoke-test on the test host
+
+**Files:** none.
+
+This is an operational verification step, not a code change. Run `deploy/deploy-test-server.sh ckn@10.0.4.128`, then on the host re-run the same smoke battery used to validate the prototype:
+
+1. **Identity / privileges**: `id` returns `uid=996 gid=985`; `/proc/self/status` shows `NoNewPrivs: 1` and `CapBnd: 0000000000000000`.
+2. **Filesystem isolation**: `/etc/passwd` absent, `/etc/alternatives/awk` present, `/var/lib/left4me/left4me.db` absent, `/home` inaccessible, `/usr` not writable, `/overlay` writable.
+3. **Tools + network**: `awk` resolves through `/etc/alternatives`; `curl https://steamcommunity.com/` returns 200.
+4. **Cgroup limits**: while a 5s-sleep script runs, `cat /sys/fs/cgroup/.../memory.max` returns `4294967296`; `pids.max` `512`; `cpu.max` `200000 100000`.
+5. **Memory cap**: 5 GB Python alloc raises `MemoryError`.
+6. **Wipe**: `find /overlay -mindepth 1 -delete` empties the overlay dir.
+7. **Seccomp / restriction probes**: `unshare -U`, `mount -t tmpfs`, `setarch -X`, `bpf` setsockopt all fail with EPERM/EINVAL.
+8. **Build via web UI**: log in as admin, create a script overlay with `echo "hi" > foo`, click Save, confirm job succeeds and `foo` appears in `/var/lib/left4me/overlays/{id}/foo`.
+9. **DB hardening**: `stat -c "%a %U:%G" /var/lib/left4me/left4me.db` returns `640 root:left4me`.
+
+Mark this task complete only after every check passes on the live host.
+
+**Commit:** none (operational verification — record results in conversation/PR description).
+
+---
+
+## Task 4: Drift sweep + push
+
+**Files:** as needed across the repo.
+
+Run the full test suite for all three packages; chase any drift caused by the helper rewrite or deploy-script changes.
+
+```
+python3 -m pytest l4d2web/tests/ -q
+python3 -m pytest l4d2host/tests/ -q
+python3 -m pytest deploy/tests/ -q
+```
+
+Implementation: fix what breaks. Expected: nothing new should break, since the Python-side contract is unchanged. If something does, treat it as a sign of an unintended coupling and address.
+
+Push the commits to `origin/master`.
+
+**Verification:** all three suites green; `git status` clean; commits visible on `git.sublimity.de/cronekorkn/left4me`.
+
+**Commit:** none unless drift fixes are needed.
+
+---
+
+## Rollback plan
+
+If Task 3 surfaces a blocker (a hardening directive breaks a real-world script class, seccomp filter is too narrow, BindPaths semantics differ on the host's systemd version), roll back via `git revert` of Tasks 1+2 and redeploy. Git history preserves both the v1 and v2 helper. The Python side never changed, so reverting only the deploy artifacts is sufficient — no DB migration to undo, no template change to roll back.
diff --git a/docs/superpowers/specs/2026-05-08-l4d2-script-sandbox-v2-systemd.md b/docs/superpowers/specs/2026-05-08-l4d2-script-sandbox-v2-systemd.md
new file mode 100644
index 0000000..7d33953
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-08-l4d2-script-sandbox-v2-systemd.md
@@ -0,0 +1,138 @@
+# L4D2 Script Sandbox v2 — Systemd-Only
+
+**Goal:** Replace the bwrap-based `left4me-script-sandbox` helper with one that uses `systemd-run` in **service-unit mode** alone. Drop `bubblewrap` as a system dependency. Gain capability bounding, seccomp filtering, kernel-tunable / -module / -log protection, address-family restriction, `LockPersonality`, `MemoryDenyWriteExecute`, and `RestrictSUIDSGID` — none of which the bwrap+systemd-run-scope composition could provide. Lose PID-namespace isolation (no `PrivatePID=` directive in systemd) — judged acceptable for the current trust model.
+
+**Approval status:** User-approved 2026-05-08 after smoke testing on `ckn@10.0.4.128`.
+
+## Context
+
+The v1 sandbox (see `2026-05-08-l4d2-script-overlays-design.md`) layers `bubblewrap` for namespacing inside `systemd-run --scope` for cgroup limits. That works, but `--scope` units register an existing process tree and so cannot accept service-only directives like `NoNewPrivileges=`, `ProtectSystem=`, `SystemCallFilter=`, `CapabilityBoundingSet=`, etc. Smoke testing on the deployed host confirmed bwrap covers mount/PID/IPC/UTS namespacing well, but leaves capability bounding, seccomp, and kernel-surface protection unenforced.
+
+A switch to `systemd-run` in default (transient service) mode unlocks the full hardening surface. Smoke testing of a v2 prototype against the deployed test host confirmed:
+
+- Every isolation invariant the bwrap version provides (filesystem masking, UID drop, network reachability, `/overlay` RW bind, host-side `l4d2-sandbox` ownership, host secret hiding) is reproducible with systemd directives.
+- All cgroup limits (`memory.max=4G`, `memory.swap.max=0`, `pids.max=512`, `cpu.max=200%`, `RuntimeMaxSec=3600`) apply identically.
+- `MemoryError` fires at the 4 GB cap (cgroup-enforced).
+- The wipe path (`find /overlay -mindepth 1 -delete`) succeeds.
+- Hardening directives the v1 design couldn't express enforce real syscall blocks: `unshare(CLONE_NEWUSER)`, `mount(2)`, `personality(2)`, `bpf(2)`, `swapoff(2)`, `sysctl -w` are all blocked.
+
+The single behavioral regression: host process IDs are visible via `/proc` and `ps -ef` because systemd has no `PrivatePID=` directive. Sending signals to those processes is still blocked by the kernel's UID-mismatch check (`l4d2-sandbox` cannot signal `root`-owned processes). Information disclosure is the only leak; the kernel's signal restriction still holds.
+
+## Locked Decisions
+
+1. **Replace the helper body wholesale.** No `bwrap` invocation. `systemd-run` in service mode does both isolation and resource limits.
+2. **Helper path, sudoers rule, ScriptBuilder API, and `l4d2-sandbox` UID are unchanged.** The Python side (`run_sandboxed_script`, route handlers, tests) does not change.
+3. **`bubblewrap` apt dependency dropped from `deploy-test-server.sh`.**
+4. **`left4me.db` file mode tightened to 0640 root:left4me at deploy time.** This is a host-hygiene fix that is independent of the sandbox change but was surfaced by smoke testing — without it, *any* host user (and, transitively, the sandbox) could read the application database.
+5. **`TemporaryFileSystem=/var/lib` is required.** `ProtectSystem=strict` makes `/var/lib/left4me` read-only but visible; the only way to reliably hide its contents from the unit is to mask the parent with a tmpfs. The `BindPaths=…/overlays/{id}:/overlay` mount is unaffected because `/overlay` is at a different path.
+6. **`PrivatePID=` is not configured.** systemd has no such directive. `ps -ef` from inside the sandbox shows host processes. The kernel's UID-based signal restriction blocks any actual interaction with them. Acceptable for the current trust model.
+7. **Walltime kill remains `RuntimeMaxSec=3600`.** Same as v1.
+8. **Network namespace remains shared with the host.** No `PrivateNetwork=`. Scripts must reach Steam / l4d2center / GitHub / etc.
+9. **`SystemCallFilter=@system-service @network-io`** is the seccomp baseline. systemd's curated `@system-service` group is "everything a normal service does"; adding `@network-io` is explicit even though it overlaps. Build failures revealing missing syscall classes are surfaced via `journalctl` and addressed by widening the filter (`@process`, etc.) on demand.
+10. **Single helper file replaces v1.** Not adding a `-v2` variant. The v1 implementation is removed in the same change.
+
+## Architecture
+
+```text
+sudo helper
+  └─ systemd-run --pipe --wait (service mode, the default)
+       (transient .service unit, full hardening directives)
+       └─ /bin/bash /script.sh
+```
+
+systemd-run in service mode:
+- Opens a transient service unit on the system bus.
+- Applies all `-p` properties as the unit's exec context.
+- Forks; the child sets up the unit's namespaces (mount, IPC, user), drops privileges to `User=l4d2-sandbox`, applies the seccomp filter, and `execve()`s `/bin/bash /script.sh`.
+- `--pipe` connects the unit's stdin/stdout/stderr to the calling helper's stdio (so the existing `run_command` harness in `ScriptBuilder` continues to capture line-by-line).
+- `--wait` blocks until the unit terminates and propagates the exit code.
+- `--collect` removes the unit on exit even if it failed.
+- The cgroup carries the resource limits; the systemd timer enforces `RuntimeMaxSec=3600`.
+
+### Helper
+
+`deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`, mode 0755, owned root:
+
+```bash
+#!/bin/bash
+set -euo pipefail
+[[ $# -eq 2 ]] || { echo "usage: $0