left4me/docs/superpowers/plans/2026-05-08-l4d2-script-sandbox-v2-systemd.md
mwiegand efaaf84cd9
docs(specs): script sandbox v2 — systemd-only design + plan
Spec captures the v2 architecture (systemd-run service mode with full
hardening directives, no bwrap), the two surfaces in scope (helper
rewrite + bubblewrap dep removal + left4me.db mode tightening), and the
gotchas surfaced by smoke-testing the prototype on ckn@10.0.4.128:
- ProtectSystem=strict makes /var/lib/left4me visible (not invisible);
  must add TemporaryFileSystem=/var/lib to mask it.
- Script bind via BindReadOnlyPaths uses ${SCRIPT}:/script.sh syntax.
- No PrivatePID= directive in systemd; host PIDs visible via /proc.
  Information disclosure only — kernel UID-mismatch blocks signals.

Plan breaks the migration into 4 tasks (helper rewrite, deploy-script
deps + DB mode, host smoke-test, drift sweep) with explicit rollback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:46:13 +02:00

146 lines
7.9 KiB
Markdown

# L4D2 Script Sandbox v2 Implementation Plan
> **Approval status:** User-approved 2026-05-08 after smoke-testing the v2 prototype on `ckn@10.0.4.128`.
**Goal:** Replace the bwrap-based sandbox helper with a systemd-only one per `docs/superpowers/specs/2026-05-08-l4d2-script-sandbox-v2-systemd.md`. Drop the `bubblewrap` apt dep. Tighten `left4me.db` file mode to 0640 root:left4me. Update the deploy-artifact tests to assert the new helper shape.
**Architecture:** See spec. Helper invokes `systemd-run --pipe --wait` in service-unit mode with full hardening directives. No bwrap. Web-app side (`ScriptBuilder`, `run_sandboxed_script`, routes) is unchanged.
---
## Locked Decisions
See spec §Locked Decisions for rationale. Implementation summary:
- Helper file at the same path (`deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`) is rewritten in place.
- The sudoers rule is unchanged.
- `bubblewrap` dropped from `apt-get install` / `dnf install` lines.
- `left4me.db` chmod 0640 added to deploy script as a post-init step.
- Sandbox UID, system user, overlay-dir chown logic, and ScriptBuilder API stay the same.
---
## Current Gap
- `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` invokes `systemd-run --scope ... -- bwrap [namespace flags] /bin/bash /script.sh`.
- `deploy/deploy-test-server.sh` line ~84 installs `bubblewrap` via apt/dnf.
- `deploy/tests/test_deploy_artifacts.py::test_script_sandbox_helper_invokes_systemd_run_and_bwrap` asserts `bwrap`, `--unshare-pid`, `--uid=l4d2-sandbox`, etc.
- `deploy/tests/test_deploy_artifacts.py::test_deploy_script_installs_bubblewrap` asserts `bubblewrap` is in apt/dnf install lines.
- `left4me.db` is created at deploy time with the default 0644 permissions; any host user can read it.
---
## Task 1: Rewrite the sandbox helper to be systemd-only
**Files:**
- Modify: `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` — replace the `systemd-run --scope … bwrap …` invocation with `systemd-run --service --pipe --wait …` carrying the hardening directives.
Test plan:
1. `bash -n` syntax check (already covered by `test_script_sandbox_helper_passes_shell_syntax_check`).
2. `test_deploy_artifacts.py::test_script_sandbox_helper_invokes_systemd_run_and_bwrap` is replaced by a new pin: `test_script_sandbox_helper_invokes_systemd_run_with_hardening`. Asserts:
- No `bwrap` reference remains.
- `systemd-run` is invoked with `--pipe`, `--wait`, `--collect`, `--unit=` (transient service unit form, no `--scope`).
- All hardening directives present: `NoNewPrivileges=yes`, `ProtectSystem=strict`, `ProtectHome=yes`, `PrivateTmp=yes`, `PrivateDevices=yes`, `PrivateIPC=yes`, `ProtectKernelTunables=yes`, `ProtectKernelModules=yes`, `ProtectKernelLogs=yes`, `ProtectControlGroups=yes`, `RestrictNamespaces=yes`, `RestrictSUIDSGID=yes`, `LockPersonality=yes`, `MemoryDenyWriteExecute=yes`, `SystemCallFilter=`, `CapabilityBoundingSet=` (empty), `User=l4d2-sandbox`, `Group=l4d2-sandbox`.
- `TemporaryFileSystem=` covers `/etc` and `/var/lib`.
- `BindReadOnlyPaths=` includes `/etc/resolv.conf /etc/ssl /etc/ca-certificates /etc/nsswitch.conf /etc/alternatives` and the script bind `${SCRIPT}:/script.sh`.
- `BindPaths=` carries the overlay bind.
- Cgroup limits unchanged (`MemoryMax=4G`, `MemorySwapMax=0`, `TasksMax=512`, `CPUQuota=200%`, `RuntimeMaxSec=3600`).
3. Existing `test_script_sandbox_helper_dry_run_mode` keeps passing — the dry-run guard still short-circuits before `systemd-run`.
4. Existing `test_script_sandbox_helper_validates_overlay_id` keeps passing — argument validation is unchanged.
Implementation: helper body verbatim from the spec §Helper.
**Verification:**
```
python3 -m pytest deploy/tests/test_deploy_artifacts.py -q
bash -n deploy/files/usr/local/libexec/left4me/left4me-script-sandbox
```
**Commit:** `refactor(deploy): rewrite left4me-script-sandbox to systemd-only — drop bwrap`
---
## Task 2: Drop bubblewrap apt/dnf dep + tighten left4me.db mode
**Files:**
- Modify: `deploy/deploy-test-server.sh` — remove `bubblewrap` from `apt-get install` / `dnf install` package lists; add a post-init step that ensures `left4me.db` is mode 0640 owned `root:left4me`.
- Modify: `deploy/tests/test_deploy_artifacts.py` — replace `test_deploy_script_installs_bubblewrap` with `test_deploy_script_does_not_install_bubblewrap`; add `test_deploy_script_tightens_left4me_db_permissions`.
Test plan:
1. `test_deploy_script_does_not_install_bubblewrap` — for each `apt-get install` / `dnf install` line, `bubblewrap` is absent.
2. `test_deploy_script_tightens_left4me_db_permissions` — script contains `chmod 0640 /var/lib/left4me/left4me.db` and `chown root:left4me /var/lib/left4me/left4me.db` (in either order).
3. `test_deploy_script_shell_syntax` keeps passing (`sh -n`).
Implementation:
- Remove the bare `bubblewrap` token from the two install lines.
- After the `alembic upgrade head` step (which creates the DB if missing), add:
```
$sudo_cmd chown root:left4me /var/lib/left4me/left4me.db
$sudo_cmd chmod 0640 /var/lib/left4me/left4me.db
```
Idempotent — re-runs are no-ops.
**Verification:**
```
python3 -m pytest deploy/tests/test_deploy_artifacts.py -q
sh -n deploy/deploy-test-server.sh
```
**Commit:** `chore(deploy): drop bubblewrap apt dep + tighten left4me.db mode to 0640 root:left4me`
---
## Task 3: Deploy + smoke-test on the test host
**Files:** none.
This is an operational verification step, not a code change. Run `deploy/deploy-test-server.sh ckn@10.0.4.128`, then on the host re-run the same smoke battery used to validate the prototype:
1. **Identity / privileges**: `id` returns `uid=996 gid=985`; `/proc/self/status` shows `NoNewPrivs: 1` and `CapBnd: 0000000000000000`.
2. **Filesystem isolation**: `/etc/passwd` absent, `/etc/alternatives/awk` present, `/var/lib/left4me/left4me.db` absent, `/home` inaccessible, `/usr` not writable, `/overlay` writable.
3. **Tools + network**: `awk` resolves through `/etc/alternatives`; `curl https://steamcommunity.com/` returns 200.
4. **Cgroup limits**: while a 5s-sleep script runs, `cat /sys/fs/cgroup/.../memory.max` returns `4294967296`; `pids.max` `512`; `cpu.max` `200000 100000`.
5. **Memory cap**: 5 GB Python alloc raises `MemoryError`.
6. **Wipe**: `find /overlay -mindepth 1 -delete` empties the overlay dir.
7. **Seccomp / restriction probes**: `unshare -U`, `mount -t tmpfs`, `setarch -X`, `bpf` setsockopt all fail with EPERM/EINVAL.
8. **Build via web UI**: log in as admin, create a script overlay with `echo "hi" > foo`, click Save, confirm job succeeds and `foo` appears in `/var/lib/left4me/overlays/{id}/foo`.
9. **DB hardening**: `stat -c "%a %U:%G" /var/lib/left4me/left4me.db` returns `640 root:left4me`.
Mark this task complete only after every check passes on the live host.
**Commit:** none (operational verification — record results in conversation/PR description).
---
## Task 4: Drift sweep + push
**Files:** as needed across the repo.
Run the full test suite for all three packages; chase any drift caused by the helper rewrite or deploy-script changes.
```
python3 -m pytest l4d2web/tests/ -q
python3 -m pytest l4d2host/tests/ -q
python3 -m pytest deploy/tests/ -q
```
Implementation: fix what breaks. Expected: nothing new should break, since the Python-side contract is unchanged. If something does, treat it as a sign of an unintended coupling and address.
Push the commits to `origin/master`.
**Verification:** all three suites green; `git status` clean; commits visible on `git.sublimity.de/cronekorkn/left4me`.
**Commit:** none unless drift fixes are needed.
---
## Rollback plan
If Task 3 surfaces a blocker (a hardening directive breaks a real-world script class, seccomp filter is too narrow, BindPaths semantics differ on the host's systemd version), roll back via `git revert` of Tasks 1+2 and redeploy. Git history preserves both the v1 and v2 helper. The Python side never changed, so reverting only the deploy artifacts is sufficient — no DB migration to undo, no template change to roll back.