left4me/docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md
mwiegand 8971b23617
refactor(sandbox): collapse l4d2-sandbox user into left4me
The hardening refactor that just landed closes the same-uid attack
surface (FS view, ptrace, /proc visibility, signals) for the web +
gameserver units via systemd directives plus system-wide
kernel.yama.ptrace_scope=2. Keeping the script-sandbox on a separate
uid was the inconsistent half-step — defense-in-depth only, with
build-time-idmap complexity attached. One principle wins: harden
once, share the uid.

scripts/libexec/left4me-script-sandbox: drop the idmap block (uid
lookups, STAGING setup, cleanup_staging trap, mount --bind
--map-users), switch User=/Group= to left4me, point BindPaths at
\$OVERLAY_DIR directly. Header comment updated to reflect
hardening-not-uid as the same-uid defense. nsenter self-wrap kept —
it's about mount-namespace escape, not uid.

Tests + comments + companion docs updated. Build-time-idmap and
overlay-idmap plans marked SUPERSEDED; user-uid-split spec revised
to "1 user is correct"; one-line update notes on the hardening
specs and the build-overlay-unit-design.

Companion ckn-bw commit removes the l4d2-sandbox user + group and
tightens /var/lib/left4me from 0711 → 0755 (the traverse-only mode
was specifically for the sandbox uid).
2026-05-15 15:50:57 +02:00

579 lines
23 KiB
Markdown

# Build-overlay template unit — refactor the script-sandbox helper
**Status: open question, not settled design.** This is a handoff
document prompted by the build-time idmap landing on 2026-05-15. The
current `left4me-script-sandbox` shell helper works but has accumulated
several layers of complexity (idmap bind setup, trap cleanup, nsenter
self-wrap) that a systemd template unit would handle declaratively.
The same pattern is already established in the codebase for
gameservers (`left4me-server@.service`). A future session should
evaluate whether to refactor and, if so, follow the steps below.
> **Updated 2026-05-15:** `l4d2-sandbox` was collapsed into `left4me`
> — see `docs/superpowers/plans/2026-05-15-uid-collapse.md`. The
> idmap bind setup + trap cleanup are gone, so the remaining
> complexity in the helper is just the nsenter self-wrap. References
> below to `User=l4d2-sandbox` should read as `User=left4me`; the
> template refactor will inherit that cleanly.
## Why this came up
While verifying the build-time idmap refactor, the first 5 build jobs
failed with `mkdir: Permission denied` on `/overlay/...`. Root cause:
- `left4me-web.service` runs with `PrivateTmp=true`, which puts the
web app (and anything it sudoes into) in a private mount namespace.
- The script-sandbox helper, invoked via `sudo` from the web app,
inherits that namespace.
- The helper's `mount --bind --map-users=...` pre-creates the idmap
staging path *in the web app's namespace*.
- `systemd-run` (called by the helper) spawns a transient unit in
PID 1's mount namespace.
- The transient unit's `BindPaths=...:/overlay` resolves the staging
path in PID 1's namespace — where the bind doesn't exist. It sees
an empty root-owned dir at the staging path (mkdir'd by the helper
before the bind) and binds *that* to `/overlay`.
- Sandbox uid hits EACCES on every write.
We fixed it (commit `f1aa05d`) by self-wrapping the helper into
PID 1's mount namespace at the top of the script:
```bash
if [[ "${L4D2_SANDBOX_IN_PID1_MNT_NS:-}" != "1" ]]; then
exec env L4D2_SANDBOX_IN_PID1_MNT_NS=1 \
/usr/bin/nsenter --mount=/proc/1/ns/mnt -- "$0" "$@"
fi
```
That works. But it's a band-aid for an architectural friction:
**helper invocation via `sudo` from a hardened service forces us to
manually escape the caller's namespace before any mount syscall**.
If the helper were *itself* a systemd unit started by PID 1, the
namespace would be correct by default.
The gameserver helper handles this at the unit level. Its
ExecStartPre is:
```
ExecStartPre=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay mount %i
```
i.e. wrapped in nsenter *at the unit*. The unit is started by PID 1,
so it has PID 1's namespace, then nsenter is a belt-and-braces.
Mirror that pattern for builds: introduce `build-overlay@.service` as
a template unit, have the worker activate it instead of forking a
helper.
## Current state (the thing being replaced)
Files:
- `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`
the bash helper. ~100 lines. Self-wraps in nsenter, does pre-bind
with `--map-users`, invokes `systemd-run --quiet --collect --wait
--pipe -p ... -- /bin/bash /script.sh`, cleans up via trap.
- `l4d2web/services/overlay_builders.py:run_sandboxed_script` — the
worker entry point. Writes script content to
`/var/lib/left4me/sandbox-scripts/<uniqued>.sh`, invokes
`sudo -n /usr/local/libexec/left4me/left4me-script-sandbox <id>
<path>`, streams stdout/stderr via `subprocess.Popen` + the existing
`run_command` plumbing.
- `deploy/files/etc/sudoers.d/left4me` — grants `left4me` NOPASSWD to
the helper path.
What the helper actually does:
1. nsenter into PID 1's mount ns (the band-aid)
2. validate args + overlay dir exists
3. compute `STAGING=/var/lib/left4me/tmp/sandbox-idmap-${OVERLAY_ID}`
4. `trap` cleanup; pre-emptive `umount` of stale staging; `mkdir -p`
the staging
5. `mount --bind --map-users=$(id -u left4me):$(id -u l4d2-sandbox):1
--map-groups=... $OVERLAY_DIR $STAGING`
6. `systemd-run` with the full hardening profile, `BindPaths=$STAGING:/overlay`
7. Wait for completion, propagate exit code
8. trap fires: `umount $STAGING; rmdir $STAGING`
## Proposed design
Replace the bash helper with two systemd units (template + a slice)
emitted from ckn-bw's existing `systemd_units` reactor, plus a small
worker rewrite.
### `build-overlay@.service` (template unit)
```ini
[Unit]
Description=Sandboxed overlay build for instance %i
DefaultDependencies=no
After=local-fs.target
RequiresMountsFor=/var/lib/left4me/overlays/%i
ConditionPathIsDirectory=/var/lib/left4me/overlays/%i
ConditionPathExists=/var/lib/left4me/sandbox-scripts/%i.sh
[Service]
Type=oneshot
User=l4d2-sandbox
Group=l4d2-sandbox
Slice=l4d2-build.slice
# Idmap bind: disk uid 980 (left4me) ↔ mount uid 981 (sandbox), so writes
# from the sandbox land on disk as left4me. + prefix runs as root before
# the User= drop (mount syscall requires CAP_SYS_ADMIN).
ExecStartPre=+/usr/bin/mkdir -p /run/left4me/idmap/%i
ExecStartPre=+/usr/bin/mount --bind \
--map-users=980:981:1 --map-groups=980:981:1 \
/var/lib/left4me/overlays/%i /run/left4me/idmap/%i
ExecStart=/bin/bash /script.sh
ExecStopPost=+-/usr/bin/umount /run/left4me/idmap/%i
ExecStopPost=+-/usr/bin/rmdir /run/left4me/idmap/%i
# Hardening — all the -p flags from the current bash helper, declared
# declaratively here instead of as systemd-run -p arguments.
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes
PrivateDevices=yes
PrivateIPC=yes
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectKernelLogs=yes
ProtectControlGroups=yes
RestrictNamespaces=yes
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
RestrictSUIDSGID=yes
LockPersonality=yes
MemoryDenyWriteExecute=yes
SystemCallFilter=@system-service @network-io
SystemCallArchitectures=native
CapabilityBoundingSet=
AmbientCapabilities=
IPAddressDeny=127.0.0.0/8 ::1/128 169.254.0.0/16 fe80::/10 224.0.0.0/4 ff00::/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 100.64.0.0/10 fc00::/7
TemporaryFileSystem=/etc /var/lib
BindReadOnlyPaths=/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf /etc/ssl /etc/ca-certificates /etc/nsswitch.conf /etc/alternatives /var/lib/left4me/sandbox-scripts/%i.sh:/script.sh
BindPaths=/run/left4me/idmap/%i:/overlay
WorkingDirectory=/overlay
Environment=HOME=/tmp PATH=/usr/bin:/usr/sbin OVERLAY=/overlay
UMask=0022
OOMScoreAdjust=500
MemoryMax=4G
MemorySwapMax=0
TasksMax=512
CPUQuota=200%
RuntimeMaxSec=3600
TimeoutStartSec=1h
TimeoutStopSec=30s
```
Notes:
- `Type=oneshot` makes `systemctl start` block until ExecStart exits.
- `ConditionPath*` provides early failure if the overlay dir or script
doesn't exist (avoids running the unit at all in those cases).
- `RequiresMountsFor=/var/lib/left4me/overlays/%i` ensures the parent
fs is mounted before this unit runs (`/` and `/var/lib` if it's a
separate mount point).
- `ExecStopPost` uses `+-` (root, ignore failures) — the bind might
already be torn down if the unit is restarting.
- `BindReadOnlyPaths=...:/script.sh` makes the per-overlay script
available at `/script.sh` inside the sandbox, picked from the
predictable path `/var/lib/left4me/sandbox-scripts/%i.sh`.
### Script source: filesystem vs. DB
**Critical design decision the future session must make.** The current
plan in the unit sketch above assumes the worker writes the script
content to `/var/lib/left4me/sandbox-scripts/<id>.sh` before calling
`systemctl start`. But the script *already lives in the DB* (the
`overlays.script` column), and the unit instance name `%i` is the
overlay row id. The filesystem copy is redundant unless we want it.
Three options:
**Option A — worker writes the script (the unit sketch above).**
Worker queries DB, writes `<id>.sh` to a known path, then
`systemctl start`. Unit reads via `BindReadOnlyPaths`. Simple, no DB
access from the unit, the existing `_sandbox_script_dir()` plumbing
mostly works. Cost: redundant on-disk copy; stale files between
builds if you don't clean them.
**Option B — unit fetches the script from the DB itself.** A small
root-side helper installed as
`/usr/local/libexec/left4me/left4me-fetch-script` does:
```python
#!/usr/bin/python3
import sqlite3, sys
overlay_id = int(sys.argv[1])
conn = sqlite3.connect("/var/lib/left4me/left4me.db")
row = conn.execute(
"SELECT script FROM overlays WHERE id = ?", (overlay_id,)
).fetchone()
sys.stdout.write((row[0] if row else "") or "")
```
Unit's ExecStartPre runs it as root (the `+` prefix), pipes the
output to a runtime path that ExecStart reads:
```ini
RuntimeDirectory=left4me/sandbox-scripts
RuntimeDirectoryMode=0700
ExecStartPre=+/bin/sh -c '/usr/local/libexec/left4me/left4me-fetch-script %i \
> /run/left4me/sandbox-scripts/%i.sh && chmod 0644 /run/left4me/sandbox-scripts/%i.sh'
BindReadOnlyPaths=/run/left4me/sandbox-scripts/%i.sh:/script.sh
```
(`RuntimeDirectory=` auto-creates `/run/left4me/sandbox-scripts/` on
start and removes it on stop, including the file inside.)
The fetch script doesn't need sudoers — it runs from ExecStartPre with
root privileges already. It only reads the DB; no writes. The DB is
`root:left4me 0640` so root can read it.
Worker becomes a one-liner: `sudo systemctl start build-overlay@<id>`.
No FS prep, no tmpfile cleanup.
**Option C — pipe the script content directly into bash stdin.** The
unit's ExecStart is something like
`/bin/sh -c "fetch-script %i | /bin/bash"`. Pros: no on-disk file at
all. Cons: `/bin/bash` runs without a file path, so `$0` is `bash` and
error messages look weird; harder to debug a failing script when there's
no file to inspect.
**Recommendation**: Option B. Decouples script storage (DB) from
sandbox transport (a /run/ runtime file). RuntimeDirectory= handles
cleanup. Worker becomes trivially small. The fetch-script helper is
~10 lines and stays in deploy/files/usr/local/libexec/left4me/.
If Option A is chosen instead, plan to track the script tmpfiles
explicitly so they don't accumulate. With Option B, RuntimeDirectory
auto-cleans on stop.
### Worker invocation
Replace `run_sandboxed_script` in
`l4d2web/services/overlay_builders.py`. The code below is the **Option
A** shape (worker writes the script). For **Option B** (recommended),
drop the `script_dir`/`script_path`/`write_text`/`chmod` lines — the
unit's ExecStartPre fetches from the DB. The signature can also drop
`script_text` since the worker doesn't need to pass content anymore.
```python
def run_sandboxed_script(
overlay_id: int,
script_text: str, # remove this param if Option B
*,
on_stdout: LogSink,
on_stderr: LogSink,
should_cancel: CancelCheck,
) -> None:
# The four lines below are Option A only — delete for Option B.
script_dir = _sandbox_script_dir()
script_dir.mkdir(parents=True, exist_ok=True)
script_path = script_dir / f"{overlay_id}.sh"
script_path.write_text(script_text or "")
os.chmod(script_path, 0o644)
unit = f"build-overlay@{overlay_id}.service"
# Tail the unit's journal as a sidecar so output streams into job-logs
# while the unit runs. --follow exits when the unit reaches "inactive".
journal = subprocess.Popen(
["journalctl", "--unit", unit, "--output=cat", "--follow",
"--since=now", "--no-pager"],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
text=True,
)
try:
# Start the unit (sudoers permits this exact verb pattern).
# Type=oneshot makes this block until ExecStart returns.
rc = subprocess.run(
["sudo", "-n", "/bin/systemctl", "start", unit],
check=False,
).returncode
finally:
# Drain remaining journal lines (journalctl --follow may not have
# printed everything yet by the time systemctl returns).
journal.terminate()
try:
for line in journal.stdout or []:
on_stdout(line.rstrip("\n"))
finally:
journal.wait(timeout=5)
# Read exit code from the unit. ExecMainStatus is the script's rc;
# Result is "success" / "failed" / "timeout" etc.
show = subprocess.check_output(
["systemctl", "show", unit,
"-p", "ExecMainStatus", "-p", "Result", "--value"],
text=True,
).split()
exec_main_status = int(show[0])
result = show[1]
if rc != 0 or result != "success":
raise BuildError(
f"build-overlay@{overlay_id} failed: "
f"systemctl rc={rc} unit result={result} script exit={exec_main_status}"
)
```
That's ~30 lines vs. ~50 today, and the helper script disappears
entirely.
**Two refinements to consider:**
1. **Cancel semantics**: today the worker's `should_cancel` callback
triggers a SIGTERM via the existing `run_command` plumbing. With
systemctl-start, you'd issue `systemctl stop build-overlay@<id>`
in a parallel thread when `should_cancel()` returns True. Wire
that up.
2. **Journal streaming race**: `journalctl --follow --since=now`
started *after* `systemctl start` may miss the first few lines.
Two fixes:
- Start the journal tail before systemctl-start (the unit doesn't
exist yet, so journalctl waits silently — verify this behaviour
on Trixie).
- Or use `journalctl --cursor` machinery: snapshot the cursor
before start, then read with `--cursor=` after.
Start-before is simpler and likely sufficient for L4D2 build
verbosity, where the first second of output isn't critical.
### Sudoers
Replace:
```
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox
```
with:
```
left4me ALL=(root) NOPASSWD: /bin/systemctl start build-overlay@*.service
left4me ALL=(root) NOPASSWD: /bin/systemctl stop build-overlay@*.service
```
(Tighter — verb-prefixed and instance-globbed. No script path passed.)
### Slice
`l4d2-build.slice` already exists (per the gameserver/sandbox today's
configuration). Reuse it — no change needed.
### Sandbox script tmpfile cleanup
Currently `run_sandboxed_script` writes a per-invocation
`tempfile.NamedTemporaryFile` with a random suffix and unlinks it in a
`finally`. With template-unit lookup, the script path is **predictable
per overlay id** (`/var/lib/left4me/sandbox-scripts/<id>.sh`).
Implications:
- Two concurrent builds for the *same* overlay id would clobber the
script file. The job queue already serializes per-overlay (per
`l4d2web/services/job_worker.py:OVERLAY_OPERATIONS`), so this
is OK.
- Scripts persist between builds (no auto-cleanup). Either accept
that (the next build overwrites) or delete after the unit goes
inactive. Recommend: leave them — small, useful for debugging.
## Migration
In order:
1. **Add the unit emission to ckn-bw's `bundles/left4me/metadata.py`
systemd_units reactor.** Mirror the pattern used for
`left4me-server@.service`. Drop in the template-unit content as
another reactor entry.
2. **Update sudoers** (`bundles/left4me/files/etc/sudoers.d/left4me`)
to permit `systemctl start/stop build-overlay@*.service` and
remove the script-sandbox grant.
3. **Replace `run_sandboxed_script` in left4me.** Add the new
journalctl-based output streaming, exit-code reading, and cancel
handling. Keep the function signature stable so callers
(`ScriptBuilder.build`, the wipe route) are unchanged.
4. **Delete `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`.**
5. **Update tests:**
- `deploy/tests/test_deploy_artifacts.py`:
- Drop `test_script_sandbox_uses_idmap_staging` and any other
tests that read SCRIPT_SANDBOX_HELPER.
- Add tests that assert the new unit emission in ckn-bw's
reactor output. (But that's in the other repo — left4me's
deploy tests can't directly cover it.)
- Add a test that asserts the worker invokes
`sudo systemctl start build-overlay@*` (grep
`overlay_builders.py`).
- `l4d2web/tests/test_overlay_builders.py` (if it exists):
update mocks for `run_sandboxed_script` to expect the new
subprocess shape.
6. **Test on `left4.me`:**
- Push left4me, `bw apply ovh.left4me`. Apply also picks up the
new unit emission and the sudoers change.
- Trigger a script-overlay rebuild via the web UI or the
enqueue API path used in this session (see test history in
git log around 2026-05-15).
- Inspect: `journalctl -u build-overlay@9.service`,
`systemctl status build-overlay@9.service`.
- Verify on-disk state: overlay files end up `left4me`-owned;
idmap bind cleanly torn down (`findmnt | grep idmap` empty).
## Open decisions for the future session
0. **Script source: filesystem (Option A) vs. DB-fetched in ExecStartPre
(Option B) vs. piped to stdin (Option C).** See the "Script source"
section above. This is the highest-impact decision because it
shapes the worker, the unit's ExecStartPre, and whether you need
a fetch-script helper binary at all. Recommendation: Option B.
1. **`/run/left4me/idmap/%i` vs. `/var/lib/left4me/tmp/sandbox-idmap-%i`** —
`/run` is tmpfs and wiped on reboot, more correct for transient
mount paths. But it requires the dir to exist (created by
ExecStartPre). Either works.
2. **What to do with the existing `left4me-apply-cake` dead code**
irrelevant to this refactor; flagged in the other handoff doc.
3. **Whether to drop the post-build `chmod o+r` in the sandbox helper**
already gone in the build-time-idmap commit. (Verify in the new
unit nothing equivalent is needed; files are left4me-owned, web
reads via primary uid.)
4. **`Type=oneshot` vs. `Type=exec`** — oneshot blocks `systemctl
start`. exec doesn't. With oneshot we don't need the
`journalctl --follow` workaround if we read journal *after*
completion. But for live progress (which the existing builds
stream), `--follow` is still needed. Stick with oneshot.
5. **Should the unit set `KillMode=mixed`** to ensure children die on
stop? Worth checking — the existing systemd-run line doesn't set
it explicitly; defaults usually suffice.
6. **`StateDirectory=` vs. explicit `mkdir -p`** — systemd has
StateDirectory and RuntimeDirectory directives that auto-create
per-unit directories. Could replace the `mkdir -p /run/left4me/idmap/%i`
ExecStartPre with `RuntimeDirectory=left4me/idmap/%i`. Cleaner;
gets auto-cleanup on stop too. Recommend doing this — both the
mkdir and the rmdir ExecStopPost would go away.
## Verification
End-to-end smoke test on `left4.me` after the deploy:
```bash
# unit is installed and template-parseable
systemctl status build-overlay@.service # should show "loaded; static"
sudo systemd-analyze verify build-overlay@1.service
# enqueue a build via the web app's worker path (mimic the
# enqueue_build_overlay pattern from this session's job 64 onwards)
# then watch:
sudo journalctl -u build-overlay@9.service -f
# on completion:
systemctl show build-overlay@9.service -p Result -p ExecMainStatus
# expect: Result=success, ExecMainStatus=0
# disk state
sudo find /var/lib/left4me/overlays/9 -uid 981 # should be empty
sudo find /run/left4me/idmap # should not exist or be empty
# pid 1 mount table — no orphan idmap binds
sudo findmnt --task 1 -o TARGET | grep idmap # empty
```
## Risks
- **Worker cancel-during-build**: today's `should_cancel` callback
signals via `run_command`'s child process. With the unit, the
worker needs a separate path: spawn a thread that polls
`should_cancel()` and calls `sudo systemctl stop build-overlay@<id>`
when triggered. Without this, builds that exceed `RuntimeMaxSec` or
hit user-cancel won't terminate promptly.
- **Journal lag at unit start**: `journalctl --follow` started before
`systemctl start` should pick up all output. If not, may need
cursor-based streaming. Test with a script that prints immediately
(`echo hello; exit 0`) — if "hello" appears in the job log, race
is handled.
- **Sudoers globbing**: `systemctl start build-overlay@*.service`
permits any instance id including weird strings like `../etc-passwd`.
Use a tighter glob if possible (e.g.,
`build-overlay@[0-9]*.service`). Test that sudoers rejects
unexpected instance names.
- **Type=oneshot return semantics**: confirm that `systemctl start
build-overlay@<id>` on a Type=oneshot unit returns rc=3 (or
similar) when the unit's ExecStart fails, so the worker can detect
failure without re-querying `systemctl show`.
- **Idle running over reboot**: a build that's running across a reboot
is killed when the system goes down. That's identical to today's
behavior with systemd-run. Acceptable.
- **The journalctl sidecar process accumulates as a zombie if not
reaped properly.** The proposed code does `journal.wait(timeout=5)`
— handle the timeout case (force-kill).
## Pointers
Reference files (with line numbers if applicable):
- **Current helper to be removed**:
`deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`
- **Current worker invoker**:
`l4d2web/services/overlay_builders.py:run_sandboxed_script` (~ln 324)
- **Current job-worker dispatch**:
`l4d2web/services/job_worker.py` (build_overlay operation)
- **Sudoers**:
`deploy/files/etc/sudoers.d/left4me` (matched verbatim in
`ckn-bw/bundles/left4me/files/etc/sudoers.d/left4me`)
- **Sample template unit pattern** (the model to copy):
`left4me-server@.service` emission in ckn-bw's
`bundles/left4me/metadata.py` systemd_units reactor.
- **Existing slice declaration** (already correct):
`l4d2-build.slice` in ckn-bw's reactor.
Recent commits that touched this surface:
- `4838108` — moved idmap to build time (the refactor that surfaced
the namespace bug)
- `f1aa05d` — added nsenter self-wrap (the band-aid this refactor
removes)
- `2f6a9cf`, `9053186`, `dd918ac` — earlier idmap-on-mount approach
that was reverted
Related design docs:
- `docs/superpowers/plans/2026-05-15-build-time-idmap.md` — the
plan whose architecture this refactor builds on
- `docs/superpowers/specs/2026-05-15-deploy-dir-rethink-design.md`
unrelated open questions about deploy/ layout
## What's NOT in scope
- Rewriting the sandbox in Python / packaging differently.
- Changing the security hardening profile (the unit duplicates the
current set verbatim — adjust later if needed).
- Splitting the gameserver uid from the web app uid (noted in earlier
handoff doc).
- Re-evaluating whether `l4d2-sandbox` should exist as a separate
uid (kept; defense in depth).
- Touching the `left4me-overlay` gameserver helper (it already uses
the pattern; only the sandbox helper is being refactored to match).
## Estimate
Rough breakdown for the future session:
- Unit file design + ckn-bw reactor change: 1-2 hours
- Worker rewrite (run_sandboxed_script): 1-2 hours
- Tests: 1 hour
- Deploy + verify on test server: 30 min
- Bug-fix and iteration buffer: 1 hour
~5 hours of focused work, assuming no surprises with journalctl
streaming or sudoers semantics.
## Decision criteria for whether to do this
Do it if:
- You're about to make any other change to the sandbox hardening,
build lifecycle, or sandbox uid story.
- You're frustrated by debugging the existing helper.
- You want to remove the nsenter band-aid for hygiene.
Skip if:
- The sandbox is stable and you're not planning related changes.
- You'd rather invest the time in higher-value work elsewhere.
The current solution is fine; this refactor is upgrade-not-fix.