Captures the architectural fix for the mount-propagation bug: replace fuse-overlayfs (rootless mount inside the web service's namespace, never visible to host or to gameserver units) with kernel-native overlayfs mounted via a privileged sudo helper that nsenters into PID 1's mount namespace. Companion plan numbers the migration as five tasks ending in end-to-end verification on the test box. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Kernel Overlayfs Helper Implementation Plan
Approval status: User-approved 2026-05-08. Implementation proceeds.
Goal: Implement the kernel-overlayfs migration per docs/superpowers/specs/2026-05-08-kernel-overlayfs-helper-design.md. Add a Python left4me-overlay privileged helper, a KernelOverlayFSMounter Python class, wire the existing OverlayMounter ABC through l4d2host/instances.py, drop fuse-overlayfs from the deploy stack, and migrate existing on-disk upper/work directories.
Architecture: The web app continues to call l4d2ctl start|stop|delete <name>; l4d2host continues to expose the same CLI verbs. Internally, start_instance/stop_instance/delete_instance move from a hardcoded subprocess call to fuse-overlayfs/fusermount3 to using KernelOverlayFSMounter, which invokes the new sudo helper that mounts in PID 1's namespace via nsenter.
Locked Decisions
See docs/superpowers/specs/2026-05-08-kernel-overlayfs-helper-design.md for the design rationale. Implementation-relevant summary:
- left4me-overlay Python helper in /usr/local/libexec/left4me/, owned root, mode 0755, system /usr/bin/python3, stdlib only.
- Verbs: mount <name>, umount <name>.
- Validation in helper: name regex; realpath + allowlist for each lowerdir; exact-prefix check for upper/work/merged; reject upperdir with user.fuseoverlayfs.* xattrs; lowerdir count ≤ 500.
- Sudoers verb-constrained: mount *, umount *.
- KernelOverlayFSMounter in l4d2host/fs/kernel_overlayfs.py: implements OverlayMounter. Derives name from the merged path's parent.
- start_instance adds os.path.ismount(merged) guard before mounting.
- Deploy migration: gated on sentinel file /var/lib/left4me/.kernel-overlay-migrated; stops gameservers + web, force-unmounts stale mounts, wipes upper/work, recreates empty.
- Web unit cleanup: drop MountFlags=shared, restore PrivateTmp=true, rewrite comment block. Keep NoNewPrivileges unset.
- Delete l4d2host/fs/fuse_overlayfs.py (currently unused; start_instance bypasses it).
- AGENTS.md contracts unchanged.
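The validation rules in the list above could look roughly like the following. This is a minimal sketch, not the helper itself: the regex, the function names, and the reading of "exact-prefix check" as "must equal <runtime root>/<name>/<leaf>" are all assumptions made for illustration; the spec holds the real definitions.

```python
import os
import re
from pathlib import Path

# Assumption: the real pattern is defined in the spec. This one only reflects
# that empty, uppercase, and path-like names must be rejected.
NAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]{0,63}")
MAX_LOWERDIRS = 500


def validate_name(name: str) -> str:
    """Reject anything that is not a plain lowercase instance name."""
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"invalid instance name: {name!r}")
    return name


def validate_lowerdirs(lowerdirs: list[str], allowed_roots: list[Path]) -> list[Path]:
    """Resolve each lowerdir and require it to sit under an allowed root."""
    if len(lowerdirs) > MAX_LOWERDIRS:
        raise ValueError(f"too many lowerdirs: {len(lowerdirs)} > {MAX_LOWERDIRS}")
    roots = [Path(os.path.realpath(root)) for root in allowed_roots]
    resolved = []
    for raw in lowerdirs:
        # realpath collapses ".." segments and follows symlinks before the check.
        real = Path(os.path.realpath(raw))
        if not any(real == root or root in real.parents for root in roots):
            raise ValueError(f"lowerdir outside allowlist: {raw!r}")
        resolved.append(real)
    return resolved


def validate_instance_path(path: Path, runtime_root: Path, name: str, leaf: str) -> Path:
    """Require path to resolve to exactly <runtime_root>/<name>/<leaf>."""
    expected = Path(os.path.realpath(runtime_root / name / leaf))
    if Path(os.path.realpath(path)) != expected:
        raise ValueError(f"{leaf} path does not match the instance layout: {path}")
    return expected
```

Resolving with realpath before comparing is what lets a single check cover both the /var/lib/left4me/../etc traversal and the symlink-escape case.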
Current Gap
- l4d2host/instances.py start_instance calls fuse-overlayfs directly (lines 85-101); stop_instance/delete_instance call fusermount3 -u directly. The OverlayMounter ABC at l4d2host/fs/base.py and the FuseOverlayFSMounter impl at l4d2host/fs/fuse_overlayfs.py exist but are unused.
- Mounts land in the web service's private mount namespace, invisible to host and to gameserver units. MountFlags=shared does not fix it.
- No privileged mount helper exists; only left4me-systemctl and left4me-journalctl.
- Deploy script installs fuse-overlayfs apt package and assumes it as a runtime tool.
- Existing runtime/<name>/upper directories may carry user.fuseoverlayfs.* xattrs that kernel overlayfs would silently ignore (resurrecting "deleted" files).
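To make the last point concrete, here is one way a fuse-era upperdir could be detected. This is a sketch under assumptions: the function name and the injectable listxattr parameter are hypothetical (the parameter exists only so the logic can be exercised without real xattrs), and it inspects each walked directory and each file in it, not symlinked directories.

```python
import os
from pathlib import Path
from typing import Callable, Iterable, Optional

FUSE_XATTR_PREFIX = "user.fuseoverlayfs."


def find_fuse_taint(
    upperdir: Path,
    listxattr: Optional[Callable[[str], Iterable[str]]] = None,
) -> Optional[str]:
    """Return the first path carrying a user.fuseoverlayfs.* xattr, else None."""
    if listxattr is None:
        # os.listxattr is Linux-only; resolved lazily so the module still imports elsewhere.
        listxattr = lambda path: os.listxattr(path, follow_symlinks=False)
    for dirpath, _dirnames, filenames in os.walk(upperdir):
        entries = [dirpath] + [os.path.join(dirpath, name) for name in filenames]
        for entry in entries:
            try:
                attrs = listxattr(entry)
            except OSError:
                # Filesystem without xattr support, or entry vanished: nothing to report.
                continue
            if any(attr.startswith(FUSE_XATTR_PREFIX) for attr in attrs):
                return entry
    return None
```

A helper using something like this would refuse the mount and name the offending path instead of letting kernel overlayfs ignore the marker.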
Task 1: Helper Script + Sudoers + Mounter Class (RED-first)
Files:
- Create: deploy/files/usr/local/libexec/left4me/left4me-overlay (Python, mode 0755 after deploy)
- Modify: deploy/files/etc/sudoers.d/left4me
- Create: l4d2host/fs/kernel_overlayfs.py
- Create: l4d2host/tests/test_kernel_overlayfs.py
- Create: l4d2host/tests/test_overlay_helper.py
- Modify: deploy/tests/test_deploy_artifacts.py (assert helper deployed + sudoers entry)
Test plan (RED first):
- test_kernel_overlayfs.py::test_mount_invokes_helper_with_name: mock run_command, call KernelOverlayFSMounter().mount(lowerdirs="/x:/y", upperdir=Path("/var/lib/left4me/runtime/alpha/upper"), workdir=Path("/var/lib/left4me/runtime/alpha/work"), merged=Path("/var/lib/left4me/runtime/alpha/merged")), assert argv ["sudo", "-n", "/usr/local/libexec/left4me/left4me-overlay", "mount", "alpha"].
- test_kernel_overlayfs.py::test_unmount_invokes_helper_with_umount_verb: mock + call + assert argv with umount.
- test_overlay_helper.py: drives the helper script as a subprocess with LEFT4ME_OVERLAY_PRINT_ONLY=1 env var (helper prints the would-be nsenter … command line and exits 0 instead of execve), and with isolated LEFT4ME_ROOT=tmp_path. Cases:
  - Valid mount: prints expected nsenter --mount=/proc/1/ns/mnt -- /bin/mount -t overlay … line.
  - Valid umount: prints expected umount line.
  - Bad name (../escape, uppercase, empty): exit non-zero, stderr matches.
  - Lowerdir traversal (/etc, /var/lib/left4me/../etc, symlink escape): exit non-zero.
  - Missing instance.env: exit non-zero.
  - Tainted upperdir (with user.fuseoverlayfs.opaque xattr): exit non-zero with clear message. (Optional: skip if setfattr is unavailable on dev machine; keep test on Linux only via pytest.mark.skipif.)
  - Lowerdir count > 500: exit non-zero.
- test_deploy_artifacts.py: assert /usr/local/libexec/left4me/left4me-overlay is present in deployed files; sudoers includes the new lines.
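The print-only switch that test_overlay_helper.py relies on might be structured as below. This is only a sketch: the dispatch function, its parameters, and the NSENTER_PREFIX constant are hypothetical names, and the command tail passed by the caller (the part the plan elides as "…") is left to the caller on purpose.

```python
import os
import shlex
import sys
from typing import Mapping, Sequence, TextIO

NSENTER = "/usr/bin/nsenter"
# Prefix taken from the plan: enter PID 1's mount namespace, then run the command.
NSENTER_PREFIX = [NSENTER, "--mount=/proc/1/ns/mnt", "--"]


def dispatch(
    command: Sequence[str],
    environ: Mapping[str, str] = os.environ,
    out: TextIO = sys.stdout,
) -> int:
    """Print the nsenter command line in print-only mode, otherwise exec it."""
    argv = [*NSENTER_PREFIX, *command]
    if environ.get("LEFT4ME_OVERLAY_PRINT_ONLY") == "1":
        # shlex.join quotes each argument, so the printed line is copy-pasteable.
        print(shlex.join(argv), file=out)
        return 0
    # Replaces the current process; raises OSError on failure and never returns.
    os.execv(NSENTER, argv)
```

Keeping the argv construction identical in both modes means the subprocess tests assert on the exact command that would have been executed.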
Implementation:
- Helper script structure: argparse for the verb, then path-validation funcs, then os.execv("/usr/bin/nsenter", [...]) (or printing it under LEFT4ME_OVERLAY_PRINT_ONLY).
- KernelOverlayFSMounter: name = merged.parent.name (with a one-line comment), then run_command(["sudo", "-n", "/usr/local/libexec/left4me/left4me-overlay", verb, name], on_stdout=…, on_stderr=…, passthrough=…, should_cancel=…).
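A rough shape for the mounter class described above, as a sketch. Several things here are assumptions: the real class would import l4d2host's run_command and subclass OverlayMounter, whereas this version takes the runner as a constructor argument so it stands alone; the unmount(merged) signature is a guess at the ABC; and the keyword arguments are forwarded untouched.

```python
from pathlib import Path
from typing import Any, Callable

HELPER = "/usr/local/libexec/left4me/left4me-overlay"


class KernelOverlayFSMounter:
    """Delegates overlay mounts to the privileged left4me-overlay helper."""

    def __init__(self, run_command: Callable[..., Any]):
        # Hypothetical injection point; the plan calls run_command directly.
        self._run_command = run_command

    def _invoke(self, verb: str, merged: Path, **kwargs: Any) -> Any:
        # The instance name is the directory that contains merged/.
        name = merged.parent.name
        return self._run_command(["sudo", "-n", HELPER, verb, name], **kwargs)

    def mount(self, lowerdirs: str, upperdir: Path, workdir: Path, merged: Path, **kwargs: Any) -> Any:
        # lowerdirs, upperdir and workdir are part of the interface but do not
        # appear in the argv: only the verb and the instance name are passed.
        return self._invoke("mount", merged, **kwargs)

    def unmount(self, merged: Path, **kwargs: Any) -> Any:
        return self._invoke("umount", merged, **kwargs)
```

Passing only the name keeps the sudoers rule narrow: the caller cannot hand the helper arbitrary paths.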
Verification:
python3 -m pytest l4d2host/tests/test_kernel_overlayfs.py l4d2host/tests/test_overlay_helper.py deploy/tests/test_deploy_artifacts.py -q
Expected before implementation: FAIL on missing class/script. After: all green.
Commit: feat(l4d2-host): KernelOverlayFSMounter + left4me-overlay helper
Task 2: Wire OverlayMounter Through Lifecycle + Drop Fuse Module
Files:
- Modify: l4d2host/instances.py (start/stop/delete)
- Modify: l4d2host/tests/test_lifecycle.py (update argv assertions, add double-mount guard test)
- Delete: l4d2host/fs/fuse_overlayfs.py
- Verify: l4d2host/fs/__init__.py does not re-export FuseOverlayFSMounter
Test plan (update RED, then GREEN):
- test_lifecycle.py::test_start_order: change assertion: calls[0] is now ["sudo", "-n", "/usr/local/libexec/left4me/left4me-overlay", "mount", "alpha"]. Adjust setup so the test still creates the merged directory.
- test_lifecycle.py::test_stop_succeeds_when_unmount_fails: cmd[0:5] == ["sudo", "-n", "/usr/local/libexec/left4me/left4me-overlay", "umount", "alpha"].
- test_lifecycle.py::test_delete_succeeds_when_unmount_fails: same.
- NEW test_lifecycle.py::test_start_refuses_double_mount: monkeypatch os.path.ismount to return True; expect start_instance to raise subprocess.CalledProcessError; assert NO mount command was issued.
- test_lifecycle.py::test_lifecycle_rejects_unsafe_instance_names: unchanged.
- test_lifecycle.py::test_delete_missing_is_noop: unchanged.
Implementation:
- instances.py imports KernelOverlayFSMounter. Module-level singleton instance (_mounter = KernelOverlayFSMounter()). Replace direct run_command([...fuse-overlayfs...]) with _mounter.mount(...). Replace direct run_command([...fusermount3...]) with _mounter.unmount(...) (still inside the existing try/except for stop/delete).
- Add the ismount guard at the top of start_instance after runtime_dir is computed, before emit_step("mounting runtime overlay..."). Raise subprocess.CalledProcessError(returncode=1, cmd=["mount-guard"], stderr="runtime overlay already mounted at <path>; refusing to double-mount").
- Delete l4d2host/fs/fuse_overlayfs.py.
- Confirm l4d2host/fs/__init__.py is empty (already verified to be 1 line).
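The double-mount guard from the list above, isolated as a small function. This is a sketch: ensure_not_mounted is a hypothetical name, and in the plan the check sits inline at the top of start_instance instead.

```python
import os
import subprocess
from pathlib import Path


def ensure_not_mounted(merged: Path) -> None:
    """Raise before any mount command is issued if merged is already a mount point."""
    if os.path.ismount(merged):
        raise subprocess.CalledProcessError(
            returncode=1,
            cmd=["mount-guard"],
            stderr=f"runtime overlay already mounted at {merged}; refusing to double-mount",
        )
```

Raising CalledProcessError, the same exception a failed mount command would produce, means existing callers need no new error path.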
Verification:
python3 -m pytest l4d2host/tests -q
python3 -m pytest l4d2web/tests -q
Both green. The web tests stay green because start_instance still emits the "Step: mounting runtime overlay..." log line.
Commit: refactor(l4d2-host): start/stop/delete go through OverlayMounter; drop FuseOverlayFSMounter
Task 3: Deploy Script Migration (Apt Deps + Wipe Upper/Work)
Files:
- Modify: deploy/deploy-test-server.sh
- Modify: deploy/tests/test_deploy_artifacts.py (assert deploy script contains migration lines; assert fuse-overlayfs no longer in apt-get install)
Test plan:
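- test_deploy_artifacts.py::test_deploy_script_drops_fuse_overlayfs_apt_dep: assert that no apt-get or dnf install line in deploy_script mentions fuse-overlayfs, and assert "kernel-overlay-migrated" in deploy_script. A whole-script assert "fuse-overlayfs" not in deploy_script would fail, because the migration block below names fuse-overlayfs in its comments and in its findmnt call.
- test_deploy_artifacts.py::test_deploy_script_migration_block_uses_sentinel: assert ".kernel-overlay-migrated" in deploy_script.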
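The migration block itself mentions fuse-overlayfs (in comments and in a findmnt filter), so a check for the dropped package has to look at install lines only. One possible shape, as a sketch: install_lines and fuse_dep_dropped are hypothetical helpers, and the code assumes each install command sits on a single line, as the "lines 82, 84" reference suggests.

```python
def install_lines(script: str) -> list[str]:
    """Return non-comment lines that invoke an apt-get or dnf install."""
    found = []
    for line in script.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        uses_pkg_manager = "apt-get" in stripped or "dnf" in stripped
        if uses_pkg_manager and "install" in stripped:
            found.append(stripped)
    return found


def fuse_dep_dropped(script: str) -> bool:
    """True when no package-install line still names fuse-overlayfs."""
    return all("fuse-overlayfs" not in line for line in install_lines(script))
```

In the test module this would be used as assert fuse_dep_dropped(deploy_script), alongside the sentinel assertion.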
Implementation:
In deploy/deploy-test-server.sh, drop fuse-overlayfs from the apt-get and dnf lines (lines 82, 84). Insert before the existing systemctl restart left4me-web.service (line 182):
# One-time migration: fuse-overlayfs upperdir → kernel overlayfs upperdir.
# fuse-overlayfs running as the left4me user uses user.fuseoverlayfs.* xattrs
# for whiteouts and opaque dirs; kernel overlayfs ignores those, so any
# pre-existing upper/ from the fuse era would resurrect "deleted" files.
sentinel=/var/lib/left4me/.kernel-overlay-migrated
if [ ! -e "$sentinel" ]; then
  $sudo_cmd systemctl stop 'left4me-server@*.service' 2>/dev/null || true
  $sudo_cmd systemctl stop left4me-web.service 2>/dev/null || true
  $sudo_cmd sh -c 'findmnt -t fuse.fuse-overlayfs -o TARGET --noheadings | xargs -r -n1 fusermount3 -u 2>/dev/null || true'
  $sudo_cmd sh -c "findmnt -t overlay -o TARGET --noheadings | grep '/var/lib/left4me/runtime/' | xargs -r -n1 umount 2>/dev/null || true"
  $sudo_cmd sh -c 'for d in /var/lib/left4me/runtime/*/; do [ -d "$d" ] || continue; rm -rf "$d/upper" "$d/work"; mkdir -p "$d/upper" "$d/work"; chown left4me:left4me "$d/upper" "$d/work"; done'
  $sudo_cmd touch "$sentinel"
  $sudo_cmd chown left4me:left4me "$sentinel"
fi
Verification:
python3 -m pytest deploy/tests -q
Green.
Commit: chore(deploy): drop fuse-overlayfs apt dep + one-shot migrate upper/work
Task 4: Web Unit Hardening Cleanup + Docs
Files:
- Modify: deploy/files/usr/local/lib/systemd/system/left4me-web.service
- Modify: deploy/tests/test_deploy_artifacts.py
- Modify: README.md
- Modify: l4d2host/README.md
- Modify: deploy/README.md
Test plan:
- test_deploy_artifacts.py::test_web_unit_contains_required_runtime_contract: replace assert "MountFlags=shared" in unit with assert "MountFlags=" not in unit; add assert "PrivateTmp=true" in unit; add assert "left4me-overlay" not in unit (the unit should reach the helper only through the Python code, never directly). Run the two negative assertions against the unit's non-comment lines only, because the rewritten comment block names the left4me-overlay helper.
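Because the rewritten comment block in the unit mentions the left4me-overlay helper, negative assertions are safer when run against directive lines only. A minimal sketch, with directives as a hypothetical helper name:

```python
def directives(unit_text: str) -> list[str]:
    """Return unit-file lines that are neither blank nor comments."""
    result = []
    for line in unit_text.splitlines():
        stripped = line.strip()
        # systemd unit files treat lines starting with '#' or ';' as comments.
        if stripped and not stripped.startswith(("#", ";")):
            result.append(stripped)
    return result
```

The test can then check any("PrivateTmp=true" == d for d in directives(unit)) and assert that no directive contains "MountFlags=" or "left4me-overlay".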
Implementation:
Edit left4me-web.service:
- Drop MountFlags=shared.
- Restore PrivateTmp=true.
- Rewrite the comment block above the hardening lines to explain: mounts now go through the left4me-overlay helper, which nsenters into PID 1's mount namespace, so this unit's namespace is irrelevant to gameserver visibility. NoNewPrivileges stays unset because sudo is setuid, and NoNewPrivileges=true would stop it from gaining root.
README updates:
- README.md (line ~59): drop fuse-overlayfs from tech-stack list; replace with "kernel overlayfs via privileged helper".
- l4d2host/README.md: lines 29, 52, 64 reference fuse; update to "kernel overlayfs (mount via the left4me-overlay helper deployed to /usr/local/libexec/left4me/)".
- deploy/README.md: add /usr/local/libexec/left4me/left4me-overlay to the privileged-helpers inventory.
Verification:
python3 -m pytest deploy/tests -q
Green. Manual readthrough of the three READMEs confirms no stale fuse references.
Commit: chore(deploy): cleanup left4me-web hardening + docs for kernel overlayfs
Task 5: End-to-End Verification on ckn@10.0.4.128
Pre-deploy: branch is clean, all four prior commits have landed, all tests green locally.
Deploy:
deploy/deploy-test-server.sh ckn@10.0.4.128
Verification commands on the box:
- test -e /var/lib/left4me/.kernel-overlay-migrated && echo migrated should print "migrated" (sentinel created).
- systemctl status left4me-web.service --no-pager should show active (running) with a recent invocation timestamp.
- Start from the UI or via sudo -u left4me /opt/left4me/.venv/bin/l4d2ctl start test-server; expect exit 0.
- findmnt /var/lib/left4me/runtime/test-server/merged should show fstype overlay in the host namespace.
- systemctl status left4me-server@test-server --no-pager should show active (running) after the start, not stuck in activating (auto-restart). No status=200/CHDIR errors in journalctl -u left4me-server@test-server.
- sudo journalctl -k --since "5 minutes ago" | grep -i apparmor | tail should show no overlay-related denials.
- Negative test: sudo -u left4me sudo -n /usr/local/libexec/left4me/left4me-overlay mount '../escape' exits non-zero with a validation error.
- Idempotency: l4d2ctl stop test-server && l4d2ctl stop test-server; both succeed (per the prior fix(l4d2-host): make stop_instance idempotent commit, still holds).
- Re-start: l4d2ctl start test-server succeeds, and findmnt shows the mount again.
- Double-mount guard: while the server is running, attempt another start (not via UI; via Python REPL or a second job). start_instance raises CalledProcessError with the "refusing to double-mount" message. Optional, can be left to the unit test.
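The findmnt check in the list above can also be scripted. The sketch below parses mountinfo text, for example the contents of /proc/1/mountinfo for the host namespace. The function name is hypothetical, and it ignores the octal escaping that mountinfo applies to unusual characters in paths, which the plain instance paths here do not need.

```python
from typing import Optional


def fstype_at(mountinfo: str, target: str) -> Optional[str]:
    """Return the filesystem type mounted at target, or None if nothing is mounted there."""
    found = None
    for line in mountinfo.splitlines():
        # Each line is "<fields> - <fstype> <source> <super options>".
        left, sep, right = line.partition(" - ")
        if not sep:
            continue
        fields = left.split()
        tail = right.split()
        # Field 5 (index 4) is the mount point; the last matching line is the topmost mount.
        if len(fields) >= 5 and tail and fields[4] == target:
            found = tail[0]
    return found
```

For the verification step, fstype_at(text, "/var/lib/left4me/runtime/test-server/merged") would be expected to return "overlay" after a successful start.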
On failure of any step: stop and report. Do NOT push. The deploy script is rerunnable; the migration sentinel stays so wipe doesn't repeat.
Out Of Scope
- See spec's "Out Of Scope" section.
- This plan does not push commits; pushing is a separate user decision after end-to-end verification passes.