# Kernel Overlayfs Helper Implementation Plan

> **Approval status:** User-approved 2026-05-08. Implementation proceeds.

**Goal:** Implement the kernel-overlayfs migration per `docs/superpowers/specs/2026-05-08-kernel-overlayfs-helper-design.md`. Add a Python `left4me-overlay` privileged helper and a `KernelOverlayFSMounter` Python class, wire the existing `OverlayMounter` ABC through `l4d2host/instances.py`, drop `fuse-overlayfs` from the deploy stack, and migrate existing on-disk upper/work directories.

**Architecture:** The web app continues to call `l4d2ctl start|stop|delete `, and `l4d2host` continues to expose the same CLI verbs. Internally, `start_instance`/`stop_instance`/`delete_instance` stop shelling out directly to `fuse-overlayfs`/`fusermount3`. They use `KernelOverlayFSMounter` instead, which invokes the new sudo helper. The helper mounts in PID 1's mount namespace via `nsenter`.

---

## Locked Decisions

See `docs/superpowers/specs/2026-05-08-kernel-overlayfs-helper-design.md` for the design rationale. Implementation-relevant summary:

- `left4me-overlay` Python helper in `/usr/local/libexec/left4me/`, owned by root, mode 0755, run by the system `/usr/bin/python3`, stdlib only.
- Verbs: `mount `, `umount `.
- Validation in the helper: name regex; realpath + allowlist for each lowerdir; exact-prefix check for upper/work/merged; reject an upperdir that carries `user.fuseoverlayfs.*` xattrs; lowerdir count ≤ 500.
- Sudoers rules are verb-constrained: `mount *`, `umount *`.
- `KernelOverlayFSMounter` in `l4d2host/fs/kernel_overlayfs.py` implements `OverlayMounter`. It derives `name` from the name of the merged path's parent directory.
- `start_instance` adds an `os.path.ismount(merged)` guard before mounting.
- Deploy migration: gated on the sentinel file `/var/lib/left4me/.kernel-overlay-migrated`. It stops the gameservers and the web service, force-unmounts stale mounts, wipes upper/work, and recreates them empty.
- Web unit cleanup: drop `MountFlags=shared`, restore `PrivateTmp=true`, and rewrite the comment block. Keep `NoNewPrivileges` unset.
- Delete `l4d2host/fs/fuse_overlayfs.py` (currently unused, because `start_instance` bypasses it).
- AGENTS.md contracts are unchanged.

---

## Current Gap

- In `l4d2host/instances.py`, `start_instance` calls `fuse-overlayfs` directly (lines 85-101), and `stop_instance`/`delete_instance` call `fusermount3 -u` directly. The `OverlayMounter` ABC in `l4d2host/fs/base.py` and the `FuseOverlayFSMounter` implementation in `l4d2host/fs/fuse_overlayfs.py` exist but are unused.
- Mounts land in the web service's private mount namespace, so neither the host nor the gameserver units can see them. `MountFlags=shared` does not fix this.
- No privileged mount helper exists. The only privileged helpers are `left4me-systemctl` and `left4me-journalctl`.
- The deploy script installs the `fuse-overlayfs` apt package and assumes it is available as a runtime tool.
- Existing `runtime//upper` directories may carry `user.fuseoverlayfs.*` xattrs. Kernel overlayfs silently ignores these, which resurrects "deleted" files.

---

## Task 1: Helper Script + Sudoers + Mounter Class (RED-first)

**Files:**

- Create: `deploy/files/usr/local/libexec/left4me/left4me-overlay` (Python, mode 0755 after deploy)
- Modify: `deploy/files/etc/sudoers.d/left4me`
- Create: `l4d2host/fs/kernel_overlayfs.py`
- Create: `l4d2host/tests/test_kernel_overlayfs.py`
- Create: `l4d2host/tests/test_overlay_helper.py`
- Modify: `deploy/tests/test_deploy_artifacts.py` (assert that the helper is deployed and that the sudoers entry exists)

Test plan (RED first):

1. `test_kernel_overlayfs.py::test_mount_invokes_helper_with_name`: mock `run_command` and call `KernelOverlayFSMounter().mount(lowerdirs="/x:/y", upperdir=Path("/var/lib/left4me/runtime/alpha/upper"), workdir=Path("/var/lib/left4me/runtime/alpha/work"), merged=Path("/var/lib/left4me/runtime/alpha/merged"))`. Assert that the argv is `["sudo", "-n", "/usr/local/libexec/left4me/left4me-overlay", "mount", "alpha"]`.
2. `test_kernel_overlayfs.py::test_unmount_invokes_helper_with_umount_verb`: mock, call, and assert that the argv uses the `umount` verb.
3. `test_overlay_helper.py`: drives the helper script as a subprocess with an isolated `LEFT4ME_ROOT=tmp_path` and the `LEFT4ME_OVERLAY_PRINT_ONLY=1` env var. In print-only mode the helper prints the would-be `nsenter …` command line and exits 0 instead of calling `execv`. Cases:
   - Valid mount: prints the expected `nsenter --mount=/proc/1/ns/mnt -- /bin/mount -t overlay …` line.
   - Valid umount: prints the expected umount line.
   - Bad name (`../escape`, uppercase, empty): exits non-zero, and stderr matches.
   - Lowerdir traversal (`/etc`, `/var/lib/left4me/../etc`, symlink escape): exits non-zero.
   - Missing `instance.env`: exits non-zero.
   - Tainted upperdir (carrying a `user.fuseoverlayfs.opaque` xattr): exits non-zero with a clear message. (Optional: skip this case if `setfattr` is unavailable on the dev machine, and restrict it to Linux via `pytest.mark.skipif`.)
   - Lowerdir count > 500: exits non-zero.
4. `test_deploy_artifacts.py`: assert that `/usr/local/libexec/left4me/left4me-overlay` is present in the deployed files and that sudoers includes the new lines.

Implementation:

- Helper script structure: `argparse` for the verb, then the path-validation functions, then `os.execv("/usr/bin/nsenter", [...])`. Under `LEFT4ME_OVERLAY_PRINT_ONLY`, the helper prints that command instead.
- `KernelOverlayFSMounter`: `name = merged.parent.name` (with a one-line comment), then `run_command(["sudo", "-n", "/usr/local/libexec/left4me/left4me-overlay", verb, name], on_stdout=…, on_stderr=…, passthrough=…, should_cancel=…)`.

**Verification:**

```
python3 -m pytest l4d2host/tests/test_kernel_overlayfs.py l4d2host/tests/test_overlay_helper.py deploy/tests/test_deploy_artifacts.py -q
```

Expected before implementation: FAIL on the missing class/script. After: all green.
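
The sketch below shows the helper's validation and command construction. It is stdlib-only and rests on several assumptions, so it is not the helper itself:

- The name pattern in `NAME_RE` is hypothetical; the plan says only "name regex".
- The `allowed_roots` allowlist parameter is hypothetical.
- The rule against `,`/`:` in lowerdirs is an extra safety check the plan does not list.
- The function names (`validate_name`, `validate_lowerdirs`, `check_upper_untainted`, `build_argv`, `plan_command`, `run`) are illustrative.
- Lowerdirs are passed in directly, because the plan does not specify how the helper reads them from `instance.env`.

```python
import errno
import os
import re
import shlex

# Hypothetical name policy (the plan only says "name regex"): lowercase
# letters, digits and hyphens. Rejects "../escape", uppercase, and "".
NAME_RE = re.compile(r"[a-z0-9][a-z0-9-]{0,62}")
MAX_LOWERDIRS = 500
NSENTER = "/usr/bin/nsenter"
HOST_MNT_NS = "--mount=/proc/1/ns/mnt"  # join PID 1's mount namespace
FUSE_XATTR_PREFIX = "user.fuseoverlayfs."


class ValidationError(Exception):
    """Any input the helper must refuse; the caller exits non-zero."""


def validate_name(name: str) -> str:
    if not NAME_RE.fullmatch(name):
        raise ValidationError(f"invalid instance name: {name!r}")
    return name


def validate_lowerdirs(lowerdirs: list[str], allowed_roots: list[str]) -> list[str]:
    if not 1 <= len(lowerdirs) <= MAX_LOWERDIRS:
        raise ValidationError(f"need 1..{MAX_LOWERDIRS} lowerdirs, got {len(lowerdirs)}")
    roots = [os.path.realpath(r) for r in allowed_roots]
    resolved = []
    for raw in lowerdirs:
        real = os.path.realpath(raw)  # collapses ".." and resolves symlinks
        # Extra assumption: refuse overlay option separators, so a path
        # cannot smuggle in mount options (",") or extra layers (":").
        if "," in real or ":" in real:
            raise ValidationError(f"lowerdir contains a separator: {real!r}")
        if not any(real == r or real.startswith(r + os.sep) for r in roots):
            raise ValidationError(f"lowerdir outside allowlist: {raw!r} -> {real!r}")
        resolved.append(real)
    return resolved


def check_upper_untainted(upperdir: str) -> None:
    """Refuse an upperdir still carrying fuse-overlayfs xattrs (Linux only)."""
    # user.* xattrs can only sit on regular files and directories; os.walk
    # visits each real directory once as dirpath.
    for dirpath, _dirnames, filenames in os.walk(upperdir):
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            try:
                names = os.listxattr(path, follow_symlinks=False)
            except OSError as exc:
                if exc.errno == errno.ENOTSUP:  # filesystem has no xattrs
                    return
                raise
            if any(n.startswith(FUSE_XATTR_PREFIX) for n in names):
                raise ValidationError(f"upperdir has fuse-overlayfs xattrs: {path}")


def build_argv(verb: str, name: str, root: str, lowerdirs: list[str]) -> list[str]:
    # upper/work/merged derive from the validated name, so they always sit
    # exactly under <root>/runtime/<name>/.
    base = os.path.join(root, "runtime", name)
    merged = os.path.join(base, "merged")
    prefix = [NSENTER, HOST_MNT_NS, "--"]
    if verb == "umount":
        return prefix + ["/bin/umount", merged]
    opts = "lowerdir={},upperdir={},workdir={}".format(
        ":".join(lowerdirs), os.path.join(base, "upper"), os.path.join(base, "work")
    )
    return prefix + ["/bin/mount", "-t", "overlay", "overlay", "-o", opts, merged]


def plan_command(verb: str, name: str, lowerdirs: list[str],
                 root: str, allowed_roots: list[str]) -> list[str]:
    if verb not in ("mount", "umount"):
        raise ValidationError(f"unknown verb: {verb!r}")
    validate_name(name)
    if verb == "umount":
        return build_argv(verb, name, root, [])
    dirs = validate_lowerdirs(lowerdirs, allowed_roots)
    check_upper_untainted(os.path.join(root, "runtime", name, "upper"))
    return build_argv(verb, name, root, dirs)


def run(argv: list[str], print_only: bool) -> None:
    if print_only:  # the plan's LEFT4ME_OVERLAY_PRINT_ONLY=1 test mode
        print(shlex.join(argv))
        return
    os.execv(argv[0], argv)  # replaces this process on success
```

A real `main()` would do the following, which is what the subprocess cases in `test_overlay_helper.py` exercise:

- parse the verb and name with `argparse`;
- load the lowerdirs from `instance.env`;
- on a `ValidationError`, print the message to stderr and exit non-zero.

Keeping `plan_command` separate from `run` lets the print-only mode and the real `execv` path share all of the validation.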

**Commit:** `feat(l4d2-host): KernelOverlayFSMounter + left4me-overlay helper`

---

## Task 2: Wire OverlayMounter Through Lifecycle + Drop Fuse Module

**Files:**

- Modify: `l4d2host/instances.py` (start/stop/delete)
- Modify: `l4d2host/tests/test_lifecycle.py` (update argv assertions, add a double-mount guard test)
- Delete: `l4d2host/fs/fuse_overlayfs.py`
- Verify: `l4d2host/fs/__init__.py` does not re-export `FuseOverlayFSMounter`

Test plan (update to RED, then GREEN):

1. `test_lifecycle.py::test_start_order`: change the assertion so that `calls[0]` is now `["sudo", "-n", "/usr/local/libexec/left4me/left4me-overlay", "mount", "alpha"]`. Adjust the setup so the test still creates the merged directory.
2. `test_lifecycle.py::test_stop_succeeds_when_unmount_fails`: `cmd[0:5] == ["sudo", "-n", "/usr/local/libexec/left4me/left4me-overlay", "umount", "alpha"]`.
3. `test_lifecycle.py::test_delete_succeeds_when_unmount_fails`: same as above.
4. NEW `test_lifecycle.py::test_start_refuses_double_mount`: monkeypatch `os.path.ismount` to return True. Expect `start_instance` to raise `subprocess.CalledProcessError`, and assert that NO mount command was issued.
5. `test_lifecycle.py::test_lifecycle_rejects_unsafe_instance_names`: unchanged.
6. `test_lifecycle.py::test_delete_missing_is_noop`: unchanged.

Implementation:

- `instances.py` imports `KernelOverlayFSMounter` and creates a module-level singleton (`_mounter = KernelOverlayFSMounter()`). Replace the direct `run_command([...fuse-overlayfs...])` call with `_mounter.mount(...)`. Replace the direct `run_command([...fusermount3...])` calls with `_mounter.unmount(...)`, keeping them inside the existing try/except in stop/delete.
- Add the ismount guard near the top of `start_instance`: after `runtime_dir` is computed and before `emit_step("mounting runtime overlay...")`. Raise `subprocess.CalledProcessError(returncode=1, cmd=["mount-guard"], stderr="runtime overlay already mounted at ; refusing to double-mount")`.
- Delete `l4d2host/fs/fuse_overlayfs.py`.
- Confirm `l4d2host/fs/__init__.py` is empty (already verified to be 1 line).

**Verification:**

```
python3 -m pytest l4d2host/tests -q
python3 -m pytest l4d2web/tests -q
```

Both green. For the web tests, note that `start_instance` preserves the `"Step: mounting runtime overlay..."` log line.

**Commit:** `refactor(l4d2-host): start/stop/delete go through OverlayMounter; drop FuseOverlayFSMounter`

---

## Task 3: Deploy Script Migration (Apt Deps + Wipe Upper/Work)

**Files:**

- Modify: `deploy/deploy-test-server.sh`
- Modify: `deploy/tests/test_deploy_artifacts.py` (assert that the deploy script contains the migration lines and that `fuse-overlayfs` is no longer in the apt-get install line)

Test plan:

1. `test_deploy_artifacts.py::test_deploy_script_drops_fuse_overlayfs_apt_dep`: assert that no `apt-get install` or `dnf install` line in `deploy_script` mentions `fuse-overlayfs`, and `assert "kernel-overlay-migrated" in deploy_script`. A whole-file `assert "fuse-overlayfs" not in deploy_script` would fail. The migration block below still names `fuse-overlayfs` in its comment and in `findmnt -t fuse.fuse-overlayfs`.
2. `test_deploy_artifacts.py::test_deploy_script_migration_block_uses_sentinel`: `assert ".kernel-overlay-migrated" in deploy_script`.

Implementation: In `deploy/deploy-test-server.sh`, drop `fuse-overlayfs` from the apt-get and dnf lines (lines 82, 84). Insert the following block before the existing `systemctl restart left4me-web.service` (line 182):

```sh
# One-time migration: fuse-overlayfs upperdir → kernel overlayfs upperdir.
# fuse-overlayfs running as the left4me user uses user.fuseoverlayfs.* xattrs
# for whiteouts and opaque dirs; kernel overlayfs ignores those, so any
# pre-existing upper/ from the fuse era would resurrect "deleted" files.
sentinel=/var/lib/left4me/.kernel-overlay-migrated
if [ ! -e "$sentinel" ]; then
  # Stop everything that could keep the runtime mounts busy.
  $sudo_cmd systemctl stop 'left4me-server@*.service' 2>/dev/null || true
  $sudo_cmd systemctl stop left4me-web.service 2>/dev/null || true
  # Force-unmount stale fuse-overlayfs and kernel overlay mounts.
  $sudo_cmd sh -c 'findmnt -t fuse.fuse-overlayfs -o TARGET --noheadings | xargs -r -n1 fusermount3 -u 2>/dev/null || true'
  $sudo_cmd sh -c "findmnt -t overlay -o TARGET --noheadings | grep '/var/lib/left4me/runtime/' | xargs -r -n1 umount 2>/dev/null || true"
  # Wipe and recreate each instance's upper/ and work/ ("$d" ends in "/").
  $sudo_cmd sh -c '
    for d in /var/lib/left4me/runtime/*/; do
      [ -d "$d" ] || continue
      rm -rf "${d}upper" "${d}work"
      mkdir -p "${d}upper" "${d}work"
      chown left4me:left4me "${d}upper" "${d}work"
    done'
  $sudo_cmd touch "$sentinel"
  $sudo_cmd chown left4me:left4me "$sentinel"
fi
```

**Verification:**

```
python3 -m pytest deploy/tests -q
```

Green.

**Commit:** `chore(deploy): drop fuse-overlayfs apt dep + one-shot migrate upper/work`

---

## Task 4: Web Unit Hardening Cleanup + Docs

**Files:**

- Modify: `deploy/files/usr/local/lib/systemd/system/left4me-web.service`
- Modify: `deploy/tests/test_deploy_artifacts.py`
- Modify: `README.md`
- Modify: `l4d2host/README.md`
- Modify: `deploy/README.md`

Test plan:

1. `test_deploy_artifacts.py::test_web_unit_contains_required_runtime_contract`:
   - Replace `assert "MountFlags=shared" in unit` with a check that no `MountFlags=` directive remains.
   - Add `assert "PrivateTmp=true" in unit`.
   - Add a check that no directive references `left4me-overlay`. The unit should reach the helper only through Python code, never directly.
   - Run both negative checks against non-comment lines only. The rewritten comment block names the helper, so a whole-file `assert "left4me-overlay" not in unit` would fail.

Implementation: Edit `left4me-web.service`:

- Drop `MountFlags=shared`.
- Restore `PrivateTmp=true`.
- Rewrite the comment block above the hardening lines. It should explain that mounts now go through the `left4me-overlay` helper, which `nsenter`s into PID 1's mount namespace, so this unit's namespace no longer affects gameserver visibility. `NoNewPrivileges` stays unset because sudo depends on its setuid bit, which `NoNewPrivileges=true` would neutralize.
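
The rewritten comment block names the `left4me-overlay` helper, and it may mention `MountFlags=shared` while explaining its removal. Plain substring checks against the whole unit file can therefore trip on comments. One way around this is to parse the directive lines and assert only on those.

The sketch below makes a few assumptions:

- `unit_directives` and `web_unit_problems` are hypothetical helper names for use inside `test_deploy_artifacts.py`.
- The parser deliberately ignores systemd features such as backslash line continuations and drop-in files.

```python
def unit_directives(unit_text: str) -> dict[str, list[str]]:
    """Map each Key= directive in a systemd unit to its values, in order.

    Skips blank lines, comments (# or ;) and [Section] headers. Backslash
    line continuations and drop-in files are not handled in this sketch.
    """
    directives: dict[str, list[str]] = {}
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "[")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            directives.setdefault(key.strip(), []).append(value.strip())
    return directives


def web_unit_problems(unit_text: str) -> list[str]:
    """Return every violated Task 4 rule (an empty list means the unit is clean)."""
    directives = unit_directives(unit_text)
    problems = []
    if "MountFlags" in directives:
        problems.append("MountFlags= should be dropped")
    if directives.get("PrivateTmp") != ["true"]:
        problems.append("PrivateTmp=true should be restored")
    if "NoNewPrivileges" in directives:
        problems.append("NoNewPrivileges should stay unset (sudo needs setuid)")
    for key, values in directives.items():
        if any("left4me-overlay" in value for value in values):
            problems.append(f"{key}= references left4me-overlay directly")
    return problems
```

The test body then reduces to `assert web_unit_problems(unit) == []`. This reports every broken rule in a single failure message rather than stopping at the first.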

README updates:

- `README.md` (line ~59): drop fuse-overlayfs from the tech-stack list and replace it with "kernel overlayfs via privileged helper".
- `l4d2host/README.md`: lines 29, 52 and 64 reference fuse. Update them to "kernel overlayfs (mount via the `left4me-overlay` helper deployed to `/usr/local/libexec/left4me/`)".
- `deploy/README.md`: add `/usr/local/libexec/left4me/left4me-overlay` to the privileged-helpers inventory.

**Verification:**

```
python3 -m pytest deploy/tests -q
```

Green. A manual readthrough of the three READMEs confirms that no stale fuse references remain.

**Commit:** `chore(deploy): cleanup left4me-web hardening + docs for kernel overlayfs`

---

## Task 5: End-to-End Verification on `ckn@10.0.4.128`

**Pre-deploy:** The branch is clean, all four prior commits have landed, and all tests are green locally.

**Deploy:**

```
deploy/deploy-test-server.sh ckn@10.0.4.128
```

**Verification commands on the box:**

1. `test -e /var/lib/left4me/.kernel-overlay-migrated && echo migrated`: the sentinel was created.
2. `systemctl status left4me-web.service --no-pager`: `active (running)`, with a recent invocation timestamp.
3. Start the server from the UI or via `sudo -u left4me /opt/left4me/.venv/bin/l4d2ctl start test-server`: exit 0.
4. `findmnt /var/lib/left4me/runtime/test-server/merged`: shows fstype `overlay` in the host namespace.
5. `systemctl status left4me-server@test-server --no-pager`: `active (running)` after the start, **not** `activating (auto-restart)`. No `status=200/CHDIR` errors in `journalctl -u left4me-server@test-server`.
6. `sudo journalctl -k --since "5 minutes ago" | grep -i apparmor | tail`: no overlay-related denials.
7. Negative test: `sudo -u left4me sudo -n /usr/local/libexec/left4me/left4me-overlay mount '../escape'`: exits non-zero with a validation error.
8. Idempotency: `l4d2ctl stop test-server && l4d2ctl stop test-server`: both succeed. The earlier `fix(l4d2-host): make stop_instance idempotent` commit provides this, and it still holds.
9. Re-start: `l4d2ctl start test-server`: succeeds, and `findmnt` shows the mount again.
10. Double-mount guard: while the server is running, attempt another start through a Python REPL or a second job rather than the UI. `start_instance` should raise `CalledProcessError` with the "refusing to double-mount" message. This step is optional and can be left to the unit test.

**On failure of any step:** Stop and report. Do NOT push. The deploy script is rerunnable, and the migration sentinel persists, so the wipe does not repeat.

---

## Out Of Scope

- See the spec's "Out Of Scope" section.
- This plan does not push commits. Pushing is a separate user decision after end-to-end verification passes.