left4me

Author	SHA1	Message	Date
mwiegand	2f6a9cfba0	feat(left4me-overlay): idmap bind mounts for l4d2-sandbox-owned lowerdirs Insert an idmapped bind mount in front of each lowerdir whose top-level uid matches l4d2-sandbox at overlay-mount time, so that overlayfs copy-up produces left4me-owned upperdir entries instead of EACCES. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-14 23:48:07 +02:00
mwiegand	363f429c7a	l4d2host: LEFT4ME_STEAMCMD env var for steamcmd path SteamInstaller defaults to steamcmd="steamcmd" (bare name), which relies on PATH lookup. Deployments that don't have steamcmd on PATH — or where steamcmd.sh's `cd "$(dirname "$0")"` breaks under PATH-symlink invocation (observed with the Valve-shipped script) — can now pin an absolute path via LEFT4ME_STEAMCMD. Default keeps bare-name lookup for dev/tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 22:46:21 +02:00
mwiegand	5eac51a93e	fix(deploy): wrap overlay helper with nsenter so it doesn't pin the unit's mount namespace systemd's `+` Exec prefix removes sandbox/credentials but does NOT detach from the unit's per-service mount namespace (created by PrivateTmp/Protect). The Python interpreter for the helper was launched inside that namespace, and even though the helper internally nsenter'd into PID 1 for the umount syscall, the calling Python process itself never left the unit's namespace. Its existence pinned the namespace alive, which kept the slave mount tree alive, which made PID 1's umount return EBUSY for the entire duration of the helper's run. The mount became unmountable the moment the helper exited — empirically verified by polling /proc//ns/mnt during stop: the only PID holding the dying namespace was the helper itself. Wrap both ExecStartPre and ExecStopPost with `/usr/bin/nsenter --mount=/proc/1/ns/mnt --` so the helper Python interpreter runs in PID 1's mount namespace from the start. With the helper out of the unit's namespace, umount succeeds first try once the cgroup empties. Reset went from ~25 s with retry/lazy-fallback workarounds to ~0.5 s clean. Knock-on cleanups: - Helper drops internal nsenter for the syscalls (already in PID 1's namespace), and drops the eager-retry loop + lazy-umount fallback + inner work_inner retry (no race left to ride out). - Revert TimeoutStopSec=60s back to 15s. - Tests updated to expect the new argv shapes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 15:13:59 +02:00
mwiegand	59771f91c4	fix(deploy): drop deleted l4d2host.fs from pyproject + use nproc --all Two bugs surfaced by the previous deploy attempt: 1. l4d2host/pyproject.toml still listed `l4d2host.fs` in the explicit packages= list. After deleting the fs/ package, pip install -e fails with "package directory './fs' does not exist". 2. The CPU-isolation deploy step uses `nproc` to detect host core count, but `nproc` honors Cpus_allowed of the calling shell. On a host that already has the cpuset drop-ins applied (system.slice/user.slice → AllowedCPUs=0), the SSH login lands constrained to one core and `nproc` returns 1 — making subsequent deploys think they're on a single-core box and skip the cpuset writes entirely. `nproc --all` reports installed processors regardless of affinity, which is what the deploy actually wants. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 13:11:19 +02:00
mwiegand	ff6ce7b091	refactor(l4d2-host): unmount via ExecStopPost — single code path mirroring mount Symmetric with the earlier mount cleanup (commits 519567e..a982995). Until now, the unit's ExecStartPre handled mount but the Python side still drove unmount: stop_instance and _purge_instance both called _mounter.unmount, which wrapped sudo + the helper. Two code paths for two halves of the same lifecycle. Move unmount into the unit: - ExecStopPost=+/usr/local/libexec/left4me/left4me-overlay umount %i (ExecStopPost, not ExecStop, so it runs after the cgroup is cleared; ExecStop runs while srcds is alive and would EBUSY the umount syscall.) - Helper's umount verb is now idempotent (mirrors mount): if merged isn't a mount point, return early. PRINT_ONLY mode bypasses both short-circuits so the unit tests still exercise the full nsenter argv. Drop the dead Python machinery: - _mounter.unmount(...) calls in stop_instance and _purge_instance - _mounter global + KernelOverlayFSMounter import - The whole l4d2host/fs/ package (OverlayMounter ABC + KernelOverlayFSMounter class) — no production callers, just self-tests - l4d2host/tests/test_kernel_overlayfs.py - test_stop_succeeds_when_unmount_fails / test_delete_succeeds_when_unmount_fails (tested Python-side unmount-failure tolerance that no longer exists) - The l4d2host.fs.kernel_overlayfs.run_command monkeypatches in lifecycle tests After this, the only thing start_instance does beyond cfg-staging is ask systemd to enable+start the unit. stop/delete/reset only ask systemd to disable; the overlay lifecycle lives entirely in the unit file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 13:09:52 +02:00
mwiegand	56f5c30296	refactor(l4d2-host): unit's ExecStartPre is the sole code path to the mount Before this change there were two callers of left4me-overlay mount: the web app's start_instance (Python, in-process) and the unit's ExecStartPre (shell, via sudo). The duplication invited divergence; the helper's recently-added idempotency made both paths technically work but at the cost of a "first wins" race and dead-code retry logic in start_instance. Drop the in-process _mounter.mount() call from start_instance. The web app now only stages cfg files (which still must happen on the host filesystem before mount, to avoid overlayfs copy-up changing ownership), then asks systemd to enable+start the unit; the unit's ExecStartPre does the mount. Removed: - os.path.ismount(merged) refusal in start_instance and its test (test_start_refuses_to_double_mount). The race the check guarded against is now handled by the helper's idempotency. - _load_instance_env helper and the `os` import (both became dead). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 12:54:05 +02:00
mwiegand	8552c559d3	feat(l4d2-host): server lifecycle uses systemctl enable --now / disable --now Servers started via the web UI now create a WantedBy= symlink under multi-user.target.wants/, so they auto-start on the next host reboot. Helper verbs renamed start/stop -> enable/disable; service_control.py renamed start_service/stop_service -> enable_service/disable_service. The user-facing l4d2ctl start/stop commands keep their names per the AGENTS.md contract -- only the implementation changes. Spec: docs/superpowers/specs/2026-05-09-l4d2-server-lifecycle-reboot-and-drift-design.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 12:28:44 +02:00
mwiegand	985df970f8	feat(l4d2-web): per-overlay server.cfg aliases — expose checkbox + auto-exec Each linked overlay gets a checkbox on the blueprint detail page that opts its server.cfg in as exec server_overlay_<id>. The web app builds the spec with {path, alias} per overlay and prepends exec server_overlay_<id> lines to the blueprint config in lowest-overlay-first order. The host stages those copies in the overlayfs upper layer before mounting (avoids copy-up writes against a sandbox-uid file). A live preview block above the Config textarea shows what gets auto-executed. Schema: - alembic 0007: BlueprintOverlay.expose_server_cfg BOOLEAN Spec contract: - l4d2host OverlayRef(path, alias?). load_spec accepts both bare-string and {path, alias} entries. Side effects folded in (same file in l4d2_facade): - start_server auto-initializes; the manual Initialize step is no longer needed before Start. - initialize_server no longer runs blueprint builders — builds happen on overlay save, not on every server Start. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 01:26:31 +02:00
mwiegand	6b4eef22c2	feat: server Reset action — wipe runtime, keep DB row Reset stops the systemd service, unmounts the overlay, and rm -rf's both runtime/<name> and instances/<name>, but keeps the Server row, blueprint, and (shared) systemd template. Next Start re-initializes from the current blueprint, so users can clean up logs/caches/accumulated game state without losing the server. Implementation factors a shared _purge_instance helper out of delete_instance; reset_instance reuses it without the existence guard. New "reset" lifecycle op flows through the same route + worker + facade plumbing as the other server ops. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 18:10:32 +02:00
mwiegand	9985ecc56c	chore(deploy): cleanup left4me-web hardening + docs for kernel overlayfs Drop MountFlags=shared (the assumption that it propagated fuse mounts to host was incorrect on systemd 257 with ProtectSystem+ReadWritePaths). Restore PrivateTmp=true (was dropped in `593611e` for fuse propagation that did not work). Rewrite the comment block to describe the new model: mounts go through the left4me-overlay helper which nsenters into PID 1's mount namespace, so the unit's mount-ns layout is no longer load-bearing. Update the three user-facing READMEs (root, l4d2host, deploy) to drop fuse-overlayfs / fusermount3 prereqs and call out the kernel overlayfs mount path through the privileged helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 12:29:49 +02:00
mwiegand	93a60befb6	refactor(l4d2-host): start/stop/delete go through OverlayMounter; drop fuse module Replace direct fuse-overlayfs / fusermount3 subprocess calls in start_instance / stop_instance / delete_instance with the existing OverlayMounter abstraction, now backed by KernelOverlayFSMounter. Adds an os.path.ismount guard at the top of start_instance so a kernel-level overlay that survived a web-worker crash isn't double- mounted (kernel mounts persist when the cgroup dies, unlike fuse daemons). Delete the unused FuseOverlayFSMounter module. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 12:26:28 +02:00
mwiegand	d5b321b557	feat(l4d2-host): KernelOverlayFSMounter + left4me-overlay helper New privileged helper at /usr/local/libexec/left4me/left4me-overlay (Python, system /usr/bin/python3, stdlib only) takes only the instance name, parses instance.env for L4D2_LOWERDIRS, validates each lowerdir against an allowlist (installation/, overlays/, global_overlay_cache/, workshop_cache/), refuses upperdirs tainted with user.fuseoverlayfs.* xattrs from the prior fuse era, and execs `nsenter --mount=/proc/1/ns/mnt -- mount -t overlay ...` so the resulting mount lives in the host namespace. Mirrors the existing left4me-systemctl / left4me-journalctl pattern; sudoers entry is verb-constrained. KernelOverlayFSMounter implements the existing OverlayMounter ABC, deriving the instance name from the merged path. No call sites use it yet — that's the next commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 12:23:58 +02:00
mwiegand	d5d710afa7	fix(l4d2-host): make stop_instance idempotent on the unmount step systemctl stop is already a no-op on a stopped unit, but stop_instance was unconditionally running fusermount3 -u and bubbling up the EINVAL when the overlay wasn't currently mounted (e.g. server already stopped). Mirror the established delete_instance pattern: always attempt the unmount, swallow CalledProcessError, and label the step "(if mounted)". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 11:24:04 +02:00
mwiegand	d18b397330	fix(host): create ~/.steam/sdk32 and sdk64 symlinks during install L4D2 dedicated server expects to dlopen steamclient.so via ~/.steam/sdk32 (and sdk64). Without those symlinks, srcds_run logs 'cannot open shared object file' and SteamAPI_Init fails, which means the server is invisible to the public Steam master server, Workshop addon downloads break, and Steam 'Join Game' / lobby joins do not reach the server (only direct-IP connect works). SteamInstaller.install_or_update now ensures the symlinks exist after SteamCMD finishes. Targets prefer SteamCMD's linux32/linux64 sibling dirs; falls back to <install_dir>/bin/ if the siblings cannot be located. Idempotent: re-running the install repairs or leaves the symlinks alone. Path.home() respects HOME, which the systemd web unit sets to /var/lib/left4me, so the symlinks land in the left4me user's home. Existing deploys can apply the fix by re-running 'Install' from /admin without a full redeploy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 02:11:27 +02:00
mwiegand	f81e839ba2	security: harden boundary inputs and production defaults - validate instance names at the host lib and web boundary against [a-z0-9][a-z0-9_-]{0,63} to prevent path traversal via Server.name - fail-closed on SECRET_KEY: load_config returns None when env unset, create_app raises if missing or "dev" outside TESTING - close login timing oracle by hashing a dummy digest when the user is not found, equalizing response time - set SESSION_COOKIE_SECURE outside TESTING - delete_instance tolerates stop_service and fusermount3 failures so partially-initialized instances clean up without contract breaks; drops the is_mount() preflight that violated AGENTS.md - document claim_next_job's single-process assumption - clarify emit_step contract via docstring Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 00:53:33 +02:00
mwiegand	1604859f41	feat(host): add step logging to steam_install	2026-05-06 20:41:39 +02:00
mwiegand	005d2d8458	fix(host): enforce flush=True to prevent pipeline block buffering	2026-05-06 20:34:41 +02:00
mwiegand	38d04e8551	feat(host): emit steps during start, stop, and delete operations	2026-05-06 20:07:00 +02:00
mwiegand	d977098344	feat(host): emit steps during initialize_instance	2026-05-06 20:04:08 +02:00
mwiegand	700b5be6f8	feat(host): add _emit_step helper for lifecycle logging	2026-05-06 20:00:07 +02:00
mwiegand	bbfc528354	feat(deploy): add production-like test deployment	2026-05-06 19:30:10 +02:00
mwiegand	de86139323	feat(l4d2): add l4d2ctl host command boundary	2026-05-06 16:35:20 +02:00
mwiegand	a347829608	feat(l4d2-web): add job pages and cancellation	2026-05-06 15:05:13 +02:00
mwiegand	288eda7c37	chore(l4d2): flatten component layout	2026-05-05 23:47:06 +02:00

24 commits
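Commit 2f6a9cfba0 places an idmapped bind mount in front of each lowerdir "whose top-level uid matches l4d2-sandbox". The following is a minimal sketch of only that selection step. It assumes the lowerdirs are available as a list of paths and that the sandbox uid is looked up by user name. The function names are hypothetical, and the idmapped mount itself is not shown.

```python
import os
import pwd
from typing import Optional


def sandbox_uid(user: str = "l4d2-sandbox") -> Optional[int]:
    """Return the uid of the sandbox user, or None if no such user exists."""
    try:
        return pwd.getpwnam(user).pw_uid
    except KeyError:
        return None


def lowerdirs_needing_idmap(lowerdirs: list[str], uid: Optional[int]) -> list[str]:
    """Return the lowerdirs whose top-level directory is owned by `uid`.

    Only the directory itself is stat'ed (the "top-level uid" rule);
    nested entries are not walked. Order is preserved.
    """
    if uid is None:
        return []
    return [d for d in lowerdirs if os.stat(d).st_uid == uid]
```

The other lowerdirs would be passed to the overlay mount unchanged. Checking only the top-level owner keeps the per-mount cost at one `stat` per layer.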
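Commit 363f429c7a lets `LEFT4ME_STEAMCMD` pin an absolute steamcmd path while keeping the bare name `steamcmd` as the default. The lookup might be sketched as follows. The function name is hypothetical, and treating an empty value as unset is an assumption this log does not state.

```python
import os
from typing import Mapping, Optional


def resolve_steamcmd(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the steamcmd command SteamInstaller should run.

    LEFT4ME_STEAMCMD pins an absolute path (e.g. for deployments where
    steamcmd is not on PATH). Otherwise fall back to the bare name and let
    PATH lookup find it, which is the dev/test default.
    """
    if env is None:
        env = os.environ
    # Assumption: an empty value counts as "not set".
    return env.get("LEFT4ME_STEAMCMD") or "steamcmd"
```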
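Commit 5eac51a93e wraps both overlay Exec lines in `/usr/bin/nsenter --mount=/proc/1/ns/mnt --`, so the helper's Python interpreter starts in PID 1's mount namespace and no longer pins the unit's namespace. The sketch below renders those two lines. It assumes the `+` prefix and the `mount`/`umount %i` verbs shown in ff6ce7b091 and 56f5c30296 are kept. The unit file itself does not appear in this log, so the exact form is not authoritative.

```python
HELPER = "/usr/local/libexec/left4me/left4me-overlay"
NSENTER = ["/usr/bin/nsenter", "--mount=/proc/1/ns/mnt", "--"]


def exec_line(key: str, verb: str) -> str:
    """Render one privileged (`+`) Exec line that runs the overlay helper
    inside PID 1's mount namespace instead of the unit's private one."""
    argv = [*NSENTER, HELPER, verb, "%i"]
    return f"{key}=+{' '.join(argv)}"


def overlay_exec_lines() -> list[str]:
    # Mount before srcds starts; unmount only after the cgroup is empty.
    return [
        exec_line("ExecStartPre", "mount"),
        exec_line("ExecStopPost", "umount"),
    ]
```

Because the interpreter is launched in PID 1's namespace from the start, the helper itself no longer needs an internal nsenter for the mount/umount syscalls.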
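Commit 59771f91c4 hinges on the difference between plain `nproc`, which honors the caller's CPU affinity, and `nproc --all`, which does not. The same distinction exists in Python's standard library, shown here only as an illustration. One caveat: `os.cpu_count()` counts online CPUs, while `nproc --all` reports installed ones, so the two can differ on hosts with offline CPUs.

```python
import os


def usable_cpus() -> int:
    """CPUs this process may run on (Linux only), like plain `nproc`.

    Honors the affinity mask (Cpus_allowed), so inside a session that is
    confined by AllowedCPUs=0 this returns 1 even on a many-core host.
    """
    return len(os.sched_getaffinity(0))


def host_cpus() -> int:
    """CPUs in the system regardless of this process's affinity.

    Close to `nproc --all`, which is what a deploy step sizing cpuset
    writes for the whole host wants.
    """
    return os.cpu_count() or 1
```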
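Commit ff6ce7b091 makes the helper's `umount` verb idempotent: if `merged` is not a mount point, it returns early. PRINT_ONLY mode skips that short-circuit so tests still see the full argv. The sketch below shows that shape. The function name, the boolean parameter standing in for PRINT_ONLY, and the plain `["umount", merged]` argv are assumptions. After 5eac51a93e, the helper no longer prepends nsenter itself.

```python
import os
from typing import Optional


def umount_argv(merged: str, print_only: bool = False) -> Optional[list[str]]:
    """Return the argv that unmounts `merged`, or None if there is nothing to do.

    Idempotent: a path that is not currently a mount point is skipped, so a
    second ExecStopPost run (or a never-mounted instance) is a no-op. In
    print-only mode the check is bypassed so tests always get the argv.
    """
    if not print_only and not os.path.ismount(merged):
        return None
    return ["umount", merged]
```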
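Commit 985df970f8 prepends one `exec server_overlay_<id>` line per opted-in overlay to the blueprint config, in lowest-overlay-first order. The following is a minimal sketch of that assembly step. The `LinkedOverlay` shape and the function name are hypothetical. Staging the aliased server.cfg copies in the upper layer is not shown.

```python
from dataclasses import dataclass


@dataclass
class LinkedOverlay:
    # Hypothetical shape: the overlay id plus the per-link checkbox value.
    overlay_id: int
    expose_server_cfg: bool


def build_server_cfg(blueprint_cfg: str, overlays_lowest_first: list[LinkedOverlay]) -> str:
    """Prepend `exec server_overlay_<id>` for each exposed overlay.

    The exec lines keep the order of `overlays_lowest_first`, so the lowest
    overlay's config runs first and the blueprint's own config runs last.
    """
    exec_lines = [
        f"exec server_overlay_{o.overlay_id}"
        for o in overlays_lowest_first
        if o.expose_server_cfg
    ]
    return "\n".join([*exec_lines, blueprint_cfg])
```

Running the blueprint config last means any cvar it sets is applied after the overlay configs have run.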
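Commit d5b321b557 has the helper validate every lowerdir against an allowlist (installation/, overlays/, global_overlay_cache/, workshop_cache/) before running `mount -t overlay`. The sketch below covers that check and the resulting argv. It assumes all four roots live under one base directory. It also rejects `:` and `,`, because those characters would corrupt the overlay option string; that rule is this sketch's own addition. The function names are hypothetical, and parsing `L4D2_LOWERDIRS` out of instance.env is not shown.

```python
from pathlib import Path

# Allowlisted roots from the commit message; the shared `base` is an assumption.
ALLOWED_ROOTS = ("installation", "overlays", "global_overlay_cache", "workshop_cache")


def validate_lowerdir(base: Path, lowerdir: str) -> Path:
    """Resolve `lowerdir` (collapsing `..` and symlinks) and reject it unless
    it sits under one of the allowlisted roots."""
    resolved = Path(lowerdir).resolve()
    if any(ch in str(resolved) for ch in ":,"):
        raise ValueError(f"lowerdir contains a mount-option separator: {lowerdir}")
    for root in ALLOWED_ROOTS:
        # Component-wise check: overlaysX/ does not count as overlays/.
        if resolved.is_relative_to((base / root).resolve()):
            return resolved
    raise ValueError(f"lowerdir outside allowlist: {lowerdir}")


def overlay_mount_argv(base: Path, lowerdirs: list[str], upper: str, work: str, merged: str) -> list[str]:
    checked = [str(validate_lowerdir(base, d)) for d in lowerdirs]
    # overlayfs stacks lowerdirs left to right: the first entry is the top layer.
    opts = f"lowerdir={':'.join(checked)},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", opts, merged]
```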
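Commit d18b397330 makes `install_or_update` create `~/.steam/sdk32` and `sdk64` idempotently. The links prefer SteamCMD's linux32/linux64 sibling directories and fall back to `<install_dir>/bin/`. A sketch of that logic follows. The function names, the `steamcmd_dir` parameter, and the per-link fallback are assumptions. The design choice here is to leave a real file or directory at the link path alone (the call raises `FileExistsError`) rather than delete it.

```python
import os
from pathlib import Path
from typing import Optional


def ensure_symlink(link: Path, target: Path) -> None:
    """Make `link` a symlink to `target`; leave it if already correct, repair it otherwise."""
    if link.is_symlink():
        if Path(os.readlink(link)) == target:
            return  # already correct: nothing to do
        link.unlink()  # stale or wrong target: replace it
    link.parent.mkdir(parents=True, exist_ok=True)
    # Raises FileExistsError if a real file/dir occupies `link`.
    link.symlink_to(target, target_is_directory=True)


def ensure_steam_sdk_links(steamcmd_dir: Path, install_dir: Path, home: Optional[Path] = None) -> None:
    # Path.home() honors HOME, which the web unit sets to /var/lib/left4me.
    home = home if home is not None else Path.home()
    for bits in ("32", "64"):
        target = steamcmd_dir / f"linux{bits}"
        if not target.is_dir():
            target = install_dir / "bin"  # fallback when the sibling dir is missing
        ensure_symlink(home / ".steam" / f"sdk{bits}", target)
```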
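Commit f81e839ba2 validates instance names against `[a-z0-9][a-z0-9_-]{0,63}` at both the host library and web boundaries. A minimal sketch of such a check is below (the function name is hypothetical). It uses `re.fullmatch` because `re.match` with `^...$` would still accept a name followed by a trailing newline.

```python
import re

# Pattern from the commit message: 1 to 64 chars, no leading "-" or "_".
INSTANCE_NAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]{0,63}")


def validate_instance_name(name: str) -> str:
    # fullmatch: the whole string must match, so "../x", "a/b" or "x\n" are rejected.
    if not INSTANCE_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid instance name: {name!r}")
    return name
```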
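The same commit closes a login timing oracle by hashing a dummy digest when the user does not exist, so both paths cost one key derivation. The log does not say which password hasher the app uses. In the sketch below, stdlib PBKDF2 stands in, and the iteration count and function names are assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumption; the real hasher's cost is not stated


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


# Computed once at import so the unknown-user path pays the same KDF cost.
_DUMMY_SALT = os.urandom(16)
_DUMMY_DIGEST = hash_password("dummy", _DUMMY_SALT)


def check_login(users: dict[str, tuple[bytes, bytes]], username: str, password: str) -> bool:
    record = users.get(username)
    if record is None:
        # Hash anyway so response time does not reveal whether the user exists.
        hmac.compare_digest(hash_password(password, _DUMMY_SALT), _DUMMY_DIGEST)
        return False
    salt, digest = record
    return hmac.compare_digest(hash_password(password, salt), digest)
```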