Commit graph

20 commits

Author SHA1 Message Date
mwiegand
363f429c7a
l4d2host: LEFT4ME_STEAMCMD env var for steamcmd path
SteamInstaller defaults to steamcmd="steamcmd" (bare name), which relies
on PATH lookup. Deployments that don't have steamcmd on PATH — or where
steamcmd.sh's `cd "$(dirname "$0")"` breaks under PATH-symlink invocation
(observed with the Valve-shipped script) — can now pin an absolute path
via LEFT4ME_STEAMCMD. Default keeps bare-name lookup for dev/tests.
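
A minimal sketch of the lookup this commit describes, not the project's actual code. The function name `resolve_steamcmd` is hypothetical, and treating an empty value as "unset" is an assumption; only the variable name and the bare-name default come from the message above.

```python
import os


def resolve_steamcmd(environ=None):
    """Return the steamcmd executable to invoke.

    LEFT4ME_STEAMCMD pins an explicit (ideally absolute) path; without it
    the bare name "steamcmd" is returned and resolved via PATH at exec time.
    """
    env = os.environ if environ is None else environ
    # Assumption: an empty string counts as "not set".
    return env.get("LEFT4ME_STEAMCMD") or "steamcmd"
```

Passing the environment as a parameter keeps the sketch testable without mutating `os.environ`.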

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 22:46:21 +02:00
mwiegand
5eac51a93e
fix(deploy): wrap overlay helper with nsenter so it doesn't pin the unit's mount namespace
systemd's `+` Exec prefix removes sandbox/credentials but does NOT
detach from the unit's per-service mount namespace (created by
PrivateTmp/Protect*). The Python interpreter for the helper was
launched inside that namespace, and even though the helper internally
nsenter'd into PID 1 for the umount syscall, the calling Python
process itself never left the unit's namespace. Its existence pinned
the namespace alive, which kept the slave mount tree alive, which
made PID 1's umount return EBUSY for the entire duration of the
helper's run. The mount became unmountable the moment the helper
exited — empirically verified by polling /proc/*/ns/mnt during stop:
the only PID holding the dying namespace was the helper itself.
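
The /proc/*/ns/mnt polling mentioned above could look roughly like the following. This is an illustrative sketch rather than the script that was actually used; the function name and the `proc_root` parameter are hypothetical. It relies on the fact that processes in the same mount namespace have identical `ns/mnt` link targets.

```python
import os
from pathlib import Path


def pids_sharing_mount_ns(target_pid, proc_root="/proc"):
    """List the PIDs whose mount namespace matches target_pid's."""
    proc = Path(proc_root)
    wanted = os.readlink(proc / str(target_pid) / "ns" / "mnt")
    holders = []
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as "self"
        try:
            if os.readlink(entry / "ns" / "mnt") == wanted:
                holders.append(int(entry.name))
        except OSError:
            # The process exited mid-scan, or the link is not readable.
            continue
    return sorted(holders)
```

Reading every process's link generally requires root; unreadable entries are skipped here rather than reported.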

Wrap both ExecStartPre and ExecStopPost with `/usr/bin/nsenter
--mount=/proc/1/ns/mnt --` so the helper Python interpreter runs in
PID 1's mount namespace from the start. With the helper out of the
unit's namespace, umount succeeds first try once the cgroup empties.
Reset went from ~25 s with retry/lazy-fallback workarounds to ~0.5 s
clean.
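
As a rough illustration of the wrapping, the sketch below renders such a unit line. The helper names are hypothetical, and keeping the `+` prefix in front of nsenter is an assumption; only the nsenter invocation and the overlay helper path are taken from these commit messages.

```python
NSENTER_PREFIX = ["/usr/bin/nsenter", "--mount=/proc/1/ns/mnt", "--"]


def wrap_in_host_mount_ns(argv):
    """Prefix argv so the command starts in PID 1's mount namespace."""
    return NSENTER_PREFIX + list(argv)


def exec_line(directive, argv):
    """Render a unit-file line such as ExecStopPost=+... (sketch only)."""
    # Assumption: the '+' prefix is kept so nsenter itself runs privileged.
    return f"{directive}=+" + " ".join(wrap_in_host_mount_ns(argv))


line = exec_line(
    "ExecStopPost",
    ["/usr/local/libexec/left4me/left4me-overlay", "umount", "%i"],
)
```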

Knock-on cleanups:
- Helper drops internal nsenter for the syscalls (already in PID 1's
  namespace), and drops the eager-retry loop + lazy-umount fallback +
  inner work_inner retry (no race left to ride out).
- Revert TimeoutStopSec=60s back to 15s.
- Tests updated to expect the new argv shapes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 15:13:59 +02:00
mwiegand
ff6ce7b091
refactor(l4d2-host): unmount via ExecStopPost — single code path mirroring mount
Symmetric with the earlier mount cleanup (commits 519567e..a982995). Until
now, the unit's ExecStartPre handled mount but the Python side still drove
unmount: stop_instance and _purge_instance both called _mounter.unmount,
which wrapped sudo + the helper. Two code paths for two halves of the
same lifecycle.

Move unmount into the unit:

- ExecStopPost=+/usr/local/libexec/left4me/left4me-overlay umount %i
  (ExecStopPost, not ExecStop, so it runs after the cgroup is cleared;
  ExecStop runs while srcds is alive, so the umount syscall would fail with EBUSY.)
- Helper's umount verb is now idempotent (mirrors mount): if merged
  isn't a mount point, return early. PRINT_ONLY mode bypasses both
  short-circuits so the unit tests still exercise the full nsenter argv.
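
The idempotent umount verb with its PRINT_ONLY bypass might be sketched as below. The function name, its parameters, and the exact argv are assumptions made for illustration; the helper's real interface is not shown in this log.

```python
import os


def umount_argv(merged, print_only=False, ismount=os.path.ismount):
    """Return the argv to run, or None when there is nothing to unmount."""
    # Short-circuit: a path that is not a mount point needs no umount.
    # PRINT_ONLY skips the check so tests can still inspect the full argv.
    if not print_only and not ismount(merged):
        return None
    return ["nsenter", "--mount=/proc/1/ns/mnt", "--", "umount", merged]
```

Injecting `ismount` lets a test exercise both branches without a real overlay mount.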

Drop the dead Python machinery:

- _mounter.unmount(...) calls in stop_instance and _purge_instance
- _mounter global + KernelOverlayFSMounter import
- The whole l4d2host/fs/ package (OverlayMounter ABC + KernelOverlayFSMounter
  class) — no production callers, just self-tests
- l4d2host/tests/test_kernel_overlayfs.py
- test_stop_succeeds_when_unmount_fails / test_delete_succeeds_when_unmount_fails
  (tested Python-side unmount-failure tolerance that no longer exists)
- The l4d2host.fs.kernel_overlayfs.run_command monkeypatches in lifecycle tests

After this, the only thing start_instance does beyond cfg-staging is ask
systemd to enable+start the unit. stop/delete/reset only ask systemd to
disable; the overlay lifecycle lives entirely in the unit file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 13:09:52 +02:00
mwiegand
56f5c30296
refactor(l4d2-host): unit's ExecStartPre is the sole code path to the mount
Before this change there were two callers of left4me-overlay mount:
the web app's start_instance (Python, in-process) and the unit's
ExecStartPre (shell, via sudo). The duplication invited divergence; the
helper's recently-added idempotency made both paths technically work
but at the cost of a "first wins" race and dead-code retry logic in
start_instance.

Drop the in-process _mounter.mount() call from start_instance. The web
app now only stages cfg files (which still must happen on the host
filesystem before mount, to avoid overlayfs copy-up changing ownership),
then asks systemd to enable+start the unit; the unit's ExecStartPre
does the mount.

Removed:
- os.path.ismount(merged) refusal in start_instance and its test
  (test_start_refuses_to_double_mount). The race the check guarded
  against is now handled by the helper's idempotency.
- _load_instance_env helper and the `os` import (both became dead).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 12:54:05 +02:00
mwiegand
8552c559d3
feat(l4d2-host): server lifecycle uses systemctl enable --now / disable --now
Servers started via the web UI now create a WantedBy= symlink under
multi-user.target.wants/, so they auto-start on the next host reboot.
Helper verbs renamed start/stop -> enable/disable; service_control.py
renamed start_service/stop_service -> enable_service/disable_service.
The user-facing l4d2ctl start/stop commands keep their names per the
AGENTS.md contract -- only the implementation changes. Spec:
docs/superpowers/specs/2026-05-09-l4d2-server-lifecycle-reboot-and-drift-design.md
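
A small sketch of the renamed verbs, assuming they map directly onto systemctl. The function name and the unit name in the usage line are hypothetical; `systemctl enable --now` and `systemctl disable --now` are the standard commands named in the commit subject.

```python
def systemctl_argv(verb, unit):
    """Build the argv for `systemctl enable --now` / `disable --now`."""
    if verb not in ("enable", "disable"):
        raise ValueError(f"unsupported verb: {verb!r}")
    # --now starts (or stops) the unit in the same call that creates
    # (or removes) its WantedBy= symlink.
    return ["systemctl", verb, "--now", unit]


# Hypothetical template instance name, for illustration only.
argv = systemctl_argv("enable", "left4me-server@alpha.service")
```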

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 12:28:44 +02:00
mwiegand
985df970f8
feat(l4d2-web): per-overlay server.cfg aliases — expose checkbox + auto-exec
Each linked overlay gets a checkbox on the blueprint detail page that opts
its server.cfg in as exec server_overlay_<id>. The web app builds the
spec with {path, alias} per overlay and prepends exec server_overlay_<id>
lines to the blueprint config in lowest-overlay-first order. The host
stages those copies in the overlayfs upper layer before mounting (avoids
copy-up writes against a sandbox-uid file). A live preview block above the
Config textarea shows what gets auto-executed.
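
The auto-exec prepending could be sketched like this. It is a guess at the shape, not the web app's code: the function name is hypothetical, and the caller is assumed to pass the overlay ids already sorted lowest overlay first.

```python
def prepend_overlay_execs(config_text, overlay_ids):
    """Prepend one `exec server_overlay_<id>` line per exposed overlay."""
    # Assumption: overlay_ids is already in lowest-overlay-first order.
    exec_lines = [f"exec server_overlay_{oid}" for oid in overlay_ids]
    if not exec_lines:
        return config_text
    return "\n".join(exec_lines + [config_text])
```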

Schema:
- alembic 0007: BlueprintOverlay.expose_server_cfg BOOLEAN

Spec contract:
- l4d2host OverlayRef(path, alias?). load_spec accepts both bare-string
  and {path, alias} entries.
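
One way the two accepted entry shapes might be normalized is shown below. `OverlayRef` is named in the commit, but its fields, the `parse_overlay_entry` function, and the error handling are assumptions for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OverlayRef:
    path: str
    alias: Optional[str] = None


def parse_overlay_entry(entry):
    """Accept either a bare path string or a {path, alias} mapping."""
    if isinstance(entry, str):
        return OverlayRef(path=entry)
    if isinstance(entry, dict) and isinstance(entry.get("path"), str):
        return OverlayRef(path=entry["path"], alias=entry.get("alias"))
    raise ValueError(f"unsupported overlay entry: {entry!r}")
```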

Side effects folded in (same file in l4d2_facade):
- start_server auto-initializes; the manual Initialize step is no longer
  needed before Start.
- initialize_server no longer runs blueprint builders — builds happen on
  overlay save, not on every server Start.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 01:26:31 +02:00
mwiegand
6b4eef22c2
feat: server Reset action — wipe runtime, keep DB row
Reset stops the systemd service, unmounts the overlay, and rm -rf's both
runtime/<name> and instances/<name>, but keeps the Server row, blueprint,
and (shared) systemd template. Next Start re-initializes from the current
blueprint, so users can clean up logs/caches/accumulated game state without
losing the server.

Implementation factors a shared _purge_instance helper out of
delete_instance; reset_instance reuses it without the existence guard. New
"reset" lifecycle op flows through the same route + worker + facade plumbing
as the other server ops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 18:10:32 +02:00
mwiegand
93a60befb6
refactor(l4d2-host): start/stop/delete go through OverlayMounter; drop fuse module
Replace direct fuse-overlayfs / fusermount3 subprocess calls in
start_instance / stop_instance / delete_instance with the existing
OverlayMounter abstraction, now backed by KernelOverlayFSMounter.
Adds an os.path.ismount guard at the top of start_instance so a
kernel-level overlay that survived a web-worker crash isn't double-
mounted (kernel mounts persist when the cgroup dies, unlike fuse
daemons).

Delete the unused FuseOverlayFSMounter module.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:26:28 +02:00
mwiegand
d5b321b557
feat(l4d2-host): KernelOverlayFSMounter + left4me-overlay helper
New privileged helper at /usr/local/libexec/left4me/left4me-overlay
(Python, system /usr/bin/python3, stdlib only) takes only the instance
name, parses instance.env for L4D2_LOWERDIRS, validates each lowerdir
against an allowlist (installation/, overlays/, global_overlay_cache/,
workshop_cache/), refuses upperdirs tainted with user.fuseoverlayfs.*
xattrs from the prior fuse era, and execs `nsenter --mount=/proc/1/ns/mnt
-- mount -t overlay ...` so the resulting mount lives in the host
namespace. Mirrors the existing left4me-systemctl / left4me-journalctl
pattern; sudoers entry is verb-constrained.
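
The allowlist check on L4D2_LOWERDIRS might be approximated as follows. The four directory names come from the commit message; the base directory `/var/lib/left4me`, the colon separator, and the function name are assumptions for this sketch.

```python
from pathlib import PurePosixPath

ALLOWED_ROOTS = ("installation", "overlays", "global_overlay_cache", "workshop_cache")


def validate_lowerdirs(value, base="/var/lib/left4me"):
    """Split a colon-separated lowerdir list and check every entry."""
    base_path = PurePosixPath(base)
    dirs = [d for d in value.split(":") if d]
    for d in dirs:
        path = PurePosixPath(d)
        # Reject relative paths and any ".." component outright.
        if not path.is_absolute() or ".." in path.parts:
            raise ValueError(f"unsafe lowerdir: {d!r}")
        try:
            rel = path.relative_to(base_path)
        except ValueError:
            raise ValueError(f"lowerdir outside base: {d!r}") from None
        if not rel.parts or rel.parts[0] not in ALLOWED_ROOTS:
            raise ValueError(f"lowerdir not in allowlist: {d!r}")
    return dirs
```

This is a purely lexical check; a real helper would probably also need to consider symlinks inside the allowed roots.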

KernelOverlayFSMounter implements the existing OverlayMounter ABC,
deriving the instance name from the merged path. No call sites use it
yet — that's the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:23:58 +02:00
mwiegand
d5d710afa7
fix(l4d2-host): make stop_instance idempotent on the unmount step
systemctl stop is already a no-op on a stopped unit, but stop_instance
was unconditionally running fusermount3 -u and bubbling up the EINVAL
when the overlay wasn't currently mounted (e.g. server already stopped).
Mirror the established delete_instance pattern: always attempt the
unmount, swallow CalledProcessError, and label the step "(if mounted)".
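
The "always attempt, swallow the failure" pattern can be sketched as below. The function name and the injectable `run` parameter are hypothetical; `subprocess.run(..., check=True)` raising `CalledProcessError` on a non-zero exit is standard library behavior.

```python
import subprocess


def unmount_if_mounted(argv, run=subprocess.run):
    """Attempt the unmount; report False instead of raising on failure."""
    try:
        run(argv, check=True)
    except subprocess.CalledProcessError:
        # Typically the overlay simply was not mounted; treat as a no-op.
        return False
    return True
```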

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:24:04 +02:00
mwiegand
d18b397330
fix(host): create ~/.steam/sdk32 and sdk64 symlinks during install
L4D2 dedicated server expects to dlopen steamclient.so via
~/.steam/sdk32 (and sdk64). Without those symlinks, srcds_run logs
'cannot open shared object file' and SteamAPI_Init fails, which means
the server is invisible to the public Steam master server, Workshop
addon downloads break, and Steam 'Join Game' / lobby joins do not
reach the server (only direct-IP connect works).

SteamInstaller.install_or_update now ensures the symlinks exist after
SteamCMD finishes. Targets prefer SteamCMD's linux32/linux64 sibling
dirs and fall back to <install_dir>/bin/ if the siblings cannot be
located. Idempotent: re-running the install repairs or leaves the
symlinks alone.
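
An idempotent "repair or leave alone" symlink step could be written roughly like this. The function name is hypothetical, and refusing to replace a real file or directory is a choice made for this sketch, not something the commit states.

```python
import os
from pathlib import Path


def ensure_symlink(link, target):
    """Point `link` at `target`; return True only if something changed."""
    link, target = Path(link), Path(target)
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        if os.readlink(link) == str(target):
            return False  # already correct, leave it alone
        link.unlink()  # stale or wrong target, repair below
    elif link.exists():
        # Assumption: never clobber a real file or directory.
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(target, target_is_directory=True)
    return True
```

Calling it once per link, for example for `Path.home() / ".steam" / "sdk32"` and `sdk64`, would give the re-runnable behavior described above.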

Path.home() respects HOME, which the systemd web unit sets to
/var/lib/left4me, so the symlinks land in the left4me user's home.

Existing deploys can apply the fix by re-running 'Install' from /admin
without a full redeploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 02:11:27 +02:00
mwiegand
f81e839ba2
security: harden boundary inputs and production defaults
- validate instance names at the host lib and web boundary against
  [a-z0-9][a-z0-9_-]{0,63} to prevent path traversal via Server.name
- fail-closed on SECRET_KEY: load_config returns None when env unset,
  create_app raises if missing or "dev" outside TESTING
- close login timing oracle by hashing a dummy digest when the user
  is not found, equalizing response time
- set SESSION_COOKIE_SECURE outside TESTING
- delete_instance tolerates stop_service and fusermount3 failures so
  partially-initialized instances clean up without contract breaks;
  drops the is_mount() preflight that violated AGENTS.md
- document claim_next_job's single-process assumption
- clarify emit_step contract via docstring
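
The instance-name rule from the first bullet can be illustrated with the pattern quoted there. The function name is hypothetical; `re.fullmatch` is used so that a trailing newline or any extra characters are rejected.

```python
import re

# Pattern quoted in the commit message: 1 to 64 characters in total.
INSTANCE_NAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]{0,63}")


def is_valid_instance_name(name):
    """True only if the whole string matches the allowed pattern."""
    return isinstance(name, str) and INSTANCE_NAME_RE.fullmatch(name) is not None
```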

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 00:53:33 +02:00
mwiegand
1604859f41
feat(host): add step logging to steam_install
2026-05-06 20:41:39 +02:00
mwiegand
005d2d8458
fix(host): enforce flush=True to prevent pipeline block buffering
2026-05-06 20:34:41 +02:00
mwiegand
d977098344
feat(host): emit steps during initialize_instance
2026-05-06 20:04:08 +02:00
mwiegand
700b5be6f8
feat(host): add _emit_step helper for lifecycle logging
2026-05-06 20:00:07 +02:00
mwiegand
bbfc528354
feat(deploy): add production-like test deployment
2026-05-06 19:30:10 +02:00
mwiegand
de86139323
feat(l4d2): add l4d2ctl host command boundary
2026-05-06 16:35:20 +02:00
mwiegand
a347829608
feat(l4d2-web): add job pages and cancellation
2026-05-06 15:05:13 +02:00
mwiegand
288eda7c37
chore(l4d2): flatten component layout
2026-05-05 23:47:06 +02:00