Commit graph

91 commits

Author SHA1 Message Date
mwiegand
93a60befb6
refactor(l4d2-host): start/stop/delete go through OverlayMounter; drop fuse module
Replace direct fuse-overlayfs / fusermount3 subprocess calls in
start_instance / stop_instance / delete_instance with the existing
OverlayMounter abstraction, now backed by KernelOverlayFSMounter.
Adds an os.path.ismount guard at the top of start_instance so a
kernel-level overlay that survived a web-worker crash isn't double-
mounted (kernel mounts persist when the cgroup dies, unlike fuse
daemons).

Delete the unused FuseOverlayFSMounter module.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:26:28 +02:00
mwiegand
d5b321b557
feat(l4d2-host): KernelOverlayFSMounter + left4me-overlay helper
New privileged helper at /usr/local/libexec/left4me/left4me-overlay
(Python, system /usr/bin/python3, stdlib only) takes only the instance
name, parses instance.env for L4D2_LOWERDIRS, validates each lowerdir
against an allowlist (installation/, overlays/, global_overlay_cache/,
workshop_cache/), refuses upperdirs tainted with user.fuseoverlayfs.*
xattrs from the prior fuse era, and execs `nsenter --mount=/proc/1/ns/mnt
-- mount -t overlay ...` so the resulting mount lives in the host
namespace. Mirrors the existing left4me-systemctl / left4me-journalctl
pattern; sudoers entry is verb-constrained.

KernelOverlayFSMounter implements the existing OverlayMounter ABC,
deriving the instance name from the merged path. No call sites use it
yet — that's the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:23:58 +02:00
mwiegand
db120d77d3
docs(specs): kernel overlayfs migration design + plan
Captures the architectural fix for the mount-propagation bug: replace
fuse-overlayfs (rootless mount inside the web service's namespace, never
visible to host or to gameserver units) with kernel-native overlayfs
mounted via a privileged sudo helper that nsenters into PID 1's mount
namespace. Companion plan numbers the migration as five tasks ending in
end-to-end verification on the test box.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:19:26 +02:00
mwiegand
d5d710afa7
fix(l4d2-host): make stop_instance idempotent on the unmount step
systemctl stop is already a no-op on a stopped unit, but stop_instance
was unconditionally running fusermount3 -u and bubbling up the EINVAL
when the overlay wasn't currently mounted (e.g. server already stopped).
Mirror the established delete_instance pattern: always attempt the
unmount, swallow CalledProcessError, and label the step "(if mounted)".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:24:04 +02:00
mwiegand
38548ab0d7
chore(deploy): raise gunicorn thread pool to 32 for SSE headroom
Each SSE log-viewer or job-log stream holds a thread for its full
lifetime. With --threads 8, a handful of open browser tabs could
exhaust the pool. 32 keeps the same single-process scheduler invariant
(_claim_lock in job_worker is process-local) while giving SSE plenty
of headroom on the test box's user count.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:19:03 +02:00
mwiegand
4552af6544
fix(l4d2-web): keep SSE log stream from pinning gunicorn threads
stream_command used a blocking proc.stdout.readline() that never woke
when the underlying journalctl was silent, so Flask never delivered
GeneratorExit on client disconnect — the worker thread and the journalctl
child both leaked permanently and pinned the gunicorn thread pool.

Switch to a select-based read loop with a 15s heartbeat tick (yielded as
""), and translate the tick to an SSE keepalive comment in the log route.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:18:56 +02:00
mwiegand
ffc4cdbd7d
refactor(l4d2-web): remove legacy external overlay type
The workshop + managed-global overlay surface fully covers the
admin-SFTP flow that 'external' was a placeholder for. Drop the type
from the model defaults, builder registry, routes, template, and
tests, and add migration 0004 that deletes any leftover external
rows along with their blueprint and job references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 09:31:04 +02:00
mwiegand
92d6ebbe82
feat(l4d2-web): managed global map overlays with daily refresh
Adds two managed system overlays (l4d2center-maps, cedapug-maps) that
fetch curated map archives from upstream sources and reconcile addons
symlinks for non-Steam maps. A daily systemd timer enqueues a coalesced
refresh_global_overlays worker job; downloads, extraction, and rebuilds
run in the existing job worker and surface in the job log UI.

Schema: GlobalOverlaySource / GlobalOverlayItem / GlobalOverlayItemFile
plus nullable Job.user_id so system jobs render as "system" in the UI.
The new builder reconciles symlinks against the per-source vpk cache
and leaves foreign symlinks untouched. Initialize-time guard refuses
to mount a partial overlay if any expected vpk is missing from cache.

Refresh service uses shutil.move to handle EXDEV when /tmp and the
cache live on different filesystems.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 08:05:14 +02:00
mwiegand
4f78574edd
fix(l4d2-web): keep workshop refresh responsive
Limit workshop refresh downloads to one worker and commit build-overlay enqueue work before writing the final job log so SQLite locks do not wedge the web process.
2026-05-07 17:31:12 +02:00
mwiegand
0e83ee07d7
fix(deploy): make test deployments safe to rerun
Exclude local agent state from deploy archives, avoid recursive ownership over active runtime mounts, and let Alembic own schema upgrades before app startup.
2026-05-07 17:16:58 +02:00
mwiegand
b2a8d3d5e0
feat(deploy): workshop_cache provisioning
Adds /var/lib/left4me/workshop_cache to the deploy mkdir list (owned by
the left4me runtime user). Updates deploy/README.md to document the new
directory and the workshop overlay layout: web app downloads VPKs into
the cache and symlinks them into overlays/{overlay_id}/left4dead2/addons/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:53:49 +02:00
mwiegand
ac020d1e77
feat(l4d2-web): initialize-time guard for uncached workshop items
Before invoking l4d2ctl initialize, run each blueprint overlay's builder
synchronously and then verify that every workshop item attached to the
blueprint has a cache file on disk. If any are missing, raise a clear
error naming the overlay and the missing steam_ids — server start can't
silently mount a partial overlay where some maps are mysteriously absent
in-game.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:53:04 +02:00
mwiegand
df1ccb4cca
feat(l4d2-web): workshop overlay UI (routes + templates)
Adds workshop_routes blueprint with add-items / remove-item / manual-
build endpoints plus admin /admin/workshop/refresh. Add-items handles
single ID, single URL, multi-line batch, or a collection ID; auto-
enqueues a coalesced build_overlay job per call. Reject non-L4D2 items
with 400, duplicate associations with friendly toast, intruders with
403.

Generalizes overlay_routes: type+name only on create (no path field);
external is admin-only and system-wide, workshop is per-user and
auto-pathed. Update is name-only. Delete recursively removes the
on-disk dir only for managed paths (path == str(id)); legacy externals
are left in place. The pre-existing in-use guard is preserved.

Page routes filter the overlay listing by user permissions and load
workshop items + the latest related job for the detail view.

Templates: unified Create modal with type radio (no path field).
Type-aware overlay detail: workshop overlays show a multi-line input
+ items/collection radio + item table partial with thumbnails, manual
Rebuild button, and a small status indicator pulled from the latest
related job. Admin page gets a "Refresh all workshop items" button.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:50:54 +02:00
mwiegand
38a6fbbe1e
feat(l4d2-web): worker support for build_overlay and refresh_workshop_items
Extends SchedulerState with running_overlays / refresh_running /
blocked_servers_by_overlay, and updates can_start with the truth table:
install and refresh_workshop_items are global mutexes; build_overlay
serializes per-overlay; server jobs block on builds for any overlay
their blueprint references.

Adds enqueue_build_overlay coalescing helper that returns an existing
queued job for the same overlay rather than inserting a duplicate.

Adds run_job dispatch for build_overlay (BUILDERS[overlay.type].build)
and refresh_workshop_items (re-fetches metadata, re-downloads on
time_updated/filename change, enqueues coalesced rebuilds for affected
overlays).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:44:10 +02:00
mwiegand
700940d578
feat(l4d2-web): overlay builder registry with workshop builder
Adds l4d2web/services/overlay_builders.py with a BUILDERS dict mapping
Overlay.type to a builder class. ExternalBuilder is a no-op that just
ensures the overlay directory exists. WorkshopBuilder diff-applies
absolute symlinks under left4dead2/addons/ against the overlay's current
WorkshopItem associations: creates new ones, removes obsolete, leaves
unrelated files alone, and skips uncached items with a warning rather
than producing dangling symlinks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:40:30 +02:00
mwiegand
f0230e17d3
feat(l4d2-web): overlay path helpers and creation
Adds workshop_paths.cache_path(steam_id) returning
$LEFT4ME_ROOT/workshop_cache/{steam_id}.vpk with digit-only validation.

Adds overlay_creation.generate_overlay_path(id) and
create_overlay_directory(overlay) with exist_ok=False so a stray dir from a
prior failed delete surfaces loudly instead of shadowing fresh content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:38:39 +02:00
mwiegand
c6b41429ee
feat(l4d2-web): steam workshop API client and downloader
Adds l4d2web/services/steam_workshop.py: parse_workshop_input (single ID,
URL, or multi-line batch), resolve_collection (HTTPS POST to
GetCollectionDetails), fetch_metadata_batch (HTTPS POST to
GetPublishedFileDetails with consumer_app_id == 550 enforcement that
raises WorkshopValidationError in add-mode and silently skips in
refresh-mode), download_to_cache (atomic + idempotent on mtime+size),
and refresh_all (ThreadPoolExecutor with per-item error collection).

Adds requests as an explicit dependency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:37:39 +02:00
mwiegand
2543a05c12
feat(l4d2-web): typed overlays + workshop schema migration
Adds Overlay.type and Overlay.user_id with two partial unique indexes
(externals globally unique by name; user overlays unique per user).
Adds WorkshopItem registry keyed on steam_id and a pure many-to-many
overlay_workshop_items association. Adds Job.overlay_id for build_overlay
job tracking. Switches overlays.id to AUTOINCREMENT so deleted IDs are
never reused.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:35:13 +02:00
mwiegand
b46f52258d
docs(workshop): spec and plan for steam workshop overlays
Add a typed-overlay model with workshop as the first non-external type:
deduplicated WorkshopItem registry, symlink-based overlay directories,
auto-rebuild after item changes, admin global refresh, and a unified
Create-overlay UI with web-managed paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:25:13 +02:00
mwiegand
d18b397330
fix(host): create ~/.steam/sdk32 and sdk64 symlinks during install
L4D2 dedicated server expects to dlopen steamclient.so via
~/.steam/sdk32 (and sdk64). Without those symlinks, srcds_run logs
'cannot open shared object file' and SteamAPI_Init fails, which means
the server is invisible to the public Steam master server, Workshop
addon downloads break, and Steam 'Join Game' / lobby joins do not
reach the server (only direct-IP connect works).

SteamInstaller.install_or_update now ensures the symlinks exist after
SteamCMD finishes. Targets prefer SteamCMD's linux32/linux64 sibling
dirs; falls back to <install_dir>/bin/ if the siblings cannot be
located. Idempotent: re-running the install repairs or leaves the
symlinks alone.

Path.home() respects HOME, which the systemd web unit sets to
/var/lib/left4me, so the symlinks land in the left4me user's home.

Existing deploys can apply the fix by re-running 'Install' from /admin
without a full redeploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 02:11:27 +02:00
mwiegand
1968684c03
fix(deploy): MountFlags=shared on web service for fuse mount propagation
ProtectSystem=full + ReadWritePaths implicitly give the unit a private
mount namespace (systemd needs to remount /usr read-only). The default
namespace propagation is slave, so mounts the worker creates inside
never reach the host. The gameserver units (started via systemctl,
each with their own namespace) then inherit a host that lacks the
overlay, and their CHDIR into /var/lib/left4me/runtime/<name>/merged
fails.

Set MountFlags=shared so mount events propagate from the worker's
namespace back to the host, then onward to gameserver units at their
unshare time.

Verified on test box: nsenter -t <gunicorn-pid> -m mount showed the
fuse-overlayfs mount inside the worker but plain mount on the host
did not, while web unit had ProtectSystem=full + ReadWritePaths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 02:01:24 +02:00
mwiegand
593611e194
fix(deploy): drop PrivateTmp on web service so fuse mounts propagate
PrivateTmp=true gives the unit a private mount namespace. The worker's
fuse-overlayfs mount lives only inside that namespace, so the host
cannot see it and the gameserver unit (started via systemctl, with its
own namespace inherited from the host) also cannot see it. The
gameserver unit then fails CHDIR on
/var/lib/left4me/runtime/<name>/merged/left4dead2.

The mount must land in the host namespace so the gameserver unit
inherits it at unshare time. Remaining hardening: dedicated user,
ProtectSystem=full, ReadWritePaths, sudoers allowlist limited to two
helper scripts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 01:57:43 +02:00
mwiegand
56b9523d88
fix(deploy): drop NoNewPrivileges on web service so FUSE mounts work
The job worker calls fusermount3 (setuid-root) to mount per-instance
FUSE overlays and sudo to invoke the privileged systemctl wrapper.
NoNewPrivileges=true blocks both, surfacing as
"fusermount3: mount failed: Operation not permitted" the first time a
server is started. Hardening is still enforced via dedicated user,
PrivateTmp, ProtectSystem=full, ReadWritePaths, and the narrow sudoers
allowlist limited to two helper scripts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 01:51:39 +02:00
mwiegand
d14ed9c117
feat(web): blueprint-prefilled create-server flow + empty-state CTA
- Per-row "Create server" link on /blueprints navigates to
  /servers?blueprint_id=<id>; that page validates the param against
  the user's owned blueprints, pre-selects the option, and auto-opens
  the create modal.
- /servers empty-blueprint state now shows an actionable
  "Create a blueprint first ->" link (styled like the primary button)
  pointing at /blueprints, replacing the silent disabled "+ Create"
  button + muted hint.
- Drop the "Reassign blueprint" form on the server detail page
  along with the unused POST /servers/<id> form route. The JSON
  PATCH /servers/<id> endpoint is retained.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 01:47:33 +02:00
mwiegand
923a1840f4
feat(web): forms in modals, edit/delete on detail pages, port auto-assign
- Native <dialog> modal infra (CSS + ~30 LOC JS, no framework) used for
  create forms and delete confirmations.
- Index pages become listing-only: + Create button opens a modal; the
  broken blueprint Actions column and inline overlay edit cells are gone.
- Server detail gains a blueprint reassignment form; existing Delete
  button now opens a confirmation modal before tearing down the runtime.
- Blueprint detail gains a Delete button + confirmation modal (was
  unreachable from the UI before).
- New overlay detail page at /overlays/<id> with edit form, "Used by"
  blueprints list, and delete (admin only).
- Server create: port field is now optional; backend auto-assigns the
  next free port from LEFT4ME_PORT_RANGE_START/_END (default
  27015-27115). 409 on range exhaustion.
- New routes: POST /blueprints/<id>/delete (form sentinel matching
  overlays pattern), POST /servers/<id> (form-friendly blueprint
  reassign), GET /overlays/<id>.
- Server delete operation now redirects to /servers; overlay update
  redirects to /overlays/<id>.

Server rename remains unsupported pending an id-vs-name design pass for
l4d2host (the runtime directory is name-keyed; renaming would orphan
files).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 01:30:33 +02:00
mwiegand
7d9939c71d
fix(deploy): exclude macOS AppleDouble files from deploy archive
When tar runs on macOS it embeds ._* resource-fork sidecars next to each
file. These ended up under l4d2web/alembic/versions/ on the target and
alembic tried to import them as migration modules, failing with
"source code string cannot contain null bytes". Set COPYFILE_DISABLE=1
and add an --exclude '._*' so the archive is portable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 00:58:29 +02:00
mwiegand
0210ecd301
config: allow SESSION_COOKIE_SECURE override and disable on test deploy
The HTTP-only test deployment binds gunicorn to 0.0.0.0:8000 with no TLS
terminator, so a hardcoded SESSION_COOKIE_SECURE=True breaks browser
login. Make it opt-out via env (default True outside TESTING) and set
SESSION_COOKIE_SECURE=false in the generated web.env so the test box
keeps working over HTTP.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 00:56:48 +02:00
mwiegand
f81e839ba2
security: harden boundary inputs and production defaults
- validate instance names at the host lib and web boundary against
  [a-z0-9][a-z0-9_-]{0,63} to prevent path traversal via Server.name
- fail-closed on SECRET_KEY: load_config returns None when env unset,
  create_app raises if missing or "dev" outside TESTING
- close login timing oracle by hashing a dummy digest when the user
  is not found, equalizing response time
- set SESSION_COOKIE_SECURE outside TESTING
- delete_instance tolerates stop_service and fusermount3 failures so
  partially-initialized instances clean up without contract breaks;
  drops the is_mount() preflight that violated AGENTS.md
- document claim_next_job's single-process assumption
- clarify emit_step contract via docstring

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 00:53:33 +02:00
mwiegand
3809f85795
fix: load environment variables for alembic upgrade in deploy script to ensure database url is set properly 2026-05-06 21:01:35 +02:00
mwiegand
441c1db79b
fix: change directory before running alembic upgrade in deploy script to avoid pyproject.toml permission issues 2026-05-06 21:00:59 +02:00
mwiegand
fa566db820
fix: apply alembic migrations automatically on deployment 2026-05-06 21:00:25 +02:00
mwiegand
c01359002a
fix: enable batch operations in alembic for sqlite unique constraints 2026-05-06 20:59:18 +02:00
mwiegand
7e5a8f89b5
docs: add server port constraint implementation plan 2026-05-06 20:53:50 +02:00
mwiegand
114b141e2a
test: add test for duplicate port constraint 2026-05-06 20:53:04 +02:00
mwiegand
fa002ce0f2
feat: enforce unique port constraint on servers 2026-05-06 20:52:46 +02:00
mwiegand
833ae318cf
fix(deploy): add venv to PATH in left4me-web systemd service 2026-05-06 20:45:37 +02:00
mwiegand
1604859f41
feat(host): add step logging to steam_install 2026-05-06 20:41:39 +02:00
mwiegand
005d2d8458
fix(host): enforce flush=True to prevent pipeline block buffering 2026-05-06 20:34:41 +02:00
mwiegand
27a905c22b
feat(web): add boundary log lines to job worker execution 2026-05-06 20:18:23 +02:00
mwiegand
38d04e8551
feat(host): emit steps during start, stop, and delete operations 2026-05-06 20:07:00 +02:00
mwiegand
d977098344
feat(host): emit steps during initialize_instance 2026-05-06 20:04:08 +02:00
mwiegand
700b5be6f8
feat(host): add _emit_step helper for lifecycle logging 2026-05-06 20:00:07 +02:00
mwiegand
ee144fad96
feat(l4d2-web): add server creation form 2026-05-06 19:41:04 +02:00
mwiegand
bbfc528354
feat(deploy): add production-like test deployment 2026-05-06 19:30:10 +02:00
mwiegand
de86139323
feat(l4d2): add l4d2ctl host command boundary 2026-05-06 16:35:20 +02:00
mwiegand
a347829608
feat(l4d2-web): add job pages and cancellation 2026-05-06 15:05:13 +02:00
mwiegand
91d042cf33
feat(l4d2-web): execute queued lifecycle jobs 2026-05-06 14:08:18 +02:00
mwiegand
df680f6226
fix(l4d2-web): reject encoded unsafe redirects 2026-05-06 13:24:04 +02:00
mwiegand
58fb8b2b63
fix(l4d2-web): harden auth redirect targets 2026-05-06 13:01:48 +02:00
mwiegand
deca2c9153
docs(l4d2-web): update auth contract 2026-05-06 12:55:38 +02:00