Commit graph

117 commits

Author SHA1 Message Date
mwiegand
5e2c771276
chore(l4d2-web): remove orphaned 'Global map overlays' admin section
The route /admin/global-overlays/refresh was removed with the script-overlays
rewrite (migration 0005 dropped the global_overlay_* tables; the systemd
refresh units were deleted from deploy/). The admin-page form was left
behind and would 404 on submit. Drop the section and lock it out with an
assertion in the existing admin-pages test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:25:15 +02:00
mwiegand
ebddb0fab2
chore(deploy): install p7zip + coreutils for script-overlay tooling
Script overlays commonly need 7z and md5sum (e.g. the l4d2center map
sync recipe). Add p7zip-full to the apt install line, p7zip + p7zip-plugins
to dnf, and coreutils explicitly so md5sum is guaranteed even on slim base
images. Lock both in with a regression test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:23:23 +02:00
mwiegand
406f2196f8
fix(l4d2-web): write sandbox script tmpfile under LEFT4ME_ROOT, not /tmp
The web service unit has PrivateTmp=yes: its /tmp is a per-instance
namespace at /tmp/systemd-private-X-left4me-web.service-Y/tmp/ from
PID 1's perspective. When ScriptBuilder writes /tmp/tmpXXX.sh and
passes that path to the sandbox helper, systemd-run asks PID 1 to set
up BindReadOnlyPaths=${SCRIPT}:/script.sh — but PID 1 lives in the host
namespace and can't resolve the web service's PrivateTmp path. The
unit fails to start with status=226/NAMESPACE and "Failed to set up
mount namespacing: /script.sh: No such file or directory".

Move the tmpfile to ${LEFT4ME_ROOT}/sandbox-scripts/. /var/lib is not
affected by PrivateTmp (only /tmp and /var/tmp are), so PID 1 can
resolve the path. The web service has ReadWritePaths=/var/lib/left4me
already, and the directory is created on demand by Python.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:14:21 +02:00
mwiegand
023cc5c9b0
fix(deploy): chown WAL+SHM sidecars too, not just left4me.db
SQLite in WAL mode (the default for this app) maintains left4me.db-wal
and left4me.db-shm sidecar files alongside the main DB. All three must
be writable by the web service uid; if any one is root-owned, SQLite
reports "attempt to write a readonly database" on the next INSERT —
which surfaced as a 500 on POST /overlays/{id}/script after I'd done
ad-hoc root-side sqlite3.connect() inspection earlier and the resulting
root-owned WAL/SHM persisted.

Loop over all three paths in the deploy chmod step so root-owned
sidecars are corrected on every deploy. Idempotent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:11:42 +02:00
mwiegand
f6ca85fc6f
fix(deploy): chown left4me.db to left4me:left4me, not root:left4me
The v2 hardening tightened the DB to mode 0640 owned by root:left4me,
intending to block reads from the sandbox uid (l4d2-sandbox, not in the
left4me group). It did — but it also took away write access from the web
service itself, which runs as user left4me. With root owning the file,
left4me only had group-read; INSERTs into the jobs table failed with
"attempt to write a readonly database" and surfaced as a 500 on POST
/overlays/{id}/script.

Owner left4me + group left4me + mode 0640 keeps the same external
posture (l4d2-sandbox gets nothing via "other") while restoring the
web service's read+write access via "owner".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:09:47 +02:00
mwiegand
abc907b14b
docs(specs): script sandbox v3 — egress filter design + plan
Captures the v3 design: IPAddressDeny= alone (no IPAddressAllow=any
because the documented "more specific wins" semantics don't hold on
systemd 257 / kernel 6.12 — the allow trumps unconditionally), explicit
CIDRs (the -p parser rejects the localhost/link-local shorthand
keywords), and a static sandbox-only resolv.conf bind to keep DNS
reachable when private RFC1918 ranges are blocked.

Plan documents what was implemented (in 7e66936) and the lessons
surfaced during execution so the next person doesn't have to rediscover
them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:08:47 +02:00
mwiegand
7e66936d03
feat(deploy): restrict script-sandbox egress to public internet only
Adds IPAddressDeny= to the sandbox unit covering loopback (127/8 + ::1),
link-local (169.254/16 + fe80::/10), multicast (224/4 + ff00::/8), all
RFC1918 v4 (10/8, 172.16/12, 192.168/16), CGNAT (100.64/10), and ULA v6
(fc00::/7). The kernel attaches systemd's sd_fw_egress BPF program to
the unit's cgroup; egress packets matching any of the deny prefixes are
silently dropped at the cgroup boundary.

Important: do NOT pair this with `IPAddressAllow=any`. Documentation
claims "more specific rule wins" but on this systemd 257 + kernel 6.12
combo, having both set causes the allow to win unconditionally — the
deny gets ignored. Empty IPAddressAllow + populated IPAddressDeny is the
correct shape: kernel default "allow all" applies to non-listed
addresses, and the listed prefixes are blocked.

Because the host's resolv.conf typically points at a private-IP DNS
server (10.0.0.1 in the test deploy), blocking RFC1918 also kills DNS.
Adds a static /etc/left4me/sandbox-resolv.conf with public resolvers
(Cloudflare 1.1.1.1, Google 8.8.8.8) and bind-mounts that into the
sandbox at /etc/resolv.conf, replacing the host's resolver inside the
sandbox only.

Smoke-tested on ckn@10.0.4.128:
- public 1.1.1.1:443: CONNECTED
- public HTTPS via DNS (steamcommunity.com): 200
- localhost web app 127.0.0.1:8000: blocked (TimeoutError)
- localhost sshd 127.0.0.1:22: blocked
- private LAN ssh 10.0.4.128:22: blocked
- private DNS 10.0.0.1:53: blocked

AF_UNIX stays in RestrictAddressFamilies — dropping it would risk
breaking NSS / syslog for marginal gain, and the IP-level filter
addresses the primary threat (reaching the host's HTTP/SSH services).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:04:57 +02:00
mwiegand
ae443299c8
chore(deploy): drop bubblewrap apt dep + tighten left4me.db mode
bubblewrap is no longer used now that left4me-script-sandbox runs as a
systemd service unit. Remove it from the apt-get and dnf install lines.

Also tighten the application database file mode after the alembic
upgrade step: chown root:left4me, chmod 0640. The DB had been created
at default 0644 by SQLite's open() call inside the web service, which
made it world-readable on the host — i.e. readable by any uid that can
traverse /var/lib/left4me, including the sandbox's l4d2-sandbox uid.
Smoke-testing the v2 sandbox prototype on ckn@10.0.4.128 surfaced this:
the sandbox could read "SQLite format 3" from the DB until the parent
dir was masked with TemporaryFileSystem=. Tightening the file mode is
the host-level fix; the sandbox-level mask is defense in depth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:48:26 +02:00
mwiegand
4ee8f6af44
refactor(deploy): rewrite left4me-script-sandbox to systemd-only — drop bwrap
Replaces the systemd-run --scope + bwrap composition with systemd-run in
service-unit mode (--pipe --wait, transient .service unit). Same cgroup
limits and walltime kill, plus the hardening directives that --scope
units cannot carry: NoNewPrivileges, ProtectSystem=strict, ProtectHome,
ProtectKernel{Tunables,Modules,Logs,ControlGroups}, RestrictNamespaces,
RestrictAddressFamilies, RestrictSUIDSGID, LockPersonality,
MemoryDenyWriteExecute, SystemCallFilter (seccomp), and an empty
CapabilityBoundingSet (drops all caps). UID drop via User=/Group=.

The TemporaryFileSystem="/etc /var/lib" pair is the gotcha:
ProtectSystem=strict makes /var/lib *read-only* but visible, so the host
DB at /var/lib/left4me/left4me.db (mode 0644) was readable from inside.
Masking /var/lib with tmpfs hides the entire subtree; the BindPaths bind
to /overlay is at a different path and unaffected.

The Python side (ScriptBuilder, run_sandboxed_script, routes) is
unchanged — same sudo-helper invocation, same argv shape.

Loses PID-namespace isolation (no PrivatePID= directive in systemd).
Host PIDs are visible via /proc and ps -ef but not signal-able due to
UID mismatch — information disclosure only, not a privilege boundary.

Smoke-tested on ckn@10.0.4.128 prior to this commit; all isolation
invariants reproduced and the hardening directives provably blocked
unshare(2), mount(2), personality(2), bpf(2), and sysctl writes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:47:30 +02:00
mwiegand
efaaf84cd9
docs(specs): script sandbox v2 — systemd-only design + plan
Spec captures the v2 architecture (systemd-run service mode with full
hardening directives, no bwrap), the two surfaces in scope (helper
rewrite + bubblewrap dep removal + left4me.db mode tightening), and the
gotchas surfaced by smoke-testing the prototype on ckn@10.0.4.128:
- ProtectSystem=strict makes /var/lib/left4me visible (not invisible);
  must add TemporaryFileSystem=/var/lib to mask it.
- Script bind via BindReadOnlyPaths uses ${SCRIPT}:/script.sh syntax.
- No PrivatePID= directive in systemd; host PIDs visible via /proc.
  Information disclosure only — kernel UID-mismatch blocks signals.

Plan breaks the migration into 4 tasks (helper rewrite, deploy-script
deps + DB mode, host smoke-test, drift sweep) with explicit rollback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:46:13 +02:00
mwiegand
a62f26ba4a
fix(l4d2-web): normalize CRLF to LF in script overlay POST
HTML <textarea> form submission encodes line breaks as CRLF per spec.
Storing those CRLFs unchanged means every line of the script reaches
bash with a trailing \r, which bash treats as part of the argument —
turning "ls /" into "ls /\r" and failing. Normalize CRLF/CR → LF in the
/overlays/{id}/script handler so storage and the sandbox tmpfile are
LF-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:20:10 +02:00
mwiegand
908bca3687
fix(l4d2-web): ScriptBuilder — chmod script tmpfile to 0644 for sandbox read
NamedTemporaryFile creates the script file at mode 0600 owned by the
left4me web user. The sandbox runs as l4d2-sandbox and bwrap bind-mounts
the file read-only at /script.sh, but the kernel still enforces the
underlying file's permissions — l4d2-sandbox can't read 0600 left4me
files, so /bin/bash /script.sh fails with "Permission denied".

Script content is not a secret (it's stored in the DB and editable by
the user), so 0644 is appropriate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:18:00 +02:00
mwiegand
cf865d4915
fix(deploy): one-shot cleanup of orphan overlay dirs after globals removal
Migration 0005_script_overlays drops the legacy l4d2center_maps /
cedapug_maps overlay rows but leaves their /var/lib/left4me/overlays/{id}
directories on disk. When the web app subsequently creates a new overlay
and AUTOINCREMENT issues an id matching one of those orphans,
create_overlay_directory(exist_ok=False) crashes with FileExistsError —
which surfaced as a 500 on POST /overlays the first time a script
overlay was created on a deployed test box.

Adds a sentinel-gated sweep in deploy-test-server.sh that lists overlay
ids in the DB, removes any directory under overlays/ whose id has no
matching row, and drops the now-unused global_overlay_cache. Mirrors the
.kernel-overlay-migrated sentinel pattern so reruns are no-ops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:16:33 +02:00
mwiegand
06ae84fbe4
fix(deploy): script-sandbox helper — UID drop via systemd-run, --unshare-user-try, /etc/alternatives
Smoke testing on the test host revealed three issues with the helper as
shipped:

1. bwrap 0.11+ rejects --uid without --unshare-user. Switching the UID
   drop from inside bwrap to systemd-run (--uid=l4d2-sandbox
   --gid=l4d2-sandbox) sidesteps the userns UID-mapping headaches and
   keeps file ownership on the bind-mounted /overlay matching
   l4d2-sandbox on the host (which the wipe path relies on).

2. bwrap running as an unprivileged uid still needs a user namespace to
   set up its mount-namespace bind-mounts. Adding --unshare-user-try
   gives it the userns context when needed and is a no-op otherwise.

3. /etc/alternatives wasn't bind-mounted, so symlinked tools like
   /usr/bin/awk -> /etc/alternatives/awk fell over inside the sandbox.
   Adds the ro-bind.

Also: the helper now chowns the overlay dir to l4d2-sandbox before bwrap
(idempotent — needed because the web app creates the dir as left4me),
and the deploy script chmods /var/lib/left4me to 0711 so l4d2-sandbox
can traverse to the bind-mount source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:12:46 +02:00
mwiegand
1e62a44c16
docs(deploy): replace globals overlay description with script overlays
deploy/README.md still described the deleted managed-global overlays as
the second overlay surface. Replace with a description of script
overlays (bubblewrap + systemd-run sandbox, resource caps).

Full test sweep: 367 passing, 2 skipped across l4d2web, l4d2host, deploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:56:24 +02:00
mwiegand
e51a4d58a4
chore(deploy): provision l4d2-sandbox + bubblewrap; drop globals refresh timer
deploy-test-server.sh: provisions the l4d2-sandbox system user (no home,
nologin shell) and installs the bubblewrap apt/dnf package; copies the
left4me-script-sandbox helper into /usr/local/libexec/left4me with mode
0755. Drops the global_overlay_cache directory provisioning, the
refresh-global-overlays unit installation, and the timer enable.

Deletes the orphaned left4me-refresh-global-overlays.{service,timer}
files. Trims the matching paragraph from deploy/README.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:54:57 +02:00
mwiegand
75e703e1a4
feat(deploy): left4me-script-sandbox helper + sudoers fragment
Privileged bash helper that wraps user-authored scripts in
systemd-run --scope (cgroup limits + RuntimeMaxSec=3600) inside a
bubblewrap sandbox dropped to the l4d2-sandbox uid. Network is shared
with the host so scripts can fetch from Steam / l4d2center / etc.;
filesystem is RO except for /overlay (rw bind from
/var/lib/left4me/overlays/{id}) and tmpfs /tmp + /run.

Adds a sudoers rule allowing the left4me user to invoke this helper
without restrictions on its arguments. Strict argument validation is
in the helper itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:53:21 +02:00
mwiegand
d351bcbee5
feat(l4d2-web): script overlay UI
Adds the script type to the create-overlay modal (with an admin-only
system-wide checkbox) and a script-section to the detail page: textarea
for the bash body, Save / Rebuild / Wipe buttons, last_build_status
badge, latest-build-job link, and a Wipe confirm modal. Removes the
GlobalOverlaySource block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:50:36 +02:00
mwiegand
be22744d54
feat(l4d2-web): script overlay routes (script update / wipe / build)
Adds POST /overlays/{id}/script, /wipe, /build under the overlay blueprint.
Generalizes /build to handle any owner/admin-editable overlay (deletes the
duplicate workshop-specific manual_build). Wipe runs the literal script
"find /overlay -mindepth 1 -delete" through run_sandboxed_script and
refuses with 409 while a build_overlay job is running. Adds an
admin-only system_wide=1 flag to POST /overlays for system-wide creation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:48:15 +02:00
mwiegand
879c54cbda
refactor(l4d2-web): drop refresh_global_overlays from scheduler
GLOBAL_OPERATIONS becomes {"install", "refresh_workshop_items"}.
Removes refresh_global_overlays_running from SchedulerState and the
_run_refresh_global_overlays dispatch. Drops dead test cases and pins
GLOBAL_OPERATIONS contents.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:45:34 +02:00
mwiegand
9f476e3456
refactor(l4d2-web): drop global-overlays subsystem in favor of script type
Deletes the global_map_sources, global_overlay_refresh, global_map_cache,
and global_overlays service modules and their tests. Removes the
refresh-global-overlays CLI command, the /admin/global-overlays/refresh
route, and the GlobalOverlaySource view in overlay_detail rendering.
Drops py7zr from dependencies — was only used by the deleted subsystem.

The job_worker scheduler still tracks refresh_global_overlays; that
cleanup is Task 4. Deploy/README references are Task 8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:43:41 +02:00
mwiegand
d29afa41fa
feat(l4d2-web): ScriptBuilder + BUILDERS registry update
Adds ScriptBuilder that runs user-authored bash inside the
left4me-script-sandbox helper via run_command, with a 20 GB post-build
disk cap. Registry now {"workshop", "script"}.
finish_job writes Overlay.last_build_status on build_overlay completion.
Drops GlobalMapOverlayBuilder and the now-unreachable
_check_global_overlay_caches in l4d2_facade.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:39:13 +02:00
mwiegand
43dc9b0ccf
feat(l4d2-web): script overlay schema — add overlay.script + last_build_status, drop globals tables
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:33:04 +02:00
mwiegand
78ead0b41d
docs(specs): script overlay type — design + implementation plan
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:27:14 +02:00
mwiegand
9985ecc56c
chore(deploy): cleanup left4me-web hardening + docs for kernel overlayfs
Drop MountFlags=shared (the assumption that it propagated fuse mounts
to host was incorrect on systemd 257 with ProtectSystem+ReadWritePaths).
Restore PrivateTmp=true (was dropped in 593611e for fuse propagation
that did not work). Rewrite the comment block to describe the new
model: mounts go through the left4me-overlay helper which nsenters
into PID 1's mount namespace, so the unit's mount-ns layout is no
longer load-bearing.

Update the three user-facing READMEs (root, l4d2host, deploy) to drop
fuse-overlayfs / fusermount3 prereqs and call out the kernel overlayfs
mount path through the privileged helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:29:49 +02:00
mwiegand
172e574a00
chore(deploy): drop fuse-overlayfs apt dep + one-shot migrate upper/work
Drop fuse-overlayfs / fuse3 from the apt/dnf install line — the new
mount path is kernel overlayfs via the left4me-overlay helper, no
fuse userspace needed.

Add a one-shot migration block gated by /var/lib/left4me/.kernel-overlay-migrated
that runs before daemon-reload: stop gameservers + web service, force-
unmount any leftover fuse or overlay mounts under runtime/, then wipe
and recreate empty upper/ and work/ for every instance. fuse-overlayfs
running as a non-root user used user.fuseoverlayfs.* xattrs that kernel
overlayfs ignores, so a pre-existing upper/ from the fuse era would
resurrect "deleted" files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:28:00 +02:00
mwiegand
93a60befb6
refactor(l4d2-host): start/stop/delete go through OverlayMounter; drop fuse module
Replace direct fuse-overlayfs / fusermount3 subprocess calls in
start_instance / stop_instance / delete_instance with the existing
OverlayMounter abstraction, now backed by KernelOverlayFSMounter.
Adds an os.path.ismount guard at the top of start_instance so a
kernel-level overlay that survived a web-worker crash isn't double-
mounted (kernel mounts persist when the cgroup dies, unlike fuse
daemons).

Delete the unused FuseOverlayFSMounter module.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:26:28 +02:00
mwiegand
d5b321b557
feat(l4d2-host): KernelOverlayFSMounter + left4me-overlay helper
New privileged helper at /usr/local/libexec/left4me/left4me-overlay
(Python, system /usr/bin/python3, stdlib only) takes only the instance
name, parses instance.env for L4D2_LOWERDIRS, validates each lowerdir
against an allowlist (installation/, overlays/, global_overlay_cache/,
workshop_cache/), refuses upperdirs tainted with user.fuseoverlayfs.*
xattrs from the prior fuse era, and execs `nsenter --mount=/proc/1/ns/mnt
-- mount -t overlay ...` so the resulting mount lives in the host
namespace. Mirrors the existing left4me-systemctl / left4me-journalctl
pattern; sudoers entry is verb-constrained.

KernelOverlayFSMounter implements the existing OverlayMounter ABC,
deriving the instance name from the merged path. No call sites use it
yet — that's the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:23:58 +02:00
mwiegand
db120d77d3
docs(specs): kernel overlayfs migration design + plan
Captures the architectural fix for the mount-propagation bug: replace
fuse-overlayfs (rootless mount inside the web service's namespace, never
visible to host or to gameserver units) with kernel-native overlayfs
mounted via a privileged sudo helper that nsenters into PID 1's mount
namespace. Companion plan numbers the migration as five tasks ending in
end-to-end verification on the test box.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:19:26 +02:00
mwiegand
d5d710afa7
fix(l4d2-host): make stop_instance idempotent on the unmount step
systemctl stop is already a no-op on a stopped unit, but stop_instance
was unconditionally running fusermount3 -u and bubbling up the EINVAL
when the overlay wasn't currently mounted (e.g. server already stopped).
Mirror the established delete_instance pattern: always attempt the
unmount, swallow CalledProcessError, and label the step "(if mounted)".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:24:04 +02:00
mwiegand
38548ab0d7
chore(deploy): raise gunicorn thread pool to 32 for SSE headroom
Each SSE log-viewer or job-log stream holds a thread for its full
lifetime. With --threads 8, a handful of open browser tabs could
exhaust the pool. 32 keeps the same single-process scheduler invariant
(_claim_lock in job_worker is process-local) while giving SSE plenty
of headroom on the test box's user count.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:19:03 +02:00
mwiegand
4552af6544
fix(l4d2-web): keep SSE log stream from pinning gunicorn threads
stream_command used a blocking proc.stdout.readline() that never woke
when the underlying journalctl was silent, so Flask never delivered
GeneratorExit on client disconnect — the worker thread and the journalctl
child both leaked permanently and pinned the gunicorn thread pool.

Switch to a select-based read loop with a 15s heartbeat tick (yielded as
""), and translate the tick to an SSE keepalive comment in the log route.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:18:56 +02:00
mwiegand
ffc4cdbd7d
refactor(l4d2-web): remove legacy external overlay type
The workshop + managed-global overlay surface fully covers the
admin-SFTP flow that 'external' was a placeholder for. Drop the type
from the model defaults, builder registry, routes, template, and
tests, and add migration 0004 that deletes any leftover external
rows along with their blueprint and job references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 09:31:04 +02:00
mwiegand
92d6ebbe82
feat(l4d2-web): managed global map overlays with daily refresh
Adds two managed system overlays (l4d2center-maps, cedapug-maps) that
fetch curated map archives from upstream sources and reconcile addons
symlinks for non-Steam maps. A daily systemd timer enqueues a coalesced
refresh_global_overlays worker job; downloads, extraction, and rebuilds
run in the existing job worker and surface in the job log UI.

Schema: GlobalOverlaySource / GlobalOverlayItem / GlobalOverlayItemFile
plus nullable Job.user_id so system jobs render as "system" in the UI.
The new builder reconciles symlinks against the per-source vpk cache
and leaves foreign symlinks untouched. Initialize-time guard refuses
to mount a partial overlay if any expected vpk is missing from cache.

Refresh service uses shutil.move to handle EXDEV when /tmp and the
cache live on different filesystems.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 08:05:14 +02:00
mwiegand
4f78574edd
fix(l4d2-web): keep workshop refresh responsive
Limit workshop refresh downloads to one worker and commit build-overlay enqueue work before writing the final job log so SQLite locks do not wedge the web process.
2026-05-07 17:31:12 +02:00
mwiegand
0e83ee07d7
fix(deploy): make test deployments safe to rerun
Exclude local agent state from deploy archives, avoid recursive ownership over active runtime mounts, and let Alembic own schema upgrades before app startup.
2026-05-07 17:16:58 +02:00
mwiegand
b2a8d3d5e0
feat(deploy): workshop_cache provisioning
Adds /var/lib/left4me/workshop_cache to the deploy mkdir list (owned by
the left4me runtime user). Updates deploy/README.md to document the new
directory and the workshop overlay layout: web app downloads VPKs into
the cache and symlinks them into overlays/{overlay_id}/left4dead2/addons/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:53:49 +02:00
mwiegand
ac020d1e77
feat(l4d2-web): initialize-time guard for uncached workshop items
Before invoking l4d2ctl initialize, run each blueprint overlay's builder
synchronously and then verify that every workshop item attached to the
blueprint has a cache file on disk. If any are missing, raise a clear
error naming the overlay and the missing steam_ids — server start can't
silently mount a partial overlay where some maps are mysteriously absent
in-game.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:53:04 +02:00
mwiegand
df1ccb4cca
feat(l4d2-web): workshop overlay UI (routes + templates)
Adds workshop_routes blueprint with add-items / remove-item / manual-
build endpoints plus admin /admin/workshop/refresh. Add-items handles
single ID, single URL, multi-line batch, or a collection ID; auto-
enqueues a coalesced build_overlay job per call. Reject non-L4D2 items
with 400, duplicate associations with friendly toast, intruders with
403.

Generalizes overlay_routes: type+name only on create (no path field);
external is admin-only and system-wide, workshop is per-user and
auto-pathed. Update is name-only. Delete recursively removes the
on-disk dir only for managed paths (path == str(id)); legacy externals
are left in place. The pre-existing in-use guard is preserved.

Page routes filter the overlay listing by user permissions and load
workshop items + the latest related job for the detail view.

Templates: unified Create modal with type radio (no path field).
Type-aware overlay detail: workshop overlays show a multi-line input
+ items/collection radio + item table partial with thumbnails, manual
Rebuild button, and a small status indicator pulled from the latest
related job. Admin page gets a "Refresh all workshop items" button.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:50:54 +02:00
mwiegand
38a6fbbe1e
feat(l4d2-web): worker support for build_overlay and refresh_workshop_items
Extends SchedulerState with running_overlays / refresh_running /
blocked_servers_by_overlay, and updates can_start with the truth table:
install and refresh_workshop_items are global mutexes; build_overlay
serializes per-overlay; server jobs block on builds for any overlay
their blueprint references.

Adds enqueue_build_overlay coalescing helper that returns an existing
queued job for the same overlay rather than inserting a duplicate.

Adds run_job dispatch for build_overlay (BUILDERS[overlay.type].build)
and refresh_workshop_items (re-fetches metadata, re-downloads on
time_updated/filename change, enqueues coalesced rebuilds for affected
overlays).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:44:10 +02:00
mwiegand
700940d578
feat(l4d2-web): overlay builder registry with workshop builder
Adds l4d2web/services/overlay_builders.py with a BUILDERS dict mapping
Overlay.type to a builder class. ExternalBuilder is a no-op that just
ensures the overlay directory exists. WorkshopBuilder diff-applies
absolute symlinks under left4dead2/addons/ against the overlay's current
WorkshopItem associations: creates new ones, removes obsolete, leaves
unrelated files alone, and skips uncached items with a warning rather
than producing dangling symlinks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:40:30 +02:00
mwiegand
f0230e17d3
feat(l4d2-web): overlay path helpers and creation
Adds workshop_paths.cache_path(steam_id) returning
$LEFT4ME_ROOT/workshop_cache/{steam_id}.vpk with digit-only validation.

Adds overlay_creation.generate_overlay_path(id) and
create_overlay_directory(overlay) with exist_ok=False so a stray dir from a
prior failed delete surfaces loudly instead of shadowing fresh content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:38:39 +02:00
mwiegand
c6b41429ee
feat(l4d2-web): steam workshop API client and downloader
Adds l4d2web/services/steam_workshop.py: parse_workshop_input (single ID,
URL, or multi-line batch), resolve_collection (HTTPS POST to
GetCollectionDetails), fetch_metadata_batch (HTTPS POST to
GetPublishedFileDetails with consumer_app_id == 550 enforcement that
raises WorkshopValidationError in add-mode and silently skips in
refresh-mode), download_to_cache (atomic + idempotent on mtime+size),
and refresh_all (ThreadPoolExecutor with per-item error collection).

Adds requests as an explicit dependency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:37:39 +02:00
mwiegand
2543a05c12
feat(l4d2-web): typed overlays + workshop schema migration
Adds Overlay.type and Overlay.user_id with two partial unique indexes
(externals globally unique by name; user overlays unique per user).
Adds WorkshopItem registry keyed on steam_id and a pure many-to-many
overlay_workshop_items association. Adds Job.overlay_id for build_overlay
job tracking. Switches overlays.id to AUTOINCREMENT so deleted IDs are
never reused.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:35:13 +02:00
mwiegand
b46f52258d
docs(workshop): spec and plan for steam workshop overlays
Add a typed-overlay model with workshop as the first non-external type:
deduplicated WorkshopItem registry, symlink-based overlay directories,
auto-rebuild after item changes, admin global refresh, and a unified
Create-overlay UI with web-managed paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:25:13 +02:00
mwiegand
d18b397330
fix(host): create ~/.steam/sdk32 and sdk64 symlinks during install
L4D2 dedicated server expects to dlopen steamclient.so via
~/.steam/sdk32 (and sdk64). Without those symlinks, srcds_run logs
'cannot open shared object file' and SteamAPI_Init fails, which means
the server is invisible to the public Steam master server, Workshop
addon downloads break, and Steam 'Join Game' / lobby joins do not
reach the server (only direct-IP connect works).

SteamInstaller.install_or_update now ensures the symlinks exist after
SteamCMD finishes. Targets prefer SteamCMD's linux32/linux64 sibling
dirs; falls back to <install_dir>/bin/ if the siblings cannot be
located. Idempotent: re-running the install repairs or leaves the
symlinks alone.

Path.home() respects HOME, which the systemd web unit sets to
/var/lib/left4me, so the symlinks land in the left4me user's home.

Existing deploys can apply the fix by re-running 'Install' from /admin
without a full redeploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 02:11:27 +02:00
mwiegand
1968684c03
fix(deploy): MountFlags=shared on web service for fuse mount propagation
ProtectSystem=full + ReadWritePaths implicitly give the unit a private
mount namespace (systemd needs to remount /usr read-only). The default
namespace propagation is slave, so mounts the worker creates inside
never reach the host. The gameserver units (started via systemctl,
each with their own namespace) then inherit a host that lacks the
overlay, and their CHDIR into /var/lib/left4me/runtime/<name>/merged
fails.

Set MountFlags=shared so mount events propagate from the worker's
namespace back to the host, then onward to gameserver units at their
unshare time.

Verified on test box: nsenter -t <gunicorn-pid> -m mount showed the
fuse-overlayfs mount inside the worker but plain mount on the host
did not, while web unit had ProtectSystem=full + ReadWritePaths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 02:01:24 +02:00
mwiegand
593611e194
fix(deploy): drop PrivateTmp on web service so fuse mounts propagate
PrivateTmp=true gives the unit a private mount namespace. The worker's
fuse-overlayfs mount lives only inside that namespace, so the host
cannot see it and the gameserver unit (started via systemctl, with its
own namespace inherited from the host) also cannot see it. The
gameserver unit then fails CHDIR on
/var/lib/left4me/runtime/<name>/merged/left4dead2.

The mount must land in the host namespace so the gameserver unit
inherits it at unshare time. Remaining hardening: dedicated user,
ProtectSystem=full, ReadWritePaths, sudoers allowlist limited to two
helper scripts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 01:57:43 +02:00
mwiegand
56b9523d88
fix(deploy): drop NoNewPrivileges on web service so FUSE mounts work
The job worker calls fusermount3 (setuid-root) to mount per-instance
FUSE overlays and sudo to invoke the privileged systemctl wrapper.
NoNewPrivileges=true blocks both, surfacing as
"fusermount3: mount failed: Operation not permitted" the first time a
server is started. Hardening is still enforced via dedicated user,
PrivateTmp, ProtectSystem=full, ReadWritePaths, and the narrow sudoers
allowlist limited to two helper scripts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 01:51:39 +02:00
mwiegand
d14ed9c117
feat(web): blueprint-prefilled create-server flow + empty-state CTA
- Per-row "Create server" link on /blueprints navigates to
  /servers?blueprint_id=<id>; that page validates the param against
  the user's owned blueprints, pre-selects the option, and auto-opens
  the create modal.
- /servers empty-blueprint state now shows an actionable
  "Create a blueprint first ->" link (styled like the primary button)
  pointing at /blueprints, replacing the silent disabled "+ Create"
  button + muted hint.
- Drop the "Reassign blueprint" form on the server detail page
  along with the unused POST /servers/<id> form route. The JSON
  PATCH /servers/<id> endpoint is retained.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 01:47:33 +02:00