Commit graph

14 commits

Author SHA1 Message Date
mwiegand
ae443299c8
chore(deploy): drop bubblewrap apt dep + tighten left4me.db mode
bubblewrap is no longer used now that left4me-script-sandbox runs as a
systemd service unit. Remove it from the apt-get and dnf install lines.

Also tighten the application database file mode after the alembic
upgrade step: chown root:left4me, chmod 0640. The DB had been created
at default 0644 by SQLite's open() call inside the web service, which
made it world-readable on the host — i.e. readable by any uid that can
traverse /var/lib/left4me, including the sandbox's l4d2-sandbox uid.
Smoke-testing the v2 sandbox prototype on ckn@10.0.4.128 surfaced this:
the sandbox could read "SQLite format 3" from the DB until the parent
dir was masked with TemporaryFileSystem=. Tightening the file mode is
the host-level fix; the sandbox-level mask is defense in depth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:48:26 +02:00
mwiegand
4ee8f6af44
refactor(deploy): rewrite left4me-script-sandbox to systemd-only — drop bwrap
Replaces the systemd-run --scope + bwrap composition with systemd-run in
service-unit mode (--pipe --wait, transient .service unit). Same cgroup
limits and walltime kill, plus the hardening directives that --scope
units cannot carry: NoNewPrivileges, ProtectSystem=strict, ProtectHome,
ProtectKernel{Tunables,Modules,Logs,ControlGroups}, RestrictNamespaces,
RestrictAddressFamilies, RestrictSUIDSGID, LockPersonality,
MemoryDenyWriteExecute, SystemCallFilter (seccomp), and an empty
CapabilityBoundingSet (drops all caps). UID drop via User=/Group=.

The TemporaryFileSystem="/etc /var/lib" pair is the gotcha:
ProtectSystem=strict makes /var/lib *read-only* but visible, so the host
DB at /var/lib/left4me/left4me.db (mode 0644) was readable from inside.
Masking /var/lib with tmpfs hides the entire subtree; the BindPaths bind
to /overlay is at a different path and unaffected.

The Python side (ScriptBuilder, run_sandboxed_script, routes) is
unchanged — same sudo-helper invocation, same argv shape.

Loses PID-namespace isolation (no PrivatePID= directive in systemd).
Host PIDs are visible via /proc and ps -ef but not signal-able due to
UID mismatch — information disclosure only, not a privilege boundary.

Smoke-tested on ckn@10.0.4.128 prior to this commit; all isolation
invariants reproduced and the hardening directives provably blocked
unshare(2), mount(2), personality(2), bpf(2), and sysctl writes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:47:30 +02:00
mwiegand
cf865d4915
fix(deploy): one-shot cleanup of orphan overlay dirs after globals removal
Migration 0005_script_overlays drops the legacy l4d2center_maps /
cedapug_maps overlay rows but leaves their /var/lib/left4me/overlays/{id}
directories on disk. When the web app subsequently creates a new overlay
and AUTOINCREMENT issues an id matching one of those orphans,
create_overlay_directory(exist_ok=False) crashes with FileExistsError —
which surfaced as a 500 on POST /overlays the first time a script
overlay was created on a deployed test box.

Adds a sentinel-gated sweep in deploy-test-server.sh that lists overlay
ids in the DB, removes any directory under overlays/ whose id has no
matching row, and drops the now-unused global_overlay_cache. Mirrors the
.kernel-overlay-migrated sentinel pattern so reruns are no-ops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:16:33 +02:00
mwiegand
06ae84fbe4
fix(deploy): script-sandbox helper — UID drop via systemd-run, --unshare-user-try, /etc/alternatives
Smoke testing on the test host revealed three issues with the helper as
shipped:

1. bwrap 0.11+ rejects --uid without --unshare-user. Switching the UID
   drop from inside bwrap to systemd-run (--uid=l4d2-sandbox
   --gid=l4d2-sandbox) sidesteps the userns UID-mapping headaches and
   keeps file ownership on the bind-mounted /overlay matching
   l4d2-sandbox on the host (which the wipe path relies on).

2. bwrap running as an unprivileged uid still needs a user namespace to
   set up its mount-namespace bind-mounts. Adding --unshare-user-try
   gives it the userns context when needed and is a no-op otherwise.

3. /etc/alternatives wasn't bind-mounted, so symlinked tools like
   /usr/bin/awk -> /etc/alternatives/awk fell over inside the sandbox.
   Adds the ro-bind.

Also: the helper now chowns the overlay dir to l4d2-sandbox before bwrap
(idempotent — needed because the web app creates the dir as left4me),
and the deploy script chmods /var/lib/left4me to 0711 so l4d2-sandbox
can traverse to the bind-mount source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:12:46 +02:00
mwiegand
e51a4d58a4
chore(deploy): provision l4d2-sandbox + bubblewrap; drop globals refresh timer
deploy-test-server.sh: provisions the l4d2-sandbox system user (no home,
nologin shell) and installs the bubblewrap apt/dnf package; copies the
left4me-script-sandbox helper into /usr/local/libexec/left4me with mode
0755. Drops the global_overlay_cache directory provisioning, the
refresh-global-overlays unit installation, and the timer enable.

Deletes the orphaned left4me-refresh-global-overlays.{service,timer}
files. Trims the matching paragraph from deploy/README.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:54:57 +02:00
mwiegand
75e703e1a4
feat(deploy): left4me-script-sandbox helper + sudoers fragment
Privileged bash helper that wraps user-authored scripts in
systemd-run --scope (cgroup limits + RuntimeMaxSec=3600) inside a
bubblewrap sandbox dropped to the l4d2-sandbox uid. Network is shared
with the host so scripts can fetch from Steam / l4d2center / etc.;
filesystem is RO except for /overlay (rw bind from
/var/lib/left4me/overlays/{id}) and tmpfs /tmp + /run.

Adds a sudoers rule allowing the left4me user to invoke this helper
without restrictions on its arguments. Strict argument validation is
in the helper itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:53:21 +02:00
mwiegand
9985ecc56c
chore(deploy): cleanup left4me-web hardening + docs for kernel overlayfs
Drop MountFlags=shared (the assumption that it propagated fuse mounts
to host was incorrect on systemd 257 with ProtectSystem+ReadWritePaths).
Restore PrivateTmp=true (was dropped in 593611e for fuse propagation
that did not work). Rewrite the comment block to describe the new
model: mounts go through the left4me-overlay helper which nsenters
into PID 1's mount namespace, so the unit's mount-ns layout is no
longer load-bearing.

Update the three user-facing READMEs (root, l4d2host, deploy) to drop
fuse-overlayfs / fusermount3 prereqs and call out the kernel overlayfs
mount path through the privileged helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:29:49 +02:00
mwiegand
172e574a00
chore(deploy): drop fuse-overlayfs apt dep + one-shot migrate upper/work
Drop fuse-overlayfs / fuse3 from the apt/dnf install line — the new
mount path is kernel overlayfs via the left4me-overlay helper, no
fuse userspace needed.

Add a one-shot migration block gated by /var/lib/left4me/.kernel-overlay-migrated
that runs before daemon-reload: stop gameservers + web service, force-
unmount any leftover fuse or overlay mounts under runtime/, then wipe
and recreate empty upper/ and work/ for every instance. fuse-overlayfs
running as a non-root user used user.fuseoverlayfs.* xattrs that kernel
overlayfs ignores, so a pre-existing upper/ from the fuse era would
resurrect "deleted" files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:28:00 +02:00
mwiegand
d5b321b557
feat(l4d2-host): KernelOverlayFSMounter + left4me-overlay helper
New privileged helper at /usr/local/libexec/left4me/left4me-overlay
(Python, system /usr/bin/python3, stdlib only) takes only the instance
name, parses instance.env for L4D2_LOWERDIRS, validates each lowerdir
against an allowlist (installation/, overlays/, global_overlay_cache/,
workshop_cache/), refuses upperdirs tainted with user.fuseoverlayfs.*
xattrs from the prior fuse era, and execs `nsenter --mount=/proc/1/ns/mnt
-- mount -t overlay ...` so the resulting mount lives in the host
namespace. Mirrors the existing left4me-systemctl / left4me-journalctl
pattern; sudoers entry is verb-constrained.

KernelOverlayFSMounter implements the existing OverlayMounter ABC,
deriving the instance name from the merged path. No call sites use it
yet — that's the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:23:58 +02:00
mwiegand
38548ab0d7
chore(deploy): raise gunicorn thread pool to 32 for SSE headroom
Each SSE log-viewer or job-log stream holds a thread for its full
lifetime. With --threads 8, a handful of open browser tabs could
exhaust the pool. 32 keeps the same single-process scheduler invariant
(_claim_lock in job_worker is process-local) while giving SSE plenty
of headroom on the test box's user count.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:19:03 +02:00
mwiegand
92d6ebbe82
feat(l4d2-web): managed global map overlays with daily refresh
Adds two managed system overlays (l4d2center-maps, cedapug-maps) that
fetch curated map archives from upstream sources and reconcile addons
symlinks for non-Steam maps. A daily systemd timer enqueues a coalesced
refresh_global_overlays worker job; downloads, extraction, and rebuilds
run in the existing job worker and surface in the job log UI.

Schema: GlobalOverlaySource / GlobalOverlayItem / GlobalOverlayItemFile
plus nullable Job.user_id so system jobs render as "system" in the UI.
The new builder reconciles symlinks against the per-source vpk cache
and leaves foreign symlinks untouched. Initialize-time guard refuses
to mount a partial overlay if any expected vpk is missing from cache.

Refresh service uses shutil.move to handle EXDEV when /tmp and the
cache live on different filesystems.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 08:05:14 +02:00
mwiegand
0e83ee07d7
fix(deploy): make test deployments safe to rerun
Exclude local agent state from deploy archives, avoid recursive ownership over active runtime mounts, and let Alembic own schema upgrades before app startup.
2026-05-07 17:16:58 +02:00
mwiegand
833ae318cf
fix(deploy): add venv to PATH in left4me-web systemd service 2026-05-06 20:45:37 +02:00
mwiegand
bbfc528354
feat(deploy): add production-like test deployment 2026-05-06 19:30:10 +02:00