systemd applies WorkingDirectory= to every Exec line including ExecStartPre.
With the merged dir not yet existing at boot time (the volatile overlay
mount has been wiped), the chdir into runtime/%i/merged/left4dead2 fails
with status=200/CHDIR before ExecStartPre can run the mount helper.
The `-` prefix makes chdir failure non-fatal: ExecStartPre runs in the
unit's home (cwd doesn't matter for the mount helper); ExecStart re-applies
WorkingDirectory once the mount has landed and chdirs successfully.
Companion to commit 519567e (which added the ExecStartPre mount + helper
idempotency but didn't account for the WorkingDirectory ordering).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The lifecycle change to systemctl enable --now (commit 8552c55) made
units auto-start at boot. But the kernel-overlayfs mount is volatile
(reboot kills it), and the web app's start_instance only re-mounts in
response to a UI click. Result: at boot, systemd starts the unit, finds
empty merged/, CHDIR fails, Restart=on-failure spins forever (counter
hit 65 on ckn before this fix landed).
Fix:
- Unit gets `ExecStartPre=/usr/bin/sudo -n .../left4me-overlay mount %i`
so the overlay is established before the main process starts.
- Helper is now idempotent: if merged is already a mount point, exit 0.
Required because Restart=on-failure re-runs ExecStartPre on each
cycle, and the web-app's start_instance also calls the helper, so
both paths would otherwise collide on "already mounted".
- StartLimitBurst=5 + StartLimitIntervalSec=60s caps the restart loop
instead of letting it spin indefinitely on a fundamental failure.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Servers started via the web UI now create a WantedBy= symlink under
multi-user.target.wants/, so they auto-start on the next host reboot.
Helper verbs renamed start/stop -> enable/disable; service_control.py
renamed start_service/stop_service -> enable_service/disable_service.
The user-facing l4d2ctl start/stop commands keep their names per the
AGENTS.md contract -- only the implementation changes. Spec:
docs/superpowers/specs/2026-05-09-l4d2-server-lifecycle-reboot-and-drift-design.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Builds yield CPU/IO to game-server instances under contention via the
slice's weight=10, and are killed first under memory pressure
(servers have OOMScoreAdjust=-200).
Matches the spec-pointer comment Task 1 added to
left4me-server@.service. A future operator running
`systemctl cat l4d2-game.slice` now finds the rationale.
Flat top-level slices. Game wins under contention; build still gets
the box when uncontended. Referenced by left4me-server@.service and
the script-sandbox systemd-run invocation.
Cedapug's build script writes .cedapug/manifest.tsv with mode 0600 owned
by l4d2-sandbox; the web service (left4me uid) then 500s when streaming
that file via the download route — PermissionError on open().
Two fixes:
- UMask=0022 on the systemd-run unit so new file writes default to
0644 / dirs to 0755.
- Post-script chmod o+r/o+rx walk over the overlay dir to backfill any
stricter modes the script left behind (e.g. shells/tools that ignore
umask and explicitly create with 0600).
The helper no longer execs systemd-run; it captures the rc, runs the
post-step, and exits with the original rc.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds IPAddressDeny= to the sandbox unit covering loopback (127/8 + ::1),
link-local (169.254/16 + fe80::/10), multicast (224/4 + ff00::/8), all
RFC1918 v4 (10/8, 172.16/12, 192.168/16), CGNAT (100.64/10), and ULA v6
(fc00::/7). The kernel attaches systemd's sd_fw_egress BPF program to
the unit's cgroup; egress packets matching any of the deny prefixes are
silently dropped at the cgroup boundary.
Important: do NOT pair this with `IPAddressAllow=any`. Documentation
claims "more specific rule wins" but on this systemd 257 + kernel 6.12
combo, having both set causes the allow to win unconditionally — the
deny gets ignored. Empty IPAddressAllow + populated IPAddressDeny is the
correct shape: kernel default "allow all" applies to non-listed
addresses, and the listed prefixes are blocked.
Because the host's resolv.conf typically points at a private-IP DNS
server (10.0.0.1 in the test deploy), blocking RFC1918 also kills DNS.
Adds a static /etc/left4me/sandbox-resolv.conf with public resolvers
(Cloudflare 1.1.1.1, Google 8.8.8.8) and bind-mounts that into the
sandbox at /etc/resolv.conf, replacing the host's resolver inside the
sandbox only.
Smoke-tested on ckn@10.0.4.128:
- public 1.1.1.1:443: CONNECTED
- public HTTPS via DNS (steamcommunity.com): 200
- localhost web app 127.0.0.1:8000: blocked (TimeoutError)
- localhost sshd 127.0.0.1:22: blocked
- private LAN ssh 10.0.4.128:22: blocked
- private DNS 10.0.0.1:53: blocked
AF_UNIX stays in RestrictAddressFamilies — dropping it would risk
breaking NSS / syslog for marginal gain, and the IP-level filter
addresses the primary threat (reaching the host's HTTP/SSH services).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the systemd-run --scope + bwrap composition with systemd-run in
service-unit mode (--pipe --wait, transient .service unit). Same cgroup
limits and walltime kill, plus the hardening directives that --scope
units cannot carry: NoNewPrivileges, ProtectSystem=strict, ProtectHome,
ProtectKernel{Tunables,Modules,Logs,ControlGroups}, RestrictNamespaces,
RestrictAddressFamilies, RestrictSUIDSGID, LockPersonality,
MemoryDenyWriteExecute, SystemCallFilter (seccomp), and an empty
CapabilityBoundingSet (drops all caps). UID drop via User=/Group=.
The TemporaryFileSystem="/etc /var/lib" pair is the gotcha:
ProtectSystem=strict makes /var/lib *read-only* but visible, so the host
DB at /var/lib/left4me/left4me.db (mode 0644) was readable from inside.
Masking /var/lib with tmpfs hides the entire subtree; the BindPaths bind
to /overlay is at a different path and unaffected.
The Python side (ScriptBuilder, run_sandboxed_script, routes) is
unchanged — same sudo-helper invocation, same argv shape.
Loses PID-namespace isolation (no PrivatePID= directive in systemd).
Host PIDs are visible via /proc and ps -ef but not signal-able due to
UID mismatch — information disclosure only, not a privilege boundary.
Smoke-tested on ckn@10.0.4.128 prior to this commit; all isolation
invariants reproduced and the hardening directives provably blocked
unshare(2), mount(2), personality(2), bpf(2), and sysctl writes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Smoke testing on the test host revealed three issues with the helper as
shipped:
1. bwrap 0.11+ rejects --uid without --unshare-user. Switching the UID
drop from inside bwrap to systemd-run (--uid=l4d2-sandbox
--gid=l4d2-sandbox) sidesteps the userns UID-mapping headaches and
keeps file ownership on the bind-mounted /overlay matching
l4d2-sandbox on the host (which the wipe path relies on).
2. bwrap running as an unprivileged uid still needs a user namespace to
set up its mount-namespace bind-mounts. Adding --unshare-user-try
gives it the userns context when needed and is a no-op otherwise.
3. /etc/alternatives wasn't bind-mounted, so symlinked tools like
/usr/bin/awk -> /etc/alternatives/awk fell over inside the sandbox.
Adds the ro-bind.
Also: the helper now chowns the overlay dir to l4d2-sandbox before bwrap
(idempotent — needed because the web app creates the dir as left4me),
and the deploy script chmods /var/lib/left4me to 0711 so l4d2-sandbox
can traverse to the bind-mount source.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
deploy-test-server.sh: provisions the l4d2-sandbox system user (no home,
nologin shell) and installs the bubblewrap apt/dnf package; copies the
left4me-script-sandbox helper into /usr/local/libexec/left4me with mode
0755. Drops the global_overlay_cache directory provisioning, the
refresh-global-overlays unit installation, and the timer enable.
Deletes the orphaned left4me-refresh-global-overlays.{service,timer}
files. Trims the matching paragraph from deploy/README.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Privileged bash helper that wraps user-authored scripts in
systemd-run --scope (cgroup limits + RuntimeMaxSec=3600) inside a
bubblewrap sandbox dropped to the l4d2-sandbox uid. Network is shared
with the host so scripts can fetch from Steam / l4d2center / etc.;
filesystem is RO except for /overlay (rw bind from
/var/lib/left4me/overlays/{id}) and tmpfs /tmp + /run.
Adds a sudoers rule allowing the left4me user to invoke this helper
without restrictions on its arguments. Strict argument validation is
in the helper itself.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop MountFlags=shared (the assumption that it propagated fuse mounts
to host was incorrect on systemd 257 with ProtectSystem+ReadWritePaths).
Restore PrivateTmp=true (was dropped in 593611e for fuse propagation
that did not work). Rewrite the comment block to describe the new
model: mounts go through the left4me-overlay helper which nsenters
into PID 1's mount namespace, so the unit's mount-ns layout is no
longer load-bearing.
Update the three user-facing READMEs (root, l4d2host, deploy) to drop
fuse-overlayfs / fusermount3 prereqs and call out the kernel overlayfs
mount path through the privileged helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New privileged helper at /usr/local/libexec/left4me/left4me-overlay
(Python, system /usr/bin/python3, stdlib only) takes only the instance
name, parses instance.env for L4D2_LOWERDIRS, validates each lowerdir
against an allowlist (installation/, overlays/, global_overlay_cache/,
workshop_cache/), refuses upperdirs tainted with user.fuseoverlayfs.*
xattrs from the prior fuse era, and execs `nsenter --mount=/proc/1/ns/mnt
-- mount -t overlay ...` so the resulting mount lives in the host
namespace. Mirrors the existing left4me-systemctl / left4me-journalctl
pattern; sudoers entry is verb-constrained.
KernelOverlayFSMounter implements the existing OverlayMounter ABC,
deriving the instance name from the merged path. No call sites use it
yet — that's the next commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each SSE log-viewer or job-log stream holds a thread for its full
lifetime. With --threads 8, a handful of open browser tabs could
exhaust the pool. 32 keeps the same single-process scheduler invariant
(_claim_lock in job_worker is process-local) while giving SSE plenty
of headroom on the test box's user count.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two managed system overlays (l4d2center-maps, cedapug-maps) that
fetch curated map archives from upstream sources and reconcile addons
symlinks for non-Steam maps. A daily systemd timer enqueues a coalesced
refresh_global_overlays worker job; downloads, extraction, and rebuilds
run in the existing job worker and surface in the job log UI.
Schema: GlobalOverlaySource / GlobalOverlayItem / GlobalOverlayItemFile
plus nullable Job.user_id so system jobs render as "system" in the UI.
The new builder reconciles symlinks against the per-source vpk cache
and leaves foreign symlinks untouched. Initialize-time guard refuses
to mount a partial overlay if any expected vpk is missing from cache.
Refresh service uses shutil.move to handle EXDEV when /tmp and the
cache live on different filesystems.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ProtectSystem=full + ReadWritePaths implicitly give the unit a private
mount namespace (systemd needs to remount /usr read-only). The default
namespace propagation is slave, so mounts the worker creates inside
never reach the host. The gameserver units (started via systemctl,
each with their own namespace) then inherit a host that lacks the
overlay, and their CHDIR into /var/lib/left4me/runtime/<name>/merged
fails.
Set MountFlags=shared so mount events propagate from the worker's
namespace back to the host, then onward to gameserver units at their
unshare time.
Verified on test box: nsenter -t <gunicorn-pid> -m mount showed the
fuse-overlayfs mount inside the worker but plain mount on the host
did not, while web unit had ProtectSystem=full + ReadWritePaths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PrivateTmp=true gives the unit a private mount namespace. The worker's
fuse-overlayfs mount lives only inside that namespace, so the host
cannot see it and the gameserver unit (started via systemctl, with its
own namespace inherited from the host) also cannot see it. The
gameserver unit then fails CHDIR on
/var/lib/left4me/runtime/<name>/merged/left4dead2.
The mount must land in the host namespace so the gameserver unit
inherits it at unshare time. Remaining hardening: dedicated user,
ProtectSystem=full, ReadWritePaths, sudoers allowlist limited to two
helper scripts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The job worker calls fusermount3 (setuid-root) to mount per-instance
FUSE overlays and sudo to invoke the privileged systemctl wrapper.
NoNewPrivileges=true blocks both, surfacing as
"fusermount3: mount failed: Operation not permitted" the first time a
server is started. Hardening is still enforced via dedicated user,
PrivateTmp, ProtectSystem=full, ReadWritePaths, and the narrow sudoers
allowlist limited to two helper scripts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>