Commit graph

142 commits

Author SHA1 Message Date
mwiegand
db3b149045
docs(specs): l4d2 server host perf baseline — design
Approach A: per-instance unit directives (Nice, OOM, Memory caps,
KillSignal=SIGINT, log-rate disable), flat l4d2-game/l4d2-build slice
hierarchy with 100:1 CPU/IO weight ratio, sandbox into build slice with
OOMScoreAdjust=500, host sysctls for UDP buffers + netdev backlog/budget
+ vm.swappiness. SCHED_FIFO, CPU governor, CPUAffinity, NIC tuning are
documented escape hatches, not auto-applied.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 09:31:05 +02:00
mwiegand
965b67e6fc
fix(l4d2-host): script-sandbox normalizes file perms so web user can read
Cedapug's build script writes .cedapug/manifest.tsv with mode 0600 owned
by l4d2-sandbox; the web service (left4me uid) then 500s when streaming
that file via the download route — PermissionError on open().

Two fixes:
- UMask=0022 on the systemd-run unit so new file writes default to
  0644 / dirs to 0755.
- Post-script chmod o+r/o+rx walk over the overlay dir to backfill any
  stricter modes the script left behind (e.g. shells/tools that ignore
  umask and explicitly create with 0600).

The helper no longer execs systemd-run; it captures the rc, runs the
post-step, and exits with the original rc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 01:44:26 +02:00
mwiegand
c16e780283
feat(l4d2-web): server file tree — enable download symmetric with overlay tree
Adds a /servers/<id>/files/download route mirroring the overlay download
endpoint. Same safety rules: real-path must resolve under LEFT4ME_ROOT
(merged view threads through `installation/` and overlay layers, all
already inside the root). The server file-tree partial now renders
download links.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 01:40:04 +02:00
mwiegand
aacd95012e
feat(l4d2-web): blueprint rename moves to footer modal — matches overlay/server pattern
Drops the inline Name input from the blueprint edit form. A Rename link
sits next to Delete in the page footer; clicking opens a one-line modal
that posts to a new POST /blueprints/<id>/rename route. The main edit
form keeps the current name as a hidden input so its full Save still
works unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 01:37:29 +02:00
mwiegand
ed12280cf0
feat(l4d2-web): server detail — directory tree of the runtime merged view
Adds a Files section at the bottom of the server detail page that lists
the kernel-overlayfs merged view at runtime/<server_id>/merged/. Reuses
the overlay file-tree partial via two new template variables:

- files_base_url: parent passes "/overlays/<id>" or "/servers/<id>"
- download_supported: false for servers (runtime holds large game
  binaries; no download endpoint), true for overlays (existing behavior)

New service helper safe_resolve_for_server_listing() rejects path
traversal beyond the merged root and returns None when the overlayfs
mount doesn't exist (server never started or just reset).

New route GET /servers/<id>/files?path=<rel> returns the lazy-load
file-tree fragment, gated to the server owner. No download counterpart.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 01:35:09 +02:00
mwiegand
fa686f11e3
feat(l4d2-web): server + overlay detail — live-refresh via HTMX, restructured
Vendors HTMX 2.0.4 (the prior file was a 1-line stub) and uses it to poll
two new partials on a 2s tick while a job is in flight:

- /servers/<id>/actions → state badge, filtered action buttons,
  last-job sentence, live job log (SSE) while a Start/Stop/Reset job
  is running. When the job is terminal the partial re-renders without
  hx-trigger and polling stops.
- /overlays/<id>/build-status → build state badge, last-build
  sentence, live job log while a build_overlay job is running. Same
  terminal-state stop behavior.

Server detail restructure:
- Editable name moves out of the page body into a Rename modal
  triggered from a link next to Delete in the page footer.
- Compact dl with Port (linked as steam://run/550//+connect <host>:<port>)
  and Blueprint.
- Actions row: state badge + state-filtered buttons (start/stop, reset)
  + last-job sentence. Drift warning when desired ≠ actual.
- Recent Jobs table removed.

Overlay detail restructure:
- Single panel, dl Type/Scope, no separate Last build row, no Builds
  section.
- Script form gets two compound submits: "Save and build" and
  "Save, reset and rebuild". Standalone Rebuild/Wipe gone.
- Build status state badge + last-build sentence under the editor;
  action buttons hide while a build is in flight.
- Rename modal in the page footer next to Delete.

sse.js binds on htmx:load (covers initial document and post-swap inserts)
and closes EventSources on htmx:beforeCleanupElement to avoid leaking
streams across swaps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 01:27:30 +02:00
mwiegand
3c4bd6880a
refactor(l4d2-web): detail-page UI — single panel, soft border, footer Delete
- Detail panels: softer (color-mix --line-soft) border. h2 sub-section
  spacing inside a single outer panel. admin and job_detail collapse to
  one panel each.
- Color tokens: --color-button-primary / --color-button-danger stay
  saturated in dark mode so white text on filled buttons stays readable.
- Site header: transparent, no full-width bar; aligned with panel-content
  width. No more sticky.
- Page-level Delete: low-contrast outline button at the page footer
  (left side, justify-content flex-start). Save buttons no longer
  full-width (.stack > button { justify-self: end }).
- form-actions-inline helper for right-aligned button rows.
- New service: l4d2web.services.timeago.humanize_delta — used by the
  upcoming server / overlay live-status partials.
- Server route: POST /servers/<id> renames the server (mirrors the
  overlay update pattern, returns 409 on per-user duplicate).
- Overlay route: POST /overlays/<id>/script handles `action` form value
  — `save_build` (default) or `save_reset_build` (wipes overlay dir
  before queuing build). Redirect lands on /overlays/<id> instead of
  the job page so users see the live status.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 01:26:57 +02:00
mwiegand
985df970f8
feat(l4d2-web): per-overlay server.cfg aliases — expose checkbox + auto-exec
Each linked overlay gets a checkbox on the blueprint detail page that opts
its server.cfg in as exec server_overlay_<id>. The web app builds the
spec with {path, alias} per overlay and prepends exec server_overlay_<id>
lines to the blueprint config in lowest-overlay-first order. The host
stages those copies in the overlayfs upper layer before mounting (avoids
copy-up writes against a sandbox-uid file). A live preview block above the
Config textarea shows what gets auto-executed.

Schema:
- alembic 0007: BlueprintOverlay.expose_server_cfg BOOLEAN

Spec contract:
- l4d2host OverlayRef(path, alias?). load_spec accepts both bare-string
  and {path, alias} entries.

Side effects folded in (same file in l4d2_facade):
- start_server auto-initializes; the manual Initialize step is no longer
  needed before Start.
- initialize_server no longer runs blueprint builders — builds happen on
  overlay save, not on every server Start.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 01:26:31 +02:00
mwiegand
c2cf723911
docs(agents): require specs and plans to live in this repo
Make explicit that design specs go in docs/superpowers/specs/ and
implementation plans go in docs/superpowers/plans/, both committed
to git, with the YYYY-MM-DD-<topic>[-design].md naming already used
elsewhere in the tree. The plan-mode scratch file under
~/.claude/plans/ is fine while plan mode is open, but the persisted
artifact must end up inside the repo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 21:37:17 +02:00
mwiegand
a4e9f6cd26
feat(l4d2-web): blueprint overlay picker — drag-list + add-dropdown
Replace the per-row checkbox + numeric Order table on the blueprint
detail page with a drag-to-reorder list of selected overlays plus a
native <select> for adding more. Removing uses an × button per row;
the option sorted-inserts back into the dropdown alphabetically.

Native HTML5 drag-and-drop, no library, no JS-disabled fallback.
Server contract is unchanged: each list row owns one hidden
<input name="overlay_ids">, DOM order = submission order, and the
existing fallback_position branch in ordered_overlay_ids_from_form
absorbs the now-omitted overlay_position_<id> fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 21:37:11 +02:00
mwiegand
dec4fed809
docs(specs): blueprint overlay picker — drag-list + add-dropdown
Replace per-row checkbox + numeric Order inputs with a drag-to-reorder
list of selected overlays plus a native <select> for adding more.
Native HTML5 DnD; no library, no JS-disabled fallback. Server contract
unchanged (overlay_ids in DOM order; existing fallback_position branch
absorbs the omitted overlay_position_<id> fields).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 21:32:45 +02:00
mwiegand
01760a31f5
fix(l4d2-web): textareas — monospace font, consistent rows on blueprint forms
Bash script, Arguments and Config are all structured text — render them
in a monospace font with tab-size: 4 and resize: vertical via a base
'textarea' rule in components.css. Add rows="8" + spellcheck="false"
to the blueprint Arguments/Config textareas (both edit and create
forms) so they're a sensible size and consistent with each other.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:52:12 +02:00
mwiegand
7b31390b4c
fix(l4d2-web): file tree — uniform vertical spacing across all rows
The flex 'gap' shorthand on .file-tree-row was setting row-gap as well
as column-gap, so when the .file-tree-children div wrapped to a new
line the row-gap (--space-s) added on top of the nested ul's
margin-top (--space-xs) — making the button-to-first-child gap visibly
bigger than the sibling-row gap. Switch to 'gap: 0 var(--space-s)' so
only column-gap applies; vertical rhythm is now owned exclusively by
the outer grid gap (--space-xs) and the nested ul margin-top
(--space-xs), both equal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:49:05 +02:00
mwiegand
4619a91f45
fix(l4d2-web): file tree layout — wrap children to next line, align names
Two CSS fixes that together turn the rendered file tree from
'everything on one line' into an actual tree:

- .file-tree-children: flex-basis: 100% so an expanded folder's children
  wrap to the next line of the parent <li> flex container instead of
  flowing inline next to the toggle button.
- .file-tree-row-file: padding-left = chevron width, so file rows align
  visually with sibling folder names (folder names are offset by their
  chevron; files have no chevron, so without padding they'd start at
  the chevron column instead of the name column). Chevron itself
  pinned to width: 1ch so rotated/un-rotated states have identical
  layout.
2026-05-08 20:44:41 +02:00
mwiegand
caa8b83cf0
chore(deploy): rewrite web.env every deploy with machine-id-derived SECRET_KEY
Drops the 'only on first creation' guard so newly added env vars reach
existing boxes (today's SESSION_COOKIE_SECURE=false rake). SECRET_KEY
is now sha256(/etc/machine-id) — stable per host, no session
invalidation across redeploys, no state persisted in /etc that the
deploy has to tiptoe around. Single-operator test deployment; the
secret being machine-id-derivable is acceptable per deploy/README.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:39:02 +02:00
mwiegand
c958d0352a
fix(l4d2-web): show empty-state when overlay dir is empty, not just missing
Tickrate and other seeded examples whose overlay directory exists but
hasn't been built yet rendered a visually blank Files panel — entries
was [] (not None), so the template fell through to an empty <ul>. Use
'not file_tree_root_entries' so both None (dir missing) and []
(dir empty) trigger the 'No files yet' message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:32:09 +02:00
mwiegand
2ab54a3800
fix(l4d2-web): file tree fetches in plain JS — vendored htmx is a stub
The vendored static/vendor/htmx.min.js turned out to be a 33-byte
placeholder, so the hx-get/hx-target/hx-trigger attributes on the
overlay file tree's folder buttons were inert: clicks rotated the
chevron (own JS) but never fetched. Switch the lazy-load to a
~30-line plain-JS handler in static/js/file-tree.js that fetches
button.dataset.filesUrl on first expand and dedupes via dataset.loaded.
Update the spec/plan to match. Route + partial contracts unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:23:04 +02:00
mwiegand
a11d030edd
feat(l4d2-web): overlay detail Files section with HTMX file tree + downloads
Adds a server-rendered collapsible file tree section to the overlay
detail page so users can verify what their script/workshop overlays
produced and pull individual artifacts (VPKs, configs) without SSH.
HTMX-driven lazy folder expansion with click-to-download via send_file;
symlinks land anywhere under LEFT4ME_ROOT (so workshop addons stream
from the shared cache) but escapes are refused. Same access rule as the
rest of the page (admin or owner). 39 new tests; full web suite green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:16:25 +02:00
mwiegand
76bd6e8d4d
docs(specs): overlay file tree — design + implementation plan
Captures the design rationale for the new overlay-detail Files section
(verify build output, click-to-download for individual files via Flask
send_file, HTMX-driven lazy folder expansion) and the paired
implementation plan that produced it. Adds .superpowers/ to .gitignore
so brainstorm session artifacts never sneak into a future commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:16:10 +02:00
mwiegand
1166e13e44
feat(l4d2-web): server identity by id, name as display label
Host-side identifier (systemd unit name and /var/lib/left4me dirs) is now
str(server.id), centralized in services/server_identity.server_unit_name.
Server.name becomes a free-form display label, required and unique per
user (was [a-z0-9_-]{1,64} and globally unique).

Migration 0006 swaps the old global UNIQUE(name) for UNIQUE(user_id, name).
Web routes already keyed on id; templates only used name for display.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 19:22:09 +02:00
mwiegand
0d906605e9
chore: add direnv .envrc for local Python 3.13 venv
Pins to python3.13 to match the Debian Trixie production target.
Documents the dev setup in README and AGENTS.md so a fresh checkout
gets a working `python` via `direnv allow` + editable installs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 18:56:51 +02:00
mwiegand
196d2db33e
feat(l4d2-web): seed example script overlays from examples/script-overlays/
Bundles four reference script overlays (cedapug_maps, l4d2center_maps,
competitive_rework, tickrate) and adds a `flask seed-script-overlays`
CLI that upserts each *.sh as a system-wide overlay. Test deploy
invokes it after the orphan-cleanup migration so fresh test servers
come up with the same overlays the user has been maintaining by hand.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 18:41:08 +02:00
mwiegand
6b4eef22c2
feat: server Reset action — wipe runtime, keep DB row
Reset stops the systemd service, unmounts the overlay, and rm -rf's both
runtime/<name> and instances/<name>, but keeps the Server row, blueprint,
and (shared) systemd template. Next Start re-initializes from the current
blueprint, so users can clean up logs/caches/accumulated game state without
losing the server.

Implementation factors a shared _purge_instance helper out of
delete_instance; reset_instance reuses it without the existence guard. New
"reset" lifecycle op flows through the same route + worker + facade plumbing
as the other server ops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 18:10:32 +02:00
mwiegand
c8a2d563ce
fix(l4d2-web): server delete job now removes the DB row
The delete job ran l4d2ctl delete (host-side cleanup) but never removed the
Server row, so deleted servers kept appearing on /servers. Hard-delete the
row in the worker's success path and skip the post-op status refresh, since
the systemd unit is gone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 18:09:45 +02:00
mwiegand
fb3c6be052
feat(l4d2-web): per-overlay job list + redirect to job after build-triggering edits
Saving a script overlay or adding/removing workshop items now redirects to the
enqueued build job's detail page so logs are immediately visible. Added a new
/overlays/<id>/jobs page (linked as "all builds →" from the overlay detail
page) for browsing the full build history. Renamed the script "Save" button to
"Save and build" to make the side effect explicit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:44:22 +02:00
mwiegand
5e2c771276
chore(l4d2-web): remove orphaned 'Global map overlays' admin section
The route /admin/global-overlays/refresh was removed with the script-overlays
rewrite (migration 0005 dropped the global_overlay_* tables; the systemd
refresh units were deleted from deploy/). The admin-page form was left
behind and would 404 on submit. Drop the section and lock it out with an
assertion in the existing admin-pages test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:25:15 +02:00
mwiegand
ebddb0fab2
chore(deploy): install p7zip + coreutils for script-overlay tooling
Script overlays commonly need 7z and md5sum (e.g. the l4d2center map
sync recipe). Add p7zip-full to the apt install line, p7zip + p7zip-plugins
to dnf, and coreutils explicitly so md5sum is guaranteed even on slim base
images. Lock both in with a regression test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:23:23 +02:00
mwiegand
406f2196f8
fix(l4d2-web): write sandbox script tmpfile under LEFT4ME_ROOT, not /tmp
The web service unit has PrivateTmp=yes: its /tmp is a per-instance
namespace at /tmp/systemd-private-X-left4me-web.service-Y/tmp/ from
PID 1's perspective. When ScriptBuilder writes /tmp/tmpXXX.sh and
passes that path to the sandbox helper, systemd-run asks PID 1 to set
up BindReadOnlyPaths=${SCRIPT}:/script.sh — but PID 1 lives in the host
namespace and can't resolve the web service's PrivateTmp path. The
unit fails to start with status=226/NAMESPACE and "Failed to set up
mount namespacing: /script.sh: No such file or directory".

Move the tmpfile to ${LEFT4ME_ROOT}/sandbox-scripts/. /var/lib is not
affected by PrivateTmp (only /tmp and /var/tmp are), so PID 1 can
resolve the path. The web service has ReadWritePaths=/var/lib/left4me
already, and the directory is created on demand by Python.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:14:21 +02:00
mwiegand
023cc5c9b0
fix(deploy): chown WAL+SHM sidecars too, not just left4me.db
SQLite in WAL mode (the default for this app) maintains left4me.db-wal
and left4me.db-shm sidecar files alongside the main DB. All three must
be writable by the web service uid; if any one is root-owned, SQLite
reports "attempt to write a readonly database" on the next INSERT —
which surfaced as a 500 on POST /overlays/{id}/script after I'd done
ad-hoc root-side sqlite3.connect() inspection earlier and the resulting
root-owned WAL/SHM persisted.

Loop over all three paths in the deploy chmod step so root-owned
sidecars are corrected on every deploy. Idempotent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:11:42 +02:00
mwiegand
f6ca85fc6f
fix(deploy): chown left4me.db to left4me:left4me, not root:left4me
The v2 hardening tightened the DB to mode 0640 owned by root:left4me,
intending to block reads from the sandbox uid (l4d2-sandbox, not in the
left4me group). It did — but it also took away write access from the web
service itself, which runs as user left4me. With root owning the file,
left4me only had group-read; INSERTs into the jobs table failed with
"attempt to write a readonly database" and surfaced as a 500 on POST
/overlays/{id}/script.

Owner left4me + group left4me + mode 0640 keeps the same external
posture (l4d2-sandbox gets nothing via "other") while restoring the
web service's read+write access via "owner".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:09:47 +02:00
mwiegand
abc907b14b
docs(specs): script sandbox v3 — egress filter design + plan
Captures the v3 design: IPAddressDeny= alone (no IPAddressAllow=any
because the documented "more specific wins" semantics don't hold on
systemd 257 / kernel 6.12 — the allow trumps unconditionally), explicit
CIDRs (the -p parser rejects the localhost/link-local shorthand
keywords), and a static sandbox-only resolv.conf bind to keep DNS
reachable when private RFC1918 ranges are blocked.

Plan documents what was implemented (in 7e66936) and the lessons
surfaced during execution so the next person doesn't have to rediscover
them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:08:47 +02:00
mwiegand
7e66936d03
feat(deploy): restrict script-sandbox egress to public internet only
Adds IPAddressDeny= to the sandbox unit covering loopback (127/8 + ::1),
link-local (169.254/16 + fe80::/10), multicast (224/4 + ff00::/8), all
RFC1918 v4 (10/8, 172.16/12, 192.168/16), CGNAT (100.64/10), and ULA v6
(fc00::/7). The kernel attaches systemd's sd_fw_egress BPF program to
the unit's cgroup; egress packets matching any of the deny prefixes are
silently dropped at the cgroup boundary.

Important: do NOT pair this with `IPAddressAllow=any`. Documentation
claims "more specific rule wins" but on this systemd 257 + kernel 6.12
combo, having both set causes the allow to win unconditionally — the
deny gets ignored. Empty IPAddressAllow + populated IPAddressDeny is the
correct shape: kernel default "allow all" applies to non-listed
addresses, and the listed prefixes are blocked.

Because the host's resolv.conf typically points at a private-IP DNS
server (10.0.0.1 in the test deploy), blocking RFC1918 also kills DNS.
Adds a static /etc/left4me/sandbox-resolv.conf with public resolvers
(Cloudflare 1.1.1.1, Google 8.8.8.8) and bind-mounts that into the
sandbox at /etc/resolv.conf, replacing the host's resolver inside the
sandbox only.

Smoke-tested on ckn@10.0.4.128:
- public 1.1.1.1:443: CONNECTED
- public HTTPS via DNS (steamcommunity.com): 200
- localhost web app 127.0.0.1:8000: blocked (TimeoutError)
- localhost sshd 127.0.0.1:22: blocked
- private LAN ssh 10.0.4.128:22: blocked
- private DNS 10.0.0.1:53: blocked

AF_UNIX stays in RestrictAddressFamilies — dropping it would risk
breaking NSS / syslog for marginal gain, and the IP-level filter
addresses the primary threat (reaching the host's HTTP/SSH services).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:04:57 +02:00
mwiegand
ae443299c8
chore(deploy): drop bubblewrap apt dep + tighten left4me.db mode
bubblewrap is no longer used now that left4me-script-sandbox runs as a
systemd service unit. Remove it from the apt-get and dnf install lines.

Also tighten the application database file mode after the alembic
upgrade step: chown root:left4me, chmod 0640. The DB had been created
at default 0644 by SQLite's open() call inside the web service, which
made it world-readable on the host — i.e. readable by any uid that can
traverse /var/lib/left4me, including the sandbox's l4d2-sandbox uid.
Smoke-testing the v2 sandbox prototype on ckn@10.0.4.128 surfaced this:
the sandbox could read "SQLite format 3" from the DB until the parent
dir was masked with TemporaryFileSystem=. Tightening the file mode is
the host-level fix; the sandbox-level mask is defense in depth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:48:26 +02:00
mwiegand
4ee8f6af44
refactor(deploy): rewrite left4me-script-sandbox to systemd-only — drop bwrap
Replaces the systemd-run --scope + bwrap composition with systemd-run in
service-unit mode (--pipe --wait, transient .service unit). Same cgroup
limits and walltime kill, plus the hardening directives that --scope
units cannot carry: NoNewPrivileges, ProtectSystem=strict, ProtectHome,
ProtectKernel{Tunables,Modules,Logs,ControlGroups}, RestrictNamespaces,
RestrictAddressFamilies, RestrictSUIDSGID, LockPersonality,
MemoryDenyWriteExecute, SystemCallFilter (seccomp), and an empty
CapabilityBoundingSet (drops all caps). UID drop via User=/Group=.

The TemporaryFileSystem="/etc /var/lib" pair is the gotcha:
ProtectSystem=strict makes /var/lib *read-only* but visible, so the host
DB at /var/lib/left4me/left4me.db (mode 0644) was readable from inside.
Masking /var/lib with tmpfs hides the entire subtree; the BindPaths bind
to /overlay is at a different path and unaffected.

The Python side (ScriptBuilder, run_sandboxed_script, routes) is
unchanged — same sudo-helper invocation, same argv shape.

Loses PID-namespace isolation (no PrivatePID= directive in systemd).
Host PIDs are visible via /proc and ps -ef but not signal-able due to
UID mismatch — information disclosure only, not a privilege boundary.

Smoke-tested on ckn@10.0.4.128 prior to this commit; all isolation
invariants reproduced and the hardening directives provably blocked
unshare(2), mount(2), personality(2), bpf(2), and sysctl writes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:47:30 +02:00
mwiegand
efaaf84cd9
docs(specs): script sandbox v2 — systemd-only design + plan
Spec captures the v2 architecture (systemd-run service mode with full
hardening directives, no bwrap), the two surfaces in scope (helper
rewrite + bubblewrap dep removal + left4me.db mode tightening), and the
gotchas surfaced by smoke-testing the prototype on ckn@10.0.4.128:
- ProtectSystem=strict makes /var/lib/left4me visible (not invisible);
  must add TemporaryFileSystem=/var/lib to mask it.
- Script bind via BindReadOnlyPaths uses ${SCRIPT}:/script.sh syntax.
- No PrivatePID= directive in systemd; host PIDs visible via /proc.
  Information disclosure only — kernel UID-mismatch blocks signals.

Plan breaks the migration into 4 tasks (helper rewrite, deploy-script
deps + DB mode, host smoke-test, drift sweep) with explicit rollback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:46:13 +02:00
mwiegand
a62f26ba4a
fix(l4d2-web): normalize CRLF to LF in script overlay POST
HTML <textarea> form submission encodes line breaks as CRLF per spec.
Storing those CRLFs unchanged means every line of the script reaches
bash with a trailing \r, which bash treats as part of the argument —
turning "ls /" into "ls /\r" and failing. Normalize CRLF/CR → LF in the
/overlays/{id}/script handler so storage and the sandbox tmpfile are
LF-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:20:10 +02:00
mwiegand
908bca3687
fix(l4d2-web): ScriptBuilder — chmod script tmpfile to 0644 for sandbox read
NamedTemporaryFile creates the script file at mode 0600 owned by the
left4me web user. The sandbox runs as l4d2-sandbox and bwrap bind-mounts
the file read-only at /script.sh, but the kernel still enforces the
underlying file's permissions — l4d2-sandbox can't read 0600 left4me
files, so /bin/bash /script.sh fails with "Permission denied".

Script content is not a secret (it's stored in the DB and editable by
the user), so 0644 is appropriate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:18:00 +02:00
mwiegand
cf865d4915
fix(deploy): one-shot cleanup of orphan overlay dirs after globals removal
Migration 0005_script_overlays drops the legacy l4d2center_maps /
cedapug_maps overlay rows but leaves their /var/lib/left4me/overlays/{id}
directories on disk. When the web app subsequently creates a new overlay
and AUTOINCREMENT issues an id matching one of those orphans,
create_overlay_directory(exist_ok=False) crashes with FileExistsError —
which surfaced as a 500 on POST /overlays the first time a script
overlay was created on a deployed test box.

Adds a sentinel-gated sweep in deploy-test-server.sh that lists overlay
ids in the DB, removes any directory under overlays/ whose id has no
matching row, and drops the now-unused global_overlay_cache. Mirrors the
.kernel-overlay-migrated sentinel pattern so reruns are no-ops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:16:33 +02:00
mwiegand
06ae84fbe4
fix(deploy): script-sandbox helper — UID drop via systemd-run, --unshare-user-try, /etc/alternatives
Smoke testing on the test host revealed three issues with the helper as
shipped:

1. bwrap 0.11+ rejects --uid without --unshare-user. Switching the UID
   drop from inside bwrap to systemd-run (--uid=l4d2-sandbox
   --gid=l4d2-sandbox) sidesteps the userns UID-mapping headaches and
   keeps file ownership on the bind-mounted /overlay matching
   l4d2-sandbox on the host (which the wipe path relies on).

2. bwrap running as an unprivileged uid still needs a user namespace to
   set up its mount-namespace bind-mounts. Adding --unshare-user-try
   gives it the userns context when needed and is a no-op otherwise.

3. /etc/alternatives wasn't bind-mounted, so symlinked tools like
   /usr/bin/awk -> /etc/alternatives/awk fell over inside the sandbox.
   Adds the ro-bind.

Also: the helper now chowns the overlay dir to l4d2-sandbox before bwrap
(idempotent — needed because the web app creates the dir as left4me),
and the deploy script chmods /var/lib/left4me to 0711 so l4d2-sandbox
can traverse to the bind-mount source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:12:46 +02:00
mwiegand
1e62a44c16
docs(deploy): replace globals overlay description with script overlays
deploy/README.md still described the deleted managed-global overlays as
the second overlay surface. Replace with a description of script
overlays (bubblewrap + systemd-run sandbox, resource caps).

Full test sweep: 367 passing, 2 skipped across l4d2web, l4d2host, deploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:56:24 +02:00
mwiegand
e51a4d58a4
chore(deploy): provision l4d2-sandbox + bubblewrap; drop globals refresh timer
deploy-test-server.sh: provisions the l4d2-sandbox system user (no home,
nologin shell) and installs the bubblewrap apt/dnf package; copies the
left4me-script-sandbox helper into /usr/local/libexec/left4me with mode
0755. Drops the global_overlay_cache directory provisioning, the
refresh-global-overlays unit installation, and the timer enable.

Deletes the orphaned left4me-refresh-global-overlays.{service,timer}
files. Trims the matching paragraph from deploy/README.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:54:57 +02:00
mwiegand
75e703e1a4
feat(deploy): left4me-script-sandbox helper + sudoers fragment
Privileged bash helper that wraps user-authored scripts in
systemd-run --scope (cgroup limits + RuntimeMaxSec=3600) inside a
bubblewrap sandbox dropped to the l4d2-sandbox uid. Network is shared
with the host so scripts can fetch from Steam / l4d2center / etc.;
filesystem is RO except for /overlay (rw bind from
/var/lib/left4me/overlays/{id}) and tmpfs /tmp + /run.

Adds a sudoers rule allowing the left4me user to invoke this helper
without restrictions on its arguments. Strict argument validation is
in the helper itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:53:21 +02:00
mwiegand
d351bcbee5
feat(l4d2-web): script overlay UI
Adds the script type to the create-overlay modal (with an admin-only
system-wide checkbox) and a script-section to the detail page: textarea
for the bash body, Save / Rebuild / Wipe buttons, last_build_status
badge, latest-build-job link, and a Wipe confirm modal. Removes the
GlobalOverlaySource block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:50:36 +02:00
mwiegand
be22744d54
feat(l4d2-web): script overlay routes (script update / wipe / build)
Adds POST /overlays/{id}/script, /wipe, /build under the overlay blueprint.
Generalizes /build to handle any owner/admin-editable overlay (deletes the
duplicate workshop-specific manual_build). Wipe runs the literal script
"find /overlay -mindepth 1 -delete" through run_sandboxed_script and
refuses with 409 while a build_overlay job is running. Adds an
admin-only system_wide=1 flag to POST /overlays for system-wide creation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:48:15 +02:00
mwiegand
879c54cbda
refactor(l4d2-web): drop refresh_global_overlays from scheduler
GLOBAL_OPERATIONS becomes {"install", "refresh_workshop_items"}.
Removes refresh_global_overlays_running from SchedulerState and the
_run_refresh_global_overlays dispatch. Drops dead test cases and pins
GLOBAL_OPERATIONS contents.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:45:34 +02:00
mwiegand
9f476e3456
refactor(l4d2-web): drop global-overlays subsystem in favor of script type
Deletes the global_map_sources, global_overlay_refresh, global_map_cache,
and global_overlays service modules and their tests. Removes the
refresh-global-overlays CLI command, the /admin/global-overlays/refresh
route, and the GlobalOverlaySource view in overlay_detail rendering.
Drops py7zr from dependencies — was only used by the deleted subsystem.

The job_worker scheduler still tracks refresh_global_overlays; that
cleanup is Task 4. Deploy/README references are Task 8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:43:41 +02:00
mwiegand
d29afa41fa
feat(l4d2-web): ScriptBuilder + BUILDERS registry update
Adds ScriptBuilder that runs user-authored bash inside the
left4me-script-sandbox helper via run_command, with a 20 GB post-build
disk cap. Registry now {"workshop", "script"}.
finish_job writes Overlay.last_build_status on build_overlay completion.
Drops GlobalMapOverlayBuilder and the now-unreachable
_check_global_overlay_caches in l4d2_facade.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:39:13 +02:00
mwiegand
43dc9b0ccf
feat(l4d2-web): script overlay schema — add overlay.script + last_build_status, drop globals tables
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:33:04 +02:00
mwiegand
78ead0b41d
docs(specs): script overlay type — design + implementation plan
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:27:14 +02:00
mwiegand
9985ecc56c
chore(deploy): cleanup left4me-web hardening + docs for kernel overlayfs
Drop MountFlags=shared (the assumption that it propagated fuse mounts
to host was incorrect on systemd 257 with ProtectSystem+ReadWritePaths).
Restore PrivateTmp=true (was dropped in 593611e for fuse propagation
that did not work). Rewrite the comment block to describe the new
model: mounts go through the left4me-overlay helper which nsenters
into PID 1's mount namespace, so the unit's mount-ns layout is no
longer load-bearing.

Update the three user-facing READMEs (root, l4d2host, deploy) to drop
fuse-overlayfs / fusermount3 prereqs and call out the kernel overlayfs
mount path through the privileged helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:29:49 +02:00