Commit graph

356 commits

Author SHA1 Message Date
mwiegand
20604dd79c
docs(deploy): document CPU isolation in performance-tuning section
Explains the core-0-vs-game-cores split, the LEFT4ME_SYSTEM_CPUS /
LEFT4ME_GAME_CPUS overrides, the single-core skip, and the
subset-of relationship with per-instance CPUAffinity=.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 11:06:59 +02:00
mwiegand
af3171102a
feat(deploy): cgroup-v2 cpuset drop-ins pin system to core 0, game to rest
Computes NPROC at deploy time. Defaults LEFT4ME_SYSTEM_CPUS=0 and
LEFT4ME_GAME_CPUS=1-(NPROC-1). Single-core hosts skip cpuset writes
with a stderr warning unless an env var override is set. Spec:
docs/superpowers/specs/2026-05-09-l4d2-cpu-isolation-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 11:06:34 +02:00
mwiegand
c91c029c38
docs(plans): l4d2 cpu isolation — implementation plan
Two TDD tasks: deploy-script cpuset block + tests, README
"CPU isolation" subsection. Operator-side smoke test in F.3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 11:03:37 +02:00
mwiegand
17b7c2ff10
docs(specs): l4d2 cpu isolation — design
cgroup-v2 AllowedCPUs= drop-ins for system/user/build/game slices.
Defaults: core 0 for everything-not-game, cores 1..N-1 for game,
computed from nproc. LEFT4ME_SYSTEM_CPUS / LEFT4ME_GAME_CPUS
overrides; single-core hosts skip with a warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 11:03:37 +02:00
mwiegand
e5126c8c0b
docs(deploy): tighten perf-tuning escape hatches
- RT example: add AmbientCapabilities=CAP_SYS_NICE so the User=left4me
  service can actually enter SCHED_FIFO on Trixie.
- CPU governor: note that linux-cpupower may need apt install.
- CPUAffinity=2: clarify that per-instance values typically increment.
- NIC tuning: note that ethtool may need apt install.
2026-05-09 10:15:45 +02:00
mwiegand
9e0f6f17ef
docs(deploy): performance-tuning escape-hatch section in README
Documents CPU governor, per-instance CPUAffinity, NIC tuning, and
SCHED_FIFO opt-in patterns. None of these are auto-applied; they're
ops-side knobs for measured problems the perf baseline doesn't solve.
2026-05-09 10:09:40 +02:00
mwiegand
928519fa34
feat(deploy): install slice + sysctl artifacts and apply via sysctl --system
Copies l4d2-game.slice and l4d2-build.slice into
/usr/local/lib/systemd/system/, installs 99-left4me.conf into
/etc/sysctl.d/, and runs sysctl --system so the perf baseline is
live this deploy, not on next reboot.
2026-05-09 10:05:41 +02:00
mwiegand
7e4a5691ed
feat(deploy): script-sandbox runs in l4d2-build.slice + OOMScoreAdjust=500
Builds yield CPU/IO to game-server instances under contention via the
slice's weight=10, and are killed first under memory pressure
(servers have OOMScoreAdjust=-200).
2026-05-09 10:01:38 +02:00
mwiegand
b3fca4772c
feat(deploy): host sysctls for UDP buffers + netdev backlog/budget
99-left4me.conf: rmem_max/wmem_max=8M (with 512K defaults),
netdev_max_backlog=5000, netdev_budget=600, vm.swappiness=10.
2026-05-09 09:53:07 +02:00
mwiegand
66d83a0282
docs(deploy): point slice files at perf baseline spec
Matches the spec-pointer comment Task 1 added to
left4me-server@.service. A future operator running
`systemctl cat l4d2-game.slice` now finds the rationale.
2026-05-09 09:51:48 +02:00
mwiegand
ad7d73608e
feat(deploy): l4d2-game.slice + l4d2-build.slice with 100:1 weight ratio
Flat top-level slices. Game wins under contention; build still gets
the box when uncontended. Referenced by left4me-server@.service and
the script-sandbox systemd-run invocation.
2026-05-09 09:48:41 +02:00
mwiegand
7193163488
feat(deploy): perf-baseline directives on left4me-server@.service
Slice=l4d2-game.slice, Nice=-5, IOSchedulingClass=best-effort,
OOMScoreAdjust=-200, MemoryHigh=1.5G, MemoryMax=2G, TasksMax=256,
LimitNOFILE=65536, KillSignal=SIGINT, TimeoutStopSec=15s,
LogRateLimitIntervalSec=0. Spec:
docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md
2026-05-09 09:44:12 +02:00
mwiegand
851e6629aa
docs(plans): l4d2 server host perf baseline — implementation plan
Six tasks (TDD, one commit each): unit directives, slice files,
sysctl conf, sandbox slice + OOMScoreAdjust, deploy-script wiring,
README escape-hatch section. Final verification step with full
deploy + host + web pytest sweep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 09:39:12 +02:00
mwiegand
b6574e308b
docs(specs): perf baseline — fix transient-service phrasing
The existing left4me-script-sandbox helper uses systemd-run in
transient service mode (--unit=, no --scope). Spec wrongly said
'--scope'. No semantic change — the design's --slice= and
-p OOMScoreAdjust= guidance is identical for service vs scope mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 09:39:12 +02:00
mwiegand
db3b149045
docs(specs): l4d2 server host perf baseline — design
Approach A: per-instance unit directives (Nice, OOM, Memory caps,
KillSignal=SIGINT, log-rate disable), flat l4d2-game/l4d2-build slice
hierarchy with 100:1 CPU/IO weight ratio, sandbox into build slice with
OOMScoreAdjust=500, host sysctls for UDP buffers + netdev backlog/budget
+ vm.swappiness. SCHED_FIFO, CPU governor, CPUAffinity, NIC tuning are
documented escape hatches, not auto-applied.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 09:31:05 +02:00
mwiegand
965b67e6fc
fix(l4d2-host): script-sandbox normalizes file perms so web user can read
Cedapug's build script writes .cedapug/manifest.tsv with mode 0600 owned
by l4d2-sandbox; the web service (left4me uid) then 500s when streaming
that file via the download route — PermissionError on open().

Two fixes:
- UMask=0022 on the systemd-run unit so new file writes default to
  0644 / dirs to 0755.
- Post-script chmod o+r/o+rx walk over the overlay dir to backfill any
  stricter modes the script left behind (e.g. shells/tools that ignore
  umask and explicitly create with 0600).

The helper no longer execs systemd-run; it captures the rc, runs the
post-step, and exits with the original rc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 01:44:26 +02:00
mwiegand
c16e780283
feat(l4d2-web): server file tree — enable download symmetric with overlay tree
Adds a /servers/<id>/files/download route mirroring the overlay download
endpoint. Same safety rules: real-path must resolve under LEFT4ME_ROOT
(merged view threads through `installation/` and overlay layers, all
already inside the root). The server file-tree partial now renders
download links.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 01:40:04 +02:00
mwiegand
aacd95012e
feat(l4d2-web): blueprint rename moves to footer modal — matches overlay/server pattern
Drops the inline Name input from the blueprint edit form. A Rename link
sits next to Delete in the page footer; clicking opens a one-line modal
that posts to a new POST /blueprints/<id>/rename route. The main edit
form keeps the current name as a hidden input so its full Save still
works unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 01:37:29 +02:00
mwiegand
ed12280cf0
feat(l4d2-web): server detail — directory tree of the runtime merged view
Adds a Files section at the bottom of the server detail page that lists
the kernel-overlayfs merged view at runtime/<server_id>/merged/. Reuses
the overlay file-tree partial via two new template variables:

- files_base_url: parent passes "/overlays/<id>" or "/servers/<id>"
- download_supported: false for servers (runtime holds large game
  binaries; no download endpoint), true for overlays (existing behavior)

New service helper safe_resolve_for_server_listing() rejects path
traversal beyond the merged root and returns None when the overlayfs
mount doesn't exist (server never started or just reset).

New route GET /servers/<id>/files?path=<rel> returns the lazy-load
file-tree fragment, gated to the server owner. No download counterpart.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 01:35:09 +02:00
mwiegand
fa686f11e3
feat(l4d2-web): server + overlay detail — live-refresh via HTMX, restructured
Vendors HTMX 2.0.4 (the prior file was a 1-line stub) and uses it to poll
two new partials on a 2s tick while a job is in flight:

- /servers/<id>/actions → state badge, filtered action buttons,
  last-job sentence, live job log (SSE) while a Start/Stop/Reset job
  is running. When the job is terminal the partial re-renders without
  hx-trigger and polling stops.
- /overlays/<id>/build-status → build state badge, last-build
  sentence, live job log while a build_overlay job is running. Same
  terminal-state stop behavior.

Server detail restructure:
- Editable name moves out of the page body into a Rename modal
  triggered from a link next to Delete in the page footer.
- Compact dl with Port (linked as steam://run/550//+connect <host>:<port>)
  and Blueprint.
- Actions row: state badge + state-filtered buttons (start/stop, reset)
  + last-job sentence. Drift warning when desired ≠ actual.
- Recent Jobs table removed.

Overlay detail restructure:
- Single panel, dl Type/Scope, no separate Last build row, no Builds
  section.
- Script form gets two compound submits: "Save and build" and
  "Save, reset and rebuild". Standalone Rebuild/Wipe gone.
- Build status state badge + last-build sentence under the editor;
  action buttons hide while a build is in flight.
- Rename modal in the page footer next to Delete.

sse.js binds on htmx:load (covers initial document and post-swap inserts)
and closes EventSources on htmx:beforeCleanupElement to avoid leaking
streams across swaps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 01:27:30 +02:00
mwiegand
3c4bd6880a
refactor(l4d2-web): detail-page UI — single panel, soft border, footer Delete
- Detail panels: softer (color-mix --line-soft) border. h2 sub-section
  spacing inside a single outer panel. admin and job_detail collapse to
  one panel each.
- Color tokens: --color-button-primary / --color-button-danger stay
  saturated in dark mode so white text on filled buttons stays readable.
- Site header: transparent, no full-width bar; aligned with panel-content
  width. No more sticky.
- Page-level Delete: low-contrast outline button at the page footer
  (left side, justify-content flex-start). Save buttons no longer
  full-width (.stack > button { justify-self: end }).
- form-actions-inline helper for right-aligned button rows.
- New service: l4d2web.services.timeago.humanize_delta — used by the
  upcoming server / overlay live-status partials.
- Server route: POST /servers/<id> renames the server (mirrors the
  overlay update pattern, returns 409 on per-user duplicate).
- Overlay route: POST /overlays/<id>/script handles `action` form value
  — `save_build` (default) or `save_reset_build` (wipes overlay dir
  before queuing build). Redirect lands on /overlays/<id> instead of
  the job page so users see the live status.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 01:26:57 +02:00
mwiegand
985df970f8
feat(l4d2-web): per-overlay server.cfg aliases — expose checkbox + auto-exec
Each linked overlay gets a checkbox on the blueprint detail page that opts
its server.cfg in as exec server_overlay_<id>. The web app builds the
spec with {path, alias} per overlay and prepends exec server_overlay_<id>
lines to the blueprint config in lowest-overlay-first order. The host
stages those copies in the overlayfs upper layer before mounting (avoids
copy-up writes against a sandbox-uid file). A live preview block above the
Config textarea shows what gets auto-executed.

Schema:
- alembic 0007: BlueprintOverlay.expose_server_cfg BOOLEAN

Spec contract:
- l4d2host OverlayRef(path, alias?). load_spec accepts both bare-string
  and {path, alias} entries.

Side effects folded in (same file in l4d2_facade):
- start_server auto-initializes; the manual Initialize step is no longer
  needed before Start.
- initialize_server no longer runs blueprint builders — builds happen on
  overlay save, not on every server Start.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 01:26:31 +02:00
mwiegand
c2cf723911
docs(agents): require specs and plans to live in this repo
Make explicit that design specs go in docs/superpowers/specs/ and
implementation plans go in docs/superpowers/plans/, both committed
to git, with the YYYY-MM-DD-<topic>[-design].md naming already used
elsewhere in the tree. The plan-mode scratch file under
~/.claude/plans/ is fine while plan mode is open, but the persisted
artifact must end up inside the repo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 21:37:17 +02:00
mwiegand
a4e9f6cd26
feat(l4d2-web): blueprint overlay picker — drag-list + add-dropdown
Replace the per-row checkbox + numeric Order table on the blueprint
detail page with a drag-to-reorder list of selected overlays plus a
native <select> for adding more. Removing uses an × button per row;
the option sorted-inserts back into the dropdown alphabetically.

Native HTML5 drag-and-drop, no library, no JS-disabled fallback.
Server contract is unchanged: each list row owns one hidden
<input name="overlay_ids">, DOM order = submission order, and the
existing fallback_position branch in ordered_overlay_ids_from_form
absorbs the now-omitted overlay_position_<id> fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 21:37:11 +02:00
mwiegand
dec4fed809
docs(specs): blueprint overlay picker — drag-list + add-dropdown
Replace per-row checkbox + numeric Order inputs with a drag-to-reorder
list of selected overlays plus a native <select> for adding more.
Native HTML5 DnD; no library, no JS-disabled fallback. Server contract
unchanged (overlay_ids in DOM order; existing fallback_position branch
absorbs the omitted overlay_position_<id> fields).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 21:32:45 +02:00
mwiegand
01760a31f5
fix(l4d2-web): textareas — monospace font, consistent rows on blueprint forms
Bash script, Arguments and Config are all structured text — render them
in a monospace font with tab-size: 4 and resize: vertical via a base
'textarea' rule in components.css. Add rows="8" + spellcheck="false"
to the blueprint Arguments/Config textareas (both edit and create
forms) so they're a sensible size and consistent with each other.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:52:12 +02:00
mwiegand
7b31390b4c
fix(l4d2-web): file tree — uniform vertical spacing across all rows
The flex 'gap' shorthand on .file-tree-row was setting row-gap as well
as column-gap, so when the .file-tree-children div wrapped to a new
line the row-gap (--space-s) added on top of the nested ul's
margin-top (--space-xs) — making the button-to-first-child gap visibly
bigger than the sibling-row gap. Switch to 'gap: 0 var(--space-s)' so
only column-gap applies; vertical rhythm is now owned exclusively by
the outer grid gap (--space-xs) and the nested ul margin-top
(--space-xs), both equal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:49:05 +02:00
mwiegand
4619a91f45
fix(l4d2-web): file tree layout — wrap children to next line, align names
Two CSS fixes that together turn the rendered file tree from
'everything on one line' into an actual tree:

- .file-tree-children: flex-basis: 100% so an expanded folder's children
  wrap to the next line of the parent <li> flex container instead of
  flowing inline next to the toggle button.
- .file-tree-row-file: padding-left = chevron width, so file rows align
  visually with sibling folder names (folder names are offset by their
  chevron; files have no chevron, so without padding they'd start at
  the chevron column instead of the name column). Chevron itself
  pinned to width: 1ch so rotated/un-rotated states have identical
  layout.
2026-05-08 20:44:41 +02:00
mwiegand
caa8b83cf0
chore(deploy): rewrite web.env every deploy with machine-id-derived SECRET_KEY
Drops the 'only on first creation' guard so newly added env vars reach
existing boxes (today's SESSION_COOKIE_SECURE=false rake). SECRET_KEY
is now sha256(/etc/machine-id) — stable per host, no session
invalidation across redeploys, no state persisted in /etc that the
deploy has to tiptoe around. Single-operator test deployment; the
secret being machine-id-derivable is acceptable per deploy/README.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:39:02 +02:00
mwiegand
c958d0352a
fix(l4d2-web): show empty-state when overlay dir is empty, not just missing
Tickrate and other seeded examples whose overlay directory exists but
hasn't been built yet rendered a visually blank Files panel — entries
was [] (not None), so the template fell through to an empty <ul>. Use
'not file_tree_root_entries' so both None (dir missing) and []
(dir empty) trigger the 'No files yet' message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:32:09 +02:00
mwiegand
2ab54a3800
fix(l4d2-web): file tree fetches in plain JS — vendored htmx is a stub
The vendored static/vendor/htmx.min.js turned out to be a 33-byte
placeholder, so the hx-get/hx-target/hx-trigger attributes on the
overlay file tree's folder buttons were inert: clicks rotated the
chevron (own JS) but never fetched. Switch the lazy-load to a
~30-line plain-JS handler in static/js/file-tree.js that fetches
button.dataset.filesUrl on first expand and dedupes via dataset.loaded.
Update the spec/plan to match. Route + partial contracts unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:23:04 +02:00
mwiegand
a11d030edd
feat(l4d2-web): overlay detail Files section with HTMX file tree + downloads
Adds a server-rendered collapsible file tree section to the overlay
detail page so users can verify what their script/workshop overlays
produced and pull individual artifacts (VPKs, configs) without SSH.
HTMX-driven lazy folder expansion with click-to-download via send_file;
symlinks land anywhere under LEFT4ME_ROOT (so workshop addons stream
from the shared cache) but escapes are refused. Same access rule as the
rest of the page (admin or owner). 39 new tests; full web suite green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:16:25 +02:00
mwiegand
76bd6e8d4d
docs(specs): overlay file tree — design + implementation plan
Captures the design rationale for the new overlay-detail Files section
(verify build output, click-to-download for individual files via Flask
send_file, HTMX-driven lazy folder expansion) and the paired
implementation plan that produced it. Adds .superpowers/ to .gitignore
so brainstorm session artifacts never sneak into a future commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:16:10 +02:00
mwiegand
1166e13e44
feat(l4d2-web): server identity by id, name as display label
Host-side identifier (systemd unit name and /var/lib/left4me dirs) is now
str(server.id), centralized in services/server_identity.server_unit_name.
Server.name becomes a free-form display label, required and unique per
user (was [a-z0-9_-]{1,64} and globally unique).

Migration 0006 swaps the old global UNIQUE(name) for UNIQUE(user_id, name).
Web routes already keyed on id; templates only used name for display.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 19:22:09 +02:00
mwiegand
0d906605e9
chore: add direnv .envrc for local Python 3.13 venv
Pins to python3.13 to match the Debian Trixie production target.
Documents the dev setup in README and AGENTS.md so a fresh checkout
gets a working `python` via `direnv allow` + editable installs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 18:56:51 +02:00
mwiegand
196d2db33e
feat(l4d2-web): seed example script overlays from examples/script-overlays/
Bundles four reference script overlays (cedapug_maps, l4d2center_maps,
competitive_rework, tickrate) and adds a `flask seed-script-overlays`
CLI that upserts each *.sh as a system-wide overlay. Test deploy
invokes it after the orphan-cleanup migration so fresh test servers
come up with the same overlays the user has been maintaining by hand.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 18:41:08 +02:00
mwiegand
6b4eef22c2
feat: server Reset action — wipe runtime, keep DB row
Reset stops the systemd service, unmounts the overlay, and rm -rf's both
runtime/<name> and instances/<name>, but keeps the Server row, blueprint,
and (shared) systemd template. Next Start re-initializes from the current
blueprint, so users can clean up logs/caches/accumulated game state without
losing the server.

Implementation factors a shared _purge_instance helper out of
delete_instance; reset_instance reuses it without the existence guard. New
"reset" lifecycle op flows through the same route + worker + facade plumbing
as the other server ops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 18:10:32 +02:00
mwiegand
c8a2d563ce
fix(l4d2-web): server delete job now removes the DB row
The delete job ran l4d2ctl delete (host-side cleanup) but never removed the
Server row, so deleted servers kept appearing on /servers. Hard-delete the
row in the worker's success path and skip the post-op status refresh, since
the systemd unit is gone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 18:09:45 +02:00
mwiegand
fb3c6be052
feat(l4d2-web): per-overlay job list + redirect to job after build-triggering edits
Saving a script overlay or adding/removing workshop items now redirects to the
enqueued build job's detail page so logs are immediately visible. Added a new
/overlays/<id>/jobs page (linked as "all builds →" from the overlay detail
page) for browsing the full build history. Renamed the script "Save" button to
"Save and build" to make the side effect explicit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:44:22 +02:00
mwiegand
5e2c771276
chore(l4d2-web): remove orphaned 'Global map overlays' admin section
The route /admin/global-overlays/refresh was removed with the script-overlays
rewrite (migration 0005 dropped the global_overlay_* tables; the systemd
refresh units were deleted from deploy/). The admin-page form was left
behind and would 404 on submit. Drop the section and lock it out with an
assertion in the existing admin-pages test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:25:15 +02:00
mwiegand
ebddb0fab2
chore(deploy): install p7zip + coreutils for script-overlay tooling
Script overlays commonly need 7z and md5sum (e.g. the l4d2center map
sync recipe). Add p7zip-full to the apt install line, p7zip + p7zip-plugins
to dnf, and coreutils explicitly so md5sum is guaranteed even on slim base
images. Lock both in with a regression test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:23:23 +02:00
mwiegand
406f2196f8
fix(l4d2-web): write sandbox script tmpfile under LEFT4ME_ROOT, not /tmp
The web service unit has PrivateTmp=yes: its /tmp is a per-instance
namespace at /tmp/systemd-private-X-left4me-web.service-Y/tmp/ from
PID 1's perspective. When ScriptBuilder writes /tmp/tmpXXX.sh and
passes that path to the sandbox helper, systemd-run asks PID 1 to set
up BindReadOnlyPaths=${SCRIPT}:/script.sh — but PID 1 lives in the host
namespace and can't resolve the web service's PrivateTmp path. The
unit fails to start with status=226/NAMESPACE and "Failed to set up
mount namespacing: /script.sh: No such file or directory".

Move the tmpfile to ${LEFT4ME_ROOT}/sandbox-scripts/. /var/lib is not
affected by PrivateTmp (only /tmp and /var/tmp are), so PID 1 can
resolve the path. The web service has ReadWritePaths=/var/lib/left4me
already, and the directory is created on demand by Python.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:14:21 +02:00
mwiegand
023cc5c9b0
fix(deploy): chown WAL+SHM sidecars too, not just left4me.db
SQLite in WAL mode (the default for this app) maintains left4me.db-wal
and left4me.db-shm sidecar files alongside the main DB. All three must
be writable by the web service uid; if any one is root-owned, SQLite
reports "attempt to write a readonly database" on the next INSERT —
which surfaced as a 500 on POST /overlays/{id}/script after I'd done
ad-hoc root-side sqlite3.connect() inspection earlier and the resulting
root-owned WAL/SHM persisted.

Loop over all three paths in the deploy chmod step so root-owned
sidecars are corrected on every deploy. Idempotent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:11:42 +02:00
mwiegand
f6ca85fc6f
fix(deploy): chown left4me.db to left4me:left4me, not root:left4me
The v2 hardening tightened the DB to mode 0640 owned by root:left4me,
intending to block reads from the sandbox uid (l4d2-sandbox, not in the
left4me group). It did — but it also took away write access from the web
service itself, which runs as user left4me. With root owning the file,
left4me only had group-read; INSERTs into the jobs table failed with
"attempt to write a readonly database" and surfaced as a 500 on POST
/overlays/{id}/script.

Owner left4me + group left4me + mode 0640 keeps the same external
posture (l4d2-sandbox gets nothing via "other") while restoring the
web service's read+write access via "owner".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:09:47 +02:00
mwiegand
abc907b14b
docs(specs): script sandbox v3 — egress filter design + plan
Captures the v3 design: IPAddressDeny= alone (no IPAddressAllow=any
because the documented "more specific wins" semantics don't hold on
systemd 257 / kernel 6.12 — the allow trumps unconditionally), explicit
CIDRs (the -p parser rejects the localhost/link-local shorthand
keywords), and a static sandbox-only resolv.conf bind to keep DNS
reachable when private RFC1918 ranges are blocked.

Plan documents what was implemented (in 7e66936) and the lessons
surfaced during execution so the next person doesn't have to rediscover
them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:08:47 +02:00
mwiegand
7e66936d03
feat(deploy): restrict script-sandbox egress to public internet only
Adds IPAddressDeny= to the sandbox unit covering loopback (127/8 + ::1),
link-local (169.254/16 + fe80::/10), multicast (224/4 + ff00::/8), all
RFC1918 v4 (10/8, 172.16/12, 192.168/16), CGNAT (100.64/10), and ULA v6
(fc00::/7). The kernel attaches systemd's sd_fw_egress BPF program to
the unit's cgroup; egress packets matching any of the deny prefixes are
silently dropped at the cgroup boundary.

Important: do NOT pair this with `IPAddressAllow=any`. Documentation
claims "more specific rule wins" but on this systemd 257 + kernel 6.12
combo, having both set causes the allow to win unconditionally — the
deny gets ignored. Empty IPAddressAllow + populated IPAddressDeny is the
correct shape: kernel default "allow all" applies to non-listed
addresses, and the listed prefixes are blocked.

Because the host's resolv.conf typically points at a private-IP DNS
server (10.0.0.1 in the test deploy), blocking RFC1918 also kills DNS.
Adds a static /etc/left4me/sandbox-resolv.conf with public resolvers
(Cloudflare 1.1.1.1, Google 8.8.8.8) and bind-mounts that into the
sandbox at /etc/resolv.conf, replacing the host's resolver inside the
sandbox only.

Smoke-tested on ckn@10.0.4.128:
- public 1.1.1.1:443: CONNECTED
- public HTTPS via DNS (steamcommunity.com): 200
- localhost web app 127.0.0.1:8000: blocked (TimeoutError)
- localhost sshd 127.0.0.1:22: blocked
- private LAN ssh 10.0.4.128:22: blocked
- private DNS 10.0.0.1:53: blocked

AF_UNIX stays in RestrictAddressFamilies — dropping it would risk
breaking NSS / syslog for marginal gain, and the IP-level filter
addresses the primary threat (reaching the host's HTTP/SSH services).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:04:57 +02:00
mwiegand
ae443299c8
chore(deploy): drop bubblewrap apt dep + tighten left4me.db mode
bubblewrap is no longer used now that left4me-script-sandbox runs as a
systemd service unit. Remove it from the apt-get and dnf install lines.

Also tighten the application database file mode after the alembic
upgrade step: chown root:left4me, chmod 0640. The DB had been created
at default 0644 by SQLite's open() call inside the web service, which
made it world-readable on the host — i.e. readable by any uid that can
traverse /var/lib/left4me, including the sandbox's l4d2-sandbox uid.
Smoke-testing the v2 sandbox prototype on ckn@10.0.4.128 surfaced this:
the sandbox could read "SQLite format 3" from the DB until the parent
dir was masked with TemporaryFileSystem=. Tightening the file mode is
the host-level fix; the sandbox-level mask is defense in depth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:48:26 +02:00
mwiegand
4ee8f6af44
refactor(deploy): rewrite left4me-script-sandbox to systemd-only — drop bwrap
Replaces the systemd-run --scope + bwrap composition with systemd-run in
service-unit mode (--pipe --wait, transient .service unit). Same cgroup
limits and walltime kill, plus the hardening directives that --scope
units cannot carry: NoNewPrivileges, ProtectSystem=strict, ProtectHome,
ProtectKernel{Tunables,Modules,Logs,ControlGroups}, RestrictNamespaces,
RestrictAddressFamilies, RestrictSUIDSGID, LockPersonality,
MemoryDenyWriteExecute, SystemCallFilter (seccomp), and an empty
CapabilityBoundingSet (drops all caps). UID drop via User=/Group=.

The TemporaryFileSystem="/etc /var/lib" pair is the gotcha:
ProtectSystem=strict makes /var/lib *read-only* but visible, so the host
DB at /var/lib/left4me/left4me.db (mode 0644) was readable from inside.
Masking /var/lib with tmpfs hides the entire subtree; the BindPaths bind
to /overlay is at a different path and unaffected.

The Python side (ScriptBuilder, run_sandboxed_script, routes) is
unchanged — same sudo-helper invocation, same argv shape.

Loses PID-namespace isolation (no PrivatePID= directive in systemd).
Host PIDs are visible via /proc and ps -ef but not signal-able due to
UID mismatch — information disclosure only, not a privilege boundary.

Smoke-tested on ckn@10.0.4.128 prior to this commit; all isolation
invariants reproduced and the hardening directives provably blocked
unshare(2), mount(2), personality(2), bpf(2), and sysctl writes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:47:30 +02:00
mwiegand
efaaf84cd9
docs(specs): script sandbox v2 — systemd-only design + plan
Spec captures the v2 architecture (systemd-run service mode with full
hardening directives, no bwrap), the two surfaces in scope (helper
rewrite + bubblewrap dep removal + left4me.db mode tightening), and the
gotchas surfaced by smoke-testing the prototype on ckn@10.0.4.128:
- ProtectSystem=strict makes /var/lib/left4me visible (not invisible);
  must add TemporaryFileSystem=/var/lib to mask it.
- Script bind via BindReadOnlyPaths uses ${SCRIPT}:/script.sh syntax.
- No PrivatePID= directive in systemd; host PIDs visible via /proc.
  Information disclosure only — kernel UID-mismatch blocks signals.

Plan breaks the migration into 4 tasks (helper rewrite, deploy-script
deps + DB mode, host smoke-test, drift sweep) with explicit rollback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:46:13 +02:00
mwiegand
a62f26ba4a
fix(l4d2-web): normalize CRLF to LF in script overlay POST
HTML <textarea> form submission encodes line breaks as CRLF per spec.
Storing those CRLFs unchanged means every line of the script reaches
bash with a trailing \r, which bash treats as part of the argument —
turning "ls /" into "ls /\r" and failing. Normalize CRLF/CR → LF in the
/overlays/{id}/script handler so storage and the sandbox tmpfile are
LF-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:20:10 +02:00