Commit graph

58 commits

Author SHA1 Message Date
mwiegand
bc25d423aa
plan(left4me): move idmap from gameserver mount to script-sandbox build
Architectural cleanup: the uid translation is a build-time concern
(the sandbox produces sandbox-uid files); having the gameserver path
unwind that producer-side decision on every mount means the mount
helper carries idmap lifecycle code it shouldn't need. Moving the
idmap into the script-sandbox bind makes files land left4me-owned on
disk, drops ~140 lines from left4me-overlay, and makes all overlay
content (workshop + script-built) consistent on-disk.

Verified on left4.me kernel 6.12.86 that the kernel idmap propagates
through plain re-bind, so systemd-run's BindPaths can wrap a
pre-created idmapped staging path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 01:15:46 +02:00
mwiegand
2b20bffeb8
spec: handoff doc for rethinking deploy/ dir architecture
The 2026-05-15 script-consolidation pass landed a working but
half-finished mental model: deploy/files/ was retroactively promoted
from "historical reference" to "canonical source," but only for the
script files. Several adjacent things (sudoers/sysctl duplication
across both repos, the systemd unit files that ckn-bw's reactor
ignores, deploy-test-server.sh's role, dead-code apply-cake) didn't
get resolved. Capture the open questions and pointers so a future
session can pick this up and commit to a coherent shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 00:53:55 +02:00
mwiegand
3a2c379b71
plan(left4me-overlay): idmap lowerdir bind mounts for cross-uid copy-up
Persist the implementation plan for adding idmapped bind mounts to
left4me-overlay so that overlay copy-up from l4d2-sandbox-owned lower
layers (script-built overlays) produces left4me-owned upperdir entries
the gameserver can write. Mechanism verified end-to-end on ovh.left4me
in a temp dir on 2026-05-14.
2026-05-14 23:42:36 +02:00
mwiegand
1d3eb51871
docs(plan): RCON console on server detail page
Plan for adding a per-server RCON console: HTMX append-swap input form,
fixed-height scrolling transcript replayed from CommandHistory on load,
multi-packet response handling, owner-only access, 30s timeout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 21:14:06 +02:00
mwiegand
f3f0a8927a
docs: add server hostname implementation plan 2026-05-13 14:21:30 +02:00
mwiegand
fcf3143b39
docs: add server hostname cvar design spec 2026-05-13 14:19:57 +02:00
mwiegand
e75feb0649
docs: add rcon password display implementation plan 2026-05-13 11:36:08 +02:00
mwiegand
358a835d65
docs: add rcon password display design spec 2026-05-13 11:35:46 +02:00
mwiegand
83d2a9932c
refactor(rcon): harden _parse_duration; surface fixture handler errors
- _parse_duration wraps int() in try/except so malformed connected
  durations raise RconError (not ValueError leaking past the poller's
  except RconError).
- fake_rcon_server captures handler exceptions and re-raises at context
  exit, so a buggy test handler surfaces as a real failure instead of
  silently degrading into a client-side timeout.
- Two new parser tests: HH:MM:SS duration parsing and malformed input
  coverage.
- Fix Steam ID formula typo in the spec doc (Z*2 + Y, not Y*2 + Z; Y is
  the low bit). Code was already correct.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 21:39:32 +02:00
mwiegand
e25e7098f6
refactor(live-state): drop redundant ix_sps_server_recent index
The two indexes ix_sps_server_open and ix_sps_server_recent were
byte-identical because SQLAlchemy's Index(name, *cols) form drops the
DESC ordering the spec intended. Rather than reach for text("left_at
DESC"), drop the second index entirely — SQLite scans the ASC index
backwards at no measurable cost. Spec and plan updated to match.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 21:27:01 +02:00
mwiegand
a5f7b736a2
docs/plan: server live-state display implementation plan
Thirteen TDD-structured tasks covering schema migration, RCON client,
spec injection, password generation, Steam Web API client, live-state
poller (RLE snapshots + session reconciliation + profile enrichment +
retention + thread startup), server list badge, server detail
fragment, deploy env, and end-to-end smoke.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 21:10:33 +02:00
mwiegand
202026e11a
docs/spec: add server live-state display design
RCON-based polling with run-length-encoded snapshots, session intervals
with min/max ping, Steam profile cache, and a server-detail roster of
current + recent players hot-linked from Steam CDN avatars.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 21:03:26 +02:00
mwiegand
429fee3868
docs/plan: trim retry-backoff tuple to match attempts-1 2026-05-11 23:21:10 +02:00
mwiegand
532b4c4469
docs: implementation plan for workshop auto-download
7 tasks: retry helper, builder download phase, per-overlay refresh
route, template button, CLI subcommand, systemd timer, smoke-test.
2026-05-11 22:34:31 +02:00
mwiegand
fef8cc4ea6
docs: design for workshop auto-download
Closes the gap where added workshop items never reach disk until an
admin presses the global refresh button. Downloads piggyback on the
per-overlay build_overlay job; daily updates come from a systemd
timer + CLI subcommand that enqueues the existing refresh job.
2026-05-11 22:28:20 +02:00
mwiegand
ccd3b36319
docs: design for profile page with self-service password change
The matching design doc for the implementation plan committed in
6eb9bd0. Captures the session-invalidation reasoning (Django-style
"keep current session, kill others") and the open questions resolved
during brainstorming.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 22:21:40 +02:00
mwiegand
6eb9bd0ab3
docs: plan for profile page with self-service password change
Adds /profile reachable via header username, with change-password form
as its first section. Industry-standard session semantics: other sessions
invalidated on password change, current session kept, via new
users.password_changed_at column + session marker.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 21:41:25 +02:00
mwiegand
e1add4fffa
docs(plans): l4d2 network shaping & marking — implementation plan
Eight TDD tasks: sysctl extension, nftables marking (file + unit), CAKE
shaper (env + helper + unit), deploy-script wiring, README. Each task
adds one artifact with its assertion in test_deploy_artifacts.py and
ends in its own commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 00:10:40 +02:00
mwiegand
0cc92f2c17
docs(specs): l4d2 network shaping & marking — design
CAKE egress shaping (test-deploy oneshot + systemd-networkd [CAKE] block
on prod), nftables uid-based DSCP-EF + skb-priority marking for srcds
UDP, plus rounding sysctls (udp_rmem_min/wmem_min, default_qdisc=fq_codel,
tcp_congestion_control=bbr). Hardware-specific knobs stay documented
escape hatches matching the perf-baseline boundary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 00:05:44 +02:00
mwiegand
2d3c98866a
feat(files-overlay): user-managed file content as a third overlay type
Adds Overlay.type='files' whose source-of-truth IS the overlay directory
itself. Users can:

  * upload arbitrary files / whole folders by dragging from the OS onto a
    folder row in the file tree (one POST per file, queue with
    concurrency 3, per-file progress in a floating Uploads panel)
  * move via drag-and-drop inside the tree (same gesture, source
    distinguishes; refuses cycles)
  * create / edit / rename / replace through a single editor modal
    (text flavor for editable files, binary flavor with replace-upload
    for everything else; filename input is the rename surface)
  * mkdir empty folders (slashes allowed for nested intermediates)
  * stream a folder as a zip download
  * delete files and empty folders

Backend is type-agnostic past the new files_routes endpoints, so the
existing mount / spec / overlayfs / expose_server_cfg pipeline is reused
unchanged. is_editable gates the row's edit affordance and the /save
content rules. Three new safe-resolve helpers (write/delete/move) cover
the new operations with the same anchor-and-resolve pattern as listing
and download. FilesBuilder is a no-op so the build subsystem can
dispatch uniformly.

Spec: docs/superpowers/specs/2026-05-09-files-overlay-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 18:59:32 +02:00
mwiegand
36d3d83de6
docs: postmortem for the overlay-umount EBUSY rabbit hole
Captures the symptom (Reset blew up on `umount target busy`), the
false starts (eager retry, lazy fallback, TimeoutStopSec bump — all
shipped briefly and reverted), the actual root cause (the helper's
own Python interpreter inheriting and pinning the unit's mount
namespace), and the fix (nsenter at the systemd Exec line).

The lessons section is the part future-me reads first: a retry loop
is a hint that something we own is the blocker; probe `/proc/*/ns/mnt`
before assuming kernel async; `+` Exec prefix doesn't escape the
unit's mount namespace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 15:50:41 +02:00
mwiegand
b62fc08127
docs(specs): l4d2 cpu pinning — decision record (deferred)
Investigated whether to hard-pin each srcds instance to a single core
within the existing AllowedCPUs=1-7 set. Modern kernels (5.13+) no
longer expose kernel.sched_migration_cost_ns or the other classic CFS
"laziness" tunables, so a global cheap-fix is unavailable. Decision
for now: trust CFS + Nice=-5 + AllowedCPUs=1-7. Per-instance
CPUAffinity= remains an opt-in escape hatch in deploy/README.md.
Documents the revisit triggers and the preferred implementation path
when the time comes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 12:41:40 +02:00
mwiegand
1dd674714a
docs(specs): perf baseline lifecycle — premise check on system vs user units
Make explicit that the project uses system units (root systemctl, unit
under /usr/local/lib/systemd/system/, WantedBy=multi-user.target), so
`systemctl enable --now` is the correct verb to make instances survive
a host reboot. User units have different lifecycle rules and would not
auto-start at boot without enable-linger.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 12:25:34 +02:00
mwiegand
3b0bde9b50
docs(plans): l4d2 server lifecycle reboot-and-drift — implementation plan
Two TDD tasks: helper+service_control verb rename, then poller code
+ wiring + tests. Operator-side smoke test in F.3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 12:21:59 +02:00
mwiegand
72cd7ca1ef
docs(specs): l4d2 server lifecycle reboot-and-drift — design
Switch lifecycle verbs from systemctl start/stop to enable --now /
disable --now (servers survive host reboot via WantedBy= symlinks),
plus a periodic state poller for runtime drift (OOM kills, manual
systemctl ops, exhausted Restart=on-failure).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 12:21:59 +02:00
mwiegand
c91c029c38
docs(plans): l4d2 cpu isolation — implementation plan
Two TDD tasks: deploy-script cpuset block + tests, README
"CPU isolation" subsection. Operator-side smoke test in F.3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 11:03:37 +02:00
mwiegand
17b7c2ff10
docs(specs): l4d2 cpu isolation — design
cgroup-v2 AllowedCPUs= drop-ins for system/user/build/game slices.
Defaults: core 0 for everything-not-game, cores 1..N-1 for game,
computed from nproc. LEFT4ME_SYSTEM_CPUS / LEFT4ME_GAME_CPUS
overrides; single-core hosts skip with a warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 11:03:37 +02:00
mwiegand
851e6629aa
docs(plans): l4d2 server host perf baseline — implementation plan
Six tasks (TDD, one commit each): unit directives, slice files,
sysctl conf, sandbox slice + OOMScoreAdjust, deploy-script wiring,
README escape-hatch section. Final verification step with full
deploy + host + web pytest sweep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 09:39:12 +02:00
mwiegand
b6574e308b
docs(specs): perf baseline — fix transient-service phrasing
The existing left4me-script-sandbox helper uses systemd-run in
transient service mode (--unit=, no --scope). Spec wrongly said
'--scope'. No semantic change — the design's --slice= and
-p OOMScoreAdjust= guidance is identical for service vs scope mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 09:39:12 +02:00
mwiegand
db3b149045
docs(specs): l4d2 server host perf baseline — design
Approach A: per-instance unit directives (Nice, OOM, Memory caps,
KillSignal=SIGINT, log-rate disable), flat l4d2-game/l4d2-build slice
hierarchy with 100:1 CPU/IO weight ratio, sandbox into build slice with
OOMScoreAdjust=500, host sysctls for UDP buffers + netdev backlog/budget
+ vm.swappiness. SCHED_FIFO, CPU governor, CPUAffinity, NIC tuning are
documented escape hatches, not auto-applied.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 09:31:05 +02:00
mwiegand
985df970f8
feat(l4d2-web): per-overlay server.cfg aliases — expose checkbox + auto-exec
Each linked overlay gets a checkbox on the blueprint detail page that opts
its server.cfg in as exec server_overlay_<id>. The web app builds the
spec with {path, alias} per overlay and prepends exec server_overlay_<id>
lines to the blueprint config in lowest-overlay-first order. The host
stages those copies in the overlayfs upper layer before mounting (avoids
copy-up writes against a sandbox-uid file). A live preview block above the
Config textarea shows what gets auto-executed.

Schema:
- alembic 0007: BlueprintOverlay.expose_server_cfg BOOLEAN

Spec contract:
- l4d2host OverlayRef(path, alias?). load_spec accepts both bare-string
  and {path, alias} entries.

Side effects folded in (same file in l4d2_facade):
- start_server auto-initializes; the manual Initialize step is no longer
  needed before Start.
- initialize_server no longer runs blueprint builders — builds happen on
  overlay save, not on every server Start.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 01:26:31 +02:00
mwiegand
dec4fed809
docs(specs): blueprint overlay picker — drag-list + add-dropdown
Replace per-row checkbox + numeric Order inputs with a drag-to-reorder
list of selected overlays plus a native <select> for adding more.
Native HTML5 DnD; no library, no JS-disabled fallback. Server contract
unchanged (overlay_ids in DOM order; existing fallback_position branch
absorbs the omitted overlay_position_<id> fields).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 21:32:45 +02:00
mwiegand
2ab54a3800
fix(l4d2-web): file tree fetches in plain JS — vendored htmx is a stub
The vendored static/vendor/htmx.min.js turned out to be a 33-byte
placeholder, so the hx-get/hx-target/hx-trigger attributes on the
overlay file tree's folder buttons were inert: clicks rotated the
chevron (own JS) but never fetched. Switch the lazy-load to a
~30-line plain-JS handler in static/js/file-tree.js that fetches
button.dataset.filesUrl on first expand and dedupes via dataset.loaded.
Update the spec/plan to match. Route + partial contracts unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:23:04 +02:00
mwiegand
76bd6e8d4d
docs(specs): overlay file tree — design + implementation plan
Captures the design rationale for the new overlay-detail Files section
(verify build output, click-to-download for individual files via Flask
send_file, HTMX-driven lazy folder expansion) and the paired
implementation plan that produced it. Adds .superpowers/ to .gitignore
so brainstorm session artifacts never sneak into a future commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:16:10 +02:00
mwiegand
1166e13e44
feat(l4d2-web): server identity by id, name as display label
Host-side identifier (systemd unit name and /var/lib/left4me dirs) is now
str(server.id), centralized in services/server_identity.server_unit_name.
Server.name becomes a free-form display label, required and unique per
user (was [a-z0-9_-]{1,64} and globally unique).

Migration 0006 swaps the old global UNIQUE(name) for UNIQUE(user_id, name).
Web routes already keyed on id; templates only used name for display.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 19:22:09 +02:00
mwiegand
abc907b14b
docs(specs): script sandbox v3 — egress filter design + plan
Captures the v3 design: IPAddressDeny= alone (no IPAddressAllow=any
because the documented "more specific wins" semantics don't hold on
systemd 257 / kernel 6.12 — the allow trumps unconditionally), explicit
CIDRs (the -p parser rejects the localhost/link-local shorthand
keywords), and a static sandbox-only resolv.conf bind to keep DNS
reachable when private RFC1918 ranges are blocked.

Plan documents what was implemented (in 7e66936) and the lessons
surfaced during execution so the next person doesn't have to rediscover
them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:08:47 +02:00
mwiegand
efaaf84cd9
docs(specs): script sandbox v2 — systemd-only design + plan
Spec captures the v2 architecture (systemd-run service mode with full
hardening directives, no bwrap), the two surfaces in scope (helper
rewrite + bubblewrap dep removal + left4me.db mode tightening), and the
gotchas surfaced by smoke-testing the prototype on ckn@10.0.4.128:
- ProtectSystem=strict makes /var/lib/left4me visible (not invisible);
  must add TemporaryFileSystem=/var/lib to mask it.
- Script bind via BindReadOnlyPaths uses ${SCRIPT}:/script.sh syntax.
- No PrivatePID= directive in systemd; host PIDs visible via /proc.
  Information disclosure only — kernel UID-mismatch blocks signals.

Plan breaks the migration into 4 tasks (helper rewrite, deploy-script
deps + DB mode, host smoke-test, drift sweep) with explicit rollback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:46:13 +02:00
mwiegand
78ead0b41d
docs(specs): script overlay type — design + implementation plan
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:27:14 +02:00
mwiegand
db120d77d3
docs(specs): kernel overlayfs migration design + plan
Captures the architectural fix for the mount-propagation bug: replace
fuse-overlayfs (rootless mount inside the web service's namespace, never
visible to host or to gameserver units) with kernel-native overlayfs
mounted via a privileged sudo helper that nsenters into PID 1's mount
namespace. Companion plan numbers the migration as five tasks ending in
end-to-end verification on the test box.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:19:26 +02:00
mwiegand
92d6ebbe82
feat(l4d2-web): managed global map overlays with daily refresh
Adds two managed system overlays (l4d2center-maps, cedapug-maps) that
fetch curated map archives from upstream sources and reconcile addons
symlinks for non-Steam maps. A daily systemd timer enqueues a coalesced
refresh_global_overlays worker job; downloads, extraction, and rebuilds
run in the existing job worker and surface in the job log UI.

Schema: GlobalOverlaySource / GlobalOverlayItem / GlobalOverlayItemFile
plus nullable Job.user_id so system jobs render as "system" in the UI.
The new builder reconciles symlinks against the per-source vpk cache
and leaves foreign symlinks untouched. Initialize-time guard refuses
to mount a partial overlay if any expected vpk is missing from cache.

Refresh service uses shutil.move to handle EXDEV when /tmp and the
cache live on different filesystems.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 08:05:14 +02:00
mwiegand
b46f52258d
docs(workshop): spec and plan for steam workshop overlays
Add a typed-overlay model with workshop as the first non-external type:
deduplicated WorkshopItem registry, symlink-based overlay directories,
auto-rebuild after item changes, admin global refresh, and a unified
Create-overlay UI with web-managed paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:25:13 +02:00
mwiegand
7e5a8f89b5
docs: add server port constraint implementation plan 2026-05-06 20:53:50 +02:00
mwiegand
833ae318cf
fix(deploy): add venv to PATH in left4me-web systemd service 2026-05-06 20:45:37 +02:00
mwiegand
bbfc528354
feat(deploy): add production-like test deployment 2026-05-06 19:30:10 +02:00
mwiegand
de86139323
feat(l4d2): add l4d2ctl host command boundary 2026-05-06 16:35:20 +02:00
mwiegand
a347829608
feat(l4d2-web): add job pages and cancellation 2026-05-06 15:05:13 +02:00
mwiegand
91d042cf33
feat(l4d2-web): execute queued lifecycle jobs 2026-05-06 14:08:18 +02:00
mwiegand
df680f6226
fix(l4d2-web): reject encoded unsafe redirects 2026-05-06 13:24:04 +02:00
mwiegand
58fb8b2b63
fix(l4d2-web): harden auth redirect targets 2026-05-06 13:01:48 +02:00
mwiegand
deca2c9153
docs(l4d2-web): update auth contract 2026-05-06 12:55:38 +02:00