Commit graph

327 commits

Author SHA1 Message Date
mwiegand
49992b3a26
refactor(repo): uv workspace + hatchling + layout restructure
Migrate from pip-install-e + setuptools to a uv workspace with a
committed uv.lock for deterministic deps. Switch both members to
hatchling, and move package sources into nested standard layout
(l4d2host/l4d2host/, l4d2web/l4d2web/) so builds work from a
read-only source tree — setuptools wrote egg-info to source under
the old layout, which broke uv sync on the root-owned /opt/left4me/src.

Local dev install: `pip install -e ./l4d2host -e ./l4d2web` -> `uv sync`.
.envrc switches from `layout python python3.13` to `use uv`. Python
pinned to 3.13 via .python-version.

l4d2web now declares its cross-dep on l4d2host explicitly via
[tool.uv.sources] (workspace = true). l4d2web/alembic.ini and
l4d2web/alembic/ stay at the project root (standard alembic layout).

Test fixes:
- tests/__init__.py added to both test dirs so pytest doesn't shadow
  l4d2host as a namespace package via outer-dir walk.
- 3 CWD-relative paths in tests (l4d2web/static/css/{tokens,layout}.css
  and js/sse.js) anchored to Path(__file__) so they survive layout
  changes.
- Two test_install.py tests now monkeypatch HOME to tmp_path so they
  stop silently mutating ~/.steam/sdk32 on every run.

628 tests pass under sandboxed `uv run pytest`.

Per docs/superpowers/plans/2026-05-15-uv-workspace-execution.md;
prereq for the ckn-bw bundle's uv-sync action (queued).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 22:04:29 +02:00
mwiegand
a7580ea759
deploy/tests: assert both hardening drop-ins allow x86 syscalls
The web and server hardening drop-ins both fork-exec 32-bit binaries
on critical paths (steamcmd_linux from the install job, srcds_linux
on the game side). When the web drop-in had SystemCallArchitectures=native
and the server had native x86, the asymmetry silently broke the install
flow — bash exit 159 (SIGSYS) — for as long as nobody re-triggered it.

Pin the constraint as a test: both drop-ins must agree on
SystemCallArchitectures, and both must include x86.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 20:35:18 +02:00
mwiegand
e28d4fad8c
l4d2web/csp: allow Steam avatar CDN in img-src
The live-state grid renders player avatars as <img src="https://avatars.steamstatic.com/...">,
but the CSP img-src directive was `'self' data:` — so the browser
silently blocked every avatar load, leaving placeholder circles in
place. The DB cache and Steam API path were both healthy; only the
browser-side load was blocked.

Use the wildcard *.steamstatic.com host-source rather than pinning a
single hostname: Steam rotates avatars across steamcdn-a.akamaihd.net,
avatars.akamai/cloudflare/fastly.steamstatic.com over time, and a
single-hostname allowlist would re-break on the next shuffle.

Test now pins img-src explicitly — the previous assertions only
checked default-src/frame-ancestors/form-action, so a regression of
this exact line would have silently passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 20:23:29 +02:00
mwiegand
b13d164931
spec(uv-workspace): handoff for the venv-chain → uv workspace migration
Queued for a future agent: collapse the 5-action venv chain in ckn-bw
(create_venv + pip_upgrade + pip_install [the tempdir-copy dance] +
alembic_upgrade + seed_overlays) into 3 actions backed by a uv
workspace at the left4me repo root and a single `uv sync --frozen`
driven by a committed uv.lock.

Handoff is self-contained: spike test for the source-cleanliness
assumption, fallback to Medium scope if that fails, concrete file
edits in both repos, migration order, verification matrix, and risks.
Independent of the just-shipped deployment-responsibility reshape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 20:16:38 +02:00
mwiegand
55b013833b
deploy/hardening: allow x86 syscalls on web drop-in (steamcmd is 32-bit)
The web service handles install jobs by fork-exec'ing steamcmd_linux,
a 32-bit binary. With SystemCallArchitectures=native (x86_64 only) the
kernel SIGSYS-kills it on its first i386 syscall — surfaced as bash
exit 159 (= 128 + SIGSYS) in job logs. Mirror the server drop-in's
`native x86` so the install path works again; the server unit already
needed the same allowance for srcds_linux.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 20:14:26 +02:00
mwiegand
450f9f1591
deploy/docs+cleanup: describe symlink model; drop stale scripts/ tracked paths
deploy/README.md: rewrite intro to reflect that deploy/files/ and
deploy/scripts/ are the canonical sources of truth (not examples), with
hardening drop-ins explicitly listed; reference fixtures in
files/usr/local/lib/systemd/system/ noted as such.

spec: add ## Status block marking the deployment-responsibility migration
shipped 2026-05-15.

Cleanup: remove the old scripts/{libexec,sbin,tests}/ paths that were
still tracked after the 2834ad4 move to deploy/scripts/. The content
is already present at deploy/scripts/; these entries were a tracking
artifact from an incomplete git mv.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 19:48:59 +02:00
mwiegand
2834ad4911
deploy: move scripts/{libexec,sbin}/ into deploy/scripts/
Layout consistency: everything ckn-bw deploys to the host now lives
under deploy/. ckn-bw's install_left4me_scripts copy-action goes away
in lockstep with this commit and is replaced by target-side symlinks.

Also updates all path references in docs, tests (conftest.py parents[]
depth, test_overlay_helper.py HELPER_SOURCE), and deploy/README.md.

Part of 2026-05-15-deployment-responsibility-design.md migration step 4.
2026-05-15 19:38:42 +02:00
mwiegand
55d5ab4017
plan(deployment-responsibility): mark Task 3 done 2026-05-15 19:30:35 +02:00
mwiegand
2c4bf1a27f
deploy/tests: add visudo syntax test for the sudoers drop-in
Pre-deploy syntax guard; replaces ckn-bw's per-item test_with which
won't apply to a symlink-delivered file (see deployment-responsibility
migration step 3).
2026-05-15 19:28:45 +02:00
mwiegand
3703749252
deploy/hardening: drop ProcSubset=pid from the server drop-in (regression fix)
The hardening-extraction subagent (commit just prior) re-introduced
ProcSubset=pid into the server@ drop-in because the design plan's
template had it. The directive had previously been removed from the
live unit by ckn-bw 4339289 — it hides /proc/cpuinfo and breaks
SteamAPI master-server registration, leaving the server in LAN-only
fallback ("LAN servers are restricted to local clients (class C)").

Add a negative assertion in the drop-in test so the regression cannot
sneak back in via a copy-paste edit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 19:24:34 +02:00
mwiegand
e9c172a619
deploy: extract hardening into drop-in files alongside the units
Hardening directives leave the base unit body and live in:
  deploy/files/etc/systemd/system/left4me-web.service.d/10-hardening.conf
  deploy/files/etc/systemd/system/left4me-server@.service.d/10-hardening.conf

Reference units now describe just the base operational shape (exec,
env, restart, resources). Tests split: base-unit content and hardening
profile are asserted separately.

Part of 2026-05-15-deployment-responsibility-design.md migration
step 2. ckn-bw lands the matching reactor surgery + symlink delivery.
2026-05-15 19:16:59 +02:00
mwiegand
949f1bae78
deploy/sysctl: absorb kernel.yama.ptrace_scope into the drop-in
Single source of truth for left4me sysctl tuning. The metadata entry
in ckn-bw (sysctl/kernel/yama/ptrace_scope) is removed in lockstep;
the live value is unchanged.

Part of 2026-05-15-deployment-responsibility-design.md migration step 1
(canary).
2026-05-15 19:00:35 +02:00
mwiegand
672fd9660b
plan(deployment-responsibility): five-task migration with sysctl canary
Implementation plan for 2026-05-15-deployment-responsibility-design.md.
Bite-sized steps per task; each task ends with both repos committed
and ovh.left4me idempotent. Tasks: (1) sysctl consolidation canary,
(2) hardening drop-ins, (3) sudoers symlink, (4) scripts relocation
+ symlinks, (5) cleanup + docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:57:45 +02:00
mwiegand
ddf97b3a05
spec(deployment-responsibility): mark handoff resolved by the design doc
Brainstorm happened; design at 2026-05-15-deployment-responsibility-design.md.
Handoff doc stays as the historical framing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:51:12 +02:00
mwiegand
c446f6c8eb
spec(deployment-responsibility): design — symlink hardening drop-ins, sudoers, sysctl, helpers
Conservative reshape coming out of the brainstorm: application-shape
static artifacts move to left4me/deploy/ and are delivered to the
target via bw symlink items pointing into /opt/left4me/src/deploy/...
(safe because the runtime-state relocation made the checkout
root-owned). Per-host shape — base unit bodies, slice CPU pinning,
env templates, nginx/timers/nftables metadata — stays bw-managed in
ckn-bw.

Moves: hardening drop-ins (new), sudoers (dedup mirror), sysctl
drop-in (dedup mirror + absorb ptrace_scope metadata entry),
privileged scripts (relocate scripts/ to deploy/scripts/, replace
install-action with symlinks).

Five-step migration with sysctl consolidation as the canary, then
hardening drop-ins, sudoers, scripts, cleanup. Lands before the
build-overlay-unit refactor so that work can ship its hardening
drop-in inline using this pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:48:13 +02:00
mwiegand
434ee20339
refactor(deploy): venv + steam now under /var/lib/left4me
Sync deployment references for the runtime state relocation
shipped via ckn-bw (commit 6fae2fd). /opt/left4me/ is now a
root-owned deploy-artifact root (just src/); .venv and steamcmd
live at /var/lib/left4me/{.venv,steam}.

Touches:
- deploy/files/.../left4me-web.service: PATH + ExecStart
- deploy/files/.../left4me-workshop-refresh.service: WorkingDirectory
  (was /opt/left4me, now /opt/left4me/src to match the web unit),
  PATH, ExecStart
- scripts/sbin/left4me wrapper: flask path
- deploy/tests/test_example_units.py: PATH + ExecStart assertions
  for the web unit; also fix a pre-existing broken assertion that
  read "Environment=PATH=..." (the unit has Environment=HOME=...
  PATH=... on one line, so "Environment=PATH=" was never present)
  - now reads just "PATH=..."
- deploy/README.md: paths
- l4d2host/tests/test_cli.py: LEFT4ME_STEAMCMD fixture path

Design + as-shipped record:
docs/superpowers/specs/2026-05-15-runtime-state-relocation-design.md.
The original (narrower) prereq spec at
docs/superpowers/specs/2026-05-15-handoff-noneditable-install.md
is marked superseded with a pointer to what shipped + why the
scope grew (setuptools writes egg-info to source during PEP 517
build prep).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 17:56:32 +02:00
mwiegand
ff2b5c4c5a
spec(noneditable-install): handoff for the install refactor prereq
Self-contained spec for the next agent to land the editable→
non-editable install switch and the root-ownership flip on
/opt/left4me/src. Prereq for the deployment-responsibility brainstorm:
target-side symlinks from /etc/... into the checkout's deploy/files/
only become safe once the checkout is unwritable by the left4me user.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:53:19 +02:00
mwiegand
6cf4517a88
fix(deploy/files): drop ProcSubset=pid from web reference unit
Mirrors ckn-bw fix: ProcSubset=pid hides /proc/sys/kernel/random/boot_id,
which journalctl needs at startup; web unit invokes journalctl for
live log streaming.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:14:40 +02:00
mwiegand
15c620f95c
spec(deployment-responsibility): handoff for brainstorming the deploy split
The hardening refactor + uid-collapse make the "what does left4me own
vs. ckn-bw own" question more pointed. The 2026-05-06 deployment
design already framed this: deploy/files/ in left4me mirrors target
paths, configmgmt integrates. Some artifacts have drifted into the
ckn-bw reactor since (systemd unit emissions, sysctl defaults); the
brainstorming session reconciles.

Sequenced after uid-collapse. Self-contained for a fresh Claude
session to read cold via superpowers:brainstorming.

Session-handoff updated to point at this as the next-next queued work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 15:56:38 +02:00
mwiegand
8971b23617
refactor(sandbox): collapse l4d2-sandbox user into left4me
The hardening refactor that just landed closes the same-uid attack
surface (FS view, ptrace, /proc visibility, signals) for the web +
gameserver units via systemd directives plus system-wide
kernel.yama.ptrace_scope=2. Keeping the script-sandbox on a separate
uid was the inconsistent half-step — defense-in-depth only, with
build-time-idmap complexity attached. One principle wins: harden
once, share the uid.

scripts/libexec/left4me-script-sandbox: drop the idmap block (uid
lookups, STAGING setup, cleanup_staging trap, mount --bind
--map-users), switch User=/Group= to left4me, point BindPaths at
\$OVERLAY_DIR directly. Header comment updated to reflect
hardening-not-uid as the same-uid defense. nsenter self-wrap kept —
it's about mount-namespace escape, not uid.

Tests + comments + companion docs updated. Build-time-idmap and
overlay-idmap plans marked SUPERSEDED; user-uid-split spec revised
to "1 user is correct"; one-line update notes on the hardening
specs and the build-overlay-unit-design.

Companion ckn-bw commit removes the l4d2-sandbox user + group and
tightens /var/lib/left4me from 0711 → 0755 (the traverse-only mode
was specifically for the sandbox uid).
2026-05-15 15:50:57 +02:00
mwiegand
146cb01450
plan(uid-collapse): drop l4d2-sandbox user; handoff to next session
Approved-but-not-executed plan to collapse the two-user model
(left4me + l4d2-sandbox) into one. The build-time-idmap that
translates sandbox writes back to left4me uid becomes a no-op when
source uid == target uid, so it's removed along with ~30 lines of
helper plumbing. Hardening already covers the same-uid attack
surface the sandbox uid was defending against, so collapsing makes
the architecture consistent with the web/server hardening-only
decision.

Plan: docs/superpowers/plans/2026-05-15-uid-collapse.md
Handoff: docs/superpowers/specs/2026-05-15-session-handoff.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 15:39:51 +02:00
mwiegand
f5f8db84ef
spec(session-handoff): hardening refactor landed and verified on left4.me
12-task subagent-driven refactor complete. left4me-server@1: 7.5 → 1.3
systemd-analyze. left4me-web: 8.7 → 4.1. All 6 Test 8 attack vectors
blocked post-deploy. One acceptable SECCOMP audit line per gameserver
restart (Breakpad's ptrace fork, blocked by design). Test tooling
(gdb, seccomp, libseccomp-dev) apt-removed from left4.me. uid-split
spec marked superseded.

No queued follow-up. Adjacent work: build-overlay-unit refactor and
the deferred drop-in / configmgmt-responsibility reshape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 15:17:06 +02:00
mwiegand
f615d0de75
spec(user-uid-split): mark superseded by the hardening refactor
The 1/2/3-user question is answered: stay at 2 (left4me + l4d2-sandbox).
The defenses that motivated a 3-user split (cross-uid ptrace,
cross-server contamination, web-side reach into gameserver state,
DB/env exposure to srcds) are closed by the systemd hardening
composition: PrivateUsers + PrivatePIDs + TemporaryFileSystem +
SystemCallFilter=~@debug + empty CapabilityBoundingSet.

The residual filesystem-ACL surface (mode 0640 root:left4me on DB and
web.env) is noted as a separate concern — covered for the current
deployment shape, revisit if shape changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:59:13 +02:00
mwiegand
37309ba399
spec(hardening-test-plan): fix four bugs surfaced by executor
Four corrections noted by the test plan's executor in commit 461b8d0:

- PID-lookup race: pgrep+head can pick the wrong instance. Replace
  with systemctl show -p MainPID --value left4me-server@N.service.
- gdb-from-host ptrace check: nsenter into only the mount namespace
  with root caps bypasses the SECCOMP filter, so the test is a false
  positive. Replace with systemd-run-with-same-directives probe, or
  syscall-filter inspection.
- D5 pgrep pattern: 'srcds_linux.*\@2' doesn't match because @N is
  in the unit name, not argv. Use systemctl show -p MainPID.
- scmp_sys_resolver is in the seccomp package on Debian 13, not
  libseccomp-dev.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:58:46 +02:00
mwiegand
8e678b6765
deploy/files: annotate reference units with per-directive hardening comments
Update the educational reference copies of left4me-server@.service and
left4me-web.service to match the new hardening composition from the
ckn-bw reactor (HARDENING_COMMON + HARDENING_SERVER / HARDENING_WEB).
Per-directive comments explain each defense's purpose and the threat
it addresses, so a cold reader of this repo can understand the threat
model from the unit file alone.

Top-of-file note in each reference points at the ckn-bw reactor as
the live source; reference is hand-synced.

gunicorn ExecStart in the web reference uses placeholder
'--workers 4 --threads 4' values; live emission interpolates from
metadata. This is the documented divergence between the reference
and the deployed unit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:54:10 +02:00
mwiegand
7c64910c90
spec(hardening-refactor): resolve emitter open items
Verified during plan execution that the ckn-bw systemd-bundle emitter
handles tuples and empty values as expected. SocketBindAllow port
range hard-coded since systemd directive variable substitution is not
universal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:39:11 +02:00
mwiegand
b1293f9952
plan(hardening-refactor): implementation plan against the proven composition
12 tasks across left4me + ckn-bw: emitter verification, three Python
constants in the systemd_units reactor, spread into both managed units,
sysctl drop-in, annotated reference units, four spec bug fixes, mark
uid-split spec superseded, cross-repo push, bw apply + verify on host,
apt-remove test tooling. Each task has bite-sized steps with exact
commands and expected output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:25:25 +02:00
mwiegand
81dc29a9c3
spec(hardening-refactor): revise design — inline-in-reactor, defer drop-in reshape
Going back to the inline-in-reactor shape: hardening directives land in
ckn-bw's systemd_units reactor as shared Python dicts (HARDENING_COMMON
+ HARDENING_SERVER + HARDENING_WEB), spread into each unit's Service
block. Educational reference units in deploy/files/.../*.service stay
and get per-directive comments. Operator discipline hand-syncs the
reference to the reactor; no CI drift test.

The broader responsibility reshape — hardening drop-ins living in
left4me with ckn-bw as a thin file-shipper — is worth pursuing as a
separate dedicated session, not bundled into this refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:16:02 +02:00
mwiegand
3256ed2ab1
spec(hardening-refactor): design — drop-ins owned by left4me, ckn-bw deploys
Hardening composition is application knowledge (which paths to bind, that
srcds is i386, what breaks sudo). It belongs in the left4me repo as
drop-in .conf files under deploy/files/etc/systemd/system/<unit>.d/.
ckn-bw shrinks: keeps the base units in its reactor, removes the
hardening keys, ships the drop-ins to /etc/systemd/system/. Existing
educational reference units in deploy/files/.../*.service are deleted in
favor of the drop-ins, which carry per-directive comments. Broader
configmgmt-responsibility reshape (base units leaving the reactor)
deliberately deferred to a future session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:05:38 +02:00
mwiegand
152c313315
spec(session-handoff): point next session at hardening-refactor plan
The prior handoff pointed this session at running the test plan; that's
done (commit 461b8d0). Update the handoff to point the next session at
writing docs/superpowers/plans/2026-MM-DD-hardening-refactor.md against
the proven composition, including the two amendments (x86 arch,
PrivatePIDs) and the MDW permanent exclusion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 13:43:37 +02:00
mwiegand
461b8d028f
spec(hardening): test plan executed on left4.me — results recorded
Ran the 11-test plan against left4me-server@1 (canary) and left4me-web
on left4.me / Debian 13 / systemd 257. Cleaned up all unit drop-ins;
kept the Test 9 sysctl (kernel.yama.ptrace_scope=2) per spec.

Outcomes:
- server@1 systemd-analyze: 7.5 EXPOSED → 1.3 OK
- left4me-web systemd-analyze: 8.7 EXPOSED → 4.1 OK
- All 8 attack vectors in Test 8 (D1.a-c, D2.a-c, D3, D5) blocked
- Test 6 (MemoryDenyWriteExecute) fails as predicted — Source engine
  i386 .so files have text relocations; exclude from final composition.
- Test 11 (24-48h soak) skipped per operator decision.

Two amendments to the spec's proposed composition required for the
refactor:
- SystemCallArchitectures=native x86 (not bare 'native') — srcds_linux
  is i386, the kernel kills every native-only call.
- PrivatePIDs=true added — ProtectProc=invisible alone cannot hide
  gunicorn from srcds because both run as uid 980; PrivatePIDs gives
  each instance its own PID namespace and closes D2.b.

Spec bugs surfaced and documented in the "Output" section: PID lookup
via pgrep (race vs. instance), Test 4/10 gdb-from-host doesn't
actually exercise the unit's SECCOMP filter, Test 8 D5 pgrep pattern
won't match. Tooling note corrected: scmp_sys_resolver is in
'seccomp' package, not 'libseccomp-dev'.

Next session: write docs/superpowers/plans/2026-MM-DD-hardening-refactor.md
against the proven composition; supersede the uid-split spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 13:39:50 +02:00
mwiegand
1df811e62a
spec(hardening): threat model + defenses survey + test plan; pivot handoff
Reframe the queued uid-split decision into a broader hardening analysis.
Audit found the same-uid attack surface (DB readable from srcds, ptrace
allowed, RCON stored plaintext) is closable by either uid split or
systemd directive composition; the three specs ground that choice in a
threat model, survey the defenses, and lay out a self-contained test
plan to run on left4.me next. uid-split spec deferred pending results.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 13:07:40 +02:00
mwiegand
9a2ab974e6
spec: session handoff pointing next session at uid-split
Short companion to the existing topic-specific handoff docs. Captures
the situationally-fresh state at the end of the 2026-05-15
deploy-dir-rethink + janitorial sweep so a fresh session can pick
up cold: what just landed, what's next (uid-split), what's NOT next
(build-overlay-unit, until uid-split decides), and the
decision-relevant signals that emerged during this session — mostly
that the 2-uid model was freshly load-bearing in the build-time-idmap
work and that srcds hardening already covers most of what a
gameserver-uid split would add.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 12:17:55 +02:00
mwiegand
4aa69c2461
spec(janitorial): mark items 8, 9 resolved after on-host verification
Both items were operational verifications (not code changes) against
the deployed test host ovh.left4me (141.95.32.8).

Item 8: orphan idmap binds in PID 1's mount namespace.
  `sudo findmnt --task 1 -o TARGET | grep /var/lib/left4me/runtime/.*/idmap/`
  returned zero matches with left4me-server@{1,2}.service both active.
  Either swept earlier or never appeared on this host; nothing to umount.

Item 9: Optimized Settings (overlay 8) files-overlay sanity.
  Dir is left4me:left4me end-to-end; `sudo find /var/lib/left4me/overlays/8
  -type f -uid 981` returned empty. The invariant "files-overlays are
  populated by the web app as left4me, never through the sandbox helper"
  holds.

Remaining live janitorial items: 7 (conditional on the build-overlay-unit
refactor) and 10 (SourceMod 1.13 calendar reminder, ~late 2026).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 12:14:34 +02:00
mwiegand
8f30dd7754
docs: correct stale bubblewrap references in v1 spec + live docstring
Janitorial item 6 in 2026-05-15-janitorial-cleanup.md. The v1 sandbox
design (2026-05-08-l4d2-script-overlays-design.md) was approved
2026-05-08 and superseded the same day by the v2 systemd-only design
(2026-05-08-l4d2-script-sandbox-v2-systemd.md). The current
left4me-script-sandbox helper uses systemd-run in service-unit mode;
no bwrap binary is invoked. The v1 spec still described bubblewrap as
the engine.

- v1 spec gets a top-of-file banner pointing at v2 as the supersede.
  Body preserved; the rest of the v1 design (overlay-type unification,
  resource caps, helper auth) is still valid — only the sandbox engine
  changed.
- l4d2web/services/overlay_builders.py: ScriptBuilder docstring
  "bubblewrap + systemd-run" → "hardened systemd-run transient
  service" (the as-built reality).
- scripts/tests/test_script_sandbox.py: stray "/bwrap" in a comment
  cleaned up. Negative regression assertions (`assert "bwrap" not in
  text`) intentionally retained as the guard against accidental
  re-introduction.
- Plan docs left untouched (historical action snapshots).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 12:12:31 +02:00
mwiegand
160911fbca
spec(deploy-dir-rethink): plan + mark adjacent specs resolved
Adds the implementation plan that landed in the preceding commit
(2026-05-15-deploy-dir-rethink.md) under docs/superpowers/plans/, and
marks the two related specs:

- 2026-05-15-deploy-dir-rethink-design.md (the source handoff) gets a
  "Resolved by …" banner at the top with a one-paragraph summary of
  the decisions taken. Body preserved for archaeology.

- 2026-05-15-janitorial-cleanup.md gets a status banner noting that
  items 1, 3, 4, 5 are fully resolved by the deploy-dir-rethink plan
  and item 2 is partially resolved with a third option the original
  enumeration didn't list: only the truly-dead two static units
  (cake.service, nft-mark.service) deleted, the reactor-emitted set
  (server@, web, workshop-refresh.{service,timer}, slices) retained
  as curated examples. Resolved items left in place but flagged.

Remaining live janitorial items: 6 (bubblewrap doc drift), 7
(conditional on build-overlay-unit refactor), 8 (operational idmap
bind cleanup), 9 (Optimized Settings overlay verification), 10 (SM
1.13 calendar reminder).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 12:05:53 +02:00
mwiegand
5284e28af7
refactor: move privileged scripts to scripts/{libexec,sbin}/; deploy/ is reference
Pulls the 5 privileged helpers out of deploy/files/usr/local/{libexec,sbin}/
into top-level scripts/{libexec,sbin}/. They are application-inherent code
(invoked at runtime via sudo from l4d2host/l4d2web), not deploy artifacts —
the previous nesting under deploy/files/ confused source-of-truth with
install-target FHS layout.

deploy/ now means "reference exemplar": README explaining the target
layout, plus example sudoers / sysctl / sandbox-resolv.conf / env
templates / curated systemd units (the ones ckn-bw's reactor emits).
Anyone building a fresh deployment (other than ckn-bw) reads this tree.

Dead static artifacts deleted: left4me-apply-cake helper, left4me-cake
+ left4me-nft-mark service units, cake.env, left4me-mark.nft, and the
superseded deploy-test-server.sh installer.

Tests split to match the new shape:
- scripts/tests/{test_overlay,test_script_sandbox,test_systemctl_helper,
  test_journalctl_helper,test_helpers_use_fixed_paths,test_sudoers_grants}.py
  with shared fixtures in conftest.py
- deploy/tests/test_example_units.py (renamed from test_deploy_artifacts.py)
  — slimmed to lock down the curated example units, sysctl, env templates

l4d2host/tests/test_overlay_helper.py: helper-source path updated to
scripts/libexec/left4me-overlay (was building the path segment-by-segment
under deploy/files/, missed by the path-prefix grep during pre-flight).

Runtime install-target paths (/usr/local/{libexec,sbin}/) unchanged, so
l4d2host/service_control.py, l4d2web/services/overlay_builders.py, the
sudoers grants, and the systemd units all keep their existing path
references.

Requires the matching ckn-bw change to bundles/left4me/items.py
(install_left4me_scripts repointed from /opt/left4me/src/deploy/files/...
to /opt/left4me/src/scripts/...). Left4me lands first so a fresh
git_deploy exposes the new source path before the bundle apply runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 12:05:30 +02:00
mwiegand
e38b844978
docs: janitorial cleanup checklist + L4D2 server cvar reference
Two follow-ups bundled into a single commit:

- docs/superpowers/specs/2026-05-15-janitorial-cleanup.md collects
  the "do later" small TODOs that surfaced across the recent idmap
  + consolidation work: dead cake-related artifacts, obsolete
  static systemd units in deploy/files/, the bubblewrap→systemd-run
  doc drift, stale gameserver-side idmap binds on un-checked
  instances, calendar reminder for SM 1.13 stable. Each item is
  small and self-contained.

- docs/l4d2-server-cvar-reference.md captures the research from
  the early-session L4D2 cvar deep-dive: tickrate sweet spots,
  nb_update_frequency cheat-protection + sm_cvar workaround,
  cvars that don't exist in L4D2 (net_maxcleartime,
  z_resolve_zombie_collision_multiplier per RCON probe), recommended
  plugins, MetaMod/SourceMod branch tracking, and the empirically-
  verified idmap-propagation-through-rebind kernel-6.12 quirk.
  Reference material, not a spec — lives at docs/ root.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 02:05:12 +02:00
mwiegand
a450491a90
spec(uid-split): note these are system units, not user units
Explicit clarification so the next agent doesn't go looking for
user-unit friction. left4me-server@.service and left4me-web.service
are system units that drop to User=left4me; the 3-user split is a
literal one-line edit per unit. No lingering, no pam_systemd, no
per-user systemd instance bootstrap. The privileged
ExecStartPre/ExecStopPost steps stay root via the + prefix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 01:59:56 +02:00
mwiegand
62cf6cdd56
spec: handoff for revisiting 1/2/3-user split for left4me
The 2-user split (left4me + l4d2-sandbox) has been inherited as a
constraint across multiple recent plans (idmap-on-mount, build-time-
idmap, helper consolidation) without ever being designed
end-to-end. Three plausible configurations: collapse to 1 user
(rejected for security), keep at 2 users (status quo), or split web
from game into 3 users for blast-radius limiting on either side.

Doc captures the threat-model heuristics, cross-uid file-access
plumbing options (shared group vs. world-read), idmap implications,
a step-by-step migration sketch for the 3-user variant, and explicit
out-of-scope items (per-instance gameserver uids, etc.). Detailed
enough that a future session can pick a configuration and execute
without re-deriving the design space.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 01:58:09 +02:00
mwiegand
28b0ff951b
spec(build-overlay-unit): flag DB-fetch-in-ExecStartPre as an option
The script content lives in the overlays.script DB column and the
unit's %i is the row id, so the worker-writes-script-to-fs step in
the original sketch is duplication. Document three options (worker
writes / unit fetches via helper / pipe to stdin) and recommend the
unit-fetches variant with RuntimeDirectory= auto-cleanup. Promote
this to the top of the open-decisions list since it shapes the
worker, the unit, and whether a fetch-script helper is added.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 01:54:41 +02:00
mwiegand
a9bbc209ae
spec: handoff for replacing script-sandbox helper with template unit
The build-time idmap landing today required a nsenter self-wrap in
left4me-script-sandbox to escape the web app's PrivateTmp namespace
before pre-creating the idmapped staging bind. Working but band-aid:
the helper is reinventing what a systemd template unit would do
declaratively. Mirror the left4me-server@.service pattern with a
build-overlay@.service template — ExecStartPre does the idmap bind in
PID 1's namespace by default, the hardening flags live in the unit
file, ExecStopPost tears down. Worker switches to sudo systemctl start.

Doc captures full proposed unit, worker rewrite sketch, sudoers
update, migration order, verification steps, and the ~5h estimate
so a future session can pick this up cold and execute.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 01:52:57 +02:00
mwiegand
7a25c2453c
fix(left4me-script-sandbox): self-wrap into PID 1's mount namespace
The web service runs with PrivateTmp=true, which puts it in its own
mount namespace. Worker invokes the sandbox helper via sudo from there;
the helper's pre-systemd-run `mount --bind --map-users=...` lands in
the web service's namespace. systemd-run then spawns transient units
in PID 1's namespace where the bind is invisible — the BindPaths lookup
finds an empty staging dir owned by root, and the sandbox uid hits
permission-denied on every write.

Mirror the pattern from left4me-overlay's ExecStartPre wrapper: enter
PID 1's mount namespace at the start of the helper via `nsenter
--mount=/proc/1/ns/mnt`. Sentinel env var avoids exec recursion. The
gameserver helper handles this at the unit level; the script helper
doesn't have a unit so we self-wrap.

Diagnosis: 5 failed builds all hit the same EACCES on the first
`mkdir`/`tar mkdir`. Direct SSH-sudo invocations of the same helper
succeeded because SSH-sudo doesn't inherit a private namespace; only
the worker-invoked path is affected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 01:33:13 +02:00
mwiegand
48381089d3
refactor(left4me-overlay): move uid translation to script-sandbox build
left4me-script-sandbox now pre-creates an idmapped bind staging path
(--map-users=<left4me_uid>:<sandbox_uid>:1) and points the sandbox's
BindPaths at that staging instead of the raw overlay dir. Writes from
inside the sandbox (uid l4d2-sandbox) land on disk as left4me, so all
overlay content is uniformly left4me-owned end-to-end.

left4me-overlay loses ~165 lines of idmap-on-mount logic: the per-
lowerdir stat + idmap-bind setup, the bind-umount loop in teardown,
the uid lookup helpers, the _is_mountpoint /proc/self/mountinfo parser,
and the LEFT4ME_TEST_* env-var stubs. It's back to a simple "validate
lowerdirs, mount overlay" shape; gameserver mount path no longer needs
to know about producer-side ownership decisions.

Verified on kernel 6.12 that the kernel idmap propagates through
systemd-run's plain re-bind of the staging path. Tests dropped 4
idmap-on-mount specs and one deploy-artifact regression check; added
test_script_sandbox_uses_idmap_staging to pin the new staging path
+ map flags + trap cleanup.

The post-build world-read chmod kludge in the sandbox is also dropped:
the web app reads overlay files via its primary uid (left4me).

Existing overlays on the test server are sandbox-owned from prior runs
and need a one-shot `chown -R left4me:left4me /var/lib/left4me/overlays`
during deploy. New overlays produced by the refactored sandbox are
left4me-owned from creation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 01:20:39 +02:00
mwiegand
bc25d423aa
plan(left4me): move idmap from gameserver mount to script-sandbox build
Architectural cleanup: the uid translation is a build-time concern
(the sandbox produces sandbox-uid files); having the gameserver path
unwind that producer-side decision on every mount means the mount
helper carries idmap lifecycle code it shouldn't need. Moving the
idmap into the script-sandbox bind makes files land left4me-owned on
disk, drops ~140 lines from left4me-overlay, and makes all overlay
content (workshop + script-built) consistent on-disk.

Verified on left4.me kernel 6.12.86 that the kernel idmap propagates
through plain re-bind, so systemd-run's BindPaths can wrap a
pre-created idmapped staging path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 01:15:46 +02:00
mwiegand
dd918aca4b
fix(left4me-overlay): use /proc/self/mountinfo to detect bind mounts
os.path.ismount() compares st_dev against the parent dir, which silently
returns False for same-fs bind mounts. The idmap binds at runtime/<n>/
idmap/<basename> are exactly that case, so:

- cmd_umount skipped the bind-umount step every stop, leaving orphan
  binds in PID 1's mount namespace.
- cmd_mount's idempotency check then "didn't see" the orphan and
  re-bound on top, accumulating one mount per start/stop cycle.

Findmnt nesting like
    /var/lib/left4me/runtime/2/idmap/overlays_9
    └─/var/lib/left4me/runtime/2/idmap/overlays_9
is the visible symptom. Reboot wipes everything so the bug is invisible
on a fresh boot — only stop/start cycles accumulate.

Replace both ismount sites with a _is_mountpoint() helper that reads
/proc/self/mountinfo (column 5 is the mount point). Keep os.path.ismount
for the overlay merged check, where it's reliable (distinct fs type).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 01:02:18 +02:00
mwiegand
2b20bffeb8
spec: handoff doc for rethinking deploy/ dir architecture
The 2026-05-15 script-consolidation pass landed a working but
half-finished mental model: deploy/files/ was retroactively promoted
from "historical reference" to "canonical source," but only for the
script files. Several adjacent things (sudoers/sysctl duplication
across both repos, the systemd unit files that ckn-bw's reactor
ignores, deploy-test-server.sh's role, dead-code apply-cake) didn't
get resolved. Capture the open questions and pointers so a future
session can pick this up and commit to a coherent shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 00:53:55 +02:00
mwiegand
f5e36eef79
deploy: claim /usr/local/sbin/left4me admin CLI in deploy/files
ckn-bw was shipping the admin CLI wrapper (sudo left4me <flask
subcommand>) verbatim from its own bundle copy. Move ownership of the
file into left4me so ckn-bw's upcoming install-action approach can
deploy it from deploy/files/usr/local/sbin/left4me on the deployed
git checkout, eliminating the cross-repo duplication that masked the
idmap helper update earlier.

Also re-frame deploy/README.md: deploy/files/, deploy/templates/, and
deploy/tests/ are now genuinely canonical (read by ckn-bw via
git_deploy). Only deploy-test-server.sh remains a superseded artifact.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 00:41:06 +02:00
mwiegand
f231ebcb0d
doc(deploy): clarify ckn-bw verbatim-sync workflow for shipped files
Spell out that the deploy step for changes to verbatim-shipped files
(privileged helpers, sudoers, sysctl, …) is just re-syncing the bundle's
copy + bw apply. Removes ambiguity for the idmap helper change and any
future edit within the same set.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-14 23:57:31 +02:00
mwiegand
e4101de7a5
test(deploy): assert left4me-overlay idmaps sandbox-owned lowerdirs
Guards against silent regression of the idmap bind-mount step in the
privileged kernel-overlayfs helper. Asserts --map-users / --map-groups
argv, the runtime/<name>/idmap/ target path, the LEFT4ME_TEST_* stub-
env-var names, and the collision-detection table.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-14 23:56:36 +02:00