Both items were operational verifications (not code changes) against
the deployed test host ovh.left4me (141.95.32.8).
Item 8: orphan idmap binds in PID 1's mount namespace.
`sudo findmnt --task 1 -o TARGET | grep /var/lib/left4me/runtime/.*/idmap/`
returned zero matches with left4me-server@{1,2}.service both active.
Either swept earlier or never appeared on this host; nothing to umount.
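The same check can be scripted when it has to be repeated on other hosts. A minimal sketch, assuming the `findmnt -o TARGET` output is captured as text; the function name and the sample path are hypothetical, and only the pattern comes from the grep above:

```python
import re

# Same pattern as the grep used in the manual check.
IDMAP_BIND = re.compile(r"/var/lib/left4me/runtime/.*/idmap/")


def orphan_idmap_targets(findmnt_output: str) -> list[str]:
    """Return mount targets from `findmnt -o TARGET` output that match the pattern.

    findmnt draws a tree by default, so each matching line is cut at its
    first slash to drop the tree-drawing prefix.
    """
    targets = []
    for line in findmnt_output.splitlines():
        if IDMAP_BIND.search(line):
            targets.append(line[line.index("/"):].strip())
    return targets
```

An empty list corresponds to the "zero matches" result recorded here.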
Item 9: Optimized Settings (overlay 8) files-overlay sanity.
Dir is left4me:left4me end-to-end; `sudo find /var/lib/left4me/overlays/8
-type f -uid 981` returned empty. The invariant "files-overlays are
populated by the web app as left4me, never through the sandbox helper"
holds.
Remaining live janitorial items: 7 (conditional on the build-overlay-unit
refactor) and 10 (SourceMod 1.13 calendar reminder, ~late 2026).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Janitorial item 6 in 2026-05-15-janitorial-cleanup.md. The v1 sandbox
design (2026-05-08-l4d2-script-overlays-design.md) was approved
2026-05-08 and superseded the same day by the v2 systemd-only design
(2026-05-08-l4d2-script-sandbox-v2-systemd.md). The current
left4me-script-sandbox helper uses systemd-run in service-unit mode;
no bwrap binary is invoked. The v1 spec still described bubblewrap as
the engine.
- v1 spec gets a top-of-file banner pointing at v2 as its successor.
Body preserved; the rest of the v1 design (overlay-type unification,
resource caps, helper auth) is still valid — only the sandbox engine
changed.
- l4d2web/services/overlay_builders.py: ScriptBuilder docstring
"bubblewrap + systemd-run" → "hardened systemd-run transient
service" (the as-built reality).
- scripts/tests/test_script_sandbox.py: stray "/bwrap" in a comment
cleaned up. Negative regression assertions (`assert "bwrap" not in
text`) intentionally retained as the guard against accidental
re-introduction.
- Plan docs left untouched (historical action snapshots).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the implementation plan that landed in the preceding commit
(2026-05-15-deploy-dir-rethink.md) under docs/superpowers/plans/, and
marks the two related specs:
- 2026-05-15-deploy-dir-rethink-design.md (the source handoff) gets a
"Resolved by …" banner at the top with a one-paragraph summary of
the decisions taken. Body preserved for archaeology.
- 2026-05-15-janitorial-cleanup.md gets a status banner noting that
items 1, 3, 4, 5 are fully resolved by the deploy-dir-rethink plan
and item 2 is partially resolved with a third option the original
enumeration didn't list: only the two truly dead static units
(cake.service, nft-mark.service) were deleted, while the
reactor-emitted set (server@, web, workshop-refresh.{service,timer},
slices) was retained as curated examples. Resolved items are left in
place but flagged.
Remaining live janitorial items: 6 (bubblewrap doc drift), 7
(conditional on build-overlay-unit refactor), 8 (operational idmap
bind cleanup), 9 (Optimized Settings overlay verification), 10 (SM
1.13 calendar reminder).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two follow-ups bundled into a single commit:
- docs/superpowers/specs/2026-05-15-janitorial-cleanup.md collects
the "do later" small TODOs that surfaced across the recent idmap
+ consolidation work: dead cake-related artifacts, obsolete
static systemd units in deploy/files/, the bubblewrap→systemd-run
doc drift, stale gameserver-side idmap binds on un-checked
instances, calendar reminder for SM 1.13 stable. Each item is
small and self-contained.
- docs/l4d2-server-cvar-reference.md captures the research from
the early-session L4D2 cvar deep-dive: tickrate sweet spots,
nb_update_frequency cheat-protection + sm_cvar workaround,
cvars that don't exist in L4D2 (net_maxcleartime,
z_resolve_zombie_collision_multiplier per RCON probe), recommended
plugins, MetaMod/SourceMod branch tracking, and the empirically-
verified idmap-propagation-through-rebind kernel-6.12 quirk.
Reference material, not a spec — lives at docs/ root.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Explicit clarification so the next agent doesn't go looking for
user-unit friction. left4me-server@.service and left4me-web.service
are system units that drop to User=left4me; the 3-user split is a
literal one-line edit per unit. No lingering, no pam_systemd, no
per-user systemd instance bootstrap. The privileged
ExecStartPre/ExecStopPost steps stay root via the + prefix.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The 2-user split (left4me + l4d2-sandbox) has been inherited as a
constraint across multiple recent plans (idmap-on-mount, build-time-
idmap, helper consolidation) without ever being designed
end-to-end. Three plausible configurations: collapse to 1 user
(rejected for security), keep at 2 users (status quo), or split web
from game into 3 users for blast-radius limiting on either side.
Doc captures the threat-model heuristics, cross-uid file-access
plumbing options (shared group vs. world-read), idmap implications,
a step-by-step migration sketch for the 3-user variant, and explicit
out-of-scope items (per-instance gameserver uids, etc.). Detailed
enough that a future session can pick a configuration and execute
without re-deriving the design space.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The script content lives in the overlays.script DB column and the
unit's %i is the row id, so the worker-writes-script-to-fs step in
the original sketch is duplication. Document three options (worker
writes / unit fetches via helper / pipe to stdin) and recommend the
unit-fetches variant with RuntimeDirectory= auto-cleanup. Promote
this to the top of the open-decisions list since it shapes the
worker, the unit, and whether a fetch-script helper is added.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The build-time idmap landing today required an nsenter self-wrap in
left4me-script-sandbox to escape the web app's PrivateTmp namespace
before pre-creating the idmapped staging bind. It works but is a band-aid:
the helper is reinventing what a systemd template unit would do
declaratively. Mirror the left4me-server@.service pattern with a
build-overlay@.service template — ExecStartPre does the idmap bind in
PID 1's namespace by default, the hardening flags live in the unit
file, ExecStopPost tears down. Worker switches to sudo systemctl start.
Doc captures full proposed unit, worker rewrite sketch, sudoers
update, migration order, verification steps, and the ~5h estimate
so a future session can pick this up cold and execute.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Architectural cleanup: the uid translation is a build-time concern
(the sandbox produces sandbox-uid files); having the gameserver path
unwind that producer-side decision on every mount means the mount
helper carries idmap lifecycle code it shouldn't need. Moving the
idmap into the script-sandbox bind makes files land left4me-owned on
disk, drops ~140 lines from left4me-overlay, and makes all overlay
content (workshop + script-built) consistent on-disk.
Verified on left4.me kernel 6.12.86 that the kernel idmap propagates
through plain re-bind, so systemd-run's BindPaths can wrap a
pre-created idmapped staging path.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The 2026-05-15 script-consolidation pass landed a working but
half-finished mental model: deploy/files/ was retroactively promoted
from "historical reference" to "canonical source," but only for the
script files. Several adjacent things (sudoers/sysctl duplication
across both repos, the systemd unit files that ckn-bw's reactor
ignores, deploy-test-server.sh's role, dead-code apply-cake) didn't
get resolved. Capture the open questions and pointers so a future
session can pick this up and commit to a coherent shape.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Persist the implementation plan for adding idmapped bind mounts to
left4me-overlay so that overlay copy-up from l4d2-sandbox-owned lower
layers (script-built overlays) produces left4me-owned upperdir entries
the gameserver can write. Mechanism verified end-to-end on ovh.left4me
in a temp dir on 2026-05-14.
Plan for adding a per-server RCON console: HTMX append-swap input form,
fixed-height scrolling transcript replayed from CommandHistory on load,
multi-packet response handling, owner-only access, 30s timeout.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- _parse_duration wraps int() in try/except so malformed connected
durations raise RconError (not ValueError leaking past the poller's
except RconError).
- fake_rcon_server captures handler exceptions and re-raises at context
exit, so a buggy test handler surfaces as a real failure instead of
silently degrading into a client-side timeout.
- Two new parser tests: HH:MM:SS duration parsing and malformed input
coverage.
- Fix Steam ID formula typo in the spec doc (Z*2 + Y, not Y*2 + Z; Y is
the low bit). Code was already correct.
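For reference, the corrected formula can be sketched as below. This is an illustration, not the project's parser: the function names are hypothetical, and the 64-bit offset is the commonly documented base for individual accounts in the public universe.

```python
# Base of 64-bit Steam IDs for individual accounts in the public universe.
STEAM64_BASE = 76561197960265728


def steam2_account_id(steam2: str) -> int:
    """Return the 32-bit account id for 'STEAM_X:Y:Z' as Z*2 + Y (Y is the low bit)."""
    prefix, y, z = steam2.split(":")
    if not prefix.startswith("STEAM_") or y not in ("0", "1"):
        raise ValueError(f"not a Steam2 id: {steam2!r}")
    return int(z) * 2 + int(y)


def steam2_to_steam64(steam2: str) -> int:
    """Return the 64-bit Steam ID by adding the account id to the base offset."""
    return STEAM64_BASE + steam2_account_id(steam2)
```

Swapping Y and Z would produce a different account id for almost every input, which is why the typo was worth fixing even though only the doc was affected.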
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The two indexes ix_sps_server_open and ix_sps_server_recent were
byte-identical because SQLAlchemy's Index(name, *cols) form drops the
DESC ordering the spec intended. Rather than reach for text("left_at
DESC"), drop the second index entirely — SQLite scans the ASC index
backwards at no measurable cost. Spec and plan updated to match.
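The backward-scan claim can be checked with the standard-library sqlite3 module. A minimal sketch, assuming a hypothetical table named sps and assuming the index covers (server_id, left_at); neither the table name nor the column list is confirmed by this message:

```python
import sqlite3


def plan_for_recent_sessions() -> list[str]:
    """Return the EXPLAIN QUERY PLAN detail strings for a DESC-ordered lookup."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema; only the index name is taken from the commit.
    conn.execute(
        "CREATE TABLE sps (id INTEGER PRIMARY KEY, server_id INTEGER, "
        "left_at TEXT, name TEXT)"
    )
    conn.execute("CREATE INDEX ix_sps_server_open ON sps (server_id, left_at)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT * FROM sps WHERE server_id = ? ORDER BY left_at DESC",
        (1,),
    ).fetchall()
    conn.close()
    # The detail text is the last column of every plan row.
    return [row[-1] for row in rows]
```

If the plan mentions the ASC index and contains no temporary B-tree step for the ORDER BY, the ordering is served by walking that index in reverse.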
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
RCON-based polling with run-length-encoded snapshots, session intervals
with min/max ping, Steam profile cache, and a server-detail roster of
current + recent players hot-linked from Steam CDN avatars.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes the gap where added workshop items never reach disk until an
admin presses the global refresh button. Downloads piggyback on the
per-overlay build_overlay job; daily updates come from a systemd
timer + CLI subcommand that enqueues the existing refresh job.
Adds the matching design doc for the implementation plan committed in
6eb9bd0. Captures the session-invalidation reasoning (Django-style
"keep current session, kill others") and the open questions resolved
during brainstorming.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds /profile reachable via header username, with change-password form
as its first section. Industry-standard session semantics: other sessions
invalidated on password change, current session kept, via new
users.password_changed_at column + session marker.
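One way to read the "keep current session, kill others" rule is sketched below. The dict-based user and session objects and the pw_marker key are hypothetical stand-ins; only the password_changed_at column name comes from this message.

```python
from datetime import datetime, timezone
from typing import Optional


def session_is_valid(marker: Optional[datetime],
                     password_changed_at: Optional[datetime]) -> bool:
    """A session stays valid while its marker equals the user's current value."""
    return marker == password_changed_at


def change_password(user: dict, current_session: dict, now: datetime) -> None:
    """Stamp the user row and refresh only the session that made the change."""
    user["password_changed_at"] = now
    # Every other session still carries the old marker and stops matching.
    current_session["pw_marker"] = now
```

Storing a marker per session avoids having to enumerate and delete other sessions at password-change time; they fail validation on their next request instead.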
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Eight TDD tasks: sysctl extension, nftables marking (file + unit), CAKE
shaper (env + helper + unit), deploy-script wiring, README. Each task
adds one artifact with its assertion in test_deploy_artifacts.py and
ends in its own commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds Overlay.type='files' whose source-of-truth IS the overlay directory
itself. Users can:
* upload arbitrary files / whole folders by dragging from the OS onto a
folder row in the file tree (one POST per file, queue with
concurrency 3, per-file progress in a floating Uploads panel)
* move via drag-and-drop inside the tree (same gesture; the drag source
distinguishes a move from an upload; cycles are refused)
* create / edit / rename / replace through a single editor modal
(text flavor for editable files, binary flavor with replace-upload
for everything else; filename input is the rename surface)
* mkdir empty folders (slashes allowed for nested intermediates)
* stream a folder as a zip download
* delete files and empty folders
Backend is type-agnostic past the new files_routes endpoints, so the
existing mount / spec / overlayfs / expose_server_cfg pipeline is reused
unchanged. is_editable gates the row's edit affordance and the /save
content rules. Three new safe-resolve helpers (write/delete/move) cover
the new operations with the same anchor-and-resolve pattern as listing
and download. FilesBuilder is a no-op so the build subsystem can
dispatch uniformly.
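The anchor-and-resolve pattern mentioned above can be sketched as follows. This is a generic illustration with a hypothetical function name, not the project's helper: resolve the overlay root once, resolve the candidate under it, and refuse anything that lands outside.

```python
from pathlib import Path


def safe_resolve(root: Path, relative: str) -> Path:
    """Resolve `relative` under `root` and reject paths that escape the root."""
    anchor = root.resolve()
    # resolve() collapses '..' segments and follows symlinks that exist.
    candidate = (anchor / relative).resolve()
    if candidate != anchor and anchor not in candidate.parents:
        raise ValueError(f"path escapes overlay root: {relative!r}")
    return candidate
```

An absolute `relative` replaces the anchor when joined, so it fails the containment check the same way a `..` traversal does.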
Spec: docs/superpowers/specs/2026-05-09-files-overlay-design.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the symptom (Reset blew up on `umount target busy`), the
false starts (eager retry, lazy fallback, TimeoutStopSec bump — all
shipped briefly and reverted), the actual root cause (the helper's
own Python interpreter inheriting and pinning the unit's mount
namespace), and the fix (nsenter at the systemd Exec line).
The lessons section is the part future-me reads first: a retry loop
is a hint that something we own is the blocker; probe `/proc/*/ns/mnt`
before assuming kernel async; `+` Exec prefix doesn't escape the
unit's mount namespace.
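The `/proc/*/ns/mnt` probe from the lessons can be scripted. A minimal Linux-only sketch with a hypothetical function name; it compares namespace symlink targets and silently skips processes it is not allowed to inspect, so it should run as root to see everything:

```python
import os


def pids_sharing_mount_ns(target_pid: int) -> list[int]:
    """Return PIDs whose mount namespace matches that of `target_pid`."""
    wanted = os.readlink(f"/proc/{target_pid}/ns/mnt")
    matches = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            if os.readlink(f"/proc/{entry}/ns/mnt") == wanted:
                matches.append(int(entry))
        except OSError:
            # Process exited in the meantime, or we lack permission to read it.
            continue
    return sorted(matches)
```

Any unexpected PID in the result, such as a helper's own interpreter, is a candidate for what keeps the namespace and its mounts pinned.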
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Investigated whether to hard-pin each srcds instance to a single core
within the existing AllowedCPUs=1-7 set. Modern kernels (5.13+) no
longer expose kernel.sched_migration_cost_ns or the other classic CFS
"laziness" tunables, so a global cheap-fix is unavailable. Decision
for now: trust CFS + Nice=-5 + AllowedCPUs=1-7. Per-instance
CPUAffinity= remains an opt-in escape hatch in deploy/README.md.
Documents the revisit triggers and the preferred implementation path
when the time comes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Make explicit that the project uses system units (root systemctl, unit
under /usr/local/lib/systemd/system/, WantedBy=multi-user.target), so
`systemctl enable --now` is the correct verb to make instances survive
a host reboot. User units have different lifecycle rules and would not
auto-start at boot without enable-linger.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two TDD tasks: helper+service_control verb rename, then poller code
+ wiring + tests. Operator-side smoke test in F.3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch lifecycle verbs from systemctl start/stop to enable --now /
disable --now (servers survive host reboot via WantedBy= symlinks),
plus a periodic state poller for runtime drift (OOM kills, manual
systemctl ops, exhausted Restart=on-failure).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two TDD tasks: deploy-script cpuset block + tests, README
"CPU isolation" subsection. Operator-side smoke test in F.3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cgroup-v2 AllowedCPUs= drop-ins for system/user/build/game slices.
Defaults: core 0 for everything-not-game, cores 1..N-1 for game,
computed from nproc. LEFT4ME_SYSTEM_CPUS / LEFT4ME_GAME_CPUS
overrides; single-core hosts skip with a warning.
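The default split can be sketched as below. This is an illustration of the rule as described, not the deploy script itself; the function name is hypothetical, and treating an empty override as unset and skipping single-core hosts before looking at overrides are both assumptions.

```python
from typing import Mapping, Optional, Tuple


def compute_cpusets(nproc: int, env: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return (system_cpus, game_cpus) for AllowedCPUs=, or None to skip."""
    if nproc < 2:
        # Single-core host: nothing to isolate, caller should warn and skip.
        return None
    default_game = "1" if nproc == 2 else f"1-{nproc - 1}"
    system = env.get("LEFT4ME_SYSTEM_CPUS") or "0"
    game = env.get("LEFT4ME_GAME_CPUS") or default_game
    return system, game
```

On an 8-core host with no overrides this yields core 0 for the non-game slices and 1-7 for the game slice.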
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Six tasks (TDD, one commit each): unit directives, slice files,
sysctl conf, sandbox slice + OOMScoreAdjust, deploy-script wiring,
README escape-hatch section. Final verification step with full
deploy + host + web pytest sweep.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The existing left4me-script-sandbox helper uses systemd-run in
transient service mode (--unit=, no --scope). Spec wrongly said
'--scope'. No semantic change — the design's --slice= and
-p OOMScoreAdjust= guidance is identical for service vs scope mode.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each linked overlay gets a checkbox on the blueprint detail page that opts
its server.cfg in as exec server_overlay_<id>. The web app builds the
spec with {path, alias} per overlay and prepends exec server_overlay_<id>
lines to the blueprint config in lowest-overlay-first order. The host
stages those copies in the overlayfs upper layer before mounting (avoids
copy-up writes against a sandbox-uid file). A live preview block above the
Config textarea shows what gets auto-executed.
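The prepend step can be sketched as below, assuming the caller already passes the opted-in overlay ids in lowest-overlay-first order; the function name is hypothetical.

```python
def prepend_overlay_execs(config: str, exposed_overlay_ids: list[int]) -> str:
    """Put one `exec server_overlay_<id>` line per opted-in overlay before the config."""
    lines = [f"exec server_overlay_{overlay_id}" for overlay_id in exposed_overlay_ids]
    if not lines:
        return config
    return "\n".join(lines) + "\n" + config
```

Because the blueprint config follows the exec lines, its own settings take effect after the overlay configs have run.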
Schema:
- alembic 0007: BlueprintOverlay.expose_server_cfg BOOLEAN
Spec contract:
- l4d2host OverlayRef(path, alias?). load_spec accepts both bare-string
and {path, alias} entries.
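A sketch of how one spec entry could be normalized under that contract. Only the OverlayRef name and its two fields come from this message; the parser function and its error handling are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class OverlayRef:
    path: str
    alias: Optional[str] = None


def parse_overlay_entry(entry: Union[str, dict]) -> OverlayRef:
    """Accept a bare path string or a {path, alias} mapping."""
    if isinstance(entry, str):
        return OverlayRef(path=entry)
    if isinstance(entry, dict) and isinstance(entry.get("path"), str):
        return OverlayRef(path=entry["path"], alias=entry.get("alias"))
    raise ValueError(f"invalid overlay entry: {entry!r}")
```

Accepting the bare-string form keeps specs written before the alias field readable without migration.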
Side effects folded in (same file in l4d2_facade):
- start_server auto-initializes; the manual Initialize step is no longer
needed before Start.
- initialize_server no longer runs blueprint builders — builds happen on
overlay save, not on every server Start.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace per-row checkbox + numeric Order inputs with a drag-to-reorder
list of selected overlays plus a native <select> for adding more.
Native HTML5 DnD; no library, no JS-disabled fallback. Server contract
unchanged (overlay_ids in DOM order; existing fallback_position branch
absorbs the omitted overlay_position_<id> fields).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The vendored static/vendor/htmx.min.js turned out to be a 33-byte
placeholder, so the hx-get/hx-target/hx-trigger attributes on the
overlay file tree's folder buttons were inert: clicks rotated the
chevron (own JS) but never fetched. Switch the lazy-load to a
~30-line plain-JS handler in static/js/file-tree.js that fetches
button.dataset.filesUrl on first expand and dedupes via dataset.loaded.
Update the spec/plan to match. Route + partial contracts unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the design rationale for the new overlay-detail Files section
(verify build output, click-to-download for individual files via Flask
send_file, HTMX-driven lazy folder expansion) and the paired
implementation plan that produced it. Adds .superpowers/ to .gitignore
so brainstorm session artifacts never sneak into a future commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Host-side identifier (systemd unit name and /var/lib/left4me dirs) is now
str(server.id), centralized in services/server_identity.server_unit_name.
Server.name becomes a free-form display label, required and unique per
user (was [a-z0-9_-]{1,64} and globally unique).
Migration 0006 swaps the old global UNIQUE(name) for UNIQUE(user_id, name).
Web routes already keyed on id; templates only used name for display.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the v3 design: IPAddressDeny= alone (no IPAddressAllow=any
because the documented "more specific wins" semantics don't hold on
systemd 257 / kernel 6.12 — the allow trumps unconditionally), explicit
CIDRs (the -p parser rejects the localhost/link-local shorthand
keywords), and a static sandbox-only resolv.conf bind to keep DNS
reachable when private RFC1918 ranges are blocked.
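Spelling the CIDRs out could look like the sketch below. The list is an assumption (loopback, link-local, and RFC1918 ranges), not the set shipped in 7e66936, and the function name is hypothetical; the ipaddress check only guards against typos before the value reaches systemd-run.

```python
import ipaddress

# Assumed deny list: loopback, link-local, and RFC1918 private ranges.
DENY_CIDRS = [
    "127.0.0.0/8",
    "::1/128",
    "169.254.0.0/16",
    "fe80::/10",
    "10.0.0.0/8",
    "172.16.0.0/12",
    "192.168.0.0/16",
]


def ip_deny_args(cidrs: list[str]) -> list[str]:
    """Build the `-p IPAddressDeny=...` argument pair from explicit CIDRs."""
    for cidr in cidrs:
        ipaddress.ip_network(cidr)  # raises ValueError on a malformed entry
    return ["-p", "IPAddressDeny=" + " ".join(cidrs)]
```

No IPAddressAllow= property is emitted, matching the finding that an allow entry overrides the deny list on this systemd and kernel combination.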
Plan documents what was implemented (in 7e66936) and the lessons
surfaced during execution so the next person doesn't have to rediscover
them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Spec captures the v2 architecture (systemd-run service mode with full
hardening directives, no bwrap), the two surfaces in scope (helper
rewrite + bubblewrap dep removal + left4me.db mode tightening), and the
gotchas surfaced by smoke-testing the prototype on ckn@10.0.4.128:
- ProtectSystem=strict leaves /var/lib/left4me visible (read-only, not
hidden); must add TemporaryFileSystem=/var/lib to mask it.
- Script bind via BindReadOnlyPaths uses ${SCRIPT}:/script.sh syntax.
- No PrivatePID= directive in systemd; host PIDs visible via /proc.
Information disclosure only — kernel UID-mismatch blocks signals.
Plan breaks the migration into 4 tasks (helper rewrite, deploy-script
deps + DB mode, host smoke-test, drift sweep) with explicit rollback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the architectural fix for the mount-propagation bug: replace
fuse-overlayfs (rootless mount inside the web service's namespace, never
visible to host or to gameserver units) with kernel-native overlayfs
mounted via a privileged sudo helper that nsenters into PID 1's mount
namespace. Companion plan numbers the migration as five tasks ending in
end-to-end verification on the test box.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two managed system overlays (l4d2center-maps, cedapug-maps) that
fetch curated map archives from upstream sources and reconcile addons
symlinks for non-Steam maps. A daily systemd timer enqueues a coalesced
refresh_global_overlays worker job; downloads, extraction, and rebuilds
run in the existing job worker and surface in the job log UI.
Schema: GlobalOverlaySource / GlobalOverlayItem / GlobalOverlayItemFile
plus nullable Job.user_id so system jobs render as "system" in the UI.
The new builder reconciles symlinks against the per-source vpk cache
and leaves foreign symlinks untouched. Initialize-time guard refuses
to mount a partial overlay if any expected vpk is missing from cache.
Refresh service uses shutil.move to handle EXDEV when /tmp and the
cache live on different filesystems.
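The EXDEV point can be illustrated with a short sketch; the function name and directory layout are hypothetical. os.rename fails with EXDEV across filesystems, while shutil.move falls back to copy-then-delete in that case.

```python
import shutil
from pathlib import Path


def move_into_cache(downloaded: Path, cache_dir: Path) -> Path:
    """Move a downloaded file into the cache, even across filesystems."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    destination = cache_dir / downloaded.name
    # shutil.move tries a rename first and copies when the rename is impossible.
    shutil.move(str(downloaded), str(destination))
    return destination
```

The copy fallback is not atomic, so a reader that needs all-or-nothing visibility would still want a temporary name inside the cache directory followed by a rename.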
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a typed-overlay model with workshop as the first non-external type:
deduplicated WorkshopItem registry, symlink-based overlay directories,
auto-rebuild after item changes, admin global refresh, and a unified
Create-overlay UI with web-managed paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Refine the host library plan with web-facing API boundaries and rewrite the web app plan around live-linked blueprints, async execution, and hardened logging/state workflows.