docs(specs): script overlay type — design + implementation plan

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-08 15:27:14 +02:00

L4D2 Script Overlays Implementation Plan

Approval status: User-approved 2026-05-08. Implementation proceeds.

Goal: Implement the script overlay type per docs/superpowers/specs/2026-05-08-l4d2-script-overlays-design.md. Concretely: add an Overlay.script TEXT column and an Overlay.last_build_status enum-like string column; add a ScriptBuilder that runs user bash inside a bubblewrap + systemd-run --scope sandbox via a new privileged helper, left4me-script-sandbox; add routes and UI for editing, wiping, and rebuilding; and delete the entire managed-globals subsystem (l4d2center_maps, cedapug_maps) together with its daily-refresh timer and CLI.

Architecture: The web app continues to enqueue build_overlay jobs for any overlay row. The job worker dispatches via BUILDERS[overlay.type].build(...). After this change BUILDERS = {"workshop": WorkshopBuilder(), "script": ScriptBuilder()}. The new ScriptBuilder writes overlay.script to a tmpfile and execs sudo -n /usr/local/libexec/left4me/left4me-script-sandbox <id> <tmpfile>, which itself execs systemd-run --scope --collect ... -- bwrap [namespace flags] /bin/bash /script.sh. stdout/stderr stream through the existing run_with_streamed_output helper into the existing job-log SSE plumbing. The job-completion path writes Overlay.last_build_status based on the build outcome. The kernel-overlayfs mount layer (KernelOverlayFSMounter) is unchanged.

Locked Decisions

See docs/superpowers/specs/2026-05-08-l4d2-script-overlays-design.md for design rationale. Implementation-relevant summary:

Final overlay type list: workshop (unchanged) + script (new). Drop l4d2center_maps, cedapug_maps.
New columns on overlays: script TEXT NOT NULL DEFAULT '', last_build_status VARCHAR(16) NOT NULL DEFAULT ''.
Drop tables (FK order): global_overlay_item_files, global_overlay_items, global_overlay_sources.
ScriptBuilder in l4d2web/services/overlay_builders.py, uses existing run_with_streamed_output.
Privileged helper left4me-script-sandbox (bash, mode 0755, owned root) runs systemd-run --scope --collect -p MemoryMax=4G -p MemorySwapMax=0 -p TasksMax=512 -p CPUQuota=200% -p RuntimeMaxSec=3600 -- bwrap …. Limits: 1 h wall time, 4 GiB RAM (systemd's G suffix is base 1024), 20 GiB post-build du cap.
New system user l4d2-sandbox (/usr/sbin/nologin, no home). New apt dep bubblewrap.
Sudoers verb-unrestricted: left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox.
Daily refresh subsystem deleted: left4me-refresh-global-overlays.{timer,service} and flask refresh-global-overlays CLI removed. No replacement.
Wipe is the same sandbox helper invoked with the literal script find /overlay -mindepth 1 -delete.
auto_refresh column NOT added in this iteration.
Test deploy DB is wiped on rollout; migration includes DELETE FROM overlays WHERE type IN ('l4d2center_maps', 'cedapug_maps') for safety.

Current Gap

l4d2web/models.py Overlay has no script or last_build_status columns. The 3 globals tables are present.
l4d2web/services/overlay_builders.py BUILDERS = {"workshop": WorkshopBuilder(), "l4d2center_maps": GlobalMapOverlayBuilder(), "cedapug_maps": GlobalMapOverlayBuilder()}. No ScriptBuilder.
l4d2web/services/{global_map_sources,global_overlay_refresh,global_map_cache,global_overlays}.py exist and are referenced by routes / CLI.
l4d2web/services/job_worker.py carries refresh_global_overlays_running plumbing.
l4d2web/cli.py defines refresh-global-overlays.
l4d2web/routes/overlay_routes.py has no /script, /wipe, or /build endpoints for non-workshop types.
l4d2web/templates/overlays.html create modal type radio offers only workshop.
l4d2web/templates/overlay_detail.html has a global-source block (~lines 34–46) that should not survive.
deploy/files/usr/local/lib/systemd/system/left4me-refresh-global-overlays.{timer,service} exist.
deploy/deploy-test-server.sh provisions global_overlay_cache/ and does not provision l4d2-sandbox or install bubblewrap.
Seven tests/test_global_*.py files exist and reference removed code.

Task 1: Schema migration (alembic 0005)

Files:

Create: l4d2web/alembic/versions/0005_script_overlays.py (revises 0004_drop_legacy_external_overlay_type).
Modify: l4d2web/models.py — Overlay gains script and last_build_status columns; remove GlobalOverlaySource, GlobalOverlayItem, GlobalOverlayItemFile model classes.
Modify: l4d2web/tests/test_overlay_models.py (or whichever existing test asserts the Overlay schema; create one if absent) — assert new columns present.

Test plan (RED first):

tests/test_alembic_migrations.py::test_upgrade_0005_adds_script_columns: apply migrations to a fresh in-memory SQLite database. Assert that the script and last_build_status columns exist on overlays and that no global_overlay_* tables remain. Assert that the upgrade performs the DELETE FROM overlays WHERE type IN (...) wipe, for example by seeding an l4d2center_maps row at revision 0004 and checking that it is gone after upgrading to 0005.
tests/test_alembic_migrations.py::test_downgrade_0005_restores_globals: the kernel-overlayfs migration is one-way. Follow that precedent and mark this test with pytest.skip unless the project's migration policy requires downgrades.
tests/test_overlay_models.py::test_overlay_has_script_columns — Overlay(...) instance has script='' and last_build_status='' defaults.

Implementation:

Migration uses op.drop_table('global_overlay_item_files') etc. in correct FK order; uses op.add_column('overlays', sa.Column('script', sa.Text(), nullable=False, server_default='')) and similar for last_build_status (sa.String(16)).
Run the DELETE FROM overlays WHERE type IN ('l4d2center_maps','cedapug_maps') before the column additions. The deleted rows never need values for the new columns, which keeps the step straightforward.
models.py: delete the three globals model classes outright; add the two new columns to Overlay with explicit defaults.
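You can sanity-check the statement order by expressing the upgrade as raw SQLite DDL and running it with the stdlib sqlite3 module. This is a sketch, not the Alembic migration. The pre-0005 table definitions are hypothetical stand-ins that carry only the columns needed to show the ordering:

- the DELETE runs first;
- the globals tables are dropped child-first;
- the two NOT NULL columns are added with server defaults.

```python
import sqlite3

# Hypothetical pre-0005 schema: only the columns needed to exercise the upgrade.
PRE_0005_SCHEMA = """
CREATE TABLE overlays (id INTEGER PRIMARY KEY, type VARCHAR(32) NOT NULL, path TEXT NOT NULL);
CREATE TABLE global_overlay_sources (id INTEGER PRIMARY KEY, url TEXT);
CREATE TABLE global_overlay_items (
    id INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL REFERENCES global_overlay_sources(id));
CREATE TABLE global_overlay_item_files (
    id INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL REFERENCES global_overlay_items(id));
"""

# Upgrade statements in plan order. Children are dropped before parents so the
# implicit DELETE that SQLite runs on DROP TABLE never orphans a referencing row.
UPGRADE_0005 = [
    "DELETE FROM overlays WHERE type IN ('l4d2center_maps', 'cedapug_maps')",
    "DROP TABLE global_overlay_item_files",
    "DROP TABLE global_overlay_items",
    "DROP TABLE global_overlay_sources",
    "ALTER TABLE overlays ADD COLUMN script TEXT NOT NULL DEFAULT ''",
    "ALTER TABLE overlays ADD COLUMN last_build_status VARCHAR(16) NOT NULL DEFAULT ''",
]


def seed_pre_0005(conn: sqlite3.Connection) -> None:
    conn.executescript(PRE_0005_SCHEMA)
    conn.executemany(
        "INSERT INTO overlays (id, type, path) VALUES (?, ?, ?)",
        [(1, "workshop", "1"), (2, "l4d2center_maps", "2"), (3, "cedapug_maps", "3")],
    )
    conn.execute("INSERT INTO global_overlay_sources (id, url) VALUES (1, 'https://example.invalid')")
    conn.execute("INSERT INTO global_overlay_items (id, source_id) VALUES (1, 1)")
    conn.execute("INSERT INTO global_overlay_item_files (id, item_id) VALUES (1, 1)")


def upgrade_0005(conn: sqlite3.Connection) -> None:
    for statement in UPGRADE_0005:
        conn.execute(statement)


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce FKs so a parent-first drop would fail
seed_pre_0005(conn)
upgrade_0005(conn)

overlay_rows = conn.execute(
    "SELECT id, type, script, last_build_status FROM overlays ORDER BY id"
).fetchall()
tables = {row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}
print(overlay_rows)  # [(1, 'workshop', '', '')]
print(tables)        # {'overlays'}
```

The surviving workshop row picks up both defaults without a backfill, which is what the Overlay model defaults in models.py should mirror.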

Verification:

python3 -m pytest l4d2web/tests/test_alembic_migrations.py l4d2web/tests/test_overlay_models.py -q

Commit: feat(l4d2-web): script overlay schema — add overlay.script + last_build_status, drop globals tables

Task 2: ScriptBuilder + BUILDERS registry update

Files:

Modify: l4d2web/services/overlay_builders.py — add ScriptBuilder, remove GlobalMapOverlayBuilder, change BUILDERS dict.
Rewrite: l4d2web/tests/test_overlay_builders.py — drop globals-builder tests, add ScriptBuilder tests.

Test plan (RED first):

test_overlay_builders.py::test_builders_registry — set(BUILDERS) == {"workshop", "script"}. Assert "l4d2center_maps" and "cedapug_maps" and "external" are absent.
test_overlay_builders.py::test_script_builder_invokes_helper — patch run_with_streamed_output to capture argv; build an Overlay(id=42, type='script', script='echo hi'); assert argv shape ["sudo", "-n", "/usr/local/libexec/left4me/left4me-script-sandbox", "42", <script_path>] and that the script_path file exists with content "echo hi" at invocation time. Verify the tmpfile is unlinked after build.
test_overlay_builders.py::test_script_builder_disk_cap: fake subprocess.check_output so that du reports 25000000000 bytes. The fake output should look like "25000000000\t<path>\n", which is what du -sb prints, and the value is above the 20 GiB cap of 21474836480 bytes. Assert that build raises BuildError("disk-cap-exceeded") and that on_stderr was called with the cap message.
test_overlay_builders.py::test_script_builder_streams_output: the fake run_with_streamed_output calls on_stdout("hello\n") and on_stderr("warn\n"). Assert that the lists collected by the two callbacks contain those lines.
test_overlay_builders.py::test_script_builder_cancel: should_cancel returns True after the first stdout line. Cancellation itself is the existing helper's contract. The test only asserts two things: that ScriptBuilder passes should_cancel through to run_with_streamed_output, and that it skips the disk-budget check on cancel.
test_overlay_builders.py::test_workshop_builder_unchanged — smoke test that WorkshopBuilder still exists and is invokable (regression guard against accidental removal during refactor).

Implementation:

Add import os, subprocess, tempfile at the top of overlay_builders.py if not present.
ScriptBuilder exactly as in the spec (verbatim copy from the design doc, §Build Lifecycle).
Define a small BuildError exception class if one doesn't already exist locally; reuse the existing one if WorkshopBuilder already raises a similar type.
_enforce_disk_budget calls subprocess.check_output(["du", "-sb", str(overlay_path(overlay_id))]); the existing overlay_path helper in the module already returns the absolute Path. Parse first whitespace-delimited integer; cap is 20 * 1024**3.
Job-completion path: locate the existing path that handles build_overlay job success/failure (likely in services/job_worker.py or a related orchestration module). Add a single column write: on success last_build_status='ok', on BuildError / non-zero exit / cancel last_build_status='failed'. Add a tests/test_job_worker.py::test_build_overlay_writes_last_build_status covering both branches.
Remove GlobalMapOverlayBuilder class and any helper functions it owns that are not used elsewhere.
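The sketch below is a minimal view of how ScriptBuilder, the factored-out _run_sandbox helper from Task 5 option (b), and the disk-budget check could fit together. The spec's §Build Lifecycle version remains authoritative. The following are all assumptions:

- run_with_streamed_output is passed in as runner, with an (argv, on_stdout=..., on_stderr=..., should_cancel=...) signature that returns an exit code.
- subprocess.check_output is injectable, so tests need no patching.
- overlay_path is a stand-in for the module's existing helper.
- The build(overlay, on_stdout, on_stderr, should_cancel) signature mirrors what WorkshopBuilder is assumed to take.

```python
import os
import subprocess
import tempfile
from pathlib import Path
from typing import Callable

HELPER = "/usr/local/libexec/left4me/left4me-script-sandbox"
DISK_CAP_BYTES = 20 * 1024**3  # 20 GiB post-build cap
OVERLAY_ROOT = Path("/var/lib/left4me/overlays")


class BuildError(Exception):
    """Build failure; the job worker maps it to last_build_status='failed'."""


def overlay_path(overlay_id: int) -> Path:
    # Stand-in for the existing overlay_path helper in overlay_builders.py.
    return OVERLAY_ROOT / str(overlay_id)


def _run_sandbox(overlay_id, script_text, on_stdout, on_stderr, should_cancel, runner) -> int:
    """Write script_text to a tmpfile, run it through the sudo helper, always unlink it."""
    # mkstemp creates the file with mode 0600; the helper runs as root via sudo.
    fd, script_path = tempfile.mkstemp(prefix="left4me-script-", suffix=".sh")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(script_text)
        argv = ["sudo", "-n", HELPER, str(overlay_id), script_path]
        return runner(argv, on_stdout=on_stdout, on_stderr=on_stderr, should_cancel=should_cancel)
    finally:
        os.unlink(script_path)


class ScriptBuilder:
    def __init__(self, runner: Callable[..., int], check_output=subprocess.check_output):
        self._runner = runner  # e.g. run_with_streamed_output (signature assumed)
        self._check_output = check_output

    def build(self, overlay, on_stdout, on_stderr, should_cancel) -> None:
        rc = _run_sandbox(overlay.id, overlay.script, on_stdout, on_stderr,
                          should_cancel, self._runner)
        # Assumes should_cancel stays True once cancellation was requested.
        if should_cancel():
            raise BuildError("cancelled")  # no disk-budget check on cancel
        if rc != 0:
            raise BuildError(f"script exited with status {rc}")
        self._enforce_disk_budget(overlay.id, on_stderr)

    def _enforce_disk_budget(self, overlay_id: int, on_stderr) -> None:
        out = self._check_output(["du", "-sb", str(overlay_path(overlay_id))], text=True)
        used = int(out.split()[0])  # du -sb prints "<bytes>\t<path>"
        if used > DISK_CAP_BYTES:
            on_stderr(f"overlay {overlay_id} uses {used} bytes, over the {DISK_CAP_BYTES}-byte cap\n")
            raise BuildError("disk-cap-exceeded")
```

Injecting the runner keeps the argv-shape, tmpfile-lifetime, streaming, cancel and disk-cap tests from Task 2 free of monkeypatching. Patching run_with_streamed_output directly works just as well if that is the existing convention.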

Verification:

python3 -m pytest l4d2web/tests/test_overlay_builders.py l4d2web/tests/test_job_worker.py -q

Commit: feat(l4d2-web): ScriptBuilder + BUILDERS registry update

Task 3: Delete global-overlay services + CLI command + their tests

Files:

Delete: l4d2web/services/global_map_sources.py
Delete: l4d2web/services/global_overlay_refresh.py
Delete: l4d2web/services/global_map_cache.py
Delete: l4d2web/services/global_overlays.py
Modify: l4d2web/cli.py — remove refresh-global-overlays command (lines ~44–55). Drop any imports that go orphaned.
Delete: l4d2web/tests/test_global_map_sources.py
Delete: l4d2web/tests/test_global_overlay_models.py
Delete: l4d2web/tests/test_global_overlay_builders.py
Delete: l4d2web/tests/test_global_overlay_cli.py
Delete: l4d2web/tests/test_global_overlay_refresh.py
Delete: l4d2web/tests/test_global_overlays.py
Delete: l4d2web/tests/test_global_map_cache.py
Audit & fix: any other module that imports the deleted modules. Likely candidates: l4d2web/app.py (CLI registration), routes/overlay_routes.py, routes/page_routes.py. Resolve by deletion of the dead import / call site, not by stubbing.
Modify: pyproject.toml — drop py7zr from dependencies (only used by the deleted globals subsystem).

Test plan:

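RED-first via grep: grep -RIn 'global_map_sources\|global_overlay_refresh\|global_map_cache\|global_overlays\|refresh_global_overlays\|GlobalMapOverlayBuilder' l4d2web/ deploy/ should return zero hits once both this task and Task 4 are done. Until Task 4 removes it, l4d2web/services/job_worker.py and its tests still mention refresh_global_overlays. To keep the check as a permanent regression guard, add tests/test_no_globals_references.py::test_no_globals_imports and exclude the guard file itself, since it necessarily contains the patterns. Otherwise, spot-check.
Existing tests/test_cli.py (or whichever file covers the Flask CLI) loses any cases for refresh-global-overlays. Add a test_globals_refresh_command_removed that asserts the click command is not registered. The name deliberately avoids the refresh_global_overlays substring so that it does not trip the grep above.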
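If the grep becomes a permanent guard, a stdlib-only version of tests/test_no_globals_references.py could look like the sketch below. It approximates grep -RIn by skipping files that do not decode as UTF-8. It also skips __pycache__, where stale bytecode of deleted modules can linger after git rm, and excludes the guard file itself. The helper name find_globals_references is hypothetical; only test_no_globals_imports comes from the plan.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Same alternatives as the grep pattern in the plan.
FORBIDDEN = re.compile(
    r"global_map_sources|global_overlay_refresh|global_map_cache|global_overlays"
    r"|refresh_global_overlays|GlobalMapOverlayBuilder"
)


def find_globals_references(roots: Iterable[str], exclude: Iterable[str] = ()) -> List[str]:
    """Return 'path:line: text' hits for any forbidden name under the given roots."""
    excluded = {Path(p).resolve() for p in exclude}
    hits = []
    for root in roots:
        for path in sorted(Path(root).rglob("*")):
            if not path.is_file() or "__pycache__" in path.parts or path.resolve() in excluded:
                continue
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable, roughly like grep -I
            for lineno, line in enumerate(text.splitlines(), start=1):
                if FORBIDDEN.search(line):
                    hits.append(f"{path}:{lineno}: {line.strip()}")
    return hits


def test_no_globals_imports():
    # Assumes pytest runs from the repo root so the relative roots resolve.
    roots = [r for r in ("l4d2web", "deploy") if Path(r).is_dir()]
    assert find_globals_references(roots, exclude=[__file__]) == []
```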

Implementation:

Delete files via git rm.
In cli.py, remove the command function and its @app.cli.command(...) decorator. Drop any helper imports that become orphaned.
Remove py7zr from pyproject.toml and re-lock if a lockfile is present.

Verification:

python3 -m pytest l4d2web/tests/ -q
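grep -RIn 'global_map_sources\|global_overlay_refresh\|global_map_cache\|global_overlays\|refresh_global_overlays\|GlobalMapOverlayBuilder' l4d2web/ deploy/ || echo "clean"
(Until Task 4 lands, expect the remaining refresh_global_overlays hits in l4d2web/services/job_worker.py and l4d2web/tests/test_job_worker.py.)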

Commit: refactor(l4d2-web): drop global-overlays subsystem in favor of script type

Task 4: Job worker — drop refresh_global_overlays from scheduler

Files:

Modify: l4d2web/services/job_worker.py — remove "refresh_global_overlays" from GLOBAL_OPERATIONS; remove refresh_global_overlays_running field from SchedulerState and any references in can_start(); check whether blocked_servers_by_overlay was added solely for the globals subsystem and remove if so.
Modify: l4d2web/tests/test_job_worker.py — drop refresh_global_overlays truth-table rows; add explicit build_overlay truth-table cases for script-type overlays (mechanically identical to workshop, but pinned by test).

Test plan:

test_job_worker.py::test_global_operations_set — GLOBAL_OPERATIONS == {"install", "refresh_workshop_items"} (or whatever subset remains; pin it).
test_job_worker.py::test_build_overlay_script_type_blocks_per_overlay — start build_overlay(overlay_id=7) for a script-type overlay; assert second build_overlay(overlay_id=7) cannot start; assert build_overlay(overlay_id=8) can.
test_job_worker.py::test_build_overlay_blocks_server_init_on_blueprint_overlay — existing test, may need re-pinning if it referenced globals.

Implementation:

Remove the field from the dataclass / TypedDict that backs SchedulerState.
Remove any update sites that flipped the flag (the worker's enqueue / on-start / on-complete paths).
The remaining mutex rules (install / refresh_workshop_items are global; build_overlay per-overlay; server ops block on overlays in their blueprint) are unchanged structurally.
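The remaining rules can be pinned with a truth table against a model like the one below. This is a hypothetical, simplified SchedulerState, not the real one in services/job_worker.py. It makes two assumptions that should be checked against the existing can_start():

- a global operation runs exclusively;
- server operations are serialized per server.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional, Set

GLOBAL_OPERATIONS = frozenset({"install", "refresh_workshop_items"})


@dataclass
class SchedulerState:
    running_global: Set[str] = field(default_factory=set)
    building_overlays: Set[int] = field(default_factory=set)  # overlay ids with a running build_overlay
    busy_servers: Set[int] = field(default_factory=set)       # server ids with a running server op


@dataclass(frozen=True)
class JobRequest:
    operation: str
    overlay_id: Optional[int] = None
    server_id: Optional[int] = None
    blueprint_overlays: FrozenSet[int] = frozenset()  # overlays stacked by the server's blueprint


def can_start(state: SchedulerState, job: JobRequest) -> bool:
    if state.running_global:
        return False  # assumption: global operations run alone
    if job.operation in GLOBAL_OPERATIONS:
        return not state.building_overlays and not state.busy_servers
    if job.operation == "build_overlay":
        # Per-overlay mutex: identical for workshop and script overlays.
        return job.overlay_id not in state.building_overlays
    # Server operations: one at a time per server (assumption), and blocked
    # while any overlay in the server's blueprint is being built.
    return (job.server_id not in state.busy_servers
            and not (job.blueprint_overlays & state.building_overlays))
```

Note that script-type overlays need no branch of their own. The build_overlay rule keys only on overlay_id, which is why the plan can pin the script case with the same truth-table rows as workshop.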

Verification:

python3 -m pytest l4d2web/tests/test_job_worker.py -q

Commit: refactor(l4d2-web): drop refresh_global_overlays from scheduler

Task 5: Routes (script update / wipe / build)

Files:

Modify: l4d2web/routes/overlay_routes.py — add three POST endpoints.
Create: l4d2web/tests/test_script_overlay_routes.py.

Test plan (RED first):

test_script_overlay_routes.py::test_create_script_overlay — POST /overlays with form {"name": "x", "type": "script"} as a regular user → 302 to detail; row exists with type='script', script='', last_build_status='', user_id=current_user.id, path=str(id).
test_script_overlay_routes.py::test_admin_creates_system_wide_script_overlay — admin POST with system-wide flag → row has user_id=NULL.
test_script_overlay_routes.py::test_update_script_body_enqueues_build — POST /overlays/{id}/script with {"script": "echo new"} → row.script updated; one new build_overlay job enqueued for the overlay; second immediate POST coalesces (no second job inserted while first is pending).
test_script_overlay_routes.py::test_manual_rebuild — POST /overlays/{id}/build → enqueues build_overlay; coalesces.
test_script_overlay_routes.py::test_wipe_runs_find_delete: POST /overlays/{id}/wipe → invokes ScriptBuilder.build (or the underlying helper) with the literal script find /overlay -mindepth 1 -delete. After success, row.last_build_status == ''. The endpoint does not enqueue a build_overlay job.
test_script_overlay_routes.py::test_wipe_refuses_during_running_build — set scheduler state to build_overlay(overlay_id=7) running; POST /overlays/7/wipe → 409 (or whatever the existing pattern uses for scheduler conflicts), no sandbox invocation.
test_script_overlay_routes.py::test_permissions_non_owner_denied — user A creates private script overlay; user B POSTs /overlays/{id}/script → 403.
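test_script_overlay_routes.py::test_permissions_admin_can_edit_any: admin POSTs /overlays/{id}/script for user A's row → succeeds with the same response the owner gets, not 403. If the endpoint redirects like the create flow, that response is a 302 rather than a 200.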

Implementation:

Mirror the existing _can_edit_overlay() permission helper.
The /wipe endpoint can either (a) call ScriptBuilder directly with a synthetic Overlay-like object whose .script is the find command and whose .id is the real overlay id, or (b) factor a _run_sandbox(overlay_id, script_text, on_stdout, on_stderr, should_cancel) helper out of ScriptBuilder.build() and call it from both. (b) is cleaner; do (b).
Wipe runs synchronously in the request thread (small, fast). It does NOT enqueue a job. Surface log output as flash messages or by streaming through the existing log infra — pick whichever matches the existing wipe-equivalent pattern (workshop overlays don't have a wipe; closest analog is the existing delete-overlay flow).
The /script endpoint enqueues via the same enqueue_build_overlay(overlay_id) helper used by workshop overlays' add/remove flows. Coalescing is already implemented there.
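Below is a framework-free sketch of the wipe flow and of the coalescing that the /script and /build endpoints rely on. OverlayRow, wipe_overlay and the pending-job list are hypothetical stand-ins. run_sandbox stands for the option (b) _run_sandbox helper with its runner already bound. The route would map the returned "conflict" to a 409 (or to whatever the existing scheduler-conflict pattern returns). The plan does not say what last_build_status should be after a failed wipe, so the sketch leaves it unchanged.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

WIPE_SCRIPT = "find /overlay -mindepth 1 -delete"


@dataclass
class OverlayRow:  # hypothetical stand-in for the Overlay model
    id: int
    script: str = ""
    last_build_status: str = ""


def wipe_overlay(overlay: OverlayRow, building_overlays: Set[int],
                 run_sandbox: Callable[..., int], log: List[str]) -> str:
    """Run the wipe synchronously; never enqueues a build_overlay job."""
    if overlay.id in building_overlays:
        return "conflict"  # route answers 409; the sandbox is not invoked
    rc = run_sandbox(overlay.id, WIPE_SCRIPT, log.append, log.append, lambda: False)
    if rc != 0:
        return "failed"  # last_build_status left unchanged (unspecified by the plan)
    overlay.last_build_status = ""  # empty directory, nothing built yet
    return "ok"


def enqueue_build_overlay(pending: List[Tuple[str, int]], overlay_id: int) -> bool:
    """Coalesce: keep at most one pending build_overlay job per overlay."""
    job = ("build_overlay", overlay_id)
    if job in pending:
        return False
    pending.append(job)
    return True
```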

Verification:

python3 -m pytest l4d2web/tests/test_script_overlay_routes.py l4d2web/tests/test_overlay_routes.py -q

Commit: feat(l4d2-web): script overlay routes (script update / wipe / build)

Task 6: Templates (overlays.html + overlay_detail.html)

Files:

Modify: l4d2web/templates/overlays.html — add script to the create-modal type radio (lines ~29–49).
Modify: l4d2web/templates/overlay_detail.html — add a {% if overlay.type == 'script' %} block with textarea + Save / Rebuild / Wipe buttons + status badge; delete the global-source block (lines ~34–46).
Modify: l4d2web/tests/test_pages.py — assert script-section renders for type=script, workshop-section renders for type=workshop, global-source-section is absent.

Test plan:

test_pages.py::test_overlay_create_modal_offers_script_type — GET /overlays; HTML contains value="script" radio.
test_pages.py::test_overlay_detail_script_section — create script overlay, GET /overlays/{id}; HTML contains <textarea name="script">, "Rebuild" button, "Wipe" button, status badge element.
test_pages.py::test_overlay_detail_workshop_section_unchanged — existing workshop detail still has thumbnail grid, add-item form, etc.
test_pages.py::test_overlay_detail_no_global_source_block — page HTML has no element from the deleted global-source block (check for an attribute or string unique to that block).

Implementation:

Detail-page wipe button uses a small confirm-modal pattern (copy from the existing delete-overlay confirm modal).
Status badge: existing CSS classes for ok/warn/error already exist in static/; reuse them.
No new JS deps. Plain <form method="post"> with HTMX hx-post for the script update if a streaming UX is desired (match existing patterns).
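For the status badge, a helper that maps last_build_status onto the existing ok/warn/error classes keeps the template to a single lookup. The class names badge-ok, badge-warn and badge-error, and the helper name, are hypothetical placeholders for whatever static/ actually defines. Treating '' (never built, or just wiped) as warn is a judgment call.

```python
from typing import Tuple

# Hypothetical class names; substitute the ok/warn/error classes already in static/.
_BADGES = {
    "ok": ("badge-ok", "Built"),
    "failed": ("badge-error", "Build failed"),
    "": ("badge-warn", "Not built"),
}


def build_status_badge(last_build_status: str) -> Tuple[str, str]:
    """Map Overlay.last_build_status to a (css_class, label) pair for the template."""
    return _BADGES.get(last_build_status, ("badge-warn", last_build_status or "Unknown"))
```

Register it with Flask's app.add_template_filter(build_status_badge). The template can then unpack it with {% set badge_class, badge_label = overlay.last_build_status|build_status_badge %}.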

Verification:

python3 -m pytest l4d2web/tests/test_pages.py -q

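Manual: start the dev server (flask run), create a script overlay, paste echo "hi" > /overlay/foo, click Save, and watch the log stream. The absolute path avoids depending on the sandbox's working directory. Then click Wipe and confirm the directory is empty. Then click Rebuild and confirm foo reappears.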

Commit: feat(l4d2-web): script overlay UI

Task 7: Libexec sandbox helper + sudoers + deploy-artifacts test

Files:

Create: deploy/files/usr/local/libexec/left4me/left4me-script-sandbox (bash, mode 0755 after deploy, owned root).
Modify: deploy/files/etc/sudoers.d/left4me — append the rule.
Modify: deploy/tests/test_deploy_artifacts.py — assert helper file present + sudoers contains the new line.

Test plan (RED first):

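test_deploy_artifacts.py::test_script_sandbox_helper_present: the file exists, is executable, and its shebang is #!/bin/bash. Git records only the executable bit (100755 vs 100644), not an exact mode, so check the execute bit pre-deploy. Exact mode 0755 and root ownership are verified on the host.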
test_deploy_artifacts.py::test_sudoers_includes_script_sandbox_rule — sudoers file contains the exact line left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox.
Optional integration test (skip on non-Linux dev): drive the helper as a subprocess with a synthesized fake /var/lib/left4me/overlays/1/ and a no-op script, assert bwrap invocation happens (use a mock systemd-run or LEFT4ME_SCRIPT_SANDBOX_DRY_RUN=1 env that prints the would-be invocation and exits 0). Mirrors the LEFT4ME_OVERLAY_PRINT_ONLY=1 pattern from the kernel-overlayfs helper test.
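A stdlib-only sketch of the two checks follows. It assumes the tests resolve paths relative to the repo root, and the helper function names are hypothetical. It reads the owner-execute bit from stat() rather than calling os.access(..., os.X_OK), because access() reports "not executable" on noexec mounts.

```python
import stat
from pathlib import Path
from typing import List

HELPER_REL = Path("deploy/files/usr/local/libexec/left4me/left4me-script-sandbox")
SUDOERS_REL = Path("deploy/files/etc/sudoers.d/left4me")
SUDOERS_RULE = "left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox"


def script_sandbox_helper_problems(repo_root: Path) -> List[str]:
    """Return problems with the checked-in helper (an empty list means OK)."""
    helper = repo_root / HELPER_REL
    if not helper.is_file():
        return [f"missing {HELPER_REL}"]
    problems = []
    # Git stores only the executable bit, so check that bit, not an exact 0755.
    if not (helper.stat().st_mode & stat.S_IXUSR):
        problems.append("helper is not executable")
    first_line = helper.read_text(encoding="utf-8").split("\n", 1)[0]
    if first_line != "#!/bin/bash":
        problems.append(f"unexpected shebang {first_line!r}")
    return problems


def sudoers_has_script_sandbox_rule(repo_root: Path) -> bool:
    # Whole-line match: trailing whitespace or a different path fails the check.
    lines = (repo_root / SUDOERS_REL).read_text(encoding="utf-8").split("\n")
    return SUDOERS_RULE in lines
```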

Implementation:

Helper script verbatim from the spec §Sandbox.
Sudoers fragment: append (don't replace existing rules). The existing fragment has rules for left4me-overlay, left4me-systemctl, left4me-journalctl — match the same formatting (one rule per line, no trailing whitespace).

Verification:

python3 -m pytest deploy/tests/test_deploy_artifacts.py -q
bash -n deploy/files/usr/local/libexec/left4me/left4me-script-sandbox

Commit: feat(deploy): left4me-script-sandbox helper + sudoers fragment

Task 8: Deploy script — provision l4d2-sandbox + bubblewrap; drop globals timer

Files:

Modify: deploy/deploy-test-server.sh — add useradd --system ... l4d2-sandbox, add apt-get install -y bubblewrap, ensure helper installation step picks up left4me-script-sandbox (likely automatic if it's a glob in deploy/files/usr/local/libexec/left4me/*); drop the mkdir global_overlay_cache line if present.
Delete: deploy/files/usr/local/lib/systemd/system/left4me-refresh-global-overlays.timer
Delete: deploy/files/usr/local/lib/systemd/system/left4me-refresh-global-overlays.service
Modify: deploy/tests/test_deploy_artifacts.py — assert the two unit files are absent; assert useradd l4d2-sandbox and apt-get install ... bubblewrap lines are present in the deploy script.

Test plan:

test_deploy_artifacts.py::test_globals_refresh_units_removed — files do not exist under deploy/files/usr/local/lib/systemd/system/.
test_deploy_artifacts.py::test_deploy_script_provisions_sandbox_user — grep the deploy script for the useradd line.
test_deploy_artifacts.py::test_deploy_script_installs_bubblewrap — grep for bubblewrap in apt invocations.
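Grepping the deploy script gets brittle when apt-get package lists are split across backslash-continued lines. The sketch below, with hypothetical helper names, joins continuations before matching and also covers the removed-unit check.

```python
import re
from pathlib import Path
from typing import List

UNIT_DIR = Path("deploy/files/usr/local/lib/systemd/system")
REMOVED_UNITS = ("left4me-refresh-global-overlays.timer", "left4me-refresh-global-overlays.service")


def logical_lines(script: str) -> List[str]:
    """Join backslash-continued shell lines so a multi-line apt-get call matches as one."""
    return script.replace("\\\n", " ").splitlines()


def provisions_sandbox_user(script: str) -> bool:
    return any(re.search(r"\buseradd\b.*\bl4d2-sandbox\b", line) for line in logical_lines(script))


def installs_bubblewrap(script: str) -> bool:
    return any(re.search(r"\bapt-get\b.*\binstall\b.*\bbubblewrap\b", line)
               for line in logical_lines(script))


def leftover_units(repo_root: Path) -> List[str]:
    """Names of globals refresh units that still exist (should be empty)."""
    return [name for name in REMOVED_UNITS if (repo_root / UNIT_DIR / name).exists()]
```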

Implementation:

useradd line uses --system --no-create-home --shell /usr/sbin/nologin. Idempotency: wrap with id l4d2-sandbox &>/dev/null || useradd ....
apt-get install: append bubblewrap to whatever package list the script already maintains.
Globals timer/service deletions: git rm.

Verification:

python3 -m pytest deploy/tests/ -q
shellcheck deploy/deploy-test-server.sh deploy/files/usr/local/libexec/left4me/left4me-script-sandbox

Commit: chore(deploy): provision l4d2-sandbox + bubblewrap; drop globals refresh timer

Task 9: Full pytest run + drift fixes

Files: as needed across the repo.

Test plan: run all three test suites (l4d2web, l4d2host, deploy) and chase down any drift caused by removed model classes, dropped imports, or template changes.

python3 -m pytest l4d2web/tests/ -q
python3 -m pytest l4d2host/tests/ -q
python3 -m pytest deploy/tests/ -q

Implementation: fix what breaks. Common drift sources to expect:

Tests that imported from deleted modules.
Tests that asserted exact BUILDERS keyset (good — they should have been updated in Task 2).
Tests that built fixtures with type='l4d2center_maps' or type='cedapug_maps' — those tests likely belong to the deleted set or need conversion to type='script'.
Template snapshot tests (if any) that captured the deleted global-source block.

Verification: all three suites green.

Commit: chore(l4d2-web): test suite drift fixes after script-overlays migration (only if drift fixes needed; skip if Tasks 1–8 left the suite green)

End-to-end deployment verification (manual, on test host)

After all tasks committed:

Reset deploy: run deploy/deploy-test-server.sh from clean state. Confirm bubblewrap installed (dpkg -l bubblewrap), l4d2-sandbox user exists (id l4d2-sandbox), /usr/local/libexec/left4me/left4me-script-sandbox is mode 0755 and root-owned, sudo -ln as left4me shows the new rule.
Sandbox smoke:
- As left4me, write /tmp/echo.sh containing echo $(id -u) > /overlay/sentinel.
- Run mkdir -p /var/lib/left4me/overlays/1, then sudo /usr/local/libexec/left4me/left4me-script-sandbox 1 /tmp/echo.sh.
- Confirm /var/lib/left4me/overlays/1/sentinel contains the UID of l4d2-sandbox (id -u l4d2-sandbox on the host) and is owned by l4d2-sandbox. Use id -u rather than whoami: whoami needs a passwd entry to resolve the UID, so it would fail inside the sandbox if /etc/passwd is hidden.
- Run probe scripts to confirm /etc/passwd, /var/lib/left4me/l4d2web.db, and /home are not visible inside the sandbox.
Resource limits:
- dd if=/dev/zero of=/overlay/big bs=1M count=25000 → succeeds inside sandbox; ScriptBuilder._enforce_disk_budget flags the build failed; last_build_status='failed'.
- sleep 7200 → killed at 1 h by RuntimeMaxSec=3600.
- Memory hog (python3 -c "x=' '*(5*1024**3)") → OOM at 4 GB.
App-level happy path: as a non-admin user, create a script overlay via the UI, paste an old competitive_rework-style script, Save → build runs, succeeds, addons appear in overlays/{id}/left4dead2/. Stack onto a server blueprint, start the server, verify content mounts via the L4D2 admin console (map workshop/...).
Wipe: click Wipe → directory empty (the log records the find -delete run). Click Rebuild → directory repopulates. last_build_status goes from '' after the wipe to 'ok' after the rebuild.
Scheduler: start a server using the script overlay; in another browser tab attempt to Rebuild → 409 / scheduler-blocked. Stop server; rebuild succeeds.
Audit log: journalctl --since "5 min ago" | grep run- shows transient scopes per build with cgroup memory accounting visible.

These are not required for any single commit but should pass before declaring the work done.
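The resource-limit probes in this checklist only prove something if each probe actually exceeds its limit. The quick arithmetic check below relies on two facts: GNU dd's M suffix means 1048576 bytes, and systemd's K/M/G size suffixes are base 1024.

```python
GIB = 1024**3
DISK_CAP = 20 * GIB             # _enforce_disk_budget cap
MEMORY_MAX = 4 * GIB            # MemoryMax=4G
RUNTIME_MAX_SEC = 3600          # RuntimeMaxSec=3600

dd_bytes = 25000 * 1024**2      # dd bs=1M count=25000
fake_du_bytes = 25_000_000_000  # value faked in test_script_builder_disk_cap
memory_hog_bytes = 5 * GIB      # python3 -c "x=' '*(5*1024**3)"
sleep_seconds = 7200            # sleep 7200

print(DISK_CAP)                          # 21474836480
print(dd_bytes, dd_bytes > DISK_CAP)     # 26214400000 True
print(fake_du_bytes > DISK_CAP)          # True
print(memory_hog_bytes > MEMORY_MAX)     # True
print(sleep_seconds > RUNTIME_MAX_SEC)   # True
```

All four probes clear their limits with margin. The faked du value (about 23.3 GiB) is the closest, so the cap message in test_script_builder_disk_cap is exercised for the right reason.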
