L4D2 Script Overlays Implementation Plan
Approval status: User-approved 2026-05-08. Implementation proceeds.
Goal: Implement the script overlay type per docs/superpowers/specs/2026-05-08-l4d2-script-overlays-design.md. Add an Overlay.script TEXT column and Overlay.last_build_status enum-string column, a ScriptBuilder that runs user bash inside a bubblewrap + systemd-run --scope sandbox via a new left4me-script-sandbox privileged helper, route + UI surface for editing/wiping/rebuilding, and delete the entire managed-globals (l4d2center_maps, cedapug_maps) subsystem and its daily-refresh timer/CLI.
Architecture: The web app continues to enqueue build_overlay jobs for any overlay row. The job worker dispatches via BUILDERS[overlay.type].build(...). After this change BUILDERS = {"workshop": WorkshopBuilder(), "script": ScriptBuilder()}. The new ScriptBuilder writes overlay.script to a tmpfile and execs sudo -n /usr/local/libexec/left4me/left4me-script-sandbox <id> <tmpfile>, which itself execs systemd-run --scope --collect ... -- bwrap [namespace flags] /bin/bash /script.sh. stdout/stderr stream through the existing run_with_streamed_output helper into the existing job-log SSE plumbing. The job-completion path writes Overlay.last_build_status based on the build outcome. The kernel-overlayfs mount layer (KernelOverlayFSMounter) is unchanged.
Locked Decisions
See docs/superpowers/specs/2026-05-08-l4d2-script-overlays-design.md for design rationale. Implementation-relevant summary:
- Final overlay type list: workshop (unchanged) + script (new). Drop l4d2center_maps, cedapug_maps.
- New columns on overlays: script TEXT NOT NULL DEFAULT '', last_build_status VARCHAR(16) NOT NULL DEFAULT ''.
- Drop tables (FK order): global_overlay_item_files, global_overlay_items, global_overlay_sources.
- ScriptBuilder in l4d2web/services/overlay_builders.py, uses existing run_with_streamed_output.
- Privileged helper left4me-script-sandbox (bash, mode 0755, owned root). systemd-run --scope --collect -p MemoryMax=4G -p MemorySwapMax=0 -p TasksMax=512 -p CPUQuota=200% -p RuntimeMaxSec=3600 -- bwrap …. Limits 1 h walltime, 4 GB RAM, 20 GB post-build du cap.
- New system user l4d2-sandbox (/usr/sbin/nologin, no home). New apt dep bubblewrap.
- Sudoers verb-unrestricted: left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox.
- Daily refresh subsystem deleted: left4me-refresh-global-overlays.{timer,service} and flask refresh-global-overlays CLI removed. No replacement.
- Wipe is the same sandbox helper invoked with the literal script find /overlay -mindepth 1 -delete.
- auto_refresh column NOT added in this iteration.
- Test deploy DB is wiped on rollout; migration includes DELETE FROM overlays WHERE type IN ('l4d2center_maps', 'cedapug_maps') for safety.
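To make the locked limits concrete, the sketch below assembles the systemd-run prefix from the properties listed above. It is illustration only: the real helper is bash, the function name scope_argv is hypothetical, and the bwrap namespace flags are passed in by the caller because they live in the design doc and are not reproduced here.

```python
# Resource properties copied from the Locked Decisions list.
SCOPE_PROPERTIES = {
    "MemoryMax": "4G",
    "MemorySwapMax": "0",
    "TasksMax": "512",
    "CPUQuota": "200%",
    "RuntimeMaxSec": "3600",
}


def scope_argv(bwrap_argv):
    """Prefix a bwrap command line with the systemd-run scope and its limits.

    `bwrap_argv` is supplied by the caller; this sketch does not guess at
    the namespace flags.
    """
    argv = ["systemd-run", "--scope", "--collect"]
    for name, value in SCOPE_PROPERTIES.items():
        argv += ["-p", f"{name}={value}"]
    return argv + ["--"] + list(bwrap_argv)


if __name__ == "__main__":
    # "..." stands in for the elided namespace flags.
    print(" ".join(scope_argv(["bwrap", "...", "/bin/bash", "/script.sh"])))
```

Keeping the limits in one table makes it easy to compare them against what the deploy-artifacts test or a dry-run mode reports.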
Current Gap
- l4d2web/models.py Overlay has no script or last_build_status columns. The 3 globals tables are present.
- l4d2web/services/overlay_builders.py BUILDERS = {"workshop": WorkshopBuilder(), "l4d2center_maps": GlobalMapOverlayBuilder(), "cedapug_maps": GlobalMapOverlayBuilder()}. No ScriptBuilder.
- l4d2web/services/{global_map_sources,global_overlay_refresh,global_map_cache,global_overlays}.py exist and are referenced by routes / CLI.
- l4d2web/services/job_worker.py carries refresh_global_overlays_running plumbing.
- l4d2web/cli.py defines refresh-global-overlays.
- l4d2web/routes/overlay_routes.py has no /script, /wipe, or /build endpoints for non-workshop types.
- l4d2web/templates/overlays.html create modal type radio offers only workshop.
- l4d2web/templates/overlay_detail.html has a global-source block (~lines 34–46) that should not survive.
- deploy/files/usr/local/lib/systemd/system/left4me-refresh-global-overlays.{timer,service} exist.
- deploy/deploy-test-server.sh provisions global_overlay_cache/ and does not provision l4d2-sandbox or install bubblewrap.
- Seven tests/test_global_*.py files exist and reference removed code.
Task 1: Schema migration (alembic 0005)
Files:
- Create: l4d2web/alembic/versions/0005_script_overlays.py (revises 0004_drop_legacy_external_overlay_type).
- Modify: l4d2web/models.py: Overlay gains script and last_build_status columns; remove GlobalOverlaySource, GlobalOverlayItem, GlobalOverlayItemFile model classes.
- Modify: l4d2web/tests/test_overlay_models.py (or whichever existing test asserts the Overlay schema; create one if absent): assert new columns present.
Test plan (RED first):
- tests/test_alembic_migrations.py::test_upgrade_0005_adds_script_columns: apply migrations to a fresh in-memory SQLite, assert script and last_build_status columns present on overlays, assert no global_overlay_* tables, assert old data wipe DELETE FROM overlays WHERE type IN (...) is part of the upgrade.
- tests/test_alembic_migrations.py::test_downgrade_0005_restores_globals (only if downgrade is supported in the project's migration policy; skip with pytest.skip if not; the kernel-overlayfs migration is one-way, follow that precedent).
- tests/test_overlay_models.py::test_overlay_has_script_columns: Overlay(...) instance has script='' and last_build_status='' defaults.
Implementation:
- Migration uses op.drop_table('global_overlay_item_files') etc. in correct FK order; uses op.add_column('overlays', sa.Column('script', sa.Text(), nullable=False, server_default='')) and similar for last_build_status (sa.String(16)).
- The DELETE FROM overlays WHERE type IN ('l4d2center_maps','cedapug_maps') runs before the column additions so the operation is straightforward: these rows do not reference the new columns.
- models.py: delete the three globals model classes outright; add the two new columns to Overlay with explicit defaults.
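The ordering above can be rehearsed without alembic. The following is a minimal sketch using only the stdlib sqlite3 module: it replays the same statements, in the same order, against a toy pre-migration schema. The toy column sets and the foreign keys between the globals tables are assumptions made for the example; only the table names, the new columns, and the DELETE come from this plan.

```python
import sqlite3

# Statements in the order Task 1 describes: purge retired overlay rows,
# drop the globals tables child-first, then add the two new columns.
UPGRADE_STATEMENTS = [
    "DELETE FROM overlays WHERE type IN ('l4d2center_maps', 'cedapug_maps')",
    "DROP TABLE global_overlay_item_files",
    "DROP TABLE global_overlay_items",
    "DROP TABLE global_overlay_sources",
    "ALTER TABLE overlays ADD COLUMN script TEXT NOT NULL DEFAULT ''",
    "ALTER TABLE overlays ADD COLUMN last_build_status VARCHAR(16) NOT NULL DEFAULT ''",
]


def upgrade(conn):
    """Apply the 0005 steps in order on an open sqlite3 connection."""
    for statement in UPGRADE_STATEMENTS:
        conn.execute(statement)
    conn.commit()


def make_pre_0005_db():
    """Build a toy pre-migration schema (simplified, assumed columns)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE overlays (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
        CREATE TABLE global_overlay_sources (id INTEGER PRIMARY KEY);
        CREATE TABLE global_overlay_items (
            id INTEGER PRIMARY KEY,
            source_id INTEGER REFERENCES global_overlay_sources(id));
        CREATE TABLE global_overlay_item_files (
            id INTEGER PRIMARY KEY,
            item_id INTEGER REFERENCES global_overlay_items(id));
        INSERT INTO overlays (name, type) VALUES
            ('maps', 'l4d2center_maps'), ('mine', 'workshop');
        """
    )
    return conn


conn = make_pre_0005_db()
upgrade(conn)
columns = {row[1] for row in conn.execute("PRAGMA table_info(overlays)")}
tables = {
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
}
print(sorted(columns), sorted(tables))
```

The same two set checks (columns present, no global_overlay_* tables) are what test_upgrade_0005_adds_script_columns would assert after running the real migration.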
Verification:
python3 -m pytest l4d2web/tests/test_alembic_migrations.py l4d2web/tests/test_overlay_models.py -q
Commit: feat(l4d2-web): script overlay schema — add overlay.script + last_build_status, drop globals tables
Task 2: ScriptBuilder + BUILDERS registry update
Files:
- Modify: l4d2web/services/overlay_builders.py: add ScriptBuilder, remove GlobalMapOverlayBuilder, change BUILDERS dict.
- Rewrite: l4d2web/tests/test_overlay_builders.py: drop globals-builder tests, add ScriptBuilder tests.
Test plan (RED first):
- test_overlay_builders.py::test_builders_registry: set(BUILDERS) == {"workshop", "script"}. Assert "l4d2center_maps" and "cedapug_maps" and "external" are absent.
- test_overlay_builders.py::test_script_builder_invokes_helper: patch run_with_streamed_output to capture argv; build an Overlay(id=42, type='script', script='echo hi'); assert argv shape ["sudo", "-n", "/usr/local/libexec/left4me/left4me-script-sandbox", "42", <script_path>] and that the script_path file exists with content "echo hi" at invocation time. Verify the tmpfile is unlinked after build.
- test_overlay_builders.py::test_script_builder_disk_cap: fake subprocess.check_output for du to return 25000000000; build raises BuildError("disk-cap-exceeded") and on_stderr was called with the cap message.
- test_overlay_builders.py::test_script_builder_streams_output: fake run_with_streamed_output invokes both on_stdout("hello\n") and on_stderr("warn\n"); both lambda lists capture the lines.
- test_overlay_builders.py::test_script_builder_cancel: should_cancel returns True after the first stdout line; assert run_with_streamed_output propagated cancellation (the existing helper's contract; the test just ensures we pass should_cancel through and don't run the disk-budget check on cancel).
- test_overlay_builders.py::test_workshop_builder_unchanged: smoke test that WorkshopBuilder still exists and is invokable (regression guard against accidental removal during refactor).
Implementation:
- Add import os, subprocess, tempfile at the top of overlay_builders.py if not present.
- ScriptBuilder exactly as in the spec (verbatim copy from the design doc, §Build Lifecycle).
- Define a small BuildError exception class if one doesn't already exist locally; reuse the existing one if WorkshopBuilder already raises a similar type.
- _enforce_disk_budget calls subprocess.check_output(["du", "-sb", str(overlay_path(overlay_id))]); the existing overlay_path helper in the module already returns the absolute Path. Parse first whitespace-delimited integer; cap is 20 * 1024**3.
- Job-completion path: locate the existing path that handles build_overlay job success/failure (likely in services/job_worker.py or a related orchestration module). Add a single column write: on success last_build_status='ok', on BuildError / non-zero exit / cancel last_build_status='failed'. Add a tests/test_job_worker.py::test_build_overlay_writes_last_build_status covering both branches.
- Remove GlobalMapOverlayBuilder class and any helper functions it owns that are not used elsewhere.
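The authoritative ScriptBuilder is the one in the design doc. As a reading aid, here is a rough sketch of the lifecycle the bullets above describe: tmpfile, helper argv, unlink, disk budget, status mapping. Several things are assumptions: the free-function shape, the names build_script_overlay and final_status, the injected `run` callable standing in for run_with_streamed_output (its real signature may differ), and the "cancelled" / "exit-N" error strings.

```python
import os
import subprocess
import tempfile

HELPER = "/usr/local/libexec/left4me/left4me-script-sandbox"
DISK_CAP_BYTES = 20 * 1024**3


class BuildError(Exception):
    """Raised when a build fails, is cancelled, or exceeds the disk cap."""


def parse_du_bytes(du_output):
    """Return the first whitespace-delimited integer of `du -sb` output."""
    return int(du_output.split()[0])


def enforce_disk_budget(path, on_stderr, check_output=subprocess.check_output):
    used = parse_du_bytes(check_output(["du", "-sb", str(path)]))
    if used > DISK_CAP_BYTES:
        on_stderr(f"overlay uses {used} bytes, cap is {DISK_CAP_BYTES}\n")
        raise BuildError("disk-cap-exceeded")


def build_script_overlay(overlay_id, script, overlay_dir, run,
                         on_stdout, on_stderr, should_cancel,
                         check_output=subprocess.check_output):
    # Write the script to a tmpfile for the helper; always remove it after.
    fd, script_path = tempfile.mkstemp(suffix=".sh")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(script)
        argv = ["sudo", "-n", HELPER, str(overlay_id), script_path]
        # Assumed contract: run(...) streams output and returns the exit code.
        exit_code = run(argv, on_stdout, on_stderr, should_cancel)
    finally:
        os.unlink(script_path)
    if should_cancel():
        raise BuildError("cancelled")  # no disk-budget check on cancel
    if exit_code != 0:
        raise BuildError(f"exit-{exit_code}")
    enforce_disk_budget(overlay_dir, on_stderr, check_output)


def final_status(build):
    """Map one build attempt to the value stored in last_build_status."""
    try:
        build()
    except BuildError:
        return "failed"
    return "ok"
```

Injecting `run` and `check_output` keeps the sketch testable without sudo or du, which is the same seam the test plan patches.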
Verification:
python3 -m pytest l4d2web/tests/test_overlay_builders.py l4d2web/tests/test_job_worker.py -q
Commit: feat(l4d2-web): ScriptBuilder + BUILDERS registry update
Task 3: Delete global-overlay services + CLI command + their tests
Files:
- Delete: l4d2web/services/global_map_sources.py
- Delete: l4d2web/services/global_overlay_refresh.py
- Delete: l4d2web/services/global_map_cache.py
- Delete: l4d2web/services/global_overlays.py
- Modify: l4d2web/cli.py: remove refresh-global-overlays command (lines ~44–55). Drop any imports that go orphaned.
- Delete: l4d2web/tests/test_global_map_sources.py
- Delete: l4d2web/tests/test_global_overlay_models.py
- Delete: l4d2web/tests/test_global_overlay_builders.py
- Delete: l4d2web/tests/test_global_overlay_cli.py
- Delete: l4d2web/tests/test_global_overlay_refresh.py
- Delete: l4d2web/tests/test_global_overlays.py
- Delete: l4d2web/tests/test_global_map_cache.py
- Audit & fix: any other module that imports the deleted modules. Likely candidates: l4d2web/app.py (CLI registration), routes/overlay_routes.py, routes/page_routes.py. Resolve by deletion of the dead import / call site, not by stubbing.
- Modify: pyproject.toml: drop py7zr from dependencies (only used by the deleted globals subsystem).
Test plan:
- RED-first via grep: grep -RIn 'global_map_sources\|global_overlay_refresh\|global_map_cache\|global_overlays\|refresh_global_overlays\|GlobalMapOverlayBuilder' l4d2web/ deploy/ should return zero hits at the end of this task. Add this as tests/test_no_globals_references.py::test_no_globals_imports if you want it as a permanent regression guard, otherwise spot-check.
- Existing tests/test_cli.py (or whichever covers Flask CLI) loses any cases for refresh-global-overlays; add a test_refresh_global_overlays_command_removed that asserts the click command is not registered.
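If the grep is promoted to a permanent guard, one way to write it is a small directory scan, sketched below with the stdlib only. The function name find_globals_references and the skip parameter are hypothetical; the pattern is the same alternation the grep uses.

```python
import re
import tempfile
from pathlib import Path

FORBIDDEN = re.compile(
    r"global_map_sources|global_overlay_refresh|global_map_cache"
    r"|global_overlays|refresh_global_overlays|GlobalMapOverlayBuilder"
)


def find_globals_references(roots, skip=()):
    """Return (path, line_number, line) for every forbidden name under roots."""
    skipped = {Path(p).resolve() for p in skip}
    hits = []
    for root in roots:
        for path in sorted(Path(root).rglob("*")):
            if not path.is_file() or path.resolve() in skipped:
                continue
            try:
                text = path.read_text(encoding="utf-8")
            except UnicodeDecodeError:
                continue  # binary file, same effect as grep -I
            for number, line in enumerate(text.splitlines(), start=1):
                if FORBIDDEN.search(line):
                    hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    # Demo on a throwaway tree; in the repo the roots would be
    # l4d2web/ and deploy/.
    with tempfile.TemporaryDirectory() as tmp:
        bad = Path(tmp) / "cli.py"
        bad.write_text("from services import global_map_cache\n", encoding="utf-8")
        for hit in find_globals_references([tmp]):
            print(hit)
```

A guard like this matches its own source, and it would also match a test named test_refresh_global_overlays_command_removed, so both files probably need to go in `skip` (or live outside the scanned roots) for the zero-hits expectation to hold.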
Implementation:
- Delete files via git rm.
- In cli.py, remove the command function and its @app.cli.command(...) decorator. Drop any helper imports that become orphaned.
- Remove py7zr from pyproject.toml and re-lock if a lockfile is present.
Verification:
python3 -m pytest l4d2web/tests/ -q
grep -RIn 'global_map_sources\|global_overlay_refresh\|global_map_cache\|global_overlays\|refresh_global_overlays\|GlobalMapOverlayBuilder' l4d2web/ deploy/ || echo "clean"
Commit: refactor(l4d2-web): drop global-overlays subsystem in favor of script type
Task 4: Job worker — drop refresh_global_overlays from scheduler
Files:
- Modify: l4d2web/services/job_worker.py: remove "refresh_global_overlays" from GLOBAL_OPERATIONS; remove refresh_global_overlays_running field from SchedulerState and any references in can_start(); check whether blocked_servers_by_overlay was added solely for the globals subsystem and remove if so.
- Modify: l4d2web/tests/test_job_worker.py: drop refresh_global_overlays truth-table rows; add explicit build_overlay truth-table cases for script-type overlays (mechanically identical to workshop, but pinned by test).
Test plan:
- test_job_worker.py::test_global_operations_set: GLOBAL_OPERATIONS == {"install", "refresh_workshop_items"} (or whatever subset remains; pin it).
- test_job_worker.py::test_build_overlay_script_type_blocks_per_overlay: start build_overlay(overlay_id=7) for a script-type overlay; assert second build_overlay(overlay_id=7) cannot start; assert build_overlay(overlay_id=8) can.
- test_job_worker.py::test_build_overlay_blocks_server_init_on_blueprint_overlay: existing test, may need re-pinning if it referenced globals.
Implementation:
- Remove the field from the dataclass / TypedDict that backs SchedulerState.
- Remove any update sites that flipped the flag (the worker's enqueue / on-start / on-complete paths).
- The remaining mutex rules (install / refresh_workshop_items are global; build_overlay per-overlay; server ops block on overlays in their blueprint) are unchanged structurally.
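For orientation, the surviving mutex rules can be written as a small truth function. This is a simplified sketch, not the worker's actual code: the field names on this SchedulerState, the can_start signature, and the "start_server" operation name are all assumptions, and it only models builds blocking server ops (not the reverse direction).

```python
from dataclasses import dataclass, field

# Pinned per the test plan; adjust if a different subset remains.
GLOBAL_OPERATIONS = {"install", "refresh_workshop_items"}


@dataclass
class SchedulerState:
    """Hypothetical minimal state; the real SchedulerState may differ."""
    global_running: bool = False
    building_overlays: set = field(default_factory=set)


def can_start(state, operation, overlay_id=None, blueprint_overlay_ids=()):
    if state.global_running:
        return False  # a global operation excludes everything else
    if operation in GLOBAL_OPERATIONS:
        return not state.building_overlays
    if operation == "build_overlay":
        # Per-overlay mutex: the overlay type (workshop or script) is irrelevant.
        return overlay_id not in state.building_overlays
    # Server operations wait for builds of overlays in their blueprint.
    return not (set(blueprint_overlay_ids) & state.building_overlays)
```

Nothing here branches on overlay type, which is why the script-type cases are mechanically identical to workshop and only need pinning by test.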
Verification:
python3 -m pytest l4d2web/tests/test_job_worker.py -q
Commit: refactor(l4d2-web): drop refresh_global_overlays from scheduler
Task 5: Routes (script update / wipe / build)
Files:
- Modify: l4d2web/routes/overlay_routes.py: add three POST endpoints.
- Create: l4d2web/tests/test_script_overlay_routes.py.
Test plan (RED first):
- test_script_overlay_routes.py::test_create_script_overlay: POST /overlays with form {"name": "x", "type": "script"} as a regular user → 302 to detail; row exists with type='script', script='', last_build_status='', user_id=current_user.id, path=str(id).
- test_script_overlay_routes.py::test_admin_creates_system_wide_script_overlay: admin POST with system-wide flag → row has user_id=NULL.
- test_script_overlay_routes.py::test_update_script_body_enqueues_build: POST /overlays/{id}/script with {"script": "echo new"} → row.script updated; one new build_overlay job enqueued for the overlay; second immediate POST coalesces (no second job inserted while first is pending).
- test_script_overlay_routes.py::test_manual_rebuild: POST /overlays/{id}/build → enqueues build_overlay; coalesces.
- test_script_overlay_routes.py::test_wipe_runs_find_delete: POST /overlays/{id}/wipe → invokes ScriptBuilder.build (or the underlying helper) with the literal script find /overlay -mindepth 1 -delete. After success, row.last_build_status==''. Does not enqueue a build_overlay.
- test_script_overlay_routes.py::test_wipe_refuses_during_running_build: set scheduler state to build_overlay(overlay_id=7) running; POST /overlays/7/wipe → 409 (or whatever the existing pattern uses for scheduler conflicts), no sandbox invocation.
- test_script_overlay_routes.py::test_permissions_non_owner_denied: user A creates private script overlay; user B POSTs /overlays/{id}/script → 403.
- test_script_overlay_routes.py::test_permissions_admin_can_edit_any: admin POSTs /overlays/{id}/script for user A's row → 200.
Implementation:
- Mirror the existing _can_edit_overlay() permission helper.
- The /wipe endpoint can either (a) call ScriptBuilder directly with a synthetic Overlay-like object whose .script is the find command and whose .id is the real overlay id, or (b) factor a _run_sandbox(overlay_id, script_text, on_stdout, on_stderr, should_cancel) helper out of ScriptBuilder.build() and call it from both. (b) is cleaner; do (b).
- Wipe runs synchronously in the request thread (small, fast). It does NOT enqueue a job. Surface log output as flash messages or by streaming through the existing log infra; pick whichever matches the existing wipe-equivalent pattern (workshop overlays don't have a wipe; closest analog is the existing delete-overlay flow).
- The /script endpoint enqueues via the same enqueue_build_overlay(overlay_id) helper used by workshop overlays' add/remove flows. Coalescing is already implemented there.
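The decision order inside the /wipe handler (permission, then scheduler conflict, then sandbox call) is sketched below as plain functions, independent of Flask. Treat it as a sketch under assumptions: the User and Overlay dataclasses are stand-ins for the real models, run_sandbox is a two-argument simplification of _run_sandbox (the streaming callbacks are omitted), and the 302 / 500 answers are guesses; only 403 and 409 appear in the test plan.

```python
from dataclasses import dataclass
from typing import Callable, Optional

WIPE_SCRIPT = "find /overlay -mindepth 1 -delete"


@dataclass
class User:
    id: int
    is_admin: bool = False


@dataclass
class Overlay:
    id: int
    user_id: Optional[int]  # None marks a system-wide overlay
    last_build_status: str = ""


def can_edit_overlay(user: User, overlay: Overlay) -> bool:
    """Owner or admin may edit; stands in for the existing _can_edit_overlay()."""
    return user.is_admin or overlay.user_id == user.id


def wipe_overlay(user: User, overlay: Overlay, build_running: bool,
                 run_sandbox: Callable[[int, str], int]) -> int:
    """Return the HTTP status a /wipe request would be answered with."""
    if not can_edit_overlay(user, overlay):
        return 403
    if build_running:
        return 409  # never touch the directory while a build owns it
    if run_sandbox(overlay.id, WIPE_SCRIPT) != 0:
        return 500  # assumed failure handling
    overlay.last_build_status = ""
    return 302  # assumed: redirect back to the detail page
```

Because the wipe goes through the same sandbox entry point as a build, it inherits the namespace and cgroup limits without a second privileged code path.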
Verification:
python3 -m pytest l4d2web/tests/test_script_overlay_routes.py l4d2web/tests/test_overlay_routes.py -q
Commit: feat(l4d2-web): script overlay routes (script update / wipe / build)
Task 6: Templates (overlays.html + overlay_detail.html)
Files:
- Modify: l4d2web/templates/overlays.html: add script to the create-modal type radio (lines ~29–49).
- Modify: l4d2web/templates/overlay_detail.html: add a {% if overlay.type == 'script' %} block with textarea + Save / Rebuild / Wipe buttons + status badge; delete the global-source block (lines ~34–46).
- Modify: l4d2web/tests/test_pages.py: assert script-section renders for type=script, workshop-section renders for type=workshop, global-source-section is absent.
Test plan:
- test_pages.py::test_overlay_create_modal_offers_script_type: GET /overlays; HTML contains value="script" radio.
- test_pages.py::test_overlay_detail_script_section: create script overlay, GET /overlays/{id}; HTML contains <textarea name="script">, "Rebuild" button, "Wipe" button, status badge element.
- test_pages.py::test_overlay_detail_workshop_section_unchanged: existing workshop detail still has thumbnail grid, add-item form, etc.
- test_pages.py::test_overlay_detail_no_global_source_block: page HTML has no element from the deleted global-source block (check for an attribute or string unique to that block).
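Substring checks on rendered HTML are usually enough for these tests. If something sturdier is wanted, a small stdlib parser can pull out exactly the elements the script-section test cares about. A possible sketch follows; the class name ScriptSectionProbe and the assumption that Rebuild / Wipe are <button> elements are illustrative, not taken from the templates.

```python
from html.parser import HTMLParser


class ScriptSectionProbe(HTMLParser):
    """Collect the elements the script-overlay detail test looks for."""

    def __init__(self):
        super().__init__()
        self.has_script_textarea = False
        self.button_labels = []
        self._in_button = False

    def handle_starttag(self, tag, attrs):
        if tag == "textarea" and dict(attrs).get("name") == "script":
            self.has_script_textarea = True
        elif tag == "button":
            self._in_button = True
            self.button_labels.append("")

    def handle_endtag(self, tag):
        if tag == "button":
            self._in_button = False

    def handle_data(self, data):
        # Accumulate text only while inside a <button> element.
        if self._in_button:
            self.button_labels[-1] += data


def probe(html):
    """Return (textarea_present, set_of_button_labels) for a rendered page."""
    parser = ScriptSectionProbe()
    parser.feed(html)
    parser.close()
    labels = {label.strip() for label in parser.button_labels}
    return parser.has_script_textarea, labels
```

In test_pages.py this would be called on the response body, for example `present, labels = probe(response.get_data(as_text=True))`, assuming the usual Flask test client.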
Implementation:
- Detail-page wipe button uses a small confirm-modal pattern (copy from the existing delete-overlay confirm modal).
- Status badge: CSS classes for ok/warn/error already exist in static/; reuse them.
- No new JS deps. Plain <form method="post"> with HTMX hx-post for the script update if a streaming UX is desired (match existing patterns).
Verification:
python3 -m pytest l4d2web/tests/test_pages.py -q
Manual: start dev server (flask run), create a script overlay, paste echo "hi" > foo, click Save, watch log stream. Then click Wipe; confirm dir is empty. Then click Rebuild; confirm foo reappears.
Commit: feat(l4d2-web): script overlay UI
Task 7: Libexec sandbox helper + sudoers + deploy-artifacts test
Files:
- Create: deploy/files/usr/local/libexec/left4me/left4me-script-sandbox (bash, mode 0755 after deploy, owned root).
- Modify: deploy/files/etc/sudoers.d/left4me: append the rule.
- Modify: deploy/tests/test_deploy_artifacts.py: assert helper file present + sudoers contains the new line.
Test plan (RED first):
- test_deploy_artifacts.py::test_script_sandbox_helper_present: file exists, mode bits indicate 0755 (or whatever the test framework allows checking pre-deploy), shebang is #!/bin/bash.
- test_deploy_artifacts.py::test_sudoers_includes_script_sandbox_rule: sudoers file contains the exact line left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox.
- Optional integration test (skip on non-Linux dev): drive the helper as a subprocess with a synthesized fake /var/lib/left4me/overlays/1/ and a no-op script, assert bwrap invocation happens (use a mock systemd-run or LEFT4ME_SCRIPT_SANDBOX_DRY_RUN=1 env that prints the would-be invocation and exits 0). Mirrors the LEFT4ME_OVERLAY_PRINT_ONLY=1 pattern from the kernel-overlayfs helper test.
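The two artifact checks reduce to a few file reads. A minimal sketch, with hypothetical helper names (helper_problems, sudoers_has_rule) and checking only the owner-executable bit, since the full 0755 mode may not be observable before deploy:

```python
import stat
import tempfile
from pathlib import Path

SUDOERS_RULE = (
    "left4me ALL=(root) NOPASSWD: "
    "/usr/local/libexec/left4me/left4me-script-sandbox"
)


def helper_problems(helper: Path) -> list:
    """Return human-readable problems with the helper file (empty if fine)."""
    if not helper.is_file():
        return ["helper missing"]
    problems = []
    lines = helper.read_text(encoding="utf-8").splitlines()
    if lines[:1] != ["#!/bin/bash"]:
        problems.append("shebang is not #!/bin/bash")
    if not helper.stat().st_mode & stat.S_IXUSR:
        problems.append("helper is not executable")
    return problems


def sudoers_has_rule(sudoers: Path) -> bool:
    """True only for an exact line match, so trailing whitespace fails."""
    return SUDOERS_RULE in sudoers.read_text(encoding="utf-8").splitlines()


if __name__ == "__main__":
    # Demo against a throwaway directory instead of deploy/files/.
    with tempfile.TemporaryDirectory() as tmp:
        helper = Path(tmp) / "left4me-script-sandbox"
        helper.write_text("#!/bin/bash\nexit 0\n", encoding="utf-8")
        helper.chmod(0o755)
        sudoers = Path(tmp) / "left4me"
        sudoers.write_text(SUDOERS_RULE + "\n", encoding="utf-8")
        print(helper_problems(helper), sudoers_has_rule(sudoers))
```

Matching whole lines rather than substrings also enforces the "no trailing whitespace" formatting rule for the sudoers fragment.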
Implementation:
- Helper script verbatim from the spec §Sandbox.
- Sudoers fragment: append (don't replace existing rules). The existing fragment has rules for left4me-overlay, left4me-systemctl, left4me-journalctl; match the same formatting (one rule per line, no trailing whitespace).
Verification:
python3 -m pytest deploy/tests/test_deploy_artifacts.py -q
bash -n deploy/files/usr/local/libexec/left4me/left4me-script-sandbox
Commit: feat(deploy): left4me-script-sandbox helper + sudoers fragment
Task 8: Deploy script — provision l4d2-sandbox + bubblewrap; drop globals timer
Files:
- Modify: deploy/deploy-test-server.sh: add useradd --system ... l4d2-sandbox, add apt-get install -y bubblewrap, ensure helper installation step picks up left4me-script-sandbox (likely automatic if it's a glob in deploy/files/usr/local/libexec/left4me/*); drop the mkdir global_overlay_cache line if present.
- Delete: deploy/files/usr/local/lib/systemd/system/left4me-refresh-global-overlays.timer
- Delete: deploy/files/usr/local/lib/systemd/system/left4me-refresh-global-overlays.service
- Modify: deploy/tests/test_deploy_artifacts.py: assert the two unit files are absent; assert useradd l4d2-sandbox and apt-get install ... bubblewrap lines are present in the deploy script.
Test plan:
- test_deploy_artifacts.py::test_globals_refresh_units_removed: files do not exist under deploy/files/usr/local/lib/systemd/system/.
- test_deploy_artifacts.py::test_deploy_script_provisions_sandbox_user: grep the deploy script for the useradd line.
- test_deploy_artifacts.py::test_deploy_script_installs_bubblewrap: grep for bubblewrap in apt invocations.
Implementation:
- useradd line uses --system --no-create-home --shell /usr/sbin/nologin. Idempotency: wrap with id l4d2-sandbox &>/dev/null || useradd ....
- apt-get install: append bubblewrap to whatever package list the script already maintains.
- Globals timer/service deletions: git rm.
Verification:
python3 -m pytest deploy/tests/ -q
shellcheck deploy/deploy-test-server.sh deploy/files/usr/local/libexec/left4me/left4me-script-sandbox
Commit: chore(deploy): provision l4d2-sandbox + bubblewrap; drop globals refresh timer
Task 9: Full pytest run + drift fixes
Files: as needed across the repo.
Test plan: run the full test suite for both packages; chase down any drift caused by removed model classes, dropped imports, or template changes.
python3 -m pytest l4d2web/tests/ -q
python3 -m pytest l4d2host/tests/ -q
python3 -m pytest deploy/tests/ -q
Implementation: fix what breaks. Common drift sources to expect:
- Tests that imported from deleted modules.
- Tests that asserted exact BUILDERS keyset (good: they should have been updated in Task 2).
- Tests that built fixtures with type='l4d2center_maps' or type='cedapug_maps'. Those tests likely belong to the deleted set or need conversion to type='script'.
- Template snapshot tests (if any) that captured the deleted global-source block.
Verification: all three suites green.
Commit: chore(l4d2-web): test suite drift fixes after script-overlays migration (only if drift fixes needed; skip if Tasks 1–8 left the suite green)
End-to-end deployment verification (manual, on test host)
After all tasks committed:
- Reset deploy: run deploy/deploy-test-server.sh from clean state. Confirm bubblewrap installed (dpkg -l bubblewrap), l4d2-sandbox user exists (id l4d2-sandbox), /usr/local/libexec/left4me/left4me-script-sandbox is mode 0755 and root-owned, sudo -ln as left4me shows the new rule.
- Sandbox smoke: as left4me, write /tmp/echo.sh containing echo $(whoami) > /overlay/sentinel. mkdir -p /var/lib/left4me/overlays/1. sudo /usr/local/libexec/left4me/left4me-script-sandbox 1 /tmp/echo.sh. Confirm /var/lib/left4me/overlays/1/sentinel contains l4d2-sandbox and is owned by l4d2-sandbox. Confirm /etc/passwd, /var/lib/left4me/l4d2web.db, and /home are not visible inside the sandbox by running probe scripts.
- Resource limits:
  - dd if=/dev/zero of=/overlay/big bs=1M count=25000 → succeeds inside sandbox; ScriptBuilder._enforce_disk_budget flags the build failed; last_build_status='failed'.
  - sleep 7200 → killed at 1 h by RuntimeMaxSec=3600.
  - Memory hog (python3 -c "x=' '*(5*1024**3)") → OOM at 4 GB.
- App-level happy path: as a non-admin user, create a script overlay via the UI, paste an old competitive_rework-style script, Save → build runs, succeeds, addons appear in overlays/{id}/left4dead2/. Stack onto a server blueprint, start the server, verify content mounts via the L4D2 admin console (map workshop/...).
- Wipe: click Wipe → dir empty (find -delete output in log). Click Rebuild → repopulates. last_build_status cycles: '' → 'ok'.
- Scheduler: start a server using the script overlay; in another browser tab attempt to Rebuild → 409 / scheduler-blocked. Stop server; rebuild succeeds.
- Audit log: journalctl --since "5 min ago" | grep run- shows transient scopes per build with cgroup memory accounting visible.
These are not required for any single commit but should pass before declaring the work done.
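For the resource-limit probes, the arithmetic behind the expected outcomes is short enough to spell out. The constant names are illustrative; the figures come from the dd command, the memory-hog one-liner, the 20 * 1024**3 disk cap, and MemoryMax=4G (systemd parses the G suffix in base 1024).

```python
MIB = 1024**2
GIB = 1024**3

DISK_CAP_BYTES = 20 * GIB    # cap applied by the post-build du check
MEMORY_MAX_BYTES = 4 * GIB   # MemoryMax=4G

dd_bytes = 25000 * MIB       # dd bs=1M count=25000
hog_bytes = 5 * GIB          # ' ' * (5*1024**3): one byte per ASCII character

print("dd writes", dd_bytes, "bytes; over disk cap:", dd_bytes > DISK_CAP_BYTES)
print("hog needs", hog_bytes, "bytes; over MemoryMax:", hog_bytes > MEMORY_MAX_BYTES)
```

Both probes overshoot their limit by a comfortable margin (about 4.4 GiB on disk, 1 GiB in memory), so a pass or fail should not depend on filesystem overhead or interpreter baseline usage.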