docs(specs): script overlay type — design + implementation plan
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Parent 9985ecc56c, commit 78ead0b41d: 2 changed files with 673 additions and 0 deletions.

`docs/superpowers/plans/2026-05-08-l4d2-script-overlays.md` (new file, 350 lines)
# L4D2 Script Overlays Implementation Plan

> **Approval status:** User-approved 2026-05-08. Implementation proceeds.

**Goal:** Implement the `script` overlay type per `docs/superpowers/specs/2026-05-08-l4d2-script-overlays-design.md`. This involves three additions and one removal:

- Add an `Overlay.script` TEXT column and an `Overlay.last_build_status` enum-string column.
- Add a `ScriptBuilder` that runs user bash inside a `bubblewrap` + `systemd-run --scope` sandbox via a new privileged helper, `left4me-script-sandbox`.
- Add routes plus UI for editing, wiping, and rebuilding.
- Delete the entire managed-globals subsystem (`l4d2center_maps`, `cedapug_maps`), together with its daily-refresh timer and CLI command.

**Architecture:** The web app continues to enqueue `build_overlay` jobs for any overlay row, and the job worker dispatches via `BUILDERS[overlay.type].build(...)`. After this change, `BUILDERS = {"workshop": WorkshopBuilder(), "script": ScriptBuilder()}`.

The new `ScriptBuilder` writes `overlay.script` to a tmpfile and execs `sudo -n /usr/local/libexec/left4me/left4me-script-sandbox <id> <tmpfile>`. That helper in turn execs `systemd-run --scope --collect ... -- bwrap [namespace flags] /bin/bash /script.sh`. stdout and stderr stream through the existing `run_with_streamed_output` helper into the existing job-log SSE plumbing.

The job-completion path writes `Overlay.last_build_status` from the build outcome. The kernel-overlayfs mount layer (`KernelOverlayFSMounter`) is unchanged.

---
## Locked Decisions

See `docs/superpowers/specs/2026-05-08-l4d2-script-overlays-design.md` for the design rationale. Implementation-relevant summary:

- Final overlay type list: `workshop` (unchanged) + `script` (new). Drop `l4d2center_maps` and `cedapug_maps`.
- New columns on `overlays`: `script TEXT NOT NULL DEFAULT ''`, `last_build_status VARCHAR(16) NOT NULL DEFAULT ''`.
- Drop tables (in FK order): `global_overlay_item_files`, `global_overlay_items`, `global_overlay_sources`.
- `ScriptBuilder` lives in `l4d2web/services/overlay_builders.py` and uses the existing `run_with_streamed_output`.
- Privileged helper `left4me-script-sandbox` (bash, mode 0755, owned by root) runs `systemd-run --scope --collect -p MemoryMax=4G -p MemorySwapMax=0 -p TasksMax=512 -p CPUQuota=200% -p RuntimeMaxSec=3600 -- bwrap …`. Limits: 1 h walltime, 4 GB RAM, and a 20 GB post-build `du` cap.
- New system user `l4d2-sandbox` (`/usr/sbin/nologin`, no home). New apt dependency: `bubblewrap`.
- Sudoers rule with no argument restriction: `left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox`.
- Daily refresh subsystem deleted: `left4me-refresh-global-overlays.{timer,service}` and the `flask refresh-global-overlays` CLI command are removed, with no replacement.
- Wipe is the same sandbox helper invoked with the literal script `find /overlay -mindepth 1 -delete`.
- The `auto_refresh` column is NOT added in this iteration.
- The test deploy DB is wiped on rollout; the migration still includes `DELETE FROM overlays WHERE type IN ('l4d2center_maps', 'cedapug_maps')` for safety.
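The helper's `systemd-run` prefix is a direct function of the limits above. Below is a minimal Python sketch that builds the prefix, for example for an argv-shape assertion or a dry-run printer.

- `systemd_run_prefix` and `SANDBOX_LIMITS` are hypothetical names, not existing code.
- The `bwrap` half of the command is deliberately omitted, because its namespace flags live in the spec's §Sandbox helper.

```python
import shlex

# Limits copied from the Locked Decisions; each becomes a systemd-run
# `-p Property=Value` resource-control setting on the transient scope.
SANDBOX_LIMITS = {
    "MemoryMax": "4G",
    "MemorySwapMax": "0",
    "TasksMax": "512",
    "CPUQuota": "200%",
    "RuntimeMaxSec": "3600",
}


def systemd_run_prefix(limits: dict = SANDBOX_LIMITS) -> list:
    """Build `systemd-run --scope --collect -p K=V ... --`.

    The bwrap argv that follows `--` is intentionally not reproduced here;
    its namespace flags are defined by the helper script itself.
    """
    argv = ["systemd-run", "--scope", "--collect"]
    for key, value in limits.items():
        argv += ["-p", f"{key}={value}"]
    argv.append("--")
    return argv


if __name__ == "__main__":
    # Shell-quoted form, as a dry-run mode might print it.
    print(shlex.join(systemd_run_prefix()))
```

Keeping the limits in one mapping means a deploy-artifacts test can compare the helper's flags against a single source of truth.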

---

## Current Gap

- `l4d2web/models.py`: `Overlay` has no `script` or `last_build_status` columns, and the three globals tables are present.
- `l4d2web/services/overlay_builders.py`: `BUILDERS = {"workshop": WorkshopBuilder(), "l4d2center_maps": GlobalMapOverlayBuilder(), "cedapug_maps": GlobalMapOverlayBuilder()}`. No `ScriptBuilder`.
- `l4d2web/services/{global_map_sources,global_overlay_refresh,global_map_cache,global_overlays}.py` exist and are referenced by routes and the CLI.
- `l4d2web/services/job_worker.py` carries `refresh_global_overlays_running` plumbing.
- `l4d2web/cli.py` defines `refresh-global-overlays`.
- `l4d2web/routes/overlay_routes.py` has no `/script`, `/wipe`, or `/build` endpoints for non-workshop types.
- `l4d2web/templates/overlays.html`: the create modal's type radio offers only `workshop`.
- `l4d2web/templates/overlay_detail.html` has a global-source block (~lines 34–46) that must be removed.
- `deploy/files/usr/local/lib/systemd/system/left4me-refresh-global-overlays.{timer,service}` exist.
- `deploy/deploy-test-server.sh` provisions `global_overlay_cache/` but neither provisions `l4d2-sandbox` nor installs `bubblewrap`.
- Seven `tests/test_global_*.py` files exist and reference code that this work removes.

---
## Task 1: Schema migration (alembic 0005)

**Files:**

- Create: `l4d2web/alembic/versions/0005_script_overlays.py` (revises `0004_drop_legacy_external_overlay_type`).
- Modify: `l4d2web/models.py`: `Overlay` gains the `script` and `last_build_status` columns; remove the `GlobalOverlaySource`, `GlobalOverlayItem`, and `GlobalOverlayItemFile` model classes.
- Modify: `l4d2web/tests/test_overlay_models.py` (or whichever existing test asserts the `Overlay` schema; create one if absent): assert that the new columns are present.

Test plan (RED first):

1. `tests/test_alembic_migrations.py::test_upgrade_0005_adds_script_columns`: apply migrations to a fresh in-memory SQLite database. Assert that:
   - the `script` and `last_build_status` columns are present on `overlays`;
   - no `global_overlay_*` tables remain;
   - the upgrade includes the `DELETE FROM overlays WHERE type IN (...)` wipe of old rows.
2. `tests/test_alembic_migrations.py::test_downgrade_0005_restores_globals`: write this only if the project's migration policy supports downgrade; otherwise skip it with `pytest.skip`. The kernel-overlayfs migration is one-way; follow that precedent.
3. `tests/test_overlay_models.py::test_overlay_has_script_columns`: an `Overlay(...)` row, once flushed, reads back `script=''` and `last_build_status=''`. SQLAlchemy column defaults are applied at INSERT time, not when the instance is constructed.

Implementation:

- The migration calls `op.drop_table('global_overlay_item_files')` and so on in correct FK order (children before parents). It also calls `op.add_column('overlays', sa.Column('script', sa.Text(), nullable=False, server_default=''))`, plus the equivalent for `last_build_status` (`sa.String(16)`).
- `DELETE FROM overlays WHERE type IN ('l4d2center_maps','cedapug_maps')` runs *before* the column additions. This keeps the operation simple, because the deleted rows never need values for the new columns.
- `models.py`: delete the three globals model classes outright; add the two new columns to `Overlay` with explicit defaults.
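The statement ordering can be exercised with the stdlib `sqlite3` module alone. This is an illustration, not the Alembic migration, which uses `op.drop_table` / `op.add_column`.

The sketch makes these assumptions:

- It uses a hypothetical minimal pre-0005 schema in which `global_overlay_sources` references `overlays`.
- It turns on SQLite FK enforcement.
- It runs the drops children-first, then the purge, then the column additions.

Dropping the globals tables before the `DELETE` means that a foreign key from them back to `overlays` cannot block the purge, if the real schema has one. The purge still precedes the column additions, as the plan requires.

```python
import sqlite3

# Hypothetical minimal pre-0005 schema: only enough columns to show the
# statement order. The real tables have more columns.
PRE_0005_SCHEMA = """
CREATE TABLE overlays (id INTEGER PRIMARY KEY, name TEXT NOT NULL, type TEXT NOT NULL);
CREATE TABLE global_overlay_sources (
    id INTEGER PRIMARY KEY, overlay_id INTEGER REFERENCES overlays(id));
CREATE TABLE global_overlay_items (
    id INTEGER PRIMARY KEY, source_id INTEGER REFERENCES global_overlay_sources(id));
CREATE TABLE global_overlay_item_files (
    id INTEGER PRIMARY KEY, item_id INTEGER REFERENCES global_overlay_items(id));
INSERT INTO overlays VALUES (1, 'ws', 'workshop'), (2, 'maps', 'l4d2center_maps');
INSERT INTO global_overlay_sources VALUES (1, 2);
INSERT INTO global_overlay_items VALUES (1, 1);
INSERT INTO global_overlay_item_files VALUES (1, 1);
"""

UPGRADE_0005 = [
    # Drop the globals tables children-first, so no remaining FK points at
    # overlays (or at a table still to be dropped) when the purge runs.
    "DROP TABLE global_overlay_item_files",
    "DROP TABLE global_overlay_items",
    "DROP TABLE global_overlay_sources",
    # Purge rows of the removed types before adding columns.
    "DELETE FROM overlays WHERE type IN ('l4d2center_maps', 'cedapug_maps')",
    # NOT NULL columns need a default so the surviving rows get a value.
    "ALTER TABLE overlays ADD COLUMN script TEXT NOT NULL DEFAULT ''",
    "ALTER TABLE overlays ADD COLUMN last_build_status VARCHAR(16) NOT NULL DEFAULT ''",
]


def make_pre_0005_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the FKs
    conn.executescript(PRE_0005_SCHEMA)
    return conn


def upgrade(conn: sqlite3.Connection) -> None:
    for stmt in UPGRADE_0005:
        conn.execute(stmt)
    conn.commit()


def table_names(conn: sqlite3.Connection) -> set:
    return {r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}


def column_names(conn: sqlite3.Connection, table: str) -> list:
    return [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]
```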

**Verification:**

```
python3 -m pytest l4d2web/tests/test_alembic_migrations.py l4d2web/tests/test_overlay_models.py -q
```

**Commit:** `feat(l4d2-web): script overlay schema — add overlay.script + last_build_status, drop globals tables`

---
## Task 2: ScriptBuilder + BUILDERS registry update

**Files:**

- Modify: `l4d2web/services/overlay_builders.py`: add `ScriptBuilder`, remove `GlobalMapOverlayBuilder`, and update the `BUILDERS` dict.
- Rewrite: `l4d2web/tests/test_overlay_builders.py`: drop the globals-builder tests and add `ScriptBuilder` tests.

Test plan (RED first):

1. `test_overlay_builders.py::test_builders_registry`: `set(BUILDERS) == {"workshop", "script"}`. Assert that `"l4d2center_maps"`, `"cedapug_maps"`, and `"external"` are absent.
2. `test_overlay_builders.py::test_script_builder_invokes_helper`: patch `run_with_streamed_output` to capture argv, then build an `Overlay(id=42, type='script', script='echo hi')`. Assert that:
   - argv has the shape `["sudo", "-n", "/usr/local/libexec/left4me/left4me-script-sandbox", "42", <script_path>]`;
   - at invocation time, the file at `script_path` exists with content `"echo hi"`;
   - the tmpfile is unlinked after the build.
3. `test_overlay_builders.py::test_script_builder_disk_cap`: fake `subprocess.check_output` for `du` so that it reports `25000000000` bytes. The build raises `BuildError("disk-cap-exceeded")`, and `on_stderr` is called with the cap message.
4. `test_overlay_builders.py::test_script_builder_streams_output`: a fake `run_with_streamed_output` invokes both `on_stdout("hello\n")` and `on_stderr("warn\n")`. The stdout and stderr callbacks each capture their line.
5. `test_overlay_builders.py::test_script_builder_cancel`: `should_cancel` returns True after the first stdout line; assert that `run_with_streamed_output` propagated cancellation. Cancellation itself is the existing helper's contract. This test only ensures that `should_cancel` is passed through and that the disk-budget check is skipped on cancel.
6. `test_overlay_builders.py::test_workshop_builder_unchanged`: smoke test that `WorkshopBuilder` still exists and is invokable. This guards against accidental removal during the refactor.

Implementation:

- Import `os`, `subprocess`, and `tempfile` at the top of `overlay_builders.py` if they are not already present.
- Implement `ScriptBuilder` exactly as in the spec (verbatim copy from the design doc, §Build Lifecycle).
- Define a small `BuildError` exception class if one doesn't already exist locally. If `WorkshopBuilder` already raises a similar type, reuse that one.
- `_enforce_disk_budget` calls `subprocess.check_output(["du", "-sb", str(overlay_path(overlay_id))])`. The module's existing `overlay_path` helper already returns the absolute `Path`. Parse the first whitespace-delimited integer of the output; the cap is `20 * 1024**3` bytes.
- Job-completion path: locate the existing code that handles `build_overlay` job success and failure (likely in `services/job_worker.py` or a related orchestration module).
  - Add a single column write there: on success, `last_build_status='ok'`; on `BuildError`, a non-zero exit, or cancel, `last_build_status='failed'`.
  - Add `tests/test_job_worker.py::test_build_overlay_writes_last_build_status`, covering both branches.
- Remove the `GlobalMapOverlayBuilder` class and any helper functions it owns that are not used elsewhere.
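The spec's verbatim `ScriptBuilder` (§Build Lifecycle) is not reproduced in this plan. The sketch below only illustrates the behaviour the tests above pin down: the argv shape, tmpfile cleanup, skipping the disk check on cancel, and the `du` cap.

Its assumptions:

- `run_with_streamed_output` can be injected as a callable that accepts `on_stdout` / `on_stderr` / `should_cancel` keywords and returns the exit code. That signature is a guess.
- The constructor parameters, `parse_du_bytes`, and the error strings other than `"disk-cap-exceeded"` are hypothetical.

```python
import os
import subprocess
import tempfile
from pathlib import Path
from typing import Callable

HELPER = "/usr/local/libexec/left4me/left4me-script-sandbox"
DISK_CAP_BYTES = 20 * 1024**3  # 20 GiB post-build cap


class BuildError(Exception):
    """Raised when a build fails, is cancelled, or exceeds the disk cap."""


def parse_du_bytes(du_output: bytes) -> int:
    # `du -sb DIR` prints "<bytes>\t<DIR>\n"; the first field is the size.
    return int(du_output.split()[0])


class ScriptBuilder:
    def __init__(
        self,
        run: Callable[..., int],
        overlay_path: Callable[[int], Path],
        check_output: Callable[..., bytes] = subprocess.check_output,
    ) -> None:
        # `run` stands in for run_with_streamed_output (assumed to return the
        # exit code); injecting it and check_output keeps the class testable.
        self.run = run
        self.overlay_path = overlay_path
        self.check_output = check_output

    def build(self, overlay, on_stdout, on_stderr, should_cancel) -> None:
        fd, script_path = tempfile.mkstemp(prefix="overlay-script-", suffix=".sh")
        try:
            with os.fdopen(fd, "w") as fh:
                fh.write(overlay.script)
            argv = ["sudo", "-n", HELPER, str(overlay.id), script_path]
            rc = self.run(argv, on_stdout=on_stdout, on_stderr=on_stderr,
                          should_cancel=should_cancel)
        finally:
            os.unlink(script_path)  # the tmpfile never outlives the build
        if should_cancel():
            raise BuildError("cancelled")  # skip the disk check on cancel
        if rc != 0:
            raise BuildError(f"script exited with status {rc}")
        self._enforce_disk_budget(overlay.id, on_stderr)

    def _enforce_disk_budget(self, overlay_id: int, on_stderr) -> None:
        out = self.check_output(["du", "-sb", str(self.overlay_path(overlay_id))])
        used = parse_du_bytes(out)
        if used > DISK_CAP_BYTES:
            on_stderr(f"overlay uses {used} bytes; cap is {DISK_CAP_BYTES} bytes\n")
            raise BuildError("disk-cap-exceeded")
```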

**Verification:**

```
python3 -m pytest l4d2web/tests/test_overlay_builders.py l4d2web/tests/test_job_worker.py -q
```

**Commit:** `feat(l4d2-web): ScriptBuilder + BUILDERS registry update`

---
## Task 3: Delete global-overlay services + CLI command + their tests

**Files:**

- Delete: `l4d2web/services/global_map_sources.py`
- Delete: `l4d2web/services/global_overlay_refresh.py`
- Delete: `l4d2web/services/global_map_cache.py`
- Delete: `l4d2web/services/global_overlays.py`
- Modify: `l4d2web/cli.py`: remove the `refresh-global-overlays` command (lines ~44–55) and drop any imports that become orphaned.
- Delete: `l4d2web/tests/test_global_map_sources.py`
- Delete: `l4d2web/tests/test_global_overlay_models.py`
- Delete: `l4d2web/tests/test_global_overlay_builders.py`
- Delete: `l4d2web/tests/test_global_overlay_cli.py`
- Delete: `l4d2web/tests/test_global_overlay_refresh.py`
- Delete: `l4d2web/tests/test_global_overlays.py`
- Delete: `l4d2web/tests/test_global_map_cache.py`
- Audit & fix: any other module that imports the deleted modules. Likely candidates are `l4d2web/app.py` (CLI registration), `routes/overlay_routes.py`, and `routes/page_routes.py`. Resolve each by deleting the dead import or call site, not by stubbing.
- Modify: `pyproject.toml`: drop `py7zr` from the dependencies (it is used only by the deleted globals subsystem).

Test plan:

1. RED first, via grep: by the end of this task, the following command should return zero hits:

   `grep -RIn --exclude-dir=tests 'global_map_sources\|global_overlay_refresh\|global_map_cache\|global_overlays\|refresh_global_overlays\|GlobalMapOverlayBuilder' l4d2web/ deploy/`

   - Test directories are excluded because tests that assert the removal (for example `test_refresh_global_overlays_command_removed` in item 2) necessarily contain the removed names. Stale imports in tests surface as collection errors in the pytest run instead.
   - For a permanent regression guard, add the check as `tests/test_no_globals_references.py::test_no_globals_imports` with the same exclusion; otherwise, spot-check.
2. The existing `tests/test_cli.py` (or whichever file covers the Flask CLI) loses any cases for `refresh-global-overlays`. Add a `test_refresh_global_overlays_command_removed` that asserts the click command is not registered.
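If the grep is kept as a permanent test, a pure-Python version avoids shelling out from pytest. The sketch below mirrors the grep pattern and skips any `tests` or `__pycache__` directory.

- The helper name `find_globals_references` is hypothetical.
- It assumes the suite runs from the repository root.
- It only approximates `grep -I`: it skips files that fail UTF-8 decoding rather than detecting binaries.

```python
import re
from pathlib import Path

# Same alternatives as the grep pattern in the test plan.
FORBIDDEN = re.compile(
    r"global_map_sources|global_overlay_refresh|global_map_cache|global_overlays"
    r"|refresh_global_overlays|GlobalMapOverlayBuilder"
)
SKIP_DIRS = {"tests", "__pycache__"}


def find_globals_references(roots: list) -> list:
    """Return "path:lineno: line" hits under roots, skipping SKIP_DIRS."""
    hits = []
    for root in roots:
        for path in sorted(Path(root).rglob("*")):
            if not path.is_file() or SKIP_DIRS & set(path.relative_to(root).parts):
                continue
            try:
                text = path.read_text(encoding="utf-8")
            except UnicodeDecodeError:
                continue  # rough stand-in for grep -I: skip non-text files
            for lineno, line in enumerate(text.splitlines(), start=1):
                if FORBIDDEN.search(line):
                    hits.append(f"{path}:{lineno}: {line.strip()}")
    return hits


def test_no_globals_imports():
    assert find_globals_references([Path("l4d2web"), Path("deploy")]) == []
```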

Implementation:

- Delete the files via `git rm`.
- In `cli.py`, remove the command function and its `@app.cli.command(...)` decorator, and drop any helper imports that become orphaned.
- Remove `py7zr` from `pyproject.toml` and re-lock if a lockfile is present.

**Verification:**

```
python3 -m pytest l4d2web/tests/ -q
grep -RIn --exclude-dir=tests 'global_map_sources\|global_overlay_refresh\|global_map_cache\|global_overlays\|refresh_global_overlays\|GlobalMapOverlayBuilder' l4d2web/ deploy/ || echo "clean"
```

**Commit:** `refactor(l4d2-web): drop global-overlays subsystem in favor of script type`

---
## Task 4: Drop refresh_global_overlays from the job-worker scheduler

**Files:**

- Modify: `l4d2web/services/job_worker.py`:
  - Remove `"refresh_global_overlays"` from `GLOBAL_OPERATIONS`.
  - Remove the `refresh_global_overlays_running` field from `SchedulerState`, along with any references to it in `can_start()`.
  - Check whether `blocked_servers_by_overlay` was added solely for the globals subsystem, and remove it if so.
- Modify: `l4d2web/tests/test_job_worker.py`: drop the `refresh_global_overlays` truth-table rows. Add explicit `build_overlay` truth-table cases for `script`-type overlays; these are mechanically identical to workshop, but pinned by test.

Test plan:

1. `test_job_worker.py::test_global_operations_set`: `GLOBAL_OPERATIONS == {"install", "refresh_workshop_items"}` (or whatever subset remains; pin it).
2. `test_job_worker.py::test_build_overlay_script_type_blocks_per_overlay`: start `build_overlay(overlay_id=7)` for a `script`-type overlay. Assert that a second `build_overlay(overlay_id=7)` cannot start, and that `build_overlay(overlay_id=8)` can.
3. `test_job_worker.py::test_build_overlay_blocks_server_init_on_blueprint_overlay`: an existing test, which may need re-pinning if it referenced globals.

Implementation:

- Remove the field from the dataclass or TypedDict that backs `SchedulerState`.
- Remove any update sites that flipped the flag (the worker's enqueue, on-start, and on-complete paths).
- The remaining mutex rules are structurally unchanged:
  - `install` and `refresh_workshop_items` are global;
  - `build_overlay` is per-overlay;
  - server ops block on the overlays in their blueprint.
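The remaining mutex rules fit in a small predicate. The sketch below is only a model of those rules for reasoning about the truth table; it is not the real `job_worker.py`.

Assumptions, all hypothetical:

- The `SchedulerState` fields shown here.
- The `server_init` operation name.
- The rule that a global operation runs alone.

```python
from __future__ import annotations

from dataclasses import dataclass, field

GLOBAL_OPERATIONS = {"install", "refresh_workshop_items"}


@dataclass
class SchedulerState:
    # Hypothetical fields; the real SchedulerState may track more.
    global_op_running: bool = False
    building_overlays: set[int] = field(default_factory=set)
    servers_in_op: set[int] = field(default_factory=set)


def can_start(
    state: SchedulerState,
    operation: str,
    *,
    overlay_id: int | None = None,
    server_id: int | None = None,
    blueprint_overlay_ids: frozenset[int] = frozenset(),
) -> bool:
    if state.global_op_running:
        return False  # assumption: a global operation runs alone
    if operation in GLOBAL_OPERATIONS:
        return not state.building_overlays and not state.servers_in_op
    if operation == "build_overlay":
        return overlay_id not in state.building_overlays  # per-overlay mutex
    # Anything else is treated as a server op on `server_id`: one op per server,
    # and blocked while any overlay in the server's blueprint is building.
    if server_id in state.servers_in_op:
        return False
    return not (blueprint_overlay_ids & state.building_overlays)
```

Script overlays need no special case here, which is exactly what the new truth-table rows pin down.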

**Verification:**

```
python3 -m pytest l4d2web/tests/test_job_worker.py -q
```

**Commit:** `refactor(l4d2-web): drop refresh_global_overlays from scheduler`

---
## Task 5: Routes (script update / wipe / build)

**Files:**

- Modify: `l4d2web/routes/overlay_routes.py`: add three POST endpoints.
- Create: `l4d2web/tests/test_script_overlay_routes.py`.

Test plan (RED first):

1. `test_script_overlay_routes.py::test_create_script_overlay`: POST `/overlays` with form `{"name": "x", "type": "script"}` as a regular user → 302 to the detail page. The row exists with `type='script'`, `script=''`, `last_build_status=''`, `user_id=current_user.id`, and `path=str(id)`.
2. `test_script_overlay_routes.py::test_admin_creates_system_wide_script_overlay`: an admin POST with the system-wide flag → the row has `user_id=NULL`.
3. `test_script_overlay_routes.py::test_update_script_body_enqueues_build`: POST `/overlays/{id}/script` with `{"script": "echo new"}` → `row.script` is updated, and one new `build_overlay` job is enqueued for the overlay. A second immediate POST coalesces: no second job is inserted while the first is pending.
4. `test_script_overlay_routes.py::test_manual_rebuild`: POST `/overlays/{id}/build` → enqueues `build_overlay`; coalesces.
5. `test_script_overlay_routes.py::test_wipe_runs_find_delete`: POST `/overlays/{id}/wipe` → invokes `ScriptBuilder.build` (or the underlying helper) with the literal script `find /overlay -mindepth 1 -delete`. After success, `row.last_build_status == ''`. No `build_overlay` is enqueued.
6. `test_script_overlay_routes.py::test_wipe_refuses_during_running_build`: set the scheduler state to `build_overlay(overlay_id=7)` running, then POST `/overlays/7/wipe` → 409 (or whatever the existing pattern uses for scheduler conflicts), with no sandbox invocation.
7. `test_script_overlay_routes.py::test_permissions_non_owner_denied`: user A creates a private script overlay; user B POSTs `/overlays/{id}/script` → 403.
8. `test_script_overlay_routes.py::test_permissions_admin_can_edit_any`: an admin POSTs `/overlays/{id}/script` for user A's row → 200.

Implementation:

- Mirror the existing `_can_edit_overlay()` permission helper.
- The `/wipe` endpoint has two options:
  - (a) call `ScriptBuilder` directly with a synthetic `Overlay`-like object whose `.script` is the find command and whose `.id` is the real overlay id;
  - (b) factor a `_run_sandbox(overlay_id, script_text, on_stdout, on_stderr, should_cancel)` helper out of `ScriptBuilder.build()` and call it from both.

  (b) is cleaner; do (b).
- Wipe runs **synchronously** in the request thread (it is small and fast) and does NOT enqueue a job.
  - Surface its log output either as flash messages or by streaming it through the existing log infrastructure, whichever matches the closest existing pattern.
  - Workshop overlays have no wipe; the nearest analog is the existing delete-overlay flow.
- The `/script` endpoint enqueues via the same `enqueue_build_overlay(overlay_id)` helper that the workshop overlays' add/remove flows use. Coalescing is already implemented there.
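The wipe endpoint's decision logic is small enough to isolate from Flask. The sketch below shows the order of checks implied by tests 5 and 6:

1. Refuse while a build for the overlay is running.
2. Otherwise, run the literal wipe script.
3. On success, reset the status, and do not enqueue a rebuild.

What is hypothetical here:

- `OverlayRow` is a stand-in for the model.
- `run_sandbox(overlay_id, script_text) -> exit code` simplifies the proposed `_run_sandbox` helper by dropping its streaming callbacks.
- The returned status codes are placeholders (500 on failure in particular); the real route may redirect with a flash message instead.

```python
from dataclasses import dataclass
from typing import Callable

WIPE_SCRIPT = "find /overlay -mindepth 1 -delete"


@dataclass
class OverlayRow:
    # Stand-in for the Overlay model: only the fields the wipe touches.
    id: int
    last_build_status: str = ""


def wipe_overlay(overlay: OverlayRow, building_overlays: set[int],
                 run_sandbox: Callable[[int, str], int]) -> int:
    """Handle POST /overlays/<id>/wipe; returns an illustrative HTTP status."""
    if overlay.id in building_overlays:
        return 409  # refuse while a build_overlay job for this overlay runs
    if run_sandbox(overlay.id, WIPE_SCRIPT) != 0:
        return 500  # leave last_build_status untouched on failure
    overlay.last_build_status = ""  # freshly wiped: neither 'ok' nor 'failed'
    return 200  # deliberately no build_overlay enqueue; the user decides
```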

**Verification:**

```
python3 -m pytest l4d2web/tests/test_script_overlay_routes.py l4d2web/tests/test_overlay_routes.py -q
```

**Commit:** `feat(l4d2-web): script overlay routes (script update / wipe / build)`

---
## Task 6: Templates (overlays.html + overlay_detail.html)

**Files:**

- Modify: `l4d2web/templates/overlays.html`: add `script` to the create modal's type radio (lines ~29–49).
- Modify: `l4d2web/templates/overlay_detail.html`:
  - Add a `{% if overlay.type == 'script' %}` block with a textarea, Save / Rebuild / Wipe buttons, and a status badge.
  - Delete the global-source block (lines ~34–46).
- Modify: `l4d2web/tests/test_pages.py`: assert that the script section renders for type `script`, that the workshop section renders for type `workshop`, and that the global-source section is absent.

Test plan:

1. `test_pages.py::test_overlay_create_modal_offers_script_type`: GET `/overlays`; the HTML contains a `value="script"` radio.
2. `test_pages.py::test_overlay_detail_script_section`: create a script overlay and GET `/overlays/{id}`. The HTML contains `<textarea name="script">`, a "Rebuild" button, a "Wipe" button, and a status badge element.
3. `test_pages.py::test_overlay_detail_workshop_section_unchanged`: the existing workshop detail page still has the thumbnail grid, the add-item form, and so on.
4. `test_pages.py::test_overlay_detail_no_global_source_block`: the page HTML has no element from the deleted global-source block. Check for an attribute or string unique to that block.

Implementation:

- The detail page's Wipe button uses a small confirm-modal pattern; copy it from the existing delete-overlay confirm modal.
- Status badge: CSS classes for ok/warn/error already exist in `static/`; reuse them.
- No new JS dependencies. Use a plain `<form method="post">`. For the script update, HTMX `hx-post` is fine if a streaming UX is desired; match existing patterns.

**Verification:**

```
python3 -m pytest l4d2web/tests/test_pages.py -q
```

Manual check:

1. Start the dev server (`flask run`).
2. Create a script overlay, paste `echo "hi" > /overlay/foo`, click Save, and watch the log stream.
3. Click Wipe and confirm that the directory is empty.
4. Click Rebuild and confirm that `foo` reappears.

**Commit:** `feat(l4d2-web): script overlay UI`

---
## Task 7: Libexec sandbox helper + sudoers + deploy-artifacts test

**Files:**

- Create: `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` (bash; mode 0755 after deploy; owned by root).
- Modify: `deploy/files/etc/sudoers.d/left4me`: append the rule.
- Modify: `deploy/tests/test_deploy_artifacts.py`: assert that the helper file is present and that sudoers contains the new line.

Test plan (RED first):

1. `test_deploy_artifacts.py::test_script_sandbox_helper_present`: the file exists, its mode bits indicate 0755 (or whatever the test framework can check pre-deploy), and its shebang is `#!/bin/bash`.
2. `test_deploy_artifacts.py::test_sudoers_includes_script_sandbox_rule`: the sudoers file contains the exact line `left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox`.
3. Optional integration test (skip on non-Linux dev machines): drive the helper as a subprocess against a synthesized fake `/var/lib/left4me/overlays/1/` and a no-op script, and assert that the `bwrap` invocation happens.
   - To avoid a real sandbox run, use either a mock `systemd-run` or a `LEFT4ME_SCRIPT_SANDBOX_DRY_RUN=1` env var that prints the would-be invocation and exits 0.
   - This mirrors the `LEFT4ME_OVERLAY_PRINT_ONLY=1` pattern from the kernel-overlayfs helper test.

Implementation:

- Copy the helper script verbatim from the spec §Sandbox.
- Sudoers fragment: append the rule; don't replace the existing rules. The existing fragment has rules for `left4me-overlay`, `left4me-systemctl`, and `left4me-journalctl`. Match their formatting: one rule per line, no trailing whitespace.

**Verification:**

```
python3 -m pytest deploy/tests/test_deploy_artifacts.py -q
bash -n deploy/files/usr/local/libexec/left4me/left4me-script-sandbox
```

**Commit:** `feat(deploy): left4me-script-sandbox helper + sudoers fragment`

---
## Task 8: Deploy script: provision l4d2-sandbox + bubblewrap; drop globals timer

**Files:**

- Modify: `deploy/deploy-test-server.sh`:
  - Add `useradd --system ... l4d2-sandbox` and `apt-get install -y bubblewrap`.
  - Make sure the helper installation step picks up `left4me-script-sandbox`. This is likely automatic if the step globs `deploy/files/usr/local/libexec/left4me/*`.
  - Drop the `mkdir global_overlay_cache` line if present.
- Delete: `deploy/files/usr/local/lib/systemd/system/left4me-refresh-global-overlays.timer`
- Delete: `deploy/files/usr/local/lib/systemd/system/left4me-refresh-global-overlays.service`
- Modify: `deploy/tests/test_deploy_artifacts.py`: assert that the two unit files are absent, and that the deploy script contains a `useradd ... l4d2-sandbox` line and an `apt-get install ... bubblewrap` line.

Test plan:

1. `test_deploy_artifacts.py::test_globals_refresh_units_removed`: the files do not exist under `deploy/files/usr/local/lib/systemd/system/`.
2. `test_deploy_artifacts.py::test_deploy_script_provisions_sandbox_user`: grep the deploy script for the `useradd` line. Match `useradd` and `l4d2-sandbox` on the same line, since the flags sit between them.
3. `test_deploy_artifacts.py::test_deploy_script_installs_bubblewrap`: grep for `bubblewrap` in the apt invocations.

Implementation:

- The `useradd` line uses `--system --no-create-home --shell /usr/sbin/nologin`. For idempotency, wrap it: `id l4d2-sandbox &>/dev/null || useradd ...`.
- `apt-get install`: append `bubblewrap` to whatever package list the script already maintains.
- Globals timer/service deletions: `git rm`.
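A plain line-by-line grep misses an `apt-get install` whose package list continues over backslash-newlines, which is a common layout for long package lists. The sketch below joins continuations first and then applies both checks.

- The helper names are hypothetical.
- The real tests would read `deploy/deploy-test-server.sh`.
- The checks are heuristics: they do not ignore commented-out lines.

```python
import re


def logical_lines(script: str) -> list[str]:
    """Split a shell script into lines, joining backslash-newline continuations."""
    return script.replace("\\\n", " ").splitlines()


def provisions_sandbox_user(script: str) -> bool:
    # useradd and the user name share a logical line, with flags in between.
    return any(
        re.search(r"\buseradd\b", line) and "--system" in line
        and re.search(r"\bl4d2-sandbox\b", line)
        for line in logical_lines(script)
    )


def installs_bubblewrap(script: str) -> bool:
    return any(
        re.search(r"\bapt-get\s+install\b", line) and re.search(r"\bbubblewrap\b", line)
        for line in logical_lines(script)
    )
```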

**Verification:**

```
python3 -m pytest deploy/tests/ -q
shellcheck deploy/deploy-test-server.sh deploy/files/usr/local/libexec/left4me/left4me-script-sandbox
```

**Commit:** `chore(deploy): provision l4d2-sandbox + bubblewrap; drop globals refresh timer`

---
## Task 9: Full pytest run + drift fixes

**Files:** as needed across the repo.

Test plan: run the full test suites for both packages (`l4d2web`, `l4d2host`) plus the deploy tests. Chase down any drift caused by removed model classes, dropped imports, or template changes.

```
python3 -m pytest l4d2web/tests/ -q
python3 -m pytest l4d2host/tests/ -q
python3 -m pytest deploy/tests/ -q
```

Implementation: fix whatever breaks. Common drift sources to expect:

- Tests that imported from deleted modules.
- Tests that asserted the exact `BUILDERS` keyset. This drift is expected; those tests should already have been updated in Task 2.
- Tests that built fixtures with `type='l4d2center_maps'` or `type='cedapug_maps'`. Such tests likely belong to the deleted set, or need conversion to `type='script'`.
- Template snapshot tests (if any) that captured the deleted global-source block.

**Verification:** all three suites green.

**Commit:** `chore(l4d2-web): test suite drift fixes after script-overlays migration` (only if drift fixes are needed; skip it if Tasks 1–8 left the suite green)

---
## End-to-end deployment verification (manual, on test host)

After all tasks are committed:

1. **Reset deploy:** run `deploy/deploy-test-server.sh` from a clean state, then confirm that:
   - `bubblewrap` is installed (`dpkg -l bubblewrap`);
   - the `l4d2-sandbox` user exists (`id l4d2-sandbox`);
   - `/usr/local/libexec/left4me/left4me-script-sandbox` is mode 0755 and root-owned;
   - `sudo -ln`, run as `left4me`, shows the new rule.
2. **Sandbox smoke:**
   - As `left4me`, write `/tmp/echo.sh` containing `id -u > /overlay/sentinel`.
   - Run `mkdir -p /var/lib/left4me/overlays/1`, then `sudo /usr/local/libexec/left4me/left4me-script-sandbox 1 /tmp/echo.sh`.
   - Confirm that `/var/lib/left4me/overlays/1/sentinel` contains the UID printed by `id -u l4d2-sandbox` on the host, and that the file is owned by `l4d2-sandbox`. Use `id -u` rather than `whoami`: `whoami` needs a passwd entry, and `/etc/passwd` should be hidden inside the sandbox.
   - Using probe scripts, confirm that `/etc/passwd`, `/var/lib/left4me/l4d2web.db`, and `/home` are not visible inside the sandbox.
3. **Resource limits:**
   - `dd if=/dev/zero of=/overlay/big bs=1M count=25000` → succeeds inside the sandbox, but `ScriptBuilder._enforce_disk_budget` flags the build as failed and sets `last_build_status='failed'`.
   - `sleep 7200` → killed at 1 h by `RuntimeMaxSec=3600`.
   - Memory hog (`python3 -c "x=' '*(5*1024**3)"`) → OOM-killed at 4 GB.
4. **App-level happy path:**
   - As a non-admin user, create a script overlay via the UI, paste an old `competitive_rework`-style script, and click Save.
   - The build runs and succeeds, and addons appear in `overlays/{id}/left4dead2/`.
   - Stack the overlay onto a server blueprint, start the server, and verify via the L4D2 admin console (`map workshop/...`) that the content is mounted.
5. **Wipe:** click Wipe → the dir is empty (any `find -delete` output appears in the log). Click Rebuild → it repopulates. `last_build_status` cycles `'ok'` → `''` (after Wipe) → `'ok'` (after Rebuild).
6. **Scheduler:** start a server that uses the script overlay. In another browser tab, attempt to Rebuild → 409 / scheduler-blocked. Stop the server; the rebuild now succeeds.
7. **Audit log:** `journalctl --since "5 min ago" | grep run-` shows a transient scope per build, with cgroup memory accounting visible.

These checks are not required for any single commit, but they should pass before the work is declared done.
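Each limit probe in step 3 overshoots its cap by a clear margin. The quick Python check below works in binary units:

- GNU `dd` reads `bs=1M` as 1 MiB.
- systemd parses `MemoryMax=4G` with base 1024.
- The Task 2 disk cap is `20 * 1024**3`.

```python
MIB, GIB = 1024**2, 1024**3

disk_cap = 20 * GIB            # _enforce_disk_budget cap from Task 2
dd_written = 25000 * MIB       # dd bs=1M count=25000
faked_du = 25_000_000_000      # byte count faked in test_script_builder_disk_cap

memory_max = 4 * GIB           # MemoryMax=4G
hog = 5 * GIB                  # ' ' * (5 * 1024**3): at least one byte per char in CPython

for label, used, cap in [
    ("dd", dd_written, disk_cap),
    ("du fake", faked_du, disk_cap),
    ("memory hog", hog, memory_max),
]:
    print(f"{label}: {used:,} bytes vs cap {cap:,} bytes, over by {used - cap:,}")
```

The 25 GB `dd` file exceeds the 20 GiB cap by about 4.7 GB, so the failure does not depend on filesystem rounding in `du -sb`.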

`docs/superpowers/specs/2026-05-08-l4d2-script-overlays-design.md` (new file, 323 lines)
# L4D2 Script Overlays Design

**Goal:** Add a single new overlay type, `script`, that lets users author arbitrary build recipes as bash and runs them inside a `bubblewrap` + `systemd-run --scope` sandbox. The new type subsumes the existing `l4d2center_maps` and `cedapug_maps` managed-globals overlay types, both of which are removed in the same change. After this work, the overlay type list is exactly `workshop` (unchanged) and `script` (new).

**Approval status:** User-approved design direction. Implementation proceeds in lockstep with the companion plan at `docs/superpowers/plans/2026-05-08-l4d2-script-overlays.md`.

## Context

`left4me` users today have two ways to add content to a server:

- workshop overlays, a rich UI for Steam Workshop items via `WorkshopBuilder`;
- a pair of managed global-map overlay types (`l4d2center_maps`, `cedapug_maps`), with bespoke parsers, per-item DB rows, ETag-based change detection, and a daily refresh timer.

Neither lets users author arbitrary build recipes.

The user's previous setup at `ckn-bw/bundles/left4dead2/files/scripts/overlays/` expressed every recipe as a small bash file:

- `competitive_rework` (GitHub tarball download)
- `tickrate` (inline `server.cfg` + addon DLL fetch)
- `standard` (workshop items + admin-list write)
- `workshop_maps` (workshop collection import)
- `l4d2center_maps` (CSV-driven map sync)

All five fit naturally into a single "run a sandboxed bash script that populates the overlay dir" model.

The two managed global-map types in the current codebase are over-engineered for what they do. Each is essentially "fetch a manifest, download archives, extract VPKs, place them in `addons/`." Folding them into the new `script` type eliminates:

- three database tables;
- two source-parser modules;
- the `GlobalMapOverlayBuilder`;
- the `py7zr` dependency;
- the global-overlay cache root;
- the managed-singleton machinery.

In their place, an admin pastes the equivalent shell code (which the user already wrote years ago) into a normal admin-owned, system-wide script overlay.

The trust model for the sandbox is "semi-public deployment, registered users." The threat surface has two parts: one user reading another user's overlay, the application DB, or arbitrary host secrets; and runaway scripts exhausting disk, CPU, or RAM.

Network access is *not* restricted, because scripts must be able to download from arbitrary URLs (GitHub, l4d2center, Steam CDN). Sandbox boundaries are namespace-based (mount, PID, IPC, UTS, cgroup), not command-allowlist-based. Allowlisting binaries is theatre for a bash sandbox, because `eval` and `exec` let a script build and run commands dynamically, past any static allowlist.

The test deploy DB is wiped as part of rollout; no data migration is performed. Existing user blueprints that reference `l4d2center_maps` or `cedapug_maps` overlay rows do not survive the change in the test environment.

A scheduled-refresh feature (the daily timer that today drives the global-map types) is intentionally **out of scope for this iteration**. The two existing systemd units and the `flask refresh-global-overlays` CLI command are deleted with no replacement. Refresh will be reintroduced in a later iteration, designed against concrete needs.
## Locked Decisions
|
||||
|
||||
1. **Single new overlay type: `script`.** Replaces both managed-globals types. Final type list: `workshop` + `script`. No `tarball`/`inline`/`manual` types — all of those collapse into `script` (with UI templates as a future ergonomics improvement).
2. **`Overlay.script` is a DB `TEXT` column** holding the raw bash. No file storage, no revision history in v1. Empty string for `workshop` rows.
3. **Build idempotency contract: script runs against the existing overlay dir.** No automatic wipe between builds. Users write `test -f … || curl …`-style guards if they want bandwidth efficiency. A manual "Wipe overlay" button on the detail page resets the dir to empty.
4. **No left4me-aware helpers in the sandbox.** The script sees pure bash plus whatever's in `/usr` (RO bind-mount of the host). Workshop items are not exposed via a helper — users wanting workshop content create a `workshop`-type overlay, which has its own first-class UX (thumbnails, collection paste, dedup cache, refresh).
5. **Sandbox engine: `bubblewrap` (`bwrap`) inside `systemd-run --scope --collect`.** `systemd-run` provides cgroup v2 limits + walltime kill via `RuntimeMaxSec`; `bwrap` provides the namespace isolation. Both are stable, well-audited, and packaged in Debian.
6. **Resource limits (system-wide, not per-overlay):** 1 hour walltime (`RuntimeMaxSec=3600`), 4 GB RAM (`MemoryMax=4G`, `MemorySwapMax=0`), 512 tasks (`TasksMax=512`), 200% CPU quota (`CPUQuota=200%`, i.e. two CPUs' worth of time), and a post-build 20 GB disk cap on `du -sb` of the overlay dir.
7. **Network: host-shared.** No `--unshare-net`. Scripts have full outbound access. Egress filtering is not in v1; the sandbox prevents reading internal state but does not prevent talking to internal IPs. Acceptable for the current trust model.
8. **No auto-seeding of "default" overlays.** Admin manually creates the equivalents of the old `l4d2center-maps`/`cedapug-maps` post-deploy by pasting the bash. The deploy script does not insert overlay rows.
9. **Daily/scheduled refresh: out of scope for this iteration.** No `auto_refresh` flag, no timer, no CLI command. Manual rebuild via the detail-page button is the only build trigger after this change.
10. **Permissions mirror workshop overlays.** Any logged-in user can create a private (`user_id = me`) script overlay. Admin can create system-wide (`user_id = NULL`). Owner or admin can edit/delete.
11. **Failure semantics via `Overlay.last_build_status`** (`'' | 'ok' | 'failed'`). Drives a "rebuild required" badge on the list and detail pages. Server initialization does **not** auto-block on `failed` (matches workshop's current behavior).
12. **Wipe is just another sandbox invocation.** The wipe endpoint runs the literal script `find /overlay -mindepth 1 -delete` through the same `left4me-script-sandbox` helper. No second helper and no privilege/UID puzzle: the files were written by `l4d2-sandbox`, which also runs the wipe. After a successful wipe, `last_build_status` is reset to `''`. Wipe does **not** auto-enqueue a rebuild; the user decides.
13. **Privileged helper: `/usr/local/libexec/left4me/left4me-script-sandbox`.** Same pattern as the existing `left4me-overlay`, `left4me-systemctl`, `left4me-journalctl` helpers. Bash, owned root, mode 0755. The web user invokes it via `sudo -n` per a sudoers fragment. Root is needed to create the transient systemd scope (with its resource limits) and to switch to the unprivileged `l4d2-sandbox` UID; no user-supplied code runs as root.
14. **Dedicated sandbox UID `l4d2-sandbox`** (system user, `/usr/sbin/nologin`, no home). Owns nothing on the host outside what bwrap binds in. The helper switches to this UID/GID before any user-supplied code runs.
15. **Strict argument validation in the helper.** Overlay id matches `^[0-9]+$`; overlay dir must exist under `/var/lib/left4me/overlays/`; script path must exist and be a regular file. Defense in depth; the real authorization check lives in the web app.
16. **Streaming I/O via the existing `run_with_streamed_output` helper.** Same plumbing `WorkshopBuilder` already uses for `steamcmd`/`curl` invocations. No new SSE/log path.

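Decision 15's checks can also be mirrored on the web side, so the app fails fast with a readable error before calling `sudo`. The following is a minimal sketch. `validate_sandbox_args`, `SandboxArgError`, and `OVERLAYS_ROOT` are hypothetical names; only the `^[0-9]+$` rule and the `/var/lib/left4me/overlays/` prefix come from this spec.

```python
import re
from pathlib import Path

# Hypothetical web-side mirror of the sandbox helper's argument checks.
OVERLAYS_ROOT = Path("/var/lib/left4me/overlays")  # prefix hard-coded in the helper
_OVERLAY_ID = re.compile(r"[0-9]+")  # used with fullmatch, i.e. ^[0-9]+$


class SandboxArgError(ValueError):
    """Arguments the sandbox helper would reject."""


def validate_sandbox_args(overlay_id: str, script_path: str,
                          overlays_root: Path = OVERLAYS_ROOT) -> Path:
    """Return the overlay dir if the helper would accept these arguments."""
    if not _OVERLAY_ID.fullmatch(overlay_id):
        raise SandboxArgError(f"bad overlay id: {overlay_id!r}")
    overlay_dir = overlays_root / overlay_id
    if not overlay_dir.is_dir():
        raise SandboxArgError(f"no overlay dir: {overlay_dir}")
    if not Path(script_path).is_file():
        raise SandboxArgError(f"no script: {script_path}")
    return overlay_dir
```

The sketch uses `[0-9]` with `fullmatch` instead of `\d`. In Python, `\d` also matches non-ASCII digits, so `[0-9]` keeps the rule identical to the helper's bash regex.
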
## Architecture

```text
Overlay row (type=script, script=TEXT, last_build_status)
  │
  ▼ build_overlay(overlay_id) job
  │
  ▼ BUILDERS["script"].build(overlay, on_stdout, on_stderr, should_cancel)
  │
  ▼ ScriptBuilder writes overlay.script → tmpfile, then:
  │     sudo -n /usr/local/libexec/left4me/left4me-script-sandbox <id> <tmpfile>
  │
  ▼ Helper validates args, then exec()s:
  │     systemd-run --scope --collect
  │       -p MemoryMax=4G -p MemorySwapMax=0
  │       -p TasksMax=512 -p CPUQuota=200%
  │       -p RuntimeMaxSec=3600
  │       -- bwrap [namespace flags...] /bin/bash /script.sh
  │
  ▼ Inside the sandbox the script sees:
  │     /overlay     ← /var/lib/left4me/overlays/{id}   RW (the build target)
  │     /tmp, /run   ← fresh tmpfs                      RW (ephemeral)
  │     /usr, /lib, /lib64,
  │     /etc/{ssl,ca-certificates,resolv.conf,nsswitch.conf}   RO (host-curated)
  │     /proc, /dev  ← fresh
  │     network      ← shared with host
  │     UID/GID      ← l4d2-sandbox (no_new_privs implicit in bwrap)
  │
  ▼ stdout/stderr → run_with_streamed_output → existing job-log SSE stream
  │
  ▼ After exit:
  │     exit 0 ∧ du -sb /overlay ≤ 20 GB → last_build_status='ok'
  │     any other outcome               → last_build_status='failed'
```

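The last step of the flow above reduces to a small pure function. This is a sketch rather than existing code. `build_status` and `DISK_CAP_BYTES` are hypothetical names, and "20 GB" is read as 20 × 1024³ bytes to match the `ScriptBuilder` check.

```python
from typing import Optional

DISK_CAP_BYTES = 20 * 1024**3  # "20 GB", read as 20 GiB like the ScriptBuilder check


def build_status(exit_code: Optional[int], overlay_bytes: Optional[int],
                 cancelled: bool = False) -> str:
    """Map a finished sandbox run to Overlay.last_build_status.

    exit_code is None if the sandbox never started; overlay_bytes is None
    if the post-build size check could not run. Both count as failures.
    """
    if cancelled or exit_code != 0 or overlay_bytes is None:
        return "failed"
    return "ok" if overlay_bytes <= DISK_CAP_BYTES else "failed"
```

The sketch treats a missing size measurement as a failure. That errs toward showing the "rebuild required" badge rather than reporting a false `'ok'`.
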
The host library (`l4d2host`) is unchanged. The `KernelOverlayFSMounter` already mounts whatever's at `overlays/{id}/` regardless of how it got there. The Job model and worker model are essentially unchanged — `script` is just another overlay type for the same `build_overlay` operation that today supports `workshop`.

```python
BUILDERS = {
    "workshop": WorkshopBuilder(),
    "script": ScriptBuilder(),
}
```

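The job worker looks up this registry by `overlay.type`. The sketch below shows that dispatch. It is typed with a `Protocol` matching the `build(overlay, *, on_stdout, on_stderr, should_cancel)` signature named under Implementation Boundaries. `dispatch_build` is a hypothetical name. Raising on an unknown type, such as a leftover `l4d2center_maps` row, is assumed behavior.

```python
from typing import Any, Callable, Dict, Protocol


class OverlayBuilder(Protocol):
    def build(self, overlay: Any, *, on_stdout: Callable[[str], None],
              on_stderr: Callable[[str], None],
              should_cancel: Callable[[], bool]) -> None: ...


def dispatch_build(builders: Dict[str, OverlayBuilder], overlay: Any,
                   **callbacks: Any) -> None:
    """Run the builder registered for overlay.type (hypothetical helper)."""
    builder = builders.get(overlay.type)
    if builder is None:
        # e.g. a leftover managed-globals row after this change
        raise ValueError(f"no builder for overlay type {overlay.type!r}")
    builder.build(overlay, **callbacks)
```
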
## Data Model

### `Overlay` (modified)

```text
id                 INTEGER       PK AUTOINCREMENT
name               VARCHAR(255)  NOT NULL
path               VARCHAR(255)  NOT NULL                   -- str(id) for new rows
type               VARCHAR(16)   NOT NULL                   -- 'workshop' | 'script'
user_id            INTEGER       NULL REFERENCES users(id)  -- NULL = system-wide

script             TEXT          NOT NULL DEFAULT ''        -- new; meaningful for type='script'
last_build_status  VARCHAR(16)   NOT NULL DEFAULT ''        -- new; '' | 'ok' | 'failed'

created_at, updated_at

UNIQUE INDEX on (name) WHERE user_id IS NULL
UNIQUE INDEX on (name, user_id) WHERE user_id IS NOT NULL
INDEX on (type, user_id)
```

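The schema needs two partial unique indexes instead of one `UNIQUE(name, user_id)`. SQLite treats NULLs as distinct in unique indexes, so a single composite index would still allow two system-wide overlays with the same name. The sketch below renders the table as SQLite DDL using the standard `sqlite3` module. The index names and the minimal `users` table are assumptions; the real schema comes from the application's models.

```python
import sqlite3

# SQLite rendering of the Overlay table above. Index names and the minimal
# users table are illustrative; the real schema comes from the app's models.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE overlays (
    id                INTEGER PRIMARY KEY AUTOINCREMENT,
    name              VARCHAR(255) NOT NULL,
    path              VARCHAR(255) NOT NULL,
    type              VARCHAR(16)  NOT NULL,
    user_id           INTEGER REFERENCES users(id),   -- NULL = system-wide
    script            TEXT         NOT NULL DEFAULT '',
    last_build_status VARCHAR(16)  NOT NULL DEFAULT '',
    created_at        TIMESTAMP,
    updated_at        TIMESTAMP
);
-- NULL user_ids never compare equal, so each uniqueness rule gets its own
-- partial index.
CREATE UNIQUE INDEX uq_overlays_system_name ON overlays(name) WHERE user_id IS NULL;
CREATE UNIQUE INDEX uq_overlays_user_name ON overlays(name, user_id) WHERE user_id IS NOT NULL;
CREATE INDEX ix_overlays_type_user ON overlays(type, user_id);
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```
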
### Tables removed

- `global_overlay_item_files`
- `global_overlay_items`
- `global_overlay_sources`

Drop order matters for the SQLite migration: drop `_item_files` first (FK to `_items`), then `_items` (FK to `_sources`), then `_sources` (FK to `overlays`).

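The drop order matters when SQLite enforces foreign keys (`PRAGMA foreign_keys = ON`). In that mode, `DROP TABLE` first performs an implicit `DELETE` of every row, and deleting parent rows that child rows still reference fails. The sketch below uses the standard `sqlite3` module; `drop_global_overlay_tables` is a hypothetical name.

```python
import sqlite3

# Children before parents: with foreign-key enforcement on, DROP TABLE on a
# parent that child rows still reference fails with a constraint error.
DROP_ORDER = (
    "global_overlay_item_files",  # FK -> global_overlay_items
    "global_overlay_items",       # FK -> global_overlay_sources
    "global_overlay_sources",     # FK -> overlays
)


def drop_global_overlay_tables(conn: sqlite3.Connection) -> None:
    for table in DROP_ORDER:
        # Table names come from the constant above, never from user input.
        conn.execute(f"DROP TABLE IF EXISTS {table}")
```
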
### Unchanged

`WorkshopItem`, `overlay_workshop_items`, `Job` (including `Job.overlay_id` and nullable `Job.user_id`), `Server`, `Blueprint`, etc.

## Filesystem Layout

```text
${LEFT4ME_ROOT}/
  overlays/
    {overlay_id}/                  # script writes here; mounted by host
      left4dead2/...               # whatever the script produces
  workshop_cache/{steam_id}.vpk    # workshop type only — unchanged

  # removed:
  # global_overlay_cache/          # was used by managed-globals types
```

Single tree per overlay. No per-overlay scratch cache (the chosen idempotency model is "script runs against existing dir," so any caching the user wants lives inside the overlay dir and is preserved between builds).

The sandbox bind-mounts `${LEFT4ME_ROOT}/overlays/{id}/` to `/overlay` (RW). Nothing else under `${LEFT4ME_ROOT}` is visible inside the sandbox.

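`ScriptBuilder` (under Build Lifecycle) calls an `overlay_path(overlay_id)` helper that this spec does not define. Below is a hypothetical version. It assumes `LEFT4ME_ROOT` is read from the environment and defaults to `/var/lib/left4me`, the prefix the sandbox helper hard-codes.

```python
import os
from pathlib import Path
from typing import Optional


def left4me_root() -> Path:
    # Assumption: LEFT4ME_ROOT comes from the environment (e.g. the env files
    # under /etc/left4me) and defaults to the prefix the sandbox helper uses.
    return Path(os.environ.get("LEFT4ME_ROOT", "/var/lib/left4me"))


def overlay_path(overlay_id: int, root: Optional[Path] = None) -> Path:
    """Directory the sandbox bind-mounts read-write at /overlay."""
    # Only positive integer ids (bool is an int subclass, so exclude it) can
    # become path components, which rules out traversal such as "../x".
    if isinstance(overlay_id, bool) or not isinstance(overlay_id, int) or overlay_id < 1:
        raise ValueError(f"bad overlay id: {overlay_id!r}")
    return (root if root is not None else left4me_root()) / "overlays" / str(overlay_id)
```
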
## Sandbox

### Helper script

`deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`, mode 0755, owned root:

```bash
#!/bin/bash
# args: <overlay_id> <script_path>
set -euo pipefail

[[ $# -eq 2 ]] || { echo "usage: $0 <overlay_id> <script>" >&2; exit 64; }
OVERLAY_ID=$1; SCRIPT=$2
[[ "$OVERLAY_ID" =~ ^[0-9]+$ ]] || { echo "bad overlay id" >&2; exit 64; }
OVERLAY_DIR=/var/lib/left4me/overlays/$OVERLAY_ID
[[ -d $OVERLAY_DIR ]] || { echo "no overlay dir" >&2; exit 65; }
[[ -f $SCRIPT ]] || { echo "no script" >&2; exit 65; }

# Privilege drop: bwrap's --uid/--gid require --unshare-user and only change
# the ID seen inside a user namespace, so they cannot drop real root. Instead,
# systemd-run creates the scope as root and switches to l4d2-sandbox before
# exec'ing bwrap. An unprivileged bwrap needs unprivileged user namespaces
# (or a setuid bwrap), and l4d2-sandbox must be able to reach $OVERLAY_DIR
# and read $SCRIPT.
exec systemd-run --quiet --scope --collect \
    --uid=l4d2-sandbox --gid=l4d2-sandbox \
    -p MemoryMax=4G -p MemorySwapMax=0 -p TasksMax=512 \
    -p CPUQuota=200% -p RuntimeMaxSec=3600 \
    -- bwrap \
        --die-with-parent --new-session \
        --unshare-pid --unshare-ipc --unshare-uts --unshare-cgroup \
        --proc /proc --dev /dev --tmpfs /tmp --tmpfs /run \
        --ro-bind /usr /usr --ro-bind /lib /lib --ro-bind /lib64 /lib64 \
        --symlink usr/bin /bin --symlink usr/sbin /sbin \
        --ro-bind /etc/resolv.conf /etc/resolv.conf \
        --ro-bind /etc/ssl /etc/ssl \
        --ro-bind /etc/ca-certificates /etc/ca-certificates \
        --ro-bind /etc/nsswitch.conf /etc/nsswitch.conf \
        --bind "$OVERLAY_DIR" /overlay \
        --chdir /overlay \
        --setenv HOME /tmp --setenv PATH /usr/bin:/usr/sbin \
        --setenv OVERLAY /overlay \
        --ro-bind "$SCRIPT" /script.sh \
        /bin/bash /script.sh
```

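Network is *not* unshared (no `--unshare-net`); the sandbox shares the host network namespace. Because the helper runs as root via `sudo`, each transient scope belongs to the system manager. It is visible via `systemctl list-units --type=scope` while running, and its start/stop messages are recorded in the system journal (`journalctl -u run-…scope`). The script's own stdout/stderr are not journaled. With `--scope`, `systemd-run` executes the command directly, so output flows back over the pipes the web app handed to `sudo`.
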
### Sudoers fragment

Append to `deploy/files/etc/sudoers.d/left4me`:

```
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox
```

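The Risks section relies on deploy-artifact tests to catch sudoers typos. One such test could look like the sketch below. `assert_sandbox_rule` and the comment-stripping rule are assumptions; the expected line is the fragment above.

```python
from pathlib import Path
from typing import List

EXPECTED_SANDBOX_RULE = (
    "left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox"
)


def sudoers_rules(path: Path) -> List[str]:
    """Non-blank, non-comment lines, stripped of surrounding whitespace.

    Simplification: '#include'-style directives are dropped too, which is
    fine when checking the rules of a single fragment.
    """
    rules = []
    for line in path.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            rules.append(line)
    return rules


def assert_sandbox_rule(path: Path) -> None:
    # Exactly one rule may mention the helper, and it must match the expected
    # line exactly: no wildcard arguments, no extra commands on the same line.
    mentioning = [r for r in sudoers_rules(path) if "left4me-script-sandbox" in r]
    assert mentioning == [EXPECTED_SANDBOX_RULE], mentioning
```
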
### System user

Provisioned in `deploy/deploy-test-server.sh`:

```bash
# useradd exits non-zero if the user already exists; guard it so re-running
# the deploy script stays idempotent.
id -u l4d2-sandbox >/dev/null 2>&1 || \
    useradd --system --no-create-home --shell /usr/sbin/nologin l4d2-sandbox
apt-get install -y bubblewrap
```

## Build Lifecycle

`ScriptBuilder` lives in `l4d2web/services/overlay_builders.py` next to `WorkshopBuilder`:

```python
import os
import subprocess
import tempfile

# run_with_streamed_output is the existing streaming helper; overlay_path and
# BuildError are assumed to exist alongside it in this module.
DISK_CAP_BYTES = 20 * 1024**3  # the "20 GB" cap, as 20 GiB


class ScriptBuilder:
    def build(self, overlay, *, on_stdout, on_stderr, should_cancel):
        with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
            f.write(overlay.script or "")
            script_path = f.name
        try:
            # NamedTemporaryFile creates the file 0600, owned by the web user;
            # the sandboxed bash runs as l4d2-sandbox and must be able to read it.
            os.chmod(script_path, 0o644)
            cmd = [
                "sudo", "-n",
                "/usr/local/libexec/left4me/left4me-script-sandbox",
                str(overlay.id), script_path,
            ]
            run_with_streamed_output(cmd, on_stdout, on_stderr, should_cancel)
            self._enforce_disk_budget(overlay.id, on_stderr)
        finally:
            os.unlink(script_path)

    def _enforce_disk_budget(self, overlay_id, on_stderr):
        size = subprocess.check_output(["du", "-sb", overlay_path(overlay_id)])
        if int(size.split()[0]) > DISK_CAP_BYTES:
            on_stderr("overlay exceeded 20 GB disk cap")
            raise BuildError("disk-cap-exceeded")
```

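`du -sb` reports apparent size. If shelling out to `du` is undesirable, a pure-Python walk can compute a close approximation and stop as soon as the cap is exceeded. This is a hypothetical alternative, not part of the spec. Like GNU `du` by default, it counts each inode once, so hard links are not double-counted. Unlike `du -sb`, it ignores the sizes of directories and symlinks.

```python
import os
import stat
from typing import Optional, Set, Tuple


def _raise(err: OSError) -> None:
    # os.walk ignores errors by default; a budget check must not undercount.
    raise err


def apparent_size(root: str, limit: Optional[int] = None) -> int:
    """Sum of st_size over regular files under root, each inode counted once.

    Symlinks are not followed. Returns early once the total exceeds limit.
    """
    total = 0
    seen: Set[Tuple[int, int]] = set()
    for dirpath, _dirnames, filenames in os.walk(root, onerror=_raise):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if not stat.S_ISREG(st.st_mode) or key in seen:
                continue
            seen.add(key)
            total += st.st_size
            if limit is not None and total > limit:
                return total  # already over budget; no need to keep walking
    return total
```

Used as `apparent_size(path, limit=DISK_CAP_BYTES) > DISK_CAP_BYTES`, it could replace the `du` call. Like `du`, it needs read and search permission on everything the script created.
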
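`run_with_streamed_output` is the existing helper used by `WorkshopBuilder` for `steamcmd`/`curl` invocations. The `should_cancel` callback terminates the direct `sudo` child, which relays the signals it receives to the command it runs. The web user cannot signal the helper's descendants directly, because they run as other users. `bwrap --die-with-parent` then kills every sandboxed process once bwrap or its parent exits, and `RuntimeMaxSec` bounds the scope's lifetime in any case. `--collect` only guarantees that the transient scope unit is garbage-collected even if it ends in a failed state.
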
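The cancellation path can be illustrated with a self-contained stand-in for `run_with_streamed_output`. This is hypothetical, not the project's helper. `run_streamed`, the polling interval, and the escalation from `SIGTERM` to `SIGKILL` after a grace period are all assumptions. The output callbacks run on the reader threads.

```python
import subprocess
import threading
import time
from typing import IO, Callable, List, Optional


def _pump(stream: IO[str], sink: Callable[[str], None]) -> None:
    # Forward each line (newline stripped) as soon as it arrives.
    with stream:
        for line in stream:
            sink(line.rstrip("\n"))


def run_streamed(cmd: List[str], on_stdout: Callable[[str], None],
                 on_stderr: Callable[[str], None],
                 should_cancel: Callable[[], bool],
                 poll_interval: float = 0.2, grace: float = 10.0) -> int:
    """Run cmd, stream its output, honour cancellation; return the exit status."""
    proc = subprocess.Popen(cmd, stdin=subprocess.DEVNULL, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, text=True)
    readers = [threading.Thread(target=_pump, args=(proc.stdout, on_stdout)),
               threading.Thread(target=_pump, args=(proc.stderr, on_stderr))]
    for t in readers:
        t.start()
    kill_at: Optional[float] = None  # deadline set once cancellation starts
    while proc.poll() is None:
        if kill_at is None and should_cancel():
            # Signal only the direct child (sudo, for ScriptBuilder).
            proc.terminate()
            kill_at = time.monotonic() + grace
        elif kill_at is not None and time.monotonic() >= kill_at:
            proc.kill()
        time.sleep(poll_interval)
    for t in readers:
        t.join()
    return proc.returncode
```
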
The job worker's existing job-completion path writes `Overlay.last_build_status = 'ok'` on success and `'failed'` on any non-zero exit / `BuildError` / cancel. This is a single column update inside the existing transaction; no new infrastructure.

## UI

### Create modal (`templates/overlays.html`)

The existing modal grows one option in the type radio: `Workshop | Script`. Name field unchanged. After insert, the web app generates `path = str(overlay_id)` for new rows (existing pattern).

### Detail page when `type='script'` (`templates/overlay_detail.html`)

- Plain styled `<textarea>` for `overlay.script` with a Save button → `POST /overlays/{id}/script`. No CodeMirror dependency in v1 (out of scope; keep frontend dep-light).
- "Rebuild" button → `POST /overlays/{id}/build`. Existing pattern from workshop overlays.
- "Wipe overlay" button (red, confirm-modal) → `POST /overlays/{id}/wipe`.
- `last_build_status` indicator badge: empty / "ok" / "failed".
- Live build log via existing SSE plumbing on the related Job row.

### Detail page when `type='workshop'`: unchanged.

### Sections removed

The global-source detail block (`overlay_detail.html` lines 34–46) is deleted along with the managed-globals subsystem.

## Routes

`l4d2web/routes/overlay_routes.py` adds:

| Method | Path | Purpose |
|---|---|---|
| POST | `/overlays/{id}/script` | Update `script` text. Auto-enqueue coalesced `build_overlay` job. |
| POST | `/overlays/{id}/wipe` | Invoke `left4me-script-sandbox` with the literal script `find /overlay -mindepth 1 -delete`. Owner/admin only. Refuses if a `build_overlay` for this overlay is running. After success, set `last_build_status=''`. Does not auto-enqueue a rebuild. |
| POST | `/overlays/{id}/build` | Manual rebuild — same pattern as today's workshop overlay manual rebuild. |

The existing `POST /overlays` accepts `type=script` and an optional initial `script` body.

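Two route-level details fit in a few lines. First, saving should normalize line endings: HTML form submission sends `<textarea>` content with CRLF line breaks, and bash does not treat `\r\n` as a plain newline. Second, the wipe endpoint must refuse while a build is running. The sketch below uses hypothetical names (`normalize_script`, `check_wipe_allowed`, `BuildInProgress`). The CRLF normalization is an addition this spec does not mention.

```python
from typing import AbstractSet

# The literal wipe script; it runs through the same left4me-script-sandbox helper.
WIPE_SCRIPT = "find /overlay -mindepth 1 -delete\n"


class BuildInProgress(Exception):
    """Hypothetical error the route layer would turn into an HTTP 409."""


def normalize_script(text: str) -> str:
    # Browsers submit <textarea> values with CRLF line breaks; a stray "\r"
    # makes bash fail with errors such as "$'\r': command not found".
    return text.replace("\r\n", "\n").replace("\r", "\n")


def check_wipe_allowed(overlay_id: int, running_builds: AbstractSet[int]) -> None:
    # Refuse to wipe while a build_overlay job for this overlay is running.
    if overlay_id in running_builds:
        raise BuildInProgress(f"overlay {overlay_id} has a running build")
```
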
## Permissions

| Action | Who |
|---|---|
| Create script overlay (private, `user_id = me`) | Any authenticated user |
| Create script overlay (system-wide, `user_id = NULL`) | Admin |
| Edit (script body, name) | Owner or admin |
| Wipe / Rebuild | Owner or admin |
| Delete | Owner or admin |
| View | Owner, admin, or any user when `user_id IS NULL` |

These match the existing rules for workshop overlays.

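The table reduces to three predicates over the caller and `Overlay.user_id`. This is a sketch: `User`, `can_view`, `can_manage`, and `can_create` are hypothetical names, since the application's actual user model and route decorators are not specified here. Authentication is assumed to have already happened.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    id: int
    is_admin: bool = False


def can_view(user: User, owner_id: Optional[int]) -> bool:
    # owner_id is Overlay.user_id; None means system-wide.
    return owner_id is None or user.is_admin or owner_id == user.id


def can_manage(user: User, owner_id: Optional[int]) -> bool:
    """Edit, wipe, rebuild and delete share one rule: owner or admin.

    System-wide overlays (owner_id None) therefore stay admin-only.
    """
    return user.is_admin or (owner_id is not None and owner_id == user.id)


def can_create(user: User, system_wide: bool) -> bool:
    # Any authenticated user may create private overlays; system-wide is admin-only.
    return user.is_admin or not system_wide
```
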
## Job Worker / Scheduler

`services/job_worker.py` drops `"refresh_global_overlays"` from `GLOBAL_OPERATIONS` and removes the corresponding `refresh_global_overlays_running` and `blocked_servers_by_overlay` plumbing that exists only for the global-maps subsystem. The remaining mutex rules already cover:

- `build_overlay` per overlay (one running build per overlay).
- `install` and `refresh_workshop_items` as global mutexes.
- Server start/init blocks if any `build_overlay` for an overlay in the server's blueprint is running.

No new rules are needed for `script` — its build is mechanically identical to a `workshop` build from the scheduler's perspective.

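The per-overlay rules can be written as two predicates. This sketch leaves out the global `install`/`refresh_workshop_items` mutexes. `may_start_build` and `may_start_server` are hypothetical names. `overlays_in_use` stands for the overlays in the blueprints of starting or running servers, reflecting the existing rule described under Risks.

```python
from typing import AbstractSet, Iterable


def may_start_build(overlay_id: int, running_builds: AbstractSet[int],
                    overlays_in_use: AbstractSet[int]) -> bool:
    """One running build per overlay, and none under a starting/running server."""
    return overlay_id not in running_builds and overlay_id not in overlays_in_use


def may_start_server(blueprint_overlay_ids: Iterable[int],
                     running_builds: AbstractSet[int]) -> bool:
    """Server start/init waits while any overlay in its blueprint is building."""
    return running_builds.isdisjoint(blueprint_overlay_ids)
```
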
## Daily Refresh — Removed

This iteration deletes the daily-refresh subsystem entirely:

- `deploy/files/usr/local/lib/systemd/system/left4me-refresh-global-overlays.timer` and `.service` — deleted.
- `flask refresh-global-overlays` CLI command in `l4d2web/cli.py` — deleted.
- No replacement timer, no replacement CLI, no `auto_refresh` column on `Overlay`.

The only build trigger after this change is the user clicking Rebuild on the detail page (or the auto-enqueue when they Save the script body). A scheduled-refresh feature is reintroduced in a future iteration designed against concrete operational needs.

## Risks

- **Sandbox escape via kernel bug.** `bwrap` has a strong track record but is not invulnerable. Mitigated by running as `l4d2-sandbox` with no privileged capabilities and with `no_new_privs` set by bwrap, so setuid binaries visible under the read-only `/usr` bind cannot elevate privileges. A successful escape would land in an unprivileged UID with no host secrets reachable.
- **Disk fill via runaway script.** A script that writes a 20 GB+ payload to `/overlay` succeeds inside the sandbox and only fails afterward at the post-build `du` check. Nothing stops the writes before then, and because builds never wipe automatically, the payload stays on disk until someone wipes the overlay. Not mitigated in v1: cgroup I/O controls throttle bandwidth and IOPS, not bytes stored, so they cannot cap file size. Accepted as a v1 trade-off; a future improvement is putting each overlay dir on its own filesystem with a quota.
- **Network exfiltration.** Script can connect to anything outbound, including internal IPs. Acceptable for the current trust model (semi-public; users have credentials). Egress firewall is out of scope.
- **Build-mid-server-running.** The scheduler refuses `build_overlay` for an overlay attached to a starting/running server (existing rule, unchanged). A user can still rebuild an overlay while servers whose blueprints do not include it are running.
- **Wipe race with running build.** The wipe endpoint refuses if a `build_overlay` for the overlay is running. Without this check, a wipe could blow away files mid-script and produce undefined results.
- **Stale `last_build_status`.** A row inserted via direct DB write or restored from backup could carry an `'ok'` status that no longer reflects reality. Treated as cosmetic; users can rebuild to refresh.
- **Sudoers misconfig.** A typo in the sudoers fragment could grant `left4me` more than intended. Mitigated by deploy-artifact tests asserting the exact expected lines.
- **DB row deletion racing the sandbox.** A user deleting an overlay while its build runs would invalidate the bind-mount target. Mitigated by the existing scheduler rule that tracks running overlays: delete should refuse while a build is running (existing pattern for workshop overlays; reuse it).
- **Migration drops globals tables.** Acceptable for the test deploy. Production rollout would need a different migration story; this spec explicitly assumes test-deploy DB wipe.

## Out Of Scope

- **Scheduled / daily refresh.** Intentionally removed in this iteration. Reintroduced later, designed against the use cases that emerge.
- **Per-overlay resource overrides.** All script overlays share the same 1 h / 4 GB / 20 GB envelope. If a real overlay needs more (l4d2center mirror at peak), revisit.
- **CodeMirror or other rich script editor.** Plain `<textarea>` in v1.
- **Egress allowlist / proxy.** No network restrictions on the sandbox in v1.
- **`$CACHE` scratch dir** persisted across builds. Users cache inside the overlay dir if they want; idempotency model is "script runs against existing dir."
- **Multi-tenant cgroup tree per user.** All sandboxes share the same cgroup-quota envelope.
- **Revision history on `script` column.** No `overlay_script_revisions` table; whatever's in the row is the current script.
- **Auto-seeding of l4d2center / cedapug equivalents.** Admin pastes the script post-deploy.
- **Migration that preserves existing global-map overlay rows.** Test deploy DB is wiped.
- **Container-per-build (podman / docker).** Heavier than `bwrap`; revisit only if multi-tenant escalates to "fully public sign-up."
- **left4me-aware helpers** (`workshop`, `download`, `extract`) inside the sandbox. Pure bash + host `/usr` only.

## Implementation Boundaries

- **`l4d2host` is unchanged.** The host library has no concept of overlay types and the mount layer (`KernelOverlayFSMounter`) doesn't care how the overlay dir got populated.
- **The `OverlayBuilder` Protocol is unchanged** — same `build(overlay, *, on_stdout, on_stderr, should_cancel)` signature. `ScriptBuilder` plugs into the existing registry.
- **The job worker model is unchanged.** Same operations, same logs, same SSE plumbing, same scheduler rules (minus the refresh_global_overlays entry).
- **No new application-level dependencies.** Vendored HTMX, no new Python packages. Two new system dependencies: `bubblewrap` apt package and the `l4d2-sandbox` system user.
- **No new config keys.** Same env files (`/etc/left4me/host.env`, `/etc/left4me/web.env`).
- **DB migration is destructive for global-maps overlay rows.** This is acceptable per the test-deploy assumption; a production-rollout follow-up would need to address it.
- The companion implementation plan governs task ordering and verification commands. Implementation must not start without explicit user approval per that plan's gate.