Compare commits
No commits in common. "master" and "harden-boundary-inputs" have entirely different histories.
master
...
harden-bou
334 changed files with 2282 additions and 79350 deletions
1
.envrc
|
|
@ -1 +0,0 @@
|
||||||
layout uv
|
|
||||||
6
.gitignore
vendored
|
|
@ -1,7 +1,5 @@
|
||||||
.worktrees/
|
.worktrees/
|
||||||
.claude/
|
|
||||||
.venv/
|
.venv/
|
||||||
.direnv/
|
|
||||||
.pytest_cache/
|
.pytest_cache/
|
||||||
__pycache__/
|
__pycache__/
|
||||||
*.pyc
|
*.pyc
|
||||||
|
|
@ -9,7 +7,3 @@ __pycache__/
|
||||||
l4d2web.db*
|
l4d2web.db*
|
||||||
# CocoIndex Code (ccc)
|
# CocoIndex Code (ccc)
|
||||||
/.cocoindex_code/
|
/.cocoindex_code/
|
||||||
.superpowers/
|
|
||||||
*.db
|
|
||||||
opencode.json
|
|
||||||
.tmp/
|
|
||||||
|
|
|
||||||
|
|
@ -1 +0,0 @@
|
||||||
3.13
|
|
||||||
132
AGENTS.md
|
|
@ -21,94 +21,6 @@ Do not invent architecture outside these plans unless explicitly requested.
|
||||||
### Workspace and tools
|
### Workspace and tools
|
||||||
|
|
||||||
- Do not use git worktrees.
|
- Do not use git worktrees.
|
||||||
- Repo is a uv workspace; Python is pinned to 3.13 via `.python-version`. After fresh checkout: install `uv` (`brew install uv` / `curl -LsSf https://astral.sh/uv/install.sh | sh`), then `direnv allow` (or `uv sync` directly). See README **Local development** for details.
|
|
||||||
|
|
||||||
### Modals: inline vs routed
|
|
||||||
|
|
||||||
Two coexisting modal mechanisms, one module (`l4d2web/l4d2web/static/js/modals.js`). When adding a new modal, decide which pipeline it belongs to:
|
|
||||||
|
|
||||||
**Inline modal** — the dialog markup is pre-rendered into the page HTML. Content is whatever's already there; the JS just calls `showModal()` / `close()` on a specific `<dialog>` by id. Use when:
|
|
||||||
- It's a confirmation (delete, overwrite, reset)
|
|
||||||
- It's a transient prompt mid-flow (conflict resolution during upload)
|
|
||||||
- It's a form whose URL state would be noise (rename, new-folder, new-server)
|
|
||||||
- The content has no standalone-page equivalent
|
|
||||||
|
|
||||||
Hooks: `<button data-inline-modal-open="<dialog-id>">` opens; `<button data-inline-modal-close>` inside the dialog closes; Esc and backdrop click also close. Programmatic: `window.modals.openInline(idOrEl)` / `window.modals.closeInline(idOrEl)`.
|
|
||||||
|
|
||||||
**Routed modal** — content is server-rendered from a URL and lands in the persistent `<dialog id="modal-container">` slot. URL gains `?modal=<path>`, refresh + share + back/forward all work. Use when:
|
|
||||||
- The content has standalone-page meaning (editor, detail view, settings panel)
|
|
||||||
- "Share this view" or "refresh-stays-here" matters
|
|
||||||
- The URL state earns its keep
|
|
||||||
|
|
||||||
Hooks: `<a data-routed-modal href="<path>">` opens (full-page nav fallback if JS fails); `<button data-routed-modal-dismiss>` inside the swapped content closes. Programmatic: `window.modals.openRouted(path)` / `window.modals.closeRouted()`.
|
|
||||||
|
|
||||||
**Conventions for routed-modal templates** (templates that `{% extends base_layout %}`, where `base_layout` resolves to `_modal_partial.html` for `HX-Modal: 1` requests and `base.html` otherwise — see `app.py:inject_base_layout`):
|
|
||||||
|
|
||||||
- **The outermost element of `{% block content %}` is a `<div>`, NOT a `<dialog>`.** The persistent slot in `base.html` already provides top-layer + backdrop + focus-trap + Esc-to-close semantics. Nested `<dialog>` collapses to 2 px in every browser.
|
|
||||||
- **Close buttons use `data-routed-modal-dismiss`** (NOT the inline-modal attribute). `modals.js` delegates at document level.
|
|
||||||
- **Form-bearing content needs document-level event delegation** for submit/save/delete, gated on `event.target.closest("#modal-content")`. Direct binding to elements in the swapped-in fragment only works in standalone mode — HTMX-swapped content arrives as fresh DOM nodes with no listeners attached. See `static/js/files-overlay/editor.js`'s document-level click listener + the `routedSaveClicked` / `routedReplaceClicked` / `routedDeleteClicked` functions for the canonical pattern (read `data-*` attributes from the swapped DOM, NOT from JS state set during open).
|
|
||||||
- **CSS classes targeting modal chrome are scoped to the outer slot** — `dialog.modal, div.modal` in `components.css`. The inner content div should NOT carry `class="modal modal-wide"` (the outer dialog owns chrome; otherwise both paint card-in-a-card).
|
|
||||||
|
|
||||||
**Reference:** `docs/superpowers/specs/2026-05-17-url-addressable-modals-design.md` (design + verification matrix) and the plan errata at the top of `docs/superpowers/plans/2026-05-17-url-addressable-modals.md`.
|
|
||||||
|
|
||||||
### Files overlay: module layout
|
|
||||||
|
|
||||||
The file-manager JS for files-type overlays is split across four
|
|
||||||
modules under `l4d2web/l4d2web/static/js/files-overlay/`, all loaded
|
|
||||||
with `defer` from `templates/overlay_detail.html`. They cooperate via
|
|
||||||
the `window.__filesOverlay` action registry that `core.js` sets up:
|
|
||||||
|
|
||||||
- **`core.js`** — manager-element detection (`.files-manager` guard),
|
|
||||||
derived state (`overlayId`, `baseUrl`, `treeRoot`, `csrfToken`),
|
|
||||||
shared helpers (`joinPath`, `parentOf`, `basename`, `humanSize`,
|
|
||||||
`fetchJson`, `postJson`, `postForm`, `refreshFolder`,
|
|
||||||
`findRowByPath`, `cssEscape`, `scheduleRefresh`), and the
|
|
||||||
document-level click listener that dispatches `[data-action]`
|
|
||||||
clicks through `__filesOverlay.handleAction(op, path, actionEl)`
|
|
||||||
into per-feature handlers.
|
|
||||||
- **`editor.js`** — URL-addressable editor only. Handles the new-file
|
|
||||||
route (`/files/new?at=...`), edit route for text + binary
|
|
||||||
(`/files/edit?path=...`), and the save / replace / delete delegated
|
|
||||||
click handlers scoped to `#modal-content`. Registers `"new-file"`
|
|
||||||
and `"edit"` into the registry.
|
|
||||||
- **`dialogs.js`** — the three inline `<dialog>` modals (new-folder,
|
|
||||||
delete-confirm, conflict). Module-scope state per dialog (one
|
|
||||||
delegated listener each, no clone-and-rebind). Exposes
|
|
||||||
`askConflict(path) → Promise<"overwrite"|"keep-both"|"cancel">`
|
|
||||||
on `__filesOverlay` for use by editor.js + uploads.js. Registers
|
|
||||||
`"new-folder"` and `"delete"` into the registry.
|
|
||||||
- **`uploads.js`** — upload queue (concurrency 3, XHR-based progress,
|
|
||||||
`data-upload-id` delegated cancel), drag-drop on `treeRoot`
|
|
||||||
(direct-bound — 5 coordinated events share highlight state), and
|
|
||||||
the `"zip"` registry handler. Exposes
|
|
||||||
`withCollisionSuffix(path) → suffixedPath` for the upload + save
|
|
||||||
conflict paths. Drag-drop on `treeRoot` is the **only** direct-bound
|
|
||||||
listener block in the four modules; everything else is document-level
|
|
||||||
delegation (see escape-hatch comments in-source).
|
|
||||||
|
|
||||||
When adding a new file-row action, the contract is:
|
|
||||||
|
|
||||||
1. Render the `<button data-action="my-op" data-target-path="...">` in
|
|
||||||
`templates/_overlay_file_node.html` (gated on the right capability
|
|
||||||
flag).
|
|
||||||
2. Pick the module that owns the action and register a handler:
|
|
||||||
`fo.registerHandler("my-op", (path, actionEl) => { ... })`.
|
|
||||||
3. The dispatch wiring in `core.js` takes care of catching the click
|
|
||||||
and calling the handler. No new listeners needed.
|
|
||||||
|
|
||||||
### Dev server and filesystem paths
|
|
||||||
|
|
||||||
- **Production paths (`/var/lib/left4me`, `/usr/local/lib/systemd/system`, `/usr/local/libexec/left4me`, `/etc/left4me`) exist only on Linux deploy hosts.** Never create or write to these on a developer machine. They are referenced in `l4d2host/l4d2host/paths.py` and the spec only as the production layout.
|
|
||||||
- **For local dev, always use `scripts/dev-server.py`.** It sets `LEFT4ME_ROOT=./.tmp/dev-server`, runs migrations, seeds demo content (admin + blueprint + script overlay + files overlay), and starts Flask on port 5051. Reset state with `rm -rf .tmp/dev-server` then re-run. Never invoke `flask run` directly — that leaves `LEFT4ME_ROOT` unset and the app falls back to the production `/var/lib/left4me`, which on macOS surfaces as "route returns 404 / empty modal / file not found" and can be mistaken for a code bug.
|
|
||||||
- **All ephemeral dev state lives under `.tmp/`** (gitignored). Use `$TMPDIR` only for transient files outside the repo. Do NOT use `/tmp`, `~/Library/Application Support`, or any system path for project state — only `.tmp/` (project-local) or `$TMPDIR` (sandbox-blessed).
|
|
||||||
- **Symptom-to-cause translation:** if a route returns 404 or behaves as if the filesystem is empty, the first diagnosis is "`LEFT4ME_ROOT` is wrong" (defaulted to the production path), not "code bug." Restart via `scripts/dev-server.py`.
|
|
||||||
|
|
||||||
### Planning artifacts
|
|
||||||
|
|
||||||
- Design specs live in `docs/superpowers/specs/` as `YYYY-MM-DD-<topic>-design.md`.
|
|
||||||
- Implementation plans live in `docs/superpowers/plans/` as `YYYY-MM-DD-<topic>.md` (suffix the topic with `-v1`/`-v2`/etc. if a plan is versioned).
|
|
||||||
- Commit both to git as soon as the user approves them.
|
|
||||||
- Do not leave specs or plans outside this repo. The `~/.claude/plans/<slug>.md` plan-mode scratch file is acceptable while plan mode is open; the persisted artifact must end up under `docs/superpowers/` and be committed.
|
|
||||||
|
|
||||||
### Naming and boundaries
|
### Naming and boundaries
|
||||||
|
|
||||||
|
|
@ -129,7 +41,7 @@ When adding a new file-row action, the contract is:
|
||||||
- `logs <name> --lines <n> --follow/--no-follow`
|
- `logs <name> --lines <n> --follow/--no-follow`
|
||||||
- Runtime paths are rooted at `LEFT4ME_ROOT`, defaulting to `/var/lib/left4me`.
|
- Runtime paths are rooted at `LEFT4ME_ROOT`, defaulting to `/var/lib/left4me`.
|
||||||
- Deployment/config management owns global units under `/usr/local/lib/systemd/system` and privileged helpers under `/usr/local/libexec/left4me`.
|
- Deployment/config management owns global units under `/usr/local/lib/systemd/system` and privileged helpers under `/usr/local/libexec/left4me`.
|
||||||
- Overlay directories are populated by the web app (workshop downloads, managed-global refresh). The host library only mounts them.
|
- Overlays are external directories (no overlay content management here).
|
||||||
- Fail-fast subprocess behavior; pass raw stderr; propagate return code.
|
- Fail-fast subprocess behavior; pass raw stderr; propagate return code.
|
||||||
- No lock manager, no rollback, no preflight runtime checks.
|
- No lock manager, no rollback, no preflight runtime checks.
|
||||||
- Delete missing instance/runtime dirs must succeed (no-op).
|
- Delete missing instance/runtime dirs must succeed (no-op).
|
||||||
|
|
@ -177,45 +89,3 @@ Typical commands (once components exist):
|
||||||
|
|
||||||
- If a requested change conflicts with this file, follow explicit user instruction.
|
- If a requested change conflicts with this file, follow explicit user instruction.
|
||||||
- If plans and code diverge, update plans or flag the mismatch clearly.
|
- If plans and code diverge, update plans or flag the mismatch clearly.
|
||||||
|
|
||||||
## End-to-end tests
|
|
||||||
|
|
||||||
The Playwright-based browser tests under `l4d2web/tests/e2e/` need a
|
|
||||||
chromium binary, fetched on first setup:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
uv run playwright install chromium
|
|
||||||
```
|
|
||||||
|
|
||||||
Always invoke as `uv run pytest -m e2e ...` (excluded from the default
|
|
||||||
fast suite via the `e2e` marker). Other forms crash Chromium under the
|
|
||||||
macOS sandbox; only this exact invocation is exempt.
|
|
||||||
|
|
||||||
## Editor bundle (CodeMirror 6)
|
|
||||||
|
|
||||||
The in-browser code editor on the blueprint config / overlay script /
|
|
||||||
files-modal textareas is bundled from `l4d2web/scripts/editor-src/`
|
|
||||||
via esbuild and committed pre-built to
|
|
||||||
`l4d2web/l4d2web/static/vendor/editor.bundle.js`. Source lives under
|
|
||||||
`l4d2web/scripts/editor-src/`; design and plan at
|
|
||||||
`docs/superpowers/specs/2026-05-17-textarea-editor-v2-design.md` and
|
|
||||||
`docs/superpowers/plans/2026-05-17-textarea-editor-v2.md`.
|
|
||||||
|
|
||||||
Rebuild after editing the source:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
./l4d2web/scripts/build-editor.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
Requires `node` + `npm` locally. The script overrides the npm cache to
|
|
||||||
`$TMPDIR/npm-cache` (set `NPM_CACHE` to override) to dodge root-owned
|
|
||||||
files in `~/.npm/_cacache/` from older npm versions. Commit the
|
|
||||||
regenerated `editor.bundle.js`, `editor.bundle.css`, and
|
|
||||||
`editor.bundle.sha256` alongside any source change.
|
|
||||||
|
|
||||||
Regenerate the autocomplete vocab from `./cvar_list` (live L4D2
|
|
||||||
cvarlist dump committed at repo root) after replacing the dump:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
./l4d2web/scripts/build-vocab.py
|
|
||||||
```
|
|
||||||
|
|
|
||||||
16
README.md
|
|
@ -27,7 +27,7 @@ Implementation plans remain the source of truth for architecture and task sequen
|
||||||
- `logs <name> --lines <n> --follow/--no-follow`
|
- `logs <name> --lines <n> --follow/--no-follow`
|
||||||
- The web app calls host operations through `l4d2ctl`, not direct `l4d2host` imports.
|
- The web app calls host operations through `l4d2ctl`, not direct `l4d2host` imports.
|
||||||
- Deployment uses `/var/lib/left4me` for runtime state, `/opt/left4me` for repository contents and the virtualenv, `/etc/left4me` for environment files, and global units under `/usr/local/lib/systemd/system`.
|
- Deployment uses `/var/lib/left4me` for runtime state, `/opt/left4me` for repository contents and the virtualenv, `/etc/left4me` for environment files, and global units under `/usr/local/lib/systemd/system`.
|
||||||
- Overlay handling is directory-based; the web app populates each overlay (workshop downloads, managed-global refresh).
|
- Overlay handling is directory-based and externally populated.
|
||||||
- No lock manager, no rollback, no preflight checks in host library.
|
- No lock manager, no rollback, no preflight checks in host library.
|
||||||
- CLI propagates subprocess failures via stderr and return code.
|
- CLI propagates subprocess failures via stderr and return code.
|
||||||
- `delete` on missing instance is no-op success.
|
- `delete` on missing instance is no-op success.
|
||||||
|
|
@ -50,23 +50,13 @@ Implementation plans remain the source of truth for architecture and task sequen
|
||||||
|
|
||||||
See `deploy/README.md` for the Linux test deployment contract, including the runtime user, target filesystem layout, systemd units, privileged helpers, sudoers rules, admin bootstrap, and overlay reference rules.
|
See `deploy/README.md` for the Linux test deployment contract, including the runtime user, target filesystem layout, systemd units, privileged helpers, sudoers rules, admin bootstrap, and overlay reference rules.
|
||||||
|
|
||||||
## Local development
|
|
||||||
|
|
||||||
This repo is a [uv](https://docs.astral.sh/uv/) workspace (`l4d2host` + `l4d2web` as members) with a committed `uv.lock` and a `.python-version` pinning Python 3.13 (matching the Debian Trixie production target).
|
|
||||||
|
|
||||||
One-time prereq: install `uv` (macOS: `brew install uv`; Linux: `curl -LsSf https://astral.sh/uv/install.sh | sh` — `uv` is not yet in Debian stable's apt).
|
|
||||||
|
|
||||||
1. `direnv allow` once per fresh checkout (and after any `.envrc` change). `.envrc` uses `use uv`, which runs `uv sync` and activates `.venv/` on `cd`.
|
|
||||||
2. Without direnv: `uv sync` at the repo root creates `.venv/`, installs both workspace members editable, and pulls in dev deps (pytest) from the lockfile.
|
|
||||||
3. Tests: `uv run pytest` (or just `pytest` once the venv is on PATH).
|
|
||||||
|
|
||||||
## Tech Stack (planned)
|
## Tech Stack (planned)
|
||||||
|
|
||||||
- Python 3.13+ (workspace uses uv + hatchling)
|
- Python 3.12+
|
||||||
- Typer, PyYAML, pytest
|
- Typer, PyYAML, pytest
|
||||||
- Flask, SQLAlchemy, Alembic
|
- Flask, SQLAlchemy, Alembic
|
||||||
- HTMX (vendored locally), custom CSS, SSE
|
- HTMX (vendored locally), custom CSS, SSE
|
||||||
- systemd units, kernel overlayfs (mounted via the `left4me-overlay` privileged helper), steamcmd
|
- systemd user units, fuse-overlayfs, steamcmd
|
||||||
|
|
||||||
## Recommended Implementation Order
|
## Recommended Implementation Order
|
||||||
|
|
||||||
|
|
|
||||||
318
deploy/README.md
|
|
@ -1,133 +1,66 @@
|
||||||
# left4me deploy — reference exemplar
|
# left4me Deployment
|
||||||
|
|
||||||
> The canonical deploy of `ovh.left4me` is driven by
|
This directory contains the production-like test deployment for a Linux server. It installs the repository into a fixed host layout, configures a dedicated runtime user, installs systemd units, and wires the web app to host operations through privileged helper commands.
|
||||||
> [ckn-bw](https://git.sublimity.de/cronekorkn/ckn-bw)'s `bundles/left4me/`
|
|
||||||
> (attached via `groups/applications/left4me.py`); run `bw apply ovh.left4me`
|
|
||||||
> from the ckn-bw repo to deploy.
|
|
||||||
>
|
|
||||||
> **`deploy/files/` is the canonical source of truth** for static deployment
|
|
||||||
> artifacts — sudoers, sysctl drop-in, and hardening drop-ins for the
|
|
||||||
> systemd service units. ckn-bw delivers these via **target-side symlinks**
|
|
||||||
> from their on-host paths into `/opt/left4me/src/deploy/files/...` (safe
|
|
||||||
> because `/opt/left4me/src` is root-owned at runtime; the application cannot
|
|
||||||
> rewrite its own deployment artifacts).
|
|
||||||
>
|
|
||||||
> **`deploy/scripts/` is the canonical source of truth** for privileged
|
|
||||||
> helpers. ckn-bw creates target-side symlinks from
|
|
||||||
> `/usr/local/{libexec/left4me,sbin}/` into
|
|
||||||
> `/opt/left4me/src/deploy/scripts/{libexec,sbin}/` after `git_deploy`.
|
|
||||||
>
|
|
||||||
> What remains under `deploy/files/usr/local/lib/systemd/system/` is a set
|
|
||||||
> of **reference fixtures** — a curated subset of the systemd units ckn-bw's
|
|
||||||
> reactor emits at apply time. They exist so a fresh consumer (other than
|
|
||||||
> ckn-bw) can read this tree and understand the live unit shape, and so that
|
|
||||||
> `deploy/tests/test_example_units.py` can assert the reference matches the
|
|
||||||
> live form. The live base units are emitted by ckn-bw's `systemd/units`
|
|
||||||
> reactor with per-host CPU pinning and worker counts; the reference files
|
|
||||||
> must not include hardening directives (those live in the drop-ins, not the
|
|
||||||
> base units).
|
|
||||||
|
|
||||||
## What's here
|
## Target Layout
|
||||||
|
|
||||||
| Path | Role |
|
The deployment uses these paths:
|
||||||
|---|---|
|
|
||||||
| `files/etc/sudoers.d/left4me` | **Canonical** sudoers grants. Symlinked to `/etc/sudoers.d/left4me`. CI syntax test: `tests/test_sudoers.py`. |
|
|
||||||
| `files/etc/sysctl.d/99-left4me.conf` | **Canonical** sysctl drop-in (UDP buffers, fq_codel + BBR, `kernel.yama.ptrace_scope=2`). Symlinked to `/etc/sysctl.d/99-left4me.conf`. |
|
|
||||||
| `files/etc/systemd/system/left4me-web.service.d/10-hardening.conf` | **Canonical** hardening drop-in for `left4me-web.service`. Symlinked to the same on-host path. |
|
|
||||||
| `files/etc/systemd/system/left4me-server@.service.d/10-hardening.conf` | **Canonical** hardening drop-in for `left4me-server@.service`. Symlinked to the same on-host path. |
|
|
||||||
| `files/etc/left4me/sandbox-resolv.conf` | Example `/etc/resolv.conf` bound into the script-overlay sandbox (delivered as a bw `files{}` item, not a symlink). |
|
|
||||||
| `files/usr/local/lib/systemd/system/left4me-web.service` | **Reference fixture** — the web-app unit the reactor emits (per-host worker/thread counts omitted). |
|
|
||||||
| `files/usr/local/lib/systemd/system/left4me-server@.service` | **Reference fixture** — the per-instance gameserver unit template the reactor emits. |
|
|
||||||
| `files/usr/local/lib/systemd/system/left4me-workshop-refresh.{service,timer}` | **Reference fixture** — the daily workshop-refresh cron-equivalent. |
|
|
||||||
| `files/usr/local/lib/systemd/system/l4d2-{game,build}.slice` | **Reference fixture** — slice definitions (CPU/IO weights; reactor fills in `AllowedCPUs=` from host metadata). |
|
|
||||||
| `scripts/libexec/{left4me-overlay,left4me-systemctl,left4me-journalctl,left4me-script-sandbox}` | **Canonical** privileged helper commands. Symlinked under `/usr/local/libexec/left4me/`. |
|
|
||||||
| `scripts/sbin/left4me` | **Canonical** admin CLI wrapper. Symlinked to `/usr/local/sbin/left4me`. |
|
|
||||||
| `templates/etc/left4me/host.env` | Example host-library env (deployment-fixed paths). |
|
|
||||||
| `templates/etc/left4me/web.env.template` | Example web-app env. ckn-bw renders the real version via the matching Mako template in `bundles/left4me/files/etc/left4me/web.env.mako`. |
|
|
||||||
| `tests/test_example_units.py` | Locks down the reference units and env templates above; also asserts hardening drop-in shape. |
|
|
||||||
| `tests/test_sudoers.py` | Runs `visudo -cf` against the sudoers file in CI. |
|
|
||||||
|
|
||||||
## Target layout
|
- `/etc/left4me/host.env`: host library environment configuration.
|
||||||
|
- `/etc/left4me/web.env`: web app environment configuration.
|
||||||
|
- `/opt/left4me/.venv`: Python virtual environment for deployed commands.
|
||||||
|
- `/opt/left4me`: deployed repository contents.
|
||||||
|
- `/var/lib/left4me/left4me.db`: SQLite database used by the web app.
|
||||||
|
- `/var/lib/left4me/installation`: shared L4D2 installation.
|
||||||
|
- `/var/lib/left4me/overlays`: externally managed overlay directories.
|
||||||
|
- `/var/lib/left4me/instances`: rendered instance specifications and per-instance state.
|
||||||
|
- `/var/lib/left4me/runtime`: per-instance runtime mount directories.
|
||||||
|
- `/var/lib/left4me/tmp`: temporary files used by deployment/runtime operations.
|
||||||
|
- `/usr/local/lib/systemd/system`: global systemd unit files, including `left4me-server@.service`.
|
||||||
|
- `/usr/local/libexec/left4me`: privileged helper commands, including `left4me-systemctl` and `left4me-journalctl`.
|
||||||
|
- `/etc/sudoers.d/left4me`: sudoers rules allowing the web/runtime commands to call the helpers non-interactively.
|
||||||
|
|
||||||
The deployment uses these on-host paths (FHS-aligned):
|
Static units are generated for `/var/lib/left4me`. If `LEFT4ME_ROOT` changes, regenerate and reinstall the unit files instead of reusing the existing static units.
|
||||||
|
|
||||||
- `/etc/left4me/host.env` — host library environment configuration.
|
## Runtime User
|
||||||
- `/etc/left4me/web.env` — web app environment configuration.
|
|
||||||
- `/etc/left4me/sandbox-resolv.conf` — DNS resolv.conf bound into the
|
|
||||||
script-overlay sandbox.
|
|
||||||
- `/etc/sudoers.d/left4me` — sudoers rules letting the `left4me` uid call
|
|
||||||
the privileged helpers non-interactively.
|
|
||||||
- `/etc/sysctl.d/99-left4me.conf` — perf-baseline sysctls.
|
|
||||||
- `/opt/left4me/src` — deployed repository contents (via ckn-bw
|
|
||||||
`git_deploy`). Root-owned; read-only at runtime. `/opt/left4me/`
|
|
||||||
itself is also root-owned and contains only `src/`.
|
|
||||||
- `/var/lib/left4me/.venv` — Python virtual environment for the web app
|
|
||||||
(non-editable install of `l4d2host` + `l4d2web`).
|
|
||||||
- `/var/lib/left4me/steam` — steamcmd install (self-updates).
|
|
||||||
- `/var/lib/left4me/left4me.db` — SQLite database used by the web app.
|
|
||||||
- `/var/lib/left4me/installation` — shared L4D2 installation.
|
|
||||||
- `/var/lib/left4me/overlays` — overlay directories. Each overlay lives
|
|
||||||
at `${overlay_id}` under here.
|
|
||||||
- `/var/lib/left4me/workshop_cache` — deduplicated cache of `.vpk` files
|
|
||||||
downloaded for workshop overlays. One file per Steam item, named
|
|
||||||
`{steam_id}.vpk`. Workshop overlays symlink into this tree.
|
|
||||||
- `/var/lib/left4me/instances` — rendered instance specifications and
|
|
||||||
per-instance state.
|
|
||||||
- `/var/lib/left4me/runtime` — per-instance runtime mount directories.
|
|
||||||
- `/var/lib/left4me/tmp` — temporary files used by deployment/runtime
|
|
||||||
operations (incl. idmap staging binds).
|
|
||||||
- `/usr/local/lib/systemd/system/` — global systemd unit files emitted
|
|
||||||
by ckn-bw's `systemd_units` reactor.
|
|
||||||
- `/usr/local/libexec/left4me/` — privileged helper commands, symlinked
|
|
||||||
from `deploy/scripts/libexec/`.
|
|
||||||
- `/usr/local/sbin/left4me` — admin CLI wrapper, symlinked from
|
|
||||||
`deploy/scripts/sbin/left4me`.
|
|
||||||
|
|
||||||
## Runtime users
|
The deployment creates and runs host operations as the dedicated runtime user:
|
||||||
|
|
||||||
One system user does everything:
|
- Username: `left4me`
|
||||||
|
- Home: `/var/lib/left4me`
|
||||||
|
- Shell: `/usr/sbin/nologin`
|
||||||
|
|
||||||
- **`left4me`** (home `/var/lib/left4me`, shell `/usr/sbin/nologin`):
|
## Running A Test Deployment
|
||||||
web app, host library, gameserver runtime, and script-overlay
|
|
||||||
sandbox. The sandbox unit drops privileges via `systemd-run` and
|
|
||||||
runs the user-authored bash inside a fully hardened transient
|
|
||||||
service (see `deploy/scripts/libexec/left4me-script-sandbox`). Same-uid
|
|
||||||
attack surface — sandbox escape reaching `web.env`, the SQLite DB,
|
|
||||||
or running gameservers — is closed by that hardening profile plus
|
|
||||||
system-wide `kernel.yama.ptrace_scope=2`, rather than by a uid
|
|
||||||
boundary.
|
|
||||||
|
|
||||||
The user-count decision and its history live in
|
Run the deployment from the repository root:
|
||||||
`docs/superpowers/specs/2026-05-15-user-uid-split-design.md`.
|
|
||||||
|
|
||||||
## Deployment
|
```bash
|
||||||
|
deploy/deploy-test-server.sh deploy-user@example-host
|
||||||
Production deploy:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
# In the ckn-bw repo:
|
|
||||||
bw apply ovh.left4me
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Admin bootstrap is a manual one-time step after the first apply
|
The SSH user must be able to run `sudo` on the target host. The deployment configures system packages, directories, environment files, helper scripts, sudoers rules, Python dependencies, and systemd units.
|
||||||
(ckn-bw deliberately doesn't seed an admin to keep credentials out of
|
|
||||||
the metadata pipeline):
|
|
||||||
|
|
||||||
```sh
|
## Admin Bootstrap
|
||||||
sudo -u left4me LEFT4ME_ADMIN_USERNAME=admin \
|
|
||||||
|
Set the bootstrap credentials in the environment when creating the first admin user:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
LEFT4ME_ADMIN_USERNAME=admin \
|
||||||
LEFT4ME_ADMIN_PASSWORD='change-me' \
|
LEFT4ME_ADMIN_PASSWORD='change-me' \
|
||||||
/var/lib/left4me/.venv/bin/flask --app l4d2web.app:create_app \
|
flask create-user "$LEFT4ME_ADMIN_USERNAME" --admin
|
||||||
create-user "$LEFT4ME_ADMIN_USERNAME" --admin
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Rotate the bootstrap password after first login.
|
Use a strong one-time password and rotate it after first login if needed.
|
||||||
|
|
||||||
## Overlay references
|
## Overlay References
|
||||||
|
|
||||||
Overlay references are relative paths below `${LEFT4ME_ROOT}/overlays`.
|
Overlay references are relative paths below `${LEFT4ME_ROOT}/overlays`. With the default deployment root, they resolve under `/var/lib/left4me/overlays`.
|
||||||
With the default deployment root, they resolve under
|
|
||||||
`/var/lib/left4me/overlays`. New overlays use `${overlay_id}` as their
|
Valid examples:
|
||||||
path; the digit-only form is the only one created by the web app.
|
|
||||||
|
- `standard`
|
||||||
|
- `competitive/base`
|
||||||
|
- `users/42/custom`
|
||||||
|
|
||||||
Invalid references are rejected:
|
Invalid references are rejected:
|
||||||
|
|
||||||
|
|
@ -136,165 +69,4 @@ Invalid references are rejected:
|
||||||
- Empty path components such as `competitive//base`.
|
- Empty path components such as `competitive//base`.
|
||||||
- Symlink escapes that resolve outside `${LEFT4ME_ROOT}/overlays`.
|
- Symlink escapes that resolve outside `${LEFT4ME_ROOT}/overlays`.
|
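The rejection rules for overlay references can be sketched as a single resolver. This is an illustration under stated assumptions, not the host library's implementation: `resolve_overlay_ref` is a hypothetical name, and rejecting `.` and `..` components outright is an assumption, since part of the rule list falls outside this hunk.

```python
from pathlib import Path


def resolve_overlay_ref(root: Path, ref: str) -> Path:
    """Resolve an overlay reference below <root>/overlays or raise ValueError."""
    overlays = (root / "overlays").resolve()
    parts = ref.split("/")
    # Empty components cover "", "a//b", a leading "/" and a trailing "/".
    if any(part == "" for part in parts):
        raise ValueError(f"empty path component in {ref!r}")
    # Assumption: "." and ".." are rejected rather than normalised.
    if any(part in (".", "..") for part in parts):
        raise ValueError(f"relative component in {ref!r}")
    # resolve() follows symlinks, so a link pointing elsewhere is caught below.
    candidate = overlays.joinpath(*parts).resolve()
    if not candidate.is_relative_to(overlays):
        raise ValueError(f"{ref!r} resolves outside {overlays}")
    return candidate
```

Resolving both the overlays directory and the candidate before comparing them keeps the check correct when the root itself sits behind a symlink.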
||||||
|
|
||||||
The web app currently supports two overlay surfaces:
|
Overlay content is external to the host library and deployment contract. Populate overlay directories separately before referencing them from blueprints or instance specs.
|
||||||
|
|
||||||
- **`workshop` overlays** (user-owned) — populated by downloading
|
|
||||||
`.vpk` files from the public Steam Web API into
|
|
||||||
`${LEFT4ME_ROOT}/workshop_cache/{steam_id}.vpk` and creating absolute
|
|
||||||
symlinks under
|
|
||||||
`${LEFT4ME_ROOT}/overlays/{overlay_id}/left4dead2/addons/{steam_id}.vpk`.
|
|
||||||
- **`script` overlays** — populated by an arbitrary user-authored bash
|
|
||||||
script that runs inside `systemd-run` as `left4me` (under a fully
|
|
||||||
hardened transient service unit), with the overlay directory
|
|
||||||
bind-mounted RW at `/overlay`. Resource caps: 1h walltime, 4 GB RAM,
|
|
||||||
512 tasks, 200% CPU, 20 GB post-build disk cap.
|
|
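The workshop layout described above (one cached `.vpk` per Steam item, linked into each overlay by an absolute symlink) might look like this in code. `link_workshop_item` is a hypothetical helper, and replacing an existing link in place is an assumption about how refreshes are handled.

```python
from pathlib import Path


def link_workshop_item(root: Path, overlay_id: int, steam_id: int) -> Path:
    """Symlink overlays/<overlay_id>/left4dead2/addons/<steam_id>.vpk to the cache."""
    # The link target is absolute, as the deploy README specifies.
    cached = (root / "workshop_cache" / f"{steam_id}.vpk").absolute()
    addons = root / "overlays" / str(overlay_id) / "left4dead2" / "addons"
    addons.mkdir(parents=True, exist_ok=True)
    link = addons / f"{steam_id}.vpk"
    # Assumption: a stale link is replaced so the call is idempotent.
    if link.is_symlink():
        link.unlink()
    link.symlink_to(cached)
    return link
```

Because every overlay links to the same cached file, an item used by several overlays is downloaded and stored once.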
||||||
|
|
||||||
Both caches and overlay directories are owned by `left4me`. If the web
|
|
||||||
service ever runs as a different uid, ensure it shares a group with the
|
|
||||||
host process and that both trees are group-readable.
|
|
||||||
|
|
||||||
## Performance tuning
|
|
||||||
|
|
||||||
The deployment ships a host-side perf baseline (slices, unit directives,
|
|
||||||
sysctls). See
|
|
||||||
`docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md`
|
|
||||||
for design rationale.
|
|
||||||
|
|
||||||
The knobs below are documented escape hatches — **not** auto-applied.
|
|
||||||
Apply only after measuring a need and understanding the failure modes.
|
|
||||||
|
|
||||||
### Network shaping
|
|
||||||
|
|
||||||
Three pieces of the baseline affect player-experience network behaviour:
|
|
||||||
|
|
||||||
1. **Per-flow marking.** ckn-bw's central `bundles/nftables/` consumes
|
|
||||||
left4me's nftables defaults and marks every UDP packet from uid
|
|
||||||
`left4me` with DSCP EF and `skb->priority` 6. srcds doesn't set
|
|
||||||
these itself, so without this rule its UDP is indistinguishable
|
|
||||||
from any other flow.
|
|
||||||
2. **Sysctl baseline.** `99-left4me.conf` sets `udp_rmem_min=16384`,
|
|
||||||
`udp_wmem_min=16384`, `default_qdisc=fq_codel`, and
|
|
||||||
`tcp_congestion_control=bbr`. Reduces head-of-line blocking when
|
|
||||||
bulk TCP egress coexists with game UDP.
|
|
||||||
3. **CAKE egress shaping.** Configured per-interface via systemd-networkd
|
|
||||||
metadata (`network/<iface>/cake` in ckn-bw's `bundles/network/`),
|
|
||||||
which reapplies the CAKE qdisc across iface lifecycle events. Set
|
|
||||||
the declared bandwidth to ≈95% of measured uplink — CAKE only shapes
|
|
||||||
if its declared bandwidth is *below* the real bottleneck. Idle links
|
|
||||||
with no competing egress see no visible CAKE effect; the win
|
|
||||||
materialises under bulk traffic that would otherwise bufferbloat the
|
|
||||||
link the players share.
|
|
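The "about 95% of measured uplink" rule for CAKE is simple arithmetic. The sketch below uses a hypothetical helper name and integer kbit/s units; both are assumptions made for illustration.

```python
def cake_bandwidth_kbit(measured_uplink_kbit: int, percent: int = 95) -> int:
    """Return the bandwidth to declare to CAKE, below the measured uplink."""
    if measured_uplink_kbit <= 0:
        raise ValueError("measured uplink must be positive")
    if not 0 < percent < 100:
        raise ValueError("percent must be between 1 and 99")
    # Integer arithmetic avoids float rounding in the declared value.
    return measured_uplink_kbit * percent // 100
```

A measured 100 Mbit/s uplink (`100_000` kbit/s) gives a declared bandwidth of `95_000` kbit/s, which keeps the queue inside CAKE rather than in the upstream bottleneck.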
||||||
|
|
||||||
### CPU governor

The performance governor squeezes a few percent off jitter under bursty
load. `schedutil` is acceptable for sustained UDP workloads.

```sh
sudo cpupower frequency-set -g performance
```

Install via `sudo apt install linux-cpupower` if the binary isn't
present. Persist via your distro's CPU-frequency tooling (e.g.
`/etc/default/cpufrequtils`).

### CPU isolation (cores)

The deploy writes four `AllowedCPUs=` drop-ins so that by default only
`l4d2-game.slice` is allowed to run on cores `1..N-1`; `system.slice`,
`user.slice`, and `l4d2-build.slice` are pinned to core 0. Game servers
get the host minus core 0 exclusively; the build sandbox and the web
app stay on core 0; a logged-in admin running CPU-heavy work in their
shell can't steal cycles from a live match. Single-core hosts skip the
cpuset drop-ins entirely; the rest of the perf baseline (cgroup
weights, sysctls, OOM scores) still applies.

Per-instance `CPUAffinity=` (next subsection) composes on top of this:
the per-instance value must be a subset of `l4d2-game.slice`'s
`AllowedCPUs=`, which the kernel enforces.

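
The `1..N-1` split can be illustrated with a short sketch. The function name `game_slice_cpus` is hypothetical; the deploy's own logic is not shown in this document and may differ.

```sh
# Hypothetical helper: print an AllowedCPUs= value for l4d2-game.slice
# on a host with the given core count. Prints nothing on a single-core
# host, mirroring the "skip the cpuset drop-ins" rule described above.
game_slice_cpus() {
    cores=$1
    if [ "$cores" -le 1 ]; then
        return 0
    elif [ "$cores" -eq 2 ]; then
        echo 1
    else
        echo "1-$(( cores - 1 ))"
    fi
}

# Example: a 4-core host leaves core 0 to everything else.
game_slice_cpus 4
```

On a live host the argument could come from `nproc`.
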
### Per-instance CPU affinity

`srcds` is single-threaded per instance. On a multi-core host, pinning
each instance to its own core can cut jitter under contention. Drop in
`/etc/systemd/system/left4me-server@<name>.service.d/affinity.conf`:

```ini
[Service]
CPUAffinity=2
```

This pins the instance to CPU 2. A reasonable strategy on an N-core
host: leave core 0 for the kernel + IRQs + system services, then pin
one instance per remaining core.

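
One way to apply the "one instance per remaining core" strategy is to derive the core from an instance index. This is a sketch only: `instance_core` is a hypothetical name, and it assumes at least two cores and 0-based instance numbering.

```sh
# Hypothetical helper: map the i-th instance (0-based) onto cores 1..N-1,
# wrapping around when there are more instances than game cores.
# Requires cores >= 2; core 0 is never returned.
instance_core() {
    index=$1
    cores=$2
    echo $(( 1 + index % (cores - 1) ))
}

# Print a drop-in body for the third instance on a 4-core host.
printf '[Service]\nCPUAffinity=%s\n' "$(instance_core 2 4)"
```

Wrapping places two instances on the same core once the instance count exceeds N-1, which gives up the one-core-per-instance benefit described above.
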
### NIC tuning

Hardware-specific (install via `sudo apt install ethtool` if not
present). On a host with a single primary interface (replace `eth0`):

```sh
sudo ethtool -G eth0 rx 4096 tx 4096
sudo ethtool -K eth0 gro on lro off
```

If you run a high instance count, also pin the NIC's interrupts off
the cores that game servers occupy (see `/proc/interrupts` and
`/proc/irq/<n>/smp_affinity`).

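
`smp_affinity` takes a hexadecimal CPU bitmask rather than a core list. A minimal sketch of building that mask, with `cpu_mask` as a hypothetical helper name:

```sh
# Hypothetical helper: build the hex bitmask that smp_affinity expects
# from a list of CPU numbers (bit n set means CPU n may service the IRQ).
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

# Core 0 only, keeping NIC interrupts off the game cores.
cpu_mask 0
```

The result would then be written to `/proc/irq/<n>/smp_affinity` as root for each of the NIC's IRQ numbers. If an IRQ-balancing daemon runs on the host, it may overwrite the value.
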
### Real-time scheduling (advanced, opt-in)

Source-engine servers do not need real-time scheduling, and a
misbehaving `srcds` at any RT priority can starve kernel threads, even
though the default `kernel.sched_rt_runtime_us=950000` reserves 5% of
each scheduling period for non-RT tasks. Use only if you have a measured
jitter problem that the baseline does not solve.

`/etc/systemd/system/left4me-server@.service.d/realtime.conf`:

```ini
[Service]
CPUSchedulingPolicy=fifo
CPUSchedulingPriority=10
LimitRTPRIO=10
AmbientCapabilities=CAP_SYS_NICE
```

The `AmbientCapabilities=CAP_SYS_NICE` line is needed because the
service runs as `User=left4me` with `NoNewPrivileges=true`; without it
some kernels/systemd combinations refuse to apply the RT policy.

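
The 5% figure follows from the ratio of `kernel.sched_rt_runtime_us` to `kernel.sched_rt_period_us`, whose kernel default is 1000000. A small sketch of the arithmetic, with `rt_reserved_percent` as a hypothetical helper name:

```sh
# Hypothetical helper: percentage of each RT period that the kernel
# holds back for non-RT tasks, given runtime and period in microseconds.
rt_reserved_percent() {
    runtime_us=$1
    period_us=$2
    echo $(( (period_us - runtime_us) * 100 / period_us ))
}

# Kernel defaults: 950000 us of RT runtime per 1000000 us period.
rt_reserved_percent 950000 1000000
```

On a live host the two arguments could be read with `sysctl -n`. A runtime of `-1` means throttling is disabled, and the helper does not handle that case.
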
### Additional opt-in network knobs

- **Ingress shaping via IFB.** Egress CAKE alone does not protect srcds
  receive against ingress saturation (large workshop downloads,
  package fetches arriving at line rate). Worth flipping only when
  measurement shows ingress hurting receive.

      sudo modprobe ifb && sudo ip link set ifb0 up
      sudo tc qdisc add dev <uplink> handle ffff: ingress
      sudo tc filter add dev <uplink> parent ffff: protocol ip u32 \
        match u32 0 0 action mirred egress redirect dev ifb0
      sudo tc qdisc add dev ifb0 root cake bandwidth Xmbit ingress \
        diffserv4 dual-srchost

- **`net.core.busy_poll = 50` / `net.core.busy_read = 50`.** Reduces
  UDP receive median latency by polling for incoming packets briefly
  at syscall boundaries. Cost: measurable CPU per syscall under load.
  Worth flipping if a host is dedicated to game serving and CPU
  headroom is plentiful.

- **`ethtool -K <iface> gro off`.** Some Source-engine ops disable
  generic receive offload to avoid receive-side coalescing latency.
  Hardware/driver dependent; documented here only, never applied
  automatically.

### Applying changes to running servers

Unit-file changes do not apply to already-running services. After any
change:

```sh
sudo systemctl daemon-reload
# Restart each game server via the web UI's stop + start, or:
sudo systemctl restart 'left4me-server@*.service'
```

|
|
|
||||||
181
deploy/deploy-test-server.sh
Executable file
@ -0,0 +1,181 @@
#!/bin/sh
set -eu

usage() {
  printf 'Usage: %s <ssh-user@host>\n' "$0" >&2
  exit 2
}

if [ "$#" -ne 1 ]; then
  usage
fi

target=$1
script_dir=$(CDPATH= cd -- "$(dirname -- "$0")" && pwd)
repo_root=$(CDPATH= cd -- "$script_dir/.." && pwd)
tmp_dir=$(mktemp -d)
archive="$tmp_dir/left4me.tar.gz"

cleanup() {
  rm -rf "$tmp_dir"
}
trap cleanup EXIT INT HUP TERM

tar -czf "$archive" \
  --exclude .git \
  --exclude .venv \
  --exclude __pycache__ \
  --exclude .pytest_cache \
  --exclude '*.egg-info' \
  --exclude 'l4d2web.db*' \
  -C "$repo_root" .

remote_tmp=$(ssh "$target" 'mktemp -d')
scp "$archive" "$target:$remote_tmp/left4me.tar.gz"

admin_username_file=
admin_password_file=
if [ "${LEFT4ME_ADMIN_USERNAME+x}" = x ] && [ "${LEFT4ME_ADMIN_PASSWORD+x}" = x ]; then
  admin_username_file="$tmp_dir/admin_username"
  admin_password_file="$tmp_dir/admin_password"
  umask 077
  printf '%s' "$LEFT4ME_ADMIN_USERNAME" > "$admin_username_file"
  printf '%s' "$LEFT4ME_ADMIN_PASSWORD" > "$admin_password_file"
  scp "$admin_username_file" "$target:$remote_tmp/admin_username"
  scp "$admin_password_file" "$target:$remote_tmp/admin_password"
fi

ssh "$target" sh -s -- "$remote_tmp" <<'REMOTE'
set -eu

remote_tmp=$1
archive="$remote_tmp/left4me.tar.gz"
repo_tmp="$remote_tmp/repo"

if [ "$(id -u)" -eq 0 ]; then
  sudo_cmd=
else
  sudo_cmd=sudo
fi

run_as_left4me() {
  sudo -u left4me "$@"
}

run_left4me_with_env() {
  run_as_left4me sh -c 'set -a; . /etc/left4me/host.env; . /etc/left4me/web.env; set +a; exec "$@"' sh "$@"
}

cleanup_remote() {
  rm -rf "$remote_tmp"
}
trap cleanup_remote EXIT INT HUP TERM

if ! id left4me >/dev/null 2>&1; then
  $sudo_cmd useradd --system --home-dir /var/lib/left4me --create-home --shell /usr/sbin/nologin left4me
fi

if command -v apt-get >/dev/null 2>&1; then
  $sudo_cmd apt-get update
  $sudo_cmd apt-get install -y python3 python3-venv python3-pip curl ca-certificates tar gzip fuse-overlayfs fuse3 sudo
elif command -v dnf >/dev/null 2>&1; then
  $sudo_cmd dnf install -y python3 python3-pip curl ca-certificates tar gzip fuse-overlayfs fuse3 sudo
else
  printf 'Unsupported package manager: expected apt-get or dnf\n' >&2
  exit 1
fi

$sudo_cmd mkdir -p \
  /etc/left4me \
  /opt/left4me \
  /usr/local/lib/systemd/system \
  /usr/local/libexec/left4me \
  /var/lib/left4me/installation \
  /var/lib/left4me/overlays \
  /var/lib/left4me/instances \
  /var/lib/left4me/runtime \
  /var/lib/left4me/tmp

$sudo_cmd chown -R left4me:left4me /var/lib/left4me /opt/left4me

mkdir -p "$repo_tmp"
tar -xzf "$archive" -C "$repo_tmp"

if [ -d /opt/left4me/.venv ]; then
  $sudo_cmd mv /opt/left4me/.venv "$remote_tmp/venv"
fi
$sudo_cmd find /opt/left4me -mindepth 1 -maxdepth 1 -exec rm -rf {} +
$sudo_cmd cp -R "$repo_tmp"/. /opt/left4me/
if [ -d "$remote_tmp/venv" ]; then
  $sudo_cmd mv "$remote_tmp/venv" /opt/left4me/.venv
fi
$sudo_cmd chown -R left4me:left4me /opt/left4me

$sudo_cmd cp /opt/left4me/deploy/files/usr/local/lib/systemd/system/left4me-web.service /usr/local/lib/systemd/system/left4me-web.service
$sudo_cmd cp /opt/left4me/deploy/files/usr/local/lib/systemd/system/left4me-server@.service /usr/local/lib/systemd/system/left4me-server@.service
$sudo_cmd cp /opt/left4me/deploy/files/usr/local/libexec/left4me/left4me-systemctl /usr/local/libexec/left4me/left4me-systemctl
$sudo_cmd cp /opt/left4me/deploy/files/usr/local/libexec/left4me/left4me-journalctl /usr/local/libexec/left4me/left4me-journalctl
$sudo_cmd chmod 0755 /usr/local/libexec/left4me/left4me-systemctl /usr/local/libexec/left4me/left4me-journalctl
$sudo_cmd cp /opt/left4me/deploy/files/etc/sudoers.d/left4me /etc/sudoers.d/left4me
$sudo_cmd chmod 0440 /etc/sudoers.d/left4me
$sudo_cmd visudo -cf /etc/sudoers.d/left4me

$sudo_cmd cp /opt/left4me/deploy/templates/etc/left4me/host.env /etc/left4me/host.env
$sudo_cmd chmod 0644 /etc/left4me/host.env

if [ ! -f /etc/left4me/web.env ]; then
  secret_key=$(python3 -c 'import secrets; print(secrets.token_hex(32))')
  tmp_web_env="$remote_tmp/web.env"
  {
    printf 'DATABASE_URL=sqlite:////var/lib/left4me/left4me.db\n'
    printf 'SECRET_KEY=%s\n' "$secret_key"
    printf 'JOB_WORKER_THREADS=4\n'
  } > "$tmp_web_env"
  $sudo_cmd install -m 0640 -o root -g left4me "$tmp_web_env" /etc/left4me/web.env
fi

if [ ! -x /opt/left4me/.venv/bin/python ]; then
  run_as_left4me python3 -m venv /opt/left4me/.venv
fi
run_as_left4me /opt/left4me/.venv/bin/python -m pip install --upgrade pip
run_as_left4me /opt/left4me/.venv/bin/pip install -e /opt/left4me/l4d2host -e /opt/left4me/l4d2web

run_left4me_with_env env \
  JOB_WORKER_ENABLED=false \
  /opt/left4me/.venv/bin/python -c "from l4d2web.app import create_app; create_app()"

run_as_left4me sh -c "cd /opt/left4me/l4d2web && set -a; . /etc/left4me/host.env; . /etc/left4me/web.env; set +a; env \
  JOB_WORKER_ENABLED=false \
  PYTHONPATH=/opt/left4me \
  /opt/left4me/.venv/bin/alembic -c /opt/left4me/l4d2web/alembic.ini upgrade head"

if [ -f "$remote_tmp/admin_username" ] && [ -f "$remote_tmp/admin_password" ]; then
  LEFT4ME_ADMIN_USERNAME=$(cat "$remote_tmp/admin_username")
  LEFT4ME_ADMIN_PASSWORD=$(cat "$remote_tmp/admin_password")
  if ! create_user_output=$(run_left4me_with_env env \
    JOB_WORKER_ENABLED=false \
    LEFT4ME_ADMIN_PASSWORD="$LEFT4ME_ADMIN_PASSWORD" \
    /opt/left4me/.venv/bin/flask --app l4d2web.app:create_app create-user "$LEFT4ME_ADMIN_USERNAME" --admin 2>&1); then
    case "$create_user_output" in
      *'user already exists'*) printf '%s\n' "$create_user_output" ;;
      *) printf '%s\n' "$create_user_output" >&2; exit 1 ;;
    esac
  else
    printf '%s\n' "$create_user_output"
  fi
fi

$sudo_cmd systemctl daemon-reload
$sudo_cmd systemctl enable --now left4me-web.service
$sudo_cmd systemctl restart left4me-web.service
for attempt in 1 2 3 4 5 6 7 8 9 10; do
  if curl -fsS http://127.0.0.1:8000/health; then
    exit 0
  fi
  sleep 1
done

$sudo_cmd systemctl status left4me-web.service --no-pager >&2 || true
$sudo_cmd journalctl -u left4me-web.service -n 80 --no-pager >&2 || true
exit 1
REMOTE
|
||||||
|
|
@ -1,6 +0,0 @@
# Sandbox-only resolver config, bind-mounted into script-overlay sandboxes
# at /etc/resolv.conf. The host's resolver (often a private/LAN DNS server)
# is unreachable from inside the sandbox because IPAddressDeny= blocks
# egress to RFC1918 / loopback. Public resolvers keep DNS working.
nameserver 1.1.1.1
nameserver 8.8.8.8
@ -1,5 +1,3 @@
|
||||||
Defaults:left4me !requiretty
|
Defaults:left4me !requiretty
|
||||||
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-systemctl *
|
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-systemctl *
|
||||||
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-journalctl *
|
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-journalctl *
|
||||||
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-overlay mount *, /usr/local/libexec/left4me/left4me-overlay umount *
|
|
||||||
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox
|
|
||||||
|
|
|
||||||
|
|
@ -1,41 +0,0 @@
# Host-side perf baseline for left4me, see
# docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md
#
# UDP socket buffers: distro defaults of ~128 KiB are too small for sustained
# Source-engine UDP across multiple instances. 8 MiB matches the standard
# 1 Gbit recommendation; rmem_default/wmem_default protect sockets that don't
# explicitly enlarge their buffers.
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.core.rmem_default = 524288
net.core.wmem_default = 524288

# Kernel softirq UDP path: the per-CPU backlog queue starts dropping packets
# at the default 1000 under multi-instance burst; 5000 absorbs realistic peaks.
# netdev_budget = 600 gives softirq more drain headroom per pass.
net.core.netdev_max_backlog = 5000
net.core.netdev_budget = 600

# Latency-sensitive default: avoid swap unless the box is really under
# pressure. Harmless on swapless hosts.
vm.swappiness = 10

# Per-socket UDP buffer floors: protect game-server sockets that don't bump
# their own SO_RCVBUF/SO_SNDBUF when softirq drains lag briefly.
net.ipv4.udp_rmem_min = 16384
net.ipv4.udp_wmem_min = 16384

# Default qdisc for ifaces we don't explicitly shape with CAKE. Debian Trixie
# already defaults to fq_codel; setting it explicitly is belt-and-suspenders
# and survives kernel-default churn.
net.core.default_qdisc = fq_codel

# TCP congestion control: BBR for any bulk TCP egress on the host (admin SSH,
# backups, package fetches, web-app responses) so a long flow does not push
# the bottleneck queue ahead of game UDP. UDP srcds is unaffected.
net.ipv4.tcp_congestion_control = bbr

# Block ptrace except from CAP_SYS_PTRACE holders. Belt-and-braces with
# SystemCallFilter=~@debug + PrivateUsers=true in the gameserver unit.
# See docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md.
kernel.yama.ptrace_scope = 2
@ -1,82 +0,0 @@
# Hardening drop-in for left4me-server@.service.
#
# Source of truth: this file (in left4me/deploy/files/). ckn-bw deploys
# it to /etc/systemd/system/left4me-server@.service.d/10-hardening.conf
# via a target-side symlink into the checkout.
#
# Gameserver unit: full hardening profile. No sudo path inside; no
# sudo-incompatibility carve-outs.
[Service]
NoNewPrivileges=true
RestrictSUIDSGID=true
CapabilityBoundingSet=
AmbientCapabilities=
# srcds_linux is i386 (Source 2007 engine). Bare 'native' kills every
# 32-bit syscall and traps srcds_run in a respawn loop.
SystemCallArchitectures=native x86
SystemCallFilter=@system-service
SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete @privileged
TemporaryFileSystem=/var/lib /etc /opt /home /root /srv /mnt /media
BindReadOnlyPaths=/var/lib/left4me/installation
BindReadOnlyPaths=/var/lib/left4me/overlays
# Workshop VPKs in overlays are symlinks into workshop_cache;
# without this bind they dangle inside the unit and Source
# silently fails to load the addons.
BindReadOnlyPaths=/var/lib/left4me/workshop_cache
# Steam SDK: srcds dlopen's ~/.steam/sdk32/steamclient.so for
# Steam master-server registration. Without this, SteamAPI_Init
# fails and the server falls back to LAN-only mode regardless
# of sv_lan=0; clients then get "LAN servers are restricted
# to local clients (class C)". .steam holds symlinks into
# /var/lib/left4me/steam, so both paths need to be bound back
# through TemporaryFileSystem.
BindReadOnlyPaths=/var/lib/left4me/.steam
BindReadOnlyPaths=/var/lib/left4me/steam
BindReadOnlyPaths=/etc/left4me/host.env
BindReadOnlyPaths=/etc/ssl
BindReadOnlyPaths=/etc/ca-certificates
BindReadOnlyPaths=/etc/resolv.conf
BindReadOnlyPaths=/etc/nsswitch.conf
BindReadOnlyPaths=/etc/alternatives
BindPaths=/var/lib/left4me/runtime/%i
ProtectSystem=strict
ProtectHome=true
PrivateUsers=true
# PrivatePIDs is the test-plan amendment that closes D2.b: same-uid
# ProtectProc=invisible cannot hide gunicorn from srcds (both run as
# uid 980); a private PID namespace does.
PrivatePIDs=true
PrivateTmp=true
PrivateDevices=true
PrivateIPC=true
RestrictNamespaces=true
RestrictRealtime=true
ProtectProc=invisible
# ProcSubset=pid intentionally OMITTED: it hides /proc/cpuinfo and
# /proc/sys/*, which breaks Source's tier0/cpu.cpp and (downstream)
# SteamAPI_Init's pipe-creation step. Server then registers as LAN-only
# and rejects external clients with "LAN servers are restricted to
# local clients (class C)". PrivatePIDs=true (kernel PID namespace) is
# the load-bearing peer-process isolation; ProtectProc=invisible is the
# foreign-uid /proc hide. Losing ProcSubset=pid only exposes host kernel
# info (cpuinfo, meminfo, sysctls), which is not sensitive in this
# threat model. See ckn-bw commit 4339289 for the original fix.
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectKernelLogs=true
ProtectClock=true
ProtectControlGroups=true
ProtectHostname=true
LockPersonality=true
RemoveIPC=true
KeyringMode=private
UMask=0027
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
# Lock srcds bindable sockets to the game port range. Hard-coded range
# because systemd directive variable substitution is uneven for
# SocketBindAllow.
SocketBindAllow=udp:27000-27999
SocketBindAllow=tcp:27000-27999
# W+X mprotect (text relocations in Source engine i386 .so files) is
# incompatible with the memory-deny-write-execute directive; that
# directive is therefore intentionally absent from this drop-in.
@ -1,44 +0,0 @@
# Hardening drop-in for left4me-web.service.
#
# Source of truth: this file (in left4me/deploy/files/). ckn-bw deploys
# it to /etc/systemd/system/left4me-web.service.d/10-hardening.conf via a
# target-side symlink into the checkout.
#
# See docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md
# and 2026-05-15-hardening-test-plan.md for the threat model and the
# verification matrix.
#
# This unit is the web app; some sudo-incompatible directives are
# intentionally absent:
#   NoNewPrivileges: blocks sudo's setuid escalation
#   PrivateUsers: breaks sudo's host-root mapping
#   RestrictSUIDSGID: blocks setuid()/setgid()
#   CapabilityBoundingSet: empty value would deny sudo's caps
#   @privileged exclusion in SystemCallFilter: blocks sudo's setuid syscall
# All of those are unconditional on the gameserver unit (no sudo there).
[Service]
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
ProtectProc=invisible
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectKernelLogs=true
ProtectClock=true
ProtectControlGroups=true
ProtectHostname=true
LockPersonality=true
# `native x86` (not just `native`): the install job fork-execs
# steamcmd_linux, a 32-bit binary, which makes i386-numbered syscalls.
# Under `native` alone the kernel SIGSYS-kills it (bash exit 159 =
# 128+SIGSYS). Mirrors the server unit, which needs the same allowance
# for srcds_linux. See deploy/files/etc/systemd/system/left4me-server@.service.d/10-hardening.conf.
SystemCallArchitectures=native x86
SystemCallFilter=@system-service
SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
RestrictNamespaces=true
RestrictRealtime=true
RemoveIPC=true
KeyringMode=private
UMask=0027
@ -1,8 +0,0 @@
# Perf baseline: see docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md
[Unit]
Description=left4me script-sandbox build slice
Before=slices.target

[Slice]
CPUWeight=10
IOWeight=10
@ -1,8 +0,0 @@
# Perf baseline: see docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md
[Unit]
Description=left4me game-server slice
Before=slices.target

[Slice]
CPUWeight=1000
IOWeight=1000
@ -1,24 +1,7 @@
|
||||||
# left4me gameserver — system unit, one instance per gameserver.
|
|
||||||
#
|
|
||||||
# This is the REFERENCE COPY of the deployed unit base body. The live
|
|
||||||
# source is the systemd/units reactor at
|
|
||||||
# ~/Projekte/ckn-bw/bundles/left4me/metadata.py (look for
|
|
||||||
# 'left4me-server@.service').
|
|
||||||
#
|
|
||||||
# Hardening: see left4me-server@.service.d/10-hardening.conf
|
|
||||||
#
|
|
||||||
# Threat model: docs/superpowers/specs/2026-05-15-hardening-threat-model.md
|
|
||||||
# Defenses survey: docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md
|
|
||||||
# Test plan + results: docs/superpowers/specs/2026-05-15-hardening-test-plan.md
|
|
||||||
|
|
||||||
[Unit]
|
[Unit]
|
||||||
Description=left4me server instance %i
|
Description=left4me server instance %i
|
||||||
After=network-online.target
|
After=network-online.target
|
||||||
Wants=network-online.target
|
Wants=network-online.target
|
||||||
# Bound the restart loop. Without these, a persistent ExecStartPre or
|
|
||||||
# ExecStart failure spins indefinitely.
|
|
||||||
StartLimitBurst=5
|
|
||||||
StartLimitIntervalSec=60s
|
|
||||||
|
|
||||||
[Service]
|
[Service]
|
||||||
Type=simple
|
Type=simple
|
||||||
|
|
@ -26,38 +9,19 @@ User=left4me
|
||||||
Group=left4me
|
Group=left4me
|
||||||
EnvironmentFile=/etc/left4me/host.env
|
EnvironmentFile=/etc/left4me/host.env
|
||||||
EnvironmentFile=/var/lib/left4me/instances/%i/instance.env
|
EnvironmentFile=/var/lib/left4me/instances/%i/instance.env
|
||||||
# `-` prefix: chdir failure is non-fatal. The merged dir only exists
|
WorkingDirectory=/var/lib/left4me/runtime/%i/merged/left4dead2
|
||||||
# once ExecStartPre's overlay mount succeeds.
|
ExecStart=/var/lib/left4me/installation/srcds_run -game left4dead2 +hostport ${L4D2_PORT} $L4D2_ARGS
|
||||||
WorkingDirectory=-/var/lib/left4me/runtime/%i/merged/left4dead2
|
|
||||||
# `+` prefix runs the helper as PID 1 (root, all caps, host
|
|
||||||
# namespaces) — required because the hardening drop-in sets
|
|
||||||
# NoNewPrivileges and PrivateUsers; both block sudo's setuid path.
|
|
||||||
# nsenter into PID 1's mount namespace ensures the umount in
|
|
||||||
# ExecStopPost succeeds without EBUSY from the unit's own
|
|
||||||
# slave-mount tree.
|
|
||||||
ExecStartPre=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay mount %i
|
|
||||||
# Run from the merged overlay, NOT installation/. srcds_run cds to its
|
|
||||||
# own dirname before exec'ing srcds_linux; the binary's path determines
|
|
||||||
# gameinfo + addons lookup.
|
|
||||||
ExecStart=/var/lib/left4me/runtime/%i/merged/srcds_run -game left4dead2 +hostport ${L4D2_PORT} $L4D2_ARGS
|
|
||||||
ExecStopPost=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay umount %i
|
|
||||||
Restart=on-failure
|
Restart=on-failure
|
||||||
RestartSec=5
|
RestartSec=5
|
||||||
|
NoNewPrivileges=true
|
||||||
# === Resource control baseline ===
|
PrivateTmp=true
|
||||||
# See docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md
|
PrivateDevices=true
|
||||||
Slice=l4d2-game.slice
|
ProtectHome=true
|
||||||
Nice=-5
|
ProtectSystem=strict
|
||||||
IOSchedulingClass=best-effort
|
ReadOnlyPaths=/var/lib/left4me/installation /var/lib/left4me/overlays
|
||||||
IOSchedulingPriority=4
|
ReadWritePaths=/var/lib/left4me/runtime/%i
|
||||||
OOMScoreAdjust=-200
|
RestrictSUIDSGID=true
|
||||||
MemoryHigh=1.5G
|
LockPersonality=true
|
||||||
MemoryMax=2G
|
|
||||||
TasksMax=256
|
|
||||||
LimitNOFILE=65536
|
|
||||||
KillSignal=SIGINT
|
|
||||||
TimeoutStopSec=15s
|
|
||||||
LogRateLimitIntervalSec=0
|
|
||||||
|
|
||||||
[Install]
|
[Install]
|
||||||
WantedBy=multi-user.target
|
WantedBy=multi-user.target
|
||||||
|
|
|
||||||
|
|
@ -1,14 +1,3 @@
|
||||||
# left4me web application — system unit.
|
|
||||||
#
|
|
||||||
# This is the REFERENCE COPY of the deployed unit base body. The live
|
|
||||||
# source is the systemd/units reactor at
|
|
||||||
# ~/Projekte/ckn-bw/bundles/left4me/metadata.py (look for
|
|
||||||
# 'left4me-web.service').
|
|
||||||
#
|
|
||||||
# Hardening: see left4me-web.service.d/10-hardening.conf
|
|
||||||
#
|
|
||||||
# Threat model + defenses + tests: see docs/superpowers/specs/2026-05-15-hardening-*
|
|
||||||
|
|
||||||
[Unit]
|
[Unit]
|
||||||
Description=left4me web application
|
Description=left4me web application
|
||||||
After=network-online.target
|
After=network-online.target
|
||||||
|
|
@ -18,19 +7,17 @@ Wants=network-online.target
|
||||||
Type=simple
|
Type=simple
|
||||||
User=left4me
|
User=left4me
|
||||||
Group=left4me
|
Group=left4me
|
||||||
WorkingDirectory=/opt/left4me/src
|
WorkingDirectory=/opt/left4me
|
||||||
Environment=HOME=/var/lib/left4me PATH=/var/lib/left4me/.venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
|
Environment=HOME=/var/lib/left4me
|
||||||
|
Environment=PATH=/opt/left4me/.venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
|
||||||
EnvironmentFile=/etc/left4me/host.env
|
EnvironmentFile=/etc/left4me/host.env
|
||||||
EnvironmentFile=/etc/left4me/web.env
|
EnvironmentFile=/etc/left4me/web.env
|
||||||
# Placeholder values for --workers / --threads. Live emission interpolates
|
ExecStart=/opt/left4me/.venv/bin/gunicorn --workers 1 --threads 8 --bind 0.0.0.0:8000 'l4d2web.app:create_app()'
|
||||||
# from metadata.get('left4me/gunicorn_workers') and gunicorn_threads.
|
|
||||||
ExecStart=/var/lib/left4me/.venv/bin/gunicorn --workers 1 --threads 32 --bind 127.0.0.1:8000 'l4d2web.app:create_app()'
|
|
||||||
Restart=on-failure
|
Restart=on-failure
|
||||||
RestartSec=3
|
RestartSec=3
|
||||||
|
NoNewPrivileges=true
|
||||||
# Web writes broadly under /var/lib/left4me (DB, instance configs,
|
PrivateTmp=true
|
||||||
# overlays, runtime). Kept inline because it's web-specific
|
ProtectSystem=full
|
||||||
# (server@ uses BindPaths to bind only its instance dir).
|
|
||||||
ReadWritePaths=/var/lib/left4me
|
ReadWritePaths=/var/lib/left4me
|
||||||
|
|
||||||
[Install]
|
[Install]
|
||||||
|
|
|
||||||
|
|
@ -1,15 +0,0 @@
[Unit]
Description=left4me daily workshop refresh (enqueue job)
After=network-online.target left4me-web.service
Wants=left4me-web.service

[Service]
Type=oneshot
User=left4me
Group=left4me
WorkingDirectory=/opt/left4me/src
Environment=HOME=/var/lib/left4me
Environment=PATH=/var/lib/left4me/.venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
EnvironmentFile=/etc/left4me/host.env
EnvironmentFile=/etc/left4me/web.env
ExecStart=/var/lib/left4me/.venv/bin/flask --app l4d2web.app:create_app workshop-refresh
@ -1,11 +0,0 @@
[Unit]
Description=left4me daily workshop refresh

[Timer]
OnCalendar=*-*-* 04:00:00
Persistent=true
RandomizedDelaySec=15min
Unit=left4me-workshop-refresh.service

[Install]
WantedBy=timers.target
@ -37,16 +37,6 @@ case "$follow_flag" in
|
||||||
esac
|
esac
|
||||||
|
|
||||||
unit="left4me-server@${name}.service"
|
unit="left4me-server@${name}.service"
|
||||||
|
|
||||||
if [ -x /bin/systemctl ]; then
|
|
||||||
systemctl=/bin/systemctl
|
|
||||||
elif [ -x /usr/bin/systemctl ]; then
|
|
||||||
systemctl=/usr/bin/systemctl
|
|
||||||
else
|
|
||||||
printf '%s\n' 'systemctl not found at /bin/systemctl or /usr/bin/systemctl' >&2
|
|
||||||
exit 69
|
|
||||||
fi
|
|
||||||
|
|
||||||
if [ -x /bin/journalctl ]; then
|
if [ -x /bin/journalctl ]; then
|
||||||
journalctl=/bin/journalctl
|
journalctl=/bin/journalctl
|
||||||
elif [ -x /usr/bin/journalctl ]; then
|
elif [ -x /usr/bin/journalctl ]; then
|
||||||
|
|
@ -56,20 +46,8 @@ else
|
||||||
exit 69
|
exit 69
|
||||||
fi
|
fi
|
||||||
|
|
||||||
# Anchor `--since` to the moment systemd began the unit's current start
|
|
||||||
# transaction so the log panel starts at the latest run. Force LC_ALL=C so
|
|
||||||
# the day-of-week prefix is in a locale journalctl reliably parses.
|
|
||||||
start_time=$(LC_ALL=C "$systemctl" show -p InactiveExitTimestamp --value "$unit" 2>/dev/null || true)
|
|
||||||
|
|
||||||
if [ -n "$start_time" ]; then
|
|
||||||
if [ -n "$follow_arg" ]; then
|
|
||||||
exec "$journalctl" -u "$unit" --since "$start_time" -n "$lines" -o cat "$follow_arg"
|
|
||||||
fi
|
|
||||||
exec "$journalctl" -u "$unit" --since "$start_time" -n "$lines" -o cat
|
|
||||||
fi
|
|
||||||
|
|
||||||
# Unit has never run: no --since cutoff. `-f` will attach on first start.
|
|
||||||
if [ -n "$follow_arg" ]; then
|
if [ -n "$follow_arg" ]; then
|
||||||
exec "$journalctl" -u "$unit" -n "$lines" -o cat "$follow_arg"
|
exec "$journalctl" -u "$unit" -n "$lines" -o cat "$follow_arg"
|
||||||
fi
|
fi
|
||||||
|
|
||||||
exec "$journalctl" -u "$unit" -n "$lines" -o cat
|
exec "$journalctl" -u "$unit" -n "$lines" -o cat
|
||||||
|
|
@ -2,7 +2,7 @@
|
||||||
set -eu
|
set -eu
|
||||||
|
|
||||||
usage() {
|
usage() {
|
||||||
printf '%s\n' "usage: left4me-systemctl enable|disable|show <server-name>" >&2
|
printf '%s\n' "usage: left4me-systemctl start|stop|show <server-name>" >&2
|
||||||
exit 2
|
exit 2
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -22,7 +22,7 @@ action=$1
|
||||||
name=$2
|
name=$2
|
||||||
|
|
||||||
case "$action" in
|
case "$action" in
|
||||||
enable|disable|show) ;;
|
start|stop|show) ;;
|
||||||
*) usage ;;
|
*) usage ;;
|
||||||
esac
|
esac
|
||||||
|
|
||||||
|
|
@ -38,7 +38,7 @@ else
|
||||||
fi
|
fi
|
||||||
|
|
||||||
case "$action" in
|
case "$action" in
|
||||||
enable) exec "$systemctl" enable --now "$unit" ;;
|
start) exec "$systemctl" start "$unit" ;;
|
||||||
disable) exec "$systemctl" disable --now "$unit" ;;
|
stop) exec "$systemctl" stop "$unit" ;;
|
||||||
show) exec "$systemctl" show --property=ActiveState --property=SubState "$unit" ;;
|
show) exec "$systemctl" show --property=ActiveState --property=SubState "$unit" ;;
|
||||||
esac
|
esac
|
||||||
|
|
@ -1,244 +0,0 @@
|
||||||
#!/usr/bin/python3
|
|
||||||
"""Privileged overlay mount helper for left4me.
|
|
||||||
|
|
||||||
Invoked from the systemd unit's ExecStartPre / ExecStopPost via
|
|
||||||
`+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- …`. The unit-level
|
|
||||||
nsenter is what makes this work: it runs the helper Python interpreter
|
|
||||||
inside PID 1's mount namespace. Without it, the `+` Exec prefix
|
|
||||||
removes the sandbox/credentials but does NOT detach from the unit's
|
|
||||||
per-service mount namespace, and the helper process itself would pin
|
|
||||||
that namespace alive — turning every umount into a multi-second EBUSY
|
|
||||||
race with the kernel's deferred namespace cleanup. With the unit-level
|
|
||||||
nsenter the helper has no such reference and umount succeeds first try.
|
|
||||||
|
|
||||||
Validates inputs strictly, then performs `mount -t overlay` /
|
|
||||||
`umount` directly — no internal nsenter, since the helper is already
|
|
||||||
running where the syscalls need to take effect.
|
|
||||||
|
|
||||||
Verbs:
|
|
||||||
mount <name> Reads ${LEFT4ME_ROOT}/instances/<name>/instance.env
|
|
||||||
for L4D2_LOWERDIRS, validates every lowerdir is
|
|
||||||
under one of installation/overlays/workshop_cache/
|
|
||||||
global_overlay_cache, then mounts the kernel
|
|
||||||
overlay at runtime/<name>/merged.
|
|
||||||
umount <name> Unmounts runtime/<name>/merged and cleans up the
|
|
||||||
kernel-overlayfs `work/work` orphan.
|
|
||||||
|
|
||||||
Set LEFT4ME_OVERLAY_PRINT_ONLY=1 to print the would-be argv (one line,
|
|
||||||
shell-quoted) and exit 0 instead of execv. Used by tests.
|
|
||||||
"""
|
|
||||||
|
|
||||||
import os
|
|
||||||
import re
|
|
||||||
import shlex
|
|
||||||
import shutil
|
|
||||||
import subprocess
|
|
||||||
import sys
|
|
||||||
from pathlib import Path
|
|
||||||
|
|
||||||
NAME_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,63}$")
|
|
||||||
DEFAULT_ROOT = "/var/lib/left4me"
|
|
||||||
LOWERDIR_ALLOWLIST = (
|
|
||||||
"installation",
|
|
||||||
"overlays",
|
|
||||||
"global_overlay_cache",
|
|
||||||
"workshop_cache",
|
|
||||||
)
|
|
||||||
MAX_LOWERDIRS = 500
|
|
||||||
MOUNT_BIN = "/bin/mount"
|
|
||||||
UMOUNT_BIN = "/bin/umount"
|
|
||||||
|
|
||||||
|
|
||||||
def die(msg: str) -> None:
|
|
||||||
sys.stderr.write(f"left4me-overlay: {msg}\n")
|
|
||||||
sys.exit(1)
|
|
||||||
|
|
||||||
|
|
||||||
def root() -> Path:
|
|
||||||
return Path(os.environ.get("LEFT4ME_ROOT") or DEFAULT_ROOT)
|
|
||||||
|
|
||||||
|
|
||||||
def validate_name(name: str) -> str:
|
|
||||||
if not NAME_RE.fullmatch(name):
|
|
||||||
die(f"invalid instance name: {name!r}")
|
|
||||||
return name
|
|
||||||
|
|
||||||
|
|
||||||
def parse_lowerdirs(env_path: Path) -> list[str]:
|
|
||||||
if not env_path.is_file():
|
|
||||||
die(f"instance.env not found: {env_path}")
|
|
||||||
raw = None
|
|
||||||
for line in env_path.read_text().splitlines():
|
|
||||||
if "=" not in line:
|
|
||||||
continue
|
|
||||||
key, value = line.split("=", 1)
|
|
||||||
if key.strip() == "L4D2_LOWERDIRS":
|
|
||||||
raw = value
|
|
||||||
break
|
|
||||||
if raw is None:
|
|
||||||
die(f"L4D2_LOWERDIRS not set in {env_path}")
|
|
||||||
if raw == "":
|
|
||||||
die(f"L4D2_LOWERDIRS is empty in {env_path}")
|
|
||||||
parts = raw.split(":")
|
|
||||||
if any(p == "" for p in parts):
|
|
||||||
die(f"L4D2_LOWERDIRS contains an empty entry: {raw!r}")
|
|
||||||
if len(parts) > MAX_LOWERDIRS:
|
|
||||||
die(f"L4D2_LOWERDIRS has {len(parts)} entries (cap {MAX_LOWERDIRS})")
|
|
||||||
return parts
|
|
||||||
|
|
||||||
|
|
||||||
def canonical_under(allowed_roots: list[Path], path: Path) -> Path:
|
|
||||||
try:
|
|
||||||
canonical = path.resolve(strict=True)
|
|
||||||
except (FileNotFoundError, RuntimeError):
|
|
||||||
die(f"path does not exist or has a symlink loop: {path}")
|
|
||||||
for r in allowed_roots:
|
|
||||||
if canonical == r or r in canonical.parents:
|
|
||||||
return canonical
|
|
||||||
die(f"path is outside the permitted roots: {path} (resolved: {canonical})")
|
|
||||||
|
|
||||||
|
|
||||||
_LISTXATTR = getattr(os, "listxattr", None)
|
|
||||||
|
|
||||||
|
|
||||||
def _entry_has_fuse_xattr(path: str) -> str | None:
|
|
||||||
if _LISTXATTR is None:
|
|
||||||
return None
|
|
||||||
try:
|
|
||||||
attrs = _LISTXATTR(path, follow_symlinks=False)
|
|
||||||
except OSError:
|
|
||||||
return None
|
|
||||||
for a in attrs:
|
|
||||||
if a.startswith("user.fuseoverlayfs."):
|
|
||||||
return a
|
|
||||||
return None
|
|
||||||
|
|
||||||
|
|
||||||
def assert_no_fuse_xattrs(upper: Path) -> None:
|
|
||||||
if not upper.exists() or _LISTXATTR is None:
|
|
||||||
return
|
|
||||||
for dirpath, dirnames, filenames in os.walk(upper):
|
|
||||||
for entry in (dirpath, *(os.path.join(dirpath, n) for n in dirnames),
|
|
||||||
*(os.path.join(dirpath, n) for n in filenames)):
|
|
||||||
tainted = _entry_has_fuse_xattr(entry)
|
|
||||||
if tainted:
|
|
||||||
die(
|
|
||||||
f"upperdir contains fuse-overlayfs xattr {tainted!r} on {entry}; "
|
|
||||||
"wipe upper/ and work/ before mounting"
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def _print_argv(argv: list[str]) -> None:
|
|
||||||
"""Emit one shell-quoted argv line to stdout (PRINT_ONLY helper, no exit)."""
|
|
||||||
print(" ".join(shlex.quote(a) for a in argv))
|
|
||||||
|
|
||||||
|
|
||||||
def exec_or_print(argv: list[str]) -> None:
|
|
||||||
if os.environ.get("LEFT4ME_OVERLAY_PRINT_ONLY") == "1":
|
|
||||||
_print_argv(argv)
|
|
||||||
sys.exit(0)
|
|
||||||
os.execv(argv[0], argv)
|
|
||||||
|
|
||||||
|
|
||||||
def cmd_mount(name: str) -> None:
|
|
||||||
name = validate_name(name)
|
|
||||||
r = root()
|
|
||||||
runtime_name_dir = (r / "runtime" / name).resolve(strict=True)
|
|
||||||
merged_for_check = (runtime_name_dir / "merged").resolve(strict=True)
|
|
||||||
|
|
||||||
# Idempotency for unit restart cycles: if a previous start mounted
|
|
||||||
# successfully but ExecStart failed afterwards (and Restart=on-failure
|
|
||||||
# fires another cycle), the second ExecStartPre would otherwise refuse
|
|
||||||
# to mount-on-top. Short-circuit here so the second cycle just gets
|
|
||||||
# straight to ExecStart. PRINT_ONLY (test mode) bypasses this so the
|
|
||||||
# tests can exercise the full mount argv regardless of mount state.
|
|
||||||
if (
|
|
||||||
os.environ.get("LEFT4ME_OVERLAY_PRINT_ONLY") != "1"
|
|
||||||
and os.path.ismount(merged_for_check)
|
|
||||||
):
|
|
||||||
return
|
|
||||||
|
|
||||||
instance_env = r / "instances" / name / "instance.env"
|
|
||||||
raw_lowerdirs = parse_lowerdirs(instance_env)
|
|
||||||
|
|
||||||
allowed_roots = [(r / sub).resolve() for sub in LOWERDIR_ALLOWLIST]
|
|
||||||
canonical_lowerdirs = [str(canonical_under(allowed_roots, Path(p))) for p in raw_lowerdirs]
|
|
||||||
|
|
||||||
upper = (runtime_name_dir / "upper").resolve(strict=True)
|
|
||||||
work = (runtime_name_dir / "work").resolve(strict=True)
|
|
||||||
merged = merged_for_check
|
|
||||||
for label, path in (("upper", upper), ("work", work), ("merged", merged)):
|
|
||||||
if path.parent != runtime_name_dir:
|
|
||||||
die(f"{label} resolved outside runtime/{name}: {path}")
|
|
||||||
|
|
||||||
assert_no_fuse_xattrs(upper)
|
|
||||||
|
|
||||||
options = f"lowerdir={':'.join(canonical_lowerdirs)},upperdir={upper},workdir={work}"
|
|
||||||
argv = [
|
|
||||||
MOUNT_BIN,
|
|
||||||
"-t", "overlay",
|
|
||||||
"overlay",
|
|
||||||
"-o", options,
|
|
||||||
str(merged),
|
|
||||||
]
|
|
||||||
exec_or_print(argv)
|
|
||||||
|
|
||||||
|
|
||||||
def cmd_umount(name: str) -> None:
|
|
||||||
name = validate_name(name)
|
|
||||||
r = root()
|
|
||||||
runtime_name_dir = (r / "runtime" / name).resolve(strict=True)
|
|
||||||
merged_path = runtime_name_dir / "merged"
|
|
||||||
work_inner = runtime_name_dir / "work" / "work"
|
|
||||||
|
|
||||||
overlay_umount_argv = [
|
|
||||||
UMOUNT_BIN,
|
|
||||||
# Resolve only if it exists; PRINT_ONLY tests always pre-create it.
|
|
||||||
str(merged_path.resolve(strict=True) if merged_path.exists() else merged_path),
|
|
||||||
]
|
|
||||||
|
|
||||||
if os.environ.get("LEFT4ME_OVERLAY_PRINT_ONLY") == "1":
|
|
||||||
_print_argv(overlay_umount_argv)
|
|
||||||
sys.exit(0)
|
|
||||||
|
|
||||||
if merged_path.exists():
|
|
||||||
merged = merged_path.resolve(strict=True)
|
|
||||||
if merged.parent != runtime_name_dir:
|
|
||||||
die(f"merged resolved outside runtime/{name}: {merged}")
|
|
||||||
# Idempotency: only umount if currently a mount point. Mirrors
|
|
||||||
# cmd_mount's symmetric check; a redundant cleanup pass — or a
|
|
||||||
# call after a partial _purge_instance — must be a no-op.
|
|
||||||
#
|
|
||||||
# No retry loop here: with the helper running in PID 1's mount
|
|
||||||
# namespace (via the unit-level `nsenter --mount=/proc/1/ns/mnt`
|
|
||||||
# in ExecStopPost), it holds no reference to the unit's
|
|
||||||
# per-service mount namespace, so the cgroup-empty → namespace
|
|
||||||
# reaped → umount-clears sequence happens without any race
|
|
||||||
# window for us to ride out. EBUSY here is a real error.
|
|
||||||
if os.path.ismount(merged):
|
|
||||||
subprocess.run(overlay_umount_argv, check=True)
|
|
||||||
|
|
||||||
# Kernel-overlayfs creates work_inner during mount, owned by root:root with
|
|
||||||
# mode 0000. After unmount it's an orphan that the unit's User= (left4me)
|
|
||||||
# cannot traverse via shutil.rmtree, so reset/delete in instances.py
|
|
||||||
# blows up with EACCES on `runtime/<name>/work/work`. The helper is
|
|
||||||
# the only code path with root that knows about this directory, so
|
|
||||||
# the cleanup belongs here. Safe to nuke — the kernel re-creates it
|
|
||||||
# on the next mount. Run unconditionally — covers both "we just
|
|
||||||
# unmounted" and "previous teardown didn't finish" cases.
|
|
||||||
if work_inner.exists():
|
|
||||||
shutil.rmtree(work_inner)
|
|
||||||
|
|
||||||
|
|
||||||
def main(argv: list[str]) -> None:
|
|
||||||
if len(argv) != 3 or argv[1] not in ("mount", "umount"):
|
|
||||||
sys.stderr.write("usage: left4me-overlay mount|umount <name>\n")
|
|
||||||
sys.exit(2)
|
|
||||||
if argv[1] == "mount":
|
|
||||||
cmd_mount(argv[2])
|
|
||||||
else:
|
|
||||||
cmd_umount(argv[2])
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
|
||||||
main(sys.argv)
|
|
||||||
|
|
@ -1,81 +0,0 @@
|
||||||
#!/bin/bash
|
|
||||||
# Privileged sandbox launcher for left4me script overlays.
|
|
||||||
#
|
|
||||||
# Invoked via sudo by the web user with two arguments:
|
|
||||||
# <overlay_id> numeric overlay id; bind-mounts /var/lib/left4me/overlays/<id>
|
|
||||||
# read-write at /overlay inside the sandbox.
|
|
||||||
# <script_path> absolute path to a bash file already written by the web app;
|
|
||||||
# bind-mounted read-only at /script.sh inside the sandbox.
|
|
||||||
#
|
|
||||||
# The script runs as a transient systemd .service with the full hardening
|
|
||||||
# surface: cgroup limits + walltime kill, NoNewPrivileges, ProtectSystem,
|
|
||||||
# ProtectHome, kernel-tunable / -module / -log protection, namespace
|
|
||||||
# restriction, address-family restriction, capability bounding (empty),
|
|
||||||
# seccomp filter (@system-service @network-io), MemoryDenyWriteExecute,
|
|
||||||
# LockPersonality, RestrictSUIDSGID. Network namespace is *not* restricted —
|
|
||||||
# scripts must reach the public internet to download workshop / l4d2center
|
|
||||||
# / cedapug content. PID namespace is shared with the host (no
|
|
||||||
# PrivatePID= directive in systemd); host PIDs are visible via /proc.
|
|
||||||
# Same-uid attack surface (the sandbox runs as left4me, as do the
|
|
||||||
# gameservers and the web app) is covered by the hardening profile plus
|
|
||||||
# system-wide kernel.yama.ptrace_scope=2 — see
|
|
||||||
# docs/superpowers/specs/2026-05-15-hardening-threat-model.md.
|
|
||||||
set -euo pipefail
|
|
||||||
|
|
||||||
# Self-wrap into PID 1's mount namespace before doing anything mount-related.
|
|
||||||
# The web app's left4me-web.service has PrivateTmp=true, which gives it a
|
|
||||||
# private mount namespace. When the worker invokes us via sudo, we inherit
|
|
||||||
# that namespace; our `mount --bind` would land there. systemd-run below
|
|
||||||
# spawns transient units in PID 1's namespace (where they don't see the
|
|
||||||
# private bind), so the sandbox would bind onto an empty staging dir and
|
|
||||||
# permission-deny on every write. The sentinel env var avoids an exec loop.
|
|
||||||
if [[ "${L4D2_SANDBOX_IN_PID1_MNT_NS:-}" != "1" ]]; then
|
|
||||||
exec env L4D2_SANDBOX_IN_PID1_MNT_NS=1 \
|
|
||||||
/usr/bin/nsenter --mount=/proc/1/ns/mnt -- "$0" "$@"
|
|
||||||
fi
|
|
||||||
|
|
||||||
[[ $# -eq 2 ]] || { echo "usage: $0 <overlay_id> <script>" >&2; exit 64; }
|
|
||||||
|
|
||||||
OVERLAY_ID=$1
|
|
||||||
SCRIPT=$2
|
|
||||||
|
|
||||||
[[ "$OVERLAY_ID" =~ ^[0-9]+$ ]] || { echo "bad overlay id" >&2; exit 64; }
|
|
||||||
OVERLAY_DIR=/var/lib/left4me/overlays/$OVERLAY_ID
|
|
||||||
[[ -d $OVERLAY_DIR ]] || { echo "no overlay dir at $OVERLAY_DIR" >&2; exit 65; }
|
|
||||||
[[ -f $SCRIPT ]] || { echo "no script at $SCRIPT" >&2; exit 65; }
|
|
||||||
|
|
||||||
if [[ "${LEFT4ME_SCRIPT_SANDBOX_DRY_RUN:-}" == "1" ]]; then
|
|
||||||
echo "DRY RUN: overlay_id=$OVERLAY_ID script=$SCRIPT overlay_dir=$OVERLAY_DIR"
|
|
||||||
exit 0
|
|
||||||
fi
|
|
||||||
|
|
||||||
SCRIPT_RC=0
|
|
||||||
systemd-run --quiet --collect --wait --pipe \
|
|
||||||
--unit="left4me-script-${OVERLAY_ID}-$$" \
|
|
||||||
--slice=l4d2-build.slice \
|
|
||||||
-p OOMScoreAdjust=500 \
|
|
||||||
-p User=left4me -p Group=left4me \
|
|
||||||
-p UMask=0022 \
|
|
||||||
-p NoNewPrivileges=yes \
|
|
||||||
-p ProtectSystem=strict -p ProtectHome=yes \
|
|
||||||
-p PrivateTmp=yes -p PrivateDevices=yes -p PrivateIPC=yes \
|
|
||||||
-p ProtectKernelTunables=yes -p ProtectKernelModules=yes \
|
|
||||||
-p ProtectKernelLogs=yes -p ProtectControlGroups=yes \
|
|
||||||
-p RestrictNamespaces=yes \
|
|
||||||
-p RestrictAddressFamilies="AF_INET AF_INET6 AF_UNIX" \
|
|
||||||
-p RestrictSUIDSGID=yes -p LockPersonality=yes \
|
|
||||||
-p MemoryDenyWriteExecute=yes \
|
|
||||||
-p SystemCallFilter="@system-service @network-io" \
|
|
||||||
-p SystemCallArchitectures=native \
|
|
||||||
-p CapabilityBoundingSet= -p AmbientCapabilities= \
|
|
||||||
-p IPAddressDeny="127.0.0.0/8 ::1/128 169.254.0.0/16 fe80::/10 224.0.0.0/4 ff00::/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 100.64.0.0/10 fc00::/7" \
|
|
||||||
-p TemporaryFileSystem="/etc /var/lib" \
|
|
||||||
-p BindReadOnlyPaths="/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf /etc/ssl /etc/ca-certificates /etc/nsswitch.conf /etc/alternatives ${SCRIPT}:/script.sh" \
|
|
||||||
-p BindPaths="${OVERLAY_DIR}:/overlay" \
|
|
||||||
-p WorkingDirectory=/overlay \
|
|
||||||
-p Environment="HOME=/tmp PATH=/usr/bin:/usr/sbin OVERLAY=/overlay" \
|
|
||||||
-p MemoryMax=4G -p MemorySwapMax=0 -p TasksMax=512 \
|
|
||||||
-p CPUQuota=200% -p RuntimeMaxSec=3600 \
|
|
||||||
-- /bin/bash /script.sh || SCRIPT_RC=$?
|
|
||||||
|
|
||||||
exit $SCRIPT_RC
|
|
||||||
|
|
@ -1,17 +0,0 @@
|
||||||
#!/bin/sh
|
|
||||||
# Run l4d2web flask CLI commands as the left4me user with the deploy env loaded.
|
|
||||||
# Usage: left4me <flask-subcommand> [args...]
|
|
||||||
# Examples:
|
|
||||||
# left4me create-user alice --admin
|
|
||||||
# left4me seed-script-overlays /opt/left4me/src/examples/script-overlays
|
|
||||||
# left4me routes
|
|
||||||
set -eu
|
|
||||||
exec sudo -u left4me sh -c '
|
|
||||||
set -a
|
|
||||||
. /etc/left4me/host.env
|
|
||||||
. /etc/left4me/web.env
|
|
||||||
set +a
|
|
||||||
export JOB_WORKER_ENABLED=false
|
|
||||||
export PYTHONPATH=/opt/left4me/src
|
|
||||||
exec /var/lib/left4me/.venv/bin/flask --app l4d2web.app:create_app "$@"
|
|
||||||
' sh "$@"
|
|
||||||
|
|
@ -1,36 +0,0 @@
|
||||||
"""Shared fixtures and path constants for `deploy/scripts/tests/`."""
|
|
||||||
import os
|
|
||||||
from pathlib import Path
|
|
||||||
|
|
||||||
|
|
||||||
ROOT = Path(__file__).resolve().parents[3]
|
|
||||||
SCRIPTS = ROOT / "deploy" / "scripts"
|
|
||||||
LIBEXEC = SCRIPTS / "libexec"
|
|
||||||
SBIN = SCRIPTS / "sbin"
|
|
||||||
|
|
||||||
# `deploy/` is also the parent of the scripts/ tree. The sudoers example
|
|
||||||
# lives at `deploy/files/etc/sudoers.d/left4me` and is the canonical
|
|
||||||
# statement of which paths sudo grants to the `left4me` uid.
|
|
||||||
# `deploy/scripts/tests/test_sudoers_grants.py` reads it from there.
|
|
||||||
DEPLOY = ROOT / "deploy"
|
|
||||||
|
|
||||||
|
|
||||||
def fake_command(tmp_path, command_name):
|
|
||||||
"""Drop a no-op stub of `command_name` into `tmp_path`. Returns the
|
|
||||||
marker file the stub writes its args to, so tests can assert that the
|
|
||||||
helper rejected bad input before invoking the real command.
|
|
||||||
"""
|
|
||||||
marker = tmp_path / f"{command_name}.args"
|
|
||||||
command = tmp_path / command_name
|
|
||||||
command.write_text(f"#!/bin/sh\nprintf '%s\\n' \"$*\" > '{marker}'\nexit 0\n")
|
|
||||||
command.chmod(0o755)
|
|
||||||
return marker
|
|
||||||
|
|
||||||
|
|
||||||
def env_with_fake_commands(tmp_path):
|
|
||||||
"""Build an environment that prepends `tmp_path` onto PATH so helpers
|
|
||||||
find the fake commands first.
|
|
||||||
"""
|
|
||||||
env = os.environ.copy()
|
|
||||||
env["PATH"] = f"{tmp_path}{os.pathsep}{env.get('PATH', '')}"
|
|
||||||
return env
|
|
||||||
|
|
@ -1,15 +0,0 @@
|
||||||
from conftest import LIBEXEC
|
|
||||||
|
|
||||||
|
|
||||||
SYSTEMCTL_HELPER = LIBEXEC / "left4me-systemctl"
|
|
||||||
JOURNALCTL_HELPER = LIBEXEC / "left4me-journalctl"
|
|
||||||
|
|
||||||
|
|
||||||
def test_helpers_use_fixed_system_tool_paths_not_sudo_path():
|
|
||||||
systemctl = SYSTEMCTL_HELPER.read_text()
|
|
||||||
journalctl = JOURNALCTL_HELPER.read_text()
|
|
||||||
|
|
||||||
assert "command -v systemctl" not in systemctl
|
|
||||||
assert "command -v journalctl" not in journalctl
|
|
||||||
assert "/bin/systemctl" in systemctl or "/usr/bin/systemctl" in systemctl
|
|
||||||
assert "/bin/journalctl" in journalctl or "/usr/bin/journalctl" in journalctl
|
|
||||||
|
|
@ -1,38 +0,0 @@
|
||||||
import subprocess
|
|
||||||
|
|
||||||
from conftest import LIBEXEC, env_with_fake_commands, fake_command
|
|
||||||
|
|
||||||
|
|
||||||
JOURNALCTL_HELPER = LIBEXEC / "left4me-journalctl"
|
|
||||||
|
|
||||||
|
|
||||||
def test_journalctl_helper_passes_shell_syntax_check_and_rejects_bad_args(tmp_path):
|
|
||||||
subprocess.run(["sh", "-n", str(JOURNALCTL_HELPER)], check=True)
|
|
||||||
marker = fake_command(tmp_path, "journalctl")
|
|
||||||
|
|
||||||
for args in [
|
|
||||||
["../evil", "--lines", "25", "--no-follow"],
|
|
||||||
["alpha", "--bad", "25", "--no-follow"],
|
|
||||||
["alpha", "--lines", "not-number", "--no-follow"],
|
|
||||||
["alpha", "--lines", "25", "--bad-follow"],
|
|
||||||
["bad/name", "--lines", "25", "--no-follow"],
|
|
||||||
]:
|
|
||||||
result = subprocess.run(
|
|
||||||
["sh", str(JOURNALCTL_HELPER), *args],
|
|
||||||
env=env_with_fake_commands(tmp_path),
|
|
||||||
check=False,
|
|
||||||
)
|
|
||||||
assert result.returncode != 0, f"helper accepted bad args: {args!r}"
|
|
||||||
assert not marker.exists(), f"helper invoked journalctl for: {args!r}"
|
|
||||||
|
|
||||||
script = JOURNALCTL_HELPER.read_text()
|
|
||||||
assert 'unit="left4me-server@${name}.service"' in script
|
|
||||||
# Anchors `--since` to the unit's most recent start so the panel shows
|
|
||||||
# the current run (and any post-restart lines until reload).
|
|
||||||
assert 'InactiveExitTimestamp' in script
|
|
||||||
assert 'LC_ALL=C' in script
|
|
||||||
assert 'exec "$journalctl" -u "$unit" --since "$start_time" -n "$lines" -o cat "$follow_arg"' in script
|
|
||||||
assert 'exec "$journalctl" -u "$unit" --since "$start_time" -n "$lines" -o cat' in script
|
|
||||||
# Never-started fallback keeps the legacy unit-only form.
|
|
||||||
assert 'exec "$journalctl" -u "$unit" -n "$lines" -o cat "$follow_arg"' in script
|
|
||||||
assert 'exec "$journalctl" -u "$unit" -n "$lines" -o cat' in script
|
|
||||||
|
|
@ -1,32 +0,0 @@
|
||||||
from conftest import LIBEXEC
|
|
||||||
|
|
||||||
|
|
||||||
OVERLAY_HELPER = LIBEXEC / "left4me-overlay"
|
|
||||||
|
|
||||||
|
|
||||||
def test_overlay_helper_is_python_with_strict_validation():
|
|
||||||
text = OVERLAY_HELPER.read_text()
|
|
||||||
assert text.startswith("#!/usr/bin/python3")
|
|
||||||
# Validation surface
|
|
||||||
assert "NAME_RE = re.compile" in text
|
|
||||||
assert "LOWERDIR_ALLOWLIST" in text
|
|
||||||
assert "user.fuseoverlayfs." in text
|
|
||||||
assert "MAX_LOWERDIRS = 500" in text
|
|
||||||
# Mounts via PID 1's mount namespace
|
|
||||||
assert "/proc/1/ns/mnt" in text
|
|
||||||
assert "nsenter" in text
|
|
||||||
# Verbs are mount and umount (not unmount)
|
|
||||||
assert '"mount"' in text and '"umount"' in text
|
|
||||||
assert '"unmount"' not in text
|
|
||||||
|
|
||||||
|
|
||||||
def test_overlay_helper_mount_is_idempotent_when_already_mounted():
|
|
||||||
"""ExecStartPre runs on every Restart=on-failure cycle. If a previous
|
|
||||||
start mounted successfully but ExecStart failed afterwards, the next
|
|
||||||
ExecStartPre would re-mount on top -- which fails. The helper must
|
|
||||||
short-circuit when merged is already a mount point.
|
|
||||||
"""
|
|
||||||
text = OVERLAY_HELPER.read_text()
|
|
||||||
# Two ismount checks now: one in cmd_mount (skip if mounted),
|
|
||||||
# one in cmd_umount (skip if not mounted).
|
|
||||||
assert text.count("os.path.ismount") >= 2
|
|
||||||
|
|
@ -1,146 +0,0 @@
|
||||||
import subprocess
|
|
||||||
|
|
||||||
from conftest import LIBEXEC
|
|
||||||
|
|
||||||
|
|
||||||
SCRIPT_SANDBOX_HELPER = LIBEXEC / "left4me-script-sandbox"
|
|
||||||
|
|
||||||
|
|
||||||
def test_script_sandbox_helper_present():
|
|
||||||
assert SCRIPT_SANDBOX_HELPER.is_file()
|
|
||||||
assert SCRIPT_SANDBOX_HELPER.read_text().startswith("#!/bin/bash")
|
|
||||||
mode = SCRIPT_SANDBOX_HELPER.stat().st_mode & 0o777
|
|
||||||
assert mode == 0o755, f"expected 0755, got {oct(mode)}"
|
|
||||||
|
|
||||||
|
|
||||||
def test_script_sandbox_helper_passes_shell_syntax_check():
|
|
||||||
subprocess.run(["bash", "-n", str(SCRIPT_SANDBOX_HELPER)], check=True)
|
|
||||||
|
|
||||||
|
|
||||||
def test_script_sandbox_helper_invokes_systemd_run_with_hardening():
|
|
||||||
text = SCRIPT_SANDBOX_HELPER.read_text()
|
|
||||||
|
|
||||||
# systemd-run service mode (no --scope), with synchronous I/O to caller.
|
|
||||||
assert "systemd-run" in text
|
|
||||||
assert "--scope" not in text, "v2 uses transient service units, not scopes"
|
|
||||||
assert "--pipe" in text
|
|
||||||
assert "--wait" in text
|
|
||||||
assert "--collect" in text
|
|
||||||
assert "--unit=" in text
|
|
||||||
|
|
||||||
# No bwrap.
|
|
||||||
assert "bwrap" not in text
|
|
||||||
assert "bubblewrap" not in text
|
|
||||||
|
|
||||||
# UID drop via systemd directives.
|
|
||||||
assert "User=left4me" in text
|
|
||||||
assert "Group=left4me" in text
|
|
||||||
|
|
||||||
# Cgroup limits unchanged from v1.
|
|
||||||
assert "MemoryMax=4G" in text
|
|
||||||
assert "MemorySwapMax=0" in text
|
|
||||||
assert "TasksMax=512" in text
|
|
||||||
assert "CPUQuota=200%" in text
|
|
||||||
assert "RuntimeMaxSec=3600" in text
|
|
||||||
|
|
||||||
# Hardening directives that v1 (scope mode) couldn't carry.
|
|
||||||
assert "NoNewPrivileges=yes" in text
|
|
||||||
assert "ProtectSystem=strict" in text
|
|
||||||
assert "ProtectHome=yes" in text
|
|
||||||
assert "PrivateTmp=yes" in text
|
|
||||||
assert "PrivateDevices=yes" in text
|
|
||||||
assert "PrivateIPC=yes" in text
|
|
||||||
assert "ProtectKernelTunables=yes" in text
|
|
||||||
assert "ProtectKernelModules=yes" in text
|
|
||||||
assert "ProtectKernelLogs=yes" in text
|
|
||||||
assert "ProtectControlGroups=yes" in text
|
|
||||||
assert "RestrictNamespaces=yes" in text
|
|
||||||
assert "RestrictSUIDSGID=yes" in text
|
|
||||||
assert "LockPersonality=yes" in text
|
|
||||||
assert "MemoryDenyWriteExecute=yes" in text
|
|
||||||
assert "SystemCallFilter=" in text
|
|
||||||
assert "@system-service" in text
|
|
||||||
assert "@network-io" in text
|
|
||||||
assert "CapabilityBoundingSet=" in text
|
|
||||||
assert "AmbientCapabilities=" in text
|
|
||||||
assert 'RestrictAddressFamilies="AF_INET AF_INET6 AF_UNIX"' in text
|
|
||||||
|
|
||||||
# Network namespace stays shared with host.
|
|
||||||
assert "PrivateNetwork=" not in text
|
|
||||||
|
|
||||||
# Mount setup: /etc and /var/lib masked with tmpfs; selective binds back.
|
|
||||||
assert 'TemporaryFileSystem="/etc /var/lib"' in text
|
|
||||||
assert "BindReadOnlyPaths=" in text
|
|
||||||
# The resolv.conf bind points at the sandbox-only file (not the host's
|
|
||||||
# /etc/resolv.conf, which typically references a private-IP DNS server
|
|
||||||
# that IPAddressDeny= blocks).
|
|
||||||
assert "/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf" in text
|
|
||||||
assert "/etc/ssl" in text
|
|
||||||
assert "/etc/ca-certificates" in text
|
|
||||||
assert "/etc/nsswitch.conf" in text
|
|
||||||
assert "/etc/alternatives" in text
|
|
||||||
assert "${SCRIPT}:/script.sh" in text
|
|
||||||
assert 'BindPaths="${OVERLAY_DIR}:/overlay"' in text
|
|
||||||
|
|
||||||
# IP egress filter: deny localhost / RFC1918 / link-local / multicast /
|
|
||||||
# CGNAT / ULA. Addresses outside the listed ranges, including public
|
|
||||||
# IPs, are not matched by any deny rule and remain reachable.
|
|
||||||
# IPAddressDeny alone — no IPAddressAllow=any. Empirically, having both
|
|
||||||
# set causes the allow to win on this systemd/kernel combo regardless of
|
|
||||||
# the documented "more specific rule wins" behaviour. With only Deny,
|
|
||||||
# the kernel's default "allow all" applies to non-listed addresses.
|
|
||||||
assert "IPAddressDeny=" in text
|
|
||||||
assert "IPAddressAllow=any" not in text
|
|
||||||
# Explicit CIDRs — systemd-run's -p parser doesn't accept the
|
|
||||||
# `localhost` / `link-local` / `multicast` shorthand keywords that
|
|
||||||
# work in unit files (only the full strings parse).
|
|
||||||
for token in (
|
|
||||||
"127.0.0.0/8",
|
|
||||||
"::1/128",
|
|
||||||
"169.254.0.0/16",
|
|
||||||
"fe80::/10",
|
|
||||||
"224.0.0.0/4",
|
|
||||||
"ff00::/8",
|
|
||||||
"10.0.0.0/8",
|
|
||||||
"172.16.0.0/12",
|
|
||||||
"192.168.0.0/16",
|
|
||||||
"100.64.0.0/10",
|
|
||||||
"fc00::/7",
|
|
||||||
):
|
|
||||||
assert token in text, f"missing {token!r} in IPAddressDeny set"
|
|
||||||
|
|
||||||
|
|
||||||
def test_script_sandbox_in_build_slice_with_oom_adjust():
|
|
||||||
text = SCRIPT_SANDBOX_HELPER.read_text()
|
|
||||||
|
|
||||||
# Put the transient unit in the low-weight build slice so it yields to
|
|
||||||
# game-server instances under CPU/IO contention.
|
|
||||||
assert "--slice=l4d2-build.slice" in text
|
|
||||||
|
|
||||||
# Sandbox dies first if the host hits memory pressure; servers
|
|
||||||
# (OOMScoreAdjust=-200) survive.
|
|
||||||
assert "-p OOMScoreAdjust=500" in text
|
|
||||||
|
|
||||||
|
|
||||||
def test_script_sandbox_helper_validates_overlay_id():
|
|
||||||
text = SCRIPT_SANDBOX_HELPER.read_text()
|
|
||||||
# Numeric-only overlay id
|
|
||||||
assert '[[ "$OVERLAY_ID" =~ ^[0-9]+$ ]]' in text
|
|
||||||
# Overlay dir must exist
|
|
||||||
assert "/var/lib/left4me/overlays/" in text
|
|
||||||
assert "[[ -d $OVERLAY_DIR ]]" in text
|
|
||||||
# Script path must exist
|
|
||||||
assert "[[ -f $SCRIPT ]]" in text
|
|
||||||
|
|
||||||
|
|
||||||
def test_script_sandbox_helper_dry_run_mode(tmp_path):
|
|
||||||
overlay_root = tmp_path / "var/lib/left4me/overlays/42"
|
|
||||||
overlay_root.mkdir(parents=True)
|
|
||||||
fake_script = tmp_path / "fake.sh"
|
|
||||||
fake_script.write_text("echo hi")
|
|
||||||
|
|
||||||
helper_text = SCRIPT_SANDBOX_HELPER.read_text()
|
|
||||||
# We can't actually exec this without root; just verify the helper
|
|
||||||
# contains the dry-run guard and its early exit.
|
|
||||||
assert 'LEFT4ME_SCRIPT_SANDBOX_DRY_RUN' in helper_text
|
|
||||||
assert 'exit 0' in helper_text
|
|
||||||
|
|
@ -1,37 +0,0 @@
|
||||||
"""Audit the script→sudoers contract.
|
|
||||||
|
|
||||||
The sudoers file in `deploy/files/etc/sudoers.d/left4me` is a reference
|
|
||||||
example; ckn-bw ships its own verbatim copy under
|
|
||||||
`bundles/left4me/files/etc/sudoers.d/left4me`. The two are expected to
|
|
||||||
match. This test lives under `deploy/scripts/tests/` because the contract being
|
|
||||||
audited is about *scripts* (which paths the `left4me` uid can sudo into).
|
|
||||||
"""
|
|
||||||
from conftest import DEPLOY
|
|
||||||
|
|
||||||
|
|
||||||
SUDOERS = DEPLOY / "files/etc/sudoers.d/left4me"
|
|
||||||
|
|
||||||
|
|
||||||
def test_sudoers_allows_only_left4me_helpers_not_raw_system_tools():
|
|
||||||
sudoers = SUDOERS.read_text()
|
|
||||||
|
|
||||||
assert (
|
|
||||||
"left4me ALL=(root) NOPASSWD: "
|
|
||||||
"/usr/local/libexec/left4me/left4me-systemctl *"
|
|
||||||
) in sudoers
|
|
||||||
assert (
|
|
||||||
"left4me ALL=(root) NOPASSWD: "
|
|
||||||
"/usr/local/libexec/left4me/left4me-journalctl *"
|
|
||||||
) in sudoers
|
|
||||||
assert "/usr/local/libexec/left4me/left4me-overlay mount *" in sudoers
|
|
||||||
assert "/usr/local/libexec/left4me/left4me-overlay umount *" in sudoers
|
|
||||||
assert (
|
|
||||||
"left4me ALL=(root) NOPASSWD: "
|
|
||||||
"/usr/local/libexec/left4me/left4me-script-sandbox"
|
|
||||||
) in sudoers
|
|
||||||
assert "/bin/systemctl" not in sudoers
|
|
||||||
assert "/usr/bin/systemctl" not in sudoers
|
|
||||||
assert "/bin/journalctl" not in sudoers
|
|
||||||
assert "/usr/bin/journalctl" not in sudoers
|
|
||||||
assert "/bin/mount" not in sudoers
|
|
||||||
assert "/bin/umount" not in sudoers
|
|
||||||
|
|
@ -1,39 +0,0 @@
|
||||||
import subprocess
|
|
||||||
|
|
||||||
from conftest import LIBEXEC, env_with_fake_commands, fake_command
|
|
||||||
|
|
||||||
|
|
||||||
SYSTEMCTL_HELPER = LIBEXEC / "left4me-systemctl"
|
|
||||||
|
|
||||||
|
|
||||||
def test_systemctl_helper_passes_shell_syntax_check_and_rejects_bad_args(tmp_path):
|
|
||||||
subprocess.run(["sh", "-n", str(SYSTEMCTL_HELPER)], check=True)
|
|
||||||
marker = fake_command(tmp_path, "systemctl")
|
|
||||||
|
|
||||||
for args in [
|
|
||||||
["bad/action", "alpha"],
|
|
||||||
# `start` and `stop` are no longer accepted verbs — the lifecycle now
|
|
||||||
# uses `enable`/`disable` for reboot survival via WantedBy= symlinks.
|
|
||||||
["start", "alpha"],
|
|
||||||
["stop", "alpha"],
|
|
||||||
["enable", ""],
|
|
||||||
["enable", ".hidden"],
|
|
||||||
["enable", "bad..name"],
|
|
||||||
["enable", "bad/name"],
|
|
||||||
["enable", "bad\\name"],
|
|
||||||
["enable", "bad name"],
|
|
||||||
]:
|
|
||||||
result = subprocess.run(
|
|
||||||
["sh", str(SYSTEMCTL_HELPER), *args],
|
|
||||||
env=env_with_fake_commands(tmp_path),
|
|
||||||
check=False,
|
|
||||||
)
|
|
||||||
assert result.returncode != 0
|
|
||||||
assert not marker.exists()
|
|
||||||
|
|
||||||
script = SYSTEMCTL_HELPER.read_text()
|
|
||||||
assert 'unit="left4me-server@${name}.service"' in script
|
|
||||||
assert 'enable) exec "$systemctl" enable --now "$unit"' in script
|
|
||||||
assert 'disable) exec "$systemctl" disable --now "$unit"' in script
|
|
||||||
assert "--property=ActiveState" in script
|
|
||||||
assert "--property=SubState" in script
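The rejected arguments in the loop above imply a set of naming rules for instance names. The sketch below restates those rules in Python, purely as an illustration: the real helper is a shell script whose validation logic is not shown in this diff, and both the function name `is_valid_instance_name` and the exact allowed character set are assumptions.

```python
import re

# Hypothetical rule set inferred from the rejected test inputs:
# one or more dot-separated segments, each made of letters, digits,
# "_" or "-". This rejects empty names, leading dots, "..", slashes,
# backslashes, and whitespace.
_NAME_RE = re.compile(r"[A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)*")


def is_valid_instance_name(name: str) -> bool:
    """Return True if `name` is safe to interpolate into a unit name."""
    return _NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["alpha", "", ".hidden", "bad..name", "bad/name"]:
        print(repr(candidate), is_valid_instance_name(candidate))
```

Anchoring with `fullmatch` rather than `match` matters here: a partial match would let a string such as `alpha/../x` through on its valid prefix.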
|
|
||||||
|
|
@ -1,10 +1,3 @@
|
||||||
DATABASE_URL=sqlite:////var/lib/left4me/left4me.db
|
DATABASE_URL=sqlite:////var/lib/left4me/left4me.db
|
||||||
SECRET_KEY=replace-with-generated-secret
|
SECRET_KEY=replace-with-generated-secret
|
||||||
JOB_WORKER_THREADS=4
|
JOB_WORKER_THREADS=4
|
||||||
|
|
||||||
# Steam Web API key for ISteamUser/GetPlayerSummaries — used by the
|
|
||||||
# live-state poller to resolve player Steam IDs to persona names + avatars
|
|
||||||
# in the server detail panel. Free at https://steamcommunity.com/dev/apikey.
|
|
||||||
# Optional: if empty, the live-state panel still shows counts/map and the
|
|
||||||
# in-game name from RCON, just with placeholder avatars.
|
|
||||||
STEAM_WEB_API_KEY=
|
|
||||||
|
|
|
||||||
187
deploy/tests/test_deploy_artifacts.py
Normal file
|
|
@ -0,0 +1,187 @@
|
||||||
|
import os
|
||||||
|
import subprocess
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
|
||||||
|
ROOT = Path(__file__).resolve().parents[2]
|
||||||
|
DEPLOY = ROOT / "deploy"
|
||||||
|
|
||||||
|
|
||||||
|
WEB_UNIT = DEPLOY / "files/usr/local/lib/systemd/system/left4me-web.service"
|
||||||
|
SERVER_UNIT = DEPLOY / "files/usr/local/lib/systemd/system/left4me-server@.service"
|
||||||
|
SYSTEMCTL_HELPER = DEPLOY / "files/usr/local/libexec/left4me/left4me-systemctl"
|
||||||
|
JOURNALCTL_HELPER = DEPLOY / "files/usr/local/libexec/left4me/left4me-journalctl"
|
||||||
|
SUDOERS = DEPLOY / "files/etc/sudoers.d/left4me"
|
||||||
|
HOST_ENV = DEPLOY / "templates/etc/left4me/host.env"
|
||||||
|
WEB_ENV_TEMPLATE = DEPLOY / "templates/etc/left4me/web.env.template"
|
||||||
|
DEPLOY_SCRIPT = DEPLOY / "deploy-test-server.sh"
|
||||||
|
|
||||||
|
|
||||||
|
def test_global_unit_files_exist_at_product_level_paths():
|
||||||
|
assert WEB_UNIT.is_file()
|
||||||
|
assert SERVER_UNIT.is_file()
|
||||||
|
|
||||||
|
|
||||||
|
def test_web_unit_contains_required_runtime_contract():
|
||||||
|
unit = WEB_UNIT.read_text()
|
||||||
|
|
||||||
|
assert "User=left4me" in unit
|
||||||
|
assert "Group=left4me" in unit
|
||||||
|
assert "WorkingDirectory=/opt/left4me" in unit
|
||||||
|
assert "Environment=PATH=/opt/left4me/.venv/bin:" in unit
|
||||||
|
assert "EnvironmentFile=/etc/left4me/host.env" in unit
|
||||||
|
assert "EnvironmentFile=/etc/left4me/web.env" in unit
|
||||||
|
assert "ExecStart=/opt/left4me/.venv/bin/gunicorn" in unit
|
||||||
|
assert "--workers 1" in unit
|
||||||
|
assert "NoNewPrivileges=true" in unit
|
||||||
|
assert "PrivateTmp=true" in unit
|
||||||
|
assert "ProtectSystem=full" in unit
|
||||||
|
assert "ReadWritePaths=/var/lib/left4me" in unit
|
||||||
|
|
||||||
|
|
||||||
|
def test_server_unit_contains_required_runtime_contract():
|
||||||
|
unit = SERVER_UNIT.read_text()
|
||||||
|
|
||||||
|
assert "User=left4me" in unit
|
||||||
|
assert "Group=left4me" in unit
|
||||||
|
assert "EnvironmentFile=/etc/left4me/host.env" in unit
|
||||||
|
assert "EnvironmentFile=/var/lib/left4me/instances/%i/instance.env" in unit
|
||||||
|
assert "WorkingDirectory=/var/lib/left4me/runtime/%i/merged/left4dead2" in unit
|
||||||
|
assert "ExecStart=/var/lib/left4me/installation/srcds_run" in unit
|
||||||
|
assert "$L4D2_ARGS" in unit
|
||||||
|
assert "${L4D2_ARGS}" not in unit
|
||||||
|
assert "NoNewPrivileges=true" in unit
|
||||||
|
assert "PrivateTmp=true" in unit
|
||||||
|
assert "PrivateDevices=true" in unit
|
||||||
|
assert "ProtectHome=true" in unit
|
||||||
|
assert "ProtectSystem=strict" in unit
|
||||||
|
assert "ReadOnlyPaths=/var/lib/left4me/installation /var/lib/left4me/overlays" in unit
|
||||||
|
assert "ReadWritePaths=/var/lib/left4me/runtime/%i" in unit
|
||||||
|
assert "RestrictSUIDSGID=true" in unit
|
||||||
|
assert "LockPersonality=true" in unit
|
||||||
|
|
||||||
|
|
||||||
|
def _fake_command(tmp_path, command_name):
|
||||||
|
marker = tmp_path / f"{command_name}.args"
|
||||||
|
command = tmp_path / command_name
|
||||||
|
command.write_text(f"#!/bin/sh\nprintf '%s\n' \"$*\" > '{marker}'\nexit 0\n")
|
||||||
|
command.chmod(0o755)
|
||||||
|
return marker
|
||||||
|
|
||||||
|
|
||||||
|
def _env_with_fake_commands(tmp_path):
|
||||||
|
env = os.environ.copy()
|
||||||
|
env["PATH"] = f"{tmp_path}{os.pathsep}{env.get('PATH', '')}"
|
||||||
|
return env
|
||||||
|
|
||||||
|
|
||||||
|
def test_helpers_use_fixed_system_tool_paths_not_sudo_path():
|
||||||
|
systemctl = SYSTEMCTL_HELPER.read_text()
|
||||||
|
journalctl = JOURNALCTL_HELPER.read_text()
|
||||||
|
|
||||||
|
assert "command -v systemctl" not in systemctl
|
||||||
|
assert "command -v journalctl" not in journalctl
|
||||||
|
assert "/bin/systemctl" in systemctl or "/usr/bin/systemctl" in systemctl
|
||||||
|
assert "/bin/journalctl" in journalctl or "/usr/bin/journalctl" in journalctl
|
||||||
|
|
||||||
|
|
||||||
|
def test_systemctl_helper_passes_shell_syntax_check_and_rejects_bad_args(tmp_path):
|
||||||
|
subprocess.run(["sh", "-n", str(SYSTEMCTL_HELPER)], check=True)
|
||||||
|
marker = _fake_command(tmp_path, "systemctl")
|
||||||
|
|
||||||
|
for args in [
|
||||||
|
["bad/action", "alpha"],
|
||||||
|
["start", ""],
|
||||||
|
["start", ".hidden"],
|
||||||
|
["start", "bad..name"],
|
||||||
|
["start", "bad/name"],
|
||||||
|
["start", "bad\\name"],
|
||||||
|
["start", "bad name"],
|
||||||
|
]:
|
||||||
|
result = subprocess.run(["sh", str(SYSTEMCTL_HELPER), *args], env=_env_with_fake_commands(tmp_path), check=False)
|
||||||
|
assert result.returncode != 0
|
||||||
|
assert not marker.exists()
|
||||||
|
|
||||||
|
script = SYSTEMCTL_HELPER.read_text()
|
||||||
|
assert 'unit="left4me-server@${name}.service"' in script
|
||||||
|
assert 'start) exec "$systemctl" start "$unit"' in script
|
||||||
|
assert 'stop) exec "$systemctl" stop "$unit"' in script
|
||||||
|
assert "--property=ActiveState" in script
|
||||||
|
assert "--property=SubState" in script
|
||||||
|
|
||||||
|
|
||||||
|
def test_journalctl_helper_passes_shell_syntax_check_and_rejects_bad_args(tmp_path):
|
||||||
|
subprocess.run(["sh", "-n", str(JOURNALCTL_HELPER)], check=True)
|
||||||
|
marker = _fake_command(tmp_path, "journalctl")
|
||||||
|
|
||||||
|
for args in [
|
||||||
|
["../evil", "--lines", "25", "--no-follow"],
|
||||||
|
["alpha", "--bad", "25", "--no-follow"],
|
||||||
|
["alpha", "--lines", "not-number", "--no-follow"],
|
||||||
|
["alpha", "--lines", "25", "--bad-follow"],
|
||||||
|
["bad/name", "--lines", "25", "--no-follow"],
|
||||||
|
]:
|
||||||
|
result = subprocess.run(["sh", str(JOURNALCTL_HELPER), *args], env=_env_with_fake_commands(tmp_path), check=False)
|
||||||
|
assert result.returncode != 0
|
||||||
|
assert not marker.exists()
|
||||||
|
|
||||||
|
script = JOURNALCTL_HELPER.read_text()
|
||||||
|
assert 'unit="left4me-server@${name}.service"' in script
|
||||||
|
assert 'exec "$journalctl" -u "$unit" -n "$lines" -o cat "$follow_arg"' in script
|
||||||
|
assert 'exec "$journalctl" -u "$unit" -n "$lines" -o cat' in script
|
||||||
|
|
||||||
|
|
||||||
|
def test_sudoers_allows_only_left4me_helpers_not_raw_system_tools():
|
||||||
|
sudoers = SUDOERS.read_text()
|
||||||
|
|
||||||
|
assert (
|
||||||
|
"left4me ALL=(root) NOPASSWD: "
|
||||||
|
"/usr/local/libexec/left4me/left4me-systemctl *"
|
||||||
|
) in sudoers
|
||||||
|
assert (
|
||||||
|
"left4me ALL=(root) NOPASSWD: "
|
||||||
|
"/usr/local/libexec/left4me/left4me-journalctl *"
|
||||||
|
) in sudoers
|
||||||
|
assert "/bin/systemctl" not in sudoers
|
||||||
|
assert "/usr/bin/systemctl" not in sudoers
|
||||||
|
assert "/bin/journalctl" not in sudoers
|
||||||
|
assert "/usr/bin/journalctl" not in sudoers
|
||||||
|
|
||||||
|
|
||||||
|
def test_env_templates_contain_required_defaults():
|
||||||
|
host_env = HOST_ENV.read_text()
|
||||||
|
assert "Deployment units use fixed /var/lib/left4me paths" in host_env
|
||||||
|
assert host_env.endswith("LEFT4ME_ROOT=/var/lib/left4me\n")
|
||||||
|
assert WEB_ENV_TEMPLATE.read_text() == (
|
||||||
|
"DATABASE_URL=sqlite:////var/lib/left4me/left4me.db\n"
|
||||||
|
"SECRET_KEY=replace-with-generated-secret\n"
|
||||||
|
"JOB_WORKER_THREADS=4\n"
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def test_deploy_script_has_safe_defaults_and_preserves_state() -> None:
|
||||||
|
script = DEPLOY_SCRIPT.read_text()
|
||||||
|
|
||||||
|
assert "useradd --system --home-dir /var/lib/left4me" in script
|
||||||
|
assert "/var/lib/left4me/installation" in script
|
||||||
|
assert "/var/lib/left4me/overlays" in script
|
||||||
|
assert "/var/lib/left4me/instances" in script
|
||||||
|
assert "/var/lib/left4me/runtime" in script
|
||||||
|
assert "tar" in script
|
||||||
|
assert "--exclude .venv" in script
|
||||||
|
assert "pip install -e /opt/left4me/l4d2host -e /opt/left4me/l4d2web" in script
|
||||||
|
assert "systemctl enable --now left4me-web.service" in script
|
||||||
|
assert "for attempt in" in script
|
||||||
|
assert "/opt/left4me/.venv" in script
|
||||||
|
assert "visudo -cf /etc/sudoers.d/left4me" in script
|
||||||
|
assert "if [ ! -f /etc/left4me/web.env ]" in script
|
||||||
|
assert ". /etc/left4me/web.env\n" not in script
|
||||||
|
assert "run_left4me_with_env" in script
|
||||||
|
assert "LEFT4ME_ADMIN_USERNAME" in script
|
||||||
|
assert "LEFT4ME_ADMIN_PASSWORD" in script
|
||||||
|
assert "user already exists" in script
|
||||||
|
assert "deploy/files" in script
|
||||||
|
|
||||||
|
|
||||||
|
def test_deploy_script_shell_syntax() -> None:
|
||||||
|
subprocess.run(["sh", "-n", str(DEPLOY_SCRIPT)], check=True)
|
||||||
|
|
@ -1,330 +0,0 @@
|
||||||
"""Lockdown tests for the curated examples kept under `deploy/files/`.
|
|
||||||
|
|
||||||
`deploy/` is reference material. The production units are emitted by
|
|
||||||
ckn-bw's `systemd_units` reactor in `bundles/left4me/metadata.py`;
|
|
||||||
when reactor output drifts intentionally, update these examples to match.
|
|
||||||
"""
|
|
||||||
from pathlib import Path
|
|
||||||
|
|
||||||
|
|
||||||
ROOT = Path(__file__).resolve().parents[2]
|
|
||||||
DEPLOY = ROOT / "deploy"
|
|
||||||
|
|
||||||
|
|
||||||
WEB_UNIT = DEPLOY / "files/usr/local/lib/systemd/system/left4me-web.service"
|
|
||||||
SERVER_UNIT = DEPLOY / "files/usr/local/lib/systemd/system/left4me-server@.service"
|
|
||||||
GAME_SLICE = DEPLOY / "files/usr/local/lib/systemd/system/l4d2-game.slice"
|
|
||||||
BUILD_SLICE = DEPLOY / "files/usr/local/lib/systemd/system/l4d2-build.slice"
|
|
||||||
SYSCTL_CONF = DEPLOY / "files/etc/sysctl.d/99-left4me.conf"
|
|
||||||
SANDBOX_RESOLV_CONF = DEPLOY / "files/etc/left4me/sandbox-resolv.conf"
|
|
||||||
HOST_ENV = DEPLOY / "templates/etc/left4me/host.env"
|
|
||||||
WEB_ENV_TEMPLATE = DEPLOY / "templates/etc/left4me/web.env.template"
|
|
||||||
WEB_HARDENING_DROPIN = DEPLOY / "files/etc/systemd/system/left4me-web.service.d/10-hardening.conf"
|
|
||||||
SERVER_HARDENING_DROPIN = DEPLOY / "files/etc/systemd/system/left4me-server@.service.d/10-hardening.conf"
|
|
||||||
|
|
||||||
|
|
||||||
def test_global_unit_files_exist_at_product_level_paths():
|
|
||||||
assert WEB_UNIT.is_file()
|
|
||||||
assert SERVER_UNIT.is_file()
|
|
||||||
|
|
||||||
|
|
||||||
def test_web_unit_contains_required_runtime_contract():
|
|
||||||
unit = WEB_UNIT.read_text()
|
|
||||||
|
|
||||||
assert "User=left4me" in unit
|
|
||||||
assert "Group=left4me" in unit
|
|
||||||
assert "WorkingDirectory=/opt/left4me" in unit
|
|
||||||
assert "PATH=/var/lib/left4me/.venv/bin:" in unit
|
|
||||||
assert "EnvironmentFile=/etc/left4me/host.env" in unit
|
|
||||||
assert "EnvironmentFile=/etc/left4me/web.env" in unit
|
|
||||||
assert "ExecStart=/var/lib/left4me/.venv/bin/gunicorn" in unit
|
|
||||||
assert "--workers 1" in unit
|
|
||||||
assert "--threads 32" in unit
|
|
||||||
# NoNewPrivileges must remain unset because sudo (used by the overlay,
|
|
||||||
# systemctl and journalctl helpers) is setuid.
|
|
||||||
assert "NoNewPrivileges=true" not in unit
|
|
||||||
assert "ReadWritePaths=/var/lib/left4me" in unit
|
|
||||||
# Mounts now happen in PID 1's namespace via the left4me-overlay helper,
|
|
||||||
# so MountFlags propagation is irrelevant — and the previous assumption
|
|
||||||
# that MountFlags=shared made it work was incorrect.
|
|
||||||
assert "MountFlags=" not in unit
|
|
||||||
# Hardening directives belong in the drop-in; must not appear in the base unit.
|
|
||||||
assert "PrivateTmp=" not in unit
|
|
||||||
assert "ProtectSystem=" not in unit
|
|
||||||
|
|
||||||
|
|
||||||
def test_server_unit_contains_required_runtime_contract():
|
|
||||||
unit = SERVER_UNIT.read_text()
|
|
||||||
|
|
||||||
assert "User=left4me" in unit
|
|
||||||
assert "Group=left4me" in unit
|
|
||||||
assert "EnvironmentFile=/etc/left4me/host.env" in unit
|
|
||||||
assert "EnvironmentFile=/var/lib/left4me/instances/%i/instance.env" in unit
|
|
||||||
# `-` prefix: chdir failure is non-fatal so ExecStartPre can run the
|
|
||||||
# mount helper before the merged dir exists. ExecStart re-applies and
|
|
||||||
# finds the dir once the mount has landed.
|
|
||||||
assert "WorkingDirectory=-/var/lib/left4me/runtime/%i/merged/left4dead2" in unit
|
|
||||||
# ExecStart must invoke srcds_run from the *merged* overlay tree, not
|
|
||||||
# from installation/. srcds_run cds to its own dirname; if we point at
|
|
||||||
# installation/, the engine reads gameinfo.txt and addons from the lower
|
|
||||||
# layer and never sees overlay plugins (Metamod/SourceMod) or cfgs.
|
|
||||||
assert "ExecStart=/var/lib/left4me/runtime/%i/merged/srcds_run" in unit
|
|
||||||
assert "$L4D2_ARGS" in unit
|
|
||||||
assert "${L4D2_ARGS}" not in unit
|
|
||||||
# Hardening directives belong in the drop-in; must not appear in the base unit.
|
|
||||||
assert "NoNewPrivileges=" not in unit
|
|
||||||
assert "PrivateTmp=" not in unit
|
|
||||||
assert "PrivateDevices=" not in unit
|
|
||||||
assert "ProtectHome=" not in unit
|
|
||||||
assert "ProtectSystem=" not in unit
|
|
||||||
assert "RestrictSUIDSGID=" not in unit
|
|
||||||
assert "LockPersonality=" not in unit
|
|
||||||
|
|
||||||
|
|
||||||
def test_server_unit_mounts_overlay_via_exec_start_pre():
|
|
||||||
"""At boot, systemd auto-starts enabled units before the web app gets a
|
|
||||||
chance to run start_instance's pre-start mount. The unit itself must
|
|
||||||
re-mount the overlay so reboots are transparent. Pairs with the helper's
|
|
||||||
idempotency check (test_overlay_helper_mount_is_idempotent_when_mounted).
|
|
||||||
|
|
||||||
The unit-level `nsenter --mount=/proc/1/ns/mnt --` is what makes
|
|
||||||
umount fast: without it, the helper Python process would inherit
|
|
||||||
the unit's per-service mount namespace and pin it alive, blocking
|
|
||||||
PID 1's umount until the helper exited. Wrapping with nsenter at
|
|
||||||
the Exec line puts the helper itself in PID 1's namespace.
|
|
||||||
"""
|
|
||||||
unit = SERVER_UNIT.read_text()
|
|
||||||
# `+` prefix: runs with full privileges (root, sandboxing not applied). Required because
|
|
||||||
# the unit has NoNewPrivileges=true, which blocks sudo's setuid
|
|
||||||
# escalation — and the helper needs root for the mount syscall.
|
|
||||||
assert (
|
|
||||||
"ExecStartPre=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- "
|
|
||||||
"/usr/local/libexec/left4me/left4me-overlay mount %i"
|
|
||||||
in unit
|
|
||||||
)
|
|
||||||
# Bound the restart loop; without these, a CHDIR-failure (or any other
|
|
||||||
# pre-start error) spins indefinitely.
|
|
||||||
assert "StartLimitBurst=5" in unit
|
|
||||||
assert "StartLimitIntervalSec=60s" in unit
|
|
||||||
|
|
||||||
|
|
||||||
def test_server_unit_unmounts_overlay_via_exec_stop_post():
|
|
||||||
"""Single source of truth for unmount, mirroring the mount path.
|
|
||||||
ExecStopPost (not ExecStop) so it runs after srcds has fully exited
|
|
||||||
and the cgroup is cleared.
|
|
||||||
|
|
||||||
Same nsenter-at-Exec-line wrapping as ExecStartPre — without it,
|
|
||||||
the helper process would itself hold a reference to the unit's
|
|
||||||
per-service mount namespace, and umount in PID 1 would loop on
|
|
||||||
EBUSY until the helper gave up. With it, umount succeeds first try.
|
|
||||||
"""
|
|
||||||
unit = SERVER_UNIT.read_text()
|
|
||||||
assert (
|
|
||||||
"ExecStopPost=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- "
|
|
||||||
"/usr/local/libexec/left4me/left4me-overlay umount %i"
|
|
||||||
in unit
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def test_server_unit_contains_perf_baseline_directives():
|
|
||||||
unit = SERVER_UNIT.read_text()
|
|
||||||
|
|
||||||
# Slice membership.
|
|
||||||
assert "Slice=l4d2-game.slice" in unit
|
|
||||||
|
|
||||||
# CFS priority bump (no SCHED_FIFO).
|
|
||||||
assert "Nice=-5" in unit
|
|
||||||
assert "CPUSchedulingPolicy=" not in unit
|
|
||||||
|
|
||||||
# I/O priority.
|
|
||||||
assert "IOSchedulingClass=best-effort" in unit
|
|
||||||
assert "IOSchedulingPriority=4" in unit
|
|
||||||
|
|
||||||
# OOM ordering: game servers survive, sandbox dies first.
|
|
||||||
assert "OOMScoreAdjust=-200" in unit
|
|
||||||
|
|
||||||
# Memory caps with headroom for map-load spikes.
|
|
||||||
assert "MemoryHigh=1.5G" in unit
|
|
||||||
assert "MemoryMax=2G" in unit
|
|
||||||
|
|
||||||
# Bounded fork surface.
|
|
||||||
assert "TasksMax=256" in unit
|
|
||||||
|
|
||||||
# Plenty of fds for plugin-heavy setups.
|
|
||||||
assert "LimitNOFILE=65536" in unit
|
|
||||||
|
|
||||||
# srcds clean shutdown via SIGINT, with time to flush. With the
|
|
||||||
# helper running in PID 1's mount namespace (via the unit-level
|
|
||||||
# nsenter on ExecStopPost), umount has no race window and the
|
|
||||||
# default 15 s is plenty for the whole stop transition.
|
|
||||||
assert "KillSignal=SIGINT" in unit
|
|
||||||
assert "TimeoutStopSec=15s" in unit
|
|
||||||
|
|
||||||
# Per-unit override of journald rate limiting (default drops srcds output).
|
|
||||||
assert "LogRateLimitIntervalSec=0" in unit
|
|
||||||
|
|
||||||
|
|
||||||
def test_l4d2_game_slice_exists_with_high_weights():
|
|
||||||
assert GAME_SLICE.is_file()
|
|
||||||
text = GAME_SLICE.read_text()
|
|
||||||
assert "[Slice]" in text
|
|
||||||
assert "CPUWeight=1000" in text
|
|
||||||
assert "IOWeight=1000" in text
|
|
||||||
|
|
||||||
|
|
||||||
def test_l4d2_build_slice_exists_with_low_weights():
|
|
||||||
assert BUILD_SLICE.is_file()
|
|
||||||
text = BUILD_SLICE.read_text()
|
|
||||||
assert "[Slice]" in text
|
|
||||||
assert "CPUWeight=10" in text
|
|
||||||
assert "IOWeight=10" in text
|
|
||||||
|
|
||||||
|
|
||||||
def test_sysctl_conf_present_with_perf_settings():
|
|
||||||
assert SYSCTL_CONF.is_file()
|
|
||||||
text = SYSCTL_CONF.read_text()
|
|
||||||
for line in (
|
|
||||||
"net.core.rmem_max = 8388608",
|
|
||||||
"net.core.wmem_max = 8388608",
|
|
||||||
"net.core.rmem_default = 524288",
|
|
||||||
"net.core.wmem_default = 524288",
|
|
||||||
"net.core.netdev_max_backlog = 5000",
|
|
||||||
"net.core.netdev_budget = 600",
|
|
||||||
"vm.swappiness = 10",
|
|
||||||
"net.ipv4.udp_rmem_min = 16384",
|
|
||||||
"net.ipv4.udp_wmem_min = 16384",
|
|
||||||
"net.core.default_qdisc = fq_codel",
|
|
||||||
"net.ipv4.tcp_congestion_control = bbr",
|
|
||||||
"kernel.yama.ptrace_scope = 2",
|
|
||||||
):
|
|
||||||
assert line in text, f"missing {line!r} in 99-left4me.conf"
|
|
||||||
|
|
||||||
|
|
||||||
def test_env_templates_contain_required_defaults():
|
|
||||||
host_env = HOST_ENV.read_text()
|
|
||||||
assert "Deployment units use fixed /var/lib/left4me paths" in host_env
|
|
||||||
assert host_env.endswith("LEFT4ME_ROOT=/var/lib/left4me\n")
|
|
||||||
web_env = WEB_ENV_TEMPLATE.read_text()
|
|
||||||
assert web_env.startswith(
|
|
||||||
"DATABASE_URL=sqlite:////var/lib/left4me/left4me.db\n"
|
|
||||||
"SECRET_KEY=replace-with-generated-secret\n"
|
|
||||||
"JOB_WORKER_THREADS=4\n"
|
|
||||||
)
|
|
||||||
assert web_env.rstrip().endswith("STEAM_WEB_API_KEY=")
|
|
||||||
|
|
||||||
|
|
||||||
def test_sandbox_resolv_conf_exists():
|
|
||||||
assert SANDBOX_RESOLV_CONF.is_file()
|
|
||||||
text = SANDBOX_RESOLV_CONF.read_text()
|
|
||||||
nameservers = [
|
|
||||||
line.split()[1]
|
|
||||||
for line in text.splitlines()
|
|
||||||
if line.startswith("nameserver ")
|
|
||||||
]
|
|
||||||
assert len(nameservers) >= 2, "expected at least two nameservers for redundancy"
|
|
||||||
# Sanity: the resolvers must be public (not RFC1918 / loopback). We don't
|
|
||||||
# pin the exact IPs — Cloudflare/Google/Quad9 are all acceptable.
|
|
||||||
for ns in nameservers:
|
|
||||||
assert not ns.startswith("127."), ns
|
|
||||||
assert not ns.startswith("10."), ns
|
|
||||||
assert not ns.startswith("192.168."), ns
|
|
||||||
first_octet = int(ns.split(".")[0])
|
|
||||||
# Reject 172.16.0.0/12.
|
|
||||||
if first_octet == 172:
|
|
||||||
second_octet = int(ns.split(".")[1])
|
|
||||||
assert not (16 <= second_octet <= 31), ns
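The prefix checks above could also be expressed with the standard-library `ipaddress` module. This is a sketch of a stricter variant, not what the test does: `is_private` covers more reserved ranges than the three RFC1918 blocks checked by hand, and the helper names `is_public_resolver` and `nameservers` are hypothetical.

```python
import ipaddress


def nameservers(resolv_conf_text: str) -> list[str]:
    """Extract the address from each `nameserver <addr>` line."""
    found = []
    for line in resolv_conf_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            found.append(parts[1])
    return found


def is_public_resolver(address: str) -> bool:
    """True if the address is not private, loopback, or link-local."""
    ip = ipaddress.ip_address(address)
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)


if __name__ == "__main__":
    sample = "nameserver 1.1.1.1\nnameserver 10.0.0.1\n"
    for ns in nameservers(sample):
        print(ns, is_public_resolver(ns))
```

Unlike `startswith("10.")`, this also handles IPv6 resolvers and raises `ValueError` on malformed addresses instead of passing them silently.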
|
|
||||||
|
|
||||||
|
|
||||||
def test_web_hardening_dropin_present_with_directives():
|
|
||||||
assert WEB_HARDENING_DROPIN.is_file()
|
|
||||||
text = WEB_HARDENING_DROPIN.read_text()
|
|
||||||
assert "[Service]" in text
|
|
||||||
# COMMON
|
|
||||||
for d in (
|
|
||||||
"ProtectProc=invisible",
|
|
||||||
"ProtectKernelTunables=true",
|
|
||||||
"ProtectKernelModules=true",
|
|
||||||
"ProtectKernelLogs=true",
|
|
||||||
"ProtectClock=true",
|
|
||||||
"ProtectControlGroups=true",
|
|
||||||
"ProtectHostname=true",
|
|
||||||
"LockPersonality=true",
|
|
||||||
"ProtectSystem=strict",
|
|
||||||
"ProtectHome=true",
|
|
||||||
"PrivateTmp=true",
|
|
||||||
"RestrictNamespaces=true",
|
|
||||||
"RestrictRealtime=true",
|
|
||||||
"RemoveIPC=true",
|
|
||||||
"KeyringMode=private",
|
|
||||||
"UMask=0027",
|
|
||||||
"RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX",
|
|
||||||
):
|
|
||||||
assert d in text, f"missing {d!r} in web hardening drop-in"
|
|
||||||
# WEB-specific
|
|
||||||
# `native x86` (not `native`) because the install job fork-execs
|
|
||||||
# steamcmd_linux (32-bit). Plain `native` produces SIGSYS (bash exit 159).
|
|
||||||
assert "SystemCallArchitectures=native x86" in text
|
|
||||||
assert "SystemCallFilter=@system-service" in text
|
|
||||||
assert "SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete" in text
|
|
||||||
# WEB must NOT include the sudo-incompatible directives.
|
|
||||||
assert "NoNewPrivileges=" not in text
|
|
||||||
assert "PrivateUsers=" not in text
|
|
||||||
assert "RestrictSUIDSGID=" not in text
|
|
||||||
assert "CapabilityBoundingSet=" not in text
|
|
||||||
assert "~@privileged" not in text
|
|
||||||
|
|
||||||
|
|
||||||
def test_server_hardening_dropin_present_with_directives():
|
|
||||||
assert SERVER_HARDENING_DROPIN.is_file()
|
|
||||||
text = SERVER_HARDENING_DROPIN.read_text()
|
|
||||||
assert "[Service]" in text
|
|
||||||
for d in (
|
|
||||||
"NoNewPrivileges=true",
|
|
||||||
"RestrictSUIDSGID=true",
|
|
||||||
"PrivateUsers=true",
|
|
||||||
"PrivatePIDs=true",
|
|
||||||
"PrivateIPC=true",
|
|
||||||
"PrivateDevices=true",
|
|
||||||
"CapabilityBoundingSet=",
|
|
||||||
"AmbientCapabilities=",
|
|
||||||
"SystemCallArchitectures=native x86",
|
|
||||||
"TemporaryFileSystem=/var/lib /etc /opt /home /root /srv /mnt /media",
|
|
||||||
"BindReadOnlyPaths=/var/lib/left4me/installation",
|
|
||||||
"BindReadOnlyPaths=/var/lib/left4me/overlays",
|
|
||||||
"BindReadOnlyPaths=/etc/left4me/host.env",
|
|
||||||
"BindPaths=/var/lib/left4me/runtime/%i",
|
|
||||||
"SocketBindAllow=udp:27000-27999",
|
|
||||||
"SocketBindAllow=tcp:27000-27999",
|
|
||||||
):
|
|
||||||
assert d in text, f"missing {d!r} in server hardening drop-in"
|
|
||||||
assert "SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete @privileged" in text
|
|
||||||
# MemoryDenyWriteExecute must remain absent (Source engine compat).
|
|
||||||
assert "MemoryDenyWriteExecute" not in text
|
|
||||||
# ProcSubset=pid must remain absent — hides /proc/cpuinfo and breaks
|
|
||||||
# SteamAPI master-server registration (LAN-only fallback). See
|
|
||||||
# ckn-bw 4339289 and the comment block in the drop-in itself.
|
|
||||||
for line in text.splitlines():
|
|
||||||
bare = line.split("#", 1)[0].strip()
|
|
||||||
assert bare != "ProcSubset=pid", "ProcSubset=pid must not be active in the server drop-in"
|
|
||||||
|
|
||||||
|
|
||||||
def test_hardening_dropins_agree_on_syscall_architectures():
|
|
||||||
# Both units fork-exec a 32-bit binary on critical paths: the web
|
|
||||||
# service runs the install job (steamcmd_linux), the server unit runs
|
|
||||||
# srcds_linux. Either drop-in without `x86` in SystemCallArchitectures
|
|
||||||
# SIGSYS-kills its child on first syscall (bash exit 159). They must
|
|
||||||
# agree, and both must include x86 — caught the hard way on
|
|
||||||
# 2026-05-15 when web had `native` only and the install job died.
|
|
||||||
import re
|
|
||||||
|
|
||||||
pat = re.compile(r"^SystemCallArchitectures=(.+)$", re.MULTILINE)
|
|
||||||
web_arch = pat.search(WEB_HARDENING_DROPIN.read_text()).group(1).strip()
|
|
||||||
srv_arch = pat.search(SERVER_HARDENING_DROPIN.read_text()).group(1).strip()
|
|
||||||
assert web_arch == srv_arch, (
|
|
||||||
f"hardening drop-ins disagree on SystemCallArchitectures: "
|
|
||||||
f"web={web_arch!r} server={srv_arch!r}. Both must include `x86`."
|
|
||||||
)
|
|
||||||
assert "x86" in web_arch.split(), (
|
|
||||||
f"SystemCallArchitectures missing x86: {web_arch!r}. Required for "
|
|
||||||
"steamcmd_linux (install job) and srcds_linux."
|
|
||||||
)
|
|
||||||
|
|
@ -1,17 +0,0 @@
|
||||||
"""Syntax-check the sudoers drop-in via visudo before it leaves the repo."""
|
|
||||||
import shutil
|
|
||||||
import subprocess
|
|
||||||
from pathlib import Path
|
|
||||||
|
|
||||||
import pytest
|
|
||||||
|
|
||||||
SUDOERS = Path(__file__).resolve().parents[2] / "deploy/files/etc/sudoers.d/left4me"
|
|
||||||
|
|
||||||
|
|
||||||
@pytest.mark.skipif(shutil.which("visudo") is None, reason="visudo not installed")
|
|
||||||
def test_sudoers_parses():
|
|
||||||
result = subprocess.run(
|
|
||||||
["visudo", "-cf", str(SUDOERS)],
|
|
||||||
capture_output=True, text=True,
|
|
||||||
)
|
|
||||||
assert result.returncode == 0, f"visudo -cf failed: {result.stdout}{result.stderr}"
|
|
||||||
|
|
@ -1,507 +0,0 @@
|
||||||
# L4D2 server cvar reference
|
|
||||||
|
|
||||||
Working notes from the 2026-05 research session on best-practice
|
|
||||||
L4D2 dedicated server settings. Sources cited inline; some findings
|
|
||||||
verified empirically on the `left4.me` Trixie test server (kernel
|
|
||||||
6.12.86). This is reference material, not a settled design.
|
|
||||||
|
|
||||||
## Quick lookup
|
|
||||||
|
|
||||||
| Topic | Recommended |
|
|
||||||
|---|---|
|
|
||||||
| Tickrate (stock) | 30 |
|
|
||||||
| Tickrate (competitive) | 100, requires Tickrate Enabler plugin |
|
|
||||||
| `sv_pure` | `2` (strict), or `0`/`1` for modded servers |
|
|
||||||
| `sv_cheats` | `0` (set to 1 only on private practice servers; disables VAC) |
|
|
||||||
| `sv_consistency` | `0` (allow custom campaigns) or `1` (strict for competitive) |
|
|
||||||
| `sv_alltalk` | `0` (no cross-team voice), `1` for casual / fun servers |
|
|
||||||
| `sv_lan` | `0` (internet server) |
|
|
||||||
| `sv_voiceenable` | `1` |
|
|
||||||
| `nb_update_frequency` | `0.033` (safe, no plugin), `0.014` with the SM fix plugin. **Cheat-protected — must be set via `sm_cvar`.** |
|
|
||||||
| `nb_update_framelimit` | `30` (was 15). Raises the per-frame bot-AI cap so commons don't lag at high counts. **Cheat-protected.** |
|
|
||||||
| `fps_max` | `64` for 30-tick, `0` (uncapped) for higher ticks |
|
|
||||||
| `net_maxcleartime` | `0.0001` — drop choked packets fast instead of stalling. **Cheat-protected.** |
|
|
||||||
| `sv_tags` | `"coop,custom"` (etc.) — Steam server browser hint |
|
|
||||||
| `sv_region` | `3` (EU) / `1` (US East) / `255` (any) |
|
|
||||||
|
|
||||||
## Copy-paste best practice config
|
|
||||||
|
|
||||||
A complete starting config that pairs with the project's existing
|
|
||||||
`examples/script-overlays/tickrate.sh` overlay (which installs the
|
|
||||||
Tickrate Enabler plugin) and a SourceMod install. Two files: the
|
|
||||||
plain `server.cfg` and a SourceMod-only `cfg/sourcemod/sourcemod.cfg`.
|
|
||||||
|
|
||||||
For background and per-cvar rationale, see the topic sections below.
|
|
||||||
|
|
||||||
### `server.cfg` (vanilla, non-cheat cvars only)
|
|
||||||
|
|
||||||
```
|
|
||||||
// --- Identity & discoverability ---
|
|
||||||
hostname "your server name here"
|
|
||||||
sv_tags "coop,custom"
|
|
||||||
sv_region 3 // 3=EU, 1=US East, 255=any
|
|
||||||
sv_lan 0
|
|
||||||
sv_steamgroup "0" // your Steam group ID for reserved slots
|
|
||||||
sv_search_key "0" // groups your servers in the lobby browser
|
|
||||||
|
|
||||||
// --- Security ---
|
|
||||||
sv_cheats 0
|
|
||||||
sv_pure 0 // 0/1 for modded servers; 2 = strict
|
|
||||||
sv_consistency 0 // 0 if hosting custom campaigns; 1 = strict
|
|
||||||
sv_password ""
|
|
||||||
sv_allow_lobby_connect_only 0 // let players connect via IP, not just lobby
|
|
||||||
|
|
||||||
// --- Voice / chat ---
|
|
||||||
sv_voiceenable 1
|
|
||||||
sv_alltalk 0
|
|
||||||
|
|
||||||
// --- Player limits (coop) ---
|
|
||||||
sv_maxplayers 4
|
|
||||||
sv_visiblemaxplayers 4
|
|
||||||
// (For versus: 8/8)
|
|
||||||
|
|
||||||
// --- Network rates (100-tick; requires Tickrate Enabler) ---
|
|
||||||
sv_minrate 100000
|
|
||||||
sv_maxrate 100000
|
|
||||||
sv_mincmdrate 100
|
|
||||||
sv_maxcmdrate 100
|
|
||||||
sv_minupdaterate 100
|
|
||||||
sv_maxupdaterate 100
|
|
||||||
sv_client_min_interp_ratio -1
|
|
||||||
sv_client_max_interp_ratio 2
|
|
||||||
net_splitpacket_maxrate 50000
|
|
||||||
net_splitrate 2
|
|
||||||
fps_max 0
|
|
||||||
sv_forcepreload 1
|
|
||||||
|
|
||||||
// --- Logging (used by left4me's log-streaming feature) ---
|
|
||||||
sv_logfile 1
|
|
||||||
sv_logflush 0
|
|
||||||
sv_logecho 1
|
|
||||||
sv_logbans 1
|
|
||||||
```
|
|
||||||
|
|
||||||
### `cfg/sourcemod/sourcemod.cfg` (cheat-flagged cvars, set via `sm_cvar`)
|
|
||||||
|
|
||||||
```
|
|
||||||
// --- Network tweaks (cheat-flagged or SM-managed) ---
|
|
||||||
sm_cvar net_maxcleartime 0.0001
|
|
||||||
|
|
||||||
// --- Simulation cadence (more frequent AI ticks; no behaviour change) ---
|
|
||||||
sm_cvar nb_update_frequency 0.033 // 0.014 if you have the AM fix plugin
|
|
||||||
sm_cvar nb_update_framelimit 30 // default 15 — raise per-frame bot AI cap
|
|
||||||
|
|
||||||
// --- Diagnostics ---
|
|
||||||
sm_cvar nb_stuck_dump_threshold 5 // log stuck bots ≥5s
|
|
||||||
```
|
|
||||||
|
|
||||||
> **If you're not running SourceMod**, the entire `sm_cvar` block
|
|
||||||
> above is dead — those cvars are cheat-protected and silently
|
|
||||||
> ignored from plain `server.cfg`. The vanilla block still applies
|
|
||||||
> and delivers the bulk of the network-feel improvements. See
|
|
||||||
> [Cheat-protected cvars and `sm_cvar`](#cheat-protected-cvars-and-sm_cvar).
|
|
||||||
|
|
||||||
For tickrates other than 100, see the
|
|
||||||
[Network rates](#network-rates) section below.
|
|
||||||
|
|
||||||
## Network rates
|
|
||||||
|
|
||||||
L4D2's default tickrate is **30**. Rates above the corresponding
|
|
||||||
ceiling are ignored without the
|
|
||||||
[Tickrate Enabler plugin](https://github.com/SirPlease/Server4Dead-Project/tree/master/Tickrate%20Enabler).
|
|
||||||
|
|
||||||
Rule of thumb: `sv_maxrate = tickrate × 1000`.
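
The rule of thumb can be expressed as a small generator. The sketch below derives the six locked (`min == max`) rate cvars from a tickrate; the function names `rate_cvars` and `render_cfg` are illustrative assumptions and not part of left4me. It deliberately leaves out `net_splitpacket_maxrate` and `fps_max`, which this doc sets per profile.

```python
def rate_cvars(tickrate: int) -> dict[str, int]:
    """Derive the locked (min == max) rate cvars for a given tickrate."""
    if tickrate <= 0:
        raise ValueError("tickrate must be positive")
    rate = tickrate * 1000  # rule of thumb: sv_maxrate = tickrate x 1000
    return {
        "sv_minrate": rate,
        "sv_maxrate": rate,
        "sv_mincmdrate": tickrate,
        "sv_maxcmdrate": tickrate,
        "sv_minupdaterate": tickrate,
        "sv_maxupdaterate": tickrate,
    }


def render_cfg(cvars: dict[str, int]) -> str:
    """Render cvars as cfg lines, one `name value` pair per line."""
    return "\n".join(f"{name} {value}" for name, value in cvars.items())


if __name__ == "__main__":
    print(render_cfg(rate_cvars(60)))
```

Anything above 30 still requires the Tickrate Enabler plugin; the generator only produces the text.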
|
|
||||||
|
|
||||||
### 30-tick (stock)
|
|
||||||
|
|
||||||
```
|
|
||||||
sv_minrate 30000
|
|
||||||
sv_maxrate 30000
|
|
||||||
sv_mincmdrate 30
|
|
||||||
sv_maxcmdrate 30
|
|
||||||
sv_minupdaterate 30
|
|
||||||
sv_maxupdaterate 30
|
|
||||||
net_splitpacket_maxrate 30000
|
|
||||||
fps_max 64
|
|
||||||
```
|
|
||||||
|
|
||||||
### 60-tick (requires Tickrate Enabler)
|
|
||||||
|
|
||||||
```
|
|
||||||
sv_minrate 60000
|
|
||||||
sv_maxrate 60000
|
|
||||||
sv_mincmdrate 60
|
|
||||||
sv_maxcmdrate 60
|
|
||||||
sv_minupdaterate 60
|
|
||||||
sv_maxupdaterate 60
|
|
||||||
net_splitpacket_maxrate 60000
|
|
||||||
fps_max 128
|
|
||||||
```
|
|
||||||
|
|
||||||
### 100-tick (competitive, requires Tickrate Enabler)
|
|
||||||
|
|
||||||
```
|
|
||||||
sv_minrate 100000
|
|
||||||
sv_maxrate 100000
|
|
||||||
sv_mincmdrate 100
|
|
||||||
sv_maxcmdrate 100
|
|
||||||
sv_minupdaterate 100
|
|
||||||
sv_maxupdaterate 100
|
|
||||||
net_splitpacket_maxrate 100000
|
|
||||||
fps_max 0
|
|
||||||
```
|
|
||||||
|
|
||||||
### sv_min*rate vs. sv_max*rate
|
|
||||||
|
|
||||||
- Locking `min == max` (competitive servers do this) ensures every
|
|
||||||
client sends at the tickrate exactly. Strict — kicks clients
|
|
||||||
that dip below.
|
|
||||||
- Leaving a range (e.g. `min=10, max=30` on a 30-tick public
|
|
||||||
server) tolerates clients on weak connections or loaded CPUs.
|
|
||||||
- Setting `sv_mincmdrate=0` means *no enforced minimum* — clients
|
|
||||||
could send as few as 1-2 cmds/sec. Bad. Pick a floor that's
|
|
||||||
playable (~10 minimum).
|
|
||||||
|
|
||||||
## Cheat-protected cvars and `sm_cvar`
|
|
||||||
|
|
||||||
Several gameplay-affecting cvars are flagged as "cheat" in L4D2 and
|
|
||||||
**cannot be set via `server.cfg` unless `sv_cheats 1`** — which
|
|
||||||
disables VAC and gates achievements. Trying to set them from cfg
|
|
||||||
silently fails (the value stays at default).
|
|
||||||
|
|
||||||
To set them on a real (VAC-protected) server: install SourceMod and
|
|
||||||
use `sm_cvar <name> <value>` instead of `<name> <value>`. SourceMod
|
|
||||||
bypasses the cheat protection for *server-side cvar writes only*
|
|
||||||
(does not grant cheat commands to players).
|
|
||||||
|
|
||||||
Cheat-protected cvars worth knowing:
|
|
||||||
- `nb_update_frequency` — common-infected pathing/state update
|
|
||||||
rate (see below).
|
|
||||||
- `director_*` — most director cvars (AI difficulty, panic events,
|
|
||||||
pacing).
|
|
||||||
- `z_*` — most zombie-behavior cvars.
|
|
||||||
|
|
||||||
`sm_cvar` writes go in `cfg/sourcemod/sourcemod.cfg` (auto-execed
|
|
||||||
by SM on map change) or in any cfg under `cfg/sourcemod/`. SM
|
|
||||||
re-applies these on every map change — important because
|
|
||||||
cheat-protected cvars *reset to defaults on map change* even
|
|
||||||
within the same server session.
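
As a rough illustration of the split described above, the following sketch routes cvars into a plain `server.cfg` body and an `sm_cvar` body for `cfg/sourcemod/sourcemod.cfg`. The helper name `split_config` is hypothetical, and the set of cheat-protected names is supplied by the caller, since this doc only says that "most" `director_*` and `z_*` cvars are protected.

```python
def split_config(
    cvars: dict[str, str], cheat_protected: set[str]
) -> tuple[str, str]:
    """Return (server_cfg_text, sourcemod_cfg_text) for the given cvars."""
    plain_lines: list[str] = []
    sm_lines: list[str] = []
    for name, value in cvars.items():
        if name in cheat_protected:
            # Cheat-protected cvars must be written through SourceMod.
            sm_lines.append(f"sm_cvar {name} {value}")
        else:
            plain_lines.append(f"{name} {value}")
    return "\n".join(plain_lines), "\n".join(sm_lines)


if __name__ == "__main__":
    server_cfg, sourcemod_cfg = split_config(
        {"sv_maxrate": "100000", "nb_update_frequency": "0.033"},
        cheat_protected={"nb_update_frequency"},
    )
    print(server_cfg)
    print(sourcemod_cfg)
```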
|
|
||||||
|
|
||||||
## `nb_update_frequency`
|
|
||||||
|
|
||||||
Like raising server tickrate, this controls *how often* common
|
|
||||||
infected and witches get an AI tick — it doesn't change what they
|
|
||||||
decide, only how quickly the engine asks them. Pure cadence cvar.
|
|
||||||
|
|
||||||
Default `0.1` (10 Hz), independent of server tickrate.
|
|
||||||
|
|
||||||
| Value | Effect |
|
|
||||||
|---|---|
|
|
||||||
| `0.1` (default) | Common-infected look choppy regardless of tickrate |
|
|
||||||
| `0.033` | ~30 Hz updates; smooth, safe without plugin |
|
|
||||||
| `0.024` | Lowest "safe" without plugin per community testing |
|
|
||||||
| `0.014` | ~71 Hz; clients with `cl_interp 0` see jittery commons unless the [nb_update_frequency fix plugin](https://forums.alliedmods.net/showthread.php?t=344019) is installed |
|
|
||||||
|
|
||||||
Set via `sm_cvar nb_update_frequency 0.033` in
|
|
||||||
`cfg/sourcemod/sourcemod.cfg` (or any sm-auto-execed cfg). Without
|
|
||||||
SourceMod, you cannot reliably set this on a VAC-protected server.
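
The Hz figures in the table follow from treating the cvar as an interval in seconds, so the update rate is its reciprocal. A minimal sketch of that arithmetic (the function name is illustrative):

```python
def updates_per_second(nb_update_frequency: float) -> float:
    """Convert an nb_update_frequency interval (seconds) into updates/sec."""
    if nb_update_frequency <= 0:
        raise ValueError("interval must be positive")
    return 1.0 / nb_update_frequency


if __name__ == "__main__":
    for interval in (0.1, 0.033, 0.024, 0.014):
        print(f"{interval}: ~{updates_per_second(interval):.0f} Hz")
```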
|
|
||||||
|
|
||||||
## NextBot scheduler & diagnostics
|
|
||||||
|
|
||||||
`nb_update_frequency` (covered above) is *how often* the scheduler
|
|
||||||
asks bots to think. Two related cvars are also pure
|
|
||||||
cadence/throughput — no behaviour change — and one is a passive
|
|
||||||
diagnostic.
|
|
||||||
|
|
||||||
### `nb_update_framelimit`
|
|
||||||
|
|
||||||
Default `15`. **Maximum number of NextBots that get an AI tick per
|
|
||||||
server frame.** Above this cap the engine round-robins bots across
|
|
||||||
frames, so at 30 commons on a 30-tick server each common gets a
|
|
||||||
fresh think roughly every other frame — visible as "zombies
|
|
||||||
hesitate before chasing." Raising this to `30`–`60` lets every bot
|
|
||||||
think every frame at the cost of linear extra CPU. Does not alter
|
|
||||||
how bots decide what to do; only how often they get to decide.
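
The "every other frame" figure can be reproduced with a simplified model of the round-robin described above. This is a sketch under the assumption that the cap is the only limiting factor; it ignores any interaction with `nb_update_frequency`, and the function names are illustrative.

```python
import math


def frames_between_thinks(bot_count: int, framelimit: int) -> int:
    """Frames a bot waits between AI ticks when the cap round-robins bots."""
    if framelimit <= 0:
        raise ValueError("framelimit must be positive")
    return max(1, math.ceil(bot_count / framelimit))


def think_interval_seconds(bot_count: int, framelimit: int, tickrate: int) -> float:
    """Approximate wall-clock time between AI ticks for a single bot."""
    return frames_between_thinks(bot_count, framelimit) / tickrate


if __name__ == "__main__":
    # 30 commons, default cap of 15, 30-tick server
    print(frames_between_thinks(30, 15))
    print(think_interval_seconds(30, 15, 30))
```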
|
|
||||||
|
|
||||||
This is the most under-documented L4D2 cvar, and its symptoms are most often
|
|
||||||
blamed on tickrate or `nb_update_frequency` when it's neither.
|
|
||||||
|
|
||||||
Cheat-protected — use `sm_cvar`.
|
|
||||||
|
|
||||||
### `nb_stuck_dump_threshold`
|
|
||||||
|
|
||||||
Default `-1` (disabled). Set to `5` to log any bot stuck for ≥5
|
|
||||||
seconds to the server console. Costs nothing at runtime and is the
|
|
||||||
single best diagnostic for "why do zombies keep clipping into
|
|
||||||
geometry on this custom campaign?" tickets. Pure logging — does
|
|
||||||
not affect bot behaviour. Cheat-protected.
|
|
||||||
|
|
||||||
## Lag compensation
|
|
||||||
|
|
||||||
Most lag-compensation cvars exist in-engine but are missing from the truncated
|
|
||||||
`cvar_list` dump. Verify on your own server with `sm_cvar <name>`
|
|
||||||
(no value) before relying on them.
|
|
||||||
|
|
||||||
| Cvar | Default | Notes |
|
|
||||||
|---|---|---|
|
|
||||||
| `sv_unlag` | `1` | Enable lag compensation. Keep on. |
|
|
||||||
| `sv_maxunlag` | `0.5`–`1.0` | Max seconds of lag-comp rewind. Confogl uses `1`. Higher values help shots from higher-ping players. |
|
|
||||||
| `sv_unlag_fixstuck` | `1` | Used by upstream Competitive-Rework. |
|
|
||||||
| `sv_forcepreload` | `0` | Set to `1` to preload server-side assets at boot. Smoother first map. Confirmed in `cvar_list`. |
|
|
||||||
|
|
||||||
## Packet compression & high-entity-count tuning
|
|
||||||
|
|
||||||
Relevant when running custom servers with raised `z_common_limit`,
|
|
||||||
big mob spawns, or many addon entities. At high entity counts,
|
|
||||||
snapshots routinely exceed the UDP MTU and get split into multiple
|
|
||||||
packets. Clients perceive this as "lag" — but it's really
|
|
||||||
*snapshot drops*, visible in `net_graph` as updates/sec dipping
|
|
||||||
well below `sv_maxupdaterate`. The fix is on the wire, not in the
|
|
||||||
simulation.
|
|
||||||
|
|
||||||
Source: [Lux's L4D2 high-zombie-count discussion (Steam)](https://steamcommunity.com/app/550/discussions/0/2568690416482192538/).
|
|
||||||
|
|
||||||
| Cvar | Default | Recommended | Notes |
|
|
||||||
|---|---|---|---|
|
|
||||||
| `net_compresspackets` | varies | `1` | Enable LZ-style packet compression. Cheap CPU win for high-entity servers. Verify with `sm_cvar`. |
|
|
||||||
| `net_compresspackets_minsize` | varies | `2324` | Compress packets ≥ this size — roughly the wire MTU. |
|
|
||||||
| `net_splitrate` | `1` | `2` | Allow 2 split-packet pieces per net frame; drains queue faster. Confirmed in `cvar_list`. |
|
|
||||||
| `net_splitpacket_maxrate` | `15000` | `50000`+ | Throughput cap when sending split packets. |
|
|
||||||
| `net_maxcleartime` | `4.0` | `0.0001` | Don't stall on choke — drop choked packets fast. Confirmed real (RCON-verified 2026-05-20: `sm_cvar net_maxcleartime` returns the set value). |
|
|
||||||
| `sv_extra_client_connect_time` | varies | `0.0001` | Tiny handshake speedup from the Lux thread. Verify with `sm_cvar`. |
|
|
||||||
|
|
||||||
Several of these are missing from the local `cvar_list` dump but
|
|
||||||
that file is **not exhaustive** — see
|
|
||||||
[Verifying a cvar actually exists](#verifying-a-cvar-actually-exists)
|
|
||||||
below. Several of these lines exist verbatim in upstream
|
|
||||||
Competitive-Rework's `cfg/server.cfg`, which has been running on
|
|
||||||
public servers for years.
|
|
||||||
|
|
||||||
## Server discoverability
|
|
||||||
|
|
||||||
Cosmetic but real UX wins for public servers.
|
|
||||||
|
|
||||||
| Cvar | Recommended | What it does |
|
|
||||||
|---|---|---|
|
|
||||||
| `sv_tags` | `"coop,custom,modded"` (your choice) | Comma-separated tags shown in the Steam server browser. Players filter on these. |
|
|
||||||
| `sv_region` | `3` (Europe), `0` (US East), `1` (US West), `255` (world) | Region reported to the master server. Set it so your server appears in the right regional browser. |
|
|
||||||
| `sv_search_key` | `"left4me"` (or your own string) | When players search from the in-game lobby, only servers with a matching key appear. Useful for grouping a fleet. |
|
|
||||||
| `sv_steamgroup` | your group's ID | Steam group members get reserved-slot priority (with the appropriate plugin). |
|
|
||||||
| `sv_lan` | `0` | Set `1` only for local-only play; skips Steam auth (players can't friend-join). |
|
|
||||||
|
|
||||||
## Logging hygiene
|
|
||||||
|
|
||||||
Relevant because the project's log-streaming feature (the work in
|
|
||||||
`l4d2web/static/js/files-overlay/editor.js` and adjacent) tails
|
|
||||||
the server log file. These cvars control what actually gets
|
|
||||||
written.
|
|
||||||
|
|
||||||
| Cvar | Recommended | Notes |
|
|
||||||
|---|---|---|
|
|
||||||
| `sv_logfile` | `1` | Server log to disk. Required for log-streaming. |
|
|
||||||
| `sv_logflush` | `0` | Don't flush after every line — slow. Keep at `0` unless you're debugging crashes. |
|
|
||||||
| `sv_logecho` | `1` | Mirror log to stdout — needed for any process that tails srcds's console. |
|
|
||||||
| `sv_logbans` | `1` | Log every `kickid` / `banid` to the same log file. Cheap audit trail. |
|
|
||||||
| `sv_log_onefile` | `0` | Default — one log per day. `1` rolls everything into a single file (gets large quickly). |
|
|
||||||
| `sv_logsdir` | `"logs"` | Default. Path is relative to the game directory. |
|
|
||||||
|
|
||||||
## Verifying a cvar actually exists
|
|
||||||
|
|
||||||
The local `/Users/mwiegand/Projekte/left4me/cvar_list` dump (~2199
|
|
||||||
entries) is **incomplete** — it's missing several real L4D2 cvars
|
|
||||||
that upstream Competitive-Rework uses and that have been verified
|
|
||||||
in-engine via RCON. Likely it was generated via the in-engine
|
|
||||||
`cvarlist` command, which truncates and filters.
|
|
||||||
|
|
||||||
Authoritative existence check via SourceMod console (RCON):
|
|
||||||
|
|
||||||
```
|
|
||||||
sm_cvar <name> # no value → "Value of cvar X: Y" if real,
|
|
||||||
# "unknown" otherwise
|
|
||||||
```
|
|
||||||
|
|
||||||
The screenshot evidence for `net_maxcleartime` (2026-05-20):
|
|
||||||
|
|
||||||
```
|
|
||||||
> sm_cvar net_maxcleartime
|
|
||||||
[SM] Value of cvar "net_maxcleartime": "0.0001"
|
|
||||||
```
|
|
||||||
|
|
||||||
Rule of thumb when copying configs from elsewhere:
|
|
||||||
|
|
||||||
1. If the cvar is in `cvar_list` → it's definitely real.
|
|
||||||
2. If it's *not* in `cvar_list` but is in upstream Competitive-
|
|
||||||
Rework's `server.cfg` → probably real, but verify via `sm_cvar`
|
|
||||||
before relying on it.
|
|
||||||
3. If it's in neither and only mentioned in a random forum post →
|
|
||||||
high probability it's a CSGO/CS:S or HL2 cvar that someone
|
|
||||||
assumed exists in L4D2.
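
The three rules above amount to a simple triage. A minimal sketch, assuming the names from `cvar_list` and from upstream's `server.cfg` have already been loaded into sets (the parsing of either file is not shown, and `classify_cvar` is an illustrative name):

```python
def classify_cvar(name: str, cvar_list: set[str], upstream_cfg: set[str]) -> str:
    """Triage a cvar name by where it has been seen."""
    if name in cvar_list:
        return "real"
    if name in upstream_cfg:
        return "probably real; verify with sm_cvar"
    return "suspect; likely from another Source game"


if __name__ == "__main__":
    known = {"net_splitrate", "sv_forcepreload"}
    upstream = {"net_maxcleartime"}
    for cvar in ("net_splitrate", "net_maxcleartime", "z_update_rate"):
        print(cvar, "->", classify_cvar(cvar, known, upstream))
```

Only the first outcome is a positive result; the other two still need the RCON check.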
|
|
||||||
|
|
||||||
## Cvars that DO NOT exist in L4D2 (despite some guides claiming otherwise)
|
|
||||||
|
|
||||||
These come up in older guides or are inherited from other Source
|
|
||||||
games but don't actually exist in L4D2's command set. Verified by
|
|
||||||
RCON `sm_cvar <name>` returning "unknown":
|
|
||||||
|
|
||||||
- `z_resolve_zombie_collision_multiplier` — confirmed unknown in
|
|
||||||
current L4D2 builds (verified via RCON 2026-05-14). Some
|
|
||||||
community guides list it; it's not in the binary.
|
|
||||||
- `z_update_rate` — referenced in older tuning guides but not a
|
|
||||||
real L4D2 cvar. The actual zombie-AI cadence knob is
|
|
||||||
`nb_update_frequency`.
|
|
||||||
|
|
||||||
If a guide tells you to set one of these in L4D2, the guide is
|
|
||||||
wrong or out of date.
|
|
||||||
|
|
||||||
**Earlier revisions of this doc also listed `net_maxcleartime`
|
|
||||||
here. That was wrong** — it's a real L4D2 cvar (RCON-verified
|
|
||||||
2026-05-20 returning `0.0001` on `left4.me`). It just happens to
|
|
||||||
be missing from the `cvar_list` dump. The lesson: the cvar_list
|
|
||||||
file is useful as a positive check but unreliable as a negative
|
|
||||||
check (see
|
|
||||||
[Verifying a cvar actually exists](#verifying-a-cvar-actually-exists)).
|
|
||||||
|
|
||||||
## Security and integrity
|
|
||||||
|
|
||||||
```
|
|
||||||
sv_cheats 0
|
|
||||||
sv_pure 2 // force Steam-only files (strictest)
|
|
||||||
sv_consistency 1 // enforce file hashes for critical files
|
|
||||||
// (set 0 if hosting custom campaigns)
|
|
||||||
sv_lan 0 // internet server
|
|
||||||
```
|
|
||||||
|
|
||||||
Launch the server with `-secure` to enable VAC. `sv_cheats 1`
|
|
||||||
requires `-insecure` (no VAC) — only acceptable on private
|
|
||||||
practice servers.
|
|
||||||
|
|
||||||
`sv_pure 2` breaks many workshop maps/mods. Use `sv_pure 0` or `1`
|
|
||||||
for modded servers.
|
|
||||||
|
|
||||||
## Player limits
|
|
||||||
|
|
||||||
```
|
|
||||||
// Co-op / Scavenge
|
|
||||||
sv_maxplayers 4
|
|
||||||
sv_visiblemaxplayers 4
|
|
||||||
|
|
||||||
// Versus
|
|
||||||
sv_maxplayers 8
|
|
||||||
sv_visiblemaxplayers 8
|
|
||||||
```
|
|
||||||
|
|
||||||
## Voice
|
|
||||||
|
|
||||||
```
|
|
||||||
sv_voiceenable 1
|
|
||||||
sv_alltalk 0 // 1 = cross-team voice (casual / fun servers)
|
|
||||||
```
|
|
||||||
|
|
||||||
## Recommended plugins (SourceMod ecosystem)
|
|
||||||
|
|
||||||
| Plugin | Purpose |
|
|
||||||
|---|---|
|
|
||||||
| MetaMod:Source + SourceMod | Required foundation for most of the below |
|
|
||||||
| [Tickrate Enabler](https://github.com/SirPlease/Server4Dead-Project/tree/master/Tickrate%20Enabler) | Unlock >30 tick servers |
|
|
||||||
| [Little Anti-Cheat](https://github.com/J-Tanzanite/Little-Anti-Cheat) | Aimbot / angle-cheat detection |
|
|
||||||
| SMAC | Secondary AC layer (older but still works) |
|
|
||||||
| [ZoneMod](https://github.com/SirPlease/L4D2-Competitive-Rework) | Competitive Versus ruleset (full bundle: ZoneMod + MatchMode + Confogl-style plugins) |
|
|
||||||
| `l4d2_TKStopper` | Teamkill / griefing control |
|
|
||||||
| `l4d_sb_fix` | Survivor bot behavior fixes |
|
|
||||||
| [nb_update_frequency fix](https://forums.alliedmods.net/showthread.php?t=344019) | Eliminates client-side jitter at very low `nb_update_frequency` values |
|
|
||||||
|
|
||||||
## MetaMod:Source / SourceMod versioning
|
|
||||||
|
|
||||||
- Stable branches are pinned in URL paths: `1.10`, `1.11`, `1.12`,
|
|
||||||
etc. There is no "latest stable" alias URL — you pick the
|
|
||||||
branch.
|
|
||||||
- Within a branch, the `mmsource-latest-linux` and
|
|
||||||
`sourcemod-latest-linux` text files contain the current build's
|
|
||||||
filename, e.g. `mmsource-1.12.0-git1219-linux.tar.gz`. Curl the
|
|
||||||
pointer file, then curl the actual tarball.
|
|
||||||
- AlliedModders bumps the stable branch every ~2-3 years. When 1.13 (or later) is
|
|
||||||
declared stable, update the `MM_BRANCH=1.12` / `SM_BRANCH=1.12`
|
|
||||||
pins in the seeded SourceMod overlay script.
|
|
||||||
- L4D2 has no special branch — it uses whatever the current
|
|
||||||
stable supports. L4D2's engine is so stable that SM 1.11 and
|
|
||||||
1.12 both work.
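
The pointer-file lookup described above could be sketched as follows. The branch URL is passed in by the caller because this doc does not spell out the download host; `tarball_url` and `resolve_latest` are hypothetical helper names, and the URL in the usage line is a placeholder, not a real mirror.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def tarball_url(branch_url: str, pointer_text: str) -> str:
    """Build the tarball URL from a branch URL and the pointer file's content."""
    filename = pointer_text.strip()
    if not filename or "/" in filename:
        raise ValueError(f"unexpected pointer content: {pointer_text!r}")
    return urljoin(branch_url.rstrip("/") + "/", filename)


def resolve_latest(branch_url: str, pointer_name: str, timeout: float = 30.0) -> str:
    """Fetch the pointer file (e.g. `mmsource-latest-linux`) and resolve it."""
    pointer_url = urljoin(branch_url.rstrip("/") + "/", pointer_name)
    with urlopen(pointer_url, timeout=timeout) as response:
        return tarball_url(branch_url, response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder branch URL; substitute the real drop location for the pinned branch.
    print(tarball_url("https://example.invalid/drop/1.12",
                      "mmsource-1.12.0-git1219-linux.tar.gz\n"))
```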
|
|
||||||
|
|
||||||
Watch for stable announcements at
|
|
||||||
[Metamod:Source news](https://www.sourcemm.net/) and
|
|
||||||
[SourceMod releases](https://github.com/alliedmodders/sourcemod/releases).
|
|
||||||
|
|
||||||
## Empirically-verified kernel quirk (relevant if you tweak the helpers)
|
|
||||||
|
|
||||||
Idmapped bind mounts on kernel 6.12 (Trixie) **do** propagate
|
|
||||||
through plain `mount --bind` re-binds. Verified end-to-end on
|
|
||||||
`left4.me` during the 2026-05-15 build-time-idmap refactor: a
|
|
||||||
sandbox process inside a re-bound idmapped mount can write files,
|
|
||||||
and those writes land on disk with the idmap-translated uid.
|
|
||||||
|
|
||||||
This contradicts some published claims (including a generic
|
|
||||||
research-agent summary) that idmaps don't propagate through plain
|
|
||||||
re-bind on this kernel. Our use case is `mount --bind --map-users
|
|
||||||
src staging` → systemd-run with `BindPaths=staging:/overlay` (a
|
|
||||||
plain re-bind into the unit's namespace). It works.
|
|
||||||
|
|
||||||
The `--map-users <a>:<b>:<count>` direction is **on-disk uid
|
|
||||||
first**, then in-mount uid. The util-linux man page calls these
|
|
||||||
`<inner>:<outer>` which is confusing — `<inner>` means "the
|
|
||||||
filesystem's native uid" (on disk) and `<outer>` means "the uid
|
|
||||||
exposed outward through the mount." Empirically verified; do not
|
|
||||||
trust the man page's word choice.
|
|
||||||
|
|
||||||
## Project integration (left4me overlays)
|
|
||||||
|
|
||||||
The project already ships overlays in `examples/script-overlays/`
|
|
||||||
that map cleanly onto the recommendations above:
|
|
||||||
|
|
||||||
| Overlay | Use it for |
|
|
||||||
|---|---|
|
|
||||||
| [`tickrate.sh`](../examples/script-overlays/tickrate.sh) | Drop-in 100-tick foundation: installs the Tickrate Enabler plugin (`tickrate_enabler.dll/.so/.vdf`) and writes the core rate cvars (`sv_minrate/maxrate 100000`, `nb_update_frequency 0.014`, `net_splitpacket_maxrate 50000`, `net_maxcleartime 0.0001`, `fps_max 0`). Required base layer for any of the higher-tick recommendations in this doc. |
|
|
||||||
| [`competitive_rework.sh`](../examples/script-overlays/competitive_rework.sh) | Pulls the entire SirPlease/L4D2-Competitive-Rework master branch into the overlay. Full confogl bundle — plugins, configs, cfgogl per-mode tuning. Opinionated for tournament versus. Use this *or* `tickrate.sh`, not both. |
|
|
||||||
| [`cedapug_maps.sh`](../examples/script-overlays/cedapug_maps.sh), [`l4d2center_maps.sh`](../examples/script-overlays/l4d2center_maps.sh) | Competitive map pools (orthogonal to cvars). |
|
|
||||||
|
|
||||||
The cvars in the "Copy-paste best practice config" section above
|
|
||||||
are intended to be applied **on top of `tickrate.sh`** — either by
|
|
||||||
adding them to an instance's `spec.config` YAML list, or by
|
|
||||||
creating a new overlay (e.g. `examples/script-overlays/ux_polish.sh`)
|
|
||||||
that writes them to `$OVERLAY/left4dead2/cfg/server.cfg`.
|
|
||||||
|
|
||||||
How `spec.config` becomes `server.cfg` (for reference):
|
|
||||||
`l4d2host/l4d2host/instances.py:52-54` joins the YAML list with
|
|
||||||
newlines into `{LEFT4ME_ROOT}/instances/{name}/server.cfg`, then
|
|
||||||
that file is staged into the runtime upper layer at instance start.
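
In outline, the join step behaves like the sketch below. This is not the code in `instances.py`; the function name `write_server_cfg` and its parameters are assumptions made for illustration, and the staging into the runtime upper layer is not shown.

```python
from pathlib import Path


def write_server_cfg(root: Path, name: str, config_lines: list[str]) -> Path:
    """Join the spec.config entries with newlines and write the instance's server.cfg."""
    target = root / "instances" / name / "server.cfg"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(config_lines), encoding="utf-8")
    return target


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        cfg = write_server_cfg(Path(tmp), "demo", ["sv_tags \"coop,custom\"", "sv_lan 0"])
        print(cfg.read_text(encoding="utf-8"))
```

Because each list entry becomes one cfg line, a cvar with a comment stays on a single entry, for example `sv_pure 0 // modded`.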
|
|
||||||
|
|
||||||
## Launch parameters (reference)
|
|
||||||
|
|
||||||
Typical srcds invocation:
|
|
||||||
|
|
||||||
```
|
|
||||||
./srcds_run -console -game left4dead2 -secure -autoupdate \
|
|
||||||
+maxplayers 8 -port 27015 +exec server.cfg +log on
|
|
||||||
```
|
|
||||||
|
|
||||||
- `-secure` enables VAC. Don't run public servers without it.
|
|
||||||
- `-autoupdate` keeps the server patched automatically.
|
|
||||||
- `+exec server.cfg` runs your config on startup.
|
|
||||||
- `-tickrate <N>` sets the engine tickrate (requires Tickrate
|
|
||||||
Enabler for `N > 30`).
|
|
||||||
|
|
||||||
## Sources
|
|
||||||
|
|
||||||
Primary references used for the recommendations above:
|
|
||||||
|
|
||||||
- [L4D2-Competitive-Rework server.cfg](https://github.com/SirPlease/L4D2-Competitive-Rework/blob/master/cfg/server.cfg) — the canonical confogl/competitive cvar block. Many cheat-flagged cvars in this doc are sourced from here.
|
|
||||||
- [L4D2-Competitive-Rework cvar_tracking.cfg](https://github.com/SirPlease/L4D2-Competitive-Rework/blob/master/cfg/cvar_tracking.cfg) — client-cvar enforcement list (anti-cheat tracking; not directly used here but useful context).
|
|
||||||
- [Lux's L4D2 high-zombie-count packet compression analysis (Steam Discussions, app/550)](https://steamcommunity.com/app/550/discussions/0/2568690416482192538/) — origin of the `net_compresspackets` / `net_splitrate` / `net_maxcleartime` recommendations.
|
|
||||||
- [L4D2 Dedicated Server Guide (Steam Community)](https://steamcommunity.com/sharedfiles/filedetails/?id=276173458)
|
|
||||||
- [L4D2 Dedicated Server Network Tweaks (Steam Discussions)](https://steamcommunity.com/app/550/discussions/1/1839063537784156851/)
|
|
||||||
- [SirPlease/Server4Dead-Project — Tickrate Enabler](https://github.com/SirPlease/Server4Dead-Project/tree/master/Tickrate%20Enabler)
|
|
||||||
- [Valve Developer Community — L4D2 console commands](https://developer.valvesoftware.com/wiki/List_of_Left_4_Dead_2_console_commands_and_variables)
|
|
||||||
- [AlliedModders — nb_update_frequency fix (Experimental)](https://forums.alliedmods.net/showthread.php?t=344019)
|
|
||||||
- [Source Multiplayer Networking — Valve Developer Community](https://developer.valvesoftware.com/wiki/Source_Multiplayer_Networking)
|
|
||||||
- [Required Versions (SourceMod wiki)](https://wiki.alliedmods.net/Required_Versions_(SourceMod))
|
|
||||||
- [MetaMod:Source news](https://www.sourcemm.net/)
|
|
||||||
- Local: `/Users/mwiegand/Projekte/left4me/cvar_list` — 2199-line dump of L4D2 cvars (positive existence reference; *not* exhaustive — see [Verifying a cvar actually exists](#verifying-a-cvar-actually-exists)).
|
|
||||||
- Local: `examples/script-overlays/tickrate.sh`, `examples/script-overlays/competitive_rework.sh` — overlay scripts that apply these settings to a left4me instance.
|
|
||||||
|
|
@ -1,557 +0,0 @@
|
||||||
# L4D2 Workshop Overlays Implementation Plan
|
|
||||||
|
|
||||||
> **Approval gate:** This plan may be written and refined without further approval. Do not implement code changes from this plan until the user explicitly approves implementation.
|
|
||||||
|
|
||||||
**Goal:** Implement the workshop overlay feature per `docs/superpowers/specs/2026-05-07-l4d2-workshop-overlays-design.md`. Add a `WorkshopItem` registry, a typed `Overlay.type` column with a builder registry, a workshop builder that downloads from the Steam Web API and manages symlinks into a deduplicated cache, and the supporting routes, templates, jobs, and tests.
|
|
||||||
|
|
||||||
**Architecture:** Keep the v1 single-process Flask architecture. New code is additive: a `WorkshopBuilder` class registered in a builder dispatcher, a `steam_workshop` service module for the Steam Web API and downloader, two new database tables and one extended one, and two new job operations on the existing in-process worker. fuse-overlayfs mount handling in `l4d2host` is unchanged — workshop content arrives at overlay paths the same way externals do today.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Locked Decisions
|
|
||||||
|
|
||||||
See `docs/superpowers/specs/2026-05-07-l4d2-workshop-overlays-design.md` for the design rationale. Implementation-relevant decisions:
|
|
||||||
|
|
||||||
- Typed overlays: `external` (existing rows; no-op builder) and `workshop` (new); future types deferred.
|
|
||||||
- No JSON `source_config` blob; per-type structured data in proper tables.
|
|
||||||
- `WorkshopItem` is a global deduplicated registry keyed on `steam_id`. Cache at `/var/lib/left4me/workshop_cache/{steam_id}.vpk`.
|
|
||||||
- Overlay symlinks are absolute, named `{steam_id}.vpk`; no Steam filename in any on-disk path.
|
|
||||||
- `overlay_workshop_items` is a pure association; toggle = remove/re-add.
|
|
||||||
- Collections are atomic UI bulk-imports; DB never tracks collection attribution.
|
|
||||||
- Single global admin "Refresh all workshop items" button.
|
|
||||||
- No cache GC in v1.
|
|
||||||
- `Overlay.user_id` is the scope (NULL = system, set = private); independent of `type`.
|
|
||||||
- Workshop overlays default to private; existing externals stay system-wide.
|
|
||||||
- One unified Create-overlay button with type radio; no path field — paths are always `str(overlay_id)`.
|
|
||||||
- `consumer_app_id == 550` validated at fetch/add; not stored.
|
|
||||||
- Input field accepts numeric ID, full Workshop URL, or multi-line batch.
|
|
||||||
- Auto-rebuild after add/remove with build coalescing.
|
|
||||||
- HTTPS for all Steam Web API calls.
|
|
||||||
- `Overlay.id` uses `AUTOINCREMENT`; `create_overlay_directory` uses `exist_ok=False`.
|
|
||||||
- Two partial unique indexes for overlay names: `(name) WHERE user_id IS NULL` and `(name, user_id) WHERE user_id IS NOT NULL`.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Current Gap
|
|
||||||
|
|
||||||
- `Overlay` rows have `id`, `name`, `path`, no type, no scope.
|
|
||||||
- The web app cannot download anything from Steam; users must SFTP `.vpk` files into prepared overlay directories.
|
|
||||||
- The job worker has no operations for overlay builds or workshop refreshes.
|
|
||||||
- The mount/build pipeline assumes overlay directories are externally populated.
|
|
||||||
- There is no UI affordance to add or list workshop content.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 1: Extend Tests First — Schema Migration And Models
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Create: `l4d2web/tests/test_workshop_overlay_models.py`
|
|
||||||
- Modify: `l4d2web/tests/test_models.py` (extend) — partial unique index behavior
|
|
||||||
|
|
||||||
Write tests against fresh SQLite schemas asserting:
|
|
||||||
|
|
||||||
- An `Overlay` migration round-trip: existing rows acquire `type='external'` and `user_id=NULL`; their `name` values remain unique by partial index.
|
|
||||||
- After migration, two externals (both `user_id=NULL`) with the same name are rejected by the system partial unique index.
|
|
||||||
- After migration, two users may both own a workshop overlay named `"my-maps"` (per-user partial unique index).
|
|
||||||
- `WorkshopItem.steam_id` is unique; concurrent inserts of the same `steam_id` raise integrity errors.
|
|
||||||
- `overlay_workshop_items` enforces `UNIQUE(overlay_id, workshop_item_id)`.
|
|
||||||
- `Overlay` deletion cascades `overlay_workshop_items` rows but does not delete `WorkshopItem` rows (`ON DELETE RESTRICT`).
|
|
||||||
- `Job.overlay_id` is nullable and references `overlays(id)`.
|
|
||||||
- `Overlay.id` does not reuse a deleted ID after the migration (AUTOINCREMENT).
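
The last bullet relies on a SQLite detail that is easy to demonstrate in isolation. The sketch below uses the standard-library `sqlite3` module against a throwaway table (not the project's real schema) to show that a plain `INTEGER PRIMARY KEY` can hand out a deleted ID again, while `AUTOINCREMENT` does not. The function name is illustrative.

```python
import sqlite3


def next_id_after_delete(autoincrement: bool) -> int:
    """Insert two rows, delete the newest, insert again, and return the new id."""
    pk = "INTEGER PRIMARY KEY AUTOINCREMENT" if autoincrement else "INTEGER PRIMARY KEY"
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(f"CREATE TABLE overlays (id {pk}, name TEXT NOT NULL)")
        conn.execute("INSERT INTO overlays (name) VALUES ('a')")
        conn.execute("INSERT INTO overlays (name) VALUES ('b')")
        conn.execute("DELETE FROM overlays WHERE id = 2")
        cursor = conn.execute("INSERT INTO overlays (name) VALUES ('c')")
        return cursor.lastrowid
    finally:
        conn.close()


if __name__ == "__main__":
    print("plain:", next_id_after_delete(False))
    print("autoincrement:", next_id_after_delete(True))
```

This matters here because overlay paths are always `str(overlay_id)`, so a reused ID would point a new overlay at a stale directory.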
|
|
||||||
|
|
||||||
Verification command:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
pytest l4d2web/tests/test_workshop_overlay_models.py l4d2web/tests/test_models.py -q
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected before implementation: FAIL.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 2: Schema Migration And ORM Mappings
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Create: `l4d2web/alembic/versions/0002_workshop_overlays.py`
|
|
||||||
- Modify: `l4d2web/models.py`
|
|
||||||
|
|
||||||
Migration `0002_workshop_overlays` (`down_revision = "b2c684fddbd3"`):
|
|
||||||
|
|
||||||
1. `op.batch_alter_table("overlays")`:
|
|
||||||
- Add `type VARCHAR(16) NOT NULL DEFAULT 'external'` (server_default during migration; remove after backfill).
|
|
||||||
- Add `user_id INTEGER NULL REFERENCES users(id)`.
|
|
||||||
- Drop the existing `unique=True` on `name`.
|
|
||||||
- Add index `ix_overlays_type_user_id` on `(type, user_id)`.
|
|
||||||
- Switch `id` to `AUTOINCREMENT`.
|
|
||||||
2. After batch alter, create the two partial unique indexes via raw `op.create_index(..., postgresql_where=..., sqlite_where=...)`:
|
|
||||||
- `uq_overlay_name_system` on `(name)` `WHERE user_id IS NULL`.
|
|
||||||
- `uq_overlay_name_per_user` on `(name, user_id)` `WHERE user_id IS NOT NULL`.
|
|
||||||
3. `op.create_table("workshop_items", ...)` per spec data-model section.
|
|
||||||
4. `op.create_table("overlay_workshop_items", ...)` with the unique constraint and the reverse-lookup index.
|
|
||||||
5. `op.batch_alter_table("jobs")`: add `overlay_id INTEGER NULL REFERENCES overlays(id)`.
|
|
||||||
|
|
||||||
ORM (`models.py`):
|
|
||||||
|
|
||||||
- Extend `Overlay`: add `type`, `user_id`. Drop `unique=True` on `name`. Set `__table_args__` with the two partial indexes and `ix_overlays_type_user_id`.
|
|
||||||
- Extend `Job`: add `overlay_id` mapped column with FK.
|
|
||||||
- New `WorkshopItem` and `OverlayWorkshopItem` classes per spec. Set up `Overlay.workshop_items` relationship through the association.
|
|
||||||
|
|
||||||
Verification command:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
pytest l4d2web/tests/test_workshop_overlay_models.py l4d2web/tests/test_models.py -q
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected after implementation: PASS.
|
|
||||||
|
|
||||||
Run alembic against a fresh test DB to verify upgrade and downgrade succeed.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 3: Tests First — Steam Web API And Downloader
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Create: `l4d2web/tests/test_steam_workshop.py`
|
|
||||||
|
|
||||||
Mock HTTP with `responses` or `pytest-httpserver`. Cover:
|
|
||||||
|
|
||||||
- `parse_workshop_input` accepts a single numeric ID, a single Workshop URL (`steamcommunity.com/sharedfiles/filedetails/?id=N`), and a multi-line whitespace-separated batch of either; returns deduplicated ordered list of digit-only IDs.
|
|
||||||
- `parse_workshop_input` rejects garbage, paths outside `?id=`, non-digit IDs.
|
|
||||||
- `resolve_collection` POSTs to the HTTPS endpoint with the form-encoded payload and returns `publishedfileid` children.
|
|
||||||
- `fetch_metadata_batch` POSTs once with `itemcount=N`; returns parsed `WorkshopMetadata` per item; captures `result != 1` into `last_error`; raises `WorkshopValidationError` when any `consumer_app_id != 550` during user-add; logs and skips during refresh-mode.
|
|
||||||
- `WorkshopMetadata.preview_url` is captured.
|
|
||||||
- `download_to_cache` writes `cache_root/{steam_id}.vpk.partial`, then `os.replace` to the final name; sets `os.utime(file, (time_updated, time_updated))`.
|
|
||||||
- `download_to_cache` is idempotent: a second call where on-disk `(mtime, size)` matches `(time_updated, file_size)` is a no-op (no HTTP request issued).
|
|
||||||
- `refresh_all` runs downloads via `ThreadPoolExecutor(max_workers=8)` and reports per-item errors without aborting the batch.
|
|
||||||
- All Steam API URLs use `https://`.
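
The parsing rules above can be sketched as follows. This is a minimal illustration, not the planned implementation: `WorkshopInputError` and `ALLOWED_HOSTS` are hypothetical names, and accepting the `www.` host and scheme-less URLs are assumptions the plan does not spell out.

```python
from urllib.parse import parse_qs, urlparse


class WorkshopInputError(ValueError):
    """Hypothetical error type; the plan does not name the exception for bad input."""


# Assumption: both the bare and the www. host are acceptable.
ALLOWED_HOSTS = {"steamcommunity.com", "www.steamcommunity.com"}


def _is_digits(value: str) -> bool:
    # str.isdigit() alone also accepts non-ASCII digits such as superscripts.
    return value.isascii() and value.isdigit()


def parse_workshop_input(raw: str) -> list[str]:
    """Return deduplicated, order-preserving digit-only IDs from IDs and/or Workshop URLs."""
    ids: list[str] = []
    for token in raw.split():  # split() handles spaces, tabs, and newlines alike
        if _is_digits(token):
            steam_id = token
        else:
            try:
                # Tolerate URLs pasted without a scheme.
                url = urlparse(token if "://" in token else f"https://{token}")
            except ValueError as exc:
                raise WorkshopInputError(f"not a Workshop ID or URL: {token!r}") from exc
            values = parse_qs(url.query).get("id", [])
            if (
                url.hostname not in ALLOWED_HOSTS
                or url.path.rstrip("/") != "/sharedfiles/filedetails"
                or len(values) != 1
                or not _is_digits(values[0])
            ):
                raise WorkshopInputError(f"not a Workshop ID or URL: {token!r}")
            steam_id = values[0]
        if steam_id not in ids:
            ids.append(steam_id)
    if not ids:
        raise WorkshopInputError("no Workshop IDs found")
    return ids
```

Rejecting the whole batch on the first bad token keeps the add flow all-or-nothing; a per-token error report would be an equally valid choice.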
|
|
||||||
|
|
||||||
Verification command:
|
|
||||||
|
|
||||||
```bash
pytest l4d2web/tests/test_steam_workshop.py -q
```
|
|
||||||
|
|
||||||
Expected before implementation: FAIL.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 4: Steam Workshop Service Module
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Create: `l4d2web/services/steam_workshop.py`
|
|
||||||
|
|
||||||
Public surface:
|
|
||||||
|
|
||||||
```python
def parse_workshop_input(raw: str) -> list[str]: ...
def resolve_collection(collection_id: str) -> list[str]: ...
def fetch_metadata_batch(steam_ids: list[str], *, mode: Literal["add", "refresh"]) -> list[WorkshopMetadata]: ...
def download_to_cache(meta: WorkshopMetadata, cache_root: Path, *, on_progress=None, should_cancel=None) -> Path: ...
def refresh_all(items: list[WorkshopItem], cache_root: Path, executor_workers: int = 8) -> RefreshReport: ...
```
|
|
||||||
|
|
||||||
Implementation rules:
|
|
||||||
|
|
||||||
- Endpoints are HTTPS:
|
|
||||||
- `https://api.steampowered.com/ISteamRemoteStorage/GetCollectionDetails/v1/`
|
|
||||||
- `https://api.steampowered.com/ISteamRemoteStorage/GetPublishedFileDetails/v1/`
|
|
||||||
- Form-encoded POSTs with `itemcount=N` / `collectioncount=N` and `publishedfileids[i]=…` per index.
|
|
||||||
- Per-request timeout 30s; per-item ceiling 5min. No retry or backoff in v1.
|
|
||||||
- `consumer_app_id != 550`:
|
|
||||||
- In `mode="add"`: raise `WorkshopValidationError` with the offending `steam_id`.
|
|
||||||
- In `mode="refresh"`: log and skip; do not abort other items.
|
|
||||||
- `result != 1`: capture Steam's result code in the item's `last_error`; do not download; do not abort siblings.
|
|
||||||
- Cooperative cancellation: `download_to_cache` checks `should_cancel()` between chunked reads; `refresh_all`'s executor checks before each task.
|
|
||||||
- `WorkshopMetadata` is a dataclass with `steam_id, title, filename, file_url, file_size, time_updated, preview_url, consumer_app_id, result`.
|
|
||||||
- `RefreshReport` aggregates per-item outcomes for the caller's job log.
|
|
||||||
- Use a single `requests.Session` per call site for connection reuse.
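
Two of the rules above lend themselves to a small sketch: the indexed form payload and the mode-dependent handling of `consumer_app_id`. The helper names `build_details_payload` and `accept_item` are hypothetical, introduced only for illustration; `WorkshopValidationError` is defined locally so the snippet stands alone.

```python
from typing import Literal

L4D2_APP_ID = 550


class WorkshopValidationError(Exception):
    """Raised in add mode when an item does not belong to Left 4 Dead 2."""


def build_details_payload(steam_ids: list[str]) -> dict[str, str]:
    """Form fields for GetPublishedFileDetails: itemcount plus one indexed ID per item."""
    payload = {"itemcount": str(len(steam_ids))}
    for index, steam_id in enumerate(steam_ids):
        payload[f"publishedfileids[{index}]"] = steam_id
    return payload


def accept_item(steam_id: str, consumer_app_id: int, *, mode: Literal["add", "refresh"]) -> bool:
    """Return True if the item is usable; raise in add mode, return False in refresh mode."""
    if consumer_app_id == L4D2_APP_ID:
        return True
    if mode == "add":
        raise WorkshopValidationError(f"workshop item {steam_id} is not an L4D2 item")
    return False  # refresh mode: the caller logs and moves on to the next item
```

The collection endpoint would presumably use the same shape with `collectioncount` in place of `itemcount`.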
|
|
||||||
|
|
||||||
Verification command:
|
|
||||||
|
|
||||||
```bash
pytest l4d2web/tests/test_steam_workshop.py -q
```
|
|
||||||
|
|
||||||
Expected after implementation: PASS.
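
One way `download_to_cache` could satisfy the partial-file, `os.replace`, and idempotency rules in this task is sketched below, with the HTTP transfer abstracted as an iterable of byte chunks. `is_cached` and `write_to_cache` are hypothetical helper names. The sketch sets the mtime on the partial file before the rename, a small deviation from the order listed in the tests, so the final name never appears with a wrong timestamp.

```python
from __future__ import annotations

import os
from collections.abc import Callable, Iterable
from pathlib import Path


def is_cached(final: Path, *, time_updated: int, file_size: int) -> bool:
    """True when the on-disk (mtime, size) already matches Steam's metadata."""
    try:
        st = final.stat()
    except FileNotFoundError:
        return False
    return int(st.st_mtime) == time_updated and st.st_size == file_size


def write_to_cache(
    chunks: Iterable[bytes],
    final: Path,
    *,
    time_updated: int,
    should_cancel: Callable[[], bool] | None = None,
) -> Path:
    """Stream chunks into `<name>.partial`, stamp the mtime, then rename into place."""
    partial = final.with_name(final.name + ".partial")
    try:
        with open(partial, "wb") as fh:
            for chunk in chunks:
                # Cooperative cancellation between chunked reads.
                if should_cancel is not None and should_cancel():
                    raise RuntimeError("download cancelled")
                fh.write(chunk)
        os.utime(partial, (time_updated, time_updated))
        os.replace(partial, final)  # atomic on the same filesystem
    except BaseException:
        partial.unlink(missing_ok=True)  # never leave a stale partial behind
        raise
    return final
```

A caller would check `is_cached(...)` first and skip the HTTP request entirely when it returns `True`.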
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 5: Tests First — Path Helpers And Overlay Creation
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Create: `l4d2web/tests/test_workshop_paths.py`
|
|
||||||
- Create: `l4d2web/tests/test_overlay_creation.py`
|
|
||||||
|
|
||||||
Cover:
|
|
||||||
|
|
||||||
- `workshop_cache_root()` returns `LEFT4ME_ROOT/workshop_cache`.
|
|
||||||
- `cache_path(steam_id)` returns `cache_root / f"{steam_id}.vpk"` for valid digit strings; rejects non-digits, slashes, dot-dot.
|
|
||||||
- `generate_overlay_path(overlay_id)` returns `str(overlay_id)`; passes `validate_overlay_ref` from `l4d2host.paths`.
|
|
||||||
- `create_overlay_directory(overlay)` creates `LEFT4ME_ROOT/overlays/{path}/` with `exist_ok=False`. Calling twice raises (DB/disk drift surfaced loudly).
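
The helper behavior covered above might look like the sketch below. It assumes `LEFT4ME_ROOT` is read from the environment with `/var/lib/left4me` as the fallback (the `left4me_root` helper is hypothetical), and `create_overlay_directory` takes the overlay's path string here rather than an `Overlay` row so the snippet runs without the ORM.

```python
import os
from pathlib import Path


def left4me_root() -> Path:
    # Assumption: the root is configurable via the LEFT4ME_ROOT environment variable.
    return Path(os.environ.get("LEFT4ME_ROOT", "/var/lib/left4me"))


def workshop_cache_root() -> Path:
    return left4me_root() / "workshop_cache"


def cache_path(steam_id: str) -> Path:
    """Digits-only check rules out slashes, dot-dot, and empty strings in one step."""
    if not (steam_id.isascii() and steam_id.isdigit()):
        raise ValueError(f"invalid steam_id: {steam_id!r}")
    return workshop_cache_root() / f"{steam_id}.vpk"


def create_overlay_directory(overlay_path: str) -> Path:
    """Create LEFT4ME_ROOT/overlays/{path}; an existing directory raises FileExistsError."""
    target = left4me_root() / "overlays" / overlay_path
    os.makedirs(target, exist_ok=False)
    return target
```

Letting `FileExistsError` propagate is what surfaces DB/disk drift loudly on a second call.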
|
|
||||||
|
|
||||||
Verification command:
|
|
||||||
|
|
||||||
```bash
pytest l4d2web/tests/test_workshop_paths.py l4d2web/tests/test_overlay_creation.py -q
```
|
|
||||||
|
|
||||||
Expected before implementation: FAIL.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 6: Path Helpers And Overlay Creation
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Create: `l4d2web/services/workshop_paths.py`
|
|
||||||
- Create: `l4d2web/services/overlay_creation.py`
|
|
||||||
|
|
||||||
`workshop_paths`:
|
|
||||||
|
|
||||||
```python
def workshop_cache_root() -> Path: ...      # LEFT4ME_ROOT/workshop_cache
def cache_path(steam_id: str) -> Path: ...  # validates digits-only; returns cache_root/{steam_id}.vpk
```
|
|
||||||
|
|
||||||
`overlay_creation`:
|
|
||||||
|
|
||||||
```python
def generate_overlay_path(overlay_id: int) -> str: ...   # str(overlay_id) + validate_overlay_ref
def create_overlay_directory(overlay: Overlay) -> None:  # makedirs(..., exist_ok=False)
    ...
```
|
|
||||||
|
|
||||||
Verification command:
|
|
||||||
|
|
||||||
```bash
pytest l4d2web/tests/test_workshop_paths.py l4d2web/tests/test_overlay_creation.py -q
```
|
|
||||||
|
|
||||||
Expected after implementation: PASS.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 7: Tests First — Overlay Builders
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Create: `l4d2web/tests/test_overlay_builders.py`
|
|
||||||
|
|
||||||
Cover with `tmp_path`:
|
|
||||||
|
|
||||||
- `BUILDERS` dict resolves `"external"` and `"workshop"` to instances; unknown types raise `KeyError` (caller's error).
|
|
||||||
- `ExternalBuilder.build()` leaves overlay content alone: it creates the overlay directory if missing, writes one log line, and returns. Existing files in the directory are untouched.
|
|
||||||
- `WorkshopBuilder.build()` against a fixture overlay with three associated `WorkshopItem` rows (two with cache files present, one without):
|
|
||||||
- Creates `left4dead2/addons/` if missing.
|
|
||||||
- Creates symlinks `addons/{steam_id_a}.vpk → cache_root/{steam_id_a}.vpk` for items with cache files. Symlinks are absolute.
|
|
||||||
- Skips the uncached item; emits a warning log line. Does not create a dangling symlink.
|
|
||||||
- On a re-run with the same associations: no FS changes; logs report `unchanged=2 skipped(uncached)=1`.
|
|
||||||
- On a re-run after one association is removed: removes the obsolete symlink only; leaves cache files alone.
|
|
||||||
- On a re-run after one item is added: adds only the new symlink.
|
|
||||||
- Files in `addons/` that aren't symlinks into the cache are left untouched.
|
|
||||||
- `should_cancel` mid-build: stops between filesystem ops; partial state is consistent and a re-run heals.
|
|
||||||
|
|
||||||
Verification command:
|
|
||||||
|
|
||||||
```bash
pytest l4d2web/tests/test_overlay_builders.py -q
```
|
|
||||||
|
|
||||||
Expected before implementation: FAIL.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 8: Overlay Builders And Dispatcher
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Create: `l4d2web/services/overlay_builders.py`
|
|
||||||
|
|
||||||
```python
class OverlayBuilder(Protocol):
    def build(self, overlay: Overlay, *, on_stdout, on_stderr, should_cancel) -> None: ...

class ExternalBuilder: ...
class WorkshopBuilder: ...

BUILDERS: dict[str, OverlayBuilder] = {
    "external": ExternalBuilder(),
    "workshop": WorkshopBuilder(),
}
```
|
|
||||||
|
|
||||||
`WorkshopBuilder.build()`:
|
|
||||||
|
|
||||||
1. Load the overlay's `WorkshopItem` rows.
|
|
||||||
2. `os.makedirs(overlay_root / "left4dead2/addons", exist_ok=True)`.
|
|
||||||
3. Compute `desired = {f"{steam_id}.vpk": cache_path(steam_id)}` for items where `last_downloaded_at IS NOT NULL` and the cache file exists. Skip and warn for items missing a cache file.
|
|
||||||
4. Inspect existing entries in `addons/` via `os.scandir`: leave entries that are not symlinks into `workshop_cache` untouched; diff the remaining symlinks against `desired` and apply changes via `os.unlink` and `os.symlink(absolute_target, link_path)`.
|
|
||||||
5. Emit `created N, removed M, unchanged K, skipped (uncached) S` log line.
|
|
||||||
6. Check `should_cancel()` between filesystem ops.
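
Steps 2 to 5 amount to a symlink reconciliation, which could be sketched as below. `reconcile_addons` is a hypothetical helper, not part of the planned surface; it assumes `cache_root` and the targets in `desired` are absolute paths, and it omits the `should_cancel()` checks and the uncached-item counter for brevity.

```python
from __future__ import annotations

import os
from pathlib import Path


def reconcile_addons(addons_dir: Path, cache_root: Path, desired: dict[str, Path]) -> dict[str, int]:
    """Make the cache symlinks in addons_dir match `desired` (link name -> absolute target)."""
    counts = {"created": 0, "removed": 0, "unchanged": 0}
    os.makedirs(addons_dir, exist_ok=True)

    # Collect only the entries this builder manages: symlinks pointing into the cache.
    managed: dict[str, str] = {}
    with os.scandir(addons_dir) as entries:
        for entry in entries:
            if not entry.is_symlink():
                continue  # regular files and directories are left untouched
            target = os.readlink(entry.path)
            if Path(target).parent == cache_root:
                managed[entry.name] = target

    # Remove managed links that are obsolete or point at the wrong target.
    for name, target in managed.items():
        wanted = desired.get(name)
        if wanted is not None and str(wanted) == target:
            counts["unchanged"] += 1
        else:
            os.unlink(addons_dir / name)
            counts["removed"] += 1

    # Create the links that are missing after the cleanup pass.
    for name, wanted in desired.items():
        link = addons_dir / name
        if managed.get(name) == str(wanted):
            continue
        if os.path.lexists(link):
            continue  # an unmanaged file already uses this name; leave it alone
        os.symlink(wanted, link)
        counts["created"] += 1
    return counts
```

Because each pass only touches links it can prove it owns, an interrupted run leaves a state that the next run repairs.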
|
|
||||||
|
|
||||||
Verification command:
|
|
||||||
|
|
||||||
```bash
pytest l4d2web/tests/test_overlay_builders.py -q
```
|
|
||||||
|
|
||||||
Expected after implementation: PASS.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 9: Tests First — Worker Scheduler Truth Table And Coalescing
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/tests/test_job_worker.py`
|
|
||||||
|
|
||||||
Add coverage:
|
|
||||||
|
|
||||||
- Truth table for `can_start`:
|
|
||||||
- `install` not claimed while `refresh_workshop_items`, any `build_overlay`, or any server job is running.
|
|
||||||
- `refresh_workshop_items` not claimed while `install`, any `build_overlay`, or any server job is running.
|
|
||||||
- `build_overlay(N)` not claimed while `install`, `refresh_workshop_items`, or another `build_overlay(N)` is running. Two `build_overlay` jobs for **different** overlay IDs claim concurrently.
|
|
||||||
- Server start/init blocks if `refresh_workshop_items` runs or if any `build_overlay(N)` runs where N ∈ overlays of the server's blueprint.
|
|
||||||
- `enqueue_build_overlay(overlay_id)`:
|
|
||||||
- Inserts a new queued job when no pending job exists.
|
|
||||||
- Returns the existing pending job when one is already queued (coalescing).
|
|
||||||
- Does not coalesce against running jobs (a new add after build start gets a fresh queued job).
|
|
||||||
- `refresh_workshop_items` post-completion enqueues `build_overlay` only for overlays whose items had `time_updated` advance or `filename` change; each such enqueue uses the coalescing helper.
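
The truth table above can be expressed as a pure function, which makes it easy to test exhaustively. In this sketch only `running_overlays` and `refresh_running` come from the plan; `install_running`, `running_server_jobs`, and the `can_start` signature are assumptions, and any pre-existing rules between server jobs and `install` are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SchedulerState:
    install_running: bool = False      # assumed field
    refresh_running: bool = False
    running_overlays: set[int] = field(default_factory=set)
    running_server_jobs: int = 0       # assumed field


def can_start(
    operation: str,
    state: SchedulerState,
    *,
    overlay_id: int | None = None,
    blueprint_overlay_ids: frozenset[int] = frozenset(),
) -> bool:
    if operation == "install":
        return not (state.refresh_running or state.running_overlays or state.running_server_jobs)
    if operation == "refresh_workshop_items":
        return not (state.install_running or state.running_overlays or state.running_server_jobs)
    if operation == "build_overlay":
        # Builds for different overlays may run side by side.
        return not (state.install_running or state.refresh_running or overlay_id in state.running_overlays)
    # Server start/initialize: blocked by a refresh or by a build of one of its own overlays.
    return not (state.refresh_running or state.running_overlays & set(blueprint_overlay_ids))
```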
|
|
||||||
|
|
||||||
Verification command:
|
|
||||||
|
|
||||||
```bash
pytest l4d2web/tests/test_job_worker.py -q
```
|
|
||||||
|
|
||||||
Expected before implementation: FAIL.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 10: Worker Scheduler And New Operations
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/services/job_worker.py`
|
|
||||||
|
|
||||||
Changes:
|
|
||||||
|
|
||||||
- Define `OVERLAY_OPERATIONS = {"build_overlay"}` and `GLOBAL_OPERATIONS = {"install", "refresh_workshop_items"}`. Update `malformed_server_job` to allow `server_id IS NULL` for these.
|
|
||||||
- Extend `SchedulerState` with `running_overlays: set[int]` and `refresh_running: bool`.
|
|
||||||
- Update `claim_next_job()`:
|
|
||||||
- Compute `running_overlays` from queries against `running` jobs of operation `build_overlay`.
|
|
||||||
- Apply the truth-table rules above.
|
|
||||||
- Continue using `created_at, id` ordering for deterministic claim.
|
|
||||||
- Add `enqueue_build_overlay(overlay_id: int) -> Job` helper:
|
|
||||||
- Look for `queued` `build_overlay` job with same `overlay_id`. Return it if present.
|
|
||||||
- Otherwise insert a new queued job with `overlay_id` set, `server_id=None`, `operation="build_overlay"`.
|
|
||||||
- Update `run_job` dispatch:
|
|
||||||
- `build_overlay` → load `Overlay`, dispatch to `BUILDERS[overlay.type].build(overlay, on_stdout, on_stderr, should_cancel)`.
|
|
||||||
- `refresh_workshop_items` → call `steam_workshop.refresh_all(...)`. After completion, for each affected overlay, call `enqueue_build_overlay(overlay_id)`.
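
The coalescing rule for `enqueue_build_overlay` is sketched below against an in-memory list instead of the database. The `Job` dataclass is a stand-in for the ORM model, the `status` field name is an assumption, and the real helper would take only `overlay_id` and query the session.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import count


@dataclass
class Job:  # stand-in for the ORM model; field names are assumptions
    id: int
    operation: str
    status: str
    overlay_id: int | None = None
    server_id: int | None = None


_job_ids = count(1)


def enqueue_build_overlay(jobs: list[Job], overlay_id: int) -> Job:
    """Return the already-queued build for this overlay, or insert a fresh one."""
    for job in jobs:
        if job.operation == "build_overlay" and job.status == "queued" and job.overlay_id == overlay_id:
            return job  # coalesce: one pending build per overlay is enough
    job = Job(id=next(_job_ids), operation="build_overlay", status="queued", overlay_id=overlay_id)
    jobs.append(job)
    return job
```

Only `queued` jobs are matched, so an add that arrives after a build has started still gets its own follow-up build.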
|
|
||||||
|
|
||||||
Verification command:
|
|
||||||
|
|
||||||
```bash
pytest l4d2web/tests/test_job_worker.py -q
```
|
|
||||||
|
|
||||||
Expected after implementation: PASS.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 11: Tests First — Routes, Permissions, And Auto-Rebuild
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/tests/test_overlays.py`
|
|
||||||
- Create: `l4d2web/tests/test_workshop_routes.py`
|
|
||||||
|
|
||||||
Cover:
|
|
||||||
|
|
||||||
- `POST /overlays` with `type='workshop'` and `name` succeeds for any logged-in user; `path` is auto-generated; `user_id` is set; the directory exists at `LEFT4ME_ROOT/overlays/{id}`.
|
|
||||||
- `POST /overlays` with `type='external'` succeeds only for admins; `user_id` is NULL.
|
|
||||||
- Duplicate workshop name within the same user is rejected; duplicate names across users are accepted.
|
|
||||||
- Duplicate external name is rejected.
|
|
||||||
- Non-admins see `type='external' OR user_id=current_user.id` only when listing overlays.
|
|
||||||
- `POST /overlays/{id}/items` with one numeric ID adds an association and enqueues a coalesced `build_overlay`. The response is an HTMX fragment of the updated item table.
|
|
||||||
- `POST /overlays/{id}/items` with a multi-line batch (mix of IDs and URLs) adds all and enqueues one coalesced job for the batch.
|
|
||||||
- `POST /overlays/{id}/items` with a collection ID resolves members and adds N associations.
|
|
||||||
- Adding a non-L4D2 item (`consumer_app_id != 550`) returns HTTP 400 with a useful message; no association is created.
|
|
||||||
- Adding an item already in the overlay returns "already in overlay" (no 500).
|
|
||||||
- `POST /overlays/{id}/items/{item_id}/delete` removes the association and enqueues a coalesced build.
|
|
||||||
- `POST /overlays/{id}/build` enqueues the manual rebuild and redirects to the job page.
|
|
||||||
- `POST /admin/workshop/refresh` is admin-only; non-admins receive 403.
|
|
||||||
|
|
||||||
Mock `steam_workshop` HTTP layer for these tests.
|
|
||||||
|
|
||||||
Verification command:
|
|
||||||
|
|
||||||
```bash
pytest l4d2web/tests/test_overlays.py l4d2web/tests/test_workshop_routes.py -q
```
|
|
||||||
|
|
||||||
Expected before implementation: FAIL.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 12: Routes And Templates
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/routes/overlay_routes.py`
|
|
||||||
- Create: `l4d2web/routes/workshop_routes.py`
|
|
||||||
- Modify: `l4d2web/routes/page_routes.py`
|
|
||||||
- Modify: `l4d2web/templates/overlays.html`
|
|
||||||
- Modify: `l4d2web/templates/overlay_detail.html`
|
|
||||||
- Create: `l4d2web/templates/_overlay_item_table.html`
|
|
||||||
- Modify: `l4d2web/templates/admin.html`
|
|
||||||
- Modify: `l4d2web/app.py` (register the workshop blueprint)
|
|
||||||
|
|
||||||
`overlay_routes.py`:
|
|
||||||
|
|
||||||
- `create_overlay`: read `type` and `name` from form. No `path` field accepted.
|
|
||||||
- `type='external'`: admin-only; `user_id=NULL`. After insert, set `path = generate_overlay_path(id)`; call `create_overlay_directory(overlay)`.
|
|
||||||
- `type='workshop'`: any logged-in user; `user_id=current_user.id`. After insert, set `path = generate_overlay_path(id)`; call `create_overlay_directory(overlay)`.
|
|
||||||
- `update_overlay`: forbid changing `type` and `path`. Workshop: owner or admin can edit `name`. External: admin-only `name` edits.
|
|
||||||
- `delete_overlay`: after the row deletes, `shutil.rmtree(LEFT4ME_ROOT/overlays/{path})` only if `overlay.path == str(overlay.id)` (legacy externals are left alone). Cache untouched.
|
|
||||||
|
|
||||||
`workshop_routes.py`:
|
|
||||||
|
|
||||||
- `POST /overlays/{id}/items`: parse input via `parse_workshop_input`; if a collection ID, resolve members; batch-fetch metadata in `mode="add"`; reject non-550 with HTTP 400; upsert `WorkshopItem` via SQLite `INSERT ... ON CONFLICT DO UPDATE` on `steam_id`; bulk-add associations catching `(overlay_id, workshop_item_id)` unique violations; call `enqueue_build_overlay(overlay_id)`; return rendered `_overlay_item_table.html` fragment.
|
|
||||||
- `POST /overlays/{id}/items/{item_id}/delete`: ownership check; remove association; call `enqueue_build_overlay(overlay_id)`; return updated fragment.
|
|
||||||
- `POST /overlays/{id}/build`: ownership check; enqueue (coalesced); redirect to `/jobs/{job_id}`.
|
|
||||||
- `POST /admin/workshop/refresh`: `@require_admin`; insert a `refresh_workshop_items` queued job; redirect to `/admin/jobs`.
|
|
||||||
|
|
||||||
`page_routes.py`:
|
|
||||||
|
|
||||||
- `overlays()`: admins see all; non-admins see `type='external' OR user_id=current_user.id`.
|
|
||||||
- `overlay_detail()`: load `WorkshopItem` rows for workshop-type overlays.
|
|
||||||
|
|
||||||
Templates:
|
|
||||||
|
|
||||||
- `overlays.html`: add Type column. Modal has type radio (External | Workshop) and name field. No path field.
|
|
||||||
- `overlay_detail.html`: branch on `overlay.type`.
|
|
||||||
- External view: read-only path display, name edit (admin only).
|
|
||||||
- Workshop view: a `<textarea>` accepting one or many IDs/URLs plus a radio (Items | Collection); item table with thumbnail (`preview_url`), `steam_id` linked to Steam, title, filename, time_updated, file_size, last_error, Remove; Rebuild button; small status indicator showing the latest related job.
|
|
||||||
- `_overlay_item_table.html`: renderable standalone for HTMX swaps.
|
|
||||||
- `admin.html`: add a CSRF-protected "Refresh all workshop items" button.
|
|
||||||
|
|
||||||
Verification command:
|
|
||||||
|
|
||||||
```bash
pytest l4d2web/tests/test_overlays.py l4d2web/tests/test_workshop_routes.py -q
```
|
|
||||||
|
|
||||||
Expected after implementation: PASS.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 13: Tests First — Initialize-Time Guard
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/tests/test_l4d2_facade.py` (or create if missing)
|
|
||||||
|
|
||||||
Cover:
|
|
||||||
|
|
||||||
- `initialize_server(server_id)` calls `BUILDERS[overlay.type].build()` for each overlay in the blueprint before writing the spec.
|
|
||||||
- For workshop overlays, when an associated `WorkshopItem` lacks a cache file (`workshop_cache/{steam_id}.vpk` missing), `initialize_server` raises a clear error containing the missing `steam_id`s and the overlay name; the spec is not written; `l4d2ctl initialize` is not invoked.
|
|
||||||
- For workshop overlays where all items have cache files, the symlinks are present and `l4d2ctl initialize` runs.
|
|
||||||
|
|
||||||
Verification command:
|
|
||||||
|
|
||||||
```bash
pytest l4d2web/tests/test_l4d2_facade.py -q
```
|
|
||||||
|
|
||||||
Expected before implementation: FAIL.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 14: Initialize-Time Guard
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/services/l4d2_facade.py`
|
|
||||||
|
|
||||||
Implementation:
|
|
||||||
|
|
||||||
- Before writing the temp spec, iterate over the blueprint's overlays and call `BUILDERS[overlay.type].build(...)`.
|
|
||||||
- For workshop overlays, the builder logs and skips uncached items rather than failing. After all builders run, perform a second pass: query the blueprint's workshop overlays for any associated `WorkshopItem` with no cache file. If any are found, raise an exception whose message names the missing `steam_id`s and points at the overlay page (`Open overlay {name} ({id}) and click Build`).
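
The second pass described above could be sketched like this. `MissingWorkshopCacheError` and `check_workshop_cache` are hypothetical names, and the overlays are passed as plain `(id, name, steam_ids)` tuples rather than ORM rows so the snippet runs on its own.

```python
from __future__ import annotations

from pathlib import Path


class MissingWorkshopCacheError(RuntimeError):
    """Hypothetical exception type; the plan only requires a clear error message."""


def check_workshop_cache(overlays: list[tuple[int, str, list[str]]], cache_root: Path) -> None:
    """Raise if any associated workshop item has no cache file on disk."""
    problems: list[str] = []
    for overlay_id, name, steam_ids in overlays:
        missing = [sid for sid in steam_ids if not (cache_root / f"{sid}.vpk").is_file()]
        if missing:
            problems.append(
                f"Open overlay {name} ({overlay_id}) and click Build; "
                f"missing steam_ids: {', '.join(missing)}"
            )
    if problems:
        raise MissingWorkshopCacheError("; ".join(problems))
```

Collecting every problem before raising lets the user fix all affected overlays in one round instead of one failed start per overlay.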
|
|
||||||
|
|
||||||
Verification command:
|
|
||||||
|
|
||||||
```bash
pytest l4d2web/tests/test_l4d2_facade.py -q
```
|
|
||||||
|
|
||||||
Expected after implementation: PASS.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 15: Deploy Provisioning
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `deploy/install.sh` (or whichever provisioning script creates `/var/lib/left4me/`)
|
|
||||||
- Modify: `deploy/README.md`
|
|
||||||
|
|
||||||
Behavior:
|
|
||||||
|
|
||||||
- Provisioning creates `/var/lib/left4me/workshop_cache/` (mode 0755), owned by the web user.
|
|
||||||
- `deploy/README.md` documents:
|
|
||||||
- The new directory and its purpose.
|
|
||||||
- Permission requirement: web user owns; host user reads (shared group with `g+r` if uids differ).
|
|
||||||
- `LEFT4ME_ROOT` layout updated with the new subtree.
|
|
||||||
|
|
||||||
No tests; verify via test deploy.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 16: Full Verification And Manual Test Plan
|
|
||||||
|
|
||||||
Run focused suites first:
|
|
||||||
|
|
||||||
```bash
pytest l4d2web/tests/test_workshop_overlay_models.py -q
pytest l4d2web/tests/test_models.py -q
pytest l4d2web/tests/test_steam_workshop.py -q
pytest l4d2web/tests/test_workshop_paths.py l4d2web/tests/test_overlay_creation.py -q
pytest l4d2web/tests/test_overlay_builders.py -q
pytest l4d2web/tests/test_job_worker.py -q
pytest l4d2web/tests/test_overlays.py l4d2web/tests/test_workshop_routes.py -q
pytest l4d2web/tests/test_l4d2_facade.py -q
```
|
|
||||||
|
|
||||||
Then run the full web suite:
|
|
||||||
|
|
||||||
```bash
pytest l4d2web/tests -q
```
|
|
||||||
|
|
||||||
Manual test plan on the test deploy:
|
|
||||||
|
|
||||||
1. Apply migration on a copy of the prod DB; verify all existing overlays read as `type='external'`, `user_id=NULL`; names still unique by partial index; two externals with the same name are rejected.
|
|
||||||
2. As non-admin, create a workshop overlay. Add a known popular L4D2 addon by URL. Verify the build job auto-enqueues. Verify symlink + cache file. Confirm web UI shows metadata and thumbnail.
|
|
||||||
3. Paste a multi-line block of item IDs and URLs. Verify all are parsed and added; verify coalescing (only one `build_overlay` job runs).
|
|
||||||
4. Add a 50-item collection. Verify all 50 metadata rows appear and no UI mention of "from collection". Verify single coalesced build job.
|
|
||||||
5. Remove an item. Verify auto-rebuild removes the symlink while the cache file remains.
|
|
||||||
6. As admin, click Refresh All. Verify only items with newer `time_updated` re-download. Verify affected overlays get coalesced `build_overlay` jobs enqueued.
|
|
||||||
7. Boot an L4D2 server with a workshop overlay attached. Connect locally and confirm the maps appear in the map vote and load.
|
|
||||||
8. Concurrency probe: enqueue Refresh All while a `build_overlay` is queued; verify scheduler waits per truth table.
|
|
||||||
9. Initialize-time guard: manually delete a cache file for an item that's in an overlay attached to a server's blueprint. Try to start the server; verify clear error mentioning the missing `steam_id`.
|
|
||||||
10. Negative: paste a non-L4D2 workshop ID (e.g., a Skyrim mod). Expect HTTP 400 with a clear message; no row inserted.
|
|
||||||
11. Negative: simulate Steam API down (block egress). Verify add fails with clean error, not 500. Verify refresh job logs the failure.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Commit Strategy
|
|
||||||
|
|
||||||
Use small commits after passing relevant tests:
|
|
||||||
|
|
||||||
1. `feat(l4d2-web): typed overlays + workshop schema migration`
|
|
||||||
2. `feat(l4d2-web): steam workshop API client and downloader`
|
|
||||||
3. `feat(l4d2-web): overlay path helpers and creation`
|
|
||||||
4. `feat(l4d2-web): overlay builder registry with workshop builder`
|
|
||||||
5. `feat(l4d2-web): worker support for build_overlay and refresh_workshop_items`
|
|
||||||
6. `feat(l4d2-web): workshop overlay UI (routes + templates)`
|
|
||||||
7. `feat(l4d2-web): initialize-time guard for uncached workshop items`
|
|
||||||
8. `feat(deploy): workshop_cache provisioning`
|
|
||||||
|
|
||||||
Do not commit unless the user explicitly asks for commits.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Open Approval Gate
|
|
||||||
|
|
||||||
Before modifying implementation files, ask the user for explicit approval to proceed with the workshop-overlays implementation.
|
|
||||||
|
|
@ -1,229 +0,0 @@
|
||||||
# Kernel Overlayfs Helper Implementation Plan
|
|
||||||
|
|
||||||
> **Approval status:** User-approved 2026-05-08. Implementation proceeds.
|
|
||||||
|
|
||||||
**Goal:** Implement the kernel-overlayfs migration per `docs/superpowers/specs/2026-05-08-kernel-overlayfs-helper-design.md`. Add a Python `left4me-overlay` privileged helper, a `KernelOverlayFSMounter` Python class, wire the existing `OverlayMounter` ABC through `l4d2host/instances.py`, drop `fuse-overlayfs` from the deploy stack, and migrate existing on-disk upper/work directories.
|
|
||||||
|
|
||||||
**Architecture:** The web app continues to call `l4d2ctl start|stop|delete <name>`; `l4d2host` continues to expose the same CLI verbs. Internally, `start_instance`/`stop_instance`/`delete_instance` move from a hardcoded subprocess call to `fuse-overlayfs`/`fusermount3` to using `KernelOverlayFSMounter`, which invokes the new sudo helper that mounts in PID 1's namespace via `nsenter`.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Locked Decisions
|
|
||||||
|
|
||||||
See `docs/superpowers/specs/2026-05-08-kernel-overlayfs-helper-design.md` for the design rationale. Implementation-relevant summary:
|
|
||||||
|
|
||||||
- `left4me-overlay` Python helper in `/usr/local/libexec/left4me/`, owned root, mode 0755, system `/usr/bin/python3`, stdlib only.
|
|
||||||
- Verbs: `mount <name>`, `umount <name>`.
|
|
||||||
- Validation in helper: name regex; realpath + allowlist for each lowerdir; exact-prefix check for upper/work/merged; reject upperdir with `user.fuseoverlayfs.*` xattrs; lowerdir count ≤ 500.
|
|
||||||
- Sudoers verb-constrained: `mount *`, `umount *`.
|
|
||||||
- `KernelOverlayFSMounter` in `l4d2host/fs/kernel_overlayfs.py` — implements `OverlayMounter`. Derives `name` from the merged path's parent.
|
|
||||||
- `start_instance` adds `os.path.ismount(merged)` guard before mounting.
|
|
||||||
- Deploy migration: gated on sentinel file `/var/lib/left4me/.kernel-overlay-migrated`; stops gameservers + web, force-unmounts stale mounts, wipes upper/work, recreates empty.
|
|
||||||
- Web unit cleanup: drop `MountFlags=shared`, restore `PrivateTmp=true`, rewrite comment block. Keep `NoNewPrivileges` unset.
|
|
||||||
- Delete `l4d2host/fs/fuse_overlayfs.py` (currently unused — `start_instance` bypasses it).
|
|
||||||
- AGENTS.md contracts unchanged.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Current Gap
|
|
||||||
|
|
||||||
- `l4d2host/instances.py` `start_instance` calls `fuse-overlayfs` directly (lines 85-101); `stop_instance`/`delete_instance` call `fusermount3 -u` directly. The `OverlayMounter` ABC at `l4d2host/fs/base.py` and the `FuseOverlayFSMounter` impl at `l4d2host/fs/fuse_overlayfs.py` exist but are unused.
|
|
||||||
- Mounts land in the web service's private mount namespace, invisible to host and to gameserver units. `MountFlags=shared` does not fix it.
|
|
||||||
- No privileged mount helper exists; only `left4me-systemctl` and `left4me-journalctl`.
|
|
||||||
- Deploy script installs `fuse-overlayfs` apt package and assumes it as a runtime tool.
|
|
||||||
- Existing `runtime/<name>/upper` directories may carry `user.fuseoverlayfs.*` xattrs that kernel overlayfs would silently ignore (resurrecting "deleted" files).
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 1: Helper Script + Sudoers + Mounter Class (RED-first)
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Create: `deploy/files/usr/local/libexec/left4me/left4me-overlay` (Python, mode 0755 after deploy)
|
|
||||||
- Modify: `deploy/files/etc/sudoers.d/left4me`
|
|
||||||
- Create: `l4d2host/fs/kernel_overlayfs.py`
|
|
||||||
- Create: `l4d2host/tests/test_kernel_overlayfs.py`
|
|
||||||
- Create: `l4d2host/tests/test_overlay_helper.py`
|
|
||||||
- Modify: `deploy/tests/test_deploy_artifacts.py` (assert helper deployed + sudoers entry)
|
|
||||||
|
|
||||||
Test plan (RED first):
|
|
||||||
|
|
||||||
1. `test_kernel_overlayfs.py::test_mount_invokes_helper_with_name` — mock `run_command`, call `KernelOverlayFSMounter().mount(lowerdirs="/x:/y", upperdir=Path("/var/lib/left4me/runtime/alpha/upper"), workdir=Path("/var/lib/left4me/runtime/alpha/work"), merged=Path("/var/lib/left4me/runtime/alpha/merged"))`, assert argv `["sudo", "-n", "/usr/local/libexec/left4me/left4me-overlay", "mount", "alpha"]`.
|
|
||||||
2. `test_kernel_overlayfs.py::test_unmount_invokes_helper_with_umount_verb` — mock + call + assert argv with `umount`.
|
|
||||||
3. `test_overlay_helper.py` — drives the helper script as a subprocess with `LEFT4ME_OVERLAY_PRINT_ONLY=1` env var (helper prints the would-be `nsenter …` command line and exits 0 instead of execve), and with isolated `LEFT4ME_ROOT=tmp_path`. Cases:
|
|
||||||
- Valid mount: prints expected `nsenter --mount=/proc/1/ns/mnt -- /bin/mount -t overlay …` line.
|
|
||||||
- Valid umount: prints expected umount line.
|
|
||||||
- Bad name (`../escape`, uppercase, empty): exit non-zero, stderr matches.
|
|
||||||
- Lowerdir traversal (`/etc`, `/var/lib/left4me/../etc`, symlink escape): exit non-zero.
|
|
||||||
- Missing `instance.env`: exit non-zero.
|
|
||||||
- Tainted upperdir (with `user.fuseoverlayfs.opaque` xattr): exit non-zero with clear message. (Optional: skip if `setfattr` is unavailable on dev machine; keep test on Linux only via `pytest.mark.skipif`.)
|
|
||||||
- Lowerdir count > 500: exit non-zero.
|
|
||||||
4. `test_deploy_artifacts.py` — assert `/usr/local/libexec/left4me/left4me-overlay` is present in deployed files; sudoers includes the new lines.
|
|
||||||
|
|
||||||
Implementation:
|
|
||||||
|
|
||||||
- Helper script structure: `argparse` for the verb, then path-validation funcs, then `os.execv("/usr/bin/nsenter", [...])` (or printing it under `LEFT4ME_OVERLAY_PRINT_ONLY`).
|
|
||||||
- `KernelOverlayFSMounter`: `name = merged.parent.name` (with a one-line comment), then `run_command(["sudo", "-n", "/usr/local/libexec/left4me/left4me-overlay", verb, name], on_stdout=…, on_stderr=…, passthrough=…, should_cancel=…)`.
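
The helper's path-validation functions might follow the shape below. This is a sketch under stated assumptions: the exact name regex is not given in this plan, so `NAME_RE` is a guess that merely rejects the bad cases listed in the tests, and the allowlisted roots are passed in by the caller. The real helper would exit non-zero with a message on stderr instead of raising.

```python
import os
import re
from pathlib import Path

# Assumption: lowercase alphanumerics plus "-" and "_"; the spec's regex may differ.
NAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]*")
MAX_LOWERDIRS = 500


def validate_name(name: str) -> str:
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"invalid instance name: {name!r}")
    return name


def validate_lowerdirs(lowerdirs: str, allowed_roots: list[Path]) -> list[str]:
    """Resolve each colon-separated lowerdir and require it to sit under an allowed root."""
    parts = lowerdirs.split(":")
    if len(parts) > MAX_LOWERDIRS:
        raise ValueError(f"too many lowerdirs: {len(parts)} > {MAX_LOWERDIRS}")
    roots = [os.path.realpath(root) for root in allowed_roots]
    resolved: list[str] = []
    for part in parts:
        real = os.path.realpath(part)  # collapses ".." and follows symlinks
        if not any(os.path.commonpath([real, root]) == root for root in roots):
            raise ValueError(f"lowerdir outside allowlist: {part!r}")
        resolved.append(real)
    return resolved
```

Comparing with `os.path.commonpath` rather than a string prefix avoids accepting sibling directories such as `/var/lib/left4me-evil`.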
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
|
|
||||||
```
python3 -m pytest l4d2host/tests/test_kernel_overlayfs.py l4d2host/tests/test_overlay_helper.py deploy/tests/test_deploy_artifacts.py -q
```
|
|
||||||
|
|
||||||
Expected before implementation: FAIL on missing class/script. After: all green.
|
|
||||||
|
|
||||||
**Commit:** `feat(l4d2-host): KernelOverlayFSMounter + left4me-overlay helper`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 2: Wire OverlayMounter Through Lifecycle + Drop Fuse Module
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2host/instances.py` (start/stop/delete)
|
|
||||||
- Modify: `l4d2host/tests/test_lifecycle.py` (update argv assertions, add double-mount guard test)
|
|
||||||
- Delete: `l4d2host/fs/fuse_overlayfs.py`
|
|
||||||
- Verify: `l4d2host/fs/__init__.py` does not re-export `FuseOverlayFSMounter`
|
|
||||||
|
|
||||||
Test plan (update RED, then GREEN):
|
|
||||||
|
|
||||||
1. `test_lifecycle.py::test_start_order` — change assertion: `calls[0]` is now `["sudo", "-n", "/usr/local/libexec/left4me/left4me-overlay", "mount", "alpha"]`. Adjust setup so the test still creates the merged directory.
|
|
||||||
2. `test_lifecycle.py::test_stop_succeeds_when_unmount_fails` — `cmd[0:5] == ["sudo", "-n", "/usr/local/libexec/left4me/left4me-overlay", "umount", "alpha"]`.
|
|
||||||
3. `test_lifecycle.py::test_delete_succeeds_when_unmount_fails` — same.
|
|
||||||
4. NEW `test_lifecycle.py::test_start_refuses_double_mount` — monkeypatch `os.path.ismount` to return True; expect `start_instance` to raise `subprocess.CalledProcessError`; assert NO mount command was issued.
|
|
||||||
5. `test_lifecycle.py::test_lifecycle_rejects_unsafe_instance_names` — unchanged.
|
|
||||||
6. `test_lifecycle.py::test_delete_missing_is_noop` — unchanged.
|
|
||||||
|
|
||||||
Implementation:
|
|
||||||
|
|
||||||
- `instances.py` imports `KernelOverlayFSMounter`. Module-level singleton instance (`_mounter = KernelOverlayFSMounter()`). Replace direct `run_command([...fuse-overlayfs...])` with `_mounter.mount(...)`. Replace direct `run_command([...fusermount3...])` with `_mounter.unmount(...)` (still inside the existing try/except for stop/delete).
|
|
||||||
- Add the ismount guard at the top of `start_instance` after `runtime_dir` is computed, before `emit_step("mounting runtime overlay...")`. Raise `subprocess.CalledProcessError(returncode=1, cmd=["mount-guard"], stderr="runtime overlay already mounted at <path>; refusing to double-mount")`.
|
|
||||||
- Delete `l4d2host/fs/fuse_overlayfs.py`.
|
|
||||||
- Confirm `l4d2host/fs/__init__.py` has no `FuseOverlayFSMounter` re-export (already verified: the file is 1 line).
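A minimal sketch of the ismount guard described above, assuming the guard is factored into a small function. The function name `guard_against_double_mount` and the `mount_point` parameter are illustrative, not taken from `instances.py`; which directory is checked (the runtime dir or its merged subdirectory) is also an assumption.

```python
import os
import subprocess
from pathlib import Path


def guard_against_double_mount(mount_point: Path) -> None:
    """Raise before any mount command is issued if mount_point is already mounted.

    Hypothetical helper: the plan places this check inline at the top of
    start_instance, before emit_step("mounting runtime overlay...").
    """
    if os.path.ismount(mount_point):
        raise subprocess.CalledProcessError(
            returncode=1,
            cmd=["mount-guard"],
            stderr=(
                f"runtime overlay already mounted at {mount_point}; "
                "refusing to double-mount"
            ),
        )
```

Raising `CalledProcessError` keeps the failure in the same exception family the callers already handle for failed mount commands, so no new error path is needed.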
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
|
|
||||||
```
|
|
||||||
python3 -m pytest l4d2host/tests -q
|
|
||||||
python3 -m pytest l4d2web/tests -q
|
|
||||||
```
|
|
||||||
|
|
||||||
Both green. Web tests: the `"Step: mounting runtime overlay..."` log line is preserved in `start_instance`.
|
|
||||||
|
|
||||||
**Commit:** `refactor(l4d2-host): start/stop/delete go through OverlayMounter; drop FuseOverlayFSMounter`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 3: Deploy Script Migration (Apt Deps + Wipe Upper/Work)
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `deploy/deploy-test-server.sh`
|
|
||||||
- Modify: `deploy/tests/test_deploy_artifacts.py` (assert deploy script contains migration lines; assert `fuse-overlayfs` no longer in apt-get install)
|
|
||||||
|
|
||||||
Test plan:
|
|
||||||
|
|
||||||
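1. `test_deploy_artifacts.py::test_deploy_script_drops_fuse_overlayfs_apt_dep`: assert that no `apt-get install` or `dnf install` line in `deploy_script` contains `fuse-overlayfs`, and `assert "kernel-overlay-migrated" in deploy_script`. A whole-file `"fuse-overlayfs" not in deploy_script` check would fail, because the migration block below still names fuse-overlayfs in its comments and in the `findmnt -t fuse.fuse-overlayfs` call.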
|
|
||||||
2. `test_deploy_artifacts.py::test_deploy_script_migration_block_uses_sentinel` — `assert ".kernel-overlay-migrated" in deploy_script`.
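One way to scope the apt-dependency assertion to package-install lines, sketched under assumptions: the helper names `package_install_lines` and `deploy_script_problems` are hypothetical, and the sketch only inspects single lines, so an install command split across backslash continuations would need extra handling.

```python
def package_install_lines(script_text: str) -> list[str]:
    """Return the lines that invoke a package-manager install (apt-get or dnf)."""
    return [
        line
        for line in script_text.splitlines()
        if "apt-get install" in line or "dnf install" in line
    ]


def deploy_script_problems(script_text: str) -> list[str]:
    """Collect human-readable problems; an empty list means the script passes."""
    problems = []
    for line in package_install_lines(script_text):
        if "fuse-overlayfs" in line:
            problems.append(f"fuse-overlayfs still installed: {line.strip()}")
    if ".kernel-overlay-migrated" not in script_text:
        problems.append("migration sentinel missing")
    return problems
```

A test can then assert `deploy_script_problems(deploy_script) == []`, which tolerates the fuse-overlayfs mentions inside the migration block itself.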
|
|
||||||
|
|
||||||
Implementation:
|
|
||||||
|
|
||||||
In `deploy/deploy-test-server.sh`, drop `fuse-overlayfs` from the apt-get and dnf lines (lines 82, 84). Insert before the existing `systemctl restart left4me-web.service` (line 182):
|
|
||||||
|
|
||||||
```sh
|
|
||||||
# One-time migration: fuse-overlayfs upperdir → kernel overlayfs upperdir.
|
|
||||||
# fuse-overlayfs running as the left4me user uses user.fuseoverlayfs.* xattrs
|
|
||||||
# for whiteouts and opaque dirs; kernel overlayfs ignores those, so any
|
|
||||||
# pre-existing upper/ from the fuse era would resurrect "deleted" files.
|
|
||||||
sentinel=/var/lib/left4me/.kernel-overlay-migrated
|
|
||||||
if [ ! -e "$sentinel" ]; then
|
|
||||||
$sudo_cmd systemctl stop 'left4me-server@*.service' 2>/dev/null || true
|
|
||||||
$sudo_cmd systemctl stop left4me-web.service 2>/dev/null || true
|
|
||||||
$sudo_cmd sh -c 'findmnt -t fuse.fuse-overlayfs -o TARGET --noheadings | xargs -r -n1 fusermount3 -u 2>/dev/null || true'
|
|
||||||
$sudo_cmd sh -c "findmnt -t overlay -o TARGET --noheadings | grep '/var/lib/left4me/runtime/' | xargs -r -n1 umount 2>/dev/null || true"
|
|
||||||
$sudo_cmd sh -c 'for d in /var/lib/left4me/runtime/*/; do [ -d "$d" ] || continue; rm -rf "$d/upper" "$d/work"; mkdir -p "$d/upper" "$d/work"; chown left4me:left4me "$d/upper" "$d/work"; done'
|
|
||||||
$sudo_cmd touch "$sentinel"
|
|
||||||
$sudo_cmd chown left4me:left4me "$sentinel"
|
|
||||||
fi
|
|
||||||
```
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
|
|
||||||
```
|
|
||||||
python3 -m pytest deploy/tests -q
|
|
||||||
```
|
|
||||||
|
|
||||||
Green.
|
|
||||||
|
|
||||||
**Commit:** `chore(deploy): drop fuse-overlayfs apt dep + one-shot migrate upper/work`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 4: Web Unit Hardening Cleanup + Docs
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `deploy/files/usr/local/lib/systemd/system/left4me-web.service`
|
|
||||||
- Modify: `deploy/tests/test_deploy_artifacts.py`
|
|
||||||
- Modify: `README.md`
|
|
||||||
- Modify: `l4d2host/README.md`
|
|
||||||
- Modify: `deploy/README.md`
|
|
||||||
|
|
||||||
Test plan:
|
|
||||||
|
|
||||||
1. `test_deploy_artifacts.py::test_web_unit_contains_required_runtime_contract`: replace `assert "MountFlags=shared" in unit` with `assert "MountFlags=" not in unit`; add `assert "PrivateTmp=true" in unit`; add `assert "left4me-overlay" not in unit` (the unit must not reference the helper directly; only the Python code invokes it).
|
|
||||||
|
|
||||||
Implementation:
|
|
||||||
|
|
||||||
Edit `left4me-web.service`:
|
|
||||||
|
|
||||||
- Drop `MountFlags=shared`.
|
|
||||||
- Restore `PrivateTmp=true`.
|
|
||||||
- Rewrite the comment block above hardening lines to explain: mounts now go through the `left4me-overlay` helper which `nsenter`s into PID 1's mount namespace, so this unit's namespace is irrelevant to gameserver visibility. `NoNewPrivileges` stays unset because sudo is setuid.
|
|
||||||
|
|
||||||
README updates:
|
|
||||||
|
|
||||||
- `README.md` (line ~59): drop fuse-overlayfs from tech-stack list; replace with "kernel overlayfs via privileged helper".
|
|
||||||
- `l4d2host/README.md`: lines 29, 52, 64 reference fuse — update to "kernel overlayfs (mount via the `left4me-overlay` helper deployed to `/usr/local/libexec/left4me/`)".
|
|
||||||
- `deploy/README.md`: add `/usr/local/libexec/left4me/left4me-overlay` to the privileged-helpers inventory.
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
|
|
||||||
```
|
|
||||||
python3 -m pytest deploy/tests -q
|
|
||||||
```
|
|
||||||
|
|
||||||
Green. Manual readthrough of the three READMEs confirms no stale fuse references.
|
|
||||||
|
|
||||||
**Commit:** `chore(deploy): cleanup left4me-web hardening + docs for kernel overlayfs`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 5: End-to-End Verification on `ckn@10.0.4.128`
|
|
||||||
|
|
||||||
**Pre-deploy:** branch is clean, all four prior commits have landed, all tests green locally.
|
|
||||||
|
|
||||||
**Deploy:**
|
|
||||||
|
|
||||||
```
|
|
||||||
deploy/deploy-test-server.sh ckn@10.0.4.128
|
|
||||||
```
|
|
||||||
|
|
||||||
**Verification commands on the box:**
|
|
||||||
|
|
||||||
1. `test -e /var/lib/left4me/.kernel-overlay-migrated && echo migrated` — sentinel created.
|
|
||||||
2. `systemctl status left4me-web.service --no-pager` — `active (running)`, recent invocation timestamp.
|
|
||||||
3. From the UI or via `sudo -u left4me /opt/left4me/.venv/bin/l4d2ctl start test-server` — exit 0.
|
|
||||||
4. `findmnt /var/lib/left4me/runtime/test-server/merged` — shows fstype `overlay` in the host namespace.
|
|
||||||
5. `systemctl status left4me-server@test-server --no-pager` — `active (running)` after the start; **not** in `activating (auto-restart)`. No `status=200/CHDIR` errors in `journalctl -u left4me-server@test-server`.
|
|
||||||
6. `sudo journalctl -k --since "5 minutes ago" | grep -i apparmor | tail` — no overlay-related denials.
|
|
||||||
7. Negative test: `sudo -u left4me sudo -n /usr/local/libexec/left4me/left4me-overlay mount '../escape'` — exits non-zero with validation error.
|
|
||||||
8. Idempotency: `l4d2ctl stop test-server && l4d2ctl stop test-server` — both succeed (per the prior `fix(l4d2-host): make stop_instance idempotent` commit, still holds).
|
|
||||||
9. Re-start: `l4d2ctl start test-server` — succeeds, `findmnt` shows the mount again.
|
|
||||||
10. Double-mount guard: while the server is running, attempting another start (not via UI; via Python REPL or a second job) — `start_instance` raises `CalledProcessError` with the "refusing to double-mount" message. Optional, can be left to the unit test.
|
|
||||||
|
|
||||||
**On failure of any step:** stop and report. Do NOT push. The deploy script is rerunnable; once the migration sentinel exists it stays in place, so the upper/work wipe does not repeat on a rerun.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Out Of Scope
|
|
||||||
|
|
||||||
- See spec's "Out Of Scope" section.
|
|
||||||
- This plan does not push commits; pushing is a separate user decision after end-to-end verification passes.
|
|
||||||
|
|
@ -1,350 +0,0 @@
|
||||||
# L4D2 Script Overlays Implementation Plan
|
|
||||||
|
|
||||||
> **Approval status:** User-approved 2026-05-08. Implementation proceeds.
|
|
||||||
|
|
||||||
**Goal:** Implement the `script` overlay type per `docs/superpowers/specs/2026-05-08-l4d2-script-overlays-design.md`. Add an `Overlay.script` TEXT column and `Overlay.last_build_status` enum-string column, a `ScriptBuilder` that runs user bash inside a `bubblewrap` + `systemd-run --scope` sandbox via a new `left4me-script-sandbox` privileged helper, route + UI surface for editing/wiping/rebuilding, and delete the entire managed-globals (`l4d2center_maps`, `cedapug_maps`) subsystem and its daily-refresh timer/CLI.
|
|
||||||
|
|
||||||
**Architecture:** The web app continues to enqueue `build_overlay` jobs for any overlay row. The job worker dispatches via `BUILDERS[overlay.type].build(...)`. After this change `BUILDERS = {"workshop": WorkshopBuilder(), "script": ScriptBuilder()}`. The new `ScriptBuilder` writes `overlay.script` to a tmpfile and execs `sudo -n /usr/local/libexec/left4me/left4me-script-sandbox <id> <tmpfile>`, which itself execs `systemd-run --scope --collect ... -- bwrap [namespace flags] /bin/bash /script.sh`. stdout/stderr stream through the existing `run_with_streamed_output` helper into the existing job-log SSE plumbing. The job-completion path writes `Overlay.last_build_status` based on the build outcome. The kernel-overlayfs mount layer (`KernelOverlayFSMounter`) is unchanged.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Locked Decisions
|
|
||||||
|
|
||||||
See `docs/superpowers/specs/2026-05-08-l4d2-script-overlays-design.md` for design rationale. Implementation-relevant summary:
|
|
||||||
|
|
||||||
- Final overlay type list: `workshop` (unchanged) + `script` (new). Drop `l4d2center_maps`, `cedapug_maps`.
|
|
||||||
- New columns on `overlays`: `script TEXT NOT NULL DEFAULT ''`, `last_build_status VARCHAR(16) NOT NULL DEFAULT ''`.
|
|
||||||
- Drop tables (FK order): `global_overlay_item_files`, `global_overlay_items`, `global_overlay_sources`.
|
|
||||||
- `ScriptBuilder` in `l4d2web/services/overlay_builders.py`, uses existing `run_with_streamed_output`.
|
|
||||||
- Privileged helper `left4me-script-sandbox` (bash, mode 0755, owned by root). Invocation: `systemd-run --scope --collect -p MemoryMax=4G -p MemorySwapMax=0 -p TasksMax=512 -p CPUQuota=200% -p RuntimeMaxSec=3600 -- bwrap …`. Limits: 1 h wall time (`RuntimeMaxSec=3600`), 4 GiB RAM (`MemoryMax=4G`), 20 GiB post-build `du` cap.
|
|
||||||
- New system user `l4d2-sandbox` (`/usr/sbin/nologin`, no home). New apt dep `bubblewrap`.
|
|
||||||
- Sudoers rule with no argument restriction: `left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox`.
|
|
||||||
- Daily refresh subsystem deleted: `left4me-refresh-global-overlays.{timer,service}` and `flask refresh-global-overlays` CLI removed. No replacement.
|
|
||||||
- Wipe is the same sandbox helper invoked with the literal script `find /overlay -mindepth 1 -delete`.
|
|
||||||
- `auto_refresh` column NOT added in this iteration.
|
|
||||||
- Test deploy DB is wiped on rollout; migration includes `DELETE FROM overlays WHERE type IN ('l4d2center_maps', 'cedapug_maps')` for safety.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Current Gap
|
|
||||||
|
|
||||||
- `l4d2web/models.py` `Overlay` has no `script` or `last_build_status` columns. The 3 globals tables are present.
|
|
||||||
- `l4d2web/services/overlay_builders.py` `BUILDERS = {"workshop": WorkshopBuilder(), "l4d2center_maps": GlobalMapOverlayBuilder(), "cedapug_maps": GlobalMapOverlayBuilder()}`. No `ScriptBuilder`.
|
|
||||||
- `l4d2web/services/{global_map_sources,global_overlay_refresh,global_map_cache,global_overlays}.py` exist and are referenced by routes / CLI.
|
|
||||||
- `l4d2web/services/job_worker.py` carries `refresh_global_overlays_running` plumbing.
|
|
||||||
- `l4d2web/cli.py` defines `refresh-global-overlays`.
|
|
||||||
- `l4d2web/routes/overlay_routes.py` has no `/script`, `/wipe`, or `/build` endpoints for non-workshop types.
|
|
||||||
- `l4d2web/templates/overlays.html` create modal type radio offers only `workshop`.
|
|
||||||
- `l4d2web/templates/overlay_detail.html` has a global-source block (~lines 34–46) that should not survive.
|
|
||||||
- `deploy/files/usr/local/lib/systemd/system/left4me-refresh-global-overlays.{timer,service}` exist.
|
|
||||||
- `deploy/deploy-test-server.sh` provisions `global_overlay_cache/` and does not provision `l4d2-sandbox` or install `bubblewrap`.
|
|
||||||
- Seven `tests/test_global_*.py` files exist and reference removed code.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 1: Schema migration (alembic 0005)
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
|
|
||||||
- Create: `l4d2web/alembic/versions/0005_script_overlays.py` (revises `0004_drop_legacy_external_overlay_type`).
|
|
||||||
- Modify: `l4d2web/models.py` — `Overlay` gains `script` and `last_build_status` columns; remove `GlobalOverlaySource`, `GlobalOverlayItem`, `GlobalOverlayItemFile` model classes.
|
|
||||||
- Modify: `l4d2web/tests/test_overlay_models.py` (or whichever existing test asserts the Overlay schema; create one if absent) — assert new columns present.
|
|
||||||
|
|
||||||
Test plan (RED first):
|
|
||||||
|
|
||||||
1. `tests/test_alembic_migrations.py::test_upgrade_0005_adds_script_columns` — apply migrations to a fresh in-memory SQLite, assert `script` and `last_build_status` columns present on `overlays`, assert no `global_overlay_*` tables, assert old data wipe `DELETE FROM overlays WHERE type IN (...)` is part of the upgrade.
|
|
||||||
2. `tests/test_alembic_migrations.py::test_downgrade_0005_restores_globals` (only if downgrade is supported in the project's migration policy; skip with `pytest.skip` if not — kernel-overlayfs migration is one-way, follow that precedent).
|
|
||||||
3. `tests/test_overlay_models.py::test_overlay_has_script_columns` — `Overlay(...)` instance has `script=''` and `last_build_status=''` defaults.
|
|
||||||
|
|
||||||
Implementation:
|
|
||||||
|
|
||||||
- Migration uses `op.drop_table('global_overlay_item_files')` etc. in correct FK order; uses `op.add_column('overlays', sa.Column('script', sa.Text(), nullable=False, server_default=''))` and similar for `last_build_status` (`sa.String(16)`).
|
|
||||||
- The `DELETE FROM overlays WHERE type IN ('l4d2center_maps','cedapug_maps')` runs *before* the column additions so the operation is straightforward — these rows do not reference the new columns.
|
|
||||||
- `models.py`: delete the three globals model classes outright; add the two new columns to `Overlay` with explicit defaults.
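The upgrade steps above can be sketched as plain SQL run through the standard-library `sqlite3` module. This is an illustration of the statement order only, not the Alembic migration itself: the real migration would use `op.drop_table` / `op.add_column`, and placing the table drops before the `DELETE` is an assumption (the plan only fixes that the `DELETE` precedes the column additions).

```python
import sqlite3

# Statement order for the 0005 upgrade, expressed as raw SQL for illustration.
UPGRADE_STATEMENTS = [
    # Drop the globals tables children-first (FK order).
    "DROP TABLE global_overlay_item_files",
    "DROP TABLE global_overlay_items",
    "DROP TABLE global_overlay_sources",
    # Remove rows of the retired overlay types before touching the schema.
    "DELETE FROM overlays WHERE type IN ('l4d2center_maps', 'cedapug_maps')",
    # NOT NULL columns need a non-NULL default so existing rows stay valid.
    "ALTER TABLE overlays ADD COLUMN script TEXT NOT NULL DEFAULT ''",
    "ALTER TABLE overlays ADD COLUMN last_build_status VARCHAR(16) NOT NULL DEFAULT ''",
]


def apply_upgrade(conn: sqlite3.Connection) -> None:
    """Run the upgrade statements in order and commit."""
    for statement in UPGRADE_STATEMENTS:
        conn.execute(statement)
    conn.commit()
```

The same assertions the migration test needs (columns present, globals tables gone, retired rows deleted) can be checked against this connection with `PRAGMA table_info(overlays)` and a query on `sqlite_master`.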
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
|
|
||||||
```
|
|
||||||
python3 -m pytest l4d2web/tests/test_alembic_migrations.py l4d2web/tests/test_overlay_models.py -q
|
|
||||||
```
|
|
||||||
|
|
||||||
**Commit:** `feat(l4d2-web): script overlay schema — add overlay.script + last_build_status, drop globals tables`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 2: ScriptBuilder + BUILDERS registry update
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
|
|
||||||
- Modify: `l4d2web/services/overlay_builders.py` — add `ScriptBuilder`, remove `GlobalMapOverlayBuilder`, change `BUILDERS` dict.
|
|
||||||
- Rewrite: `l4d2web/tests/test_overlay_builders.py` — drop globals-builder tests, add ScriptBuilder tests.
|
|
||||||
|
|
||||||
Test plan (RED first):
|
|
||||||
|
|
||||||
1. `test_overlay_builders.py::test_builders_registry` — `set(BUILDERS) == {"workshop", "script"}`. Assert `"l4d2center_maps"` and `"cedapug_maps"` and `"external"` are absent.
|
|
||||||
2. `test_overlay_builders.py::test_script_builder_invokes_helper` — patch `run_with_streamed_output` to capture argv; build an `Overlay(id=42, type='script', script='echo hi')`; assert argv shape `["sudo", "-n", "/usr/local/libexec/left4me/left4me-script-sandbox", "42", <script_path>]` and that the script_path file exists with content `"echo hi"` at invocation time. Verify the tmpfile is unlinked after build.
|
|
||||||
3. `test_overlay_builders.py::test_script_builder_disk_cap` — fake `subprocess.check_output` for `du` to return `25000000000`; build raises `BuildError("disk-cap-exceeded")` and `on_stderr` was called with the cap message.
|
|
||||||
4. `test_overlay_builders.py::test_script_builder_streams_output`: the fake `run_with_streamed_output` invokes both `on_stdout("hello\n")` and `on_stderr("warn\n")`; assert that the capture list behind each callback records its line.
|
|
||||||
5. `test_overlay_builders.py::test_script_builder_cancel` — `should_cancel` returns True after the first stdout line; assert `run_with_streamed_output` propagated cancellation (the existing helper's contract — the test just ensures we pass `should_cancel` through and don't run the disk-budget check on cancel).
|
|
||||||
6. `test_overlay_builders.py::test_workshop_builder_unchanged` — smoke test that `WorkshopBuilder` still exists and is invokable (regression guard against accidental removal during refactor).
|
|
||||||
|
|
||||||
Implementation:
|
|
||||||
|
|
||||||
- Add `import os, subprocess, tempfile` at the top of `overlay_builders.py` if not present.
|
|
||||||
- `ScriptBuilder` exactly as in the spec (verbatim copy from the design doc, §Build Lifecycle).
|
|
||||||
- Define a small `BuildError` exception class if one doesn't already exist locally; reuse the existing one if `WorkshopBuilder` already raises a similar type.
|
|
||||||
- `_enforce_disk_budget` calls `subprocess.check_output(["du", "-sb", str(overlay_path(overlay_id))])`; the existing `overlay_path` helper in the module already returns the absolute Path. Parse first whitespace-delimited integer; cap is `20 * 1024**3`.
|
|
||||||
- Job-completion path: locate the existing path that handles `build_overlay` job success/failure (likely in `services/job_worker.py` or a related orchestration module). Add a single column write: on success `last_build_status='ok'`, on `BuildError` / non-zero exit / cancel `last_build_status='failed'`. Add a `tests/test_job_worker.py::test_build_overlay_writes_last_build_status` covering both branches.
|
|
||||||
- Remove `GlobalMapOverlayBuilder` class and any helper functions it owns that are not used elsewhere.
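A rough sketch of the two mechanical pieces above: the tmpfile-plus-helper invocation and the disk-budget check. This is not the spec's `ScriptBuilder` (which should be copied verbatim from the design doc). The `runner` parameter stands in for `run_with_streamed_output`, whose real signature is not shown in this plan, and the function names `run_sandbox`, `parse_du_bytes`, and `enforce_disk_budget` are hypothetical.

```python
import os
import tempfile
from typing import Callable, Sequence

SANDBOX_HELPER = "/usr/local/libexec/left4me/left4me-script-sandbox"
DISK_CAP_BYTES = 20 * 1024**3  # cap stated in the plan


class BuildError(Exception):
    """Raised when a build must be marked failed."""


def run_sandbox(
    overlay_id: int,
    script_text: str,
    runner: Callable[[Sequence[str]], None],
) -> None:
    """Write the script to a tmpfile, hand it to the helper, always unlink it."""
    fd, script_path = tempfile.mkstemp(prefix="overlay-script-", suffix=".sh")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(script_text)
        # runner is an assumed stand-in for run_with_streamed_output.
        runner(["sudo", "-n", SANDBOX_HELPER, str(overlay_id), script_path])
    finally:
        os.unlink(script_path)


def parse_du_bytes(du_output: bytes) -> int:
    """Return the first whitespace-delimited integer of `du -sb` output."""
    return int(du_output.split()[0])


def enforce_disk_budget(du_output: bytes, on_stderr: Callable[[str], None]) -> None:
    """Report and raise if the overlay exceeds the post-build disk cap."""
    used = parse_du_bytes(du_output)
    if used > DISK_CAP_BYTES:
        on_stderr(f"overlay uses {used} bytes; cap is {DISK_CAP_BYTES} bytes\n")
        raise BuildError("disk-cap-exceeded")
```

Passing the runner in as a parameter is what lets the tests capture argv and read the script file while it still exists, without patching module globals.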
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
|
|
||||||
```
|
|
||||||
python3 -m pytest l4d2web/tests/test_overlay_builders.py l4d2web/tests/test_job_worker.py -q
|
|
||||||
```
|
|
||||||
|
|
||||||
**Commit:** `feat(l4d2-web): ScriptBuilder + BUILDERS registry update`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 3: Delete global-overlay services + CLI command + their tests
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
|
|
||||||
- Delete: `l4d2web/services/global_map_sources.py`
|
|
||||||
- Delete: `l4d2web/services/global_overlay_refresh.py`
|
|
||||||
- Delete: `l4d2web/services/global_map_cache.py`
|
|
||||||
- Delete: `l4d2web/services/global_overlays.py`
|
|
||||||
- Modify: `l4d2web/cli.py` — remove `refresh-global-overlays` command (lines ~44–55). Drop any imports that go orphaned.
|
|
||||||
- Delete: `l4d2web/tests/test_global_map_sources.py`
|
|
||||||
- Delete: `l4d2web/tests/test_global_overlay_models.py`
|
|
||||||
- Delete: `l4d2web/tests/test_global_overlay_builders.py`
|
|
||||||
- Delete: `l4d2web/tests/test_global_overlay_cli.py`
|
|
||||||
- Delete: `l4d2web/tests/test_global_overlay_refresh.py`
|
|
||||||
- Delete: `l4d2web/tests/test_global_overlays.py`
|
|
||||||
- Delete: `l4d2web/tests/test_global_map_cache.py`
|
|
||||||
- Audit & fix: any other module that imports the deleted modules. Likely candidates: `l4d2web/app.py` (CLI registration), `routes/overlay_routes.py`, `routes/page_routes.py`. Resolve by deletion of the dead import / call site, not by stubbing.
|
|
||||||
- Modify: `pyproject.toml` — drop `py7zr` from dependencies (only used by the deleted globals subsystem).
|
|
||||||
|
|
||||||
Test plan:
|
|
||||||
|
|
||||||
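1. RED-first via grep: `grep -RIn 'global_map_sources\|global_overlay_refresh\|global_map_cache\|global_overlays\|refresh_global_overlays\|GlobalMapOverlayBuilder' l4d2web/ deploy/` should return zero hits at the end of this task, apart from guard tests that name the removed identifiers on purpose (for example, `test_refresh_global_overlays_command_removed` below matches `refresh_global_overlays`). Add this as `tests/test_no_globals_references.py::test_no_globals_imports` if you want a permanent regression guard (the guard must skip its own file, which contains the patterns); otherwise spot-check.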
|
|
||||||
2. Existing `tests/test_cli.py` (or whichever covers Flask CLI) loses any cases for `refresh-global-overlays`; add a `test_refresh_global_overlays_command_removed` that asserts the click command is not registered.
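If the grep is kept as a permanent test, it could look roughly like the sketch below. The function name `find_globals_references` and the `skip` parameter are hypothetical; `skip` exists because the guard file itself, and any test that deliberately names a removed identifier, would otherwise count as hits.

```python
import re
from pathlib import Path
from typing import Iterable

# Same alternation as the grep pattern in the test plan.
FORBIDDEN = re.compile(
    r"global_map_sources|global_overlay_refresh|global_map_cache|"
    r"global_overlays|refresh_global_overlays|GlobalMapOverlayBuilder"
)


def find_globals_references(
    roots: Iterable[Path], skip: Iterable[str] = ()
) -> list[tuple[str, int]]:
    """Return (path, line number) for every line that names a removed identifier."""
    skipped = set(skip)
    hits = []
    for root in roots:
        for path in sorted(Path(root).rglob("*")):
            if not path.is_file() or path.name in skipped:
                continue
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable files are ignored, like grep -I
            for lineno, line in enumerate(text.splitlines(), 1):
                if FORBIDDEN.search(line):
                    hits.append((str(path), lineno))
    return hits
```

A test would then assert that `find_globals_references([Path("l4d2web"), Path("deploy")], skip={...})` is empty, with the skip set listing only the guard files.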
|
|
||||||
|
|
||||||
Implementation:
|
|
||||||
|
|
||||||
- Delete files via `git rm`.
|
|
||||||
- In `cli.py`, remove the command function and its `@app.cli.command(...)` decorator. Drop any helper imports that become orphaned.
|
|
||||||
- Remove `py7zr` from `pyproject.toml` and re-lock if a lockfile is present.
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
|
|
||||||
```
|
|
||||||
python3 -m pytest l4d2web/tests/ -q
|
|
||||||
grep -RIn 'global_map_sources\|global_overlay_refresh\|global_map_cache\|global_overlays\|refresh_global_overlays\|GlobalMapOverlayBuilder' l4d2web/ deploy/ || echo "clean"
|
|
||||||
```
|
|
||||||
|
|
||||||
**Commit:** `refactor(l4d2-web): drop global-overlays subsystem in favor of script type`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 4: Job worker — drop refresh_global_overlays from scheduler
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
|
|
||||||
- Modify: `l4d2web/services/job_worker.py` — remove `"refresh_global_overlays"` from `GLOBAL_OPERATIONS`; remove `refresh_global_overlays_running` field from `SchedulerState` and any references in `can_start()`; check whether `blocked_servers_by_overlay` was added solely for the globals subsystem and remove if so.
|
|
||||||
- Modify: `l4d2web/tests/test_job_worker.py` — drop `refresh_global_overlays` truth-table rows; add explicit `build_overlay` truth-table cases for `script`-type overlays (mechanically identical to workshop, but pinned by test).
|
|
||||||
|
|
||||||
Test plan:
|
|
||||||
|
|
||||||
1. `test_job_worker.py::test_global_operations_set` — `GLOBAL_OPERATIONS == {"install", "refresh_workshop_items"}` (or whatever subset remains; pin it).
|
|
||||||
2. `test_job_worker.py::test_build_overlay_script_type_blocks_per_overlay` — start `build_overlay(overlay_id=7)` for a `script`-type overlay; assert second `build_overlay(overlay_id=7)` cannot start; assert `build_overlay(overlay_id=8)` can.
|
|
||||||
3. `test_job_worker.py::test_build_overlay_blocks_server_init_on_blueprint_overlay` — existing test, may need re-pinning if it referenced globals.
|
|
||||||
|
|
||||||
Implementation:
|
|
||||||
|
|
||||||
- Remove the field from the dataclass / TypedDict that backs `SchedulerState`.
|
|
||||||
- Remove any update sites that flipped the flag (the worker's enqueue / on-start / on-complete paths).
|
|
||||||
- The remaining mutex rules (`install` / `refresh_workshop_items` are global; `build_overlay` per-overlay; server ops block on overlays in their blueprint) are unchanged structurally.
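The remaining mutex rules can be sketched as a single predicate. This is a simplified model, not the real `SchedulerState` or `can_start()`: the field names are invented, running server operations are not tracked, and treating a global operation as exclusive with every other job is an assumption about what "global" means here.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional

GLOBAL_OPERATIONS = frozenset({"install", "refresh_workshop_items"})


@dataclass
class SchedulerState:
    # Hypothetical fields; the real SchedulerState may be shaped differently.
    running_global: set[str] = field(default_factory=set)
    building_overlay_ids: set[int] = field(default_factory=set)


def can_start(
    state: SchedulerState,
    operation: str,
    overlay_id: Optional[int] = None,
    blueprint_overlay_ids: Iterable[int] = (),
) -> bool:
    """Decide whether a job may start under the simplified mutex rules."""
    if state.running_global:
        return False  # assumed: a running global operation blocks everything
    if operation in GLOBAL_OPERATIONS:
        return not state.building_overlay_ids  # assumed: globals wait for builds
    if operation == "build_overlay":
        return overlay_id not in state.building_overlay_ids  # per-overlay mutex
    # Any other operation is treated as a server op: it blocks while an
    # overlay in its blueprint is being built.
    return not (set(blueprint_overlay_ids) & state.building_overlay_ids)
```

Because the predicate never inspects the overlay type, a `script` overlay and a `workshop` overlay behave identically, which is the property the new truth-table cases pin.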
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
|
|
||||||
```
|
|
||||||
python3 -m pytest l4d2web/tests/test_job_worker.py -q
|
|
||||||
```
|
|
||||||
|
|
||||||
**Commit:** `refactor(l4d2-web): drop refresh_global_overlays from scheduler`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 5: Routes (script update / wipe / build)
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
|
|
||||||
- Modify: `l4d2web/routes/overlay_routes.py` — add three POST endpoints.
|
|
||||||
- Create: `l4d2web/tests/test_script_overlay_routes.py`.
|
|
||||||
|
|
||||||
Test plan (RED first):
|
|
||||||
|
|
||||||
1. `test_script_overlay_routes.py::test_create_script_overlay` — POST `/overlays` with form `{"name": "x", "type": "script"}` as a regular user → 302 to detail; row exists with `type='script'`, `script=''`, `last_build_status=''`, `user_id=current_user.id`, `path=str(id)`.
|
|
||||||
2. `test_script_overlay_routes.py::test_admin_creates_system_wide_script_overlay` — admin POST with system-wide flag → row has `user_id=NULL`.
|
|
||||||
3. `test_script_overlay_routes.py::test_update_script_body_enqueues_build` — POST `/overlays/{id}/script` with `{"script": "echo new"}` → row.script updated; one new `build_overlay` job enqueued for the overlay; second immediate POST coalesces (no second job inserted while first is pending).
|
|
||||||
4. `test_script_overlay_routes.py::test_manual_rebuild` — POST `/overlays/{id}/build` → enqueues `build_overlay`; coalesces.
|
|
||||||
5. `test_script_overlay_routes.py::test_wipe_runs_find_delete`: POST `/overlays/{id}/wipe` → invokes `ScriptBuilder.build` (or the underlying helper) with the literal script `find /overlay -mindepth 1 -delete`. After success, `row.last_build_status == ''`. Does not enqueue a `build_overlay`.
|
|
||||||
6. `test_script_overlay_routes.py::test_wipe_refuses_during_running_build` — set scheduler state to `build_overlay(overlay_id=7)` running; POST `/overlays/7/wipe` → 409 (or whatever the existing pattern uses for scheduler conflicts), no sandbox invocation.
|
|
||||||
7. `test_script_overlay_routes.py::test_permissions_non_owner_denied` — user A creates private script overlay; user B POSTs `/overlays/{id}/script` → 403.
|
|
||||||
8. `test_script_overlay_routes.py::test_permissions_admin_can_edit_any` — admin POSTs `/overlays/{id}/script` for user A's row → 200.
|
|
||||||
|
|
||||||
Implementation:
|
|
||||||
|
|
||||||
- Mirror the existing `_can_edit_overlay()` permission helper.
|
|
||||||
- The `/wipe` endpoint can either (a) call `ScriptBuilder` directly with a synthetic `Overlay`-like object whose `.script` is the find command and whose `.id` is the real overlay id, or (b) factor a `_run_sandbox(overlay_id, script_text, on_stdout, on_stderr, should_cancel)` helper out of `ScriptBuilder.build()` and call it from both. (b) is cleaner; do (b).
|
|
||||||
- Wipe runs **synchronously** in the request thread (small, fast). It does NOT enqueue a job. Surface log output as flash messages or by streaming through the existing log infra — pick whichever matches the existing wipe-equivalent pattern (workshop overlays don't have a wipe; closest analog is the existing delete-overlay flow).
|
|
||||||
- The `/script` endpoint enqueues via the same `enqueue_build_overlay(overlay_id)` helper used by workshop overlays' add/remove flows. Coalescing is already implemented there.
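For reference, the coalescing behaviour the route tests rely on amounts to "reuse a pending job for the same overlay". The sketch below models it with an in-memory list; the real `enqueue_build_overlay(overlay_id)` helper works against the job table and its internals are not shown in this plan, so `Job`, its fields, and `enqueue_build_overlay_sketch` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical in-memory stand-in for a row in the job table.
    operation: str
    overlay_id: int
    status: str = "pending"


def enqueue_build_overlay_sketch(jobs: list[Job], overlay_id: int) -> Job:
    """Return the pending build job for overlay_id, creating one only if needed."""
    for job in jobs:
        if (
            job.operation == "build_overlay"
            and job.overlay_id == overlay_id
            and job.status == "pending"
        ):
            return job  # coalesce: a build is already queued for this overlay
    job = Job("build_overlay", overlay_id)
    jobs.append(job)
    return job
```

Under this model two immediate POSTs to `/overlays/{id}/script` leave exactly one pending job, while a POST after the first job has left the pending state queues a fresh build.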
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
|
|
||||||
```
|
|
||||||
python3 -m pytest l4d2web/tests/test_script_overlay_routes.py l4d2web/tests/test_overlay_routes.py -q
|
|
||||||
```
|
|
||||||
|
|
||||||
**Commit:** `feat(l4d2-web): script overlay routes (script update / wipe / build)`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 6: Templates (overlays.html + overlay_detail.html)
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
|
|
||||||
- Modify: `l4d2web/templates/overlays.html` — add `script` to the create-modal type radio (lines ~29–49).
|
|
||||||
- Modify: `l4d2web/templates/overlay_detail.html` — add a `{% if overlay.type == 'script' %}` block with textarea + Save / Rebuild / Wipe buttons + status badge; delete the global-source block (lines ~34–46).
|
|
||||||
- Modify: `l4d2web/tests/test_pages.py` — assert script-section renders for type=`script`, workshop-section renders for type=`workshop`, global-source-section is absent.
|
|
||||||
|
|
||||||
Test plan:
|
|
||||||
|
|
||||||
1. `test_pages.py::test_overlay_create_modal_offers_script_type` — GET `/overlays`; HTML contains `value="script"` radio.
|
|
||||||
2. `test_pages.py::test_overlay_detail_script_section` — create script overlay, GET `/overlays/{id}`; HTML contains `<textarea name="script">`, "Rebuild" button, "Wipe" button, status badge element.
|
|
||||||
3. `test_pages.py::test_overlay_detail_workshop_section_unchanged` — existing workshop detail still has thumbnail grid, add-item form, etc.
|
|
||||||
4. `test_pages.py::test_overlay_detail_no_global_source_block` — page HTML has no element from the deleted global-source block (check for an attribute or string unique to that block).
|
|
||||||
|
|
||||||
Implementation:
|
|
||||||
|
|
||||||
- Detail-page wipe button uses a small confirm-modal pattern (copy from the existing delete-overlay confirm modal).
|
|
||||||
- Status badge: existing CSS classes for ok/warn/error already exist in `static/`; reuse them.
|
|
||||||
- No new JS deps. Plain `<form method="post">` with HTMX `hx-post` for the script update if a streaming UX is desired (match existing patterns).
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
|
|
||||||
```
|
|
||||||
python3 -m pytest l4d2web/tests/test_pages.py -q
|
|
||||||
```
|
|
||||||
|
|
||||||
Manual: start dev server (`flask run`), create a script overlay, paste `echo "hi" > foo`, click Save, watch log stream. Then click Wipe; confirm dir is empty. Then click Rebuild; confirm `foo` reappears.
|
|
||||||
|
|
||||||
**Commit:** `feat(l4d2-web): script overlay UI`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 7: Libexec sandbox helper + sudoers + deploy-artifacts test
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
|
|
||||||
- Create: `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` (bash, mode 0755 after deploy, owned root).
|
|
||||||
- Modify: `deploy/files/etc/sudoers.d/left4me` — append the rule.
|
|
||||||
- Modify: `deploy/tests/test_deploy_artifacts.py` — assert helper file present + sudoers contains the new line.
|
|
||||||
|
|
||||||
Test plan (RED first):
|
|
||||||
|
|
||||||
1. `test_deploy_artifacts.py::test_script_sandbox_helper_present` — file exists, mode bits indicate 0755 (or whatever the test framework allows checking pre-deploy), shebang is `#!/bin/bash`.
|
|
||||||
2. `test_deploy_artifacts.py::test_sudoers_includes_script_sandbox_rule` — sudoers file contains the exact line `left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox`.
|
|
||||||
3. Optional integration test (skip on non-Linux dev): drive the helper as a subprocess with a synthesized fake `/var/lib/left4me/overlays/1/` and a no-op script, assert `bwrap` invocation happens (use a mock `systemd-run` or `LEFT4ME_SCRIPT_SANDBOX_DRY_RUN=1` env that prints the would-be invocation and exits 0). Mirrors the `LEFT4ME_OVERLAY_PRINT_ONLY=1` pattern from the kernel-overlayfs helper test.
|
|
||||||
|
|
||||||
Implementation:
|
|
||||||
|
|
||||||
- Helper script verbatim from the spec §Sandbox.
|
|
||||||
- Sudoers fragment: append (don't replace existing rules). The existing fragment has rules for `left4me-overlay`, `left4me-systemctl`, `left4me-journalctl` — match the same formatting (one rule per line, no trailing whitespace).
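The sudoers assertions can be sketched as a pure function over the fragment's text. The function name `sudoers_problems` is hypothetical, and matching the rule as a whole line (rather than as a substring) is a choice made here so that a rule with extra trailing arguments or whitespace does not pass by accident.

```python
SANDBOX_RULE = (
    "left4me ALL=(root) NOPASSWD: "
    "/usr/local/libexec/left4me/left4me-script-sandbox"
)


def sudoers_problems(fragment_text: str) -> list[str]:
    """Return a list of problems; an empty list means the fragment passes."""
    problems = []
    lines = fragment_text.splitlines()
    if SANDBOX_RULE not in lines:
        problems.append("missing script-sandbox rule")
    for lineno, line in enumerate(lines, 1):
        if line != line.rstrip():
            problems.append(f"trailing whitespace on line {lineno}")
    return problems
```

The deploy-artifacts test would read `deploy/files/etc/sudoers.d/left4me` and assert that `sudoers_problems(...)` returns an empty list.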
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
|
|
||||||
```
|
|
||||||
python3 -m pytest deploy/tests/test_deploy_artifacts.py -q
|
|
||||||
bash -n deploy/files/usr/local/libexec/left4me/left4me-script-sandbox
|
|
||||||
```
|
|
||||||
|
|
||||||
**Commit:** `feat(deploy): left4me-script-sandbox helper + sudoers fragment`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 8: Deploy script — provision l4d2-sandbox + bubblewrap; drop globals timer
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
|
|
||||||
- Modify: `deploy/deploy-test-server.sh` — add `useradd --system ... l4d2-sandbox`, add `apt-get install -y bubblewrap`, ensure helper installation step picks up `left4me-script-sandbox` (likely automatic if it's a glob in `deploy/files/usr/local/libexec/left4me/*`); drop the `mkdir global_overlay_cache` line if present.
|
|
||||||
- Delete: `deploy/files/usr/local/lib/systemd/system/left4me-refresh-global-overlays.timer`
|
|
||||||
- Delete: `deploy/files/usr/local/lib/systemd/system/left4me-refresh-global-overlays.service`
|
|
||||||
- Modify: `deploy/tests/test_deploy_artifacts.py` — assert the two unit files are absent; assert `useradd l4d2-sandbox` and `apt-get install ... bubblewrap` lines are present in the deploy script.
|
|
||||||
|
|
||||||
Test plan:
|
|
||||||
|
|
||||||
1. `test_deploy_artifacts.py::test_globals_refresh_units_removed` — files do not exist under `deploy/files/usr/local/lib/systemd/system/`.
|
|
||||||
2. `test_deploy_artifacts.py::test_deploy_script_provisions_sandbox_user` — grep the deploy script for the useradd line.
|
|
||||||
3. `test_deploy_artifacts.py::test_deploy_script_installs_bubblewrap` — grep for `bubblewrap` in apt invocations.
|
|
||||||
|
|
||||||
Implementation:
|
|
||||||
|
|
||||||
- `useradd` line uses `--system --no-create-home --shell /usr/sbin/nologin`. Idempotency: wrap with `id l4d2-sandbox &>/dev/null || useradd ...`.
|
|
||||||
- `apt-get install`: append `bubblewrap` to whatever package list the script already maintains.
|
|
||||||
- Globals timer/service deletions: `git rm`.
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
|
|
||||||
```
|
|
||||||
python3 -m pytest deploy/tests/ -q
|
|
||||||
shellcheck deploy/deploy-test-server.sh deploy/files/usr/local/libexec/left4me/left4me-script-sandbox
|
|
||||||
```
|
|
||||||
|
|
||||||
**Commit:** `chore(deploy): provision l4d2-sandbox + bubblewrap; drop globals refresh timer`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 9: Full pytest run + drift fixes
|
|
||||||
|
|
||||||
**Files:** as needed across the repo.
|
|
||||||
|
|
||||||
Test plan: run the full test suite for both packages plus the deploy tests (three suites in total); chase down any drift caused by removed model classes, dropped imports, or template changes.
|
|
||||||
|
|
||||||
```
|
|
||||||
python3 -m pytest l4d2web/tests/ -q
|
|
||||||
python3 -m pytest l4d2host/tests/ -q
|
|
||||||
python3 -m pytest deploy/tests/ -q
|
|
||||||
```
|
|
||||||
|
|
||||||
Implementation: fix what breaks. Common drift sources to expect:
|
|
||||||
|
|
||||||
- Tests that imported from deleted modules.
|
|
||||||
- Tests that asserted exact `BUILDERS` keyset (good — they should have been updated in Task 2).
|
|
||||||
- Tests that built fixtures with `type='l4d2center_maps'` or `type='cedapug_maps'` — those tests likely belong to the deleted set or need conversion to `type='script'`.
|
|
||||||
- Template snapshot tests (if any) that captured the deleted global-source block.
|
|
||||||
|
|
||||||
**Verification:** all three suites green.
|
|
||||||
|
|
||||||
**Commit:** `chore(l4d2-web): test suite drift fixes after script-overlays migration` (only if drift fixes needed; skip if Tasks 1–8 left the suite green)
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## End-to-end deployment verification (manual, on test host)
|
|
||||||
|
|
||||||
After all tasks committed:
|
|
||||||
|
|
||||||
1. **Reset deploy:** run `deploy/deploy-test-server.sh` from clean state. Confirm `bubblewrap` installed (`dpkg -l bubblewrap`), `l4d2-sandbox` user exists (`id l4d2-sandbox`), `/usr/local/libexec/left4me/left4me-script-sandbox` is mode 0755 and root-owned, `sudo -ln` as `left4me` shows the new rule.
|
|
||||||
2. **Sandbox smoke:** as `left4me`, write `/tmp/echo.sh` containing `echo $(whoami) > /overlay/sentinel`, run `mkdir -p /var/lib/left4me/overlays/1`, then run `sudo /usr/local/libexec/left4me/left4me-script-sandbox 1 /tmp/echo.sh`. Confirm `/var/lib/left4me/overlays/1/sentinel` contains `l4d2-sandbox` and is owned by `l4d2-sandbox`. Run probe scripts to confirm that `/etc/passwd`, `/var/lib/left4me/l4d2web.db`, and `/home` are not visible inside the sandbox.
|
|
||||||
3. **Resource limits:**
|
|
||||||
- `dd if=/dev/zero of=/overlay/big bs=1M count=25000` → succeeds inside sandbox; `ScriptBuilder._enforce_disk_budget` flags the build failed; `last_build_status='failed'`.
|
|
||||||
- `sleep 7200` → killed at 1 h by `RuntimeMaxSec=3600`.
|
|
||||||
- Memory hog (`python3 -c "x=' '*(5*1024**3)"`) → OOM at 4 GB.
|
|
||||||
4. **App-level happy path:** as a non-admin user, create a script overlay via the UI, paste an old `competitive_rework`-style script, Save → build runs, succeeds, addons appear in `overlays/{id}/left4dead2/`. Stack onto a server blueprint, start the server, verify content mounts via the L4D2 admin console (`map workshop/...`).
|
|
||||||
5. **Wipe:** click Wipe → dir empty (find -delete output in log). Click Rebuild → repopulates. `last_build_status` cycles: `''` → `'ok'`.
|
|
||||||
6. **Scheduler:** start a server using the script overlay; in another browser tab attempt to Rebuild → 409 / scheduler-blocked. Stop server; rebuild succeeds.
|
|
||||||
7. **Audit log:** `journalctl --since "5 min ago" | grep run-` shows transient scopes per build with cgroup memory accounting visible.
|
|
||||||
|
|
||||||
These are not required for any single commit but should pass before declaring the work done.
|
|
||||||
|
|
@ -1,146 +0,0 @@
|
||||||
# L4D2 Script Sandbox v2 Implementation Plan
|
|
||||||
|
|
||||||
> **Approval status:** User-approved 2026-05-08 after smoke-testing the v2 prototype on `ckn@10.0.4.128`.
|
|
||||||
|
|
||||||
**Goal:** Replace the bwrap-based sandbox helper with a systemd-only one per `docs/superpowers/specs/2026-05-08-l4d2-script-sandbox-v2-systemd.md`. Drop the `bubblewrap` apt dep. Tighten `left4me.db` file mode to 0640 root:left4me. Update the deploy-artifact tests to assert the new helper shape.
|
|
||||||
|
|
||||||
**Architecture:** See spec. Helper invokes `systemd-run --pipe --wait` in service-unit mode with full hardening directives. No bwrap. Web-app side (`ScriptBuilder`, `run_sandboxed_script`, routes) is unchanged.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Locked Decisions
|
|
||||||
|
|
||||||
See spec §Locked Decisions for rationale. Implementation summary:
|
|
||||||
|
|
||||||
- Helper file at the same path (`deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`) is rewritten in place.
|
|
||||||
- The sudoers rule is unchanged.
|
|
||||||
- `bubblewrap` dropped from `apt-get install` / `dnf install` lines.
|
|
||||||
- `left4me.db` chmod 0640 added to deploy script as a post-init step.
|
|
||||||
- Sandbox UID, system user, overlay-dir chown logic, and ScriptBuilder API stay the same.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Current Gap
|
|
||||||
|
|
||||||
- `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` invokes `systemd-run --scope ... -- bwrap [namespace flags] /bin/bash /script.sh`.
|
|
||||||
- `deploy/deploy-test-server.sh` line ~84 installs `bubblewrap` via apt/dnf.
|
|
||||||
- `deploy/tests/test_deploy_artifacts.py::test_script_sandbox_helper_invokes_systemd_run_and_bwrap` asserts `bwrap`, `--unshare-pid`, `--uid=l4d2-sandbox`, etc.
|
|
||||||
- `deploy/tests/test_deploy_artifacts.py::test_deploy_script_installs_bubblewrap` asserts `bubblewrap` is in apt/dnf install lines.
|
|
||||||
- `left4me.db` is created at deploy time with the default 0644 permissions; any host user can read it.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 1: Rewrite the sandbox helper to be systemd-only
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
|
|
||||||
- Modify: `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` — replace the `systemd-run --scope … bwrap …` invocation with `systemd-run --pipe --wait …` in service-unit mode (the default when `--scope` is omitted), carrying the hardening directives.
|
|
||||||
|
|
||||||
Test plan:
|
|
||||||
|
|
||||||
1. `bash -n` syntax check (already covered by `test_script_sandbox_helper_passes_shell_syntax_check`).
|
|
||||||
2. `test_deploy_artifacts.py::test_script_sandbox_helper_invokes_systemd_run_and_bwrap` is replaced by a new pin: `test_script_sandbox_helper_invokes_systemd_run_with_hardening`. Asserts:
|
|
||||||
- No `bwrap` reference remains.
|
|
||||||
- `systemd-run` is invoked with `--pipe`, `--wait`, `--collect`, `--unit=` (transient service unit form, no `--scope`).
|
|
||||||
- All hardening directives present: `NoNewPrivileges=yes`, `ProtectSystem=strict`, `ProtectHome=yes`, `PrivateTmp=yes`, `PrivateDevices=yes`, `PrivateIPC=yes`, `ProtectKernelTunables=yes`, `ProtectKernelModules=yes`, `ProtectKernelLogs=yes`, `ProtectControlGroups=yes`, `RestrictNamespaces=yes`, `RestrictSUIDSGID=yes`, `LockPersonality=yes`, `MemoryDenyWriteExecute=yes`, `SystemCallFilter=`, `CapabilityBoundingSet=` (empty), `User=l4d2-sandbox`, `Group=l4d2-sandbox`.
|
|
||||||
- `TemporaryFileSystem=` covers `/etc` and `/var/lib`.
|
|
||||||
- `BindReadOnlyPaths=` includes `/etc/resolv.conf /etc/ssl /etc/ca-certificates /etc/nsswitch.conf /etc/alternatives` and the script bind `${SCRIPT}:/script.sh`.
|
|
||||||
- `BindPaths=` carries the overlay bind.
|
|
||||||
- Cgroup limits unchanged (`MemoryMax=4G`, `MemorySwapMax=0`, `TasksMax=512`, `CPUQuota=200%`, `RuntimeMaxSec=3600`).
|
|
||||||
3. Existing `test_script_sandbox_helper_dry_run_mode` keeps passing — the dry-run guard still short-circuits before `systemd-run`.
|
|
||||||
4. Existing `test_script_sandbox_helper_validates_overlay_id` keeps passing — argument validation is unchanged.
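The pin described in item 2 could be sketched as a plain substring audit of the helper text. The directive and flag lists below are copied from the test plan; the function name `check_helper`, the sample helper text, and the unit name `example` are assumptions for illustration and not the real helper body.

```python
REQUIRED_FLAGS = ("--pipe", "--wait", "--collect", "--unit=")

REQUIRED_DIRECTIVES = (
    "NoNewPrivileges=yes", "ProtectSystem=strict", "ProtectHome=yes",
    "PrivateTmp=yes", "PrivateDevices=yes", "PrivateIPC=yes",
    "ProtectKernelTunables=yes", "ProtectKernelModules=yes",
    "ProtectKernelLogs=yes", "ProtectControlGroups=yes",
    "RestrictNamespaces=yes", "RestrictSUIDSGID=yes", "LockPersonality=yes",
    "MemoryDenyWriteExecute=yes", "SystemCallFilter=", "CapabilityBoundingSet=",
    "User=l4d2-sandbox", "Group=l4d2-sandbox",
    "MemoryMax=4G", "MemorySwapMax=0", "TasksMax=512",
    "CPUQuota=200%", "RuntimeMaxSec=3600",
)


def check_helper(helper_text: str) -> list[str]:
    """Return a list of human-readable problems; empty means the pin holds."""
    problems = []
    if "bwrap" in helper_text:
        problems.append("bwrap still referenced")
    if "--scope" in helper_text:
        problems.append("--scope still used")
    problems += [f"missing flag {f}" for f in REQUIRED_FLAGS if f not in helper_text]
    problems += [f"missing directive {d}" for d in REQUIRED_DIRECTIVES if d not in helper_text]
    return problems


# Hypothetical helper text, generated only so the sketch has something to check.
SAMPLE_HELPER = (
    "systemd-run --pipe --wait --collect --unit=example \\\n"
    + " \\\n".join(f"  -p {d}" for d in REQUIRED_DIRECTIVES)
    + " \\\n  -- /bin/bash /script.sh\n"
)
```

Returning a list of problems rather than a single boolean keeps a failing assertion readable: the pytest output names the directive that went missing.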
|
|
||||||
|
|
||||||
Implementation: helper body verbatim from the spec §Helper.
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
|
|
||||||
```
|
|
||||||
python3 -m pytest deploy/tests/test_deploy_artifacts.py -q
|
|
||||||
bash -n deploy/files/usr/local/libexec/left4me/left4me-script-sandbox
|
|
||||||
```
|
|
||||||
|
|
||||||
**Commit:** `refactor(deploy): rewrite left4me-script-sandbox to systemd-only — drop bwrap`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 2: Drop bubblewrap apt/dnf dep + tighten left4me.db mode
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
|
|
||||||
- Modify: `deploy/deploy-test-server.sh` — remove `bubblewrap` from `apt-get install` / `dnf install` package lists; add a post-init step that ensures `left4me.db` is mode 0640 owned `root:left4me`.
|
|
||||||
- Modify: `deploy/tests/test_deploy_artifacts.py` — replace `test_deploy_script_installs_bubblewrap` with `test_deploy_script_does_not_install_bubblewrap`; add `test_deploy_script_tightens_left4me_db_permissions`.
|
|
||||||
|
|
||||||
Test plan:
|
|
||||||
|
|
||||||
1. `test_deploy_script_does_not_install_bubblewrap` — for each `apt-get install` / `dnf install` line, `bubblewrap` is absent.
|
|
||||||
2. `test_deploy_script_tightens_left4me_db_permissions` — script contains `chmod 0640 /var/lib/left4me/left4me.db` and `chown root:left4me /var/lib/left4me/left4me.db` (in either order).
|
|
||||||
3. `test_deploy_script_shell_syntax` keeps passing (`sh -n`).
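A minimal sketch of the two new assertions, assuming the deploy script is checked as plain text. The function names and the sample script are hypothetical; only the `chown` / `chmod` lines and the database path come from this plan.

```python
DB_PATH = "/var/lib/left4me/left4me.db"

# Hypothetical excerpt standing in for deploy/deploy-test-server.sh.
SAMPLE_DEPLOY_SCRIPT = f"""\
$sudo_cmd apt-get install -y python3 python3-venv
$sudo_cmd chown root:left4me {DB_PATH}
$sudo_cmd chmod 0640 {DB_PATH}
"""


def install_lines(script_text: str) -> list[str]:
    """Lines that invoke a package install via apt-get or dnf."""
    return [
        line
        for line in script_text.splitlines()
        if "apt-get install" in line or "dnf install" in line
    ]


def installs_package(script_text: str, package: str) -> bool:
    """True if the package appears as a bare token on any install line."""
    return any(package in line.split() for line in install_lines(script_text))


def tightens_db_permissions(script_text: str, db_path: str = DB_PATH) -> bool:
    """True if both the chmod and the chown are present, in either order."""
    return (
        f"chmod 0640 {db_path}" in script_text
        and f"chown root:left4me {db_path}" in script_text
    )
```

Matching `bubblewrap` as a whole token avoids false positives from comments or unrelated package names that merely contain the string.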
|
|
||||||
|
|
||||||
Implementation:
|
|
||||||
|
|
||||||
- Remove the bare `bubblewrap` token from the two install lines.
|
|
||||||
- After the `alembic upgrade head` step (which creates the DB if missing), add:
|
|
||||||
```
|
|
||||||
$sudo_cmd chown root:left4me /var/lib/left4me/left4me.db
|
|
||||||
$sudo_cmd chmod 0640 /var/lib/left4me/left4me.db
|
|
||||||
```
|
|
||||||
Idempotent — re-runs are no-ops.
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
|
|
||||||
```
|
|
||||||
python3 -m pytest deploy/tests/test_deploy_artifacts.py -q
|
|
||||||
sh -n deploy/deploy-test-server.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
**Commit:** `chore(deploy): drop bubblewrap apt dep + tighten left4me.db mode to 0640 root:left4me`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 3: Deploy + smoke-test on the test host
|
|
||||||
|
|
||||||
**Files:** none.
|
|
||||||
|
|
||||||
This is an operational verification step, not a code change. Run `deploy/deploy-test-server.sh ckn@10.0.4.128`, then on the host re-run the same smoke battery used to validate the prototype:
|
|
||||||
|
|
||||||
1. **Identity / privileges**: `id` returns `uid=996 gid=985`; `/proc/self/status` shows `NoNewPrivs: 1` and `CapBnd: 0000000000000000`.
|
|
||||||
2. **Filesystem isolation**: `/etc/passwd` absent, `/etc/alternatives/awk` present, `/var/lib/left4me/left4me.db` absent, `/home` inaccessible, `/usr` not writable, `/overlay` writable.
|
|
||||||
3. **Tools + network**: `awk` resolves through `/etc/alternatives`; `curl https://steamcommunity.com/` returns 200.
|
|
||||||
4. **Cgroup limits**: while a 5s-sleep script runs, `cat /sys/fs/cgroup/.../memory.max` returns `4294967296`; `pids.max` `512`; `cpu.max` `200000 100000`.
|
|
||||||
5. **Memory cap**: 5 GB Python alloc raises `MemoryError`.
|
|
||||||
6. **Wipe**: `find /overlay -mindepth 1 -delete` empties the overlay dir.
|
|
||||||
7. **Seccomp / restriction probes**: `unshare -U`, `mount -t tmpfs`, `setarch -X`, `bpf` setsockopt all fail with EPERM/EINVAL.
|
|
||||||
8. **Build via web UI**: log in as admin, create a script overlay with `echo "hi" > foo`, click Save, confirm job succeeds and `foo` appears in `/var/lib/left4me/overlays/{id}/foo`.
|
|
||||||
9. **DB hardening**: `stat -c "%a %U:%G" /var/lib/left4me/left4me.db` returns `640 root:left4me`.
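Checks 1 and 4 compare raw kernel values, so a small parser can make the expected numbers explicit. This is a sketch under stated assumptions: the function names are invented for illustration, and the CPU figure assumes the cgroup v2 default period of 100000 µs, which matches the `200000 100000` value quoted above.

```python
def parse_proc_status(text: str) -> dict[str, str]:
    """Parse the `Key:<tab>value` lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_locked_down(fields: dict[str, str]) -> bool:
    """True if NoNewPrivs is set and the capability bounding set is empty."""
    # CapBnd is a hex bitmask; a missing field is treated as "not empty".
    return fields.get("NoNewPrivs") == "1" and int(fields.get("CapBnd", "1"), 16) == 0


def expected_memory_max(gibibytes: int) -> int:
    """MemoryMax=4G is reported by memory.max in bytes (binary gigabytes)."""
    return gibibytes * 1024**3


def expected_cpu_max(quota_percent: int, period_us: int = 100_000) -> str:
    """CPUQuota=200% shows up in cpu.max as '<quota_us> <period_us>'."""
    return f"{quota_percent * period_us // 100} {period_us}"


# Hypothetical excerpt of /proc/self/status as seen inside the sandbox.
SAMPLE_STATUS = "Name:\tbash\nNoNewPrivs:\t1\nCapBnd:\t0000000000000000\n"
```

On the host, the same parser could be fed the contents of `/proc/self/status` captured by a probe script run through the helper.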
|
|
||||||
|
|
||||||
Mark this task complete only after every check passes on the live host.
|
|
||||||
|
|
||||||
**Commit:** none (operational verification — record results in conversation/PR description).
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 4: Drift sweep + push
|
|
||||||
|
|
||||||
**Files:** as needed across the repo.
|
|
||||||
|
|
||||||
Run the full test suite for all three packages; chase any drift caused by the helper rewrite or deploy-script changes.
|
|
||||||
|
|
||||||
```
|
|
||||||
python3 -m pytest l4d2web/tests/ -q
|
|
||||||
python3 -m pytest l4d2host/tests/ -q
|
|
||||||
python3 -m pytest deploy/tests/ -q
|
|
||||||
```
|
|
||||||
|
|
||||||
Implementation: fix what breaks. Expected: nothing new should break, since the Python-side contract is unchanged. If something does, treat it as a sign of an unintended coupling and address it.
|
|
||||||
|
|
||||||
Push the commits to `origin/master`.
|
|
||||||
|
|
||||||
**Verification:** all three suites green; `git status` clean; commits visible on `git.sublimity.de/cronekorkn/left4me`.
|
|
||||||
|
|
||||||
**Commit:** none unless drift fixes are needed.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Rollback plan
|
|
||||||
|
|
||||||
If Task 3 surfaces a blocker (a hardening directive breaks a real-world script class, seccomp filter is too narrow, BindPaths semantics differ on the host's systemd version), roll back via `git revert` of Tasks 1+2 and redeploy. Git history preserves both the v1 and v2 helper. The Python side never changed, so reverting only the deploy artifacts is sufficient — no DB migration to undo, no template change to roll back.
|
|
||||||
|
|
@ -1,89 +0,0 @@
|
||||||
# L4D2 Script Sandbox v3 Implementation Plan
|
|
||||||
|
|
||||||
> **Approval status:** User-approved 2026-05-08; implemented and pushed in `7e66936`. This plan is recorded retrospectively for symmetry with the v1 / v2 plans.
|
|
||||||
|
|
||||||
**Goal:** Restrict the sandbox to public-internet egress per `docs/superpowers/specs/2026-05-08-l4d2-script-sandbox-v3-egress-filter.md`. Bind a static public-resolver `resolv.conf` into the sandbox.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Locked Decisions (see spec for rationale)
|
|
||||||
|
|
||||||
- `IPAddressDeny=` only; no `IPAddressAllow=any`.
|
|
||||||
- Explicit CIDRs (no `localhost` / `link-local` shorthand keywords — `systemd-run -p` parser rejects them).
|
|
||||||
- Static `nameserver 1.1.1.1` + `nameserver 8.8.8.8` in a sandbox-only resolv.conf.
|
|
||||||
- `AF_UNIX` left enabled.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Current Gap (at start of this iteration)
|
|
||||||
|
|
||||||
- `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` (v2) shares the host network namespace with no egress filter.
|
|
||||||
- The helper bind-mounts `/etc/resolv.conf` from the host into the sandbox (which points at private-IP DNS).
|
|
||||||
- `deploy/deploy-test-server.sh` does not install a sandbox-only resolv.conf.
|
|
||||||
- No deploy-artifact tests for `IPAddressDeny=` or for the resolv.conf shape.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 1: Add `IPAddressDeny=`, swap resolv.conf bind, ship the static file
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
|
|
||||||
- Create: `deploy/files/etc/left4me/sandbox-resolv.conf` — two `nameserver` lines + a header comment.
|
|
||||||
- Modify: `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` — add `-p IPAddressDeny="..."` directive (11 explicit CIDRs); replace the `/etc/resolv.conf:/etc/resolv.conf` token in `BindReadOnlyPaths=` with `/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf`.
|
|
||||||
- Modify: `deploy/deploy-test-server.sh` — add an `install -m 0644 -o root -g root .../sandbox-resolv.conf /etc/left4me/sandbox-resolv.conf` line near the existing `host.env` install.
|
|
||||||
- Modify: `deploy/tests/test_deploy_artifacts.py` — extend `test_script_sandbox_helper_invokes_systemd_run_with_hardening` to assert each CIDR is present and that `IPAddressAllow=any` is **absent** (regression guard); update the BindReadOnlyPaths assertion to expect the sandbox-resolv.conf bind; add `test_sandbox_resolv_conf_exists` and `test_deploy_script_installs_sandbox_resolv_conf`.
|
|
||||||
|
|
||||||
Test plan (RED-first not used here; the work was driven by smoke-test feedback against a live host):
|
|
||||||
|
|
||||||
1. `test_script_sandbox_helper_invokes_systemd_run_with_hardening` — `IPAddressDeny=` present with all 11 CIDRs; no `IPAddressAllow=any`; resolv.conf bind path is `/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf`.
|
|
||||||
2. `test_sandbox_resolv_conf_exists` — file present, ≥2 nameservers, all in non-private space.
|
|
||||||
3. `test_deploy_script_installs_sandbox_resolv_conf` — deploy script references both source path under `deploy/files/etc/left4me/sandbox-resolv.conf` and target path `/etc/left4me/sandbox-resolv.conf`.
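The "≥2 nameservers, all in non-private space" check in item 2 can be expressed with the standard `ipaddress` module. The sketch below is an illustration rather than the committed test; the sample file content, including its header comment, and the function names are assumptions.

```python
import ipaddress

# Hypothetical content for deploy/files/etc/left4me/sandbox-resolv.conf.
SAMPLE_RESOLV_CONF = """\
# Static public resolvers for the script sandbox.
nameserver 1.1.1.1
nameserver 8.8.8.8
"""


def parse_nameservers(text: str) -> list:
    """Extract the addresses from `nameserver <ip>` lines, skipping comments."""
    servers = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(ipaddress.ip_address(parts[1]))
    return servers


def resolv_conf_is_public(text: str, minimum: int = 2) -> bool:
    """True if there are at least `minimum` resolvers and all are globally routable."""
    servers = parse_nameservers(text)
    return len(servers) >= minimum and all(s.is_global for s in servers)
```

Using `is_global` rather than `not is_private` also rejects loopback and link-local resolvers, which is the stricter reading of "non-private space".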
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
|
|
||||||
```
|
|
||||||
sh -n deploy/deploy-test-server.sh
|
|
||||||
bash -n deploy/files/usr/local/libexec/left4me/left4me-script-sandbox
|
|
||||||
python3 -m pytest deploy/tests/ -q
|
|
||||||
```
|
|
||||||
|
|
||||||
**Commit:** `feat(deploy): restrict script-sandbox egress to public internet only`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 2: Deploy + smoke-test on `ckn@10.0.4.128`
|
|
||||||
|
|
||||||
**Files:** none.
|
|
||||||
|
|
||||||
Run `deploy/deploy-test-server.sh ckn@10.0.4.128`. Then on the host, invoke the helper with a probe script that opens TCP connections to:
|
|
||||||
|
|
||||||
- `1.1.1.1:443` — must connect (public)
|
|
||||||
- `127.0.0.1:8000` — must block (web app on loopback)
|
|
||||||
- `127.0.0.1:22` — must block (sshd on loopback)
|
|
||||||
- `10.0.4.128:22` — must block (host's external SSH on private LAN)
|
|
||||||
- `10.0.0.1:53` — must block (LAN DNS resolver)
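A probe for the targets listed above could be sketched as follows. The target list mirrors this plan; the function names and the 3-second timeout are assumptions. Whether a filtered connection fails quickly or times out depends on how the kernel reports the drop, so the sketch treats any `OSError` (including a timeout) as "blocked".

```python
import socket

# (host, port, should_connect) as listed in this task.
TARGETS = [
    ("1.1.1.1", 443, True),
    ("127.0.0.1", 8000, False),
    ("127.0.0.1", 22, False),
    ("10.0.4.128", 22, False),
    ("10.0.0.1", 53, False),
]


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Attempt a TCP connection; any OSError (refused, denied, timeout) counts as blocked."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_probe(targets=TARGETS, timeout: float = 3.0) -> list[str]:
    """Return one message per target whose reachability differs from the expectation."""
    failures = []
    for host, port, should_connect in targets:
        reachable = can_connect(host, port, timeout)
        if reachable != should_connect:
            state = "connected" if reachable else "blocked"
            failures.append(f"{host}:{port} unexpectedly {state}")
    return failures
```

Run inside the sandbox, an empty list from `run_probe()` would indicate that the deny rules behave as intended; the sandbox script itself would need a Python interpreter available, which this plan does not guarantee.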
|
|
||||||
|
|
||||||
Plus `curl -m 5 https://steamcommunity.com/` end-to-end (DNS + HTTPS) → 200.
|
|
||||||
|
|
||||||
Inside the sandbox, `cat /etc/resolv.conf` must show the two public resolvers.
|
|
||||||
|
|
||||||
If any of the localhost / private targets connects, the deny is being silently overridden — see spec §Locked Decisions point 1.
|
|
||||||
|
|
||||||
**Commit:** none — operational verification.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Lessons surfaced during execution
|
|
||||||
|
|
||||||
These belong in the spec but are repeated here as the "things the next person should not have to rediscover":
|
|
||||||
|
|
||||||
- **`IPAddressAllow=any` silently overrides every `IPAddressDeny=` rule** on this systemd 257 / kernel 6.12 combo, despite documentation stating "more specific rule wins". The negative test (`IPAddressAllow=any not in text`) locks this in.
|
|
||||||
- **systemd-run's `-p` parser rejects the `localhost` / `link-local` / `multicast` shorthand keywords** even though they parse fine in unit files. Use explicit CIDRs.
|
|
||||||
- **`/var/lib/left4me/.../left4me.db` is mode 0644 by default** — writing this file from the web app left it world-readable. Tightening to 0640 root:left4me happens in v2's deploy-script change; v3 does not re-touch it.
|
|
||||||
- **bpftool ships separately on Debian.** It's not needed for runtime, but `apt-get install bpftool` is useful for inspecting `sd_fw_egress` attach state when debugging filter behaviour.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Rollback
|
|
||||||
|
|
||||||
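`git revert 7e66936` and redeploy. The change is purely in deploy artifacts; no app code, no DB migration. Reverting restores the v2 behaviour: the sandbox shares the host network namespace with no egress filter, so loopback and private-LAN addresses become reachable again.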
|
|
||||||
|
|
@ -1,161 +0,0 @@
|
||||||
# Overlay File Tree Implementation Plan
|
|
||||||
|
|
||||||
> **Approval status:** User-approved 2026-05-08; implemented + deployed in the same session. This plan is committed retrospectively to record the work.
|
|
||||||
|
|
||||||
**Goal:** Build the overlay-detail "Files" section per `docs/superpowers/specs/2026-05-08-overlay-file-tree-design.md` — a server-rendered collapsible tree of `${LEFT4ME_ROOT}/overlays/{overlay.id}/` with HTMX lazy expansion and click-to-download for individual files. Read-only; same access rule as the rest of the overlay detail page.
|
|
||||||
|
|
||||||
**Architecture:** A new `files_bp` blueprint exposes two GETs: `/overlays/<id>/files?path=<rel>` returns the listing as an HTML fragment (used both for first paint at the root level via `page_routes.overlay_detail` context, and for HTMX swaps when a folder expands), and `/overlays/<id>/files/download?path=<rel>` streams a single file. Pure helpers live in `l4d2web/services/overlay_files.py`: `safe_resolve_for_listing` (refuses symlink escape from overlay root), `safe_resolve_for_download` (allows symlink targets anywhere under `LEFT4ME_ROOT` — workshop addons stream from the shared cache; absolute symlinks to `/etc/passwd` are still blocked), and `list_directory` (one-level scan, dirs-first sort, 500-entry cap, symlink + broken-symlink markers, resolved size for files). Two Jinja partials (`_overlay_file_tree.html`, `_overlay_file_node.html`) plus a 12-line event-delegated `static/js/file-tree.js` for collapse/re-expand handle the UI; styles append to `static/css/components.css` against existing tokens.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Locked Decisions
|
|
||||||
|
|
||||||
See the design doc for rationale. Implementation-relevant summary:
|
|
||||||
|
|
||||||
- New blueprint `files_bp` registered in `l4d2web/app.py` next to `overlay_bp`.
|
|
||||||
- Path resolution chains through `l4d2host.paths.overlay_path()` (already validates the overlay ref + resolves under `LEFT4ME_ROOT/overlays/`) and `l4d2web.services.security.validate_overlay_ref` (rejects empty/`.`/`..`/absolute/whitespace/backslash for the sub-path component).
|
|
||||||
- Listing rule: target must be a descendant of `overlay_root` after `Path.resolve()`. Download rule: real path must be a descendant of `LEFT4ME_ROOT` after `os.path.realpath()`.
|
|
||||||
- Tree shape: single recursive partial. `_overlay_file_tree.html` renders `<ul>`; `_overlay_file_node.html` renders one folder or file `<li>`. Folder buttons carry `data-files-url="/overlays/{id}/files?path=…"`. `static/js/file-tree.js` handles every click — toggles `aria-expanded` + `hidden`, fetches once on first expand, dedupes rapid clicks via `dataset.loaded`.
|
|
||||||
- `DEFAULT_MAX_ENTRIES = 500` in the helper module; re-resolved per call so tests can monkeypatch.
|
|
||||||
- No changes to `l4d2host`, builders, or workshop/script edit flows.
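The two containment rules above differ only in which boundary the resolved path must stay inside. A minimal sketch, assuming the roots are passed in explicitly and leaving out the `overlay_path()` / `validate_overlay_ref` steps that the real helpers chain through; the function names here are illustrative, not the module's API.

```python
import os
from pathlib import Path


def resolve_for_listing(overlay_root: Path, sub_path: str) -> Path:
    """Listing rule: the resolved target must be the overlay root or a descendant."""
    root = overlay_root.resolve()
    if not sub_path:
        return root
    target = (root / sub_path).resolve(strict=False)
    if target != root and root not in target.parents:
        raise ValueError("path escapes overlay root")
    return target


def resolve_for_download(overlay_root: Path, sub_path: str, left4me_root: Path) -> Path:
    """Download rule: the real path may leave the overlay but not LEFT4ME_ROOT."""
    if not sub_path:
        raise ValueError("empty path")
    real = Path(os.path.realpath(overlay_root / sub_path))
    if left4me_root.resolve() not in real.parents:
        raise ValueError("path escapes LEFT4ME_ROOT")
    return real
```

The asymmetry is deliberate in the plan: a symlink into the shared workshop cache is refused for listing but allowed for download, while a symlink to anything outside `LEFT4ME_ROOT` is refused in both cases.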
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 1: Pure helpers — path safety + directory listing
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
|
|
||||||
- Create: `l4d2web/services/overlay_files.py` — `safe_resolve_for_listing`, `safe_resolve_for_download`, `list_directory`, `_format_size`, `DEFAULT_MAX_ENTRIES`.
|
|
||||||
- Create: `l4d2web/tests/test_overlay_files.py` — 20 tests (path safety, listing semantics, symlink + broken-symlink handling, sort order, truncation cap, human-size formatting).
|
|
||||||
|
|
||||||
Test plan (RED first):
|
|
||||||
|
|
||||||
1. Listing returns overlay root for empty sub-path; joins under root for nested sub-path; rejects `..`, absolute path, empty component (`foo//bar`); rejects symlink escaping the overlay root even when target sits in `workshop_cache/`.
|
|
||||||
2. Download rejects empty path; returns real path for a regular file; follows a symlink into `workshop_cache/`; rejects a symlink to a path outside `LEFT4ME_ROOT`; rejects `..` and absolute paths.
|
|
||||||
3. `list_directory`: empty dir → empty list, truncated 0; dirs-first then files, both case-insensitive alphabetical; `kind ∈ {"dir", "file"}`; `rel` is forward-slash relative to overlay root; symlinks marked with `is_symlink=True` and resolved-target size; broken symlinks marked `broken=True` with `size=None`; truncation at supplied cap returns first N + `truncated_count`; `size_human` formats `5 B` and `3.0 MB` correctly.
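The listing semantics in item 3 can be sketched with `os.scandir`. This is an illustration under assumptions, not the shipped helper: entries are plain dicts, `size_human` is omitted, and the cap is a simple argument instead of the re-resolved `DEFAULT_MAX_ENTRIES`.

```python
import os
from pathlib import Path


def list_directory(target: Path, overlay_root: Path, max_entries: int = 500):
    """One-level scan: dirs first, case-insensitive names, capped at max_entries.

    Returns (entries, truncated_count).
    """
    entries = []
    with os.scandir(target) as it:
        for entry in it:
            is_symlink = entry.is_symlink()
            broken = False
            size = None
            try:
                is_dir = entry.is_dir(follow_symlinks=True)
                if not is_dir:
                    # Follows symlinks, so this is the resolved target's size;
                    # it raises OSError when the symlink target is missing.
                    size = entry.stat(follow_symlinks=True).st_size
            except OSError:
                is_dir, broken, size = False, True, None
            entries.append({
                "name": entry.name,
                "kind": "dir" if is_dir else "file",
                "rel": "/".join(Path(entry.path).relative_to(overlay_root).parts),
                "is_symlink": is_symlink,
                "broken": broken,
                "size": size,
            })
    entries.sort(key=lambda e: (0 if e["kind"] == "dir" else 1, e["name"].casefold()))
    truncated_count = max(0, len(entries) - max_entries)
    return entries[:max_entries], truncated_count
```

Sorting before truncating means the cap always keeps the first N entries in display order, so the "+ N more" footer refers to the tail of the sorted list.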
|
|
||||||
|
|
||||||
**Implementation:**
|
|
||||||
|
|
||||||
- `safe_resolve_for_listing` calls `l4d2host.paths.overlay_path(overlay_path_value).resolve()` for the overlay root, short-circuits on empty `sub_path`, validates the sub-path via `validate_overlay_ref`, then `(overlay_root / sub_path).resolve(strict=False)` and asserts the result is the overlay root or a descendant.
|
|
||||||
- `safe_resolve_for_download` rejects empty `sub_path`, validates, builds `overlay_root / sub_path`, applies `os.path.realpath()`, asserts the result is under `get_left4me_root().resolve()`.
|
|
||||||
- `list_directory(target, overlay_root, *, max_entries=None)` uses `os.scandir` (free `stat` cache, `follow_symlinks` toggle). Per entry: `is_symlink = entry.is_symlink()`; `is_dir = entry.is_dir(follow_symlinks=True)` inside a try (OSError → broken=True, kind=file, size=None); regular files use `entry.stat(follow_symlinks=True).st_size`. `rel` is `"/".join(Path(entry.path).relative_to(overlay_root).parts)`. Sort by `(0 if dir else 1, name.casefold())`. Truncate to `max_entries or DEFAULT_MAX_ENTRIES`.
|
|
||||||
- `_format_size`: bytes (`N B`, no decimal) up to 1024, then KB/MB/GB/TB at one decimal place.
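The size-formatting rule in the last bullet is small enough to sketch in full. The version below is one possible reading of that rule (binary multiples of 1024, values beyond the TB range still reported in TB); the public name `format_size` is used here only to keep the example standalone.

```python
def format_size(num_bytes: int) -> str:
    """Format a byte count as `N B` below 1024, else KB/MB/GB/TB with one decimal."""
    if num_bytes < 1024:
        return f"{num_bytes} B"
    size = float(num_bytes)
    for unit in ("KB", "MB", "GB", "TB"):
        size /= 1024
        # Stop at the first unit that keeps the value under 1024, or at TB.
        if size < 1024 or unit == "TB":
            return f"{size:.1f} {unit}"
    return f"{size:.1f} TB"  # not reached; kept so every path returns a str
```

The two values named in the test plan fall out directly: 5 bytes stays in the integer branch, and 3 × 1024² bytes divides down to exactly 3.0 in the MB step.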
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
|
|
||||||
```
|
|
||||||
pytest l4d2web/tests/test_overlay_files.py -q
|
|
||||||
```
|
|
||||||
|
|
||||||
**Commit:** part of Task 4's bundled `feat` commit.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 2: HTTP routes — files_bp blueprint
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
|
|
||||||
- Create: `l4d2web/routes/files_routes.py` — `files_bp` with `GET /overlays/<id>/files` (fragment) and `GET /overlays/<id>/files/download` (stream).
|
|
||||||
- Modify: `l4d2web/app.py` — `from l4d2web.routes.files_routes import bp as files_bp` and `app.register_blueprint(files_bp)` next to `overlay_bp`.
|
|
||||||
- Create: `l4d2web/tests/test_overlay_files_routes.py` — 16 HTTP-level tests at this stage (3 more added in Task 4).
|
|
||||||
|
|
||||||
Test plan (RED first):
|
|
||||||
|
|
||||||
- Fragment: 200 + entries for root listing; 200 + entries for sub-directory; 400 on `..`, absolute path, empty component; 404 on unknown overlay; 404 on missing sub-dir; 403 on foreign user's overlay; 200 for admin viewing foreign overlay; truncation cap exposes "+ N more" footer (monkeypatch `DEFAULT_MAX_ENTRIES`); broken symlink rendered with `broken` badge and no `<a>` link.
|
|
||||||
- Download: 200 + `Content-Disposition: attachment` + exact byte match for regular file; 200 + cache content for workshop-cache symlink; 400 for symlink resolving outside `LEFT4ME_ROOT`; 400 for directory target; 404 for missing file; 403 for foreign user's overlay.
|
|
||||||
|
|
||||||
**Implementation:**
|
|
||||||
|
|
||||||
- Decorator stack: `@files_bp.get(...)` + `@require_login`. Auth gate inside the handler mirrors `page_routes.overlay_detail:194` (`g.user.admin or overlay.user_id is None or overlay.user_id == g.user.id`).
|
|
||||||
- Shared `_load_overlay_for_user(overlay_id, user)` does the lookup, the auth gate, and `db.expunge(overlay)` so the route can read scalar attributes after the session closes.
|
|
||||||
- `ValueError` from either resolver → `Response("invalid path", status=400)`. `target.is_dir()` failure on the listing route → 404. `real.exists()` / `real.is_dir()` failure on the download route → 404 / 400.
|
|
||||||
- `send_file(str(real), as_attachment=True, download_name=os.path.basename(real))`.
|
|
||||||
- The fragment renders `_overlay_file_tree.html` only — no `base.html` shell — so HTMX swaps inject just the `<ul>` content.
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
|
|
||||||
```
|
|
||||||
pytest l4d2web/tests/test_overlay_files_routes.py -q
|
|
||||||
```
|
|
||||||
|
|
||||||
**Commit:** part of Task 4's bundled `feat` commit.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 3: Templates + page-routes integration
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
|
|
||||||
- Create: `l4d2web/templates/_overlay_file_tree.html` — `<ul class="file-tree" role="group">` + per-entry `_overlay_file_node.html` include + optional truncated-footer `<li>`.
|
|
||||||
- Create: `l4d2web/templates/_overlay_file_node.html` — folder row (button + HTMX attrs + empty `<div class="file-tree-children" hidden>`) or file row (`<a>` for regular/symlink files; `<span>` for broken symlinks; `link` / `broken link` badges; `size_human`).
|
|
||||||
- Modify: `l4d2web/templates/overlay_detail.html` — add `<section class="panel"><h2>Files</h2>…</section>` between the type-specific sections and the existing "Used by" section. Renders empty-state `<p class="muted">No files yet — build this overlay to populate it.</p>` when `file_tree_root_entries is none`, else includes the partial.
|
|
||||||
- Modify: `l4d2web/routes/page_routes.py` — import the helpers, add `_root_file_tree(overlay)` (returns `(entries, truncated_count)` or `(None, 0)` on `ValueError` / missing dir / legacy absolute `overlay.path`), pass `file_tree_root_entries` + `file_tree_truncated` + `file_tree_truncated_count` into `render_template("overlay_detail.html", …)`.
|
|
||||||
|
|
||||||
Test plan (RED first, added to `test_overlay_files_routes.py`):
|
|
||||||
|
|
||||||
- `test_overlay_detail_renders_files_section_with_tree` — page contains "Files" header + entry names.
|
|
||||||
- `test_overlay_detail_shows_empty_state_when_overlay_dir_missing` — wipe directory, page shows "No files yet".
|
|
||||||
- `test_overlay_detail_files_section_present_for_workshop_overlays` — workshop type also gets the section.
|
|
||||||
|
|
||||||
**Implementation:**
|
|
||||||
|
|
||||||
- Section placement matters: `<section><h2>Files</h2>…</section>` is inserted before the existing "Used by" `<section>`.
|
|
||||||
- The partial uses `{% set entries = file_tree_root_entries %}` etc. so the same partial works whether called from the page (with full context) or from the HTMX route (rendering directly with named kwargs).
|
|
||||||
- `_root_file_tree` swallows `ValueError` and missing-dir cases into `(None, 0)`, and the template's `{% if file_tree_root_entries is none %}` renders the empty state.
|
|
||||||
- Use `overlay.path` (not `str(overlay.id)`) so legacy/seeded rows whose path differs still work correctly when resolvable.
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
|
|
||||||
```
|
|
||||||
pytest l4d2web/tests/test_overlay_files_routes.py -q -k overlay_detail
|
|
||||||
pytest l4d2web/tests/ -q # no regressions across the full suite
|
|
||||||
```
|
|
||||||
|
|
||||||
**Commit:** part of Task 4's bundled `feat` commit.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 4: CSS + JS + base.html script wiring
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
|
|
||||||
- Create: `l4d2web/static/js/file-tree.js` — event-delegated `click` handler that toggles `aria-expanded` on `.file-tree-toggle` and `hidden` on the next `.file-tree-children` sibling, and on first expand fires `fetch(button.dataset.filesUrl)` and assigns the response to that sibling's `innerHTML`. A `dataset.loaded` flag dedupes rapid clicks; it is cleared on error to allow a retry.
|
|
||||||
- Modify: `l4d2web/templates/base.html` — `<script src="{{ url_for('static', filename='js/file-tree.js') }}"></script>` next to the existing `csrf.js` / `sse.js` / `modal.js` lines.
|
|
||||||
- Modify: `l4d2web/static/css/components.css` — append ~50 lines: `.file-tree`, `.file-tree-row`, `.file-tree-toggle` (transparent button, inherits color), `.file-tree-toggle .chevron` rotation transform on `aria-expanded="true"`, `.file-tree-children[hidden]`, `.file-tree-badge` + `.file-tree-badge-warn`. All against existing tokens (`--space-xs`, `--space-l`, `--color-surface-muted`, `--color-muted`, `--color-danger`, `--radius-s`).
|
|
||||||
|
|
||||||
**Implementation:**
|
|
||||||
|
|
||||||
- The JS handler fires on every click. First-expand path: read `button.dataset.filesUrl`, set `dataset.loaded="1"` optimistically, `fetch(url, {credentials: "same-origin"})`, replace `.file-tree-children` innerHTML with the response. Subsequent clicks just toggle `aria-expanded` + `hidden` — no re-fetch since `dataset.loaded` is set. On fetch error: `delete dataset.loaded` so a future click retries.
|
|
||||||
- The CSS chevron is a Unicode `›` inside a `<span class="chevron">`; rotated 90° on expanded via `transform: rotate(90deg)` with a 120ms transition.
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
|
|
||||||
```
|
|
||||||
pytest l4d2web/tests/ -q # 293 passed, 1 skipped
|
|
||||||
```
|
|
||||||
|
|
||||||
Manual smoke (post-deploy on `ckn@10.0.4.128`):
|
|
||||||
|
|
||||||
- Navigate to an overlay detail page with a populated runtime directory.
|
|
||||||
- Confirm the "Files" section renders the root level.
|
|
||||||
- Click a folder: a single `fetch` request fires, children appear, chevron rotates.
|
|
||||||
- Click again: children hide; no second request in DevTools network tab.
|
|
||||||
- Click a file: browser downloads it with the correct filename.
|
|
||||||
- Visit another user's overlay as a non-admin: 403.
|
|
||||||
|
|
||||||
**Commit:** `feat(l4d2-web): overlay detail Files section with HTMX file tree + downloads` — covers all four tasks (services helper + routes + templates + CSS/JS), since the feature is small and the tasks share a single set of integration tests.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## End-to-end verification
|
|
||||||
|
|
||||||
After all tasks committed:
|
|
||||||
|
|
||||||
```
|
|
||||||
pytest l4d2web/tests/ -q # 293 passed, 1 skipped
|
|
||||||
deploy/deploy-test-server.sh ckn@10.0.4.128
|
|
||||||
ssh ckn@10.0.4.128 'systemctl status left4me-web --no-pager | head -10'
|
|
||||||
curl -s http://10.0.4.128:8000/health # {"status":"ok"}
|
|
||||||
```
|
|
||||||
|
|
||||||
Then exercise the manual smoke checklist from Task 4 against the deployed instance.
|
|
||||||
|
|
@ -1,140 +0,0 @@
|
||||||
# Per-overlay `server.cfg` aliases — opt-in via blueprint checkbox
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
L4D2 overlays stack via kernel overlayfs. When two overlays both ship `left4dead2/cfg/server.cfg`, the topmost wins; lower-layer copies become unreachable. On top of that, the blueprint's own `server.cfg` is copied into `merged/.../cfg/server.cfg` at instance start (`l4d2host/instances.py:112-115`), so the merged view's `server.cfg` is always the blueprint's.
|
|
||||||
|
|
||||||
We want a per-blueprint opt-in mechanism: for each linked overlay, the blueprint owner can check a box to expose that overlay's `server.cfg` as a reloadable alias under a known name. The alias is identified by overlay id (`server_overlay_<id>.cfg`), so it's stable across overlay renames and namespaced.
|
|
||||||
|
|
||||||
Trade-off accepted: only checked overlays are addressable in the in-game console. That's intentional — explicit opt-in beats automatic exposure of every overlay's config.
|
|
||||||
|
|
||||||
Earlier rounds of this plan considered (and rejected):
|
|
||||||
- Doing it in each script overlay: too easy to forget, doesn't scale.
|
|
||||||
- A name-as-slug constraint with auto-aliasing for every overlay: more invasive (regex on names, blueprint-wide collision checks) and exposes everything by default.
|
|
||||||
|
|
||||||
## Approach
|
|
||||||
|
|
||||||
- New boolean column `expose_server_cfg` on `BlueprintOverlay` (per-blueprint, per-overlay state).
|
|
||||||
- Blueprint detail page: each linked overlay row gets a checkbox labeled with its alias (`exec server_overlay_<id>`).
|
|
||||||
- Spec yaml carries an optional `alias` per overlay; web app sets it to `overlay_<id>` when the box is checked, otherwise omits it.
|
|
||||||
- Host copies `<lowerdir>/left4dead2/cfg/server.cfg` → `merged/left4dead2/cfg/server_<alias>.cfg` at instance start, only for entries with `alias` set and an existing source. Pre-sweep removes stale aliases from prior starts.
|
|
||||||
- **Auto-inject `exec` lines into the blueprint's final `server.cfg`**: for each opted-in overlay, prepend `exec server_overlay_<id>` to the config list, in `BlueprintOverlay.position` ascending order (lowest overlay first, highest last), with the user's custom config lines appended after. Source-style cfg semantics: later lines override earlier ones, so this gives "lowest overlay's settings → higher overlay's settings → blueprint customizations" in the right precedence.
|
|
||||||
|
|
||||||
No constraint on `Overlay.name`. No alias / slug column on `Overlay`. The previously-added manual `cp` in `competitive_rework.sh` gets reverted (the framework will do it when checked).
|
|
||||||
|
|
||||||
## Changes
|
|
||||||
|
|
||||||
### 1. Schema
|
|
||||||
|
|
||||||
`l4d2web/models.py` — `BlueprintOverlay` gets:
|
|
||||||
```python
|
|
||||||
expose_server_cfg: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False, server_default=text("0"))
|
|
||||||
```
|
|
||||||
|
|
||||||
`l4d2web/alembic/versions/0007_blueprint_overlay_expose_server_cfg.py` — new Alembic migration:
|
|
||||||
- `op.add_column("blueprint_overlays", sa.Column("expose_server_cfg", sa.Boolean(), nullable=False, server_default=sa.text("0")))`
|
|
||||||
- Downgrade drops the column.
|
|
||||||
|
|
||||||
### 2. Spec contract (host ↔ web)
|
|
||||||
|
|
||||||
`l4d2host/spec.py`: replace `overlays: list[str]` with typed refs.
|
|
||||||
```python
from dataclasses import dataclass, field


@dataclass(slots=True)
class OverlayRef:
    path: str
    alias: str | None = None  # if set, copy server.cfg to server_<alias>.cfg in merged

@dataclass(slots=True)
class InstanceSpec:
    port: int
    overlays: list[OverlayRef] = field(default_factory=list)
    arguments: list[str] = field(default_factory=list)
    config: list[str] = field(default_factory=list)
```
|
|
||||||
`load_spec` accepts both shapes per overlay entry: a bare string is treated as `OverlayRef(path=string)` (back-compat for hand-written specs and existing tests); a dict carries `path` and optional `alias`.
|
|
||||||
|
|
||||||
`l4d2web/services/l4d2_facade.py`:
|
|
||||||
- `load_server_blueprint_bundle`: change select to `select(Overlay.id, Overlay.path, BlueprintOverlay.expose_server_cfg)`, ordered by `BlueprintOverlay.position` ascending (already the case). Returns the raw list of (id, path, expose) tuples to the caller.
|
|
||||||
- `build_server_spec_payload`:
|
|
||||||
- Emit overlays as dicts: `{"path": p}` if not exposed, `{"path": p, "alias": f"overlay_{i}"}` if exposed.
|
|
||||||
- Build `exec_lines = [f"exec server_overlay_{i}" for i, _, expose in rows if expose]` — same ordering as overlays (lowest first).
|
|
||||||
- Set `config = exec_lines + json.loads(blueprint.config)`. Net effect: `exec` lines appear at the top of the written `instance_dir/server.cfg`, blueprint custom lines follow.
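The payload rules above can be sketched as a small pure function. This is an illustration under stated assumptions: `build_overlay_payload` is a hypothetical name, the rows are assumed to arrive already sorted by `BlueprintOverlay.position`, and the real `build_server_spec_payload` carries additional fields (port, arguments) that are omitted here.

```python
import json


def build_overlay_payload(rows, blueprint_config_json):
    """rows: (overlay_id, path, expose) tuples in position order (assumed)."""
    overlays = []
    exec_lines = []
    for overlay_id, path, expose in rows:
        entry = {"path": path}
        if expose:
            # Only opted-in overlays get an alias and an exec line.
            entry["alias"] = f"overlay_{overlay_id}"
            exec_lines.append(f"exec server_overlay_{overlay_id}")
        overlays.append(entry)
    # exec lines first, blueprint custom lines last, so custom lines win.
    config = exec_lines + json.loads(blueprint_config_json)
    return {"overlays": overlays, "config": config}


payload = build_overlay_payload(
    [(6, "6", True), (7, "7", False), (8, "8", True)],
    '["sv_cheats 0"]',
)
```

With these inputs, overlay 7 appears in `overlays` without an alias and contributes no `exec` line, while the blueprint's own `sv_cheats 0` ends up last in `config`.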
|
|
||||||
|
|
||||||
### 3. Lowerdir construction (host)
|
|
||||||
|
|
||||||
`l4d2host/instances.py:44`:
|
|
||||||
```python
|
|
||||||
lowerdirs = [str(overlay_path(o.path, root=root)) for o in spec.overlays]
|
|
||||||
```
|
|
||||||
|
|
||||||
### 4. Per-overlay copy in `start_instance` (host)
|
|
||||||
|
|
||||||
`l4d2host/instances.py`, after the existing main `server.cfg` copy. New block:
|
|
||||||
```python
emit_step("copying per-overlay server.cfg aliases...", on_stdout, passthrough)
cfg_dir = runtime_dir / "merged" / "left4dead2" / "cfg"
for stale in cfg_dir.glob("server_*.cfg"):
    stale.unlink()
for o in spec.overlays:
    if not o.alias:
        continue
    src = root / "overlays" / o.path / "left4dead2" / "cfg" / "server.cfg"
    if not src.exists():
        continue
    shutil.copy2(src, cfg_dir / f"server_{o.alias}.cfg")
```
|
|
||||||
- Sweep first: prevents orphans when a checkbox is unticked or an overlay is removed from the blueprint.
|
|
||||||
- Skip overlays with no `alias` (not opted in) and overlays whose lower dir has no `server.cfg` (workshop overlays etc.).
|
|
||||||
- Writes go to the upper layer of the overlayfs mount; lower dirs untouched.
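The sweep-then-copy behaviour in the notes above can be exercised without an overlayfs mount. The sketch below uses plain temporary directories in place of the merged view, and `copy_overlay_aliases` is a hypothetical stand-alone function; in the plan the logic lives inline in `start_instance`.

```python
import shutil
import tempfile
from pathlib import Path


def copy_overlay_aliases(overlays, overlays_root: Path, cfg_dir: Path) -> list[str]:
    """overlays: (path, alias) pairs; returns the alias file names written."""
    # Materialise the glob before unlinking so deletion never races iteration.
    for stale in list(cfg_dir.glob("server_*.cfg")):
        stale.unlink()
    written = []
    for path, alias in overlays:
        src = overlays_root / path / "left4dead2" / "cfg" / "server.cfg"
        if not alias or not src.exists():
            continue  # not opted in, or the overlay ships no server.cfg
        dest = cfg_dir / f"server_{alias}.cfg"
        shutil.copy2(src, dest)
        written.append(dest.name)
    return written


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    cfg_dir = root / "merged" / "left4dead2" / "cfg"
    cfg_dir.mkdir(parents=True)
    (cfg_dir / "server.cfg").write_text("// blueprint\n")
    (cfg_dir / "server_old.cfg").write_text("// stale alias\n")
    src_dir = root / "overlays" / "6" / "left4dead2" / "cfg"
    src_dir.mkdir(parents=True)
    (src_dir / "server.cfg").write_text("sv_pure 1\n")
    written = copy_overlay_aliases(
        [("6", "overlay_6"), ("7", "overlay_7"), ("8", None)],
        root / "overlays",
        cfg_dir,
    )
    remaining = sorted(p.name for p in cfg_dir.iterdir())
```

Note that the `server_*.cfg` pattern requires the underscore, so the blueprint's own `server.cfg` survives the sweep while the stale alias does not.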
|
|
||||||
|
|
||||||
### 5. Blueprint detail UI
|
|
||||||
|
|
||||||
`l4d2web/templates/blueprint_detail.html` — extend each linked-overlay `<li>` with a checkbox + label showing the alias inline.
|
|
||||||
|
|
||||||
`l4d2web/routes/page_routes.py` `blueprint_page`: also pass an `overlay_expose_state: dict[int, bool]` keyed by overlay_id so the template can read the current `expose_server_cfg` value.
|
|
||||||
|
|
||||||
`l4d2web/routes/blueprint_routes.py` (`replace_blueprint_overlays` and its callers): also read `expose_server_cfg_ids` from the form (`request.form.getlist("expose_server_cfg_ids")`), convert to `set[int]`, and set `BlueprintOverlay.expose_server_cfg = (overlay_id in expose_set)` per row.
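The form handling described above boils down to turning a list of checkbox strings into a set of ids and deriving one boolean per linked overlay. A minimal sketch, with hypothetical helper names and the assumption that non-numeric values are ignored rather than rejected (the real route may prefer a 400):

```python
def parse_expose_ids(raw_values: list[str]) -> set[int]:
    """Convert checkbox values to ints, skipping anything non-numeric (assumed policy)."""
    expose: set[int] = set()
    for value in raw_values:
        try:
            expose.add(int(value))
        except ValueError:
            continue
    return expose


def expose_flags(overlay_ids: list[int], raw_values: list[str]) -> dict[int, bool]:
    """Map every linked overlay id to its new expose_server_cfg value."""
    expose_set = parse_expose_ids(raw_values)
    return {overlay_id: overlay_id in expose_set for overlay_id in overlay_ids}


# Stand-in for request.form.getlist("expose_server_cfg_ids").
flags = expose_flags([5, 6, 8], ["6", "8", "bogus"])
```

Because the flag is computed for every linked overlay, rows whose box was unticked are reset to `False` without any separate clearing step.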
|
|
||||||
|
|
||||||
### 6. Revert the manual cp
|
|
||||||
|
|
||||||
`examples/script-overlays/competitive_rework.sh`: remove the `cp "$DEST/cfg/server.cfg" "$DEST/cfg/server_competitive.cfg"` block added in the previous round. The framework handles this on demand now.
|
|
||||||
|
|
||||||
## Critical files
|
|
||||||
|
|
||||||
- `l4d2web/models.py` — `BlueprintOverlay.expose_server_cfg`
|
|
||||||
- `l4d2web/alembic/versions/0007_*.py` — new Alembic migration
|
|
||||||
- `l4d2web/routes/blueprint_routes.py` — read checkbox set on save, persist expose flag
|
|
||||||
- `l4d2web/routes/page_routes.py` — pass overlay state map to template
|
|
||||||
- `l4d2web/templates/blueprint_detail.html` — checkbox + alias display
|
|
||||||
- `l4d2web/services/l4d2_facade.py` — emit alias per overlay in spec payload + prepend exec lines
|
|
||||||
- `l4d2host/spec.py` — `OverlayRef` dataclass + spec deserialization
|
|
||||||
- `l4d2host/instances.py` — lowerdir construction + per-overlay copy step + sweep
|
|
||||||
- `examples/script-overlays/competitive_rework.sh` — remove manual `cp`
|
|
||||||
|
|
||||||
## Out of scope
|
|
||||||
|
|
||||||
- No constraint on `Overlay.name`.
|
|
||||||
- No `cfg_alias` / `slug` column on `Overlay`.
|
|
||||||
- No per-blueprint custom alias text (id-based naming is fixed: `overlay_<id>`).
|
|
||||||
- No automatic detection of which overlays ship a `server.cfg` to gate the checkbox in UI — checkbox is always available; the host silently skips at start time if the source doesn't exist.
|
|
||||||
|
|
||||||
## Verification
|
|
||||||
|
|
||||||
1. **Unit tests**:
|
|
||||||
- `l4d2host/tests/`: `start_instance` test where two overlays exist on disk — one with `server.cfg`, one without; spec marks both with aliases; assert only the one with a source produces `server_<alias>.cfg` in merged. Pre-existing `server_old.cfg` in merged is swept.
|
|
||||||
- `l4d2host/tests/`: spec yaml round-trip test for `OverlayRef` with and without `alias`; back-compat test for bare-string entries.
|
|
||||||
- `l4d2web/tests/`: blueprint payload build asserts overlays without `expose_server_cfg` produce no `alias`; with, produce `overlay_<id>`.
|
|
||||||
- `l4d2web/tests/`: blueprint payload `config` field equals `["exec server_overlay_<id_low>", "exec server_overlay_<id_high>", *blueprint_custom_lines]` — `exec` lines in `BlueprintOverlay.position` ascending order, custom lines last, no exec lines for unchecked overlays.
|
|
||||||
- `l4d2web/tests/`: form submit with `expose_server_cfg_ids=[6, 8]` updates the matching `BlueprintOverlay` rows; unchecked rows reset to false.
|
|
||||||
- Run: `pytest l4d2host/tests -q`, `pytest l4d2web/tests -q`.
|
|
||||||
|
|
||||||
2. **End-to-end on the test server (`ckn@10.0.4.128`)**:
|
|
||||||
- Deploy via `deploy/deploy-test-server.sh`.
|
|
||||||
- Blueprint detail: each linked overlay shows a checkbox with its alias label.
|
|
||||||
- Tick the box for `competitive_rework`; save; reload; checkbox stays checked.
|
|
||||||
- Start a server using that blueprint: `ls /var/lib/left4me/runtime/<name>/merged/left4dead2/cfg/server_*.cfg` → shows `server_overlay_<id>.cfg` for the checked overlay only.
|
|
||||||
- Inspect the written `server.cfg`: `head -n 5 /var/lib/left4me/instances/<name>/server.cfg` → top lines are `exec server_overlay_<id>` for each checked overlay in lowest-first order, followed by the blueprint's custom lines.
|
|
||||||
- Server boot should auto-load the per-overlay configs; confirm the resulting settings from the in-game console.
|
|
||||||
- Untick the box, restart the server → `server_overlay_<id>.cfg` no longer present in merged, and the corresponding `exec` line is no longer in the written `server.cfg`.
|
|
||||||
|
|
||||||
3. **Negative**: tick an overlay that doesn't ship a `server.cfg` (e.g. a workshop overlay) → start succeeds, no alias file produced (host skipped silently).
|
|
||||||
|
|
@ -1,260 +0,0 @@
|
||||||
# L4D2 CPU Isolation Implementation Plan
|
|
||||||
|
|
||||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
|
||||||
|
|
||||||
**Goal:** Constrain every cgroup that isn't a live game server to core 0; give game servers cores 1..N-1 exclusively, scaled automatically across host sizes.
|
|
||||||
|
|
||||||
**Architecture:** Four `99-left4me-cpuset.conf` drop-ins under `/etc/systemd/system/{system,user,l4d2-build,l4d2-game}.slice.d/`, each generated by the deploy script with `printf` and piped into `install`. `LEFT4ME_SYSTEM_CPUS` (default `0`) and `LEFT4ME_GAME_CPUS` (default `1-$((NPROC-1))`) are env-var overrides. Single-core hosts skip the cpuset writes with a warning unless one of the overrides is set.
|
|
||||||
|
|
||||||
**Tech Stack:** systemd cgroup-v2 `AllowedCPUs=` directive, shell `printf` piped into `install`, Linux `nproc(1)`, pytest text-assertion tests.
|
|
||||||
|
|
||||||
**Spec:** `docs/superpowers/specs/2026-05-09-l4d2-cpu-isolation-design.md`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## File Structure
|
|
||||||
|
|
||||||
Files to modify:
|
|
||||||
|
|
||||||
- `deploy/deploy-test-server.sh` — compute `NPROC`, default `LEFT4ME_SYSTEM_CPUS=0` / `LEFT4ME_GAME_CPUS=1-$((NPROC-1))`, write four drop-in files. Skip when `nproc < 2` (with stderr warning) unless either env var is set explicitly.
|
|
||||||
- `deploy/README.md` — append a "CPU isolation" subsection inside the existing "Performance Tuning" section.
|
|
||||||
- `deploy/tests/test_deploy_artifacts.py` — new test functions.
|
|
||||||
|
|
||||||
No host library or web app changes.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Pre-flight
|
|
||||||
|
|
||||||
- [ ] **Step 0a: Verify clean working tree**
|
|
||||||
|
|
||||||
Run: `git status`
|
|
||||||
Expected: `nothing to commit, working tree clean`
|
|
||||||
|
|
||||||
- [ ] **Step 0b: Verify the existing deploy tests are at the known-good baseline**
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py -q`
|
|
||||||
Expected: 35 passed, 1 failed (the pre-existing unrelated `test_deploy_script_has_safe_defaults_and_preserves_state`).
|
|
||||||
|
|
||||||
If the count differs, stop and surface — this plan assumes that exact baseline.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 1: Deploy-script CPU-isolation block + tests
|
|
||||||
|
|
||||||
Write the four drop-ins from the deploy script in one cohesive block. The block computes `NPROC` once, resolves both env vars (with defaults), guards single-core hosts, and writes each drop-in via the existing `install -m 0644 -o root -g root` pattern. Tests cover defaults, overrides, single-core skip, and drop-in paths.
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `deploy/deploy-test-server.sh`
|
|
||||||
- Modify: `deploy/tests/test_deploy_artifacts.py` (new test function)
|
|
||||||
|
|
||||||
- [ ] **Step 1.1: Add the failing test**
|
|
||||||
|
|
||||||
Open `deploy/tests/test_deploy_artifacts.py` and append (after the `test_deploy_script_installs_perf_artifacts` from the perf-baseline branch):
|
|
||||||
|
|
||||||
```python
def test_deploy_script_writes_cpuset_drop_ins():
    script = DEPLOY_SCRIPT.read_text()

    # Reads nproc and binds defaults via ${VAR:-...}.
    assert "nproc" in script
    assert "LEFT4ME_SYSTEM_CPUS" in script
    assert "LEFT4ME_GAME_CPUS" in script
    assert "${LEFT4ME_SYSTEM_CPUS:-0}" in script
    # Default game-core expression: 1-(nproc-1). Match the form the
    # implementer chose; both `1-$((NPROC-1))` and `1-$((nproc-1))` are
    # acceptable as long as the upper bound is computed from nproc.
    # Whitespace is stripped first so `$((NPROC - 1))` also matches.
    compact = "".join(script.split())
    assert ("1-$((NPROC-1))" in compact) or ("1-$((nproc-1))" in compact) \
        or ("LEFT4ME_GAME_CPUS:-1-" in compact)

    # All four drop-in paths. The script may spell each path out or write
    # them in a loop over `${slice_name}`, so accept either form.
    templated = "/etc/systemd/system/${slice_name}.slice.d/99-left4me-cpuset.conf"
    for slice_name in ("system", "user", "l4d2-build", "l4d2-game"):
        literal = f"/etc/systemd/system/{slice_name}.slice.d/99-left4me-cpuset.conf"
        assert literal in script or (templated in script and slice_name in script)

    # Drop-ins use the existing install pattern.
    assert "install -m 0644 -o root -g root" in script

    # Single-core host: skip with a warning to stderr.
    # Match either an explicit `nproc < 2` / `-lt 2` guard or `[ "$nproc" -ge 2 ]` form.
    assert ("nproc" in script) and (("-lt 2" in script) or ("-ge 2" in script) or ("< 2" in script))
    # The script is lowercased for this check, so the needles must be lowercase too.
    lowered = script.lower()
    assert "skipping cpu isolation" in lowered or "skip cpu isolation" in lowered
```
|
|
||||||
|
|
||||||
- [ ] **Step 1.2: Run the new test, verify it fails**
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_deploy_script_writes_cpuset_drop_ins -v`
|
|
||||||
Expected: FAIL — none of the new strings exist yet.
|
|
||||||
|
|
||||||
- [ ] **Step 1.3: Edit the deploy script — add the cpuset block**
|
|
||||||
|
|
||||||
Open `deploy/deploy-test-server.sh`. Find the block that copies the slice files (added in the perf-baseline branch, around lines 139–140):
|
|
||||||
|
|
||||||
```sh
|
|
||||||
$sudo_cmd cp /opt/left4me/deploy/files/usr/local/lib/systemd/system/l4d2-game.slice /usr/local/lib/systemd/system/l4d2-game.slice
|
|
||||||
$sudo_cmd cp /opt/left4me/deploy/files/usr/local/lib/systemd/system/l4d2-build.slice /usr/local/lib/systemd/system/l4d2-build.slice
|
|
||||||
```
|
|
||||||
|
|
||||||
Immediately after that pair, before any of the helper-script copies that follow, insert this block:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
# CPU isolation via cgroup-v2 AllowedCPUs= drop-ins. Pin everything that
|
|
||||||
# isn't a live game server to core 0; give game servers cores 1..N-1.
|
|
||||||
# See docs/superpowers/specs/2026-05-09-l4d2-cpu-isolation-design.md.
|
|
||||||
NPROC=$(nproc)
|
|
||||||
SYSTEM_CPUS=${LEFT4ME_SYSTEM_CPUS:-0}
|
|
||||||
if [ "${LEFT4ME_GAME_CPUS+x}" = x ]; then
|
|
||||||
GAME_CPUS=$LEFT4ME_GAME_CPUS
|
|
||||||
else
|
|
||||||
GAME_CPUS="1-$((NPROC - 1))"
|
|
||||||
fi
|
|
||||||
if [ "$NPROC" -lt 2 ] && [ -z "${LEFT4ME_SYSTEM_CPUS+x}${LEFT4ME_GAME_CPUS+x}" ]; then
|
|
||||||
printf 'left4me deploy: skipping CPU isolation (nproc=%s); cpuset drop-ins not written.\n' "$NPROC" >&2
|
|
||||||
else
|
|
||||||
for slice_name in system user l4d2-build; do
|
|
||||||
$sudo_cmd mkdir -p "/etc/systemd/system/${slice_name}.slice.d"
|
|
||||||
printf '[Slice]\nAllowedCPUs=%s\n' "$SYSTEM_CPUS" \
|
|
||||||
| $sudo_cmd install -m 0644 -o root -g root /dev/stdin \
|
|
||||||
"/etc/systemd/system/${slice_name}.slice.d/99-left4me-cpuset.conf"
|
|
||||||
done
|
|
||||||
$sudo_cmd mkdir -p "/etc/systemd/system/l4d2-game.slice.d"
|
|
||||||
printf '[Slice]\nAllowedCPUs=%s\n' "$GAME_CPUS" \
|
|
||||||
| $sudo_cmd install -m 0644 -o root -g root /dev/stdin \
|
|
||||||
"/etc/systemd/system/l4d2-game.slice.d/99-left4me-cpuset.conf"
|
|
||||||
fi
|
|
||||||
```
|
|
||||||
|
|
||||||
Notes for the implementer:
|
|
||||||
|
|
||||||
- The single-core skip only triggers when **neither** override is set. If the operator sets either `LEFT4ME_SYSTEM_CPUS` or `LEFT4ME_GAME_CPUS` explicitly on a single-core host, honor their intent.
|
|
||||||
- `install -m 0644 -o root -g root /dev/stdin <dest>` is the idiomatic way to install a small generated file from a pipeline (matches the existing pattern for sandbox-resolv.conf, just with `/dev/stdin` as source).
|
|
||||||
- The `mkdir -p` for each `.d` directory is required: `install` without `-D` does not create missing parent directories, so the write would fail on a host where the drop-in directory does not exist yet.
|
|
||||||
|
|
||||||
- [ ] **Step 1.4: Verify shell syntax still parses**
|
|
||||||
|
|
||||||
Run: `sh -n deploy/deploy-test-server.sh`
|
|
||||||
Expected: exit 0, no output.
|
|
||||||
|
|
||||||
- [ ] **Step 1.5: Run the new test and full deploy test suite**
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py -q`
|
|
||||||
Expected: 36 passed, 1 failed (the pre-existing unrelated test, count goes from 35→36 because of the new test).
|
|
||||||
|
|
||||||
If your specific assertion forms in Step 1.1 don't match the implementation, adjust the test — but only the `or` branches; do not weaken the contract.
|
|
||||||
|
|
||||||
- [ ] **Step 1.6: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add deploy/deploy-test-server.sh deploy/tests/test_deploy_artifacts.py
|
|
||||||
git commit -m "$(cat <<'EOF'
|
|
||||||
feat(deploy): cgroup-v2 cpuset drop-ins pin system to core 0, game to rest
|
|
||||||
|
|
||||||
Computes NPROC at deploy time. Defaults LEFT4ME_SYSTEM_CPUS=0 and
|
|
||||||
LEFT4ME_GAME_CPUS=1-(NPROC-1). Single-core hosts skip cpuset writes
|
|
||||||
with a stderr warning unless an env var override is set. Spec:
|
|
||||||
docs/superpowers/specs/2026-05-09-l4d2-cpu-isolation-design.md
|
|
||||||
EOF
|
|
||||||
)"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 2: README "CPU isolation" subsection
|
|
||||||
|
|
||||||
Append a subsection to `deploy/README.md` inside the existing "Performance Tuning" section, documenting the layout, the env-var overrides, the single-core skip, and the relationship to the existing per-instance `CPUAffinity=` escape hatch.
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `deploy/README.md`
|
|
||||||
|
|
||||||
No test for this task — README content is documentation, not contract.
|
|
||||||
|
|
||||||
- [ ] **Step 2.1: Append the CPU isolation subsection**
|
|
||||||
|
|
||||||
Open `deploy/README.md`. Find the existing `### Per-instance CPU affinity` subsection (added in the perf-baseline branch). Insert a new subsection **immediately before** it (so the slice-level isolation is documented before the per-instance refinement that builds on top). The new subsection content:
|
|
||||||
|
|
||||||
```markdown
|
|
||||||
### CPU isolation (cores)
|
|
||||||
|
|
||||||
The deploy script writes four `AllowedCPUs=` drop-ins so that, by default, only `l4d2-game.slice` is allowed to run on cores 1..N-1; `system.slice`, `user.slice`, and `l4d2-build.slice` are pinned to core 0. Game servers thus get the host minus core 0 exclusively, the build sandbox and the web app stay on core 0, and a logged-in admin running CPU-heavy work in their shell can't steal cycles from a live match.
|
|
||||||
|
|
||||||
Override the split by setting either env var when running the deploy:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
LEFT4ME_SYSTEM_CPUS="0,1" LEFT4ME_GAME_CPUS="2-7" deploy/deploy-test-server.sh deploy-user@host
|
|
||||||
```
|
|
||||||
|
|
||||||
On single-core hosts the deploy skips the cpuset drop-ins entirely and prints a warning to stderr; the rest of the perf baseline (cgroup weights, sysctls, OOM scores) still applies. To force isolation on a single-core host anyway (rarely useful), set either env var explicitly.
|
|
||||||
|
|
||||||
Per-instance `CPUAffinity=` (next subsection) composes on top of this — the per-instance value must be a subset of `l4d2-game.slice`'s `AllowedCPUs=`, which the kernel enforces.
|
|
||||||
```
|
|
||||||
|
|
||||||
(The outer triple-backticks above are markdown punctuation around this prompt block, not part of the README content. Inner code-block fences DO need to be written into the README. The `markdown` language tag on the outer fence in this plan is documentation-only.)
|
|
||||||
|
|
||||||
- [ ] **Step 2.2: Run the full deploy test suite**
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py -q`
|
|
||||||
Expected: 36 passed, 1 failed (unchanged; README has no test).
|
|
||||||
|
|
||||||
- [ ] **Step 2.3: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add deploy/README.md
|
|
||||||
git commit -m "$(cat <<'EOF'
|
|
||||||
docs(deploy): document CPU isolation in performance-tuning section
|
|
||||||
|
|
||||||
Explains the core-0-vs-game-cores split, the LEFT4ME_SYSTEM_CPUS /
|
|
||||||
LEFT4ME_GAME_CPUS overrides, the single-core skip, and the
|
|
||||||
subset-of relationship with per-instance CPUAffinity=.
|
|
||||||
EOF
|
|
||||||
)"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Final Verification
|
|
||||||
|
|
||||||
- [ ] **Step F.1: Full deploy + host + web test sweep**
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/ l4d2host/tests l4d2web/tests -q`
|
|
||||||
Expected: deploy 36 passed / 1 failed (pre-existing); host 111 passed / 1 skipped; web 313 passed / 1 skipped.
|
|
||||||
|
|
||||||
- [ ] **Step F.2: Working tree clean and commits in order**
|
|
||||||
|
|
||||||
Run: `git status && git log --oneline -5`
|
|
||||||
Expected:
|
|
||||||
- `git status`: clean.
|
|
||||||
- Top of `git log`:
|
|
||||||
1. `docs(deploy): document CPU isolation in performance-tuning section`
|
|
||||||
2. `feat(deploy): cgroup-v2 cpuset drop-ins pin system to core 0, game to rest`
|
|
||||||
3. `docs(plans): l4d2 cpu isolation — implementation plan`
|
|
||||||
4. `docs(specs): l4d2 cpu isolation — design`
|
|
||||||
|
|
||||||
- [ ] **Step F.3: Operator-side smoke test (deferred, not part of this plan)**
|
|
||||||
|
|
||||||
This plan ships artifacts. Confirming systemd actually enforces `AllowedCPUs=` on a real Trixie host is operator-side:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
deploy/deploy-test-server.sh deploy-user@example-host
|
|
||||||
ssh deploy-user@example-host '
|
|
||||||
systemctl cat system.slice | grep AllowedCPUs
|
|
||||||
systemctl cat l4d2-game.slice | grep AllowedCPUs
|
|
||||||
cat /sys/fs/cgroup/system.slice/cpuset.cpus.effective
|
|
||||||
cat /sys/fs/cgroup/l4d2-game.slice/cpuset.cpus.effective
|
|
||||||
'
|
|
||||||
# Expect on an 8-core box:
|
|
||||||
# system.slice → AllowedCPUs=0 → cpuset.cpus.effective = 0
|
|
||||||
# l4d2-game.slice → AllowedCPUs=1-7 → cpuset.cpus.effective = 1-7
|
|
||||||
```
|
|
||||||
|
|
||||||
End-to-end behavioural test (manual, ops-side): on a 4-core host, run two L4D2 instances + a script-sandbox build simultaneously. Confirm via `htop` (with affinity column on) that the srcds processes only ever appear on cores 1, 2, 3 and the sandbox + web stay on core 0.
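When checking the `cpuset.cpus.effective` values from the smoke test by script rather than by eye, the kernel's CPU-list format (comma-separated ids and ranges) is easy to parse. A minimal sketch; `parse_cpu_list` is a hypothetical helper, not part of the deploy tooling:

```python
def parse_cpu_list(text: str) -> set[int]:
    """Parse a kernel CPU list such as "0,2-4" into {0, 2, 3, 4}."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or trailing comma
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


# Values as they might be read on a 4-core host after deploy.
game = parse_cpu_list("1-3\n")
system = parse_cpu_list("0")
isolated = game.isdisjoint(system)
```

The same helper can check that a per-instance `CPUAffinity=` value is a subset of the game slice's set, for example `parse_cpu_list("2") <= game`.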
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Out of Scope (do NOT implement here)
|
|
||||||
|
|
||||||
- Kernel `isolcpus=` / `nohz_full=` / `rcu_nocbs=` boot params.
|
|
||||||
- NIC IRQ pinning automation.
|
|
||||||
- Per-instance `CPUAffinity=` driven by a deploy-env knob.
|
|
||||||
- A separate `l4d2-web.slice`.
|
|
||||||
- Any web-app or host-library code changes.
|
|
||||||
|
|
||||||
If you find yourself touching any of these, stop — they belong in a separate spec.
|
|
||||||
|
|
@ -1,686 +0,0 @@
|
||||||
# L4D2 Server Host Perf Baseline Implementation Plan
|
|
||||||
|
|
||||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
|
||||||
|
|
||||||
**Goal:** Apply a host-side performance and resource-isolation baseline (systemd directives, slice hierarchy, host sysctls) to every L4D2 server instance, leaving game ConVars to the maintainer.
|
|
||||||
|
|
||||||
**Architecture:** Add resource-control directives to `left4me-server@.service`; introduce two flat top-level slices (`l4d2-game.slice` weight 1000, `l4d2-build.slice` weight 10) so the build sandbox is starved by the kernel under contention; ship `/etc/sysctl.d/99-left4me.conf` for UDP buffer and netdev tuning; place the script-sandbox transient unit into `l4d2-build.slice` with `OOMScoreAdjust=500`. RT scheduling, CPU governor, CPUAffinity, NIC tuning are documentation-only escape hatches.
|
|
||||||
|
|
||||||
**Tech Stack:** systemd unit files (service + slice), `systemd-run` properties, Linux sysctl, bash deploy script, pytest text-assertion tests under `deploy/tests/test_deploy_artifacts.py`.
|
|
||||||
|
|
||||||
**Spec:** `docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## File Structure
|
|
||||||
|
|
||||||
Files to create:
|
|
||||||
|
|
||||||
- `deploy/files/usr/local/lib/systemd/system/l4d2-game.slice` — high-weight slice for game-server instances.
|
|
||||||
- `deploy/files/usr/local/lib/systemd/system/l4d2-build.slice` — low-weight slice for sandboxed script-overlay builds.
|
|
||||||
- `deploy/files/etc/sysctl.d/99-left4me.conf` — host UDP/netdev/swap sysctls.
|
|
||||||
|
|
||||||
Files to modify:
|
|
||||||
|
|
||||||
- `deploy/files/usr/local/lib/systemd/system/left4me-server@.service` — add resource-control directives (`Slice`, `Nice`, `IOSchedulingClass`, `IOSchedulingPriority`, `OOMScoreAdjust`, `MemoryHigh`, `MemoryMax`, `TasksMax`, `LimitNOFILE`, `KillSignal`, `TimeoutStopSec`, `LogRateLimitIntervalSec`).
|
|
||||||
- `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` — add `--slice=l4d2-build.slice` and `-p OOMScoreAdjust=500` to the `systemd-run` invocation.
|
|
||||||
- `deploy/deploy-test-server.sh` — copy the two slice files and the sysctl conf during deploy; run `sysctl --system` so values take effect immediately.
|
|
||||||
- `deploy/README.md` — append a "Performance tuning" section with the four documented escape hatches.
|
|
||||||
- `deploy/tests/test_deploy_artifacts.py` — new tests for each artifact above (text assertions following the existing `assert "X" in text` style).
|
|
||||||
|
|
||||||
No application code (Python, Flask, host library) is touched.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Pre-flight
|
|
||||||
|
|
||||||
- [ ] **Step 0a: Verify clean working tree**
|
|
||||||
|
|
||||||
Run: `git status`
|
|
||||||
Expected: `nothing to commit, working tree clean`
|
|
||||||
|
|
||||||
- [ ] **Step 0b: Verify the existing deploy tests pass**
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py -q`
|
|
||||||
Expected: all green.
|
|
||||||
|
|
||||||
If any test is already red, stop and surface — this plan assumes the baseline is green.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 1: Per-Instance Unit Resource-Control Directives
|
|
||||||
|
|
||||||
Add the per-instance baseline to `left4me-server@.service`. This task is self-contained even though `Slice=l4d2-game.slice` references a slice that doesn't exist yet — systemd does not validate the reference until the unit is actually started, and the deploy artifact tests are pure text checks.
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `deploy/files/usr/local/lib/systemd/system/left4me-server@.service`
|
|
||||||
- Test: `deploy/tests/test_deploy_artifacts.py` (new test function)
|
|
||||||
|
|
||||||
- [ ] **Step 1.1: Add the failing test**
|
|
||||||
|
|
||||||
Open `deploy/tests/test_deploy_artifacts.py` and append (after `test_server_unit_contains_required_runtime_contract`):
|
|
||||||
|
|
||||||
```python
def test_server_unit_contains_perf_baseline_directives():
    unit = SERVER_UNIT.read_text()

    # Slice membership.
    assert "Slice=l4d2-game.slice" in unit

    # CFS priority bump (no SCHED_FIFO).
    assert "Nice=-5" in unit
    assert "CPUSchedulingPolicy=" not in unit

    # I/O priority.
    assert "IOSchedulingClass=best-effort" in unit
    assert "IOSchedulingPriority=4" in unit

    # OOM ordering: game servers survive, sandbox dies first.
    assert "OOMScoreAdjust=-200" in unit

    # Memory caps with headroom for map-load spikes.
    assert "MemoryHigh=1.5G" in unit
    assert "MemoryMax=2G" in unit

    # Bounded fork surface.
    assert "TasksMax=256" in unit

    # Plenty of fds for plugin-heavy setups.
    assert "LimitNOFILE=65536" in unit

    # srcds clean shutdown via SIGINT, with time to flush.
    assert "KillSignal=SIGINT" in unit
    assert "TimeoutStopSec=15s" in unit

    # Per-unit override of journald rate limiting (default drops srcds output).
    assert "LogRateLimitIntervalSec=0" in unit
```
|
|
||||||
|
|
||||||
- [ ] **Step 1.2: Run the new test, verify it fails**
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_server_unit_contains_perf_baseline_directives -v`
|
|
||||||
Expected: FAIL — first failing assert is on `Slice=l4d2-game.slice`.
|
|
||||||
|
|
||||||
- [ ] **Step 1.3: Edit the unit file**
|
|
||||||
|
|
||||||
Open `deploy/files/usr/local/lib/systemd/system/left4me-server@.service` and replace its contents with:
|
|
||||||
|
|
||||||
```ini
|
|
||||||
[Unit]
|
|
||||||
Description=left4me server instance %i
|
|
||||||
After=network-online.target
|
|
||||||
Wants=network-online.target
|
|
||||||
|
|
||||||
[Service]
|
|
||||||
Type=simple
|
|
||||||
User=left4me
|
|
||||||
Group=left4me
|
|
||||||
EnvironmentFile=/etc/left4me/host.env
|
|
||||||
EnvironmentFile=/var/lib/left4me/instances/%i/instance.env
|
|
||||||
WorkingDirectory=/var/lib/left4me/runtime/%i/merged/left4dead2
|
|
||||||
ExecStart=/var/lib/left4me/installation/srcds_run -game left4dead2 +hostport ${L4D2_PORT} $L4D2_ARGS
|
|
||||||
Restart=on-failure
|
|
||||||
RestartSec=5
|
|
||||||
|
|
||||||
# Resource control baseline — see docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md
|
|
||||||
Slice=l4d2-game.slice
|
|
||||||
Nice=-5
|
|
||||||
IOSchedulingClass=best-effort
|
|
||||||
IOSchedulingPriority=4
|
|
||||||
OOMScoreAdjust=-200
|
|
||||||
MemoryHigh=1.5G
|
|
||||||
MemoryMax=2G
|
|
||||||
TasksMax=256
|
|
||||||
LimitNOFILE=65536
|
|
||||||
KillSignal=SIGINT
|
|
||||||
TimeoutStopSec=15s
|
|
||||||
LogRateLimitIntervalSec=0
|
|
||||||
|
|
||||||
# Hardening (unchanged from previous baseline).
|
|
||||||
NoNewPrivileges=true
|
|
||||||
PrivateTmp=true
|
|
||||||
PrivateDevices=true
|
|
||||||
ProtectHome=true
|
|
||||||
ProtectSystem=strict
|
|
||||||
ReadOnlyPaths=/var/lib/left4me/installation /var/lib/left4me/overlays
|
|
||||||
ReadWritePaths=/var/lib/left4me/runtime/%i
|
|
||||||
RestrictSUIDSGID=true
|
|
||||||
LockPersonality=true
|
|
||||||
|
|
||||||
[Install]
|
|
||||||
WantedBy=multi-user.target
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 1.4: Run the new test, verify it passes**
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_server_unit_contains_perf_baseline_directives -v`
|
|
||||||
Expected: PASS.
|
|
||||||
|
|
||||||
- [ ] **Step 1.5: Re-run the existing server-unit test, verify still passes**
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_server_unit_contains_required_runtime_contract -v`
|
|
||||||
Expected: PASS — the existing assertions (`User=left4me`, `Group=left4me`, hardening directives, etc.) still match.
|
|
||||||
|
|
||||||
- [ ] **Step 1.6: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add deploy/files/usr/local/lib/systemd/system/left4me-server@.service deploy/tests/test_deploy_artifacts.py
|
|
||||||
git commit -m "$(cat <<'EOF'
|
|
||||||
feat(deploy): perf-baseline directives on left4me-server@.service
|
|
||||||
|
|
||||||
Slice=l4d2-game.slice, Nice=-5, IOSchedulingClass=best-effort,
|
|
||||||
OOMScoreAdjust=-200, MemoryHigh=1.5G, MemoryMax=2G, TasksMax=256,
|
|
||||||
LimitNOFILE=65536, KillSignal=SIGINT, TimeoutStopSec=15s,
|
|
||||||
LogRateLimitIntervalSec=0. Spec:
|
|
||||||
docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md
|
|
||||||
EOF
|
|
||||||
)"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 2: Slice Unit Files
|
|
||||||
|
|
||||||
Create the two slice unit files. After this task the perf unit's `Slice=l4d2-game.slice` reference is satisfied.
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Create: `deploy/files/usr/local/lib/systemd/system/l4d2-game.slice`
|
|
||||||
- Create: `deploy/files/usr/local/lib/systemd/system/l4d2-build.slice`
|
|
||||||
- Test: `deploy/tests/test_deploy_artifacts.py` (new constants + new test functions)
|
|
||||||
|
|
||||||
- [ ] **Step 2.1: Add path constants and failing tests**
|
|
||||||
|
|
||||||
Open `deploy/tests/test_deploy_artifacts.py`. After the existing `SERVER_UNIT = ...` line, add:
|
|
||||||
|
|
||||||
```python
|
|
||||||
GAME_SLICE = DEPLOY / "files/usr/local/lib/systemd/system/l4d2-game.slice"
|
|
||||||
BUILD_SLICE = DEPLOY / "files/usr/local/lib/systemd/system/l4d2-build.slice"
|
|
||||||
```
|
|
||||||
|
|
||||||
After the new `test_server_unit_contains_perf_baseline_directives`, append:
|
|
||||||
|
|
||||||
```python
def test_l4d2_game_slice_exists_with_high_weights():
    assert GAME_SLICE.is_file()
    text = GAME_SLICE.read_text()
    assert "[Slice]" in text
    assert "CPUWeight=1000" in text
    assert "IOWeight=1000" in text


def test_l4d2_build_slice_exists_with_low_weights():
    assert BUILD_SLICE.is_file()
    text = BUILD_SLICE.read_text()
    assert "[Slice]" in text
    assert "CPUWeight=10" in text
    assert "IOWeight=10" in text
```
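One caveat about the substring style used in these tests: `"CPUWeight=10" in text` is also true for a file that contains `CPUWeight=1000`, so the build-slice test alone cannot tell the two slices apart. A minimal sketch of a stricter check follows, assuming a hypothetical helper `unit_directives` that is not part of the existing test module:

```python
def unit_directives(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines of a systemd unit into a dict (hypothetical helper).

    Section headers, blank lines, and comments are skipped. Repeated keys keep
    the last value, which is enough for single-valued directives like CPUWeight.
    """
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "[")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            directives[key.strip()] = value.strip()
    return directives


game = "[Unit]\nDescription=game\n\n[Slice]\nCPUWeight=1000\nIOWeight=1000\n"

# The substring check passes against the wrong file...
assert "CPUWeight=10" in game
# ...while the parsed comparison does not.
assert unit_directives(game)["CPUWeight"] != "10"
```

Whether the stricter form is worth deviating from the existing `assert "X" in text` style is a judgment call; the plan as written deliberately follows that style.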
|
|
||||||
|
|
||||||
- [ ] **Step 2.2: Run the new tests, verify they fail**
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_l4d2_game_slice_exists_with_high_weights deploy/tests/test_deploy_artifacts.py::test_l4d2_build_slice_exists_with_low_weights -v`
|
|
||||||
Expected: FAIL on `assert GAME_SLICE.is_file()` (file does not exist).
|
|
||||||
|
|
||||||
- [ ] **Step 2.3: Create the game slice file**
|
|
||||||
|
|
||||||
Create `deploy/files/usr/local/lib/systemd/system/l4d2-game.slice` with:
|
|
||||||
|
|
||||||
```ini
|
|
||||||
[Unit]
|
|
||||||
Description=left4me game-server slice
|
|
||||||
Before=slices.target
|
|
||||||
|
|
||||||
[Slice]
|
|
||||||
CPUWeight=1000
|
|
||||||
IOWeight=1000
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 2.4: Create the build slice file**
|
|
||||||
|
|
||||||
Create `deploy/files/usr/local/lib/systemd/system/l4d2-build.slice` with:
|
|
||||||
|
|
||||||
```ini
|
|
||||||
[Unit]
|
|
||||||
Description=left4me script-sandbox build slice
|
|
||||||
Before=slices.target
|
|
||||||
|
|
||||||
[Slice]
|
|
||||||
CPUWeight=10
|
|
||||||
IOWeight=10
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 2.5: Run the new tests, verify they pass**
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_l4d2_game_slice_exists_with_high_weights deploy/tests/test_deploy_artifacts.py::test_l4d2_build_slice_exists_with_low_weights -v`
|
|
||||||
Expected: PASS.
|
|
||||||
|
|
||||||
- [ ] **Step 2.6: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add deploy/files/usr/local/lib/systemd/system/l4d2-game.slice deploy/files/usr/local/lib/systemd/system/l4d2-build.slice deploy/tests/test_deploy_artifacts.py
|
|
||||||
git commit -m "$(cat <<'EOF'
|
|
||||||
feat(deploy): l4d2-game.slice + l4d2-build.slice with 100:1 weight ratio
|
|
||||||
|
|
||||||
Flat top-level slices. Game wins under contention; build still gets
|
|
||||||
the box when uncontended. Referenced by left4me-server@.service and
|
|
||||||
the script-sandbox systemd-run invocation.
|
|
||||||
EOF
|
|
||||||
)"
|
|
||||||
```
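The 100:1 ratio named in the commit subject can be made concrete with a little arithmetic. This is a rough sketch, assuming that under full contention sibling cgroups receive CPU time in proportion to their weights; `contended_shares` is a hypothetical helper, and the `system.slice` weight of 100 is the systemd default for `CPUWeight`:

```python
def contended_shares(weights: dict[str, int]) -> dict[str, float]:
    """Fraction of CPU each sibling gets when all of them are busy."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Only the two left4me slices competing.
two_way = contended_shares({"l4d2-game.slice": 1000, "l4d2-build.slice": 10})

# The same two slices plus a busy system.slice at the default weight.
three_way = contended_shares(
    {"l4d2-game.slice": 1000, "l4d2-build.slice": 10, "system.slice": 100}
)

for label, shares in (("two-way", two_way), ("three-way", three_way)):
    print(label, {name: f"{share:.1%}" for name, share in shares.items()})
```

Weights only matter when the CPU is saturated. An uncontended build still gets the whole box, as the commit message notes.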
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 3: Host Sysctls
|
|
||||||
|
|
||||||
Ship a `/etc/sysctl.d/` drop-in for UDP buffers, netdev backlog, netdev budget, and `vm.swappiness`.
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Create: `deploy/files/etc/sysctl.d/99-left4me.conf`
|
|
||||||
- Test: `deploy/tests/test_deploy_artifacts.py` (new constant + new test function)
|
|
||||||
|
|
||||||
- [ ] **Step 3.1: Add path constant and failing test**
|
|
||||||
|
|
||||||
Open `deploy/tests/test_deploy_artifacts.py`. After the slice constants, add:
|
|
||||||
|
|
||||||
```python
|
|
||||||
SYSCTL_CONF = DEPLOY / "files/etc/sysctl.d/99-left4me.conf"
|
|
||||||
```
|
|
||||||
|
|
||||||
Append a new test:
|
|
||||||
|
|
||||||
```python
def test_sysctl_conf_present_with_perf_settings():
    assert SYSCTL_CONF.is_file()
    text = SYSCTL_CONF.read_text()
    for line in (
        "net.core.rmem_max = 8388608",
        "net.core.wmem_max = 8388608",
        "net.core.rmem_default = 524288",
        "net.core.wmem_default = 524288",
        "net.core.netdev_max_backlog = 5000",
        "net.core.netdev_budget = 600",
        "vm.swappiness = 10",
    ):
        assert line in text, f"missing {line!r} in 99-left4me.conf"
```
|
|
||||||
|
|
||||||
- [ ] **Step 3.2: Run the new test, verify it fails**
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_sysctl_conf_present_with_perf_settings -v`
|
|
||||||
Expected: FAIL on `assert SYSCTL_CONF.is_file()`.
|
|
||||||
|
|
||||||
- [ ] **Step 3.3: Create the sysctl conf file**
|
|
||||||
|
|
||||||
Create `deploy/files/etc/sysctl.d/99-left4me.conf` with:
|
|
||||||
|
|
||||||
```
|
|
||||||
# Host-side perf baseline for left4me — see
|
|
||||||
# docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md
|
|
||||||
#
|
|
||||||
# UDP socket buffers: distro defaults of ~208 KiB (212992 bytes) are too small for sustained
|
|
||||||
# Source-engine UDP across multiple instances. 8 MiB matches the standard
|
|
||||||
# 1 Gbit recommendation; rmem_default/wmem_default protect sockets that don't
|
|
||||||
# explicitly enlarge their buffers.
|
|
||||||
net.core.rmem_max = 8388608
|
|
||||||
net.core.wmem_max = 8388608
|
|
||||||
net.core.rmem_default = 524288
|
|
||||||
net.core.wmem_default = 524288
|
|
||||||
|
|
||||||
# Kernel softirq UDP path: the per-CPU backlog queue starts dropping packets
|
|
||||||
# at the default 1000 under multi-instance burst; 5000 absorbs realistic peaks.
|
|
||||||
# netdev_budget = 600 gives softirq more drain headroom per pass.
|
|
||||||
net.core.netdev_max_backlog = 5000
|
|
||||||
net.core.netdev_budget = 600
|
|
||||||
|
|
||||||
# Latency-sensitive default: avoid swap unless the box is really under
|
|
||||||
# pressure. Harmless on swapless hosts.
|
|
||||||
vm.swappiness = 10
|
|
||||||
```
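For the operator-side check that these values are actually live after a deploy, the drop-in can be parsed and mapped onto `/proc/sys`. This is a minimal sketch under stated assumptions: `parse_sysctl_conf` and `proc_path` are hypothetical helpers, and the dot-to-slash mapping is only valid for keys like the ones in this file (it would break for interface names that contain dots):

```python
from pathlib import Path


def parse_sysctl_conf(text: str) -> dict[str, str]:
    """Parse `key = value` lines, skipping blanks and # or ; comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def proc_path(key: str) -> Path:
    """Map a sysctl key such as net.core.rmem_max to its /proc/sys file."""
    return Path("/proc/sys") / key.replace(".", "/")


sample = """\
# UDP buffers
net.core.rmem_max = 8388608
net.core.netdev_budget = 600
vm.swappiness = 10
"""

wanted = parse_sysctl_conf(sample)
for key, value in wanted.items():
    print(f"{proc_path(key)} should contain {value}")
```

On a Linux host, comparing `proc_path(key).read_text().strip()` against each wanted value would confirm that `sysctl --system` applied the file.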
|
|
||||||
|
|
||||||
- [ ] **Step 3.4: Run the new test, verify it passes**
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_sysctl_conf_present_with_perf_settings -v`
|
|
||||||
Expected: PASS.
|
|
||||||
|
|
||||||
- [ ] **Step 3.5: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add deploy/files/etc/sysctl.d/99-left4me.conf deploy/tests/test_deploy_artifacts.py
|
|
||||||
git commit -m "$(cat <<'EOF'
|
|
||||||
feat(deploy): host sysctls for UDP buffers + netdev backlog/budget
|
|
||||||
|
|
||||||
99-left4me.conf: rmem_max/wmem_max=8M (with 512K defaults),
|
|
||||||
netdev_max_backlog=5000, netdev_budget=600, vm.swappiness=10.
|
|
||||||
EOF
|
|
||||||
)"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 4: Sandbox in Build Slice
|
|
||||||
|
|
||||||
Place the script-sandbox transient unit into `l4d2-build.slice` and give it `OOMScoreAdjust=500` so it dies first under memory pressure.
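Why `OOMScoreAdjust=500` makes the sandbox the preferred victim can be illustrated with a simplified model of the kernel's scoring. This is a sketch under loud assumptions: the real heuristic also counts swap and page tables and differs between kernel versions, `oom_badness` here is a hypothetical reimplementation, and the host size and memory figures are invented for illustration:

```python
PAGE_SIZE = 4096  # bytes; typical on x86_64, an assumption here


def oom_badness(rss_bytes: int, oom_score_adj: int, total_ram_bytes: int) -> int:
    """Simplified model of the per-process OOM score.

    Resident pages plus an adjustment proportional to total RAM. Swap and
    page-table usage are ignored, and exact behavior varies by kernel version.
    """
    total_pages = total_ram_bytes // PAGE_SIZE
    return rss_bytes // PAGE_SIZE + oom_score_adj * total_pages // 1000


GiB = 1024**3
host_ram = 8 * GiB  # hypothetical host

server = oom_badness(rss_bytes=3 * GiB // 2, oom_score_adj=-200, total_ram_bytes=host_ram)
sandbox = oom_badness(rss_bytes=GiB // 4, oom_score_adj=500, total_ram_bytes=host_ram)

# The sandbox scores higher despite using far less memory, so it is picked first.
assert sandbox > server
```

In this model each adjustment point is worth one thousandth of total RAM, so the 700-point gap between the two units outweighs any realistic difference in resident memory.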
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`
|
|
||||||
- Test: `deploy/tests/test_deploy_artifacts.py` (new test function)
|
|
||||||
|
|
||||||
- [ ] **Step 4.1: Add the failing test**
|
|
||||||
|
|
||||||
Open `deploy/tests/test_deploy_artifacts.py`. Append:
|
|
||||||
|
|
||||||
```python
def test_script_sandbox_in_build_slice_with_oom_adjust():
    text = SCRIPT_SANDBOX_HELPER.read_text()

    # Put the transient unit in the low-weight build slice so it yields to
    # game-server instances under CPU/IO contention.
    assert "--slice=l4d2-build.slice" in text

    # Sandbox dies first if the host hits memory pressure; servers
    # (OOMScoreAdjust=-200) survive.
    assert "-p OOMScoreAdjust=500" in text
```
|
|
||||||
|
|
||||||
- [ ] **Step 4.2: Run the new test, verify it fails**
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_script_sandbox_in_build_slice_with_oom_adjust -v`
|
|
||||||
Expected: FAIL — neither string is in the helper yet.
|
|
||||||
|
|
||||||
- [ ] **Step 4.3: Edit the sandbox helper**
|
|
||||||
|
|
||||||
Open `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`. Locate the `systemd-run` invocation that begins with:
|
|
||||||
|
|
||||||
```
|
|
||||||
systemd-run --quiet --collect --wait --pipe \
|
|
||||||
--unit="left4me-script-${OVERLAY_ID}-$$" \
|
|
||||||
```
|
|
||||||
|
|
||||||
Insert two new lines immediately after the `--unit=` line, before `-p User=l4d2-sandbox`. The block becomes:
|
|
||||||
|
|
||||||
```
|
|
||||||
systemd-run --quiet --collect --wait --pipe \
|
|
||||||
--unit="left4me-script-${OVERLAY_ID}-$$" \
|
|
||||||
--slice=l4d2-build.slice \
|
|
||||||
-p OOMScoreAdjust=500 \
|
|
||||||
-p User=l4d2-sandbox -p Group=l4d2-sandbox \
|
|
||||||
```
|
|
||||||
|
|
||||||
Leave every other `-p` line untouched.
|
|
||||||
|
|
||||||
- [ ] **Step 4.4: Verify shell syntax still parses**
|
|
||||||
|
|
||||||
Run: `bash -n deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`
|
|
||||||
Expected: exit 0, no output.
|
|
||||||
|
|
||||||
- [ ] **Step 4.5: Run the new test and the existing sandbox-helper tests, verify they pass**
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_script_sandbox_in_build_slice_with_oom_adjust deploy/tests/test_deploy_artifacts.py::test_script_sandbox_helper_invokes_systemd_run_with_hardening deploy/tests/test_deploy_artifacts.py::test_script_sandbox_helper_passes_shell_syntax_check -v`
|
|
||||||
Expected: PASS for all three. The hardening test still matches because it only checks for substring presence; we added strings, didn't remove any.
|
|
||||||
|
|
||||||
- [ ] **Step 4.6: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add deploy/files/usr/local/libexec/left4me/left4me-script-sandbox deploy/tests/test_deploy_artifacts.py
|
|
||||||
git commit -m "$(cat <<'EOF'
|
|
||||||
feat(deploy): script-sandbox runs in l4d2-build.slice + OOMScoreAdjust=500
|
|
||||||
|
|
||||||
Builds yield CPU/IO to game-server instances under contention via the
|
|
||||||
slice's weight=10, and are killed first under memory pressure
|
|
||||||
(servers have OOMScoreAdjust=-200).
|
|
||||||
EOF
|
|
||||||
)"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 5: Deploy Script Installs Slice + Sysctl Artifacts
|
|
||||||
|
|
||||||
Wire the new artifacts into `deploy-test-server.sh` so a fresh deploy actually puts them on disk and applies the sysctls.
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `deploy/deploy-test-server.sh`
|
|
||||||
- Test: `deploy/tests/test_deploy_artifacts.py` (new test function)
|
|
||||||
|
|
||||||
- [ ] **Step 5.1: Add the failing test**
|
|
||||||
|
|
||||||
Open `deploy/tests/test_deploy_artifacts.py`. Append:
|
|
||||||
|
|
||||||
```python
def test_deploy_script_installs_perf_artifacts():
    script = DEPLOY_SCRIPT.read_text()

    # Slice files copied into the system-wide systemd unit dir.
    assert "/usr/local/lib/systemd/system/l4d2-game.slice" in script
    assert "/usr/local/lib/systemd/system/l4d2-build.slice" in script

    # Sysctl drop-in installed under /etc/sysctl.d/.
    assert "/etc/sysctl.d/99-left4me.conf" in script

    # Values applied immediately, not on next boot.
    assert "sysctl --system" in script
```
|
|
||||||
|
|
||||||
- [ ] **Step 5.2: Run the new test, verify it fails**
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_deploy_script_installs_perf_artifacts -v`
|
|
||||||
Expected: FAIL on the first assertion.
|
|
||||||
|
|
||||||
- [ ] **Step 5.3: Edit the deploy script — copy the slice + sysctl files**
|
|
||||||
|
|
||||||
Open `deploy/deploy-test-server.sh`. Find the block that copies unit files (currently around line 138):
|
|
||||||
|
|
||||||
```sh
|
|
||||||
$sudo_cmd cp /opt/left4me/deploy/files/usr/local/lib/systemd/system/left4me-web.service /usr/local/lib/systemd/system/left4me-web.service
|
|
||||||
$sudo_cmd cp /opt/left4me/deploy/files/usr/local/lib/systemd/system/left4me-server@.service /usr/local/lib/systemd/system/left4me-server@.service
|
|
||||||
```
|
|
||||||
|
|
||||||
Add two new lines immediately after the `left4me-server@.service` copy line, so the block becomes:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
$sudo_cmd cp /opt/left4me/deploy/files/usr/local/lib/systemd/system/left4me-web.service /usr/local/lib/systemd/system/left4me-web.service
|
|
||||||
$sudo_cmd cp /opt/left4me/deploy/files/usr/local/lib/systemd/system/left4me-server@.service /usr/local/lib/systemd/system/left4me-server@.service
|
|
||||||
$sudo_cmd cp /opt/left4me/deploy/files/usr/local/lib/systemd/system/l4d2-game.slice /usr/local/lib/systemd/system/l4d2-game.slice
|
|
||||||
$sudo_cmd cp /opt/left4me/deploy/files/usr/local/lib/systemd/system/l4d2-build.slice /usr/local/lib/systemd/system/l4d2-build.slice
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 5.4: Edit the deploy script — install the sysctl conf and apply it**
|
|
||||||
|
|
||||||
In `deploy/deploy-test-server.sh`, find the block that installs `/etc/left4me/sandbox-resolv.conf` (currently around lines 153–155):
|
|
||||||
|
|
||||||
```sh
|
|
||||||
$sudo_cmd install -m 0644 -o root -g root \
|
|
||||||
/opt/left4me/deploy/files/etc/left4me/sandbox-resolv.conf \
|
|
||||||
/etc/left4me/sandbox-resolv.conf
|
|
||||||
```
|
|
||||||
|
|
||||||
Immediately after that block, add:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
# Host perf-baseline sysctls. Apply with `sysctl --system` so values
|
|
||||||
# take effect this deploy, not on next reboot.
|
|
||||||
$sudo_cmd install -m 0644 -o root -g root \
|
|
||||||
/opt/left4me/deploy/files/etc/sysctl.d/99-left4me.conf \
|
|
||||||
/etc/sysctl.d/99-left4me.conf
|
|
||||||
$sudo_cmd sysctl --system >/dev/null
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 5.5: Verify the deploy script's shell syntax still parses**
|
|
||||||
|
|
||||||
Run: `sh -n deploy/deploy-test-server.sh`
|
|
||||||
Expected: exit 0, no output.
|
|
||||||
|
|
||||||
- [ ] **Step 5.6: Run the new test and the existing deploy-script tests, verify they pass**
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_deploy_script_installs_perf_artifacts deploy/tests/test_deploy_artifacts.py::test_deploy_script_has_safe_defaults_and_preserves_state deploy/tests/test_deploy_artifacts.py::test_deploy_script_shell_syntax -v`
|
|
||||||
Expected: PASS for all three.
|
|
||||||
|
|
||||||
- [ ] **Step 5.7: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add deploy/deploy-test-server.sh deploy/tests/test_deploy_artifacts.py
|
|
||||||
git commit -m "$(cat <<'EOF'
|
|
||||||
feat(deploy): install slice + sysctl artifacts and apply via sysctl --system
|
|
||||||
|
|
||||||
Copies l4d2-game.slice and l4d2-build.slice into
|
|
||||||
/usr/local/lib/systemd/system/, installs 99-left4me.conf into
|
|
||||||
/etc/sysctl.d/, and runs sysctl --system so the perf baseline is
|
|
||||||
live this deploy, not on next reboot.
|
|
||||||
EOF
|
|
||||||
)"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 6: Performance-Tuning Section in deploy/README.md
|
|
||||||
|
|
||||||
Document the four escape hatches the spec lists as opt-in: CPU governor, per-instance `CPUAffinity`, NIC tuning, and SCHED_FIFO.
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `deploy/README.md`
|
|
||||||
|
|
||||||
No test for this task — README content is documentation, not contract.
|
|
||||||
|
|
||||||
- [ ] **Step 6.1: Append the Performance Tuning section**
|
|
||||||
|
|
||||||
Open `deploy/README.md`. Append (after the existing final paragraph) a new section:
|
|
||||||
|
|
||||||
````markdown
## Performance Tuning

The deployment ships a host-side perf baseline (slices, unit directives, sysctls). See `docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md` for design rationale.

The following knobs are documented escape hatches; they are **not** auto-applied. Apply only if you have measured a need and understand the failure modes.

### CPU governor

The performance governor squeezes a few percent off jitter under bursty load. `schedutil` is acceptable for sustained UDP workloads.

```sh
sudo cpupower frequency-set -g performance
```

Persist via your distro's CPU-frequency tooling (e.g. `/etc/default/cpufrequtils`).

### Per-instance CPU affinity

`srcds` is single-threaded per instance. On a multi-core host, pinning each instance to its own core can cut jitter under contention. Drop in `/etc/systemd/system/left4me-server@<name>.service.d/affinity.conf`:

```ini
[Service]
CPUAffinity=2
```

A reasonable strategy on an N-core host: leave core 0 for the kernel + IRQs + system services, then pin one instance per remaining core.

### NIC tuning

Hardware-specific. On a host with a single primary interface (replace `eth0`):

```sh
sudo ethtool -G eth0 rx 4096 tx 4096
sudo ethtool -K eth0 gro on lro off
```

If you run a high instance count, also pin the NIC's interrupts off the cores that game servers occupy (see `/proc/interrupts` and `/proc/irq/<n>/smp_affinity`).

### Real-time scheduling (advanced, opt-in)

Source-engine servers do not need real-time scheduling, and a misbehaving `srcds` at any RT priority can starve kernel threads, even with the default `kernel.sched_rt_runtime_us=950000`, which reserves only 5% of CPU time for non-RT tasks. Use only if you have a measured jitter problem that the baseline does not solve.

`/etc/systemd/system/left4me-server@.service.d/realtime.conf`:

```ini
[Service]
CPUSchedulingPolicy=fifo
CPUSchedulingPriority=10
LimitRTPRIO=10
```

### Applying changes to running servers

Unit-file changes do not apply to already-running services. After any change:

```sh
sudo systemctl daemon-reload
# Restart each game server via the web UI's stop + start, or:
sudo systemctl restart 'left4me-server@*.service'
```
````
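The affinity strategy described in the README text (reserve core 0, then one instance per remaining core) can be sketched as a small planner. This is illustrative only and nothing in the plan auto-applies it: `affinity_plan` is a hypothetical helper, and the instance names are made up.

```python
def affinity_plan(instances: list[str], n_cores: int, reserved: int = 1) -> dict[str, str]:
    """Map each instance to a drop-in path and its contents.

    Cores 0..reserved-1 are left for the kernel, IRQs, and system services;
    each instance gets exactly one of the remaining cores.
    """
    usable = list(range(reserved, n_cores))
    if len(instances) > len(usable):
        raise ValueError(
            f"{len(instances)} instances but only {len(usable)} unreserved cores"
        )
    return {
        f"/etc/systemd/system/left4me-server@{name}.service.d/affinity.conf": (
            f"[Service]\nCPUAffinity={core}\n"
        )
        for name, core in zip(instances, usable)
    }


plan = affinity_plan(["alpha", "bravo"], n_cores=4)
for path, body in plan.items():
    print(path)
    print(body)
```

Raising when instances outnumber free cores is one possible policy; sharing cores round-robin would be another, at the cost of the isolation that pinning is meant to buy.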
|
|
||||||
|
|
||||||
- [ ] **Step 6.2: Run the full deploy test suite and verify it stays green**
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py -q`
|
|
||||||
Expected: all green. README changes have no test, but should not break any existing tests.
|
|
||||||
|
|
||||||
- [ ] **Step 6.3: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add deploy/README.md
|
|
||||||
git commit -m "$(cat <<'EOF'
|
|
||||||
docs(deploy): performance-tuning escape-hatch section in README
|
|
||||||
|
|
||||||
Documents CPU governor, per-instance CPUAffinity, NIC tuning, and
|
|
||||||
SCHED_FIFO opt-in patterns. None of these are auto-applied; they're
|
|
||||||
ops-side knobs for measured problems the perf baseline doesn't solve.
|
|
||||||
EOF
|
|
||||||
)"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Final Verification
|
|
||||||
|
|
||||||
- [ ] **Step F.1: Full deploy test suite green**
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/ -q`
|
|
||||||
Expected: all green.
|
|
||||||
|
|
||||||
- [ ] **Step F.2: Host library + web tests still green (regression check)**
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest l4d2host/tests -q && pytest l4d2web/tests -q`
|
|
||||||
Expected: all green. Nothing in this plan touches host or web Python code, but a clean run rules out accidental import-time damage.
|
|
||||||
|
|
||||||
- [ ] **Step F.3: Working tree clean and commits in order**
|
|
||||||
|
|
||||||
Run: `git status && git log --oneline -8`
|
|
||||||
Expected:
|
|
||||||
- `git status`: `nothing to commit, working tree clean`.
|
|
||||||
- `git log`: six new commits in this order, top-most first:
|
|
||||||
1. `docs(deploy): performance-tuning escape-hatch section in README`
|
|
||||||
2. `feat(deploy): install slice + sysctl artifacts and apply via sysctl --system`
|
|
||||||
3. `feat(deploy): script-sandbox runs in l4d2-build.slice + OOMScoreAdjust=500`
|
|
||||||
4. `feat(deploy): host sysctls for UDP buffers + netdev backlog/budget`
|
|
||||||
5. `feat(deploy): l4d2-game.slice + l4d2-build.slice with 100:1 weight ratio`
|
|
||||||
6. `feat(deploy): perf-baseline directives on left4me-server@.service`
|
|
||||||
|
|
||||||
If any commit is missing or out of order, do not amend; diagnose, fix, and create new commits.
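The ordering check in Step F.3 can also be done mechanically rather than by eye. This is a small sketch that assumes `git log --oneline` output of the form `<abbrev-hash> <subject>`; `commits_in_order` is a hypothetical helper, and the hashes in the sample are placeholders:

```python
EXPECTED_SUBJECTS = [
    "docs(deploy): performance-tuning escape-hatch section in README",
    "feat(deploy): install slice + sysctl artifacts and apply via sysctl --system",
    "feat(deploy): script-sandbox runs in l4d2-build.slice + OOMScoreAdjust=500",
    "feat(deploy): host sysctls for UDP buffers + netdev backlog/budget",
    "feat(deploy): l4d2-game.slice + l4d2-build.slice with 100:1 weight ratio",
    "feat(deploy): perf-baseline directives on left4me-server@.service",
]


def subjects(oneline_log: str) -> list[str]:
    """Strip the abbreviated hash from each `git log --oneline` line."""
    return [line.split(" ", 1)[1] for line in oneline_log.splitlines() if " " in line]


def commits_in_order(oneline_log: str, expected: list[str] = EXPECTED_SUBJECTS) -> bool:
    """True when the newest commits match `expected`, top-most first."""
    return subjects(oneline_log)[: len(expected)] == expected


# Placeholder hashes; real output would come from `git log --oneline -8`.
sample_log = "\n".join(
    f"{index:07x} {subject}" for index, subject in enumerate(EXPECTED_SUBJECTS)
)
assert commits_in_order(sample_log)
```

Feeding it the real log is a matter of capturing the command's standard output, for example with `subprocess.run(["git", "log", "--oneline", "-8"], capture_output=True, text=True, check=True).stdout`.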
|
|
||||||
|
|
||||||
- [ ] **Step F.4: Manual deploy smoke test (deferred, ops-side)**
|
|
||||||
|
|
||||||
This plan ships artifacts. Confirming that systemd actually accepts and applies them on a real host requires running the deploy script against a test target. That validation is operator-side, not part of this implementation:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
deploy/deploy-test-server.sh deploy-user@example-host
|
|
||||||
ssh deploy-user@example-host 'systemctl cat l4d2-game.slice'
|
|
||||||
ssh deploy-user@example-host 'sysctl net.core.rmem_max' # expect 8388608
|
|
||||||
ssh deploy-user@example-host 'systemd-analyze verify /usr/local/lib/systemd/system/left4me-server@.service'
|
|
||||||
```
|
|
||||||
|
|
||||||
Document any deploy-time problems back into the spec or this plan as v1.x corrections. Do not invent fixes that go beyond the spec.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Out of Scope (do NOT implement here)
|
|
||||||
|
|
||||||
Listed in the spec — repeated for clarity:
|
|
||||||
|
|
||||||
- ConVars / blueprint arguments / tickrate / sv_minrate.
|
|
||||||
- SCHED_FIFO auto-apply.
|
|
||||||
- CPU governor auto-apply.
|
|
||||||
- Per-instance `CPUAffinity` auto-apply.
|
|
||||||
- NIC ring-buffer / IRQ-pinning code.
|
|
||||||
- Job-scheduler awareness ("don't build while server X has players").
|
|
||||||
- Hardening tightening (`ProtectKernelTunables=yes`, etc.).
|
|
||||||
|
|
||||||
If you find yourself touching any of these, stop — they belong in a separate spec.
|
|
||||||
|
|
@ -1,584 +0,0 @@
|
||||||
# L4D2 Server Lifecycle: Reboot-Safe + Drift Reconciliation Implementation Plan
|
|
||||||
|
|
||||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
|
||||||
|
|
||||||
**Goal:** Make L4D2 server instances survive a host reboot (Part A) and converge `Server.actual_state` to systemd reality every ~30s for out-of-band drift (Part B).
|
|
||||||
|
|
||||||
**Architecture:** Helper script + `service_control.py` switch from `systemctl start/stop` to `systemctl enable --now / disable --now`. A new background thread spawned with the job workers polls every server's status periodically and writes the result via the existing `refresh_server_actual_state()` path. Skip servers with in-flight jobs to avoid racing with the post-job refresh.
|
|
||||||
|
|
||||||
**Tech Stack:** bash helper script + sudoers; Python `subprocess` via `l4d2host.service_control.systemctl_command`; SQLAlchemy via `session_scope()`; threading; pytest.
|
|
||||||
|
|
||||||
**Spec:** `docs/superpowers/specs/2026-05-09-l4d2-server-lifecycle-reboot-and-drift-design.md`
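The poller described under **Architecture** can be sketched roughly as follows. The function names match the ones this plan adds to `job_worker.py`, but the signatures are assumptions made for illustration: the real code goes through `session_scope()` and `refresh_server_actual_state()`, which are replaced here by plain callables so the sketch stays self-contained.

```python
import logging
import threading
from typing import Callable, Iterable

log = logging.getLogger(__name__)


def poll_all_servers(
    server_ids: Iterable[int],
    has_inflight_job: Callable[[int], bool],
    refresh: Callable[[int], None],
) -> list[int]:
    """Refresh every server without an in-flight job; return the refreshed ids."""
    refreshed = []
    for server_id in server_ids:
        if has_inflight_job(server_id):
            # The job's own post-job refresh will write the state; avoid racing it.
            continue
        refresh(server_id)
        refreshed.append(server_id)
    return refreshed


def state_poller_loop(
    stop: threading.Event, interval: float, poll_once: Callable[[], object]
) -> None:
    """Call poll_once every `interval` seconds until `stop` is set."""
    # Event.wait returns False on timeout (keep polling) and True once stop is set.
    while not stop.wait(interval):
        try:
            poll_once()
        except Exception:
            log.exception("state poll failed; retrying next interval")


seen: list[int] = []
print(poll_all_servers([1, 2, 3], lambda sid: sid == 2, seen.append))
```

Waiting on a `threading.Event` instead of calling `time.sleep` lets shutdown interrupt the interval immediately, which also keeps tests fast.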
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## File Structure
|
|
||||||
|
|
||||||
Files to modify (Part A — lifecycle verb change):
|
|
||||||
|
|
||||||
- `deploy/files/usr/local/libexec/left4me/left4me-systemctl` — accept verbs `enable`/`disable`/`show` (drop `start`/`stop`).
|
|
||||||
- `l4d2host/service_control.py` — rename `start_service` → `enable_service`, `stop_service` → `disable_service`. Action tokens become `"enable"` / `"disable"`.
|
|
||||||
- `l4d2host/instances.py` — call `enable_service` from `start_instance`; call `disable_service` from `stop_instance` and `_purge_instance`.
|
|
||||||
- `l4d2host/tests/test_lifecycle.py` — update mock-call expectations.
|
|
||||||
- `l4d2host/tests/test_service_control.py` — new file with direct unit tests for `enable_service` / `disable_service`.
|
|
||||||
- `deploy/tests/test_deploy_artifacts.py::test_systemctl_helper_passes_shell_syntax_check_and_rejects_bad_args` — update the verb assertions.
|
|
||||||
|
|
||||||
Files to modify (Part B — poller):
|
|
||||||
|
|
||||||
- `l4d2web/services/job_worker.py` — add `start_state_poller`, `state_poller_loop`, `poll_all_servers`.
|
|
||||||
- `l4d2web/app.py` — call `start_state_poller(app)` next to `start_job_workers(app)`.
|
|
||||||
- `l4d2web/config.py` — default `STATE_POLLER_INTERVAL_SECONDS = 30`.
|
|
||||||
- `l4d2web/tests/test_job_worker.py` — four new tests for the poller.
|
|
||||||
|
|
||||||
No host-library, web-app facade, or CLI surface signatures change. The `l4d2ctl start <name>` / `l4d2ctl stop <name>` commands keep their names (per `AGENTS.md`).
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Pre-flight
|
|
||||||
|
|
||||||
- [ ] **Step 0a: Verify clean working tree**
|
|
||||||
|
|
||||||
Run: `git status`
|
|
||||||
Expected: `nothing to commit, working tree clean`
|
|
||||||
|
|
||||||
- [ ] **Step 0b: Verify the existing test suite is at the known-good baseline**
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/ l4d2host/tests l4d2web/tests -q`
|
|
||||||
Expected: 460 passed, 1 failed (the pre-existing unrelated `test_deploy_script_has_safe_defaults_and_preserves_state`), 2 skipped.
|
|
||||||
|
|
||||||
If the counts differ, stop and report the discrepancy; this plan assumes that exact baseline.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 1: Part A — Switch lifecycle verbs to `enable --now` / `disable --now`
|
|
||||||
|
|
||||||
This task changes the helper script, the Python wrapper, and the instance lifecycle in one cohesive commit. The change is end-to-end vertical — splitting it across commits would leave broken intermediate states (helper accepting verbs that no caller uses, or callers using verbs the helper rejects).
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `deploy/files/usr/local/libexec/left4me/left4me-systemctl`
|
|
||||||
- Modify: `l4d2host/service_control.py`
|
|
||||||
- Modify: `l4d2host/instances.py`
|
|
||||||
- Modify: `l4d2host/tests/test_lifecycle.py`
|
|
||||||
- Create: `l4d2host/tests/test_service_control.py`
|
|
||||||
- Modify: `deploy/tests/test_deploy_artifacts.py`
|
|
||||||
|
|
||||||
### Step 1.1: Update the deploy artifact test for the helper
|
|
||||||
|
|
||||||
Open `deploy/tests/test_deploy_artifacts.py`. Find `test_systemctl_helper_passes_shell_syntax_check_and_rejects_bad_args`.
|
|
||||||
|
|
||||||
Replace the assertions that check the helper's case-statement bodies. Currently the test asserts something like:
|
|
||||||
|
|
||||||
```python
|
|
||||||
assert 'start) exec "$systemctl" start "$unit"' in script
|
|
||||||
assert 'stop) exec "$systemctl" stop "$unit"' in script
|
|
||||||
```
|
|
||||||
|
|
||||||
Update to:
|
|
||||||
|
|
||||||
```python
|
|
||||||
assert 'enable)' in script
|
|
||||||
assert 'enable --now' in script
|
|
||||||
assert 'disable)' in script
|
|
||||||
assert 'disable --now' in script
|
|
||||||
```
|
|
||||||
|
|
||||||
Keep the `--property=ActiveState` and `--property=SubState` assertions for the `show` action (unchanged).
|
|
||||||
|
|
||||||
The rejected-action examples list (currently includes things like `["bad/action", "alpha"]`) is unchanged — those are still bad. If the test currently asserts that `start` and `stop` are accepted (e.g., a positive case), drop those — `start`/`stop` are now rejected verbs, not accepted ones.
|
|
||||||
|
|
||||||
### Step 1.2: Run the updated artifact test to verify it fails
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_systemctl_helper_passes_shell_syntax_check_and_rejects_bad_args -v`
|
|
||||||
Expected: FAIL — the helper script still has `start)`/`stop)` cases, not `enable)`/`disable)`.
|
|
||||||
|
|
||||||
### Step 1.3: Edit the helper script
|
|
||||||
|
|
||||||
Open `deploy/files/usr/local/libexec/left4me/left4me-systemctl`. Find the case-statement (currently around lines 24–27). Replace:
|
|
||||||
|
|
||||||
```sh
case "$action" in
  start) exec "$systemctl" start "$unit" ;;
  stop) exec "$systemctl" stop "$unit" ;;
  show) exec "$systemctl" show "$unit" --property=ActiveState --property=SubState ;;
  *) ...
esac
```
|
|
||||||
|
|
||||||
with:
|
|
||||||
|
|
||||||
```sh
case "$action" in
  enable) exec "$systemctl" enable --now "$unit" ;;
  disable) exec "$systemctl" disable --now "$unit" ;;
  show) exec "$systemctl" show "$unit" --property=ActiveState --property=SubState ;;
  *) ...
esac
```
|
|
||||||
|
|
||||||
Keep the rest of the script (shebang, name validation, `*)` reject-and-exit branch) unchanged. The exact form of the `*)` reject case in the existing helper should be preserved.
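
To make the verb mapping concrete, here is a minimal Python model of the case statement above. It is only a sketch for reasoning about which argv each verb produces: `helper_argv` is a hypothetical name, the `systemctl` path is an assumption, and the real helper remains the shell script.

```python
SYSTEMCTL = "/usr/bin/systemctl"  # assumed path; the helper itself uses "$systemctl"


def helper_argv(action: str, unit: str) -> list[str]:
    """Return the systemctl argv the helper would exec for a given verb."""
    if action == "enable":
        return [SYSTEMCTL, "enable", "--now", unit]
    if action == "disable":
        return [SYSTEMCTL, "disable", "--now", unit]
    if action == "show":
        return [SYSTEMCTL, "show", unit,
                "--property=ActiveState", "--property=SubState"]
    # Mirrors the helper's `*)` branch: anything else, including the old
    # start/stop verbs, is rejected.
    raise ValueError(f"unsupported action: {action!r}")
```

Under this model `helper_argv("start", ...)` raises, which is the behaviour the updated artifact test expects from the real script.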
|
|
||||||
|
|
||||||
### Step 1.4: Verify the helper script still parses
|
|
||||||
|
|
||||||
Run: `sh -n deploy/files/usr/local/libexec/left4me/left4me-systemctl`
|
|
||||||
Expected: exit 0, no output.
|
|
||||||
|
|
||||||
### Step 1.5: Run the artifact test, verify it passes
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_systemctl_helper_passes_shell_syntax_check_and_rejects_bad_args -v`
|
|
||||||
Expected: PASS.
|
|
||||||
|
|
||||||
### Step 1.6: Update `service_control.py`
|
|
||||||
|
|
||||||
Open `l4d2host/service_control.py`. Replace:
|
|
||||||
|
|
||||||
```python
def start_service(
    name: str,
    *,
    on_stdout: Callable[[str], None] | None = None,
    on_stderr: Callable[[str], None] | None = None,
    passthrough: bool = False,
    should_cancel: Callable[[], bool] | None = None,
) -> CommandResult:
    return run_command(
        systemctl_command("start", name),
        on_stdout=on_stdout,
        on_stderr=on_stderr,
        passthrough=passthrough,
        should_cancel=should_cancel,
    )


def stop_service(
    name: str,
    *,
    on_stdout: Callable[[str], None] | None = None,
    on_stderr: Callable[[str], None] | None = None,
    passthrough: bool = False,
    should_cancel: Callable[[], bool] | None = None,
) -> CommandResult:
    return run_command(
        systemctl_command("stop", name),
        on_stdout=on_stdout,
        on_stderr=on_stderr,
        passthrough=passthrough,
        should_cancel=should_cancel,
    )
```
|
|
||||||
|
|
||||||
with:
|
|
||||||
|
|
||||||
```python
def enable_service(
    name: str,
    *,
    on_stdout: Callable[[str], None] | None = None,
    on_stderr: Callable[[str], None] | None = None,
    passthrough: bool = False,
    should_cancel: Callable[[], bool] | None = None,
) -> CommandResult:
    return run_command(
        systemctl_command("enable", name),
        on_stdout=on_stdout,
        on_stderr=on_stderr,
        passthrough=passthrough,
        should_cancel=should_cancel,
    )


def disable_service(
    name: str,
    *,
    on_stdout: Callable[[str], None] | None = None,
    on_stderr: Callable[[str], None] | None = None,
    passthrough: bool = False,
    should_cancel: Callable[[], bool] | None = None,
) -> CommandResult:
    return run_command(
        systemctl_command("disable", name),
        on_stdout=on_stdout,
        on_stderr=on_stderr,
        passthrough=passthrough,
        should_cancel=should_cancel,
    )
```
|
|
||||||
|
|
||||||
`show_service`, `stream_command`, `stream_journal`, and the `systemctl_command` / `journalctl_command` helpers are unchanged.
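
The plan does not show the body of `systemctl_command`, but the tests in this plan expect the argv `["sudo", "-n", SYSTEMCTL_HELPER, <action>, <name>]`. The following is a sketch of a builder that would satisfy that expectation. The action whitelist (`ALLOWED_ACTIONS`) is a hypothetical extra, not something the spec requires, and the helper path is taken from the test expectations quoted in this plan.

```python
SYSTEMCTL_HELPER = "/usr/local/libexec/left4me/left4me-systemctl"

# Hypothetical guard: keep the Python side in step with the helper's verbs.
ALLOWED_ACTIONS = ("enable", "disable", "show")


def systemctl_command(action: str, name: str) -> list[str]:
    """Build the sudo argv that invokes the privileged helper."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported helper action: {action!r}")
    return ["sudo", "-n", SYSTEMCTL_HELPER, action, name]
```

With a guard like this, a call site that still passes `"start"` or `"stop"` fails in Python rather than being rejected later by the helper.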
|
|
||||||
|
|
||||||
### Step 1.7: Update `instances.py` to call the new names
|
|
||||||
|
|
||||||
Open `l4d2host/instances.py`. Replace the import:
|
|
||||||
|
|
||||||
```python
|
|
||||||
from l4d2host.service_control import start_service, stop_service
|
|
||||||
```
|
|
||||||
|
|
||||||
with:
|
|
||||||
|
|
||||||
```python
|
|
||||||
from l4d2host.service_control import disable_service, enable_service
|
|
||||||
```
|
|
||||||
|
|
||||||
Inside `start_instance`, find the `start_service(...)` call (around line 137 in current source) and replace with `enable_service(...)`. Inside `stop_instance` (line 159) and `_purge_instance` (line 194), replace `stop_service(...)` with `disable_service(...)`. Keep all keyword arguments identical — only the function name changes.
|
|
||||||
|
|
||||||
### Step 1.8: Update `test_lifecycle.py`
|
|
||||||
|
|
||||||
Open `l4d2host/tests/test_lifecycle.py`. Search for every assertion that references the `start` or `stop` action token in mock-call expectations against `service_control.run_command` or `systemctl_command`. The tests typically look for argument lists like `["sudo", "-n", "/usr/local/libexec/left4me/left4me-systemctl", "start", "<name>"]`.
|
|
||||||
|
|
||||||
Update each occurrence:
|
|
||||||
- `"start"` → `"enable"` (in the `start_instance` test paths)
|
|
||||||
- `"stop"` → `"disable"` (in the `stop_instance`, `delete_instance`, `reset_instance`, and `_purge_instance` test paths)
|
|
||||||
|
|
||||||
Some tests may import `start_service` / `stop_service` directly. Update those imports to `enable_service` / `disable_service`.
|
|
||||||
|
|
||||||
### Step 1.9: Create direct unit tests for `enable_service` / `disable_service`
|
|
||||||
|
|
||||||
Create `l4d2host/tests/test_service_control.py` with:
|
|
||||||
|
|
||||||
```python
from unittest.mock import patch

from l4d2host.service_control import (
    SYSTEMCTL_HELPER,
    disable_service,
    enable_service,
)


@patch("l4d2host.service_control.run_command")
def test_enable_service_invokes_helper_with_enable_action(mock_run):
    enable_service("instance-7")
    args, _ = mock_run.call_args
    assert args[0] == ["sudo", "-n", SYSTEMCTL_HELPER, "enable", "instance-7"]


@patch("l4d2host.service_control.run_command")
def test_disable_service_invokes_helper_with_disable_action(mock_run):
    disable_service("instance-7")
    args, _ = mock_run.call_args
    assert args[0] == ["sudo", "-n", SYSTEMCTL_HELPER, "disable", "instance-7"]
```
|
|
||||||
|
|
||||||
### Step 1.10: Run the host-library tests
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest l4d2host/tests -q`
|
|
||||||
Expected: all green (110 or 111 passing depending on whether `test_service_control.py` already existed; `+2` from the new direct tests).
|
|
||||||
|
|
||||||
If anything is red, fix the test expectations, not the implementation; the implementation matches the spec exactly. The most likely failure mode is a test in `test_lifecycle.py` that you missed updating: search for any remaining string literal `"start"` or `"stop"` in helper-arg-list contexts.
|
|
||||||
|
|
||||||
### Step 1.11: Run the deploy artifact test suite
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/ -q`
|
|
||||||
Expected: 36 passed, 1 failed (the pre-existing unrelated test).
|
|
||||||
|
|
||||||
### Step 1.12: Commit
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add deploy/files/usr/local/libexec/left4me/left4me-systemctl \
|
|
||||||
l4d2host/service_control.py l4d2host/instances.py \
|
|
||||||
l4d2host/tests/test_lifecycle.py \
|
|
||||||
l4d2host/tests/test_service_control.py \
|
|
||||||
deploy/tests/test_deploy_artifacts.py
|
|
||||||
git commit -m "$(cat <<'EOF'
|
|
||||||
feat(l4d2-host): server lifecycle uses systemctl enable --now / disable --now
|
|
||||||
|
|
||||||
Servers started via the web UI now create a WantedBy= symlink under
|
|
||||||
multi-user.target.wants/, so they auto-start on the next host reboot.
|
|
||||||
Helper verbs renamed start/stop -> enable/disable; service_control.py
|
|
||||||
renamed start_service/stop_service -> enable_service/disable_service.
|
|
||||||
The user-facing l4d2ctl start/stop commands keep their names per the
|
|
||||||
AGENTS.md contract — only the implementation changes. Spec:
|
|
||||||
docs/superpowers/specs/2026-05-09-l4d2-server-lifecycle-reboot-and-drift-design.md
|
|
||||||
EOF
|
|
||||||
)"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 2: Part B — Periodic state poller
|
|
||||||
|
|
||||||
This task adds the poller code, wires it into the Flask startup, exposes its config knob, and tests four behaviors. One cohesive commit.
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/services/job_worker.py`
|
|
||||||
- Modify: `l4d2web/app.py`
|
|
||||||
- Modify: `l4d2web/config.py`
|
|
||||||
- Modify: `l4d2web/tests/test_job_worker.py`
|
|
||||||
|
|
||||||
### Step 2.1: Add the failing tests
|
|
||||||
|
|
||||||
Open `l4d2web/tests/test_job_worker.py`. Append after the existing tests:
|
|
||||||
|
|
||||||
```python
def test_state_poller_refreshes_each_server(app, monkeypatch):
    from l4d2web.services import job_worker as jw

    with app.app_context():
        from l4d2web.db import session_scope
        from l4d2web.models import Server
        with session_scope() as db:
            db.add_all([
                Server(id=11, name="alpha", port=27015, blueprint_id=None,
                       desired_state="running", actual_state="unknown"),
                Server(id=12, name="beta", port=27016, blueprint_id=None,
                       desired_state="running", actual_state="unknown"),
            ])

    refreshed = []
    monkeypatch.setattr(jw, "refresh_server_actual_state", lambda sid: refreshed.append(sid))

    with app.app_context():
        jw.poll_all_servers()

    assert sorted(refreshed) == [11, 12]


def test_state_poller_skips_servers_with_inflight_jobs(app, monkeypatch):
    from l4d2web.services import job_worker as jw

    with app.app_context():
        from l4d2web.db import session_scope
        from l4d2web.models import Job, Server
        with session_scope() as db:
            db.add(Server(id=21, name="gamma", port=27017, blueprint_id=None,
                          desired_state="running", actual_state="running"))
            db.add(Job(server_id=21, operation="stop", state="running"))

    refreshed = []
    monkeypatch.setattr(jw, "refresh_server_actual_state", lambda sid: refreshed.append(sid))

    with app.app_context():
        jw.poll_all_servers()

    assert refreshed == []


def test_state_poller_swallows_per_server_exceptions(app, monkeypatch):
    from l4d2web.services import job_worker as jw

    with app.app_context():
        from l4d2web.db import session_scope
        from l4d2web.models import Server
        with session_scope() as db:
            db.add_all([
                Server(id=31, name="bad", port=27018, blueprint_id=None,
                       desired_state="running", actual_state="unknown"),
                Server(id=32, name="good", port=27019, blueprint_id=None,
                       desired_state="running", actual_state="unknown"),
            ])

    refreshed = []

    def fake_refresh(sid):
        if sid == 31:
            raise RuntimeError("simulated host failure")
        refreshed.append(sid)

    monkeypatch.setattr(jw, "refresh_server_actual_state", fake_refresh)

    with app.app_context():
        jw.poll_all_servers()  # must not raise

    assert refreshed == [32]


def test_state_poller_disabled_when_job_workers_disabled(monkeypatch):
    """create_app must not spawn the poller thread when JOB_WORKER_ENABLED=False."""
    import threading

    from l4d2web.app import create_app

    spawned = []
    real_thread_init = threading.Thread.__init__

    def tracking_init(self, *args, **kwargs):
        if kwargs.get("name") == "left4me-state-poller":
            spawned.append(True)
        real_thread_init(self, *args, **kwargs)

    monkeypatch.setattr(threading.Thread, "__init__", tracking_init)
    create_app({"TESTING": True, "JOB_WORKER_ENABLED": False})
    assert not spawned
```
|
|
||||||
|
|
||||||
(The tests assume the existing `app` fixture from `conftest.py`. If your project uses a different fixture name, adjust accordingly. The polling tests run `poll_all_servers()` synchronously to avoid testing the loop's `time.sleep`.)
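
If the loop itself ever needs a test, one option is a variant that waits on a `threading.Event` instead of calling `time.sleep`, so a test can stop it deterministically. This is a sketch of that idea only; `run_poller` and its parameters are hypothetical and are not part of this plan's `state_poller_loop`.

```python
import threading
from typing import Callable


def run_poller(poll_once: Callable[[], None], interval: float,
               stop: threading.Event) -> int:
    """Call poll_once repeatedly until `stop` is set; return the pass count."""
    iterations = 0
    while not stop.is_set():
        try:
            poll_once()
        except Exception:
            # Same policy as the plan's loop: one bad pass must not kill it.
            pass
        iterations += 1
        # Event.wait doubles as an interruptible sleep.
        stop.wait(interval)
    return iterations
```

A test can pass `interval=0.0` and set the event from inside `poll_once` after a chosen number of passes.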
|
|
||||||
|
|
||||||
### Step 2.2: Run the new tests, verify they fail
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest l4d2web/tests/test_job_worker.py::test_state_poller_refreshes_each_server l4d2web/tests/test_job_worker.py::test_state_poller_skips_servers_with_inflight_jobs l4d2web/tests/test_job_worker.py::test_state_poller_swallows_per_server_exceptions l4d2web/tests/test_job_worker.py::test_state_poller_disabled_when_job_workers_disabled -v`
|
|
||||||
Expected: FAIL — `poll_all_servers` and `start_state_poller` don't exist yet.
|
|
||||||
|
|
||||||
### Step 2.3: Add the poller code to `job_worker.py`
|
|
||||||
|
|
||||||
Open `l4d2web/services/job_worker.py`. Add at the bottom of the file:
|
|
||||||
|
|
||||||
```python
def start_state_poller(app):
    interval = float(app.config.get("STATE_POLLER_INTERVAL_SECONDS", 30))
    thread = threading.Thread(
        target=state_poller_loop,
        args=(app, interval),
        daemon=True,
        name="left4me-state-poller",
    )
    thread.start()


def state_poller_loop(app, interval: float) -> None:
    while True:
        try:
            with app.app_context():
                poll_all_servers()
        except Exception:
            pass
        time.sleep(interval)


def poll_all_servers() -> None:
    with session_scope() as db:
        active_server_ids = set(db.scalars(
            select(Job.server_id).where(Job.state.in_(("queued", "running")))
        ).all())
        server_ids = [
            sid for sid in db.scalars(select(Server.id)).all()
            if sid not in active_server_ids
        ]
    for sid in server_ids:
        try:
            refresh_server_actual_state(sid)
        except Exception:
            pass
```
|
|
||||||
|
|
||||||
`Server`, `Job`, `select`, `session_scope`, `threading`, `time`, and `refresh_server_actual_state` should already be imported in this file. Verify by scanning the existing imports and add any that are missing (unlikely for `select`/`Server`/`Job`, since the worker uses them).
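
The two rules inside `poll_all_servers` (skip servers with in-flight jobs, and never let one server's failure stop the pass) can be modelled without a database. The sketch below uses plain lists. `select_pollable` and `refresh_all` are hypothetical names, and returning the failed ids is an addition for illustration; the plan's code simply swallows the exception.

```python
from typing import Callable, Iterable


def select_pollable(server_ids: Iterable[int], busy_ids: Iterable[int]) -> list[int]:
    """Drop servers that have a queued or running job."""
    busy = set(busy_ids)
    return [sid for sid in server_ids if sid not in busy]


def refresh_all(server_ids: Iterable[int],
                refresh: Callable[[int], None]) -> list[int]:
    """Refresh each server; collect the ids whose refresh raised."""
    failed = []
    for sid in server_ids:
        try:
            refresh(sid)
        except Exception:
            # Keep going: a broken server must not starve the others.
            failed.append(sid)
    return failed
```

Returning the failures keeps the loop quiet while still giving a caller something to log, should silent `pass` turn out to hide real problems.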
|
|
||||||
|
|
||||||
### Step 2.4: Wire the poller into `create_app`
|
|
||||||
|
|
||||||
Open `l4d2web/app.py`. Find the existing `start_job_workers(app)` call (around line 91, inside the `if should_start_workers:` block). Add `start_state_poller(app)` immediately after it:
|
|
||||||
|
|
||||||
```python
    if should_start_workers:
        recover_stale_jobs()
        start_job_workers(app)
        start_state_poller(app)
```
|
|
||||||
|
|
||||||
Also update the import:
|
|
||||||
|
|
||||||
```python
from l4d2web.services.job_worker import (
    recover_stale_jobs,
    start_job_workers,
    start_state_poller,
)
```
|
|
||||||
|
|
||||||
(If the existing import is single-line `from ... import recover_stale_jobs, start_job_workers`, just add `start_state_poller` to the list.)
|
|
||||||
|
|
||||||
### Step 2.5: Add the config default
|
|
||||||
|
|
||||||
Open `l4d2web/config.py`. Find the dict literal that contains other defaults like `JOB_WORKER_THREADS`, `PORT_RANGE_START`, etc. Add:
|
|
||||||
|
|
||||||
```python
|
|
||||||
"STATE_POLLER_INTERVAL_SECONDS": 30,
|
|
||||||
```
|
|
||||||
|
|
||||||
In the env-var-loading section (where `LEFT4ME_PORT_RANGE_START` etc. are read), add:
|
|
||||||
|
|
||||||
```python
|
|
||||||
"STATE_POLLER_INTERVAL_SECONDS": float(os.getenv("LEFT4ME_STATE_POLLER_INTERVAL_SECONDS", "30")),
|
|
||||||
```
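
A bare `float(os.getenv(...))` raises `ValueError` on a malformed value and accepts zero or negative intervals, which the poller loop is not written to handle. The following is a sketch of a more defensive parse, assuming that falling back to the default is acceptable; the spec does not say so, and `read_poller_interval` is a hypothetical helper.

```python
import math
import os
from typing import Mapping

DEFAULT_INTERVAL_SECONDS = 30.0


def read_poller_interval(environ: Mapping[str, str] = os.environ) -> float:
    """Parse LEFT4ME_STATE_POLLER_INTERVAL_SECONDS, falling back to the default."""
    raw = environ.get("LEFT4ME_STATE_POLLER_INTERVAL_SECONDS")
    if raw is None:
        return DEFAULT_INTERVAL_SECONDS
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_INTERVAL_SECONDS
    # Reject nan/inf and non-positive values: none is a usable sleep interval.
    if not math.isfinite(value) or value <= 0:
        return DEFAULT_INTERVAL_SECONDS
    return value
```

Whether to fall back quietly or fail fast at startup is a design choice; failing fast is equally defensible if operators prefer loud misconfiguration errors.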
|
|
||||||
|
|
||||||
### Step 2.6: Run the four new tests, verify they pass
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest l4d2web/tests/test_job_worker.py::test_state_poller_refreshes_each_server l4d2web/tests/test_job_worker.py::test_state_poller_skips_servers_with_inflight_jobs l4d2web/tests/test_job_worker.py::test_state_poller_swallows_per_server_exceptions l4d2web/tests/test_job_worker.py::test_state_poller_disabled_when_job_workers_disabled -v`
|
|
||||||
Expected: PASS for all four.
|
|
||||||
|
|
||||||
### Step 2.7: Run the full web test suite
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest l4d2web/tests -q`
|
|
||||||
Expected: 317 passed, 1 skipped (313 + 4 new tests).
|
|
||||||
|
|
||||||
### Step 2.8: Commit
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/services/job_worker.py l4d2web/app.py l4d2web/config.py l4d2web/tests/test_job_worker.py
|
|
||||||
git commit -m "$(cat <<'EOF'
|
|
||||||
feat(l4d2-web): periodic state poller refreshes Server.actual_state
|
|
||||||
|
|
||||||
A background thread spawned alongside the job workers polls every
|
|
||||||
server's status every STATE_POLLER_INTERVAL_SECONDS (default 30) and
|
|
||||||
writes the result via the existing refresh_server_actual_state path.
|
|
||||||
Servers with in-flight jobs are skipped to avoid racing the post-job
|
|
||||||
refresh. Catches reboot drift, OOM kills, manual systemctl operations,
|
|
||||||
and any other out-of-band state change. Spec:
|
|
||||||
docs/superpowers/specs/2026-05-09-l4d2-server-lifecycle-reboot-and-drift-design.md
|
|
||||||
EOF
|
|
||||||
)"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Final Verification
|
|
||||||
|
|
||||||
- [ ] **Step F.1: Full test sweep**
|
|
||||||
|
|
||||||
Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/ l4d2host/tests l4d2web/tests -q`
|
|
||||||
Expected: ~466 passed, 1 failed (the pre-existing unrelated `test_deploy_script_has_safe_defaults_and_preserves_state`), 2 skipped.
|
|
||||||
|
|
||||||
- [ ] **Step F.2: Working tree clean and commit shape**
|
|
||||||
|
|
||||||
Run: `git status && git log --oneline -5`
|
|
||||||
Expected:
|
|
||||||
- `git status`: clean.
|
|
||||||
- Top of `git log`:
|
|
||||||
1. `feat(l4d2-web): periodic state poller refreshes Server.actual_state`
|
|
||||||
2. `feat(l4d2-host): server lifecycle uses systemctl enable --now / disable --now`
|
|
||||||
3. `docs(plans): l4d2 server lifecycle reboot-and-drift — implementation plan`
|
|
||||||
4. `docs(specs): l4d2 server lifecycle reboot-and-drift — design`
|
|
||||||
|
|
||||||
- [ ] **Step F.3: Operator-side smoke test (deferred, not part of this plan)**
|
|
||||||
|
|
||||||
End-to-end on `ckn@10.0.4.128` after deploy:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
deploy/deploy-test-server.sh ckn@10.0.4.128
|
|
||||||
|
|
||||||
# Confirm the helper now drives enable/disable
|
|
||||||
ssh ckn@10.0.4.128 'cat /usr/local/libexec/left4me/left4me-systemctl | grep -E "enable|disable"'
|
|
||||||
# expect: enable) exec "$systemctl" enable --now "$unit"
|
|
||||||
# disable) exec "$systemctl" disable --now "$unit"
|
|
||||||
|
|
||||||
# Click "start" in the web UI for a server. Then:
|
|
||||||
ssh ckn@10.0.4.128 'systemctl is-enabled left4me-server@1.service'
|
|
||||||
# expect: enabled
|
|
||||||
|
|
||||||
# Reboot the host:
|
|
||||||
ssh ckn@10.0.4.128 'sudo systemctl reboot'
|
|
||||||
# wait for it to come back, then:
|
|
||||||
ssh ckn@10.0.4.128 'systemctl is-active left4me-server@1.service && pgrep -fa srcds'
|
|
||||||
# expect: active, srcds running with no UI intervention
|
|
||||||
|
|
||||||
# Confirm the poller corrects out-of-band drift
|
|
||||||
ssh ckn@10.0.4.128 'sudo systemctl disable --now left4me-server@1.service'
|
|
||||||
# Within ~30s the web UI's actual_state for server 1 flips from "running" to "stopped".
|
|
||||||
ssh ckn@10.0.4.128 'sudo -u left4me /opt/left4me/.venv/bin/python -c "
|
|
||||||
import sqlite3
|
|
||||||
c = sqlite3.connect(\"/var/lib/left4me/left4me.db\")
|
|
||||||
print(c.execute(\"SELECT id, actual_state, actual_state_updated_at FROM servers WHERE id=1\").fetchone())
|
|
||||||
"'
|
|
||||||
# expect: actual_state='stopped' with a fresh updated_at.
|
|
||||||
```
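
The "within ~30s" check above can be automated instead of eyeballed. Here is a sketch, assuming only the `servers` table and the `id` / `actual_state` columns used in the query above. `wait_for_state`, its timeout defaults, and the injectable `sleep` / `clock` parameters are hypothetical.

```python
import sqlite3
import time


def wait_for_state(conn: sqlite3.Connection, server_id: int, expected: str,
                   timeout: float = 45.0, poll: float = 1.0,
                   sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll servers.actual_state until it equals `expected` or time runs out."""
    deadline = clock() + timeout
    while True:
        row = conn.execute(
            "SELECT actual_state FROM servers WHERE id = ?", (server_id,)
        ).fetchone()
        if row is not None and row[0] == expected:
            return True
        if clock() >= deadline:
            return False
        sleep(poll)
```

A timeout somewhat above the poller interval leaves room for one full poll cycle plus the refresh itself.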
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Out of Scope (do NOT implement here)
|
|
||||||
|
|
||||||
- Auto-restart on `desired_state=running && actual_state=stopped`.
|
|
||||||
- UI banners for stale-state warnings.
|
|
||||||
- Reconciliation of orphan systemd units.
|
|
||||||
- Per-server poll intervals.
|
|
||||||
- Replacing `Restart=on-failure`.
|
|
||||||
- Touching the pre-existing red test (`test_deploy_script_has_safe_defaults_and_preserves_state`).
|
|
||||||
|
|
||||||
If you find yourself touching any of these, stop — they belong in a separate spec.
|
|
||||||
|
|
@ -1,161 +0,0 @@
|
||||||
# Overlay umount helper was pinning the unit's mount namespace alive
|
|
||||||
|
|
||||||
> **Status:** fixed in `5eac51a` (helper nsenter wrap) and `87d56a0`
|
|
||||||
> (modal delegation). This doc is a postmortem so future maintainers
|
|
||||||
> don't walk the same path.
|
|
||||||
|
|
||||||
## Symptom
|
|
||||||
|
|
||||||
After commit `936c8bb` ("ExecStart srcds_run from merged overlay,
|
|
||||||
not installation/"), every Reset job started failing:
|
|
||||||
|
|
||||||
```
OSError: [Errno 16] Device or resource busy:
'/var/lib/left4me/runtime/<id>/merged'
```
|
|
||||||
|
|
||||||
`shutil.rmtree(runtime_dir)` in `_purge_instance` tripped on the
|
|
||||||
still-mounted `merged/`. The unit's `ExecStopPost` had run the umount
|
|
||||||
helper, the helper had returned non-zero, the unit went `failed`, and
|
|
||||||
the rmtree downstream couldn't proceed.
|
|
||||||
|
|
||||||
## False starts (don't repeat these)
|
|
||||||
|
|
||||||
We initially modeled this as an unavoidable kernel-level race between
|
|
||||||
ExecStopPost and the deferred reaping of the unit's per-service mount
|
|
||||||
namespace. The "fixes" applied in that frame:
|
|
||||||
|
|
||||||
1. **Eager-retry loop in `cmd_umount`** (started at 4 s deadline,
|
|
||||||
bumped to 12 s, then 25 s). Each bump worked sometimes and broke
|
|
||||||
sometimes — because we were timing the helper's own life, not the
|
|
||||||
kernel's reaping (see root cause).
|
|
||||||
2. **Lazy-umount (`umount -l`) fallback** if eager retries exhausted.
|
|
||||||
This *would* have made the unit not go `failed`, but it left
|
|
||||||
`work/work` half-finalized and just moved the EBUSY downstream.
|
|
||||||
3. **`TimeoutStopSec=15s` → `60s`** to give ExecStopPost more retry
|
|
||||||
room. This made Stop sit in "stopping" for tens of seconds.
|
|
||||||
|
|
||||||
All three workarounds shipped to the test box and were reverted in
|
|
||||||
`5eac51a` once we found the actual cause.
|
|
||||||
|
|
||||||
## Root cause
|
|
||||||
|
|
||||||
A live empirical probe (`/tmp/probe-umount2.sh` on the test box,
|
|
||||||
polling `/proc/*/ns/mnt` while a stop was in flight) showed:
|
|
||||||
|
|
||||||
```
[t= 0.00] mounted=Y holders=[]
[t= 2.27] mounted=Y holders=[35259(left4me-overlay) ]
[t= 4.53] mounted=Y holders=[35259(left4me-overlay) ]
[t= … ] (steady for ~22 s)
[t=22.97] mounted=Y holders=[35259(left4me-overlay) ]
[t=25.22] mounted=N holders=[] ← helper finally exited
```
|
|
||||||
|
|
||||||
The single PID holding a reference to the unit's dying mount namespace
|
|
||||||
was **our own umount helper** running as ExecStopPost. The EBUSY
|
|
||||||
window matched the helper's retry budget exactly. The mount became
|
|
||||||
unmountable the moment the helper exited.
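
The original probe script is not reproduced here. As a rough sketch of the same idea, the function below lists the PIDs whose `/proc/<pid>/ns/mnt` link matches a given namespace identifier. `mount_ns_holders` and its `proc_root` parameter are hypothetical, and the check only sees processes that are members of the namespace, not other kinds of references such as open file descriptors.

```python
import os


def mount_ns_holders(target_ns: str, proc_root: str = "/proc") -> list[int]:
    """Return PIDs whose mount namespace link equals target_ns.

    target_ns has the form shown by readlink, e.g. "mnt:[4026532999]".
    """
    holders = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries
        try:
            ns = os.readlink(os.path.join(proc_root, entry, "ns", "mnt"))
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        if ns == target_ns:
            holders.append(int(entry))
    return sorted(holders)
```

Calling this in a tight loop while a stop is in flight would give a `holders=[...]` timeline like the one above; it typically needs root to see other users' processes.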
|
|
||||||
|
|
||||||
### Why the helper was holding the namespace
|
|
||||||
|
|
||||||
systemd's `+` Exec prefix removes sandbox & credentials, but does
|
|
||||||
**not** detach from the unit's per-service mount namespace (created
|
|
||||||
by `PrivateTmp=true` + `Protect*` directives). The Python interpreter
|
|
||||||
that runs `left4me-overlay` was launched inside the unit's namespace.
|
|
||||||
|
|
||||||
Inside the helper we did:
|
|
||||||
|
|
||||||
```python
subprocess.run([NSENTER, "--mount=/proc/1/ns/mnt", "--", UMOUNT_BIN, ...])
```
|
|
||||||
|
|
||||||
That nsenter put the *child process* (the umount syscall) in PID 1's
|
|
||||||
namespace — but the *parent process* (the helper Python interpreter)
|
|
||||||
never left the unit's namespace. As long as the helper was alive, it
|
|
||||||
held a reference to that namespace, which kept the slave-mount tree
|
|
||||||
alive, which made `umount` in PID 1 return EBUSY (mount-propagation
|
|
||||||
can't reconcile a slave that still has open references).
|
|
||||||
|
|
||||||
Self-defeating loop: the helper tried to umount the namespace it was
|
|
||||||
holding open. The mount only released when the helper gave up.
|
|
||||||
|
|
||||||
### Why this didn't surface before commit `936c8bb`
|
|
||||||
|
|
||||||
Before that commit, `ExecStart` invoked `srcds_run` from the
|
|
||||||
`installation/` lower layer. Srcds processes had cwd / mmaps in
|
|
||||||
`installation/`, **not** in the overlay mount. The unit's namespace
|
|
||||||
still existed and the helper still pinned it, but the kernel didn't
|
|
||||||
need to reconcile any references inside the overlay — so `umount` in
|
|
||||||
PID 1 found nothing busy and succeeded immediately.
|
|
||||||
|
|
||||||
Once srcds started running from inside `merged/`, the unit's namespace
|
|
||||||
gained file references inside the overlay, and the helper's
|
|
||||||
namespace-pin became the thing keeping those references in place.
|
|
||||||
|
|
||||||
## Fix
|
|
||||||
|
|
||||||
**One change at the systemd Exec line, two consequential cleanups.**
|
|
||||||
|
|
||||||
### `deploy/files/usr/local/lib/systemd/system/left4me-server@.service`
|
|
||||||
|
|
||||||
Wrap both helper invocations with nsenter at the unit level:
|
|
||||||
|
|
||||||
```ini
ExecStartPre=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay mount %i
ExecStopPost=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay umount %i
```
|
|
||||||
|
|
||||||
`nsenter` runs in the unit's namespace momentarily, switches its own
|
|
||||||
mount namespace to PID 1's, then `execve`s the helper. From that
|
|
||||||
point the helper Python interpreter — *the long-lived parent process*
|
|
||||||
— lives in PID 1's namespace and holds no reference to the unit's
|
|
||||||
namespace.
|
|
||||||
|
|
||||||
`TimeoutStopSec` reverts to `15s`.
|
|
||||||
|
|
||||||
### `deploy/files/usr/local/libexec/left4me/left4me-overlay`
|
|
||||||
|
|
||||||
With the helper already in PID 1's namespace, internal nsenter is
|
|
||||||
redundant. Removed:
|
|
||||||
|
|
||||||
- `nsenter --mount=/proc/1/ns/mnt --` prefix on the mount/umount argv.
|
|
||||||
- `cmd_umount`'s eager-retry loop (no race left to ride out).
|
|
||||||
- Lazy-umount (`umount -l`) fallback (no fallback needed; eager
|
|
||||||
succeeds first try).
|
|
||||||
- `work_inner` cleanup retry (no kernel-finalisation residual after a
|
|
||||||
successful eager umount).
|
|
||||||
- `import time`.
|
|
||||||
|
|
||||||
Kept: input validation, idempotency guards (`os.path.ismount`),
|
|
||||||
`work_inner` rmtree (the kernel-overlayfs orphan dir is unrelated to
|
|
||||||
the namespace issue and still needs cleaning up).
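
For orientation, the simplified umount path described above could look roughly like the sketch below. This is not the helper's actual code: `umount_if_mounted`, the bare `"umount"` binary name, and the injectable `is_mount` / `run` parameters are assumptions made so the logic can be exercised without a real mount.

```python
import os
import shutil
import subprocess


def umount_if_mounted(merged: str, work_inner: str,
                      is_mount=os.path.ismount, run=subprocess.run) -> bool:
    """Unmount `merged` once if it is mounted, then remove the orphan work dir.

    Returns True if an umount was issued, False if there was nothing to do.
    """
    unmounted = False
    if is_mount(merged):
        # Single eager attempt: no retry loop and no lazy fallback.
        run(["umount", merged], check=True)
        unmounted = True
    # The overlayfs work/work directory is cleaned up either way.
    shutil.rmtree(work_inner, ignore_errors=True)
    return unmounted
```

The `ismount` guard is what makes a second invocation a no-op rather than an error.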
|
|
||||||
|
|
||||||
## Verification
|
|
||||||
|
|
||||||
After deploy on the test box:
|
|
||||||
|
|
||||||
| Metric | Before fix | After fix |
|---|---|---|
| Reset duration (`l4d2ctl reset 3`) | ~25 s | ~0.5 s |
| `holders=` of dying namespace | `[helper_pid]` for ~25 s | `[]` immediately |
| Unit state after Stop | `failed` | `inactive` |
| ExecStopPost exit code | 32 (EBUSY) | 0 |
|
|
||||||
|
|
||||||
UI flow (`/servers/3` → Start → Reset): job `#164 reset succeeded`
|
|
||||||
in 1.3 s end-to-end. No `failed` rows on subsequent resets.
|
|
||||||
|
|
||||||
## Lessons
|
|
||||||
|
|
||||||
- **A retry loop is a hint, not a fix.** If you find yourself reaching
|
|
||||||
for "retry until kernel finishes," check whether *your own process*
|
|
||||||
is what's blocking the kernel from finishing. nsenter at the
|
|
||||||
syscall level looks right, but only escapes the namespace for the
|
|
||||||
child process; the parent still pins it.
|
|
||||||
- **Probe for the holder, don't assume async.** `/proc/*/ns/mnt` plus
|
|
||||||
a tight polling loop quickly tells you who's actually holding a
|
|
||||||
namespace alive. We jumped to "task_work_add reaping" as the
|
|
||||||
explanation and burned a round of workarounds before checking.
|
|
||||||
- **`+` prefix only escapes sandbox & credentials.** Mount namespace
|
|
||||||
inheritance is unaffected; if you need PID 1's namespace, do
|
|
||||||
`nsenter` yourself at the Exec line.
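
The "probe for the holder" lesson can be sketched as a single scan over `/proc/*/ns/mnt`. This is a minimal illustration, not the probe used during the investigation: the function name `namespace_holders` and the `proc_root` parameter are assumptions added here so the scan can be pointed at a fake tree, and reading other users' namespace links typically requires root. Calling it repeatedly from a short polling loop with a PID known to live in the dying namespace shows which processes still share that namespace.

```python
import os
from pathlib import Path


def namespace_holders(target_pid, proc_root="/proc"):
    """Return the PIDs that share target_pid's mount namespace."""
    root = Path(proc_root)
    # The link target looks like "mnt:[4026531840]"; two processes with the
    # same target are in the same mount namespace.
    wanted = os.readlink(root / str(target_pid) / "ns" / "mnt")
    holders = []
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        try:
            if os.readlink(entry / "ns" / "mnt") == wanted:
                holders.append(int(entry.name))
        except OSError:
            # The process exited mid-scan, or we may not inspect it.
            continue
    return sorted(holders)
```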
|
|
||||||
|
|
@ -1,895 +0,0 @@
|
||||||
# L4D2 Network Shaping & Marking Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Ship a network-side player-experience baseline alongside the existing host perf baseline: nftables uid-based DSCP-EF + skb-priority marking for srcds UDP, rounding sysctls (`udp_rmem_min`/`udp_wmem_min`, `default_qdisc=fq_codel`, `tcp_congestion_control=bbr`), and CAKE egress shaping via a systemd oneshot driven by an operator-edited env file. Production hosts running `systemd-networkd` consume an equivalent `[CAKE]` section documented in the README.

**Architecture:** Eight ship-ready artifacts under `deploy/files/...`, wired into `deploy-test-server.sh`, asserted in `deploy/tests/test_deploy_artifacts.py`, and documented in `deploy/README.md`. Each artifact is a separate, independently-testable file. The CAKE helper takes an `apply`/`clear` mode argument so the unit's `ExecStart`/`ExecStop` are clean shell calls without escape soup.

**Tech Stack:** sysctl, nftables (`inet` table, output hook, mangle priority), tc-cake, systemd oneshot units, POSIX `/bin/sh` for the helper, pytest substring assertions.

**Spec:** `docs/superpowers/specs/2026-05-10-l4d2-network-shaping-design.md`.

---
## File Structure

**New files (`deploy/files/...`):**
- `usr/local/lib/left4me/nft/left4me-mark.nft` — nftables ruleset, own `inet` table.
- `usr/local/lib/systemd/system/left4me-nft-mark.service` — applies/removes the table.
- `etc/left4me/cake.env` — operator-edited template (deploy preserves edits).
- `usr/local/libexec/left4me/left4me-apply-cake` — POSIX shell helper, `apply`/`clear` modes.
- `usr/local/lib/systemd/system/left4me-cake.service` — runs the helper at network-online, clears on stop.

**Modified files:**
- `deploy/files/etc/sysctl.d/99-left4me.conf` — append four new directives.
- `deploy/deploy-test-server.sh` — add `nftables iproute2` to apt/dnf install lines, copy the new artifacts, conditional cake.env copy, enable the two new units.
- `deploy/README.md` — Network shaping subsection + three new escape hatches (IFB ingress, busy_poll, GRO).
- `deploy/tests/test_deploy_artifacts.py` — add path constants and assertions.

Each task adds (or extends) one artifact and the matching test, ending in a commit. Order matters: sysctl extension first (smallest, isolated), then the nftables pair, then the CAKE pair, then deploy-script wiring (depends on every prior task), then README.

---
### Task 1: Sysctl additions to `99-left4me.conf`

**Files:**
- Modify: `deploy/files/etc/sysctl.d/99-left4me.conf` (append block)
- Modify: `deploy/tests/test_deploy_artifacts.py:199-211` (extend existing `test_sysctl_conf_present_with_perf_settings`)

- [ ] **Step 1: Extend the existing sysctl test with the new lines.**

In `deploy/tests/test_deploy_artifacts.py`, edit `test_sysctl_conf_present_with_perf_settings` to append four lines to the tuple it already iterates:

```python
def test_sysctl_conf_present_with_perf_settings():
    assert SYSCTL_CONF.is_file()
    text = SYSCTL_CONF.read_text()
    for line in (
        "net.core.rmem_max = 8388608",
        "net.core.wmem_max = 8388608",
        "net.core.rmem_default = 524288",
        "net.core.wmem_default = 524288",
        "net.core.netdev_max_backlog = 5000",
        "net.core.netdev_budget = 600",
        "vm.swappiness = 10",
        "net.ipv4.udp_rmem_min = 16384",
        "net.ipv4.udp_wmem_min = 16384",
        "net.core.default_qdisc = fq_codel",
        "net.ipv4.tcp_congestion_control = bbr",
    ):
        assert line in text, f"missing {line!r} in 99-left4me.conf"
```

- [ ] **Step 2: Run the test to verify it fails.**

```
pytest deploy/tests/test_deploy_artifacts.py::test_sysctl_conf_present_with_perf_settings -v
```
Expected: FAIL — `AssertionError: missing 'net.ipv4.udp_rmem_min = 16384' in 99-left4me.conf`.

- [ ] **Step 3: Append the new block to `99-left4me.conf`.**

Open `deploy/files/etc/sysctl.d/99-left4me.conf` and append (after the existing `vm.swappiness = 10` line):

```
# Per-socket UDP buffer floors: protect game-server sockets that don't bump
# their own SO_RCVBUF/SO_SNDBUF when softirq drains lag briefly.
net.ipv4.udp_rmem_min = 16384
net.ipv4.udp_wmem_min = 16384

# Default qdisc for ifaces we don't explicitly shape with CAKE. Debian Trixie
# already defaults to fq_codel; setting it explicitly is belt-and-suspenders
# and survives kernel-default churn.
net.core.default_qdisc = fq_codel

# TCP congestion control: BBR for any bulk TCP egress on the host (admin SSH,
# backups, package fetches, web-app responses) so a long flow does not push
# the bottleneck queue ahead of game UDP. UDP srcds is unaffected.
net.ipv4.tcp_congestion_control = bbr
```

- [ ] **Step 4: Run the test again to verify it passes.**

```
pytest deploy/tests/test_deploy_artifacts.py::test_sysctl_conf_present_with_perf_settings -v
```
Expected: PASS.

- [ ] **Step 5: Commit.**

```
git add deploy/files/etc/sysctl.d/99-left4me.conf deploy/tests/test_deploy_artifacts.py
git commit -m "feat(deploy): extend sysctls with udp_*_min, fq_codel default, BBR"
```
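The substring assertions in Task 1 depend on the exact `key = value` spacing used in `99-left4me.conf`. As an aside, a parse-based check would tolerate spacing differences. The sketch below is an illustration only and is not part of the plan: `parse_sysctl_conf` is a hypothetical helper, and it assumes the usual sysctl.d conventions of `#` or `;` comment lines and one `key = value` directive per line.

```python
def parse_sysctl_conf(text):
    """Map each sysctl key to its value, skipping comments and blank lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value directive
        settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    sample = "net.ipv4.udp_rmem_min = 16384\nnet.core.default_qdisc=fq_codel\n"
    # Both spacings parse to the same shape.
    print(parse_sysctl_conf(sample))
```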

---

### Task 2: nftables marking file

**Files:**
- Create: `deploy/files/usr/local/lib/left4me/nft/left4me-mark.nft`
- Modify: `deploy/tests/test_deploy_artifacts.py` (add path constant + new test function)

- [ ] **Step 1: Add the path constant and a failing test.**

In `deploy/tests/test_deploy_artifacts.py`, add the constant near the existing path constants block (around line 26, after `DEPLOY_SCRIPT`):

```python
NFT_MARK_FILE = DEPLOY / "files/usr/local/lib/left4me/nft/left4me-mark.nft"
```

Append this test function to the bottom of the file:

```python
def test_nft_mark_file_marks_left4me_udp_with_dscp_ef_and_priority():
    assert NFT_MARK_FILE.is_file()
    text = NFT_MARK_FILE.read_text()

    # Own table in the inet family so it cannot conflict with operator nftables config.
    assert "table inet left4me_mark" in text
    assert "chain mangle_output" in text
    assert "type filter hook output priority mangle" in text

    # Match by uid (every srcds runs as `left4me`) restricted to UDP.
    assert 'meta skuid "left4me"' in text
    assert "meta l4proto udp" in text

    # DSCP EF for both L3 families; in `inet` tables, `ip` only fires on v4
    # and `ip6` only on v6.
    assert "ip dscp set ef" in text
    assert "ip6 dscp set ef" in text

    # skb->priority class 6:0, set inline alongside DSCP.
    assert "meta priority set 0006:0000" in text
```

- [ ] **Step 2: Run the new test and confirm it fails.**

```
pytest deploy/tests/test_deploy_artifacts.py::test_nft_mark_file_marks_left4me_udp_with_dscp_ef_and_priority -v
```
Expected: FAIL — `AssertionError: assert False` on `NFT_MARK_FILE.is_file()`.

- [ ] **Step 3: Create the directory and write the nftables file.**

```
mkdir -p deploy/files/usr/local/lib/left4me/nft
```

Write `deploy/files/usr/local/lib/left4me/nft/left4me-mark.nft`:

```nft
# left4me — uid-based DSCP/priority marking for srcds UDP egress.
# Loaded by left4me-nft-mark.service into its own `inet` table so it cannot
# conflict with whatever the operator already runs in /etc/nftables.conf.
# See docs/superpowers/specs/2026-05-10-l4d2-network-shaping-design.md.

table inet left4me_mark {
    chain mangle_output {
        type filter hook output priority mangle; policy accept;
        meta skuid "left4me" meta l4proto udp ip dscp set ef meta priority set 0006:0000
        meta skuid "left4me" meta l4proto udp ip6 dscp set ef meta priority set 0006:0000
    }
}
```

- [ ] **Step 4: Re-run the test and confirm it passes.**

```
pytest deploy/tests/test_deploy_artifacts.py::test_nft_mark_file_marks_left4me_udp_with_dscp_ef_and_priority -v
```
Expected: PASS.

- [ ] **Step 5: Commit.**

```
git add deploy/files/usr/local/lib/left4me/nft/left4me-mark.nft deploy/tests/test_deploy_artifacts.py
git commit -m "feat(deploy): nftables uid-based DSCP-EF + skb-priority marking for srcds"
```
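For checking the marking on the wire, it can help to know the numeric values behind the two symbolic settings in the ruleset. The sketch below works them out under two assumptions: that `ef` is the standard Expedited Forwarding code point (46), and that `0006:0000` is read as a tc-style `major:minor` hex handle. The function names are illustrative and do not appear in the plan. Under those assumptions a marked IPv4 packet carries a TOS byte of `0xb8`.

```python
DSCP_EF = 46  # Expedited Forwarding code point, binary 101110


def dscp_to_tos_byte(dscp):
    """DSCP fills the upper six bits of the IPv4 TOS / IPv6 traffic-class byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value")
    return dscp << 2


def tc_handle_to_priority(handle):
    """Convert a 'major:minor' hex handle into the 32-bit skb->priority value."""
    major, minor = handle.split(":")
    return (int(major, 16) << 16) | int(minor or "0", 16)


if __name__ == "__main__":
    print(hex(dscp_to_tos_byte(DSCP_EF)))
    print(hex(tc_handle_to_priority("0006:0000")))
```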

---

### Task 3: nftables systemd unit

**Files:**
- Create: `deploy/files/usr/local/lib/systemd/system/left4me-nft-mark.service`
- Modify: `deploy/tests/test_deploy_artifacts.py` (path constant + test)

- [ ] **Step 1: Add the path constant and a failing test.**

Append the constant near the existing systemd-unit constants (around line 16):

```python
NFT_MARK_UNIT = DEPLOY / "files/usr/local/lib/systemd/system/left4me-nft-mark.service"
```

Append the test:

```python
def test_nft_mark_unit_loads_and_clears_left4me_table():
    assert NFT_MARK_UNIT.is_file()
    text = NFT_MARK_UNIT.read_text()

    # Loads the rules early so the very first packet srcds emits is marked.
    assert "After=network-pre.target" in text
    assert "Before=network.target" in text
    assert "Wants=network-pre.target" in text

    # Oneshot lifecycle: load on start, drop on stop.
    assert "Type=oneshot" in text
    assert "RemainAfterExit=yes" in text
    assert (
        "ExecStart=/usr/sbin/nft -f /usr/local/lib/left4me/nft/left4me-mark.nft"
        in text
    )
    assert "ExecStop=/usr/sbin/nft delete table inet left4me_mark" in text
    assert "WantedBy=multi-user.target" in text
```

- [ ] **Step 2: Run the test and confirm FAIL.**

```
pytest deploy/tests/test_deploy_artifacts.py::test_nft_mark_unit_loads_and_clears_left4me_table -v
```
Expected: FAIL — `assert False` on `NFT_MARK_UNIT.is_file()`.

- [ ] **Step 3: Write the unit file.**

`deploy/files/usr/local/lib/systemd/system/left4me-nft-mark.service`:

```ini
[Unit]
Description=left4me nftables packet marking (DSCP EF + priority for srcds)
After=network-pre.target
Before=network.target
Wants=network-pre.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/nft -f /usr/local/lib/left4me/nft/left4me-mark.nft
ExecStop=/usr/sbin/nft delete table inet left4me_mark

[Install]
WantedBy=multi-user.target
```

- [ ] **Step 4: Re-run the test and confirm PASS.**

```
pytest deploy/tests/test_deploy_artifacts.py::test_nft_mark_unit_loads_and_clears_left4me_table -v
```
Expected: PASS.

- [ ] **Step 5: Commit.**

```
git add deploy/files/usr/local/lib/systemd/system/left4me-nft-mark.service deploy/tests/test_deploy_artifacts.py
git commit -m "feat(deploy): systemd unit to load/clear left4me_mark nftables table"
```
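The test in Task 3 checks the unit with plain substring matches, so it cannot tell which section a directive sits in. If section-aware checks are ever wanted, a simple unit like this one can be read with the standard-library `configparser`. This is a sketch under assumptions, not the plan's approach: `parse_unit` is a hypothetical helper, and it only suits units where no key repeats within a section, because repeated keys (for example several `ExecStart=` lines) collapse to the last value.

```python
import configparser


def parse_unit(text):
    """Parse a simple systemd unit into {section: {key: value}}."""
    parser = configparser.ConfigParser(
        delimiters=("=",),   # systemd only uses '=' between key and value
        interpolation=None,  # leave '%' specifiers untouched
        strict=False,        # tolerate repeated keys (last one wins)
    )
    parser.optionxform = str  # keep the case-sensitive key names
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


if __name__ == "__main__":
    unit = "[Service]\nType=oneshot\nRemainAfterExit=yes\n"
    print(parse_unit(unit)["Service"]["Type"])
```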

---

### Task 4: CAKE env template

**Files:**
- Create: `deploy/files/etc/left4me/cake.env`
- Modify: `deploy/tests/test_deploy_artifacts.py` (path constant + test)

- [ ] **Step 1: Add path constant and failing test.**

Append the constant near the other `/etc/left4me` constants (around line 22):

```python
CAKE_ENV = DEPLOY / "files/etc/left4me/cake.env"
```

Append the test:

```python
def test_cake_env_template_documents_required_knobs():
    assert CAKE_ENV.is_file()
    text = CAKE_ENV.read_text()

    # Both knobs are documented and present (commented OK; the deploy preserves
    # operator edits, so the template must not bake in a wrong value).
    assert "LEFT4ME_UPLINK_MBIT" in text
    assert "LEFT4ME_UPLINK_IFACE" in text
    # Empty defaults: shaper unit no-ops with a journal warning when unset.
    assert "LEFT4ME_UPLINK_MBIT=" in text
    assert "LEFT4ME_UPLINK_IFACE=" in text
```

- [ ] **Step 2: Run and confirm FAIL.**

```
pytest deploy/tests/test_deploy_artifacts.py::test_cake_env_template_documents_required_knobs -v
```
Expected: FAIL on `CAKE_ENV.is_file()`.

- [ ] **Step 3: Write the env template.**

`deploy/files/etc/left4me/cake.env`:

```
# left4me — CAKE egress shaper config. Consumed by left4me-cake.service via
# its EnvironmentFile=. Edit then `systemctl restart left4me-cake.service`.
# See docs/superpowers/specs/2026-05-10-l4d2-network-shaping-design.md.

# Uplink bandwidth in Mbit/s. Set to ~95% of the smaller of measured upload
# and measured download. CAKE only shapes correctly when its declared
# bandwidth sits below the real bottleneck. If unset, the shaper unit logs
# a warning and exits 0 (no shaping).
LEFT4ME_UPLINK_MBIT=

# Egress interface. If unset, auto-detected from the IPv4 default route.
LEFT4ME_UPLINK_IFACE=
```

- [ ] **Step 4: Re-run and confirm PASS.**

```
pytest deploy/tests/test_deploy_artifacts.py::test_cake_env_template_documents_required_knobs -v
```
Expected: PASS.

- [ ] **Step 5: Commit.**

```
git add deploy/files/etc/left4me/cake.env deploy/tests/test_deploy_artifacts.py
git commit -m "feat(deploy): cake.env template with documented uplink knobs"
```
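The template's comment asks the operator to set `LEFT4ME_UPLINK_MBIT` to roughly 95% of the smaller measured rate. The arithmetic, plus a reader for the simple `KEY=VALUE` layout the template uses, might look like the sketch below. Both function names are illustrative and not part of the plan, the reader ignores shell quoting and expansion, and rounding down to a whole Mbit/s is a choice made here to stay below the real bottleneck.

```python
def parse_env_file(text):
    """Read plain KEY=VALUE lines; comments and blank lines are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def suggested_uplink_mbit(upload_mbit, download_mbit, percent=95):
    """Whole Mbit/s at `percent` of the smaller measured rate, rounded down."""
    return min(upload_mbit, download_mbit) * percent // 100


if __name__ == "__main__":
    # Example: 100 Mbit/s measured up, 500 Mbit/s measured down.
    print(suggested_uplink_mbit(100, 500))
```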

---

### Task 5: CAKE helper script

**Files:**
- Create: `deploy/files/usr/local/libexec/left4me/left4me-apply-cake`
- Modify: `deploy/tests/test_deploy_artifacts.py` (path constant + tests)

- [ ] **Step 1: Add path constant and failing tests.**

Append the constant near the libexec helper constants (around line 21):

```python
APPLY_CAKE_HELPER = DEPLOY / "files/usr/local/libexec/left4me/left4me-apply-cake"
```

Append two test functions:

```python
def test_apply_cake_helper_supports_apply_and_clear_modes():
    assert APPLY_CAKE_HELPER.is_file()
    text = APPLY_CAKE_HELPER.read_text()

    assert text.startswith("#!/bin/sh")
    # Both knobs are read from the env file.
    assert "LEFT4ME_UPLINK_MBIT" in text
    assert "LEFT4ME_UPLINK_IFACE" in text
    assert ". /etc/left4me/cake.env" in text
    # Iface fallback to default route.
    assert "ip -4 route show default" in text
    # Two modes; default to apply.
    assert "mode=${1:-apply}" in text
    assert 'apply)' in text and 'clear)' in text
    # Apply: idempotent `tc qdisc replace` with the documented flags.
    assert "tc qdisc replace" in text
    assert "cake" in text
    assert "bandwidth" in text
    assert "internet" in text
    assert "diffserv4" in text
    assert "dual-dsthost" in text
    # Clear: tolerates a missing qdisc.
    assert "tc qdisc del" in text
    assert "|| true" in text
    # Fail-soft on missing config.
    assert "LEFT4ME_UPLINK_MBIT unset" in text


def test_apply_cake_helper_passes_shell_syntax_check():
    subprocess.run(["sh", "-n", str(APPLY_CAKE_HELPER)], check=True)
```
- [ ] **Step 2: Run and confirm FAIL.**

```
pytest deploy/tests/test_deploy_artifacts.py::test_apply_cake_helper_supports_apply_and_clear_modes deploy/tests/test_deploy_artifacts.py::test_apply_cake_helper_passes_shell_syntax_check -v
```
Expected: both FAIL.

- [ ] **Step 3: Write the helper.**

`deploy/files/usr/local/libexec/left4me/left4me-apply-cake`:

```sh
#!/bin/sh
# left4me — apply or clear CAKE egress shaper on the configured uplink.
# Driven by left4me-cake.service. See spec
# docs/superpowers/specs/2026-05-10-l4d2-network-shaping-design.md.
set -eu

mode=${1:-apply}

if [ -r /etc/left4me/cake.env ]; then
    . /etc/left4me/cake.env
fi

resolve_iface() {
    if [ -n "${LEFT4ME_UPLINK_IFACE:-}" ]; then
        printf '%s' "$LEFT4ME_UPLINK_IFACE"
        return
    fi
    ip -4 route show default | awk '/default/ {print $5; exit}'
}

case "$mode" in
    apply)
        if [ -z "${LEFT4ME_UPLINK_MBIT:-}" ]; then
            echo "left4me-cake: LEFT4ME_UPLINK_MBIT unset; skipping shaper" >&2
            exit 0
        fi
        iface=$(resolve_iface)
        if [ -z "$iface" ]; then
            echo "left4me-cake: cannot determine egress iface; skipping" >&2
            exit 0
        fi
        exec tc qdisc replace dev "$iface" root cake \
            bandwidth "${LEFT4ME_UPLINK_MBIT}mbit" \
            internet diffserv4 dual-dsthost
        ;;
    clear)
        iface=$(resolve_iface)
        if [ -z "$iface" ]; then
            exit 0
        fi
        tc qdisc del dev "$iface" root 2>/dev/null || true
        ;;
    *)
        echo "usage: $0 [apply|clear]" >&2
        exit 2
        ;;
esac
```

Make it executable in the repo (the deploy script also `chmod 0755`s the destination, but executable mode in the source tree is conventional here):

```
chmod 0755 deploy/files/usr/local/libexec/left4me/left4me-apply-cake
```

- [ ] **Step 4: Re-run and confirm PASS.**

```
pytest deploy/tests/test_deploy_artifacts.py::test_apply_cake_helper_supports_apply_and_clear_modes deploy/tests/test_deploy_artifacts.py::test_apply_cake_helper_passes_shell_syntax_check -v
```
Expected: both PASS.

- [ ] **Step 5: Commit.**

```
git add deploy/files/usr/local/libexec/left4me/left4me-apply-cake deploy/tests/test_deploy_artifacts.py
git commit -m "feat(deploy): left4me-apply-cake helper with apply/clear modes"
```
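The helper's interface fallback takes the fifth whitespace-separated field of the default-route line, which matches the common `default via <gateway> dev <iface> ...` shape. The sketch below mirrors that logic in Python for illustration, but keys on the token after `dev` instead of a fixed position, since a default route printed without a `via` hop would put something else in field five. That variant is an assumption about what a more tolerant parse could look like, not what the helper does, and both function names are hypothetical. `cake_argv` simply spells out the argument list the helper passes to `tc` in `apply` mode.

```python
def default_route_iface(route_output):
    """Return the token after 'dev' on the first default-route line, or None."""
    for line in route_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "default" and "dev" in fields:
            idx = fields.index("dev")
            if idx + 1 < len(fields):
                return fields[idx + 1]
    return None


def cake_argv(iface, mbit):
    """Argument list equivalent to the helper's `tc qdisc replace` call."""
    return [
        "tc", "qdisc", "replace", "dev", iface, "root", "cake",
        "bandwidth", f"{mbit}mbit", "internet", "diffserv4", "dual-dsthost",
    ]


if __name__ == "__main__":
    sample = "default via 192.0.2.1 dev eth0 proto dhcp metric 100\n"
    print(" ".join(cake_argv(default_route_iface(sample), 95)))
```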

---

### Task 6: CAKE systemd unit

**Files:**
- Create: `deploy/files/usr/local/lib/systemd/system/left4me-cake.service`
- Modify: `deploy/tests/test_deploy_artifacts.py` (path constant + test)

- [ ] **Step 1: Add path constant and failing test.**

Append the constant near the existing systemd-unit constants (around line 16):

```python
CAKE_UNIT = DEPLOY / "files/usr/local/lib/systemd/system/left4me-cake.service"
```

Append the test:

```python
def test_cake_unit_runs_helper_in_apply_and_clear_modes():
    assert CAKE_UNIT.is_file()
    text = CAKE_UNIT.read_text()

    assert "After=network-online.target" in text
    assert "Wants=network-online.target" in text
    assert "Type=oneshot" in text
    assert "RemainAfterExit=yes" in text
    # `-` prefix: missing env file is non-fatal (deploy ships one, but be safe).
    assert "EnvironmentFile=-/etc/left4me/cake.env" in text
    assert (
        "ExecStart=/usr/local/libexec/left4me/left4me-apply-cake apply" in text
    )
    assert (
        "ExecStop=/usr/local/libexec/left4me/left4me-apply-cake clear" in text
    )
    assert "WantedBy=multi-user.target" in text
```

- [ ] **Step 2: Run and confirm FAIL.**

```
pytest deploy/tests/test_deploy_artifacts.py::test_cake_unit_runs_helper_in_apply_and_clear_modes -v
```
Expected: FAIL on `CAKE_UNIT.is_file()`.

- [ ] **Step 3: Write the unit.**

`deploy/files/usr/local/lib/systemd/system/left4me-cake.service`:

```ini
[Unit]
Description=left4me CAKE egress shaper
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
EnvironmentFile=-/etc/left4me/cake.env
ExecStart=/usr/local/libexec/left4me/left4me-apply-cake apply
ExecStop=/usr/local/libexec/left4me/left4me-apply-cake clear

[Install]
WantedBy=multi-user.target
```

- [ ] **Step 4: Re-run and confirm PASS.**

```
pytest deploy/tests/test_deploy_artifacts.py::test_cake_unit_runs_helper_in_apply_and_clear_modes -v
```
Expected: PASS.

- [ ] **Step 5: Commit.**

```
git add deploy/files/usr/local/lib/systemd/system/left4me-cake.service deploy/tests/test_deploy_artifacts.py
git commit -m "feat(deploy): left4me-cake.service oneshot wrapping apply-cake helper"
```

---

### Task 7: Wire artifacts into `deploy-test-server.sh`

**Files:**
- Modify: `deploy/deploy-test-server.sh`
- Modify: `deploy/tests/test_deploy_artifacts.py` (new test)

This task adds: `nftables` to apt/dnf install lines, copies the four new artifact files into their target paths, conditionally copies `cake.env` only if absent, and `systemctl enable --now`s the two new units. Each piece gets its own assertion in a single new test function.

- [ ] **Step 1: Add the new test.**

Append to `deploy/tests/test_deploy_artifacts.py`:

```python
def test_deploy_script_installs_network_shaping_artifacts():
    script = DEPLOY_SCRIPT.read_text()

    # nftables: package install on both apt and dnf paths.
    apt_lines = [l for l in script.splitlines() if "apt-get install" in l]
    dnf_lines = [l for l in script.splitlines() if "dnf install" in l]
    assert apt_lines and dnf_lines
    for line in apt_lines:
        assert "nftables" in line, line
    for line in dnf_lines:
        assert "nftables" in line, line

    # nft rules + unit copied to system paths.
    assert "/usr/local/lib/left4me/nft/left4me-mark.nft" in script
    assert (
        "/usr/local/lib/systemd/system/left4me-nft-mark.service" in script
    )
    assert "systemctl enable --now left4me-nft-mark.service" in script

    # CAKE helper + unit copied; helper made executable.
    assert "/usr/local/libexec/left4me/left4me-apply-cake" in script
    assert (
        "/usr/local/lib/systemd/system/left4me-cake.service" in script
    )
    assert "chmod 0755" in script and "left4me-apply-cake" in script
    assert "systemctl enable --now left4me-cake.service" in script

    # cake.env: copied only if absent (operator edits survive re-deploys).
    # Must match the guard added in Step 3(f), which tests `! -e`.
    assert "/etc/left4me/cake.env" in script
    assert "[ ! -e /etc/left4me/cake.env ]" in script
```
- [ ] **Step 2: Run and confirm FAIL.**
|
|
||||||
|
|
||||||
```
|
|
||||||
pytest deploy/tests/test_deploy_artifacts.py::test_deploy_script_installs_network_shaping_artifacts -v
|
|
||||||
```
|
|
||||||
Expected: FAIL on the first missing string.
|
|
||||||
|
|
||||||
- [ ] **Step 3: Edit `deploy-test-server.sh`.**
|
|
||||||
|
|
||||||
Make these targeted edits — do not rewrite the script.
|
|
||||||
|
|
||||||
(a) **Append `nftables` to both package-install lines (line 88 and line 90 in the current file).**
|
|
||||||
|
|
||||||
Old (line 88):
|
|
||||||
```
|
|
||||||
$sudo_cmd apt-get install -y python3 python3-venv python3-pip curl ca-certificates tar gzip util-linux sudo coreutils p7zip-full
|
|
||||||
```
|
|
||||||
New:
|
|
||||||
```
|
|
||||||
$sudo_cmd apt-get install -y python3 python3-venv python3-pip curl ca-certificates tar gzip util-linux sudo coreutils p7zip-full nftables
|
|
||||||
```
|
|
||||||
|
|
||||||
Old (line 90):
|
|
||||||
```
|
|
||||||
$sudo_cmd dnf install -y python3 python3-pip curl ca-certificates tar gzip util-linux sudo coreutils p7zip p7zip-plugins
|
|
||||||
```
|
|
||||||
New:
|
|
||||||
```
|
|
||||||
$sudo_cmd dnf install -y python3 python3-pip curl ca-certificates tar gzip util-linux sudo coreutils p7zip p7zip-plugins nftables
|
|
||||||
```
|
|
||||||
|
|
||||||
(b) **Add the nft-rules-dir creation to the `mkdir -p` block (currently lines 96-106).**
|
|
||||||
|
|
||||||
Append `/usr/local/lib/left4me/nft` to the existing `mkdir -p` invocation:
|
|
||||||
|
|
||||||
Old (lines 96-106):
|
|
||||||
```
|
|
||||||
$sudo_cmd mkdir -p \
|
|
||||||
/etc/left4me \
|
|
||||||
/opt/left4me \
|
|
||||||
/usr/local/lib/systemd/system \
|
|
||||||
/usr/local/libexec/left4me \
|
|
||||||
/var/lib/left4me/installation \
|
|
||||||
/var/lib/left4me/overlays \
|
|
||||||
/var/lib/left4me/instances \
|
|
||||||
/var/lib/left4me/runtime \
|
|
||||||
/var/lib/left4me/workshop_cache \
|
|
||||||
/var/lib/left4me/tmp
|
|
||||||
```
|
|
||||||
New (insert one line after `/usr/local/libexec/left4me`):
|
|
||||||
```
|
|
||||||
$sudo_cmd mkdir -p \
|
|
||||||
/etc/left4me \
|
|
||||||
/opt/left4me \
|
|
||||||
/usr/local/lib/systemd/system \
|
|
||||||
/usr/local/libexec/left4me \
|
|
||||||
/usr/local/lib/left4me/nft \
|
|
||||||
/var/lib/left4me/installation \
|
|
||||||
/var/lib/left4me/overlays \
|
|
||||||
/var/lib/left4me/instances \
|
|
||||||
/var/lib/left4me/runtime \
|
|
||||||
/var/lib/left4me/workshop_cache \
|
|
||||||
/var/lib/left4me/tmp
|
|
||||||
```
|
|
||||||
|
|
||||||
(c) **Copy the new systemd units alongside the existing ones (after line 140's `l4d2-build.slice` copy).**
|
|
||||||
|
|
||||||
Insert immediately after the `l4d2-build.slice` copy (the existing line that ends `l4d2-build.slice`):
|
|
||||||
|
|
||||||
```
|
|
||||||
$sudo_cmd cp /opt/left4me/deploy/files/usr/local/lib/systemd/system/left4me-nft-mark.service /usr/local/lib/systemd/system/left4me-nft-mark.service
|
|
||||||
$sudo_cmd cp /opt/left4me/deploy/files/usr/local/lib/systemd/system/left4me-cake.service /usr/local/lib/systemd/system/left4me-cake.service
|
|
||||||
```
|
|
||||||
|
|
||||||
(d) **Copy the nftables rules file alongside the existing `install`-mode copies (next to the sandbox-resolv.conf install at lines 189-191).**
|
|
||||||
|
|
||||||
Insert after the sandbox-resolv install block:
|
|
||||||
|
|
||||||
```
|
|
||||||
# Network packet marking + shaping. See spec
|
|
||||||
# docs/superpowers/specs/2026-05-10-l4d2-network-shaping-design.md.
|
|
||||||
$sudo_cmd install -m 0644 -o root -g root \
|
|
||||||
/opt/left4me/deploy/files/usr/local/lib/left4me/nft/left4me-mark.nft \
|
|
||||||
/usr/local/lib/left4me/nft/left4me-mark.nft
|
|
||||||
```
|
|
||||||
|
|
||||||
(e) **Copy the CAKE helper alongside the other libexec helpers (after the existing `cp` block at lines 175-179).**
|
|
||||||
|
|
||||||
Find the existing `cp` block that copies `left4me-systemctl`, `left4me-journalctl`, `left4me-overlay`, `left4me-script-sandbox`. Add a new `cp` line for `left4me-apply-cake`, and add it to the `chmod 0755` line on line 179:
|
|
||||||
|
|
||||||
Old (line 178):
|
|
||||||
```
|
|
||||||
$sudo_cmd cp /opt/left4me/deploy/files/usr/local/libexec/left4me/left4me-script-sandbox /usr/local/libexec/left4me/left4me-script-sandbox
|
|
||||||
```
|
|
||||||
After it, insert:
|
|
||||||
```
|
|
||||||
$sudo_cmd cp /opt/left4me/deploy/files/usr/local/libexec/left4me/left4me-apply-cake /usr/local/libexec/left4me/left4me-apply-cake
|
|
||||||
```
|
|
||||||
|
|
||||||
Old (line 179):
|
|
||||||
```
|
|
||||||
$sudo_cmd chmod 0755 /usr/local/libexec/left4me/left4me-systemctl /usr/local/libexec/left4me/left4me-journalctl /usr/local/libexec/left4me/left4me-overlay /usr/local/libexec/left4me/left4me-script-sandbox
|
|
||||||
```
|
|
||||||
New (append `left4me-apply-cake`):
|
|
||||||
```
|
|
||||||
$sudo_cmd chmod 0755 /usr/local/libexec/left4me/left4me-systemctl /usr/local/libexec/left4me/left4me-journalctl /usr/local/libexec/left4me/left4me-overlay /usr/local/libexec/left4me/left4me-script-sandbox /usr/local/libexec/left4me/left4me-apply-cake
|
|
||||||
```
|
|
||||||
|
|
||||||
(f) **Conditionally copy `cake.env` (after the existing sysctl install/apply block at lines 193-198).**
|
|
||||||
|
|
||||||
Insert immediately after `$sudo_cmd sysctl --system >/dev/null`:
|
|
||||||
|
|
||||||
```
# CAKE config: ship the template only if the operator hasn't created one
# (their LEFT4ME_UPLINK_MBIT value must survive re-deploys).
if [ ! -e /etc/left4me/cake.env ]; then
    $sudo_cmd install -m 0644 -o root -g root \
        /opt/left4me/deploy/files/etc/left4me/cake.env \
        /etc/left4me/cake.env
fi
```
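
The "install only if missing" guard above can be sketched in Python to make the intended behaviour explicit. This is an illustration only: `install_if_absent` is a hypothetical helper, not part of the deploy script, and it ignores ownership and mode, which the real `install -m 0644 -o root -g root` call sets.

```python
import shutil
from pathlib import Path


def install_if_absent(template: Path, dest: Path) -> bool:
    """Copy template to dest only when dest does not exist yet.

    Returns True if the template was installed, False if an existing
    file was left untouched (so operator edits survive a re-deploy).
    """
    if dest.exists():
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(template, dest)
    return True
```

Calling it twice against the same destination installs once and then leaves the file alone, which is the property the shell guard is there to provide.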
|
|
||||||
|
|
||||||
(g) **Enable the new units alongside the existing `systemctl enable --now left4me-web.service`.**
|
|
||||||
|
|
||||||
Find the existing block (around lines 315-316):
|
|
||||||
```
|
|
||||||
$sudo_cmd systemctl daemon-reload
|
|
||||||
$sudo_cmd systemctl enable --now left4me-web.service
|
|
||||||
```
|
|
||||||
|
|
||||||
Insert two lines between them:
|
|
||||||
```
|
|
||||||
$sudo_cmd systemctl daemon-reload
|
|
||||||
$sudo_cmd systemctl enable --now left4me-nft-mark.service
|
|
||||||
$sudo_cmd systemctl enable --now left4me-cake.service
|
|
||||||
$sudo_cmd systemctl enable --now left4me-web.service
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 4: Re-run all existing tests + the new one to make sure nothing regressed.**
|
|
||||||
|
|
||||||
```
|
|
||||||
pytest deploy/tests/test_deploy_artifacts.py -v
|
|
||||||
```
|
|
||||||
Expected: every test passes, including the new `test_deploy_script_installs_network_shaping_artifacts` and the unmodified `test_deploy_script_shell_syntax` (the latter validates `sh -n` on the modified script).
|
|
||||||
|
|
||||||
- [ ] **Step 5: Commit.**
|
|
||||||
|
|
||||||
```
|
|
||||||
git add deploy/deploy-test-server.sh deploy/tests/test_deploy_artifacts.py
|
|
||||||
git commit -m "feat(deploy): wire nft marking + CAKE shaper into deploy script"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### Task 8: README documentation
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `deploy/README.md`
|
|
||||||
|
|
||||||
This is documentation only; no test asserts the README contents. Re-run the test suite after editing as a hygiene check: it includes the `sh -n` syntax check of the deploy script, which the README change cannot affect, but the suite is fast.
|
|
||||||
|
|
||||||
- [ ] **Step 1: Open `deploy/README.md` and locate the existing Performance tuning section.**
|
|
||||||
|
|
||||||
The previous perf-baseline spec added a "Performance tuning" section (entries for CPU governor, CPU affinity, NIC tuning, and real-time scheduling opt-in). Find it.
|
|
||||||
|
|
||||||
- [ ] **Step 2: Add a "Network shaping" subsection.**
|
|
||||||
|
|
||||||
Add this subsection at the top of "Performance tuning" (before the existing entries; network-shaping covers the universal artifacts that ship by default, while the existing entries are escape hatches):
|
|
||||||
|
|
||||||
```markdown
|
|
||||||
### Network shaping
|
|
||||||
|
|
||||||
The deploy ships three things that affect player-experience network behaviour:
|
|
||||||
|
|
||||||
1. **Per-flow marking.** `left4me-nft-mark.service` loads a small nftables
|
|
||||||
table (`inet left4me_mark`) that marks every UDP packet from uid `left4me`
|
|
||||||
with DSCP EF and `skb->priority` 6. srcds doesn't set these itself, so
|
|
||||||
without this rule its UDP is indistinguishable from any other flow.
|
|
||||||
2. **Sysctl baseline.** `99-left4me.conf` sets `udp_rmem_min=16384`,
|
|
||||||
`udp_wmem_min=16384`, `default_qdisc=fq_codel`, and
|
|
||||||
`tcp_congestion_control=bbr`. Reduces head-of-line blocking when bulk
|
|
||||||
TCP egress (backups, package fetches, web responses) coexists with
|
|
||||||
game UDP.
|
|
||||||
3. **CAKE egress shaping.** `left4me-cake.service` runs
|
|
||||||
`tc qdisc replace dev <iface> root cake bandwidth Xmbit internet
|
|
||||||
diffserv4 dual-dsthost` from `/etc/left4me/cake.env`. CAKE only shapes
|
|
||||||
if its declared bandwidth is **below** the real bottleneck, so set
|
|
||||||
`LEFT4ME_UPLINK_MBIT` to ≈95% of measured uplink:
|
|
||||||
|
|
||||||
sudoedit /etc/left4me/cake.env
|
|
||||||
# set LEFT4ME_UPLINK_MBIT=480 (or whatever ~95% of your uplink is)
|
|
||||||
sudo systemctl restart left4me-cake.service
|
|
||||||
|
|
||||||
`LEFT4ME_UPLINK_IFACE` is auto-detected from the IPv4 default route;
|
|
||||||
override only on hosts with multi-homed setups.
|
|
||||||
|
|
||||||
On an idle 500 Mbit link with no competing egress, CAKE shapes nothing, which is
expected, not a bug. The win materialises when bulk traffic on the
same uplink would otherwise bufferbloat the link the players share.
|
|
||||||
|
|
||||||
**Production hosts running `systemd-networkd`** should NOT use the
|
|
||||||
`left4me-cake.service` oneshot. Instead, configure the equivalent in the
|
|
||||||
matching `.network` file, which systemd-networkd reapplies across iface
|
|
||||||
lifecycle events:
|
|
||||||
|
|
||||||
# /etc/systemd/network/<your-uplink>.network
|
|
||||||
[CAKE]
|
|
||||||
Bandwidth=480M
|
|
||||||
RTTSec=100ms
PriorityQueueingPreset=diffserv4
FlowIsolationMode=dual-dst-host
|
|
||||||
|
|
||||||
The nftables marking from (1) is qdisc-installer-agnostic and ships
|
|
||||||
unchanged on production.
|
|
||||||
```
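
As a rough illustration of the "≈95% of measured uplink" rule in the README text above, the arithmetic can be sketched as follows. The helper name `cake_bandwidth_mbit` and its integer-percent parameter are assumptions made for this example; nothing like it ships in the deploy artifacts.

```python
def cake_bandwidth_mbit(measured_uplink_mbit: int, percent: int = 95) -> int:
    """Suggest a LEFT4ME_UPLINK_MBIT value for a measured uplink.

    Hypothetical helper: integer arithmetic, rounded down, so the
    declared bandwidth never exceeds the measured figure.
    """
    if measured_uplink_mbit <= 0:
        raise ValueError("measured uplink must be positive")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return measured_uplink_mbit * percent // 100
```

For a measured 500 Mbit uplink this yields 475; the `480` in the README comment is the same idea with slightly less headroom.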
|
|
||||||
|
|
||||||
- [ ] **Step 3: Append the three new escape hatches to the existing Performance tuning section.**
|
|
||||||
|
|
||||||
Add after the existing escape-hatch entries (CPU governor / CPU affinity / NIC tuning / real-time scheduling):
|
|
||||||
|
|
||||||
```markdown
|
|
||||||
### Additional opt-in network knobs
|
|
||||||
|
|
||||||
- **Ingress shaping via IFB.** Egress CAKE alone does not protect srcds
|
|
||||||
receive against ingress saturation (large workshop downloads, package
|
|
||||||
fetches arriving at line rate). Setup:
|
|
||||||
|
|
||||||
sudo modprobe ifb && sudo ip link set ifb0 up
|
|
||||||
sudo tc qdisc add dev <uplink> handle ffff: ingress
|
|
||||||
sudo tc filter add dev <uplink> parent ffff: protocol ip u32 \
|
|
||||||
match u32 0 0 action mirred egress redirect dev ifb0
|
|
||||||
sudo tc qdisc add dev ifb0 root cake bandwidth Xmbit ingress \
|
|
||||||
diffserv4 dual-srchost
|
|
||||||
|
|
||||||
Worth flipping only when measurement shows ingress hurting receive.
|
|
||||||
|
|
||||||
- **`net.core.busy_poll = 50` / `net.core.busy_read = 50`.** Reduces UDP
|
|
||||||
receive median latency by polling for incoming packets briefly at
|
|
||||||
syscall boundaries. Cost: measurable CPU per syscall under load. Worth
|
|
||||||
flipping if a host is dedicated to game serving and CPU headroom is
|
|
||||||
plentiful.
|
|
||||||
|
|
||||||
- **`ethtool -K <iface> gro off`.** Some Source-engine ops disable
|
|
||||||
generic receive offload to avoid receive-side coalescing latency.
|
|
||||||
Hardware/driver dependent; document only.
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 4: Re-run the full test suite.**
|
|
||||||
|
|
||||||
```
|
|
||||||
pytest deploy/tests/test_deploy_artifacts.py -v
|
|
||||||
```
|
|
||||||
Expected: every test passes, including `test_deploy_script_shell_syntax`.
|
|
||||||
|
|
||||||
- [ ] **Step 5: Commit.**
|
|
||||||
|
|
||||||
```
|
|
||||||
git add deploy/README.md
|
|
||||||
git commit -m "docs(deploy): document network-shaping defaults + opt-in network knobs"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Final verification
|
|
||||||
|
|
||||||
After all eight tasks land, run the whole suite once more and verify the new files are tracked:
|
|
||||||
|
|
||||||
```
|
|
||||||
pytest deploy/tests/test_deploy_artifacts.py -v
|
|
||||||
git status
|
|
||||||
git log --oneline -10
|
|
||||||
```
|
|
||||||
|
|
||||||
Every test should pass. `git status` should be clean. The last 8 commits should match the eight tasks above.
|
|
||||||
|
|
||||||
The new files in the tree:
|
|
||||||
|
|
||||||
```
|
|
||||||
deploy/files/etc/left4me/cake.env
|
|
||||||
deploy/files/usr/local/lib/left4me/nft/left4me-mark.nft
|
|
||||||
deploy/files/usr/local/lib/systemd/system/left4me-cake.service
|
|
||||||
deploy/files/usr/local/lib/systemd/system/left4me-nft-mark.service
|
|
||||||
deploy/files/usr/local/libexec/left4me/left4me-apply-cake
|
|
||||||
```
|
|
||||||
|
|
||||||
Modified files:
|
|
||||||
|
|
||||||
```
|
|
||||||
deploy/files/etc/sysctl.d/99-left4me.conf
|
|
||||||
deploy/deploy-test-server.sh
|
|
||||||
deploy/README.md
|
|
||||||
deploy/tests/test_deploy_artifacts.py
|
|
||||||
```
|
|
||||||
|
|
@ -1,131 +0,0 @@
|
||||||
# RCON Password Display Implementation Plan
|
|
||||||
|
|
||||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
|
||||||
|
|
||||||
**Goal:** Show the RCON password on the server detail page with a show/hide toggle.
|
|
||||||
|
|
||||||
**Architecture:** Three-file change. An external JS file (`password-reveal.js`) provides the reveal/hide interaction via event delegation on `[data-password-toggle]` attributes — no inline handlers or HTML event attributes. The template adds a row to the existing `.server-info` definition list with a masked span, value span, and toggle button. Base.html adds the script include alongside existing JS files.
|
|
||||||
|
|
||||||
**Tech Stack:** Vanilla JS, Jinja2 templates, Flask
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## File Structure
|
|
||||||
|
|
||||||
| File | Responsibility |
|
|
||||||
|------|---------------|
|
|
||||||
| `l4d2web/static/js/password-reveal.js` | New. Delegated click listener for show/hide toggle on `[data-password-toggle]` |
|
|
||||||
| `l4d2web/templates/server_detail.html` | Add one `<div>` row to `.server-info` DL |
|
|
||||||
| `l4d2web/templates/base.html` | Add `<script src="...password-reveal.js">` |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### Task 1: Create the reveal/hide JS
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Create: `l4d2web/static/js/password-reveal.js`
|
|
||||||
|
|
||||||
- [ ] **Step 1: Create `password-reveal.js`**
|
|
||||||
|
|
||||||
```js
document.addEventListener('click', (e) => {
  const btn = e.target.closest('[data-password-toggle]');
  if (!btn) return;
  const id = btn.dataset.passwordToggle;
  const mask = document.querySelector(`[data-password-field="${id}"].password-mask`);
  const value = document.querySelector(`[data-password-field="${id}"].password-value`);
  if (!mask || !value) return;
  const hidden = value.hidden;
  value.hidden = !hidden;
  mask.hidden = hidden;
  btn.textContent = hidden ? 'hide' : 'show';
  btn.setAttribute('aria-label', hidden ? 'Hide RCON password' : 'Show RCON password');
});
```
|
|
||||||
|
|
||||||
- [ ] **Step 2: Verify the file exists**
|
|
||||||
|
|
||||||
Run: `ls -la l4d2web/static/js/password-reveal.js`
|
|
||||||
Expected: the file exists and is non-empty
|
|
||||||
|
|
||||||
- [ ] **Step 3: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/static/js/password-reveal.js
|
|
||||||
git commit -m "feat: add password reveal toggle JS"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### Task 2: Add RCON password row to server detail template
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/templates/server_detail.html:13`
|
|
||||||
|
|
||||||
- [ ] **Step 1: Add the RCON password row after the blueprint row**
|
|
||||||
|
|
||||||
Insert after line 13 (`</dd></div>` for blueprint):
|
|
||||||
|
|
||||||
```html
|
|
||||||
<div><dt>RCON Password</dt><dd><span class="password-mask" data-password-field="{{ server.id }}">••••••••••••</span><span class="password-value" data-password-field="{{ server.id }}" hidden>{{ server.rcon_password }}</span> <button class="link-button" data-password-toggle="{{ server.id }}" aria-label="Show RCON password">show</button></dd></div>
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected result: the `.server-info` DL now shows three rows: Port, Blueprint, RCON Password.
|
|
||||||
|
|
||||||
- [ ] **Step 2: Verify template renders**
|
|
||||||
|
|
||||||
Run: `python -c "from jinja2 import Environment; env=Environment(); env.parse(open('l4d2web/templates/server_detail.html').read()); print('parse ok')"`
|
|
||||||
Expected: `parse ok`
|
|
||||||
|
|
||||||
- [ ] **Step 3: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/templates/server_detail.html
|
|
||||||
git commit -m "feat: add RCON password row to server detail page"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### Task 3: Include the script in base template
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/templates/base.html:44`
|
|
||||||
|
|
||||||
- [ ] **Step 1: Add the script include**
|
|
||||||
|
|
||||||
Insert after line 43 (`<script src="...file-tree.js">`):
|
|
||||||
|
|
||||||
```html
|
|
||||||
<script src="{{ url_for('static', filename='js/password-reveal.js') }}"></script>
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected result: `base.html` now has 6 script includes: htmx, csrf.js, sse.js, modal.js, file-tree.js, password-reveal.js.
|
|
||||||
|
|
||||||
- [ ] **Step 2: Verify the app starts**
|
|
||||||
|
|
||||||
Run: `cd l4d2web && python -c "from l4d2web.app import create_app; app=create_app(); print('app created ok')"` (or similar smoke test)
|
|
||||||
|
|
||||||
Expected: App initializes without import/template errors.
|
|
||||||
|
|
||||||
- [ ] **Step 3: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/templates/base.html
|
|
||||||
git commit -m "feat: include password-reveal.js in base template"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### Task 4: Run tests
|
|
||||||
|
|
||||||
**Files:** None
|
|
||||||
|
|
||||||
- [ ] **Step 1: Run existing test suite**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests -q`
|
|
||||||
Expected: All tests pass (no regressions from this purely-presentational change)
|
|
||||||
|
|
||||||
- [ ] **Step 2: If any tests fail, investigate and fix**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests -q --tb=long`
|
|
||||||
Expected: Clear failure report to debug
|
|
||||||
|
|
@ -1,408 +0,0 @@
|
||||||
# Server Hostname (Source `hostname` cvar) Implementation Plan
|
|
||||||
|
|
||||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
|
||||||
|
|
||||||
**Goal:** Add a `hostname` column to the Server model so users can set the Source `hostname` cvar (server browser/MOTD name), with an ephemeral `"<username> <server.name>"` fallback resolved at deploy time.
|
|
||||||
|
|
||||||
**Architecture:** New `hostname VARCHAR(128)` column on `servers` table (default `""`). Empty = auto-generate at deploy. The `build_server_spec_payload()` function gains a `resolved_hostname` kwarg; `initialize_server()` resolves the fallback ephemerally. The server detail page shows an inline form under RCON password. Same `POST /servers/<id>` endpoint handles saving.
|
|
||||||
|
|
||||||
**Tech Stack:** Python 3.12+, Flask, SQLAlchemy, Alembic, pytest.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### Task 1: Add `hostname` column to Server model and migration
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/models.py`
|
|
||||||
- Create: `l4d2web/alembic/versions/0011_server_hostname.py`
|
|
||||||
|
|
||||||
- [ ] **Step 1: Add hostname column to model**
|
|
||||||
|
|
||||||
Add to the `Server` class in `l4d2web/models.py:131`:
|
|
||||||
|
|
||||||
```python
|
|
||||||
hostname: Mapped[str] = mapped_column(String(128), default="", nullable=False)
|
|
||||||
```
|
|
||||||
|
|
||||||
Place it after `rcon_password` (line 148) and before `created_at` (line 149).
|
|
||||||
|
|
||||||
- [ ] **Step 2: Create the migration**
|
|
||||||
|
|
||||||
Create `l4d2web/alembic/versions/0011_server_hostname.py`:
|
|
||||||
|
|
||||||
```python
"""add hostname column to servers

Revision ID: 0011_server_hostname
Revises: 0010_server_live_state
Create Date: 2026-05-13
"""
from __future__ import annotations

from typing import Sequence, Union

import sqlalchemy as sa
from alembic import op


revision: str = "0011_server_hostname"
down_revision: Union[str, Sequence[str], None] = "0010_server_live_state"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    with op.batch_alter_table("servers") as batch_op:
        batch_op.add_column(
            sa.Column("hostname", sa.String(length=128), nullable=False, server_default="")
        )


def downgrade() -> None:
    with op.batch_alter_table("servers") as batch_op:
        batch_op.drop_column("hostname")
```
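
The reason the migration pairs `nullable=False` with `server_default=""` can be shown with plain SQL. The sketch below assumes a SQLite backing store and a cut-down, made-up `servers` table; the real migration goes through Alembic's batch mode rather than a raw `ALTER TABLE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the real table, with one pre-existing row.
conn.execute("CREATE TABLE servers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO servers (name) VALUES ('alpha')")

# A NOT NULL column added to a populated table needs a default so that
# existing rows get a value; this is the role server_default="" plays.
conn.execute(
    "ALTER TABLE servers ADD COLUMN hostname VARCHAR(128) NOT NULL DEFAULT ''"
)

rows = conn.execute("SELECT name, hostname FROM servers").fetchall()
```

Existing rows come back with an empty `hostname`, which the plan treats as "auto-generate at deploy".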
|
|
||||||
|
|
||||||
- [ ] **Step 3: Verify migration applies cleanly**
|
|
||||||
|
|
||||||
Run: `cd l4d2web && alembic upgrade head`
|
|
||||||
Expected: runs `0011_server_hostname` migration, adds the column.
|
|
||||||
|
|
||||||
Run: `cd l4d2web && alembic downgrade -1`
|
|
||||||
Expected: drops the column.
|
|
||||||
|
|
||||||
Run: `cd l4d2web && alembic upgrade head`
|
|
||||||
Expected: re-adds the column.
|
|
||||||
|
|
||||||
- [ ] **Step 4: Commit model + migration**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/models.py l4d2web/alembic/versions/0011_server_hostname.py
|
|
||||||
git commit -m "feat(l4d2-web): add hostname column to Server model"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### Task 2: Accept and save `hostname` on server update
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/routes/server_routes.py`
|
|
||||||
- Test: `l4d2web/tests/test_servers.py`
|
|
||||||
|
|
||||||
- [ ] **Step 1: Write failing hostname update tests**
|
|
||||||
|
|
||||||
Add to `l4d2web/tests/test_servers.py`:
|
|
||||||
|
|
||||||
```python
def test_create_server_hostname_defaults_empty(user_client_with_blueprints) -> None:
    from sqlalchemy import select
    from l4d2web.models import Server

    client, data = user_client_with_blueprints
    response = client.post(
        "/servers",
        data={"name": "alpha", "port": "27015", "blueprint_id": str(data["blueprint_id"])},
        headers={"X-CSRF-Token": "test-token"},
    )
    assert response.status_code == 302

    with session_scope() as session:
        server = session.scalar(select(Server).where(Server.name == "alpha"))
        assert server is not None
        assert server.hostname == ""


def test_update_server_hostname_via_form(user_client_with_blueprints) -> None:
    from sqlalchemy import select
    from l4d2web.models import Server

    client, data = user_client_with_blueprints
    create = client.post(
        "/servers",
        data={"name": "alpha", "port": "27015", "blueprint_id": str(data["blueprint_id"])},
        headers={"X-CSRF-Token": "test-token"},
    )
    server_id = create.headers["Location"].rsplit("/", 1)[1]

    update = client.post(
        f"/servers/{server_id}",
        data={"name": "alpha", "hostname": "My Cool Server"},
        headers={"X-CSRF-Token": "test-token"},
    )
    assert update.status_code == 302

    with session_scope() as session:
        server = session.scalar(select(Server).where(Server.name == "alpha"))
        assert server is not None
        assert server.hostname == "My Cool Server"


def test_update_server_clears_hostname(user_client_with_blueprints) -> None:
    from sqlalchemy import select
    from l4d2web.models import Server

    client, data = user_client_with_blueprints
    create = client.post(
        "/servers",
        data={"name": "alpha", "port": "27015", "blueprint_id": str(data["blueprint_id"])},
        headers={"X-CSRF-Token": "test-token"},
    )
    server_id = create.headers["Location"].rsplit("/", 1)[1]

    # Set hostname first
    client.post(
        f"/servers/{server_id}",
        data={"name": "alpha", "hostname": "My Cool Server"},
        headers={"X-CSRF-Token": "test-token"},
    )

    # Clear it
    client.post(
        f"/servers/{server_id}",
        data={"name": "alpha", "hostname": ""},
        headers={"X-CSRF-Token": "test-token"},
    )

    with session_scope() as session:
        server = session.scalar(select(Server).where(Server.name == "alpha"))
        assert server is not None
        assert server.hostname == ""
```
|
|
||||||
|
|
||||||
- [ ] **Step 2: Run tests to verify failure**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests/test_servers.py -k "hostname" -v`
|
|
||||||
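Expected: FAIL. `test_update_server_hostname_via_form` fails its final assertion because the update route does not save `hostname` yet. The column itself already exists after Task 1, so the other two tests may already pass.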
|
|
||||||
|
|
||||||
- [ ] **Step 3: Save hostname from form in update route**
|
|
||||||
|
|
||||||
In `l4d2web/routes/server_routes.py`, modify `update_server_form` (around line 130) to save hostname:
|
|
||||||
|
|
||||||
```python
server.name = name
server.hostname = request.form.get("hostname", "")
```
|
|
||||||
|
|
||||||
- [ ] **Step 4: Run tests to verify pass**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests/test_servers.py -k "hostname" -v`
|
|
||||||
Expected: PASS.
|
|
||||||
|
|
||||||
- [ ] **Step 5: Commit hostname update support**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/routes/server_routes.py l4d2web/tests/test_servers.py
|
|
||||||
git commit -m "feat(l4d2-web): accept hostname on server update, default empty on create"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### Task 3: Emit `hostname` in spec payload with ephemeral fallback
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/services/l4d2_facade.py`
|
|
||||||
- Test: `l4d2web/tests/test_l4d2_facade.py`
|
|
||||||
|
|
||||||
- [ ] **Step 1: Write failing hostname spec tests**
|
|
||||||
|
|
||||||
Add to `l4d2web/tests/test_l4d2_facade.py`:
|
|
||||||
|
|
||||||
```python
def test_build_server_spec_payload_injects_hostname() -> None:
    from l4d2web.services.l4d2_facade import build_server_spec_payload

    bp = Blueprint(id=1, user_id=1, name="bp", arguments="[]", config='["sv_consistency 1"]')
    srv = Server(id=1, user_id=1, blueprint_id=1, name="alpha", port=27015, rcon_password="sekret")
    spec = build_server_spec_payload(srv, bp, [], resolved_hostname="My Server")
    cfg = spec["config"]
    assert "hostname \"My Server\"" in cfg
    assert cfg[-1] == "rcon_password \"sekret\""


def test_build_server_spec_payload_omits_hostname_when_empty() -> None:
    from l4d2web.services.l4d2_facade import build_server_spec_payload

    bp = Blueprint(id=1, user_id=1, name="bp", arguments="[]", config="[]")
    srv = Server(id=1, user_id=1, blueprint_id=1, name="alpha", port=27015, rcon_password="sekret")
    spec = build_server_spec_payload(srv, bp, [])
    for line in spec["config"]:
        assert not line.startswith("hostname ")


def test_initialize_server_resolves_fallback_hostname(
    monkeypatch: pytest.MonkeyPatch, server_with_blueprint,
) -> None:
    """When server.hostname is empty, deploy emits hostname "<username> <server.name>"."""
    from l4d2web.services.l4d2_facade import initialize_server

    spec_contents: list[str] = []

    def fake_run_command(cmd, **kwargs):
        nonlocal spec_contents
        spec_path = cmd[cmd.index("-f") + 1]
        spec_contents.append(Path(spec_path).read_text())
        return CommandResult(returncode=0, stdout="", stderr="")

    monkeypatch.setattr("l4d2web.services.host_commands.run_command", fake_run_command)

    server_id, _ = server_with_blueprint
    initialize_server(server_id)

    assert len(spec_contents) == 1
    assert "hostname" in spec_contents[0]
    # The fixture creates user "alice" and server named "alpha"
    assert '"alice alpha"' in spec_contents[0]
```
|
|
||||||
|
|
||||||
- [ ] **Step 2: Run tests to verify failure**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests/test_l4d2_facade.py -k "hostname" -v`
|
|
||||||
Expected: FAIL — `build_server_spec_payload()` got unexpected keyword `resolved_hostname`.
|
|
||||||
|
|
||||||
- [ ] **Step 3: Add `resolved_hostname` kwarg and emit line**
|
|
||||||
|
|
||||||
In `l4d2web/services/l4d2_facade.py`, modify `build_server_spec_payload` signature and add the hostname injection before `rcon_password`:
|
|
||||||
|
|
||||||
```python
def build_server_spec_payload(
    server: Server,
    blueprint: Blueprint,
    overlay_rows: list[tuple[int, str, bool]],
    *,
    resolved_hostname: str = "",
) -> dict:
```
|
|
||||||
|
|
||||||
Inside the function, after building `config_lines` and before the `if server.rcon_password:` block, add:
|
|
||||||
|
|
||||||
```python
if resolved_hostname:
    config_lines.append(f'hostname "{resolved_hostname}"')
```
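
The ordering that the tests in Step 1 rely on (hostname present, `rcon_password` last) can be sketched in isolation. `assemble_config` is a hypothetical stand-in written for this illustration; it is not the real `build_server_spec_payload`, whose other fields are not shown in this plan.

```python
def assemble_config(
    base_lines: list[str], resolved_hostname: str, rcon_password: str
) -> list[str]:
    """Hypothetical sketch of the config-line ordering described above."""
    config_lines = list(base_lines)  # copy so the caller's list is untouched
    if resolved_hostname:
        config_lines.append(f'hostname "{resolved_hostname}"')
    if rcon_password:
        # Appended last, so the hostname line always precedes it.
        config_lines.append(f'rcon_password "{rcon_password}"')
    return config_lines
```

With an empty `resolved_hostname` no `hostname` line is emitted at all, matching the "omits hostname when empty" test.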
|
|
||||||
|
|
||||||
Then in `initialize_server`, resolve the fallback. Add the `User` import at the top:
|
|
||||||
|
|
||||||
```python
from l4d2web.models import (
    Blueprint,
    BlueprintOverlay,
    Overlay,
    OverlayWorkshopItem,
    Server,
    User,
    WorkshopItem,
)
```
|
|
||||||
|
|
||||||
In `initialize_server`, after `load_server_blueprint_bundle(server_id)`, add:
|
|
||||||
|
|
||||||
```python
# Resolve hostname: explicit override or ephemeral fallback
if server.hostname:
    resolved_hostname = server.hostname
else:
    with session_scope() as db:
        user = db.get(User, server.user_id)
        # Read the username while the session is still open.
        resolved_hostname = f"{user.username} {server.name}"
```
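
Stripped of the database lookup, the fallback rule is a one-line decision. The sketch below uses a hypothetical pure function, `resolve_hostname`, purely to make the rule testable on its own; the plan keeps the logic inline in `initialize_server`.

```python
def resolve_hostname(explicit: str, username: str, server_name: str) -> str:
    """Return the stored hostname, or "<username> <server.name>" when empty."""
    return explicit if explicit else f"{username} {server_name}"
```

For the fixture described in the tests (user "alice", server "alpha") the fallback is `alice alpha`, and nothing is written back to the database.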
|
|
||||||
|
|
||||||
Then pass it to `build_server_spec_payload`:
|
|
||||||
|
|
||||||
```python
spec_path = write_temp_spec(build_server_spec_payload(
    server, blueprint, overlay_rows, resolved_hostname=resolved_hostname,
))
```
|
|
||||||
|
|
||||||
- [ ] **Step 4: Run tests to verify pass**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests/test_l4d2_facade.py -k "hostname" -v`
|
|
||||||
Expected: PASS.
|
|
||||||
|
|
||||||
Also run full suite to check nothing broken: `pytest l4d2web/tests/test_l4d2_facade.py -v`
|
|
||||||
|
|
||||||
- [ ] **Step 5: Commit hostname spec payload**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/services/l4d2_facade.py l4d2web/tests/test_l4d2_facade.py
|
|
||||||
git commit -m "feat(l4d2-web): emit hostname in spec config with ephemeral fallback"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### Task 4: Add hostname form to server detail page
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/templates/server_detail.html`
|
|
||||||
|
|
||||||
- [ ] **Step 1: Verify the current template renders correctly first**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests -q`
|
|
||||||
Expected: PASS (baseline).
|
|
||||||
|
|
||||||
- [ ] **Step 2: Add hostname form under RCON password**
|
|
||||||
|
|
||||||
In `l4d2web/templates/server_detail.html`, after the RCON password `<dd>` block (closing `</dd>` at line 14) and before the closing `</dl>` (line 15), add:
|
|
||||||
|
|
||||||
```html
<div><dt>Hostname</dt>
  <dd>
    <form method="post" action="/servers/{{ server.id }}" class="inline-save">
      <input type="hidden" name="csrf_token" value="{{ session.get('csrf_token', '') }}">
      <input name="hostname" value="{{ server.hostname }}" placeholder="{{ user.username }} {{ server.name }}" maxlength="128">
      <button type="submit">Save</button>
      <span class="field-hint">Leave empty for auto: "{{ user.username }} {{ server.name }}"</span>
    </form>
  </dd>
</div>
```
|
|
||||||
|
|
||||||
The `user` variable is already available in the template context (the server detail route passes it through the auth mechanism).
|
|
||||||
|
|
||||||
- [ ] **Step 3: Run full test suite to verify nothing broken**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests -q`
|
|
||||||
Expected: PASS.
|
|
||||||
|
|
||||||
- [ ] **Step 4: Commit template change**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/templates/server_detail.html
|
|
||||||
git commit -m "feat(l4d2-web): add hostname edit form to server detail page"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### Task 5: Final integration verification
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- None (verification only)
|
|
||||||
|
|
||||||
- [ ] **Step 1: Run all tests**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests -q`
|
|
||||||
Expected: PASS.
|
|
||||||
|
|
||||||
Run: `pytest l4d2host/tests -q` (host lib must not be affected)
|
|
||||||
Expected: PASS.
|
|
||||||
|
|
||||||
- [ ] **Step 2: Run alembic check to ensure migration is the latest**
|
|
||||||
|
|
||||||
Run: `cd l4d2web && alembic check`
|
|
||||||
Expected: "No new upgrade operations detected."
|
|
||||||
|
|
||||||
- [ ] **Step 3: Commit any final touches needed**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add -A
|
|
||||||
git commit -m "chore: finalize server hostname feature"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Self-Review
|
|
||||||
|
|
||||||
- [ ] Spec coverage: model column, migration, update route saves hostname, spec payload emits hostname line, ephemeral fallback resolved in initialize_server, template has inline form.
|
|
||||||
- [ ] Placeholder scan: no TODOs or TBDs.
|
|
||||||
- [ ] Type/name consistency: `resolved_hostname` kwarg matches usage in both caller and callee.
|
|
||||||
- [ ] Verification: each task has exact test commands and expected outcomes.
|
|
||||||
|
|
@ -1,331 +0,0 @@
|
||||||
# Idmapped lowerdirs for left4me kernel-overlayfs
|
|
||||||
|
|
||||||
> **SUPERSEDED 2026-05-15** by the uid-collapse refactor
|
|
||||||
> ([`2026-05-15-uid-collapse.md`](2026-05-15-uid-collapse.md)). With
|
|
||||||
> `l4d2-sandbox` collapsed into `left4me`, all overlay content is
|
|
||||||
> uniformly `left4me`-owned end-to-end and no idmap is needed at
|
|
||||||
> mount time either. Kept for design-evolution context.
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
Kernel-overlayfs copy-up preserves the lower-layer file's owner and mode in the
|
|
||||||
upperdir. Script overlays today are built by `left4me-script-sandbox` running as
|
|
||||||
uid `l4d2-sandbox`, and the helper finalizes them as `l4d2-sandbox:l4d2-sandbox
|
|
||||||
0755`. When the L4D2 server (uid `left4me`) tries to write into a directory
|
|
||||||
that exists only in the lower layer — e.g. SourceMod's `addons/sourcemod/logs/`
|
|
||||||
for log rotation — copy-up succeeds but the result is `l4d2-sandbox`-owned, so
|
|
||||||
the write fails `EACCES`. Workshop overlays are unaffected because the web app
|
|
||||||
(uid `left4me`) builds them as `left4me`-owned with symlinks into a `left4me`-
|
|
||||||
owned cache.
|
|
||||||
|
|
||||||
We considered four fixes (chown-flip on every rebuild, shared group, collapse
|
|
||||||
sandbox uid into `left4me`, idmapped lowerdir bind mounts). The user chose the
|
|
||||||
idmap path: disk state stays untouched, the mount stack remaps `l4d2-sandbox →
|
|
||||||
left4me` at mount time, kernel-overlayfs sees a `left4me`-owned lower layer,
|
|
||||||
and copy-up creates upperdir entries owned by `left4me` naturally. No
|
|
||||||
ownership flipping, no shared group, no security regression.
|
|
||||||
|
|
||||||
Outcome: `sm_cvar` writes succeed, SM logs land in `runtime/<n>/upper/...`,
|
|
||||||
and any future "write into a sandbox-built lower layer" case works without
|
|
||||||
left4me-specific plumbing.
|
|
||||||
|
|
||||||
## Environment confirmed
|
|
||||||
|
|
||||||
Test server: Debian Trixie, kernel `6.12.86+deb13-amd64`, util-linux supports
|
|
||||||
`mount --map-users <on_disk>:<in_mount>:<count>`. Idmapped lowerdirs for
|
|
||||||
overlayfs landed mainline in 6.6, so 6.12 is fine. Verified end-to-end on
|
|
||||||
`/var/lib/left4me/` (ext4) in a temp dir on 2026-05-14:
|
|
||||||
|
|
||||||
1. Source dir owned `l4d2-sandbox:l4d2-sandbox` (uid 981).
|
|
||||||
2. `mount --bind --map-users=981:980:1 --map-groups=981:980:1 src dst` — `dst`
|
|
||||||
view shows uid 980 (left4me).
|
|
||||||
3. Overlay mount with the idmapped path as `lowerdir=` — merged view also
|
|
||||||
shows uid 980.
|
|
||||||
4. `sudo -u left4me touch merged/addons/sourcemod/logs/L_test.log` — succeeds.
|
|
||||||
`sudo -u left4me bash -c "echo x >> merged/file-from-sandbox.txt"` —
|
|
||||||
succeeds (copy-up of existing file).
|
|
||||||
5. `upper/` after writes is entirely `left4me`-owned (uid 980).
|
|
||||||
|
|
||||||
Caveat surfaced during testing: `--map-users` direction is **on-disk uid
|
|
||||||
first**, not "inner-namespace uid". The util-linux man page calls it
|
|
||||||
`<inner>:<outer>:<count>` but `<inner>` means "the filesystem's native view"
|
|
||||||
(on disk) and `<outer>` means "what the mount exposes outward". Easy to get
|
|
||||||
wrong; do not trust the man page wording.
|
|
||||||
|
|
||||||
## Approach
|
|
||||||
|
|
||||||
The privileged mount helper grows one step before the overlay mount: for each
|
|
||||||
lowerdir whose owning uid is `l4d2-sandbox`, create an idmapped bind mount at
|
|
||||||
`runtime/<n>/idmap/<basename>` that remaps that uid to `left4me`. Use the
|
|
||||||
idmapped paths (instead of raw paths) in the `lowerdir=` string passed to
|
|
||||||
`mount -t overlay`. On umount, tear the idmap binds down after the overlay
|
|
||||||
itself is unmounted.
|
|
||||||
|
|
||||||
Lowerdirs already owned by `left4me` (workshop builds, `installation/`,
|
|
||||||
caches) bypass the idmap step and are used as-is, so workshop overlays keep
|
|
||||||
working without behavior change.
|
|
||||||
|
|
||||||
## Execution shape
|
|
||||||
|
|
||||||
Implementation will be driven by `superpowers:subagent-driven-development`:
|
|
||||||
fresh implementer subagent per task, followed by a spec-compliance reviewer
|
|
||||||
then a code-quality reviewer. The project's `AGENTS.md` forbids git
|
|
||||||
worktrees, so all work happens in the live tree. Commits land directly on
|
|
||||||
`master` per the user-confirmed project pattern.
|
|
||||||
|
|
||||||
Tasks ordered for review-friendly progression. Each task is independently
|
|
||||||
committable; the deploy/verify step at the end exercises the whole chain on
|
|
||||||
the real test server.
|
|
||||||
|
|
||||||
### Task 1 — Idmap bind mounts in `left4me-overlay`
|
|
||||||
|
|
||||||
Edit `deploy/files/usr/local/libexec/left4me/left4me-overlay` and
|
|
||||||
`l4d2host/tests/test_overlay_helper.py` together (TDD: write failing
|
|
||||||
PRINT_ONLY-mode tests first, then make them pass).
|
|
||||||
|
|
||||||
Behavior to add:
|
|
||||||
- Resolve `l4d2_sandbox_uid` and `left4me_uid` (and gids) via `pwd.getpwnam`
|
|
||||||
/ `grp.getgrnam`. Hard fail with a clear message if either is missing.
|
|
||||||
- On `mount <name>`: before constructing the `lowerdir=` string, for each
|
|
||||||
resolved lowerdir, stat it; if the top-level dir's `st_uid` equals
|
|
||||||
`l4d2_sandbox_uid`, create `runtime/<n>/idmap/<basename>` (mode `0700`,
|
|
||||||
root-owned), and if it's not already a mountpoint, exec `mount --bind
|
|
||||||
--map-users=<l4d2_sandbox_uid>:<left4me_uid>:1
|
|
||||||
--map-groups=<l4d2_sandbox_gid>:<left4me_gid>:1 <src> <target>`. Use
|
|
||||||
numeric uids/gids in the argv. Substitute the idmap path into the
|
|
||||||
`lowerdir=` colon string in place of the original path.
|
|
||||||
- On `umount <name>`: after the existing `umount` of `merged`, iterate
|
|
||||||
`runtime/<n>/idmap/*`, `umount` each that is a mountpoint, then
|
|
||||||
`shutil.rmtree(runtime/<n>/idmap, ignore_errors=True)`. Idempotent.
|
|
||||||
- PRINT_ONLY mode emits the bind-mount argv (one line per bind) before the
|
|
||||||
overlay-mount argv, same shell-quoting style.
|
|
||||||
|
|
||||||
Tests to add to `test_overlay_helper.py` (reuse PRINT_ONLY harness):
|
|
||||||
- `test_mount_idmaps_sandbox_owned_lowerdir` — tmp lower owned by
|
|
||||||
faked-sandbox uid, assert helper emits `mount --bind --map-users=...`
|
|
||||||
argv and the overlay `lowerdir=` references the idmap path.
|
|
||||||
- `test_mount_skips_idmap_for_left4me_owned_lowerdir` — assert no bind
|
|
||||||
argv, raw path in `lowerdir=`.
|
|
||||||
- `test_umount_unwinds_idmap_binds` — pre-seed an idmap subdir as a
|
|
||||||
mountpoint sentinel; assert the umount sequence in PRINT_ONLY includes
|
|
||||||
the bind teardown after the overlay umount.
|
|
||||||
|
|
||||||
Uid lookup in tests: monkeypatch `pwd.getpwnam` to return synthetic uids
|
|
||||||
matching what the test's `chown` set up. (No root required.)
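A minimal sketch of the uid-lookup fake described above. The factory name `make_fake_getpwnam` is hypothetical, and the uids 981/980 are the values from the test-server run; in a pytest test the same fake would be installed with `monkeypatch.setattr(pwd, "getpwnam", fake)` instead of `mock.patch.object`.

```python
import pwd
from types import SimpleNamespace
from unittest import mock


def make_fake_getpwnam(uids: dict[str, int]):
    """Return a stand-in for pwd.getpwnam backed by a name -> uid table."""

    def fake_getpwnam(name: str):
        if name not in uids:
            # The real pwd.getpwnam raises KeyError for unknown users.
            raise KeyError(f"getpwnam(): name not found: {name!r}")
        # Assumption for the sketch: gid equals uid for both accounts.
        return SimpleNamespace(pw_name=name, pw_uid=uids[name], pw_gid=uids[name])

    return fake_getpwnam


fake = make_fake_getpwnam({"l4d2-sandbox": 981, "left4me": 980})
with mock.patch.object(pwd, "getpwnam", fake):
    sandbox_uid = pwd.getpwnam("l4d2-sandbox").pw_uid
    left4me_uid = pwd.getpwnam("left4me").pw_uid
```

Patching only takes effect in the process that imports the helper code, so this approach presumably fits tests that load the helper as a module rather than spawning it.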
|
|
||||||
|
|
||||||
### Task 2 — Deploy-artifact regression test
|
|
||||||
|
|
||||||
Edit `deploy/tests/test_deploy_artifacts.py`. Add a single test that opens
|
|
||||||
`deploy/files/usr/local/libexec/left4me/left4me-overlay` and asserts the
|
|
||||||
strings `--map-users` and `runtime/` followed by `idmap` (or similar
|
|
||||||
identifying marker) are present. Cheap guard against silent regression of
|
|
||||||
the deploy artifact.
|
|
||||||
|
|
||||||
### Task 3 — Deploy README mirror note
|
|
||||||
|
|
||||||
Edit `deploy/README.md`. Add one line under the existing ckn-bw mirror
|
|
||||||
notes flagging that the helper file change must be picked up by
|
|
||||||
`bundles/left4me/` in ckn-bw (no new group, user, or unit needed).
|
|
||||||
|
|
||||||
### Task 4 — Persist the plan in-repo
|
|
||||||
|
|
||||||
Per `AGENTS.md`: "the persisted artifact must end up under
|
|
||||||
`docs/superpowers/` and be committed." Copy this scratch plan to
|
|
||||||
`docs/superpowers/plans/2026-05-14-overlay-idmap.md` and commit it. Do
|
|
||||||
this as a separate commit from Task 1 so the plan lands before the
|
|
||||||
implementation.
|
|
||||||
|
|
||||||
### Task 5 — Deploy and verify on `left4.me`
|
|
||||||
|
|
||||||
Out-of-band, after the code tasks land:
|
|
||||||
1. ckn-bw apply (or scp the helper into place on the test server) to get
|
|
||||||
the new helper deployed.
|
|
||||||
2. Stop server 2: `sudo systemctl stop left4me-server@2`.
|
|
||||||
3. Clear the stale `l4d2-sandbox`-owned upperdir SM dirs:
|
|
||||||
`sudo rm -rf /var/lib/left4me/runtime/2/upper/left4dead2/addons/sourcemod/{logs,data}`.
|
|
||||||
4. Start server 2: `sudo systemctl start left4me-server@2`.
|
|
||||||
5. Confirm `journalctl -u left4me-server@2 -o cat -n 50` shows the new
|
|
||||||
`mount --bind --map-users=...` line.
|
|
||||||
6. RCON `sm_cvar nb_update_frequency 0.0333`. Expect no `Platform returned
|
|
||||||
error: "Permission denied"` log line.
|
|
||||||
7. `sudo ls -ln /var/lib/left4me/runtime/2/upper/left4dead2/addons/sourcemod/logs/`.
|
|
||||||
Expect uid 980 (left4me).
|
|
||||||
8. Restart again to confirm idempotency: idmap binds set up fresh, no
|
|
||||||
leftover mounts from prior start.
|
|
||||||
|
|
||||||
## Files to modify
|
|
||||||
|
|
||||||
### 1. `deploy/files/usr/local/libexec/left4me/left4me-overlay`
|
|
||||||
|
|
||||||
Single privileged code path; everything else flows from here.
|
|
||||||
|
|
||||||
**Add a helper function** to decide whether a path needs idmapping. Stat the
|
|
||||||
directory; if its `st_uid` matches the resolved `l4d2-sandbox` uid, return the
|
|
||||||
idmapped path under `runtime/<n>/idmap/`; otherwise return the input path
|
|
||||||
unchanged.
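The decision helper could look roughly like this. It is a sketch, not the helper's actual code: the signature and the parameter names `idmap_root` and `sandbox_uid` are assumptions made for illustration.

```python
import os
from pathlib import Path


def idmapped_path(lowerdir: Path, idmap_root: Path, sandbox_uid: int) -> Path:
    """Return the idmap bind target for sandbox-owned dirs, else the input path.

    Only the top-level directory's owner is inspected, matching the
    "trust the whole tree" rule in this plan.
    """
    if os.stat(lowerdir).st_uid == sandbox_uid:
        return idmap_root / lowerdir.name
    return lowerdir
```

One thing the sketch does not handle: two lowerdirs with the same basename would map to the same target under `idmap_root`, so a real implementation may need to disambiguate.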
|
|
||||||
|
|
||||||
**In `cmd_mount`**, before constructing `lowerdir=`:
|
|
||||||
1. `os.makedirs(runtime_dir / "idmap", exist_ok=True)` (root-owned, mode
|
|
||||||
`0o700`; only the helper writes here).
|
|
||||||
2. Resolve `l4d2_sandbox_uid = pwd.getpwnam("l4d2-sandbox").pw_uid` and
|
|
||||||
`left4me_uid = pwd.getpwnam("left4me").pw_uid`. Cache. Fail fast with a
|
|
||||||
clear message if either user is missing.
|
|
||||||
3. For each `lowerdir` in the resolved list, compute
|
|
||||||
`idmapped_path(lowerdir)`. If remapping is required:
|
|
||||||
- Create the target directory under `runtime/<n>/idmap/<basename>` if
|
|
||||||
missing.
|
|
||||||
- Skip the bind if it's already mounted there (`os.path.ismount`).
|
|
||||||
- Otherwise exec `mount --bind
|
|
||||||
--map-users=<l4d2_sandbox_uid>:<left4me_uid>:1
|
|
||||||
--map-groups=<l4d2_sandbox_gid>:<left4me_gid>:1 <src> <target>`.
|
|
||||||
**Direction note**: first arg is the on-disk uid, second arg is the
|
|
||||||
uid the mount exposes. We verified empirically that this is what the
|
|
||||||
kernel honors despite ambiguous man-page wording.
|
|
||||||
4. Pass the (possibly idmapped) paths into the `lowerdir=` colon string in
|
|
||||||
the same order.
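The four steps above can be sketched as a pure planning function that returns the bind-mount argvs and the final `lowerdir=` string. Everything here is illustrative: `plan_mount`, the `owner_uid` table, and the overlay path `/var/lib/left4me/overlays/scripts` are hypothetical, and the real helper would stat directories and exec the commands rather than return them.

```python
import shlex
from pathlib import PurePosixPath


def plan_mount(
    lowerdirs: list[str],
    owner_uid: dict[str, int],
    idmap_root: str,
    sandbox: tuple[int, int],
    left4me: tuple[int, int],
) -> tuple[list[list[str]], str]:
    """Return (bind argvs, lowerdir string); sandbox/left4me are (uid, gid)."""
    binds: list[list[str]] = []
    resolved: list[str] = []
    for src in lowerdirs:
        if owner_uid[src] != sandbox[0]:
            resolved.append(src)  # not sandbox-owned: used as-is
            continue
        target = str(PurePosixPath(idmap_root) / PurePosixPath(src).name)
        binds.append([
            "mount", "--bind",
            # On-disk id first, id exposed by the mount second.
            f"--map-users={sandbox[0]}:{left4me[0]}:1",
            f"--map-groups={sandbox[1]}:{left4me[1]}:1",
            src, target,
        ])
        resolved.append(target)
    # Order is preserved so overlay precedence does not change.
    return binds, ":".join(resolved)


binds, lowerdir = plan_mount(
    ["/var/lib/left4me/overlays/scripts", "/var/lib/left4me/installation"],
    {"/var/lib/left4me/overlays/scripts": 981, "/var/lib/left4me/installation": 980},
    "/var/lib/left4me/runtime/2/idmap",
    sandbox=(981, 981),
    left4me=(980, 980),
)
# One shell-quoted line per bind, in the style PRINT_ONLY mode would emit.
print_only_lines = [shlex.join(argv) for argv in binds]
```

Keeping the planning step free of side effects is one way to make the PRINT_ONLY tests cheap: the same function feeds both the printed output and the real `exec` path.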
|
|
||||||
|
|
||||||
**In `cmd_umount`**, after the existing overlay umount:
|
|
||||||
- For each subdirectory under `runtime/<n>/idmap/`, if it's a mountpoint,
|
|
||||||
`umount` it.
|
|
||||||
- `shutil.rmtree(runtime_dir / "idmap", ignore_errors=True)` after all binds
|
|
||||||
are gone.
|
|
||||||
- Idempotent: re-running after the dir is gone is a no-op.
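A sketch of the teardown described above, to be called after the overlay itself has been unmounted. The function name `unwind_idmap` and the injectable `is_mount` / `run` parameters are assumptions added so the logic can be exercised without root.

```python
import os
import shutil
import subprocess
from pathlib import Path
from typing import Callable


def _run_checked(argv: list[str]) -> None:
    """Default runner: execute the command and raise on a non-zero exit."""
    subprocess.run(argv, check=True)


def unwind_idmap(
    idmap_dir: Path,
    is_mount: Callable[[str], bool] = os.path.ismount,
    run: Callable[[list[str]], object] = _run_checked,
) -> list[list[str]]:
    """Unmount every idmap bind under idmap_dir, then remove the directory."""
    executed: list[list[str]] = []
    if not idmap_dir.is_dir():
        return executed  # already cleaned up: re-running is a no-op
    for entry in sorted(idmap_dir.iterdir()):
        if entry.is_dir() and is_mount(str(entry)):
            argv = ["umount", str(entry)]
            run(argv)
            executed.append(argv)
    shutil.rmtree(idmap_dir, ignore_errors=True)
    return executed
```

Returning the executed argvs is a convenience for tests; a PRINT_ONLY variant could pass a `run` that only records or prints the command.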
|
|
||||||
|
|
||||||
**PRINT_ONLY mode**: emit the bind-mount argv before the overlay-mount argv,
|
|
||||||
on separate lines, so test assertions can match. Same shell-quoting.
|
|
||||||
|
|
||||||
**Allowlist**: no change needed — the idmap binds land under `runtime/`,
|
|
||||||
which is already write-permitted for the helper.
|
|
||||||
|
|
||||||
### 2. `l4d2host/tests/test_overlay_helper.py`
|
|
||||||
|
|
||||||
Add tests using the existing PRINT_ONLY harness:
|
|
||||||
|
|
||||||
- `test_mount_idmaps_sandbox_owned_lowerdir`: create a tmp lowerdir, `chown`
|
|
||||||
it to a fake-`l4d2-sandbox` (use `monkeypatch` on the uid lookup if running
|
|
||||||
unprivileged), run helper in PRINT_ONLY, assert a `mount --bind
|
|
||||||
--map-users=...` line appears and the `lowerdir=` string references the
|
|
||||||
idmap path.
|
|
||||||
- `test_mount_skips_idmap_for_left4me_owned_lowerdir`: tmp lowerdir owned by
|
|
||||||
the test user, assert no bind-mount argv emitted and `lowerdir=`
|
|
||||||
references the raw path.
|
|
||||||
- `test_umount_unwinds_idmap_binds`: pre-create `runtime/<n>/idmap/foo` as a
|
|
||||||
mountpoint sentinel, assert the PRINT_ONLY umount sequence includes the
|
|
||||||
bind-mount teardown after the overlay umount: overlay first,
|
|
||||||
then binds — match the helper order.
|
|
||||||
|
|
||||||
Reuse the existing `LEFT4ME_OVERLAY_PRINT_ONLY=1` plumbing rather than
|
|
||||||
inventing a new mode.
|
|
||||||
|
|
||||||
### 3. `deploy/tests/test_deploy_artifacts.py`
|
|
||||||
|
|
||||||
Add a grep-style assertion that the helper file contains the strings
|
|
||||||
`--map-users` and `idmap` so the deploy artifact can't silently regress.
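A grep-style guard of this kind might be written as below. The helper name `missing_markers` and the `REQUIRED_MARKERS` tuple are hypothetical; the real test would point at `deploy/files/usr/local/libexec/left4me/left4me-overlay` and assert that the returned list is empty.

```python
from pathlib import Path

REQUIRED_MARKERS = ("--map-users", "idmap")


def missing_markers(helper_path: Path, markers=REQUIRED_MARKERS) -> list[str]:
    """Return the markers that do not appear in the helper's source text."""
    text = helper_path.read_text(encoding="utf-8")
    return [marker for marker in markers if marker not in text]
```

Reporting which markers are missing, rather than a bare boolean, tends to make the failure message more useful when the artifact regresses.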
|
|
||||||
|
|
||||||
### 4. `deploy/README.md`
|
|
||||||
|
|
||||||
One-line mirror note: ckn-bw's `bundles/left4me/` ships the helper verbatim,
|
|
||||||
so no new bundle-side change is needed beyond updating the file. No new
|
|
||||||
group, no new user, no new systemd unit. Flag this explicitly so the next
|
|
||||||
deploy-to-prod step is "rebuild the helper file in ckn-bw, `bw apply
|
|
||||||
ovh.left4me`".
|
|
||||||
|
|
||||||
## Migration
|
|
||||||
|
|
||||||
No on-disk schema change. Existing overlays keep their current ownership
|
|
||||||
(`l4d2-sandbox`-owned for script builds, `left4me`-owned for workshop
|
|
||||||
builds). The mount helper picks the right path per lowerdir at next
|
|
||||||
`systemctl start <instance>`.
|
|
||||||
|
|
||||||
Already-running instances on the test server pick up the change after a
|
|
||||||
service restart. Live SourceMod sessions whose `addons/sourcemod/logs/`
|
|
||||||
copy-up is already broken in upperdir need a fix too: the upperdir entries
|
|
||||||
are `l4d2-sandbox:l4d2-sandbox 0755` from the previous broken copy-up. The
|
|
||||||
helper doesn't touch upper/ on mount, so those stale entries persist.
|
|
||||||
|
|
||||||
Two safe migration options:
|
|
||||||
|
|
||||||
1. Manual: on the test server, stop server 2, `rm -rf
|
|
||||||
runtime/2/upper/left4dead2/addons/sourcemod/logs runtime/2/upper/left4dead2/addons/sourcemod/data`,
|
|
||||||
start it again. Copy-up will redo with idmapped lower → `left4me`-owned
|
|
||||||
upper.
|
|
||||||
2. Automatic: have `start_instance` proactively delete known-SM writable
|
|
||||||
dirs from `upper/` if their uid is `l4d2-sandbox`. Out of scope for this
|
|
||||||
change unless we hit it again.
|
|
||||||
|
|
||||||
Recommend option 1 — one-shot, no code change.
|
|
||||||
|
|
||||||
## Verification
|
|
||||||
|
|
||||||
End-to-end test on `left4.me`:
|
|
||||||
|
|
||||||
1. `bw apply ovh.left4me` (or scp the updated helper into place).
|
|
||||||
2. Stop server 2: `sudo systemctl stop left4me-server@2`.
|
|
||||||
3. Clean the stale broken SM upperdir: `sudo rm -rf
|
|
||||||
/var/lib/left4me/runtime/2/upper/left4dead2/addons/sourcemod/{logs,data}`.
|
|
||||||
4. Start server 2: `sudo systemctl start left4me-server@2`.
|
|
||||||
5. In the `left4me-overlay mount` output (check `journalctl -u
|
|
||||||
left4me-server@2 -o cat -n 50`), confirm `mount --bind --map-users=...`
|
|
||||||
was executed.
|
|
||||||
6. RCON to server 2, `sm_cvar nb_update_frequency 0.0333`. Expect no
|
|
||||||
`Platform returned error: "Permission denied"` log line.
|
|
||||||
7. `ls -ln
|
|
||||||
/var/lib/left4me/runtime/2/upper/left4dead2/addons/sourcemod/logs/`.
|
|
||||||
Expect files owned by `left4me`'s numeric uid.
|
|
||||||
8. `sudo umount /var/lib/left4me/runtime/2/merged` should still work; then
|
|
||||||
verify `runtime/2/idmap/` is cleaned up by `ExecStopPost`. Restart and
|
|
||||||
confirm idempotent (no leftover binds).
|
|
||||||
|
|
||||||
Local tests:
|
|
||||||
|
|
||||||
- `pytest l4d2host/tests/test_overlay_helper.py -q`
|
|
||||||
- `pytest deploy/tests/test_deploy_artifacts.py -q`
|
|
||||||
|
|
||||||
## Risks and edge cases
|
|
||||||
|
|
||||||
- **Workshop overlay misidentification**: a workshop overlay with a
|
|
||||||
`l4d2-sandbox`-owned subdir somehow (e.g. partial migration) would get
|
|
||||||
idmapped despite containing `left4me`-owned files. Files with other uids
|
|
||||||
through an idmap appear as the overflow uid (`nobody`/65534), which would
|
|
||||||
break reads. Mitigation: check ownership of the **top-level overlay
|
|
||||||
directory** as the trigger, not file-by-file. If the top is sandbox-owned,
|
|
||||||
trust the whole tree; if the top is left4me-owned, no idmap. This matches
|
|
||||||
what each builder actually produces.
|
|
||||||
- **`installation/` and caches**: always `left4me`-owned, never idmapped.
|
|
||||||
- **Symlinks inside script overlays**: idmap operates at the mount level,
|
|
||||||
not per-inode. Symlink ownership translates the same as files. Targets
|
|
||||||
inside the overlay resolve through the same mount. Targets outside (none
|
|
||||||
in script overlays today; workshop ones don't take this code path) would
|
|
||||||
not be affected.
|
|
||||||
- **Mount namespace**: the helper runs in PID 1's mount namespace via the
|
|
||||||
unit's `ExecStartPre=+nsenter ...`. Bind mounts created there persist
|
|
||||||
until the matching `ExecStopPost` umount, exactly like the overlay mount
|
|
||||||
itself.
|
|
||||||
- **Crash mid-build**: idmap binds are created only at `mount` time, not at
|
|
||||||
build time. A crashed build leaves no orphan mounts.
|
|
||||||
- **Crash mid-start (ExecStartPre fails between bind and overlay mount)**:
|
|
||||||
systemd's `Restart=on-failure` re-invokes ExecStartPre. The helper checks
|
|
||||||
`os.path.ismount` on each idmap target and skips already-mounted ones.
|
|
||||||
Idempotent.
|
|
||||||
- **`runtime/<n>/idmap/` cleanup on `_purge_instance`**: existing
|
|
||||||
`shutil.rmtree(runtime_dir)` after `disable_service` already triggers the
|
|
||||||
helper's umount sequence, which removes the idmap dir. No new code.
|
|
||||||
- **util-linux flag form**: prefer `--map-users <inner>:<outer>:<count>` and
|
|
||||||
`--map-groups` (numeric uids/gids resolved by the helper) over the
|
|
||||||
`X-mount.idmap=` mount-option syntax — clearer and easier to log.
|
|
||||||
|
|
||||||
## Out of scope
|
|
||||||
|
|
||||||
- Web app uid split (`l4d2-web` separate from `left4me`) — orthogonal,
|
|
||||||
rejected for this change.
|
|
||||||
- Gameserver uid split (separating the gameserver-runtime uid from
|
|
||||||
`left4me`) — planned for a later session. **One forward-compat
|
|
||||||
coupling**: the helper looks up `pwd.getpwnam("left4me")` as the in-mount
|
|
||||||
target uid. When the gameserver moves to its own user (e.g. `l4d2-game`),
|
|
||||||
change that one string. Everything else (script-sandbox uid, workshop
|
|
||||||
builder uid, file-tree endpoint, idmap cleanup) is uid-agnostic.
|
|
||||||
- Replacing `l4d2-sandbox` with a different uid scheme — kept as defense in
|
|
||||||
depth.
|
|
||||||
- Spec doc updates (including the bubblewrap → systemd-run wording
|
|
||||||
correction in the existing script-overlay spec): dropped per user
|
|
||||||
decision; this change ships plan-only.
|
|
||||||
|
|
@ -1,387 +0,0 @@
|
||||||
# Add an RCON console to the server detail page
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
The server detail page (`/servers/<id>`) already exposes the RCON password,
|
|
||||||
live state polling, log streaming, and start/stop actions, but to send any
|
|
||||||
arbitrary command (`changelevel`, `sm_kick`, `mp_*`, `say`, etc.) the user
|
|
||||||
has to open a separate RCON client and reconnect. Adding an inline console
|
|
||||||
turns the web UI into a complete operator tool for the owner of a server:
|
|
||||||
type a command, see the reply, recall earlier commands via persisted
|
|
||||||
history.
|
|
||||||
|
|
||||||
Scope is intentionally narrow:
|
|
||||||
- One server, one user (the owner). Multi-user shared console = not now.
|
|
||||||
- Per-user history persisted across reloads.
|
|
||||||
- No blocklist — owner already has the RCON password and can run anything
|
|
||||||
via any RCON client; the UI is a thin wrapper.
|
|
||||||
|
|
||||||
## Design decisions (already settled)
|
|
||||||
|
|
||||||
| Topic | Choice |
|
|
||||||
|---|---|
|
|
||||||
| UI placement | Panel on `server_detail.html`, between **Live State** and **Files**. |
|
|
||||||
| Output transport | **HTMX append swap**, not SSE. RCON is request/response — SSE adds no value. Matches existing inline-form / `hx-swap` patterns in the codebase. |
|
|
||||||
| Safety | Owner-of-server check only (`Server.user_id == current_user.id`). No command blocklist. **No admin override** — admins can already SSH if needed; an unaudited UI backdoor isn't worth the asymmetry. |
|
|
||||||
| History | New `command_history` table, scoped per (user, server). Stores **command + reply + error flag** so the full transcript can be replayed on page reload. |
|
|
||||||
| Transcript on page load | **Replays the last 50 rows** for this user+server, rendered server-side into the transcript via the same `_console_line.html` partial used for live additions. Visually identical to live lines (no "old vs new" distinction — the whole point is page-reload continuity). |
|
|
||||||
| Transcript height | Fixed max-height ~400 px, internal vertical scroll. New lines auto-scroll to the bottom on add AND on initial load. Page layout below stays stable. |
|
|
||||||
| Clear button | None. Reload doesn't help (it replays). If anyone wants to drop history, that's a separate concern handled later. |
|
|
||||||
| RCON timeout | **30s per command.** Comfortably covers a cold map load with custom add-ons (community-observed worst case ~25s on modest hardware). 3× the python-valve default. Far below `director_transition_timeout` (120s) so no aliasing. If a command exceeds 30s, the RCON exec packet was already sent — the server still did the work; the user just doesn't see the textual reply but sees the effect in the Server Log SSE panel above. |
|
|
||||||
| Worker model | Rely on `gunicorn --threads N` (or whatever the existing deployment uses for the long-lived SSE log streams). Threads share memory; one stuck `changelevel` holds a thread, not a process. Don't scale processes: adding hundreds of workers wastes RAM (~100 MB each), while threads are cheap by comparison. |
|
|
||||||
|
|
||||||
## Server-side changes
|
|
||||||
|
|
||||||
### 1. Extend `l4d2web/services/rcon.py`
|
|
||||||
|
|
||||||
The wire-protocol layer already exists (`l4d2web/services/rcon.py:64`).
|
|
||||||
Add a generic command executor with **multi-packet response handling**:
|
|
||||||
|
|
||||||
```python
|
|
||||||
def execute_command(
|
|
||||||
host: str, port: int, password: str, command: str, *, timeout: float = 30.0
|
|
||||||
) -> str:
|
|
||||||
"""Authenticate, send a single command, return the joined reply body.
|
|
||||||
|
|
||||||
Implements the trailing-marker pattern: after the exec packet we
|
|
||||||
immediately send an empty SERVERDATA_RESPONSE_VALUE packet with a
|
|
||||||
sentinel req_id. We then read response packets, concatenating bodies,
|
|
||||||
until we see the sentinel echo back. This is the only reliable way
|
|
||||||
to detect end-of-output, because Source RCON splits replies >4096 B
|
|
||||||
across multiple packets with no length header.
|
|
||||||
"""
|
|
||||||
```
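The trailing-marker pattern from the docstring can be sketched on raw bytes as follows. This is an illustration under stated assumptions, not the module's code: the function names are hypothetical, bodies are decoded as UTF-8 for convenience, and `ConnectionError` stands in for whatever `RconError` the real module would raise. The packet layout (size, id, type as little-endian int32, then a NUL-terminated body and one more NUL) follows the Source RCON protocol.

```python
import struct

SERVERDATA_RESPONSE_VALUE = 0
SERVERDATA_EXECCOMMAND = 2
EXEC_ID = 1
SENTINEL_ID = 2  # any id that differs from EXEC_ID and fits a signed int32


def encode_packet(req_id: int, ptype: int, body: str) -> bytes:
    """Frame one packet: size, id, type (int32 LE), body, two NUL bytes."""
    payload = struct.pack("<ii", req_id, ptype) + body.encode("utf-8") + b"\x00\x00"
    return struct.pack("<i", len(payload)) + payload


def decode_packets(data: bytes):
    """Yield (req_id, ptype, body) for each complete packet in data."""
    offset = 0
    while offset + 4 <= len(data):
        (size,) = struct.unpack_from("<i", data, offset)
        if offset + 4 + size > len(data):
            break  # incomplete trailing packet
        req_id, ptype = struct.unpack_from("<ii", data, offset + 4)
        body = data[offset + 12 : offset + 4 + size - 2].decode("utf-8", "replace")
        yield req_id, ptype, body
        offset += 4 + size


def join_until_sentinel(packets) -> str:
    """Concatenate EXEC_ID bodies until the sentinel id is echoed back."""
    parts: list[str] = []
    for req_id, _ptype, body in packets:
        if req_id == SENTINEL_ID:
            return "".join(parts)
        if req_id == EXEC_ID:
            parts.append(body)
    raise ConnectionError("stream ended before the sentinel was echoed")
```

A command with no output, such as `say`, simply produces the sentinel echo with nothing before it, so the joined reply is the empty string.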
|
|
||||||
|
|
||||||
Implementation notes:
|
|
||||||
- Factor `_connect_and_auth(sock, password)` out of `query_status` so
|
|
||||||
both functions share the auth dance.
|
|
||||||
- Use req_id `0xDEADBEEF` (or any constant ≠ the exec req_id; if ids are packed as signed 32-bit ints, pick one ≤ `0x7FFFFFFF`) for the
|
|
||||||
sentinel; read packets until one comes back with that req_id.
|
|
||||||
- Input validation **inside this function** (not just at the route):
|
|
||||||
- Reject empty / whitespace-only `command` → `ValueError`.
|
|
||||||
- Reject embedded `\x00` bytes (would corrupt the null-terminated
|
|
||||||
wire format) → `ValueError`.
|
|
||||||
- Cap length at 1000 chars (RCON packet limit is 4096 incl. headers;
|
|
||||||
no real command needs more). Longer → `ValueError`.
|
|
||||||
- Trim trailing whitespace from the joined body. Otherwise return verbatim.
|
|
||||||
- Existing `RconError` / `RconAuthError` exception types are reused.
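The input-validation rules listed above could be collected into one function, sketched here. The name `validate_command` and the constant `MAX_COMMAND_CHARS` are hypothetical; the three checks and the 1000-character cap come from the notes.

```python
MAX_COMMAND_CHARS = 1000


def validate_command(command: str) -> str:
    """Return the command unchanged, or raise ValueError for rejected input."""
    if not command.strip():
        raise ValueError("command is empty")
    if "\x00" in command:
        # A NUL would terminate the body early in the wire format.
        raise ValueError("command contains a NUL byte")
    if len(command) > MAX_COMMAND_CHARS:
        raise ValueError(f"command exceeds {MAX_COMMAND_CHARS} characters")
    return command
```

Because the function is pure, the route and `execute_command` can both call it, which keeps the two validation layers from drifting apart.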
|
|
||||||
|
|
||||||
Tests in `l4d2web/tests/test_rcon.py` (extend the `FakeRconServer` to
|
|
||||||
support multi-packet replies):
|
|
||||||
- happy path: single-packet response
|
|
||||||
- multi-packet response (synthesize a >4096 B reply)
|
|
||||||
- empty reply (server replies only with the sentinel — case for `say`)
|
|
||||||
- bad password → `RconAuthError`
|
|
||||||
- timeout (fake server sleeps longer than the test timeout)
|
|
||||||
- input validation: empty / null byte / oversized → `ValueError`
|
|
||||||
|
|
||||||
### 2. New `CommandHistory` model (`l4d2web/models.py`)
|
|
||||||
|
|
||||||
Append at the bottom of `models.py`:
|
|
||||||
|
|
||||||
```python
|
|
||||||
class CommandHistory(Base):
|
|
||||||
__tablename__ = "command_history"
|
|
||||||
__table_args__ = (
|
|
||||||
Index("ix_cmdhist_user_server_id", "user_id", "server_id", "id"),
|
|
||||||
)
|
|
||||||
id: Mapped[int] = mapped_column(Integer, primary_key=True)
|
|
||||||
user_id: Mapped[int] = mapped_column(ForeignKey("users.id"), nullable=False)
|
|
||||||
server_id: Mapped[int] = mapped_column(ForeignKey("servers.id", ondelete="CASCADE"), nullable=False)
|
|
||||||
command: Mapped[str] = mapped_column(Text, nullable=False)
|
|
||||||
reply: Mapped[str] = mapped_column(Text, nullable=False, default="", server_default="")
|
|
||||||
is_error: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False, server_default=text("0"))
|
|
||||||
created_at: Mapped[datetime] = mapped_column(DateTime, default=now_utc, nullable=False)
|
|
||||||
```
|
|
||||||
|
|
||||||
Index `(user_id, server_id, id)` because every lookup is "latest N for
|
|
||||||
this user+server", ordered by `id DESC`.
|
|
||||||
|
|
||||||
A row is persisted on **every** RCON outcome — successful reply,
|
|
||||||
empty reply, and error (auth fail, connect refused, `RconError`). The
|
|
||||||
`is_error` flag drives the red styling on replay, so the transcript
|
|
||||||
looks identical after a page reload.
|
|
||||||
|
|
||||||
**Storage cost**: most replies are <500 B; `status` ~1 KB;
|
|
||||||
`sm plugins list` a few KB; `cvarlist` can be 50 KB+. A power user
|
|
||||||
running 100 commands/day at an average ~2 KB → ~73 MB/year. SQLite
|
|
||||||
handles that without complaint; a trim job (cap N per user/server,
|
|
||||||
e.g. last 5000) can be added if anyone ever notices.
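The yearly figure follows directly from the stated inputs. The short calculation below reproduces it, assuming 1 KB = 1000 B and a 365-day year.

```python
COMMANDS_PER_DAY = 100
AVG_REPLY_BYTES = 2_000  # "~2 KB", taking 1 KB = 1000 B
DAYS_PER_YEAR = 365

bytes_per_year = COMMANDS_PER_DAY * AVG_REPLY_BYTES * DAYS_PER_YEAR
megabytes_per_year = bytes_per_year / 1_000_000
```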
|
|
||||||
|
|
||||||
**Privacy note for the implementer**: replies from `status` include
|
|
||||||
player names (user-controlled strings from random Steam users) and
|
|
||||||
SteamID64s. Treat them as untrusted text on output (handled by Jinja
|
|
||||||
auto-escaping — see §5) and don't surface them outside this user's
|
|
||||||
session.
|
|
||||||
|
|
||||||
### 3. New alembic migration `0012_command_history.py`
|
|
||||||
|
|
||||||
Mirror `l4d2web/alembic/versions/0011_server_hostname.py`:
|
|
||||||
- `revision = "0012_command_history"`
|
|
||||||
- `down_revision = "0011_server_hostname"`
|
|
||||||
- `upgrade()`: `op.create_table("command_history", …)` with columns
|
|
||||||
`id`, `user_id`, `server_id`, `command (Text)`, `reply (Text, server_default="")`,
|
|
||||||
`is_error (Boolean, server_default="0")`, `created_at`; plus
|
|
||||||
`op.create_index("ix_cmdhist_user_server_id", ...)`.
|
|
||||||
- `downgrade()`: drop index then table.
|
|
||||||
- `test_alembic_migrations.py` auto-discovers revisions (skim once to
|
|
||||||
confirm; no edit if so).
|
|
||||||
|
|
||||||
### 4. New route module `l4d2web/routes/console_routes.py`
|
|
||||||
|
|
||||||
Two endpoints, both `@require_login`, both verify ownership with
|
|
||||||
**404** on miss (matches the existing pattern at
|
|
||||||
`page_routes.py:303` — no admin backdoor).
|
|
||||||
|
|
||||||
**`POST /servers/<id>/console`** — submit a command.
|
|
||||||
- CSRF-checked (form field `csrf_token`).
|
|
||||||
- Form field `command`. Validation happens twice: at the route (return a
|
|
||||||
user-facing error fragment for empty / oversized) and inside
|
|
||||||
`execute_command` (defence in depth — never trust a single layer).
|
|
||||||
- Calls
|
|
||||||
`rcon.execute_command("127.0.0.1", server.port, server.rcon_password, command)`.
|
|
||||||
- **Every outcome persists a `CommandHistory` row** (so the transcript
|
|
||||||
fully reconstructs on page reload):
|
|
||||||
- Success with reply → `command`, `reply`, `is_error=False`.
|
|
||||||
- Success with empty reply (e.g. `say`) → `command`, `reply=""`,
|
|
||||||
`is_error=False`. Template renders `(no reply)` in muted text.
|
|
||||||
- `RconAuthError` / `RconError` / connect-failed → `command`,
|
|
||||||
`reply=<exception message>`, `is_error=True`. Red styling on render.
|
|
||||||
- On `ValueError` from input validation (empty / null byte / oversized):
|
|
||||||
render an error fragment, **do not** insert history (the command
|
|
||||||
never reached the wire — nothing happened to remember).
|
|
||||||
- Returns 200 in all cases (errors are rendered, not raised) so HTMX
|
|
||||||
appends them to the transcript like any other line.
|
|
||||||
|
|
||||||
**`GET /servers/<id>/console/history?before=<id>&limit=50`** — paged
|
|
||||||
history for up-arrow navigation.
|
|
||||||
- Returns JSON `[{"id": …, "command": …}, …]` ordered newest-first.
|
|
||||||
- The client owns the input state; this stays JSON, not HTML.
|
|
||||||
- `limit` clamped to ≤200.
|
|
||||||
|
|
||||||
Register the blueprint in `l4d2web/app.py` alongside the other
|
|
||||||
`*_routes` modules.
|
|
||||||
|
|
||||||
**Also extend `server_detail()` in `page_routes.py`** to fetch the last
|
|
||||||
50 `CommandHistory` rows for this `(user, server)`, ordered oldest-first
|
|
||||||
(so they iterate naturally in the template), and pass as
|
|
||||||
`console_history` in the render context. Use the same `session_scope`
|
|
||||||
block that already loads `server` and `blueprint` (`page_routes.py:301`)
|
|
||||||
— one extra `db.scalars(select(CommandHistory)…)` call, no new round
|
|
||||||
trip cost.
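The "latest N, shown oldest-first" lookup can be illustrated with the standard-library `sqlite3` module. This is a stand-in for the SQLAlchemy query, with a trimmed-down table and a hypothetical `last_commands` helper; the index name and column order match the model above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE command_history ("
    " id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL,"
    " server_id INTEGER NOT NULL, command TEXT NOT NULL)"
)
conn.execute(
    "CREATE INDEX ix_cmdhist_user_server_id"
    " ON command_history (user_id, server_id, id)"
)
conn.executemany(
    "INSERT INTO command_history (user_id, server_id, command) VALUES (?, ?, ?)",
    [(1, 2, f"cmd{i}") for i in range(1, 61)] + [(9, 2, "other user")],
)


def last_commands(conn, user_id: int, server_id: int, limit: int = 50) -> list[str]:
    """Newest `limit` rows for (user, server), returned oldest-first."""
    rows = conn.execute(
        "SELECT command FROM command_history"
        " WHERE user_id = ? AND server_id = ?"
        " ORDER BY id DESC LIMIT ?",
        (user_id, server_id, limit),
    ).fetchall()
    # Fetch newest-first so LIMIT keeps the latest rows, then flip for display.
    return [command for (command,) in reversed(rows)]


history = last_commands(conn, user_id=1, server_id=2)
```

Ordering by `id DESC` with a `LIMIT` and reversing in Python keeps the query aligned with the `(user_id, server_id, id)` index while still handing the template rows in reading order.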
|
|
||||||
|
|
||||||
### 5. Template fragment `templates/_console_line.html`
|
|
||||||
|
|
||||||
```jinja2
|
|
||||||
<div class="console-line{% if error %} console-error{% endif %}">
|
|
||||||
<div class="console-prompt">> {{ command }}</div>
|
|
||||||
{% if reply %}
|
|
||||||
<pre class="console-reply">{{ reply }}</pre>
|
|
||||||
{% else %}
|
|
||||||
<div class="console-reply muted">(no reply)</div>
|
|
||||||
{% endif %}
|
|
||||||
</div>
|
|
||||||
```
|
|
||||||
|
|
||||||
**XSS reminder for the implementer:** `reply` originates from the game
|
|
||||||
server's RCON output — we do not trust it. **Never use `|safe`**, never
|
|
||||||
`{{ reply|markdown }}`, never anything that bypasses Jinja's default
|
|
||||||
HTML escaping. The existing `{{ reply }}` is the right call.
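To make the risk concrete, the snippet below shows what escaping does to a hostile reply. It uses the standard-library `html.escape` as a stand-in for Jinja's auto-escaping (the exact entity spellings differ slightly between the two), and `render_reply` is a hypothetical helper, not part of the templates.

```python
from html import escape


def render_reply(reply: str) -> str:
    """Approximate how an auto-escaped {{ reply }} ends up in the page."""
    return f'<pre class="console-reply">{escape(reply)}</pre>'


# A player name or plugin output could carry markup like this.
hostile = '<script>alert("x")</script>'
rendered = render_reply(hostile)
```

With escaping in place the markup arrives as inert text inside the `<pre>`; applying `|safe` would hand the same string to the browser as live HTML.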
|
|
||||||
|
|
||||||
### 6. Console panel in `templates/server_detail.html`
|
|
||||||
|
|
||||||
Insert between the existing live-state section (lines 33–37) and the
|
|
||||||
Files section (line 39):
|
|
||||||
|
|
||||||
```jinja2
|
|
||||||
<h2 class="section-title">Console</h2>
|
|
||||||
<section class="panel console-panel">
|
|
||||||
<div id="console-transcript-{{ server.id }}"
|
|
||||||
class="console-transcript"
|
|
||||||
data-autoscroll>
|
|
||||||
{% for h in console_history %}
|
|
||||||
{% with command=h.command, reply=h.reply, error=h.is_error %}
{% include "_console_line.html" %}
{% endwith %}
{# Maps h.command, h.reply, h.is_error onto the names the partial reads #}
|
|
||||||
{% endfor %}
|
|
||||||
</div>
|
|
||||||
<form hx-post="/servers/{{ server.id }}/console"
|
|
||||||
hx-target="#console-transcript-{{ server.id }}"
|
|
||||||
hx-swap="beforeend"
|
|
||||||
hx-indicator=".console-spinner"
|
|
||||||
hx-on::after-request="this.command.value=''; this.command.focus(); this.closest('section').querySelector('[data-autoscroll]').scrollTop = 1e9"
|
|
||||||
class="console-input-form"
|
|
||||||
data-console-form data-server-id="{{ server.id }}">
|
|
||||||
<input type="hidden" name="csrf_token" value="{{ session.get('csrf_token', '') }}">
|
|
||||||
<span class="console-prompt-glyph">></span>
|
|
||||||
<input name="command" autocomplete="off" spellcheck="false" maxlength="1000"
|
|
||||||
placeholder="status, changelevel c1m1_hotel, sm_kick …">
|
|
||||||
<span class="console-spinner" aria-hidden="true">…</span>
|
|
||||||
<button type="submit">Send</button>
|
|
||||||
</form>
|
|
||||||
</section>
|
|
||||||
```
|
|
||||||
|
|
||||||
- Transcript is server-side rendered with the last 50 history rows on
|
|
||||||
page load. `_console_line.html` is the single source of truth for
|
|
||||||
line layout — same template, same look, whether the line came from
|
|
||||||
this session or last week.
|
|
||||||
- `hx-indicator` gives visible feedback during slow commands (a
|
|
||||||
`changelevel` can sit at ~10s+).
|
|
||||||
- `maxlength="1000"` on the input mirrors the server-side cap.
|
|
||||||
- The `hx-on::after-request` inline scrolls the transcript to the
|
|
||||||
bottom after each new line. On initial page load, the JS module
|
|
||||||
scrolls to the bottom once after the DOM is ready (so the most
|
|
||||||
recent history is visible, not the oldest).
|
|
||||||
|
|
||||||
**Cross-feature interaction (do not "fix"):** Silent or slow commands
|
|
||||||
(`say`, `kick`, `changelevel`) will produce empty or terse RCON replies
|
|
||||||
in this transcript. The actual game-side effect is already visible in
|
|
||||||
the **Server Log** SSE panel right above. A future implementer should
|
|
||||||
NOT try to mirror server-log lines back into the console transcript —
|
|
||||||
that's a redundancy, not a feature.
|
|
||||||
|
|
||||||
### 7. New `static/js/console-history.js`
|
|
||||||
|
|
||||||
Tiny module bound to `[data-console-form]`:
|
|
||||||
- **On DOM ready**: scroll each `[data-autoscroll]` transcript to the
|
|
||||||
bottom so the most recent replayed lines are visible. This is the
|
|
||||||
initial-load equivalent of the `hx-on::after-request` scroll.
|
|
||||||
- **On first focus** of the input: lazy-fetch
|
|
||||||
`/servers/<id>/console/history?limit=50` and cache the array in
|
|
||||||
memory. (Distinct from the rendered-on-load transcript: this cache
|
|
||||||
is *just commands* for up/down recall — replies don't matter for
|
|
||||||
navigation, so the JSON endpoint stays narrow.)
|
|
||||||
- **ArrowUp / ArrowDown**: walk the cached array, set `input.value`.
|
|
||||||
- ArrowUp from a non-history state: snapshot the current value so
|
|
||||||
ArrowDown can restore it.
|
|
||||||
- **ArrowUp past the end**: fetch next page using
|
|
||||||
`?before=<oldest_cached_id>`. If empty, stop.
|
|
||||||
- **After a successful submit** (`htmx:afterRequest` with 2xx):
|
|
||||||
prepend the just-sent command to the in-memory cache so it's
|
|
||||||
instantly recallable.
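The recall behaviour described in the list above is a small state machine. The real module is JavaScript; the following is only a language-neutral model written in Python so the edge cases can be checked in isolation. The class name `HistoryNav`, the `fetch_page` callback, and the row shape (`id`, `command`) are assumptions made for this sketch.

```python
class HistoryNav:
    """Model of ArrowUp/ArrowDown recall over a newest-first command cache."""

    def __init__(self, fetch_page):
        # fetch_page(before_id) -> list of {"id", "command"}, newest first.
        self.fetch_page = fetch_page
        self.cache = None       # filled lazily, like the first-focus fetch
        self.index = -1         # -1 means "showing the user's own draft"
        self.draft = ""
        self.exhausted = False  # True once the server returned an empty page

    def up(self, current_value):
        if self.cache is None:
            self.cache = list(self.fetch_page(None))
        if self.index == -1:
            self.draft = current_value  # snapshot so ArrowDown can restore it
        if self.index + 1 >= len(self.cache) and not self.exhausted:
            # Walked past the cached page: ask for rows older than the oldest.
            older = self.fetch_page(self.cache[-1]["id"]) if self.cache else []
            if older:
                self.cache.extend(older)
            else:
                self.exhausted = True
        if self.index + 1 < len(self.cache):
            self.index += 1
            return self.cache[self.index]["command"]
        return current_value  # nothing older; leave the input alone

    def down(self, current_value):
        if self.index == -1:
            return current_value
        self.index -= 1
        return self.draft if self.index == -1 else self.cache[self.index]["command"]

    def submitted(self, row):
        # Prepend the just-sent command so it is immediately recallable.
        if self.cache is not None:
            self.cache.insert(0, row)
        self.index, self.draft = -1, ""


# Fake server with five rows, served two at a time, newest first.
ROWS = [{"id": i, "command": f"cmd{i}"} for i in range(5, 0, -1)]


def fake_fetch(before_id, limit=2):
    rows = [r for r in ROWS if before_id is None or r["id"] < before_id]
    return rows[:limit]


nav = HistoryNav(fake_fetch)
print(nav.up("draft"), nav.up("cmd5"), nav.up("cmd4"))
```

The third `up` call crosses the page boundary and triggers the `before=<oldest_cached_id>` fetch; once a fetch comes back empty, the model stops asking.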
|
|
||||||
|
|
||||||
Loaded via a `<script defer>` line in `base.html` next to the other
|
|
||||||
small static JS modules (same pattern as `sse.js`).
|
|
||||||
|
|
||||||
### 8. Concurrency sanity (no code, just verifying the design)
|
|
||||||
|
|
||||||
`live_state_poller.py` already opens fresh RCON connections every 5s
|
|
||||||
against the same port. SrcDS handles concurrent RCON sessions cleanly
|
|
||||||
(each is independently auth'd, no shared state). The console adds at
|
|
||||||
most one more concurrent connection per active user — well within
|
|
||||||
limits. No locking needed.
|
|
||||||
|
|
||||||
### 9. Minimal CSS in `static/css/`
|
|
||||||
|
|
||||||
Monospace transcript, dark background, `console-error` styled like the
|
|
||||||
existing error pills. Match the visual weight of the existing log-stream
|
|
||||||
`<pre>` block on the detail page — no new design system.
|
|
||||||
|
|
||||||
## Files to touch
|
|
||||||
|
|
||||||
| File | Change |
|
|
||||||
|---|---|
|
|
||||||
| `l4d2web/services/rcon.py` | Add `execute_command()` with multi-packet handling + input validation; extract `_connect_and_auth()` |
|
|
||||||
| `l4d2web/tests/test_rcon.py` | Extend `FakeRconServer` for multi-packet; add success / multi-packet / empty / bad-pw / timeout / validation tests |
|
|
||||||
| `l4d2web/models.py` | Add `CommandHistory` (with `reply`, `is_error`) |
|
|
||||||
| `l4d2web/alembic/versions/0012_command_history.py` | New migration |
|
|
||||||
| `l4d2web/routes/console_routes.py` | **NEW** — POST + GET endpoints |
|
|
||||||
| `l4d2web/routes/page_routes.py` | Extend `server_detail()` to fetch last 50 history rows and pass `console_history` |
|
|
||||||
| `l4d2web/app.py` | Register the new blueprint |
|
|
||||||
| `l4d2web/templates/_console_line.html` | **NEW** fragment |
|
|
||||||
| `l4d2web/templates/server_detail.html` | Insert console panel section (with server-rendered replay loop) |
|
|
||||||
| `l4d2web/static/js/console-history.js` | **NEW** up/down history nav + initial scroll-to-bottom |
|
|
||||||
| `l4d2web/templates/base.html` | `<script defer src="…/console-history.js">` |
|
|
||||||
| `l4d2web/static/css/*.css` | Console panel styling (fixed-height scroll transcript, error variant) |
|
|
||||||
| `l4d2web/tests/test_console_routes.py` | **NEW** route tests |
|
|
||||||
|
|
||||||
## Tests to write explicitly
|
|
||||||
|
|
||||||
**`test_rcon.py`** (extending existing file):
|
|
||||||
- `execute_command` happy path, single-packet reply
|
|
||||||
- `execute_command` multi-packet reply (>4096 B) reassembled in order
|
|
||||||
- `execute_command` empty reply (server returns only the sentinel)
|
|
||||||
- `execute_command` bad password → `RconAuthError`
|
|
||||||
- `execute_command` socket timeout → `RconError`
|
|
||||||
- Input validation: empty / whitespace-only / null-byte / oversized → `ValueError`
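The validation cases in the last bullet can be sketched as a single pre-wire check. This is an illustration under assumptions, not the actual `execute_command` code: the function name `validate_command`, the error messages, and measuring the cap in characters (rather than encoded bytes) are all choices made here; only the four rejected cases and the 1000 limit come from the plan.

```python
MAX_COMMAND_LENGTH = 1000  # mirrors maxlength="1000" on the console input


def validate_command(command: str) -> str:
    """Return the command unchanged if it may be sent, else raise ValueError."""
    if not command or not command.strip():
        raise ValueError("command is empty")
    if "\x00" in command:
        # RCON packet bodies are null-terminated, so an embedded null
        # would cut the command short on the wire.
        raise ValueError("command contains a null byte")
    if len(command) > MAX_COMMAND_LENGTH:
        raise ValueError(f"command exceeds {MAX_COMMAND_LENGTH} characters")
    return command


for candidate in ["status", "", "   ", "say hi\x00quit", "x" * 2000]:
    try:
        validate_command(candidate)
        print("ok:", candidate)
    except ValueError as exc:
        print("rejected:", exc)
```

Because the check raises before any socket is opened, the route can turn the `ValueError` into an error fragment without writing a history row.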
|
|
||||||
|
|
||||||
**`test_console_routes.py`** (new):
|
|
||||||
- not logged in → 302 to login
|
|
||||||
- logged in but not server owner → **404** (not 403 — match
|
|
||||||
`page_routes.py:303`)
|
|
||||||
- valid command → 200, fragment HTML rendered, `CommandHistory` row
|
|
||||||
inserted with `reply` populated and `is_error=False`
|
|
||||||
- empty RCON reply → 200, fragment renders `(no reply)`, history row
|
|
||||||
inserted with `reply=""`, `is_error=False`
|
|
||||||
- RCON error (mock `execute_command` to raise) → 200, error fragment,
|
|
||||||
history row inserted with `is_error=True` and the exception message
|
|
||||||
in `reply`
|
|
||||||
- empty/oversized command (validation error before wire) → 200, error
|
|
||||||
fragment, **no** history row
|
|
||||||
- CSRF token missing → rejected
|
|
||||||
- `GET /console/history` returns newest-first
|
|
||||||
- `GET /console/history?before=<id>` paginates correctly
|
|
||||||
- `GET /console/history?limit=10000` is clamped to ≤200
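The three `GET /console/history` cases boil down to a clamp plus a keyset-style page. A minimal sketch over plain dicts follows; the helper names, the default of 50, the lower bound of 1, and the use of `id` order as a proxy for recency are assumptions for illustration. Only the ≤200 cap, the `before=<id>` parameter, and newest-first ordering come from the plan.

```python
DEFAULT_LIMIT = 50   # assumed default; the plan only shows limit=50 being requested
MAX_LIMIT = 200      # the clamp the test list asks for


def clamp_limit(raw):
    """Parse a query-string limit, falling back to the default on bad input."""
    try:
        limit = int(raw)
    except (TypeError, ValueError):
        return DEFAULT_LIMIT
    return max(1, min(limit, MAX_LIMIT))


def page_history(rows, before=None, limit=None):
    """Return rows with id < before (if given), newest first, clamped."""
    selected = [r for r in rows if before is None or r["id"] < before]
    selected.sort(key=lambda r: r["id"], reverse=True)
    return selected[: clamp_limit(limit)]


rows = [{"id": i, "command": f"cmd{i}"} for i in range(1, 301)]
first_page = page_history(rows, limit="10000")
next_page = page_history(rows, before=first_page[-1]["id"], limit=50)
print(len(first_page), first_page[0]["id"], next_page[0]["id"], next_page[-1]["id"])
```

Paging by `before=<id>` instead of an offset keeps pages stable while new commands are being inserted at the head.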
|
|
||||||
|
|
||||||
**`test_page_routes.py`** (extend existing if present, otherwise add):
|
|
||||||
- `server_detail` returns the last 50 `CommandHistory` rows for the
|
|
||||||
viewing user only, oldest-first in the rendered page (newest at the
|
|
||||||
bottom of the transcript)
|
|
||||||
- a history row belonging to another user for the same server is **not**
|
|
||||||
visible (ownership scoping is by `user_id`, not just `server_id`)
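The two `server_detail` expectations, scoping by user and rendering oldest-first, can be sketched without a database. The function name `transcript_rows` and the dict row shape are hypothetical; a real implementation would presumably express the same thing as a query with a descending order, a limit, and a reversal.

```python
def transcript_rows(rows, user_id, server_id, limit=50):
    """Pick the newest `limit` rows for one user and server, oldest first."""
    mine = [
        r for r in rows
        if r["user_id"] == user_id and r["server_id"] == server_id
    ]
    # Take the newest rows first so the cut-off drops the oldest history...
    newest = sorted(mine, key=lambda r: r["id"], reverse=True)[:limit]
    # ...then flip so the newest line ends up at the bottom of the transcript.
    return list(reversed(newest))


rows = [{"id": i, "user_id": 1, "server_id": 7} for i in range(1, 61)]
rows.append({"id": 61, "user_id": 2, "server_id": 7})  # another user's row

visible = transcript_rows(rows, user_id=1, server_id=7)
print(len(visible), visible[0]["id"], visible[-1]["id"])
```

Filtering on `server_id` alone would leak the second user's row into the first user's transcript, which is the case the second bullet guards against.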
|
|
||||||
|
|
||||||
## What we are deliberately NOT doing
|
|
||||||
|
|
||||||
- No command blocklist or admin gate — owner already has the password.
|
|
||||||
- **No admin override** to console other users' servers (admins can SSH if
|
|
||||||
they truly need to; UI backdoor would be unaudited and asymmetric).
|
|
||||||
- No shared multi-user view of the same console.
|
|
||||||
- No streaming output (RCON doesn't stream; replies are one-shot).
|
|
||||||
- No autocomplete of cvars — out of scope; up-arrow history is enough.
|
|
||||||
- No "Clear transcript" button — the transcript replays on every page
|
|
||||||
load by design. Discarding history is a different concern (delete
|
|
||||||
rows from the DB) and is out of scope for v1.
|
|
||||||
- No history-trim job — file an issue if anyone hits >100k rows; not
|
|
||||||
worth pre-empting at this scale.
|
|
||||||
- No mirroring of server-log lines into the console transcript — the
|
|
||||||
Server Log panel above already serves that purpose.
|
|
||||||
|
|
||||||
## Verification
|
|
||||||
|
|
||||||
1. `pytest l4d2web/tests/test_rcon.py l4d2web/tests/test_console_routes.py l4d2web/tests/test_alembic_migrations.py` — unit + migration tests pass.
|
|
||||||
2. Boot the web app locally, log in, open a server detail page for a
|
|
||||||
running server, send `status` — multi-line reply renders in the
|
|
||||||
transcript; the input clears and refocuses; spinner shows during
|
|
||||||
the request; the transcript scrolls to the new line at the bottom.
|
|
||||||
3. Send `cvarlist` — a large multi-packet response — and confirm the
|
|
||||||
full output reassembles, not truncated.
|
|
||||||
4. Send `say hello` — transcript shows `> say hello` followed by
|
|
||||||
`(no reply)` in muted text; the line appears in the Server Log
|
|
||||||
panel above.
|
|
||||||
5. Send `changelevel c1m1_hotel` — request takes ~10–20s, spinner
|
|
||||||
visible the whole time, then a (likely empty) reply appears, and
|
|
||||||
the live-state panel updates to the new map within 5s.
|
|
||||||
6. Send an invalid command (e.g. `nonsense_cvar`) — reply renders
|
|
||||||
normally (RCON tolerates unknown commands).
|
|
||||||
7. Send a command with embedded null bytes (via curl, since the
|
|
||||||
browser strips them) — returns 200 with an error fragment, no
|
|
||||||
history row.
|
|
||||||
8. Send a 2000-char command — rejected with an error fragment, no
|
|
||||||
history row.
|
|
||||||
9. **Reload the page** — the transcript reappears identical to before,
|
|
||||||
showing the same `> status`, `> say hello`, `> nonsense_cvar` lines
|
|
||||||
with their replies, scrolled to the bottom. Errors are still red.
|
|
||||||
10. Focus the input, press ArrowUp — the previous command reappears.
|
|
||||||
ArrowDown restores the empty state.
|
|
||||||
11. Send 60+ commands, then ArrowUp past the in-memory page boundary —
|
|
||||||
older commands load on demand.
|
|
||||||
12. Stop the server, try to send a command — surfaces as a styled
|
|
||||||
`console-error` line ("connect failed") rather than a 500; **a
|
|
||||||
history row IS inserted** with `is_error=True`, so the error
|
|
||||||
replays on next page load.
|
|
||||||
13. Log in as a different user, visit `/servers/<id>` for a server that user does not own:
|
|
||||||
404, no console rendered. POST to that URL also 404. The other
|
|
||||||
user's transcript is not visible.
|
|
||||||
14. Confirm that a `cvarlist`-class large reply persists fully in the
|
|
||||||
DB (`SELECT length(reply) FROM command_history ORDER BY id DESC LIMIT 1;`)
|
|
||||||
and replays in full on page reload.
|
|
||||||
|
|
@ -1,270 +0,0 @@
|
||||||
# Build-time idmap: move the uid translation from the gameserver mount
|
|
||||||
into the script sandbox
|
|
||||||
|
|
||||||
> **SUPERSEDED 2026-05-15** by the uid-collapse refactor
|
|
||||||
> ([`2026-05-15-uid-collapse.md`](2026-05-15-uid-collapse.md)). The
|
|
||||||
> idmap pattern this plan introduced is removed because source uid
|
|
||||||
> (`left4me`) now equals target uid (`left4me`) — the translation is
|
|
||||||
> a no-op. Kept for design-evolution context.
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
The current idmap implementation translates uids at **gameserver mount
|
|
||||||
time**: `left4me-overlay` stats each lowerdir, creates a per-lowerdir
|
|
||||||
idmapped bind under `runtime/<n>/idmap/<basename>` for the sandbox-
|
|
||||||
owned ones, then uses those bind paths in the overlay's `lowerdir=`.
|
|
||||||
On stop, the binds get torn down. Works correctly today, but spreads
|
|
||||||
the idmap concern across two helpers and adds mount lifecycle code on
|
|
||||||
every gameserver start.
|
|
||||||
|
|
||||||
Cleaner alternative: do the idmap translation at **script-sandbox
|
|
||||||
build time**, so files land on disk as `left4me`-owned. The on-disk
|
|
||||||
state then matches workshop-built overlays (also left4me-owned), and
|
|
||||||
the gameserver mount path becomes uniform — no per-lowerdir stat,
|
|
||||||
no idmap binds, no extra cleanup.
|
|
||||||
|
|
||||||
This plan switches the architecture to the build-time approach and
|
|
||||||
reverts the gameserver-mount idmap code.
|
|
||||||
|
|
||||||
## Verified mechanism
|
|
||||||
|
|
||||||
Tested end-to-end on `left4.me` (Trixie, kernel 6.12.86, ext4) on
|
|
||||||
2026-05-15:
|
|
||||||
|
|
||||||
1. `/source/` dir owned by `left4me` on disk.
|
|
||||||
2. `mount --bind --map-users=980:981:1 --map-groups=980:981:1
|
|
||||||
/source /idmapped` — inside `/idmapped`, files appear as uid 981
|
|
||||||
(sandbox view).
|
|
||||||
3. `mount --bind /idmapped /rebound` — a plain second bind. The idmap
|
|
||||||
**propagates** to `/rebound` (rebound view also shows uid 981).
|
|
||||||
This is what `BindPaths=` in the sandbox unit does.
|
|
||||||
4. `sudo -u l4d2-sandbox touch /rebound/x.txt` — write **succeeds**.
|
|
||||||
The file lands on disk owned by `left4me` (uid 980).
|
|
||||||
|
|
||||||
Map direction is the inverse of the gameserver-side map:
|
|
||||||
`--map-users=<disk_uid>:<mount_uid>:1` where disk is `left4me` and
|
|
||||||
mount-side is `l4d2-sandbox`. Inside the bind, the sandbox uid sees
|
|
||||||
its own uid as itself; writes from that uid get translated back to
|
|
||||||
the disk-side (left4me) for storage.
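The map direction is easy to get backwards, so here is a small sketch that builds the argument list in the `<disk_uid>:<mount_uid>:1` order described above. The function name `idmap_bind_command` is hypothetical, and the numeric ids are the ones from the verification run (980 for `left4me`, 981 for `l4d2-sandbox`), not values to hard-code.

```python
import shlex


def idmap_bind_command(source, target, disk_uid, mount_uid, disk_gid, mount_gid):
    """Build the idmapped bind-mount argv; nothing is executed here."""
    return [
        "mount", "--bind",
        # on-disk id first, then the id it should appear as inside the mount
        f"--map-users={disk_uid}:{mount_uid}:1",
        f"--map-groups={disk_gid}:{mount_gid}:1",
        source, target,
    ]


cmd = idmap_bind_command(
    "/source", "/idmapped",
    disk_uid=980, mount_uid=981,
    disk_gid=980, mount_gid=981,
)
print(shlex.join(cmd))
```

Swapping the first two fields of each map would describe the gameserver-side direction instead, which is the inversion this section calls out.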
|
|
||||||
|
|
||||||
## Approach
|
|
||||||
|
|
||||||
### Script-sandbox helper (`deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`)
|
|
||||||
|
|
||||||
Pre-create an idmapped bind staging path, point the sandbox's
|
|
||||||
BindPaths at it, clean up on exit. Concretely:
|
|
||||||
|
|
||||||
1. **Remove** the existing `chown -R l4d2-sandbox:l4d2-sandbox
|
|
||||||
"$OVERLAY_DIR"` and `chmod 0755` lines. The overlay dir stays
|
|
||||||
`left4me`-owned (web app's creation default).
|
|
||||||
2. **Add** a setup block before `systemd-run`:
|
|
||||||
```bash
|
|
||||||
STAGING=/var/lib/left4me/tmp/sandbox-idmap-${OVERLAY_ID}
|
|
||||||
trap 'umount "$STAGING" 2>/dev/null || true; rmdir "$STAGING" 2>/dev/null || true' EXIT
|
|
||||||
mkdir -p "$STAGING"
|
|
||||||
mount --bind \
|
|
||||||
--map-users=$(id -u left4me):$(id -u l4d2-sandbox):1 \
|
|
||||||
--map-groups=$(id -g left4me):$(id -g l4d2-sandbox):1 \
|
|
||||||
"$OVERLAY_DIR" "$STAGING"
|
|
||||||
```
|
|
||||||
3. **Change** the systemd-run line:
|
|
||||||
- `BindPaths="${OVERLAY_DIR}:/overlay"` → `BindPaths="${STAGING}:/overlay"`
|
|
||||||
4. **Remove** the post-build `find ... chmod o+r` block. Files end up
|
|
||||||
left4me-owned, web app reads them via its primary uid. The
|
|
||||||
world-read kludge was only needed because of the old sandbox-
|
|
||||||
owned files; with this change it's obsolete.
|
|
||||||
|
|
||||||
`trap` ensures the staging bind is umounted even on errors / signals.
|
|
||||||
Idempotent: if the helper is re-run, `umount + rmdir` handle existing
|
|
||||||
state, and `mkdir -p` + `mount --bind` over an existing mountpoint
|
|
||||||
adds another bind that the next exit cleans up. The kernel 6.12 bind
|
|
||||||
nesting on the same path works fine (verified during the recent
|
|
||||||
gameserver-side idmap fix).
|
|
||||||
|
|
||||||
### Gameserver-mount helper (`deploy/files/usr/local/libexec/left4me/left4me-overlay`)
|
|
||||||
|
|
||||||
Revert the idmap logic added in commit `2f6a9cf` (+ fix in `9053186`,
|
|
||||||
+ mountpoint-detection fix in `dd918ac`). Specifically:
|
|
||||||
|
|
||||||
1. **Remove** the per-lowerdir stat + idmap-decision loop in `cmd_mount`.
|
|
||||||
`lowerdir=` becomes the simple colon-join of resolved lowerdirs
|
|
||||||
(the pre-2f6a9cf shape).
|
|
||||||
2. **Remove** the bind-umount loop in `cmd_umount` and the
|
|
||||||
`shutil.rmtree(idmap_dir, ...)` line.
|
|
||||||
3. **Remove** the `_is_mountpoint`, `_lookup_uid`, and `_get_user_ids`
|
|
||||||
helpers — no longer used. (Keep `os.path.ismount` for the merged
|
|
||||||
overlay check; that one's reliable.)
|
|
||||||
4. **Remove** the `LEFT4ME_TEST_*_UID/GID` test-only env-var stubs.
|
|
||||||
5. **Remove** the idmap PRINT_ONLY emission.
|
|
||||||
|
|
||||||
The helper shrinks back to the pre-idmap size (~242 lines from current 381).
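As a rough picture of the post-revert shape of item 1, the `lowerdir=` value reduces to a plain join. This is a sketch, not the helper's code: the function name `lowerdir_option` and the separator check are assumptions added here (overlayfs separates lower layers with colons, and commas separate mount options, so either character inside a path would be ambiguous).

```python
def lowerdir_option(lowerdirs):
    """Join already-resolved lowerdir paths into an overlayfs lowerdir= option."""
    if not lowerdirs:
        raise ValueError("at least one lowerdir is required")
    for path in lowerdirs:
        if ":" in path or "," in path:
            raise ValueError(f"unsupported character in lowerdir: {path!r}")
    # No per-directory stat and no idmap staging paths: the resolved
    # directories go into the option exactly as they are.
    return "lowerdir=" + ":".join(lowerdirs)


option = lowerdir_option([
    "/var/lib/left4me/overlays/4",
    "/var/lib/left4me/overlays/9",
])
print(option)
```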
|
|
||||||
|
|
||||||
### Tests
|
|
||||||
|
|
||||||
In `l4d2host/tests/test_overlay_helper.py`:
|
|
||||||
|
|
||||||
1. **Remove** `test_mount_idmaps_sandbox_owned_lowerdir`.
|
|
||||||
2. **Remove** `test_mount_skips_idmap_for_left4me_owned_lowerdir`.
|
|
||||||
3. **Remove** `test_umount_unwinds_idmap_binds`.
|
|
||||||
4. **Remove** `test_is_mountpoint_detects_same_fs_bind_mount` and the
|
|
||||||
`_load_helper_module` helper.
|
|
||||||
5. **Remove** `_setup_instance_with_uid` and the `FAKE_*_UID/GID`
|
|
||||||
constants.
|
|
||||||
6. **Remove** the `LEFT4ME_TEST_*` env-var injection in `_run`.
|
|
||||||
|
|
||||||
In `deploy/tests/test_deploy_artifacts.py`:
|
|
||||||
|
|
||||||
1. **Remove** `test_overlay_helper_idmaps_sandbox_owned_lowerdirs`
|
|
||||||
(the regression test for the soon-removed feature).
|
|
||||||
2. **Add** a new test `test_script_sandbox_uses_idmap_staging` that
|
|
||||||
asserts the sandbox helper contains:
|
|
||||||
- `--map-users=` and `--map-groups=` strings (the bind setup),
|
|
||||||
- `/var/lib/left4me/tmp/sandbox-idmap-` (the staging path prefix),
|
|
||||||
- `BindPaths="${STAGING}:/overlay"` (or close equivalent — point
|
|
||||||
the bind at the idmapped staging path, not at OVERLAY_DIR).
|
|
||||||
- A `trap` for cleanup.
|
|
||||||
3. **Remove** the existing `chown -R l4d2-sandbox` assertion in the
|
|
||||||
sandbox-helper test (if any).
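The new artifact test in item 2 is a string-presence check, so it can be sketched in a few lines. The helper name `missing_snippets` and the inline `SAMPLE_HELPER` text are illustrative assumptions; the real test would read the helper file from the repository, and the exact `BindPaths` spelling may differ, as item 2 allows.

```python
REQUIRED_SNIPPETS = (
    "--map-users=",
    "--map-groups=",
    "/var/lib/left4me/tmp/sandbox-idmap-",
    'BindPaths="${STAGING}:/overlay"',
    "trap ",
)


def missing_snippets(helper_text):
    """Return the required snippets that do not occur in the helper source."""
    return [s for s in REQUIRED_SNIPPETS if s not in helper_text]


# Stand-in for the helper source, assembled from the setup block in this plan.
SAMPLE_HELPER = """
STAGING=/var/lib/left4me/tmp/sandbox-idmap-${OVERLAY_ID}
trap 'umount "$STAGING" 2>/dev/null || true' EXIT
mount --bind --map-users=980:981:1 --map-groups=980:981:1 "$OVERLAY_DIR" "$STAGING"
systemd-run -p BindPaths="${STAGING}:/overlay" /bin/true
"""

print(missing_snippets(SAMPLE_HELPER))
```

Returning the list of missing snippets, rather than asserting one at a time, gives a failure message that names everything absent in a single run.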
|
|
||||||
|
|
||||||
### Migration
|
|
||||||
|
|
||||||
Existing overlays under `/var/lib/left4me/overlays/<id>/` are a mix:
|
|
||||||
|
|
||||||
- Workshop-built: already `left4me`-owned (no migration needed).
|
|
||||||
- Script-built (e.g. server 2's overlays 4 and 9): currently
|
|
||||||
`l4d2-sandbox`-owned from the prior helper version. **Need chown to
|
|
||||||
`left4me:left4me`.**
|
|
||||||
|
|
||||||
One-shot migration command on the test server (run before deploying
|
|
||||||
the new helpers, OR after — both work because the new script-sandbox
|
|
||||||
also expects left4me-owned dirs):
|
|
||||||
|
|
||||||
```bash
|
|
||||||
sudo chown -R left4me:left4me /var/lib/left4me/overlays/
|
|
||||||
```
|
|
||||||
|
|
||||||
That's safe — overlays/* are all overlay content, no other tenants.
|
|
||||||
The workshop ones are already left4me; the chown is a no-op for them.
|
|
||||||
The script-built ones get flipped to the new ownership model.
|
|
||||||
|
|
||||||
Running gameservers using the old idmap-bind setup will keep working
|
|
||||||
on the old overlays/<id> files (which they bind via the now-orphan
|
|
||||||
idmap bind that's already in place). The next stop/start cycle picks
|
|
||||||
up the new helper, which:
|
|
||||||
|
|
||||||
- Doesn't create any new idmap binds (gameserver-side helper has
|
|
||||||
none),
|
|
||||||
- Cleans up the legacy idmap binds it finds (the existing umount loop
|
|
||||||
in the current helper handles this on the way out).
|
|
||||||
|
|
||||||
After the first stop/start cycle, no more idmap binds exist anywhere
|
|
||||||
in the system. Steady state.
|
|
||||||
|
|
||||||
### ckn-bw bundle
|
|
||||||
|
|
||||||
No changes needed. The `install_left4me_scripts` action picks up the
|
|
||||||
new helper contents from `/opt/left4me/src/deploy/files/usr/local/...`
|
|
||||||
on the next `git_deploy` apply. ckn-bw itself is content-agnostic
|
|
||||||
about the helper internals.
|
|
||||||
|
|
||||||
## Files to modify
|
|
||||||
|
|
||||||
- `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` — add
|
|
||||||
idmap bind setup + trap cleanup; remove old chown; switch BindPaths.
|
|
||||||
- `deploy/files/usr/local/libexec/left4me/left4me-overlay` — revert the
|
|
||||||
~140 lines of idmap-handling code; remove uid lookup, mountinfo
|
|
||||||
helper, test-stub env vars; drop the idmap PRINT_ONLY emission.
|
|
||||||
- `l4d2host/tests/test_overlay_helper.py` — drop idmap tests and
|
|
||||||
helpers.
|
|
||||||
- `deploy/tests/test_deploy_artifacts.py` — flip the asserted
|
|
||||||
invariant (helper has idmap → sandbox has idmap).
|
|
||||||
|
|
||||||
## Verification
|
|
||||||
|
|
||||||
End-to-end on `left4.me`:
|
|
||||||
|
|
||||||
1. Push left4me commit, `bw apply ovh.left4me`.
|
|
||||||
2. `sudo chown -R left4me:left4me /var/lib/left4me/overlays/` (one-shot
|
|
||||||
migration).
|
|
||||||
3. `sudo systemctl restart left4me-server@2`.
|
|
||||||
4. `sudo findmnt --task 1 -o TARGET | grep runtime/2` — expect *only*
|
|
||||||
`runtime/2/merged`, no `idmap/*` subdirs.
|
|
||||||
5. `sudo ls -ln /var/lib/left4me/overlays/9/` and a couple of other
|
|
||||||
script overlays. Expect the numeric uid and gid of `left4me` in the owner columns (`-n` prints ids, not names).
|
|
||||||
6. Trigger an overlay rebuild from the web UI on a script overlay.
|
|
||||||
Confirm the build succeeds and the resulting files are
|
|
||||||
left4me-owned on disk.
|
|
||||||
7. `sudo -u left4me touch
|
|
||||||
/var/lib/left4me/runtime/2/merged/left4dead2/addons/sourcemod/logs/test.log`
|
|
||||||
— expect write to succeed (verifies SM logging path still works).
|
|
||||||
8. RCON `sm_cvar nb_update_frequency 0.0333` — no permission-denied
|
|
||||||
line in `journalctl -u left4me-server@2`.
|
|
||||||
|
|
||||||
Local tests:
|
|
||||||
|
|
||||||
```
|
|
||||||
pytest l4d2host/tests/test_overlay_helper.py -q
|
|
||||||
pytest deploy/tests/test_deploy_artifacts.py -q
|
|
||||||
```
|
|
||||||
|
|
||||||
Both should pass with reduced test count (removed idmap-on-mount
|
|
||||||
tests, added one sandbox-helper assertion).
|
|
||||||
|
|
||||||
## Risks
|
|
||||||
|
|
||||||
- **Kernel version dependency**: idmap propagation through plain
|
|
||||||
re-bind was verified on 6.12.86. Older kernels may behave
|
|
||||||
differently. ovh.left4me is on Trixie's 6.12, so we're fine; future
|
|
||||||
hosts on older kernels would need verification. Document the kernel
|
|
||||||
floor (≥ 6.6 for overlayfs+idmap, but ≥ 6.x for the propagation —
|
|
||||||
we have no exact lower bound documented).
|
|
||||||
- **Stale idmap binds during migration**: server 2 currently has two
|
|
||||||
active gameserver-side idmap binds (`runtime/2/idmap/overlays_4`
|
|
||||||
and `overlays_9`). The first stop after deploy uses the existing
|
|
||||||
helper code (with `_is_mountpoint` fix) to umount them. Verified
|
|
||||||
in the recent fix cycle. New starts won't create new binds.
|
|
||||||
- **Sandbox migration of in-flight builds**: if a script-overlay
|
|
||||||
build is running during the deploy + chown migration, the chown
|
|
||||||
could happen mid-write. Mitigation: don't run the chown while a
|
|
||||||
build is active; check via `systemctl list-units
|
|
||||||
'left4me-script-*'` first.
|
|
||||||
- **The trap-based cleanup in bash**: if the helper is hit with
|
|
||||||
SIGKILL, the trap doesn't fire and the staging bind leaks. Same
|
|
||||||
exposure as today's leaks (gameserver-side stale binds on similar
|
|
||||||
scenarios). Acceptable; the next sandbox run for the same overlay
|
|
||||||
id `umount`s the leftover bind first via the trap setup pattern
|
|
||||||
(`umount; rmdir; mkdir -p; mount --bind` is idempotent).
|
|
||||||
|
|
||||||
## Why this is worth doing despite the working current solution
|
|
||||||
|
|
||||||
Today's idmap-on-mount works and is correct. The reasons to refactor:
|
|
||||||
|
|
||||||
- **Architectural locality**: the uid translation is a build-time
|
|
||||||
concern (the sandbox creates files); having it as a mount-time
|
|
||||||
concern means the gameserver path needs to know about a producer-
|
|
||||||
side decision.
|
|
||||||
- **Code reduction**: helper shrinks by ~140 lines; tests by ~150.
|
|
||||||
Removed code is removed bug surface.
|
|
||||||
- **On-disk consistency**: all overlay content becomes `left4me`-
|
|
||||||
owned. Easier to reason about (no two-tier ownership), easier to
|
|
||||||
manually inspect (no per-overlay-type ownership).
|
|
||||||
- **Mount lifecycle simplification**: no per-instance idmap dir
|
|
||||||
creation, no per-start uid lookups, no per-stop bind teardown, no
|
|
||||||
stacked-bind regression hazard from the same-fs `os.path.ismount`
|
|
||||||
trap (we already fixed that once).
|
|
||||||
- **Web app read path**: drops the world-read chmod kludge in the
|
|
||||||
sandbox helper. File-tree download endpoint reads via primary uid.
|
|
||||||
|
|
||||||
The cost (refactor + migration) is paid once; the benefit is
|
|
||||||
permanent.
|
|
||||||
|
|
||||||
## Out of scope
|
|
||||||
|
|
||||||
- Splitting the web-app uid from the gameserver uid (future change
|
|
||||||
noted in earlier plans).
|
|
||||||
- Rewriting shell helpers in Python.
|
|
||||||
- `left4me-apply-cake` cleanup (still drifting along in the install
|
|
||||||
glob).
|
|
||||||
- Re-examining whether `l4d2-sandbox` should exist as a separate uid
|
|
||||||
at all (this plan keeps it, but the cost-benefit might shift
|
|
||||||
later).
|
|
||||||
|
|
@ -1,198 +0,0 @@
|
||||||
# Deploy-dir architecture rethink — implementation plan
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
Resolves the open questions in `docs/superpowers/specs/2026-05-15-deploy-dir-rethink-design.md`. After the 2026-05-15 script-consolidation work, `deploy/` ended up half-canonical / half-historical: the privileged scripts were treated as load-bearing source-of-truth there, while sudoers/sysctl/env-templates stayed duplicated against ckn-bw, and the obsolete `deploy-test-server.sh` plus a pile of dead static unit files lingered. The shape worked but couldn't be described in two sentences.
|
|
||||||
|
|
||||||
This plan commits to the framing the user picked: **`deploy/` is a reference exemplar** — readable enough that a fresh consumer (ckn-bw today, hypothetical docker/ansible/manual tomorrow) could build a deployment from it, but not the live source of truth for installed binaries. The privileged scripts are **application-inherent code** and move out of `deploy/` to top-level `scripts/{libexec,sbin}/`. Dead code is deleted in the same pass. ckn-bw is updated to read scripts from the new location. The intended outcome: `deploy/` shrinks to README + example configs + a couple of curated example units, the rules for "what goes here" fit in two sentences, and the cross-repo install path becomes self-explanatory.
|
|
||||||
|
|
||||||
## End state
|
|
||||||
|
|
||||||
```
|
|
||||||
left4me/
|
|
||||||
scripts/
|
|
||||||
libexec/
|
|
||||||
left4me-overlay # 244-line Python helper (mount/umount)
|
|
||||||
left4me-script-sandbox # 109-line bash (systemd-run sandbox)
|
|
||||||
left4me-systemctl # 44-line sh wrapper
|
|
||||||
left4me-journalctl # 53-line sh wrapper
|
|
||||||
sbin/
|
|
||||||
left4me # 17-line admin CLI wrapper
|
|
||||||
tests/
|
|
||||||
test_overlay.py
|
|
||||||
test_script_sandbox.py
|
|
||||||
test_systemctl_helper.py
|
|
||||||
test_journalctl_helper.py
|
|
||||||
test_sudoers_grants.py # tests the contract between scripts and sudoers
|
|
||||||
deploy/ # REFERENCE ONLY — see deploy/README.md
|
|
||||||
README.md # rewritten: explains target layout, points at scripts/
|
|
||||||
files/
|
|
||||||
etc/
|
|
||||||
sudoers.d/left4me # example, ckn-bw ships its own verbatim copy
|
|
||||||
sysctl.d/99-left4me.conf # example
|
|
||||||
left4me/sandbox-resolv.conf # example
|
|
||||||
usr/local/lib/systemd/system/
|
|
||||||
left4me-server@.service # curated example of what ckn-bw's reactor emits
|
|
||||||
left4me-web.service # curated example
|
|
||||||
left4me-workshop-refresh.service # curated example
|
|
||||||
left4me-workshop-refresh.timer # curated example
|
|
||||||
l4d2-game.slice # curated example
|
|
||||||
l4d2-build.slice # curated example
|
|
||||||
templates/etc/left4me/
|
|
||||||
host.env # example, ckn-bw renders its own mako version
|
|
||||||
web.env.template
|
|
||||||
tests/
|
|
||||||
test_example_units.py # slimmed: just locks down the curated examples
|
|
||||||
l4d2host/ # unchanged
|
|
||||||
l4d2web/ # unchanged
|
|
||||||
docs/
|
|
||||||
```
|
|
||||||
|
|
||||||
## Step-by-step
|
|
||||||
|
|
||||||
### 1. Create `scripts/` and move helpers
|
|
||||||
|
|
||||||
- `mkdir -p scripts/libexec scripts/sbin scripts/tests`
|
|
||||||
- `git mv` the four live helpers and the admin CLI:
|
|
||||||
- `deploy/files/usr/local/libexec/left4me/left4me-overlay` → `scripts/libexec/left4me-overlay`
|
|
||||||
- `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` → `scripts/libexec/left4me-script-sandbox`
|
|
||||||
- `deploy/files/usr/local/libexec/left4me/left4me-systemctl` → `scripts/libexec/left4me-systemctl`
|
|
||||||
- `deploy/files/usr/local/libexec/left4me/left4me-journalctl` → `scripts/libexec/left4me-journalctl`
|
|
||||||
- `deploy/files/usr/local/sbin/left4me` → `scripts/sbin/left4me`
|
|
||||||
- The scripts' contents are unchanged. Every install-target path inside them (`/usr/local/libexec/left4me/...`, `/etc/left4me/...`, `/var/lib/left4me/...`) stays exactly as is — those are runtime paths, not source-tree paths.
|
|
||||||
|
|
||||||
### 2. Delete dead code
|
|
||||||
|
|
||||||
- `git rm` (truly obsolete; replacements live elsewhere or feature was retired):
|
|
||||||
- `deploy/files/usr/local/libexec/left4me/left4me-apply-cake` — CAKE migrated to systemd-networkd via `network/<iface>/cake` node metadata in ckn-bw.
|
|
||||||
- `deploy/files/usr/local/lib/systemd/system/left4me-cake.service` — same reason.
|
|
||||||
- `deploy/files/etc/left4me/cake.env` — bandwidth lives in node metadata, not an env file.
|
|
||||||
- `deploy/files/usr/local/lib/systemd/system/left4me-nft-mark.service` — central `bundles/nftables/` consumes the rules now.
|
|
||||||
- `deploy/files/usr/local/lib/left4me/nft/left4me-mark.nft` — same. After the delete, the now-empty `deploy/files/usr/local/lib/left4me/` and its `nft/` child disappear (git doesn't track empty dirs).
|
|
||||||
- `deploy/deploy-test-server.sh` — superseded by `bw apply`; content survives in git history.
|
|
||||||
- **Do NOT delete** `deploy/files/usr/local/lib/systemd/system/left4me-workshop-refresh.{service,timer}`. The workshop-refresh job is live (invokes `flask workshop-refresh`, defined in `l4d2web/cli.py`); ckn-bw's reactor emits these on production. They stay as curated examples, same category as `left4me-server@.service` / `left4me-web.service` / the slices. (This corrects the framing in `docs/superpowers/specs/2026-05-15-deploy-dir-rethink-design.md` and item 2 of `docs/superpowers/specs/2026-05-15-janitorial-cleanup.md`, both of which lumped workshop-refresh together with truly-dead units.)
|
|
||||||
- Stale `__pycache__` dirs under `deploy/files/usr/local/libexec/left4me/` are deleted by the moves in step 1.
|
|
||||||
|
|
||||||
### 3. Split and relocate `deploy/tests/test_deploy_artifacts.py`
|
|
||||||
|
|
||||||
The current file (~880 lines) is doing four jobs. Split as follows; do not duplicate tests across files.
|
|
||||||
|
|
||||||
**Concrete sequence to preserve git history where it counts**:
|
|
||||||
|
|
||||||
1. `git mv deploy/tests/test_deploy_artifacts.py deploy/tests/test_example_units.py` — single rename, history follows via `git log --follow`.
|
|
||||||
2. In the renamed file, delete every test except the "Keep in `deploy/tests/test_example_units.py`" list below. The kept tests track the unit/sysctl/env-template examples, which is what `deploy/tests/` will mean afterwards.
|
|
||||||
3. Create new `scripts/tests/*.py` files (and `conftest.py`) by writing them fresh — pasting the relevant test functions across. The extracted tests lose direct rename history, but blame against the new files still resolves to the originals one git ref back; acceptable tradeoff.
|
|
||||||
|
|
||||||
**Move to `scripts/tests/`** (tests of script behavior + the sudoers contract that gates the scripts):
|
|
||||||
|
|
||||||
- `scripts/tests/test_overlay.py` — `test_overlay_helper_is_python_with_strict_validation`, `test_overlay_helper_mount_is_idempotent_when_already_mounted`
|
|
||||||
- `scripts/tests/test_script_sandbox.py` — `test_script_sandbox_helper_present`, `test_script_sandbox_helper_passes_shell_syntax_check`, `test_script_sandbox_helper_invokes_systemd_run_with_hardening`, `test_script_sandbox_uses_idmap_staging`, `test_script_sandbox_in_build_slice_with_oom_adjust`, `test_script_sandbox_helper_validates_overlay_id`, `test_script_sandbox_helper_dry_run_mode`
|
|
||||||
- `scripts/tests/test_systemctl_helper.py` — `test_systemctl_helper_passes_shell_syntax_check_and_rejects_bad_args`
|
|
||||||
- `scripts/tests/test_journalctl_helper.py` — `test_journalctl_helper_passes_shell_syntax_check_and_rejects_bad_args`
|
|
||||||
- `scripts/tests/test_helpers_use_fixed_paths.py` — `test_helpers_use_fixed_system_tool_paths_not_sudo_path`
|
|
||||||
- `scripts/tests/test_sudoers_grants.py` — `test_sudoers_allows_only_left4me_helpers_not_raw_system_tools` (still reads `deploy/files/etc/sudoers.d/left4me` as the canonical example; comment why)
|
|
||||||
|
|
||||||
The `ROOT/DEPLOY` path-prefix constants in each file get rewritten so `SCRIPTS = Path(__file__).resolve().parents[2] / "scripts"` and helpers resolve to `SCRIPTS / "libexec/left4me-overlay"` etc. Shared helpers (`_fake_command`, `_env_with_fake_commands`) move into `scripts/tests/conftest.py`.
|
|
||||||
|
|
||||||
**Keep in `deploy/tests/test_example_units.py`** (locks down the curated examples; renamed from the current file):
|
|
||||||
|
|
||||||
- `test_global_unit_files_exist_at_product_level_paths`
|
|
||||||
- `test_web_unit_contains_required_runtime_contract`
|
|
||||||
- `test_server_unit_contains_required_runtime_contract`
|
|
||||||
- `test_server_unit_mounts_overlay_via_exec_start_pre`
|
|
||||||
- `test_server_unit_unmounts_overlay_via_exec_stop_post`
|
|
||||||
- `test_server_unit_contains_perf_baseline_directives`
|
|
||||||
- `test_l4d2_game_slice_exists_with_high_weights`
|
|
||||||
- `test_l4d2_build_slice_exists_with_low_weights`
|
|
||||||
- `test_sysctl_conf_present_with_perf_settings`
|
|
||||||
- `test_env_templates_contain_required_defaults`
|
|
||||||
- `test_sandbox_resolv_conf_exists`
|
|
||||||
|
|
||||||
Add a top-of-file docstring: *"These tests lock down the curated examples kept in `deploy/files/` for reference. The production units are emitted by ckn-bw's reactor in `bundles/left4me/metadata.py`; when reactor output drifts intentionally, update the examples here too."*
|
|
||||||
|
|
||||||
**Delete entirely** (target removed or no longer load-bearing):
|
|
||||||
|
|
||||||
- All `test_deploy_script_*` tests (12 tests; `deploy-test-server.sh` is gone)
|
|
||||||
- `test_globals_refresh_units_removed` — file already deleted; nothing to lock down
|
|
||||||
- `test_nft_mark_file_marks_left4me_udp_with_dscp_ef_and_priority`, `test_nft_mark_unit_loads_and_clears_left4me_table` — nft-mark moved to central nftables bundle
|
|
||||||
- `test_cake_env_template_documents_required_knobs`, `test_apply_cake_helper_supports_apply_and_clear_modes`, `test_apply_cake_helper_passes_shell_syntax_check`, `test_cake_unit_runs_helper_in_apply_and_clear_modes` — CAKE moved to systemd-networkd
|
|
||||||
- `test_deploy_script_installs_overlay_helper_with_executable_mode`, `test_deploy_script_installs_script_sandbox_helper` — install responsibility now lives in ckn-bw's bundle, not in any left4me-side script
|
|
||||||
|
|
||||||
Final file count: `scripts/tests/` gets 6 test files plus `conftest.py`, `deploy/tests/test_example_units.py` is one file, `deploy/tests/test_deploy_artifacts.py` is gone (renamed).
|
|
||||||
|
|
||||||
### 4. Rewrite `deploy/README.md`
|
|
||||||
|
|
||||||
Reframe the top of the file as: *"This directory is a reference exemplar. The canonical deploy is [ckn-bw](https://git.sublimity.de/cronekorkn/ckn-bw)'s `bundles/left4me/` (run `bw apply ovh.left4me`). Files under `deploy/files/` and `deploy/templates/` are readable examples — not the binaries / configs ckn-bw actually installs. Read them to understand the target layout if you're building a fresh deployment by other means."*
|
|
||||||
|
|
||||||
Update the file/status table:
|
|
||||||
|
|
||||||
- Drop rows for files that no longer exist (apply-cake, cake.service, cake.env, nft-mark.*). Keep the workshop-refresh.* rows: per step 2 those units stay in `deploy/files/` as curated examples.
|
|
||||||
- Drop the `deploy-test-server.sh` row.
|
|
||||||
- For the privileged-scripts rows, change `files/usr/local/libexec/left4me/...` → `(moved to scripts/libexec/, installed by ckn-bw's install_left4me_scripts action)`; same for the sbin row.
|
|
||||||
- Mark the remaining `files/etc/...` and `files/usr/local/lib/systemd/system/...` entries explicitly as **example**: ckn-bw ships its own verbatim copies of the configs, its reactor emits the units.
|
|
||||||
|
|
||||||
Keep the "Target Layout" / "Runtime User" / "Overlay References" / "Performance Tuning" sections — they're useful reference prose. Strip the "Running A Test Deployment" / "Admin Bootstrap" sections that refer to the deleted shell installer; replace with a one-paragraph pointer to ckn-bw.
|
|
||||||
|
|
||||||
### 5. ckn-bw cross-repo update
|
|
||||||
|
|
||||||
The `install_left4me_scripts` action in `bundles/left4me/items.py` currently reads from `/opt/left4me/src/deploy/files/usr/local/{libexec,sbin}/`. Update it to read from `/opt/left4me/src/scripts/{libexec,sbin}/`. The install target is unchanged (`/usr/local/libexec/left4me/`, `/usr/local/sbin/left4me`), so nothing on the deployed host moves.
|
|
||||||
|
|
||||||
This is a separate PR in the ckn-bw repo. It must land **at the same time** as the left4me move — the install action depends on the source paths existing. Coordination:
|
|
||||||
|
|
||||||
1. Open both PRs simultaneously.
|
|
||||||
2. Merge order: left4me first (scripts exist at the new path in `/opt/left4me/src/` only after a fresh `git_deploy`), then ckn-bw, then `bw apply ovh.left4me`.
|
|
||||||
3. Alternative: have the ckn-bw PR fall back to the old path if the new path doesn't exist (one extra glob); decide during ckn-bw review whether the complexity is worth the looser coupling. Default: no fallback, coordinate the merges.
|
|
||||||
|
|
||||||
Verification on the deploy target: after `bw apply`, the files under `/usr/local/libexec/left4me/` and `/usr/local/sbin/left4me` should be byte-identical to before. Sudoers, services, the web app: all unchanged.
|
|
||||||
|
|
||||||
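The byte-identical check could be scripted roughly as follows: snapshot digests of the install targets before `bw apply`, snapshot again afterwards, and compare. This is a sketch under the assumption that hashing file contents is an acceptable definition of "byte-identical"; the function names are illustrative.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file under `root` (or `root` itself, if it is a file) to its SHA-256."""
    if root.is_file():
        return {root.name: hashlib.sha256(root.read_bytes()).hexdigest()}
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return paths that were added, removed, or whose content differs."""
    return sorted(
        path for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )
```

An empty `changed_files(...)` result for `/usr/local/libexec/left4me/` and `/usr/local/sbin/left4me` is the outcome this step expects.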
### 6. Mark adjacent specs / docs as resolved
|
|
||||||
|
|
||||||
- `docs/superpowers/specs/2026-05-15-deploy-dir-rethink-design.md`: prepend a `**Resolved 2026-05-15 by docs/superpowers/plans/…</plan-name>.md.**` line at the top. Leave the body intact for archaeology.
|
|
||||||
- `docs/superpowers/specs/2026-05-15-janitorial-cleanup.md`: cross out items 1, 5, 6 (now handled here). Item 2 needs a rewrite — the framing "all static unit files are obsolete drift" was wrong; the live reactor-emitted set (`server@`, `web`, `workshop-refresh.{service,timer}`, `l4d2-{game,build}.slice`) stays in `deploy/files/` as curated examples. The truly-dead two (`left4me-cake.service`, `left4me-nft-mark.service`) are already deleted by this plan, so item 2 collapses to "no remaining work."
|
|
||||||
- No memory file changes needed; the project state captured here is structural and re-derivable from `deploy/README.md` after the rewrite lands.
|
|
||||||
|
|
||||||
### 7. Rollback notes
|
|
||||||
|
|
||||||
If `bw apply ovh.left4me` against the test server breaks something after the cross-repo merge:
|
|
||||||
|
|
||||||
1. Revert the ckn-bw `install_left4me_scripts` action change to the old source path (`/opt/left4me/src/deploy/files/usr/local/{libexec,sbin}/`). Re-apply.
|
|
||||||
2. The left4me side never needs reverting in isolation — the scripts at the new path are byte-identical to the old ones, so a stale ckn-bw install action against a *new* left4me checkout would fail at `install -t` (source path missing). That failure is loud and safe: nothing on the deployed system gets modified.
|
|
||||||
3. The only foot-gun is **partial rollout**: ckn-bw updated but left4me not yet checked out at the right revision. The `git_deploy` step pins the revision, so as long as the two PRs reference compatible commits, the deployed `/opt/left4me/src/` always matches the action's expectation.
|
|
||||||
|
|
||||||
## What does NOT change
|
|
||||||
|
|
||||||
- Runtime install-target paths (`/usr/local/libexec/left4me/...`, `/usr/local/sbin/left4me`) — every reference inside `l4d2host/service_control.py:7-8`, `l4d2web/services/overlay_builders.py:34`, the sudoers file, and the systemd units stays the same.
|
|
||||||
- The Python packages `l4d2host/` and `l4d2web/`.
|
|
||||||
- ckn-bw's bundles for sudoers / sysctl / sandbox-resolv.conf — those keep their own verbatim copies (the user picked "deploy/ keeps configs as examples; duplication-with-ckn-bw is OK because deploy/ is explicitly reference"). Janitoring the duplication is *not* in scope for this plan.
|
|
||||||
- The Mako env templates in ckn-bw — they stay where they are, since they need bw's metadata access for rendering.
|
|
||||||
- The recent overlay-idmap / script-sandbox idmap-staging work — untouched.
|
|
||||||
|
|
||||||
## Critical files (jump points for the implementor)
|
|
||||||
|
|
||||||
- `deploy/tests/test_deploy_artifacts.py` — the source for the test split (lines 20-32 are the path constants; tests grouped roughly by helper from line 138 onward)
|
|
||||||
- `deploy/README.md` — full rewrite of the top section, partial rewrite of the table
|
|
||||||
- `l4d2host/service_control.py:7-8` — verify install-target paths unchanged (sanity)
|
|
||||||
- `l4d2web/services/overlay_builders.py:34` — same
|
|
||||||
- `deploy/files/etc/sudoers.d/left4me` — sanity-check that no path inside changed
|
|
||||||
- `deploy/files/usr/local/lib/systemd/system/{left4me-server@.service,left4me-web.service,l4d2-{game,build}.slice}` — survive as curated examples
|
|
||||||
- ckn-bw repo: `bundles/left4me/items.py` — the `install_left4me_scripts` action (separate PR)
|
|
||||||
|
|
||||||
## Verification
|
|
||||||
|
|
||||||
End-to-end:
|
|
||||||
|
|
||||||
1. **Source-tree consistency.** `find scripts deploy -type f | sort` matches the layout in "End state" above (modulo `__pycache__`).
|
|
||||||
2. **All tests pass locally.** From the repo root: `pytest scripts/tests/ deploy/tests/ l4d2host/tests/ l4d2web/tests/` — every test passes. Specifically verify `scripts/tests/test_sudoers_grants.py` still reads `deploy/files/etc/sudoers.d/left4me` correctly (path constant points across the dir boundary).
|
|
||||||
3. **Shell syntax checks.** The split tests should still run `sh -n` / `bash -n` against the moved scripts; no script edits means no syntax regressions, but the test paths must resolve.
|
|
||||||
4. **No accidental application breakage.** `grep -rn '/usr/local/libexec/left4me\|/usr/local/sbin/left4me' l4d2host l4d2web` returns the same hits as before (paths are install-target, source moves don't affect them).
|
|
||||||
5. **ckn-bw dry-run.** Once the ckn-bw PR is up, `bw apply --dry-run ovh.left4me` from the ckn-bw repo: the diff should show **no changes** to files under `/usr/local/libexec/left4me/` or `/usr/local/sbin/left4me` (byte-identical content via the new path).
|
|
||||||
6. **Production apply.** `bw apply ovh.left4me` against the real test server. After apply: `systemctl status left4me-web.service` is green, starting a game server via the web UI still works (overlay mount → srcds_run → unmount on stop), running an overlay build script through the sandbox still works.
|
|
||||||
|
|
||||||
## Out of scope (handled elsewhere or deferred)
|
|
||||||
|
|
||||||
- The Mako template duplication in ckn-bw — separate cleanup; the templates legitimately need bw's metadata access.
|
|
||||||
- The 1/2/3-user uid-split decision — `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`.
|
|
||||||
- The script-sandbox → systemd template unit refactor — `docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md`.
|
|
||||||
- Remaining janitorial items: item 3 (bubblewrap→systemd-run doc drift), item 4 (stale gameserver-side idmap binds), calendar reminder for SM 1.13 stable. Items 1, 2 (partial — see step 6), 5, 6 are subsumed here.
|
|
||||||
- Rewriting the shell helpers in Python / packaging them as console_scripts — explicitly rejected in the recent script-consolidation plan (egg-info + TOCTOU privilege concerns).
|
|
||||||
- Historical references inside `docs/superpowers/plans/*` and `docs/superpowers/specs/*` to `deploy/files/...` or `deploy-test-server.sh` paths. Those are time-stamped snapshots of past sessions; they don't get rewritten when the underlying tree moves.
|
|
||||||
|
|
@ -1,198 +0,0 @@
|
||||||
# Plan: scope Server Log to the current unit invocation
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
Today the Server Log panel on `/servers/<id>` shows the last 200 lines of the
|
|
||||||
unit's **entire** journal — i.e. across every prior `start` / `stop` /
|
|
||||||
`reset` cycle — then follows. That means a freshly-started server can mix
|
|
||||||
lines from the current boot with leftovers from yesterday, which makes the
|
|
||||||
log harder to reason about. The user wants the panel to begin at the most
|
|
||||||
recent unit start.
|
|
||||||
|
|
||||||
The right systemd primitive is `_SYSTEMD_INVOCATION_ID`: systemd assigns a
|
|
||||||
fresh 128-bit ID to every (re)start of a unit, queryable via
|
|
||||||
`systemctl show -p InvocationID --value <unit>`. Filtering
|
|
||||||
`journalctl _SYSTEMD_INVOCATION_ID=<id>` gives exactly that one run.
|
|
||||||
|
|
||||||
User decisions (already confirmed):
|
|
||||||
|
|
||||||
- **Scope** — always last invocation; no toggle, no historical view.
|
|
||||||
- **Empty case** (unit has never been started) — SSE stays open with
|
|
||||||
keepalives, yields no data lines, attaches when first invocation appears.
|
|
||||||
- **Mid-stream restart** — backend force-disconnects when the InvocationID
|
|
||||||
changes. `EventSource` reconnects on its own and the next request picks
|
|
||||||
up the new run.
|
|
||||||
|
|
||||||
## Architecture
|
|
||||||
|
|
||||||
Three layers, smallest blast radius first:
|
|
||||||
|
|
||||||
```
|
|
||||||
browser ─SSE─► l4d2web routes/log_routes.py
|
|
||||||
│
|
|
||||||
▼
|
|
||||||
l4d2web services/l4d2_facade.stream_server_logs
|
|
||||||
│ (l4d2ctl logs <unit> --lines N --follow)
|
|
||||||
▼
|
|
||||||
l4d2host CLI logs → service_control.stream_journal
|
|
||||||
│
|
|
||||||
┌────────────┴──────────────────┐
|
|
||||||
▼ ▼
|
|
||||||
sudo left4me-systemctl show sudo left4me-journalctl
|
|
||||||
-p ActiveState <unit> --invocation-id <hex32>
|
|
||||||
-p SubState --lines N --follow|--no-follow
|
|
||||||
-p InvocationID ←NEW
|
|
||||||
```
|
|
||||||
|
|
||||||
`l4d2host.service_control.stream_journal` becomes the orchestrator:
|
|
||||||
|
|
||||||
1. Resolve `InvocationID` via `show_service` (already returns a parsed
|
|
||||||
`key=value` dict in `status.py:32-37` — adding a property is harmless).
|
|
||||||
2. **Empty `InvocationID`** (unit never ran):
|
|
||||||
- `follow=False` → return `iter(())`.
|
|
||||||
- `follow=True` → yield `""` every ~10 s as a keepalive nudge; poll
|
|
||||||
`InvocationID` every ~3 s; once it appears, fall through to step 3.
|
|
||||||
3. **Non-empty `InvocationID`** → `Popen` the journalctl helper with
|
|
||||||
`--invocation-id <id>`. Start a daemon thread that re-reads the unit's
|
|
||||||
`InvocationID` every ~5 s; if it changes, call `proc.terminate()`.
|
|
||||||
The generator's normal end-of-stream path then closes the SSE response,
|
|
||||||
the browser's `EventSource` reconnects, and the next call picks up the
|
|
||||||
new ID.
|
|
||||||
|
|
||||||
`lines` cap is preserved (journalctl `-n N` inside the invocation), so a
|
|
||||||
long-running server doesn't dump tens of thousands of lines on page-load.
|
|
||||||
|
|
||||||
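The three orchestrator steps above can be sketched as a generator. This is an illustration under stated assumptions, not the final `service_control.py` code: `get_invocation_id` and `journalctl_command` are injected as parameters here purely so the sketch is self-contained, and the timing parameter names are made up.

```python
import subprocess
import threading
import time
from typing import Callable, Iterator


def stream_journal(
    name: str,
    *,
    lines: int = 200,
    follow: bool = True,
    get_invocation_id: Callable[[str], str],
    journalctl_command: Callable[..., list[str]],
    idle_poll: float = 3.0,
    keepalive_every: float = 10.0,
    restart_poll: float = 5.0,
) -> Iterator[str]:
    invocation_id = get_invocation_id(name)

    # Step 2: the unit has never run.
    if not invocation_id:
        if not follow:
            return  # a generator that returns immediately is an empty iterator
        last_keepalive = time.monotonic()
        while not invocation_id:
            time.sleep(idle_poll)
            if time.monotonic() - last_keepalive >= keepalive_every:
                yield ""  # keepalive nudge for the SSE route
                last_keepalive = time.monotonic()
            invocation_id = get_invocation_id(name)

    # Step 3: stream exactly one invocation.
    proc = subprocess.Popen(
        journalctl_command(name, invocation_id=invocation_id, lines=lines, follow=follow),
        stdout=subprocess.PIPE,
        text=True,
    )
    stop = threading.Event()

    def watch_for_restart() -> None:
        # Event.wait returns False on timeout and True once `stop` is set.
        while not stop.wait(restart_poll):
            if get_invocation_id(name) != invocation_id:
                proc.terminate()  # ends the stream; EventSource reconnects
                return

    if follow:
        threading.Thread(target=watch_for_restart, daemon=True).start()
    try:
        assert proc.stdout is not None
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        stop.set()
        if proc.poll() is None:
            proc.terminate()
        proc.wait()
```

The `finally` block also covers the case where the consumer abandons the generator early, so the journalctl child does not outlive the SSE response.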
## Files to change
|
|
||||||
|
|
||||||
### Helpers (`deploy/scripts/libexec/`)
|
|
||||||
|
|
||||||
- **`left4me-systemctl`** — extend `show` to also request
|
|
||||||
`--property=InvocationID`. One-line change at `:43`:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
show) exec "$systemctl" show \
|
|
||||||
--property=ActiveState \
|
|
||||||
--property=SubState \
|
|
||||||
--property=InvocationID \
|
|
||||||
"$unit" ;;
|
|
||||||
```
|
|
||||||
|
|
||||||
- **`left4me-journalctl`** — replace the unit-based filter with an
|
|
||||||
invocation-id-based one. New CLI signature:
|
|
||||||
|
|
||||||
```
|
|
||||||
left4me-journalctl <name> --invocation-id <hex32> --lines <n> --follow|--no-follow
|
|
||||||
```
|
|
||||||
|
|
||||||
Validate `<hex32>` against `^[0-9a-f]{32}$` (32 lowercase hex chars), same
|
|
||||||
defensive style as the existing name validation. Exec:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
exec "$journalctl" \
|
|
||||||
_SYSTEMD_INVOCATION_ID="$invocation_id" \
|
|
||||||
-n "$lines" -o cat $follow_arg
|
|
||||||
```
|
|
||||||
|
|
||||||
No `-u <unit>` — the invocation ID is globally unique, the predicate is
|
|
||||||
enough. Old `<name> --lines <n> --follow` shape is removed (no callers
|
|
||||||
remain after the host layer change).
|
|
||||||
|
|
||||||
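For building the rejection cases in the helper test, the same rule can be mirrored in Python. This is a sketch of the check only; the shell helper performs its own validation, and `is_valid_invocation_id` is an illustrative name.

```python
import re

# 32 lowercase hex characters, nothing else.
_INVOCATION_ID = re.compile(r"[0-9a-f]{32}")


def is_valid_invocation_id(value: str) -> bool:
    # fullmatch is used instead of ^...$ because `$` would also match
    # just before a trailing newline.
    return _INVOCATION_ID.fullmatch(value) is not None
```

The malformed shapes named in the test plan (too short, non-hex, uppercase, embedded slash) all fail this check.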
### Host (`l4d2host/`)
|
|
||||||
|
|
||||||
- **`l4d2host/service_control.py`**
|
|
||||||
- `journalctl_command(name, *, invocation_id, lines, follow)` → builds
|
|
||||||
the new arg list. Drop the `-u`-based form.
|
|
||||||
- `stream_journal(name, *, lines=200, follow=True)` → orchestrator from
|
|
||||||
Architecture steps 1-3 above. Helpers in this file:
|
|
||||||
- `get_invocation_id(name) -> str` (parses `show_service` output;
|
|
||||||
returns `""` if unset).
|
|
||||||
- `_stream_with_restart_guard(invocation_id, lines, follow)` —
|
|
||||||
`Popen` + daemon poller thread.
|
|
||||||
- Keep `stream_command` as-is (still consumed by `host_commands`).
|
|
||||||
|
|
||||||
- **`l4d2host/logs.py`** — no signature change; just forwards.
|
|
||||||
|
|
||||||
- **`l4d2host/cli.py`** — `logs` command keeps `--lines`/`--follow` flags
|
|
||||||
unchanged. No CLI-surface break.
|
|
||||||
|
|
||||||
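A rough sketch of the two small host-layer pieces described above. The argument list matches the shape expected by the updated `test_logs.py` case; `parse_show_output` and `invocation_id_from_show` are hypothetical stand-ins for the existing `key=value` parsing in `status.py`, shown here only to make the example self-contained.

```python
JOURNALCTL_HELPER = "/usr/local/libexec/left4me/left4me-journalctl"


def parse_show_output(text: str) -> dict[str, str]:
    """Parse `systemctl show` output (one Key=Value pair per line) into a dict."""
    props: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without an '='
            props[key] = value
    return props


def invocation_id_from_show(text: str) -> str:
    """Return the InvocationID, or "" if the unit has never been started."""
    return parse_show_output(text).get("InvocationID", "")


def journalctl_command(name: str, *, invocation_id: str, lines: int, follow: bool) -> list[str]:
    """Build the sudo argv for the journalctl helper in its new CLI shape."""
    return [
        "sudo", "-n", JOURNALCTL_HELPER, name,
        "--invocation-id", invocation_id,
        "--lines", str(lines),
        "--follow" if follow else "--no-follow",
    ]
```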
### Web (`l4d2web/`)
|
|
||||||
|
|
||||||
- **`services/l4d2_facade.py`** — `stream_server_logs` keeps its signature.
|
|
||||||
The behavior change is fully inherited from the host layer.
|
|
||||||
|
|
||||||
- **`routes/log_routes.py`** — unchanged. The existing keepalive logic at
|
|
||||||
`:33` (`if line == "": yield ": keepalive\n\n"`) already handles the
|
|
||||||
empty-line nudges the host yields during the idle wait.
|
|
||||||
|
|
||||||
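To illustrate why the route needs no change, here is a minimal sketch of the framing it already performs. Only the keepalive branch is taken from the plan; the `data:` framing for non-empty lines and the name `sse_frames` are assumptions about how the route formats events.

```python
from typing import Iterable, Iterator


def sse_frames(lines: Iterable[str]) -> Iterator[str]:
    """Turn host-layer log lines into Server-Sent Events frames."""
    for line in lines:
        if line == "":
            # An SSE comment: keeps the connection alive, delivers no event.
            yield ": keepalive\n\n"
        else:
            yield f"data: {line}\n\n"
```

Because the idle wait in the host layer yields `""`, a never-started unit produces only comment frames until its first invocation appears.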
### Tests
|
|
||||||
|
|
||||||
- **`l4d2host/tests/test_logs.py`**
|
|
||||||
- Update `test_stream_instance_logs_uses_journalctl_helper` for the new
|
|
||||||
arg shape: `["sudo", "-n", "/usr/local/libexec/left4me/left4me-journalctl", "alpha", "--invocation-id", "<32hex>", "--lines", "25", "--no-follow"]`.
|
|
||||||
Stub out `get_invocation_id` to return a known ID.
|
|
||||||
- Add: empty InvocationID + `follow=False` → empty iterator (no
|
|
||||||
journalctl call).
|
|
||||||
- Add: empty InvocationID + `follow=True` → yields `""` as a keepalive; once a
later `get_invocation_id` call returns a real ID, the journalctl helper is
called exactly once.
|
|
||||||
- Add: invocation changes mid-stream → poller calls `proc.terminate()`.
|
|
||||||
|
|
||||||
- **`l4d2host/tests/test_cli.py`** — `test_logs_command_streams_lines`:
|
|
||||||
update expected helper invocation, or stub at the `stream_instance_logs`
|
|
||||||
level (it's already monkeypatched in similar tests).
|
|
||||||
|
|
||||||
- **`deploy/scripts/tests/test_journalctl_helper.py`** — update existing
|
|
||||||
shell-syntax & argument-validation test for the new CLI signature.
|
|
||||||
Assert rejection of malformed invocation IDs (too short, non-hex,
|
|
||||||
uppercase, embedded slash).
|
|
||||||
|
|
||||||
- **`l4d2web/tests/test_status_and_server_logs.py`** — should pass
|
|
||||||
unchanged (the SSE shape and route surface haven't moved).
|
|
||||||
|
|
||||||
## Critical files
|
|
||||||
|
|
||||||
- `deploy/scripts/libexec/left4me-systemctl` (extend `show`)
|
|
||||||
- `deploy/scripts/libexec/left4me-journalctl` (rewrite CLI shape)
|
|
||||||
- `l4d2host/l4d2host/service_control.py` (orchestrator + helpers)
|
|
||||||
- `l4d2host/tests/test_logs.py`, `l4d2host/tests/test_cli.py`
|
|
||||||
- `deploy/scripts/tests/test_journalctl_helper.py`
|
|
||||||
- (No changes:) `l4d2web/services/l4d2_facade.py`, `l4d2web/routes/log_routes.py`
|
|
||||||
|
|
||||||
## Why not alternative approaches
|
|
||||||
|
|
||||||
- **`--since <ActiveEnterTimestamp>`** — works in the happy path but is
|
|
||||||
fragile to clock skew, system suspend, and units that restart inside
|
|
||||||
the same second. `_SYSTEMD_INVOCATION_ID` was added to systemd
|
|
||||||
specifically for this filter.
|
|
||||||
- **String-match the systemd `Started …` marker** — locale-dependent,
|
|
||||||
breaks with systemd-message changes, can't survive `Restart=`.
|
|
||||||
- **Toggle in the UI** — user explicitly opted out; YAGNI.
|
|
||||||
|
|
||||||
## Verification
|
|
||||||
|
|
||||||
1. **Unit tests** (sandboxed):
|
|
||||||
|
|
||||||
```
|
|
||||||
uv run --package l4d2host pytest l4d2host/tests/
|
|
||||||
uv run --package l4d2web pytest l4d2web/tests/
|
|
||||||
uv run pytest deploy/scripts/tests/ deploy/tests/
|
|
||||||
```
|
|
||||||
|
|
||||||
2. **Manual on the host** (`ckn@10.0.4.128`):
|
|
||||||
|
|
||||||
```
|
|
||||||
# the unit is running
|
|
||||||
l4d2ctl logs vanilla --no-follow | head -5
|
|
||||||
# → first lines should be from this run's start, not yesterday
|
|
||||||
|
|
||||||
# never-started case (pick an unstarted server, or stop first)
|
|
||||||
l4d2ctl stop vanilla && l4d2ctl logs vanilla --no-follow
|
|
||||||
# → empty output, exit 0
|
|
||||||
```
|
|
||||||
|
|
||||||
3. **End-to-end in browser**:
|
|
||||||
- Open `/servers/1`. Confirm log starts at this run's first line.
|
|
||||||
- Click Stop. Stream goes quiet. Click Start. SSE auto-reconnects and
|
|
||||||
shows the new run from line one.
|
|
||||||
- Open a fresh server that has never been started: log panel is empty
|
|
||||||
but connection is alive; clicking Start makes log appear within seconds.
|
|
||||||
|
|
@ -1,226 +0,0 @@
|
||||||
# UID collapse — remove `l4d2-sandbox` user
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
The hardening refactor landed earlier today
|
|
||||||
(`docs/superpowers/plans/2026-05-15-hardening-refactor.md`) deployed
|
|
||||||
the systemd-directive composition that covers all same-uid attack
|
|
||||||
vectors for the gameserver + web units running as `left4me`.
|
|
||||||
|
|
||||||
The script-sandbox unit still runs as a separate uid `l4d2-sandbox`
|
|
||||||
(981) with a build-time idmap (`mount --bind --map-users=980:981:1`)
|
|
||||||
translating sandbox-side writes to land on disk as `left4me`. After
|
|
||||||
the hardening refactor, the same-uid attack vectors the sandbox uid
|
|
||||||
defends against (FS-view access, ptrace, /proc, signals) are
|
|
||||||
already closed by the sandbox's own systemd-run hardening profile.
|
|
||||||
The separate uid is now defense-in-depth only — and it's
|
|
||||||
inconsistent with the decision *not* to split the web/server uid.
|
|
||||||
|
|
||||||
Pick one principle. Option C from the discussion: **one user**.
|
|
||||||
Delete `l4d2-sandbox`, simplify the sandbox helper, remove the
|
|
||||||
idmap. Architecture gets smaller (one fewer uid, no idmap binds,
|
|
||||||
~30 lines deleted from the helper). Trade: if sandbox hardening
|
|
||||||
regresses, kernel uid boundary no longer helps — consistent with
|
|
||||||
what we already accepted for server/web.
|
|
||||||
|
|
||||||
## Approach
|
|
||||||
|
|
||||||
1. **Edit `scripts/libexec/left4me-script-sandbox`** (left4me repo):
|
|
||||||
delete the idmap block (lines 49-78 per Phase 1 exploration —
|
|
||||||
the `LEFT4ME_UID`/`SANDBOX_UID` lookups, `STAGING` setup,
|
|
||||||
`cleanup_staging` trap, `mount --bind --map-users=…` call).
|
|
||||||
Change `User=l4d2-sandbox -p Group=l4d2-sandbox` (line 85)
|
|
||||||
to `User=left4me -p Group=left4me`. Change
|
|
||||||
`BindPaths="${STAGING}:/overlay"` (line 102) to
|
|
||||||
`BindPaths="${OVERLAY_DIR}:/overlay"`. Keep the
|
|
||||||
`nsenter --mount=/proc/1/ns/mnt` self-wrap at the top — it's
|
|
||||||
about namespace escape, not uid.
|
|
||||||
|
|
||||||
2. **Update `scripts/tests/test_script_sandbox.py`** (left4me repo):
|
|
||||||
- Lines 36-37: change `User=l4d2-sandbox`/`Group=l4d2-sandbox`
|
|
||||||
assertions → `User=left4me`/`Group=left4me`.
|
|
||||||
- Delete `test_script_sandbox_uses_idmap_staging` (lines 114-133)
|
|
||||||
entirely — it asserts the idmap and staging exist; after
|
|
||||||
refactor neither does.
|
|
||||||
- Update the comments at lines 165-166 to drop the sandbox-uid reference.
|
|
||||||
|
|
||||||
3. **Update inline comments** referencing the sandbox uid:
|
|
||||||
- `l4d2web/services/overlay_builders.py:342` (or near 100 — agents
|
|
||||||
reported different lines; locate via grep) — "as l4d2-sandbox"
|
|
||||||
→ "as left4me".
|
|
||||||
- `l4d2host/instances.py:80` — comment about l4d2-sandbox-owned
|
|
||||||
lower-layer files → reflect that all overlay content is now
|
|
||||||
left4me-owned end-to-end.
|
|
||||||
|
|
||||||
4. **Mark the build-time-idmap plan superseded**:
|
|
||||||
`docs/superpowers/plans/2026-05-15-build-time-idmap.md` — add a
|
|
||||||
top-line status note: "SUPERSEDED 2026-05-15 by the uid-collapse
|
|
||||||
refactor (this plan). The idmap pattern this plan introduced is
|
|
||||||
removed because source uid (`left4me`) now equals target uid
|
|
||||||
(`left4me`) — translation is a no-op." Same one-line treatment
|
|
||||||
for `docs/superpowers/plans/2026-05-14-overlay-idmap.md`.
|
|
||||||
|
|
||||||
5. **Update the user-uid-split spec's existing superseded header**:
|
|
||||||
`docs/superpowers/specs/2026-05-15-user-uid-split-design.md` —
|
|
||||||
currently says "2 users (current state) is correct"; revise to
|
|
||||||
say "1 user (after uid-collapse refactor) is correct" and update
|
|
||||||
the reasoning paragraph.
|
|
||||||
|
|
||||||
6. **Light-touch updates to other docs** that reference
|
|
||||||
`l4d2-sandbox` for accuracy. Pragmatic scope — add a top-line
|
|
||||||
note instead of rewriting body content:
|
|
||||||
- `deploy/README.md` — drop the `l4d2-sandbox` bullet (line 84),
|
|
||||||
fix the paragraph at line 141 to reflect no-idmap state.
|
|
||||||
- `docs/superpowers/specs/2026-05-15-hardening-refactor-design.md`
|
|
||||||
and `2026-05-15-hardening-threat-model.md` — add a one-line
|
|
||||||
"Updated 2026-05-15: l4d2-sandbox collapsed into left4me; see
|
|
||||||
plans/2026-05-15-uid-collapse.md" note in the relevant context
|
|
||||||
section.
|
|
||||||
- `docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md`
|
|
||||||
— same one-line note (the spec's hardening profile sketch
|
|
||||||
references the old `User=l4d2-sandbox`; the new build-overlay-unit
|
|
||||||
refactor when it lands will inherit `User=left4me` from this
|
|
||||||
change).
|
|
||||||
- **Leave the 2026-05-08-* design specs alone.** They describe
|
|
||||||
historical design at the time; rewriting them obscures the
|
|
||||||
evolution. Anyone reading them sees the date and the
|
|
||||||
superseded-note chain leads forward.
|
|
||||||
|
|
||||||
7. **Remove `l4d2-sandbox` from the ckn-bw bundle**
|
|
||||||
(`~/Projekte/ckn-bw/bundles/left4me/items.py`):
|
|
||||||
- Delete the `l4d2-sandbox` entry from the `users` dict
|
|
||||||
(lines 54-58 per Phase 1).
|
|
||||||
- Delete the `l4d2-sandbox` entry from the `groups` dict
|
|
||||||
(line 44).
|
|
||||||
- Update the `/var/lib/left4me` mode comment + decide whether to
|
|
||||||
change `0711` → `0755`. The `0711` was specifically to let
|
|
||||||
`l4d2-sandbox` traverse (not list) the dir; with sandbox gone,
|
|
||||||
`0755` is the natural choice. Pick `0755`.
|
|
||||||
|
|
||||||
8. **On-host pre-flight**: before `bw apply`, chown any remaining
|
|
||||||
uid-981 files to `left4me`:
|
|
||||||
```bash
|
|
||||||
ssh left4.me 'sudo find /var/lib/left4me /opt/left4me -uid 981 -print | head -50'
|
|
||||||
# If any results, chown them:
|
|
||||||
ssh left4.me 'sudo find /var/lib/left4me /opt/left4me -uid 981 -exec chown left4me:left4me {} +'
|
|
||||||
```
|
|
||||||
Per the build-time-idmap plan that landed earlier, new sandbox
|
|
||||||
writes already land as `left4me`, so the result should be small
|
|
||||||
or empty. The chown catches any stragglers.
|
|
||||||
|
|
||||||
9. **Cross-repo push + bw apply**:
|
|
||||||
- Commit left4me changes (helper, tests, doc updates) on master.
|
|
||||||
- Commit ckn-bw changes (users/groups deletion, mode change) on
|
|
||||||
master.
|
|
||||||
- Push both.
|
|
||||||
- `bw apply ovh.left4me`.
|
|
||||||
|
|
||||||
10. **Verify**:
|
|
||||||
- `getent passwd l4d2-sandbox` on the host → no result (user
|
|
||||||
removed).
|
|
||||||
- `sudo find /var/lib/left4me /opt/left4me -uid 981 -print` →
|
|
||||||
empty.
|
|
||||||
- Trigger a sandbox build via the web UI; observe in
|
|
||||||
`journalctl -u 'left4me-script-*'` that the transient unit
|
|
||||||
runs as `left4me`, completes successfully, and the resulting
|
|
||||||
overlay files in `/var/lib/left4me/overlays/<id>/` are
|
|
||||||
`left4me:left4me`.
|
|
||||||
- `pytest scripts/tests/test_script_sandbox.py` locally passes
|
|
||||||
with updated assertions.
|
|
||||||
|
|
||||||
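The orphan-uid scan used in steps 8 and 10 can also be expressed in Python, for example if the check should become part of an automated post-apply test. This is a sketch mirroring `find <roots> -uid <uid> -print`; `files_owned_by` is an illustrative name, and uid 981 is the `l4d2-sandbox` uid quoted in the Context section.

```python
import os
from typing import Iterable, Iterator


def files_owned_by(roots: Iterable[str], uid: int) -> Iterator[str]:
    """Yield every path under `roots` (including the roots) owned by `uid`."""

    def owned(path: str) -> bool:
        try:
            # lstat: report the symlink's own owner, like find does by default.
            return os.lstat(path).st_uid == uid
        except FileNotFoundError:
            return False  # vanished between listing and stat

    for root in roots:
        if owned(root):
            yield root
        for dirpath, dirnames, filenames in os.walk(root):
            for entry in dirnames + filenames:
                path = os.path.join(dirpath, entry)
                if owned(path):
                    yield path
```

On the host, `list(files_owned_by(["/var/lib/left4me", "/opt/left4me"], 981))` being empty corresponds to the "empty" expectation in step 10.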
## Files to modify
|
|
||||||
|
|
||||||
**Left4me repo (`~/Projekte/left4me`):**
|
|
||||||
- `scripts/libexec/left4me-script-sandbox` — helper changes (step 1)
|
|
||||||
- `scripts/tests/test_script_sandbox.py` — test updates (step 2)
|
|
||||||
- `l4d2web/services/overlay_builders.py` — comment update (step 3)
|
|
||||||
- `l4d2host/instances.py` — comment update (step 3)
|
|
||||||
- `docs/superpowers/plans/2026-05-15-build-time-idmap.md` —
|
|
||||||
SUPERSEDED header (step 4)
|
|
||||||
- `docs/superpowers/plans/2026-05-14-overlay-idmap.md` —
|
|
||||||
SUPERSEDED header (step 4)
|
|
||||||
- `docs/superpowers/specs/2026-05-15-user-uid-split-design.md` —
|
|
||||||
update existing superseded header (step 5)
|
|
||||||
- `docs/superpowers/specs/2026-05-15-hardening-refactor-design.md` —
|
|
||||||
one-line note (step 6)
|
|
||||||
- `docs/superpowers/specs/2026-05-15-hardening-threat-model.md` —
|
|
||||||
one-line note (step 6)
|
|
||||||
- `docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md` —
|
|
||||||
one-line note (step 6)
|
|
||||||
- `deploy/README.md` — drop sandbox bullet, update idmap paragraph
|
|
||||||
(step 6)
|
|
||||||
|
|
||||||
**Ckn-bw repo (`~/Projekte/ckn-bw`):**
|
|
||||||
- `bundles/left4me/items.py` — drop `l4d2-sandbox` user + group;
|
|
||||||
relax mode from `0711` to `0755` (step 7)
|
|
||||||
|
|
||||||
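For reference, the difference between the two `/var/lib/left4me` modes discussed in step 7 can be shown with the standard `stat` module. This is a small illustration, not deployment code; the helper names are made up.

```python
import stat


def describe_dir_mode(mode: int) -> str:
    """Render a directory permission mode the way `ls -l` would."""
    return stat.filemode(stat.S_IFDIR | mode)


def others_can_traverse(mode: int) -> bool:
    """Execute bit for 'other': may enter the directory and reach known names."""
    return bool(mode & stat.S_IXOTH)


def others_can_list(mode: int) -> bool:
    """Read bit for 'other': may enumerate the directory's entries."""
    return bool(mode & stat.S_IROTH)


print(describe_dir_mode(0o711))  # drwx--x--x
print(describe_dir_mode(0o755))  # drwxr-xr-x
```

With `0711`, other users can traverse but not list; `0755` additionally grants listing, so the change makes the directory more permissive, not less.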
**Host actions (no commits):**
|
|
||||||
- pre-flight chown of orphan-981 files (step 8)
|
|
||||||
- `bw apply ovh.left4me` (step 9)
|
|
||||||
|
|
||||||
## Verification
|
|
||||||
|
|
||||||
End-to-end on `left4.me`:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# User removed
|
|
||||||
ssh left4.me 'getent passwd l4d2-sandbox; getent group l4d2-sandbox'
|
|
||||||
# Expect: empty (both)
|
|
||||||
|
|
||||||
# No orphan-uid files
|
|
||||||
ssh left4.me 'sudo find /var/lib/left4me /opt/left4me -uid 981 -print 2>/dev/null'
|
|
||||||
# Expect: empty
|
|
||||||
|
|
||||||
# Sandbox build runs as left4me end-to-end
|
|
||||||
# (Trigger via web UI; then check)
|
|
||||||
ssh left4.me 'sudo journalctl --since "5 minutes ago" -u "left4me-script-*" | head -30'
|
|
||||||
# Expect: clean run, no permission errors
|
|
||||||
|
|
||||||
ssh left4.me 'sudo ls -ln /var/lib/left4me/overlays/<id>/ | head -5'
|
|
||||||
# Expect: uid 980 (left4me), not 981
|
|
||||||
|
|
||||||
# Local tests
|
|
||||||
cd ~/Projekte/left4me && pytest scripts/tests/test_script_sandbox.py -q
|
|
||||||
# Expect: all green (one fewer test — the idmap test was deleted)
|
|
||||||
```
|
|
||||||
|
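The orphan-uid check above can also be run as a small script when `find` output needs post-processing. This is a minimal sketch, not part of the plan: the helper name `files_owned_by` is hypothetical, and the roots and uid 981 are taken from the verification commands above.

```python
import os
from pathlib import Path


def files_owned_by(roots, uid):
    """Return every path at or under `roots` whose owner uid equals `uid`."""
    hits = []
    for root in roots:
        candidates = [Path(root)]
        # os.walk silently yields nothing for a missing or unreadable root.
        for dirpath, dirnames, filenames in os.walk(root):
            candidates.extend(Path(dirpath) / name for name in dirnames + filenames)
        for path in candidates:
            try:
                # lstat: report the symlink itself, not its target.
                if path.lstat().st_uid == uid:
                    hits.append(path)
            except OSError:
                continue
    return sorted(hits)


# Hypothetical usage on the host (needs root to read every directory):
#   files_owned_by(["/var/lib/left4me", "/opt/left4me"], 981)  # expect an empty list
```

An empty result corresponds to the "Expect: empty" line of the `find` check.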
|
||||||
## Rollback
|
|
||||||
|
|
||||||
If the deploy goes wrong:
|
|
||||||
- `git revert` the left4me commits + the ckn-bw commit, push,
|
|
||||||
`bw apply` again.
|
|
||||||
- ckn-bw will recreate the `l4d2-sandbox` user on the host.
|
|
||||||
- The old helper script comes back via `git_deploy`.
|
|
||||||
- Any files chown'd from 981→980 in the pre-flight stay at 980 —
|
|
||||||
that's fine because the new helper would have written them as 980
|
|
||||||
anyway.
|
|
||||||
|
|
||||||
## Risks
|
|
||||||
|
|
||||||
- **Sandbox build running during `bw apply`**: ckn-bw's user-removal
|
|
||||||
step might fail if a `l4d2-sandbox`-uid process is alive.
|
|
||||||
Mitigation: don't apply during a build. Quick check before apply:
|
|
||||||
`ssh left4.me 'sudo systemctl list-units --type=service "left4me-script-*"'`
|
|
||||||
→ expect "0 loaded units".
|
|
||||||
- **Orphan files not caught by the pre-flight find**: if any uid-981
|
|
||||||
file exists outside `/var/lib/left4me` or `/opt/left4me`, the user
|
|
||||||
removal succeeds but the file becomes orphan-uid. Practically these
|
|
||||||
paths are exhaustive; if paranoid, expand the find to `/`.
|
|
||||||
- **The `nsenter` self-wrap exists only because the web unit sets
|
|
||||||
`PrivateTmp=true`**. If the web unit ever drops `PrivateTmp`, the
|
|
||||||
wrap becomes unnecessary. This refactor does not affect it; flag it
|
|
||||||
for future cleanup.
|
|
||||||
|
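The "no build running" pre-check from the first risk could be scripted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and it adds `--no-legend --plain` to the `systemctl list-units` call so that the output is one unit per line with no header or footer.

```python
import subprocess


def loaded_units(systemctl_output):
    """Extract unit names from `systemctl list-units --no-legend --plain` output."""
    units = []
    for line in systemctl_output.splitlines():
        fields = line.split()
        if fields:
            # The first column is the unit name.
            units.append(fields[0])
    return units


def sandbox_builds_running(host="left4.me"):
    """Return loaded left4me-script-* services on `host` (empty list = safe to apply)."""
    remote = (
        "sudo systemctl list-units --type=service --no-legend --plain "
        "'left4me-script-*'"
    )
    result = subprocess.run(
        ["ssh", host, remote], capture_output=True, text=True, check=True
    )
    return loaded_units(result.stdout)
```

An empty list from `sandbox_builds_running()` matches the "0 loaded units" expectation above.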
|
||||||
## Out of scope
|
|
||||||
|
|
||||||
- Renaming `left4me` to something else (e.g., `l4d2-app`). Cosmetic
|
|
||||||
only; not worth the migration cost.
|
|
||||||
- The broader configmgmt responsibility reshape (drop-ins owned by
|
|
||||||
left4me, ckn-bw as thin file-shipper). Deferred per the
|
|
||||||
hardening-refactor design.
|
|
||||||
- `build-overlay-unit` template refactor
|
|
||||||
(`docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md`)
|
|
||||||
— still queued; will inherit `User=left4me` cleanly from this work.
|
|
||||||
- Rewriting historical 2026-05-08-* design specs.
|
|
||||||
|
|
@ -1,408 +0,0 @@
|
||||||
# Plan — collapse left4me venv chain into uv workspace + `uv sync`
|
|
||||||
|
|
||||||
**Status:** executed (left4me side). ckn-bw side queued — see
|
|
||||||
`~/Projekte/ckn-bw/bundles/left4me/` and the matching section below.
|
|
||||||
|
|
||||||
**Notable deviations from the original handoff
|
|
||||||
(`docs/superpowers/specs/2026-05-15-handoff-uv-workspace.md`):**
|
|
||||||
|
|
||||||
- Handoff assumed `pkg_apt: uv` works on Debian Trixie. It does not — uv
|
|
||||||
is in `experimental`/`sid` only. Replaced with a `left4me_install_uv`
|
|
||||||
action that downloads a pinned 0.11.8 tarball from astral-sh/uv
|
|
||||||
releases, SHA256-verifies, installs to `/usr/local/bin/`.
|
|
||||||
- Handoff assumed the existing layout (`l4d2host/pyproject.toml` with
|
|
||||||
`package-dir = "."`) was workspace-compatible. It was not — setuptools
|
|
||||||
writes `egg-info/` to source during any build, which fails on the
|
|
||||||
root-owned `/opt/left4me/src` tree. Required layout restructure to
|
|
||||||
`l4d2host/l4d2host/` (package source nested) plus a switch from
|
|
||||||
setuptools to hatchling.
|
|
||||||
- `git` is not installed on the prod host (bw drives git from the
|
|
||||||
control machine). Verification check #1 uses `find` for build
|
|
||||||
artifacts instead of `git status`.
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
The production deploy of left4me to `ovh.left4me` currently uses a 5-action
|
|
||||||
chain in `ckn-bw/bundles/left4me/items.py` that builds out a Python venv
|
|
||||||
under `/var/lib/left4me/.venv` by chaining `python3 -m venv` → `pip upgrade`
|
|
||||||
→ `pip install` (with an 8-line tempdir-copy dance because the source at
|
|
||||||
`/opt/left4me/src` is root-owned and setuptools wants to write `.egg-info/`
|
|
||||||
into it) → `alembic upgrade` → `seed_overlays`. The chain has three
|
|
||||||
problems:
|
|
||||||
|
|
||||||
1. **Non-deterministic prod deploys.** `pip install` resolves whatever is
|
|
||||||
latest at apply time. A transitive CVE-relevant bump between two
|
|
||||||
`bw apply` runs is invisible until something breaks.
|
|
||||||
2. **Cognitive cost.** The tempdir-copy in `left4me_pip_install` is the
|
|
||||||
single longest, gnarliest action in the bundle.
|
|
||||||
3. **Implicit cross-package dep.** `l4d2web` imports from `l4d2host.paths`
|
|
||||||
in 5 files but doesn't declare the dependency — today's setup works
|
|
||||||
only because both get `pip install -e`'d side-by-side.
|
|
||||||
|
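Problem 3 can be confirmed mechanically rather than by grep. The sketch below walks a source tree with the standard `ast` module and lists files that import a given top-level package; the helper names are hypothetical and nothing here is part of the repo.

```python
import ast
from pathlib import Path


def imports_package(source, package):
    """True if `source` contains an absolute import of `package` or a submodule."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            # level > 0 marks a relative import, which is not a cross-package dep.
            names = [node.module] if node.level == 0 and node.module else []
        elif isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        else:
            continue
        if any(n == package or n.startswith(package + ".") for n in names):
            return True
    return False


def files_importing(root, package):
    """List .py files under `root` that import `package`."""
    return sorted(
        path
        for path in Path(root).rglob("*.py")
        if imports_package(path.read_text(encoding="utf-8"), package)
    )


# Hypothetical usage from the repo root:
#   files_importing("l4d2web", "l4d2host")
```

Any non-empty result means `l4d2web` needs `l4d2host` declared as a dependency.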
|
||||||
This plan migrates the repo to a uv workspace with a committed `uv.lock`,
|
|
||||||
replacing the 5-action chain with `left4me_install_uv` (download +
|
|
||||||
SHA256 verify, idempotent — only re-runs on version change) plus
|
|
||||||
`left4me_uv_sync`. On the steady-state path (uv already pinned at
|
|
||||||
0.11.8), only `uv_sync` fires per deploy. Both sides of the change
|
|
||||||
(left4me repo and the ckn-bw `left4me` bundle) ship together. The plan
|
|
||||||
executes the migration sequence already documented in
|
|
||||||
`/Users/mwiegand/Projekte/left4me/docs/superpowers/specs/2026-05-15-handoff-uv-workspace.md`
|
|
||||||
— treat that handoff as the design document. This plan adds the
|
|
||||||
empirically-verified ground truth, resolves the small open questions, and
|
|
||||||
encodes the executable sequence.
|
|
||||||
|
|
||||||
## Source of truth
|
|
||||||
|
|
||||||
- **Design**: `docs/superpowers/specs/2026-05-15-handoff-uv-workspace.md`
|
|
||||||
(in the left4me repo) — read this first; do not duplicate its content
|
|
||||||
here.
|
|
||||||
- **Sibling context** (don't dive in):
|
|
||||||
`docs/superpowers/specs/2026-05-15-deployment-responsibility-design.md`
|
|
||||||
(just-shipped; left the venv chain alone),
|
|
||||||
`docs/superpowers/specs/2026-05-15-runtime-state-relocation-design.md`
|
|
||||||
(made `/opt/left4me/src` root-owned, which is *why* the current
|
|
||||||
tempdir-copy dance exists).
|
|
||||||
|
|
||||||
## Resolved questions (from planning)
|
|
||||||
|
|
||||||
- **Branch flow**: direct-to-master on both sides. (Matches left4me's
|
|
||||||
recent workflow, e.g. `b13d164`, `55b0138`. ckn-bw side committed
|
|
||||||
but NOT pushed — operator pushes manually.)
|
|
||||||
- **Python version alignment**: align all three pyprojects (root + both
|
|
||||||
members) to `requires-python = ">=3.13"`. Matches `.envrc` and the
|
|
||||||
production host. Removes the workspace-vs-member skew.
|
|
||||||
- **Spike test scope**: extend beyond the handoff to also dry-run a
|
|
||||||
`uv sync --frozen` shape against a root-owned source — the production
|
|
||||||
command path is `sync`, not `build`, and they're different code paths.
|
|
||||||
- **Scope handoff at `git push`**: agent's deliverable is two ready-to-deploy
|
|
||||||
commits (left4me pushed; ckn-bw committed but unpushed). The user runs
|
|
||||||
`bw apply ovh.left4me`, the post-apply restart, and the 6-check
|
|
||||||
verification matrix themselves. (Per session memory:
|
|
||||||
`feedback_left4me_deploy_workflow` — supersedes the original prompt's
|
|
||||||
ask to drive apply + verify end-to-end.) The spike test remains agent
|
|
||||||
work — it's information gathering, and the one-shot direct install
|
|
||||||
fits the "one-shot via direct command" rule from the same memory.
|
|
||||||
- **uv install vector**: direct GitHub tarball download + SHA256 verify
|
|
||||||
against the official `.sha256` sibling, install to `/usr/local/bin/`.
|
|
||||||
The handoff doc's `pkg_apt: uv` assumption was wrong — uv is not in
|
|
||||||
Debian Trixie's apt archive (in `experimental`/`sid` only). Astral's
|
|
||||||
canonical methods are curl-pipe-sh and direct tarball; we chose
|
|
||||||
tarball for auditability and pattern-match with the existing
|
|
||||||
`left4me_install_steamcmd` action. Pin to **uv 0.11.8** to match
|
|
||||||
the local brew-installed version, eliminating the lockfile-format-skew
|
|
||||||
risk between dev and prod.
|
|
||||||
|
|
||||||
## Ground-truth from exploration
|
|
||||||
|
|
||||||
- **Cross-package imports confirmed**: 5 files in `l4d2web/` import
|
|
||||||
`from l4d2host.paths`:
|
|
||||||
- `l4d2web/routes/overlay_routes.py`
|
|
||||||
- `l4d2web/services/overlay_creation.py`
|
|
||||||
- `l4d2web/services/overlay_builders.py`
|
|
||||||
- `l4d2web/services/overlay_files.py`
|
|
||||||
- `l4d2web/services/workshop_paths.py`
|
|
||||||
- **Layout compatibility**: both members use `[tool.setuptools.package-dir]
|
|
||||||
{name} = "."` (pyproject lives inside the package directory). uv
|
|
||||||
workspace `members = ["l4d2host", "l4d2web"]` handles this fine — uv
|
|
||||||
uses the pyproject as the project root regardless of the package-dir
|
|
||||||
mapping.
|
|
||||||
- **`.gitignore` already covers** `*.egg-info/`, `.venv/`, `__pycache__/`,
|
|
||||||
etc. No `.gitignore` changes needed.
|
|
||||||
- **No `pytest.ini` / `[tool.pytest.ini_options]` exists** — pytest
|
|
||||||
defaults work; `uv run pytest` from repo root will discover tests in
|
|
||||||
`l4d2host/tests/` and `l4d2web/tests/`.
|
|
||||||
- **Bundle action conventions** (from `ckn-bw/bundles/left4me/items.py`
|
|
||||||
and neighbors): every action sets `cascade_skip: False` explicitly.
|
|
||||||
Action keys in use: `command`, `triggered`, `cascade_skip`, `unless`,
|
|
||||||
`needs`, `triggers`, `comment`.
|
|
||||||
- **Additional `git_deploy` consumer**: `left4me_chmod_scripts` at
|
|
||||||
`items.py:324` also `needs: 'git_deploy:/opt/left4me/src'`. Untouched
|
|
||||||
by this refactor, but listed here so it's not missed during review.
|
|
||||||
- **Bundle README §"deploy-flow"**: lines 84–90 of
|
|
||||||
`bundles/left4me/README.md` document the pip_install tempdir dance.
|
|
||||||
This is the prose to rewrite (not vague — those exact lines).
|
|
||||||
- **`apt.packages`** declaration: `metadata.py:29–49`. Currently lists
|
|
||||||
`python3`, `python3-venv`, `python3-pip`, `python3-dev`, plus i386
|
|
||||||
multiarch entries.
|
|
||||||
- **uv NOT in Debian Trixie apt archive** (verified via
|
|
||||||
`apt-cache search "^uv$"` and `apt-cache policy uv` on the live host
|
|
||||||
— both return nothing for the actual `uv` package). Handoff doc's
|
|
||||||
assumption was wrong on this point.
|
|
||||||
- **`git` is NOT installed on the production host** (verified via
|
|
||||||
`command -v git` on prod returning empty; `/usr/bin/git` doesn't
|
|
||||||
exist). The bw `git_deploy` item operates from the *control* machine
|
|
||||||
(dev laptop), pushing files to prod via SSH — prod itself needs no
|
|
||||||
git. Implication: the handoff's verification check #1
|
|
||||||
(`sudo git -C /opt/left4me/src status --porcelain`) cannot be used.
|
|
||||||
Replace with `find /opt/left4me/src \( -name '*.egg-info' -o -name
|
|
||||||
build -o -name dist \) -print`.
|
|
||||||
- **ckn-bw is currently EVEN with `origin/master`** (verified via
|
|
||||||
`git status -sb` showing `## master...origin/master` with empty
|
|
||||||
log). The original prompt's "7 commits ahead" was stale — the
|
|
||||||
operator has since pushed. After our ckn-bw commit lands locally,
|
|
||||||
the repo will be 1 commit ahead (not 8).
|
|
||||||
- **Prod arch**: `x86_64` / `amd64`. **Prod curl**: 8.14.1 at
|
|
||||||
`/usr/bin/curl`. **Prod tar**: GNU tar 1.35. **Prod install**: GNU
|
|
||||||
coreutils 9.7. **`/usr/local/bin`** exists, root-owned, currently
|
|
||||||
contains only the `downtime` binary.
|
|
||||||
- **Current prod venv state**: `/var/lib/left4me/.venv/` exists, owned
|
|
||||||
by `left4me:left4me`, contains `python3.13`, `pip`, `alembic`,
|
|
||||||
`flask`, `gunicorn`, `l4d2ctl`. `pip show l4d2host` / `pip show
|
|
||||||
l4d2web` both report version 0.1.0. So uv will be adopting a venv
|
|
||||||
that already has working installs of both members + their deps.
|
|
||||||
- **Local dev environment**: `uv 0.11.8` (brew), `direnv 2.37.1`
|
|
||||||
(supports `use uv`), `python 3.13.13`. No `.venv` exists locally yet
|
|
||||||
— clean slate.
|
|
||||||
|
|
||||||
## Critical files
|
|
||||||
|
|
||||||
### left4me repo
|
|
||||||
- **NEW** `/Users/mwiegand/Projekte/left4me/pyproject.toml` — workspace root
|
|
||||||
- **NEW** `/Users/mwiegand/Projekte/left4me/uv.lock` — generated via `uv lock`
|
|
||||||
- `l4d2host/pyproject.toml:10` — bump `requires-python` to `>=3.13`
|
|
||||||
- `l4d2web/pyproject.toml:10–18` — bump `requires-python`, add
|
|
||||||
`"l4d2host"` to `dependencies`, add `[tool.uv.sources] l4d2host = { workspace = true }`
|
|
||||||
- `.envrc` — replace `layout python python3.13` with `use uv` (with
|
|
||||||
fallback if direnv stdlib is too old)
|
|
||||||
- `README.md`, `AGENTS.md`, `l4d2web/README.md` — update install
|
|
||||||
instructions
|
|
||||||
|
|
||||||
### ckn-bw repo (`~/Projekte/ckn-bw/`)
|
|
||||||
- `bundles/left4me/metadata.py:29–49` — **ensure** `'curl': {}` is in
|
|
||||||
`apt.packages` (required by the new install action; verify it's not
|
|
||||||
already inherited from a base bundle). **Drop** `'python3-pip'` (uv
|
|
||||||
replaces pip; bundle has no other consumer). **Drop** `'python3-venv'`
|
|
||||||
(the chain no longer uses `python3 -m venv`; uv creates its own venv
|
|
||||||
via `UV_PROJECT_ENVIRONMENT`). **Keep** `'python3'`, `'python3-dev'`,
|
|
||||||
and the i386 multiarch entries.
|
|
||||||
**Do NOT add** `'uv': {}` — uv is not in Trixie's apt archive.
|
|
||||||
- `bundles/left4me/items.py:285–305` — update `git_deploy:/opt/left4me/src`
|
|
||||||
triggers: replace `action:left4me_pip_install` with
|
|
||||||
`action:left4me_uv_sync`
|
|
||||||
- `bundles/left4me/items.py:328–340` — **DELETE** `left4me_create_venv`
|
|
||||||
- `bundles/left4me/items.py:342–352` — **DELETE** `left4me_pip_upgrade`
|
|
||||||
- `bundles/left4me/items.py:354–382` — **DELETE** `left4me_pip_install`
|
|
||||||
(replaced by `left4me_uv_sync` below)
|
|
||||||
- `bundles/left4me/items.py:384–407` — `left4me_alembic_upgrade`:
|
|
||||||
update `needs:` (or `triggered_by:` equivalent) to point at
|
|
||||||
`action:left4me_uv_sync` instead of `action:left4me_pip_install`
|
|
||||||
- `bundles/left4me/items.py` — **ADD** two new actions:
|
|
||||||
- `left4me_install_uv`: download pinned 0.11.8 tarball from
|
|
||||||
github.com/astral-sh/uv/releases/, SHA256-verify, install to
|
|
||||||
/usr/local/bin/. Idempotent via `unless: '/usr/local/bin/uv --version
|
|
||||||
| grep -qx "uv 0.11.8"'`. `needs: ['pkg_apt:curl']`,
|
|
||||||
`triggers: ['action:left4me_uv_sync']`. (Body matches the approved
|
|
||||||
preview, with `unless:` refined to `grep -qx` for BRE portability.)
|
|
||||||
- `left4me_uv_sync`: `sudo -u left4me env
|
|
||||||
UV_PROJECT_ENVIRONMENT=/var/lib/left4me/.venv /usr/local/bin/uv
|
|
||||||
sync --frozen --project /opt/left4me/src`. `triggered: True`,
|
|
||||||
`cascade_skip: False`, `needs:` includes
|
|
||||||
`'git_deploy:/opt/left4me/src'`, `'action:left4me_install_uv'`,
|
|
||||||
`'directory:/var/lib/left4me'`, `'user:left4me'`. `triggers:
|
|
||||||
['action:left4me_alembic_upgrade']`.
|
|
||||||
- `bundles/left4me/README.md:84–90` — rewrite the deploy-flow description
|
|
||||||
to mention the install_uv + uv_sync chain instead of the tempdir-dance
|
|
||||||
|
|
||||||
## Execution steps
|
|
||||||
|
|
||||||
### Step 0 — Spike test (extended) — DO FIRST
|
|
||||||
Verify the architectural assumption empirically on the live host.
|
|
||||||
Uses the SAME install vector the production action will use (direct
|
|
||||||
tarball + SHA256 verify), so the spike doubles as a smoke test for
|
|
||||||
the install action itself.
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# A. Install pinned uv on prod (one-shot via direct command; matches
|
|
||||||
# what the future bw action will do).
|
|
||||||
ssh ckn@left4.me '
|
|
||||||
set -e
|
|
||||||
tmpdir=$(mktemp -d); trap "rm -rf $tmpdir" EXIT
|
|
||||||
base=https://github.com/astral-sh/uv/releases/download/0.11.8
|
|
||||||
tar=uv-x86_64-unknown-linux-gnu.tar.gz
|
|
||||||
curl -fsSL -o $tmpdir/$tar $base/$tar
|
|
||||||
curl -fsSL -o $tmpdir/$tar.sha256 $base/$tar.sha256
|
|
||||||
(cd $tmpdir && sha256sum -c $tar.sha256)
|
|
||||||
tar -xzf $tmpdir/$tar -C $tmpdir --strip-components=1
|
|
||||||
sudo install -m 0755 $tmpdir/uv /usr/local/bin/uv
|
|
||||||
sudo install -m 0755 $tmpdir/uvx /usr/local/bin/uvx
|
|
||||||
/usr/local/bin/uv --version
|
|
||||||
'
|
|
||||||
|
|
||||||
# B. uv build against root-owned source: source must stay clean.
|
|
||||||
ssh ckn@left4.me '
|
|
||||||
sudo -u left4me sh -c "
|
|
||||||
wheels=\$(mktemp -d)
|
|
||||||
/usr/local/bin/uv build --wheel --sdist /opt/left4me/src/l4d2host --out-dir \$wheels
|
|
||||||
ls \$wheels
|
|
||||||
"
|
|
||||||
'
|
|
||||||
# Cleanliness probe — git not on prod, so use find for build artifacts.
|
|
||||||
# Expected: only-existing egg-info dirs (the ones already on disk from
|
|
||||||
# the current pip install -e flow); NO NEW artifacts from this run.
|
|
||||||
# Capture a baseline BEFORE the build, compare AFTER.
|
|
||||||
ssh ckn@left4.me 'sudo find /opt/left4me/src \( -name "*.egg-info" -o -name build -o -name dist -o -name "__pycache__" \) -printf "%T@ %p\n" | sort'
|
|
||||||
|
|
||||||
# C. Extended sync-shape check — dry-run `uv sync --frozen` against a
|
|
||||||
# root-owned workspace mock in /tmp. Verify the project root stays
|
|
||||||
# clean (no .python-version written, no transient files left over).
|
|
||||||
# This validates that `uv sync` (not just `uv build`) is safe against
|
|
||||||
# a read-only project tree, which is the actual production code path.
|
|
||||||
```
|
|
||||||
|
|
||||||
**Decision gate**:
|
|
||||||
- Source stays clean across B and C → proceed with full plan.
|
|
||||||
- New `*.egg-info` / `build/` / `dist/` directories appear in
|
|
||||||
`/opt/left4me/src` after `uv build` → fall back to **Medium scope**
|
|
||||||
(handoff §"Empirical spike" → fallback). Update the handoff doc to
|
|
||||||
record the fallback decision and re-plan.
|
|
||||||
- `uv sync` writes into the project root during step C → also fall back
|
|
||||||
to Medium scope. Same handoff update.
|
|
||||||
|
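The decision gate depends on comparing build artifacts before and after the spike. One way to do that comparison is sketched below; `build_artifacts` and `new_artifacts` are hypothetical names, and the name patterns mirror the `find` expression used in the spike (`*.egg-info`, `build`, `dist`).

```python
import os
from pathlib import Path

ARTIFACT_NAMES = {"build", "dist"}


def build_artifacts(root):
    """Snapshot paths under `root` that look like build output."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if name.endswith(".egg-info") or name in ARTIFACT_NAMES:
                found.add(Path(dirpath) / name)
    return found


def new_artifacts(before, after):
    """Artifacts present in `after` but not in the `before` baseline."""
    return sorted(after - before)


# Hypothetical usage around the spike:
#   baseline = build_artifacts("/opt/left4me/src")
#   ... run `uv build` / `uv sync --frozen` ...
#   leaked = new_artifacts(baseline, build_artifacts("/opt/left4me/src"))
#   # an empty list means "source stays clean", so proceed with the full plan
```

A set difference ignores the pre-existing `egg-info` directories left by the old `pip install -e` flow, which is why the baseline is captured first.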
|
||||||
### Step 1 — left4me workspace setup (local)
|
|
||||||
1. Write `/Users/mwiegand/Projekte/left4me/pyproject.toml` (workspace root)
|
|
||||||
— see handoff §"What changes — left4me side / New: pyproject.toml"
|
|
||||||
2. Bump `l4d2host/pyproject.toml:10` to `requires-python = ">=3.13"`
|
|
||||||
3. Update `l4d2web/pyproject.toml`: bump `requires-python`, add
|
|
||||||
`"l4d2host"` to `dependencies`, append `[tool.uv.sources]` block
|
|
||||||
4. `uv lock` at the repo root → produces `uv.lock`
|
|
||||||
5. `uv sync` → creates `.venv/`, installs both members editable + pytest
|
|
||||||
6. `uv run pytest` → all green
|
|
||||||
7. Update `.envrc`: replace `layout python python3.13` with `use uv`
|
|
||||||
(fallback to `uv sync >/dev/null && source .venv/bin/activate` if
|
|
||||||
the dev's direnv version doesn't ship `use uv`)
|
|
||||||
8. Update `README.md`, `AGENTS.md`, `l4d2web/README.md`: replace the
|
|
||||||
`pip install -e ...` invocation with `uv sync` and add the one-time
|
|
||||||
prereq line about installing uv. Mention macOS (`brew install uv`)
|
|
||||||
and Linux (curl-pipe-sh from astral.sh) — **do NOT** suggest
|
|
||||||
`apt install uv`, as it's not in Debian's apt archive yet (only
|
|
||||||
`experimental`/`sid`).
|
|
||||||
|
|
||||||
### Step 2 — left4me commit + push
|
|
||||||
Single commit using the suggested message from the handoff
|
|
||||||
(§"Commit messages — left4me side"). Push to `origin` (gitlab on
|
|
||||||
sublimity.de — confirmed safe-publish-exempt per memory). The commit
|
|
||||||
makes the workspace and lockfile available to ckn-bw's `git_deploy`.
|
|
||||||
|
|
||||||
### Step 3 — ckn-bw bundle refactor
|
|
||||||
1. Edit `bundles/left4me/metadata.py:29–49`:
|
|
||||||
- Ensure `'curl': {}` is in `apt.packages` (verify it's not already
|
|
||||||
inherited from a base bundle; if not, add it explicitly).
|
|
||||||
- Drop `'python3-pip'` (uv replaces pip; bundle has no other
|
|
||||||
consumer — grep the bundle to confirm).
|
|
||||||
- Drop `'python3-venv'` (chain no longer uses `python3 -m venv`).
|
|
||||||
- Keep `'python3'`, `'python3-dev'`, and the i386 multiarch entries.
|
|
||||||
- **Do NOT add `'uv': {}`** — not in Trixie's apt.
|
|
||||||
2. Edit `bundles/left4me/items.py`:
|
|
||||||
- Delete `left4me_create_venv`, `left4me_pip_upgrade`,
|
|
||||||
`left4me_pip_install` blocks (lines 328–382 inclusive).
|
|
||||||
- Add `left4me_install_uv` action: downloads pinned uv 0.11.8 tarball
|
|
||||||
from github.com/astral-sh/uv/releases/, SHA256-verifies against the
|
|
||||||
official `.sha256` sibling, installs to `/usr/local/bin/{uv,uvx}`.
|
|
||||||
Idempotent via `unless: '/usr/local/bin/uv --version 2>/dev/null
|
|
||||||
| grep -qx "uv 0.11.8"'`. `needs: ['pkg_apt:curl']`,
|
|
||||||
`triggers: ['action:left4me_uv_sync']`, `triggered: False`,
|
|
||||||
`cascade_skip: False`.
|
|
||||||
- Add `left4me_uv_sync` action: `sudo -u left4me env
|
|
||||||
UV_PROJECT_ENVIRONMENT=/var/lib/left4me/.venv /usr/local/bin/uv
|
|
||||||
sync --frozen --project /opt/left4me/src`. `triggered: True`,
|
|
||||||
`cascade_skip: False`. `needs:` includes
|
|
||||||
`'git_deploy:/opt/left4me/src'`, `'action:left4me_install_uv'`,
|
|
||||||
`'directory:/var/lib/left4me'`, `'user:left4me'`. `triggers:
|
|
||||||
['action:left4me_alembic_upgrade']`.
|
|
||||||
- Update `git_deploy:/opt/left4me/src` triggers (lines 285–305):
|
|
||||||
replace `'action:left4me_pip_install'` with
|
|
||||||
`'action:left4me_uv_sync'`. Keep `left4me_alembic_upgrade` and
|
|
||||||
`left4me_daemon_reload` triggers.
|
|
||||||
- Update `left4me_alembic_upgrade` (lines 384–407): its dependency
|
|
||||||
on `left4me_pip_install` must now point at `left4me_uv_sync`.
|
|
||||||
3. Rewrite `bundles/left4me/README.md:84–90` to describe the new
|
|
||||||
`install_uv → uv_sync → alembic_upgrade → seed_overlays + restart`
|
|
||||||
chain (drop the pip + tempdir-dance prose).
|
|
||||||
4. `(cd ~/Projekte/ckn-bw && .venv/bin/bw test)` → must pass clean.
|
|
||||||
|
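For orientation, the two new actions described in item 2 might look roughly like this in `items.py`. This is a sketch assembled from the attributes listed in this plan and the shell steps of the Step 0 spike, not the committed bundle code. It assumes bundlewrap runs `command` as root through a POSIX shell (so no `sudo` is needed for `install`), and it copies the `unless` string verbatim without verifying that `uv --version` prints exactly that line on the host.

```python
UV_VERSION = "0.11.8"
UV_TARBALL = "uv-x86_64-unknown-linux-gnu.tar.gz"
UV_BASE_URL = f"https://github.com/astral-sh/uv/releases/download/{UV_VERSION}"

# Same download + verify + install sequence as the spike, joined into one script.
INSTALL_UV = "\n".join([
    "set -e",
    "tmp=$(mktemp -d)",
    "trap 'rm -rf \"$tmp\"' EXIT",
    f'curl -fsSL -o "$tmp/{UV_TARBALL}" {UV_BASE_URL}/{UV_TARBALL}',
    f'curl -fsSL -o "$tmp/{UV_TARBALL}.sha256" {UV_BASE_URL}/{UV_TARBALL}.sha256',
    f'(cd "$tmp" && sha256sum -c {UV_TARBALL}.sha256)',
    f'tar -xzf "$tmp/{UV_TARBALL}" -C "$tmp" --strip-components=1',
    'install -m 0755 "$tmp/uv" /usr/local/bin/uv',
    'install -m 0755 "$tmp/uvx" /usr/local/bin/uvx',
])

actions = {
    "left4me_install_uv": {
        "command": INSTALL_UV,
        # Skip the download when the pinned version is already installed.
        "unless": f'/usr/local/bin/uv --version 2>/dev/null | grep -qx "uv {UV_VERSION}"',
        "needs": ["pkg_apt:curl"],
        "triggers": ["action:left4me_uv_sync"],
        "triggered": False,
        "cascade_skip": False,
    },
    "left4me_uv_sync": {
        "command": (
            "sudo -u left4me env UV_PROJECT_ENVIRONMENT=/var/lib/left4me/.venv "
            "/usr/local/bin/uv sync --frozen --project /opt/left4me/src"
        ),
        "triggered": True,
        "cascade_skip": False,
        "needs": [
            "git_deploy:/opt/left4me/src",
            "action:left4me_install_uv",
            "directory:/var/lib/left4me",
            "user:left4me",
        ],
        "triggers": ["action:left4me_alembic_upgrade"],
    },
}
```

Keeping the version in one constant means a future uv bump touches a single line, and the `unless` guard then re-runs the install only on that change.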
|
||||||
### Step 4 — ckn-bw commit (DO NOT PUSH)
|
|
||||||
Single commit using the suggested message from the handoff
|
|
||||||
(§"Commit messages — ckn-bw side"). Do **not** `git push`. Per
|
|
||||||
verified state today, ckn-bw is currently EVEN with `origin/master`
|
|
||||||
(not 7 ahead as the original prompt claimed — the operator pushed
|
|
||||||
since the prompt was written). After this commit lands locally, the
|
|
||||||
repo will be 1 commit ahead of origin.
|
|
||||||
|
|
||||||
### Step 5 — Report to operator (handoff to user for deploy)
|
|
||||||
Agent's work ends here. Brief summary to the user including:
|
|
||||||
- Spike outcome (full uv-workspace path confirmed, or Medium-scope
|
|
||||||
fallback taken — including any handoff doc updates if the latter).
|
|
||||||
- What's committed and where it sits: left4me pushed to `origin/master`;
|
|
||||||
ckn-bw committed locally, now 1 commit ahead of origin (unpushed).
|
|
||||||
- The `bw apply ovh.left4me` invocation for the user to run, with the
|
|
||||||
expected output (left4me_install_uv runs the download+verify, three
|
|
||||||
old actions removed from the graph, two new actions present
|
|
||||||
(install_uv + uv_sync), alembic+seed+restart cascade fires).
|
|
||||||
- The 6-check verification matrix from handoff §"Verification
|
|
||||||
(end-to-end)" for the user to walk through after apply — with
|
|
||||||
check #1 amended: use
|
|
||||||
`sudo find /opt/left4me/src \( -name '*.egg-info' -o -name build
|
|
||||||
-o -name dist \) -newer <baseline>` instead of `git status`,
|
|
||||||
because git isn't installed on prod.
|
|
||||||
- Recovery path if uv refuses to adopt the existing venv: one-shot
|
|
||||||
`ssh ckn@left4.me 'sudo -u left4me rm -rf /var/lib/left4me/.venv'`,
|
|
||||||
then re-apply.
|
|
||||||
- Open follow-ups (uv version pinning policy — bump cadence, signing,
|
|
||||||
etc.; direnv `use uv` fallback applied or not; whether to add a
|
|
||||||
separate `pkg_apt: curl` if it wasn't already declared).
|
|
||||||
|
|
||||||
**Do NOT run `bw apply`, the verification matrix, or the gameserver
|
|
||||||
round-trip — those are explicitly user-side per session memory.**
|
|
||||||
|
|
||||||
## Plan storage after approval
|
|
||||||
|
|
||||||
Per the user's global AGENTS.md (`~/.claude/agents/AGENTS.md`): specs
|
|
||||||
and plans live in the repo they describe, typically under `docs/`. After
|
|
||||||
ExitPlanMode and approval, this plan should be copied to
|
|
||||||
`/Users/mwiegand/Projekte/left4me/docs/superpowers/plans/2026-05-15-uv-workspace-execution.md`
|
|
||||||
as a peer to the design handoff, then committed alongside the left4me
|
|
||||||
changes in Step 2.
|
|
||||||
|
|
||||||
## What does NOT change (out of scope)
|
|
||||||
|
|
||||||
- Source ownership: `/opt/left4me/src` stays root-owned.
|
|
||||||
- Venv location: `/var/lib/left4me/.venv` stays where it is, owned by
|
|
||||||
the `left4me` user, accessed via `UV_PROJECT_ENVIRONMENT`.
|
|
||||||
- Hardening drop-ins, sudoers, sysctl, helpers — all stable from the
|
|
||||||
deployment-responsibility migration.
|
|
||||||
- systemd unit shapes — reactor-emitted, unchanged.
|
|
||||||
- `alembic_upgrade` and `seed_overlays` shell bodies — same commands,
|
|
||||||
just triggered from `uv_sync` instead of `pip_install`.
|
|
||||||
- `pkg_apt: python3` and `python3-dev` — kept (uv shells out to system
|
|
||||||
Python).
|
|
||||||
- Other ckn-bw bundles — this is left4me-specific.
|
|
||||||
- The build-overlay-unit refactor — separate queued thread.
|
|
||||||
- CI — none currently exists.
|
|
||||||
|
|
||||||
## Risks (carried from handoff, sized empirically)
|
|
||||||
|
|
||||||
1. **Spike test failure** → fall back to Medium scope. Graceful.
|
|
||||||
2. ~~Lockfile format skew between dev and prod~~ → **MITIGATED** by
|
|
||||||
pinning prod uv to 0.11.8 (same as local brew). A lockfile generated
|
|
||||||
by dev's uv 0.11.8 is consumed by the same uv 0.11.8 on prod, so both
|
|
||||||
sides agree on the lockfile format. Risk effectively eliminated unless dev's brew bumps uv
|
|
||||||
independently — track this in the pinning-policy follow-up.
|
|
||||||
3. **direnv `use uv` availability** → local direnv is 2.37.1 (`use uv`
|
|
||||||
added in 2.34+, so we're fine). Fallback snippet documented in case
|
|
||||||
another dev has an older direnv.
|
|
||||||
4. **`alembic`/`flask` binary paths** → uv installs the same
|
|
||||||
`console_scripts` entrypoints as pip, so paths under
|
|
||||||
`/var/lib/left4me/.venv/bin/` are identical. Verify in verification
|
|
||||||
matrix.
|
|
||||||
5. **`--force-reinstall` semantics** → no longer needed; `uv sync` is
|
|
||||||
lockfile-aware, not package-version-aware.
|
|
||||||
6. **uv release artifact availability** → if github.com/astral-sh/uv
|
|
||||||
takes down release 0.11.8 (extremely unlikely but theoretically
|
|
||||||
possible), the install action would fail. Mitigation: pin a recent
|
|
||||||
stable release, monitor astral's deprecation cadence; if needed,
|
|
||||||
mirror the artifact to an internal location for future-proofing
|
|
||||||
(out of scope for this migration).
|
|
||||||
7. **SHA256 of the tarball** → we trust the `.sha256` sibling fetched
|
|
||||||
from the same github release. A future hardening pass could embed
|
|
||||||
the checksum in the bundle source for offline verification, but the
|
|
||||||
current trust model matches steamcmd's (also github-sourced).
|
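The hardening pass mentioned in risk 7 (a checksum embedded in the bundle source) could be sketched as below. The function names are hypothetical, and the pinned digest is deliberately not shown: it would have to be copied from the release's `.sha256` file at pin time, not guessed.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned(path, expected_hex):
    """Compare the file's digest against a checksum pinned in source control."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


# Hypothetical usage, with PINNED_SHA256 stored next to the version pin:
#   if not matches_pinned("/tmp/uv.tar.gz", PINNED_SHA256):
#       raise SystemExit("uv tarball checksum mismatch")
```

With the digest pinned in the repo, a tampered release asset and a tampered `.sha256` sibling would no longer pass together.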
|
||||||
File diff suppressed because it is too large
|
|
@ -1,747 +0,0 @@
|
||||||
# timeago Shared Display Implementation Plan
|
|
||||||
|
|
||||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
|
||||||
|
|
||||||
**Goal:** Unify all user-facing datetime rendering in `l4d2web` behind a single `timeago` Jinja filter that returns a `<time>` element with a relative label and a precise UTC tooltip.
|
|
||||||
|
|
||||||
**Architecture:** Two callables in `l4d2web/l4d2web/services/timeago.py` — `humanize_delta` (pure text, source of truth for the relative-label ladder) and `format_time_html` (wraps the text in a `<time>` element). The latter is registered as Jinja filter `timeago` in the Flask app factory. Templates and routes migrate from raw datetime repr and bespoke inline math to `{{ ts | timeago }}`.
|
|
||||||
|
|
||||||
**Tech Stack:** Python 3.13, Flask, Jinja2, `markupsafe.Markup`, pytest.
|
|
||||||
|
|
||||||
**Reference spec:** `docs/superpowers/specs/2026-05-16-timeago-shared-display-design.md`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## File Structure
|
|
||||||
|
|
||||||
| File | Action | Responsibility |
|
|
||||||
|---|---|---|
|
|
||||||
| `l4d2web/l4d2web/services/timeago.py` | Rewrite | `humanize_delta` (new symmetric ladder) + new `format_time_html` |
|
|
||||||
| `l4d2web/l4d2web/app.py` | Modify | Register `timeago` filter in `create_app` |
|
|
||||||
| `l4d2web/tests/test_timeago.py` | Create | Unit tests for both helpers + Flask smoke test |
|
|
||||||
| `l4d2web/l4d2web/templates/admin_users.html` | Modify | Use filter for `created_at` / `updated_at` |
|
|
||||||
| `l4d2web/l4d2web/templates/blueprints.html` | Modify | Use filter for `created_at` / `updated_at` |
|
|
||||||
| `l4d2web/l4d2web/templates/_job_table.html` | Modify | Use filter for `created_at` / `finished_at` (with None guard) |
|
|
||||||
| `l4d2web/l4d2web/templates/job_detail.html` | Modify | Use filter for `created_at` / `started_at` / `finished_at` |
|
|
||||||
| `l4d2web/l4d2web/templates/_live_state.html` | Modify | Replace inline `(now - x).total_seconds()` with filter |
|
|
||||||
| `l4d2web/l4d2web/templates/_server_actions.html` | Modify | Switch from `latest_job_when` (string) to `latest_job_at \| timeago` |
|
|
||||||
| `l4d2web/l4d2web/templates/_overlay_build_status.html` | Modify | Switch from `latest_build_when` to `latest_build_at \| timeago` |
|
|
||||||
| `l4d2web/l4d2web/routes/page_routes.py` | Modify | Drop `humanize_delta` imports; pass raw datetime as `latest_job_at` / `latest_build_at` |
|
|
||||||
| `l4d2web/l4d2web/routes/server_routes.py` | Modify | Remove now-dead `now=` kwarg from `_live_state.html` render call |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 1: Rewrite `humanize_delta` with the new ladder
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/l4d2web/services/timeago.py`
|
|
||||||
- Create: `l4d2web/tests/test_timeago.py`
|
|
||||||
|
|
||||||
The current ladder uses `just now` under 45s and clamps future deltas. The new ladder is symmetric, has second precision, and uses day-month (with year if different) for ≥7 days. Spec table in section "Ladder (long form, symmetric for past and future)".
|
|
||||||
|
|
||||||
- [ ] **Step 1: Create the test file with parameterised boundary tests**
|
|
||||||
|
|
||||||
Create `l4d2web/tests/test_timeago.py` with:
|
|
||||||
|
|
||||||
```python
from datetime import UTC, datetime, timedelta

import pytest

from l4d2web.services.timeago import humanize_delta


NOW = datetime(2026, 5, 16, 12, 0, 0, tzinfo=UTC)


@pytest.mark.parametrize(
    ("delta", "expected"),
    [
        # zero
        (timedelta(0), "now"),
        # past, seconds
        (timedelta(seconds=1), "1 second ago"),
        (timedelta(seconds=2), "2 seconds ago"),
        (timedelta(seconds=59), "59 seconds ago"),
        # past, minutes
        (timedelta(seconds=60), "1 minute ago"),
        (timedelta(minutes=1), "1 minute ago"),
        (timedelta(minutes=2), "2 minutes ago"),
        (timedelta(minutes=59), "59 minutes ago"),
        # past, hours
        (timedelta(minutes=60), "1 hour ago"),
        (timedelta(hours=1), "1 hour ago"),
        (timedelta(hours=2), "2 hours ago"),
        (timedelta(hours=23), "23 hours ago"),
        # past, days
        (timedelta(hours=24), "1 day ago"),
        (timedelta(days=1), "1 day ago"),
        (timedelta(days=2), "2 days ago"),
        (timedelta(days=6), "6 days ago"),
        # past, date fallback same year (now = 16 May 2026)
        (timedelta(days=7), "9 May"),
        (timedelta(days=30), "16 Apr"),
        (timedelta(days=120), "16 Jan"),
        # past, date fallback different year
        (timedelta(days=365), "16 May 2025"),
        (timedelta(days=400), "11 Apr 2025"),
    ],
)
def test_humanize_delta_past(delta, expected):
    then = NOW - delta
    assert humanize_delta(then, now=NOW) == expected


@pytest.mark.parametrize(
    ("delta", "expected"),
    [
        # future, seconds
        (timedelta(seconds=1), "in 1 second"),
        (timedelta(seconds=2), "in 2 seconds"),
        (timedelta(seconds=59), "in 59 seconds"),
        # future, minutes
        (timedelta(seconds=60), "in 1 minute"),
        (timedelta(minutes=2), "in 2 minutes"),
        (timedelta(minutes=59), "in 59 minutes"),
        # future, hours
        (timedelta(hours=1), "in 1 hour"),
        (timedelta(hours=23), "in 23 hours"),
        # future, days
        (timedelta(days=1), "in 1 day"),
        (timedelta(days=6), "in 6 days"),
        # future, date fallback same year
        (timedelta(days=7), "23 May"),
        (timedelta(days=30), "15 Jun"),
        # future, date fallback different year
        (timedelta(days=365), "16 May 2027"),
    ],
)
def test_humanize_delta_future(delta, expected):
    then = NOW + delta
    assert humanize_delta(then, now=NOW) == expected


def test_humanize_delta_accepts_naive_input_as_utc():
    then_naive = (NOW - timedelta(minutes=5)).replace(tzinfo=None)
    assert humanize_delta(then_naive, now=NOW) == "5 minutes ago"


def test_humanize_delta_accepts_naive_now_as_utc():
    then = NOW - timedelta(minutes=5)
    now_naive = NOW.replace(tzinfo=None)
    assert humanize_delta(then, now=now_naive) == "5 minutes ago"


def test_humanize_delta_default_now_is_datetime_now_utc():
    then = datetime.now(UTC) - timedelta(seconds=3)
    assert humanize_delta(then) in {"3 seconds ago", "2 seconds ago", "4 seconds ago"}


def test_humanize_delta_year_boundary_includes_year_when_years_differ():
    now = datetime(2026, 1, 2, 12, 0, 0, tzinfo=UTC)
    # 8 days back: past the 7-day threshold, so the date fallback applies,
    # and the date lands in the previous calendar year.
    then = datetime(2025, 12, 25, 12, 0, 0, tzinfo=UTC)
    assert humanize_delta(then, now=now) == "25 Dec 2025"
```
|
|
||||||
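The two naive-input tests above matter because Python refuses to subtract a naive `datetime` from an aware one. A minimal sketch of that failure mode and the normalisation that avoids it; `ensure_utc` here is a hypothetical stand-in written for this example, and `timezone.utc` is used so the snippet also runs on interpreters older than 3.11:

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc  # same object as datetime.UTC on Python 3.11+


def ensure_utc(dt: datetime) -> datetime:
    """Hypothetical stand-in: treat a naive datetime as already being in UTC."""
    return dt.replace(tzinfo=UTC) if dt.tzinfo is None else dt


aware_now = datetime(2026, 5, 16, 12, 0, 0, tzinfo=UTC)
naive_then = datetime(2026, 5, 16, 11, 55, 0)  # e.g. a value stored without tzinfo

# Mixing aware and naive operands raises TypeError.
try:
    aware_now - naive_then
    mixed_subtraction_ok = True
except TypeError:
    mixed_subtraction_ok = False

# After normalisation the subtraction is well defined.
delta = aware_now - ensure_utc(naive_then)
```

Without this step, any caller that passes a naive timestamp would crash the helper instead of getting a label.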
|
|
||||||
- [ ] **Step 2: Run the new tests to verify they fail against the current implementation**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests/test_timeago.py -v`
|
|
||||||
Expected: most past tests FAIL (the current implementation returns `just now` under 45s and has no singular `1 second ago`); all future tests FAIL (the current implementation clamps to 0 → `just now`); date-fallback tests FAIL (the current implementation returns an ISO date such as `2026-05-09`, not `9 May`).
|
|
||||||
|
|
||||||
- [ ] **Step 3: Rewrite `humanize_delta` to satisfy the tests**
|
|
||||||
|
|
||||||
Replace the entire contents of `l4d2web/l4d2web/services/timeago.py` with:
|
|
||||||
|
|
||||||
```python
from datetime import UTC, datetime


_MONTHS = (
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
)


def _ensure_utc(dt: datetime) -> datetime:
    # Naive datetimes are assumed to already be in UTC; aware ones pass through.
    if dt.tzinfo is None:
        return dt.replace(tzinfo=UTC)
    return dt


def _format_date(then: datetime, now: datetime) -> str:
    # Day-month form; the year is appended only when it differs from now's.
    month = _MONTHS[then.month - 1]
    if then.year == now.year:
        return f"{then.day} {month}"
    return f"{then.day} {month} {then.year}"


def _relative_label(seconds: int, past: bool) -> str:
    # Pick the largest unit that fits, then pluralise and add the direction.
    if seconds < 60:
        unit, n = "second", seconds
    elif seconds < 3600:
        unit, n = "minute", seconds // 60
    elif seconds < 86400:
        unit, n = "hour", seconds // 3600
    else:
        unit, n = "day", seconds // 86400
    plural = "" if n == 1 else "s"
    if past:
        return f"{n} {unit}{plural} ago"
    return f"in {n} {unit}{plural}"


def humanize_delta(then: datetime, now: datetime | None = None) -> str:
    if now is None:
        now = datetime.now(UTC)
    then = _ensure_utc(then)
    now = _ensure_utc(now)

    # int() truncates toward zero, which keeps past and future symmetric.
    delta_seconds = int((now - then).total_seconds())
    abs_seconds = abs(delta_seconds)

    if abs_seconds == 0:
        return "now"

    if abs_seconds >= 7 * 86400:
        return _format_date(then, now)

    return _relative_label(abs_seconds, past=(delta_seconds > 0))
```
|
|
||||||
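The symmetry of the ladder depends on `int()` truncating toward zero rather than flooring. A small sketch of the difference, using helper names (`truncated`, `floored`) that are invented for this illustration and do not appear in the plan:

```python
from datetime import timedelta


def truncated(delta: timedelta) -> int:
    """Whole seconds, rounded toward zero (what int() does)."""
    return int(delta.total_seconds())


def floored(delta: timedelta) -> int:
    """Whole seconds, rounded toward negative infinity (what // does)."""
    return int(delta.total_seconds() // 1)


past = timedelta(seconds=1.5)     # now - then for an event 1.5 s in the past
future = timedelta(seconds=-1.5)  # now - then for an event 1.5 s in the future

# Truncation gives the same magnitude in both directions.
symmetric = (abs(truncated(past)), abs(truncated(future)))

# Flooring would report the future event as one second further away.
asymmetric = (abs(floored(past)), abs(floored(future)))
```

With flooring, a timestamp 1.5 seconds ahead would read `in 2 seconds` while the mirrored past value reads `1 second ago`.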
|
|
||||||
- [ ] **Step 4: Run the tests to verify they pass**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests/test_timeago.py -v`
|
|
||||||
Expected: all tests PASS.
|
|
||||||
|
|
||||||
- [ ] **Step 5: Run the full test suite to check for regressions in callers of `humanize_delta`**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests -q`
|
|
||||||
Expected: all tests pass. If any pre-existing test asserts on the legacy "just now" / 7-day ISO fallback strings via `latest_job_when` rendering, update those assertions to match the new format (e.g. "1 second ago", "9 May"). Note in the commit message which tests were updated and why.
|
|
||||||
|
|
||||||
- [ ] **Step 6: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/l4d2web/services/timeago.py l4d2web/tests/test_timeago.py
|
|
||||||
git commit -m "feat(timeago): symmetric ladder with second precision and date fallback
|
|
||||||
|
|
||||||
Rewrite humanize_delta as a symmetric past/future ladder with
|
|
||||||
sub-minute precision. Replace the bare ISO date fallback after 7 days
|
|
||||||
with a day-month form (year suppressed when same as now). Refs spec
|
|
||||||
docs/superpowers/specs/2026-05-16-timeago-shared-display-design.md."
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 2: Add `format_time_html` returning a `<time>` element
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/l4d2web/services/timeago.py`
|
|
||||||
- Modify: `l4d2web/tests/test_timeago.py`
|
|
||||||
|
|
||||||
- [ ] **Step 1: Append tests for `format_time_html` to the test file**
|
|
||||||
|
|
||||||
Append to `l4d2web/tests/test_timeago.py`:
|
|
||||||
|
|
||||||
```python
from markupsafe import Markup

from l4d2web.services.timeago import format_time_html


def test_format_time_html_returns_markup():
    then = NOW - timedelta(minutes=5)
    out = format_time_html(then, now=NOW)
    assert isinstance(out, Markup)


def test_format_time_html_contains_time_element_with_attrs():
    then = datetime(2026, 5, 16, 14, 32, 11, tzinfo=UTC)
    now = then + timedelta(minutes=5)
    out = str(format_time_html(then, now=now))
    assert out.startswith("<time ")
    assert out.endswith("</time>")
    assert 'datetime="2026-05-16T14:32:11+00:00"' in out
    assert 'title="2026-05-16 14:32:11 UTC"' in out
    assert ">5 minutes ago<" in out


def test_format_time_html_label_matches_humanize_delta():
    then = NOW - timedelta(hours=2)
    label = humanize_delta(then, now=NOW)
    out = str(format_time_html(then, now=NOW))
    assert f">{label}<" in out


def test_format_time_html_normalises_naive_input_to_utc():
    then_naive = datetime(2026, 5, 16, 14, 32, 11)
    now = datetime(2026, 5, 16, 14, 37, 11, tzinfo=UTC)
    out = str(format_time_html(then_naive, now=now))
    assert 'datetime="2026-05-16T14:32:11+00:00"' in out
    assert 'title="2026-05-16 14:32:11 UTC"' in out
```
|
|
||||||
|
|
||||||
- [ ] **Step 2: Run the new tests to verify they fail**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests/test_timeago.py -v -k format_time_html`
|
|
||||||
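Expected: pytest reports a collection ERROR with `ImportError: cannot import name 'format_time_html'`. The import sits at module level, so no test in the file runs until Step 3 lands.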
|
|
||||||
|
|
||||||
- [ ] **Step 3: Implement `format_time_html` in `timeago.py`**
|
|
||||||
|
|
||||||
Append to `l4d2web/l4d2web/services/timeago.py`:
|
|
||||||
|
|
||||||
```python
from markupsafe import Markup, escape


def format_time_html(then: datetime, now: datetime | None = None) -> Markup:
    if now is None:
        now = datetime.now(UTC)
    then_utc = _ensure_utc(then).astimezone(UTC)
    now = _ensure_utc(now)

    label = humanize_delta(then_utc, now=now)
    iso = then_utc.isoformat()
    title = then_utc.strftime("%Y-%m-%d %H:%M:%S UTC")
    return Markup(
        f'<time datetime="{escape(iso)}" title="{escape(title)}">'
        f"{escape(label)}</time>"
    )
```
|
|
||||||
|
|
||||||
Note: place the `from markupsafe import Markup, escape` import at the top of the file alongside the existing `from datetime import ...` line — don't leave it inline as written above.
|
|
||||||
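To make the attribute values concrete, the sketch below shows what the standard-library calls used for `datetime=` and `title=` produce, next to the bare `str()` form that templates render today. The `cest` offset is an arbitrary example value, and `timezone.utc` stands in for `datetime.UTC` so the snippet runs on older interpreters too:

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc  # same object as datetime.UTC on Python 3.11+

then = datetime(2026, 5, 16, 14, 32, 11, tzinfo=UTC)

raw = str(then)                                  # what {{ then }} renders without the filter
iso = then.isoformat()                           # value for the datetime= attribute
title = then.strftime("%Y-%m-%d %H:%M:%S UTC")   # value for the title= tooltip

# An aware, non-UTC input is converted before formatting, so both
# attributes always describe the same instant in UTC.
cest = timezone(timedelta(hours=2))  # illustrative offset only
local = datetime(2026, 5, 16, 16, 32, 11, tzinfo=cest)
normalised = local.astimezone(UTC).isoformat()
```

Here `normalised` equals `iso`, which is the reason for the `.astimezone(UTC)` call in `format_time_html`: the hard-coded `UTC` suffix in the tooltip is only truthful once the value has been converted.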
|
|
||||||
- [ ] **Step 4: Run the tests to verify they pass**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests/test_timeago.py -v`
|
|
||||||
Expected: all tests PASS.
|
|
||||||
|
|
||||||
- [ ] **Step 5: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/l4d2web/services/timeago.py l4d2web/tests/test_timeago.py
|
|
||||||
git commit -m "feat(timeago): add format_time_html returning a <time> element
|
|
||||||
|
|
||||||
Wrap humanize_delta in an HTML <time> element with datetime= and
|
|
||||||
title= attributes carrying the precise UTC value, so hovering surfaces
|
|
||||||
the exact timestamp regardless of the relative label."
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 3: Register `timeago` Jinja filter in the Flask app factory
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/l4d2web/app.py:37-58`
|
|
||||||
- Modify: `l4d2web/tests/test_timeago.py`
|
|
||||||
|
|
||||||
- [ ] **Step 1: Add a Flask smoke test for the filter**
|
|
||||||
|
|
||||||
There is no shared `app` fixture in this codebase — each test instantiates `create_app` directly (see `l4d2web/tests/test_health.py` for the minimal pattern). Append to `l4d2web/tests/test_timeago.py`:
|
|
||||||
|
|
||||||
```python
from flask import render_template_string

from l4d2web.app import create_app


def test_timeago_filter_registered_on_app():
    app = create_app({"TESTING": True, "SECRET_KEY": "test"})
    with app.app_context():
        rendered = render_template_string(
            "{{ ts | timeago }}",
            ts=datetime.now(UTC) - timedelta(minutes=3),
        )
    assert "<time " in rendered
    # The filter returns Markup, so autoescaping must not turn the tag into entities.
    assert "&lt;time" not in rendered
    assert "3 minutes ago" in rendered
```
|
|
||||||
|
|
||||||
- [ ] **Step 2: Run the smoke test to verify it fails**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests/test_timeago.py::test_timeago_filter_registered_on_app -v`
|
|
||||||
Expected: FAIL with a Jinja `TemplateAssertionError: No filter named 'timeago'.` (a subclass of `TemplateSyntaxError`), confirming the filter is not yet registered.
|
|
||||||
|
|
||||||
- [ ] **Step 3: Register the filter in `create_app`**
|
|
||||||
|
|
||||||
In `l4d2web/l4d2web/app.py`:
|
|
||||||
|
|
||||||
Add the import near the other `from l4d2web...` imports at the top:
|
|
||||||
|
|
||||||
```python
|
|
||||||
from l4d2web.services.timeago import format_time_html
|
|
||||||
```
|
|
||||||
|
|
||||||
Inside `create_app`, register the filter immediately after `init_db()` runs and before the `@app.before_request` definitions. Add a single line:
|
|
||||||
|
|
||||||
```python
|
|
||||||
app.add_template_filter(format_time_html, "timeago")
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 4: Run the smoke test to verify it passes**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests/test_timeago.py::test_timeago_filter_registered_on_app -v`
|
|
||||||
Expected: PASS.
|
|
||||||
|
|
||||||
- [ ] **Step 5: Run the full test suite to confirm nothing else broke**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests -q`
|
|
||||||
Expected: all tests pass.
|
|
||||||
|
|
||||||
- [ ] **Step 6: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/l4d2web/app.py l4d2web/tests/test_timeago.py
|
|
||||||
git commit -m "feat(app): register timeago Jinja filter
|
|
||||||
|
|
||||||
Templates can now call {{ ts | timeago }} directly without route-side
|
|
||||||
precomputation."
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 4: Migrate `admin_users.html` and `blueprints.html`
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/l4d2web/templates/admin_users.html:25-26`
|
|
||||||
- Modify: `l4d2web/l4d2web/templates/blueprints.html:17-18`
|
|
||||||
|
|
||||||
Both templates render `created_at` / `updated_at` as the raw `str()` of a Python `datetime`. No `None` guard is needed: these columns are `nullable=False` in `models.py`.
|
|
||||||
|
|
||||||
- [ ] **Step 1: Modify `admin_users.html`**
|
|
||||||
|
|
||||||
Replace lines 25-26 of `l4d2web/l4d2web/templates/admin_users.html`:
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
<td>{{ user.created_at }}</td>
|
|
||||||
<td>{{ user.updated_at }}</td>
|
|
||||||
```
|
|
||||||
|
|
||||||
with:
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
<td>{{ user.created_at | timeago }}</td>
|
|
||||||
<td>{{ user.updated_at | timeago }}</td>
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 2: Modify `blueprints.html`**
|
|
||||||
|
|
||||||
Replace lines 17-18 of `l4d2web/l4d2web/templates/blueprints.html`:
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
<td>{{ blueprint.created_at }}</td>
|
|
||||||
<td>{{ blueprint.updated_at }}</td>
|
|
||||||
```
|
|
||||||
|
|
||||||
with:
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
<td>{{ blueprint.created_at | timeago }}</td>
|
|
||||||
<td>{{ blueprint.updated_at | timeago }}</td>
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 3: Run the existing tests for these pages**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests/test_admin_users.py l4d2web/tests/test_blueprints.py -q`
|
|
||||||
Expected: all tests pass. If a test asserts on the raw datetime string in the rendered HTML, update it to assert the presence of `<time ` for the same row instead.
|
|
||||||
|
|
||||||
- [ ] **Step 4: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/l4d2web/templates/admin_users.html l4d2web/l4d2web/templates/blueprints.html
|
|
||||||
git commit -m "refactor(templates): use timeago filter for admin/blueprint timestamps"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 5: Migrate `_job_table.html` and `job_detail.html` (with `None` guards)
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/l4d2web/templates/_job_table.html:22-23`
|
|
||||||
- Modify: `l4d2web/l4d2web/templates/job_detail.html:24-26`
|
|
||||||
|
|
||||||
In `models.py`, `Job.started_at` and `Job.finished_at` are nullable; `Job.created_at` is not. Preserve the existing `-` placeholder for nullable columns.
|
|
||||||
|
|
||||||
- [ ] **Step 1: Modify `_job_table.html`**
|
|
||||||
|
|
||||||
Replace lines 22-23 of `l4d2web/l4d2web/templates/_job_table.html`:
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
<td>{{ job.created_at }}</td>
|
|
||||||
<td>{{ job.finished_at or "-" }}</td>
|
|
||||||
```
|
|
||||||
|
|
||||||
with:
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
<td>{{ job.created_at | timeago }}</td>
|
|
||||||
<td>{% if job.finished_at %}{{ job.finished_at | timeago }}{% else %}-{% endif %}</td>
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 2: Modify `job_detail.html`**
|
|
||||||
|
|
||||||
Replace lines 24-26 of `l4d2web/l4d2web/templates/job_detail.html`:
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
<tr><th>Created</th><td>{{ job.created_at }}</td></tr>
|
|
||||||
<tr><th>Started</th><td>{{ job.started_at or "-" }}</td></tr>
|
|
||||||
<tr><th>Finished</th><td>{{ job.finished_at or "-" }}</td></tr>
|
|
||||||
```
|
|
||||||
|
|
||||||
with:
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
<tr><th>Created</th><td>{{ job.created_at | timeago }}</td></tr>
|
|
||||||
<tr><th>Started</th><td>{% if job.started_at %}{{ job.started_at | timeago }}{% else %}-{% endif %}</td></tr>
|
|
||||||
<tr><th>Finished</th><td>{% if job.finished_at %}{{ job.finished_at | timeago }}{% else %}-{% endif %}</td></tr>
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 3: Run the job-related tests**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests/test_job_logs.py l4d2web/tests/test_pages.py -q`
|
|
||||||
Expected: all tests pass. Update assertions that pin raw-datetime substrings to instead assert `<time `; the `-` placeholder for nullable fields must still render in the absence of `started_at` / `finished_at`.
|
|
||||||
|
|
||||||
- [ ] **Step 4: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/l4d2web/templates/_job_table.html l4d2web/l4d2web/templates/job_detail.html
|
|
||||||
git commit -m "refactor(templates): use timeago filter for job timestamps
|
|
||||||
|
|
||||||
Preserves the existing '-' placeholder for nullable started_at /
|
|
||||||
finished_at columns."
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 6: Migrate `_live_state.html`
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/l4d2web/templates/_live_state.html:9-11, 30-33, 53-56`
|
|
||||||
|
|
||||||
Three call sites; all use bespoke `(now - x).total_seconds() // …` math. Replace with the filter. The `now` template variable becomes unused inside this file after the rewrite.
|
|
||||||
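For context on why the bespoke math is being replaced, here is a sketch of what the inline minute expression evaluates to at the edges. `legacy_minutes_label` is a hypothetical helper that mirrors the template expression in Python; it is not code from the repository:

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc  # same object as datetime.UTC on Python 3.11+


def legacy_minutes_label(then: datetime, now: datetime) -> str:
    """Python mirror of `((now - x).total_seconds() // 60) | int` followed by 'm ago'."""
    return f"{int((now - then).total_seconds() // 60)}m ago"


NOW = datetime(2026, 5, 16, 12, 0, 0, tzinfo=UTC)

within_first_minute = legacy_minutes_label(NOW - timedelta(seconds=30), NOW)
three_hours_back = legacy_minutes_label(NOW - timedelta(hours=3), NOW)
slightly_in_future = legacy_minutes_label(NOW + timedelta(seconds=30), NOW)
```

The first value is the `0m ago` case, the second never rolls over into hours, and the third shows floor division turning a small clock skew into a negative minute count. The shared filter handles all three through one ladder.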
|
|
||||||
- [ ] **Step 1: Replace the `polled Ns ago` line (lines 9-11)**
|
|
||||||
|
|
||||||
In `l4d2web/l4d2web/templates/_live_state.html`, find:
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
<small class="muted">
|
|
||||||
polled {{ ((now - snapshot.last_seen_at).total_seconds() | int) }}s ago
|
|
||||||
</small>
|
|
||||||
```
|
|
||||||
|
|
||||||
Replace with:
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
<small class="muted">
|
|
||||||
polled {{ snapshot.last_seen_at | timeago }}
|
|
||||||
</small>
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 2: Replace the `joined Nm ago` line (line 31)**
|
|
||||||
|
|
||||||
Find:
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
<span class="meta">
|
|
||||||
joined {{ ((now - session.joined_at).total_seconds() // 60) | int }}m ago
|
|
||||||
· ping {{ session.min_ping }}-{{ session.max_ping }}ms
|
|
||||||
</span>
|
|
||||||
```
|
|
||||||
|
|
||||||
Replace with:
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
<span class="meta">
|
|
||||||
joined {{ session.joined_at | timeago }}
|
|
||||||
· ping {{ session.min_ping }}-{{ session.max_ping }}ms
|
|
||||||
</span>
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 3: Replace the `last seen Nm ago` line (line 55)**
|
|
||||||
|
|
||||||
Find:
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
<span class="meta">
|
|
||||||
last seen {{ ((now - row.last_seen).total_seconds() // 60) | int }}m ago
|
|
||||||
</span>
|
|
||||||
```
|
|
||||||
|
|
||||||
Replace with:
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
<span class="meta">
|
|
||||||
last seen {{ row.last_seen | timeago }}
|
|
||||||
</span>
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 4: Run the live-state tests**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests/test_servers.py -q`
|
|
||||||
Expected: tests pass. The two tests `test_servers_index_renders_live_state_badge` and `test_live_state_fragment_renders_current_and_recent` (`test_servers.py:449, 513`) render this fragment. If they assert on `Nm ago` substrings, replace those assertions with checks for `<time ` or for the new long-form output (e.g. `joined 5 minutes ago`).
|
|
||||||
|
|
||||||
- [ ] **Step 5: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/l4d2web/templates/_live_state.html
|
|
||||||
git commit -m "refactor(templates): use timeago filter in _live_state.html
|
|
||||||
|
|
||||||
Replaces three bespoke (now - x).total_seconds() expressions with the
|
|
||||||
shared filter, unifying vocabulary (no more '0m ago' inside the first
|
|
||||||
minute) and adding the UTC tooltip."
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 7: Migrate `_server_actions.html` + `_overlay_build_status.html` + `page_routes.py`
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/l4d2web/routes/page_routes.py:240-305, 442-484`
|
|
||||||
- Modify: `l4d2web/l4d2web/templates/_server_actions.html:25-32`
|
|
||||||
- Modify: `l4d2web/l4d2web/templates/_overlay_build_status.html:7-14`
|
|
||||||
|
|
||||||
The two route helpers currently precompute a string via `humanize_delta`. Replace with raw `datetime` passed under a new key, let the template apply the filter.
|
|
||||||
|
|
||||||
- [ ] **Step 1: Update `_build_server_actions_context` in `page_routes.py`**
|
|
||||||
|
|
||||||
In `l4d2web/l4d2web/routes/page_routes.py`, replace the block at lines 240-305 (function body of `_build_server_actions_context`) so that:
|
|
||||||
|
|
||||||
- Line 240 — remove `from l4d2web.services.timeago import humanize_delta`.
|
|
||||||
- Line 284 — rename `latest_job_when: str | None = None` to `latest_job_at: datetime | None = None`.
|
|
||||||
- Line 294 — replace `latest_job_when = humanize_delta(ref_time)` with `latest_job_at = ref_time`.
|
|
||||||
- Line 303 — update the returned dict key from `"latest_job_when": latest_job_when` to `"latest_job_at": latest_job_at`.
|
|
||||||
|
|
||||||
`datetime` is already imported at `page_routes.py:2` (`from datetime import UTC, datetime, timedelta`) — no import change needed.
|
|
||||||
|
|
||||||
- [ ] **Step 2: Update `_build_overlay_build_status_context` in `page_routes.py`**
|
|
||||||
|
|
||||||
In the same file, replace the block at lines 442-484 (function body of `_build_overlay_build_status_context`) so that:
|
|
||||||
|
|
||||||
- Line 443 — remove `from l4d2web.services.timeago import humanize_delta`.
|
|
||||||
- Line 467 — rename `latest_build_when: str | None = None` to `latest_build_at: datetime | None = None`.
|
|
||||||
- Line 475 — replace `latest_build_when = humanize_delta(ref_time)` with `latest_build_at = ref_time`.
|
|
||||||
- Line 481 — update the returned dict key from `"latest_build_when": latest_build_when` to `"latest_build_at": latest_build_at`.
|
|
||||||
|
|
||||||
- [ ] **Step 3: Update `_server_actions.html`**
|
|
||||||
|
|
||||||
In `l4d2web/l4d2web/templates/_server_actions.html`, line 29, replace:
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
{{ latest_job_when }}
|
|
||||||
```
|
|
||||||
|
|
||||||
with:
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
{{ latest_job_at | timeago }}
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 4: Update `_overlay_build_status.html`**
|
|
||||||
|
|
||||||
In `l4d2web/l4d2web/templates/_overlay_build_status.html`, line 11, replace:
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
{{ latest_build_when }}
|
|
||||||
```
|
|
||||||
|
|
||||||
with:
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
{{ latest_build_at | timeago }}
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 5: Run the test suite to catch context-key mismatches**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests -q`
|
|
||||||
Expected: tests pass. The most likely failure point is tests that check the rendered server actions fragment (`test_servers.py`) or overlay build status fragment. If any test asserts the old `latest_job_when` string output, update it to look for `<time ` or the new long-form output (e.g. `12 minutes ago`).
|
|
||||||
|
|
||||||
- [ ] **Step 6: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/l4d2web/routes/page_routes.py l4d2web/l4d2web/templates/_server_actions.html l4d2web/l4d2web/templates/_overlay_build_status.html
|
|
||||||
git commit -m "refactor(page_routes): pass datetime to templates for timeago filter
|
|
||||||
|
|
||||||
Drop the inline humanize_delta imports and string-precomputation; pass
|
|
||||||
the raw datetime as latest_job_at / latest_build_at and let the
|
|
||||||
template apply the timeago filter. One fewer code path computing
|
|
||||||
relative-time strings."
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 8: Drop the dead `now=` kwarg from `_live_state.html` render call
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/l4d2web/routes/server_routes.py:266-275`
|
|
||||||
|
|
||||||
After Task 6, `_live_state.html` no longer reads `now`. Remove the kwarg from the only `render_template` call that passes it.
|
|
||||||
|
|
||||||
- [ ] **Step 1: Confirm no other template uses the `now` context variable**
|
|
||||||
|
|
||||||
Run: `grep -rn "\\bnow\\b" l4d2web/l4d2web/templates/`
|
|
||||||
Inspect the output. Expected: no remaining `(now - …)` or bare `{{ now }}` references in any template. A hit means some template still reads the `now` context variable; migrate it before removing the kwarg.
|
|
||||||
|
|
||||||
- [ ] **Step 2: Remove the `now=` kwarg**
|
|
||||||
|
|
||||||
In `l4d2web/l4d2web/routes/server_routes.py`, at line 273 inside the `render_template("_live_state.html", …)` call, remove the line:
|
|
||||||
|
|
||||||
```python
|
|
||||||
now=datetime.now(UTC).replace(tzinfo=None),
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 3: Check whether `datetime` and `UTC` are still used in the file**
|
|
||||||
|
|
||||||
If lines 210 and 234 still reference `datetime.now(UTC).replace(tzinfo=None)` (for the `cutoff` and `recent_cutoff` variables), the imports stay. Don't remove them speculatively.
|
|
||||||
|
|
||||||
- [ ] **Step 4: Run the test suite**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests -q`
|
|
||||||
Expected: all tests pass. If a test passes a fake `now` into the live-state context expecting it to be respected, that test relied on dead code and should be updated to assert against `<time ` output relative to a real `datetime.now(UTC)` reference.
|
|
||||||
|
|
||||||
- [ ] **Step 5: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/l4d2web/routes/server_routes.py
|
|
||||||
git commit -m "refactor(server_routes): drop unused 'now' kwarg from _live_state render
|
|
||||||
|
|
||||||
After the timeago migration, the live-state template no longer reads
|
|
||||||
'now' — it computes relative labels through the filter, which derives
|
|
||||||
its own reference time."
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 9: End-to-end verification
|
|
||||||
|
|
||||||
**Files:** none — verification only.
|
|
||||||
|
|
||||||
- [ ] **Step 1: Run the entire test suite**
|
|
||||||
|
|
||||||
Run: `pytest l4d2web/tests -q`
|
|
||||||
Expected: all tests pass.
|
|
||||||
|
|
||||||
- [ ] **Step 2: Run ruff if it's part of the project's check workflow**
|
|
||||||
|
|
||||||
Run: `ruff check l4d2web/`
|
|
||||||
Expected: no new violations. The `.ruff_cache/` directory at the project root suggests ruff is in active use.
|
|
||||||
|
|
||||||
- [ ] **Step 3: Confirm no remaining raw-datetime renders or bespoke inline-time math**
|
|
||||||
|
|
||||||
Run: `grep -rn -E "\\{\\{ [a-z_.]+\\.(created_at|updated_at|started_at|finished_at|joined_at|last_seen|last_seen_at)" l4d2web/l4d2web/templates/`
|
|
||||||
Expected: every match is followed by `| timeago` or `| timeago }}{% else %}…{% endif %}`. No bare `{{ x.created_at }}` should remain.
|
|
||||||
|
|
||||||
Run: `grep -rn "(now -" l4d2web/l4d2web/templates/`
|
|
||||||
Expected: no matches.
|
|
||||||
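The second grep can also be expressed as a small Python check, which may be convenient if the guard should later live in the test suite. This is only a sketch: the function name `find_inline_now_math` and the assumption that templates use the `.html` extension are illustrative, not taken from the repository.

```python
import re
from pathlib import Path

# Matches the start of the bespoke "(now - x)" expression, tolerating extra spaces.
INLINE_NOW_MATH = re.compile(r"\(now\s*-")


def find_inline_now_math(template_dir: str) -> list[tuple[str, int]]:
    """Return (file name, line number) for every template line that still does inline time math."""
    hits = []
    for path in sorted(Path(template_dir).rglob("*.html")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if INLINE_NOW_MATH.search(line):
                hits.append((path.name, lineno))
    return hits
```

Calling `find_inline_now_math("l4d2web/l4d2web/templates")` after the migration should return an empty list, matching the grep expectation above.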
|
|
||||||
- [ ] **Step 4: Manual UI smoke (developer-side, optional but recommended)**
|
|
||||||
|
|
||||||
Start the dev server (see `README.md` for the exact command) and log in:
|
|
||||||
|
|
||||||
- Visit `/admin/users` — `Created` / `Updated` columns render `<time>` elements; hovering shows UTC.
|
|
||||||
- Visit `/blueprints` — same.
|
|
||||||
- Visit `/jobs` and a single job detail — `Created` / `Started` / `Finished` use the filter; null `Finished` shows `-`.
|
|
||||||
- Open a server with live state: expect `polled N seconds ago`, `joined N minutes ago`, `last seen N minutes ago`; check that the page source shows real `<time` markup, not the escaped literal `&lt;time&gt;`.
|
|
||||||
|
|
||||||
- [ ] **Step 5: No-op commit not required — work is already committed across Tasks 1-8.**
|
|
||||||
|
|
||||||
End of plan.
|
|
||||||
|
|
@ -1,725 +0,0 @@
|
||||||
# Console Command Autocomplete Implementation Plan
|
|
||||||
|
|
||||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
|
||||||
|
|
||||||
**Goal:** Add command/cvar autocomplete to the runtime console input on `server_detail.html`, sharing the editor's ranking algorithm via a pure-JS module compiled to a tiny additional bundle, with a vanilla dropdown that does not collide with the existing ArrowUp/Down history recall.
|
|
||||||
|
|
||||||
**Architecture:** Extract the editor's inlined ranking logic into a pure ES module `editor-src/vocab-rank.js`. The editor imports it directly; for the console, a second esbuild entry point bundles it into a small `static/vendor/vocab-rank.bundle.js` that exposes `window.__rankVocab`. A new `static/js/console-autocomplete.js` builds a vanilla dropdown (positioned absolutely under the console input), lazy-fetches `srccfg-vocab.json` on first focus, hides the dropdown once the user types past the first token, and binds Tab/Shift+Tab/Esc only — leaving ArrowUp/Down/Enter untouched for `console-history.js`.
|
|
||||||
|
|
||||||
**Tech Stack:** Vanilla JS (no framework), esbuild (IIFE bundles), CodeMirror 6 (editor-side only — console is plain `<input>`), HTMX (existing — for form submission and dynamic page-fragment swap), CSS variables defined in `tokens.css`/`editor.css`. Tests use Node's built-in `node:test` runner (no extra deps).
|
|
||||||
|
|
||||||
**Reference Spec:** `docs/superpowers/specs/2026-05-17-console-command-autocomplete-design.md`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## File Structure
|
|
||||||
|
|
||||||
**New files:**
|
|
||||||
- `l4d2web/scripts/editor-src/vocab-rank.js` — pure ranking module (ES, exports `rankVocab`)
|
|
||||||
- `l4d2web/scripts/editor-src/vocab-rank-entry.js` — IIFE entry that assigns `rankVocab` to `window.__rankVocab`
|
|
||||||
- `l4d2web/scripts/editor-src/vocab-rank.test.js` — Node `node:test` unit tests for the ranker
|
|
||||||
- `l4d2web/l4d2web/static/js/console-autocomplete.js` — vanilla dropdown, lazy fetch, key handling
|
|
||||||
- `l4d2web/l4d2web/static/css/console-autocomplete.css` — dropdown styling using existing CSS tokens
|
|
||||||
|
|
||||||
**Modified files:**
|
|
||||||
- `l4d2web/scripts/editor-src/autocomplete.js` — replace inlined `rank()` + scoring with `import { rankVocab } from "./vocab-rank.js"`
|
|
||||||
- `l4d2web/scripts/editor-src/package.json` — add `build:vocab-rank` script; chain into `build`
|
|
||||||
- `l4d2web/l4d2web/templates/base.html` — add `<script defer>` for `vocab-rank.bundle.js` and `console-autocomplete.js`; add `<link>` for `console-autocomplete.css`
|
|
||||||
|
|
||||||
**Build artifacts (regenerated, do not hand-edit):**
|
|
||||||
- `l4d2web/l4d2web/static/vendor/editor.bundle.js` — rebuilt because `autocomplete.js` changed
|
|
||||||
- `l4d2web/l4d2web/static/vendor/vocab-rank.bundle.js` — new tiny bundle
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 1: Extract `rankVocab` into a pure module (TDD)
|
|
||||||
|
|
||||||
**Goal:** Move the editor's inlined ranking logic into a standalone, testable, dependency-free function.
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Create: `l4d2web/scripts/editor-src/vocab-rank.js`
|
|
||||||
- Create: `l4d2web/scripts/editor-src/vocab-rank.test.js`
|
|
||||||
|
|
||||||
- [ ] **Step 1: Write the failing test file**
|
|
||||||
|
|
||||||
Create `l4d2web/scripts/editor-src/vocab-rank.test.js`:
|
|
||||||
|
|
||||||
```javascript
|
|
||||||
import { test } from "node:test";
|
|
||||||
import assert from "node:assert/strict";
|
|
||||||
import { rankVocab } from "./vocab-rank.js";
|
|
||||||
|
|
||||||
const vocab = {
|
|
||||||
cvars: [
|
|
||||||
{ name: "sv_cheats", desc: "Allow cheats" },
|
|
||||||
{ name: "sv_gravity" },
|
|
||||||
{ name: "mp_friendlyfire", desc: "Toggle FF" },
|
|
||||||
],
|
|
||||||
commands: [
|
|
||||||
{ name: "kick", desc: "Kick a player" },
|
|
||||||
{ name: "kickall", desc: "Kick everyone" },
|
|
||||||
{ name: "changelevel", desc: "Change map" },
|
|
||||||
],
|
|
||||||
};
|
|
||||||
|
|
||||||
test("exact match comes first", () => {
|
|
||||||
const out = rankVocab("kick", vocab);
|
|
||||||
assert.equal(out[0].name, "kick");
|
|
||||||
assert.equal(out[1].name, "kickall");
|
|
||||||
});
|
|
||||||
|
|
||||||
test("prefix matches beat substring matches", () => {
|
|
||||||
const out = rankVocab("sv_", vocab);
|
|
||||||
assert.equal(out[0].name, "sv_cheats");
|
|
||||||
assert.equal(out[1].name, "sv_gravity");
|
|
||||||
// mp_friendlyfire contains no "sv_" → should not appear
|
|
||||||
assert.ok(!out.some(e => e.name === "mp_friendlyfire"));
|
|
||||||
});
|
|
||||||
|
|
||||||
test("substring matches included after prefix matches", () => {
|
|
||||||
// "iendly" is a substring of mp_friendlyfire but a prefix of nothing
|
|
||||||
const out = rankVocab("iendly", vocab);
|
|
||||||
assert.equal(out.length, 1);
|
|
||||||
assert.equal(out[0].name, "mp_friendlyfire");
|
|
||||||
});
|
|
||||||
|
|
||||||
test("kind is preserved on each result", () => {
|
|
||||||
const out = rankVocab("kick", vocab);
|
|
||||||
assert.equal(out[0].kind, "command");
|
|
||||||
const sv = rankVocab("sv_cheats", vocab);
|
|
||||||
assert.equal(sv[0].kind, "cvar");
|
|
||||||
});
|
|
||||||
|
|
||||||
test("desc is preserved when present", () => {
|
|
||||||
const out = rankVocab("kick", vocab);
|
|
||||||
assert.equal(out[0].desc, "Kick a player");
|
|
||||||
});
|
|
||||||
|
|
||||||
test("desc is undefined when source had no desc", () => {
|
|
||||||
const out = rankVocab("sv_gravity", vocab);
|
|
||||||
assert.equal(out[0].desc, undefined);
|
|
||||||
});
|
|
||||||
|
|
||||||
test("results are capped at the configured limit", () => {
|
|
||||||
const big = { cvars: [], commands: [] };
|
|
||||||
for (let i = 0; i < 200; i++) big.commands.push({ name: `cmd${i}` });
|
|
||||||
const out = rankVocab("cmd", big, { limit: 50 });
|
|
||||||
assert.equal(out.length, 50);
|
|
||||||
});
|
|
||||||
|
|
||||||
test("default limit is 50", () => {
|
|
||||||
const big = { cvars: [], commands: [] };
|
|
||||||
for (let i = 0; i < 200; i++) big.commands.push({ name: `cmd${i}` });
|
|
||||||
const out = rankVocab("cmd", big);
|
|
||||||
assert.equal(out.length, 50);
|
|
||||||
});
|
|
||||||
|
|
||||||
test("empty query returns no results", () => {
|
|
||||||
const out = rankVocab("", vocab);
|
|
||||||
assert.equal(out.length, 0);
|
|
||||||
});
|
|
||||||
|
|
||||||
test("case-insensitive match", () => {
|
|
||||||
const out = rankVocab("KICK", vocab);
|
|
||||||
assert.equal(out[0].name, "kick");
|
|
||||||
});
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 2: Run tests to verify they fail**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd l4d2web/scripts/editor-src && node --test vocab-rank.test.js
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: FAIL with `Cannot find module './vocab-rank.js'` or `ERR_MODULE_NOT_FOUND`.
|
|
||||||
|
|
||||||
- [ ] **Step 3: Create the ranker module**
|
|
||||||
|
|
||||||
Create `l4d2web/scripts/editor-src/vocab-rank.js`:
|
|
||||||
|
|
||||||
```javascript
|
|
||||||
// Pure, dependency-free ranking of a vocabulary against a query string.
|
|
||||||
// Used by both the CodeMirror editor (via autocomplete.js) and the
|
|
||||||
// runtime console (via the vocab-rank bundle exposed on window).
|
|
||||||
//
|
|
||||||
// Score (lower = better):
|
|
||||||
// exact match → 0
|
|
||||||
// prefix match → 1 + label.length (shorter prefix matches win)
|
|
||||||
// substring match → 10000 + indexOf (earlier substring beats later)
|
|
||||||
// no match → -1 (excluded)
|
|
||||||
|
|
||||||
function score(query, label) {
|
|
||||||
if (label === query) return 0;
|
|
||||||
if (label.startsWith(query)) return 1 + label.length;
|
|
||||||
const i = label.indexOf(query);
|
|
||||||
if (i !== -1) return 10000 + i;
|
|
||||||
return -1;
|
|
||||||
}
|
|
||||||
|
|
||||||
export function rankVocab(query, vocab, { limit = 50 } = {}) {
|
|
||||||
if (!query) return [];
|
|
||||||
const q = query.toLowerCase();
|
|
||||||
|
|
||||||
const entries = [
|
|
||||||
...vocab.cvars.map(e => ({ ...e, kind: "cvar" })),
|
|
||||||
...vocab.commands.map(e => ({ ...e, kind: "command" })),
|
|
||||||
];
|
|
||||||
|
|
||||||
const scored = [];
|
|
||||||
for (const e of entries) {
|
|
||||||
const s = score(q, e.name.toLowerCase());
|
|
||||||
if (s === -1) continue;
|
|
||||||
scored.push([s, e]);
|
|
||||||
if (scored.length > limit * 4) break;
|
|
||||||
}
|
|
||||||
scored.sort((a, b) => a[0] - b[0]);
|
|
||||||
return scored.slice(0, limit).map(([, e]) => e);
|
|
||||||
}
|
|
||||||
```
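To make the scoring comment concrete, here is a minimal sketch that copies the `score` helper from the module above (without the `export`, so it runs as a plain script) and evaluates it on a few names from the test vocabulary. The sample pairs are chosen for illustration only.

```javascript
// Copy of the score() helper from vocab-rank.js, for illustration only.
function score(query, label) {
  if (label === query) return 0;                         // exact match
  if (label.startsWith(query)) return 1 + label.length;  // prefix match
  const i = label.indexOf(query);
  if (i !== -1) return 10000 + i;                        // substring match
  return -1;                                             // no match
}

// Illustrative [query, label] pairs taken from the test vocabulary.
const samples = [
  ["kick", "kick"],
  ["kick", "kickall"],
  ["iendly", "mp_friendlyfire"],
  ["sv_", "mp_friendlyfire"],
];

const scores = samples.map(([q, label]) => score(q, label));
console.log(scores); // [ 0, 8, 10005, -1 ]
```

Prefix scores stay below 10000 for any realistic name length, which is why prefix matches always sort ahead of substring matches. Note also that `rankVocab` stops collecting once more than `limit * 4` matches have been gathered, so in a vocabulary with more matches than that, entries later in the list are never scored.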
|
|
||||||
|
|
||||||
- [ ] **Step 4: Run tests to verify they pass**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd l4d2web/scripts/editor-src && node --test vocab-rank.test.js
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: PASS — 10 tests passing.
|
|
||||||
|
|
||||||
- [ ] **Step 5: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/scripts/editor-src/vocab-rank.js \
|
|
||||||
l4d2web/scripts/editor-src/vocab-rank.test.js
|
|
||||||
git commit -m "feat(editor): extract pure rankVocab module + tests"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 2: Refactor `autocomplete.js` to use the shared ranker
|
|
||||||
|
|
||||||
**Goal:** Replace the inlined `rank()` and scoring loop in `autocomplete.js` with a call to `rankVocab`, with no behavior change.
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/scripts/editor-src/autocomplete.js`
|
|
||||||
|
|
||||||
- [ ] **Step 1: Rewrite `autocomplete.js`**
|
|
||||||
|
|
||||||
Replace the entire file contents with:
|
|
||||||
|
|
||||||
```javascript
|
|
||||||
import { autocompletion } from "@codemirror/autocomplete";
|
|
||||||
import { rankVocab } from "./vocab-rank.js";
|
|
||||||
|
|
||||||
const WORD_RE = /[A-Za-z0-9_]{2,}/;
|
|
||||||
|
|
||||||
export function vocabCompletions(vocab) {
|
|
||||||
// vocab: { cvars: [{name, desc?}, …], commands: [{name, desc?}, …] }
|
|
||||||
return (context) => {
|
|
||||||
const word = context.matchBefore(WORD_RE);
|
|
||||||
if (!word || (word.from === word.to && !context.explicit)) return null;
|
|
||||||
|
|
||||||
const ranked = rankVocab(word.text, vocab);
|
|
||||||
const options = ranked.map(e => ({
|
|
||||||
label: e.name,
|
|
||||||
info: e.desc || e.kind,
|
|
||||||
type: e.kind === "command" ? "function" : "variable",
|
|
||||||
}));
|
|
||||||
return { from: word.from, options, validFor: WORD_RE };
|
|
||||||
};
|
|
||||||
}
|
|
||||||
|
|
||||||
export function autocompleteExtension(vocab) {
|
|
||||||
return autocompletion({
|
|
||||||
override: [vocabCompletions(vocab)],
|
|
||||||
activateOnTyping: true,
|
|
||||||
maxRenderedOptions: 8,
|
|
||||||
});
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 2: Rebuild the editor bundle**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd l4d2web/scripts/editor-src && npm run build
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: `editor.bundle.js` regenerated in `l4d2web/l4d2web/static/vendor/`. No esbuild warnings or errors.
|
|
||||||
|
|
||||||
- [ ] **Step 3: Manually verify editor autocomplete still works (regression check)**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd l4d2web && python ../scripts/dev-server.py
|
|
||||||
```
|
|
||||||
|
|
||||||
(Note: per memory, the dev server is `scripts/dev-server.py` at repo root, not `flask run`.) Then in a browser:
|
|
||||||
|
|
||||||
1. Open a server-detail page with a config file editor visible, or navigate to any `.cfg` file edit view.
|
|
||||||
2. In the editor, type `sv_` — autocomplete dropdown appears with cvars (e.g. `sv_cheats`, `sv_gravity`).
|
|
||||||
3. Type `sv_cheats` exactly — `sv_cheats` is first in the list.
|
|
||||||
4. Press Tab — completion is accepted.
|
|
||||||
|
|
||||||
Stop the dev server (Ctrl+C).
|
|
||||||
|
|
||||||
- [ ] **Step 4: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/scripts/editor-src/autocomplete.js \
|
|
||||||
l4d2web/l4d2web/static/vendor/editor.bundle.js
|
|
||||||
git commit -m "refactor(editor): use shared rankVocab in autocomplete"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 3: Build a standalone ranker bundle for the console
|
|
||||||
|
|
||||||
**Goal:** Produce `vocab-rank.bundle.js`, a tiny IIFE that exposes `window.__rankVocab`, so that the non-bundled `console-autocomplete.js` can call the same ranker.
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Create: `l4d2web/scripts/editor-src/vocab-rank-entry.js`
|
|
||||||
- Modify: `l4d2web/scripts/editor-src/package.json`
|
|
||||||
|
|
||||||
- [ ] **Step 1: Create the IIFE entry point**
|
|
||||||
|
|
||||||
Create `l4d2web/scripts/editor-src/vocab-rank-entry.js`:
|
|
||||||
|
|
||||||
```javascript
|
|
||||||
import { rankVocab } from "./vocab-rank.js";
|
|
||||||
|
|
||||||
// Expose as a global function so plain (non-module) scripts on
|
|
||||||
// server_detail.html can call window.__rankVocab(query, vocab).
|
|
||||||
window.__rankVocab = rankVocab;
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 2: Add a build script for it in `package.json`**
|
|
||||||
|
|
||||||
Open `l4d2web/scripts/editor-src/package.json` and replace the `"scripts"` block with:
|
|
||||||
|
|
||||||
```json
|
|
||||||
"scripts": {
|
|
||||||
"build:editor": "esbuild editor-entry.js --bundle --minify --format=iife --global-name=__editor_pkg --outfile=../../l4d2web/static/vendor/editor.bundle.js --metafile=meta.json",
|
|
||||||
"build:vocab-rank": "esbuild vocab-rank-entry.js --bundle --minify --format=iife --outfile=../../l4d2web/static/vendor/vocab-rank.bundle.js",
|
|
||||||
"build": "npm run build:editor && npm run build:vocab-rank"
|
|
||||||
},
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 3: Run the build**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd l4d2web/scripts/editor-src && npm run build
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: two output files updated/created. Verify from the repo root with:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
ls -la l4d2web/l4d2web/static/vendor/editor.bundle.js l4d2web/l4d2web/static/vendor/vocab-rank.bundle.js
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: `vocab-rank.bundle.js` exists (should be ~1-3 KB).
|
|
||||||
|
|
||||||
- [ ] **Step 4: Smoke-test the bundle from Node**
|
|
||||||
|
|
||||||
Quickly check that the bundle is well-formed (no syntax errors):
|
|
||||||
|
|
||||||
```bash
|
|
||||||
node -e 'const fs = require("fs"); const code = fs.readFileSync("l4d2web/l4d2web/static/vendor/vocab-rank.bundle.js", "utf8"); new Function("window", code)({}); console.log("ok");'
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: prints `ok` (means the IIFE parsed and ran).
|
|
||||||
|
|
||||||
- [ ] **Step 5: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/scripts/editor-src/vocab-rank-entry.js \
|
|
||||||
l4d2web/scripts/editor-src/package.json \
|
|
||||||
l4d2web/l4d2web/static/vendor/vocab-rank.bundle.js
|
|
||||||
git commit -m "feat(editor): build standalone vocab-rank bundle for console"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 4: Build the console-autocomplete module
|
|
||||||
|
|
||||||
**Goal:** Create the vanilla-JS module that renders the dropdown, handles keyboard interaction, and binds to console forms (including HTMX-injected ones).
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Create: `l4d2web/l4d2web/static/js/console-autocomplete.js`
|
|
||||||
|
|
||||||
- [ ] **Step 1: Write the module**
|
|
||||||
|
|
||||||
Create `l4d2web/l4d2web/static/js/console-autocomplete.js`:
|
|
||||||
|
|
||||||
```javascript
|
|
||||||
// console-autocomplete.js
|
|
||||||
// Vanilla dropdown autocomplete for [data-console-form] inputs.
|
|
||||||
// Reads ranked completions from window.__rankVocab (loaded via
|
|
||||||
// vocab-rank.bundle.js). Owns: Tab, Shift+Tab, Esc, mouse events.
|
|
||||||
// Leaves: ArrowUp, ArrowDown, Enter (console-history.js owns those).
|
|
||||||
//
|
|
||||||
// First-token only: the dropdown is hidden as soon as the cursor
|
|
||||||
// is past the first space in the input.
|
|
||||||
|
|
||||||
const VOCAB_URL = "/static/data/srccfg-vocab.json";
|
|
||||||
const MAX_RENDERED = 8;
|
|
||||||
let vocabPromise = null;
|
|
||||||
|
|
||||||
function loadVocab() {
|
|
||||||
if (vocabPromise) return vocabPromise;
|
|
||||||
vocabPromise = fetch(VOCAB_URL, { credentials: "same-origin" })
|
|
||||||
.then(r => r.ok ? r.json() : Promise.reject(new Error("vocab fetch failed: " + r.status)))
|
|
||||||
.catch(err => { console.warn("[console-autocomplete] vocab load failed", err); return null; });
|
|
||||||
return vocabPromise;
|
|
||||||
}
|
|
||||||
|
|
||||||
function firstTokenSlice(value, caret) {
|
|
||||||
// Returns { token, from, to } for the first token when the caret is
|
|
||||||
// inside it or directly after it; otherwise null.
|
|
||||||
const spaceIdx = value.indexOf(" ");
|
|
||||||
if (spaceIdx === -1) {
|
|
||||||
return { token: value, from: 0, to: value.length };
|
|
||||||
}
|
|
||||||
if (caret > spaceIdx) return null;
|
|
||||||
return { token: value.slice(0, spaceIdx), from: 0, to: spaceIdx };
|
|
||||||
}
|
|
||||||
|
|
||||||
function bindConsoleAutocomplete(form) {
|
|
||||||
if (form.dataset.consoleAutocompleteBound === "true") return;
|
|
||||||
form.dataset.consoleAutocompleteBound = "true";
|
|
||||||
|
|
||||||
const input = form.querySelector("input[name='command']");
|
|
||||||
if (!input) return;
|
|
||||||
|
|
||||||
// --- Dropdown DOM (created lazily on first show) ---
|
|
||||||
let dropdown = null;
|
|
||||||
let items = []; // current ranked items
|
|
||||||
let highlightIdx = 0; // index of currently-highlighted row
|
|
||||||
let vocab = null;
|
|
||||||
|
|
||||||
function ensureDropdown() {
|
|
||||||
if (dropdown) return dropdown;
|
|
||||||
dropdown = document.createElement("div");
|
|
||||||
dropdown.className = "console-autocomplete-dropdown";
|
|
||||||
dropdown.setAttribute("role", "listbox");
|
|
||||||
dropdown.style.display = "none";
|
|
||||||
document.body.appendChild(dropdown);
|
|
||||||
return dropdown;
|
|
||||||
}
|
|
||||||
|
|
||||||
function position() {
|
|
||||||
if (!dropdown) return;
|
|
||||||
const rect = input.getBoundingClientRect();
|
|
||||||
dropdown.style.left = `${rect.left + window.scrollX}px`;
|
|
||||||
dropdown.style.top = `${rect.bottom + window.scrollY}px`;
|
|
||||||
dropdown.style.minWidth = `${rect.width}px`;
|
|
||||||
}
|
|
||||||
|
|
||||||
function close() {
|
|
||||||
if (!dropdown) return;
|
|
||||||
dropdown.style.display = "none";
|
|
||||||
items = [];
|
|
||||||
highlightIdx = 0;
|
|
||||||
}
|
|
||||||
|
|
||||||
function render() {
|
|
||||||
ensureDropdown();
|
|
||||||
if (items.length === 0) { close(); return; }
|
|
||||||
const rows = items.slice(0, MAX_RENDERED).map((e, i) => {
|
|
||||||
const selected = i === highlightIdx ? " aria-selected='true'" : "";
|
|
||||||
const kindClass = e.kind === "command" ? "kind-command" : "kind-cvar";
|
|
||||||
const desc = e.desc ? `<span class="console-autocomplete-desc">${escapeHtml(e.desc)}</span>` : "";
|
|
||||||
return `<div class="console-autocomplete-row ${kindClass}"${selected} role="option" data-idx="${i}"><span class="console-autocomplete-name">${escapeHtml(e.name)}</span>${desc}</div>`;
|
|
||||||
}).join("");
|
|
||||||
dropdown.innerHTML = rows;
|
|
||||||
dropdown.style.display = "block";
|
|
||||||
position();
|
|
||||||
}
|
|
||||||
|
|
||||||
function escapeHtml(s) {
|
|
||||||
return String(s).replace(/[&<>"']/g, c => ({
|
|
||||||
"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;",
|
|
||||||
}[c]));
|
|
||||||
}
|
|
||||||
|
|
||||||
function acceptHighlighted() {
|
|
||||||
if (items.length === 0) return;
|
|
||||||
const chosen = items[highlightIdx];
|
|
||||||
const slice = firstTokenSlice(input.value, input.selectionStart || 0);
|
|
||||||
if (!slice) return;
|
|
||||||
const before = input.value.slice(0, slice.from);
|
|
||||||
const after = input.value.slice(slice.to);
|
|
||||||
input.value = before + chosen.name + after;
|
|
||||||
// Place caret at end of inserted name
|
|
||||||
const caret = before.length + chosen.name.length;
|
|
||||||
input.setSelectionRange(caret, caret);
|
|
||||||
recompute();
|
|
||||||
}
|
|
||||||
|
|
||||||
function recompute() {
|
|
||||||
if (!vocab) return;
|
|
||||||
const slice = firstTokenSlice(input.value, input.selectionStart || 0);
|
|
||||||
if (!slice || !slice.token) { close(); return; }
|
|
||||||
items = window.__rankVocab(slice.token, vocab);
|
|
||||||
if (items.length === 0) { close(); return; }
|
|
||||||
highlightIdx = 0;
|
|
||||||
render();
|
|
||||||
}
|
|
||||||
|
|
||||||
// --- Lazy vocab fetch on first focus ---
|
|
||||||
input.addEventListener("focus", async () => {
|
|
||||||
if (!vocab) {
|
|
||||||
vocab = await loadVocab();
|
|
||||||
}
|
|
||||||
}, { once: true });
|
|
||||||
|
|
||||||
input.addEventListener("input", () => {
|
|
||||||
if (!vocab) return; // fetch may not have resolved yet; next input will recompute
|
|
||||||
recompute();
|
|
||||||
});
|
|
||||||
|
|
||||||
input.addEventListener("keydown", (event) => {
|
|
||||||
if (event.key === "Tab" && !event.shiftKey) {
|
|
||||||
if (items.length > 0) {
|
|
||||||
event.preventDefault();
|
|
||||||
acceptHighlighted();
|
|
||||||
}
|
|
||||||
} else if (event.key === "Tab" && event.shiftKey) {
|
|
||||||
if (items.length > 0) {
|
|
||||||
event.preventDefault();
|
|
||||||
highlightIdx = (highlightIdx - 1 + Math.min(items.length, MAX_RENDERED))
|
|
||||||
% Math.min(items.length, MAX_RENDERED);
|
|
||||||
render();
|
|
||||||
}
|
|
||||||
} else if (event.key === "Escape") {
|
|
||||||
if (dropdown && dropdown.style.display !== "none") {
|
|
||||||
event.preventDefault();
|
|
||||||
close();
|
|
||||||
}
|
|
||||||
}
|
|
||||||
// ArrowUp/ArrowDown/Enter intentionally NOT handled here.
|
|
||||||
});
|
|
||||||
|
|
||||||
input.addEventListener("blur", () => {
|
|
||||||
// Delay close so a click on a dropdown row can fire first.
|
|
||||||
setTimeout(close, 100);
|
|
||||||
});
|
|
||||||
|
|
||||||
// Mouse click on a row → accept that row.
|
|
||||||
document.addEventListener("mousedown", (event) => {
|
|
||||||
if (!dropdown || dropdown.style.display === "none") return;
|
|
||||||
const row = event.target.closest(".console-autocomplete-row");
|
|
||||||
if (!row || !dropdown.contains(row)) return;
|
|
||||||
event.preventDefault();
|
|
||||||
highlightIdx = parseInt(row.dataset.idx, 10) || 0;
|
|
||||||
acceptHighlighted();
|
|
||||||
input.focus();
|
|
||||||
});
|
|
||||||
|
|
||||||
// HTMX form submission clears the input; close on submit.
|
|
||||||
form.addEventListener("htmx:beforeRequest", close);
|
|
||||||
|
|
||||||
// Reposition on resize/scroll while dropdown is open.
|
|
||||||
window.addEventListener("resize", () => { if (dropdown && dropdown.style.display !== "none") position(); });
|
|
||||||
window.addEventListener("scroll", () => { if (dropdown && dropdown.style.display !== "none") position(); }, true);
|
|
||||||
}
|
|
||||||
|
|
||||||
function bindAll(root) {
|
|
||||||
if (!root) return;
|
|
||||||
const scope = root.matches && root.matches("[data-console-form]") ? [root] : [];
|
|
||||||
if (root.querySelectorAll) {
|
|
||||||
root.querySelectorAll("[data-console-form]").forEach((el) => scope.push(el));
|
|
||||||
}
|
|
||||||
scope.forEach(bindConsoleAutocomplete);
|
|
||||||
}
|
|
||||||
|
|
||||||
document.addEventListener("DOMContentLoaded", () => bindAll(document));
|
|
||||||
document.addEventListener("htmx:load", (event) => bindAll(event.detail.elt));
|
|
||||||
```
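The first-token rule and the Shift+Tab wrap-around are the two pieces of pure logic in this module, so they can be exercised outside the browser. The sketch below copies `firstTokenSlice` from the module and adds a hypothetical `prevIndex` helper. `prevIndex` is not part of the module; it only mirrors the inline modular arithmetic in the `keydown` handler.

```javascript
// Copy of firstTokenSlice() from console-autocomplete.js.
function firstTokenSlice(value, caret) {
  const spaceIdx = value.indexOf(" ");
  if (spaceIdx === -1) {
    return { token: value, from: 0, to: value.length };
  }
  if (caret > spaceIdx) return null;
  return { token: value.slice(0, spaceIdx), from: 0, to: spaceIdx };
}

// Hypothetical helper mirroring the Shift+Tab arithmetic: step the
// highlight up one row, wrapping from the top to the last rendered row.
function prevIndex(highlightIdx, itemCount, maxRendered = 8) {
  const visible = Math.min(itemCount, maxRendered);
  return (highlightIdx - 1 + visible) % visible;
}

// Caret inside the only token: the whole value is the query.
const whole = firstTokenSlice("sv_che", 6);
// Caret directly before the space still counts as the first token.
const atBoundary = firstTokenSlice("sv_cheats 1", 9);
// Caret past the first space: no completion is offered.
const pastFirst = firstTokenSlice("sv_cheats 1", 11);

console.log(whole);      // { token: 'sv_che', from: 0, to: 6 }
console.log(atBoundary); // { token: 'sv_cheats', from: 0, to: 9 }
console.log(pastFirst);  // null
console.log(prevIndex(0, 20), prevIndex(3, 20), prevIndex(0, 5)); // 7 2 4
```

Because the wrap uses `Math.min(items.length, MAX_RENDERED)`, the highlight never leaves the rows that are actually rendered, even when the ranker returns more than eight items.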
|
|
||||||
|
|
||||||
- [ ] **Step 2: Commit (no template/CSS wire-up yet — module is not yet loaded)**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/l4d2web/static/js/console-autocomplete.js
|
|
||||||
git commit -m "feat(console): add vanilla autocomplete dropdown module"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 5: Add dropdown stylesheet
|
|
||||||
|
|
||||||
**Goal:** Provide minimal CSS so the dropdown is positioned, themed via existing CSS tokens, and visually consistent with the editor's autocomplete popup.
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Create: `l4d2web/l4d2web/static/css/console-autocomplete.css`
|
|
||||||
|
|
||||||
- [ ] **Step 1: Write the stylesheet**
|
|
||||||
|
|
||||||
Create `l4d2web/l4d2web/static/css/console-autocomplete.css`:
|
|
||||||
|
|
||||||
```css
|
|
||||||
/* Console autocomplete dropdown.
|
|
||||||
Positioned absolutely under the console input by JS; visuals match
|
|
||||||
the editor's tooltip styling (var(--cm-*) tokens defined in
|
|
||||||
tokens.css and editor.css). */
|
|
||||||
|
|
||||||
.console-autocomplete-dropdown {
|
|
||||||
position: absolute;
|
|
||||||
z-index: 1000;
|
|
||||||
max-height: calc(8 * 2.4rem);
|
|
||||||
overflow-y: auto;
|
|
||||||
background-color: var(--cm-bg, #1e1e1e);
|
|
||||||
color: var(--cm-fg, #e0e0e0);
|
|
||||||
border: 1px solid var(--border-strong, #444);
|
|
||||||
border-radius: 4px;
|
|
||||||
font-family: var(--font-mono, ui-monospace, SFMono-Regular, Menlo, monospace);
|
|
||||||
font-size: 13px;
|
|
||||||
box-shadow: 0 4px 12px rgba(0, 0, 0, 0.3);
|
|
||||||
}
|
|
||||||
|
|
||||||
.console-autocomplete-row {
|
|
||||||
display: flex;
|
|
||||||
align-items: baseline;
|
|
||||||
gap: 0.75em;
|
|
||||||
padding: 0.3em 0.6em;
|
|
||||||
cursor: pointer;
|
|
||||||
white-space: nowrap;
|
|
||||||
}
|
|
||||||
|
|
||||||
.console-autocomplete-row[aria-selected="true"] {
|
|
||||||
background-color: var(--cm-selection, #264f78);
|
|
||||||
}
|
|
||||||
|
|
||||||
.console-autocomplete-row:hover {
|
|
||||||
background-color: var(--cm-selection, #264f78);
|
|
||||||
}
|
|
||||||
|
|
||||||
.console-autocomplete-name {
|
|
||||||
font-weight: 600;
|
|
||||||
}
|
|
||||||
|
|
||||||
.console-autocomplete-row.kind-cvar .console-autocomplete-name {
|
|
||||||
color: var(--cm-keyword, #569cd6);
|
|
||||||
}
|
|
||||||
|
|
||||||
.console-autocomplete-row.kind-command .console-autocomplete-name {
|
|
||||||
color: var(--cm-string, #ce9178);
|
|
||||||
}
|
|
||||||
|
|
||||||
.console-autocomplete-desc {
|
|
||||||
color: var(--fg-muted, #888);
|
|
||||||
font-size: 0.9em;
|
|
||||||
overflow: hidden;
|
|
||||||
text-overflow: ellipsis;
|
|
||||||
max-width: 40em;
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 2: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/l4d2web/static/css/console-autocomplete.css
|
|
||||||
git commit -m "feat(console): add autocomplete dropdown stylesheet"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 6: Wire up in `base.html`
|
|
||||||
|
|
||||||
**Goal:** Load the ranker bundle, the console-autocomplete script, and the stylesheet — placed alongside the existing `console-history.js` tag so loading order matches.
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/l4d2web/templates/base.html`
|
|
||||||
|
|
||||||
- [ ] **Step 1: Read the current head/body script section**
|
|
||||||
|
|
||||||
Open `l4d2web/l4d2web/templates/base.html` and find the line that currently loads `console-history.js`:
|
|
||||||
|
|
||||||
```html
|
|
||||||
<script defer src="{{ url_for('static', filename='js/console-history.js') }}"></script>
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 2: Add the new tags directly after it**
|
|
||||||
|
|
||||||
Add immediately after the `console-history.js` script tag:
|
|
||||||
|
|
||||||
```html
|
|
||||||
<script defer src="{{ url_for('static', filename='vendor/vocab-rank.bundle.js') }}"></script>
|
|
||||||
<script defer src="{{ url_for('static', filename='js/console-autocomplete.js') }}"></script>
|
|
||||||
```
|
|
||||||
|
|
||||||
And add to the `<head>` section (alongside other `<link rel="stylesheet">` tags — search for existing ones in `base.html`):
|
|
||||||
|
|
||||||
```html
|
|
||||||
<link rel="stylesheet" href="{{ url_for('static', filename='css/console-autocomplete.css') }}">
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 3: Sanity-check the template renders without syntax errors**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd l4d2web && python -c "from l4d2web.app import create_app; app = create_app(); app.jinja_env.get_template('base.html'); print('ok')"
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: prints `ok` (Flask app boots; templates are valid Jinja).
|
|
||||||
|
|
||||||
- [ ] **Step 4: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/l4d2web/templates/base.html
|
|
||||||
git commit -m "feat(console): wire up autocomplete bundle + stylesheet in base.html"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 7: End-to-end smoke test
|
|
||||||
|
|
||||||
**Goal:** Verify the full feature works in the browser against the dev server.
|
|
||||||
|
|
||||||
- [ ] **Step 1: Start the dev server**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd l4d2web && python ../scripts/dev-server.py
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: server starts on `http://localhost:5000` (or whatever the script reports). `LEFT4ME_ROOT` is auto-set to `.tmp/dev-server` and seeded with demo content per memory.
|
|
||||||
|
|
||||||
- [ ] **Step 2: Run through the smoke-test checklist in a browser**
|
|
||||||
|
|
||||||
Open a server-detail page (one of the demo servers seeded by the dev server). Then verify each:
|
|
||||||
|
|
||||||
1. **Vocab fetch is lazy.** Open DevTools → Network → filter `srccfg-vocab`. Reload page. **Expected:** no request yet.
|
|
||||||
2. **Click into the console input.** **Expected:** one `srccfg-vocab.json` request fires.
|
|
||||||
3. **Type `sv_`.** **Expected:** dropdown appears showing cvars starting with `sv_`. Top row highlighted.
|
|
||||||
4. **Press Tab.** **Expected:** first token replaced with the highlighted suggestion (e.g. `sv_cheats`). Dropdown updates with matches for the new query.
|
|
||||||
5. **Press Shift+Tab.** **Expected:** highlight moves up one row, or wraps to the bottom row if it was at the top.
|
|
||||||
6. **Press Esc.** **Expected:** dropdown closes. Input value unchanged.
|
|
||||||
7. **Type a space then `god`.** **Expected:** dropdown stays hidden (we're past the first token).
|
|
||||||
8. **Press ArrowUp.** **Expected:** history recall works — input is replaced with a previously submitted command. No interference from autocomplete.
|
|
||||||
9. **Clear the input. Type `sv_che`.** Verify `sv_cheats` is highlighted in the dropdown. **Press Enter.** **Expected:** the server console receives `sv_che` (the typed text), not `sv_cheats`. Confirm in the console transcript.
|
|
||||||
10. **Refocus the input.** **Expected:** no second `srccfg-vocab.json` request (cached in module-scope promise).
|
|
||||||
11. **Click on a dropdown row with the mouse.** **Expected:** that row's command is inserted into the input.
|
|
||||||
12. **Editor regression check.** Navigate to a `.cfg` file in the editor (files view). Type `sv_`. **Expected:** editor's autocomplete still works exactly as before.
|
|
||||||
|
|
||||||
If all 12 pass, the feature is complete.
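Checklist items 1, 2, and 10 rely on the vocab request being memoized as a module-scope promise. The pattern can be sketched in isolation as below. `fakeFetchVocab` and `fetchCount` are stand-ins invented for this example (the real module calls `fetch(VOCAB_URL)`), so treat it as an illustration of the caching behavior rather than the module's code.

```javascript
// Stand-in for the network call; counts how many times it is invoked.
let fetchCount = 0;
function fakeFetchVocab() {
  fetchCount += 1;
  return Promise.resolve({ cvars: [{ name: "sv_cheats" }], commands: [] });
}

// Same shape as loadVocab(): the first call starts the request and
// every later call returns the same promise.
let vocabPromise = null;
function loadVocab() {
  if (vocabPromise) return vocabPromise;
  vocabPromise = fakeFetchVocab().catch((err) => {
    console.warn("vocab load failed", err);
    return null; // mirrors the module: failures resolve to null
  });
  return vocabPromise;
}

const first = loadVocab();
const second = loadVocab();
const samePromise = first === second;

first.then((vocab) => {
  console.log(samePromise, fetchCount, vocab.cvars[0].name); // true 1 sv_cheats
});
```

One consequence of caching the promise, including the `null` it resolves to on failure, is that a failed fetch is not retried until the page is reloaded.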
|
|
||||||
|
|
||||||
- [ ] **Step 3: Stop dev server (Ctrl+C) and confirm final commit state**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git log --oneline -10
|
|
||||||
git status
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: 6 new commits ahead of the pre-feature state; working tree clean.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Verification Summary
|
|
||||||
|
|
||||||
- **Unit tests:** `cd l4d2web/scripts/editor-src && node --test vocab-rank.test.js` — 10 passing tests for the ranker.
|
|
||||||
- **Manual editor regression:** Editor autocomplete still works on `.cfg` files.
|
|
||||||
- **Manual console smoke test:** 12-point checklist in Task 7 Step 2.
|
|
||||||
- **No new runtime JS dependencies** added (vocab-rank.test.js uses only `node:test` + `node:assert/strict`, which are built into Node ≥ 18).
|
|
||||||
|
|
||||||
## What's Explicitly Out of Scope
|
|
||||||
|
|
||||||
- Argument value completion (player names, map names) — would require runtime data, not in `srccfg-vocab.json`.
|
|
||||||
- Fuzzy / typo-tolerant matching.
|
|
||||||
- Replacing CodeMirror's editor dropdown with a custom widget.
|
|
||||||
- Cross-browser e2e automation (no Playwright/Cypress in the codebase; not adding one as part of this work).
|
|
||||||
|
|
@ -1,243 +0,0 @@
|
||||||
# Files-overlay E2E test handoff
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
The files-overlay rewrite (commits `4fa3964..8dc14f0`, May 2026)
|
|
||||||
moved all editor flows behind URL-addressable modals and split the
|
|
||||||
1091-line `files-overlay.js` monolith into four focused modules under
|
|
||||||
`l4d2web/l4d2web/static/js/files-overlay/`. Behavior was verified
|
|
||||||
step-by-step in Chromium during the rewrite, but there is no automated
|
|
||||||
browser regression coverage for the editor / dialog / upload flows.
|
|
||||||
|
|
||||||
The existing Playwright suite (`l4d2web/tests/e2e/test_editor.py`)
|
|
||||||
covers only the CodeMirror 6 controller — autocomplete, form-bridge,
|
|
||||||
copy/paste — invoked through a blueprint detail page. Nothing
|
|
||||||
exercises the file manager UI.
|
|
||||||
|
|
||||||
This handoff specifies what to add: fixture extensions, the test
|
|
||||||
cases worth writing, and the patterns / pitfalls a future implementer
|
|
||||||
should know before starting. Estimated effort: a focused half-day for
|
|
||||||
the seven critical cases, a full day for the full matrix.
|
|
||||||
|
|
||||||
## Goal
|
|
||||||
|
|
||||||
Lock down the user-visible behavior of the four files-overlay modules
|
|
||||||
against future regressions. The rewrite proved each module works in
|
|
||||||
isolation; e2e proves they cooperate over real DOM, real HTTP, real
|
|
||||||
HTMX, and real CodeMirror.
|
|
||||||
|
|
||||||
## Out of scope
|
|
||||||
|
|
||||||
- Re-testing pure CodeMirror behavior (the existing `test_editor.py`
|
|
||||||
covers this on a non-files page; the controller is the same one).
|
|
||||||
- Replacing the existing pytest route tests (`tests/test_overlay_files_routes.py`,
|
|
||||||
`tests/test_url_addressable_modals.py`). E2E adds *integration*
|
|
||||||
coverage on top of those, not in place of them.
|
|
||||||
- Performance / load testing of the upload queue (concurrency 3 is
|
|
||||||
the current behavior; testing it would need 4+ simultaneous uploads
|
|
||||||
and is high-flake, low-value).
|
|
||||||
- The drag-drop-from-OS path. Playwright can't synthesize a real OS
|
|
||||||
drag (`webkitGetAsEntry` returns `null` for synthetic drops, so the
|
|
||||||
fallback `getAsFile` branch always runs). The internal-drag path
|
|
||||||
(row → folder) is testable; the external drag fallback is covered
|
|
||||||
enough by the route tests.
|
|
||||||
|
|
||||||
## Fixture work
|
|
||||||
|
|
||||||
`l4d2web/tests/e2e/conftest.py` currently seeds only a `User` and a
|
|
||||||
`Blueprint`. The files-overlay tests need a files-type overlay with a
|
|
||||||
working filesystem root. Add a new fixture (or extend `live_server`):
|
|
||||||
|
|
||||||
```python
|
|
||||||
# tests/e2e/conftest.py

@pytest.fixture(scope="function")
def files_overlay_server(tmp_path, monkeypatch):
    """live_server + a files-type Overlay seeded with a small fixture
    set: one editable text file, one binary file, one nested folder
    with one file inside.

    Returns {base_url, user_id, overlay_id, overlay_root: Path}.
    """
    # Same boot as live_server (extract a helper to avoid duplication).
    # Set LEFT4ME_ROOT to tmp_path before create_app() so the files
    # overlay's path resolution lands under tmp_path.
    monkeypatch.setenv("LEFT4ME_ROOT", str(tmp_path))
    ...

    with session_scope() as session:
        user = User(username="alice", password_digest=hash_password("secret"), admin=False)
        session.add(user); session.flush()
        overlay = Overlay(name="cfgs", path="", type="files", user_id=user.id)
        session.add(overlay); session.flush()
        overlay.path = str(overlay.id)
        overlay_root = tmp_path / "overlays" / str(overlay.id)
        overlay_root.mkdir(parents=True)
        (overlay_root / "server.cfg").write_text("hostname \"left4me\"\n")
        (overlay_root / "icon.png").write_bytes(b"\x89PNG\r\n\x1a\n" + b"\x00" * 60)
        (overlay_root / "cfg").mkdir()
        (overlay_root / "cfg" / "admins.txt").write_text("STEAM_1:0:1\n")
        user_id, overlay_id = user.id, overlay.id
    ...
    yield {
        "base_url": ...,
        "user_id": user_id,
        "overlay_id": overlay_id,
        "overlay_root": overlay_root,
    }
|
|
||||||
```
|
|
||||||
|
|
||||||
The `LEFT4ME_ROOT` env-var monkey-patch is critical — without it,
|
|
||||||
`overlay_files.resolve_overlay_root` falls back to the production
|
|
||||||
`/var/lib/left4me` path (per the `AGENTS.md` "symptom-to-cause"
|
|
||||||
note) and every route returns 404. Set it BEFORE `create_app()`.
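
To make the ordering requirement concrete, here is a minimal sketch of env-var-driven root resolution. It is not the project's `overlay_files.resolve_overlay_root`: the name `sketch_overlay_root`, the `overlays/<ref>` layout (borrowed from the fixture), and the rejection of absolute refs are assumptions for illustration only.

```python
import os
from pathlib import Path, PurePosixPath
from typing import Mapping, Optional

# Production fallback named in this handoff; everything else is illustrative.
DEFAULT_ROOT = "/var/lib/left4me"


def sketch_overlay_root(overlay_ref: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical stand-in for the real resolver, for illustration only."""
    env = os.environ if env is None else env
    ref = PurePosixPath(overlay_ref)
    # Assumption: refs are stored relative (the fixture uses str(overlay.id)).
    if not overlay_ref or ref.is_absolute() or ".." in ref.parts:
        raise ValueError(f"overlay ref must be a relative path: {overlay_ref!r}")
    root = Path(env.get("LEFT4ME_ROOT", DEFAULT_ROOT))
    return root / "overlays" / overlay_ref
```

With no `LEFT4ME_ROOT` in the environment, a resolver shaped like this points at the production tree, where the seeded files do not exist. That would be consistent with the blanket 404s described above.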
|
|
||||||
|
|
||||||
## Test cases to add
|
|
||||||
|
|
||||||
Suggested file: `l4d2web/tests/e2e/test_files_overlay.py`. Pattern
|
|
||||||
each test like the existing `test_editor.py`: log in via the form,
|
|
||||||
navigate to `/overlays/<id>`, drive the UI through Playwright `page`
|
|
||||||
locators, assert on DOM state + filesystem state under
|
|
||||||
`overlay_root`.
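
For the filesystem half of those assertions, a small snapshot helper can keep the tests short. This is a sketch only; `snapshot_tree` is a hypothetical helper name, not something that exists in the codebase.

```python
from pathlib import Path
from typing import Optional


def snapshot_tree(root: Path) -> dict[str, Optional[bytes]]:
    """Map each relative POSIX path under root to its bytes (None for directories)."""
    snapshot: dict[str, Optional[bytes]] = {}
    for entry in sorted(root.rglob("*")):
        rel = entry.relative_to(root).as_posix()
        snapshot[rel] = entry.read_bytes() if entry.is_file() else None
    return snapshot
```

Comparing a snapshot taken before the UI action with one taken after makes rename and delete assertions explicit: the old key is gone, the new key is present, and unrelated entries are unchanged.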
|
|
||||||
|
|
||||||
### Tier 1 — critical paths (write these first)
|
|
||||||
|
|
||||||
1. **`test_edit_text_file_save_round_trip`**
|
|
||||||
- Click `server.cfg` filename. Wait for `#modal-content
|
|
||||||
textarea[data-rel-path="server.cfg"]`. URL should contain
|
|
||||||
`?modal=%2Foverlays%2F<id>%2Ffiles%2Fedit%3Fpath%3Dserver.cfg`.
|
|
||||||
- Modify content via Playwright `page.fill` on the textarea (or
|
|
||||||
via the `__filesEditor.setContent` controller for the CM6 case
|
|
||||||
— the existing `test_editor.py` shows both approaches).
|
|
||||||
- Click `.files-editor-save`. Modal closes (modal-container
|
|
||||||
`aria-modal` gone / `open` false).
|
|
||||||
- Assert `overlay_root / "server.cfg"` on disk has the new content.
|
|
||||||
|
|
||||||
2. **`test_create_new_file_routed`**
|
|
||||||
- Click `+ new file` on the overlay-root row. Wait for
|
|
||||||
`#modal-content textarea[data-rel-path=""]` and save button
|
|
||||||
labeled `Create`.
|
|
||||||
- Type a filename and content. Click Create.
|
|
||||||
- Assert file appears on disk + the file tree refreshes to show
|
|
||||||
the new row.
|
|
||||||
|
|
||||||
3. **`test_create_new_file_409_askConflict_keep_both`**
|
|
||||||
- Click `+ new file`. Type `cfg` as the filename (collides with
|
|
||||||
the seeded directory). Click Create.
|
|
||||||
- Wait for `#files-conflict-modal[open]`. Its
|
|
||||||
`.files-conflict-path` should read `cfg`.
|
|
||||||
- Click `[data-files-conflict-action="keep-both"]`.
|
|
||||||
- Assert the file `cfg (1)` appears on disk and the routed modal
|
|
||||||
closes.
|
|
||||||
- This is the path F4 (`8dc14f0`) added; without coverage it can
|
|
||||||
regress silently.
|
|
||||||
|
|
||||||
4. **`test_open_binary_file_renders_replace_ui`**
|
|
||||||
- Click `icon.png`. Modal opens.
|
|
||||||
- Assert `#modal-content .files-editor-binary[data-rel-path="icon.png"]`
|
|
||||||
exists, save button reads `Replace` and is disabled,
|
|
||||||
`.files-editor-replace-zone` and the download anchor are present.
|
|
||||||
|
|
||||||
5. **`test_binary_replace_via_browse_writes_new_bytes`**
|
|
||||||
- Open `icon.png` editor (as above).
|
|
||||||
- Click `.files-editor-replace-browse`. Use Playwright's
|
|
||||||
`page.expect_file_chooser()` to attach a small File buffer.
|
|
||||||
- Save button enables. Click it. Modal closes.
|
|
||||||
- Assert the file's bytes on disk are the new content.
|
|
||||||
|
|
||||||
6. **`test_new_folder_then_delete`**
|
|
||||||
- Click `+ new folder` on the overlay root. Inline dialog opens.
|
|
||||||
- Type a name, press Enter (keydown path). Dialog closes.
|
|
||||||
- Assert folder exists on disk + appears in tree.
|
|
||||||
- Click the folder's `✕`. Delete-confirm dialog opens with the
|
|
||||||
folder name. Click `.files-delete-confirm`.
|
|
||||||
- Assert folder gone from disk + from tree.
|
|
||||||
|
|
||||||
7. **`test_filename_rename_on_save`**
|
|
||||||
- Open `server.cfg`. Change the filename input to
|
|
||||||
`server-renamed.cfg`. Click Save.
|
|
||||||
- Assert disk has the new name + old name gone + tree row updated.
|
|
||||||
|
|
||||||
### Tier 2 — round out the matrix
|
|
||||||
|
|
||||||
8. **`test_drag_row_to_folder_moves_file`** — internal drag.
|
|
||||||
Playwright's `locator.drag_to()` can move a row onto a folder.
|
|
||||||
Assert the move via `/files/move` succeeded and disk reflects it.
|
|
||||||
|
|
||||||
9. **`test_upload_queue_progress`** — drop a single file onto the
|
|
||||||
tree root. The progress panel becomes visible; the row enters
|
|
||||||
`data-state="active"`, then `data-state="done"`. Assert the
|
|
||||||
uploaded file is on disk. (Skip the 409 / conflict / cancel
|
|
||||||
permutations — they're covered by the route tests.)
|
|
||||||
|
|
||||||
10. **`test_modal_close_on_escape_preserves_no_state`** — open the
|
|
||||||
routed editor, type some content but don't save, press Escape.
|
|
||||||
Modal closes. Reopen — content is fresh (no stale buffer),
|
|
||||||
`routedReplacement` cleared.
|
|
||||||
|
|
||||||
11. **`test_share_url_deep_link_reopens_editor`** — navigate
|
|
||||||
directly to `/overlays/<id>?modal=%2Foverlays%2F<id>%2Ffiles%2Fedit%3Fpath%3Dserver.cfg`.
|
|
||||||
Modal should auto-open on DOMContentLoaded (the bootstrap path
|
|
||||||
from `modals.js`). This is the URL-addressable spec's central
|
|
||||||
promise; without coverage it regresses easily.
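
The percent-encoded `?modal=` value used in cases 1 and 11 can be built rather than hand-typed. A minimal sketch, assuming the inner editor URL is encoded as one opaque value and that `rel_path` needs no extra escaping of its own (true for `server.cfg`; nested or unusual paths may need more care). `modal_deep_link` is a hypothetical helper name.

```python
from urllib.parse import quote


def modal_deep_link(base_url: str, overlay_id: int, rel_path: str) -> str:
    """Build the page URL whose ?modal= parameter points at the routed editor."""
    inner = f"/overlays/{overlay_id}/files/edit?path={rel_path}"
    # safe="" forces "/", "?", and "=" to be percent-encoded, matching the
    # %2F / %3F / %3D sequences in the expected URL.
    return f"{base_url}/overlays/{overlay_id}?modal={quote(inner, safe='')}"
```

A test could pass the result straight to `page.goto(...)` for the deep-link case, or compare it against `page.url` after clicking a filename.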
|
|
||||||
|
|
||||||
### Tier 3 — nice to have
|
|
||||||
|
|
||||||
12. Server detail page hover-download button (the F0 prefactor):
|
|
||||||
seed a server, navigate to `/servers/<id>`, hover a file row,
|
|
||||||
click the `⬇` button, assert a file download initiates.
|
|
||||||
|
|
||||||
## Patterns to follow / pitfalls
|
|
||||||
|
|
||||||
- **The existing `test_editor.py` is the closest pattern.** Read it
|
|
||||||
end-to-end before starting. The login helper, the `live_server`
|
|
||||||
fixture shape, the `expect`-based assertions, and the way
|
|
||||||
Playwright interacts with the CM6 controller (`page.evaluate(...)`
|
|
||||||
on `window.__filesEditor`) all transfer.
|
|
||||||
- **Run with `uv run pytest -m e2e tests/e2e/test_files_overlay.py`.**
|
|
||||||
Anything else crashes Chromium under the macOS sandbox.
|
|
||||||
`uv run playwright install chromium` once per fresh checkout.
|
|
||||||
- **Routed modals load via `htmx.ajax` — they're async.** Don't assert
|
|
||||||
immediately after the click. Use `expect(page.locator(...)).to_be_visible()`
|
|
||||||
with a timeout (Playwright's default 5s is fine).
|
|
||||||
- **Reading the file tree after a refresh is also async.** The JS
|
|
||||||
`scheduleRefresh` debounces by 50ms then fetches the directory
|
|
||||||
partial via HTMX. Use `expect(page.locator(".file-tree-row-file[data-target-path='...']")).to_be_visible()`
|
|
||||||
rather than polling DOM directly.
|
|
||||||
- **`data-rel-path` lives on the textarea in text mode and on the
|
|
||||||
binary panel in binary mode.** Tests asserting "the editor opened
|
|
||||||
for X" should query whichever matches — or use the fragment
|
|
||||||
wrapper `#files-editor-fragment` as a stable container.
|
|
||||||
- **The conflict dialog is inline, not routed.** Don't expect URL
|
|
||||||
changes when it opens. The decision tree:
|
|
||||||
- "Did the URL change?" → routed modal (editor) vs. inline modal
|
|
||||||
(new-folder, conflict, delete-confirm).
|
|
||||||
- **`SESSION_COOKIE_SECURE=0` is non-optional.** The fixture must set
|
|
||||||
it; otherwise the browser drops the session cookie over http and
|
|
||||||
every test redirects back to `/login`. The existing `conftest.py`
|
|
||||||
has the right pattern at line 39.
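
The "did the URL change?" check in the decision tree above can be expressed as a small helper that inspects the page URL. A sketch only; `routed_modal_path` is a hypothetical name, and it assumes the routed pipeline always records its target in the `modal` query parameter.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def routed_modal_path(page_url: str) -> Optional[str]:
    """Return the decoded ?modal= target, or None when no routed modal is addressed."""
    values = parse_qs(urlsplit(page_url).query).get("modal")
    return values[0] if values else None
```

After opening the editor, a test would expect a non-`None` result from `routed_modal_path(page.url)`; after opening the conflict or delete-confirm dialog, the value should be the same as it was before the dialog opened.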
|
|
||||||
|
|
||||||
## Verification
|
|
||||||
|
|
||||||
Per AGENTS.md: `uv run pytest -m e2e tests/e2e/test_files_overlay.py -v`.
|
|
||||||
The seven tier-1 cases should pass in <60s on a warm Chromium.
|
|
||||||
The full matrix (12 cases) targets <2 minutes.
|
|
||||||
|
|
||||||
When wiring CI / pre-push hooks: the e2e marker is excluded from the
|
|
||||||
default fast suite, so the existing 580-passing `uv run pytest tests/`
|
|
||||||
run remains the quick check. The e2e suite runs explicitly when
|
|
||||||
`-m e2e` is set.
|
|
||||||
|
|
||||||
## References
|
|
||||||
|
|
||||||
- `l4d2web/tests/e2e/test_editor.py` — pattern model
|
|
||||||
- `l4d2web/tests/e2e/conftest.py:39` — `SESSION_COOKIE_SECURE` note
|
|
||||||
- `l4d2web/tests/test_url_addressable_modals.py` — non-browser route
|
|
||||||
tests that already cover the server-side contract (200/404/415/400
|
|
||||||
on edit, new, save). E2E shouldn't duplicate these.
|
|
||||||
- `l4d2web/l4d2web/static/js/files-overlay/{core,editor,dialogs,uploads}.js`
|
|
||||||
— read each file's module header comment for the listener layout
|
|
||||||
before writing assertions.
|
|
||||||
- `AGENTS.md` "Files overlay: module layout" — high-level orientation.
|
|
||||||
- `AGENTS.md` "Modals: inline vs routed" — decision tree the test
|
|
||||||
matrix follows.
|
|
||||||
File diff suppressed because it is too large
|
|
@ -1,973 +0,0 @@
|
||||||
# URL-Addressable Modals Implementation Plan
|
|
||||||
|
|
||||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
|
||||||
|
|
||||||
**Goal:** Pilot the swift3-style URL-addressable modal pattern in left4me by migrating the file editor's open/render flow. Same URL renders as a full page or a layoutless fragment based on an `HX-Modal: 1` request header. Save flow stays AJAX.
|
|
||||||
|
|
||||||
**Architecture:** Approach C (Hybrid). Custom ~50-line `modal-router.js` owns click intercept, `?modal=<path>` URL composition, history, and native `<dialog>` open/close. HTMX (already loaded) owns fetch + swap + loading state. Jinja `inject_base_layout` context processor switches between `base.html` and `_modal_partial.html` based on the header.
|
|
||||||
|
|
||||||
**Tech Stack:** Flask 3.x + Jinja2, HTMX 2.0.4, native `<dialog>`, CodeMirror 6 (already bundled as `editor.bundle.js`), pytest for backend tests, Chromium for frontend verification.
|
|
||||||
|
|
||||||
**Spec reference:** `docs/superpowers/specs/2026-05-17-url-addressable-modals-design.md`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Errata (post-execution)
|
|
||||||
|
|
||||||
The plan shipped in 14 commits, all on 2026-05-17. Three defects in the verbatim plan code were caught by code review during execution; if you re-run this plan, watch for them:
|
|
||||||
|
|
||||||
1. **Task 1, Step 4 — context processor needs a `has_request_context()` guard.** Plan code reads `request.headers.get("HX-Modal")` unconditionally, but `tests/test_timeago.py` renders templates inside `app.app_context()` only (no request context). Without the guard the processor crashes with `RuntimeError: Working outside of request context`. Fix: `is_modal = has_request_context() and request.headers.get("HX-Modal") == "1"` (lazy import `from flask import has_request_context` is fine). Shipped in commit `82c3f04`.
|
|
||||||
|
|
||||||
2. **Task 3, Step 1 — test fixture must respect `LEFT4ME_ROOT`.** Plan code uses `path=str(overlay_root)` (absolute filesystem path) on the `Overlay` model. The codebase resolves `overlay.path` relative to `LEFT4ME_ROOT` via `validate_overlay_ref` and rejects absolute paths. Fix: `monkeypatch.setenv("LEFT4ME_ROOT", str(tmp_path))`, write files to `tmp_path/overlays/<id>/`, set `overlay.path = str(overlay.id)`. Mirrors `tests/test_overlay_files_routes.py`'s convention. Shipped in commit `60e7968`.
|
|
||||||
|
|
||||||
3. **Task 9, Step 2 — "save flow unchanged" was wrong.** The legacy save/delete handlers in `files-overlay.js` are direct-bound to `editorEls.saveBtn` / `editorEls.deleteBtn` (the inline dialog's specific elements), not document-delegated. The new server-rendered modal's identical-class buttons get no handler. Fix: add document-level event delegation for `.files-editor-save` and `.files-editor-delete` clicks gated on `modalContent.contains(btn)`, read `data-rel-path` from the textarea (NOT from a JS var the now-deleted open path used to set), use `window.__filesEditor.getValue()`, POST + `closeModal()` + `scheduleRefresh(parentOf(path))`. Also support rename: read filename input, compose `payload.new_path = parent/filename` when changed, handle 409 with alert + keep modal open. Shipped across commits `64cf203` and `33a2e52`.
|
|
||||||
|
|
||||||
## Tasks added during execution
|
|
||||||
|
|
||||||
Three tasks were inserted that weren't in the original plan:
|
|
||||||
|
|
||||||
- **Task 8.5 (commit `f6b8ecf`)** — `overlay_file_editor.html`'s `<dialog open>` nested inside `<dialog id="modal-container">` collapses to 2 px tall in browsers. Replaced with `<div role="document">`. Bundled with CM6 `controller.destroy()` on modal close (memory leak fix — every open/close cycle had been orphaning an `EditorView` and a `matchMedia` listener) and a `mountOne` idempotency guard. CSS broadened: `dialog.modal, div.modal`.
|
|
||||||
- **Task 8.5b (commit `7829d1c`)** — the broadened CSS caused double-card painting (outer dialog + inner div both matched the `.modal` styling). Dropped `class="modal modal-wide"` and `role="document"` from the inner div; the outer dialog owns the chrome.
|
|
||||||
- **Task 9b (commit `33a2e52`)** — see defect #3 above for rename-on-save support.
|
|
||||||
|
|
||||||
## Design refinement during execution (Task 6 superseded)
|
|
||||||
|
|
||||||
Task 6's original "every close source updates state directly" code was replaced with a close-event-centric design: every close source (Esc cancel, backdrop click, `[data-modal-dismiss]`, browser back, `htmx:responseError`, programmatic close) just calls `dialog.close()`, and a single `close`-event listener clears `currentModalPath` and removes `?modal=` from the URL. This kills two latent bugs simultaneously: (a) the legacy `modal.js:31-33` backdrop handler closes `dialog.modal` without clearing URL, and (b) HTMX's `htmx.ajax` resolves on 4xx so plain `.then(() => showModal())` would open a modal on error responses. Shipped in commit `6e66375`. The revised design is in that commit's diff.
|
|
||||||
|
|
||||||
## Post-pilot polish (commits 5dc4xx after Task 10)
|
|
||||||
|
|
||||||
- Removed dangling `aria-labelledby="modal-content-title"` from `#modal-container` in `base.html` (referenced an id that never existed).
|
|
||||||
- Renamed the new editor template's outer `<div>` id from `files-editor-modal` to `files-editor-fragment` to resolve a duplicate-id W3C violation with the legacy inline `<dialog id="files-editor-modal">` in `overlay_detail.html`. Updated `editor.js`'s `closest()` to match both selectors so auto-language detection works for both modal pipelines.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## File Structure
|
|
||||||
|
|
||||||
| Path | New / Modify | Responsibility |
|
|
||||||
|------|--------------|----------------|
|
|
||||||
| `l4d2web/l4d2web/app.py` | Modify (insert ~5 lines after `add_template_filter`) | Register `inject_base_layout` context processor |
|
|
||||||
| `l4d2web/l4d2web/templates/_modal_partial.html` | New (1 line) | Layoutless base template — just `{% block content %}{% endblock %}` |
|
|
||||||
| `l4d2web/l4d2web/templates/overlay_file_editor.html` | New | Editor markup lifted from `overlay_detail.html:165-228`, content pre-filled, extends `base_layout` |
|
|
||||||
| `l4d2web/l4d2web/routes/files_routes.py` | Modify (add one route, ~30 lines) | `GET /overlays/<id>/files/edit?path=<rel>` |
|
|
||||||
| `l4d2web/l4d2web/templates/base.html` | Modify (insert ~3 lines) | Persistent `<dialog id="modal-container">` slot + `modal-router.js` script include |
|
|
||||||
| `l4d2web/l4d2web/static/js/modal-router.js` | New (~60 lines) | Click intercept, URL composition, history, open/close, bootstrap |
|
|
||||||
| `l4d2web/l4d2web/static/js/editor.js` | Modify (expose `initEditors(root)`, add `htmx:afterSwap` listener) | CM6 re-init after HTMX swap |
|
|
||||||
| `l4d2web/l4d2web/static/js/files-overlay.js` | Modify (change one code path) | Replace inline-dialog populate-and-show with `window.openModal(url)` |
|
|
||||||
| `l4d2web/l4d2web/templates/overlay_detail.html` | Modify (remove `<dialog id="files-editor-modal">` block at lines 165-228) | Delete the old inline editor dialog |
|
|
||||||
| `l4d2web/tests/test_url_addressable_modals.py` | New | pytest coverage for context processor + new edit route |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 1: Layout context processor + partial template
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Create: `l4d2web/l4d2web/templates/_modal_partial.html`
|
|
||||||
- Modify: `l4d2web/l4d2web/app.py` (insert after `app.add_template_filter(format_time_html, "timeago")` on line 62)
|
|
||||||
- Test: `l4d2web/tests/test_url_addressable_modals.py` (new)
|
|
||||||
|
|
||||||
- [ ] **Step 1: Write the failing test**
|
|
||||||
|
|
||||||
Create `l4d2web/tests/test_url_addressable_modals.py`:
|
|
||||||
|
|
||||||
```python
|
|
||||||
from flask import render_template_string

from l4d2web.app import create_app


def _make_app(tmp_path, monkeypatch, db_name: str):
    db_url = f"sqlite:///{tmp_path/db_name}"
    monkeypatch.setenv("DATABASE_URL", db_url)
    return create_app({"TESTING": True, "DATABASE_URL": db_url, "SECRET_KEY": "test"})


def test_base_layout_is_modal_partial_when_hx_modal_header_set(tmp_path, monkeypatch):
    app = _make_app(tmp_path, monkeypatch, "layout-modal.db")
    with app.test_request_context("/", headers={"HX-Modal": "1"}):
        assert render_template_string("{{ base_layout }}") == "_modal_partial.html"


def test_base_layout_is_base_html_for_normal_request(tmp_path, monkeypatch):
    app = _make_app(tmp_path, monkeypatch, "layout-default.db")
    with app.test_request_context("/"):
        assert render_template_string("{{ base_layout }}") == "base.html"


def test_base_layout_does_not_react_to_plain_hx_request_header(tmp_path, monkeypatch):
    # HTMX sets HX-Request on every request including the build-status poll;
    # only HX-Modal should switch the layout.
    app = _make_app(tmp_path, monkeypatch, "layout-hxreq.db")
    with app.test_request_context("/", headers={"HX-Request": "true"}):
        assert render_template_string("{{ base_layout }}") == "base.html"
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 2: Run test to verify it fails**
|
|
||||||
|
|
||||||
Run: `cd l4d2web && uv run pytest tests/test_url_addressable_modals.py -v`
|
|
||||||
|
|
||||||
Expected: 3 failures (all asserting that `base_layout` resolves to something — currently undefined, so render fails with `UndefinedError` or returns empty string).
|
|
||||||
|
|
||||||
- [ ] **Step 3: Create the partial template**
|
|
||||||
|
|
||||||
Create `l4d2web/l4d2web/templates/_modal_partial.html` with exactly this content:
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
{% block content %}{% endblock %}
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 4: Register the context processor**
|
|
||||||
|
|
||||||
In `l4d2web/l4d2web/app.py`, insert immediately after line 62 (`app.add_template_filter(format_time_html, "timeago")`):
|
|
||||||
|
|
||||||
```python
|
|
||||||
@app.context_processor
def inject_base_layout() -> dict[str, str]:
    is_modal = request.headers.get("HX-Modal") == "1"
    return {"base_layout": "_modal_partial.html" if is_modal else "base.html"}
|
|
||||||
```
|
|
||||||
|
|
||||||
`request` is already imported at the top of the file.
|
|
||||||
|
|
||||||
- [ ] **Step 5: Run tests to verify pass**
|
|
||||||
|
|
||||||
Run: `cd l4d2web && uv run pytest tests/test_url_addressable_modals.py -v`
|
|
||||||
|
|
||||||
Expected: 3 passed.
|
|
||||||
|
|
||||||
- [ ] **Step 6: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/l4d2web/app.py l4d2web/l4d2web/templates/_modal_partial.html l4d2web/tests/test_url_addressable_modals.py
|
|
||||||
git commit -m "$(cat <<'EOF'
|
|
||||||
feat(modals): layout context processor for HX-Modal header
|
|
||||||
|
|
||||||
Switches the Jinja base layout to _modal_partial.html (yield-only) when
|
|
||||||
the HX-Modal:1 request header is set, otherwise base.html. Foundation
|
|
||||||
for URL-addressable modals (spec 2026-05-17-url-addressable-modals).
|
|
||||||
|
|
||||||
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
|
||||||
EOF
|
|
||||||
)"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 2: Editor template (file editor as standalone page)
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Create: `l4d2web/l4d2web/templates/overlay_file_editor.html`
|
|
||||||
- Test: covered by Task 3's route tests (template is unreachable until then)
|
|
||||||
|
|
||||||
This task is a lift-and-shift of the editor markup from `overlay_detail.html:165-228` into its own template with server-side content variables substituted in.
|
|
||||||
|
|
||||||
- [ ] **Step 1: Read the source markup to lift**
|
|
||||||
|
|
||||||
Run: `sed -n '164,228p' l4d2web/l4d2web/templates/overlay_detail.html`
|
|
||||||
|
|
||||||
Note the surrounding `{% if files_can_edit %}` guard — that gating moves to the route (only `files` overlays expose the link). The template itself unconditionally renders the editor.
|
|
||||||
|
|
||||||
- [ ] **Step 2: Create the new template**
|
|
||||||
|
|
||||||
Create `l4d2web/l4d2web/templates/overlay_file_editor.html`:
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
{% extends base_layout %}
|
|
||||||
{% block title %}Edit {{ rel_path }} · {{ overlay.name }}{% endblock %}
|
|
||||||
{% block extra_head %}{% include "_editor_assets.html" %}{% endblock %}
|
|
||||||
{% block content %}
|
|
||||||
<dialog id="files-editor-modal" class="modal modal-wide" open aria-labelledby="files-editor-title">
|
|
||||||
<div class="modal-header">
|
|
||||||
<h2 id="files-editor-title" class="files-editor-path">
|
|
||||||
<span class="files-editor-title-text">{{ rel_path }}</span>
|
|
||||||
</h2>
|
|
||||||
<button type="button" class="modal-close" data-modal-dismiss aria-label="Close">×</button>
|
|
||||||
</div>
|
|
||||||
<div class="modal-body">
|
|
||||||
<label class="files-editor-field">
|
|
||||||
<span class="files-field-label">Filename</span>
|
|
||||||
<input type="text" class="files-editor-filename" data-editor-filename autocomplete="off" spellcheck="false" value="{{ rel_path }}">
|
|
||||||
</label>
|
|
||||||
<p class="files-editor-rename-hint" hidden>↻ Save will rename <code class="files-rename-from"></code> → <code class="files-rename-to"></code>.</p>
|
|
||||||
|
|
||||||
<div class="files-editor-text">
|
|
||||||
<label class="files-editor-field files-editor-language-field">
|
|
||||||
<span class="files-field-label">Language</span>
|
|
||||||
<select data-editor-language-select aria-label="Editor language">
|
|
||||||
<option value="auto">auto (from filename)</option>
|
|
||||||
<option value="srccfg">srccfg (.cfg)</option>
|
|
||||||
<option value="bash">bash (.sh)</option>
|
|
||||||
<option value="plain">plain</option>
|
|
||||||
</select>
|
|
||||||
</label>
|
|
||||||
<label class="files-editor-field">
|
|
||||||
<span class="files-field-label">Content</span>
|
|
||||||
<div class="editor-mount" style="--editor-rows: 14"><textarea class="files-editor-content" rows="14" spellcheck="false" data-editor-language="auto" data-overlay-id="{{ overlay.id }}" data-rel-path="{{ rel_path }}">{{ content }}</textarea></div>
|
|
||||||
</label>
|
|
||||||
<div class="files-editor-meta muted">
|
|
||||||
<span class="files-editor-byte-count">UTF-8 · {{ byte_count }} bytes</span>
|
|
||||||
<span>Ctrl+S to save</span>
|
|
||||||
</div>
|
|
||||||
</div>
|
|
||||||
</div>
|
|
||||||
<div class="modal-footer files-editor-footer">
|
|
||||||
<button type="button" class="danger-outline files-editor-delete">Delete</button>
|
|
||||||
<span class="files-editor-footer-spacer"></span>
|
|
||||||
<a class="button-secondary files-editor-download" href="/overlays/{{ overlay.id }}/files/download?path={{ rel_path|urlencode }}">⬇ Download</a>
|
|
||||||
<button type="button" class="button-secondary" data-modal-dismiss>Cancel</button>
|
|
||||||
<button type="button" class="files-editor-save">Save</button>
|
|
||||||
</div>
|
|
||||||
</dialog>
|
|
||||||
{% endblock %}
|
|
||||||
```
|
|
||||||
|
|
||||||
Notes baked into the markup:
|
|
||||||
- `{% extends base_layout %}` — picks `_modal_partial.html` or `base.html` based on the request header
|
|
||||||
- `<dialog … open>` for the full-page render — when standalone, the dialog stays open without `showModal()`. When fragment-rendered into the modal slot, `modal-router.js` calls `showModal()` on the *outer* `#modal-container` (not this inner dialog — see Task 4)
|
|
||||||
- `data-modal-dismiss` on close buttons — picked up by modal-router (deferred to Task 6)
|
|
||||||
- `data-overlay-id` + `data-rel-path` on the textarea — so the AJAX save in `files-overlay.js` can find its target without depending on global state
|
|
||||||
- Binary-file replacement UI from `overlay_detail.html:204-219` is **omitted** from this pilot template. Editable-only files reach this route (the route returns 415 for non-editable per Task 3). Binary replace stays inline-modal for now (out of pilot scope)
|
|
||||||
|
|
||||||
- [ ] **Step 3: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/l4d2web/templates/overlay_file_editor.html
|
|
||||||
git commit -m "$(cat <<'EOF'
|
|
||||||
feat(modals): editor template that extends base_layout
|
|
||||||
|
|
||||||
Lifts the file editor markup out of overlay_detail.html into its own
|
|
||||||
template with server-side filename, content, byte count, and download
|
|
||||||
URL pre-filled. Uses {% extends base_layout %} so the same template
|
|
||||||
renders as either a full page or a layoutless modal fragment.
|
|
||||||
|
|
||||||
Binary replace UI deferred — pilot scope is editable text files only.
|
|
||||||
|
|
||||||
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
|
||||||
EOF
|
|
||||||
)"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 3: New GET `/overlays/<id>/files/edit` route
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/l4d2web/routes/files_routes.py` (add one route, ~35 lines)
|
|
||||||
- Test: `l4d2web/tests/test_url_addressable_modals.py` (extend)
|
|
||||||
|
|
||||||
The route mirrors the existing `overlay_file_content` at `files_routes.py:203-234`: resolves the path, checks editability, reads UTF-8 content. Difference: returns HTML (via `overlay_file_editor.html`) instead of JSON.
|
|
||||||
|
|
||||||
- [ ] **Step 1: Write the failing tests**
|
|
||||||
|
|
||||||
Append to `l4d2web/tests/test_url_addressable_modals.py`:
|
|
||||||
|
|
||||||
```python
|
|
||||||
from datetime import UTC, datetime

from l4d2web.auth import hash_password
from l4d2web.db import init_db, session_scope
from l4d2web.models import Overlay, User


def _auth_client_with_files_overlay(tmp_path, monkeypatch, db_name: str):
    db_url = f"sqlite:///{tmp_path/db_name}"
    monkeypatch.setenv("DATABASE_URL", db_url)
    app = create_app({"TESTING": True, "DATABASE_URL": db_url, "SECRET_KEY": "test"})
    init_db()

    overlay_root = tmp_path / "overlay_root"
    overlay_root.mkdir()
    (overlay_root / "server.cfg").write_text("hostname \"left4me\"\nrcon_password \"hunter2\"\n", encoding="utf-8")

    with session_scope() as session:
        user = User(username="alice", password_digest=hash_password("secret"), admin=False)
        session.add(user)
        session.flush()
        overlay = Overlay(name="cfgs", path=str(overlay_root), type="files", user_id=user.id)
        session.add(overlay)
        session.flush()
        user_id = user.id
        overlay_id = overlay.id

    client = app.test_client()
    with client.session_transaction() as sess:
        sess["user_id"] = user_id
        sess["pw_changed_at"] = datetime.now(UTC).isoformat()
    return client, overlay_id


def test_edit_route_renders_full_page_without_modal_header(tmp_path, monkeypatch):
    client, overlay_id = _auth_client_with_files_overlay(tmp_path, monkeypatch, "edit-full.db")
    response = client.get(f"/overlays/{overlay_id}/files/edit?path=server.cfg")
    text = response.get_data(as_text=True)

    assert response.status_code == 200
    assert "<!doctype html>" in text.lower()  # full base.html rendered
    assert 'href="/dashboard"' in text  # nav present
    assert 'class="files-editor-content"' in text
    assert 'rcon_password' in text  # content pre-filled


def test_edit_route_renders_fragment_with_modal_header(tmp_path, monkeypatch):
    client, overlay_id = _auth_client_with_files_overlay(tmp_path, monkeypatch, "edit-fragment.db")
    response = client.get(
        f"/overlays/{overlay_id}/files/edit?path=server.cfg",
        headers={"HX-Modal": "1"},
    )
    text = response.get_data(as_text=True)

    assert response.status_code == 200
    assert "<html" not in text  # layoutless
    assert 'class="primary-nav"' not in text
    assert 'class="files-editor-content"' in text
    assert "hostname" in text  # content pre-filled


def test_edit_route_404s_for_missing_file(tmp_path, monkeypatch):
    client, overlay_id = _auth_client_with_files_overlay(tmp_path, monkeypatch, "edit-404.db")
    response = client.get(f"/overlays/{overlay_id}/files/edit?path=nonexistent.cfg")
    assert response.status_code == 404


def test_edit_route_415s_for_non_editable_file(tmp_path, monkeypatch):
    client, overlay_id = _auth_client_with_files_overlay(tmp_path, monkeypatch, "edit-415.db")
    # Forge a non-editable file by writing binary garbage.
    with session_scope() as s:
|
|
||||||
overlay = s.query(Overlay).filter_by(id=overlay_id).one()
|
|
||||||
from pathlib import Path
|
|
||||||
Path(overlay.path).joinpath("blob.bin").write_bytes(b"\x00\x01\x02\x03" * 1024)
|
|
||||||
|
|
||||||
response = client.get(f"/overlays/{overlay_id}/files/edit?path=blob.bin")
|
|
||||||
assert response.status_code == 415
|
|
||||||
|
|
||||||
|
|
||||||
def test_edit_route_400s_for_path_traversal(tmp_path, monkeypatch):
|
|
||||||
client, overlay_id = _auth_client_with_files_overlay(tmp_path, monkeypatch, "edit-400.db")
|
|
||||||
response = client.get(f"/overlays/{overlay_id}/files/edit?path=../../etc/passwd")
|
|
||||||
assert response.status_code == 400
|
|
||||||
|
|
||||||
|
|
||||||
def test_edit_route_404s_for_non_files_overlay(tmp_path, monkeypatch):
|
|
||||||
db_url = f"sqlite:///{tmp_path/'edit-script-overlay.db'}"
|
|
||||||
monkeypatch.setenv("DATABASE_URL", db_url)
|
|
||||||
app = create_app({"TESTING": True, "DATABASE_URL": db_url, "SECRET_KEY": "test"})
|
|
||||||
init_db()
|
|
||||||
with session_scope() as s:
|
|
||||||
user = User(username="alice", password_digest=hash_password("secret"), admin=False)
|
|
||||||
s.add(user)
|
|
||||||
s.flush()
|
|
||||||
overlay = Overlay(name="scripted", path=str(tmp_path), type="script", user_id=user.id)
|
|
||||||
s.add(overlay)
|
|
||||||
s.flush()
|
|
||||||
user_id = user.id
|
|
||||||
overlay_id = overlay.id
|
|
||||||
client = app.test_client()
|
|
||||||
with client.session_transaction() as sess:
|
|
||||||
sess["user_id"] = user_id
|
|
||||||
sess["pw_changed_at"] = datetime.now(UTC).isoformat()
|
|
||||||
|
|
||||||
response = client.get(f"/overlays/{overlay_id}/files/edit?path=anything.cfg")
|
|
||||||
assert response.status_code == 404
|
|
||||||
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `cd l4d2web && uv run pytest tests/test_url_addressable_modals.py -v -k edit_route`

Expected: 4 failures and 2 passes. The route doesn't exist yet, so every request returns 404; the two tests that assert a 404 (`test_edit_route_404s_for_missing_file` and `test_edit_route_404s_for_non_files_overlay`) pass vacuously until the route lands in Step 3.

- [ ] **Step 3: Add the route**

In `l4d2web/l4d2web/routes/files_routes.py`, append immediately after the `overlay_file_content` function (line 234):

```python
@bp.get("/overlays/<int:overlay_id>/files/edit")
@require_login
def overlay_file_edit_page(overlay_id: int):
    """Server-rendered editor page. Renders full-page by default or as a
    layoutless modal fragment when the HX-Modal header is set (see the
    inject_base_layout context processor in app.py)."""
    user = current_user()
    assert user is not None
    sub_path = request.args.get("path", "")

    result = _load_files_overlay(overlay_id, user)
    if isinstance(result, Response):
        return result
    overlay = result

    try:
        target = safe_resolve_for_listing(overlay.path, sub_path)
    except ValueError:
        return Response("invalid path", status=400)

    if not target.exists() or not target.is_file():
        return Response(status=404)
    if not is_editable(target):
        return Response("not editable", status=415)

    try:
        content = target.read_text(encoding="utf-8")
    except OSError:
        return Response("read failed", status=500)
    except UnicodeDecodeError:
        return Response("not editable", status=415)

    return render_template(
        "overlay_file_editor.html",
        overlay=overlay,
        rel_path=sub_path,
        content=content,
        byte_count=len(content.encode("utf-8")),
    )
```
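
The order of checks in the route determines which status each test in Step 1 sees. The following is a minimal, framework-free sketch of that ordering. `resolve_inside` and `looks_editable` are hypothetical stand-ins for the project's `safe_resolve_for_listing` and `is_editable`, whose real rules are not shown in this plan; the size limit and the NUL-byte heuristic are assumptions made only for illustration.

```python
import tempfile
from pathlib import Path


def resolve_inside(root: Path, sub_path: str) -> Path:
    """Hypothetical stand-in for safe_resolve_for_listing: reject escapes."""
    root = root.resolve()
    target = (root / sub_path).resolve()
    if not target.is_relative_to(root):
        raise ValueError("path escapes overlay root")
    return target


def looks_editable(target: Path, limit: int = 1_000_000) -> bool:
    """Hypothetical stand-in for is_editable: small and free of NUL bytes."""
    data = target.read_bytes()[: limit + 1]
    return len(data) <= limit and b"\x00" not in data


def edit_status(root: Path, sub_path: str) -> int:
    """Return the HTTP status the edit route would answer with."""
    try:
        target = resolve_inside(root, sub_path)
    except ValueError:
        return 400  # traversal is rejected before the filesystem is probed
    if not target.is_file():
        return 404
    if not looks_editable(target):
        return 415
    try:
        target.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        return 415
    except OSError:
        return 500
    return 200


def demo() -> dict[str, int]:
    """Rebuild the fixtures from the Step 1 tests in a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "server.cfg").write_text('hostname "left4me"\n', encoding="utf-8")
        (root / "blob.bin").write_bytes(b"\x00\x01\x02\x03" * 1024)
        paths = ("server.cfg", "nonexistent.cfg", "blob.bin", "../../etc/passwd")
        return {p: edit_status(root, p) for p in paths}
```

Checking traversal first means a request for `../../etc/passwd` is answered with 400 without revealing whether the file exists, which is the behavior `test_edit_route_400s_for_path_traversal` pins down.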

- [ ] **Step 4: Run tests to verify pass**

Run: `cd l4d2web && uv run pytest tests/test_url_addressable_modals.py -v`

Expected: 9 passed (3 from Task 1 + 6 new).

- [ ] **Step 5: Smoke-test the existing test suite for regressions**

Run: `cd l4d2web && uv run pytest tests/ -v --tb=short -q`

Expected: all tests pass. The context processor adds `base_layout` to every template render; existing templates ignore it (they all use `{% extends "base.html" %}` literally), so behavior is unchanged.
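
The layout decision that Step 5 relies on can be stated as a small pure function. This is a sketch under assumptions: the real logic lives in the `inject_base_layout` context processor from Task 1 and may differ, `modal_fragment.html` is a hypothetical template name, and the headers are modeled as a plain mapping rather than a Flask request.

```python
from collections.abc import Mapping

FULL_LAYOUT = "base.html"
FRAGMENT_LAYOUT = "modal_fragment.html"  # hypothetical name, for illustration only


def pick_base_layout(headers: Mapping[str, str]) -> str:
    """Choose the layout a template should extend for this request."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    if normalised.get("hx-modal") == "1":
        return FRAGMENT_LAYOUT
    # Everything else, including ordinary htmx requests that only carry
    # HX-Request, keeps the full page layout.
    return FULL_LAYOUT
```

Keying on the dedicated `HX-Modal` header rather than `HX-Request` is what keeps background htmx polls rendering as they did before.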

- [ ] **Step 6: Commit**

```bash
git add l4d2web/l4d2web/routes/files_routes.py l4d2web/tests/test_url_addressable_modals.py
git commit -m "$(cat <<'EOF'
feat(modals): GET /overlays/<id>/files/edit route

Server-renders the file editor as a real page. With HX-Modal:1 returns
a layoutless fragment for modal embedding; without it returns the full
standalone page. Mirrors overlay_file_content's path/editability checks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

## Task 4: Persistent modal slot in base.html

**Files:**
- Modify: `l4d2web/l4d2web/templates/base.html`

The slot is a sibling of `<main>`, sitting at body scope so backdrop renders over everything.

- [ ] **Step 1: Add the slot and script include**

In `l4d2web/l4d2web/templates/base.html`, modify the body section. After the closing `</main>` (currently line 39), insert the modal slot. After the `<script src="…/modal.js">` line (currently line 43), add the modal-router include.

The body section should look like this after the edit:

```html
  <main class="container">
    {% block content %}{% endblock %}
  </main>
  <dialog id="modal-container" class="modal modal-wide" aria-labelledby="modal-content-title">
    <div id="modal-content"></div>
  </dialog>
  <script src="{{ url_for('static', filename='vendor/htmx.min.js') }}"></script>
  <script src="{{ url_for('static', filename='js/csrf.js') }}"></script>
  <script src="{{ url_for('static', filename='js/sse.js') }}"></script>
  <script src="{{ url_for('static', filename='js/modal.js') }}"></script>
  <script src="{{ url_for('static', filename='js/modal-router.js') }}"></script>
  <script src="{{ url_for('static', filename='js/file-tree.js') }}"></script>
  <script src="{{ url_for('static', filename='js/password-reveal.js') }}"></script>
  <script defer src="{{ url_for('static', filename='js/console-history.js') }}"></script>
</body>
```

- [ ] **Step 2: Create an empty modal-router.js stub**

So the new `<script src>` doesn't 404. Create `l4d2web/l4d2web/static/js/modal-router.js`:

```javascript
// URL-addressable modal router (see docs/superpowers/specs/2026-05-17-url-addressable-modals-design.md).
// Implementation lands in subsequent tasks; this stub keeps base.html's
// script include from 404'ing during the staged rollout.
(function () {
  "use strict";
})();
```

- [ ] **Step 3: Run existing tests for regressions**

Run: `cd l4d2web && uv run pytest tests/test_pages.py -v -q`

Expected: all pass. The added `<dialog>` is closed by default (no `open` attribute), so it's invisible and inert.

- [ ] **Step 4: Commit**

```bash
git add l4d2web/l4d2web/templates/base.html l4d2web/l4d2web/static/js/modal-router.js
git commit -m "$(cat <<'EOF'
feat(modals): persistent modal slot + router script stub in base.html

Adds <dialog id="modal-container"> with #modal-content slot at body
scope. Script stub created so the include doesn't 404; logic follows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

## Task 5: `modal-router.js` — click intercept, open, fetch, show

**Files:**
- Modify: `l4d2web/l4d2web/static/js/modal-router.js`

This task wires the click → URL → fetch → show pipeline. Close/popstate/bootstrap come in Tasks 6 and 7.

- [ ] **Step 1: Implement click intercept + openModal + fetchAndShow**

Replace `l4d2web/l4d2web/static/js/modal-router.js` with:

```javascript
// URL-addressable modal router (see spec 2026-05-17-url-addressable-modals).
// Click intercept on a[data-modal] → ?modal=<path> in URL → htmx swap into
// #modal-content → showModal(). Close/popstate/bootstrap in later tasks.
(function () {
  "use strict";

  let currentModalPath = null; // race-guard against stale swaps

  function openModal(path) {
    const url = new URL(window.location.href);
    url.searchParams.set("modal", path);
    history.pushState({ modal: path }, "", url.toString());
    fetchAndShow(path);
  }

  function fetchAndShow(path) {
    currentModalPath = path;
    if (typeof window.htmx === "undefined") {
      console.error("[modal-router] htmx not loaded; cannot fetch modal");
      return;
    }
    window.htmx.ajax("GET", path, {
      target: "#modal-content",
      swap: "innerHTML",
      headers: { "HX-Modal": "1" },
    }).then(() => {
      // Race guard: if the user clicked again during the fetch, abandon
      // this swap; the newer click will win.
      if (currentModalPath !== path) return;
      const dlg = document.getElementById("modal-container");
      if (dlg && !dlg.open) dlg.showModal();
    }).catch((err) => {
      console.error("[modal-router] fetch failed", err);
    });
  }

  document.addEventListener("click", (event) => {
    const link = event.target.closest("a[data-modal]");
    if (!link) return;
    if (event.metaKey || event.ctrlKey || event.shiftKey || event.altKey) return;
    if (event.button !== 0) return;
    const href = link.getAttribute("href");
    if (!href) return;
    event.preventDefault();
    openModal(href);
  });

  // Public API — used by files-overlay.js to open the editor from row clicks
  // that aren't a literal <a data-modal> (existing event delegation).
  window.openModal = openModal;
})();
```

- [ ] **Step 2: Chromium verification**

Steps to verify manually (or via the user's Chromium tooling). The new editor route is reachable but not yet linked from the file tree, so the smoke test calls `window.openModal` from the console (a temporary `<a data-modal>` link works too).

1. Start the dev server: `cd l4d2web && uv run flask --app l4d2web.app:create_app run --debug --port 5000` (or whatever the project's dev-server idiom is — confirm at implementation time).
2. Log in and create a `files` overlay with a `.cfg` file in it (or use an existing one).
3. Open dev tools → Console.
4. In the console, run: `window.openModal('/overlays/<id>/files/edit?path=server.cfg')` (substitute real id).
5. **Expected:** the URL gains a `modal` query parameter whose value is `/overlays/<id>/files/edit?path=server.cfg`. The address bar shows it percent-encoded (`?modal=%2Foverlays%2F<id>%2Ffiles%2Fedit%3Fpath%3Dserver.cfg`) because `searchParams.set` encodes `/`, `?`, and `=`. The modal opens with the editor markup inside. (CodeMirror is not mounted yet, that's Task 8, so you'll see the raw `<textarea>` with content.)
6. Network tab: confirm the request to `/overlays/<id>/files/edit?path=server.cfg` carries the header `HX-Modal: 1`.
7. Negative check: confirm the build-status poll (`/overlays/<id>/build-status` every 2s) does **not** carry `HX-Modal: 1`.

- [ ] **Step 3: Commit**

```bash
git add l4d2web/l4d2web/static/js/modal-router.js
git commit -m "$(cat <<'EOF'
feat(modals): click intercept + openModal + fetchAndShow

a[data-modal] clicks push ?modal=<path> to URL and trigger htmx.ajax
into #modal-content with the HX-Modal header. window.openModal exposed
for non-<a> trigger sites (files-overlay row clicks). Race guard via
currentModalPath token. Close/popstate/bootstrap follow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

## Task 6: `modal-router.js` — close, popstate, dismiss, Esc

**Files:**
- Modify: `l4d2web/l4d2web/static/js/modal-router.js`

- [ ] **Step 1: Add close, popstate, dismiss-click, dialog cancel handlers**

In `modal-router.js`, extend the IIFE body. Insert the new function and listeners after `fetchAndShow` and before the `document.addEventListener("click", …)` for opens:

```javascript
  function closeModal() {
    currentModalPath = null;
    const dlg = document.getElementById("modal-container");
    if (dlg && dlg.open) dlg.close();
    const url = new URL(window.location.href);
    if (url.searchParams.has("modal")) {
      url.searchParams.delete("modal");
      history.pushState({}, "", url.toString());
    }
  }

  window.addEventListener("popstate", () => {
    const path = new URL(window.location.href).searchParams.get("modal");
    if (path) {
      fetchAndShow(path);
    } else {
      currentModalPath = null;
      const dlg = document.getElementById("modal-container");
      if (dlg && dlg.open) dlg.close();
    }
  });

  // Dismiss triggers inside modal content
  document.addEventListener("click", (event) => {
    if (event.target.closest("[data-modal-dismiss]")) {
      event.preventDefault();
      closeModal();
    }
  });

  // Esc key fires the dialog's 'cancel' event; sync URL.
  document.addEventListener("DOMContentLoaded", () => {
    const dlg = document.getElementById("modal-container");
    if (dlg) {
      dlg.addEventListener("cancel", (event) => {
        event.preventDefault(); // prevent default close so we control URL sync
        closeModal();
      });
      // Backdrop click — native <dialog> doesn't dismiss on backdrop; replicate.
      dlg.addEventListener("click", (event) => {
        if (event.target === dlg) closeModal();
      });
    }
  });

  window.closeModal = closeModal;
```

- [ ] **Step 2: Chromium verification**

1. Start from the prior task's setup, with the modal open via `window.openModal(...)`.
2. Click the `[data-modal-dismiss]` Cancel button (or the × in the modal header). **Expected:** modal closes, URL loses `?modal=…`, underlying overlay page intact and still polling build-status.
3. Open the modal again. Press Esc. **Expected:** same close behavior.
4. Open the modal again. Click on the backdrop outside the dialog content. **Expected:** same close behavior.
5. Open the modal again. Click browser back button. **Expected:** modal closes, URL clears.
6. Now click forward. **Expected:** modal re-opens with the same file's content.
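
The back/forward expectations above follow from how session history behaves when both open and close use `pushState`. The toy model below is only a sketch of that behavior, not browser code: the `History` class and `modal_open` helper are hypothetical, and real browsers add details (such as `popstate` timing) that are left out.

```python
class History:
    """Toy model of a browser session history (pushState / back / forward)."""

    def __init__(self, url: str) -> None:
        self.entries = [url]
        self.index = 0

    @property
    def current(self) -> str:
        return self.entries[self.index]

    def push_state(self, url: str) -> None:
        # Pushing drops any forward entries, then appends the new one.
        del self.entries[self.index + 1:]
        self.entries.append(url)
        self.index += 1

    def back(self) -> str:
        self.index = max(self.index - 1, 0)
        return self.current

    def forward(self) -> str:
        self.index = min(self.index + 1, len(self.entries) - 1)
        return self.current


def modal_open(url: str) -> bool:
    """The popstate handler opens the modal exactly when ?modal= is present."""
    return "modal=" in url.partition("?")[2]
```

One consequence worth checking by hand during verification: since `closeModal` pushes a new entry rather than going back, dismissing with Cancel or Esc and then pressing Back lands on the `?modal=` entry again, so the editor reopens.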

- [ ] **Step 3: Commit**

```bash
git add l4d2web/l4d2web/static/js/modal-router.js
git commit -m "$(cat <<'EOF'
feat(modals): close, popstate, dismiss-click, Esc, backdrop-click

closeModal pops ?modal= from URL via pushState. popstate handler reacts
to back/forward by fetching or closing. [data-modal-dismiss] click,
native dialog 'cancel' (Esc), and backdrop click all funnel to
closeModal. window.closeModal exposed for callers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

## Task 7: `modal-router.js` — initial-load bootstrap

**Files:**
- Modify: `l4d2web/l4d2web/static/js/modal-router.js`

This makes refreshing on a `?modal=…` URL reopen the modal automatically — the headline feature.

- [ ] **Step 1: Add bootstrap on DOMContentLoaded**

Extend the existing `DOMContentLoaded` listener in `modal-router.js` (added in Task 6) so it also bootstraps from URL. Replace that block with:

```javascript
  document.addEventListener("DOMContentLoaded", () => {
    const dlg = document.getElementById("modal-container");
    if (dlg) {
      dlg.addEventListener("cancel", (event) => {
        event.preventDefault();
        closeModal();
      });
      dlg.addEventListener("click", (event) => {
        if (event.target === dlg) closeModal();
      });
    }
    const initialPath = new URL(window.location.href).searchParams.get("modal");
    if (initialPath) {
      fetchAndShow(initialPath);
    }
  });
```

- [ ] **Step 2: Chromium verification**

1. Open the modal via `window.openModal('/overlays/<id>/files/edit?path=server.cfg')`.
2. Hit the browser refresh button. **Expected:** page reloads, modal re-opens automatically with the same file's content. URL retains `?modal=…`.
3. Copy the full URL. Open a new incognito window, log in, paste the URL. **Expected:** lands on the overlay page with the modal already open.
4. Negative: visit `/overlays/<id>` (no `?modal=`). **Expected:** modal does not open; underlying page renders normally.

- [ ] **Step 3: Commit**

```bash
git add l4d2web/l4d2web/static/js/modal-router.js
git commit -m "$(cat <<'EOF'
feat(modals): DOMContentLoaded bootstrap reopens modal from ?modal= URL

Refresh and share-link flows both work — the modal-state URL is the
canonical shareable artifact for "this overlay with this file open."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

## Task 8: CodeMirror re-init after HTMX swap

**Files:**
- Modify: `l4d2web/l4d2web/static/js/editor.js`

`editor.js` currently mounts CM6 once, at `DOMContentLoaded`. A `<textarea>` that arrives later via a modal swap therefore never gets an editor mounted on it. Fix: expose `initEditors(root)` and call it from an `htmx:afterSwap` listener.

- [ ] **Step 1: Refactor `init` to accept a root**

In `l4d2web/l4d2web/static/js/editor.js`, replace the existing `init` function (currently around lines 93-100) and the bootstrap-at-end (lines 101-105) with:

```javascript
function initEditors(root) {
  const scope = root || document;
  for (const ta of scope.querySelectorAll("textarea[data-editor-language]")) {
    mountOne(ta).catch(err => {
      console.error("[editor] mount failed", err);
      unhideTextarea(ta);
    });
  }
}

if (document.readyState === "loading") {
  document.addEventListener("DOMContentLoaded", () => initEditors(document));
} else {
  initEditors(document);
}

// Re-init editors that arrive via HTMX swap (modal content, etc.).
document.body.addEventListener("htmx:afterSwap", (event) => {
  if (event.target && event.target.id === "modal-content") {
    initEditors(event.target);
  }
});

// Expose for callers that need to re-mount imperatively.
if (window.__editor) {
  window.__editor.initEditors = initEditors;
}
```

- [ ] **Step 2: Chromium verification**

1. Open the editor modal via the URL flow from Task 7 (`/overlays/<id>?modal=/overlays/<id>/files/edit?path=server.cfg`).
2. **Expected:** CM6 renders inside the modal — syntax-highlighted content, NOT a raw textarea. Byte count matches actual content size. Language dropdown reflects auto-detected language (srccfg for .cfg, bash for .sh).
3. Type into the editor. **Expected:** edits are reflected; UI is responsive.
4. Close the modal, re-open. **Expected:** CM6 re-mounts cleanly each time. No duplicate editor instances visible (only one rendered).
5. Open dev tools → Console and confirm there are no errors mentioning `mount failed` and no duplicate-init warnings.

- [ ] **Step 3: Commit**

```bash
git add l4d2web/l4d2web/static/js/editor.js
git commit -m "$(cat <<'EOF'
feat(editor): re-init CM6 on htmx:afterSwap into #modal-content

editor.js exposes initEditors(root) and listens for htmx:afterSwap so
editor textareas that arrive via modal swap get CM6 mounted. The
DOMContentLoaded path remains for first-paint mounting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

## Task 9: Wire file-tree edit triggers to use `window.openModal`

**Files:**
- Modify: `l4d2web/l4d2web/static/js/files-overlay.js` (specific code path; rest unchanged)

`files-overlay.js` currently populates the empty inline `#files-editor-modal` dialog when a file row is clicked. Replace that code path with a call to `window.openModal(editUrl)`. The save flow (also in this file) stays untouched.

- [ ] **Step 1: Locate the "open editor" entry point**

Run: `grep -n "files-editor-modal\|showModal\|filesEditor\|getValue\|files-editor-content" l4d2web/l4d2web/static/js/files-overlay.js`

Identify the function that handles a file-row click and currently calls `showModal()` on `#files-editor-modal`, plus the code that stuffs filename + content + language into the empty markup. That whole code path becomes a single `window.openModal(editUrl)` call.

- [ ] **Step 2: Replace the inline-dialog open path**

In that function, replace the block that populates and shows the inline dialog with:

```javascript
const editUrl = `/overlays/${overlayId}/files/edit?path=${encodeURIComponent(relPath)}`;
if (typeof window.openModal === "function") {
  window.openModal(editUrl);
} else {
  // Graceful fallback if modal-router didn't load — full-page navigation
  // still hits the same route and renders the standalone editor page.
  window.location.href = editUrl;
}
```

Delete the code that previously read the file via `/files/content` JSON endpoint and set `filenameInput.value`, the language dropdown, byte-count text, `controller.setValue(...)`, and called `showModal()`. The new route delivers all of that as server-rendered HTML.

Keep untouched:
- The **save** handler (POSTs to `/overlays/<id>/files/save` reading `window.__filesEditor.getValue()` — still works inside the modal because CM6 re-init from Task 8 sets `window.__filesEditor` on the new instance)
- The **delete** button handler (POSTs to `/overlays/<id>/files/delete`)
- The **download** link (now a server-rendered `<a href>` in the template)
- The rename hint, replace-file flow, and any other in-modal interactions — these continue to bind on the editor element inside `#modal-content` via the existing event delegation

- [ ] **Step 3: Chromium verification**

1. On `/overlays/<id>` (a `files` overlay's page), click an editable file (e.g. `server.cfg`).
2. **Expected:** URL updates to `?modal=/overlays/<id>/files/edit?path=server.cfg`. Modal opens with CM6 editor, content pre-filled, language auto-detected.
3. Edit content and click **Save**. **Expected:** save succeeds (network request to `/overlays/<id>/files/save` returns 200), file persists.
4. Refresh the page (still on the `?modal=` URL). **Expected:** modal reopens with the *saved* (updated) content.
5. Click Cancel. **Expected:** modal closes; URL loses `?modal=`.
6. Race test: click file A, then immediately click file B before A's swap arrives. **Expected:** modal ends in file B's state, not file A's.
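
The race test exercises the `currentModalPath` token from Task 5. A minimal asyncio model of that latest-wins guard is sketched below; the `ModalSlot` class, the file names, and the delays are hypothetical, and `asyncio.sleep` merely stands in for the network round trip.

```python
import asyncio


class ModalSlot:
    """Toy model of the currentModalPath token in modal-router.js."""

    def __init__(self) -> None:
        self.current_path: str | None = None
        self.shown: list[str] = []

    async def fetch_and_show(self, path: str, delay: float) -> None:
        self.current_path = path
        await asyncio.sleep(delay)  # stands in for the network round trip
        if self.current_path != path:
            return  # a newer request (or a close) superseded this one
        self.shown.append(path)


async def rapid_clicks() -> list[str]:
    slot = ModalSlot()
    # File A is clicked first but answers slowly; file B is clicked right after.
    await asyncio.gather(
        slot.fetch_and_show("a.cfg", delay=0.05),
        slot.fetch_and_show("b.cfg", delay=0.01),
    )
    return slot.shown
```

Running `asyncio.run(rapid_clicks())` leaves only `b.cfg` shown. Note what the model covers: in the router the token is compared after `htmx.ajax` resolves, so it gates the `showModal()` call. Whether a late response's markup is also kept out of `#modal-content` depends on htmx's own request handling, which this sketch does not model, so the manual race test remains the real check.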

- [ ] **Step 4: Commit**

```bash
git add l4d2web/l4d2web/static/js/files-overlay.js
git commit -m "$(cat <<'EOF'
feat(files): file-row click opens editor via URL-addressable modal

files-overlay.js no longer fetches /files/content JSON and populates
the inline <dialog>; it calls window.openModal(<edit-url>) which the
modal-router handles end-to-end. Save flow unchanged — CM6 re-init on
htmx:afterSwap re-binds window.__filesEditor on the new instance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

## Task 10: Remove the dead inline editor dialog + final verification

**Files:**
- Modify: `l4d2web/l4d2web/templates/overlay_detail.html` (delete lines 164-228 — the `{% if files_can_edit %} <dialog id="files-editor-modal">…</dialog> {% endif %}` block)
- Modify: `l4d2web/tests/test_url_addressable_modals.py` (add a test that the inline dialog is gone)

- [ ] **Step 1: Write a "dialog removed" assertion test**

Append to `l4d2web/tests/test_url_addressable_modals.py`:

```python
def test_overlay_detail_no_longer_includes_inline_editor_dialog(tmp_path, monkeypatch):
    client, overlay_id = _auth_client_with_files_overlay(tmp_path, monkeypatch, "no-inline-dialog.db")
    response = client.get(f"/overlays/{overlay_id}")
    text = response.get_data(as_text=True)

    assert response.status_code == 200
    # The inline editor dialog is replaced by the URL-addressable route.
    assert 'id="files-editor-modal"' not in text
    # The persistent modal-container slot from base.html *is* present.
    assert 'id="modal-container"' in text
```

- [ ] **Step 2: Run test to verify it fails**

Run: `cd l4d2web && uv run pytest tests/test_url_addressable_modals.py::test_overlay_detail_no_longer_includes_inline_editor_dialog -v`

Expected: FAIL — `id="files-editor-modal"` is still in `overlay_detail.html`.

- [ ] **Step 3: Remove the inline dialog**

In `l4d2web/l4d2web/templates/overlay_detail.html`, delete lines 164-228 inclusive — the entire `{% if files_can_edit %} … <dialog id="files-editor-modal"> … </dialog> … {% endif %}` block.

- [ ] **Step 4: Run all backend tests for regressions**

Run: `cd l4d2web && uv run pytest tests/ -v --tb=short -q`

Expected: all tests pass. The new assertion passes; nothing else regresses.

- [ ] **Step 5: Run the full Chromium verification matrix from the spec**

Walk through all 10 checks from the **Verification** section of `docs/superpowers/specs/2026-05-17-url-addressable-modals-design.md`:

1. Direct link works as full page — paste `/overlays/<id>/files/edit?path=server.cfg` in a new tab, no `?modal=`, full-page editor renders, save works.
2. Modal open from overlay — click edit in the file tree, modal opens, URL gets `?modal=`.
3. Refresh in modal state — F5 reopens modal on the same overlay with build-status polling resumed.
4. Share URL — paste in incognito, lands with modal open.
5. Back button — closes modal, URL clears, underlying page intact.
6. Forward button — reopens modal with same content.
7. Esc to close — URL syncs.
8. Race on rapid clicks — final state is the last-clicked file.
9. No HTMX poll misclassification — build-status polls don't carry `HX-Modal:1`.
10. Existing inline dialogs unaffected — rename, delete, new-folder, conflict-resolution still open from `[data-modal-open]` triggers (these don't use `[data-modal]`).

- [ ] **Step 6: Commit**

```bash
git add l4d2web/l4d2web/templates/overlay_detail.html l4d2web/tests/test_url_addressable_modals.py
git commit -m "$(cat <<'EOF'
feat(modals): remove inline editor dialog, complete pilot migration

overlay_detail.html no longer carries the empty <dialog
id="files-editor-modal"> placeholder — content lives at
/overlays/<id>/files/edit?path=… and renders via the URL-addressable
modal pipeline. Pilot complete; spec follow-ups (save→hx-post, other
modals, server-side URL composition) deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"
```

---

## Self-review notes (against the spec)
|
|
||||||
|
|
||||||
- **Architecture (Approach C):** Tasks 5, 6, 7 implement the JS module exactly as specified — ~60 lines including comments and exposed API.
|
|
||||||
- **Layout switch via `HX-Modal: 1` header:** Task 1 implements it as a context processor; Task 1's third test explicitly guards against misclassifying HTMX's built-in `HX-Request` header.
|
|
||||||
- **`<dialog>` for show/hide:** Task 4 adds the persistent slot; Task 5 uses `showModal()`; Tasks 6/7 use `close()` and native `cancel` event.
|
|
||||||
- **Editor as a real page:** Tasks 2 + 3 cover this — separate template, separate route, dual-mode rendering.
|
|
||||||
- **CodeMirror re-init:** Task 8 covers `initEditors(root)` exposure + `htmx:afterSwap` listener.
|
|
||||||
- **Save flow stays AJAX:** Task 9 preserves the save path while replacing the *open* path.
|
|
||||||
- **Race guard, dismiss attrs, Esc, backdrop click, popstate, bootstrap:** all in Tasks 5–7.
|
|
||||||
- **Out of scope items** (binary replace, other modals, save-flow migration, server-side URL composition): not touched by any task — matches the spec's deferral.
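The layout switch noted in the second bullet comes down to a single header check. A minimal sketch, assuming the request headers are available as a plain mapping; the name `wants_modal_layout` is hypothetical and is not taken from Task 1 of the plan:

```python
from typing import Mapping


def wants_modal_layout(headers: Mapping[str, str]) -> bool:
    """Return True only when the request carries `HX-Modal: 1`."""
    # HTMX sets `HX-Request: true` on every request it issues, including
    # build-status polls, so that header alone must not select the modal
    # layout. Only the dedicated `HX-Modal` header does.
    return headers.get("HX-Modal") == "1"
```

A plain `dict` lookup is case-sensitive, whereas a web framework's header object usually is not, so a real context processor would likely read the framework's own headers object instead.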
|
|
||||||
|
|
||||||
No placeholders, no `TODO`s, no "implement appropriately." Every step has exact paths and exact code.
|
|
||||||
File diff suppressed because it is too large
|
|
@ -1,350 +0,0 @@
|
||||||
# Enable srcds log streaming + temp UDP capture listener
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
We want to start gathering data about what actually happens on our L4D2
|
|
||||||
servers (round boundaries, kills, team selection, lobby arrivals) so we can
|
|
||||||
later build round/match tracking and visualizations. The Source engine's
|
|
||||||
HL Log Standard UDP streaming (`logaddress_add`) is the right primary
|
|
||||||
source — it's built into srcds, no plugin required, well-documented (see
|
|
||||||
[HL Log Standard](https://developer.valvesoftware.com/wiki/HL_Log_Standard)).
|
|
||||||
|
|
||||||
This change does **two** things:
|
|
||||||
|
|
||||||
1. **Make every managed L4D2 server stream its logs** to a known UDP
|
|
||||||
endpoint, by auto-injecting `log on`, `mp_logdetail 3`, and
|
|
||||||
`logaddress_add <addr>` into generated `server.cfg` — alongside
|
|
||||||
`rcon_password`, where users can't accidentally break it.
|
|
||||||
2. **Stand up a deliberately disposable UDP listener** in the web app
|
|
||||||
that writes raw log lines to flat files (one per source address),
|
|
||||||
so we can observe a few days of real traffic before committing to
|
|
||||||
any schema or reducer design.
|
|
||||||
|
|
||||||
The listener is explicitly a Phase-1 *capture-only* tool. It does **not**
|
|
||||||
parse, reduce, store in DB, or render anything. That's the next plan,
|
|
||||||
once we have evidence of what L4D2 actually emits on our servers.
|
|
||||||
|
|
||||||
## Scope (in / out)
|
|
||||||
|
|
||||||
**In scope**
|
|
||||||
- Inject 3 cvars into the generated `server.cfg` at facade level.
|
|
||||||
- Config-driven listener address (env var, sensible default).
|
|
||||||
- UDP listener daemon thread, sibling of `live_state_poller`.
|
|
||||||
- Flat-file capture, one file per `(srcip, srcport)`.
|
|
||||||
- Dev-server integration (capture dir under `LEFT4ME_ROOT`).
|
|
||||||
- Production wiring (capture dir under `/var/lib/left4me/`, systemd
|
|
||||||
unit changes if any).
|
|
||||||
- Minimal smoke test for the listener.
|
|
||||||
|
|
||||||
**Out of scope (Phase 2+)**
|
|
||||||
- Parsing log lines into structured events.
|
|
||||||
- DB schema (`RawLogLine`, `LogEvent`, `MatchSession`, `Round`, etc.).
|
|
||||||
- Mapping source addr → `Server` row reliably (we have the data
|
|
||||||
in the flat-file name; we don't *need* to resolve it yet).
|
|
||||||
- `sv_logsecret` authentication (single-host loopback, defer).
|
|
||||||
- Any UI.
|
|
||||||
|
|
||||||
## Design
|
|
||||||
|
|
||||||
### 1. Auto-injected cvars in `server.cfg`
|
|
||||||
|
|
||||||
**File:** `l4d2web/l4d2web/services/l4d2_facade.py` (around lines 41-48,
|
|
||||||
where `rcon_password` is appended after the user blueprint config).
|
|
||||||
|
|
||||||
After the existing `rcon_password` append, add:
|
|
||||||
|
|
||||||
```python
|
|
||||||
config_lines.append("log on")
|
|
||||||
config_lines.append("mp_logdetail 3")
|
|
||||||
log_addr = current_app.config["LOG_LISTENER_ADDR"] # e.g. "127.0.0.1:27800"
|
|
||||||
config_lines.append(f"logaddress_add {log_addr}")
|
|
||||||
```
|
|
||||||
|
|
||||||
Notes:
|
|
||||||
- Order matters: cvars must come *after* anything in the user
|
|
||||||
blueprint so users can't override them.
|
|
||||||
- `log on` is idempotent; safe to re-issue.
|
|
||||||
- `logaddress_add` is *additive* — re-running it just re-registers
|
|
||||||
the same destination. srcds tolerates this.
|
|
||||||
|
|
||||||
### 2. Listener address configuration
|
|
||||||
|
|
||||||
**File:** `l4d2web/l4d2web/config.py`
|
|
||||||
|
|
||||||
Add to `DEFAULT_CONFIG` (and the env-var loader):
|
|
||||||
|
|
||||||
```python
|
|
||||||
"LOG_LISTENER_ADDR": "127.0.0.1:27800", # what srcds logs to
|
|
||||||
"LOG_LISTENER_BIND": "127.0.0.1:27800", # what our listener binds
|
|
||||||
"LOG_LISTENER_ENABLED": True,
|
|
||||||
"LOG_CAPTURE_DIR": "/var/lib/left4me/captures", # overridden in dev
|
|
||||||
```
|
|
||||||
|
|
||||||
Port **27800** chosen to avoid:
|
|
||||||
- SRCDS server range 27015–27050
|
|
||||||
- Steam client range 27000–27015
|
|
||||||
- Steam master server range 27010–27050
|
|
||||||
|
|
||||||
Override env vars: `LEFT4ME_LOG_LISTENER_ADDR`, `LEFT4ME_LOG_LISTENER_BIND`,
|
|
||||||
`LEFT4ME_LOG_CAPTURE_DIR`.
|
|
||||||
|
|
||||||
**Dev override:** `scripts/dev-server.py` already sets
|
|
||||||
`LEFT4ME_ROOT=.tmp/dev-server` — extend it to also set
|
|
||||||
`LEFT4ME_LOG_CAPTURE_DIR=$LEFT4ME_ROOT/captures` so dev captures live
|
|
||||||
under `.tmp/` and don't pollute `/var/lib/left4me`.
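The override scheme above might be read roughly as follows. This is a sketch only: `load_log_listener_config` is a hypothetical helper, and how the real env-var loader in `config.py` is structured is an assumption. The key names, env-var names, and defaults are the ones listed in this section.

```python
import os
from typing import Mapping

# Defaults as listed in the plan above.
LOG_DEFAULTS = {
    "LOG_LISTENER_ADDR": "127.0.0.1:27800",
    "LOG_LISTENER_BIND": "127.0.0.1:27800",
    "LOG_CAPTURE_DIR": "/var/lib/left4me/captures",
}


def load_log_listener_config(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Overlay LEFT4ME_-prefixed environment variables on the defaults."""
    config = dict(LOG_DEFAULTS)
    for key in LOG_DEFAULTS:
        value = environ.get(f"LEFT4ME_{key}")
        # An explicitly empty value is kept, so an operator can blank a key.
        if value is not None:
            config[key] = value
    return config
```

With this shape, the dev override is just one more environment entry, for example `LEFT4ME_LOG_CAPTURE_DIR=.tmp/dev-server/captures`.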
|
|
||||||
|
|
||||||
### 3. The listener
|
|
||||||
|
|
||||||
**New file:** `l4d2web/l4d2web/services/log_listener.py`
|
|
||||||
|
|
||||||
Pattern: copy the daemon-thread shape from
|
|
||||||
`live_state_poller.py:230-245`. Single global guard; one thread.
|
|
||||||
|
|
||||||
Sketch:
|
|
||||||
|
|
||||||
```python
import socket
import threading
from pathlib import Path

_started = False  # module-level guard (match poller pattern)


def start_log_listener(app):
    global _started
    if not app.config["LOG_LISTENER_ENABLED"]:
        return
    if _started:  # already running in this process
        return
    bind = app.config["LOG_LISTENER_BIND"]
    capture_dir = Path(app.config["LOG_CAPTURE_DIR"])
    capture_dir.mkdir(parents=True, exist_ok=True)
    t = threading.Thread(
        target=_listener_loop,
        args=(bind, capture_dir),
        name="left4me-log-listener",
        daemon=True,
    )
    t.start()
    _started = True  # set the guard so a second call is a no-op


def _listener_loop(bind: str, capture_dir: Path) -> None:
    host, port = bind.rsplit(":", 1)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, int(port)))
    while True:
        data, (srcip, srcport) = sock.recvfrom(4096)
        # srcds log packets: 0xFF 0xFF 0xFF 0xFF 'R' or 'S', then body, then trailing 0x00 0x0A
        # See HL Log Standard. We strip nothing on first pass; write raw.
        path = capture_dir / f"{srcip}-{srcport}.log"
        with path.open("ab") as f:
            f.write(data)
```
|
|
||||||
|
|
||||||
Wiring (`l4d2web/l4d2web/app.py:157`, alongside the poller):
|
|
||||||
|
|
||||||
```python
if not app.config.get("TESTING"):
    start_live_state_poller(app)
    start_log_listener(app)
```
|
|
||||||
|
|
||||||
### 4. UDP packet structure note
|
|
||||||
|
|
||||||
srcds log packets aren't bare text — they have a small header:
|
|
||||||
|
|
||||||
- 4 bytes of `0xFF` (out-of-band marker)
|
|
||||||
- 1 byte type: `'R'` (no secret) or `'S'` (with `sv_logsecret`)
|
|
||||||
- If `'S'`: 4-byte little-endian secret
|
|
||||||
- Body: ASCII log line including the `L mm/dd/yyyy ...` prefix
|
|
||||||
- Trailing `0x00 0x0A` (null + newline)
|
|
||||||
|
|
||||||
For capture-only we just **write the raw bytes**; later parsers will
|
|
||||||
strip the header. The header is also useful diagnostic info (it tells
|
|
||||||
us whether `sv_logsecret` made it through).
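To make the layout above concrete, here is a minimal sketch of how a later parser might peel the header off a captured datagram. It assumes the framing exactly as described in this note, which has not been verified against real traffic; `split_log_packet` is a hypothetical name, and for `'S'` packets the sketch leaves the secret in place instead of guessing its encoding.

```python
OOB_MARKER = b"\xff\xff\xff\xff"


def split_log_packet(data: bytes) -> tuple[str, bytes]:
    """Split a captured datagram into (type, payload).

    Raises ValueError when the out-of-band marker is missing.
    """
    if len(data) < 5 or not data.startswith(OOB_MARKER):
        raise ValueError("not an out-of-band srcds log packet")
    packet_type = chr(data[4])  # 'R' or 'S' per the note above
    # Drop trailing NUL/newline bytes in whatever order they arrive.
    payload = data[5:].rstrip(b"\x00\r\n")
    # For 'S' packets the payload still begins with the secret; this sketch
    # deliberately does not try to separate it from the log line.
    return packet_type, payload
```

Because the capture files hold raw bytes, a helper like this can be run over them after the fact without changing the listener.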
|
|
||||||
|
|
||||||
### 5. Capture file format
|
|
||||||
|
|
||||||
One file per `(srcip, srcport)`, name `<ip>-<port>.log`, append mode,
|
|
||||||
unbuffered byte writes. Rotation is **out of scope** — these are
|
|
||||||
short-lived (days), and operators can `rm` them manually. If a file
|
|
||||||
gets surprisingly large, that itself is data.
|
|
||||||
|
|
||||||
### 6. Restart implications
|
|
||||||
|
|
||||||
Cvars in `server.cfg` are read by srcds at instance startup
|
|
||||||
(`instances.py:54,87`). Existing running servers **will not pick up
|
|
||||||
the new log destination** until they are restarted via
|
|
||||||
`l4d2ctl initialize` or a process restart. Document this in the
|
|
||||||
verification section — don't try to be clever about live-applying.
|
|
||||||
|
|
||||||
## Critical files
|
|
||||||
|
|
||||||
**To modify:**
|
|
||||||
- `l4d2web/l4d2web/services/l4d2_facade.py` — inject 3 cvars after `rcon_password`
|
|
||||||
- `l4d2web/l4d2web/config.py` — add 4 new config keys with env overrides
|
|
||||||
- `l4d2web/l4d2web/app.py` — start listener thread next to poller
|
|
||||||
- `scripts/dev-server.py` — set `LEFT4ME_LOG_CAPTURE_DIR` under `.tmp/`
|
|
||||||
|
|
||||||
**To create:**
|
|
||||||
- `l4d2web/l4d2web/services/log_listener.py` — UDP listener thread
|
|
||||||
|
|
||||||
**To reference (read-only):**
|
|
||||||
- `l4d2web/l4d2web/services/live_state_poller.py:230-245` — thread pattern
|
|
||||||
- `l4d2web/l4d2web/services/l4d2_facade.py:41-48` — rcon_password injection pattern
|
|
||||||
- `l4d2host/l4d2host/instances.py:54,87` — confirms server.cfg is generated at init time
|
|
||||||
- `deploy/files/usr/local/lib/systemd/system/left4me-web.service` — may need a
|
|
||||||
read-write path entry if `ReadWritePaths=` is set restrictively (check before
|
|
||||||
modifying)
|
|
||||||
|
|
||||||
## Verification
|
|
||||||
|
|
||||||
End-to-end smoke test, on the dev box first:
|
|
||||||
|
|
||||||
1. **Static checks**
|
|
||||||
```bash
|
|
||||||
cd /Users/mwiegand/Projekte/left4me
|
|
||||||
uv run --project l4d2web ruff check l4d2web
|
|
||||||
uv run --project l4d2web pytest l4d2web/tests -k "facade or config" -x
|
|
||||||
```
|
|
||||||
|
|
||||||
2. **Cvar injection unit test**
|
|
||||||
- Add a test in `l4d2web/tests/` that calls `build_server_spec_payload`
|
|
||||||
and asserts the generated config lines include `log on`,
|
|
||||||
`mp_logdetail 3`, and a `logaddress_add` line whose target matches
|
|
||||||
`LOG_LISTENER_ADDR`.
|
|
||||||
|
|
||||||
3. **Listener smoke test**
|
|
||||||
- Run dev server: `scripts/dev-server.py`
|
|
||||||
- In another shell, fake an srcds log packet:
|
|
||||||
```bash
# Keep header and body in ONE format string: printf ignores extra
# arguments when the format contains no conversion specifiers.
printf '\xff\xff\xff\xffRL 05/19/2026 - 14:23:11: "Test<1><STEAM_1:0:1><>" connected, address "127.0.0.1:27015"\x00\x0a' \
  | nc -u -w1 127.0.0.1 27800
```
|
|
||||||
- Confirm `ls .tmp/dev-server/captures/` shows a `127.0.0.1-<srcport>.log`.
|
|
||||||
- Confirm the file contains the bytes (use `xxd` to inspect the
|
|
||||||
header + body shape).
|
|
||||||
|
|
||||||
4. **Live end-to-end on one production server**
|
|
||||||
- Pick **one** server (least-busy), trigger a re-init via the web
|
|
||||||
UI so `server.cfg` is regenerated with the new cvars.
|
|
||||||
- Verify via RCON: `logaddress_list` should show our address.
|
|
||||||
- Connect to the server, run around. Confirm a file appears in
|
|
||||||
`/var/lib/left4me/captures/` with the server's source IP.
|
|
||||||
- `tail -f` and verify HL-Log-Standard lines (`L ... : "..."
|
|
||||||
entered the game`, `Loading map`, `World triggered "round_start"`).
|
|
||||||
|
|
||||||
5. **What we are explicitly NOT verifying yet**
|
|
||||||
- Parsing correctness — there's no parser.
|
|
||||||
- Reliable server-id mapping — we have srcip/srcport in the
|
|
||||||
filename, that's enough for now.
|
|
||||||
- Long-running stability past a few days — listener is temp.
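The fake-packet probe from step 3 can also be driven from Python when `nc` is not available. A minimal sketch, assuming the packet framing described in the design section; `send_fake_log_packet` is a hypothetical helper, not part of the codebase:

```python
import socket


def send_fake_log_packet(host: str, port: int, line: str) -> int:
    """Send one fake 'R'-type log datagram; return the number of bytes sent."""
    # Header (4 x 0xFF), type byte, ASCII body, then the trailing bytes
    # as this plan describes them.
    packet = b"\xff\xff\xff\xff" + b"R" + line.encode("ascii") + b"\x00\x0a"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(packet, (host, port))
```

For example, `send_fake_log_packet("127.0.0.1", 27800, 'L 05/19/2026 - 14:23:11: "Test<1><STEAM_1:0:1><>" connected')` should produce a new capture file named after the sender's ephemeral source port.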
|
|
||||||
|
|
||||||
## Post-deploy verification findings (2026-05-20)
|
|
||||||
|
|
||||||
The plan above describes what we INTENDED to ship. This section
|
|
||||||
documents what we ACTUALLY learned, including a wrong intermediate
|
|
||||||
conclusion that was later corrected.
|
|
||||||
|
|
||||||
### What worked
|
|
||||||
|
|
||||||
Listener deployed and confirmed bound (`ss -ulnp` showed
|
|
||||||
`gunicorn:28000`). Servers re-initialized; `server.cfg` had
|
|
||||||
`log on` / `logaddress_add 127.0.0.1:28000` after `rcon_password`.
|
|
||||||
Listener proven healthy end-to-end with a local `nc` probe writing
|
|
||||||
a capture file.
|
|
||||||
|
|
||||||
### What didn't (and why)
|
|
||||||
|
|
||||||
Captures stayed empty even during active play. Symptoms:
|
|
||||||
|
|
||||||
1. `logaddress_list` via RCON → `1 entry: 127.0.0.1:28000` ✓
|
|
||||||
2. `log` via RCON → `currently logging to: file, console, udp` ✓
|
|
||||||
3. Local `.log` files in
|
|
||||||
`/var/lib/left4me/runtime/<n>/merged/left4dead2/logs/` grow
|
|
||||||
normally; rich gameplay events show in `journalctl -u
|
|
||||||
left4me-server@<n>` (bot connect, team join, spawn, character
|
|
||||||
pick — full HL-Log-Standard verbosity) ✓
|
|
||||||
4. `tcpdump -i lo udp port 28000` during rcon say/echo/status bursts
|
|
||||||
→ **0 packets** ✗
|
|
||||||
5. `tcpdump -i any host 127.0.0.1 and udp` → still 0 ✗
|
|
||||||
6. Toggle `log off` / `log on` live, `sv_logflush 1` → no effect ✗
|
|
||||||
7. Tested `SocketBindAllow=udp:32768-60999` drop-in (suspected the
|
|
||||||
ephemeral source-port bind was being rejected) → still 0
|
|
||||||
packets. Drop-in rolled back ✗
|
|
||||||
8. `strace -p <srcds_linux> -e sendto,sendmsg` → **zero sendto
|
|
||||||
calls toward the destination** ✗
|
|
||||||
|
|
||||||
A premature conclusion was reached and committed as `46bba0d`
|
|
||||||
("L4D2 logaddress UDP emit is dead"). User pushed back, asked for
|
|
||||||
verification. Research found multiple production HLstatsX:CE
|
|
||||||
instances running L4D2 stats successfully — disproving the
|
|
||||||
engine-bug theory.
|
|
||||||
|
|
||||||
### Real cause
|
|
||||||
|
|
||||||
Re-registered logaddress at a non-loopback IP (`172.30.0.5:28000`
|
|
||||||
on the wireguard interface) and reran the test. **8 packets in 12
|
|
||||||
seconds**, each a properly framed HL-Log-Standard payload —
|
|
||||||
including `Console<0>" say "wg-test-1"`, all the rcon-from lines,
|
|
||||||
and live poller status calls.
|
|
||||||
|
|
||||||
**The Source engine silently drops `logaddress` destinations in
|
|
||||||
`127.0.0.0/8`.** Registration succeeds (data-structure op), the
|
|
||||||
cvar API reports "logging to: udp", but the engine's send loop
|
|
||||||
filters out loopback destinations and never calls sendto for them.
|
|
||||||
This is presumably an anti-self-loop / anti-amplification measure
|
|
||||||
that I have not seen documented in any Valve or community source.
|
|
||||||
|
|
||||||
Everyone else using `logaddress` for stat tracking puts the
|
|
||||||
collector on a *separate host* or at minimum a different interface
|
|
||||||
IP — they never hit this. We're the unusual case of co-locating
|
|
||||||
the listener with the gameserver and naively pointing at
|
|
||||||
`127.0.0.1`.
|
|
||||||
|
|
||||||
### Fix
|
|
||||||
|
|
||||||
- `LOG_LISTENER_BIND` default → `0.0.0.0:28000` (accept on any
|
|
||||||
interface).
|
|
||||||
- `LOG_LISTENER_ADDR` default → `""` (empty). Production env file
|
|
||||||
MUST set this to a non-loopback IP. Dev gets a safe no-op
|
|
||||||
(cfg injector skips emitting log cvars when the address is
|
|
||||||
empty).
|
|
||||||
- Production `web.env`: set
|
|
||||||
`LOG_LISTENER_ADDR=<host-non-loopback-ip>:28000`. The host's
|
|
||||||
public interface IP works; the kernel's same-host routing
|
|
||||||
optimization keeps the actual traffic on `lo` internally, but the
|
|
||||||
*destination IP* in the packet header is non-loopback so Source's
|
|
||||||
filter passes.
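The fix above could be expressed in the cfg injector roughly like this. This is a sketch under stated assumptions: `log_cvar_lines` and `is_loopback_target` are hypothetical names, and raising on a loopback address is one possible policy (logging a warning would be another); the document itself only specifies that an empty address emits nothing.

```python
import ipaddress


def is_loopback_target(log_addr: str) -> bool:
    """Return True when the host part of "host:port" is a loopback address."""
    host, _, _port = log_addr.rpartition(":")
    try:
        return ipaddress.ip_address(host).is_loopback  # covers all of 127.0.0.0/8
    except ValueError:
        # Not an IP literal; treat the conventional hostname as loopback.
        return host == "localhost"


def log_cvar_lines(log_addr: str) -> list[str]:
    """Build the log cvars to append after `rcon_password`."""
    if not log_addr:
        return []  # dev default: emit no log cvars at all
    if is_loopback_target(log_addr):
        # Registration would succeed, but no packets would ever be sent.
        raise ValueError(f"logaddress target {log_addr!r} is loopback")
    return ["log on", f"logaddress_add {log_addr}"]
```

Failing loudly here would have surfaced the misconfiguration at config-generation time instead of as silently empty captures.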
|
|
||||||
|
|
||||||
### Implications for the roadmap (revised)
|
|
||||||
|
|
||||||
- The vanilla UDP logaddress mechanism works in L4D2. **No SourceMod
|
|
||||||
bridge is required for Phase 1** — we were going to add one to
|
|
||||||
work around a non-existent engine bug.
|
|
||||||
- `mp_logdetail` was correctly removed from the cfg injection: it
|
|
||||||
is a CS-only cvar; L4D2 prints `Unknown command` at startup.
|
|
||||||
- A future SourceMod plugin in `l4d2host/` may still be useful for
|
|
||||||
L4D-specific events the engine doesn't auto-log (kill events with
|
|
||||||
weapon detail, special-infected spawns) — see
|
|
||||||
`SMILEWHENYOUDIE/HLstatsX-CE`'s `superlogs-l4d.sp` for prior
|
|
||||||
art. That's a Phase 2 enhancement, not a Phase 1 prerequisite.
|
|
||||||
|
|
||||||
### Lessons (filed under "validate before implementing")
|
|
||||||
|
|
||||||
- I anchored hard on "L4D2 engine bug" after disproving two
|
|
||||||
reasonable alternative hypotheses (hibernation, SocketBindAllow).
|
|
||||||
The third alternative — destination-IP filter — wasn't tested
|
|
||||||
before committing the wrong-conclusion docs. Should have tried a
|
|
||||||
non-loopback destination as the very first test.
|
|
||||||
- The journal evidence of rich game events going to console/file
|
|
||||||
was real and significant, but I misread it as proof of an
|
|
||||||
engine-level UDP stub instead of evidence the engine's line
|
|
||||||
generator works fine and only the *destination filter* was at
|
|
||||||
play.
|
|
||||||
- The user's push-back ("maybe do research if it's really broken?")
|
|
||||||
forced the right next step. Worth internalizing: a single
|
|
||||||
contrarian search query found HLstatsX-on-L4D2 production
|
|
||||||
instances within seconds and would have prevented the
|
|
||||||
wrong-conclusion commit.
|
|
||||||
|
|
||||||
## Follow-ups (separate plans)
|
|
||||||
|
|
||||||
- **After a few days of capture data**: design
|
|
||||||
`RawLogLine` / `LogEvent` / `MatchSession` / `Round` schema.
|
|
||||||
- Build a reducer.
|
|
||||||
- Consider a SourceMod bridge for richer L4D2-specific events
|
|
||||||
(kill weapons, special-infected spawns, finale outcomes).
|
|
||||||
- `sv_logsecret` if the listener is ever moved off-host.
|
|
||||||
|
|
@ -1,540 +0,0 @@
|
||||||
# Server detail — console + log autoscroll Implementation Plan
|
|
||||||
|
|
||||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
|
||||||
|
|
||||||
**Goal:** Make the Log and Console transcripts on `/servers/<id>` stay pinned to their bottom on initial load, tab activation, and command append; cap the inline Console to the 20 newest entries while the modal keeps all 50.
|
|
||||||
|
|
||||||
**Architecture:** Single opt-in `data-autoscroll` attribute on any scroll-pinned region. One helper (`scrollAutoscrollTargets`) handles root, descendants, *and* ancestors so HTMX `beforeend` swaps that fire `htmx:load` on the inserted child still find and scroll the parent transcript. `tabs.js` calls the helper after activating a tab. `modals.js` already dispatches `modal:opened` on `showModal()`, so the Console modal hooks that event to scroll on first open.
|
|
||||||
|
|
||||||
**Tech Stack:** Flask + Jinja2 templates, vanilla JS, HTMX, Playwright for e2e, Claude-in-Chrome for live-browser verification.
|
|
||||||
|
|
||||||
**Reference spec:** `docs/superpowers/specs/2026-05-20-server-console-log-autoscroll-design.md`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## File map
|
|
||||||
|
|
||||||
| Path | Action | Responsibility |
|------|--------|----------------|
| `l4d2web/l4d2web/routes/page_routes.py` | modify (~L318-345) | Add `console_history_overview = console_history[-20:]` to the render context |
| `l4d2web/l4d2web/templates/server_detail.html` | modify (L60-73, L101, L111-117, L159) | Inline loop uses `console_history_overview`; add `data-autoscroll` to both `<pre class="log-stream">`; Console modal transcript wires `modal:opened` → scroll |
| `l4d2web/l4d2web/static/js/console-history.js` | modify (L159-179) | Rename and generalise `scrollConsolesToBottom` → `scrollAutoscrollTargets` with ancestor walk; expose on `window` |
| `l4d2web/l4d2web/static/js/tabs.js` | modify (~L9-19) | After `activateTab` toggles `hidden`, scroll any `[data-autoscroll]` in the newly-active pane |
| `l4d2web/tests/test_servers.py` | extend | Server-side: assert inline pane caps at 20 and modal keeps 50 |
| `l4d2web/tests/e2e/test_server_detail.py` | extend | E2E: Console tab pinned to bottom on activation; Console pane pinned to bottom after command submit |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### Task 1: Server-side slice for inline Console history
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/l4d2web/routes/page_routes.py:318-347`
|
|
||||||
- Test: `l4d2web/tests/test_servers.py`
|
|
||||||
|
|
||||||
- [ ] **Step 1: Write the failing test**
|
|
||||||
|
|
||||||
`test_servers.py` uses the `user_client_with_blueprints` fixture (returns `(client, data)` where `data` carries `user_id` and `blueprint_id`) plus direct DB writes via `session_scope()`. Mirror that pattern:
|
|
||||||
|
|
||||||
Append to `l4d2web/tests/test_servers.py`:
|
|
||||||
|
|
||||||
```python
def test_server_detail_inline_console_caps_at_20_modal_keeps_all(user_client_with_blueprints) -> None:
    """When > 20 CommandHistory rows exist for the server, the inline
    Console transcript renders only the 20 newest (chronological order),
    while the modal transcript renders the full set (capped at the
    route-level 50)."""
    import re
    from datetime import UTC, datetime, timedelta

    from l4d2web.models import CommandHistory, Server

    client, data = user_client_with_blueprints

    with session_scope() as db:
        server = Server(
            user_id=data["user_id"],
            blueprint_id=data["blueprint_id"],
            name="console-cap",
            port=27123,
            rcon_password="x",
        )
        db.add(server)
        db.flush()
        sid = server.id
        for i in range(35):
            db.add(CommandHistory(
                user_id=data["user_id"],
                server_id=sid,
                command=f"cmd_{i:02d}",
                reply=f"reply {i}",
                is_error=False,
                created_at=datetime.now(UTC) - timedelta(minutes=40 - i),
            ))

    resp = client.get(f"/servers/{sid}")
    assert resp.status_code == 200
    body = resp.get_data(as_text=True)

    inline_match = re.search(
        rf'<div id="console-transcript-inline-{sid}"[^>]*>(.*?)</div>\s*<form',
        body,
        re.DOTALL,
    )
    assert inline_match, "inline transcript container not found"
    inline_lines = inline_match.group(1).count('class="console-line')
    assert inline_lines == 20, f"inline expected 20, got {inline_lines}"

    modal_match = re.search(
        rf'<div id="console-transcript-modal-{sid}"[^>]*>(.*?)</div>\s*<form',
        body,
        re.DOTALL,
    )
    assert modal_match, "modal transcript container not found"
    modal_lines = modal_match.group(1).count('class="console-line')
    assert modal_lines == 35, f"modal expected 35, got {modal_lines}"
```
|
|
||||||
|
|
||||||
- [ ] **Step 2: Run the test to verify it fails**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd l4d2web && pytest tests/test_servers.py::test_server_detail_inline_console_caps_at_20_modal_keeps_all -v
|
|
||||||
```
|
|
||||||
Expected: FAIL (inline returns 35, not 20)
|
|
||||||
|
|
||||||
- [ ] **Step 3: Modify the route**
|
|
||||||
|
|
||||||
In `l4d2web/l4d2web/routes/page_routes.py`, locate `server_detail` (~L306) and the `console_history = list(reversed(...))` block (~L318-330). After it, add:
|
|
||||||
|
|
||||||
```python
|
|
||||||
console_history_overview = console_history[-20:]
|
|
||||||
```
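The slice relies on `console_history` being ordered oldest-first, so a negative slice keeps the newest rows in chronological order. A quick illustration of those semantics, using a hypothetical `newest_entries` helper and made-up row values:

```python
def newest_entries(history: list, limit: int = 20) -> list:
    """Return the last `limit` items of an oldest-first list, order preserved."""
    # Assumes limit >= 1; a limit of 0 would slice [-0:] and return everything.
    return history[-limit:]


rows = [f"cmd_{i:02d}" for i in range(35)]  # oldest first, like the route's list
overview = newest_entries(rows)
```

When fewer than 20 rows exist, the slice simply returns all of them, so the inline pane needs no special case for short histories.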
|
|
||||||
|
|
||||||
Then in the `return render_template("server_detail.html", …)` call (~L335-347), add the new kwarg:
|
|
||||||
|
|
||||||
```python
return render_template(
    "server_detail.html",
    server=server,
    blueprint=blueprint,
    connect_host=connect_host,
    file_tree_root_entries=file_tree_root_entries,
    file_tree_truncated=file_tree_truncated_count > 0
    if file_tree_root_entries is not None
    else False,
    file_tree_truncated_count=file_tree_truncated_count,
    console_history=console_history,
    console_history_overview=console_history_overview,
    **ctx,
)
```
|
|
||||||
|
|
||||||
- [ ] **Step 4: Update the template to use the new variable**
|
|
||||||
|
|
||||||
In `l4d2web/l4d2web/templates/server_detail.html` at the **inline** Console pane (~L65-71, inside `<div role="tabpanel" data-tab="console">`):
|
|
||||||
|
|
||||||
Change:
|
|
||||||
```jinja
|
|
||||||
{% for h in console_history %}
|
|
||||||
```
|
|
||||||
to:
|
|
||||||
```jinja
|
|
||||||
{% for h in console_history_overview %}
|
|
||||||
```
|
|
||||||
|
|
||||||
Leave the **modal** Console transcript (~L112-116) iterating over `console_history` unchanged.
|
|
||||||
|
|
||||||
- [ ] **Step 5: Run the test to verify it passes**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd l4d2web && pytest tests/test_servers.py::test_server_detail_inline_console_caps_at_20_modal_keeps_all -v
|
|
||||||
```
|
|
||||||
Expected: PASS
|
|
||||||
|
|
||||||
- [ ] **Step 6: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/l4d2web/routes/page_routes.py \
|
|
||||||
l4d2web/l4d2web/templates/server_detail.html \
|
|
||||||
l4d2web/tests/test_servers.py
|
|
||||||
git commit -m "feat(server-detail): cap inline console to 20 newest; modal keeps 50"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### Task 2: Generalise `scrollConsolesToBottom` to walk ancestors
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/l4d2web/static/js/console-history.js:159-179`
|
|
||||||
|
|
||||||
The current helper only matches `root` and its descendants. When `htmx:load` fires after `hx-swap="beforeend"`, `event.detail.elt` is the newly inserted child line — neither it nor its descendants match `[data-autoscroll]`, so the transcript never scrolls. Adding an ancestor walk fixes this case without affecting the existing one.
|
|
||||||
|
|
||||||
- [ ] **Step 1: Rewrite the helper**
|
|
||||||
|
|
||||||
Replace lines 159-179 of `l4d2web/l4d2web/static/js/console-history.js` with:
|
|
||||||
|
|
||||||
```js
function scrollAutoscrollTargets(root) {
  if (!root) return;
  const targets = [];
  // Case 1: root itself opts in.
  if (root.matches && root.matches("[data-autoscroll]")) {
    targets.push(root);
  }
  // Case 2: descendants opt in.
  if (root.querySelectorAll) {
    root.querySelectorAll("[data-autoscroll]").forEach((el) => targets.push(el));
  }
  // Case 3: neither matched, so walk up. Handles htmx:load firing with the
  // inserted child as the root after hx-swap="beforeend" on a console line.
  if (targets.length === 0 && root.closest) {
    const up = root.closest("[data-autoscroll]");
    if (up) targets.push(up);
  }
  targets.forEach((el) => {
    el.scrollTop = el.scrollHeight;
  });
}

// Expose for tabs.js (and any future cross-module consumer). The script
// is `defer`red in base.html, so it runs before DOMContentLoaded and the
// global is defined by the time tabs.js's DCL-deferred initStrips runs.
window.scrollAutoscrollTargets = scrollAutoscrollTargets;

document.addEventListener("DOMContentLoaded", () => {
  scrollAutoscrollTargets(document);
  bindAllConsoleForms(document);
});

document.addEventListener("htmx:load", (event) => {
  scrollAutoscrollTargets(event.detail.elt);
  bindAllConsoleForms(event.detail.elt);
});
```
|
|
||||||
|
|
||||||
That is: rename the function, add the third (ancestor-walk) case, expose on `window`, and update the two listeners to call the new name. Behavior preserved for the existing two cases.
|
|
||||||
|
|
||||||
- [ ] **Step 2: Smoke-verify in a browser**
|
|
||||||
|
|
||||||
Start the dev server (or use a running one). Open `/servers/<id>` and run in the devtools console:
|
|
||||||
|
|
||||||
```js
|
|
||||||
typeof window.scrollAutoscrollTargets
|
|
||||||
```
|
|
||||||
Expected: `"function"`.
|
|
||||||
|
|
||||||
Then open a Console tab that already holds more than one viewport of content. The transcript will still start at the top at this point, because `tabs.js` doesn't call the helper until Task 3. This smoke check only confirms that the helper is defined and reachable.
|
|
||||||
|
|
||||||
- [ ] **Step 3: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/l4d2web/static/js/console-history.js
|
|
||||||
git commit -m "feat(console): scrollAutoscrollTargets walks ancestors; expose on window"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### Task 3: `tabs.js` — scroll on tab activation
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/l4d2web/static/js/tabs.js:9-19`
|
|
||||||
|
|
||||||
- [ ] **Step 1: Edit `activateTab`**
|
|
||||||
|
|
||||||
In `l4d2web/l4d2web/static/js/tabs.js`, at the end of the `activateTab(strip, name)` function, after `strip.dataset.activeTab = name;`, append:
|
|
||||||
|
|
||||||
```js
  // Pin any scroll-locked regions (log streams, console transcripts) in
  // the newly-visible pane to the bottom. While the pane was hidden,
  // their scrollHeight was 0 so previous appends couldn't anchor.
  const activePane = strip.querySelector('[role="tabpanel"]:not([hidden])');
  if (activePane && window.scrollAutoscrollTargets) {
    window.scrollAutoscrollTargets(activePane);
  }
```
|
|
||||||
|
|
||||||
The full block now looks like:
|
|
||||||
|
|
||||||
```js
function activateTab(strip, name) {
  strip.querySelectorAll('[role="tab"]').forEach((t) => {
    const on = t.dataset.tab === name;
    t.setAttribute("aria-selected", on ? "true" : "false");
    t.tabIndex = on ? 0 : -1;
  });
  strip.querySelectorAll('[role="tabpanel"]').forEach((p) => {
    p.hidden = p.dataset.tab !== name;
  });
  strip.dataset.activeTab = name;

  const activePane = strip.querySelector('[role="tabpanel"]:not([hidden])');
  if (activePane && window.scrollAutoscrollTargets) {
    window.scrollAutoscrollTargets(activePane);
  }
}
```
|
|
||||||
|
|
||||||
- [ ] **Step 2: Add `data-autoscroll` to the log-stream elements**
|
|
||||||
|
|
||||||
In `l4d2web/l4d2web/templates/server_detail.html`:
|
|
||||||
|
|
||||||
- Inline log (~L61):
|
|
||||||
```jinja
|
|
||||||
<pre class="log-stream" data-autoscroll data-sse-url="/servers/{{ server.id }}/logs/stream"></pre>
|
|
||||||
```
|
|
||||||
- Modal log (~L101):
|
|
||||||
```jinja
|
|
||||||
<pre class="log-stream tall" data-autoscroll data-sse-url="/servers/{{ server.id }}/logs/stream"></pre>
|
|
||||||
```
|
|
||||||
- Modal job-log (~L159):
|
|
||||||
```jinja
|
|
||||||
<pre class="log-stream tall" data-autoscroll data-sse-url="/jobs/{{ latest_job.id }}/stream"></pre>
|
|
||||||
```
|
|
||||||
|
|
||||||
The Console transcripts already carry `data-autoscroll` and don't need editing.
|
|
||||||
|
|
||||||
- [ ] **Step 3: Live-browser verification**
|
|
||||||
|
|
||||||
This is the moment to confirm the user-visible bug is fixed. Start (or reuse) the dev server and seed > 20 console rows for `demo-server` (the dev seed has only 9; bring that to 30+):
|
|
||||||
|
|
||||||
```bash
|
|
||||||
sqlite3 .tmp/dev-server/l4d2web.db "
|
|
||||||
WITH RECURSIVE seq(n) AS (SELECT 1 UNION ALL SELECT n+1 FROM seq WHERE n<30)
|
|
||||||
INSERT INTO command_history (user_id, server_id, command, reply, is_error, created_at)
|
|
||||||
SELECT 1, 1, 'verify_cmd_' || printf('%02d', n),
|
|
||||||
'reply ' || n, 0, datetime('now', '-'||(35-n)||' minutes') FROM seq;"
|
|
||||||
```
|
|
||||||
|
|
||||||
Then in a browser, log into `/login` as `dev` / `devdevdev`, open `/servers/1`, click the **Console** tab, and run in devtools:
|
|
||||||
|
|
||||||
```js
|
|
||||||
(() => {
|
|
||||||
const t = document.querySelector('[id^="console-transcript-inline-"]');
|
|
||||||
return { scrollTop: t.scrollTop, scrollHeight: t.scrollHeight, clientHeight: t.clientHeight,
|
|
||||||
bottomDistance: t.scrollHeight - t.scrollTop - t.clientHeight,
|
|
||||||
inline_lines: t.querySelectorAll('.console-line').length };
|
|
||||||
})();
|
|
||||||
```
|
|
||||||
Expected: `bottomDistance < 2` (pinned to bottom), `inline_lines == 20`.
|
|
||||||
|
|
||||||
Same exercise on the **Log** tab: switch to Console, then back to Log; the SSE-streamed log should be pinned to bottom. If the log stream has no content yet (server isn't running and never has been), this assertion vacuously passes (`scrollHeight === clientHeight`).
|
|
||||||
|
|
||||||
- [ ] **Step 4: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/l4d2web/static/js/tabs.js \
|
|
||||||
l4d2web/l4d2web/templates/server_detail.html
|
|
||||||
git commit -m "feat(server-detail): pin transcripts/logs to bottom on tab activation"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### Task 4: Pin Console-modal transcript on `modal:opened`
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/l4d2web/templates/server_detail.html:105-120` (Console modal)
|
|
||||||
|
|
||||||
When the Console modal opens via `data-inline-modal-open="console-modal"`, the dialog goes from `display:none` to displayed. The transcript inside it has `scrollHeight=0` while hidden, so any earlier autoscroll attempt did nothing. `modals.js:35` dispatches a `modal:opened` CustomEvent on `dialog.showModal()`; we listen for it on the dialog and re-pin.
|
|
||||||
|
|
||||||
- [ ] **Step 1: Add the `modal:opened` listener in the template (no JS file changes)**
|
|
||||||
|
|
||||||
In `l4d2web/l4d2web/templates/server_detail.html`, change the Console modal opening tag (~L105):
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
<dialog id="console-modal" class="modal" aria-labelledby="console-modal-title"
|
|
||||||
onmodal-opened="if(window.scrollAutoscrollTargets){window.scrollAutoscrollTargets(this)}">
|
|
||||||
```
|
|
||||||
|
|
||||||
An `onmodal-opened` attribute like the one above will not fire: browsers only treat the standard `on<event>` names as event handler attributes, so a listener for the custom `modal:opened` event has to be registered with `addEventListener`. Instead, add a small inline script to the template (one-off, not worth a new JS file):
|
|
||||||
|
|
||||||
Replace the opening `<dialog id="console-modal" …>` line with:
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
<dialog id="console-modal" class="modal" aria-labelledby="console-modal-title">
|
|
||||||
```
|
|
||||||
|
|
||||||
…and **immediately before `{% endblock %}`** (end of file), add:
|
|
||||||
|
|
||||||
```jinja
|
|
||||||
<script>
|
|
||||||
// Pin the Console modal transcript to its bottom each time the modal
|
|
||||||
// opens. While the <dialog> is closed, its descendants have scrollHeight=0,
|
|
||||||
// so neither the page-load autoscroll nor htmx:load can anchor them.
|
|
||||||
// The 'modal:opened' CustomEvent is dispatched by modals.js on
|
|
||||||
// dialog.showModal().
|
|
||||||
(() => {
|
|
||||||
const dlg = document.getElementById("console-modal");
|
|
||||||
if (!dlg) return;
|
|
||||||
dlg.addEventListener("modal:opened", () => {
|
|
||||||
if (window.scrollAutoscrollTargets) {
|
|
||||||
window.scrollAutoscrollTargets(dlg);
|
|
||||||
}
|
|
||||||
});
|
|
||||||
})();
|
|
||||||
</script>
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 2: Live-browser verification**
|
|
||||||
|
|
||||||
Reload `/servers/1` (after seeding from Task 3 Step 3). Click the ⛶ expand button on the Console tab to open `#console-modal`. In devtools:
|
|
||||||
|
|
||||||
```js
|
|
||||||
(() => {
|
|
||||||
const t = document.querySelector('[id^="console-transcript-modal-"]');
|
|
||||||
return { scrollTop: t.scrollTop, scrollHeight: t.scrollHeight,
|
|
||||||
clientHeight: t.clientHeight,
|
|
||||||
bottomDistance: t.scrollHeight - t.scrollTop - t.clientHeight };
|
|
||||||
})();
|
|
||||||
```
|
|
||||||
Expected: `bottomDistance < 2`.
|
|
||||||
|
|
||||||
- [ ] **Step 3: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/l4d2web/templates/server_detail.html
|
|
||||||
git commit -m "feat(server-detail): pin Console-modal transcript on modal:opened"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### Task 5: e2e — Console tab pinned on activation; pinned after submit
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `l4d2web/tests/e2e/test_server_detail.py` (append)
|
|
||||||
- Possibly: `l4d2web/tests/e2e/conftest.py` (add seeded-history fixture if helpful)
|
|
||||||
|
|
||||||
The dev server seed has too few rows to trigger overflow; the e2e fixture seeds its own. Reusing the `server_with_files` fixture is fine — it already builds a Server; we just need to insert `CommandHistory` rows before navigating.
|
|
||||||
|
|
||||||
- [ ] **Step 1: Add a fixture that seeds console history**
|
|
||||||
|
|
||||||
Append to `l4d2web/tests/e2e/conftest.py`:
|
|
||||||
|
|
||||||
```python
|
|
||||||
@pytest.fixture(scope="function")
|
|
||||||
def server_with_console_history(server_with_files):
|
|
||||||
"""server_with_files + 30 seeded CommandHistory rows for that server,
|
|
||||||
so the inline Console transcript exceeds its visible height and the
|
|
||||||
autoscroll behaviour is observable."""
|
|
||||||
from datetime import UTC, datetime, timedelta
|
|
||||||
from l4d2web.models import CommandHistory
|
|
||||||
|
|
||||||
sid = server_with_files["server_id"]
|
|
||||||
uid = server_with_files["user_id"]
|
|
||||||
with session_scope() as session:
|
|
||||||
for i in range(30):
|
|
||||||
session.add(CommandHistory(
|
|
||||||
user_id=uid,
|
|
||||||
server_id=sid,
|
|
||||||
command=f"seed_{i:02d}",
|
|
||||||
reply=f"reply {i}",
|
|
||||||
is_error=False,
|
|
||||||
created_at=datetime.now(UTC) - timedelta(minutes=35 - i),
|
|
||||||
))
|
|
||||||
|
|
||||||
return server_with_files
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 2: Write the failing e2e tests**
|
|
||||||
|
|
||||||
Append to `l4d2web/tests/e2e/test_server_detail.py`:
|
|
||||||
|
|
||||||
```python
|
|
||||||
def test_console_tab_pinned_to_bottom_on_activation(page: Page, server_with_console_history) -> None:
|
|
||||||
"""Clicking the Console tab leaves the transcript scrolled to its
|
|
||||||
bottom — the newest seeded command must be visible, not the oldest."""
|
|
||||||
base = server_with_console_history["base_url"]
|
|
||||||
sid = server_with_console_history["server_id"]
|
|
||||||
login(page, base)
|
|
||||||
page.goto(f"{base}/servers/{sid}")
|
|
||||||
|
|
||||||
strip = page.locator("[data-tab-strip]")
|
|
||||||
strip.locator('[role="tab"][data-tab="console"]').click()
|
|
||||||
|
|
||||||
transcript = page.locator(f"#console-transcript-inline-{sid}")
|
|
||||||
expect(transcript).to_be_visible()
|
|
||||||
|
|
||||||
# Pinned to bottom: |scrollHeight - scrollTop - clientHeight| < 2
|
|
||||||
bottom_distance = transcript.evaluate(
|
|
||||||
"(el) => el.scrollHeight - el.scrollTop - el.clientHeight"
|
|
||||||
)
|
|
||||||
assert abs(bottom_distance) < 2, f"transcript not pinned to bottom: {bottom_distance}px"
|
|
||||||
|
|
||||||
# Inline pane caps at 20 lines.
|
|
||||||
line_count = transcript.locator(".console-line").count()
|
|
||||||
assert line_count == 20, f"inline expected 20 lines, got {line_count}"
|
|
||||||
|
|
||||||
|
|
||||||
def test_console_pane_pinned_after_command_submit(page: Page, server_with_console_history) -> None:
|
|
||||||
"""After submitting a command, the transcript scrolls so the new line
|
|
||||||
is visible at the bottom.
|
|
||||||
|
|
||||||
The dev server has no live RCON, but the POST still records a
|
|
||||||
CommandHistory row and HTMX appends a console-line to the transcript;
|
|
||||||
that's enough to exercise the autoscroll path.
|
|
||||||
"""
|
|
||||||
base = server_with_console_history["base_url"]
|
|
||||||
sid = server_with_console_history["server_id"]
|
|
||||||
login(page, base)
|
|
||||||
page.goto(f"{base}/servers/{sid}")
|
|
||||||
|
|
||||||
strip = page.locator("[data-tab-strip]")
|
|
||||||
strip.locator('[role="tab"][data-tab="console"]').click()
|
|
||||||
|
|
||||||
transcript = page.locator(f"#console-transcript-inline-{sid}")
|
|
||||||
pane = page.locator('[role="tabpanel"][data-tab="console"]')
|
|
||||||
cmd_input = pane.locator('input[name="command"]')
|
|
||||||
cmd_input.fill("verify_submit")
|
|
||||||
cmd_input.press("Enter")
|
|
||||||
|
|
||||||
# Wait for the new line to appear in the DOM.
|
|
||||||
expect(transcript.locator(".console-line", has_text="verify_submit")).to_be_visible()
|
|
||||||
|
|
||||||
bottom_distance = transcript.evaluate(
|
|
||||||
"(el) => el.scrollHeight - el.scrollTop - el.clientHeight"
|
|
||||||
)
|
|
||||||
assert abs(bottom_distance) < 2, f"transcript not pinned after submit: {bottom_distance}px"
|
|
||||||
```
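Both tests above repeat the same pinned-to-bottom arithmetic. A minimal sketch of a shared helper follows; the name `is_pinned_to_bottom` and the default tolerance are assumptions for illustration, not part of the existing test suite.

```python
def is_pinned_to_bottom(
    scroll_height: float,
    scroll_top: float,
    client_height: float,
    tolerance: float = 2.0,
) -> bool:
    """Return True when a scroll container sits at (or within
    `tolerance` pixels of) its bottom edge.

    Hypothetical helper: mirrors the
    `scrollHeight - scrollTop - clientHeight` check used in the tests.
    """
    return abs(scroll_height - scroll_top - client_height) < tolerance
```

A test could fetch the three values through a single `transcript.evaluate(...)` call and pass them to this helper, which keeps the tolerance in one place.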
|
|
||||||
|
|
||||||
- [ ] **Step 3: Run e2e**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd l4d2web && pytest tests/e2e/test_server_detail.py -m e2e -v
|
|
||||||
```
|
|
||||||
|
|
||||||
If the dev-machine doesn't have Chromium installed: `playwright install chromium` first.
|
|
||||||
|
|
||||||
Expected: PASS for both new tests; existing tests still pass.
|
|
||||||
|
|
||||||
- [ ] **Step 4: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add l4d2web/tests/e2e/conftest.py l4d2web/tests/e2e/test_server_detail.py
|
|
||||||
git commit -m "test(e2e): console transcript pinned to bottom on tab + submit"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Final verification (live browser)
|
|
||||||
|
|
||||||
After all tasks merge:
|
|
||||||
|
|
||||||
1. Reset / re-seed the dev DB if you used the `verify_cmd_*` seed above:
|
|
||||||
```bash
|
|
||||||
sqlite3 .tmp/dev-server/l4d2web.db "DELETE FROM command_history WHERE command LIKE 'verify_cmd_%' OR command LIKE 'seed_cmd_%';"
|
|
||||||
```
|
|
||||||
2. Restart `scripts/dev-server.py`.
|
|
||||||
3. In a browser, log in as `dev` / `devdevdev`, open `/servers/1`.
|
|
||||||
4. Click **Console** — transcript shows ≤20 most-recent commands, scrolled to bottom.
|
|
||||||
5. Submit any command (e.g. `status`): the reply (an error, since the dev server has no live RCON) appends as a new line and the transcript scrolls to keep it visible.
|
|
||||||
6. Click ⛶ to open the Console modal: the modal transcript shows up to the 50 most-recent commands (fewer if fewer exist), scrolled to bottom.
|
|
||||||
7. Switch to **Log**, then between Console and Log a few times — each transition leaves the active tab's transcript pinned to its bottom.
|
|
||||||
|
|
||||||
## Spec coverage check
|
|
||||||
|
|
||||||
- ✅ Server-side slice to 20 newest inline; modal keeps 50 — **Task 1**
|
|
||||||
- ✅ All three log-stream `<pre>`s (inline log, modal log, modal job-log) get `data-autoscroll` — **Task 3 Step 2**
|
|
||||||
- ✅ Helper walks ancestors to handle htmx:load on appended child — **Task 2**
|
|
||||||
- ✅ Helper exposed on `window` — **Task 2**
|
|
||||||
- ✅ `tabs.js` pins on activation — **Task 3 Step 1**
|
|
||||||
- ✅ Console-modal pin on `modal:opened` — **Task 4**
|
|
||||||
- ✅ Unit/template coverage for cap — **Task 1**
|
|
||||||
- ✅ e2e coverage for tab activation + submit pin — **Task 5**
|
|
||||||
|
|
@ -1,353 +0,0 @@
|
||||||
# L4D2 Global Map Overlays Design
|
|
||||||
|
|
||||||
**Goal:** Add two managed, system-wide map overlays, `l4d2center-maps` and `cedapug-maps`, populated from upstream map sources and refreshed daily through the existing job system.
|
|
||||||
|
|
||||||
**Approval status:** User-approved design direction. Implementation must not start until this spec is reviewed and an implementation plan is written.
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
`left4me` already has typed overlays, a builder registry, global overlays through `Overlay.user_id = NULL`, and queued overlay build jobs. Steam Workshop overlays use a cache plus symlinks into `left4dead2/addons/`, and server initialization already runs overlay builders before calling `l4d2ctl initialize`.
|
|
||||||
|
|
||||||
Global map sources fit the same model. The host library remains unchanged: it receives overlay refs and mounts directories. The web app owns map-source fetching, cache management, reconciliation, and job logs.
|
|
||||||
|
|
||||||
The two upstream sources are:
|
|
||||||
|
|
||||||
- `https://l4d2center.com/maps/servers/index.csv`
|
|
||||||
- `https://cedapug.com/custom`
|
|
||||||
|
|
||||||
## Locked Decisions
|
|
||||||
|
|
||||||
1. **One general operation.** Use `refresh_global_overlays`, not source-specific cron operations.
|
|
||||||
2. **Systemd owns time.** A systemd timer runs daily and invokes a Flask CLI command. The CLI only enqueues work; the existing worker performs downloads and writes logs.
|
|
||||||
3. **System jobs are nullable-owner jobs.** `jobs.user_id` becomes nullable. `NULL` means the job was created by the system. UI displays owner as `system`. Only admins can access system jobs.
|
|
||||||
4. **Managed global overlays are auto-seeded.** The app creates or repairs exactly one `l4d2center-maps` overlay and exactly one `cedapug-maps` overlay.
|
|
||||||
5. **Global overlays are normal system overlays for users.** `Overlay.user_id = NULL` makes them visible to every authenticated user and selectable in every user's blueprint editor.
|
|
||||||
6. **Managed types are not user-creatable.** Normal overlay creation does not offer `l4d2center_maps` or `cedapug_maps`. The seeder is the only code path that creates those types.
|
|
||||||
7. **Exact reconciliation.** Refresh makes each managed overlay match its upstream manifest. Removed upstream maps are removed from the managed overlay symlink set. Foreign files are left alone and logged.
|
|
||||||
8. **No initialize-time downloads.** `initialize_server()` may run builders to repair symlinks, but it must not fetch remote manifests or download large archives. Missing cache content fails clearly.
|
|
||||||
9. **Separate cache from Workshop.** Non-Steam global maps use `${LEFT4ME_ROOT}/global_overlay_cache`, not `${LEFT4ME_ROOT}/workshop_cache`.
|
|
||||||
10. **Source-specific parsing stays explicit.** Do not introduce a generic arbitrary HTTP source framework in this phase.
|
|
||||||
|
|
||||||
## Architecture
|
|
||||||
|
|
||||||
The design extends the existing overlay-builder registry:
|
|
||||||
|
|
||||||
```python
|
|
||||||
BUILDERS = {
|
|
||||||
"external": ExternalBuilder(),
|
|
||||||
"workshop": WorkshopBuilder(),
|
|
||||||
"l4d2center_maps": GlobalMapOverlayBuilder(),
|
|
||||||
"cedapug_maps": GlobalMapOverlayBuilder(),
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
Both global map overlay types share the same filesystem builder. Source-specific code lives in refresh services that know how to fetch and parse upstream manifests.
|
|
||||||
|
|
||||||
High-level flow:
|
|
||||||
|
|
||||||
```text
|
|
||||||
systemd timer
|
|
||||||
-> flask refresh-global-overlays
|
|
||||||
-> ensure_global_overlays()
|
|
||||||
-> enqueue refresh_global_overlays job (coalesced)
|
|
||||||
-> worker fetches manifests
|
|
||||||
-> worker downloads/extracts cache files
|
|
||||||
-> worker records desired VPK files
|
|
||||||
-> worker rebuilds overlay symlinks directly
|
|
||||||
```
|
|
||||||
|
|
||||||
Auto-seeded overlay rows use fixed names, managed types, `user_id = NULL`, and web-generated paths:
|
|
||||||
|
|
||||||
```text
|
|
||||||
name=l4d2center-maps, type=l4d2center_maps, user_id=NULL, path=str(id)
|
|
||||||
name=cedapug-maps, type=cedapug_maps, user_id=NULL, path=str(id)
|
|
||||||
```
|
|
||||||
|
|
||||||
## Data Model
|
|
||||||
|
|
||||||
### `jobs`
|
|
||||||
|
|
||||||
Change `jobs.user_id` from required to nullable.
|
|
||||||
|
|
||||||
`NULL` means a system-created job. Authorization rules become:
|
|
||||||
|
|
||||||
- Admins can view, stream, and cancel every job, including system jobs.
|
|
||||||
- Non-admins can access only jobs where `job.user_id == current_user.id`.
|
|
||||||
- System jobs are not visible to non-admins through direct job URLs.
|
|
||||||
|
|
||||||
Job list/detail pages use outer joins to `users` and render missing owners as `system`.
|
|
||||||
|
|
||||||
### `global_overlay_sources`
|
|
||||||
|
|
||||||
One row per managed global source overlay:
|
|
||||||
|
|
||||||
```text
|
|
||||||
id INTEGER PRIMARY KEY
|
|
||||||
overlay_id INTEGER NOT NULL UNIQUE REFERENCES overlays(id) ON DELETE CASCADE
|
|
||||||
source_key VARCHAR(64) NOT NULL UNIQUE -- l4d2center-maps | cedapug-maps
|
|
||||||
source_type VARCHAR(32) NOT NULL -- l4d2center_csv | cedapug_custom_page
|
|
||||||
source_url TEXT NOT NULL
|
|
||||||
last_manifest_hash VARCHAR(64) NOT NULL DEFAULT ''
|
|
||||||
last_refreshed_at DATETIME NULL
|
|
||||||
last_error TEXT NOT NULL DEFAULT ''
|
|
||||||
created_at DATETIME NOT NULL
|
|
||||||
updated_at DATETIME NOT NULL
|
|
||||||
```
|
|
||||||
|
|
||||||
`source_key` is stable and used by the seeder to repair missing rows.
|
|
||||||
|
|
||||||
### `global_overlay_items`
|
|
||||||
|
|
||||||
One row per manifest item belonging to a global overlay source:
|
|
||||||
|
|
||||||
```text
|
|
||||||
id INTEGER PRIMARY KEY
|
|
||||||
source_id INTEGER NOT NULL REFERENCES global_overlay_sources(id) ON DELETE CASCADE
|
|
||||||
item_key VARCHAR(255) NOT NULL -- stable per source
|
|
||||||
display_name VARCHAR(255) NOT NULL DEFAULT ''
|
|
||||||
download_url TEXT NOT NULL
|
|
||||||
expected_vpk_name VARCHAR(255) NOT NULL DEFAULT ''
|
|
||||||
expected_size BIGINT NULL
|
|
||||||
expected_md5 VARCHAR(32) NOT NULL DEFAULT ''
|
|
||||||
etag VARCHAR(255) NOT NULL DEFAULT ''
|
|
||||||
last_modified VARCHAR(255) NOT NULL DEFAULT ''
|
|
||||||
content_length BIGINT NULL
|
|
||||||
last_downloaded_at DATETIME NULL
|
|
||||||
last_error TEXT NOT NULL DEFAULT ''
|
|
||||||
created_at DATETIME NOT NULL
|
|
||||||
updated_at DATETIME NOT NULL
|
|
||||||
UNIQUE(source_id, item_key)
|
|
||||||
```
|
|
||||||
|
|
||||||
For `l4d2center`, `item_key` and `expected_vpk_name` come from the CSV `Name` column, and `expected_size` / `expected_md5` come from the CSV.
|
|
||||||
|
|
||||||
For `cedapug`, `item_key` is the basename of the direct download URL path, normalized without query parameters. CEDAPUG does not publish checksums in the observed page, so integrity checking relies on HTTP metadata (when available) and on archive extraction checks.
|
|
||||||
|
|
||||||
### `global_overlay_item_files`
|
|
||||||
|
|
||||||
One row per extracted VPK file that should appear in an overlay:
|
|
||||||
|
|
||||||
```text
|
|
||||||
id INTEGER PRIMARY KEY
|
|
||||||
item_id INTEGER NOT NULL REFERENCES global_overlay_items(id) ON DELETE CASCADE
|
|
||||||
vpk_name VARCHAR(255) NOT NULL
|
|
||||||
cache_path TEXT NOT NULL -- relative path under global_overlay_cache
|
|
||||||
size BIGINT NOT NULL
|
|
||||||
md5 VARCHAR(32) NOT NULL DEFAULT ''
|
|
||||||
created_at DATETIME NOT NULL
|
|
||||||
updated_at DATETIME NOT NULL
|
|
||||||
UNIQUE(item_id, vpk_name)
|
|
||||||
```
|
|
||||||
|
|
||||||
This extra file table handles archives that contain more than one `.vpk` without overloading the item row.
|
|
||||||
|
|
||||||
## Filesystem Layout
|
|
||||||
|
|
||||||
Use a cache separate from Steam Workshop:
|
|
||||||
|
|
||||||
```text
|
|
||||||
${LEFT4ME_ROOT}/
|
|
||||||
global_overlay_cache/
|
|
||||||
l4d2center-maps/
|
|
||||||
archives/
|
|
||||||
vpks/
|
|
||||||
cedapug-maps/
|
|
||||||
archives/
|
|
||||||
vpks/
|
|
||||||
overlays/
|
|
||||||
{overlay_id}/
|
|
||||||
left4dead2/addons/
|
|
||||||
*.vpk -> absolute symlink to global_overlay_cache/.../vpks/*.vpk
|
|
||||||
```
|
|
||||||
|
|
||||||
Cache file writes are atomic: download to `*.partial`, extract to a temporary directory, verify, then `os.replace()` final VPK files.
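The verify-then-replace step could look roughly like the sketch below. It assumes the staged file and the final path live on the same filesystem (otherwise `os.replace()` is not atomic), and the function names `md5_of` and `install_verified_vpk` are hypothetical.

```python
import hashlib
import os
from pathlib import Path


def md5_of(path: Path) -> str:
    """Stream a file through MD5 in 1 MiB chunks and return the hex digest."""
    digest = hashlib.md5(usedforsecurity=False)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def install_verified_vpk(
    staged: Path, final: Path, expected_size: int, expected_md5: str
) -> None:
    """Verify a staged VPK, then atomically move it into the cache.

    Raises ValueError on mismatch and leaves any prior `final` untouched.
    """
    actual_size = staged.stat().st_size
    if actual_size != expected_size:
        raise ValueError(f"size mismatch: {actual_size} != {expected_size}")
    actual_md5 = md5_of(staged)
    if actual_md5 != expected_md5.lower():
        raise ValueError(f"md5 mismatch: {actual_md5} != {expected_md5}")
    final.parent.mkdir(parents=True, exist_ok=True)
    # Atomic on POSIX when source and destination share a filesystem.
    os.replace(staged, final)
```

Because verification happens before `os.replace()`, a failed download never overwrites the previously cached file.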
|
|
||||||
|
|
||||||
Symlink targets are absolute, matching the existing Workshop overlay design.
|
|
||||||
|
|
||||||
## Source Parsing
|
|
||||||
|
|
||||||
### L4D2Center
|
|
||||||
|
|
||||||
Fetch `https://l4d2center.com/maps/servers/index.csv` with a normal HTTP timeout.
|
|
||||||
|
|
||||||
The CSV is semicolon-delimited and contains:
|
|
||||||
|
|
||||||
```text
|
|
||||||
Name;Size;md5;Download link
|
|
||||||
```
|
|
||||||
|
|
||||||
Each item produces:
|
|
||||||
|
|
||||||
- `item_key = Name`
|
|
||||||
- `expected_vpk_name = Name`
|
|
||||||
- `expected_size = Size`
|
|
||||||
- `expected_md5 = md5`
|
|
||||||
- `download_url = Download link`
|
|
||||||
|
|
||||||
Downloads are `.7z` archives. Extraction uses a Python 7z implementation such as `py7zr` so tests do not depend on a system `7z` binary. After extraction, the expected VPK file must exist and match both size and md5. A mismatch fails that item and leaves the prior cached file in place.
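As an illustration, the manifest could be parsed with the standard `csv` module. This is a sketch under assumptions: `ManifestItem` and `parse_l4d2center_csv` are hypothetical names, `Size` is assumed to be a plain integer byte count, and lowercasing the md5 is a normalization choice made here rather than something the upstream format guarantees.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestItem:
    """Hypothetical in-memory form of one manifest row."""
    item_key: str
    expected_vpk_name: str
    expected_size: int
    expected_md5: str
    download_url: str


def parse_l4d2center_csv(text: str) -> list[ManifestItem]:
    """Parse the semicolon-delimited manifest into items."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    items: list[ManifestItem] = []
    for row in reader:
        name = (row.get("Name") or "").strip()
        if not name:
            continue  # skip blank or malformed rows
        items.append(
            ManifestItem(
                item_key=name,
                expected_vpk_name=name,
                expected_size=int(row["Size"]),
                expected_md5=row["md5"].strip().lower(),
                download_url=row["Download link"].strip(),
            )
        )
    return items
```

Parsing into a frozen dataclass keeps the fetch step separate from the database upsert, which makes the parser easy to cover with fixture data.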
|
|
||||||
|
|
||||||
### CEDAPUG
|
|
||||||
|
|
||||||
Fetch `https://cedapug.com/custom` and parse the embedded `renderCustomMapDownloads([...])` data.
|
|
||||||
|
|
||||||
Only direct download links are managed in v1:
|
|
||||||
|
|
||||||
- Relative links like `/maps/FatalFreight.zip` are converted to absolute `https://cedapug.com/maps/FatalFreight.zip`.
|
|
||||||
- External `http` links are logged and skipped in v1.
|
|
||||||
- Entries without a download link are built-in campaigns and skipped.
|
|
||||||
|
|
||||||
Downloads are `.zip` archives extracted with Python's standard `zipfile`. Every `.vpk` in the archive becomes a managed output file for that item. If no `.vpk` is present, the item fails and the prior cached files remain in place.
|
|
||||||
|
|
||||||
Because CEDAPUG does not publish checksums in the observed page, refresh detects changes using `ETag`, `Last-Modified`, `Content-Length`, and local extracted file metadata when available. A manual refresh can force revalidation by clearing item metadata in a later maintenance path; no force-refresh UI is included in this design.
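One way to express that change check is sketched below. The function name `needs_download`, the stored-field keys, and the policy of downloading whenever no header can be compared are all assumptions of this sketch.

```python
# Stored item field -> HTTP response header it is compared against.
COMPARED = (
    ("etag", "ETag"),
    ("last_modified", "Last-Modified"),
    ("content_length", "Content-Length"),
)


def needs_download(stored: dict[str, str], headers: dict[str, str]) -> bool:
    """Decide whether an item should be downloaded again.

    Hypothetical policy: any comparable header that differs means
    'changed'; if nothing is comparable, download to be safe.
    Header names are assumed to be normalized by the caller.
    """
    comparable = False
    for field, header in COMPARED:
        old = str(stored.get(field) or "")
        new = str(headers.get(header) or "")
        if old and new:
            comparable = True
            if old != new:
                return True
    return not comparable
```

Clearing the stored metadata for an item makes `needs_download` return `True`, which matches the maintenance path described above for forcing revalidation.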
|
|
||||||
|
|
||||||
## Refresh Job
|
|
||||||
|
|
||||||
`refresh_global_overlays` is a global worker operation.
|
|
||||||
|
|
||||||
Behavior:
|
|
||||||
|
|
||||||
1. Ensure both managed global overlays and source rows exist.
|
|
||||||
2. Fetch both manifests.
|
|
||||||
3. Upsert manifest items.
|
|
||||||
4. Mark items absent from the manifest as no longer desired by deleting their item rows; cascading deletes remove their file rows.
|
|
||||||
5. Download and extract new or changed items.
|
|
||||||
6. Keep prior cache files when an item download or verification fails, but record `last_error`.
|
|
||||||
7. Rebuild symlinks for changed sources directly through the same builder interface used by `build_overlay`.
|
|
||||||
8. Emit clear job logs: manifest counts, downloads, skips, removals, verification failures, and build summaries.
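Steps 3 to 5 amount to a set comparison between stored items and the fresh manifest. A minimal sketch, assuming each item is reduced to a hypothetical "fingerprint" string (for instance its md5 or download URL); `plan_item_changes` is an illustrative name.

```python
def plan_item_changes(
    existing: dict[str, str], manifest: dict[str, str]
) -> dict[str, list[str]]:
    """Compare stored items with the manifest, keyed by item_key.

    Values are hypothetical fingerprints used only to detect changes.
    """
    existing_keys = existing.keys()
    manifest_keys = manifest.keys()
    return {
        # New upstream items: insert rows and download.
        "insert": sorted(manifest_keys - existing_keys),
        # Gone upstream: delete rows (file rows cascade).
        "delete": sorted(existing_keys - manifest_keys),
        # Present in both but different: download again.
        "update": sorted(
            key
            for key in manifest_keys & existing_keys
            if manifest[key] != existing[key]
        ),
    }
```

Computing the plan up front also yields the manifest counts and removals that step 8 asks the job log to report.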
|
|
||||||
|
|
||||||
`refresh_global_overlays` does not enqueue child `build_overlay` jobs. Direct builder invocation keeps the overlay in sync before the refresh job releases its global mutex, so a server job cannot start against updated cache metadata but stale overlay symlinks.
|
|
||||||
|
|
||||||
Coalescing:
|
|
||||||
|
|
||||||
- If a `refresh_global_overlays` job is queued or running, CLI/admin requests return the existing job instead of inserting a duplicate.
|
|
||||||
|
|
||||||
## Builder Reconciliation
|
|
||||||
|
|
||||||
`GlobalMapOverlayBuilder` reads desired file rows for the overlay's source and reconciles only symlinks it manages.
|
|
||||||
|
|
||||||
Managed symlink rule:
|
|
||||||
|
|
||||||
- A symlink in `left4dead2/addons/` is managed if its resolved target is under `${LEFT4ME_ROOT}/global_overlay_cache/{source_key}/vpks/`.
|
|
||||||
- Managed symlinks absent from desired files are removed.
|
|
||||||
- Desired files missing from cache are skipped and logged as errors.
|
|
||||||
- Non-symlink files and symlinks outside the source cache are left untouched and logged as foreign entries.
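The managed-symlink rule can be sketched as a classification pass over the addons directory. The function name `classify_addons` and the `keep`/`remove`/`foreign` labels are assumptions for illustration; creating missing symlinks and logging are left out.

```python
import os
from pathlib import Path


def classify_addons(
    addons_dir: Path, source_vpks_dir: Path, desired: set[str]
) -> dict[str, list[str]]:
    """Sort addons entries into keep / remove / foreign.

    An entry is managed only if it is a symlink whose resolved target
    lies under the source's vpks cache directory.
    """
    result: dict[str, list[str]] = {"keep": [], "remove": [], "foreign": []}
    cache_root = source_vpks_dir.resolve()
    for entry in sorted(addons_dir.iterdir()):
        if not entry.is_symlink():
            result["foreign"].append(entry.name)  # manual file: leave alone
            continue
        target = Path(os.path.realpath(entry))
        if cache_root not in target.parents:
            result["foreign"].append(entry.name)  # points outside the cache
        elif entry.name in desired:
            result["keep"].append(entry.name)
        else:
            result["remove"].append(entry.name)  # managed but obsolete
    return result
```

Only names in `remove` would ever be unlinked, which is what keeps manually placed files safe.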
|
|
||||||
|
|
||||||
This mirrors `WorkshopBuilder` behavior and keeps manual files safe.
|
|
||||||
|
|
||||||
## Scheduler Rules
|
|
||||||
|
|
||||||
`refresh_global_overlays` joins the existing global mutex group.
|
|
||||||
|
|
||||||
It must not run concurrently with:
|
|
||||||
|
|
||||||
- `install`
|
|
||||||
- `refresh_workshop_items`
|
|
||||||
- any `build_overlay`
|
|
||||||
- any server job (`initialize`, `start`, `stop`, `delete`)
|
|
||||||
|
|
||||||
No server or overlay job may start while `refresh_global_overlays` is running.
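The mutual-exclusion rule for the new operation can be sketched as a predicate. It models only the `refresh_global_overlays` rules stated above; the existing scheduler rules among other operations are not modelled here, and the name `may_start` is hypothetical.

```python
REFRESH = "refresh_global_overlays"
SERVER_OPS = {"initialize", "start", "stop", "delete"}

# Operations that must never overlap with a global overlay refresh.
EXCLUSIVE_WITH_REFRESH = (
    {"install", "refresh_workshop_items", "build_overlay", REFRESH} | SERVER_OPS
)


def may_start(candidate: str, running: set[str]) -> bool:
    """Apply only the refresh-related scheduling rules.

    Assumption: other pre-existing scheduler rules are checked elsewhere.
    """
    if candidate == REFRESH:
        # Refresh waits until nothing in its exclusion set is running.
        return not (running & EXCLUSIVE_WITH_REFRESH)
    if REFRESH in running and candidate in EXCLUSIVE_WITH_REFRESH:
        # Nothing in the exclusion set may start during a refresh.
        return False
    return True
```

A table-driven test over `EXCLUSIVE_WITH_REFRESH` gives the scheduler truth table for the new operation listed in the test plan.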
|
|
||||||
|
|
||||||
This conservative rule is acceptable because daily map refreshes are rare and large downloads should not race runtime changes.
|
|
||||||
|
|
||||||
## CLI And Systemd Timer
|
|
||||||
|
|
||||||
Add Flask CLI command:
|
|
||||||
|
|
||||||
```text
|
|
||||||
flask refresh-global-overlays
|
|
||||||
```
|
|
||||||
|
|
||||||
The command:
|
|
||||||
|
|
||||||
- Loads app config and DB.
|
|
||||||
- Ensures global overlays exist.
|
|
||||||
- Enqueues or returns the existing `refresh_global_overlays` job.
|
|
||||||
- Prints the job id.
|
|
||||||
- Does not run downloads itself.
|
|
||||||
|
|
||||||
Add deployment units:
|
|
||||||
|
|
||||||
```text
|
|
||||||
left4me-refresh-global-overlays.service
|
|
||||||
left4me-refresh-global-overlays.timer
|
|
||||||
```
|
|
||||||
|
|
||||||
Service command:
|
|
||||||
|
|
||||||
```text
|
|
||||||
/opt/left4me/.venv/bin/flask --app l4d2web.app:create_app refresh-global-overlays
|
|
||||||
```
|
|
||||||
|
|
||||||
Timer policy:
|
|
||||||
|
|
||||||
```text
|
|
||||||
OnCalendar=daily
|
|
||||||
Persistent=true
|
|
||||||
```
|
|
||||||
|
|
||||||
The service runs as the `left4me` user with `/etc/left4me/host.env` and `/etc/left4me/web.env`, matching `left4me-web.service`.
|
|
||||||
|
|
||||||
## Permissions And UI
|
|
||||||
|
|
||||||
Overlay list behavior:
|
|
||||||
|
|
||||||
- Admins see all overlays, including managed global map overlays.
|
|
||||||
- Non-admin users see system overlays and their own private workshop overlays.
|
|
||||||
- Managed global overlays appear in blueprint overlay selection for every user.
|
|
||||||
|
|
||||||
Creation behavior:
|
|
||||||
|
|
||||||
- Non-admin users can create only user-creatable types, currently `workshop`.
|
|
||||||
- Admins can create normal admin-creatable types, currently `external` and `workshop`.
|
|
||||||
- No user-facing create form offers `l4d2center_maps` or `cedapug_maps`.
|
|
||||||
- Auto-seeding is the only creation path for managed global map overlay types.
|
|
||||||
|
|
||||||
Admin controls:
|
|
||||||
|
|
||||||
- Add a manual "Refresh global overlays" action in the admin area.
|
|
||||||
- The action enqueues the same coalesced `refresh_global_overlays` job as the timer.
|
|
||||||
- Managed overlay detail pages show source type, source URL, last refresh time, last error, item count, and latest related jobs.
|
|
||||||
|
|
||||||
## Error Handling
|
|
||||||
|
|
||||||
- A manifest fetch failure fails the job outright if no source can be processed. If one source succeeds and the other fails, the job still finishes as failed, logs the partial success, and preserves prior content for the failed source.
|
|
||||||
- Per-item download failures do not abort sibling items.
|
|
||||||
- Verification failures keep prior cached files and record `last_error` on the item.
|
|
||||||
- Extraction rejects path traversal entries and ignores non-VPK files.
|
|
||||||
- Unsupported CEDAPUG external links are skipped with a warning.
|
|
||||||
- Initialize-time checks fail if desired global map files are missing from cache, naming the overlay and missing VPK names.
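The extraction rule (reject path traversal, ignore non-VPK files) could be sketched for `.zip` archives as follows. Two choices here are assumptions of the sketch: an unsafe entry fails the whole archive, and extracted files are flattened into the destination directory by basename. The name `extract_vpks` is hypothetical.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath


def extract_vpks(archive, dest: Path) -> list[str]:
    """Extract only .vpk members of a zip into `dest`, flattened.

    Raises ValueError on an unsafe member path or when the archive
    contains no .vpk at all.
    """
    extracted: list[str] = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            member = PurePosixPath(info.filename.replace("\\", "/"))
            if member.is_absolute() or ".." in member.parts:
                raise ValueError(f"unsafe archive entry: {info.filename!r}")
            if member.suffix.lower() != ".vpk":
                continue  # non-VPK files are ignored
            # Write by basename only, never by the archive's own path.
            with zf.open(info) as src, (dest / member.name).open("wb") as out:
                shutil.copyfileobj(src, out)
            extracted.append(member.name)
    if not extracted:
        raise ValueError("archive contains no .vpk files")
    return extracted
```

Extracting into a temporary directory first, as the filesystem section describes, means a rejected archive never touches the cached VPKs.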
|
|
||||||
|
|
||||||
## Tests
|
|
||||||
|
|
||||||
Test coverage should include:
|
|
||||||
|
|
||||||
- Auto-seeding creates exactly one source overlay per source and repairs missing source rows.
|
|
||||||
- `jobs.user_id` nullable behavior, outer joins, and `system` display.
|
|
||||||
- Non-admins cannot access system jobs directly.
|
|
||||||
- CLI coalesces queued/running `refresh_global_overlays` jobs.
|
|
||||||
- Scheduler truth table for the new global operation.
|
|
||||||
- L4D2Center CSV parser with semicolon-delimited fixture data.
|
|
||||||
- CEDAPUG embedded JavaScript parser with fixture HTML.
|
|
||||||
- L4D2Center download/extract verifies VPK size and md5.
|
|
||||||
- CEDAPUG download/extract records every VPK in a zip archive.
|
|
||||||
- Reconcile removes obsolete managed symlinks and leaves foreign files alone.
|
|
||||||
- Overlay create UI rejects managed singleton types.
|
|
||||||
- Blueprint overlay selection includes managed global overlays for all users.
|
|
||||||
- Deployment tests cover the service and timer artifacts.
|
|
||||||
|
|
||||||
## Out Of Scope
|
|
||||||
|
|
||||||
- User-created global map source overlays.
|
|
||||||
- Arbitrary configurable HTTP manifest sources.
|
|
||||||
- Force-refresh UI for CEDAPUG items.
|
|
||||||
- Cache garbage collection for unreferenced archive files.
|
|
||||||
- Client-side map download UX.
|
|
||||||
- Steam Workshop links discovered on the CEDAPUG page; those are skipped rather than imported into workshop overlays.
|
|
||||||
- Host-library awareness of managed overlay types.
|
|
||||||
|
|
||||||
## Implementation Boundaries
|
|
||||||
|
|
||||||
- `l4d2host` remains unchanged.
|
|
||||||
- The web app continues to call host operations only through `l4d2ctl`.
|
|
||||||
- Existing blueprint semantics remain unchanged: overlays are live-linked, ordered, and first overlay has highest precedence.
|
|
||||||
- Existing workshop overlay behavior remains unchanged except scheduler interactions with the new global operation.
|
|
||||||
|
|
@ -1,226 +0,0 @@
|
||||||
# L4D2 Workshop Overlays Design
|
|
||||||
|
|
||||||
**Goal:** Let users add Steam Workshop content (.vpk addons and maps) to L4D2 servers from the web UI. Workshop downloads run as a new typed overlay that fits the existing `Overlay` + `BlueprintOverlay` model, downloaded via the public Steam Web API and exposed through the existing fuse-overlayfs mount layer.
|
|
||||||
|
|
||||||
**Approval status:** User-approved design direction. Implementation proceeds in lockstep with the companion plan at `docs/superpowers/plans/2026-05-07-l4d2-workshop-overlays.md`.
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
`left4me` users today add `.vpk` content to a server only by SFTP-ing files into a manually-prepared overlay directory or by maintaining shell scripts (`competitive_rework`, `workshop_maps`, `tickrate`, etc.) that wrap `curl`/`steamcmd`. The web app exposes overlay rows but offers no way for users to populate them.
|
|
||||||
|
|
||||||
This spec adds **workshop overlays**: a user-private overlay type that downloads `.vpk` files via the public `ISteamRemoteStorage` API and surfaces them through the existing mount layer. Users keep composing blueprints by stacking overlays — workshop overlays become another row alongside today's externally-managed ones.
|
|
||||||
|
|
||||||
This is the first *typed* overlay. The design adds a `type` column and a builder-registry so future overlay types (tarball, inline, manual upload) plug in without schema churn or workflow changes.
|
|
||||||
|
|
||||||
Steam Workshop content for L4D2 (consumer_app_id 550) is downloadable via two anonymous-POST endpoints with no Steam Web API key required: `GetCollectionDetails` resolves a collection ID to its child item IDs, and `GetPublishedFileDetails` returns per-item metadata including a public `file_url` for the `.vpk`. This is the same API the user's existing `steam-workshop-download` script uses.
|
|
||||||
|
|
||||||
L4D2-specific player-side pain points (sv_consistency / RestrictAddons configuration gotchas, the inability to push workshop content via `sv_downloadurl`) are documented in **Out of scope** and tracked as separate follow-ups. This spec stays strictly on workshop content acquisition.
|
|
||||||
|
|
||||||
## Locked Decisions
|
|
||||||
|
|
||||||
1. **Typed overlays.** `Overlay.type` joins `external` (existing rows; admin-managed; no-op builder) and `workshop` (new). Future types — tarball, inline, manual upload — slot in via the same builder registry without schema churn.
|
|
||||||
2. **No JSON `source_config` blob.** Per-type structured data lives in proper relational tables. JSON is reserved for genuinely opaque diagnostic payloads.
|
|
||||||
3. **Central deduplicated `WorkshopItem` registry** keyed on `steam_id`. Cache lives at `/var/lib/left4me/workshop_cache/{steam_id}.vpk`. Multiple overlays referencing the same Steam item share the same cache file.
|
|
||||||
4. **Symlinks, not copies.** Overlay directories contain `left4dead2/addons/{steam_id}.vpk` symlinks pointing into the cache. Both the cache file and the symlink are named by `{steam_id}` only — no Steam filename in any on-disk path, so Steam can rename the upstream `.vpk` without breaking lookup.
|
|
||||||
5. **Many-to-many association is pure** (no `enabled` flag). Toggle a workshop item by removing or re-adding the association. The shared cache makes this cheap.
|
|
||||||
6. **Collections are atomic UI bulk-imports.** Pasting a collection URL/ID resolves member items and creates N item associations. The DB never tracks "this came from a collection." Re-importing a collection is idempotent on existing items and additive for new ones.
|
|
||||||
7. **Single global admin "Refresh all workshop items" button.** One Steam metadata batch call, then re-download items whose `time_updated` advanced. No per-item, per-overlay, or scheduled refresh in v1.
|
|
||||||
8. **No cache GC in v1.** Cache grows monotonically. Reference-counted cleanup is a follow-up.
|
|
||||||
9. **Globality is independent of overlay type.** `Overlay.user_id` is the scope (NULL = system-wide, set = private to that user). v1 defaults newly-created workshop overlays to private and leaves existing external overlays as system-wide. A future "publish/share" button will let owners toggle `user_id` without changing type.
|
|
||||||
10. **One unified "Create overlay" UI button.** Modal has a type radio (External | Workshop). No path field — the web app generates the path for every new overlay.
|
|
||||||
11. **Strict scope.** v1 ships only the workshop type. L4D2 server-config gotchas, client-subscription helpers, other recipe types — all deferred to follow-up specs.
|
|
||||||
12. **`consumer_app_id == 550` validation.** Every Steam API response is checked at fetch/add time; non-L4D2 items are rejected and never get a row. The value is a fixed precondition, not data.
|
|
||||||
13. **Input field accepts numeric ID, full Workshop URL, or a multi-line batch** of either. Pasting `123456` and pasting `steamcommunity.com/sharedfiles/filedetails/?id=123456` produce the same result; pasting many of either at once works too.
|
|
||||||
14. **Web-managed overlay paths.** All new overlays (any type) get `path = str(overlay_id)` at insert time. The user never picks a path. Existing legacy external overlay rows keep their current path values; migrating them to the ID-based scheme is a follow-up. `Overlay.id` uses SQLite `AUTOINCREMENT` so deleted IDs are never reused.
|
|
||||||
15. **Auto-rebuild on item change.** Adding or removing items from a workshop overlay automatically enqueues a `build_overlay` job. The "Rebuild" button on the detail page is for manual recovery only. New build jobs for an overlay coalesce with any pending one for the same overlay (don't queue duplicates).
|
|
||||||
16. **HTTPS** for all Steam Web API calls. The reference downloader uses HTTP; we don't.
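Decision 13 (numeric ID, full URL, or a multi-line batch of either) can be sketched as a small parser. The function name `parse_workshop_ids`, the order-preserving de-duplication, and the choice to raise on unparseable lines are assumptions for illustration; the sketch also does not check the URL host.

```python
import re
from urllib.parse import parse_qs, urlparse

_NUMERIC_ID = re.compile(r"[0-9]+")


def parse_workshop_ids(text: str) -> list[str]:
    """Extract Steam Workshop IDs from pasted text, one entry per line.

    Hypothetical helper. IDs are returned as strings in first-seen order.
    """
    ids: list[str] = []
    for line in text.splitlines():
        token = line.strip()
        if not token:
            continue  # tolerate blank lines in a pasted batch
        if _NUMERIC_ID.fullmatch(token):
            steam_id = token
        else:
            # Accept URLs pasted without a scheme, e.g. "steamcommunity.com/...".
            url = token if "://" in token else f"https://{token}"
            values = parse_qs(urlparse(url).query).get("id", [])
            if len(values) != 1 or not _NUMERIC_ID.fullmatch(values[0]):
                raise ValueError(f"not a Workshop ID or URL: {token!r}")
            steam_id = values[0]
        if steam_id not in ids:  # assumption: duplicates collapse silently
            ids.append(steam_id)
    return ids
```

With this shape, `123456` and `steamcommunity.com/sharedfiles/filedetails/?id=123456` on separate lines yield a single ID.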
|
|
||||||
|
|
||||||
## Architecture
|
|
||||||
|
|
||||||
```
|
|
||||||
Overlay row (type=workshop)
|
|
||||||
└─refs─▶ overlay_workshop_items
|
|
||||||
└─▶ WorkshopItem (global, by steam_id)
|
|
||||||
│
|
|
||||||
▼ download (Steam GetPublishedFileDetails + HTTP GET)
|
|
||||||
workshop_cache/{steam_id}.vpk
|
|
||||||
▲
|
|
||||||
overlay_dir/left4dead2/addons/{steam_id}.vpk ─symlink─┘
|
|
||||||
```
|
|
||||||
|
|
||||||
Build dispatch via a registry:
|
|
||||||
|
|
||||||
```python
|
|
||||||
BUILDERS = {"external": ExternalBuilder(), "workshop": WorkshopBuilder()}
|
|
||||||
|
|
||||||
def build_overlay(overlay_id):
|
|
||||||
overlay = db.get(Overlay, overlay_id)
|
|
||||||
BUILDERS[overlay.type].build(overlay, on_stdout, on_stderr, should_cancel)
|
|
||||||
```
|
|
||||||
|
|
||||||
`ExternalBuilder` is a no-op for legacy admin-managed dirs. `WorkshopBuilder` performs an idempotent diff-apply of `addons/` symlinks against the current associations. Future types add their own builders without changing the dispatcher, the mount layer, or the blueprint editor.
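The idempotent diff-apply could look roughly like the sketch below. It is an illustration under assumptions: the function name `sync_addon_symlinks`, its arguments, and the rule that only symlinks pointing into the cache count as managed are not taken from the codebase.

```python
import os
from pathlib import Path


def sync_addon_symlinks(addons_dir: Path, cache_dir: Path, steam_ids: set[str]) -> list[str]:
    """Make addons_dir contain exactly one absolute symlink per cached item.

    Returns the Steam IDs skipped because no cache file exists.
    Hypothetical helper; names are illustrative only.
    """
    addons_dir.mkdir(parents=True, exist_ok=True)
    cache_root = cache_dir.resolve()  # symlink targets must be absolute
    wanted: dict[str, Path] = {}
    skipped: list[str] = []
    for steam_id in sorted(steam_ids):
        target = cache_root / f"{steam_id}.vpk"
        if target.is_file():
            wanted[target.name] = target
        else:
            skipped.append(steam_id)  # never create a dangling symlink

    # Remove managed symlinks that are no longer wanted; leave everything else.
    for entry in addons_dir.iterdir():
        if not entry.is_symlink():
            continue
        current = Path(os.readlink(entry))
        if current.parent != cache_root:
            continue  # a symlink we did not create
        if wanted.get(entry.name) != current:
            entry.unlink()

    # Create whatever is missing; existing correct links are left untouched.
    for name, target in wanted.items():
        link = addons_dir / name
        if not link.is_symlink() and not link.exists():
            link.symlink_to(target)
    return skipped
```

Running it twice with the same associations changes nothing on disk, which is the property the build job relies on when it coalesces or retries.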
|
|
||||||
|
|
||||||
## Data Model
|
|
||||||
|
|
||||||
### `Overlay` (extended)
|
|
||||||
|
|
||||||
```
|
|
||||||
id INTEGER PK AUTOINCREMENT
|
|
||||||
name VARCHAR(255) NOT NULL
|
|
||||||
path VARCHAR(255) NOT NULL -- new overlays: str(id); legacy externals: existing values
|
|
||||||
type VARCHAR(16) NOT NULL -- 'external' | 'workshop' (extensible)
|
|
||||||
user_id INTEGER NULL REFERENCES users(id) -- NULL = system-wide
|
|
||||||
created_at, updated_at
|
|
||||||
|
|
||||||
UNIQUE INDEX on (name) WHERE user_id IS NULL -- system overlays globally unique by name
|
|
||||||
UNIQUE INDEX on (name, user_id) WHERE user_id IS NOT NULL -- per-user namespace
|
|
||||||
INDEX on (type, user_id)
|
|
||||||
```
|
|
||||||
|
|
||||||
Two partial unique indexes are required because a naive composite `UNIQUE(name, user_id)` doesn't constrain externals — SQLite treats NULL as distinct in unique constraints, so two externals could share a name. Partial indexes preserve the prior global-uniqueness invariant for system rows.
|
|
||||||
|
|
||||||
### `WorkshopItem` (new)
|
|
||||||
|
|
||||||
```
|
|
||||||
id INTEGER PK
|
|
||||||
steam_id VARCHAR(20) NOT NULL UNIQUE -- 64-bit, store as text
|
|
||||||
title VARCHAR(255) NOT NULL DEFAULT ''
|
|
||||||
filename VARCHAR(255) NOT NULL DEFAULT '' -- upstream Steam filename, display only
|
|
||||||
file_url TEXT NOT NULL DEFAULT ''
|
|
||||||
file_size BIGINT NOT NULL DEFAULT 0
|
|
||||||
time_updated INTEGER NOT NULL DEFAULT 0 -- Steam epoch
|
|
||||||
preview_url TEXT NOT NULL DEFAULT '' -- thumbnail URL hot-linked from Steam
|
|
||||||
last_downloaded_at DATETIME NULL
|
|
||||||
last_error TEXT NOT NULL DEFAULT ''
|
|
||||||
created_at, updated_at
|
|
||||||
```
|
|
||||||
|
|
||||||
`consumer_app_id` is **not** stored. It's validated at fetch time and the row never exists for non-L4D2 items.
|
|
||||||
|
|
||||||
### `overlay_workshop_items` (new, pure association)
|
|
||||||
|
|
||||||
```
|
|
||||||
id INTEGER PK
|
|
||||||
overlay_id INTEGER NOT NULL REFERENCES overlays(id) ON DELETE CASCADE
|
|
||||||
workshop_item_id INTEGER NOT NULL REFERENCES workshop_items(id) ON DELETE RESTRICT
|
|
||||||
UNIQUE (overlay_id, workshop_item_id)
|
|
||||||
INDEX (workshop_item_id) -- reverse lookup for refresh
|
|
||||||
```
|
|
||||||
|
|
||||||
No `enabled` column — toggle is remove/add, which is cheap because the cache survives.
|
|
||||||
|
|
||||||
### `Job` (extended)
|
|
||||||
|
|
||||||
Add `overlay_id INTEGER NULL REFERENCES overlays(id)` for `build_overlay` jobs.
|
|
||||||
|
|
||||||
## Filesystem Layout
|
|
||||||
|
|
||||||
```
|
|
||||||
/var/lib/left4me/
|
|
||||||
overlays/
|
|
||||||
{overlay_id}/ # flat — same shape for every type
|
|
||||||
left4dead2/addons/
|
|
||||||
{steam_id}.vpk -> /var/lib/left4me/workshop_cache/{steam_id}.vpk
|
|
||||||
workshop_cache/
|
|
||||||
{steam_id}.vpk # one file per Steam item
|
|
||||||
```
|
|
||||||
|
|
||||||
- Every new overlay (workshop, future tarball/inline/manual) lives at `overlays/{overlay_id}/`. Legacy external overlays keep their pre-migration paths (e.g. `overlays/standard/`).
|
|
||||||
- `workshop_cache/` is created during deploy provisioning, not lazily — avoids races between concurrent first downloads.
|
|
||||||
- Web user owns both trees (mode 0755). Host user (`l4d2ctl`) needs read on both. If web and host are different users, they share a group.
|
|
||||||
- Symlink targets are absolute. Relative targets resolve in the merged-mount namespace and break across the host/web boundary.
|
|
||||||
- The builder never creates a dangling symlink. If a `WorkshopItem` lacks a cache file at build time, the builder logs a warning and skips it — fuse-overlayfs surfaces broken links to L4D2 as opaque addon-scan failures.
|
|
||||||
|
|
||||||
## UI
|
|
||||||
|
|
||||||
A single "Create overlay" button on `/overlays` opens a modal with type radio (External | Workshop) and a name field. No path field. The web app generates `path = str(overlay_id)` after insert.
|
|
||||||
|
|
||||||
Workshop overlay detail page (`/overlays/{id}` when `type='workshop'`) shows:
|
|
||||||
|
|
||||||
- A multi-line input plus a radio (Items | Collection). Pasting one or many IDs/URLs adds them in order; pasting a collection ID resolves its members.
|
|
||||||
- An item table with: thumbnail (`preview_url`), `steam_id` linking to Steam, title, filename, last-updated, size, last-error if any, Remove.
|
|
||||||
- A manual "Rebuild" button (for recovery only — every add/remove auto-enqueues a coalesced `build_overlay` job).
|
|
||||||
- Status indicator pulled from the latest related `Job` row.
|
|
||||||
|
|
||||||
External overlay detail page is unchanged in shape: read-only path display, name edit (admin only). The "External" type retains the existing admin-only SFTP-to-disk workflow until a future "manual upload" type replaces it.
|
|
||||||
|
|
||||||
The blueprint editor is unchanged in structure. Workshop overlays appear alongside externals in the user's overlay picker; ordering and stacking semantics are identical.
|
|
||||||
|
|
||||||
Admin section gets one new control: "Refresh all workshop items" button on the admin landing or workshop subsection. Pressing it enqueues a single `refresh_workshop_items` job.
|
|
||||||
|
|
||||||
### Routes
|
|
||||||
|
|
||||||
| Method | Path | Purpose |
|
|
||||||
|---|---|---|
|
|
||||||
| GET | `/overlays` | List with Type column, filtered by user permissions |
|
|
||||||
| POST | `/overlays` | Create; reads `type` and `name` only |
|
|
||||||
| GET | `/overlays/{id}` | Type-aware detail page |
|
|
||||||
| POST | `/overlays/{id}/items` | Add items or collection; auto-enqueues coalesced `build_overlay` |
|
|
||||||
| POST | `/overlays/{id}/items/{item_id}/delete` | Remove association; auto-enqueues coalesced `build_overlay` |
|
|
||||||
| POST | `/overlays/{id}/build` | Manual rebuild (recovery) |
|
|
||||||
| POST | `/admin/workshop/refresh` | Admin only; enqueue `refresh_workshop_items` |
|
|
||||||
|
|
||||||
HTMX usage stays minimal: only the add-item form and per-row delete swap a fragment. Everything else is full-page POST/redirect/GET.
|
|
||||||
|
|
||||||
## Job Operations
|
|
||||||
|
|
||||||
Two new operations join the existing job worker:
|
|
||||||
|
|
||||||
- **`build_overlay(overlay_id)`** — `Job.overlay_id` is set; `server_id` is NULL. Dispatches to `BUILDERS[overlay.type].build(...)`. Cancellation is checked between filesystem operations.
|
|
||||||
- **`refresh_workshop_items()`** — admin-only. Both `server_id` and `overlay_id` are NULL. Phases: fetch all metadata in one batched call, download items where `time_updated` advanced, enqueue (coalesced) `build_overlay` for affected overlays. v1 doesn't wait on child builds; the admin sees them in the jobs list.
|
|
||||||
|
|
||||||
### Scheduler rules
|
|
||||||
|
|
||||||
- `install` and `refresh_workshop_items` are mutually exclusive with each other, with all `build_overlay`s, and with all server jobs.
|
|
||||||
- `build_overlay(overlay_id=N)` blocks if `install_running`, `refresh_running`, or another build for the same `overlay_id` is running. Builds for *different* overlays may run concurrently.
|
|
||||||
- Server start/init blocks if `refresh_running` or any `build_overlay` for an overlay referenced by the server's blueprint is running.
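The three rules above can be written as pure predicates, which is also a convenient shape for a truth-table test. This is a sketch under assumptions: the `Running` snapshot type and the function names are hypothetical, and pre-existing server-job rules (such as per-server exclusion) are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Running:
    """Snapshot of what is currently executing (hypothetical shape)."""

    install: bool = False
    refresh: bool = False
    building_overlays: frozenset[int] = frozenset()
    server_jobs: int = 0


def can_start_global(running: Running) -> bool:
    """install / refresh_workshop_items: exclusive with everything else."""
    return not (
        running.install
        or running.refresh
        or running.building_overlays
        or running.server_jobs
    )


def can_start_build(running: Running, overlay_id: int) -> bool:
    """build_overlay: blocked by install, refresh, or a build of the same overlay."""
    return not (
        running.install
        or running.refresh
        or overlay_id in running.building_overlays
    )


def can_start_server_job(running: Running, blueprint_overlay_ids: list[int]) -> bool:
    """Server start/init: blocked by install, refresh, or a build of a referenced overlay."""
    return not (
        running.install
        or running.refresh
        or running.building_overlays & frozenset(blueprint_overlay_ids)
    )
```

Keeping the predicates free of database access means the truth table can enumerate `Running` states directly.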
|
|
||||||
|
|
||||||
Coalescing: a new `build_overlay` for an overlay that already has a queued (not-yet-running) build returns the existing job instead of inserting a new row.
|
|
||||||
|
|
||||||
`initialize_server` synchronously calls each overlay's builder before writing the spec for `l4d2ctl initialize`. If a workshop overlay references uncached items (no file in `workshop_cache/`), `initialize_server` fails fast with a clear error naming the missing IDs and pointing the user at the overlay page. It never silently mounts a partial overlay.
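A fail-fast check of this kind might look like the sketch below. The exception type `MissingWorkshopItems`, the function name `require_cached`, and the message wording are assumptions for illustration.

```python
from pathlib import Path


class MissingWorkshopItems(RuntimeError):
    """Raised when an overlay references items with no cache file (hypothetical)."""


def require_cached(cache_dir: Path, overlay_id: int, steam_ids: list[str]) -> None:
    """Raise if any referenced item lacks `{steam_id}.vpk` in the cache."""
    missing = sorted(
        steam_id
        for steam_id in steam_ids
        if not (cache_dir / f"{steam_id}.vpk").is_file()
    )
    if missing:
        raise MissingWorkshopItems(
            f"overlay {overlay_id} references uncached workshop items: "
            f"{', '.join(missing)}; see /overlays/{overlay_id}"
        )
```

Naming every missing ID in one error saves the user a retry loop of fixing one item at a time.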
|
|
||||||
|
|
||||||
## Permissions
|
|
||||||
|
|
||||||
- **External overlays**: admin-only create/edit. Visible to all authenticated users (system-wide).
|
|
||||||
- **Workshop overlays**: any logged-in user can create. Owner or admin can edit and delete. Visible to the owner and admins.
|
|
||||||
- **Admin refresh**: admin-only.
|
|
||||||
|
|
||||||
The `Overlay` listing query for non-admins becomes: `type='external' OR user_id=current_user.id`.
|
|
||||||
|
|
||||||
## Risks
|
|
||||||
|
|
||||||
- **Broken symlinks across host/web boundary** — mitigated by absolute targets, build-time pre-check skipping uncached items, and `deploy/` documenting permission requirements.
|
|
||||||
- **Initialize against uncached items** — would silently mount overlays missing maps. Mitigated by `initialize_server`'s fail-fast check; tested.
|
|
||||||
- **Steam API rate limits** — refresh of 100 items is one metadata POST plus 100 downloads at 8-way parallelism. No retry/backoff in v1; 429s surface verbatim in the job log.
|
|
||||||
- **Partial failure during refresh** — each item is independent; per-item errors land on the row. Re-running refresh retries failures.
|
|
||||||
- **Concurrent same-ID adds** — `WorkshopItem.steam_id` unique handles cache dedup. `(overlay_id, workshop_item_id)` unique catches double-association; the route returns "already in overlay" rather than 500.
|
|
||||||
- **Build coalescing missed** — would enqueue dozens of redundant builds during multi-item adds. Mitigated by the `enqueue_build_overlay` helper; tested.
|
|
||||||
- **Worker concurrency rule miss** — the truth-table test in `test_job_worker.py` is the only way to trust the new scheduler logic; written before dispatch.
|
|
||||||
- **DB/disk drift** — a stray directory left by a prior failed delete could shadow a fresh overlay. Mitigated by `AUTOINCREMENT` (no ID reuse) and `os.makedirs(exist_ok=False)` (loud failure on collision).
|
|
||||||
- **Partial unique gap on SQLite** — naive composite `UNIQUE(name, user_id)` doesn't constrain externals because NULL is distinct. Mitigated by two partial unique indexes; tested explicitly.
|
|
||||||
- **Cache growth without GC** — accepted v1 trade-off.
|
|
||||||
- **Item removed from Steam** — refresh sees `result != 1` for the item and records the error on its row; the row keeps its last good cache file, and the UI surfaces the error string. The operator decides whether to remove it.
|
|
||||||
- **L4D2 containerized run** — symlink absolute targets break if the server runs in a different mount namespace. Re-evaluate when containerization comes up.
|
|
||||||
|
|
||||||
## Out Of Scope
|
|
||||||
|
|
||||||
These came up in research and dialog but stay out of v1:
|
|
||||||
|
|
||||||
- **Publish / share button on overlays.** Lets owners flip `Overlay.user_id` between their own ID and NULL without changing type. The schema already supports it; only the UI is deferred.
|
|
||||||
- **Migrate legacy external overlay paths to the ID-based scheme.** Existing external rows keep their pre-migration paths in v1; a follow-up migration moves the directories on disk and updates the rows.
|
|
||||||
- **Switch from fuse-overlayfs to kernel overlayfs via a privileged helper.** Matches the existing systemd / steam-install sudoers helper pattern under `/usr/local/libexec/left4me/`. Workshop overlays would work identically under either mount engine — symlinks resolve through normal VFS in both.
|
|
||||||
- **`sv_consistency` / `addonconfig.cfg RestrictAddons` auto-handling.** When a workshop overlay attaches to a blueprint, surface a banner with a one-click fix. Most-cited L4D2 player pain.
|
|
||||||
- **Shareable Steam Workshop collection link for clients.** Server cannot push workshop content via `sv_downloadurl`; clients must subscribe themselves. A panel-generated collection makes that one click for players. Requires Steam OAuth.
|
|
||||||
- **Other overlay types.** `tarball` (covers the old `competitive_rework` GitHub-tarball recipe), `inline` (covers `tickrate`'s inline `server.cfg`), `manual` (file manager / upload, replaces the admin-SFTP external workflow). All slot in via the builder registry without schema churn.
|
|
||||||
- **Cache GC.** Reference-counted delete or admin "Clear unreferenced" page.
|
|
||||||
- **Per-item / per-overlay / scheduled refresh.** v1 has one global admin button; revisit if users want finer control.
|
|
||||||
- **Update-aware server restart UX.** Notify users when a running server's overlay content has been refreshed underneath it.
|
|
||||||
|
|
||||||
## Implementation Boundaries
|
|
||||||
|
|
||||||
- The host library contract is unchanged. Workshop content arrives in overlay directories the same way externals do today; `l4d2host` doesn't know overlays have types.
|
|
||||||
- The job-execution model is preserved: same workers, same logs, same cancel callbacks. Only the operations table grows.
|
|
||||||
- The blueprint privacy model and desired-vs-actual server state model are unchanged.
|
|
||||||
- No new frontend dependencies. Vendored HTMX + custom CSS + small inline JS.
|
|
||||||
- No new Steam Web API key required; both endpoints used accept anonymous POSTs.
|
|
||||||
- The companion implementation plan governs task ordering and verification commands. Implementation must not start without explicit user approval per that plan's gate.
|
|
||||||
|
|
@ -1,80 +0,0 @@
|
||||||
# Kernel Overlayfs Helper Design
|
|
||||||
|
|
||||||
**Goal:** Replace the per-instance `fuse-overlayfs` mount with kernel-native overlayfs invoked through a privileged sudo helper that mounts in PID 1's mount namespace. Restores host-namespace visibility of the merged overlay so gameserver units (`left4me-server@%i.service`) can `chdir` into it at unshare time.
|
|
||||||
|
|
||||||
**Approval status:** User-approved design direction. Implementation proceeds in lockstep with the companion plan at `docs/superpowers/plans/2026-05-08-kernel-overlayfs-helper.md`.
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
**Symptom.** After redeploys, starting a gameserver leaves the systemd unit in `activating (auto-restart)` with `status=200/CHDIR — Changing to the requested working directory failed: No such file or directory`. Investigation showed:
|
|
||||||
|
|
||||||
- `fuse-overlayfs` running as `left4me` user mounts in `left4me-web.service`'s mount namespace.
|
|
||||||
- `ProtectSystem=full` + `ReadWritePaths=/var/lib/left4me` forces `PrivateMounts=yes` on the unit (`systemd-analyze security` confirms).
|
|
||||||
- The unit's bind of `/var/lib/left4me` shows `shared:471 master:1` in `/proc/<pid>/mountinfo` — slave-receive-only — so mounts created beneath it never propagate back to host.
|
|
||||||
- `MountFlags=shared` (added in commit `1968684` to fix this) sets only the unit's *root* propagation; it does not override the slave-direction propagation that `ProtectSystem`/`ReadWritePaths` apply to their bind mounts. The gameserver unit, on unshare, inherits *host* mounts and sees nothing at the merged path → CHDIR fails.
|
|
||||||
|
|
||||||
The system *appeared* to work for ~1d8h before this investigation because the prior fuse daemon happened to land in the host namespace via some transient state. The mechanism documented in `1968684` does not reliably work on systemd 257 with this hardening shape.
|
|
||||||
|
|
||||||
**Out-of-scope item now in scope.** The 2026-05-07 workshop-overlays spec already lists this transition at line 211: *"Switch from fuse-overlayfs to kernel overlayfs via a privileged helper. Matches the existing systemd / steam-install sudoers helper pattern under `/usr/local/libexec/left4me/`."* The mount-propagation bug is the trigger to do it now.
|
|
||||||
|
|
||||||
## Locked Decisions
|
|
||||||
|
|
||||||
1. **Privileged helper does the mount.** New `left4me-overlay` script under `/usr/local/libexec/left4me/`, invoked via `sudo -n`. Mirrors the existing `left4me-systemctl` and `left4me-journalctl` pattern. The helper enters PID 1's mount namespace via `nsenter --mount=/proc/1/ns/mnt` and then calls `/bin/mount -t overlay …` or `/bin/umount`. Result: all overlay mounts live in the host namespace, visible to gameserver units.
|
|
||||||
2. **Kernel-native overlayfs, not fuse.** Once a privileged helper exists, fuse-overlayfs's rootless-mount-via-setuid-`fusermount3` advantage disappears. Kernel overlayfs is faster, needs no long-running daemon, unmounts more simply, and removes one runtime dependency.
|
|
||||||
3. **Helper is Python, not shell.** Path canonicalization, env-file parsing, and lowerdir prefix-allowlist validation are too brittle in shell. Uses system `/usr/bin/python3` (never the venv) and stdlib only. Owned by root, mode 0755.
|
|
||||||
4. **Verbs are `mount` and `umount`.** These match the kernel/userspace utility names, which causes less cognitive friction than `unmount`.
|
|
||||||
5. **Helper takes only the instance name as input.** It reads `${LEFT4ME_ROOT:-/var/lib/left4me}/instances/<name>/instance.env` for `L4D2_LOWERDIRS=` and computes `upper`/`work`/`merged` from the runtime root. Equivalent in security to taking lowerdirs as args (the user already controls instance.env), and produces a one-line audit trail in `journalctl _COMM=sudo`.
|
|
||||||
6. **Strict path validation in the helper.**
|
|
||||||
- Instance name matches `^[a-z0-9][a-z0-9_-]{0,63}$` (mirrors `validate_instance_name` in `l4d2host/paths.py`).
|
|
||||||
- Each lowerdir from `L4D2_LOWERDIRS` is `os.path.realpath`'d and must resolve under one of an allowlist: `installation/`, `overlays/`, `global_overlay_cache/`, `workshop_cache/`. Empty entries and traversals are rejected.
|
|
||||||
- `upper`/`work`/`merged` must resolve exactly to `runtime/<name>/{upper,work,merged}`.
|
|
||||||
- Lowerdir count ≤ 500 (kernel overlayfs hard cap; was 64 before kernel 5.2).
|
|
||||||
7. **Whiteout-format guard.** `fuse-overlayfs` running as non-root uses `user.fuseoverlayfs.*` xattrs for whiteouts and opaque dirs, which kernel overlayfs ignores entirely. Before mounting, the helper walks `upperdir` once and refuses if any such xattr is present. Defensive; catches a stale fuse-era upperdir that wasn't wiped during migration.
|
|
||||||
8. **One-time migration: wipe existing `upper/` and `work/`.** Deploy script runs a gated migration (sentinel file `/var/lib/left4me/.kernel-overlay-migrated`) that stops gameservers, stops web service, unmounts any stale fuse/overlay mounts, recreates empty `upper`/`work` dirs for every instance. Players' in-place edits to merged content are sacrificed; v1 accepts this for a test deployment.
|
|
||||||
9. **Sudoers verb constraints.** `left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-overlay mount *, /usr/local/libexec/left4me/left4me-overlay umount *`. Defense in depth (real validation lives in the helper); makes `sudo -l` output self-documenting.
|
|
||||||
10. **Wire the existing `OverlayMounter` ABC through.** `start_instance`/`stop_instance`/`delete_instance` today bypass the abstraction at `l4d2host/fs/base.py`. The new `KernelOverlayFSMounter` replaces the unused `FuseOverlayFSMounter` AND becomes the only path through `instances.py`. `FuseOverlayFSMounter` and the `fuse_overlayfs.py` module are deleted.
|
|
||||||
11. **Double-mount guard in `start_instance`.** Kernel mounts persist when the web worker dies (unlike fuse daemons, which die with their cgroup). `start_instance` checks `os.path.ismount(merged)` and refuses with a clear error rather than double-mounting.
|
|
||||||
12. **Hardening cleanup on `left4me-web.service`.** Drop `MountFlags=shared` (no longer the mechanism). Restore `PrivateTmp=true` (was dropped in commit `593611e` for fuse propagation that did not work). Keep `NoNewPrivileges` unset (sudo still requires setuid). Update the comment block to reflect the new model.
|
|
||||||
13. **AGENTS.md contracts unchanged.** The host library's CLI surface (`install`, `initialize`, `start`, `stop`, `delete`, `status`, `logs`) is unchanged. The web app continues to drive operations via `l4d2ctl`. The fuse-overlayfs implementation detail was never part of the public contract.
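The validation rules in decision 6 can be sketched with the standard library alone. Assumptions in this sketch: the function names, the `:` separator inside `L4D2_LOWERDIRS`, and treating an allowlisted root itself as an acceptable lowerdir.

```python
import os
import re

INSTANCE_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,63}$")
ALLOWED_LOWERDIR_ROOTS = ("installation", "overlays", "global_overlay_cache", "workshop_cache")
MAX_LOWERDIRS = 500


def validate_instance_name(name: str) -> str:
    # fullmatch, so a trailing newline cannot slip past the `$` anchor.
    if not INSTANCE_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid instance name: {name!r}")
    return name


def validate_lowerdirs(raw: str, root: str) -> list[str]:
    """Resolve each lowerdir and require it to sit under an allowlisted root."""
    entries = raw.split(":")  # assumption: colon-separated, as in the mount option
    if len(entries) > MAX_LOWERDIRS:
        raise ValueError(f"too many lowerdirs: {len(entries)}")
    real_root = os.path.realpath(root)
    allowed = [os.path.join(real_root, name) for name in ALLOWED_LOWERDIR_ROOTS]
    resolved: list[str] = []
    for entry in entries:
        if not entry:
            raise ValueError("empty lowerdir entry")
        real = os.path.realpath(entry)  # collapses `..` and follows symlinks
        if not any(real == base or real.startswith(base + os.sep) for base in allowed):
            raise ValueError(f"lowerdir outside allowlist: {entry!r}")
        resolved.append(real)
    return resolved
```

Comparing against `base + os.sep` rather than a bare prefix keeps a sibling such as `overlays_evil/` from passing the `overlays/` check.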
|
|
||||||
|
|
||||||
## Architecture
|
|
||||||
|
|
||||||
```
|
|
||||||
left4me-web.service (hardened, private mount namespace)
|
|
||||||
│
|
|
||||||
│ start_instance(name=…)
|
|
||||||
▼
|
|
||||||
l4d2host.instances.start_instance
|
|
||||||
│
|
|
||||||
│ KernelOverlayFSMounter().mount(merged=…)
|
|
||||||
▼
|
|
||||||
sudo -n /usr/local/libexec/left4me/left4me-overlay mount <name>
|
|
||||||
│ • validate name (regex)
|
|
||||||
│ • parse instance.env → L4D2_LOWERDIRS
|
|
||||||
│ • realpath each lowerdir, prefix-allowlist check
|
|
||||||
│ • compute upper/work/merged under runtime/<name>/
|
|
||||||
│ • walk upperdir, refuse if any user.fuseoverlayfs.* xattr
|
|
||||||
▼
|
|
||||||
nsenter --mount=/proc/1/ns/mnt -- \
|
|
||||||
/bin/mount -t overlay overlay \
|
|
||||||
-o "lowerdir=…,upperdir=…,workdir=…" \
|
|
||||||
/var/lib/left4me/runtime/<name>/merged
|
|
||||||
│
|
|
||||||
▼
|
|
||||||
host mount namespace now has the overlay; gameserver unit, on
|
|
||||||
unshare, inherits it and CHDIRs into …/merged/left4dead2 successfully.
|
|
||||||
```
|
|
||||||
|
|
||||||
## Operational Notes
|
|
||||||
|
|
||||||
- **Migration ordering on the test box (test-server, …).** The deploy script must, in order: (1) stop all `left4me-server@*.service`, (2) stop `left4me-web.service` (kills any lingering fuse-overlayfs daemons by reaping their cgroup), (3) `findmnt` + force-unmount any leftover fuse/overlay mounts under `/var/lib/left4me/runtime/`, (4) wipe and recreate `upper`/`work` for every instance, (5) deploy + start the new code. The sentinel file `/var/lib/left4me/.kernel-overlay-migrated` gates reruns.
|
|
||||||
- **Filesystem.** `/var/lib/left4me` is btrfs on the test box. Kernel overlayfs on btrfs is supported on kernel ≥ 5.10; the box is on 6.12 — fine. AppArmor ships enabled on Debian Trixie; verify no overlay-related denials in `journalctl -k` after first start.
|
|
||||||
- **Concurrency.** Two threads racing on `start_instance` for the same name is a latent issue unaffected by this change. The double-mount guard partly mitigates: the loser hits the existing mount and errors cleanly.
|
|
||||||
|
|
||||||
## Out Of Scope
|
|
||||||
|
|
||||||
- **Replace `sudo` with `AmbientCapabilities=CAP_SYS_ADMIN`** on a dedicated helper unit. Broader blast radius than the wrapper-script approach.
|
|
||||||
- **A `systemd-mount` per-instance mount unit.** Considered as the alternative architectural fix but adds more moving parts than the helper-script approach. The helper matches the established privileged-helper pattern in this codebase.
|
|
||||||
- **Re-enable `NoNewPrivileges` on `left4me-web.service`.** Requires removing sudo; not feasible while the helper invocation pattern stays.
|
|
||||||
- **Multi-process job-worker-claim safety.** The `_claim_lock` in `l4d2host/services/job_worker.py:131-138` is process-local; correctness depends on `--workers 1`. This change doesn't touch it.
|
|
||||||
- **Replicating the migration on production deployments.** v1 covers only the test-server deployment shape.
|
|
||||||
|
|
@ -1,118 +0,0 @@
|
||||||
# L4D2 Blueprint Overlay Picker Design
|
|
||||||
|
|
||||||
**Goal:** Replace the checkbox + numeric-Order table on the blueprint detail page with a drag-to-reorder list and a single dropdown to add overlays. Drag-and-drop is the primary reorder mechanic; per-row Order text inputs are removed.
|
|
||||||
|
|
||||||
**Approval status:** User-approved. No companion implementation plan — small surface, implemented directly.
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
`templates/blueprint_detail.html:14-28` currently renders one HTML table for the blueprint's overlays. Each row carries a `Use` checkbox, a numeric `Order` text input, and the overlay name. To enable an overlay you check it; to reorder you type integers into per-row text fields. Adding a new overlay between existing ones means renumbering everything below it by hand.
|
|
||||||
|
|
||||||
This spec replaces that table with a single ordered list of *selected* overlays plus a `<select>` dropdown for adding more. Drag-to-reorder is the only reorder interaction. A ✕ button on each row removes it (returning it to the dropdown). Picking an entry from the dropdown appends it to the list (and removes it from the dropdown).
|
|
||||||
|
|
||||||
The change is intentionally scoped small: no two-panel layout, no filter widget, no touch / keyboard reorder support, no JS-disabled fallback. The native `<select>` element supplies typeahead-by-letter and keyboard navigation for free, which covers the no-drag path. The page is desktop-primary.
|
|
||||||
|
|
||||||
## Locked Decisions
|
|
||||||
|
|
||||||
1. **Single ordered list of selected overlays only.** No second pane. The "available" set lives in the `<select>`. Adding via dropdown is one click; removing via ✕ is one click; reordering is one drag.
|
|
||||||
2. **Native HTML5 drag-and-drop.** No vendored library, no polyfill. Touch-screen drag is unsupported on Android and rough on iOS — accepted because the page is desktop-primary. Add and remove still work on touch via the `<select>` and the `<button>`.
|
|
||||||
3. **JS-required UI.** If JS does not load, the page is unusable. No degradation to the old checkbox table.
|
|
||||||
4. **Server contract unchanged.** Each list row owns one `<input type="hidden" name="overlay_ids" value="{id}">`. Form-submission order = DOM order. The existing `ordered_overlay_ids_from_form` handler in `routes/blueprint_routes.py` already falls back to enumerate index when no `overlay_position_<id>` field is present, so it accepts the new shape with no Python edit.
|
|
||||||
5. **Dropdown re-sorted alphabetically on remove.** When ✕ removes a row, the corresponding `<option>` is sorted-inserted back into the `<select>` (case-insensitive name compare). The dropdown stays predictable.
|
|
||||||
6. **Drop-indicator visual.** A 2px focus-color bar drawn via `box-shadow … inset` on the row under the cursor: top-bar = "drop will land before this row", bottom-bar = "drop will land after this row". The hover side is computed by comparing `event.clientY` to the row's vertical midpoint.
|
|
||||||
7. **Drop on empty space inside the list = append.** Drop directly on the dragged row (or with no movement) = no-op. Escape during drag triggers `dragend`, which clears all visual classes.
|
|
||||||
8. **Out of scope:** keyboard reorder, ARIA live announcements, touch DnD polyfill, server-side cleanup of the now-unused `overlay_position_<id>` form-field path.
|
|
||||||
|
|
||||||
## Architecture
|
|
||||||
|
|
||||||
```
|
|
||||||
GET /blueprints/<id>
|
|
||||||
page_routes.blueprint_page
|
|
||||||
├─▶ selected_overlays (ordered by BlueprintOverlay.position)
|
|
||||||
└─▶ available_overlays = all_overlays \ selected_overlays
|
|
||||||
(alphabetical)
|
|
||||||
|
|
||||||
templates/blueprint_detail.html
|
|
||||||
<ol data-overlay-list> ← drag target, hidden inputs
|
|
||||||
<li data-overlay-id draggable> × ⋮⋮ name </li>
|
|
||||||
…
|
|
||||||
</ol>
|
|
||||||
<select data-overlay-add> ← add path
|
|
||||||
<option>Pick a name…</option>
|
|
||||||
<option value=overlay.id>name</option> ← available_overlays
|
|
||||||
…
|
|
||||||
</select>
|
|
||||||
|
|
||||||
static/js/blueprint-overlay-picker.js
|
|
||||||
├─ dragstart/over/leave/drop/end → reorder DOM under [data-overlay-list]
|
|
||||||
├─ click [data-action="remove"] → remove row + sorted-insert <option>
|
|
||||||
├─ change [data-overlay-add] → append <li>, remove <option>
|
|
||||||
└─ refreshEmpty() → toggle [data-overlay-empty][hidden]
|
|
||||||
|
|
||||||
POST /blueprints/<id>
|
|
||||||
form-encoded body: overlay_ids=<id>&overlay_ids=<id>&… (in DOM order)
|
|
||||||
blueprint_routes.update_blueprint_form
|
|
||||||
→ ordered_overlay_ids_from_form (existing; fallback_position branch)
|
|
||||||
→ replace_blueprint_overlays (existing)
|
|
||||||
```
|
|
||||||
|
|
||||||
## Form-contract details
|
|
||||||
|
|
||||||
The new template emits one hidden input per selected row, colocated as a child of the `<li>`:
|
|
||||||
|
|
||||||
```html
|
|
||||||
<li data-overlay-id="3" draggable="true">
|
|
||||||
<span class="overlay-picker-handle">⋮⋮</span>
|
|
||||||
<span class="overlay-picker-name">workshop_maps</span>
|
|
||||||
<button type="button" data-action="remove">×</button>
|
|
||||||
<input type="hidden" name="overlay_ids" value="3">
|
|
||||||
</li>
|
|
||||||
```
|
|
||||||
|
|
||||||
Browser form serialization preserves DOM order across multiple inputs that share a `name`. Werkzeug's `request.form.getlist("overlay_ids")` returns them in submission order. `ordered_overlay_ids_from_form` then assigns each id its enumerate-index position via the `fallback_position` branch (lines 19-31 of `routes/blueprint_routes.py`) and feeds the result to `replace_blueprint_overlays`.
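
The fallback described above can be sketched in plain Python. This is an illustrative stand-in, not the code in `routes/blueprint_routes.py`: the name `order_overlay_ids` and the dict-shaped `positions` argument (standing in for the legacy `overlay_position_<id>` fields) are assumptions made for the example.

```python
def order_overlay_ids(overlay_ids, positions=None):
    """Return overlay ids sorted by explicit position, else by submission order.

    overlay_ids: id strings in the order the browser submitted them.
    positions:   optional {id: position} mapping (hypothetical shape) that
                 stands in for the legacy overlay_position_<id> form fields.
    """
    positions = positions or {}
    keyed = []
    for index, raw_id in enumerate(overlay_ids):
        overlay_id = int(raw_id)
        # No explicit position submitted: fall back to the enumerate index,
        # which equals the DOM order of the hidden inputs.
        position = positions.get(overlay_id, index)
        keyed.append((position, index, overlay_id))
    keyed.sort()
    return [overlay_id for _, _, overlay_id in keyed]
```

With no positions supplied, the result is simply the submitted order converted to integers, which is the shape the new template produces.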
|
|
||||||
|
|
||||||
The JSON path (`POST /blueprints` with `application/json`) already takes `overlay_ids` list order at line 64 of the same file — this spec does not affect it.
|
|
||||||
|
|
||||||
## UI / UX details
|
|
||||||
|
|
||||||
- **Empty state.** When no overlays are selected, a `[data-overlay-empty]` paragraph reads "No overlays selected. Pick one below to add." JS toggles its `hidden` attribute on every list mutation.
|
|
||||||
- **Drag handle.** Visual only (`⋮⋮` glyph). The whole row is `draggable="true"`; the user does not have to grab the handle specifically.
|
|
||||||
- **Drop indicator math.** During `dragover`, compute `event.clientY < rect.top + rect.height/2`; that boolean picks `drop-before` (bar at top) vs `drop-after` (bar at bottom). On `drop`, read which class is set and `insertBefore` or `insertBefore(…, target.nextSibling)` accordingly.
|
|
||||||
- **Sorted insert on remove.** Walk `<select>` children comparing `option.dataset.overlayName` (lowercased) against the removed name; `insertBefore` the new option ahead of the first option whose name sorts later, or append if none.
|
|
||||||
- **Reset select after add.** Set `select.value = ""` so the placeholder reappears after each add.
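
The drop-indicator math and the sorted insert above are small enough to model outside the browser. The sketch below is Python rather than the picker's JavaScript, and both function names are hypothetical. It assumes plain name strings in place of `<option>` elements and ignores the placeholder option, which the real picker would have to skip.

```python
def drop_side(client_y, rect_top, rect_height):
    """Return 'before' when the pointer is in the row's upper half, else 'after'."""
    return "before" if client_y < rect_top + rect_height / 2 else "after"


def sorted_insert(names, new_name):
    """Insert new_name ahead of the first entry that sorts later (case-insensitive)."""
    key = new_name.lower()
    for index, existing in enumerate(names):
        if existing.lower() > key:
            names.insert(index, new_name)
            return names
    # Nothing sorts later, so the new entry goes at the end.
    names.append(new_name)
    return names
```

A pointer exactly on the midpoint counts as "after", matching the strict `<` comparison in the bullet above.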
|
|
||||||
|
|
||||||
## Files
|
|
||||||
|
|
||||||
| Path | Change |
|
|
||||||
|---|---|
|
|
||||||
| `l4d2web/routes/page_routes.py` | Compute `available_overlays`; pass to template. |
|
|
||||||
| `l4d2web/templates/blueprint_detail.html` | Replace overlay table with `<ol>` + `<select>`; add `<script defer>`. |
|
|
||||||
| `l4d2web/static/css/components.css` | Append `.overlay-picker-*` rules. Reuse existing tokens. |
|
|
||||||
| `l4d2web/static/js/blueprint-overlay-picker.js` | New IIFE. ~150 LOC. |
|
|
||||||
| `l4d2web/tests/test_blueprints.py` | Two new GET-page assertions. |
|
|
||||||
| `l4d2web/tests/test_pages.py` | Update `test_blueprint_detail_has_ordered_overlay_form` to match new shape. |
|
|
||||||
|
|
||||||
## Verification
|
|
||||||
|
|
||||||
Manual browser flow (`/blueprints/<id>`):
|
|
||||||
|
|
||||||
1. Initial render shows the saved selection in saved order; dropdown holds the rest. No console errors.
|
|
||||||
2. Drag a row up/down. Focus-colored bar appears at the top or bottom of the hover-target row (depending on which half is hovered). On drop, the row moves; hidden inputs reflect the new order.
|
|
||||||
3. Click ✕ on a row. Row vanishes; the same name appears in the dropdown in alphabetical position.
|
|
||||||
4. Pick from the dropdown. New row appears at the end of the list; the option leaves the dropdown; the placeholder is reselected.
|
|
||||||
5. Save the blueprint, reload. Order survives the round-trip.
|
|
||||||
6. Press Escape mid-drag. Drop indicators clear; source row regains opacity; nothing moved.
|
|
||||||
|
|
||||||
Test commands:
|
|
||||||
|
|
||||||
```
|
|
||||||
pytest l4d2web/tests/test_blueprints.py -q
|
|
||||||
pytest l4d2web/tests -q
|
|
||||||
```
|
|
||||||
|
|
||||||
## Out of scope / future follow-ups
|
|
||||||
|
|
||||||
- **Drop the `overlay_position_<id>` server-side path.** Once no client emits those fields, `ordered_overlay_ids_from_form` collapses to `[int(v) for v in request.form.getlist("overlay_ids")]`. Test `test_form_update_preserves_ordered_overlays_and_multiline_fields` (`l4d2web/tests/test_blueprints.py:220`) gets simplified accordingly.
|
|
||||||
- **Touch-friendly DnD.** Vendor a polyfill (`drag-drop-touch`) or rewrite the picker on pointer events if mobile editing becomes a real use case.
|
|
||||||
- **Keyboard reorder.** Space-to-grab + arrow-keys + ARIA live announcements. Currently only add/remove are keyboard-accessible.
|
|
||||||
- **Filter on the selected list.** Not needed at v1's overlay counts; revisit if blueprints commonly carry 20+ overlays.
|
|
||||||
|
|
@ -1,332 +0,0 @@
|
||||||
# L4D2 Script Overlays Design
|
|
||||||
|
|
||||||
> **Sandbox engine superseded by [`2026-05-08-l4d2-script-sandbox-v2-systemd.md`](2026-05-08-l4d2-script-sandbox-v2-systemd.md).**
|
|
||||||
> The v1 design below specifies `bubblewrap` + `systemd-run --scope` as the
|
|
||||||
> sandbox engine. The v2 design (approved 2026-05-08, same day) replaced that
|
|
||||||
> with `systemd-run` in service-unit mode and dropped `bubblewrap` entirely.
|
|
||||||
> The current implementation in `deploy/scripts/libexec/left4me-script-sandbox`
|
|
||||||
> follows v2; this v1 design is preserved for archaeology. The rest of the
|
|
||||||
> design (overlay-type unification, resource caps, helper auth model, etc.)
|
|
||||||
> still applies — only the sandbox-engine choice changed.
|
|
||||||
|
|
||||||
**Goal:** Add a single new overlay type, `script`, that lets users author arbitrary build recipes as bash and runs them inside a `bubblewrap` + `systemd-run --scope` sandbox. The new type subsumes the existing `l4d2center_maps` and `cedapug_maps` managed-globals overlay types, both of which are removed in the same change. After this work the overlay type list is exactly `workshop` (unchanged) and `script` (new).
|
|
||||||
|
|
||||||
**Approval status:** User-approved design direction. Implementation proceeds in lockstep with the companion plan at `docs/superpowers/plans/2026-05-08-l4d2-script-overlays.md`.
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
`left4me` users today have two ways to add content to a server: workshop overlays (rich UI for Steam Workshop items via `WorkshopBuilder`) and a pair of managed global-map overlay types (`l4d2center_maps`, `cedapug_maps`) with bespoke parsers, per-item DB rows, ETag-based change detection, and a daily refresh timer. They cannot author arbitrary build recipes.
|
|
||||||
|
|
||||||
The user's previous setup at `ckn-bw/bundles/left4dead2/files/scripts/overlays/` expressed every recipe as a small bash file: `competitive_rework` (GitHub tarball download), `tickrate` (inline `server.cfg` + addon DLL fetch), `standard` (workshop items + admin-list write), `workshop_maps` (workshop collection import), `l4d2center_maps` (CSV-driven map sync). All five fit naturally into a single "run a sandboxed bash script that populates the overlay dir" model.
|
|
||||||
|
|
||||||
The two managed global-map types in the current codebase are over-engineered for what they do — each is essentially "fetch a manifest, download archives, extract VPKs, place in `addons/`." Folding them into the new `script` type eliminates three database tables, two source-parser modules, the `GlobalMapOverlayBuilder`, the `py7zr` dependency, the global-overlay cache root, and the managed-singleton machinery, while letting an admin paste the equivalent shell code (which the user already wrote years ago) into a normal admin-owned, system-wide script overlay.
|
|
||||||
|
|
||||||
The trust model for the sandbox is "semi-public deployment, registered users." The threat surface is one user reading another user's overlay, the application DB, or arbitrary host secrets, plus runaway scripts exhausting disk/CPU/RAM. Network access is *not* restricted — scripts must be able to download from arbitrary URLs (GitHub, l4d2center, Steam CDN). Sandbox boundaries are namespace-based (mount, PID, IPC, UTS, cgroup), not command-allowlist-based; binary-allowlist sandboxing of bash is theatre because of `eval` and `exec`.
|
|
||||||
|
|
||||||
The test deploy DB is wiped as part of rollout; no data migration is performed. Existing user blueprints that reference `l4d2center_maps` or `cedapug_maps` overlay rows do not survive the change in the test environment.
|
|
||||||
|
|
||||||
A scheduled-refresh feature (the daily timer that today drives the global-map types) is intentionally **out of scope for this iteration**. The two existing systemd units and the `flask refresh-global-overlays` CLI command are deleted with no replacement. Refresh is reintroduced in a later iteration designed against concrete needs.
|
|
||||||
|
|
||||||
## Locked Decisions
|
|
||||||
|
|
||||||
1. **Single new overlay type: `script`.** Replaces both managed-globals types. Final type list: `workshop` + `script`. No `tarball`/`inline`/`manual` types — all of those collapse into `script` (with UI templates as a future ergonomics improvement).
|
|
||||||
2. **`Overlay.script` is a DB `TEXT` column** holding the raw bash. No file storage, no revision history in v1. Empty string for `workshop` rows.
|
|
||||||
3. **Build idempotency contract: script runs against the existing overlay dir.** No automatic wipe between builds. Users write `test -f … || curl …`-style guards if they want bandwidth efficiency. A manual "Wipe overlay" button on the detail page resets the dir to empty.
|
|
||||||
4. **No left4me-aware helpers in the sandbox.** The script sees pure bash plus whatever's in `/usr` (RO bind-mount of the host). Workshop items are not exposed via a helper — users wanting workshop content create a `workshop`-type overlay, which has its own first-class UX (thumbnails, collection paste, dedup cache, refresh).
|
|
||||||
5. **Sandbox engine: `bubblewrap` (`bwrap`) inside `systemd-run --scope --collect`.** `systemd-run` provides cgroup v2 limits + walltime kill via `RuntimeMaxSec`; `bwrap` provides the namespace isolation. Both are stable, well-audited, and packaged in Debian.
|
|
||||||
6. **Resource limits (system-wide, not per-overlay):** 1 hour walltime (`RuntimeMaxSec=3600`), 4 GB RAM (`MemoryMax=4G`, `MemorySwapMax=0`), 512 tasks, 200% CPU quota, post-build 20 GB disk cap on `du -sb` of the overlay dir.
|
|
||||||
7. **Network: host-shared.** No `--unshare-net`. Scripts have full outbound. Egress filtering is not in v1; the sandbox prevents reading internal state but does not prevent talking to internal IPs. Acceptable for the current trust model.
|
|
||||||
8. **No auto-seeding of "default" overlays.** Admin manually creates the equivalents of the old `l4d2center-maps`/`cedapug-maps` post-deploy by pasting the bash. The deploy script does not insert overlay rows.
|
|
||||||
9. **Daily/scheduled refresh: out of scope for this iteration.** No `auto_refresh` flag, no timer, no CLI command. Manual rebuild via the detail-page button is the only build trigger after this change.
|
|
||||||
10. **Permissions mirror workshop overlays.** Any logged-in user can create a private (`user_id = me`) script overlay. Admin can create system-wide (`user_id = NULL`). Owner or admin can edit/delete.
|
|
||||||
11. **Failure semantics via `Overlay.last_build_status`** (`'' | 'ok' | 'failed'`). Drives a "rebuild required" badge on the list and detail pages. Server initialization does **not** auto-block on `failed` (matches workshop's current behavior).
|
|
||||||
12. **Wipe is just another sandbox invocation.** The wipe endpoint runs the literal script `find /overlay -mindepth 1 -delete` through the same `left4me-script-sandbox` helper. No second helper, no privilege/UID puzzle (files are owned by `l4d2-sandbox`, who runs the wipe). After a successful wipe, `last_build_status` is reset to `''`. Wipe does **not** auto-enqueue a rebuild — the user decides.
|
|
||||||
13. **Privileged helper: `/usr/local/libexec/left4me/left4me-script-sandbox`.** Same pattern as the existing `left4me-overlay`, `left4me-systemctl`, `left4me-journalctl` helpers. Bash, owned root, mode 0755. The web user invokes it via `sudo -n` per a sudoers fragment. Root is needed to set up the namespaces; bwrap drops to the unprivileged `l4d2-sandbox` UID immediately.
|
|
||||||
14. **Dedicated sandbox UID `l4d2-sandbox`** (system user, `/usr/sbin/nologin`, no home). Owns nothing on the host outside what bwrap binds in. UID-drop happens inside the bwrap invocation via `--uid`/`--gid`.
|
|
||||||
15. **Strict argument validation in the helper.** Overlay id matches `^[0-9]+$`; overlay dir must exist under `/var/lib/left4me/overlays/`; script path must exist. Defense in depth — the real authorization check lives in the web app.
|
|
||||||
16. **Streaming I/O via the existing `run_with_streamed_output` helper.** Same plumbing `WorkshopBuilder` already uses for `steamcmd`/`curl` invocations. No new SSE/log path.
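
The argument checks in decision 15 can be mirrored in Python for unit testing. This is a sketch under assumptions: the function name `validate_sandbox_args` and the `overlays_root` parameter are hypothetical, and the real checks live in the bash helper. `re.fullmatch` is used because Python's `$` also matches before a trailing newline.

```python
import re
from pathlib import Path


def validate_sandbox_args(overlay_id, script_path,
                          overlays_root="/var/lib/left4me/overlays"):
    """Return the overlay directory, or raise ValueError naming the bad argument."""
    # Digits only: rejects path separators, "..", and empty strings.
    if not re.fullmatch(r"[0-9]+", overlay_id):
        raise ValueError("bad overlay id")
    overlay_dir = Path(overlays_root) / overlay_id
    if not overlay_dir.is_dir():
        raise ValueError("no overlay dir")
    if not Path(script_path).is_file():
        raise ValueError("no script")
    return overlay_dir
```

Because the id is restricted to digits before it is joined to the root, the resulting path cannot leave the overlays directory.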
|
|
||||||
|
|
||||||
## Architecture
|
|
||||||
|
|
||||||
```text
|
|
||||||
Overlay row (type=script, script=TEXT, last_build_status)
|
|
||||||
│
|
|
||||||
▼ build_overlay(overlay_id) job
|
|
||||||
│
|
|
||||||
▼ BUILDERS["script"].build(overlay, on_stdout, on_stderr, should_cancel)
|
|
||||||
│
|
|
||||||
▼ ScriptBuilder writes overlay.script → tmpfile, then:
|
|
||||||
│ sudo -n /usr/local/libexec/left4me/left4me-script-sandbox <id> <tmpfile>
|
|
||||||
│
|
|
||||||
▼ Helper validates args, then exec()s:
|
|
||||||
│ systemd-run --scope --collect
|
|
||||||
│ -p MemoryMax=4G -p MemorySwapMax=0
|
|
||||||
│ -p TasksMax=512 -p CPUQuota=200%
|
|
||||||
│ -p RuntimeMaxSec=3600
|
|
||||||
│ -- bwrap [namespace flags...] /bin/bash /script.sh
|
|
||||||
│
|
|
||||||
▼ Inside the sandbox the script sees:
|
|
||||||
│ /overlay ← /var/lib/left4me/overlays/{id} RW (the build target)
|
|
||||||
│ /tmp,/run ← fresh tmpfs RW (ephemeral)
|
|
||||||
│ /usr,/lib,/lib64,/etc/{ssl,resolv.conf,nsswitch} RO (host-curated)
|
|
||||||
│ /proc,/dev ← fresh
|
|
||||||
│ network ← shared with host
|
|
||||||
│ UID/GID ← l4d2-sandbox (no_new_privs implicit in bwrap)
|
|
||||||
│
|
|
||||||
▼ stdout/stderr → run_with_streamed_output → existing job-log SSE stream
|
|
||||||
▼ After exit:
|
|
||||||
│ exit 0 ∧ du -sb /overlay ≤ 20 GB → last_build_status='ok'
|
|
||||||
│ any other outcome → last_build_status='failed'
|
|
||||||
```
|
|
||||||
|
|
||||||
The host library (`l4d2host`) is unchanged. The `KernelOverlayFSMounter` already mounts whatever's at `overlays/{id}/` regardless of how it got there. The Job model and worker model are essentially unchanged — `script` is just another overlay type for the same `build_overlay` operation that today supports `workshop`.
|
|
||||||
|
|
||||||
```python
|
|
||||||
BUILDERS = {
|
|
||||||
"workshop": WorkshopBuilder(),
|
|
||||||
"script": ScriptBuilder(),
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
## Data Model
|
|
||||||
|
|
||||||
### `Overlay` (modified)
|
|
||||||
|
|
||||||
```text
|
|
||||||
id INTEGER PK AUTOINCREMENT
|
|
||||||
name VARCHAR(255) NOT NULL
|
|
||||||
path VARCHAR(255) NOT NULL -- str(id) for new rows
|
|
||||||
type VARCHAR(16) NOT NULL -- 'workshop' | 'script'
|
|
||||||
user_id INTEGER NULL REFERENCES users(id) -- NULL = system-wide
|
|
||||||
|
|
||||||
script TEXT NOT NULL DEFAULT '' -- new; meaningful for type='script'
|
|
||||||
last_build_status VARCHAR(16) NOT NULL DEFAULT '' -- new; '' | 'ok' | 'failed'
|
|
||||||
|
|
||||||
created_at, updated_at
|
|
||||||
|
|
||||||
UNIQUE INDEX on (name) WHERE user_id IS NULL
|
|
||||||
UNIQUE INDEX on (name, user_id) WHERE user_id IS NOT NULL
|
|
||||||
INDEX on (type, user_id)
|
|
||||||
```
|
|
||||||
|
|
||||||
### Tables removed
|
|
||||||
|
|
||||||
- `global_overlay_item_files`
|
|
||||||
- `global_overlay_items`
|
|
||||||
- `global_overlay_sources`
|
|
||||||
|
|
||||||
Drop order matters for the SQLite migration: drop `_item_files` first (FK to `_items`), then `_items` (FK to `_sources`), then `_sources` (FK to `overlays`).
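
A minimal sketch of that child-first drop order, using the standard-library `sqlite3` module against an in-memory database. The column definitions are stand-ins invented for the example (the real tables carry more columns), and `drop_global_overlay_tables` is a hypothetical name, not the project's migration code.

```python
import sqlite3

# Child tables first, so a dropped table is never still referenced.
DROP_ORDER = [
    "global_overlay_item_files",  # references global_overlay_items
    "global_overlay_items",       # references global_overlay_sources
    "global_overlay_sources",     # references overlays
]


def drop_global_overlay_tables(conn):
    """Drop the managed-globals tables in FK-safe order."""
    for table in DROP_ORDER:
        conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    # Stand-in schema: only the id and FK columns needed to show the ordering.
    conn.executescript("""
        CREATE TABLE overlays (id INTEGER PRIMARY KEY);
        CREATE TABLE global_overlay_sources (
            id INTEGER PRIMARY KEY,
            overlay_id INTEGER NOT NULL REFERENCES overlays(id));
        CREATE TABLE global_overlay_items (
            id INTEGER PRIMARY KEY,
            source_id INTEGER NOT NULL REFERENCES global_overlay_sources(id));
        CREATE TABLE global_overlay_item_files (
            id INTEGER PRIMARY KEY,
            item_id INTEGER NOT NULL REFERENCES global_overlay_items(id));
    """)
    drop_global_overlay_tables(conn)
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    return sorted(name for (name,) in rows)
```

After `demo()` runs, only the `overlays` table should remain in the stand-in schema.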
|
|
||||||
|
|
||||||
### Unchanged
|
|
||||||
|
|
||||||
`WorkshopItem`, `overlay_workshop_items`, `Job` (including `Job.overlay_id` and nullable `Job.user_id`), `Server`, `Blueprint`, etc.
|
|
||||||
|
|
||||||
## Filesystem Layout
|
|
||||||
|
|
||||||
```text
|
|
||||||
${LEFT4ME_ROOT}/
|
|
||||||
overlays/
|
|
||||||
{overlay_id}/ # script writes here; mounted by host
|
|
||||||
left4dead2/... # whatever the script produces
|
|
||||||
workshop_cache/{steam_id}.vpk # workshop type only — unchanged
|
|
||||||
|
|
||||||
# removed:
|
|
||||||
# global_overlay_cache/ # was used by managed-globals types
|
|
||||||
```
|
|
||||||
|
|
||||||
Single tree per overlay. No per-overlay scratch cache (the chosen idempotency model is "script runs against existing dir," so any caching the user wants lives inside the overlay dir and is preserved between builds).
|
|
||||||
|
|
||||||
The sandbox bind-mounts `${LEFT4ME_ROOT}/overlays/{id}/` to `/overlay` (RW). Nothing else under `${LEFT4ME_ROOT}` is visible inside the sandbox.
|
|
||||||
|
|
||||||
## Sandbox
|
|
||||||
|
|
||||||
### Helper script
|
|
||||||
|
|
||||||
`deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`, mode 0755, owned root:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
#!/bin/bash
|
|
||||||
# args: <overlay_id> <script_path>
|
|
||||||
set -euo pipefail
|
|
||||||
[[ $# -eq 2 ]] || { echo "usage: $0 <overlay_id> <script>" >&2; exit 64; }
|
|
||||||
OVERLAY_ID=$1; SCRIPT=$2
|
|
||||||
[[ "$OVERLAY_ID" =~ ^[0-9]+$ ]] || { echo "bad overlay id" >&2; exit 64; }
|
|
||||||
OVERLAY_DIR=/var/lib/left4me/overlays/$OVERLAY_ID
|
|
||||||
[[ -d $OVERLAY_DIR ]] || { echo "no overlay dir" >&2; exit 65; }
|
|
||||||
[[ -f $SCRIPT ]] || { echo "no script" >&2; exit 65; }
|
|
||||||
|
|
||||||
SBX_UID=$(id -u l4d2-sandbox); SBX_GID=$(id -g l4d2-sandbox)
|
|
||||||
|
|
||||||
exec systemd-run --quiet --scope --collect \
|
|
||||||
-p MemoryMax=4G -p MemorySwapMax=0 -p TasksMax=512 \
|
|
||||||
-p CPUQuota=200% -p RuntimeMaxSec=3600 \
|
|
||||||
-- bwrap \
|
|
||||||
--die-with-parent --new-session \
|
|
||||||
--unshare-pid --unshare-ipc --unshare-uts --unshare-cgroup \
|
|
||||||
--uid "$SBX_UID" --gid "$SBX_GID" \
|
|
||||||
--proc /proc --dev /dev --tmpfs /tmp --tmpfs /run \
|
|
||||||
--ro-bind /usr /usr --ro-bind /lib /lib --ro-bind /lib64 /lib64 \
|
|
||||||
--symlink usr/bin /bin --symlink usr/sbin /sbin \
|
|
||||||
--ro-bind /etc/resolv.conf /etc/resolv.conf \
|
|
||||||
--ro-bind /etc/ssl /etc/ssl \
|
|
||||||
--ro-bind /etc/ca-certificates /etc/ca-certificates \
|
|
||||||
--ro-bind /etc/nsswitch.conf /etc/nsswitch.conf \
|
|
||||||
--bind "$OVERLAY_DIR" /overlay \
|
|
||||||
--chdir /overlay \
|
|
||||||
--setenv HOME /tmp --setenv PATH /usr/bin:/usr/sbin \
|
|
||||||
--setenv OVERLAY /overlay \
|
|
||||||
--ro-bind "$SCRIPT" /script.sh \
|
|
||||||
/bin/bash /script.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
Network is *not* unshared (no `--unshare-net`); the sandbox shares the host network namespace. Every transient unit is visible via `systemctl list-units --type=scope` while running and journaled afterward (`journalctl --user-unit=run-…scope` or system journal depending on invocation).
|
|
||||||
|
|
||||||
### Sudoers fragment
|
|
||||||
|
|
||||||
Append to `deploy/files/etc/sudoers.d/left4me`:
|
|
||||||
|
|
||||||
```
|
|
||||||
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox
|
|
||||||
```
|
|
||||||
|
|
||||||
### System user
|
|
||||||
|
|
||||||
Provisioned in `deploy/deploy-test-server.sh`:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
useradd --system --no-create-home --shell /usr/sbin/nologin l4d2-sandbox
|
|
||||||
apt-get install -y bubblewrap
|
|
||||||
```
|
|
||||||
|
|
||||||
## Build Lifecycle
|
|
||||||
|
|
||||||
`ScriptBuilder` lives in `l4d2web/services/overlay_builders.py` next to `WorkshopBuilder`:
|
|
||||||
|
|
||||||
```python
|
|
||||||
import os
import subprocess
import tempfile

# run_with_streamed_output, overlay_path, and BuildError are assumed to be
# provided by the surrounding module.


class ScriptBuilder:
    def build(self, overlay, *, on_stdout, on_stderr, should_cancel):
        # Write the script body to a temp file that the helper can bind-mount.
        with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
            f.write(overlay.script or "")
            script_path = f.name
        try:
            cmd = [
                "sudo", "-n",
                "/usr/local/libexec/left4me/left4me-script-sandbox",
                str(overlay.id), script_path,
            ]
            run_with_streamed_output(cmd, on_stdout, on_stderr, should_cancel)
            self._enforce_disk_budget(overlay.id, on_stderr)
        finally:
            # Remove the temp script whether the build succeeded or failed.
            os.unlink(script_path)

    def _enforce_disk_budget(self, overlay_id, on_stderr):
        # du -sb prints "<bytes>\t<path>"; the first field is the total size.
        size = subprocess.check_output(["du", "-sb", overlay_path(overlay_id)])
        if int(size.split()[0]) > 20 * 1024**3:
            on_stderr("overlay exceeded 20 GB disk cap")
            raise BuildError("disk-cap-exceeded")
|
|
||||||
```
|
|
||||||
|
|
||||||
`run_with_streamed_output` is the existing helper used by `WorkshopBuilder` for `steamcmd`/`curl` invocations. When `should_cancel` reports a cancellation, `SIGTERM` is sent to the sudo/`systemd-run` process tree; `bwrap --die-with-parent` takes the sandboxed processes down with it, and `--collect` garbage-collects the scope unit once it has exited.
|
|
||||||
|
|
||||||
The job worker's existing job-completion path writes `Overlay.last_build_status = 'ok'` on success and `'failed'` on any non-zero exit / `BuildError` / cancel. This is a single column update inside the existing transaction; no new infrastructure.
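
The status rule can be written as a pure function, which makes the boundary cases easy to test. This is a sketch, not the worker's code: `build_status` and `DISK_CAP_BYTES` are hypothetical names, and the cap uses the same `20 * 1024**3` arithmetic as `_enforce_disk_budget`.

```python
DISK_CAP_BYTES = 20 * 1024**3  # 20 GB post-build cap on the overlay dir


def build_status(exit_code, overlay_bytes, cancelled=False):
    """Map a finished build to the value stored in last_build_status."""
    if cancelled or exit_code != 0:
        return "failed"
    if overlay_bytes > DISK_CAP_BYTES:
        # The script succeeded but the overlay dir is over budget.
        return "failed"
    return "ok"
```

An overlay of exactly 20 GB still passes, because the check in `_enforce_disk_budget` is a strict greater-than.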
|
|
||||||
|
|
||||||
## UI
|
|
||||||
|
|
||||||
### Create modal (`templates/overlays.html`)
|
|
||||||
|
|
||||||
The existing modal grows one option in the type radio: `Workshop | Script`. Name field unchanged. After insert, the web app generates `path = str(overlay_id)` for new rows (existing pattern).
|
|
||||||
|
|
||||||
### Detail page when `type='script'` (`templates/overlay_detail.html`)
|
|
||||||
|
|
||||||
- Plain styled `<textarea>` for `overlay.script` with a Save button → `POST /overlays/{id}/script`. No CodeMirror dependency in v1 (out of scope; keep frontend dep-light).
|
|
||||||
- "Rebuild" button → `POST /overlays/{id}/build`. Existing pattern from workshop overlays.
|
|
||||||
- "Wipe overlay" button (red, confirm-modal) → `POST /overlays/{id}/wipe`.
|
|
||||||
- `last_build_status` indicator badge: empty / "ok" / "failed".
|
|
||||||
- Live build log via existing SSE plumbing on the related Job row.
|
|
||||||
|
|
||||||
### Detail page when `type='workshop'`: unchanged.
|
|
||||||
|
|
||||||
### Sections removed
|
|
||||||
|
|
||||||
The global-source detail block (`overlay_detail.html` lines 34–46) is deleted along with the managed-globals subsystem.
|
|
||||||
|
|
||||||
## Routes
|
|
||||||
|
|
||||||
`l4d2web/routes/overlay_routes.py` adds:
|
|
||||||
|
|
||||||
| Method | Path | Purpose |
|
|
||||||
|---|---|---|
|
|
||||||
| POST | `/overlays/{id}/script` | Update `script` text. Auto-enqueue coalesced `build_overlay` job. |
|
|
||||||
| POST | `/overlays/{id}/wipe` | Invoke `left4me-script-sandbox` with the literal script `find /overlay -mindepth 1 -delete`. Owner/admin only. Refuses if a `build_overlay` for this overlay is running. After success, set `last_build_status=''`. Does not auto-enqueue a rebuild. |
|
|
||||||
| POST | `/overlays/{id}/build` | Manual rebuild — same pattern as today's workshop overlay manual rebuild. |
|
|
||||||
|
|
||||||
Existing `POST /overlays` accepts `type=script` and an optional initial `script` body.
|
|
||||||
|
|
||||||
## Permissions
|
|
||||||
|
|
||||||
| Action | Who |
|
|
||||||
|---|---|
|
|
||||||
| Create script overlay (private, `user_id = me`) | Any authenticated user |
|
|
||||||
| Create script overlay (system-wide, `user_id = NULL`) | Admin |
|
|
||||||
| Edit (script body, name) | Owner or admin |
|
|
||||||
| Wipe / Rebuild | Owner or admin |
|
|
||||||
| Delete | Owner or admin |
|
|
||||||
| View | Owner, admin, or any user when `user_id IS NULL` |
|
|
||||||
|
|
||||||
These match the existing rules for workshop overlays.
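
The table reduces to three predicates. The sketch below uses stand-in dataclasses rather than the application's models. `User.is_admin` and the function names are assumptions for illustration; only `user_id` being `NULL` for system-wide overlays comes from the spec.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    id: int
    is_admin: bool = False  # hypothetical flag; the real model may differ


@dataclass(frozen=True)
class Overlay:
    id: int
    user_id: Optional[int]  # None = system-wide overlay


def can_create(user, system_wide):
    """Anyone may create a private overlay; only admins create system-wide ones."""
    return user.is_admin if system_wide else True


def can_modify(user, overlay):
    """Edit, wipe, rebuild, and delete share one rule: owner or admin."""
    return user.is_admin or (
        overlay.user_id is not None and overlay.user_id == user.id)


def can_view(user, overlay):
    """Owner, admin, or any user when the overlay is system-wide."""
    return user.is_admin or overlay.user_id is None or overlay.user_id == user.id
```

Keeping the write actions behind a single predicate means a new action such as wipe inherits the same rule without another table row to keep in sync.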
|
|
||||||
|
|
||||||
## Job Worker / Scheduler
|
|
||||||
|
|
||||||
`services/job_worker.py` drops `"refresh_global_overlays"` from `GLOBAL_OPERATIONS` and removes the corresponding `refresh_global_overlays_running` and `blocked_servers_by_overlay` plumbing that exists only for the global-maps subsystem. The remaining mutex rules already cover:
|
|
||||||
|
|
||||||
- `build_overlay` per overlay (one running build per overlay).
|
|
||||||
- `install` and `refresh_workshop_items` as global mutexes.
|
|
||||||
- Server start/init blocks if any `build_overlay` for an overlay in the server's blueprint is running.
|
|
||||||
|
|
||||||
No new rules are needed for `script` — its build is mechanically identical to a `workshop` build from the scheduler's perspective.
|
|
||||||
|
|
||||||
## Daily Refresh — Removed
|
|
||||||
|
|
||||||
This iteration deletes the daily-refresh subsystem entirely:
|
|
||||||
|
|
||||||
- `deploy/files/usr/local/lib/systemd/system/left4me-refresh-global-overlays.timer` and `.service` — deleted.
|
|
||||||
- `flask refresh-global-overlays` CLI command in `l4d2web/cli.py` — deleted.
|
|
||||||
- No replacement timer, no replacement CLI, no `auto_refresh` column on `Overlay`.
|
|
||||||
|
|
||||||
The only build trigger after this change is the user clicking Rebuild on the detail page (or the auto-enqueue when they Save the script body). A scheduled-refresh feature is reintroduced in a future iteration designed against concrete operational needs.
|
|
||||||
|
|
||||||
## Risks
|
|
||||||
|
|
||||||
- **Sandbox escape via kernel bug.** `bwrap` has a strong track record but is not invulnerable. Mitigated by running as `l4d2-sandbox` (no privileged capabilities), no setuid binaries reachable, `no_new_privs` implicit. A successful escape would land in an unprivileged UID with no host secrets reachable.
|
|
||||||
- **Disk fill via runaway script.** A script that writes a 20 GB+ payload to `/overlay` succeeds inside the sandbox and only fails afterward at the post-build `du` check, so the payload lands on disk transiently. This is not mitigated in v1: cgroup IO controls throttle bandwidth and IOPS rather than total bytes on disk, so there is no good in-flight limit. Accepted as a v1 trade-off; a future improvement is to put the overlay dir on its own filesystem with a quota.
|
|
||||||
- **Network exfiltration.** Script can connect to anything outbound, including internal IPs. Acceptable for the current trust model (semi-public; users have credentials). Egress firewall is out of scope.
|
|
||||||
- **Build-mid-server-running.** The scheduler refuses `build_overlay` for an overlay attached to a starting/running server (existing rule, unchanged). Good. A user can still rebuild while a server using a *different* blueprint runs concurrently.
|
|
||||||
- **Wipe race with running build.** The wipe endpoint refuses if a `build_overlay` for the overlay is running. Without this check, a wipe could blow away files mid-script and produce undefined results.
|
|
||||||
- **Stale `last_build_status`.** A row inserted via direct DB write or restored from backup could carry an `'ok'` status that no longer reflects reality. Treated as cosmetic; users can rebuild to refresh.
|
|
||||||
- **Sudoers misconfig.** A typo in the sudoers fragment could grant `left4me` more than intended. Mitigated by deploy-artifact tests asserting the exact expected lines.
|
|
||||||
- **DB row deletion racing the sandbox.** A user deleting an overlay while its build runs would invalidate the bind-mount target. Mitigated by the existing scheduler rule that tracks running overlays; delete should refuse if a build is running. (Existing pattern for workshop overlays; reuse.)
|
|
||||||
- **Migration drops globals tables.** Acceptable for the test deploy. Production rollout would need a different migration story; this spec explicitly assumes test-deploy DB wipe.
|
|
||||||
|
|
||||||
## Out Of Scope
|
|
||||||
|
|
||||||
- **Scheduled / daily refresh.** Intentionally removed in this iteration. Reintroduced later, designed against the use cases that emerge.
|
|
||||||
- **Per-overlay resource overrides.** All script overlays share the same 1 h / 4 GB / 20 GB envelope. If a real overlay needs more (l4d2center mirror at peak), revisit.
|
|
||||||
- **CodeMirror or other rich script editor.** Plain `<textarea>` in v1.
|
|
||||||
- **Egress allowlist / proxy.** No network restrictions on the sandbox in v1.
|
|
||||||
- **`$CACHE` scratch dir** persisted across builds. Users cache inside the overlay dir if they want; idempotency model is "script runs against existing dir."
|
|
||||||
- **Multi-tenant cgroup tree per user.** All sandboxes share the same cgroup-quota envelope.
|
|
||||||
- **Revision history on `script` column.** No `overlay_script_revisions` table; whatever's in the row is the current script.
|
|
||||||
- **Auto-seeding of l4d2center / cedapug equivalents.** Admin pastes the script post-deploy.
|
|
||||||
- **Migration that preserves existing global-map overlay rows.** Test deploy DB is wiped.
|
|
||||||
- **Container-per-build (podman / docker).** Heavier than `bwrap`; revisit only if multi-tenant escalates to "fully public sign-up."
|
|
||||||
- **left4me-aware helpers** (`workshop`, `download`, `extract`) inside the sandbox. Pure bash + host `/usr` only.
|
|
||||||
|
|
||||||
## Implementation Boundaries
|
|
||||||
|
|
||||||
- **`l4d2host` is unchanged.** The host library has no concept of overlay types and the mount layer (`KernelOverlayFSMounter`) doesn't care how the overlay dir got populated.
|
|
||||||
- **The `OverlayBuilder` Protocol is unchanged** — same `build(overlay, *, on_stdout, on_stderr, should_cancel)` signature. `ScriptBuilder` plugs into the existing registry.
|
|
||||||
- **The job worker model is unchanged.** Same operations, same logs, same SSE plumbing, same scheduler rules (minus the `refresh_global_overlays` entry).
|
|
||||||
- **No new application-level dependencies.** Vendored HTMX, no new Python packages. Two new system dependencies: `bubblewrap` apt package and the `l4d2-sandbox` system user.
|
|
||||||
- **No new config keys.** Same env files (`/etc/left4me/host.env`, `/etc/left4me/web.env`).
|
|
||||||
- **DB migration is destructive for global-maps overlay rows.** This is acceptable per the test-deploy assumption; a production-rollout follow-up would need to address it.
|
|
||||||
- The companion implementation plan governs task ordering and verification commands. Implementation must not start without explicit user approval per that plan's gate.
|
|
||||||
|
|
@ -1,138 +0,0 @@
|
||||||
# L4D2 Script Sandbox v2 — Systemd-Only
|
|
||||||
|
|
||||||
**Goal:** Replace the bwrap-based `left4me-script-sandbox` helper with one that uses `systemd-run` in **service-unit mode** alone. Drop `bubblewrap` as a system dependency. Gain capability bounding, seccomp filtering, kernel-tunable / -module / -log protection, address-family restriction, `LockPersonality`, `MemoryDenyWriteExecute`, and `RestrictSUIDSGID` — none of which the bwrap+systemd-run-scope composition could provide. Lose PID-namespace isolation (no `PrivatePID=` directive in systemd) — judged acceptable for the current trust model.
|
|
||||||
|
|
||||||
**Approval status:** User-approved 2026-05-08 after smoke testing on `ckn@10.0.4.128`.
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
The v1 sandbox (see `2026-05-08-l4d2-script-overlays-design.md`) layers `bubblewrap` for namespacing inside `systemd-run --scope` for cgroup limits. That works, but `--scope` units register an existing process tree and so cannot accept service-only directives like `NoNewPrivileges=`, `ProtectSystem=`, `SystemCallFilter=`, `CapabilityBoundingSet=`, etc. Smoke testing on the deployed host confirmed bwrap covers mount/PID/IPC/UTS namespacing well, but leaves capability bounding, seccomp, and kernel-surface protection unenforced.
|
|
||||||
|
|
||||||
A switch to `systemd-run` in default (transient service) mode unlocks the full hardening surface. Smoke testing of a v2 prototype against the deployed test host confirmed:
|
|
||||||
|
|
||||||
- Every isolation invariant the bwrap version provides (filesystem masking, UID drop, network reachability, `/overlay` RW bind, host-side `l4d2-sandbox` ownership, host secret hiding) is reproducible with systemd directives.
|
|
||||||
- All resource limits apply identically: the cgroup caps (`memory.max=4G`, `memory.swap.max=0`, `pids.max=512`, a CPU quota of 200%) and the `RuntimeMaxSec=3600` walltime limit.
|
|
||||||
- `MemoryError` fires at the 4 GB cap (cgroup-enforced).
|
|
||||||
- The wipe path (`find /overlay -mindepth 1 -delete`) succeeds.
|
|
||||||
- Hardening directives the v1 design couldn't express enforce real syscall blocks: `unshare(CLONE_NEWUSER)`, `mount(2)`, `personality(2)`, `bpf(2)`, `swapoff(2)`, `sysctl -w` are all blocked.
|
|
||||||
|
|
||||||
The single behavioral regression: host process IDs are visible via `/proc` and `ps -ef` because systemd has no `PrivatePID=` directive. Sending signals to those processes is still blocked by the kernel's UID-mismatch check (`l4d2-sandbox` cannot signal `root`-owned processes). Information disclosure is the only leak; the signal restriction remains intact.
|
|
||||||
|
|
||||||
## Locked Decisions
|
|
||||||
|
|
||||||
1. **Replace the helper body wholesale.** No `bwrap` invocation. `systemd-run` in service mode does both isolation and resource limits.
|
|
||||||
2. **Helper path, sudoers rule, ScriptBuilder API, and `l4d2-sandbox` UID are unchanged.** The Python side (`run_sandboxed_script`, route handlers, tests) does not change.
|
|
||||||
3. **`bubblewrap` apt dependency dropped from `deploy-test-server.sh`.**
|
|
||||||
4. **`left4me.db` file mode tightened to 0640 root:left4me at deploy time.** This is a host-hygiene fix that is independent of the sandbox change but was surfaced by smoke testing — without it, *any* host user (and, transitively, the sandbox) could read the application database.
|
|
||||||
5. **`TemporaryFileSystem=/var/lib` is required.** `ProtectSystem=strict` makes `/var/lib/left4me` read-only but visible; the only way to reliably hide its contents from the unit is to mask the parent with a tmpfs. The `BindPaths=…/overlays/{id}:/overlay` mount is unaffected because `/overlay` is at a different path.
|
|
||||||
6. **`PrivatePID=` is not configured.** systemd has no such directive. `ps -ef` from inside the sandbox shows host processes. The kernel's UID-based signal restriction blocks any actual interaction with them. Acceptable for the current trust model.
|
|
||||||
7. **Walltime kill remains `RuntimeMaxSec=3600`.** Same as v1.
|
|
||||||
8. **Network namespace remains shared with the host.** No `PrivateNetwork=`. Scripts must reach Steam / l4d2center / GitHub / etc.
|
|
||||||
9. **`SystemCallFilter=@system-service @network-io`** is the seccomp baseline. systemd's curated `@system-service` group is "everything a normal service does"; adding `@network-io` is explicit even though it overlaps. Build failures revealing missing syscall classes are surfaced via `journalctl` and addressed by widening the filter (`@process`, etc.) on demand.
|
|
||||||
10. **Single helper file replaces v1.** Not adding a `-v2` variant. The v1 implementation is removed in the same change.
|
|
||||||
|
|
||||||
## Architecture
|
|
||||||
|
|
||||||
```text
|
|
||||||
sudo helper
|
|
||||||
└─ systemd-run (service mode, the default) --pipe --wait
|
|
||||||
(transient .service unit, full hardening directives)
|
|
||||||
└─ /bin/bash /script.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
systemd-run in service mode:
|
|
||||||
- Opens a transient service unit on the system bus.
|
|
||||||
- Applies all `-p` properties as the unit's exec context.
|
|
||||||
- Forks; the child sets up the unit's namespaces (mount, IPC), drops privileges to `User=l4d2-sandbox`, applies the seccomp filter, and `execve()`s `/bin/bash /script.sh`.
|
|
||||||
- `--pipe` connects the unit's stdin/stdout/stderr to the calling helper's stdio (so the existing `run_command` harness in `ScriptBuilder` continues to capture line-by-line).
|
|
||||||
- `--wait` blocks until the unit terminates and propagates the exit code.
|
|
||||||
- `--collect` removes the unit on exit even if it failed.
|
|
||||||
- The cgroup carries the resource limits; systemd itself enforces `RuntimeMaxSec=3600` and terminates the unit when the limit is reached.
|
|
||||||
|
|
||||||
### Helper
|
|
||||||
|
|
||||||
`deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`, mode 0755, owned by root:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
#!/bin/bash
|
|
||||||
set -euo pipefail
|
|
||||||
[[ $# -eq 2 ]] || { echo "usage: $0 <overlay_id> <script>" >&2; exit 64; }
|
|
||||||
OVERLAY_ID=$1; SCRIPT=$2
|
|
||||||
[[ "$OVERLAY_ID" =~ ^[0-9]+$ ]] || { echo "bad overlay id" >&2; exit 64; }
|
|
||||||
OVERLAY_DIR=/var/lib/left4me/overlays/$OVERLAY_ID
|
|
||||||
[[ -d $OVERLAY_DIR ]] || { echo "no overlay dir at $OVERLAY_DIR" >&2; exit 65; }
|
|
||||||
[[ -f $SCRIPT ]] || { echo "no script at $SCRIPT" >&2; exit 65; }
|
|
||||||
|
|
||||||
if [[ "${LEFT4ME_SCRIPT_SANDBOX_DRY_RUN:-}" == "1" ]]; then
|
|
||||||
  echo "DRY RUN: overlay_id=$OVERLAY_ID script=$SCRIPT overlay_dir=$OVERLAY_DIR"
|
|
||||||
  exit 0
|
|
||||||
fi
|
|
||||||
|
|
||||||
chown -R l4d2-sandbox:l4d2-sandbox "$OVERLAY_DIR"
|
|
||||||
chmod 0755 "$OVERLAY_DIR"
|
|
||||||
|
|
||||||
exec systemd-run --quiet --collect --wait --pipe \
|
|
||||||
--unit="left4me-script-${OVERLAY_ID}-$$" \
|
|
||||||
-p User=l4d2-sandbox -p Group=l4d2-sandbox \
|
|
||||||
-p NoNewPrivileges=yes \
|
|
||||||
-p ProtectSystem=strict -p ProtectHome=yes \
|
|
||||||
-p PrivateTmp=yes -p PrivateDevices=yes -p PrivateIPC=yes \
|
|
||||||
-p ProtectKernelTunables=yes -p ProtectKernelModules=yes \
|
|
||||||
-p ProtectKernelLogs=yes -p ProtectControlGroups=yes \
|
|
||||||
-p RestrictNamespaces=yes \
|
|
||||||
-p RestrictAddressFamilies="AF_INET AF_INET6 AF_UNIX" \
|
|
||||||
-p RestrictSUIDSGID=yes -p LockPersonality=yes \
|
|
||||||
-p MemoryDenyWriteExecute=yes \
|
|
||||||
-p SystemCallFilter="@system-service @network-io" \
|
|
||||||
-p SystemCallArchitectures=native \
|
|
||||||
-p CapabilityBoundingSet= -p AmbientCapabilities= \
|
|
||||||
-p TemporaryFileSystem="/etc /var/lib" \
|
|
||||||
-p BindReadOnlyPaths="/etc/resolv.conf /etc/ssl /etc/ca-certificates /etc/nsswitch.conf /etc/alternatives ${SCRIPT}:/script.sh" \
|
|
||||||
-p BindPaths="${OVERLAY_DIR}:/overlay" \
|
|
||||||
-p WorkingDirectory=/overlay \
|
|
||||||
-p Environment="HOME=/tmp PATH=/usr/bin:/usr/sbin OVERLAY=/overlay" \
|
|
||||||
-p MemoryMax=4G -p MemorySwapMax=0 -p TasksMax=512 \
|
|
||||||
-p CPUQuota=200% -p RuntimeMaxSec=3600 \
|
|
||||||
-- /bin/bash /script.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
### Sudoers fragment
|
|
||||||
|
|
||||||
Unchanged from v1: `left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox`.
|
|
||||||
|
|
||||||
### System user
|
|
||||||
|
|
||||||
Unchanged from v1: `l4d2-sandbox` (`useradd --system --no-create-home --shell /usr/sbin/nologin`).
|
|
||||||
|
|
||||||
### Filesystem expectations
|
|
||||||
|
|
||||||
- `/var/lib/left4me` must be mode 0711 (left4me-owned). Already provisioned by v1 deploy script.
|
|
||||||
- `/var/lib/left4me/left4me.db` mode 0640 root:left4me. **New** — added by this change.
|
|
||||||
- Overlay directory `/var/lib/left4me/overlays/{id}/` chowned to `l4d2-sandbox:l4d2-sandbox` 0755 by the helper before each run. Unchanged from v1.
|
|
||||||
|
|
||||||
## Build Lifecycle (unchanged from v1)
|
|
||||||
|
|
||||||
`ScriptBuilder.build()` writes the script to a 0644 tmpfile, runs `sudo -n /usr/local/libexec/left4me/left4me-script-sandbox <id> <tmpfile>` via `run_command`, then runs `_enforce_disk_budget`. The helper's internal mechanism changes; the wrapper API is identical. `Overlay.last_build_status` is written by the job worker on completion.
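
The wrapper side of this lifecycle can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: the function name `prepare_sandbox_invocation` is hypothetical, and the real `ScriptBuilder.build()` additionally streams output through `run_command` and enforces the disk budget afterwards.

```python
import os
import tempfile

# Helper path as given in this spec.
HELPER = "/usr/local/libexec/left4me/left4me-script-sandbox"


def prepare_sandbox_invocation(overlay_id: int, script_text: str) -> tuple[list[str], str]:
    """Write the script to a 0644 temp file and return (argv, script_path).

    Hypothetical helper for illustration only; the caller is expected to
    run argv and unlink script_path afterwards.
    """
    fd, script_path = tempfile.mkstemp(prefix="left4me-script-", suffix=".sh")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(script_text)
        # mkstemp creates the file 0600; widen it to the 0644 the spec calls for.
        os.chmod(script_path, 0o644)
    except BaseException:
        os.unlink(script_path)
        raise
    # sudo -n fails immediately instead of prompting if the NOPASSWD rule is missing.
    argv = ["sudo", "-n", HELPER, str(overlay_id), script_path]
    return argv, script_path
```

Returning the argument vector as a list rather than a shell string keeps the overlay id and script path out of shell parsing entirely.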
|
|
||||||
|
|
||||||
## Risks
|
|
||||||
|
|
||||||
- **systemd CVE landing in our directive set.** Single-tool migration removes one isolation layer. Mitigated by uid drop + cgroup limits + `NoNewPrivileges=yes` (kernel-enforced state independent of namespace setup). The escape would be an unprivileged process with no filesystem isolation but still capped on resources; same severity envelope as a hypothetical bwrap CVE in v1. The trust model (registered users) makes a single isolation layer acceptable.
|
|
||||||
- **`SystemCallFilter` rejecting a syscall a user script unexpectedly needs.** Symptom: build fails with SIGSYS. Diagnosis: `journalctl --since "1 min ago" | grep SECCOMP`. Resolution: widen the filter by appending further syscall groups to the `SystemCallFilter=` list (`@process`, or `@privileged` if the script genuinely needs more than a normal service). v1 had no syscall filter, so this is a new failure class.
|
|
||||||
- **`ProtectSystem=strict` masking something a script wanted to write to.** Only `/overlay`, `/tmp`, `/run` are writable inside the sandbox. Same as v1.
|
|
||||||
- **Host PID visibility (no `PrivatePID=`).** Information disclosure; not a privilege boundary.
|
|
||||||
- **`MemoryDenyWriteExecute=yes` blocking JITs.** A script that launches `node` / a JIT runtime would fail because W+X mappings are blocked. None of the recipe set the user has historically used (curl + tar + cp) needs a JIT; revisit if a real script trips this.
|
|
||||||
- **`RestrictAddressFamilies` blocking some download tools.** `curl`, `wget`, and `git` over HTTPS use `AF_INET`/`AF_INET6`; `getent hosts` uses `AF_UNIX` (NSS). Smoke-tested as working. A script that wanted raw sockets (`AF_PACKET`) or netlink (`AF_NETLINK`) would fail; neither is plausible for build recipes.
|
|
||||||
|
|
||||||
## Out Of Scope
|
|
||||||
|
|
||||||
- **Per-overlay UID isolation.** Cross-script-overlay write access is still possible after a hypothetical sandbox bypass (every script overlay's dir is owned by `l4d2-sandbox`). A per-overlay UID pool was discussed as the next-step hardening but is deferred.
|
|
||||||
- **`PrivateNetwork=` / egress filtering.** No change from v1.
|
|
||||||
- **systemd-nspawn or LXC.** Researched; both are heavier than necessary for transient bash builds.
|
|
||||||
- **`PrivatePID=` workaround via `unshare`.** Not pursued — would require re-introducing a wrapper inside the unit, defeating the simplification.
|
|
||||||
|
|
||||||
## Implementation Boundaries
|
|
||||||
|
|
||||||
- **Web app code is unchanged.** `ScriptBuilder`, `run_sandboxed_script`, route handlers, models, migrations — all untouched. The migration is purely in the deployed helper script and adjacent deploy artifacts.
|
|
||||||
- **`bubblewrap` apt package removed.** Already absent from production paths after this change; deploy script updated.
|
|
||||||
- **No new systemd unit files.** Each invocation is a transient unit named `left4me-script-{overlay_id}-{pid}.service`.
|
|
||||||
- **No application-level dependency changes.** No new Python packages, no template changes, no DB migration.
|
|
||||||
|
|
@ -1,113 +0,0 @@
|
||||||
# L4D2 Script Sandbox v3 — Egress Filter (Public Internet Only)
|
|
||||||
|
|
||||||
**Goal:** Restrict the script-overlay sandbox to public-internet egress only. Block reachability to the host's own services (localhost), the LAN, and any private RFC1918 / link-local / multicast / CGNAT / ULA addresses. Public DNS is preserved by bind-mounting a sandbox-only `resolv.conf` pointing at Cloudflare + Google.
|
|
||||||
|
|
||||||
**Approval status:** User-approved 2026-05-08. Implemented and smoke-tested on `ckn@10.0.4.128`.
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
After the v2 (systemd-only) migration, the sandbox still shared the host's network namespace. A live probe demonstrated the script could:
|
|
||||||
|
|
||||||
- Reach the web app on `127.0.0.1:8000` (HTTP 200 from `/health`).
|
|
||||||
- Reach the host's SSH daemon on `127.0.0.1:22` (banner returned).
|
|
||||||
- Reach the host on the LAN at `10.0.4.128:22` (banner returned).
|
|
||||||
- Reach the LAN gateway / DNS server at `10.0.0.1`.
|
|
||||||
- See Unix sockets in `/run` (`AF_UNIX` allowed).
|
|
||||||
|
|
||||||
The threat model says the sandbox should reach the public internet to download Workshop / l4d2center / GitHub content, but should **not** be able to talk to the host or LAN. systemd's `IPAddressDeny=` BPF cgroup egress filter is the right tool. It attaches a BPF program (`sd_fw_egress`) to the unit's cgroup; matching packets are silently dropped at send time.
|
|
||||||
|
|
||||||
A complication: the host's `/etc/resolv.conf` typically points at a private-IP DNS server (10.0.0.1 in the test deploy). Naively blocking `10.0.0.0/8` kills DNS, which kills outbound HTTP. The fix is to give the sandbox a static `resolv.conf` with public resolvers; DNS traffic then targets allowed public IPs.
|
|
||||||
|
|
||||||
## Locked Decisions
|
|
||||||
|
|
||||||
1. **`IPAddressDeny=` alone, no `IPAddressAllow=any`.** It is tempting to assume that the more specific rule wins when both are set, but `systemd.resource-control(5)` documents a fixed order: an address that matches `IPAddressAllow=` is granted before `IPAddressDeny=` is consulted. `IPAddressAllow=any` therefore overrides every `IPAddressDeny=` rule, which was verified empirically on systemd 257 + kernel 6.12. With only `IPAddressDeny=` set, the default "allow all" applies to non-listed addresses; the listed CIDRs are dropped at the egress hook. **This must not be regressed**: adding back `IPAddressAllow=any` reopens every blocked range.
|
|
||||||
|
|
||||||
2. **Explicit CIDRs, no shorthand keywords.** systemd's unit-file parser accepts `localhost`, `link-local`, `multicast` shortcuts, but the `systemd-run -p` parser rejects them with `Failed to parse IP address prefix: localhost`. Use the CIDRs directly: `127.0.0.0/8 ::1/128 169.254.0.0/16 fe80::/10 224.0.0.0/4 ff00::/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 100.64.0.0/10 fc00::/7`.
|
|
||||||
|
|
||||||
3. **Static `/etc/left4me/sandbox-resolv.conf` with public resolvers** (Cloudflare 1.1.1.1, Google 8.8.8.8). Bind-mounted into the sandbox at `/etc/resolv.conf` via `BindReadOnlyPaths=/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf`. Two nameservers for redundancy. Picking other public resolvers (Quad9, OpenDNS) would also be acceptable; the file is the source of truth, not the helper.
|
|
||||||
|
|
||||||
4. **`AF_UNIX` stays in `RestrictAddressFamilies=`.** Dropping it would risk breaking NSS / syslog / D-Bus introspection paths for marginal gain — the IP-level filter handles the actual threat (reaching host TCP services). The Unix-socket surface (D-Bus system bus, systemd notify) is uid-gated and `l4d2-sandbox` has no special D-Bus permissions.
|
|
||||||
|
|
||||||
5. **No `PrivateNetwork=`.** That would block all networking, including the public internet. The whole point of script overlays is reaching public download sources.
|
|
||||||
|
|
||||||
6. **No DNS-over-HTTPS or DNSSEC.** Plain UDP-53 to public resolvers is sufficient; the threat is "egress targeting", not "DNS hijacking". Revisit if the trust model relaxes.
|
|
||||||
|
|
||||||
## Architecture
|
|
||||||
|
|
||||||
```text
|
|
||||||
sudo helper (root)
|
|
||||||
└─ chown overlay dir to l4d2-sandbox
|
|
||||||
└─ systemd-run [...service mode, all v2 directives...]
|
|
||||||
-p IPAddressDeny="<11 CIDRs>"
|
|
||||||
-p BindReadOnlyPaths="/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf [...]"
|
|
||||||
└─ /bin/bash /script.sh
|
|
||||||
(egress to listed CIDRs dropped at sd_fw_egress BPF hook;
|
|
||||||
DNS goes to 1.1.1.1 / 8.8.8.8; everything else
|
|
||||||
reaches the public internet normally)
|
|
||||||
```
|
|
||||||
|
|
||||||
`IPAddressDeny=` blocks egress to:
|
|
||||||
|
|
||||||
| CIDR | Coverage |
|
|
||||||
|---|---|
|
|
||||||
| `127.0.0.0/8` | IPv4 loopback |
|
|
||||||
| `::1/128` | IPv6 loopback |
|
|
||||||
| `169.254.0.0/16` | IPv4 link-local (incl. AWS metadata, DHCP fallback) |
|
|
||||||
| `fe80::/10` | IPv6 link-local |
|
|
||||||
| `224.0.0.0/4` | IPv4 multicast |
|
|
||||||
| `ff00::/8` | IPv6 multicast |
|
|
||||||
| `10.0.0.0/8` | RFC1918 private |
|
|
||||||
| `172.16.0.0/12` | RFC1918 private |
|
|
||||||
| `192.168.0.0/16` | RFC1918 private |
|
|
||||||
| `100.64.0.0/10` | CGNAT (RFC6598) |
|
|
||||||
| `fc00::/7` | IPv6 ULA |
|
|
||||||
|
|
||||||
Public IPv4 / IPv6 destinations are unaffected.
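
To reason about which destinations the table covers, the same eleven prefixes can be checked with Python's standard `ipaddress` module. This is only a sketch for sanity-checking the list; the names `DENIED_PREFIXES` and `is_denied` are illustrative, and the actual enforcement happens in the BPF program systemd attaches to the cgroup, not in Python.

```python
import ipaddress

# The eleven prefixes from the IPAddressDeny= table above.
DENIED_PREFIXES = [
    ipaddress.ip_network(prefix)
    for prefix in (
        "127.0.0.0/8", "::1/128",          # loopback
        "169.254.0.0/16", "fe80::/10",     # link-local
        "224.0.0.0/4", "ff00::/8",         # multicast
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC1918
        "100.64.0.0/10",                   # CGNAT
        "fc00::/7",                        # IPv6 ULA
    )
]


def is_denied(address: str) -> bool:
    """Return True if the address falls inside any denied prefix."""
    ip = ipaddress.ip_address(address)
    # Compare only against prefixes of the same IP version.
    return any(ip.version == net.version and ip in net for net in DENIED_PREFIXES)
```

For example, `is_denied("10.0.4.128")` is true for the test host's LAN address, while `is_denied("1.1.1.1")` is false for the public resolver.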
|
|
||||||
|
|
||||||
## Files
|
|
||||||
|
|
||||||
- `deploy/files/etc/left4me/sandbox-resolv.conf` *(new)* — `nameserver 1.1.1.1` + `nameserver 8.8.8.8`. Mode 0644 root-owned at deploy time.
|
|
||||||
- `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` — `IPAddressDeny=` directive added; `BindReadOnlyPaths=` references the sandbox-resolv.conf instead of `/etc/resolv.conf`.
|
|
||||||
- `deploy/deploy-test-server.sh` — `install -m 0644 -o root -g root .../sandbox-resolv.conf /etc/left4me/sandbox-resolv.conf`.
|
|
||||||
- `deploy/tests/test_deploy_artifacts.py` — assert all of the above + the **negative assertion `IPAddressAllow=any not in text`** (regression guard).
|
|
||||||
|
|
||||||
The web app, ScriptBuilder, routes, models, and migrations are all unchanged. Same as v2.
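
The regression guard described for `test_deploy_artifacts.py` could look roughly like the sketch below. The function name `check_helper_text` and the exact set of checks are assumptions for illustration; only the `IPAddressAllow=any not in text` assertion and the bind-mount string are taken from this spec.

```python
def check_helper_text(text: str) -> list[str]:
    """Return a list of problems found in the helper script text.

    An empty list means the egress-filter invariants hold.
    """
    problems = []
    if "IPAddressDeny=" not in text:
        problems.append("IPAddressDeny= directive missing")
    # Negative assertion: this directive reopens every blocked range.
    if "IPAddressAllow=any" in text:
        problems.append("IPAddressAllow=any must not be present")
    if "/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf" not in text:
        problems.append("sandbox resolv.conf bind mount missing")
    return problems
```

A test would read the helper from `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` and assert that the returned list is empty. Note that a substring check like this would also trip on a comment that spells out the forbidden directive literally.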
|
|
||||||
|
|
||||||
## Verification
|
|
||||||
|
|
||||||
Smoke battery on the deployed host (probe script invoked through the helper as root):
|
|
||||||
|
|
||||||
| Target | Expected | Actual |
|
|
||||||
|---|---|---|
|
|
||||||
| `1.1.1.1:443` | connected | ✓ CONNECTED |
|
|
||||||
| `https://steamcommunity.com/` (DNS + HTTPS) | 200 | ✓ 200 |
|
|
||||||
| `127.0.0.1:8000` (web app) | blocked | ✓ TimeoutError |
|
|
||||||
| `127.0.0.1:22` (sshd) | blocked | ✓ TimeoutError |
|
|
||||||
| `10.0.4.128:22` (host LAN ssh) | blocked | ✓ TimeoutError |
|
|
||||||
| `10.0.0.1:53` (host's DNS resolver) | blocked | ✓ TimeoutError |
|
|
||||||
| `cat /etc/resolv.conf` inside | shows 1.1.1.1 + 8.8.8.8 | ✓ |
|
|
||||||
|
|
||||||
`bpftool cgroup show` against the unit's cgroup confirms `sd_fw_egress` and `sd_fw_ingress` are attached.
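
The probe script itself is not reproduced in this document. A minimal sketch of a TCP probe that would yield results in the shape of the table above, assuming a plain connect attempt with a timeout (the function name `probe` is hypothetical):

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connection and report the outcome as a short string.

    Returns "CONNECTED" on success, otherwise the exception class name
    (for example "TimeoutError" when packets are silently dropped).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "CONNECTED"
    except OSError as exc:
        return type(exc).__name__
```

Because the egress filter drops packets rather than rejecting them, a blocked target is expected to show up as a timeout rather than a refused connection, which matches the `TimeoutError` entries in the table.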
|
|
||||||
|
|
||||||
## Risks
|
|
||||||
|
|
||||||
- **`IPAddressAllow=` accidentally added back.** Reopens every blocked range silently. Mitigation: explicit negative test in `test_deploy_artifacts.py` plus a comment in the helper.
|
|
||||||
- **Public DNS resolver outage.** If 1.1.1.1 and 8.8.8.8 are both down, DNS in the sandbox fails and builds fail. Using two resolvers from independent operators makes this very unlikely. An operator who prefers different resolvers can edit `/etc/left4me/sandbox-resolv.conf`; the helper picks up the change on the next invocation.
|
|
||||||
- **Public DNS resolver privacy.** Cloudflare and Google see hostnames the scripts query. Acceptable for the workload (Steam Workshop, GitHub, etc. are public anyway); switch to Quad9 or self-hosted if this is a concern.
|
|
||||||
- **Future kernel/systemd that changes allow/deny precedence.** If a future systemd version evaluates `IPAddressDeny=` ahead of `IPAddressAllow=`, a unit with only `IPAddressDeny=` continues to work; the negative test on `IPAddressAllow=any` keeps the regression-safe configuration locked in. Re-test on each major systemd upgrade.
|
|
||||||
- **Scripts that legitimately need a private IP.** E.g., a self-hosted internal mirror at 10.x. Not a use case today; if it arises, expose specific IPs via a future `IPAddressAllow=10.x.y.z/32` for that one host (not blanket).
|
|
||||||
|
|
||||||
## Out Of Scope
|
|
||||||
|
|
||||||
- **Per-overlay UID isolation.** Cross-script-overlay write access via the shared `l4d2-sandbox` UID is still possible after a hypothetical sandbox bypass. Deferred from earlier discussions.
|
|
||||||
- **Egress allowlist by hostname / domain.** Would require a forward proxy (Squid, mitmproxy). Heavier than warranted for the trust model.
|
|
||||||
- **Dropping `AF_UNIX` from `RestrictAddressFamilies=`.** Tangential to IP-level egress; risks breaking NSS / syslog.
|
|
||||||
- **DNSSEC / DoH.** Threat model is egress targeting, not DNS hijacking.
|
|
||||||
- **Network-namespace isolation (`PrivateNetwork=` + custom netns + NAT).** Heavier than `IPAddressDeny=` for equivalent outcome.
|
|
||||||
|
|
||||||
## Implementation Boundaries
|
|
||||||
|
|
||||||
- **No app code change.** Helper-side only.
|
|
||||||
- **No new systemd units.** Same transient `left4me-script-{id}-{pid}.service` pattern.
|
|
||||||
- **No new apt deps.** `bpftool` was used during smoke testing but is not required at runtime.
|
|
||||||
- **One new deploy artifact.** `sandbox-resolv.conf` shipped under `deploy/files/etc/left4me/`.
|
|
||||||
|
|
@ -1,115 +0,0 @@
|
||||||
# Overlay File Tree Section Design
|
|
||||||
|
|
||||||
**Goal:** Add a "Files" section to the overlay detail page (`/overlays/<id>`) that renders a collapsible tree of the overlay's runtime directory at `${LEFT4ME_ROOT}/overlays/{overlay.id}/`, with lazy expansion of folders (one fetch per first-time expand) and click-to-download for individual files. Same access rule as the rest of the overlay detail page (admin or `overlay.user_id == g.user.id`). Read-only; no rename/delete/upload in v1.
|
|
||||||
|
|
||||||
**Approval status:** User-approved 2026-05-08 (visual companion brainstorm + plan-mode review). Implemented + deployed in the same session. The lazy-load originally targeted HTMX (vendored in `base.html`), but the post-deploy smoke test uncovered that `static/vendor/htmx.min.js` was a 33-byte placeholder: the real library was never vendored. Rather than vendoring full HTMX for one feature, the lazy-load was switched to plain JS using the same fetch + innerHTML pattern (~30 lines in `static/js/file-tree.js`). The route + partial contracts are unchanged.
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
Today, the overlay detail page shows the row's metadata (name, type, scope, path, last build status), a workshop-items table or script editor depending on `overlay.type`, and links to the build-job stream. It never shows what's actually inside the overlay directory on disk. To verify "did my script actually produce what I expected?" or "did the right VPKs land in `addons/`?" the user has to SSH into the host and `ls /var/lib/left4me/overlays/{id}/`.
|
|
||||||
|
|
||||||
Click-to-download is a secondary nice-to-have: workshop overlays' `addons/*.vpk` are absolute symlinks into the shared `${LEFT4ME_ROOT}/workshop_cache/`, and pulling a single VPK to a dev box otherwise means scping with the right path translation.
|
|
||||||
|
|
||||||
## Locked Decisions
|
|
||||||
|
|
||||||
1. **All overlay types show the section.** Script + workshop + system/managed. Consistency over a tighter scope; even workshop's predictable `addons/*.vpk` layout is worth confirming.
|
|
||||||
2. **Collapsible tree, lazy load on expand.** The tree can get large, so only the root level is rendered server-side at first paint. Each folder click fires `GET /overlays/<id>/files?path=<rel>` and inserts the response into that folder's `.file-tree-children` div via `innerHTML`. The no-JS path still shows the root level (the same partial is server-rendered); folders just won't expand.
|
|
||||||
3. **Single delegated JS handler.** `static/js/file-tree.js` listens for `click` on `document`, finds the closest `.file-tree-toggle` button, toggles `aria-expanded` + `hidden`, and on first expand fires a `fetch()` against the URL in the button's `data-files-url`. Subsequent toggles never re-fetch (`button.dataset.loaded` flag, set optimistically before the fetch to dedupe rapid clicks; cleared on error to allow retry).
|
|
||||||
4. **Single-file download in v1.** No bulk archive (e.g., "download whole overlay as `.tar.gz`"). Files are streamed via Flask `send_file(..., as_attachment=True)`. No size cap — VPKs are commonly 100–500 MB and that's the whole point.
|
|
||||||
5. **No auto-refresh.** The tree reflects what was on disk at page render. After a build, the user reloads the page. Polling/SSE would duplicate the existing live-log mechanism on the build-job page for negligible benefit.
|
|
||||||
6. **Same access rule as the rest of the page.** `g.user.admin or overlay.user_id is None or overlay.user_id == g.user.id`. GETs need no CSRF (`l4d2web/app.py:56`).
|
|
||||||
7. **`overlay.path` not `overlay.id`.** The runtime directory is reached via `overlay.path` (current creation flow guarantees `path == str(id)`, but legacy/seeded rows may differ). Path resolution happens through the existing `l4d2host.paths.overlay_path()` helper, which already validates the ref string and resolves+verifies it stays under `${LEFT4ME_ROOT}/overlays/`.
|
|
||||||
8. **Empty / unresolvable → empty state.** If the overlay's path is unresolvable (legacy absolute-path rows) or the directory doesn't exist (overlay never built), the section renders "No files yet — build this overlay to populate it." rather than crashing.
|
|
||||||
9. **500-entry cap per folder.** Folders with more than 500 children render the alphabetical-first 500 plus a `+ M more (truncated)` footer. Tunable at runtime via `l4d2web.services.overlay_files.DEFAULT_MAX_ENTRIES` (re-resolved per call so tests can monkeypatch).
|
|
||||||
10. **Hidden files shown.** No filtering of `.git`, `.DS_Store`, etc. Users want ground truth.
|
|
||||||
11. **One dedicated blueprint, `files_bp`.** Not folded into `overlay_routes.py` (which is exclusively POST mutations) or `page_routes.py` (top-level pages, not embedded fragments). `files_bp` owns both the tree fragment and the download endpoint.
|
|
||||||
|
|
||||||
## Architecture
|
|
||||||
|
|
||||||
```text
|
|
||||||
GET /overlays/<id> (page_routes.overlay_detail)
|
|
||||||
│
|
|
||||||
▼ computes (file_tree_root_entries, truncated_count) via
|
|
||||||
│ _root_file_tree(overlay) → safe_resolve_for_listing(overlay.path, "")
|
|
||||||
│ → list_directory(overlay_root, overlay_root)
|
|
||||||
│
|
|
||||||
▼ renders overlay_detail.html, which includes _overlay_file_tree.html
|
|
||||||
for the root level (or the empty-state <p>).
|
|
||||||
|
|
||||||
GET /overlays/<id>/files?path=<rel> (files_routes.overlay_files_fragment)
|
|
||||||
│
|
|
||||||
▼ auth gate (admin or owner)
|
|
||||||
│
|
|
||||||
▼ safe_resolve_for_listing(overlay.path, rel) → Path under overlay_root
|
|
||||||
│
|
|
||||||
▼ list_directory(target, overlay_root) → entries[], truncated_count
|
|
||||||
│
|
|
||||||
▼ renders _overlay_file_tree.html (partial only — no base.html)
|
|
||||||
|
|
||||||
GET /overlays/<id>/files/download?path=<rel> (files_routes.overlay_files_download)
|
|
||||||
│
|
|
||||||
▼ auth gate
|
|
||||||
│
|
|
||||||
▼ safe_resolve_for_download(overlay.path, rel) → real Path under LEFT4ME_ROOT
|
|
||||||
│ (follows symlinks; allows targets anywhere in LEFT4ME_ROOT, e.g. workshop_cache)
|
|
||||||
│
|
|
||||||
▼ Flask send_file(real, as_attachment=True, download_name=basename(real))
|
|
||||||
```
|
|
||||||
|
|
||||||
### File tree fragment shape
|
|
||||||
|
|
||||||
`_overlay_file_tree.html` produces a `<ul class="file-tree">` containing one `_overlay_file_node.html` row per entry plus an optional truncated-footer `<li>`. A folder row is a `<button class="file-tree-toggle" data-files-url="…">` followed by an empty `<div class="file-tree-children" hidden>` that becomes the fetch target. A file row is an `<a href=".../files/download?path=…">` (or a plain `<span>` for broken symlinks) plus optional badges (`link`, `broken link`) and the resolved size.
|
|
||||||
|
|
||||||
Nesting after expand:
|
|
||||||
|
|
||||||
```html
|
|
||||||
<li class="file-tree-row file-tree-row-dir">
|
|
||||||
<button class="file-tree-toggle" aria-expanded="true" data-files-url="…">…</button>
|
|
||||||
<div class="file-tree-children">
|
|
||||||
<ul class="file-tree" role="group">…</ul> <!-- inserted by file-tree.js -->
|
|
||||||
</div>
|
|
||||||
</li>
|
|
||||||
```
|
|
||||||
|
|
||||||
## Path Safety
|
|
||||||
|
|
||||||
Two resolvers in `l4d2web/services/overlay_files.py`:
|
|
||||||
|
|
||||||
- `safe_resolve_for_listing(overlay.path, sub_path)` — resolves `overlay_root / sub_path`, applies `Path.resolve(strict=False)`, requires the result to be the overlay root or a descendant. Used by the tree-fragment route. **Refuses to recurse through symlinks that leave the overlay root**, including symlinks pointing into `workshop_cache/` — listing has no need to follow them, since workshop addons are leaf files, not directories we'd descend into.
|
|
||||||
- `safe_resolve_for_download(overlay.path, sub_path)` — resolves the candidate path, applies `os.path.realpath()`, requires the result to be under `${LEFT4ME_ROOT}` (anywhere — overlay dir, `workshop_cache/`, future siblings). This is the relaxed gate that lets workshop addons stream from the shared cache while still blocking absolute symlinks to `/etc/passwd` planted by a malicious script overlay.
|
|
||||||
|
|
||||||
Both resolvers re-use `l4d2host.paths.overlay_path()` (which itself calls `validate_overlay_ref`) for the overlay-root resolution, and `l4d2web.services.security.validate_overlay_ref` for the sub-path component (rejects empty / `.` / `..` / absolute / whitespace / backslash). Empty `sub_path` is valid for listing (means "the overlay root") and invalid for download.
|
|
||||||
|
|
||||||
Listing: `target.is_dir()` check after resolution; non-directory → 404.
|
|
||||||
|
|
||||||
Download: `real.exists()` check (404), `real.is_dir()` rejection (400 — "not a file").
|
|
||||||
|
|
||||||
## Symlink Behaviour
|
|
||||||
|
|
||||||
`list_directory` uses `os.scandir()` with explicit `follow_symlinks` flags:
|
|
||||||
|
|
||||||
- `is_symlink = entry.is_symlink()`
|
|
||||||
- `kind`: `entry.is_dir(follow_symlinks=True)` inside a try block. Raised `OSError` → broken symlink, treated as `kind="file"` with `broken=True` and no `<a>` download link.
|
|
||||||
- `size`: `entry.stat(follow_symlinks=True).st_size` for files (resolved target's size — what users care about for VPKs); `None` for dirs and broken symlinks.
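A rough sketch of the per-entry logic above, under stated assumptions: `list_entries` and the dict keys are hypothetical, and truncation and human-size formatting are left out. Per the Python documentation, `DirEntry.is_dir()` swallows `FileNotFoundError`, so this sketch detects dangling links from `entry.stat()` instead.

```python
import os
from pathlib import Path


def list_entries(directory: Path) -> list[dict]:
    rows = []
    with os.scandir(directory) as it:
        for entry in it:
            is_symlink = entry.is_symlink()
            broken = False
            size = None
            try:
                # stat() follows the link and raises OSError when the target is gone.
                st = entry.stat(follow_symlinks=True)
                is_dir = entry.is_dir(follow_symlinks=True)
            except OSError:
                is_dir, broken = False, True
            else:
                if not is_dir:
                    size = st.st_size  # size of the resolved target
            rows.append(
                {
                    "name": entry.name,
                    "kind": "dir" if is_dir else "file",
                    "is_symlink": is_symlink,
                    "broken": broken,
                    "size": size,
                }
            )
    # Directories first, then alphabetical within each group.
    rows.sort(key=lambda r: (r["kind"] != "dir", r["name"]))
    return rows
```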
|
|
||||||
|
|
||||||
Symlinked directories pointing inside the overlay root are rendered as folders and remain expandable; the listing-time safety check rejects expansion if the symlink resolves outside the overlay root.
|
|
||||||
|
|
||||||
Concurrent build vs listing race: a build mid-symlink-rewrite can yield a transient broken-symlink view. Acceptable — page is a snapshot; the visible "broken link" badge tells the user to refresh.
|
|
||||||
|
|
||||||
## Test Strategy
|
|
||||||
|
|
||||||
Two test modules, both following existing fixture patterns (`tests/test_script_overlay_routes.py` style — `monkeypatch.setenv("LEFT4ME_ROOT", str(tmp_path))`, app fixture with `TESTING=True`).
|
|
||||||
|
|
||||||
- `tests/test_overlay_files.py` — pure-helper unit tests (Flask-free): listing-resolver happy/sad paths (root, sub-path, `..`, absolute, empty component, symlink-out-of-overlay), download-resolver happy/sad paths (regular file, workshop-cache symlink, outside-LEFT4ME_ROOT symlink, traversal, absolute, empty), `list_directory` behaviour (empty, dir-first sort, kind detection, rel paths, symlink markers, broken-symlink markers, truncation cap, human-size formatting).
|
|
||||||
- `tests/test_overlay_files_routes.py` — HTTP integration tests: tree-fragment 200 / 400 / 403 / 404 across the same axes; download 200 / 400 / 403 / 404 + content-disposition + byte-exact body for both regular files and workshop-cache symlinks; admin-can-view-foreign overlay; truncation-via-route (monkeypatching `DEFAULT_MAX_ENTRIES`); broken-symlink rendering omits the `<a>` download link; the page-level `overlay_detail` integration shows the section with entries when populated and the empty state when the directory is missing.
|
|
||||||
|
|
||||||
39 tests total. The full web suite (`pytest l4d2web/tests/ -q`) must remain green.
|
|
||||||
|
|
||||||
## Out of Scope
|
|
||||||
|
|
||||||
- Bulk download (e.g., "download overlay as tar.gz").
|
|
||||||
- Inline file preview (text peek, image thumbnail).
|
|
||||||
- File deletion / rename / upload from the UI.
|
|
||||||
- Auto-refresh while a build is active.
|
|
||||||
- Filtering hidden files or applying a `.gitignore`-style rule.
|
|
||||||
- Reusable file-tree component for things outside overlays.
|
|
||||||
|
|
@ -1,55 +0,0 @@
|
||||||
# Server ID as Host Identifier Design
|
|
||||||
|
|
||||||
**Goal:** Decouple the user-facing server label from the host-side identifier. The systemd unit name and on-disk paths become functions of `Server.id`; `Server.name` becomes a free-form display label.
|
|
||||||
|
|
||||||
**Approval status:** User-approved 2026-05-08.
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
`Server.name` was doing two unrelated jobs. It was the human label rendered in the UI *and* the literal string fed to `l4d2ctl`, which became the systemd unit instance (`left4me-server@<name>.service`) and the directories under `/var/lib/left4me/{instances,runtime}/<name>/`. To stay safe as a unit-template parameter and a path component, the name was forced through `[a-z0-9][a-z0-9_-]{0,63}` and held globally unique. The cost was a UI that demanded machine-friendly slugs, no rename support, and an awkward divergence from overlays — which already separate identity (`id`) from label (`name`).
|
|
||||||
|
|
||||||
This change moves servers onto the same model as overlays. Web URLs already key on `id` (`/servers/<int:server_id>`), so the change is mostly local: pick an id-derived host identifier, pass that everywhere `server.name` was passed, and relax the `name` constraints.
|
|
||||||
|
|
||||||
## Locked Decisions
|
|
||||||
|
|
||||||
1. **Host-side identifier = plain numeric id.** `left4me-server@42.service`, `/var/lib/left4me/instances/42/`, `/var/lib/left4me/runtime/42/`. The host CLI's `validate_instance_name` regex (`[a-z0-9][a-z0-9_-]{0,63}`), the systemctl helper's argument check (`[A-Za-z0-9_.-]`), and the unit template (`%i`) all already accept digit-only strings — no host-side change.
|
|
||||||
2. **Name = free-form display label, unique per user, required (≤128 chars).** Whitespace is stripped on save. Two users can both have a server named "Practice"; one user cannot.
|
|
||||||
3. **No data preservation.** Dev-only deploy. Existing servers on the test host are not migrated; their old `left4me-server@<old-name>.service` units and `<old-name>/` directories become orphans and are cleaned up manually.
|
|
||||||
4. **Single source of truth for the id-to-host-name rule.** A one-line helper (`server_unit_name(server_id) -> str(server_id)`) lives in `l4d2web/services/server_identity.py`. Every callsite that used to pass `server.name` to `l4d2ctl` or `journalctl` calls this. Future format tweaks (e.g. `srv-{id}`) are a one-line edit.
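Decision 4 might look roughly like this. `server_unit_name` is the helper named above; `systemd_unit` is a hypothetical wrapper added only to show how the id flows into the unit name.

```python
def server_unit_name(server_id: int) -> str:
    """Map a Server.id to the host-side instance identifier."""
    return str(server_id)


def systemd_unit(server_id: int) -> str:
    # Hypothetical helper: shows the id-derived name inside the unit template.
    return f"left4me-server@{server_unit_name(server_id)}.service"
```

Because every caller goes through `server_unit_name`, a later switch to something like `srv-{id}` would only touch the first function.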
|
|
||||||
|
|
||||||
## Schema
|
|
||||||
|
|
||||||
`servers` (Alembic 0006):
|
|
||||||
- Drop the (unnamed) global `UNIQUE (name)` from the original 0001 schema.
|
|
||||||
- Add `UNIQUE (user_id, name)` as `uq_servers_user_name`.
|
|
||||||
- Column stays `name VARCHAR(128) NOT NULL`.
|
|
||||||
|
|
||||||
The migration uses `batch_alter_table(recreate="always")` with a `naming_convention` so the originally-anonymous unique can be referenced as `uq_servers_name` for `drop_constraint`.
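The effect of the new constraint can be checked without Alembic. The sketch below uses an in-memory SQLite table that only approximates the `servers` schema (columns other than `user_id` and `name` are omitted), and `try_insert` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE servers (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        name VARCHAR(128) NOT NULL,
        CONSTRAINT uq_servers_user_name UNIQUE (user_id, name)
    )
    """
)


def try_insert(user_id: int, name: str) -> str:
    """Insert a row; return 'ok' or the SQLite error text."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO servers (user_id, name) VALUES (?, ?)",
                (user_id, name),
            )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)
```

Two different users can insert the same name, while a repeat insert for the same user returns the constraint error text.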
|
|
||||||
|
|
||||||
## Code touchpoints
|
|
||||||
|
|
||||||
- `l4d2web/services/server_identity.py` (new)
|
|
||||||
- `l4d2web/models.py` — drop `unique=True` on `Server.name`; add `__table_args__` with the per-user unique.
|
|
||||||
- `l4d2web/alembic/versions/0006_server_name_per_user.py` (new)
|
|
||||||
- `l4d2web/services/l4d2_facade.py` — five `l4d2ctl` invocations switched to `server_unit_name(server.id)`. Parameter renamed to `unit_name` on `server_status` / `stream_server_logs`.
|
|
||||||
- `l4d2web/services/job_worker.py` — status refresh uses `server_unit_name(server.id)`. The `server_name` log-label variable still holds `server.name` (the display label); that's correct now and shows up in job logs as e.g. "starting initialize for My Practice".
|
|
||||||
- `l4d2web/routes/log_routes.py` — SSE log stream feeds `server_unit_name(server.id)` to `journalctl`.
|
|
||||||
- `l4d2web/routes/server_routes.py` — replace `validate_instance_name` with `_validate_display_name` (strip + non-empty + length ≤128). Broaden the `IntegrityError` handler to disambiguate `servers.name` (409 "name already in use") from `servers.port` (409 "port already in use") via the underlying SQLite error string.
|
|
||||||
- `l4d2web/services/security.py` — `validate_instance_name` deleted (no remaining callers).
|
|
||||||
- `l4d2web/templates/servers.html` — name input gains `maxlength="128"`.
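The validation and error-disambiguation rules from the `server_routes.py` item could be sketched as follows. The function names, the error messages for invalid input, and the fallback message are assumptions; only the strip / non-empty / length rule and the two 409 messages come from the text above.

```python
MAX_NAME_LENGTH = 128


def validate_display_name(raw: str) -> str:
    """Strip whitespace, then require a non-empty label of at most 128 chars."""
    name = raw.strip()
    if not name:
        raise ValueError("name is required")
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"name must be at most {MAX_NAME_LENGTH} characters")
    return name


def conflict_message(sqlite_error: str) -> str:
    """Pick the 409 message from the underlying SQLite error string."""
    if "servers.name" in sqlite_error:
        return "name already in use"
    if "servers.port" in sqlite_error:
        return "port already in use"
    return "conflict"  # assumed fallback, not specified in the design
```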
|
|
||||||
|
|
||||||
## Failure modes
|
|
||||||
|
|
||||||
- **Name with shell metacharacters reaches a host command.** Cannot happen — the host call now receives only `str(server.id)` (digits). The display name is never passed through `l4d2ctl`.
|
|
||||||
- **Two servers under the same user with the same name.** Blocked at the DB layer (`uq_servers_user_name`); surfaced as a 409 "name already in use" with no row written.
|
|
||||||
- **Migration on a DB with existing servers.** `batch_alter_table(recreate="always")` rebuilds the table preserving rows; the new per-user constraint is satisfied trivially since the old global constraint already enforced strict uniqueness.
|
|
||||||
|
|
||||||
## Verification
|
|
||||||
|
|
||||||
1. `python -m pytest l4d2web l4d2host deploy` from the repo root — green.
|
|
||||||
2. Stepwise migration on a fresh sqlite (upgrade to 0005, insert two users + a server, upgrade to 0006): row preserved, second user can take the same name, same user cannot (UNIQUE constraint failed: servers.user_id, servers.name).
|
|
||||||
3. Post-deploy on the test host: create a server named `"My Practice"` (with the space), confirm the systemd unit is `left4me-server@<id>.service`, confirm `/var/lib/left4me/runtime/<id>/merged` is mounted on start, confirm log streaming still works.
|
|
||||||
|
|
||||||
## Operator note
|
|
||||||
|
|
||||||
After deploy, on the test host: stop and remove any pre-existing `left4me-server@<old-name>.service` units and their `/var/lib/left4me/{instances,runtime}/<old-name>/` directories. The web app no longer references them.
|
|
||||||
|
|
@ -1,220 +0,0 @@
|
||||||
# Files overlay (user-managed file content)
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
In the prior `ckn-bw` setup, per-server config-style files (`admins.txt`, `motd.txt`, mapcycle, etc.) lived under `bundles/left4dead2/files/scripts/overlays/standard`. `left4me` has no equivalent: today an overlay's contents come from either Steam Workshop (`workshop` type) or a user-authored bash build script (`script` type). Both have an external source-of-truth, so neither is the right home for files the user owns directly. The user wants both online editing of text files *and* arbitrary file upload, and we unify them into a single mechanism.
|
|
||||||
|
|
||||||
## Goal
|
|
||||||
|
|
||||||
Add a third overlay type `files` whose source-of-truth IS the overlay directory itself. Provide a web UI to:
|
|
||||||
|
|
||||||
- **Upload** any file or whole folder by dragging it onto a folder row in the tree (drag from the OS).
|
|
||||||
- **Move** files and folders by dragging rows inside the tree (internal drag).
|
|
||||||
- **Create / edit / rename / replace** files through a single modal editor, opened from row buttons. Modal adapts to text or binary content.
|
|
||||||
- **Download** files (or zip an entire folder).
|
|
||||||
- **Delete** files and empty folders.
|
|
||||||
- **Create new folders** explicitly (including nested intermediates in one shot).
|
|
||||||
|
|
||||||
Reuse the existing overlayfs / spec / mount / `expose_server_cfg` pipeline unchanged: a `files` overlay is a normal overlay attached to blueprints.
|
|
||||||
|
|
||||||
## Non-goals (v1)
|
|
||||||
|
|
||||||
- Per-server overrides (servers still bind to a blueprint without per-instance file changes).
|
|
||||||
- Concurrency policing when an overlay is in use by a running server. Overlayfs technically calls lower-layer mutation undefined behavior, but L4D2 reads most config at boot, so "edits visible on next start" is acceptable.
|
|
||||||
- Versioning / undo / history.
|
|
||||||
- Syntax highlighting (CodeMirror-style). Plain `<textarea>`; can add later.
|
|
||||||
- "Save As" copy. The filename input *is* Save-As.
|
|
||||||
- Recursive directory delete from the UI.
|
|
||||||
- Multi-file drop into the binary "replace" zone (single file only).
|
|
||||||
|
|
||||||
## Approach
|
|
||||||
|
|
||||||
### Data model
|
|
||||||
|
|
||||||
`Overlay.type` accepts a new value: `"files"` (in addition to `"workshop"` and `"script"`). No schema change needed — `Overlay.type` is already `String(16)`. The `script` column stays empty for files overlays; `last_build_status` is set to `"ok"` on creation and not otherwise managed. Privacy follows the existing `user_id` rules unchanged.
|
|
||||||
|
|
||||||
`BlueprintOverlay` and the `expose_server_cfg` checkbox keep working as-is: a `files` overlay containing a `server.cfg` is exposed via the same alias mechanism the 2026-05-08 plan introduced.
|
|
||||||
|
|
||||||
### Filesystem layout
|
|
||||||
|
|
||||||
A files overlay lives at `${LEFT4ME_ROOT}/overlays/{overlay.path}/` like every other overlay. Example contents:
|
|
||||||
|
|
||||||
```
overlays/{id}/
  left4dead2/
    cfg/
      server.cfg
      motd.txt
      mapcycle.txt
    addons/
      sourcemod/configs/admins_simple.ini
      custom_map.vpk
```
|
|
||||||
|
|
||||||
The `InstanceSpec` / `OverlayRef` shape already supports this. The spec builder in `l4d2web/services/l4d2_facade.py` doesn't need to learn about overlay types, only to keep emitting `path` (and `alias` when `expose_server_cfg` is set).
|
|
||||||
|
|
||||||
### Builder registration
|
|
||||||
|
|
||||||
`l4d2web/services/overlay_builders.py::BUILDERS` gains a `"files"` entry whose `build()` is a no-op that ensures `_overlay_root(overlay)` exists. The route layer also short-circuits: there is no "rebuild" concept for a files overlay — every save / upload / move / mkdir / delete is immediately authoritative.
|
|
||||||
|
|
||||||
### Safety helpers
|
|
||||||
|
|
||||||
`l4d2web/services/overlay_files.py` already has `safe_resolve_for_listing` and `safe_resolve_for_download` (anchor-and-resolve, refuse `..` traversal and symlink-target escapes). Add three siblings using the same pattern:
|
|
||||||
|
|
||||||
- `safe_resolve_for_write(overlay_path_value, sub_path) -> Path` — destination path. Refuses empty `sub_path`, refuses any escape, refuses to overwrite an existing symlink, refuses a path whose parent resolves to a non-directory.
|
|
||||||
- `safe_resolve_for_delete(overlay_path_value, sub_path) -> Path` — same root-escape rules; allows deleting files and empty directories. Non-empty directory delete returns an error.
|
|
||||||
- `safe_resolve_for_move(overlay_path_value, src, dst) -> tuple[Path, Path]` — both endpoints inside the overlay root. Refuses `dst` inside `src` (cycle). Refuses if `src` doesn't exist. Refuses if `dst` parent is missing or not a directory. Refuses overwriting a symlink at `dst`.
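As an illustration of the move rules, here is a simplified sketch. `check_move` and `_anchor` are hypothetical names, and the real helper presumably layers the existing `validate_overlay_ref` checks on top; the final path component is deliberately not resolved so that a symlink at `src` or `dst` is seen as a symlink.

```python
import os
from pathlib import Path


def _anchor(root: Path, rel: str) -> Path:
    # Resolve the parent directory but keep the final component literal.
    name = Path(rel).name
    if name in ("", ".", ".."):
        raise ValueError(f"invalid path: {rel!r}")
    candidate = (root / rel).parent.resolve() / name
    if root not in candidate.parents:
        raise ValueError(f"escapes overlay root: {rel!r}")
    return candidate


def check_move(overlay_root: Path, src: str, dst: str) -> tuple[Path, Path]:
    root = overlay_root.resolve()
    src_p = _anchor(root, src)
    dst_p = _anchor(root, dst)
    if not os.path.lexists(src_p):
        raise ValueError("source does not exist")
    if dst_p == src_p or src_p in dst_p.parents:
        raise ValueError("destination is the source or lies inside it")
    if not dst_p.parent.is_dir():
        raise ValueError("destination parent is missing or not a directory")
    if dst_p.is_symlink():
        raise ValueError("refusing to overwrite a symlink")
    return src_p, dst_p
```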
|
|
||||||
|
|
||||||
Plus a small predicate:
|
|
||||||
|
|
||||||
- `is_editable(path: Path) -> bool` — true iff `path` is a regular file (not symlink), size ≤ 1 MiB, and first 8 KiB decodes as strict UTF-8. Surfaced via `_entry_dict` in listings as `editable: bool`.
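One way the predicate could be written is sketched below. The constant names are made up, and the use of an incremental decoder is an assumption on top of the stated rule: it avoids rejecting a file whose 8 KiB window happens to cut a multi-byte character in half.

```python
import codecs
from pathlib import Path

MAX_EDITABLE_BYTES = 1024 * 1024  # 1 MiB cap
SNIFF_BYTES = 8 * 1024            # only the first 8 KiB is inspected


def is_editable(path: Path) -> bool:
    # Symlinks and anything that is not a regular file are never editable.
    if path.is_symlink() or not path.is_file():
        return False
    if path.stat().st_size > MAX_EDITABLE_BYTES:
        return False
    with path.open("rb") as fh:
        head = fh.read(SNIFF_BYTES)
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        # final=False tolerates a character split at the window boundary;
        # a short file is decoded as a complete document.
        decoder.decode(head, final=len(head) < SNIFF_BYTES)
    except UnicodeDecodeError:
        return False
    return True
```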
|
|
||||||
|
|
||||||
### UI design
|
|
||||||
|
|
||||||
The file-manager lives inside the existing overlay detail page, only when `overlay.type == "files"`. Layout follows the existing `<ul class="file-tree">` pattern, extended as below.
|
|
||||||
|
|
||||||
#### Tree row buttons (hover-reveal, CSS `:hover`)
|
|
||||||
|
|
||||||
| Row | Buttons (left-to-right) | Click on row body | Draggable |
|---|---|---|---|
| Folder (incl. overlay root) | `+ new file` · `+ new folder` · `⬇ zip` · `✕` | toggle expand/collapse | yes (move subtree) |
| File (any) | `edit` · `⬇` · `✕` | nothing | yes (move file) |
|
|
||||||
|
|
||||||
Files always show `edit` regardless of editability — the modal adapts. Touch devices fall back to always-visible buttons via a `(hover: none)` media query.
|
|
||||||
|
|
||||||
#### Drag-and-drop on tree rows — single gesture, source distinguishes
|
|
||||||
|
|
||||||
| Drag source | Action | Visual on hovered row | Endpoint |
|---|---|---|---|
| OS file/folder (`dataTransfer.files` / `webkitGetAsEntry`) | upload | green outline + `↑ Release to upload N items here` | `POST /overlays/{id}/files/upload` |
| Tree row (file or folder) | move | green outline + `↦ Move {name} here` | `POST /overlays/{id}/files/move` |
|
|
||||||
|
|
||||||
Refused drops (UI rejects without server round-trip): drop on self, drop on own ancestor (cycle), drop where parent doesn't exist. Conflict at destination → server returns 409 → overwrite/keep-both modal.
|
|
||||||
|
|
||||||
#### Upload progress panel
|
|
||||||
|
|
||||||
Each dropped item becomes one `POST /files/upload` request (one file part, `target_path` set to the dropped row's path, `webkitRelativePath` preserved). A floating "Uploads" panel docks to the bottom-right of the page while there is at least one in-flight or queued upload, and auto-collapses when the queue is empty.
|
|
||||||
|
|
||||||
- **Per-file rows** in the panel: filename, target path (subtle), progress bar driven by `XMLHttpRequest.upload.onprogress`, queue position, per-file cancel button.
|
|
||||||
- **Concurrency:** at most 3 uploads in flight; remainder queue. Drop-while-uploading appends to the queue with no special UI.
|
|
||||||
- **Cancel mid-flight:** aborts the XHR; server cleans up any partial file in a `finally` block.
|
|
||||||
- **Conflicts:** a 409 on an individual file pauses just that upload (panel row shows "conflict — overwrite / keep both") and opens the existing overwrite/keep-both modal scoped to that one path. The rest of the queue keeps running.
|
|
||||||
- **Errors:** per-file error states (413 too large, 415 bad content, 422 path validation, 5xx) stay sticky in the panel until the user dismisses them. The panel has a "clear done" toggle.
|
|
||||||
- **Tree refresh:** when an upload finishes, the affected parent folder's listing partial is re-fetched (`hx-get` on the folder row). Debounced (50 ms) so many siblings finishing in one tick coalesce into one fetch.
|
|
||||||
|
|
||||||
#### Editor modal — single `<dialog>` with two flavors
|
|
||||||
|
|
||||||
The editor modal opens via the row's `edit` button or the folder's `+ new file` button.
|
|
||||||
|
|
||||||
**Common chrome (both flavors):**
|
|
||||||
- **Title** = full path (e.g. `left4dead2/cfg/motd.txt`). For new files: `addons/sourcemod/configs/…new file`.
|
|
||||||
- **Filename input** — single line, slashes rejected. Diverging from the original shows an inline `↻ Save will rename foo.txt → bar.txt` hint.
|
|
||||||
- **Footer** — `Delete` on the left (only for existing files), then `⬇ Download`, `Cancel`, `Save`/`Create` on the right.
|
|
||||||
|
|
||||||
**Text flavor** (file is editable, or new file):
|
|
||||||
- Content `<textarea>`, 1 MiB cap on save, UTF-8 only.
|
|
||||||
- Footer hint: `UTF-8 · {n} bytes` + `Ctrl+S to save`.
|
|
||||||
|
|
||||||
**Binary flavor** (existing file is not editable):
|
|
||||||
- Replaces the textarea with a "Replace file" panel: a label noting `⛌ Inline editing not available · {size} · binary content`, plus a drop zone (`↑ Drop a file here to replace`) with a `browse` link as fallback. Single file only.
|
|
||||||
- Once a replacement is queued, the drop zone shows `↻ {newName} · {size} · queued` with an `✕` to clear the queue.
|
|
||||||
|
|
||||||
**Save semantics** (atomic per call; rename + content change happen in one server operation):
|
|
||||||
|
|
||||||
| Mode | Filename unchanged | Filename changed |
|---|---|---|
| Text | write content | rename + write content |
| Binary, no replacement queued | (Save disabled) | rename only |
| Binary, replacement queued | overwrite content | rename + overwrite content |
|
|
||||||
|
|
||||||
Rename target collision → 409 → overwrite/keep-both modal (same modal as upload conflicts).
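The save-semantics matrix reduces to two independent questions (did the filename change, and is there content to write), which a small decision function can capture. `plan_save` and its string labels are illustrative only and are not part of the described routes.

```python
def plan_save(flavor: str, filename_changed: bool, replacement_queued: bool = False) -> list[str]:
    """Return the operations one Save performs; an empty list means Save is disabled."""
    if flavor == "text":
        content_op = "write content"
    elif flavor == "binary":
        content_op = "overwrite content" if replacement_queued else None
    else:
        raise ValueError(f"unknown flavor: {flavor!r}")

    actions = []
    if filename_changed:
        actions.append("rename")
    if content_op is not None:
        actions.append(content_op)
    return actions
```

Whatever the combination, the design calls for the resulting operations to be applied in a single server call so that rename and content change succeed or fail together.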
|
|
||||||
|
|
||||||
#### `+ new folder` dialog
|
|
||||||
|
|
||||||
A small dedicated `<dialog>` separate from the editor. Single text input for the folder name. Slashes allowed → creates intermediate dirs (`mkdir(parents=True, exist_ok=False)`).
|
|
||||||
|
|
||||||
#### `+ new file` flow
|
|
||||||
|
|
||||||
Reuses the editor modal in text flavor with empty content; the filename input is empty and focused, the title shows the source folder + `…new file`.
|
|
||||||
|
|
||||||
### Web routes
|
|
||||||
|
|
||||||
In `l4d2web/routes/files_routes.py` (alongside the existing `overlay_files_fragment` and `download` endpoints):
|
|
||||||
|
|
||||||
| Method | Path | Body | Purpose |
|---|---|---|---|
| GET | `/overlays/{id}/files/content` | `?path=` | Returns `{path, content}` for an editable file. 415 if not editable. |
| POST | `/overlays/{id}/files/save` | JSON `{path, content, new_path?}` | Text-mode save. Optional `new_path` performs rename atomically with the write. |
| POST | `/overlays/{id}/files/replace` | multipart `path`, `file`, optional `new_path` | Binary-mode replace. Optional `new_path` performs rename atomically. |
| POST | `/overlays/{id}/files/upload` | multipart `target_path`, single `file` part (carrying `webkitRelativePath`) | OS-drag upload, one file per request. Creates intermediate dirs via `mkdir(parents=True)`. Cleans up partial writes on cancel via `finally`. 200 on success, 409 on conflict, 413/415/422 on validation failure. |
| POST | `/overlays/{id}/files/move` | JSON `{src, dst}` | Internal drag move (and plain rename when same parent). |
| POST | `/overlays/{id}/files/mkdir` | JSON `{path}` | Create empty folder; slashes in `path` produce nested intermediates. |
| POST | `/overlays/{id}/files/delete` | form `path` | Delete file or empty folder. |
| GET | `/overlays/{id}/files/download_zip` | `?path=` | Stream a zip of the folder's contents. |
|
|
||||||
|
|
||||||
Existing `GET /overlays/{id}/files?path=...` and `GET /overlays/{id}/files/download?path=...` stay as-is. The listing endpoint additionally returns `editable` per file row.
|
|
||||||
|
|
||||||
All new routes:
|
|
||||||
- 404 when `overlay.type != "files"`.
|
|
||||||
- Require `overlay.user_id == current_user.id` (or admin).
|
|
||||||
- Use the new safe-resolve helpers.
|
|
||||||
- CSRF via the existing `csrf.js` injection (multipart endpoints included).
|
|
||||||
|
|
||||||
### Tech stack
|
|
||||||
|
|
||||||
Stay inside the project's established stack — Flask + Jinja2 + HTMX + tiny vanilla JS in `static/js/` + custom CSS with tokens, no build step:
|
|
||||||
|
|
||||||
- **Templates:** Jinja2 partials, returned as HTMX swaps where appropriate (subtree refresh after upload/move/mkdir/delete).
|
|
||||||
- **Modals:** native `<dialog>` with the existing `data-modal-open` / `data-modal-close` event-delegated handlers.
|
|
||||||
- **JS:** vanilla. Extend `static/js/file-tree.js` (or add a sibling `files-overlay.js`) covering: `dragstart` on rows, `dragover` highlight + source-discrimination (`dataTransfer.types.includes("Files")` vs internal MIME), `webkitGetAsEntry()` walk for whole-folder OS drops, editor modal open/save (Ctrl+S, fetch POST), binary replace-zone drop handler, conflict-modal flow, new-folder dialog, upload queue + floating progress panel (XHR per file, concurrency 3, abort on cancel, debounced tree-refresh on completion).
|
|
||||||
- **CSS:** extend `tokens.css` and `components.css` with file-manager-specific rules — drop-target outline, hover-reveal action column, editor modal sizing, replace-zone styling.
|
|
||||||
|
|
||||||
No external libraries (no Dropzone, no jsTree, no CodeMirror) — adding one would be a meaningful departure from the project's "no build step, vendored libs only" posture.
|
|
||||||
|
|
||||||
### Creation flow for new overlays
|
|
||||||
|
|
||||||
The "create overlay" UI gains a third radio option: `Files`. Selecting it skips the type-specific fields (no Steam Workshop selector, no script editor) and creates an empty `Overlay` row with `type="files"`, `last_build_status="ok"`, and an empty directory.
|
|
||||||
|
|
||||||
### Host-side
|
|
||||||
|
|
||||||
No changes. The mount helper, instance lifecycle, and srcds startup don't care what produced the contents of an overlay directory.
|
|
||||||
|
|
||||||
### Migration / Alembic
|
|
||||||
|
|
||||||
None. `Overlay.type` already stores arbitrary strings; introducing a new value is data-only.
|
|
||||||
|
|
||||||
## Critical files
|
|
||||||
|
|
||||||
| Layer | File | Change |
|---|---|---|
| Models | `l4d2web/models.py` | None (Overlay.type already String) |
| Builders | `l4d2web/services/overlay_builders.py` | Register `FilesBuilder` (no-op `build`) |
| Safety | `l4d2web/services/overlay_files.py` | Add `safe_resolve_for_write`, `safe_resolve_for_delete`, `safe_resolve_for_move`; add `is_editable` and surface it via `_entry_dict` |
| Routes | `l4d2web/routes/files_routes.py` | Add `content`, `save`, `replace`, `upload`, `move`, `mkdir`, `delete`, `download_zip` endpoints |
| Templates | `l4d2web/templates/overlay_detail.html`, `l4d2web/templates/_overlay_file_tree.html` | Hover-reveal action buttons; `data-target-path` on folder rows; `draggable="true"` on file/folder rows; editor modal `<dialog>` with both flavors; new-folder modal `<dialog>`; conflict modal `<dialog>` |
| Static JS | `l4d2web/static/js/file-tree.js` (extend) or new `files-overlay.js` | Drag-drop wiring, modal save, binary replace, mkdir, conflict flow, upload queue + panel |
| Static CSS | `l4d2web/static/css/components.css` | Drop-target outline, hover action column, editor modal sizing, replace-zone, upload panel |
| Create form | overlay creation template + route | Add `files` option to the type radio |
| Spec / facade | `l4d2web/services/l4d2_facade.py` | None — already type-agnostic |
| Host spec | `l4d2host/spec.py`, `l4d2host/instances.py` | None |
| Tests | adjacent to each touched module | safe-resolve refusals; `is_editable` heuristic; CRUD round-trip; ownership; non-files-type 404s; multipart with `webkitRelativePath`; move refuses cycles; conflict (409); zip stream; mkdir parents |
|
|
||||||
|
|
||||||
## Verification
|
|
||||||
|
|
||||||
1. **Safety unit tests** — `safe_resolve_for_write`, `_for_delete`, `_for_move` reject `..` traversal, absolute paths, symlink-target escapes, attempts to overwrite a symlink, non-empty-dir delete, and `dst` inside `src`.
|
|
||||||
2. **Editability heuristic** — `is_editable` returns false for files > 1 MiB, symlinks, files with non-UTF-8 bytes in their first 8 KiB.
|
|
||||||
3. **Editor round-trip (text)** — from a folder row, "+ new file" → modal → save creates `left4dead2/cfg/admins.txt`; row appears with `edit` button; edit; rename via filename input; delete.
|
|
||||||
4. **Editor round-trip (binary)** — upload a `.vpk`, click `edit`, queue a replacement file via drop, change filename, Save → rename + replace happen atomically.
|
|
||||||
5. **Upload single file** — drag a file from the OS onto `left4dead2/cfg/`; appears with size and download link.
|
|
||||||
6. **Upload whole folder** — drag `addons/sourcemod/` from the OS onto the overlay root; nested structure preserved; intermediate directories auto-created.
|
|
||||||
7. **Conflict on upload** — drop a file with a colliding name; overwrite/keep-both modal; both choices behave correctly.
|
|
||||||
8. **Move within tree** — drag `motd.txt` onto `addons/`; file moves; tree refreshes.
|
|
||||||
9. **Move refusals** — drag a folder onto itself or a descendant; UI rejects without server round-trip.
|
|
||||||
10. **mkdir** — `+ new folder` with name `sourcemod/configs` creates both intermediates; collision returns 409.
|
|
||||||
11. **Zip download** — `⬇ zip` on `addons/` streams a valid zip containing the subtree.
|
|
||||||
12. **Mount integration** — attach the files overlay to a blueprint, start a server, confirm the files appear under `runtime/{server_id}/merged/...`.
|
|
||||||
13. **server.cfg alias** — with `expose_server_cfg=true` and a `server.cfg` in the files overlay, `exec server_overlay_{id}` is auto-injected into the merged `server.cfg`.
|
|
||||||
14. **Type isolation** — every new endpoint returns 404 for `workshop` and `script` overlays.
|
|
||||||
15. **Browser smoke test** — Chromium and Firefox: drag a folder containing nested files into a row; confirm `webkitRelativePath` arrives correctly.
|
|
||||||
16. **Upload progress panel** — drop 5 files of mixed sizes; panel shows 3 in flight, 2 queued; per-file progress bars advance; canceling one file aborts that XHR cleanly without affecting the others; partial file is removed server-side; tree refreshes once per parent folder (debounced) when uploads finish.
|
|
||||||
17. **End-to-end on the real test box** — deploy the branch to `ckn@10.0.4.128` via the project's deploy path, then drive the running web UI through the `claude-in-chrome` MCP tools end-to-end: create a `files` overlay, attach to a blueprint, exercise every CRUD path, boot a server, confirm the files materialize in the merged mount. Iterate until all paths work without errors.
|
|
||||||
|
|
@ -1,131 +0,0 @@
|
||||||
# l4d2 cpu isolation — design
|
|
||||||
|
|
||||||
Date: 2026-05-09
|
|
||||||
Status: design
|
|
||||||
|
|
||||||
## Summary
|
|
||||||
|
|
||||||
Constrain every cgroup that isn't a live game server to core 0; give game servers cores 1..N-1 exclusively. Implementation is systemd cgroup-v2 `AllowedCPUs=` drop-ins, computed at deploy time from `nproc`, overridable via env vars. Lands on top of the perf baseline shipped in `851e662..e5126c8`.
|
|
||||||
|
|
||||||
## Goals
|
|
||||||
|
|
||||||
- A logged-in admin doing CPU-heavy work, the script-build sandbox, and the Flask web app cannot steal cycles from a live match.
|
|
||||||
- Layout scales automatically across host sizes (4-core, 8-core, 16-core) without per-host edits.
|
|
||||||
- Operator can override the default `0` / `1..N-1` split for NUMA boxes or hyperthread quirks.
|
|
||||||
- Single-core hosts degrade gracefully: skip CPU isolation, keep the rest of the perf baseline.
|
|
||||||
|
|
||||||
## Non-goals
|
|
||||||
|
|
||||||
- Kernel `isolcpus=` / `nohz_full=` / `rcu_nocbs=` boot parameters. True core isolation (eviction of softirqs, RCU, timer ticks) requires GRUB edits + reboot + per-host tuning. cgroup cpuset is sufficient for L4D2 tickrates; document as a future opt-in if measurement justifies it.
|
|
||||||
- NIC IRQ pinning. Hardware-specific; already documented as an escape hatch in `deploy/README.md`.
|
|
||||||
- Per-instance pinning *within* the game-core set. The slice-level cpuset is the floor; the existing per-instance `CPUAffinity=` drop-in escape hatch (already in `deploy/README.md`) composes on top — the kernel enforces "per-instance value must be a subset of slice's allowed set."
|
|
||||||
- A separate `l4d2-web.slice`. The web app is light; living in `system.slice` on core 0 is fine.
|
|
||||||
- Web-app or host-library code changes. Pure deploy-side artifact work.
|
|
||||||
|
|
||||||
## Background
|
|
||||||
|
|
||||||
The perf baseline (commit range `851e662..e5126c8`) introduced two slices (`l4d2-game.slice` weight 1000, `l4d2-build.slice` weight 10), per-instance unit directives (Nice, OOM, memory caps), and host sysctls. None of those constrain *which* CPUs cgroups run on. Under the kernel CFS, every task can move to any core; the build sandbox, ssh sessions, the web app, and game servers all compete for the same cores.
|
|
||||||
|
|
||||||
## Design
|
|
||||||
|
|
||||||
### Topology
|
|
||||||
|
|
||||||
```
                    core 0            cores 1..N-1
                    ─────────         ────────────
system.slice        AllowedCPUs=0
user.slice          AllowedCPUs=0
l4d2-build.slice    AllowedCPUs=0
l4d2-game.slice                       AllowedCPUs=1-(N-1)
```
|
|
||||||
|
|
||||||
Everything that isn't a live game server (Flask web app, ssh sessions, journald, script-sandbox builds, cron, systemd housekeeping) is funneled to core 0. Game servers get cores 1..N-1 exclusively.
|
|
||||||
|
|
||||||
### Why slice-level `AllowedCPUs=`, not per-instance `CPUAffinity=`
|
|
||||||
|
|
||||||
- **Hierarchy does the work for free.** A cpuset on `l4d2-game.slice` propagates to every `left4me-server@*.service` automatically. No per-instance drop-ins to manage; no logic in the web app to pick cores.
|
|
||||||
- **Hot-applied.** cgroup-v2 cpuset changes apply to running cgroups; existing servers move next time the kernel schedules them. No need to restart instances after a deploy.
|
|
||||||
- **Composable.** A future operator who wants per-instance pinning *within* the game cores adds `CPUAffinity=N` via `/etc/systemd/system/left4me-server@<name>.service.d/affinity.conf` (already documented). The slice constraint and per-instance pin compose; the kernel enforces subset-of.
|
|
||||||
|
|
||||||
### Why drop-ins, not edits to the existing `.slice` files
|
|
||||||
|
|
||||||
The two slice files we ship today (`l4d2-game.slice`, `l4d2-build.slice`) are static text and host-portable. `AllowedCPUs=1-7` is true on an 8-core host and wrong on a 4-core host. Drop-ins under `<unit>.d/*.conf` are the standard systemd pattern for host-specific overrides. We already use `99-` prefixing for the sysctl drop-in so it lex-orders last; reuse that.
|
|
||||||
|
|
||||||
### Operator override
|
|
||||||
|
|
||||||
Two env vars consumed by the deploy script:
|
|
||||||
|
|
||||||
- `LEFT4ME_SYSTEM_CPUS` — defaults to `0`. Goes into `system.slice`, `user.slice`, `l4d2-build.slice` drop-ins.
|
|
||||||
- `LEFT4ME_GAME_CPUS` — defaults to `1-$((NPROC-1))`. Goes into `l4d2-game.slice` drop-in.
|
|
||||||
|
|
||||||
Operators with NUMA boxes, hyperthread quirks, or "I want core 0 *and* core 1 for system" set the vars explicitly. Defaults handle the typical case.
|
|
||||||
|
|
||||||
### Single-core fallback
|
|
||||||
|
|
||||||
If `nproc < 2`, skip CPU isolation entirely (write no drop-ins). Print a warning to stderr explaining the deploy is leaving cpuset unset. The rest of the perf baseline still applies (weights, sysctls, OOM scores).
|
|
||||||
|
|
||||||
If `LEFT4ME_GAME_CPUS` or `LEFT4ME_SYSTEM_CPUS` is set explicitly on a single-core host, honor the operator's intent (they presumably know what they're doing) and write the drop-ins anyway.
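
The default, override, and single-core rules above can be sketched as one small function. This is an illustration only: the real logic lives in the shell deploy script, and `resolve_cpu_layout` is a hypothetical name, not something in the repo.

```python
import sys


def resolve_cpu_layout(nproc, env):
    """Return (system_cpus, game_cpus), or None when isolation is skipped.

    Hypothetical helper mirroring the spec's rules; not the deploy script.
    """
    # Treat empty strings like unset, as the shell ${VAR:-default} form does.
    system = env.get("LEFT4ME_SYSTEM_CPUS") or None
    game = env.get("LEFT4ME_GAME_CPUS") or None
    explicit = system is not None or game is not None

    if nproc < 2 and not explicit:
        # Single-core fallback: write no drop-ins, warn on stderr.
        print("warning: single-core host, leaving cpuset unset", file=sys.stderr)
        return None

    if system is None:
        system = "0"
    if game is None:
        # Mirrors the shell default 1-$((NPROC-1)).
        game = f"1-{nproc - 1}"
    return system, game
```

On an 8-core host with neither variable set, this yields `("0", "1-7")`. The spec does not say what should happen when only one of the two variables is set on a single-core host, and the sketch does not handle that case specially.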
|
|
||||||
|
|
||||||
### Drop-in layout
|
|
||||||
|
|
||||||
Four files written to `/etc/systemd/system/`, each named `99-left4me-cpuset.conf`:
|
|
||||||
|
|
||||||
```
|
|
||||||
/etc/systemd/system/system.slice.d/99-left4me-cpuset.conf
|
|
||||||
/etc/systemd/system/user.slice.d/99-left4me-cpuset.conf
|
|
||||||
/etc/systemd/system/l4d2-build.slice.d/99-left4me-cpuset.conf
|
|
||||||
/etc/systemd/system/l4d2-game.slice.d/99-left4me-cpuset.conf
|
|
||||||
```
|
|
||||||
|
|
||||||
Each file contains:
|
|
||||||
|
|
||||||
```ini
|
|
||||||
[Slice]
|
|
||||||
AllowedCPUs=<resolved value>
|
|
||||||
```
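
As a rough illustration of the layout above, the four paths and their contents can be derived from the two resolved CPU strings. `render_dropins` and the two tuple names are hypothetical; the spec writes these files from the shell deploy script via `install`.

```python
# Slices that share the system-side CPU set, and the game slice.
SYSTEM_SIDE = ("system.slice", "user.slice", "l4d2-build.slice")
GAME_SIDE = ("l4d2-game.slice",)


def render_dropins(system_cpus, game_cpus):
    """Map each drop-in path to its file content (hypothetical helper)."""
    dropins = {}
    for slices, cpus in ((SYSTEM_SIDE, system_cpus), (GAME_SIDE, game_cpus)):
        for unit in slices:
            path = f"/etc/systemd/system/{unit}.d/99-left4me-cpuset.conf"
            dropins[path] = f"[Slice]\nAllowedCPUs={cpus}\n"
    return dropins
```

Calling `render_dropins("0", "1-7")` returns four entries, three carrying `AllowedCPUs=0` and one carrying `AllowedCPUs=1-7`.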
|
|
||||||
|
|
||||||
### systemd compatibility
|
|
||||||
|
|
||||||
`AllowedCPUs=` is systemd 244+. Debian Trixie ships systemd 256+. Cgroup-v2 cpuset controller is enabled by default on Trixie; systemd auto-enables the controller when `AllowedCPUs=` is set on a unit. No additional machinery.
|
|
||||||
|
|
||||||
### Files changed / added
|
|
||||||
|
|
||||||
```
|
|
||||||
deploy/deploy-test-server.sh (modified — compute layout, write four drop-ins)
|
|
||||||
deploy/README.md (modified — new "CPU isolation" subsection inside Performance Tuning)
|
|
||||||
deploy/tests/test_deploy_artifacts.py (modified — new tests)
|
|
||||||
```
|
|
||||||
|
|
||||||
## Tests
|
|
||||||
|
|
||||||
`deploy/tests/test_deploy_artifacts.py` additions, following the existing
|
|
||||||
`assert "X" in script` pattern:
|
|
||||||
|
|
||||||
- For `deploy-test-server.sh`, assert:
|
|
||||||
- All four drop-in paths (`/etc/systemd/system/{system,user,l4d2-build,l4d2-game}.slice.d/99-left4me-cpuset.conf`) appear.
|
|
||||||
- The script reads `nproc` (substring `nproc` plus a default-binding form for `LEFT4ME_GAME_CPUS`).
|
|
||||||
- The script honors `LEFT4ME_SYSTEM_CPUS` and `LEFT4ME_GAME_CPUS` env-var overrides (substrings present, default-binding form like `${LEFT4ME_SYSTEM_CPUS:-...}`).
|
|
||||||
- The script has a single-core fallback (substring guarding `nproc -lt 2` or equivalent, with a warning to stderr).
|
|
||||||
- Each drop-in is written via the existing `install -m 0644 -o root -g root` heredoc pattern.
|
|
||||||
|
|
||||||
No runtime tests in this spec — verifying that systemd actually enforces `AllowedCPUs=` is operator-side via `cat /sys/fs/cgroup/<slice>/cpuset.cpus.effective` after deploy.
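
A minimal sketch of how the substring assertions listed above might look, assuming the script text has already been read into a string. `missing_cpuset_markers` is a hypothetical helper; the existing test file's fixtures and naming are not shown in this spec.

```python
DROPIN_SLICES = ("system", "user", "l4d2-build", "l4d2-game")


def missing_cpuset_markers(script):
    """Return the expected substrings that do not appear in the script text."""
    expected = [
        f"/etc/systemd/system/{s}.slice.d/99-left4me-cpuset.conf"
        for s in DROPIN_SLICES
    ]
    # nproc lookup plus the default-binding form for both env vars.
    expected += ["nproc", "${LEFT4ME_SYSTEM_CPUS:-", "${LEFT4ME_GAME_CPUS:-"]
    return [marker for marker in expected if marker not in script]
```

A test would then assert that `missing_cpuset_markers(script_text)` is an empty list, which reports every missing marker at once when it fails.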
|
|
||||||
|
|
||||||
## Rollout
|
|
||||||
|
|
||||||
Single deploy. cgroup-v2 cpuset changes apply to running cgroups, so already-running servers move next time the kernel reschedules them — no instance restarts required. The `daemon-reload` already in the deploy script picks up the new drop-ins.
|
|
||||||
|
|
||||||
If something goes wrong (cpuset too narrow, a slice can't run any process), `systemctl status <slice>` will show the error and the operator can either fix the env vars and redeploy or `rm /etc/systemd/system/<slice>.slice.d/99-left4me-cpuset.conf` followed by `systemctl daemon-reload` to revert.
|
|
||||||
|
|
||||||
## Open questions
|
|
||||||
|
|
||||||
None blocking. Possible v2 candidates if measurement justifies them:
|
|
||||||
|
|
||||||
- Pair this with kernel `isolcpus=` boot params for true core isolation.
|
|
||||||
- Auto-pin NIC IRQs to core 0 (would compose with this isolation).
|
|
||||||
- Per-instance `CPUAffinity=` driven by a deploy-env knob, partitioning the game-core set across instances deterministically.
|
|
||||||
|
|
||||||
## References
|
|
||||||
|
|
||||||
- systemd.resource-control(5) — `AllowedCPUs=` semantics.
|
|
||||||
- Linux Documentation/admin-guide/cgroup-v2.rst — cpuset controller behavior on `cpuset.cpus` / `cpuset.cpus.effective`.
|
|
||||||
- Existing perf-baseline spec: `docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md` — sibling work that introduced the slices this spec extends.
|
|
||||||
|
|
@ -1,83 +0,0 @@
|
||||||
# l4d2 cpu pinning — decision record (deferred)
|
|
||||||
|
|
||||||
Date: 2026-05-09
|
|
||||||
Status: decision (no implementation)
|
|
||||||
|
|
||||||
## Question
|
|
||||||
|
|
||||||
After the lifecycle + drift fix landed (commits `8552c55`, `67b5521`), the
|
|
||||||
question came up: with `AllowedCPUs=1-7` already constraining game servers
|
|
||||||
to cores 1–7, do CFS scheduler migrations *within* that range still cause
|
|
||||||
meaningful jitter? Should we hard-pin each instance to a single core?
|
|
||||||
|
|
||||||
## Investigation
|
|
||||||
|
|
||||||
The classic "lazy CFS" sysctl knob is **gone** on modern kernels. Verified
|
|
||||||
on Trixie's running kernel 6.12 (`ckn@10.0.4.128`):
|
|
||||||
|
|
||||||
```
|
|
||||||
/sbin/sysctl -a | grep -E "sched_migration_cost|sched_min_granularity|sched_wakeup_granularity|sched_latency"
|
|
||||||
# (no output)
|
|
||||||
```
|
|
||||||
|
|
||||||
`kernel.sched_migration_cost_ns` and the other classic CFS tunables left the
|
|
||||||
sysctl namespace in 5.13, when they were moved to debugfs
|
|
||||||
(`/sys/kernel/debug/sched/`); the EEVDF rework (6.6) later dropped or renamed
|
|
||||||
several of them. `kernel.sched_rt_period_us` / `sched_rt_runtime_us` remain but only govern RT throttling. There is no supported global "be lazy about migrations" sysctl anymore.
|
|
||||||
|
|
||||||
### Available paths
|
|
||||||
|
|
||||||
| Option | Cost | Strictness | Pays off when |
|
|
||||||
|---|---|---|---|
|
|
||||||
| Trust CFS + `Nice=-5` + `AllowedCPUs=1-7` (current) | None | Soft | ≤ 3 instances on 7 cores; CFS rarely migrates active CPU-bound nice<0 tasks |
|
|
||||||
| Per-instance `CPUAffinity=N` drop-in | Web-app machinery to write drop-ins, daemon-reload, modulo or DB-persisted assignment | Strict | ≥ 4 instances (each gets exclusive core), or measured jitter |
|
|
||||||
| `isolcpus=1-7 nohz_full=1-7 rcu_nocbs=1-7` kernel cmdline | GRUB edit + reboot, host-specific | Strongest (also evicts kernel softirqs/RCU/timer ticks from game cores) | Tickrate-128 with measurable kernel-induced jitter |
|
|
||||||
| `SCHED_FIFO` per unit | Risky (RT misconfig can stall kernel) | Strict | Already documented as ops-side escape hatch in `deploy/README.md` |
|
|
||||||
|
|
||||||
### Why deferring is defensible
|
|
||||||
|
|
||||||
- The slice's `AllowedCPUs=1-7` already prevents game servers from running on core 0. The open question is "do they migrate within 1–7?" — yes, CFS can migrate, but for long-running CPU-bound `srcds` with `Nice=-5`, migrations are infrequent. CFS prefers cache locality and only migrates when an idle core "steals" or a periodic load-balance tick detects imbalance.
|
|
||||||
- With ≤ 3 instances on 7 game cores, the load balancer rarely sees imbalance to fix.
|
|
||||||
- Per-instance hard pinning adds non-trivial machinery (drop-in writer through `left4me-systemctl`, or extending `instance.env` + a `taskset` wrapper in the unit). Not warranted unless we observe a real problem.
|
|
||||||
- `deploy/README.md` already documents the `CPUAffinity=N` per-instance drop-in as an opt-in escape hatch. An operator who measures jitter can apply it without code changes.
|
|
||||||
|
|
||||||
## Decision
|
|
||||||
|
|
||||||
**No code change.** Keep the current setup:
|
|
||||||
|
|
||||||
- Slice-level `AllowedCPUs=1-7` ensures game servers never touch core 0.
|
|
||||||
- `Nice=-5` keeps active srcds tasks weighted heavily so CFS prefers leaving them alone.
|
|
||||||
- The `CPUAffinity=N` per-instance drop-in remains the documented escape hatch.
|
|
||||||
|
|
||||||
## Revisit triggers
|
|
||||||
|
|
||||||
If any of these signals appears, design and implement strict per-instance pinning:
|
|
||||||
|
|
||||||
- ≥ 4 game-server instances running simultaneously on one host.
|
|
||||||
- A specific server reports tickrate dips / rubber-banding correlated with another instance starting or a build sandbox firing.
|
|
||||||
- `perf stat -e sched:sched_migrate_task -p <srcds-pid>` shows > 1 migration/sec under load.
|
|
||||||
|
|
||||||
When revisiting, two implementation paths to choose from:
|
|
||||||
|
|
||||||
1. **Modulo assignment in the host library.** Read `LEFT4ME_GAME_CPUS` (or parse the slice's `AllowedCPUs=` drop-in), pick `game_cpus[(int(name) - 1) % len(game_cpus)]`, write `L4D2_CPU=N` into `instance.env`, wrap the unit's `ExecStart` with `taskset -c ${L4D2_CPU}`. Stateless, deterministic, no DB column. **Preferred.**
|
|
||||||
2. **Persisted assignment.** Add `Server.cpu_pin` column, web app picks at initialize time and stores. Survives `LEFT4ME_GAME_CPUS` changes (each server keeps its assigned core). Bigger ripple.
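
The modulo assignment in option 1 can be sketched as follows. This assumes instance names are numeric strings (as the `int(name)` expression implies) and that the CPU list uses the comma/range syntax of `AllowedCPUs=`; `parse_cpu_list` and `pick_cpu` are hypothetical names.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as "1-7" or "0,2-3" into sorted CPU indices."""
    cpus = set()
    # Accept commas or whitespace between entries.
    for part in spec.replace(",", " ").split():
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def pick_cpu(name, game_cpus):
    """Deterministically map a numeric instance name onto one game core."""
    return game_cpus[(int(name) - 1) % len(game_cpus)]
```

With `game_cpus = parse_cpu_list("1-7")`, instances `1` through `7` land on cores 1 through 7, and instance `8` wraps around to core 1. The mapping changes whenever the game-core set changes, which is the trade-off option 2 avoids.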
|
|
||||||
|
|
||||||
## Verification (no-op confirmation)
|
|
||||||
|
|
||||||
```sh
|
|
||||||
ssh ckn@10.0.4.128 'systemctl show l4d2-game.slice -p AllowedCPUs'
|
|
||||||
# expect: AllowedCPUs=1-7
|
|
||||||
|
|
||||||
ssh ckn@10.0.4.128 'cat /sys/fs/cgroup/system.slice/cpuset.cpus.effective'
|
|
||||||
# expect: 0 (everything-not-game still pinned to core 0)
|
|
||||||
|
|
||||||
# When ≥ 1 server is running:
|
|
||||||
ssh ckn@10.0.4.128 'for p in $(pgrep srcds); do grep ^Cpus_allowed_list /proc/$p/status; done'
|
|
||||||
# expect: 1-7 (CFS picks whichever of those is hottest at any given moment)
|
|
||||||
```
|
|
||||||
|
|
||||||
## References
|
|
||||||
|
|
||||||
- `docs/superpowers/specs/2026-05-09-l4d2-cpu-isolation-design.md` — sibling design that introduced the `AllowedCPUs=1-7` slice constraint this record builds on.
|
|
||||||
- `deploy/README.md` "Performance Tuning" section — the `CPUAffinity=N` per-instance escape hatch.
|
|
||||||
- Linux kernel changelog 5.13+ — removal of classic CFS tunable sysctls.
|
|
||||||
|
|
@ -1,230 +0,0 @@
|
||||||
# l4d2 server host perf baseline — design
|
|
||||||
|
|
||||||
Date: 2026-05-09
|
|
||||||
Status: design
|
|
||||||
|
|
||||||
## Summary
|
|
||||||
|
|
||||||
Apply a host-side performance and resource-isolation baseline to every L4D2 server instance, using systemd unit directives, a slice hierarchy, and host sysctls. The blueprint-level game configuration (tickrate, sv_minrate/maxrate, fps_max, plugins) stays the responsibility of the individual server maintainer and is out of scope.
|
|
||||||
|
|
||||||
## Goals
|
|
||||||
|
|
||||||
- Game-server processes get measurable scheduling, I/O, and OOM priority over the script-build sandbox and over interactive system traffic.
|
|
||||||
- One misbehaving server cannot OOM-kill its siblings or the host.
|
|
||||||
- The kernel's UDP path is sized for sustained Source-engine traffic instead of distro defaults.
|
|
||||||
- Operators have documented escape hatches for host-specific tuning (CPU pinning, governor, NIC IRQs, real-time scheduling) without any of it being imposed by default.
|
|
||||||
|
|
||||||
## Non-goals
|
|
||||||
|
|
||||||
- ConVars, blueprint arguments, plugins, tickrate, rate values — owned by the maintainer of each server.
|
|
||||||
- Real-time (`SCHED_FIFO`/`SCHED_RR`) scheduling for game servers. Documented as opt-in only; see Out-of-scope rationale.
|
|
||||||
- CPU governor changes. Documented opt-in only.
|
|
||||||
- Per-instance `CPUAffinity`. Host-specific; documented only.
|
|
||||||
- NIC ring-buffer / IRQ-pinning changes. Hardware-specific; documented only.
|
|
||||||
- Job-scheduler awareness ("don't build a script overlay while server X has players"). Cgroup weights cover this in v1; revisit if real-world data disagrees.
|
|
||||||
- Hardening tightening (`ProtectKernelTunables=yes`, etc.). Security-focused, separate spec.
|
|
||||||
|
|
||||||
## Background
|
|
||||||
|
|
||||||
Current state (commit `965b67e`):
|
|
||||||
|
|
||||||
- `deploy/files/usr/local/lib/systemd/system/left4me-server@.service` runs `srcds_run` as user `left4me` with security hardening (`NoNewPrivileges`, `PrivateTmp`, `PrivateDevices`, `ProtectHome`, `ProtectSystem=strict`, `ReadOnlyPaths`, `ReadWritePaths`, `RestrictSUIDSGID`, `LockPersonality`) but **no scheduling, memory, OOM, kill-signal, or log-rate directives**.
|
|
||||||
- `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` runs script-overlay builds via `systemd-run` (as a transient service: `--unit=` without `--scope`) with `CPUQuota=200%` and `RuntimeMaxSec=3600`, but in the **default cgroup**, where it competes against game servers as an equal sibling under `system.slice`.
|
|
||||||
- No host sysctls are deployed. Linux defaults (`rmem_max`/`wmem_max` = 212992 bytes, about 208 KB; `netdev_max_backlog=1000`) are below what sustained UDP gameplay across multiple instances expects.
|
|
||||||
|
|
||||||
srcds is single-threaded per instance, so multi-instance hosts contend over CPU cycles, kernel softirq budget, and journald rate limits.
|
|
||||||
|
|
||||||
## Design
|
|
||||||
|
|
||||||
### Slice topology
|
|
||||||
|
|
||||||
Flat top-level slices, siblings of `system.slice` and `user.slice`:
|
|
||||||
|
|
||||||
```
|
|
||||||
-.slice
|
|
||||||
├── system.slice (default CPUWeight=100, IOWeight=100)
|
|
||||||
├── user.slice (default CPUWeight=100, IOWeight=100)
|
|
||||||
├── l4d2-game.slice (CPUWeight=1000, IOWeight=1000)
|
|
||||||
└── l4d2-build.slice (CPUWeight=10, IOWeight=10)
|
|
||||||
```
|
|
||||||
|
|
||||||
Rationale:
|
|
||||||
|
|
||||||
- 100:1 weight ratio between game and build means: under contention, the build sandbox is starved; when uncontended, the build still gets the full box modulo its own `CPUQuota=200%`.
|
|
||||||
- Flat (not nested under `system.slice`) so a logged-in admin running a heavy task in `user.slice` cannot steal cycles from a live match.
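
To make the weight ratio concrete: cgroup-v2 CPU weights split contended CPU time among competing siblings in proportion to their weights. The sketch below works through that arithmetic for the four top-level slices, assuming all of them are runnable at once; `cpu_shares` is a hypothetical helper, not project code.

```python
def cpu_shares(weights):
    """Fraction of contended CPU each competing sibling cgroup receives."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


WEIGHTS = {
    "system.slice": 100,
    "user.slice": 100,
    "l4d2-game.slice": 1000,
    "l4d2-build.slice": 10,
}

shares = cpu_shares(WEIGHTS)
```

Under full four-way contention the game slice ends up with roughly 83% of the CPU time and the build slice with under 1%. Idle siblings drop out of the sum, so an uncontended build still gets whatever its `CPUQuota=200%` allows.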
|
|
||||||
|
|
||||||
### Per-instance unit additions (`left4me-server@.service`)
|
|
||||||
|
|
||||||
Add to `[Service]`:
|
|
||||||
|
|
||||||
```
|
|
||||||
Slice=l4d2-game.slice
|
|
||||||
Nice=-5
|
|
||||||
IOSchedulingClass=best-effort
|
|
||||||
IOSchedulingPriority=4
|
|
||||||
OOMScoreAdjust=-200
|
|
||||||
MemoryHigh=1.5G
|
|
||||||
MemoryMax=2G
|
|
||||||
TasksMax=256
|
|
||||||
LimitNOFILE=65536
|
|
||||||
KillSignal=SIGINT
|
|
||||||
TimeoutStopSec=15s
|
|
||||||
LogRateLimitIntervalSec=0
|
|
||||||
```
|
|
||||||
|
|
||||||
Per-directive justification:
|
|
||||||
|
|
||||||
- `Slice=l4d2-game.slice` — places the instance in the high-weight slice.
|
|
||||||
- `Nice=-5` — modest CFS priority bump. Negative `Nice` set by systemd does not require `CAP_SYS_NICE` because systemd applies the value before dropping to the unit user. SCHED_FIFO is intentionally rejected; see Out-of-scope rationale.
|
|
||||||
- `IOSchedulingClass=best-effort` + `IOSchedulingPriority=4` — explicit best-effort at priority 4, which matches the default for that class; stating it makes the setting deterministic and is otherwise harmless.
|
|
||||||
- `OOMScoreAdjust=-200` — game servers survive memory pressure; sandbox dies first (see sandbox section).
|
|
||||||
- `MemoryHigh=1.5G`, `MemoryMax=2G` — soft + hard ceiling. Typical L4D2 srcds runs ~500–800 MB; map-load spikes fit in headroom; a runaway is bounded.
|
|
||||||
- `TasksMax=256` — bounds thread count well above srcds' steady-state usage; prevents fork-bomb style failures from leaking host-wide.
|
|
||||||
- `LimitNOFILE=65536` — Valve wiki recommendation; cheap and matches multi-plugin setups.
|
|
||||||
- `KillSignal=SIGINT` — srcds responds to SIGINT for clean shutdown (writes demos, flushes logs); SIGTERM is harsher.
|
|
||||||
- `TimeoutStopSec=15s` — gives srcds time to finish flush before SIGKILL.
|
|
||||||
- `LogRateLimitIntervalSec=0` — disables journald per-unit rate limiting (default `10000 msgs/30s`). srcds + plugins exceed this on busy maps; dropped messages break diagnostics.
|
|
||||||
|
|
||||||
Existing security directives are kept verbatim.
|
|
||||||
|
|
||||||
### Slice unit files
|
|
||||||
|
|
||||||
New file `deploy/files/usr/local/lib/systemd/system/l4d2-game.slice`:
|
|
||||||
|
|
||||||
```ini
|
|
||||||
[Unit]
|
|
||||||
Description=left4me game-server slice
|
|
||||||
Before=slices.target
|
|
||||||
|
|
||||||
[Slice]
|
|
||||||
CPUWeight=1000
|
|
||||||
IOWeight=1000
|
|
||||||
```
|
|
||||||
|
|
||||||
New file `deploy/files/usr/local/lib/systemd/system/l4d2-build.slice`:
|
|
||||||
|
|
||||||
```ini
|
|
||||||
[Unit]
|
|
||||||
Description=left4me script-sandbox build slice
|
|
||||||
Before=slices.target
|
|
||||||
|
|
||||||
[Slice]
|
|
||||||
CPUWeight=10
|
|
||||||
IOWeight=10
|
|
||||||
```
|
|
||||||
|
|
||||||
### Sandbox slice + OOM placement
|
|
||||||
|
|
||||||
Edit `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` to add to the `systemd-run` invocation (transient service mode — the existing helper uses `--unit=` without `--scope`):
|
|
||||||
|
|
||||||
- `--slice=l4d2-build.slice`
|
|
||||||
- `-p OOMScoreAdjust=500`
|
|
||||||
|
|
||||||
Existing `CPUQuota=200%` and `RuntimeMaxSec=3600` stay. Cgroup weight (slice) and CPU quota (per-unit) compose: weight handles contention, quota handles the absolute ceiling.
|
|
||||||
|
|
||||||
### Host sysctls
|
|
||||||
|
|
||||||
New file `deploy/files/etc/sysctl.d/99-left4me.conf`:
|
|
||||||
|
|
||||||
```
|
|
||||||
net.core.rmem_max = 8388608
|
|
||||||
net.core.wmem_max = 8388608
|
|
||||||
net.core.rmem_default = 524288
|
|
||||||
net.core.wmem_default = 524288
|
|
||||||
net.core.netdev_max_backlog = 5000
|
|
||||||
net.core.netdev_budget = 600
|
|
||||||
vm.swappiness = 10
|
|
||||||
```
|
|
||||||
|
|
||||||
Per-value justification:
|
|
||||||
|
|
||||||
- `rmem_max`/`wmem_max = 8 MB` — the Linux default (212992 bytes, about 208 KB) is a known bottleneck for sustained UDP. 8 MB is the standard 1 Gbit recommendation (Red Hat performance guide); enough headroom for ~10 instances on a host without going to 16 MB.
|
|
||||||
- `rmem_default`/`wmem_default = 512 KB` — protects sockets that don't explicitly call `setsockopt(SO_RCVBUF/SO_SNDBUF)`; harmless when they do.
|
|
||||||
- `netdev_max_backlog = 5000` — default `1000` overflows under multi-instance UDP burst; the per-CPU softnet queue starts dropping packets once full.
|
|
||||||
- `netdev_budget = 600` — gives softirq more packet-drain headroom per pass; default `300` is undersized for multi-Gbit-class hosts.
|
|
||||||
- `vm.swappiness = 10` — universally recommended for latency-sensitive servers; harmless on swapless hosts.
|
|
||||||
|
|
||||||
### Deploy script integration
|
|
||||||
|
|
||||||
`deploy/deploy-test-server.sh` must:
|
|
||||||
|
|
||||||
1. Copy `etc/sysctl.d/99-left4me.conf` to `/etc/sysctl.d/`.
|
|
||||||
2. Run `sysctl --system` (or `sysctl -p /etc/sysctl.d/99-left4me.conf`) so values take effect immediately, not on next boot.
|
|
||||||
3. Copy the two `.slice` files into `/usr/local/lib/systemd/system/`.
|
|
||||||
4. `systemctl daemon-reload` after unit/slice changes (already done in current deploy flow).
|
|
||||||
5. No explicit `systemctl start` of the slices is required — they activate on first child reference.
|
|
||||||
|
|
||||||
### Documented escape hatches (no auto-apply)
|
|
||||||
|
|
||||||
Append a "Performance tuning" section to `deploy/README.md`:
|
|
||||||
|
|
||||||
- **CPU governor**: `cpupower frequency-set -g performance` if jitter under load matters more than power. Schedutil is acceptable for sustained UDP workloads. Provide the one-liner; do not ship a oneshot service in v1.
|
|
||||||
- **CPU affinity per instance**: example drop-in at `/etc/systemd/system/left4me-server@<name>.service.d/affinity.conf` setting `CPUAffinity=N`. Document the strategy "one instance per core, leave core 0 for system + IRQ".
|
|
||||||
- **NIC tuning**: example `ethtool -G <iface> rx 4096 tx 4096`, IRQ-pinning hints. Hardware-specific; ops-only.
|
|
||||||
- **Real-time scheduling opt-in**: example drop-in adding `CPUSchedulingPolicy=fifo`, `CPUSchedulingPriority=10`, `LimitRTPRIO=10`. Include a one-paragraph warning citing RT-throttling defaults (`sched_rt_runtime_us=950000`) and the failure mode if a single instance misbehaves.
|
|
||||||
|
|
||||||
These stay pure documentation in v1 — no code paths, no tests asserting them.
|
|
||||||
|
|
||||||
### Out-of-scope rationale
|
|
||||||
|
|
||||||
- **SCHED_FIFO**: a misbehaving srcds at any RT priority can starve kernel threads and produces failure modes that are harder to diagnose than the jitter problem it claims to solve. `Nice=-5` plus the slice weights captures the practical benefit. Ops who need RT can opt in via the documented drop-in.
|
|
||||||
- **CPU governor auto-set**: Phoronix and Arch comparisons show `schedutil` is within noise of `performance` on sustained workloads like Source UDP; aggressively forcing `performance` would surprise users on power-managed hosts.
|
|
||||||
- **CPUAffinity in the unit**: the unit template is shared across all instances; a single hard-coded `CPUAffinity=` would pin every instance to the same cores, defeating the purpose. Per-instance pinning needs deploy-time policy that is outside v1's scope.
|
|
||||||
|
|
||||||
### Files changed / added
|
|
||||||
|
|
||||||
```
|
|
||||||
deploy/files/usr/local/lib/systemd/system/left4me-server@.service (modified)
|
|
||||||
deploy/files/usr/local/lib/systemd/system/l4d2-game.slice (new)
|
|
||||||
deploy/files/usr/local/lib/systemd/system/l4d2-build.slice (new)
|
|
||||||
deploy/files/etc/sysctl.d/99-left4me.conf (new)
|
|
||||||
deploy/files/usr/local/libexec/left4me/left4me-script-sandbox (modified)
|
|
||||||
deploy/deploy-test-server.sh (modified — sysctl --system step)
|
|
||||||
deploy/README.md (modified — performance section)
|
|
||||||
deploy/tests/test_deploy_artifacts.py (modified — assertions)
|
|
||||||
```
|
|
||||||
|
|
||||||
## Tests
|
|
||||||
|
|
||||||
`deploy/tests/test_deploy_artifacts.py` additions, following the existing
|
|
||||||
`assert "key=value" in text` pattern:
|
|
||||||
|
|
||||||
- For `left4me-server@.service`, assert every line listed in *Per-instance
|
|
||||||
unit additions* is present verbatim. Each is a separate assertion so a
|
|
||||||
failing line is identifiable.
|
|
||||||
- For `l4d2-game.slice`, assert `CPUWeight=1000` and `IOWeight=1000`.
|
|
||||||
- For `l4d2-build.slice`, assert `CPUWeight=10` and `IOWeight=10`.
|
|
||||||
- For `99-left4me.conf`, assert every sysctl line listed in *Host sysctls*.
|
|
||||||
- For `left4me-script-sandbox`, assert the strings `--slice=l4d2-build.slice`
|
|
||||||
and `OOMScoreAdjust=500` both appear.
|
|
||||||
- Assert the deploy script invokes `sysctl --system` (or
|
|
||||||
`sysctl -p /etc/sysctl.d/99-left4me.conf`) at least once after copying the
|
|
||||||
conf into place.
|
|
||||||
|
|
||||||
No runtime perf tests in v1 — the spec ships defaults, not measured wins.
|
|
||||||
Real-world measurement is left to operators with concrete instance counts,
|
|
||||||
hardware, and player loads.
|
|
||||||
|
|
||||||
## Rollout
|
|
||||||
|
|
||||||
Single deploy. Running game servers will not pick up the new directives until each instance is restarted (systemd does not reapply unit changes to already-running services). The web UI's "stop" + "start" cycle is sufficient. Document this in `deploy/README.md`.
|
|
||||||
|
|
||||||
## Open questions
|
|
||||||
|
|
||||||
None blocking. v2 candidates if measurement justifies them:
|
|
||||||
|
|
||||||
- Per-instance `CPUAffinity` driven by a deploy-env knob (`LEFT4ME_INSTANCE_CPUS`).
|
|
||||||
- Job-worker awareness of "server has active players" to defer builds further than weights alone.
|
|
||||||
- Optional `left4me-host-perf.service` oneshot that sets governor + NIC tuning under a single env-flag opt-in.
|
|
||||||
|
|
||||||
## References
|
|
||||||
|
|
||||||
- systemd.exec(5) — `Nice=`, `IOSchedulingClass=`, `IOSchedulingPriority=`, `OOMScoreAdjust=`, `LimitNOFILE=`, `LogRateLimitIntervalSec=`.
|
|
||||||
- systemd.resource-control(5) — slice semantics, `CPUWeight=`, `IOWeight=`, `MemoryHigh=`, `MemoryMax=`, `TasksMax=`, weight competition rules.
|
|
||||||
- systemd.kill(5) — `KillSignal=` and signal handling.
|
|
||||||
- systemd.service(5) — `TimeoutStopSec=`.
|
|
||||||
- Red Hat Enterprise Linux Network Performance Tuning Guide — `rmem_max`/`wmem_max`/`netdev_max_backlog`/`netdev_budget`.
|
|
||||||
- LWN "SCHED_FIFO and realtime throttling"; RHEL Real-Time CPU throttling docs — rationale for not shipping RT by default.
|
|
||||||
- Linux Foundation real-time wiki — `sched_rt_runtime_us` semantics.
|
|
||||||
- forums.srcds.com / AlliedModders / linuxquestions.org threads — confirmation that srcds is single-threaded per instance.
|
|
||||||
- Phoronix governor comparisons — performance vs schedutil for sustained workloads.
|
|
||||||
- Multiple latency-tuning guides — `vm.swappiness=10` consensus.
|
|
||||||
|
|
@ -1,217 +0,0 @@
|
||||||
# l4d2 server lifecycle: reboot-safe + drift reconciliation — design
|
|
||||||
|
|
||||||
Date: 2026-05-09
|
|
||||||
Status: design
|
|
||||||
|
|
||||||
## Summary
|
|
||||||
|
|
||||||
Make L4D2 server instances survive a host reboot by switching their lifecycle verbs from `systemctl start`/`stop` to `systemctl enable --now`/`disable --now`. Pair this with a periodic background poller that refreshes `Server.actual_state` so out-of-band state changes (OOM kills, manual `systemctl stop`, crashes that exhaust `Restart=on-failure`) no longer leave the web UI showing stale "running" indicators.
|
|
||||||
|
|
||||||
## Goals
|
|
||||||
|
|
||||||
- An L4D2 server started via the web UI (or `l4d2ctl start`) automatically comes back up after a host reboot, with no operator action.
|
|
||||||
- The web app's `Server.actual_state` converges to systemd's actual state within ~30 seconds of any out-of-band change.
|
|
||||||
- The single-source-of-truth for "this server should be running" lives in systemd's wants-symlinks, not in a SQLite row that systemd has no awareness of.
|
|
||||||
- Migration from the existing `systemctl start`-based fleet is a no-op: the next stop+start cycle through the UI converts each server to the enable-based model.
|
|
||||||
|
|
||||||
## Non-goals
|
|
||||||
|
|
||||||
- **Auto-restart on detected drift.** When the poller observes `desired_state=running` but `actual_state=stopped`, this spec does not re-enqueue a start job. That's a v2 UX/policy decision.
|
|
||||||
- **UI surfacing of stale-state warnings.** Once the poller is reliable, the dashboard could show "DB believes X, but actual_state was last refreshed N seconds ago." Out of scope.
|
|
||||||
- **Reconciliation of orphan systemd units.** Units enabled on disk but not represented by any `Server` row (e.g., from a crashed delete) — separate cleanup spec.
|
|
||||||
- **Per-server poller intervals.** A single global cadence is sufficient.
|
|
||||||
- **Replacing `Restart=on-failure`** with anything more elaborate. The unit's existing restart policy stays.
|
|
||||||
- **Reactive-style state propagation.** No SSE/websocket pushes to the UI when actual_state changes. The next page render reads the fresh value from the DB.
|
|
||||||
|
|
||||||
## Premise check: system units, not user units
|
|
||||||
|
|
||||||
`systemctl --user enable --now` has different lifecycle rules — auto-start only at user login (unless `loginctl enable-linger <user>` is set), symlinks land in `~/.config/systemd/user/<target>.wants/`. It would be wrong here.
|
|
||||||
|
|
||||||
This project uses **system units**, confirmed by:
|
|
||||||
|
|
||||||
- Unit path: `/usr/local/lib/systemd/system/left4me-server@.service` is the system search path; user units live in `/etc/systemd/user/` or `~/.config/systemd/user/`.
|
|
||||||
- The `left4me-systemctl` helper (`deploy/files/usr/local/libexec/left4me/left4me-systemctl:31-44`) calls plain `systemctl` (no `--user` flag) and runs as **root** via the sudoers rule at `deploy/files/etc/sudoers.d/left4me:2`.
|
|
||||||
- The unit's `[Install] WantedBy=multi-user.target` (line 43 of the unit) is a system target; user units would use `default.target`.
|
|
||||||
- The same machinery is already in production for `left4me-web.service` — `deploy-test-server.sh` runs `sudo systemctl enable --now left4me-web.service`, and that's how the web service auto-came-back after today's reboot. We're applying the same pattern to the game-server template instances.
|
|
||||||
|
|
||||||
`systemctl enable left4me-server@1.service` will create `/etc/systemd/system/multi-user.target.wants/left4me-server@1.service` symlinked to `/usr/local/lib/systemd/system/left4me-server@.service`. systemd handles the template instantiation via the `@` syntax automatically.
|
|
||||||
|
|
||||||
## Background
|
|
||||||
|
|
||||||
Today's behavior, confirmed by forensics on `ckn@10.0.4.128` after the operator ran `sudo systemctl poweroff` at 11:48:02 CEST:
|
|
||||||
|
|
||||||
- The `left4me-systemctl` helper (`deploy/files/usr/local/libexec/left4me/left4me-systemctl`) accepts the verbs `start`, `stop`, and `show`, each invoking the literal `systemctl` action.
|
|
||||||
- `l4d2host/service_control.py` exposes `start_service(name)` and `stop_service(name)` that build `systemctl_command("start"/"stop", name)`.
|
|
||||||
- `l4d2host/instances.py` `start_instance` and `stop_instance` call those functions.
|
|
||||||
- `systemctl start` is a transient activation. systemd creates **no** symlink under `multi-user.target.wants/`, so the unit doesn't auto-start on next boot.
|
|
||||||
- After the host poweroff at 11:48:02, both running instances were cleanly shut down. The host rebooted; `left4me-web.service` came back (it *is* `enable`d); the game instances did not.
|
|
||||||
- The web app's `Server.actual_state` is only ever written by `refresh_server_actual_state_after_job()` in `l4d2web/services/job_worker.py:581`, called solely after a job completes. With no jobs in flight after the reboot, the row's `actual_state="running"` from yesterday remained the displayed truth.
|
|
||||||
|
|
||||||
## Design
|
|
||||||
|
|
||||||
### Part A — Switch lifecycle verbs to `enable --now` / `disable --now`
|
|
||||||
|
|
||||||
**Helper script** (`deploy/files/usr/local/libexec/left4me/left4me-systemctl`):
|
|
||||||
|
|
||||||
Rename the action verbs the helper accepts: drop `start`/`stop`, add `enable`/`disable`. The bodies become:
|
|
||||||
|
|
||||||
```sh
case "$action" in
  enable)  exec "$systemctl" enable --now "$unit" ;;
  disable) exec "$systemctl" disable --now "$unit" ;;
  show)    exec "$systemctl" show "$unit" --property=ActiveState --property=SubState ;;
  *)       reject ;;
esac
```
|
|
||||||
|
|
||||||
The existing instance-name validation regex (currently lines 12–17) is unchanged — it constrains the `<name>` argument, not the action. The sudoers rule at `deploy/files/etc/sudoers.d/left4me`:
|
|
||||||
|
|
||||||
```
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-systemctl *
```
|
|
||||||
|
|
||||||
already passes any args; no sudoers update needed.
|
|
||||||
|
|
||||||
**Python wrapper** (`l4d2host/service_control.py`):
|
|
||||||
|
|
||||||
Rename `start_service` → `enable_service` and `stop_service` → `disable_service`. Each builds `systemctl_command("enable", name)` / `systemctl_command("disable", name)`. The existing `show_service` is unchanged.
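
A minimal sketch of the renamed wrapper, assuming the argv shape `["sudo", "-n", helper_path, action, name]` that the Part A tests assert. `HELPER_PATH` and the injected `run_command` parameter are illustrative assumptions, not the module's confirmed signature.

```python
from typing import Callable, List

# Assumption: deployed helper location, taken from the deploy tree path in this spec.
HELPER_PATH = "/usr/local/libexec/left4me/left4me-systemctl"

Runner = Callable[[List[str]], object]


def systemctl_command(action: str, name: str) -> List[str]:
    """Build the sudo invocation for the left4me-systemctl helper."""
    return ["sudo", "-n", HELPER_PATH, action, name]


def enable_service(name: str, run_command: Runner) -> object:
    # "enable" maps to `systemctl enable --now` inside the helper.
    return run_command(systemctl_command("enable", name))


def disable_service(name: str, run_command: Runner) -> object:
    # "disable" maps to `systemctl disable --now` inside the helper.
    return run_command(systemctl_command("disable", name))
```

Injecting the runner keeps the sketch testable without `sudo`; the real module may call its own `run_command` directly.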
|
|
||||||
|
|
||||||
**Instance lifecycle** (`l4d2host/instances.py`):
|
|
||||||
|
|
||||||
- `start_instance` — replace the `start_service(...)` call with `enable_service(...)`.
|
|
||||||
- `stop_instance` — replace `stop_service(...)` with `disable_service(...)`.
|
|
||||||
- `_purge_instance` (called by `delete_instance` and `reset_instance`) — replace `stop_service(...)` with `disable_service(...)`. On a unit that is not running, `disable --now` leaves the runtime untouched and still removes any leftover wants-symlink, which is the desired idempotent behavior.
|
|
||||||
|
|
||||||
**CLI surface** (`l4d2host/cli.py`):
|
|
||||||
|
|
||||||
`l4d2ctl start <name>` and `l4d2ctl stop <name>` keep their names per the contract in `AGENTS.md` ("Host CLI write commands are fixed to: install, initialize, start, stop, delete"). The semantics now genuinely match the verb at the operator level: `start` = "ensure running, now and after reboot." Internal call paths route through `start_instance` → `enable_service` as renamed above.
|
|
||||||
|
|
||||||
**Web facade** (`l4d2web/services/l4d2_facade.py`):
|
|
||||||
|
|
||||||
Unchanged. Still invokes `["l4d2ctl", "start", ...]` / `["l4d2ctl", "stop", ...]`.
|
|
||||||
|
|
||||||
### Part B — Periodic state poller
|
|
||||||
|
|
||||||
Add a single background thread spawned alongside the existing job-worker threads in `l4d2web/services/job_worker.py:start_job_workers`:
|
|
||||||
|
|
||||||
```python
import logging
import threading
import time

from sqlalchemy import select

# Job, Server, session_scope and refresh_server_actual_state are assumed to be
# in scope already inside l4d2web/services/job_worker.py.

logger = logging.getLogger(__name__)


def start_state_poller(app):
    interval = float(app.config.get("STATE_POLLER_INTERVAL_SECONDS", 30))
    thread = threading.Thread(
        target=state_poller_loop,
        args=(app, interval),
        daemon=True,
        name="left4me-state-poller",
    )
    thread.start()


def state_poller_loop(app, interval):
    while True:
        try:
            with app.app_context():
                poll_all_servers()
        except Exception:
            # Never let a single failure kill the loop, but leave a trace.
            logger.exception("state poller tick failed")
        time.sleep(interval)


def poll_all_servers():
    with session_scope() as db:
        # Servers with a queued or running job are left to the job worker.
        active_server_ids = set(db.scalars(
            select(Job.server_id).where(Job.state.in_(("queued", "running")))
        ).all())
        server_ids = [
            sid for sid in db.scalars(select(Server.id)).all()
            if sid not in active_server_ids
        ]
    # Assumption: each refresh opens its own session, so the loop runs
    # after the listing session has closed.
    for sid in server_ids:
        try:
            refresh_server_actual_state(sid)
        except Exception:
            # One bad server must not block the rest of the tick.
            logger.exception("state refresh failed for server %s", sid)
```
|
|
||||||
|
|
||||||
**Why skip in-flight servers:** the job worker's success path also calls `refresh_server_actual_state`. Both writers touching the same row at overlapping times produces no kernel-level race (SQLite WAL serializes writes), but a poller observing transient state mid-job — e.g., the brief window where the unit is being enabled but `srcds` hasn't fully bound the port yet — could write a misleading value that the worker's post-completion refresh then overwrites. Skipping is simpler than reasoning about the orderings.
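
The skip rule and the per-server failure isolation can be sketched without a database. This is an illustration only: `servers_to_poll` and `poll_once` are hypothetical names, and plain lists stand in for the `Server` and `Job` queries.

```python
from typing import Callable, Iterable, List


def servers_to_poll(all_ids: Iterable[int], busy_ids: Iterable[int]) -> List[int]:
    """Drop every server that has a queued or running job."""
    busy = set(busy_ids)
    return [sid for sid in all_ids if sid not in busy]


def poll_once(
    all_ids: Iterable[int],
    busy_ids: Iterable[int],
    refresh: Callable[[int], None],
) -> List[int]:
    """Refresh each idle server; return the ids whose refresh raised."""
    failed = []
    for sid in servers_to_poll(all_ids, busy_ids):
        try:
            refresh(sid)
        except Exception:
            # A failing server is recorded and the tick continues.
            failed.append(sid)
    return failed
```

Returning the failed ids is a choice made for the sketch so the behaviour is observable; the poller in this spec does not return anything.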
|
|
||||||
|
|
||||||
**Wiring in startup** (`l4d2web/app.py:create_app`): call `start_state_poller(app)` adjacent to `start_job_workers(app)`, gated by the same `should_start_workers` predicate (existing lines 84–88: `JOB_WORKER_ENABLED && not TESTING && not _in_flask_cli_context()`).
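
The gating predicate can be read as the following sketch. The function signature and the `False` defaults are assumptions made for illustration; the real predicate lives in `create_app` and detects the Flask CLI context itself.

```python
from typing import Mapping


def should_start_workers(config: Mapping[str, object], in_flask_cli: bool) -> bool:
    """Background threads start only for a real, non-test, non-CLI app."""
    # Assumption: missing keys are treated as False in this sketch.
    enabled = bool(config.get("JOB_WORKER_ENABLED", False))
    testing = bool(config.get("TESTING", False))
    return enabled and not testing and not in_flask_cli
```

Both `start_job_workers(app)` and `start_state_poller(app)` would sit behind the same check, so disabling the workers also disables the poller.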
|
|
||||||
|
|
||||||
**First-tick latency:** the loop runs `poll_all_servers()` once before the first `time.sleep(interval)`, so the DB catches up to systemd reality within seconds of app boot (the first tick costs one `systemctl show` per server). A separate startup-reconcile path is not needed.
|
|
||||||
|
|
||||||
**Concurrency:** the poller and the workers all use `session_scope()` (`l4d2web/db.py:44–58`) which commits-on-success / rolls-back-on-exception. SQLite WAL mode (configured by the deploy script per `deploy-test-server.sh:188-198`) handles concurrent reads + serialized writes. No new locking primitives.
|
|
||||||
|
|
||||||
### Why both parts
|
|
||||||
|
|
||||||
Either part alone is insufficient:
|
|
||||||
|
|
||||||
- **Part A alone** survives reboots but doesn't catch OOM kills, manual `systemctl disable --now <unit>` from a shell, or crashes that exhaust `Restart=on-failure`. The DB still drifts in those cases.
|
|
||||||
- **Part B alone** keeps the DB honest but doesn't bring servers back after a reboot — the operator would still be looking at `actual_state=stopped` on a server they expected to come back, with the only recourse being to click start again.
|
|
||||||
|
|
||||||
Together: enable-based lifecycle keeps systemd as the source of truth; the poller keeps the DB honest about whatever systemd reports.
|
|
||||||
|
|
||||||
### Migration on running hosts
|
|
||||||
|
|
||||||
No one-shot migration is needed. After this lands, a server currently running via the old `systemctl start` (started but not enabled) keeps running through the deploy. The next time the operator clicks stop in the UI, `systemctl disable --now` runs: `disable` is a no-op for a unit that was never enabled, but `--now` still stops the live process. The next start runs `systemctl enable --now`, which enables and starts the unit. From that point on the unit survives reboot.
|
|
||||||
|
|
||||||
The poller's first tick after deploy refreshes every server's `actual_state` to whatever systemd reports. If the test box's two stale rows still claim `running` but no unit is loaded, that tick flips them to `stopped`.
|
|
||||||
|
|
||||||
### Files changed / added
|
|
||||||
|
|
||||||
```
deploy/files/usr/local/libexec/left4me/left4me-systemctl   (Part A — verbs)
l4d2host/service_control.py                                (Part A — rename)
l4d2host/instances.py                                      (Part A — call new names)
l4d2host/tests/test_lifecycle.py                           (Part A — test updates)
l4d2host/tests/test_service_control.py                     (Part A — new direct unit tests, create if absent)
deploy/tests/test_deploy_artifacts.py                      (Part A — helper assertions)

l4d2web/services/job_worker.py                             (Part B — poller code)
l4d2web/app.py                                             (Part B — wire start_state_poller)
l4d2web/config.py                                          (Part B — STATE_POLLER_INTERVAL_SECONDS default)
l4d2web/tests/test_job_worker.py                           (Part B — poller tests)
```
|
|
||||||
|
|
||||||
## Tests
|
|
||||||
|
|
||||||
### Part A
|
|
||||||
|
|
||||||
- `deploy/tests/test_deploy_artifacts.py::test_systemctl_helper_passes_shell_syntax_check_and_rejects_bad_args`: update body assertions to expect `enable)` / `disable)` / `show)`. Add an assertion that `enable)` body contains `enable --now` and `disable)` body contains `disable --now`. Update rejected-action examples (drop `start`/`stop` since they're no longer accepted).
|
|
||||||
- `l4d2host/tests/test_lifecycle.py`: every assertion that mocks `run_command` and inspects the systemctl-helper invocation needs the action token updated from `start` → `enable` and `stop` → `disable`. The `_purge_instance` paths exercised by `delete_instance` and `reset_instance` flip from `stop` to `disable`.
|
|
||||||
- New direct unit tests in `l4d2host/tests/test_service_control.py` (create the file if it doesn't exist already): exercise `enable_service` and `disable_service` with a mocked `run_command` and assert they emit `["sudo", "-n", helper_path, "enable"|"disable", name]`.
|
|
||||||
|
|
||||||
### Part B
|
|
||||||
|
|
||||||
- `l4d2web/tests/test_job_worker.py::test_state_poller_refreshes_each_server` (new): seed two `Server` rows with `actual_state="unknown"`; monkey-patch `refresh_server_actual_state` to record calls; run one iteration of `poll_all_servers()`; assert it was called once per server in any order.
|
|
||||||
- `test_state_poller_skips_servers_with_inflight_jobs` (new): seed a `Server` row + a `Job` with `state="running"` for that server; run `poll_all_servers()`; assert `refresh_server_actual_state` was NOT called for that server.
|
|
||||||
- `test_state_poller_swallows_per_server_exceptions` (new): make `refresh_server_actual_state` raise for one server; assert other servers are still polled and the loop function returns normally.
|
|
||||||
- `test_state_poller_disabled_when_job_workers_disabled` (new): create app with `JOB_WORKER_ENABLED=False`; assert `start_state_poller` is not invoked (or that no `left4me-state-poller` thread is alive after `create_app`).
|
|
||||||
|
|
||||||
### CI sanity
|
|
||||||
|
|
||||||
`pytest deploy/tests/ l4d2host/tests l4d2web/tests -q` is green except the pre-existing unrelated `test_deploy_script_has_safe_defaults_and_preserves_state` (stale since `caa8b83`, out of scope).
|
|
||||||
|
|
||||||
## Rollout
|
|
||||||
|
|
||||||
Single deploy. After deploy:
|
|
||||||
|
|
||||||
1. The poller's first tick (within seconds of `left4me-web.service` starting) refreshes every server's `actual_state` to systemd reality. Any servers stuck on stale "running" flip to "stopped" automatically. **No operator UI clicks required.**
|
|
||||||
2. Servers currently `running` (started via the old `systemctl start`) keep running, but they're not yet `enabled`. The operator's next stop+start through the UI converts them to enable-based and from that point onwards they're reboot-safe.
|
|
||||||
3. Newly-started servers (`l4d2ctl start <name>` or web UI start) are enable-based from the first invocation.
|
|
||||||
|
|
||||||
If something goes wrong — e.g., the helper rejects a previously-valid invocation or the poller floods the journal — the helper script + `service_control.py` change can be reverted independently of the poller, and vice versa.
|
|
||||||
|
|
||||||
## Open questions
|
|
||||||
|
|
||||||
None blocking. v2 candidates:
|
|
||||||
|
|
||||||
- Auto-restart on `desired_state=running && actual_state=stopped` (separate UX decision).
|
|
||||||
- Per-server poll intervals or backoff for repeatedly-failing servers.
|
|
||||||
- A "drift" badge in the UI when `actual_state_updated_at` is older than 2× the poll interval (proxy for "the poller isn't running" or "the host is unreachable").
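
For the drift-badge candidate, the staleness rule reduces to one comparison. The sketch below assumes timezone-aware timestamps and uses a hypothetical `is_drifted` name; the 30-second default mirrors `STATE_POLLER_INTERVAL_SECONDS`.

```python
from datetime import datetime, timedelta, timezone


def is_drifted(
    updated_at: datetime,
    now: datetime,
    poll_interval_seconds: float = 30.0,
) -> bool:
    """True when the last state refresh is older than twice the poll interval."""
    return (now - updated_at) > timedelta(seconds=2 * poll_interval_seconds)


# Example: a row refreshed 61 seconds ago is flagged at the default interval.
_now = datetime(2026, 5, 10, 12, 0, 0, tzinfo=timezone.utc)
_flagged = is_drifted(_now - timedelta(seconds=61), _now)
```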
|
|
||||||
|
|
||||||
## References
|
|
||||||
|
|
||||||
- systemd.unit(5) — `WantedBy=`, `Install` section semantics.
|
|
||||||
- systemctl(1) — `enable --now` / `disable --now` flags.
|
|
||||||
- Existing perf-baseline spec: `docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md`.
|
|
||||||
- Existing CPU-isolation spec: `docs/superpowers/specs/2026-05-09-l4d2-cpu-isolation-design.md`.
|
|
||||||
- `AGENTS.md` — Host CLI write-command set is fixed; this spec preserves that contract.
|
|
||||||
|
|
@ -1,487 +0,0 @@
|
||||||
# l4d2 network shaping & marking — design
|
|
||||||
|
|
||||||
Date: 2026-05-10
|
|
||||||
Status: design
|
|
||||||
|
|
||||||
## Summary
|
|
||||||
|
|
||||||
Add a network-side player-experience baseline alongside the existing host
|
|
||||||
perf baseline. Three concerns ship together:
|
|
||||||
|
|
||||||
1. **Mark srcds outbound packets** with DSCP `EF` and skb priority `6:0` so
|
|
||||||
any qdisc — host CAKE, ISP gear that honours DSCP, future systems —
|
|
||||||
recognises L4D2 game traffic as latency-sensitive. Marking happens by uid
|
|
||||||
match on the `left4me` user.
|
|
||||||
2. **Round out the UDP-socket sysctl baseline** (`udp_rmem_min`,
|
|
||||||
`udp_wmem_min`), set the default qdisc explicitly to `fq_codel`, and
|
|
||||||
switch TCP to `bbr` so coexisting TCP egress (admin, backups, web app,
|
|
||||||
apt) cannot bufferbloat the link the players share.
|
|
||||||
3. **Shape egress with CAKE.** On the test deploy, install a systemd oneshot
|
|
||||||
that applies `tc qdisc replace … cake …` from an operator-edited env
|
|
||||||
file. On production hosts running `systemd-networkd`, document the
|
|
||||||
equivalent `[CAKE]` section in the matching `.network` file as the
|
|
||||||
long-term path.
|
|
||||||
|
|
||||||
The intent is "all reasonable measures that do not depend on host-specific
|
|
||||||
hardware." Hardware-specific tuning (NIC ring buffers, IRQ pinning, CPU
|
|
||||||
governor, real-time scheduling, CPU affinity) remains a documented escape
|
|
||||||
hatch — same boundary the existing perf-baseline spec drew. The pieces
|
|
||||||
that *are* universally safe ship as defaults.
|
|
||||||
|
|
||||||
## Goals
|
|
||||||
|
|
||||||
- Game-server UDP packets carry an unambiguous priority signal in DSCP and
|
|
||||||
in `skb->priority`, set on the host before any qdisc inspects them.
|
|
||||||
- A coexisting bulk TCP flow on the same host (backup upload, package
|
|
||||||
fetch, web-app response) cannot push the bottleneck queue ahead of game
|
|
||||||
UDP under saturation.
|
|
||||||
- An operator who declares uplink bandwidth gets fair-queueing egress
|
|
||||||
shaping with diffserv-aware tin selection — i.e. EF-marked srcds traffic
|
|
||||||
drops into the highest-priority CAKE tin, per-destination-host fairness
|
|
||||||
keeps every connected player on equal footing.
|
|
||||||
- A production deployment using `systemd-networkd` has a one-block
|
|
||||||
configuration recipe, no helper script needed.
|
|
||||||
- Operators have a documented set of additional knobs (ingress shaping via
|
|
||||||
IFB, `busy_poll`, GRO toggling) for cases the default baseline does not
|
|
||||||
cover. None of these auto-apply.
|
|
||||||
|
|
||||||
## Non-goals
|
|
||||||
|
|
||||||
- NIC ring-buffer / IRQ pinning / RPS / RFS / hardware timestamping —
|
|
||||||
already declared host-specific in the perf-baseline spec; not
|
|
||||||
re-litigated here.
|
|
||||||
- `busy_poll` / `busy_read` as defaults — non-trivial CPU cost; documented
|
|
||||||
as opt-in.
|
|
||||||
- Ingress shaping via IFB as a default — only matters if egress CAKE turns
|
|
||||||
out load-bearing and ingress is also saturated; documented as opt-in.
|
|
||||||
- Real-time scheduling, governor changes — already declined by the
|
|
||||||
perf-baseline spec.
|
|
||||||
- Blueprint-side game settings (`sv_minrate`, `sv_maxrate`, tickrate,
|
|
||||||
`fps_max`) — owned by the server maintainer.
|
|
||||||
- Auto-detection or measurement of uplink bandwidth. CAKE only shapes
|
|
||||||
correctly when its declared bandwidth sits below the real bottleneck;
|
|
||||||
the operator must measure once and configure.
|
|
||||||
- Iface-flap watchdog. `tc qdisc replace` is idempotent; on prod,
|
|
||||||
`systemd-networkd` reapplies CAKE across iface lifecycle events. On
|
|
||||||
test, `systemctl restart left4me-cake.service` is the documented
|
|
||||||
recovery.
|
|
||||||
|
|
||||||
## Background
|
|
||||||
|
|
||||||
Current state (commit `62d6d4c` or thereabouts):
|
|
||||||
|
|
||||||
- The perf-baseline spec ships `/etc/sysctl.d/99-left4me.conf` with
|
|
||||||
`rmem_max`, `wmem_max`, `rmem_default`, `wmem_default`,
|
|
||||||
`netdev_max_backlog`, `netdev_budget`, `vm.swappiness`. No per-socket
|
|
||||||
UDP minimums, no default-qdisc directive, no TCP congestion-control
|
|
||||||
setting.
|
|
||||||
- `srcds_run` runs as system user `left4me`. srcds itself does not set
|
|
||||||
`IP_TOS` or `SO_PRIORITY`, so its UDP packets leave the host with
|
|
||||||
DSCP 0 and priority 0 — indistinguishable from any other UDP traffic to
|
|
||||||
any qdisc.
|
|
||||||
- The deploy ships nftables-relevant infrastructure only via package
|
|
||||||
defaults (Debian Trixie ships `nftables` in base, but no `left4me`
|
|
||||||
table is created).
|
|
||||||
- No qdisc is explicitly configured. The kernel's per-iface default
|
|
||||||
applies — `fq_codel` on Trixie, but only because Debian's default has
|
|
||||||
been `fq_codel` since Buster.
|
|
||||||
- The deploy script already copies sysctl drop-ins and runs
|
|
||||||
`sysctl --system` (`deploy/deploy-test-server.sh:196`).
|
|
||||||
|
|
||||||
## Design
|
|
||||||
|
|
||||||
### Sysctl additions to `99-left4me.conf`
|
|
||||||
|
|
||||||
Append to `deploy/files/etc/sysctl.d/99-left4me.conf`:
|
|
||||||
|
|
||||||
```
# Per-socket UDP buffer floors: protect game-server sockets that don't bump
# their own SO_RCVBUF/SO_SNDBUF when softirq drains lag briefly.
net.ipv4.udp_rmem_min = 16384
net.ipv4.udp_wmem_min = 16384

# Default qdisc for ifaces we don't explicitly shape with CAKE. Debian
# Trixie already defaults to fq_codel; setting it explicitly is
# belt-and-suspenders and survives kernel-default churn.
net.core.default_qdisc = fq_codel

# TCP congestion control: BBR for any bulk TCP egress on the host (admin
# SSH, backups, package fetches, web-app responses) so a long flow does
# not push the bottleneck queue ahead of game UDP. UDP srcds is
# unaffected.
net.ipv4.tcp_congestion_control = bbr
```
|
|
||||||
|
|
||||||
The deploy already runs `sysctl --system` after copying the conf
|
|
||||||
(`deploy/deploy-test-server.sh:198`); no script change required for this
|
|
||||||
block.
|
|
||||||
|
|
||||||
### nftables packet marking
|
|
||||||
|
|
||||||
New file `deploy/files/usr/local/lib/left4me/nft/left4me-mark.nft`:
|
|
||||||
|
|
||||||
```nft
table inet left4me_mark {
    chain mangle_output {
        type filter hook output priority mangle; policy accept;
        meta skuid "left4me" meta l4proto udp ip dscp set ef meta priority set 0006:0000
        meta skuid "left4me" meta l4proto udp ip6 dscp set ef meta priority set 0006:0000
    }
}
```
|
|
||||||
|
|
||||||
Per-element rationale:
|
|
||||||
|
|
||||||
- `meta skuid "left4me"` — every srcds instance runs as that user. The
  uid match alone is not unique: the web app also runs as `left4me`, but
  it speaks TCP, so the `meta l4proto udp` match on the same rule excludes
  it. The build sandbox runs under a different uid and never matches.
|
|
||||||
- `meta l4proto udp` — bypass anything not UDP, including the future
|
|
||||||
RCON/HTTP TCP traffic from the web app.
|
|
||||||
- `ip dscp set ef` / `ip6 dscp set ef` — DSCP `EF` (Expedited Forwarding,
|
|
||||||
decimal 46) is the standard low-latency marking. CAKE's `diffserv4`
|
|
||||||
preset routes EF into its highest-priority "Voice" tin. Two rules,
|
|
||||||
one per L3 family, because in an `inet` table the `ip` matcher only
|
|
||||||
fires on v4 and `ip6` only on v6.
|
|
||||||
- `meta priority set 0006:0000` — sets `skb->priority` to class `6:0`.
|
|
||||||
Read by qdiscs that classify on skb priority (CAKE included) ahead of
|
|
||||||
any DSCP table lookup. Set inline with the DSCP rule so a single
|
|
||||||
rule-match runs both statements.
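
The two numeric values in the rule can be checked with a little arithmetic. This sketch is illustrative only (the function names are made up): DSCP occupies the upper six bits of the former TOS byte, and a `major:minor` handle packs two 16-bit halves into one 32-bit value.

```python
DSCP_EF = 46  # Expedited Forwarding, decimal 46 as stated above


def dscp_to_tos(dscp: int) -> int:
    """Shift a 6-bit DSCP code point into the 8-bit TOS/traffic-class byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit field")
    return dscp << 2  # the low two bits are ECN and stay zero


def tc_handle(major: int, minor: int) -> int:
    """Pack a major:minor handle into the 32-bit value stored in skb->priority."""
    return (major << 16) | minor


tos_byte = dscp_to_tos(DSCP_EF)   # 184, i.e. 0xB8 on the wire
priority = tc_handle(0x6, 0x0)    # 0x00060000
```

A packet capture on the uplink should therefore show TOS `0xb8` on srcds UDP packets once the table is loaded.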
|
|
||||||
|
|
||||||
The table is named `left4me_mark` and lives in its own `inet` namespace.
|
|
||||||
It does not touch, depend on, or conflict with any nftables config the
|
|
||||||
operator may run independently. `nft -f` loads the file; `nft delete
|
|
||||||
table inet left4me_mark` cleanly removes it.
|
|
||||||
|
|
||||||
New unit `deploy/files/usr/local/lib/systemd/system/left4me-nft-mark.service`:
|
|
||||||
|
|
||||||
```ini
[Unit]
Description=left4me nftables packet marking (DSCP EF + priority for srcds)
Wants=network-pre.target
Before=network-pre.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/nft -f /usr/local/lib/left4me/nft/left4me-mark.nft
ExecStop=/usr/sbin/nft delete table inet left4me_mark

[Install]
WantedBy=multi-user.target
```

`Wants=network-pre.target` plus `Before=network-pre.target` orders the
unit ahead of network management, which orders itself after that target
(see `systemd.special(7)`). The rules are therefore in place before any
iface comes up, so the very first packet srcds emits post-boot is already
marked. Ordering the unit `After=network-pre.target` would not give that
guarantee.
|
|
||||||
|
|
||||||
Deploy script changes:
|
|
||||||
|
|
||||||
- Ensure `nftables` is installed (`apt-get install -y nftables`;
|
|
||||||
idempotent — package is in Trixie base).
|
|
||||||
- Create `/usr/local/lib/left4me/nft/` and copy `left4me-mark.nft` into
|
|
||||||
it.
|
|
||||||
- Copy the unit, `daemon-reload`, `systemctl enable --now
|
|
||||||
left4me-nft-mark.service`.
|
|
||||||
|
|
||||||
### CAKE egress shaper — test deploy mechanism
|
|
||||||
|
|
||||||
Three files plus deploy-script changes. All operator-tunable knobs go in
|
|
||||||
the env file; the helper and unit are static.
|
|
||||||
|
|
||||||
**`deploy/files/etc/left4me/cake.env`** (template; deploy installs only
|
|
||||||
if absent so operator edits survive re-runs):
|
|
||||||
|
|
||||||
```
# Uplink bandwidth in Mbit/s. Set to ~95% of the smaller of measured
# upload and measured download. CAKE only shapes correctly when its
# declared bandwidth sits below the real bottleneck. If unset, the
# left4me-cake.service unit logs a warning and exits 0 (no shaping).
LEFT4ME_UPLINK_MBIT=

# Egress interface. If unset, auto-detected from the IPv4 default route.
LEFT4ME_UPLINK_IFACE=
```
|
|
||||||
|
|
||||||
**`deploy/files/usr/local/libexec/left4me/left4me-apply-cake`** (mode
|
|
||||||
`0755`, owner `root:root`). The helper takes a single argument — `apply`
|
|
||||||
or `clear` — so the unit's `ExecStart` and `ExecStop` both call the same
|
|
||||||
script and the unit file stays free of shell escaping:
|
|
||||||
|
|
||||||
```sh
#!/bin/sh
set -eu

mode=${1:-apply}

if [ -r /etc/left4me/cake.env ]; then
    . /etc/left4me/cake.env
fi

resolve_iface() {
    if [ -n "${LEFT4ME_UPLINK_IFACE:-}" ]; then
        printf '%s' "$LEFT4ME_UPLINK_IFACE"
        return
    fi
    # Print the token after "dev" so default routes without a "via" hop
    # (e.g. point-to-point links) also resolve to the right iface.
    ip -4 route show default |
        awk '/^default/ { for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

case "$mode" in
    apply)
        if [ -z "${LEFT4ME_UPLINK_MBIT:-}" ]; then
            echo "left4me-cake: LEFT4ME_UPLINK_MBIT unset; skipping shaper" >&2
            exit 0
        fi
        iface=$(resolve_iface)
        if [ -z "$iface" ]; then
            echo "left4me-cake: cannot determine egress iface; skipping" >&2
            exit 0
        fi
        exec tc qdisc replace dev "$iface" root cake \
            bandwidth "${LEFT4ME_UPLINK_MBIT}mbit" \
            internet diffserv4 dual-dsthost
        ;;
    clear)
        iface=$(resolve_iface)
        if [ -z "$iface" ]; then
            exit 0
        fi
        # Ignore "no such qdisc" so stopping the unit stays idempotent.
        tc qdisc del dev "$iface" root 2>/dev/null || true
        ;;
    *)
        echo "usage: $0 [apply|clear]" >&2
        exit 2
        ;;
esac
```
|
|
||||||
|
|
||||||
`tc qdisc replace` is idempotent: replaces an existing root qdisc on the
|
|
||||||
iface, adds one if absent. Re-running the unit any time is safe. `clear`
|
|
||||||
swallows the "no such qdisc" error so stop is also idempotent.
|
|
||||||
|
|
||||||
Fail-soft on missing config matches the perf-baseline philosophy — the
|
|
||||||
deploy does not refuse to boot servers because the operator has not yet
|
|
||||||
filled in `LEFT4ME_UPLINK_MBIT`. The journal warning surfaces the gap.
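
The helper's fail-soft decision can be mirrored in a few lines of Python for unit testing. This is a sketch under stated assumptions: `build_cake_command` is a hypothetical name, the env file is modelled as a dict, and route detection is reduced to a `default_iface` argument.

```python
from typing import List, Mapping, Optional


def build_cake_command(
    env: Mapping[str, str],
    default_iface: Optional[str],
) -> Optional[List[str]]:
    """Return the tc argv the helper would exec, or None when it would skip."""
    mbit = env.get("LEFT4ME_UPLINK_MBIT", "").strip()
    if not mbit:
        return None  # bandwidth unset: warn and exit 0 in the real helper
    iface = env.get("LEFT4ME_UPLINK_IFACE", "").strip() or (default_iface or "")
    if not iface:
        return None  # no explicit iface and no default route
    return [
        "tc", "qdisc", "replace", "dev", iface, "root", "cake",
        "bandwidth", f"{mbit}mbit",
        "internet", "diffserv4", "dual-dsthost",
    ]
```

Both skip paths return `None` rather than raising, matching the helper's exit-0 behaviour when configuration is missing.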
|
|
||||||
|
|
||||||
**`deploy/files/usr/local/lib/systemd/system/left4me-cake.service`**:
|
|
||||||
|
|
||||||
```ini
[Unit]
Description=left4me CAKE egress shaper
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
EnvironmentFile=-/etc/left4me/cake.env
ExecStart=/usr/local/libexec/left4me/left4me-apply-cake apply
ExecStop=/usr/local/libexec/left4me/left4me-apply-cake clear

[Install]
WantedBy=multi-user.target
```
|
|
||||||
|
|
||||||
Per-flag rationale for the `cake` invocation:
|
|
||||||
|
|
||||||
- `bandwidth ${LEFT4ME_UPLINK_MBIT}mbit` — operator-declared, ≈95% of
|
|
||||||
measured uplink. CAKE only shapes if its declared bandwidth is below
|
|
||||||
the real bottleneck; setting it slightly low moves the queue into a
|
|
||||||
place the host controls.
|
|
||||||
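- `internet` — RTT preset keyword, equivalent to `rtt 100ms` and CAKE's
  default, tuned for general Internet paths. It does not configure
  link-layer overhead compensation; if the uplink's encapsulation is known
  (DOCSIS, PPPoE, and so on), add the matching overhead keyword from
  `tc-cake(8)`.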
|
|
||||||
- `diffserv4` — four-tier DSCP-aware tin selection. Reads the EF marks
|
|
||||||
set by the nftables rule and routes srcds packets into the
|
|
||||||
highest-priority "Voice" tin. Without `diffserv4`, the marks are
|
|
||||||
ignored.
|
|
||||||
- `dual-dsthost` — egress fairness keyed on destination host. With ≥2
|
|
||||||
players connected, each player gets fair share regardless of how
|
|
||||||
chatty the server is to any single client.
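
The ≈95% rule for the `bandwidth` value is simple enough to sketch. The function name is hypothetical, and integer arithmetic is used so the result is an exact whole number of Mbit/s.

```python
def cake_bandwidth_mbit(measured_uplink_mbit: int, percent: int = 95) -> int:
    """Return the value to put in LEFT4ME_UPLINK_MBIT for a measured uplink."""
    if measured_uplink_mbit <= 0:
        raise ValueError("measured uplink must be positive")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    # Floor division keeps the shaper at or below the target share.
    return measured_uplink_mbit * percent // 100
```

For a measured 500 Mbit/s uplink this yields 475, which keeps the queue inside the host's shaper rather than at the ISP's bottleneck.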
|
|
||||||
|
|
||||||
Iface-flap behaviour: the kernel keeps the qdisc on an iface across
|
|
||||||
link-down/link-up while the iface itself exists. If the iface is
|
|
||||||
recreated (e.g., NetworkManager reconfiguration), `systemctl restart
|
|
||||||
left4me-cake.service` reapplies. Documented; no auto-watchdog in v1.
|
|
||||||
|
|
||||||
Deploy script changes (in `deploy/deploy-test-server.sh`):
|
|
||||||
|
|
||||||
- Copy `cake.env` to `/etc/left4me/cake.env` only if absent (do not
|
|
||||||
clobber operator edits).
|
|
||||||
- Copy `left4me-apply-cake` to `/usr/local/libexec/left4me/`, mode
|
|
||||||
`0755`, owner `root:root`.
|
|
||||||
- Copy `left4me-cake.service` to `/usr/local/lib/systemd/system/`.
|
|
||||||
- `systemctl daemon-reload` (already done in the existing flow).
|
|
||||||
- `systemctl enable --now left4me-cake.service`.
|
|
||||||
|
|
||||||
### CAKE egress shaper — production deployment (systemd-networkd)
|
|
||||||
|
|
||||||
On hosts running `systemd-networkd`, the CAKE configuration belongs in
|
|
||||||
the matching `.network` file. systemd-networkd reapplies it across iface
|
|
||||||
lifecycle events, addressing the only fragility of the test-deploy
|
|
||||||
oneshot.
|
|
||||||
|
|
||||||
Document in `deploy/README.md` Performance section:
|
|
||||||
|
|
||||||
```ini
# /etc/systemd/network/<your-uplink>.network
[CAKE]
Bandwidth=480M
RTTSec=100ms
PriorityQueueingPreset=diffserv4
FlowIsolationMode=dual-dst-host
```

Directive names follow `systemd.network(5)`. Values mirror the test
deploy's `tc` invocation:

- `Bandwidth=480M` — placeholder; operator sets to ≈95% of measured
  uplink in their actual `.network`.
- `RTTSec=100ms` — equivalent of the `internet` keyword, which is CAKE's
  100 ms RTT preset.
- `PriorityQueueingPreset=diffserv4` — equivalent of `diffserv4`.
- `FlowIsolationMode=dual-dst-host` — equivalent of `dual-dsthost` on egress.
|
|
||||||
|
|
||||||
The nftables marking from the previous section ships unchanged on prod;
|
|
||||||
it is qdisc-installer-agnostic.
|
|
||||||
|
|
||||||
The test-deploy oneshot does NOT install on a host running
|
|
||||||
`systemd-networkd`. v1 does not implement that gate — production hosts
|
|
||||||
do not run the test-deploy script. If the boundary blurs in the future,
|
|
||||||
add a check in `left4me-apply-cake` for `systemctl is-active
|
|
||||||
systemd-networkd` and skip cleanly.
|
|
||||||
|
|
||||||
### Documented escape hatches
|
|
||||||
|
|
||||||
Append to `deploy/README.md` Performance section, alongside the existing
|
|
||||||
governor / CPU-affinity / NIC entries:
|
|
||||||
|
|
||||||
- **Ingress shaping via IFB.** Egress CAKE alone does not protect srcds
|
|
||||||
receive against ingress saturation (large workshop downloads, package
|
|
||||||
fetches arriving at line rate). One-liner template using `modprobe
|
|
||||||
ifb`, `ip link set ifb0 up`, `tc qdisc add dev ifb0 root cake bandwidth
|
|
||||||
Xmbit ingress diffserv4 dual-srchost`, and a `tc filter` redirect from
|
|
||||||
the uplink iface. Worth flipping only when measurement shows ingress
|
|
||||||
hurting receive; in v1 we have no such measurement, so it stays
|
|
||||||
documented.
|
|
||||||
- **`net.core.busy_poll = 50` / `net.core.busy_read = 50`.** Reduces UDP
|
|
||||||
receive median latency by polling for incoming packets briefly at
|
|
||||||
syscall boundaries. Cost: measurable CPU per syscall under load. Worth
|
|
||||||
flipping if a host is dedicated to game serving and CPU headroom is
|
|
||||||
plentiful.
|
|
||||||
- **`ethtool -K <iface> gro off`.** Some Source-engine ops disable
|
|
||||||
generic receive offload to avoid receive-side coalescing latency.
|
|
||||||
Hardware/driver dependent. Document, do not ship.
|
|
||||||
|
|
||||||
These three entries follow the existing escape-hatch style: a one-liner
|
|
||||||
or short config block, plus one sentence on when it matters.
|
|
||||||
|
|
||||||
### Files changed / added
|
|
||||||
|
|
||||||
```
deploy/files/etc/sysctl.d/99-left4me.conf                           (modified — block added)
deploy/files/usr/local/lib/left4me/nft/left4me-mark.nft             (new)
deploy/files/usr/local/lib/systemd/system/left4me-nft-mark.service  (new)
deploy/files/etc/left4me/cake.env                                   (new — template, deploy preserves operator edits)
deploy/files/usr/local/libexec/left4me/left4me-apply-cake           (new)
deploy/files/usr/local/lib/systemd/system/left4me-cake.service      (new)
deploy/deploy-test-server.sh                                        (modified — install+enable nft and cake units, conditional copy of cake.env)
deploy/README.md                                                    (modified — Network shaping subsection + 3 new escape hatches)
deploy/tests/test_deploy_artifacts.py                               (modified — assertions for all artifacts above)
```
|
|
||||||
|
|
||||||
## Tests
|
|
||||||
|
|
||||||
Following the existing `assert "key=value" in text` pattern in
|
|
||||||
`deploy/tests/test_deploy_artifacts.py`:
|
|
||||||
|
|
||||||
**Sysctl block** (extension of the existing perf-baseline assertions):
|
|
||||||
|
|
||||||
- Each of `net.ipv4.udp_rmem_min = 16384`, `net.ipv4.udp_wmem_min =
|
|
||||||
16384`, `net.core.default_qdisc = fq_codel`,
|
|
||||||
`net.ipv4.tcp_congestion_control = bbr` is asserted as a separate line.
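
A sketch of how the per-line sysctl assertions could look. It checks a string rather than reading the deployed file, and `missing_sysctl_lines` is a hypothetical helper, not an existing function in `test_deploy_artifacts.py`.

```python
from typing import List

EXPECTED_SYSCTL_LINES = (
    "net.ipv4.udp_rmem_min = 16384",
    "net.ipv4.udp_wmem_min = 16384",
    "net.core.default_qdisc = fq_codel",
    "net.ipv4.tcp_congestion_control = bbr",
)


def missing_sysctl_lines(conf_text: str) -> List[str]:
    """Return every expected setting that is not present as its own line."""
    lines = {line.strip() for line in conf_text.splitlines()}
    return [expected for expected in EXPECTED_SYSCTL_LINES if expected not in lines]
```

Matching whole lines rather than substrings means a commented-out setting such as `# net.core.default_qdisc = fq_codel` does not satisfy the check.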
|
|
||||||
|
|
||||||
**nftables marking artifacts:**
|
|
||||||
|
|
||||||
- `left4me-mark.nft` ships with `table inet left4me_mark`, `chain
|
|
||||||
mangle_output`, `meta skuid "left4me"`, `ip dscp set ef`, `ip6 dscp
|
|
||||||
set ef`, and `meta priority set 0006:0000` each asserted as separate
|
|
||||||
substring matches. (DSCP and priority statements appear inline on
|
|
||||||
the same rule per L3 family; substring assertions don't depend on
|
|
||||||
rule layout.)
|
|
||||||
- `left4me-nft-mark.service` has `ExecStart=/usr/sbin/nft -f
|
|
||||||
/usr/local/lib/left4me/nft/left4me-mark.nft`, `ExecStop=/usr/sbin/nft
|
|
||||||
delete table inet left4me_mark`, `Type=oneshot`,
|
|
||||||
`RemainAfterExit=yes`, `WantedBy=multi-user.target`.
|
|
||||||
- `deploy-test-server.sh` invokes `systemctl enable --now
|
|
||||||
left4me-nft-mark.service` (or equivalent at-deploy enabling step).
|
|
||||||
|
|
||||||
**CAKE artifacts:**
|
|
||||||
|
|
||||||
- `cake.env` template contains the literal lines `LEFT4ME_UPLINK_MBIT=`
|
|
||||||
and `LEFT4ME_UPLINK_IFACE=` (commented or uncommented; matched as
|
|
||||||
substring).
|
|
||||||
- `left4me-apply-cake` contains the literals `tc qdisc replace`, `cake`,
|
|
||||||
`bandwidth`, `internet`, `diffserv4`, `dual-dsthost`,
|
|
||||||
`LEFT4ME_UPLINK_MBIT`, `LEFT4ME_UPLINK_IFACE`.
|
|
||||||
- `left4me-apply-cake` is mode `0755` after deploy (asserted via the
|
|
||||||
same mechanism the existing helper-script tests use).
|
|
||||||
- `left4me-cake.service` contains
|
|
||||||
`EnvironmentFile=-/etc/left4me/cake.env`,
|
|
||||||
`ExecStart=/usr/local/libexec/left4me/left4me-apply-cake apply`,
|
|
||||||
`ExecStop=/usr/local/libexec/left4me/left4me-apply-cake clear`,
|
|
||||||
`Wants=network-online.target`, `Type=oneshot`,
|
|
||||||
`WantedBy=multi-user.target`.
|
|
||||||
- `deploy-test-server.sh` invokes `systemctl enable --now
|
|
||||||
left4me-cake.service`.
|
|
||||||
- `deploy-test-server.sh` copies `cake.env` only when target absent
|
|
||||||
(asserted by literal substring of the guarding `[ -e
|
|
||||||
/etc/left4me/cake.env ]` test or equivalent).
|
|
||||||
|
|
||||||
No runtime networking tests in v1. The artifacts are static; their
|
|
||||||
runtime behaviour requires a real iface and a real bandwidth load,
|
|
||||||
which the operator measures.
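
As a rough illustration of the substring-assertion pattern described above, the sketch below checks a unit file's text for each expected literal. The helper name `missing_literals` and the inline sample unit are assumptions made for this example; the real tests in `deploy/tests/test_deploy_artifacts.py` read the shipped files and may be organised differently.

```python
"""Sketch of the substring-assertion pattern for static deploy artifacts."""


def missing_literals(text: str, literals: list[str]) -> list[str]:
    """Return the literals that do not occur anywhere in text."""
    return [lit for lit in literals if lit not in text]


# Inline stand-in for left4me-cake.service (hypothetical content). A real test
# would read the file shipped under deploy/files/ instead.
SAMPLE_CAKE_UNIT = """\
[Unit]
Wants=network-online.target

[Service]
Type=oneshot
EnvironmentFile=-/etc/left4me/cake.env
ExecStart=/usr/local/libexec/left4me/left4me-apply-cake apply
ExecStop=/usr/local/libexec/left4me/left4me-apply-cake clear

[Install]
WantedBy=multi-user.target
"""

# The literals the spec lists for left4me-cake.service.
CAKE_UNIT_LITERALS = [
    "EnvironmentFile=-/etc/left4me/cake.env",
    "ExecStart=/usr/local/libexec/left4me/left4me-apply-cake apply",
    "ExecStop=/usr/local/libexec/left4me/left4me-apply-cake clear",
    "Wants=network-online.target",
    "Type=oneshot",
    "WantedBy=multi-user.target",
]


def test_cake_unit_contains_expected_literals() -> None:
    # Reporting the missing literals gives a more useful failure message than
    # a chain of bare `assert "x" in text` statements.
    assert missing_literals(SAMPLE_CAKE_UNIT, CAKE_UNIT_LITERALS) == []
```

Because every check is a plain substring match, the test does not depend on directive order or on which section of the unit a line appears in.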
|
|
||||||
|
|
||||||
## Rollout
|
|
||||||
|
|
||||||
Single deploy. After the new sysctl block lands, `sysctl --system`
|
|
||||||
applies it immediately (already in the deploy flow). The two new
|
|
||||||
systemd units start on `systemctl enable --now`; CAKE without a
|
|
||||||
configured `LEFT4ME_UPLINK_MBIT` logs a warning and no-ops, which is
|
|
||||||
the expected fresh-deploy state. The operator measures their uplink,
|
|
||||||
edits `/etc/left4me/cake.env`, and runs `systemctl restart
|
|
||||||
left4me-cake.service`.
|
|
||||||
|
|
||||||
Already-running game servers are unaffected by the network changes
|
|
||||||
themselves. The marking applies on every emitted packet from the moment
|
|
||||||
the nft rule loads; future-emitted packets pick up DSCP+priority without
|
|
||||||
restarting any srcds instance.
|
|
||||||
|
|
||||||
## Open questions
|
|
||||||
|
|
||||||
None blocking. v2 candidates if measurement justifies them:
|
|
||||||
|
|
||||||
- A `LEFT4ME_INGRESS_MBIT` knob that flips on the IFB ingress shaper as
|
|
||||||
a default, conditional on the env value being set.
|
|
||||||
- A `left4me-net-doctor` helper that reports current qdisc, applied
|
|
||||||
marks, and a one-shot saturation+ping measurement against a local
|
|
||||||
endpoint.
|
|
||||||
- A small Python wrapper in `l4d2host` that reads `cake.env` for
|
|
||||||
display in the web UI, so the operator sees in one place whether
|
|
||||||
shaping is active.
|
|
||||||
|
|
||||||
## References
|
|
||||||
|
|
||||||
- `tc-cake(8)` — keyword semantics: `bandwidth`, `internet`,
|
|
||||||
`diffserv4`, `dual-dsthost`, tin priority mapping.
|
|
||||||
- `systemd.network(5)` — `[CAKE]` section directives:
|
|
||||||
`Bandwidth=`, `OverheadKeyword=`, `PriorityQueueingPreset=`,
|
|
||||||
`EgressHostIsolation=`.
|
|
||||||
- `nft(8)` — `meta skuid`, `meta priority`, `ip dscp set`, table
|
|
||||||
isolation semantics.
|
|
||||||
- RFC 3246 — Expedited Forwarding (EF) PHB.
|
|
||||||
- Linux kernel `Documentation/networking/tcp_bbr.txt` — BBR pairs with
|
|
||||||
`fq` / `fq_codel` for correct pacing.
|
|
||||||
- `docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md`
|
|
||||||
— sibling spec; this spec extends `99-left4me.conf` and reuses the
|
|
||||||
same deploy-test-artifact pattern.
|
|
||||||
|
|
@ -1,96 +0,0 @@
|
||||||
# Profile page with self-service password change — design
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
The web app has login/logout (`l4d2web/auth.py`, `l4d2web/routes/auth_routes.py`) and admin user management (activate/deactivate/delete), but no way for a logged-in user to change their own password. The header (`l4d2web/templates/base.html:27`) renders the username as a non-clickable `<span class="muted">`.
|
|
||||||
|
|
||||||
This design adds a `/profile` page reachable by clicking the username, with "Change password" as its first (and only) section. Future profile fields can slot into the same page as new sections without rework.
|
|
||||||
|
|
||||||
## Goals
|
|
||||||
|
|
||||||
- Logged-in users can change their own password from a self-service page.
|
|
||||||
- Password changes follow an industry-standard session model: a successful change **invalidates every other active session for the user** and **keeps the current session signed in**, so the current browser needs no re-login.
|
|
||||||
- Single password policy enforced everywhere a password is set (web flow today, CLI `create-user` for consistency).
|
|
||||||
|
|
||||||
## Non-goals (out of scope for v1)
|
|
||||||
|
|
||||||
- Admin password reset for other users. A separate feature; no rework needed to add later.
|
|
||||||
- Password recovery / email reset flow.
|
|
||||||
- Other profile fields (display name, email, etc.). The page is structured to grow but ships with one section.
|
|
||||||
|
|
||||||
## Decisions
|
|
||||||
|
|
||||||
- **URLs.** `GET /profile` for the page, `POST /profile/password` for the form submission.
|
|
||||||
- **Form fields.** `current_password`, `new_password`, `confirm_new_password`. All three required.
|
|
||||||
- **Password policy.** Not empty, minimum 8 characters. Same rule applies to the CLI `create-user` so policy lives in one place.
|
|
||||||
- **Session policy.** Invalidate other sessions on success; keep the current session signed in.
|
|
||||||
- **Rate limit.** Per-IP, sliding window. Re-uses the same primitive as `/login`.
|
|
||||||
- **CSRF.** Standard hidden-token pattern shared with the rest of the app.
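
The per-IP sliding-window rate limit named in the decisions above can be sketched roughly as follows. This is not the contents of `l4d2web/services/rate_limit.py`; the class name `SlidingWindowLimiter`, its parameters, and the injectable clock are assumptions chosen for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` hits per key within the trailing `window` seconds."""

    def __init__(
        self,
        limit: int,
        window: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Record a hit for key and report whether it is within the limit."""
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True

    def reset(self) -> None:
        """Clear every bucket, e.g. between test cases."""
        self._hits.clear()
```

A route would call something like `limiter.allow(client_ip)` before doing any password work and reject the request when it returns `False`. Rejected attempts are not recorded in this sketch, so a blocked client is not pushed further out by retrying.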
|
|
||||||
|
|
||||||
## Session-invalidation mechanism
|
|
||||||
|
|
||||||
A new `password_changed_at: DateTime NOT NULL` column on `users`. Two checkpoints:
|
|
||||||
|
|
||||||
1. **On login.** `login_user` stashes `session["pw_changed_at"] = user.password_changed_at.isoformat()`.
|
|
||||||
2. **On every request.** `load_current_user` rejects the session — same shape as the existing `user.active` check — when the marker is missing, malformed, or strictly older than `user.password_changed_at`.
|
|
||||||
|
|
||||||
On successful password change:
|
|
||||||
|
|
||||||
- Rotate `user.password_digest` and bump `user.password_changed_at` to "now".
|
|
||||||
- Re-stamp `session["pw_changed_at"]` to the new value so this browser keeps working.
|
|
||||||
- Other browsers carry the old marker and get logged out the next time they hit a `@require_login` route.
|
|
||||||
|
|
||||||
This mirrors the established `g.user = None if not user.active else user` pattern, so the surface area added to the auth path is small and the behavior is easy to reason about.
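
The per-request freshness check described above might look roughly like this. The function name `session_is_fresh` is hypothetical, and the sketch assumes the marker is the ISO string stored at login and that both timestamps share the same timezone awareness.

```python
from datetime import datetime
from typing import Optional


def session_is_fresh(marker: Optional[str], password_changed_at: datetime) -> bool:
    """Return True when the session marker is not older than the password.

    A missing or malformed marker, or one strictly older than
    password_changed_at, means the session must be rejected.
    """
    if not marker:
        return False
    try:
        stamped = datetime.fromisoformat(marker)
    except (TypeError, ValueError):
        # Not a string, or not a parseable ISO timestamp.
        return False
    try:
        return stamped >= password_changed_at
    except TypeError:
        # Naive and aware datetimes cannot be ordered; treat as malformed.
        return False
```

`load_current_user` would then clear the user when this returns `False`, in the same place it already handles inactive users. If the database returns naive datetimes, the marker written at login has to be naive as well, otherwise every session is rejected.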
|
|
||||||
|
|
||||||
## Validation branches (POST /profile/password)
|
|
||||||
|
|
||||||
In order:
|
|
||||||
|
|
||||||
1. All three fields present → otherwise `error=fields_required`.
|
|
||||||
2. `new_password == confirm_new_password` → otherwise `error=mismatch`.
|
|
||||||
3. `validate_new_password(new_password)` passes → otherwise `error=empty` or `error=too_short`.
|
|
||||||
4. `verify_password(current_password, user.password_digest)` succeeds → otherwise `error=wrong_current`.
|
|
||||||
5. Rotate, re-stamp, redirect to `/profile?success=1`.
|
|
||||||
|
|
||||||
Errors are surfaced inline on the next render of `/profile` via a small `?error=<key>` → human-readable message map in the route. No flash storage required.
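
The ordered validation branches can be sketched as a pure function that returns the first matching error key. The return convention of `validate_new_password` (an error key or `None`), the helper name `password_change_error`, and the reading of "present" as "non-empty" are assumptions for this example; the source only fixes the error keys and their order.

```python
from typing import Callable, Optional

MIN_PASSWORD_LENGTH = 8


def validate_new_password(password: str) -> Optional[str]:
    """Return an error key, or None when the password satisfies the policy."""
    if not password:
        return "empty"
    if len(password) < MIN_PASSWORD_LENGTH:
        return "too_short"
    return None


def password_change_error(
    form: dict[str, str],
    current_password_ok: Callable[[str], bool],
) -> Optional[str]:
    """Walk the validation branches in order and return the first error key."""
    current = form.get("current_password", "")
    new = form.get("new_password", "")
    confirm = form.get("confirm_new_password", "")
    if not (current and new and confirm):
        return "fields_required"
    if new != confirm:
        return "mismatch"
    policy_error = validate_new_password(new)
    if policy_error is not None:
        return policy_error
    # Checked last, matching the order of the branches listed above.
    if not current_password_ok(current):
        return "wrong_current"
    return None
```

In the route, `current_password_ok` would wrap `verify_password(current_password, user.password_digest)`; a `None` result means the handler can rotate the digest, re-stamp the session, and redirect. Under the non-empty reading of branch 1, the `empty` key is reachable only through other callers of `validate_new_password`, such as the CLI.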
|
|
||||||
|
|
||||||
## Migration story
|
|
||||||
|
|
||||||
Adding `password_changed_at` to `users` requires a migration:
|
|
||||||
|
|
||||||
- Add the column nullable.
|
|
||||||
- Backfill existing rows with their `created_at` so historical data has a sane marker.
|
|
||||||
- Alter to `NOT NULL`.
|
|
||||||
|
|
||||||
Effect on existing live sessions: any cookie that predates the migration lacks `pw_changed_at` and is rejected on first request after deploy. Users log in once more. Acceptable for v1 deployment.
|
|
||||||
|
|
||||||
## Surface area
|
|
||||||
|
|
||||||
**New files**
|
|
||||||
- `l4d2web/alembic/versions/0009_user_password_changed_at.py`
|
|
||||||
- `l4d2web/services/rate_limit.py`
|
|
||||||
- `l4d2web/routes/profile_routes.py`
|
|
||||||
- `l4d2web/templates/profile.html`
|
|
||||||
- `l4d2web/tests/test_profile.py`
|
|
||||||
|
|
||||||
**Modified files**
|
|
||||||
- `l4d2web/models.py` — column.
|
|
||||||
- `l4d2web/auth.py` — `MIN_PASSWORD_LENGTH`, `validate_new_password`, `login_user` signature, freshness check in `load_current_user`.
|
|
||||||
- `l4d2web/routes/auth_routes.py` — pass marker to `login_user`; use the generic rate-limit helper.
|
|
||||||
- `l4d2web/templates/base.html` — username `<span>` → `<a href="/profile">`.
|
|
||||||
- `l4d2web/app.py` — register the new blueprint, reset its rate-limit bucket in TESTING.
|
|
||||||
- `l4d2web/cli.py` — apply `validate_new_password` for parity with the web flow.
|
|
||||||
|
|
||||||
## Reused utilities
|
|
||||||
|
|
||||||
- `hash_password`, `verify_password` — `l4d2web/auth.py`
|
|
||||||
- `require_login` — `l4d2web/auth.py`
|
|
||||||
- `session_scope` — `l4d2web/db.py`
|
|
||||||
- `now_utc` — `l4d2web/models.py`
|
|
||||||
- CSRF hidden-token pattern — see `templates/admin_users.html`, `routes/auth_routes.py`
|
|
||||||
|
|
||||||
## Open questions resolved during brainstorming
|
|
||||||
|
|
||||||
- *Should the current session also be invalidated?* No. Industry consensus (Django `update_session_auth_hash`, Rails Devise, GitHub/Google behaviour, OWASP / NIST SP 800-63B implications) is to keep the current session and invalidate the others. Forcing re-login on a session that just proved knowledge of the current password adds friction without security gain.
|
|
||||||
- *Should we add `password_changed_at` or use a `session_version` counter?* Timestamp is enough; the comparison is unambiguous and avoids an extra integer field with arbitrary meaning.
|
|
||||||
- *Admin reset?* Deferred. The current model has no rework debt for adding it later.
|
|
||||||
|
|
@ -1,326 +0,0 @@
|
||||||
# Workshop Auto-Download — Design
|
|
||||||
|
|
||||||
## Problem
|
|
||||||
|
|
||||||
When a user adds workshop items to an overlay (`POST /overlays/{id}/items`), the route saves `WorkshopItem` metadata and enqueues a `build_overlay` job. The build symlinks already-cached `.vpk` files and emits `skipped: not yet downloaded` to stderr for everything else. The only thing that actually pulls bytes from Steam is the admin-only `refresh_workshop_items` job, which is a global mutex blocking all server starts, all builds, and installs.
|
|
||||||
|
|
||||||
In practice, this means freshly-added items never appear in the overlay until an admin presses a button. That isn't workable.
|
|
||||||
|
|
||||||
## Goals
|
|
||||||
|
|
||||||
1. Newly added items get downloaded without admin action.
|
|
||||||
2. Items that authors update on Steam get re-downloaded automatically on a daily cadence.
|
|
||||||
3. Overlay owners can manually re-check / re-pull their own overlay's items.
|
|
||||||
|
|
||||||
## Non-Goals
|
|
||||||
|
|
||||||
See "Out of Scope" at the end. In particular: the `refresh_workshop_items` global mutex stays; there is no cache GC; no per-item retry inside `download_to_cache`; no update-aware server-restart prompt.
|
|
||||||
|
|
||||||
## Architecture
|
|
||||||
|
|
||||||
Three changes layered onto the existing scheduler. None introduce a new job type or new scheduler rule.
|
|
||||||
|
|
||||||
```
┌─────────────────────────────────────────────────────────────────────┐
│ User adds items                                                     │
│ POST /overlays/{id}/items                                           │
│   ↳ fetch metadata batch (mode=add)                                 │
│   ↳ upsert WorkshopItem rows                                        │
│   ↳ enqueue_build_overlay  ◀── already happens today                │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│ build_overlay job (per-overlay; not a global mutex)                 │
│ WorkshopBuilder.build():                                            │
│   1. query overlay's items                                          │
│   2. for each item where cache miss / stale:  ◀── NEW               │
│        download_to_cache(meta) with retry+backoff                   │
│        stamp WorkshopItem.last_downloaded_at                        │
│   3. apply symlinks (existing logic)                                │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│ Owner re-checks one overlay                                         │
│ POST /overlays/{id}/refresh  ◀── NEW                                │
│   ↳ fetch metadata batch for this overlay only (mode=refresh)       │
│   ↳ update WorkshopItem rows                                        │
│   ↳ enqueue_build_overlay (does the download)                       │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│ Daily global update                                                 │
│ systemd timer → l4d2web workshop-refresh CLI  ◀── NEW               │
│   ↳ inserts Job(operation='refresh_workshop_items')                 │
│   ↳ worker picks it up; existing global-mutex rule still applies    │
│   ↳ existing _run_refresh_workshop_items code unchanged             │
└─────────────────────────────────────────────────────────────────────┘
```
|
|
||||||
|
|
||||||
Key invariant: **on-add downloads run inside the per-overlay `build_overlay` job, so they do not block server starts globally.** Only the daily global refresh keeps the existing global-mutex semantics.
|
|
||||||
|
|
||||||
## Component 1 — Auto-download inside `WorkshopBuilder.build`
|
|
||||||
|
|
||||||
The builder gets a new download phase between "query items" and "apply symlinks". Today's behavior (skip-uncached with stderr warning) is replaced.
|
|
||||||
|
|
||||||
### Decision logic
|
|
||||||
|
|
||||||
For each item bound to the overlay:
|
|
||||||
|
|
||||||
1. **Skip with warning** if `file_url == ""` (Steam returned `result != 1` last time we asked — delisted, private, or hidden). Emit one stderr line `workshop item {steam_id} skipped: no file_url (steam result: {last_error})`. Do **not** fail the build — these items quietly fall out of the symlink set because they never produce a cache file. An owner can investigate via the overlay detail page where `last_error` is shown.
|
|
||||||
2. Otherwise, **download** when any of:
|
|
||||||
- `last_downloaded_at IS NULL`, or
|
|
||||||
- cache file `{steam_id}.vpk` missing, or
|
|
||||||
- cache file `(mtime, size)` doesn't match `(time_updated, file_size)` from the row.
|
|
||||||
3. Otherwise, leave the item alone (its cache file is current).
|
|
||||||
|
|
||||||
`steam_workshop.download_to_cache` already does the `(mtime, size)` check internally and short-circuits when the cache is current, so the builder can call it unconditionally for items in the "maybe download" set and trust the helper for idempotence.
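
The three-way decision above can be sketched as a small pure function over one row and the cache directory. `ItemRow` is a hypothetical stand-in for the `WorkshopItem` columns involved, `decide` is an illustrative name, and `time_updated` is assumed to be a Unix timestamp in whole seconds.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import Optional


@dataclass
class ItemRow:
    """Hypothetical stand-in for the WorkshopItem columns the decision reads."""

    steam_id: str
    file_url: str
    file_size: int
    time_updated: int  # assumed: Unix timestamp in seconds
    last_downloaded_at: Optional[datetime] = None


def decide(item: ItemRow, cache_root: Path) -> str:
    """Return 'skip', 'download', or 'cached' for one overlay item."""
    if item.file_url == "":
        # Steam cannot serve this item; warn and leave it out of the build.
        return "skip"
    if item.last_downloaded_at is None:
        return "download"
    cache_file = cache_root / f"{item.steam_id}.vpk"
    try:
        st = cache_file.stat()
    except FileNotFoundError:
        return "download"
    if (int(st.st_mtime), st.st_size) != (item.time_updated, item.file_size):
        # Cache file exists but belongs to an older (or truncated) version.
        return "download"
    return "cached"
```

Since `download_to_cache` already short-circuits on a current cache file, the builder could treat `'cached'` and `'download'` identically and use a function like this only for logging and the summary counts.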
|
|
||||||
|
|
||||||
### Stamping
|
|
||||||
|
|
||||||
- On success per item: `WorkshopItem.last_downloaded_at = now()`, `last_error = ""`.
|
|
||||||
- On failure per item (after retry exhaustion): `last_error` records the final exception string; the builder raises → `last_build_status='failed'`.
|
|
||||||
|
|
||||||
### What the builder does NOT do
|
|
||||||
|
|
||||||
It does not fetch fresh Steam metadata. Metadata is the responsibility of the add route, the per-overlay refresh route, and the daily refresh job. The builder is a pure function of DB state — this keeps it cheap and predictable, and lets builds run without any outbound metadata call.
|
|
||||||
|
|
||||||
### Concurrency
|
|
||||||
|
|
||||||
Items are downloaded sequentially within one builder run. Different overlays' builds run in parallel under existing scheduler rules; when two overlays share an item and race, the existing `download_to_cache` idempotence handles it — the loser sees a fresh file and skips. `last_downloaded_at` writes from two concurrent builds collapse to one timestamp; no real race.
|
|
||||||
|
|
||||||
### Cancellation
|
|
||||||
|
|
||||||
The builder threads `should_cancel` into `download_to_cache` (the helper already accepts it). A cancel that arrives mid-download deletes the `.partial` file, and the symlink phase does not run. A cancel that arrives during the inter-attempt sleep is noticed within ~250 ms (see the retry section).
|
|
||||||
|
|
||||||
### Logging
|
|
||||||
|
|
||||||
Each item's download start / finish / error emits one line. Counts are reported in the existing summary line:
|
|
||||||
|
|
||||||
```
|
|
||||||
workshop overlay 'mycollection': downloaded=3 cached=12 skipped=1 created=14 removed=1 unchanged=11 errors=0
|
|
||||||
```
|
|
||||||
|
|
||||||
`skipped` now means "Steam can't serve this item (no file_url)" instead of the old "uncached" meaning. Uncached items get downloaded.
|
|
||||||
|
|
||||||
## Component 2 — Retry & backoff
|
|
||||||
|
|
||||||
Wraps each `download_to_cache(meta, ...)` call inside the builder.
|
|
||||||
|
|
||||||
```python
import requests

ATTEMPTS = 3
DELAYS = [1, 2, 4]  # seconds, exponential; slept between attempts
                    # (with 3 attempts only the first two entries are used)

# meta, cache_root, should_cancel and on_stderr come from the enclosing builder.
for n in range(1, ATTEMPTS + 1):
    try:
        download_to_cache(meta, cache_root, should_cancel=should_cancel)
        break
    except InterruptedError:  # cancellation; must precede OSError, its base class
        raise  # propagate immediately
    except (requests.RequestException, OSError) as exc:
        if n == ATTEMPTS:
            raise  # final attempt: bubble up → job fails
        on_stderr(f"workshop {meta.steam_id} attempt {n}/{ATTEMPTS} failed: {exc}")
        sleep_with_cancel(DELAYS[n - 1], should_cancel)
```
|
|
||||||
|
|
||||||
### Notes
|
|
||||||
|
|
||||||
- `sleep_with_cancel` is a small helper that polls `should_cancel` every ~250 ms during the sleep so a cancel does not wait out the full backoff window.
|
|
||||||
- The retry loop lives in the builder (`overlay_builders.py`), not in `steam_workshop.download_to_cache`. The downloader stays a single-shot primitive; retry policy is a caller concern. Keeps the helper testable without time-mocking.
|
|
||||||
- HTTP 4xx responses raised by `raise_for_status()` are `requests.HTTPError` (a `RequestException`), so they are retried too. That is intentional — 404 / 410 will fail three times quickly and surface; the cost of three failed attempts is negligible compared to the cost of users having to guess why a single transient blip killed the job.
|
|
||||||
- On final failure the job fails with the per-item error string and overlay `last_build_status='failed'`, matching the existing "never silently mount a partial overlay" rule.
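
A minimal sketch of the `sleep_with_cancel` helper described in the notes above, assuming `should_cancel` is a zero-argument callable. The boolean return value and the `poll_interval` parameter are assumptions; the source only specifies the ~250 ms polling behaviour.

```python
import time
from typing import Callable


def sleep_with_cancel(
    seconds: float,
    should_cancel: Callable[[], bool],
    poll_interval: float = 0.25,
) -> bool:
    """Sleep for up to `seconds`, polling should_cancel every poll_interval.

    Returns True if the full delay elapsed and False if a cancel cut it short.
    """
    deadline = time.monotonic() + seconds
    while True:
        if should_cancel():
            return False
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return True
        # Never sleep past the deadline, and never longer than one poll tick.
        time.sleep(min(poll_interval, remaining))
```

Using `time.monotonic()` keeps the deadline stable if the wall clock is adjusted during the backoff. A caller that wants cancellation to abort the retry loop immediately could raise `InterruptedError` when this returns `False`.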
|
|
||||||
|
|
||||||
## Component 3 — Per-overlay refresh
|
|
||||||
|
|
||||||
New route `POST /overlays/{id}/refresh`. Mirrors the add route's metadata-fetch path but scoped to the items already in this overlay.
|
|
||||||
|
|
||||||
### Route sketch
|
|
||||||
|
|
||||||
```python
@bp.post("/overlays/<int:overlay_id>/refresh")
@require_login
def refresh_overlay(overlay_id: int) -> Response:
    user = current_user()
    with session_scope() as db:
        overlay, err = _check_workshop_overlay_access(overlay_id, user, db)
        if err is not None: return err
        steam_ids = db.scalars(
            select(WorkshopItem.steam_id)
            .join(OverlayWorkshopItem, OverlayWorkshopItem.workshop_item_id == WorkshopItem.id)
            .where(OverlayWorkshopItem.overlay_id == overlay_id)
        ).all()

    if not steam_ids:
        return Response("overlay has no items", status=400)

    try:
        metas = steam_workshop.fetch_metadata_batch(steam_ids, mode="refresh")
    except Exception as exc:
        return Response(f"steam api error: {exc}", status=502)

    with session_scope() as db:
        overlay, err = _check_workshop_overlay_access(overlay_id, user, db)
        if err is not None: return err
        metas_by_id = {m.steam_id: m for m in metas}
        for steam_id in steam_ids:
            wi = db.scalar(select(WorkshopItem).where(WorkshopItem.steam_id == steam_id))
            meta = metas_by_id.get(steam_id)
            if wi is None: continue
            if meta is None:
                wi.last_error = "steam returned no entry for this item"
                continue
            wi.title = meta.title
            wi.filename = meta.filename
            wi.file_url = meta.file_url
            wi.file_size = meta.file_size
            wi.time_updated = meta.time_updated
            wi.preview_url = meta.preview_url
            wi.last_error = "" if meta.result == 1 else f"steam result {meta.result}"
        job = enqueue_build_overlay(db, overlay_id=overlay_id, user_id=user.id)
        job_id = job.id
    return redirect(f"/jobs/{job_id}")
```
|
|
||||||
|
|
||||||
### Behavior notes
|
|
||||||
|
|
||||||
- Permission: same `_check_workshop_overlay_access` used by add/remove — owner or admin.
|
|
||||||
- `mode="refresh"` (not `"add"`): non-L4D2 items silently drop from the batch instead of raising. An item whose `consumer_app_id` somehow changed after add will not break refresh.
|
|
||||||
- The metadata write does **not** stamp `last_downloaded_at`. That field stays bound to actual file presence — the builder's download phase stamps it after the bytes land. A refresh that finds `time_updated` advanced therefore leaves `last_downloaded_at` pointing at the prior version; the `(mtime, size)` check in `download_to_cache` sees the mismatch and the builder re-downloads. Correct by construction.
|
|
||||||
- One Steam metadata POST per click, owner-gated. No new rate-limit concern.
|
|
||||||
|
|
||||||
### UI
|
|
||||||
|
|
||||||
A "Refresh" button next to "Add items" on the overlay detail page (workshop type only). Submits the POST; redirects to the job page like everything else.
|
|
||||||
|
|
||||||
## Component 4 — Periodic global refresh (CLI + systemd timer)
|
|
||||||
|
|
||||||
The existing `_run_refresh_workshop_items` job is complete and correct — it fetches all metadata, downloads what advanced, re-enqueues `build_overlay` for affected overlays. We only need a way to enqueue it on a schedule.
|
|
||||||
|
|
||||||
### CLI subcommand
|
|
||||||
|
|
||||||
In `l4d2web/cli.py`:
|
|
||||||
|
|
||||||
```python
@cli.command("workshop-refresh")
def workshop_refresh() -> None:
    """Enqueue a global workshop refresh job. Idempotent: if one is already
    queued or running, prints its id and exits 0."""
    with session_scope() as db:
        existing = db.scalar(
            select(Job).where(
                Job.operation == "refresh_workshop_items",
                Job.state.in_(("queued", "running", "cancelling")),
            ).order_by(Job.id.desc()).limit(1)
        )
        if existing is not None:
            click.echo(f"refresh_workshop_items job {existing.id} already {existing.state}")
            return
        job = Job(
            user_id=None,
            server_id=None,
            operation="refresh_workshop_items",
            state="queued",
        )
        db.add(job)
        db.flush()
        click.echo(f"enqueued refresh_workshop_items job {job.id}")
```
|
|
||||||
|
|
||||||
### Schema follow-up
|
|
||||||
|
|
||||||
`Job.user_id = None` for system-enqueued refreshes. The implementation plan must verify whether the column is currently nullable; if it is `NOT NULL`, the plan either (a) relaxes it to nullable (preferred — "system" is a real category) or (b) records the lowest-id admin user as the actor. The design assumes (a).
|
|
||||||
|
|
||||||
### systemd units in `deploy/`
|
|
||||||
|
|
||||||
```ini
# left4me-workshop-refresh.service
[Unit]
Description=Left4me — enqueue daily workshop refresh
After=network-online.target left4me-web.service
Requires=left4me-web.service

[Service]
Type=oneshot
User=left4me
ExecStart=/opt/left4me/bin/l4d2web workshop-refresh
```
|
|
||||||
|
|
||||||
```ini
# left4me-workshop-refresh.timer
[Unit]
Description=Left4me — daily workshop refresh

[Timer]
OnCalendar=*-*-* 04:00:00
Persistent=true
RandomizedDelaySec=15min

[Install]
WantedBy=timers.target
```
|
|
||||||
|
|
||||||
### Operator notes
|
|
||||||
|
|
||||||
- The timer enqueues; the worker decides when to actually run. The existing scheduler will defer the refresh if a server start, install, or build is in progress. Worst case the refresh starts after the conflicting job finishes — the intended behavior.
|
|
||||||
- `Persistent=true` handles "host was down at 04:00" — the unit runs on next boot. The CLI's idempotence check prevents pile-up if it fires twice.
|
|
||||||
- Deployment wires this into the existing `deploy/` install flow (in scope for the implementation plan).
|
|
||||||
|
|
||||||
## Testing
|
|
||||||
|
|
||||||
Layered against the existing test files. No new test infrastructure.
|
|
||||||
|
|
||||||
### `tests/test_overlay_builders.py` — bulk of new coverage
|
|
||||||
|
|
||||||
- `test_workshop_build_downloads_uncached_items` — item with `last_downloaded_at=None` and no cache file → patched `download_to_cache` is called → file appears → symlink created → `last_downloaded_at` stamped.
|
|
||||||
- `test_workshop_build_skips_already_cached_items` — item with cache file matching `(time_updated, size)` → `download_to_cache` returns immediately (its existing idempotence) → no network → symlink created.
|
|
||||||
- `test_workshop_build_redownloads_stale_cache` — cache file exists but `(mtime, size)` mismatches the DB row → re-download happens.
|
|
||||||
- `test_workshop_build_retry_succeeds` — patched downloader fails twice then succeeds → builder finishes ok, retry messages on stderr, `last_downloaded_at` stamped. Backoff sleep monkey-patched to zero for speed.
|
|
||||||
- `test_workshop_build_retry_exhausted_fails_job` — downloader fails all three attempts → builder raises → `last_build_status='failed'`, `last_error` populated on the WorkshopItem.
|
|
||||||
- `test_workshop_build_cancellation_during_download` — `should_cancel` flips true mid-download → builder returns early, `.partial` cleaned up by `download_to_cache`, symlink phase did not run.
|
|
||||||
- `test_workshop_build_cancellation_during_backoff` — cancel flips true while sleeping between retries → wakes up within ~250 ms of the cancel.
|
|
||||||
- `test_workshop_build_skips_items_with_no_file_url` — item with `file_url=""` and `last_error="steam result 9"` → builder writes one stderr line, does NOT call `download_to_cache`, build succeeds with `last_build_status='ok'`, item is absent from the symlink set.
|
|
||||||
|
|
||||||
### `tests/test_workshop_routes.py` — new per-overlay refresh route
|
|
||||||
|
|
||||||
- `test_overlay_refresh_owner_allowed` — owner POST → `fetch_metadata_batch` called with exactly that overlay's steam_ids → WorkshopItem rows updated → `build_overlay` enqueued → 302 to /jobs/{id}.
|
|
||||||
- `test_overlay_refresh_other_user_forbidden` — non-owner non-admin → 403.
|
|
||||||
- `test_overlay_refresh_admin_can_refresh_any` — admin POST on someone else's overlay → 302 to /jobs/{id}.
|
|
||||||
- `test_overlay_refresh_steam_api_error_502` — `fetch_metadata_batch` raises → response is 502, no job enqueued.
|
|
||||||
- `test_overlay_refresh_empty_overlay_400` — overlay has no items → 400, no Steam call.
|
|
||||||
- `test_overlay_refresh_drops_missing_items_gracefully` — Steam returns nothing for one ID → that row gets `last_error="steam returned no entry…"`, build still enqueued.
|
|
||||||
|
|
||||||
### `tests/test_cli.py` — new CLI subcommand
|
|
||||||
|
|
||||||
- `test_workshop_refresh_enqueues_job` — CLI invocation inserts a queued `Job(operation='refresh_workshop_items')` and prints its id.
|
|
||||||
- `test_workshop_refresh_idempotent_when_queued` — pre-existing queued/running refresh job → second invocation prints the existing id and does not insert a duplicate.
|
|
||||||
|
|
||||||
### `tests/test_job_worker.py`
|
|
||||||
|
|
||||||
No new tests. Scheduler rules and `_run_refresh_workshop_items` are unchanged. Existing coverage holds.
|
|
||||||
|
|
||||||
### Out of test scope
|
|
||||||
|
|
||||||
The systemd timer. Validating it requires a host; smoke it on the dev host post-deploy.
|
|
||||||
|
|
||||||
## Out of Scope
|
|
||||||
|
|
||||||
- **Replacing the global mutex on `refresh_workshop_items`.** Daily refresh still blocks server starts/builds during its run. Scheduled at 04:00 with `Persistent=true`; revisit only if it observably hurts.
|
|
||||||
- **Per-item retry policy in `download_to_cache`.** Retry stays in the builder.
|
|
||||||
- **Cache GC.** Cache still grows monotonically — same as the v1 spec.
|
|
||||||
- **Steam API rate-limit handling for the metadata endpoint.** No backoff for metadata calls. Retries apply only to per-item file downloads.
|
|
||||||
- **Update-aware server restart UX.** When the daily refresh re-downloads an item mounted by a running server, the running server keeps its old mount. Notifying the user / offering a "restart to pick up updates" prompt stays in the backlog.
|
|
||||||
- **Per-overlay refresh on non-workshop overlay types.** Only workshop overlays get the Refresh button.
|
|
||||||
|
|
||||||
## Affected Files
|
|
||||||
|
|
||||||
Implementation will touch roughly:
|
|
||||||
|
|
||||||
- `l4d2web/services/overlay_builders.py` — WorkshopBuilder download phase, retry helper.
|
|
||||||
- `l4d2web/routes/workshop_routes.py` — new `/overlays/{id}/refresh` route.
|
|
||||||
- `l4d2web/templates/...` — Refresh button on overlay detail page.
|
|
||||||
- `l4d2web/cli.py` — new `workshop-refresh` subcommand.
|
|
||||||
- `l4d2web/models.py` and `alembic/versions/...` — possibly relax `Job.user_id` to nullable (TBD per schema check).
|
|
||||||
- `deploy/` — systemd `.service` + `.timer` units, wired into the install flow.
|
|
||||||
- `l4d2web/tests/test_overlay_builders.py`, `test_workshop_routes.py`, `test_cli.py` — new test cases per the testing section.
|
|
||||||
|
|
||||||
The implementation plan will turn these into ordered steps with explicit checkpoints.
|
|
||||||
|
|
@ -1,396 +0,0 @@
|
||||||
# Server live-state display (counts, map, roster, avatars, history)
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
The l4d2web UI currently shows systemd lifecycle state per game server (running/stopped/unknown) but nothing about what's happening *inside* the game: player count, current map, whether the server is hibernating, who is connected. To know any of that, users have to context-switch (open the game, query externally).
|
|
||||||
|
|
||||||
The goal is a **read-side live-state display**: counts + map + hibernating on the server list, plus a server-detail panel showing the current player roster (avatars + names) and a "recent players" section for who's been on lately. Backed by a persistent history table so we get count-over-time graphs and player-presence history (foundation for future ban UX) for free.
|
|
||||||
|
|
||||||
**Source: RCON exclusively.** A2S_INFO (UDP, anonymous) was investigated and discarded — it can't deliver Steam IDs, hibernating flag, or interactive commands, so anything beyond raw counts re-routes through RCON anyway. Both transports were verified working against prod `left4.me`. Going RCON-only means one transport, one set of tests, no throwaway scaffolding.
|
|
||||||
|
|
||||||
**Avatars: Steam Web API.** RCON gives Steam IDs; `ISteamUser/GetPlayerSummaries` resolves them to persona names + avatar URLs hot-linked from Steam's CDN. API key already obtained.
|
|
||||||
|
|
||||||
**Commands are deferred** to a separate plan. This plan is read-only.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Architecture
|
|
||||||
|
|
||||||
```
|
|
||||||
┌─────────────────────────────┐
|
|
||||||
│ left4me-web (Flask) │
|
|
||||||
┌──────────────┐ RCON │ ┌───────────────────────┐ │
|
|
||||||
│ srcds 27016 │◄──────┼──┤ live-state poller │ │
|
|
||||||
└──────────────┘ TCP │ │ (daemon thread) │ │
|
|
||||||
│ └───────┬───────────────┘ │
|
|
||||||
┌──────────────┐ RCON │ │ writes │
|
|
||||||
│ srcds 27021 │◄──────┤ ▼ │
|
|
||||||
└──────────────┘ │ ┌───────────────────────┐ │
|
|
||||||
│ │ server_live_state │ │
|
|
||||||
Steam Web API │ │ server_player_session │ │
|
|
||||||
┌────────────┐ │ │ steam_user_profile │ │
|
|
||||||
│ Steam CDN │◄─┼──┤ │ │
|
|
||||||
│ avatars... │ │ └───────┬───────────────┘ │
|
|
||||||
└────────────┘ │ │ reads │
|
|
||||||
▲ │ ▼ │
|
|
||||||
│ │ ┌───────────────────────┐ │
|
|
||||||
└────────┼──┤ /servers, /servers/N │ │
|
|
||||||
<img src=...> │ │ (HTMX 5s refresh) │ │
|
|
||||||
│ └───────────────────────┘ │
|
|
||||||
└─────────────────────────────┘
|
|
||||||
```
|
|
||||||
|
|
||||||
Single daemon thread (modeled on the existing `start_state_poller` in `l4d2web/services/job_worker.py:617-647`), inside the Flask process, polls every `LIVE_STATE_POLL_SECONDS` (default 5). Per poll, per running server with a configured RCON password:
|
|
||||||
|
|
||||||
1. TCP connect to `127.0.0.1:<port>`, auth, send `status`, parse response.
|
|
||||||
2. Compare server-level state (players/map/hibernating/etc.) to the latest `server_live_state` row for this server. If unchanged, bump `last_seen_at`. If changed, insert a new row.
|
|
||||||
3. Reconcile open sessions (`server_player_session` rows where `left_at IS NULL`) with the current `status` roster: open new sessions for new players (backfilling `joined_at` from RCON's `connected` field), close sessions for players no longer present, update `min_ping`/`max_ping` for continuing sessions.
|
|
||||||
4. Collect Steam IDs that are missing from `steam_user_profile` or have `fetched_at` older than 24h; batch them into a single `GetPlayerSummaries` call; upsert results.
|
|
||||||
5. Trim `server_live_state` and closed sessions older than retention.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Schema (one new alembic migration)
|
|
||||||
|
|
||||||
### New column: `servers.rcon_password`
|
|
||||||
|
|
||||||
```python
|
|
||||||
rcon_password: Mapped[str] = mapped_column(
|
|
||||||
String(64), nullable=False, default="", server_default=""
|
|
||||||
)
|
|
||||||
```
|
|
||||||
|
|
||||||
Empty string = "no password configured yet" (poller skips). Migration backfills every existing row with `secrets.token_urlsafe(32)` (~43 chars, URL-safe character set so the literal `"..."` cfg-quoting needs no escaping).
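
The backfill data step could look roughly like the sketch below. It uses a plain `sqlite3` connection and a hypothetical helper name (`backfill_rcon_passwords`) purely for illustration; the real migration would run inside alembic and follow whatever pattern the existing migrations use.

```python
import secrets
import sqlite3


def backfill_rcon_passwords(conn: sqlite3.Connection) -> int:
    """Give every server row that still has an empty password its own random one."""
    # Materialize the ids first so we do not update while iterating the cursor.
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM servers WHERE rcon_password = ''")]
    for server_id in ids:
        # 32 random bytes encode to 43 URL-safe characters (A-Z, a-z, 0-9, '-', '_'),
        # so the value never contains a double quote.
        conn.execute(
            "UPDATE servers SET rcon_password = ? WHERE id = ?",
            (secrets.token_urlsafe(32), server_id),
        )
    return len(ids)


# Demo against an in-memory table (simplified schema, assumed for the example).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE servers ("
    "id INTEGER PRIMARY KEY, name TEXT, "
    "rcon_password VARCHAR(64) NOT NULL DEFAULT '')"
)
conn.executemany("INSERT INTO servers (name) VALUES (?)", [("alpha",), ("beta",)])
updated = backfill_rcon_passwords(conn)
passwords = [row[0] for row in conn.execute("SELECT rcon_password FROM servers")]
```

Generating the token per row (rather than once for the whole table) keeps each server's password independent.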
|
|
||||||
|
|
||||||
### `server_live_state` — run-length-encoded snapshots
|
|
||||||
|
|
||||||
```sql
|
|
||||||
CREATE TABLE server_live_state (
|
|
||||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
|
||||||
server_id INTEGER NOT NULL REFERENCES servers(id) ON DELETE CASCADE,
|
|
||||||
started_at DATETIME NOT NULL, -- when this exact state first appeared
|
|
||||||
last_seen_at DATETIME NOT NULL, -- most recent poll where it still held
|
|
||||||
players INTEGER NOT NULL,
|
|
||||||
max_players INTEGER NOT NULL,
|
|
||||||
bots INTEGER NOT NULL,
|
|
||||||
map VARCHAR(64) NOT NULL,
|
|
||||||
hibernating BOOLEAN NOT NULL
|
|
||||||
);
|
|
||||||
CREATE INDEX ix_sls_server_started ON server_live_state(server_id, started_at DESC);
|
|
||||||
```
|
|
||||||
|
|
||||||
- "State" = the tuple `(players, max_players, bots, map, hibernating)`. Ping/loss are deliberately not stored at server-level, so they don't churn rows.
|
|
||||||
- Idle hibernating server collapses from one-row-per-poll to one-row-per-state-change (≈17,280× compression for a 24h-idle server).
|
|
||||||
- Latest snapshot for a server: `ORDER BY started_at DESC LIMIT 1`. UI staleness check: `last_seen_at > now - LIVE_STATE_STALE_SECONDS` (default 30).
|
|
||||||
- Retention: trim rows where `last_seen_at < now - LIVE_STATE_HISTORY_DAYS` (default 30).
|
|
||||||
- Failed polls produce no DB write; the staleness check on `last_seen_at` handles UI degradation cleanly.
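
A minimal sketch of the run-length-encoded upsert described above, using `sqlite3` directly. The helper name `record_state`, the integer timestamps, and the trimmed-down table (no foreign key) are assumptions made to keep the example self-contained.

```python
import sqlite3


def record_state(conn, server_id, state, now):
    """Bump last_seen_at if the newest row holds the same state, else insert a new row.

    `state` is the tuple (players, max_players, bots, map, hibernating).
    """
    latest = conn.execute(
        "SELECT id, players, max_players, bots, map, hibernating "
        "FROM server_live_state WHERE server_id = ? "
        "ORDER BY started_at DESC LIMIT 1",
        (server_id,),
    ).fetchone()
    if latest is not None and tuple(latest[1:]) == tuple(state):
        conn.execute(
            "UPDATE server_live_state SET last_seen_at = ? WHERE id = ?",
            (now, latest[0]),
        )
        return "bumped"
    conn.execute(
        "INSERT INTO server_live_state "
        "(server_id, started_at, last_seen_at, players, max_players, bots, map, hibernating) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (server_id, now, now, *state),
    )
    return "inserted"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE server_live_state ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, server_id INTEGER NOT NULL, "
    "started_at DATETIME NOT NULL, last_seen_at DATETIME NOT NULL, "
    "players INTEGER NOT NULL, max_players INTEGER NOT NULL, bots INTEGER NOT NULL, "
    "map VARCHAR(64) NOT NULL, hibernating BOOLEAN NOT NULL)"
)

idle = (0, 4, 0, "c1m1_hotel", True)
active = (1, 4, 0, "c1m1_hotel", False)
# Three idle polls 5 seconds apart, then a player joins.
outcomes = [record_state(conn, 1, idle, t) for t in (0, 5, 10)]
outcomes.append(record_state(conn, 1, active, 15))
rows = conn.execute(
    "SELECT started_at, last_seen_at, players FROM server_live_state ORDER BY started_at"
).fetchall()

# A server idle for 24h is polled this many times at the default 5s interval,
# yet produces a single row.
polls_per_idle_day = 24 * 3600 // 5
```

Four polls yield two rows here: one idle interval whose `last_seen_at` was bumped twice, and one new row for the state change.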
|
|
||||||
|
|
||||||
### `server_player_session` — interval per connection
|
|
||||||
|
|
||||||
```sql
|
|
||||||
CREATE TABLE server_player_session (
|
|
||||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
|
||||||
server_id INTEGER NOT NULL REFERENCES servers(id) ON DELETE CASCADE,
|
|
||||||
steam_id_64 VARCHAR(20) NOT NULL,
|
|
||||||
joined_at DATETIME NOT NULL,
|
|
||||||
left_at DATETIME NULL, -- NULL = currently in-game
|
|
||||||
name_at_join VARCHAR(64) NOT NULL,
|
|
||||||
min_ping INTEGER NOT NULL,
|
|
||||||
max_ping INTEGER NOT NULL
|
|
||||||
);
|
|
||||||
CREATE INDEX ix_sps_server_open ON server_player_session(server_id, left_at);
|
|
||||||
CREATE INDEX ix_sps_steam_history ON server_player_session(steam_id_64, joined_at);
|
|
||||||
```
|
|
||||||
|
|
||||||
- `joined_at` is **backfilled from RCON's `connected` duration** on first sighting (`joined_at = now - connected_seconds`). This heals brief polling gaps and survives web restarts: even if we just started polling, we know when the still-connected players actually joined.
|
|
||||||
- A player who disconnects and rejoins gets two rows, not one merged interval.
|
|
||||||
- Bots are excluded — rows with a non-`STEAM_X:Y:Z` uniqueid are skipped.
|
|
||||||
- `min_ping`/`max_ping` updated only when a new poll pushes the range, to avoid noise writes.
|
|
||||||
- On poller startup, close any open sessions that the current RCON output does not confirm (the server is unreachable, or the player is no longer in its roster). In addition, close sessions after N consecutive failed polls of their server (constant TBD during implementation, e.g. 6 polls = ~30s).
|
|
||||||
- Retention: trim closed sessions where `left_at < now - LIVE_STATE_HISTORY_DAYS` (default 30; the same key that governs snapshot retention). Open sessions are never trimmed.
|
|
||||||
|
|
||||||
### `steam_user_profile` — cached profile data (24h TTL)
|
|
||||||
|
|
||||||
```sql
|
|
||||||
CREATE TABLE steam_user_profile (
|
|
||||||
steam_id_64 VARCHAR(20) PRIMARY KEY,
|
|
||||||
persona_name VARCHAR(64) NOT NULL,
|
|
||||||
avatar_url TEXT NOT NULL, -- avatarmedium from Steam Web API
|
|
||||||
fetched_at DATETIME NOT NULL
|
|
||||||
);
|
|
||||||
```
|
|
||||||
|
|
||||||
- Cache is global, not per-server (one profile per Steam ID).
|
|
||||||
- Refreshed when `fetched_at < now - 24h` or when entry is missing.
|
|
||||||
- Soft-fail: if the Steam API key is unset, the API is down, or a profile is private, we just leave the cache as-is and the UI falls back to `name_at_join` + placeholder avatar.
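
The refresh rule above (missing, or older than the TTL) can be sketched as a small pure function. The name `ids_needing_refresh` and the epoch-second timestamps are assumptions for illustration only.

```python
def ids_needing_refresh(roster_ids, cached_fetched_at, now, ttl_seconds=86400):
    """Return Steam IDs whose profile is uncached or older than the TTL.

    `cached_fetched_at` maps steam_id_64 -> fetched_at (epoch seconds).
    Input order is preserved and duplicates are dropped.
    """
    stale = []
    for steam_id in dict.fromkeys(roster_ids):
        fetched_at = cached_fetched_at.get(steam_id)
        if fetched_at is None or fetched_at < now - ttl_seconds:
            stale.append(steam_id)
    return stale


now = 1_000_000
cache = {
    "76561197960265730": now - 3600,    # fetched an hour ago: still fresh
    "76561197960265731": now - 90_000,  # older than 24h: refresh
}
roster = ["76561197960265730", "76561197960265731", "76561197960265732"]
to_fetch = ids_needing_refresh(roster, cache, now)
```

Private or deleted accounts never enter the cache, so under this rule they are simply retried on later polls.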
|
|
||||||
|
|
||||||
### Bind-rendered queries
|
|
||||||
|
|
||||||
**Current players on server X:**
|
|
||||||
```sql
|
|
||||||
SELECT sp.steam_id_64, sp.joined_at, sp.name_at_join,
|
|
||||||
sp.min_ping, sp.max_ping,
|
|
||||||
p.persona_name, p.avatar_url
|
|
||||||
FROM server_player_session sp
|
|
||||||
LEFT JOIN steam_user_profile p USING (steam_id_64)
|
|
||||||
WHERE sp.server_id = ? AND sp.left_at IS NULL
|
|
||||||
ORDER BY sp.joined_at;
|
|
||||||
```
|
|
||||||
|
|
||||||
**Recent players on server X (last 30 days, excluding currently in-game):**
|
|
||||||
```sql
|
|
||||||
SELECT sp.steam_id_64, MAX(sp.left_at) AS last_seen,
|
|
||||||
p.persona_name, p.avatar_url
|
|
||||||
FROM server_player_session sp
|
|
||||||
LEFT JOIN steam_user_profile p USING (steam_id_64)
|
|
||||||
WHERE sp.server_id = ?
|
|
||||||
AND sp.left_at IS NOT NULL
|
|
||||||
AND sp.left_at > datetime('now', '-30 days')
|
|
||||||
AND sp.steam_id_64 NOT IN (
|
|
||||||
SELECT steam_id_64 FROM server_player_session
|
|
||||||
WHERE server_id = ? AND left_at IS NULL
|
|
||||||
)
|
|
||||||
GROUP BY sp.steam_id_64, p.persona_name, p.avatar_url
|
|
||||||
ORDER BY last_seen DESC
|
|
||||||
LIMIT 20;
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Modules
|
|
||||||
|
|
||||||
### `l4d2web/services/rcon.py` (new)
|
|
||||||
|
|
||||||
Pure stdlib (`socket`, `struct`), no new dependency. Source RCON protocol:
|
|
||||||
|
|
||||||
```python
|
|
||||||
@dataclass(slots=True, frozen=True)
|
|
||||||
class PlayerRow:
|
|
||||||
steam_id_64: str # converted from STEAM_X:Y:Z
|
|
||||||
name: str
|
|
||||||
connected_seconds: int
|
|
||||||
ping: int
|
|
||||||
|
|
||||||
@dataclass(slots=True, frozen=True)
|
|
||||||
class StatusResponse:
|
|
||||||
map: str
|
|
||||||
players: int # humans
|
|
||||||
max_players: int
|
|
||||||
bots: int
|
|
||||||
hibernating: bool
|
|
||||||
roster: list[PlayerRow]
|
|
||||||
|
|
||||||
class RconError(Exception): ...
|
|
||||||
class RconAuthError(RconError): ...
|
|
||||||
|
|
||||||
def query_status(host: str, port: int, password: str, *, timeout: float = 2.0) -> StatusResponse: ...
|
|
||||||
```
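
For orientation, a sketch of the packet framing that `query_status` would sit on top of, following Valve's published Source RCON layout (little-endian size, request id, type, NUL-terminated body, plus one trailing NUL). The helper names `encode_packet` and `decode_packet` are hypothetical, not part of the planned module surface.

```python
import struct

# Packet types from the Source RCON protocol.
SERVERDATA_AUTH = 3
SERVERDATA_AUTH_RESPONSE = 2
SERVERDATA_EXECCOMMAND = 2
SERVERDATA_RESPONSE_VALUE = 0


def encode_packet(req_id: int, ptype: int, body: str) -> bytes:
    """Frame one packet: size | id | type | body NUL | NUL."""
    payload = struct.pack("<ii", req_id, ptype) + body.encode("utf-8") + b"\x00\x00"
    # The size field counts everything after itself.
    return struct.pack("<i", len(payload)) + payload


def decode_packet(data: bytes) -> tuple[int, int, str]:
    """Parse one complete packet into (req_id, type, body)."""
    (size,) = struct.unpack_from("<i", data, 0)
    req_id, ptype = struct.unpack_from("<ii", data, 4)
    # Body sits between the 12-byte header and the two trailing NUL bytes.
    body = data[12:4 + size - 2].decode("utf-8", errors="replace")
    return req_id, ptype, body


auth_request = encode_packet(1, SERVERDATA_AUTH, "s3cret")
status_request = encode_packet(2, SERVERDATA_EXECCOMMAND, "status")
# A rejected password comes back as an auth response with request id -1.
rejected = decode_packet(encode_packet(-1, SERVERDATA_AUTH_RESPONSE, ""))
```

A real client would additionally loop on `recv` until `4 + size` bytes have arrived before decoding, since TCP may split or coalesce packets.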
|
|
||||||
|
|
||||||
Implementation notes:
|
|
||||||
- Auth handshake quirk verified live: server sends a `type=0` empty-body packet **before** the `type=2` auth response. Consume both. `req_id == -1` on the auth response = bad password.
|
|
||||||
- Single TCP connection per query (loopback, ~10-20ms total round-trip — pooling not worth it at this scale).
|
|
||||||
- Header regex on `map :` and `players :` lines (the `(hibernating|not hibernating)` token is in `players :`).
|
|
||||||
- Roster regex: split lines starting with `#`, skip the column-header line, robustly extract the quoted name + the `STEAM_X:Y:Z` token + `MM:SS` or `HH:MM:SS` connected duration + ping. Tolerate the two-numeric-prefix L4D2 variant (`# 2 1 "Crone" STEAM_1:0:...`).
|
|
||||||
- Steam ID conversion: `STEAM_X:Y:Z` → `76561197960265728 + (Z * 2) + Y` (Y is the low bit; returned as string).
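
The two small parsing helpers implied by these notes might look like this. Function names are illustrative; the conversion formula is the one stated above.

```python
import re

STEAM64_BASE = 76561197960265728
_STEAM2_RE = re.compile(r"^STEAM_[0-5]:([01]):(\d+)$")


def steam2_to_steam64(steam2: str) -> str:
    """Convert STEAM_X:Y:Z to a 64-bit Steam ID, returned as a string."""
    match = _STEAM2_RE.match(steam2)
    if match is None:
        # Bots and malformed ids land here; the caller skips those rows.
        raise ValueError(f"not a STEAM_X:Y:Z id: {steam2!r}")
    y, z = int(match.group(1)), int(match.group(2))
    return str(STEAM64_BASE + z * 2 + y)


def parse_connected(text: str) -> int:
    """Convert an MM:SS or HH:MM:SS duration to seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds
```

For example, `steam2_to_steam64("STEAM_1:0:1")` gives `"76561197960265730"` and `parse_connected("02:05")` gives `125`.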
|
|
||||||
|
|
||||||
### `l4d2web/services/steam_users.py` (new)
|
|
||||||
|
|
||||||
Modeled on `l4d2web/services/steam_workshop.py:17-43` (single `requests.Session`, 30s timeout). Unlike the anonymous form-encoded POST used there, `GetPlayerSummaries` is called with GET and takes the `key=` and `steamids=` parameters in the query string.
|
|
||||||
|
|
||||||
```python
|
|
||||||
@dataclass(slots=True, frozen=True)
|
|
||||||
class SteamProfile:
|
|
||||||
steam_id_64: str
|
|
||||||
persona_name: str
|
|
||||||
avatar_url: str # avatarmedium
|
|
||||||
|
|
||||||
def fetch_profiles_batch(steam_ids: Iterable[str], *, api_key: str) -> list[SteamProfile]: ...
|
|
||||||
```
|
|
||||||
|
|
||||||
- Endpoint: `GET https://api.steampowered.com/ISteamUser/GetPlayerSummaries/v0002/?key=<key>&steamids=<csv>`.
|
|
||||||
- Up to 100 IDs per call; caller batches.
|
|
||||||
- Returns only successful resolutions (private/deleted accounts simply absent from the response — fine, they stay uncached and the UI falls back).
|
|
||||||
- Raises on transport errors; caller decides whether to surface.
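
Caller-side batching under the 100-ID ceiling could be sketched as follows. `chunk_steam_ids` and `build_query` are hypothetical helpers; only the `key` and `steamids` parameter names come from the endpoint above.

```python
from collections.abc import Iterable, Iterator

MAX_IDS_PER_CALL = 100


def chunk_steam_ids(
    steam_ids: Iterable[str], size: int = MAX_IDS_PER_CALL
) -> Iterator[list[str]]:
    """Yield de-duplicated Steam IDs in lists of at most `size` entries."""
    unique = list(dict.fromkeys(steam_ids))  # keeps first-seen order
    for start in range(0, len(unique), size):
        yield unique[start:start + size]


def build_query(steam_ids: list[str], api_key: str) -> dict[str, str]:
    """Query-string parameters for one GetPlayerSummaries call."""
    return {"key": api_key, "steamids": ",".join(steam_ids)}


ids = [str(76561197960265728 + n) for n in range(250)]
batches = list(chunk_steam_ids(ids))
params = build_query(batches[-1], api_key="EXAMPLE_KEY")
```

With 250 IDs this produces three calls of 100, 100 and 50 IDs.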
|
|
||||||
|
|
||||||
### `l4d2web/services/live_state_poller.py` (new)
|
|
||||||
|
|
||||||
Modeled on `start_state_poller` / `state_poller_loop` in `l4d2web/services/job_worker.py:617-647`.
|
|
||||||
|
|
||||||
```python
|
|
||||||
def start_live_state_poller(app) -> None: ... # spawns daemon thread, skipped under TESTING
|
|
||||||
def live_state_poller_loop(app, interval: float) -> None: ...
|
|
||||||
def poll_once() -> None: # one full pass over running servers
|
|
||||||
...
|
|
||||||
```
|
|
||||||
|
|
||||||
Per-server algorithm:
|
|
||||||
1. RCON `status` → `StatusResponse` (or skip on auth/timeout, logged via `app.logger`).
|
|
||||||
2. **Server-level RLE upsert**: load newest `server_live_state` row for this server. If `(players, max_players, bots, map, hibernating)` matches → `UPDATE last_seen_at = now()`. Else → `INSERT` new row.
|
|
||||||
3. **Session reconciliation** in a single transaction:
|
|
||||||
- Load open sessions for this server.
|
|
||||||
- For each player in `response.roster` not in open sessions: `INSERT` new session with `joined_at = now - connected_seconds`, `name_at_join = roster.name`, `min_ping = max_ping = roster.ping`.
|
|
||||||
- For each open session whose player is in the roster: if `roster.ping < min_ping` or `> max_ping`, `UPDATE` the range. Otherwise skip the write.
|
|
||||||
- For each open session whose player is *not* in the roster: `UPDATE left_at = now()`.
|
|
||||||
4. **Profile enrichment**: collect Steam IDs from the roster where the cached profile is missing or `fetched_at < now - 24h`. Skip if `STEAM_WEB_API_KEY` unset. Batch into one Steam API call. Upsert results.
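
Step 3 is easiest to test when the decision logic is separated from the database writes. A sketch under that assumption, with hypothetical `RosterEntry`, `OpenSession` and `reconcile` names and epoch-second timestamps:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RosterEntry:
    steam_id_64: str
    name: str
    connected_seconds: int
    ping: int


@dataclass(frozen=True)
class OpenSession:
    steam_id_64: str
    joined_at: int
    min_ping: int
    max_ping: int


def reconcile(open_sessions, roster, now):
    """Return (to_open, to_update, to_close) for one server's poll."""
    by_id = {s.steam_id_64: s for s in open_sessions}
    seen = set()
    to_open, to_update = [], []
    for entry in roster:
        seen.add(entry.steam_id_64)
        session = by_id.get(entry.steam_id_64)
        if session is None:
            # New player: backfill joined_at from RCON's connected duration.
            to_open.append(OpenSession(
                entry.steam_id_64, now - entry.connected_seconds,
                entry.ping, entry.ping))
        elif entry.ping < session.min_ping or entry.ping > session.max_ping:
            # Only write when the observed ping widens the stored range.
            to_update.append(OpenSession(
                session.steam_id_64, session.joined_at,
                min(session.min_ping, entry.ping),
                max(session.max_ping, entry.ping)))
    # Anyone with an open session who is absent from the roster has left.
    to_close = [s for s in open_sessions if s.steam_id_64 not in seen]
    return to_open, to_update, to_close


sessions = [OpenSession("A", 100, 40, 60), OpenSession("B", 200, 30, 30)]
roster = [RosterEntry("A", "alice", 900, 75), RosterEntry("C", "carol", 120, 50)]
to_open, to_update, to_close = reconcile(sessions, roster, now=1000)
```

The poller would then apply the three lists as `INSERT`, `UPDATE` and `UPDATE left_at` statements inside one transaction.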
|
|
||||||
|
|
||||||
Periodic (every Nth cycle, e.g. once a minute):
|
|
||||||
- Trim `server_live_state` and closed sessions past retention.
|
|
||||||
- Close any open sessions whose `server_id` hasn't had a successful RCON response in the last `STUCK_SESSION_SECONDS` (default 60).
|
|
||||||
|
|
||||||
### Modify: `l4d2web/services/l4d2_facade.py:28-52`
|
|
||||||
|
|
||||||
`build_server_spec_payload` **appends** `f'rcon_password "{server.rcon_password}"'` as the *last* entry in the returned `config` list, only if the password is non-empty. Appending (not prepending) matters: Source's cfg semantics are last-wins, so putting our line after both the overlay `exec` lines and the user's blueprint config guarantees no overlay or blueprint can silently clobber the password and break the poller. `l4d2host/instances.py:40-58` already writes `spec.config` lines verbatim to `server.cfg` — **no host-side change needed**.
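
The ordering rule can be illustrated with a stripped-down stand-in for the config assembly. `build_config_lines` and its parameters are hypothetical; the real `build_server_spec_payload` takes a server object and returns a larger payload.

```python
def build_config_lines(overlay_lines, blueprint_lines, rcon_password):
    """Assemble server.cfg lines with rcon_password last, so it wins."""
    config = [*overlay_lines, *blueprint_lines]
    if rcon_password:
        # Appended after everything else: a later assignment overrides an
        # earlier one, so overlays and blueprints cannot clobber it.
        config.append(f'rcon_password "{rcon_password}"')
    return config


lines = build_config_lines(
    overlay_lines=["exec overlay_a.cfg"],
    blueprint_lines=['rcon_password "from-blueprint"', "sv_cheats 0"],
    rcon_password="generated-token",
)
without_password = build_config_lines(["exec overlay_a.cfg"], [], "")
```

Even though the blueprint in this example sets its own `rcon_password`, the generated line comes last and is the one that takes effect.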
|
|
||||||
|
|
||||||
### Modify: server-create route
|
|
||||||
|
|
||||||
Wherever the server-create form handler lives (`l4d2web/routes/server_routes.py` or similar — confirm during implementation): before commit, generate `rcon_password = secrets.token_urlsafe(32)`.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Web UI
|
|
||||||
|
|
||||||
### Server list (template TBD: `ls l4d2web/templates/` during implementation)
|
|
||||||
|
|
||||||
Add an inline live-state cell per server row:
|
|
||||||
- Stopped server: `—`
|
|
||||||
- Stale (no row newer than `LIVE_STATE_STALE_SECONDS`): dim `?` with tooltip "no data"
|
|
||||||
- Hibernating: `0/4 · idle · c1m1_hotel`
|
|
||||||
- Active: `2/4 · c1m2_streets`
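
The four cases above reduce to a small formatting function. A sketch with a hypothetical `Snapshot` type and `live_state_badge` helper; in practice this logic might equally live in the Jinja template.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    players: int
    max_players: int
    map: str
    hibernating: bool
    last_seen_at: float  # epoch seconds


def live_state_badge(
    running: bool,
    snapshot: Optional[Snapshot],
    now: float,
    stale_seconds: float = 30.0,
) -> str:
    """Render the inline live-state cell for one server row."""
    if not running:
        return "\u2014"  # the dash shown for stopped servers
    # Fresh means last_seen_at > now - stale_seconds; anything else is stale.
    if snapshot is None or snapshot.last_seen_at <= now - stale_seconds:
        return "?"
    parts = [f"{snapshot.players}/{snapshot.max_players}"]
    if snapshot.hibernating:
        parts.append("idle")
    parts.append(snapshot.map)
    return " · ".join(parts)


now = 1000.0
idle = Snapshot(0, 4, "c1m1_hotel", True, last_seen_at=998.0)
active = Snapshot(2, 4, "c1m2_streets", False, last_seen_at=998.0)
old = Snapshot(2, 4, "c1m2_streets", False, last_seen_at=900.0)
```

The tooltip text ("no data") and the dimmed styling for the stale case are left to the template.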
|
|
||||||
|
|
||||||
No HTMX on the list page; page reload picks up the latest snapshot.
|
|
||||||
|
|
||||||
### Server detail (`l4d2web/templates/server_detail.html`)
|
|
||||||
|
|
||||||
New section, HTMX-refreshed every `LIVE_STATE_POLL_SECONDS` (default 5):
|
|
||||||
|
|
||||||
```html
|
|
||||||
<section class="panel"
|
|
||||||
hx-get="/servers/{{ server.id }}/live-state"
|
|
||||||
hx-trigger="every 5s"
|
|
||||||
hx-swap="outerHTML">
|
|
||||||
<!-- rendered from l4d2web/templates/_live_state.html -->
|
|
||||||
</section>
|
|
||||||
```
|
|
||||||
|
|
||||||
The partial renders three blocks:
|
|
||||||
|
|
||||||
1. **Summary**: `players/max_players · map · idle?` plus a small "polled Ns ago" caption.
|
|
||||||
2. **Current players** (only if non-empty): grid of cards, each `<img src="{{ profile.avatar_url or placeholder }}" /> {{ profile.persona_name or session.name_at_join }} · {{ joined_relative }} · ping {{ min }}-{{ max }}ms`.
|
|
||||||
3. **Recent players** (last 30 days, excluding current; only if non-empty): smaller cards, `{{ avatar }} {{ persona_name or name_at_join }} · last seen {{ last_seen_relative }}`.
|
|
||||||
|
|
||||||
New route: `GET /servers/<id>/live-state` returns the partial. Composition mirrors the existing build-status pattern at `l4d2web/templates/_overlay_build_status.html:1-5`.
|
|
||||||
|
|
||||||
Avatar `<img>` tags point straight at Steam CDN URLs (`avatars.cloudflare.steamstatic.com` / `avatars.akamai.steamstatic.com`). No proxying. Same approach as `WorkshopItem.preview_url`. Note: confirm the existing CSP allows these hosts; if not, extend it.
|
|
||||||
|
|
||||||
No JS framework added — HTMX only.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Config keys
|
|
||||||
|
|
||||||
In `l4d2web/config.py`, plus documented defaults in `deploy/templates/etc/left4me/web.env` where applicable:
|
|
||||||
|
|
||||||
| key | default | purpose |
|
|
||||||
|---|---|---|
|
|
||||||
| `LIVE_STATE_POLL_SECONDS` | `5` | poll interval |
|
|
||||||
| `LIVE_STATE_QUERY_TIMEOUT_SECONDS` | `2.0` | per-RCON-query timeout |
|
|
||||||
| `LIVE_STATE_POLL_WORKERS` | `4` | thread-pool size for parallel per-server polls |
|
|
||||||
| `LIVE_STATE_STALE_SECONDS` | `30` | UI staleness threshold |
|
|
||||||
| `LIVE_STATE_HISTORY_DAYS` | `30` | retention for snapshots + closed sessions |
|
|
||||||
| `STUCK_SESSION_SECONDS` | `60` | close open sessions whose server has been unreachable for this long |
|
|
||||||
| `STEAM_PROFILE_TTL_SECONDS` | `86400` | profile cache TTL |
|
|
||||||
| `STEAM_WEB_API_KEY` | `""` | from `web.env`; empty disables enrichment |
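
One possible way to load these keys with their defaults from the environment is sketched below. The `DEFAULTS` mapping mirrors the table; the loader function itself is an assumption and may not match how `l4d2web/config.py` is actually organized.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Defaults copied from the table above.
DEFAULTS = {
    "LIVE_STATE_POLL_SECONDS": 5,
    "LIVE_STATE_QUERY_TIMEOUT_SECONDS": 2.0,
    "LIVE_STATE_POLL_WORKERS": 4,
    "LIVE_STATE_STALE_SECONDS": 30,
    "LIVE_STATE_HISTORY_DAYS": 30,
    "STUCK_SESSION_SECONDS": 60,
    "STEAM_PROFILE_TTL_SECONDS": 86400,
    "STEAM_WEB_API_KEY": "",
}


def load_live_state_config(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Read each key from the environment, coercing to the default's type."""
    env = os.environ if environ is None else environ
    config = {}
    for key, default in DEFAULTS.items():
        raw = env.get(key)
        # Unset or empty values fall back to the documented default.
        config[key] = default if raw in (None, "") else type(default)(raw)
    return config


config = load_live_state_config({
    "LIVE_STATE_POLL_SECONDS": "10",
    "LIVE_STATE_QUERY_TIMEOUT_SECONDS": "1.5",
})
enrichment_enabled = bool(config["STEAM_WEB_API_KEY"])
```

An empty `STEAM_WEB_API_KEY` leaves `enrichment_enabled` false, which corresponds to the "empty disables enrichment" behavior.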
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Tests
|
|
||||||
|
|
||||||
- `l4d2web/tests/test_rcon.py` — protocol handshake against an in-process TCP fixture: auth-success, auth-failure (`req_id == -1`), header parse (incl. `(hibernating)` and `(reserved <token>)` variants), roster parse (incl. the two-numeric-prefix L4D2 variant), Steam ID conversion.
|
|
||||||
- `l4d2web/tests/test_steam_users.py` — request shape (key in querystring, batched ids, 100-per-call ceiling), response parsing, partial response (some IDs missing).
|
|
||||||
- `l4d2web/tests/test_live_state_poller.py` — mirror `test_state_poller_*` at `l4d2web/tests/test_job_worker.py:882-952`. Cover: iterates only running servers with non-empty `rcon_password`, RLE upsert (matching state → `last_seen_at` bump only; differing state → new row), session open with backfilled `joined_at`, session close on disappearance, ping range expansion, stuck-session close after N failures, drops auth failures silently, respects retention.
|
|
||||||
- `l4d2web/tests/test_server_routes.py` (extend) — `/servers/<id>/live-state` fragment route renders summary/current/recent blocks correctly; stale rendering when latest snapshot is old; soft-fail rendering when no profile cached.
|
|
||||||
- `l4d2web/tests/test_l4d2_facade.py` (extend) — `build_server_spec_payload` appends `rcon_password "..."` as the last config line when password is set; omits the line when empty; appears after both the overlay `exec` lines and the blueprint config lines.
|
|
||||||
- Migration test — existing rows backfilled with non-empty 43-char passwords; tables created with correct indexes.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Critical files
|
|
||||||
|
|
||||||
**New:**
|
|
||||||
- `l4d2web/services/rcon.py` — Source RCON client + status parser
|
|
||||||
- `l4d2web/services/steam_users.py` — Steam Web API client (mirrors `steam_workshop.py`)
|
|
||||||
- `l4d2web/services/live_state_poller.py` — background thread + poll loop + session reconciler
|
|
||||||
- `l4d2web/alembic/versions/00XX_server_live_state.py` — migration: new column, three new tables, password backfill
|
|
||||||
- `l4d2web/templates/_live_state.html` — HTMX-refreshed fragment (summary + current + recent)
|
|
||||||
- `l4d2web/tests/test_rcon.py`, `l4d2web/tests/test_steam_users.py`, `l4d2web/tests/test_live_state_poller.py`
|
|
||||||
|
|
||||||
**Modify:**
|
|
||||||
- `l4d2web/models.py` — add `ServerLiveState`, `ServerPlayerSession`, `SteamUserProfile`; add `rcon_password` to `Server` (after line 137)
|
|
||||||
- `l4d2web/services/l4d2_facade.py:28-52` — `build_server_spec_payload` appends `rcon_password "..."` as the last config line when set
|
|
||||||
- `l4d2web/app.py` — call `start_live_state_poller(app)` next to existing `start_state_poller`
|
|
||||||
- `l4d2web/routes/server_routes.py` (or equivalent — confirm) — generate `rcon_password` in create handler; add `GET /servers/<id>/live-state`
|
|
||||||
- `l4d2web/templates/server_detail.html` — include `_live_state.html`
|
|
||||||
- `l4d2web/templates/<server-list>.html` — confirm filename; add inline badge column
|
|
||||||
- `l4d2web/config.py` — register the eight new config keys
|
|
||||||
- `deploy/templates/etc/left4me/web.env` — add `STEAM_WEB_API_KEY=` and any tunables we expose
|
|
||||||
|
|
||||||
**Reused without changes:**
|
|
||||||
- `l4d2web/services/job_worker.py:617-647` — daemon-thread / poll-loop pattern reference
|
|
||||||
- `l4d2web/services/steam_workshop.py:17-43` — `requests.Session` + form-POST pattern for Steam Web API
|
|
||||||
- `l4d2host/instances.py:40-58` — already writes `spec.config` verbatim, so no host-side change for password injection
|
|
||||||
- `l4d2web/templates/_overlay_build_status.html` — HTMX polling pattern reference
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Verification
|
|
||||||
|
|
||||||
1. **Unit tests**:
|
|
||||||
```
|
|
||||||
pytest l4d2web/tests/test_rcon.py l4d2web/tests/test_steam_users.py l4d2web/tests/test_live_state_poller.py -v
|
|
||||||
pytest l4d2web/tests -q # full regression
|
|
||||||
```
|
|
||||||
|
|
||||||
2. **Migration check**:
|
|
||||||
```
|
|
||||||
alembic upgrade head
|
|
||||||
sqlite3 l4d2web.db "SELECT id, name, length(rcon_password) FROM servers;" # every row ~43
|
|
||||||
sqlite3 l4d2web.db ".schema server_live_state server_player_session steam_user_profile"
|
|
||||||
```
|
|
||||||
|
|
||||||
3. **End-to-end against prod** (`left4.me`):
|
|
||||||
- Deploy. Confirm `systemctl status left4me-web.service` shows no crash-loop and the journal logs `start_live_state_poller` once.
|
|
||||||
- Restart both existing game servers so they pick up the injected password.
|
|
||||||
- SQL sanity (web-host shell):
|
|
||||||
```
|
|
||||||
sqlite3 l4d2web.db "SELECT server_id, started_at, last_seen_at, players, map, hibernating
|
|
||||||
FROM server_live_state ORDER BY server_id, started_at DESC LIMIT 10;"
|
|
||||||
```
|
|
||||||
Expect a single recent row per server while idle; new rows when players come/go.
|
|
||||||
- Connect to one server from the L4D2 client; within 5s, `/servers/<id>` shows a card with your avatar + persona name + ping range. Disconnect; within 5s the card moves to "recent."
|
|
||||||
- `sqlite3 l4d2web.db "SELECT * FROM server_player_session WHERE left_at IS NULL;"` — empty when nobody's connected; one row per current player when someone is.
|
|
||||||
- `sqlite3 l4d2web.db "SELECT count(*), MIN(fetched_at), MAX(fetched_at) FROM steam_user_profile;"` — at least one row after a player has been resolved.
|
|
||||||
|
|
||||||
4. **Failure-path checks**:
|
|
||||||
- Manually corrupt `servers.rcon_password` for one server; confirm the journal logs auth failure and the row's badge goes stale within `LIVE_STATE_STALE_SECONDS`; other servers unaffected.
|
|
||||||
- Unset `STEAM_WEB_API_KEY` in `web.env`, restart web; confirm display still works (in-game names + placeholder avatars), no errors in journal.
|
|
||||||
- `nft` drop the loopback TCP on one server's port; confirm rows stop appearing, open sessions close after `STUCK_SESSION_SECONDS`, badge goes stale.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Open implementation questions
|
|
||||||
|
|
||||||
- **Server-list template filename**: confirm with `ls l4d2web/templates/` once implementation starts.
|
|
||||||
- **Server-create route location**: confirm path (likely `l4d2web/routes/server_routes.py`).
|
|
||||||
- **CSP allowlist for Steam avatar CDNs**: check `l4d2web/app.py` (or wherever security headers live) — extend `img-src` to include `avatars.cloudflare.steamstatic.com`, `avatars.akamai.steamstatic.com`, `avatars.steamstatic.com` if a CSP is enforced.
|
|
||||||
- **Adaptive backoff** for hibernating servers: defer; start with fixed 5s and revisit only if load becomes a concern (which it won't at current server count).
|
|
||||||
- **Migration data step**: SQLite alembic batch operation with a Python data step that iterates rows and generates `secrets.token_urlsafe(32)` per row — confirm pattern against existing migrations under `l4d2web/alembic/versions/`.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Deferred to a separate plan
|
|
||||||
|
|
||||||
- Generic RCON command execution (`changelevel`, `kick`, `say`, `sm_ban`, ...)
|
|
||||||
- Web UI buttons mapped to those commands with CSRF + admin authz
|
|
||||||
- Audit log table for issued commands
|
|
||||||
- Player-count history graphs (data already accumulating from this plan)
|
|
||||||
- Ban UX (lookup by Steam ID, search across `server_player_session`)
|
|
||||||
|
|
@ -1,61 +0,0 @@
|
||||||
# RCON Password Display on Server Detail Page — Design
|
|
||||||
|
|
||||||
**Goal:** Show the RCON password on the server detail page with a show/hide toggle.
|
|
||||||
|
|
||||||
**Architecture:** Presentational change only. The `server.rcon_password` field already exists in the database and is rendered via Jinja2 autoescaping into the template. A small external JS file provides the reveal/hide interaction via delegated click on `[data-password-toggle]` attributes — no inline handlers.
|
|
||||||
|
|
||||||
**Files touched:**
|
|
||||||
- `l4d2web/static/js/password-reveal.js` — new, ~15 lines
|
|
||||||
- `l4d2web/templates/server_detail.html` — add one row to `.server-info` DL
|
|
||||||
- `l4d2web/templates/base.html` — add script include
|
|
||||||
- `l4d2web/static/css/components.css` — optional, add `.password-mask` letter-spacing if default renders poorly
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Template
|
|
||||||
|
|
||||||
Add after the blueprint row in `server_detail.html` (line 13):
|
|
||||||
|
|
||||||
```html
|
|
||||||
<div>
|
|
||||||
<dt>RCON Password</dt>
|
|
||||||
<dd>
|
|
||||||
<span class="password-mask" data-password-field="{{ server.id }}">••••••••••••</span>
|
|
||||||
<span class="password-value" data-password-field="{{ server.id }}" hidden>{{ server.rcon_password }}</span>
|
|
||||||
<button class="link-button" data-password-toggle="{{ server.id }}" aria-label="Show RCON password">show</button>
|
|
||||||
</dd>
|
|
||||||
</div>
|
|
||||||
```
|
|
||||||
|
|
||||||
## JavaScript (`password-reveal.js`)
|
|
||||||
|
|
||||||
Delegated click listener on `[data-password-toggle]`. Toggles `hidden` between the mask span and value span, updates button text and aria-label.
|
|
||||||
|
|
||||||
```js
|
|
||||||
document.addEventListener('click', (e) => {
|
|
||||||
const btn = e.target.closest('[data-password-toggle]');
|
|
||||||
if (!btn) return;
|
|
||||||
const id = btn.dataset.passwordToggle;
|
|
||||||
const mask = document.querySelector(`[data-password-field="${id}"].password-mask`);
|
|
||||||
const value = document.querySelector(`[data-password-field="${id}"].password-value`);
|
|
||||||
const hidden = value.hidden;
|
|
||||||
value.hidden = !hidden;
|
|
||||||
mask.hidden = hidden;
|
|
||||||
btn.textContent = hidden ? 'hide' : 'show';
|
|
||||||
btn.setAttribute('aria-label', hidden ? 'Hide RCON password' : 'Show RCON password');
|
|
||||||
});
|
|
||||||
```
|
|
||||||
|
|
||||||
## CSS
|
|
||||||
|
|
||||||
Reuse existing `.link-button` for the toggle button. If the bullet characters render inconsistently across browsers (spacing, baseline), add a simple `.password-mask { letter-spacing: 0.15em; }` class — but likely unnecessary.
|
|
||||||
|
|
||||||
## Security
|
|
||||||
|
|
||||||
- Password is server-rendered via Jinja2 autoescaping — no XSS vector.
|
|
||||||
- Visible in page source to the server owner (consistent with existing auth model: user must own the server).
|
|
||||||
- No copy-to-clipboard functionality (per requirements).
|
|
||||||
|
|
||||||
## Testing
|
|
||||||
|
|
||||||
No new tests required — purely presentational change. Existing `test_create_server_generates_rcon_password` in `test_servers.py` already covers password generation.
|
|
||||||
|
|
@ -1,77 +0,0 @@
|
||||||
# Server Hostname (Source `hostname` cvar) Design
|
|
||||||
|
|
||||||
**Goal:** Allow users to set the L4D2 server name (`hostname` cvar) that players see in the server browser and MOTD, with an ephemeral auto-generated fallback.
|
|
||||||
|
|
||||||
**Architecture:** A new `hostname VARCHAR(128)` column on the `servers` table. Empty string means "auto-generate at deploy time." The fallback is resolved ephemerally in `initialize_server` — computed fresh from `user.username + server.name` on each deploy, never persisted. Explicit overrides are stored and emitted verbatim.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Model
|
|
||||||
|
|
||||||
Add one column to `Server` in `l4d2web/models.py`:
|
|
||||||
|
|
||||||
```python
|
|
||||||
hostname: Mapped[str] = mapped_column(String(128), default="", nullable=False)
|
|
||||||
```
|
|
||||||
|
|
||||||
Default `""` means auto-generate. Non-empty means explicit override.
|
|
||||||
|
|
||||||
## Behavior
|
|
||||||
|
|
||||||
| `hostname` value | Deploy result |
|
|
||||||
|---|---|
|
|
||||||
| `""` (empty) | Emit `hostname "<username> <server.name>"` — computed fresh each deploy, never written to DB |
|
|
||||||
| `"My Server"` | Emit `hostname "My Server"` verbatim |
|
|
||||||
| User clears the field | Resets to `""`, next deploy auto-generates |
|
|
||||||
|
|
||||||
The fallback is ephemeral — `initialize_server` resolves it in-memory for the spec YAML. The DB row stays empty. This means renames auto-propagate to the hostname on the next deploy without manual updates.
|
|
||||||
|
|
||||||
## Spec Payload
|
|
||||||
|
|
||||||
`build_server_spec_payload()` gains an optional `resolved_hostname: str = ""` keyword parameter. When non-empty, a `hostname "..."` line is inserted into the config array, before the `rcon_password` line (so rcon remains last-wins).
|
|
||||||
|
|
||||||
`initialize_server()` resolves the hostname:
|
|
||||||
|
|
||||||
```python
|
|
||||||
with session_scope() as db:
|
|
||||||
user = db.get(User, server.user_id)
|
|
||||||
resolved = server.hostname or f"{user.username} {server.name}"
|
|
||||||
```
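
Pulled out as pure functions, the fallback and the line ordering are easy to unit-test. The names `resolve_hostname` and `config_tail` are hypothetical and only illustrate the behavior described in this section.

```python
def resolve_hostname(hostname: str, username: str, server_name: str) -> str:
    """Explicit override wins; an empty value falls back to '<username> <server.name>'."""
    return hostname or f"{username} {server_name}"


def config_tail(resolved_hostname: str, rcon_password: str) -> list[str]:
    """Trailing config lines: hostname first, rcon_password kept last."""
    lines = []
    if resolved_hostname:
        lines.append(f'hostname "{resolved_hostname}"')
    if rcon_password:
        lines.append(f'rcon_password "{rcon_password}"')
    return lines


auto = resolve_hostname("", "alice", "alpha")
explicit = resolve_hostname("My Server", "alice", "alpha")
tail = config_tail(auto, "generated-token")
```

Because the fallback is computed from the current username and server name at call time, a rename shows up on the next deploy without touching the stored `hostname`.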
|
|
||||||
|
|
||||||
## UI
|
|
||||||
|
|
||||||
On `server_detail.html`, a new row in the info `<dl>` block, placed after the RCON password row:
|
|
||||||
|
|
||||||
```
|
|
||||||
Hostname: [ _______________ ] [Save]
|
|
||||||
Leave empty for auto: "alice alpha"
|
|
||||||
```
|
|
||||||
|
|
||||||
- Input `name="hostname"`, `maxlength="128"`
|
|
||||||
- `value="{{ server.hostname }}"` (empty when not set)
|
|
||||||
- `placeholder="{{ user.username }} {{ server.name }}"` (previews auto-generated value)
|
|
||||||
- Form submits to `POST /servers/<id>` — same endpoint as the rename form
|
|
||||||
- No hostname field in the create-server modal; new servers always start with `hostname=""`
|
|
||||||
|
|
||||||
## Routes
|
|
||||||
|
|
||||||
**`POST /servers/<int:server_id>`** (update_server_form) — unchanged signature; just also saves `request.form.get("hostname", "")` to `server.hostname`.
|
|
||||||
|
|
||||||
**`POST /servers`** (create_server) — unchanged; `hostname` defaults to `""` from the model default.
|
|
||||||
|
|
||||||
## Files Touched
|
|
||||||
|
|
||||||
| File | Change |
|---|---|
| `l4d2web/models.py` | Add `hostname` column to `Server` |
| `l4d2web/alembic/versions/0011_server_hostname.py` | Migration: `ADD COLUMN hostname VARCHAR(128) NOT NULL DEFAULT ''` |
| `l4d2web/routes/server_routes.py` | `update_server_form` saves `hostname` from form |
| `l4d2web/services/l4d2_facade.py` | `build_server_spec_payload` accepts `resolved_hostname=`, emits `hostname "..."` line. `initialize_server` resolves fallback. |
| `l4d2web/templates/server_detail.html` | Hostname form row in info `<dl>` |
| `l4d2web/tests/test_servers.py` | Tests for create default, update, clear |
| `l4d2web/tests/test_l4d2_facade.py` | Tests for hostname in spec, fallback resolution |
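
The effect of the migration on existing rows can be checked with the standard library alone. This is a sketch against an in-memory SQLite database; the table name `servers` and its columns are assumptions for illustration, and the real migration would go through Alembic rather than raw SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE servers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO servers (name) VALUES ('alpha')")

# Same DDL shape as the migration: NOT NULL is allowed because a default is given.
conn.execute(
    "ALTER TABLE servers ADD COLUMN hostname VARCHAR(128) NOT NULL DEFAULT ''"
)

# Pre-existing rows pick up the empty-string default, i.e. "auto" hostname.
rows = conn.execute("SELECT name, hostname FROM servers").fetchall()
```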
|
|
||||||
|
|
||||||
## Open / Closed
|
|
||||||
|
|
||||||
- **Explicit vs ephemeral:** Explicit overrides persist; empty means auto at deploy time. No toggle, no "locked" mode needed in v1.
|
|
||||||
- **No hostname in create modal:** Simplifies the form. Hostname is configured post-creation on the detail page.
|
|
||||||
|
|
@ -1,579 +0,0 @@
|
||||||
# Build-overlay template unit — refactor the script-sandbox helper
|
|
||||||
|
|
||||||
**Status: open question, not settled design.** This is a handoff
|
|
||||||
document prompted by the build-time idmap landing on 2026-05-15. The
|
|
||||||
current `left4me-script-sandbox` shell helper works but has accumulated
|
|
||||||
several layers of complexity (idmap bind setup, trap cleanup, nsenter
|
|
||||||
self-wrap) that a systemd template unit would handle declaratively.
|
|
||||||
The same pattern is already established in the codebase for
|
|
||||||
gameservers (`left4me-server@.service`). A future session should
|
|
||||||
evaluate whether to refactor and, if so, follow the steps below.
|
|
||||||
|
|
||||||
> **Updated 2026-05-15:** `l4d2-sandbox` was collapsed into `left4me`
|
|
||||||
> — see `docs/superpowers/plans/2026-05-15-uid-collapse.md`. The
|
|
||||||
> idmap bind setup + trap cleanup are gone, so the remaining
|
|
||||||
> complexity in the helper is just the nsenter self-wrap. References
|
|
||||||
> below to `User=l4d2-sandbox` should read as `User=left4me`; the
|
|
||||||
> template refactor will inherit that cleanly.
|
|
||||||
|
|
||||||
## Why this came up
|
|
||||||
|
|
||||||
While verifying the build-time idmap refactor, the first 5 build jobs
|
|
||||||
failed with `mkdir: Permission denied` on `/overlay/...`. Root cause:
|
|
||||||
|
|
||||||
- `left4me-web.service` runs with `PrivateTmp=true`, which puts the
|
|
||||||
web app (and anything it sudoes into) in a private mount namespace.
|
|
||||||
- The script-sandbox helper, invoked via `sudo` from the web app,
|
|
||||||
inherits that namespace.
|
|
||||||
- The helper's `mount --bind --map-users=...` pre-creates the idmap
|
|
||||||
staging path *in the web app's namespace*.
|
|
||||||
- `systemd-run` (called by the helper) spawns a transient unit in
|
|
||||||
PID 1's mount namespace.
|
|
||||||
- The transient unit's `BindPaths=...:/overlay` resolves the staging
|
|
||||||
path in PID 1's namespace — where the bind doesn't exist. It sees
|
|
||||||
an empty root-owned dir at the staging path (mkdir'd by the helper
|
|
||||||
before the bind) and binds *that* to `/overlay`.
|
|
||||||
- Sandbox uid hits EACCES on every write.
|
|
||||||
|
|
||||||
We fixed it (commit `f1aa05d`) by self-wrapping the helper into
|
|
||||||
PID 1's mount namespace at the top of the script:
|
|
||||||
|
|
||||||
```bash
if [[ "${L4D2_SANDBOX_IN_PID1_MNT_NS:-}" != "1" ]]; then
    exec env L4D2_SANDBOX_IN_PID1_MNT_NS=1 \
        /usr/bin/nsenter --mount=/proc/1/ns/mnt -- "$0" "$@"
fi
```
|
|
||||||
|
|
||||||
That works. But it's a band-aid for an architectural friction:
|
|
||||||
**helper invocation via `sudo` from a hardened service forces us to
|
|
||||||
manually escape the caller's namespace before any mount syscall**.
|
|
||||||
If the helper were *itself* a systemd unit started by PID 1, the
|
|
||||||
namespace would be correct by default.
|
|
||||||
|
|
||||||
The gameserver helper handles this at the unit level. Its
|
|
||||||
ExecStartPre is:
|
|
||||||
|
|
||||||
```
ExecStartPre=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay mount %i
```
|
|
||||||
|
|
||||||
i.e. wrapped in nsenter *at the unit*. The unit is started by PID 1,
|
|
||||||
so it already has PID 1's namespace and the nsenter is just belt-and-braces.
|
|
||||||
Mirror that pattern for builds: introduce `build-overlay@.service` as
|
|
||||||
a template unit, have the worker activate it instead of forking a
|
|
||||||
helper.
|
|
||||||
|
|
||||||
## Current state (the thing being replaced)
|
|
||||||
|
|
||||||
Files:
|
|
||||||
|
|
||||||
- `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` —
|
|
||||||
the bash helper. ~100 lines. Self-wraps in nsenter, does pre-bind
|
|
||||||
with `--map-users`, invokes `systemd-run --quiet --collect --wait
|
|
||||||
--pipe -p ... -- /bin/bash /script.sh`, cleans up via trap.
|
|
||||||
- `l4d2web/services/overlay_builders.py:run_sandboxed_script` — the
|
|
||||||
worker entry point. Writes script content to
|
|
||||||
`/var/lib/left4me/sandbox-scripts/<uniqued>.sh`, invokes
|
|
||||||
`sudo -n /usr/local/libexec/left4me/left4me-script-sandbox <id>
|
|
||||||
<path>`, streams stdout/stderr via `subprocess.Popen` + the existing
|
|
||||||
`run_command` plumbing.
|
|
||||||
- `deploy/files/etc/sudoers.d/left4me` — grants `left4me` NOPASSWD to
|
|
||||||
the helper path.
|
|
||||||
|
|
||||||
What the helper actually does:
|
|
||||||
1. nsenter into PID 1's mount ns (the band-aid)
|
|
||||||
2. validate args + overlay dir exists
|
|
||||||
3. compute `STAGING=/var/lib/left4me/tmp/sandbox-idmap-${OVERLAY_ID}`
|
|
||||||
4. `trap` cleanup; pre-emptive `umount` of stale staging; `mkdir -p`
|
|
||||||
the staging
|
|
||||||
5. `mount --bind --map-users=$(id -u left4me):$(id -u l4d2-sandbox):1
|
|
||||||
--map-groups=... $OVERLAY_DIR $STAGING`
|
|
||||||
6. `systemd-run` with the full hardening profile, `BindPaths=$STAGING:/overlay`
|
|
||||||
7. Wait for completion, propagate exit code
|
|
||||||
8. trap fires: `umount $STAGING; rmdir $STAGING`
|
|
||||||
|
|
||||||
## Proposed design
|
|
||||||
|
|
||||||
Replace the bash helper with two systemd units (template + a slice)
|
|
||||||
emitted from ckn-bw's existing `systemd_units` reactor, plus a small
|
|
||||||
worker rewrite.
|
|
||||||
|
|
||||||
### `build-overlay@.service` (template unit)
|
|
||||||
|
|
||||||
```ini
[Unit]
Description=Sandboxed overlay build for instance %i
DefaultDependencies=no
After=local-fs.target
RequiresMountsFor=/var/lib/left4me/overlays/%i
ConditionPathIsDirectory=/var/lib/left4me/overlays/%i
ConditionPathExists=/var/lib/left4me/sandbox-scripts/%i.sh

[Service]
Type=oneshot
User=l4d2-sandbox
Group=l4d2-sandbox
Slice=l4d2-build.slice

# Idmap bind: disk uid 980 (left4me) ↔ mount uid 981 (sandbox), so writes
# from the sandbox land on disk as left4me. + prefix runs as root before
# the User= drop (mount syscall requires CAP_SYS_ADMIN).
ExecStartPre=+/usr/bin/mkdir -p /run/left4me/idmap/%i
ExecStartPre=+/usr/bin/mount --bind \
    --map-users=980:981:1 --map-groups=980:981:1 \
    /var/lib/left4me/overlays/%i /run/left4me/idmap/%i

ExecStart=/bin/bash /script.sh

ExecStopPost=+-/usr/bin/umount /run/left4me/idmap/%i
ExecStopPost=+-/usr/bin/rmdir /run/left4me/idmap/%i

# Hardening: all the -p flags from the current bash helper, declared
# declaratively here instead of as systemd-run -p arguments.
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes
PrivateDevices=yes
PrivateIPC=yes
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectKernelLogs=yes
ProtectControlGroups=yes
RestrictNamespaces=yes
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
RestrictSUIDSGID=yes
LockPersonality=yes
MemoryDenyWriteExecute=yes
SystemCallFilter=@system-service @network-io
SystemCallArchitectures=native
CapabilityBoundingSet=
AmbientCapabilities=
IPAddressDeny=127.0.0.0/8 ::1/128 169.254.0.0/16 fe80::/10 224.0.0.0/4 ff00::/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 100.64.0.0/10 fc00::/7
TemporaryFileSystem=/etc /var/lib
BindReadOnlyPaths=/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf /etc/ssl /etc/ca-certificates /etc/nsswitch.conf /etc/alternatives /var/lib/left4me/sandbox-scripts/%i.sh:/script.sh
BindPaths=/run/left4me/idmap/%i:/overlay
WorkingDirectory=/overlay
Environment=HOME=/tmp PATH=/usr/bin:/usr/sbin OVERLAY=/overlay
UMask=0022
OOMScoreAdjust=500
MemoryMax=4G
MemorySwapMax=0
TasksMax=512
CPUQuota=200%
RuntimeMaxSec=3600
TimeoutStartSec=1h
TimeoutStopSec=30s
```
|
|
||||||
|
|
||||||
Notes:
|
|
||||||
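- `Type=oneshot` makes `systemctl start` block until ExecStart exits.
  Note that `RuntimeMaxSec=` has no effect on oneshot services, so
  `TimeoutStartSec=1h` is what actually bounds the build.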
|
|
||||||
- `ConditionPath*` provides early failure if the overlay dir or script
|
|
||||||
doesn't exist (avoids running the unit at all in those cases).
|
|
||||||
- `RequiresMountsFor=/var/lib/left4me/overlays/%i` ensures the parent
|
|
||||||
fs is mounted before this unit runs (`/` and `/var/lib` if it's a
|
|
||||||
separate mount point).
|
|
||||||
- `ExecStopPost` uses `+-` (root, ignore failures) — the bind might
|
|
||||||
already be torn down if the unit is restarting.
|
|
||||||
- `BindReadOnlyPaths=...:/script.sh` makes the per-overlay script
|
|
||||||
available at `/script.sh` inside the sandbox, picked from the
|
|
||||||
predictable path `/var/lib/left4me/sandbox-scripts/%i.sh`.
|
|
||||||
|
|
||||||
### Script source: filesystem vs. DB
|
|
||||||
|
|
||||||
**Critical design decision the future session must make.** The current
|
|
||||||
plan in the unit sketch above assumes the worker writes the script
|
|
||||||
content to `/var/lib/left4me/sandbox-scripts/<id>.sh` before calling
|
|
||||||
`systemctl start`. But the script *already lives in the DB* (the
|
|
||||||
`overlays.script` column), and the unit instance name `%i` is the
|
|
||||||
overlay row id. The filesystem copy is redundant unless we want it.
|
|
||||||
|
|
||||||
Three options:
|
|
||||||
|
|
||||||
**Option A — worker writes the script (the unit sketch above).**
|
|
||||||
Worker queries DB, writes `<id>.sh` to a known path, then
|
|
||||||
`systemctl start`. Unit reads via `BindReadOnlyPaths`. Simple, no DB
|
|
||||||
access from the unit, the existing `_sandbox_script_dir()` plumbing
|
|
||||||
mostly works. Cost: redundant on-disk copy; stale files between
|
|
||||||
builds if you don't clean them.
|
|
||||||
|
|
||||||
**Option B — unit fetches the script from the DB itself.** A small
|
|
||||||
root-side helper installed as
|
|
||||||
`/usr/local/libexec/left4me/left4me-fetch-script` does:
|
|
||||||
|
|
||||||
```python
#!/usr/bin/python3
import sqlite3, sys
overlay_id = int(sys.argv[1])
conn = sqlite3.connect("/var/lib/left4me/left4me.db")
row = conn.execute(
    "SELECT script FROM overlays WHERE id = ?", (overlay_id,)
).fetchone()
sys.stdout.write((row[0] if row else "") or "")
```
|
|
||||||
|
|
||||||
Unit's ExecStartPre runs it as root (the `+` prefix), pipes the
|
|
||||||
output to a runtime path that ExecStart reads:
|
|
||||||
|
|
||||||
```ini
RuntimeDirectory=left4me/sandbox-scripts
RuntimeDirectoryMode=0700
ExecStartPre=+/bin/sh -c '/usr/local/libexec/left4me/left4me-fetch-script %i \
    > /run/left4me/sandbox-scripts/%i.sh && chmod 0644 /run/left4me/sandbox-scripts/%i.sh'
BindReadOnlyPaths=/run/left4me/sandbox-scripts/%i.sh:/script.sh
```
|
|
||||||
|
|
||||||
(`RuntimeDirectory=` auto-creates `/run/left4me/sandbox-scripts/` on
|
|
||||||
start and removes it on stop, including the file inside.)
|
|
||||||
|
|
||||||
The fetch script doesn't need sudoers — it runs from ExecStartPre with
|
|
||||||
root privileges already. It only reads the DB; no writes. The DB is
|
|
||||||
`root:left4me 0640` so root can read it.
|
|
||||||
|
|
||||||
Worker becomes a one-liner: `sudo systemctl start build-overlay@<id>`.
|
|
||||||
No FS prep, no tmpfile cleanup.
|
|
||||||
|
|
||||||
**Option C — pipe the script content directly into bash stdin.** The
|
|
||||||
unit's ExecStart is something like
|
|
||||||
`/bin/sh -c "fetch-script %i | /bin/bash"`. Pros: no on-disk file at
|
|
||||||
all. Cons: `/bin/bash` runs without a file path, so `$0` is `bash` and
|
|
||||||
error messages look weird; harder to debug a failing script when there's
|
|
||||||
no file to inspect.
|
|
||||||
|
|
||||||
**Recommendation**: Option B. Decouples script storage (DB) from
|
|
||||||
sandbox transport (a /run/ runtime file). RuntimeDirectory= handles
|
|
||||||
cleanup. Worker becomes trivially small. The fetch-script helper is
|
|
||||||
~10 lines and stays in deploy/files/usr/local/libexec/left4me/.
|
|
||||||
|
|
||||||
If Option A is chosen instead, plan to track the script tmpfiles
|
|
||||||
explicitly so they don't accumulate. With Option B, RuntimeDirectory
|
|
||||||
auto-cleans on stop.
|
|
||||||
|
|
||||||
### Worker invocation
|
|
||||||
|
|
||||||
Replace `run_sandboxed_script` in
|
|
||||||
`l4d2web/services/overlay_builders.py`. The code below is the **Option
|
|
||||||
A** shape (worker writes the script). For **Option B** (recommended),
|
|
||||||
drop the `script_dir`/`script_path`/`write_text`/`chmod` lines — the
|
|
||||||
unit's ExecStartPre fetches from the DB. The signature can also drop
|
|
||||||
`script_text` since the worker doesn't need to pass content anymore.
|
|
||||||
|
|
||||||
```python
import os
import subprocess
import threading


def run_sandboxed_script(
    overlay_id: int,
    script_text: str,  # remove this param if Option B
    *,
    on_stdout: LogSink,
    on_stderr: LogSink,
    should_cancel: CancelCheck,
) -> None:
    # The five lines below are Option A only; delete them for Option B.
    script_dir = _sandbox_script_dir()
    script_dir.mkdir(parents=True, exist_ok=True)
    script_path = script_dir / f"{overlay_id}.sh"
    script_path.write_text(script_text or "")
    os.chmod(script_path, 0o644)

    unit = f"build-overlay@{overlay_id}.service"

    # Tail the unit's journal as a sidecar so output streams into job-logs
    # while the unit runs. journalctl --follow does not exit on its own, so
    # the worker terminates it once systemctl returns.
    journal = subprocess.Popen(
        ["journalctl", "--unit", unit, "--output=cat", "--follow",
         "--since=now", "--no-pager"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )

    def pump() -> None:
        # Forward lines as they arrive so the pipe never fills up. Assumes
        # on_stdout is safe to call from a second thread.
        for line in journal.stdout or []:
            on_stdout(line.rstrip("\n"))

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    try:
        # Start the unit (sudoers permits this exact verb pattern).
        # Type=oneshot makes this block until ExecStart returns.
        rc = subprocess.run(
            ["sudo", "-n", "/bin/systemctl", "start", unit],
            check=False,
        ).returncode
    finally:
        # Stop the sidecar. Lines journald has not delivered yet by the
        # time systemctl returns can be cut off here.
        journal.terminate()
        reader.join(timeout=5)
        journal.wait(timeout=5)

    # Read exit code from the unit. ExecMainStatus is the script's rc;
    # Result is "success" / "failed" / "timeout" etc. Parse KEY=VALUE pairs
    # instead of using --value, because the output order is not guaranteed
    # to follow the order of the -p flags.
    show = subprocess.check_output(
        ["systemctl", "show", unit, "-p", "ExecMainStatus", "-p", "Result"],
        text=True,
    )
    props = dict(line.split("=", 1) for line in show.splitlines() if "=" in line)
    exec_main_status = int(props.get("ExecMainStatus") or 0)
    result = props.get("Result", "")

    if rc != 0 or result != "success":
        raise BuildError(
            f"build-overlay@{overlay_id} failed: "
            f"systemctl rc={rc} unit result={result} script exit={exec_main_status}"
        )
```
|
|
||||||
|
|
||||||
That's ~30 lines vs. ~50 today, and the helper script disappears
|
|
||||||
entirely.
|
|
||||||
|
|
||||||
**Two refinements to consider:**
|
|
||||||
|
|
||||||
1. **Cancel semantics**: today the worker's `should_cancel` callback
|
|
||||||
triggers a SIGTERM via the existing `run_command` plumbing. With
|
|
||||||
systemctl-start, you'd issue `systemctl stop build-overlay@<id>`
|
|
||||||
in a parallel thread when `should_cancel()` returns True. Wire
|
|
||||||
that up.
|
|
||||||
2. **Journal streaming race**: `journalctl --follow --since=now`
|
|
||||||
started *after* `systemctl start` may miss the first few lines.
|
|
||||||
Two fixes:
|
|
||||||
- Start the journal tail before systemctl-start (the unit doesn't
|
|
||||||
exist yet, so journalctl waits silently — verify this behaviour
|
|
||||||
on Trixie).
|
|
||||||
- Or use `journalctl --cursor` machinery: snapshot the cursor
|
|
||||||
before start, then read with `--cursor=` after.
|
|
||||||
|
|
||||||
Start-before is simpler and likely sufficient for L4D2 build
|
|
||||||
verbosity, where the first second of output isn't critical.
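
The cancel refinement could take roughly the following shape. It is a sketch under assumptions: `watch_for_cancel` is a hypothetical helper, `stop_unit` stands in for whatever issues `sudo -n /bin/systemctl stop <unit>`, and the one-second poll interval is arbitrary.

```python
import threading
from typing import Callable


def watch_for_cancel(
    should_cancel: Callable[[], bool],
    stop_unit: Callable[[], None],
    finished: threading.Event,
    poll_seconds: float = 1.0,
) -> threading.Thread:
    """Poll should_cancel until the build finishes; stop the unit at most once."""

    def loop() -> None:
        # Event.wait returns True as soon as `finished` is set, ending the loop.
        while not finished.wait(poll_seconds):
            if should_cancel():
                stop_unit()
                return

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

The worker would create the event before `systemctl start`, call `finished.set()` in its `finally` block, and then join the returned thread so no watcher outlives the build.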
|
|
||||||
|
|
||||||
### Sudoers
|
|
||||||
|
|
||||||
Replace:
|
|
||||||
```
left4me ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox
```
|
|
||||||
|
|
||||||
with:
|
|
||||||
```
left4me ALL=(root) NOPASSWD: /bin/systemctl start build-overlay@*.service
left4me ALL=(root) NOPASSWD: /bin/systemctl stop build-overlay@*.service
```
|
|
||||||
|
|
||||||
(Tighter — verb-prefixed and instance-globbed. No script path passed.)
|
|
||||||
|
|
||||||
### Slice
|
|
||||||
|
|
||||||
`l4d2-build.slice` already exists (per today's gameserver/sandbox
configuration). Reuse it; no change needed.
|
|
||||||
|
|
||||||
### Sandbox script tmpfile cleanup
|
|
||||||
|
|
||||||
Currently `run_sandboxed_script` writes a per-invocation
|
|
||||||
`tempfile.NamedTemporaryFile` with a random suffix and unlinks it in a
|
|
||||||
`finally`. With template-unit lookup, the script path is **predictable
|
|
||||||
per overlay id** (`/var/lib/left4me/sandbox-scripts/<id>.sh`).
|
|
||||||
Implications:
|
|
||||||
|
|
||||||
- Two concurrent builds for the *same* overlay id would clobber the
|
|
||||||
script file. The job queue already serializes per-overlay (per
|
|
||||||
`l4d2web/services/job_worker.py:OVERLAY_OPERATIONS`), so this
|
|
||||||
is OK.
|
|
||||||
- Scripts persist between builds (no auto-cleanup). Either accept
|
|
||||||
that (the next build overwrites) or delete after the unit goes
|
|
||||||
inactive. Recommend: leave them — small, useful for debugging.
|
|
||||||
|
|
||||||
## Migration
|
|
||||||
|
|
||||||
In order:
|
|
||||||
|
|
||||||
1. **Add the unit emission to ckn-bw's `bundles/left4me/metadata.py`
|
|
||||||
systemd_units reactor.** Mirror the pattern used for
|
|
||||||
`left4me-server@.service`. Drop in the template-unit content as
|
|
||||||
another reactor entry.
|
|
||||||
2. **Update sudoers** (`bundles/left4me/files/etc/sudoers.d/left4me`)
|
|
||||||
to permit `systemctl start/stop build-overlay@*.service` and
|
|
||||||
remove the script-sandbox grant.
|
|
||||||
3. **Replace `run_sandboxed_script` in left4me.** Add the new
|
|
||||||
journalctl-based output streaming, exit-code reading, and cancel
|
|
||||||
handling. Keep the function signature stable so callers
|
|
||||||
(`ScriptBuilder.build`, the wipe route) are unchanged.
|
|
||||||
4. **Delete `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`.**
|
|
||||||
5. **Update tests:**
|
|
||||||
- `deploy/tests/test_deploy_artifacts.py`:
|
|
||||||
- Drop `test_script_sandbox_uses_idmap_staging` and any other
|
|
||||||
tests that read SCRIPT_SANDBOX_HELPER.
|
|
||||||
- Add tests that assert the new unit emission in ckn-bw's
|
|
||||||
reactor output. (But that's in the other repo — left4me's
|
|
||||||
deploy tests can't directly cover it.)
|
|
||||||
- Add a test that asserts the worker invokes
|
|
||||||
`sudo systemctl start build-overlay@*` (grep
|
|
||||||
`overlay_builders.py`).
|
|
||||||
- `l4d2web/tests/test_overlay_builders.py` (if it exists):
|
|
||||||
update mocks for `run_sandboxed_script` to expect the new
|
|
||||||
subprocess shape.
|
|
||||||
6. **Test on `left4.me`:**
|
|
||||||
- Push left4me, `bw apply ovh.left4me`. Apply also picks up the
|
|
||||||
new unit emission and the sudoers change.
|
|
||||||
- Trigger a script-overlay rebuild via the web UI or the
|
|
||||||
enqueue API path used in this session (see test history in
|
|
||||||
git log around 2026-05-15).
|
|
||||||
- Inspect: `journalctl -u build-overlay@9.service`,
|
|
||||||
`systemctl status build-overlay@9.service`.
|
|
||||||
- Verify on-disk state: overlay files end up `left4me`-owned;
|
|
||||||
idmap bind cleanly torn down (`findmnt | grep idmap` empty).
|
|
||||||
|
|
||||||
## Open decisions for the future session
|
|
||||||
|
|
||||||
0. **Script source: filesystem (Option A) vs. DB-fetched in ExecStartPre
|
|
||||||
(Option B) vs. piped to stdin (Option C).** See the "Script source"
|
|
||||||
section above. This is the highest-impact decision because it
|
|
||||||
shapes the worker, the unit's ExecStartPre, and whether you need
|
|
||||||
a fetch-script helper binary at all. Recommendation: Option B.
|
|
||||||
|
|
||||||
1. **`/run/left4me/idmap/%i` vs. `/var/lib/left4me/tmp/sandbox-idmap-%i`** —
|
|
||||||
`/run` is tmpfs and wiped on reboot, more correct for transient
|
|
||||||
mount paths. But it requires the dir to exist (created by
|
|
||||||
ExecStartPre). Either works.
|
|
||||||
2. **What to do with the existing `left4me-apply-cake` dead code** —
|
|
||||||
irrelevant to this refactor; flagged in the other handoff doc.
|
|
||||||
3. **Whether to drop the post-build `chmod o+r` in the sandbox helper** —
|
|
||||||
already gone in the build-time-idmap commit. (Verify in the new
|
|
||||||
unit nothing equivalent is needed; files are left4me-owned, web
|
|
||||||
reads via primary uid.)
|
|
||||||
4. **`Type=oneshot` vs. `Type=exec`** — oneshot blocks `systemctl
|
|
||||||
start`. exec doesn't. With oneshot we don't need the
|
|
||||||
`journalctl --follow` workaround if we read journal *after*
|
|
||||||
completion. But for live progress (which the existing builds
|
|
||||||
stream), `--follow` is still needed. Stick with oneshot.
|
|
||||||
5. **Should the unit set `KillMode=mixed`** to ensure children die on
|
|
||||||
stop? Worth checking — the existing systemd-run line doesn't set
|
|
||||||
it explicitly; defaults usually suffice.
|
|
||||||
6. **`StateDirectory=` vs. explicit `mkdir -p`** — systemd has
|
|
||||||
StateDirectory and RuntimeDirectory directives that auto-create
|
|
||||||
per-unit directories. Could replace the `mkdir -p /run/left4me/idmap/%i`
|
|
||||||
ExecStartPre with `RuntimeDirectory=left4me/idmap/%i`. Cleaner;
|
|
||||||
gets auto-cleanup on stop too. Recommend doing this — both the
|
|
||||||
mkdir and the rmdir ExecStopPost would go away.
|
|
||||||
|
|
||||||
## Verification
|
|
||||||
|
|
||||||
End-to-end smoke test on `left4.me` after the deploy:
|
|
||||||
|
|
||||||
```bash
# unit is installed and template-parseable
systemctl list-unit-files 'build-overlay@.service'  # should list the template as "static"
sudo systemd-analyze verify build-overlay@1.service

# enqueue a build via the web app's worker path (mimic the
# enqueue_build_overlay pattern from this session's job 64 onwards)
# then watch:
sudo journalctl -u build-overlay@9.service -f

# on completion:
systemctl show build-overlay@9.service -p Result -p ExecMainStatus
# expect: Result=success, ExecMainStatus=0

# disk state
sudo find /var/lib/left4me/overlays/9 -uid 981   # should be empty
sudo find /run/left4me/idmap                      # should not exist or be empty

# pid 1 mount table: no orphan idmap binds
sudo findmnt --task 1 -o TARGET | grep idmap      # empty
```
|
|
||||||
|
|
||||||
## Risks
|
|
||||||
|
|
||||||
- **Worker cancel-during-build**: today's `should_cancel` callback
|
|
||||||
signals via `run_command`'s child process. With the unit, the
|
|
||||||
worker needs a separate path: spawn a thread that polls
|
|
||||||
`should_cancel()` and calls `sudo systemctl stop build-overlay@<id>`
|
|
||||||
when triggered. Without this, a user cancel has no effect until the
script exits or the unit's own timeout fires.
|
|
||||||
- **Journal lag at unit start**: `journalctl --follow` started before
|
|
||||||
`systemctl start` should pick up all output. If not, may need
|
|
||||||
cursor-based streaming. Test with a script that prints immediately
|
|
||||||
(`echo hello; exit 0`) — if "hello" appears in the job log, race
|
|
||||||
is handled.
|
|
||||||
- **Sudoers globbing**: `systemctl start build-overlay@*.service`
|
|
||||||
permits any instance id including weird strings like `../etc-passwd`.
|
|
||||||
Use a tighter glob if possible (e.g.,
|
|
||||||
`build-overlay@[0-9]*.service`). Test that sudoers rejects
|
|
||||||
unexpected instance names.
|
|
||||||
- **Type=oneshot return semantics**: confirm that `systemctl start
|
|
||||||
build-overlay@<id>` on a Type=oneshot unit returns rc=3 (or
|
|
||||||
similar) when the unit's ExecStart fails, so the worker can detect
|
|
||||||
failure without re-querying `systemctl show`.
|
|
||||||
- **Reboot during a build**: a build that's running across a reboot
|
|
||||||
is killed when the system goes down. That's identical to today's
|
|
||||||
behavior with systemd-run. Acceptable.
|
|
||||||
- **The journalctl sidecar process accumulates as a zombie if not
|
|
||||||
reaped properly.** The proposed code does `journal.wait(timeout=5)`
|
|
||||||
— handle the timeout case (force-kill).
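
For the sidecar-reaping risk, the timeout case might be handled along these lines. This is a sketch with a hypothetical helper name, `stop_sidecar`; the five-second grace period simply mirrors the `journal.wait(timeout=5)` in the proposed worker code.

```python
import subprocess


def stop_sidecar(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Terminate proc, escalate to SIGKILL after grace_seconds, and reap it."""
    if proc.poll() is None:
        proc.terminate()
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # The process ignored SIGTERM; kill it and wait again so it is reaped.
        proc.kill()
        return proc.wait()
```

Because the final `wait()` always runs, the child is reaped on both paths and cannot linger as a zombie.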
|
|
||||||
|
|
||||||
## Pointers
|
|
||||||
|
|
||||||
Reference files (with line numbers if applicable):
|
|
||||||
|
|
||||||
- **Current helper to be removed**:
|
|
||||||
`deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`
|
|
||||||
- **Current worker invoker**:
|
|
||||||
`l4d2web/services/overlay_builders.py:run_sandboxed_script` (~ln 324)
|
|
||||||
- **Current job-worker dispatch**:
|
|
||||||
`l4d2web/services/job_worker.py` (build_overlay operation)
|
|
||||||
- **Sudoers**:
|
|
||||||
`deploy/files/etc/sudoers.d/left4me` (matched verbatim in
|
|
||||||
`ckn-bw/bundles/left4me/files/etc/sudoers.d/left4me`)
|
|
||||||
- **Sample template unit pattern** (the model to copy):
|
|
||||||
`left4me-server@.service` emission in ckn-bw's
|
|
||||||
`bundles/left4me/metadata.py` systemd_units reactor.
|
|
||||||
- **Existing slice declaration** (already correct):
|
|
||||||
`l4d2-build.slice` in ckn-bw's reactor.
|
|
||||||
|
|
||||||
Recent commits that touched this surface:
|
|
||||||
- `4838108` — moved idmap to build time (the refactor that surfaced
|
|
||||||
the namespace bug)
|
|
||||||
- `f1aa05d` — added nsenter self-wrap (the band-aid this refactor
|
|
||||||
removes)
|
|
||||||
- `2f6a9cf`, `9053186`, `dd918ac` — earlier idmap-on-mount approach
|
|
||||||
that was reverted
|
|
||||||
|
|
||||||
Related design docs:
|
|
||||||
- `docs/superpowers/plans/2026-05-15-build-time-idmap.md` — the
|
|
||||||
plan whose architecture this refactor builds on
|
|
||||||
- `docs/superpowers/specs/2026-05-15-deploy-dir-rethink-design.md` —
|
|
||||||
unrelated open questions about deploy/ layout
|
|
||||||
|
|
||||||
## What's NOT in scope
|
|
||||||
|
|
||||||
- Rewriting the sandbox in Python / packaging differently.
|
|
||||||
- Changing the security hardening profile (the unit duplicates the
|
|
||||||
current set verbatim — adjust later if needed).
|
|
||||||
- Splitting the gameserver uid from the web app uid (noted in earlier
|
|
||||||
handoff doc).
|
|
||||||
- Re-evaluating whether `l4d2-sandbox` should exist as a separate
|
|
||||||
uid (kept; defense in depth).
|
|
||||||
- Touching the `left4me-overlay` gameserver helper (it already uses
|
|
||||||
the pattern; only the sandbox helper is being refactored to match).
|
|
||||||
|
|
||||||
## Estimate
|
|
||||||
|
|
||||||
Rough breakdown for the future session:
|
|
||||||
- Unit file design + ckn-bw reactor change: 1-2 hours
|
|
||||||
- Worker rewrite (run_sandboxed_script): 1-2 hours
|
|
||||||
- Tests: 1 hour
|
|
||||||
- Deploy + verify on test server: 30 min
|
|
||||||
- Bug-fix and iteration buffer: 1 hour
|
|
||||||
|
|
||||||
~5 hours of focused work, assuming no surprises with journalctl
|
|
||||||
streaming or sudoers semantics.
|
|
||||||
|
|
||||||
## Decision criteria for whether to do this
|
|
||||||
|
|
||||||
Do it if:
|
|
||||||
- You're about to make any other change to the sandbox hardening,
|
|
||||||
build lifecycle, or sandbox uid story.
|
|
||||||
- You're frustrated by debugging the existing helper.
|
|
||||||
- You want to remove the nsenter band-aid for hygiene.
|
|
||||||
|
|
||||||
Skip if:
|
|
||||||
- The sandbox is stable and you're not planning related changes.
|
|
||||||
- You'd rather invest the time in higher-value work elsewhere.
|
|
||||||
|
|
||||||
The current solution is fine; this refactor is upgrade-not-fix.
|
|
||||||
|
|
@ -1,252 +0,0 @@
|
||||||
# Deploy directory architecture — open questions
|
|
||||||
|
|
||||||
**Resolved 2026-05-15 by [`docs/superpowers/plans/2026-05-15-deploy-dir-rethink.md`](../plans/2026-05-15-deploy-dir-rethink.md).**
|
|
||||||
Decision summary: `deploy/` is reference material; privileged scripts moved
|
|
||||||
to top-level `scripts/{libexec,sbin}/`; `deploy-test-server.sh` deleted;
|
|
||||||
dead static units (cake.service, nft-mark.service) deleted; reactor-emitted
|
|
||||||
units (server@, web, workshop-refresh.{service,timer}, slices) retained as
|
|
||||||
curated examples; ckn-bw `install_left4me_scripts` action repointed to the
|
|
||||||
new source paths. Body below preserved for archaeology.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
**Status: open questions, not a settled design.** This is a thinking-aloud
|
|
||||||
handoff prompted by the script-consolidation change on 2026-05-15. Decisions
|
|
||||||
deferred; a future session should pick this up, talk through the options,
|
|
||||||
and commit to one shape.
|
|
||||||
|
|
||||||
## What happened on 2026-05-15 that prompted this
|
|
||||||
|
|
||||||
Two changes landed in quick succession:
|
|
||||||
|
|
||||||
1. `left4me-overlay` grew idmap bind-mount support so kernel-overlayfs copy-up
|
|
||||||
from `l4d2-sandbox`-owned lowerdirs produces `left4me`-owned upperdir
|
|
||||||
entries (commits `2f6a9cf` + `9053186`).
|
|
||||||
2. Consolidated all five privileged scripts (4 libexec helpers + 1 sbin
|
|
||||||
admin CLI) so left4me owns the source of truth and ckn-bw `install`s
|
|
||||||
them from `/opt/left4me/src/deploy/files/usr/local/{libexec,sbin}/`
|
|
||||||
after `git_deploy` (left4me `f5e36ee`, ckn-bw `3ccaa91`).
|
|
||||||
|
|
||||||
During (2), several architectural assumptions got revised mid-flight rather
|
|
||||||
than thought through fully:
|
|
||||||
|
|
||||||
- `deploy/README.md` flipped from "Status: superseded, historical reference"
|
|
||||||
to "deploy/files/ is canonical, only `deploy-test-server.sh` is historical."
|
|
||||||
- The scripts kept their existing deeply-nested paths under
|
|
||||||
`deploy/files/usr/local/libexec/left4me/*` rather than moving to a
|
|
||||||
cleaner top-level layout (an earlier draft of the plan proposed `bin/`,
|
|
||||||
but the user pushed back on mixing the admin CLI with the helpers).
|
|
||||||
- The resulting state works but several things feel half-finished. This
|
|
||||||
document enumerates them so they don't rot.
|
|
||||||
|
|
||||||
## Current state to look at before deciding anything
|
|
||||||
|
|
||||||
- `deploy/files/usr/local/libexec/left4me/{left4me-systemctl,journalctl,overlay,script-sandbox,apply-cake}`
|
|
||||||
- `deploy/files/usr/local/sbin/left4me`
|
|
||||||
- `deploy/files/usr/local/lib/systemd/system/{left4me-server@.service,left4me-web.service,...}` — **NOT** deployed; ckn-bw emits via reactor. Currently dead-but-kept-for-reference.
|
|
||||||
- `deploy/files/etc/{sudoers.d/left4me,sysctl.d/99-left4me.conf,left4me/sandbox-resolv.conf,left4me/cake.env}`: `sudoers.d/left4me`, `sysctl.d/99-left4me.conf`, and `left4me/sandbox-resolv.conf` are shipped verbatim, but from ckn-bw's own copies (**still duplicated!**). `cake.env` is dead code.
|
|
||||||
- `deploy/templates/etc/left4me/{host.env,web.env.template}` — Mako-rendered by ckn-bw's `bundles/left4me/files/etc/left4me/{host.env.mako,web.env.mako}` (its own copies, **also duplicated**).
|
|
||||||
- `deploy/deploy-test-server.sh` — superseded one-shot bash installer.
|
|
||||||
- `deploy/tests/test_deploy_artifacts.py` — pytest assertions over the
|
|
||||||
files above. Currently canonical / load-bearing.
|
|
||||||
|
|
||||||
The script consolidation only handled `usr/local/libexec/left4me/*` and
|
|
||||||
`usr/local/sbin/left4me`. The other duplicated items above were not in
|
|
||||||
scope.
|
|
||||||
|
|
||||||
## Open question 1: what does `deploy/` mean?
|
|
||||||
|
|
||||||
Four framings, not mutually exclusive but each implies different next moves:
|
|
||||||
|
|
||||||
- **A. "Files to install onto the target"** — single source of truth for
|
|
||||||
every deployable artifact (scripts, configs, sudoers, sysctl, units,
|
|
||||||
env templates). ckn-bw becomes pure orchestration: users, groups,
|
|
||||||
dirs, apt, venv, install actions reading from deploy/.
|
|
||||||
- **B. "Deploy-mechanism artifacts only"** — installer scripts, runbook
|
|
||||||
docs, env-template *examples*. Real project executables live elsewhere
|
|
||||||
in the repo.
|
|
||||||
- **C. "Reference documentation of deploy decisions"** — historical-flavored.
|
|
||||||
Real source-of-truth lives in ckn-bw. This was the framing before
|
|
||||||
2026-05-15.
|
|
||||||
- **D. "Configuration for the deploy target"** — sudoers, sysctl,
|
|
||||||
sandbox-resolv.conf, env. Executables live elsewhere.
|
|
||||||
|
|
||||||
Today we drifted into **A** for the scripts, left **C** lingering for the
systemd units, ended up part A and part C for the `/etc/` files, and promoted
the templates section without changing its actual role. The result is
inconsistent.
|
|
||||||
|
|
||||||
Pick one and lean in.
|
|
||||||
|
|
||||||
## Open question 2: should the scripts live in deploy/ at all?
|
|
||||||
|
|
||||||
Argument for keeping them where they are:
|
|
||||||
- Source path = deploy target. Self-documenting.
|
|
||||||
- Zero churn from the just-landed consolidation.
|
|
||||||
|
|
||||||
Argument for moving them out (top-level `libexec/`, `sbin/`, or `bin/`):
|
|
||||||
- `deploy/` has historically meant "deploy mechanism." Putting a 381-line
Python program (`left4me-overlay`) there mixes "deploy artifacts" with
|
|
||||||
"core project logic." `left4me-overlay` is real software; it has
|
|
||||||
tests, it gets edited like any other code.
|
|
||||||
- Nesting is deep: `deploy/files/usr/local/libexec/left4me/left4me-overlay`
sits five directories below `deploy/`.
|
|
||||||
- Shorter paths make Python constants more readable (the test file uses
|
|
||||||
`OVERLAY_HELPER = DEPLOY / "files/usr/local/libexec/left4me/left4me-overlay"`).
|
|
||||||
|
|
||||||
Counter to the move:
|
|
||||||
- The user pushed back on a flat `bin/` because it mixes admin CLI
|
|
||||||
(`left4me`, sbin role) with internal helpers (`left4me-overlay` et al.,
|
|
||||||
libexec role). A two-dir top-level layout (`libexec/` + `sbin/`) avoids
|
|
||||||
that mix at the cost of two top-level dirs.
|
|
||||||
|
|
||||||
Open variants:
|
|
||||||
- Flat top-level `bin/` (mixed roles, simplest)
|
|
||||||
- Top-level `libexec/` + `sbin/` (role-separated, two top-level dirs)
|
|
||||||
- Top-level `scripts/` with `libexec/` and `sbin/` subdirs (one umbrella)
|
|
||||||
- Stay in `deploy/files/usr/local/{libexec,sbin}/` (current)
|
|
||||||
|
|
||||||
## Open question 3: what to do with `deploy-test-server.sh`
|
|
||||||
|
|
||||||
The script duplicates ckn-bw's install logic in bash form. ckn-bw is
|
|
||||||
authoritative now; the script is at best stale documentation, at worst
|
|
||||||
actively misleading (the user nearly ran it against an ovh.left4me
node during one of the recent debugging passes).
|
|
||||||
|
|
||||||
Options:
|
|
||||||
- **Delete entirely.** ckn-bw is the deploy. Script's content survives
|
|
||||||
in git history if anyone wants to reference it.
|
|
||||||
- **Relocate to `docs/`** as a readable "what does deploy do?" walkthrough.
|
|
||||||
Drop the executable bit, mark it explicitly as docs-only.
|
|
||||||
- **Keep as-is.** README already says superseded; one extra warning in
|
|
||||||
the script header would suffice. Lowest churn, ongoing rot risk.
|
|
||||||
|
|
||||||
If we go with the consolidation direction (everything canonical in
|
|
||||||
left4me), keeping a `deploy-test-server.sh` that doesn't match the
|
|
||||||
canonical paths becomes a documentation bug. Maintaining it in sync
|
|
||||||
with ckn-bw's items.py is overhead nobody wants.
|
|
||||||
|
|
||||||
## Open question 4: bw responsibilities vs. file installs
|
|
||||||
|
|
||||||
Today's split:
|
|
||||||
|
|
||||||
- **bw owns:** users, groups, dirs, env files (Mako-templated with node
|
|
||||||
metadata), sudoers + sysctl + sandbox-resolv.conf (verbatim, **its own
|
|
||||||
copies**), systemd units (reactor-emitted from `metadata.py`), apt
|
|
||||||
packages, venv creation, pip install, alembic, seed-overlays, the
|
|
||||||
install action for privileged scripts.
|
|
||||||
- **left4me owns:** privileged scripts (via the install action reading
|
|
||||||
from `/opt/left4me/src/deploy/files/usr/local/{libexec,sbin}/`).
|
|
||||||
|
|
||||||
The split is inconsistent. ckn-bw ships its own copies of:
|
|
||||||
|
|
||||||
- `bundles/left4me/files/etc/sudoers.d/left4me`
|
|
||||||
- `bundles/left4me/files/etc/sysctl.d/99-left4me.conf`
|
|
||||||
- `bundles/left4me/files/etc/left4me/sandbox-resolv.conf`
|
|
||||||
- `bundles/left4me/files/etc/left4me/{host.env.mako,web.env.mako}`
|
|
||||||
|
|
||||||
And **left4me also has copies** of the first three at
|
|
||||||
`deploy/files/etc/{sudoers.d/left4me,sysctl.d/99-left4me.conf,left4me/sandbox-resolv.conf}`.
|
|
||||||
Either ckn-bw's are the source of truth (in which case left4me's are
|
|
||||||
stale/historical), or left4me's are (in which case we should extend the
|
|
||||||
install-from-checkout pattern to these too).
|
|
||||||
|
|
||||||
Mako-templated env files genuinely need bw's metadata access — those
|
|
||||||
probably stay in ckn-bw as the authoritative renderer. But the
|
|
||||||
templates themselves could live in left4me with placeholders that bw
|
|
||||||
substitutes. We're not far from that today.
|
|
||||||
|
|
||||||
The clean version of "left4me canonical" would have:
|
|
||||||
|
|
||||||
- Verbatim files (sudoers, sysctl, sandbox-resolv.conf, scripts) all in
|
|
||||||
`deploy/files/...` in left4me. ckn-bw's bundle files/ directory holds
|
|
||||||
nothing but the Mako env templates (which need bw's metadata).
|
|
||||||
- Sudoers gets `test_with: visudo -cf {}` — currently a property of
|
|
||||||
ckn-bw's files item. To preserve this when the file moves to install-
|
|
||||||
via-action, the action itself would need to run `visudo -cf
|
|
||||||
/opt/left4me/src/deploy/files/etc/sudoers.d/left4me` before the install
|
|
||||||
step. Doable but adds complexity.
|
|
||||||
|
|
||||||
The clean version of "split-by-purpose" would have:
|
|
||||||
|
|
||||||
- Verbatim files stay in ckn-bw (config bundles are bundles' jobs).
|
|
||||||
- Scripts in left4me, exactly as today.
|
|
||||||
- left4me's `deploy/files/etc/` becomes pure reference — and we should
|
|
||||||
either keep it explicitly labeled as such, or delete it to avoid
|
|
||||||
duplication drift.
|
|
||||||
|
|
||||||
Both are coherent. Today we have neither — half-and-half.
|
|
||||||
|
|
||||||
## Open question 5: dead-code cleanup
|
|
||||||
|
|
||||||
These files exist in `deploy/files/` but serve no live purpose:
|
|
||||||
|
|
||||||
- `usr/local/lib/systemd/system/{left4me-cake.service,left4me-nft-mark.service}` — units replaced by ckn-bw's reactor / nftables bundle.
|
|
||||||
- `usr/local/lib/systemd/system/{left4me-server@.service,left4me-web.service,left4me-workshop-refresh.{service,timer},l4d2-game.slice,l4d2-build.slice}` — also reactor-emitted, not installed from these files.
|
|
||||||
- `usr/local/libexec/left4me/left4me-apply-cake` — dead since CAKE moved to networkd. Currently ships via the new install glob (harmless extra file on `/usr/local/libexec/left4me/`).
|
|
||||||
- `usr/local/lib/left4me/nft/left4me-mark.nft` — central nftables bundle replaced this.
|
|
||||||
- `etc/left4me/cake.env` — replaced by node metadata.
|
|
||||||
|
|
||||||
Each one of these is a self-contained delete-when-someone-feels-like-it
|
|
||||||
job. Cumulatively they add up to enough noise that future readers will
|
|
||||||
get confused about what's load-bearing.
|
|
||||||
|
|
||||||
Probably worth a "deploy/ janitorial pass" PR that just deletes the
|
|
||||||
documented-as-obsolete files. Out of scope for whatever architectural
|
|
||||||
shift you commit to, but mention it as adjacent cleanup.
|
|
||||||
|
|
||||||
## Adjacent thing the script consolidation introduced
|
|
||||||
|
|
||||||
The `install_left4me_scripts` action in ckn-bw ships *everything* in
|
|
||||||
`deploy/files/usr/local/libexec/left4me/` to `/usr/local/libexec/left4me/`
|
|
||||||
via `install -t DEST .../left4me/*`. This is what makes the action
|
|
||||||
filename-agnostic. Side effect: `left4me-apply-cake` (dead code) gets
|
|
||||||
installed too. It does nothing on disk because no unit references it.
|
|
||||||
Three escape hatches:
|
|
||||||
|
|
||||||
- Delete the file from `deploy/files/...` (clean — kills dead code).
|
|
||||||
- Move the file out of the install path (e.g. to `docs/historical/`).
|
|
||||||
- Filter the glob (introduces a named exclusion; user explicitly didn't
|
|
||||||
want filename-naming in the action).
|
|
||||||
|
|
||||||
If the broader "open question 5" cleanup happens, this resolves itself.
|
|
||||||
|
|
||||||
## Recommended structure for the followup session
|
|
||||||
|
|
||||||
When picking this up:
|
|
||||||
|
|
||||||
1. Read `deploy/README.md` (current shape) and this doc.
|
|
||||||
2. Pick a position on **open question 1**: what does `deploy/` mean?
|
|
||||||
The answer constrains everything else.
|
|
||||||
3. Once 1 is settled, **open questions 2 and 4 fall out**: where do
|
|
||||||
scripts live, where do config files live.
|
|
||||||
4. **Open question 3** (`deploy-test-server.sh` fate) is independent of
|
|
||||||
the others and can be decided in isolation.
|
|
||||||
5. **Open question 5** (dead-code cleanup) is independent too;
|
|
||||||
probably worth doing alongside whatever else lands.
|
|
||||||
6. End state should be: the rules for "what goes in deploy/" can be
|
|
||||||
written in two sentences. Today they take a paragraph plus
|
|
||||||
exceptions.
|
|
||||||
|
|
||||||
## Pointers
|
|
||||||
|
|
||||||
- `deploy/README.md` documents the current canonical/historical split.
|
|
||||||
- ckn-bw's bundle: `git.sublimity.de/cronekorkn/ckn-bw`,
|
|
||||||
`bundles/left4me/items.py`. The `install_left4me_scripts` action and
|
|
||||||
the files dict are the relevant entry points.
|
|
||||||
- Plans behind the recent changes:
|
|
||||||
`docs/superpowers/plans/2026-05-14-overlay-idmap.md` (idmap helper) and
|
|
||||||
the ~/.claude/plans scratch file for the script consolidation.
|
|
||||||
- Recent commit history that touched this surface:
|
|
||||||
- `f5e36ee` deploy: claim /usr/local/sbin/left4me admin CLI in deploy/files
|
|
||||||
- `2f6a9cf` + `9053186` left4me-overlay idmap support
|
|
||||||
- ckn-bw `3ccaa91` left4me: install privileged scripts from git_deploy artifact
|
|
||||||
|
|
||||||
## What I don't think is in scope here
|
|
||||||
|
|
||||||
- Rewriting the shell helpers in Python / packaging them as
|
|
||||||
console_scripts. Considered and rejected in the script-consolidation
|
|
||||||
plan because of the egg-info / TOCTOU privilege concern around
|
|
||||||
left4me-uid-writable bin dirs.
|
|
||||||
- Switching to a kernel-overlayfs alternative.
|
|
||||||
- Splitting the gameserver uid from the web app uid. Separate planned
|
|
||||||
change.
|
|
||||||
|
|
@ -1,349 +0,0 @@
|
||||||
# Deployment responsibility — design
|
|
||||||
|
|
||||||
## Status
|
|
||||||
|
|
||||||
**Shipped 2026-05-15.** All five migration steps landed and verified on
|
|
||||||
ovh.left4me. Implementation plan:
|
|
||||||
`docs/superpowers/plans/2026-05-15-deployment-responsibility.md`.
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
Trace: `2026-05-06-left4me-deployment-design.md` established the original
|
|
||||||
model — left4me's `deploy/files/` mirrors target filesystem paths;
|
|
||||||
ckn-bw integrates. The hardening refactor
|
|
||||||
(`2026-05-15-hardening-refactor-design.md`) landed *inline-in-reactor*
|
|
||||||
as an explicit tradeoff and queued the responsibility question for this
|
|
||||||
brainstorm (handoff: `2026-05-15-handoff-deployment-responsibility.md`).
|
|
||||||
The runtime-state relocation
|
|
||||||
(`2026-05-15-runtime-state-relocation-design.md`) made
|
|
||||||
`/opt/left4me/src` root-owned, which is the prerequisite that makes
|
|
||||||
target-side symlinks into the checkout safe — left4me cannot rewrite
|
|
||||||
its own deployment artifacts at runtime.
|
|
||||||
|
|
||||||
This design picks a narrow, conservative line. Application-shape
|
|
||||||
artifacts that are static across hosts move to left4me's `deploy/`
|
|
||||||
tree and are delivered to the target via **target-side symlinks**.
|
|
||||||
Per-host shape (CPU pinning, gunicorn workers, env file values) stays
|
|
||||||
bw-managed. The base systemd unit bodies stay bw-managed too — they
|
|
||||||
encode per-host values (workers, threads, CPU set) that are awkward to
|
|
||||||
parameterize cleanly, and ckn-bw is already the right place for that
|
|
||||||
computation.
|
|
||||||
|
|
||||||
The wedge between "moves" and "stays" is **threat model knowledge vs.
|
|
||||||
host shape**. The hardening profile is the security knowledge of the
|
|
||||||
application; the base unit body is the operational shape of the host.
|
|
||||||
Different repos.
|
|
||||||
|
|
||||||
## Scope
|
|
||||||
|
|
||||||
### Moves to left4me/deploy/, delivered via target-side symlinks
|
|
||||||
|
|
||||||
| Artifact | Source path (in the left4me checkout) | Symlink location on the target |
|---|---|---|
| Hardening drop-in for `left4me-web` | `deploy/files/etc/systemd/system/left4me-web.service.d/10-hardening.conf` (NEW) | `/etc/systemd/system/left4me-web.service.d/10-hardening.conf` |
| Hardening drop-in for `left4me-server@` | `deploy/files/etc/systemd/system/left4me-server@.service.d/10-hardening.conf` (NEW) | same pattern |
| Sudoers | `deploy/files/etc/sudoers.d/left4me` (exists) | `/etc/sudoers.d/left4me` |
| Sysctl drop-in (absorbs `ptrace_scope`) | `deploy/files/etc/sysctl.d/99-left4me.conf` (exists; one line added) | `/etc/sysctl.d/99-left4me.conf` |
| Privileged helpers (`libexec/`) | `deploy/scripts/libexec/*` (relocated from `scripts/libexec/`) | `/usr/local/libexec/left4me/<name>` |
| Privileged helpers (`sbin/`) | `deploy/scripts/sbin/*` (relocated from `scripts/sbin/`) | `/usr/local/sbin/<name>` |
|
|
||||||
|
|
||||||
All symlinks are created by bw `symlinks{}` items in
|
|
||||||
`bundles/left4me/items.py`. `git_deploy:/opt/left4me/src` triggers
|
|
||||||
`systemctl daemon-reload` (for unit drop-ins) and `sysctl --system`
|
|
||||||
(for sysctl) so changes to the symlink-target content propagate even
|
|
||||||
though the symlink path itself doesn't change.
|
|
||||||
|
|
||||||
### Stays bw-managed
|
|
||||||
|
|
||||||
- **Base unit bodies** (`left4me-web.service`, `left4me-server@.service`):
|
|
||||||
emitted by the `systemd/units` reactor in
|
|
||||||
`bundles/left4me/metadata.py`. These encode per-host values
|
|
||||||
(gunicorn workers/threads, CPU pinning, instance bind paths). Pulling
|
|
||||||
them into left4me would require either templating or env-var
|
|
||||||
parameterization that doesn't cleanly cover everything (systemd
|
|
||||||
doesn't substitute env vars in non-Exec directives like
|
|
||||||
`SocketBindAllow=`).
|
|
||||||
- **Slice units** (`l4d2-game.slice`, `l4d2-build.slice`) and cpuset
|
|
||||||
drop-ins (`system.slice.d/99-left4me-cpuset.conf`,
|
|
||||||
`user.slice.d/99-left4me-cpuset.conf`): all encode per-host CPU
|
|
||||||
pinning. Reactor stays.
|
|
||||||
- **`host.env.mako`, `web.env.mako`**: per-host secret + scalar
|
|
||||||
templating. Stays.
|
|
||||||
- **`nginx/vhosts`, `nftables/input`, `nftables/output`**: bundle
|
|
||||||
abstractions (letsencrypt auto-population, set-merge) add real value
|
|
||||||
over raw files.
|
|
||||||
- **`systemd-timers/left4me-workshop-refresh`**: same — bundle
|
|
||||||
synthesizes the `.timer` + `.service` from the metadata dict.
|
|
||||||
- **Action chains**: `git_deploy`, `pip_install`, `alembic_upgrade`,
|
|
||||||
`seed_overlays`, `create_venv`, `pip_upgrade`, `install_steamcmd`.
|
|
||||||
Stays.
|
|
||||||
- **`directory`, `user`, `group`** items: must exist before
|
|
||||||
`git_deploy` runs.
|
|
||||||
- **`apt/packages`, `backup/paths`** defaults. Stays.
|
|
||||||
|
|
||||||
### Stays in left4me as reference fixtures (no change)
|
|
||||||
|
|
||||||
`deploy/files/usr/local/lib/systemd/system/*.{service,slice}` —
|
|
||||||
reference units matched against the live form by
|
|
||||||
`deploy/tests/test_deploy_artifacts.py`. Base units stay bw-emitted,
|
|
||||||
so reference-vs-live assertion stays valid. Reference units should
|
|
||||||
**not** include hardening directives once the drop-in extraction
|
|
||||||
lands; the live form's hardening lives in the drop-in, not the base
|
|
||||||
unit.
|
|
||||||
|
|
||||||
## Repo layout (left4me)
|
|
||||||
|
|
||||||
```
deploy/
  files/
    etc/sudoers.d/left4me
    etc/sysctl.d/99-left4me.conf
    etc/systemd/system/left4me-web.service.d/10-hardening.conf      # NEW
    etc/systemd/system/left4me-server@.service.d/10-hardening.conf  # NEW
    etc/left4me/sandbox-resolv.conf                                 # unchanged
    usr/local/lib/systemd/system/*.{service,slice}                  # reference (unchanged shape)
  scripts/                                                          # moves in from scripts/
    libexec/{left4me-overlay,left4me-systemctl,left4me-journalctl,left4me-script-sandbox}
    sbin/<wrappers>
  tests/
```
|
|
||||||
|
|
||||||
## Mechanism: target-side symlinks
|
|
||||||
|
|
||||||
bw `symlinks{}` item type. One entry per artifact:
|
|
||||||
|
|
||||||
```python
symlinks = {
    '/etc/sudoers.d/left4me': {
        'target': '/opt/left4me/src/deploy/files/etc/sudoers.d/left4me',
        'owner': 'root',
        'group': 'root',
        'needs': ['git_deploy:/opt/left4me/src'],
    },
    '/etc/sysctl.d/99-left4me.conf': {
        'target': '/opt/left4me/src/deploy/files/etc/sysctl.d/99-left4me.conf',
        'owner': 'root',
        'group': 'root',
        'needs': ['git_deploy:/opt/left4me/src'],
        'triggers': ['action:left4me_sysctl_reload'],
    },
    '/etc/systemd/system/left4me-web.service.d/10-hardening.conf': {
        'target': '/opt/left4me/src/deploy/files/etc/systemd/system/left4me-web.service.d/10-hardening.conf',
        'needs': [
            'directory:/etc/systemd/system/left4me-web.service.d',
            'git_deploy:/opt/left4me/src',
        ],
        'triggers': ['action:systemd_daemon_reload'],
    },
    # …same for left4me-server@.service.d/10-hardening.conf
    # …same for each script in /usr/local/{libexec/left4me,sbin}/
}
```
|
|
||||||
|
|
||||||
Drop-in directories (`*.service.d/`) need explicit `directory:` items
|
|
||||||
in `items.py` (the bw systemd bundle does not create them
|
|
||||||
automatically for symlink-only drop-ins). Mode `0755`, owner
|
|
||||||
`root:root`.
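
A minimal sketch of those directory items, assuming bw's `directories` item type accepts `mode`, `owner`, and `group` keys the way the symlink items above do. The helper name `dropin_directories` and the `DROPIN_UNITS` tuple are illustrative, not taken from `items.py`.

```python
# Hypothetical helper for bundles/left4me/items.py (names are assumptions).
DROPIN_UNITS = ('left4me-web.service', 'left4me-server@.service')


def dropin_directories(units=DROPIN_UNITS):
    """Return one directory item per systemd drop-in directory."""
    return {
        f'/etc/systemd/system/{unit}.d': {
            'mode': '0755',
            'owner': 'root',
            'group': 'root',
        }
        for unit in units
    }


directories = dropin_directories()
```

Generating the entries from a tuple keeps the two drop-in directories in step with each other; writing them out by hand would work just as well for two units.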
|
|
||||||
|
|
||||||
bw fires the symlink's `triggers:` when **the symlink itself
|
|
||||||
changes** (path/target update). It does *not* fire when the symlink's
|
|
||||||
*target content* changes — that's still a `git_deploy:` event. So
|
|
||||||
both wirings are needed: every symlink declares
|
|
||||||
`needs: ['git_deploy:/opt/left4me/src']`, and ckn-bw declares
|
|
||||||
`triggered_by: [git_deploy:/opt/left4me/src]` actions for the global
|
|
||||||
reloads (`daemon-reload`, `sysctl --system`).
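
The second half of that wiring might look roughly like this. It is a sketch only: the action names come from the `triggers:` entries above, but the `command` strings and the use of `triggered: True` are assumptions about how ckn-bw defines them, and `systemd_daemon_reload` may well live in a different bundle.

```python
# Sketch of the reload actions; dict shape is assumed, not copied from ckn-bw.
GIT_DEPLOY = 'git_deploy:/opt/left4me/src'

actions = {
    'systemd_daemon_reload': {
        'command': 'systemctl daemon-reload',
        'triggered': True,             # run only when something triggers it
        'triggered_by': [GIT_DEPLOY],  # content behind the symlinks changed
    },
    'left4me_sysctl_reload': {
        'command': 'sysctl --system',
        'triggered': True,
        'triggered_by': [GIT_DEPLOY],
    },
}
```

With both wirings in place, a new symlink fires the reload through its own `triggers:`, and a new checkout fires it through `triggered_by`.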
|
|
||||||
|
|
||||||
## Per-artifact details
|
|
||||||
|
|
||||||
### Hardening drop-ins
|
|
||||||
|
|
||||||
Extract from `HARDENING_COMMON`, `HARDENING_SERVER`, `HARDENING_WEB`
|
|
||||||
Python dicts in `bundles/left4me/metadata.py` into static `.conf`
|
|
||||||
files:
|
|
||||||
|
|
||||||
```ini
# deploy/files/etc/systemd/system/left4me-web.service.d/10-hardening.conf
[Service]
ProtectProc=invisible
ProcSubset=pid
ProtectKernelTunables=true
…
SystemCallArchitectures=native
SystemCallFilter=@system-service
SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete
```
|
|
||||||
|
|
||||||
Per-directive comments documenting *why* each directive is set the
|
|
||||||
way it is (sudo-incompatibility carve-outs for web; i386 amendment
|
|
||||||
and `PrivatePIDs` rationale for server@) should live inline as `#`
|
|
||||||
comments. Today those rationale comments live in the Python source;
|
|
||||||
they need to come along.
|
|
||||||
|
|
||||||
After extraction, the reactor's emitted unit bodies drop the
|
|
||||||
`**HARDENING_WEB` / `**HARDENING_SERVER` splat. The reactor still
|
|
||||||
emits the base unit and is responsible for everything except the
|
|
||||||
hardening profile.
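
As a rough illustration of the extraction, a one-off script could render a hardening dict into drop-in text before the rationale comments are added by hand. The `render_dropin` helper and the `EXAMPLE` dict are hypothetical; the real `HARDENING_*` dicts are not reproduced here and may be shaped differently.

```python
def render_dropin(directives):
    """Render a mapping of systemd directives as a [Service] drop-in body.

    A list value produces one line per element, which is how repeated
    directives such as SystemCallFilter= are written in a unit file.
    """
    lines = ['[Service]']
    for key, value in directives.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            if isinstance(item, bool):
                item = 'true' if item else 'false'
            lines.append(f'{key}={item}')
    return '\n'.join(lines) + '\n'


# Illustrative subset only, not the real hardening profile.
EXAMPLE = {
    'ProtectProc': 'invisible',
    'ProtectKernelTunables': True,
    'SystemCallFilter': ['@system-service', '~@debug @mount'],
}

print(render_dropin(EXAMPLE), end='')
```

The generated text carries no comments, so the per-directive rationale from the Python source still has to be moved across manually.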
|
|
||||||
|
|
||||||
### Sudoers
|
|
||||||
|
|
||||||
Today: identical content in
|
|
||||||
`left4me/deploy/files/etc/sudoers.d/left4me` and
|
|
||||||
`ckn-bw/bundles/left4me/files/etc/sudoers.d/left4me`. The bw item
|
|
||||||
sources from the ckn-bw copy.
|
|
||||||
|
|
||||||
After: bw `symlinks{}` item; delete the ckn-bw copy and the bw
|
|
||||||
`files{}` entry. The `test_with: 'visudo -cf {}'` semantics don't
|
|
||||||
apply to symlinks; the file is instead checked on commit in left4me CI (a
`test_sudoers.py` that runs `visudo -cf` against the checked-in file).
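
A minimal sketch of what such a `test_sudoers.py` could contain, assuming the test runs from the left4me repo root and that `visudo` may be absent on some CI runners. The early return stands in for a proper `pytest.skip`, and the `SUDOERS` path constant is an assumption.

```python
import shutil
import subprocess
from pathlib import Path

# Assumed location, relative to the left4me repo root.
SUDOERS = Path('deploy/files/etc/sudoers.d/left4me')


def visudo_command(path):
    """Build the syntax-check command: -c checks only, -f names the file."""
    return ['visudo', '-cf', str(path)]


def test_sudoers_parses():
    # Nothing to check when visudo or the file is unavailable here.
    if shutil.which('visudo') is None or not SUDOERS.exists():
        return
    result = subprocess.run(
        visudo_command(SUDOERS), capture_output=True, text=True
    )
    assert result.returncode == 0, result.stderr or result.stdout
```

This replaces the pre-install check that bw's `test_with` used to provide with a pre-merge check, so a broken sudoers file is caught before it reaches the checkout rather than at apply time.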
|
|
||||||
|
|
||||||
### Sysctl drop-in + ptrace_scope absorption
|
|
||||||
|
|
||||||
Today: same dual-copy story as sudoers, plus `kernel.yama.ptrace_scope`
|
|
||||||
exists as a metadata default (`sysctl/kernel/yama/ptrace_scope: '2'`)
|
|
||||||
that gets deployed via `bundles/sysctl/` into a separate file.
|
|
||||||
|
|
||||||
After: append `kernel.yama.ptrace_scope = 2` to
|
|
||||||
`deploy/files/etc/sysctl.d/99-left4me.conf`. Delete the metadata
|
|
||||||
entry. Delete the bw `files{}` entry + ckn-bw mirror; replace with
|
|
||||||
symlink. `bundles/sysctl/` no longer renders anything for left4me;
|
|
||||||
all left4me sysctl tuning lives in the one drop-in.
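
A repo-side test could pin the absorbed setting so it is not dropped by accident. The sketch below is an assumption about how such a check might be written; `parse_sysctl` is a hypothetical helper and `SAMPLE` is illustrative content, not the real `99-left4me.conf`.

```python
def parse_sysctl(text):
    """Parse sysctl.d content into a {key: value} dict, skipping comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(('#', ';')):
            continue
        key, sep, value = line.partition('=')
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# Illustrative content only.
SAMPLE = """\
# hardening: restrict ptrace to root
kernel.yama.ptrace_scope = 2
"""

settings = parse_sysctl(SAMPLE)
assert settings['kernel.yama.ptrace_scope'] == '2'
```

In a real test the sample string would be replaced by the contents of `deploy/files/etc/sysctl.d/99-left4me.conf`.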
|
|
||||||
|
|
||||||
### Privileged scripts
|
|
||||||
|
|
||||||
Done (Task 4): the scripts moved to `deploy/scripts/libexec/` and
`deploy/scripts/sbin/`, under `deploy/` for layout consistency. The
`install_left4me_scripts` copy action was replaced by target-side symlinks
from `/usr/local/libexec/left4me/` and `/usr/local/sbin/` into the
checkout at `/opt/left4me/src/deploy/scripts/{libexec,sbin}/`.
|
|
||||||
|
|
||||||
Sudo follows symlinks. With `/opt/left4me/src` root-owned, the
|
|
||||||
symlink target is root-owned, and sudo's `Cmnd_Alias` path matching
|
|
||||||
sees the original `/usr/local/{libexec,sbin}/<name>` path.
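
The per-script symlink items could be generated rather than written out one by one. This is a sketch under assumptions: the libexec names are the four listed in the repo layout, the single `left4me` sbin entry is a guess at what `<wrappers>` expands to, and `script_symlinks` is a hypothetical helper.

```python
CHECKOUT = '/opt/left4me/src'
GIT_DEPLOY = f'git_deploy:{CHECKOUT}'

# libexec names from the repo layout; the sbin entry is an assumption.
LIBEXEC = (
    'left4me-overlay',
    'left4me-systemctl',
    'left4me-journalctl',
    'left4me-script-sandbox',
)
SBIN = ('left4me',)


def script_symlinks(libexec=LIBEXEC, sbin=SBIN):
    """Return symlink items pointing each installed path into the checkout."""
    items = {}
    for kind, dest_dir, names in (
        ('libexec', '/usr/local/libexec/left4me', libexec),
        ('sbin', '/usr/local/sbin', sbin),
    ):
        for name in names:
            items[f'{dest_dir}/{name}'] = {
                'target': f'{CHECKOUT}/deploy/scripts/{kind}/{name}',
                'owner': 'root',
                'group': 'root',
                'needs': [GIT_DEPLOY],
            }
    return items


symlinks = script_symlinks()
```

Listing names explicitly means a new helper needs a one-line change in the bundle, which is the cost of dropping the filename-agnostic copy action.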
|
|
||||||
|
|
||||||
### Reference units in `deploy/files/`
|
|
||||||
|
|
||||||
No structural change. Remove hardening directives from the reference
|
|
||||||
files in lockstep with extracting them into the drop-ins (otherwise
|
|
||||||
`test_deploy_artifacts.py` sees the reference unit with hardening
|
|
||||||
inline but the live unit without). The reference file then represents
|
|
||||||
"the base unit ckn-bw emits"; the drop-in represents "the hardening
|
|
||||||
profile left4me ships".
|
|
||||||
|
|
||||||
## Migration order
|
|
||||||
|
|
||||||
Each step is an independent landable PR.
|
|
||||||
|
|
||||||
1. **Canary — sysctl consolidation.**
|
|
||||||
- Add `kernel.yama.ptrace_scope = 2` to
|
|
||||||
`deploy/files/etc/sysctl.d/99-left4me.conf`.
|
|
||||||
- Delete `defaults['sysctl']['kernel']['yama']['ptrace_scope']`
|
|
||||||
from `bundles/left4me/metadata.py`.
|
|
||||||
- Delete `bundles/left4me/files/etc/sysctl.d/99-left4me.conf`
|
|
||||||
(the verbatim mirror).
|
|
||||||
- Replace the bw `files{}` entry with a `symlinks{}` entry
|
|
||||||
pointing at the checkout.
|
|
||||||
- Verify: `sysctl kernel.yama.ptrace_scope` reads `2`;
|
|
||||||
`bw apply` idempotent.
|
|
||||||
|
|
||||||
2. **Hardening drop-ins.**
|
|
||||||
- Create `deploy/files/etc/systemd/system/left4me-web.service.d/10-hardening.conf`
|
|
||||||
and `…/left4me-server@.service.d/10-hardening.conf` from the
|
|
||||||
`HARDENING_*` dicts.
|
|
||||||
- Remove `**HARDENING_WEB` / `**HARDENING_SERVER` splats from the
|
|
||||||
reactor; delete the three constants.
|
|
||||||
- Remove hardening directives from the reference units in
|
|
||||||
`deploy/files/usr/local/lib/systemd/system/`.
|
|
||||||
- Add `directory:/etc/systemd/system/left4me-{web,server@}.service.d`
|
|
||||||
items + symlinks for the drop-ins.
|
|
||||||
- Wire `systemctl daemon-reload` to fire on
|
|
||||||
`git_deploy:/opt/left4me/src`.
|
|
||||||
- Verify: `systemctl show -p ProtectSystem,ProtectKernelTunables,PrivateUsers,…
|
|
||||||
left4me-web.service left4me-server@1.service` matches the
|
|
||||||
pre-extraction values (full hardening test plan rerun is the
|
|
||||||
gold standard).
|
|
||||||
|
|
||||||
3. **Sudoers.**
|
|
||||||
- Replace bw `files{}` entry with `symlinks{}`.
|
|
||||||
- Delete `bundles/left4me/files/etc/sudoers.d/left4me`.
|
|
||||||
- Add left4me CI test running `visudo -cf` on the file.
|
|
||||||
- Verify: `sudo -l -U left4me` lists the expected commands;
|
|
||||||
gameserver start via the web app still works.
|
|
||||||
|
|
||||||
4. **Privileged scripts.**
|
|
||||||
- `git mv left4me/scripts left4me/deploy/scripts`.
|
|
||||||
- Update any references (commit hooks, docs).
|
|
||||||
- Replace `actions['install_left4me_scripts']` with `symlinks{}`
|
|
||||||
items, one per script. Drop the action.
|
|
||||||
- Update `git_deploy:` `triggers:` to remove
|
|
||||||
`action:install_left4me_scripts`.
|
|
||||||
- Verify: `sudo /usr/local/libexec/left4me/left4me-overlay status 1`
|
|
||||||
still works; gameserver lifecycle (start/stop) still works.
|
|
||||||
|
|
||||||
5. **Cleanup.**
|
|
||||||
- Review the `gunicorn_workers` / `gunicorn_threads` metadata defaults.
Today they are referenced only by `web.env.mako`, but they are real
per-host values, so keep the metadata.
|
|
||||||
- Update `deploy/README.md` to describe the new layout
|
|
||||||
(deploy/files = symlink source-of-truth; deploy/scripts = same
|
|
||||||
for helpers).
|
|
||||||
- Update `bundles/left4me/README.md` to describe the new
|
|
||||||
symlink-based delivery model.
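
The `systemctl show` comparison in step 2 could be automated roughly as follows. The sketch assumes the output for one unit is captured before and after the extraction as plain `KEY=VALUE` lines; `parse_show` and `hardening_drift` are hypothetical helper names.

```python
def parse_show(output):
    """Parse `systemctl show -p ...` output for one unit into a dict."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition('=')
        if sep:
            props[key] = value
    return props


def hardening_drift(before, after):
    """Return {property: (before, after)} for every value that changed."""
    old, new = parse_show(before), parse_show(after)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }
```

An empty result for both `left4me-web.service` and `left4me-server@1.service` would suggest the drop-ins reproduce the inline profile; the full hardening test plan remains the stronger check.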
|
|
||||||
|
|
||||||
## Sequence vs. build-overlay-unit refactor
|
|
||||||
|
|
||||||
This design lands **before** the build-overlay-unit refactor
|
|
||||||
(`2026-05-15-build-overlay-unit-design.md`). Reasons:
|
|
||||||
|
|
||||||
- build-overlay-unit introduces a dispatcher unit template; its
|
|
||||||
hardening profile should live as a drop-in alongside the dispatcher
|
|
||||||
from the start, using the pattern this design establishes.
|
|
||||||
- The reactor surgery in step 2 (removing `HARDENING_*` splats) is
|
|
||||||
cleaner against today's reactor than against a reactor that's also
|
|
||||||
being reshaped for the build-overlay-unit work.
|
|
||||||
|
|
||||||
## Verification (end-to-end)
|
|
||||||
|
|
||||||
After all five steps land and `bw apply` is idempotent on
|
|
||||||
`ovh.left4me`:
|
|
||||||
|
|
||||||
1. `systemctl show -p ProtectSystem,PrivateUsers,SystemCallFilter,…
|
|
||||||
left4me-web.service left4me-server@1.service` matches the
|
|
||||||
hardening test plan's reference values (run the relevant tests
|
|
||||||
from `docs/superpowers/specs/2026-05-15-hardening-test-plan.md`).
|
|
||||||
2. `sysctl kernel.yama.ptrace_scope net.core.rmem_max
|
|
||||||
net.ipv4.tcp_congestion_control` returns expected values.
|
|
||||||
3. `sudo -l -U left4me` reports the same allowed commands as before.
|
|
||||||
4. `ls -la /etc/sudoers.d/left4me /etc/sysctl.d/99-left4me.conf
|
|
||||||
/etc/systemd/system/left4me-*.service.d/10-hardening.conf
|
|
||||||
/usr/local/libexec/left4me/* /usr/local/sbin/<wrappers>` shows
|
|
||||||
symlinks into `/opt/left4me/src/deploy/...`.
|
|
||||||
5. A gameserver round-trip (start via web app → cvar inspect → stop)
|
|
||||||
succeeds.
|
|
||||||
6. `bw verify ovh.left4me` reports no drift.
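
Check 4 above can also be scripted instead of eyeballing `ls -la`. This is a sketch, assuming the check runs on the target host with read access to the paths; `points_into_checkout` is a hypothetical helper and the default root is taken from the symlink targets in this design.

```python
import os


def points_into_checkout(path, root='/opt/left4me/src/deploy'):
    """Return True if `path` is a symlink whose target lies under `root`."""
    if not os.path.islink(path):
        return False
    target = os.readlink(path)
    if not os.path.isabs(target):
        # Relative targets resolve against the symlink's own directory.
        target = os.path.join(os.path.dirname(path), target)
    target = os.path.normpath(target)
    root = root.rstrip('/')
    return target == root or target.startswith(root + '/')
```

Calling it for `/etc/sudoers.d/left4me`, `/etc/sysctl.d/99-left4me.conf`, the two drop-ins, and each installed script gives a single pass/fail per artifact.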
|
|
||||||
|
|
||||||
## Out of scope
|
|
||||||
|
|
||||||
- Moving base unit bodies into left4me. Per-host shape stays
|
|
||||||
reactor-emitted.
|
|
||||||
- AppArmor profiles (deferred from the defenses survey).
|
|
||||||
- Reshaping the bw `files{}` items for `host.env.mako` /
|
|
||||||
`web.env.mako` — they need mako templating with metadata context,
|
|
||||||
which ckn-bw is the right place for.
|
|
||||||
- The build-overlay-unit refactor itself. Lands separately on top of
|
|
||||||
this.
|
|
||||||
|
|
||||||
## Pointers
|
|
||||||
|
|
||||||
- Handoff (this brainstorm's framing):
|
|
||||||
`docs/superpowers/specs/2026-05-15-handoff-deployment-responsibility.md`
|
|
||||||
- Prereq (runtime state relocation + non-editable install, shipped):
|
|
||||||
`docs/superpowers/specs/2026-05-15-runtime-state-relocation-design.md`
|
|
||||||
- Original deployment design (the model being reaffirmed for
|
|
||||||
application-shape artifacts):
|
|
||||||
`docs/superpowers/specs/2026-05-06-left4me-deployment-design.md`
|
|
||||||
- Hardening refactor design (the inline-in-reactor approach this
|
|
||||||
design supersedes for hardening):
|
|
||||||
`docs/superpowers/specs/2026-05-15-hardening-refactor-design.md`
|
|
||||||
- Hardening test plan (reference for step-2 verification):
|
|
||||||
`docs/superpowers/specs/2026-05-15-hardening-test-plan.md`
|
|
||||||
- ckn-bw left4me bundle: `~/Projekte/ckn-bw/bundles/left4me/`
|
|
||||||
|
|
@ -1,244 +0,0 @@
|
||||||
# Handoff — brainstorm deployment responsibility (left4me vs. ckn-bw)
|
|
||||||
|
|
||||||
## Status
|
|
||||||
|
|
||||||
**Resolved 2026-05-15** — the brainstorming session happened and produced
|
|
||||||
`docs/superpowers/specs/2026-05-15-deployment-responsibility-design.md`.
|
|
||||||
Read that for the answer. The runtime-state relocation
|
|
||||||
(`2026-05-15-runtime-state-relocation-design.md`) shipped as a prereq;
|
|
||||||
the design lands hardening drop-ins, sudoers, sysctl, and helpers as
|
|
||||||
symlinks into the (now root-owned) `/opt/left4me/src/deploy/...`
|
|
||||||
checkout, while base unit bodies and per-host shape stay bw-managed.
|
|
||||||
|
|
||||||
This doc is kept as the historical framing — the question that opened
|
|
||||||
the brainstorm, the operator's leaning, and the candidate options that
|
|
||||||
got evaluated. The actual landed answer is the design doc.
|
|
||||||
|
|
||||||
## The question
|
|
||||||
|
|
||||||
How should left4me and ckn-bw split responsibility for the host's
|
|
||||||
deployment?
|
|
||||||
|
|
||||||
**Not a fresh question.** The original deployment design at
|
|
||||||
`docs/superpowers/specs/2026-05-06-left4me-deployment-design.md`
|
|
||||||
already laid out the canonical shape: `deploy/files/` in the left4me
|
|
||||||
repo mirrors target filesystem paths for root-owned deployment
|
|
||||||
artifacts (systemd units, sudoers, helpers, env templates);
|
|
||||||
"production config management can own both env files directly"
|
|
||||||
(line 91). The implicit model: **left4me defines the deployment
|
|
||||||
artifacts; ckn-bw integrates them onto the host.** That spec also
|
|
||||||
defined a self-contained `deploy/deploy-test-server.sh` so the
|
|
||||||
deployment could be exercised without ckn-bw at all.
|
|
||||||
|
|
||||||
Over time, more and more of those artifacts migrated *into* ckn-bw's
|
|
||||||
`bundles/left4me/` — specifically:
|
|
||||||
- systemd unit definitions are now emitted by the
|
|
||||||
`systemd/units` reactor in `~/Projekte/ckn-bw/bundles/left4me/metadata.py`
|
|
||||||
(the hardening refactor we just landed reinforced this).
|
|
||||||
- sysctl options ended up in ckn-bw `bundles/left4me/metadata.py`
|
|
||||||
`defaults` (just landed too).
|
|
||||||
- sudoers exists in *both* repos (left4me `deploy/files/.../sudoers.d/left4me`
|
|
||||||
+ ckn-bw verbatim mirror).
|
|
||||||
- Privileged helpers moved BACK to left4me as part of deploy-dir-rethink
|
|
||||||
(commit `5284e28`) — `scripts/{libexec,sbin}/`. Pattern works:
|
|
||||||
left4me defines, ckn-bw deploys via `install_left4me_scripts`.
|
|
||||||
|
|
||||||
So the trajectory has been mixed: helpers re-converged on left4me
|
|
||||||
(good, matches 2026-05-06); systemd units + sysctl drifted into
|
|
||||||
ckn-bw (away from 2026-05-06). The brainstorm reconciles this.
|
|
||||||
|
|
||||||
**The question**: should we return to the 2026-05-06 model
|
|
||||||
end-to-end — every deployment artifact lives in left4me's
|
|
||||||
`deploy/files/`, ckn-bw becomes a thin integrator — or is the
|
|
||||||
current mixed shape the right answer for some artifact classes?
|
|
||||||
|
|
||||||
## Operator's leaning
|
|
||||||
|
|
||||||
Security-related artifacts belong **in the left4me repo**, owned by
|
|
||||||
the project; ckn-bw is responsible for **integrating** them into the
|
|
||||||
host (deploying them to the right paths, restarting affected units,
|
|
||||||
etc.) but doesn't *author* them.
|
|
||||||
|
|
||||||
Concretely the operator's preference (from session
|
|
||||||
2026-05-15): "security-related stuff should be bundled in this repo
|
|
||||||
and ckn-bw is responsible for integrating it into the server."
|
|
||||||
|
|
||||||
## Why we're doing this
|
|
||||||
|
|
||||||
Background from the hardening-refactor session
|
|
||||||
(`docs/superpowers/specs/2026-05-15-hardening-refactor-design.md`,
|
|
||||||
"Approach" section). We considered two shapes for the hardening
|
|
||||||
landing:
|
|
||||||
|
|
||||||
- **A** — hardening directives inline in ckn-bw's `systemd/units`
|
|
||||||
reactor (the path we took)
|
|
||||||
- **B** — hardening as drop-in `.conf` files living in left4me's
|
|
||||||
`deploy/files/etc/systemd/system/<unit>.d/`, ckn-bw deploys them
|
|
||||||
(consistent with 2026-05-06's `deploy/files/` model)
|
|
||||||
|
|
||||||
We picked A for the hardening refactor because B implied a broader
|
|
||||||
configmgmt responsibility reshape that deserved its own session.
|
|
||||||
That session is this one.
|
|
||||||
|
|
||||||
The motivating arguments for B (this brainstorming session evaluates
|
|
||||||
them seriously):
|
|
||||||
|
|
||||||
1. **Hardening is application knowledge.** Knowing srcds is i386,
|
|
||||||
that `MemoryDenyWriteExecute=true` breaks Source's text
|
|
||||||
relocations, that web's sudo path is incompatible with
|
|
||||||
`PrivateUsers=true` — all of this is left4me's domain, not
|
|
||||||
ckn-bw's. ckn-bw shouldn't need to understand the threat model.
|
|
||||||
2. **Test-artifact = production-artifact.** The Test 7 drop-in from
|
|
||||||
the hardening test plan literally is the file we'd want
|
|
||||||
deployed. With B, there's no translation step.
|
|
||||||
3. **Repo self-containment for security review.** A reviewer of
|
|
||||||
left4me sees the threat model in code form without needing to
|
|
||||||
read the configmgmt repo.
|
|
||||||
4. **Easier coordination with the `build-overlay-unit` refactor**
|
|
||||||
(queued). That unit's hardening profile can ship in its own
|
|
||||||
drop-in inline with the unit template.
|
|
||||||
|
|
||||||
The counter-argument:
|
|
||||||
|
|
||||||
- **Coupling cost.** A change to a directive may require redeploying
|
|
||||||
via ckn-bw, which means a cross-repo coordination cycle (edit
|
|
||||||
left4me → commit → push → ckn-bw `bw apply`). Today the same is
|
|
||||||
true (edit ckn-bw → push → apply); only *which* repo changes.
|
|
||||||
|
|
||||||
## What "security-related" likely means
|
|
||||||
|
|
||||||
Enumerate during the brainstorm. Initial candidates:
|
|
||||||
|
|
||||||
- **systemd unit hardening directives** — currently in
|
|
||||||
ckn-bw `bundles/left4me/metadata.py` `HARDENING_COMMON` /
|
|
||||||
`HARDENING_SERVER` / `HARDENING_WEB`. Strong candidate for left4me.
|
|
||||||
- **sysctl drop-ins** — currently `kernel.yama.ptrace_scope=2` in
|
|
||||||
ckn-bw's left4me bundle `defaults` (`sysctl/kernel/yama/ptrace_scope`).
|
|
||||||
Strong candidate for left4me.
|
|
||||||
- **sudoers** — already in `left4me/deploy/files/etc/sudoers.d/left4me`
|
|
||||||
+ a verbatim mirror in `ckn-bw/bundles/left4me/files/etc/sudoers.d/left4me`.
|
|
||||||
Already mostly left4me-owned; redundancy worth resolving.
|
|
||||||
- **Privileged helper scripts** — already in `left4me/scripts/{libexec,sbin}/`,
|
|
||||||
ckn-bw deploys them via `install_left4me_scripts`. Already
|
|
||||||
left4me-owned. The pattern works.
|
|
||||||
- **systemd unit BASE definitions** (`User=`, `ExecStart=`, `Restart=`,
|
|
||||||
resource limits) — currently in ckn-bw's reactor. **Open question:**
|
|
||||||
is this application knowledge or infrastructure knowledge? They
|
|
||||||
depend on the application's binary paths, env files, restart
|
|
||||||
semantics — all application knowledge. Probably also belongs to
|
|
||||||
left4me.
|
|
||||||
- **AppArmor profiles** (if we add them later — deferred from the
|
|
||||||
defenses survey). Application knowledge.
|
|
||||||
- **`/etc/left4me/host.env` / `web.env` templating** — ckn-bw owns
|
|
||||||
these today because they're templated via mako from node metadata
|
|
||||||
(per-host overrides). Probably stays in ckn-bw.
|
|
||||||
- **User/group creation** — kernel-side infrastructure, no
|
|
||||||
application knowledge needed. Stays in ckn-bw.
|
|
||||||
- **Package installation** (apt). Stays in ckn-bw.
|
|
||||||
- **Firewall rules** — depend on per-instance port ranges
|
|
||||||
(`LEFT4ME_PORT_RANGE_*`); could be either. Worth discussing.
|
|
||||||
- **Nginx vhost** — same: depends on app-specific routes.
|
|
||||||
|
|
||||||
## Mechanism: how does ckn-bw "integrate"?
|
|
||||||
|
|
||||||
Brainstorm the deploy mechanism. Candidates (already partially
|
|
||||||
sketched in the hardening-refactor design doc's earlier draft, before
|
|
||||||
it was reverted to the inline-in-reactor approach):
|
|
||||||
|
|
||||||
- **Symlinks.** ckn-bw creates symlinks like
|
|
||||||
`/etc/systemd/system/left4me-server@.service.d/10-hardening.conf`
|
|
||||||
→ `/opt/left4me/src/deploy/files/etc/systemd/system/.../10-hardening.conf`.
|
|
||||||
Editing the file in the repo + `systemctl daemon-reload` picks it
|
|
||||||
up. Cleanest for "ckn-bw doesn't author."
|
|
||||||
- **File copy via `files` entries.** ckn-bw `files = {...}` reads
|
|
||||||
from `/opt/left4me/src/deploy/files/...` (post-git_deploy) and
|
|
||||||
copies to the target. Standard idiom. Two-place state.
|
|
||||||
- **Glob-walker action.** A small ckn-bw action walks `deploy/files/`
|
|
||||||
tree and mirrors paths to root.
|
|
||||||
- **Bundle inclusion / left4me-as-bundle.** Left4me's `deploy/`
|
|
||||||
becomes its own bundlewrap bundle that ckn-bw imports. Strongest
|
|
||||||
decoupling; requires bundlewrap bundle conventions.
|
|
||||||
|
|
||||||
Each has different implications for: triggers (which units restart
|
|
||||||
when which files change), drift detection, rollback semantics.
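
As a rough illustration of the glob-walker candidate above, the following Python sketch computes which target path each file under `deploy/files/` would mirror to. It only plans; it does not create symlinks or copy anything. The name `plan_mirror` and its parameters are hypothetical and are not part of ckn-bw or bundlewrap.

```python
from pathlib import Path


def plan_mirror(files_root: Path, target_root: Path = Path("/")) -> dict[Path, Path]:
    """Map each regular file under files_root to its mirrored target path.

    Returns {target_path: source_path}; nothing is written to disk.
    """
    plan: dict[Path, Path] = {}
    for source in sorted(files_root.rglob("*")):
        if not source.is_file():
            continue  # directories are implied by the file paths
        relative = source.relative_to(files_root)
        plan[target_root / relative] = source
    return plan
```

Whether each pair is then realized as a symlink or a copy is the mechanism choice under discussion; the plan is the same either way.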
|
|
||||||
|
|
||||||
## Migration / coexistence path
|
|
||||||
|
|
||||||
Brainstorm: how do we get from the current state to the new state
|
|
||||||
without breaking things?
|
|
||||||
|
|
||||||
- Inventory: every artifact ckn-bw currently emits/ships for left4me
|
|
||||||
(the `systemd/units` reactor entries, sysctl defaults, sudoers
|
|
||||||
mirror, file deploy actions, etc.).
|
|
||||||
- For each: stays, moves, or split (some in each).
|
|
||||||
- Mechanism rollout: pick one (symlinks vs. file copy vs. ...) and
|
|
||||||
apply it consistently.
|
|
||||||
- Test-driven: pick one artifact as the canary (probably the sysctl
|
|
||||||
drop-in — smallest), validate the mechanism end-to-end, then
|
|
||||||
migrate the others.
|
|
||||||
|
|
||||||
## Key sub-questions for the brainstorm
|
|
||||||
|
|
||||||
1. **Is the unit's BASE definition application knowledge?** If yes,
|
|
||||||
ckn-bw's `systemd/units` reactor shrinks dramatically — to maybe
|
|
||||||
one line per unit ("ckn-bw, deploy this file as a unit"). If no,
|
|
||||||
we have a more delicate split.
|
|
||||||
2. **What about the user/group definitions?** Infrastructure-side
|
|
||||||
today. But the application defines that `left4me` (uid 980)
|
|
||||||
exists; ckn-bw just creates it. Could move.
|
|
||||||
3. **Per-host configuration** (gunicorn worker count, port ranges,
|
|
||||||
CPU pinning): these are per-host overrides ckn-bw computes from
|
|
||||||
node metadata. Stays in ckn-bw (or whatever owns deployment-time
|
|
||||||
parameterization).
|
|
||||||
4. **Test infrastructure**: `deploy/tests/test_deploy_artifacts.py`
|
|
||||||
asserts left4me's reference units match the deployed form. If
|
|
||||||
left4me starts owning the deployed form, those tests get
|
|
||||||
stronger (no longer "reference vs. live" drift; the file in
|
|
||||||
`deploy/files/` *is* the live form).
|
|
||||||
5. **Drift / observability**: how do we know the deployed state
|
|
||||||
matches the repo? Today `bw apply` + git diff is the source of
|
|
||||||
truth. Same applies; mechanism details vary.
|
|
||||||
6. **Rollback semantics**: removing a drop-in is one `rm` away; the
|
|
||||||
base unit is preserved. Same applies to reverting the
|
|
||||||
left4me-side commit and re-applying.
|
|
||||||
|
|
||||||
## Prereqs (must land before this brainstorming session)
|
|
||||||
|
|
||||||
- **uid-collapse refactor** — queued in
|
|
||||||
`docs/superpowers/plans/2026-05-15-uid-collapse.md`. Settles the
|
|
||||||
user model first so the deployment-responsibility brainstorm
|
|
||||||
doesn't have to juggle a moving user definition.
|
|
||||||
|
|
||||||
## Out of scope for the brainstorm
|
|
||||||
|
|
||||||
- The hardening composition itself (already settled, deployed,
|
|
||||||
verified).
|
|
||||||
- The `build-overlay-unit` template unit refactor
|
|
||||||
(`docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md`)
|
|
||||||
— both this brainstorm *and* the build-overlay-unit refactor
|
|
||||||
benefit from settling responsibility first. Sequencing TBD; the
|
|
||||||
brainstorm should consider whether to land before or after
|
|
||||||
build-overlay-unit.
|
|
||||||
- The application code itself (`l4d2web`, `l4d2host`) — that's
|
|
||||||
always been left4me-owned.
|
|
||||||
|
|
||||||
## Pointers
|
|
||||||
|
|
||||||
- **Original deployment design (the model to revisit):**
|
|
||||||
`docs/superpowers/specs/2026-05-06-left4me-deployment-design.md`
|
|
||||||
- Hardening refactor design (motivation; the deferred reshape):
|
|
||||||
`docs/superpowers/specs/2026-05-15-hardening-refactor-design.md`
|
|
||||||
- Hardening refactor plan (what got landed):
|
|
||||||
`docs/superpowers/plans/2026-05-15-hardening-refactor.md`
|
|
||||||
- Defenses survey (mentions AppArmor, deferred):
|
|
||||||
`docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md`
|
|
||||||
- Test plan + executed results:
|
|
||||||
`docs/superpowers/specs/2026-05-15-hardening-test-plan.md`
|
|
||||||
- uid-collapse plan (prereq):
|
|
||||||
`docs/superpowers/plans/2026-05-15-uid-collapse.md`
|
|
||||||
- deploy-dir-rethink (recent reshape that moved scripts into left4me;
|
|
||||||
background on the current `deploy/` tree):
|
|
||||||
`docs/superpowers/plans/2026-05-15-deploy-dir-rethink.md` (or
|
|
||||||
`2026-05-15-deploy-dir-rethink-design.md`)
|
|
||||||
- Live ckn-bw bundle (the thing being rethought):
|
|
||||||
`~/Projekte/ckn-bw/bundles/left4me/`
|
|
||||||
|
|
@ -1,285 +0,0 @@
|
||||||
# Handoff — non-editable install + root-owned `/opt/left4me/src`
|
|
||||||
|
|
||||||
## Status
|
|
||||||
|
|
||||||
**Superseded 2026-05-15** by what actually shipped — see
|
|
||||||
`docs/superpowers/specs/2026-05-15-runtime-state-relocation-design.md`.
|
|
||||||
The narrow approach proposed here (just flip `/opt/left4me/src` to
|
|
||||||
root, switch `pip install -e` → `pip install`) doesn't work as
|
|
||||||
described: `setuptools.build_meta` writes `<pkg>.egg-info/` into the
|
|
||||||
source dir during `get_requires_for_build_wheel`, which fails against
|
|
||||||
a root-owned source. The shipped fix copies source to a writable
|
|
||||||
tempdir before building, and (since that one-shot copy was needed
|
|
||||||
anyway) also relocates `.venv` + `steam` to `/var/lib/left4me/`.
|
|
||||||
|
|
||||||
The original prereq goal — making target-side symlinks of deployment
|
|
||||||
artifacts safe — is still met; the realized shape is just bigger than
|
|
||||||
this doc sketched.
|
|
||||||
|
|
||||||
This doc is kept as the historical record of the originally-proposed
|
|
||||||
approach and why it didn't work.
|
|
||||||
|
|
||||||
## The task
|
|
||||||
|
|
||||||
Change ckn-bw's `bundles/left4me/` so that:
|
|
||||||
|
|
||||||
1. The production install uses **non-editable** pip installs
|
|
||||||
(`pip install /opt/left4me/src/l4d2host /opt/left4me/src/l4d2web`),
|
|
||||||
not `pip install -e …`.
|
|
||||||
2. `/opt/left4me/src/` is **owned by root:root**, not left4me:left4me.
|
|
||||||
3. The `left4me_chown_src` action is deleted, and the `/opt/left4me/src`
|
|
||||||
directory item's `owner`/`group` flip to root.
|
|
||||||
4. The pip-install action moves from "runs every apply" to "triggered
|
|
||||||
by `git_deploy:/opt/left4me/src`" — non-editable installs always
|
|
||||||
rebuild a wheel, so running unconditionally is wasteful.
|
|
||||||
|
|
||||||
Local-development install flows (direnv + `pip install -e ./l4d2host
|
|
||||||
-e ./l4d2web`) are **unchanged**. Editable installs remain correct on
|
|
||||||
developer machines; only the production install model on the host
|
|
||||||
changes.
|
|
||||||
|
|
||||||
## Why
|
|
||||||
|
|
||||||
Two reasons, listed in priority order.
|
|
||||||
|
|
||||||
**Security.** The deployment-responsibility brainstorm wants to make
|
|
||||||
`left4me/deploy/files/` the live source of truth for systemd units,
|
|
||||||
drop-ins, sudoers, sysctl, and helpers, delivered by ckn-bw via
|
|
||||||
target-side symlinks (`/etc/foo` → `/opt/left4me/src/deploy/files/...`).
|
|
||||||
If the symlink target sits inside a left4me-writable directory, the
|
|
||||||
service can rewrite its own hardening drop-in and escape the sandbox
|
|
||||||
on next restart. Making `/opt/left4me/src/` root-owned closes that
|
|
||||||
hole at the filesystem layer, before symlinks ever come into the
|
|
||||||
picture. Defense-in-depth that costs us nothing the production
|
|
||||||
workflow actually used.
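
The risk described above can be checked mechanically. A minimal sketch, assuming it runs on the host with read access to the paths involved: it resolves a deployed path (following symlinks to the real file) and flags every component up to `/` that is not owned by uid 0 or that is group- or world-writable. `writable_by_non_root` is a hypothetical helper name, and the check is deliberately conservative (it also flags group-writable paths whose group is root).

```python
import os
import stat
from pathlib import Path


def writable_by_non_root(path: Path) -> list[Path]:
    """Return each component of the resolved path that a non-root user could modify.

    A component is flagged when it is not owned by uid 0, or when its
    mode grants write permission to group or others.
    """
    flagged = []
    resolved = path.resolve()  # follow symlinks to the real target
    for component in [resolved, *resolved.parents]:
        info = os.stat(component)
        loose_mode = info.st_mode & (stat.S_IWGRP | stat.S_IWOTH)
        if info.st_uid != 0 or loose_mode:
            flagged.append(component)
    return flagged
```

An empty result for a symlinked drop-in would mean nobody but root can change the file or swap out any directory on the way to it.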
|
|
||||||
|
|
||||||
**Operational honesty.** The only reason `/opt/left4me/src/` is
|
|
||||||
user-owned today is that `pip install -e` writes `.egg-info` into the
|
|
||||||
source tree. No production workflow ever edits files under
|
|
||||||
`/opt/left4me/src/` directly — code updates always come through
|
|
||||||
`git_deploy` + `pip_install`. Editable mode buys nothing on the host;
|
|
||||||
non-editable matches what the deploy actually does (rebuild + reinstall
|
|
||||||
wheel from new source).
|
|
||||||
|
|
||||||
## What changes — concretely
|
|
||||||
|
|
||||||
All edits are in `~/Projekte/ckn-bw/bundles/left4me/`.
|
|
||||||
|
|
||||||
### `items.py`
|
|
||||||
|
|
||||||
**Directory items** (`items.py:7-42`) — flip `/opt/left4me/src` to root:
|
|
||||||
|
|
||||||
```python
|
|
||||||
directories = {
|
|
||||||
'/opt/left4me': {
|
|
||||||
'owner': 'root',
|
|
||||||
'group': 'root',
|
|
||||||
},
|
|
||||||
'/opt/left4me/src': {
|
|
||||||
'owner': 'root',
|
|
||||||
'group': 'root',
|
|
||||||
# Was left4me:left4me before the non-editable install switch;
|
|
||||||
# production now installs wheels, so the source tree is read-only
|
|
||||||
# at runtime. Keeps left4me from being able to rewrite its own
|
|
||||||
# hardening drop-ins / unit files (see deployment-responsibility
|
|
||||||
# handoff for the full argument).
|
|
||||||
},
|
|
||||||
# /var/lib/left4me/* and /opt/left4me/{steam,.venv} stay left4me:left4me.
|
|
||||||
...
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
**`left4me_pip_install` action** (`items.py:247-263`) — drop `-e`,
|
|
||||||
become triggered:
|
|
||||||
|
|
||||||
```python
|
|
||||||
actions['left4me_pip_install'] = {
|
|
||||||
# Non-editable install: builds wheels from the checkout, installs
|
|
||||||
# into the venv's site-packages. Source tree is no longer mutated by
|
|
||||||
# pip, so /opt/left4me/src/ stays root:root with read-only access for
|
|
||||||
# left4me at runtime.
|
|
||||||
'command': 'sudo -u left4me /opt/left4me/.venv/bin/pip install /opt/left4me/src/l4d2host /opt/left4me/src/l4d2web',
|
|
||||||
'triggered': True, # was: ran every apply
|
|
||||||
'cascade_skip': False,
|
|
||||||
'needs': [
|
|
||||||
'git_deploy:/opt/left4me/src',
|
|
||||||
'action:left4me_create_venv',
|
|
||||||
# action:left4me_chown_src removed (deleted below).
|
|
||||||
],
|
|
||||||
'triggers': [
|
|
||||||
'action:left4me_alembic_upgrade',
|
|
||||||
],
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
**`left4me_chown_src` action** (`items.py:207-219`) — **delete**.
|
|
||||||
The action exists to repair file ownership: each `git_deploy` extracts
|
|
||||||
the tarball as root, and we needed the tree owned by left4me. With the new
|
|
||||||
model, root is the target ownership, which is also what `git_deploy`
|
|
||||||
already produces. The action becomes a no-op; remove it.
|
|
||||||
|
|
||||||
**`git_deploy` triggers** (`items.py:157-183`) — ensure
|
|
||||||
`action:left4me_pip_install` is in `triggers`. Currently triggers
|
|
||||||
`left4me_alembic_upgrade` and `install_left4me_scripts`; add
|
|
||||||
`left4me_pip_install` so that a fresh checkout always rebuilds the
|
|
||||||
wheel and reinstalls.
|
|
||||||
|
|
||||||
### `metadata.py`
|
|
||||||
|
|
||||||
No changes. The `systemd/units` reactor's `WorkingDirectory` and
|
|
||||||
timer `working_dir` still point at `/opt/left4me/src` — that path is
|
|
||||||
still readable as left4me regardless of ownership (it's
|
|
||||||
world-readable by default after `git_deploy` extracts as root).
|
|
||||||
|
|
||||||
### `README.md`
|
|
||||||
|
|
||||||
Line 48 mentions `pip install -e`. Update to reflect non-editable
|
|
||||||
production install and add a one-line note that local dev still uses
|
|
||||||
`-e`. Two lines of edits.
|
|
||||||
|
|
||||||
### `l4d2web.egg-info/`, `l4d2host.egg-info/` on the live host
|
|
||||||
|
|
||||||
These directories exist today inside `/opt/left4me/src/l4d2{host,web}/`
|
|
||||||
as a side-effect of editable installs. After the switch they become
|
|
||||||
stale (pip installs a fresh wheel into the venv; the in-source egg-info
|
|
||||||
is unused). Clean-up options:
|
|
||||||
|
|
||||||
- **Leave them**: harmless, ignored by Python. Eventually removed by
|
|
||||||
whoever next refactors the source layout.
|
|
||||||
- **One-shot remove on the live host**: `sudo find /opt/left4me/src
|
|
||||||
-name "*.egg-info" -type d -exec rm -rf {} +`. Cosmetic; do whatever.
|
|
||||||
|
|
||||||
Either's fine. Document the choice in the commit message.
|
|
||||||
|
|
||||||
## What does NOT change
|
|
||||||
|
|
||||||
- **`l4d2host/` and `l4d2web/` `pyproject.toml`** — both already declare
|
|
||||||
`[build-system] requires = ["setuptools>=68", "wheel"]` and use the
|
|
||||||
flat `package-dir = {l4d2host = "."}` layout. Non-editable install
|
|
||||||
works out of the box; no packaging edits needed.
|
|
||||||
- **`alembic.ini` + migrations** — alembic reads
|
|
||||||
`/opt/left4me/src/l4d2web/alembic/versions/*.py` at runtime. Root
|
|
||||||
ownership + world-readable means left4me can still read; no change.
|
|
||||||
- **`examples/script-overlays/`** — same; read-only access by left4me
|
|
||||||
at seed time.
|
|
||||||
- **`/opt/left4me/.venv/`** — stays left4me:left4me (pip writes here
|
|
||||||
during the install action, run as left4me via sudo).
|
|
||||||
- **`/opt/left4me/steam/`** — stays left4me:left4me (steamcmd
|
|
||||||
self-updates).
|
|
||||||
- **`/var/lib/left4me/`** and all subdirs — stays left4me:left4me
|
|
||||||
(application runtime state).
|
|
||||||
- **Local-dev install instructions** in `README.md`, `AGENTS.md`,
|
|
||||||
`l4d2web/README.md` — keep `-e`. Developer machines need editable.
|
|
||||||
- **`install_left4me_scripts` action** — already copies from src as
|
|
||||||
root, target paths under `/usr/local/{libexec,sbin}/`. Source can be
|
|
||||||
root-owned now (no change in behavior).
|
|
||||||
- **Hardening composition + every deployed unit / drop-in / sudoers /
|
|
||||||
sysctl file** — out of scope for this change. Those move in the
|
|
||||||
deployment-responsibility brainstorm, after this lands.
|
|
||||||
|
|
||||||
## Verification
|
|
||||||
|
|
||||||
Run on left4.me (the production host) after `bw apply`:
|
|
||||||
|
|
||||||
1. **Source ownership**:
|
|
||||||
```
|
|
||||||
stat -c '%U:%G %a %n' /opt/left4me/src /opt/left4me/.venv /opt/left4me/steam /var/lib/left4me
|
|
||||||
```
|
|
||||||
Expected: `/opt/left4me/src` → `root:root`; `.venv` and `steam` and
|
|
||||||
`/var/lib/left4me` → `left4me:left4me`.
|
|
||||||
|
|
||||||
2. **Wheel installed, not editable**:
|
|
||||||
```
|
|
||||||
sudo -u left4me /opt/left4me/.venv/bin/pip show l4d2web l4d2host
|
|
||||||
```
|
|
||||||
Expected: `Location:` points inside
|
|
||||||
`/opt/left4me/.venv/lib/python*/site-packages/`, NOT inside
|
|
||||||
`/opt/left4me/src/`. (Editable installs report the source path as
|
|
||||||
`Location:`; non-editable reports site-packages.)
|
|
||||||
|
|
||||||
3. **App runs**:
|
|
||||||
```
|
|
||||||
systemctl status left4me-web.service
|
|
||||||
```
|
|
||||||
Active, recent logs clean.
|
|
||||||
|
|
||||||
4. **Alembic can still read migrations**:
|
|
||||||
```
|
|
||||||
sudo -u left4me sh -c 'cd /opt/left4me/src/l4d2web && /opt/left4me/.venv/bin/alembic current'
|
|
||||||
```
|
|
||||||
Returns the current head without errors.
|
|
||||||
|
|
||||||
5. **A gameserver starts**:
|
|
||||||
```
|
|
||||||
sudo /usr/local/libexec/left4me/left4me-systemctl start left4me-server@test
|
|
||||||
journalctl -u left4me-server@test -n 50
|
|
||||||
```
|
|
||||||
srcds_run starts cleanly. Stop it after verification.
|
|
||||||
|
|
||||||
6. **Idempotent `bw apply`**:
|
|
||||||
Run `bw apply left4.me` a second time. Should report zero changes —
|
|
||||||
no chown action drifting back, no pip install re-firing.
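
Check 2 above can be scripted. The sketch below parses the text printed by `pip show <pkg>` for a single package and decides whether the install looks non-editable. It assumes the `Location:` field described above; as a further assumption, it also treats an `Editable project location:` field (which newer pip releases may print for editable installs) as a sign of an editable install. `parse_pip_show` and `is_non_editable` are hypothetical names.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def parse_pip_show(output: str) -> dict[str, str]:
    """Split `pip show` output for one package into a {field: value} dict."""
    fields: dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_non_editable(output: str, src_root: str = "/opt/left4me/src") -> bool:
    """True when the package is installed outside the source checkout."""
    fields = parse_pip_show(output)
    if "Editable project location" in fields:
        return False  # assumption: only editable installs carry this field
    location = fields.get("Location")
    if location is None:
        return False  # no Location field: cannot confirm anything
    return not PurePosixPath(location).is_relative_to(src_root)
```

Feeding it the captured output of the `pip show` command from step 2, one package at a time, gives a yes/no answer instead of an eyeball check.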
|
|
||||||
|
|
||||||
## Out of scope
|
|
||||||
|
|
||||||
- **The deployment-responsibility reshape itself.** That brainstorm
|
|
||||||
resumes after this prereq lands on left4.me. Do not touch
|
|
||||||
`deploy/files/`, hardening drop-ins, sudoers location, etc. — those
|
|
||||||
are the *next* session's work.
|
|
||||||
- **Removing the `bundles/left4me/files/etc/{sudoers.d,sysctl.d}/`
|
|
||||||
verbatim mirrors.** Same; that's the deployment-responsibility
|
|
||||||
session.
|
|
||||||
- **Moving `scripts/{libexec,sbin}/` into `deploy/scripts/`.** Same.
|
|
||||||
- **Reviewing whether the editable install pattern should change for
|
|
||||||
developer machines.** It should not — local dev wants editable for
|
|
||||||
fast iteration; only the host install model changes.
|
|
||||||
|
|
||||||
## Pointers
|
|
||||||
|
|
||||||
- **Deployment-responsibility brainstorm handoff** (the parent
|
|
||||||
context): `docs/superpowers/specs/2026-05-15-handoff-deployment-responsibility.md`
|
|
||||||
- **ckn-bw left4me bundle**:
|
|
||||||
`~/Projekte/ckn-bw/bundles/left4me/` —
|
|
||||||
- `items.py:7-42` (directories)
|
|
||||||
- `items.py:157-183` (git_deploy)
|
|
||||||
- `items.py:207-219` (left4me_chown_src — delete)
|
|
||||||
- `items.py:247-263` (left4me_pip_install)
|
|
||||||
- `README.md:48` (docs update)
|
|
||||||
- **pyproject.toml layouts**:
|
|
||||||
`l4d2host/pyproject.toml`, `l4d2web/pyproject.toml`. Flat
|
|
||||||
`package-dir = {<pkg> = "."}` layout. Non-editable wheel build works
|
|
||||||
with this layout without further changes.
|
|
||||||
- **Hardening test plan** (motivates the security argument):
|
|
||||||
`docs/superpowers/specs/2026-05-15-hardening-test-plan.md`
|
|
||||||
- **Original deployment design** (the shape we're working toward):
|
|
||||||
`docs/superpowers/specs/2026-05-06-left4me-deployment-design.md`
|
|
||||||
|
|
||||||
## Commit messages (suggested)
|
|
||||||
|
|
||||||
ckn-bw side (the actual change):
|
|
||||||
|
|
||||||
```
|
|
||||||
refactor(left4me): non-editable install + root-owned /opt/left4me/src
|
|
||||||
|
|
||||||
Drop `pip install -e` for the production install; switch to wheel
|
|
||||||
install (`pip install /opt/left4me/src/l4d2{host,web}`). Source tree no
|
|
||||||
longer needs to be writable by left4me, so flip /opt/left4me/src to
|
|
||||||
root:root and delete the left4me_chown_src action.
|
|
||||||
|
|
||||||
Prereq for the deployment-responsibility reshape: makes target-side
|
|
||||||
symlinks from /etc/... into /opt/left4me/src/deploy/files/... safe by
|
|
||||||
construction (left4me cannot rewrite its own hardening profile).
|
|
||||||
|
|
||||||
Verified on left4.me: bw apply idempotent; pip show reports
|
|
||||||
site-packages location; web + gameserver units run clean.
|
|
||||||
```
|
|
||||||
|
|
||||||
left4me side (this handoff doc):
|
|
||||||
|
|
||||||
```
|
|
||||||
spec(noneditable-install): handoff for the install refactor prereq
|
|
||||||
|
|
||||||
Self-contained spec for the next agent to land the editable→
|
|
||||||
non-editable install switch and the root-ownership flip on
|
|
||||||
/opt/left4me/src. Prereq for the deployment-responsibility brainstorm.
|
|
||||||
```
|
|
||||||
|
|
@ -1,467 +0,0 @@
|
||||||
# Handoff — collapse venv chain into uv workspace + `uv sync`
|
|
||||||
|
|
||||||
## Status
|
|
||||||
|
|
||||||
**Executed (left4me side) — see
|
|
||||||
`docs/superpowers/plans/2026-05-15-uv-workspace-execution.md` for what
|
|
||||||
actually shipped and what diverged from the assumptions below.** Three
|
|
||||||
load-bearing assumptions in this doc turned out to be wrong (no
|
|
||||||
`pkg_apt: uv` on Trixie; existing layout incompatible with read-only
|
|
||||||
source builds via setuptools; no `git` on prod). The executed plan
|
|
||||||
records the corrections.
|
|
||||||
|
|
||||||
## Goal
|
|
||||||
|
|
||||||
Replace the current five-action venv chain in `bundles/left4me/items.py`
|
|
||||||
with a single `uv sync --frozen` action driven by a committed
|
|
||||||
`uv.lock` at the left4me repo root. Eliminate the tempdir-copy dance
|
|
||||||
in `pip_install` (8 lines of shell working around setuptools writing
|
|
||||||
`<pkg>.egg-info/` into a root-owned source tree).
|
|
||||||
|
|
||||||
Net change: 5 actions → 3 actions; deterministic deploys via locked
|
|
||||||
dep versions; single command in dev and prod; one new build-time
|
|
||||||
dependency (`uv`) on the host.
|
|
||||||
|
|
||||||
## Why
|
|
||||||
|
|
||||||
Three motivations, listed in priority order.
|
|
||||||
|
|
||||||
**Deterministic prod deploys.** Today's chain installs whatever pip
|
|
||||||
resolves at apply time. A transitive dep getting a CVE-relevant bump
|
|
||||||
between two `bw apply` runs is invisible until it breaks something.
|
|
||||||
`uv sync --frozen` against a committed `uv.lock` makes the installed
|
|
||||||
version set reproducible from git history alone.
|
|
||||||
|
|
||||||
**Lower cognitive cost in `items.py`.** The `pip_install` action is
|
|
||||||
the longest, gnarliest action in the bundle — it does its own
|
|
||||||
tempdir/cleanup-trap/cp-r dance because the obvious `pip install
|
|
||||||
/opt/left4me/src/...` would write egg-info to a root-owned source
|
|
||||||
tree. uv's `sdist-then-wheel-from-tarball` build path makes this
|
|
||||||
problem go away: the source is read-only throughout.
|
|
||||||
|
|
||||||
**Workspace declares what's actually true.** `l4d2web` already imports
|
|
||||||
from `l4d2host` (5 files use `from l4d2host.paths import ...`).
|
|
||||||
Today's setup happens to work because both packages get installed
|
|
||||||
side-by-side via `pip install -e ./l4d2host -e ./l4d2web`, but the
|
|
||||||
dependency relationship is implicit. A uv workspace makes it explicit
|
|
||||||
via `[tool.uv.sources] l4d2host = { workspace = true }`.
|
|
||||||
|
|
||||||
## Current state — the 5-action chain
|
|
||||||
|
|
||||||
(All in `~/Projekte/ckn-bw/bundles/left4me/items.py`, ~lines 285-425.)
|
|
||||||
|
|
||||||
```
|
|
||||||
git_deploy:/opt/left4me/src
|
|
||||||
├── triggers → left4me_pip_install
|
|
||||||
│ ├── needs ← left4me_create_venv (always-on, gated unless)
|
|
||||||
│ │ └── triggers → left4me_pip_upgrade
|
|
||||||
│ └── triggers → left4me_alembic_upgrade
|
|
||||||
│ ├── triggers → left4me_seed_overlays
|
|
||||||
│ └── triggers → svc_systemd:left4me-web.service:restart
|
|
||||||
├── triggers → left4me_alembic_upgrade (belt-and-braces direct trigger)
|
|
||||||
└── triggers → left4me_daemon_reload
|
|
||||||
```
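
To make the fan-out easier to reason about, here is a small Python sketch that encodes the `triggers` edges from the tree above and lists what a changed `git_deploy` would transitively fire. It is a plain breadth-first traversal for illustration, not bundlewrap's actual scheduling logic; `TRIGGERS` and `fired_by` are hypothetical names, and `needs` edges are intentionally left out.

```python
from collections import deque

# Edges copied from the tree above: item -> items it triggers.
TRIGGERS = {
    "git_deploy:/opt/left4me/src": [
        "left4me_pip_install",
        "left4me_alembic_upgrade",
        "left4me_daemon_reload",
    ],
    "left4me_create_venv": ["left4me_pip_upgrade"],
    "left4me_pip_install": ["left4me_alembic_upgrade"],
    "left4me_alembic_upgrade": [
        "left4me_seed_overlays",
        "svc_systemd:left4me-web.service:restart",
    ],
}


def fired_by(start: str, triggers: dict[str, list[str]]) -> list[str]:
    """Return everything reachable from start via triggers, in BFS order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in triggers.get(current, []):
            if nxt not in seen:  # each item fires at most once
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order
```

For `git_deploy:/opt/left4me/src` this yields the pip install, alembic upgrade and daemon reload, then overlay seeding and the web restart. `left4me_create_venv` and `left4me_pip_upgrade` do not appear because they hang off a `needs` edge rather than a `triggers` edge.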
|
|
||||||
|
|
||||||
`left4me_pip_install` body (the part that simplifies):
|
|
||||||
|
|
||||||
```sh
|
|
||||||
sudo -u left4me sh -c '
|
|
||||||
set -e
|
|
||||||
tmpdir=$(mktemp -d -t left4me-build-XXXXXX)
|
|
||||||
trap "rm -rf \"$tmpdir\"" EXIT
|
|
||||||
cp -r /opt/left4me/src/l4d2host /opt/left4me/src/l4d2web "$tmpdir/"
|
|
||||||
/var/lib/left4me/.venv/bin/pip install --force-reinstall "$tmpdir/l4d2host" "$tmpdir/l4d2web"
|
|
||||||
'
|
|
||||||
```
|
|
||||||
|
|
||||||
## Target state — uv workspace + single sync action
|
|
||||||
|
|
||||||
Three actions instead of five:
|
|
||||||
|
|
||||||
```
git_deploy:/opt/left4me/src
  ├── triggers → left4me_uv_sync
  │     └── triggers → left4me_alembic_upgrade
  │           ├── triggers → left4me_seed_overlays
  │           └── triggers → svc_systemd:left4me-web.service:restart
  ├── triggers → left4me_alembic_upgrade  (belt-and-braces)
  └── triggers → left4me_daemon_reload
```
|
|
||||||
|
|
||||||
`left4me_uv_sync` body:
|
|
||||||
|
|
||||||
```python
actions['left4me_uv_sync'] = {
    'command': (
        'sudo -u left4me '
        'env UV_PROJECT_ENVIRONMENT=/var/lib/left4me/.venv '
        'uv sync --frozen --project /opt/left4me/src'
    ),
    'triggered': True,
    'cascade_skip': False,
    'needs': [
        'git_deploy:/opt/left4me/src',
        'pkg_apt:uv',
        'directory:/var/lib/left4me',
        'user:left4me',
    ],
    'triggers': [
        'action:left4me_alembic_upgrade',
    ],
}
```
|
|
||||||
|
|
||||||
`UV_PROJECT_ENVIRONMENT` redirects uv's default venv path (`<project>/.venv`)
|
|
||||||
to our writable runtime location at `/var/lib/left4me/.venv` (the source
|
|
||||||
at `/opt/left4me/src` is root-owned, so the default would be a permission
|
|
||||||
error).
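To see how the parts of that command fit together, here is a minimal sketch that assembles the same string from its components. The helper name `build_uv_sync_command` and its parameters are hypothetical and not part of the bundle; only the resulting command line is taken from this handoff.

```python
import shlex


def build_uv_sync_command(user: str, venv: str, project: str) -> str:
    """Compose the deploy-time sync command (hypothetical helper)."""
    parts = [
        # Run as the unprivileged service user, not as root.
        "sudo", "-u", user,
        # Point uv at the writable venv instead of <project>/.venv.
        "env", f"UV_PROJECT_ENVIRONMENT={venv}",
        # --frozen: install from uv.lock as-is, never rewrite it during deploy.
        "uv", "sync", "--frozen", "--project", project,
    ]
    return " ".join(shlex.quote(part) for part in parts)


command = build_uv_sync_command(
    user="left4me",
    venv="/var/lib/left4me/.venv",
    project="/opt/left4me/src",
)
print(command)
```

Keeping the venv path and the project path as separate parameters mirrors the split this design relies on: the source tree is read-only input, the venv is the only thing written.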
|
|
||||||
|
|
||||||
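`--frozen` requires `uv.lock` to be present and installs exactly what it
records, without re-resolving or rewriting the lockfile during deploy. It
does not check that the lockfile is still consistent with `pyproject.toml`;
`--locked` is the flag that fails on a stale lockfile.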
|
|
||||||
|
|
||||||
## Empirical spike — do this FIRST
|
|
||||||
|
|
||||||
Before touching anything, verify the architectural assumption that
|
|
||||||
`uv` actually keeps a root-owned source directory pristine during
|
|
||||||
build. ~5 minute test on the live host:
|
|
||||||
|
|
||||||
```bash
ssh ckn@left4.me 'sudo apt-get install -y uv'
ssh ckn@left4.me '
  sudo -u left4me sh -c "
    wheels=\$(mktemp -d)
    uv build --wheel --sdist /opt/left4me/src/l4d2host --out-dir \$wheels
    ls \$wheels
    sudo git -C /opt/left4me/src status --porcelain
  "
'
```
|
|
||||||
|
|
||||||
Expected: the wheel + sdist exist in the tempdir, AND `git status`
|
|
||||||
reports the source tree clean (no new `l4d2host.egg-info/` directory).
|
|
||||||
|
|
||||||
If the source stays clean, proceed with the full migration.
|
|
||||||
|
|
||||||
If the source picks up `l4d2host.egg-info/` (uv's build invoked
|
|
||||||
setuptools.build_meta directly on the source instead of via the sdist
|
|
||||||
intermediate), fall back to **Medium scope**: keep the tempdir-copy
|
|
||||||
dance but use `uv pip install` in place of `pip install` (1:1 swap,
|
|
||||||
no workspace, smaller change). Update this handoff with the fallback
|
|
||||||
decision.
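The decision rule for the spike can be sketched as a small function over the `git status --porcelain` output. This is an illustration only: `classify_spike` and its return labels are hypothetical names, and the third outcome ("investigate", for dirt that is not an `egg-info` directory) is an assumption this handoff does not spell out.

```python
def classify_spike(porcelain_output: str) -> str:
    """Map `git status --porcelain` output to a migration decision."""
    dirty = [line for line in porcelain_output.splitlines() if line.strip()]
    if not dirty:
        # Source tree untouched by the build: go ahead with the workspace.
        return "full-migration"
    # Porcelain v1 lines look like "XY <path>"; the path starts at column 3.
    paths = [line[3:] for line in dirty]
    if any("egg-info" in path for path in paths):
        # The build wrote into the source tree: keep the tempdir copy.
        return "medium-scope"
    # Assumption: any other change needs a human look before deciding.
    return "investigate"


print(classify_spike(""))
print(classify_spike("?? l4d2host/l4d2host.egg-info/\n"))
```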
|
|
||||||
|
|
||||||
## What changes — left4me side
|
|
||||||
|
|
||||||
### New: `/Users/mwiegand/Projekte/left4me/pyproject.toml`
|
|
||||||
|
|
||||||
Workspace root. Short:
|
|
||||||
|
|
||||||
```toml
[project]
name = "left4me"
version = "0.0.0"
description = "Workspace root; packaging lives in the members."
requires-python = ">=3.13"

[tool.uv.workspace]
members = ["l4d2host", "l4d2web"]

# Dev-only dependencies (pytest, etc.) for the workspace.
[dependency-groups]
dev = [
    "pytest",
]
```
|
|
||||||
|
|
||||||
### Modified: `l4d2host/pyproject.toml` and `l4d2web/pyproject.toml`
|
|
||||||
|
|
||||||
No real change to declared deps. `l4d2web` adds the workspace cross-dep:
|
|
||||||
|
|
||||||
```toml
# l4d2web/pyproject.toml
[project]
dependencies = [
    "Flask>=3.0",
    "SQLAlchemy>=2.0",
    "alembic>=1.13",
    "PyYAML>=6.0",
    "gunicorn>=22.0",
    "requests>=2.31",
    "l4d2host",  # NEW: declares the import relationship
]

[tool.uv.sources]
l4d2host = { workspace = true }  # NEW: resolves to the in-workspace member
```
|
|
||||||
|
|
||||||
This makes explicit what's already true: `l4d2web/routes/overlay_routes.py`,
|
|
||||||
`l4d2web/services/overlay_creation.py`, and three other files import
|
|
||||||
from `l4d2host.paths`.
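The "already true" claim is easy to re-check before declaring the dependency. A minimal sketch, assuming it is run from the repo root; the function name `files_importing_l4d2host` is hypothetical and the regex only catches plain `import` / `from ... import` statements at the start of a line.

```python
import re
from pathlib import Path

# Matches "import l4d2host..." or "from l4d2host... import ..." at line start.
IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+l4d2host\b", re.MULTILINE)


def files_importing_l4d2host(package_root: Path) -> list[Path]:
    """Return the .py files under package_root that import l4d2host."""
    hits = []
    for path in sorted(package_root.rglob("*.py")):
        if IMPORT_RE.search(path.read_text(encoding="utf-8")):
            hits.append(path)
    return hits


if __name__ == "__main__":
    root = Path("l4d2web")  # assumption: current directory is the repo root
    if root.is_dir():
        for hit in files_importing_l4d2host(root):
            print(hit)
```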
|
|
||||||
|
|
||||||
### New: `/Users/mwiegand/Projekte/left4me/uv.lock`
|
|
||||||
|
|
||||||
Generated by `uv lock` at the repo root. Committed to git. Pins every
|
|
||||||
transitive dep version.
|
|
||||||
|
|
||||||
### Modified: `/Users/mwiegand/Projekte/left4me/.envrc`
|
|
||||||
|
|
||||||
Today:
|
|
||||||
```
|
|
||||||
layout python python3.13
|
|
||||||
```
|
|
||||||
|
|
||||||
New (direnv hands off to uv for venv management):
|
|
||||||
```
|
|
||||||
# direnv's stdlib uv helper creates .venv via `uv sync` and activates it.
|
|
||||||
# Equivalent to: uv sync && source .venv/bin/activate
|
|
||||||
use uv
|
|
||||||
```
|
|
||||||
|
|
||||||
If `use uv` isn't available in this direnv version (it's a stdlib
|
|
||||||
function added in direnv 2.34+), fall back to:
|
|
||||||
```
|
|
||||||
uv sync >/dev/null
|
|
||||||
source .venv/bin/activate
|
|
||||||
```
|
|
||||||
|
|
||||||
### Modified: `README.md`, `AGENTS.md`, `l4d2web/README.md`
|
|
||||||
|
|
||||||
Update install instructions from:
|
|
||||||
```
|
|
||||||
pip install -e ./l4d2host -e ./l4d2web pytest
|
|
||||||
```
|
|
||||||
to:
|
|
||||||
```
|
|
||||||
uv sync # creates .venv, installs members editable, installs dev deps
|
|
||||||
```
|
|
||||||
|
|
||||||
One-time prereq for developers: install uv (`brew install uv` on
|
|
||||||
macOS, `apt install uv` on Debian Trixie+, or curl-pipe-sh from
|
|
||||||
astral.sh for older distros).
|
|
||||||
|
|
||||||
### Modified: `.gitignore`
|
|
||||||
|
|
||||||
Probably no change needed. uv's caches default to `~/.cache/uv` (not
|
|
||||||
in-repo). The `.venv` is already ignored.
|
|
||||||
|
|
||||||
## What changes — ckn-bw side
|
|
||||||
|
|
||||||
All edits in `~/Projekte/ckn-bw/bundles/left4me/`.
|
|
||||||
|
|
||||||
### `metadata.py`
|
|
||||||
|
|
||||||
Add `uv` to `apt.packages`:
|
|
||||||
```python
'apt': {
    'packages': {
        ...
        'uv': {},  # Required by left4me_uv_sync for production install.
        ...
    },
},
```
|
|
||||||
|
|
||||||
Drop `python3-pip` if nothing else needs it (uv replaces pip). Keep
|
|
||||||
`python3-venv` if anything else on the host uses `python3 -m venv`; if
|
|
||||||
not, drop it too. `python3` and `python3-dev` stay (uv invokes them).
|
|
||||||
|
|
||||||
### `items.py`
|
|
||||||
|
|
||||||
Delete three actions:
|
|
||||||
- `left4me_create_venv`
|
|
||||||
- `left4me_pip_upgrade`
|
|
||||||
- `left4me_pip_install`
|
|
||||||
|
|
||||||
Add one action: `left4me_uv_sync` (body in the "Target state" section
|
|
||||||
above).
|
|
||||||
|
|
||||||
Update `git_deploy:/opt/left4me/src` triggers:
|
|
||||||
- Remove: `action:left4me_pip_install`
|
|
||||||
- Add: `action:left4me_uv_sync`
|
|
||||||
- Keep: `action:left4me_alembic_upgrade`, `action:left4me_daemon_reload`
|
|
||||||
|
|
||||||
`alembic_upgrade` and `seed_overlays` are unchanged — they invoke the
|
|
||||||
venv's `alembic` and `flask` binaries by absolute path, which `uv sync`
|
|
||||||
ensures exist. Update their `needs:` lists to point at
|
|
||||||
`action:left4me_uv_sync` instead of `action:left4me_pip_install`.
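When deleting the three actions it is easy to leave a stale `needs` or `triggers` entry behind. The sketch below shows one way to scan for such leftovers; it assumes a simplified flat mapping of item id to attributes rather than the bundle's real `items.py` structure, and `dangling_references` is a hypothetical helper.

```python
REMOVED = {
    "action:left4me_create_venv",
    "action:left4me_pip_upgrade",
    "action:left4me_pip_install",
}


def dangling_references(items: dict[str, dict]) -> list[tuple[str, str, str]]:
    """Return (item_id, attribute, reference) for each reference to a removed action."""
    found = []
    for item_id, attrs in items.items():
        for key in ("needs", "triggers"):
            for ref in attrs.get(key, []):
                if ref in REMOVED:
                    found.append((item_id, key, ref))
    return found


# Simplified stand-in for the bundle's items, with one stale reference left in.
items = {
    "git_deploy:/opt/left4me/src": {
        "triggers": ["action:left4me_uv_sync", "action:left4me_daemon_reload"],
    },
    "action:left4me_alembic_upgrade": {
        "needs": ["action:left4me_pip_install"],  # should now be left4me_uv_sync
    },
}

for item_id, key, ref in dangling_references(items):
    print(f"{item_id}: {key} still references {ref}")
```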
|
|
||||||
|
|
||||||
### `README.md`
|
|
||||||
|
|
||||||
Update the bundle README's deploy-flow description to mention `uv sync`
|
|
||||||
instead of `pip install -e`, matching the new shape.
|
|
||||||
|
|
||||||
## Migration order
|
|
||||||
|
|
||||||
1. **Spike test** (above): confirm uv preserves source cleanliness.
|
|
||||||
If fails, retreat to Medium scope.
|
|
||||||
2. **left4me-side preparation** (independent PR, can land first):
|
|
||||||
- Add root `pyproject.toml`, declare workspace
|
|
||||||
- Add `l4d2host` to `l4d2web`'s deps + workspace source
|
|
||||||
- Run `uv lock`, commit `uv.lock`
|
|
||||||
- Update `.envrc`
|
|
||||||
- Update local-dev docs
|
|
||||||
- Run `uv sync` locally, run `pytest` — all green
|
|
||||||
- Commit + push
|
|
||||||
3. **ckn-bw-side install** (depends on step 2):
|
|
||||||
- Add `pkg_apt: uv` to bundle defaults
|
|
||||||
- Delete the three old actions, add `uv_sync`
|
|
||||||
- Update `git_deploy` triggers and downstream `needs:`
|
|
||||||
- `bw test` clean
|
|
||||||
4. **First apply to ovh.left4me**:
|
|
||||||
- Expect: `pkg_apt: uv` installed, three old actions removed from
|
|
||||||
the graph, new `uv_sync` action fires (because git_deploy fires
|
|
||||||
with the new commit), runs `uv sync --frozen` against the new
|
|
||||||
workspace, alembic_upgrade + seed_overlays + web restart cascade.
|
|
||||||
- The existing `/var/lib/left4me/.venv` (created by
|
|
||||||
`python3 -m venv`) is structurally a uv-compatible venv; uv
|
|
||||||
should adopt it without recreation. If uv refuses to adopt
|
|
||||||
(incompatible metadata), one-shot fix on the host:
|
|
||||||
```
|
|
||||||
sudo -u left4me rm -rf /var/lib/left4me/.venv
|
|
||||||
# bw apply will recreate via `uv sync`
|
|
||||||
```
|
|
||||||
5. **Idempotency check + verification matrix**:
|
|
||||||
- `bw apply` idempotent (`0 fixed, 0 failed`)
|
|
||||||
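- The venv's Python reports the locked versions of `l4d2host` and `l4d2web` (for example via `importlib.metadata.version`; a venv synced by uv need not contain `pip`)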
|
|
||||||
- Web service active, gameserver round-trip works
|
|
||||||
6. **Commit ckn-bw side, do not push** (operator pushes manually).
|
|
||||||
|
|
||||||
## What does NOT change
|
|
||||||
|
|
||||||
- **Source ownership**: `/opt/left4me/src` stays `root:root` (the
|
|
||||||
runtime-state relocation made it so; uv reads it as world-readable).
|
|
||||||
- **Venv location**: `/var/lib/left4me/.venv` stays where it is, owned
|
|
||||||
by `left4me`, accessed via `UV_PROJECT_ENVIRONMENT`.
|
|
||||||
- **Hardening drop-ins, sudoers, sysctl, helpers**: all stable from
|
|
||||||
the deployment-responsibility migration. uv migration is independent.
|
|
||||||
- **systemd unit shapes**: reactor-emitted, per-host parameters
|
|
||||||
unchanged.
|
|
||||||
- **`alembic_upgrade` and `seed_overlays`**: same shell, same
|
|
||||||
triggering, same binaries (just from a uv-managed venv).
|
|
||||||
- **`pkg_apt: python3` and `python3-dev`**: kept (uv shells out to
|
|
||||||
the system Python interpreter).
|
|
||||||
- **CI workflows**: no CI currently exists; nothing to update.
|
|
||||||
|
|
||||||
## Out of scope
|
|
||||||
|
|
||||||
- Merging `l4d2host` and `l4d2web` into a single package. They stay
|
|
||||||
as separate workspace members.
|
|
||||||
- Switching to a non-direnv-based dev flow. `direnv` + `use uv` stays.
|
|
||||||
- Migrating other ckn-bw bundles to uv. This is left4me-specific.
|
|
||||||
- Pinning the host's `uv` version below the apt-current. If lockfile
|
|
||||||
format issues surface, address as a follow-up (e.g., apt-pin or
|
|
||||||
switch to astral.sh-installed uv).
|
|
||||||
|
|
||||||
## Risks
|
|
||||||
|
|
||||||
1. **Spike test failure**: uv build isn't actually source-clean → falls
|
|
||||||
back to Medium scope. Captured above; this is a graceful degradation.
|
|
||||||
2. **Lockfile format skew**: dev's brew-installed uv (latest) ahead of
|
|
||||||
prod's apt-installed uv (Trixie's version) → lockfile produced in
|
|
||||||
dev rejected in prod. Mitigation: stick to features supported by
|
|
||||||
the apt-installed version; if needed, switch prod to a pinned
|
|
||||||
astral.sh install.
|
|
||||||
3. **`alembic` invocation path**: today the action calls
|
|
||||||
`/var/lib/left4me/.venv/bin/alembic`. After uv sync, this path
|
|
||||||
should still exist (uv installs the same console_scripts entrypoint
|
|
||||||
as pip). Verify in step 4.
|
|
||||||
4. **direnv `use uv` availability**: `use uv` was added to direnv's
|
|
||||||
stdlib relatively recently. If the dev's direnv is older, use the
|
|
||||||
fallback `.envrc` snippet (`uv sync >/dev/null && source .venv/bin/activate`).
|
|
||||||
5. **`--force-reinstall` semantics gone**: today's chain uses
|
|
||||||
`pip install --force-reinstall` to work around the static
|
|
||||||
`0.1.0` version in pyproject.toml — without it pip would skip on
|
|
||||||
no-op. `uv sync --frozen` is version-aware via the lockfile, not
|
|
||||||
the package version string, so this concern goes away.
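For risk 2, the skew between the dev and prod uv versions can be spotted up front by comparing the output of `uv --version` on both machines. A rough sketch, assuming that output contains a dotted `X.Y.Z` version somewhere in the line; the helper names and the sample strings are hypothetical.

```python
import re


def parse_uv_version(version_output: str) -> tuple[int, int, int]:
    """Extract the first X.Y.Z triple from `uv --version` output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"no version found in {version_output!r}")
    major, minor, patch = (int(group) for group in match.groups())
    return major, minor, patch


def dev_is_ahead(dev_output: str, prod_output: str) -> bool:
    """True when the dev uv is newer than the prod uv."""
    return parse_uv_version(dev_output) > parse_uv_version(prod_output)


# Made-up version strings, for illustration only.
print(dev_is_ahead("uv 0.5.0", "uv 0.4.2"))
```

A newer dev uv is not automatically a problem; the check only flags that the lockfile should be tested against the prod version before the first apply.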
|
|
||||||
|
|
||||||
## Verification (end-to-end)
|
|
||||||
|
|
||||||
After ckn-bw apply:
|
|
||||||
|
|
||||||
1. **Source still clean**:
|
|
||||||
```
|
|
||||||
ssh ckn@left4.me 'sudo git -C /opt/left4me/src status --porcelain'
|
|
||||||
```
|
|
||||||
Empty output.
|
|
||||||
|
|
||||||
2. **Venv has the workspace members installed**:
|
|
||||||
```
|
|
||||||
ssh ckn@left4.me 'sudo -u left4me /var/lib/left4me/.venv/bin/python -c "import l4d2host; import l4d2web; print(l4d2host.__file__, l4d2web.__file__)"'
|
|
||||||
```
|
|
||||||
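Both paths should point inside `/var/lib/left4me/.venv/lib/python3.13/site-packages/`. Paths under `/opt/left4me/src` would mean the members were installed editable, which is `uv sync`'s default for workspace members; in that case add `--no-editable` to the sync command.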
|
|
||||||
|
|
||||||
3. **Pinned versions match the lockfile**:
|
|
||||||
```
ssh ckn@left4.me 'sudo -u left4me /var/lib/left4me/.venv/bin/python -c "import importlib.metadata as m; print(m.version(\"flask\"))"'
```
|
|
||||||
Matches the Flask version in `uv.lock`.
|
|
||||||
|
|
||||||
4. **Web service health**:
|
|
||||||
```
|
|
||||||
ssh ckn@left4.me 'sudo systemctl is-active left4me-web.service'
|
|
||||||
```
|
|
||||||
`active`.
|
|
||||||
|
|
||||||
5. **Idempotent apply**:
|
|
||||||
```
|
|
||||||
(cd ~/Projekte/ckn-bw && .venv/bin/bw apply ovh.left4me)
|
|
||||||
```
|
|
||||||
`0 fixed, 0 failed`.
|
|
||||||
|
|
||||||
6. **Gameserver round-trip**: start a verify instance via
|
|
||||||
`left4me-systemctl enable verify`, check journal for clean
|
|
||||||
srcds_run startup behaviour (modulo any missing instance dir),
|
|
||||||
disable.
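Check 2 above boils down to a path-prefix test on each module's `__file__`. A minimal sketch of that test; `installed_in_venv` is a hypothetical helper, and the example paths are the ones this handoff expects rather than captured output.

```python
from pathlib import PurePosixPath


def installed_in_venv(module_file: str, venv: str) -> bool:
    """True when a module's __file__ lives under the venv's lib directory."""
    venv_lib = PurePosixPath(venv) / "lib"
    return PurePosixPath(module_file).is_relative_to(venv_lib)


VENV = "/var/lib/left4me/.venv"

# Regular install: the package was copied into site-packages.
print(installed_in_venv(
    f"{VENV}/lib/python3.13/site-packages/l4d2host/__init__.py", VENV))

# Editable install: the module still resolves to the source checkout.
print(installed_in_venv("/opt/left4me/src/l4d2host/l4d2host/__init__.py", VENV))
```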
|
|
||||||
|
|
||||||
## Pointers
|
|
||||||
|
|
||||||
- Deployment-responsibility design (just shipped; the venv chain it
|
|
||||||
did NOT touch is what this handoff replaces):
|
|
||||||
`docs/superpowers/specs/2026-05-15-deployment-responsibility-design.md`
|
|
||||||
- Runtime state relocation (made `/opt/left4me/src` root-owned, which
|
|
||||||
is why the current `pip_install` needs the tempdir dance):
|
|
||||||
`docs/superpowers/specs/2026-05-15-runtime-state-relocation-design.md`
|
|
||||||
- ckn-bw left4me bundle:
|
|
||||||
`~/Projekte/ckn-bw/bundles/left4me/`
|
|
||||||
- `items.py:285-306` — `git_deploy` triggers
|
|
||||||
- `items.py:328-340` — `left4me_create_venv`
|
|
||||||
- `items.py:342-352` — `left4me_pip_upgrade`
|
|
||||||
- `items.py:354-382` — `left4me_pip_install` (the tempdir dance)
|
|
||||||
- `items.py:384-407` — `left4me_alembic_upgrade`
|
|
||||||
- `items.py:409-424` — `left4me_seed_overlays`
|
|
||||||
- uv docs: https://docs.astral.sh/uv/ — workspace, `uv sync`,
|
|
||||||
`UV_PROJECT_ENVIRONMENT`.
|
|
||||||
|
|
||||||
## Commit messages (suggested)
|
|
||||||
|
|
||||||
left4me side (root pyproject + lockfile + member deps + .envrc + docs):
|
|
||||||
|
|
||||||
```
|
|
||||||
refactor(repo): uv workspace + lockfile
|
|
||||||
|
|
||||||
Declare the repo as a uv workspace with l4d2host and l4d2web as
|
|
||||||
members. Add uv.lock for deterministic dep resolution. l4d2web now
|
|
||||||
declares its cross-dep on l4d2host explicitly via tool.uv.sources.
|
|
||||||
|
|
||||||
Local-dev install switches from `pip install -e ./l4d2host -e ./l4d2web`
|
|
||||||
to `uv sync` (creates venv, installs members editable, installs dev
|
|
||||||
deps from one source). .envrc uses direnv's `use uv` helper.
|
|
||||||
|
|
||||||
Prereq for the ckn-bw bundle uv-sync action (handoff:
|
|
||||||
docs/superpowers/specs/2026-05-15-handoff-uv-workspace.md).
|
|
||||||
```
|
|
||||||
|
|
||||||
ckn-bw side (drop chain, install uv, single sync action):
|
|
||||||
|
|
||||||
```
|
|
||||||
refactor(left4me): collapse venv chain into uv sync
|
|
||||||
|
|
||||||
Replace left4me_create_venv + left4me_pip_upgrade + left4me_pip_install
|
|
||||||
(the tempdir-copy dance) with a single left4me_uv_sync action driven
|
|
||||||
by left4me's committed uv.lock. Deterministic dep versions, no source
|
|
||||||
mutation during build, three actions instead of five.
|
|
||||||
|
|
||||||
pkg_apt: uv added. python3-pip removed (uv replaces it).
|
|
||||||
|
|
||||||
Per docs/superpowers/specs/2026-05-15-handoff-uv-workspace.md (in the
|
|
||||||
left4me repo).
|
|
||||||
```
|
|
||||||