spec(deploy-dir-rethink): plan + mark adjacent specs resolved

Adds the implementation plan that landed in the preceding commit
(2026-05-15-deploy-dir-rethink.md) under docs/superpowers/plans/, and
marks the two related specs:

- 2026-05-15-deploy-dir-rethink-design.md (the source handoff) gets a
  "Resolved by …" banner at the top with a one-paragraph summary of
  the decisions taken. Body preserved for archaeology.

- 2026-05-15-janitorial-cleanup.md gets a status banner noting that
  items 1, 3, 4, 5 are fully resolved by the deploy-dir-rethink plan
  and item 2 is partially resolved with a third option the original
  enumeration didn't list: only the truly-dead two static units
  (cake.service, nft-mark.service) deleted, the reactor-emitted set
  (server@, web, workshop-refresh.{service,timer}, slices) retained
  as curated examples. Resolved items left in place but flagged.

Remaining live janitorial items: 6 (bubblewrap doc drift), 7
(conditional on build-overlay-unit refactor), 8 (operational idmap
bind cleanup), 9 (Optimized Settings overlay verification), 10 (SM
1.13 calendar reminder).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mwiegand 2026-05-15 12:05:53 +02:00
parent 5284e28af7
commit 160911fbca
3 changed files with 238 additions and 5 deletions


@@ -0,0 +1,198 @@
# Deploy-dir architecture rethink — implementation plan
## Context
Resolves the open questions in `docs/superpowers/specs/2026-05-15-deploy-dir-rethink-design.md`. After the 2026-05-15 script-consolidation work, `deploy/` ended up half-canonical / half-historical: the privileged scripts were treated as load-bearing source-of-truth there, while sudoers/sysctl/env-templates stayed duplicated against ckn-bw, and the obsolete `deploy-test-server.sh` plus a pile of dead static unit files lingered. The shape worked but couldn't be described in two sentences.
This plan commits to the framing the user picked: **`deploy/` is a reference exemplar** — readable enough that a fresh consumer (ckn-bw today, hypothetical docker/ansible/manual tomorrow) could build a deployment from it, but not the live source of truth for installed binaries. The privileged scripts are **application-inherent code** and move out of `deploy/` to top-level `scripts/{libexec,sbin}/`. Dead code is deleted in the same pass. ckn-bw is updated to read scripts from the new location. The intended outcome: `deploy/` shrinks to README + example configs + a couple of curated example units, the rules for "what goes here" fit in two sentences, and the cross-repo install path becomes self-explanatory.
## End state
```
left4me/
  scripts/
    libexec/
      left4me-overlay  # 244-line Python helper (mount/umount)
      left4me-script-sandbox  # 109-line bash (systemd-run sandbox)
      left4me-systemctl  # 44-line sh wrapper
      left4me-journalctl  # 53-line sh wrapper
    sbin/
      left4me  # 17-line admin CLI wrapper
    tests/
      test_overlay.py
      test_script_sandbox.py
      test_systemctl_helper.py
      test_journalctl_helper.py
      test_sudoers_grants.py  # tests the contract between scripts and sudoers
  deploy/  # REFERENCE ONLY — see deploy/README.md
    README.md  # rewritten: explains target layout, points at scripts/
    files/
      etc/
        sudoers.d/left4me  # example, ckn-bw ships its own verbatim copy
        sysctl.d/99-left4me.conf  # example
        left4me/sandbox-resolv.conf  # example
      usr/local/lib/systemd/system/
        left4me-server@.service  # curated example of what ckn-bw's reactor emits
        left4me-web.service  # curated example
        left4me-workshop-refresh.service  # curated example
        left4me-workshop-refresh.timer  # curated example
        l4d2-game.slice  # curated example
        l4d2-build.slice  # curated example
    templates/etc/left4me/
      host.env  # example, ckn-bw renders its own mako version
      web.env.template
    tests/
      test_example_units.py  # slimmed: just locks down the curated examples
  l4d2host/  # unchanged
  l4d2web/  # unchanged
  docs/
```
## Step-by-step
### 1. Create `scripts/` and move helpers
- `mkdir -p scripts/libexec scripts/sbin scripts/tests`
- `git mv` the four live helpers and the admin CLI:
- `deploy/files/usr/local/libexec/left4me/left4me-overlay` → `scripts/libexec/left4me-overlay`
- `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` → `scripts/libexec/left4me-script-sandbox`
- `deploy/files/usr/local/libexec/left4me/left4me-systemctl` → `scripts/libexec/left4me-systemctl`
- `deploy/files/usr/local/libexec/left4me/left4me-journalctl` → `scripts/libexec/left4me-journalctl`
- `deploy/files/usr/local/sbin/left4me` → `scripts/sbin/left4me`
- The scripts' contents are unchanged. Every install-target path inside them (`/usr/local/libexec/left4me/...`, `/etc/left4me/...`, `/var/lib/left4me/...`) stays exactly as is — those are runtime paths, not source-tree paths.
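The moves in this step can be sketched as a small mapping. This is an illustration only, not part of the plan: `MOVES` and `git_mv_commands` are names invented here, and the snippet prints the `git mv` commands rather than running them.

```python
from pathlib import PurePosixPath

# Hypothetical names for illustration; the paths themselves come from the list above.
OLD_LIBEXEC = PurePosixPath("deploy/files/usr/local/libexec/left4me")
OLD_SBIN = PurePosixPath("deploy/files/usr/local/sbin")

LIBEXEC_HELPERS = [
    "left4me-overlay",
    "left4me-script-sandbox",
    "left4me-systemctl",
    "left4me-journalctl",
]

# Old source-tree path -> new source-tree path. Install targets are not involved.
MOVES = {
    **{str(OLD_LIBEXEC / name): f"scripts/libexec/{name}" for name in LIBEXEC_HELPERS},
    str(OLD_SBIN / "left4me"): "scripts/sbin/left4me",
}


def git_mv_commands(moves):
    """Render one `git mv` command line per (old, new) pair, in insertion order."""
    return [f"git mv {old} {new}" for old, new in moves.items()]


if __name__ == "__main__":
    print("mkdir -p scripts/libexec scripts/sbin scripts/tests")
    for command in git_mv_commands(MOVES):
        print(command)
```

Only source-tree paths appear in the mapping, which matches the point above that runtime paths inside the scripts stay untouched.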
### 2. Delete dead code
- `git rm` (truly obsolete; replacements live elsewhere or feature was retired):
- `deploy/files/usr/local/libexec/left4me/left4me-apply-cake` — CAKE migrated to systemd-networkd via `network/<iface>/cake` node metadata in ckn-bw.
- `deploy/files/usr/local/lib/systemd/system/left4me-cake.service` — same reason.
- `deploy/files/etc/left4me/cake.env` — bandwidth lives in node metadata, not an env file.
- `deploy/files/usr/local/lib/systemd/system/left4me-nft-mark.service` — central `bundles/nftables/` consumes the rules now.
- `deploy/files/usr/local/lib/left4me/nft/left4me-mark.nft` — same. After the delete, the now-empty `deploy/files/usr/local/lib/left4me/` and its `nft/` child disappear (git doesn't track empty dirs).
- `deploy/deploy-test-server.sh` — superseded by `bw apply`; content survives in git history.
- **Do NOT delete** `deploy/files/usr/local/lib/systemd/system/left4me-workshop-refresh.{service,timer}`. The workshop-refresh job is live (invokes `flask workshop-refresh`, defined in `l4d2web/cli.py`); ckn-bw's reactor emits these on production. They stay as curated examples, same category as `left4me-server@.service` / `left4me-web.service` / the slices. (This corrects the framing in `docs/superpowers/specs/2026-05-15-deploy-dir-rethink-design.md` and item 2 of `docs/superpowers/specs/2026-05-15-janitorial-cleanup.md`, both of which lumped workshop-refresh together with truly-dead units.)
- Stale `__pycache__` dirs under `deploy/files/usr/local/libexec/left4me/` are deleted by the moves in step 1.
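A quick post-cleanup check could look like the sketch below. It is a hypothetical helper, not something the plan requires: `check_cleanup`, `DEAD_PATHS`, and `KEPT_PATHS` are names made up for this example, and the path lists are copied from the bullets above.

```python
from pathlib import Path

# Paths that step 2 removes, relative to the repo root.
DEAD_PATHS = [
    "deploy/files/usr/local/libexec/left4me/left4me-apply-cake",
    "deploy/files/usr/local/lib/systemd/system/left4me-cake.service",
    "deploy/files/etc/left4me/cake.env",
    "deploy/files/usr/local/lib/systemd/system/left4me-nft-mark.service",
    "deploy/files/usr/local/lib/left4me/nft/left4me-mark.nft",
    "deploy/deploy-test-server.sh",
]

# Paths that must survive as curated examples.
KEPT_PATHS = [
    "deploy/files/usr/local/lib/systemd/system/left4me-workshop-refresh.service",
    "deploy/files/usr/local/lib/systemd/system/left4me-workshop-refresh.timer",
]


def check_cleanup(repo_root):
    """Return (still_present, wrongly_missing) path lists for the given checkout."""
    root = Path(repo_root)
    still_present = [p for p in DEAD_PATHS if (root / p).exists()]
    wrongly_missing = [p for p in KEPT_PATHS if not (root / p).exists()]
    return still_present, wrongly_missing
```

Both returned lists should be empty once the deletes have landed and the workshop-refresh units are still in place.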
### 3. Split and relocate `deploy/tests/test_deploy_artifacts.py`
The current file (~880 lines) is doing four jobs. Split as follows; do not duplicate tests across files.
**Concrete sequence to preserve git history where it counts**:
1. `git mv deploy/tests/test_deploy_artifacts.py deploy/tests/test_example_units.py` — single rename, history follows via `git log --follow`.
2. In the renamed file, delete every test except the "Keep in `deploy/tests/test_example_units.py`" list below. The kept tests track the unit/sysctl/env-template examples, which is what `deploy/tests/` will mean afterwards.
3. Create new `scripts/tests/*.py` files (and `conftest.py`) by writing them fresh — pasting the relevant test functions across. The extracted tests lose direct rename history, but blame against the new files still resolves to the originals one git ref back; acceptable tradeoff.
**Move to `scripts/tests/`** (tests of script behavior + the sudoers contract that gates the scripts):
- `scripts/tests/test_overlay.py`: `test_overlay_helper_is_python_with_strict_validation`, `test_overlay_helper_mount_is_idempotent_when_already_mounted`
- `scripts/tests/test_script_sandbox.py`: `test_script_sandbox_helper_present`, `test_script_sandbox_helper_passes_shell_syntax_check`, `test_script_sandbox_helper_invokes_systemd_run_with_hardening`, `test_script_sandbox_uses_idmap_staging`, `test_script_sandbox_in_build_slice_with_oom_adjust`, `test_script_sandbox_helper_validates_overlay_id`, `test_script_sandbox_helper_dry_run_mode`
- `scripts/tests/test_systemctl_helper.py`: `test_systemctl_helper_passes_shell_syntax_check_and_rejects_bad_args`
- `scripts/tests/test_journalctl_helper.py`: `test_journalctl_helper_passes_shell_syntax_check_and_rejects_bad_args`
- `scripts/tests/test_helpers_use_fixed_paths.py`: `test_helpers_use_fixed_system_tool_paths_not_sudo_path`
- `scripts/tests/test_sudoers_grants.py`: `test_sudoers_allows_only_left4me_helpers_not_raw_system_tools` (still reads `deploy/files/etc/sudoers.d/left4me` as the canonical example; comment why)
The `ROOT/DEPLOY` path-prefix constants in each file get rewritten so `SCRIPTS = Path(__file__).resolve().parents[2] / "scripts"` and helpers resolve to `SCRIPTS / "libexec/left4me-overlay"` etc. Shared helpers (`_fake_command`, `_env_with_fake_commands`) move into `scripts/tests/conftest.py`.
**Keep in `deploy/tests/test_example_units.py`** (locks down the curated examples; renamed from the current file):
- `test_global_unit_files_exist_at_product_level_paths`
- `test_web_unit_contains_required_runtime_contract`
- `test_server_unit_contains_required_runtime_contract`
- `test_server_unit_mounts_overlay_via_exec_start_pre`
- `test_server_unit_unmounts_overlay_via_exec_stop_post`
- `test_server_unit_contains_perf_baseline_directives`
- `test_l4d2_game_slice_exists_with_high_weights`
- `test_l4d2_build_slice_exists_with_low_weights`
- `test_sysctl_conf_present_with_perf_settings`
- `test_env_templates_contain_required_defaults`
- `test_sandbox_resolv_conf_exists`
Add a top-of-file docstring: *"These tests lock down the curated examples kept in `deploy/files/` for reference. The production units are emitted by ckn-bw's reactor in `bundles/left4me/metadata.py`; when reactor output drifts intentionally, update the examples here too."*
**Delete entirely** (target removed or no longer load-bearing):
- All `test_deploy_script_*` tests (12 tests; `deploy-test-server.sh` is gone)
- `test_globals_refresh_units_removed` — file already deleted; nothing to lock down
- `test_nft_mark_file_marks_left4me_udp_with_dscp_ef_and_priority`, `test_nft_mark_unit_loads_and_clears_left4me_table` — nft-mark moved to central nftables bundle
- `test_cake_env_template_documents_required_knobs`, `test_apply_cake_helper_supports_apply_and_clear_modes`, `test_apply_cake_helper_passes_shell_syntax_check`, `test_cake_unit_runs_helper_in_apply_and_clear_modes` — CAKE moved to systemd-networkd
- `test_deploy_script_installs_overlay_helper_with_executable_mode`, `test_deploy_script_installs_script_sandbox_helper` — install responsibility now lives in ckn-bw's bundle, not in any left4me-side script
Final file count: `scripts/tests/` gets 6 test files plus `conftest.py`, `deploy/tests/test_example_units.py` is one file, `deploy/tests/test_deploy_artifacts.py` is gone (renamed).
### 4. Rewrite `deploy/README.md`
Reframe the top of the file as: *"This directory is a reference exemplar. The canonical deploy is [ckn-bw](https://git.sublimity.de/cronekorkn/ckn-bw)'s `bundles/left4me/` (run `bw apply ovh.left4me`). Files under `deploy/files/` and `deploy/templates/` are readable examples — not the binaries / configs ckn-bw actually installs. Read them to understand the target layout if you're building a fresh deployment by other means."*
Update the file/status table:
- Drop rows for files that no longer exist (apply-cake, cake.service, cake.env, nft-mark.*). The workshop-refresh.* rows stay, because step 2 keeps those units as curated examples.
- Drop the `deploy-test-server.sh` row.
- For the privileged-scripts rows, change `files/usr/local/libexec/left4me/...` → `(moved to scripts/libexec/, installed by ckn-bw's install_left4me_scripts action)`; same for the sbin row.
- Mark the remaining `files/etc/...` and `files/usr/local/lib/systemd/system/...` entries explicitly as **example**: ckn-bw ships its own verbatim copies of the configs, its reactor emits the units.
Keep the "Target Layout" / "Runtime User" / "Overlay References" / "Performance Tuning" sections — they're useful reference prose. Strip the "Running A Test Deployment" / "Admin Bootstrap" sections that refer to the deleted shell installer; replace with a one-paragraph pointer to ckn-bw.
### 5. ckn-bw cross-repo update
The `install_left4me_scripts` action in `bundles/left4me/items.py` currently reads from `/opt/left4me/src/deploy/files/usr/local/{libexec,sbin}/`. Update it to read from `/opt/left4me/src/scripts/{libexec,sbin}/`. The install target is unchanged (`/usr/local/libexec/left4me/`, `/usr/local/sbin/left4me`), so nothing on the deployed host moves.
This is a separate PR in the ckn-bw repo. It must land **at the same time** as the left4me move — the install action depends on the source paths existing. Coordination:
1. Open both PRs simultaneously.
2. Merge order: left4me first (scripts exist at the new path in `/opt/left4me/src/` only after a fresh `git_deploy`), then ckn-bw, then `bw apply ovh.left4me`.
3. Alternative: have the ckn-bw PR fall back to the old path if the new path doesn't exist (one extra glob); decide during ckn-bw review whether the complexity is worth the looser coupling. Default: no fallback, coordinate the merges.
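The fallback option in item 3 could be sketched roughly as follows. This is not ckn-bw's actual code: `resolve_script_source` and its `allow_fallback` flag are hypothetical, and only the two source layouts are taken from this plan.

```python
from pathlib import Path


def resolve_script_source(src_root, allow_fallback=False):
    """Return (libexec_dir, sbin_dir) inside a left4me checkout.

    Prefers the new `scripts/` layout. The old `deploy/files/usr/local/`
    layout is only considered when `allow_fallback` is set.
    """
    root = Path(src_root)

    new_libexec = root / "scripts" / "libexec"
    new_sbin = root / "scripts" / "sbin"
    if new_libexec.is_dir() and new_sbin.is_dir():
        return new_libexec, new_sbin

    old_base = root / "deploy" / "files" / "usr" / "local"
    old_libexec = old_base / "libexec" / "left4me"
    old_sbin = old_base / "sbin"
    if allow_fallback and old_libexec.is_dir() and old_sbin.is_dir():
        return old_libexec, old_sbin

    # Default behavior: fail loudly instead of guessing.
    raise FileNotFoundError(f"no left4me script source found under {root}")
```

With `allow_fallback=False` this matches the stated default of coordinating the merges: a checkout without `scripts/` raises before anything is installed.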
Verification on the deploy target: after `bw apply`, the files under `/usr/local/libexec/left4me/` and `/usr/local/sbin/left4me` should be byte-identical to before. Sudoers, services, the web app: all unchanged.
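One way to check the byte-identical claim is to hash the installed files before and after the apply. The sketch below is an assumption about how that could be done; `sha256_manifest` and `changed_files` are hypothetical helpers, not existing tooling in either repo.

```python
import hashlib
from pathlib import Path


def sha256_manifest(directory):
    """Map each file path (relative to `directory`) to its SHA-256 hex digest."""
    base = Path(directory)
    return {
        path.relative_to(base).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(base.rglob("*"))
        if path.is_file()
    }


def changed_files(before, after):
    """Return sorted names whose digest differs or that exist on only one side."""
    names = before.keys() | after.keys()
    return sorted(name for name in names if before.get(name) != after.get(name))
```

Capturing `sha256_manifest("/usr/local/libexec/left4me")` before and after `bw apply` and passing both to `changed_files` should yield an empty list if nothing moved on the host.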
### 6. Mark adjacent specs / docs as resolved
- `docs/superpowers/specs/2026-05-15-deploy-dir-rethink-design.md`: prepend a `**Resolved 2026-05-15 by docs/superpowers/plans/…</plan-name>.md.**` line at the top. Leave the body intact for archaeology.
- `docs/superpowers/specs/2026-05-15-janitorial-cleanup.md`: cross out items 1, 3, 4, 5 (now handled here). Item 2 needs a rewrite: the framing "all static unit files are obsolete drift" was wrong; the live reactor-emitted set (`server@`, `web`, `workshop-refresh.{service,timer}`, `l4d2-{game,build}.slice`) stays in `deploy/files/` as curated examples. The truly-dead two (`left4me-cake.service`, `left4me-nft-mark.service`) are already deleted by this plan, so item 2 collapses to "no remaining work."
- No memory file changes needed; the project state captured here is structural and re-derivable from `deploy/README.md` after the rewrite lands.
### 7. Rollback notes
If `bw apply ovh.left4me` against the test server breaks something after the cross-repo merge:
1. Revert the ckn-bw `install_left4me_scripts` action change to the old source path (`/opt/left4me/src/deploy/files/usr/local/{libexec,sbin}/`). Re-apply.
2. The left4me side never needs reverting in isolation. The scripts at the new path are byte-identical to the old ones, and a stale ckn-bw install action run against a *new* left4me checkout fails at `install -t` because the old source path is missing. That failure is loud and safe: nothing on the deployed system gets modified.
3. The only foot-gun is **partial rollout**: ckn-bw updated but left4me not yet checked out at the right revision. The `git_deploy` step pins the revision, so as long as the two PRs reference compatible commits, the deployed `/opt/left4me/src/` always matches the action's expectation.
## What does NOT change
- Runtime install-target paths (`/usr/local/libexec/left4me/...`, `/usr/local/sbin/left4me`) — every reference inside `l4d2host/service_control.py:7-8`, `l4d2web/services/overlay_builders.py:34`, the sudoers file, and the systemd units stays the same.
- The Python packages `l4d2host/` and `l4d2web/`.
- ckn-bw's bundles for sudoers / sysctl / sandbox-resolv.conf — those keep their own verbatim copies (the user picked "deploy/ keeps configs as examples; duplication-with-ckn-bw is OK because deploy/ is explicitly reference"). Janitoring the duplication is *not* in scope for this plan.
- The Mako env templates in ckn-bw — they stay where they are, since they need bw's metadata access for rendering.
- The recent overlay-idmap / script-sandbox idmap-staging work — untouched.
## Critical files (jump points for the implementor)
- `deploy/tests/test_deploy_artifacts.py` — the source for the test split (lines 20-32 are the path constants; tests grouped roughly by helper from line 138 onward)
- `deploy/README.md` — full rewrite of the top section, partial rewrite of the table
- `l4d2host/service_control.py:7-8` — verify install-target paths unchanged (sanity)
- `l4d2web/services/overlay_builders.py:34` — same
- `deploy/files/etc/sudoers.d/left4me` — sanity-check that no path inside changed
- `deploy/files/usr/local/lib/systemd/system/{left4me-server@.service,left4me-web.service,l4d2-{game,build}.slice}` — survive as curated examples
- ckn-bw repo: `bundles/left4me/items.py` — the `install_left4me_scripts` action (separate PR)
## Verification
End-to-end:
1. **Source-tree consistency.** `find scripts deploy -type f | sort` matches the layout in "End state" above (modulo `__pycache__`).
2. **All tests pass locally.** From the repo root: `pytest scripts/tests/ deploy/tests/ l4d2host/tests/ l4d2web/tests/` — every test passes. Specifically verify `scripts/tests/test_sudoers_grants.py` still reads `deploy/files/etc/sudoers.d/left4me` correctly (path constant points across the dir boundary).
3. **Shell syntax checks.** The split tests should still run `sh -n` / `bash -n` against the moved scripts; no script edits means no syntax regressions, but the test paths must resolve.
4. **No accidental application breakage.** `grep -rn '/usr/local/libexec/left4me\|/usr/local/sbin/left4me' l4d2host l4d2web` returns the same hits as before (paths are install-target, source moves don't affect them).
5. **ckn-bw dry-run.** Once the ckn-bw PR is up, `bw apply --dry-run ovh.left4me` from the ckn-bw repo: the diff should show **no changes** to files under `/usr/local/libexec/left4me/` or `/usr/local/sbin/left4me` (byte-identical content via the new path).
6. **Production apply.** `bw apply ovh.left4me` against the real test server. After apply: `systemctl status left4me-web.service` is green, starting a game server via the web UI still works (overlay mount → srcds_run → unmount on stop), running an overlay build script through the sandbox still works.
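The source-tree consistency check in item 1 can also be expressed in Python instead of `find ... | sort`. This is a sketch under the assumption that the expected layout is maintained as a plain list of relative paths; `tracked_layout` and `layout_diff` are hypothetical names.

```python
from pathlib import Path


def tracked_layout(repo_root, top_dirs=("scripts", "deploy")):
    """List files under the given top-level dirs as sorted POSIX paths.

    Anything inside a `__pycache__` directory is skipped, mirroring the
    "modulo __pycache__" caveat in item 1.
    """
    root = Path(repo_root)
    found = []
    for top in top_dirs:
        for path in (root / top).rglob("*"):
            if path.is_file() and "__pycache__" not in path.parts:
                found.append(path.relative_to(root).as_posix())
    return sorted(found)


def layout_diff(found, expected):
    """Return (unexpected, missing) relative to the expected layout."""
    found_set, expected_set = set(found), set(expected)
    return sorted(found_set - expected_set), sorted(expected_set - found_set)
```

An empty `(unexpected, missing)` pair means the tree matches the "End state" listing.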
## Out of scope (handled elsewhere or deferred)
- The Mako template duplication in ckn-bw — separate cleanup; the templates legitimately need bw's metadata access.
- The 1/2/3-user uid-split decision — `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`.
- The script-sandbox → systemd template unit refactor — `docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md`.
- Remaining janitorial items: item 6 (bubblewrap→systemd-run doc drift), item 8 (stale gameserver-side idmap binds), item 10 (calendar reminder for SM 1.13 stable). Items 1, 2 (partial; see step 6), 3, 4, 5 are subsumed here.
- Rewriting the shell helpers in Python / packaging them as console_scripts — explicitly rejected in the recent script-consolidation plan (egg-info + TOCTOU privilege concerns).
- Historical references inside `docs/superpowers/plans/*` and `docs/superpowers/specs/*` to `deploy/files/...` or `deploy-test-server.sh` paths. Those are time-stamped snapshots of past sessions; they don't get rewritten when the underlying tree moves.


@@ -1,5 +1,15 @@
# Deploy directory architecture — open questions
**Resolved 2026-05-15 by [`docs/superpowers/plans/2026-05-15-deploy-dir-rethink.md`](../plans/2026-05-15-deploy-dir-rethink.md).**
Decision summary: `deploy/` is reference material; privileged scripts moved
to top-level `scripts/{libexec,sbin}/`; `deploy-test-server.sh` deleted;
dead static units (cake.service, nft-mark.service) deleted; reactor-emitted
units (server@, web, workshop-refresh.{service,timer}, slices) retained as
curated examples; ckn-bw `install_left4me_scripts` action repointed to the
new source paths. Body below preserved for archaeology.
---
**Status: open questions, not a settled design.** This is a thinking-aloud
handoff prompted by the script-consolidation change on 2026-05-15. Decisions
deferred; a future session should pick this up, talk through the options,


@@ -7,9 +7,18 @@ self-contained. Knock them out individually or batch them into a
single janitorial PR. None are urgent — the project works fine with
all of these still present.
> **2026-05-15 update**: items 1, 3, 4, and 5 resolved by
> [`docs/superpowers/plans/2026-05-15-deploy-dir-rethink.md`](../plans/2026-05-15-deploy-dir-rethink.md).
> Item 2 partially resolved by the same plan with a third option the
> original enumeration didn't list: the truly-dead units (cake.service,
> nft-mark.service) are deleted, the reactor-emitted set (server@, web,
> workshop-refresh.{service,timer}, slices) stays as curated examples
> under `deploy/files/`. Resolved items left in place below, marked
> RESOLVED, for archaeology. Remaining live items: 6, 7, 8, 9, 10.
## Items
### 1. `left4me-apply-cake` — dead code [RESOLVED]
**What**: `deploy/files/usr/local/libexec/left4me/left4me-apply-cake`
(POSIX sh, ~47 lines) that applies/clears CAKE egress traffic
@@ -34,7 +43,19 @@ sudo find /var/lib/left4me /opt/left4me /usr/local -name 'left4me-apply-cake'
# expect: empty after the rm
``` ```
### 2. Obsolete systemd unit files in `deploy/files/` [PARTIALLY RESOLVED]
**Resolution path chosen**: third option not in the original enumeration —
*only the truly-dead two* (`left4me-cake.service`, `left4me-nft-mark.service`)
were deleted. The reactor-emitted set (`left4me-server@.service`,
`left4me-web.service`, `left4me-workshop-refresh.{service,timer}`,
`l4d2-game.slice`, `l4d2-build.slice`) is retained as **curated examples**
under `deploy/files/`, locked down by `deploy/tests/test_example_units.py`.
The framing in this item — "all six are equally drift" — was wrong: the
reactor-emitted units carry useful signal as readable examples of what
ckn-bw's `systemd_units` reactor emits at apply time. Original body below.
**What**:
- `deploy/files/usr/local/lib/systemd/system/left4me-cake.service`
@@ -65,7 +86,7 @@ they matter).
**Verification**: `find deploy/files/usr/local/lib/systemd/system -type f`
should match the README's "what's canonical" list.
### 3. `deploy/files/etc/left4me/cake.env` [RESOLVED]
**What**: env file referenced by the obsolete `left4me-cake.service`.
@@ -75,7 +96,7 @@ read by anything live.
**Action**: delete `deploy/files/etc/left4me/cake.env`.
### 4. `deploy/files/usr/local/lib/left4me/nft/` [RESOLVED]
**What**: nftables fragment for `left4me-nft-mark.service`.
@@ -86,7 +107,11 @@ fragment isn't read.
**Action**: delete `deploy/files/usr/local/lib/left4me/`
recursively.
### 5. `deploy-test-server.sh`'s fate [RESOLVED]
**Resolution**: deleted entirely. Content survives in git history.
**What**: `deploy/deploy-test-server.sh`, the historical one-shot
bash deploy.