left4me/docs/superpowers/plans/2026-05-15-deploy-dir-rethink.md
mwiegand 160911fbca
spec(deploy-dir-rethink): plan + mark adjacent specs resolved
Adds the implementation plan that landed in the preceding commit
(2026-05-15-deploy-dir-rethink.md) under docs/superpowers/plans/, and
marks the two related specs:

- 2026-05-15-deploy-dir-rethink-design.md (the source handoff) gets a
  "Resolved by …" banner at the top with a one-paragraph summary of
  the decisions taken. Body preserved for archaeology.

- 2026-05-15-janitorial-cleanup.md gets a status banner noting that
  items 1, 3, 4, 5 are fully resolved by the deploy-dir-rethink plan
  and item 2 is partially resolved with a third option the original
  enumeration didn't list: only the truly-dead two static units
  (cake.service, nft-mark.service) deleted, the reactor-emitted set
  (server@, web, workshop-refresh.{service,timer}, slices) retained
  as curated examples. Resolved items left in place but flagged.

Remaining live janitorial items: 6 (bubblewrap doc drift), 7
(conditional on build-overlay-unit refactor), 8 (operational idmap
bind cleanup), 9 (Optimized Settings overlay verification), 10 (SM
1.13 calendar reminder).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 12:05:53 +02:00


Deploy-dir architecture rethink — implementation plan

Context

Resolves the open questions in docs/superpowers/specs/2026-05-15-deploy-dir-rethink-design.md. After the 2026-05-15 script-consolidation work, deploy/ ended up half-canonical / half-historical: the privileged scripts were treated as load-bearing source-of-truth there, while sudoers/sysctl/env-templates stayed duplicated against ckn-bw, and the obsolete deploy-test-server.sh plus a pile of dead static unit files lingered. The shape worked but couldn't be described in two sentences.

This plan commits to the framing the user picked: deploy/ is a reference exemplar — readable enough that a fresh consumer (ckn-bw today, hypothetical docker/ansible/manual tomorrow) could build a deployment from it, but not the live source of truth for installed binaries. The privileged scripts are application-inherent code and move out of deploy/ to top-level scripts/{libexec,sbin}/. Dead code is deleted in the same pass. ckn-bw is updated to read scripts from the new location. The intended outcome: deploy/ shrinks to README + example configs + a couple of curated example units, the rules for "what goes here" fit in two sentences, and the cross-repo install path becomes self-explanatory.

End state

left4me/
  scripts/
    libexec/
      left4me-overlay              # 244-line Python helper (mount/umount)
      left4me-script-sandbox       # 109-line bash (systemd-run sandbox)
      left4me-systemctl            # 44-line sh wrapper
      left4me-journalctl           # 53-line sh wrapper
    sbin/
      left4me                      # 17-line admin CLI wrapper
    tests/
      test_overlay.py
      test_script_sandbox.py
      test_systemctl_helper.py
      test_journalctl_helper.py
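      test_sudoers_grants.py       # tests the contract between scripts and sudoers
      test_helpers_use_fixed_paths.py
      conftest.py                  # shared _fake_command / _env_with_fake_commands helpers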
  deploy/                          # REFERENCE ONLY — see deploy/README.md
    README.md                      # rewritten: explains target layout, points at scripts/
    files/
      etc/
        sudoers.d/left4me                # example, ckn-bw ships its own verbatim copy
        sysctl.d/99-left4me.conf         # example
        left4me/sandbox-resolv.conf      # example
      usr/local/lib/systemd/system/
        left4me-server@.service          # curated example of what ckn-bw's reactor emits
        left4me-web.service              # curated example
        left4me-workshop-refresh.service # curated example
        left4me-workshop-refresh.timer   # curated example
        l4d2-game.slice                  # curated example
        l4d2-build.slice                 # curated example
    templates/etc/left4me/
      host.env                           # example, ckn-bw renders its own mako version
      web.env.template
    tests/
      test_example_units.py              # slimmed: just locks down the curated examples
  l4d2host/                        # unchanged
  l4d2web/                         # unchanged
  docs/

Step-by-step

1. Create scripts/ and move helpers

  • mkdir -p scripts/libexec scripts/sbin scripts/tests
  • git mv the four live helpers and the admin CLI:
    • deploy/files/usr/local/libexec/left4me/left4me-overlay → scripts/libexec/left4me-overlay
    • deploy/files/usr/local/libexec/left4me/left4me-script-sandbox → scripts/libexec/left4me-script-sandbox
    • deploy/files/usr/local/libexec/left4me/left4me-systemctl → scripts/libexec/left4me-systemctl
    • deploy/files/usr/local/libexec/left4me/left4me-journalctl → scripts/libexec/left4me-journalctl
    • deploy/files/usr/local/sbin/left4me → scripts/sbin/left4me
  • The scripts' contents are unchanged. Every install-target path inside them (/usr/local/libexec/left4me/..., /etc/left4me/..., /var/lib/left4me/...) stays exactly as is — those are runtime paths, not source-tree paths.
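The five moves can be written down as data, which makes the old-to-new mapping easy to review before anything is run. The following is a minimal sketch, not part of the plan itself: the names `planned_moves` and `git_mv_commands` are hypothetical, and the script only prints the commands rather than executing them.

```python
from pathlib import PurePosixPath

# Old source-tree locations (inside deploy/) and the new top-level homes.
OLD_LIBEXEC = PurePosixPath("deploy/files/usr/local/libexec/left4me")
OLD_SBIN = PurePosixPath("deploy/files/usr/local/sbin")
NEW_LIBEXEC = PurePosixPath("scripts/libexec")
NEW_SBIN = PurePosixPath("scripts/sbin")

# The four live helpers named in step 1.
LIBEXEC_HELPERS = (
    "left4me-overlay",
    "left4me-script-sandbox",
    "left4me-systemctl",
    "left4me-journalctl",
)


def planned_moves():
    """Return (source, destination) pairs for the five renames."""
    moves = [(OLD_LIBEXEC / name, NEW_LIBEXEC / name) for name in LIBEXEC_HELPERS]
    moves.append((OLD_SBIN / "left4me", NEW_SBIN / "left4me"))
    return moves


def git_mv_commands():
    """Render the moves as command strings; nothing is executed here."""
    return [f"git mv {src} {dst}" for src, dst in planned_moves()]


if __name__ == "__main__":
    print("\n".join(git_mv_commands()))
```

Only source-tree paths appear in the mapping; the install-target paths inside the scripts are deliberately absent because they do not change.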

2. Delete dead code

  • git rm (truly obsolete; replacements live elsewhere or feature was retired):
    • deploy/files/usr/local/libexec/left4me/left4me-apply-cake — CAKE migrated to systemd-networkd via network/<iface>/cake node metadata in ckn-bw.
    • deploy/files/usr/local/lib/systemd/system/left4me-cake.service — same reason.
    • deploy/files/etc/left4me/cake.env — bandwidth lives in node metadata, not an env file.
    • deploy/files/usr/local/lib/systemd/system/left4me-nft-mark.service — central bundles/nftables/ consumes the rules now.
    • deploy/files/usr/local/lib/left4me/nft/left4me-mark.nft — same. After the delete, the now-empty deploy/files/usr/local/lib/left4me/ and its nft/ child disappear (git doesn't track empty dirs).
    • deploy/deploy-test-server.sh — superseded by bw apply; content survives in git history.
  • Do NOT delete deploy/files/usr/local/lib/systemd/system/left4me-workshop-refresh.{service,timer}. The workshop-refresh job is live (invokes flask workshop-refresh, defined in l4d2web/cli.py); ckn-bw's reactor emits these on production. They stay as curated examples, same category as left4me-server@.service / left4me-web.service / the slices. (This corrects the framing in docs/superpowers/specs/2026-05-15-deploy-dir-rethink-design.md and item 2 of docs/superpowers/specs/2026-05-15-janitorial-cleanup.md, both of which lumped workshop-refresh together with truly-dead units.)
  • Stale __pycache__ dirs under deploy/files/usr/local/libexec/left4me/ are deleted by the moves in step 1.
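A small audit can confirm that the deletions in this step landed and that the workshop-refresh units were not removed by accident. This is a sketch under the assumption that it runs against a checkout of the repo root; the function name `audit_deletions` is hypothetical.

```python
from pathlib import Path

# Paths this step removes with `git rm`.
DEAD_PATHS = (
    "deploy/files/usr/local/libexec/left4me/left4me-apply-cake",
    "deploy/files/usr/local/lib/systemd/system/left4me-cake.service",
    "deploy/files/etc/left4me/cake.env",
    "deploy/files/usr/local/lib/systemd/system/left4me-nft-mark.service",
    "deploy/files/usr/local/lib/left4me/nft/left4me-mark.nft",
    "deploy/deploy-test-server.sh",
)

# Paths this step explicitly keeps as curated examples.
MUST_SURVIVE = (
    "deploy/files/usr/local/lib/systemd/system/left4me-workshop-refresh.service",
    "deploy/files/usr/local/lib/systemd/system/left4me-workshop-refresh.timer",
)


def audit_deletions(repo_root):
    """Return (lingering, wrongly_removed) relative paths for a checkout."""
    root = Path(repo_root)
    lingering = [p for p in DEAD_PATHS if (root / p).exists()]
    wrongly_removed = [p for p in MUST_SURVIVE if not (root / p).exists()]
    return lingering, wrongly_removed
```

Both lists should be empty once step 2 is complete.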

3. Split and relocate deploy/tests/test_deploy_artifacts.py

The current file (~880 lines) is doing four jobs. Split as follows; do not duplicate tests across files.

Concrete sequence to preserve git history where it counts:

  1. git mv deploy/tests/test_deploy_artifacts.py deploy/tests/test_example_units.py — single rename, history follows via git log --follow.
  2. In the renamed file, delete every test except the "Keep in deploy/tests/test_example_units.py" list below. The kept tests track the unit/sysctl/env-template examples, which is what deploy/tests/ will mean afterwards.
  3. Create new scripts/tests/*.py files (and conftest.py) by writing them fresh — pasting the relevant test functions across. The extracted tests lose direct rename history, but blame against the new files still resolves to the originals one git ref back; acceptable tradeoff.

Move to scripts/tests/ (tests of script behavior + the sudoers contract that gates the scripts):

  • scripts/tests/test_overlay.py: test_overlay_helper_is_python_with_strict_validation, test_overlay_helper_mount_is_idempotent_when_already_mounted
  • scripts/tests/test_script_sandbox.py: test_script_sandbox_helper_present, test_script_sandbox_helper_passes_shell_syntax_check, test_script_sandbox_helper_invokes_systemd_run_with_hardening, test_script_sandbox_uses_idmap_staging, test_script_sandbox_in_build_slice_with_oom_adjust, test_script_sandbox_helper_validates_overlay_id, test_script_sandbox_helper_dry_run_mode
  • scripts/tests/test_systemctl_helper.py: test_systemctl_helper_passes_shell_syntax_check_and_rejects_bad_args
  • scripts/tests/test_journalctl_helper.py: test_journalctl_helper_passes_shell_syntax_check_and_rejects_bad_args
  • scripts/tests/test_helpers_use_fixed_paths.py: test_helpers_use_fixed_system_tool_paths_not_sudo_path
  • scripts/tests/test_sudoers_grants.py: test_sudoers_allows_only_left4me_helpers_not_raw_system_tools (still reads deploy/files/etc/sudoers.d/left4me as the canonical example; comment why)

The ROOT/DEPLOY path-prefix constants in each file get rewritten so SCRIPTS = Path(__file__).resolve().parents[2] / "scripts" and helpers resolve to SCRIPTS / "libexec/left4me-overlay" etc. Shared helpers (_fake_command, _env_with_fake_commands) move into scripts/tests/conftest.py.
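As a rough illustration of the rewritten constants, the sketch below wraps the `parents[2]` expression in functions so that the resolution can be checked against any test-file location. The function names are hypothetical; in the real files these would more likely be module-level constants derived from `__file__`.

```python
from pathlib import Path


def repo_root(test_file):
    """scripts/tests/<file> -> parents[0]=tests, [1]=scripts, [2]=repo root."""
    return Path(test_file).resolve().parents[2]


def scripts_dir(test_file):
    """Equivalent of SCRIPTS = Path(__file__).resolve().parents[2] / 'scripts'."""
    return repo_root(test_file) / "scripts"


def helper_path(test_file, relative):
    """Resolve a helper such as 'libexec/left4me-overlay' under scripts/."""
    return scripts_dir(test_file) / relative


def sudoers_example(test_file):
    """test_sudoers_grants.py reads across the directory boundary into deploy/."""
    return repo_root(test_file) / "deploy" / "files" / "etc" / "sudoers.d" / "left4me"
```

In a test module the call would be `helper_path(__file__, "libexec/left4me-overlay")`. The sudoers lookup is the one place where a test under scripts/ intentionally reaches into deploy/.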

Keep in deploy/tests/test_example_units.py (locks down the curated examples; renamed from the current file):

  • test_global_unit_files_exist_at_product_level_paths
  • test_web_unit_contains_required_runtime_contract
  • test_server_unit_contains_required_runtime_contract
  • test_server_unit_mounts_overlay_via_exec_start_pre
  • test_server_unit_unmounts_overlay_via_exec_stop_post
  • test_server_unit_contains_perf_baseline_directives
  • test_l4d2_game_slice_exists_with_high_weights
  • test_l4d2_build_slice_exists_with_low_weights
  • test_sysctl_conf_present_with_perf_settings
  • test_env_templates_contain_required_defaults
  • test_sandbox_resolv_conf_exists

Add a top-of-file docstring: "These tests lock down the curated examples kept in deploy/files/ for reference. The production units are emitted by ckn-bw's reactor in bundles/left4me/metadata.py; when reactor output drifts intentionally, update the examples here too."

Delete entirely (target removed or no longer load-bearing):

  • All test_deploy_script_* tests (12 tests; deploy-test-server.sh is gone)
  • test_globals_refresh_units_removed — file already deleted; nothing to lock down
  • test_nft_mark_file_marks_left4me_udp_with_dscp_ef_and_priority, test_nft_mark_unit_loads_and_clears_left4me_table — nft-mark moved to central nftables bundle
  • test_cake_env_template_documents_required_knobs, test_apply_cake_helper_supports_apply_and_clear_modes, test_apply_cake_helper_passes_shell_syntax_check, test_cake_unit_runs_helper_in_apply_and_clear_modes — CAKE moved to systemd-networkd
  • test_deploy_script_installs_overlay_helper_with_executable_mode, test_deploy_script_installs_script_sandbox_helper — install responsibility now lives in ckn-bw's bundle, not in any left4me-side script

Final file count: scripts/tests/ gets 6 test files plus conftest.py, deploy/tests/test_example_units.py is one file, deploy/tests/test_deploy_artifacts.py is gone (renamed).

4. Rewrite deploy/README.md

Reframe the top of the file as: "This directory is a reference exemplar. The canonical deploy is ckn-bw's bundles/left4me/ (run bw apply ovh.left4me). Files under deploy/files/ and deploy/templates/ are readable examples — not the binaries / configs ckn-bw actually installs. Read them to understand the target layout if you're building a fresh deployment by other means."

Update the file/status table:

  • Drop rows for files that no longer exist (apply-cake, cake.service, cake.env, nft-mark.*). The workshop-refresh.* rows stay, because step 2 keeps those units as curated examples.
  • Drop the deploy-test-server.sh row.
  • For the privileged-scripts rows, replace the files/usr/local/libexec/left4me/... entry with "(moved to scripts/libexec/, installed by ckn-bw's install_left4me_scripts action)"; same for the sbin row.
  • Mark the remaining files/etc/... and files/usr/local/lib/systemd/system/... entries explicitly as example: ckn-bw ships its own verbatim copies of the configs, its reactor emits the units.

Keep the "Target Layout" / "Runtime User" / "Overlay References" / "Performance Tuning" sections — they're useful reference prose. Strip the "Running A Test Deployment" / "Admin Bootstrap" sections that refer to the deleted shell installer; replace with a one-paragraph pointer to ckn-bw.

5. ckn-bw cross-repo update

The install_left4me_scripts action in bundles/left4me/items.py currently reads from /opt/left4me/src/deploy/files/usr/local/{libexec,sbin}/. Update it to read from /opt/left4me/src/scripts/{libexec,sbin}/. The install target is unchanged (/usr/local/libexec/left4me/, /usr/local/sbin/left4me), so nothing on the deployed host moves.

This is a separate PR in the ckn-bw repo. It must land at the same time as the left4me move — the install action depends on the source paths existing. Coordination:

  1. Open both PRs simultaneously.
  2. Merge order: left4me first (scripts exist at the new path in /opt/left4me/src/ only after a fresh git_deploy), then ckn-bw, then bw apply ovh.left4me.
  3. Alternative: have the ckn-bw PR fall back to the old path if the new path doesn't exist (one extra glob); decide during ckn-bw review whether the complexity is worth the looser coupling. Default: no fallback, coordinate the merges.
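The fallback alternative in item 3 amounts to a small path-resolution rule. The sketch below shows only that rule, not bundlewrap code: the real install_left4me_scripts action is not reproduced in this plan, and `resolve_source_base` and its `allow_fallback` flag are hypothetical names. It assumes both `libexec/` and `sbin/` must be present under whichever base is chosen.

```python
from pathlib import Path


def resolve_source_base(src_root, allow_fallback=False):
    """Pick the directory that holds libexec/ and sbin/ for the install step.

    Default behaviour matches the plan's default: new path only, fail loudly.
    With allow_fallback=True the pre-move location is accepted as well.
    """
    root = Path(src_root)
    new_base = root / "scripts"
    old_base = root / "deploy" / "files" / "usr" / "local"

    def has_both(base):
        return (base / "libexec").is_dir() and (base / "sbin").is_dir()

    if has_both(new_base):
        return new_base
    if allow_fallback and has_both(old_base):
        return old_base
    raise FileNotFoundError(f"no script sources under {new_base}")
```

With the default (no fallback), a checkout that predates the move raises immediately, which is the loud failure the coordinated-merge approach relies on.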

Verification on the deploy target: after bw apply, the files under /usr/local/libexec/left4me/ and /usr/local/sbin/left4me should be byte-identical to before. Sudoers, services, the web app: all unchanged.
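One way to check the byte-identical claim is to hash the installed files before and after the apply and compare the two manifests. This is a minimal sketch; `sha256_manifest` and `changed_files` are hypothetical helper names, and on the target it would be pointed at /usr/local/libexec/left4me and /usr/local/sbin/left4me.

```python
import hashlib
from pathlib import Path


def sha256_manifest(paths):
    """Map every regular file under the given files/directories to its SHA-256."""
    manifest = {}
    for entry in map(Path, paths):
        if entry.is_file():
            files = [entry]
        elif entry.is_dir():
            files = sorted(f for f in entry.rglob("*") if f.is_file())
        else:
            files = []  # a missing path simply contributes nothing
        for f in files:
            manifest[str(f)] = hashlib.sha256(f.read_bytes()).hexdigest()
    return manifest


def changed_files(before, after):
    """Return paths that were added, removed, or whose content hash differs."""
    return sorted(
        path for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )
```

Capture one manifest before `bw apply` and one after; an empty `changed_files` result matches the expectation stated above.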

6. Mark adjacent specs / docs as resolved

  • docs/superpowers/specs/2026-05-15-deploy-dir-rethink-design.md: prepend a **Resolved 2026-05-15 by docs/superpowers/plans/…</plan-name>.md.** line at the top. Leave the body intact for archaeology.
  • docs/superpowers/specs/2026-05-15-janitorial-cleanup.md: cross out items 1, 5, 6 (now handled here). Item 2 needs a rewrite — the framing "all static unit files are obsolete drift" was wrong; the live reactor-emitted set (server@, web, workshop-refresh.{service,timer}, l4d2-{game,build}.slice) stays in deploy/files/ as curated examples. The truly-dead two (left4me-cake.service, left4me-nft-mark.service) are already deleted by this plan, so item 2 collapses to "no remaining work."
  • No memory file changes needed; the project state captured here is structural and re-derivable from deploy/README.md after the rewrite lands.

7. Rollback notes

If bw apply ovh.left4me against the test server breaks something after the cross-repo merge:

  1. Revert the ckn-bw install_left4me_scripts action change to the old source path (/opt/left4me/src/deploy/files/usr/local/{libexec,sbin}/). Re-apply.
  2. The left4me side never needs reverting in isolation: the scripts at the new path are byte-identical to the old ones. The failure to expect instead is a stale ckn-bw install action running against a new left4me checkout, which fails at install -t (source path missing). That failure is loud and safe: nothing on the deployed system gets modified.
  3. The only foot-gun is partial rollout: ckn-bw updated but left4me not yet checked out at the right revision. The git_deploy step pins the revision, so as long as the two PRs reference compatible commits, the deployed /opt/left4me/src/ always matches the action's expectation.

What does NOT change

  • Runtime install-target paths (/usr/local/libexec/left4me/..., /usr/local/sbin/left4me) — every reference inside l4d2host/service_control.py:7-8, l4d2web/services/overlay_builders.py:34, the sudoers file, and the systemd units stays the same.
  • The Python packages l4d2host/ and l4d2web/.
  • ckn-bw's bundles for sudoers / sysctl / sandbox-resolv.conf — those keep their own verbatim copies (the user picked "deploy/ keeps configs as examples; duplication-with-ckn-bw is OK because deploy/ is explicitly reference"). Janitoring the duplication is not in scope for this plan.
  • The Mako env templates in ckn-bw — they stay where they are, since they need bw's metadata access for rendering.
  • The recent overlay-idmap / script-sandbox idmap-staging work — untouched.

Critical files (jump points for the implementor)

  • deploy/tests/test_deploy_artifacts.py — the source for the test split (lines 20-32 are the path constants; tests grouped roughly by helper from line 138 onward)
  • deploy/README.md — full rewrite of the top section, partial rewrite of the table
  • l4d2host/service_control.py:7-8 — verify install-target paths unchanged (sanity)
  • l4d2web/services/overlay_builders.py:34 — same
  • deploy/files/etc/sudoers.d/left4me — sanity-check that no path inside changed
  • deploy/files/usr/local/lib/systemd/system/{left4me-server@.service,left4me-web.service,l4d2-{game,build}.slice} — survive as curated examples
  • ckn-bw repo: bundles/left4me/items.py — the install_left4me_scripts action (separate PR)

Verification

End-to-end:

  1. Source-tree consistency. find scripts deploy -type f | sort matches the layout in "End state" above (modulo __pycache__).
  2. All tests pass locally. From the repo root: pytest scripts/tests/ deploy/tests/ l4d2host/tests/ l4d2web/tests/ — every test passes. Specifically verify scripts/tests/test_sudoers_grants.py still reads deploy/files/etc/sudoers.d/left4me correctly (path constant points across the dir boundary).
  3. Shell syntax checks. The split tests should still run sh -n / bash -n against the moved scripts; no script edits means no syntax regressions, but the test paths must resolve.
  4. No accidental application breakage. grep -rn '/usr/local/libexec/left4me\|/usr/local/sbin/left4me' l4d2host l4d2web returns the same hits as before (paths are install-target, source moves don't affect them).
  5. ckn-bw dry-run. Once the ckn-bw PR is up, bw apply --dry-run ovh.left4me from the ckn-bw repo: the diff should show no changes to files under /usr/local/libexec/left4me/ or /usr/local/sbin/left4me (byte-identical content via the new path).
  6. Production apply. bw apply ovh.left4me against the real test server. After apply: systemctl status left4me-web.service is green, starting a game server via the web UI still works (overlay mount → srcds_run → unmount on stop), running an overlay build script through the sandbox still works.
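Check 4 compares grep hits before and after the move. A Python equivalent is sketched below, assuming it is run from a checkout before and after the change and the two results are compared. The function name `install_target_hits` is hypothetical; unlike plain `grep -rn` it skips `__pycache__` directories, which is a choice of this sketch rather than something the plan specifies.

```python
import re
from pathlib import Path

# Same two install-target prefixes as the grep pattern in check 4.
INSTALL_TARGET = re.compile(r"/usr/local/libexec/left4me|/usr/local/sbin/left4me")


def install_target_hits(repo_root, packages=("l4d2host", "l4d2web")):
    """Return sorted (relative path, line number, stripped line) for each hit."""
    root = Path(repo_root)
    hits = []
    for package in packages:
        base = root / package
        if not base.is_dir():
            continue
        for path in sorted(base.rglob("*")):
            if not path.is_file() or "__pycache__" in path.parts:
                continue
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if INSTALL_TARGET.search(line):
                    hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits
```

Because only source-tree paths move, the list of hits should be identical on both sides of the change.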

Out of scope (handled elsewhere or deferred)

  • The Mako template duplication in ckn-bw — separate cleanup; the templates legitimately need bw's metadata access.
  • The 1/2/3-user uid-split decision — docs/superpowers/specs/2026-05-15-user-uid-split-design.md.
  • The script-sandbox → systemd template unit refactor — docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md.
  • Remaining janitorial items: item 3 (bubblewrap→systemd-run doc drift), item 4 (stale gameserver-side idmap binds), calendar reminder for SM 1.13 stable. Items 1, 2 (partial — see step 6), 5, 6 are subsumed here.
  • Rewriting the shell helpers in Python / packaging them as console_scripts — explicitly rejected in the recent script-consolidation plan (egg-info + TOCTOU privilege concerns).
  • Historical references inside docs/superpowers/plans/* and docs/superpowers/specs/* to deploy/files/... or deploy-test-server.sh paths. Those are time-stamped snapshots of past sessions; they don't get rewritten when the underlying tree moves.