From c91c029c38a6d3d661051473eb5bf09c4bbd9bf1 Mon Sep 17 00:00:00 2001
From: mwiegand
Date: Sat, 9 May 2026 11:03:37 +0200
Subject: [PATCH] =?UTF-8?q?docs(plans):=20l4d2=20cpu=20isolation=20?=
 =?UTF-8?q?=E2=80=94=20implementation=20plan?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two TDD tasks: deploy-script cpuset block + tests, README "CPU
isolation" subsection. Operator-side smoke test in F.3.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../plans/2026-05-09-l4d2-cpu-isolation.md    | 260 ++++++++++++++++++
 1 file changed, 260 insertions(+)
 create mode 100644 docs/superpowers/plans/2026-05-09-l4d2-cpu-isolation.md

diff --git a/docs/superpowers/plans/2026-05-09-l4d2-cpu-isolation.md b/docs/superpowers/plans/2026-05-09-l4d2-cpu-isolation.md
new file mode 100644
index 0000000..7ffc0e0
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-09-l4d2-cpu-isolation.md
@@ -0,0 +1,260 @@
+# L4D2 CPU Isolation Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Constrain every cgroup that isn't a live game server to core 0; give game servers cores 1..N-1 exclusively, scaled automatically across host sizes.
+
+**Architecture:** Four `99-left4me-cpuset.conf` drop-ins under `/etc/systemd/system/{system,user,l4d2-build,l4d2-game}.slice.d/`, written by the deploy script with `printf` piped into `install`. `LEFT4ME_SYSTEM_CPUS` (default `0`) and `LEFT4ME_GAME_CPUS` (default `1-$((NPROC-1))`) are env-var overrides. Single-core hosts skip the cpuset writes with a stderr warning unless an override is set.
+
+**Tech Stack:** systemd cgroup-v2 `AllowedCPUs=` directive, shell `printf` + `install`, Linux `nproc(1)`, pytest text-assertion tests.
+
+**Spec:** `docs/superpowers/specs/2026-05-09-l4d2-cpu-isolation-design.md`
+
+---
+
+## File Structure
+
+Files to modify:
+
+- `deploy/deploy-test-server.sh`: compute `NPROC`, default `LEFT4ME_SYSTEM_CPUS=0` / `LEFT4ME_GAME_CPUS=1-$((NPROC-1))`, write four drop-in files. Skip when `nproc < 2` (with stderr warning) unless either env var is set explicitly.
+- `deploy/README.md`: add a "CPU isolation" subsection inside the existing "Performance Tuning" section.
+- `deploy/tests/test_deploy_artifacts.py`: one new test function.
+
+No host library or web app changes.
+
+---
+
+## Pre-flight
+
+- [ ] **Step 0a: Verify clean working tree**
+
+Run: `git status`
+Expected: `nothing to commit, working tree clean`
+
+- [ ] **Step 0b: Verify the existing deploy tests are at the known-good baseline**
+
+Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py -q`
+Expected: 35 passed, 1 failed (the pre-existing unrelated `test_deploy_script_has_safe_defaults_and_preserves_state`).
+
+If the counts differ, stop and report the discrepancy: this plan assumes that exact baseline.
+
+---
+
+## Task 1: Deploy-script CPU-isolation block + tests
+
+Write the four drop-ins from the deploy script in one cohesive block. The block computes `NPROC` once, resolves both env vars (with defaults), guards single-core hosts, and writes each drop-in via the existing `install -m 0644 -o root -g root` pattern. Tests cover defaults, overrides, single-core skip, and drop-in paths.
+
+**Files:**
+- Modify: `deploy/deploy-test-server.sh`
+- Modify: `deploy/tests/test_deploy_artifacts.py` (new test function)
+
+- [ ] **Step 1.1: Add the failing test**
+
+Open `deploy/tests/test_deploy_artifacts.py` and append (after the `test_deploy_script_installs_perf_artifacts` from the perf-baseline branch):
+
+```python
+def test_deploy_script_writes_cpuset_drop_ins():
+    script = DEPLOY_SCRIPT.read_text()
+
+    # Reads nproc and binds defaults via ${VAR:-...}.
+    assert "nproc" in script
+    assert "LEFT4ME_SYSTEM_CPUS" in script
+    assert "LEFT4ME_GAME_CPUS" in script
+    assert "${LEFT4ME_SYSTEM_CPUS:-0}" in script
+    # Default game-core expression: 1-(nproc-1). Match the form the
+    # implementer chose; `1-$((NPROC-1))` (with or without spaces inside
+    # the arithmetic) and `1-$((nproc-1))` are all acceptable as long as
+    # the upper bound is computed from nproc.
+    assert ("1-$((NPROC-1))" in script) or ("1-$((NPROC - 1))" in script) \
+        or ("1-$((nproc-1))" in script) or ("LEFT4ME_GAME_CPUS:-1-" in script)
+
+    # All four drop-in paths. Accept either a literal path or the
+    # `${slice_name}` template plus the slice named in the `for` list.
+    import re  # local import; the module may not import re at top level
+    templated = "/etc/systemd/system/${slice_name}.slice.d/99-left4me-cpuset.conf"
+    for slice_name in ("system", "user", "l4d2-build", "l4d2-game"):
+        literal = f"/etc/systemd/system/{slice_name}.slice.d/99-left4me-cpuset.conf"
+        in_loop = re.search(
+            rf"for slice_name in [^;\n]*\b{re.escape(slice_name)}\b", script
+        )
+        assert literal in script or (templated in script and in_loop)
+
+    # Drop-ins use the existing install pattern.
+    assert "install -m 0644 -o root -g root" in script
+
+    # Single-core host: skip with a warning to stderr.
+    # Match either an explicit `nproc < 2` / `-lt 2` guard or `[ "$nproc" -ge 2 ]` form.
+    assert ("nproc" in script) and (("-lt 2" in script) or ("-ge 2" in script) or ("< 2" in script))
+    # Compare in lowercase on both sides; a mixed-case needle can never
+    # match `script.lower()`.
+    assert "skipping cpu isolation" in script.lower() or "skip cpu isolation" in script.lower()
+```
+
+- [ ] **Step 1.2: Run the new test, verify it fails**
+
+Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_deploy_script_writes_cpuset_drop_ins -v`
+Expected: FAIL, because the new strings do not exist in the script yet.
+
+- [ ] **Step 1.3: Edit the deploy script: add the cpuset block**
+
+Open `deploy/deploy-test-server.sh`. Find the block that copies the slice files (added in the perf-baseline branch, around lines 139–140):
+
+```sh
+$sudo_cmd cp /opt/left4me/deploy/files/usr/local/lib/systemd/system/l4d2-game.slice /usr/local/lib/systemd/system/l4d2-game.slice
+$sudo_cmd cp /opt/left4me/deploy/files/usr/local/lib/systemd/system/l4d2-build.slice /usr/local/lib/systemd/system/l4d2-build.slice
+```
+
+Immediately after that pair, before any of the helper-script copies that follow, insert this block:
+
+```sh
+# CPU isolation via cgroup-v2 AllowedCPUs= drop-ins.
+# Pin everything that isn't a live game server to core 0; give game servers cores 1..N-1.
+# See docs/superpowers/specs/2026-05-09-l4d2-cpu-isolation-design.md.
+NPROC=$(nproc)
+SYSTEM_CPUS=${LEFT4ME_SYSTEM_CPUS:-0}
+if [ "${LEFT4ME_GAME_CPUS+x}" = x ]; then
+    GAME_CPUS=$LEFT4ME_GAME_CPUS
+else
+    GAME_CPUS="1-$((NPROC-1))"
+fi
+if [ "$NPROC" -lt 2 ] && [ -z "${LEFT4ME_SYSTEM_CPUS+x}${LEFT4ME_GAME_CPUS+x}" ]; then
+    printf 'left4me deploy: skipping CPU isolation (nproc=%s); cpuset drop-ins not written.\n' "$NPROC" >&2
+else
+    for slice_name in system user l4d2-build; do
+        $sudo_cmd mkdir -p "/etc/systemd/system/${slice_name}.slice.d"
+        printf '[Slice]\nAllowedCPUs=%s\n' "$SYSTEM_CPUS" \
+            | $sudo_cmd install -m 0644 -o root -g root /dev/stdin \
+                "/etc/systemd/system/${slice_name}.slice.d/99-left4me-cpuset.conf"
+    done
+    $sudo_cmd mkdir -p "/etc/systemd/system/l4d2-game.slice.d"
+    printf '[Slice]\nAllowedCPUs=%s\n' "$GAME_CPUS" \
+        | $sudo_cmd install -m 0644 -o root -g root /dev/stdin \
+            "/etc/systemd/system/l4d2-game.slice.d/99-left4me-cpuset.conf"
+fi
+```
+
+Notes for the implementer:
+
+- The single-core skip only triggers when **neither** override is set. If the operator sets either `LEFT4ME_SYSTEM_CPUS` or `LEFT4ME_GAME_CPUS` explicitly on a single-core host, honor their intent.
+- Edge case: on a single-core host with only `LEFT4ME_SYSTEM_CPUS` set, the default `GAME_CPUS` evaluates to `1-0`, an inverted range. Operators who force isolation on such a host should set `LEFT4ME_GAME_CPUS` as well.
+- `install -m 0644 -o root -g root /dev/stdin` followed by the destination path is the idiomatic way to install a small generated file from a pipeline (it matches the existing pattern for sandbox-resolv.conf, just with `/dev/stdin` as the source).
+- The `mkdir -p` for each `.d` directory is required because `install` is called without `-D` and therefore does not create missing parent directories.
+
+- [ ] **Step 1.4: Verify shell syntax still parses**
+
+Run: `sh -n deploy/deploy-test-server.sh`
+Expected: exit 0, no output.
+
+- [ ] **Step 1.5: Run the new test and full deploy test suite**
+
+Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py -q`
+Expected: 36 passed, 1 failed. The failure is the pre-existing unrelated test; the pass count rises from 35 to 36 because of the new test.
+
+If your specific assertion forms in Step 1.1 don't match the implementation, adjust the test, but only the `or` branches; do not weaken the contract.
+
+- [ ] **Step 1.6: Commit**
+
+```bash
+git add deploy/deploy-test-server.sh deploy/tests/test_deploy_artifacts.py
+git commit -m "$(cat <<'EOF'
+feat(deploy): cgroup-v2 cpuset drop-ins pin system to core 0, game to rest
+
+Computes NPROC at deploy time. Defaults LEFT4ME_SYSTEM_CPUS=0 and
+LEFT4ME_GAME_CPUS=1-(NPROC-1). Single-core hosts skip cpuset writes
+with a stderr warning unless an env var override is set. Spec:
+docs/superpowers/specs/2026-05-09-l4d2-cpu-isolation-design.md
+EOF
+)"
+```
+
+---
+
+## Task 2: README "CPU isolation" subsection
+
+Add a subsection to `deploy/README.md` inside the existing "Performance Tuning" section, documenting the layout, the env-var overrides, the single-core skip, and the relationship to the existing per-instance `CPUAffinity=` escape hatch.
+
+**Files:**
+- Modify: `deploy/README.md`
+
+No test for this task: README content is documentation, not contract.
+
+- [ ] **Step 2.1: Insert the CPU isolation subsection**
+
+Open `deploy/README.md`. Find the existing `### Per-instance CPU affinity` subsection (added in the perf-baseline branch). Insert a new subsection **immediately before** it (so the slice-level isolation is documented before the per-instance refinement that builds on top). The new subsection content:
+
+```markdown
+### CPU isolation (cores)
+
+The deploy script writes four `AllowedCPUs=` drop-ins so that, by default, only `l4d2-game.slice` is allowed to run on cores 1..N-1; `system.slice`, `user.slice`, and `l4d2-build.slice` are pinned to core 0.
+Game servers thus get every core except core 0 to themselves, the build sandbox and the web app stay on core 0, and a logged-in admin running CPU-heavy work in their shell can't steal cycles from a live match.
+
+Override the split by setting one or both env vars when running the deploy. The two values are resolved independently, so set both when moving the boundary between system and game cores:
+
+```sh
+LEFT4ME_SYSTEM_CPUS="0,1" LEFT4ME_GAME_CPUS="2-7" deploy/deploy-test-server.sh deploy-user@host
+```
+
+On single-core hosts the deploy skips the cpuset drop-ins entirely and prints a warning to stderr; the rest of the perf baseline (cgroup weights, sysctls, OOM scores) still applies. To force isolation on a single-core host anyway (rarely useful), set either env var explicitly.
+
+Per-instance `CPUAffinity=` (next subsection) composes on top of this: the per-instance value must be a subset of `l4d2-game.slice`'s `AllowedCPUs=`, because the kernel will not run a process on CPUs outside its cgroup's cpuset.
+```
+
+(The outer triple-backticks above delimit the quoted README content within this plan and are not part of the README. The inner code-block fences DO need to be written into the README. The `markdown` language tag on the outer fence is documentation-only.)
+
+- [ ] **Step 2.2: Run the full deploy test suite**
+
+Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py -q`
+Expected: 36 passed, 1 failed (unchanged; README has no test).
+
+- [ ] **Step 2.3: Commit**
+
+```bash
+git add deploy/README.md
+git commit -m "$(cat <<'EOF'
+docs(deploy): document CPU isolation in performance-tuning section
+
+Explains the core-0-vs-game-cores split, the LEFT4ME_SYSTEM_CPUS /
+LEFT4ME_GAME_CPUS overrides, the single-core skip, and the
+subset-of relationship with per-instance CPUAffinity=.
+EOF
+)"
+```
+
+---
+
+## Final Verification
+
+- [ ] **Step F.1: Full deploy + host + web test sweep**
+
+Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/ l4d2host/tests l4d2web/tests -q`
+Expected: one combined summary of 460 passed, 1 failed, 2 skipped (deploy 36 passed / 1 failed, pre-existing; host 111 passed / 1 skipped; web 313 passed / 1 skipped).
+
+- [ ] **Step F.2: Working tree clean and commits in order**
+
+Run: `git status && git log --oneline -5`
+Expected:
+- `git status`: clean.
+- Top of `git log`:
+  1. `docs(deploy): document CPU isolation in performance-tuning section`
+  2. `feat(deploy): cgroup-v2 cpuset drop-ins pin system to core 0, game to rest`
+  3. `docs(plans): l4d2 cpu isolation — implementation plan`
+  4. `docs(specs): l4d2 cpu isolation — design`
+
+- [ ] **Step F.3: Operator-side smoke test (deferred, not part of this plan)**
+
+This plan ships artifacts. Confirming that systemd actually enforces `AllowedCPUs=` on a real Debian Trixie host is an operator-side task:
+
+```sh
+deploy/deploy-test-server.sh deploy-user@example-host
+ssh deploy-user@example-host '
+  systemctl cat system.slice | grep AllowedCPUs
+  systemctl cat l4d2-game.slice | grep AllowedCPUs
+  cat /sys/fs/cgroup/system.slice/cpuset.cpus.effective
+  cat /sys/fs/cgroup/l4d2-game.slice/cpuset.cpus.effective
+'
+# Expect on an 8-core box:
+#   system.slice    → AllowedCPUs=0   → cpuset.cpus.effective = 0
+#   l4d2-game.slice → AllowedCPUs=1-7 → cpuset.cpus.effective = 1-7
+```
+
+End-to-end behavioural test (manual, ops-side): on a 4-core host, run two L4D2 instances and a script-sandbox build simultaneously. Confirm via `htop` (with the PROCESSOR column enabled, which shows the CPU each process last ran on) that the srcds processes only ever appear on cores 1, 2, and 3 while the sandbox and web app stay on core 0.
+
+---
+
+## Out of Scope (do NOT implement here)
+
+- Kernel `isolcpus=` / `nohz_full=` / `rcu_nocbs=` boot params.
+- NIC IRQ pinning automation.
+- Per-instance `CPUAffinity=` driven by a deploy-env knob.
+- A separate `l4d2-web.slice`.
+- Any web-app or host-library code changes.
+
+If you find yourself touching any of these, stop; they belong in a separate spec.
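
Reviewer sketch, outside the patch: the default and override rules from the Step 1.3 shell block can be checked without a host by restating them in Python. This is a minimal illustration only; the function name `resolve_cpusets` and its return shape are assumptions made for this note and do not exist in the deploy script or its tests.

```python
def resolve_cpusets(nproc, env):
    """Hypothetical mirror of the Step 1.3 shell logic.

    Returns (system_cpus, game_cpus, skip), where skip=True means the
    deploy would print the warning and write no drop-ins.
    """
    # ${LEFT4ME_SYSTEM_CPUS:-0}: unset OR empty falls back to "0".
    system_cpus = env.get("LEFT4ME_SYSTEM_CPUS") or "0"

    # ${LEFT4ME_GAME_CPUS+x}: any set value, even an empty one, is an override.
    if "LEFT4ME_GAME_CPUS" in env:
        game_cpus = env["LEFT4ME_GAME_CPUS"]
    else:
        game_cpus = f"1-{nproc - 1}"

    # The single-core skip applies only when neither variable is set.
    overridden = "LEFT4ME_SYSTEM_CPUS" in env or "LEFT4ME_GAME_CPUS" in env
    skip = nproc < 2 and not overridden
    return system_cpus, game_cpus, skip


if __name__ == "__main__":
    cases = [
        (8, {}),
        (8, {"LEFT4ME_SYSTEM_CPUS": "0,1", "LEFT4ME_GAME_CPUS": "2-7"}),
        (1, {}),
        (1, {"LEFT4ME_SYSTEM_CPUS": "0"}),
    ]
    for nproc, env in cases:
        print(nproc, env, resolve_cpusets(nproc, env))
```

The last case is the one worth a second look: with one core and only `LEFT4ME_SYSTEM_CPUS` set, the skip is bypassed while the game range still comes from the default expression.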