docs(specs): l4d2 cpu pinning — decision record (deferred)

Investigated whether to hard-pin each srcds instance to a single core within the existing AllowedCPUs=1-7 set. Modern kernels (5.13+) no longer expose kernel.sched_migration_cost_ns or the other classic CFS "laziness" tunables as sysctls, so a global cheap fix is unavailable.

Decision for now: trust CFS + Nice=-5 + AllowedCPUs=1-7. Per-instance CPUAffinity= remains an opt-in escape hatch in deploy/README.md. The record documents the revisit triggers and the preferred implementation path when the time comes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 12:41:40 +02:00 · 2026-05-09 12:41:40 +02:00 · b62fc08127
commit b62fc08127
parent 67b5521eb6
1 changed files with 83 additions and 0 deletions
--- a/docs/superpowers/specs/2026-05-09-l4d2-cpu-pinning-decision.md
+++ b/docs/superpowers/specs/2026-05-09-l4d2-cpu-pinning-decision.md
@@ -0,0 +1,83 @@
+# l4d2 cpu pinning — decision record (deferred)
+
+Date: 2026-05-09
+Status: decision (no implementation)
+
+## Question
+
+After the lifecycle + drift fix landed (commits `8552c55`, `67b5521`), the
+question came up: with `AllowedCPUs=1-7` already constraining game servers
+to cores 1–7, do CFS scheduler migrations *within* that range still cause
+meaningful jitter? Should we hard-pin each instance to a single core?
+
+## Investigation
+
+The classic "lazy CFS" sysctl knob is **gone** on modern kernels. Verified
+on Trixie's running kernel 6.12 (`ckn@10.0.4.128`):
+
+```
+/sbin/sysctl -a | grep -E "sched_migration_cost|sched_min_granularity|sched_wakeup_granularity|sched_latency"
+# (no output)
+```
+
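+`kernel.sched_migration_cost_ns` and the other classic CFS tunables stopped
+being sysctls in 5.13, when they moved to debugfs under
+`/sys/kernel/debug/sched/`; the latency and granularity knobs were further
+reworked when EEVDF landed (6.6). The `kernel.sched_*` sysctls that remain
+(for example `sched_rt_period_us` / `sched_rt_runtime_us`) do not control
+migration behavior. debugfs is a debugging interface with no stability
+guarantee, so there is no supported global "be lazy about migrations" knob
+anymore.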
+
+### Available paths
+
+| Option | Cost | Strictness | Pays off when |
+|---|---|---|---|
+| Trust CFS + `Nice=-5` + `AllowedCPUs=1-7` (current) | None | Soft | ≤ 3 instances on 7 cores; CFS rarely migrates active CPU-bound nice<0 tasks |
+| Per-instance `CPUAffinity=N` drop-in | Web-app machinery to write drop-ins, daemon-reload, modulo or DB-persisted assignment | Strict | ≥ 4 instances (each gets exclusive core), or measured jitter |
+| `isolcpus=1-7 nohz_full=1-7 rcu_nocbs=1-7` kernel cmdline | GRUB edit + reboot, host-specific | Strongest (also evicts kernel softirqs/RCU/timer ticks from game cores) | Tickrate-128 with measurable kernel-induced jitter |
+| `SCHED_FIFO` per unit | Risky (RT misconfig can stall kernel) | Strict | Already documented as ops-side escape hatch in `deploy/README.md` |
+
+### Why deferring is defensible
+
+- The slice's `AllowedCPUs=1-7` already prevents game servers from running on core 0. The open question is "do they migrate within 1–7?" — yes, CFS can migrate, but for long-running CPU-bound `srcds` with `Nice=-5`, migrations are infrequent. CFS prefers cache locality and only migrates when an idle core "steals" or a periodic load-balance tick detects imbalance.
+- With ≤ 3 instances on 7 game cores, the load balancer rarely sees imbalance to fix.
+- Per-instance hard pinning adds non-trivial machinery (drop-in writer through `left4me-systemctl`, or extending `instance.env` + a `taskset` wrapper in the unit). Not warranted unless we observe a real problem.
+- `deploy/README.md` already documents the `CPUAffinity=N` per-instance drop-in as an opt-in escape hatch. An operator who measures jitter can apply it without code changes.
+
+## Decision
+
+**No code change.** Keep the current setup:
+
+- Slice-level `AllowedCPUs=1-7` ensures game servers never touch core 0.
+- `Nice=-5` keeps active srcds tasks weighted heavily so CFS prefers leaving them alone.
+- The `CPUAffinity=N` per-instance drop-in remains the documented escape hatch.
+
+## Revisit triggers
+
+If any of these signals appears, design and implement strict per-instance pinning:
+
+- ≥ 4 game-server instances running simultaneously on one host.
+- A specific server reports tickrate dips / rubber-banding correlated with another instance starting or a build sandbox firing.
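+- `perf stat -e cpu-migrations -p <srcds-pid>` shows > 1 migration/sec under load. (The `sched:sched_migrate_task` tracepoint fires in the context of whichever task performs the migration, so attaching it with `-p` can undercount; the `cpu-migrations` software counter follows the target task.)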
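+
+A minimal sketch of how the migration-rate trigger could be sampled without `perf`. It assumes `/proc/<pid>/sched` is available (this depends on the kernel's scheduler debug support) and that it contains a `se.nr_migrations` line in `name : value` form. The helper names are hypothetical and not part of this repository.
+
+```python
+import re
+import time
+
+# Assumed line shape in /proc/<pid>/sched: "se.nr_migrations   :   42"
+_NR_MIGRATIONS = re.compile(r"^se\.nr_migrations\s*:\s*(\d+)\s*$", re.MULTILINE)
+
+
+def parse_nr_migrations(sched_text: str) -> int:
+    """Extract the cumulative migration counter from /proc/<pid>/sched text."""
+    match = _NR_MIGRATIONS.search(sched_text)
+    if match is None:
+        raise ValueError("se.nr_migrations not found; kernel may not expose it")
+    return int(match.group(1))
+
+
+def migrations_per_second(before: int, after: int, elapsed_s: float) -> float:
+    """Average migration rate between two counter samples."""
+    if elapsed_s <= 0:
+        raise ValueError("elapsed_s must be positive")
+    return (after - before) / elapsed_s
+
+
+def sample_rate(pid: int, window_s: float = 10.0) -> float:
+    """Read the counter twice, window_s apart, and return migrations/sec."""
+    path = f"/proc/{pid}/sched"
+    with open(path) as f:
+        before = parse_nr_migrations(f.read())
+    time.sleep(window_s)
+    with open(path) as f:
+        after = parse_nr_migrations(f.read())
+    return migrations_per_second(before, after, window_s)
+```
+
+`/proc/<pid>/sched` describes only the main thread of the process; other srcds threads would need `/proc/<pid>/task/<tid>/sched`. A result above 1.0 from `sample_rate` under load would correspond to the trigger in the list above.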
+
+When revisiting, two implementation paths to choose from:
+
+1. **Modulo assignment in the host library.** Read `LEFT4ME_GAME_CPUS` (or parse the slice's `AllowedCPUs=` drop-in), pick `game_cpus[(int(name) - 1) % len(game_cpus)]`, write `L4D2_CPU=N` into `instance.env`, wrap the unit's `ExecStart` with `taskset -c ${L4D2_CPU}`. Stateless, deterministic, no DB column. **Preferred.**
+2. **Persisted assignment.** Add `Server.cpu_pin` column, web app picks at initialize time and stores. Survives `LEFT4ME_GAME_CPUS` changes (each server keeps its assigned core). Bigger ripple.
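+
+The modulo assignment in option 1 can be sketched as follows. This is an illustration under stated assumptions, not the host library's code: `parse_cpu_list` and `pick_cpu` are hypothetical names, instance names are assumed to be positive integers in string form (as `int(name)` in option 1 implies), and the CPU list is assumed to use the `1-7` / `1,3,5-6` range syntax.
+
+```python
+def parse_cpu_list(spec: str) -> list[int]:
+    """Expand a CPU list such as "1-7" or "1,3,5-6" into sorted core numbers."""
+    cpus: set[int] = set()
+    for part in spec.replace(",", " ").split():
+        if "-" in part:
+            lo, hi = part.split("-", 1)
+            cpus.update(range(int(lo), int(hi) + 1))
+        else:
+            cpus.add(int(part))
+    if not cpus:
+        raise ValueError(f"empty CPU list: {spec!r}")
+    return sorted(cpus)
+
+
+def pick_cpu(name: str, game_cpus: list[int]) -> int:
+    """Map a numeric instance name ("1", "2", ...) onto a core, wrapping around."""
+    return game_cpus[(int(name) - 1) % len(game_cpus)]
+```
+
+With `game_cpus = parse_cpu_list("1-7")`, instances 1 through 7 each land on a distinct core, while instance 8 wraps around and shares core 1 with instance 1. That sharing is the trade-off of the stateless approach once the instance count exceeds the number of game cores.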
+
+## Verification (no-op confirmation)
+
+```sh
+ssh ckn@10.0.4.128 'systemctl show l4d2-game.slice -p AllowedCPUs'
+# expect: AllowedCPUs=1-7
+
+ssh ckn@10.0.4.128 'cat /sys/fs/cgroup/system.slice/cpuset.cpus.effective'
+# expect: 0   (everything-not-game still pinned to core 0)
+
+# When ≥ 1 server is running:
+ssh ckn@10.0.4.128 'for p in $(pgrep srcds); do grep ^Cpus_allowed_list /proc/$p/status; done'
+# expect: 1-7   (the allowed set; the scheduler places the task on any core within it)
+```
+
+## References
+
+- `docs/superpowers/specs/2026-05-09-l4d2-cpu-isolation-design.md` — sibling design that introduced the `AllowedCPUs=1-7` slice constraint this record builds on.
+- `deploy/README.md` "Performance Tuning" section — the `CPUAffinity=N` per-instance escape hatch.
+- Linux kernel changelog 5.13+ — removal of classic CFS tunable sysctls.