left4me/docs/superpowers/specs/2026-05-09-l4d2-cpu-pinning-decision.md
mwiegand b62fc08127
docs(specs): l4d2 cpu pinning — decision record (deferred)
Investigated whether to hard-pin each srcds instance to a single core
within the existing AllowedCPUs=1-7 set. Modern kernels (5.13+) no
longer expose kernel.sched_migration_cost_ns or the other classic CFS
"laziness" tunables, so a global cheap-fix is unavailable. Decision
for now: trust CFS + Nice=-5 + AllowedCPUs=1-7. Per-instance
CPUAffinity= remains an opt-in escape hatch in deploy/README.md.
Documents the revisit triggers and the preferred implementation path
when the time comes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 12:41:40 +02:00

# l4d2 cpu pinning — decision record (deferred)
Date: 2026-05-09
Status: decision (no implementation)
## Question
After the lifecycle + drift fix landed (commits `8552c55`, `67b5521`), the
question came up: with `AllowedCPUs=1-7` already constraining game servers
to cores 1-7, do CFS scheduler migrations *within* that range still cause
meaningful jitter? Should we hard-pin each instance to a single core?
## Investigation
The classic "lazy CFS" sysctl knob is **gone** on modern kernels. Verified
on Trixie's running kernel 6.12 (`ckn@10.0.4.128`):
```
/sbin/sysctl -a | grep -E "sched_migration_cost|sched_min_granularity|sched_wakeup_granularity|sched_latency"
# (no output)
```
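`kernel.sched_migration_cost_ns` and the other classic CFS tunables were
dropped from the sysctl interface in 5.13, when the scheduler debug knobs
moved to debugfs (`/sys/kernel/debug/sched/`); the later switch to EEVDF (6.6)
removed or renamed several of them. The `kernel.sched_rt_period_us` /
`kernel.sched_rt_runtime_us` pair remains, but it only governs real-time
throttling. There is no supported global sysctl for "be lazy about
migrations" anymore.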
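The sysctl check can be extended to also look in debugfs, where kernels since 5.13 expose the scheduler debug tunables that remain. The following is a minimal sketch, not part of the project: `locate_sched_tunables` is a hypothetical helper, and it assumes debugfs is mounted at `/sys/kernel/debug` and that the caller can read it (normally root only). An unreadable directory is reported the same way as a missing file.

```python
import os

# The classic CFS tunables the sysctl grep above looked for.
CLASSIC_TUNABLES = (
    "sched_migration_cost_ns",
    "sched_min_granularity_ns",
    "sched_wakeup_granularity_ns",
    "sched_latency_ns",
)


def locate_sched_tunables(root="/"):
    """Map each classic tunable to "sysctl", "debugfs", or None.

    `root` exists only so the lookup can be pointed at a test tree.
    """
    sysctl_dir = os.path.join(root, "proc/sys/kernel")
    # Assumption: debugfs mounted at the usual path; its file names
    # drop the "sched_" prefix (e.g. migration_cost_ns).
    debugfs_dir = os.path.join(root, "sys/kernel/debug/sched")
    found = {}
    for name in CLASSIC_TUNABLES:
        if os.path.exists(os.path.join(sysctl_dir, name)):
            found[name] = "sysctl"
        elif os.path.exists(os.path.join(debugfs_dir, name.removeprefix("sched_"))):
            found[name] = "debugfs"
        else:
            # Missing, or not readable by the current user.
            found[name] = None
    return found
```

Calling `locate_sched_tunables()` as root on the target host would show which of the four names, if any, the running kernel still offers. A debugfs knob is a debugging interface rather than a stable one, so it does not change the conclusion that no supported global setting exists.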
### Available paths
| Option | Cost | Strictness | Pays off when |
|---|---|---|---|
| Trust CFS + `Nice=-5` + `AllowedCPUs=1-7` (current) | None | Soft | ≤ 3 instances on 7 cores; CFS rarely migrates active CPU-bound nice<0 tasks |
| Per-instance `CPUAffinity=N` drop-in | Web-app machinery to write drop-ins, daemon-reload, modulo or DB-persisted assignment | Strict | 4 instances (each gets exclusive core), or measured jitter |
| `isolcpus=1-7 nohz_full=1-7 rcu_nocbs=1-7` kernel cmdline | GRUB edit + reboot, host-specific | Strongest (also evicts kernel softirqs/RCU/timer ticks from game cores) | Tickrate-128 with measurable kernel-induced jitter |
| `SCHED_FIFO` per unit | Risky (RT misconfig can stall kernel) | Strict | Already documented as ops-side escape hatch in `deploy/README.md` |
### Why deferring is defensible
- The slice's `AllowedCPUs=1-7` already prevents game servers from running on core 0. The open question is "do they migrate within 1-7?" Yes, CFS can migrate them, but for long-running CPU-bound `srcds` with `Nice=-5`, migrations are infrequent. CFS prefers cache locality and only migrates when an idle core "steals" work or a periodic load-balance tick detects imbalance.
- With 3 instances on 7 game cores, the load balancer rarely sees imbalance to fix.
- Per-instance hard pinning adds non-trivial machinery (drop-in writer through `left4me-systemctl`, or extending `instance.env` + a `taskset` wrapper in the unit). Not warranted unless we observe a real problem.
- `deploy/README.md` already documents the `CPUAffinity=N` per-instance drop-in as an opt-in escape hatch. An operator who measures jitter can apply it without code changes.
## Decision
**No code change.** Keep the current setup:
- Slice-level `AllowedCPUs=1-7` ensures game servers never touch core 0.
- `Nice=-5` keeps active srcds tasks weighted heavily so CFS prefers leaving them alone.
- The `CPUAffinity=N` per-instance drop-in remains the documented escape hatch.
## Revisit triggers
If any of these signals appears, design and implement strict per-instance pinning:
- 4 or more game-server instances running simultaneously on one host.
- A specific server reports tickrate dips / rubber-banding correlated with another instance starting or a build sandbox firing.
- `perf stat -e sched:sched_migrate_task -p <srcds-pid>` shows > 1 migration/sec under load.
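The migration-rate trigger can also be checked without `perf` by sampling the per-thread `se.nr_migrations` counter twice and dividing by the interval. This is a rough sketch under stated assumptions: the kernel must expose `/proc/<pid>/task/<tid>/sched` (this requires scheduler debug support), the helper names are hypothetical, and threads that start or exit between the two samples will skew the result.

```python
import re
import time
from pathlib import Path

# Matches a line such as "se.nr_migrations      :      42".
_NR_MIGRATIONS = re.compile(r"^se\.nr_migrations\s*:\s*(\d+)", re.MULTILINE)


def parse_nr_migrations(sched_text):
    """Extract se.nr_migrations from the text of a /proc/.../sched file."""
    match = _NR_MIGRATIONS.search(sched_text)
    if match is None:
        raise ValueError("se.nr_migrations not found in sched output")
    return int(match.group(1))


def total_migrations(pid, proc_root="/proc"):
    """Sum the migration counters of all threads of one process."""
    task_dir = Path(proc_root) / str(pid) / "task"
    return sum(
        parse_nr_migrations((thread / "sched").read_text())
        for thread in task_dir.iterdir()
    )


def migrations_per_second(pid, interval=10.0, proc_root="/proc"):
    """Sample the counter twice and return the average rate in between."""
    before = total_migrations(pid, proc_root)
    time.sleep(interval)
    after = total_migrations(pid, proc_root)
    return (after - before) / interval
```

Run against a loaded `srcds` pid, a result above 1.0 would correspond to the threshold named in the list above. The figure sums over all threads of the process, so it is an upper bound on how often the main game thread itself moved.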
When revisiting, there are two implementation paths to choose from:
1. **Modulo assignment in the host library.** Read `LEFT4ME_GAME_CPUS` (or parse the slice's `AllowedCPUs=` drop-in), pick `game_cpus[(int(name) - 1) % len(game_cpus)]`, write `L4D2_CPU=N` into `instance.env`, wrap the unit's `ExecStart` with `taskset -c ${L4D2_CPU}`. Stateless, deterministic, no DB column. **Preferred.**
2. **Persisted assignment.** Add a `Server.cpu_pin` column; the web app picks a core at initialize time and stores it. Survives `LEFT4ME_GAME_CPUS` changes (each server keeps its assigned core). Bigger ripple.
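The modulo assignment from path 1 can be sketched as follows. This is an illustration rather than the project's code: `parse_cpu_list`, `pick_cpu`, and `env_line` are hypothetical names, the CPU list is assumed to use the same range syntax as `AllowedCPUs=` (for example `1-7` or `1-3,5`), and instance names are assumed to be numeric strings starting at `1`, as the `int(name) - 1` expression implies.

```python
def parse_cpu_list(spec):
    """Expand a list such as "1-7" or "1-3,5" into sorted CPU numbers."""
    cpus = set()
    for part in spec.replace(",", " ").split():
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def pick_cpu(name, game_cpus):
    """Deterministically map a numeric instance name to one game CPU."""
    if not game_cpus:
        raise ValueError("no game CPUs configured")
    # Same expression as in the decision record: stateless, wraps around.
    return game_cpus[(int(name) - 1) % len(game_cpus)]


def env_line(name, game_cpus):
    """Render the line that would be written into instance.env."""
    return f"L4D2_CPU={pick_cpu(name, game_cpus)}"


if __name__ == "__main__":
    game_cpus = parse_cpu_list("1-7")
    for name in ("1", "2", "7", "8"):
        print(name, env_line(name, game_cpus))
```

With seven game cores, instance `8` wraps around and shares core 1 with instance `1`. The mapping stays deterministic, but each instance only gets an exclusive core while there are no more instances than game CPUs.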
## Verification (no-op confirmation)
```sh
ssh ckn@10.0.4.128 'systemctl show l4d2-game.slice -p AllowedCPUs'
# expect: AllowedCPUs=1-7
ssh ckn@10.0.4.128 'cat /sys/fs/cgroup/system.slice/cpuset.cpus.effective'
# expect: 0 (everything-not-game still pinned to core 0)
# When ≥ 1 server is running:
ssh ckn@10.0.4.128 'for p in $(pgrep srcds); do grep ^Cpus_allowed_list /proc/$p/status; done'
# expect: 1-7 (CFS picks whichever of those is hottest at any given moment)
```
## References
- `docs/superpowers/specs/2026-05-09-l4d2-cpu-isolation-design.md` — sibling design that introduced the `AllowedCPUs=1-7` slice constraint this record builds on.
- `deploy/README.md` "Performance Tuning" section — the `CPUAffinity=N` per-instance escape hatch.
- Linux kernel changelog 5.13+ — removal of classic CFS tunable sysctls.