# l4d2 cpu isolation — design

Date: 2026-05-09
Status: design

## Summary

Constrain every cgroup that isn't a live game server to core 0; give game servers cores 1..N-1 exclusively. Implementation is systemd cgroup-v2 `AllowedCPUs=` drop-ins, computed at deploy time from `nproc`, overridable via env vars. Lands on top of the perf baseline shipped in `851e662..e5126c8`.

## Goals

- A logged-in admin doing CPU-heavy work, the script-build sandbox, and the Flask web app cannot steal cycles from a live match.
- Layout scales automatically across host sizes (4-core, 8-core, 16-core) without per-host edits.
- Operator can override the default `0` / `1..N-1` split for NUMA boxes or hyperthread quirks.
- Single-core hosts degrade gracefully: skip CPU isolation, keep the rest of the perf baseline.

## Non-goals

- Kernel `isolcpus=` / `nohz_full=` / `rcu_nocbs=` boot parameters. True core isolation (eviction of softirqs, RCU, timer ticks) requires GRUB edits + reboot + per-host tuning. cgroup cpuset is sufficient for L4D2 tickrates; document as a future opt-in if measurement justifies it.
- NIC IRQ pinning. Hardware-specific; already documented as an escape hatch in `deploy/README.md`.
- Per-instance pinning *within* the game-core set. The slice-level cpuset is the outer bound; the existing per-instance `CPUAffinity=` drop-in escape hatch (already in `deploy/README.md`) composes on top, because the kernel only honors the part of a per-instance affinity mask that falls inside the slice's allowed set.
- A separate `l4d2-web.slice`. The web app is light; living in `system.slice` on core 0 is fine.
- Web-app or host-library code changes. Pure deploy-side artifact work.

## Background

The perf baseline (commit range `851e662..e5126c8`) introduced two slices (`l4d2-game.slice` weight 1000, `l4d2-build.slice` weight 10), per-instance unit directives (Nice, OOM, memory caps), and host sysctls. None of those constrain *which* CPUs cgroups run on. Without a cpuset, the kernel scheduler is free to place any task on any core, so the build sandbox, ssh sessions, the web app, and game servers all compete for the same cores.

## Design

### Topology

```
                    core 0           cores 1..N-1
                    ─────────        ────────────
system.slice        AllowedCPUs=0
user.slice          AllowedCPUs=0
l4d2-build.slice    AllowedCPUs=0
l4d2-game.slice                      AllowedCPUs=1-(N-1)
```

Everything that isn't a live game server (Flask web app, ssh sessions, journald, script-sandbox builds, cron, systemd housekeeping) is funneled to core 0. Game servers get cores 1..N-1 exclusively.

### Why slice-level `AllowedCPUs=`, not per-instance `CPUAffinity=`

- **Hierarchy does the work for free.** A cpuset on `l4d2-game.slice` propagates to every `left4me-server@*.service` automatically. No per-instance drop-ins to manage; no logic in the web app to pick cores.
- **Hot-applied.** cgroup-v2 cpuset changes apply to running cgroups; existing servers move next time the kernel schedules them. No need to restart instances after a deploy.
- **Composable.** A future operator who wants per-instance pinning *within* the game cores adds `CPUAffinity=N` via `/etc/systemd/system/left4me-server@<name>.service.d/affinity.conf` (already documented). The slice constraint and the per-instance pin compose: the kernel intersects the requested affinity with the slice's cpuset, so a pin outside the game cores cannot take effect.

### Why drop-ins, not edits to the existing `.slice` files

The two slice files we ship today (`l4d2-game.slice`, `l4d2-build.slice`) are static text and host-portable. `AllowedCPUs=1-7` is correct on an 8-core host and wrong on a 4-core host, so the value cannot live in those files. Drop-ins under `<unit>.d/*.conf` are the standard systemd pattern for host-specific overrides. We already use `99-` prefixing for the sysctl drop-in so it lex-orders last; reuse that.

### Operator override

Two env vars consumed by the deploy script:

- `LEFT4ME_SYSTEM_CPUS`: defaults to `0`. Goes into the `system.slice`, `user.slice`, and `l4d2-build.slice` drop-ins.
- `LEFT4ME_GAME_CPUS`: defaults to `1-$((NPROC-1))`. Goes into the `l4d2-game.slice` drop-in.

Operators with NUMA boxes, hyperthread quirks, or "I want core 0 *and* core 1 for system" set the vars explicitly. Defaults handle the typical case.
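
A minimal sketch of how the deploy script might resolve the two values. Only the `LEFT4ME_*` names and the defaults come from this spec; the helper `resolve_cpu_layout`, its core-count argument (the real script would pass `$(nproc)`), and the `SYSTEM_CPUS` / `GAME_CPUS` output variables are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_cpu_layout, SYSTEM_CPUS and GAME_CPUS are hypothetical names.

resolve_cpu_layout() {
  local ncpu="$1"
  # Default-binding form: a non-empty LEFT4ME_* value wins, otherwise fall
  # back to core 0 for system work and cores 1..N-1 for game servers.
  SYSTEM_CPUS="${LEFT4ME_SYSTEM_CPUS:-0}"
  GAME_CPUS="${LEFT4ME_GAME_CPUS:-1-$((ncpu - 1))}"
}

# Print the resolved split for the host sizes named in the goals.
for n in 4 8 16; do
  resolve_cpu_layout "$n"
  echo "nproc=$n system=$SYSTEM_CPUS game=$GAME_CPUS"
done
```

With `LEFT4ME_SYSTEM_CPUS=0-1` and `LEFT4ME_GAME_CPUS=2-7` exported, the same call returns the operator's values verbatim and ignores the core count.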

### Single-core fallback

If `nproc < 2`, skip CPU isolation entirely (write no drop-ins). Print a warning to stderr explaining the deploy is leaving cpuset unset. The rest of the perf baseline still applies (weights, sysctls, OOM scores).

If `LEFT4ME_GAME_CPUS` or `LEFT4ME_SYSTEM_CPUS` is set explicitly on a single-core host, honor the operator's intent (they presumably know what they're doing) and write the drop-ins anyway.
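
The fallback rule can be sketched as a small predicate. This is an illustration under assumptions, not the script's actual code: `should_write_cpuset` is a hypothetical helper, and the warning text is invented.

```shell
#!/usr/bin/env bash
# Sketch only: should_write_cpuset is a hypothetical helper name.

# Returns 0 when drop-ins should be written, 1 when isolation is skipped.
should_write_cpuset() {
  local ncpu="$1"
  # An explicit override on either variable means the operator opted in.
  if [ -n "${LEFT4ME_SYSTEM_CPUS:-}" ] || [ -n "${LEFT4ME_GAME_CPUS:-}" ]; then
    return 0
  fi
  if [ "$ncpu" -lt 2 ]; then
    echo "warning: single-core host, leaving cpuset unset (no AllowedCPUs= drop-ins written)" >&2
    return 1
  fi
  return 0
}

if should_write_cpuset "$(nproc)"; then
  echo "cpuset drop-ins: write"
else
  echo "cpuset drop-ins: skip"
fi
```

Checking the overrides before the core count keeps the "operator knows best" path free of the warning.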

### Drop-in layout

Four files, one per slice, each written to `/etc/systemd/system/<slice>.slice.d/` and named `99-left4me-cpuset.conf`:

```
/etc/systemd/system/system.slice.d/99-left4me-cpuset.conf
/etc/systemd/system/user.slice.d/99-left4me-cpuset.conf
/etc/systemd/system/l4d2-build.slice.d/99-left4me-cpuset.conf
/etc/systemd/system/l4d2-game.slice.d/99-left4me-cpuset.conf
```

Each file contains:

```ini
[Slice]
AllowedCPUs=<resolved value>
```
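
As a rough illustration of the layout above, the sketch below renders the path and body of each drop-in to stdout. The helper names and the example CPU values are hypothetical; a real deploy would presumably feed the rendered body into the `install -m 0644 -o root -g root` heredoc pattern the script already uses rather than print it.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the example CPU values are hypothetical.

dropin_path() {
  # $1 is the full unit name, e.g. "l4d2-game.slice".
  printf '/etc/systemd/system/%s.d/99-left4me-cpuset.conf\n' "$1"
}

render_dropin() {
  # $1 is the resolved CPU list, e.g. "0" or "1-7".
  printf '[Slice]\nAllowedCPUs=%s\n' "$1"
}

system_cpus="0"   # example values for an 8-core host
game_cpus="1-7"

for slice in system.slice user.slice l4d2-build.slice l4d2-game.slice; do
  # Only the game slice gets the game cores; everything else shares system_cpus.
  if [ "$slice" = "l4d2-game.slice" ]; then cpus="$game_cpus"; else cpus="$system_cpus"; fi
  echo "# $(dropin_path "$slice")"
  render_dropin "$cpus"
done
```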

### systemd compatibility

`AllowedCPUs=` requires systemd 244 or newer; Debian Trixie ships systemd 256+. The cgroup-v2 cpuset controller is enabled by default on Trixie, and systemd auto-enables the controller when `AllowedCPUs=` is set on a unit. No additional machinery.

### Files changed / added

```
deploy/deploy-test-server.sh             (modified: compute layout, write four drop-ins)
deploy/README.md                         (modified: new "CPU isolation" subsection inside Performance Tuning)
deploy/tests/test_deploy_artifacts.py    (modified: new tests)
```

## Tests

`deploy/tests/test_deploy_artifacts.py` additions, following the existing
`assert "X" in script` pattern:

- For `deploy-test-server.sh`, assert:
  - All four drop-in paths (`/etc/systemd/system/{system,user,l4d2-build,l4d2-game}.slice.d/99-left4me-cpuset.conf`) appear.
  - The script reads `nproc` (substring `nproc` plus a default-binding form for `LEFT4ME_GAME_CPUS`).
  - The script honors `LEFT4ME_SYSTEM_CPUS` and `LEFT4ME_GAME_CPUS` env-var overrides (substrings present, default-binding form like `${LEFT4ME_SYSTEM_CPUS:-...}`).
  - The script has a single-core fallback (substring guarding `nproc -lt 2` or equivalent, with a warning to stderr).
  - Each drop-in is written via the existing `install -m 0644 -o root -g root` heredoc pattern.

No runtime tests in this spec. Verifying that systemd actually enforces `AllowedCPUs=` is operator-side, via `cat /sys/fs/cgroup/<slice>/cpuset.cpus.effective` after deploy. Note that systemd maps dashes in slice names to cgroup hierarchy, so `l4d2-game.slice` lives under `/sys/fs/cgroup/l4d2.slice/l4d2-game.slice/`.
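
For the operator-side check, a small helper along these lines could compare the effective cpuset against the expected list. `check_effective_cpus` is a hypothetical name, and the expected values in the usage note assume the default split on an 8-core host.

```shell
#!/usr/bin/env bash
# Sketch only: check_effective_cpus is a hypothetical helper name.

# Compare a cgroup directory's effective cpuset against the expected list.
# Returns 0 on match, 1 on mismatch, 2 if the file cannot be read.
check_effective_cpus() {
  local cgroup_dir="$1" expected="$2" actual
  actual="$(cat "$cgroup_dir/cpuset.cpus.effective" 2>/dev/null)" || {
    echo "cannot read $cgroup_dir/cpuset.cpus.effective" >&2
    return 2
  }
  if [ "$actual" = "$expected" ]; then
    echo "ok: $cgroup_dir -> $actual"
  else
    echo "MISMATCH: $cgroup_dir -> $actual (expected $expected)" >&2
    return 1
  fi
}
```

On a deployed host this would be called as `check_effective_cpus /sys/fs/cgroup/system.slice 0` and `check_effective_cpus /sys/fs/cgroup/l4d2.slice/l4d2-game.slice 1-7`. The kernel reports the list in its own normalized form (a single core prints as `1`, not `1-1`), so the expected value should be written the same way. `systemctl show --property=ControlGroup --value l4d2-game.slice` prints the cgroup path if the directory is in doubt.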

## Rollout

Single deploy. cgroup-v2 cpuset changes apply to running cgroups, so already-running servers move next time the kernel reschedules them; no instance restarts required. The `daemon-reload` already in the deploy script picks up the new drop-ins.

If something goes wrong (cpuset too narrow, a slice can't run any process), `systemctl status <slice>.slice` will show the error. The operator can then either fix the env vars and redeploy, or revert with `rm /etc/systemd/system/<slice>.slice.d/99-left4me-cpuset.conf` followed by `systemctl daemon-reload`.
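
A full revert across all four slices can be sketched as a loop. The function name and its optional root-prefix argument (useful for dry runs against a scratch directory) are hypothetical and not part of the deploy script.

```shell
#!/usr/bin/env bash
# Sketch only: revert_cpuset_dropins and its root-prefix argument are hypothetical.

# Remove the four cpuset drop-ins under the given root (empty on a real host).
# Other drop-ins in the same directories are left alone.
revert_cpuset_dropins() {
  local root="${1:-}" slice
  for slice in system.slice user.slice l4d2-build.slice l4d2-game.slice; do
    rm -f "$root/etc/systemd/system/$slice.d/99-left4me-cpuset.conf"
  done
}
```

On a live host, running `revert_cpuset_dropins` as root followed by `systemctl daemon-reload` would undo the change.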

## Open questions

None blocking. Possible v2 candidates if measurement justifies them:

- Pair this with kernel `isolcpus=` boot params for true core isolation.
- Auto-pin NIC IRQs to core 0 (would compose with this isolation).
- Per-instance `CPUAffinity=` driven by a deploy-env knob, partitioning the game-core set across instances deterministically.

## References

- systemd.resource-control(5): `AllowedCPUs=` semantics.
- Linux Documentation/admin-guide/cgroup-v2.rst: cpuset controller behavior on `cpuset.cpus` / `cpuset.cpus.effective`.
- Existing perf-baseline spec: `docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md`, sibling work that introduced the slices this spec extends.