left4me/docs/superpowers/specs/2026-05-09-l4d2-cpu-isolation-design.md
mwiegand 17b7c2ff10
docs(specs): l4d2 cpu isolation — design
cgroup-v2 AllowedCPUs= drop-ins for system/user/build/game slices.
Defaults: core 0 for everything-not-game, cores 1..N-1 for game,
computed from nproc. LEFT4ME_SYSTEM_CPUS / LEFT4ME_GAME_CPUS
overrides; single-core hosts skip with a warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 11:03:37 +02:00

l4d2 cpu isolation — design

Date: 2026-05-09 Status: design

Summary

Constrain every cgroup that isn't a live game server to core 0; give game servers cores 1..N-1 exclusively. Implementation is systemd cgroup-v2 AllowedCPUs= drop-ins, computed at deploy time from nproc, overridable via env vars. Lands on top of the perf baseline shipped in 851e662..e5126c8.

Goals

  • A logged-in admin doing CPU-heavy work, the script-build sandbox, and the Flask web app cannot steal cycles from a live match.
  • Layout scales automatically across host sizes (4-core, 8-core, 16-core) without per-host edits.
  • Operator can override the default 0 / 1..N-1 split for NUMA boxes or hyperthread quirks.
  • Single-core hosts degrade gracefully: skip CPU isolation, keep the rest of the perf baseline.

Non-goals

  • Kernel isolcpus= / nohz_full= / rcu_nocbs= boot parameters. True core isolation (eviction of softirqs, RCU, timer ticks) requires GRUB edits + reboot + per-host tuning. cgroup cpuset is sufficient for L4D2 tickrates; document as a future opt-in if measurement justifies it.
  • NIC IRQ pinning. Hardware-specific; already documented as an escape hatch in deploy/README.md.
  • Per-instance pinning within the game-core set. The slice-level cpuset is the outer bound; the existing per-instance CPUAffinity= drop-in escape hatch (already in deploy/README.md) composes on top, because the kernel confines each instance's effective affinity to CPUs inside the slice's allowed set.
  • A separate l4d2-web.slice. The web app is light; living in system.slice on core 0 is fine.
  • Web-app or host-library code changes. Pure deploy-side artifact work.

Background

The perf baseline (commit range 851e662..e5126c8) introduced two slices (l4d2-game.slice weight 1000, l4d2-build.slice weight 10), per-instance unit directives (Nice, OOM, memory caps), and host sysctls. None of those constrain which CPUs a cgroup's tasks may run on. Under the kernel's default fair scheduler, any task can migrate to any core, so the build sandbox, ssh sessions, the web app, and game servers all compete for the same cores.

Design

Topology

                   core 0           cores 1..N-1
                   ─────────        ────────────
system.slice       AllowedCPUs=0
user.slice         AllowedCPUs=0
l4d2-build.slice   AllowedCPUs=0
l4d2-game.slice                     AllowedCPUs=1-(N-1)

Everything that isn't a live game server (Flask web app, ssh sessions, journald, script-sandbox builds, cron, systemd housekeeping) is funneled to core 0. Game servers get cores 1..N-1 exclusively.

Why slice-level AllowedCPUs=, not per-instance CPUAffinity=

  • Hierarchy does the work for free. A cpuset on l4d2-game.slice propagates to every left4me-server@*.service automatically. No per-instance drop-ins to manage; no logic in the web app to pick cores.
  • Hot-applied. cgroup-v2 cpuset changes apply to running cgroups; existing servers move next time the kernel schedules them. No need to restart instances after a deploy.
  • Composable. A future operator who wants per-instance pinning within the game cores adds CPUAffinity=N via /etc/systemd/system/left4me-server@<name>.service.d/affinity.conf (already documented). The slice constraint and the per-instance pin compose: the kernel only honors affinity CPUs that fall inside the slice's cpuset, so N must be one of the game cores.

Why drop-ins, not edits to the existing .slice files

The two slice files we ship today (l4d2-game.slice, l4d2-build.slice) are static text and host-portable. A hard-coded AllowedCPUs=1-7 is correct on an 8-core host and wrong on a 4-core host. Drop-ins under <unit>.d/*.conf are the standard systemd pattern for host-specific overrides. We already use the 99- prefix for the sysctl drop-in so that it sorts last lexically; reuse that here.

Operator override

Two env vars consumed by the deploy script:

  • LEFT4ME_SYSTEM_CPUS — defaults to 0. Goes into system.slice, user.slice, l4d2-build.slice drop-ins.
  • LEFT4ME_GAME_CPUS — defaults to 1-$((NPROC-1)). Goes into l4d2-game.slice drop-in.

Operators with NUMA boxes, hyperthread quirks, or "I want core 0 and core 1 for system" set the vars explicitly. Defaults handle the typical case.
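
The default-and-override logic can be sketched in shell as follows. This is an illustrative sketch, not the deploy script's actual code: the function name resolve_cpu_layout and its output format are hypothetical, and the core count is passed as an argument (the deploy script would take it from nproc).

```shell
#!/bin/sh
# Hypothetical helper (not from the deploy script): resolve both CPU lists
# for a host with $1 cores. Explicit LEFT4ME_*_CPUS values win; otherwise
# the defaults are core 0 for system work and 1..N-1 for game servers.
resolve_cpu_layout() {
    nproc_count=$1
    system_cpus=${LEFT4ME_SYSTEM_CPUS:-0}
    game_cpus=${LEFT4ME_GAME_CPUS:-1-$((nproc_count - 1))}
    printf 'system=%s game=%s\n' "$system_cpus" "$game_cpus"
}

# On a real host the argument would come from nproc:
#   resolve_cpu_layout "$(nproc)"
```

With neither variable set, resolve_cpu_layout 4 prints system=0 game=1-3 and resolve_cpu_layout 16 prints system=0 game=1-15, which is the automatic scaling the goals call for.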

Single-core fallback

If nproc < 2, skip CPU isolation entirely (write no drop-ins). Print a warning to stderr explaining the deploy is leaving cpuset unset. The rest of the perf baseline still applies (weights, sysctls, OOM scores).

If LEFT4ME_GAME_CPUS or LEFT4ME_SYSTEM_CPUS is set explicitly on a single-core host, honor the operator's intent (they presumably know what they're doing) and write the drop-ins anyway.
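
One way the fallback decision could look in shell is sketched below. The function name should_write_cpuset and the warning text are hypothetical; only the behavior (skip on fewer than two cores unless an override is set, warn on stderr) comes from this spec.

```shell
#!/bin/sh
# Hypothetical guard (not from the deploy script): return 0 when the
# deploy should write cpuset drop-ins, 1 when it should skip them.
# $1 = core count, as reported by nproc.
should_write_cpuset() {
    nproc_count=$1
    if [ "$nproc_count" -ge 2 ]; then
        return 0
    fi
    # Single-core host: proceed only if the operator set an override.
    if [ -n "${LEFT4ME_SYSTEM_CPUS:-}" ] || [ -n "${LEFT4ME_GAME_CPUS:-}" ]; then
        return 0
    fi
    echo "warning: $nproc_count CPU detected; leaving cpuset unset (no AllowedCPUs= drop-ins written)" >&2
    return 1
}
```

Note that the computed default for LEFT4ME_GAME_CPUS on a single-core host would be 1-0, which is not a valid range, so an operator overriding on such a host presumably needs to set both variables.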

Drop-in layout

Four files written under /etc/systemd/system/, one in each slice's drop-in directory, each named 99-left4me-cpuset.conf:

/etc/systemd/system/system.slice.d/99-left4me-cpuset.conf
/etc/systemd/system/user.slice.d/99-left4me-cpuset.conf
/etc/systemd/system/l4d2-build.slice.d/99-left4me-cpuset.conf
/etc/systemd/system/l4d2-game.slice.d/99-left4me-cpuset.conf

Each file contains:

[Slice]
AllowedCPUs=<resolved value>
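
A minimal sketch of writing these four files, assuming the install-plus-heredoc pattern mentioned under Tests. The function names and the unit-root parameter are hypothetical, and the -o root -g root flags are left out so the sketch can run unprivileged against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: write one 99-left4me-cpuset.conf drop-in.
# $1 = unit root (/etc/systemd/system on a real host)
# $2 = slice unit name, $3 = resolved CPU list
write_cpuset_dropin() {
    dropin_dir="$1/$2.d"
    mkdir -p "$dropin_dir"
    # The deploy script's pattern also passes -o root -g root (omitted here).
    install -m 0644 /dev/stdin "$dropin_dir/99-left4me-cpuset.conf" <<EOF
[Slice]
AllowedCPUs=$3
EOF
}

# Hypothetical wrapper: $1 = unit root, $2 = system CPUs, $3 = game CPUs.
write_all_cpuset_dropins() {
    for slice in system.slice user.slice l4d2-build.slice; do
        write_cpuset_dropin "$1" "$slice" "$2"
    done
    write_cpuset_dropin "$1" l4d2-game.slice "$3"
}
```

Calling write_all_cpuset_dropins /etc/systemd/system 0 1-7 as root would produce the 8-core default layout; pointing the first argument at a temporary directory allows a dry run.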

systemd compatibility

AllowedCPUs= requires systemd 244 or later; Debian Trixie ships systemd 256+. The cgroup-v2 cpuset controller is available by default on Trixie, and systemd enables it automatically when AllowedCPUs= is set on a unit. No additional machinery is needed.
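
For hosts other than Trixie, an operator could check both preconditions by hand. The sketch below is an assumption about how such a check might look and is not part of the planned deploy script; cpuset_preflight is a hypothetical name, and its inputs are passed as arguments so the logic stays testable.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of the deploy script).
# $1 = first line of `systemctl --version`, e.g. "systemd 257 (257.4-1)"
# $2 = contents of /sys/fs/cgroup/cgroup.controllers
cpuset_preflight() {
    # Field 2 of the first line is the version; keep only leading digits.
    version=$(printf '%s\n' "$1" | awk 'NR == 1 { print $2 }')
    version=${version%%[!0-9]*}
    case "$version" in
        '') echo "cannot parse systemd version from: $1" >&2; return 1 ;;
    esac
    if [ "$version" -lt 244 ]; then
        echo "systemd $version is older than 244; AllowedCPUs= is unavailable" >&2
        return 1
    fi
    # The controller list is space-separated; match cpuset as a whole word.
    case " $2 " in
        *" cpuset "*) return 0 ;;
    esac
    echo "cpuset controller not listed in cgroup.controllers" >&2
    return 1
}

# On a host:
#   cpuset_preflight "$(systemctl --version | head -n 1)" \
#                    "$(cat /sys/fs/cgroup/cgroup.controllers)"
```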

Files changed / added

deploy/deploy-test-server.sh                   (modified — compute layout, write four drop-ins)
deploy/README.md                               (modified — new "CPU isolation" subsection inside Performance Tuning)
deploy/tests/test_deploy_artifacts.py          (modified — new tests)

Tests

deploy/tests/test_deploy_artifacts.py additions, following the existing assert "X" in script pattern:

  • For deploy-test-server.sh, assert:
    • All four drop-in paths (/etc/systemd/system/{system,user,l4d2-build,l4d2-game}.slice.d/99-left4me-cpuset.conf) appear.
    • The script reads nproc (substring nproc plus a default-binding form for LEFT4ME_GAME_CPUS).
    • The script honors LEFT4ME_SYSTEM_CPUS and LEFT4ME_GAME_CPUS env-var overrides (substrings present, default-binding form like ${LEFT4ME_SYSTEM_CPUS:-...}).
    • The script has a single-core fallback (substring guarding nproc -lt 2 or equivalent, with a warning to stderr).
    • Each drop-in is written via the existing install -m 0644 -o root -g root heredoc pattern.

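No runtime tests in this spec. Verifying that systemd actually enforces AllowedCPUs= is an operator-side check: cat /sys/fs/cgroup/<slice>/cpuset.cpus.effective after deploy. Because dashed slice names nest in the cgroup tree, the l4d2-game.slice and l4d2-build.slice directories live under /sys/fs/cgroup/l4d2.slice/.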
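
The operator-side check could be scripted roughly as follows. The helper name effective_cpus is hypothetical; the commented loop asks systemd for each unit's cgroup path through the ControlGroup property rather than guessing it, since slices with a dash in their name nest under a parent slice.

```shell
#!/bin/sh
# Hypothetical helper: print the effective CPU list for one cgroup.
# $1 = cgroup mount point (normally /sys/fs/cgroup)
# $2 = the unit's ControlGroup path, e.g. /system.slice
effective_cpus() {
    cat "$1$2/cpuset.cpus.effective"
}

# On a deployed host (requires systemd):
#   for s in system.slice user.slice l4d2-build.slice l4d2-game.slice; do
#       printf '%s: ' "$s"
#       effective_cpus /sys/fs/cgroup "$(systemctl show -p ControlGroup --value "$s")"
#   done
```

On an 8-core host with default settings, the expectation is 0 for the first three slices and 1-7 for l4d2-game.slice.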

Rollout

Single deploy. cgroup-v2 cpuset changes apply to running cgroups, so already-running servers move next time the kernel reschedules them — no instance restarts required. The daemon-reload already in the deploy script picks up the new drop-ins.

If something goes wrong (cpuset too narrow, a slice can't run any process), systemctl status <slice> will show the error and the operator can either fix the env vars and redeploy or rm /etc/systemd/system/<slice>.slice.d/99-left4me-cpuset.conf followed by systemctl daemon-reload to revert.
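
The revert path can be sketched as a small helper. The name remove_cpuset_dropins and the unit-root parameter are hypothetical; on a real host the argument would be /etc/systemd/system and the call would need root.

```shell
#!/bin/sh
# Hypothetical revert helper: delete the four cpuset drop-ins under $1.
# Missing files are not treated as an error (rm -f).
remove_cpuset_dropins() {
    for slice in system.slice user.slice l4d2-build.slice l4d2-game.slice; do
        rm -f "$1/$slice.d/99-left4me-cpuset.conf"
    done
}

# On a host, as root:
#   remove_cpuset_dropins /etc/systemd/system
#   systemctl daemon-reload
```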

Open questions

None blocking. Possible v2 candidates if measurement justifies them:

  • Pair this with kernel isolcpus= boot params for true core isolation.
  • Auto-pin NIC IRQs to core 0 (would compose with this isolation).
  • Per-instance CPUAffinity= driven by a deploy-env knob, partitioning the game-core set across instances deterministically.

References

  • systemd.resource-control(5) — AllowedCPUs= semantics.
  • Linux Documentation/admin-guide/cgroup-v2.rst — cpuset controller behavior on cpuset.cpus / cpuset.cpus.effective.
  • Existing perf-baseline spec: docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md — sibling work that introduced the slices this spec extends.