# l4d2 server host perf baseline: design

Date: 2026-05-09

Status: design

## Summary

Apply a host-side performance and resource-isolation baseline to every L4D2 server instance, using systemd unit directives, a slice hierarchy, and host sysctls. The blueprint-level game configuration (tickrate, `sv_minrate`/`sv_maxrate`, `fps_max`, plugins) stays the responsibility of the individual server maintainer and is out of scope.

## Goals

- Game-server processes get measurable scheduling, I/O, and OOM priority over the script-build sandbox and over interactive system traffic.
- One misbehaving server cannot OOM-kill its siblings or the host.
- The kernel's UDP path is sized for sustained Source-engine traffic instead of distro defaults.
- Operators have documented escape hatches for host-specific tuning (CPU pinning, governor, NIC IRQs, real-time scheduling) without any of it being imposed by default.

## Non-goals

- ConVars, blueprint arguments, plugins, tickrate, rate values: owned by the maintainer of each server.
- Real-time (`SCHED_FIFO`/`SCHED_RR`) scheduling for game servers. Documented as opt-in only; see *Out-of-scope rationale*.
- CPU governor changes. Documented opt-in only.
- Per-instance `CPUAffinity`. Host-specific; documented only.
- NIC ring-buffer / IRQ-pinning changes. Hardware-specific; documented only.
- Job-scheduler awareness ("don't build a script overlay while server X has players"). Cgroup weights cover this in v1; revisit if real-world data disagrees.
- Hardening tightening (`ProtectKernelTunables=yes`, etc.). Security-focused; separate spec.

## Background

Current state (commit `965b67e`):

- `deploy/files/usr/local/lib/systemd/system/left4me-server@.service` runs `srcds_run` as user `left4me` with security hardening (`NoNewPrivileges`, `PrivateTmp`, `PrivateDevices`, `ProtectHome`, `ProtectSystem=strict`, `ReadOnlyPaths`, `ReadWritePaths`, `RestrictSUIDSGID`, `LockPersonality`) but **no scheduling, memory, OOM, kill-signal, or log-rate directives**.
- `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` runs script-overlay builds as a transient service via `systemd-run --unit=` with `CPUQuota=200%` and `RuntimeMaxSec=3600`, but in the **default slice**: it competes against game servers as an equal sibling under `system.slice`.
- No host sysctls are deployed. Linux defaults (`rmem_max`/`wmem_max` of roughly 208 KB, commonly 212992 bytes; `netdev_max_backlog=1000`) are below what sustained UDP gameplay across multiple instances expects.

srcds is single-threaded per instance, so multi-instance hosts contend over CPU cycles, kernel softirq budget, and journald rate limits.

## Design

### Slice topology

Flat top-level slices, siblings of `system.slice` and `user.slice`:

```
-.slice
├── system.slice      (default CPUWeight=100, IOWeight=100)
├── user.slice        (default CPUWeight=100, IOWeight=100)
├── l4d2-game.slice   (CPUWeight=1000, IOWeight=1000)
└── l4d2-build.slice  (CPUWeight=10, IOWeight=10)
```

Rationale:

- The 100:1 weight ratio between game and build means that under contention the build sandbox receives about 1% of the CPU time the game slice gets (10 vs 1000), which is effectively starvation; when uncontended, the build still gets the full box, bounded only by its own `CPUQuota=200%`.
- Flat (not nested under `system.slice`) so the game slice's weight competes directly against `user.slice`: a logged-in admin running a heavy task cannot take an equal share of cycles from a live match.

### Per-instance unit additions (`left4me-server@.service`)

Add to `[Service]`:

```ini
Slice=l4d2-game.slice
Nice=-5
IOSchedulingClass=best-effort
IOSchedulingPriority=4
OOMScoreAdjust=-200
MemoryHigh=1.5G
MemoryMax=2G
TasksMax=256
LimitNOFILE=65536
KillSignal=SIGINT
TimeoutStopSec=15s
LogRateLimitIntervalSec=0
```

Per-directive justification:

- `Slice=l4d2-game.slice`: places the instance in the high-weight slice.
- `Nice=-5`: modest CFS priority bump. A negative `Nice=` set by systemd does not require `CAP_SYS_NICE` in the unit, because systemd applies the value before dropping to the unit user. `SCHED_FIFO` is intentionally rejected; see *Out-of-scope rationale*.
- `IOSchedulingClass=best-effort` + `IOSchedulingPriority=4`: pins the I/O class and level explicitly (best-effort levels run from 0, highest, to 7, lowest; 4 is the class default). This is not a bump over the default; it makes the value deterministic instead of leaving it derived from the process nice value. Priorities are honored only by priority-aware I/O schedulers such as BFQ, so the setting is harmless elsewhere. The real I/O advantage over the build sandbox comes from the slice `IOWeight=`.
- `OOMScoreAdjust=-200`: makes game servers less likely to be chosen by the OOM killer under memory pressure; the sandbox, at `+500`, dies first (see *Sandbox slice + OOM placement*).
- `MemoryHigh=1.5G`, `MemoryMax=2G`: soft + hard ceiling. Above `MemoryHigh` the kernel throttles the instance and reclaims aggressively; reaching `MemoryMax` invokes the OOM killer inside that instance's cgroup only. Typical L4D2 srcds runs at ~500–800 MB, map-load spikes fit in the headroom, and a runaway is bounded.
- `TasksMax=256`: caps the thread/process count well above srcds' steady-state usage, so a fork-bomb-style failure stays contained in the unit instead of leaking host-wide.
- `LimitNOFILE=65536`: Valve wiki recommendation; cheap, and accommodates multi-plugin setups.
- `KillSignal=SIGINT`: srcds treats SIGINT as a clean shutdown (writes demos, flushes logs); SIGTERM is harsher.
- `TimeoutStopSec=15s`: gives srcds up to 15 s to finish flushing before systemd escalates to SIGKILL.
- `LogRateLimitIntervalSec=0`: disables journald's per-unit rate limiting (default: 10000 messages per 30 s). srcds plus plugins exceed this on busy maps, and dropped messages break diagnostics.

Existing security directives are kept verbatim.

### Slice unit files

New file `deploy/files/usr/local/lib/systemd/system/l4d2-game.slice`:

```ini
[Unit]
Description=left4me game-server slice
Before=slices.target

[Slice]
CPUWeight=1000
IOWeight=1000
```

New file `deploy/files/usr/local/lib/systemd/system/l4d2-build.slice`:

```ini
[Unit]
Description=left4me script-sandbox build slice
Before=slices.target

[Slice]
CPUWeight=10
IOWeight=10
```

### Sandbox slice + OOM placement

Edit `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` to add the following to the `systemd-run` invocation (transient service mode; the existing helper uses `--unit=` without `--scope`):

- `--slice=l4d2-build.slice`
- `-p OOMScoreAdjust=500`

The existing `CPUQuota=200%` and `RuntimeMaxSec=3600` stay. Cgroup weight (per slice) and CPU quota (per unit) compose: the weight decides shares under contention, and the quota sets the absolute ceiling.
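The weight/quota composition can be sanity-checked with a small model. The sketch below is a hypothetical Python helper, not part of the deploy: it assumes idealized proportional sharing (each busy slice gets CPU in proportion to its `CPUWeight=`, capped by its own demand and any `CPUQuota=`, with spare capacity re-split among the rest) and ignores scheduler latency, idle periods, and per-task nice values inside a slice. `SliceDemand` and `allocate` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SliceDemand:
    """One busy slice: its CPUWeight=, how many CPUs of work it wants,
    and an optional CPUQuota= expressed in CPUs (200% -> 2.0)."""
    name: str
    weight: int
    demand_cpus: float
    quota_cpus: Optional[float] = None


def allocate(slices: List[SliceDemand], total_cpus: float) -> Dict[str, float]:
    """Idealized weighted fair share ("water-filling"): split the free CPUs in
    proportion to weight; any slice whose share exceeds its cap (demand or
    quota) is pinned at the cap and the surplus is re-split among the rest."""
    caps = {
        s.name: min(s.demand_cpus, s.quota_cpus if s.quota_cpus is not None else float("inf"))
        for s in slices
    }
    alloc = {s.name: 0.0 for s in slices}
    active = [s for s in slices if caps[s.name] > 0]
    remaining = float(total_cpus)
    while active and remaining > 1e-12:
        total_weight = sum(s.weight for s in active)
        share = {s.name: remaining * s.weight / total_weight for s in active}
        capped = [s for s in active if share[s.name] >= caps[s.name]]
        if not capped:
            # Every remaining slice can absorb its full proportional share.
            for s in active:
                alloc[s.name] = share[s.name]
            break
        # Pin capped slices at their cap and return the surplus to the pool.
        for s in capped:
            alloc[s.name] = caps[s.name]
            remaining -= caps[s.name]
        capped_names = {s.name for s in capped}
        active = [s for s in active if s.name not in capped_names]
    return alloc


if __name__ == "__main__":
    # Hypothetical 8-CPU host running eight single-threaded srcds instances
    # while a script build tries to use every core (CPUQuota=200% -> 2 CPUs).
    busy = allocate(
        [SliceDemand("l4d2-game.slice", 1000, 8.0),
         SliceDemand("l4d2-build.slice", 10, 8.0, quota_cpus=2.0)],
        total_cpus=8.0,
    )
    print(busy)  # build gets ~0.08 CPU, about 1% of the game slice's share
    idle = allocate([SliceDemand("l4d2-build.slice", 10, 8.0, quota_cpus=2.0)], 8.0)
    print(idle)  # uncontended: the quota, not the weight, is the limit
```

The first call models the "effectively starved" case from the slice rationale; the second models the uncontended case, where `CPUQuota=200%` rather than the slice weight is what bounds the build.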
### Host sysctls

New file `deploy/files/etc/sysctl.d/99-left4me.conf`:

```
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.core.rmem_default = 524288
net.core.wmem_default = 524288
net.core.netdev_max_backlog = 5000
net.core.netdev_budget = 600
vm.swappiness = 10
```

Per-value justification:

- `rmem_max`/`wmem_max = 8 MB`: the Linux default (roughly 208 KB) is a known bottleneck for sustained UDP. 8 MB is the standard 1 Gbit recommendation (Red Hat performance guide). These are per-socket ceilings (the most a socket may request via `SO_RCVBUF`/`SO_SNDBUF`), so the value does not need to grow with instance count; 16 MB is not needed.
- `rmem_default`/`wmem_default = 512 KB`: sizes sockets that never call `setsockopt(SO_RCVBUF/SO_SNDBUF)`; harmless for those that do.
- `netdev_max_backlog = 5000`: the default of `1000` overflows under multi-instance UDP bursts; once the per-CPU backlog queue is full, incoming packets are dropped.
- `netdev_budget = 600`: lets each softirq pass drain more packets; the default of `300` is undersized for multi-Gbit-class hosts.
- `vm.swappiness = 10`: commonly recommended for latency-sensitive servers; harmless on swapless hosts.

### Deploy script integration

`deploy/deploy-test-server.sh` must:

1. Copy `etc/sysctl.d/99-left4me.conf` to `/etc/sysctl.d/`.
2. Run `sysctl --system` (or `sysctl -p /etc/sysctl.d/99-left4me.conf`) so the values take effect immediately, not on the next boot.
3. Copy the two `.slice` files into `/usr/local/lib/systemd/system/`.
4. Run `systemctl daemon-reload` after unit/slice changes (already done in the current deploy flow).
5. Skip any explicit `systemctl start` of the slices; they activate on first child reference.

### Documented escape hatches (no auto-apply)

Append a "Performance tuning" section to `deploy/README.md`:

- **CPU governor**: `cpupower frequency-set -g performance` if jitter under load matters more than power. `schedutil` is acceptable for sustained UDP workloads. Provide the one-liner; do not ship a oneshot service in v1.
- **CPU affinity per instance**: example drop-in at `/etc/systemd/system/left4me-server@<instance>.service.d/affinity.conf` setting `CPUAffinity=N`, one file per instance (a drop-in in the template directory `left4me-server@.service.d/` would pin every instance to the same cores). Document the strategy "one instance per core, leave core 0 for system + IRQ".
- **NIC tuning**: example `ethtool -G <iface> rx 4096 tx 4096`, IRQ-pinning hints. Hardware-specific; ops-only.
- **Real-time scheduling opt-in**: example drop-in adding `CPUSchedulingPolicy=fifo`, `CPUSchedulingPriority=10`, `LimitRTPRIO=10`. Include a one-paragraph warning citing the RT-throttling defaults (`sched_rt_runtime_us=950000` per `sched_rt_period_us=1000000`, i.e. RT tasks may use at most 95% of each period) and the failure mode if a single instance misbehaves.

These stay pure documentation in v1: no code paths, no tests asserting them.

### Out-of-scope rationale

- **SCHED_FIFO**: a misbehaving srcds at any RT priority can starve kernel threads, producing failure modes that are harder to diagnose than the jitter problem RT claims to solve. `Nice=-5` plus the slice weights captures the practical benefit. Ops who need RT can opt in via the documented drop-in.
- **CPU governor auto-set**: Phoronix and Arch comparisons show `schedutil` within noise of `performance` on sustained workloads like Source UDP; forcing `performance` would surprise users on power-managed hosts.
- **CPUAffinity in the unit**: the unit template is shared across all instances, so a single hard-coded `CPUAffinity=` would pin every instance to the same cores, defeating the purpose. Per-instance pinning needs deploy-time policy that is outside v1's scope.
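The per-instance pinning strategy documented above can be sketched as a small generator of drop-in files. This is a hypothetical Python helper, not shipped in v1: the instance names (`coop-1`, `versus-1`, ...) and the output directory are assumptions, and it simply assigns one dedicated core per instance starting after the reserved cores (core 0 by default). Because it writes one `left4me-server@<instance>.service.d/affinity.conf` per instance, it avoids the shared-template problem described under *CPUAffinity in the unit*.

```python
"""Hypothetical helper: plan and write per-instance CPUAffinity= drop-ins.

Strategy: one srcds instance per dedicated core, with the first `reserved`
cores (core 0 by default) left for the system and NIC IRQs.
"""

import tempfile
from pathlib import Path
from typing import Dict, List


def plan_affinity(instances: List[str], cpu_count: int, reserved: int = 1) -> Dict[str, int]:
    """Map each instance (sorted, for a stable result) to its own core."""
    usable = list(range(reserved, cpu_count))
    if len(instances) > len(usable):
        raise ValueError(
            f"{len(instances)} instances but only {len(usable)} cores after reserving {reserved}"
        )
    return dict(zip(sorted(instances), usable))


def render_dropin(cpu: int) -> str:
    """Body of an affinity.conf drop-in pinning the service to one CPU."""
    return f"[Service]\nCPUAffinity={cpu}\n"


def write_dropins(plan: Dict[str, int], unit_dir: Path) -> List[Path]:
    """Write <unit_dir>/left4me-server@<instance>.service.d/affinity.conf
    for every planned instance and return the paths written."""
    written = []
    for instance, cpu in plan.items():
        dropin_dir = unit_dir / f"left4me-server@{instance}.service.d"
        dropin_dir.mkdir(parents=True, exist_ok=True)
        path = dropin_dir / "affinity.conf"
        path.write_text(render_dropin(cpu))
        written.append(path)
    return written


if __name__ == "__main__":
    # Dry run into a temporary directory instead of /etc/systemd/system.
    plan = plan_affinity(["versus-2", "coop-1", "versus-1"], cpu_count=4)
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_dropins(plan, Path(tmp)):
            print(path.relative_to(tmp), "->", path.read_text().splitlines()[-1])
```

If real drop-ins are written under `/etc/systemd/system/`, each affected instance still needs `systemctl daemon-reload` and a restart, because `CPUAffinity=` is applied when the service process starts. Drop-ins in an instance-specific directory apply only to that instance.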
### Files changed / added

```
deploy/files/usr/local/lib/systemd/system/left4me-server@.service  (modified)
deploy/files/usr/local/lib/systemd/system/l4d2-game.slice          (new)
deploy/files/usr/local/lib/systemd/system/l4d2-build.slice         (new)
deploy/files/etc/sysctl.d/99-left4me.conf                          (new)
deploy/files/usr/local/libexec/left4me/left4me-script-sandbox      (modified)
deploy/deploy-test-server.sh                                       (modified: sysctl --system step)
deploy/README.md                                                   (modified: performance section)
deploy/tests/test_deploy_artifacts.py                              (modified: assertions)
```

## Tests

`deploy/tests/test_deploy_artifacts.py` additions, following the existing `assert "key=value" in text` pattern:

- For `left4me-server@.service`, assert every line listed in *Per-instance unit additions* is present verbatim. Each is a separate assertion so a failing line is identifiable.
- For `l4d2-game.slice`, assert `CPUWeight=1000` and `IOWeight=1000`.
- For `l4d2-build.slice`, assert `CPUWeight=10` and `IOWeight=10`.
- For `99-left4me.conf`, assert every sysctl line listed in *Host sysctls*.
- For `left4me-script-sandbox`, assert the strings `--slice=l4d2-build.slice` and `OOMScoreAdjust=500` both appear.
- Assert the deploy script invokes `sysctl --system` (or `sysctl -p /etc/sysctl.d/99-left4me.conf`) at least once after copying the conf into place.

No runtime perf tests in v1: the spec ships defaults, not measured wins. Real-world measurement is left to operators with concrete instance counts, hardware, and player loads.

## Rollout

Single deploy. Running game servers will not pick up the new directives until each instance is restarted: process-level settings such as `Slice=`, `Nice=`, and `OOMScoreAdjust=` are applied only when the service starts. The web UI's "stop" + "start" cycle is sufficient. Document this in `deploy/README.md`.

## Open questions

None blocking. v2 candidates if measurement justifies them:

- Per-instance `CPUAffinity` driven by a deploy-env knob (`LEFT4ME_INSTANCE_CPUS`).
- Job-worker awareness of "server has active players" to defer builds further than weights alone.
- Optional `left4me-host-perf.service` oneshot that sets governor + NIC tuning under a single env-flag opt-in.

## References

- systemd.exec(5): `Nice=`, `IOSchedulingClass=`, `IOSchedulingPriority=`, `OOMScoreAdjust=`, `LimitNOFILE=`, `LogRateLimitIntervalSec=`.
- systemd.resource-control(5): slice semantics, `CPUWeight=`, `IOWeight=`, `CPUQuota=`, `MemoryHigh=`, `MemoryMax=`, `TasksMax=`, weight competition rules.
- systemd.kill(5): signal handling and `KillSignal=`.
- systemd.service(5): `TimeoutStopSec=`, `RuntimeMaxSec=`.
- Red Hat Enterprise Linux Network Performance Tuning Guide: `rmem_max`/`wmem_max`/`netdev_max_backlog`/`netdev_budget`.
- LWN, "SCHED_FIFO and realtime throttling"; RHEL Real-Time CPU throttling docs: rationale for not shipping RT by default.
- Linux Foundation real-time wiki: `sched_rt_runtime_us` semantics.
- forums.srcds.com / AlliedModders / linuxquestions.org threads: confirmation that srcds is single-threaded per instance.
- Phoronix governor comparisons: `performance` vs `schedutil` for sustained workloads.
- Multiple latency-tuning guides: `vm.swappiness=10` consensus.