Approach A: per-instance unit directives (Nice, OOM, Memory caps, KillSignal=SIGINT, log-rate disable), flat l4d2-game/l4d2-build slice hierarchy with 100:1 CPU/IO weight ratio, sandbox into build slice with OOMScoreAdjust=500, host sysctls for UDP buffers + netdev backlog/budget + vm.swappiness. SCHED_FIFO, CPU governor, CPUAffinity, NIC tuning are documented escape hatches, not auto-applied. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
l4d2 server host perf baseline — design
Date: 2026-05-09 Status: design
Summary
Apply a host-side performance and resource-isolation baseline to every L4D2 server instance, using systemd unit directives, a slice hierarchy, and host sysctls. The blueprint-level game configuration (tickrate, sv_minrate/maxrate, fps_max, plugins) stays the responsibility of the individual server maintainer and is out of scope.
Goals
- Game-server processes get measurable scheduling, I/O, and OOM priority over the script-build sandbox and over interactive system traffic.
- One misbehaving server cannot OOM-kill its siblings or the host.
- The kernel's UDP path is sized for sustained Source-engine traffic instead of distro defaults.
- Operators have documented escape hatches for host-specific tuning (CPU pinning, governor, NIC IRQs, real-time scheduling) without any of it being imposed by default.
Non-goals
- ConVars, blueprint arguments, plugins, tickrate, rate values — owned by the maintainer of each server.
- Real-time (SCHED_FIFO/SCHED_RR) scheduling for game servers. Documented as opt-in only; see Out-of-scope rationale.
- CPU governor changes. Documented opt-in only.
- Per-instance CPUAffinity. Host-specific; documented only.
- NIC ring-buffer / IRQ-pinning changes. Hardware-specific; documented only.
- Job-scheduler awareness ("don't build a script overlay while server X has players"). Cgroup weights cover this in v1; revisit if real-world data disagrees.
- Hardening tightening (ProtectKernelTunables=yes, etc.). Security-focused, separate spec.
Background
Current state (commit 965b67e):
- deploy/files/usr/local/lib/systemd/system/left4me-server@.service runs srcds_run as user left4me with security hardening (NoNewPrivileges, PrivateTmp, PrivateDevices, ProtectHome, ProtectSystem=strict, ReadOnlyPaths, ReadWritePaths, RestrictSUIDSGID, LockPersonality) but no scheduling, memory, OOM, kill-signal, or log-rate directives.
- deploy/files/usr/local/libexec/left4me/left4me-script-sandbox runs script-overlay builds via systemd-run --scope with CPUQuota=200% and RuntimeMaxSec=3600, but in the default cgroup: it competes against game servers as an equal sibling under system.slice.
- No host sysctls are deployed. Linux defaults (rmem_max/wmem_max of 212992 bytes, about 208 KB, on common 64-bit kernels; netdev_max_backlog=1000) are below what sustained UDP gameplay across multiple instances expects.
srcds is single-threaded per instance, so multi-instance hosts contend over CPU cycles, kernel softirq budget, and journald rate limits.
Design
Slice topology
Flat top-level slices, siblings of system.slice and user.slice:
-.slice
├── system.slice (default CPUWeight=100, IOWeight=100)
├── user.slice (default CPUWeight=100, IOWeight=100)
├── l4d2-game.slice (CPUWeight=1000, IOWeight=1000)
└── l4d2-build.slice (CPUWeight=10, IOWeight=10)
Rationale:
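- The 100:1 weight ratio between game and build means that under contention the build sandbox gets about 1% of the CPU time the game slice gets; when uncontended, the build is limited only by its own CPUQuota=200% (two CPUs' worth).
- Flat (not nested under system.slice) so a logged-in admin running a heavy task in user.slice competes at weight 100 against the game slice's 1000, rather than against a system.slice that the game servers would have to share with every other service. The admin can take only a weight-proportional share of cycles from a live match.
- Naming caveat: systemd encodes slice nesting in dashes (foo-bar.slice is a child of foo.slice), so slices named l4d2-game.slice and l4d2-build.slice are created under an implicit l4d2.slice with the default CPUWeight=100, which competes 1:1 with system.slice and user.slice. Keeping the topology flat as drawn needs dash-free names, or an l4d2.slice unit that carries the high CPUWeight=/IOWeight= itself.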
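To make the weight arithmetic concrete, here is a small sketch that splits CPU among sibling cgroups in proportion to CPUWeight=, level by level, assuming every leaf is CPU-bound at the same moment (idle siblings hand their share to busy ones, so these are worst-case shares for a busy slice). It evaluates the flat layout drawn above and, for comparison, the layout systemd builds when dash-separated names place both slices under an implicit l4d2.slice. The tree literals and contended_shares are illustrative, not read from a host, and CPUQuota= caps are ignored.

```python
from fractions import Fraction

# name -> (CPUWeight, children); an empty dict marks a leaf cgroup.
FLAT = {
    "system.slice": (100, {}),
    "user.slice": (100, {}),
    "l4d2-game.slice": (1000, {}),
    "l4d2-build.slice": (10, {}),
}

# Same unit names as systemd nests them: "l4d2-" puts both slices under an
# implicit l4d2.slice that keeps the default weight of 100.
NESTED = {
    "system.slice": (100, {}),
    "user.slice": (100, {}),
    "l4d2.slice": (100, {
        "l4d2-game.slice": (1000, {}),
        "l4d2-build.slice": (10, {}),
    }),
}


def contended_shares(children, share=Fraction(1)):
    """Split `share` among siblings by weight, recursing into child slices."""
    total = sum(weight for weight, _ in children.values())
    shares = {}
    for name, (weight, grandchildren) in children.items():
        portion = share * weight / total
        if grandchildren:
            shares.update(contended_shares(grandchildren, portion))
        else:
            shares[name] = portion
    return shares


if __name__ == "__main__":
    for label, tree in (("flat", FLAT), ("nested", NESTED)):
        for name, value in contended_shares(tree).items():
            print(f"{label:6} {name:18} {float(value):6.1%}")
```

Run as a script, it reports about 83% of contended CPU for l4d2-game.slice in the flat layout versus about 33% in the nested one.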
Per-instance unit additions (left4me-server@.service)
Add to [Service]:
Slice=l4d2-game.slice
Nice=-5
IOSchedulingClass=best-effort
IOSchedulingPriority=4
OOMScoreAdjust=-200
MemoryHigh=1.5G
MemoryMax=2G
TasksMax=256
LimitNOFILE=65536
KillSignal=SIGINT
TimeoutStopSec=15s
LogRateLimitIntervalSec=0
Per-directive justification:
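- Slice=l4d2-game.slice: places the instance in the high-weight slice.
- Nice=-5: modest CFS priority bump. A negative Nice= set by systemd does not require CAP_SYS_NICE, because systemd applies the value before dropping to the unit user. SCHED_FIFO is intentionally rejected; see Out-of-scope rationale.
- IOSchedulingClass=best-effort + IOSchedulingPriority=4: makes the I/O class and level explicit instead of deriving them from the nice value; deterministic and harmless. Note that 4 is not a bump: it is the middle of the best-effort range (0 highest, 7 lowest) and the level the kernel derives for a nice-0 process. With Nice=-5 and no explicit setting, the kernel would derive (-5 + 20) / 5 = 3, so an actual bump needs IOSchedulingPriority=3 or lower.
- OOMScoreAdjust=-200: makes game servers unlikely OOM-killer victims under memory pressure; the sandbox dies first (see sandbox section).
- MemoryHigh=1.5G, MemoryMax=2G: soft + hard ceiling. Typical L4D2 srcds runs ~500–800 MB; map-load spikes fit in the headroom. Above MemoryHigh the instance is throttled and its memory reclaimed aggressively; at MemoryMax the OOM killer acts inside that instance's cgroup only, so a runaway is bounded without touching its siblings.
- TasksMax=256: bounds thread count well above srcds' steady-state usage; prevents fork-bomb style failures from leaking host-wide.
- LimitNOFILE=65536: Valve wiki recommendation; cheap and matches multi-plugin setups.
- KillSignal=SIGINT: srcds responds to SIGINT with a clean shutdown (writes demos, flushes logs); SIGTERM is harsher.
- TimeoutStopSec=15s: bounds the clean-shutdown window (systemd's stock default is 90 s); if srcds has not exited 15 s after SIGINT, systemd escalates to SIGKILL.
- LogRateLimitIntervalSec=0: disables journald per-unit rate limiting (default 10000 messages per 30 s). srcds + plugins exceed this on busy maps, and dropped messages break diagnostics.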
Existing security directives are kept verbatim.
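A quick way to sanity-check the MemoryHigh=/MemoryMax= values against the quoted 500–800 MB footprint is to convert them the way systemd does: systemd.resource-control(5) treats K, M, G and T suffixes on memory sizes as base 1024. The sketch below assumes the footprint figures are MiB and handles only plain numbers with those suffixes (no percentages or "infinity"); parse_systemd_size and headroom are hypothetical helper names.

```python
import re

# systemd.resource-control(5): K/M/G/T suffixes on memory sizes are base 1024.
_MULTIPLIER = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
MIB = 1024**2


def parse_systemd_size(value: str) -> int:
    """Convert a size such as '1.5G' to bytes (percent and 'infinity' not handled)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?)\s*", value)
    if match is None:
        raise ValueError(f"unsupported size: {value!r}")
    number, suffix = match.groups()
    return int(float(number) * _MULTIPLIER[suffix])


def headroom(steady_mib: int, memory_high: str, memory_max: str) -> tuple[int, int]:
    """MiB left before reclaim throttling (MemoryHigh) and before an in-cgroup OOM kill (MemoryMax)."""
    steady = steady_mib * MIB
    return (
        (parse_systemd_size(memory_high) - steady) // MIB,
        (parse_systemd_size(memory_max) - steady) // MIB,
    )


if __name__ == "__main__":
    for footprint in (500, 800):
        soft, hard = headroom(footprint, "1.5G", "2G")
        print(f"{footprint} MiB steady: {soft} MiB to MemoryHigh, {hard} MiB to MemoryMax")
```

For the upper 800 MiB figure this leaves 736 MiB before MemoryHigh and 1248 MiB before MemoryMax.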
Slice unit files
New file deploy/files/usr/local/lib/systemd/system/l4d2-game.slice:
[Unit]
Description=left4me game-server slice
Before=slices.target
[Slice]
CPUWeight=1000
IOWeight=1000
New file deploy/files/usr/local/lib/systemd/system/l4d2-build.slice:
[Unit]
Description=left4me script-sandbox build slice
Before=slices.target
[Slice]
CPUWeight=10
IOWeight=10
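Once a game server is running, whether the slice weights actually landed can be checked in the cgroup v2 filesystem. A minimal sketch, assuming a unified (cgroup v2) hierarchy mounted at /sys/fs/cgroup with the cpu and io controllers enabled for the slice; it asks systemd for the slice's ControlGroup path instead of hard-coding it, and slice_cgroup_dir / read_slice_weights are hypothetical names. A slice with no active child may report an empty ControlGroup, which the sketch treats as an error.

```python
import subprocess
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")


def slice_cgroup_dir(unit: str, root: Path = CGROUP_ROOT) -> Path:
    """Ask systemd where the unit's cgroup lives (e.g. for l4d2-game.slice)."""
    out = subprocess.run(
        ["systemctl", "show", "--property=ControlGroup", "--value", unit],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    if not out:
        raise RuntimeError(f"{unit} has no cgroup yet (is it active?)")
    return root / out.lstrip("/")


def read_slice_weights(cgroup_dir: Path) -> dict:
    """Return cpu.weight and the 'default' entry of io.weight for a cgroup v2 directory."""
    cpu_weight = int((cgroup_dir / "cpu.weight").read_text().strip())
    io_weight = None
    for line in (cgroup_dir / "io.weight").read_text().splitlines():
        key, _, value = line.partition(" ")
        if key == "default":  # per-device overrides look like "8:0 200"
            io_weight = int(value)
    return {"cpu.weight": cpu_weight, "io.weight": io_weight}


if __name__ == "__main__":
    for unit in ("l4d2-game.slice", "l4d2-build.slice"):
        print(unit, read_slice_weights(slice_cgroup_dir(unit)))
```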
Sandbox slice + OOM placement
Edit deploy/files/usr/local/libexec/left4me/left4me-script-sandbox to add to the systemd-run --scope invocation:
--slice=l4d2-build.slice
-p OOMScoreAdjust=500
Existing CPUQuota=200% and RuntimeMaxSec=3600 stay. Cgroup weight (slice) and CPU quota (per-scope) compose: weight handles contention, quota handles the absolute ceiling.
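For reference, the resulting invocation can be written out as an argument vector. This is a hypothetical Python rendering (build_sandbox_argv is not the script's actual code, which may be shell and may pass further properties); it shows where the new --slice= option and -p property sit relative to the existing CPUQuota=/RuntimeMaxSec= properties. OOMScoreAdjust= is documented in systemd.exec(5) rather than systemd.resource-control(5), and a scope's process is started by systemd-run itself rather than by the service manager, so it is worth confirming on the target systemd version that the transient scope accepts the property.

```python
import shlex


def build_sandbox_argv(command: list[str]) -> list[str]:
    """Hypothetical helper mirroring the sandbox's systemd-run --scope call."""
    return [
        "systemd-run",
        "--scope",
        "--slice=l4d2-build.slice",   # new: low-weight build slice
        "-p", "OOMScoreAdjust=500",   # new: preferred OOM victim over game servers (-200)
        "-p", "CPUQuota=200%",        # existing: absolute ceiling of two CPUs
        "-p", "RuntimeMaxSec=3600",   # existing: stop the scope after one hour
        "--",
        *command,
    ]


if __name__ == "__main__":
    # Hypothetical build command; prints a shell-quoted line for inspection.
    print(shlex.join(build_sandbox_argv(["bash", "build-overlay.sh"])))
```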
Host sysctls
New file deploy/files/etc/sysctl.d/99-left4me.conf:
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.core.rmem_default = 524288
net.core.wmem_default = 524288
net.core.netdev_max_backlog = 5000
net.core.netdev_budget = 600
vm.swappiness = 10
Per-value justification:
- rmem_max/wmem_max = 8 MB: the Linux default (212992 bytes, about 208 KB, on common 64-bit kernels) is a known bottleneck for sustained UDP. 8 MB is the standard 1 Gbit recommendation (Red Hat performance guide). Both are per-socket caps on what setsockopt(SO_RCVBUF/SO_SNDBUF) may request, not a pool shared between instances, so every instance's socket can reach 8 MB on its own; aggregate UDP memory is governed separately by net.ipv4.udp_mem.
- rmem_default/wmem_default = 512 KB: sets the buffer size for sockets that never call setsockopt(SO_RCVBUF/SO_SNDBUF); harmless when they do.
- netdev_max_backlog = 5000: the default of 1000 overflows under multi-instance UDP bursts, and the per-CPU softnet queue drops packets once it is full.
- netdev_budget = 600: lets each softirq pass drain more packets; the default of 300 is undersized for multi-Gbit-class hosts.
- vm.swappiness = 10: widely recommended for latency-sensitive servers; harmless on swapless hosts.
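After deploy, drift between the shipped file and the running kernel can be spotted by reading the matching files under /proc/sys, where dots in a key map to path separators (net.core.rmem_max is /proc/sys/net/core/rmem_max). The sketch below is a hypothetical helper, not part of the deploy tooling; it handles the sysctl.d comment characters (# and ;) and the optional leading "-" that tells systemd-sysctl to ignore failures, but not glob patterns or "/"-separated keys.

```python
from pathlib import Path


def parse_sysctl_conf(text: str) -> dict[str, str]:
    """Parse sysctl.d-style 'key = value' lines into a dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip().lstrip("-")] = value.strip()
    return settings


def drifted(settings: dict[str, str], proc_sys: Path = Path("/proc/sys")) -> dict:
    """Map each key whose live value differs to (wanted, actual); actual is None if absent."""
    mismatches = {}
    for key, wanted in settings.items():
        path = proc_sys.joinpath(*key.split("."))
        actual = " ".join(path.read_text().split()) if path.exists() else None
        if actual != " ".join(wanted.split()):
            mismatches[key] = (wanted, actual)
    return mismatches


if __name__ == "__main__":
    conf = Path("/etc/sysctl.d/99-left4me.conf").read_text()
    for key, (wanted, actual) in drifted(parse_sysctl_conf(conf)).items():
        print(f"{key}: want {wanted}, kernel has {actual}")
```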
Deploy script integration
deploy/deploy-test-server.sh must:
- Copy etc/sysctl.d/99-left4me.conf to /etc/sysctl.d/.
- Run sysctl --system (or sysctl -p /etc/sysctl.d/99-left4me.conf) so values take effect immediately, not on next boot.
- Copy the two .slice files into /usr/local/lib/systemd/system/.
- Run systemctl daemon-reload after unit/slice changes (already done in current deploy flow).
- No explicit systemctl start of the slices is required; they activate on first child reference.
Documented escape hatches (no auto-apply)
Append a "Performance tuning" section to deploy/README.md:
- CPU governor: cpupower frequency-set -g performance if jitter under load matters more than power. schedutil is acceptable for sustained UDP workloads. Provide the one-liner; do not ship a oneshot service in v1.
- CPU affinity per instance: example drop-in at /etc/systemd/system/left4me-server@<name>.service.d/affinity.conf setting CPUAffinity=N. Document the strategy "one instance per core, leave core 0 for system + IRQ".
- NIC tuning: example ethtool -G <iface> rx 4096 tx 4096, IRQ-pinning hints. Hardware-specific; ops-only.
- Real-time scheduling opt-in: example drop-in adding CPUSchedulingPolicy=fifo, CPUSchedulingPriority=10, LimitRTPRIO=10. Include a one-paragraph warning citing RT-throttling defaults (sched_rt_runtime_us=950000) and the failure mode if a single instance misbehaves.
These stay pure documentation in v1 — no code paths, no tests asserting them.
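The "one instance per core, leave core 0 for system + IRQ" strategy is mechanical enough that the README could show how the drop-ins are derived. A hypothetical sketch, meant as README illustration rather than shipped code (plan_affinity, dropin_path, render_dropin and the instance names are all illustrative): it gives each instance its own CPU, skipping reserved cores, and renders the drop-in path and body. Each instance still needs a daemon-reload and a restart to pick up the drop-in.

```python
import os
from pathlib import Path

DROPIN_ROOT = Path("/etc/systemd/system")


def plan_affinity(instances: list[str], cpus: list[int], reserved=(0,)) -> dict[str, int]:
    """Give each instance a dedicated CPU, never using a reserved core."""
    usable = [cpu for cpu in sorted(cpus) if cpu not in reserved]
    if len(instances) > len(usable):
        raise ValueError(f"{len(instances)} instances but only {len(usable)} usable CPUs")
    return dict(zip(instances, usable))


def dropin_path(instance: str) -> Path:
    """Per-instance drop-in location used in the README example."""
    return DROPIN_ROOT / f"left4me-server@{instance}.service.d" / "affinity.conf"


def render_dropin(cpu: int) -> str:
    """Drop-in body pinning the instance to a single CPU."""
    return f"[Service]\nCPUAffinity={cpu}\n"


if __name__ == "__main__":
    # Hypothetical instance names; CPUs come from this process's allowed set (Linux only).
    plan = plan_affinity(["coop1", "versus1"], list(os.sched_getaffinity(0)))
    for name, cpu in plan.items():
        print(dropin_path(name))
        print(render_dropin(cpu))
```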
Out-of-scope rationale
- SCHED_FIFO: a misbehaving srcds at any RT priority can starve kernel threads and produce failure modes that are harder to diagnose than the jitter problem it is meant to solve. Nice=-5 plus the slice weights captures the practical benefit. Ops who need RT can opt in via the documented drop-in.
- CPU governor auto-set: Phoronix and Arch comparisons show schedutil is within noise of performance on sustained workloads like Source UDP; aggressively forcing performance would surprise users on power-managed hosts.
- CPUAffinity in the unit: the unit template is shared across all instances; a single hard-coded CPUAffinity= would pin every instance to the same cores, defeating the purpose. Per-instance pinning needs deploy-time policy that is outside v1's scope.
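The RT-throttling defaults quoted for the opt-in warning translate into a concrete worst case. With sched_rt_period_us=1000000 and sched_rt_runtime_us=950000, real-time tasks are allowed at most 95% of each one-second period, leaving non-RT work (including SCHED_OTHER kernel threads such as ksoftirqd, which drains deferred network softirqs) a nominal floor of about 50 ms per second; how that budget is accounted across CPUs varies by kernel version, and sched_rt_runtime_us=-1 disables throttling entirely. A small sketch that reads the live values (non_rt_fraction is a hypothetical helper):

```python
from pathlib import Path


def non_rt_fraction(proc_sys: Path = Path("/proc/sys")) -> float:
    """Nominal share of each RT period left to non-real-time tasks."""
    kernel = proc_sys / "kernel"
    period_us = int((kernel / "sched_rt_period_us").read_text())
    runtime_us = int((kernel / "sched_rt_runtime_us").read_text())
    if runtime_us < 0:  # -1 disables RT throttling: no floor at all
        return 0.0
    return (period_us - runtime_us) / period_us


if __name__ == "__main__":
    print(f"non-RT floor: {non_rt_fraction():.1%} of each RT period")
```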
Files changed / added
deploy/files/usr/local/lib/systemd/system/left4me-server@.service (modified)
deploy/files/usr/local/lib/systemd/system/l4d2-game.slice (new)
deploy/files/usr/local/lib/systemd/system/l4d2-build.slice (new)
deploy/files/etc/sysctl.d/99-left4me.conf (new)
deploy/files/usr/local/libexec/left4me/left4me-script-sandbox (modified)
deploy/deploy-test-server.sh (modified — sysctl --system step)
deploy/README.md (modified — performance section)
deploy/tests/test_deploy_artifacts.py (modified — assertions)
Tests
deploy/tests/test_deploy_artifacts.py additions, following the existing assert "key=value" in text pattern:
- For left4me-server@.service, assert every line listed in Per-instance unit additions is present verbatim. Each is a separate assertion so a failing line is identifiable.
- For l4d2-game.slice, assert CPUWeight=1000 and IOWeight=1000.
- For l4d2-build.slice, assert CPUWeight=10 and IOWeight=10.
- For 99-left4me.conf, assert every sysctl line listed in Host sysctls.
- For left4me-script-sandbox, assert the strings --slice=l4d2-build.slice and OOMScoreAdjust=500 both appear.
- Assert the deploy script invokes sysctl --system (or sysctl -p /etc/sysctl.d/99-left4me.conf) at least once after copying the conf into place.
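The last assertion is about order, which the plain assert "..." in text pattern cannot express on its own. One way to keep it in the same string-matching style is a small line-order helper. This is a sketch: sysctl_applied_after_copy is a hypothetical name, and treating any non-comment line that mentions 99-left4me.conf together with cp, install, rsync or scp as the copy step is an assumption about how the deploy script is written.

```python
COPY_COMMANDS = {"cp", "install", "rsync", "scp"}
RELOAD_COMMANDS = ("sysctl --system", "sysctl -p /etc/sysctl.d/99-left4me.conf")


def sysctl_applied_after_copy(script: str) -> bool:
    """True if a sysctl reload appears on a later line than the first conf copy."""
    lines = [line.strip() for line in script.splitlines()]
    # Ignore blank lines and shell comments (including the shebang).
    code = [(i, line) for i, line in enumerate(lines) if line and not line.startswith("#")]
    copy_lines = [
        i for i, line in code
        if "99-left4me.conf" in line and COPY_COMMANDS & set(line.split())
    ]
    if not copy_lines:
        return False
    first_copy = copy_lines[0]
    return any(
        i > first_copy and any(cmd in line for cmd in RELOAD_COMMANDS)
        for i, line in code
    )
```

In test_deploy_artifacts.py it would sit next to the existing string assertions as a single assert sysctl_applied_after_copy(script_text).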
No runtime perf tests in v1 — the spec ships defaults, not measured wins. Real-world measurement is left to operators with concrete instance counts, hardware, and player loads.
Rollout
Single deploy. Running game servers will not pick up the new directives until each instance is restarted (systemd does not reapply unit changes to already-running services). The web UI's "stop" + "start" cycle is sufficient. Document this in deploy/README.md.
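Whether a given instance has been restarted onto the new unit can be spot-checked from the kernel's view of its cgroup. A minimal sketch, assuming cgroup v2 (the single 0::/path line in /proc/<pid>/cgroup) and that the PID comes from systemctl show --property=MainPID --value left4me-server@<name>.service; cgroup_v2_path and running_in_game_slice are hypothetical helpers. It matches l4d2-game.slice as a path component, so it does not depend on where the slice sits in the hierarchy.

```python
import sys
from pathlib import Path
from typing import Optional


def cgroup_v2_path(proc_cgroup_text: str) -> Optional[str]:
    """Extract the unified-hierarchy path from /proc/<pid>/cgroup content."""
    for line in proc_cgroup_text.splitlines():
        hierarchy_id, _, rest = line.partition(":")
        controllers, _, path = rest.partition(":")
        if hierarchy_id == "0" and controllers == "":
            return path
    return None


def running_in_game_slice(proc_cgroup_text: str) -> bool:
    """True if the process's cgroup path contains l4d2-game.slice."""
    path = cgroup_v2_path(proc_cgroup_text)
    if path is None:
        raise ValueError("no cgroup v2 entry; is the host on the unified hierarchy?")
    return "l4d2-game.slice" in path.split("/")


if __name__ == "__main__":
    pid = int(sys.argv[1])  # MainPID of a left4me-server@<name>.service instance
    text = Path(f"/proc/{pid}/cgroup").read_text()
    print("ok" if running_in_game_slice(text) else "restart needed to move into l4d2-game.slice")
```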
Open questions
None blocking. v2 candidates if measurement justifies them:
- Per-instance CPUAffinity driven by a deploy-env knob (LEFT4ME_INSTANCE_CPUS).
- Job-worker awareness of "server has active players" to defer builds further than weights alone.
- Optional left4me-host-perf.service oneshot that sets governor + NIC tuning under a single env-flag opt-in.
References
- systemd.exec(5): Nice=, IOSchedulingClass=, IOSchedulingPriority=, OOMScoreAdjust=, LimitNOFILE=, LogRateLimitIntervalSec=.
- systemd.resource-control(5): slice semantics, CPUWeight=, IOWeight=, MemoryHigh=, MemoryMax=, TasksMax=, weight competition rules.
- systemd.kill(5): signal handling and KillSignal=.
- systemd.service(5): TimeoutStopSec=.
- Red Hat Enterprise Linux Network Performance Tuning Guide: rmem_max/wmem_max/netdev_max_backlog/netdev_budget.
- LWN "SCHED_FIFO and realtime throttling"; RHEL Real-Time CPU throttling docs: rationale for not shipping RT by default.
- Linux Foundation real-time wiki: sched_rt_runtime_us semantics.
- forums.srcds.com / AlliedModders / linuxquestions.org threads: confirmation that srcds is single-threaded per instance.
- Phoronix governor comparisons: performance vs schedutil for sustained workloads.
- Multiple latency-tuning guides: vm.swappiness=10 consensus.