docs(plans): l4d2 server host perf baseline — implementation plan
Six tasks (TDD, one commit each): unit directives, slice files, sysctl conf, sandbox slice + OOMScoreAdjust, deploy-script wiring, README escape-hatch section. Final verification step with full deploy + host + web pytest sweep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent b6574e308b
commit 851e6629aa
1 changed files with 686 additions and 0 deletions
# L4D2 Server Host Perf Baseline Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Apply a host-side performance and resource-isolation baseline (systemd directives, slice hierarchy, host sysctls) to every L4D2 server instance, leaving game ConVars to the maintainer.

**Architecture:** Add resource-control directives to `left4me-server@.service`; introduce two flat top-level slices (`l4d2-game.slice` weight 1000, `l4d2-build.slice` weight 10) so the kernel gives the build sandbox only a small share of CPU and I/O under contention; ship `/etc/sysctl.d/99-left4me.conf` for UDP buffer, netdev, and swappiness tuning; place the script-sandbox transient unit into `l4d2-build.slice` with `OOMScoreAdjust=500`. RT scheduling, CPU governor, `CPUAffinity`, and NIC tuning are documentation-only escape hatches.

**Tech Stack:** systemd unit files (service + slice), `systemd-run` properties, Linux sysctl, bash deploy script, pytest text-assertion tests under `deploy/tests/test_deploy_artifacts.py`.

**Spec:** `docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md`

---

## File Structure

Files to create:

- `deploy/files/usr/local/lib/systemd/system/l4d2-game.slice`: high-weight slice for game-server instances.
- `deploy/files/usr/local/lib/systemd/system/l4d2-build.slice`: low-weight slice for sandboxed script-overlay builds.
- `deploy/files/etc/sysctl.d/99-left4me.conf`: host UDP/netdev/swap sysctls.

Files to modify:

- `deploy/files/usr/local/lib/systemd/system/left4me-server@.service`: add resource-control directives (`Slice`, `Nice`, `IOSchedulingClass`, `IOSchedulingPriority`, `OOMScoreAdjust`, `MemoryHigh`, `MemoryMax`, `TasksMax`, `LimitNOFILE`, `KillSignal`, `TimeoutStopSec`, `LogRateLimitIntervalSec`).
- `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`: add `--slice=l4d2-build.slice` and `-p OOMScoreAdjust=500` to the `systemd-run` invocation.
- `deploy/deploy-test-server.sh`: copy the two slice files and the sysctl conf during deploy; run `sysctl --system` so values take effect immediately.
- `deploy/README.md`: append a "Performance tuning" section with the four documented escape hatches.
- `deploy/tests/test_deploy_artifacts.py`: new tests for each artifact above (text assertions following the existing `assert "X" in text` style).

No application code (Python, Flask, host library) is touched.

---

## Pre-flight

- [ ] **Step 0a: Verify clean working tree**

Run: `git status`
Expected: `nothing to commit, working tree clean`

- [ ] **Step 0b: Verify the existing deploy tests pass**

Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py -q`
Expected: all green.

If any test is already red, stop and report it; this plan assumes the baseline is green.

---

## Task 1: Per-Instance Unit Resource-Control Directives

Add the per-instance baseline to `left4me-server@.service`. This task is self-contained even though `Slice=l4d2-game.slice` references a slice that doesn't exist yet: systemd does not validate the reference until the unit is actually started, and the deploy artifact tests are pure text checks.

**Files:**
- Modify: `deploy/files/usr/local/lib/systemd/system/left4me-server@.service`
- Test: `deploy/tests/test_deploy_artifacts.py` (new test function)

- [ ] **Step 1.1: Add the failing test**

Open `deploy/tests/test_deploy_artifacts.py` and append (after `test_server_unit_contains_required_runtime_contract`):

```python
def test_server_unit_contains_perf_baseline_directives():
    unit = SERVER_UNIT.read_text()

    # Slice membership.
    assert "Slice=l4d2-game.slice" in unit

    # CFS priority bump (no SCHED_FIFO).
    assert "Nice=-5" in unit
    assert "CPUSchedulingPolicy=" not in unit

    # I/O priority.
    assert "IOSchedulingClass=best-effort" in unit
    assert "IOSchedulingPriority=4" in unit

    # OOM ordering: game servers survive, sandbox dies first.
    assert "OOMScoreAdjust=-200" in unit

    # Memory caps with headroom for map-load spikes.
    assert "MemoryHigh=1.5G" in unit
    assert "MemoryMax=2G" in unit

    # Bounded fork surface.
    assert "TasksMax=256" in unit

    # Plenty of fds for plugin-heavy setups.
    assert "LimitNOFILE=65536" in unit

    # srcds clean shutdown via SIGINT, with time to flush.
    assert "KillSignal=SIGINT" in unit
    assert "TimeoutStopSec=15s" in unit

    # Per-unit override of journald rate limiting (default drops srcds output).
    assert "LogRateLimitIntervalSec=0" in unit
```

- [ ] **Step 1.2: Run the new test, verify it fails**

Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_server_unit_contains_perf_baseline_directives -v`
Expected: FAIL; the first failing assert is on `Slice=l4d2-game.slice`.

- [ ] **Step 1.3: Edit the unit file**

Open `deploy/files/usr/local/lib/systemd/system/left4me-server@.service` and replace its contents with:

```ini
[Unit]
Description=left4me server instance %i
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=left4me
Group=left4me
EnvironmentFile=/etc/left4me/host.env
EnvironmentFile=/var/lib/left4me/instances/%i/instance.env
WorkingDirectory=/var/lib/left4me/runtime/%i/merged/left4dead2
ExecStart=/var/lib/left4me/installation/srcds_run -game left4dead2 +hostport ${L4D2_PORT} $L4D2_ARGS
Restart=on-failure
RestartSec=5

# Resource control baseline; see docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md
Slice=l4d2-game.slice
Nice=-5
IOSchedulingClass=best-effort
IOSchedulingPriority=4
OOMScoreAdjust=-200
MemoryHigh=1.5G
MemoryMax=2G
TasksMax=256
LimitNOFILE=65536
KillSignal=SIGINT
TimeoutStopSec=15s
LogRateLimitIntervalSec=0

# Hardening (unchanged from previous baseline).
NoNewPrivileges=true
PrivateTmp=true
PrivateDevices=true
ProtectHome=true
ProtectSystem=strict
ReadOnlyPaths=/var/lib/left4me/installation /var/lib/left4me/overlays
ReadWritePaths=/var/lib/left4me/runtime/%i
RestrictSUIDSGID=true
LockPersonality=true

[Install]
WantedBy=multi-user.target
```

- [ ] **Step 1.4: Run the new test, verify it passes**

Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_server_unit_contains_perf_baseline_directives -v`
Expected: PASS.

- [ ] **Step 1.5: Re-run the existing server-unit test, verify it still passes**

Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_server_unit_contains_required_runtime_contract -v`
Expected: PASS; the existing assertions (`User=left4me`, `Group=left4me`, hardening directives, etc.) still match.

- [ ] **Step 1.6: Commit**

```bash
git add deploy/files/usr/local/lib/systemd/system/left4me-server@.service deploy/tests/test_deploy_artifacts.py
git commit -m "$(cat <<'EOF'
feat(deploy): perf-baseline directives on left4me-server@.service

Slice=l4d2-game.slice, Nice=-5, IOSchedulingClass=best-effort,
OOMScoreAdjust=-200, MemoryHigh=1.5G, MemoryMax=2G, TasksMax=256,
LimitNOFILE=65536, KillSignal=SIGINT, TimeoutStopSec=15s,
LogRateLimitIntervalSec=0. Spec:
docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md
EOF
)"
```
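
The Task 1 test only checks that each directive string is present. As a minimal sketch of a semantic check that could sit alongside it, the snippet below parses `Key=Value` lines and confirms that the `MemoryHigh` throttle point lies below the `MemoryMax` hard cap. The helper names `parse_size` and `unit_directives` are hypothetical (they are not part of the repository), the sample text is a trimmed copy of the unit above, and the suffix handling assumes systemd's base-1024 size units.

```python
# Hypothetical helpers for illustration; not part of the left4me repository.

# Assumption: systemd size suffixes are powers of 1024.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(value: str) -> int:
    """Convert a systemd-style size such as '1.5G' or '2G' to bytes."""
    value = value.strip()
    if value and value[-1].upper() in _UNITS:
        return int(float(value[:-1]) * _UNITS[value[-1].upper()])
    return int(value)


def unit_directives(text: str) -> dict[str, list[str]]:
    """Collect Key=Value lines, keeping repeats such as EnvironmentFile=."""
    found: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and section headers like [Service].
        if not line or line.startswith(("#", ";", "[")) or "=" not in line:
            continue
        key, _, val = line.partition("=")
        found.setdefault(key.strip(), []).append(val.strip())
    return found


sample = """\
[Service]
EnvironmentFile=/etc/left4me/host.env
EnvironmentFile=/var/lib/left4me/instances/%i/instance.env
MemoryHigh=1.5G
MemoryMax=2G
"""

directives = unit_directives(sample)
high = parse_size(directives["MemoryHigh"][0])
maximum = parse_size(directives["MemoryMax"][0])
# The throttle threshold must sit below the hard cap to leave any headroom.
assert high < maximum
```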

---

## Task 2: Slice Unit Files

Create the two slice unit files. After this task, the `Slice=l4d2-game.slice` reference in `left4me-server@.service` points at a unit file that exists.

**Files:**
- Create: `deploy/files/usr/local/lib/systemd/system/l4d2-game.slice`
- Create: `deploy/files/usr/local/lib/systemd/system/l4d2-build.slice`
- Test: `deploy/tests/test_deploy_artifacts.py` (new constants + new test functions)

- [ ] **Step 2.1: Add path constants and failing tests**

Open `deploy/tests/test_deploy_artifacts.py`. After the existing `SERVER_UNIT = ...` line, add:

```python
GAME_SLICE = DEPLOY / "files/usr/local/lib/systemd/system/l4d2-game.slice"
BUILD_SLICE = DEPLOY / "files/usr/local/lib/systemd/system/l4d2-build.slice"
```

After the new `test_server_unit_contains_perf_baseline_directives`, append:

```python
def test_l4d2_game_slice_exists_with_high_weights():
    assert GAME_SLICE.is_file()
    text = GAME_SLICE.read_text()
    assert "[Slice]" in text
    assert "CPUWeight=1000" in text
    assert "IOWeight=1000" in text


def test_l4d2_build_slice_exists_with_low_weights():
    assert BUILD_SLICE.is_file()
    text = BUILD_SLICE.read_text()
    assert "[Slice]" in text
    # Match whole lines here: a plain substring check for "CPUWeight=10"
    # would also pass on "CPUWeight=1000".
    lines = text.splitlines()
    assert "CPUWeight=10" in lines
    assert "IOWeight=10" in lines
```

- [ ] **Step 2.2: Run the new tests, verify they fail**

Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_l4d2_game_slice_exists_with_high_weights deploy/tests/test_deploy_artifacts.py::test_l4d2_build_slice_exists_with_low_weights -v`
Expected: both tests FAIL on their `.is_file()` assertion (neither slice file exists yet).

- [ ] **Step 2.3: Create the game slice file**

Create `deploy/files/usr/local/lib/systemd/system/l4d2-game.slice` with:

```ini
[Unit]
Description=left4me game-server slice
Before=slices.target

[Slice]
CPUWeight=1000
IOWeight=1000
```

- [ ] **Step 2.4: Create the build slice file**

Create `deploy/files/usr/local/lib/systemd/system/l4d2-build.slice` with:

```ini
[Unit]
Description=left4me script-sandbox build slice
Before=slices.target

[Slice]
CPUWeight=10
IOWeight=10
```

- [ ] **Step 2.5: Run the new tests, verify they pass**

Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_l4d2_game_slice_exists_with_high_weights deploy/tests/test_deploy_artifacts.py::test_l4d2_build_slice_exists_with_low_weights -v`
Expected: PASS.

- [ ] **Step 2.6: Commit**

```bash
git add deploy/files/usr/local/lib/systemd/system/l4d2-game.slice deploy/files/usr/local/lib/systemd/system/l4d2-build.slice deploy/tests/test_deploy_artifacts.py
git commit -m "$(cat <<'EOF'
feat(deploy): l4d2-game.slice + l4d2-build.slice with 100:1 weight ratio

Flat top-level slices. Game wins under contention; build still gets
the box when uncontended. Referenced by left4me-server@.service and
the script-sandbox systemd-run invocation.
EOF
)"
```
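
To make the 100:1 ratio concrete: under cgroup v2, contended CPU time is divided among busy sibling groups in proportion to their weights, and idle siblings take nothing. The sketch below is a rough illustration of that arithmetic only. `cpu_shares` is a hypothetical helper, and the second scenario assumes `system.slice` is also busy at systemd's default weight of 100, which is an assumption about the host rather than something this plan configures.

```python
# Illustrative arithmetic only; cpu_shares is a hypothetical helper.


def cpu_shares(weights: dict[str, int]) -> dict[str, float]:
    """Fraction of contended CPU per busy sibling, proportional to weight."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Scenario 1: only the two left4me slices are competing.
shares = cpu_shares({"l4d2-game.slice": 1000, "l4d2-build.slice": 10})

# Scenario 2 (assumption): system.slice is busy too, at the default weight 100.
with_system = cpu_shares(
    {"l4d2-game.slice": 1000, "l4d2-build.slice": 10, "system.slice": 100}
)

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```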

---

## Task 3: Host Sysctls

Ship a `/etc/sysctl.d/` drop-in for UDP buffers, netdev backlog, netdev budget, and `vm.swappiness`.

**Files:**
- Create: `deploy/files/etc/sysctl.d/99-left4me.conf`
- Test: `deploy/tests/test_deploy_artifacts.py` (new constant + new test function)

- [ ] **Step 3.1: Add path constant and failing test**

Open `deploy/tests/test_deploy_artifacts.py`. After the slice constants, add:

```python
SYSCTL_CONF = DEPLOY / "files/etc/sysctl.d/99-left4me.conf"
```

Append a new test:

```python
def test_sysctl_conf_present_with_perf_settings():
    assert SYSCTL_CONF.is_file()
    text = SYSCTL_CONF.read_text()
    for line in (
        "net.core.rmem_max = 8388608",
        "net.core.wmem_max = 8388608",
        "net.core.rmem_default = 524288",
        "net.core.wmem_default = 524288",
        "net.core.netdev_max_backlog = 5000",
        "net.core.netdev_budget = 600",
        "vm.swappiness = 10",
    ):
        assert line in text, f"missing {line!r} in 99-left4me.conf"
```

- [ ] **Step 3.2: Run the new test, verify it fails**

Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_sysctl_conf_present_with_perf_settings -v`
Expected: FAIL on `assert SYSCTL_CONF.is_file()`.

- [ ] **Step 3.3: Create the sysctl conf file**

Create `deploy/files/etc/sysctl.d/99-left4me.conf` with:

```
# Host-side perf baseline for left4me; see
# docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md
#
# UDP socket buffers: typical distro defaults (around 208 KiB, i.e. 212992
# bytes on most current kernels) are too small for sustained Source-engine
# UDP across multiple instances. 8 MiB matches the standard 1 Gbit
# recommendation; rmem_default/wmem_default protect sockets that don't
# explicitly enlarge their buffers.
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.core.rmem_default = 524288
net.core.wmem_default = 524288

# Kernel softirq UDP path: the per-CPU backlog queue starts dropping packets
# at the default 1000 under multi-instance burst; 5000 absorbs realistic peaks.
# netdev_budget = 600 gives softirq more drain headroom per pass.
net.core.netdev_max_backlog = 5000
net.core.netdev_budget = 600

# Latency-sensitive default: avoid swap unless the box is really under
# pressure. Harmless on swapless hosts.
vm.swappiness = 10
```

- [ ] **Step 3.4: Run the new test, verify it passes**

Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_sysctl_conf_present_with_perf_settings -v`
Expected: PASS.

- [ ] **Step 3.5: Commit**

```bash
git add deploy/files/etc/sysctl.d/99-left4me.conf deploy/tests/test_deploy_artifacts.py
git commit -m "$(cat <<'EOF'
feat(deploy): host sysctls for UDP buffers + netdev backlog/budget

99-left4me.conf: rmem_max/wmem_max=8M (with 512K defaults),
netdev_max_backlog=5000, netdev_budget=600, vm.swappiness=10.
EOF
)"
```
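
The Task 3 test matches the exact `key = value` spelling of each line. If a whitespace-tolerant or relational check is ever wanted, one possible approach is to parse the drop-in into a dictionary first. The sketch below is illustrative only: `parse_sysctl_conf` is a hypothetical helper, not an existing function in the repository, and the sample text is a shortened copy of the conf above.

```python
# Hypothetical helper for illustration; not part of the left4me repository.


def parse_sysctl_conf(text: str) -> dict[str, str]:
    """Parse sysctl.d-style 'key = value' lines, skipping comments and blanks."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '='
            settings[key.strip()] = value.strip()
    return settings


sample_conf = """\
# UDP socket buffers
net.core.rmem_max = 8388608
net.core.rmem_default=524288
net.core.netdev_max_backlog = 5000
vm.swappiness = 10
"""

settings = parse_sysctl_conf(sample_conf)
# A default buffer larger than the maximum would be a configuration mistake.
assert int(settings["net.core.rmem_default"]) <= int(settings["net.core.rmem_max"])
```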

---

## Task 4: Sandbox in Build Slice

Place the script-sandbox transient unit into `l4d2-build.slice` and give it `OOMScoreAdjust=500` so it dies first under memory pressure.

**Files:**
- Modify: `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`
- Test: `deploy/tests/test_deploy_artifacts.py` (new test function)

- [ ] **Step 4.1: Add the failing test**

Open `deploy/tests/test_deploy_artifacts.py`. Append:

```python
def test_script_sandbox_in_build_slice_with_oom_adjust():
    text = SCRIPT_SANDBOX_HELPER.read_text()

    # Put the transient unit in the low-weight build slice so it yields to
    # game-server instances under CPU/IO contention.
    assert "--slice=l4d2-build.slice" in text

    # Sandbox dies first if the host hits memory pressure; servers
    # (OOMScoreAdjust=-200) survive.
    assert "-p OOMScoreAdjust=500" in text
```

- [ ] **Step 4.2: Run the new test, verify it fails**

Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_script_sandbox_in_build_slice_with_oom_adjust -v`
Expected: FAIL; neither string is in the helper yet.

- [ ] **Step 4.3: Edit the sandbox helper**

Open `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`. Locate the `systemd-run` invocation that begins with:

```
systemd-run --quiet --collect --wait --pipe \
    --unit="left4me-script-${OVERLAY_ID}-$$" \
```

Insert two new lines immediately after the `--unit=` line, before `-p User=l4d2-sandbox`. The block becomes:

```
systemd-run --quiet --collect --wait --pipe \
    --unit="left4me-script-${OVERLAY_ID}-$$" \
    --slice=l4d2-build.slice \
    -p OOMScoreAdjust=500 \
    -p User=l4d2-sandbox -p Group=l4d2-sandbox \
```

Leave every other `-p` line untouched.

- [ ] **Step 4.4: Verify shell syntax still parses**

Run: `bash -n deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`
Expected: exit 0, no output.

- [ ] **Step 4.5: Run the new test and the existing sandbox-helper tests, verify they pass**

Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_script_sandbox_in_build_slice_with_oom_adjust deploy/tests/test_deploy_artifacts.py::test_script_sandbox_helper_invokes_systemd_run_with_hardening deploy/tests/test_deploy_artifacts.py::test_script_sandbox_helper_passes_shell_syntax_check -v`
Expected: PASS for all three. The hardening test still matches because it only checks for substring presence; this change adds strings and removes none.

- [ ] **Step 4.6: Commit**

```bash
git add deploy/files/usr/local/libexec/left4me/left4me-script-sandbox deploy/tests/test_deploy_artifacts.py
git commit -m "$(cat <<'EOF'
feat(deploy): script-sandbox runs in l4d2-build.slice + OOMScoreAdjust=500

Builds yield CPU/IO to game-server instances under contention via the
slice's weight=10, and are killed first under memory pressure
(servers have OOMScoreAdjust=-200).
EOF
)"
```
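
The reason `OOMScoreAdjust=500` versus `-200` produces the intended kill order can be sketched with a simplified model of the kernel's OOM heuristic: roughly, a process's memory use scaled to a 0..1000 range, plus its `oom_score_adj`. This is an approximation for intuition, not the kernel's exact calculation. The 8 GiB host and the per-process memory figures are made-up assumptions, and `approx_oom_score` is a hypothetical name.

```python
# Simplified model for intuition only; the real kernel heuristic has more inputs.

GIB = 1024**3


def approx_oom_score(used_bytes: int, total_bytes: int, oom_score_adj: int) -> float:
    """Memory share scaled to 0..1000, shifted by the unit's OOMScoreAdjust."""
    return 1000 * used_bytes / total_bytes + oom_score_adj


# Assumed scenario: 8 GiB host, game server at its 1.5 GiB MemoryHigh,
# sandbox build using 256 MiB.
game = approx_oom_score(3 * GIB // 2, 8 * GIB, -200)
sandbox = approx_oom_score(GIB // 4, 8 * GIB, 500)

# The sandbox ranks as the better OOM victim despite using far less memory.
assert sandbox > game
```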

---

## Task 5: Deploy Script Installs Slice + Sysctl Artifacts

Wire the new artifacts into `deploy-test-server.sh` so a fresh deploy actually puts them on disk and applies the sysctls.

**Files:**
- Modify: `deploy/deploy-test-server.sh`
- Test: `deploy/tests/test_deploy_artifacts.py` (new test function)

- [ ] **Step 5.1: Add the failing test**

Open `deploy/tests/test_deploy_artifacts.py`. Append:

```python
def test_deploy_script_installs_perf_artifacts():
    script = DEPLOY_SCRIPT.read_text()

    # Slice files copied into the system-wide systemd unit dir.
    assert "/usr/local/lib/systemd/system/l4d2-game.slice" in script
    assert "/usr/local/lib/systemd/system/l4d2-build.slice" in script

    # Sysctl drop-in installed under /etc/sysctl.d/.
    assert "/etc/sysctl.d/99-left4me.conf" in script

    # Values applied immediately, not on next boot.
    assert "sysctl --system" in script
```

- [ ] **Step 5.2: Run the new test, verify it fails**

Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_deploy_script_installs_perf_artifacts -v`
Expected: FAIL on the first assertion.

- [ ] **Step 5.3: Edit the deploy script: copy the slice files**

Open `deploy/deploy-test-server.sh`. Find the block that copies unit files (currently around line 138):

```sh
$sudo_cmd cp /opt/left4me/deploy/files/usr/local/lib/systemd/system/left4me-web.service /usr/local/lib/systemd/system/left4me-web.service
$sudo_cmd cp /opt/left4me/deploy/files/usr/local/lib/systemd/system/left4me-server@.service /usr/local/lib/systemd/system/left4me-server@.service
```

Add two new lines immediately after the `left4me-server@.service` copy line, so the block becomes:

```sh
$sudo_cmd cp /opt/left4me/deploy/files/usr/local/lib/systemd/system/left4me-web.service /usr/local/lib/systemd/system/left4me-web.service
$sudo_cmd cp /opt/left4me/deploy/files/usr/local/lib/systemd/system/left4me-server@.service /usr/local/lib/systemd/system/left4me-server@.service
$sudo_cmd cp /opt/left4me/deploy/files/usr/local/lib/systemd/system/l4d2-game.slice /usr/local/lib/systemd/system/l4d2-game.slice
$sudo_cmd cp /opt/left4me/deploy/files/usr/local/lib/systemd/system/l4d2-build.slice /usr/local/lib/systemd/system/l4d2-build.slice
```

- [ ] **Step 5.4: Edit the deploy script: install the sysctl conf and apply it**

In `deploy/deploy-test-server.sh`, find the block that installs `/etc/left4me/sandbox-resolv.conf` (currently around lines 153–155):

```sh
$sudo_cmd install -m 0644 -o root -g root \
    /opt/left4me/deploy/files/etc/left4me/sandbox-resolv.conf \
    /etc/left4me/sandbox-resolv.conf
```

Immediately after that block, add:

```sh
# Host perf-baseline sysctls. Apply with `sysctl --system` so values
# take effect this deploy, not on next reboot.
$sudo_cmd install -m 0644 -o root -g root \
    /opt/left4me/deploy/files/etc/sysctl.d/99-left4me.conf \
    /etc/sysctl.d/99-left4me.conf
$sudo_cmd sysctl --system >/dev/null
```

- [ ] **Step 5.5: Verify the deploy script's shell syntax still parses**

Run: `sh -n deploy/deploy-test-server.sh`
Expected: exit 0, no output.

- [ ] **Step 5.6: Run the new test and the existing deploy-script tests, verify they pass**

Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_deploy_script_installs_perf_artifacts deploy/tests/test_deploy_artifacts.py::test_deploy_script_has_safe_defaults_and_preserves_state deploy/tests/test_deploy_artifacts.py::test_deploy_script_shell_syntax -v`
Expected: PASS for all three.

- [ ] **Step 5.7: Commit**

```bash
git add deploy/deploy-test-server.sh deploy/tests/test_deploy_artifacts.py
git commit -m "$(cat <<'EOF'
feat(deploy): install slice + sysctl artifacts and apply via sysctl --system

Copies l4d2-game.slice and l4d2-build.slice into
/usr/local/lib/systemd/system/, installs 99-left4me.conf into
/etc/sysctl.d/, and runs sysctl --system so the perf baseline is
live this deploy, not on next reboot.
EOF
)"
```

---

## Task 6: Performance-Tuning Section in deploy/README.md

Document the four escape hatches the spec lists as opt-in: CPU governor, per-instance `CPUAffinity`, NIC tuning, and SCHED_FIFO.

**Files:**
- Modify: `deploy/README.md`

No test for this task: README content is documentation, not contract.

- [ ] **Step 6.1: Append the Performance Tuning section**

Open `deploy/README.md`. Append (after the existing final paragraph) a new section:

````markdown
## Performance Tuning

The deployment ships a host-side perf baseline (slices, unit directives, sysctls). See `docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md` for design rationale.

The following knobs are documented escape hatches; they are **not** auto-applied. Apply only if you have measured a need and understand the failure modes.

### CPU governor

The `performance` governor can shave a few percent off jitter under bursty load. `schedutil` is acceptable for sustained UDP workloads.

```sh
sudo cpupower frequency-set -g performance
```

Persist via your distro's CPU-frequency tooling (e.g. `/etc/default/cpufrequtils`).

### Per-instance CPU affinity

`srcds` is single-threaded per instance. On a multi-core host, pinning each instance to its own core can cut jitter under contention. Drop in `/etc/systemd/system/left4me-server@<name>.service.d/affinity.conf`:

```ini
[Service]
CPUAffinity=2
```

A reasonable strategy on an N-core host: leave core 0 for the kernel + IRQs + system services, then pin one instance per remaining core.

### NIC tuning

Hardware-specific. On a host with a single primary interface (replace `eth0`):

```sh
sudo ethtool -G eth0 rx 4096 tx 4096
sudo ethtool -K eth0 gro on lro off
```

If you run a high instance count, also pin the NIC's interrupts off the cores that game servers occupy (see `/proc/interrupts` and `/proc/irq/<n>/smp_affinity`).

### Real-time scheduling (advanced, opt-in)

Source-engine servers do not need real-time scheduling, and a misbehaving `srcds` at any RT priority can starve kernel threads, even though the default `kernel.sched_rt_runtime_us=950000` reserves 5% of CPU time for non-RT tasks. Use only if you have a measured jitter problem that the baseline does not solve.

`/etc/systemd/system/left4me-server@.service.d/realtime.conf`:

```ini
[Service]
CPUSchedulingPolicy=fifo
CPUSchedulingPriority=10
LimitRTPRIO=10
```

### Applying changes to running servers

Unit-file changes do not apply to already-running services. After any change:

```sh
sudo systemctl daemon-reload
# Restart each game server via the web UI's stop + start, or:
sudo systemctl restart 'left4me-server@*.service'
```
````

- [ ] **Step 6.2: Run the full deploy test suite and verify it stays green**

Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py -q`
Expected: all green. README changes have no test, but should not break any existing tests.

- [ ] **Step 6.3: Commit**

```bash
git add deploy/README.md
git commit -m "$(cat <<'EOF'
docs(deploy): performance-tuning escape-hatch section in README

Documents CPU governor, per-instance CPUAffinity, NIC tuning, and
SCHED_FIFO opt-in patterns. None of these are auto-applied; they're
ops-side knobs for measured problems the perf baseline doesn't solve.
EOF
)"
```

---

## Final Verification

- [ ] **Step F.1: Full deploy test suite green**

Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/ -q`
Expected: all green.

- [ ] **Step F.2: Host library + web tests still green (regression check)**

Run: `cd /Users/mwiegand/Projekte/left4me && pytest l4d2host/tests -q && pytest l4d2web/tests -q`
Expected: all green. Nothing in this plan touches host or web Python code, but a clean run rules out accidental import-time damage.

- [ ] **Step F.3: Working tree clean and commits in order**

Run: `git status && git log --oneline -8`
Expected:
- `git status`: `nothing to commit, working tree clean`.
- `git log`: six new commits in this order, top-most first:
  1. `docs(deploy): performance-tuning escape-hatch section in README`
  2. `feat(deploy): install slice + sysctl artifacts and apply via sysctl --system`
  3. `feat(deploy): script-sandbox runs in l4d2-build.slice + OOMScoreAdjust=500`
  4. `feat(deploy): host sysctls for UDP buffers + netdev backlog/budget`
  5. `feat(deploy): l4d2-game.slice + l4d2-build.slice with 100:1 weight ratio`
  6. `feat(deploy): perf-baseline directives on left4me-server@.service`

If any commit is missing or out of order, do not amend; diagnose, fix, and create new commits.

- [ ] **Step F.4: Manual deploy smoke test (deferred, ops-side)**

This plan ships artifacts. Confirming that systemd actually accepts and applies them on a real host requires running the deploy script against a test target. That validation is operator-side, not part of this implementation:

```sh
deploy/deploy-test-server.sh deploy-user@example-host
ssh deploy-user@example-host 'systemctl cat l4d2-game.slice'
ssh deploy-user@example-host 'sysctl net.core.rmem_max'   # expect 8388608
ssh deploy-user@example-host 'systemd-analyze verify /usr/local/lib/systemd/system/left4me-server@.service'
```
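
The smoke test above spot-checks a single sysctl by hand. As a hedged sketch of how an operator might compare every expected value on the target host, the snippet below maps each dotted key to its file under `/proc/sys` and reports mismatches. The function names `sysctl_path` and `mismatches` are hypothetical, the key-to-path mapping assumes keys without dots inside a path component (true for the keys in this plan), and the demo runs against a temporary directory standing in for `/proc/sys` so it works on any machine.

```python
import tempfile
from pathlib import Path


def sysctl_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl key such as net.core.rmem_max to its procfs file."""
    return Path(root, *key.split("."))


def mismatches(expected: dict[str, str], root: str = "/proc/sys") -> dict[str, tuple[str, str]]:
    """Return {key: (expected, actual)} for every value that differs."""
    wrong: dict[str, tuple[str, str]] = {}
    for key, want in expected.items():
        actual = sysctl_path(key, root).read_text().strip()
        if actual != want:
            wrong[key] = (want, actual)
    return wrong


# Demo against a fake root; on a real host, omit root= to read /proc/sys.
with tempfile.TemporaryDirectory() as fake_root:
    core = Path(fake_root, "net", "core")
    core.mkdir(parents=True)
    (core / "rmem_max").write_text("8388608\n")
    (core / "wmem_max").write_text("212992\n")  # pretend the deploy missed this one
    result = mismatches(
        {"net.core.rmem_max": "8388608", "net.core.wmem_max": "8388608"},
        root=fake_root,
    )
```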

Document any deploy-time problems back into the spec or this plan as v1.x corrections. Do not invent fixes that go beyond the spec.

---

## Out of Scope (do NOT implement here)

Listed in the spec and repeated here for clarity:

- ConVars / blueprint arguments / tickrate / sv_minrate.
- SCHED_FIFO auto-apply.
- CPU governor auto-apply.
- Per-instance `CPUAffinity` auto-apply.
- NIC ring-buffer / IRQ-pinning code.
- Job-scheduler awareness ("don't build while server X has players").
- Hardening tightening (`ProtectKernelTunables=yes`, etc.).

If you find yourself touching any of these, stop; they belong in a separate spec.