l4d2 network shaping & marking — design
Date: 2026-05-10 Status: design
Summary
Add a network-side player-experience baseline alongside the existing host perf baseline. Three concerns ship together:
- Mark srcds outbound packets with DSCP `EF` and skb priority `6:0` so that qdiscs and network gear that honour these marks (host CAKE, ISP gear that honours DSCP, future systems) can recognise L4D2 game traffic as latency-sensitive. Marking happens by uid match on the `left4me` user.
- Round out the UDP-socket sysctl baseline (`udp_rmem_min`, `udp_wmem_min`), set the default qdisc explicitly to `fq_codel`, and switch TCP to `bbr` so coexisting TCP egress (admin, backups, web app, apt) is far less able to bufferbloat the link the players share.
- Shape egress with CAKE. On the test deploy, install a systemd oneshot that applies `tc qdisc replace … cake …` from an operator-edited env file. On production hosts running `systemd-networkd`, document the equivalent `[CAKE]` section in the matching `.network` file as the long-term path.
The intent is "all reasonable measures that do not depend on host-specific hardware." Hardware-specific tuning (NIC ring buffers, IRQ pinning, CPU governor, real-time scheduling, CPU affinity) remains a documented escape hatch — same boundary the existing perf-baseline spec drew. The pieces that are universally safe ship as defaults.
Goals
- Game-server UDP packets carry an unambiguous priority signal in DSCP and in `skb->priority`, set on the host before any qdisc inspects them.
- A coexisting bulk TCP flow on the same host (backup upload, package fetch, web-app response) cannot push the bottleneck queue ahead of game UDP under saturation.
- An operator who declares uplink bandwidth gets fair-queueing egress shaping with diffserv-aware tin selection: EF-marked srcds traffic drops into the highest-priority CAKE tin, and per-destination-host fairness keeps every connected player on equal footing.
- A production deployment using `systemd-networkd` has a one-block configuration recipe, with no helper script needed.
- Operators have a documented set of additional knobs (ingress shaping via IFB, `busy_poll`, GRO toggling) for cases the default baseline does not cover. None of these auto-apply.
Non-goals
- NIC ring-buffer / IRQ pinning / RPS / RFS / hardware timestamping: already declared host-specific in the perf-baseline spec; not re-litigated here.
- `busy_poll` / `busy_read` as defaults: non-trivial CPU cost; documented as opt-in.
- Ingress shaping via IFB as a default: only matters if egress CAKE turns out load-bearing and ingress is also saturated; documented as opt-in.
- Real-time scheduling, governor changes: already declined by the perf-baseline spec.
- Blueprint-side game settings (`sv_minrate`, `sv_maxrate`, tickrate, `fps_max`): owned by the server maintainer.
- Auto-detection or measurement of uplink bandwidth. CAKE only shapes correctly when its declared bandwidth sits below the real bottleneck; the operator must measure once and configure.
- Iface-flap watchdog. `tc qdisc replace` is idempotent; on prod, `systemd-networkd` reapplies CAKE across iface lifecycle events. On test, `systemctl restart left4me-cake.service` is the documented recovery.
Background
Current state (commit 62d6d4c or thereabouts):
- The perf-baseline spec ships `/etc/sysctl.d/99-left4me.conf` with `rmem_max`, `wmem_max`, `rmem_default`, `wmem_default`, `netdev_max_backlog`, `netdev_budget`, `vm.swappiness`. No per-socket UDP minimums, no default-qdisc directive, no TCP congestion-control setting.
- `srcds_run` runs as system user `left4me`. srcds itself does not set `IP_TOS` or `SO_PRIORITY`, so its UDP packets leave the host with DSCP 0 and priority 0, indistinguishable from any other UDP traffic to any qdisc.
- The deploy ships nftables-relevant infrastructure only via package defaults (Debian Trixie ships `nftables` in base, but no `left4me` table is created).
- No qdisc is explicitly configured. The kernel's per-iface default applies: `fq_codel` on Trixie, but only because Debian's default has been `fq_codel` since Buster.
- The deploy script already copies sysctl drop-ins and runs `sysctl --system` (`deploy/deploy-test-server.sh:196`).
Design
Sysctl additions to 99-left4me.conf
Append to deploy/files/etc/sysctl.d/99-left4me.conf:
```
# Per-socket UDP buffer floors under memory pressure: each UDP socket may
# keep using this many bytes even when total UDP memory exceeds
# net.ipv4.udp_mem, so game-server sockets keep headroom on a busy host.
# (The kernel's ip-sysctl docs note UDP has no tx memory accounting, so
# udp_wmem_min currently has no effect; it is set for symmetry.)
net.ipv4.udp_rmem_min = 16384
net.ipv4.udp_wmem_min = 16384

# Default qdisc for ifaces we don't explicitly shape with CAKE. Debian
# Trixie already defaults to fq_codel; setting it explicitly is
# belt-and-suspenders and survives kernel-default churn.
net.core.default_qdisc = fq_codel

# TCP congestion control: BBR for any bulk TCP egress on the host (admin
# SSH, backups, package fetches, web-app responses) so a long flow does
# not push the bottleneck queue ahead of game UDP. UDP srcds is
# unaffected.
net.ipv4.tcp_congestion_control = bbr
```
The deploy already runs sysctl --system after copying the conf
(deploy/deploy-test-server.sh:198); no script change required for this
block.
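After `sysctl --system` runs, the live kernel values can be compared against this block. A minimal sketch, assuming the standard `/proc/sys` layout where each dotted key maps to a slash-separated file; `check_live_sysctls` and its `proc_root` parameter are hypothetical names for illustration, not part of the deploy:

```python
from pathlib import Path

# Values this spec appends to 99-left4me.conf.
EXPECTED = {
    "net.ipv4.udp_rmem_min": "16384",
    "net.ipv4.udp_wmem_min": "16384",
    "net.core.default_qdisc": "fq_codel",
    "net.ipv4.tcp_congestion_control": "bbr",
}


def proc_path(key: str, proc_root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl key to its file, e.g. net.core.x -> net/core/x.

    Sketch only: keys containing interface names with dots are not handled.
    """
    return Path(proc_root, *key.split("."))


def check_live_sysctls(expected=EXPECTED, proc_root="/proc/sys"):
    """Return {key: (expected, actual)} for every value that differs.

    A key whose file does not exist is reported with actual=None.
    """
    mismatches = {}
    for key, want in expected.items():
        try:
            got = proc_path(key, proc_root).read_text().strip()
        except FileNotFoundError:
            got = None
        if got != want:
            mismatches[key] = (want, got)
    return mismatches
```

An empty dict from `check_live_sysctls()` means the running kernel matches the drop-in; a `tcp_congestion_control` mismatch usually means BBR was not available when the sysctl was applied.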
nftables packet marking
New file deploy/files/usr/local/lib/left4me/nft/left4me-mark.nft:
```
table inet left4me_mark {
    chain mangle_output {
        type filter hook output priority mangle; policy accept;
        meta skuid "left4me" meta l4proto udp ip dscp set ef meta priority set 0006:0000
        meta skuid "left4me" meta l4proto udp ip6 dscp set ef meta priority set 0006:0000
    }
}
```
Per-element rationale:
- `meta skuid "left4me"`: every srcds instance runs as that user. The only other `left4me` process, the web app, speaks TCP and is excluded by the next match; the build sandbox runs under a different uid and never matches.
- `meta l4proto udp`: bypass anything not UDP, including the future RCON/HTTP TCP traffic from the web app.
- `ip dscp set ef` / `ip6 dscp set ef`: DSCP `EF` (Expedited Forwarding, decimal 46) is the standard low-latency marking. CAKE's `diffserv4` preset routes EF into its highest-priority "Voice" tin. Two rules, one per L3 family, because in an `inet` table the `ip` matcher only fires on v4 and `ip6` only on v6.
- `meta priority set 0006:0000`: sets `skb->priority` to class `6:0` (`0x00060000`), a host-side signal for qdiscs and classifiers that read skb priority. Neither CAKE nor the priomap qdiscs act on this particular value: CAKE lets `skb->priority` override tin selection only when its major number equals CAKE's own qdisc handle and its minor number is between 1 and the tin count, and `pfifo_fast`/`prio` look only at the low four bits, which are zero here. With CAKE, the EF mark is what places srcds traffic in the Voice tin. Set inline with the DSCP rule so a single rule-match runs both statements.
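The bit layout behind these two marks can be checked in a few lines. This Python sketch models, as assumptions taken from the Linux uapi headers and `sch_cake.c` rather than from any left4me code, where DSCP sits in the IPv4 TOS / IPv6 Traffic Class byte, how a `major:minor` tc handle packs into the 32-bit `skb->priority`, and the (simplified) condition under which CAKE lets that value pick the tin:

```python
DSCP_EF = 46  # RFC 3246 Expedited Forwarding


def tos_byte(dscp: int, ecn: int = 0) -> int:
    """DSCP is the top 6 bits of the TOS/Traffic Class byte; ECN is the low 2."""
    return (dscp << 2) | (ecn & 0b11)


def tc_handle(major: int, minor: int) -> int:
    """Pack major:minor (hex in tc/nft notation) the way TC_H_MAKE does."""
    return ((major & 0xFFFF) << 16) | (minor & 0xFFFF)


def tc_h_maj(handle: int) -> int:
    return handle & 0xFFFF0000


def tc_h_min(handle: int) -> int:
    return handle & 0x0000FFFF


def cake_priority_overrides(skb_priority: int, cake_handle: int, tin_count: int) -> bool:
    """Simplified model of CAKE's tin-selection check (ignores fwmark and
    besteffort): skb->priority picks the tin only if its major number equals
    CAKE's qdisc handle and its minor number is 1..tin_count."""
    return tc_h_maj(skb_priority) == cake_handle and 0 < tc_h_min(skb_priority) <= tin_count
```

For example, `tos_byte(DSCP_EF)` is `0xB8`, and with a CAKE qdisc whose auto-assigned handle is `8001:` the `6:0` mark gives `cake_priority_overrides(tc_handle(6, 0), tc_handle(0x8001, 0), 4) == False`, so the DSCP lookup decides the tin.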
The table is named left4me_mark and lives in its own inet namespace.
It does not touch, depend on, or conflict with any nftables config the
operator may run independently. nft -f loads the file; nft delete table inet left4me_mark cleanly removes it.
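New unit deploy/files/usr/local/lib/systemd/system/left4me-nft-mark.service:
```
[Unit]
Description=left4me nftables packet marking (DSCP EF + priority for srcds)
DefaultDependencies=no
Wants=network-pre.target
After=local-fs.target
Before=network-pre.target shutdown.target
Conflicts=shutdown.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/nft -f /usr/local/lib/left4me/nft/left4me-mark.nft
ExecStop=/usr/sbin/nft delete table inet left4me_mark

[Install]
WantedBy=multi-user.target
```
`Wants=network-pre.target` plus `Before=network-pre.target` is the ordering systemd.special(7) describes for firewall-style units: network management software orders itself after network-pre.target, so the rules are in place before any iface is configured and the very first packet srcds emits post-boot is already marked. `DefaultDependencies=no` lets the unit run that early instead of waiting for basic.target (the same pattern Debian's own nftables.service uses), which is why the shutdown ordering and conflict are spelled out explicitly; `After=local-fs.target` keeps the rule file under /usr/local readable even if that path is a separate local mount.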
Deploy script changes:
- Ensure `nftables` is installed (`apt-get install -y nftables`; idempotent, and the package is in Trixie base).
- Create `/usr/local/lib/left4me/nft/` and copy `left4me-mark.nft` into it.
- Copy the unit, `daemon-reload`, `systemctl enable --now left4me-nft-mark.service`.
CAKE egress shaper — test deploy mechanism
Three files plus deploy-script changes. All operator-tunable knobs go in the env file; the helper and unit are static.
deploy/files/etc/left4me/cake.env (template; deploy installs only
if absent so operator edits survive re-runs):
```
# Uplink bandwidth in Mbit/s. Set to ~95% of the measured upload rate:
# this is an egress shaper, and CAKE only shapes correctly when its
# declared bandwidth sits below the real bottleneck. If unset, the
# left4me-cake.service unit logs a warning and exits 0 (no shaping).
LEFT4ME_UPLINK_MBIT=

# Egress interface. If unset, auto-detected from the IPv4 default route.
LEFT4ME_UPLINK_IFACE=
```
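Turning a measurement into the `LEFT4ME_UPLINK_MBIT` value is plain arithmetic. A minimal sketch, assuming the operator has a measured upload rate in whole Mbit/s from any speed test; `recommended_uplink_mbit` and `env_line` are hypothetical helpers, and integer math keeps the result a whole number suitable for the `<N>mbit` form the helper passes to `tc`:

```python
def recommended_uplink_mbit(measured_upload_mbit: int, percent: int = 95) -> int:
    """Round ~95% of the measured upload rate down to whole Mbit/s.

    Rounding down keeps CAKE's declared rate below the real bottleneck.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    mbit = measured_upload_mbit * percent // 100
    if mbit < 1:
        raise ValueError("measured upload too small to shape")
    return mbit


def env_line(mbit: int) -> str:
    """Render the line an operator would put in /etc/left4me/cake.env."""
    return f"LEFT4ME_UPLINK_MBIT={mbit}"
```

A 500 Mbit/s measured upload yields `LEFT4ME_UPLINK_MBIT=475`; a 20 Mbit/s one yields 19.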
deploy/files/usr/local/libexec/left4me/left4me-apply-cake (mode
0755, owner root:root). The helper takes a single argument — apply
or clear — so the unit's ExecStart and ExecStop both call the same
script and the unit file stays free of shell escaping:
```sh
#!/bin/sh
set -eu

mode=${1:-apply}

if [ -r /etc/left4me/cake.env ]; then
    . /etc/left4me/cake.env
fi

resolve_iface() {
    if [ -n "${LEFT4ME_UPLINK_IFACE:-}" ]; then
        printf '%s' "$LEFT4ME_UPLINK_IFACE"
        return
    fi
    # Print the word after "dev" on the first default route. Works with and
    # without a "via <gateway>" field (e.g. "default dev ppp0 scope link").
    ip -4 route show default |
        awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

case "$mode" in
    apply)
        if [ -z "${LEFT4ME_UPLINK_MBIT:-}" ]; then
            echo "left4me-cake: LEFT4ME_UPLINK_MBIT unset; skipping shaper" >&2
            exit 0
        fi
        iface=$(resolve_iface)
        if [ -z "$iface" ]; then
            echo "left4me-cake: cannot determine egress iface; skipping" >&2
            exit 0
        fi
        exec tc qdisc replace dev "$iface" root cake \
            bandwidth "${LEFT4ME_UPLINK_MBIT}mbit" \
            internet diffserv4 dual-dsthost
        ;;
    clear)
        iface=$(resolve_iface)
        if [ -z "$iface" ]; then
            exit 0
        fi
        # Ignore "no such qdisc" so stop is idempotent.
        tc qdisc del dev "$iface" root 2>/dev/null || true
        ;;
    *)
        echo "usage: $0 [apply|clear]" >&2
        exit 2
        ;;
esac
```
tc qdisc replace is idempotent: replaces an existing root qdisc on the
iface, adds one if absent. Re-running the unit any time is safe. clear
swallows the "no such qdisc" error so stop is also idempotent.
Fail-soft on missing config matches the perf-baseline philosophy — the
deploy does not refuse to boot servers because the operator has not yet
filled in LEFT4ME_UPLINK_MBIT. The journal warning surfaces the gap.
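The helper's decision logic is small enough to model outside a shell, which makes the fail-soft branches easy to unit-test. A Python sketch under stated assumptions: it parses the text `ip -4 route show default` prints, taking the word after `dev` so routes with and without `via` both resolve, and returns the `tc` argument vector the `apply` path would `exec`, or `None` where the helper logs and exits 0. `default_route_iface` and `cake_apply_argv` are hypothetical names; the shipped helper stays the shell script.

```python
from typing import Optional


def default_route_iface(route_output: str) -> Optional[str]:
    """Interface of the first default route in `ip -4 route show default` output."""
    for line in route_output.splitlines():
        words = line.split()
        if not words or words[0] != "default":
            continue
        for i, word in enumerate(words[:-1]):
            if word == "dev":
                return words[i + 1]
    return None


def cake_apply_argv(env: dict, route_output: str) -> Optional[list]:
    """tc command for the apply path, or None where the helper skips shaping."""
    mbit = env.get("LEFT4ME_UPLINK_MBIT", "")
    if not mbit:
        return None  # unset bandwidth: warn and exit 0
    iface = env.get("LEFT4ME_UPLINK_IFACE") or default_route_iface(route_output)
    if not iface:
        return None  # no egress iface: warn and exit 0
    return [
        "tc", "qdisc", "replace", "dev", iface, "root", "cake",
        "bandwidth", f"{mbit}mbit", "internet", "diffserv4", "dual-dsthost",
    ]
```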
deploy/files/usr/local/lib/systemd/system/left4me-cake.service:
```
[Unit]
Description=left4me CAKE egress shaper
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
EnvironmentFile=-/etc/left4me/cake.env
ExecStart=/usr/local/libexec/left4me/left4me-apply-cake apply
ExecStop=/usr/local/libexec/left4me/left4me-apply-cake clear

[Install]
WantedBy=multi-user.target
```
Per-flag rationale for the cake invocation:
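- `bandwidth ${LEFT4ME_UPLINK_MBIT}mbit`: operator-declared, ≈95% of measured uplink. CAKE only shapes if its declared bandwidth is below the real bottleneck; setting it slightly low moves the queue into a place the host controls.
- `internet`: per tc-cake(8) this is an RTT keyword, not overhead accounting. It sets the assumed round-trip time to 100 ms, which is already CAKE's default, so it is stated for explicitness only. This invocation adds no link-layer overhead compensation beyond the kernel-reported packet size; link-specific compensation (overhead keywords such as `ethernet`, `docsis` or `pppoe-ptm`, or `overhead N`) depends on the ISP link and is not set here.
- `diffserv4`: four-tin DSCP-aware tin selection. Reads the EF marks set by the nftables rule and routes srcds packets into the highest-priority "Voice" tin. Only `besteffort` mode ignores the marks (CAKE's default, `diffserv3`, also puts EF in Voice); `diffserv4` is named explicitly rather than relying on the default.
- `dual-dsthost`: egress fairness keyed on destination host. With ≥2 players connected, each player gets a fair share regardless of how chatty the server is to any single client.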
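Where a given mark lands under `diffserv4` can be written as a lookup. This sketch encodes the tin assignments tc-cake(8) lists for the named DSCP code points (LE in Bulk applies to recent kernels); it is a reference table, not something the deploy ships, and it returns `None` for code points it does not name rather than guessing at the kernel's full 64-entry table:

```python
# DSCP code point values (RFC 2474/2597/3246/5865/8622).
DSCP = {
    "CS0": 0, "LE": 1, "CS1": 8, "CS2": 16, "AF21": 18, "AF22": 20, "AF23": 22,
    "CS3": 24, "AF31": 26, "AF32": 28, "AF33": 30, "CS4": 32, "AF41": 34,
    "AF42": 36, "AF43": 38, "CS5": 40, "VA": 44, "EF": 46, "CS6": 48, "CS7": 56,
}

# diffserv4 tins in rising priority, with the named code points each receives.
DIFFSERV4_TINS = {
    "Bulk": {"CS1", "LE"},
    "Best Effort": {"CS0"},
    "Video": {"AF41", "AF42", "AF43", "AF31", "AF32", "AF33", "CS3",
              "AF21", "AF22", "AF23", "CS2"},
    "Voice": {"CS7", "CS6", "EF", "VA", "CS5", "CS4"},
}


def diffserv4_tin(dscp: int):
    """Tin name for a named DSCP value, or None if this sketch doesn't list it."""
    for tin, names in DIFFSERV4_TINS.items():
        if any(DSCP[name] == dscp for name in names):
            return tin
    return None
```

`diffserv4_tin(46)` returns `"Voice"`, which is the tin the nftables EF mark targets.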
Iface-flap behaviour: the kernel keeps the qdisc on an iface across
link-down/link-up while the iface itself exists. If the iface is
recreated (e.g., NetworkManager reconfiguration), systemctl restart left4me-cake.service reapplies. Documented; no auto-watchdog in v1.
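After a flap or a restart, whether CAKE is actually the root qdisc can be read back from `tc`'s JSON mode. A sketch assuming `tc -j qdisc show dev <iface>` prints a JSON array of objects carrying `kind` and, for the root qdisc, `"root": true`, as current iproute2 does (treat the field names as an assumption); `root_is_cake` and `iface_root_is_cake` are hypothetical names:

```python
import json
import subprocess


def root_is_cake(tc_json: str) -> bool:
    """True if parsed `tc -j qdisc show` output has CAKE as the root qdisc."""
    for qdisc in json.loads(tc_json):
        if qdisc.get("root") is True:
            return qdisc.get("kind") == "cake"
    return False


def iface_root_is_cake(iface: str) -> bool:
    """Run tc for one interface (needs iproute2) and check its root qdisc."""
    out = subprocess.run(
        ["tc", "-j", "qdisc", "show", "dev", iface],
        check=True, capture_output=True, text=True,
    ).stdout
    return root_is_cake(out)
```

If it returns `False` with `LEFT4ME_UPLINK_MBIT` set, `systemctl restart left4me-cake.service` is the recovery step.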
Deploy script changes (in deploy/deploy-test-server.sh):
- Copy `cake.env` to `/etc/left4me/cake.env` only if absent (do not clobber operator edits).
- Copy `left4me-apply-cake` to `/usr/local/libexec/left4me/`, mode `0755`, owner `root:root`.
- Copy `left4me-cake.service` to `/usr/local/lib/systemd/system/`.
- `systemctl daemon-reload` (already done in the existing flow).
- `systemctl enable --now left4me-cake.service`.
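The "only if absent" rule for `cake.env` is the one install step that carries state. It is sketched here in Python for clarity (the deploy script itself does this in shell, and `install_if_absent` is a hypothetical name): opening the target with `O_CREAT | O_EXCL` makes the existence check and the create one atomic step, so an operator-edited file is never overwritten.

```python
import os
import shutil


def install_if_absent(src: str, dst: str, mode: int = 0o644) -> bool:
    """Copy src to dst only if dst does not exist. Returns True if copied."""
    try:
        fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
    except FileExistsError:
        return False  # keep the operator's edits
    with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
        shutil.copyfileobj(inp, out)
    os.chmod(dst, mode)  # os.open's mode is filtered by the umask
    return True
```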
CAKE egress shaper — production deployment (systemd-networkd)
On hosts running systemd-networkd, the CAKE configuration belongs in
the matching .network file. systemd-networkd reapplies it across iface
lifecycle events, addressing the only fragility of the test-deploy
oneshot.
Document in deploy/README.md Performance section:
```
# /etc/systemd/network/<your-uplink>.network
[CAKE]
Bandwidth=480M
PriorityQueueingPreset=diffserv4
FlowIsolationMode=dual-dst-host
```
Directive names follow systemd.network(5). Values mirror the test
deploy's tc invocation:
- `Bandwidth=480M`: placeholder; operator sets to ≈95% of measured uplink in their actual `.network`.
- `PriorityQueueingPreset=diffserv4`: equivalent of `diffserv4`.
- `FlowIsolationMode=dual-dst-host`: equivalent of `dual-dsthost`.
- No directive is needed for `internet`, which only selects CAKE's default 100 ms RTT. If a link does need overhead compensation, systemd.network(5) expresses it with `OverheadBytes=` and `CompensationMode=`.
The nftables marking from the previous section ships unchanged on prod; it is qdisc-installer-agnostic.
The test-deploy oneshot is not meant for hosts running
systemd-networkd. v1 does not enforce that boundary, because production hosts
do not run the test-deploy script. If the boundary blurs in the future,
add a check in left4me-apply-cake for `systemctl is-active systemd-networkd` and skip cleanly.
Documented escape hatches
Append to deploy/README.md Performance section, alongside the existing
governor / CPU-affinity / NIC entries:
- Ingress shaping via IFB. Egress CAKE alone does not protect srcds receive against ingress saturation (large workshop downloads, package fetches arriving at line rate). One-liner template using `modprobe ifb`, `ip link set ifb0 up`, `tc qdisc add dev ifb0 root cake bandwidth Xmbit ingress diffserv4 dual-srchost`, and a `tc filter` redirect from the uplink iface. Worth flipping only when measurement shows ingress hurting receive; in v1 we have no such measurement, so it stays documented.
- `net.core.busy_poll = 50` / `net.core.busy_read = 50` (microseconds). Reduces UDP receive median latency by polling for incoming packets briefly at syscall boundaries. Cost: measurable CPU per syscall under load. Worth flipping if a host is dedicated to game serving and CPU headroom is plentiful.
- `ethtool -K <iface> gro off`. Some Source-engine ops disable generic receive offload to avoid receive-side coalescing latency. Hardware/driver dependent. Document, do not ship.
These three entries follow the existing escape-hatch style: a one-liner or short config block, plus one sentence on when it matters.
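The IFB entry expands into a short command sequence. This sketch only builds argument vectors and runs nothing; `ifb_ingress_commands` is hypothetical, the `matchall` classifier with a `mirred egress redirect` action is one common way to do the redirect (assumed available on the host kernel), `ifb0` assumes `modprobe ifb` created its default devices, and the rate would come from the measured download, since ingress shaping governs that direction:

```python
def ifb_ingress_commands(uplink: str, ingress_mbit: int, ifb: str = "ifb0") -> list:
    """argv lists for IFB ingress shaping, in a safe order: the shaper on the
    IFB device exists before any traffic is redirected into it."""
    return [
        ["modprobe", "ifb"],
        ["ip", "link", "set", ifb, "up"],
        ["tc", "qdisc", "add", "dev", ifb, "root", "cake",
         "bandwidth", f"{ingress_mbit}mbit", "ingress", "diffserv4", "dual-srchost"],
        ["tc", "qdisc", "add", "dev", uplink, "handle", "ffff:", "ingress"],
        ["tc", "filter", "add", "dev", uplink, "parent", "ffff:", "protocol", "all",
         "matchall", "action", "mirred", "egress", "redirect", "dev", ifb],
    ]
```

Each list could be handed to `subprocess.run(argv, check=True)` by an operator who opts in; the default deploy does not.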
Files changed / added
deploy/files/etc/sysctl.d/99-left4me.conf (modified — block added)
deploy/files/usr/local/lib/left4me/nft/left4me-mark.nft (new)
deploy/files/usr/local/lib/systemd/system/left4me-nft-mark.service (new)
deploy/files/etc/left4me/cake.env (new — template, deploy preserves operator edits)
deploy/files/usr/local/libexec/left4me/left4me-apply-cake (new)
deploy/files/usr/local/lib/systemd/system/left4me-cake.service (new)
deploy/deploy-test-server.sh (modified — install+enable nft and cake units, conditional copy of cake.env)
deploy/README.md (modified — Network shaping subsection + 3 new escape hatches)
deploy/tests/test_deploy_artifacts.py (modified — assertions for all artifacts above)
Tests
Following the existing assert "key=value" in text pattern in
deploy/tests/test_deploy_artifacts.py:
Sysctl block (extension of the existing perf-baseline assertions):
- Each of `net.ipv4.udp_rmem_min = 16384`, `net.ipv4.udp_wmem_min = 16384`, `net.core.default_qdisc = fq_codel`, `net.ipv4.tcp_congestion_control = bbr` is asserted as a separate line.
nftables marking artifacts:
- `left4me-mark.nft` ships with `table inet left4me_mark`, `chain mangle_output`, `meta skuid "left4me"`, `ip dscp set ef`, `ip6 dscp set ef`, and `meta priority set 0006:0000`, each asserted as a separate substring match. (DSCP and priority statements appear inline on the same rule per L3 family; substring assertions don't depend on rule layout.)
- `left4me-nft-mark.service` has `ExecStart=/usr/sbin/nft -f /usr/local/lib/left4me/nft/left4me-mark.nft`, `ExecStop=/usr/sbin/nft delete table inet left4me_mark`, `Type=oneshot`, `RemainAfterExit=yes`, `WantedBy=multi-user.target`.
- `deploy-test-server.sh` invokes `systemctl enable --now left4me-nft-mark.service` (or an equivalent at-deploy enabling step).
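Substring checks on unit files pass even when a directive sits in the wrong section. If the tests ever need section-aware assertions, a small parser is enough; `configparser` is awkward here because systemd allows repeated keys (several `ExecStart=` lines, for instance), so this sketch keeps every value in a list. `parse_unit` is a hypothetical helper and ignores line continuations and drop-in directories:

```python
from collections import defaultdict


def parse_unit(text: str) -> dict:
    """Parse a systemd unit into {section: {key: [values, ...]}}."""
    sections = defaultdict(lambda: defaultdict(list))
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            continue
        if current is None or "=" not in line:
            continue  # stray line outside any section; ignored in this sketch
        key, _, value = line.partition("=")
        sections[current][key.strip()].append(value.strip())
    return {name: dict(keys) for name, keys in sections.items()}
```

A section-aware check then reads, for example, `parse_unit(text)["Install"]["WantedBy"] == ["multi-user.target"]`.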
CAKE artifacts:
- `cake.env` template contains the literal lines `LEFT4ME_UPLINK_MBIT=` and `LEFT4ME_UPLINK_IFACE=` (commented or uncommented; matched as substring).
- `left4me-apply-cake` contains the literals `tc qdisc replace`, `cake`, `bandwidth`, `internet`, `diffserv4`, `dual-dsthost`, `LEFT4ME_UPLINK_MBIT`, `LEFT4ME_UPLINK_IFACE`.
- `left4me-apply-cake` is mode `0755` after deploy (asserted via the same mechanism the existing helper-script tests use).
- `left4me-cake.service` contains `EnvironmentFile=-/etc/left4me/cake.env`, `ExecStart=/usr/local/libexec/left4me/left4me-apply-cake apply`, `ExecStop=/usr/local/libexec/left4me/left4me-apply-cake clear`, `Wants=network-online.target`, `Type=oneshot`, `WantedBy=multi-user.target`.
- `deploy-test-server.sh` invokes `systemctl enable --now left4me-cake.service`.
- `deploy-test-server.sh` copies `cake.env` only when the target is absent (asserted by literal substring of the guarding `[ -e /etc/left4me/cake.env ]` test or equivalent).
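A mode check should compare permission bits only, because `st_mode` also carries the file-type bits (`0o100755` for a regular file). A minimal sketch with hypothetical `file_mode` / `assert_mode` helpers; how the existing helper-script tests locate the deployed file is not described here, so the path is a parameter:

```python
import os
import stat


def file_mode(path: str) -> int:
    """Permission bits only (e.g. 0o755), without the S_IFREG type bits."""
    return stat.S_IMODE(os.stat(path).st_mode)


def assert_mode(path: str, expected: int = 0o755) -> None:
    actual = file_mode(path)
    assert actual == expected, f"{path}: mode {actual:o}, expected {expected:o}"
```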
No runtime networking tests in v1. The artifacts are static; their runtime behaviour requires a real iface and a real bandwidth load, which the operator measures.
Rollout
Single deploy. After the new sysctl block lands, sysctl --system
applies it immediately (already in the deploy flow). The two new
systemd units start on systemctl enable --now; CAKE without a
configured LEFT4ME_UPLINK_MBIT logs a warning and no-ops, which is
the expected fresh-deploy state. The operator measures their uplink,
edits /etc/left4me/cake.env, and runs systemctl restart left4me-cake.service.
Already-running game servers need no restart. The marking applies to every emitted packet from the moment the nft rule loads, so existing srcds instances pick up DSCP+priority on their next packet.
Open questions
None blocking. v2 candidates if measurement justifies them:
- A `LEFT4ME_INGRESS_MBIT` knob that flips on the IFB ingress shaper as a default, conditional on the env value being set.
- A `left4me-net-doctor` helper that reports current qdisc, applied marks, and a one-shot saturation+ping measurement against a local endpoint.
- A small Python wrapper in `l4d2host` that reads `cake.env` for display in the web UI, so the operator sees in one place whether shaping is active.
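The third candidate could start very small. A sketch assuming `cake.env` stays a flat `KEY=value` file like the template (no quoting, no shell expansion); `read_cake_env`, `shaping_summary` and `load_summary` are hypothetical names, not existing `l4d2host` code, and reading the file reports only what is configured, not whether the qdisc is currently installed:

```python
from pathlib import Path


def read_cake_env(text: str) -> dict:
    """Parse KEY=value lines, skipping blanks and comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def shaping_summary(text: str) -> str:
    """One-line status suitable for a web UI panel."""
    env = read_cake_env(text)
    mbit = env.get("LEFT4ME_UPLINK_MBIT", "")
    iface = env.get("LEFT4ME_UPLINK_IFACE", "") or "auto (IPv4 default route)"
    if not mbit:
        return "CAKE shaping disabled: LEFT4ME_UPLINK_MBIT unset"
    return f"CAKE shaping configured: {mbit} Mbit/s on {iface}"


def load_summary(path: str = "/etc/left4me/cake.env") -> str:
    p = Path(path)
    return shaping_summary(p.read_text()) if p.exists() else "cake.env not found"
```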
References
- `tc-cake(8)`: keyword semantics for `bandwidth`, `internet` (RTT), `diffserv4`, `dual-dsthost`, overhead keywords, tin priority mapping.
- `systemd.network(5)`: `[CAKE]` section directives `Bandwidth=`, `PriorityQueueingPreset=`, `FlowIsolationMode=`.
- `nft(8)`: `meta skuid`, `meta priority`, `ip dscp set`, table isolation semantics.
- RFC 3246: Expedited Forwarding (EF) PHB.
- Linux kernel `net/ipv4/tcp_bbr.c` (header comment): BBR is designed to use the `fq` qdisc's pacing; on other qdiscs, including `fq_codel`, the TCP stack falls back to its own internal pacing.
- `docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md`: sibling spec; this spec extends `99-left4me.conf` and reuses the same deploy-test-artifact pattern.