fix(log-streaming): point logaddress at non-loopback IP

Reversed the wrong conclusion in 46bba0d. The vanilla L4D2 logaddress
UDP path is NOT broken — proved by retesting with destination
172.30.0.5:28000 (wireguard IP) which yielded 8 properly framed HL
Log Standard packets in 12s including real game events.

Root cause: the Source engine silently drops logaddress destinations
in 127.0.0.0/8. Registration succeeds and the cvar API reports
"logging to: udp" but sendto is never called for loopback. Every
other L4D2 stats deployment (multiple production HLstatsX:CE
instances) puts the collector on a separate host or interface IP
and never hits this.

Defaults: LOG_LISTENER_BIND=0.0.0.0:28000 (accept on any interface);
LOG_LISTENER_ADDR="" (production must set via web.env to the host's
non-loopback IP). Empty default = safe no-op for dev. The kernel's
same-host routing optimization keeps the traffic on lo internally
but the packet's destination IP must not literally be in 127/8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
mwiegand 2026-05-20 00:31:45 +02:00
parent 46bba0d134
commit 122e0abddd
No known key found for this signature in database
4 changed files with 123 additions and 78 deletions

View file

@ -234,17 +234,21 @@ End-to-end smoke test, on the dev box first:
## Post-deploy verification findings (2026-05-20) ## Post-deploy verification findings (2026-05-20)
The plan above describes what we INTENDED to ship; this section The plan above describes what we INTENDED to ship. This section
documents what we ACTUALLY learned after deployment. documents what we ACTUALLY learned, including a wrong intermediate
conclusion that was later corrected.
Listener was deployed and confirmed bound (`ss -ulnp` showed ### What worked
`gunicorn:28000`). Servers were re-initialized, `server.cfg`
inspected and confirmed correct — `log on`, `logaddress_add
127.0.0.1:28000` at the bottom, after `rcon_password`. The listener
was proven healthy end-to-end with a local `nc` probe writing a
capture file. **Yet captures stayed empty even during active play.**
Diagnostic steps: Listener deployed and confirmed bound (`ss -ulnp` showed
`gunicorn:28000`). Servers re-initialized; `server.cfg` had
`log on` / `logaddress_add 127.0.0.1:28000` after `rcon_password`.
Listener proven healthy end-to-end with a local `nc` probe writing
a capture file.
### What didn't (and why)
Captures stayed empty even during active play. Symptoms:
1. `logaddress_list` via RCON → `1 entry: 127.0.0.1:28000` 1. `logaddress_list` via RCON → `1 entry: 127.0.0.1:28000`
2. `log` via RCON → `currently logging to: file, console, udp` 2. `log` via RCON → `currently logging to: file, console, udp`
@ -256,54 +260,91 @@ Diagnostic steps:
4. `tcpdump -i lo udp port 28000` during rcon say/echo/status bursts 4. `tcpdump -i lo udp port 28000` during rcon say/echo/status bursts
**0 packets** **0 packets**
5. `tcpdump -i any host 127.0.0.1 and udp` → still 0 ✗ 5. `tcpdump -i any host 127.0.0.1 and udp` → still 0 ✗
6. Toggle `log off` / `log on` live → engine confirms cycle, no 6. Toggle `log off` / `log on` live, `sv_logflush 1` → no effect ✗
effect ✗ 7. Tested `SocketBindAllow=udp:32768-60999` drop-in (suspected the
7. `sv_logflush 1` live → no effect ✗ ephemeral source-port bind was being rejected) → still 0
8. Suspected `SocketBindAllow=udp:27000-27999` was blocking the packets. Drop-in rolled back ✗
implicit ephemeral bind of a separate log socket. Added drop-in 8. `strace -p <srcds_linux> -e sendto,sendmsg` → **zero sendto
`SocketBindAllow=udp:32768-60999`, daemon-reloaded, restarted calls toward the destination** ✗
server → still 0 packets ✗. Drop-in rolled back.
9. `strace -p <srcds_linux> -e sendto,sendmsg` while triggering
events → **zero sendto/sendmsg calls toward 127.0.0.1:28000**.
Only one unrelated TLS sendmsg (Steam) ✗.
**Verdict:** L4D2's vanilla UDP logaddress emit path is A premature conclusion was reached and committed as `46bba0d`
stubbed/broken at the engine level. The cvar API and registration ("L4D2 logaddress UDP emit is dead"). User pushed back, asked for
paths are intact (data-structure ops); the emit path simply doesn't verification. Research found multiple production HLstatsX:CE
make the syscalls. Logs reach the **file** and **console** sinks instances running L4D2 stats successfully — disproving the
(the latter is what `journalctl` captures from srcds_run's stdout) engine-bug theory.
but never the **udp** sink, despite all three being listed in the
`log` cvar status. Same family of issue as L4D2's broken SourceTV
(see `shqke/sourcetvsupport`).
**Implications for the roadmap:** ### Real cause
- The current listener captures nothing in production. The cfg Re-registered logaddress at a non-loopback IP (`172.30.0.5:28000`
injection of `log on` / `logaddress_add` does no harm and is kept on the wireguard interface) and reran the test. **8 packets in 12
for a future bridge to attach to. seconds**, each a properly framed HL-Log-Standard payload —
- Phase 2 path A (recommended): **SourceMod bridge plugin** in including `Console<0>" say "wg-test-1"`, all the rcon-from lines,
`l4d2host/`. Subscribes to L4D2 game events and calls and live poller status calls.
`LogMessage()` from inside srcds. The explicit `LogMessage()`
path is implemented separately from the autonomous engine emit **The Source engine silently drops `logaddress` destinations in
and DOES reach UDP destinations on every Source 1 game stat `127.0.0.0/8`.** Registration succeeds (data-structure op), the
trackers have used in production. This proves out the listener. cvar API reports "logging to: udp", but the engine's send loop
- Phase 2 path B (interim, no plugin): **parse the journal**. filters out loopback destinations and never calls sendto for them.
`journalctl -u "left4me-server@*.service" -f` already streams This is presumably an anti-self-loop / anti-amplification measure
every HL-Log-Standard line as captured srcds stdout. A reducer that I have not seen documented in any Valve or community source.
that follows journald + matches the `srcds_run[*]: L ...` lines
produces structured events today without any srcds-side change. Everyone else using `logaddress` for stat tracking puts the
Less clean than UDP-to-listener (we'd hop through journald) but collector on a *separate host* or at minimum a different interface
available immediately. IP — they never hit this. We're the unusual case of co-locating
- `mp_logdetail` was removed from the cfg injection: it is a the listener with the gameserver and naively pointing at
CS-only cvar (L4D2 prints `Unknown command "mp_logdetail"` at `127.0.0.1`.
startup).
### Fix
- `LOG_LISTENER_BIND` default → `0.0.0.0:28000` (accept on any
interface).
- `LOG_LISTENER_ADDR` default → `""` (empty). Production env file
MUST set this to a non-loopback IP. Dev gets a safe no-op
(cfg injector skips emitting log cvars when the address is
empty).
- Production `web.env`: set
`LOG_LISTENER_ADDR=<host-non-loopback-ip>:28000`. The host's
public interface IP works; the kernel's same-host routing
optimization keeps the actual traffic on `lo` internally, but the
*destination IP* in the packet header is non-loopback so Source's
filter passes.
### Implications for the roadmap (revised)
- The vanilla UDP logaddress mechanism works in L4D2. **No SourceMod
bridge is required for Phase 1** — we were going to add one to
work around a non-existent engine bug.
- `mp_logdetail` was correctly removed from the cfg injection: it
is a CS-only cvar; L4D2 prints `Unknown command` at startup.
- A future SourceMod plugin in `l4d2host/` may still be useful for
L4D-specific events the engine doesn't auto-log (kill events with
weapon detail, special-infected spawns) — see
`SMILEWHENYOUDIE/HLstatsX-CE`'s `superlogs-l4d.sp` for prior
art. That's a Phase 2 enhancement, not a Phase 1 prerequisite.
### Lessons (filed under "validate before implementing")
- I anchored hard on "L4D2 engine bug" after disproving two
reasonable alternative hypotheses (hibernation, SocketBindAllow).
The third alternative — destination-IP filter — wasn't tested
before committing the wrong-conclusion docs. Should have tried a
non-loopback destination as the very first test.
- The journal evidence of rich game events going to console/file
was real and significant, but I misread it as proof of an
engine-level UDP stub instead of evidence the engine's line
generator works fine and only the *destination filter* was at
play.
- The user's push-back ("maybe do research if it's really broken?")
forced the right next step. Worth internalizing: a single
contrarian search query found HLstatsX-on-L4D2 production
instances within seconds and would have prevented the
wrong-conclusion commit.
## Follow-ups (separate plans) ## Follow-ups (separate plans)
- **Phase 2:** pick Path A (SourceMod bridge) vs Path B (journald - **After a few days of capture data**: design
follower). A is cleaner long-term; B is faster to ship. `RawLogLine` / `LogEvent` / `MatchSession` / `Round` schema.
- Once events flow somehow: design `RawLogLine` / `LogEvent` /
`MatchSession` / `Round` schema based on what we actually see.
- Build a reducer. - Build a reducer.
- `sv_logsecret` becomes relevant only once the listener is - Consider a SourceMod bridge for richer L4D2-specific events
reachable beyond loopback (currently moot). (kill weapons, special-infected spawns, finale outcomes).
- `sv_logsecret` if the listener is ever moved off-host.

View file

@ -22,8 +22,13 @@ DEFAULT_CONFIG: dict[str, object] = {
"STEAM_PROFILE_TTL_SECONDS": 86400, "STEAM_PROFILE_TTL_SECONDS": 86400,
"STEAM_WEB_API_KEY": "", "STEAM_WEB_API_KEY": "",
"LOG_LISTENER_ENABLED": True, "LOG_LISTENER_ENABLED": True,
"LOG_LISTENER_BIND": "127.0.0.1:28000", "LOG_LISTENER_BIND": "0.0.0.0:28000",
"LOG_LISTENER_ADDR": "127.0.0.1:28000", # Empty by default — production MUST set this to a non-loopback IP via
# web.env. Source silently drops logaddress destinations in 127.0.0.0/8,
# so 127.0.0.1 here would register but never receive packets. When
# empty, the cfg injector skips emitting `log on` / `logaddress_add`,
# which is the safer no-op for dev environments without a srcds host.
"LOG_LISTENER_ADDR": "",
"LOG_CAPTURE_DIR": "/var/lib/left4me/captures", "LOG_CAPTURE_DIR": "/var/lib/left4me/captures",
} }
@ -54,7 +59,7 @@ def load_config() -> dict[str, object]:
"STEAM_PROFILE_TTL_SECONDS": int(os.getenv("STEAM_PROFILE_TTL_SECONDS", "86400")), "STEAM_PROFILE_TTL_SECONDS": int(os.getenv("STEAM_PROFILE_TTL_SECONDS", "86400")),
"STEAM_WEB_API_KEY": os.getenv("STEAM_WEB_API_KEY", ""), "STEAM_WEB_API_KEY": os.getenv("STEAM_WEB_API_KEY", ""),
"LOG_LISTENER_ENABLED": _bool_from_env(os.getenv("LOG_LISTENER_ENABLED", "true")), "LOG_LISTENER_ENABLED": _bool_from_env(os.getenv("LOG_LISTENER_ENABLED", "true")),
"LOG_LISTENER_BIND": os.getenv("LOG_LISTENER_BIND", "127.0.0.1:28000"), "LOG_LISTENER_BIND": os.getenv("LOG_LISTENER_BIND", "0.0.0.0:28000"),
"LOG_LISTENER_ADDR": os.getenv("LOG_LISTENER_ADDR", "127.0.0.1:28000"), "LOG_LISTENER_ADDR": os.getenv("LOG_LISTENER_ADDR", ""),
"LOG_CAPTURE_DIR": os.getenv("LOG_CAPTURE_DIR", "/var/lib/left4me/captures"), "LOG_CAPTURE_DIR": os.getenv("LOG_CAPTURE_DIR", "/var/lib/left4me/captures"),
} }

View file

@ -55,16 +55,15 @@ def build_server_spec_payload(
# nor hostname can override it (Source's cvar semantics are last-wins). # nor hostname can override it (Source's cvar semantics are last-wins).
if server.rcon_password: if server.rcon_password:
config_lines.append(f'rcon_password "{server.rcon_password}"') config_lines.append(f'rcon_password "{server.rcon_password}"')
# HL-Log-Standard UDP streaming registration. Investigation on 2026-05-20 # HL-Log-Standard UDP streaming registration. CRITICAL: log_listener_addr
# established that L4D2's vanilla logaddress UDP emit path is stubbed/broken: # MUST be a non-loopback IP. The Source engine silently drops logaddress
# `log` cvar reports "logging to: file, console, udp" and `logaddress_list` # destinations in 127.0.0.0/8 — registration succeeds (logaddress_list
# shows the registration, but srcds_linux makes ZERO sendto/sendmsg calls # shows it, `log` cvar reports "logging to: udp") but sendto is never
# toward the destination (proven by strace + tcpdump showing 0 packets). # called for that destination. Use the host's non-loopback IP (public
# Local file logging works fine. So the listener currently captures nothing # interface, wireguard, dummy interface, etc.) and have the listener
# in production. The cvars are kept registered for a future SourceMod # bind on 0.0.0.0 to receive same-host packets routed via lo by the
# bridge plugin that will call LogMessage() directly — those messages DO # kernel's local-destination shortcut.
# flow through logaddress destinations on every Source 1 game including L4D2. # `mp_logdetail` is CS-only and emits `Unknown command` on L4D2; omitted.
# `mp_logdetail` was removed: it is a CS-only cvar (`Unknown command` on L4D2).
if log_listener_addr: if log_listener_addr:
config_lines.append("log on") config_lines.append("log on")
config_lines.append(f"logaddress_add {log_listener_addr}") config_lines.append(f"logaddress_add {log_listener_addr}")

View file

@ -1,4 +1,4 @@
"""Temporary UDP listener for srcds log streaming (HL Log Standard). """UDP listener for srcds log streaming (HL Log Standard).
Captures raw `logaddress_add`-streamed packets from every managed L4D2 Captures raw `logaddress_add`-streamed packets from every managed L4D2
server into flat files under LOG_CAPTURE_DIR one file per source server into flat files under LOG_CAPTURE_DIR one file per source
@ -6,16 +6,16 @@ server into flat files under LOG_CAPTURE_DIR — one file per source
reducer. The goal is to observe what L4D2 actually emits on our reducer. The goal is to observe what L4D2 actually emits on our
servers before we commit to a schema or event vocabulary. servers before we commit to a schema or event vocabulary.
KNOWN-DEAD AS OF 2026-05-20: L4D2's vanilla logaddress UDP emit path CRITICAL: the listener MUST bind on a non-loopback interface (default
is stubbed/broken at the engine level. srcds reports LOG_LISTENER_BIND=0.0.0.0:28000) and the cfg-injected logaddress MUST
"currently logging to: file, console, udp" and logaddress_list shows point to a non-loopback IP (LOG_LISTENER_ADDR set in web.env).
our address, but strace + tcpdump confirmed srcds_linux makes ZERO The Source engine silently drops logaddress destinations in
sendto/sendmsg calls toward the destination. Local file logging works 127.0.0.0/8 registration succeeds but no packets are ever sent
fine; only UDP broadcast is dead. This matches the broken-SourceTV (verified 2026-05-20 with strace + tcpdump). Use the host's public
pattern in L4D2 see docs/superpowers/plans/2026-05-19-...md for the interface IP, a wireguard interface IP, or a dummy interface IP; the
full investigation. The listener is therefore a no-op in production kernel's same-host routing optimization will keep the traffic on lo
until a SourceMod bridge plugin is deployed that calls LogMessage() internally but the destination IP must not literally be in 127/8.
directly (those messages DO flow through logaddress destinations). See docs/superpowers/plans/2026-05-19-...md for the full investigation.
Packet shape (per HL Log Standard): Packet shape (per HL Log Standard):
4 bytes 0xFF | out-of-band marker 4 bytes 0xFF | out-of-band marker