fix(log-streaming): point logaddress at non-loopback IP

Reversed the wrong conclusion in 46bba0d. The vanilla L4D2 logaddress
UDP path is NOT broken — proved by retesting with destination
172.30.0.5:28000 (wireguard IP) which yielded 8 properly framed HL
Log Standard packets in 12s including real game events.

Root cause: the Source engine silently drops logaddress destinations
in 127.0.0.0/8. Registration succeeds and the cvar API reports
"logging to: udp" but sendto is never called for loopback. Every
other L4D2 stats deployment (multiple production HLstatsX:CE
instances) puts the collector on a separate host or interface IP
and never hits this.

Defaults: LOG_LISTENER_BIND=0.0.0.0:28000 (accept on any interface);
LOG_LISTENER_ADDR="" (production must set via web.env to the host's
non-loopback IP). Empty default = safe no-op for dev. The kernel's
same-host routing optimization keeps the traffic on lo internally
but the packet's destination IP must not literally be in 127/8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
mwiegand 2026-05-20 00:31:45 +02:00
parent 46bba0d134
commit 122e0abddd
No known key found for this signature in database
4 changed files with 123 additions and 78 deletions

View file

@ -234,17 +234,21 @@ End-to-end smoke test, on the dev box first:
## Post-deploy verification findings (2026-05-20)
The plan above describes what we INTENDED to ship; this section
documents what we ACTUALLY learned after deployment.
The plan above describes what we INTENDED to ship. This section
documents what we ACTUALLY learned, including a wrong intermediate
conclusion that was later corrected.
Listener was deployed and confirmed bound (`ss -ulnp` showed
`gunicorn:28000`). Servers were re-initialized, `server.cfg`
inspected and confirmed correct — `log on`, `logaddress_add
127.0.0.1:28000` at the bottom, after `rcon_password`. The listener
was proven healthy end-to-end with a local `nc` probe writing a
capture file. **Yet captures stayed empty even during active play.**
### What worked
Diagnostic steps:
Listener deployed and confirmed bound (`ss -ulnp` showed
`gunicorn:28000`). Servers re-initialized; `server.cfg` had
`log on` / `logaddress_add 127.0.0.1:28000` after `rcon_password`.
Listener proven healthy end-to-end with a local `nc` probe writing
a capture file.
### What didn't (and why)
Captures stayed empty even during active play. Symptoms:
1. `logaddress_list` via RCON → `1 entry: 127.0.0.1:28000`
2. `log` via RCON → `currently logging to: file, console, udp`
@ -256,54 +260,91 @@ Diagnostic steps:
4. `tcpdump -i lo udp port 28000` during rcon say/echo/status bursts
**0 packets**
5. `tcpdump -i any host 127.0.0.1 and udp` → still 0 ✗
6. Toggle `log off` / `log on` live → engine confirms cycle, no
effect ✗
7. `sv_logflush 1` live → no effect ✗
8. Suspected `SocketBindAllow=udp:27000-27999` was blocking the
implicit ephemeral bind of a separate log socket. Added drop-in
`SocketBindAllow=udp:32768-60999`, daemon-reloaded, restarted
server → still 0 packets ✗. Drop-in rolled back.
9. `strace -p <srcds_linux> -e sendto,sendmsg` while triggering
events → **zero sendto/sendmsg calls toward 127.0.0.1:28000**.
Only one unrelated TLS sendmsg (Steam) ✗.
6. Toggle `log off` / `log on` live, `sv_logflush 1` → no effect ✗
7. Tested `SocketBindAllow=udp:32768-60999` drop-in (suspected the
ephemeral source-port bind was being rejected) → still 0
packets. Drop-in rolled back ✗
8. `strace -p <srcds_linux> -e sendto,sendmsg` → **zero sendto
calls toward the destination** ✗
**Verdict:** L4D2's vanilla UDP logaddress emit path is
stubbed/broken at the engine level. The cvar API and registration
paths are intact (data-structure ops); the emit path simply doesn't
make the syscalls. Logs reach the **file** and **console** sinks
(the latter is what `journalctl` captures from srcds_run's stdout)
but never the **udp** sink, despite all three being listed in the
`log` cvar status. Same family of issue as L4D2's broken SourceTV
(see `shqke/sourcetvsupport`).
A premature conclusion was reached and committed as `46bba0d`
("L4D2 logaddress UDP emit is dead"). User pushed back, asked for
verification. Research found multiple production HLstatsX:CE
instances running L4D2 stats successfully — disproving the
engine-bug theory.
**Implications for the roadmap:**
### Real cause
- The current listener captures nothing in production. The cfg
injection of `log on` / `logaddress_add` does no harm and is kept
for a future bridge to attach to.
- Phase 2 path A (recommended): **SourceMod bridge plugin** in
`l4d2host/`. Subscribes to L4D2 game events and calls
`LogMessage()` from inside srcds. The explicit `LogMessage()`
path is implemented separately from the autonomous engine emit
and DOES reach UDP destinations on every Source 1 game stat
trackers have used in production. This proves out the listener.
- Phase 2 path B (interim, no plugin): **parse the journal**.
`journalctl -u "left4me-server@*.service" -f` already streams
every HL-Log-Standard line as captured srcds stdout. A reducer
that follows journald + matches the `srcds_run[*]: L ...` lines
produces structured events today without any srcds-side change.
Less clean than UDP-to-listener (we'd hop through journald) but
available immediately.
- `mp_logdetail` was removed from the cfg injection: it is a
CS-only cvar (L4D2 prints `Unknown command "mp_logdetail"` at
startup).
Re-registered logaddress at a non-loopback IP (`172.30.0.5:28000`
on the wireguard interface) and reran the test. **8 packets in 12
seconds**, each a properly framed HL-Log-Standard payload —
including `Console<0>" say "wg-test-1"`, all the rcon-from lines,
and live poller status calls.
**The Source engine silently drops `logaddress` destinations in
`127.0.0.0/8`.** Registration succeeds (data-structure op), the
cvar API reports "logging to: udp", but the engine's send loop
filters out loopback destinations and never calls sendto for them.
This is presumably an anti-self-loop / anti-amplification measure
that I have not seen documented in any Valve or community source.
Everyone else using `logaddress` for stat tracking puts the
collector on a *separate host* or at minimum a different interface
IP — they never hit this. We're the unusual case of co-locating
the listener with the gameserver and naively pointing at
`127.0.0.1`.
### Fix
- `LOG_LISTENER_BIND` default → `0.0.0.0:28000` (accept on any
interface).
- `LOG_LISTENER_ADDR` default → `""` (empty). Production env file
MUST set this to a non-loopback IP. Dev gets a safe no-op
(cfg injector skips emitting log cvars when the address is
empty).
- Production `web.env`: set
`LOG_LISTENER_ADDR=<host-non-loopback-ip>:28000`. The host's
public interface IP works; the kernel's same-host routing
optimization keeps the actual traffic on `lo` internally, but the
*destination IP* in the packet header is non-loopback so Source's
filter passes.
### Implications for the roadmap (revised)
- The vanilla UDP logaddress mechanism works in L4D2. **No SourceMod
bridge is required for Phase 1** — we were going to add one to
work around a non-existent engine bug.
- `mp_logdetail` was correctly removed from the cfg injection: it
is a CS-only cvar; L4D2 prints `Unknown command` at startup.
- A future SourceMod plugin in `l4d2host/` may still be useful for
L4D-specific events the engine doesn't auto-log (kill events with
weapon detail, special-infected spawns) — see
`SMILEWHENYOUDIE/HLstatsX-CE`'s `superlogs-l4d.sp` for prior
art. That's a Phase 2 enhancement, not a Phase 1 prerequisite.
### Lessons (filed under "validate before implementing")
- I anchored hard on "L4D2 engine bug" after disproving two
reasonable alternative hypotheses (hibernation, SocketBindAllow).
The third alternative — destination-IP filter — wasn't tested
before committing the wrong-conclusion docs. Should have tried a
non-loopback destination as the very first test.
- The journal evidence of rich game events going to console/file
was real and significant, but I misread it as proof of an
engine-level UDP stub instead of evidence the engine's line
generator works fine and only the *destination filter* was at
play.
- The user's push-back ("maybe do research if it's really broken?")
forced the right next step. Worth internalizing: a single
contrarian search query found HLstatsX-on-L4D2 production
instances within seconds and would have prevented the
wrong-conclusion commit.
## Follow-ups (separate plans)
- **Phase 2:** pick Path A (SourceMod bridge) vs Path B (journald
follower). A is cleaner long-term; B is faster to ship.
- Once events flow somehow: design `RawLogLine` / `LogEvent` /
`MatchSession` / `Round` schema based on what we actually see.
- **After a few days of capture data**: design
`RawLogLine` / `LogEvent` / `MatchSession` / `Round` schema.
- Build a reducer.
- `sv_logsecret` becomes relevant only once the listener is
reachable beyond loopback (currently moot).
- Consider a SourceMod bridge for richer L4D2-specific events
(kill weapons, special-infected spawns, finale outcomes).
- `sv_logsecret` if the listener is ever moved off-host.

View file

@ -22,8 +22,13 @@ DEFAULT_CONFIG: dict[str, object] = {
"STEAM_PROFILE_TTL_SECONDS": 86400,
"STEAM_WEB_API_KEY": "",
"LOG_LISTENER_ENABLED": True,
"LOG_LISTENER_BIND": "127.0.0.1:28000",
"LOG_LISTENER_ADDR": "127.0.0.1:28000",
"LOG_LISTENER_BIND": "0.0.0.0:28000",
# Empty by default — production MUST set this to a non-loopback IP via
# web.env. Source silently drops logaddress destinations in 127.0.0.0/8,
# so 127.0.0.1 here would register but never receive packets. When
# empty, the cfg injector skips emitting `log on` / `logaddress_add`,
# which is the safer no-op for dev environments without a srcds host.
"LOG_LISTENER_ADDR": "",
"LOG_CAPTURE_DIR": "/var/lib/left4me/captures",
}
@ -54,7 +59,7 @@ def load_config() -> dict[str, object]:
"STEAM_PROFILE_TTL_SECONDS": int(os.getenv("STEAM_PROFILE_TTL_SECONDS", "86400")),
"STEAM_WEB_API_KEY": os.getenv("STEAM_WEB_API_KEY", ""),
"LOG_LISTENER_ENABLED": _bool_from_env(os.getenv("LOG_LISTENER_ENABLED", "true")),
"LOG_LISTENER_BIND": os.getenv("LOG_LISTENER_BIND", "127.0.0.1:28000"),
"LOG_LISTENER_ADDR": os.getenv("LOG_LISTENER_ADDR", "127.0.0.1:28000"),
"LOG_LISTENER_BIND": os.getenv("LOG_LISTENER_BIND", "0.0.0.0:28000"),
"LOG_LISTENER_ADDR": os.getenv("LOG_LISTENER_ADDR", ""),
"LOG_CAPTURE_DIR": os.getenv("LOG_CAPTURE_DIR", "/var/lib/left4me/captures"),
}

View file

@ -55,16 +55,15 @@ def build_server_spec_payload(
# nor hostname can override it (Source's cvar semantics are last-wins).
if server.rcon_password:
config_lines.append(f'rcon_password "{server.rcon_password}"')
# HL-Log-Standard UDP streaming registration. Investigation on 2026-05-20
# established that L4D2's vanilla logaddress UDP emit path is stubbed/broken:
# `log` cvar reports "logging to: file, console, udp" and `logaddress_list`
# shows the registration, but srcds_linux makes ZERO sendto/sendmsg calls
# toward the destination (proven by strace + tcpdump showing 0 packets).
# Local file logging works fine. So the listener currently captures nothing
# in production. The cvars are kept registered for a future SourceMod
# bridge plugin that will call LogMessage() directly — those messages DO
# flow through logaddress destinations on every Source 1 game including L4D2.
# `mp_logdetail` was removed: it is a CS-only cvar (`Unknown command` on L4D2).
# HL-Log-Standard UDP streaming registration. CRITICAL: log_listener_addr
# MUST be a non-loopback IP. The Source engine silently drops logaddress
# destinations in 127.0.0.0/8 — registration succeeds (logaddress_list
# shows it, `log` cvar reports "logging to: udp") but sendto is never
# called for that destination. Use the host's non-loopback IP (public
# interface, wireguard, dummy interface, etc.) and have the listener
# bind on 0.0.0.0 to receive same-host packets routed via lo by the
# kernel's local-destination shortcut.
# `mp_logdetail` is CS-only and emits `Unknown command` on L4D2; omitted.
if log_listener_addr:
config_lines.append("log on")
config_lines.append(f"logaddress_add {log_listener_addr}")

View file

@ -1,4 +1,4 @@
"""Temporary UDP listener for srcds log streaming (HL Log Standard).
"""UDP listener for srcds log streaming (HL Log Standard).
Captures raw `logaddress_add`-streamed packets from every managed L4D2
server into flat files under LOG_CAPTURE_DIR one file per source
@ -6,16 +6,16 @@ server into flat files under LOG_CAPTURE_DIR — one file per source
reducer. The goal is to observe what L4D2 actually emits on our
servers before we commit to a schema or event vocabulary.
KNOWN-DEAD AS OF 2026-05-20: L4D2's vanilla logaddress UDP emit path
is stubbed/broken at the engine level. srcds reports
"currently logging to: file, console, udp" and logaddress_list shows
our address, but strace + tcpdump confirmed srcds_linux makes ZERO
sendto/sendmsg calls toward the destination. Local file logging works
fine; only UDP broadcast is dead. This matches the broken-SourceTV
pattern in L4D2 see docs/superpowers/plans/2026-05-19-...md for the
full investigation. The listener is therefore a no-op in production
until a SourceMod bridge plugin is deployed that calls LogMessage()
directly (those messages DO flow through logaddress destinations).
CRITICAL: the listener MUST bind on a non-loopback interface (default
LOG_LISTENER_BIND=0.0.0.0:28000) and the cfg-injected logaddress MUST
point to a non-loopback IP (LOG_LISTENER_ADDR set in web.env).
The Source engine silently drops logaddress destinations in
127.0.0.0/8 registration succeeds but no packets are ever sent
(verified 2026-05-20 with strace + tcpdump). Use the host's public
interface IP, a wireguard interface IP, or a dummy interface IP; the
kernel's same-host routing optimization will keep the traffic on lo
internally but the destination IP must not literally be in 127/8.
See docs/superpowers/plans/2026-05-19-...md for the full investigation.
Packet shape (per HL Log Standard):
4 bytes 0xFF | out-of-band marker