fix(log-streaming): point logaddress at non-loopback IP
Reversed the wrong conclusion in 46bba0d. The vanilla L4D2 logaddress
UDP path is NOT broken — proved by retesting with destination
172.30.0.5:28000 (wireguard IP) which yielded 8 properly framed HL
Log Standard packets in 12s including real game events.
Root cause: the Source engine silently drops logaddress destinations
in 127.0.0.0/8. Registration succeeds and the cvar API reports
"logging to: udp" but sendto is never called for loopback. Every
other L4D2 stats deployment (multiple production HLstatsX:CE
instances) puts the collector on a separate host or interface IP
and never hits this.
Defaults: LOG_LISTENER_BIND=0.0.0.0:28000 (accept on any interface);
LOG_LISTENER_ADDR="" (production must set via web.env to the host's
non-loopback IP). Empty default = safe no-op for dev. The kernel's
same-host routing optimization keeps the traffic on lo internally
but the packet's destination IP must not literally be in 127/8.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
46bba0d134
commit
122e0abddd
4 changed files with 123 additions and 78 deletions
|
|
@ -234,17 +234,21 @@ End-to-end smoke test, on the dev box first:
|
|||
|
||||
## Post-deploy verification findings (2026-05-20)
|
||||
|
||||
The plan above describes what we INTENDED to ship; this section
|
||||
documents what we ACTUALLY learned after deployment.
|
||||
The plan above describes what we INTENDED to ship. This section
|
||||
documents what we ACTUALLY learned, including a wrong intermediate
|
||||
conclusion that was later corrected.
|
||||
|
||||
Listener was deployed and confirmed bound (`ss -ulnp` showed
|
||||
`gunicorn:28000`). Servers were re-initialized, `server.cfg`
|
||||
inspected and confirmed correct — `log on`, `logaddress_add
|
||||
127.0.0.1:28000` at the bottom, after `rcon_password`. The listener
|
||||
was proven healthy end-to-end with a local `nc` probe writing a
|
||||
capture file. **Yet captures stayed empty even during active play.**
|
||||
### What worked
|
||||
|
||||
Diagnostic steps:
|
||||
Listener deployed and confirmed bound (`ss -ulnp` showed
|
||||
`gunicorn:28000`). Servers re-initialized; `server.cfg` had
|
||||
`log on` / `logaddress_add 127.0.0.1:28000` after `rcon_password`.
|
||||
Listener proven healthy end-to-end with a local `nc` probe writing
|
||||
a capture file.
|
||||
|
||||
### What didn't (and why)
|
||||
|
||||
Captures stayed empty even during active play. Symptoms:
|
||||
|
||||
1. `logaddress_list` via RCON → `1 entry: 127.0.0.1:28000` ✓
|
||||
2. `log` via RCON → `currently logging to: file, console, udp` ✓
|
||||
|
|
@ -256,54 +260,91 @@ Diagnostic steps:
|
|||
4. `tcpdump -i lo udp port 28000` during rcon say/echo/status bursts
|
||||
→ **0 packets** ✗
|
||||
5. `tcpdump -i any host 127.0.0.1 and udp` → still 0 ✗
|
||||
6. Toggle `log off` / `log on` live → engine confirms cycle, no
|
||||
effect ✗
|
||||
7. `sv_logflush 1` live → no effect ✗
|
||||
8. Suspected `SocketBindAllow=udp:27000-27999` was blocking the
|
||||
implicit ephemeral bind of a separate log socket. Added drop-in
|
||||
`SocketBindAllow=udp:32768-60999`, daemon-reloaded, restarted
|
||||
server → still 0 packets ✗. Drop-in rolled back.
|
||||
9. `strace -p <srcds_linux> -e sendto,sendmsg` while triggering
|
||||
events → **zero sendto/sendmsg calls toward 127.0.0.1:28000**.
|
||||
Only one unrelated TLS sendmsg (Steam) ✗.
|
||||
6. Toggle `log off` / `log on` live, `sv_logflush 1` → no effect ✗
|
||||
7. Tested `SocketBindAllow=udp:32768-60999` drop-in (suspected the
|
||||
ephemeral source-port bind was being rejected) → still 0
|
||||
packets. Drop-in rolled back ✗
|
||||
8. `strace -p <srcds_linux> -e sendto,sendmsg` → **zero sendto
|
||||
calls toward the destination** ✗
|
||||
|
||||
**Verdict:** L4D2's vanilla UDP logaddress emit path is
|
||||
stubbed/broken at the engine level. The cvar API and registration
|
||||
paths are intact (data-structure ops); the emit path simply doesn't
|
||||
make the syscalls. Logs reach the **file** and **console** sinks
|
||||
(the latter is what `journalctl` captures from srcds_run's stdout)
|
||||
but never the **udp** sink, despite all three being listed in the
|
||||
`log` cvar status. Same family of issue as L4D2's broken SourceTV
|
||||
(see `shqke/sourcetvsupport`).
|
||||
A premature conclusion was reached and committed as `46bba0d`
|
||||
("L4D2 logaddress UDP emit is dead"). User pushed back, asked for
|
||||
verification. Research found multiple production HLstatsX:CE
|
||||
instances running L4D2 stats successfully — disproving the
|
||||
engine-bug theory.
|
||||
|
||||
**Implications for the roadmap:**
|
||||
### Real cause
|
||||
|
||||
- The current listener captures nothing in production. The cfg
|
||||
injection of `log on` / `logaddress_add` does no harm and is kept
|
||||
for a future bridge to attach to.
|
||||
- Phase 2 path A (recommended): **SourceMod bridge plugin** in
|
||||
`l4d2host/`. Subscribes to L4D2 game events and calls
|
||||
`LogMessage()` from inside srcds. The explicit `LogMessage()`
|
||||
path is implemented separately from the autonomous engine emit
|
||||
and DOES reach UDP destinations on every Source 1 game stat
|
||||
trackers have used in production. This proves out the listener.
|
||||
- Phase 2 path B (interim, no plugin): **parse the journal**.
|
||||
`journalctl -u "left4me-server@*.service" -f` already streams
|
||||
every HL-Log-Standard line as captured srcds stdout. A reducer
|
||||
that follows journald + matches the `srcds_run[*]: L ...` lines
|
||||
produces structured events today without any srcds-side change.
|
||||
Less clean than UDP-to-listener (we'd hop through journald) but
|
||||
available immediately.
|
||||
- `mp_logdetail` was removed from the cfg injection: it is a
|
||||
CS-only cvar (L4D2 prints `Unknown command "mp_logdetail"` at
|
||||
startup).
|
||||
Re-registered logaddress at a non-loopback IP (`172.30.0.5:28000`
|
||||
on the wireguard interface) and reran the test. **8 packets in 12
|
||||
seconds**, each a properly framed HL-Log-Standard payload —
|
||||
including `Console<0>" say "wg-test-1"`, all the rcon-from lines,
|
||||
and live poller status calls.
|
||||
|
||||
**The Source engine silently drops `logaddress` destinations in
|
||||
`127.0.0.0/8`.** Registration succeeds (data-structure op), the
|
||||
cvar API reports "logging to: udp", but the engine's send loop
|
||||
filters out loopback destinations and never calls sendto for them.
|
||||
This is presumably an anti-self-loop / anti-amplification measure
|
||||
that I have not seen documented in any Valve or community source.
|
||||
|
||||
Everyone else using `logaddress` for stat tracking puts the
|
||||
collector on a *separate host* or at minimum a different interface
|
||||
IP — they never hit this. We're the unusual case of co-locating
|
||||
the listener with the gameserver and naively pointing at
|
||||
`127.0.0.1`.
|
||||
|
||||
### Fix
|
||||
|
||||
- `LOG_LISTENER_BIND` default → `0.0.0.0:28000` (accept on any
|
||||
interface).
|
||||
- `LOG_LISTENER_ADDR` default → `""` (empty). Production env file
|
||||
MUST set this to a non-loopback IP. Dev gets a safe no-op
|
||||
(cfg injector skips emitting log cvars when the address is
|
||||
empty).
|
||||
- Production `web.env`: set
|
||||
`LOG_LISTENER_ADDR=<host-non-loopback-ip>:28000`. The host's
|
||||
public interface IP works; the kernel's same-host routing
|
||||
optimization keeps the actual traffic on `lo` internally, but the
|
||||
*destination IP* in the packet header is non-loopback so Source's
|
||||
filter passes.
|
||||
|
||||
### Implications for the roadmap (revised)
|
||||
|
||||
- The vanilla UDP logaddress mechanism works in L4D2. **No SourceMod
|
||||
bridge is required for Phase 1** — we were going to add one to
|
||||
work around a non-existent engine bug.
|
||||
- `mp_logdetail` was correctly removed from the cfg injection: it
|
||||
is a CS-only cvar; L4D2 prints `Unknown command` at startup.
|
||||
- A future SourceMod plugin in `l4d2host/` may still be useful for
|
||||
L4D-specific events the engine doesn't auto-log (kill events with
|
||||
weapon detail, special-infected spawns) — see
|
||||
`SMILEWHENYOUDIE/HLstatsX-CE`'s `superlogs-l4d.sp` for prior
|
||||
art. That's a Phase 2 enhancement, not a Phase 1 prerequisite.
|
||||
|
||||
### Lessons (filed under "validate before implementing")
|
||||
|
||||
- I anchored hard on "L4D2 engine bug" after disproving two
|
||||
reasonable alternative hypotheses (hibernation, SocketBindAllow).
|
||||
The third alternative — destination-IP filter — wasn't tested
|
||||
before committing the wrong-conclusion docs. Should have tried a
|
||||
non-loopback destination as the very first test.
|
||||
- The journal evidence of rich game events going to console/file
|
||||
was real and significant, but I misread it as proof of an
|
||||
engine-level UDP stub instead of evidence the engine's line
|
||||
generator works fine and only the *destination filter* was at
|
||||
play.
|
||||
- The user's push-back ("maybe do research if it's really broken?")
|
||||
forced the right next step. Worth internalizing: a single
|
||||
contrarian search query found HLstatsX-on-L4D2 production
|
||||
instances within seconds and would have prevented the
|
||||
wrong-conclusion commit.
|
||||
|
||||
## Follow-ups (separate plans)
|
||||
|
||||
- **Phase 2:** pick Path A (SourceMod bridge) vs Path B (journald
|
||||
follower). A is cleaner long-term; B is faster to ship.
|
||||
- Once events flow somehow: design `RawLogLine` / `LogEvent` /
|
||||
`MatchSession` / `Round` schema based on what we actually see.
|
||||
- **After a few days of capture data**: design
|
||||
`RawLogLine` / `LogEvent` / `MatchSession` / `Round` schema.
|
||||
- Build a reducer.
|
||||
- `sv_logsecret` becomes relevant only once the listener is
|
||||
reachable beyond loopback (currently moot).
|
||||
- Consider a SourceMod bridge for richer L4D2-specific events
|
||||
(kill weapons, special-infected spawns, finale outcomes).
|
||||
- `sv_logsecret` if the listener is ever moved off-host.
|
||||
|
|
|
|||
|
|
@ -22,8 +22,13 @@ DEFAULT_CONFIG: dict[str, object] = {
|
|||
"STEAM_PROFILE_TTL_SECONDS": 86400,
|
||||
"STEAM_WEB_API_KEY": "",
|
||||
"LOG_LISTENER_ENABLED": True,
|
||||
"LOG_LISTENER_BIND": "127.0.0.1:28000",
|
||||
"LOG_LISTENER_ADDR": "127.0.0.1:28000",
|
||||
"LOG_LISTENER_BIND": "0.0.0.0:28000",
|
||||
# Empty by default — production MUST set this to a non-loopback IP via
|
||||
# web.env. Source silently drops logaddress destinations in 127.0.0.0/8,
|
||||
# so 127.0.0.1 here would register but never receive packets. When
|
||||
# empty, the cfg injector skips emitting `log on` / `logaddress_add`,
|
||||
# which is the safer no-op for dev environments without a srcds host.
|
||||
"LOG_LISTENER_ADDR": "",
|
||||
"LOG_CAPTURE_DIR": "/var/lib/left4me/captures",
|
||||
}
|
||||
|
||||
|
|
@ -54,7 +59,7 @@ def load_config() -> dict[str, object]:
|
|||
"STEAM_PROFILE_TTL_SECONDS": int(os.getenv("STEAM_PROFILE_TTL_SECONDS", "86400")),
|
||||
"STEAM_WEB_API_KEY": os.getenv("STEAM_WEB_API_KEY", ""),
|
||||
"LOG_LISTENER_ENABLED": _bool_from_env(os.getenv("LOG_LISTENER_ENABLED", "true")),
|
||||
"LOG_LISTENER_BIND": os.getenv("LOG_LISTENER_BIND", "127.0.0.1:28000"),
|
||||
"LOG_LISTENER_ADDR": os.getenv("LOG_LISTENER_ADDR", "127.0.0.1:28000"),
|
||||
"LOG_LISTENER_BIND": os.getenv("LOG_LISTENER_BIND", "0.0.0.0:28000"),
|
||||
"LOG_LISTENER_ADDR": os.getenv("LOG_LISTENER_ADDR", ""),
|
||||
"LOG_CAPTURE_DIR": os.getenv("LOG_CAPTURE_DIR", "/var/lib/left4me/captures"),
|
||||
}
|
||||
|
|
|
|||
|
|
@ -55,16 +55,15 @@ def build_server_spec_payload(
|
|||
# nor hostname can override it (Source's cvar semantics are last-wins).
|
||||
if server.rcon_password:
|
||||
config_lines.append(f'rcon_password "{server.rcon_password}"')
|
||||
# HL-Log-Standard UDP streaming registration. Investigation on 2026-05-20
|
||||
# established that L4D2's vanilla logaddress UDP emit path is stubbed/broken:
|
||||
# `log` cvar reports "logging to: file, console, udp" and `logaddress_list`
|
||||
# shows the registration, but srcds_linux makes ZERO sendto/sendmsg calls
|
||||
# toward the destination (proven by strace + tcpdump showing 0 packets).
|
||||
# Local file logging works fine. So the listener currently captures nothing
|
||||
# in production. The cvars are kept registered for a future SourceMod
|
||||
# bridge plugin that will call LogMessage() directly — those messages DO
|
||||
# flow through logaddress destinations on every Source 1 game including L4D2.
|
||||
# `mp_logdetail` was removed: it is a CS-only cvar (`Unknown command` on L4D2).
|
||||
# HL-Log-Standard UDP streaming registration. CRITICAL: log_listener_addr
|
||||
# MUST be a non-loopback IP. The Source engine silently drops logaddress
|
||||
# destinations in 127.0.0.0/8 — registration succeeds (logaddress_list
|
||||
# shows it, `log` cvar reports "logging to: udp") but sendto is never
|
||||
# called for that destination. Use the host's non-loopback IP (public
|
||||
# interface, wireguard, dummy interface, etc.) and have the listener
|
||||
# bind on 0.0.0.0 to receive same-host packets routed via lo by the
|
||||
# kernel's local-destination shortcut.
|
||||
# `mp_logdetail` is CS-only and emits `Unknown command` on L4D2; omitted.
|
||||
if log_listener_addr:
|
||||
config_lines.append("log on")
|
||||
config_lines.append(f"logaddress_add {log_listener_addr}")
|
||||
|
|
|
|||
|
|
@ -1,4 +1,4 @@
|
|||
"""Temporary UDP listener for srcds log streaming (HL Log Standard).
|
||||
"""UDP listener for srcds log streaming (HL Log Standard).
|
||||
|
||||
Captures raw `logaddress_add`-streamed packets from every managed L4D2
|
||||
server into flat files under LOG_CAPTURE_DIR — one file per source
|
||||
|
|
@ -6,16 +6,16 @@ server into flat files under LOG_CAPTURE_DIR — one file per source
|
|||
reducer. The goal is to observe what L4D2 actually emits on our
|
||||
servers before we commit to a schema or event vocabulary.
|
||||
|
||||
KNOWN-DEAD AS OF 2026-05-20: L4D2's vanilla logaddress UDP emit path
|
||||
is stubbed/broken at the engine level. srcds reports
|
||||
"currently logging to: file, console, udp" and logaddress_list shows
|
||||
our address, but strace + tcpdump confirmed srcds_linux makes ZERO
|
||||
sendto/sendmsg calls toward the destination. Local file logging works
|
||||
fine; only UDP broadcast is dead. This matches the broken-SourceTV
|
||||
pattern in L4D2 — see docs/superpowers/plans/2026-05-19-...md for the
|
||||
full investigation. The listener is therefore a no-op in production
|
||||
until a SourceMod bridge plugin is deployed that calls LogMessage()
|
||||
directly (those messages DO flow through logaddress destinations).
|
||||
CRITICAL: the listener MUST bind on a non-loopback interface (default
|
||||
LOG_LISTENER_BIND=0.0.0.0:28000) and the cfg-injected logaddress MUST
|
||||
point to a non-loopback IP (LOG_LISTENER_ADDR set in web.env).
|
||||
The Source engine silently drops logaddress destinations in
|
||||
127.0.0.0/8 — registration succeeds but no packets are ever sent
|
||||
(verified 2026-05-20 with strace + tcpdump). Use the host's public
|
||||
interface IP, a wireguard interface IP, or a dummy interface IP; the
|
||||
kernel's same-host routing optimization will keep the traffic on lo
|
||||
internally but the destination IP must not literally be in 127/8.
|
||||
See docs/superpowers/plans/2026-05-19-...md for the full investigation.
|
||||
|
||||
Packet shape (per HL Log Standard):
|
||||
4 bytes 0xFF | out-of-band marker
|
||||
|
|
|
|||
Loading…
Reference in a new issue