# Server live-state display (counts, map, roster, avatars, history)

## Context

The l4d2web UI currently shows systemd lifecycle state per game server (running/stopped/unknown) but nothing about what's happening *inside* the game: player count, current map, whether the server is hibernating, who is connected. To know any of that, users have to context-switch (open the game, query externally).

The goal is a **read-side live-state display**: counts + map + hibernating on the server list, plus a server-detail panel showing the current player roster (avatars + names) and a "recent players" section for who's been on lately. Backed by a persistent history table so we get count-over-time graphs and player-presence history (foundation for future ban UX) for free.

**Source: RCON exclusively.** A2S_INFO (UDP, anonymous) was investigated and discarded: it can't deliver Steam IDs, hibernating flag, or interactive commands, so anything beyond raw counts re-routes through RCON anyway. Both transports were verified working against prod `left4.me`. Going RCON-only means one transport, one set of tests, no throwaway scaffolding.

**Avatars: Steam Web API.** RCON gives Steam IDs; `ISteamUser/GetPlayerSummaries` resolves them to persona names + avatar URLs hot-linked from Steam's CDN. API key already obtained.

**Commands are deferred** to a separate plan. This plan is read-only.

---

## Architecture

```
                       ┌─────────────────────────────┐
                       │     left4me-web (Flask)     │
┌──────────────┐  RCON │  ┌───────────────────────┐  │
│ srcds 27016  │◄──────┼──┤ live-state poller     │  │
└──────────────┘  TCP  │  │ (daemon thread)       │  │
                       │  └───────┬───────────────┘  │
┌──────────────┐  RCON │          │ writes           │
│ srcds 27021  │◄──────┤          ▼                  │
└──────────────┘       │  ┌───────────────────────┐  │
                       │  │ server_live_state     │  │
       Steam Web API   │  │ server_player_session │  │
       ┌────────────┐  │  │ steam_user_profile    │  │
       │ Steam CDN  │◄─┼──┤                       │  │
       │ avatars... │  │  └───────┬───────────────┘  │
       └────────────┘  │          │ reads            │
              ▲        │          ▼                  │
              │        │  ┌───────────────────────┐  │
              └────────┼──┤ /servers, /servers/N  │  │
                       │  │ (HTMX 5s refresh)     │  │
                       │  └───────────────────────┘  │
                       └─────────────────────────────┘
```

Single daemon thread (modeled on the existing `start_state_poller` in `l4d2web/services/job_worker.py:617-647`), inside the Flask process, polls every `LIVE_STATE_POLL_SECONDS` (default 5). Per poll, per running server with a configured RCON password:

1. TCP connect to `127.0.0.1:`, auth, send `status`, parse response.
2. Compare server-level state (players/map/hibernating/etc.) to the latest `server_live_state` row for this server. If unchanged, bump `last_seen_at`. If changed, insert a new row.
3. Reconcile open sessions (`server_player_session` rows where `left_at IS NULL`) with the current `status` roster: open new sessions for new players (backfilling `joined_at` from RCON's `connected` field), close sessions for players no longer present, update `min_ping`/`max_ping` for continuing sessions.
4. Collect Steam IDs that are missing from `steam_user_profile` or have `fetched_at` older than 24h; batch them into a single `GetPlayerSummaries` call; upsert results.
5. Trim `server_live_state` and closed sessions older than retention.

---

## Schema (one new alembic migration)

### New column: `servers.rcon_password`

```python
rcon_password: Mapped[str] = mapped_column(
    String(64), nullable=False, default="", server_default=""
)
```

Empty string = "no password configured yet" (poller skips). Migration backfills every existing row with `secrets.token_urlsafe(32)` (~43 chars, URL-safe character set so the literal `"..."` cfg-quoting needs no escaping).

### `server_live_state`: run-length-encoded snapshots

```sql
CREATE TABLE server_live_state (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    server_id    INTEGER NOT NULL REFERENCES servers(id) ON DELETE CASCADE,
    started_at   DATETIME NOT NULL,  -- when this exact state first appeared
    last_seen_at DATETIME NOT NULL,  -- most recent poll where it still held
    players      INTEGER NOT NULL,
    max_players  INTEGER NOT NULL,
    bots         INTEGER NOT NULL,
    map          VARCHAR(64) NOT NULL,
    hibernating  BOOLEAN NOT NULL
);
CREATE INDEX ix_sls_server_started ON server_live_state(server_id, started_at DESC);
```

- "State" = the tuple `(players, max_players, bots, map, hibernating)`. Ping/loss are deliberately not stored at server-level, so they don't churn rows.
- Idle hibernating server collapses from one-row-per-poll to one-row-per-state-change (≈17,280× compression for a 24h-idle server at the default 5s interval: 86,400 s / 5 s = 17,280 polls folded into one row).
- Latest snapshot for a server: `ORDER BY started_at DESC LIMIT 1`. UI staleness check: `last_seen_at > now - LIVE_STATE_STALE_SECONDS` (default 30).
- Retention: trim rows where `last_seen_at < now - LIVE_STATE_HISTORY_DAYS` (default 30).
- Failed polls produce no DB write; the staleness check on `last_seen_at` handles UI degradation cleanly.

### `server_player_session`: interval per connection

```sql
CREATE TABLE server_player_session (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    server_id    INTEGER NOT NULL REFERENCES servers(id) ON DELETE CASCADE,
    steam_id_64  VARCHAR(20) NOT NULL,
    joined_at    DATETIME NOT NULL,
    left_at      DATETIME NULL,      -- NULL = currently in-game
    name_at_join VARCHAR(64) NOT NULL,
    min_ping     INTEGER NOT NULL,
    max_ping     INTEGER NOT NULL
);
CREATE INDEX ix_sps_server_open ON server_player_session(server_id, left_at);
CREATE INDEX ix_sps_steam_history ON server_player_session(steam_id_64, joined_at);
```

- `joined_at` is **backfilled from RCON's `connected` duration** on first sighting (`joined_at = now - connected_seconds`). This heals brief polling gaps and survives web restarts: even if we just started polling, we know when the still-connected players actually joined.
- A player who disconnects and rejoins gets two rows, not one merged interval.
- Bots are excluded: rows with a non-`STEAM_X:Y:Z` uniqueid are skipped.
- `min_ping`/`max_ping` updated only when a new poll pushes the range, to avoid noise writes.
- On poller startup, close any sessions whose server isn't in current RCON output. Plus: close sessions once their server has had no successful poll for `STUCK_SESSION_SECONDS` (default 60, i.e. about 12 consecutive failed polls at the 5s interval).
- Retention: trim closed sessions where `left_at < now - LIVE_STATE_HISTORY_DAYS` (default 30). Open sessions never trimmed.

### `steam_user_profile`: cached profile data (24h TTL)

```sql
CREATE TABLE steam_user_profile (
    steam_id_64  VARCHAR(20) PRIMARY KEY,
    persona_name VARCHAR(64) NOT NULL,
    avatar_url   TEXT NOT NULL,      -- avatarmedium from Steam Web API
    fetched_at   DATETIME NOT NULL
);
```

- Cache is global, not per-server (one profile per Steam ID).
- Refreshed when `fetched_at < now - 24h` or when entry is missing.
- Soft-fail: if the Steam API key is unset, the API is down, or a profile is private, we just leave the cache as-is and the UI falls back to `name_at_join` + placeholder avatar.

### Bind-rendered queries

**Current players on server X:**

```sql
SELECT sp.steam_id_64, sp.joined_at, sp.name_at_join, sp.min_ping, sp.max_ping,
       p.persona_name, p.avatar_url
FROM server_player_session sp
LEFT JOIN steam_user_profile p USING (steam_id_64)
WHERE sp.server_id = ? AND sp.left_at IS NULL
ORDER BY sp.joined_at;
```

**Recent players on server X (last 30 days, excluding currently in-game):**

```sql
SELECT sp.steam_id_64, MAX(sp.left_at) AS last_seen,
       p.persona_name, p.avatar_url
FROM server_player_session sp
LEFT JOIN steam_user_profile p USING (steam_id_64)
WHERE sp.server_id = ?
  AND sp.left_at IS NOT NULL
  AND sp.left_at > datetime('now', '-30 days')
  AND sp.steam_id_64 NOT IN (
      SELECT steam_id_64 FROM server_player_session
      WHERE server_id = ? AND left_at IS NULL
  )
GROUP BY sp.steam_id_64, p.persona_name, p.avatar_url
ORDER BY last_seen DESC
LIMIT 20;
```

---

## Modules

### `l4d2web/services/rcon.py` (new)

Pure stdlib (`socket`, `struct`), no new dependency. Source RCON protocol:

```python
from dataclasses import dataclass


@dataclass(slots=True, frozen=True)
class PlayerRow:
    steam_id_64: str  # converted from STEAM_X:Y:Z
    name: str
    connected_seconds: int
    ping: int


@dataclass(slots=True, frozen=True)
class StatusResponse:
    map: str
    players: int  # humans
    max_players: int
    bots: int
    hibernating: bool
    roster: list[PlayerRow]


class RconError(Exception): ...
class RconAuthError(RconError): ...


def query_status(host: str, port: int, password: str, *, timeout: float = 2.0) -> StatusResponse: ...
```

Implementation notes:

- Auth handshake quirk verified live: server sends a `type=0` empty-body packet **before** the `type=2` auth response. Consume both. `req_id == -1` on the auth response = bad password.
- Single TCP connection per query (loopback, ~10-20ms total round-trip; pooling not worth it at this scale).
- Header regex on `map :` and `players :` lines (the `(hibernating|not hibernating)` token is in `players :`).
- Roster regex: split lines starting with `#`, skip the column-header line, robustly extract the quoted name + the `STEAM_X:Y:Z` token + `MM:SS` or `HH:MM:SS` connected duration + ping. Tolerate the two-numeric-prefix L4D2 variant (`# 2 1 "Crone" STEAM_1:0:...`).
- Steam ID conversion: `STEAM_X:Y:Z` → `76561197960265728 + (Z * 2) + Y` (returned as string). `Y` is the low bit (0 or 1) and `Z` is the account-number half, so `Z` is the term that gets doubled.

### `l4d2web/services/steam_users.py` (new)

Modeled directly on `l4d2web/services/steam_workshop.py:17-43` (single `requests.Session`, 30s timeout). That module uses an anonymous form-encoded POST; here the call is a GET with query-string parameters, and it additionally carries the `key=` parameter.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(slots=True, frozen=True)
class SteamProfile:
    steam_id_64: str
    persona_name: str
    avatar_url: str  # avatarmedium


def fetch_profiles_batch(steam_ids: Iterable[str], *, api_key: str) -> list[SteamProfile]: ...
```

- Endpoint: `GET https://api.steampowered.com/ISteamUser/GetPlayerSummaries/v0002/?key=&steamids=`.
- Up to 100 IDs per call; caller batches.
- Returns only successful resolutions (private/deleted accounts are simply absent from the response, which is fine: they stay uncached and the UI falls back).
- Raises on transport errors; caller decides whether to surface.

### `l4d2web/services/live_state_poller.py` (new)

Modeled on `start_state_poller` / `state_poller_loop` in `l4d2web/services/job_worker.py:617-647`.

```python
def start_live_state_poller(app) -> None: ...  # spawns daemon thread, skipped under TESTING

def live_state_poller_loop(app, interval: float) -> None: ...

def poll_once() -> None:  # one full pass over running servers
    ...
```

Per-server algorithm:

1. RCON `status` → `StatusResponse` (or skip on auth/timeout, logged via `app.logger`).
2. **Server-level RLE upsert**: load newest `server_live_state` row for this server. If `(players, max_players, bots, map, hibernating)` matches → `UPDATE last_seen_at = now()`. Else → `INSERT` new row.
3. **Session reconciliation** in a single transaction:
   - Load open sessions for this server.
   - For each player in `response.roster` not in open sessions: `INSERT` new session with `joined_at = now - connected_seconds`, `name_at_join = roster.name`, `min_ping = max_ping = roster.ping`.
   - For each open session whose player is in the roster: if `roster.ping < min_ping` or `> max_ping`, `UPDATE` the range. Otherwise skip the write.
   - For each open session whose player is *not* in the roster: `UPDATE left_at = now()`.
4. **Profile enrichment**: collect Steam IDs from the roster where the cached profile is missing or `fetched_at < now - 24h`. Skip if `STEAM_WEB_API_KEY` unset. Batch into one Steam API call. Upsert results.

Periodic (every Nth cycle, e.g. once a minute):

- Trim `server_live_state` and closed sessions past retention.
- Close any open sessions whose `server_id` hasn't had a successful RCON response in the last `STUCK_SESSION_SECONDS` (default 60).

### Modify: `l4d2web/services/l4d2_facade.py:28-52`

`build_server_spec_payload` **appends** `f'rcon_password "{server.rcon_password}"'` as the *last* entry in the returned `config` list, only if the password is non-empty. Appending (not prepending) matters: Source's cfg semantics are last-wins, so putting our line after both the overlay `exec` lines and the user's blueprint config guarantees no overlay or blueprint can silently clobber the password and break the poller.

`l4d2host/instances.py:40-58` already writes `spec.config` lines verbatim to `server.cfg`, so **no host-side change is needed**.

### Modify: server-create route

Wherever the server-create form handler lives (`l4d2web/routes/server_routes.py` or similar; confirm during implementation): before commit, generate `rcon_password = secrets.token_urlsafe(32)`.

---

## Web UI

### Server list (template TBD: `ls l4d2web/templates/` during implementation)

Add an inline live-state cell per server row:

- Stopped server: `—`
- Stale (no row newer than `LIVE_STATE_STALE_SECONDS`): dim `?` with tooltip "no data"
- Hibernating: `0/4 · idle · c1m1_hotel`
- Active: `2/4 · c1m2_streets`

No HTMX on the list page; page reload picks up the latest snapshot.

### Server detail (`l4d2web/templates/server_detail.html`)

New section, HTMX-refreshed every `LIVE_STATE_POLL_SECONDS` (default 5):

```html
```

The partial renders three blocks:

1. **Summary**: `players/max_players · map · idle?` plus a small "polled Ns ago" caption.
2. **Current players** (only if non-empty): grid of cards, each ` {{ profile.persona_name or session.name_at_join }} · {{ joined_relative }} · ping {{ min }}-{{ max }}ms`.
3. **Recent players** (last 30 days, excluding current; only if non-empty): smaller cards, `{{ avatar }} {{ persona_name or name_at_join }} · last seen {{ last_seen_relative }}`.

New route: `GET /servers//live-state` returns the partial. Composition mirrors the existing build-status pattern at `l4d2web/templates/_overlay_build_status.html:1-5`.

Avatar `<img>` tags point straight at Steam CDN URLs (`avatars.cloudflare.steamstatic.com` / `avatars.akamai.steamstatic.com`). No proxying. Same approach as `WorkshopItem.preview_url`. Note: confirm the existing CSP allows these hosts; if not, extend it.

No JS framework added; HTMX only.

---

## Config keys

In `l4d2web/config.py`, plus documented defaults in `deploy/templates/etc/left4me/web.env` where applicable:

| key | default | purpose |
|---|---|---|
| `LIVE_STATE_POLL_SECONDS` | `5` | poll interval |
| `LIVE_STATE_QUERY_TIMEOUT_SECONDS` | `2.0` | per-RCON-query timeout |
| `LIVE_STATE_POLL_WORKERS` | `4` | thread-pool size for parallel per-server polls |
| `LIVE_STATE_STALE_SECONDS` | `30` | UI staleness threshold |
| `LIVE_STATE_HISTORY_DAYS` | `30` | retention for snapshots + closed sessions |
| `STUCK_SESSION_SECONDS` | `60` | close open sessions whose server has been unreachable for this long |
| `STEAM_PROFILE_TTL_SECONDS` | `86400` | profile cache TTL |
| `STEAM_WEB_API_KEY` | `""` | from `web.env`; empty disables enrichment |

---

## Tests

- `l4d2web/tests/test_rcon.py`: protocol handshake against an in-process TCP fixture: auth-success, auth-failure (`req_id == -1`), header parse (incl. `(hibernating)` and `(reserved )` variants), roster parse (incl. the two-numeric-prefix L4D2 variant), Steam ID conversion.
- `l4d2web/tests/test_steam_users.py`: request shape (key in querystring, batched ids, 100-per-call ceiling), response parsing, partial response (some IDs missing).
- `l4d2web/tests/test_live_state_poller.py`: mirror `test_state_poller_*` at `l4d2web/tests/test_job_worker.py:882-952`. Cover: iterates only running servers with non-empty `rcon_password`, RLE upsert (matching state → `last_seen_at` bump only; differing state → new row), session open with backfilled `joined_at`, session close on disappearance, ping range expansion, stuck-session close after N failures, auth failures logged and skipped without raising, respects retention.
- `l4d2web/tests/test_server_routes.py` (extend): `/servers//live-state` fragment route renders summary/current/recent blocks correctly; stale rendering when latest snapshot is old; soft-fail rendering when no profile cached.
- `l4d2web/tests/test_l4d2_facade.py` (extend): `build_server_spec_payload` appends `rcon_password "..."` as the last config line when password is set; omits the line when empty; appears after both the overlay `exec` lines and the blueprint config lines.
- Migration test: existing rows backfilled with non-empty 43-char passwords; tables created with correct indexes.


---

## Critical files

**New:**

- `l4d2web/services/rcon.py`: Source RCON client + status parser
- `l4d2web/services/steam_users.py`: Steam Web API client (mirrors `steam_workshop.py`)
- `l4d2web/services/live_state_poller.py`: background thread + poll loop + session reconciler
- `l4d2web/alembic/versions/00XX_server_live_state.py`: migration with new column, three new tables, password backfill
- `l4d2web/templates/_live_state.html`: HTMX-refreshed fragment (summary + current + recent)
- `l4d2web/tests/test_rcon.py`, `l4d2web/tests/test_steam_users.py`, `l4d2web/tests/test_live_state_poller.py`

**Modify:**

- `l4d2web/models.py`: add `ServerLiveState`, `ServerPlayerSession`, `SteamUserProfile`; add `rcon_password` to `Server` (after line 137)
- `l4d2web/services/l4d2_facade.py:28-52`: `build_server_spec_payload` appends `rcon_password "..."` as the last config line when set
- `l4d2web/app.py`: call `start_live_state_poller(app)` next to existing `start_state_poller`
- `l4d2web/routes/server_routes.py` (or equivalent; confirm): generate `rcon_password` in create handler; add `GET /servers//live-state`
- `l4d2web/templates/server_detail.html`: include `_live_state.html`
- `l4d2web/templates/.html`: confirm filename; add inline badge column
- `l4d2web/config.py`: register the eight new config keys
- `deploy/templates/etc/left4me/web.env`: add `STEAM_WEB_API_KEY=` and any tunables we expose

**Reused without changes:**

- `l4d2web/services/job_worker.py:617-647`: daemon-thread / poll-loop pattern reference
- `l4d2web/services/steam_workshop.py:17-43`: `requests.Session` + form-POST pattern for Steam Web API
- `l4d2host/instances.py:40-58`: already writes `spec.config` verbatim, so no host-side change for password injection
- `l4d2web/templates/_overlay_build_status.html`: HTMX polling pattern reference

---

## Verification

1. **Unit tests**:

   ```
   pytest l4d2web/tests/test_rcon.py l4d2web/tests/test_steam_users.py l4d2web/tests/test_live_state_poller.py -v
   pytest l4d2web/tests -q   # full regression
   ```

2. **Migration check**:

   ```
   alembic upgrade head
   sqlite3 l4d2web.db "SELECT id, name, length(rcon_password) FROM servers;"   # every row ~43
   sqlite3 l4d2web.db ".schema server_live_state server_player_session steam_user_profile"
   ```

3. **End-to-end against prod** (`left4.me`):
   - Deploy. Confirm `systemctl status left4me-web.service` shows no crash-loop and the journal logs `start_live_state_poller` once.
   - Restart both existing game servers so they pick up the injected password.
   - SQL sanity (web-host shell):

     ```
     sqlite3 l4d2web.db "SELECT server_id, started_at, last_seen_at, players, map, hibernating FROM server_live_state ORDER BY server_id, started_at DESC LIMIT 10;"
     ```

     Expect a single recent row per server while idle; new rows when players come/go.
   - Connect to one server from the L4D2 client; within 5s, `/servers/` shows a card with your avatar + persona name + ping range. Disconnect; within 5s the card moves to "recent."
   - `sqlite3 l4d2web.db "SELECT * FROM server_player_session WHERE left_at IS NULL;"` is empty when nobody's connected; one row per current player when someone is.
   - `sqlite3 l4d2web.db "SELECT count(*), MIN(fetched_at), MAX(fetched_at) FROM steam_user_profile;"` shows at least one row after a player has been resolved.

4. **Failure-path checks**:
   - Manually corrupt `servers.rcon_password` for one server; confirm the journal logs auth failure and the row's badge goes stale within `LIVE_STATE_STALE_SECONDS`; other servers unaffected.
   - Unset `STEAM_WEB_API_KEY` in `web.env`, restart web; confirm display still works (in-game names + placeholder avatars), no errors in journal.
   - `nft` drop the loopback TCP on one server's port; confirm rows stop appearing, open sessions close after `STUCK_SESSION_SECONDS`, badge goes stale.

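The "single recent row per server while idle" expectation follows from the run-length-encoded upsert. A minimal sketch against an in-memory SQLite database is shown here, assuming a simplified table without the foreign key and a hypothetical `record_state` helper; the real poller would presumably go through the project's models rather than raw SQL.

```python
import sqlite3
from datetime import datetime, timedelta

STATE_COLUMNS = ("players", "max_players", "bots", "map", "hibernating")

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE server_live_state (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        server_id INTEGER NOT NULL,
        started_at TEXT NOT NULL,
        last_seen_at TEXT NOT NULL,
        players INTEGER NOT NULL,
        max_players INTEGER NOT NULL,
        bots INTEGER NOT NULL,
        map TEXT NOT NULL,
        hibernating INTEGER NOT NULL
    )"""
)


def record_state(server_id: int, state: tuple, now: datetime) -> str:
    """Hypothetical helper: bump last_seen_at if unchanged, else insert."""
    stamp = now.isoformat(sep=" ")
    latest = conn.execute(
        f"SELECT id, {', '.join(STATE_COLUMNS)} FROM server_live_state "
        "WHERE server_id = ? ORDER BY started_at DESC, id DESC LIMIT 1",
        (server_id,),
    ).fetchone()
    if latest is not None and tuple(latest[1:]) == state:
        # Same state as the newest row: extend its interval, write no new row.
        conn.execute(
            "UPDATE server_live_state SET last_seen_at = ? WHERE id = ?",
            (stamp, latest[0]),
        )
        return "bumped"
    conn.execute(
        "INSERT INTO server_live_state (server_id, started_at, last_seen_at, "
        f"{', '.join(STATE_COLUMNS)}) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (server_id, stamp, stamp, *state),
    )
    return "inserted"


t0 = datetime(2024, 1, 1, 12, 0, 0)
idle = (0, 4, 0, "c1m1_hotel", 1)    # hibernating, nobody connected
active = (1, 4, 0, "c1m1_hotel", 0)  # one human joined

results = [
    record_state(1, idle, t0),
    record_state(1, idle, t0 + timedelta(seconds=5)),
    record_state(1, idle, t0 + timedelta(seconds=10)),
    record_state(1, active, t0 + timedelta(seconds=15)),
]
row_count = conn.execute("SELECT count(*) FROM server_live_state").fetchone()[0]
```

Three idle polls leave one row whose `last_seen_at` keeps advancing, and the player joining starts a second row, which is the pattern the SQL sanity check above looks for.
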

---

## Open implementation questions

- **Server-list template filename**: confirm with `ls l4d2web/templates/` once implementation starts.
- **Server-create route location**: confirm path (likely `l4d2web/routes/server_routes.py`).
- **CSP allowlist for Steam avatar CDNs**: check `l4d2web/app.py` (or wherever security headers live) and extend `img-src` to include `avatars.cloudflare.steamstatic.com`, `avatars.akamai.steamstatic.com`, `avatars.steamstatic.com` if a CSP is enforced.
- **Adaptive backoff** for hibernating servers: defer; start with fixed 5s and revisit only if load becomes a concern (which it won't at current server count).
- **Migration data step**: SQLite alembic batch operation with a Python data step that iterates rows and generates `secrets.token_urlsafe(32)` per row. Confirm pattern against existing migrations under `l4d2web/alembic/versions/`.

---

## Deferred to a separate plan

- Generic RCON command execution (`changelevel`, `kick`, `say`, `sm_ban`, ...)
- Web UI buttons mapped to those commands with CSRF + admin authz
- Audit log table for issued commands
- Player-count history graphs (data already accumulating from this plan)
- Ban UX (lookup by Steam ID, search across `server_player_session`)