From 202026e11a68f19b7e058cd26a61a211b54510e3 Mon Sep 17 00:00:00 2001
From: mwiegand
Date: Tue, 12 May 2026 21:03:26 +0200
Subject: [PATCH] docs/spec: add server live-state display design

RCON-based polling with run-length-encoded snapshots, session intervals
with min/max ping, Steam profile cache, and a server-detail roster of
current + recent players hot-linked from Steam CDN avatars.

Co-Authored-By: Claude Sonnet 4.6
---
 ...-05-12-server-live-state-display-design.md | 397 ++++++++++++++++++
 1 file changed, 397 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-05-12-server-live-state-display-design.md

diff --git a/docs/superpowers/specs/2026-05-12-server-live-state-display-design.md b/docs/superpowers/specs/2026-05-12-server-live-state-display-design.md
new file mode 100644
index 0000000..43a12f0
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-12-server-live-state-display-design.md
@@ -0,0 +1,397 @@
+# Server live-state display (counts, map, roster, avatars, history)
+
+## Context
+
+The l4d2web UI currently shows systemd lifecycle state per game server (running/stopped/unknown) but nothing about what's happening *inside* the game: player count, current map, whether the server is hibernating, who is connected. To know any of that, users have to context-switch (open the game, query externally).
+
+The goal is a **read-side live-state display**: counts + map + hibernating on the server list, plus a server-detail panel showing the current player roster (avatars + names) and a "recent players" section for who's been on lately. It is backed by persistent history tables, so the data for count-over-time graphs and player-presence history (the foundation for future ban UX) accumulates for free.
+
+**Source: RCON exclusively.** A2S_INFO (UDP, anonymous) was investigated and discarded — it can't deliver Steam IDs, the hibernating flag, or interactive commands, so anything beyond raw counts re-routes through RCON anyway. Both transports were verified working against prod `left4.me`. Going RCON-only means one transport, one set of tests, no throwaway scaffolding.
+
+**Avatars: Steam Web API.** RCON gives Steam IDs; `ISteamUser/GetPlayerSummaries` resolves them to persona names + avatar URLs hot-linked from Steam's CDN. API key already obtained.
+
+**Commands are deferred** to a separate plan. This plan is read-only.
+
+---
+
+## Architecture
+
+```
+                       ┌─────────────────────────────┐
+                       │  left4me-web (Flask)        │
+┌──────────────┐ RCON  │  ┌───────────────────────┐  │
+│ srcds 27016  │◄──────┼──┤ live-state poller     │  │
+└──────────────┘ TCP   │  │ (daemon thread)       │  │
+                       │  └───────┬───────────────┘  │
+┌──────────────┐ RCON  │          │ writes           │
+│ srcds 27021  │◄──────┤          ▼                  │
+└──────────────┘       │  ┌───────────────────────┐  │
+                       │  │ server_live_state     │  │
+    Steam Web API      │  │ server_player_session │  │
+       ┌────────────┐  │  │ steam_user_profile    │  │
+       │ Steam CDN  │◄─┼──┤                       │  │
+       │ avatars... │  │  └───────┬───────────────┘  │
+       └────────────┘  │          │ reads            │
+              ▲        │          ▼                  │
+              │        │  ┌───────────────────────┐  │
+              └────────┼──┤ /servers, /servers/N  │  │
+                       │  │ (HTMX 5s refresh)     │  │
+                       │  └───────────────────────┘  │
+                       └─────────────────────────────┘
+```
+
+Single daemon thread (modeled on the existing `start_state_poller` in `l4d2web/services/job_worker.py:617-647`), inside the Flask process, polls every `LIVE_STATE_POLL_SECONDS` (default 5). Per poll, per running server with a configured RCON password:
+
+1. TCP connect to `127.0.0.1:<port>`, auth, send `status`, parse response.
+2. Compare server-level state (players/map/hibernating/etc.) to the latest `server_live_state` row for this server. If unchanged, bump `last_seen_at`. If changed, insert a new row.
+3. Reconcile open sessions (`server_player_session` rows where `left_at IS NULL`) with the current `status` roster: open new sessions for new players (backfilling `joined_at` from RCON's `connected` field), close sessions for players no longer present, update `min_ping`/`max_ping` for continuing sessions.
+4. Collect Steam IDs that are missing from `steam_user_profile` or have `fetched_at` older than 24h; batch them into a single `GetPlayerSummaries` call; upsert results.
+5. Every Nth cycle (see the poller module below), trim `server_live_state` rows and closed sessions older than retention.
+
+---
+
+## Schema (one new alembic migration)
+
+### New column: `servers.rcon_password`
+
+```python
+rcon_password: Mapped[str] = mapped_column(
+    String(64), nullable=False, default="", server_default=""
+)
+```
+
+Empty string = "no password configured yet" (poller skips). Migration backfills every existing row with `secrets.token_urlsafe(32)` (exactly 43 chars from the URL-safe base64 alphabet `A-Z a-z 0-9 - _`, so the literal `"..."` cfg quoting needs no escaping).
+
+### `server_live_state` — run-length-encoded snapshots
+
+```sql
+CREATE TABLE server_live_state (
+    id            INTEGER PRIMARY KEY AUTOINCREMENT,
+    server_id     INTEGER NOT NULL REFERENCES servers(id) ON DELETE CASCADE,
+    started_at    DATETIME NOT NULL,   -- when this exact state first appeared
+    last_seen_at  DATETIME NOT NULL,   -- most recent poll where it still held
+    players       INTEGER NOT NULL,
+    max_players   INTEGER NOT NULL,
+    bots          INTEGER NOT NULL,
+    map           VARCHAR(64) NOT NULL,
+    hibernating   BOOLEAN NOT NULL
+);
+CREATE INDEX ix_sls_server_started ON server_live_state(server_id, started_at DESC);
+```
+
+- "State" = the tuple `(players, max_players, bots, map, hibernating)`. Ping/loss are deliberately not stored at server level, so they don't churn rows.
+- An idle hibernating server collapses from one row per poll to one row per state change (≈17,280× compression for a 24h-idle server at the default 5s interval: 86,400 s / 5 s).
+- Latest snapshot for a server: `ORDER BY started_at DESC LIMIT 1`. UI staleness check: `last_seen_at > now - LIVE_STATE_STALE_SECONDS` (default 30).
+- Retention: trim rows where `last_seen_at < now - LIVE_STATE_HISTORY_DAYS` (default 30).
+- Failed polls produce no DB write; the staleness check on `last_seen_at` handles UI degradation cleanly.
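The run-length encoding above can be exercised against plain `sqlite3` with a minimal sketch. It assumes a trimmed copy of the table (no foreign key, DATETIME stored as ISO text, `hibernating` as 0/1) and hypothetical `state_key()` / `record_snapshot()` helpers; the real poller would go through the SQLAlchemy models instead.

```python
import sqlite3
from datetime import datetime, timedelta

# Trimmed copy of server_live_state (no FK; DATETIME stored as ISO text).
SCHEMA = """
CREATE TABLE server_live_state (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    server_id INTEGER NOT NULL,
    started_at TEXT NOT NULL,
    last_seen_at TEXT NOT NULL,
    players INTEGER NOT NULL,
    max_players INTEGER NOT NULL,
    bots INTEGER NOT NULL,
    map TEXT NOT NULL,
    hibernating INTEGER NOT NULL
);
"""


def state_key(players: int, max_players: int, bots: int, map_name: str,
              hibernating: bool) -> tuple:
    """The tuple that defines a 'state'; ping/loss are deliberately excluded."""
    return (players, max_players, bots, map_name, int(hibernating))


def record_snapshot(conn: sqlite3.Connection, server_id: int, state: tuple,
                    now: datetime) -> bool:
    """RLE upsert. Returns True if a new row was inserted, False if the
    newest row for this server was only extended (last_seen_at bumped)."""
    row = conn.execute(
        "SELECT id, players, max_players, bots, map, hibernating "
        "FROM server_live_state WHERE server_id = ? "
        "ORDER BY started_at DESC, id DESC LIMIT 1",
        (server_id,),
    ).fetchone()
    ts = now.isoformat(sep=" ")
    if row is not None and tuple(row[1:]) == state:
        conn.execute("UPDATE server_live_state SET last_seen_at = ? WHERE id = ?",
                     (ts, row[0]))
        return False
    conn.execute(
        "INSERT INTO server_live_state (server_id, started_at, last_seen_at, "
        "players, max_players, bots, map, hibernating) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (server_id, ts, ts, *state),
    )
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    t0 = datetime(2026, 5, 12, 21, 0, 0)
    idle = state_key(0, 4, 0, "c1m1_hotel", True)
    for i in range(720):  # one hour of 5 s polls while idle
        record_snapshot(conn, 1, idle, t0 + timedelta(seconds=5 * i))
    busy = state_key(1, 4, 0, "c1m1_hotel", False)
    record_snapshot(conn, 1, busy, t0 + timedelta(hours=1))
    print(conn.execute("SELECT COUNT(*) FROM server_live_state").fetchone()[0])  # 2
```

One hour of idle polls collapses into a single row whose `last_seen_at` keeps advancing; only the first player joining opens a second row.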
+
+### `server_player_session` — interval per connection
+
+```sql
+CREATE TABLE server_player_session (
+    id            INTEGER PRIMARY KEY AUTOINCREMENT,
+    server_id     INTEGER NOT NULL REFERENCES servers(id) ON DELETE CASCADE,
+    steam_id_64   VARCHAR(20) NOT NULL,
+    joined_at     DATETIME NOT NULL,
+    left_at       DATETIME NULL,       -- NULL = currently in-game
+    name_at_join  VARCHAR(64) NOT NULL,
+    min_ping      INTEGER NOT NULL,
+    max_ping      INTEGER NOT NULL
+);
+CREATE INDEX ix_sps_server_open    ON server_player_session(server_id, left_at);
+CREATE INDEX ix_sps_server_recent  ON server_player_session(server_id, left_at DESC);
+CREATE INDEX ix_sps_steam_history  ON server_player_session(steam_id_64, joined_at DESC);
+```
+
+- `joined_at` is **backfilled from RCON's `connected` duration** on first sighting (`joined_at = now - connected_seconds`). This heals brief polling gaps and survives web restarts: even if we just started polling, we know when the still-connected players actually joined.
+- A player who disconnects and rejoins gets two rows, not one merged interval.
+- Bots are excluded — rows with a non-`STEAM_X:Y:Z` uniqueid are skipped.
+- `min_ping`/`max_ping` are updated only when a new poll widens the range, to avoid noise writes.
+- On poller startup, close any open sessions whose player isn't in their server's current RCON output. Also close open sessions whose server has had no successful poll for `STUCK_SESSION_SECONDS` (default 60, i.e. ~12 polls at the 5s default; exact policy to be settled during implementation).
+- Retention: trim closed sessions where `left_at < now - SESSION_HISTORY_DAYS` (default 30). Open sessions are never trimmed.
+
+### `steam_user_profile` — cached profile data (24h TTL)
+
+```sql
+CREATE TABLE steam_user_profile (
+    steam_id_64   VARCHAR(20) PRIMARY KEY,
+    persona_name  VARCHAR(64) NOT NULL,
+    avatar_url    TEXT NOT NULL,       -- avatarmedium from Steam Web API
+    fetched_at    DATETIME NOT NULL
+);
+```
+
+- Cache is global, not per-server (one profile per Steam ID).
+- Refreshed when `fetched_at < now - 24h` (`STEAM_PROFILE_TTL_SECONDS`) or when the entry is missing.
+- Soft-fail: if the Steam API key is unset, the API is down, or a profile is private, we just leave the cache as-is and the UI falls back to `name_at_join` + placeholder avatar.
+
+### Read queries for the UI (bound parameters)
+
+**Current players on server X:**
+
+```sql
+SELECT sp.steam_id_64, sp.joined_at, sp.name_at_join,
+       sp.min_ping, sp.max_ping,
+       p.persona_name, p.avatar_url
+FROM server_player_session sp
+LEFT JOIN steam_user_profile p USING (steam_id_64)
+WHERE sp.server_id = ? AND sp.left_at IS NULL
+ORDER BY sp.joined_at;
+```
+
+**Recent players on server X (last 30 days, excluding currently in-game):**
+
+```sql
+SELECT sp.steam_id_64, MAX(sp.left_at) AS last_seen,
+       sp.name_at_join,  -- SQLite bare column: value from the row holding MAX(left_at)
+       p.persona_name, p.avatar_url
+FROM server_player_session sp
+LEFT JOIN steam_user_profile p USING (steam_id_64)
+WHERE sp.server_id = ?
+  AND sp.left_at IS NOT NULL
+  AND sp.left_at > datetime('now', '-30 days')
+  AND sp.steam_id_64 NOT IN (
+      SELECT steam_id_64 FROM server_player_session
+      WHERE server_id = ? AND left_at IS NULL
+  )
+GROUP BY sp.steam_id_64, p.persona_name, p.avatar_url
+ORDER BY last_seen DESC
+LIMIT 20;
+```
+
+`name_at_join` is selected so the recent-players cards can fall back to it when no profile is cached.
+
+---
+
+## Modules
+
+### `l4d2web/services/rcon.py` (new)
+
+Pure stdlib (`socket`, `struct`), no new dependency. Source RCON protocol:
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(slots=True, frozen=True)
+class PlayerRow:
+    steam_id_64: str          # converted from STEAM_X:Y:Z
+    name: str
+    connected_seconds: int
+    ping: int
+
+
+@dataclass(slots=True, frozen=True)
+class StatusResponse:
+    map: str
+    players: int              # humans
+    max_players: int
+    bots: int
+    hibernating: bool
+    roster: list[PlayerRow]
+
+
+class RconError(Exception): ...
+class RconAuthError(RconError): ...
+
+
+def query_status(host: str, port: int, password: str, *, timeout: float = 2.0) -> StatusResponse: ...
+```
+
+Implementation notes:
+
+- Auth handshake quirk verified live: the server sends a `type=0` (`SERVERDATA_RESPONSE_VALUE`) empty-body packet **before** the `type=2` (`SERVERDATA_AUTH_RESPONSE`) auth response. Consume both. `req_id == -1` on the auth response means a bad password.
+- Single TCP connection per query (loopback, ~10-20ms total round-trip — pooling not worth it at this scale).
+- Header regex on `map :` and `players :` lines (the `(hibernating|not hibernating)` token is in `players :`).
+- Roster regex: split lines starting with `#`, skip the column-header line, robustly extract the quoted name + the `STEAM_X:Y:Z` token + `MM:SS` or `HH:MM:SS` connected duration + ping. Tolerate the two-numeric-prefix L4D2 variant (`# 2 1 "Crone" STEAM_1:0:...`).
+- Steam ID conversion: `STEAM_X:Y:Z` → `76561197960265728 + (Z * 2) + Y` (returned as string). `Y` is the low bit and `Z` the account number halved, so e.g. `STEAM_1:0:1` → `76561197960265730`.
+
+### `l4d2web/services/steam_users.py` (new)
+
+Modeled on `l4d2web/services/steam_workshop.py:17-43` (single `requests.Session`, 30s timeout). The main difference: the workshop call is an anonymous POST with a form-encoded body, whereas `GetPlayerSummaries` is an authenticated GET with `key` and `steamids` in the query string.
+
+```python
+from collections.abc import Iterable
+from dataclasses import dataclass
+
+
+@dataclass(slots=True, frozen=True)
+class SteamProfile:
+    steam_id_64: str
+    persona_name: str
+    avatar_url: str           # avatarmedium
+
+
+def fetch_profiles_batch(steam_ids: Iterable[str], *, api_key: str) -> list[SteamProfile]: ...
+```
+
+- Endpoint: `GET https://api.steampowered.com/ISteamUser/GetPlayerSummaries/v0002/?key=<key>&steamids=<comma-separated ids>`.
+- Up to 100 IDs per call; caller batches.
+- Returns only successful resolutions (private/deleted accounts simply absent from the response — fine, they stay uncached and the UI falls back).
+- Raises on transport errors; caller decides whether to surface.
+
+### `l4d2web/services/live_state_poller.py` (new)
+
+Modeled on `start_state_poller` / `state_poller_loop` in `l4d2web/services/job_worker.py:617-647`.
+
+```python
+def start_live_state_poller(app) -> None: ...   # spawns daemon thread, skipped under TESTING
+def live_state_poller_loop(app, interval: float) -> None: ...
+def poll_once() -> None:                          # one full pass over running servers
+    ...
+```
+
+Per-server algorithm:
+
+1. RCON `status` → `StatusResponse` (or skip on auth failure/timeout, logged via `app.logger`, no DB write).
+2. **Server-level RLE upsert**: load the newest `server_live_state` row for this server. If `(players, max_players, bots, map, hibernating)` matches → `UPDATE last_seen_at = now()`. Else → `INSERT` a new row.
+3. **Session reconciliation** in a single transaction:
+   - Load open sessions for this server.
+   - For each player in `response.roster` not in open sessions: `INSERT` new session with `joined_at = now - connected_seconds`, `name_at_join = roster.name`, `min_ping = max_ping = roster.ping`.
+   - For each open session whose player is in the roster: if `roster.ping < min_ping` or `> max_ping`, `UPDATE` the range. Otherwise skip the write.
+   - For each open session whose player is *not* in the roster: `UPDATE left_at = now()`.
+4. **Profile enrichment**: collect Steam IDs from the roster where the cached profile is missing or `fetched_at < now - 24h`. Skip if `STEAM_WEB_API_KEY` is unset. Batch into one Steam API call. Upsert results.
+
+Periodic (every Nth cycle, e.g. once a minute):
+
+- Trim `server_live_state` and closed sessions past retention.
+- Close any open sessions whose `server_id` hasn't had a successful RCON response in the last `STUCK_SESSION_SECONDS` (default 60).
+
+### Modify: `l4d2web/services/l4d2_facade.py:28-52`
+
+`build_server_spec_payload` **appends** `f'rcon_password "{server.rcon_password}"'` as the *last* entry in the returned `config` list, only if the password is non-empty. Appending (not prepending) matters: Source's cfg semantics are last-wins, so putting our line after both the overlay `exec` lines and the user's blueprint config guarantees no overlay or blueprint can silently clobber the password and break the poller. `l4d2host/instances.py:40-58` already writes `spec.config` lines verbatim to `server.cfg` — **no host-side change needed**.
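The append-last rule above, together with the password generation used by the migration and the create route, can be sketched as follows. The helper names `generate_rcon_password` and `append_rcon_password` are hypothetical (the real `build_server_spec_payload` signature isn't shown here), and the alphabet check is an extra guard beyond the spec that rejects any value which could break the `"..."` quoting.

```python
import secrets
import string

# Alphabet produced by secrets.token_urlsafe (URL-safe base64, padding stripped).
URLSAFE_ALPHABET = frozenset(string.ascii_letters + string.digits + "-_")


def generate_rcon_password() -> str:
    # 32 random bytes encode to exactly 43 URL-safe base64 characters, none of
    # which is a double quote, so the value can sit inside rcon_password "...".
    return secrets.token_urlsafe(32)


def append_rcon_password(config: list[str], rcon_password: str) -> list[str]:
    """Return a copy of config with the rcon_password line appended LAST.

    Source cfg files are last-wins, so appending after the overlay exec lines
    and the blueprint config means neither can override the password.
    An empty password means "not configured yet": no line is emitted.
    """
    if not rcon_password:
        return list(config)
    if not set(rcon_password) <= URLSAFE_ALPHABET:
        # Hypothetical guard: refuse values that could break cfg quoting.
        raise ValueError("rcon_password must be URL-safe (no quotes or whitespace)")
    return [*config, f'rcon_password "{rcon_password}"']


if __name__ == "__main__":
    # Hypothetical overlay/blueprint lines; the real ones come from the facade.
    base = ["exec overlay_example.cfg", 'rcon_password "set-by-blueprint"']
    password = generate_rcon_password()
    config = append_rcon_password(base, password)
    print(len(password), config[-1] == f'rcon_password "{password}"')  # 43 True
```

Even though the blueprint line above also sets `rcon_password`, the appended line comes later and therefore wins when the server executes `server.cfg`.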
+
+### Modify: server-create route
+
+Wherever the server-create form handler lives (`l4d2web/routes/server_routes.py` or similar — confirm during implementation): before commit, generate `rcon_password = secrets.token_urlsafe(32)`.
+
+---
+
+## Web UI
+
+### Server list (template TBD: `ls l4d2web/templates/` during implementation)
+
+Add an inline live-state cell per server row:
+
+- Stopped server: `—`
+- Stale (no row with `last_seen_at` within the last `LIVE_STATE_STALE_SECONDS`): dim `?` with tooltip "no data"
+- Hibernating: `0/4 · idle · c1m1_hotel`
+- Active: `2/4 · c1m2_streets`
+
+No HTMX on the list page; page reload picks up the latest snapshot.
+
+### Server detail (`l4d2web/templates/server_detail.html`)
+
+New section, HTMX-refreshed every `LIVE_STATE_POLL_SECONDS` (default 5):
+
+```html
+<div id="live-state"
+     hx-get="/servers/{{ server.id }}/live-state"
+     hx-trigger="every {{ config.LIVE_STATE_POLL_SECONDS }}s"
+     hx-swap="innerHTML">
+  {% include "_live_state.html" %}
+</div>
+```
+
+The partial renders three blocks:
+
+1. **Summary**: `players/max_players · map · idle?` plus a small "polled Ns ago" caption.
+2. **Current players** (only if non-empty): grid of cards, each `<img src="{{ profile.avatar_url }}"> {{ profile.persona_name or session.name_at_join }} · {{ joined_relative }} · ping {{ min }}-{{ max }}ms`.
+3. **Recent players** (last 30 days, excluding current; only if non-empty): smaller cards, `{{ avatar }} {{ persona_name or name_at_join }} · last seen {{ last_seen_relative }}`.
+
+New route: `GET /servers/<id>/live-state` returns the partial. Composition mirrors the existing build-status pattern at `l4d2web/templates/_overlay_build_status.html:1-5`.
+
+Avatar `<img>` tags point straight at Steam CDN URLs (`avatars.cloudflare.steamstatic.com` / `avatars.akamai.steamstatic.com`). No proxying. Same approach as `WorkshopItem.preview_url`. Note: confirm the existing CSP allows these hosts; if not, extend it.
+
+No JS framework added — HTMX only.
+
+---
+
+## Config keys
+
+In `l4d2web/config.py`, plus documented defaults in `deploy/templates/etc/left4me/web.env` where applicable:
+
+| key | default | purpose |
+|---|---|---|
+| `LIVE_STATE_POLL_SECONDS` | `5` | poll interval |
+| `LIVE_STATE_QUERY_TIMEOUT_SECONDS` | `2.0` | per-RCON-query timeout |
+| `LIVE_STATE_POLL_WORKERS` | `4` | thread-pool size for parallel per-server polls |
+| `LIVE_STATE_STALE_SECONDS` | `30` | UI staleness threshold |
+| `LIVE_STATE_HISTORY_DAYS` | `30` | retention for snapshots + closed sessions |
+| `STUCK_SESSION_SECONDS` | `60` | close open sessions whose server has been unreachable for this long |
+| `STEAM_PROFILE_TTL_SECONDS` | `86400` | profile cache TTL |
+| `STEAM_WEB_API_KEY` | `""` | from `web.env`; empty disables enrichment |
+
+---
+
+## Tests
+
+- `l4d2web/tests/test_rcon.py` — protocol handshake against an in-process TCP fixture: auth-success, auth-failure (`req_id == -1`), header parse (incl. `(hibernating)` and `(reserved …)` variants), roster parse (incl. the two-numeric-prefix L4D2 variant), Steam ID conversion.
+- `l4d2web/tests/test_steam_users.py` — request shape (key in querystring, batched ids, 100-per-call ceiling), response parsing, partial response (some IDs missing).
+- `l4d2web/tests/test_live_state_poller.py` — mirror `test_state_poller_*` at `l4d2web/tests/test_job_worker.py:882-952`. Cover: iterates only running servers with non-empty `rcon_password`, RLE upsert (matching state → `last_seen_at` bump only; differing state → new row), session open with backfilled `joined_at`, session close on disappearance, ping range expansion, stuck-session close after `STUCK_SESSION_SECONDS` without a successful poll, auth failures skipped (logged, no DB write), respects retention.
+- `l4d2web/tests/test_server_routes.py` (extend) — `/servers/<id>/live-state` fragment route renders summary/current/recent blocks correctly; stale rendering when the latest snapshot is old; soft-fail rendering when no profile is cached.
+- `l4d2web/tests/test_l4d2_facade.py` (extend) — `build_server_spec_payload` appends `rcon_password "..."` as the last config line when the password is set; omits the line when empty; the line appears after both the overlay `exec` lines and the blueprint config lines.
+- Migration test — existing rows backfilled with non-empty 43-char passwords; tables created with correct indexes.
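The in-process TCP fixture for `test_rcon.py` and the client both need the Source RCON wire framing: a little-endian `int32` size, `int32` request id, `int32` type, a NUL-terminated body, then an empty NUL-terminated string. The sketch below covers that framing, the auth-reply check from the implementation notes (skip the empty `type=0` packet, treat `req_id == -1` as a bad password), and the Steam ID conversion. The helper names (`encode_packet`, `decode_packet`, `auth_succeeded`, `steam2_to_64`) are hypothetical and not the planned `rcon.py` API.

```python
import re
import struct
from dataclasses import dataclass

# Packet types from Valve's Source RCON protocol documentation.
SERVERDATA_AUTH = 3
SERVERDATA_AUTH_RESPONSE = 2
SERVERDATA_EXECCOMMAND = 2
SERVERDATA_RESPONSE_VALUE = 0

STEAM64_BASE = 76561197960265728
STEAM2_RE = re.compile(r"^STEAM_([0-5]):([01]):(\d+)$")


@dataclass(frozen=True)
class Packet:
    req_id: int
    ptype: int
    body: str


def encode_packet(req_id: int, ptype: int, body: str) -> bytes:
    """size (int32 LE) | id (int32 LE) | type (int32 LE) | body NUL | empty NUL."""
    payload = struct.pack("<ii", req_id, ptype) + body.encode("utf-8") + b"\x00\x00"
    return struct.pack("<i", len(payload)) + payload


def decode_packet(buf: bytes) -> tuple[Packet, bytes]:
    """Decode one packet from the front of buf; return it plus the unread rest."""
    if len(buf) < 4:
        raise ValueError("incomplete size field")
    (size,) = struct.unpack_from("<i", buf, 0)
    if size < 10 or len(buf) < 4 + size:
        raise ValueError("incomplete or malformed packet")
    req_id, ptype = struct.unpack_from("<ii", buf, 4)
    body = buf[12 : 4 + size - 2].decode("utf-8", errors="replace")
    return Packet(req_id, ptype, body), buf[4 + size :]


def auth_succeeded(packets: list[Packet]) -> bool:
    """Skip the empty RESPONSE_VALUE sent first; AUTH_RESPONSE id -1 = bad password."""
    for packet in packets:
        if packet.ptype == SERVERDATA_AUTH_RESPONSE:
            return packet.req_id != -1
    raise ValueError("no SERVERDATA_AUTH_RESPONSE received")


def steam2_to_64(steam2: str) -> str:
    """STEAM_X:Y:Z -> 76561197960265728 + Z * 2 + Y, returned as a string."""
    match = STEAM2_RE.match(steam2)
    if match is None:
        raise ValueError(f"not a STEAM_X:Y:Z id: {steam2!r}")
    y, z = int(match.group(2)), int(match.group(3))
    return str(STEAM64_BASE + z * 2 + y)


if __name__ == "__main__":
    # What a fixture would send back for a wrong password: an empty
    # RESPONSE_VALUE, then an AUTH_RESPONSE carrying req_id -1.
    stream = (encode_packet(7, SERVERDATA_RESPONSE_VALUE, "")
              + encode_packet(-1, SERVERDATA_AUTH_RESPONSE, ""))
    packets = []
    while stream:
        packet, stream = decode_packet(stream)
        packets.append(packet)
    print(auth_succeeded(packets))          # False
    print(steam2_to_64("STEAM_1:0:1"))      # 76561197960265730
```

Sharing `encode_packet` between the fixture and the client keeps the two sides from drifting apart; bot rows (`BOT` instead of `STEAM_X:Y:Z`) raise `ValueError` in `steam2_to_64`, which the roster parser can treat as "skip".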
+
+---
+
+## Critical files
+
+**New:**
+
+- `l4d2web/services/rcon.py` — Source RCON client + status parser
+- `l4d2web/services/steam_users.py` — Steam Web API client (mirrors `steam_workshop.py`)
+- `l4d2web/services/live_state_poller.py` — background thread + poll loop + session reconciler
+- `l4d2web/alembic/versions/00XX_server_live_state.py` — migration: new column, three new tables, password backfill
+- `l4d2web/templates/_live_state.html` — HTMX-refreshed fragment (summary + current + recent)
+- `l4d2web/tests/test_rcon.py`, `l4d2web/tests/test_steam_users.py`, `l4d2web/tests/test_live_state_poller.py`
+
+**Modify:**
+
+- `l4d2web/models.py` — add `ServerLiveState`, `ServerPlayerSession`, `SteamUserProfile`; add `rcon_password` to `Server` (after line 137)
+- `l4d2web/services/l4d2_facade.py:28-52` — `build_server_spec_payload` appends `rcon_password "..."` as the last config line when set
+- `l4d2web/app.py` — call `start_live_state_poller(app)` next to existing `start_state_poller`
+- `l4d2web/routes/server_routes.py` (or equivalent — confirm) — generate `rcon_password` in create handler; add `GET /servers/<id>/live-state`
+- `l4d2web/templates/server_detail.html` — include `_live_state.html`
+- `l4d2web/templates/<server-list>.html` — confirm filename; add inline badge column
+- `l4d2web/config.py` — register the eight new config keys
+- `deploy/templates/etc/left4me/web.env` — add `STEAM_WEB_API_KEY=` and any tunables we expose
+
+**Reused without changes:**
+
+- `l4d2web/services/job_worker.py:617-647` — daemon-thread / poll-loop pattern reference
+- `l4d2web/services/steam_workshop.py:17-43` — `requests.Session` + timeout pattern for Steam Web API calls
+- `l4d2host/instances.py:40-58` — already writes `spec.config` verbatim, so no host-side change for password injection
+- `l4d2web/templates/_overlay_build_status.html` — HTMX polling pattern reference
+
+---
+
+## Verification
+
+1. **Unit tests**:
+
+   ```
+   pytest l4d2web/tests/test_rcon.py l4d2web/tests/test_steam_users.py l4d2web/tests/test_live_state_poller.py -v
+   pytest l4d2web/tests -q  # full regression
+   ```
+
+2. **Migration check**:
+
+   ```
+   alembic upgrade head
+   sqlite3 l4d2web.db "SELECT id, name, length(rcon_password) FROM servers;"  # every row: 43
+   for t in server_live_state server_player_session steam_user_profile; do
+       sqlite3 l4d2web.db ".schema $t"   # .schema accepts a single table/pattern
+   done
+   ```
+
+3. **End-to-end against prod** (`left4.me`):
+   - Deploy. Confirm `systemctl status left4me-web.service` shows no crash-loop and the journal logs `start_live_state_poller` once.
+   - Restart both existing game servers so they pick up the injected password.
+   - SQL sanity (web-host shell):
+
+     ```
+     sqlite3 l4d2web.db "SELECT server_id, started_at, last_seen_at, players, map, hibernating
+       FROM server_live_state ORDER BY server_id, started_at DESC LIMIT 10;"
+     ```
+
+     Expect a single recent row per server while idle; new rows when players come/go.
+   - Connect to one server from the L4D2 client; within 5s, `/servers/<id>` shows a card with your avatar + persona name + ping range. Disconnect; within 5s the card moves to "recent."
+   - `sqlite3 l4d2web.db "SELECT * FROM server_player_session WHERE left_at IS NULL;"` — empty when nobody's connected; one row per current player when someone is.
+   - `sqlite3 l4d2web.db "SELECT count(*), MIN(fetched_at), MAX(fetched_at) FROM steam_user_profile;"` — at least one row after a player has been resolved.
+
+4. **Failure-path checks**:
+   - Manually corrupt `servers.rcon_password` for one server; confirm the journal logs the auth failure and the row's badge goes stale within `LIVE_STATE_STALE_SECONDS`; other servers unaffected.
+   - Unset `STEAM_WEB_API_KEY` in `web.env`, restart web; confirm the display still works (in-game names + placeholder avatars), no errors in journal.
+   - `nft` drop the loopback TCP on one server's port; confirm rows stop appearing, open sessions close after `STUCK_SESSION_SECONDS`, badge goes stale.
+
+---
+
+## Open implementation questions
+
+- **Server-list template filename**: confirm with `ls l4d2web/templates/` once implementation starts.
+- **Server-create route location**: confirm path (likely `l4d2web/routes/server_routes.py`).
+- **CSP allowlist for Steam avatar CDNs**: check `l4d2web/app.py` (or wherever security headers live) — extend `img-src` to include `avatars.cloudflare.steamstatic.com`, `avatars.akamai.steamstatic.com`, `avatars.steamstatic.com` if a CSP is enforced.
+- **Adaptive backoff** for hibernating servers: defer; start with fixed 5s and revisit only if load becomes a concern (which it won't at current server count).
+- **Migration data step**: SQLite alembic batch operation with a Python data step that iterates rows and generates `secrets.token_urlsafe(32)` per row — confirm pattern against existing migrations under `l4d2web/alembic/versions/`.
+
+---
+
+## Deferred to a separate plan
+
+- Generic RCON command execution (`changelevel`, `kick`, `say`, `sm_ban`, ...)
+- Web UI buttons mapped to those commands with CSRF + admin authz
+- Audit log table for issued commands
+- Player-count history graphs (data already accumulating from this plan)
+- Ban UX (lookup by Steam ID, search across `server_player_session`)