left4me/docs/superpowers/specs/2026-05-12-server-live-state-display-design.md
mwiegand 83d2a9932c
refactor(rcon): harden _parse_duration; surface fixture handler errors
- _parse_duration wraps int() in try/except so malformed connected
  durations raise RconError (not ValueError leaking past the poller's
  except RconError).
- fake_rcon_server captures handler exceptions and re-raises at context
  exit, so a buggy test handler surfaces as a real failure instead of
  silently degrading into a client-side timeout.
- Two new parser tests: HH:MM:SS duration parsing and malformed input
  coverage.
- Fix Steam ID formula typo in the spec doc (Z*2 + Y, not Y*2 + Z; Y is
  the low bit). Code was already correct.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 21:39:32 +02:00


Server live-state display (counts, map, roster, avatars, history)

Context

The l4d2web UI currently shows systemd lifecycle state per game server (running/stopped/unknown) but nothing about what's happening inside the game: player count, current map, whether the server is hibernating, who is connected. To know any of that, users have to context-switch (open the game, query externally).

The goal is a read-side live-state display: counts + map + hibernating on the server list, plus a server-detail panel showing the current player roster (avatars + names) and a "recent players" section for who's been on lately. It is backed by persistent history tables, so the data for count-over-time graphs and player-presence history (a foundation for future ban UX) accumulates for free.

Source: RCON exclusively. A2S_INFO (UDP, anonymous) was investigated and discarded — it can't deliver Steam IDs, hibernating flag, or interactive commands, so anything beyond raw counts re-routes through RCON anyway. Both transports were verified working against prod left4.me. Going RCON-only means one transport, one set of tests, no throwaway scaffolding.

Avatars: Steam Web API. RCON gives Steam IDs; ISteamUser/GetPlayerSummaries resolves them to persona names + avatar URLs hot-linked from Steam's CDN. API key already obtained.

Commands are deferred to a separate plan. This plan is read-only.


Architecture

                       ┌─────────────────────────────┐
                       │   left4me-web (Flask)       │
┌──────────────┐  RCON │  ┌───────────────────────┐  │
│  srcds 27016 │◄──────┼──┤ live-state poller     │  │
└──────────────┘   TCP │  │   (daemon thread)     │  │
                       │  └───────┬───────────────┘  │
┌──────────────┐  RCON │          │ writes          │
│  srcds 27021 │◄──────┤          ▼                 │
└──────────────┘       │  ┌───────────────────────┐  │
                       │  │ server_live_state     │  │
       Steam Web API   │  │ server_player_session │  │
       ┌────────────┐  │  │ steam_user_profile    │  │
       │ Steam CDN  │◄─┼──┤                       │  │
       │ avatars... │  │  └───────┬───────────────┘  │
       └────────────┘  │          │ reads            │
              ▲        │          ▼                  │
              │        │  ┌───────────────────────┐  │
              └────────┼──┤ /servers, /servers/N  │  │
        <img src=...>  │  │  (HTMX 5s refresh)    │  │
                       │  └───────────────────────┘  │
                       └─────────────────────────────┘

Single daemon thread (modeled on the existing start_state_poller in l4d2web/services/job_worker.py:617-647), inside the Flask process, polls every LIVE_STATE_POLL_SECONDS (default 5). Per poll, per running server with a configured RCON password:

  1. TCP connect to 127.0.0.1:<port>, auth, send status, parse response.
  2. Compare server-level state (players/map/hibernating/etc.) to the latest server_live_state row for this server. If unchanged, bump last_seen_at. If changed, insert a new row.
  3. Reconcile open sessions (server_player_session rows where left_at IS NULL) with the current status roster: open new sessions for new players (backfilling joined_at from RCON's connected field), close sessions for players no longer present, update min_ping/max_ping for continuing sessions.
  4. Collect Steam IDs that are missing from steam_user_profile or have fetched_at older than 24h; batch them into a single GetPlayerSummaries call; upsert results.
  5. Trim server_live_state and closed sessions older than retention.

Schema (one new alembic migration)

New column: servers.rcon_password

rcon_password: Mapped[str] = mapped_column(
    String(64), nullable=False, default="", server_default=""
)

Empty string = "no password configured yet" (poller skips). The migration backfills every existing row with secrets.token_urlsafe(32): exactly 43 characters from the URL-safe alphabet (A-Z, a-z, 0-9, -, _), so the quoted "..." cfg value never needs escaping.
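A minimal sketch of the password rule, assuming hypothetical helper names (generate_rcon_password, rcon_cfg_line). It shows why token_urlsafe(32) gives 43 characters (32 bytes of base64 with the "=" padding stripped) and that the resulting alphabet can be dropped into a quoted cfg value without escaping.

```python
import secrets
import string

# Characters secrets.token_urlsafe can emit (URL-safe base64, "=" padding stripped).
URLSAFE_ALPHABET = frozenset(string.ascii_letters + string.digits + "-_")


def generate_rcon_password() -> str:
    """32 random bytes -> 43 URL-safe characters."""
    return secrets.token_urlsafe(32)


def rcon_cfg_line(password: str) -> str:
    """Render the server.cfg line; refuse anything that would need cfg escaping."""
    if not password or not set(password) <= URLSAFE_ALPHABET:
        raise ValueError("password must be non-empty and URL-safe")
    return f'rcon_password "{password}"'
```

The explicit alphabet check is belt-and-braces: with generated passwords it never fires, but it would catch a hand-edited value containing a quote.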

server_live_state — run-length-encoded snapshots

CREATE TABLE server_live_state (
  id            INTEGER PRIMARY KEY AUTOINCREMENT,
  server_id     INTEGER NOT NULL REFERENCES servers(id) ON DELETE CASCADE,
  started_at    DATETIME NOT NULL,    -- when this exact state first appeared
  last_seen_at  DATETIME NOT NULL,    -- most recent poll where it still held
  players       INTEGER NOT NULL,
  max_players   INTEGER NOT NULL,
  bots          INTEGER NOT NULL,
  map           VARCHAR(64) NOT NULL,
  hibernating   BOOLEAN NOT NULL
);
CREATE INDEX ix_sls_server_started ON server_live_state(server_id, started_at DESC);
  • "State" = the tuple (players, max_players, bots, map, hibernating). Ping/loss are deliberately not stored at server-level, so they don't churn rows.
  • Idle hibernating server collapses from one-row-per-poll to one-row-per-state-change (≈17,280× compression for a 24h-idle server).
  • Latest snapshot for a server: ORDER BY started_at DESC LIMIT 1. UI staleness check: last_seen_at > now - LIVE_STATE_STALE_SECONDS (default 30).
  • Retention: trim rows where last_seen_at < now - LIVE_STATE_HISTORY_DAYS (default 30).
  • Failed polls produce no DB write; the staleness check on last_seen_at handles UI degradation cleanly.
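The run-length encoding above can be sketched with the stdlib sqlite3 module. Assumptions: record_state and trim_snapshots are hypothetical names, timestamps are stored as ISO-8601 text (not via SQLAlchemy's DATETIME type), the foreign key is omitted, and the caller owns the transaction.

```python
import sqlite3
from datetime import datetime, timedelta

SCHEMA = """
CREATE TABLE server_live_state (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  server_id INTEGER NOT NULL,
  started_at TEXT NOT NULL,
  last_seen_at TEXT NOT NULL,
  players INTEGER NOT NULL,
  max_players INTEGER NOT NULL,
  bots INTEGER NOT NULL,
  map TEXT NOT NULL,
  hibernating INTEGER NOT NULL
);
"""


def record_state(conn: sqlite3.Connection, server_id: int, state: tuple, now: datetime) -> None:
    """state = (players, max_players, bots, map, hibernating).

    Same as the newest row -> bump last_seen_at; otherwise insert a new row.
    """
    newest = conn.execute(
        "SELECT id, players, max_players, bots, map, hibernating "
        "FROM server_live_state WHERE server_id = ? "
        "ORDER BY started_at DESC, id DESC LIMIT 1",
        (server_id,),
    ).fetchone()
    ts = now.isoformat(sep=" ")
    # SQLite returns hibernating as 0/1; 1 == True, so tuple comparison still works.
    if newest is not None and tuple(newest[1:]) == tuple(state):
        conn.execute(
            "UPDATE server_live_state SET last_seen_at = ? WHERE id = ?", (ts, newest[0])
        )
    else:
        conn.execute(
            "INSERT INTO server_live_state (server_id, started_at, last_seen_at, "
            "players, max_players, bots, map, hibernating) "
            "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (server_id, ts, ts, *state),
        )


def trim_snapshots(conn: sqlite3.Connection, now: datetime, history_days: int = 30) -> None:
    """Retention: drop rows whose state last held before the cutoff."""
    cutoff = (now - timedelta(days=history_days)).isoformat(sep=" ")
    conn.execute("DELETE FROM server_live_state WHERE last_seen_at < ?", (cutoff,))
```

Two identical polls five seconds apart leave one row whose last_seen_at advanced; a changed player count opens a second row.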

server_player_session — interval per connection

CREATE TABLE server_player_session (
  id            INTEGER PRIMARY KEY AUTOINCREMENT,
  server_id     INTEGER NOT NULL REFERENCES servers(id) ON DELETE CASCADE,
  steam_id_64   VARCHAR(20) NOT NULL,
  joined_at     DATETIME NOT NULL,
  left_at       DATETIME NULL,                  -- NULL = currently in-game
  name_at_join  VARCHAR(64) NOT NULL,
  min_ping      INTEGER NOT NULL,
  max_ping      INTEGER NOT NULL
);
CREATE INDEX ix_sps_server_open    ON server_player_session(server_id, left_at);
CREATE INDEX ix_sps_steam_history  ON server_player_session(steam_id_64, joined_at);
  • joined_at is backfilled from RCON's connected duration on first sighting (joined_at = now - connected_seconds). This heals brief polling gaps and survives web restarts: even if we just started polling, we know when the still-connected players actually joined.
  • A player who disconnects and rejoins gets two rows, not one merged interval.
  • Bots are excluded — rows with a non-STEAM_X:Y:Z uniqueid are skipped.
  • min_ping/max_ping updated only when a new poll pushes the range, to avoid noise writes.
  • On poller startup, close any open sessions whose server doesn't appear in the current RCON results (e.g. it stopped while the web process was down). Plus: close a server's open sessions once it has gone unanswered for STUCK_SESSION_SECONDS (default 60, i.e. ~12 consecutive failed polls at the default 5s interval).
  • Retention: trim closed sessions where left_at < now - LIVE_STATE_HISTORY_DAYS (default 30, the same key that governs snapshot retention). Open sessions are never trimmed.

steam_user_profile — cached profile data (24h TTL)

CREATE TABLE steam_user_profile (
  steam_id_64   VARCHAR(20) PRIMARY KEY,
  persona_name  VARCHAR(64) NOT NULL,
  avatar_url    TEXT NOT NULL,           -- avatarmedium from Steam Web API
  fetched_at    DATETIME NOT NULL
);
  • Cache is global, not per-server (one profile per Steam ID).
  • Refreshed when fetched_at < now - 24h or when entry is missing.
  • Soft-fail: if the Steam API key is unset, the API is down, or a profile is private, we just leave the cache as-is and the UI falls back to name_at_join + placeholder avatar.
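A sketch of the cache-refresh selection and the soft-fail display rule. The helper names and the placeholder path are hypothetical, and profiles are modeled as plain dicts keyed like the steam_user_profile columns.

```python
from datetime import datetime, timedelta

STEAM_PROFILE_TTL = timedelta(seconds=86400)  # STEAM_PROFILE_TTL_SECONDS default
PLACEHOLDER_AVATAR = "/static/img/avatar-placeholder.png"  # hypothetical path


def ids_needing_refresh(roster_ids, fetched_at_by_id, now, ttl=STEAM_PROFILE_TTL):
    """Roster Steam IDs with no cached profile, or one older than ttl.

    fetched_at_by_id maps steam_id_64 -> fetched_at for cached rows.
    Order is preserved and duplicates are dropped.
    """
    stale = []
    for sid in dict.fromkeys(roster_ids):
        fetched = fetched_at_by_id.get(sid)
        if fetched is None or fetched < now - ttl:
            stale.append(sid)
    return stale


def display_identity(profile, name_at_join):
    """(name, avatar) for a player card; falls back when nothing is cached."""
    if profile is None:
        return name_at_join, PLACEHOLDER_AVATAR
    return profile["persona_name"], profile["avatar_url"]
```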

Bind-rendered queries

Current players on server X:

SELECT sp.steam_id_64, sp.joined_at, sp.name_at_join,
       sp.min_ping, sp.max_ping,
       p.persona_name, p.avatar_url
FROM server_player_session sp
LEFT JOIN steam_user_profile p USING (steam_id_64)
WHERE sp.server_id = ? AND sp.left_at IS NULL
ORDER BY sp.joined_at;

Recent players on server X (last 30 days, excluding currently in-game):

SELECT sp.steam_id_64, MAX(sp.left_at) AS last_seen,
       p.persona_name, p.avatar_url
FROM server_player_session sp
LEFT JOIN steam_user_profile p USING (steam_id_64)
WHERE sp.server_id = ?
  AND sp.left_at IS NOT NULL
  AND sp.left_at > datetime('now', '-30 days')
  AND sp.steam_id_64 NOT IN (
      SELECT steam_id_64 FROM server_player_session
      WHERE server_id = ? AND left_at IS NULL
  )
GROUP BY sp.steam_id_64, p.persona_name, p.avatar_url
ORDER BY last_seen DESC
LIMIT 20;

Modules

l4d2web/services/rcon.py (new)

Pure stdlib (socket, struct), no new dependency. Source RCON protocol:

from dataclasses import dataclass

@dataclass(slots=True, frozen=True)
class PlayerRow:
    steam_id_64: str     # converted from STEAM_X:Y:Z
    name: str
    connected_seconds: int
    ping: int

@dataclass(slots=True, frozen=True)
class StatusResponse:
    map: str
    players: int          # humans
    max_players: int
    bots: int
    hibernating: bool
    roster: list[PlayerRow]

class RconError(Exception): ...
class RconAuthError(RconError): ...

def query_status(host: str, port: int, password: str, *, timeout: float = 2.0) -> StatusResponse: ...
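For reference, a sketch of Source RCON packet framing with struct, following Valve's published layout: little-endian int32 size, int32 request id, int32 type, a NUL-terminated body, then one more NUL. The helper names are hypothetical. check_auth operates on an already-read buffer and encodes the handshake quirk noted below (an empty type=0 packet may precede the type=2 auth response). Socket reads and reassembly of multi-packet status replies are left out.

```python
import struct

SERVERDATA_AUTH = 3
SERVERDATA_AUTH_RESPONSE = 2
SERVERDATA_EXECCOMMAND = 2
SERVERDATA_RESPONSE_VALUE = 0


class RconError(Exception): ...
class RconAuthError(RconError): ...


def encode_packet(req_id: int, ptype: int, body: str) -> bytes:
    """size prefix + (id, type, body, two NUL bytes); size excludes itself."""
    payload = struct.pack("<ii", req_id, ptype) + body.encode("utf-8") + b"\x00\x00"
    return struct.pack("<i", len(payload)) + payload


def decode_packet(data: bytes) -> tuple[int, int, str, bytes]:
    """Parse one packet from the front of data; return (id, type, body, rest)."""
    if len(data) < 4:
        raise RconError("short read")
    (size,) = struct.unpack_from("<i", data, 0)
    if size < 10 or len(data) < 4 + size:
        raise RconError("truncated packet")
    req_id, ptype = struct.unpack_from("<ii", data, 4)
    body = data[12 : 4 + size - 2].decode("utf-8", errors="replace")
    return req_id, ptype, body, data[4 + size :]


def check_auth(data: bytes) -> None:
    """Skip the empty type=0 packet if present, then validate the auth response."""
    req_id, ptype, _, rest = decode_packet(data)
    if ptype == SERVERDATA_RESPONSE_VALUE:
        req_id, ptype, _, rest = decode_packet(rest)
    if ptype != SERVERDATA_AUTH_RESPONSE:
        raise RconError(f"unexpected packet type {ptype}")
    if req_id == -1:
        raise RconAuthError("bad RCON password")
```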

Implementation notes:

  • Auth handshake quirk verified live: server sends a type=0 empty-body packet before the type=2 auth response. Consume both. req_id == -1 on the auth response = bad password.
  • Single TCP connection per query (loopback, ~10-20ms total round-trip — pooling not worth it at this scale).
  • Header regex on map : and players : lines (the (hibernating|not hibernating) token is in players :).
  • Roster regex: split lines starting with #, skip the column-header line, robustly extract the quoted name + the STEAM_X:Y:Z token + MM:SS or HH:MM:SS connected duration + ping. Tolerate the two-numeric-prefix L4D2 variant (# 2 1 "Crone" STEAM_1:0:...).
  • Steam ID conversion: STEAM_X:Y:Z → 76561197960265728 + (Z * 2) + Y (Y is the low bit; returned as a string).

l4d2web/services/steam_users.py (new)

Modeled directly on l4d2web/services/steam_workshop.py:17-43 (single requests.Session, 30s timeout). One difference: the workshop client sends an anonymous form-encoded POST, whereas GetPlayerSummaries is a GET with key= and steamids= in the query string.

from collections.abc import Iterable
from dataclasses import dataclass

@dataclass(slots=True, frozen=True)
class SteamProfile:
    steam_id_64: str
    persona_name: str
    avatar_url: str       # avatarmedium

def fetch_profiles_batch(steam_ids: Iterable[str], *, api_key: str) -> list[SteamProfile]: ...
  • Endpoint: GET https://api.steampowered.com/ISteamUser/GetPlayerSummaries/v0002/?key=<key>&steamids=<csv>.
  • Up to 100 IDs per call; caller batches.
  • Returns only successful resolutions (private/deleted accounts simply absent from the response — fine, they stay uncached and the UI falls back).
  • Raises on transport errors; caller decides whether to surface.

l4d2web/services/live_state_poller.py (new)

Modeled on start_state_poller / state_poller_loop in l4d2web/services/job_worker.py:617-647.

def start_live_state_poller(app) -> None: ...           # spawns daemon thread, skipped under TESTING
def live_state_poller_loop(app, interval: float) -> None: ...
def poll_once() -> None:                                # one full pass over running servers
    ...
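A sketch of the loop shape, assuming a hypothetical variant that takes callables and a threading.Event instead of the Flask app, which keeps it testable. It survives a failing pass, sleeps with Event.wait so shutdown is prompt, and runs the periodic trim every Nth successful cycle.

```python
import logging
import threading
from typing import Callable, Optional

log = logging.getLogger("live_state_poller")


def live_state_loop(
    poll_once: Callable[[], None],
    interval: float,
    stop: threading.Event,
    *,
    trim: Optional[Callable[[], None]] = None,
    trim_every: int = 12,  # e.g. once a minute at a 5s interval
) -> None:
    cycle = 0
    while not stop.is_set():
        try:
            poll_once()
            cycle += 1
            if trim is not None and cycle % trim_every == 0:
                trim()
        except Exception:  # one bad pass must not kill the daemon thread
            log.exception("live-state poll failed")
        stop.wait(interval)  # returns early once stop is set


def start_live_state_thread(poll_once, interval, stop, **kwargs) -> threading.Thread:
    thread = threading.Thread(
        target=live_state_loop,
        args=(poll_once, interval, stop),
        kwargs=kwargs,
        name="live-state-poller",
        daemon=True,
    )
    thread.start()
    return thread
```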

Per-server algorithm:

  1. RCON status → StatusResponse (or skip on auth failure/timeout, logged via app.logger).
  2. Server-level RLE upsert: load newest server_live_state row for this server. If (players, max_players, bots, map, hibernating) matches → UPDATE last_seen_at = now(). Else → INSERT new row.
  3. Session reconciliation in a single transaction:
    • Load open sessions for this server.
    • For each player in response.roster not in open sessions: INSERT new session with joined_at = now - connected_seconds, name_at_join = roster.name, min_ping = max_ping = roster.ping.
    • For each open session whose player is in the roster: if roster.ping < min_ping or > max_ping, UPDATE the range. Otherwise skip the write.
    • For each open session whose player is not in the roster: UPDATE left_at = now().
  4. Profile enrichment: collect Steam IDs from the roster where the cached profile is missing or fetched_at < now - 24h. Skip if STEAM_WEB_API_KEY unset. Batch into one Steam API call. Upsert results.

Periodic (every Nth cycle, e.g. once a minute):

  • Trim server_live_state and closed sessions past retention.
  • Close any open sessions whose server_id hasn't had a successful RCON response in the last STUCK_SESSION_SECONDS (default 60).

Modify: l4d2web/services/l4d2_facade.py:28-52

build_server_spec_payload appends f'rcon_password "{server.rcon_password}"' as the last entry in the returned config list, only if the password is non-empty. Appending (not prepending) matters: Source's cfg semantics are last-wins, so putting our line after both the overlay exec lines and the user's blueprint config guarantees no overlay or blueprint can silently clobber the password and break the poller. l4d2host/instances.py:40-58 already writes spec.config lines verbatim to server.cfg, so no host-side change is needed.
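A sketch of the append rule together with a simplified last-wins model of cfg evaluation. Both helper names are hypothetical, and the regex is a rough model of a cfg line, not Source's actual parser. The model also doesn't expand exec'd files; it only shows that an inline rcon_password from a blueprint loses to the appended line.

```python
import re

# Rough model of a cfg line: a name followed by a quoted or bare value.
CVAR_RE = re.compile(r'^\s*(\w+)\s+(?:"([^"]*)"|(\S+))')


def with_rcon_password(config_lines: list[str], rcon_password: str) -> list[str]:
    """Append the rcon_password line last, and only when a password is set."""
    lines = list(config_lines)
    if rcon_password:
        lines.append(f'rcon_password "{rcon_password}"')
    return lines


def effective_value(config_lines: list[str], name: str):
    """Last assignment wins, mirroring cfg semantics for inline lines."""
    value = None
    for line in config_lines:
        m = CVAR_RE.match(line)
        if m and m.group(1) == name:
            value = m.group(2) if m.group(2) is not None else m.group(3)
    return value
```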

Modify: server-create route

Wherever the server-create form handler lives (l4d2web/routes/server_routes.py or similar — confirm during implementation): before commit, generate rcon_password = secrets.token_urlsafe(32).


Web UI

Server list (template TBD: ls l4d2web/templates/ during implementation)

Add an inline live-state cell per server row:

  • Stopped server: - (no live data)
  • Stale (no snapshot at all, or the latest row's last_seen_at is older than LIVE_STATE_STALE_SECONDS): dim ? with tooltip "no data"
  • Hibernating: 0/4 · idle · c1m1_hotel
  • Active: 2/4 · c1m2_streets
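A sketch of the cell text for the four cases, assuming the latest server_live_state row arrives as a dict. The function name is hypothetical, and the dash for stopped servers is an assumption. The staleness boundary follows the rule that a snapshot is fresh only while last_seen_at > now - LIVE_STATE_STALE_SECONDS.

```python
from datetime import datetime, timedelta
from typing import Optional


def live_badge(running: bool, snapshot: Optional[dict], now: datetime,
               stale_seconds: int = 30) -> str:
    if not running:
        return "-"  # assumption: stopped servers show a plain dash
    if snapshot is None or snapshot["last_seen_at"] <= now - timedelta(seconds=stale_seconds):
        return "?"  # rendered dim with tooltip "no data"
    counts = f'{snapshot["players"]}/{snapshot["max_players"]}'
    if snapshot["hibernating"]:
        return f'{counts} · idle · {snapshot["map"]}'
    return f'{counts} · {snapshot["map"]}'
```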

No HTMX on the list page; page reload picks up the latest snapshot.

Server detail (l4d2web/templates/server_detail.html)

New section, HTMX-refreshed every LIVE_STATE_POLL_SECONDS (default 5):

<section class="panel"
         hx-get="/servers/{{ server.id }}/live-state"
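         hx-trigger="every 5s"
         hx-swap="outerHTML">
  <!-- inner content from l4d2web/templates/_live_state.html; because outerHTML replaces this whole element, the partial must re-emit this section with its hx attributes or polling stops after the first swap -->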
</section>

The partial renders three blocks:

  1. Summary: players/max_players · map · idle? plus a small "polled Ns ago" caption.
  2. Current players (only if non-empty): grid of cards, each <img src="{{ profile.avatar_url or placeholder }}" /> {{ profile.persona_name or session.name_at_join }} · {{ joined_relative }} · ping {{ min }}-{{ max }}ms.
  3. Recent players (last 30 days, excluding current; only if non-empty): smaller cards, {{ avatar }} {{ persona_name or name_at_join }} · last seen {{ last_seen_relative }}.

New route: GET /servers/<id>/live-state returns the partial. Composition mirrors the existing build-status pattern at l4d2web/templates/_overlay_build_status.html:1-5.

Avatar <img> tags point straight at Steam CDN URLs (avatars.cloudflare.steamstatic.com / avatars.akamai.steamstatic.com). No proxying. Same approach as WorkshopItem.preview_url. Note: confirm the existing CSP allows these hosts; if not, extend it.

No JS framework added — HTMX only.


Config keys

In l4d2web/config.py, plus documented defaults in deploy/templates/etc/left4me/web.env where applicable:

key                                default  purpose
LIVE_STATE_POLL_SECONDS            5        poll interval
LIVE_STATE_QUERY_TIMEOUT_SECONDS   2.0      per-RCON-query timeout
LIVE_STATE_POLL_WORKERS            4        thread-pool size for parallel per-server polls
LIVE_STATE_STALE_SECONDS           30       UI staleness threshold
LIVE_STATE_HISTORY_DAYS            30       retention for snapshots + closed sessions
STUCK_SESSION_SECONDS              60       close open sessions whose server has been unreachable for this long
STEAM_PROFILE_TTL_SECONDS          86400    profile cache TTL
STEAM_WEB_API_KEY                  ""       from web.env; empty disables enrichment

Tests

  • l4d2web/tests/test_rcon.py — protocol handshake against an in-process TCP fixture: auth-success, auth-failure (req_id == -1), header parse (incl. (hibernating) and (reserved <token>) variants), roster parse (incl. the two-numeric-prefix L4D2 variant), Steam ID conversion.
  • l4d2web/tests/test_steam_users.py — request shape (key in querystring, batched ids, 100-per-call ceiling), response parsing, partial response (some IDs missing).
  • l4d2web/tests/test_live_state_poller.py: mirror test_state_poller_* at l4d2web/tests/test_job_worker.py:882-952. Cover: iterates only running servers with non-empty rcon_password, RLE upsert (matching state → last_seen_at bump only; differing state → new row), session open with backfilled joined_at, session close on disappearance, ping range expansion, stuck-session close after STUCK_SESSION_SECONDS of failed polls, auth failures skipped (logged, no DB write), respects retention.
  • l4d2web/tests/test_server_routes.py (extend) — /servers/<id>/live-state fragment route renders summary/current/recent blocks correctly; stale rendering when latest snapshot is old; soft-fail rendering when no profile cached.
  • l4d2web/tests/test_l4d2_facade.py (extend) — build_server_spec_payload appends rcon_password "..." as the last config line when password is set; omits the line when empty; appears after both the overlay exec lines and the blueprint config lines.
  • Migration test — existing rows backfilled with non-empty 43-char passwords; tables created with correct indexes.

Critical files

New:

  • l4d2web/services/rcon.py — Source RCON client + status parser
  • l4d2web/services/steam_users.py — Steam Web API client (mirrors steam_workshop.py)
  • l4d2web/services/live_state_poller.py — background thread + poll loop + session reconciler
  • l4d2web/alembic/versions/00XX_server_live_state.py — migration: new column, three new tables, password backfill
  • l4d2web/templates/_live_state.html — HTMX-refreshed fragment (summary + current + recent)
  • l4d2web/tests/test_rcon.py, l4d2web/tests/test_steam_users.py, l4d2web/tests/test_live_state_poller.py

Modify:

  • l4d2web/models.py — add ServerLiveState, ServerPlayerSession, SteamUserProfile; add rcon_password to Server (after line 137)
  • l4d2web/services/l4d2_facade.py:28-52: build_server_spec_payload appends rcon_password "..." as the last config line when set
  • l4d2web/app.py — call start_live_state_poller(app) next to existing start_state_poller
  • l4d2web/routes/server_routes.py (or equivalent — confirm) — generate rcon_password in create handler; add GET /servers/<id>/live-state
  • l4d2web/templates/server_detail.html — include _live_state.html
  • l4d2web/templates/<server-list>.html — confirm filename; add inline badge column
  • l4d2web/config.py — register the eight new config keys
  • deploy/templates/etc/left4me/web.env — add STEAM_WEB_API_KEY= and any tunables we expose

Reused without changes:

  • l4d2web/services/job_worker.py:617-647 — daemon-thread / poll-loop pattern reference
  • l4d2web/services/steam_workshop.py:17-43: requests.Session + form-POST pattern for Steam Web API
  • l4d2host/instances.py:40-58 — already writes spec.config verbatim, so no host-side change for password injection
  • l4d2web/templates/_overlay_build_status.html — HTMX polling pattern reference

Verification

  1. Unit tests:

    pytest l4d2web/tests/test_rcon.py l4d2web/tests/test_steam_users.py l4d2web/tests/test_live_state_poller.py -v
    pytest l4d2web/tests -q   # full regression
    
  2. Migration check:

    alembic upgrade head
    sqlite3 l4d2web.db "SELECT id, name, length(rcon_password) FROM servers;"   # every row 43
    sqlite3 l4d2web.db ".schema server_live_state server_player_session steam_user_profile"
    
  3. End-to-end against prod (left4.me):

    • Deploy. Confirm systemctl status left4me-web.service shows no crash-loop and the journal logs start_live_state_poller once.
    • Restart both existing game servers so they pick up the injected password.
    • SQL sanity (web-host shell):
      sqlite3 l4d2web.db "SELECT server_id, started_at, last_seen_at, players, map, hibernating
                          FROM server_live_state ORDER BY server_id, started_at DESC LIMIT 10;"
      
      Expect a single recent row per server while idle; new rows when players come/go.
    • Connect to one server from the L4D2 client; within ~10s (one poll interval plus one HTMX refresh), /servers/<id> shows a card with your avatar + persona name + ping range. Disconnect; within the same window the card moves to "recent."
    • sqlite3 l4d2web.db "SELECT * FROM server_player_session WHERE left_at IS NULL;" — empty when nobody's connected; one row per current player when someone is.
    • sqlite3 l4d2web.db "SELECT count(*), MIN(fetched_at), MAX(fetched_at) FROM steam_user_profile;" — at least one row after a player has been resolved.
  4. Failure-path checks:

    • Manually corrupt servers.rcon_password for one server; confirm the journal logs auth failure and the row's badge goes stale within LIVE_STATE_STALE_SECONDS; other servers unaffected.
    • Unset STEAM_WEB_API_KEY in web.env, restart web; confirm display still works (in-game names + placeholder avatars), no errors in journal.
    • Add an nft rule dropping loopback TCP to one server's port; confirm that server's last_seen_at stops advancing, its open sessions close after STUCK_SESSION_SECONDS, and its badge goes stale.

Open implementation questions

  • Server-list template filename: confirm with ls l4d2web/templates/ once implementation starts.
  • Server-create route location: confirm path (likely l4d2web/routes/server_routes.py).
  • CSP allowlist for Steam avatar CDNs: check l4d2web/app.py (or wherever security headers live) — extend img-src to include avatars.cloudflare.steamstatic.com, avatars.akamai.steamstatic.com, avatars.steamstatic.com if a CSP is enforced.
  • Adaptive backoff for hibernating servers: defer; start with fixed 5s and revisit only if load becomes a concern (which it won't at current server count).
  • Migration data step: SQLite alembic batch operation with a Python data step that iterates rows and generates secrets.token_urlsafe(32) per row — confirm pattern against existing migrations under l4d2web/alembic/versions/.

Deferred to a separate plan

  • Generic RCON command execution (changelevel, kick, say, sm_ban, ...)
  • Web UI buttons mapped to those commands with CSRF + admin authz
  • Audit log table for issued commands
  • Player-count history graphs (data already accumulating from this plan)
  • Ban UX (lookup by Steam ID, search across server_player_session)