# Server live-state display (counts, map, roster, avatars, history)
## Context
The l4d2web UI currently shows systemd lifecycle state per game server (running/stopped/unknown) but nothing about what's happening *inside* the game: player count, current map, whether the server is hibernating, who is connected. To know any of that, users have to context-switch (open the game, query externally).
The goal is a **read-side live-state display**: counts + map + hibernating on the server list, plus a server-detail panel showing the current player roster (avatars + names) and a "recent players" section for who's been on lately. Backed by a persistent history table so we get count-over-time graphs and player-presence history (foundation for future ban UX) for free.
**Source: RCON exclusively.** A2S_INFO (UDP, anonymous) was investigated and discarded — it can't deliver Steam IDs, hibernating flag, or interactive commands, so anything beyond raw counts re-routes through RCON anyway. Both transports were verified working against prod `left4.me`. Going RCON-only means one transport, one set of tests, no throwaway scaffolding.
**Avatars: Steam Web API.** RCON gives Steam IDs; `ISteamUser/GetPlayerSummaries` resolves them to persona names + avatar URLs hot-linked from Steam's CDN. API key already obtained.
**Commands are deferred** to a separate plan. This plan is read-only.
---
## Architecture
```
                       ┌─────────────────────────────┐
                       │ left4me-web (Flask)         │
┌──────────────┐  RCON │  ┌───────────────────────┐  │
│ srcds 27016  │◄──────┼──┤ live-state poller     │  │
└──────────────┘  TCP  │  │ (daemon thread)       │  │
                       │  └───────┬───────────────┘  │
┌──────────────┐  RCON │          │ writes           │
│ srcds 27021  │◄──────┤          ▼                  │
└──────────────┘       │  ┌───────────────────────┐  │
                       │  │ server_live_state     │  │
       Steam Web API   │  │ server_player_session │  │
       ┌────────────┐  │  │ steam_user_profile    │  │
       │ Steam CDN  │◄─┼──┤                       │  │
       │ avatars... │  │  └───────┬───────────────┘  │
       └────────────┘  │          │ reads            │
              ▲        │          ▼                  │
              │        │  ┌───────────────────────┐  │
              └────────┼──┤ /servers, /servers/N  │  │
                       │  │ (HTMX 5s refresh)     │  │
                       │  └───────────────────────┘  │
                       └─────────────────────────────┘
```
Single daemon thread (modeled on the existing `start_state_poller` in `l4d2web/services/job_worker.py:617-647`), inside the Flask process, polls every `LIVE_STATE_POLL_SECONDS` (default 5); within a pass, the per-server RCON queries can run on a small thread pool (`LIVE_STATE_POLL_WORKERS`). Per poll, per running server with a configured RCON password:
1. TCP connect to `127.0.0.1:`, auth, send `status`, parse response.
2. Compare server-level state (players/map/hibernating/etc.) to the latest `server_live_state` row for this server. If unchanged, bump `last_seen_at`. If changed, insert a new row.
3. Reconcile open sessions (`server_player_session` rows where `left_at IS NULL`) with the current `status` roster: open new sessions for new players (backfilling `joined_at` from RCON's `connected` field), close sessions for players no longer present, update `min_ping`/`max_ping` for continuing sessions.
4. Collect Steam IDs that are missing from `steam_user_profile` or have `fetched_at` older than 24h; batch them into a single `GetPlayerSummaries` call; upsert results.
5. Periodically (every Nth pass, e.g. once a minute), trim `server_live_state` rows and closed sessions older than the retention window.
---
## Schema (one new alembic migration)
### New column: `servers.rcon_password`
```python
rcon_password: Mapped[str] = mapped_column(
    String(64), nullable=False, default="", server_default=""
)
```
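Empty string = "no password configured yet" (the poller skips the server). The migration backfills every existing row with its own `secrets.token_urlsafe(32)`: 32 random bytes, base64url-encoded without padding, which is always 43 characters from `A-Z a-z 0-9 - _`. No quotes or backslashes can appear, so wrapping the value in `"..."` in the cfg needs no escaping.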
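A minimal sketch of that data step, assuming plain `sqlite3` access; `new_rcon_password` and `backfill_rcon_passwords` are hypothetical names, and the real migration would run the same loop through Alembic's connection. The detail it illustrates is one token per row: a single `UPDATE` with a constant value would hand every server the same password.

```python
import secrets
import sqlite3


def new_rcon_password() -> str:
    # 32 random bytes, base64url-encoded with padding stripped: always 43 chars
    # drawn from A-Z, a-z, 0-9, '-' and '_'.
    return secrets.token_urlsafe(32)


def backfill_rcon_passwords(conn: sqlite3.Connection) -> int:
    """Hypothetical data step: give each server with an empty password its own token."""
    # Materialize the ids first so we are not updating while iterating a cursor.
    server_ids = [row[0] for row in conn.execute(
        "SELECT id FROM servers WHERE rcon_password = ''"
    )]
    for server_id in server_ids:
        # Generate per row; never share one password across servers.
        conn.execute(
            "UPDATE servers SET rcon_password = ? WHERE id = ?",
            (new_rcon_password(), server_id),
        )
    conn.commit()
    return len(server_ids)
```

Because it only touches rows whose password is still empty, running it twice is harmless.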
### `server_live_state` — run-length-encoded snapshots
```sql
CREATE TABLE server_live_state (
id INTEGER PRIMARY KEY AUTOINCREMENT,
server_id INTEGER NOT NULL REFERENCES servers(id) ON DELETE CASCADE,
started_at DATETIME NOT NULL, -- when this exact state first appeared
last_seen_at DATETIME NOT NULL, -- most recent poll where it still held
players INTEGER NOT NULL,
max_players INTEGER NOT NULL,
bots INTEGER NOT NULL,
map VARCHAR(64) NOT NULL,
hibernating BOOLEAN NOT NULL
);
CREATE INDEX ix_sls_server_started ON server_live_state(server_id, started_at DESC);
```
- "State" = the tuple `(players, max_players, bots, map, hibernating)`. Ping/loss are deliberately not stored at server-level, so they don't churn rows.
- Idle hibernating server collapses from one-row-per-poll to one-row-per-state-change (≈17,280× compression for a 24h-idle server).
- Latest snapshot for a server: `ORDER BY started_at DESC LIMIT 1`. UI staleness check: `last_seen_at > now - LIVE_STATE_STALE_SECONDS` (default 30).
- Retention: trim rows where `last_seen_at < now - LIVE_STATE_HISTORY_DAYS` (default 30).
- Failed polls produce no DB write; the staleness check on `last_seen_at` handles UI degradation cleanly.
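The RLE upsert can be sketched with the stdlib `sqlite3` module. `Snapshot` and `rle_upsert` are hypothetical names, timestamps are assumed to be stored as `YYYY-MM-DD HH:MM:SS` text (SQLite's own `datetime()` layout), and the real poller would go through the SQLAlchemy models instead.

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Snapshot:
    players: int
    max_players: int
    bots: int
    map: str
    hibernating: bool


def rle_upsert(conn: sqlite3.Connection, server_id: int, snap: Snapshot, now: datetime) -> bool:
    """Extend the latest run if the state is unchanged, else start a new run.

    Returns True when a new row was inserted.
    """
    ts = now.strftime("%Y-%m-%d %H:%M:%S")
    latest = conn.execute(
        "SELECT id, players, max_players, bots, map, hibernating "
        "FROM server_live_state WHERE server_id = ? "
        "ORDER BY started_at DESC, id DESC LIMIT 1",
        (server_id,),
    ).fetchone()
    # SQLite stores BOOLEAN as 0/1, so compare against an int.
    state = (snap.players, snap.max_players, snap.bots, snap.map, int(snap.hibernating))
    if latest is not None and latest[1:] == state:
        conn.execute("UPDATE server_live_state SET last_seen_at = ? WHERE id = ?", (ts, latest[0]))
        conn.commit()
        return False
    conn.execute(
        "INSERT INTO server_live_state (server_id, started_at, last_seen_at, "
        "players, max_players, bots, map, hibernating) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (server_id, ts, ts, *state),
    )
    conn.commit()
    return True
```

Only a state change costs an `INSERT`; an idle server costs one small `UPDATE` per poll.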
### `server_player_session` — interval per connection
```sql
CREATE TABLE server_player_session (
id INTEGER PRIMARY KEY AUTOINCREMENT,
server_id INTEGER NOT NULL REFERENCES servers(id) ON DELETE CASCADE,
steam_id_64 VARCHAR(20) NOT NULL,
joined_at DATETIME NOT NULL,
left_at DATETIME NULL, -- NULL = currently in-game
name_at_join VARCHAR(64) NOT NULL,
min_ping INTEGER NOT NULL,
max_ping INTEGER NOT NULL
);
CREATE INDEX ix_sps_server_open ON server_player_session(server_id, left_at);
CREATE INDEX ix_sps_server_recent ON server_player_session(server_id, left_at DESC);
CREATE INDEX ix_sps_steam_history ON server_player_session(steam_id_64, joined_at DESC);
```
- `joined_at` is **backfilled from RCON's `connected` duration** on first sighting (`joined_at = now - connected_seconds`). This heals brief polling gaps and survives web restarts: even if we just started polling, we know when the still-connected players actually joined.
- A player who disconnects and rejoins gets two rows, not one merged interval.
- Bots are excluded — rows with a non-`STEAM_X:Y:Z` uniqueid are skipped.
- `min_ping`/`max_ping` updated only when a new poll pushes the range, to avoid noise writes.
- On poller startup, close any sessions whose server isn't in current RCON output. Plus: close sessions after N consecutive failed polls of their server (TBD constant during implementation, e.g. 6 polls = ~30s).
- Retention: trim closed sessions where `left_at < now - SESSION_HISTORY_DAYS` (default 30). Open sessions never trimmed.
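The reconciliation rules above reduce to a pure diff between open sessions and the current roster, which keeps the database writes trivial and the logic easy to unit-test. A sketch, with `OpenSession`, `SeenPlayer` (standing in for `PlayerRow`), `SessionPlan` and `plan_reconciliation` as hypothetical names:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class OpenSession:
    steam_id_64: str
    min_ping: int
    max_ping: int


@dataclass(frozen=True)
class SeenPlayer:  # stand-in for rcon.PlayerRow
    steam_id_64: str
    name: str
    connected_seconds: int
    ping: int


@dataclass
class SessionPlan:
    to_open: list[SeenPlayer] = field(default_factory=list)  # INSERT, joined_at = now - connected_seconds
    to_close: list[str] = field(default_factory=list)  # UPDATE left_at = now
    ping_updates: dict[str, tuple[int, int]] = field(default_factory=dict)  # new (min, max)


def plan_reconciliation(open_sessions: list[OpenSession], roster: list[SeenPlayer]) -> SessionPlan:
    open_by_id = {s.steam_id_64: s for s in open_sessions}
    seen_by_id = {p.steam_id_64: p for p in roster}
    plan = SessionPlan()
    for sid, player in seen_by_id.items():
        session = open_by_id.get(sid)
        if session is None:
            plan.to_open.append(player)
            continue
        # Widen the ping range only when this poll falls outside it; otherwise no write.
        new_range = (min(session.min_ping, player.ping), max(session.max_ping, player.ping))
        if new_range != (session.min_ping, session.max_ping):
            plan.ping_updates[sid] = new_range
    plan.to_close = [sid for sid in open_by_id if sid not in seen_by_id]
    return plan
```

A player whose disconnect spans at least one poll lands in `to_close` on that poll and in `to_open` on a later one, which yields the two separate rows described above.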
### `steam_user_profile` — cached profile data (24h TTL)
```sql
CREATE TABLE steam_user_profile (
steam_id_64 VARCHAR(20) PRIMARY KEY,
persona_name VARCHAR(64) NOT NULL,
avatar_url TEXT NOT NULL, -- avatarmedium from Steam Web API
fetched_at DATETIME NOT NULL
);
```
- Cache is global, not per-server (one profile per Steam ID).
- Refreshed when `fetched_at < now - 24h` or when entry is missing.
- Soft-fail: if the Steam API key is unset, the API is down, or a profile is private, we just leave the cache as-is and the UI falls back to `name_at_join` + placeholder avatar.
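Selecting which Steam IDs to (re)fetch and splitting them into API-sized batches might look like this. `ids_needing_refresh` and `batches` are hypothetical helpers; the 100-ID ceiling is the per-call limit of `GetPlayerSummaries`.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Iterable, Iterator, Mapping

STEAM_BATCH_LIMIT = 100  # GetPlayerSummaries accepts at most 100 IDs per call


def ids_needing_refresh(
    roster_ids: Iterable[str],
    fetched_at: Mapping[str, datetime],
    now: datetime,
    ttl: timedelta = timedelta(hours=24),
) -> list[str]:
    """Deduplicated IDs (first-seen order) with no cached profile or one older than ttl."""
    result: list[str] = []
    seen: set[str] = set()
    for sid in roster_ids:
        if sid in seen:
            continue
        seen.add(sid)
        cached = fetched_at.get(sid)
        if cached is None or cached < now - ttl:
            result.append(sid)
    return result


def batches(ids: list[str], size: int = STEAM_BATCH_LIMIT) -> Iterator[list[str]]:
    """Split ids into consecutive chunks of at most `size`."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]
```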
### Read queries (bound parameters)
**Current players on server X:**
```sql
SELECT sp.steam_id_64, sp.joined_at, sp.name_at_join,
sp.min_ping, sp.max_ping,
p.persona_name, p.avatar_url
FROM server_player_session sp
LEFT JOIN steam_user_profile p USING (steam_id_64)
WHERE sp.server_id = ? AND sp.left_at IS NULL
ORDER BY sp.joined_at;
```
**Recent players on server X (last 30 days, excluding currently in-game):**
```sql
SELECT sp.steam_id_64, MAX(sp.left_at) AS last_seen,
p.persona_name, p.avatar_url
FROM server_player_session sp
LEFT JOIN steam_user_profile p USING (steam_id_64)
WHERE sp.server_id = ?
AND sp.left_at IS NOT NULL
AND sp.left_at > datetime('now', '-30 days')
AND sp.steam_id_64 NOT IN (
SELECT steam_id_64 FROM server_player_session
WHERE server_id = ? AND left_at IS NULL
)
GROUP BY sp.steam_id_64, p.persona_name, p.avatar_url
ORDER BY last_seen DESC
LIMIT 20;
```
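One detail of the recent-players query is easy to trip over: it has two positional `?` placeholders, and both must be bound to the same server id. A sketch using the stdlib `sqlite3` module (`recent_players` is a hypothetical helper). The `datetime('now', '-30 days')` filter compares text, so `left_at` must be stored in the same `YYYY-MM-DD HH:MM:SS` layout; a `T` date/time separator, for example, would compare incorrectly.

```python
import sqlite3

RECENT_PLAYERS_SQL = """
SELECT sp.steam_id_64, MAX(sp.left_at) AS last_seen,
       p.persona_name, p.avatar_url
FROM server_player_session sp
LEFT JOIN steam_user_profile p USING (steam_id_64)
WHERE sp.server_id = ?
  AND sp.left_at IS NOT NULL
  AND sp.left_at > datetime('now', '-30 days')
  AND sp.steam_id_64 NOT IN (
    SELECT steam_id_64 FROM server_player_session
    WHERE server_id = ? AND left_at IS NULL
  )
GROUP BY sp.steam_id_64, p.persona_name, p.avatar_url
ORDER BY last_seen DESC
LIMIT 20
"""


def recent_players(conn: sqlite3.Connection, server_id: int) -> list:
    # Two positional placeholders, both the server id: the outer filter and
    # the "currently in-game" subquery.
    return conn.execute(RECENT_PLAYERS_SQL, (server_id, server_id)).fetchall()
```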
---
## Modules
### `l4d2web/services/rcon.py` (new)
Pure stdlib (`socket`, `struct`), no new dependency. Source RCON protocol:
```python
from dataclasses import dataclass


@dataclass(slots=True, frozen=True)  # slots=True needs Python 3.10+
class PlayerRow:
    steam_id_64: str  # converted from STEAM_X:Y:Z
    name: str
    connected_seconds: int
    ping: int


@dataclass(slots=True, frozen=True)
class StatusResponse:
    map: str
    players: int  # humans
    max_players: int
    bots: int
    hibernating: bool
    roster: list[PlayerRow]


class RconError(Exception): ...


class RconAuthError(RconError): ...


def query_status(host: str, port: int, password: str, *, timeout: float = 2.0) -> StatusResponse: ...
```
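Source RCON framing is small enough to sketch with `socket` and `struct`: each packet is a little-endian `int32` size, `int32` request id, `int32` type, a NUL-terminated body and one extra NUL, with the size counting everything after itself. The sketch also handles the auth quirk from the implementation notes (an empty `type=0` packet before the `type=2` auth response). `encode_packet`, `read_packet` and `authenticate` are hypothetical names, `RconAuthError` is redeclared so the block runs standalone, and it does not reassemble command responses that the server splits across several packets.

```python
from __future__ import annotations

import socket
import struct

SERVERDATA_AUTH = 3
SERVERDATA_AUTH_RESPONSE = 2
SERVERDATA_EXECCOMMAND = 2
SERVERDATA_RESPONSE_VALUE = 0


class RconAuthError(Exception):
    """Redeclared here so the sketch runs standalone."""


def encode_packet(req_id: int, ptype: int, body: str) -> bytes:
    # size | id | type | body NUL | NUL   (ints are little-endian int32)
    payload = struct.pack("<ii", req_id, ptype) + body.encode("utf-8") + b"\x00\x00"
    return struct.pack("<i", len(payload)) + payload


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("RCON connection closed mid-packet")
        buf += chunk
    return buf


def read_packet(sock: socket.socket) -> tuple[int, int, str]:
    (size,) = struct.unpack("<i", _recv_exact(sock, 4))
    data = _recv_exact(sock, size)
    req_id, ptype = struct.unpack("<ii", data[:8])
    return req_id, ptype, data[8:-2].decode("utf-8", errors="replace")


def authenticate(sock: socket.socket, password: str, req_id: int = 1) -> None:
    sock.sendall(encode_packet(req_id, SERVERDATA_AUTH, password))
    # The server first sends an empty RESPONSE_VALUE (type 0); skip until AUTH_RESPONSE.
    while True:
        resp_id, ptype, _body = read_packet(sock)
        if ptype == SERVERDATA_AUTH_RESPONSE:
            break
    if resp_id == -1:
        raise RconAuthError("RCON authentication failed (bad password)")
```

`query_status` would then open `socket.create_connection((host, port), timeout=timeout)`, call `authenticate`, send `encode_packet(2, SERVERDATA_EXECCOMMAND, "status")` and read the reply body.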
Implementation notes:
- Auth handshake quirk verified live: server sends a `type=0` empty-body packet **before** the `type=2` auth response. Consume both. `req_id == -1` on the auth response = bad password.
- Single TCP connection per query (loopback, ~10-20ms total round-trip — pooling not worth it at this scale).
- Header regex on `map :` and `players :` lines (the `(hibernating|not hibernating)` token is in `players :`).
- Roster regex: split lines starting with `#`, skip the column-header line, robustly extract the quoted name + the `STEAM_X:Y:Z` token + `MM:SS` or `HH:MM:SS` connected duration + ping. Tolerate the two-numeric-prefix L4D2 variant (`# 2 1 "Crone" STEAM_1:0:...`).
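- Steam ID conversion: `STEAM_X:Y:Z` → `76561197960265728 + (Z * 2) + Y` (returned as string). `Y` is the low bit of the account ID and `Z` holds the remaining bits; the universe digit `X` is ignored.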
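A sketch of the `status` parsing and Steam ID conversion, assuming the line shapes described in these notes. `EXAMPLE_STATUS` is an illustrative string shaped like L4D2 output, not a captured response, and the regexes are assumptions to validate against real output from both prod servers. The conversion is SteamID64 = 76561197960265728 + Z*2 + Y.

```python
import re
from dataclasses import dataclass

STEAM64_BASE = 76561197960265728

_MAP_RE = re.compile(r"^map\s*:\s*(\S+)", re.MULTILINE)
_PLAYERS_RE = re.compile(
    r"^players\s*:\s*(\d+)\s+humans?,\s*(\d+)\s+bots?\s*\((\d+)(?:/\d+)?\s+max\)(.*)$",
    re.MULTILINE,
)
# "#", userid, optional second number (L4D2 variant), quoted name, STEAM_X:Y:Z,
# MM:SS or HH:MM:SS, ping. BOT rows (no STEAM_ token) and the "# userid ..."
# column header (no digits after "#") never match.
_ROSTER_RE = re.compile(
    r'^#\s*\d+(?:\s+\d+)?\s+"(?P<name>.*)"\s+(?P<sid>STEAM_\d:[01]:\d+)\s+'
    r"(?P<conn>\d+(?::\d{2}){1,2})\s+(?P<ping>\d+)",
    re.MULTILINE,
)

# Illustrative shape only, not captured from a server.
EXAMPLE_STATUS = """\
hostname: example
map     : c1m2_streets at: 0 x, 0 y, 0 z
players : 2 humans, 2 bots (4 max) (not hibernating)

# userid name uniqueid connected ping loss state rate adr
#  2 1 "Crone" STEAM_1:0:12345 05:12 48 0 active 30000 127.0.0.1:27005
#  3 "Nick" STEAM_1:1:999 1:02:03 61 0 active 30000 127.0.0.1:27006
#  4 "Coach" BOT active
"""


def steam2_to_64(steam2: str) -> str:
    """STEAM_X:Y:Z -> base + Z*2 + Y (the universe digit X is ignored)."""
    _, y, z = steam2.split(":")
    return str(STEAM64_BASE + int(z) * 2 + int(y))


def duration_seconds(text: str) -> int:
    """'05:12' -> 312, '1:02:03' -> 3723."""
    total = 0
    for part in text.split(":"):
        total = total * 60 + int(part)
    return total


@dataclass(frozen=True)
class ParsedStatus:
    map: str
    players: int
    max_players: int
    bots: int
    hibernating: bool
    roster: list  # (steam_id_64, name, connected_seconds, ping) tuples


def parse_status(text: str) -> ParsedStatus:
    map_m = _MAP_RE.search(text)
    players_m = _PLAYERS_RE.search(text)
    if map_m is None or players_m is None:
        raise ValueError("unrecognised status output")
    roster = [
        (steam2_to_64(m["sid"]), m["name"], duration_seconds(m["conn"]), int(m["ping"]))
        for m in _ROSTER_RE.finditer(text)
    ]
    return ParsedStatus(
        map=map_m.group(1),
        players=int(players_m.group(1)),
        max_players=int(players_m.group(3)),
        bots=int(players_m.group(2)),
        # "(not hibernating)" does not contain the substring "(hibernating)".
        hibernating="(hibernating)" in players_m.group(4),
        roster=roster,
    )
```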
### `l4d2web/services/steam_users.py` (new)
Modeled directly on `l4d2web/services/steam_workshop.py:17-43` (single `requests.Session`, 30s timeout). The only real difference: `GetPlayerSummaries` is a `GET` with `key` and `steamids` in the query string, not the anonymous form-encoded `POST` the workshop client sends.
```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(slots=True, frozen=True)
class SteamProfile:
    steam_id_64: str
    persona_name: str
    avatar_url: str  # avatarmedium


def fetch_profiles_batch(steam_ids: Iterable[str], *, api_key: str) -> list[SteamProfile]: ...
```
- Endpoint: `GET https://api.steampowered.com/ISteamUser/GetPlayerSummaries/v0002/?key=&steamids=`, with `steamids` as a comma-separated list.
- Up to 100 IDs per call; caller batches.
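- Returns only successful resolutions: IDs the API doesn't return (e.g. deleted or invalid accounts) are simply absent, stay uncached, and the UI falls back. Private profiles should still resolve, since persona name and avatar are part of a profile's public data.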
- Raises on transport errors; caller decides whether to surface.
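Request building and response parsing can be kept separate from the HTTP call so both are testable without network access. A sketch with hypothetical helpers `summary_params` and `parse_player_summaries`; the `response.players[]` entries with `steamid`, `personaname` and `avatarmedium` fields are what `GetPlayerSummaries` returns.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Iterable

GET_PLAYER_SUMMARIES_URL = "https://api.steampowered.com/ISteamUser/GetPlayerSummaries/v0002/"


@dataclass(frozen=True)
class SteamProfile:
    steam_id_64: str
    persona_name: str
    avatar_url: str  # avatarmedium


def summary_params(steam_ids: Iterable[str], api_key: str) -> dict[str, str]:
    """Query-string parameters for one call (1 to 100 IDs)."""
    ids = list(steam_ids)
    if not 0 < len(ids) <= 100:
        raise ValueError("GetPlayerSummaries takes 1-100 Steam IDs per call")
    return {"key": api_key, "steamids": ",".join(ids)}


def parse_player_summaries(payload: dict[str, Any]) -> list[SteamProfile]:
    """Map the JSON body to profiles; IDs the API did not return are simply absent."""
    players = payload.get("response", {}).get("players", [])
    return [
        SteamProfile(p["steamid"], p.get("personaname", ""), p.get("avatarmedium", ""))
        for p in players
        if "steamid" in p
    ]
```

With the shared session, the call itself would be `session.get(GET_PLAYER_SUMMARIES_URL, params=summary_params(ids, api_key), timeout=30)`, followed by `raise_for_status()` and `parse_player_summaries(resp.json())`.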
### `l4d2web/services/live_state_poller.py` (new)
Modeled on `start_state_poller` / `state_poller_loop` in `l4d2web/services/job_worker.py:617-647`.
```python
def start_live_state_poller(app) -> None: ...  # spawns daemon thread, skipped under TESTING
def live_state_poller_loop(app, interval: float) -> None: ...
def poll_once() -> None:  # one full pass over running servers
    ...
```
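`LIVE_STATE_POLL_WORKERS` implies the per-server RCON queries inside `poll_once` run concurrently. One way to sketch that with `concurrent.futures` is to fan out only the network I/O and hand results back to the poller thread, which then does all database writes; SQLAlchemy sessions are not thread-safe, so keeping writes on one thread avoids sharing them. `fetch_concurrently` is a hypothetical helper:

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Hashable, Iterable, TypeVar

K = TypeVar("K", bound=Hashable)
R = TypeVar("R")


def fetch_concurrently(
    keys: Iterable[K], fetch: Callable[[K], R], max_workers: int = 4
) -> dict[K, R | Exception]:
    """Run fetch(key) for every key on a thread pool.

    Each key maps to its result, or to the exception it raised, so one
    unreachable server never aborts the rest of the pass.
    """
    keys = list(keys)
    if not keys:
        return {}
    results: dict[K, R | Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {key: pool.submit(fetch, key) for key in keys}
        for key, future in futures.items():
            try:
                results[key] = future.result()
            except Exception as exc:  # e.g. RconError, socket timeouts, parse errors
                results[key] = exc
    return results
```

In `poll_once`, an `Exception` value would be logged and skipped (no DB write), while a `StatusResponse` would go through steps 2 to 4 below.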
Per-server algorithm:
1. RCON `status` → `StatusResponse` (or skip on auth/timeout, logged via `app.logger`).
2. **Server-level RLE upsert**: load newest `server_live_state` row for this server. If `(players, max_players, bots, map, hibernating)` matches → `UPDATE last_seen_at = now()`. Else → `INSERT` new row.
3. **Session reconciliation** in a single transaction:
- Load open sessions for this server.
- For each player in `response.roster` not in open sessions: `INSERT` new session with `joined_at = now - connected_seconds`, `name_at_join = roster.name`, `min_ping = max_ping = roster.ping`.
- For each open session whose player is in the roster: if `roster.ping < min_ping` or `> max_ping`, `UPDATE` the range. Otherwise skip the write.
- For each open session whose player is *not* in the roster: `UPDATE left_at = now()`.
4. **Profile enrichment**: collect Steam IDs from the roster where the cached profile is missing or `fetched_at < now - 24h`. Skip if `STEAM_WEB_API_KEY` unset. Batch into one Steam API call. Upsert results.
Periodic (every Nth cycle, e.g. once a minute):
- Trim `server_live_state` and closed sessions past retention.
- Close any open sessions whose `server_id` hasn't had a successful RCON response in the last `STUCK_SESSION_SECONDS` (default 60).
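The loop and the every-Nth-cycle housekeeping can be sketched as below; `poller_loop` and `stuck_servers` are hypothetical names, and `max_cycles` exists only so the loop can be tested. With the 5s default interval, `maintenance_every=12` is roughly once a minute, and running maintenance on cycle 0 (after the first poll) doubles as the startup clean-up of stale open sessions.

```python
from __future__ import annotations

import logging
import threading
import time
from typing import Callable, Mapping, Optional

log = logging.getLogger(__name__)


def poller_loop(
    poll_once: Callable[[], None],
    maintenance: Callable[[], None],
    interval: float,
    stop: threading.Event,
    maintenance_every: int = 12,
    max_cycles: Optional[int] = None,
) -> int:
    """Poll every `interval` seconds; run maintenance on cycle 0 and every Nth cycle after.

    Exceptions are logged per cycle so one bad pass can't kill the daemon thread.
    Returns the number of cycles run.
    """
    cycle = 0
    while not stop.is_set() and (max_cycles is None or cycle < max_cycles):
        started = time.monotonic()
        try:
            poll_once()
            if cycle % maintenance_every == 0:
                maintenance()  # retention trim + stuck-session close
        except Exception:
            log.exception("live-state poll cycle failed")
        cycle += 1
        # Sleep for the rest of the interval; Event.wait returns early once stop is set.
        stop.wait(max(0.0, interval - (time.monotonic() - started)))
    return cycle


def stuck_servers(
    servers_with_open_sessions: set[int],
    last_success: Mapping[int, float],
    now: float,
    threshold: float = 60.0,
) -> list[int]:
    """Servers holding open sessions with no successful RCON reply in `threshold` seconds."""
    return sorted(
        sid
        for sid in servers_with_open_sessions
        if now - last_success.get(sid, float("-inf")) > threshold
    )
```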
### Modify: `l4d2web/services/l4d2_facade.py:28-52`
`build_server_spec_payload` **appends** `f'rcon_password "{server.rcon_password}"'` as the *last* entry in the returned `config` list, only if the password is non-empty. Appending (not prepending) matters: Source's cfg semantics are last-wins, so putting our line after both the overlay `exec` lines and the user's blueprint config guarantees no overlay or blueprint can silently clobber the password and break the poller. `l4d2host/instances.py:40-58` already writes `spec.config` lines verbatim to `server.cfg` — **no host-side change needed**.
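The facade change amounts to appending one line. A sketch with a hypothetical `with_rcon_password` helper (the real change lives inside `build_server_spec_payload`); the quote/newline guard is an extra assumption that `token_urlsafe` output can never trip, but it keeps a hand-edited password from breaking `server.cfg`.

```python
from __future__ import annotations


def with_rcon_password(config_lines: list[str], rcon_password: str) -> list[str]:
    """Return a copy of config_lines with rcon_password appended last (server.cfg is last-wins)."""
    lines = list(config_lines)
    if rcon_password:
        if '"' in rcon_password or "\n" in rcon_password:
            raise ValueError("rcon_password must not contain quotes or newlines")
        lines.append(f'rcon_password "{rcon_password}"')
    return lines
```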
### Modify: server-create route
Wherever the server-create form handler lives (`l4d2web/routes/server_routes.py` or similar — confirm during implementation): before commit, generate `rcon_password = secrets.token_urlsafe(32)`.
---
## Web UI
### Server list (template TBD: `ls l4d2web/templates/` during implementation)
Add an inline live-state cell per server row:
- Stopped server: `—`
- Stale (latest row's `last_seen_at` older than `LIVE_STATE_STALE_SECONDS`): dim `?` with tooltip "no data"
- Hibernating: `0/4 · idle · c1m1_hotel`
- Active: `2/4 · c1m2_streets`
No HTMX on the list page; page reload picks up the latest snapshot.
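The four cell states reduce to a small pure function. `live_state_badge` is hypothetical, and `latest` stands for the newest `server_live_state` row as a mapping (or `None` when there is none).

```python
from datetime import datetime, timedelta
from typing import Any, Mapping, Optional

STOPPED = "\u2014"  # em dash shown for stopped servers


def live_state_badge(
    running: bool,
    latest: Optional[Mapping[str, Any]],
    now: datetime,
    stale_after: timedelta = timedelta(seconds=30),
) -> str:
    """Render the server-list cell from the newest server_live_state row."""
    if not running:
        return STOPPED
    if latest is None or latest["last_seen_at"] < now - stale_after:
        return "?"  # the template adds the dim style and the "no data" tooltip
    count = f'{latest["players"]}/{latest["max_players"]}'
    if latest["hibernating"]:
        return f"{count} · idle · {latest['map']}"
    return f"{count} · {latest['map']}"
```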
### Server detail (`l4d2web/templates/server_detail.html`)
New section, HTMX-refreshed every `LIVE_STATE_POLL_SECONDS` (default 5):
```html
```
The partial renders three blocks:
1. **Summary**: `players/max_players · map · idle?` plus a small "polled Ns ago" caption.
2. **Current players** (only if non-empty): grid of cards, each showing the avatar `<img>` followed by `{{ profile.persona_name or session.name_at_join }} · {{ joined_relative }} · ping {{ min }}-{{ max }}ms`.
3. **Recent players** (last 30 days, excluding current; only if non-empty): smaller cards, `{{ avatar }} {{ persona_name or name_at_join }} · last seen {{ last_seen_relative }}`.
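The "polled Ns ago", `joined_relative` and `last_seen_relative` captions could share one coarse formatter, registered as a Jinja filter (for example with Flask's `app.add_template_filter(ago)`). `ago` is a hypothetical name and takes an age in seconds; the unit cut-offs are an assumption.

```python
def ago(seconds: float) -> str:
    """Coarse relative time: '4s ago', '2m ago', '3h ago', '2d ago'."""
    s = max(0, int(seconds))  # clamp clock skew to "0s ago"
    if s < 60:
        return f"{s}s ago"
    if s < 3600:
        return f"{s // 60}m ago"
    if s < 86400:
        return f"{s // 3600}h ago"
    return f"{s // 86400}d ago"
```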
New route: `GET /servers//live-state` returns the partial. Composition mirrors the existing build-status pattern at `l4d2web/templates/_overlay_build_status.html:1-5`.
Avatar `<img>` tags point straight at Steam CDN URLs (`avatars.cloudflare.steamstatic.com` / `avatars.akamai.steamstatic.com`). No proxying, the same approach as `WorkshopItem.preview_url`. Note: confirm the existing CSP allows these hosts; if not, extend it.
No JS framework added — HTMX only.
---
## Config keys
In `l4d2web/config.py`, plus documented defaults in `deploy/templates/etc/left4me/web.env` where applicable:
| key | default | purpose |
|---|---|---|
| `LIVE_STATE_POLL_SECONDS` | `5` | poll interval |
| `LIVE_STATE_QUERY_TIMEOUT_SECONDS` | `2.0` | per-RCON-query timeout |
| `LIVE_STATE_POLL_WORKERS` | `4` | thread-pool size for parallel per-server polls |
| `LIVE_STATE_STALE_SECONDS` | `30` | UI staleness threshold |
| `LIVE_STATE_HISTORY_DAYS` | `30` | retention for snapshots + closed sessions |
| `STUCK_SESSION_SECONDS` | `60` | close open sessions whose server has been unreachable for this long |
| `STEAM_PROFILE_TTL_SECONDS` | `86400` | profile cache TTL |
| `STEAM_WEB_API_KEY` | `""` | from `web.env`; empty disables enrichment |
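How `l4d2web/config.py` actually reads settings isn't shown here, so this is only a standalone sketch of parsing the eight keys from an environment mapping with the defaults above; `LiveStateConfig` and `load_live_state_config` are hypothetical names. Blank values fall back to the default, and an empty `STEAM_WEB_API_KEY` disables enrichment.

```python
import os
from dataclasses import dataclass
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class LiveStateConfig:
    poll_seconds: float
    query_timeout_seconds: float
    poll_workers: int
    stale_seconds: int
    history_days: int
    stuck_session_seconds: int
    steam_profile_ttl_seconds: int
    steam_web_api_key: str

    @property
    def enrichment_enabled(self) -> bool:
        return bool(self.steam_web_api_key)


def load_live_state_config(env: Mapping[str, str] = os.environ) -> LiveStateConfig:
    def get(key: str, default: T, cast: Callable[[str], T]) -> T:
        raw = env.get(key, "").strip()
        return cast(raw) if raw else default  # blank or missing -> default

    return LiveStateConfig(
        poll_seconds=get("LIVE_STATE_POLL_SECONDS", 5.0, float),
        query_timeout_seconds=get("LIVE_STATE_QUERY_TIMEOUT_SECONDS", 2.0, float),
        poll_workers=get("LIVE_STATE_POLL_WORKERS", 4, int),
        stale_seconds=get("LIVE_STATE_STALE_SECONDS", 30, int),
        history_days=get("LIVE_STATE_HISTORY_DAYS", 30, int),
        stuck_session_seconds=get("STUCK_SESSION_SECONDS", 60, int),
        steam_profile_ttl_seconds=get("STEAM_PROFILE_TTL_SECONDS", 86400, int),
        steam_web_api_key=env.get("STEAM_WEB_API_KEY", "").strip(),
    )
```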
---
## Tests
- `l4d2web/tests/test_rcon.py` — protocol handshake against an in-process TCP fixture: auth-success, auth-failure (`req_id == -1`), header parse (incl. `(hibernating)` and `(reserved )` variants), roster parse (incl. the two-numeric-prefix L4D2 variant), Steam ID conversion.
- `l4d2web/tests/test_steam_users.py` — request shape (key in querystring, batched ids, 100-per-call ceiling), response parsing, partial response (some IDs missing).
- `l4d2web/tests/test_live_state_poller.py` — mirror `test_state_poller_*` at `l4d2web/tests/test_job_worker.py:882-952`. Cover: iterates only running servers with non-empty `rcon_password`, RLE upsert (matching state → `last_seen_at` bump only; differing state → new row), session open with backfilled `joined_at`, session close on disappearance, ping range expansion, stuck-session close after N failures, drops auth failures silently, respects retention.
- `l4d2web/tests/test_server_routes.py` (extend) — `/servers//live-state` fragment route renders summary/current/recent blocks correctly; stale rendering when latest snapshot is old; soft-fail rendering when no profile cached.
- `l4d2web/tests/test_l4d2_facade.py` (extend) — `build_server_spec_payload` appends `rcon_password "..."` as the last config line when password is set; omits the line when empty; appears after both the overlay `exec` lines and the blueprint config lines.
- Migration test — existing rows backfilled with non-empty 43-char passwords; tables created with correct indexes.
---
## Critical files
**New:**
- `l4d2web/services/rcon.py` — Source RCON client + status parser
- `l4d2web/services/steam_users.py` — Steam Web API client (mirrors `steam_workshop.py`)
- `l4d2web/services/live_state_poller.py` — background thread + poll loop + session reconciler
- `l4d2web/alembic/versions/00XX_server_live_state.py` — migration: new column, three new tables, password backfill
- `l4d2web/templates/_live_state.html` — HTMX-refreshed fragment (summary + current + recent)
- `l4d2web/tests/test_rcon.py`, `l4d2web/tests/test_steam_users.py`, `l4d2web/tests/test_live_state_poller.py`
**Modify:**
- `l4d2web/models.py` — add `ServerLiveState`, `ServerPlayerSession`, `SteamUserProfile`; add `rcon_password` to `Server` (after line 137)
- `l4d2web/services/l4d2_facade.py:28-52` — `build_server_spec_payload` appends `rcon_password "..."` as the last config line when set
- `l4d2web/app.py` — call `start_live_state_poller(app)` next to existing `start_state_poller`
- `l4d2web/routes/server_routes.py` (or equivalent — confirm) — generate `rcon_password` in create handler; add `GET /servers//live-state`
- `l4d2web/templates/server_detail.html` — include `_live_state.html`
- `l4d2web/templates/.html` — confirm filename; add inline badge column
- `l4d2web/config.py` — register the eight new config keys
- `deploy/templates/etc/left4me/web.env` — add `STEAM_WEB_API_KEY=` and any tunables we expose
**Reused without changes:**
- `l4d2web/services/job_worker.py:617-647` — daemon-thread / poll-loop pattern reference
- `l4d2web/services/steam_workshop.py:17-43` — `requests.Session` + form-POST pattern for Steam Web API
- `l4d2host/instances.py:40-58` — already writes `spec.config` verbatim, so no host-side change for password injection
- `l4d2web/templates/_overlay_build_status.html` — HTMX polling pattern reference
---
## Verification
1. **Unit tests**:
```
pytest l4d2web/tests/test_rcon.py l4d2web/tests/test_steam_users.py l4d2web/tests/test_live_state_poller.py -v
pytest l4d2web/tests -q # full regression
```
2. **Migration check**:
```
alembic upgrade head
sqlite3 l4d2web.db "SELECT id, name, length(rcon_password) FROM servers;" # every row 43
sqlite3 l4d2web.db ".schema server_live_state server_player_session steam_user_profile"
```
3. **End-to-end against prod** (`left4.me`):
- Deploy. Confirm `systemctl status left4me-web.service` shows no crash-loop and the journal logs `start_live_state_poller` once.
- Restart both existing game servers so they pick up the injected password.
- SQL sanity (web-host shell):
```
sqlite3 l4d2web.db "SELECT server_id, started_at, last_seen_at, players, map, hibernating
FROM server_live_state ORDER BY server_id, started_at DESC LIMIT 10;"
```
Expect a single recent row per server while idle; new rows when players come/go.
- Connect to one server from the L4D2 client; within 5s, `/servers/` shows a card with your avatar + persona name + ping range. Disconnect; within 5s the card moves to "recent."
- `sqlite3 l4d2web.db "SELECT * FROM server_player_session WHERE left_at IS NULL;"` — empty when nobody's connected; one row per current player when someone is.
- `sqlite3 l4d2web.db "SELECT count(*), MIN(fetched_at), MAX(fetched_at) FROM steam_user_profile;"` — at least one row after a player has been resolved.
4. **Failure-path checks**:
- Manually corrupt `servers.rcon_password` for one server; confirm the journal logs auth failure and the row's badge goes stale within `LIVE_STATE_STALE_SECONDS`; other servers unaffected.
- Unset `STEAM_WEB_API_KEY` in `web.env`, restart web; confirm display still works (in-game names + placeholder avatars), no errors in journal.
- Use `nft` to drop loopback TCP to one server's port; confirm rows stop appearing, open sessions close after `STUCK_SESSION_SECONDS`, and the badge goes stale.
---
## Open implementation questions
- **Server-list template filename**: confirm with `ls l4d2web/templates/` once implementation starts.
- **Server-create route location**: confirm path (likely `l4d2web/routes/server_routes.py`).
- **CSP allowlist for Steam avatar CDNs**: check `l4d2web/app.py` (or wherever security headers live) — extend `img-src` to include `avatars.cloudflare.steamstatic.com`, `avatars.akamai.steamstatic.com`, `avatars.steamstatic.com` if a CSP is enforced.
- **Adaptive backoff** for hibernating servers: defer; start with fixed 5s and revisit only if load becomes a concern (which it won't at current server count).
- **Migration data step**: SQLite alembic batch operation with a Python data step that iterates rows and generates `secrets.token_urlsafe(32)` per row — confirm pattern against existing migrations under `l4d2web/alembic/versions/`.
---
## Deferred to a separate plan
- Generic RCON command execution (`changelevel`, `kick`, `say`, `sm_ban`, ...)
- Web UI buttons mapped to those commands with CSRF + admin authz
- Audit log table for issued commands
- Player-count history graphs (data already accumulating from this plan)
- Ban UX (lookup by Steam ID, search across `server_player_session`)