plan(server-log-current-invocation): scope server log to last unit start

Today the Server Log panel shows the last 200 lines of the unit's entire
journal — mixing the current run with leftovers from prior starts. Filter
on systemd's per-(re)start InvocationID so the panel begins at the most
recent start, idles with keepalives when the unit has never run, and
force-disconnects on restart so the SSE client reconnects to the new run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
mwiegand 2026-05-15 22:31:53 +02:00
parent 49992b3a26
commit 2adf42655e
No known key found for this signature in database

View file

@ -0,0 +1,198 @@
# Plan: scope Server Log to the current unit invocation
## Context
Today the Server Log panel on `/servers/<id>` shows the last 200 lines of the
unit's **entire** journal — i.e. across every prior `start` / `stop` /
`reset` cycle — then follows. That means a freshly-started server can mix
lines from the current boot with leftovers from yesterday, which makes the
log harder to reason about. The user wants the panel to begin at the most
recent unit start.
The right systemd primitive is `_SYSTEMD_INVOCATION_ID`: systemd assigns a
fresh 128-bit ID to every (re)start of a unit, queryable via
`systemctl show -p InvocationID --value <unit>`. Filtering
`journalctl _SYSTEMD_INVOCATION_ID=<id>` gives exactly that one run.
User decisions (already confirmed):
- **Scope** — always last invocation; no toggle, no historical view.
- **Empty case** (unit has never been started) — SSE stays open with
keepalives, yields no data lines, attaches when first invocation appears.
- **Mid-stream restart** — backend force-disconnects when the InvocationID
changes. `EventSource` reconnects on its own and the next request picks
up the new run.
## Architecture
Three layers, smallest blast radius first:
```
browser ─SSE─► l4d2web routes/log_routes.py
l4d2web services/l4d2_facade.stream_server_logs
│ (l4d2ctl logs <unit> --lines N --follow)
l4d2host CLI logs → service_control.stream_journal
┌────────────┴──────────────────┐
▼ ▼
sudo left4me-systemctl show sudo left4me-journalctl
-p ActiveState <unit> --invocation-id <hex32>
-p SubState --lines N --follow|--no-follow
-p InvocationID ←NEW
```
`l4d2host.service_control.stream_journal` becomes the orchestrator:
1. Resolve `InvocationID` via `show_service` (already returns a parsed
`key=value` dict in `status.py:32-37` — adding a property is harmless).
2. **Empty `InvocationID`** (unit never ran):
- `follow=False` → return `iter(())`.
- `follow=True` → yield `""` every ~10 s as a keepalive nudge; poll
`InvocationID` every ~3 s; once it appears, fall through to step 3.
3. **Non-empty `InvocationID`**`Popen` the journalctl helper with
`--invocation-id <id>`. Start a daemon thread that re-reads the unit's
`InvocationID` every ~5 s; if it changes, call `proc.terminate()`.
The generator's normal end-of-stream path then closes the SSE response,
the browser's `EventSource` reconnects, and the next call picks up the
new ID.
`lines` cap is preserved (journalctl `-n N` inside the invocation), so a
long-running server doesn't dump tens of thousands of lines on page-load.
## Files to change
### Helpers (`deploy/scripts/libexec/`)
- **`left4me-systemctl`** — extend `show` to also request
`--property=InvocationID`. One-line change at `:43`:
```sh
show) exec "$systemctl" show \
--property=ActiveState \
--property=SubState \
--property=InvocationID \
"$unit" ;;
```
- **`left4me-journalctl`** — replace the unit-based filter with an
invocation-id-based one. New CLI signature:
```
left4me-journalctl <name> --invocation-id <hex32> --lines <n> --follow|--no-follow
```
Validate `<hex32>` against `^[0-9a-f]{32}$` (32 lowercase hex chars), same
defensive style as the existing name validation. Exec:
```sh
exec "$journalctl" \
_SYSTEMD_INVOCATION_ID="$invocation_id" \
-n "$lines" -o cat $follow_arg
```
No `-u <unit>` — the invocation ID is globally unique, the predicate is
enough. Old `<name> --lines <n> --follow` shape is removed (no callers
remain after the host layer change).
### Host (`l4d2host/`)
- **`l4d2host/service_control.py`**
- `journalctl_command(name, *, invocation_id, lines, follow)` → builds
the new arg list. Drop the `-u`-based form.
- `stream_journal(name, *, lines=200, follow=True)` → orchestrator from
Architecture step 1-3 above. Helpers in this file:
- `get_invocation_id(name) -> str` (parses `show_service` output;
returns `""` if unset).
- `_stream_with_restart_guard(invocation_id, lines, follow)`
`Popen` + daemon poller thread.
- Keep `stream_command` as-is (still consumed by `host_commands`).
- **`l4d2host/logs.py`** — no signature change; just forwards.
- **`l4d2host/cli.py`** — `logs` command keeps `--lines`/`--follow` flags
unchanged. No CLI-surface break.
### Web (`l4d2web/`)
- **`services/l4d2_facade.py`** — `stream_server_logs` keeps its signature.
The behavior change is fully inherited from the host layer.
- **`routes/log_routes.py`** — unchanged. The existing keepalive logic at
`:33` (`if line == "": yield ": keepalive\n\n"`) already handles the
empty-line nudges the host yields during the idle wait.
### Tests
- **`l4d2host/tests/test_logs.py`**
- Update `test_stream_instance_logs_uses_journalctl_helper` for the new
arg shape: `["sudo", "-n", "/usr/local/libexec/left4me/left4me-journalctl", "alpha", "--invocation-id", "<32hex>", "--lines", "25", "--no-follow"]`.
Stub out `get_invocation_id` to return a known ID.
- Add: empty InvocationID + `follow=False` → empty iterator (no
journalctl call).
- Add: empty InvocationID + `follow=True` → yields `""` then the next
`get_invocation_id` returns a real ID and the journalctl helper is
called once.
- Add: invocation changes mid-stream → poller calls `proc.terminate()`.
- **`l4d2host/tests/test_cli.py`** — `test_logs_command_streams_lines`:
update expected helper invocation, or stub at the `stream_instance_logs`
level (it's already monkeypatched in similar tests).
- **`deploy/scripts/tests/test_journalctl_helper.py`** — update existing
shell-syntax & argument-validation test for the new CLI signature.
Assert rejection of malformed invocation IDs (too short, non-hex,
uppercase, embedded slash).
- **`l4d2web/tests/test_status_and_server_logs.py`** — should pass
unchanged (the SSE shape and route surface haven't moved).
## Critical files
- `deploy/scripts/libexec/left4me-systemctl` (extend `show`)
- `deploy/scripts/libexec/left4me-journalctl` (rewrite CLI shape)
- `l4d2host/l4d2host/service_control.py` (orchestrator + helpers)
- `l4d2host/tests/test_logs.py`, `l4d2host/tests/test_cli.py`
- `deploy/scripts/tests/test_journalctl_helper.py`
- (No changes:) `l4d2web/services/l4d2_facade.py`, `l4d2web/routes/log_routes.py`
## Why not alternative approaches
- **`--since <ActiveEnterTimestamp>`** — works in the happy path but is
fragile to clock skew, system suspend, and units that restart inside
the same second. `_SYSTEMD_INVOCATION_ID` was added to systemd
specifically for this filter.
- **String-match the systemd `Started …` marker** — locale-dependent,
breaks with systemd-message changes, can't survive `Restart=`.
- **Toggle in the UI** — user explicitly opted out; YAGNI.
## Verification
1. **Unit tests** (sandboxed):
```
uv run --package l4d2host pytest l4d2host/tests/
uv run --package l4d2web pytest l4d2web/tests/
uv run pytest deploy/scripts/tests/ deploy/tests/
```
2. **Manual on the host** (`ckn@10.0.4.128`):
```
# the unit is running
l4d2ctl logs vanilla --no-follow | head -5
# → first lines should be from this run's start, not yesterday
# never-started case (pick an unstarted server, or stop first)
l4d2ctl stop vanilla && l4d2ctl logs vanilla --no-follow
# → empty output, exit 0
```
3. **End-to-end in browser**:
- Open `/servers/1`. Confirm log starts at this run's first line.
- Click Stop. Stream goes quiet. Click Start. SSE auto-reconnects and
shows the new run from line one.
- Open a fresh server that has never been started: log panel is empty
but connection is alive; clicking Start makes log appear within seconds.