plan(server-log-current-invocation): scope server log to last unit start
Today the Server Log panel shows the last 200 lines of the unit's entire journal — mixing the current run with leftovers from prior starts. Filter on systemd's per-(re)start InvocationID so the panel begins at the most recent start, idles with keepalives when the unit has never run, and force-disconnects on restart so the SSE client reconnects to the new run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
49992b3a26
commit
2adf42655e
1 changed files with 198 additions and 0 deletions
|
|
@ -0,0 +1,198 @@
|
|||
# Plan: scope Server Log to the current unit invocation
|
||||
|
||||
## Context
|
||||
|
||||
Today the Server Log panel on `/servers/<id>` shows the last 200 lines of the
|
||||
unit's **entire** journal — i.e. across every prior `start` / `stop` /
|
||||
`reset` cycle — then follows. That means a freshly-started server can mix
|
||||
lines from the current boot with leftovers from yesterday, which makes the
|
||||
log harder to reason about. The user wants the panel to begin at the most
|
||||
recent unit start.
|
||||
|
||||
The right systemd primitive is `_SYSTEMD_INVOCATION_ID`: systemd assigns a
|
||||
fresh 128-bit ID to every (re)start of a unit, queryable via
|
||||
`systemctl show -p InvocationID --value <unit>`. Filtering
|
||||
`journalctl _SYSTEMD_INVOCATION_ID=<id>` gives exactly that one run.
|
||||
|
||||
User decisions (already confirmed):
|
||||
|
||||
- **Scope** — always last invocation; no toggle, no historical view.
|
||||
- **Empty case** (unit has never been started) — SSE stays open with
|
||||
keepalives, yields no data lines, attaches when first invocation appears.
|
||||
- **Mid-stream restart** — backend force-disconnects when the InvocationID
|
||||
changes. `EventSource` reconnects on its own and the next request picks
|
||||
up the new run.
|
||||
|
||||
## Architecture
|
||||
|
||||
Three layers, smallest blast radius first:
|
||||
|
||||
```
|
||||
browser ─SSE─► l4d2web routes/log_routes.py
|
||||
│
|
||||
▼
|
||||
l4d2web services/l4d2_facade.stream_server_logs
|
||||
│ (l4d2ctl logs <unit> --lines N --follow)
|
||||
▼
|
||||
l4d2host CLI logs → service_control.stream_journal
|
||||
│
|
||||
┌────────────┴──────────────────┐
|
||||
▼ ▼
|
||||
sudo left4me-systemctl show sudo left4me-journalctl
|
||||
-p ActiveState <unit> --invocation-id <hex32>
|
||||
-p SubState --lines N --follow|--no-follow
|
||||
-p InvocationID ←NEW
|
||||
```
|
||||
|
||||
`l4d2host.service_control.stream_journal` becomes the orchestrator:
|
||||
|
||||
1. Resolve `InvocationID` via `show_service` (already returns a parsed
|
||||
`key=value` dict in `status.py:32-37` — adding a property is harmless).
|
||||
2. **Empty `InvocationID`** (unit never ran):
|
||||
- `follow=False` → return `iter(())`.
|
||||
- `follow=True` → yield `""` every ~10 s as a keepalive nudge; poll
|
||||
`InvocationID` every ~3 s; once it appears, fall through to step 3.
|
||||
3. **Non-empty `InvocationID`** → `Popen` the journalctl helper with
|
||||
`--invocation-id <id>`. Start a daemon thread that re-reads the unit's
|
||||
`InvocationID` every ~5 s; if it changes, call `proc.terminate()`.
|
||||
The generator's normal end-of-stream path then closes the SSE response,
|
||||
the browser's `EventSource` reconnects, and the next call picks up the
|
||||
new ID.
|
||||
|
||||
`lines` cap is preserved (journalctl `-n N` inside the invocation), so a
|
||||
long-running server doesn't dump tens of thousands of lines on page-load.
|
||||
|
||||
## Files to change
|
||||
|
||||
### Helpers (`deploy/scripts/libexec/`)
|
||||
|
||||
- **`left4me-systemctl`** — extend `show` to also request
|
||||
`--property=InvocationID`. One-line change at `:43`:
|
||||
|
||||
```sh
|
||||
show) exec "$systemctl" show \
|
||||
--property=ActiveState \
|
||||
--property=SubState \
|
||||
--property=InvocationID \
|
||||
"$unit" ;;
|
||||
```
|
||||
|
||||
- **`left4me-journalctl`** — replace the unit-based filter with an
|
||||
invocation-id-based one. New CLI signature:
|
||||
|
||||
```
|
||||
left4me-journalctl <name> --invocation-id <hex32> --lines <n> --follow|--no-follow
|
||||
```
|
||||
|
||||
Validate `<hex32>` against `^[0-9a-f]{32}$` (32 lowercase hex chars), same
|
||||
defensive style as the existing name validation. Exec:
|
||||
|
||||
```sh
|
||||
exec "$journalctl" \
|
||||
_SYSTEMD_INVOCATION_ID="$invocation_id" \
|
||||
-n "$lines" -o cat $follow_arg
|
||||
```
|
||||
|
||||
No `-u <unit>` — the invocation ID is globally unique, the predicate is
|
||||
enough. Old `<name> --lines <n> --follow` shape is removed (no callers
|
||||
remain after the host layer change).
|
||||
|
||||
### Host (`l4d2host/`)
|
||||
|
||||
- **`l4d2host/service_control.py`**
|
||||
- `journalctl_command(name, *, invocation_id, lines, follow)` → builds
|
||||
the new arg list. Drop the `-u`-based form.
|
||||
- `stream_journal(name, *, lines=200, follow=True)` → orchestrator from
|
||||
Architecture step 1-3 above. Helpers in this file:
|
||||
- `get_invocation_id(name) -> str` (parses `show_service` output;
|
||||
returns `""` if unset).
|
||||
- `_stream_with_restart_guard(invocation_id, lines, follow)` —
|
||||
`Popen` + daemon poller thread.
|
||||
- Keep `stream_command` as-is (still consumed by `host_commands`).
|
||||
|
||||
- **`l4d2host/logs.py`** — no signature change; just forwards.
|
||||
|
||||
- **`l4d2host/cli.py`** — `logs` command keeps `--lines`/`--follow` flags
|
||||
unchanged. No CLI-surface break.
|
||||
|
||||
### Web (`l4d2web/`)
|
||||
|
||||
- **`services/l4d2_facade.py`** — `stream_server_logs` keeps its signature.
|
||||
The behavior change is fully inherited from the host layer.
|
||||
|
||||
- **`routes/log_routes.py`** — unchanged. The existing keepalive logic at
|
||||
`:33` (`if line == "": yield ": keepalive\n\n"`) already handles the
|
||||
empty-line nudges the host yields during the idle wait.
|
||||
|
||||
### Tests
|
||||
|
||||
- **`l4d2host/tests/test_logs.py`**
|
||||
- Update `test_stream_instance_logs_uses_journalctl_helper` for the new
|
||||
arg shape: `["sudo", "-n", "/usr/local/libexec/left4me/left4me-journalctl", "alpha", "--invocation-id", "<32hex>", "--lines", "25", "--no-follow"]`.
|
||||
Stub out `get_invocation_id` to return a known ID.
|
||||
- Add: empty InvocationID + `follow=False` → empty iterator (no
|
||||
journalctl call).
|
||||
- Add: empty InvocationID + `follow=True` → yields `""` then the next
|
||||
`get_invocation_id` returns a real ID and the journalctl helper is
|
||||
called once.
|
||||
- Add: invocation changes mid-stream → poller calls `proc.terminate()`.
|
||||
|
||||
- **`l4d2host/tests/test_cli.py`** — `test_logs_command_streams_lines`:
|
||||
update expected helper invocation, or stub at the `stream_instance_logs`
|
||||
level (it's already monkeypatched in similar tests).
|
||||
|
||||
- **`deploy/scripts/tests/test_journalctl_helper.py`** — update existing
|
||||
shell-syntax & argument-validation test for the new CLI signature.
|
||||
Assert rejection of malformed invocation IDs (too short, non-hex,
|
||||
uppercase, embedded slash).
|
||||
|
||||
- **`l4d2web/tests/test_status_and_server_logs.py`** — should pass
|
||||
unchanged (the SSE shape and route surface haven't moved).
|
||||
|
||||
## Critical files
|
||||
|
||||
- `deploy/scripts/libexec/left4me-systemctl` (extend `show`)
|
||||
- `deploy/scripts/libexec/left4me-journalctl` (rewrite CLI shape)
|
||||
- `l4d2host/l4d2host/service_control.py` (orchestrator + helpers)
|
||||
- `l4d2host/tests/test_logs.py`, `l4d2host/tests/test_cli.py`
|
||||
- `deploy/scripts/tests/test_journalctl_helper.py`
|
||||
- (No changes:) `l4d2web/services/l4d2_facade.py`, `l4d2web/routes/log_routes.py`
|
||||
|
||||
## Why not alternative approaches
|
||||
|
||||
- **`--since <ActiveEnterTimestamp>`** — works in the happy path but is
|
||||
fragile to clock skew, system suspend, and units that restart inside
|
||||
the same second. `_SYSTEMD_INVOCATION_ID` was added to systemd
|
||||
specifically for this filter.
|
||||
- **String-match the systemd `Started …` marker** — locale-dependent,
|
||||
breaks with systemd-message changes, can't survive `Restart=`.
|
||||
- **Toggle in the UI** — user explicitly opted out; YAGNI.
|
||||
|
||||
## Verification
|
||||
|
||||
1. **Unit tests** (sandboxed):
|
||||
|
||||
```
|
||||
uv run --package l4d2host pytest l4d2host/tests/
|
||||
uv run --package l4d2web pytest l4d2web/tests/
|
||||
uv run pytest deploy/scripts/tests/ deploy/tests/
|
||||
```
|
||||
|
||||
2. **Manual on the host** (`ckn@10.0.4.128`):
|
||||
|
||||
```
|
||||
# the unit is running
|
||||
l4d2ctl logs vanilla --no-follow | head -5
|
||||
# → first lines should be from this run's start, not yesterday
|
||||
|
||||
# never-started case (pick an unstarted server, or stop first)
|
||||
l4d2ctl stop vanilla && l4d2ctl logs vanilla --no-follow
|
||||
# → empty output, exit 0
|
||||
```
|
||||
|
||||
3. **End-to-end in browser**:
|
||||
- Open `/servers/1`. Confirm log starts at this run's first line.
|
||||
- Click Stop. Stream goes quiet. Click Start. SSE auto-reconnects and
|
||||
shows the new run from line one.
|
||||
- Open a fresh server that has never been started: log panel is empty
|
||||
but connection is alive; clicking Start makes log appear within seconds.
|
||||
Loading…
Reference in a new issue