left4me/docs/superpowers/plans/2026-05-15-server-log-current-invocation.md
mwiegand 2adf42655e
plan(server-log-current-invocation): scope server log to last unit start
Today the Server Log panel shows the last 200 lines of the unit's entire
journal — mixing the current run with leftovers from prior starts. Filter
on systemd's per-(re)start InvocationID so the panel begins at the most
recent start, idles with keepalives when the unit has never run, and
force-disconnects on restart so the SSE client reconnects to the new run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 22:31:53 +02:00

8 KiB

Plan: scope Server Log to the current unit invocation

Context

Today the Server Log panel on /servers/<id> shows the last 200 lines of the unit's entire journal — i.e. across every prior start / stop / reset cycle — then follows. That means a freshly-started server can mix lines from the current boot with leftovers from yesterday, which makes the log harder to reason about. The user wants the panel to begin at the most recent unit start.

The right systemd primitive is _SYSTEMD_INVOCATION_ID: systemd assigns a fresh 128-bit ID to every (re)start of a unit, queryable via systemctl show -p InvocationID --value <unit>. Filtering journalctl _SYSTEMD_INVOCATION_ID=<id> gives exactly that one run.

User decisions (already confirmed):

  • Scope — always last invocation; no toggle, no historical view.
  • Empty case (unit has never been started) — SSE stays open with keepalives, yields no data lines, attaches when first invocation appears.
  • Mid-stream restart — backend force-disconnects when the InvocationID changes. EventSource reconnects on its own and the next request picks up the new run.

Architecture

Three layers, smallest blast radius first:

browser  ─SSE─►  l4d2web routes/log_routes.py
                     │
                     ▼
                 l4d2web services/l4d2_facade.stream_server_logs
                     │  (l4d2ctl logs <unit> --lines N --follow)
                     ▼
                 l4d2host CLI logs → service_control.stream_journal
                     │
        ┌────────────┴──────────────────┐
        ▼                               ▼
sudo left4me-systemctl show      sudo left4me-journalctl
  -p ActiveState                  <unit> --invocation-id <hex32>
  -p SubState                       --lines N --follow|--no-follow
  -p InvocationID  ←NEW

l4d2host.service_control.stream_journal becomes the orchestrator:

  1. Resolve InvocationID via show_service (already returns a parsed key=value dict in status.py:32-37 — adding a property is harmless).
  2. Empty InvocationID (unit never ran):
    • follow=False → return iter(()).
    • follow=True → yield "" every ~10 s as a keepalive nudge; poll InvocationID every ~3 s; once it appears, fall through to step 3.
  3. Non-empty InvocationIDPopen the journalctl helper with --invocation-id <id>. Start a daemon thread that re-reads the unit's InvocationID every ~5 s; if it changes, call proc.terminate(). The generator's normal end-of-stream path then closes the SSE response, the browser's EventSource reconnects, and the next call picks up the new ID.

lines cap is preserved (journalctl -n N inside the invocation), so a long-running server doesn't dump tens of thousands of lines on page-load.

Files to change

Helpers (deploy/scripts/libexec/)

  • left4me-systemctl — extend show to also request --property=InvocationID. One-line change at :43:

    show) exec "$systemctl" show \
              --property=ActiveState \
              --property=SubState \
              --property=InvocationID \
              "$unit" ;;
    
  • left4me-journalctl — replace the unit-based filter with an invocation-id-based one. New CLI signature:

    left4me-journalctl <name> --invocation-id <hex32> --lines <n> --follow|--no-follow
    

    Validate <hex32> against ^[0-9a-f]{32}$ (32 lowercase hex chars), same defensive style as the existing name validation. Exec:

    exec "$journalctl" \
         _SYSTEMD_INVOCATION_ID="$invocation_id" \
         -n "$lines" -o cat $follow_arg
    

    No -u <unit> — the invocation ID is globally unique, the predicate is enough. Old <name> --lines <n> --follow shape is removed (no callers remain after the host layer change).

Host (l4d2host/)

  • l4d2host/service_control.py

    • journalctl_command(name, *, invocation_id, lines, follow) → builds the new arg list. Drop the -u-based form.
    • stream_journal(name, *, lines=200, follow=True) → orchestrator from Architecture step 1-3 above. Helpers in this file:
      • get_invocation_id(name) -> str (parses show_service output; returns "" if unset).
      • _stream_with_restart_guard(invocation_id, lines, follow)Popen + daemon poller thread.
    • Keep stream_command as-is (still consumed by host_commands).
  • l4d2host/logs.py — no signature change; just forwards.

  • l4d2host/cli.pylogs command keeps --lines/--follow flags unchanged. No CLI-surface break.

Web (l4d2web/)

  • services/l4d2_facade.pystream_server_logs keeps its signature. The behavior change is fully inherited from the host layer.

  • routes/log_routes.py — unchanged. The existing keepalive logic at :33 (if line == "": yield ": keepalive\n\n") already handles the empty-line nudges the host yields during the idle wait.

Tests

  • l4d2host/tests/test_logs.py

    • Update test_stream_instance_logs_uses_journalctl_helper for the new arg shape: ["sudo", "-n", "/usr/local/libexec/left4me/left4me-journalctl", "alpha", "--invocation-id", "<32hex>", "--lines", "25", "--no-follow"]. Stub out get_invocation_id to return a known ID.
    • Add: empty InvocationID + follow=False → empty iterator (no journalctl call).
    • Add: empty InvocationID + follow=True → yields "" then the next get_invocation_id returns a real ID and the journalctl helper is called once.
    • Add: invocation changes mid-stream → poller calls proc.terminate().
  • l4d2host/tests/test_cli.pytest_logs_command_streams_lines: update expected helper invocation, or stub at the stream_instance_logs level (it's already monkeypatched in similar tests).

  • deploy/scripts/tests/test_journalctl_helper.py — update existing shell-syntax & argument-validation test for the new CLI signature. Assert rejection of malformed invocation IDs (too short, non-hex, uppercase, embedded slash).

  • l4d2web/tests/test_status_and_server_logs.py — should pass unchanged (the SSE shape and route surface haven't moved).

Critical files

  • deploy/scripts/libexec/left4me-systemctl (extend show)
  • deploy/scripts/libexec/left4me-journalctl (rewrite CLI shape)
  • l4d2host/l4d2host/service_control.py (orchestrator + helpers)
  • l4d2host/tests/test_logs.py, l4d2host/tests/test_cli.py
  • deploy/scripts/tests/test_journalctl_helper.py
  • (No changes:) l4d2web/services/l4d2_facade.py, l4d2web/routes/log_routes.py

Why not alternative approaches

  • --since <ActiveEnterTimestamp> — works in the happy path but is fragile to clock skew, system suspend, and units that restart inside the same second. _SYSTEMD_INVOCATION_ID was added to systemd specifically for this filter.
  • String-match the systemd Started … marker — locale-dependent, breaks with systemd-message changes, can't survive Restart=.
  • Toggle in the UI — user explicitly opted out; YAGNI.

Verification

  1. Unit tests (sandboxed):

    uv run --package l4d2host pytest l4d2host/tests/
    uv run --package l4d2web pytest l4d2web/tests/
    uv run pytest deploy/scripts/tests/ deploy/tests/
    
  2. Manual on the host (ckn@10.0.4.128):

    # the unit is running
    l4d2ctl logs vanilla --no-follow | head -5
    # → first lines should be from this run's start, not yesterday
    
    # never-started case (pick an unstarted server, or stop first)
    l4d2ctl stop vanilla && l4d2ctl logs vanilla --no-follow
    # → empty output, exit 0
    
  3. End-to-end in browser:

    • Open /servers/1. Confirm log starts at this run's first line.
    • Click Stop. Stream goes quiet. Click Start. SSE auto-reconnects and shows the new run from line one.
    • Open a fresh server that has never been started: log panel is empty but connection is alive; clicking Start makes log appear within seconds.