# Host Lifecycle Step Logging Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Provide granular, live progress feedback in the web GUI during lifecycle operations without exposing sensitive internal paths or raw subprocess mechanics.

**Architecture:** Modify `l4d2host/instances.py` to emit safe, high-level step markers through an explicit helper function, which hands each line to the `on_stdout` callback when one is supplied and otherwise prints it to `stdout` when passthrough is enabled. `l4d2web/services/job_worker.py` captures these lines automatically and forwards them to the UI via the existing SSE endpoints. The job worker also adds boundary markers around top-level operations.

**Tech Stack:** Python, subprocess, pytest

---
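To make the forwarding path in the architecture concrete, the sketch below shows one way a captured log line could be framed as a Server-Sent Events message. The function name `format_sse_event` and the use of the `id` and `event` fields are assumptions for illustration only; the project's existing SSE endpoints may frame lines differently.

```python
def format_sse_event(seq: int, stream: str, text: str) -> str:
    """Frame one log line as an SSE message (hypothetical helper).

    SSE messages are newline-delimited "field: value" pairs and end with a
    blank line. Multi-line payloads need one "data:" field per line.
    """
    data_fields = "".join(f"data: {part}\n" for part in text.split("\n"))
    return f"id: {seq}\nevent: {stream}\n{data_fields}\n"


# Example: a step marker as it might travel to the browser.
message = format_sse_event(2, "stdout", "Step: creating instance directories...")
```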
### Task 1: Add step logging helper to host library

**Files:**

- Modify: `l4d2host/instances.py`
- Modify: `l4d2host/tests/test_initialize.py`

- [ ] **Step 1: Write the failing tests for the helper behavior**

In `l4d2host/tests/test_initialize.py`, add the imports and three tests verifying that `_emit_step` calls the callback when one is given, prints to `sys.stdout` when only `passthrough` is set, and stays silent otherwise:
```python
import sys
from io import StringIO

from l4d2host.instances import _emit_step


def test_emit_step_uses_callback() -> None:
    calls: list[str] = []
    _emit_step("test step", on_stdout=calls.append, passthrough=False)
    assert calls == ["Step: test step"]


def test_emit_step_uses_passthrough_stdout(monkeypatch) -> None:
    fake_out = StringIO()
    monkeypatch.setattr(sys, "stdout", fake_out)
    _emit_step("passthrough step", on_stdout=None, passthrough=True)
    assert fake_out.getvalue() == "Step: passthrough step\n"


def test_emit_step_does_nothing_if_no_target(capsys) -> None:
    _emit_step("silent step", on_stdout=None, passthrough=False)
    # Without a callback or passthrough, nothing may reach stdout.
    assert capsys.readouterr().out == ""
```
- [ ] **Step 2: Run test to verify it fails**

Run: `pytest l4d2host/tests/test_initialize.py -k test_emit_step -q`
Expected: FAIL with an `ImportError` during collection, because `_emit_step` is not defined yet.

- [ ] **Step 3: Implement minimal code**

In `l4d2host/instances.py`, add the imports (if not already present) and define `_emit_step` near the top of the file:
```python
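import sys
from collections.abc import Callable  # skip if the module already imports it


def _emit_step(msg: str, on_stdout: Callable[[str], None] | None, passthrough: bool) -> None:
    # Prefer the callback; print to stdout only when passthrough is enabled.
    formatted = f"Step: {msg}"
    if on_stdout is not None:
        on_stdout(formatted)
    elif passthrough:
        print(formatted, flush=True)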
```
- [ ] **Step 4: Run test to verify it passes**

Run: `pytest l4d2host/tests/test_initialize.py -k test_emit_step -q`
Expected: PASS

- [ ] **Step 5: Commit**
```bash
git add l4d2host/instances.py l4d2host/tests/test_initialize.py
git commit -m "feat(host): add _emit_step helper for lifecycle logging"
```
---
### Task 2: Add step logging to initialize_instance

**Files:**

- Modify: `l4d2host/instances.py`
- Modify: `l4d2host/tests/test_initialize.py`

- [ ] **Step 1: Write the failing test**

In `l4d2host/tests/test_initialize.py`, add a test to verify that `initialize_instance` logs the expected steps in order:
```python
def test_initialize_instance_emits_steps(tmp_path: Path) -> None:
    spec = tmp_path / "spec.yaml"
    spec.write_text("port: 27015\noverlays: [standard]\n")
    steps: list[str] = []
    initialize_instance("alpha", spec, root=tmp_path, on_stdout=steps.append)
    assert steps == [
        "Step: creating instance directories...",
        "Step: writing instance.env...",
        "Step: writing server.cfg...",
        "Step: initialization complete.",
    ]
```
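The goal of this plan is that step lines stay high-level and never expose internal paths. A complementary check, sketched here with a hypothetical helper `unsafe_step_lines` that is not part of the plan, could be applied to any collected list of step lines:

```python
STEP_PREFIX = "Step: "


def unsafe_step_lines(lines: list[str], root: str) -> list[str]:
    """Return lines that lack the step prefix or mention the root path.

    Hypothetical helper for illustration; `root` is the instance root as text.
    """
    return [
        line
        for line in lines
        if not line.startswith(STEP_PREFIX) or root in line
    ]
```

In a test that collects `steps` with `root=tmp_path`, `assert unsafe_step_lines(steps, str(tmp_path)) == []` would guard against a future step message that interpolates a path.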
- [ ] **Step 2: Run test to verify it fails**

Run: `pytest l4d2host/tests/test_initialize.py -k test_initialize_instance_emits_steps -q`
Expected: FAIL because `steps` is empty.

- [ ] **Step 3: Add step logs to initialize_instance**

Modify `initialize_instance` in `l4d2host/instances.py` to include `_emit_step` calls, passing through its existing `on_stdout` and `passthrough` parameters:
```python
_emit_step("creating instance directories...", on_stdout, passthrough)
instance_dir = root / "instances" / name
runtime_dir = root / "runtime" / name
(runtime_dir / "upper").mkdir(parents=True, exist_ok=True)
(runtime_dir / "work").mkdir(parents=True, exist_ok=True)
(runtime_dir / "merged").mkdir(parents=True, exist_ok=True)
instance_dir.mkdir(parents=True, exist_ok=True)

lowerdirs = [str(overlay_path(overlay, root=root)) for overlay in spec.overlays]
lowerdirs.append(str(root / "installation"))

_emit_step("writing instance.env...", on_stdout, passthrough)
instance_env = "\n".join(
    [
        f"L4D2_PORT={spec.port}",
        f"L4D2_ARGS={' '.join(spec.arguments)}",
        f"L4D2_LOWERDIRS={':'.join(lowerdirs)}",
    ]
) + "\n"
(instance_dir / "instance.env").write_text(instance_env)

_emit_step("writing server.cfg...", on_stdout, passthrough)
server_cfg = "\n".join(spec.config) if spec.config else ""
(instance_dir / "server.cfg").write_text(server_cfg)

_emit_step("initialization complete.", on_stdout, passthrough)
```
- [ ] **Step 4: Run test to verify it passes**

Run: `pytest l4d2host/tests/test_initialize.py -k test_initialize_instance_emits_steps -q`
Expected: PASS

- [ ] **Step 5: Commit**
```bash
git add l4d2host/instances.py l4d2host/tests/test_initialize.py
git commit -m "feat(host): emit steps during initialize_instance"
```
---
### Task 3: Add step logging to start_instance, stop_instance, and delete_instance

**Files:**

- Modify: `l4d2host/instances.py`
- Modify: `l4d2web/tests/test_job_worker.py` (adjust mocks that assert on `on_stdout` output)

- [ ] **Step 1: Check test suite safety net for start/stop/delete**

There are no dedicated host unit tests for `start`/`stop`/`delete` because they depend on systemd and FUSE, but the web tests (`l4d2web/tests/test_job_worker.py` and `test_l4d2_facade.py`) mock them.
The `_emit_step` calls added to these functions must not break those existing tests.
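Although the real functions need systemd and FUSE, the ordering of step markers relative to the side-effecting calls can still be checked with recording fakes. The sketch below is a stand-in, not the project's code: `start_sequence` mirrors the order of operations planned for `start_instance` (with the file copy omitted), and its injected parameters are hypothetical. In the real suite, a similar effect could be had with `monkeypatch.setattr`, assuming `run_command` and `start_service` are module-level names in `l4d2host.instances`.

```python
from collections.abc import Callable


def start_sequence(
    run_command: Callable[[list[str]], None],
    start_service: Callable[[str], None],
    emit: Callable[[str], None],
    name: str,
) -> None:
    # Stand-in for start_instance: same step order, side effects injected.
    emit("Step: mounting runtime overlay...")
    run_command(["fuse-overlayfs"])  # real arguments omitted in this sketch
    emit("Step: copying server.cfg to runtime...")
    emit("Step: starting systemd service...")
    start_service(name)
    emit("Step: start complete.")


def record_start(name: str) -> list[str]:
    """Run the stand-in with fakes that log into one shared event list."""
    events: list[str] = []
    start_sequence(
        run_command=lambda argv: events.append(f"run: {argv[0]}"),
        start_service=lambda unit: events.append(f"service: {unit}"),
        emit=events.append,
        name=name,
    )
    return events
```

Because the fakes and the step callback append to the same list, a single equality assertion on `record_start("alpha")` covers both the step text and its position relative to each command.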
- [ ] **Step 2: Add step logs to start_instance**

In `l4d2host/instances.py`, modify `start_instance`:
```python
env = _load_instance_env(instance_dir / "instance.env")

_emit_step("mounting runtime overlay...", on_stdout, passthrough)
run_command(
    [
        "fuse-overlayfs",
        "-o",
        (
            f"lowerdir={env['L4D2_LOWERDIRS']},"
            f"upperdir={runtime_dir / 'upper'},"
            f"workdir={runtime_dir / 'work'}"
        ),
        str(runtime_dir / "merged"),
    ],
    on_stdout=on_stdout,
    on_stderr=on_stderr,
    passthrough=passthrough,
    should_cancel=should_cancel,
)

_emit_step("copying server.cfg to runtime...", on_stdout, passthrough)
target_cfg = runtime_dir / "merged" / "left4dead2" / "cfg" / "server.cfg"
target_cfg.parent.mkdir(parents=True, exist_ok=True)
shutil.copy2(instance_dir / "server.cfg", target_cfg)

_emit_step("starting systemd service...", on_stdout, passthrough)
start_service(
    name,
    on_stdout=on_stdout,
    on_stderr=on_stderr,
    passthrough=passthrough,
    should_cancel=should_cancel,
)

_emit_step("start complete.", on_stdout, passthrough)
```
- [ ] **Step 3: Add step logs to stop_instance**

In `l4d2host/instances.py`, modify `stop_instance`:
```python
_emit_step("stopping systemd service...", on_stdout, passthrough)
stop_service(
    name,
    on_stdout=on_stdout,
    on_stderr=on_stderr,
    passthrough=passthrough,
    should_cancel=should_cancel,
)

_emit_step("unmounting runtime overlay...", on_stdout, passthrough)
run_command(
    ["fusermount3", "-u", str(root / "runtime" / name / "merged")],
    on_stdout=on_stdout,
    on_stderr=on_stderr,
    passthrough=passthrough,
    should_cancel=should_cancel,
)

_emit_step("stop complete.", on_stdout, passthrough)
```
- [ ] **Step 4: Add step logs to delete_instance**

In `l4d2host/instances.py`, modify `delete_instance`:
```python
if not instance_dir.exists() and not runtime_dir.exists():
    return

_emit_step("stopping systemd service (if running)...", on_stdout, passthrough)
stop_service(
    name,
    on_stdout=on_stdout,
    on_stderr=on_stderr,
    passthrough=passthrough,
    should_cancel=should_cancel,
)

merged = runtime_dir / "merged"
if merged.is_mount():
    _emit_step("unmounting runtime overlay (if mounted)...", on_stdout, passthrough)
    run_command(
        ["fusermount3", "-u", str(merged)],
        on_stdout=on_stdout,
        on_stderr=on_stderr,
        passthrough=passthrough,
        should_cancel=should_cancel,
    )

_emit_step("removing instance files...", on_stdout, passthrough)
if instance_dir.exists():
    shutil.rmtree(instance_dir)
if runtime_dir.exists():
    shutil.rmtree(runtime_dir)

_emit_step("delete complete.", on_stdout, passthrough)
```
- [ ] **Step 5: Run full test suite**

Run: `pytest l4d2host/tests l4d2web/tests -q`
Expected: PASS. If tests fail because mocked `run_command` handlers assert on exact output, update those assertions to allow the newly emitted step lines, and stage the changed test files together with `l4d2host/instances.py` in the Step 6 commit.
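If an existing assertion compares the full list of `on_stdout` lines, one low-friction way to keep it stable is to separate step markers from the remaining output before asserting. The helper name `split_step_lines` below is hypothetical and only illustrates the idea:

```python
STEP_PREFIX = "Step: "


def split_step_lines(lines: list[str]) -> tuple[list[str], list[str]]:
    """Split captured lines into (step markers, everything else), keeping order."""
    steps = [line for line in lines if line.startswith(STEP_PREFIX)]
    other = [line for line in lines if not line.startswith(STEP_PREFIX)]
    return steps, other


# Example: an older assertion on subprocess output keeps working on `other`.
steps, other = split_step_lines(
    ["Step: mounting runtime overlay...", "mounted", "Step: start complete."]
)
```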
- [ ] **Step 6: Commit**
```bash
git add l4d2host/instances.py
git commit -m "feat(host): emit steps during start, stop, and delete operations"
```
---
### Task 4: Add boundary markers to job worker

**Files:**

- Modify: `l4d2web/services/job_worker.py`
- Modify: `l4d2web/tests/test_job_worker.py`

- [ ] **Step 1: Write failing tests for job worker boundaries**

In `l4d2web/tests/test_job_worker.py`, update `test_successful_start_job_logs_and_refreshes_server_state` to expect the synthetic boundary lines. Also change the mocks so they emit realistic step lines instead of the arbitrary "initialized" text:
```python
def fake_initialize(server_id, *, on_stdout=None, on_stderr=None, should_cancel=None):
    del should_cancel
    calls.append(("initialize", server_id))
    on_stdout("Step: creating instance directories...")

def fake_start(server_id, *, on_stdout=None, on_stderr=None, should_cancel=None):
    del should_cancel
    calls.append(("start", server_id))
    on_stdout("Step: mounting runtime overlay...")
```
Change the `assert lines == ` section at the bottom of the test:
```python
assert lines == [
    (1, "stdout", "starting initialize for alpha"),
    (2, "stdout", "Step: creating instance directories..."),
    (3, "stdout", "finished initialize successfully"),
    (4, "stdout", "starting start for alpha"),
    (5, "stdout", "Step: mounting runtime overlay..."),
    (6, "stdout", "finished start successfully"),
]
```
- [ ] **Step 2: Run test to verify it fails**

Run: `pytest l4d2web/tests/test_job_worker.py::test_successful_start_job_logs_and_refreshes_server_state -v`
Expected: FAIL because boundary markers are missing.

- [ ] **Step 3: Implement boundary logging in job worker**

In `l4d2web/services/job_worker.py`, modify the `run_job` execution block.

First, look up the server name when `server_id` is present:
```python
server_name = None
if server_id is not None:
    with session_scope() as db:
        s = db.scalar(select(Server).where(Server.id == server_id))
        server_name = s.name if s else "unknown"
```
Then, wrap the facade calls:
```python
def _run_with_boundaries(action: str, target: str, func, *args, **kwargs):
    append_job_log_line(job_id, "stdout", f"starting {action} for {target}", max_chars=max_chars)
    # If func raises, the "finished" line is skipped and the error propagates.
    func(*args, **kwargs)
    append_job_log_line(job_id, "stdout", f"finished {action} successfully", max_chars=max_chars)

try:
    if operation == "install":
        _run_with_boundaries("install", "server", l4d2_facade.install_runtime, on_stdout=on_stdout, on_stderr=on_stderr, should_cancel=should_cancel)
    elif operation in SERVER_OPERATIONS and server_id is None:
        raise ValueError(f"{operation} job has no server_id")
    elif operation == "initialize":
        _run_with_boundaries("initialize", server_name, l4d2_facade.initialize_server, server_id, on_stdout=on_stdout, on_stderr=on_stderr, should_cancel=should_cancel)
    elif operation == "start":
        _run_with_boundaries("initialize", server_name, l4d2_facade.initialize_server, server_id, on_stdout=on_stdout, on_stderr=on_stderr, should_cancel=should_cancel)
        raise_if_cancelled()
        _run_with_boundaries("start", server_name, l4d2_facade.start_server, server_id, on_stdout=on_stdout, on_stderr=on_stderr, should_cancel=should_cancel)
    elif operation == "stop":
        _run_with_boundaries("stop", server_name, l4d2_facade.stop_server, server_id, on_stdout=on_stdout, on_stderr=on_stderr, should_cancel=should_cancel)
    elif operation == "delete":
        _run_with_boundaries("delete", server_name, l4d2_facade.delete_server, server_id, on_stdout=on_stdout, on_stderr=on_stderr, should_cancel=should_cancel)
    else:
        raise ValueError(f"unknown job operation: {operation}")
# The existing exception handling of run_job follows here.
```
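One property of `_run_with_boundaries` worth noting is that the "finished" line is written only when the wrapped call returns normally. The self-contained sketch below illustrates this with an in-memory log. `make_logger`, `failing_stop`, and `demo_failure` are hypothetical stand-ins, not the worker's real `append_job_log_line` or facade functions, and the sequence numbering is an assumption modelled on the tuples used in the test above.

```python
from collections.abc import Callable
from typing import Any

LogLine = tuple[int, str, str]  # (sequence number, stream, text)


def make_logger(lines: list[LogLine]) -> Callable[[str, str], None]:
    """Hypothetical in-memory stand-in for append_job_log_line."""
    def append(stream: str, text: str) -> None:
        lines.append((len(lines) + 1, stream, text))
    return append


def run_with_boundaries(
    append: Callable[[str, str], None],
    action: str,
    target: str,
    func: Callable[..., Any],
    *args: Any,
    **kwargs: Any,
) -> None:
    append("stdout", f"starting {action} for {target}")
    func(*args, **kwargs)  # an exception here skips the "finished" line
    append("stdout", f"finished {action} successfully")


def failing_stop(server_id: int, *, on_stdout: Callable[[str], None]) -> None:
    # Emits one step line, then fails like a broken subprocess would.
    on_stdout("Step: stopping systemd service...")
    raise RuntimeError("stop failed")


def demo_failure() -> list[LogLine]:
    lines: list[LogLine] = []
    append = make_logger(lines)
    try:
        run_with_boundaries(
            append, "stop", "alpha", failing_stop, 1,
            on_stdout=lambda text: append("stdout", text),
        )
    except RuntimeError:
        pass  # the real worker's error handling is outside this sketch
    return lines
```

`demo_failure()` yields the starting boundary and the step line but no "finished" line, so a log that ends without a "finished" marker indicates the operation did not complete.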
- [ ] **Step 4: Run test to verify it passes**

Run: `pytest l4d2web/tests/test_job_worker.py::test_successful_start_job_logs_and_refreshes_server_state -v`
Expected: PASS

- [ ] **Step 5: Verify remaining job worker tests**

Run: `pytest l4d2web/tests/test_job_worker.py -q`

Tests that explicitly check log content may now fail. Update them to account for the boundary lines. For instance, `test_called_process_error_fails_job_and_sets_server_error` might need the new boundary lines in its assertions.

Example fix in `test_called_process_error_fails_job_and_sets_server_error`:
```python
assert "starting stop for alpha" in lines
assert "stop failed" in lines
```
Example fix in `test_cancelled_process_finishes_job_as_cancelled`:
```python
assert "starting stop for alpha" in lines
assert "terminating" in lines
```
- [ ] **Step 6: Commit**
```bash
git add l4d2web/services/job_worker.py l4d2web/tests/test_job_worker.py
git commit -m "feat(web): add boundary log lines to job worker execution"
```
---
### Task 5: Verify full test suite and clean up

- [ ] **Step 1: Run all tests**

Run: `pytest l4d2host/tests l4d2web/tests deploy/tests -q`
Expected: ALL PASS.

- [ ] **Step 2: Push changes**
```bash
git push
```