fix(l4d2): stabilize host smoke lifecycle

mwiegand 2026-05-05 23:23:26 +02:00
parent 2621b56627
commit 16412f37f2
6 changed files with 773 additions and 4 deletions

View file

@@ -135,14 +135,22 @@ def delete_instance(
     if not instance_dir.exists() and not runtime_dir.exists():
         return
-    stop_instance(
-        name,
-        root=root,
+    run_command(
+        ["systemctl", "--user", "stop", f"l4d2@{name}.service"],
         on_stdout=on_stdout,
         on_stderr=on_stderr,
         passthrough=passthrough,
     )
+    merged = runtime_dir / "merged"
+    if merged.is_mount():
+        run_command(
+            ["fusermount3", "-u", str(merged)],
+            on_stdout=on_stdout,
+            on_stderr=on_stderr,
+            passthrough=passthrough,
+        )
     if instance_dir.exists():
         shutil.rmtree(instance_dir)
     if runtime_dir.exists():

View file

@@ -31,7 +31,7 @@ def run_command(
     )
     def pump(
-        stream: subprocess.Popen[str].stdout,
+        stream,
         sink: list[str],
         callback: Callable[[str], None] | None,
         output_stream,

View file

@@ -31,3 +31,23 @@ def test_start_order(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None:
 def test_delete_missing_is_noop(tmp_path: Path) -> None:
     delete_instance("missing", root=tmp_path)
+
+
+def test_delete_stopped_instance_removes_dirs_without_unmounting(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None:
+    calls: list[list[str]] = []
+
+    def fake_run_command(cmd, **kwargs):
+        del kwargs
+        calls.append(list(cmd))
+
+    (tmp_path / "instances" / "alpha").mkdir(parents=True)
+    (tmp_path / "runtime" / "alpha" / "merged").mkdir(parents=True)
+
+    monkeypatch.setattr("l4d2host.instances.run_command", fake_run_command)
+
+    delete_instance("alpha", root=tmp_path)
+
+    assert not (tmp_path / "instances" / "alpha").exists()
+    assert not (tmp_path / "runtime" / "alpha").exists()
+    assert ["systemctl", "--user", "stop", "l4d2@alpha.service"] in calls
+    assert not any(call[0] == "fusermount3" for call in calls)

View file

@@ -1,3 +1,4 @@
+import inspect
 import subprocess
 import pytest
@@ -20,3 +21,8 @@ def test_callbacks_receive_lines() -> None:
 def test_nonzero_exit_raises() -> None:
     with pytest.raises(subprocess.CalledProcessError):
         run_command(["python3", "-c", "import sys; sys.exit(7)"])
+
+
+def test_run_command_avoids_runtime_unsafe_nested_annotations() -> None:
+    source = inspect.getsource(run_command)
+    assert "subprocess.Popen[str].stdout" not in source
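A note on why the regression test above matters: annotations on a nested function are evaluated every time the enclosing function runs, at least on interpreters that evaluate annotations eagerly (the default before Python 3.14, unless `from __future__ import annotations` is in effect). The sketch below illustrates that behaviour under the assumption that the module did not postpone annotations. `run_with_bad_annotation` and `run_with_plain_parameter` are hypothetical stand-ins, not code from the repository.

```python
import subprocess


def run_with_bad_annotation() -> str:
    """Hypothetical stand-in for the pre-patch shape of run_command."""
    try:
        # With eager annotations, this expression is evaluated when `def` runs.
        def pump(stream: subprocess.Popen[str].stdout) -> None:
            del stream
    except (AttributeError, TypeError) as exc:
        return type(exc).__name__
    # Reached only when annotations are evaluated lazily.
    return "ok"


def run_with_plain_parameter() -> str:
    """Same shape as the patched code: no annotation, nothing to evaluate."""
    def pump(stream) -> None:
        del stream
    return "ok"


print("annotated:", run_with_bad_annotation())
print("plain:", run_with_plain_parameter())
```

If a type hint is still wanted, `typing.IO[str]` would be one option; that is a suggestion, not what this commit does.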

View file

@ -0,0 +1,565 @@
# L4D2 Host Smoke Test Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Validate the implemented `l4d2host` library and `l4d2ctl` CLI on disposable host `ckn@10.0.4.128` with explicit user approval before every server-touching phase.
**Architecture:** This is a gated smoke-test runbook, not a feature implementation. Each task executes one bounded phase on the target server, captures command evidence, stops, and asks for approval before the next phase. The host library remains unchanged unless the smoke test identifies a defect that requires a separate fix plan.
**Tech Stack:** SSH, sudo, Python virtualenv/pip, Typer CLI entry point, SteamCMD, fuse-overlayfs/fuse3, systemd user services, journald, `/opt/l4d2` runtime paths.
---
## Source Design
- `docs/superpowers/specs/2026-05-05-l4d2-host-smoke-test-design.md`
## Gating Rule
Before running any task below, ask the user for explicit approval. After running a task, report evidence and stop. Do not continue to the next task until the user approves it.
Use this approval prompt before each task, substituting that task's number and title from this plan. For Task 1 it reads:
```text
Approve Task 1: read-only inspection on ckn@10.0.4.128?
```
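A minimal sketch of how a runner could enforce this gate, assuming an interactive operator. The helper names `approval_prompt` and `run_gated` and the set of accepted answers are hypothetical and not part of `l4d2host`.

```python
from typing import Callable

HOST = "ckn@10.0.4.128"


def approval_prompt(task_number: int, title: str, host: str = HOST) -> str:
    """Build the approval question for one task."""
    return f"Approve Task {task_number}: {title} on {host}?"


def run_gated(
    task_number: int,
    title: str,
    action: Callable[[], None],
    ask: Callable[[str], str] = input,
) -> bool:
    """Run `action` only after an explicit yes; return whether it ran."""
    answer = ask(approval_prompt(task_number, title) + " ").strip().lower()
    if answer not in {"y", "yes", "approve", "approved"}:
        return False  # anything other than an explicit yes blocks the task
    action()
    return True
```

Passing `ask` as a parameter keeps the gate testable without a terminal.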
If any command fails, stop immediately and report:
```text
Failed command: the exact command that failed
Exit/status: the observed exit code, signal, or SSH failure status
Relevant stdout/stderr: the shortest excerpt that explains the failure
Category: environment issue | host-lib bug | packaging/deploy issue | unclear
Recommended next action: one concrete next step based on the observed failure
```
Do not perform cleanup after a failure unless the user approves cleanup.
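The failure report above could be modelled as a small record so that every report carries the same five fields. This is only a sketch; the `FailureReport` class and its field names are hypothetical, and only the labels and categories come from this plan.

```python
from dataclasses import dataclass

CATEGORIES = ("environment issue", "host-lib bug", "packaging/deploy issue", "unclear")


@dataclass(frozen=True)
class FailureReport:
    command: str
    status: str
    output_excerpt: str
    category: str
    next_action: str

    def __post_init__(self) -> None:
        # Reject categories the plan does not define.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")

    def render(self) -> str:
        """Render the report using the labels from the plan."""
        return "\n".join(
            [
                f"Failed command: {self.command}",
                f"Exit/status: {self.status}",
                f"Relevant stdout/stderr: {self.output_excerpt}",
                f"Category: {self.category}",
                f"Recommended next action: {self.next_action}",
            ]
        )
```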
## Files And Runtime Locations
- Read: `docs/superpowers/specs/2026-05-05-l4d2-host-smoke-test-design.md`
- Read: `components/l4d2-host-lib/pyproject.toml`
- Read: `components/l4d2-host-lib/src/l4d2host/**`
- Remote create: `~/l4d2host-smoke/`
- Remote create: `~/l4d2host-smoke/.venv/`
- Remote create: `~/l4d2host-smoke/specs/smoke.yaml`
- Remote create: `~/l4d2host-smoke/logs/`
- Remote create/modify: `/opt/l4d2/`
- Remote create/modify: `/opt/l4d2/installation/`
- Remote create/delete: `/opt/l4d2/instances/smoke/`
- Remote create/delete: `/opt/l4d2/runtime/smoke/`
- Remote create/modify: `/home/ckn/.config/systemd/user/l4d2@.service`
- Local temporary create: `/var/folders/h4/nnvk2kxs2sv7nr32kmb_4dm40000gn/T/opencode/l4d2-host-lib-smoke.tar.gz`
### Task 1: Read-Only Server Inspection
**Files:**
- Read remote host state only.
- Do not create, modify, mount, install, start, stop, or delete anything.
- [ ] **Step 1: Ask for approval**
Ask:
```text
Approve Task 1: read-only inspection on ckn@10.0.4.128?
```
Expected: user explicitly approves before commands are run.
- [ ] **Step 2: Verify SSH identity and sudo availability without changing state**
Run:
```bash
ssh ckn@10.0.4.128 'set -eu; printf "user="; whoami; printf "host="; hostname; sudo -n true && printf "sudo=noninteractive\n" || printf "sudo=requires-password-or-unavailable\n"'
```
Expected: output includes `user=ckn`, a hostname, and either `sudo=noninteractive` or `sudo=requires-password-or-unavailable`.
- [ ] **Step 3: Inspect OS, package manager, Python, and runtime commands**
Run:
```bash
ssh ckn@10.0.4.128 'set -u; printf "os_release=\n"; if [ -r /etc/os-release ]; then sed -n "1,12p" /etc/os-release; else uname -a; fi; printf "\npackage_managers=\n"; for c in apt-get dnf yum pacman zypper; do command -v "$c" || true; done; printf "\npython=\n"; command -v python3 || true; python3 --version 2>&1 || true; printf "\nruntime_commands=\n"; for c in steamcmd fuse-overlayfs fusermount3 systemctl journalctl loginctl; do printf "%s=" "$c"; command -v "$c" || true; done'
```
Expected: reports OS details, any available package manager, Python version if installed, and presence/absence of required runtime commands.
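For reporting, the presence check in this step could be summarized as follows. This is a sketch meant to run on the target host; `find_commands` and `missing_commands` are hypothetical helper names, and the command list is copied from the shell loop above.

```python
import shutil
from typing import Callable, Iterable, Optional

REQUIRED = ("steamcmd", "fuse-overlayfs", "fusermount3", "systemctl", "journalctl", "loginctl")


def find_commands(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict[str, Optional[str]]:
    """Map each command name to its resolved path, or None when not on PATH."""
    return {name: which(name) for name in names}


def missing_commands(found: dict[str, Optional[str]]) -> list[str]:
    """Return the names that could not be resolved."""
    return [name for name, path in found.items() if path is None]


if __name__ == "__main__":
    found = find_commands(REQUIRED)
    for name, path in found.items():
        print(f"{name}={path or ''}")
    print("missing:", ", ".join(missing_commands(found)) or "none")
```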
- [ ] **Step 4: Inspect systemd user state and `/opt/l4d2` without changing it**
Run:
```bash
ssh ckn@10.0.4.128 'set -u; printf "uid="; id -u; printf "groups="; id -nG; printf "\nlinger=\n"; loginctl show-user "$(whoami)" -p Linger 2>/dev/null || true; printf "\nsystemd_user=\n"; XDG_RUNTIME_DIR="/run/user/$(id -u)" systemctl --user is-system-running 2>&1 || true; printf "\nopt_l4d2=\n"; if [ -e /opt/l4d2 ]; then ls -ld /opt/l4d2 /opt/l4d2/* 2>/dev/null || true; else printf "/opt/l4d2 missing\n"; fi; printf "\nmounts=\n"; mount | grep /opt/l4d2 || true'
```
Expected: reports UID/groups, lingering state, systemd user status, `/opt/l4d2` state, and existing `/opt/l4d2` mounts if any.
- [ ] **Step 5: Report findings and stop**
Report:
```text
Task 1 evidence:
- SSH identity: report the observed user and hostname
- sudo availability: report whether noninteractive sudo worked
- OS/package manager: report OS family and detected package manager
- Python: report Python path and version, or absence
- runtime commands present/missing: report steamcmd, fuse-overlayfs, fusermount3, systemctl, journalctl, loginctl
- systemd user state: report lingering and systemctl --user result
- /opt/l4d2 state: report whether it exists, ownership, and any mounts
Approve Task 2: server preparation on ckn@10.0.4.128?
```
Expected: no server state has been changed.
### Task 2: Server Preparation
**Files:**
- Remote create/modify: `/opt/l4d2/`
- Remote modify if required: system packages
- Remote modify if required: user lingering for `ckn`
- [ ] **Step 1: Ask for approval**
Ask:
```text
Approve Task 2: server preparation on ckn@10.0.4.128?
```
Expected: user explicitly approves before commands are run.
- [ ] **Step 2: Install baseline packages using the detected package manager**
Run exactly one of these command blocks based on Task 1 package-manager output.
For Debian/Ubuntu with `apt-get`:
```bash
ssh ckn@10.0.4.128 'set -eu; sudo apt-get update; sudo DEBIAN_FRONTEND=noninteractive apt-get install -y python3 python3-venv python3-pip curl ca-certificates tar gzip fuse-overlayfs fuse3'
```
Expected: command exits 0 and packages are installed or already current.
For Fedora/RHEL-like systems with `dnf`:
```bash
ssh ckn@10.0.4.128 'set -eu; sudo dnf install -y python3 python3-pip curl ca-certificates tar gzip fuse-overlayfs fuse3'
```
Expected: command exits 0 and packages are installed or already current.
For RHEL-like systems with `yum` and no `dnf`:
```bash
ssh ckn@10.0.4.128 'set -eu; sudo yum install -y python3 python3-pip curl ca-certificates tar gzip fuse-overlayfs fuse3'
```
Expected: command exits 0 and packages are installed or already current.
For Arch with `pacman`:
```bash
ssh ckn@10.0.4.128 'set -eu; sudo pacman -Syu --noconfirm --needed python python-pip curl ca-certificates tar gzip fuse-overlayfs fuse3'
```
Expected: command exits 0 and packages are installed or already current.
- [ ] **Step 3: Install SteamCMD from Valve tarball**
Run:
```bash
ssh ckn@10.0.4.128 'set -eu; sudo mkdir -p /opt/steamcmd; curl -fsSL https://steamcdn-a.akamaihd.net/client/installer/steamcmd_linux.tar.gz -o /tmp/steamcmd_linux.tar.gz; sudo tar -xzf /tmp/steamcmd_linux.tar.gz -C /opt/steamcmd; sudo ln -sf /opt/steamcmd/steamcmd.sh /usr/local/bin/steamcmd; steamcmd +quit'
```
Expected: `steamcmd +quit` exits 0 after bootstrapping or verifying SteamCMD.
- [ ] **Step 4: Prepare `/opt/l4d2` and systemd user prerequisites**
Run:
```bash
ssh ckn@10.0.4.128 'set -eu; sudo mkdir -p /opt/l4d2/installation /opt/l4d2/overlays /opt/l4d2/instances /opt/l4d2/runtime; sudo chown -R ckn:ckn /opt/l4d2; sudo loginctl enable-linger ckn; mkdir -p "$HOME/.config/systemd/user"; XDG_RUNTIME_DIR="/run/user/$(id -u)" systemctl --user daemon-reload || true'
```
Expected: `/opt/l4d2` exists and is owned by `ckn`; lingering is enabled; systemd user daemon reload either succeeds or reports a diagnosable user-manager issue.
- [ ] **Step 5: Verify prepared commands and writable runtime paths**
Run:
```bash
ssh ckn@10.0.4.128 'set -eu; command -v python3; python3 --version; command -v steamcmd; command -v fuse-overlayfs; command -v fusermount3; test -w /opt/l4d2; ls -ld /opt/l4d2 /opt/l4d2/installation /opt/l4d2/overlays /opt/l4d2/instances /opt/l4d2/runtime; XDG_RUNTIME_DIR="/run/user/$(id -u)" systemctl --user is-system-running 2>&1 || true'
```
Expected: all `command -v` checks print paths, `/opt/l4d2` is writable, and systemd user state is visible.
- [ ] **Step 6: Report findings and stop**
Report:
```text
Task 2 evidence:
- package installation: report package manager used and install command status
- steamcmd bootstrap: report steamcmd path and bootstrap command status
- /opt/l4d2 ownership: report owner/group and writability for ckn
- systemd user/linger state: report loginctl linger value and systemctl --user result
Approve Task 3: deploy current host lib on ckn@10.0.4.128?
```
Expected: server is prepared for host-lib deployment.
### Task 3: Deploy Current Host Lib
**Files:**
- Local create: `/var/folders/h4/nnvk2kxs2sv7nr32kmb_4dm40000gn/T/opencode/l4d2-host-lib-smoke.tar.gz`
- Remote create/modify: `~/l4d2host-smoke/`
- Remote create/modify: `~/l4d2host-smoke/.venv/`
- [ ] **Step 1: Ask for approval**
Ask:
```text
Approve Task 3: deploy current host lib on ckn@10.0.4.128?
```
Expected: user explicitly approves before commands are run.
- [ ] **Step 2: Create local source archive from current host-lib component**
Run from repository root:
```bash
tar --exclude='*.pyc' --exclude='__pycache__' --exclude='.pytest_cache' --exclude='*.egg-info' -C components/l4d2-host-lib -czf /var/folders/h4/nnvk2kxs2sv7nr32kmb_4dm40000gn/T/opencode/l4d2-host-lib-smoke.tar.gz .
```
Expected: command exits 0 and archive exists at `/var/folders/h4/nnvk2kxs2sv7nr32kmb_4dm40000gn/T/opencode/l4d2-host-lib-smoke.tar.gz`.
- [ ] **Step 3: Copy source archive to remote host and unpack it**
Run:
```bash
ssh ckn@10.0.4.128 'set -eu; rm -rf "$HOME/l4d2host-smoke"; mkdir -p "$HOME/l4d2host-smoke/src" "$HOME/l4d2host-smoke/logs" "$HOME/l4d2host-smoke/specs"'
```
Expected: remote smoke workspace is recreated.
Run:
```bash
scp /var/folders/h4/nnvk2kxs2sv7nr32kmb_4dm40000gn/T/opencode/l4d2-host-lib-smoke.tar.gz ckn@10.0.4.128:~/l4d2host-smoke/l4d2-host-lib-smoke.tar.gz
```
Expected: archive copies successfully.
Run:
```bash
ssh ckn@10.0.4.128 'set -eu; tar -xzf "$HOME/l4d2host-smoke/l4d2-host-lib-smoke.tar.gz" -C "$HOME/l4d2host-smoke/src"; test -f "$HOME/l4d2host-smoke/src/pyproject.toml"; test -f "$HOME/l4d2host-smoke/src/src/l4d2host/cli.py"'
```
Expected: source tree unpacks and expected files exist.
- [ ] **Step 4: Install host lib into remote virtualenv**
Run:
```bash
ssh ckn@10.0.4.128 'set -eu; python3 -m venv "$HOME/l4d2host-smoke/.venv"; "$HOME/l4d2host-smoke/.venv/bin/python" -m pip install --upgrade pip; "$HOME/l4d2host-smoke/.venv/bin/python" -m pip install -e "$HOME/l4d2host-smoke/src"'
```
Expected: pip exits 0 and installs `l4d2host` in editable mode.
- [ ] **Step 5: Verify CLI command surface**
Run:
```bash
ssh ckn@10.0.4.128 'set -eu; "$HOME/l4d2host-smoke/.venv/bin/l4d2ctl" --help | tee "$HOME/l4d2host-smoke/logs/l4d2ctl-help.log"; for c in install initialize start stop delete; do grep -qw -- "$c" "$HOME/l4d2host-smoke/logs/l4d2ctl-help.log"; printf "found: %s\n" "$c"; done'
```
Expected: help output includes `install`, `initialize`, `start`, `stop`, and `delete`.
- [ ] **Step 6: Report findings and stop**
Report:
```text
Task 3 evidence:
- archive creation/copy: report local archive path and remote unpack path
- venv/pip install: report virtualenv path and pip install status
- l4d2ctl command surface: report the five commands found in help output
Approve Task 4: run l4d2ctl install on ckn@10.0.4.128?
```
Expected: host lib is installed on the server but no L4D2 server files have been downloaded by this task.
### Task 4: Run `l4d2ctl install`
**Files:**
- Remote create/modify: `/opt/l4d2/installation/`
- Remote create: `~/l4d2host-smoke/logs/install.log`
- [ ] **Step 1: Ask for approval**
Ask:
```text
Approve Task 4: run l4d2ctl install on ckn@10.0.4.128?
```
Expected: user explicitly approves before commands are run.
- [ ] **Step 2: Run install command and capture output**
Run with a long timeout when executing:
```bash
ssh ckn@10.0.4.128 'bash -lc "set -o pipefail; \"$HOME/l4d2host-smoke/.venv/bin/l4d2ctl\" install 2>&1 | tee \"$HOME/l4d2host-smoke/logs/install.log\""'
```
Expected: command exits 0 after SteamCMD completes Windows and Linux platform app updates for app `222860`.
- [ ] **Step 3: Inspect installation output paths**
Run:
```bash
ssh ckn@10.0.4.128 'set -eu; test -d /opt/l4d2/installation; find /opt/l4d2/installation -maxdepth 3 \( -name srcds_run -o -name left4dead2 \) -print; du -sh /opt/l4d2/installation; tail -n 40 "$HOME/l4d2host-smoke/logs/install.log"'
```
Expected: output includes `/opt/l4d2/installation/srcds_run`, a `left4dead2` path, install directory size, and recent SteamCMD log lines.
- [ ] **Step 4: Report findings and stop**
Report:
```text
Task 4 evidence:
- l4d2ctl install exit: report exit status and SteamCMD completion status
- installed paths: report srcds_run and left4dead2 paths found under /opt/l4d2/installation
- install log excerpt: report the last relevant SteamCMD lines
Approve Task 5: run smoke instance lifecycle on ckn@10.0.4.128?
```
Expected: L4D2 dedicated server files exist under `/opt/l4d2/installation`.
### Task 5: Run Instance Lifecycle Smoke Test
**Files:**
- Remote create: `~/l4d2host-smoke/specs/smoke.yaml`
- Remote create/modify/delete: `/opt/l4d2/instances/smoke/`
- Remote create/modify/delete: `/opt/l4d2/runtime/smoke/`
- Remote create/modify: `/home/ckn/.config/systemd/user/l4d2@.service`
- Remote create: `~/l4d2host-smoke/logs/lifecycle.log`
- [ ] **Step 1: Ask for approval**
Ask:
```text
Approve Task 5: run smoke instance lifecycle on ckn@10.0.4.128?
```
Expected: user explicitly approves before commands are run.
- [ ] **Step 2: Create minimal smoke spec**
Run:
```bash
ssh ckn@10.0.4.128 'set -eu; mkdir -p "$HOME/l4d2host-smoke/specs"; printf "%s\n" "port: 27015" "arguments:" " - -insecure" " - +map" " - c5m1_waterfront" "config:" " - hostname left4me-smoke" > "$HOME/l4d2host-smoke/specs/smoke.yaml"; sed -n "1,20p" "$HOME/l4d2host-smoke/specs/smoke.yaml"'
```
Expected: spec file shows port `27015`, no overlays, three arguments, and one config line.
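The expectation above can be checked mechanically. The sketch below is deliberately not a YAML parser: it only understands the flat shape written by the `printf` command, and the `overlays` key name is an assumption, since this plan states only that the spec has no overlays.

```python
SMOKE_SPEC = "\n".join(
    [
        "port: 27015",
        "arguments:",
        " - -insecure",
        " - +map",
        " - c5m1_waterfront",
        "config:",
        " - hostname left4me-smoke",
    ]
)


def summarize_spec(text: str) -> dict:
    """Collect the port and list entries from the flat smoke spec."""
    summary = {"port": None, "arguments": [], "config": [], "overlays": []}
    current = None  # name of the list section currently being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- ") and current is not None:
            summary[current].append(line[2:])
        elif line.startswith("port:"):
            summary["port"] = int(line.split(":", 1)[1])
            current = None
        elif line.endswith(":") and line[:-1] in ("arguments", "config", "overlays"):
            current = line[:-1]
    return summary


print(summarize_spec(SMOKE_SPEC))
```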
- [ ] **Step 3: Initialize the smoke instance**
Run:
```bash
ssh ckn@10.0.4.128 'bash -lc "set -o pipefail; \"$HOME/l4d2host-smoke/.venv/bin/l4d2ctl\" initialize smoke -f \"$HOME/l4d2host-smoke/specs/smoke.yaml\" 2>&1 | tee \"$HOME/l4d2host-smoke/logs/initialize.log\""'
```
Expected: command exits 0.
Run:
```bash
ssh ckn@10.0.4.128 'set -eu; test -f /opt/l4d2/instances/smoke/instance.env; test -f /opt/l4d2/instances/smoke/server.cfg; test -d /opt/l4d2/runtime/smoke/upper; test -d /opt/l4d2/runtime/smoke/work; test -d /opt/l4d2/runtime/smoke/merged; sed -n "1,20p" /opt/l4d2/instances/smoke/instance.env; sed -n "1,20p" /opt/l4d2/instances/smoke/server.cfg; test -f "$HOME/.config/systemd/user/l4d2@.service"'
```
Expected: instance files and runtime directories exist; `instance.env` contains `L4D2_PORT=27015` and `L4D2_LOWERDIRS=/opt/l4d2/installation`.
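The two expected `instance.env` values can be verified with a few lines of Python. This sketch assumes the file uses plain `KEY=VALUE` lines; `parse_env` and `check_instance_env` are hypothetical helpers, not part of `l4d2host`.

```python
EXPECTED = {
    "L4D2_PORT": "27015",
    "L4D2_LOWERDIRS": "/opt/l4d2/installation",
}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def check_instance_env(text: str, expected: dict[str, str] = EXPECTED) -> list[str]:
    """Return one message per mismatch; an empty list means the file matches."""
    env = parse_env(text)
    return [
        f"{key}: expected {want!r}, got {env.get(key)!r}"
        for key, want in expected.items()
        if env.get(key) != want
    ]
```

An empty list from `check_instance_env` corresponds to the expectation stated above.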
- [ ] **Step 4: Start the smoke instance**
Run:
```bash
ssh ckn@10.0.4.128 'bash -lc "set -o pipefail; XDG_RUNTIME_DIR=/run/user/$(id -u) \"$HOME/l4d2host-smoke/.venv/bin/l4d2ctl\" start smoke 2>&1 | tee \"$HOME/l4d2host-smoke/logs/start.log\""'
```
Expected: command exits 0 or fails with actionable stdout/stderr that identifies an environment or host-lib issue.
- [ ] **Step 5: Inspect service, mount, status API, and logs API**
Run:
```bash
ssh ckn@10.0.4.128 'set -u; XDG_RUNTIME_DIR="/run/user/$(id -u)" systemctl --user status l4d2@smoke.service --no-pager 2>&1 | sed -n "1,80p"; printf "\nmount_state=\n"; mount | grep "/opt/l4d2/runtime/smoke/merged" || true; printf "\nstatus_api=\n"; "$HOME/l4d2host-smoke/.venv/bin/python" -c "from l4d2host.status import get_instance_status; print(get_instance_status(\"smoke\"))"; printf "\nlogs_api=\n"; "$HOME/l4d2host-smoke/.venv/bin/python" -c "from itertools import islice; from l4d2host.logs import stream_instance_logs; [print(line) for line in islice(stream_instance_logs(\"smoke\", lines=50, follow=False), 20)]"'
```
Expected: service status is visible, mount state is reported, status API prints an `InstanceStatus`, and logs API prints up to 20 recent journal lines.
- [ ] **Step 6: Stop the smoke instance**
Run:
```bash
ssh ckn@10.0.4.128 'bash -lc "set -o pipefail; XDG_RUNTIME_DIR=/run/user/$(id -u) \"$HOME/l4d2host-smoke/.venv/bin/l4d2ctl\" stop smoke 2>&1 | tee \"$HOME/l4d2host-smoke/logs/stop.log\""'
```
Expected: command exits 0, service stops, and overlay unmount command succeeds.
- [ ] **Step 7: Delete the smoke instance and verify repeated delete**
Run:
```bash
ssh ckn@10.0.4.128 'bash -lc "set -o pipefail; XDG_RUNTIME_DIR=/run/user/$(id -u) \"$HOME/l4d2host-smoke/.venv/bin/l4d2ctl\" delete smoke 2>&1 | tee \"$HOME/l4d2host-smoke/logs/delete-1.log\""'
```
Expected: command exits 0 and removes `/opt/l4d2/instances/smoke` and `/opt/l4d2/runtime/smoke`.
Run:
```bash
ssh ckn@10.0.4.128 'set -u; if [ -e /opt/l4d2/instances/smoke ] || [ -e /opt/l4d2/runtime/smoke ]; then ls -ld /opt/l4d2/instances/smoke /opt/l4d2/runtime/smoke 2>/dev/null; exit 1; fi; printf "smoke instance/runtime removed\n"'
```
Expected: output is `smoke instance/runtime removed`.
Run:
```bash
ssh ckn@10.0.4.128 'bash -lc "set -o pipefail; XDG_RUNTIME_DIR=/run/user/$(id -u) \"$HOME/l4d2host-smoke/.venv/bin/l4d2ctl\" delete smoke 2>&1 | tee \"$HOME/l4d2host-smoke/logs/delete-2.log\""'
```
Expected: command exits 0, proving missing instance/runtime delete is a no-op success.
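The no-op behaviour checked by the second delete can be illustrated locally. The sketch below mirrors only the directory handling of `delete_instance` from this commit and omits the service stop and the unmount; `delete_dirs` is a hypothetical name.

```python
import shutil
import tempfile
from pathlib import Path


def delete_dirs(name: str, root: Path) -> bool:
    """Remove instance and runtime dirs; return False when there was nothing to do."""
    instance_dir = root / "instances" / name
    runtime_dir = root / "runtime" / name
    if not instance_dir.exists() and not runtime_dir.exists():
        return False  # already gone: succeed without doing anything
    for directory in (instance_dir, runtime_dir):
        if directory.exists():
            shutil.rmtree(directory)
    return True


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "instances" / "smoke").mkdir(parents=True)
    (root / "runtime" / "smoke" / "merged").mkdir(parents=True)
    first = delete_dirs("smoke", root)   # removes both trees
    second = delete_dirs("smoke", root)  # nothing left to remove
    print(first, second)
```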
- [ ] **Step 8: Report findings and stop**
Report:
```text
Task 5 evidence:
- spec creation: report remote spec path and rendered YAML
- initialize: report exit status and created instance/runtime files
- start: report exit status and start log result
- service/mount status: report systemd user service state and overlay mount state
- status API: report printed InstanceStatus value
- logs API: report whether journal lines were returned
- stop: report exit status and unmount result
- delete and repeated delete: report first delete status, path removal status, and second delete status
Approve Task 6: cleanup decision on ckn@10.0.4.128?
```
Expected: smoke instance has been deleted, and `/opt/l4d2/installation` remains available for later web-app testing unless cleanup removes it.
### Task 6: Cleanup Decision
**Files:**
- Remote optional delete: `~/l4d2host-smoke/`
- Remote optional delete: `/opt/l4d2/installation/`
- Remote optional delete: `/opt/l4d2/`
- [ ] **Step 1: Ask for cleanup preference**
Ask:
```text
Cleanup options for ckn@10.0.4.128:
1. Keep /opt/l4d2/installation and remove only ~/l4d2host-smoke
2. Keep everything for debugging/later web testing
3. Remove all smoke-test artifacts including /opt/l4d2
Which cleanup option should I run?
```
Expected: user selects one option before cleanup commands are run.
- [ ] **Step 2: Run selected cleanup command**
For option 1:
```bash
ssh ckn@10.0.4.128 'set -eu; rm -rf "$HOME/l4d2host-smoke"; printf "kept /opt/l4d2, removed ~/l4d2host-smoke\n"'
```
Expected: remote smoke workspace is removed; `/opt/l4d2/installation` remains.
For option 2:
```bash
ssh ckn@10.0.4.128 'set -eu; printf "kept ~/l4d2host-smoke and /opt/l4d2 for later inspection\n"; ls -ld "$HOME/l4d2host-smoke" /opt/l4d2 /opt/l4d2/installation 2>/dev/null || true'
```
Expected: no cleanup is performed; paths are listed if present.
For option 3:
```bash
ssh ckn@10.0.4.128 'set -eu; rm -rf "$HOME/l4d2host-smoke"; sudo rm -rf /opt/l4d2; printf "removed ~/l4d2host-smoke and /opt/l4d2\n"'
```
Expected: remote smoke workspace and `/opt/l4d2` are removed.
- [ ] **Step 3: Report final state**
Run:
```bash
ssh ckn@10.0.4.128 'set -u; printf "workspace=\n"; ls -ld "$HOME/l4d2host-smoke" 2>/dev/null || printf "~/l4d2host-smoke missing\n"; printf "\nopt_l4d2=\n"; ls -ld /opt/l4d2 /opt/l4d2/installation 2>/dev/null || printf "/opt/l4d2 or installation missing\n"; printf "\nsmoke_mounts=\n"; mount | grep /opt/l4d2 || true; printf "\nsmoke_service=\n"; XDG_RUNTIME_DIR="/run/user/$(id -u)" systemctl --user status l4d2@smoke.service --no-pager 2>&1 | sed -n "1,30p" || true'
```
Expected: final state matches selected cleanup option; no smoke mounts remain.
- [ ] **Step 4: Summarize smoke-test result**
Report:
```text
Host-lib smoke-test result:
- Server inspected: state whether Task 1 completed successfully
- Server prepared: state whether Task 2 completed successfully
- Host lib deployed: state whether Task 3 completed successfully
- l4d2ctl install validated: state whether Task 4 completed successfully
- lifecycle validated: state whether Task 5 completed successfully
- cleanup option applied: state which cleanup option was applied
- host-lib defects found: list defects found, or state that none were found during the smoke test
- recommended next phase: choose web-app lifecycle job wiring or host-lib fix plan based on evidence
```
Expected: final report clearly states whether to proceed to web-app lifecycle job wiring or stop for a host-lib fix.
---
## Self-Review Checklist
- [ ] Spec coverage: all six design steps are represented as gated tasks.
- [ ] Approval constraint: every server-touching task starts with an explicit approval step.
- [ ] Failure policy: failure reporting and no automatic cleanup are documented.
- [ ] Evidence: each task has exact commands and expected evidence.
- [ ] Scope: no web-app implementation is included in this plan.

View file

@ -0,0 +1,170 @@
# L4D2 Host Smoke Test Design
**Goal:** Validate the implemented `l4d2host` library and `l4d2ctl` CLI on the disposable Linux server `ckn@10.0.4.128` before continuing web-app lifecycle job wiring.
**Target Host:** `ckn@10.0.4.128`
**Access Assumption:** SSH access as `ckn` with sudo privileges.
**Primary Constraint:** Ask for explicit user approval before every server-touching step.
## Context
The repository now contains both planned components:
- `components/l4d2-host-lib`: Python host library and `l4d2ctl` CLI.
- `components/l4d2-web-app`: Flask app for users, blueprints, servers, jobs, and logs.
The web app depends on the host library for real lifecycle behavior. Before wiring web lifecycle jobs end-to-end, the host contract should be proven on an actual Linux machine with `steamcmd`, `fuse-overlayfs`, systemd user services, and journald available.
## Scope
The smoke test verifies these host-lib behaviors:
- SSH connectivity and sudo access to `ckn@10.0.4.128`.
- Required runtime tools are present or can be installed: `steamcmd`, `fuse-overlayfs`, `fusermount3`, `systemctl --user`, `journalctl --user`, and Python packaging tooling.
- `/opt/l4d2` exists with permissions that allow the `ckn` user to run the v1 host workflow.
- `l4d2ctl install` downloads or updates the L4D2 dedicated server into `/opt/l4d2/installation`.
- `l4d2ctl initialize smoke -f spec.yaml` writes instance and runtime state under `/opt/l4d2`.
- `l4d2ctl start smoke` mounts the runtime overlay, copies `server.cfg`, and starts the systemd user service.
- `get_instance_status("smoke")` reports an interpretable status.
- `stream_instance_logs("smoke")` can read journald output.
- `l4d2ctl stop smoke` stops the user service and unmounts the runtime overlay.
- `l4d2ctl delete smoke` removes the instance/runtime directories.
- Re-running `l4d2ctl delete smoke` succeeds as a no-op.
## Out Of Scope
- Web-app job execution or UI changes.
- Long-running game-server operations beyond a short start/status/log/stop check.
- Workshop mod management or web-managed overlay file content.
- Production hardening for the disposable test server.
## Execution Strategy
The smoke test is intentionally gated. Each step must stop after reporting evidence and wait for user approval before moving to the next step.
### Step 1: Read-Only Server Inspection
Purpose: understand the target host without changing it.
Allowed actions:
- SSH into `ckn@10.0.4.128`.
- Inspect OS, package manager, current user, sudo availability, Python version, systemd user availability, lingering status, existing `/opt/l4d2` state, and relevant runtime tools.
Not allowed in this step:
- Installing packages.
- Creating or modifying files.
- Starting or stopping services.
- Mounting or unmounting filesystems.
Checkpoint: report findings and ask before any setup changes.
### Step 2: Server Preparation
Purpose: make the disposable server capable of running the host-lib workflow.
Allowed actions after approval:
- Install missing packages needed for the host workflow.
- Create `/opt/l4d2` if missing.
- Set ownership/permissions so `ckn` can run the smoke workflow.
- Configure systemd user prerequisites if required for `systemctl --user`.
Checkpoint: report exact changes and ask before deploying code.
### Step 3: Deploy Current Host Lib
Purpose: install the current repository implementation on the target host without inventing new packaging.
Allowed actions after approval:
- Copy or archive the current `components/l4d2-host-lib` source to the server.
- Install it using its existing `pyproject.toml`, preferably into an isolated virtual environment.
- Verify that `l4d2ctl --help` exposes the fixed v1 command surface.
Checkpoint: report command evidence and ask before downloading server files.
### Step 4: Run `l4d2ctl install`
Purpose: validate the install/update command against real `steamcmd` behavior.
Allowed actions after approval:
- Run `l4d2ctl install` on the target host.
- Capture stdout, stderr, and exit code.
- Inspect `/opt/l4d2/installation` enough to confirm expected installation output.
Checkpoint: report evidence and ask before creating a smoke instance.
### Step 5: Run Instance Lifecycle Smoke Test
Purpose: validate initialize/start/status/logs/stop/delete against the real runtime.
Allowed actions after approval:
- Create a minimal spec file for instance name `smoke`.
- Run `l4d2ctl initialize smoke -f spec.yaml`.
- Run `l4d2ctl start smoke`.
- Check `systemctl --user status l4d2@smoke.service`.
- Check mount state for `/opt/l4d2/runtime/smoke/merged`.
- Call `get_instance_status("smoke")` from Python.
- Call `stream_instance_logs("smoke", lines=50, follow=False)` from Python.
- Run `l4d2ctl stop smoke`.
- Run `l4d2ctl delete smoke`.
- Run `l4d2ctl delete smoke` again to verify no-op success.
Checkpoint: report command evidence and ask what to do with remaining artifacts.
### Step 6: Cleanup Decision
Purpose: preserve useful diagnostics or remove smoke-test state based on user preference.
Allowed actions after approval:
- Remove copied source archives or virtual environments.
- Remove smoke spec files.
- Leave `/opt/l4d2/installation` intact if useful for later web-app testing, or remove it if requested.
Checkpoint: report final target-host state.
## Failure Handling
Any failure stops the smoke-test flow immediately. The report must include:
- command that failed
- exit code if available
- relevant stdout and stderr
- likely category: environment issue, host-lib bug, packaging/deploy issue, or unclear
- recommended next action
No automatic destructive cleanup should happen after a failure. If a failure leaves `/opt/l4d2`, a mounted overlay, copied files, or a systemd service behind, inspectable state should be preserved until the user approves cleanup.
## Evidence Requirements
Each completed step should report fresh command evidence. Suitable evidence includes:
- exact commands run
- exit code or clear command success/failure status
- key stdout/stderr lines
- relevant filesystem paths
- service status summaries
- mount state
- journal/log snippets
No step should be called successful without current evidence from that step.
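One way to keep the evidence fresh is to capture it at the moment a command runs. The sketch below assumes commands are run locally through `subprocess`; the `Evidence` record and `capture` helper are hypothetical names, not part of `l4d2host`.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    command: list[str]
    exit_code: int
    stdout: str
    stderr: str


def capture(command: list[str], timeout: float = 60.0) -> Evidence:
    """Run a command and record its exit code and output without raising on failure."""
    completed = subprocess.run(
        command,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    return Evidence(list(command), completed.returncode, completed.stdout, completed.stderr)


if __name__ == "__main__":
    result = capture([sys.executable, "-c", "print('ok')"])
    print(result.exit_code, result.stdout.strip())
```

Using `check=False` lets a failing command still produce a complete record, which fits the rule that failures are reported rather than cleaned up.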
## Next Phase After Smoke Test
If the host-lib smoke test succeeds, continue with web-app lifecycle job wiring:
- enqueue lifecycle jobs from routes/UI
- run jobs through worker threads
- call `l4d2web.services.l4d2_facade`
- persist callback output to `job_logs`
- live-follow job logs through SSE
- update server desired and actual state
If the smoke test fails due to host-lib behavior, fix the host library before continuing web-app lifecycle work.