left4me/docs/superpowers/specs/2026-05-05-l4d2-host-smoke-test-design.md

# L4D2 Host Smoke Test Design

**Goal:** Validate the implemented `l4d2host` library and `l4d2ctl` CLI on the disposable Linux server `ckn@10.0.4.128` before continuing web-app lifecycle job wiring.

**Target Host:** `ckn@10.0.4.128`

**Access Assumption:** SSH access as `ckn` with sudo privileges.

**Primary Constraint:** Ask for explicit user approval before every server-touching step.

## Context

The repository now contains both planned components:

- `components/l4d2-host-lib`: Python host library and `l4d2ctl` CLI.
- `components/l4d2-web-app`: Flask app for users, blueprints, servers, jobs, and logs.

The web app depends on the host library for real lifecycle behavior. Before wiring web lifecycle jobs end-to-end, the host contract should be proven on an actual Linux machine with `steamcmd`, `fuse-overlayfs`, systemd user services, and journald available.

## Scope

The smoke test verifies these host-lib behaviors:

- SSH connectivity and sudo access to `ckn@10.0.4.128`.
- Required runtime tools are present or can be installed: `steamcmd`, `fuse-overlayfs`, `fusermount3`, `systemctl --user`, `journalctl --user`, and Python packaging tooling.
- `/opt/l4d2` exists with permissions that allow the `ckn` user to run the v1 host workflow.
- `l4d2ctl install` downloads or updates the L4D2 dedicated server into `/opt/l4d2/installation`.
- `l4d2ctl initialize smoke -f spec.yaml` writes instance and runtime state under `/opt/l4d2`.
- `l4d2ctl start smoke` mounts the runtime overlay, copies `server.cfg`, and starts the systemd user service.
- `get_instance_status("smoke")` returns a status that can be interpreted unambiguously and agrees with the observed systemd unit state.
- `stream_instance_logs("smoke")` can read journald output.
- `l4d2ctl stop smoke` stops the user service and unmounts the runtime overlay.
- `l4d2ctl delete smoke` removes the instance/runtime directories.
- Re-running `l4d2ctl delete smoke` succeeds as a no-op.

## Out Of Scope

- Web-app job execution or UI changes.
- Long-running game-server operations beyond a short start/status/log/stop check.
- Workshop mod management or web-managed overlay file content.
- Production hardening for the disposable test server.

## Execution Strategy

The smoke test is intentionally gated. Each step must stop after reporting evidence and wait for user approval before moving to the next step.
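
The gating rule above can be sketched as a small loop. This is a minimal illustration, not part of `l4d2host`: `run_gated`, `ask_on_terminal`, and the `(name, action)` step shape are hypothetical names introduced here.

```python
from typing import Callable, List, Tuple

# Each step is a (name, action) pair; the action returns its evidence as text.
# This shape is an assumption made for the sketch.
Step = Tuple[str, Callable[[], str]]


def run_gated(steps: List[Step], approve: Callable[[str], bool]) -> List[Tuple[str, str]]:
    """Run steps in order, asking for approval before each one.

    Stops at the first step that is not approved and returns the evidence
    collected so far.
    """
    evidence = []
    for name, action in steps:
        if not approve(name):
            break
        evidence.append((name, action()))
    return evidence


def ask_on_terminal(step_name: str) -> bool:
    """Interactive approval: only an explicit 'yes' counts."""
    return input(f"Run step '{step_name}'? [yes/no] ").strip().lower() == "yes"
```

Passing `approve` as a callable keeps the gate testable without a terminal; in an interactive run it would be `ask_on_terminal`.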

### Step 1: Read-Only Server Inspection

Purpose: understand the target host without changing it.

Allowed actions:

- SSH into `ckn@10.0.4.128`.
- Inspect OS, package manager, current user, sudo availability, Python version, systemd user availability, lingering status, existing `/opt/l4d2` state, and relevant runtime tools.

Not allowed in this step:

- Installing packages.
- Creating or modifying files.
- Starting or stopping services.
- Mounting or unmounting filesystems.

Checkpoint: report findings and ask before any setup changes.
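
One possible set of read-only probes for this step is sketched below. The probe list and the `ssh_argv` helper are assumptions made for illustration; the exact commands should be adjusted to the distribution found on the host. Sudo is only located here, not exercised, so that the step stays strictly read-only.

```python
from typing import List

TARGET = "ckn@10.0.4.128"

# Tools whose presence is checked; names taken from the scope section.
TOOLS = ["sudo", "steamcmd", "fuse-overlayfs", "fusermount3", "journalctl"]

# Read-only probes: none of these install, write, start, stop, or mount anything.
READ_ONLY_PROBES = [
    "cat /etc/os-release",
    "uname -r",
    "id",
    "python3 --version",
    "systemctl --user is-system-running",
    "loginctl show-user ckn --property=Linger",
    "ls -ld /opt/l4d2",
] + [f"command -v {tool}" for tool in TOOLS]


def ssh_argv(remote_command: str, target: str = TARGET) -> List[str]:
    """Build the local ssh command line for one remote probe."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", target, remote_command]
```

Each probe would be run as `subprocess.run(ssh_argv(probe), capture_output=True, text=True)`, and a non-zero exit code (for example, a missing tool) is itself a finding to report rather than an error.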

### Step 2: Server Preparation

Purpose: make the disposable server capable of running the host-lib workflow.

Allowed actions after approval:

- Install missing packages needed for the host workflow.
- Create `/opt/l4d2` if missing.
- Set ownership/permissions so `ckn` can run the smoke workflow.
- Configure systemd user prerequisites if required for `systemctl --user`.

Checkpoint: report exact changes and ask before deploying code.
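
Before installing anything, it can help to compute which tools are actually missing. The sketch below assumes it runs on the target host and that every tool is reachable through `PATH`; `REQUIRED_TOOLS` and `missing_tools` are hypothetical names, and the package that provides each tool depends on the distribution.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Executables behind the runtime tools named in the scope section.
REQUIRED_TOOLS = [
    "steamcmd",
    "fuse-overlayfs",
    "fusermount3",
    "systemctl",
    "journalctl",
    "python3",
]


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]
```

An empty result from `missing_tools(REQUIRED_TOOLS)` means no package installation needs to be requested at this checkpoint.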

### Step 3: Deploy Current Host Lib

Purpose: install the current repository implementation on the target host without inventing new packaging.

Allowed actions after approval:

- Copy or archive the current `components/l4d2-host-lib` source to the server.
- Install it using its existing `pyproject.toml`, preferably into an isolated virtual environment.
- Verify that `l4d2ctl --help` exposes the fixed v1 command surface.

Checkpoint: report command evidence and ask before downloading server files.
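
The `l4d2ctl --help` check can be made mechanical with a coarse text match. The subcommand list below contains only the commands named in this document; the real v1 surface may include more, and the help-text layout is not assumed beyond containing the subcommand names as whole words.

```python
import re
from typing import Iterable, List

# Subcommands referenced in this document; an assumption about the full surface.
EXPECTED_SUBCOMMANDS = ["install", "initialize", "start", "stop", "delete"]


def missing_subcommands(
    help_text: str,
    expected: Iterable[str] = EXPECTED_SUBCOMMANDS,
) -> List[str]:
    """Return expected subcommands that never appear as a whole word in help_text."""
    return [
        name
        for name in expected
        if not re.search(rf"\b{re.escape(name)}\b", help_text)
    ]
```

This only proves that a name is mentioned somewhere in the help output, so the captured help text should still be included in the checkpoint report.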

### Step 4: Run `l4d2ctl install`

Purpose: validate the install/update command against real `steamcmd` behavior.

Allowed actions after approval:

- Run `l4d2ctl install` on the target host.
- Capture stdout, stderr, and exit code.
- Inspect `/opt/l4d2/installation` far enough to confirm that the expected server files were written.

Checkpoint: report evidence and ask before creating a smoke instance.
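
Capturing stdout, stderr, and the exit code together can be done with the standard library. This is a minimal sketch that assumes it runs on the target host with `l4d2ctl` on `PATH`; `CommandResult` and `run_captured` are hypothetical names.

```python
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class CommandResult:
    argv: List[str]
    exit_code: int
    stdout: str
    stderr: str


def run_captured(argv: List[str]) -> CommandResult:
    """Run a command to completion and keep everything needed for the report."""
    # check=False: a non-zero exit code is evidence to report, not an exception.
    completed = subprocess.run(argv, capture_output=True, text=True, check=False)
    return CommandResult(argv, completed.returncode, completed.stdout, completed.stderr)
```

For this step the call would be `run_captured(["l4d2ctl", "install"])`. Output is only available after the command exits, so a long `steamcmd` download shows no progress while it runs.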

### Step 5: Run Instance Lifecycle Smoke Test

Purpose: validate initialize/start/status/logs/stop/delete against the real runtime.

Allowed actions after approval:

- Create a minimal spec file for instance name `smoke`.
- Run `l4d2ctl initialize smoke -f spec.yaml`.
- Run `l4d2ctl start smoke`.
- Check `systemctl --user status l4d2@smoke.service`.
- Check mount state for `/opt/l4d2/runtime/smoke/merged`.
- Call `get_instance_status("smoke")` from Python.
- Call `stream_instance_logs("smoke", lines=50, follow=False)` from Python.
- Run `l4d2ctl stop smoke`.
- Run `l4d2ctl delete smoke`.
- Run `l4d2ctl delete smoke` again to verify no-op success.

Checkpoint: report command evidence and ask what to do with remaining artifacts.
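
The mount-state check for `/opt/l4d2/runtime/smoke/merged` can be sketched by reading `/proc/mounts`, whose second whitespace-separated field is the mount point. The helper names are hypothetical, and the sketch assumes the path contains no spaces (which `/proc/mounts` would escape).

```python
def is_mounted(mount_point: str, mounts_text: str) -> bool:
    """Return True if mount_point is a mount target in /proc/mounts-style text."""
    target = mount_point.rstrip("/")
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass.
        if len(fields) >= 2 and fields[1].rstrip("/") == target:
            return True
    return False


def is_mounted_now(mount_point: str) -> bool:
    """Check the live mount table of the current host."""
    with open("/proc/mounts", encoding="utf-8") as handle:
        return is_mounted(mount_point, handle.read())
```

The expectation is that `is_mounted_now("/opt/l4d2/runtime/smoke/merged")` is true after `l4d2ctl start smoke` and false again after `l4d2ctl stop smoke`.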

### Step 6: Cleanup Decision

Purpose: preserve useful diagnostics or remove smoke-test state based on user preference.

Allowed actions after approval:

- Remove copied source archives or virtual environments.
- Remove smoke spec files.
- Leave `/opt/l4d2/installation` intact if useful for later web-app testing, or remove it if requested.

Checkpoint: report final target-host state.

## Failure Handling

Any failure stops the smoke-test flow immediately. The report must include:

- command that failed
- exit code if available
- relevant stdout and stderr
- likely category: environment issue, host-lib bug, packaging/deploy issue, or unclear
- recommended next action

No automatic destructive cleanup should happen after a failure. If a failure leaves `/opt/l4d2`, a mounted overlay, copied files, or a systemd service behind, preserve that state for inspection until the user approves cleanup.
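
The stop-on-first-failure rule and the report fields listed above can be sketched as follows. `FailureReport` and `run_until_failure` are hypothetical names; the category defaults to `unclear` because classifying a failure is left to whoever reads the output.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Optional

CATEGORIES = ("environment issue", "host-lib bug", "packaging/deploy issue", "unclear")


@dataclass
class FailureReport:
    command: List[str]
    exit_code: Optional[int]  # None when the command could not be started
    stdout: str
    stderr: str
    category: str = "unclear"
    next_action: str = "inspect preserved state before any cleanup"


def run_until_failure(commands: List[List[str]]) -> Optional[FailureReport]:
    """Run commands in order; return a report for the first failure, else None."""
    for argv in commands:
        try:
            done = subprocess.run(argv, capture_output=True, text=True, check=False)
        except OSError as exc:
            # The executable is missing or not runnable, so no exit code exists.
            return FailureReport(argv, None, "", str(exc))
        if done.returncode != 0:
            # Stop here: later commands are not run and nothing is cleaned up.
            return FailureReport(argv, done.returncode, done.stdout, done.stderr)
    return None
```

The function deliberately performs no cleanup of its own, which keeps any leftover mount, files, or service available for inspection.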

## Evidence Requirements

Each completed step should report fresh command evidence. Suitable evidence includes:

- exact commands run
- exit code or clear command success/failure status
- key stdout/stderr lines
- relevant filesystem paths
- service status summaries
- mount state
- journal/log snippets

No step should be called successful without current evidence from that step.

## Next Phase After Smoke Test

If the host-lib smoke test succeeds, continue with web-app lifecycle job wiring:

- enqueue lifecycle jobs from routes/UI
- run jobs through worker threads
- call `l4d2web.services.l4d2_facade`
- persist callback output to `job_logs`
- live-follow job logs through SSE
- update server desired and actual state

If the smoke test fails due to host-lib behavior, fix the host library before continuing web-app lifecycle work.