2026-05-05 23:47:06 +02:00


L4D2 Host Smoke Test Design

Goal: Validate the implemented l4d2host library and l4d2ctl CLI on the disposable Linux server ckn@10.0.4.128 before continuing with web-app lifecycle job wiring.

Target Host: ckn@10.0.4.128

Access Assumption: SSH access as ckn with sudo privileges.

Primary Constraint: Ask for explicit user approval before every server-touching step.

Context

The repository now contains both planned components:

  • l4d2host: Python host library and l4d2ctl CLI.
  • l4d2web: Flask app for users, blueprints, servers, jobs, and logs.

The web app depends on the host library for real lifecycle behavior. Before wiring web lifecycle jobs end-to-end, the host contract should be proven on an actual Linux machine with steamcmd, fuse-overlayfs, systemd user services, and journald available.

Scope

The smoke test verifies these host-lib behaviors:

  • SSH connectivity and sudo access to ckn@10.0.4.128.
  • Required runtime tools are present or can be installed: steamcmd, fuse-overlayfs, fusermount3, systemctl --user, journalctl --user, and Python packaging tooling.
  • /opt/l4d2 exists with permissions that allow the ckn user to run the v1 host workflow.
  • l4d2ctl install downloads or updates the L4D2 dedicated server into /opt/l4d2/installation.
  • l4d2ctl initialize smoke -f spec.yaml writes instance and runtime state under /opt/l4d2.
  • l4d2ctl start smoke mounts the runtime overlay, copies server.cfg, and starts the systemd user service.
  • get_instance_status("smoke") reports a status consistent with the observed systemd service and mount state.
  • stream_instance_logs("smoke") can read journald output.
  • l4d2ctl stop smoke stops the user service and unmounts the runtime overlay.
  • l4d2ctl delete smoke removes the instance/runtime directories.
  • Re-running l4d2ctl delete smoke succeeds as a no-op.

Out Of Scope

  • Web-app job execution or UI changes.
  • Long-running game-server operations beyond a short start/status/log/stop check.
  • Workshop mod management or web-managed overlay file content.
  • Production hardening for the disposable test server.

Execution Strategy

The smoke test is intentionally gated. Each step must stop after reporting evidence and wait for user approval before moving to the next step.
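A minimal sketch of the gate, assuming the operator answers at a terminal prompt. `run_gated` and its `(name, action)` step tuples are hypothetical helpers for illustration, not part of l4d2host. Only an explicit "y" advances, and a step's action never runs unless it was approved.

```python
from typing import Callable, Sequence


# Hypothetical helper, not part of l4d2host.
def run_gated(steps: Sequence[tuple[str, Callable[[], str]]],
              ask: Callable[[str], str] = input) -> list[str]:
    """Run each step only after explicit approval; stop at the first refusal.

    Each step is a (name, action) pair whose action returns an evidence string.
    The returned list holds the evidence reported by the approved steps.
    """
    reports: list[str] = []
    for name, action in steps:
        answer = ask(f"Approve step '{name}'? [y/N] ").strip().lower()
        if answer != "y":
            # Anything other than an explicit "y" halts the whole flow.
            print(f"Stopped before '{name}': approval not given.")
            break
        evidence = action()
        print(f"[{name}] evidence:\n{evidence}")
        reports.append(evidence)
    return reports
```

Injecting `ask` keeps the gate testable without a terminal. In real use the default `input` prompt is the checkpoint.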

Step 1: Read-Only Server Inspection

Purpose: understand the target host without changing it.

Allowed actions:

  • SSH into ckn@10.0.4.128.
  • Inspect OS, package manager, current user, sudo availability, Python version, systemd user availability, lingering status, existing /opt/l4d2 state, and relevant runtime tools.

Not allowed in this step:

  • Installing packages.
  • Creating or modifying files.
  • Starting or stopping services.
  • Mounting or unmounting filesystems.
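The Step 1 inspection could be scripted from the operator machine roughly as follows. This sketch assumes an OpenSSH client with key-based login (`BatchMode=yes` makes ssh fail rather than prompt). `READ_ONLY_PROBES`, `ssh_argv`, and `inspect` are hypothetical names. Every probe only reads state. `sudo -n true` reports whether sudo works without a password prompt and does not install or change anything.

```python
import subprocess

HOST = "ckn@10.0.4.128"

# Hypothetical probe list for Step 1; each command only reads state.
READ_ONLY_PROBES = [
    "cat /etc/os-release",
    "id",
    "sudo -n true && echo sudo-ok || echo sudo-needs-password",
    "python3 --version",
    "systemctl --user is-system-running || true",
    'loginctl show-user "$USER" --property=Linger',
    "ls -ld /opt/l4d2 || true",
    'for t in steamcmd fuse-overlayfs fusermount3 journalctl; do command -v "$t" || echo "missing: $t"; done',
]


def ssh_argv(host: str, remote_cmd: str) -> list[str]:
    # BatchMode=yes: fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", host, remote_cmd]


def inspect(host: str = HOST) -> None:
    # Print each probe with its exit code and output as Step 1 evidence.
    for probe in READ_ONLY_PROBES:
        proc = subprocess.run(ssh_argv(host, probe), capture_output=True, text=True)
        print(f"$ {probe}\n  exit={proc.returncode}\n{proc.stdout}{proc.stderr}")
```

Calling `inspect()` prints one evidence block per probe for the Step 1 report.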

Checkpoint: report findings and ask before any setup changes.

Step 2: Server Preparation

Purpose: make the disposable server capable of running the host-lib workflow.

Allowed actions after approval:

  • Install missing packages needed for the host workflow.
  • Create /opt/l4d2 if missing.
  • Set ownership/permissions so ckn can run the smoke workflow.
  • Configure systemd user prerequisites if required for systemctl --user.
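A dry-run-first sketch of Step 2, assuming a Debian/Ubuntu host with apt and a primary group named after the user. The package names `fuse-overlayfs`, `fuse3` (which provides fusermount3), and `python3-venv` are Debian/Ubuntu names. steamcmd is left out because its installation differs by distribution. `preparation_commands` and `prepare` are hypothetical helpers.

```python
import shlex
import subprocess


def preparation_commands(user: str, root: str = "/opt/l4d2") -> list[list[str]]:
    """Planned Step 2 commands (assumes Debian/Ubuntu and a group named after the user)."""
    return [
        ["sudo", "apt-get", "install", "-y", "fuse-overlayfs", "fuse3", "python3-venv"],
        ["sudo", "mkdir", "-p", root],
        ["sudo", "chown", f"{user}:{user}", root],
        # Lingering lets the user's systemd instance run without an active login
        # session, so a user service can outlive the SSH connection.
        ["sudo", "loginctl", "enable-linger", user],
    ]


def prepare(host: str, user: str, dry_run: bool = True) -> None:
    for argv in preparation_commands(user):
        remote = shlex.join(argv)
        if dry_run:
            # Default: only show what would run, so the plan can be approved first.
            print(f"[dry-run] ssh {host} {remote}")
            continue
        # -t allows a sudo password prompt; check=True stops at the first failure.
        subprocess.run(["ssh", "-t", host, remote], check=True)
```

Running the dry run first produces the exact command list to show at the Step 2 checkpoint.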

Checkpoint: report exact changes and ask before deploying code.

Step 3: Deploy Current Host Lib

Purpose: install the current repository implementation on the target host without inventing new packaging.

Allowed actions after approval:

  • Copy or archive the current l4d2host source to the server.
  • Install it using its existing pyproject.toml, preferably into an isolated virtual environment.
  • Verify that l4d2ctl --help exposes the fixed v1 command surface, including at least install, initialize, start, stop, and delete.
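One way to turn the help check into evidence. This sketch assumes the source was installed with `python3 -m venv ~/l4d2-venv` followed by `~/l4d2-venv/bin/pip install ~/l4d2host`. Both paths are hypothetical. `EXPECTED_COMMANDS` lists only the commands this spec names, and the real v1 surface may include more. The parser tolerates argparse-style `{install,start,...}` groups.

```python
import subprocess

# Commands named in this spec; the actual v1 surface may include others.
EXPECTED_COMMANDS = ("install", "initialize", "start", "stop", "delete")


def missing_commands(help_text: str, expected=EXPECTED_COMMANDS) -> list[str]:
    # Split on whitespace after removing argparse's brace/comma grouping.
    cleaned = help_text.replace(",", " ").replace("{", " ").replace("}", " ")
    words = set(cleaned.split())
    return [cmd for cmd in expected if cmd not in words]


def check_help(l4d2ctl: str = "l4d2ctl") -> list[str]:
    # Run on the target host, e.g. with the hypothetical ~/l4d2-venv/bin/l4d2ctl.
    out = subprocess.run([l4d2ctl, "--help"], capture_output=True, text=True, check=True).stdout
    return missing_commands(out)
```

An empty result from `check_help` is the evidence this step needs. Any names returned point to a packaging or entry-point problem.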

Checkpoint: report command evidence and ask before downloading server files.

Step 4: Run l4d2ctl install

Purpose: validate the install/update command against real steamcmd behavior.

Allowed actions after approval:

  • Run l4d2ctl install on the target host.
  • Capture stdout, stderr, and exit code.
  • Inspect /opt/l4d2/installation closely enough to confirm that the dedicated server files were installed there.
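A small capture helper for the stdout/stderr/exit-code evidence, as a sketch. `Evidence` and `capture` are hypothetical names. Unlike `check=True`, it records a non-zero exit instead of raising, so the failure can still be reported.

```python
from __future__ import annotations

import subprocess
import time
from dataclasses import dataclass


@dataclass
class Evidence:
    argv: list[str]
    exit_code: int
    stdout: str
    stderr: str
    seconds: float

    def summary(self, tail: int = 20) -> str:
        # Keep only the last `tail` lines; steamcmd output can be long.
        lines = (self.stdout.splitlines() + self.stderr.splitlines())[-tail:]
        header = f"$ {' '.join(self.argv)}\nexit={self.exit_code} ({self.seconds:.1f}s)"
        return "\n".join([header, *lines])


def capture(argv: list[str], timeout: float | None = None) -> Evidence:
    # Raises subprocess.TimeoutExpired if `timeout` (seconds) is exceeded.
    start = time.monotonic()
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return Evidence(argv, proc.returncode, proc.stdout, proc.stderr, time.monotonic() - start)
```

For example, `capture(["ssh", "ckn@10.0.4.128", "~/l4d2-venv/bin/l4d2ctl install"]).summary()` would produce the Step 4 report (the venv path is the hypothetical one used above).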

Checkpoint: report evidence and ask before creating a smoke instance.

Step 5: Run Instance Lifecycle Smoke Test

Purpose: validate initialize/start/status/logs/stop/delete against the real runtime.

Allowed actions after approval:

  • Create a minimal spec file for instance name smoke.
  • Run l4d2ctl initialize smoke -f spec.yaml.
  • Run l4d2ctl start smoke.
  • Check systemctl --user status l4d2@smoke.service.
  • Check mount state for /opt/l4d2/runtime/smoke/merged.
  • Call get_instance_status("smoke") from Python.
  • Call stream_instance_logs("smoke", lines=50, follow=False) from Python.
  • Run l4d2ctl stop smoke.
  • Run l4d2ctl delete smoke.
  • Run l4d2ctl delete smoke again to verify no-op success.
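The mount-state check can be made explicit by reading the mount table on the target host. This sketch assumes the overlay is visible in the same mount namespace as the checking process. fuse-overlayfs mounts normally appear with type `fuse.fuse-overlayfs`. `mount_fstype` and `is_overlay_mounted` are hypothetical helpers. The expected pattern is mounted after `l4d2ctl start smoke` and not mounted after `l4d2ctl stop smoke`.

```python
from __future__ import annotations

from pathlib import Path


def mount_fstype(mountpoint: str, mounts_text: str) -> str | None:
    """Return the filesystem type mounted at `mountpoint`, or None if not mounted."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts escapes spaces in paths as \040.
        if len(fields) >= 3 and fields[1].replace("\\040", " ") == mountpoint:
            return fields[2]
    return None


def is_overlay_mounted(instance: str, root: str = "/opt/l4d2") -> bool:
    # Run on the target host; path layout follows this spec's runtime/<name>/merged.
    mountpoint = f"{root}/runtime/{instance}/merged"
    return mount_fstype(mountpoint, Path("/proc/self/mounts").read_text()) is not None
```

The same check, repeated after the second `l4d2ctl delete smoke`, confirms that no overlay was left behind.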

Checkpoint: report command evidence and ask what to do with remaining artifacts.

Step 6: Cleanup Decision

Purpose: preserve useful diagnostics or remove smoke-test state based on user preference.

Allowed actions after approval:

  • Remove copied source archives or virtual environments.
  • Remove smoke spec files.
  • Leave /opt/l4d2/installation intact if useful for later web-app testing, or remove it if requested.

Checkpoint: report final target-host state.

Failure Handling

Any failure stops the smoke-test flow immediately. The report must include:

  • command that failed
  • exit code if available
  • relevant stdout and stderr
  • likely category: environment issue, host-lib bug, packaging/deploy issue, or unclear
  • recommended next action

No automatic destructive cleanup should happen after a failure. If a failure leaves /opt/l4d2 state, a mounted overlay, copied files, or a systemd user service behind, that state must be preserved for inspection until the user approves cleanup.
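The report fields above map onto a small structure. This is a sketch with hypothetical names (`FailureCategory`, `FailureReport`), and the category labels are taken from the list above. It only renders text and performs no cleanup, so any mounts, files, or services stay in place for inspection.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class FailureCategory(Enum):
    ENVIRONMENT = "environment issue"
    HOST_LIB_BUG = "host-lib bug"
    PACKAGING_DEPLOY = "packaging/deploy issue"
    UNCLEAR = "unclear"


@dataclass
class FailureReport:
    command: str
    exit_code: int | None  # None when no exit code was available
    stdout: str
    stderr: str
    category: FailureCategory
    next_action: str

    def render(self) -> str:
        # Rendering only: deliberately no cleanup side effects.
        code = "unavailable" if self.exit_code is None else str(self.exit_code)
        return "\n".join([
            f"Failed command: {self.command}",
            f"Exit code: {code}",
            f"stdout:\n{self.stdout.rstrip() or '(empty)'}",
            f"stderr:\n{self.stderr.rstrip() or '(empty)'}",
            f"Likely category: {self.category.value}",
            f"Recommended next action: {self.next_action}",
        ])
```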

Evidence Requirements

Each completed step should report fresh command evidence. Suitable evidence includes:

  • exact commands run
  • exit code or clear command success/failure status
  • key stdout/stderr lines
  • relevant filesystem paths
  • service status summaries
  • mount state
  • journal/log snippets

No step should be called successful without current evidence from that step.

Next Phase After Smoke Test

If the host-lib smoke test succeeds, continue with web-app lifecycle job wiring:

  • enqueue lifecycle jobs from routes/UI
  • run jobs through worker threads
  • call l4d2web.services.l4d2_facade
  • persist callback output to job_logs
  • live-follow job logs through SSE
  • update server desired and actual state
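A rough sketch of the queue-and-worker shape this list describes, under loud assumptions. `Server`, `Job`, `worker`, and `enqueue` are hypothetical. An in-memory dict stands in for the job_logs table. The lifecycle action is a stand-in callable taking an instance name and an output callback, because the real l4d2web.services.l4d2_facade signature is not specified here. SSE following is not shown.

```python
from __future__ import annotations

import queue
import threading
from dataclasses import dataclass
from typing import Callable

# Stand-in for a facade call: (instance name, output callback) -> None.
LifecycleAction = Callable[[str, Callable[[str], None]], None]


@dataclass
class Server:
    name: str
    desired_state: str = "stopped"
    actual_state: str = "stopped"


@dataclass
class Job:
    job_id: int
    server: Server
    target_state: str
    action: LifecycleAction


job_logs: dict[int, list[str]] = {}  # in-memory stand-in for the job_logs table
jobs: queue.Queue[Job | None] = queue.Queue()


def worker() -> None:
    # Process jobs until a None sentinel arrives.
    while (job := jobs.get()) is not None:
        lines = job_logs.setdefault(job.job_id, [])
        try:
            job.action(job.server.name, lines.append)  # persist each callback line
            job.server.actual_state = job.target_state
        except Exception as exc:
            lines.append(f"job failed: {exc}")
            job.server.actual_state = "error"
        finally:
            jobs.task_done()
    jobs.task_done()  # account for the sentinel


def enqueue(job: Job) -> None:
    # The route records intent (desired state) before the worker acts.
    job.server.desired_state = job.target_state
    jobs.put(job)
```

Separating desired from actual state lets the UI show a pending transition while the worker thread is still running the job.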

If the smoke test fails due to host-lib behavior, fix the host library before continuing web-app lifecycle work.