Two TDD tasks: helper+service_control verb rename, then poller code + wiring + tests. Operator-side smoke test in F.3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# L4D2 Server Lifecycle: Reboot-Safe + Drift Reconciliation Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Make L4D2 server instances survive a host reboot (Part A) and converge `Server.actual_state` to systemd reality every ~30s for out-of-band drift (Part B).

**Architecture:** Helper script + `service_control.py` switch from `systemctl start/stop` to `systemctl enable --now / disable --now`. A new background thread spawned with the job workers polls every server's status periodically and writes the result via the existing `refresh_server_actual_state()` path. Skip servers with in-flight jobs to avoid racing with the post-job refresh.

**Tech Stack:** bash helper script + sudoers; Python `subprocess` via `l4d2host.service_control.systemctl_command`; SQLAlchemy via `session_scope()`; threading; pytest.

**Spec:** `docs/superpowers/specs/2026-05-09-l4d2-server-lifecycle-reboot-and-drift-design.md`

---

## File Structure

Files to modify (Part A, lifecycle verb change):

- `deploy/files/usr/local/libexec/left4me/left4me-systemctl`: accept verbs `enable`/`disable`/`show` (drop `start`/`stop`).
- `l4d2host/service_control.py`: rename `start_service` → `enable_service`, `stop_service` → `disable_service`. Action tokens become `"enable"` / `"disable"`.
- `l4d2host/instances.py`: call `enable_service` from `start_instance`; call `disable_service` from `stop_instance` and `_purge_instance`.
- `l4d2host/tests/test_lifecycle.py`: update mock-call expectations.
- `l4d2host/tests/test_service_control.py`: new file with direct unit tests for `enable_service` / `disable_service`.
- `deploy/tests/test_deploy_artifacts.py::test_systemctl_helper_passes_shell_syntax_check_and_rejects_bad_args`: update the verb assertions.

Files to modify (Part B, poller):

- `l4d2web/services/job_worker.py`: add `start_state_poller`, `state_poller_loop`, `poll_all_servers`.
- `l4d2web/app.py`: call `start_state_poller(app)` next to `start_job_workers(app)`.
- `l4d2web/config.py`: default `STATE_POLLER_INTERVAL_SECONDS = 30`.
- `l4d2web/tests/test_job_worker.py`: four new tests for the poller.

No host-library, web-app facade, or CLI surface signatures change. The `l4d2ctl start <name>` / `l4d2ctl stop <name>` commands keep their names (per `AGENTS.md`).

---

## Pre-flight

- [ ] **Step 0a: Verify clean working tree**

Run: `git status`
Expected: `nothing to commit, working tree clean`

- [ ] **Step 0b: Verify the existing test suite is at the known-good baseline**

Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/ l4d2host/tests l4d2web/tests -q`
Expected: 460 passed, 1 failed (the pre-existing unrelated `test_deploy_script_has_safe_defaults_and_preserves_state`), 2 skipped.

If the count differs, stop and surface the discrepancy; this plan assumes that exact baseline.

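
The baseline comparison in Step 0b can be made mechanical. The following is a minimal sketch, assuming the last line of `pytest -q` output has the usual `N failed, M passed, K skipped in X.XXs` shape; `parse_summary`, `matches_baseline`, and `BASELINE` are hypothetical names for illustration and do not exist in the repository.

```python
import re

# Baseline counts quoted in Step 0b. The names here are illustrative only.
BASELINE = {"passed": 460, "failed": 1, "skipped": 2}


def parse_summary(line: str) -> dict[str, int]:
    """Extract passed/failed/skipped counts from a pytest summary line."""
    counts = {"passed": 0, "failed": 0, "skipped": 0}
    for number, outcome in re.findall(r"(\d+) (passed|failed|skipped)", line):
        counts[outcome] = int(number)
    return counts


def matches_baseline(line: str) -> bool:
    """True only when all three counts equal the expected baseline."""
    return parse_summary(line) == BASELINE
```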
---

## Task 1 (Part A): Switch lifecycle verbs to `enable --now` / `disable --now`

This task changes the helper script, the Python wrapper, and the instance lifecycle in one cohesive commit. The change is an end-to-end vertical slice: splitting it across commits would leave broken intermediate states (helper accepting verbs that no caller uses, or callers using verbs the helper rejects).

**Files:**
- Modify: `deploy/files/usr/local/libexec/left4me/left4me-systemctl`
- Modify: `l4d2host/service_control.py`
- Modify: `l4d2host/instances.py`
- Modify: `l4d2host/tests/test_lifecycle.py`
- Create: `l4d2host/tests/test_service_control.py`
- Modify: `deploy/tests/test_deploy_artifacts.py`

### Step 1.1: Update the deploy artifact test for the helper

Open `deploy/tests/test_deploy_artifacts.py`. Find `test_systemctl_helper_passes_shell_syntax_check_and_rejects_bad_args`.

Replace the assertions that check the helper's case-statement bodies. Currently the test asserts something like:

```python
assert 'start) exec "$systemctl" start "$unit"' in script
assert 'stop) exec "$systemctl" stop "$unit"' in script
```

Update to:

```python
assert 'enable)' in script
assert 'enable --now' in script
assert 'disable)' in script
assert 'disable --now' in script
```

Keep the `--property=ActiveState` and `--property=SubState` assertions for the `show` action (unchanged).

The rejected-action examples list (currently includes things like `["bad/action", "alpha"]`) is unchanged; those are still bad. If the test currently asserts that `start` and `stop` are accepted (e.g., a positive case), drop those assertions, because `start`/`stop` are now rejected verbs, not accepted ones.

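
The substring assertions above confirm that the new verbs are present, but not that the old ones are gone. A stricter check could extract the case labels and compare them as a set. This is only a sketch under assumptions: `SCRIPT` is a stand-in excerpt rather than the real helper, `case_labels` is a hypothetical function, and the regex assumes one branch per line with lowercase word labels.

```python
import re

# Stand-in for the helper's case statement; the real script has more around it,
# and the body of the `*)` branch is deliberately left elided here.
SCRIPT = '''
case "$action" in
  enable) exec "$systemctl" enable --now "$unit" ;;
  disable) exec "$systemctl" disable --now "$unit" ;;
  show) exec "$systemctl" show "$unit" --property=ActiveState --property=SubState ;;
  *) ... ;;
esac
'''


def case_labels(script: str) -> set[str]:
    """Return the word labels of case branches; the `*)` default is ignored."""
    return set(re.findall(r"^[ \t]*([a-z]+)\)", script, flags=re.MULTILINE))


labels = case_labels(SCRIPT)
```

Comparing `labels` against `{"enable", "disable", "show"}` would fail if a `start)` or `stop)` branch were left behind, which the substring checks alone would not notice.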
### Step 1.2: Run the updated artifact test to verify it fails

Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_systemctl_helper_passes_shell_syntax_check_and_rejects_bad_args -v`
Expected: FAIL, because the helper script still has `start)`/`stop)` cases, not `enable)`/`disable)`.

### Step 1.3: Edit the helper script

Open `deploy/files/usr/local/libexec/left4me/left4me-systemctl`. Find the case-statement (currently around lines 24–27). Replace:

```sh
case "$action" in
  start) exec "$systemctl" start "$unit" ;;
  stop) exec "$systemctl" stop "$unit" ;;
  show) exec "$systemctl" show "$unit" --property=ActiveState --property=SubState ;;
  *) ...
esac
```

with:

```sh
case "$action" in
  enable) exec "$systemctl" enable --now "$unit" ;;
  disable) exec "$systemctl" disable --now "$unit" ;;
  show) exec "$systemctl" show "$unit" --property=ActiveState --property=SubState ;;
  *) ...
esac
```

Keep the rest of the script (shebang, name validation, `*)` reject-and-exit branch) unchanged. The exact form of the `*)` reject case in the existing helper should be preserved.

### Step 1.4: Verify the helper script still parses

Run: `sh -n deploy/files/usr/local/libexec/left4me/left4me-systemctl`
Expected: exit 0, no output.

### Step 1.5: Run the artifact test, verify it passes

Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/test_deploy_artifacts.py::test_systemctl_helper_passes_shell_syntax_check_and_rejects_bad_args -v`
Expected: PASS.

### Step 1.6: Update `service_control.py`

Open `l4d2host/service_control.py`. Replace:

```python
def start_service(
    name: str,
    *,
    on_stdout: Callable[[str], None] | None = None,
    on_stderr: Callable[[str], None] | None = None,
    passthrough: bool = False,
    should_cancel: Callable[[], bool] | None = None,
) -> CommandResult:
    return run_command(
        systemctl_command("start", name),
        on_stdout=on_stdout,
        on_stderr=on_stderr,
        passthrough=passthrough,
        should_cancel=should_cancel,
    )


def stop_service(
    name: str,
    *,
    on_stdout: Callable[[str], None] | None = None,
    on_stderr: Callable[[str], None] | None = None,
    passthrough: bool = False,
    should_cancel: Callable[[], bool] | None = None,
) -> CommandResult:
    return run_command(
        systemctl_command("stop", name),
        on_stdout=on_stdout,
        on_stderr=on_stderr,
        passthrough=passthrough,
        should_cancel=should_cancel,
    )
```

with:

```python
def enable_service(
    name: str,
    *,
    on_stdout: Callable[[str], None] | None = None,
    on_stderr: Callable[[str], None] | None = None,
    passthrough: bool = False,
    should_cancel: Callable[[], bool] | None = None,
) -> CommandResult:
    return run_command(
        systemctl_command("enable", name),
        on_stdout=on_stdout,
        on_stderr=on_stderr,
        passthrough=passthrough,
        should_cancel=should_cancel,
    )


def disable_service(
    name: str,
    *,
    on_stdout: Callable[[str], None] | None = None,
    on_stderr: Callable[[str], None] | None = None,
    passthrough: bool = False,
    should_cancel: Callable[[], bool] | None = None,
) -> CommandResult:
    return run_command(
        systemctl_command("disable", name),
        on_stdout=on_stdout,
        on_stderr=on_stderr,
        passthrough=passthrough,
        should_cancel=should_cancel,
    )
```

`show_service`, `stream_command`, `stream_journal`, and the `systemctl_command` / `journalctl_command` helpers are unchanged.

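
For orientation, the argv that `systemctl_command` is expected to produce can be inferred from the test expectations elsewhere in this plan (`["sudo", "-n", <helper>, <action>, <name>]`). The sketch below is an assumption about its shape, not a copy of the real function, which this plan leaves untouched; in particular the `ALLOWED_ACTIONS` guard is hypothetical and only mirrors the verbs the helper script accepts after Step 1.3.

```python
# Helper path as quoted in this plan; the real module exposes it as SYSTEMCTL_HELPER.
SYSTEMCTL_HELPER = "/usr/local/libexec/left4me/left4me-systemctl"

# Hypothetical guard mirroring the helper's accepted verbs after Part A.
ALLOWED_ACTIONS = frozenset({"enable", "disable", "show"})


def systemctl_command(action: str, name: str) -> list[str]:
    """Build the sudo argv for the helper; rejects verbs the helper would refuse."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported helper action: {action!r}")
    return ["sudo", "-n", SYSTEMCTL_HELPER, action, name]
```

With this shape, `enable_service("instance-7")` would hand `run_command` the list ending in `"enable", "instance-7"`, which is what the direct unit tests in this task assert.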
### Step 1.7: Update `instances.py` to call the new names

Open `l4d2host/instances.py`. Replace the import:

```python
from l4d2host.service_control import start_service, stop_service
```

with:

```python
from l4d2host.service_control import disable_service, enable_service
```

Inside `start_instance`, find the `start_service(...)` call (around line 137 in current source) and replace with `enable_service(...)`. Inside `stop_instance` (line 159) and `_purge_instance` (line 194), replace `stop_service(...)` with `disable_service(...)`. Keep all keyword arguments identical; only the function name changes.

### Step 1.8: Update `test_lifecycle.py`

Open `l4d2host/tests/test_lifecycle.py`. Search for every assertion that references the `start` or `stop` action token in mock-call expectations against `service_control.run_command` or `systemctl_command`. The tests typically look for argument lists like `["sudo", "-n", "/usr/local/libexec/left4me/left4me-systemctl", "start", "<name>"]`.

Update each occurrence:
- `"start"` → `"enable"` (in the `start_instance` test paths)
- `"stop"` → `"disable"` (in the `stop_instance`, `delete_instance`, `reset_instance`, and `_purge_instance` test paths)

Some tests may import `start_service` / `stop_service` directly. Update those imports to `enable_service` / `disable_service`.

### Step 1.9: Create direct unit tests for `enable_service` / `disable_service`

Create `l4d2host/tests/test_service_control.py` with:

```python
from unittest.mock import patch

from l4d2host.service_control import (
    SYSTEMCTL_HELPER,
    disable_service,
    enable_service,
)


@patch("l4d2host.service_control.run_command")
def test_enable_service_invokes_helper_with_enable_action(mock_run):
    enable_service("instance-7")
    args, _ = mock_run.call_args
    assert args[0] == ["sudo", "-n", SYSTEMCTL_HELPER, "enable", "instance-7"]


@patch("l4d2host.service_control.run_command")
def test_disable_service_invokes_helper_with_disable_action(mock_run):
    disable_service("instance-7")
    args, _ = mock_run.call_args
    assert args[0] == ["sudo", "-n", SYSTEMCTL_HELPER, "disable", "instance-7"]
```

### Step 1.10: Run the host-library tests

Run: `cd /Users/mwiegand/Projekte/left4me && pytest l4d2host/tests -q`
Expected: all green (110 or 111 passing depending on whether `test_service_control.py` already existed; `+2` from the new direct tests).

If anything is red, fix the test expectations, not the implementation; the implementation matches the spec exactly. The most likely failure mode is a test in `test_lifecycle.py` that you missed updating, so search for any remaining string literal `"start"` or `"stop"` in helper-arg-list contexts.

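
The search for leftover verbs can be scripted. The following is a heuristic sketch, assuming the old verb appears on the same line directly after either the helper path literal or the `SYSTEMCTL_HELPER` constant; `STALE` and `stale_verbs` are hypothetical names, and argument lists split across lines would slip past this pattern.

```python
import re

# Matches an argv literal that still passes an old verb right after the helper,
# e.g. [..., "left4me-systemctl", "start", ...] or [..., SYSTEMCTL_HELPER, "stop", ...].
STALE = re.compile(
    r'(?:left4me-systemctl["\']|SYSTEMCTL_HELPER),\s*["\'](start|stop)["\']'
)


def stale_verbs(source: str) -> list[tuple[int, str]]:
    """Return (line number, verb) for every line that still uses start/stop."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = STALE.search(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits
```

Running it over the text of `l4d2host/tests/test_lifecycle.py` (for instance via `pathlib.Path(...).read_text()`) and expecting an empty list would be one way to confirm the rename is complete.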
### Step 1.11: Run the deploy artifact test suite

Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/ -q`
Expected: 36 passed, 1 failed (the pre-existing unrelated test).

### Step 1.12: Commit

```bash
git add deploy/files/usr/local/libexec/left4me/left4me-systemctl \
    l4d2host/service_control.py l4d2host/instances.py \
    l4d2host/tests/test_lifecycle.py \
    l4d2host/tests/test_service_control.py \
    deploy/tests/test_deploy_artifacts.py
git commit -m "$(cat <<'EOF'
feat(l4d2-host): server lifecycle uses systemctl enable --now / disable --now

Servers started via the web UI now create a WantedBy= symlink under
multi-user.target.wants/, so they auto-start on the next host reboot.
Helper verbs renamed start/stop -> enable/disable; service_control.py
renamed start_service/stop_service -> enable_service/disable_service.
The user-facing l4d2ctl start/stop commands keep their names per the
AGENTS.md contract — only the implementation changes. Spec:
docs/superpowers/specs/2026-05-09-l4d2-server-lifecycle-reboot-and-drift-design.md
EOF
)"
```

---

## Task 2 (Part B): Periodic state poller

This task adds the poller code, wires it into the Flask startup, exposes its config knob, and tests four behaviors. One cohesive commit.

**Files:**
- Modify: `l4d2web/services/job_worker.py`
- Modify: `l4d2web/app.py`
- Modify: `l4d2web/config.py`
- Modify: `l4d2web/tests/test_job_worker.py`

### Step 2.1: Add the failing tests

Open `l4d2web/tests/test_job_worker.py`. Append after the existing tests:

```python
def test_state_poller_refreshes_each_server(app, monkeypatch):
    from l4d2web.services import job_worker as jw

    with app.app_context():
        from l4d2web.db import session_scope
        from l4d2web.models import Server
        with session_scope() as db:
            db.add_all([
                Server(id=11, name="alpha", port=27015, blueprint_id=None,
                       desired_state="running", actual_state="unknown"),
                Server(id=12, name="beta", port=27016, blueprint_id=None,
                       desired_state="running", actual_state="unknown"),
            ])

    refreshed = []
    monkeypatch.setattr(jw, "refresh_server_actual_state", lambda sid: refreshed.append(sid))

    with app.app_context():
        jw.poll_all_servers()

    assert sorted(refreshed) == [11, 12]


def test_state_poller_skips_servers_with_inflight_jobs(app, monkeypatch):
    from l4d2web.services import job_worker as jw

    with app.app_context():
        from l4d2web.db import session_scope
        from l4d2web.models import Job, Server
        with session_scope() as db:
            db.add(Server(id=21, name="gamma", port=27017, blueprint_id=None,
                          desired_state="running", actual_state="running"))
            db.add(Job(server_id=21, operation="stop", state="running"))

    refreshed = []
    monkeypatch.setattr(jw, "refresh_server_actual_state", lambda sid: refreshed.append(sid))

    with app.app_context():
        jw.poll_all_servers()

    assert refreshed == []


def test_state_poller_swallows_per_server_exceptions(app, monkeypatch):
    from l4d2web.services import job_worker as jw

    with app.app_context():
        from l4d2web.db import session_scope
        from l4d2web.models import Server
        with session_scope() as db:
            db.add_all([
                Server(id=31, name="bad", port=27018, blueprint_id=None,
                       desired_state="running", actual_state="unknown"),
                Server(id=32, name="good", port=27019, blueprint_id=None,
                       desired_state="running", actual_state="unknown"),
            ])

    refreshed = []

    def fake_refresh(sid):
        if sid == 31:
            raise RuntimeError("simulated host failure")
        refreshed.append(sid)

    monkeypatch.setattr(jw, "refresh_server_actual_state", fake_refresh)

    with app.app_context():
        jw.poll_all_servers()  # must not raise

    assert refreshed == [32]


def test_state_poller_disabled_when_job_workers_disabled(monkeypatch):
    """create_app must not spawn the poller thread when JOB_WORKER_ENABLED=False."""
    import threading

    from l4d2web.app import create_app

    spawned = []
    real_thread_init = threading.Thread.__init__

    def tracking_init(self, *args, **kwargs):
        if kwargs.get("name") == "left4me-state-poller":
            spawned.append(True)
        real_thread_init(self, *args, **kwargs)

    monkeypatch.setattr(threading.Thread, "__init__", tracking_init)
    create_app({"TESTING": True, "JOB_WORKER_ENABLED": False})
    assert not spawned
```

(The tests assume the existing `app` fixture from `conftest.py`. If your project uses a different fixture name, adjust accordingly. The polling tests run `poll_all_servers()` synchronously to avoid testing the loop's `time.sleep`.)

### Step 2.2: Run the new tests, verify they fail

Run: `cd /Users/mwiegand/Projekte/left4me && pytest l4d2web/tests/test_job_worker.py::test_state_poller_refreshes_each_server l4d2web/tests/test_job_worker.py::test_state_poller_skips_servers_with_inflight_jobs l4d2web/tests/test_job_worker.py::test_state_poller_swallows_per_server_exceptions l4d2web/tests/test_job_worker.py::test_state_poller_disabled_when_job_workers_disabled -v`
Expected: FAIL for the three `poll_all_servers` tests, because `poll_all_servers` and `start_state_poller` don't exist yet. The fourth test (`test_state_poller_disabled_when_job_workers_disabled`) may already pass at this point, since nothing spawns a thread named `left4me-state-poller` yet.

### Step 2.3: Add the poller code to `job_worker.py`

Open `l4d2web/services/job_worker.py`. Add at the bottom of the file:

```python
def start_state_poller(app):
    interval = float(app.config.get("STATE_POLLER_INTERVAL_SECONDS", 30))
    thread = threading.Thread(
        target=state_poller_loop,
        args=(app, interval),
        daemon=True,
        name="left4me-state-poller",
    )
    thread.start()


def state_poller_loop(app, interval: float) -> None:
    while True:
        try:
            with app.app_context():
                poll_all_servers()
        except Exception:
            pass
        time.sleep(interval)


def poll_all_servers() -> None:
    with session_scope() as db:
        active_server_ids = set(db.scalars(
            select(Job.server_id).where(Job.state.in_(("queued", "running")))
        ).all())
        server_ids = [
            sid for sid in db.scalars(select(Server.id)).all()
            if sid not in active_server_ids
        ]
    for sid in server_ids:
        try:
            refresh_server_actual_state(sid)
        except Exception:
            pass
```

`Server`, `Job`, `select`, `session_scope`, `threading`, `time`, and `refresh_server_actual_state` should already be imported in this file. Verify by scanning the existing imports; if any are missing (unlikely for `select`/`Server`/`Job` since the worker uses them), add them.

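
The two behaviors that matter in `poll_all_servers` (skip servers with in-flight jobs, and keep going when one refresh fails) can be illustrated without a database. This is a simplified model under assumptions, not the implementation: jobs are plain dicts, `select_pollable` and `poll` are hypothetical names, and unlike the code above the sketch returns the failures it swallowed so they can be inspected.

```python
def select_pollable(server_ids, jobs):
    """Return ids of servers that have no queued or running job."""
    busy = {job["server_id"] for job in jobs if job["state"] in ("queued", "running")}
    return [sid for sid in server_ids if sid not in busy]


def poll(server_ids, jobs, refresh):
    """Refresh every pollable server; one failing server must not stop the sweep."""
    failures = []
    for sid in select_pollable(server_ids, jobs):
        try:
            refresh(sid)
        except Exception as exc:  # broad on purpose, mirroring poll_all_servers
            failures.append((sid, repr(exc)))
    return failures
```

Only the `queued` and `running` states count as in-flight here, matching the `Job.state.in_(...)` filter; a server whose latest job already finished is polled normally.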
### Step 2.4: Wire the poller into `create_app`

Open `l4d2web/app.py`. Find the existing `start_job_workers(app)` call (around line 91, inside the `if should_start_workers:` block). Add `start_state_poller(app)` immediately after it:

```python
if should_start_workers:
    recover_stale_jobs()
    start_job_workers(app)
    start_state_poller(app)
```

Also update the import:

```python
from l4d2web.services.job_worker import (
    recover_stale_jobs,
    start_job_workers,
    start_state_poller,
)
```

(If the existing import is single-line `from ... import recover_stale_jobs, start_job_workers`, just add `start_state_poller` to the list.)

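
Because the poller thread is given a fixed name, its presence can be checked from inside the process. The sketch below is illustrative only: `poller_threads` is a hypothetical helper, and the demo thread is a stand-in that merely waits on an event rather than the real `state_poller_loop`.

```python
import threading

# Thread name used by start_state_poller in this plan.
POLLER_THREAD_NAME = "left4me-state-poller"


def poller_threads() -> list[threading.Thread]:
    """Return all live threads carrying the poller's name."""
    return [t for t in threading.enumerate() if t.name == POLLER_THREAD_NAME]


# Stand-in for the real poller: a daemon thread that blocks until told to stop.
stop = threading.Event()
demo = threading.Thread(target=stop.wait, daemon=True, name=POLLER_THREAD_NAME)
demo.start()
running = len(poller_threads())  # one live thread with the poller's name
stop.set()
demo.join()
```

Since `start_state_poller` as written has no guard against repeated calls, a count above one would presumably indicate that `create_app` ran more than once in the same process.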
### Step 2.5: Add the config default

Open `l4d2web/config.py`. Find the dict literal that contains other defaults like `JOB_WORKER_THREADS`, `PORT_RANGE_START`, etc. Add:

```python
"STATE_POLLER_INTERVAL_SECONDS": 30,
```

In the env-var-loading section (where `LEFT4ME_PORT_RANGE_START` etc. are read), add:

```python
"STATE_POLLER_INTERVAL_SECONDS": float(os.getenv("LEFT4ME_STATE_POLLER_INTERVAL_SECONDS", "30")),
```

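
The one-line `float(os.getenv(...))` form accepts any number, including zero or a negative value. If stricter parsing is wanted, something along these lines could be used. It is a sketch, not part of the plan: `poller_interval` is a hypothetical helper, and rejecting non-positive values is an assumption motivated by `time.sleep` raising `ValueError` for negative arguments and a zero interval turning the loop into a busy poll.

```python
import os


def poller_interval(environ=os.environ, default: float = 30.0) -> float:
    """Read the poll interval from the environment, falling back to the default."""
    raw = environ.get("LEFT4ME_STATE_POLLER_INTERVAL_SECONDS")
    if raw is None or raw.strip() == "":
        return default
    value = float(raw)  # raises ValueError for non-numeric input
    if value <= 0:
        raise ValueError("STATE_POLLER_INTERVAL_SECONDS must be positive")
    return value
```

Passing the environment as a parameter keeps the function easy to test with a plain dict instead of patching `os.environ`.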
### Step 2.6: Run the four new tests, verify they pass

Run: `cd /Users/mwiegand/Projekte/left4me && pytest l4d2web/tests/test_job_worker.py::test_state_poller_refreshes_each_server l4d2web/tests/test_job_worker.py::test_state_poller_skips_servers_with_inflight_jobs l4d2web/tests/test_job_worker.py::test_state_poller_swallows_per_server_exceptions l4d2web/tests/test_job_worker.py::test_state_poller_disabled_when_job_workers_disabled -v`
Expected: PASS for all four.

### Step 2.7: Run the full web test suite

Run: `cd /Users/mwiegand/Projekte/left4me && pytest l4d2web/tests -q`
Expected: 317 passed, 1 skipped (313 + 4 new tests).

### Step 2.8: Commit

```bash
git add l4d2web/services/job_worker.py l4d2web/app.py l4d2web/config.py l4d2web/tests/test_job_worker.py
git commit -m "$(cat <<'EOF'
feat(l4d2-web): periodic state poller refreshes Server.actual_state

A background thread spawned alongside the job workers polls every
server's status every STATE_POLLER_INTERVAL_SECONDS (default 30) and
writes the result via the existing refresh_server_actual_state path.
Servers with in-flight jobs are skipped to avoid racing the post-job
refresh. Catches reboot drift, OOM kills, manual systemctl operations,
and any other out-of-band state change. Spec:
docs/superpowers/specs/2026-05-09-l4d2-server-lifecycle-reboot-and-drift-design.md
EOF
)"
```

---

## Final Verification

- [ ] **Step F.1: Full test sweep**

Run: `cd /Users/mwiegand/Projekte/left4me && pytest deploy/tests/ l4d2host/tests l4d2web/tests -q`
Expected: ~466 passed, 1 failed (the pre-existing unrelated `test_deploy_script_has_safe_defaults_and_preserves_state`), 2 skipped.

- [ ] **Step F.2: Working tree clean and commit shape**

Run: `git status && git log --oneline -5`
Expected:
- `git status`: clean.
- Top of `git log`:
  1. `feat(l4d2-web): periodic state poller refreshes Server.actual_state`
  2. `feat(l4d2-host): server lifecycle uses systemctl enable --now / disable --now`
  3. `docs(plans): l4d2 server lifecycle reboot-and-drift — implementation plan`
  4. `docs(specs): l4d2 server lifecycle reboot-and-drift — design`

- [ ] **Step F.3: Operator-side smoke test (deferred, not part of this plan)**

End-to-end on `ckn@10.0.4.128` after deploy:

```sh
deploy/deploy-test-server.sh ckn@10.0.4.128

# Confirm the helper now drives enable/disable
ssh ckn@10.0.4.128 'cat /usr/local/libexec/left4me/left4me-systemctl | grep -E "enable|disable"'
# expect: enable) exec "$systemctl" enable --now "$unit"
#         disable) exec "$systemctl" disable --now "$unit"

# Click "start" in the web UI for a server. Then:
ssh ckn@10.0.4.128 'systemctl is-enabled left4me-server@1.service'
# expect: enabled

# Reboot the host:
ssh ckn@10.0.4.128 'sudo systemctl reboot'
# wait for it to come back, then:
ssh ckn@10.0.4.128 'systemctl is-active left4me-server@1.service && pgrep -fa srcds'
# expect: active, srcds running with no UI intervention

# Confirm the poller corrects out-of-band drift
ssh ckn@10.0.4.128 'sudo systemctl disable --now left4me-server@1.service'
# Within ~30s the web UI's actual_state for server 1 flips from "running" to "stopped".
ssh ckn@10.0.4.128 'sudo -u left4me /opt/left4me/.venv/bin/python -c "
import sqlite3
c = sqlite3.connect(\"/var/lib/left4me/left4me.db\")
print(c.execute(\"SELECT id, actual_state, actual_state_updated_at FROM servers WHERE id=1\").fetchone())
"'
# expect: actual_state='stopped' with a fresh updated_at.
```

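
Rather than querying once and hoping the poller has already run, the drift check could wait for the state to flip. The following is a minimal sketch, assuming the `servers` table and `actual_state` column used in the query above; `wait_for_state` and its timeout defaults are hypothetical and chosen only to cover a bit more than one 30-second poll cycle.

```python
import sqlite3
import time


def wait_for_state(conn, server_id, expected, timeout=45.0, interval=1.0):
    """Poll the servers table until actual_state equals `expected` or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        row = conn.execute(
            "SELECT actual_state FROM servers WHERE id = ?", (server_id,)
        ).fetchone()
        if row is not None and row[0] == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example against the deployed database path from the smoke test (not run here):
# conn = sqlite3.connect("/var/lib/left4me/left4me.db")
# print(wait_for_state(conn, 1, "stopped"))
```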
---

## Out of Scope (do NOT implement here)

- Auto-restart on `desired_state=running && actual_state=stopped`.
- UI banners for stale-state warnings.
- Reconciliation of orphan systemd units.
- Per-server poll intervals.
- Replacing `Restart=on-failure`.
- Touching the pre-existing red test (`test_deploy_script_has_safe_defaults_and_preserves_state`).

If you find yourself touching any of these, stop; they belong in a separate spec.