left4me/deploy/README.md
mwiegand e5126c8c0b
docs(deploy): tighten perf-tuning escape hatches
- RT example: add AmbientCapabilities=CAP_SYS_NICE so the User=left4me
  service can actually enter SCHED_FIFO on Trixie.
- CPU governor: note that linux-cpupower may need apt install.
- CPUAffinity=2: clarify that per-instance values typically increment.
- NIC tuning: note that ethtool may need apt install.
2026-05-09 10:15:45 +02:00

141 lines
6.8 KiB
Markdown

# left4me Deployment
This directory contains the production-like test deployment for a Linux server. It installs the repository into a fixed host layout, configures a dedicated runtime user, installs systemd units, and wires the web app to host operations through privileged helper commands.
## Target Layout
The deployment uses these paths:
- `/etc/left4me/host.env`: host library environment configuration.
- `/etc/left4me/web.env`: web app environment configuration.
- `/opt/left4me/.venv`: Python virtual environment for deployed commands.
- `/opt/left4me`: deployed repository contents.
- `/var/lib/left4me/left4me.db`: SQLite database used by the web app.
- `/var/lib/left4me/installation`: shared L4D2 installation.
- `/var/lib/left4me/overlays`: overlay directories. Each overlay lives at `${overlay_id}` under here.
- `/var/lib/left4me/workshop_cache`: deduplicated cache of `.vpk` files downloaded for workshop overlays. One file per Steam item, named `{steam_id}.vpk`. Workshop overlays symlink into this tree.
- `/var/lib/left4me/global_overlay_cache`: cache of non-Steam map archives and extracted `.vpk` files used by managed global map overlays.
- `/var/lib/left4me/instances`: rendered instance specifications and per-instance state.
- `/var/lib/left4me/runtime`: per-instance runtime mount directories.
- `/var/lib/left4me/tmp`: temporary files used by deployment/runtime operations.
- `/usr/local/lib/systemd/system`: global systemd unit files, including `left4me-server@.service`.
- `/usr/local/libexec/left4me`: privileged helper commands, including `left4me-systemctl`, `left4me-journalctl`, and `left4me-overlay` (the latter mounts the per-instance kernel overlay in PID 1's mount namespace via `nsenter`).
- `/etc/sudoers.d/left4me`: sudoers rules allowing the web/runtime commands to call the helpers non-interactively.
Static units are generated for `/var/lib/left4me`. If `LEFT4ME_ROOT` changes, regenerate and reinstall the unit files instead of reusing the existing static units.
## Runtime User
The deployment creates and runs host operations as the dedicated runtime user:
- Username: `left4me`
- Home: `/var/lib/left4me`
- Shell: `/usr/sbin/nologin`
## Running A Test Deployment
Run the deployment from the repository root:
```bash
deploy/deploy-test-server.sh deploy-user@example-host
```
The SSH user must be able to run `sudo` on the target host. The deployment configures system packages, directories, environment files, helper scripts, sudoers rules, Python dependencies, and systemd units.
## Admin Bootstrap
Set the bootstrap credentials in the environment when creating the first admin user:
```bash
LEFT4ME_ADMIN_USERNAME=admin \
LEFT4ME_ADMIN_PASSWORD='change-me' \
flask create-user "$LEFT4ME_ADMIN_USERNAME" --admin
```
Use a strong one-time password and rotate it after first login if needed.
## Overlay References
Overlay references are relative paths below `${LEFT4ME_ROOT}/overlays`. With the default deployment root, they resolve under `/var/lib/left4me/overlays`. New overlays use `${overlay_id}` as their path; the digit-only form is the only one created by the web app.
Invalid references are rejected:
- Absolute paths such as `/srv/overlay`.
- Parent traversal such as `../other` or `competitive/../../base`.
- Empty path components such as `competitive//base`.
- Symlink escapes that resolve outside `${LEFT4ME_ROOT}/overlays`.
The web app currently supports two overlay surfaces:
- `workshop` overlays (user-owned) — populated by downloading `.vpk` files from the public Steam Web API into `${LEFT4ME_ROOT}/workshop_cache/{steam_id}.vpk` and creating absolute symlinks under `${LEFT4ME_ROOT}/overlays/{overlay_id}/left4dead2/addons/{steam_id}.vpk`.
- `script` overlays — populated by an arbitrary user-authored bash script that runs inside `bubblewrap` + `systemd-run --scope` as the unprivileged `l4d2-sandbox` UID, with the overlay directory bind-mounted RW at `/overlay`. Resource caps: 1h walltime, 4 GB RAM, 512 tasks, 200% CPU, 20 GB post-build disk cap.
Both the caches and the overlay directories are owned by the `left4me` runtime user; if the web service ever runs as a different uid, ensure it shares a group with the host process and that both trees are group-readable.
## Performance Tuning
The deployment ships a host-side perf baseline (slices, unit directives, sysctls). See `docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md` for design rationale.
The following knobs are documented escape hatches — they are **not** auto-applied. Apply only if you have measured a need and understand the failure modes.
### CPU governor
The performance governor squeezes a few percent off jitter under bursty load. `schedutil` is acceptable for sustained UDP workloads.
```sh
sudo cpupower frequency-set -g performance
```
Install via `sudo apt install linux-cpupower` if the binary isn't present.
Persist via your distro's CPU-frequency tooling (e.g. `/etc/default/cpufrequtils`).
### Per-instance CPU affinity
`srcds` is single-threaded per instance. On a multi-core host, pinning each instance to its own core can cut jitter under contention. Drop in `/etc/systemd/system/left4me-server@<name>.service.d/affinity.conf`:
```ini
[Service]
CPUAffinity=2
```
This pins the instance to CPU 2 specifically; per-instance values would typically be 1, 2, 3, ... so each server has its own core.
A reasonable strategy on an N-core host: leave core 0 for the kernel + IRQs + system services, then pin one instance per remaining core.
### NIC tuning
Hardware-specific (install via `sudo apt install ethtool` if not present). On a host with a single primary interface (replace `eth0`):
```sh
sudo ethtool -G eth0 rx 4096 tx 4096
sudo ethtool -K eth0 gro on lro off
```
If you run a high instance count, also pin the NIC's interrupts off the cores that game servers occupy (see `/proc/interrupts` and `/proc/irq/<n>/smp_affinity`).
### Real-time scheduling (advanced, opt-in)
Source-engine servers do not need real-time scheduling, and a misbehaving `srcds` at any RT priority can starve kernel threads — even with the default `kernel.sched_rt_runtime_us=950000` throttling 5% of CPU back. Use only if you have a measured jitter problem that the baseline does not solve.
`/etc/systemd/system/left4me-server@.service.d/realtime.conf`:
```ini
[Service]
CPUSchedulingPolicy=fifo
CPUSchedulingPriority=10
LimitRTPRIO=10
AmbientCapabilities=CAP_SYS_NICE
```
The `AmbientCapabilities=CAP_SYS_NICE` line is needed because the service runs as `User=left4me` with `NoNewPrivileges=true`; without it some kernels/systemd combinations refuse to apply the RT policy.
### Applying changes to running servers
Unit-file changes do not apply to already-running services. After any change:
```sh
sudo systemctl daemon-reload
# Restart each game server via the web UI's stop + start, or:
sudo systemctl restart 'left4me-server@*.service'
```