spec(hardening): threat model + defenses survey + test plan; pivot handoff

Reframe the queued uid-split decision into a broader hardening analysis.
Audit found the same-uid attack surface (DB readable from srcds, ptrace
allowed, RCON stored plaintext) is closable by either uid split or
systemd directive composition; the three specs ground that choice in a
threat model, survey the defenses, and lay out a self-contained test
plan to run on left4.me next. uid-split spec deferred pending results.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mwiegand 2026-05-15 13:07:40 +02:00
parent 9a2ab974e6
commit 1df811e62a
4 changed files with 2036 additions and 86 deletions


@ -0,0 +1,698 @@
# left4me application hardening — defenses survey
**Status:** living spec. Companion to `2026-05-15-hardening-threat-model.md`
and `2026-05-15-hardening-test-plan.md`.
This document catalogs the Linux + systemd defense primitives applicable
to left4me, evaluates each against this codebase's needs, and proposes a
candidate composition. Each candidate is *testable* — the test plan
exercises it before commit.
Reference: the threat model defines defenses D1-D7. This document maps
primitives to those defenses.
## Section 1 — Linux kernel primitives
### Namespaces (`man 7 namespaces`)
| NS | Isolates | Relevance |
|---|---|---|
| **mount** | filesystem hierarchy view | Core. Gives `TemporaryFileSystem=` + bind primitives. |
| **user** | uid/gid mapping | Big for D2/D4 (cross-uid ptrace block). |
| **pid** | PID 1, /proc visibility | Pairs with `ProcSubset=pid` for D2. |
| **net** | netifs, ports, routes | Breaks gameservers; do **not** apply to server@. |
| **ipc** | SysV IPC + POSIX MQ | Hygienic; `PrivateIPC=true`. (Abstract sockets belong to the net namespace, so this does not isolate them.) |
| **uts** | hostname | Cosmetic; doesn't matter for us. |
| **time** | CLOCK_MONOTONIC offset | Irrelevant for us. |
| **cgroup** | cgroup view | Defense-in-depth against cgroup escape. |
**For left4me:** mount + user + pid + ipc on `left4me-server@.service`.
The web unit can use the same minus user-ns (incompatible with sudo).
### Capabilities (`man 7 capabilities`)
Per-process, granted at exec via file caps or by systemd at unit start.
Bounding set = upper bound; ambient = inherited across non-setuid exec.
- **CapabilityBoundingSet=** empty drops everything. Neither srcds nor
gunicorn needs any capability after they start (no raw sockets, no
mount, no module load, no setuid).
- **AmbientCapabilities=** empty (default).
Sharp edge: a `+`-prefixed ExecStartPre runs its helper with full
privileges (root, all caps), unaffected by these settings. That's how we
get the privileged overlay mount without breaking the unit's caps.
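One way to confirm that an empty bounding set actually took effect is to read the `CapBnd` line of `/proc/<pid>/status` for the running service. The sketch below assumes the usual hexadecimal mask format of that file; the helper names are illustrative and not part of the codebase.

```python
def bounding_set_is_empty(status_text: str) -> bool:
    """Return True if the CapBnd mask in /proc/<pid>/status text is zero."""
    for line in status_text.splitlines():
        if line.startswith("CapBnd:"):
            # The value is a hexadecimal bitmask, one bit per capability.
            return int(line.split(":", 1)[1].strip(), 16) == 0
    raise ValueError("no CapBnd line found")


def read_status(pid: int) -> str:
    """Read /proc/<pid>/status; the caller must be allowed to see that pid."""
    with open(f"/proc/{pid}/status", encoding="ascii") as handle:
        return handle.read()
```

Calling `bounding_set_is_empty(read_status(pid))` as root on the host, with the srcds main pid, should return `True` once `CapabilityBoundingSet=` is empty.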
### Seccomp-bpf (`man 2 seccomp`)
Filter syscall set. Per-process. Composes with the AND of all filters
loaded. The systemd `SystemCallFilter=` wraps it.
For us, two filter strategies:
- **Allow-list base** (`@system-service`): permissive enough for srcds
+ gunicorn; subtract dangerous groups.
- **Deny-list**: simpler but easier to leave holes.
Strategy: allow-list with subtractions.
Critical subtractions for D2:
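- `~@debug`: drops `ptrace(2)`, `perf_event_open(2)` and `pidfd_getfd(2)`.
**Single most important syscall block** for our threat model. Check
`systemd-analyze syscall-filter @debug` on the host: recent systemd
releases group `process_vm_readv/writev(2)` and `process_madvise(2)`
under `@ipc` instead, in which case they must be subtracted by name.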
- `~@mount`: `mount`, `umount2`, `pivot_root` (gameserver doesn't need them;
helper does, and helper runs as root via `+` prefix).
- `~@privileged` — anything requiring CAP_*; redundant with empty
bounding set but defense-in-depth.
- `~@reboot`, `~@swap`, `~@cpu-emulation`, `~@obsolete` — cheap removal.
Sharp edges:
- `SystemCallFilter=` lines are cumulative: the first line fixes the mode
(here an allow-list), and each later `~` line removes entries from it.
- A `~` subtract on a group not in the allow-list is a no-op.
- `SystemCallArchitectures=native` blocks 32-bit syscall entries that
bypass the filter. Always set this.
- `SystemCallErrorNumber=EPERM` vs. default `KILL`: `EPERM` is gentler
for non-essential paths; `KILL` is loud and obvious. Start with
default (KILL) for clear signal, switch to `EPERM` if a benign caller
trips it (e.g., a library probing for capabilities).
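The composition rules above can be modelled with plain sets. This is a minimal sketch of the allow-list case only (first line without `~`); the group names and their members are a toy table invented for the example, not systemd's real group contents, which `systemd-analyze syscall-filter` prints.

```python
# Toy membership, for illustration only (NOT systemd's real tables).
TOY_GROUPS = {
    "@toy-base": {"read", "write", "openat", "kill"},
    "@toy-signal": {"kill", "tgkill"},
    "@toy-mount": {"mount", "umount2"},
}


def compose_allow_list(filter_lines, groups):
    """Model allow-list composition of successive SystemCallFilter= values.

    Assumes the first line has no '~' prefix, i.e. allow-list mode.
    Lines starting with '~' subtract; all other lines add.
    """
    allowed = set()
    for line in filter_lines:
        subtract = line.startswith("~")
        for token in line.lstrip("~").split():
            # '@name' is a group; anything else is a single syscall name.
            names = groups[token] if token.startswith("@") else {token}
            if subtract:
                allowed -= names
            else:
                allowed |= names
    return allowed
```

With the toy table, `compose_allow_list(["@toy-base", "~@toy-mount"], TOY_GROUPS)` returns the base set unchanged, which is the "subtracting a group that was never allowed is a no-op" case.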
### Yama LSM — `kernel.yama.ptrace_scope`
System-wide sysctl. Values:
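- 0: any process with the same uid can ptrace
- 1: only a direct ancestor of the target, or a tracer the target
named via `PR_SET_PTRACER` (upstream default; Debian kernels have
historically shipped 0, so check the host)
- 2: requires `CAP_SYS_PTRACE` (admin only)
- 3: ptrace attach disabled entirely; cannot be lowered again without a reboot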
For left4me: setting to 2 system-wide is cheap and removes the same-uid
ptrace path entirely. Set via `/etc/sysctl.d/99-left4me.conf` (or
extend an existing file). Doesn't affect debuggability — if you ever
need to ptrace, do it as root.
Caveat: Yama is enforced AT THE TIME of `ptrace` call. With seccomp
blocking the syscall entirely (`~@debug`), Yama becomes belt-and-braces;
keep both for defense-in-depth.
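The four `ptrace_scope` levels can be written down as a small decision function. This is a deliberately simplified model: it ignores `PR_SET_PTRACER`, the dumpable flag, user namespaces and other LSMs, and the function name is made up for the example.

```python
def yama_allows_attach(scope, same_uid, tracer_is_ancestor, has_cap_sys_ptrace):
    """Simplified model of whether a ptrace attach passes Yama.

    scope: value of kernel.yama.ptrace_scope (0..3).
    """
    if scope == 3:
        # No attach at all, not even for privileged tracers.
        return False
    if has_cap_sys_ptrace:
        # Scopes 0..2 let a CAP_SYS_PTRACE holder attach.
        return True
    if scope == 2:
        return False
    if scope == 1:
        # Restricted: same uid AND the tracer is an ancestor of the target.
        return same_uid and tracer_is_ancestor
    if scope == 0:
        # Classic behaviour: the same uid is enough.
        return same_uid
    raise ValueError(f"unknown ptrace_scope {scope!r}")
```

A compromised srcds attaching to gunicorn is the case `same_uid=True, tracer_is_ancestor=False, has_cap_sys_ptrace=False`; the model returns `True` only at scope 0.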
### LSMs other than Yama
| LSM | Status on Debian Trixie | Fit for us |
|---|---|---|
| **AppArmor** | Enabled by default since Debian 10; confirm with `aa-status` on this host | Could write profiles for srcds + gunicorn. Per-unit profile via `AppArmorProfile=` on systemd. Moderate effort. |
| **SELinux** | Available; not enabled by default | Heavy. Not worth the operational cost on a single-host VPS. |
| **landlock** | Kernel ≥5.13; available | Process-local sandboxing. Apps must opt in via the `landlock_*` syscalls (`man 7 landlock`). Python doesn't have a stdlib binding; need to call via ctypes or a wrapper. For us: would need to retrofit gunicorn or write a wrapper. Defer. |
| **BPF LSM** | Kernel ≥5.7; available | Programmable LSM hooks. Bleeding edge for personal infra. Defer. |
| **Tomoyo** | Available; not Debian-enabled | Path-based MAC. Niche. Skip. |
**For left4me:** Yama yes. AppArmor *maybe*, as a follow-up — a profile
limited to "deny path X" patterns for srcds would be small but adds an
audit/rollback surface. Skip in the first pass; revisit if test results
show systemd directives alone leave gaps.
### Filesystem ACLs and modes
POSIX permissions, supplementary groups, ACLs (`setfacl`), extended
attrs (`xattr`).
For us:
- DB and `web.env` already use `root:left4me 0640`. If we go uid-split,
ownership changes; if we go hardening-only, mode is fine — what
matters is *whether the unit's FS view contains them at all*.
- `setfacl` for fine-grained sharing (e.g., one supplementary group
used by both web and game). Doable but adds complexity; consider
only if uid split goes ahead.
### File attributes (chattr)
`chattr +i` (immutable) and `chattr +a` (append-only).
For us:
- `chattr +i /opt/left4me/src/**` — prevents post-deploy tampering by
anything short of root removing the attr. But: `pip install -e`
creates `*.egg-info` files in the tree; deploy of new code would need
to `chattr -R -i ...` first. Too much friction. Skip.
- `chattr +i /etc/left4me/web.env` — keeps the env file from being
rewritten by a malicious uid. Works because the env file is rewritten
rarely (rotate SECRET_KEY explicitly via ckn-bw apply, which is root
and can `chattr -i` first). Worth considering as a small extra.
### cgroups v2
Not a security primitive (not confidentiality/integrity), but a
**resource ceiling**. Already in use:
- `Slice=l4d2-game.slice`, `MemoryMax`, `TasksMax` — keep.
`MemoryDenyWriteExecute=true` is a kernel-level prctl + seccomp, not a
cgroup, but listed here because it's resource-adjacent. See systemd
section.
### Sudo / setuid
Sudoers grants narrow what a unit's uid can do as root. For us, the
helpers (`scripts/libexec/left4me-*`) already validate inputs tightly
(verified in audit). Two design options for the future:
- **Keep sudo path**, narrow the grants (per-uid via 3-user split, or
per-action via tighter sudoers).
- **Replace sudo with systemctl-managed transient units triggered via
dbus / `systemctl start`** — the build-overlay-unit spec already
proposes this for the script-sandbox.
The web app needs to invoke the helpers somehow. `NoNewPrivileges=true`
on the web unit would break sudo's setuid. If we move to
systemctl-triggered units (no setuid involved), we can also tighten the
web unit. Sequenced in the implementation plan, not this survey.
## Section 2 — systemd unit-config primitives
### Identity
- **`User=` / `Group=`** — drop privileges. Already set.
- **`DynamicUser=true`** — transient uid per run, persisted across runs
via `StateDirectory=`. Strong default. **Bad fit for us** because
multiple units share `/var/lib/left4me/` cross-unit; DynamicUser's
per-unit `StateDirectory=` model fights that.
- **`SupplementaryGroups=`** — extra groups. Used if we add a shared
read-only group (e.g., `l4d2-overlay-readers`).
### Filesystem virtualization
The lever the operator asked about ("can systemd have a fully virtual
filesystem"). Yes — composition:
- **`RootDirectory=path`** — chroot. Full FS substitution. Heavy;
requires populating libs/binaries. Skip for the first pass.
- **`RootImage=path`** — same but from a disk image. Way too heavy.
- **`TemporaryFileSystem=path[:opts]`** — empty tmpfs at `path`.
Cheap. Composes with bind paths.
- **`BindReadOnlyPaths=src[:dst]`** — RO bind. Composes over
TemporaryFileSystem.
- **`BindPaths=src[:dst]`** — RW bind. Composes over TemporaryFileSystem.
- **`InaccessiblePaths=path`** — masks a path with an empty file/dir.
Legacy; Bind* is cleaner.
- **`NoExecPaths=path`** / **`ExecPaths=path`** — restrict
executable paths. Strong but easy to misconfigure.
Composition pattern (the one we want for srcds):
```ini
TemporaryFileSystem=/var/lib /etc /opt /home /root /srv
BindReadOnlyPaths=/var/lib/left4me/installation
BindReadOnlyPaths=/var/lib/left4me/overlays
BindReadOnlyPaths=/etc/left4me/host.env
BindReadOnlyPaths=/etc/ssl /etc/ca-certificates /etc/resolv.conf
BindReadOnlyPaths=/etc/nsswitch.conf /etc/alternatives
BindPaths=/var/lib/left4me/runtime/%i
```
Result: srcds has no DB, no `web.env`, no `/opt/left4me/src/` in its FS
view. Files outside the bound list are simply not there from srcds's
perspective — `open()` returns ENOENT, not EACCES.
Sharp edges:
- `TemporaryFileSystem=` size defaults to half RAM; clamp via
`:size=NNM,nr_inodes=NN`.
- Bind paths must exist on disk; ENOENT prevents unit start.
- `BindReadOnlyPaths=` and `BindPaths=` ordering: bind mounts are
applied in order, so a later entry over the same path wins.
- `RuntimeDirectory=` integrates with `TemporaryFileSystem=` cleanly:
`RuntimeDirectory=left4me/foo` creates `/run/left4me/foo` and binds
it in, auto-cleaning on stop.
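Because a missing bind source prevents the unit from starting, it can be worth checking the sources before a restart. The following is a rough sketch, assuming space-separated entries without quoting and only the `%i` specifier; the function names are hypothetical.

```python
import os

BIND_KEYS = ("BindPaths", "BindReadOnlyPaths")


def bind_sources(unit_text, instance=""):
    """Yield (source_path, optional) for every bind entry in the unit text."""
    sources = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() not in BIND_KEYS:
            continue
        for entry in value.split():
            # A leading '-' tells systemd to ignore a missing source.
            optional = entry.startswith("-")
            # Entry shape is src[:dst[:opts]]; only the source matters here.
            source = entry.lstrip("-").split(":", 1)[0].replace("%i", instance)
            sources.append((source, optional))
    return sources


def missing_bind_sources(unit_text, instance="", exists=os.path.exists):
    """Return required bind sources that do not exist on disk."""
    return [src for src, optional in bind_sources(unit_text, instance)
            if not optional and not exists(src)]
```

The `exists` parameter is injectable so the check can be unit-tested without touching the real filesystem.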
### Namespaces (systemd wrappers)
- **`PrivateTmp=true`** — already set.
- **`PrivateDevices=true`** — already set. Drops most of `/dev`.
- **`PrivateNetwork=true`** — **don't** for gameservers (breaks UDP).
- **`PrivateIPC=true`** — private SysV/POSIX IPC namespace; cheap win.
- **`PrivateUsers=true`**: own userns. Root and the configured
`User=left4me` are mapped to themselves; every other uid appears as
`nobody` inside the unit (defense for D2/D4 against cross-namespace ptrace).
Sharp edge: incompatible with `sudo` from inside the unit (setuid +
userns mapping = no host-root).
- **`PrivateMounts=true`** — own mount ns (default-implicit with most
Protect* / Private* directives).
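To see which mapping `PrivateUsers=true` actually produced, `/proc/<pid>/uid_map` of the service's main process can be parsed. A small sketch, assuming the standard three-column format (id inside the namespace, id outside, range length); the helper names are illustrative.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map text into (inside, outside, count) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            entries.append(tuple(int(field) for field in fields))
    return entries


def outside_id(entries, inside):
    """Translate an id inside the namespace to the id seen outside it.

    Returns None when the id is not covered by any mapping range.
    """
    for start_inside, start_outside, count in entries:
        if start_inside <= inside < start_inside + count:
            return start_outside + (inside - start_inside)
    return None
```

An id that maps to `None` has no host identity in that namespace, which is the property the ptrace and sudo arguments in this document rely on.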
### `/proc` and `/sys` protection
- **`ProtectProc=invisible|noaccess|ptraceable|default`** —
`invisible` makes other procs' `/proc/<pid>/*` not exist. **D2.**
- **`ProcSubset=pid|all`** — `pid` restricts `/proc/` to PID entries;
hides `/proc/kallsyms`, `/proc/cpuinfo`, etc. Cheap.
- **`ProtectKernelTunables=true`** — `/proc/sys`, `/sys` read-only.
- **`ProtectKernelModules=true`** — block `init_module`, `delete_module`.
- **`ProtectKernelLogs=true`** — block `/dev/kmsg`, syslog().
- **`ProtectClock=true`** — block `clock_settime`, `settimeofday`.
- **`ProtectControlGroups=true`** — `/sys/fs/cgroup` read-only.
- **`ProtectHostname=true`** — block `sethostname`/`setdomainname`.
All of `ProtectKernel*`, `ProtectClock`, `ProtectControlGroups`,
`ProtectHostname` are cheap and have no downside for srcds or gunicorn.
Add all of them.
### Filesystem protection (legacy / not Bind*)
- **`ProtectSystem=false|true|full|strict`** — increasingly stringent
RO of system paths. `strict` makes `/`, `/usr`, `/boot`, `/etc`,
`/opt` RO except for explicit writable paths.
- **`ProtectHome=false|true|read-only|tmpfs`** — `tmpfs` masks `/home`,
`/root`, `/run/user` with empty tmpfs.
For us: `ProtectSystem=strict` + `ProtectHome=tmpfs` is the baseline.
But once we adopt `TemporaryFileSystem=` for the relevant trees, these
become secondary — TemporaryFileSystem fully supersedes them in the
covered subtrees. Keep both as defense-in-depth (cheap).
### Syscall filtering
- **`SystemCallFilter=expr`** — discussed in Linux section.
- **`SystemCallArchitectures=native`** — always set.
- **`SystemCallLog=expr`** — opt-in logging without enforcement;
useful for diagnosing what gets called before tightening.
- **`SystemCallErrorNumber=EPERM`**: soft denial instead of killing the
process. Default terminates the offender with SIGSYS; switch later if a
benign caller trips.
### Capabilities
- **`CapabilityBoundingSet=`** — empty drops all. Use it.
- **`AmbientCapabilities=`** — empty (default).
- **`NoNewPrivileges=true`** — prevents setuid escalation. **Required
on srcds**, **incompatible with sudo on web** until sudo is replaced.
### Network restrictions
- **`RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX`** — for srcds.
AF_UNIX needed for journald socket access.
- **`IPAddressAllow=` / `IPAddressDeny=`**: uses cgroup BPF; filters both
ingress and egress traffic. For srcds: probably overcomplicates; the
firewall already controls ingress. Skip for first pass.
- **`SocketBindAllow=` / `SocketBindDeny=`** — restricts which ports a
unit can `bind()`. For srcds, allow only the configured game port
range. Adds value but couples to config. Defer to a follow-up.
### Resource restrictions
- **`MemoryMax`**, **`TasksMax`**, **`LimitNOFILE`** — already set.
- **`OOMScoreAdjust`** — already set (favor killing the gameserver
before system processes if memory tight).
- **`MemoryDenyWriteExecute=true`**: blocks `mprotect(PROT_WRITE|PROT_EXEC)`.
Defends against shellcode in JIT memory. **Source engine likely
fine** (no JIT in the binary; the Squirrel script engine is an
interpreter, not JIT). **Sourcemod plugins**: compiled to bytecode
and run on the SourcePawn VM, which ships an x86 JIT
(`sourcepawn.jit.x86`), so this directive may break plugin loading.
Verify in test.
### IPC and process hygiene
- **`RemoveIPC=true`** — clean up SysV IPC on unit stop.
- **`KeyringMode=private`** — own kernel keyring; no host-key access.
- **`LockPersonality=true`** — block `personality(2)` calls (no x86 vs
x86-64 mode toggle). Already set.
- **`RestrictRealtime=true`** — block real-time scheduling. srcds may
use SCHED_OTHER + nice; no realtime needed.
- **`RestrictNamespaces=true`** — block `unshare(2)` / `clone(CLONE_NEW*)`.
- **`RestrictSUIDSGID=true`** — already set.
- **`UMask=0027`** — narrow default umask.
### Capabilities of the `+` prefix
`ExecStartPre=+cmd` runs `cmd` as root in PID 1's namespaces, bypassing
the unit's User= and almost all Protect*/Private*/Restrict* directives.
This is how the existing overlay-mount helper runs. Critical to verify
in test:
- Does `+` preserve the bypass when `PrivateUsers=true` is set?
(Expected: yes — the userns is set up around the unit's processes;
`+` puts the helper outside it.)
### State management (per-unit)
- **`StateDirectory=path`** — creates `/var/lib/<path>` owned by User=.
- **`RuntimeDirectory=path`** — creates `/run/<path>`, auto-deleted on
stop.
- **`LogsDirectory=path`** — `/var/log/<path>`.
- **`CacheDirectory=path`** — `/var/cache/<path>`.
- **`ConfigurationDirectory=path`** — `/etc/<path>`.
Useful for cleanup hygiene if we redesign storage layout. Not required
for first pass.
### `systemd-analyze security`
`systemd-analyze security <unit>` produces a security score per unit
(lower = more secure). Output lists each directive with a ✓/✗.
Useful as:
- Regression check (record baseline, ensure score drops after refactor).
- Discovery tool ("which directives haven't I set?").
Baseline scores (to capture during test plan):
- `left4me-server@1.service` before refactor
- `left4me-web.service` before refactor
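For the regression check, the overall score can be pulled out of saved `systemd-analyze security <unit>` output. This sketch assumes the summary line has the shape `Overall exposure level for <unit>: <score> <LABEL>`, which should be confirmed against the systemd version on the host; the function names are made up for the example.

```python
import re

_EXPOSURE = re.compile(
    r"Overall exposure level for (\S+): ([0-9]+(?:\.[0-9]+)?) (\w+)"
)


def parse_exposure(output):
    """Return (unit, score, label) from systemd-analyze security output."""
    match = _EXPOSURE.search(output)
    if match is None:
        raise ValueError("no overall exposure line found")
    unit, score, label = match.groups()
    return unit, float(score), label


def regressed(before_output, after_output):
    """True if the exposure score went up (got worse) after a change."""
    return parse_exposure(after_output)[1] > parse_exposure(before_output)[1]
```

Feeding it the baseline file and a post-change capture gives a yes/no answer that can be asserted in CI or pasted into the results section.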
### Composability lookups
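The systemd docs define predefined syscall sets that are worth knowing:
- **`@privileged`** (syscall group) ⊃ `@chown`, `@clock`, `@module`,
`@raw-io`, `@reboot`, `@swap`, plus individual calls such as `setuid`
and `chroot`. It does not contain `ptrace`; that lives in `@debug`.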
- **`@system-service`** is the recommended base for "I want a normal
service to work."
- Subtracting `~@privileged` is broad; `~@debug @mount @raw-io` is
surgical.
## Section 3 — Application-level options
### AppArmor profile for srcds
If systemd directives leave gaps, an AppArmor profile would let us
deny specific paths or operations beyond what systemd's directives
cover. E.g., "deny network for srcds to a specific IP range" via
`network inet stream...` deny rules; or "deny mounting" beyond
`SystemCallFilter`.
Effort:
- Confirm AppArmor is active (`aa-status`); add it to the kernel cmdline + boot config only if it is not.
- Write a profile (e.g., `/etc/apparmor.d/usr.bin.srcds_linux`).
- Reference via systemd `AppArmorProfile=` per unit.
Skip for the first pass; revisit if test results show the systemd
directives alone leave a gap.
### landlock for the web app
Python web app could call `landlock_create_ruleset` / `landlock_add_rule`
/ `landlock_restrict_self` via ctypes. Restricts FS access at runtime.
For us:
- Could restrict gunicorn to `/var/lib/left4me/` + `/etc/left4me/web.env`
+ `/opt/left4me/.venv` + `/tmp`.
- Symmetric to `TemporaryFileSystem=` + `Bind*` but at the
application layer (no systemd reach).
Skip; systemd directives are simpler. Reconsider if we move to a
DynamicUser-style world later.
### File-integrity tooling (Aide, Tripwire)
Out of scope for prevention; useful for detection. Not in this design.
### Custom seccomp profile (bypassing systemd)
The web app could call `seccomp(2)` from inside Python via libseccomp
+ ctypes to tighten its own filter beyond what systemd applies.
Symmetric to landlock; skip for the same reason.
## Section 4 — Per-defense mapping
For each defense from the threat model, the primitives that implement
it, in priority order:
### D1 — Gameserver RCE cannot exfiltrate DB or `web.env`
| Primitive | Strength | Notes |
|---|---|---|
| `TemporaryFileSystem=/var/lib /etc` + minimal bind set | Strong | The files simply aren't in the unit's FS view. ENOENT, not EACCES. |
| 3-user split (DB owned by `l4d2-web`) | Strong | Kernel-enforced; survives unit-config errors. |
| `BindReadOnlyPaths=/dev/null:/var/lib/left4me/left4me.db` | Medium | Masks the path; brittle (paths can move). |
| Filesystem ACLs (DB mode 0600) | Weak | Any mode that lets the web app read the DB also lets srcds read it while both run as `left4me`; only fixed by uid split. |
**Composition chosen:** `TemporaryFileSystem=` + Bind* (primary).
3-user split as defense-in-depth or deferred.
### D2 — Gameserver RCE cannot ptrace web app or peers
| Primitive | Strength | Notes |
|---|---|---|
| `SystemCallFilter=~@debug` | Strong | Blocks `ptrace`; `process_vm_readv/writev` may need to be listed by name (check `systemd-analyze syscall-filter`). |
| `kernel.yama.ptrace_scope=2` | Strong | Belt-and-braces at the kernel level. |
| `CapabilityBoundingSet=` empty | Strong | No CAP_SYS_PTRACE. |
| `PrivateUsers=true` | Strong | Cross-userns ptrace requires CAP_SYS_PTRACE. |
| 3-user split | Strong | Different uids; same-uid path doesn't exist. |
**Composition chosen:** All four (syscall + yama + caps + userns)
together; they compose redundantly.
### D3 — Gameserver RCE cannot use sudo helpers
| Primitive | Strength | Notes |
|---|---|---|
| `NoNewPrivileges=true` | Strong | Blocks sudo's setuid. Already set on server@. |
| `PrivateUsers=true` | Strong | sudo across userns boundary impossible. |
| Sudoers grants scoped to `l4d2-web` (uid split) | Strong | Different uid means sudo grant doesn't apply. |
| `RestrictSUIDSGID=true` | Strong | Already set. |
**Composition chosen:** NoNewPrivileges (already) + PrivateUsers (new)
+ RestrictSUIDSGID (already). These already cover what the 3-user split
would add here; the uid split would be defense-in-depth.
### D4 — Web app RCE cannot ptrace gameservers
| Primitive | Strength | Notes |
|---|---|---|
| `SystemCallFilter=~@debug` on **web** | Strong | Symmetric to D2 but applied to web. |
| `kernel.yama.ptrace_scope=2` | Strong | System-wide, helps both directions. |
| 3-user split | Strong | Different uids. |
**Composition chosen:** SystemCallFilter on web + yama=2 system-wide.
PrivateUsers cannot be applied to web (sudo incompatibility). 3-user
split as defense-in-depth or deferred.
### D5 — Cross-server contamination
Each `left4me-server@<n>.service` is a separate unit instance. With
`PrivateUsers=true`, each gets its own user namespace. Cross-namespace
ptrace fails. With `TemporaryFileSystem=` and per-instance
`BindPaths=/var/lib/left4me/runtime/%i`, neither instance can read the
other's `runtime/<n>/` or attach to its process.
**Composition chosen:** PrivateUsers + per-instance Bind* (above).
Per-instance uids out of scope.
### D6 — Persistent compromise of `/opt/left4me/src/` blocked from gameserver
Already covered by `ProtectSystem=strict` on server@.service. With
`TemporaryFileSystem=/opt`, the path simply isn't visible to srcds.
**Stronger and redundant — both can stay.**
### D7 — Defenses survive a unit-config refactor in the wrong direction
`deploy/tests/test_deploy_artifacts.py` asserts the directives' presence
in the deployed unit. Add hardening invariants as test cases. Survives
because the test fails CI before deploy.
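A hardening invariant of this kind could look roughly like the sketch below. The contents of `test_deploy_artifacts.py` are not shown in this spec, so the helper names and the `REQUIRED` table are hypothetical; the parser also ignores systemd's reset and override semantics and simply checks that the expected value appears.

```python
def service_directives(unit_text):
    """Collect Key=value pairs from the [Service] section of a unit file."""
    directives = {}
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Service" and "=" in line:
            key, value = line.split("=", 1)
            directives.setdefault(key.strip(), []).append(value.strip())
    return directives


# Hypothetical invariant set; extend it with whatever the test plan confirms.
REQUIRED = {
    "NoNewPrivileges": "true",
    "PrivateUsers": "true",
    "ProtectProc": "invisible",
    "SystemCallArchitectures": "native",
}


def hardening_violations(unit_text, required=REQUIRED):
    """Return the sorted names of required directives that are absent."""
    found = service_directives(unit_text)
    return sorted(key for key, want in required.items()
                  if want not in found.get(key, []))
```

A pytest case would then read the deployed unit file and assert that `hardening_violations(text) == []`.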
## Section 5 — Candidate composition
**For testing, not commitment.** Test plan validates each piece.
### `left4me-server@.service`
```ini
[Service]
User=left4me
Group=left4me
# (existing)
Type=simple
WorkingDirectory=-/var/lib/left4me/runtime/%i/merged/left4dead2
EnvironmentFile=/etc/left4me/host.env
EnvironmentFile=/var/lib/left4me/instances/%i/instance.env
ExecStartPre=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay mount %i
ExecStart=/var/lib/left4me/runtime/%i/merged/srcds_run -game left4dead2 +hostport ${L4D2_PORT} $L4D2_ARGS
ExecStopPost=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay umount %i
Restart=on-failure
RestartSec=5
# Resource control (existing)
Slice=l4d2-game.slice
Nice=-5
IOSchedulingClass=best-effort
IOSchedulingPriority=4
OOMScoreAdjust=-200
MemoryHigh=1.5G
MemoryMax=2G
TasksMax=256
LimitNOFILE=65536
KillSignal=SIGINT
TimeoutStopSec=15s
LogRateLimitIntervalSec=0
# Annotations sit on their own lines: systemd does not support trailing
# comments, so "Key=value # note" would make the note part of the value.
# Hardening: identity
NoNewPrivileges=true
RestrictSUIDSGID=true
# Hardening: namespaces
PrivateTmp=true
PrivateDevices=true
PrivateIPC=true
# NEW
PrivateUsers=true
ProtectHome=true
# Hardening: filesystem view
# NEW
TemporaryFileSystem=/var/lib /etc /opt /home /root /srv /mnt /media
# was ReadOnlyPaths (next two lines)
BindReadOnlyPaths=/var/lib/left4me/installation
BindReadOnlyPaths=/var/lib/left4me/overlays
# NEW (next three lines)
BindReadOnlyPaths=/etc/left4me/host.env
BindReadOnlyPaths=/etc/ssl /etc/ca-certificates
BindReadOnlyPaths=/etc/resolv.conf /etc/nsswitch.conf /etc/alternatives
# was ReadWritePaths
BindPaths=/var/lib/left4me/runtime/%i
ProtectSystem=strict
# (remove old ReadOnlyPaths= and ReadWritePaths= lines; superseded)
# Hardening: /proc, /sys, kernel (all NEW except LockPersonality)
ProtectProc=invisible
ProcSubset=pid
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectKernelLogs=true
ProtectClock=true
ProtectControlGroups=true
ProtectHostname=true
LockPersonality=true
# Hardening: caps + syscall (all NEW)
CapabilityBoundingSet=
AmbientCapabilities=
SystemCallArchitectures=native
SystemCallFilter=@system-service
SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete @privileged
# Hardening: network (NEW; AF_UNIX for journald)
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
# Hardening: namespaces, realtime, IPC (all NEW)
RestrictNamespaces=true
RestrictRealtime=true
RemoveIPC=true
KeyringMode=private
UMask=0027
# Deferred until test:
# MemoryDenyWriteExecute=true # MAY break sourcemod / Source engine; test first.
```
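systemd only treats a line as a comment when its first non-blank character is `#` or `;`, so an annotation left at the end of a directive becomes part of the value. A pre-flight check along these lines could catch that before a fragment is copied into a drop-in. It is a heuristic sketch with a made-up function name, and it will also flag legitimate values that happen to contain ` #`.

```python
def trailing_comment_lines(unit_text):
    """Return 1-based line numbers whose value appears to end in '# note'."""
    flagged = []
    for number, raw in enumerate(unit_text.splitlines(), start=1):
        line = raw.strip()
        # Whole-line comments, blanks and section headers are fine.
        if not line or line[0] in "#;" or "=" not in line:
            continue
        value = line.split("=", 1)[1]
        if " #" in value or "\t#" in value or value.startswith("#"):
            flagged.append(number)
    return flagged
```

Running it over a candidate unit and failing on a non-empty result is cheaper than discovering the problem through a unit that refuses to load.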
### `left4me-web.service`
```ini
[Service]
User=left4me
Group=left4me
# (existing)
Type=simple
WorkingDirectory=/opt/left4me/src
Environment=HOME=/var/lib/left4me PATH=/opt/left4me/.venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
EnvironmentFile=/etc/left4me/host.env
EnvironmentFile=/etc/left4me/web.env
ExecStart=/opt/left4me/.venv/bin/gunicorn --workers ... --threads ... --bind 127.0.0.1:8000 'l4d2web.app:create_app()'
Restart=on-failure
RestartSec=3
# Annotations sit on their own lines: systemd does not support trailing
# comments, so "Key=value # note" would make the note part of the value.
# Hardening
PrivateTmp=true
# tightened from =full
ProtectSystem=strict
ProtectHome=true
# web needs broad write access there
ReadWritePaths=/var/lib/left4me
# NoNewPrivileges intentionally NOT set (sudo)
# PrivateUsers intentionally NOT set (sudo)
# /proc + kernel hardening, intended to be sudo-compatible (all NEW).
# Caveat: systemd.exec documents that several of the settings below imply
# NoNewPrivileges=yes when the unit runs with User= (no CAP_SYS_ADMIN);
# confirm sudo still works before relying on this block.
ProtectProc=invisible
ProcSubset=pid
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectKernelLogs=true
ProtectClock=true
ProtectControlGroups=true
ProtectHostname=true
LockPersonality=true
# Syscall filter (all NEW): allow @system-service minus debug-class; keep
# @privileged because sudo needs setuid, chown, etc.
SystemCallArchitectures=native
SystemCallFilter=@system-service
SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete
# Network (NEW)
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
# Misc hygiene (all NEW)
RestrictRealtime=true
RestrictNamespaces=true
RemoveIPC=true
UMask=0027
# Deferred for sudo-removal future work:
# NoNewPrivileges=true
# CapabilityBoundingSet=
# PrivateUsers=true
```
### Host sysctl
`/etc/sysctl.d/99-left4me.conf` (or merge into existing):
```
kernel.yama.ptrace_scope=2
```
System-wide. Means: even if a unit-level config slips, host-level
ptrace is admin-only. Cost: zero for our use case (no debugging in
prod).
## Section 6 — Trade-offs and known sharp edges
To verify in the test plan:
1. **`PrivateUsers=true` + `+`-prefixed ExecStartPre**: expected to
work (the `+` runs outside the unit's namespaces). Sharp if it
doesn't — the overlay mount would fail and srcds wouldn't start.
2. **`TemporaryFileSystem=/etc` and missing files**: srcds and its
dependencies (libstdc++ runtime, libssl, libcurl) may read files
from `/etc` we haven't bound. Watch journalctl for ENOENT during
first start.
3. **`SystemCallFilter=~@privileged` and Source engine**: srcds is C++
and uses syscalls beyond the obvious. A `~@privileged` may trip
something. Mitigation: test with `SystemCallLog=` instead of
`SystemCallFilter=` first; observe what would have been blocked;
then narrow.
4. **`MemoryDenyWriteExecute=true` and sourcemod**: SourcePawn plugins
are bytecode, but the SourceMod VM ships an x86 JIT
(`sourcepawn.jit.x86`) that needs writable+executable memory. Expect
breakage; test before enabling.
5. **`RestrictAddressFamilies=` without AF_UNIX**: journald socket
needs it. Always include AF_UNIX.
6. **`ProcSubset=pid` and Python**: gunicorn shouldn't break (uses
/proc/self/* + signal-based ipc). Verify.
7. **sysctl `kernel.yama.ptrace_scope=2`**: blocks the operator's
unprivileged `gdb` / `strace -p` against any running service. To
debug, run the tool as root (which holds `CAP_SYS_PTRACE`), or
temporarily set the sysctl back to 1 and revert afterwards.
8. **`ProtectSystem=strict` on web**: was `=full`. Tighter; might
break a write the web app does to a path outside `/var/lib/left4me`.
Audit `l4d2web/*` for `os.makedirs` or `open(...'w')` outside that
root.
## Open questions for the implementer
(After test plan results come back, finalize these.)
1. Do we adopt `MemoryDenyWriteExecute=true` if it works for srcds?
(Probably yes, defense-in-depth at low cost.)
2. Do we set `SocketBindAllow=` on srcds to lock the port range?
(Depends on whether `instance.env` exposes the range cleanly to a
unit directive.)
3. Do we deploy AppArmor profiles as a follow-up?
(Probably no — operational complexity exceeds the marginal gain on
single-host infra.)
4. Do we keep both `BindReadOnlyPaths=` and the legacy
`ReadOnlyPaths=` declarations, or simplify? (Simplify — use Bind*
exclusively once `TemporaryFileSystem=` is in place.)
5. Do we proceed with 3-user split as a follow-up, or close the spec
as "addressed by hardening"? Depends on operator's residual-risk
tolerance after Phase A lands and we observe.
## Pointers
- Threat model: `docs/superpowers/specs/2026-05-15-hardening-threat-model.md`
- Test plan: `docs/superpowers/specs/2026-05-15-hardening-test-plan.md`
- Original uid-split spec (still open): `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
- Live unit source (ckn-bw reactor): `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`
- Reference units (deploy-dir-rethink reference-only): `deploy/files/usr/local/lib/systemd/system/`
- systemd docs (latest, systemd 256+ on Trixie):
`man systemd.exec`, `man systemd.unit`, `man systemd-analyze`.
- L4D2 / Source engine docs:
- SourcePawn (bytecode VM with an x86 JIT): https://wiki.alliedmods.net/SourcePawn
- srcds is a Source 2007 engine binary; closed-source, expect surprises.


@ -0,0 +1,898 @@
# left4me application hardening — test plan
**Status:** living spec. Companion to `2026-05-15-hardening-threat-model.md`
and `2026-05-15-hardening-defenses-survey.md`. **Executed in a follow-up
session with shell access to `left4.me` (141.95.32.8).**
This document is intentionally self-contained: a session that lands cold
with shell on `left4.me` can execute it end-to-end without re-reading
the threat model or survey. Decisions made in this plan are based on the
candidate composition in the defenses survey (Section 5).
## Test architecture
### Where we test
- **Host:** `left4.me` / `ovh.left4me` (141.95.32.8). Production host;
no separate test bench. (Reference: memory entry
`feedback_test_server_hangs.md` mentions a separate test server at
`ckn@10.0.4.128`; verify whether that host is suitable for this work
*before* using prod.)
- **Canary unit:** `left4me-server@1.service`. Use this as the test
instance. Leave `left4me-server@2.service` running baseline so at
least one server stays up if the canary breaks.
- **Web unit:** `left4me-web.service` is shared. Test web-side
hardening only after server@ tests prove the composition; web is
more disruptive to roll back.
### Operating constraints
- **System units only.** No `systemctl --user`, no lingering, no
per-user systemd instance. All units under `/etc/systemd/system/` or
`/usr/local/lib/systemd/system/`. Drop-ins go to
`/etc/systemd/system/<unit>.d/`.
- **Drop-in style.** Tests apply via `/etc/systemd/system/left4me-server@1.service.d/test-NN-<name>.conf`
(note: `@1` for instance-specific). This leaves the template
unmodified — other instances unaffected. `systemctl daemon-reload`
picks up drop-ins; `systemctl restart left4me-server@1` applies.
- **Cleanup required.** Each test removes its drop-in before the next
starts. Baseline must be restorable at any point.
- **Recording.** Each test produces a one-paragraph result in this
document's "Results" section at the bottom. Append, don't replace.
### Failure modes to watch for
- **SECCOMP audit:** `journalctl -k --since '1 minute ago' | grep -i seccomp`
shows `type=1326` lines. Each is a syscall denied; the syscall number
identifies the call. Use `scmp_sys_resolver` to translate.
- **Unit start failure:** `systemctl is-active left4me-server@1` reports `inactive` or `failed`.
- **srcds crash mid-game:** `journalctl -u left4me-server@1 -f` shows
unexpected exit; `systemctl show left4me-server@1 -p Result` is
not `success`.
- **sourcemod/metamod plugin failures:** in-game `sm plugins list` or
RCON `sm plugins list` shows plugins as failed-to-load.
- **Permission denied where unexpected:** `journalctl -u left4me-server@1`
shows `Permission denied` or `Operation not permitted`.
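The `type=1326` records can be picked apart mechanically before reaching for `scmp_sys_resolver`. The sketch below assumes the usual `key=value` layout of kernel audit lines; the sample numbers it knows about are limited to `ptrace` on two architectures, and everything else should still go through `scmp_sys_resolver`.

```python
import re

_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')

# AUDIT_ARCH values as printed in the record (hex, without a 0x prefix).
ARCH_NAMES = {0x40000003: "x86", 0xC000003E: "x86_64"}

# Deliberately tiny table: only ptrace, as an example.
KNOWN_SYSCALLS = {("x86", 26): "ptrace", ("x86_64", 101): "ptrace"}


def parse_seccomp_record(line):
    """Return a dict for a type=1326 audit line, or None for other lines."""
    fields = {key: value.strip('"') for key, value in _FIELD.findall(line)}
    if fields.get("type") != "1326":
        return None
    arch = ARCH_NAMES.get(int(fields["arch"], 16), "unknown")
    number = int(fields["syscall"])
    return {
        "comm": fields.get("comm"),
        "arch": arch,
        "syscall": number,
        "name": KNOWN_SYSCALLS.get((arch, number)),
    }
```

The architecture matters when translating the number: for a record with `arch=40000003`, the matching lookup is `scmp_sys_resolver -a x86 <number>`.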
## Before any test: baseline capture
Capture these so we can compare after each test, and so we have a
known-good snapshot to revert to.
```bash
# 1. Baseline systemd-analyze score
sudo systemd-analyze security left4me-server@1.service \
| tee /tmp/sec-baseline-server.txt
sudo systemd-analyze security left4me-web.service \
| tee /tmp/sec-baseline-web.txt
# 2. Full current unit (cat'd, post-merge with any existing drop-ins)
sudo systemctl cat left4me-server@1.service \
| tee /tmp/unit-baseline-server.conf
sudo systemctl cat left4me-web.service \
| tee /tmp/unit-baseline-web.conf
# 3. Current sysctl
sysctl kernel.yama.ptrace_scope | tee /tmp/sysctl-baseline.txt
# Expect: kernel.yama.ptrace_scope = 1 (Debian default)
# 4. Functional baseline — confirm both servers + web healthy now
sudo systemctl is-active left4me-server@1 left4me-server@2 left4me-web
# Expect: active active active
# 5. Confirm srcds_linux running, gunicorn running
sudo systemctl status left4me-server@1 left4me-server@2 left4me-web \
--no-pager | head -40
# 6. RCON sanity (optional — needs an RCON password)
# (Use the web UI to fire `status` against server@1; expect a reply.)
# 7. Capture baseline syscalls (to compare what's blocked after filter)
# This is heavy; only run if you suspect a filter is too tight:
# sudo systemctl edit --runtime left4me-server@1
# Add: SystemCallLog=@privileged
# Reload, restart, observe journalctl -u for ~5 minutes, then revert.
```
Record the overall exposure score from `/tmp/sec-baseline-server.txt` (the last
line reports a value such as "5.4 MEDIUM"). Goal: a lower (more secure) score after the refactor.
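
To compare scores mechanically rather than by eye, the saved reports can be parsed. This sketch assumes the report ends with a line of the form `Overall exposure level for <unit>: <score> <rating>`; the function names are hypothetical.

```bash
# exposure_score FILE: print the numeric overall exposure from a saved
# `systemd-analyze security <unit>` report.
exposure_score() {
    sed -n 's/.*Overall exposure level for [^:]*: *\([0-9][0-9.]*\).*/\1/p' "$1" \
        | tail -n 1
}

# score_improved BEFORE AFTER: succeed when AFTER has a strictly lower score.
# Returns 2 when either report has no parsable score.
score_improved() {
    local before after
    before="$(exposure_score "$1")"
    after="$(exposure_score "$2")"
    [ -n "$before" ] && [ -n "$after" ] || return 2
    awk -v b="$before" -v a="$after" 'BEGIN { exit !(a + 0 < b + 0) }'
}
```

For example, `score_improved /tmp/sec-baseline-server.txt /tmp/sec-after-server.txt && echo improved` once Test 7 has produced the second file.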
## Test 1 — `PrivateUsers=true` compatibility
**Goal:** Confirm `PrivateUsers=true` works on `left4me-server@.service`
with the `+`-prefixed `ExecStartPre` overlay-mount helper.
**Pre-condition:** server@1 active, baseline captured.
**Drop-in:**
```bash
sudo install -d -m0755 /etc/systemd/system/left4me-server@1.service.d/
sudo tee /etc/systemd/system/left4me-server@1.service.d/test-01-privateusers.conf <<'EOF'
[Service]
PrivateUsers=true
EOF
sudo systemctl daemon-reload
sudo systemctl restart left4me-server@1
```
**Verify:**
```bash
# 1. Unit started cleanly
sudo systemctl is-active left4me-server@1
# Expect: active
# 2. ExecStartPre's nsenter+overlay-mount succeeded (the mount exists)
sudo findmnt /var/lib/left4me/runtime/1/merged
# Expect: a row showing overlay mounted
# 3. Process is running
pgrep -af srcds_linux
# Expect: at least one PID matching left4dead2
# 4. Process runs as the configured uid (read from the host's /proc)
PID=$(pgrep -f 'srcds_linux.*left4dead2' | head -1)
sudo grep -E '^(Uid|Gid):' "/proc/$PID/status"
# Expect: uid 980 (left4me). Outside the namespace, the kernel reports
# the unit's User=. Inside the namespace it's also 980 (identity map).
# 5. Userns confirmed
sudo readlink /proc/$PID/ns/user
sudo readlink /proc/1/ns/user
# Expect: different — different user namespaces
```
**Pass criteria:** all five checks pass.
**Failure handling:** if unit fails to start, check
`journalctl -u left4me-server@1 -n 100` for the failure reason. Most
likely cause if it fails: the overlay-mount helper itself depends on
the unit's mount namespace in a way that PrivateUsers breaks. (The `+`
prefix should bypass — verifying that assumption is the test's whole
point.)
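
The namespace comparison in check 5 can be turned into a pass/fail helper. This is a sketch under the assumption that namespace identity is read from the `/proc/<pid>/ns/<type>` symlinks; `ns_differs` and the `PROC_ROOT` override are names invented here (the override exists only so the helper can be exercised against a fake tree).

```bash
# Hypothetical override; defaults to the real procfs.
PROC_ROOT="${PROC_ROOT:-/proc}"

# ns_differs PID_A PID_B [KIND]: succeed when the two processes are in
# different namespaces of KIND (default: user). Returns 2 when either
# link cannot be read.
ns_differs() {
    local kind="${3:-user}" a b
    a="$(readlink "$PROC_ROOT/$1/ns/$kind")" || return 2
    b="$(readlink "$PROC_ROOT/$2/ns/$kind")" || return 2
    [ "$a" != "$b" ]
}
```

Run as root, `ns_differs "$PID" 1 && echo "userns isolated"` would mirror check 5; passing `mnt` as the third argument compares mount namespaces instead.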
**Cleanup:**
```bash
sudo rm /etc/systemd/system/left4me-server@1.service.d/test-01-privateusers.conf
sudo systemctl daemon-reload
sudo systemctl restart left4me-server@1
sudo systemctl is-active left4me-server@1 # active again
```
---
## Test 2 — `TemporaryFileSystem` + minimal bind set
**Goal:** Confirm srcds runs with `/var/lib`, `/etc`, `/opt`, `/home`,
`/root`, `/srv`, `/mnt`, and `/media` replaced by empty tmpfs mounts, with only
the listed paths bound back.
**Drop-in:**
```bash
sudo tee /etc/systemd/system/left4me-server@1.service.d/test-02-tmpfs.conf <<'EOF'
[Service]
# Remove the legacy paths so they don't collide with the new bind setup
ReadOnlyPaths=
ReadWritePaths=
# Virtual filesystem
TemporaryFileSystem=/var/lib /etc /opt /home /root /srv /mnt /media
BindReadOnlyPaths=/var/lib/left4me/installation
BindReadOnlyPaths=/var/lib/left4me/overlays
BindReadOnlyPaths=/etc/left4me/host.env
BindReadOnlyPaths=/etc/ssl /etc/ca-certificates
BindReadOnlyPaths=/etc/resolv.conf /etc/nsswitch.conf /etc/alternatives
BindPaths=/var/lib/left4me/runtime/%i
EOF
sudo systemctl daemon-reload
sudo systemctl restart left4me-server@1
```
**Verify:**
```bash
# 1. Unit started
sudo systemctl is-active left4me-server@1
# 2. From inside the unit's namespace: invisible files
PID=$(pgrep -f 'srcds_linux.*left4dead2' | head -1)
sudo nsenter --target $PID --mount -- ls -la /var/lib/left4me/left4me.db 2>&1
# Expect: No such file or directory
sudo nsenter --target $PID --mount -- ls -la /etc/left4me/web.env 2>&1
# Expect: No such file or directory
sudo nsenter --target $PID --mount -- ls /opt 2>&1
# Expect: empty or "No such file or directory"
sudo nsenter --target $PID --mount -- ls /var/lib/left4me/
# Expect: only installation, overlays, runtime (the bound paths)
# 3. Bound paths visible and right mode
sudo nsenter --target $PID --mount -- ls -la /var/lib/left4me/runtime/1/
# Expect: upper, work, merged dirs visible, RW
sudo nsenter --target $PID --mount -- ls /etc/left4me/
# Expect: only host.env
# 4. DNS works (workshop downloads, master server)
sudo nsenter --target $PID --mount --net -- getent hosts steamcommunity.com
# Expect: an IP
# 5. Game running normally
sudo systemctl status left4me-server@1 --no-pager | head -15
# Expect: active (running)
# 6. No SECCOMP/EACCES errors
sudo journalctl -u left4me-server@1 --since '2 minutes ago' \
| grep -iE 'permission|denied|seccomp|EACCES|ENOENT' | head -20
# Expect: nothing alarming. Some ENOENT may be normal (srcds probes
# files); the question is whether anything is failing fatally.
```
**Pass criteria:** unit active, DB/web.env/src invisible, runtime
visible+writable, DNS works, no fatal errors in journal.
**Failure handling:** if a bind path is missing on disk, the unit
fails to start with a clear error. Add the missing path or remove the
bind reference.
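
Since a missing bind source stops the unit from starting, the drop-in can be checked before it is applied. The following sketch handles only the simple forms used in this plan (space-separated paths, optional `source:destination`, `%i` as the instance); `missing_bind_sources` is a hypothetical helper, and paths containing spaces are not supported.

```bash
# missing_bind_sources DROPIN INSTANCE: print every bind source named in
# DROPIN that does not exist on disk; return 1 if any is missing.
missing_bind_sources() {
    local dropin="$1" instance="$2" rc=0 path
    for path in $(grep -E '^Bind(ReadOnly)?Paths=' "$dropin" \
                      | cut -d= -f2- \
                      | sed "s|%i|$instance|g"); do
        # A leading "-" tells systemd to ignore a missing source.
        case "$path" in -*) continue ;; esac
        # Keep the source half of "source:destination[:options]".
        path="${path%%:*}"
        if [ ! -e "$path" ]; then
            echo "$path"
            rc=1
        fi
    done
    return "$rc"
}
```

A run such as `missing_bind_sources test-02-tmpfs.conf 1` against a local copy of the drop-in would list the paths to create or drop before restarting the unit.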
**Cleanup:**
```bash
sudo rm /etc/systemd/system/left4me-server@1.service.d/test-02-tmpfs.conf
sudo systemctl daemon-reload
sudo systemctl restart left4me-server@1
```
---
## Test 3 — `SystemCallFilter` (logging mode)
**Goal:** Discover what srcds calls under load before committing to a
filter. Run with `SystemCallLog=` (audit only, doesn't block) for 5-10
minutes of live play.
**Drop-in:**
```bash
sudo tee /etc/systemd/system/left4me-server@1.service.d/test-03-syslog.conf <<'EOF'
[Service]
SystemCallArchitectures=native
# Log every syscall in @privileged + @debug + @mount + @raw-io
SystemCallLog=@privileged @debug @mount @raw-io
SystemCallFilter=
EOF
sudo systemctl daemon-reload
sudo systemctl restart left4me-server@1
```
**Verify (and produce data):**
```bash
# 1. Unit active
sudo systemctl is-active left4me-server@1
# 2. Capture logs for 5 minutes during normal play
#    (manually connect a Steam client to the server, walk around, then disconnect)
#    SystemCallLog= records are seccomp audit lines (type=1326) in the kernel
#    log, so read them with -k rather than -u.
sudo journalctl -k --since '5 minutes ago' \
  | grep -iE 'type=1326|seccomp' \
  | tee /tmp/syscall-log-test3.txt
# 3. Analyze
sort -u /tmp/syscall-log-test3.txt > /tmp/syscall-log-test3-uniq.txt
wc -l /tmp/syscall-log-test3-uniq.txt
# Read through; identify whether @debug or @mount or @privileged
# contains any syscall srcds calls during normal operation.
```
**Pass criteria:** capture is complete. Decision feeds Test 4.
**Cleanup:**
```bash
sudo rm /etc/systemd/system/left4me-server@1.service.d/test-03-syslog.conf
sudo systemctl daemon-reload
sudo systemctl restart left4me-server@1
```
---
## Test 4 — `SystemCallFilter` (enforcement mode)
**Goal:** Apply the candidate `SystemCallFilter=` and confirm srcds
runs without any SECCOMP-killed calls. Tightness driven by Test 3
results.
**Drop-in:**
```bash
sudo tee /etc/systemd/system/left4me-server@1.service.d/test-04-syscall.conf <<'EOF'
[Service]
SystemCallArchitectures=native
SystemCallFilter=@system-service
SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete @privileged
EOF
sudo systemctl daemon-reload
sudo systemctl restart left4me-server@1
```
**Verify:**
```bash
# 1. Unit active
sudo systemctl is-active left4me-server@1
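# 2. Watch the kernel log for SECCOMP kills for ~10 minutes during play
#    (combining -u with -k would match nothing: kernel messages carry no unit field)
sudo journalctl -kf | grep --line-buffered 'type=1326'
# Press Ctrl-C after 10 min if no SECCOMP audit lines (type=1326) appeared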
# 3. Functional: server accepts connections, plugins load
# (use Steam client; verify in-game)
# Optional RCON check:
# sudo rcon -p $PW -a left4.me:27015 "sm plugins list"
# Expect: list of plugins, all loaded.
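# 4. Verify ptrace is blocked
GUNICORN_PID=$(pgrep -f 'gunicorn.*l4d2web' | head -1)
PID=$(pgrep -f 'srcds_linux.*left4dead2' | head -1)
# Confirm the filter is attached to srcds itself (Seccomp: 2 = filter mode)
sudo grep -E '^Seccomp:' "/proc/$PID/status"
# Caveat: nsenter --mount joins only the mount namespace. gdb started this
# way runs as root outside the unit's seccomp filter, so a successful attach
# here does not prove the filter failed.
sudo nsenter --target $PID --mount -- /usr/bin/gdb --batch -p $GUNICORN_PID 2>&1 | tail -5
# Expect from inside the sandbox: ptrace: Operation not permitted (or seccomp denial)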
```
**Pass criteria:** unit active for ≥10 min, no SECCOMP kills, plugins
load, ptrace blocked.
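**Failure handling:** if SECCOMP kills appear:
- Identify the syscall from the audit line (`syscall=<num> compat=0`) and
resolve it via `scmp_sys_resolver -a $(uname -m) <num>` (shipped in the
`seccomp` package on Debian).
- Relax the filter: remove the offending group from the deny list, OR
make the filter return an error instead of killing the process, either
for the whole filter (`SystemCallErrorNumber=EPERM`) or per entry with
the `name:errno` suffix syntax of `SystemCallFilter=`.
- If the unit dies immediately at start, run `file` on `srcds_linux`:
`SystemCallArchitectures=native` permits only the host's native ABI, so
a 32-bit binary on a 64-bit host would need the extra architecture (for
example `x86`) listed.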
**Cleanup:**
```bash
sudo rm /etc/systemd/system/left4me-server@1.service.d/test-04-syscall.conf
sudo systemctl daemon-reload
sudo systemctl restart left4me-server@1
```
---
## Test 5 — `ProcSubset=pid` + `ProtectProc=invisible`
**Goal:** Confirm that `/proc` inside the unit exposes only process
entries (`ProcSubset=pid`) and hides processes owned by other users
(`ProtectProc=invisible`).
**Drop-in:**
```bash
sudo tee /etc/systemd/system/left4me-server@1.service.d/test-05-proc.conf <<'EOF'
[Service]
ProtectProc=invisible
ProcSubset=pid
EOF
sudo systemctl daemon-reload
sudo systemctl restart left4me-server@1
```
**Verify:**
```bash
# 1. Unit active
sudo systemctl is-active left4me-server@1
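# 2. /proc visibility narrowed
PID=$(pgrep -f 'srcds_linux.*left4dead2' | head -1)
sudo nsenter --target $PID --mount --pid -- ls /proc | head -20
# Expect: numeric PID directories (plus self/thread-self) only; entries such
# as /proc/sys or /proc/meminfo are gone (ProcSubset=pid).
# Caveat: ProtectProc=invisible hides other users' processes from
# unprivileged readers. Root (as used by nsenter here) still sees every PID,
# and without PrivateUsers= a same-uid gunicorn may stay visible to srcds.
# 3. Can't read other procs' environ
GUNICORN_PID=$(pgrep -f 'gunicorn.*l4d2web' | head -1)
sudo nsenter --target $PID --mount -- cat /proc/$GUNICORN_PID/environ 2>&1
# Expect from inside the sandbox: No such file or directory (invisible),
# not Permission denied. Record what the root-run command actually returns.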
```
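**Pass criteria:** unit active; non-process entries absent from `/proc`;
no gunicorn PIDs visible to an unprivileged reader inside the unit (if they
remain visible because of the shared uid, record that and re-check under Test 7).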
**Cleanup:**
```bash
sudo rm /etc/systemd/system/left4me-server@1.service.d/test-05-proc.conf
sudo systemctl daemon-reload
sudo systemctl restart left4me-server@1
```
---
## Test 6 — `MemoryDenyWriteExecute=true`
**Goal:** Test whether Source engine + sourcemod work under `MemoryDenyWriteExecute=true`.
**Likely to fail.** Skip if uncertain.
**Drop-in:**
```bash
sudo tee /etc/systemd/system/left4me-server@1.service.d/test-06-mdw.conf <<'EOF'
[Service]
MemoryDenyWriteExecute=true
EOF
sudo systemctl daemon-reload
sudo systemctl restart left4me-server@1
```
**Verify:**
```bash
# 1. Unit active
sudo systemctl is-active left4me-server@1
# 2. Run for 10+ minutes during normal play, including:
# - Connect a Steam client
# - Walk around a map
# - Trigger a plugin (rcon: sm_admin)
# - Map change
# - Disconnect
# 3. Watch for crashes
sudo journalctl -u left4me-server@1 --since '15 minutes ago' \
| grep -iE 'segfault|SIGSEGV|coredump|abort|EPERM.*mprotect'
# Expect: empty
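# 4. SECCOMP audit records in the kernel log. Records carry syscall numbers,
#    not names; resolve them with scmp_sys_resolver. A denied mprotect may
#    only surface as EPERM inside srcds rather than as an audit line.
sudo journalctl -k --since '15 minutes ago' \
  | grep -i 'type=1326'
# Expect: empty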
```
**Pass criteria:** no crashes, no relevant SECCOMP audit lines.
**Cleanup:**
```bash
sudo rm /etc/systemd/system/left4me-server@1.service.d/test-06-mdw.conf
sudo systemctl daemon-reload
sudo systemctl restart left4me-server@1
```
**Decision:** if pass → include `MemoryDenyWriteExecute=true` in the
final composition. If fail → exclude (and document the reason in the
result).
---
## Test 7 — Full proposed composition (everything that passed)
**Goal:** Compose tests 1, 2, 4, 5, (6 if it passed) into a single
drop-in and verify nothing interacts badly.
**Drop-in:** (Adjust to skip Test 6's directives if Test 6 failed.)
```bash
sudo tee /etc/systemd/system/left4me-server@1.service.d/test-07-full.conf <<'EOF'
[Service]
# Identity / privilege
NoNewPrivileges=true
RestrictSUIDSGID=true
CapabilityBoundingSet=
AmbientCapabilities=
UMask=0027
# Namespaces
PrivateUsers=true
PrivateTmp=true
PrivateDevices=true
PrivateIPC=true
ProtectHome=true
# Filesystem view (clean slate)
ReadOnlyPaths=
ReadWritePaths=
TemporaryFileSystem=/var/lib /etc /opt /home /root /srv /mnt /media
BindReadOnlyPaths=/var/lib/left4me/installation
BindReadOnlyPaths=/var/lib/left4me/overlays
BindReadOnlyPaths=/etc/left4me/host.env
BindReadOnlyPaths=/etc/ssl /etc/ca-certificates
BindReadOnlyPaths=/etc/resolv.conf /etc/nsswitch.conf /etc/alternatives
BindPaths=/var/lib/left4me/runtime/%i
ProtectSystem=strict
# /proc + kernel
ProtectProc=invisible
ProcSubset=pid
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectKernelLogs=true
ProtectClock=true
ProtectControlGroups=true
ProtectHostname=true
LockPersonality=true
# Syscall
SystemCallArchitectures=native
SystemCallFilter=@system-service
SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete @privileged
# Network
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
# IPC + realtime + namespaces
RestrictNamespaces=true
RestrictRealtime=true
RemoveIPC=true
KeyringMode=private
# (Include only if Test 6 passed:)
# MemoryDenyWriteExecute=true
EOF
sudo systemctl daemon-reload
sudo systemctl restart left4me-server@1
```
**Verify:**
```bash
# 1. Unit active
sudo systemctl is-active left4me-server@1
sleep 30
sudo systemctl is-active left4me-server@1 # still active
# 2. systemd-analyze: score should drop significantly
sudo systemd-analyze security left4me-server@1.service \
| tee /tmp/sec-after-server.txt
diff /tmp/sec-baseline-server.txt /tmp/sec-after-server.txt \
| head -40
# Expect: many ✓ lines that were ✗, score dropped
# 3. Run smoke matrix (next section)
```
**Smoke matrix (run after Test 7 settles):**
```bash
# S1: server is responsive
sudo systemctl status left4me-server@1 --no-pager | head -10
# Active (running), recent green
# S2: srcds is in-game
PID=$(pgrep -f 'srcds_linux.*left4dead2' | head -1)
[ -n "$PID" ] && echo "OK: srcds PID $PID" || echo "FAIL"
# S3: from outside, RCON responds
# (do this from the operator's laptop or via the web UI)
# S4: workshop / overlay refresh path
# (trigger from web UI; verify the overlay rebuild succeeds — the
# script-sandbox is a SEPARATE unit, not affected by these changes,
# so any failure is in the web app's invocation path, not the
# sandbox itself.)
# S5: web app can still sudo helpers
# (trigger a server start/stop from the web UI; if the sudo path
# fails, the web app's hardening is too tight — but we haven't
# changed the web unit yet, so this should still work.)
# S6: log streaming works
# (open the web UI's log view for server@1; verify lines flow.)
# S7: file upload to overlay
# (upload a small file via the file-tree endpoint; verify it
# appears on disk in /var/lib/left4me/overlays/<id>/.)
# S8: peer server unaffected
sudo systemctl is-active left4me-server@2
# active (we didn't touch it)
```
**Pass criteria:** all smoke items pass. systemd-analyze score
dropped significantly.
**Failure handling:** if anything in the smoke fails, identify which
directive caused it by removing them one at a time until smoke
passes. Document the offender.
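
The one-at-a-time removal can be mechanized by generating variants of the Test 7 drop-in, each with a single directive disabled. This is a sketch only: `directive_count` and `dropin_without` are hypothetical names, and every `Key=Value` line is treated as one directive, so repeated keys such as `BindReadOnlyPaths=` are disabled individually.

```bash
# directive_count DROPIN: number of Key=Value lines in DROPIN.
directive_count() {
    grep -c '^[A-Za-z][A-Za-z0-9]*=' "$1"
}

# dropin_without DROPIN N: print DROPIN with its N-th directive commented
# out; section headers, comments, and all other directives pass through.
dropin_without() {
    awk -v n="$2" '
        /^[A-Za-z][A-Za-z0-9]*=/ {
            i++
            if (i == n) { print "# bisect-disabled: " $0; next }
        }
        { print }
    ' "$1"
}
```

Looping `n` from 1 to `directive_count`, writing each variant over the Test 7 drop-in, reloading, restarting, and rerunning the failing smoke item would point at the first directive whose removal makes it pass.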
**DO NOT cleanup yet** — leave Test 7 in place for Test 8.
---
## Test 8 — Attack verification (the audit gaps)
**Goal:** Confirm the threat-model defenses (D1, D2, D3, D5) actually
work end-to-end.
**Pre-condition:** Test 7's drop-in still in place.
**Verify:**
```bash
PID=$(pgrep -f 'srcds_linux.*left4dead2' | head -1)
GUNICORN_PID=$(pgrep -f 'gunicorn.*l4d2web' | head -1)
# D1.a — srcds cannot read DB
sudo nsenter --target $PID --mount -- cat /var/lib/left4me/left4me.db 2>&1 | head -1
# Expect: cat: /var/lib/left4me/left4me.db: No such file or directory
# D1.b — srcds cannot read web.env
sudo nsenter --target $PID --mount -- cat /etc/left4me/web.env 2>&1 | head -1
# Expect: cat: /etc/left4me/web.env: No such file or directory
# D1.c: srcds cannot see the web app source under /opt
sudo nsenter --target $PID --mount -- ls /opt 2>&1 | head -5
# Expect: empty listing or No such file or directory
# D2.a — srcds cannot ptrace gunicorn (syscall filter)
sudo nsenter --target $PID --mount -- /usr/bin/gdb --batch -p $GUNICORN_PID 2>&1 | tail -3
# Expect: Operation not permitted
# D2.b — srcds cannot read /proc/<gunicorn>/environ
sudo nsenter --target $PID --mount -- cat /proc/$GUNICORN_PID/environ 2>&1 | head -1
# Expect: No such file or directory (ProtectProc=invisible)
# D2.c — srcds cannot read /proc/<gunicorn>/mem
sudo nsenter --target $PID --mount -- cat /proc/$GUNICORN_PID/mem 2>&1 | head -1
# Expect: No such file or directory
# D3 — srcds cannot use sudo helpers (NoNewPrivileges blocks setuid)
sudo nsenter --target $PID --mount -- sudo -n /usr/local/libexec/left4me/left4me-systemctl show server@2 2>&1 | head -3
# Expect: a sudo error about no new privileges, or operation not permitted
# D5 — server@1 cannot ptrace server@2's srcds
PID2=$(pgrep -f 'srcds_linux.*\@2' | head -1)
[ -n "$PID2" ] && sudo nsenter --target $PID --mount -- /usr/bin/gdb --batch -p $PID2 2>&1 | tail -3
# Expect: Operation not permitted (cross-instance userns OR syscall filter)
# Bonus — confirm PrivateUsers is in effect
sudo readlink /proc/$PID/ns/user
sudo readlink /proc/1/ns/user
# Expect: different
```
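**Pass criteria:** every attack vector returns an error. Commands launched
through `sudo nsenter --mount` run as root with only the mount namespace
joined, so they exercise the unit's filesystem view (D1) but not its seccomp
filter, user namespace, or `NoNewPrivileges`; treat the D2, D3, and D5
results as indicative and note that limitation in the results.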
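
The per-vector results can be collected with a small wrapper that inverts exit codes, so that a denied operation counts as a pass. This is an illustrative sketch; `expect_denied` and the `FAILED` counter are names invented here. A command that fails for an unrelated reason still counts as PASS, so the wrapper only separates out the "could not be run" exit codes 126 and 127.

```bash
FAILED=0

# expect_denied LABEL CMD...: PASS when CMD fails, FAIL when it succeeds,
# SKIP when CMD could not be executed at all (exit 126 or 127).
expect_denied() {
    local label="$1" rc=0
    shift
    "$@" >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0)
            echo "FAIL $label (command succeeded)"
            FAILED=$((FAILED + 1))
            ;;
        126|127)
            echo "SKIP $label (command could not be run)"
            ;;
        *)
            echo "PASS $label"
            ;;
    esac
}
```

For instance, `expect_denied D1.a sudo nsenter --target "$PID" --mount -- cat /var/lib/left4me/left4me.db` prints one line for the results section, and `FAILED` holds the number of vectors that were not blocked.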
**Cleanup:** **Do not remove the drop-in yet** — leave it for Test 9.
---
## Test 9 — System-wide sysctl: `kernel.yama.ptrace_scope=2`
**Goal:** Add a system-wide ptrace restriction as a belt-and-braces layer on top of the per-unit filters.
**Apply:**
```bash
sudo tee /etc/sysctl.d/99-left4me-ptrace.conf <<'EOF'
# Block ptrace except from root (CAP_SYS_PTRACE).
# Combined with SystemCallFilter=~@debug + PrivateUsers=true in the
# unit, this gives defense-in-depth at three levels.
kernel.yama.ptrace_scope=2
EOF
sudo sysctl --system | grep yama
# Expect: kernel.yama.ptrace_scope = 2
sysctl kernel.yama.ptrace_scope
# Expect: 2
```
**Verify:**
```bash
# As left4me (no caps), gdb attach to gunicorn from OUTSIDE the unit's
# namespace
sudo -u left4me /usr/bin/gdb --batch -p $GUNICORN_PID 2>&1 | tail -3
# Expect: Operation not permitted
# Operator gdb (as root) still works:
sudo /usr/bin/gdb --batch -ex "info threads" -p $GUNICORN_PID 2>&1 | tail -10
# Expect: gdb output (debugging is admin-only now)
```
**Pass criteria:** non-root can't ptrace anything; root still can.
**No cleanup** — this is permanent (commit to /etc/sysctl.d/).
---
## Test 10 — Web unit hardening (carefully)
**Goal:** Apply non-sudo-breaking directives to `left4me-web.service`.
**Pre-condition:** Test 7's server drop-in still in place. Web is at
baseline.
**Drop-in:**
```bash
sudo install -d -m0755 /etc/systemd/system/left4me-web.service.d/
sudo tee /etc/systemd/system/left4me-web.service.d/test-10-web.conf <<'EOF'
[Service]
# (NoNewPrivileges intentionally NOT set — web sudoes to helpers.)
# (PrivateUsers intentionally NOT set — would break sudo's setuid.)
# (CapabilityBoundingSet not set — sudo + PAM need caps.)
ProtectSystem=strict
ProtectHome=true
LockPersonality=true
UMask=0027
# /proc + kernel
ProtectProc=invisible
ProcSubset=pid
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectKernelLogs=true
ProtectClock=true
ProtectControlGroups=true
ProtectHostname=true
# Syscall (no ~@privileged — sudo needs setuid/etc.)
SystemCallArchitectures=native
SystemCallFilter=@system-service
SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete
# Network
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
# Misc
RestrictNamespaces=true
RestrictRealtime=true
RemoveIPC=true
EOF
sudo systemctl daemon-reload
sudo systemctl restart left4me-web
```
**Verify:**
```bash
# 1. Web up
sudo systemctl is-active left4me-web
# 2. Web responds (curl from the host)
curl -sI http://127.0.0.1:8000/ | head -5
# Expect: HTTP/1.1 200 or similar (whatever the default route is)
# 3. Web sudo path works — trigger from operator's laptop, watching the
# web UI. Start/stop a server; observe success.
# 4. systemd-analyze score
sudo systemd-analyze security left4me-web.service \
| tee /tmp/sec-after-web.txt
diff /tmp/sec-baseline-web.txt /tmp/sec-after-web.txt | head -20
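# 5. Web cannot ptrace srcds (D4)
WEB_PID=$(pgrep -f 'gunicorn.*l4d2web' | head -1)
PID=$(pgrep -f 'srcds_linux.*left4dead2' | head -1)
sudo -u left4me /usr/bin/gdb --batch -p $PID 2>&1 | tail -3
# Expect: Operation not permitted (Yama ptrace_scope=2 from Test 9)
# The check that matters is an attach attempted from inside the web unit.
# Caveat: nsenter --mount joins only the mount namespace, so this gdb runs
# as root outside the web unit's seccomp filter.
sudo nsenter --target $WEB_PID --mount -- /usr/bin/gdb --batch -p $PID 2>&1 | tail -3
# Expect from inside the sandbox: Operation not permitted (SystemCallFilter blocks ptrace)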
```
**Pass criteria:** all of above.
**Failure handling:** if sudo from web breaks, the most likely culprit
is a `SystemCallFilter=` line that is too tight. `~@debug` is an unlikely
cause, since sudo does not need debugging syscalls such as `ptrace`, and
`~@privileged` is deliberately absent from the web filter so that sudo's
setuid calls keep working. Identify the denied syscall from the audit log
before relaxing a group.
**Cleanup:**
```bash
sudo rm /etc/systemd/system/left4me-web.service.d/test-10-web.conf
sudo systemctl daemon-reload
sudo systemctl restart left4me-web
```
(Web reverts to baseline. Server drop-in stays for the report.)
---
## Test 11 — Soak test
**Goal:** Run the composition for an extended period to surface
race-condition or workload-dependent issues.
**Pre-condition:** Test 7 drop-in on server@1; Test 9 sysctl in place.
**Procedure:**
```bash
# Run for 24-48 hours; observe:
sudo journalctl -u left4me-server@1 --since '24 hours ago' \
| grep -iE 'seccomp|denied|EACCES|EPERM' | wc -l
# Expect: 0 or a very small number (some EACCES on benign probes
# are normal)
# Kernel log only: -u combined with -k would match nothing.
sudo journalctl -k --since '24 hours ago' \
  | grep 'type=1326' | wc -l
# Expect: 0
sudo systemctl status left4me-server@1
# Expect: active, no restarts since start
```
**Pass criteria:** no SECCOMP kills over the soak period, no
unexpected restarts.
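
The "no unexpected restarts" criterion can be checked from systemd's own counter rather than by reading `systemctl status`. This sketch assumes the unit's `NRestarts` property, which counts automatic restarts triggered by `Restart=` and not manual ones; `restart_count`, `soak_ok`, and the `SYSTEMCTL` override are hypothetical names.

```bash
# Hypothetical override so the helpers can be exercised without systemd.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# restart_count UNIT: print systemd's automatic-restart counter for UNIT.
restart_count() {
    "$SYSTEMCTL" show "$1" -p NRestarts --value
}

# soak_ok UNIT: succeed when UNIT is active and has not been auto-restarted.
soak_ok() {
    [ "$("$SYSTEMCTL" is-active "$1")" = "active" ] || return 1
    [ "$(restart_count "$1")" -eq 0 ]
}
```

At the end of the soak window, `soak_ok left4me-server@1 && echo "soak clean"` complements the SECCOMP line count above.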
---
## Cleanup (after all tests pass)
```bash
# Remove all test drop-ins
sudo rm -f /etc/systemd/system/left4me-server@1.service.d/test-*.conf
sudo rm -f /etc/systemd/system/left4me-web.service.d/test-*.conf
sudo systemctl daemon-reload
sudo systemctl restart left4me-server@1 left4me-web
sudo systemctl is-active left4me-server@1 left4me-web # both active
# Sysctl from Test 9 STAYS in place.
# Remove temp files
rm -f /tmp/sec-baseline-*.txt /tmp/sec-after-*.txt
rm -f /tmp/unit-baseline-*.conf
rm -f /tmp/syscall-log-*.txt
```
---
## Results template
Append the executing session's findings here. One paragraph per test.
### Test 1 — PrivateUsers
- Pass / fail: TBD
- Notes:
### Test 2 — TemporaryFileSystem + binds
- Pass / fail: TBD
- Notes:
### Test 3 — SystemCallLog discovery
- Pass / fail: TBD
- Syscalls observed under load (if any from @debug/@mount/@privileged):
- Notes:
### Test 4 — SystemCallFilter enforcement
- Pass / fail: TBD
- If filter had to be relaxed, which group:
- Notes:
### Test 5 — ProcSubset + ProtectProc
- Pass / fail: TBD
- Notes:
### Test 6 — MemoryDenyWriteExecute
- Pass / fail: TBD (likely fail; document the failure mode)
- Notes:
### Test 7 — Full composition
- Pass / fail: TBD
- systemd-analyze score before/after:
- Notes:
### Test 8 — Attack verification
- Pass / fail: TBD
- Per-vector results (D1.a, D1.b, ..., D5):
### Test 9 — Yama ptrace_scope=2
- Applied: TBD
- Operator workflow impact noted:
### Test 10 — Web hardening
- Pass / fail: TBD
- Sudo path verified working:
- systemd-analyze score before/after:
### Test 11 — Soak
- Duration:
- Issues observed:
---
## Output of this test plan
When all tests complete:
1. Mark this document with **status: tested** and record the dates.
2. Open a new implementation plan
(`docs/superpowers/plans/2026-MM-DD-hardening-refactor.md`) that
commits the proven composition to the ckn-bw reactor + reference
units + test suite.
3. Decide on the deferred questions:
- 3-user uid split — necessary or covered by hardening?
- AppArmor profile follow-up — pursue or close?
- `MemoryDenyWriteExecute=true` — include if Test 6 passed?
- `SocketBindAllow=` — add to lock the gameserver port range?
4. Mark `2026-05-15-user-uid-split-design.md` as superseded or closed
per the answer to the uid-split question in item 3.
## Pointers
- Threat model: `docs/superpowers/specs/2026-05-15-hardening-threat-model.md`
- Defenses survey: `docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md`
- Live unit source: `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`
- Reference units: `deploy/files/usr/local/lib/systemd/system/`
- Tools needed on `left4.me`:
- `systemd-analyze` (in `systemd` package, already installed)
- `scmp_sys_resolver` (in the `seccomp` package on Debian; install on demand for
Test 3/4 if filters need analysis)
- `gdb` (for ptrace tests; install on demand)
- `nsenter` (in `util-linux`, already installed)
- `findmnt`, `pgrep`, standard userspace

@ -0,0 +1,345 @@
# left4me application hardening — threat model
**Status:** living spec, intended input to a hardening implementation plan.
Paired with `2026-05-15-hardening-defenses-survey.md` and
`2026-05-15-hardening-test-plan.md`.
This document establishes *what we defend against and what we accept losing*.
The defenses survey and test plan operationalize this against the codebase.
## Context
The 2026-05-15 work landed deploy-dir-rethink + build-time-idmap and
queued "uid split decision" as the next session's task
(`2026-05-15-user-uid-split-design.md`). Audit of the running 2-user
configuration found that the gameserver's systemd hardening blocks
privilege escalation but leaves same-uid attack surface wide open:
RCON passwords plaintext in `/var/lib/left4me/left4me.db` (readable by
srcds), Flask `SECRET_KEY` in `/etc/left4me/web.env` (also readable),
no ptrace block on `left4me-server@.service`, no `/proc` isolation.
Rather than answer the original "1/2/3 uids" question in isolation,
this work treats application hardening as a first-class refactor: ground
the decision in an explicit threat model, survey the full Linux+systemd
defense menu, test what composes safely with Source engine + the rest of
the stack, then implement.
## Operating posture (assumed)
Solo-operator, single-host infra (`left4.me` / `ovh.left4me`,
141.95.32.8). Host is a personal VPS, not multi-tenant. The only privileged
operator is the user. There are no shell logins as `left4me` or
`l4d2-sandbox`. All access to those uids is funneled through the
systemd-managed units (`left4me-web.service`, `left4me-server@.service`,
`left4me-script-sandbox`). The host runs nothing other than left4me +
ckn-bw-managed baseline (nginx, sshd, fail2ban-class basics).
If those assumptions don't hold (e.g., shared host with other tenants,
non-systemd-mediated access to the uids), revise this document before
proceeding, because the threat surface changes meaningfully.
## Assets
Ordered by impact-if-compromised. Compromise means the attacker can
exfiltrate, modify, or destroy the asset.
### Tier 1 — catastrophic, no easy recovery
| Asset | Where | Impact of compromise |
|---|---|---|
| Host root | the box | Total compromise of every service on the host. |
| `web.env` Flask `SECRET_KEY` | `/etc/left4me/web.env`, `root:left4me 0640` | Session forgery: attacker logs in as any admin without password. |
| `web.env` Steam Web API key | same | Attacker can query/operate Steam Web API as us. Rate-limited; reputational. |
| Server RCON passwords | DB: `Server.rcon_password` plaintext (`l4d2web/models.py:146-148`) | Attacker can execute arbitrary RCON on every gameserver: `sm_kick`, `rcon say`, server lockup, plugin abuse. |
| User password hashes (bcrypt) | DB: `User.password_digest` (`l4d2web/models.py:31`) | Offline cracking per user. bcrypt slows it but doesn't stop it. |
### Tier 2 — severe but bounded
| Asset | Where | Impact |
|---|---|---|
| `/opt/left4me/src/` Python source | `left4me:left4me` on disk | Persistent backdoor in web app via gunicorn reload. Currently RO from inside the server unit (`ProtectSystem=strict` covers `/opt`); RW from inside the web unit. |
| Overlay content | `/var/lib/left4me/overlays/<id>/` | Persistent sourcemod plugin or replaced binary; surfaces in every gameserver using that overlay. |
| Steam installation | `/var/lib/left4me/installation/` | Tampered `srcds_linux`; trivial persistence. Currently RO from server, RW from web. |
| Sourcemod admin lists | inside overlays | RCON-equivalent: admin commands in-game. |
| Workshop cache | `/var/lib/left4me/workshop_cache/` | Used by builds; tampered content surfaces in next overlay. |
### Tier 3 — limited, recoverable
Job history, build logs, the small subset of in-game state not covered by
the above (e.g., live player slot in a specific match).
## Trust boundaries
Lines we want enforced. "Enforced" = the kernel + systemd, not "the
process politely doesn't cross it."
| Id | From | To | Strength today | Strength wanted |
|---|---|---|---|---|
| TB1 | External network | host shell | Strong (firewall, no extra services) | Strong |
| TB2 | Gameserver process | rest of the host | Weak (same-uid + same-FS view) | Strong |
| TB3 | Web app | rest of the host | Weak (same-uid + same-FS view) | Medium (sudo path inherent) |
| TB4 | Sandbox | rest of the host | Strong (separate uid + hardened unit) | Strong |
| TB5 | Gameserver instance N | gameserver instance M | None (same-uid, same-DB) | Strong |
| TB6 | Web app | gameserver runtime state | None (same-uid, shared `runtime/<n>` access) | Medium (web needs to stage server.cfg) |
| TB7 | Gameserver | web-only secrets (DB, web.env) | None | Strong |
| TB8 | Workshop content | srcds-process | Inherent (content runs as data) | n/a — not a software boundary |
TB2, TB5, and TB7 are the highest-leverage gaps. TB6 can only be partial
because the web app legitimately writes per-instance config; the intended
boundary allows "web can write per-instance config" and denies "web can
ptrace srcds".
## Attackers
### A1 — Anonymous external attacker (primary)
Reaches public surfaces:
- gunicorn on `:8000` (behind nginx + admin auth)
- srcds on UDP `:27015`+ per instance (game protocol; no auth)
- (Maybe: workshop subscription endpoints if any; check.)
Capabilities: arbitrary network packets. Goal: code execution on the
host, then exfiltrate secrets and persist.
### A2 — Authenticated admin (operator)
In the assumed posture this is *the user*, single person. Out of scope as
a threat per operator's choice (insider == operator). If admin auth ever
expands to multiple operators, revise.
### A3 — Malicious workshop content
A workshop addon (map, plugin, asset pack) is published to the Steam
workshop and pulled into a build. The content runs inside srcds via
Source engine + sourcemod loading. Capabilities: same as A1 once loaded
into srcds (the engine doesn't have a strong privilege boundary against
its own loaded plugins). Distinct in that the entry vector is curated by
the operator (workshop link added to a blueprint), not arbitrary network
input. Risk floor: the operator vetted the source.
### A4 — Compromised player session
A connected player exploits a Source-engine protocol bug. Functionally a
subset of A1 — same capability set once code is running in srcds.
### A5 — Local attacker on the host
Out of scope per operating posture. No non-root local accounts beyond
the systemd-managed service uids.
### A6 — Steam binary supply-chain
`srcds_linux` is a binary from Valve. A compromised Valve build would
already be running as `left4me` and there's no practical defense at
this layer. Out of scope.
## Attack scenarios
### S1 — L4D2 engine RCE → exfil + persist
A1 sends a crafted packet to srcds; srcds executes attacker code as
`left4me` inside `left4me-server@.service`.
**Today, attacker can:**
- Read DB → all RCON passwords (plaintext), all bcrypt hashes.
- Read `web.env` → SECRET_KEY, Steam API key.
- ptrace gunicorn → in-memory secrets, current sessions.
- Read `/proc/<gunicorn-pid>/environ` → same env as `web.env`.
- ptrace + read DB of peer `left4me-server@<n>` — cross-server compromise.
- `sudo left4me-systemctl|journalctl|overlay` for any instance.
- Cannot write `/opt/left4me/src/` (ProtectSystem=strict covers `/opt`).
- Cannot acquire new caps (NoNewPrivileges).
**Defended outcome (goal):** Blast radius limited to "this gameserver's
runtime state during this session" — no peer-server compromise, no DB
access, no `web.env` access, no ptrace.
### S2 — Web app RCE → secrets + persistence
A1 finds a Flask vulnerability (Jinja SSTI, SQLAlchemy injection, auth
bypass, file-upload escape). Web executes attacker code as `left4me`
inside `left4me-web.service`.
**Today, attacker can:**
- Read + write DB (web's primary path).
- Read `web.env`.
- Write `/opt/left4me/src/` → backdoor next gunicorn reload.
- `sudo` all helper verbs.
- ptrace srcds peers, modify their `runtime/<n>/` upper layer.
- Modify overlays (writes to `/var/lib/left4me/overlays/`).
**Defended outcome (goal):** Cannot ptrace gameservers; cannot read
`/proc/<srcds-pid>/*`; web compromise still owns its DB and env (its
primary attack surface, so this is *acceptable residual*).
### S3 — Cross-server contamination
S1 played out on srcds@1; attacker pivots to srcds@2.
**Today:** trivial — ptrace srcds@2, read its memory; or just read the
DB to learn srcds@2's RCON password and send commands.
**Defended outcome (goal):** Blocked. Per-instance namespace isolation
(or per-instance uid) means kernel rejects ptrace; DB invisible to
gameserver uid hides the RCON list.
### S4 — Malicious workshop content
An A3 addon is added to a blueprint; the addon includes a Squirrel
(VScript) or SourceMod plugin that abuses engine APIs for file I/O or
network calls.
**Today + with hardening:** functionally equivalent to S1 — the plugin
runs as srcds, same blast radius. No software boundary prevents this;
the only defense is what's outside the unit. So this is *covered* if S1
is covered.
### S5 — Sudoers helper abuse
S1 or S2 attacker uses the sudo grants to widen access.
**Today:** sudoers grants (audit findings, `deploy/files/etc/sudoers.d/left4me`):
- `left4me-systemctl <name> {enable|disable|show}` — any instance, no
ownership check
- `left4me-journalctl <name>` — read any unit's journal
- `left4me-overlay mount|umount <name>` — any instance
- `left4me-script-sandbox <overlay_id> <script>` — runs as `l4d2-sandbox`
A compromised gameserver can enable/disable peer instances, read their
journals, mount/umount their overlays. Not root escalation, but a
significant escalation.
**Defended outcome:** sudoers reachable only from `left4me-web`. The
gameserver uid (or the gameserver's namespace) gets none of the helper
grants. This is naturally true if the helpers are invoked only by the
web app; ensure the gameserver unit cannot sudo (no PAM, no setuid bits
in its FS view).
### S6 — Sandbox escape
The attacker has reached A1-equivalent code execution inside
`left4me-script-sandbox`. The sandbox runs as `l4d2-sandbox`, fully
hardened (verified during 2026-05-15 work).
**Today:** sandbox-escape attacker has `l4d2-sandbox` capabilities only.
With build-time-idmap, writes through the bind land on disk as
`left4me`, but the sandbox process itself cannot interact with `left4me`
processes (different uid). Existing isolation is strong.
**Defended outcome:** unchanged (already strong). Document as a
load-bearing invariant; do not weaken.
## What we accept losing
Decisions to *not* defend, with reasoning. Future work might revisit.
- **Kernel CVEs** that escape namespaces or seccomp. No practical defense
short of running each workload in its own VM (e.g. under KVM). Out of scope.
- **systemd unit-config CVEs**. Unit hardening relies on systemd
honoring directives correctly. Out of scope.
- **Steam binary compromise**. `srcds_linux` is Valve's. Out of scope.
- **Sourcemod / Metamod plugin runtime weaknesses**. Plugins run as srcds
by design. Out of scope.
- **Player IP exposure via game protocol**. Inherent to UDP/Source. Out of
scope.
- **DoS via game protocol** (`A2S_INFO` flooding etc.). Out of scope for
*this* effort; covered by network-layer mitigations.
- **DoS via web HTTP**. Covered upstream by nginx + fail2ban; out of
scope for *this* effort.
- **Host root from operator error** (a misconfigured cron, an admin
shell). Out of scope; operator is single-person and aware.
- **Long-term forward secrecy** for past sessions (an attacker who
exfils SECRET_KEY can replay past sessions). Out of scope; rotation
on incident.
## What we defend (prioritized)
D1 — **Gameserver RCE cannot exfiltrate DB or web.env**, including RCON
passwords and SECRET_KEY. Highest value: catastrophic asset, plausible
attack (L4D2 engine RCE is the canonical "old engine, public traffic"
risk).
D2 — **Gameserver RCE cannot ptrace web app or peer gameservers**. Blocks
in-memory secret theft and cross-server contamination.
D3 — **Gameserver RCE cannot use sudo helpers** for instances other
than its own (or, ideally, cannot use sudo at all).
D4 — **Web app RCE cannot ptrace gameservers**. Symmetric to D2; web
still has full DB access (acceptable residual since it's the web app's
own data).
D5 — **Cross-server contamination blocked at the kernel level**.
Per-instance namespaces or per-instance uid.
D6 — **Persistent compromise of `/opt/left4me/src/` blocked from
gameserver context**. Already partially true via `ProtectSystem=strict`;
maintain.
D7 — **All defenses survive a unit-config refactor in the wrong
direction** — e.g., a future developer adding `ReadWritePaths=` widely.
Achieved via tests that assert hardening invariants
(`deploy/tests/test_deploy_artifacts.py`).
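
A minimal sketch of what such an invariant test could look like, assuming the unit text is available as a string. The parser, the `REQUIRED` table, and `SAMPLE_UNIT` are illustrative names, not the contents of `deploy/tests/test_deploy_artifacts.py`; the directive values are the ones the gameserver unit is described as carrying today. The comparison is literal, so systemd's alternative boolean spellings (`yes`, `on`, `1`) would need normalizing in a real test.

```python
from collections import defaultdict

# Directives the gameserver unit is expected to keep. Illustrative table,
# not an exhaustive hardening profile.
REQUIRED = {
    "NoNewPrivileges": "true",
    "ProtectSystem": "strict",
    "PrivateDevices": "true",
    "RestrictSUIDSGID": "true",
    "LockPersonality": "true",
}


def parse_unit(text):
    """Collect Key=Value lines per section; repeated keys accumulate."""
    sections = defaultdict(lambda: defaultdict(list))
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blanks and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            sections[current][key.strip()].append(value.strip())
    return sections


def missing_invariants(text, required=REQUIRED):
    """Return required directives that are absent or whose last
    assignment in [Service] differs from the expected value."""
    service = parse_unit(text)["Service"]
    return sorted(
        key for key, want in required.items()
        if not service.get(key) or service[key][-1] != want
    )


# Hypothetical unit fragment, for demonstration only.
SAMPLE_UNIT = """\
[Service]
User=left4me
NoNewPrivileges=true
ProtectSystem=strict
PrivateDevices=true
RestrictSUIDSGID=true
LockPersonality=true
"""

if __name__ == "__main__":
    print(missing_invariants(SAMPLE_UNIT))
```

A pytest case would then assert that `missing_invariants(...)` returns an empty list for each emitted unit, so a refactor that drops or loosens a directive fails the suite.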
## Acceptable user-experience cost
- **Unit start latency**: +5s tolerable; +30s not.
- **Memory overhead**: +tens of MB per unit fine; +hundreds not.
- **Operational complexity**: one well-documented unit-template
hardening profile reusable across units. Acceptable trade-off.
- **Debugging cost**: SECCOMP audit log discoverability via
`journalctl -k` acceptable. ptrace-based debugging in production
unnecessary; can re-enable via ad-hoc drop-in if needed.
- **Steam updates / pip installs**: must continue to work without
per-update operator action. Privileged paths (steamcmd self-update)
can run as `left4me` outside the unit if needed; document.
- **Workshop content**: must continue to load. Builds run in the
sandbox; the gameserver only reads pre-built overlays.
## Acceptance criteria for the implementation
The final composition (hardening directives + any uid changes) must:
1. **Functionally**: pass the smoke matrix from `2026-05-15-hardening-test-plan.md` (RCON, build, restart, file upload, multi-server, workshop).
2. **Defenses verified**:
- srcds cannot read `/var/lib/left4me/left4me.db` or `/etc/left4me/web.env` (file not in FS view, or kernel denies)
- srcds cannot ptrace gunicorn or peer srcds (syscall blocked, or kernel rejects across namespaces/uids)
- srcds cannot read `/proc/<other-pid>/*`
- web cannot ptrace srcds (symmetric)
3. **No regressions**: existing test suite passes
(`pytest deploy/tests/test_overlay_helper.py l4d2host/tests/`).
4. **Auditable**: invariants asserted in `deploy/tests/test_deploy_artifacts.py`; baseline `systemd-analyze security` score recorded.
5. **Documentable**: one paragraph per directive in the unit, explaining
*why* it's there. Future maintainers can reason about removal.
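
The "defenses verified" checks in item 2 can be approximated by a small probe executed from inside the gameserver's sandbox. The sketch below is an assumption about how such a probe might look: `can_read`, `visible_pids`, and `leaks` are hypothetical helpers, and how the probe gets launched inside the unit's context is left to the test plan. It covers file visibility and `/proc` visibility only; ptrace denial needs a separate check.

```python
import errno
import os

# Paths the threat model says srcds must not be able to read.
SENSITIVE = ["/var/lib/left4me/left4me.db", "/etc/left4me/web.env"]


def can_read(path):
    """True if this process can open `path` read-only. False if the path
    is absent from the filesystem view or the kernel denies access."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        denied = (errno.ENOENT, errno.EACCES, errno.EPERM, errno.ENOTDIR)
        if exc.errno in denied:
            return False
        raise  # anything else is unexpected; surface it
    os.close(fd)
    return True


def leaks(paths=SENSITIVE):
    """Sensitive paths that are still readable from this context."""
    return [p for p in paths if can_read(p)]


def visible_pids(proc_root="/proc"):
    """PIDs that have an entry under `proc_root` in this namespace."""
    return sorted(int(n) for n in os.listdir(proc_root) if n.isdigit())


if __name__ == "__main__":
    print("readable sensitive paths:", leaks())
    print("visible pids:", visible_pids())
```

Run from the hardened unit, the expectation would be an empty `leaks()` list and a PID list that contains none of the gunicorn or peer srcds processes.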
## Open questions to clarify with the operator
Before the defenses survey is final, clarify:
1. **Is gunicorn directly internet-reachable, or behind nginx?** The unit
binds `127.0.0.1:8000` (per `metadata.py:208`); presumably nginx
terminates TLS and forwards. Confirm.
2. **Auth model**: who can log into the web app? Is admin auth strong
(long passwords, 2FA), or default-grade? Defines how realistic S2 is.
3. **Workshop content sources**: curated by operator, or arbitrary
workshop subscriptions exposed to admins? Defines A3's realism.
4. **Test bench**: is `ckn@10.0.4.128` a real separate test host, or is
`ovh.left4me` the only deployment target? Affects test plan choices.
5. **`kernel.yama.ptrace_scope` setting on the host?** The upstream
kernel default is 1, but distribution kernels can differ (Debian's may
ship 0), so read the live value; we may want 2 system-wide.
6. **Is the host running AppArmor?** Debian has enabled it by default
since Buster (10), but provider images and custom kernels can differ;
check with `aa-status`. If we want AppArmor profiles for srcds (in
addition to systemd directives), it must be active system-wide.
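
Questions 5 and 6 can be answered from the host itself. A minimal sketch, assuming the standard procfs and sysfs locations for Yama and AppArmor; the function names and the `root` parameter (there only so the readers can be pointed at a fake tree in tests) are illustrative.

```python
from pathlib import Path

# Meaning of each kernel.yama.ptrace_scope value.
YAMA_MODES = {
    0: "classic: any same-uid process may attach",
    1: "restricted: only ancestors or declared tracers may attach",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: ptrace attach disabled until reboot",
}


def ptrace_scope(root="/"):
    """Current Yama ptrace scope, or None if the sysctl is not present."""
    path = Path(root, "proc/sys/kernel/yama/ptrace_scope")
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None


def apparmor_enabled(root="/"):
    """True if the AppArmor module reports itself as enabled."""
    path = Path(root, "sys/module/apparmor/parameters/enabled")
    try:
        return path.read_text().strip() == "Y"
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    scope = ptrace_scope()
    print("ptrace_scope:", scope, "-", YAMA_MODES.get(scope, "Yama not available"))
    print("apparmor enabled:", apparmor_enabled())
```

Recording both values in the baseline capture keeps later test results comparable, since a host-wide `ptrace_scope` change alters what the per-unit directives still need to cover.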
## Pointers
- Audit synthesis (this session's conversation): unit hardening profile
`deploy/files/usr/local/lib/systemd/system/left4me-server@.service`,
metadata reactor `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`,
filesystem ACLs `~/Projekte/ckn-bw/bundles/left4me/items.py:21-115`,
DB schema `l4d2web/models.py:31, 146-148`, sudoers
`deploy/files/etc/sudoers.d/left4me`.
- Original uid-split spec: `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
— remains open; this work may supersede it.
- Companion docs:
`docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md`,
`docs/superpowers/specs/2026-05-15-hardening-test-plan.md`.
- Related work landed this session:
`docs/superpowers/plans/2026-05-15-build-time-idmap.md`,
`docs/superpowers/plans/2026-05-15-deploy-dir-rethink.md`.


@ -1,105 +1,114 @@
# Session handoff — next: uid-split decision
# Session handoff — next: execute hardening test plan
Short handoff for the session that follows the 2026-05-15 deploy-dir
rethink + janitorial sweep. The full project context is in CLAUDE.md
and the per-topic specs/plans linked below; this doc covers only
what's situationally fresh.
Short handoff. Three new hardening specs landed today; the next session
takes the test plan to `left4.me` and runs it. Decision on
`2026-05-15-user-uid-split-design.md` is **deferred** until the test
plan reports back.
## What just landed
Four commits since `e38b844`, pushed to `origin/master`:
Three coordinated specs at `docs/superpowers/specs/`:
- `5284e28` — privileged helpers moved out of `deploy/files/usr/local/{libexec,sbin}/`
into top-level `scripts/{libexec,sbin}/`. `deploy/` is now reference
material (README + example configs + curated example units). Dead
static artifacts deleted: `left4me-apply-cake`, `left4me-cake.service`,
`left4me-nft-mark.service`, `cake.env`, `left4me-mark.nft`, the
superseded `deploy-test-server.sh`.
- `160911f` — plan landed at `docs/superpowers/plans/2026-05-15-deploy-dir-rethink.md`;
adjacent specs marked resolved.
- `8f30dd7` — janitorial item 6 (bubblewrap doc-drift) corrected.
- `4aa69c2` — janitorial items 8 and 9 verified on `ovh.left4me`
(141.95.32.8) and marked resolved.
- `2026-05-15-hardening-threat-model.md` — assets, attackers (A1-A6),
trust boundaries (TB1-TB8), attack scenarios (S1-S6), what we
defend (D1-D7), what we accept losing.
- `2026-05-15-hardening-defenses-survey.md` — full Linux + systemd
defense menu, per-defense primitive mapping, candidate composition
for `left4me-server@.service` + `left4me-web.service`.
- `2026-05-15-hardening-test-plan.md` — 11 tests runnable cold on
`left4.me`; drop-in style so they never modify persistent units.
Companion change in **ckn-bw** is committed (`91b7265`) but **not yet
pushed**. Verified against the test host via `bw apply ovh.left4me`;
the working-tree-as-applied was committed afterwards. Pushing it is
safe and idempotent (deployed bytes already match).
## Why the shape changed (from uid-split → hardening)
Janitorial spec status: items 1, 2 (partial), 3, 4, 5, 6, 8, 9
closed. Items 7 and 10 remain (item 7 is conditional on the
build-overlay-unit refactor; item 10 is a calendar reminder for SM
1.13 in late 2026).
The prior handoff pointed this session at the 1/2/3-user decision in
`2026-05-15-user-uid-split-design.md`. Audit during this session
established that the same-uid attack surface (DB readable from srcds,
ptrace of gunicorn allowed, RCON passwords stored plaintext in DB,
no `/proc` isolation) is closable by *either* a uid split *or*
systemd directive composition (`TemporaryFileSystem=` +
`SystemCallFilter=~@debug` + `PrivateUsers=true` + `ProcSubset=pid`
+ empty `CapabilityBoundingSet=`). Operator chose to step back: do
threat-model + research + test before committing to either approach.
The three new specs are the output of that step-back.
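
For illustration, the directive composition named above could be rendered as a drop-in from Python, in the same spirit as the unit emission in ckn-bw. This is a sketch under assumptions: `render_dropin` and `CANDIDATE` are hypothetical names, and `TemporaryFileSystem=` is left out because the paths it would cover are not stated here.

```python
def render_dropin(directives):
    """Render a [Service] drop-in from a dict of directive -> value.
    An empty string value emits `Key=`; for CapabilityBoundingSet that
    assigns the empty capability set."""
    lines = ["[Service]"]
    for key, value in directives.items():
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Candidate composition as listed in this handoff (minus TemporaryFileSystem=).
CANDIDATE = {
    "SystemCallFilter": "~@debug",
    "PrivateUsers": "true",
    "ProcSubset": "pid",
    "CapabilityBoundingSet": "",
}

if __name__ == "__main__":
    print(render_dropin(CANDIDATE), end="")
```

Keeping each test as its own small dict makes it easy to apply directives one at a time and to compose only those that passed.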
## What's next: uid-split
## What's next: run the test plan
Existing handoff:
[`docs/superpowers/specs/2026-05-15-user-uid-split-design.md`](2026-05-15-user-uid-split-design.md).
Read that first. The decision is whether left4me should have 1, 2,
or 3 system users; today it has 2 (`left4me` + `l4d2-sandbox`).
The test plan is **self-contained** — drop a fresh Claude session on
`left4.me` (141.95.32.8) with the spec in hand and it can execute end
to end. System units only; no user units, no lingering.
This is a **decision** task, not a migration. Likely outcome: settle
the question with a short plan and either a memory entry / spec
resolution (if status quo wins) or a follow-up implementation plan
(if "split to 3" or "collapse to 1" wins). Time-box the decision to
one session; defer any migration work to a follow-up plan.
Per the test plan's structure:
1. Capture baseline (`systemd-analyze security`, current unit state,
sysctl).
2. Tests 1-6 isolate individual directives against srcds on
`left4me-server@1` (canary; server@2 stays baseline as a fallback).
3. Test 7 composes everything that passed.
4. Test 8 verifies the threat-model defenses (D1-D5) actually work.
5. Test 9 applies `kernel.yama.ptrace_scope=2` system-wide.
6. Test 10 applies the sudo-compatible subset to `left4me-web.service`.
7. Test 11 is a 24-48h soak.
### Decision-relevant context that emerged this session
Results template at the bottom of the test plan; fill in as you go.
- **The 2-uid model is freshly load-bearing.** The build-time-idmap
work (commits `2f6a9cf` + `9053186`, plan
`docs/superpowers/plans/2026-05-15-build-time-idmap.md`) explicitly
used "sandbox escape could see web.env / DB / running gameservers"
as the argument for keeping `l4d2-sandbox` as a separate uid. That
argument cuts the "collapse to 1" option hard.
- **Verified clean on the host:** `left4me-server@{1,2}.service` are
both running as `left4me` today (janitorial item 8 diagnostic).
No orphan idmap binds; the 2-uid invariants hold.
- **Files-overlay invariant verified:** overlay 8 (`Optimized
Settings`, files-type) is `left4me:left4me` end-to-end with no
`l4d2-sandbox`-owned files (janitorial item 9). This means files
overlays would not be affected by a gameserver-uid split — the
Python web app writes them directly as `left4me`.
- **The hardening floor is high.** `srcds` already runs with
`NoNewPrivileges=true`, `ProtectSystem=strict`,
`PrivateDevices=true`, `ReadOnlyPaths=...installation...overlays`,
`RestrictSUIDSGID=true`, `LockPersonality=true` (see
`deploy/files/usr/local/lib/systemd/system/left4me-server@.service`).
Most exfil paths a gameserver-uid split would close are already
closed by systemd hardening. The case for "split to 3" is
defense-in-depth, not a missing primary control.
- **Sudoers / cross-repo cost.** A new uid would need additions in
ckn-bw's `bundles/left4me/items.py` (`users` dict) and in the
sudoers grants. Both are in the right state to receive that change
cleanly; deploy-dir-rethink already pinned where each lives.
After execution: write the implementation plan at
`docs/superpowers/plans/2026-MM-DD-hardening-refactor.md` against the
proven composition. The plan touches `~/Projekte/ckn-bw/bundles/left4me/metadata.py`
(live source for unit emission per `items.py:2-5`) and the reference
copies in `deploy/files/usr/local/lib/systemd/system/`.
### Downstream consequence
## Decision-relevant context
Whatever uid-split decides constrains the **build-overlay-unit**
refactor that follows
([`docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md`](2026-05-15-build-overlay-unit-design.md)).
The systemd template unit replacing `left4me-script-sandbox` encodes
the idmap mapping `l4d2-sandbox` → `<target uid>`. Settling the uid
question first means build-overlay-unit composes against a final
foundation rather than retouching.
- **Source of truth for unit files is ckn-bw**, not left4me's
`deploy/files/`. The `deploy/files/usr/local/lib/systemd/system/*.service`
copies are reference-only post-deploy-dir-rethink; the
`systemd/units` reactor in `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`
is the live emission. Audit confirmed (commit `5284e28` + `items.py:2-5`
comment).
- **Sandbox is already strong.** `l4d2-sandbox` unit is not in scope
for this refactor — its hardening profile was verified during 2026-05-15
build-time-idmap work. Document as load-bearing; do not weaken.
- **Sudo on the web app blocks deep hardening there.** `NoNewPrivileges=true`
and `PrivateUsers=true` are incompatible with the helper-invocation
pattern. Sudo-compatible subset only on web. Full hardening blocked
on a future "replace sudo with systemctl-managed unit triggering"
refactor (build-overlay-unit spec is a step in that direction).
- **uid-split spec is deferred, not closed.** After Phase A test
results come back, decide: residual risk small enough → close
`2026-05-15-user-uid-split-design.md` as superseded. Residual risk
significant → write the split as a follow-up.
## Pointers
## Open questions to clarify with operator before/during execution
- Source-of-truth spec: `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
- Build-time-idmap plan (the load-bearing security argument):
`docs/superpowers/plans/2026-05-15-build-time-idmap.md`
- Live unit files for srcds hardening review:
`deploy/files/usr/local/lib/systemd/system/left4me-server@.service`
- ckn-bw users definition: `~/Projekte/ckn-bw/bundles/left4me/items.py`
(the `users = {...}` dict near the top)
- Sandbox helper that does the idmap mapping today:
`scripts/libexec/left4me-script-sandbox`
(Captured in the threat model's "Open questions" section.)
1. Is gunicorn directly internet-reachable, or only via nginx?
2. Admin-auth strength on the web app (defines S2 realism).
3. Workshop content curation policy (defines A3 realism).
4. Is `ckn@10.0.4.128` usable as a test bench, or is `left4.me` the
only deployment target? (Test plan currently assumes `left4.me`.)
5. Current `kernel.yama.ptrace_scope` setting on the host.
6. AppArmor enabled on host? (Debian default since Buster: enabled; verify on the host.)
## What's NOT next
- Build-overlay-unit refactor. Wait for uid-split.
- Janitorial item 7 (`_sandbox_script_dir` cleanup). Conditional on
build-overlay-unit Option B landing.
- Mako template duplication in ckn-bw. Separate cleanup; the
templates legitimately need bw's metadata access.
- Pushing the ckn-bw `91b7265` commit. Safe but not blocking.
- **build-overlay-unit refactor**
(`docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md`).
Still queued; sequenced behind this. The hardening profile from
this work becomes the template for the build-overlay unit.
- **Pushing the ckn-bw `91b7265` commit.** Still unpushed; still safe.
Mentioned in the previous handoff; not a blocker.
- **uid-split implementation.** Deferred pending test results.
- **AppArmor profiles.** Listed in the defenses survey; deferred.
Revisit after Phase A if directive-only hardening leaves gaps.
## Pointers
- Test plan (the thing to execute): `docs/superpowers/specs/2026-05-15-hardening-test-plan.md`
- Threat model: `docs/superpowers/specs/2026-05-15-hardening-threat-model.md`
- Defenses survey: `docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md`
- Original uid-split spec (deferred): `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
- Live unit emission: `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`
- Reference units: `deploy/files/usr/local/lib/systemd/system/`
- Scratch plan from earlier this session
(`~/.claude/plans/docs-superpowers-specs-2026-05-15-sessio-cosmic-codd.md`)
is superseded by the three specs; safe to discard.