diff --git a/docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md b/docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md
new file mode 100644
index 0000000..79fb791
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md
@@ -0,0 +1,698 @@
+# left4me application hardening — defenses survey
+
+**Status:** living spec. Companion to `2026-05-15-hardening-threat-model.md`
+and `2026-05-15-hardening-test-plan.md`.
+
+This document catalogs the Linux + systemd defense primitives applicable
+to left4me, evaluates each against this codebase's needs, and proposes a
+candidate composition. Each candidate is *testable* — the test plan
+exercises it before commit.
+
+Reference: the threat model defines defenses D1-D7. This document maps
+primitives to those defenses.
+
+## Section 1 — Linux kernel primitives
+
+### Namespaces (`man 7 namespaces`)
+
+| NS | Isolates | Relevance |
+|---|---|---|
+| **mount** | filesystem hierarchy view | Core. Gives `TemporaryFileSystem=` + bind primitives. |
+| **user** | uid/gid mapping | Big for D2/D4 (cross-uid ptrace block). |
+| **pid** | PID 1, /proc visibility | Pairs with `ProcSubset=pid` for D2. |
+| **net** | netifs, ports, routes, abstract UNIX sockets | Breaks gameservers; do **not** apply to server@. |
+| **ipc** | SysV IPC + POSIX message queues | Hygienic; `PrivateIPC=true`. |
+| **uts** | hostname | Cosmetic; doesn't matter for us. |
+| **time** | CLOCK_MONOTONIC / CLOCK_BOOTTIME offsets | Irrelevant for us. |
+| **cgroup** | cgroup view | Defense-in-depth against cgroup escape. |
+
+**For left4me:** mount + user + pid + ipc on `left4me-server@.service`.
+The web unit can use the same minus user-ns (incompatible with sudo).
+
+### Capabilities (`man 7 capabilities`)
+
+Per-process, granted at exec via file caps or by systemd at unit start.
+Bounding set = upper bound; ambient = inherited across non-setuid exec.
+
+- **CapabilityBoundingSet=** empty drops everything. Neither srcds nor
+  gunicorn needs any capability after they start (no raw sockets, no
+  mount, no module load, no setuid).
+- **AmbientCapabilities=** empty (default).
+
+Sharp edge: a `+`-prefixed ExecStartPre runs as root with full
+capabilities, outside the unit's sandbox, so these settings do not apply
+to it. That's how we get the privileged overlay mount without breaking
+the unit's caps.
+
+### Seccomp-bpf (`man 2 seccomp`)
+
+Filters the syscalls a process may make. Filters are per-process,
+inherited across fork and exec, and stack: when several are loaded,
+every filter is evaluated and the most restrictive verdict wins.
+systemd's `SystemCallFilter=` is a wrapper around it.
+
+For us, two filter strategies:
+- **Allow-list base** (`@system-service`): permissive enough for srcds +
+  gunicorn; subtract dangerous groups.
+- **Deny-list**: simpler but easier to leave holes.
+
+Strategy: allow-list with subtractions.
+
+Critical subtractions for D2:
+- `~@debug` — drops `ptrace(2)`, `process_vm_readv/writev(2)`,
+  `process_madvise(2)`. **Single most important syscall block** for our
+  threat model. Group contents vary between systemd versions; list them
+  with `systemd-analyze syscall-filter @debug` before relying on this.
+- `~@mount` — `mount`, `umount2`, `pivot_root` (gameserver doesn't need;
+  helper does, and helper runs as root via `+` prefix).
+- `~@privileged` — anything requiring CAP_*; redundant with empty
+  bounding set but defense-in-depth.
+- `~@reboot`, `~@swap`, `~@cpu-emulation`, `~@obsolete` — cheap removal.
+
+Sharp edges:
+- Multiple `SystemCallFilter=` lines are merged in order: the first line
+  decides whether the filter is an allow-list or a deny-list (`~`); later
+  plain lines add to an allow-list, and later `~` lines remove from it.
+- A `~` subtract on a group not in the allow-list is a no-op.
+- `SystemCallArchitectures=native` blocks syscalls made through
+  non-native ABIs (e.g. i386 or x32 on an x86-64 host), which would
+  otherwise need their own filter entries. Set it on every unit, but it
+  also kills 32-bit binaries outright: L4D2's `srcds_linux` is, as far as
+  we know, a 32-bit i386 executable (check with `file`), so the
+  gameserver unit likely needs `SystemCallArchitectures=native x86`.
+- `SystemCallErrorNumber=EPERM` vs. the default (terminate the process
+  with `SIGSYS`): `EPERM` is gentler for non-essential paths; the kill is
+  loud and obvious. Start with the default for a clear signal, and switch
+  to `EPERM` if a benign caller trips it (e.g., a library probing for
+  optional syscalls).
+
+### Yama LSM — `kernel.yama.ptrace_scope`
+
+System-wide sysctl. Values:
+- 0: classic; any process may ptrace another process with the same uid
+- 1 (Debian default): restricted; same uid, and the target must also be
+  a descendant of the tracer (or have opted in via `prctl(PR_SET_PTRACER)`)
+- 2: admin-only; attaching requires `CAP_SYS_PTRACE`
+- 3: no attach at all; once set, it cannot be lowered without a reboot
+
+For left4me: setting to 2 system-wide is cheap and removes the same-uid
+ptrace path entirely. Set via `/etc/sysctl.d/99-left4me.conf` (or
+extend an existing file). Doesn't affect debuggability — if you ever
+need to ptrace, do it as root.
+
+Caveat: Yama only acts when a process tries to attach. With seccomp
+blocking the syscall entirely (`~@debug`), Yama becomes belt-and-braces;
+keep both for defense-in-depth.
+
+### LSMs other than Yama
+
+| LSM | Status on Debian Trixie | Fit for us |
+|---|---|---|
+| **AppArmor** | Enabled by default (since Debian 10) | Could write profiles for srcds + gunicorn. Per-unit profile via `AppArmorProfile=` on systemd. Moderate effort. |
+| **SELinux** | Available; not enabled by default | Heavy. Not worth the operational cost on a single-host VPS. |
+| **landlock** | Kernel ≥5.13; available | Process-local sandboxing. Apps must opt in via `landlock(2)`. Python doesn't have a stdlib binding; need to call via ctypes or a wrapper. For us: would need to retrofit gunicorn or write a wrapper. Defer. |
+| **BPF LSM** | Kernel ≥5.7; available | Programmable LSM hooks. Bleeding edge for personal infra. Defer. |
+| **Tomoyo** | Available; not Debian-enabled | Path-based MAC. Niche. Skip. |
+
+**For left4me:** Yama yes. AppArmor *maybe*, as a follow-up — a profile
+limited to "deny path X" patterns for srcds would be small but adds an
+audit/rollback surface. Skip in the first pass; revisit if test results
+show systemd directives alone leave gaps.
+
+### Filesystem ACLs and modes
+
+POSIX permissions, supplementary groups, ACLs (`setfacl`), extended
+attrs (`xattr`).
+
+For us:
+- DB and `web.env` already use `root:left4me 0640`.
If we go uid-split, + ownership changes; if we go hardening-only, mode is fine — what + matters is *whether the unit's FS view contains them at all*. +- `setfacl` for fine-grained sharing (e.g., one supplementary group + used by both web and game). Doable but adds complexity; consider + only if uid split goes ahead. + +### File attributes (chattr) + +`chattr +i` (immutable) and `chattr +a` (append-only). + +For us: +- `chattr +i /opt/left4me/src/**` — prevents post-deploy tampering by + anything short of root removing the attr. But: `pip install -e` + creates `*.egg-info` files in the tree; deploy of new code would need + to `chattr -R -i ...` first. Too much friction. Skip. +- `chattr +i /etc/left4me/web.env` — keeps the env file from being + rewritten by a malicious uid. Works because the env file is rewritten + rarely (rotate SECRET_KEY explicitly via ckn-bw apply, which is root + and can `chattr -i` first). Worth considering as a small extra. + +### cgroups v2 + +Not a security primitive (not confidentiality/integrity), but a +**resource ceiling**. Already in use: +- `Slice=l4d2-game.slice`, `MemoryMax`, `TasksMax` — keep. + +`MemoryDenyWriteExecute=true` is a kernel-level prctl + seccomp, not a +cgroup, but listed here because it's resource-adjacent. See systemd +section. + +### Sudo / setuid + +Sudoers grants narrow what a unit's uid can do as root. For us, the +helpers (`scripts/libexec/left4me-*`) already validate inputs tightly +(verified in audit). Two design options for the future: + +- **Keep sudo path**, narrow the grants (per-uid via 3-user split, or + per-action via tighter sudoers). +- **Replace sudo with systemctl-managed transient units triggered via + dbus / `systemctl start`** — the build-overlay-unit spec already + proposes this for the script-sandbox. + +The web app needs to invoke the helpers somehow. `NoNewPrivileges=true` +on the web unit would break sudo's setuid. 
If we move to +systemctl-triggered units (no setuid involved), we can also tighten the +web unit. Sequenced in the implementation plan, not this survey. + +## Section 2 — systemd unit-config primitives + +### Identity + +- **`User=` / `Group=`** — drop privileges. Already set. +- **`DynamicUser=true`** — transient uid per run, persisted across runs + via `StateDirectory=`. Strong default. **Bad fit for us** because + multiple units share `/var/lib/left4me/` cross-unit; DynamicUser's + per-unit `StateDirectory=` model fights that. +- **`SupplementaryGroups=`** — extra groups. Used if we add a shared + read-only group (e.g., `l4d2-overlay-readers`). + +### Filesystem virtualization + +The lever the operator asked about ("can systemd have a fully virtual +filesystem"). Yes — composition: + +- **`RootDirectory=path`** — chroot. Full FS substitution. Heavy; + requires populating libs/binaries. Skip for the first pass. +- **`RootImage=path`** — same but from a disk image. Way too heavy. +- **`TemporaryFileSystem=path[:opts]`** — empty tmpfs at `path`. + Cheap. Composes with bind paths. +- **`BindReadOnlyPaths=src[:dst]`** — RO bind. Composes over + TemporaryFileSystem. +- **`BindPaths=src[:dst]`** — RW bind. Composes over TemporaryFileSystem. +- **`InaccessiblePaths=path`** — masks a path with an empty file/dir. + Legacy; Bind* is cleaner. +- **`NoExecPaths=path`** / **`ExecPaths=path`** — restrict + executable paths. Strong but easy to misconfigure. + +Composition pattern (the one we want for srcds): +```ini +TemporaryFileSystem=/var/lib /etc /opt /home /root /srv +BindReadOnlyPaths=/var/lib/left4me/installation +BindReadOnlyPaths=/var/lib/left4me/overlays +BindReadOnlyPaths=/etc/left4me/host.env +BindReadOnlyPaths=/etc/ssl /etc/ca-certificates /etc/resolv.conf +BindReadOnlyPaths=/etc/nsswitch.conf /etc/alternatives +BindPaths=/var/lib/left4me/runtime/%i +``` + +Result: srcds has no DB, no `web.env`, no `/opt/left4me/src/` in its FS +view. 
Files outside the bound list are simply not there from srcds's
+perspective — `open()` returns ENOENT, not EACCES.
+
+Sharp edges:
+- `TemporaryFileSystem=` size defaults to half RAM (the tmpfs default);
+  clamp via `:size=NNM,nr_inodes=NN`.
+- Bind paths must exist on disk; a missing source prevents unit start
+  unless the path is prefixed with `-`.
+- Line order does not decide mount order: systemd sorts the namespace
+  mounts by path (parents first) before applying them, so a bind below a
+  `TemporaryFileSystem=` mount point lands on top of the tmpfs wherever
+  it appears in the unit.
+- `RuntimeDirectory=` integrates with `TemporaryFileSystem=` cleanly:
+  `RuntimeDirectory=left4me/foo` creates `/run/left4me/foo` and binds
+  it in, auto-cleaning on stop.
+
+### Namespaces (systemd wrappers)
+
+- **`PrivateTmp=true`** — already set.
+- **`PrivateDevices=true`** — already set. Drops most of `/dev`.
+- **`PrivateNetwork=true`** — **don't** for gameservers (the unit would
+  see only a private loopback interface).
+- **`PrivateIPC=true`** — private SysV/POSIX IPC namespace; cheap win.
+- **`PrivateUsers=true`** — own userns. Only root and the unit's own
+  `User=`/`Group=` are mapped, each to itself (so `left4me` has the same
+  uid inside and outside); every other uid appears as `nobody` inside.
+  The unit holds no capabilities in the host user namespace. Caveat for
+  D2/D4/D5: because the mapping is an identity map, the kernel's
+  same-uid ptrace check still sees matching host uids, so this alone may
+  not stop a same-uid ptrace across the boundary; rely on `~@debug` and
+  Yama until the test plan shows otherwise.
+  Sharp edge: incompatible with `sudo` from inside the unit (setuid +
+  userns mapping = no host-root).
+- **`PrivateMounts=true`** — own mount ns (default-implicit with most
+  Protect* / Private* directives).
+
+### `/proc` and `/sys` protection
+
+- **`ProtectProc=invisible|noaccess|ptraceable|default`** —
+  `invisible` hides `/proc/<pid>/` for every process the unit may not
+  ptrace-inspect. **D2.** Same-uid processes stay visible, so while web
+  and gameservers share `left4me` this mainly hides other users' processes.
+- **`ProcSubset=pid|all`** — `pid` restricts `/proc/` to PID entries;
+  hides `/proc/kallsyms`, `/proc/cpuinfo`, etc. Cheap.
+- **`ProtectKernelTunables=true`** — `/proc/sys`, `/sys` read-only.
+- **`ProtectKernelModules=true`** — block `init_module`, `delete_module`.
+- **`ProtectKernelLogs=true`** — block `/dev/kmsg`, syslog().
+- **`ProtectClock=true`** — block `clock_settime`, `settimeofday`.
+- **`ProtectControlGroups=true`** — `/sys/fs/cgroup` read-only.
+- **`ProtectHostname=true`** — block `sethostname`/`setdomainname`.
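+
+A quick way to see what `ProtectProc=invisible` and `ProcSubset=pid`
+actually change is to list `/proc` from inside the sandbox. The sketch
+below is hypothetical helper code (the function names are ours, not part
+of left4me). It lists the visible PIDs, flags PIDs owned by another uid,
+and checks whether non-PID entries such as `/proc/cpuinfo` are hidden.
+
+```python
+import os
+
+
+def visible_pids() -> list[int]:
+    """Numeric entries in /proc, i.e. the PIDs this process can see."""
+    return sorted(int(name) for name in os.listdir("/proc") if name.isdigit())
+
+
+def foreign_uid_pids(pids: list[int]) -> list[int]:
+    """PIDs whose /proc/<pid> directory belongs to a different uid."""
+    me = os.getuid()
+    foreign = []
+    for pid in pids:
+        try:
+            if os.stat(f"/proc/{pid}").st_uid != me:
+                foreign.append(pid)
+        except OSError:
+            continue  # process exited between listdir() and stat(), or is not inspectable
+    return foreign
+
+
+def proc_subset_pid_active() -> bool:
+    """ProcSubset=pid hides non-process entries such as /proc/cpuinfo."""
+    return not os.path.exists("/proc/cpuinfo")
+
+
+if __name__ == "__main__":
+    pids = visible_pids()
+    print(f"visible PIDs: {len(pids)}")
+    print(f"foreign-uid PIDs: {foreign_uid_pids(pids)}")
+    print(f"ProcSubset=pid active: {proc_subset_pid_active()}")
+```
+
+Run it under the candidate properties with something like
+`sudo systemd-run --pipe --wait -p User=left4me -p ProtectProc=invisible -p ProcSubset=pid /usr/bin/python3 /path/to/proc_probe.py`
+(the script path is a placeholder). With both settings active, we would
+expect no foreign-uid PIDs and `ProcSubset=pid active: True`. Processes
+that share the `left4me` uid remain visible.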
+
+All of `ProtectKernel*`, `ProtectClock`, `ProtectControlGroups`,
+`ProtectHostname` are cheap and have no downside for srcds or gunicorn.
+Add all of them.
+
+### Filesystem protection (legacy / not Bind*)
+
+- **`ProtectSystem=false|true|full|strict`** — increasingly stringent
+  RO of system paths. `strict` makes the whole hierarchy read-only
+  (except `/dev`, `/proc`, `/sys`) apart from explicitly writable paths.
+- **`ProtectHome=false|true|read-only|tmpfs`** — `tmpfs` masks `/home`,
+  `/root`, `/run/user` with empty tmpfs.
+
+For us: `ProtectSystem=strict` + `ProtectHome=tmpfs` is the baseline.
+But once we adopt `TemporaryFileSystem=` for the relevant trees, these
+become secondary — TemporaryFileSystem fully supersedes them in the
+covered subtrees. Keep both as defense-in-depth (cheap).
+
+### Syscall filtering
+
+- **`SystemCallFilter=expr`** — discussed in Linux section.
+- **`SystemCallArchitectures=native`** — always set (add `x86` where a
+  32-bit binary has to run).
+- **`SystemCallLog=expr`** — opt-in logging without enforcement;
+  useful for diagnosing what gets called before tightening.
+- **`SystemCallErrorNumber=EPERM`** — soft denial instead of the
+  default, which terminates the process with `SIGSYS`. Switch later if a
+  benign caller trips.
+
+### Capabilities
+
+- **`CapabilityBoundingSet=`** — empty drops all. Use it.
+- **`AmbientCapabilities=`** — empty (default).
+- **`NoNewPrivileges=true`** — prevents setuid escalation. **Required
+  on srcds**, **incompatible with sudo on web** until sudo is replaced.
+
+### Network restrictions
+
+- **`RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX`** — for srcds.
+  AF_UNIX needed for journald socket access.
+- **`IPAddressAllow=` / `IPAddressDeny=`** — uses cgroup BPF; filters
+  both inbound and outbound IP traffic. For srcds: probably
+  overcomplicates; the firewall already controls ingress. Skip for first pass.
+- **`SocketBindAllow=` / `SocketBindDeny=`** — restricts which ports a
+  unit can `bind()`. For srcds, allow only the configured game port
+  range. Adds value but couples to config. Defer to a follow-up.
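+
+If the `SocketBindAllow=` follow-up goes ahead, the per-instance drop-in
+could be generated from `instance.env`. This is a minimal sketch. It
+assumes `instance.env` is a plain `KEY=VALUE` file that defines
+`L4D2_PORT` (the variable `ExecStart=` already uses). The function names
+and the generate-a-drop-in approach are hypothetical. The drop-in allows
+the game port over UDP (game traffic) and TCP (RCON) and denies every
+other bind. srcds may bind further ports (for example a client or
+SourceTV port), and this sketch does not cover those.
+
+```python
+def parse_env_file(text: str) -> dict[str, str]:
+    """Minimal KEY=VALUE reader.
+
+    Assumption: no escapes or multi-line values; systemd's EnvironmentFile=
+    grammar is richer than this.
+    """
+    env: dict[str, str] = {}
+    for raw in text.splitlines():
+        line = raw.strip()
+        if not line or line.startswith(("#", ";")):
+            continue
+        key, sep, value = line.partition("=")
+        if sep:
+            env[key.strip()] = value.strip().strip("\"'")
+    return env
+
+
+def render_socket_bind_dropin(env: dict[str, str]) -> str:
+    """Render a drop-in that lets srcds bind only its configured game port."""
+    port = int(env["L4D2_PORT"])
+    if not 1 <= port <= 65535:
+        raise ValueError(f"L4D2_PORT out of range: {port}")
+    return "\n".join([
+        "[Service]",
+        f"SocketBindAllow=udp:{port}",  # game traffic
+        f"SocketBindAllow=tcp:{port}",  # RCON uses TCP on the same port
+        "SocketBindDeny=any",
+        "",
+    ])
+
+
+if __name__ == "__main__":
+    import sys
+
+    with open(sys.argv[1], encoding="utf-8") as fh:
+        print(render_socket_bind_dropin(parse_env_file(fh.read())), end="")
+```
+
+The output would go to a per-instance drop-in such as
+`/etc/systemd/system/left4me-server@<id>.service.d/socket-bind.conf`
+(a placeholder name), followed by `systemctl daemon-reload`.
+`SocketBindAllow=` is enforced through cgroup BPF, so it needs cgroup v2.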
+
+### Resource restrictions
+
+- **`MemoryMax`**, **`TasksMax`**, **`LimitNOFILE`** — already set.
+- **`OOMScoreAdjust`** — already set (`-200` in the current unit). A
+  negative value makes the OOM killer *less* likely to pick srcds; if
+  the intent is to kill the gameserver before system processes, the
+  value needs to be positive.
+- **`MemoryDenyWriteExecute=true`** — blocks mappings that are writable
+  and executable at once, and `mprotect(2)` calls that make an existing
+  mapping executable. Defends against shellcode in writable memory.
+  **Source engine itself is probably fine** (the Squirrel script engine
+  is an interpreter, not a JIT). **SourceMod/Metamod are likely to
+  break**: SourceMod's SourcePawn VM JIT-compiles plugins on 32-bit x86,
+  and Metamod:Source/SourceMod extensions patch code pages at runtime
+  (detours). Both need pages made executable after load. Verify in test.
+
+### IPC and process hygiene
+
+- **`RemoveIPC=true`** — clean up SysV IPC on unit stop.
+- **`KeyringMode=private`** — own kernel keyring; no host-key access.
+- **`LockPersonality=true`** — block `personality(2)` calls (no x86 vs
+  x86-64 mode toggle). Already set.
+- **`RestrictRealtime=true`** — block real-time scheduling. srcds may
+  use SCHED_OTHER + nice; no realtime needed.
+- **`RestrictNamespaces=true`** — block `unshare(2)` / `clone(CLONE_NEW*)`.
+- **`RestrictSUIDSGID=true`** — already set.
+- **`UMask=0027`** — narrow default umask.
+
+### Capabilities of the `+` prefix
+
+`ExecStartPre=+cmd` runs `cmd` as root with full privileges, bypassing
+the unit's User= and almost all Protect*/Private*/Restrict* directives.
+This is how the existing overlay-mount helper runs (it additionally wraps
+itself in `nsenter --mount=/proc/1/ns/mnt` to be sure it mounts in the
+host namespace). Critical to verify in test:
+- Does `+` preserve the bypass when `PrivateUsers=true` is set?
+  (Expected: yes — the userns is set up around the unit's processes;
+  `+` puts the helper outside it.)
+
+### State management (per-unit)
+
+- **`StateDirectory=path`** — creates `/var/lib/<path>` owned by User=.
+- **`RuntimeDirectory=path`** — creates `/run/<path>`, auto-deleted on
+  stop.
+- **`LogsDirectory=path`** — `/var/log/<path>`.
+- **`CacheDirectory=path`** — `/var/cache/<path>`.
+- **`ConfigurationDirectory=path`** — `/etc/<path>`.
+
+Useful for cleanup hygiene if we redesign storage layout. Not required
+for first pass.
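+
+Before trying `MemoryDenyWriteExecute=true` (see Resource restrictions
+above), a running srcds can be checked for mappings that already violate
+it. The sketch below is a rough, hypothetical heuristic, not an existing
+left4me tool. It parses `/proc/<pid>/maps` and reports two things:
+mappings that are writable and executable at once, and anonymous
+executable mappings (a common JIT fingerprint). A clean result does not
+prove MDWE is safe, because a snapshot cannot see later
+`mprotect(PROT_EXEC)` calls.
+
+```python
+import sys
+from pathlib import Path
+
+
+def _parse(maps_text: str):
+    """Yield (perms, pathname or None, raw line) for each /proc/<pid>/maps line."""
+    for line in maps_text.splitlines():
+        # Fields: address perms offset dev inode [pathname]
+        fields = line.split(maxsplit=5)
+        if len(fields) < 5:
+            continue
+        path = fields[5] if len(fields) == 6 else None
+        yield fields[1], path, line
+
+
+def wx_mappings(maps_text: str) -> list[str]:
+    """Mappings that are writable and executable at the same time."""
+    return [line for perms, _, line in _parse(maps_text)
+            if "w" in perms and "x" in perms]
+
+
+def anon_exec_mappings(maps_text: str) -> list[str]:
+    """Executable mappings with no backing file or pseudo-path (typical of JIT code)."""
+    return [line for perms, path, line in _parse(maps_text)
+            if "x" in perms and path is None]
+
+
+if __name__ == "__main__":
+    pid = int(sys.argv[1])
+    text = Path(f"/proc/{pid}/maps").read_text()
+    for label, hits in (("W+X", wx_mappings(text)),
+                        ("anonymous exec", anon_exec_mappings(text))):
+        print(f"{label}: {len(hits)}")
+        for hit in hits:
+            print("  " + hit)
+```
+
+Run it as root against the srcds PID, e.g.
+`sudo python3 wx_scan.py "$(pgrep -f 'srcds_linux.*left4dead2' | head -1)"`
+(the script name is a placeholder), ideally after SourceMod has loaded
+its plugins.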
+
+### `systemd-analyze security`
+
+`systemd-analyze security <unit>` produces an exposure score per unit
+(0.0 to 10.0, lower = more secure). Output lists each directive with a ✓/✗.
+Useful as:
+- Regression check (record baseline, ensure score drops after refactor).
+- Discovery tool ("which directives haven't I set?").
+
+Baseline scores (to capture during test plan):
+- `left4me-server@1.service` before refactor
+- `left4me-web.service` before refactor
+
+### Composability lookups
+
+systemd groups syscalls into predefined sets worth knowing (list any of
+them with `systemd-analyze syscall-filter <set>`):
+
+- **`@privileged`** (syscall group) ⊃ `@chown`, `@clock`, `@module`,
+  `@raw-io`, `@reboot`, `@swap`, plus individual calls such as `chroot`
+  and the `setuid` family. There is no `@ptrace` group (`ptrace` lives in
+  `@debug`), and `@process` is not part of `@privileged`.
+- **`@system-service`** is the recommended base for "I want a normal
+  service to work."
+- Subtracting `~@privileged` is broad; `~@debug @mount @raw-io` is
+  surgical.
+
+## Section 3 — Application-level options
+
+### AppArmor profile for srcds
+
+If systemd directives leave gaps, an AppArmor profile would let us
+deny specific paths or operations beyond what systemd's directives
+cover. E.g., "deny network for srcds to a specific IP range" via
+`network inet stream...` deny rules (classic AppArmor `network` rules
+match only address family and socket type, so IP-level rules depend on
+the AppArmor and kernel versions in use); or "deny mounting" beyond
+`SystemCallFilter`.
+
+Effort:
+- AppArmor is already enabled by default on Debian (since Debian 10);
+  confirm with `aa-status`.
+- Write a profile (e.g., `/etc/apparmor.d/usr.bin.srcds_linux`).
+- Reference via systemd `AppArmorProfile=` per unit.
+
+Skip for the first pass; revisit if test results show the systemd
+directives alone leave a gap.
+
+### landlock for the web app
+
+The Python web app could call `landlock_create_ruleset` /
+`landlock_add_rule` / `landlock_restrict_self` via ctypes. Restricts FS
+access at runtime.
+
+For us:
+- Could restrict gunicorn to `/var/lib/left4me/`, `/etc/left4me/web.env`,
+  `/opt/left4me/.venv` and `/tmp`.
+- Symmetric to `TemporaryFileSystem=` + `Bind*` but at the
+  application layer (no systemd reach).
+
+Skip; systemd directives are simpler. Reconsider if we move to a
+DynamicUser-style world later.
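+
+If landlock is reconsidered, the first step from Python is confirming
+that the running kernel supports it. This is a minimal ctypes sketch.
+It assumes Linux with glibc's `syscall()` wrapper and syscall number 444
+for `landlock_create_ruleset` (shared by x86-64, i386 and arm64). It
+only queries the ABI version and does not restrict the calling process.
+The function name is ours.
+
+```python
+import ctypes
+import platform
+
+# landlock_create_ruleset(2). Syscalls numbered from 424 upward share one
+# number on most architectures (alpha is an exception).
+SYS_LANDLOCK_CREATE_RULESET = 444
+LANDLOCK_CREATE_RULESET_VERSION = 1 << 0  # flag: return the ABI version
+
+
+def landlock_abi_version() -> int:
+    """Return the kernel's Landlock ABI version, or 0 if it is unavailable.
+
+    Typical failures: ENOSYS (kernel built without Landlock), EOPNOTSUPP
+    (compiled in but not enabled in the lsm= list), or an error injected
+    by a seccomp filter that does not allow the call.
+    """
+    if platform.system() != "Linux":
+        return 0
+    libc = ctypes.CDLL(None, use_errno=True)
+    libc.syscall.restype = ctypes.c_long
+    ret = libc.syscall(
+        ctypes.c_long(SYS_LANDLOCK_CREATE_RULESET),
+        ctypes.c_void_p(None),  # attr: must be NULL for a version query
+        ctypes.c_size_t(0),     # size: must be 0 for a version query
+        ctypes.c_uint32(LANDLOCK_CREATE_RULESET_VERSION),
+    )
+    return int(ret) if ret > 0 else 0
+
+
+if __name__ == "__main__":
+    version = landlock_abi_version()
+    print(f"landlock ABI: {version}" if version else "landlock unavailable")
+```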
+
+### File-integrity tooling (AIDE, Tripwire)
+
+Out of scope for prevention; useful for detection. Not in this design.
+
+### Custom seccomp profile (bypassing systemd)
+
+The web app could call `seccomp(2)` from inside Python via libseccomp and
+ctypes to tighten its own filter beyond what systemd applies.
+Symmetric to landlock; skip for the same reason.
+
+## Section 4 — Per-defense mapping
+
+For each defense from the threat model, the primitives that implement
+it, in priority order:
+
+### D1 — Gameserver RCE cannot exfiltrate DB or `web.env`
+
+| Primitive | Strength | Notes |
+|---|---|---|
+| `TemporaryFileSystem=/var/lib /etc` + minimal bind set | Strong | The files simply aren't in the unit's FS view. ENOENT, not EACCES. |
+| 3-user split (DB owned by `l4d2-web`) | Strong | Kernel-enforced; survives unit-config errors. |
+| `BindReadOnlyPaths=/dev/null:/var/lib/left4me/left4me.db` | Medium | Masks the path; brittle (paths can move). |
+| Filesystem modes / ACLs on the DB | Weak | While web and srcds share the `left4me` uid/group, any mode that lets web read the DB lets srcds read it too; only fixed by uid split. |
+
+**Composition chosen:** `TemporaryFileSystem=` + Bind* (primary).
+3-user split as defense-in-depth or deferred.
+
+### D2 — Gameserver RCE cannot ptrace web app or peers
+
+| Primitive | Strength | Notes |
+|---|---|---|
+| `SystemCallFilter=~@debug` | Strong | Blocks `ptrace`, `process_vm_readv/writev`. |
+| `kernel.yama.ptrace_scope=2` | Strong | Belt-and-braces at the kernel level. |
+| `CapabilityBoundingSet=` empty | Strong | No CAP_SYS_PTRACE. |
+| `PrivateUsers=true` | Unverified | `User=` is identity-mapped, so the kernel's same-uid check still sees equal host uids; count it only if the test plan shows it blocks the attach. |
+| 3-user split | Strong | Different uids; same-uid path doesn't exist. |
+
+**Composition chosen:** All four (syscall + yama + caps + userns)
+together; they compose redundantly, with syscall + yama carrying the
+weight until the userns effect is confirmed.
+
+### D3 — Gameserver RCE cannot use sudo helpers
+
+| Primitive | Strength | Notes |
+|---|---|---|
+| `NoNewPrivileges=true` | Strong | Blocks sudo's setuid. Already set on server@. |
+| `PrivateUsers=true` | Medium | Any root obtained inside the namespace holds no capabilities in the host namespace. |
+| Sudoers grants scoped to `l4d2-web` (uid split) | Strong | Different uid means sudo grant doesn't apply. |
+| `RestrictSUIDSGID=true` | Strong | Already set. |
+
+**Composition chosen:** NoNewPrivileges (already), PrivateUsers (new)
+and RestrictSUIDSGID (already). What the 3-user split would add for D3
+is already covered by NNP and PrivateUsers; it would be defense-in-depth.
+
+### D4 — Web app RCE cannot ptrace gameservers
+
+| Primitive | Strength | Notes |
+|---|---|---|
+| `SystemCallFilter=~@debug` on **web** | Strong | Symmetric to D2 but applied to web. |
+| `kernel.yama.ptrace_scope=2` | Strong | System-wide, helps both directions. |
+| 3-user split | Strong | Different uids. |
+
+**Composition chosen:** SystemCallFilter on web + yama=2 system-wide.
+PrivateUsers cannot be applied to web (sudo incompatibility). 3-user
+split as defense-in-depth or deferred.
+
+### D5 — Cross-server contamination
+
+Each `left4me-server@.service` is a separate unit instance. With
+`PrivateUsers=true`, each gets its own user namespace, but both
+instances still run as host uid `left4me`, so cross-instance ptrace is
+stopped by `~@debug` and Yama rather than by the userns itself (pending
+test confirmation). With `TemporaryFileSystem=` and per-instance
+`BindPaths=/var/lib/left4me/runtime/%i`, neither instance can read the
+other's `runtime/<id>/`.
+
+**Composition chosen:** `~@debug` + Yama + per-instance Bind* (above),
+with PrivateUsers as additional hygiene. Per-instance uids out of scope.
+
+### D6 — Persistent compromise of `/opt/left4me/src/` blocked from gameserver
+
+Already covered by `ProtectSystem=strict` on server@.service. With
+`TemporaryFileSystem=/opt`, the path simply isn't visible to srcds.
+**Stronger and redundant — both can stay.**
+
+### D7 — Defenses survive a unit-config refactor in the wrong direction
+
+`deploy/tests/test_deploy_artifacts.py` asserts the directives' presence
+in the deployed unit. Add hardening invariants as test cases. Survives
+because the test fails CI before deploy.
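+
+A hardening invariant in `deploy/tests/test_deploy_artifacts.py` could
+look like the sketch below. It is hypothetical: the helper names and the
+`REQUIRED_SERVER` subset are ours. The parser simplifies systemd's unit
+syntax (no line continuations, no specifier expansion). It folds every
+`[Service]` header together and lets the last assignment win, so it
+copes roughly with `systemctl cat` output. It also flags values that
+contain `#`: systemd only recognizes comments that start a line, so an
+inline `Foo=bar # NEW` becomes part of the value.
+
+```python
+from collections import defaultdict
+
+# Hypothetical subset of invariants for left4me-server@.service.
+REQUIRED_SERVER = {
+    "NoNewPrivileges": "true",
+    "PrivateUsers": "true",
+    "ProtectProc": "invisible",
+    "ProcSubset": "pid",
+    "CapabilityBoundingSet": "",
+    "RestrictNamespaces": "true",
+}
+
+
+def parse_section(text: str, section: str = "Service") -> dict[str, list[str]]:
+    """Collect every assignment in [section]; keys may repeat in systemd."""
+    values: dict[str, list[str]] = defaultdict(list)
+    current = None
+    for raw in text.splitlines():
+        line = raw.strip()
+        if not line or line[0] in "#;":
+            continue  # unit files only have whole-line comments
+        if line.startswith("[") and line.endswith("]"):
+            current = line[1:-1]
+        elif current == section and "=" in line:
+            key, _, value = line.partition("=")
+            values[key.strip()].append(value.strip())
+    return dict(values)
+
+
+def hardening_violations(text: str, required: dict[str, str]) -> list[str]:
+    """Return human-readable problems; an empty list means the unit passes."""
+    directives = parse_section(text)
+    problems = []
+    for key, want in required.items():
+        got = directives.get(key)
+        if got is None:
+            problems.append(f"{key}= missing")
+        elif got[-1] != want:  # the last assignment wins for these settings
+            problems.append(f"{key}={got[-1]!r}, expected {want!r}")
+    for key, vals in directives.items():
+        if any("#" in v for v in vals):
+            problems.append(f"{key}= contains '#': inline comments are not stripped")
+    return problems
+```
+
+In pytest this becomes `assert hardening_violations(unit_text, REQUIRED_SERVER) == []`,
+where `unit_text` is the rendered unit (for example from the ckn-bw
+bundle or `systemctl cat` output).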
+
+## Section 5 — Candidate composition
+
+**For testing, not commitment.** Test plan validates each piece.
+Annotations such as `NEW` sit on their own comment lines. systemd only
+recognizes comments that start a line, so a trailing `Foo=bar # NEW`
+would become part of the value.
+
+### `left4me-server@.service`
+
+```ini
+[Service]
+User=left4me
+Group=left4me
+
+# (existing)
+Type=simple
+WorkingDirectory=-/var/lib/left4me/runtime/%i/merged/left4dead2
+EnvironmentFile=/etc/left4me/host.env
+EnvironmentFile=/var/lib/left4me/instances/%i/instance.env
+ExecStartPre=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay mount %i
+ExecStart=/var/lib/left4me/runtime/%i/merged/srcds_run -game left4dead2 +hostport ${L4D2_PORT} $L4D2_ARGS
+ExecStopPost=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay umount %i
+Restart=on-failure
+RestartSec=5
+
+# Resource control (existing)
+Slice=l4d2-game.slice
+Nice=-5
+IOSchedulingClass=best-effort
+IOSchedulingPriority=4
+OOMScoreAdjust=-200
+MemoryHigh=1.5G
+MemoryMax=2G
+TasksMax=256
+LimitNOFILE=65536
+KillSignal=SIGINT
+TimeoutStopSec=15s
+LogRateLimitIntervalSec=0
+
+# Hardening — identity
+NoNewPrivileges=true
+RestrictSUIDSGID=true
+
+# Hardening — namespaces
+PrivateTmp=true
+PrivateDevices=true
+PrivateIPC=true
+# NEW
+PrivateUsers=true
+ProtectHome=true
+
+# Hardening — filesystem view
+# NEW
+TemporaryFileSystem=/var/lib /etc /opt /home /root /srv /mnt /media
+# was ReadOnlyPaths
+BindReadOnlyPaths=/var/lib/left4me/installation
+BindReadOnlyPaths=/var/lib/left4me/overlays
+# NEW
+BindReadOnlyPaths=/etc/left4me/host.env
+BindReadOnlyPaths=/etc/ssl /etc/ca-certificates
+BindReadOnlyPaths=/etc/resolv.conf /etc/nsswitch.conf /etc/alternatives
+# was ReadWritePaths
+BindPaths=/var/lib/left4me/runtime/%i
+ProtectSystem=strict
+# (remove old ReadOnlyPaths= and ReadWritePaths= lines — superseded)
+
+# Hardening — /proc, /sys, kernel (all NEW except LockPersonality)
+ProtectProc=invisible
+ProcSubset=pid
+ProtectKernelTunables=true
+ProtectKernelModules=true
+ProtectKernelLogs=true
+ProtectClock=true
+ProtectControlGroups=true
+ProtectHostname=true
+LockPersonality=true
+
+# Hardening — caps + syscall (all NEW)
+CapabilityBoundingSet=
+AmbientCapabilities=
+# srcds_linux is (to our knowledge) a 32-bit i386 binary; "native" alone
+# would kill it on an x86-64 host. Confirm with `file` and adjust.
+SystemCallArchitectures=native x86
+SystemCallFilter=@system-service
+SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete @privileged
+
+# Hardening — network (NEW; AF_UNIX for journald)
+RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
+
+# Hardening — namespaces, realtime, IPC (all NEW)
+RestrictNamespaces=true
+RestrictRealtime=true
+RemoveIPC=true
+KeyringMode=private
+UMask=0027
+
+# Deferred until test:
+# MemoryDenyWriteExecute=true
+#   likely breaks SourceMod (SourcePawn JIT, detours); test first.
+```
+
+### `left4me-web.service`
+
+```ini
+[Service]
+User=left4me
+Group=left4me
+
+# (existing)
+Type=simple
+WorkingDirectory=/opt/left4me/src
+Environment=HOME=/var/lib/left4me PATH=/opt/left4me/.venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+EnvironmentFile=/etc/left4me/host.env
+EnvironmentFile=/etc/left4me/web.env
+ExecStart=/opt/left4me/.venv/bin/gunicorn --workers ... --threads ... --bind 127.0.0.1:8000 'l4d2web.app:create_app()'
+Restart=on-failure
+RestartSec=3
+
+# Hardening
+PrivateTmp=true
+# tightened from =full
+ProtectSystem=strict
+ProtectHome=true
+# web needs broad write access there
+ReadWritePaths=/var/lib/left4me
+# NoNewPrivileges intentionally NOT set — sudo
+# PrivateUsers intentionally NOT set — sudo
+
+# /proc + kernel hardening (sudo-compatible; all NEW)
+ProtectProc=invisible
+ProcSubset=pid
+ProtectKernelTunables=true
+ProtectKernelModules=true
+ProtectKernelLogs=true
+ProtectClock=true
+ProtectControlGroups=true
+ProtectHostname=true
+LockPersonality=true
+
+# Syscall filter — allow @system-service minus debug-class; keep @privileged
+# because sudo needs setuid, chown, etc. (all NEW)
+# Processes started through sudo inherit this filter and the unit's mount
+# namespace, so the sudo helpers run under these restrictions too.
+SystemCallArchitectures=native
+SystemCallFilter=@system-service
+SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete
+
+# Network (NEW)
+RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
+
+# Misc hygiene (all NEW)
+RestrictRealtime=true
+RestrictNamespaces=true
+RemoveIPC=true
+UMask=0027
+
+# Deferred for sudo-removal future work:
+# NoNewPrivileges=true
+# CapabilityBoundingSet=
+# PrivateUsers=true
+```
+
+### Host sysctl
+
+`/etc/sysctl.d/99-left4me.conf` (or merge into existing):
+```
+kernel.yama.ptrace_scope=2
+```
+
+System-wide. Means: even if a unit-level config slips, host-level
+ptrace is admin-only. Cost: zero for our use case (no debugging in
+prod).
+
+## Section 6 — Trade-offs and known sharp edges
+
+To verify in the test plan:
+
+1. **`PrivateUsers=true` + `+`-prefixed ExecStartPre**: expected to
+   work (the `+` runs outside the unit's namespaces). Sharp if it
+   doesn't — the overlay mount would fail and srcds wouldn't start.
+2. **`TemporaryFileSystem=/etc` and missing files**: srcds and its
+   dependencies (libstdc++ runtime, libssl, libcurl) may read files
+   from `/etc` we haven't bound. Watch journalctl for ENOENT during
+   first start.
+3. **`SystemCallFilter=~@privileged` and Source engine**: srcds is C++
+   and uses syscalls beyond the obvious. A `~@privileged` may trip
+   something. Mitigation: test with `SystemCallLog=` instead of
+   `SystemCallFilter=` first; observe what would have been blocked;
+   then narrow. Related: if `srcds_linux` is 32-bit,
+   `SystemCallArchitectures=native` without `x86` kills it at the first
+   syscall.
+4. **`MemoryDenyWriteExecute=true` and sourcemod**: SourceMod's
+   SourcePawn VM JIT-compiles plugins on 32-bit x86, and Metamod/SourceMod
+   detours patch code at runtime; both conflict with MDWE. Expect
+   failure; test before enabling.
+5. **`RestrictAddressFamilies=` without AF_UNIX**: journald socket
+   needs it. Always include AF_UNIX.
+6. **`ProcSubset=pid` and Python**: gunicorn shouldn't break (uses
+   /proc/self/* + signal-based IPC). Verify.
+7.
**sysctl `kernel.yama.ptrace_scope=2`**: blocks non-root
+   `gdb` / `strace -p` against any running service. Debug as root
+   (which holds `CAP_SYS_PTRACE`), or temporarily set back to 1 via
+   sysctl, then revert.
+8. **`ProtectSystem=strict` on web**: was `=full`. Tighter; might
+   break a write the web app does to a path outside `/var/lib/left4me`.
+   Audit `l4d2web/*` for `os.makedirs` or `open(...'w')` outside that
+   root. The same read-only view applies to every helper the web unit
+   runs through sudo.
+
+## Open questions for the implementer
+
+(After test plan results come back, finalize these.)
+
+1. Do we adopt `MemoryDenyWriteExecute=true` if it works for srcds?
+   (Yes if SourceMod and Metamod survive it, which seems unlikely given
+   SourcePawn's JIT and runtime code patching; otherwise leave it off.)
+2. Do we set `SocketBindAllow=` on srcds to lock the port range?
+   (Depends on whether `instance.env` exposes the range cleanly to a
+   unit directive.)
+3. Do we deploy AppArmor profiles as a follow-up?
+   (Probably no — operational complexity exceeds the marginal gain on
+   single-host infra.)
+4. Do we keep both `BindReadOnlyPaths=` and the legacy
+   `ReadOnlyPaths=` declarations, or simplify? (Simplify — use Bind*
+   exclusively once `TemporaryFileSystem=` is in place.)
+5. Do we proceed with 3-user split as a follow-up, or close the spec
+   as "addressed by hardening"? Depends on operator's residual-risk
+   tolerance after Phase A lands and we observe.
+
+## Pointers
+
+- Threat model: `docs/superpowers/specs/2026-05-15-hardening-threat-model.md`
+- Test plan: `docs/superpowers/specs/2026-05-15-hardening-test-plan.md`
+- Original uid-split spec (still open): `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
+- Live unit source (ckn-bw reactor): `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`
+- Reference units (deploy-dir-rethink reference-only): `deploy/files/usr/local/lib/systemd/system/`
+- systemd docs (latest, systemd 256+ on Trixie):
+  `man systemd.exec`, `man systemd.unit`, `man systemd-analyze`.
+- L4D2 / Source engine docs:
+  - SourcePawn (compiled to bytecode; SourceMod JIT-compiles it on x86): https://wiki.alliedmods.net/SourcePawn
+  - srcds is a Source 2007 engine binary; closed-source, expect surprises.
diff --git a/docs/superpowers/specs/2026-05-15-hardening-test-plan.md b/docs/superpowers/specs/2026-05-15-hardening-test-plan.md
new file mode 100644
index 0000000..fa6dfa3
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-15-hardening-test-plan.md
@@ -0,0 +1,898 @@
+# left4me application hardening — test plan
+
+**Status:** living spec. Companion to `2026-05-15-hardening-threat-model.md`
+and `2026-05-15-hardening-defenses-survey.md`. **Executed in a follow-up
+session with shell access to `left4.me` (141.95.32.8).**
+
+This document is intentionally self-contained: a session that lands cold
+with shell on `left4.me` can execute it end-to-end without re-reading
+the threat model or survey. Decisions made in this plan are based on the
+candidate composition in the defenses survey (Section 5).
+
+## Test architecture
+
+### Where we test
+
+- **Host:** `left4.me` / `ovh.left4me` (141.95.32.8). Production host;
+  no separate test bench. (Reference: memory entry
+  `feedback_test_server_hangs.md` mentions a separate test server at
+  `ckn@10.0.4.128`; verify whether that host is suitable for this work
+  *before* using prod.)
+- **Canary unit:** `left4me-server@1.service`. Use this as the test
+  instance. Leave `left4me-server@2.service` running baseline so at
+  least one server stays up if the canary breaks.
+- **Web unit:** `left4me-web.service` is shared. Test web-side
+  hardening only after server@ tests prove the composition; web is
+  more disruptive to roll back.
+
+### Operating constraints
+
+- **System units only.** No `systemctl --user`, no lingering, no
+  per-user systemd instance. All units under `/etc/systemd/system/` or
+  `/usr/local/lib/systemd/system/`. Drop-ins go to
+  `/etc/systemd/system/<unit>.d/`.
+- **Drop-in style.** Tests apply via `/etc/systemd/system/left4me-server@1.service.d/test-NN-<name>.conf`
+  (note: `@1` for instance-specific). This leaves the template
+  unmodified — other instances unaffected. `systemctl daemon-reload`
+  picks up drop-ins; `systemctl restart left4me-server@1` applies.
+- **Cleanup required.** Each test removes its drop-in before the next
+  starts. Baseline must be restorable at any point.
+- **Recording.** Each test produces a one-paragraph result in this
+  document's "Results" section at the bottom. Append, don't replace.
+
+### Failure modes to watch for
+
+- **SECCOMP audit:** `journalctl -k --since '1 minute ago' | grep -i seccomp`
+  shows `type=1326` lines. Each records a denied syscall; the `syscall=`
+  field identifies the call. Use `scmp_sys_resolver` to translate. For
+  the 32-bit srcds, records carry `arch=40000003`, so pass `-a x86`
+  (i386 syscall numbers differ from x86-64).
+- **Unit start failure:** `systemctl is-active left4me-server@1` → `inactive` or `failed`.
+- **srcds crash mid-game:** `journalctl -u left4me-server@1 -f` shows
+  unexpected exit; `systemctl show left4me-server@1 -p Result` is
+  not `success`.
+- **sourcemod/metamod plugin failures:** in-game `sm plugins list` or
+  RCON `sm plugins list` shows plugins as failed-to-load.
+- **Permission denied where unexpected:** `journalctl -u left4me-server@1`
+  shows `Permission denied` or `Operation not permitted`.
+
+## Before any test: baseline capture
+
+Capture these so we can compare after each test, and so we have a
+known-good snapshot to revert to.
+
+```bash
+# 1. Baseline systemd-analyze score
+sudo systemd-analyze security left4me-server@1.service \
+  | tee /tmp/sec-baseline-server.txt
+sudo systemd-analyze security left4me-web.service \
+  | tee /tmp/sec-baseline-web.txt
+
+# 2. Full current unit plus any existing drop-ins (as `systemctl cat` prints them)
+sudo systemctl cat left4me-server@1.service \
+  | tee /tmp/unit-baseline-server.conf
+sudo systemctl cat left4me-web.service \
+  | tee /tmp/unit-baseline-web.conf
+
+# 3.
 Current sysctl +sysctl kernel.yama.ptrace_scope | tee /tmp/sysctl-baseline.txt +# Expect: kernel.yama.ptrace_scope = 1 (Debian default) + +# 4. Functional baseline — confirm both servers + web healthy now +sudo systemctl is-active left4me-server@1 left4me-server@2 left4me-web +# Expect: active active active + +# 5. Confirm srcds_linux running, gunicorn running +sudo systemctl status left4me-server@1 left4me-server@2 left4me-web \ + --no-pager | head -40 + +# 6. RCON sanity (optional — needs an RCON password) +# (Use the web UI to fire `status` against server@1; expect a reply.) + +# 7. Capture baseline syscalls (to compare what's blocked after filter) +# This is heavy; only run if you suspect a filter is too tight: +# sudo systemctl edit --runtime left4me-server@1 +# Add: SystemCallLog=@privileged +# Restart, then watch the whole journal for ~5 minutes for seccomp +# audit records (grep -E 'type=1326|SECCOMP'; journalctl -u misses +# them because they are not tagged with the unit), then revert. +``` + +Record the `/tmp/sec-baseline-server.txt` score (the final "Overall +exposure level" line, e.g. "5.4 MEDIUM"; the scale runs from 0.0 to +10.0). Goal: lower (more secure) after refactor. + +## Test 1 — `PrivateUsers=true` compatibility + +**Goal:** Confirm `PrivateUsers=true` works on `left4me-server@.service` +with the `+`-prefixed `ExecStartPre` overlay-mount helper. + +**Pre-condition:** server@1 active, baseline captured. + +**Drop-in:** +```bash +sudo install -d -m0755 /etc/systemd/system/left4me-server@1.service.d/ +sudo tee /etc/systemd/system/left4me-server@1.service.d/test-01-privateusers.conf <<'EOF' +[Service] +PrivateUsers=true +EOF +sudo systemctl daemon-reload +sudo systemctl restart left4me-server@1 +``` + +**Verify:** +```bash +# 1. Unit started cleanly +sudo systemctl is-active left4me-server@1 +# Expect: active + +# 2. ExecStartPre's nsenter+overlay-mount succeeded (the mount exists) +sudo findmnt /var/lib/left4me/runtime/1/merged +# Expect: a row showing overlay mounted + +# 3. 
Process is running +pgrep -af srcds_linux +# Expect: at least one PID matching left4dead2 + +# 4.
From inside the unit's namespace: process appears as configured uid +PID=$(pgrep -f 'srcds_linux.*left4dead2' | head -1) +sudo cat /proc/$PID/status | grep -E '^Uid|^Gid' +# Expect: uid 980 (left4me) — outside the namespace, the kernel reports +# the unit's User=. Inside the namespace it's also 980 (identity map). + +# 5. Userns confirmed +sudo readlink /proc/$PID/ns/user +sudo readlink /proc/1/ns/user +# Expect: different — different user namespaces +``` + +**Pass criteria:** all five checks pass. + +**Failure handling:** if unit fails to start, check +`journalctl -u left4me-server@1 -n 100` for the failure reason. Most +likely cause if it fails: the overlay-mount helper itself depends on +the unit's mount namespace in a way that PrivateUsers breaks. (The `+` +prefix should bypass — verifying that assumption is the test's whole +point.) + +**Cleanup:** +```bash +sudo rm /etc/systemd/system/left4me-server@1.service.d/test-01-privateusers.conf +sudo systemctl daemon-reload +sudo systemctl restart left4me-server@1 +sudo systemctl is-active left4me-server@1 # active again +``` + +--- + +## Test 2 — `TemporaryFileSystem` + minimal bind set + +**Goal:** Confirm srcds runs with `/var/lib`, `/etc`, `/opt`, `/home`, +`/root` virtualized to empty tmpfs, with only the listed paths bound back. 
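Each verify step below encodes an expectation about what the emptied trees still contain. The following is a minimal sketch that writes those expectations down. It assumes systemd creates the intermediate directories that hold each bind mount point on the tmpfs. `expect_visible` and `is_under` are hypothetical helpers, not part of left4me, and the two lists copy the Test 2 drop-in with `%i` expanded to `1`. The sketch only models the directives; the live `nsenter` checks remain the source of truth.

```bash
#!/usr/bin/env bash
# Hypothetical helper: predict what the Test 2 drop-in leaves visible
# inside left4me-server@1. Lists copy the drop-in (%i expanded to 1).
TMPFS_ROOTS="/var/lib /etc /opt /home /root /srv /mnt /media"
BINDS="/var/lib/left4me/installation /var/lib/left4me/overlays
/etc/left4me/host.env /etc/ssl /etc/ca-certificates
/etc/resolv.conf /etc/nsswitch.conf /etc/alternatives
/var/lib/left4me/runtime/1"

# is_under PARENT CHILD: succeed if CHILD is PARENT or lies beneath it.
is_under() {
  case "$2" in
    "$1" | "$1"/*) return 0 ;;
  esac
  return 1
}

# expect_visible PATH: print "visible" or "hidden".
expect_visible() {
  local path="$1" root bind
  for root in $TMPFS_ROOTS; do
    is_under "$root" "$path" || continue
    for bind in $BINDS; do
      # Visible if PATH is a bound path (or inside one), or a parent
      # directory created on the tmpfs to hold a bind mount point.
      if is_under "$bind" "$path" || is_under "$path" "$bind"; then
        echo visible
        return 0
      fi
    done
    # The emptied root itself (e.g. /opt) exists as an empty directory;
    # anything else beneath it is gone.
    if [ "$path" = "$root" ]; then echo visible; else echo hidden; fi
    return 0
  done
  echo visible  # outside every TemporaryFileSystem= root, e.g. /usr
}
```

For example, `expect_visible /etc/left4me/web.env` prints `hidden`. By contrast, `expect_visible /var/lib/left4me` prints `visible` because that directory holds the bound `installation`, `overlays` and `runtime/1` mount points.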
+ +**Drop-in:** +```bash +sudo tee /etc/systemd/system/left4me-server@1.service.d/test-02-tmpfs.conf <<'EOF' +[Service] +# Remove the legacy paths so they don't collide with the new bind setup +ReadOnlyPaths= +ReadWritePaths= + +# Virtual filesystem +TemporaryFileSystem=/var/lib /etc /opt /home /root /srv /mnt /media +BindReadOnlyPaths=/var/lib/left4me/installation +BindReadOnlyPaths=/var/lib/left4me/overlays +BindReadOnlyPaths=/etc/left4me/host.env +BindReadOnlyPaths=/etc/ssl /etc/ca-certificates +BindReadOnlyPaths=/etc/resolv.conf /etc/nsswitch.conf /etc/alternatives +BindPaths=/var/lib/left4me/runtime/%i +EOF +sudo systemctl daemon-reload +sudo systemctl restart left4me-server@1 +``` + +**Verify:** +```bash +# 1. Unit started +sudo systemctl is-active left4me-server@1 + +# 2. From inside the unit's namespace: invisible files +PID=$(pgrep -f 'srcds_linux.*left4dead2' | head -1) +sudo nsenter --target $PID --mount -- ls -la /var/lib/left4me/left4me.db 2>&1 +# Expect: No such file or directory + +sudo nsenter --target $PID --mount -- ls -la /etc/left4me/web.env 2>&1 +# Expect: No such file or directory + +sudo nsenter --target $PID --mount -- ls /opt 2>&1 +# Expect: empty or "No such file or directory" + +sudo nsenter --target $PID --mount -- ls /var/lib/left4me/ +# Expect: only installation, overlays, runtime (the bound paths) + +# 3. Bound paths visible and right mode +sudo nsenter --target $PID --mount -- ls -la /var/lib/left4me/runtime/1/ +# Expect: upper, work, merged dirs visible, RW + +sudo nsenter --target $PID --mount -- ls /etc/left4me/ +# Expect: only host.env + +# 4. DNS works (workshop downloads, master server) +sudo nsenter --target $PID --mount --net -- getent hosts steamcommunity.com +# Expect: an IP + +# 5. Game running normally +sudo systemctl status left4me-server@1 --no-pager | head -15 +# Expect: active (running) + +# 6. 
 No SECCOMP/EACCES errors +sudo journalctl -u left4me-server@1 --since '2 minutes ago' \ + | grep -iE 'permission|denied|seccomp|EACCES|ENOENT' | head -20 +# Expect: nothing alarming. Some ENOENT may be normal (srcds probes +# files); the question is whether anything is failing fatally. +``` + +**Pass criteria:** unit active, DB/web.env/src invisible, runtime +visible+writable, DNS works, no fatal errors in journal. + +**Failure handling:** if a bind path is missing on disk, the unit +fails to start with a clear error. Add the missing path or remove the +bind reference. + +**Cleanup:** +```bash +sudo rm /etc/systemd/system/left4me-server@1.service.d/test-02-tmpfs.conf +sudo systemctl daemon-reload +sudo systemctl restart left4me-server@1 +``` + +--- + +## Test 3 — `SystemCallFilter` (logging mode) + +**Goal:** Discover what srcds calls under load before committing to a +filter. Run with `SystemCallLog=` (audit only, doesn't block) for 5-10 +minutes of live play. + +**Drop-in:** +```bash +sudo tee /etc/systemd/system/left4me-server@1.service.d/test-03-syslog.conf <<'EOF' +[Service] +# srcds_linux (L4D2) is a 32-bit i386 binary: with `native` alone its +# syscalls are blocked on an x86_64 host, so allow x86 as well. +SystemCallArchitectures=native x86 +# Log every syscall in @privileged + @debug + @mount + @raw-io +SystemCallLog=@privileged @debug @mount @raw-io +SystemCallFilter= +EOF +sudo systemctl daemon-reload +sudo systemctl restart left4me-server@1 +``` + +**Verify (and produce data):** +```bash +# 1. Unit active +sudo systemctl is-active left4me-server@1 + +# 2. Capture logs for 5 minutes during normal play +# (manually connect a Steam client to the server, walk around, then disconnect) +# The records are kernel audit messages (type 1326) without the unit's +# tag, so search the whole journal instead of using -u. +sudo journalctl --since '5 minutes ago' \ + | grep -E 'type=1326|SECCOMP' \ + | tee /tmp/syscall-log-test3.txt + +# 3. 
Analyze: whole lines never dedupe (each carries its own timestamp), +# so count distinct syscall numbers instead +grep -o 'syscall=[0-9]*' /tmp/syscall-log-test3.txt \ + | sort | uniq -c | sort -rn \ + | tee /tmp/syscall-log-test3-uniq.txt +wc -l /tmp/syscall-log-test3-uniq.txt +# Resolve each number with scmp_sys_resolver (-a x86 for records showing +# arch=40000003, i.e. i386); identify whether @debug or @mount or +# @privileged contains any syscall srcds calls during normal operation.
+``` + +**Pass criteria:** capture is complete. Decision feeds Test 4. + +**Cleanup:** +```bash +sudo rm /etc/systemd/system/left4me-server@1.service.d/test-03-syslog.conf +sudo systemctl daemon-reload +sudo systemctl restart left4me-server@1 +``` + +--- + +## Test 4 — `SystemCallFilter` (enforcement mode) + +**Goal:** Apply the candidate `SystemCallFilter=` and confirm srcds +runs without any SECCOMP-killed calls. Tightness driven by Test 3 +results. + +**Drop-in:** +```bash +sudo tee /etc/systemd/system/left4me-server@1.service.d/test-04-syscall.conf <<'EOF' +[Service] +# x86 as well as native: srcds_linux (L4D2) is a 32-bit i386 binary +SystemCallArchitectures=native x86 +SystemCallFilter=@system-service +SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete @privileged +EOF +sudo systemctl daemon-reload +sudo systemctl restart left4me-server@1 +``` + +**Verify:** +```bash +# 1. Unit active +sudo systemctl is-active left4me-server@1 + +# 2. Watch for SECCOMP kills for ~10 minutes during play. Kill records +# are audit messages without the unit's tag, so follow the whole +# journal; the unit log also reports a kill as status=31/SYS. +sudo journalctl -f | grep --line-buffered -E 'type=1326|SECCOMP|status=31/SYS' +# Press Ctrl-C after 10 min if no SECCOMP audit lines (type=1326) + +# 3. Functional: server accepts connections, plugins load +# (use Steam client; verify in-game) +# Optional RCON check: +# sudo rcon -p $PW -a left4.me:27015 "sm plugins list" +# Expect: list of plugins, all loaded. + +# 4. Verify the filter is loaded on srcds itself. A gdb probe started +# via `sudo nsenter` runs as root, outside srcds's seccomp filter, so +# it cannot demonstrate the ptrace block; read the process status. 
+PID=$(pgrep -f 'srcds_linux.*left4dead2' | head -1) +grep -E '^(Seccomp|Seccomp_filters):' /proc/$PID/status +# Expect: Seccomp: 2 (filter mode), Seccomp_filters: 1 or more +``` + +**Pass criteria:** unit active for ≥10 min, no SECCOMP kills, plugins +load, seccomp filter present on srcds. + +**Failure handling:** if SECCOMP kills appear: +- Identify the syscall from the audit line (`syscall= compat=0`), + resolve via `scmp_sys_resolver -a $(uname -m) ` (Debian package + `seccomp`); use `-a x86` instead when the line shows `arch=40000003` + (32-bit i386).
+- Relax the filter: remove the offending group from the deny list, OR + switch the action from kill (default) to an error return with + `SystemCallErrorNumber=EPERM`; note that this applies to the whole + filter, not to one group. + +**Cleanup:** +```bash +sudo rm /etc/systemd/system/left4me-server@1.service.d/test-04-syscall.conf +sudo systemctl daemon-reload +sudo systemctl restart left4me-server@1 +``` + +--- + +## Test 5 — `ProcSubset=pid` + `ProtectProc=invisible` + +**Goal:** Confirm /proc is narrowed inside the unit: `ProcSubset=pid` +hides entries not tied to processes (e.g. `/proc/sys`), and +`ProtectProc=invisible` hides processes owned by other users. + +**Drop-in:** +```bash +sudo tee /etc/systemd/system/left4me-server@1.service.d/test-05-proc.conf <<'EOF' +[Service] +ProtectProc=invisible +ProcSubset=pid +EOF +sudo systemctl daemon-reload +sudo systemctl restart left4me-server@1 +``` + +**Verify:** +```bash +# 1. Unit active +sudo systemctl is-active left4me-server@1 + +# 2. /proc visibility narrowed. Probe as left4me: a root probe holds +# CAP_SYS_PTRACE and sees every process regardless of ProtectProc. +PID=$(pgrep -f 'srcds_linux.*left4dead2' | head -1) +U=$(id -u left4me); G=$(id -g left4me) +sudo nsenter --target $PID --mount --pid --setuid $U --setgid $G -- ls /proc | head -20 +# Expect: PID directories and process-related entries only (no sys/). +# Other users' processes (root, www-data, ...) are absent; same-uid +# processes such as gunicorn may still be listed. + +# 3. Other procs' environ +GUNICORN_PID=$(pgrep -f 'gunicorn.*l4d2web' | head -1) +sudo nsenter --target $PID --mount --setuid $U --setgid $G -- cat /proc/$GUNICORN_PID/environ 2>&1 +# Expect: No such file or directory (invisible), not Permission denied, +# once gunicorn runs as a different uid. While web and srcds share the +# left4me uid, ProtectProc=invisible cannot hide it. +``` + +**Pass criteria:** non-process entries and other users' processes are +hidden. Record whether gunicorn (same uid) is visible; that result feeds +the uid-split decision. 
+ +**Cleanup:** +```bash +sudo rm /etc/systemd/system/left4me-server@1.service.d/test-05-proc.conf +sudo systemctl daemon-reload +sudo systemctl restart left4me-server@1 +``` + +--- + +## Test 6 — `MemoryDenyWriteExecute=true` + +**Goal:** Test whether Source engine + sourcemod work under MDW=true. +**Likely to fail.** Skip if uncertain. + +**Drop-in:** +```bash +sudo tee /etc/systemd/system/left4me-server@1.service.d/test-06-mdw.conf <<'EOF' +[Service] +MemoryDenyWriteExecute=true +EOF +sudo systemctl daemon-reload +sudo systemctl restart left4me-server@1 +``` + +**Verify:** +```bash +# 1.
 Unit active +sudo systemctl is-active left4me-server@1 + +# 2. Run for 10+ minutes during normal play, including: +# - Connect a Steam client +# - Walk around a map +# - Trigger a plugin (rcon: sm_admin) +# - Map change +# - Disconnect + +# 3. Watch for crashes (the unit log reports e.g. status=11/SEGV) +sudo journalctl -u left4me-server@1 --since '15 minutes ago' \ + | grep -iE 'segfault|SEGV|dumped|coredump|abort|EPERM.*mprotect' +# Expect: empty + +# 4. MDW refusals are returned to the caller as an error (typically +# EPERM) rather than killing it, so they leave no type=1326 kill +# record, and audit lines carry syscall numbers, not names. Check +# plugin state (`sm plugins list`) and SourceMod's +# addons/sourcemod/logs/ error logs for JIT or load failures instead. +# Expect: all plugins loaded, no new errors +``` + +**Pass criteria:** no crashes, all plugins load, no new SourceMod errors. + +**Cleanup:** +```bash +sudo rm /etc/systemd/system/left4me-server@1.service.d/test-06-mdw.conf +sudo systemctl daemon-reload +sudo systemctl restart left4me-server@1 +``` + +**Decision:** if pass → include `MemoryDenyWriteExecute=true` in the +final composition. If fail → exclude (and document the reason in the +result). + +--- + +## Test 7 — Full proposed composition (everything that passed) + +**Goal:** Compose tests 1, 2, 4, 5, (6 if it passed) into a single +drop-in and verify nothing interacts badly. + +**Drop-in:** (Adjust to skip Test 6's directives if Test 6 failed.)
+```bash +sudo tee /etc/systemd/system/left4me-server@1.service.d/test-07-full.conf <<'EOF' +[Service] +# Identity / privilege +NoNewPrivileges=true +RestrictSUIDSGID=true +CapabilityBoundingSet= +AmbientCapabilities= +UMask=0027 + +# Namespaces +PrivateUsers=true +PrivateTmp=true +PrivateDevices=true +PrivateIPC=true +ProtectHome=true + +# Filesystem view (clean slate) +ReadOnlyPaths= +ReadWritePaths= +TemporaryFileSystem=/var/lib /etc /opt /home /root /srv /mnt /media +BindReadOnlyPaths=/var/lib/left4me/installation +BindReadOnlyPaths=/var/lib/left4me/overlays +BindReadOnlyPaths=/etc/left4me/host.env +BindReadOnlyPaths=/etc/ssl /etc/ca-certificates +BindReadOnlyPaths=/etc/resolv.conf /etc/nsswitch.conf /etc/alternatives +BindPaths=/var/lib/left4me/runtime/%i +ProtectSystem=strict + +# /proc + kernel +ProtectProc=invisible +ProcSubset=pid +ProtectKernelTunables=true +ProtectKernelModules=true +ProtectKernelLogs=true +ProtectClock=true +ProtectControlGroups=true +ProtectHostname=true +LockPersonality=true + +# Syscall (x86 as well as native: srcds_linux is a 32-bit i386 binary) +SystemCallArchitectures=native x86 +SystemCallFilter=@system-service +SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete @privileged + +# Network +RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX + +# IPC + realtime + namespaces +RestrictNamespaces=true +RestrictRealtime=true +RemoveIPC=true +KeyringMode=private + +# (Include only if Test 6 passed:) +# MemoryDenyWriteExecute=true +EOF +sudo systemctl daemon-reload +sudo systemctl restart left4me-server@1 +``` + +**Verify:** +```bash +# 1. Unit active +sudo systemctl is-active left4me-server@1 +sleep 30 +sudo systemctl is-active left4me-server@1 # still active + +# 2. systemd-analyze: score should drop significantly +sudo systemd-analyze security left4me-server@1.service \ + | tee /tmp/sec-after-server.txt +diff /tmp/sec-baseline-server.txt /tmp/sec-after-server.txt \ + | head -40 +# Expect: many ✓ lines that were ✗, score dropped + +# 3.
Run smoke matrix (next section) +``` + +**Smoke matrix (run after Test 7 settles):** + +```bash +# S1: server is responsive +sudo systemctl status left4me-server@1 --no-pager | head -10 +# Active (running), recent green + +# S2: srcds is in-game +PID=$(pgrep -f 'srcds_linux.*left4dead2' | head -1) +[ -n "$PID" ] && echo "OK: srcds PID $PID" || echo "FAIL" + +# S3: from outside, RCON responds +# (do this from the operator's laptop or via the web UI) + +# S4: workshop / overlay refresh path +# (trigger from web UI; verify the overlay rebuild succeeds — the +# script-sandbox is a SEPARATE unit, not affected by these changes, +# so any failure is in the web app's invocation path, not the +# sandbox itself.) + +# S5: web app can still sudo helpers +# (trigger a server start/stop from the web UI; if the sudo path +# fails, the web app's hardening is too tight — but we haven't +# changed the web unit yet, so this should still work.) + +# S6: log streaming works +# (open the web UI's log view for server@1; verify lines flow.) + +# S7: file upload to overlay +# (upload a small file via the file-tree endpoint; verify it +# appears on disk in /var/lib/left4me/overlays//.) + +# S8: peer server unaffected +sudo systemctl is-active left4me-server@2 +# active (we didn't touch it) +``` + +**Pass criteria:** all smoke items pass. systemd-analyze score +dropped significantly. + +**Failure handling:** if anything in the smoke fails, identify which +directive caused it by removing them one at a time until smoke +passes. Document the offender. + +**DO NOT cleanup yet** — leave Test 7 in place for Test 8. + +--- + +## Test 8 — Attack verification (the audit gaps) + +**Goal:** Confirm the threat-model defenses (D1, D2, D3, D5) actually +work end-to-end. + +**Pre-condition:** Test 7's drop-in still in place. 
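Before running the probes, the per-process side of the composition can be confirmed directly. NoNewPrivileges, the seccomp filter and the empty capability bounding set are attributes of the srcds process, and only its own children inherit them. They therefore show up in its `/proc/<pid>/status`, not in anything a root shell launches. The following is a minimal sketch: `check_sandbox` is a hypothetical helper, and it assumes the kernel's status fields `NoNewPrivs:`, `Seccomp:` (where 2 means filter mode), `CapBnd:` and `CapEff:`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-check for Test 8: read a /proc/<pid>/status file and
# confirm the attributes the Test 7 drop-in should impose on srcds.
# Usage: check_sandbox "/proc/$PID/status"
check_sandbox() {
  local status_file="$1" rc=0 nnp seccomp capbnd capeff
  nnp=$(awk '$1 == "NoNewPrivs:" {print $2}' "$status_file")
  seccomp=$(awk '$1 == "Seccomp:" {print $2}' "$status_file")
  capbnd=$(awk '$1 == "CapBnd:" {print $2}' "$status_file")
  capeff=$(awk '$1 == "CapEff:" {print $2}' "$status_file")

  # NoNewPrivileges=true -> NoNewPrivs: 1
  [ "$nnp" = 1 ] || { echo "FAIL NoNewPrivs=$nnp"; rc=1; }
  # SystemCallFilter= -> Seccomp: 2 (filter mode)
  [ "$seccomp" = 2 ] || { echo "FAIL Seccomp=$seccomp"; rc=1; }
  # CapabilityBoundingSet= (empty) -> every capability bit is zero
  case "$capbnd" in '' | *[!0]*) echo "FAIL CapBnd=$capbnd"; rc=1 ;; esac
  case "$capeff" in '' | *[!0]*) echo "FAIL CapEff=$capeff"; rc=1 ;; esac

  if [ "$rc" -eq 0 ]; then echo "OK sandbox attributes present"; fi
  return "$rc"
}
```

On the host's `/proc`, the status file is world-readable, so `check_sandbox /proc/$PID/status` needs no root.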
+ +**Verify:** +```bash +PID=$(pgrep -f 'srcds_linux.*left4dead2' | head -1) +GUNICORN_PID=$(pgrep -f 'gunicorn.*l4d2web' | head -1) +# Probes enter srcds's mount namespace but run with the credentials on +# the nsenter line; they do NOT carry srcds's seccomp filter or +# NoNewPrivileges. D2/D5 probes therefore run as left4me, so root's +# CAP_SYS_PTRACE does not mask the result. +U=$(id -u left4me); G=$(id -g left4me) + +# D1.a — srcds cannot read DB +sudo nsenter --target $PID --mount -- cat /var/lib/left4me/left4me.db 2>&1 | head -1 +# Expect: cat: /var/lib/left4me/left4me.db: No such file or directory + +# D1.b — srcds cannot read web.env +sudo nsenter --target $PID --mount -- cat /etc/left4me/web.env 2>&1 | head -1 +# Expect: cat: /etc/left4me/web.env: No such file or directory + +# D1.c — srcds cannot read the web app source under /opt +sudo nsenter --target $PID --mount -- ls /opt 2>&1 | head -5 +# Expect: empty listing or No such file or directory + +# D2.a — srcds cannot ptrace gunicorn. srcds itself is stopped by the +# ~@debug filter; this probe is stopped by Yama (ptrace_scope >= 1). +sudo nsenter --target $PID --mount --setuid $U --setgid $G -- /usr/bin/gdb --batch -p $GUNICORN_PID 2>&1 | tail -3 +# Expect: Operation not permitted + +# D2.b — srcds cannot read /proc//environ +sudo nsenter --target $PID --mount --setuid $U --setgid $G -- cat /proc/$GUNICORN_PID/environ 2>&1 | head -1 +# Expect: No such file or directory (ProtectProc=invisible), but only if +# gunicorn runs as another uid; invisible hides other users' processes, +# so while both share left4me the environ is likely readable. + +# D2.c — srcds cannot read /proc//mem +sudo nsenter --target $PID --mount --setuid $U --setgid $G -- cat /proc/$GUNICORN_PID/mem 2>&1 | head -1 +# Expect: Permission denied (opening mem needs ptrace-attach rights, +# which Yama refuses), or No such file or directory if hidden + +# D3 — srcds cannot use sudo helpers (NoNewPrivileges blocks setuid). 
+# The flag belongs to the srcds process; a sudo started via nsenter +# would not inherit it, so check it directly: +grep NoNewPrivs /proc/$PID/status +# Expect: NoNewPrivs: 1 + +# D5 — server@1 cannot ptrace server@2's srcds +PID2=$(pgrep -f 'srcds_linux.*\@2' | head -1) +[ -n "$PID2" ] && sudo nsenter --target $PID --mount --setuid $U --setgid $G -- /usr/bin/gdb --batch -p $PID2 2>&1 | tail -3 +# Expect: Operation not permitted (Yama for this probe; srcds itself is +# also stopped by the ~@debug filter) + +# Bonus — confirm PrivateUsers is in effect +sudo readlink /proc/$PID/ns/user +sudo readlink /proc/1/ns/user +# Expect: different +``` + +**Pass criteria:** every attack vector returns an error, except that +D2.b may succeed while srcds and gunicorn share the left4me uid; record +that outcome for the uid-split decision.
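The pass criterion can be made mechanical: run each probe, capture its combined output, and compare it against the denial text quoted in its Expect comment. The following is a minimal sketch. `expect_denied` is a hypothetical wrapper, not part of left4me, and the operator chooses the pattern for each probe.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run one attack probe and report PASS only if its
# output (stdout + stderr) matches the expected denial pattern.
# Usage: expect_denied LABEL REGEX -- CMD [ARGS...]
expect_denied() {
  local label="$1" pattern="$2" out
  shift 2
  if [ "${1:-}" = "--" ]; then shift; fi
  out=$("$@" 2>&1)
  if printf '%s\n' "$out" | grep -qiE -- "$pattern"; then
    echo "PASS $label"
  else
    # Show only the first line of unexpected output.
    echo "FAIL $label: $(printf '%s\n' "$out" | head -n 1)"
    return 1
  fi
}

# Example (D1.a):
#   expect_denied D1.a 'No such file' -- \
#     sudo nsenter --target "$PID" --mount -- cat /var/lib/left4me/left4me.db
```

The wrapper returns non-zero on FAIL, so chaining the vectors with `||` makes it easy to stop at the first probe that unexpectedly succeeds.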
+ +**Cleanup:** **Do not remove the drop-in yet** — leave it for Test 9. + +--- + +## Test 9 — System-wide sysctl: `kernel.yama.ptrace_scope=2` + +**Goal:** Add belt-and-braces system-wide. + +**Apply:** +```bash +sudo tee /etc/sysctl.d/99-left4me-ptrace.conf <<'EOF' +# Block ptrace except from root (CAP_SYS_PTRACE). +# Combined with SystemCallFilter=~@debug + PrivateUsers=true in the +# unit, this gives defense-in-depth at three levels. +kernel.yama.ptrace_scope=2 +EOF +sudo sysctl --system | grep yama +# Expect: kernel.yama.ptrace_scope = 2 +sysctl kernel.yama.ptrace_scope +# Expect: 2 +``` + +**Verify:** +```bash +# As left4me (no caps), gdb attach to gunicorn from OUTSIDE the unit's +# namespace +sudo -u left4me /usr/bin/gdb --batch -p $GUNICORN_PID 2>&1 | tail -3 +# Expect: Operation not permitted + +# Operator gdb (as root) still works: +sudo /usr/bin/gdb --batch -ex "info threads" -p $GUNICORN_PID 2>&1 | tail -10 +# Expect: gdb output (debugging is admin-only now) +``` + +**Pass criteria:** non-root can't ptrace anything; root still can. + +**No cleanup** — this is permanent (commit to /etc/sysctl.d/). + +--- + +## Test 10 — Web unit hardening (carefully) + +**Goal:** Apply non-sudo-breaking directives to `left4me-web.service`. + +**Pre-condition:** Test 7's server drop-in still in place. Web is at +baseline. + +**Drop-in:** +```bash +sudo install -d -m0755 /etc/systemd/system/left4me-web.service.d/ +sudo tee /etc/systemd/system/left4me-web.service.d/test-10-web.conf <<'EOF' +[Service] +# (NoNewPrivileges intentionally NOT set — web sudoes to helpers.) +# (PrivateUsers intentionally NOT set — would break sudo's setuid.) +# (CapabilityBoundingSet not set — sudo + PAM need caps.) 
+ +ProtectSystem=strict +ProtectHome=true +LockPersonality=true +UMask=0027 + +# /proc + kernel +ProtectProc=invisible +ProcSubset=pid +ProtectKernelTunables=true +ProtectKernelModules=true +ProtectKernelLogs=true +ProtectClock=true +ProtectControlGroups=true +ProtectHostname=true + +# Syscall (no ~@privileged — sudo needs setuid/etc.) +SystemCallArchitectures=native +SystemCallFilter=@system-service +SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete + +# Network +RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX + +# Misc +RestrictNamespaces=true +RestrictRealtime=true +RemoveIPC=true +EOF +sudo systemctl daemon-reload +sudo systemctl restart left4me-web +``` + +**Verify:** +```bash +# 1. Web up +sudo systemctl is-active left4me-web + +# 2. Web responds (curl from the host) +curl -sI http://127.0.0.1:8000/ | head -5 +# Expect: HTTP/1.1 200 or similar (whatever the default route is) + +# 3. Web sudo path works — trigger from operator's laptop, watching the +# web UI. Start/stop a server; observe success. + +# 4. systemd-analyze score +sudo systemd-analyze security left4me-web.service \ + | tee /tmp/sec-after-web.txt +diff /tmp/sec-baseline-web.txt /tmp/sec-after-web.txt | head -20 + +# 5. Web cannot ptrace srcds (D4). ~@debug is enforced on gunicorn +# itself; a gdb started via sudo or nsenter does not inherit the +# filter, so confirm it is loaded on the web process instead: +WEB_PID=$(pgrep -f 'gunicorn.*l4d2web' | head -1) +grep -E '^(Seccomp|NoNewPrivs):' /proc/$WEB_PID/status +# Expect: Seccomp: 2 (filter mode). NoNewPrivs: 1 means sudo cannot +# work from this unit; see Failure handling. +``` + +**Pass criteria:** all of above. + +**Failure handling:** if sudo from web breaks, check two causes before +touching individual filter groups: +- **Implied NoNewPrivileges.** For a unit running as a non-root `User=` + without CAP_SYS_ADMIN, systemd implies `NoNewPrivileges=yes` when + seccomp-based settings such as `SystemCallFilter=`, + `SystemCallArchitectures=`, `RestrictAddressFamilies=` or + `RestrictNamespaces=` are set, and sudo refuses to run under that + flag. `grep NoNewPrivs /proc/$WEB_PID/status` showing `1` confirms it. 
+- **Inherited sandbox.** Helpers started through sudo still run inside + the web unit: they keep its seccomp filter and its read-only + `ProtectSystem=strict` view, so a helper that mounts (e.g. + `left4me-overlay mount`) is stopped by `~@mount`. + +`~@debug` is an unlikely culprit: it removes `ptrace` and +`process_vm_readv`, which sudo does not use, and `~@privileged` is not +on the web filter.
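The "remove one directive at a time" bisection can be scripted so that nobody edits the original drop-in by hand. The same approach applies to Test 7's failure handling. The following is a minimal sketch: `list_directives` and `without_directive` are hypothetical helpers, not part of left4me. The first numbers every `Key=value` line of a drop-in, skipping comments, blanks and the `[Service]` header. The second prints a copy with one of those lines commented out.

```bash
#!/usr/bin/env bash
# Hypothetical bisection helpers for systemd drop-ins.

# list_directives FILE: print "LINENO:Key=value" for each directive line.
list_directives() {
  grep -nE '^[A-Za-z][A-Za-z0-9]*=' "$1"
}

# without_directive FILE LINENO: print FILE with line LINENO commented out.
without_directive() {
  sed "${2}s/^/# bisect-disabled: /" "$1"
}
```

For example, run `list_directives` on `/etc/systemd/system/left4me-web.service.d/test-10-web.conf` to pick a line number. Redirect `without_directive` for that file and line to a temporary file, then `sudo mv` it over the drop-in, `daemon-reload`, restart, and re-run the sudo check. Writing to a temporary file first avoids truncating the drop-in while `sed` is still reading it.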
+ +**Cleanup:** +```bash +sudo rm /etc/systemd/system/left4me-web.service.d/test-10-web.conf +sudo systemctl daemon-reload +sudo systemctl restart left4me-web +``` + +(Web reverts to baseline. Server drop-in stays for the report.) + +--- + +## Test 11 — Soak test + +**Goal:** Run the composition for an extended period to surface +race-condition or workload-dependent issues. + +**Pre-condition:** Test 7 drop-in on server@1; Test 9 sysctl in place. + +**Procedure:** +```bash +# Run for 24-48 hours; observe: +sudo journalctl -u left4me-server@1 --since '24 hours ago' \ + | grep -iE 'seccomp|denied|EACCES|EPERM' | wc -l +# Expect: 0 or a very small number (some EACCES on benign probes +# are normal) + +# Seccomp audit records are not tagged with the unit, so search the +# whole journal; each record's pid= shows which process tripped it +sudo journalctl --since '24 hours ago' \ + | grep -E 'type=1326|SECCOMP' | wc -l +# Expect: 0 + +sudo systemctl status left4me-server@1 +sudo systemctl show -p NRestarts left4me-server@1 +# Expect: active; NRestarts=0 (no automatic restarts since start) +``` + +**Pass criteria:** no SECCOMP kills over the soak period, no +unexpected restarts. + +--- + +## Cleanup (after all tests pass) + +```bash +# Remove all test drop-ins +sudo rm -rf /etc/systemd/system/left4me-server@1.service.d/test-*.conf +sudo rm -rf /etc/systemd/system/left4me-web.service.d/test-*.conf +sudo systemctl daemon-reload +sudo systemctl restart left4me-server@1 left4me-web +sudo systemctl is-active left4me-server@1 left4me-web # both active + +# Sysctl from Test 9 STAYS in place. + +# Remove temp files +rm /tmp/sec-baseline-*.txt /tmp/sec-after-*.txt +rm /tmp/unit-baseline-*.conf +rm /tmp/syscall-log-*.txt +``` + +--- + +## Results template + +Append the executing session's findings here. One paragraph per test.
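The "score before/after" fields below can be filled from the reports saved during the baseline capture and in Tests 7 and 10. The following is a minimal sketch. It assumes each report contains a line of the form `Overall exposure level for <unit>: <score> <LABEL>`, as `systemd-analyze security` prints. `exposure_score` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract "SCORE LABEL" (e.g. "5.4 MEDIUM") from a
# saved `systemd-analyze security` report.
exposure_score() {
  sed -n 's/.*Overall exposure level for [^:]*: \([0-9][0-9.]*\) \([A-Z][A-Z]*\).*/\1 \2/p' "$1"
}

# Example:
#   echo "server@1: $(exposure_score /tmp/sec-baseline-server.txt) -> $(exposure_score /tmp/sec-after-server.txt)"
```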
+ +### Test 1 — PrivateUsers +- Pass / fail: TBD +- Notes: + +### Test 2 — TemporaryFileSystem + binds +- Pass / fail: TBD +- Notes: + +### Test 3 — SystemCallLog discovery +- Pass / fail: TBD +- Syscalls observed under load (if any from @debug/@mount/@privileged): +- Notes: + +### Test 4 — SystemCallFilter enforcement +- Pass / fail: TBD +- If filter had to be relaxed, which group: +- Notes: + +### Test 5 — ProcSubset + ProtectProc +- Pass / fail: TBD +- Notes: + +### Test 6 — MemoryDenyWriteExecute +- Pass / fail: TBD (likely fail; document the failure mode) +- Notes: + +### Test 7 — Full composition +- Pass / fail: TBD +- systemd-analyze score before/after: +- Notes: + +### Test 8 — Attack verification +- Pass / fail: TBD +- Per-vector results (D1.a, D1.b, ..., D5): + +### Test 9 — Yama ptrace_scope=2 +- Applied: TBD +- Operator workflow impact noted: + +### Test 10 — Web hardening +- Pass / fail: TBD +- Sudo path verified working: +- systemd-analyze score before/after: + +### Test 11 — Soak +- Duration: +- Issues observed: + +--- + +## Output of this test plan + +When all tests complete: +1. Mark this document with **status: tested** and record the dates. +2. Open a new implementation plan + (`docs/superpowers/plans/2026-MM-DD-hardening-refactor.md`) that + commits the proven composition to the ckn-bw reactor + reference + units + test suite. +3. Decide on the deferred questions: + - 3-user uid split — necessary or covered by hardening? + - AppArmor profile follow-up — pursue or close? + - `MemoryDenyWriteExecute=true` — include if Test 6 passed? + - `SocketBindAllow=` — add to lock the gameserver port range? +4. Mark `2026-05-15-user-uid-split-design.md` as superseded or closed + per the answer to the previous bullet. 
+ +## Pointers + +- Threat model: `docs/superpowers/specs/2026-05-15-hardening-threat-model.md` +- Defenses survey: `docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md` +- Live unit source: `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+` +- Reference units: `deploy/files/usr/local/lib/systemd/system/` +- Tools needed on `left4.me`: + - `systemd-analyze` (in `systemd` package, already installed) + - `scmp_sys_resolver` (in the Debian `seccomp` package; install on + demand for Test 3/4 if filters need analysis) + - `gdb` (for ptrace tests; install on demand) + - `nsenter` (in `util-linux`, already installed) + - `findmnt`, `pgrep`, standard userspace diff --git a/docs/superpowers/specs/2026-05-15-hardening-threat-model.md b/docs/superpowers/specs/2026-05-15-hardening-threat-model.md new file mode 100644 index 0000000..3d4fb81 --- /dev/null +++ b/docs/superpowers/specs/2026-05-15-hardening-threat-model.md @@ -0,0 +1,345 @@ +# left4me application hardening — threat model + +**Status:** living spec, intended input to a hardening implementation plan. +Paired with `2026-05-15-hardening-defenses-survey.md` and +`2026-05-15-hardening-test-plan.md`. + +This document establishes *what we defend against and what we accept losing*. +The defenses survey and test plan operationalize this against the codebase. + +## Context + +The 2026-05-15 work landed deploy-dir-rethink + build-time-idmap and +queued "uid split decision" as the next session's task +(`2026-05-15-user-uid-split-design.md`). Audit of the running 2-user +configuration found that the gameserver's systemd hardening blocks +privilege escalation but leaves same-uid attack surface wide open: +RCON passwords plaintext in `/var/lib/left4me/left4me.db` (readable by +srcds), Flask `SECRET_KEY` in `/etc/left4me/web.env` (also readable), +no ptrace block on `left4me-server@.service`, no `/proc` isolation.
+Rather than answer the original "1/2/3 uids" question in isolation, +this work treats application hardening as a first-class refactor: ground +the decision in an explicit threat model, survey the full Linux+systemd +defense menu, test what composes safely with Source engine + the rest of +the stack, then implement. + +## Operating posture (assumed) + +Solo-operator, single-host infra (`left4.me` / `ovh.left4me`, +141.95.32.8). Host is a personal VPS, not multi-tenant. The only privileged +operator is the user. There are no shell logins as `left4me` or +`l4d2-sandbox`. All access to those uids is funneled through the +systemd-managed units (`left4me-web.service`, `left4me-server@.service`, +`left4me-script-sandbox`). The host runs nothing other than left4me + +ckn-bw-managed baseline (nginx, sshd, fail2ban-class basics). + +If those assumptions don't hold (e.g., shared host with other tenants, +non-systemd-mediated access to the uids), revise this document before +proceeding — threat surface changes meaningfully. + +## Assets + +Ordered by impact-if-compromised. Compromise means the attacker can +exfiltrate, modify, or destroy the asset. + +### Tier 1 — catastrophic, no easy recovery + +| Asset | Where | Impact of compromise | +|---|---|---| +| Host root | the box | Total compromise of every service on the host. | +| `web.env` Flask `SECRET_KEY` | `/etc/left4me/web.env`, `root:left4me 0640` | Session forgery: attacker logs in as any admin without password. | +| `web.env` Steam Web API key | same | Attacker can query/operate Steam Web API as us. Rate-limited; reputational. | +| Server RCON passwords | DB: `Server.rcon_password` plaintext (`l4d2web/models.py:146-148`) | Attacker can execute arbitrary RCON on every gameserver: `sm_kick`, `rcon say`, server lockup, plugin abuse. | +| User password hashes (bcrypt) | DB: `User.password_digest` (`l4d2web/models.py:31`) | Offline cracking per user. bcrypt slows it but doesn't stop it. 
| + +### Tier 2 — severe but bounded + +| Asset | Where | Impact | +|---|---|---| +| `/opt/left4me/src/` Python source | `left4me:left4me` on disk | Persistent backdoor in web app via gunicorn reload. Currently RO from inside the server unit (`ProtectSystem=strict` covers `/opt`); RW from inside the web unit. | +| Overlay content | `/var/lib/left4me/overlays//` | Persistent sourcemod plugin or replaced binary; surfaces in every gameserver using that overlay. | +| Steam installation | `/var/lib/left4me/installation/` | Tampered `srcds_linux`; trivial persistence. Currently RO from server, RW from web. | +| Sourcemod admin lists | inside overlays | RCON-equivalent: admin commands in-game. | +| Workshop cache | `/var/lib/left4me/workshop_cache/` | Used by builds; tampered content surfaces in next overlay. | + +### Tier 3 — limited, recoverable + +Job history, build logs, the small subset of in-game state not covered by +the above (e.g., live player slot in a specific match). + +## Trust boundaries + +Lines we want enforced. "Enforced" = the kernel + systemd, not "the +process politely doesn't cross it." 
+ +| Id | From | To | Strength today | Strength wanted | +|---|---|---|---|---| +| TB1 | External network | host shell | Strong (firewall, no extra services) | Strong | +| TB2 | Gameserver process | rest of the host | Weak (same-uid + same-FS view) | Strong | +| TB3 | Web app | rest of the host | Weak (same-uid + same-FS view) | Medium (sudo path inherent) | +| TB4 | Sandbox | rest of the host | Strong (separate uid + hardened unit) | Strong | +| TB5 | Gameserver instance N | gameserver instance M | None (same-uid, same-DB) | Strong | +| TB6 | Web app | gameserver runtime state | None (same-uid, shared `runtime/` access) | Medium (web needs to stage server.cfg) | +| TB7 | Gameserver | web-only secrets (DB, web.env) | None | Strong | +| TB8 | Workshop content | srcds-process | Inherent (content runs as data) | n/a — not a software boundary | + +TB2, TB5, TB7 are the highest-leverage gaps. TB6 is partial because the +web app legitimately writes per-instance config; the boundary is "web +can write per-instance config" allowed, "web can ptrace srcds" denied. + +## Attackers + +### A1 — Anonymous external attacker (primary) + +Reaches public surfaces: +- gunicorn on `:8000` (behind nginx + admin auth) +- srcds on UDP `:27015`+ per instance (game protocol; no auth) +- (Maybe: workshop subscription endpoints if any; check.) + +Capabilities: arbitrary network packets. Goal: code execution on the +host, then exfiltrate secrets and persist. + +### A2 — Authenticated admin (operator) + +In the assumed posture this is *the user*, single person. Out of scope as +a threat per operator's choice (insider == operator). If admin auth ever +expands to multiple operators, revise. + +### A3 — Malicious workshop content + +A workshop addon (map, plugin, asset pack) is published to the Steam +workshop and pulled into a build. The content runs inside srcds via +Source engine + sourcemod loading. 
Capabilities: same as A1 once loaded +into srcds (the engine doesn't have a strong privilege boundary against +its own loaded plugins). Distinct in that the entry vector is curated by +the operator (workshop link added to a blueprint), not arbitrary network +input. Risk floor: the operator vetted the source. + +### A4 — Compromised player session + +A connected player exploits a Source-engine protocol bug. Functionally a +subset of A1 — same capability set once code is running in srcds. + +### A5 — Local attacker on the host + +Out of scope per operating posture. No non-root local accounts beyond +the systemd-managed service uids. + +### A6 — Steam binary supply-chain + +`srcds_linux` is a binary from Valve. A compromised Valve build would +already be running as `left4me` and there's no practical defense at +this layer. Out of scope. + +## Attack scenarios + +### S1 — L4D2 engine RCE → exfil + persist + +A1 sends a crafted packet to srcds; srcds executes attacker code as +`left4me` inside `left4me-server@.service`. + +**Today, attacker can:** +- Read DB → all RCON passwords (plaintext), all bcrypt hashes. +- Read `web.env` → SECRET_KEY, Steam API key. +- ptrace gunicorn → in-memory secrets, current sessions. +- Read `/proc//environ` → same env as `web.env`. +- ptrace + read DB of peer `left4me-server@` — cross-server compromise. +- `sudo left4me-systemctl|journalctl|overlay` for any instance. +- Cannot write `/opt/left4me/src/` (ProtectSystem=strict covers `/opt`). +- Cannot acquire new caps (NoNewPrivileges). + +**Defended outcome (goal):** Blast radius limited to "this gameserver's +runtime state during this session" — no peer-server compromise, no DB +access, no `web.env` access, no ptrace. + +### S2 — Web app RCE → secrets + persistence + +A1 finds a Flask vulnerability (Jinja SSTI, SQLAlchemy injection, auth +bypass, file-upload escape). Web executes attacker code as `left4me` +inside `left4me-web.service`. 
+ +**Today, attacker can:** +- Read + write DB (web's primary path). +- Read `web.env`. +- Write `/opt/left4me/src/` → backdoor next gunicorn reload. +- `sudo` all helper verbs. +- ptrace srcds peers, modify their `runtime//` upper layer. +- Modify overlays (writes to `/var/lib/left4me/overlays/`). + +**Defended outcome (goal):** Cannot ptrace gameservers; cannot read +`/proc//*`; web compromise still owns its DB and env (its +primary attack surface, so this is *acceptable residual*). + +### S3 — Cross-server contamination + +S1 played out on srcds@1; attacker pivots to srcds@2. + +**Today:** trivial — ptrace srcds@2, read its memory; or just read the +DB to learn srcds@2's RCON password and send commands. + +**Defended outcome (goal):** Blocked. Per-instance namespace isolation +(or per-instance uid) means kernel rejects ptrace; DB invisible to +gameserver uid hides the RCON list. + +### S4 — Malicious workshop content + +A3 adds an addon to a blueprint; addon includes a Squirrel/SourceMod +plugin that abuses engine APIs to do file I/O / network calls. + +**Today + with hardening:** functionally equivalent to S1 — the plugin +runs as srcds, same blast radius. No software boundary prevents this; +the only defense is what's outside the unit. So this is *covered* if S1 +is covered. + +### S5 — Sudoers helper abuse + +S1 or S2 attacker uses the sudo grants to widen access. + +**Today:** sudoers grants (audit findings, `deploy/files/etc/sudoers.d/left4me`): +- `left4me-systemctl {enable|disable|show}` — any instance, no + ownership check +- `left4me-journalctl ` — read any unit's journal +- `left4me-overlay mount|umount ` — any instance +- `left4me-script-sandbox