spec(hardening): threat model + defenses survey + test plan; pivot handoff
Reframe the queued uid-split decision into a broader hardening analysis. Audit found the same-uid attack surface (DB readable from srcds, ptrace allowed, RCON stored plaintext) is closable by either uid split or systemd directive composition; the three specs ground that choice in a threat model, survey the defenses, and lay out a self-contained test plan to run on left4.me next. uid-split spec deferred pending results.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent
9a2ab974e6
commit
1df811e62a
4 changed files with 2036 additions and 86 deletions
698
docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md
Normal file
|
|
@ -0,0 +1,698 @@
|
|||
# left4me application hardening — defenses survey
|
||||
|
||||
**Status:** living spec. Companion to `2026-05-15-hardening-threat-model.md`
|
||||
and `2026-05-15-hardening-test-plan.md`.
|
||||
|
||||
This document catalogs the Linux + systemd defense primitives applicable
|
||||
to left4me, evaluates each against this codebase's needs, and proposes a
|
||||
candidate composition. Each candidate is *testable* — the test plan
|
||||
exercises it before commit.
|
||||
|
||||
Reference: the threat model defines defenses D1-D7. This document maps
|
||||
primitives to those defenses.
|
||||
|
||||
## Section 1 — Linux kernel primitives
|
||||
|
||||
### Namespaces (`man 7 namespaces`)
|
||||
|
||||
| NS | Isolates | Relevance |
|
||||
|---|---|---|
|
||||
| **mount** | filesystem hierarchy view | Core. Gives `TemporaryFileSystem=` + bind primitives. |
|
||||
| **user** | uid/gid mapping | Big for D2/D4 (cross-uid ptrace block). |
|
||||
| **pid** | PID 1, /proc visibility | Pairs with `ProcSubset=pid` for D2. |
|
||||
| **net** | netifs, ports, routes | Breaks gameservers; do **not** apply to server@. |
|
||||
| **ipc** | SysV IPC + POSIX MQ | Hygienic; `PrivateIPC=true`. Abstract sockets belong to the net namespace, so this does not isolate them. |
|
||||
| **uts** | hostname | Cosmetic; doesn't matter for us. |
|
||||
| **time** | CLOCK_MONOTONIC offset | Irrelevant for us. |
|
||||
| **cgroup** | cgroup view | Defense-in-depth against cgroup escape. |
|
||||
|
||||
**For left4me:** mount + user + pid + ipc on `left4me-server@.service`.
|
||||
The web unit can use the same minus user-ns (incompatible with sudo).
|
||||
|
||||
### Capabilities (`man 7 capabilities`)
|
||||
|
||||
Per-process, granted at exec via file caps or by systemd at unit start.
|
||||
Bounding set = upper bound; ambient = inherited across non-setuid exec.
|
||||
|
||||
- **CapabilityBoundingSet=** empty drops everything. Neither srcds nor
|
||||
gunicorn needs any capability after they start (no raw sockets, no
|
||||
mount, no module load, no setuid).
|
||||
- **AmbientCapabilities=** empty (default).
|
||||
|
||||
Sharp edge: with `+`-prefixed ExecStartPre, the helper is spawned with
full privileges (root, all caps), unaffected by these. That's how we get the privileged
overlay mount without breaking the unit's caps.
|
||||
|
||||
### Seccomp-bpf (`man 2 seccomp`)
|
||||
|
||||
Filter syscall set. Per-process. Composes with the AND of all filters
|
||||
loaded. The systemd `SystemCallFilter=` wraps it.
|
||||
|
||||
For us, two filter strategies:
|
||||
- **Allow-list base** (`@system-service`): permissive enough for srcds
|
||||
+ gunicorn; subtract dangerous groups.
|
||||
- **Deny-list**: simpler but easier to leave holes.
|
||||
|
||||
Strategy: allow-list with subtractions.
|
||||
|
||||
Critical subtractions for D2:
|
||||
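- `~@debug`: drops `ptrace(2)`, `perf_event_open(2)`, and `pidfd_getfd(2)`.
  **Single most important syscall block** for our threat model. Note that
  `process_vm_readv/writev(2)` and `process_madvise(2)` are, as far as we
  can tell, grouped under `@ipc` rather than `@debug`, so they need an
  explicit `~process_vm_readv process_vm_writev process_madvise`; confirm
  with `systemd-analyze syscall-filter @debug @ipc` on the target host.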
|
||||
- `~@mount` — `mount`, `umount2`, `pivot_root` (gameserver doesn't need;
|
||||
helper does, and helper runs as root via `+` prefix).
|
||||
- `~@privileged` — anything requiring CAP_*; redundant with empty
|
||||
bounding set but defense-in-depth.
|
||||
- `~@reboot`, `~@swap`, `~@cpu-emulation`, `~@obsolete` — cheap removal.
|
||||
|
||||
Sharp edges:
|
||||
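- `SystemCallFilter=` lines merge in order: the first line decides the
  mode (here an allow-list), later plain lines add to it and later `~`
  lines subtract from it.
- A `~` subtract on a group not in the allow-list is a no-op.
- `SystemCallArchitectures=native` blocks 32-bit syscall entries that
  bypass the filter. Set it wherever the workload is native. Caveat to
  verify in test: the L4D2 `srcds_linux` binary is, to our knowledge,
  32-bit; on an amd64 host `native` would block it, and the server unit
  would need `SystemCallArchitectures=native x86` instead.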
|
||||
- `SystemCallErrorNumber=EPERM` vs. the default (process terminated with
  `SIGSYS`): `EPERM` is gentler for non-essential paths; termination is
  loud and obvious. Start with the default for a clear signal, switch to
  `EPERM` if a benign caller trips it (e.g., a library probing for
  capabilities).
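
The merge rule for stacked `SystemCallFilter=` lines can be sketched as a toy model. This is an illustration only: the group names and their members below are invented, not real systemd groups, and `effective_allow_list` is a hypothetical helper. Real memberships should be read from `systemd-analyze syscall-filter`.

```python
# Toy model of how several SystemCallFilter= lines combine when the
# first line is an allow-list. GROUPS is invented for illustration and
# does NOT reflect the contents of any real systemd syscall group.
GROUPS = {
    "@base": {"read", "write", "trace", "mnt"},
    "@tracing": {"trace"},
    "@mounting": {"mnt"},
}


def expand(names):
    """Expand group names to member calls; bare names pass through."""
    calls = set()
    for name in names:
        calls |= GROUPS.get(name, {name})
    return calls


def effective_allow_list(lines):
    """Return the allowed set for an allow-list-first sequence of lines."""
    if not lines or lines[0].startswith("~"):
        raise ValueError("this sketch only models an allow-list first line")
    allowed = set()
    for line in lines:
        if line.startswith("~"):
            # A "~" line subtracts; names that were never allowed are a no-op.
            allowed -= expand(line[1:].split())
        else:
            # A plain line adds to the allow-list.
            allowed |= expand(line.split())
    return allowed
```

Feeding it `["@base", "~@tracing @mounting"]` leaves only the calls that were in the base set and not subtracted, which mirrors the "allow-list with subtractions" strategy.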
|
||||
|
||||
### Yama LSM — `kernel.yama.ptrace_scope`
|
||||
|
||||
System-wide sysctl. Values:
|
||||
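- 0: classic; any process may ptrace another process running under the same uid
- 1: restricted; a process may ptrace only its own descendants, or a process that opted in via `prctl(PR_SET_PTRACER)` (check the host's current value with `sysctl kernel.yama.ptrace_scope` rather than assuming a distro default)
- 2: requires `CAP_SYS_PTRACE` (admin only)
- 3: ptrace attach disabled entirely; cannot be lowered again without a reboot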
|
||||
|
||||
For left4me: setting to 2 system-wide is cheap and removes the same-uid
|
||||
ptrace path entirely. Set via `/etc/sysctl.d/99-left4me.conf` (or
|
||||
extend an existing file). Doesn't affect debuggability — if you ever
|
||||
need to ptrace, do it as root.
|
||||
|
||||
Caveat: Yama is enforced AT THE TIME of `ptrace` call. With seccomp
|
||||
blocking the syscall entirely (`~@debug`), Yama becomes belt-and-braces;
|
||||
keep both for defense-in-depth.
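
A check for the Yama setting could look like the following minimal sketch. The helper names and the `minimum=2` policy are assumptions for illustration; the only external fact relied on is the sysctl's procfs path.

```python
from pathlib import Path

# procfs view of the kernel.yama.ptrace_scope sysctl.
YAMA_PATH = Path("/proc/sys/kernel/yama/ptrace_scope")

MEANINGS = {
    0: "classic: same-uid processes may attach",
    1: "restricted: only descendants (or PR_SET_PTRACER opt-in)",
    2: "admin-only: CAP_SYS_PTRACE required",
    3: "no attach: ptrace attach disabled until reboot",
}


def read_ptrace_scope(path=YAMA_PATH):
    """Return the integer ptrace_scope, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def meets_policy(value, minimum=2):
    """True when Yama is present and at least as strict as `minimum`."""
    return value is not None and value >= minimum
```

A deploy-time check would call `meets_policy(read_ptrace_scope())` and refuse to proceed on `False`; returning `None` for a missing file keeps hosts without Yama from being mistaken for hardened ones.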
|
||||
|
||||
### LSMs other than Yama
|
||||
|
||||
| LSM | Status on Debian Trixie | Fit for us |
|
||||
|---|---|---|
|
||||
| **AppArmor** | Available; enabled by default since Debian 10 (confirm with `aa-status`) | Could write profiles for srcds + gunicorn. Per-unit profile via `AppArmorProfile=` on systemd. Moderate effort. |
|
||||
| **SELinux** | Available; not enabled by default | Heavy. Not worth the operational cost on a single-host VPS. |
|
||||
| **landlock** | Kernel ≥5.13; available | Process-local sandboxing. Apps must opt in via the `landlock_create_ruleset(2)` family of syscalls (see `man 7 landlock`). Python doesn't have a stdlib binding; need to call via ctypes or a wrapper. For us: would need to retrofit gunicorn or write a wrapper. Defer. |
|
||||
| **BPF LSM** | Kernel ≥5.7; available | Programmable LSM hooks. Bleeding edge for personal infra. Defer. |
|
||||
| **Tomoyo** | Available; not Debian-enabled | Path-based MAC. Niche. Skip. |
|
||||
|
||||
**For left4me:** Yama yes. AppArmor *maybe*, as a follow-up — a profile
|
||||
limited to "deny path X" patterns for srcds would be small but adds an
|
||||
audit/rollback surface. Skip in the first pass; revisit if test results
|
||||
show systemd directives alone leave gaps.
|
||||
|
||||
### Filesystem ACLs and modes
|
||||
|
||||
POSIX permissions, supplementary groups, ACLs (`setfacl`), extended
|
||||
attrs (`xattr`).
|
||||
|
||||
For us:
|
||||
- DB and `web.env` already use `root:left4me 0640`. If we go uid-split,
|
||||
ownership changes; if we go hardening-only, mode is fine — what
|
||||
matters is *whether the unit's FS view contains them at all*.
|
||||
- `setfacl` for fine-grained sharing (e.g., one supplementary group
|
||||
used by both web and game). Doable but adds complexity; consider
|
||||
only if uid split goes ahead.
|
||||
|
||||
### File attributes (chattr)
|
||||
|
||||
`chattr +i` (immutable) and `chattr +a` (append-only).
|
||||
|
||||
For us:
|
||||
- `chattr +i /opt/left4me/src/**` — prevents post-deploy tampering by
|
||||
anything short of root removing the attr. But: `pip install -e`
|
||||
creates `*.egg-info` files in the tree; deploy of new code would need
|
||||
to `chattr -R -i ...` first. Too much friction. Skip.
|
||||
- `chattr +i /etc/left4me/web.env` — keeps the env file from being
|
||||
rewritten by a malicious uid. Works because the env file is rewritten
|
||||
rarely (rotate SECRET_KEY explicitly via ckn-bw apply, which is root
|
||||
and can `chattr -i` first). Worth considering as a small extra.
|
||||
|
||||
### cgroups v2
|
||||
|
||||
Not a security primitive (not confidentiality/integrity), but a
|
||||
**resource ceiling**. Already in use:
|
||||
- `Slice=l4d2-game.slice`, `MemoryMax`, `TasksMax` — keep.
|
||||
|
||||
`MemoryDenyWriteExecute=true` is a kernel-level prctl + seccomp, not a
|
||||
cgroup, but listed here because it's resource-adjacent. See systemd
|
||||
section.
|
||||
|
||||
### Sudo / setuid
|
||||
|
||||
Sudoers grants narrow what a unit's uid can do as root. For us, the
|
||||
helpers (`scripts/libexec/left4me-*`) already validate inputs tightly
|
||||
(verified in audit). Two design options for the future:
|
||||
|
||||
- **Keep sudo path**, narrow the grants (per-uid via 3-user split, or
|
||||
per-action via tighter sudoers).
|
||||
- **Replace sudo with systemctl-managed transient units triggered via
|
||||
dbus / `systemctl start`** — the build-overlay-unit spec already
|
||||
proposes this for the script-sandbox.
|
||||
|
||||
The web app needs to invoke the helpers somehow. `NoNewPrivileges=true`
|
||||
on the web unit would break sudo's setuid. If we move to
|
||||
systemctl-triggered units (no setuid involved), we can also tighten the
|
||||
web unit. Sequenced in the implementation plan, not this survey.
|
||||
|
||||
## Section 2 — systemd unit-config primitives
|
||||
|
||||
### Identity
|
||||
|
||||
- **`User=` / `Group=`** — drop privileges. Already set.
|
||||
- **`DynamicUser=true`** — transient uid per run, persisted across runs
|
||||
via `StateDirectory=`. Strong default. **Bad fit for us** because
|
||||
multiple units share `/var/lib/left4me/` cross-unit; DynamicUser's
|
||||
per-unit `StateDirectory=` model fights that.
|
||||
- **`SupplementaryGroups=`** — extra groups. Used if we add a shared
|
||||
read-only group (e.g., `l4d2-overlay-readers`).
|
||||
|
||||
### Filesystem virtualization
|
||||
|
||||
The lever the operator asked about ("can systemd have a fully virtual
|
||||
filesystem"). Yes — composition:
|
||||
|
||||
- **`RootDirectory=path`** — chroot. Full FS substitution. Heavy;
|
||||
requires populating libs/binaries. Skip for the first pass.
|
||||
- **`RootImage=path`** — same but from a disk image. Way too heavy.
|
||||
- **`TemporaryFileSystem=path[:opts]`** — empty tmpfs at `path`.
|
||||
Cheap. Composes with bind paths.
|
||||
- **`BindReadOnlyPaths=src[:dst]`** — RO bind. Composes over
|
||||
TemporaryFileSystem.
|
||||
- **`BindPaths=src[:dst]`** — RW bind. Composes over TemporaryFileSystem.
|
||||
- **`InaccessiblePaths=path`** — masks a path with an empty file/dir.
|
||||
Legacy; Bind* is cleaner.
|
||||
- **`NoExecPaths=path`** / **`ExecPaths=path`** — restrict
|
||||
executable paths. Strong but easy to misconfigure.
|
||||
|
||||
Composition pattern (the one we want for srcds):
|
||||
```ini
|
||||
TemporaryFileSystem=/var/lib /etc /opt /home /root /srv
|
||||
BindReadOnlyPaths=/var/lib/left4me/installation
|
||||
BindReadOnlyPaths=/var/lib/left4me/overlays
|
||||
BindReadOnlyPaths=/etc/left4me/host.env
|
||||
BindReadOnlyPaths=/etc/ssl /etc/ca-certificates /etc/resolv.conf
|
||||
BindReadOnlyPaths=/etc/nsswitch.conf /etc/alternatives
|
||||
BindPaths=/var/lib/left4me/runtime/%i
|
||||
```
|
||||
|
||||
Result: srcds has no DB, no `web.env`, no `/opt/left4me/src/` in its FS
|
||||
view. Files outside the bound list are simply not there from srcds's
|
||||
perspective — `open()` returns ENOENT, not EACCES.
|
||||
|
||||
Sharp edges:
|
||||
- `TemporaryFileSystem=` size defaults to half RAM; clamp via
|
||||
`:size=NNM,nr_inodes=NN`.
|
||||
- Bind paths must exist on disk; a missing source prevents unit start unless the entry is prefixed with `-`.
|
||||
- `BindReadOnlyPaths=` and `BindPaths=` reorder semantics: bind-mounts
|
||||
applied in order; later wins.
|
||||
- `RuntimeDirectory=` integrates with `TemporaryFileSystem=` cleanly:
|
||||
`RuntimeDirectory=left4me/foo` creates `/run/left4me/foo` and binds
|
||||
it in, auto-cleaning on stop.
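
Because a missing bind source stops the unit from starting, a pre-deploy check over the unit text can catch it early. The sketch below is a simplified assumption of how such a check might look: `bind_sources` and `missing_sources` are hypothetical helpers, only the `%i` specifier is expanded, and quoting is ignored.

```python
import os


def bind_sources(unit_text, instance=""):
    """Yield (source_path, optional) for every Bind*Paths= entry."""
    for raw in unit_text.splitlines():
        line = raw.strip()
        if line.startswith(("#", ";")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() not in ("BindPaths", "BindReadOnlyPaths"):
            continue
        for entry in value.split():
            # A leading "-" marks the entry as optional.
            optional = entry.startswith("-")
            # "src:dst" form: only the source must exist on the host.
            source = entry.lstrip("-").split(":", 1)[0]
            yield source.replace("%i", instance), optional


def missing_sources(unit_text, instance="", exists=os.path.exists):
    """Return required bind sources that do not exist."""
    return [src for src, optional in bind_sources(unit_text, instance)
            if not optional and not exists(src)]
```

Passing `exists` as a parameter keeps the check testable without touching the real filesystem.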
|
||||
|
||||
### Namespaces (systemd wrappers)
|
||||
|
||||
- **`PrivateTmp=true`** — already set.
|
||||
- **`PrivateDevices=true`** — already set. Drops most of `/dev`.
|
||||
- **`PrivateNetwork=true`** — **don't** for gameservers (breaks UDP).
|
||||
- **`PrivateIPC=true`** — private SysV/POSIX IPC namespace; cheap win.
|
||||
- **`PrivateUsers=true`**: own userns. Root and the configured
  `User=left4me` are identity-mapped; every other host uid shows up as
  `nobody` inside. The unit's processes keep their host uid but hold no
  capabilities in the host user namespace, so ptrace across the namespace
  boundary needs `CAP_SYS_PTRACE` in the target's namespace (defense for
  D2/D4). Sharp edge: incompatible with `sudo` from inside the unit
  (setuid + userns mapping = no host-root).
|
||||
- **`PrivateMounts=true`** — own mount ns (default-implicit with most
|
||||
Protect* / Private* directives).
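
To observe what a user namespace actually maps, a test can read `/proc/<pid>/uid_map` from inside the unit. The parser below is a minimal sketch with hypothetical helper names; what a `PrivateUsers=true` unit shows is left for the test plan to record rather than asserted here.

```python
def parse_uid_map(text):
    """Parse uid_map content into (inside_start, outside_start, count) tuples."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            ranges.append(tuple(int(field) for field in fields))
    return ranges


def is_initial_namespace_map(ranges):
    """True for the single full-range identity map of the initial user namespace."""
    return ranges == [(0, 0, 4294967295)]


def read_own_uid_map(path="/proc/self/uid_map"):
    """Read and parse the calling process's own uid map."""
    with open(path) as handle:
        return parse_uid_map(handle.read())
```

If `is_initial_namespace_map(read_own_uid_map())` is true inside the service, the unit is not running in a private user namespace.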
|
||||
|
||||
### `/proc` and `/sys` protection
|
||||
|
||||
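- **`ProtectProc=invisible|noaccess|ptraceable|default`**:
  `invisible` hides `/proc/<pid>/*` of processes owned by other users.
  Processes under the same uid can stay visible, so verify in test what
  srcds and web (both `left4me`) see of each other. **D2.**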
|
||||
- **`ProcSubset=pid|all`** — `pid` restricts `/proc/` to PID entries;
|
||||
hides `/proc/kallsyms`, `/proc/cpuinfo`, etc. Cheap.
|
||||
- **`ProtectKernelTunables=true`** — `/proc/sys`, `/sys` read-only.
|
||||
- **`ProtectKernelModules=true`** — block `init_module`, `delete_module`.
|
||||
- **`ProtectKernelLogs=true`** — block `/dev/kmsg`, syslog().
|
||||
- **`ProtectClock=true`** — block `clock_settime`, `settimeofday`.
|
||||
- **`ProtectControlGroups=true`** — `/sys/fs/cgroup` read-only.
|
||||
- **`ProtectHostname=true`** — block `sethostname`/`setdomainname`.
|
||||
|
||||
All of `ProtectKernel*`, `ProtectClock`, `ProtectControlGroups`,
|
||||
`ProtectHostname` are cheap and have no downside for srcds or gunicorn.
|
||||
Add all of them.
|
||||
|
||||
### Filesystem protection (legacy / not Bind*)
|
||||
|
||||
- **`ProtectSystem=false|true|full|strict`** — increasingly stringent
|
||||
RO of system paths. `strict` makes `/`, `/usr`, `/boot`, `/etc`,
|
||||
`/opt` RO except for explicit writable paths.
|
||||
- **`ProtectHome=false|true|read-only|tmpfs`** — `tmpfs` masks `/home`,
|
||||
`/root`, `/run/user` with empty tmpfs.
|
||||
|
||||
For us: `ProtectSystem=strict` + `ProtectHome=tmpfs` is the baseline.
|
||||
But once we adopt `TemporaryFileSystem=` for the relevant trees, these
|
||||
become secondary — TemporaryFileSystem fully supersedes them in the
|
||||
covered subtrees. Keep both as defense-in-depth (cheap).
|
||||
|
||||
### Syscall filtering
|
||||
|
||||
- **`SystemCallFilter=expr`** — discussed in Linux section.
|
||||
- **`SystemCallArchitectures=native`**: set wherever the unit's binaries are native; a 32-bit srcds on an amd64 host would need `native x86`.
|
||||
- **`SystemCallLog=expr`** — opt-in logging without enforcement;
|
||||
useful for diagnosing what gets called before tightening.
|
||||
- **`SystemCallErrorNumber=EPERM`**: soft denial instead of termination.
  The default terminates the process with SIGSYS; switch later if a benign caller trips.
|
||||
|
||||
### Capabilities
|
||||
|
||||
- **`CapabilityBoundingSet=`** — empty drops all. Use it.
|
||||
- **`AmbientCapabilities=`** — empty (default).
|
||||
- **`NoNewPrivileges=true`** — prevents setuid escalation. **Required
|
||||
on srcds**, **incompatible with sudo on web** until sudo is replaced.
|
||||
|
||||
### Network restrictions
|
||||
|
||||
- **`RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX`** — for srcds.
|
||||
AF_UNIX needed for journald socket access.
|
||||
- **`IPAddressAllow=` / `IPAddressDeny=`**: uses cgroup BPF; filters both
  inbound and outbound packets on the unit's sockets. For srcds: probably
  overcomplicates; the firewall already controls ingress. Skip for first pass.
|
||||
- **`SocketBindAllow=` / `SocketBindDeny=`** — restricts which ports a
|
||||
unit can `bind()`. For srcds, allow only the configured game port
|
||||
range. Adds value but couples to config. Defer to a follow-up.
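
`RestrictAddressFamilies=` can be exercised from inside the unit by trying to create a socket of a family that should be denied. This is a minimal sketch; `family_allowed` is a hypothetical helper, and it relies on the documented behavior that a filtered `socket()` call fails with `EAFNOSUPPORT`.

```python
import errno
import socket


def family_allowed(family, kind=socket.SOCK_DGRAM):
    """Return False if socket() is refused with EAFNOSUPPORT, True if it succeeds."""
    try:
        sock = socket.socket(family, kind)
    except OSError as exc:
        if exc.errno == errno.EAFNOSUPPORT:
            return False
        # Any other failure (e.g. EPERM) is a different problem; surface it.
        raise
    sock.close()
    return True
```

Inside a unit restricted to `AF_INET AF_INET6 AF_UNIX`, a probe such as `family_allowed(socket.AF_NETLINK, socket.SOCK_RAW)` would be expected to come back `False`, while `AF_UNIX` stays `True`.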
|
||||
|
||||
### Resource restrictions
|
||||
|
||||
- **`MemoryMax`**, **`TasksMax`**, **`LimitNOFILE`** — already set.
|
||||
- **`OOMScoreAdjust`** — already set (favor killing the gameserver
|
||||
before system processes if memory tight).
|
||||
- **`MemoryDenyWriteExecute=true`**: blocks `mprotect(PROT_WRITE|PROT_EXEC)`
  and equivalent mappings. Defends against shellcode in JIT memory.
  **Source engine likely fine** (no JIT in the binary; the Squirrel
  script engine is an interpreter, not JIT). **Sourcemod plugins**:
  compiled to bytecode and run on the SourcePawn VM, which as far as we
  know ships a JIT for x86 (`sourcepawn.jit.x86`), so expect breakage
  wherever SourceMod is installed. Verify in test.
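
Whether W+X mappings are refused can be probed directly from a process running under the unit. The sketch below only exercises the `mmap` path (a JIT would more typically flip an existing mapping with `mprotect`), and `can_map_writable_executable` is a hypothetical helper name.

```python
import mmap


def can_map_writable_executable(size=mmap.PAGESIZE):
    """Try an anonymous mapping that is writable and executable at once.

    Returns False when the kernel or a seccomp filter refuses it.
    """
    prot = mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC
    flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS
    try:
        region = mmap.mmap(-1, size, flags=flags, prot=prot)
    except OSError:
        return False
    region.close()
    return True
```

Run via a throwaway `ExecStartPre=` or a transient unit carrying the same directive, a `False` result indicates the directive is being enforced for that process.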
|
||||
|
||||
### IPC and process hygiene
|
||||
|
||||
- **`RemoveIPC=true`** — clean up SysV IPC on unit stop.
|
||||
- **`KeyringMode=private`** — own kernel keyring; no host-key access.
|
||||
- **`LockPersonality=true`** — block `personality(2)` calls (no x86 vs
|
||||
x86-64 mode toggle). Already set.
|
||||
- **`RestrictRealtime=true`** — block real-time scheduling. srcds may
|
||||
use SCHED_OTHER + nice; no realtime needed.
|
||||
- **`RestrictNamespaces=true`** — block `unshare(2)` / `clone(CLONE_NEW*)`.
|
||||
- **`RestrictSUIDSGID=true`** — already set.
|
||||
- **`UMask=0027`** — narrow default umask.
|
||||
|
||||
### Capabilities of the `+` prefix
|
||||
|
||||
`ExecStartPre=+cmd` runs `cmd` as root in PID 1's namespaces, bypassing
|
||||
the unit's User= and almost all Protect*/Private*/Restrict* directives.
|
||||
This is how the existing overlay-mount helper runs. Critical to verify
|
||||
in test:
|
||||
- Does `+` preserve the bypass when `PrivateUsers=true` is set?
|
||||
(Expected: yes — the userns is set up around the unit's processes;
|
||||
`+` puts the helper outside it.)
|
||||
|
||||
### State management (per-unit)
|
||||
|
||||
- **`StateDirectory=path`** — creates `/var/lib/<path>` owned by User=.
|
||||
- **`RuntimeDirectory=path`** — creates `/run/<path>`, auto-deleted on
|
||||
stop.
|
||||
- **`LogsDirectory=path`** — `/var/log/<path>`.
|
||||
- **`CacheDirectory=path`** — `/var/cache/<path>`.
|
||||
- **`ConfigurationDirectory=path`** — `/etc/<path>`.
|
||||
|
||||
Useful for cleanup hygiene if we redesign storage layout. Not required
|
||||
for first pass.
|
||||
|
||||
### `systemd-analyze security`
|
||||
|
||||
`systemd-analyze security <unit>` produces a security score per unit
|
||||
(lower = more secure). Output lists each directive with a ✓/✗.
|
||||
Useful as:
|
||||
- Regression check (record baseline, ensure score drops after refactor).
|
||||
- Discovery tool ("which directives haven't I set?").
|
||||
|
||||
Baseline scores (to capture during test plan):
|
||||
- `left4me-server@1.service` before refactor
|
||||
- `left4me-web.service` before refactor
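
Recording the baseline can be automated by pulling the score out of the command's text output. The sketch assumes the summary line has the shape `Overall exposure level for <unit>: <score> <label>`, which is what the tool typically prints but is not a stable interface; `exposure` is a hypothetical helper.

```python
import re

# Assumed shape of the summary line; adjust if the local systemd prints
# something different.
_SCORE = re.compile(r"Overall exposure level for (\S+): (\d+(?:\.\d+)?)")


def exposure(output):
    """Return (unit, score) parsed from `systemd-analyze security <unit>` output."""
    match = _SCORE.search(output)
    if match is None:
        return None
    return match.group(1), float(match.group(2))


def improved(before, after):
    """True when the score dropped, i.e. the unit is less exposed than before."""
    return after < before
```

Storing the `(unit, score)` pairs next to the test-plan results gives the regression check something concrete to compare against.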
|
||||
|
||||
### Composability lookups
|
||||
|
||||
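systemd ships predefined syscall groups that are worth knowing:

- **`@privileged`** (syscall group) ⊃ `@chown`, `@clock`, `@module`, `@raw-io`, `@reboot`, `@swap`, plus individual calls such as `setuid` and `chroot`; list the exact contents with `systemd-analyze syscall-filter @privileged`.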
|
||||
- **`@system-service`** is the recommended base for "I want a normal
|
||||
service to work."
|
||||
- Subtracting `~@privileged` is broad; `~@debug @mount @raw-io` is
|
||||
surgical.
|
||||
|
||||
## Section 3 — Application-level options
|
||||
|
||||
### AppArmor profile for srcds
|
||||
|
||||
If systemd directives leave gaps, an AppArmor profile would let us
|
||||
deny specific paths or operations beyond what systemd's directives
|
||||
cover. E.g., "deny a whole socket family or type for srcds" via
`deny network ...` rules (mainline AppArmor network rules match on
family and type, not on IP ranges); or "deny mounting" beyond
`SystemCallFilter`.
|
||||
|
||||
Effort:
|
||||
- Confirm AppArmor is active (`aa-status`); if not, enable it via the kernel cmdline + boot config.
|
||||
- Write a profile (e.g., `/etc/apparmor.d/usr.bin.srcds_linux`).
|
||||
- Reference via systemd `AppArmorProfile=` per unit.
|
||||
|
||||
Skip for the first pass; revisit if test results show the systemd
|
||||
directives alone leave a gap.
|
||||
|
||||
### landlock for the web app
|
||||
|
||||
Python web app could call `landlock_create_ruleset` / `landlock_add_rule`
|
||||
/ `landlock_restrict_self` via ctypes. Restricts FS access at runtime.
|
||||
|
||||
For us:
|
||||
- Could restrict gunicorn to `/var/lib/left4me/` + `/etc/left4me/web.env`
|
||||
+ `/opt/left4me/.venv` + `/tmp`.
|
||||
- Symmetric to `TemporaryFileSystem=` + `Bind*` but at the
|
||||
application layer (no systemd reach).
|
||||
|
||||
Skip; systemd directives are simpler. Reconsider if we move to a
|
||||
DynamicUser-style world later.
|
||||
|
||||
### File-integrity tooling (Aide, Tripwire)
|
||||
|
||||
Out of scope for prevention; useful for detection. Not in this design.
|
||||
|
||||
### Custom seccomp profile (bypassing systemd)
|
||||
|
||||
The web app could call `seccomp(2)` from inside Python via libseccomp
|
||||
+ ctypes to tighten its own filter beyond what systemd applies.
|
||||
Symmetric to landlock; skip for the same reason.
|
||||
|
||||
## Section 4 — Per-defense mapping
|
||||
|
||||
For each defense from the threat model, the primitives that implement
|
||||
it, in priority order:
|
||||
|
||||
### D1 — Gameserver RCE cannot exfiltrate DB or `web.env`
|
||||
|
||||
| Primitive | Strength | Notes |
|
||||
|---|---|---|
|
||||
| `TemporaryFileSystem=/var/lib /etc` + minimal bind set | Strong | The files simply aren't in the unit's FS view. ENOENT, not EACCES. |
|
||||
| 3-user split (DB owned by `l4d2-web`) | Strong | Kernel-enforced; survives unit-config errors. |
|
||||
| `BindReadOnlyPaths=/dev/null:/var/lib/left4me/left4me.db` | Medium | Masks the path; brittle (paths can move). |
|
||||
| Filesystem ACLs (DB mode 0600) | Weak | Kernel still allows `left4me` group; only fixed by uid split. |
|
||||
|
||||
**Composition chosen:** `TemporaryFileSystem=` + Bind* (primary).
|
||||
3-user split as defense-in-depth or deferred.
|
||||
|
||||
### D2 — Gameserver RCE cannot ptrace web app or peers
|
||||
|
||||
| Primitive | Strength | Notes |
|
||||
|---|---|---|
|
||||
| `SystemCallFilter=~@debug` | Strong | Blocks `ptrace`; add `~process_vm_readv process_vm_writev` explicitly if `systemd-analyze syscall-filter @debug` does not list them. |
|
||||
| `kernel.yama.ptrace_scope=2` | Strong | Belt-and-braces at the kernel level. |
|
||||
| `CapabilityBoundingSet=` empty | Strong | No CAP_SYS_PTRACE. |
|
||||
| `PrivateUsers=true` | Strong | Cross-userns ptrace requires CAP_SYS_PTRACE. |
|
||||
| 3-user split | Strong | Different uids; same-uid path doesn't exist. |
|
||||
|
||||
**Composition chosen:** All four (syscall + yama + caps + userns)
|
||||
together; they compose redundantly.
|
||||
|
||||
### D3 — Gameserver RCE cannot use sudo helpers
|
||||
|
||||
| Primitive | Strength | Notes |
|
||||
|---|---|---|
|
||||
| `NoNewPrivileges=true` | Strong | Blocks sudo's setuid. Already set on server@. |
|
||||
| `PrivateUsers=true` | Strong | sudo across userns boundary impossible. |
|
||||
| Sudoers grants scoped to `l4d2-web` (uid split) | Strong | Different uid means sudo grant doesn't apply. |
|
||||
| `RestrictSUIDSGID=true` | Strong | Blocks creating new setuid/setgid files. Already set. |
|
||||
|
||||
**Composition chosen:** NoNewPrivileges (already) + PrivateUsers (new)
+ RestrictSUIDSGID (already). What the 3-user split would add here is
already covered by NNP + PrivateUsers; the uid split would be
defense-in-depth.
|
||||
|
||||
### D4 — Web app RCE cannot ptrace gameservers
|
||||
|
||||
| Primitive | Strength | Notes |
|
||||
|---|---|---|
|
||||
| `SystemCallFilter=~@debug` on **web** | Strong | Symmetric to D2 but applied to web. |
|
||||
| `kernel.yama.ptrace_scope=2` | Strong | System-wide, helps both directions. |
|
||||
| 3-user split | Strong | Different uids. |
|
||||
|
||||
**Composition chosen:** SystemCallFilter on web + yama=2 system-wide.
|
||||
PrivateUsers cannot be applied to web (sudo incompatibility). 3-user
|
||||
split as defense-in-depth or deferred.
|
||||
|
||||
### D5 — Cross-server contamination
|
||||
|
||||
Each `left4me-server@<n>.service` is a separate unit instance. With
|
||||
`PrivateUsers=true`, each gets its own user namespace. Cross-namespace
|
||||
ptrace fails. With `TemporaryFileSystem=` and per-instance
|
||||
`BindPaths=/var/lib/left4me/runtime/%i`, neither instance can read the
|
||||
other's `runtime/<n>/` or attach to its process.
|
||||
|
||||
**Composition chosen:** PrivateUsers + per-instance Bind* (above).
|
||||
Per-instance uids out of scope.
|
||||
|
||||
### D6 — Persistent compromise of `/opt/left4me/src/` blocked from gameserver
|
||||
|
||||
Already covered by `ProtectSystem=strict` on server@.service. With
|
||||
`TemporaryFileSystem=/opt`, the path simply isn't visible to srcds.
|
||||
**Stronger and redundant — both can stay.**
|
||||
|
||||
### D7 — Defenses survive a unit-config refactor in the wrong direction
|
||||
|
||||
`deploy/tests/test_deploy_artifacts.py` asserts the directives' presence
|
||||
in the deployed unit. Add hardening invariants as test cases. Survives
|
||||
because the test fails CI before deploy.
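
A hardening invariant test could be as small as the sketch below. The `REQUIRED` set and the helper names are hypothetical; the real invariant list belongs to the test plan. The trailing-comment check is a heuristic that exists because systemd only treats `#` as a comment at the start of a line, so `Foo=true # note` does not parse as `Foo=true`.

```python
def parse_directives(unit_text):
    """Map each directive name to the list of values assigned to it."""
    directives = {}
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "[")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        directives.setdefault(key.strip(), []).append(value.strip())
    return directives


# Hypothetical invariant set for illustration only.
REQUIRED = {
    "NoNewPrivileges": "true",
    "PrivateUsers": "true",
    "ProtectProc": "invisible",
}


def violations(unit_text, required=REQUIRED):
    """Return human-readable problems found in the unit text."""
    found = parse_directives(unit_text)
    problems = []
    for key, expected in required.items():
        # For scalar settings the last assignment is the one that counts.
        if found.get(key, [None])[-1] != expected:
            problems.append(f"{key}={expected} missing")
    for key, values in found.items():
        if any(" #" in value for value in values):
            problems.append(f"{key} has a trailing comment")
    return problems
```

In `test_deploy_artifacts.py` this would reduce to `assert violations(rendered_unit) == []`, with one `REQUIRED` table per unit.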
|
||||
|
||||
## Section 5 — Candidate composition
|
||||
|
||||
**For testing, not commitment.** Test plan validates each piece.
|
||||
|
||||
### `left4me-server@.service`
|
||||
|
||||
```ini
[Service]
User=left4me
Group=left4me

# systemd only treats "#" as a comment at the start of a line, so the
# NEW / "was ..." annotations sit on their own lines.

# (existing)
Type=simple
WorkingDirectory=-/var/lib/left4me/runtime/%i/merged/left4dead2
EnvironmentFile=/etc/left4me/host.env
EnvironmentFile=/var/lib/left4me/instances/%i/instance.env
ExecStartPre=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay mount %i
ExecStart=/var/lib/left4me/runtime/%i/merged/srcds_run -game left4dead2 +hostport ${L4D2_PORT} $L4D2_ARGS
ExecStopPost=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay umount %i
Restart=on-failure
RestartSec=5

# Resource control (existing)
Slice=l4d2-game.slice
Nice=-5
IOSchedulingClass=best-effort
IOSchedulingPriority=4
OOMScoreAdjust=-200
MemoryHigh=1.5G
MemoryMax=2G
TasksMax=256
LimitNOFILE=65536
KillSignal=SIGINT
TimeoutStopSec=15s
LogRateLimitIntervalSec=0

# Hardening: identity
NoNewPrivileges=true
RestrictSUIDSGID=true

# Hardening: namespaces
PrivateTmp=true
PrivateDevices=true
PrivateIPC=true
ProtectHome=true
# NEW
PrivateUsers=true

# Hardening: filesystem view
# NEW
TemporaryFileSystem=/var/lib /etc /opt /home /root /srv /mnt /media
# was ReadOnlyPaths
BindReadOnlyPaths=/var/lib/left4me/installation
BindReadOnlyPaths=/var/lib/left4me/overlays
# NEW
BindReadOnlyPaths=/etc/left4me/host.env
BindReadOnlyPaths=/etc/ssl /etc/ca-certificates
BindReadOnlyPaths=/etc/resolv.conf /etc/nsswitch.conf /etc/alternatives
# was ReadWritePaths
BindPaths=/var/lib/left4me/runtime/%i
ProtectSystem=strict
# (remove old ReadOnlyPaths= and ReadWritePaths= lines; superseded)

# Hardening: /proc, /sys, kernel
LockPersonality=true
# NEW
ProtectProc=invisible
ProcSubset=pid
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectKernelLogs=true
ProtectClock=true
ProtectControlGroups=true
ProtectHostname=true

# Hardening: caps + syscall (all NEW)
CapabilityBoundingSet=
AmbientCapabilities=
# If srcds turns out to be a 32-bit binary on an amd64 host, this needs
# to become "native x86"; verify in test.
SystemCallArchitectures=native
SystemCallFilter=@system-service
SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete @privileged
# Explicit, in case these calls are not part of @debug on this systemd
# version (harmless if they are).
SystemCallFilter=~process_vm_readv process_vm_writev process_madvise

# Hardening: network
# NEW (AF_UNIX for journald)
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX

# Hardening: namespaces, realtime, IPC (all NEW)
RestrictNamespaces=true
RestrictRealtime=true
RemoveIPC=true
KeyringMode=private
UMask=0027

# Deferred until test (MAY break sourcemod / Source engine; test first):
# MemoryDenyWriteExecute=true
```
|
||||
|
||||
### `left4me-web.service`
|
||||
|
||||
```ini
[Service]
User=left4me
Group=left4me

# systemd only treats "#" as a comment at the start of a line, so the
# NEW annotations sit on their own lines.

# (existing)
Type=simple
WorkingDirectory=/opt/left4me/src
Environment=HOME=/var/lib/left4me PATH=/opt/left4me/.venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
EnvironmentFile=/etc/left4me/host.env
EnvironmentFile=/etc/left4me/web.env
ExecStart=/opt/left4me/.venv/bin/gunicorn --workers ... --threads ... --bind 127.0.0.1:8000 'l4d2web.app:create_app()'
Restart=on-failure
RestartSec=3

# Hardening
PrivateTmp=true
# tightened from =full
ProtectSystem=strict
ProtectHome=true
# web needs broad write access there
ReadWritePaths=/var/lib/left4me
# NoNewPrivileges intentionally NOT set: sudo
# PrivateUsers intentionally NOT set: sudo

# /proc + kernel hardening (all NEW). Intended to be sudo-compatible;
# verify in test that sudo still works, since systemd.exec documents
# cases where such directives imply NoNewPrivileges=yes.
ProtectProc=invisible
ProcSubset=pid
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectKernelLogs=true
ProtectClock=true
ProtectControlGroups=true
ProtectHostname=true
LockPersonality=true

# Syscall filter (all NEW): allow @system-service minus debug-class;
# keep @privileged because sudo needs setuid, chown, etc.
SystemCallArchitectures=native
SystemCallFilter=@system-service
SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete
# Explicit, in case these calls are not part of @debug on this systemd
# version (harmless if they are).
SystemCallFilter=~process_vm_readv process_vm_writev process_madvise

# Network (NEW)
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX

# Misc hygiene (all NEW)
RestrictRealtime=true
RestrictNamespaces=true
RemoveIPC=true
UMask=0027

# Deferred for sudo-removal future work:
# NoNewPrivileges=true
# CapabilityBoundingSet=
# PrivateUsers=true
```
|
||||
|
||||
### Host sysctl
|
||||
|
||||
`/etc/sysctl.d/99-left4me.conf` (or merge into existing):
|
||||
```
|
||||
kernel.yama.ptrace_scope=2
|
||||
```
|
||||
|
||||
System-wide. Means: even if a unit-level config slips, host-level
|
||||
ptrace is admin-only. Cost: zero for our use case (no debugging in
|
||||
prod).
|
||||
|
||||
## Section 6 — Trade-offs and known sharp edges
|
||||
|
||||
To verify in the test plan:
|
||||
|
||||
1. **`PrivateUsers=true` + `+`-prefixed ExecStartPre**: expected to
|
||||
work (the `+` runs outside the unit's namespaces). Sharp if it
|
||||
doesn't — the overlay mount would fail and srcds wouldn't start.
|
||||
2. **`TemporaryFileSystem=/etc` and missing files**: srcds and its
|
||||
dependencies (libstdc++ runtime, libssl, libcurl) may read files
|
||||
from `/etc` we haven't bound. Watch journalctl for ENOENT during
|
||||
first start.
|
||||
3. **`SystemCallFilter=~@privileged` and Source engine**: srcds is C++
|
||||
and uses syscalls beyond the obvious. A `~@privileged` may trip
|
||||
something. Mitigation: test with `SystemCallLog=` instead of
|
||||
`SystemCallFilter=` first; observe what would have been blocked;
|
||||
then narrow.
|
||||
4. **`MemoryDenyWriteExecute=true` and sourcemod**: SourcePawn plugins
   are compiled to bytecode, and the SourcePawn VM as far as we know
   ships a JIT for x86, so W+X mappings are likely. Test before enabling.
|
||||
5. **`RestrictAddressFamilies=` without AF_UNIX**: journald socket
|
||||
needs it. Always include AF_UNIX.
|
||||
6. **`ProcSubset=pid` and Python**: gunicorn shouldn't break (uses
|
||||
/proc/self/* + signal-based ipc). Verify.
|
||||
7. **sysctl `kernel.yama.ptrace_scope=2`**: blocks the operator's own
   unprivileged `gdb` / `strace -p` against any running service; root
   (with `CAP_SYS_PTRACE`) still works. If you need to debug as a
   non-root user, temporarily set back to 1 via sysctl, then revert.
|
||||
8. **`ProtectSystem=strict` on web**: was `=full`. Tighter; might
|
||||
break a write the web app does to a path outside `/var/lib/left4me`.
|
||||
Audit `l4d2web/*` for `os.makedirs` or `open(...'w')` outside that
|
||||
root.
|
||||
|
||||
## Open questions for the implementer
|
||||
|
||||
(After test plan results come back, finalize these.)
|
||||
|
||||
1. Do we adopt `MemoryDenyWriteExecute=true` if it works for srcds?
|
||||
(Probably yes, defense-in-depth at low cost.)
|
||||
2. Do we set `SocketBindAllow=` on srcds to lock the port range?
|
||||
(Depends on whether `instance.env` exposes the range cleanly to a
|
||||
unit directive.)
|
||||
3. Do we deploy AppArmor profiles as a follow-up?
|
||||
(Probably no — operational complexity exceeds the marginal gain on
|
||||
single-host infra.)
|
||||
4. Do we keep both `BindReadOnlyPaths=` and the legacy
|
||||
`ReadOnlyPaths=` declarations, or simplify? (Simplify — use Bind*
|
||||
exclusively once `TemporaryFileSystem=` is in place.)
|
||||
5. Do we proceed with 3-user split as a follow-up, or close the spec
|
||||
as "addressed by hardening"? Depends on operator's residual-risk
|
||||
tolerance after Phase A lands and we observe.
|
||||
|
||||
## Pointers
|
||||
|
||||
- Threat model: `docs/superpowers/specs/2026-05-15-hardening-threat-model.md`
|
||||
- Test plan: `docs/superpowers/specs/2026-05-15-hardening-test-plan.md`
|
||||
- Original uid-split spec (still open): `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
|
||||
- Live unit source (ckn-bw reactor): `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`
|
||||
- Reference units (deploy-dir-rethink reference-only): `deploy/files/usr/local/lib/systemd/system/`
|
||||
- systemd docs (latest, systemd 256+ on Trixie):
|
||||
`man systemd.exec`, `man systemd.unit`, `man systemd-analyze`.
|
||||
- L4D2 / Source engine docs:
|
||||
- SourcePawn (compiled to bytecode; SourceMod includes an x86 JIT): https://wiki.alliedmods.net/SourcePawn
|
||||
- srcds is a Source 2007 engine binary; closed-source, expect surprises.
|
||||
898
docs/superpowers/specs/2026-05-15-hardening-test-plan.md
Normal file
|
|
@ -0,0 +1,898 @@
|
|||
# left4me application hardening — test plan
|
||||
|
||||
**Status:** living spec. Companion to `2026-05-15-hardening-threat-model.md`
|
||||
and `2026-05-15-hardening-defenses-survey.md`. **Executed in a follow-up
|
||||
session with shell access to `left4.me` (141.95.32.8).**
|
||||
|
||||
This document is intentionally self-contained: a session that lands cold
|
||||
with shell on `left4.me` can execute it end-to-end without re-reading
|
||||
the threat model or survey. Decisions made in this plan are based on the
|
||||
candidate composition in the defenses survey (Section 5).
|
||||
|
||||
## Test architecture
|
||||
|
||||
### Where we test
|
||||
|
||||
- **Host:** `left4.me` / `ovh.left4me` (141.95.32.8). Production host;
|
||||
no separate test bench. (Reference: memory entry
|
||||
`feedback_test_server_hangs.md` mentions a separate test server at
|
||||
`ckn@10.0.4.128`; verify whether that host is suitable for this work
|
||||
*before* using prod.)
|
||||
- **Canary unit:** `left4me-server@1.service`. Use this as the test
|
||||
instance. Leave `left4me-server@2.service` running baseline so at
|
||||
least one server stays up if the canary breaks.
|
||||
- **Web unit:** `left4me-web.service` is shared. Test web-side
|
||||
hardening only after server@ tests prove the composition; web is
|
||||
more disruptive to roll back.
|
||||
|
||||
### Operating constraints
|
||||
|
||||
- **System units only.** No `systemctl --user`, no lingering, no
|
||||
per-user systemd instance. All units under `/etc/systemd/system/` or
|
||||
`/usr/local/lib/systemd/system/`. Drop-ins go to
|
||||
`/etc/systemd/system/<unit>.d/`.
|
||||
- **Drop-in style.** Tests apply via `/etc/systemd/system/left4me-server@1.service.d/test-NN-<name>.conf`
|
||||
(note: `@1` for instance-specific). This leaves the template
|
||||
unmodified — other instances unaffected. `systemctl daemon-reload`
|
||||
picks up drop-ins; `systemctl restart left4me-server@1` applies.
|
||||
- **Cleanup required.** Each test removes its drop-in before the next
|
||||
starts. Baseline must be restorable at any point.
|
||||
- **Recording.** Each test produces a one-paragraph result in this
|
||||
document's "Results" section at the bottom. Append, don't replace.
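
The drop-in naming convention above can be sketched as a few shell helpers. This is a minimal sketch, not part of the deployed tooling: the function names and the `DROPIN_ROOT` variable are hypothetical, introduced only so the path logic can be tried against a scratch directory before touching `/etc/systemd/system`.

```bash
# Hypothetical helpers; DROPIN_ROOT and all function names are assumptions.
DROPIN_ROOT="${DROPIN_ROOT:-/etc/systemd/system}"

# dropin_path UNIT NN NAME -> prints the drop-in path used by this plan
dropin_path() {
  printf '%s/%s.d/test-%s-%s.conf\n' "$DROPIN_ROOT" "$1" "$2" "$3"
}

# write_dropin UNIT NN NAME  (drop-in body on stdin); prints the path written
write_dropin() {
  local path
  path="$(dropin_path "$1" "$2" "$3")"
  mkdir -p "$(dirname "$path")"
  cat > "$path"
  printf '%s\n' "$path"
}

# remove_dropin UNIT NN NAME -> deletes the file; succeeds if already gone
remove_dropin() {
  rm -f "$(dropin_path "$1" "$2" "$3")"
}
```

On the real host these would run as root and would still need the `systemctl daemon-reload` and `systemctl restart` steps described above; the helpers deliberately do not call `systemctl` themselves.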
|
||||
|
||||
### Failure modes to watch for
|
||||
|
||||
- **SECCOMP audit:** `journalctl -k --since '1 minute ago' | grep -i seccomp`
|
||||
shows `type=1326` lines. Each is a syscall denied; the syscall number
|
||||
identifies the call. Use `scmp_sys_resolver` to translate.
|
||||
- **Unit start failure:** `systemctl is-active left4me-server@1` → `inactive` or `failed`.
|
||||
- **srcds crash mid-game:** `journalctl -u left4me-server@1 -f` shows
|
||||
unexpected exit; `systemctl show left4me-server@1 -p Result` is
|
||||
not `success`.
|
||||
- **sourcemod/metamod plugin failures:** in-game `sm plugins list` or
|
||||
RCON `sm plugins list` shows plugins as failed-to-load.
|
||||
- **Permission denied where unexpected:** `journalctl -u left4me-server@1`
|
||||
shows `Permission denied` or `Operation not permitted`.
|
||||
|
||||
## Before any test: baseline capture
|
||||
|
||||
Capture these so we can compare after each test, and so we have a
|
||||
known-good snapshot to revert to.
|
||||
|
||||
```bash
|
||||
# 1. Baseline systemd-analyze score
|
||||
sudo systemd-analyze security left4me-server@1.service \
|
||||
| tee /tmp/sec-baseline-server.txt
|
||||
sudo systemd-analyze security left4me-web.service \
|
||||
| tee /tmp/sec-baseline-web.txt
|
||||
|
||||
# 2. Full current unit (cat'd, post-merge with any existing drop-ins)
|
||||
sudo systemctl cat left4me-server@1.service \
|
||||
| tee /tmp/unit-baseline-server.conf
|
||||
sudo systemctl cat left4me-web.service \
|
||||
| tee /tmp/unit-baseline-web.conf
|
||||
|
||||
# 3. Current sysctl
|
||||
sysctl kernel.yama.ptrace_scope | tee /tmp/sysctl-baseline.txt
|
||||
# Expect: kernel.yama.ptrace_scope = 0 or 1 (distro-dependent; record the actual value)
|
||||
|
||||
# 4. Functional baseline — confirm both servers + web healthy now
|
||||
sudo systemctl is-active left4me-server@1 left4me-server@2 left4me-web
|
||||
# Expect: active active active
|
||||
|
||||
# 5. Confirm srcds_linux running, gunicorn running
|
||||
sudo systemctl status left4me-server@1 left4me-server@2 left4me-web \
|
||||
--no-pager | head -40
|
||||
|
||||
# 6. RCON sanity (optional — needs an RCON password)
|
||||
# (Use the web UI to fire `status` against server@1; expect a reply.)
|
||||
|
||||
# 7. Capture baseline syscalls (to compare what's blocked after filter)
|
||||
# This is heavy; only run if you suspect a filter is too tight:
|
||||
# sudo systemctl edit --runtime left4me-server@1
|
||||
# Add: SystemCallLog=@privileged
|
||||
# Reload, restart, observe journalctl -u for ~5 minutes, then revert.
|
||||
```
|
||||
|
||||
Record `/tmp/sec-baseline-server.txt` score (a value like "5.4 EXPOSED"
|
||||
is typical). Goal: lower (more secure) after refactor.
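
To compare scores without reading the full tables, the summary line can be extracted mechanically. The sketch below assumes the report ends with a line of the form `Overall exposure level for <unit>: <score> <label>`; the function names are hypothetical.

```bash
# exposure_score FILE -> prints the numeric overall exposure level
# (assumes the "Overall exposure level for <unit>: <score> <label>" summary line)
exposure_score() {
  sed -n 's/.*Overall exposure level for [^:]*: \([0-9][0-9.]*\).*/\1/p' "$1" | tail -n 1
}

# score_improved BEFORE AFTER -> exit 0 when AFTER is strictly lower than BEFORE
score_improved() {
  local before after
  before="$(exposure_score "$1")"
  after="$(exposure_score "$2")"
  [ -n "$before" ] && [ -n "$after" ] || return 2
  awk -v b="$before" -v a="$after" 'BEGIN { exit !(a < b) }'
}
```

For example, `score_improved /tmp/sec-baseline-server.txt /tmp/sec-after-server.txt` would succeed once the composition from the later tests is in place and has lowered the score.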
|
||||
|
||||
## Test 1 — `PrivateUsers=true` compatibility
|
||||
|
||||
**Goal:** Confirm `PrivateUsers=true` works on `left4me-server@.service`
|
||||
with the `+`-prefixed `ExecStartPre` overlay-mount helper.
|
||||
|
||||
**Pre-condition:** server@1 active, baseline captured.
|
||||
|
||||
**Drop-in:**
|
||||
```bash
|
||||
sudo install -d -m0755 /etc/systemd/system/left4me-server@1.service.d/
|
||||
sudo tee /etc/systemd/system/left4me-server@1.service.d/test-01-privateusers.conf <<'EOF'
|
||||
[Service]
|
||||
PrivateUsers=true
|
||||
EOF
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl restart left4me-server@1
|
||||
```
|
||||
|
||||
**Verify:**
|
||||
```bash
|
||||
# 1. Unit started cleanly
|
||||
sudo systemctl is-active left4me-server@1
|
||||
# Expect: active
|
||||
|
||||
# 2. ExecStartPre's nsenter+overlay-mount succeeded (the mount exists)
|
||||
sudo findmnt /var/lib/left4me/runtime/1/merged
|
||||
# Expect: a row showing overlay mounted
|
||||
|
||||
# 3. Process is running
|
||||
pgrep -af srcds_linux
|
||||
# Expect: at least one PID matching left4dead2
|
||||
|
||||
# 4. As seen from the host: process runs as the configured uid
|
||||
PID=$(pgrep -f 'srcds_linux.*left4dead2' | head -1)
|
||||
sudo cat /proc/$PID/status | grep -E '^Uid|^Gid'
|
||||
# Expect: uid 980 (left4me) — outside the namespace, the kernel reports
|
||||
# the unit's User=. Inside the namespace it's also 980 (identity map).
|
||||
|
||||
# 5. Userns confirmed
|
||||
sudo readlink /proc/$PID/ns/user
|
||||
sudo readlink /proc/1/ns/user
|
||||
# Expect: different — different user namespaces
|
||||
```
|
||||
|
||||
**Pass criteria:** all five checks pass.
|
||||
|
||||
**Failure handling:** if unit fails to start, check
|
||||
`journalctl -u left4me-server@1 -n 100` for the failure reason. Most
|
||||
likely cause if it fails: the overlay-mount helper itself depends on
|
||||
the unit's mount namespace in a way that PrivateUsers breaks. (The `+`
|
||||
prefix should bypass — verifying that assumption is the test's whole
|
||||
point.)
|
||||
|
||||
**Cleanup:**
|
||||
```bash
|
||||
sudo rm /etc/systemd/system/left4me-server@1.service.d/test-01-privateusers.conf
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl restart left4me-server@1
|
||||
sudo systemctl is-active left4me-server@1 # active again
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Test 2 — `TemporaryFileSystem` + minimal bind set
|
||||
|
||||
**Goal:** Confirm srcds runs with `/var/lib`, `/etc`, `/opt`, `/home`,
`/root`, `/srv`, `/mnt`, and `/media` replaced by empty tmpfs mounts,
with only the listed paths bound back.
|
||||
|
||||
**Drop-in:**
|
||||
```bash
|
||||
sudo tee /etc/systemd/system/left4me-server@1.service.d/test-02-tmpfs.conf <<'EOF'
|
||||
[Service]
|
||||
# Remove the legacy paths so they don't collide with the new bind setup
|
||||
ReadOnlyPaths=
|
||||
ReadWritePaths=
|
||||
|
||||
# Virtual filesystem
|
||||
TemporaryFileSystem=/var/lib /etc /opt /home /root /srv /mnt /media
|
||||
BindReadOnlyPaths=/var/lib/left4me/installation
|
||||
BindReadOnlyPaths=/var/lib/left4me/overlays
|
||||
BindReadOnlyPaths=/etc/left4me/host.env
|
||||
BindReadOnlyPaths=/etc/ssl /etc/ca-certificates
|
||||
BindReadOnlyPaths=/etc/resolv.conf /etc/nsswitch.conf /etc/alternatives
|
||||
BindPaths=/var/lib/left4me/runtime/%i
|
||||
EOF
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl restart left4me-server@1
|
||||
```
|
||||
|
||||
**Verify:**
|
||||
```bash
|
||||
# 1. Unit started
|
||||
sudo systemctl is-active left4me-server@1
|
||||
|
||||
# 2. From inside the unit's namespace: invisible files
|
||||
PID=$(pgrep -f 'srcds_linux.*left4dead2' | head -1)
|
||||
sudo nsenter --target $PID --mount -- ls -la /var/lib/left4me/left4me.db 2>&1
|
||||
# Expect: No such file or directory
|
||||
|
||||
sudo nsenter --target $PID --mount -- ls -la /etc/left4me/web.env 2>&1
|
||||
# Expect: No such file or directory
|
||||
|
||||
sudo nsenter --target $PID --mount -- ls /opt 2>&1
|
||||
# Expect: empty or "No such file or directory"
|
||||
|
||||
sudo nsenter --target $PID --mount -- ls /var/lib/left4me/
|
||||
# Expect: only installation, overlays, runtime (the bound paths)
|
||||
|
||||
# 3. Bound paths visible and right mode
|
||||
sudo nsenter --target $PID --mount -- ls -la /var/lib/left4me/runtime/1/
|
||||
# Expect: upper, work, merged dirs visible, RW
|
||||
|
||||
sudo nsenter --target $PID --mount -- ls /etc/left4me/
|
||||
# Expect: only host.env
|
||||
|
||||
# 4. DNS works (workshop downloads, master server)
|
||||
sudo nsenter --target $PID --mount --net -- getent hosts steamcommunity.com
|
||||
# Expect: an IP
|
||||
|
||||
# 5. Game running normally
|
||||
sudo systemctl status left4me-server@1 --no-pager | head -15
|
||||
# Expect: active (running)
|
||||
|
||||
# 6. No SECCOMP/EACCES errors
|
||||
sudo journalctl -u left4me-server@1 --since '2 minutes ago' \
|
||||
| grep -iE 'permission|denied|seccomp|EACCES|ENOENT' | head -20
|
||||
# Expect: nothing alarming. Some ENOENT may be normal (srcds probes
|
||||
# files); the question is whether anything is failing fatally.
|
||||
```
|
||||
|
||||
**Pass criteria:** unit active; DB, `web.env`, and `/opt` contents
invisible; runtime visible and writable; DNS works; no fatal errors in
the journal.
|
||||
|
||||
**Failure handling:** if a bind path is missing on disk, the unit
|
||||
fails to start with a clear error. Add the missing path or remove the
|
||||
bind reference.
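
A preflight check can catch a missing bind source before the restart. This is a sketch under stated assumptions: the function names are hypothetical, the optional third argument (a root prefix) exists only so the check can be exercised against a scratch tree, and only the `%i` specifier is expanded.

```bash
# list_bind_sources DROPIN INSTANCE -> one bind source path per line
# Entries prefixed with "-" are optional for systemd, so they are skipped.
list_bind_sources() {
  grep -E '^(BindReadOnlyPaths|BindPaths)=' "$1" \
    | cut -d= -f2- \
    | tr ' ' '\n' \
    | sed -e '/^$/d' -e '/^-/d' -e 's/:.*$//' -e "s/%i/$2/g"
}

# missing_bind_sources DROPIN INSTANCE [ROOT] -> prints missing paths, exit 1 if any
missing_bind_sources() {
  local root="${3:-}" missing
  missing="$(list_bind_sources "$1" "$2" | while IFS= read -r p; do
    [ -e "$root$p" ] || printf '%s\n' "$p"
  done)"
  [ -z "$missing" ] && return 0
  printf '%s\n' "$missing"
  return 1
}
```

Running `missing_bind_sources /etc/systemd/system/left4me-server@1.service.d/test-02-tmpfs.conf 1` before `systemctl restart` would list any path that has to be created or dropped from the bind set.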
|
||||
|
||||
**Cleanup:**
|
||||
```bash
|
||||
sudo rm /etc/systemd/system/left4me-server@1.service.d/test-02-tmpfs.conf
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl restart left4me-server@1
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Test 3 — `SystemCallFilter` (logging mode)
|
||||
|
||||
**Goal:** Discover what srcds calls under load before committing to a
|
||||
filter. Run with `SystemCallLog=` (audit only, doesn't block) for 5-10
|
||||
minutes of live play.
|
||||
|
||||
**Drop-in:**
|
||||
```bash
|
||||
sudo tee /etc/systemd/system/left4me-server@1.service.d/test-03-syslog.conf <<'EOF'
|
||||
[Service]
|
||||
SystemCallArchitectures=native
|
||||
# Log every syscall in @privileged + @debug + @mount + @raw-io
|
||||
SystemCallLog=@privileged @debug @mount @raw-io
|
||||
SystemCallFilter=
|
||||
EOF
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl restart left4me-server@1
|
||||
```
|
||||
|
||||
**Verify (and produce data):**
|
||||
```bash
|
||||
# 1. Unit active
|
||||
sudo systemctl is-active left4me-server@1
|
||||
|
||||
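# 2. Capture logs for 5 minutes during normal play
# (manually connect a Steam client to the server, walk around, then disconnect)
# Seccomp log records are kernel audit messages (type=1326) and are not
# tagged with the unit name, so query the kernel log and filter on srcds.
# (If auditd is running, look in /var/log/audit/audit.log instead.)
sudo journalctl -k --since '5 minutes ago' \
  | grep 'type=1326' | grep -i srcds \
  | tee /tmp/syscall-log-test3.txt

# 3. Analyze: count distinct syscall numbers (timestamps make whole
# lines unique, so compare on the syscall= field only)
grep -oE 'syscall=[0-9]+' /tmp/syscall-log-test3.txt \
  | sort | uniq -c | sort -rn > /tmp/syscall-log-test3-uniq.txt
wc -l /tmp/syscall-log-test3-uniq.txt
# Read through; identify whether @debug or @mount or @privileged
# contains any syscall srcds calls during normal operation.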
|
||||
```
|
||||
|
||||
**Pass criteria:** capture is complete. Decision feeds Test 4.
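
The capture can be condensed into a per-ABI, per-syscall table. The sketch below assumes the kernel's audit record layout, where `arch=<hex>` is immediately followed by `syscall=<num>`; the function name is hypothetical.

```bash
# summarize_seccomp FILE -> "count arch=<hex> syscall=<num>", most frequent first
summarize_seccomp() {
  grep 'type=1326' "$1" \
    | sed -n 's/.*\(arch=[0-9a-f]*\) \(syscall=[0-9]*\).*/\1 \2/p' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $1, $2, $3 }'
}
```

`summarize_seccomp /tmp/syscall-log-test3.txt` gives the short list to resolve with `scmp_sys_resolver` before deciding which groups Test 4 can deny.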
|
||||
|
||||
**Cleanup:**
|
||||
```bash
|
||||
sudo rm /etc/systemd/system/left4me-server@1.service.d/test-03-syslog.conf
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl restart left4me-server@1
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Test 4 — `SystemCallFilter` (enforcement mode)
|
||||
|
||||
**Goal:** Apply the candidate `SystemCallFilter=` and confirm srcds
|
||||
runs without any SECCOMP-killed calls. Tightness driven by Test 3
|
||||
results.
|
||||
|
||||
**Drop-in:**
|
||||
```bash
|
||||
sudo tee /etc/systemd/system/left4me-server@1.service.d/test-04-syscall.conf <<'EOF'
|
||||
[Service]
|
||||
SystemCallArchitectures=native
|
||||
SystemCallFilter=@system-service
|
||||
SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete @privileged
|
||||
EOF
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl restart left4me-server@1
|
||||
```
|
||||
|
||||
**Verify:**
|
||||
```bash
|
||||
# 1. Unit active
|
||||
sudo systemctl is-active left4me-server@1
|
||||
|
||||
# 2. Watch for SECCOMP kills for ~10 minutes during play
# (kernel audit records are not tagged with the unit, so do not pass -u)
sudo journalctl -kf | grep --line-buffered 'type=1326'
# Press Ctrl-C after 10 min if no SECCOMP audit lines (type=1326) appear
|
||||
|
||||
# 3. Functional: server accepts connections, plugins load
|
||||
# (use Steam client; verify in-game)
|
||||
# Optional RCON check:
|
||||
# sudo rcon -p $PW -a left4.me:27015 "sm plugins list"
|
||||
# Expect: list of plugins, all loaded.
|
||||
|
||||
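# 4. Verify ptrace is blocked
# Caveat: nsenter under sudo runs gdb as root in the unit's mount
# namespace only; it does not inherit the unit's seccomp filter, uid,
# or user namespace, so a successful attach here does not by itself
# mean the filter failed. `grep Seccomp /proc/<srcds pid>/status`
# (expect 2) confirms that a filter is installed on srcds.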
|
||||
GUNICORN_PID=$(pgrep -f 'gunicorn.*l4d2web' | head -1)
|
||||
PID=$(pgrep -f 'srcds_linux.*left4dead2' | head -1)
|
||||
sudo nsenter --target $PID --mount -- /usr/bin/gdb --batch -p $GUNICORN_PID 2>&1 | tail -5
|
||||
# Expect: ptrace: Operation not permitted (or seccomp denial)
|
||||
```
|
||||
|
||||
**Pass criteria:** unit active for ≥10 min, no SECCOMP kills, plugins
|
||||
load, ptrace blocked.
|
||||
|
||||
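**Failure handling:** if SECCOMP kills appear:
- Identify the syscall from the audit line (`syscall=<num> compat=0`),
resolve via `scmp_sys_resolver -a $(uname -m) <num>` (Debian package `seccomp`).
- If the audit line shows `compat=1` (`arch=40000003`), the process is
using the 32-bit x86 ABI: resolve with `-a x86`, and note that
`SystemCallArchitectures=native` on an x86-64 host rejects that ABI.
Try `SystemCallArchitectures=native x86` before touching the filter.
- Relax the filter: remove the offending group from the deny list, OR
set `SystemCallErrorNumber=EPERM` so that filtered calls return an
error instead of killing the process (this applies to the whole
filter, not to a single group).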
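
When resolving syscall numbers, the `arch=` field of the audit line determines which ABI table applies. A small lookup, with a hypothetical function name, covering only the architectures likely to matter here:

```bash
# audit_arch_name HEX -> ABI name accepted by `scmp_sys_resolver -a`
audit_arch_name() {
  case "${1#0x}" in
    c000003e) echo x86_64 ;;   # AUDIT_ARCH_X86_64
    40000003) echo x86 ;;      # AUDIT_ARCH_I386 (32-bit process)
    c00000b7) echo aarch64 ;;  # AUDIT_ARCH_AARCH64
    *) echo unknown; return 1 ;;
  esac
}
```

For example, `scmp_sys_resolver -a "$(audit_arch_name 40000003)" 26` would resolve syscall 26 against the 32-bit x86 table rather than the host's native one.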
|
||||
|
||||
**Cleanup:**
|
||||
```bash
|
||||
sudo rm /etc/systemd/system/left4me-server@1.service.d/test-04-syscall.conf
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl restart left4me-server@1
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Test 5 — `ProcSubset=pid` + `ProtectProc=invisible`
|
||||
|
||||
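**Goal:** Confirm the unit's `/proc` exposes only per-process entries
(`ProcSubset=pid`) and hides processes the unit may not inspect
(`ProtectProc=invisible`). Per `systemd.exec(5)`, `invisible` hides
processes owned by *other* users, so whether same-uid gunicorn
disappears is exactly what this test has to establish.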
|
||||
|
||||
**Drop-in:**
|
||||
```bash
|
||||
sudo tee /etc/systemd/system/left4me-server@1.service.d/test-05-proc.conf <<'EOF'
|
||||
[Service]
|
||||
ProtectProc=invisible
|
||||
ProcSubset=pid
|
||||
EOF
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl restart left4me-server@1
|
||||
```
|
||||
|
||||
**Verify:**
|
||||
```bash
|
||||
# 1. Unit active
|
||||
sudo systemctl is-active left4me-server@1
|
||||
|
||||
# 2. /proc visibility narrowed
# Probe as left4me, not root: root bypasses hidepid-style restrictions.
PID=$(pgrep -f 'srcds_linux.*left4dead2' | head -1)
sudo nsenter --target $PID --mount \
  --setuid "$(id -u left4me)" --setgid "$(id -g left4me)" \
  -- ls /proc | head -20
# Expect: only the unit's own PIDs (srcds_run, srcds_linux,
# child threads), NOT gunicorn or other PIDs, and no non-process
# entries such as /proc/meminfo.

# 3. Can't read other procs' environ
GUNICORN_PID=$(pgrep -f 'gunicorn.*l4d2web' | head -1)
sudo nsenter --target $PID --mount \
  --setuid "$(id -u left4me)" --setgid "$(id -g left4me)" \
  -- cat /proc/$GUNICORN_PID/environ 2>&1
# Expect: No such file or directory (invisible), not Permission denied
|
||||
```
|
||||
|
||||
**Pass criteria:** all of the above; no gunicorn PIDs visible.
|
||||
|
||||
**Cleanup:**
|
||||
```bash
|
||||
sudo rm /etc/systemd/system/left4me-server@1.service.d/test-05-proc.conf
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl restart left4me-server@1
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Test 6 — `MemoryDenyWriteExecute=true`
|
||||
|
||||
**Goal:** Test whether Source engine + sourcemod work under
`MemoryDenyWriteExecute=true`. **Likely to fail.** Skip if uncertain.
|
||||
|
||||
**Drop-in:**
|
||||
```bash
|
||||
sudo tee /etc/systemd/system/left4me-server@1.service.d/test-06-mdw.conf <<'EOF'
|
||||
[Service]
|
||||
MemoryDenyWriteExecute=true
|
||||
EOF
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl restart left4me-server@1
|
||||
```
|
||||
|
||||
**Verify:**
|
||||
```bash
|
||||
# 1. Unit active
|
||||
sudo systemctl is-active left4me-server@1
|
||||
|
||||
# 2. Run for 10+ minutes during normal play, including:
|
||||
# - Connect a Steam client
|
||||
# - Walk around a map
|
||||
# - Trigger a plugin (rcon: sm_admin)
|
||||
# - Map change
|
||||
# - Disconnect
|
||||
|
||||
# 3. Watch for crashes
|
||||
sudo journalctl -u left4me-server@1 --since '15 minutes ago' \
|
||||
| grep -iE 'segfault|SIGSEGV|coredump|abort|EPERM.*mprotect'
|
||||
# Expect: empty
|
||||
|
||||
# 4. SECCOMP audit lines (records carry syscall numbers, not names)
sudo journalctl -k --since '15 minutes ago' \
  | grep 'type=1326' | grep -i srcds
# Expect: empty. (A denied W+X mapping normally surfaces as an EPERM
# failure or crash in step 3, not as an audit line.)
|
||||
```
|
||||
|
||||
**Pass criteria:** no crashes, no relevant SECCOMP audit lines.
|
||||
|
||||
**Cleanup:**
|
||||
```bash
|
||||
sudo rm /etc/systemd/system/left4me-server@1.service.d/test-06-mdw.conf
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl restart left4me-server@1
|
||||
```
|
||||
|
||||
**Decision:** if pass → include `MemoryDenyWriteExecute=true` in the
|
||||
final composition. If fail → exclude (and document the reason in the
|
||||
result).
|
||||
|
||||
---
|
||||
|
||||
## Test 7 — Full proposed composition (everything that passed)
|
||||
|
||||
**Goal:** Compose tests 1, 2, 4, 5, (6 if it passed) into a single
|
||||
drop-in and verify nothing interacts badly.
|
||||
|
||||
**Drop-in:** (Adjust to skip Test 6's directives if Test 6 failed.)
|
||||
```bash
|
||||
sudo tee /etc/systemd/system/left4me-server@1.service.d/test-07-full.conf <<'EOF'
|
||||
[Service]
|
||||
# Identity / privilege
|
||||
NoNewPrivileges=true
|
||||
RestrictSUIDSGID=true
|
||||
CapabilityBoundingSet=
|
||||
AmbientCapabilities=
|
||||
UMask=0027
|
||||
|
||||
# Namespaces
|
||||
PrivateUsers=true
|
||||
PrivateTmp=true
|
||||
PrivateDevices=true
|
||||
PrivateIPC=true
|
||||
ProtectHome=true
|
||||
|
||||
# Filesystem view (clean slate)
|
||||
ReadOnlyPaths=
|
||||
ReadWritePaths=
|
||||
TemporaryFileSystem=/var/lib /etc /opt /home /root /srv /mnt /media
|
||||
BindReadOnlyPaths=/var/lib/left4me/installation
|
||||
BindReadOnlyPaths=/var/lib/left4me/overlays
|
||||
BindReadOnlyPaths=/etc/left4me/host.env
|
||||
BindReadOnlyPaths=/etc/ssl /etc/ca-certificates
|
||||
BindReadOnlyPaths=/etc/resolv.conf /etc/nsswitch.conf /etc/alternatives
|
||||
BindPaths=/var/lib/left4me/runtime/%i
|
||||
ProtectSystem=strict
|
||||
|
||||
# /proc + kernel
|
||||
ProtectProc=invisible
|
||||
ProcSubset=pid
|
||||
ProtectKernelTunables=true
|
||||
ProtectKernelModules=true
|
||||
ProtectKernelLogs=true
|
||||
ProtectClock=true
|
||||
ProtectControlGroups=true
|
||||
ProtectHostname=true
|
||||
LockPersonality=true
|
||||
|
||||
# Syscall
|
||||
SystemCallArchitectures=native
|
||||
SystemCallFilter=@system-service
|
||||
SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete @privileged
|
||||
|
||||
# Network
|
||||
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
|
||||
|
||||
# IPC + realtime + namespaces
|
||||
RestrictNamespaces=true
|
||||
RestrictRealtime=true
|
||||
RemoveIPC=true
|
||||
KeyringMode=private
|
||||
|
||||
# (Include only if Test 6 passed:)
|
||||
# MemoryDenyWriteExecute=true
|
||||
EOF
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl restart left4me-server@1
|
||||
```
|
||||
|
||||
**Verify:**
|
||||
```bash
|
||||
# 1. Unit active
|
||||
sudo systemctl is-active left4me-server@1
|
||||
sleep 30
|
||||
sudo systemctl is-active left4me-server@1 # still active
|
||||
|
||||
# 2. systemd-analyze: score should drop significantly
|
||||
sudo systemd-analyze security left4me-server@1.service \
|
||||
| tee /tmp/sec-after-server.txt
|
||||
diff /tmp/sec-baseline-server.txt /tmp/sec-after-server.txt \
|
||||
| head -40
|
||||
# Expect: many ✓ lines that were ✗, score dropped
|
||||
|
||||
# 3. Run smoke matrix (next section)
|
||||
```
|
||||
|
||||
**Smoke matrix (run after Test 7 settles):**
|
||||
|
||||
```bash
|
||||
# S1: server is responsive
|
||||
sudo systemctl status left4me-server@1 --no-pager | head -10
|
||||
# Active (running), recent green
|
||||
|
||||
# S2: srcds is in-game
|
||||
PID=$(pgrep -f 'srcds_linux.*left4dead2' | head -1)
|
||||
[ -n "$PID" ] && echo "OK: srcds PID $PID" || echo "FAIL"
|
||||
|
||||
# S3: from outside, RCON responds
|
||||
# (do this from the operator's laptop or via the web UI)
|
||||
|
||||
# S4: workshop / overlay refresh path
|
||||
# (trigger from web UI; verify the overlay rebuild succeeds — the
|
||||
# script-sandbox is a SEPARATE unit, not affected by these changes,
|
||||
# so any failure is in the web app's invocation path, not the
|
||||
# sandbox itself.)
|
||||
|
||||
# S5: web app can still sudo helpers
|
||||
# (trigger a server start/stop from the web UI; if the sudo path
|
||||
# fails, the web app's hardening is too tight — but we haven't
|
||||
# changed the web unit yet, so this should still work.)
|
||||
|
||||
# S6: log streaming works
|
||||
# (open the web UI's log view for server@1; verify lines flow.)
|
||||
|
||||
# S7: file upload to overlay
|
||||
# (upload a small file via the file-tree endpoint; verify it
|
||||
# appears on disk in /var/lib/left4me/overlays/<id>/.)
|
||||
|
||||
# S8: peer server unaffected
|
||||
sudo systemctl is-active left4me-server@2
|
||||
# active (we didn't touch it)
|
||||
```
|
||||
|
||||
**Pass criteria:** all smoke items pass. systemd-analyze score
|
||||
dropped significantly.
|
||||
|
||||
**Failure handling:** if anything in the smoke fails, identify which
|
||||
directive caused it by removing them one at a time until smoke
|
||||
passes. Document the offender.
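
The one-at-a-time removal can be made less error-prone by generating each variant from the full drop-in. A minimal sketch with hypothetical function names; it comments a single line out rather than deleting it, so the variant still shows what was disabled.

```bash
# list_directives FILE -> "lineno:Key=Value" for each directive line
list_directives() {
  grep -nE '^[A-Za-z]+=' "$1"
}

# without_line FILE N -> prints FILE with line N commented out
without_line() {
  awk -v n="$2" 'NR == n { print "# bisect-disabled: " $0; next } { print }' "$1"
}
```

A variant would be written with something like `without_line test-07-full.conf 12 | sudo tee <drop-in path>` followed by the usual reload and restart. Directives that only make sense together (the two `SystemCallFilter=` lines, or the `TemporaryFileSystem=` line with its binds) should be disabled as a group.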
|
||||
|
||||
**DO NOT cleanup yet** — leave Test 7 in place for Test 8.
|
||||
|
||||
---
|
||||
|
||||
## Test 8 — Attack verification (the audit gaps)
|
||||
|
||||
**Goal:** Confirm the threat-model defenses (D1, D2, D3, D5) actually
|
||||
work end-to-end.
|
||||
|
||||
**Pre-condition:** Test 7's drop-in still in place.
|
||||
|
||||
**Verify:**
|
||||
```bash
|
||||
PID=$(pgrep -f 'srcds_linux.*left4dead2' | head -1)
|
||||
GUNICORN_PID=$(pgrep -f 'gunicorn.*l4d2web' | head -1)
|
||||
|
||||
# D1.a — srcds cannot read DB
|
||||
sudo nsenter --target $PID --mount -- cat /var/lib/left4me/left4me.db 2>&1 | head -1
|
||||
# Expect: cat: /var/lib/left4me/left4me.db: No such file or directory
|
||||
|
||||
# D1.b — srcds cannot read web.env
|
||||
sudo nsenter --target $PID --mount -- cat /etc/left4me/web.env 2>&1 | head -1
|
||||
# Expect: cat: /etc/left4me/web.env: No such file or directory
|
||||
|
||||
# D1.c — srcds cannot read its own past
|
||||
sudo nsenter --target $PID --mount -- ls /opt 2>&1 | head -5
|
||||
# Expect: empty listing or No such file or directory
|
||||
|
||||
# D2.a — srcds cannot ptrace gunicorn (syscall filter)
|
||||
sudo nsenter --target $PID --mount -- /usr/bin/gdb --batch -p $GUNICORN_PID 2>&1 | tail -3
|
||||
# Expect: Operation not permitted
|
||||
|
||||
# D2.b — srcds cannot read /proc/<gunicorn>/environ
|
||||
sudo nsenter --target $PID --mount -- cat /proc/$GUNICORN_PID/environ 2>&1 | head -1
|
||||
# Expect: No such file or directory (ProtectProc=invisible)
|
||||
|
||||
# D2.c — srcds cannot read /proc/<gunicorn>/mem
|
||||
sudo nsenter --target $PID --mount -- cat /proc/$GUNICORN_PID/mem 2>&1 | head -1
|
||||
# Expect: No such file or directory
|
||||
|
||||
# D3 — srcds cannot use sudo helpers (NoNewPrivileges blocks setuid)
|
||||
sudo nsenter --target $PID --mount -- sudo -n /usr/local/libexec/left4me/left4me-systemctl show server@2 2>&1 | head -3
|
||||
# Expect: a sudo error about no new privileges, or operation not permitted
|
||||
|
||||
# D5 — server@1 cannot ptrace server@2's srcds
|
||||
PID2=$(pgrep -f 'srcds_linux.*\@2' | head -1)
|
||||
[ -n "$PID2" ] && sudo nsenter --target $PID --mount -- /usr/bin/gdb --batch -p $PID2 2>&1 | tail -3
|
||||
# Expect: Operation not permitted (cross-instance userns OR syscall filter)
|
||||
|
||||
# Bonus — confirm PrivateUsers is in effect
|
||||
sudo readlink /proc/$PID/ns/user
|
||||
sudo readlink /proc/1/ns/user
|
||||
# Expect: different
|
||||
```
|
||||
|
||||
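**Pass criteria:** every attack vector returns an error. Caveat: the `sudo nsenter --mount` probes run as root and share only the unit's mount namespace; they do not inherit its seccomp filter, `NoNewPrivileges`, uid, or user namespace. Treat D1 (filesystem view) as conclusive, and D2, D3, D5 as indicative until reproduced from a process spawned inside the unit.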
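
Because the probes above pipe into `head`/`tail`, their exit status is lost and each result has to be read by eye. A small wrapper can turn "the command must fail" into an explicit PASS/FAIL line. This is a sketch; the function and counter names are hypothetical.

```bash
PASS=0
FAIL=0

# expect_blocked LABEL CMD... -> prints PASS when CMD exits non-zero
expect_blocked() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    FAIL=$((FAIL + 1))
    printf 'FAIL %s (command succeeded)\n' "$label"
  else
    PASS=$((PASS + 1))
    printf 'PASS %s\n' "$label"
  fi
}
```

Usage would look like `expect_blocked D1.a sudo nsenter --target "$PID" --mount -- cat /var/lib/left4me/left4me.db`, with `$PASS` and `$FAIL` summarised at the end of the run.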
|
||||
|
||||
**Cleanup:** **Do not remove the drop-in yet** — leave it for Test 9.
|
||||
|
||||
---
|
||||
|
||||
## Test 9 — System-wide sysctl: `kernel.yama.ptrace_scope=2`
|
||||
|
||||
**Goal:** Add belt-and-braces system-wide.
|
||||
|
||||
**Apply:**
|
||||
```bash
|
||||
sudo tee /etc/sysctl.d/99-left4me-ptrace.conf <<'EOF'
|
||||
# Block ptrace except from root (CAP_SYS_PTRACE).
|
||||
# Combined with SystemCallFilter=~@debug + PrivateUsers=true in the
|
||||
# unit, this gives defense-in-depth at three levels.
|
||||
kernel.yama.ptrace_scope=2
|
||||
EOF
|
||||
sudo sysctl --system | grep yama
|
||||
# Expect: kernel.yama.ptrace_scope = 2
|
||||
sysctl kernel.yama.ptrace_scope
|
||||
# Expect: 2
|
||||
```
|
||||
|
||||
**Verify:**
|
||||
```bash
|
||||
# As left4me (no caps), gdb attach to gunicorn from OUTSIDE the unit's
|
||||
# namespace
|
||||
sudo -u left4me /usr/bin/gdb --batch -p $GUNICORN_PID 2>&1 | tail -3
|
||||
# Expect: Operation not permitted
|
||||
|
||||
# Operator gdb (as root) still works:
|
||||
sudo /usr/bin/gdb --batch -ex "info threads" -p $GUNICORN_PID 2>&1 | tail -10
|
||||
# Expect: gdb output (debugging is admin-only now)
|
||||
```
|
||||
|
||||
**Pass criteria:** non-root can't ptrace anything; root still can.
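
For the results write-up, the four Yama levels can be spelled out next to the recorded value. The mapping below follows the kernel's Yama documentation; the function name is hypothetical.

```bash
# ptrace_scope_meaning N -> one-line description of kernel.yama.ptrace_scope=N
ptrace_scope_meaning() {
  case "$1" in
    0) echo "classic: same-uid attach allowed" ;;
    1) echo "restricted: only descendants or declared tracers may be attached" ;;
    2) echo "admin-only: attaching requires CAP_SYS_PTRACE" ;;
    3) echo "no attach: ptrace attach disabled until reboot" ;;
    *) echo "unknown"; return 1 ;;
  esac
}
```

`ptrace_scope_meaning "$(sysctl -n kernel.yama.ptrace_scope)"` prints the level currently in force. Level 3 cannot be lowered again without a reboot, which is one reason this plan stops at 2.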
|
||||
|
||||
**No cleanup** — this is permanent (commit to /etc/sysctl.d/).
|
||||
|
||||
---
|
||||
|
||||
## Test 10 — Web unit hardening (carefully)
|
||||
|
||||
**Goal:** Apply non-sudo-breaking directives to `left4me-web.service`.
|
||||
|
||||
**Pre-condition:** Test 7's server drop-in still in place. Web is at
|
||||
baseline.
|
||||
|
||||
**Drop-in:**
|
||||
```bash
|
||||
sudo install -d -m0755 /etc/systemd/system/left4me-web.service.d/
|
||||
sudo tee /etc/systemd/system/left4me-web.service.d/test-10-web.conf <<'EOF'
|
||||
[Service]
|
||||
# (NoNewPrivileges intentionally NOT set — web sudoes to helpers.)
|
||||
# (PrivateUsers intentionally NOT set — would break sudo's setuid.)
|
||||
# (CapabilityBoundingSet not set — sudo + PAM need caps.)
|
||||
|
||||
ProtectSystem=strict
|
||||
ProtectHome=true
|
||||
LockPersonality=true
|
||||
UMask=0027
|
||||
|
||||
# /proc + kernel
|
||||
ProtectProc=invisible
|
||||
ProcSubset=pid
|
||||
ProtectKernelTunables=true
|
||||
ProtectKernelModules=true
|
||||
ProtectKernelLogs=true
|
||||
ProtectClock=true
|
||||
ProtectControlGroups=true
|
||||
ProtectHostname=true
|
||||
|
||||
# Syscall (no ~@privileged — sudo needs setuid/etc.)
|
||||
SystemCallArchitectures=native
|
||||
SystemCallFilter=@system-service
|
||||
SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete
|
||||
|
||||
# Network
|
||||
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
|
||||
|
||||
# Misc
|
||||
RestrictNamespaces=true
|
||||
RestrictRealtime=true
|
||||
RemoveIPC=true
|
||||
EOF
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl restart left4me-web
|
||||
```
|
||||
|
||||
**Verify:**
|
||||
```bash
|
||||
# 1. Web up
|
||||
sudo systemctl is-active left4me-web
|
||||
|
||||
# 2. Web responds (curl from the host)
|
||||
curl -sI http://127.0.0.1:8000/ | head -5
|
||||
# Expect: HTTP/1.1 200 or similar (whatever the default route is)
|
||||
|
||||
# 3. Web sudo path works — trigger from operator's laptop, watching the
|
||||
# web UI. Start/stop a server; observe success.
|
||||
|
||||
# 4. systemd-analyze score
|
||||
sudo systemd-analyze security left4me-web.service \
|
||||
| tee /tmp/sec-after-web.txt
|
||||
diff /tmp/sec-baseline-web.txt /tmp/sec-after-web.txt | head -20
|
||||
|
||||
# 5. Web cannot ptrace srcds (D4)
|
||||
WEB_PID=$(pgrep -f 'gunicorn.*l4d2web' | head -1)
|
||||
sudo -u left4me /usr/bin/gdb --batch -p $PID 2>&1 | tail -3
|
||||
# (might still succeed if the operator runs as root — what matters is
|
||||
# from inside the web unit's namespace)
|
||||
sudo nsenter --target $WEB_PID --mount -- /usr/bin/gdb --batch -p $PID 2>&1 | tail -3
|
||||
# Expect: Operation not permitted (SystemCallFilter blocks ptrace)
|
||||
```
|
||||
|
||||
**Pass criteria:** all of above.
|
||||
|
||||
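**Failure handling:** if sudo from web breaks, suspect the
`SystemCallFilter=` lines first. `~@privileged` is deliberately absent
from the web filter, so sudo's setuid transition should be allowed, and
`~@debug` only removes tracing-style calls that sudo does not need.
If sudo reports a "no new privileges" error instead, check
`grep NoNewPrivs /proc/$WEB_PID/status`: `systemd.exec(5)` documents
that several of these sandboxing directives can imply
`NoNewPrivileges=yes` in some configurations.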
|
||||
|
||||
**Cleanup:**
|
||||
```bash
|
||||
sudo rm /etc/systemd/system/left4me-web.service.d/test-10-web.conf
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl restart left4me-web
|
||||
```
|
||||
|
||||
(Web reverts to baseline. Server drop-in stays for the report.)
|
||||
|
||||
---
|
||||
|
||||
## Test 11 — Soak test
|
||||
|
||||
**Goal:** Run the composition for an extended period to surface
|
||||
race-condition or workload-dependent issues.
|
||||
|
||||
**Pre-condition:** Test 7 drop-in on server@1; Test 9 sysctl in place.
|
||||
|
||||
**Procedure:**
|
||||
```bash
# Run for 24-48 hours; observe:
sudo journalctl -u left4me-server@1 --since '24 hours ago' \
  | grep -iE 'seccomp|denied|EACCES|EPERM' | wc -l
# Expect: 0 or a very small number (some EACCES on benign probes
# are normal)

# Kernel audit records carry no unit field, so -k is not combined with -u.
sudo journalctl -k --since '24 hours ago' \
  | grep 'type=1326' | wc -l
# Expect: 0

sudo systemctl status left4me-server@1
# Expect: active, no restarts since start
```
|
||||
|
||||
**Pass criteria:** no SECCOMP kills over the soak period, no
|
||||
unexpected restarts.
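
The pass criteria above reduce to two numbers: seccomp audit records and unit restarts. A minimal sketch of how they could be turned into a verdict follows; the function names `count_seccomp_events` and `soak_verdict` are hypothetical helpers, not part of the existing tooling.

```bash
#!/usr/bin/env bash
# Count kernel seccomp audit records (audit type 1326) in log text on stdin.
count_seccomp_events() {
    # grep -c prints 0 but exits 1 when nothing matches; keep the count only.
    grep -c 'type=1326' || true
}

# Decide pass/fail from the two numbers the soak test cares about.
# $1 = seccomp event count, $2 = restart count.
soak_verdict() {
    if [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; then
        echo "PASS"
    else
        echo "FAIL (seccomp=$1 restarts=$2)"
    fi
}
```

On the host, the inputs would plausibly come from `sudo journalctl -k --since '24 hours ago' | count_seccomp_events` and `systemctl show left4me-server@1 -p NRestarts --value`. Note that `NRestarts` only counts automatic restarts, so a manual restart during the soak would still need to be spotted in the journal.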
|
||||
|
||||
---
|
||||
|
||||
## Cleanup (after all tests pass)
|
||||
|
||||
```bash
# Remove all test drop-ins
sudo rm -rf /etc/systemd/system/left4me-server@1.service.d/test-*.conf
sudo rm -rf /etc/systemd/system/left4me-web.service.d/test-*.conf
sudo systemctl daemon-reload
sudo systemctl restart left4me-server@1 left4me-web
sudo systemctl is-active left4me-server@1 left4me-web   # both active

# Sysctl from Test 9 STAYS in place.

# Remove temp files
rm /tmp/sec-baseline-*.txt /tmp/sec-after-*.txt
rm /tmp/unit-baseline-*.conf
rm /tmp/syscall-log-*.txt
```
|
||||
|
||||
---
|
||||
|
||||
## Results template
|
||||
|
||||
Append the executing session's findings here. One paragraph per test.
|
||||
|
||||
### Test 1 — PrivateUsers
|
||||
- Pass / fail: TBD
|
||||
- Notes:
|
||||
|
||||
### Test 2 — TemporaryFileSystem + binds
|
||||
- Pass / fail: TBD
|
||||
- Notes:
|
||||
|
||||
### Test 3 — SystemCallLog discovery
|
||||
- Pass / fail: TBD
|
||||
- Syscalls observed under load (if any from @debug/@mount/@privileged):
|
||||
- Notes:
|
||||
|
||||
### Test 4 — SystemCallFilter enforcement
|
||||
- Pass / fail: TBD
|
||||
- If filter had to be relaxed, which group:
|
||||
- Notes:
|
||||
|
||||
### Test 5 — ProcSubset + ProtectProc
|
||||
- Pass / fail: TBD
|
||||
- Notes:
|
||||
|
||||
### Test 6 — MemoryDenyWriteExecute
|
||||
- Pass / fail: TBD (likely fail; document the failure mode)
|
||||
- Notes:
|
||||
|
||||
### Test 7 — Full composition
|
||||
- Pass / fail: TBD
|
||||
- systemd-analyze score before/after:
|
||||
- Notes:
|
||||
|
||||
### Test 8 — Attack verification
|
||||
- Pass / fail: TBD
|
||||
- Per-vector results (D1.a, D1.b, ..., D5):
|
||||
|
||||
### Test 9 — Yama ptrace_scope=2
|
||||
- Applied: TBD
|
||||
- Operator workflow impact noted:
|
||||
|
||||
### Test 10 — Web hardening
|
||||
- Pass / fail: TBD
|
||||
- Sudo path verified working:
|
||||
- systemd-analyze score before/after:
|
||||
|
||||
### Test 11 — Soak
|
||||
- Duration:
|
||||
- Issues observed:
|
||||
|
||||
---
|
||||
|
||||
## Output of this test plan
|
||||
|
||||
When all tests complete:
|
||||
1. Mark this document with **status: tested** and record the dates.
2. Open a new implementation plan
   (`docs/superpowers/plans/2026-MM-DD-hardening-refactor.md`) that
   commits the proven composition to the ckn-bw reactor + reference
   units + test suite.
3. Decide on the deferred questions:
   - 3-user uid split: necessary or covered by hardening?
   - AppArmor profile follow-up: pursue or close?
   - `MemoryDenyWriteExecute=true`: include if Test 6 passed?
   - `SocketBindAllow=`: add to lock the gameserver port range?
4. Mark `2026-05-15-user-uid-split-design.md` as superseded or closed
   per the answer to the uid-split question in item 3.
|
||||
|
||||
## Pointers
|
||||
|
||||
- Threat model: `docs/superpowers/specs/2026-05-15-hardening-threat-model.md`
|
||||
- Defenses survey: `docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md`
|
||||
- Live unit source: `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`
|
||||
- Reference units: `deploy/files/usr/local/lib/systemd/system/`
|
||||
- Tools needed on `left4.me`:
|
||||
- `systemd-analyze` (in `systemd` package, already installed)
|
||||
- `scmp_sys_resolver` (in `libseccomp-dev`; install on demand for
|
||||
Test 3/4 if filters need analysis)
|
||||
- `gdb` (for ptrace tests; install on demand)
|
||||
- `nsenter` (in `util-linux`, already installed)
|
||||
- `findmnt`, `pgrep`, standard userspace
|
||||
345
docs/superpowers/specs/2026-05-15-hardening-threat-model.md
Normal file
|
|
@ -0,0 +1,345 @@
|
|||
# left4me application hardening — threat model
|
||||
|
||||
**Status:** living spec, intended input to a hardening implementation plan.
|
||||
Paired with `2026-05-15-hardening-defenses-survey.md` and
|
||||
`2026-05-15-hardening-test-plan.md`.
|
||||
|
||||
This document establishes *what we defend against and what we accept losing*.
|
||||
The defenses survey and test plan operationalize this against the codebase.
|
||||
|
||||
## Context
|
||||
|
||||
The 2026-05-15 work landed deploy-dir-rethink + build-time-idmap and
|
||||
queued "uid split decision" as the next session's task
|
||||
(`2026-05-15-user-uid-split-design.md`). Audit of the running 2-user
|
||||
configuration found that the gameserver's systemd hardening blocks
|
||||
privilege escalation but leaves same-uid attack surface wide open:
|
||||
RCON passwords plaintext in `/var/lib/left4me/left4me.db` (readable by
|
||||
srcds), Flask `SECRET_KEY` in `/etc/left4me/web.env` (also readable),
|
||||
no ptrace block on `left4me-server@.service`, no `/proc` isolation.
|
||||
Rather than answer the original "1/2/3 uids" question in isolation,
|
||||
this work treats application hardening as a first-class refactor: ground
|
||||
the decision in an explicit threat model, survey the full Linux+systemd
|
||||
defense menu, test what composes safely with Source engine + the rest of
|
||||
the stack, then implement.
|
||||
|
||||
## Operating posture (assumed)
|
||||
|
||||
Solo-operator, single-host infra (`left4.me` / `ovh.left4me`,
|
||||
141.95.32.8). Host is a personal VPS, not multi-tenant. The only privileged
|
||||
operator is the user. There are no shell logins as `left4me` or
|
||||
`l4d2-sandbox`. All access to those uids is funneled through the
|
||||
systemd-managed units (`left4me-web.service`, `left4me-server@.service`,
|
||||
`left4me-script-sandbox`). The host runs nothing other than left4me +
|
||||
ckn-bw-managed baseline (nginx, sshd, fail2ban-class basics).
|
||||
|
||||
If those assumptions don't hold (e.g., shared host with other tenants,
|
||||
non-systemd-mediated access to the uids), revise this document before
|
||||
proceeding — threat surface changes meaningfully.
|
||||
|
||||
## Assets
|
||||
|
||||
Ordered by impact-if-compromised. Compromise means the attacker can
|
||||
exfiltrate, modify, or destroy the asset.
|
||||
|
||||
### Tier 1 — catastrophic, no easy recovery
|
||||
|
||||
| Asset | Where | Impact of compromise |
|---|---|---|
| Host root | the box | Total compromise of every service on the host. |
| `web.env` Flask `SECRET_KEY` | `/etc/left4me/web.env`, `root:left4me 0640` | Session forgery: attacker logs in as any admin without password. |
| `web.env` Steam Web API key | same | Attacker can query/operate Steam Web API as us. Rate-limited; reputational. |
| Server RCON passwords | DB: `Server.rcon_password` plaintext (`l4d2web/models.py:146-148`) | Attacker can execute arbitrary RCON on every gameserver: `sm_kick`, `rcon say`, server lockup, plugin abuse. |
| User password hashes (bcrypt) | DB: `User.password_digest` (`l4d2web/models.py:31`) | Offline cracking per user. bcrypt slows it but doesn't stop it. |
|
||||
|
||||
### Tier 2 — severe but bounded
|
||||
|
||||
| Asset | Where | Impact |
|---|---|---|
| `/opt/left4me/src/` Python source | `left4me:left4me` on disk | Persistent backdoor in web app via gunicorn reload. Currently RO from inside the server unit (`ProtectSystem=strict` covers `/opt`); RW from inside the web unit. |
| Overlay content | `/var/lib/left4me/overlays/<id>/` | Persistent sourcemod plugin or replaced binary; surfaces in every gameserver using that overlay. |
| Steam installation | `/var/lib/left4me/installation/` | Tampered `srcds_linux`; trivial persistence. Currently RO from server, RW from web. |
| Sourcemod admin lists | inside overlays | RCON-equivalent: admin commands in-game. |
| Workshop cache | `/var/lib/left4me/workshop_cache/` | Used by builds; tampered content surfaces in next overlay. |
|
||||
|
||||
### Tier 3 — limited, recoverable
|
||||
|
||||
Job history, build logs, the small subset of in-game state not covered by
|
||||
the above (e.g., live player slot in a specific match).
|
||||
|
||||
## Trust boundaries
|
||||
|
||||
Lines we want enforced. "Enforced" = the kernel + systemd, not "the
|
||||
process politely doesn't cross it."
|
||||
|
||||
| Id | From | To | Strength today | Strength wanted |
|---|---|---|---|---|
| TB1 | External network | host shell | Strong (firewall, no extra services) | Strong |
| TB2 | Gameserver process | rest of the host | Weak (same-uid + same-FS view) | Strong |
| TB3 | Web app | rest of the host | Weak (same-uid + same-FS view) | Medium (sudo path inherent) |
| TB4 | Sandbox | rest of the host | Strong (separate uid + hardened unit) | Strong |
| TB5 | Gameserver instance N | gameserver instance M | None (same-uid, same-DB) | Strong |
| TB6 | Web app | gameserver runtime state | None (same-uid, shared `runtime/<n>` access) | Medium (web needs to stage server.cfg) |
| TB7 | Gameserver | web-only secrets (DB, web.env) | None | Strong |
| TB8 | Workshop content | srcds-process | Inherent (content runs as data) | n/a (not a software boundary) |
|
||||
|
||||
TB2, TB5, TB7 are the highest-leverage gaps. TB6 is only partial because
the web app legitimately writes per-instance config: the boundary to
enforce is that "web can write per-instance config" stays allowed while
"web can ptrace srcds" is denied.
|
||||
|
||||
## Attackers
|
||||
|
||||
### A1 — Anonymous external attacker (primary)
|
||||
|
||||
Reaches public surfaces:
|
||||
- gunicorn on `:8000` (behind nginx + admin auth)
|
||||
- srcds on UDP `:27015`+ per instance (game protocol; no auth)
|
||||
- (Maybe: workshop subscription endpoints if any; check.)
|
||||
|
||||
Capabilities: arbitrary network packets. Goal: code execution on the
|
||||
host, then exfiltrate secrets and persist.
|
||||
|
||||
### A2 — Authenticated admin (operator)
|
||||
|
||||
In the assumed posture this is *the user*, single person. Out of scope as
|
||||
a threat per operator's choice (insider == operator). If admin auth ever
|
||||
expands to multiple operators, revise.
|
||||
|
||||
### A3 — Malicious workshop content
|
||||
|
||||
A workshop addon (map, plugin, asset pack) is published to the Steam
|
||||
workshop and pulled into a build. The content runs inside srcds via
|
||||
Source engine + sourcemod loading. Capabilities: same as A1 once loaded
|
||||
into srcds (the engine doesn't have a strong privilege boundary against
|
||||
its own loaded plugins). Distinct in that the entry vector is curated by
|
||||
the operator (workshop link added to a blueprint), not arbitrary network
|
||||
input. Risk floor: the operator vetted the source.
|
||||
|
||||
### A4 — Compromised player session
|
||||
|
||||
A connected player exploits a Source-engine protocol bug. Functionally a
|
||||
subset of A1 — same capability set once code is running in srcds.
|
||||
|
||||
### A5 — Local attacker on the host
|
||||
|
||||
Out of scope per operating posture. No non-root local accounts beyond
|
||||
the systemd-managed service uids.
|
||||
|
||||
### A6 — Steam binary supply-chain
|
||||
|
||||
`srcds_linux` is a binary from Valve. A compromised Valve build would
|
||||
already be running as `left4me` and there's no practical defense at
|
||||
this layer. Out of scope.
|
||||
|
||||
## Attack scenarios
|
||||
|
||||
### S1 — L4D2 engine RCE → exfil + persist
|
||||
|
||||
A1 sends a crafted packet to srcds; srcds executes attacker code as
|
||||
`left4me` inside `left4me-server@.service`.
|
||||
|
||||
**Today, attacker can:**
|
||||
- Read DB → all RCON passwords (plaintext), all bcrypt hashes.
|
||||
- Read `web.env` → SECRET_KEY, Steam API key.
|
||||
- ptrace gunicorn → in-memory secrets, current sessions.
|
||||
- Read `/proc/<gunicorn-pid>/environ` → same env as `web.env`.
|
||||
- ptrace + read DB of peer `left4me-server@<n>` — cross-server compromise.
|
||||
- `sudo left4me-systemctl|journalctl|overlay` for any instance.
|
||||
- Cannot write `/opt/left4me/src/` (ProtectSystem=strict covers `/opt`).
|
||||
- Cannot acquire new caps (NoNewPrivileges).
|
||||
|
||||
**Defended outcome (goal):** Blast radius limited to "this gameserver's
|
||||
runtime state during this session" — no peer-server compromise, no DB
|
||||
access, no `web.env` access, no ptrace.
|
||||
|
||||
### S2 — Web app RCE → secrets + persistence
|
||||
|
||||
A1 finds a Flask vulnerability (Jinja SSTI, SQLAlchemy injection, auth
|
||||
bypass, file-upload escape). Web executes attacker code as `left4me`
|
||||
inside `left4me-web.service`.
|
||||
|
||||
**Today, attacker can:**
|
||||
- Read + write DB (web's primary path).
|
||||
- Read `web.env`.
|
||||
- Write `/opt/left4me/src/` → backdoor next gunicorn reload.
|
||||
- `sudo` all helper verbs.
|
||||
- ptrace srcds peers, modify their `runtime/<n>/` upper layer.
|
||||
- Modify overlays (writes to `/var/lib/left4me/overlays/`).
|
||||
|
||||
**Defended outcome (goal):** Cannot ptrace gameservers; cannot read
|
||||
`/proc/<srcds-pid>/*`; web compromise still owns its DB and env (its
|
||||
primary attack surface, so this is *acceptable residual*).
|
||||
|
||||
### S3 — Cross-server contamination
|
||||
|
||||
S1 played out on srcds@1; attacker pivots to srcds@2.
|
||||
|
||||
**Today:** trivial — ptrace srcds@2, read its memory; or just read the
|
||||
DB to learn srcds@2's RCON password and send commands.
|
||||
|
||||
**Defended outcome (goal):** Blocked. Per-instance namespace isolation
|
||||
(or per-instance uid) means kernel rejects ptrace; DB invisible to
|
||||
gameserver uid hides the RCON list.
|
||||
|
||||
### S4 — Malicious workshop content
|
||||
|
||||
A3 adds an addon to a blueprint; addon includes a Squirrel/SourceMod
|
||||
plugin that abuses engine APIs to do file I/O / network calls.
|
||||
|
||||
**Today + with hardening:** functionally equivalent to S1 — the plugin
|
||||
runs as srcds, same blast radius. No software boundary prevents this;
|
||||
the only defense is what's outside the unit. So this is *covered* if S1
|
||||
is covered.
|
||||
|
||||
### S5 — Sudoers helper abuse
|
||||
|
||||
S1 or S2 attacker uses the sudo grants to widen access.
|
||||
|
||||
**Today:** sudoers grants (audit findings, `deploy/files/etc/sudoers.d/left4me`):
|
||||
- `left4me-systemctl <name> {enable|disable|show}` — any instance, no
|
||||
ownership check
|
||||
- `left4me-journalctl <name>` — read any unit's journal
|
||||
- `left4me-overlay mount|umount <name>` — any instance
|
||||
- `left4me-script-sandbox <overlay_id> <script>` — runs as `l4d2-sandbox`
|
||||
|
||||
A compromised gameserver can enable/disable peer instances, read their
|
||||
journals, mount/umount their overlays. Not root escalation, but a
|
||||
significant escalation.
|
||||
|
||||
**Defended outcome:** sudoers reachable only from `left4me-web`. The
|
||||
gameserver uid (or the gameserver's namespace) gets none of the helper
|
||||
grants. This is naturally true if the helpers are invoked only by the
|
||||
web app; ensure the gameserver unit cannot sudo (no PAM, no setuid bits
|
||||
in its FS view).
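
The "no setuid bits in its FS view" condition above can be checked mechanically. The sketch below is one possible way to do so; `list_setuid_files` and `assert_no_setuid` are hypothetical helper names, and the check only means something when it runs against the gameserver unit's own filesystem view rather than the host's.

```bash
#!/usr/bin/env bash
# List regular files carrying the setuid bit under a directory.
# -xdev keeps find on a single filesystem, so separately mounted trees
# (bind mounts, tmpfs) must be scanned one by one.
list_setuid_files() {
    find "$1" -xdev -type f -perm -4000 2>/dev/null
}

# Succeed (exit 0) only when no setuid file is visible under the directory.
assert_no_setuid() {
    local hits
    hits=$(list_setuid_files "$1")
    if [ -n "$hits" ]; then
        echo "setuid files visible under $1:" >&2
        echo "$hits" >&2
        return 1
    fi
}
```

A non-empty result is not automatically a failure of S5: with `NoNewPrivileges=true` already set on the server unit, a visible setuid binary cannot raise privileges on exec. The scan is an additional, independent check of the filesystem view.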
|
||||
|
||||
### S6 — Sandbox escape
|
||||
|
||||
An attacker gains A1-equivalent code execution inside
`left4me-script-sandbox`. The sandbox runs as `l4d2-sandbox`, fully
hardened (verified during 2026-05-15 work).
|
||||
|
||||
**Today:** sandbox-escape attacker has `l4d2-sandbox` capabilities only.
|
||||
With build-time-idmap, writes through the bind land on disk as
|
||||
`left4me`, but the sandbox process itself cannot interact with `left4me`
|
||||
processes (different uid). Existing isolation is strong.
|
||||
|
||||
**Defended outcome:** unchanged — already strong. Document as a load-
|
||||
bearing invariant; do not weaken.
|
||||
|
||||
## What we accept losing
|
||||
|
||||
Decisions to *not* defend, with reasoning. Future work might revisit.
|
||||
|
||||
- **Kernel CVEs** that escape namespaces or seccomp. No practical defense
|
||||
short of running on a hypervisor + KVM. Out of scope.
|
||||
- **systemd unit-config CVEs**. Unit hardening relies on systemd
|
||||
honoring directives correctly. Out of scope.
|
||||
- **Steam binary compromise**. `srcds_linux` is Valve's. Out of scope.
|
||||
- **Sourcemod / Metamod plugin runtime weaknesses**. Plugins run as srcds
|
||||
by design. Out of scope.
|
||||
- **Player IP exposure via game protocol**. Inherent to UDP/Source. Out of
|
||||
scope.
|
||||
- **DoS via game protocol** (`A2S_INFO` flooding etc.). Out of scope for
|
||||
*this* effort; covered by network-layer mitigations.
|
||||
- **DoS via web HTTP**. Covered upstream by nginx + fail2ban; out of
|
||||
scope for *this* effort.
|
||||
- **Host root from operator error** (a misconfigured cron, an admin
|
||||
shell). Out of scope; operator is single-person and aware.
|
||||
- **Long-term forward secrecy** for past sessions (an attacker who
|
||||
exfils SECRET_KEY can replay past sessions). Out of scope; rotation
|
||||
on incident.
|
||||
|
||||
## What we defend (prioritized)
|
||||
|
||||
D1 — **Gameserver RCE cannot exfiltrate DB or web.env**, including RCON
|
||||
passwords and SECRET_KEY. Highest value: catastrophic asset, plausible
|
||||
attack (L4D2 engine RCE is the canonical "old engine, public traffic"
|
||||
risk).
|
||||
|
||||
D2 — **Gameserver RCE cannot ptrace web app or peer gameservers**. Blocks
|
||||
in-memory secret theft and cross-server contamination.
|
||||
|
||||
D3 — **Gameserver RCE cannot use sudo helpers** for instances other
|
||||
than its own (or, ideally, cannot use sudo at all).
|
||||
|
||||
D4 — **Web app RCE cannot ptrace gameservers**. Symmetric to D2; web
|
||||
still has full DB access (acceptable residual since it's the web app's
|
||||
own data).
|
||||
|
||||
D5 — **Cross-server contamination blocked at the kernel level**. Per-
|
||||
instance namespaces or per-instance uid.
|
||||
|
||||
D6 — **Persistent compromise of `/opt/left4me/src/` blocked from
|
||||
gameserver context**. Already partially true via `ProtectSystem=strict`;
|
||||
maintain.
|
||||
|
||||
D7 — **All defenses survive a unit-config refactor in the wrong
|
||||
direction** — e.g., a future developer adding `ReadWritePaths=` widely.
|
||||
Achieved via tests that assert hardening invariants
|
||||
(`deploy/tests/test_deploy_artifacts.py`).
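
As a rough illustration of what "asserting hardening invariants" can look like, here is a shell-level sketch that fails when an expected `Directive=value` line disappears from a unit file. The helper names `unit_has` and `check_invariants` are hypothetical; the real assertions are expected to live in the pytest file named above, and this sketch only matches exact lines in a single file (it does not resolve drop-ins).

```bash
#!/usr/bin/env bash
# Succeed when the unit file contains the exact line "Directive=value".
# $1 = unit file path, $2 = directive, $3 = expected value.
unit_has() {
    grep -qxF -- "$2=$3" "$1"
}

# Check a list of "Directive=value" invariants against one unit file.
# Prints every missing invariant to stderr and fails if any is missing.
check_invariants() {
    local unit=$1 missing=0 line
    shift
    for line in "$@"; do
        if ! grep -qxF -- "$line" "$unit"; then
            echo "missing invariant: $line" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A call could look like `check_invariants left4me-server@.service 'NoNewPrivileges=true' 'ProtectSystem=strict' 'PrivateDevices=true' 'RestrictSUIDSGID=true' 'LockPersonality=true'`, using the directives the server unit is already documented to carry.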
|
||||
|
||||
## Acceptable user-experience cost
|
||||
|
||||
- **Unit start latency**: +5s tolerable; +30s not.
|
||||
- **Memory overhead**: +tens of MB per unit fine; +hundreds not.
|
||||
- **Operational complexity**: one well-documented unit-template
|
||||
hardening profile reusable across units. Acceptable trade-off.
|
||||
- **Debugging cost**: SECCOMP audit log discoverability via
|
||||
`journalctl -k` acceptable. ptrace-based debugging in production
|
||||
unnecessary; can re-enable via ad-hoc drop-in if needed.
|
||||
- **Steam updates / pip installs**: must continue to work without
|
||||
per-update operator action. Privileged paths (steamcmd self-update)
|
||||
can run as `left4me` outside the unit if needed; document.
|
||||
- **Workshop content**: must continue to load. Builds run in the
|
||||
sandbox; the gameserver only reads pre-built overlays.
|
||||
|
||||
## Acceptance criteria for the implementation
|
||||
|
||||
The final composition (hardening directives + any uid changes) must:
|
||||
|
||||
1. **Functionally**: pass the smoke matrix from `2026-05-15-hardening-test-plan.md` (RCON, build, restart, file upload, multi-server, workshop).
|
||||
2. **Defenses verified**:
|
||||
- srcds cannot read `/var/lib/left4me/left4me.db` or `/etc/left4me/web.env` (file not in FS view, or kernel denies)
|
||||
- srcds cannot ptrace gunicorn or peer srcds (syscall blocked, or kernel rejects across namespaces/uids)
|
||||
- srcds cannot read `/proc/<other-pid>/*`
|
||||
- web cannot ptrace srcds (symmetric)
|
||||
3. **No regressions**: existing test suite passes
|
||||
(`pytest deploy/tests/test_overlay_helper.py l4d2host/tests/`).
|
||||
4. **Auditable**: invariants asserted in `deploy/tests/test_deploy_artifacts.py`; baseline `systemd-analyze security` score recorded.
|
||||
5. **Documentable**: one paragraph per directive in the unit, explaining
|
||||
*why* it's there. Future maintainers can reason about removal.
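
The "srcds cannot read" criteria in item 2 can be probed with a very small helper. This is a sketch under the assumption that it is executed by a process that really runs inside the gameserver unit's context; `expect_unreadable` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Return 0 when the path cannot be opened for reading by the current process
# (absent from the FS view, or permission denied); return 1 when it is readable.
expect_unreadable() {
    if head -c 1 -- "$1" >/dev/null 2>&1; then
        echo "FAIL: $1 is readable" >&2
        return 1
    fi
    echo "ok: $1 is not readable"
}
```

The two paths of interest are `/var/lib/left4me/left4me.db` and `/etc/left4me/web.env`. The probe does not distinguish "file not in the FS view" from "kernel denies access"; both satisfy the criterion as written.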
|
||||
|
||||
## Open questions to clarify with the operator
|
||||
|
||||
Before the defenses survey is final, clarify:
|
||||
|
||||
1. **Is gunicorn directly internet-reachable, or behind nginx?** The unit
|
||||
binds `127.0.0.1:8000` (per `metadata.py:208`); presumably nginx
|
||||
terminates TLS and forwards. Confirm.
|
||||
2. **Auth model**: who can log into the web app? Is admin auth strong
|
||||
(long passwords, 2FA), or default-grade? Defines how realistic S2 is.
|
||||
3. **Workshop content sources**: curated by operator, or arbitrary
|
||||
workshop subscriptions exposed to admins? Defines A3's realism.
|
||||
4. **Test bench**: is `ckn@10.0.4.128` a real separate test host, or
|
||||
ovh.left4me the only deployment target? Affects test plan choices.
|
||||
5. **`kernel.yama.ptrace_scope` setting on the host?** The upstream kernel
default is 1, but Debian kernels may ship with 0, so read the live value
rather than assuming it; we may want 2 system-wide.
|
||||
6. **Is the host running AppArmor?** Debian has enabled AppArmor by default
since Debian 10 (buster), but provider images and custom kernels can
differ, so verify on the host. If we want AppArmor profiles for srcds
(in addition to systemd directives), it must be active system-wide.
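
Questions 5 and 6 can be answered directly on the host. The following sketch reads both settings and explains the Yama value; `describe_ptrace_scope` and `report_host_lsm_state` are hypothetical helper names, while the two pseudo-file paths are the standard kernel interfaces.

```bash
#!/usr/bin/env bash
# Map a kernel.yama.ptrace_scope value to its documented meaning.
describe_ptrace_scope() {
    case "$1" in
        0) echo "classic: any same-uid process may attach" ;;
        1) echo "restricted: only ancestors (or declared tracers) may attach" ;;
        2) echo "admin-only: attaching requires CAP_SYS_PTRACE" ;;
        3) echo "no attach: ptrace attach disabled until reboot" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Report the live settings; either file may be absent if the LSM is not built in.
report_host_lsm_state() {
    local f=/proc/sys/kernel/yama/ptrace_scope
    if [ -r "$f" ]; then
        describe_ptrace_scope "$(cat "$f")"
    else
        echo "Yama not available"
    fi
    f=/sys/module/apparmor/parameters/enabled
    if [ -r "$f" ]; then
        echo "AppArmor enabled: $(cat "$f")"   # the kernel reports Y or N
    else
        echo "AppArmor module not present"
    fi
}
```

Running `report_host_lsm_state` once before the test plan starts gives a baseline to record alongside the `systemd-analyze security` scores.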
|
||||
|
||||
## Pointers
|
||||
|
||||
- Audit synthesis (this session's conversation): unit hardening profile
|
||||
`deploy/files/usr/local/lib/systemd/system/left4me-server@.service`,
|
||||
metadata reactor `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`,
|
||||
filesystem ACLs `~/Projekte/ckn-bw/bundles/left4me/items.py:21-115`,
|
||||
DB schema `l4d2web/models.py:31, 146-148`, sudoers
|
||||
`deploy/files/etc/sudoers.d/left4me`.
|
||||
- Original uid-split spec: `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
|
||||
— remains open; this work may supersede it.
|
||||
- Companion docs:
|
||||
`docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md`,
|
||||
`docs/superpowers/specs/2026-05-15-hardening-test-plan.md`.
|
||||
- Related work landed this session:
|
||||
`docs/superpowers/plans/2026-05-15-build-time-idmap.md`,
|
||||
`docs/superpowers/plans/2026-05-15-deploy-dir-rethink.md`.
|
||||
|
|
@ -1,105 +1,114 @@
|
|||
# Session handoff — next: uid-split decision
|
||||
# Session handoff — next: execute hardening test plan
|
||||
|
||||
Short handoff for the session that follows the 2026-05-15 deploy-dir
|
||||
rethink + janitorial sweep. The full project context is in CLAUDE.md
|
||||
and the per-topic specs/plans linked below; this doc covers only
|
||||
what's situationally fresh.
|
||||
Short handoff. Three new hardening specs landed today; the next session
|
||||
takes the test plan to `left4.me` and runs it. Decision on
|
||||
`2026-05-15-user-uid-split-design.md` is **deferred** until the test
|
||||
plan reports back.
|
||||
|
||||
## What just landed
|
||||
|
||||
Four commits since `e38b844`, pushed to `origin/master`:
|
||||
Three coordinated specs at `docs/superpowers/specs/`:
|
||||
|
||||
- `5284e28` — privileged helpers moved out of `deploy/files/usr/local/{libexec,sbin}/`
|
||||
into top-level `scripts/{libexec,sbin}/`. `deploy/` is now reference
|
||||
material (README + example configs + curated example units). Dead
|
||||
static artifacts deleted: `left4me-apply-cake`, `left4me-cake.service`,
|
||||
`left4me-nft-mark.service`, `cake.env`, `left4me-mark.nft`, the
|
||||
superseded `deploy-test-server.sh`.
|
||||
- `160911f` — plan landed at `docs/superpowers/plans/2026-05-15-deploy-dir-rethink.md`;
|
||||
adjacent specs marked resolved.
|
||||
- `8f30dd7` — janitorial item 6 (bubblewrap doc-drift) corrected.
|
||||
- `4aa69c2` — janitorial items 8 and 9 verified on `ovh.left4me`
|
||||
(141.95.32.8) and marked resolved.
|
||||
- `2026-05-15-hardening-threat-model.md` — assets, attackers (A1-A6),
|
||||
trust boundaries (TB1-TB8), attack scenarios (S1-S6), what we
|
||||
defend (D1-D7), what we accept losing.
|
||||
- `2026-05-15-hardening-defenses-survey.md` — full Linux + systemd
|
||||
defense menu, per-defense primitive mapping, candidate composition
|
||||
for `left4me-server@.service` + `left4me-web.service`.
|
||||
- `2026-05-15-hardening-test-plan.md` — 11 tests runnable cold on
|
||||
`left4.me`; drop-in style so they never modify persistent units.
|
||||
|
||||
Companion change in **ckn-bw** is committed (`91b7265`) but **not yet
|
||||
pushed**. Verified against the test host via `bw apply ovh.left4me`;
|
||||
the working-tree-as-applied was committed afterwards. Pushing it is
|
||||
safe and idempotent (deployed bytes already match).
|
||||
## Why the shape changed (from uid-split → hardening)
|
||||
|
||||
Janitorial spec status: items 1, 2 (partial), 3, 4, 5, 6, 8, 9
|
||||
closed. Items 7 and 10 remain (item 7 is conditional on the
|
||||
build-overlay-unit refactor; item 10 is a calendar reminder for SM
|
||||
1.13 in late 2026).
|
||||
The prior handoff pointed this session at the 1/2/3-user decision in
|
||||
`2026-05-15-user-uid-split-design.md`. Audit during this session
|
||||
established that the same-uid attack surface (DB readable from srcds,
|
||||
ptrace of gunicorn allowed, RCON passwords stored plaintext in DB,
|
||||
no `/proc` isolation) is closable by *either* a uid split *or*
|
||||
systemd directive composition (`TemporaryFileSystem=` +
|
||||
`SystemCallFilter=~@debug` + `PrivateUsers=true` + `ProcSubset=pid`
|
||||
+ empty `CapabilityBoundingSet=`). Operator chose to step back: do
|
||||
threat-model + research + test before committing to either approach.
|
||||
The three new specs are the output of that step-back.
|
||||
|
||||
## What's next: uid-split
|
||||
## What's next: run the test plan
|
||||
|
||||
Existing handoff:
|
||||
[`docs/superpowers/specs/2026-05-15-user-uid-split-design.md`](2026-05-15-user-uid-split-design.md).
|
||||
Read that first. The decision is whether left4me should have 1, 2,
|
||||
or 3 system users; today it has 2 (`left4me` + `l4d2-sandbox`).
|
||||
The test plan is **self-contained** — drop a fresh Claude session on
|
||||
`left4.me` (141.95.32.8) with the spec in hand and it can execute end
|
||||
to end. System units only; no user units, no lingering.
|
||||
|
||||
This is a **decision** task, not a migration. Likely outcome: settle
|
||||
the question with a short plan and either a memory entry / spec
|
||||
resolution (if status quo wins) or a follow-up implementation plan
|
||||
(if "split to 3" or "collapse to 1" wins). Time-box the decision to
|
||||
one session; defer any migration work to a follow-up plan.
|
||||
Per the test plan's structure:
|
||||
1. Capture baseline (`systemd-analyze security`, current unit state,
|
||||
sysctl).
|
||||
2. Tests 1-6 isolate individual directives against srcds on
|
||||
`left4me-server@1` (canary; server@2 stays baseline as a fallback).
|
||||
3. Test 7 composes everything that passed.
|
||||
4. Test 8 verifies the threat-model defenses (D1-D5) actually work.
|
||||
5. Test 9 applies `kernel.yama.ptrace_scope=2` system-wide.
|
||||
6. Test 10 applies the sudo-compatible subset to `left4me-web.service`.
|
||||
7. Test 11 is a 24-48h soak.
|
||||
|
||||
### Decision-relevant context that emerged this session
|
||||
Results template at the bottom of the test plan; fill in as you go.
|
||||
|
||||
- **The 2-uid model is freshly load-bearing.** The build-time-idmap
|
||||
work (commits `2f6a9cf` + `9053186`, plan
|
||||
`docs/superpowers/plans/2026-05-15-build-time-idmap.md`) explicitly
|
||||
used "sandbox escape could see web.env / DB / running gameservers"
|
||||
as the argument for keeping `l4d2-sandbox` as a separate uid. That
|
||||
argument cuts the "collapse to 1" option hard.
|
||||
- **Verified clean on the host:** `left4me-server@{1,2}.service` are
|
||||
both running as `left4me` today (janitorial item 8 diagnostic).
|
||||
No orphan idmap binds; the 2-uid invariants hold.
|
||||
- **Files-overlay invariant verified:** overlay 8 (`Optimized
|
||||
Settings`, files-type) is `left4me:left4me` end-to-end with no
|
||||
`l4d2-sandbox`-owned files (janitorial item 9). This means files
|
||||
overlays would not be affected by a gameserver-uid split — the
|
||||
Python web app writes them directly as `left4me`.
|
||||
- **The hardening floor is high.** `srcds` already runs with
|
||||
`NoNewPrivileges=true`, `ProtectSystem=strict`,
|
||||
`PrivateDevices=true`, `ReadOnlyPaths=...installation...overlays`,
|
||||
`RestrictSUIDSGID=true`, `LockPersonality=true` (see
|
||||
`deploy/files/usr/local/lib/systemd/system/left4me-server@.service`).
|
||||
Most exfil paths a gameserver-uid split would close are already
|
||||
closed by systemd hardening. The case for "split to 3" is
|
||||
defense-in-depth, not a missing primary control.
|
||||
- **Sudoers / cross-repo cost.** A new uid would need additions in
|
||||
ckn-bw's `bundles/left4me/items.py` (`users` dict) and in the
|
||||
sudoers grants. Both are in the right state to receive that change
|
||||
cleanly; deploy-dir-rethink already pinned where each lives.
|
||||
After execution: write the implementation plan at
|
||||
`docs/superpowers/plans/2026-MM-DD-hardening-refactor.md` against the
|
||||
proven composition. The plan touches `~/Projekte/ckn-bw/bundles/left4me/metadata.py`
|
||||
(live source for unit emission per `items.py:2-5`) and the reference
|
||||
copies in `deploy/files/usr/local/lib/systemd/system/`.
|
||||
|
||||
### Downstream consequence
|
||||
## Decision-relevant context
|
||||
|
||||
Whatever uid-split decides constrains the **build-overlay-unit**
|
||||
refactor that follows
|
||||
([`docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md`](2026-05-15-build-overlay-unit-design.md)).
|
||||
The systemd template unit replacing `left4me-script-sandbox` encodes
|
||||
the idmap mapping `l4d2-sandbox` → `<target uid>`. Settling the uid
|
||||
question first means build-overlay-unit composes against a final
|
||||
foundation rather than retouching.
|
||||
- **Source of truth for unit files is ckn-bw**, not left4me's
|
||||
`deploy/files/`. The `deploy/files/usr/local/lib/systemd/system/*.service`
|
||||
copies are reference-only post-deploy-dir-rethink; the
|
||||
`systemd/units` reactor in `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`
|
||||
is the live emission. Audit confirmed (commit `5284e28` + `items.py:2-5`
|
||||
comment).
|
||||
- **Sandbox is already strong.** `l4d2-sandbox` unit is not in scope
|
||||
for this refactor — its hardening profile was verified during 2026-05-15
|
||||
build-time-idmap work. Document as load-bearing; do not weaken.
|
||||
- **Sudo on the web app blocks deep hardening there.** `NoNewPrivileges=true`
|
||||
and `PrivateUsers=true` are incompatible with the helper-invocation
|
||||
pattern. Sudo-compatible subset only on web. Full hardening blocked
|
||||
on a future "replace sudo with systemctl-managed unit triggering"
|
||||
refactor (build-overlay-unit spec is a step in that direction).
|
||||
- **uid-split spec is deferred, not closed.** After Phase A test
  results come back, decide: residual risk small enough → close
  `2026-05-15-user-uid-split-design.md` as superseded. Residual risk
  significant → write the split as a follow-up.

## Pointers
## Open questions to clarify with operator before/during execution

- Source-of-truth spec: `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
- Build-time-idmap plan (the load-bearing security argument):
  `docs/superpowers/plans/2026-05-15-build-time-idmap.md`
- Live unit files for srcds hardening review:
  `deploy/files/usr/local/lib/systemd/system/left4me-server@.service`
- ckn-bw users definition: `~/Projekte/ckn-bw/bundles/left4me/items.py`
  (the `users = {...}` dict near the top)
- Sandbox helper that does the idmap mapping today:
  `scripts/libexec/left4me-script-sandbox`
(Captured in the threat model's "Open questions" section.)

1. Is gunicorn directly internet-reachable, or only via nginx?
2. Admin-auth strength on the web app (defines S2 realism).
3. Workshop content curation policy (defines A3 realism).
4. Is `ckn@10.0.4.128` usable as a test bench, or is `left4.me` the
   only deployment target? (Test plan currently assumes `left4.me`.)
5. Current `kernel.yama.ptrace_scope` setting on the host.
6. AppArmor enabled on host? (Debian 10 and later enable it by
   default; confirm on this host.)
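
Questions 5 and 6 can be answered directly from a shell on the host. The following is a minimal sketch, assuming the standard procfs and sysfs locations: `/proc/sys/kernel/yama/ptrace_scope` is present only when the Yama LSM is available, and `/sys/module/apparmor/parameters/enabled` reads `Y` when AppArmor is active. The function names are illustrative and not part of left4me.

```shell
#!/bin/sh
# Hypothetical helper names; the paths are the standard procfs/sysfs locations.

# Print the Yama ptrace scope (0-3), or "unavailable" if Yama is not present.
ptrace_scope() {
    f=/proc/sys/kernel/yama/ptrace_scope
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "unavailable"
    fi
}

# Print "enabled" if the AppArmor module reports Y, otherwise "disabled".
apparmor_state() {
    f=/sys/module/apparmor/parameters/enabled
    if [ -r "$f" ] && [ "$(cat "$f")" = "Y" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

echo "kernel.yama.ptrace_scope: $(ptrace_scope)"
echo "apparmor: $(apparmor_state)"
```

A `ptrace_scope` of `0` means any process may attach to another process running under the same uid, which is the condition the same-uid ptrace concern depends on; `1` restricts attachment to descendants, `2` to processes with `CAP_SYS_PTRACE`, and `3` disables attaching entirely.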
## What's NOT next

- Build-overlay-unit refactor. Wait for uid-split.
- Janitorial item 7 (`_sandbox_script_dir` cleanup). Conditional on
  build-overlay-unit Option B landing.
- Mako template duplication in ckn-bw. Separate cleanup; the
  templates legitimately need bw's metadata access.
- Pushing the ckn-bw `91b7265` commit. Safe but not blocking.
- **build-overlay-unit refactor**
  (`docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md`).
  Still queued; sequenced behind this. The hardening profile from
  this work becomes the template for the build-overlay unit.
- **Pushing the ckn-bw `91b7265` commit.** Still unpushed; still safe.
  Mentioned in the previous handoff; not a blocker.
- **uid-split implementation.** Deferred pending test results.
- **AppArmor profiles.** Listed in the defenses survey; deferred.
  Revisit after Phase A if directive-only hardening leaves gaps.

## Pointers

- Test plan (the thing to execute): `docs/superpowers/specs/2026-05-15-hardening-test-plan.md`
- Threat model: `docs/superpowers/specs/2026-05-15-hardening-threat-model.md`
- Defenses survey: `docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md`
- Original uid-split spec (deferred): `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
- Live unit emission: `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`
- Reference units: `deploy/files/usr/local/lib/systemd/system/`
- Scratch plan from earlier this session
  (`~/.claude/plans/docs-superpowers-specs-2026-05-15-sessio-cosmic-codd.md`)
  is superseded by the three specs; safe to discard.