left4me/docs/superpowers/specs/2026-05-15-hardening-threat-model.md
mwiegand 8971b23617
refactor(sandbox): collapse l4d2-sandbox user into left4me
The hardening refactor that just landed closes the same-uid attack
surface (FS view, ptrace, /proc visibility, signals) for the web +
gameserver units via systemd directives plus system-wide
kernel.yama.ptrace_scope=2. Keeping the script-sandbox on a separate
uid was the inconsistent half-step — defense-in-depth only, with
build-time-idmap complexity attached. One principle wins: harden
once, share the uid.

scripts/libexec/left4me-script-sandbox: drop the idmap block (uid
lookups, STAGING setup, cleanup_staging trap, mount --bind
--map-users), switch User=/Group= to left4me, point BindPaths at
\$OVERLAY_DIR directly. Header comment updated to reflect
hardening-not-uid as the same-uid defense. nsenter self-wrap kept —
it's about mount-namespace escape, not uid.

Tests + comments + companion docs updated. Build-time-idmap and
overlay-idmap plans marked SUPERSEDED; user-uid-split spec revised
to "1 user is correct"; one-line update notes on the hardening
specs and the build-overlay-unit-design.

Companion ckn-bw commit removes the l4d2-sandbox user + group and
tightens /var/lib/left4me from 0711 → 0755 (the traverse-only mode
was specifically for the sandbox uid).
2026-05-15 15:50:57 +02:00


# left4me application hardening — threat model
**Status:** living spec, intended input to a hardening implementation plan.
Paired with `2026-05-15-hardening-defenses-survey.md` and
`2026-05-15-hardening-test-plan.md`.
> **Updated 2026-05-15:** `l4d2-sandbox` was collapsed into `left4me`
> after the hardening refactor landed — see
> `docs/superpowers/plans/2026-05-15-uid-collapse.md`. The same-uid
> threat surface that doc accepts is the same surface this model
> already documents for server/web; the sandbox is now in scope of
> the same hardening profile.

This document establishes *what we defend against and what we accept losing*.
The defenses survey and test plan operationalize this against the codebase.
## Context
The 2026-05-15 work landed deploy-dir-rethink + build-time-idmap and
queued "uid split decision" as the next session's task
(`2026-05-15-user-uid-split-design.md`). Audit of the running 2-user
configuration found that the gameserver's systemd hardening blocks
privilege escalation but leaves same-uid attack surface wide open:
RCON passwords plaintext in `/var/lib/left4me/left4me.db` (readable by
srcds), Flask `SECRET_KEY` in `/etc/left4me/web.env` (also readable),
no ptrace block on `left4me-server@.service`, no `/proc` isolation.
Rather than answer the original "1/2/3 uids" question in isolation,
this work treats application hardening as a first-class refactor: ground
the decision in an explicit threat model, survey the full Linux+systemd
defense menu, test what composes safely with Source engine + the rest of
the stack, then implement.
## Operating posture (assumed)
Solo-operator, single-host infra (`left4.me` / `ovh.left4me`,
141.95.32.8). Host is a personal VPS, not multi-tenant. The only privileged
operator is the user. There are no shell logins as `left4me` or
`l4d2-sandbox`. All access to those uids is funneled through the
systemd-managed units (`left4me-web.service`, `left4me-server@.service`,
`left4me-script-sandbox`). The host runs nothing other than left4me +
ckn-bw-managed baseline (nginx, sshd, fail2ban-class basics).
If those assumptions don't hold (e.g., shared host with other tenants,
non-systemd-mediated access to the uids), revise this document before
proceeding — threat surface changes meaningfully.
## Assets
Ordered by impact-if-compromised. Compromise means the attacker can
exfiltrate, modify, or destroy the asset.
### Tier 1 — catastrophic, no easy recovery
| Asset | Where | Impact of compromise |
|---|---|---|
| Host root | the box | Total compromise of every service on the host. |
| `web.env` Flask `SECRET_KEY` | `/etc/left4me/web.env`, `root:left4me 0640` | Session forgery: attacker logs in as any admin without password. |
| `web.env` Steam Web API key | same | Attacker can query/operate Steam Web API as us. Rate-limited; reputational. |
| Server RCON passwords | DB: `Server.rcon_password` plaintext (`l4d2web/models.py:146-148`) | Attacker can execute arbitrary RCON on every gameserver: `sm_kick`, `rcon say`, server lockup, plugin abuse. |
| User password hashes (bcrypt) | DB: `User.password_digest` (`l4d2web/models.py:31`) | Offline cracking per user. bcrypt slows it but doesn't stop it. |
### Tier 2 — severe but bounded
| Asset | Where | Impact |
|---|---|---|
| `/opt/left4me/src/` Python source | `left4me:left4me` on disk | Persistent backdoor in web app via gunicorn reload. Currently RO from inside the server unit (`ProtectSystem=strict` covers `/opt`); RW from inside the web unit. |
| Overlay content | `/var/lib/left4me/overlays/<id>/` | Persistent sourcemod plugin or replaced binary; surfaces in every gameserver using that overlay. |
| Steam installation | `/var/lib/left4me/installation/` | Tampered `srcds_linux`; trivial persistence. Currently RO from server, RW from web. |
| Sourcemod admin lists | inside overlays | RCON-equivalent: admin commands in-game. |
| Workshop cache | `/var/lib/left4me/workshop_cache/` | Used by builds; tampered content surfaces in next overlay. |
### Tier 3 — limited, recoverable
Job history, build logs, the small subset of in-game state not covered by
the above (e.g., live player slot in a specific match).
## Trust boundaries
Lines we want enforced. "Enforced" = the kernel + systemd, not "the
process politely doesn't cross it."
| Id | From | To | Strength today | Strength wanted |
|---|---|---|---|---|
| TB1 | External network | host shell | Strong (firewall, no extra services) | Strong |
| TB2 | Gameserver process | rest of the host | Weak (same-uid + same-FS view) | Strong |
| TB3 | Web app | rest of the host | Weak (same-uid + same-FS view) | Medium (sudo path inherent) |
| TB4 | Sandbox | rest of the host | Strong (separate uid + hardened unit) | Strong |
| TB5 | Gameserver instance N | gameserver instance M | None (same-uid, same-DB) | Strong |
| TB6 | Web app | gameserver runtime state | None (same-uid, shared `runtime/<n>` access) | Medium (web needs to stage server.cfg) |
| TB7 | Gameserver | web-only secrets (DB, web.env) | None | Strong |
| TB8 | Workshop content | srcds-process | Inherent (content runs as data) | n/a — not a software boundary |

TB2, TB5, and TB7 are the highest-leverage gaps. TB6 is only partial
because the web app legitimately writes per-instance config: writing that
config stays allowed, while ptracing srcds from the web app is denied.
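
The gap ranking can be reproduced mechanically from the table. The sketch below is illustrative only: the numeric ranks assigned to None/Weak/Medium/Strong are an assumption made for this example, and `gaps` and `strong_targets` are hypothetical helper names, not part of the codebase.

```python
# Strength ladder from the trust-boundary table. The numeric ranks are an
# assumption made for this sketch, not something the spec defines.
RANK = {"None": 0, "Weak": 1, "Medium": 2, "Strong": 3}

# (id, strength today, strength wanted), transcribed from the table.
# TB8 is omitted because the table marks it as not a software boundary.
BOUNDARIES = [
    ("TB1", "Strong", "Strong"),
    ("TB2", "Weak", "Strong"),
    ("TB3", "Weak", "Medium"),
    ("TB4", "Strong", "Strong"),
    ("TB5", "None", "Strong"),
    ("TB6", "None", "Medium"),
    ("TB7", "None", "Strong"),
]


def gaps(boundaries):
    """Return (id, gap) pairs for boundaries weaker than wanted, widest first."""
    found = [
        (tb, RANK[want] - RANK[now])
        for tb, now, want in boundaries
        if RANK[want] > RANK[now]
    ]
    return sorted(found, key=lambda pair: (-pair[1], pair[0]))


def strong_targets(boundaries):
    """Return ids of boundaries that should be Strong but are not yet."""
    return [tb for tb, now, want in boundaries if want == "Strong" and now != "Strong"]


if __name__ == "__main__":
    print(gaps(BOUNDARIES))
    print(strong_targets(BOUNDARIES))
```

For the table as transcribed, `strong_targets` yields TB2, TB5, and TB7, while TB3 and TB6 show up in `gaps` only as Medium-target boundaries.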
## Attackers
### A1 — Anonymous external attacker (primary)
Reaches public surfaces:
- gunicorn on `:8000` (behind nginx + admin auth)
- srcds on UDP `:27015`+ per instance (game protocol; no auth)
- Possibly workshop subscription endpoints, if any exist (still to be checked).

Capabilities: arbitrary network packets. Goal: code execution on the
host, then exfiltrate secrets and persist.
### A2 — Authenticated admin (operator)
In the assumed posture this is *the user*, single person. Out of scope as
a threat per operator's choice (insider == operator). If admin auth ever
expands to multiple operators, revise.
### A3 — Malicious workshop content
A workshop addon (map, plugin, asset pack) is published to the Steam
workshop and pulled into a build. The content runs inside srcds via
Source engine + sourcemod loading. Capabilities: same as A1 once loaded
into srcds (the engine doesn't have a strong privilege boundary against
its own loaded plugins). Distinct in that the entry vector is curated by
the operator (workshop link added to a blueprint), not arbitrary network
input. Risk floor: the operator vetted the source.
### A4 — Compromised player session
A connected player exploits a Source-engine protocol bug. Functionally a
subset of A1 — same capability set once code is running in srcds.
### A5 — Local attacker on the host
Out of scope per operating posture. No non-root local accounts beyond
the systemd-managed service uids.
### A6 — Steam binary supply-chain
`srcds_linux` is a binary from Valve. A compromised Valve build would
already be running as `left4me` and there's no practical defense at
this layer. Out of scope.
## Attack scenarios
### S1 — L4D2 engine RCE → exfil + persist
A1 sends a crafted packet to srcds; srcds executes attacker code as
`left4me` inside `left4me-server@.service`.

**Today, attacker can:**
- Read DB → all RCON passwords (plaintext), all bcrypt hashes.
- Read `web.env` → SECRET_KEY, Steam API key.
- ptrace gunicorn → in-memory secrets, current sessions.
- Read `/proc/<gunicorn-pid>/environ` → same env as `web.env`.
- ptrace peer `left4me-server@<n>` processes and read their RCON passwords from the shared DB (cross-server compromise).
- `sudo left4me-systemctl|journalctl|overlay` for any instance.

**Today, attacker cannot:**
- Write `/opt/left4me/src/` (`ProtectSystem=strict` covers `/opt`).
- Acquire new capabilities (`NoNewPrivileges`).

**Defended outcome (goal):** Blast radius limited to "this gameserver's
runtime state during this session": no peer-server compromise, no DB
access, no `web.env` access, no ptrace.
### S2 — Web app RCE → secrets + persistence
A1 finds a Flask vulnerability (Jinja SSTI, SQLAlchemy injection, auth
bypass, file-upload escape). Web executes attacker code as `left4me`
inside `left4me-web.service`.

**Today, attacker can:**
- Read + write DB (web's primary path).
- Read `web.env`.
- Write `/opt/left4me/src/` → backdoor next gunicorn reload.
- `sudo` all helper verbs.
- ptrace srcds peers, modify their `runtime/<n>/` upper layer.
- Modify overlays (writes to `/var/lib/left4me/overlays/`).

**Defended outcome (goal):** Cannot ptrace gameservers; cannot read
`/proc/<srcds-pid>/*`; web compromise still owns its DB and env (its
primary attack surface, so this is *acceptable residual*).
### S3 — Cross-server contamination
S1 played out on srcds@1; attacker pivots to srcds@2.
**Today:** trivial — ptrace srcds@2, read its memory; or just read the
DB to learn srcds@2's RCON password and send commands.
**Defended outcome (goal):** Blocked. Per-instance namespace isolation
(or per-instance uid) means kernel rejects ptrace; DB invisible to
gameserver uid hides the RCON list.
### S4 — Malicious workshop content
A3 adds an addon to a blueprint; addon includes a Squirrel/SourceMod
plugin that abuses engine APIs to do file I/O / network calls.
**Today + with hardening:** functionally equivalent to S1 — the plugin
runs as srcds, same blast radius. No software boundary prevents this;
the only defense is what's outside the unit. So this is *covered* if S1
is covered.
### S5 — Sudoers helper abuse
S1 or S2 attacker uses the sudo grants to widen access.
**Today:** sudoers grants (audit findings, `deploy/files/etc/sudoers.d/left4me`):
- `left4me-systemctl <name> {enable|disable|show}` — any instance, no
ownership check
- `left4me-journalctl <name>` — read any unit's journal
- `left4me-overlay mount|umount <name>` — any instance
- `left4me-script-sandbox <overlay_id> <script>` — runs as `l4d2-sandbox`

A compromised gameserver can enable/disable peer instances, read their
journals, mount/umount their overlays. This is not root escalation, but it
is still a significant widening of access.
**Defended outcome:** sudoers reachable only from `left4me-web`. The
gameserver uid (or the gameserver's namespace) gets none of the helper
grants. This is naturally true if the helpers are invoked only by the
web app; ensure the gameserver unit cannot sudo (no PAM, no setuid bits
in its FS view).
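
One way to check the "no setuid bits in its FS view" condition is to walk the filesystem as seen from inside the unit and list any setuid or setgid regular files. This is a minimal sketch, assuming it is run from within the gameserver unit's mount namespace; `find_setuid` is a hypothetical helper name, not an existing function in the repository.

```python
import os
import stat


def find_setuid(root):
    """Yield regular files under root that carry the setuid or setgid bit."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are not followed out of the tree.
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # entry vanished or is not accessible; skip it
            if stat.S_ISREG(mode) and mode & (stat.S_ISUID | stat.S_ISGID):
                yield path


if __name__ == "__main__":
    # Any hit (for example a sudo binary) means the unit can still reach a
    # privilege-changing executable in its filesystem view.
    for hit in find_setuid("/usr"):
        print(hit)
```

An empty result for the directories visible to the unit supports the invariant. It does not prove sudo is unusable on its own, so it complements rather than replaces the PAM and sudoers checks.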
### S6 — Sandbox escape
An attacker reaches A1-equivalent code execution inside
`left4me-script-sandbox`. The sandbox runs as
`l4d2-sandbox`, fully hardened (verified during 2026-05-15 work).
**Today:** sandbox-escape attacker has `l4d2-sandbox` capabilities only.
With build-time-idmap, writes through the bind land on disk as
`left4me`, but the sandbox process itself cannot interact with `left4me`
processes (different uid). Existing isolation is strong.
**Defended outcome:** unchanged, already strong. Document as a
load-bearing invariant; do not weaken.
## What we accept losing
Decisions to *not* defend, with reasoning. Future work might revisit.
- **Kernel CVEs** that escape namespaces or seccomp. No practical defense
short of moving workloads into separate virtual machines (e.g. KVM guests). Out of scope.
- **systemd unit-config CVEs**. Unit hardening relies on systemd
honoring directives correctly. Out of scope.
- **Steam binary compromise**. `srcds_linux` is Valve's. Out of scope.
- **Sourcemod / Metamod plugin runtime weaknesses**. Plugins run as srcds
by design. Out of scope.
- **Player IP exposure via game protocol**. Inherent to UDP/Source. Out of
scope.
- **DoS via game protocol** (`A2S_INFO` flooding etc.). Out of scope for
*this* effort; covered by network-layer mitigations.
- **DoS via web HTTP**. Covered upstream by nginx + fail2ban; out of
scope for *this* effort.
- **Host root from operator error** (a misconfigured cron, an admin
shell). Out of scope; operator is single-person and aware.
- **Long-term forward secrecy** for past sessions (an attacker who
exfils SECRET_KEY can replay past sessions). Out of scope; rotation
on incident.
## What we defend (prioritized)

D1 — **Gameserver RCE cannot exfiltrate DB or web.env**, including RCON
passwords and SECRET_KEY. Highest value: catastrophic asset, plausible
attack (L4D2 engine RCE is the canonical "old engine, public traffic"
risk).

D2 — **Gameserver RCE cannot ptrace web app or peer gameservers**. Blocks
in-memory secret theft and cross-server contamination.

D3 — **Gameserver RCE cannot use sudo helpers** for instances other
than its own (or, ideally, cannot use sudo at all).

D4 — **Web app RCE cannot ptrace gameservers**. Symmetric to D2; web
still has full DB access (acceptable residual since it's the web app's
own data).

D5 — **Cross-server contamination blocked at the kernel level**.
Per-instance namespaces or per-instance uid.

D6 — **Persistent compromise of `/opt/left4me/src/` blocked from
gameserver context**. Already partially true via `ProtectSystem=strict`;
maintain.

D7 — **All defenses survive a unit-config refactor in the wrong
direction**, e.g., a future developer adding `ReadWritePaths=` widely.
Achieved via tests that assert hardening invariants
(`deploy/tests/test_deploy_artifacts.py`).
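
The kind of invariant test D7 describes might look like the sketch below. It is not the content of `deploy/tests/test_deploy_artifacts.py`: the sample unit text, the required-directive set, and the allowed `ReadWritePaths=` entry are all assumptions chosen for illustration, and `parse_unit` and `violations` are hypothetical helper names.

```python
# Hypothetical unit fragment; the real unit file may differ.
SAMPLE_UNIT = """\
[Service]
User=left4me
NoNewPrivileges=yes
ProtectSystem=strict
ReadWritePaths=/var/lib/left4me/runtime/%i
"""

# Assumed invariants for this sketch. The comparison is literal, so a unit
# that spells the boolean as "true" instead of "yes" would need normalizing.
REQUIRED = {"NoNewPrivileges": "yes", "ProtectSystem": "strict"}
ALLOWED_RW = {"/var/lib/left4me/runtime/%i"}


def parse_unit(text):
    """Collect directives as {key: [values]}, since unit files may repeat keys."""
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "[")):
            continue  # blank line, comment, or section header
        key, sep, value = line.partition("=")
        if sep:
            directives.setdefault(key.strip(), []).append(value.strip())
    return directives


def violations(text, required, allowed_rw):
    """Return human-readable problems; an empty list means the invariants hold."""
    directives = parse_unit(text)
    problems = []
    for key, want in required.items():
        # The last assignment of a single-valued directive is the effective one.
        if directives.get(key, [None])[-1] != want:
            problems.append(f"{key} must be {want}")
    # Simplification: empty assignments that reset the list are not modeled.
    for value in directives.get("ReadWritePaths", []):
        for path in value.split():
            if path not in allowed_rw:
                problems.append(f"unexpected ReadWritePaths entry: {path}")
    return problems


def test_server_unit_keeps_hardening():
    assert violations(SAMPLE_UNIT, REQUIRED, ALLOWED_RW) == []
```

A test in this style fails as soon as someone widens `ReadWritePaths=` or relaxes `ProtectSystem=`, which is the regression D7 is meant to catch.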
## Acceptable user-experience cost
- **Unit start latency**: +5s tolerable; +30s not.
- **Memory overhead**: +tens of MB per unit fine; +hundreds not.
- **Operational complexity**: one well-documented unit-template
hardening profile reusable across units. Acceptable trade-off.
- **Debugging cost**: SECCOMP audit log discoverability via
`journalctl -k` acceptable. ptrace-based debugging in production
unnecessary; can re-enable via ad-hoc drop-in if needed.
- **Steam updates / pip installs**: must continue to work without
per-update operator action. Privileged paths (steamcmd self-update)
can run as `left4me` outside the unit if needed; document.
- **Workshop content**: must continue to load. Builds run in the
sandbox; the gameserver only reads pre-built overlays.
## Acceptance criteria for the implementation
The final composition (hardening directives + any uid changes) must:
1. **Functionally**: pass the smoke matrix from `2026-05-15-hardening-test-plan.md` (RCON, build, restart, file upload, multi-server, workshop).
2. **Defenses verified**:
   - srcds cannot read `/var/lib/left4me/left4me.db` or `/etc/left4me/web.env` (file not in FS view, or kernel denies)
   - srcds cannot ptrace gunicorn or peer srcds (syscall blocked, or kernel rejects across namespaces/uids)
   - srcds cannot read `/proc/<other-pid>/*`
   - web cannot ptrace srcds (symmetric)
3. **No regressions**: existing test suite passes
(`pytest deploy/tests/test_overlay_helper.py l4d2host/tests/`).
4. **Auditable**: invariants asserted in `deploy/tests/test_deploy_artifacts.py`; baseline `systemd-analyze security` score recorded.
5. **Documentable**: one paragraph per directive in the unit, explaining
*why* it's there. Future maintainers can reason about removal.
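
The file-visibility checks in criterion 2 can be exercised with a small probe run from inside the gameserver unit's context. This is a sketch under that assumption; how the probe gets executed inside the unit is left open here, and `probe` and `leaks` are hypothetical helper names.

```python
# Paths the gameserver must not be able to read, taken from criterion 2.
SECRET_PATHS = [
    "/var/lib/left4me/left4me.db",
    "/etc/left4me/web.env",
]


def probe(paths):
    """Map each path to 'readable', 'denied', or 'absent' for the calling process."""
    result = {}
    for path in paths:
        try:
            with open(path, "rb") as fh:
                fh.read(1)
            result[path] = "readable"
        except FileNotFoundError:
            result[path] = "absent"  # not in the filesystem view at all
        except PermissionError:
            result[path] = "denied"  # visible, but the kernel refuses access
        except OSError as exc:
            result[path] = f"error: {exc.strerror}"
    return result


def leaks(paths):
    """Return the paths that are still readable, i.e. defense failures."""
    return [path for path, status in probe(paths).items() if status == "readable"]


if __name__ == "__main__":
    for path, status in probe(SECRET_PATHS).items():
        print(f"{status:10} {path}")
```

Both "absent" and "denied" satisfy the criterion, matching the "file not in FS view, or kernel denies" wording. Only "readable" is a failure.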
## Open questions to clarify with the operator
Before the defenses survey is final, clarify:
1. **Is gunicorn directly internet-reachable, or behind nginx?** The unit
binds `127.0.0.1:8000` (per `metadata.py:208`); presumably nginx
terminates TLS and forwards. Confirm.
2. **Auth model**: who can log into the web app? Is admin auth strong
(long passwords, 2FA), or default-grade? Defines how realistic S2 is.
3. **Workshop content sources**: curated by operator, or arbitrary
workshop subscriptions exposed to admins? Defines A3's realism.
4. **Test bench**: is `ckn@10.0.4.128` a real separate test host, or
ovh.left4me the only deployment target? Affects test plan choices.
5. **`kernel.yama.ptrace_scope` setting on the host?** Default Debian is
1; we may want 2 system-wide.
6. **Is the host running AppArmor?** Debian has enabled it by default
since Buster (10), but provider images can differ, so confirm on the host.
If we want AppArmor profiles for srcds (in addition to
systemd directives), it must be active system-wide.
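
For question 5, the current Yama setting can be read straight from procfs. The sketch below assumes the Yama LSM is built into the kernel, in which case the sysctl file exists. The value descriptions paraphrase the kernel's Yama documentation, and the target of 2 reflects the preference stated in the question. `read_ptrace_scope` and `meets_target` are hypothetical helper names.

```python
YAMA_PATH = "/proc/sys/kernel/yama/ptrace_scope"

# Paraphrased from the kernel's Yama documentation.
MEANING = {
    0: "classic: a process may attach to any other process with the same uid",
    1: "restricted: only descendants or declared tracers may be attached",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: ptrace attach is disabled and cannot be re-enabled at runtime",
}


def read_ptrace_scope(path=YAMA_PATH):
    """Return the integer ptrace_scope, or None if Yama is not available."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (FileNotFoundError, ValueError):
        return None


def meets_target(value, minimum=2):
    """True when the setting is at least as strict as the wanted minimum."""
    return value is not None and value >= minimum


if __name__ == "__main__":
    scope = read_ptrace_scope()
    print(scope, MEANING.get(scope, "Yama not available or unknown value"))
    print("meets target:", meets_target(scope))
```

A result below 2 means same-uid ptrace is still possible for at least some process relationships, which is the S1 and S3 precondition.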
## Pointers
- Audit synthesis (this session's conversation): unit hardening profile
`deploy/files/usr/local/lib/systemd/system/left4me-server@.service`,
metadata reactor `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`,
filesystem ACLs `~/Projekte/ckn-bw/bundles/left4me/items.py:21-115`,
DB schema `l4d2web/models.py:31, 146-148`, sudoers
`deploy/files/etc/sudoers.d/left4me`.
- Original uid-split spec: `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
— remains open; this work may supersede it.
- Companion docs:
`docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md`,
`docs/superpowers/specs/2026-05-15-hardening-test-plan.md`.
- Related work landed this session:
`docs/superpowers/plans/2026-05-15-build-time-idmap.md`,
`docs/superpowers/plans/2026-05-15-deploy-dir-rethink.md`.