left4me/docs/superpowers/specs/2026-05-15-hardening-threat-model.md

# left4me application hardening — threat model

**Status:** living spec, intended input to a hardening implementation plan.
Paired with `2026-05-15-hardening-defenses-survey.md` and
`2026-05-15-hardening-test-plan.md`.

This document establishes *what we defend against and what we accept losing*.
The defenses survey and test plan operationalize this against the codebase.

## Context

The 2026-05-15 work landed deploy-dir-rethink + build-time-idmap and
queued "uid split decision" as the next session's task
(`2026-05-15-user-uid-split-design.md`). An audit of the running 2-user
configuration found that the gameserver's systemd hardening blocks
privilege escalation but leaves the same-uid attack surface wide open:
RCON passwords are stored in plaintext in `/var/lib/left4me/left4me.db`
(readable by srcds), the Flask `SECRET_KEY` sits in
`/etc/left4me/web.env` (also readable), `left4me-server@.service` has no
ptrace block, and there is no `/proc` isolation.
Rather than answer the original "1/2/3 uids" question in isolation,
this work treats application hardening as a first-class refactor: ground
the decision in an explicit threat model, survey the full Linux+systemd
defense menu, test what composes safely with Source engine + the rest of
the stack, then implement.

## Operating posture (assumed)

Solo-operator, single-host infra (`left4.me` / `ovh.left4me`,
141.95.32.8). Host is a personal VPS, not multi-tenant. The only privileged
operator is the user. There are no shell logins as `left4me` or
`l4d2-sandbox`. All access to those uids is funneled through the
systemd-managed units (`left4me-web.service`, `left4me-server@.service`,
`left4me-script-sandbox`). The host runs nothing other than left4me +
ckn-bw-managed baseline (nginx, sshd, fail2ban-class basics).

If those assumptions don't hold (e.g., shared host with other tenants,
non-systemd-mediated access to the uids), revise this document before
proceeding — threat surface changes meaningfully.

## Assets

Ordered by impact-if-compromised. Compromise means the attacker can
exfiltrate, modify, or destroy the asset.

### Tier 1 — catastrophic, no easy recovery

| Asset | Where | Impact of compromise |
|---|---|---|
| Host root | the box | Total compromise of every service on the host. |
| `web.env` Flask `SECRET_KEY` | `/etc/left4me/web.env`, `root:left4me 0640` | Session forgery: attacker logs in as any admin without password. |
| `web.env` Steam Web API key | same | Attacker can query/operate Steam Web API as us. Rate-limited; reputational. |
| Server RCON passwords | DB: `Server.rcon_password` plaintext (`l4d2web/models.py:146-148`) | Attacker can execute arbitrary RCON on every gameserver: `sm_kick`, `rcon say`, server lockup, plugin abuse. |
| User password hashes (bcrypt) | DB: `User.password_digest` (`l4d2web/models.py:31`) | Offline cracking per user. bcrypt slows it but doesn't stop it. |

### Tier 2 — severe but bounded

| Asset | Where | Impact |
|---|---|---|
| `/opt/left4me/src/` Python source | `left4me:left4me` on disk | Persistent backdoor in web app via gunicorn reload. Currently RO from inside the server unit (`ProtectSystem=strict` covers `/opt`); RW from inside the web unit. |
| Overlay content | `/var/lib/left4me/overlays/<id>/` | Persistent sourcemod plugin or replaced binary; surfaces in every gameserver using that overlay. |
| Steam installation | `/var/lib/left4me/installation/` | Tampered `srcds_linux`; trivial persistence. Currently RO from server, RW from web. |
| Sourcemod admin lists | inside overlays | RCON-equivalent: admin commands in-game. |
| Workshop cache | `/var/lib/left4me/workshop_cache/` | Used by builds; tampered content surfaces in next overlay. |

### Tier 3 — limited, recoverable

Job history, build logs, the small subset of in-game state not covered by
the above (e.g., live player slot in a specific match).

## Trust boundaries

Lines we want enforced. "Enforced" = the kernel + systemd, not "the
process politely doesn't cross it."

| Id | From | To | Strength today | Strength wanted |
|---|---|---|---|---|
| TB1 | External network | host shell | Strong (firewall, no extra services) | Strong |
| TB2 | Gameserver process | rest of the host | Weak (same-uid + same-FS view) | Strong |
| TB3 | Web app | rest of the host | Weak (same-uid + same-FS view) | Medium (sudo path inherent) |
| TB4 | Sandbox | rest of the host | Strong (separate uid + hardened unit) | Strong |
| TB5 | Gameserver instance N | gameserver instance M | None (same-uid, same-DB) | Strong |
| TB6 | Web app | gameserver runtime state | None (same-uid, shared `runtime/<n>` access) | Medium (web needs to stage server.cfg) |
| TB7 | Gameserver | web-only secrets (DB, web.env) | None | Strong |
| TB8 | Workshop content | srcds-process | Inherent (content runs as data) | n/a — not a software boundary |

TB2, TB5, TB7 are the highest-leverage gaps. TB6 can only reach Medium
because the web app legitimately writes per-instance config: the target
boundary allows "web writes per-instance config" and denies "web
ptraces srcds".

## Attackers

### A1 — Anonymous external attacker (primary)

Reaches public surfaces:
- gunicorn on `:8000` (behind nginx + admin auth)
- srcds on UDP `:27015`+ per instance (game protocol; no auth)
- Possibly workshop subscription endpoints, if any exist (to be verified).

Capabilities: arbitrary network packets. Goal: code execution on the
host, then exfiltrate secrets and persist.

### A2 — Authenticated admin (operator)

In the assumed posture this is *the user*, single person. Out of scope as
a threat per operator's choice (insider == operator). If admin auth ever
expands to multiple operators, revise.

### A3 — Malicious workshop content

A workshop addon (map, plugin, asset pack) is published to the Steam
workshop and pulled into a build. The content runs inside srcds via
Source engine + sourcemod loading. Capabilities: same as A1 once loaded
into srcds (the engine doesn't have a strong privilege boundary against
its own loaded plugins). Distinct in that the entry vector is curated by
the operator (workshop link added to a blueprint), not arbitrary network
input. Risk floor: the operator vetted the source.

### A4 — Compromised player session

A connected player exploits a Source-engine protocol bug. Functionally a
subset of A1 — same capability set once code is running in srcds.

### A5 — Local attacker on the host

Out of scope per operating posture. No non-root local accounts beyond
the systemd-managed service uids.

### A6 — Steam binary supply-chain

`srcds_linux` is a binary from Valve. A compromised Valve build would
already be running as `left4me` and there's no practical defense at
this layer. Out of scope.

## Attack scenarios

### S1 — L4D2 engine RCE → exfil + persist

A1 sends a crafted packet to srcds; srcds executes attacker code as
`left4me` inside `left4me-server@.service`.

**Today, attacker can:**
- Read DB → all RCON passwords (plaintext), all bcrypt hashes.
- Read `web.env` → SECRET_KEY, Steam API key.
- ptrace gunicorn → in-memory secrets, current sessions.
- Read `/proc/<gunicorn-pid>/environ` → same env as `web.env`.
- ptrace peer `left4me-server@<n>` processes and read their RCON passwords from the shared DB: cross-server compromise.
- `sudo left4me-systemctl|journalctl|overlay` for any instance.

**Today, attacker cannot:**
- Write `/opt/left4me/src/` (`ProtectSystem=strict` covers `/opt`).
- Gain new privileges through `execve` (`NoNewPrivileges`).

**Defended outcome (goal):** Blast radius limited to "this gameserver's
runtime state during this session" — no peer-server compromise, no DB
access, no `web.env` access, no ptrace.
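
Why the reads in S1 succeed today follows from the classic owner/group/other permission check. The sketch below is a simplified model of that check, not the kernel's full logic (it ignores ACLs, capabilities other than root, and MAC). The numeric ids are hypothetical, and it assumes srcds runs with `left4me` as both uid and group.

```python
import stat

def can_read(mode: int, owner_uid: int, owner_gid: int,
             uid: int, gids: set) -> bool:
    """Simplified Unix DAC read check (no ACLs, no MAC)."""
    if uid == 0:
        return True  # root bypasses DAC read checks
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)

# Hypothetical numeric ids, for illustration only.
ROOT, LEFT4ME, SANDBOX = 0, 990, 991
WEB_ENV_MODE = 0o640  # root:left4me 0640, as listed in the assets table

# srcds runs as left4me, so the group-read bit is enough.
srcds_can_read = can_read(WEB_ENV_MODE, ROOT, LEFT4ME, uid=LEFT4ME, gids={LEFT4ME})
# A process under a separate uid/gid falls through to the "other" bits.
sandbox_can_read = can_read(WEB_ENV_MODE, ROOT, LEFT4ME, uid=SANDBOX, gids={SANDBOX})
print(srcds_can_read, sandbox_can_read)
```

Under these assumptions, file modes alone cannot separate srcds from the web app while both share the `left4me` uid and group; the separation has to come from the unit's filesystem view or from a uid split.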

### S2 — Web app RCE → secrets + persistence

A1 finds a Flask vulnerability (Jinja SSTI, SQLAlchemy injection, auth
bypass, file-upload escape). Web executes attacker code as `left4me`
inside `left4me-web.service`.

**Today, attacker can:**
- Read + write DB (web's primary path).
- Read `web.env`.
- Write `/opt/left4me/src/` → backdoor next gunicorn reload.
- `sudo` all helper verbs.
- ptrace srcds peers, modify their `runtime/<n>/` upper layer.
- Modify overlays (writes to `/var/lib/left4me/overlays/`).

**Defended outcome (goal):** Cannot ptrace gameservers; cannot read
`/proc/<srcds-pid>/*`; web compromise still owns its DB and env (its
primary attack surface, so this is *acceptable residual*).

### S3 — Cross-server contamination

S1 played out on srcds@1; attacker pivots to srcds@2.

**Today:** trivial — ptrace srcds@2, read its memory; or just read the
DB to learn srcds@2's RCON password and send commands.

**Defended outcome (goal):** Blocked. Per-instance namespace isolation
(or a per-instance uid) makes the kernel reject the ptrace, and hiding
the DB from the gameserver uid removes access to the RCON password list.

### S4 — Malicious workshop content

The operator adds an A3 addon to a blueprint; the addon includes a
Squirrel/SourceMod plugin that abuses engine APIs to do file I/O or make
network calls.

**Today + with hardening:** functionally equivalent to S1 — the plugin
runs as srcds, same blast radius. No software boundary prevents this;
the only defense is what's outside the unit. So this is *covered* if S1
is covered.

### S5 — Sudoers helper abuse

S1 or S2 attacker uses the sudo grants to widen access.

**Today:** sudoers grants (audit findings, `deploy/files/etc/sudoers.d/left4me`):
- `left4me-systemctl <name> {enable|disable|show}` — any instance, no
  ownership check
- `left4me-journalctl <name>` — read any unit's journal
- `left4me-overlay mount|umount <name>` — any instance
- `left4me-script-sandbox <overlay_id> <script>` — runs as `l4d2-sandbox`

A compromised gameserver can enable/disable peer instances, read their
journals, mount/umount their overlays. Not root escalation, but a
significant escalation.

**Defended outcome:** sudoers reachable only from `left4me-web`. The
gameserver uid (or the gameserver's namespace) gets none of the helper
grants. This is naturally true if the helpers are invoked only by the
web app; ensure the gameserver unit cannot sudo (no PAM, no setuid bits
in its FS view).
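
The "no setuid bits in its FS view" condition can be checked mechanically. The following is a minimal sketch that walks a directory tree and reports regular files carrying a setuid or setgid bit. The function name is hypothetical, and which root to scan (the unit's effective filesystem view) is left as an assumption for the test plan.

```python
import os
import stat

def find_setid_files(root: str) -> list:
    """Return regular files under root that carry a setuid or setgid bit."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # vanished or unreadable entry; skip it
            is_setid = st.st_mode & (stat.S_ISUID | stat.S_ISGID)
            if stat.S_ISREG(st.st_mode) and is_setid:
                hits.append(path)
    return sorted(hits)

if __name__ == "__main__":
    import sys
    for hit in find_setid_files(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(hit)
```

Since `NoNewPrivileges` already stops `execve` from honoring setuid bits, this scan would serve as a second, independent check rather than the primary control.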

### S6 — Sandbox escape

An attacker achieves A1-equivalent code execution inside
`l4d2-script-sandbox`. The sandbox runs as `l4d2-sandbox`, fully
hardened (verified during 2026-05-15 work).

**Today:** sandbox-escape attacker has `l4d2-sandbox` capabilities only.
With build-time-idmap, writes through the bind land on disk as
`left4me`, but the sandbox process itself cannot interact with `left4me`
processes (different uid). Existing isolation is strong.

**Defended outcome:** unchanged — already strong. Document as a
load-bearing invariant; do not weaken.

## What we accept losing

Decisions to *not* defend, with reasoning. Future work might revisit.

- **Kernel CVEs** that escape namespaces or seccomp. No practical defense
  short of moving each workload into its own VM (e.g., under KVM). Out of scope.
- **systemd unit-config CVEs**. Unit hardening relies on systemd
  honoring directives correctly. Out of scope.
- **Steam binary compromise**. `srcds_linux` is Valve's. Out of scope.
- **Sourcemod / Metamod plugin runtime weaknesses**. Plugins run as srcds
  by design. Out of scope.
- **Player IP exposure via game protocol**. Inherent to UDP/Source. Out of
  scope.
- **DoS via game protocol** (`A2S_INFO` flooding etc.). Out of scope for
  *this* effort; covered by network-layer mitigations.
- **DoS via web HTTP**. Covered upstream by nginx + fail2ban; out of
  scope for *this* effort.
- **Host root from operator error** (a misconfigured cron, an admin
  shell). Out of scope; operator is single-person and aware.
- **Long-term forward secrecy** for sessions (an attacker who exfiltrates
  `SECRET_KEY` can forge or replay session cookies until the key is
  rotated). Out of scope; rotation on incident.
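
The `SECRET_KEY` item can be illustrated with a generic HMAC-signed token. This is a sketch of the general technique only; it does not reproduce Flask's actual cookie format, and the key and payload values are made up. It shows that whoever holds the key can mint a token the server accepts, and that rotating the key invalidates tokens signed under the old one.

```python
import hashlib
import hmac
from typing import Optional

def sign(key: bytes, payload: bytes) -> bytes:
    """Append a hex HMAC-SHA256 tag to the payload."""
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest().encode()
    return payload + b"." + tag

def verify(key: bytes, token: bytes) -> Optional[bytes]:
    """Return the payload if the tag matches, else None."""
    payload, _, tag = token.rpartition(b".")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest().encode()
    return payload if hmac.compare_digest(tag, expected) else None

old_key = b"leaked-demo-key"    # hypothetical value
new_key = b"rotated-demo-key"   # hypothetical value

# An attacker holding the leaked key can sign any identity they like.
forged = sign(old_key, b'{"user": "admin"}')
accepted_before_rotation = verify(old_key, forged)
accepted_after_rotation = verify(new_key, forged)
print(accepted_before_rotation, accepted_after_rotation)
```

Rotation closes the forgery window from that point on; it does nothing about whatever the attacker did while the old key was valid, which is the residual this item accepts.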

## What we defend (prioritized)

D1 — **Gameserver RCE cannot exfiltrate DB or web.env**, including RCON
passwords and SECRET_KEY. Highest value: catastrophic asset, plausible
attack (L4D2 engine RCE is the canonical "old engine, public traffic"
risk).

D2 — **Gameserver RCE cannot ptrace web app or peer gameservers**. Blocks
in-memory secret theft and cross-server contamination.

D3 — **Gameserver RCE cannot use sudo helpers** for instances other
than its own (or, ideally, cannot use sudo at all).

D4 — **Web app RCE cannot ptrace gameservers**. Symmetric to D2; web
still has full DB access (acceptable residual since it's the web app's
own data).

D5 — **Cross-server contamination blocked at the kernel level**.
Per-instance namespaces or per-instance uid.

D6 — **Persistent compromise of `/opt/left4me/src/` blocked from
gameserver context**. Already partially true via `ProtectSystem=strict`;
maintain.

D7 — **All defenses survive a unit-config refactor in the wrong
direction** — e.g., a future developer adding `ReadWritePaths=` widely.
Achieved via tests that assert hardening invariants
(`deploy/tests/test_deploy_artifacts.py`).
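
An invariant test of the kind D7 describes might look like the sketch below. It is an assumption about shape, not the contents of `test_deploy_artifacts.py`: the helper names and the forbidden-path set are hypothetical, and the parser deliberately ignores line continuations and drop-ins. `NoNewPrivileges` and `ProtectSystem=strict` are the two directives this document already relies on for the server unit.

```python
TRUTHY = {"1", "yes", "true", "on"}
# Hypothetical policy: paths that must never become writable from the unit.
FORBIDDEN_RW = {"/", "/opt", "/etc"}

def parse_service_section(unit_text: str) -> dict:
    """Collect key -> [values] from the [Service] section of a unit file."""
    settings = {}
    in_service = False
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        if line.startswith("["):
            in_service = line == "[Service]"
        elif in_service and "=" in line:
            key, _, value = line.partition("=")
            settings.setdefault(key.strip(), []).append(value.strip())
    return settings

def hardening_violations(unit_text: str) -> list:
    """Return human-readable problems; an empty list means invariants hold."""
    service = parse_service_section(unit_text)
    problems = []
    # For scalar settings the last assignment wins.
    if (service.get("NoNewPrivileges") or [""])[-1].lower() not in TRUTHY:
        problems.append("NoNewPrivileges is not enabled")
    if (service.get("ProtectSystem") or [""])[-1] != "strict":
        problems.append("ProtectSystem is not 'strict'")
    writable = []
    for value in service.get("ReadWritePaths", []):
        if value == "":
            writable = []  # an empty assignment resets the list
        else:
            writable += [p.lstrip("-+") for p in value.split()]
    for path in writable:
        normalized = path.rstrip("/") or "/"
        if normalized in FORBIDDEN_RW:
            problems.append(f"ReadWritePaths exposes {path}")
    return problems
```

A pytest case could then read the shipped unit file and assert `hardening_violations(text) == []`, so a well-meaning `ReadWritePaths=/opt` fails CI instead of silently widening the attack surface. The exact-match path check is intentionally coarse; a real test would likely want prefix-aware rules.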

## Acceptable user-experience cost

- **Unit start latency**: +5s tolerable; +30s not.
- **Memory overhead**: +tens of MB per unit fine; +hundreds not.
- **Operational complexity**: one well-documented unit-template
  hardening profile reusable across units. Acceptable trade-off.
- **Debugging cost**: SECCOMP audit log discoverability via
  `journalctl -k` acceptable. ptrace-based debugging in production
  unnecessary; can re-enable via ad-hoc drop-in if needed.
- **Steam updates / pip installs**: must continue to work without
  per-update operator action. Privileged paths (steamcmd self-update)
  can run as `left4me` outside the unit if needed; document.
- **Workshop content**: must continue to load. Builds run in the
  sandbox; the gameserver only reads pre-built overlays.

## Acceptance criteria for the implementation

The final composition (hardening directives + any uid changes) must:

1. **Functionally**: pass the smoke matrix from `2026-05-15-hardening-test-plan.md` (RCON, build, restart, file upload, multi-server, workshop).
2. **Defenses verified**:
   - srcds cannot read `/var/lib/left4me/left4me.db` or `/etc/left4me/web.env` (file not in FS view, or kernel denies)
   - srcds cannot ptrace gunicorn or peer srcds (syscall blocked, or kernel rejects across namespaces/uids)
   - srcds cannot read `/proc/<other-pid>/*`
   - web cannot ptrace srcds (symmetric)
3. **No regressions**: existing test suite passes
   (`pytest deploy/tests/test_overlay_helper.py l4d2host/tests/`).
4. **Auditable**: invariants asserted in `deploy/tests/test_deploy_artifacts.py`; baseline `systemd-analyze security` score recorded.
5. **Documentable**: one paragraph per directive in the unit, explaining
   *why* it's there. Future maintainers can reason about removal.
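
The read-denial checks in criterion 2 could be exercised with a small probe run from inside the unit's execution context. The sketch below only attempts read-only opens and reports the outcome; the function name is hypothetical, and how the probe gets injected into the unit is left to the test plan.

```python
import errno
import os

def probe_read(paths):
    """Try a read-only open on each path.

    Returns a dict mapping path -> "readable" or the errno name
    (for example "EACCES" or "ENOENT").
    """
    results = {}
    for path in paths:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError as exc:
            results[path] = errno.errorcode.get(exc.errno, str(exc.errno))
        else:
            os.close(fd)
            results[path] = "readable"
    return results

if __name__ == "__main__":
    targets = [
        "/var/lib/left4me/left4me.db",
        "/etc/left4me/web.env",
        "/proc/1/environ",  # stand-in for "some other process"
    ]
    for path, outcome in probe_read(targets).items():
        print(f"{outcome:10} {path}")
```

Either outcome named in criterion 2 would show up here: `ENOENT` when the file is absent from the unit's filesystem view, `EACCES` when the kernel denies the open. A result of `readable` for any of these targets would be a failed acceptance check.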

## Open questions to clarify with the operator

Before the defenses survey is final, clarify:

1. **Is gunicorn directly internet-reachable, or behind nginx?** The unit
   binds `127.0.0.1:8000` (per `metadata.py:208`); presumably nginx
   terminates TLS and forwards. Confirm.
2. **Auth model**: who can log into the web app? Is admin auth strong
   (long passwords, 2FA), or default-grade? Defines how realistic S2 is.
3. **Workshop content sources**: curated by operator, or arbitrary
   workshop subscriptions exposed to admins? Defines A3's realism.
4. **Test bench**: is `ckn@10.0.4.128` a real separate test host, or
   ovh.left4me the only deployment target? Affects test plan choices.
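5. **`kernel.yama.ptrace_scope` setting on the host?** Do not assume a
   default: Ubuntu ships 1, but Debian kernels have historically defaulted
   to 0. Check with `sysctl kernel.yama.ptrace_scope`; we may want 2
   system-wide.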
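6. **Is the host running AppArmor?** Debian has enabled AppArmor by
   default since Buster, so a stock Trixie install should already have it
   active, but a provider image or custom kernel may differ. Confirm with
   `aa-status` before planning AppArmor profiles for srcds (in addition to
   systemd directives).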
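
For the `ptrace_scope` question, a quick way to record the host's actual value is to read the Yama sysctl file directly. The sketch below does that and maps the value to the meaning given in the kernel's Yama documentation; the function names are hypothetical.

```python
from pathlib import Path

# Meanings per the kernel's Yama LSM documentation.
YAMA_SCOPES = {
    0: "classic: same-uid processes may attach to each other",
    1: "restricted: only descendants (or PR_SET_PTRACER opt-ins) may be traced",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: ptrace attach disabled until reboot",
}

def read_ptrace_scope(path="/proc/sys/kernel/yama/ptrace_scope"):
    """Return the integer scope, or None if Yama is unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None

def describe(scope):
    if scope is None:
        return "unknown: Yama sysctl not available"
    return YAMA_SCOPES.get(scope, f"unexpected value {scope}")

if __name__ == "__main__":
    print(describe(read_ptrace_scope()))
```

A value of 0 would be consistent with the same-uid ptrace paths listed in S1 to S3; a value of 1 or higher would already block some of them host-wide, independent of any unit-level directive.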

## Pointers

- Audit synthesis (this session's conversation): unit hardening profile
  `deploy/files/usr/local/lib/systemd/system/left4me-server@.service`,
  metadata reactor `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`,
  filesystem ACLs `~/Projekte/ckn-bw/bundles/left4me/items.py:21-115`,
  DB schema `l4d2web/models.py:31, 146-148`, sudoers
  `deploy/files/etc/sudoers.d/left4me`.
- Original uid-split spec: `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
  — remains open; this work may supersede it.
- Companion docs:
  `docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md`,
  `docs/superpowers/specs/2026-05-15-hardening-test-plan.md`.
- Related work landed this session:
  `docs/superpowers/plans/2026-05-15-build-time-idmap.md`,
  `docs/superpowers/plans/2026-05-15-deploy-dir-rethink.md`.