left4me/docs/superpowers/specs/2026-05-15-hardening-threat-model.md
mwiegand 8971b23617
refactor(sandbox): collapse l4d2-sandbox user into left4me
The hardening refactor that just landed closes the same-uid attack
surface (FS view, ptrace, /proc visibility, signals) for the web +
gameserver units via systemd directives plus system-wide
kernel.yama.ptrace_scope=2. Keeping the script-sandbox on a separate
uid was the inconsistent half-step — defense-in-depth only, with
build-time-idmap complexity attached. One principle wins: harden
once, share the uid.

scripts/libexec/left4me-script-sandbox: drop the idmap block (uid
lookups, STAGING setup, cleanup_staging trap, mount --bind
--map-users), switch User=/Group= to left4me, point BindPaths at
\$OVERLAY_DIR directly. Header comment updated to reflect
hardening-not-uid as the same-uid defense. nsenter self-wrap kept —
it's about mount-namespace escape, not uid.

Tests + comments + companion docs updated. Build-time-idmap and
overlay-idmap plans marked SUPERSEDED; user-uid-split spec revised
to "1 user is correct"; one-line update notes on the hardening
specs and the build-overlay-unit-design.

Companion ckn-bw commit removes the l4d2-sandbox user + group and
tightens /var/lib/left4me from 0711 → 0755 (the traverse-only mode
was specifically for the sandbox uid).
2026-05-15 15:50:57 +02:00


left4me application hardening — threat model

Status: living spec, intended input to a hardening implementation plan. Paired with 2026-05-15-hardening-defenses-survey.md and 2026-05-15-hardening-test-plan.md.

Updated 2026-05-15: l4d2-sandbox was collapsed into left4me after the hardening refactor landed (see docs/superpowers/plans/2026-05-15-uid-collapse.md). The same-uid threat surface that doc accepts is the same surface this model already documents for server/web; the sandbox is now covered by the same hardening profile.

This document establishes what we defend against and what we accept losing. The defenses survey and test plan operationalize this against the codebase.

Context

The 2026-05-15 work landed deploy-dir-rethink + build-time-idmap and queued a "uid split decision" as the next session's task (2026-05-15-user-uid-split-design.md). An audit of the running two-user configuration found that the gameserver's systemd hardening blocks privilege escalation but leaves the same-uid attack surface wide open:

  • RCON passwords are stored in plaintext in /var/lib/left4me/left4me.db, which srcds can read.
  • The Flask SECRET_KEY sits in /etc/left4me/web.env, which srcds can also read.
  • left4me-server@.service has no ptrace block.
  • There is no /proc isolation.

Rather than answer the original "1/2/3 uids" question in isolation, this work treats application hardening as a first-class refactor. The plan is to ground the decision in an explicit threat model, survey the full Linux+systemd defense menu, and test what composes safely with the Source engine and the rest of the stack. Only then do we implement.

Operating posture (assumed)

Solo-operator, single-host infra (left4.me / ovh.left4me, 141.95.32.8). Host is a personal VPS, not multi-tenant. The only privileged operator is the user. There are no shell logins as left4me or l4d2-sandbox. All access to those uids is funneled through the systemd-managed units (left4me-web.service, left4me-server@.service, left4me-script-sandbox). The host runs nothing other than left4me + ckn-bw-managed baseline (nginx, sshd, fail2ban-class basics).

Revise this document before proceeding if those assumptions don't hold, because the threat surface then changes meaningfully. Examples are a shared host with other tenants, or access to the uids that is not mediated by systemd.

Assets

Ordered by impact-if-compromised. Compromise means the attacker can exfiltrate, modify, or destroy the asset.

Tier 1 — catastrophic, no easy recovery

| Asset | Where | Impact of compromise |
|---|---|---|
| Host root | the box | Total compromise of every service on the host. |
| web.env Flask SECRET_KEY | /etc/left4me/web.env, root:left4me 0640 | Session forgery: attacker logs in as any admin without a password. |
| web.env Steam Web API key | same | Attacker can query/operate the Steam Web API as us. Rate-limited; reputational. |
| Server RCON passwords | DB: Server.rcon_password plaintext (l4d2web/models.py:146-148) | Attacker can execute arbitrary RCON on every gameserver: sm_kick, rcon say, server lockup, plugin abuse. |
| User password hashes (bcrypt) | DB: User.password_digest (l4d2web/models.py:31) | Offline cracking per user. bcrypt slows it but doesn't stop it. |

Tier 2 — severe but bounded

| Asset | Where | Impact |
|---|---|---|
| /opt/left4me/src/ Python source | left4me:left4me on disk | Persistent backdoor in the web app via gunicorn reload. Currently RO from inside the server unit (ProtectSystem=strict covers /opt); RW from inside the web unit. |
| Overlay content | /var/lib/left4me/overlays/<id>/ | Persistent sourcemod plugin or replaced binary; surfaces in every gameserver using that overlay. |
| Steam installation | /var/lib/left4me/installation/ | Tampered srcds_linux; trivial persistence. Currently RO from server, RW from web. |
| Sourcemod admin lists | inside overlays | RCON-equivalent: admin commands in-game. |
| Workshop cache | /var/lib/left4me/workshop_cache/ | Used by builds; tampered content surfaces in the next overlay. |

Tier 3 — limited, recoverable

Job history, build logs, the small subset of in-game state not covered by the above (e.g., live player slot in a specific match).

Trust boundaries

Lines we want enforced. "Enforced" = the kernel + systemd, not "the process politely doesn't cross it."

| Id | From | To | Strength today | Strength wanted |
|---|---|---|---|---|
| TB1 | External network | host shell | Strong (firewall, no extra services) | Strong |
| TB2 | Gameserver process | rest of the host | Weak (same-uid + same-FS view) | Strong |
| TB3 | Web app | rest of the host | Weak (same-uid + same-FS view) | Medium (sudo path inherent) |
| TB4 | Sandbox | rest of the host | Strong (separate uid + hardened unit) | Strong |
| TB5 | Gameserver instance N | gameserver instance M | None (same-uid, same-DB) | Strong |
| TB6 | Web app | gameserver runtime state | None (same-uid, shared runtime/<n> access) | Medium (web needs to stage server.cfg) |
| TB7 | Gameserver | web-only secrets (DB, web.env) | None | Strong |
| TB8 | Workshop content | srcds process | Inherent (content runs as data) | n/a (not a software boundary) |

TB2, TB5 and TB7 are the highest-leverage gaps. TB6 can only ever be partial, because the web app legitimately writes per-instance config. The intended boundary allows "web can write per-instance config" and denies "web can ptrace srcds".
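The minimal Python sketch below encodes the table on an ordinal scale, which makes the "highest-leverage" reading explicit. Several parts are our interpretation of the table, not definitions from this spec:

  • The numeric scale: None < Weak < Medium < Strong.
  • TB8 treated as unscored.
  • The rule "target is Strong and today falls short".

All names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Strength(IntEnum):
    """Assumed ordinal scale for the strengths used in the TB table."""
    NONE = 0
    WEAK = 1
    MEDIUM = 2
    STRONG = 3


@dataclass(frozen=True)
class Boundary:
    ident: str
    today: Optional[Strength]   # None = not a software boundary (TB8)
    wanted: Optional[Strength]


BOUNDARIES = [
    Boundary("TB1", Strength.STRONG, Strength.STRONG),
    Boundary("TB2", Strength.WEAK, Strength.STRONG),
    Boundary("TB3", Strength.WEAK, Strength.MEDIUM),
    Boundary("TB4", Strength.STRONG, Strength.STRONG),
    Boundary("TB5", Strength.NONE, Strength.STRONG),
    Boundary("TB6", Strength.NONE, Strength.MEDIUM),
    Boundary("TB7", Strength.NONE, Strength.STRONG),
    Boundary("TB8", None, None),
]


def open_gaps(boundaries):
    """Scored boundaries whose current strength falls short of the target."""
    return [b for b in boundaries
            if b.today is not None and b.wanted is not None
            and b.today < b.wanted]


def highest_leverage(boundaries):
    """Open gaps whose target is Strong, i.e. full kernel/systemd enforcement."""
    return [b.ident for b in open_gaps(boundaries)
            if b.wanted == Strength.STRONG]
```

Under that reading, the filter returns TB2, TB5 and TB7. TB3 and TB6 still show up in open_gaps(), but their target is deliberately capped at Medium.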

Attackers

A1 — Anonymous external attacker (primary)

Reaches public surfaces:

  • gunicorn on :8000 (behind nginx + admin auth)
  • srcds on UDP :27015+ per instance (game protocol; no auth)
  • (Maybe: workshop subscription endpoints if any; check.)

Capabilities: arbitrary network packets. Goal: code execution on the host, then exfiltrate secrets and persist.

A2 — Authenticated admin (operator)

In the assumed posture this is the user, a single person. Out of scope as a threat by the operator's choice (insider == operator). If admin auth ever expands to multiple operators, revise.

A3 — Malicious workshop content

A workshop addon (map, plugin, asset pack) is published to the Steam workshop and pulled into a build. The content runs inside srcds via Source engine + sourcemod loading. Capabilities: same as A1 once loaded into srcds (the engine doesn't have a strong privilege boundary against its own loaded plugins). Distinct in that the entry vector is curated by the operator (workshop link added to a blueprint), not arbitrary network input. Risk floor: the operator vetted the source.

A4 — Compromised player session

A connected player exploits a Source-engine protocol bug. Functionally a subset of A1 — same capability set once code is running in srcds.

A5 — Local attacker on the host

Out of scope per operating posture. No non-root local accounts beyond the systemd-managed service uids.

A6 — Steam binary supply-chain

srcds_linux is a binary from Valve. A compromised Valve build would already be running as left4me and there's no practical defense at this layer. Out of scope.

Attack scenarios

S1 — L4D2 engine RCE → exfil + persist

A1 sends a crafted packet to srcds; srcds executes attacker code as left4me inside left4me-server@.service.

Today, attacker can:

  • Read DB → all RCON passwords (plaintext), all bcrypt hashes.
  • Read web.env → SECRET_KEY, Steam API key.
  • ptrace gunicorn → in-memory secrets, current sessions.
  • Read /proc/<gunicorn-pid>/environ → same env as web.env.
  • ptrace + read DB of peer left4me-server@<n> — cross-server compromise.
  • sudo left4me-systemctl|journalctl|overlay for any instance.
  • Cannot write /opt/left4me/src/ (ProtectSystem=strict covers /opt).
  • Cannot acquire new caps (NoNewPrivileges).

Defended outcome (goal): Blast radius limited to "this gameserver's runtime state during this session" — no peer-server compromise, no DB access, no web.env access, no ptrace.
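The S1 defended outcome can be checked from inside the unit by attempting the reads an attacker would make. The minimal sketch below assumes it runs with the same unit properties as a left4me-server@ instance; how it gets launched there is left open. probe_read(), violations() and the foreign-PID argument are hypothetical names. The sketch only covers file reads and /proc/<pid>/environ. ptrace needs a live target and is not attempted here.

```python
from pathlib import Path

# Secret-bearing paths named in this spec. The probe only reads; it never writes.
SECRET_PATHS = (
    Path("/var/lib/left4me/left4me.db"),
    Path("/etc/left4me/web.env"),
)


def probe_read(path):
    """Classify whether this process can read one byte of `path`.

    "readable": open and read succeeded (a defense failure).
    "denied":   the kernel refused (EACCES/EPERM).
    "absent":   the path does not exist in this process's FS view.
    """
    try:
        with open(path, "rb") as fh:
            fh.read(1)
    except PermissionError:
        return "denied"
    except FileNotFoundError:
        return "absent"
    return "readable"


def probe_proc_environ(pid):
    """Try to read another process's environment, e.g. gunicorn's."""
    return probe_read(Path(f"/proc/{pid}/environ"))


def violations(paths=SECRET_PATHS, foreign_pids=()):
    """Return (target, result) pairs that break the S1 defended outcome."""
    results = [(str(p), probe_read(p)) for p in paths]
    results += [(f"/proc/{pid}/environ", probe_proc_environ(pid))
                for pid in foreign_pids]
    return [(target, res) for target, res in results if res == "readable"]
```

An empty violations() result is the pass condition. "denied" and "absent" both count as defended, which matches the "file not in FS view, or kernel denies" wording in the acceptance criteria.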

S2 — Web app RCE → secrets + persistence

A1 finds a Flask vulnerability (Jinja SSTI, SQLAlchemy injection, auth bypass, file-upload escape). Web executes attacker code as left4me inside left4me-web.service.

Today, attacker can:

  • Read + write DB (web's primary path).
  • Read web.env.
  • Write /opt/left4me/src/ → backdoor next gunicorn reload.
  • sudo all helper verbs.
  • ptrace srcds peers, modify their runtime/<n>/ upper layer.
  • Modify overlays (writes to /var/lib/left4me/overlays/).

Defended outcome (goal): Cannot ptrace gameservers; cannot read /proc/<srcds-pid>/*; web compromise still owns its DB and env (its primary attack surface, so this is acceptable residual).

S3 — Cross-server contamination

S1 played out on srcds@1; attacker pivots to srcds@2.

Today: trivial — ptrace srcds@2, read its memory; or just read the DB to learn srcds@2's RCON password and send commands.

Defended outcome (goal): Blocked. With per-instance namespace isolation (or a per-instance uid), the kernel rejects the ptrace. Keeping the DB out of the gameserver's view hides the RCON list.

S4 — Malicious workshop content

An addon carrying A3 content is added to a blueprint. It includes a Squirrel/SourceMod plugin that abuses engine APIs to do file I/O or make network calls.

Today + with hardening: functionally equivalent to S1 — the plugin runs as srcds, same blast radius. No software boundary prevents this; the only defense is what's outside the unit. So this is covered if S1 is covered.

S5 — Sudoers helper abuse

S1 or S2 attacker uses the sudo grants to widen access.

Today: sudoers grants (audit findings, deploy/files/etc/sudoers.d/left4me):

  • left4me-systemctl <name> {enable|disable|show} — any instance, no ownership check
  • left4me-journalctl <name> — read any unit's journal
  • left4me-overlay mount|umount <name> — any instance
  • left4me-script-sandbox <overlay_id> <script> — runs as l4d2-sandbox

A compromised gameserver can enable/disable peer instances, read their journals, and mount/umount their overlays. This is not a root escalation, but it is still a significant widening of access.

Defended outcome: sudoers reachable only from left4me-web. The gameserver uid (or the gameserver's namespace) gets none of the helper grants. This is naturally true if the helpers are invoked only by the web app; ensure the gameserver unit cannot sudo (no PAM, no setuid bits in its FS view).
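One way to confirm "no setuid bits in its FS view" is to walk the filesystem from inside the gameserver unit and list setuid/setgid regular files. A hit on sudo, or on any other setuid binary, means the helper path is still visible. Below is a minimal sketch with a hypothetical setuid_files() helper. With NoNewPrivileges= in effect, the kernel already ignores setuid/setgid bits on execve(). This scan is therefore a defense-in-depth check rather than the primary control.

```python
import os
import stat
from pathlib import Path


def setuid_files(root):
    """Yield regular files under `root` that carry the setuid or setgid bit.

    Symlinks are not followed and unreadable directories are skipped, so
    the result reflects only what this process can actually see.
    """
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                mode = path.lstat().st_mode
            except OSError:
                continue  # vanished or unreadable; ignore
            if stat.S_ISREG(mode) and mode & (stat.S_ISUID | stat.S_ISGID):
                yield path
```

Given the goal stated above, list(setuid_files("/usr")) run inside the unit should come back empty once the view is locked down.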

S6 — Sandbox escape

An attacker reaches A1-equivalent code execution in left4me-script-sandbox. The sandbox runs as l4d2-sandbox and is fully hardened (verified during the 2026-05-15 work).

Today: sandbox-escape attacker has l4d2-sandbox capabilities only. With build-time-idmap, writes through the bind land on disk as left4me, but the sandbox process itself cannot interact with left4me processes (different uid). Existing isolation is strong.

Defended outcome: unchanged, already strong. Document as a load-bearing invariant; do not weaken.

What we accept losing

Decisions to not defend, with reasoning. Future work might revisit.

  • Kernel CVEs that escape namespaces or seccomp. No practical defense short of running on a hypervisor + KVM. Out of scope.
  • systemd unit-config CVEs. Unit hardening relies on systemd honoring directives correctly. Out of scope.
  • Steam binary compromise. srcds_linux is Valve's. Out of scope.
  • Sourcemod / Metamod plugin runtime weaknesses. Plugins run as srcds by design. Out of scope.
  • Player IP exposure via game protocol. Inherent to UDP/Source. Out of scope.
  • DoS via game protocol (A2S_INFO flooding etc.). Out of scope for this effort; covered by network-layer mitigations.
  • DoS via web HTTP. Covered upstream by nginx + fail2ban; out of scope for this effort.
  • Host root from operator error (a misconfigured cron, an admin shell). Out of scope; operator is single-person and aware.
  • Long-term forward secrecy for past sessions (an attacker who exfils SECRET_KEY can replay past sessions). Out of scope; rotation on incident.

What we defend (prioritized)

D1 — Gameserver RCE cannot exfiltrate DB or web.env, including RCON passwords and SECRET_KEY. Highest value: catastrophic asset, plausible attack (L4D2 engine RCE is the canonical "old engine, public traffic" risk).

D2 — Gameserver RCE cannot ptrace web app or peer gameservers. Blocks in-memory secret theft and cross-server contamination.

D3 — Gameserver RCE cannot use sudo helpers for instances other than its own (or, ideally, cannot use sudo at all).

D4 — Web app RCE cannot ptrace gameservers. Symmetric to D2; web still has full DB access (acceptable residual since it's the web app's own data).

D5 — Cross-server contamination blocked at the kernel level, via per-instance namespaces or a per-instance uid.

D6 — Persistent compromise of /opt/left4me/src/ blocked from gameserver context. Already partially true via ProtectSystem=strict; maintain.

D7 — All defenses survive a unit-config refactor in the wrong direction — e.g., a future developer adding ReadWritePaths= widely. Achieved via tests that assert hardening invariants (deploy/tests/test_deploy_artifacts.py).
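D7 relies on tests over the unit files. The minimal sketch below assumes the unit is available as text. parse_unit(), REQUIRED, FORBIDDEN_WRITABLE and hardening_violations() are illustrative names, not the contents of deploy/tests/test_deploy_artifacts.py. The two required settings are the ones this spec already relies on (NoNewPrivileges, ProtectSystem=strict); the real list would be longer.

```python
from collections import defaultdict


def parse_unit(text):
    """Parse a systemd unit into {section: {key: [values, ...]}}.

    Repeated keys are kept in order (systemd allows e.g. several
    ReadWritePaths= lines). Backslash line continuations are not handled.
    """
    sections = defaultdict(lambda: defaultdict(list))
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            continue
        key, sep, value = line.partition("=")
        if current is None or not sep:
            raise ValueError(f"unparseable line: {raw!r}")
        sections[current][key.strip()].append(value.strip())
    return sections


# Hypothetical invariant set for left4me-server@.service.
REQUIRED = {"NoNewPrivileges": "yes", "ProtectSystem": "strict"}
# Paths that must never become writable from the gameserver unit.
FORBIDDEN_WRITABLE = ("/opt/left4me", "/etc/left4me", "/var/lib/left4me/left4me.db")


def _overlaps(path, protected):
    """True if `path` is `protected`, one of its parents, or inside it."""
    a, b = path.rstrip("/"), protected.rstrip("/")
    return a == b or b.startswith(a + "/") or a.startswith(b + "/")


def hardening_violations(text):
    service = parse_unit(text)["Service"]
    problems = []
    for key, want in REQUIRED.items():
        got = service.get(key, [])
        if not got or got[-1] != want:  # later assignments override earlier ones
            problems.append(f"{key}= should be {want!r}, got {got!r}")
    for entry in service.get("ReadWritePaths", []):
        for path in entry.split():
            path = path.lstrip("-+")  # drop systemd's optional path prefixes
            if any(_overlaps(path, p) for p in FORBIDDEN_WRITABLE):
                problems.append(f"ReadWritePaths= exposes {path}")
    return problems
```

In a pytest module this reduces to assert hardening_violations(unit_path.read_text()) == [], with unit_path pointing at deploy/files/usr/local/lib/systemd/system/left4me-server@.service. The check looks at parents of the protected paths as well as their children. That is what catches a broad ReadWritePaths=/opt or ReadWritePaths=/ added later.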

Acceptable user-experience cost

  • Unit start latency: +5s tolerable; +30s not.
  • Memory overhead: +tens of MB per unit fine; +hundreds not.
  • Operational complexity: one well-documented unit-template hardening profile reusable across units. Acceptable trade-off.
  • Debugging cost: SECCOMP audit log discoverability via journalctl -k acceptable. ptrace-based debugging in production unnecessary; can re-enable via ad-hoc drop-in if needed.
  • Steam updates / pip installs: must continue to work without per-update operator action. Privileged paths (steamcmd self-update) can run as left4me outside the unit if needed; document.
  • Workshop content: must continue to load. Builds run in the sandbox; the gameserver only reads pre-built overlays.

Acceptance criteria for the implementation

The final composition (hardening directives + any uid changes) must:

  1. Functionally: pass the smoke matrix from 2026-05-15-hardening-test-plan.md (RCON, build, restart, file upload, multi-server, workshop).
  2. Defenses verified:
    • srcds cannot read /var/lib/left4me/left4me.db or /etc/left4me/web.env (file not in FS view, or kernel denies)
    • srcds cannot ptrace gunicorn or peer srcds (syscall blocked, or kernel rejects across namespaces/uids)
    • srcds cannot read /proc/<other-pid>/*
    • web cannot ptrace srcds (symmetric)
  3. No regressions: existing test suite passes (pytest deploy/tests/test_overlay_helper.py l4d2host/tests/).
  4. Auditable: invariants asserted in deploy/tests/test_deploy_artifacts.py; baseline systemd-analyze security score recorded.
  5. Documentable: one paragraph per directive in the unit, explaining why it's there. Future maintainers can reason about removal.
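Criterion 4 asks for a recorded systemd-analyze security baseline. The minimal sketch below captures the overall score. It assumes the summary line keeps the shape shown in the comment; this is human-oriented output, so treat the regex as fragile. exposure() and parse_exposure() are hypothetical names. For the template unit, analyze a concrete instance such as left4me-server@1.service (the instance name is assumed).

```python
import re
import subprocess

# Matches the summary line of `systemd-analyze security <unit>`, which looks like
# "→ Overall exposure level for example.service: 9.6 UNSAFE" (example values).
_SUMMARY = re.compile(
    r"Overall exposure level for (?P<unit>\S+): "
    r"(?P<score>\d+(?:\.\d+)?) (?P<label>[A-Z]+)"
)


def parse_exposure(output):
    """Extract (unit, score, label) from systemd-analyze security output."""
    match = _SUMMARY.search(output)
    if match is None:
        raise ValueError("no 'Overall exposure level' line found")
    return match["unit"], float(match["score"]), match["label"]


def exposure(unit):
    """Run systemd-analyze security for one unit and parse its overall score."""
    result = subprocess.run(
        ["systemd-analyze", "security", "--no-pager", unit],
        capture_output=True, text=True, check=True,
    )
    return parse_exposure(result.stdout)
```

Recording the tuple per unit before and after the hardening work gives the baseline the criterion asks for.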

Open questions to clarify with the operator

Before the defenses survey is final, clarify:

  1. Is gunicorn directly internet-reachable, or behind nginx? The unit binds 127.0.0.1:8000 (per metadata.py:208); presumably nginx terminates TLS and forwards. Confirm.
  2. Auth model: who can log into the web app? Is admin auth strong (long passwords, 2FA), or default-grade? Defines how realistic S2 is.
  3. Workshop content sources: curated by operator, or arbitrary workshop subscriptions exposed to admins? Defines A3's realism.
  4. Test bench: is ckn@10.0.4.128 a real separate test host, or ovh.left4me the only deployment target? Affects test plan choices.
  5. kernel.yama.ptrace_scope setting on the host? Default Debian is 1; we may want 2 system-wide.
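  6. Is the host running AppArmor? Debian has enabled AppArmor by default since Debian 10 (Buster), so a stock Trixie install should have it active unless the provider image disables it; confirm with aa-status. If we want AppArmor profiles for srcds (in addition to systemd directives), they need to be written and loaded, and AppArmor must actually be enabled on this host.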
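Question 5 can be answered directly on the host by reading the sysctl through /proc. In the minimal sketch below, the value meanings follow the kernel's Yama documentation. The names ptrace_scope() and meets_target() are assumptions of this sketch, as is the "at least 2" default, which is the system-wide value question 5 floats.

```python
from pathlib import Path

# Meaning of kernel.yama.ptrace_scope values, per the kernel's Yama documentation.
PTRACE_SCOPES = {
    0: "classic: a process may ptrace any same-uid (dumpable) process",
    1: "restricted: only descendants or PR_SET_PTRACER-declared tracers",
    2: "admin-only: only processes with CAP_SYS_PTRACE",
    3: "no attach: ptrace attach disabled; cannot be lowered until reboot",
}

SCOPE_FILE = Path("/proc/sys/kernel/yama/ptrace_scope")


def ptrace_scope(path=SCOPE_FILE):
    """Return the current Yama ptrace_scope, or None if Yama is not present."""
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None


def meets_target(scope, minimum=2):
    """True if the host is at least as strict as `minimum` (2 = admin-only)."""
    return scope is not None and scope >= minimum
```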

Pointers

  • Audit synthesis (this session's conversation): unit hardening profile deploy/files/usr/local/lib/systemd/system/left4me-server@.service, metadata reactor ~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+, filesystem ACLs ~/Projekte/ckn-bw/bundles/left4me/items.py:21-115, DB schema l4d2web/models.py:31, 146-148, sudoers deploy/files/etc/sudoers.d/left4me.
  • Original uid-split spec: docs/superpowers/specs/2026-05-15-user-uid-split-design.md — remains open; this work may supersede it.
  • Companion docs: docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md, docs/superpowers/specs/2026-05-15-hardening-test-plan.md.
  • Related work landed this session: docs/superpowers/plans/2026-05-15-build-time-idmap.md, docs/superpowers/plans/2026-05-15-deploy-dir-rethink.md.