From 62cf6cdd56fa5113d196e13b9ca4cc8b9522ffa4 Mon Sep 17 00:00:00 2001
From: mwiegand
Date: Fri, 15 May 2026 01:58:09 +0200
Subject: [PATCH] spec: handoff for revisiting 1/2/3-user split for left4me

The 2-user split (left4me + l4d2-sandbox) has been inherited as a
constraint across multiple recent plans (idmap-on-mount,
build-time-idmap, helper consolidation) without ever being designed
end-to-end. Three plausible configurations: collapse to 1 user
(rejected for security), keep at 2 users (status quo), or split web
from game into 3 users for blast-radius limiting on either side.

Doc captures the threat-model heuristics, cross-uid file-access
plumbing options (shared group vs. world-read), idmap implications, a
step-by-step migration sketch for the 3-user variant, and explicit
out-of-scope items (per-instance gameserver uids, etc.). Detailed
enough that a future session can pick a configuration and execute
without re-deriving the design space.

Co-Authored-By: Claude Opus 4.7
---
 .../specs/2026-05-15-user-uid-split-design.md | 480 ++++++++++++++++++
 1 file changed, 480 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-05-15-user-uid-split-design.md

diff --git a/docs/superpowers/specs/2026-05-15-user-uid-split-design.md b/docs/superpowers/specs/2026-05-15-user-uid-split-design.md
new file mode 100644
index 0000000..2e150ba
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-15-user-uid-split-design.md
@@ -0,0 +1,480 @@
+# How many system users should left4me have? 1, 2, or 3
+
+**Status: open question, not settled design.** This is a handoff
+document. Today left4me has 2 system users: `left4me` (web app +
+gameservers + workshop builds) and `l4d2-sandbox` (script-overlay
+sandbox). Whether that split is correct (should we collapse, or
+split further?) has surfaced multiple times across recent design
+work without ever being settled. A future session should evaluate
+and decide; this doc gives them enough context to do so cold.
+
+## Why this came up
+
+Three relevant moments:
+
+1. **Build-time idmap (2026-05-15, this session)**: when we
+   considered eliminating the `l4d2-sandbox` uid entirely and just
+   running the script sandbox as `left4me`, we noted "sandbox escape
+   could see web.env / DB / running gameservers", i.e. uid separation
+   is the load-bearing defense layer. We kept `l4d2-sandbox`. Plan at
+   `docs/superpowers/plans/2026-05-15-build-time-idmap.md` flagged
+   this as an out-of-scope future direction.
+
+2. **Idmap-on-mount plan (2026-05-14)**: in
+   `docs/superpowers/plans/2026-05-14-overlay-idmap.md` "Out of scope"
+   section: *"Gameserver uid split (separating the gameserver-runtime
+   uid from `left4me`) — planned for a later session."* No design
+   captured.
+
+3. **In this session's conversation**: when discussing the
+   server-side-symlink option for the helper consolidation, we
+   surfaced that a compromised web app (running as `left4me`) can
+   already reach `left4me`-owned gameserver state, RCON, etc.,
+   because they share a uid. Splitting them would localize web
+   compromise.
+
+The question never got a structured answer because each plan
+inherited "we have 2 users" as a constraint, not as a design choice.
+This doc fixes that.
+
+## Current state (2 users)
+
+User → what runs as it:
+
+| User | What runs as it | Reads | Writes |
+|---|---|---|---|
+| `left4me` (uid 980) | Flask web app (`left4me-web.service`), `srcds_run` for each gameserver instance (`left4me-server@.service`), web-driven workshop & files-overlay builds (in-process Python). | DB, env files, /opt/left4me/src, all of /var/lib/left4me/. | DB, /var/lib/left4me/{overlays,instances,runtime,…}, /opt/left4me/src (pip install -e creates egg-info). |
+| `l4d2-sandbox` (uid 981) | Script-overlay sandbox (`systemd-run`-launched transient unit; will become `build-overlay@.service` if that refactor lands). | `/etc/left4me/sandbox-resolv.conf`, `/etc/ssl`, `/etc/ca-certificates`, the script bind-mounted at `/script.sh`, the idmapped `/overlay` bind. | `/overlay` only. After build-time-idmap refactor, those writes land on disk as `left4me`-owned via the bind's uid translation. |
+
+What this *prevents* today:
+- Sandbox-escape ⇒ can't read the DB (different uid; DB is `root:left4me 0640`).
+- Sandbox-escape ⇒ can't attach to gameserver processes (different uid; can't ptrace, can't signal).
+- Sandbox-escape ⇒ can't write to anything outside its bind (`ProtectSystem=strict` + bind list).
+
+What this *doesn't* prevent today:
+- Web-app compromise ⇒ full access to DB, env files, all gameservers (same uid).
+- Web-app compromise ⇒ can `sudo` the privileged helpers (script-sandbox, overlay mount, systemctl) per sudoers rules.
+- Web-app compromise ⇒ can replace `/opt/left4me/src/` Python code (it's `left4me`-owned); on next gunicorn reload, attacker code runs as the web app.
+- Web-app compromise ⇒ can ptrace running gameservers (same uid; can read their memory, inject code).
+- Gameserver compromise (e.g. RCE via game protocol bug) ⇒ symmetric: read DB, mess with web app, etc.
+
+## Three configurations
+
+### 1-user (`left4me` only)
+
+Collapse `l4d2-sandbox` into `left4me`. Sandbox runs as `left4me`
+with the existing systemd hardening (`ProtectSystem=strict`, narrow
+binds, seccomp, etc.).
+
+- **Pros**: simplest. No idmap needed anywhere (sandbox writes land
+  as `left4me` natively). Drops ~40 lines of helper code. No
+  cross-uid file-access plumbing.
+- **Cons**: a sandbox escape that the systemd hardening fails to
+  contain (e.g. kernel bug bypassing seccomp; mount-namespace
+  escape) gains the web app's uid: full DB / env / gameserver
+  access. The current 2-user split exists specifically to limit
+  this blast radius.
+- **When this is OK**: if you're confident systemd hardening is
+  load-bearing and uid separation is belt-and-braces. The
+  cost-of-failure is "web app compromised from a sandbox bug"; judge
+  whether you'd accept that risk.
+
+### 2-user (current: `left4me` + `l4d2-sandbox`)
+
+Status quo after the build-time-idmap refactor. Web/game share a
+uid; sandbox is separate.
+
+- **Pros**: existing architecture, working code, all tests pass.
+  One idmap point (build-time), well-understood.
+- **Cons**: a web-app compromise has full access to gameserver
+  state (same uid as srcds). A gameserver RCE (e.g. via L4D2
+  network code bug) has full access to the web app's state.
+- **Threat model**: assumes the web app and gameservers are
+  mutually trusting. Acceptable for solo-operator infra; less so
+  for multi-tenant.
+
+### 3-user (`l4d2-web` + `l4d2-game` + `l4d2-sandbox`)
+
+Split `left4me` into two: `l4d2-web` for the web app, `l4d2-game`
+for gameservers. Keep `l4d2-sandbox`.
+
+- **Pros**: localizes web compromise (can no longer attach to
+  running gameservers, read their memory, modify their per-instance
+  config). Symmetric protection for gameserver RCEs (can't reach
+  the DB / env files directly).
+- **Cons**: significant plumbing for cross-uid file access
+  (overlays, instance state, upper-layer staging). Reintroduces an
+  idmap concern at the *gameserver* boundary (overlay copy-up by
+  `l4d2-game` of `l4d2-web`-owned lowerdirs), unless we use a
+  shared group instead. See "Cross-uid plumbing" below.
+
+The bigger uid-set version (per-instance gameserver uids, e.g.
+`l4d2-game-1`, `l4d2-game-2`) is **out of scope** for this doc:
+the marginal gain over a single `l4d2-game` is small for a
+single-host deployment.
+
+## Cross-uid plumbing (the hard part of 3-user)
+
+These are the file boundaries that the split affects:
+
+### Overlays (`/var/lib/left4me/overlays//`)
+
+- Today: `left4me`-owned (after build-time-idmap migration); under
+  3-user they become `l4d2-web`-owned.
+- `l4d2-game` needs to read them as lowerdir at overlay mount time.
+- Sandbox writes through idmap; the idmap target uid determines
+  who owns the sandbox's writes on disk.
+
+Three approaches:
+
+- **Shared group `l4d2-overlay`**: both `l4d2-web` and `l4d2-game`
+  are members. Overlays chgrp'd to `l4d2-overlay`, mode `2775`
+  (setgid so new entries inherit). Sandbox idmap maps to whichever
+  uid owns the overlays (probably `l4d2-web` since the web app
+  *creates* the overlay dirs). `l4d2-game` reads via group access.
+  Copy-up by `l4d2-game` produces files owned by `l4d2-game` (its
+  own primary uid), but since they end up in upper, that's fine.
+- **World-readable**: overlays mode `0755`; `l4d2-game` reads via
+  "other" access. Simpler but slightly looser perms. Acceptable for
+  internal infra.
+- **Per-overlay idmap on gameserver mount** (back to what we just
+  removed): bind overlays as idmapped lowerdirs at gameserver start
+  time, presenting them as `l4d2-game`-owned to overlayfs. We
+  already know this works; we just deleted that code path. Don't
+  re-add it; prefer group-based access.
+
+### Upper layer (`/var/lib/left4me/runtime//upper/`)
+
+- Today: `left4me`-owned, written by both the web app (server.cfg
+  staging in `start_instance`) and the gameserver (copy-up at
+  runtime).
+- With 3-user: `l4d2-game` writes via copy-up; `l4d2-web` writes
+  server.cfg staging.
+
+Two paths:
+
+- **Shared group `l4d2-runtime`** (could be the same as
+  `l4d2-overlay` or distinct). Upper-layer dir mode `2775`,
+  setgid'd, both uids write via group.
+- **Move server.cfg staging out of `start_instance`** into the
+  systemd unit itself: `ExecStartPre=+...` does the cp as root and
+  chowns to `l4d2-game`. Cleaner separation of concerns; harder to
+  pipe dynamic content (the cfg lines come from the DB blueprint).
+  Either pass them via an env file, or have the web app write
+  `instances//server.cfg` (its own dir) and the unit cps it
+  into upper.
+
+### Database (`/var/lib/left4me/left4me.db`)
+
+- Today: `root:left4me 0640`. `left4me` (uid 980) can read and
+  write.
+- With 3-user: only `l4d2-web` should write. `l4d2-game` shouldn't
+  need DB access at all (gameservers operate from `instance.env` +
+  overlay state; DB is a web-side concern).
+- Migration: chown to `root:l4d2-web`, mode unchanged. Game uid is
+  not in the `l4d2-web` group, so it can't read the DB. Clean.
+
+### Env files (`/etc/left4me/{host.env,web.env}`)
+
+- Today: `root:left4me 0640`. `host.env` for L4D2 server config
+  (used by systemd units), `web.env` for the Flask app (secret_key,
+  DB url, etc.).
+- With 3-user:
+  - `web.env` → `root:l4d2-web 0640`. Only the web app reads it.
+  - `host.env` is more nuanced: it is used by gameserver units
+    (`left4me-server@.service` sources it). Today both web and
+    game read it (web uses `host.env` for some operations too).
+    Probably keep readable by both: `root` owner, group
+    membership for both uids. Or duplicate the few values that
+    matter.
+
+### Code (`/opt/left4me/src/`)
+
+- Today: `left4me`-owned (for `pip install -e` editable installs;
+  creates `egg-info/` in the source tree).
+- With 3-user: only the web app needs to run the Python code from
+  this tree. Make it `l4d2-web`-owned.
+- The gameserver runs `srcds_run` from `/var/lib/left4me/installation/`
+  (Steam-managed) and the overlay binds; it doesn't touch
+  `/opt/left4me/src/`.
+
+### Helpers and sudoers
+
+- `left4me-overlay`: invoked by `left4me-server@.service` (root via
+  systemd `+` prefix). Doesn't need user accounts at all.
+- `left4me-script-sandbox` (today): invoked by the web app via
+  sudo. With 3-user, the sudoers grant moves from `left4me` to
+  `l4d2-web`.
+- `left4me-systemctl` and `left4me-journalctl`: invoked by the web
+  app. Same: the sudoers grant moves to `l4d2-web`.
+- The admin CLI `/usr/local/sbin/left4me` (the `sudo left4me ` wrapper): drops to `l4d2-web` (the web app's uid) to run
+  flask commands.
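As a sanity check on the group-based layout in this section, the sketch below models the classic owner/group/other permission check for two of the proposed ownerships. It is a simplified model for non-root callers only: it ignores ACLs, capabilities, and parent-directory traversal. The `Principal`, `Node`, and `can` names are hypothetical; the uids, gids, and modes are the ones proposed in this doc.

```python
from dataclasses import dataclass

R, W, X = 4, 2, 1  # bits within one rwx triplet


@dataclass(frozen=True)
class Principal:
    uid: int
    gids: frozenset  # primary plus supplementary gids


@dataclass(frozen=True)
class Node:
    uid: int
    gid: int
    mode: int  # e.g. 0o640


def can(p: Principal, n: Node, want: int) -> bool:
    """Simplified POSIX check: the first matching class decides."""
    if p.uid == n.uid:
        bits = (n.mode >> 6) & 7   # owner triplet
    elif n.gid in p.gids:
        bits = (n.mode >> 3) & 7   # group triplet
    else:
        bits = n.mode & 7          # other triplet
    return bits & want == want


# uids/gids from the migration sketch in this doc
ROOT, WEB, GAME, OVERLAY = 0, 982, 983, 984

web = Principal(WEB, frozenset({WEB, OVERLAY}))
game = Principal(GAME, frozenset({GAME, OVERLAY}))

db = Node(ROOT, WEB, 0o640)             # root:l4d2-web 0640
overlays = Node(WEB, OVERLAY, 0o2775)   # l4d2-web:l4d2-overlay 2775

checks = {
    "game reads overlays": can(game, overlays, R | X),
    "game writes overlays": can(game, overlays, W),
    "game reads db": can(game, db, R),
    "web reads db": can(web, db, R),
    "web writes db": can(web, db, W),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Two things fall out of this model that may be worth checking against the real host. Mode `2775` gives `l4d2-game` write access to the overlay tree as well as read access. And a `root:l4d2-web 0640` file is readable but not writable through group membership, so if the web app is meant to write the DB that way, the mode would need the group-write bit.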
+
+## Idmap implications
+
+With the build-time idmap landing on 2026-05-15, the sandbox's
+writes get translated from the in-mount uid to the disk-side uid
+via `mount --bind --map-users=::1`.
+
+Currently `disk_uid = left4me`.
+
+Under 3-user, the disk-side target of the sandbox idmap should be
+the uid that "owns" overlay state, i.e. **`l4d2-web`**, since the
+web app creates the overlay dirs and reads them for the file-tree
+endpoint. Gameserver access goes through the shared group, not the
+idmap.
+
+This means the helper's `id -u left4me` → `id -u l4d2-web` is the
+one-line change captured in the build-time-idmap plan. Trivial.
+
+## Threat-model heuristics
+
+The decision really turns on what you think is most likely:
+
+- "Web app gets RCE'd via a Flask/dependency bug, or session
+  hijack" → 3-user helps (game state survives).
+- "Gameserver gets RCE'd via L4D2 source-engine bug" → 3-user
+  helps (web/DB survives). L4D2 is old code with known unpatched
+  vulns in the engine; this is non-negligible.
+- "Sandbox script exploits a kernel bug to escape seccomp" → 2-user
+  already helps. Going to 3-user doesn't add much here.
+- "Local privilege escalation via sudo helper" → 3-user is
+  modestly better (sudoers grants are narrower per-uid).
+
+If you take L4D2 engine RCE seriously, the 2→3 split has real
+value. If you think the web app is the most exposed surface and
+the gameservers are behind a NAT/firewall + only running trusted
+maps, less so.
+
+## Migration plan sketch (for the 3-user option)
+
+If you choose 3-user, the rough plan:
+
+1. **Create the new users in ckn-bw bundle**:
+   ```python
+   users = {
+       'l4d2-web': {'uid': 982, 'gid': 982, 'home': '/var/lib/left4me',
+                    'shell': '/usr/sbin/nologin'},
+       'l4d2-game': {'uid': 983, 'gid': 983, 'home': '/var/lib/left4me',
+                     'shell': '/usr/sbin/nologin'},
+       'l4d2-sandbox': {'uid': 981, ...},  # unchanged
+   }
+   groups = {
+       'l4d2-web': {'gid': 982},
+       'l4d2-game': {'gid': 983},
+       'l4d2-sandbox': {'gid': 981},
+       'l4d2-overlay': {'gid': 984, 'members': ['l4d2-web', 'l4d2-game']},
+   }
+   ```
+   The old `left4me` user/group can be kept as an alias or
+   removed entirely.
+
+2. **Re-chown across the migration**:
+   - `/opt/left4me/src` → `l4d2-web:l4d2-web` (recursive).
+   - `/var/lib/left4me/left4me.db` → `root:l4d2-web 0640`.
+   - `/etc/left4me/web.env` → `root:l4d2-web 0640`.
+   - `/etc/left4me/host.env` → `root:l4d2-web 0640` (decide if
+     game needs to read it; if so, supplementary group).
+   - `/var/lib/left4me/overlays` → `l4d2-web:l4d2-overlay 2775`.
+   - `/var/lib/left4me/instances` → `l4d2-web:l4d2-web`.
+   - `/var/lib/left4me/runtime` → `l4d2-web:l4d2-overlay 2775`
+     (or whatever combination accommodates upper-layer writes from
+     both uids).
+   - `/var/lib/left4me/installation` → `l4d2-game:l4d2-game`
+     (steamcmd writes here as the game user).
+
+3. **Update `User=`/`Group=` in systemd units**:
+   - `left4me-web.service`: `User=l4d2-web Group=l4d2-web`
+   - `left4me-server@.service`: `User=l4d2-game Group=l4d2-game`
+   - Both: add `SupplementaryGroups=l4d2-overlay` where needed.
+
+4. **Update sudoers** (`/etc/sudoers.d/left4me`):
+   ```
+   l4d2-web ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-systemctl *
+   l4d2-web ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-journalctl *
+   l4d2-web ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox # or the build-overlay unit
+   ```
+   Game uid gets no sudoers grants at all (gameservers never need
+   to escalate privileges).
+
+5. **Update the script-sandbox idmap target**:
+   - In `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`,
+     change `id -u left4me` → `id -u l4d2-web` and matching
+     groups. One-line change.
+
+6. **Update the admin CLI**:
+   - `/usr/local/sbin/left4me` does `sudo -u left4me sh -c '. host.env;
+     . web.env; flask …'`. Change to `sudo -u l4d2-web …`.
+
+7. **Update steamcmd ownership in ckn-bw**:
+   - `actions['left4me_install_steamcmd']` runs as `left4me`;
+     change to `l4d2-game`.
+   - `/opt/left4me/steam` dir → `l4d2-game`.
+
+8. **Add tests** that assert the user-split invariant in
+   `deploy/tests/test_deploy_artifacts.py` (e.g. sudoers grants
+   to the right uid, systemd unit `User=` matches).
+
+9. **Run the migration on the test server**:
+   - Stop all `left4me-server@*` + `left4me-web`.
+   - Run a one-shot `chown` script (or `bw apply` if the bundle
+     does the chowns).
+   - Start everything; verify nothing broken.
+
+Estimate: 1-2 working days. Most of the time is debugging surprise
+cross-uid permission failures (the kind of thing you can't fully
+predict without trying it).
+
+## Open decisions
+
+1. **Do we actually want this?** Threat-model section above. If
+   the answer is "no, 2-user is fine," close this doc and don't
+   come back to it. The current state is correct and tested.
+2. **If yes, web vs game split, or also split per-instance
+   gameserver?** Recommendation: single `l4d2-game` uid for all
+   instances. Per-instance uids add a lot of ceremony for a
+   marginal hardening gain.
+3. **`l4d2-overlay` shared group: one group or two?** Could split
+   into `l4d2-overlay-read` (game-membership) and
+   `l4d2-overlay-write` (web-membership), but probably overkill.
+4. **Keep the `left4me` user as a no-op alias for compatibility?**
+   Existing systemd units, sudoers, etc. all reference `left4me`;
+   migration is easier if we just rename. But a clean break is
+   easier to reason about. Recommendation: clean break, no
+   alias.
+5. **Should we collapse to 1 user instead?** Captured above. The
+   defense-in-depth from `l4d2-sandbox` is real; recommend
+   keeping it.
+6. **Does `host.env` need to be readable by both web and game?**
+   Audit what's in it. If it's all gameserver-specific values,
+   `root:l4d2-game`. If split, may need to factor into two env
+   files or use supplementary groups.
+7. **`steamcmd` install location and ownership.** Today `steamcmd`
+   self-updates at runtime as `left4me`. With game uid, this
+   needs to run as `l4d2-game` (or the game uid needs write
+   access to `/opt/left4me/steam/`).
+
+## Verification (for the 3-user migration)
+
+After migration on `left4.me`:
+
+```bash
+# uids are distinct
+id l4d2-web; id l4d2-game; id l4d2-sandbox
+
+# nothing owned by the old left4me uid (980) remains
+# (-uid still matches if the left4me account itself was removed)
+sudo find /var/lib/left4me /opt/left4me /etc/left4me -uid 980 2>/dev/null
+# expect: empty
+
+# the right things own the right things
+# (stat prints owner:group and the octal mode of the path itself)
+sudo stat -c '%U:%G %a' /var/lib/left4me/left4me.db     # root:l4d2-web 640
+sudo stat -c '%U:%G %a' /etc/left4me/web.env            # root:l4d2-web 640
+sudo stat -c '%U:%G %a' /opt/left4me/src                # l4d2-web:l4d2-web
+sudo stat -c '%U:%G %a' /var/lib/left4me/overlays       # l4d2-web:l4d2-overlay 2775
+sudo stat -c '%U:%G %a' /var/lib/left4me/installation   # l4d2-game:l4d2-game
+
+# unit user= is right
+systemctl show left4me-web -p User          # User=l4d2-web
+systemctl show left4me-server@2 -p User     # User=l4d2-game
+
+# game uid can read overlays via shared group
+sudo -u l4d2-game cat /var/lib/left4me/overlays/9/left4dead2/addons/sourcemod.vdf
+# (should succeed)
+
+# game uid cannot read DB
+sudo -u l4d2-game cat /var/lib/left4me/left4me.db 2>&1
+# expect: permission denied
+
+# web uid cannot ptrace srcds
+sudo -u l4d2-web gdb --batch -ex "attach $(pgrep -f srcds_linux | head -1)" 2>&1
+# expect: Operation not permitted
+
+# everything works end-to-end
+# - server 2 stays running
+# - script overlay rebuild succeeds
+# - web UI responds and shows live logs from server 2
+```
+
+## Risks (for the 3-user migration)
+
+- **Surprise file-access failures** in odd corners: the admin CLI,
+  log paths, lock files, alembic migrations, the database WAL
+  files, steamcmd cache, workshop cache. Each is a 30-second fix
+  but they add up.
+- **Concurrent migration + running services**: stop everything,
+  migrate, restart. Don't try to migrate live.
+- **Workshop builds**: today the web app calls `steamcmd` directly
+  in-process to download workshop items. With a split, this needs
+  to either move to a sudo'd helper (run as `l4d2-game` since
+  `steamcmd` is game-owned) or duplicate the steam install for the
+  web user. Probably the former.
+- **Gameserver writes that we didn't catch**: log files, lock
+  files, srcds-internal state. Some may live outside the overlay
+  and need explicit handling.
+- **The build-time-idmap one-line change might miss something**:
+  the sandbox helper hard-codes `id -u left4me`; that becomes
+  `id -u l4d2-web`. But if any other tooling (deploy scripts,
+  doc commands) references `left4me`, those need updating too.
+- **ckn-bw migration**: the user/group/file changes also need to
+  land in ckn-bw's `bundles/left4me/items.py` (`users`, `groups`,
+  `directories`, `files`). Cross-repo coordination.
+
+## Pointers
+
+- **Current users**: ckn-bw `bundles/left4me/items.py:42-58`
+  (groups + users dicts).
+- **Existing sudoers**: `deploy/files/etc/sudoers.d/left4me`.
+- **Systemd unit User= directives**: emitted from ckn-bw
+  `bundles/left4me/metadata.py` systemd_units reactor. Search for
+  `User=left4me`.
+- **Web env files**: `/etc/left4me/{host,web}.env` templated from
+  ckn-bw `bundles/left4me/files/etc/left4me/{host,web}.env.mako`.
+- **Database location and mode**: ckn-bw `bundles/left4me/items.py`
+  has the chmod near the steamcmd/alembic actions (the "0640"
+  block).
+- **Build-time-idmap helper** that needs the one-line target-uid
+  change: `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`,
+  the `LEFT4ME_UID=$(id -u left4me)` line.
+- **Admin CLI**: `deploy/files/usr/local/sbin/left4me`.
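The sudoers file and unit `User=` directives listed above are the artifacts that the invariant tests from migration step 8 would inspect. A minimal sketch of such tests follows. The helper names (`sudoers_grantees`, `unit_user`) and the inline sample texts are hypothetical, and the sudoers parsing is deliberately naive (no aliases, line continuations, or includes). A real test in `deploy/tests/test_deploy_artifacts.py` would presumably read the rendered artifacts instead of inline strings.

```python
import re
from typing import Optional, Set

# Inline samples standing in for the rendered deploy artifacts (assumption).
SUDOERS_SAMPLE = """\
# left4me helpers
l4d2-web ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-systemctl *
l4d2-web ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-journalctl *
l4d2-web ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox
"""

UNIT_SAMPLE = """\
[Service]
User=l4d2-game
Group=l4d2-game
SupplementaryGroups=l4d2-overlay
"""


def sudoers_grantees(text: str) -> Set[str]:
    """Return the first field of every rule line (skips blanks, comments, Defaults)."""
    grantees = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("Defaults"):
            continue
        grantees.add(line.split()[0])
    return grantees


def unit_user(text: str) -> Optional[str]:
    """Return the value of the last User= directive in a unit file, if any."""
    matches = re.findall(r"^User=(\S+)$", text, flags=re.MULTILINE)
    return matches[-1] if matches else None


def test_only_web_uid_has_sudo_grants():
    # The game and sandbox uids must not appear as grantees.
    assert sudoers_grantees(SUDOERS_SAMPLE) == {"l4d2-web"}


def test_gameserver_unit_runs_as_game_uid():
    assert unit_user(UNIT_SAMPLE) == "l4d2-game"
```

Keeping the parsers as plain string functions means the same checks can run against template output in CI without touching a live host.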
+
+Related design docs:
+- `docs/superpowers/plans/2026-05-15-build-time-idmap.md`: flagged
+  this as out-of-scope future direction.
+- `docs/superpowers/specs/2026-05-15-deploy-dir-rethink-design.md`:
+  adjacent open questions about `deploy/` layout.
+- `docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md`:
+  template-unit refactor for the script sandbox. That refactor
+  and this one are orthogonal; either can land first. If both are
+  on the table, doing the unit refactor first means the user-split
+  change touches a clean unit file instead of a bash helper.
+
+## What's NOT in scope
+
+- Per-instance gameserver uids (one uid per server). Marginal gain
+  for a single-host deploy.
+- Splitting the web app process (Flask + gunicorn + job worker)
+  into separate uids. They run in the same Python process; would
+  require a worker-as-subprocess redesign.
+- Replacing systemd-managed users with PrivateUsers=true (userns
+  mapping). Different mechanism; doesn't replace POSIX uid
+  separation for filesystem access.
+- Hardening the existing 2-user setup further (seccomp tightening,
+  capability drops, etc.). That is out of scope for the *split*
+  decision; it could happen in either configuration.
+
+## Decision criteria
+
+Do this if:
+- L4D2 engine RCE potential keeps you up at night.
+- The web app handles sensitive operations (admin auth, payments,
+  multi-tenancy).
+- You want to harden as a matter of hygiene even without a
+  specific threat.
+
+Skip this if:
+- Solo-operator infra on a personal VPS, internal-only.
+- The compromise scenarios above are unlikely or low-impact.
+- You'd rather spend the day on user-facing features.
+
+The current 2-user state is correct and tested. This refactor is
+an upgrade, not a fix.