spec: handoff for revisiting 1/2/3-user split for left4me

The 2-user split (left4me + l4d2-sandbox) has been inherited as a
constraint across multiple recent plans (idmap-on-mount, build-time-
idmap, helper consolidation) without ever being designed
end-to-end. Three plausible configurations: collapse to 1 user
(rejected for security), keep at 2 users (status quo), or split web
from game into 3 users for blast-radius limiting on either side.

Doc captures the threat-model heuristics, cross-uid file-access
plumbing options (shared group vs. world-read), idmap implications,
a step-by-step migration sketch for the 3-user variant, and explicit
out-of-scope items (per-instance gameserver uids, etc.). Detailed
enough that a future session can pick a configuration and execute
without re-deriving the design space.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-15 01:58:09 +02:00


How many system users should left4me have? — 1, 2, or 3

Status: open question, not settled design. This is a handoff document. Today left4me has 2 system users: left4me (web app + gameservers + workshop builds) and l4d2-sandbox (script-overlay sandbox). Whether that split is correct — should we collapse, or split further? — has surfaced multiple times across recent design work without ever being settled. A future session should evaluate and decide; this doc gives them enough context to do so cold.

Why this came up

Three relevant moments:

Build-time idmap (2026-05-15, this session): when we considered eliminating the l4d2-sandbox uid entirely and just running the script sandbox as left4me, we noted "sandbox escape could see web.env / DB / running gameservers" — i.e. uid separation is the load-bearing defense layer. We kept l4d2-sandbox. Plan at docs/superpowers/plans/2026-05-15-build-time-idmap.md flagged this as an out-of-scope future direction.
Idmap-on-mount plan (2026-05-14): in docs/superpowers/plans/2026-05-14-overlay-idmap.md "Out of scope" section: "Gameserver uid split (separating the gameserver-runtime uid from left4me) — planned for a later session." No design captured.
In this session's conversation: when discussing the server-side-symlink option for the helper consolidation, we surfaced that a compromised web app (running as left4me) can already reach left4me-owned gameserver state, RCON, etc., because they share a uid. Splitting them would localize web compromise.

The question never got a structured answer because each plan inherited "we have 2 users" as a constraint, not as a design choice. This doc fixes that.

Current state (2 users)

User → what runs as it:

| User | What runs as it | Reads | Writes |
| --- | --- | --- | --- |
| `left4me` (uid 980) | Flask web app (`left4me-web.service`), `srcds_run` for each gameserver instance (`left4me-server@.service`), web-driven workshop & files-overlay builds (in-process Python). | DB, env files, /opt/left4me/src, all of /var/lib/left4me/. | DB, /var/lib/left4me/{overlays,instances,runtime,…}, /opt/left4me/src (pip install -e creates egg-info). |
| `l4d2-sandbox` (uid 981) | Script-overlay sandbox (`systemd-run`-launched transient unit; will become `build-overlay@.service` if that refactor lands). | `/etc/left4me/sandbox-resolv.conf`, `/etc/ssl`, `/etc/ca-certificates`, the script bind-mounted at `/script.sh`, the idmapped `/overlay` bind. | `/overlay` only. After build-time-idmap refactor, those writes land on disk as `left4me`-owned via the bind's uid translation. |

What this prevents today:

Sandbox-escape ⇒ can't read the DB (different uid; DB is root:left4me 0640).
Sandbox-escape ⇒ can't attach to gameserver processes (different uid; can't ptrace, can't signal).
Sandbox-escape ⇒ can't write to anything outside its bind (ProtectSystem=strict + bind list).

What this doesn't prevent today:

Web-app compromise ⇒ full access to DB, env files, all gameservers (same uid).
Web-app compromise ⇒ can sudo the privileged helpers (script-sandbox, overlay mount, systemctl) per sudoers rules.
Web-app compromise ⇒ can replace /opt/left4me/src/ Python code (it's left4me-owned); on next gunicorn reload, attacker code runs as the web app.
Web-app compromise ⇒ can ptrace running gameservers (same uid; can read their memory, inject code).
Gameserver compromise (e.g. RCE via game protocol bug) ⇒ symmetric: read DB, mess with web app, etc.

Three configurations

1-user (`left4me` only)

Collapse l4d2-sandbox into left4me. Sandbox runs as left4me with the existing systemd hardening (ProtectSystem=strict, narrow binds, seccomp, etc.).

Pros: simplest. No idmap needed anywhere (sandbox writes land as left4me natively). Drops ~40 lines of helper code. No cross-uid file-access plumbing.
Cons: a sandbox escape that the systemd hardening fails to contain (e.g. kernel bug bypassing seccomp; mount-namespace escape) gains the web app's uid — full DB / env / gameserver access. The current 2-user split exists specifically to limit this blast radius.
When this is OK: if you're confident systemd hardening is load-bearing and uid separation is belt-and-braces. The cost of failure is "web app compromised from a sandbox bug"; judge whether you'd accept that risk.

2-user (current: `left4me` + `l4d2-sandbox`)

Status quo after the build-time-idmap refactor. Web/game share a uid; sandbox is separate.

Pros: existing architecture, working code, all tests pass. One idmap point (build-time), well-understood.
Cons: a web-app compromise has full access to gameserver state (same uid as srcds). A gameserver RCE (e.g. via L4D2 network code bug) has full access to the web app's state.
Threat model: assumes the web app and gameservers are mutually trusting. Acceptable for solo-operator infra; less so for multi-tenant.

3-user (`l4d2-web` + `l4d2-game` + `l4d2-sandbox`)

Split left4me into two: l4d2-web for the web app, l4d2-game for gameservers. Keep l4d2-sandbox.

Pros: localizes web compromise (can no longer attach to running gameservers, read their memory, modify their per-instance config). Symmetric protection for gameserver RCEs (can't reach the DB / env files directly).
Cons: significant plumbing for cross-uid file access (overlays, instance state, upper-layer staging). Reintroduces an idmap concern at the gameserver boundary (overlay copy-up by l4d2-game of l4d2-web-owned lowerdirs), unless we use a shared group instead. See "Cross-uid plumbing" below.

The bigger uid-set version (per-instance gameserver uids, e.g. l4d2-game-1, l4d2-game-2) is out of scope for this doc: the marginal gain over a single l4d2-game is small for a single-host deployment.

Cross-uid plumbing (the hard part of 3-user)

These are the file boundaries that the split affects:

Overlays (`/var/lib/left4me/overlays/<id>/`)

Today: left4me-owned (after the build-time-idmap migration). Under 3-user they become l4d2-web-owned.
l4d2-game needs to read them as lowerdir at overlay mount time.
Sandbox writes through idmap; the idmap target uid must be the one whose writes appear on disk.

Three approaches:

Shared group l4d2-overlay: both l4d2-web and l4d2-game are members. Overlays are chgrp'd to l4d2-overlay, mode 2775 (setgid so new entries inherit the group). The sandbox idmap maps to the uid that owns the overlays on disk (probably l4d2-web, since the web app creates the overlay dirs). l4d2-game reads via group access. Copy-up by l4d2-game produces files owned by l4d2-game (its own primary uid), but since those land in the upper layer rather than in the shared overlay, that's fine.
World-readable: overlays mode 0755; l4d2-game reads via "other" access. Simpler but slightly looser perms. Acceptable for internal infra.
Per-overlay idmap on gameserver mount (back to what we just removed): bind overlays as idmapped lowerdirs at gameserver start time, presenting them as l4d2-game-owned to overlayfs. We already know this works — we just deleted that code path. Don't re-add it; prefer group-based access.

Upper layer (`/var/lib/left4me/runtime/<n>/upper/`)

Today: left4me-owned, written by both the web app (server.cfg staging in start_instance) and the gameserver (copy-up at runtime).
With 3-user: l4d2-game writes via copy-up; l4d2-web writes server.cfg staging.

Two paths:

Shared group l4d2-runtime (could be the same as l4d2-overlay or distinct). Upper-layer dir mode 2775, setgid'd, both uids write via group.
Move server.cfg staging out of start_instance into the systemd unit itself: ExecStartPre=+... does the cp as root and chowns to l4d2-game. Cleaner separation of concerns; harder to pipe dynamic content (the cfg lines come from the DB blueprint). Either pass them via an env file, or have the web app write instances/<n>/server.cfg (its own dir) and the unit cps it into upper.
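The last variant (the web app writes instances/<n>/server.cfg in its own directory and the unit copies it into upper) could look roughly like this on the web side. This is a sketch only: `stage_server_cfg` is a hypothetical helper name, and the write-then-rename approach and the 0644 mode are assumptions, not existing left4me code.

```python
import os
import tempfile
from pathlib import Path


def stage_server_cfg(instances_dir: Path, instance: int, cfg_lines: list[str]) -> Path:
    """Write instances/<n>/server.cfg atomically (hypothetical helper).

    The web uid only touches its own instances/ tree; copying the file into
    the overlay upper layer is left to the systemd unit.
    """
    target_dir = instances_dir / str(instance)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "server.cfg"

    # Write to a temp file in the same directory, then rename over the target,
    # so the unit's copy step never sees a half-written cfg.
    fd, tmp_name = tempfile.mkstemp(dir=target_dir, prefix=".server.cfg.")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write("\n".join(cfg_lines) + "\n")
        os.chmod(tmp_name, 0o644)  # assumption: world-readable is acceptable here
        os.replace(tmp_name, target)
    except BaseException:
        os.unlink(tmp_name)
        raise
    return target
```

With this shape the web uid never needs write access to runtime/<n>/upper/, which is the point of moving the copy into the unit.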

Database (`/var/lib/left4me/left4me.db`)

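Today: root:left4me 0640, and left4me (uid 980) is expected to read and write it. Note that 0640 with a root owner gives the group read-only access, so confirm the actual owner and mode on disk before migrating (SQLite also needs write access to the containing directory for its journal/WAL files).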
With 3-user: only l4d2-web should write. l4d2-game shouldn't need DB access at all (gameservers operate from instance.env + overlay state; DB is a web-side concern).
Migration: chown to root:l4d2-web, mode unchanged. Game uid is not in the l4d2-web group, so it can't read the DB. Clean.
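The group-based reasoning for the overlays and the database can be checked with a small model of the classic owner/group/other permission test. This is a simplified sketch: it ignores root, ACLs and capabilities, and the uid/gid numbers are the ones proposed in the migration sketch, not anything that exists on disk today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    uid: int
    gids: frozenset  # primary + supplementary gids


@dataclass(frozen=True)
class Node:
    uid: int
    gid: int
    mode: int  # permission bits, e.g. 0o640


def allowed(proc: Proc, node: Node, want: str) -> bool:
    """Owner/group/other check; `want` is a combination of 'r', 'w', 'x'."""
    if proc.uid == node.uid:
        shift = 6          # owner bits
    elif node.gid in proc.gids:
        shift = 3          # group bits
    else:
        shift = 0          # other bits
    bits = (node.mode >> shift) & 0o7
    need = sum({"r": 4, "w": 2, "x": 1}[c] for c in set(want))
    return bits & need == need


ROOT, WEB, GAME, OVERLAY = 0, 982, 983, 984  # ids from the migration sketch

web = Proc(WEB, frozenset({WEB, OVERLAY}))
game = Proc(GAME, frozenset({GAME, OVERLAY}))

db = Node(uid=ROOT, gid=WEB, mode=0o640)
overlays = Node(uid=WEB, gid=OVERLAY, mode=0o2775)

print(allowed(game, db, "r"))         # False: game falls into "other"
print(allowed(web, db, "r"))          # True: via the l4d2-web group
print(allowed(web, db, "w"))          # False: group bits of 0640 are read-only
print(allowed(game, overlays, "rx"))  # True: via the shared l4d2-overlay group
```

The third result is worth noticing: under this model a group member of a root-owned 0640 file can read it but not write it.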

Env files (`/etc/left4me/{host.env,web.env}`)

Today: root:left4me 0640. host.env for L4D2 server config (used by systemd units), web.env for the Flask app (secret_key, DB url, etc.).
With 3-user:
- web.env → root:l4d2-web 0640. Only the web app reads it.
- host.env is more nuanced — used by gameserver units (left4me-server@.service sources it). Today both web and game read it (web uses host.env for some operations too). Probably keep readable by both: root owner, group membership for both uids. Or duplicate the few values that matter.

Code (`/opt/left4me/src/`)

Today: left4me-owned (for pip install -e editable installs; creates egg-info/ in the source tree).
With 3-user: only the web app needs to run the Python code from this tree. Make it l4d2-web-owned.
The gameserver runs srcds_run from /var/lib/left4me/installation/ (Steam-managed) and the overlay binds — doesn't touch /opt/left4me/src/.

Helpers and sudoers

left4me-overlay: invoked by left4me-server@.service as root (via systemd's + executable prefix). Doesn't need user accounts at all.
left4me-script-sandbox (today): invoked by the web app via sudo. With 3-user, the sudoers grant moves from left4me to l4d2-web.
left4me-systemctl and left4me-journalctl: invoked by the web app. Same — sudoers moves to l4d2-web.
The admin CLI /usr/local/sbin/left4me (the sudo left4me <flask cmd> wrapper): drops to l4d2-web (the web app's uid) to run flask commands.

Idmap implications

With the build-time idmap landing on 2026-05-15, the sandbox's writes are translated from the in-mount uid to the disk-side uid via mount --bind --map-users=<disk_uid>:<sandbox_uid>:1.

Currently disk_uid = left4me.

Under 3-user, the disk-side target of the sandbox idmap should be the uid that "owns" overlay state — i.e. l4d2-web, since the web app creates the overlay dirs and reads them for the file-tree endpoint. Gameserver access goes through the shared group, not the idmap.

In the helper, this is the one-line change from id -u left4me to id -u l4d2-web that the build-time-idmap plan already captures. Trivial.
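To make the size of that change concrete, here is a minimal sketch that builds the mapping argument in the `<disk_uid>:<sandbox_uid>:1` form quoted above. `map_users_arg` is a hypothetical name for illustration; the real helper is a bash script, and the uids are the ones listed in this doc.

```python
def map_users_arg(disk_uid: int, sandbox_uid: int) -> str:
    """Build the --map-users value in the <disk_uid>:<sandbox_uid>:1 form."""
    for name, uid in (("disk_uid", disk_uid), ("sandbox_uid", sandbox_uid)):
        if uid < 0:
            raise ValueError(f"{name} must be non-negative, got {uid}")
    if disk_uid == sandbox_uid:
        # Mapping a uid onto itself would make the idmap pointless.
        raise ValueError("disk_uid and sandbox_uid must differ")
    return f"--map-users={disk_uid}:{sandbox_uid}:1"


# Today: sandbox (981) writes land on disk as left4me (980).
print(map_users_arg(980, 981))
# Under 3-user: only the disk-side target changes, to l4d2-web (982).
print(map_users_arg(982, 981))
```

Only the first field differs between the two calls, which is why the helper change stays small.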

Threat-model heuristics

The decision really turns on what you think is most likely:

"Web app gets RCE'd via a Flask/dependency bug, or session hijack" → 3-user helps (game state survives).
"Gameserver gets RCE'd via L4D2 source-engine bug" → 3-user helps (web/DB survives). L4D2 is old code with known unpatched vulns in the engine; this is non-negligible.
"Sandbox script exploits a kernel bug to escape seccomp" → 2-user already helps. Going to 3-user doesn't add much here.
"Local privilege escalation via sudo helper" → 3-user is modestly better (sudoers grants are narrower per-uid).

If you take L4D2 engine RCE seriously, the 2→3 split has real value. If you think the web app is the most exposed surface and the gameservers are behind a NAT/firewall + only running trusted maps, less so.

Migration plan sketch (for the 3-user option)

If you choose 3-user, the rough plan:

Create the new users in ckn-bw bundle:

users = {
    'l4d2-web':     {'uid': 982, 'gid': 982, 'home': '/var/lib/left4me',
                     'shell': '/usr/sbin/nologin',
                     'groups': ['l4d2-overlay']},
    'l4d2-game':    {'uid': 983, 'gid': 983, 'home': '/var/lib/left4me',
                     'shell': '/usr/sbin/nologin',
                     'groups': ['l4d2-overlay']},
    'l4d2-sandbox': {'uid': 981, ...},  # unchanged
}
groups = {
    'l4d2-web':     {'gid': 982},
    'l4d2-game':    {'gid': 983},
    'l4d2-sandbox': {'gid': 981},
    # bundlewrap group items only take a gid; membership is declared
    # through 'groups' on the user items above.
    'l4d2-overlay': {'gid': 984},
}

The old left4me user/group can be kept as an alias or removed entirely.

Re-chown across the migration:
- /opt/left4me/src → l4d2-web:l4d2-web (recursive).
- /var/lib/left4me/left4me.db → root:l4d2-web 0640.
- /etc/left4me/web.env → root:l4d2-web 0640.
- /etc/left4me/host.env → root:l4d2-web 0640 (decide if game needs to read it; if so, supplementary group).
- /var/lib/left4me/overlays → l4d2-web:l4d2-overlay 2775.
- /var/lib/left4me/instances → l4d2-web:l4d2-web.
- /var/lib/left4me/runtime → l4d2-web:l4d2-overlay 2775 (or whatever combination accommodates upper-layer writes from both uids).
- /var/lib/left4me/installation → l4d2-game:l4d2-game (steamcmd writes here as the game user).
Update User=/Group= in systemd units:
- left4me-web.service: User=l4d2-web Group=l4d2-web
- left4me-server@.service: User=l4d2-game Group=l4d2-game
- Both: add SupplementaryGroups=l4d2-overlay where needed.

Update sudoers (/etc/sudoers.d/left4me):

l4d2-web ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-systemctl *
l4d2-web ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-journalctl *
l4d2-web ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox  # or the build-overlay unit

Game uid gets no sudoers grants at all (gameservers don't privesc).

Update the script-sandbox idmap target:
- In deploy/files/usr/local/libexec/left4me/left4me-script-sandbox, change id -u left4me → id -u l4d2-web and matching groups. One-line change.
Update the admin CLI:
- /usr/local/sbin/left4me does sudo -u left4me sh -c '. host.env; . web.env; flask …'. Change to sudo -u l4d2-web ….
Update steamcmd ownership in ckn-bw:
- actions['left4me_install_steamcmd'] runs as left4me; change to l4d2-game.
- /opt/left4me/steam dir → l4d2-game.
Add tests that assert the user-split invariant in deploy/tests/test_deploy_artifacts.py (e.g. sudoers grants to the right uid, systemd unit User= matches).
Run the migration on the test server:
- Stop all left4me-server@* + left4me-web.
- Run a one-shot chown script (or bw apply if the bundle does the chowns).
- Start everything; verify nothing broken.
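The one-shot chown script mentioned in the last step could be generated from the re-chown list above, so the plan can be reviewed as a dry run before anything is touched. This is a sketch under assumptions: the table mirrors the list in this doc, only /opt/left4me/src is marked recursive because that is the only entry the list marks as such, and whether the other directories also need `-R` is a flag to revisit.

```python
import shlex

# (path, owner, group, mode or None, recursive), taken from the re-chown list.
CHOWN_PLAN = [
    ("/opt/left4me/src", "l4d2-web", "l4d2-web", None, True),
    ("/var/lib/left4me/left4me.db", "root", "l4d2-web", "0640", False),
    ("/etc/left4me/web.env", "root", "l4d2-web", "0640", False),
    ("/etc/left4me/host.env", "root", "l4d2-web", "0640", False),
    ("/var/lib/left4me/overlays", "l4d2-web", "l4d2-overlay", "2775", False),
    ("/var/lib/left4me/instances", "l4d2-web", "l4d2-web", None, False),
    ("/var/lib/left4me/runtime", "l4d2-web", "l4d2-overlay", "2775", False),
    ("/var/lib/left4me/installation", "l4d2-game", "l4d2-game", None, False),
]


def render_commands(plan):
    """Return shell commands for review; nothing is executed here."""
    cmds = []
    for path, owner, group, mode, recursive in plan:
        flag = ["-R"] if recursive else []
        cmds.append(shlex.join(["chown", *flag, f"{owner}:{group}", path]))
        if mode is not None:
            cmds.append(shlex.join(["chmod", mode, path]))
    return cmds


if __name__ == "__main__":
    print("\n".join(render_commands(CHOWN_PLAN)))
```

Keeping the plan as data also gives the deploy tests something to assert against.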

Estimate: 1-2 working days. Most of the time is debugging surprise cross-uid permission failures (the kind of thing you can't fully predict without trying it).
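The user-split invariant tests mentioned in the plan could start from something like the following. It is a sketch, not the contents of deploy/tests/test_deploy_artifacts.py: the helper names are hypothetical, the parsing is deliberately naive (no sudoers aliases or include handling), and the samples are inlined where a real test would read the deployed artifacts.

```python
import re


def sudoers_grantees(text: str) -> set:
    """Return the user named at the start of each sudoers rule line."""
    users = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("Defaults"):
            continue
        users.add(line.split()[0])
    return users


def unit_user(text: str):
    """Return the value of the first User= directive in a unit file, or None."""
    match = re.search(r"^User=(\S+)\s*$", text, flags=re.MULTILINE)
    return match.group(1) if match else None


SUDOERS_SAMPLE = """\
l4d2-web ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-systemctl *
l4d2-web ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-journalctl *
"""

SERVER_UNIT_SAMPLE = "[Service]\nUser=l4d2-game\nGroup=l4d2-game\n"


def test_only_web_uid_has_sudo_grants():
    assert sudoers_grantees(SUDOERS_SAMPLE) == {"l4d2-web"}


def test_server_unit_runs_as_game_uid():
    assert unit_user(SERVER_UNIT_SAMPLE) == "l4d2-game"
```

The first test fails as soon as a grant is added for l4d2-game or left over for left4me, which is the regression this migration is most likely to introduce.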

Open decisions

Do we actually want this? Threat-model section above. If the answer is "no, 2-user is fine," close this doc and don't come back to it. The current state is correct and tested.
If yes, web vs game split, or also split per-instance gameserver? Recommendation: single l4d2-game uid for all instances. Per-instance uids add a lot of ceremony for a marginal hardening gain.
l4d2-overlay shared group: one group or two? Could split into l4d2-overlay-read (game-membership) and l4d2-overlay-write (web-membership), but probably overkill.
Keep the left4me user as a no-op alias for compatibility? Existing systemd units, sudoers, etc. all reference left4me; migration is easier if we just rename. But a clean break is easier to reason about. Recommendation: clean break, no alias.
Should we collapse to 1 user instead? Captured above. The defense-in-depth from l4d2-sandbox is real; recommend keeping it.
Does host.env need to be readable by both web and game? Audit what's in it. If it's all gameserver-specific values, root:l4d2-game. If split, may need to factor into two env files or use supplementary groups.
steamcmd install location and ownership. Today steamcmd self-updates at runtime as left4me. With game uid, this needs to run as l4d2-game (or the game uid needs write to /opt/left4me/steam/).

Verification (for the 3-user migration)

After migration on left4.me:

# uids are distinct
id l4d2-web; id l4d2-game; id l4d2-sandbox

# nothing left4me-owned remains
sudo find /var/lib/left4me /opt/left4me /etc/left4me -user left4me 2>/dev/null
# expect: empty

# the right things own the right things
# (-d so directories are shown themselves instead of their contents)
sudo ls -ld /var/lib/left4me/left4me.db          # root:l4d2-web 0640
sudo ls -ld /etc/left4me/web.env                  # root:l4d2-web 0640
sudo ls -ld /opt/left4me/src                      # l4d2-web:l4d2-web
sudo ls -ld /var/lib/left4me/overlays             # l4d2-web:l4d2-overlay 2775
sudo ls -ld /var/lib/left4me/installation         # l4d2-game:l4d2-game

# unit user= is right
systemctl show left4me-web -p User                # l4d2-web
systemctl show left4me-server@2 -p User           # l4d2-game

# game uid can read overlays via shared group
sudo -u l4d2-game cat /var/lib/left4me/overlays/9/left4dead2/addons/sourcemod.vdf
# (should succeed)

# game uid cannot read DB
sudo -u l4d2-game cat /var/lib/left4me/left4me.db 2>&1
# expect: permission denied

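# web uid cannot ptrace srcds
# (check `sysctl kernel.yama.ptrace_scope` first: at 1 or higher, attaching to a
# non-child process fails even for the same uid, so this check would pass
# regardless of the uid split)
sudo -u l4d2-web gdb --batch -ex "attach $(pgrep -f srcds_linux | head -1)" 2>&1
# expect: Operation not permitted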

# everything works end-to-end
# - server 2 stays running
# - script overlay rebuild succeeds
# - web UI responds and shows live logs from server 2
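The ownership checks above could also be automated so they can run after every deploy. A minimal sketch, assuming numeric ids are passed in; `ownership_mismatches` is a hypothetical helper, not part of the existing deploy tests.

```python
import os
import stat


def ownership_mismatches(path, uid=None, gid=None, mode=None):
    """Compare a path's owner, group and permission bits with expectations.

    Any expectation left as None is skipped. Returns a list of mismatch
    descriptions, empty when everything matches. Uses lstat so a symlink is
    checked itself rather than its target.
    """
    st = os.lstat(path)
    problems = []
    if uid is not None and st.st_uid != uid:
        problems.append(f"{path}: uid {st.st_uid}, expected {uid}")
    if gid is not None and st.st_gid != gid:
        problems.append(f"{path}: gid {st.st_gid}, expected {gid}")
    actual = stat.S_IMODE(st.st_mode)  # includes the setuid/setgid/sticky bits
    if mode is not None and actual != mode:
        problems.append(f"{path}: mode {actual:04o}, expected {mode:04o}")
    return problems
```

On the real host the expected ids would come from `pwd.getpwnam("l4d2-web").pw_uid` and `grp.getgrnam("l4d2-overlay").gr_gid`, for example `ownership_mismatches("/var/lib/left4me/overlays", web_uid, overlay_gid, 0o2775)`.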

Risks (for the 3-user migration)

Surprise file-access failures in odd corners: the admin CLI, log paths, lock files, alembic migrations, the database WAL files, steamcmd cache, workshop cache. Each is a 30-second fix but they add up.
Concurrent migration + running services: stop everything, migrate, restart. Don't try to migrate live.
Workshop builds: today the web app calls steamcmd directly in-process to download workshop items. With a split, this needs to either move to a sudo'd helper (run as l4d2-game since steamcmd is game-owned) or duplicate the steam install for the web user. Probably the former.
Gameserver writes that we didn't catch: log files, lock files, srcds-internal state. Some may live outside the overlay and need explicit handling.
The build-time-idmap one-line change might miss something: the sandbox helper hard-codes id -u left4me; that becomes id -u l4d2-web. But if any other tooling (deploy scripts, doc commands) references left4me, those need updating too.
ckn-bw migration: the user/group/file changes also need to land in ckn-bw's bundles/left4me/items.py (users, groups, directories, files). Cross-repo coordination.

Pointers

Current users: ckn-bw bundles/left4me/items.py:42-58 (groups + users dicts).
Existing sudoers: deploy/files/etc/sudoers.d/left4me.
Systemd unit User= directives: emitted from ckn-bw bundles/left4me/metadata.py systemd_units reactor. Search for User=left4me.
Web env files: /etc/left4me/{host,web}.env templated from ckn-bw bundles/left4me/files/etc/left4me/{host,web}.env.mako.
Database location and mode: ckn-bw bundles/left4me/items.py has the chmod near the steamcmd/alembic actions (the "0640" block).
Build-time-idmap helper that needs the one-line target-uid change: deploy/files/usr/local/libexec/left4me/left4me-script-sandbox, the LEFT4ME_UID=$(id -u left4me) line.
Admin CLI: deploy/files/usr/local/sbin/left4me.

Related design docs:

docs/superpowers/plans/2026-05-15-build-time-idmap.md — flagged this as out-of-scope future direction.
docs/superpowers/specs/2026-05-15-deploy-dir-rethink-design.md — adjacent open questions about deploy/ layout.
docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md — template-unit refactor for the script sandbox. That refactor and this one are orthogonal; either can land first. If both are on the table, doing the unit refactor first means the user-split change touches a clean unit file instead of a bash helper.

What's NOT in scope

Per-instance gameserver uids (one uid per server). Marginal gain for a single-host deploy.
Splitting the web app process (Flask + gunicorn + job worker) into separate uids. They run in the same Python process; would require a worker-as-subprocess redesign.
Replacing systemd-managed users with PrivateUsers=true (userns mapping). Different mechanism; doesn't replace POSIX uid separation for filesystem access.
Hardening the existing 2-user setup further (seccomp tightening, capability drops, etc.) — out of scope for the split decision; could happen in either configuration.

Decision criteria

Do this if:

L4D2 engine RCE potential keeps you up at night.
The web app handles sensitive operations (admin auth, payments, multi-tenancy).
You want to harden as a matter of hygiene even without a specific threat.

Skip this if:

Solo-operator infra on a personal VPS, internal-only.
The compromise scenarios above are unlikely or low-impact.
You'd rather spend the day on user-facing features.

The current 2-user state is correct and tested. This refactor is an upgrade, not a fix.
