left4me/docs/superpowers/specs/2026-05-15-user-uid-split-design.md
mwiegand 8971b23617
refactor(sandbox): collapse l4d2-sandbox user into left4me
The hardening refactor that just landed closes the same-uid attack
surface (FS view, ptrace, /proc visibility, signals) for the web +
gameserver units via systemd directives plus system-wide
kernel.yama.ptrace_scope=2. Keeping the script-sandbox on a separate
uid was the inconsistent half-step — defense-in-depth only, with
build-time-idmap complexity attached. One principle wins: harden
once, share the uid.

scripts/libexec/left4me-script-sandbox: drop the idmap block (uid
lookups, STAGING setup, cleanup_staging trap, mount --bind
--map-users), switch User=/Group= to left4me, point BindPaths at
$OVERLAY_DIR directly. Header comment updated to reflect
hardening-not-uid as the same-uid defense. nsenter self-wrap kept —
it's about mount-namespace escape, not uid.

Tests + comments + companion docs updated. Build-time-idmap and
overlay-idmap plans marked SUPERSEDED; user-uid-split spec revised
to "1 user is correct"; one-line update notes on the hardening
specs and the build-overlay-unit-design.

Companion ckn-bw commit removes the l4d2-sandbox user + group and
tightens /var/lib/left4me from 0711 → 0755 (the traverse-only mode
was specifically for the sandbox uid).
2026-05-15 15:50:57 +02:00

How many system users should left4me have? — 1, 2, or 3

Status: SUPERSEDED 2026-05-15 by the hardening refactor + uid-collapse.

The original question — should left4me have 1, 2, or 3 system users — is now answered: 1 user (after the uid-collapse refactor) is correct. The defenses that motivated a multi-user split (DB readability from srcds, cross-server ptrace, same-uid /proc visibility, web-side reach into gameserver state) are closed by the systemd hardening composition landed in the hardening-refactor plan (docs/superpowers/plans/2026-05-15-hardening-refactor.md):

  • PrivateUsers=true blocks cross-uid ptrace at the kernel level.
  • PrivatePIDs=true hides peer processes even when uids match.
  • TemporaryFileSystem= + minimal binds hide the DB and web.env from srcds entirely.
  • SystemCallFilter=~@debug + empty CapabilityBoundingSet= block ptrace at the syscall layer.
  • System-wide kernel.yama.ptrace_scope=2 blocks same-uid ptrace.
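The last item can be checked on a live host. The sketch below reads the Yama sysctl from /proc/sys/kernel/yama/ptrace_scope and interprets it using the modes in the kernel's Yama documentation (0 classic, 1 restricted, 2 admin-only, 3 no attach). The function names are hypothetical. The file is absent when the Yama LSM is not enabled, and the sketch reports that case as None.

```python
from pathlib import Path
from typing import Optional

# Yama ptrace_scope modes, per the kernel's Yama LSM documentation.
YAMA_MODES = {
    0: "classic: any process may ptrace another process with the same uid",
    1: "restricted: only descendants, or tracers declared via PR_SET_PTRACER",
    2: "admin-only: only processes with CAP_SYS_PTRACE may use ptrace",
    3: "no attach: ptrace attach is disabled entirely",
}

PTRACE_SCOPE_PATH = Path("/proc/sys/kernel/yama/ptrace_scope")


def read_ptrace_scope(path: Path = PTRACE_SCOPE_PATH) -> Optional[int]:
    """Return the current Yama ptrace_scope, or None if Yama is not enabled."""
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None


def blocks_same_uid_ptrace(scope: Optional[int]) -> bool:
    """True when an unprivileged same-uid process cannot attach at all.

    Scope 1 still lets a process trace its own descendants, so this sketch
    only treats 2 and 3 as blocking, matching the choice of 2 above.
    """
    return scope is not None and scope >= 2


if __name__ == "__main__":
    scope = read_ptrace_scope()
    print(scope, YAMA_MODES.get(scope, "Yama not enabled"), blocks_same_uid_ptrace(scope))
```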

The interim state (left4me + l4d2-sandbox) recorded earlier in this doc was the principled middle ground — script-sandbox builds keeping a separate uid for kernel-enforced isolation. After the hardening refactor closed the same-uid attack surface for server/web, leaving the sandbox on a separate uid was the inconsistent half-step. The uid-collapse refactor (docs/superpowers/plans/2026-05-15-uid-collapse.md) removed l4d2-sandbox so the sandbox now runs as left4me, defended by the same hardening profile. One principle: hardening covers it.

The residual filesystem-ACL surface (DB at 0640 root:left4me, web.env same) is a separate concern: a uid split would close it via kernel ACLs, but for the current deployment shape it's covered by the systemd-imposed FS view. If the deployment shape changes (multi-tenant host, shell logins as the service uid, additional services running as left4me outside these units) the uid split should be revisited.

The original content of this spec is preserved below for context.


Status: open question, not settled design. This is a handoff document. Today left4me has 2 system users: left4me (web app + gameservers + workshop builds) and l4d2-sandbox (script-overlay sandbox). Whether that split is correct — should we collapse, or split further? — has surfaced multiple times across recent design work without ever being settled. A future session should evaluate and decide; this doc gives them enough context to do so cold.

Why this came up

Three relevant moments:

  1. Build-time idmap (2026-05-15, this session): when we considered eliminating the l4d2-sandbox uid entirely and just running the script sandbox as left4me, we noted "sandbox escape could see web.env / DB / running gameservers" — i.e. uid separation is the load-bearing defense layer. We kept l4d2-sandbox. Plan at docs/superpowers/plans/2026-05-15-build-time-idmap.md flagged this as an out-of-scope future direction.

  2. Idmap-on-mount plan (2026-05-14): in docs/superpowers/plans/2026-05-14-overlay-idmap.md "Out of scope" section: "Gameserver uid split (separating the gameserver-runtime uid from left4me) — planned for a later session." No design captured.

  3. In this session's conversation: when discussing the server-side-symlink option for the helper consolidation, we surfaced that a compromised web app (running as left4me) can already reach left4me-owned gameserver state, RCON, etc., because they share a uid. Splitting them would localize web compromise.

The question never got a structured answer because each plan inherited "we have 2 users" as a constraint, not as a design choice. This doc fixes that.

Note: these are system units, not user units

Both left4me-server@.service and left4me-web.service are system units at /usr/local/lib/systemd/system/, started by PID 1, that drop to the unprivileged uid via User=left4me Group=left4me after their +-prefixed ExecStartPre runs as root. They are not user units (no systemctl --user, no per-user systemd instance, no lingering required).

This makes the user-split refactor much simpler than it would be otherwise. Changing User=/Group= in the unit is a literal one-line edit per unit. None of the typical user-unit friction applies — no lingering setup, no pam_systemd dependency, no "how does the user instance get bootstrapped on boot," no socket activation gymnastics. The privileged ExecStartPre/ExecStopPost steps continue to run as root via the + prefix regardless of what User= is set to.
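To illustrate why this is cheap, here is a minimal sketch that renders the relevant [Service] lines for a unit from a user/group pair. The function name and both commands are placeholders (assumptions), not the real units emitted by ckn-bw. The point is that only User= and Group= differ, while the +-prefixed ExecStartPre= keeps running with full privileges.

```python
def render_service_section(user: str, group: str,
                           exec_start_pre: str, exec_start: str) -> str:
    """Render a minimal [Service] section for a system unit.

    The '+' prefix makes systemd run ExecStartPre= with full privileges,
    ignoring User=/Group=; ExecStart= runs as the given user and group.
    """
    lines = [
        "[Service]",
        f"User={user}",
        f"Group={group}",
        f"ExecStartPre=+{exec_start_pre}",
        f"ExecStart={exec_start}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical placeholder commands; the real units differ.
PRE = "/usr/local/libexec/example/privileged-setup %i"
START = "/usr/local/bin/example-start %i"

before = render_service_section("left4me", "left4me", PRE, START)
after = render_service_section("l4d2-game", "l4d2-game", PRE, START)

# Only the identity lines change between the two renderings.
changed = [(a, b) for a, b in zip(before.splitlines(), after.splitlines()) if a != b]
print(changed)
# [('User=left4me', 'User=l4d2-game'), ('Group=left4me', 'Group=l4d2-game')]
```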

Current state (2 users)

User → what runs as it, what it reads, what it writes:

left4me (uid 980)
  • Runs: Flask web app (left4me-web.service), srcds_run for each gameserver instance (left4me-server@.service), web-driven workshop & files-overlay builds (in-process Python).
  • Reads: DB, env files, /opt/left4me/src, all of /var/lib/left4me/.
  • Writes: DB, /var/lib/left4me/{overlays,instances,runtime,…}, /opt/left4me/src (pip install -e creates egg-info).

l4d2-sandbox (uid 981)
  • Runs: Script-overlay sandbox (systemd-run-launched transient unit; will become build-overlay@.service if that refactor lands).
  • Reads: /etc/left4me/sandbox-resolv.conf, /etc/ssl, /etc/ca-certificates, the script bind-mounted at /script.sh, the idmapped /overlay bind.
  • Writes: /overlay only. After the build-time-idmap refactor, those writes land on disk as left4me-owned via the bind's uid translation.

What this prevents today:

  • Sandbox-escape ⇒ can't read the DB (different uid; DB is root:left4me 0640).
  • Sandbox-escape ⇒ can't attach to gameserver processes (different uid; can't ptrace, can't signal).
  • Sandbox-escape ⇒ can't write to anything outside its bind (ProtectSystem=strict + bind list).
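The DB bullet rests on ordinary POSIX permission bits. A simplified model of the kernel's check (no root bypass, capabilities, or ACLs) shows why a file at root:left4me 0640 is readable by left4me but not by l4d2-sandbox. The uids come from the table above; the gids (980 for left4me, 981 for l4d2-sandbox) and all names in the code are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import FrozenSet

READ, WRITE, EXEC = 4, 2, 1


@dataclass(frozen=True)
class Principal:
    uid: int
    gid: int                                   # primary group
    supplementary: FrozenSet[int] = field(default_factory=frozenset)

    def groups(self) -> FrozenSet[int]:
        return self.supplementary | {self.gid}


def may_access(p: Principal, owner_uid: int, owner_gid: int, mode: int, want: int) -> bool:
    """Simplified POSIX permission check.

    Exactly one class of bits applies: owner if the uid matches, else group
    if any of the principal's groups matches, else "other".
    """
    if p.uid == owner_uid:
        bits = (mode >> 6) & 0o7
    elif owner_gid in p.groups():
        bits = (mode >> 3) & 0o7
    else:
        bits = mode & 0o7
    return (bits & want) == want


# uids from the table above; gids are assumed.
LEFT4ME = Principal(uid=980, gid=980)
SANDBOX = Principal(uid=981, gid=981)
DB = dict(owner_uid=0, owner_gid=980, mode=0o640)   # root:left4me 0640

print(may_access(LEFT4ME, **DB, want=READ))   # True: group class, r--
print(may_access(SANDBOX, **DB, want=READ))   # False: "other" class, ---
```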

What this doesn't prevent today:

  • Web-app compromise ⇒ full access to DB, env files, all gameservers (same uid).
  • Web-app compromise ⇒ can sudo the privileged helpers (script-sandbox, overlay mount, systemctl) per sudoers rules.
  • Web-app compromise ⇒ can replace /opt/left4me/src/ Python code (it's left4me-owned); on next gunicorn reload, attacker code runs as the web app.
  • Web-app compromise ⇒ can ptrace running gameservers (same uid; can read their memory, inject code).
  • Gameserver compromise (e.g. RCE via game protocol bug) ⇒ symmetric: read DB, mess with web app, etc.

Three configurations

1-user (left4me only)

Collapse l4d2-sandbox into left4me. Sandbox runs as left4me with the existing systemd hardening (ProtectSystem=strict, narrow binds, seccomp, etc.).

  • Pros: simplest. No idmap needed anywhere (sandbox writes land as left4me natively). Drops ~40 lines of helper code. No cross-uid file-access plumbing.
  • Cons: a sandbox escape that the systemd hardening fails to contain (e.g. kernel bug bypassing seccomp; mount-namespace escape) gains the web app's uid — full DB / env / gameserver access. The current 2-user split exists specifically to limit this blast radius.
  • When this is OK: if you're confident systemd hardening is load-bearing and uid separation is belt-and-braces. The cost of failure is "web app compromised from a sandbox bug"; judge whether you'd accept that risk.

2-user (current: left4me + l4d2-sandbox)

Status quo after the build-time-idmap refactor. Web/game share a uid; sandbox is separate.

  • Pros: existing architecture, working code, all tests pass. One idmap point (build-time), well-understood.
  • Cons: a web-app compromise has full access to gameserver state (same uid as srcds). A gameserver RCE (e.g. via L4D2 network code bug) has full access to the web app's state.
  • Threat model: assumes the web app and gameservers are mutually trusting. Acceptable for solo-operator infra; less so for multi-tenant.

3-user (l4d2-web + l4d2-game + l4d2-sandbox)

Split left4me into two: l4d2-web for the web app, l4d2-game for gameservers. Keep l4d2-sandbox.

  • Pros: localizes web compromise (can no longer attach to running gameservers, read their memory, modify their per-instance config). Symmetric protection for gameserver RCEs (can't reach the DB / env files directly).
  • Cons: significant plumbing for cross-uid file access (overlays, instance state, upper-layer staging). Reintroduces an idmap concern at the gameserver boundary (overlay copy-up by l4d2-game of l4d2-web-owned lowerdirs), unless we use a shared group instead. See "Cross-uid plumbing" below.

The bigger uid-set version (per-instance gameserver uids, e.g. l4d2-game-1, l4d2-game-2) is out of scope for this doc: the marginal gain over a single l4d2-game is small for a single-host deployment.

Cross-uid plumbing (the hard part of 3-user)

These are the file boundaries that the split affects:

Overlays (/var/lib/left4me/overlays/<id>/)

  • Today: left4me-owned (after the build-time-idmap migration); under 3-user they would become l4d2-web-owned.
  • l4d2-game needs to read them as lowerdir at overlay mount time.
  • Sandbox writes through idmap; the idmap target uid must be the one whose writes appear on disk.

Three approaches:

  • Shared group l4d2-overlay: both l4d2-web and l4d2-game members. Overlays chgrp'd to l4d2-overlay, mode 2775 (setgid so new entries inherit). Sandbox idmap maps to whichever primary uid (probably l4d2-web since the web app creates the overlay dirs). l4d2-game reads via group access. Copy-up by l4d2-game produces files owned by l4d2-game (its own primary uid) — but since they end up in upper, that's fine.
  • World-readable: overlays mode 0755; l4d2-game reads via "other" access. Simpler but slightly looser perms. Acceptable for internal infra.
  • Per-overlay idmap on gameserver mount (back to what we just removed): bind overlays as idmapped lowerdirs at gameserver start time, presenting them as l4d2-game-owned to overlayfs. We already know this works — we just deleted that code path. Don't re-add it; prefer group-based access.
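The shared-group option comes down to a chgrp plus a setgid mode on each shared directory. A minimal standard-library sketch, assuming the caller may chown the directory to the target group (root, or a member of that group); the helper names are hypothetical.

```python
import grp
import os
import stat
from typing import Union


def make_shared_dir(path: str, group: Union[str, int], mode: int = 0o2775) -> None:
    """Create `path` if needed, hand it to `group`, and set a setgid mode.

    With the setgid bit on a directory, entries created inside it inherit the
    directory's group instead of the creator's primary group, so files written
    by either uid stay in the shared group.
    """
    gid = group if isinstance(group, int) else grp.getgrnam(group).gr_gid
    os.makedirs(path, exist_ok=True)
    os.chown(path, -1, gid)   # -1 leaves the owner unchanged
    os.chmod(path, mode)      # explicit chmod: makedirs' mode is masked by the umask


def describe(path: str) -> str:
    """Group id and permission bits (including setgid) of `path`."""
    st = os.stat(path)
    return f"gid={st.st_gid} mode={stat.S_IMODE(st.st_mode):04o}"
```

During a migration this would run as root, e.g. make_shared_dir("/var/lib/left4me/overlays", "l4d2-overlay"). The setgid bit only affects entries created afterwards, so existing contents still need a separate recursive chgrp.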

Upper layer (/var/lib/left4me/runtime/<n>/upper/)

  • Today: left4me-owned, written by both the web app (server.cfg staging in start_instance) and the gameserver (copy-up at runtime).
  • With 3-user: l4d2-game writes via copy-up; l4d2-web writes server.cfg staging.

Two paths:

  • Shared group l4d2-runtime (could be the same as l4d2-overlay or distinct). Upper-layer dir mode 2775, setgid'd, both uids write via group.
  • Move server.cfg staging out of start_instance into the systemd unit itself: ExecStartPre=+... does the cp as root and chowns to l4d2-game. Cleaner separation of concerns; harder to pipe dynamic content (the cfg lines come from the DB blueprint). Either pass them via an env file, or have the web app write instances/<n>/server.cfg (its own dir) and have the unit copy it into upper.
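For the second path, the web app only writes into a directory it owns, and the unit's root ExecStartPre= copies the result into upper. A minimal sketch of the web-side write, assuming a hypothetical write_server_cfg helper and the instances/<n>/server.cfg layout named above. Writing to a temporary file and renaming it means the copy step never sees a half-written cfg.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def write_server_cfg(instances_dir: Path, instance: int, lines: Iterable[str]) -> Path:
    """Atomically write instances/<n>/server.cfg from blueprint lines."""
    target_dir = instances_dir / str(instance)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "server.cfg"
    body = "".join(line.rstrip("\n") + "\n" for line in lines)

    # Temp file in the same directory, so os.replace() is an atomic rename.
    # mkstemp creates it 0600, which is fine because the copy step runs as root.
    fd, tmp = tempfile.mkstemp(dir=target_dir, prefix=".server.cfg.")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(body)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, target)
    except BaseException:
        os.unlink(tmp)
        raise
    return target
```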

Database (/var/lib/left4me/left4me.db)

  • Today: root:left4me 0640. left4me (uid 980) can read and write.
  • With 3-user: only l4d2-web should write. l4d2-game shouldn't need DB access at all (gameservers operate from instance.env + overlay state; DB is a web-side concern).
  • Migration: chown to root:l4d2-web, mode unchanged. Game uid is not in the l4d2-web group, so it can't read the DB. Clean.

Env files (/etc/left4me/{host.env,web.env})

  • Today: root:left4me 0640. host.env for L4D2 server config (used by systemd units), web.env for the Flask app (secret_key, DB url, etc.).
  • With 3-user:
    • web.env → root:l4d2-web 0640. Only the web app reads it.
    • host.env is more nuanced — used by gameserver units (left4me-server@.service sources it). Today both web and game read it (web uses host.env for some operations too). Probably keep readable by both: root owner, group membership for both uids. Or duplicate the few values that matter.
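Deciding who needs host.env starts with knowing which of its keys each consumer mentions. A rough audit sketch, assuming plain KEY=VALUE lines (optionally prefixed with export) and treating any whole-word mention of a key in a consumer's text as a use; the function names and the sample keys are hypothetical.

```python
import re
from typing import Dict, Set

_ASSIGN = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=")


def env_keys(env_text: str) -> Set[str]:
    """Keys assigned in a KEY=VALUE env file; comments and blank lines are ignored."""
    keys: Set[str] = set()
    for line in env_text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = _ASSIGN.match(line)
        if match:
            keys.add(match.group(1))
    return keys


def audit(env_text: str, consumers: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each env key to the consumers whose text mentions it as a whole word."""
    result: Dict[str, Set[str]] = {}
    for key in sorted(env_keys(env_text)):
        pattern = re.compile(rf"\b{re.escape(key)}\b")
        result[key] = {name for name, text in consumers.items() if pattern.search(text)}
    return result


HOST_ENV = """\
# hypothetical keys, for illustration only
STEAM_DIR=/opt/left4me/steam
export L4D2_PORT_BASE=27015
"""
CONSUMERS = {
    "left4me-server@.service": "ExecStart=/bin/example -port ${L4D2_PORT_BASE}",
    "web app": "steam_dir = os.environ['STEAM_DIR']",
}
print(audit(HOST_ENV, CONSUMERS))
# {'L4D2_PORT_BASE': {'left4me-server@.service'}, 'STEAM_DIR': {'web app'}}
```

Keys that map to both consumers are the ones that need a group both uids share. Keys used by only one consumer could move into that consumer's own env file.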

Code (/opt/left4me/src/)

  • Today: left4me-owned (for pip install -e editable installs; creates egg-info/ in the source tree).
  • With 3-user: only the web app needs to run the Python code from this tree. Make it l4d2-web-owned.
  • The gameserver runs srcds_run from /var/lib/left4me/installation/ (Steam-managed) and the overlay binds — doesn't touch /opt/left4me/src/.

Helpers and sudoers

  • left4me-overlay: invoked by left4me-server@.service (root via systemd + prefix). Doesn't need user accounts at all.
  • left4me-script-sandbox (today): invoked by the web app via sudo. With 3-user, the sudoers grant moves from left4me to l4d2-web.
  • left4me-systemctl and left4me-journalctl: invoked by the web app. Same — sudoers moves to l4d2-web.
  • The admin CLI /usr/local/sbin/left4me (the sudo left4me <flask cmd> wrapper): drops to l4d2-web (the web app's uid) to run flask commands.

Idmap implications

With the build-time idmap landing on 2026-05-15, the sandbox's writes are translated from the in-mount uid to the disk-side uid via mount --bind --map-users=<disk_uid>:<sandbox_uid>:1.

Currently disk_uid = left4me.

Under 3-user, the disk-side target of the sandbox idmap should be the uid that "owns" overlay state — i.e. l4d2-web, since the web app creates the overlay dirs and reads them for the file-tree endpoint. Gameserver access goes through the shared group, not the idmap.

This means changing the helper's id -u left4me to id -u l4d2-web is the one-line change captured in the build-time-idmap plan. Trivial.
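Conceptually, a one-entry idmap is a small lookup: ids inside the mapped range are shifted to the other side, and ids outside it are unmapped. The sketch below models only that translation and the one-line target change (pwd.getpwnam as the Python equivalent of id -u). It deliberately makes no claim about which side of mount's --map-users argument is inner and which is outer; check mount(8) for that. The uid values follow this doc; the names are hypothetical.

```python
import pwd
from typing import NamedTuple, Optional


class IdMapEntry(NamedTuple):
    sandbox_start: int   # uid as seen inside the sandbox's mount
    disk_start: int      # uid recorded on disk
    count: int

    def to_disk(self, uid: int) -> Optional[int]:
        """Translate a sandbox-side uid to the on-disk uid, or None if unmapped."""
        if self.sandbox_start <= uid < self.sandbox_start + self.count:
            return self.disk_start + (uid - self.sandbox_start)
        return None


def disk_uid_for(user: str) -> int:
    """Python equivalent of `id -u <user>`."""
    return pwd.getpwnam(user).pw_uid


SANDBOX_UID = 981
today = IdMapEntry(sandbox_start=SANDBOX_UID, disk_start=980, count=1)   # left4me
three_user = today._replace(disk_start=982)                              # l4d2-web

print(today.to_disk(981), three_user.to_disk(981), three_user.to_disk(0))
# 980 982 None
```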

Threat-model heuristics

The decision really turns on what you think is most likely:

  • "Web app gets RCE'd via a Flask/dependency bug, or session hijack" → 3-user helps (game state survives).
  • "Gameserver gets RCE'd via L4D2 source-engine bug" → 3-user helps (web/DB survives). L4D2 is old code with known unpatched vulns in the engine; this is non-negligible.
  • "Sandbox script exploits a kernel bug to escape seccomp" → 2-user already helps. Going to 3-user doesn't add much here.
  • "Local privilege escalation via sudo helper" → 3-user is modestly better (sudoers grants are narrower per-uid).

If you take L4D2 engine RCE seriously, the 2→3 split has real value. If you think the web app is the most exposed surface and the gameservers are behind a NAT/firewall + only running trusted maps, less so.

Migration plan sketch (for the 3-user option)

If you choose 3-user, the rough plan:

  1. Create the new users in ckn-bw bundle:

    users = {
        'l4d2-web':     {'uid': 982, 'gid': 982, 'home': '/var/lib/left4me',
                         'shell': '/usr/sbin/nologin'},
        'l4d2-game':    {'uid': 983, 'gid': 983, 'home': '/var/lib/left4me',
                         'shell': '/usr/sbin/nologin'},
        'l4d2-sandbox': {'uid': 981, ...},  # unchanged
    }
    groups = {
        'l4d2-web':     {'gid': 982},
        'l4d2-game':    {'gid': 983},
        'l4d2-sandbox': {'gid': 981},
        'l4d2-overlay': {'gid': 984, 'members': ['l4d2-web', 'l4d2-game']},
    }
    

    The old left4me user/group can be kept as an alias or removed entirely.

  2. Re-chown across the migration:

    • /opt/left4me/src → l4d2-web:l4d2-web (recursive).
    • /var/lib/left4me/left4me.db → root:l4d2-web 0640.
    • /etc/left4me/web.env → root:l4d2-web 0640.
    • /etc/left4me/host.env → root:l4d2-web 0640 (decide if game needs to read it; if so, supplementary group).
    • /var/lib/left4me/overlays → l4d2-web:l4d2-overlay 2775.
    • /var/lib/left4me/instances → l4d2-web:l4d2-web.
    • /var/lib/left4me/runtime → l4d2-web:l4d2-overlay 2775 (or whatever combination accommodates upper-layer writes from both uids).
    • /var/lib/left4me/installation → l4d2-game:l4d2-game (steamcmd writes here as the game user).
  3. Update User=/Group= in systemd units:

    • left4me-web.service: User=l4d2-web Group=l4d2-web
    • left4me-server@.service: User=l4d2-game Group=l4d2-game
    • Both: add SupplementaryGroups=l4d2-overlay where needed.
  4. Update sudoers (/etc/sudoers.d/left4me):

    l4d2-web ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-systemctl *
    l4d2-web ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-journalctl *
    l4d2-web ALL=(root) NOPASSWD: /usr/local/libexec/left4me/left4me-script-sandbox  # or the build-overlay unit
    

    Game uid gets no sudoers grants at all (gameservers don't privesc).

  5. Update the script-sandbox idmap target:

    • In deploy/files/usr/local/libexec/left4me/left4me-script-sandbox, change id -u left4me → id -u l4d2-web and matching groups. One-line change.
  6. Update the admin CLI:

    • /usr/local/sbin/left4me does sudo -u left4me sh -c '. host.env; . web.env; flask …'. Change to sudo -u l4d2-web ….
  7. Update steamcmd ownership in ckn-bw:

    • actions['left4me_install_steamcmd'] runs as left4me; change to l4d2-game.
    • /opt/left4me/steam dir → l4d2-game.
  8. Add tests that assert the user-split invariant in deploy/tests/test_deploy_artifacts.py (e.g. sudoers grants to the right uid, systemd unit User= matches).

  9. Run the migration on the test server:

    • Stop all left4me-server@* + left4me-web.
    • Run a one-shot chown script (or bw apply if the bundle does the chowns).
    • Start everything; verify nothing broken.
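Step 8's invariant tests could look roughly like this sketch. It assumes the sudoers file and unit files are available as text and that each unit carries a single User= line. The helper names and the expected-user table are hypothetical, and the sample sudoers text mirrors step 4.

```python
import re
from typing import Dict, Optional, Set


def sudoers_principals(text: str) -> Set[str]:
    """First field of every rule line (blank, comment and Defaults lines skipped)."""
    principals: Set[str] = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "Defaults")):
            continue
        principals.add(line.split()[0])
    return principals


def unit_user(unit_text: str) -> Optional[str]:
    """Value of the User= directive in a unit file, if present."""
    match = re.search(r"^User=(\S+)\s*$", unit_text, flags=re.MULTILINE)
    return match.group(1) if match else None


EXPECTED_UNIT_USERS: Dict[str, str] = {
    "left4me-web.service": "l4d2-web",
    "left4me-server@.service": "l4d2-game",
}


def check_user_split(sudoers_text: str, units: Dict[str, str]) -> None:
    # Only the web uid may hold sudo grants; the game uid gets none.
    assert sudoers_principals(sudoers_text) == {"l4d2-web"}
    for name, expected in EXPECTED_UNIT_USERS.items():
        assert unit_user(units[name]) == expected, name
```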

Estimate: 1-2 working days. Most of the time is debugging surprise cross-uid permission failures (the kind of thing you can't fully predict without trying it).

Open decisions

  1. Do we actually want this? Threat-model section above. If the answer is "no, 2-user is fine," close this doc and don't come back to it. The current state is correct and tested.
  2. If yes, web vs game split, or also split per-instance gameserver? Recommendation: single l4d2-game uid for all instances. Per-instance uids add a lot of ceremony for a marginal hardening gain.
  3. l4d2-overlay shared group: one group or two? Could split into l4d2-overlay-read (game-membership) and l4d2-overlay-write (web-membership), but probably overkill.
  4. Keep the left4me user as a no-op alias for compatibility? Existing systemd units, sudoers, etc. all reference left4me; migration is easier if we just rename. But a clean break is easier to reason about. Recommendation: clean break, no alias.
  5. Should we collapse to 1 user instead? Captured above. The defense-in-depth from l4d2-sandbox is real; recommend keeping it.
  6. Does host.env need to be readable by both web and game? Audit what's in it. If it's all gameserver-specific values, root:l4d2-game. If split, may need to factor into two env files or use supplementary groups.
  7. steamcmd install location and ownership. Today steamcmd self-updates at runtime as left4me. With game uid, this needs to run as l4d2-game (or the game uid needs write to /opt/left4me/steam/).

Verification (for the 3-user migration)

After migration on left4.me:

# uids are distinct
id l4d2-web; id l4d2-game; id l4d2-sandbox

# nothing left4me-owned remains
sudo find /var/lib/left4me /opt/left4me /etc/left4me -user left4me 2>/dev/null
# expect: empty

# the right things own the right things (stat prints owner:group mode path)
sudo stat -c '%U:%G %a %n' /var/lib/left4me/left4me.db      # root:l4d2-web 640
sudo stat -c '%U:%G %a %n' /etc/left4me/web.env              # root:l4d2-web 640
sudo stat -c '%U:%G %a %n' /opt/left4me/src                  # l4d2-web:l4d2-web
sudo stat -c '%U:%G %a %n' /var/lib/left4me/overlays         # l4d2-web:l4d2-overlay 2775
sudo stat -c '%U:%G %a %n' /var/lib/left4me/installation     # l4d2-game:l4d2-game

# unit User= is right
systemctl show left4me-web -p User                # l4d2-web
systemctl show left4me-server@2 -p User           # l4d2-game

# game uid can read overlays via shared group
sudo -u l4d2-game cat /var/lib/left4me/overlays/9/left4dead2/addons/sourcemod.vdf
# (should succeed)

# game uid cannot read DB
sudo -u l4d2-game cat /var/lib/left4me/left4me.db 2>&1
# expect: permission denied

# web uid cannot ptrace srcds
sudo -u l4d2-web gdb --batch -ex "attach $(pgrep -f srcds_linux | head -1)" 2>&1
# expect: Operation not permitted

# everything works end-to-end
# - server 2 stays running
# - script overlay rebuild succeeds
# - web UI responds and shows live logs from server 2

Risks (for the 3-user migration)

  • Surprise file-access failures in odd corners: the admin CLI, log paths, lock files, alembic migrations, the database WAL files, steamcmd cache, workshop cache. Each is a 30-second fix but they add up.
  • Concurrent migration + running services: stop everything, migrate, restart. Don't try to migrate live.
  • Workshop builds: today the web app calls steamcmd directly in-process to download workshop items. With a split, this needs to either move to a sudo'd helper (run as l4d2-game since steamcmd is game-owned) or duplicate the steam install for the web user. Probably the former.
  • Gameserver writes that we didn't catch: log files, lock files, srcds-internal state. Some may live outside the overlay and need explicit handling.
  • The build-time-idmap one-line change might miss something: the sandbox helper hard-codes id -u left4me; that becomes id -u l4d2-web. But if any other tooling (deploy scripts, doc commands) references left4me, those need updating too.
  • ckn-bw migration: the user/group/file changes also need to land in ckn-bw's bundles/left4me/items.py (users, groups, directories, files). Cross-repo coordination.

Pointers

  • Current users: ckn-bw bundles/left4me/items.py:42-58 (groups + users dicts).
  • Existing sudoers: deploy/files/etc/sudoers.d/left4me.
  • Systemd unit User= directives: emitted from ckn-bw bundles/left4me/metadata.py systemd_units reactor. Search for User=left4me.
  • Web env files: /etc/left4me/{host,web}.env templated from ckn-bw bundles/left4me/files/etc/left4me/{host,web}.env.mako.
  • Database location and mode: ckn-bw bundles/left4me/items.py has the chmod near the steamcmd/alembic actions (the "0640" block).
  • Build-time-idmap helper that needs the one-line target-uid change: deploy/files/usr/local/libexec/left4me/left4me-script-sandbox, the LEFT4ME_UID=$(id -u left4me) line.
  • Admin CLI: deploy/files/usr/local/sbin/left4me.

Related design docs:

  • docs/superpowers/plans/2026-05-15-build-time-idmap.md — flagged this as out-of-scope future direction.
  • docs/superpowers/specs/2026-05-15-deploy-dir-rethink-design.md — adjacent open questions about deploy/ layout.
  • docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md — template-unit refactor for the script sandbox. Both that refactor and this one are orthogonal; either can land first. If both are on the table, doing the unit refactor first means the user-split change touches a clean unit file instead of a bash helper.

What's NOT in scope

  • Per-instance gameserver uids (one uid per server). Marginal gain for a single-host deploy.
  • Splitting the web app process (Flask + gunicorn + job worker) into separate uids. They run in the same Python process; would require a worker-as-subprocess redesign.
  • Replacing systemd-managed users with PrivateUsers=true (userns mapping). Different mechanism; doesn't replace POSIX uid separation for filesystem access.
  • Hardening the existing 2-user setup further (seccomp tightening, capability drops, etc.) — out of scope for the split decision; could happen in either configuration.

Decision criteria

Do this if:

  • L4D2 engine RCE potential keeps you up at night.
  • The web app handles sensitive operations (admin auth, payments, multi-tenancy).
  • You want to harden as a matter of hygiene even without a specific threat.

Skip this if:

  • Solo-operator infra on a personal VPS, internal-only.
  • The compromise scenarios above are unlikely or low-impact.
  • You'd rather spend the day on user-facing features.

The current 2-user state is correct and tested. This refactor is upgrade-not-fix.