spec(user-uid-split): mark superseded by the hardening refactor

The 1/2/3-user question is answered: stay at 2 (left4me + l4d2-sandbox). The defenses that motivated a 3-user split (cross-uid ptrace, cross-server contamination, web-side reach into gameserver state, DB/env exposure to srcds) are closed by the systemd hardening composition: PrivateUsers + PrivatePIDs + TemporaryFileSystem + SystemCallFilter=~@debug + empty CapabilityBoundingSet. The residual filesystem-ACL surface (mode 0640 root:left4me on DB and web.env) is noted as a separate concern — covered for the current deployment shape, revisit if shape changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
spec(hardening-test-plan): fix four bugs surfaced by executor
2026-05-15 14:59:13 +02:00 · 2026-05-15 14:58:46 +02:00 · 2026-05-15 14:54:10 +02:00 · 2026-05-15 14:39:11 +02:00 · 2026-05-15 14:25:25 +02:00 · 2026-05-15 14:16:02 +02:00
9 changed files with 4114 additions and 59 deletions
--- a/deploy/files/usr/local/lib/systemd/system/left4me-server@.service
+++ b/deploy/files/usr/local/lib/systemd/system/left4me-server@.service
@@ -1,10 +1,21 @@
 # left4me gameserver — system unit, one instance per gameserver.
 #
 # This is the REFERENCE COPY of the deployed unit. The live source is
 # the systemd/units reactor at ~/Projekte/ckn-bw/bundles/left4me/metadata.py
 # (look for 'left4me-server@.service'). Hardening directives live in
 # the HARDENING_SERVER constant near the top of the same file.
 # This file is hand-synced; edit both together.
 #
 # Threat model: docs/superpowers/specs/2026-05-15-hardening-threat-model.md
 # Defenses survey: docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md
 # Test plan + results: docs/superpowers/specs/2026-05-15-hardening-test-plan.md
 [Unit]
 Description=left4me server instance %i
 After=network-online.target
 Wants=network-online.target
 # Bound the restart loop. Without these, a persistent ExecStartPre or
-# ExecStart failure spins indefinitely. Note: these are [Unit]-section
+# ExecStart failure spins indefinitely.
 # directives (systemd 230+), not [Service].
 StartLimitBurst=5
 StartLimitIntervalSec=60s
@@ -14,49 +25,25 @@ User=left4me
 Group=left4me
 EnvironmentFile=/etc/left4me/host.env
 EnvironmentFile=/var/lib/left4me/instances/%i/instance.env
-# `-` prefix: chdir failure is non-fatal. systemd applies WorkingDirectory
+# `-` prefix: chdir failure is non-fatal. The merged dir only exists
-# before every Exec line — including ExecStartPre — but the merged dir only
+# once ExecStartPre's overlay mount succeeds.
 # exists once ExecStartPre's overlay mount succeeds. With `-`, ExecStartPre
 # runs in the unit's home (cwd doesn't matter for the mount helper); the
 # ExecStart re-applies WorkingDirectory after the mount and finds the dir.
 WorkingDirectory=-/var/lib/left4me/runtime/%i/merged/left4dead2
-# Single source of truth for the kernel-overlayfs mount lifecycle: the web
+# `+` prefix runs the helper as PID 1 (root, all caps, host
-# app's start_instance only stages cfg files and asks systemd to enable+
+# namespaces) — required because the unit has NoNewPrivileges=true
-# start this unit; the actual `mount -t overlay` lives here so reboot
+# AND PrivateUsers=true; both block sudo's setuid path. nsenter into
-# auto-start works the same as a UI-driven start. ExecStopPost mirrors it
+# PID 1's mount namespace ensures the umount in ExecStopPost succeeds
-# so the unmount lives in the same place — no Python-side _mounter needed
+# without EBUSY from the unit's own slave-mount tree.
 # in stop/delete/reset paths. Both helper verbs are idempotent.
 #
 # `+` prefix runs the helper as PID 1 (root, no sandbox). Required because
 # the unit has NoNewPrivileges=true, which blocks sudo's setuid escalation
 # — and the helper itself needs root for the mount/umount syscalls.
 #
 # `nsenter --mount=/proc/1/ns/mnt --` runs the helper Python interpreter
 # in PID 1's mount namespace. Without this, the `+` prefix removes the
 # sandbox/credentials but does NOT detach from the unit's per-service
 # mount namespace (created by PrivateTmp/Protect*) — so the helper
 # process itself would hold a reference to that namespace, keeping the
 # slave-mount tree alive after the cgroup empties, and umount in PID 1
 # would return EBUSY for as long as the helper ran. Putting nsenter at
 # the unit-level (as opposed to inside the helper, where only the
 # umount syscall escaped) is what actually frees the namespace. Once
 # the helper is in PID 1's namespace, ExecStopPost's umount succeeds
 # on the first try with no retry/race window. ExecStopPost (not
 # ExecStop) so unmount runs after the cgroup is cleared; ExecStop runs
 # while srcds is still alive and would EBUSY.
 ExecStartPre=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay mount %i
-# Run from the merged overlay, NOT installation/. srcds_run is a shell
+# Run from the merged overlay, NOT installation/. srcds_run cds to its
-# script that `cd`s to its own dirname before exec'ing srcds_linux, so the
+# own dirname before exec'ing srcds_linux; the binary's path determines
-# binary's path determines where the engine reads gameinfo.txt and addons
+# gameinfo + addons lookup.
 # from — WorkingDirectory has no effect. Invoking installation/srcds_run
 # would resolve everything against the lower layer and never see overlay-
 # provided plugins (Metamod/SourceMod) or cfgs (zonemod, confogl).
 ExecStart=/var/lib/left4me/runtime/%i/merged/srcds_run -game left4dead2 +hostport ${L4D2_PORT} $L4D2_ARGS
 ExecStopPost=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay umount %i
 Restart=on-failure
 RestartSec=5
-# Resource control baseline — see docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md
+# === Resource control baseline ===
 # See docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md
 Slice=l4d2-game.slice
 Nice=-5
 IOSchedulingClass=best-effort
@@ -70,16 +57,72 @@ KillSignal=SIGINT
 TimeoutStopSec=15s
 LogRateLimitIntervalSec=0
-# Hardening (unchanged from previous baseline).
+# === Identity / privilege drop ===
-NoNewPrivileges=true
+NoNewPrivileges=true                  # block setuid escalation (defense: D3)
 RestrictSUIDSGID=true                 # block setuid()/setgid() syscalls
 CapabilityBoundingSet=                # drop all caps — no privilege to escalate
 AmbientCapabilities=
 # === Filesystem virtualization ===
 # Mask /var/lib, /etc, /opt, etc. with empty tmpfs; bind back only
 # what srcds needs. The DB (/var/lib/left4me/left4me.db) and web.env
 # (/etc/left4me/web.env) are intentionally not bound — they don't
 # exist in this unit's filesystem view (defenses: D1.a, D1.b).
 TemporaryFileSystem=/var/lib /etc /opt /home /root /srv /mnt /media
 BindReadOnlyPaths=/var/lib/left4me/installation
 BindReadOnlyPaths=/var/lib/left4me/overlays
 BindReadOnlyPaths=/etc/left4me/host.env
 BindReadOnlyPaths=/etc/ssl
 BindReadOnlyPaths=/etc/ca-certificates
 BindReadOnlyPaths=/etc/resolv.conf
 BindReadOnlyPaths=/etc/nsswitch.conf
 BindReadOnlyPaths=/etc/alternatives
 BindPaths=/var/lib/left4me/runtime/%i
 ProtectSystem=strict                  # belt-and-braces with TemporaryFileSystem
 ProtectHome=true
 # === Process namespacing ===
 PrivateUsers=true                     # own user namespace; cross-uid ptrace blocked (D2)
 PrivatePIDs=true                      # own PID namespace; hides peer-srcds + gunicorn (D2.b, D5)
 PrivateTmp=true
 PrivateDevices=true
-ProtectHome=true
+PrivateIPC=true
-ProtectSystem=strict
+RestrictNamespaces=true               # block unshare()/clone(CLONE_NEW*)
-ReadOnlyPaths=/var/lib/left4me/installation /var/lib/left4me/overlays
+
-ReadWritePaths=/var/lib/left4me/runtime/%i
+# === /proc and /sys ===
-RestrictSUIDSGID=true
+ProtectProc=invisible                 # foreign-uid /proc hidden (paired with PrivatePIDs for full hide)
-LockPersonality=true
+ProcSubset=pid                        # /proc shows only PID dirs, no kallsyms/cpuinfo
 ProtectKernelTunables=true            # /proc/sys, /sys read-only
 ProtectKernelModules=true             # no module load/unload
 ProtectKernelLogs=true                # no /dev/kmsg or syslog()
 ProtectClock=true                     # no settimeofday()
 ProtectControlGroups=true             # /sys/fs/cgroup read-only
 ProtectHostname=true                  # no sethostname()
 LockPersonality=true                  # no personality() switches
 # === Syscall filter ===
 # srcds_linux is i386 (Source 2007 engine). 'native x86' allows both
 # x86_64 (from srcds_run + the dynamic linker) and i386 (from srcds_linux).
 # Bare 'native' traps srcds_run in a respawn loop.
 SystemCallArchitectures=native x86
 SystemCallFilter=@system-service
 SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete @privileged
 # ~@debug is the load-bearing block for D2.a: drops ptrace(), process_vm_readv/writev().
 # ~@privileged blocks anything requiring CAP_*, redundant with empty bounding set.
 # MemoryDenyWriteExecute=true is NOT set — Source engine i386 .so files
 # have text relocations that need mprotect(W+X) during dynamic-linker pass.
 # === Network ===
 RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX  # AF_UNIX needed for journald
 # Lock srcds bindable sockets to the game port range.
 SocketBindAllow=udp:27000-27999
 SocketBindAllow=tcp:27000-27999
 # === Misc hygiene ===
 RestrictRealtime=true                 # no real-time scheduling
 RemoveIPC=true                        # clean up SysV IPC on unit stop
 KeyringMode=private                   # private kernel keyring
 UMask=0027
 [Install]
 WantedBy=multi-user.target
--- a/deploy/files/usr/local/lib/systemd/system/left4me-web.service
+++ b/deploy/files/usr/local/lib/systemd/system/left4me-web.service
@@ -1,3 +1,25 @@
 # left4me web application — system unit.
 #
 # This is the REFERENCE COPY of the deployed unit. The live source is
 # the systemd/units reactor at ~/Projekte/ckn-bw/bundles/left4me/metadata.py
 # (look for 'left4me-web.service'). Hardening directives live in
 # the HARDENING_WEB constant near the top of the same file.
 # This file is hand-synced; edit both together.
 #
 # Several directives that the gameserver uses are intentionally absent
 # from this unit:
 #   NoNewPrivileges    — blocks sudo's setuid escalation
 #   PrivateUsers       — breaks sudo's host-root mapping
 #   RestrictSUIDSGID   — blocks setuid()/setgid()
 #   CapabilityBoundingSet=  — empty value would deny sudo's caps
 #   ~@privileged in SystemCallFilter  — blocks sudo's setuid syscall
 # The web app invokes privileged helpers (left4me-systemctl,
 # left4me-overlay, left4me-script-sandbox) via sudo, so these
 # directives can't be applied here. A future refactor replacing sudo
 # with systemctl-managed transient units would unlock them.
 #
 # Threat model + defenses + tests: see docs/superpowers/specs/2026-05-15-hardening-*
 [Unit]
 Description=left4me web application
 After=network-online.target
@@ -7,25 +29,53 @@ Wants=network-online.target
 Type=simple
 User=left4me
 Group=left4me
-WorkingDirectory=/opt/left4me
+WorkingDirectory=/opt/left4me/src
-Environment=HOME=/var/lib/left4me
+Environment=HOME=/var/lib/left4me PATH=/opt/left4me/.venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
 Environment=PATH=/opt/left4me/.venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
 EnvironmentFile=/etc/left4me/host.env
 EnvironmentFile=/etc/left4me/web.env
-ExecStart=/opt/left4me/.venv/bin/gunicorn --workers 1 --threads 32 --bind 0.0.0.0:8000 'l4d2web.app:create_app()'
+# Placeholder values for --workers / --threads. Live emission interpolates
 # from metadata.get('left4me/gunicorn_workers') and gunicorn_threads.
 ExecStart=/opt/left4me/.venv/bin/gunicorn --workers 4 --threads 4 --bind 127.0.0.1:8000 'l4d2web.app:create_app()'
 Restart=on-failure
 RestartSec=3
-# NoNewPrivileges intentionally not set: the worker invokes sudo to run
+
-# the left4me-systemctl, left4me-journalctl, and left4me-overlay
+# Web writes broadly under /var/lib/left4me (DB, instance configs,
-# privileged helpers, all setuid via sudo.
+# overlays, runtime). Kept inline because it's web-specific
-# ProtectSystem=full + ReadWritePaths implicitly give this unit a private
+# (server@ uses BindPaths to bind only its instance dir).
 # mount namespace, but mount visibility no longer depends on it: overlay
 # mounts are performed by the left4me-overlay helper, which nsenters into
 # PID 1's mount namespace, so the resulting mount lives in the host
 # namespace where the per-instance gameserver units can see it.
 ProtectSystem=full
 ReadWritePaths=/var/lib/left4me
 # === Filesystem ===
 ProtectSystem=strict                  # tightened from prior 'full'; via HARDENING_COMMON
 ProtectHome=true
 PrivateTmp=true
 # === /proc + kernel ===
 ProtectProc=invisible                 # foreign-uid /proc hidden (defense: D4)
 ProcSubset=pid
 ProtectKernelTunables=true
 ProtectKernelModules=true
 ProtectKernelLogs=true
 ProtectClock=true
 ProtectControlGroups=true
 ProtectHostname=true
 LockPersonality=true
 # === Syscall filter (sudo-compatible — note absence of ~@privileged) ===
 SystemCallArchitectures=native
 SystemCallFilter=@system-service
 SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete
 # ~@debug blocks ptrace + process_vm_readv/writev (D4).
 # ~@privileged intentionally omitted — sudo needs setuid().
 # === Network ===
 RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
 # === Misc hygiene ===
 RestrictNamespaces=true
 RestrictRealtime=true
 RemoveIPC=true
 KeyringMode=private
 UMask=0027
 [Install]
 WantedBy=multi-user.target
--- a/docs/superpowers/plans/2026-05-15-hardening-refactor.md
+++ b/docs/superpowers/plans/2026-05-15-hardening-refactor.md
--- a/docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md
+++ b/docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md
@@ -0,0 +1,698 @@
 # left4me application hardening — defenses survey
 **Status:** living spec. Companion to `2026-05-15-hardening-threat-model.md`
 and `2026-05-15-hardening-test-plan.md`.
 This document catalogs the Linux + systemd defense primitives applicable
 to left4me, evaluates each against this codebase's needs, and proposes a
 candidate composition. Each candidate is *testable* — the test plan
 exercises it before commit.
 Reference: the threat model defines defenses D1-D7. This document maps
 primitives to those defenses.
 ## Section 1 — Linux kernel primitives
 ### Namespaces (`man 7 namespaces`)
 | NS | Isolates | Relevance |
 |---|---|---|
 | **mount** | filesystem hierarchy view | Core. Gives `TemporaryFileSystem=` + bind primitives. |
 | **user** | uid/gid mapping | Big for D2/D4 (cross-uid ptrace block). |
 | **pid** | PID 1, /proc visibility | Pairs with `ProcSubset=pid` for D2. |
 | **net** | netifs, ports, routes | Breaks gameservers; do **not** apply to server@. |
 | **ipc** | SysV IPC + POSIX MQ (abstract sockets belong to the net namespace) | Hygienic; `PrivateIPC=true`. |
 | **uts** | hostname | Cosmetic; doesn't matter for us. |
 | **time** | CLOCK_MONOTONIC offset | Irrelevant for us. |
 | **cgroup** | cgroup view | Defense-in-depth against cgroup escape. |
 **For left4me:** mount + user + pid + ipc on `left4me-server@.service`.
 The web unit can use the same minus user-ns (incompatible with sudo).
 ### Capabilities (`man 7 capabilities`)
 Per-process, granted at exec via file caps or by systemd at unit start.
 Bounding set = upper bound; ambient = inherited across non-setuid exec.
 - **CapabilityBoundingSet=** empty drops everything. Neither srcds nor
  gunicorn needs any capability after they start (no raw sockets, no
  mount, no module load, no setuid).
 - **AmbientCapabilities=** empty (default).
 Sharp edge: with `+`-prefixed ExecStartPre, the helper runs with full
 privileges (root, all caps), unaffected by these. That's how we get the
 privileged overlay mount without widening the unit's own caps.
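 A minimal sketch of that split, reusing the directives and helper path
 from the server@ unit; an illustration, not the full unit:
 ```ini
 [Service]
 User=left4me
 # Empty values drop every capability for the unit's own processes.
 CapabilityBoundingSet=
 AmbientCapabilities=
 # '+' exempts only this one command line from the settings above.
 ExecStartPre=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay mount %i
 ```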
 ### Seccomp-bpf (`man 2 seccomp`)
 Filters the set of syscalls a process may make; the filter is inherited
 by children. When several filters are loaded, the most restrictive
 verdict wins. The systemd `SystemCallFilter=` directive wraps it.
 For us, two filter strategies:
 - **Allow-list base** (`@system-service`): permissive enough for srcds
  + gunicorn; subtract dangerous groups.
 - **Deny-list**: simpler but easier to leave holes.
 Strategy: allow-list with subtractions.
 Critical subtractions for D2:
 - `~@debug` — drops `ptrace(2)`, `process_vm_readv/writev(2)`,
  `process_madvise(2)`. **Single most important syscall block** for our
  threat model.
 - `~@mount` — `mount`, `umount2`, `pivot_root` (gameserver doesn't need;
  helper does, and helper runs as root via `+` prefix).
 - `~@privileged` — anything requiring CAP_*; redundant with empty
  bounding set but defense-in-depth.
 - `~@reboot`, `~@swap`, `~@cpu-emulation`, `~@obsolete` — cheap removal.
 Sharp edges:
 - `SystemCallFilter=` lines are merged in order: the first line sets the
  allow-list; subsequent `~` lines subtract from it.
 - A `~` subtract on a group not in the allow-list is a no-op.
 - `SystemCallArchitectures=native` blocks foreign-ABI (e.g. 32-bit)
  syscall entries that would bypass the filter. Always set this; because
  `srcds_linux` is i386, server@ needs `native x86`.
 - `SystemCallErrorNumber=EPERM` vs. the default action (terminate the
  caller with `SIGSYS`): `EPERM` is gentler for non-essential paths;
  termination is loud and obvious. Start with the default for a clear
  signal, switch to `EPERM` if a benign caller trips it (e.g., a library
  probing for capabilities).
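 The allow-list-with-subtractions strategy can be sketched as a drop-in.
 Naming the cross-process memory syscalls explicitly is an assumption
 added here, so the block does not depend on which predefined set a given
 systemd release files them under; `systemd-analyze syscall-filter @debug`
 prints the actual membership on the host.
 ```ini
 [Service]
 # server@ uses 'native x86' instead, because srcds_linux is i386.
 SystemCallArchitectures=native
 # Allow-list base first.
 SystemCallFilter=@system-service
 # Then subtract whole sets.
 SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete @privileged
 # Assumption: also deny these by name, independent of set membership.
 SystemCallFilter=~ptrace process_vm_readv process_vm_writev process_madvise
 # Optional: return EPERM rather than terminating the caller.
 #SystemCallErrorNumber=EPERM
 ```
 Comments sit on their own lines because systemd does not parse trailing
 `#` text after a value as a comment.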
 ### Yama LSM — `kernel.yama.ptrace_scope`
 System-wide sysctl. Values:
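 - 0: any process can ptrace any other process running under the same uid
 - 1: a process can ptrace only its own descendants, or a process that
  opted in via `prctl(PR_SET_PTRACER)` (Debian default)
 - 2: requires `CAP_SYS_PTRACE` (admin only)
 - 3: ptrace attach disabled entirely; cannot be lowered again until reboot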
 For left4me: setting to 2 system-wide is cheap and removes the same-uid
 ptrace path entirely. Set via `/etc/sysctl.d/99-left4me.conf` (or
 extend an existing file). Doesn't affect debuggability — if you ever
 need to ptrace, do it as root.
 Caveat: Yama is enforced AT THE TIME of `ptrace` call. With seccomp
 blocking the syscall entirely (`~@debug`), Yama becomes belt-and-braces;
 keep both for defense-in-depth.
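 A sketch of the sysctl drop-in (sysctl.d syntax), assuming the file name
 suggested above:
 ```ini
 # /etc/sysctl.d/99-left4me.conf
 # 2 = only processes holding CAP_SYS_PTRACE may attach.
 kernel.yama.ptrace_scope = 2
 ```
 `sysctl --system` reloads all drop-ins without a reboot.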
 ### LSMs other than Yama
 | LSM | Status on Debian Trixie | Fit for us |
 |---|---|---|
 | **AppArmor** | Available; enabled by default since Debian 10 | Could write profiles for srcds + gunicorn. Per-unit profile via `AppArmorProfile=` on systemd. Moderate effort. |
 | **SELinux** | Available; not enabled by default | Heavy. Not worth the operational cost on a single-host VPS. |
 | **landlock** | Kernel ≥5.13; available | Process-local sandboxing. Apps must opt in via the `landlock_*` syscalls (`man 7 landlock`). Python doesn't have a stdlib binding; need to call via ctypes or a wrapper. For us: would need to retrofit gunicorn or write a wrapper. Defer. |
 | **BPF LSM** | Kernel ≥5.7; available | Programmable LSM hooks. Bleeding edge for personal infra. Defer. |
 | **Tomoyo** | Available; not Debian-enabled | Path-based MAC. Niche. Skip. |
 **For left4me:** Yama yes. AppArmor *maybe*, as a follow-up — a profile
 limited to "deny path X" patterns for srcds would be small but adds an
 audit/rollback surface. Skip in the first pass; revisit if test results
 show systemd directives alone leave gaps.
 ### Filesystem ACLs and modes
 POSIX permissions, supplementary groups, ACLs (`setfacl`), extended
 attrs (`xattr`).
 For us:
 - DB and `web.env` already use `root:left4me 0640`. If we go uid-split,
  ownership changes; if we go hardening-only, mode is fine — what
  matters is *whether the unit's FS view contains them at all*.
 - `setfacl` for fine-grained sharing (e.g., one supplementary group
  used by both web and game). Doable but adds complexity; consider
  only if uid split goes ahead.
 ### File attributes (chattr)
 `chattr +i` (immutable) and `chattr +a` (append-only).
 For us:
 - `chattr +i /opt/left4me/src/**` — prevents post-deploy tampering by
  anything short of root removing the attr. But: `pip install -e`
  creates `*.egg-info` files in the tree; deploy of new code would need
  to `chattr -R -i ...` first. Too much friction. Skip.
 - `chattr +i /etc/left4me/web.env` — keeps the env file from being
  rewritten by a malicious uid. Works because the env file is rewritten
  rarely (rotate SECRET_KEY explicitly via ckn-bw apply, which is root
  and can `chattr -i` first). Worth considering as a small extra.
 ### cgroups v2
 Not a security primitive (not confidentiality/integrity), but a
 **resource ceiling**. Already in use:
 - `Slice=l4d2-game.slice`, `MemoryMax`, `TasksMax` — keep.
 `MemoryDenyWriteExecute=true` is a kernel-level prctl + seccomp, not a
 cgroup, but listed here because it's resource-adjacent. See systemd
 section.
 ### Sudo / setuid
 Sudoers grants narrow what a unit's uid can do as root. For us, the
 helpers (`scripts/libexec/left4me-*`) already validate inputs tightly
 (verified in audit). Two design options for the future:
 - **Keep sudo path**, narrow the grants (per-uid via 3-user split, or
  per-action via tighter sudoers).
 - **Replace sudo with systemctl-managed transient units triggered via
  dbus / `systemctl start`** — the build-overlay-unit spec already
  proposes this for the script-sandbox.
 The web app needs to invoke the helpers somehow. `NoNewPrivileges=true`
 on the web unit would break sudo's setuid. If we move to
 systemctl-triggered units (no setuid involved), we can also tighten the
 web unit. Sequenced in the implementation plan, not this survey.
 ## Section 2 — systemd unit-config primitives
 ### Identity
 - **`User=` / `Group=`** — drop privileges. Already set.
 - **`DynamicUser=true`** — transient uid per run, persisted across runs
  via `StateDirectory=`. Strong default. **Bad fit for us** because
  multiple units share `/var/lib/left4me/` cross-unit; DynamicUser's
  per-unit `StateDirectory=` model fights that.
 - **`SupplementaryGroups=`** — extra groups. Used if we add a shared
  read-only group (e.g., `l4d2-overlay-readers`).
 ### Filesystem virtualization
 The lever the operator asked about ("can systemd have a fully virtual
 filesystem"). Yes — composition:
 - **`RootDirectory=path`** — chroot. Full FS substitution. Heavy;
  requires populating libs/binaries. Skip for the first pass.
 - **`RootImage=path`** — same but from a disk image. Way too heavy.
 - **`TemporaryFileSystem=path[:opts]`** — empty tmpfs at `path`.
  Cheap. Composes with bind paths.
 - **`BindReadOnlyPaths=src[:dst]`** — RO bind. Composes over
  TemporaryFileSystem.
 - **`BindPaths=src[:dst]`** — RW bind. Composes over TemporaryFileSystem.
 - **`InaccessiblePaths=path`** — masks a path with an empty file/dir.
  Legacy; Bind* is cleaner.
 - **`NoExecPaths=path`** / **`ExecPaths=path`** — restrict
  executable paths. Strong but easy to misconfigure.
 Composition pattern (the one we want for srcds):
 ```ini
 TemporaryFileSystem=/var/lib /etc /opt /home /root /srv
 BindReadOnlyPaths=/var/lib/left4me/installation
 BindReadOnlyPaths=/var/lib/left4me/overlays
 BindReadOnlyPaths=/etc/left4me/host.env
 BindReadOnlyPaths=/etc/ssl /etc/ca-certificates /etc/resolv.conf
 BindReadOnlyPaths=/etc/nsswitch.conf /etc/alternatives
 BindPaths=/var/lib/left4me/runtime/%i
 ```
 Result: srcds has no DB, no `web.env`, no `/opt/left4me/src/` in its FS
 view. Files outside the bound list are simply not there from srcds's
 perspective — `open()` returns ENOENT, not EACCES.
 Sharp edges:
 - `TemporaryFileSystem=` size defaults to half RAM; clamp via
  `:size=NNM,nr_inodes=NN`.
 - Bind paths must exist on disk; ENOENT prevents unit start.
 - `BindReadOnlyPaths=` and `BindPaths=` ordering semantics: bind-mounts
  applied in order; later wins.
 - `RuntimeDirectory=` integrates with `TemporaryFileSystem=` cleanly:
  `RuntimeDirectory=left4me/foo` creates `/run/left4me/foo` and binds
  it in, auto-cleaning on stop.
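 The first two sharp edges can be handled in the unit itself. A sketch;
 the size and inode values are illustrative assumptions, not measured:
 ```ini
 [Service]
 # Clamp each tmpfs instead of relying on the default size.
 TemporaryFileSystem=/var/lib:size=4M,nr_inodes=1024
 TemporaryFileSystem=/etc:size=4M,nr_inodes=1024
 # A leading '-' skips the bind when the source path is missing,
 # instead of failing unit start.
 BindReadOnlyPaths=-/etc/ca-certificates
 BindPaths=/var/lib/left4me/runtime/%i
 ```
 The `-` prefix trades a hard start failure for a silently absent path,
 so it fits optional paths only, not the installation or runtime dirs.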
 ### Namespaces (systemd wrappers)
 - **`PrivateTmp=true`** — already set.
 - **`PrivateDevices=true`** — already set. Drops most of `/dev`.
 - **`PrivateNetwork=true`** — **don't** for gameservers (breaks UDP).
 - **`PrivateIPC=true`** — private SysV/POSIX IPC namespace; cheap win.
 - **`PrivateUsers=true`** — own userns. Root and the configured
  `User=left4me` are mapped to themselves; every other host uid/gid
  appears as `nobody`, and the unit holds no capabilities in the host
  userns (defense for D2/D4 against cross-namespace ptrace).
  Sharp edge: incompatible with `sudo` from inside the unit (setuid +
  userns mapping = no host-root).
 - **`PrivateMounts=true`** — own mount ns (default-implicit with most
  Protect* / Private* directives).
 ### `/proc` and `/sys` protection
 - **`ProtectProc=invisible|noaccess|ptraceable|default`** —
  `invisible` hides `/proc/<pid>/*` of processes owned by other uids;
  same-uid processes stay visible unless `PrivatePIDs=` is also set. **D2.**
 - **`ProcSubset=pid|all`** — `pid` restricts `/proc/` to PID entries;
  hides `/proc/kallsyms`, `/proc/cpuinfo`, etc. Cheap.
 - **`ProtectKernelTunables=true`** — `/proc/sys`, `/sys` read-only.
 - **`ProtectKernelModules=true`** — block `init_module`, `delete_module`.
 - **`ProtectKernelLogs=true`** — block `/dev/kmsg`, syslog().
 - **`ProtectClock=true`** — block `clock_settime`, `settimeofday`.
 - **`ProtectControlGroups=true`** — `/sys/fs/cgroup` read-only.
 - **`ProtectHostname=true`** — block `sethostname`/`setdomainname`.
 All of `ProtectKernel*`, `ProtectClock`, `ProtectControlGroups`,
 `ProtectHostname` are cheap and have no downside for srcds or gunicorn.
 Add all of them.
 ### Filesystem protection (legacy / not Bind*)
 - **`ProtectSystem=false|true|full|strict`** — increasingly stringent
  RO of system paths. `strict` mounts the entire hierarchy read-only
  (except `/dev`, `/proc`, `/sys`) unless a path is explicitly made
  writable.
 - **`ProtectHome=false|true|read-only|tmpfs`** — `tmpfs` masks `/home`,
  `/root`, `/run/user` with empty tmpfs.
 For us: `ProtectSystem=strict` + `ProtectHome=tmpfs` is the baseline.
 But once we adopt `TemporaryFileSystem=` for the relevant trees, these
 become secondary — TemporaryFileSystem fully supersedes them in the
 covered subtrees. Keep both as defense-in-depth (cheap).
 ### Syscall filtering
 - **`SystemCallFilter=expr`** — discussed in Linux section.
 - **`SystemCallArchitectures=native`** — always set.
 - **`SystemCallLog=expr`** — opt-in logging without enforcement;
  useful for diagnosing what gets called before tightening.
 - **`SystemCallErrorNumber=EPERM`** — soft denial vs. termination.
  Default is to terminate the caller with SIGSYS; switch later if a
  benign caller trips.
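 A sketch of a log-first pass before enforcing anything, assuming the
 host's systemd supports `SystemCallLog=`:
 ```ini
 [Service]
 # Observation pass: no SystemCallFilter= yet, so nothing is blocked.
 # Log calls from the sets that are candidates for removal.
 SystemCallLog=@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete
 ```
 Any set that produces log entries during a normal srcds or gunicorn run
 needs a closer look before it moves into a `~` subtraction.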
 ### Capabilities
 - **`CapabilityBoundingSet=`** — empty drops all. Use it.
 - **`AmbientCapabilities=`** — empty (default).
 - **`NoNewPrivileges=true`** — prevents setuid escalation. **Required
  on srcds**, **incompatible with sudo on web** until sudo is replaced.
 ### Network restrictions
 - **`RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX`** — for srcds.
  AF_UNIX needed for journald socket access.
 - **`IPAddressAllow=` / `IPAddressDeny=`** — uses cgroup BPF; filters
  both ingress and egress packets. For srcds: probably overcomplicates;
  the firewall already controls ingress. Skip for first pass.
 - **`SocketBindAllow=` / `SocketBindDeny=`** — restricts which ports a
  unit can `bind()`. For srcds, allow only the configured game port
  range. Adds value but couples to config. Defer to a follow-up.
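 A sketch of the port lock, using the range from the server@ unit. Per
 systemd.resource-control(5), a bind that matches no rule is allowed, so
 the allow entries only restrict anything next to a catch-all deny:
 ```ini
 [Service]
 SocketBindAllow=udp:27000-27999
 SocketBindAllow=tcp:27000-27999
 # Without this line every bind() is still permitted.
 SocketBindDeny=any
 ```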
 ### Resource restrictions
 - **`MemoryMax`**, **`TasksMax`**, **`LimitNOFILE`** — already set.
 - **`OOMScoreAdjust`** — already set (favor killing the gameserver
  before system processes if memory tight).
 - **`MemoryDenyWriteExecute=true`** — blocks `mprotect(PROT_WRITE|PROT_EXEC)`.
  Defends against shellcode in JIT memory. **Source engine likely
  fine** (no JIT in the binary; the Squirrel script engine is an
  interpreter, not JIT). **Sourcemod plugins**: most are compiled to
  bytecode + run on SourcePawn VM (interpreter); no JIT either. Verify
  in test.
 ### IPC and process hygiene
 - **`RemoveIPC=true`** — clean up SysV IPC on unit stop.
 - **`KeyringMode=private`** — own kernel keyring; no host-key access.
 - **`LockPersonality=true`** — block `personality(2)` calls (no x86 vs
  x86-64 mode toggle). Already set.
 - **`RestrictRealtime=true`** — block real-time scheduling. srcds may
  use SCHED_OTHER + nice; no realtime needed.
 - **`RestrictNamespaces=true`** — block `unshare(2)` / `clone(CLONE_NEW*)`.
 - **`RestrictSUIDSGID=true`** — already set.
 - **`UMask=0027`** — narrow default umask.
 ### Capabilities of the `+` prefix
 `ExecStartPre=+cmd` runs `cmd` as root in PID 1's namespaces, bypassing
 the unit's User= and almost all Protect*/Private*/Restrict* directives.
 This is how the existing overlay-mount helper runs. Critical to verify
 in test:
 - Does `+` preserve the bypass when `PrivateUsers=true` is set?
  (Expected: yes — the userns is set up around the unit's processes;
  `+` puts the helper outside it.)
 ### State management (per-unit)
 - **`StateDirectory=path`** — creates `/var/lib/<path>` owned by User=.
 - **`RuntimeDirectory=path`** — creates `/run/<path>`, auto-deleted on
  stop.
 - **`LogsDirectory=path`** — `/var/log/<path>`.
 - **`CacheDirectory=path`** — `/var/cache/<path>`.
 - **`ConfigurationDirectory=path`** — `/etc/<path>`.
 Useful for cleanup hygiene if we redesign storage layout. Not required
 for first pass.
 ### `systemd-analyze security`
 `systemd-analyze security <unit>` produces a security score per unit
 (lower = more secure). Output lists each directive with a ✓/✗.
 Useful as:
 - Regression check (record baseline, ensure score drops after refactor).
 - Discovery tool ("which directives haven't I set?").
 Baseline scores (to capture during test plan):
 - `left4me-server@1.service` before refactor
 - `left4me-web.service` before refactor
 ### Composability lookups
 The systemd docs define predefined syscall sets that are worth knowing:
 - **`@privileged`** (syscall set) ⊃ `@chown`, `@clock`, `@module`,
  `@raw-io`, `@reboot`, `@swap`, plus individual calls such as `setuid`.
 - **`@system-service`** is the recommended base for "I want a normal
  service to work."
 - Subtracting `~@privileged` is broad; `~@debug @mount @raw-io` is
  surgical.
 ## Section 3 — Application-level options
 ### AppArmor profile for srcds
 If systemd directives leave gaps, an AppArmor profile would let us
 deny specific paths or operations beyond what systemd's directives
 cover. E.g., "deny network for srcds to a specific IP range" via
 `network inet stream...` deny rules; or "deny mounting" beyond
 `SystemCallFilter`.
 Effort:
 - Confirm AppArmor is enabled (`aa-status`); enable it via the kernel
  cmdline + boot config if it is off.
 - Write a profile (e.g., `/etc/apparmor.d/usr.bin.srcds_linux`).
 - Reference via systemd `AppArmorProfile=` per unit.
 Skip for the first pass; revisit if test results show the systemd
 directives alone leave a gap.
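 If that follow-up happens, the unit-side hook is a single directive. A
 sketch; the profile name is hypothetical:
 ```ini
 [Service]
 # 'left4me-srcds' is a placeholder profile name.
 # The '-' prefix makes a missing profile non-fatal.
 AppArmorProfile=-left4me-srcds
 ```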
 ### landlock for the web app
 Python web app could call `landlock_create_ruleset` / `landlock_add_rule`
 / `landlock_restrict_self` via ctypes. Restricts FS access at runtime.
 For us:
 - Could restrict gunicorn to `/var/lib/left4me/` + `/etc/left4me/web.env`
  + `/opt/left4me/.venv` + `/tmp`.
 - Symmetric to `TemporaryFileSystem=` + `Bind*` but at the
  application layer (no systemd reach).
 Skip; systemd directives are simpler. Reconsider if we move to a
 DynamicUser-style world later.
 ### File-integrity tooling (Aide, Tripwire)
 Out of scope for prevention; useful for detection. Not in this design.
 ### Custom seccomp profile (bypassing systemd)
 The web app could call `seccomp(2)` from inside Python via libseccomp
 + ctypes to tighten its own filter beyond what systemd applies.
 Symmetric to landlock; skip for the same reason.
 ## Section 4 — Per-defense mapping
 For each defense from the threat model, the primitives that implement
 it, in priority order:
 ### D1 — Gameserver RCE cannot exfiltrate DB or `web.env`
 | Primitive | Strength | Notes |
 |---|---|---|
 | `TemporaryFileSystem=/var/lib /etc` + minimal bind set | Strong | The files simply aren't in the unit's FS view. ENOENT, not EACCES. |
 | 3-user split (DB owned by `l4d2-web`) | Strong | Kernel-enforced; survives unit-config errors. |
 | `BindReadOnlyPaths=/dev/null:/var/lib/left4me/left4me.db` | Medium | Masks the path; brittle (paths can move). |
 | Filesystem ACLs (DB mode 0600) | Weak | Kernel still allows `left4me` group; only fixed by uid split. |
 **Composition chosen:** `TemporaryFileSystem=` + Bind* (primary).
 3-user split as defense-in-depth or deferred.
 ### D2 — Gameserver RCE cannot ptrace web app or peers
 | Primitive | Strength | Notes |
 |---|---|---|
 | `SystemCallFilter=~@debug` | Strong | Blocks `ptrace`, `process_vm_readv/writev`. |
 | `kernel.yama.ptrace_scope=2` | Strong | Belt-and-braces at the kernel level. |
 | `CapabilityBoundingSet=` empty | Strong | No CAP_SYS_PTRACE. |
 | `PrivateUsers=true` | Strong | Cross-userns ptrace requires CAP_SYS_PTRACE. |
 | 3-user split | Strong | Different uids; same-uid path doesn't exist. |
 **Composition chosen:** All four (syscall + yama + caps + userns)
 together; they compose redundantly.
 ### D3 — Gameserver RCE cannot use sudo helpers
 | Primitive | Strength | Notes |
 |---|---|---|
 | `NoNewPrivileges=true` | Strong | Blocks sudo's setuid. Already set on server@. |
 | `PrivateUsers=true` | Strong | sudo across userns boundary impossible. |
 | Sudoers grants scoped to `l4d2-web` (uid split) | Strong | Different uid means sudo grant doesn't apply. |
 | `RestrictSUIDSGID=true` | Strong | Already set. |
 **Composition chosen:** NoNewPrivileges (already) + PrivateUsers (new) +
 RestrictSUIDSGID (already). The sudo isolation a 3-user split would
 provide is *already* covered by NNP + PrivateUsers; the uid split would
 only add defense-in-depth.
 ### D4 — Web app RCE cannot ptrace gameservers
 | Primitive | Strength | Notes |
 |---|---|---|
 | `SystemCallFilter=~@debug` on **web** | Strong | Symmetric to D2 but applied to web. |
 | `kernel.yama.ptrace_scope=2` | Strong | System-wide, helps both directions. |
 | 3-user split | Strong | Different uids. |
 **Composition chosen:** SystemCallFilter on web + yama=2 system-wide.
 PrivateUsers cannot be applied to web (sudo incompatibility). 3-user
 split as defense-in-depth or deferred.
 ### D5 — Cross-server contamination
 Each `left4me-server@<n>.service` is a separate unit instance. With
 `PrivateUsers=true`, each gets its own user namespace. Cross-namespace
 ptrace fails. With `TemporaryFileSystem=` and per-instance
 `BindPaths=/var/lib/left4me/runtime/%i`, neither instance can read the
 other's `runtime/<n>/` or attach to its process.
 **Composition chosen:** PrivateUsers + per-instance Bind* (above).
 Per-instance uids out of scope.
 ### D6 — Persistent compromise of `/opt/left4me/src/` blocked from gameserver
 Already covered by `ProtectSystem=strict` on server@.service. With
 `TemporaryFileSystem=/opt`, the path simply isn't visible to srcds.
 **Stronger and redundant — both can stay.**
 ### D7 — Defenses survive a unit-config refactor in the wrong direction
 `deploy/tests/test_deploy_artifacts.py` asserts the directives' presence
 in the deployed unit. Add hardening invariants as test cases. Survives
 because the test fails CI before deploy.
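A hardening invariant of this kind could be sketched as follows. This is an illustration, not the contents of `deploy/tests/test_deploy_artifacts.py`; the helper names `parse_service_section` and `missing_directives` and the `REQUIRED_SERVER` table are hypothetical, and the parser ignores systemd's list-reset semantics for empty assignments.

```python
from collections import defaultdict

# Hypothetical invariant table: directive -> values that must be present.
REQUIRED_SERVER = {
    "NoNewPrivileges": {"true"},
    "PrivateUsers": {"true"},
    "ProtectSystem": {"strict"},
    "CapabilityBoundingSet": {""},
}


def parse_service_section(unit_text):
    """Collect Key=Value pairs from [Service]; repeated keys accumulate."""
    values = defaultdict(list)
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Service" and "=" in line:
            key, _, value = line.partition("=")
            values[key.strip()].append(value.strip())
    return values


def missing_directives(unit_text, required):
    """Return sorted 'Key=Value' strings that the unit does not set."""
    present = parse_service_section(unit_text)
    missing = []
    for key, wanted in required.items():
        for value in wanted:
            if value not in present.get(key, []):
                missing.append(f"{key}={value}")
    return sorted(missing)
```

A test case would then assert `missing_directives(unit_text, REQUIRED_SERVER) == []` for the reference unit, so a refactor that drops a directive fails with the missing key named in the diff.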
 ## Section 5 — Candidate composition
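 **For testing, not commitment.** Test plan validates each piece. The
 trailing `# NEW` and `# was ...` markers in the unit listings are
 annotations for this doc only: systemd recognizes `#` comments only at
 the start of a line, so strip them before deploying.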
 ### `left4me-server@.service`
 ```ini
 [Service]
 User=left4me
 Group=left4me
 # (existing)
 Type=simple
 WorkingDirectory=-/var/lib/left4me/runtime/%i/merged/left4dead2
 EnvironmentFile=/etc/left4me/host.env
 EnvironmentFile=/var/lib/left4me/instances/%i/instance.env
 ExecStartPre=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay mount %i
 ExecStart=/var/lib/left4me/runtime/%i/merged/srcds_run -game left4dead2 +hostport ${L4D2_PORT} $L4D2_ARGS
 ExecStopPost=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay umount %i
 Restart=on-failure
 RestartSec=5
 # Resource control (existing)
 Slice=l4d2-game.slice
 Nice=-5
 IOSchedulingClass=best-effort
 IOSchedulingPriority=4
 OOMScoreAdjust=-200
 MemoryHigh=1.5G
 MemoryMax=2G
 TasksMax=256
 LimitNOFILE=65536
 KillSignal=SIGINT
 TimeoutStopSec=15s
 LogRateLimitIntervalSec=0
 # Hardening — identity
 NoNewPrivileges=true
 RestrictSUIDSGID=true
 # Hardening — namespaces
 PrivateTmp=true
 PrivateDevices=true
 PrivateIPC=true
 PrivateUsers=true               # NEW
 ProtectHome=true
 # Hardening — filesystem view
 TemporaryFileSystem=/var/lib /etc /opt /home /root /srv /mnt /media     # NEW
 BindReadOnlyPaths=/var/lib/left4me/installation                          # was ReadOnlyPaths
 BindReadOnlyPaths=/var/lib/left4me/overlays                              # was ReadOnlyPaths
 BindReadOnlyPaths=/etc/left4me/host.env                                  # NEW
 BindReadOnlyPaths=/etc/ssl /etc/ca-certificates                          # NEW
 BindReadOnlyPaths=/etc/resolv.conf /etc/nsswitch.conf /etc/alternatives  # NEW
 BindPaths=/var/lib/left4me/runtime/%i                                    # was ReadWritePaths
 ProtectSystem=strict
 # (remove old ReadOnlyPaths= and ReadWritePaths= lines — superseded)
 # Hardening — /proc, /sys, kernel
 ProtectProc=invisible           # NEW
 ProcSubset=pid                  # NEW
 ProtectKernelTunables=true      # NEW
 ProtectKernelModules=true       # NEW
 ProtectKernelLogs=true          # NEW
 ProtectClock=true               # NEW
 ProtectControlGroups=true       # NEW
 ProtectHostname=true            # NEW
 LockPersonality=true
 # Hardening — caps + syscall
 CapabilityBoundingSet=          # NEW
 AmbientCapabilities=            # NEW
 SystemCallArchitectures=native  # NEW
 SystemCallFilter=@system-service                                      # NEW
 SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete @privileged  # NEW
 # Hardening — network
 RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX                       # NEW (AF_UNIX for journald)
 # Hardening — namespaces, realtime, IPC
 RestrictNamespaces=true          # NEW
 RestrictRealtime=true            # NEW
 RemoveIPC=true                   # NEW
 KeyringMode=private              # NEW
 UMask=0027                       # NEW
 # Deferred until test:
 # MemoryDenyWriteExecute=true    # MAY break sourcemod / Source engine; test first.
 ```
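To turn an annotated listing like the one above into something systemd can load, a small filter along these lines could drop the trailing `# ...` notes, which systemd would otherwise read as part of the directive's value. The function name `strip_trailing_notes` is hypothetical, and the sketch assumes no directive value legitimately contains whitespace followed by `#`.

```python
import re

# Whitespace, then "#", then the rest of the line.
_TRAILING_NOTE = re.compile(r"\s+#.*$")


def strip_trailing_notes(unit_text):
    """Drop trailing '# ...' notes from Key=Value lines; keep full-line comments."""
    cleaned = []
    for line in unit_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(("#", ";")) or "=" not in line:
            cleaned.append(line)  # comment, section header, or blank line
        else:
            cleaned.append(_TRAILING_NOTE.sub("", line))
    return "\n".join(cleaned)
```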
 ### `left4me-web.service`
 ```ini
 [Service]
 User=left4me
 Group=left4me
 # (existing)
 Type=simple
 WorkingDirectory=/opt/left4me/src
 Environment=HOME=/var/lib/left4me PATH=/opt/left4me/.venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
 EnvironmentFile=/etc/left4me/host.env
 EnvironmentFile=/etc/left4me/web.env
 ExecStart=/opt/left4me/.venv/bin/gunicorn --workers ... --threads ... --bind 127.0.0.1:8000 'l4d2web.app:create_app()'
 Restart=on-failure
 RestartSec=3
 # Hardening
 PrivateTmp=true
 ProtectSystem=strict                      # tightened from =full
 ProtectHome=true
 ReadWritePaths=/var/lib/left4me           # web needs broad write access there
 # NoNewPrivileges intentionally NOT set — sudo
 # PrivateUsers intentionally NOT set — sudo
 # /proc + kernel hardening (sudo-compatible)
 ProtectProc=invisible                     # NEW
 ProcSubset=pid                            # NEW
 ProtectKernelTunables=true                # NEW
 ProtectKernelModules=true                 # NEW
 ProtectKernelLogs=true                    # NEW
 ProtectClock=true                         # NEW
 ProtectControlGroups=true                 # NEW
 ProtectHostname=true                      # NEW
 LockPersonality=true                      # NEW
 # Syscall filter — allow @system-service minus debug-class; keep @privileged
 # because sudo needs setuid, chown, etc.
 SystemCallArchitectures=native            # NEW
 SystemCallFilter=@system-service          # NEW
 SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete  # NEW
 # Network
 RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX  # NEW
 # Misc hygiene
 RestrictRealtime=true                     # NEW
 RestrictNamespaces=true                   # NEW
 RemoveIPC=true                            # NEW
 UMask=0027                                # NEW
 # Deferred for sudo-removal future work:
 # NoNewPrivileges=true
 # CapabilityBoundingSet=
 # PrivateUsers=true
 ```
 ### Host sysctl
 `/etc/sysctl.d/99-left4me.conf` (or merge into existing):
 ```
 kernel.yama.ptrace_scope=2
 ```
 System-wide. Means: even if a unit-level config slips, host-level
 ptrace is admin-only. Cost: zero for our use case (no debugging in
 prod).
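A post-deploy check for this setting could look roughly like the sketch below. The helper names are hypothetical; the value meanings follow the Yama LSM documentation.

```python
from pathlib import Path

YAMA_PTRACE_SCOPE = Path("/proc/sys/kernel/yama/ptrace_scope")

# Meanings as documented for the Yama LSM.
SCOPE_MEANING = {
    0: "classic: any same-uid process may attach",
    1: "restricted: only descendants or declared tracers may attach",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: ptrace attach disabled until reboot",
}


def describe_ptrace_scope(raw):
    """Translate the sysctl's text value into a human-readable meaning."""
    return SCOPE_MEANING.get(int(raw.strip()), "unknown value")


def ptrace_scope_at_least(raw, minimum=2):
    """True if the configured scope is at least as strict as `minimum`."""
    return int(raw.strip()) >= minimum


if __name__ == "__main__" and YAMA_PTRACE_SCOPE.exists():
    value = YAMA_PTRACE_SCOPE.read_text()
    print(value.strip(), "->", describe_ptrace_scope(value))
```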
 ## Section 6 — Trade-offs and known sharp edges
 To verify in the test plan:
 1. **`PrivateUsers=true` + `+`-prefixed ExecStartPre**: expected to
   work (the `+` runs outside the unit's namespaces). Sharp if it
   doesn't — the overlay mount would fail and srcds wouldn't start.
 2. **`TemporaryFileSystem=/etc` and missing files**: srcds and its
   dependencies (libstdc++ runtime, libssl, libcurl) may read files
   from `/etc` we haven't bound. Watch journalctl for ENOENT during
   first start.
 3. **`SystemCallFilter=~@privileged` and Source engine**: srcds is C++
   and uses syscalls beyond the obvious. A `~@privileged` may trip
   something. Mitigation: test with `SystemCallLog=` instead of
   `SystemCallFilter=` first; observe what would have been blocked;
   then narrow.
 4. **`MemoryDenyWriteExecute=true` and sourcemod**: SourcePawn is
   bytecode-interpreted (no JIT) per public docs, but plugin
   compilation could in theory use a JIT. Test before enabling.
 5. **`RestrictAddressFamilies=` without AF_UNIX**: journald socket
   needs it. Always include AF_UNIX.
 6. **`ProcSubset=pid` and Python**: gunicorn shouldn't break (uses
   /proc/self/* + signal-based ipc). Verify.
 7. **sysctl `kernel.yama.ptrace_scope=2`**: blocks `gdb` / `strace -p`
   attach for any process without CAP_SYS_PTRACE, including the
   operator's unprivileged shell. Root can still attach; to debug as a
   non-root user, temporarily set it back to 1 via sysctl, then revert.
 8. **`ProtectSystem=strict` on web**: was `=full`. Tighter; might
   break a write the web app does to a path outside `/var/lib/left4me`.
   Audit `l4d2web/*` for `os.makedirs` or `open(...'w')` outside that
   root.
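For item 8, once candidate write targets have been collected from the audit, a check like the following could flag the ones that `ProtectSystem=strict` would make read-only. It is a sketch under assumptions: the function name and the default roots (`/var/lib/left4me` from `ReadWritePaths=`, `/tmp` from `PrivateTmp=`) are illustrative, and the comparison is purely lexical, so symlinks and `..` segments are not resolved.

```python
from pathlib import PurePosixPath


def outside_writable_roots(paths, roots=("/var/lib/left4me", "/tmp")):
    """Return the paths that are not under any of the given roots."""
    root_paths = [PurePosixPath(r) for r in roots]
    offenders = []
    for p in paths:
        candidate = PurePosixPath(p)
        if not any(candidate == r or r in candidate.parents for r in root_paths):
            offenders.append(p)
    return offenders
```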
 ## Open questions for the implementer
 (After test plan results come back, finalize these.)
 1. Do we adopt `MemoryDenyWriteExecute=true` if it works for srcds?
   (Probably yes, defense-in-depth at low cost.)
 2. Do we set `SocketBindAllow=` on srcds to lock the port range?
   (Depends on whether `instance.env` exposes the range cleanly to a
   unit directive.)
 3. Do we deploy AppArmor profiles as a follow-up?
   (Probably no — operational complexity exceeds the marginal gain on
   single-host infra.)
 4. Do we keep both `BindReadOnlyPaths=` and the legacy
   `ReadOnlyPaths=` declarations, or simplify? (Simplify — use Bind*
   exclusively once `TemporaryFileSystem=` is in place.)
 5. Do we proceed with 3-user split as a follow-up, or close the spec
   as "addressed by hardening"? Depends on operator's residual-risk
   tolerance after Phase A lands and we observe.
 ## Pointers
 - Threat model: `docs/superpowers/specs/2026-05-15-hardening-threat-model.md`
 - Test plan: `docs/superpowers/specs/2026-05-15-hardening-test-plan.md`
 - Original uid-split spec (still open): `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
 - Live unit source (ckn-bw reactor): `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`
 - Reference units (deploy-dir-rethink reference-only): `deploy/files/usr/local/lib/systemd/system/`
 - systemd docs (latest, systemd 256+ on Trixie):
  `man systemd.exec`, `man systemd.unit`, `man systemd-analyze`.
 - L4D2 / Source engine docs:
  - SourcePawn (bytecode-interpreted): https://wiki.alliedmods.net/SourcePawn
  - srcds is a Source 2007 engine binary; closed-source, expect surprises.
--- a/docs/superpowers/specs/2026-05-15-hardening-refactor-design.md
+++ b/docs/superpowers/specs/2026-05-15-hardening-refactor-design.md
@ -0,0 +1,237 @@
 # Hardening refactor — design
 **Status:** approved design; implementation plan to follow at
 `docs/superpowers/plans/2026-05-15-hardening-refactor.md`.
 Companion: `2026-05-15-hardening-threat-model.md`,
 `2026-05-15-hardening-defenses-survey.md`,
 `2026-05-15-hardening-test-plan.md` (executed 2026-05-15, results inline).
 This doc records the *shape* of the refactor — where the artifacts live,
 how they're factored, what's in scope. The implementation plan lays out
 the steps.
 ## Context
 The hardening test plan ran end-to-end on `left4.me` on 2026-05-15
 (commit `461b8d0`). Outcome: the `systemd-analyze security` exposure
 score dropped 7.5→1.3 for `left4me-server@1` and 8.7→4.1 for
 `left4me-web`; all 8 Test 8 attack vectors were blocked. Two amendments
 to the spec's proposed composition were required:
 `SystemCallArchitectures=native x86` (srcds_linux is i386) and
 `PrivatePIDs=true` (same-uid `ProtectProc=invisible` can't hide gunicorn
 from srcds; a PID namespace fixes it at the kernel level).
 `MemoryDenyWriteExecute=true` is permanently excluded (Source engine
 i386 `.so` files have text relocations).
 Composition is *not currently deployed* — Test 7's drop-in was cleaned
 up at session end; only the Test 9 sysctl (`kernel.yama.ptrace_scope=2`)
 persists. This refactor lands the proven composition permanently via
 the ckn-bw bundle.
 ## Approach
 Keep the current responsibility split for now: ckn-bw owns systemd unit
 emission (base + hardening), left4me owns the educational reference
 copies and the threat-model/test docs. Hardening directives land in
 ckn-bw's `systemd/units` reactor at
 `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`, factored via
 shared Python dicts so the two units (and the future
 build-overlay-unit refactor) reuse the common base.
 The broader responsibility reshape — hardening as drop-in files
 *living* in left4me with ckn-bw as a thin file-shipper — is a real
 direction worth pursuing, but deserves its own session. Deferred.
 ## Factoring
 Three dict constants at the top of `metadata.py` (or in a sibling
 `hardening.py` module if `metadata.py` grows past a comfortable read):
 ### `HARDENING_COMMON`
 Directives both units take verbatim. 18 keys:
 ```python
 HARDENING_COMMON = {
    'ProtectProc': 'invisible',
    'ProcSubset': 'pid',
    'ProtectKernelTunables': 'true',
    'ProtectKernelModules': 'true',
    'ProtectKernelLogs': 'true',
    'ProtectClock': 'true',
    'ProtectControlGroups': 'true',
    'ProtectHostname': 'true',
    'LockPersonality': 'true',
    'ProtectSystem': 'strict',
    'ProtectHome': 'true',
    'PrivateTmp': 'true',
    'RestrictNamespaces': 'true',
    'RestrictRealtime': 'true',
    'RemoveIPC': 'true',
    'KeyringMode': 'private',
    'UMask': '0027',
    'RestrictAddressFamilies': 'AF_INET AF_INET6 AF_UNIX',
 }
 ```
 ### `HARDENING_SERVER`
 `{**HARDENING_COMMON, ...server-specific}`. Adds sudo-incompatible
 flags, filesystem virtualization, the i386 amendment, a per-instance PID
 namespace, and a socket-bind allow-list:
 - `NoNewPrivileges=true`
 - `RestrictSUIDSGID=true`
 - `PrivateUsers=true`
 - **`PrivatePIDs=true`** *(Test amendment — D2.b / D5)*
 - `PrivateIPC=true`
 - `PrivateDevices=true`
 - `CapabilityBoundingSet=` *(empty value → drop all)*
 - `AmbientCapabilities=`
 - `SystemCallArchitectures='native x86'` *(Test amendment — i386 srcds)*
 - `SystemCallFilter=('@system-service', '~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete @privileged')` *(tuple → repeated key)*
 - `TemporaryFileSystem='/var/lib /etc /opt /home /root /srv /mnt /media'`
 - `BindReadOnlyPaths=('/var/lib/left4me/installation', '/var/lib/left4me/overlays', '/etc/left4me/host.env', '/etc/ssl', '/etc/ca-certificates', '/etc/resolv.conf', '/etc/nsswitch.conf', '/etc/alternatives')`
 - `BindPaths='/var/lib/left4me/runtime/%i'`
 - `SocketBindAllow=('udp:27000-27999', 'tcp:27000-27999')` *(NEW: lock srcds bindable sockets to the game port range; not tested in Test 7 but cheap defense-in-depth. Needs a matching `SocketBindDeny=any` to take effect, since binds that match no rule are allowed by default. Concrete range pending verification of `LEFT4ME_PORT_RANGE_*` substitution support in systemd directives; hard-coded range as fallback.)*
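The spread described above could look roughly like this. It is a sketch, not the reactor's code: `HARDENING_COMMON` is trimmed to three keys, only a few server-specific directives are shown, and the `server_unit` dict shape is hypothetical.

```python
# Trimmed stand-in for the real HARDENING_COMMON; only three keys shown.
HARDENING_COMMON = {
    "ProtectSystem": "strict",
    "PrivateTmp": "true",
    "UMask": "0027",
}

HARDENING_SERVER = {
    **HARDENING_COMMON,
    "NoNewPrivileges": "true",
    "PrivateUsers": "true",
    "PrivatePIDs": "true",
    "CapabilityBoundingSet": "",  # empty string: drop all capabilities
    "SystemCallArchitectures": "native x86",
    "SystemCallFilter": (  # tuple: one emitted line per element
        "@system-service",
        "~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete @privileged",
    ),
    "BindPaths": "/var/lib/left4me/runtime/%i",
}

# Hypothetical unit dict showing the spread at the point of use.
server_unit = {
    "Service": {
        "User": "left4me",
        "Restart": "on-failure",
        **HARDENING_SERVER,
    },
}
```

Because later keys win in a dict spread, a server-specific entry that reuses a key from `HARDENING_COMMON` silently overrides it, which is worth keeping in mind when adding keys to either dict.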
 ### `HARDENING_WEB`
 `{**HARDENING_COMMON, ...web-specific}`. Web inherits `ProtectSystem=strict`
 from COMMON (was `=full` in the current base unit; this tightens). Adds
 a syscall filter *without* `~@privileged` (sudo needs setuid).
 **Excludes** `NoNewPrivileges`, `PrivateUsers`, `RestrictSUIDSGID`,
 empty `CapabilityBoundingSet` — all sudo-incompatible.
 - `SystemCallArchitectures='native'`
 - `SystemCallFilter=('@system-service', '~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete')` *(no `~@privileged`)*
 Web's existing `ReadWritePaths=/var/lib/left4me` stays in its unit's
 inline `Service` dict (web-specific, not common).
 ### Multi-value directives and empty values
 Tuples-of-strings → emitted as repeated `Key=Value` lines by ckn-bw's
 systemd-bundle emitter. Existing precedent: `EnvironmentFile` at
 `metadata.py:201-204`. Empty values (`CapabilityBoundingSet=`,
 `AmbientCapabilities=`) need to emit as `Key=` with nothing after `=`.
 Both behaviors verified as the first step of the implementation plan;
 fallback approaches if the emitter doesn't handle them: inline-joined
 strings where systemd accepts them, or extend the emitter.
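The emission behavior this section relies on can be modeled in a few lines. The function below is a stand-in written for illustration, not ckn-bw's emitter; `render_section` is a hypothetical name, and treating `None` as "omit the key" is an assumption of the sketch.

```python
def render_section(name, options):
    """Render one unit-file section from a dict of directives."""
    lines = [f"[{name}]"]
    for key, value in options.items():
        if value is None:
            continue  # assumed: None suppresses the key entirely
        if isinstance(value, (list, set, tuple)):
            # Sets have no stable order, so sort them; keep list/tuple order.
            items = sorted(value) if isinstance(value, set) else value
            lines.extend(f"{key}={item}" for item in items)
        else:
            lines.append(f"{key}={value}")  # '' renders as a bare 'Key='
    return "\n".join(lines) + "\n"
```

Running the two cases this section cares about (a tuple and an empty string) through such a model before touching the real bundle is a cheap way to pin down the expected output.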
 ## Reference units
 Keep `deploy/files/usr/local/lib/systemd/system/left4me-server@.service`
 and `deploy/files/usr/local/lib/systemd/system/left4me-web.service` as
 **deliberately educational** copies of the deployed units. Each new
 hardening directive in the reference gets a one-line comment
 explaining the threat it addresses. A cold reader of the repo can open
 the reference unit and read the threat model in code form, without
 needing to read the ckn-bw bundle or systemd man pages.
 Source-of-truth: ckn-bw reactor is what's deployed. Reference units in
 left4me are hand-synced. No CI drift test (would be brittle against
 comment ordering and structural human-readable formatting); operator
 discipline at edit time keeps them aligned. A top-of-file note in each
 reference unit points readers at the reactor.
 ## Scope of the refactor
 1. **Ckn-bw reactor edits.** Three constants + spread into the two
   units. Verify tuple-multi-value emission. `metadata.py`.
 2. **Sysctl drop-in via ckn-bw.** `kernel.yama.ptrace_scope=2`. Move
   from host-only `/etc/sysctl.d/99-left4me-ptrace.conf` (applied by
   hand in Test 9) into the bundle's file management. Find the existing
   sysctl pattern in ckn-bw and follow it.
 3. **Reference unit mirror with educational comments.** Update
   `deploy/files/usr/local/lib/systemd/system/{left4me-server@,left4me-web}.service`
   to match the reactor's emission, with per-directive comments
   explaining each hardening directive's purpose. Top-of-file note
   pointing to the reactor.
 4. **Spec bug fixes in the test plan.** Four bugs flagged in
   `2026-05-15-hardening-test-plan.md`'s output section: PID-lookup
   race (use `systemctl show -p MainPID --value`), gdb-from-host
   verification flaw (probe via `systemd-run` inside the same
   hardening profile, not via `nsenter` that bypasses it), D5 pgrep
   pattern, `scmp_sys_resolver` package is `seccomp` not
   `libseccomp-dev`. Doc-only.
 5. **Mark `2026-05-15-user-uid-split-design.md` superseded.** Front-matter
   status note + brief explanation that `PrivateUsers`, `PrivatePIDs`,
   and `TemporaryFileSystem` close D1, D2, D3, D5 at the kernel level.
   Reference this design + the refactor plan as the replacement.
 6. **`SocketBindAllow=` for srcds** (in `HARDENING_SERVER`). Not tested
   in Test 7; verify on deploy. Encoding pending — likely hard-coded
   port range, since systemd directive variable substitution support
   is uneven.
 7. **Cleanup unmanaged packages on left4.me.** `apt remove gdb seccomp
   libseccomp-dev` after the refactor lands. Test-only tooling;
   reinstall on demand for future test sessions.
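The PID-lookup fix in item 4 could be wrapped as below. The function names are hypothetical; the `systemctl show -p MainPID --value` invocation is the one named in the item. Note that the main PID is whatever `ExecStart` launched, which may be a wrapper script rather than the final binary.

```python
import subprocess


def parse_main_pid(output):
    """Parse `systemctl show -p MainPID --value` output; 0 means not running."""
    pid = int(output.strip() or "0")
    return pid if pid > 0 else None


def main_pid(unit):
    """Ask systemd for the unit's main PID instead of pattern-matching processes."""
    result = subprocess.run(
        ["systemctl", "show", "-p", "MainPID", "--value", unit],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_main_pid(result.stdout)
```

Asking systemd directly avoids the race where a process-name match picks up a PID from a previous or neighboring instance.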
 ## Sequencing the deploy
 1. Land ckn-bw commit (reactor changes, sysctl drop-in entry).
 2. Land left4me commit (reference units, spec bug fixes, uid-split
   spec status update, this design doc, the refactor plan).
 3. Push both repos.
 4. `bw apply ovh.left4me` — applies reactor changes; systemd restarts
   affected units automatically.
 5. Verify on the host:
   - `systemctl cat left4me-server@1` shows the new directives.
   - Re-run a Test 8 subset (D1.a, D1.b, D2.b via PrivatePIDs, D5 with
     the corrected pgrep) using the *corrected* probe pattern (per
     spec bug fix in scope item 4). Test 8's full rerun is unnecessary
     — composition is proven; only the *deployment* needs verifying.
   - `sysctl kernel.yama.ptrace_scope` = 2.
   - Smoke: server@1 + server@2 + web all active and stable for 10
     minutes. Web UI: login, server start/stop, log view, overlay
     rebuild.
 6. Rollback if needed: `git revert` the ckn-bw commit + `bw apply`.
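The 10-minute smoke check in step 5 could be automated along these lines. This is a sketch: `stable_for` and its parameters are hypothetical, and `probe` is expected to return the state string for a unit (for example the output of `systemctl is-active <unit>`). Polling alone can miss a crash that `Restart=on-failure` recovers between samples, so comparing the unit's restart counter before and after would be a sensible addition.

```python
import time


def stable_for(probe, units, duration_s=600, interval_s=30, sleep=time.sleep):
    """Return True if every unit reports 'active' at each poll in the window."""
    elapsed = 0
    while True:
        for unit in units:
            if probe(unit) != "active":
                return False
        if elapsed >= duration_s:
            return True
        sleep(interval_s)
        elapsed += interval_s
```

The `sleep` parameter is injectable so the loop can be exercised in a test without waiting.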
 ## What's out of scope
 - **`MemoryDenyWriteExecute=true`** — permanently excluded.
 - **AppArmor profile** — deferred per defenses-survey.
 - **`build-overlay-unit` refactor**
  (`2026-05-15-build-overlay-unit-design.md`) — sequenced after this.
  Will reuse `HARDENING_COMMON` (or a variant) when it lands.
 - **3-user uid split** — `2026-05-15-user-uid-split-design.md`
  superseded by this refactor (scope item 5).
 - **Broader configmgmt-responsibility reshape** — hardening as
  drop-ins living in left4me, ckn-bw becoming a thin file-shipper.
  Real direction worth pursuing; deserves a dedicated session.
  Out of scope here.
 - **Stale RCON port app bug** — flagged in executor's handoff. Separate
  scope.
 - **Pushing the branch** — operator decides when.
 ## Implementation notes (resolved during plan execution)
 - The ckn-bw systemd-bundle emitter renders Python tuples as repeated
  `Key=Value` lines and renders empty strings as `Key=` with no value.
  Both behaviors confirmed by reading the Mako template in
  `libs/systemd.py:17-23`. Tuple branch: `isinstance(value,
  (list, set, tuple))` iterates and emits `${option}=${item}` per
  element, preserving insertion order (sets are sorted; lists and
  tuples are not). Empty-string branch: falls through to `else:
  ${option}=${str(value)}`, which emits `Key=` with nothing after `=`.
  `None` suppresses the key entirely (distinct from empty string —
  important). The `protection()` helper at `libs/systemd.py:94` already
  uses `'CapabilityBoundingSet': ''` as a live in-repo example. Tuple
  precedent in the left4me bundle: `EnvironmentFile` at
  `bundles/left4me/metadata.py:201-204`. Verified 2026-05-15.
 - `SocketBindAllow=` value: hard-coded port range `27000-27999` for
  both `udp:` and `tcp:` lines (matches the `LEFT4ME_PORT_RANGE_*`
  metadata values). Variable substitution in systemd directives is not
  universally supported; hard-coded range avoids the hazard.
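Since the bundle is plain Python, the two rule strings could be derived once at bundle-evaluation time instead of being typed twice. A minimal sketch, with `socket_bind_rules` as a hypothetical helper name:

```python
def socket_bind_rules(first_port, last_port, protocols=("udp", "tcp")):
    """Build SocketBindAllow= values for one contiguous port range."""
    if not (1 <= first_port <= last_port <= 65535):
        raise ValueError("invalid port range")
    span = str(first_port) if first_port == last_port else f"{first_port}-{last_port}"
    return tuple(f"{proto}:{span}" for proto in protocols)
```

The result is a tuple, so it fits the repeated-key emission path; the substitution happens in Python rather than in systemd, which sidesteps the directive-substitution hazard.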
 ## Pointers
 - Threat model: `2026-05-15-hardening-threat-model.md`
 - Defenses survey: `2026-05-15-hardening-defenses-survey.md` (§ 5
  candidate composition is the basis for the factoring above)
 - Test plan + results: `2026-05-15-hardening-test-plan.md`
  (commit `461b8d0`)
 - Executor's handoff: `2026-05-15-session-handoff.md`
  (commit `152c313`)
 - Live reactor: `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`
 - Reference units: `deploy/files/usr/local/lib/systemd/system/`
 - Superseded uid-split spec: `2026-05-15-user-uid-split-design.md`
 - Adjacent (sequenced after): `2026-05-15-build-overlay-unit-design.md`
--- a/docs/superpowers/specs/2026-05-15-hardening-test-plan.md
+++ b/docs/superpowers/specs/2026-05-15-hardening-test-plan.md
--- a/docs/superpowers/specs/2026-05-15-hardening-threat-model.md
+++ b/docs/superpowers/specs/2026-05-15-hardening-threat-model.md
@ -0,0 +1,345 @@
 # left4me application hardening — threat model
 **Status:** living spec, intended input to a hardening implementation plan.
 Paired with `2026-05-15-hardening-defenses-survey.md` and
 `2026-05-15-hardening-test-plan.md`.
 This document establishes *what we defend against and what we accept losing*.
 The defenses survey and test plan operationalize this against the codebase.
 ## Context
 The 2026-05-15 work landed deploy-dir-rethink + build-time-idmap and
 queued "uid split decision" as the next session's task
 (`2026-05-15-user-uid-split-design.md`). Audit of the running 2-user
 configuration found that the gameserver's systemd hardening blocks
 privilege escalation but leaves same-uid attack surface wide open:
 RCON passwords plaintext in `/var/lib/left4me/left4me.db` (readable by
 srcds), Flask `SECRET_KEY` in `/etc/left4me/web.env` (also readable),
 no ptrace block on `left4me-server@.service`, no `/proc` isolation.
 Rather than answer the original "1/2/3 uids" question in isolation,
 this work treats application hardening as a first-class refactor: ground
 the decision in an explicit threat model, survey the full Linux+systemd
 defense menu, test what composes safely with Source engine + the rest of
 the stack, then implement.
 ## Operating posture (assumed)
 Solo-operator, single-host infra (`left4.me` / `ovh.left4me`,
 141.95.32.8). Host is a personal VPS, not multi-tenant. The only privileged
 operator is the user. There are no shell logins as `left4me` or
 `l4d2-sandbox`. All access to those uids is funneled through the
 systemd-managed units (`left4me-web.service`, `left4me-server@.service`,
 `left4me-script-sandbox`). The host runs nothing other than left4me +
 ckn-bw-managed baseline (nginx, sshd, fail2ban-class basics).
 If those assumptions don't hold (e.g., shared host with other tenants,
 non-systemd-mediated access to the uids), revise this document before
 proceeding — threat surface changes meaningfully.
 ## Assets
 Ordered by impact-if-compromised. Compromise means the attacker can
 exfiltrate, modify, or destroy the asset.
 ### Tier 1 — catastrophic, no easy recovery
 | Asset | Where | Impact of compromise |
 |---|---|---|
 | Host root | the box | Total compromise of every service on the host. |
 | `web.env` Flask `SECRET_KEY` | `/etc/left4me/web.env`, `root:left4me 0640` | Session forgery: attacker logs in as any admin without password. |
 | `web.env` Steam Web API key | same | Attacker can query/operate Steam Web API as us. Rate-limited; reputational. |
 | Server RCON passwords | DB: `Server.rcon_password` plaintext (`l4d2web/models.py:146-148`) | Attacker can execute arbitrary RCON on every gameserver: `sm_kick`, `rcon say`, server lockup, plugin abuse. |
 | User password hashes (bcrypt) | DB: `User.password_digest` (`l4d2web/models.py:31`) | Offline cracking per user. bcrypt slows it but doesn't stop it. |
 ### Tier 2 — severe but bounded
 | Asset | Where | Impact |
 |---|---|---|
 | `/opt/left4me/src/` Python source | `left4me:left4me` on disk | Persistent backdoor in web app via gunicorn reload. Currently RO from inside the server unit (`ProtectSystem=strict` covers `/opt`); RW from inside the web unit. |
 | Overlay content | `/var/lib/left4me/overlays/<id>/` | Persistent sourcemod plugin or replaced binary; surfaces in every gameserver using that overlay. |
 | Steam installation | `/var/lib/left4me/installation/` | Tampered `srcds_linux`; trivial persistence. Currently RO from server, RW from web. |
 | Sourcemod admin lists | inside overlays | RCON-equivalent: admin commands in-game. |
 | Workshop cache | `/var/lib/left4me/workshop_cache/` | Used by builds; tampered content surfaces in next overlay. |
 ### Tier 3 — limited, recoverable
 Job history, build logs, the small subset of in-game state not covered by
 the above (e.g., live player slot in a specific match).
 ## Trust boundaries
 Lines we want enforced. "Enforced" = the kernel + systemd, not "the
 process politely doesn't cross it."
 | Id | From | To | Strength today | Strength wanted |
 |---|---|---|---|---|
 | TB1 | External network | host shell | Strong (firewall, no extra services) | Strong |
 | TB2 | Gameserver process | rest of the host | Weak (same-uid + same-FS view) | Strong |
 | TB3 | Web app | rest of the host | Weak (same-uid + same-FS view) | Medium (sudo path inherent) |
 | TB4 | Sandbox | rest of the host | Strong (separate uid + hardened unit) | Strong |
 | TB5 | Gameserver instance N | gameserver instance M | None (same-uid, same-DB) | Strong |
 | TB6 | Web app | gameserver runtime state | None (same-uid, shared `runtime/<n>` access) | Medium (web needs to stage server.cfg) |
 | TB7 | Gameserver | web-only secrets (DB, web.env) | None | Strong |
 | TB8 | Workshop content | srcds-process | Inherent (content runs as data) | n/a — not a software boundary |
 TB2, TB5, TB7 are the highest-leverage gaps. TB6 is partial because the
 web app legitimately writes per-instance config; the boundary is "web
 can write per-instance config" allowed, "web can ptrace srcds" denied.
 ## Attackers
 ### A1 — Anonymous external attacker (primary)
 Reaches public surfaces:
 - gunicorn on `:8000` (behind nginx + admin auth)
 - srcds on UDP `:27015`+ per instance (game protocol; no auth)
 - (Maybe: workshop subscription endpoints if any; check.)
 Capabilities: arbitrary network packets. Goal: code execution on the
 host, then exfiltrate secrets and persist.
 ### A2 — Authenticated admin (operator)
 In the assumed posture this is *the user*, single person. Out of scope as
 a threat per operator's choice (insider == operator). If admin auth ever
 expands to multiple operators, revise.
 ### A3 — Malicious workshop content
 A workshop addon (map, plugin, asset pack) is published to the Steam
 workshop and pulled into a build. The content runs inside srcds via
 Source engine + sourcemod loading. Capabilities: same as A1 once loaded
 into srcds (the engine doesn't have a strong privilege boundary against
 its own loaded plugins). Distinct in that the entry vector is curated by
 the operator (workshop link added to a blueprint), not arbitrary network
 input. Risk floor: the operator vetted the source.
 ### A4 — Compromised player session
 A connected player exploits a Source-engine protocol bug. Functionally a
 subset of A1 — same capability set once code is running in srcds.
 ### A5 — Local attacker on the host
 Out of scope per operating posture. No non-root local accounts beyond
 the systemd-managed service uids.
 ### A6 — Steam binary supply-chain
 `srcds_linux` is a binary from Valve. A compromised Valve build would
 already be running as `left4me` and there's no practical defense at
 this layer. Out of scope.
 ## Attack scenarios
 ### S1 — L4D2 engine RCE → exfil + persist
 A1 sends a crafted packet to srcds; srcds executes attacker code as
 `left4me` inside `left4me-server@.service`.
 **Today, attacker can:**
 - Read DB → all RCON passwords (plaintext), all bcrypt hashes.
 - Read `web.env` → SECRET_KEY, Steam API key.
 - ptrace gunicorn → in-memory secrets, current sessions.
 - Read `/proc/<gunicorn-pid>/environ` → same env as `web.env`.
 - ptrace peer `left4me-server@<n>` and read its RCON password from the DB: cross-server compromise.
 - `sudo left4me-systemctl|journalctl|overlay` for any instance.
 - *Blocked today:* writing `/opt/left4me/src/` (ProtectSystem=strict covers `/opt`).
 - *Blocked today:* acquiring new caps (NoNewPrivileges).
 **Defended outcome (goal):** Blast radius limited to "this gameserver's
 runtime state during this session" — no peer-server compromise, no DB
 access, no `web.env` access, no ptrace.
 ### S2 — Web app RCE → secrets + persistence
 A1 finds a Flask vulnerability (Jinja SSTI, SQLAlchemy injection, auth
 bypass, file-upload escape). Web executes attacker code as `left4me`
 inside `left4me-web.service`.
 **Today, attacker can:**
 - Read + write DB (web's primary path).
 - Read `web.env`.
 - Write `/opt/left4me/src/` → backdoor next gunicorn reload.
 - `sudo` all helper verbs.
 - ptrace srcds peers, modify their `runtime/<n>/` upper layer.
 - Modify overlays (writes to `/var/lib/left4me/overlays/`).
 **Defended outcome (goal):** Cannot ptrace gameservers; cannot read
 `/proc/<srcds-pid>/*`; web compromise still owns its DB and env (its
 primary attack surface, so this is *acceptable residual*).
 ### S3 — Cross-server contamination
 S1 played out on srcds@1; attacker pivots to srcds@2.
 **Today:** trivial — ptrace srcds@2, read its memory; or just read the
 DB to learn srcds@2's RCON password and send commands.
 **Defended outcome (goal):** Blocked. Per-instance namespace isolation
 (or per-instance uid) means kernel rejects ptrace; DB invisible to
 gameserver uid hides the RCON list.
 ### S4 — Malicious workshop content
 A3 adds an addon to a blueprint; addon includes a Squirrel/SourceMod
 plugin that abuses engine APIs to do file I/O / network calls.
 **Today + with hardening:** functionally equivalent to S1 — the plugin
 runs as srcds, same blast radius. No software boundary prevents this;
 the only defense is what's outside the unit. So this is *covered* if S1
 is covered.
 ### S5 — Sudoers helper abuse
 S1 or S2 attacker uses the sudo grants to widen access.
 **Today:** sudoers grants (audit findings, `deploy/files/etc/sudoers.d/left4me`):
 - `left4me-systemctl <name> {enable|disable|show}` — any instance, no
  ownership check
 - `left4me-journalctl <name>` — read any unit's journal
 - `left4me-overlay mount|umount <name>` — any instance
 - `left4me-script-sandbox <overlay_id> <script>` — runs as `l4d2-sandbox`
 A compromised gameserver can enable/disable peer instances, read their
 journals, mount/umount their overlays. Not root escalation, but a
 significant escalation.
 **Defended outcome:** sudoers reachable only from `left4me-web`. The
 gameserver uid (or the gameserver's namespace) gets none of the helper
 grants. This is naturally true if the helpers are invoked only by the
 web app; ensure the gameserver unit cannot sudo (no PAM, no setuid bits
 in its FS view).
 ### S6 — Sandbox escape
 Attacker has reached A1-equivalent code execution inside `l4d2-script-sandbox`. The sandbox runs as
 `l4d2-sandbox`, fully hardened (verified during 2026-05-15 work).
 **Today:** sandbox-escape attacker has `l4d2-sandbox` capabilities only.
 With build-time-idmap, writes through the bind land on disk as
 `left4me`, but the sandbox process itself cannot interact with `left4me`
 processes (different uid). Existing isolation is strong.
 **Defended outcome:** unchanged, already strong. Document as a
 load-bearing invariant; do not weaken.
 ## What we accept losing
 Decisions to *not* defend, with reasoning. Future work might revisit.
 - **Kernel CVEs** that escape namespaces or seccomp. No practical defense
  short of running on a hypervisor + KVM. Out of scope.
 - **systemd unit-config CVEs**. Unit hardening relies on systemd
  honoring directives correctly. Out of scope.
 - **Steam binary compromise**. `srcds_linux` is Valve's. Out of scope.
 - **Sourcemod / Metamod plugin runtime weaknesses**. Plugins run as srcds
  by design. Out of scope.
 - **Player IP exposure via game protocol**. Inherent to UDP/Source. Out of
  scope.
 - **DoS via game protocol** (`A2S_INFO` flooding etc.). Out of scope for
  *this* effort; covered by network-layer mitigations.
 - **DoS via web HTTP**. Covered upstream by nginx + fail2ban; out of
  scope for *this* effort.
 - **Host root from operator error** (a misconfigured cron, an admin
  shell). Out of scope; operator is single-person and aware.
 - **Long-term forward secrecy** for past sessions (an attacker who
  exfils SECRET_KEY can replay past sessions). Out of scope; rotation
  on incident.
 ## What we defend (prioritized)
 D1 — **Gameserver RCE cannot exfiltrate DB or web.env**, including RCON
 passwords and SECRET_KEY. Highest value: catastrophic asset, plausible
 attack (L4D2 engine RCE is the canonical "old engine, public traffic"
 risk).
 D2 — **Gameserver RCE cannot ptrace web app or peer gameservers**. Blocks
 in-memory secret theft and cross-server contamination.
 D3 — **Gameserver RCE cannot use sudo helpers** for instances other
 than its own (or, ideally, cannot use sudo at all).
 D4 — **Web app RCE cannot ptrace gameservers**. Symmetric to D2; web
 still has full DB access (acceptable residual since it's the web app's
 own data).
 D5 — **Cross-server contamination blocked at the kernel level**.
 Per-instance namespaces or per-instance uid.
 D6 — **Persistent compromise of `/opt/left4me/src/` blocked from
 gameserver context**. Already partially true via `ProtectSystem=strict`;
 maintain.
 D7 — **All defenses survive a unit-config refactor in the wrong
 direction** — e.g., a future developer adding `ReadWritePaths=` widely.
 Achieved via tests that assert hardening invariants
 (`deploy/tests/test_deploy_artifacts.py`).
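A minimal sketch of what such an invariant check could look like. This is not the content of `deploy/tests/test_deploy_artifacts.py`; the helper names and the `ALLOWED_WRITE_ROOTS` allowlist (including the `/var/lib/left4me/runtime` root) are assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical allowlist: roots a unit is expected to write below.
ALLOWED_WRITE_ROOTS = (PurePosixPath("/var/lib/left4me/runtime"),)


def parse_service_directives(unit_text: str) -> dict[str, list[str]]:
    """Collect [Service] directives; repeated keys accumulate in order."""
    directives: dict[str, list[str]] = {}
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Service" and "=" in line:
            key, _, value = line.partition("=")
            directives.setdefault(key.strip(), []).append(value.strip())
    return directives


def widened_write_paths(directives: dict[str, list[str]]) -> list[str]:
    """Return ReadWritePaths= entries that fall outside the allowlist."""
    offenders = []
    for value in directives.get("ReadWritePaths", []):
        for token in value.split():
            # systemd allows '-' and '+' prefixes on path entries.
            path = PurePosixPath(token.lstrip("-+"))
            allowed = any(
                path == root or root in path.parents
                for root in ALLOWED_WRITE_ROOTS
            )
            if not allowed:
                offenders.append(token)
    return offenders
```

A test would then assert that `widened_write_paths(...)` is empty for each shipped unit, so a later wide `ReadWritePaths=` fails the suite instead of landing silently.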
 ## Acceptable user-experience cost
 - **Unit start latency**: +5s tolerable; +30s not.
 - **Memory overhead**: +tens of MB per unit fine; +hundreds not.
 - **Operational complexity**: one well-documented unit-template
  hardening profile reusable across units. Acceptable trade-off.
 - **Debugging cost**: SECCOMP audit log discoverability via
  `journalctl -k` acceptable. ptrace-based debugging in production
  unnecessary; can re-enable via ad-hoc drop-in if needed.
 - **Steam updates / pip installs**: must continue to work without
  per-update operator action. Privileged paths (steamcmd self-update)
  can run as `left4me` outside the unit if needed; document.
 - **Workshop content**: must continue to load. Builds run in the
  sandbox; the gameserver only reads pre-built overlays.
 ## Acceptance criteria for the implementation
 The final composition (hardening directives + any uid changes) must:
 1. **Functionally**: pass the smoke matrix from `2026-05-15-hardening-test-plan.md` (RCON, build, restart, file upload, multi-server, workshop).
 2. **Defenses verified**:
   - srcds cannot read `/var/lib/left4me/left4me.db` or `/etc/left4me/web.env` (file not in FS view, or kernel denies)
   - srcds cannot ptrace gunicorn or peer srcds (syscall blocked, or kernel rejects across namespaces/uids)
   - srcds cannot read `/proc/<other-pid>/*`
   - web cannot ptrace srcds (symmetric)
 3. **No regressions**: existing test suite passes
   (`pytest deploy/tests/test_overlay_helper.py l4d2host/tests/`).
 4. **Auditable**: invariants asserted in `deploy/tests/test_deploy_artifacts.py`; baseline `systemd-analyze security` score recorded.
 5. **Documentable**: one paragraph per directive in the unit, explaining
   *why* it's there. Future maintainers can reason about removal.
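Criterion 2 distinguishes "file not in FS view" from "kernel denies". One way to record which of the two applies is a small probe run from inside the unit's context. The sketch below is an assumption about how such a probe could be written, not part of the test plan; `classify_read` and `undefended` are hypothetical names.

```python
import errno

# Paths taken from criterion 2; the probe itself is illustrative.
SENSITIVE_PATHS = (
    "/var/lib/left4me/left4me.db",
    "/etc/left4me/web.env",
)


def classify_read(path: str) -> str:
    """Return 'readable', 'absent', 'denied', or 'error:<ERRNO>'."""
    try:
        with open(path, "rb") as handle:
            handle.read(1)
    except FileNotFoundError:
        return "absent"  # not present in this filesystem view
    except PermissionError:
        return "denied"  # present, but the kernel refused the open
    except OSError as exc:
        return "error:" + errno.errorcode.get(exc.errno, str(exc.errno))
    return "readable"


def undefended(paths) -> list[str]:
    """Return the paths that could still be read from this context."""
    return [p for p in paths if classify_read(p) == "readable"]
```

Run inside the hardened unit, `undefended(SENSITIVE_PATHS)` should come back empty; run as root the `denied` branch is not meaningful, since root bypasses file-mode checks.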
 ## Open questions to clarify with the operator
 Before the defenses survey is final, clarify:
 1. **Is gunicorn directly internet-reachable, or behind nginx?** The unit
   binds `127.0.0.1:8000` (per `metadata.py:208`); presumably nginx
   terminates TLS and forwards. Confirm.
 2. **Auth model**: who can log into the web app? Is admin auth strong
   (long passwords, 2FA), or default-grade? Defines how realistic S2 is.
 3. **Workshop content sources**: curated by operator, or arbitrary
   workshop subscriptions exposed to admins? Defines A3's realism.
 4. **Test bench**: is `ckn@10.0.4.128` a real separate test host, or
   is ovh.left4me the only deployment target? Affects test plan choices.
 5. **`kernel.yama.ptrace_scope` setting on the host?** Default Debian is
   1; we may want 2 system-wide.
 6. **Is the host running AppArmor?** Debian has enabled it by default
   since Debian 10 (buster), but hosting-provider images can differ;
   confirm with `aa-status`. AppArmor profiles for srcds (in addition to
   systemd directives) need it enabled system-wide.
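For question 5, the current Yama setting can be read from `/proc/sys/kernel/yama/ptrace_scope`. The sketch below is illustrative only; the function names are hypothetical, and the descriptions paraphrase the kernel's Yama documentation.

```python
from pathlib import Path
from typing import Optional

YAMA_PTRACE_SCOPE = Path("/proc/sys/kernel/yama/ptrace_scope")

# Meanings paraphrased from the kernel's Yama LSM documentation.
PTRACE_SCOPE_MEANINGS = {
    0: "classic: same-uid processes may attach to each other",
    1: "restricted: only descendants or declared tracers may be attached",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: ptrace attach disabled until reboot",
}


def read_ptrace_scope(path: Path = YAMA_PTRACE_SCOPE) -> Optional[int]:
    """Return the configured scope, or None if Yama is not available."""
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None


def describe_ptrace_scope(value: int) -> str:
    """Map a scope value to a short description."""
    return PTRACE_SCOPE_MEANINGS.get(value, f"unknown value {value}")
```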
 ## Pointers
 - Audit synthesis (this session's conversation): unit hardening profile
  `deploy/files/usr/local/lib/systemd/system/left4me-server@.service`,
  metadata reactor `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`,
  filesystem ACLs `~/Projekte/ckn-bw/bundles/left4me/items.py:21-115`,
  DB schema `l4d2web/models.py:31, 146-148`, sudoers
  `deploy/files/etc/sudoers.d/left4me`.
 - Original uid-split spec: `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
  — remains open; this work may supersede it.
 - Companion docs:
  `docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md`,
  `docs/superpowers/specs/2026-05-15-hardening-test-plan.md`.
 - Related work landed this session:
  `docs/superpowers/plans/2026-05-15-build-time-idmap.md`,
  `docs/superpowers/plans/2026-05-15-deploy-dir-rethink.md`.
--- a/docs/superpowers/specs/2026-05-15-session-handoff.md
+++ b/docs/superpowers/specs/2026-05-15-session-handoff.md
@ -0,0 +1,178 @@
 # Session handoff — next: write the hardening-refactor implementation plan
 Short handoff. The hardening test plan was executed end-to-end on
 `left4.me` this session. Results are recorded inline in the spec at
 `docs/superpowers/specs/2026-05-15-hardening-test-plan.md` (commit
 `461b8d0`). The next session writes the implementation plan that lands
 the proven composition in ckn-bw.
 ## What just happened
 Ran all 11 tests from the hardening test plan on
 `left4me-server@1` (canary) and `left4me-web` against the live host
 at `left4.me` / `left4me.ovh.ckn.li` (Debian 13, systemd 257). All
 drop-ins cleaned up at session end; the Test 9 sysctl
 (`kernel.yama.ptrace_scope=2`) is the one persistent host change.
 `gdb` + `seccomp` packages left installed.
 Headline numbers:
 - `left4me-server@1.service`: **7.5 EXPOSED → 1.3 OK** (systemd-analyze)
 - `left4me-web.service`: **8.7 EXPOSED → 4.1 OK**
 - Test 8 attack matrix: all 8 vectors (D1.a/b/c, D2.a/b/c, D3, D5) blocked.
 Three things the test surfaced that change what the refactor must look like:
 - **`SystemCallArchitectures=native x86`**, not bare `native`.
  `srcds_linux` is 32-bit i386; with `native` alone (which resolves to
  `AUDIT_ARCH_X86_64` here), every i386 syscall is killed and
  `srcds_run` respawns srcds every 10 s.
 - **Add `PrivatePIDs=true`** to the composition. `ProtectProc=invisible`
  alone cannot hide gunicorn from srcds because they share uid 980;
  PrivatePIDs gives each instance its own PID namespace and closes
  D2.b without needing the uid split.
 - **Exclude `MemoryDenyWriteExecute=true`.** Source engine i386 `.so`
  files have text relocations; with the directive set, the relocation
  `mprotect` fails with EPERM, `dlopen` aborts, and srcds enters the
  respawn loop. The exclusion is permanent: it cannot be fixed without
  rebuilding Valve's closed-source binary.
 Full per-test detail is in the spec's "Results" section.
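The three findings above can be restated as a check over a directive mapping. This is a sketch under assumptions: `composition_problems` is a hypothetical helper, the mapping shape is invented for illustration, and none of it reflects how the ckn-bw reactor actually stores or emits its directives.

```python
def composition_problems(directives: dict[str, str]) -> list[str]:
    """Return human-readable violations of the three test findings."""
    problems = []

    # Finding 1: srcds_linux is i386, so x86 must be an allowed architecture.
    archs = directives.get("SystemCallArchitectures", "").split()
    if "x86" not in archs:
        problems.append("SystemCallArchitectures must include x86")

    # Finding 2: same-uid processes are only hidden with a private PID namespace.
    if directives.get("PrivatePIDs") != "true":
        problems.append("PrivatePIDs=true is required")

    # Finding 3: text relocations in the engine's .so files break under MDWE.
    if directives.get("MemoryDenyWriteExecute") == "true":
        problems.append("MemoryDenyWriteExecute must stay unset")

    return problems


# Illustrative subset only; the real composition has many more directives.
EXAMPLE_SERVER_DIRECTIVES = {
    "SystemCallArchitectures": "native x86",
    "PrivatePIDs": "true",
    "ProtectProc": "invisible",
}
```

An empty list from `composition_problems(EXAMPLE_SERVER_DIRECTIVES)` means the mapping respects all three findings.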
 ## What's next: write the refactor plan
 Target file: `docs/superpowers/plans/2026-05-16-hardening-refactor.md`
 (or whatever date the next session opens).
 Scope:
 1. **Land the proven composition in ckn-bw.** Live source for the
   unit emission is `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`.
   The reactor emits `left4me-server@.service` and `left4me-web.service`
   — both need the new directives. Copy the Test 7 drop-in (from the
   spec) into the reactor's unit body, with the two amendments
   (`x86` architecture, `PrivatePIDs`) and the `MemoryDenyWriteExecute`
   exclusion above.
 2. **Land the web composition** (sudo-compatible subset from Test 10)
   in the same reactor.
 3. **Land the sysctl drop-in in ckn-bw.** Currently
   `/etc/sysctl.d/99-left4me-ptrace.conf` is host-only — if ckn-bw
   later enforces unmanaged-file removal, this would disappear. Add a
   `pkg_files:` entry (or whatever the bundle convention is) for
   `kernel.yama.ptrace_scope=2`.
 4. **Update reference units** in
   `deploy/files/usr/local/lib/systemd/system/{left4me-server@,left4me-web}.service`
   to mirror the new emission (these are reference-only since the
   deploy-dir-rethink, but should not drift from the live source).
 5. **Decide on `SocketBindAllow=`** for game port range (27000–27999
   per `LEFT4ME_PORT_RANGE_*`). Worth adding to lock srcds's bindable
   sockets; not tested in this session.
 6. **Resolve the deferred specs:**
   - `docs/superpowers/specs/2026-05-15-user-uid-split-design.md` —
     **mark as superseded.** PrivatePIDs + PrivateUsers close the
     same-uid /proc gap that motivated it. Note the residual app-level
     same-uid surface (DB ACLs, web.env mode) is a separate concern
     not addressed by uid split anyway.
   - AppArmor follow-up — defer further; defenses survey lists it.
     Revisit if directive-only hardening leaves observable gaps.
 7. **Fix the four spec bugs documented at the bottom of the test plan**
   (PID-lookup races, gdb-from-outside-NS verification flaw, D5
   pgrep pattern, scmp_sys_resolver package name).
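For the PID-lookup bugs in item 7, the fix recorded in the test plan is to ask systemd for the unit's main PID rather than matching on argv. A small sketch of that lookup follows; the function names are hypothetical, and only the `systemctl show -p MainPID --value` invocation comes from the spec fix.

```python
import subprocess
from typing import Optional


def main_pid_command(instance: int) -> list[str]:
    """Build the systemctl call for one gameserver instance."""
    unit = f"left4me-server@{instance}.service"
    return ["systemctl", "show", "-p", "MainPID", "--value", unit]


def parse_main_pid(output: str) -> Optional[int]:
    """Parse systemctl output; systemd prints 0 when there is no main process."""
    text = output.strip()
    if not text.isdigit():
        return None
    pid = int(text)
    return pid if pid > 0 else None


def lookup_main_pid(instance: int) -> Optional[int]:
    """Run systemctl on the host and return the instance's main PID."""
    result = subprocess.run(
        main_pid_command(instance),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_main_pid(result.stdout)
```

Because the instance number lives in the unit name and not in the process arguments, this avoids both the `pgrep | head` race and the non-matching `@N` pattern.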
 ### Recommendation on sequencing
 Before touching ckn-bw, run **superpowers:brainstorming** on the
 refactor — there's a real design choice on emission shape. The
 test-plan drop-in is ~50 lines of new directives; the existing
 reactor emits a smaller unit. Options:
 - **A. Inline.** All directives land directly in the
  `[Service]` block emitted by the reactor. Simple, ckn-bw-idiomatic.
 - **B. Profile-as-reusable-fragment.** Put the directive block in a
  shared bundle (so the future build-overlay-unit refactor can reuse
  it). Better factoring, more upfront design.
 - **C. Drop-in pattern preserved.** Reactor emits the base unit
  unchanged, plus a separate `*.service.d/hardening.conf` drop-in.
  Mirrors the test methodology; easier to revert by removing the
  drop-in.
 My weak preference is **A** for the first pass — get the production
 state hardened, then refactor into shared shape (B) when the
 build-overlay-unit work needs it. **C** is operationally clean but
 introduces a new emission pattern just for this. Worth 10 minutes of
 brainstorming before committing.
 ## Decision-relevant context
 - **Source of truth is ckn-bw.** `deploy/files/.../*.service` copies
  are reference-only post-deploy-dir-rethink. Don't edit them as the
  primary change — emit-then-mirror.
 - **Sandbox `l4d2-sandbox` unit is out of scope.** Verified during
  prior build-time-idmap work; do not weaken.
 - **Web sudo helpers must keep working.** `NoNewPrivileges` and
  `PrivateUsers` are NOT in the web composition (Test 10 confirmed
  the sudo-compat subset). The "replace sudo with systemctl-managed
  unit triggering" refactor (build-overlay-unit spec is a step
  toward it) would unlock deeper web hardening later.
 - **App-level stale RCON port bug** surfaced during testing: each
  srcds restart picks a new ephemeral RCON port; the web app
  caches the previous one and logs `Connection refused`. Pre-exists
  hardening (repro'd before any drop-in was applied). Separate issue,
  not for this refactor. Mention to operator; track in whatever
  issue-tracking the project uses.
 - **gdb + seccomp packages on left4.me** are installed but not in
  ckn-bw. Either add them to the bundle (so they're reproducible)
  or `apt remove` them after the refactor lands — operator
  preference.
 ## Host state at end of session
 - `left4me-server@1`, `@2`, `left4me-web`: all `active`, baseline
  (no drop-ins).
 - `/etc/sysctl.d/99-left4me-ptrace.conf` present, `ptrace_scope=2`
  effective.
 - `gdb`, `seccomp` (provides `scmp_sys_resolver`), `libseccomp-dev`
  installed.
 - `/tmp/sec-{baseline,after}-{server,web}.txt`, `/tmp/unit-baseline-*.conf`,
  `/tmp/sysctl-baseline.txt` retained (next session can pull diffs from
  these if needed).
 ## What's NOT next
 - **Don't re-run the test plan.** Already done; results committed.
 - **Don't push to origin yet.** Repo is 3 commits ahead of
  `origin/master` (the three hardening specs + this commit). User
  said "commit locally" this session; they'll push when ready.
 - **Don't fix the stale-RCON-port app bug as part of the refactor.**
  Different system, different scope.
 - **Don't do AppArmor.** Still deferred.
 - **build-overlay-unit refactor** (`docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md`)
  remains sequenced behind this; not next.
 ## Open questions for the next session
 - Should the refactor be a single PR/commit, or split into
  "ckn-bw emission" + "reference unit mirror" + "sysctl drop-in"?
  Operator preference. Recommend single bundle if the changes are
  small.
 - Should we land Test 7's composition on `@2` first as a longer
  canary before rolling to all instances? Or trust the symmetric
  emission and roll everywhere at once? Currently both are running
  baseline; @1 was the only canary.
 - `SocketBindAllow=` for the 27000–27999 game port range — include
  in the first pass, or defer to a follow-up commit? Survey lists
  it, test plan didn't exercise it.
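To make the `SocketBindAllow=` question concrete, here is a sketch of the directive lines it would involve. It rests on assumptions: `socket_bind_rules` is a hypothetical helper, and covering both UDP and TCP is a guess about what srcds needs. As the systemd documentation describes it, allow rules only restrict anything when paired with a deny rule such as `SocketBindDeny=any`.

```python
def socket_bind_rules(low: int, high: int, protocols=("udp", "tcp")) -> list[str]:
    """Render allow rules for a port range plus a catch-all deny."""
    if not (1 <= low <= high <= 65535):
        raise ValueError(f"invalid port range {low}-{high}")
    lines = [f"SocketBindAllow={proto}:{low}-{high}" for proto in protocols]
    # Without a deny rule, binds that match no allow rule are still permitted.
    lines.append("SocketBindDeny=any")
    return lines


# Range from the handoff; protocol coverage is an untested assumption.
GAME_PORT_RULES = socket_bind_rules(27000, 27999)
```

Since the handoff notes that srcds picks a new ephemeral RCON port on each restart, a catch-all deny could block that bind; this composition was not exercised by the test plan and would need a canary run first.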
 ## Pointers
 - Test plan (executed; **read the Results section first**):
  `docs/superpowers/specs/2026-05-15-hardening-test-plan.md`
 - Threat model: `docs/superpowers/specs/2026-05-15-hardening-threat-model.md`
 - Defenses survey: `docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md`
 - Original uid-split (to be marked superseded):
  `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
 - Live unit emission: `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`
 - Reference units: `deploy/files/usr/local/lib/systemd/system/`
 - Recent commit on this work: `461b8d0`
 - Host SSH: `ssh left4.me` (config at `~/.ssh/config`, 1Password agent)
--- a/docs/superpowers/specs/2026-05-15-user-uid-split-design.md
+++ b/docs/superpowers/specs/2026-05-15-user-uid-split-design.md
@ -1,5 +1,31 @@
 # How many system users should left4me have? — 1, 2, or 3
 **Status: SUPERSEDED 2026-05-15 by the hardening refactor.**
 The original question — should left4me have 1, 2, or 3 system users — is
 now answered: **2 users (current state) is correct.** The
 defenses that motivated a 3-user split (DB readability from srcds,
 cross-server ptrace, same-uid /proc visibility, web-side reach into
 gameserver state) are closed by the systemd hardening composition
 landed in the hardening-refactor plan (`docs/superpowers/plans/2026-05-15-hardening-refactor.md`):
 - `PrivateUsers=true` blocks cross-uid ptrace at the kernel level.
 - `PrivatePIDs=true` hides peer processes even when uids match.
 - `TemporaryFileSystem=` + minimal binds hide the DB and web.env from
  srcds entirely.
 - `SystemCallFilter=~@debug` + empty `CapabilityBoundingSet=` block
  ptrace at the syscall layer.
 The residual filesystem-ACL surface (DB at `0640 root:left4me`,
 web.env same) is a separate concern: a uid split would close it via
 kernel ACLs, but for the current deployment shape it's covered by the
 systemd-imposed FS view. If the deployment shape changes (multi-tenant
 host, shell logins as the service uids, additional services running
 as `left4me` outside these units) the uid split should be revisited.
 The original content of this spec is preserved below for context.
 ---
 **Status: open question, not settled design.** This is a handoff
 document. Today left4me has 2 system users: `left4me` (web app +
 gameservers + workshop builds) and `l4d2-sandbox` (script-overlay
Author	SHA1	Message	Date
mwiegand	f615d0de75	spec(user-uid-split): mark superseded by the hardening refactor The 1/2/3-user question is answered: stay at 2 (left4me + l4d2-sandbox). The defenses that motivated a 3-user split (cross-uid ptrace, cross-server contamination, web-side reach into gameserver state, DB/env exposure to srcds) are closed by the systemd hardening composition: PrivateUsers + PrivatePIDs + TemporaryFileSystem + SystemCallFilter=~@debug + empty CapabilityBoundingSet. The residual filesystem-ACL surface (mode 0640 root:left4me on DB and web.env) is noted as a separate concern — covered for the current deployment shape, revisit if shape changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 14:59:13 +02:00
mwiegand	37309ba399	spec(hardening-test-plan): fix four bugs surfaced by executor Four corrections noted by the test plan's executor in commit `461b8d0`: - PID-lookup race: pgrep+head can pick the wrong instance. Replace with systemctl show -p MainPID --value left4me-server@N.service. - gdb-from-host ptrace check: nsenter into only the mount namespace with root caps bypasses the SECCOMP filter, so the test is a false positive. Replace with systemd-run-with-same-directives probe, or syscall-filter inspection. - D5 pgrep pattern: 'srcds_linux.*\@2' doesn't match because @N is in the unit name, not argv. Use systemctl show -p MainPID. - scmp_sys_resolver is in the seccomp package on Debian 13, not libseccomp-dev. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 14:58:46 +02:00
mwiegand	8e678b6765	deploy/files: annotate reference units with per-directive hardening comments Update the educational reference copies of left4me-server@.service and left4me-web.service to match the new hardening composition from the ckn-bw reactor (HARDENING_COMMON + HARDENING_SERVER / HARDENING_WEB). Per-directive comments explain each defense's purpose and the threat it addresses, so a cold reader of this repo can understand the threat model from the unit file alone. Top-of-file note in each reference points at the ckn-bw reactor as the live source; reference is hand-synced. gunicorn ExecStart in the web reference uses placeholder '--workers 4 --threads 4' values; live emission interpolates from metadata. This is the documented divergence between the reference and the deployed unit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 14:54:10 +02:00
mwiegand	7c64910c90	spec(hardening-refactor): resolve emitter open items Verified during plan execution that the ckn-bw systemd-bundle emitter handles tuples and empty values as expected. SocketBindAllow port range hard-coded since systemd directive variable substitution is not universal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 14:39:11 +02:00
mwiegand	b1293f9952	plan(hardening-refactor): implementation plan against the proven composition 12 tasks across left4me + ckn-bw: emitter verification, three Python constants in the systemd_units reactor, spread into both managed units, sysctl drop-in, annotated reference units, four spec bug fixes, mark uid-split spec superseded, cross-repo push, bw apply + verify on host, apt-remove test tooling. Each task has bite-sized steps with exact commands and expected output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 14:25:25 +02:00
mwiegand	81dc29a9c3	spec(hardening-refactor): revise design — inline-in-reactor, defer drop-in reshape Going back to the inline-in-reactor shape: hardening directives land in ckn-bw's systemd_units reactor as shared Python dicts (HARDENING_COMMON + HARDENING_SERVER + HARDENING_WEB), spread into each unit's Service block. Educational reference units in deploy/files/.../*.service stay and get per-directive comments. Operator discipline hand-syncs the reference to the reactor; no CI drift test. The broader responsibility reshape — hardening drop-ins living in left4me with ckn-bw as a thin file-shipper — is worth pursuing as a separate dedicated session, not bundled into this refactor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 14:16:02 +02:00
mwiegand	3256ed2ab1	spec(hardening-refactor): design — drop-ins owned by left4me, ckn-bw deploys Hardening composition is application knowledge (which paths to bind, that srcds is i386, what breaks sudo). It belongs in the left4me repo as drop-in .conf files under deploy/files/etc/systemd/system/<unit>.d/. ckn-bw shrinks: keeps the base units in its reactor, removes the hardening keys, ships the drop-ins to /etc/systemd/system/. Existing educational reference units in deploy/files/.../*.service are deleted in favor of the drop-ins, which carry per-directive comments. Broader configmgmt-responsibility reshape (base units leaving the reactor) deliberately deferred to a future session. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 14:05:38 +02:00
mwiegand	152c313315	spec(session-handoff): point next session at hardening-refactor plan The prior handoff pointed this session at running the test plan; that's done (commit `461b8d0`). Update the handoff to point the next session at writing docs/superpowers/plans/2026-MM-DD-hardening-refactor.md against the proven composition, including the two amendments (x86 arch, PrivatePIDs) and the MDW permanent exclusion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 13:43:37 +02:00
mwiegand	461b8d028f	spec(hardening): test plan executed on left4.me — results recorded Ran the 11-test plan against left4me-server@1 (canary) and left4me-web on left4.me / Debian 13 / systemd 257. Cleaned up all unit drop-ins; kept the Test 9 sysctl (kernel.yama.ptrace_scope=2) per spec. Outcomes: - server@1 systemd-analyze: 7.5 EXPOSED → 1.3 OK - left4me-web systemd-analyze: 8.7 EXPOSED → 4.1 OK - All 8 attack vectors in Test 8 (D1.a-c, D2.a-c, D3, D5) blocked - Test 6 (MemoryDenyWriteExecute) fails as predicted — Source engine i386 .so files have text relocations; exclude from final composition. - Test 11 (24-48h soak) skipped per operator decision. Two amendments to the spec's proposed composition required for the refactor: - SystemCallArchitectures=native x86 (not bare 'native') — srcds_linux is i386, the kernel kills every native-only call. - PrivatePIDs=true added — ProtectProc=invisible alone cannot hide gunicorn from srcds because both run as uid 980; PrivatePIDs gives each instance its own PID namespace and closes D2.b. Spec bugs surfaced and documented in the "Output" section: PID lookup via pgrep (race vs. instance), Test 4/10 gdb-from-host doesn't actually exercise the unit's SECCOMP filter, Test 8 D5 pgrep pattern won't match. Tooling note corrected: scmp_sys_resolver is in 'seccomp' package, not 'libseccomp-dev'. Next session: write docs/superpowers/plans/2026-MM-DD-hardening-refactor.md against the proven composition; supersede the uid-split spec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 13:39:50 +02:00
mwiegand	1df811e62a	spec(hardening): threat model + defenses survey + test plan; pivot handoff Reframe the queued uid-split decision into a broader hardening analysis. Audit found the same-uid attack surface (DB readable from srcds, ptrace allowed, RCON stored plaintext) is closable by either uid split or systemd directive composition; the three specs ground that choice in a threat model, survey the defenses, and lay out a self-contained test plan to run on left4.me next. uid-split spec deferred pending results. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 13:07:40 +02:00
mwiegand	9a2ab974e6	spec: session handoff pointing next session at uid-split Short companion to the existing topic-specific handoff docs. Captures the situationally-fresh state at the end of the 2026-05-15 deploy-dir-rethink + janitorial sweep so a fresh session can pick up cold: what just landed, what's next (uid-split), what's NOT next (build-overlay-unit, until uid-split decides), and the decision-relevant signals that emerged during this session — mostly that the 2-uid model was freshly load-bearing in the build-time-idmap work and that srcds hardening already covers most of what a gameserver-uid split would add. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 12:17:55 +02:00