Compare commits
No commits in common. "f615d0de755c68433c916cc824cec871c7855d62" and "4aa69c2461e2027dc49310a3aff6686c50b48844" have entirely different histories.
f615d0de75
...
4aa69c2461
9 changed files with 59 additions and 4114 deletions
@ -1,21 +1,10 @@
|
||||||
# left4me gameserver — system unit, one instance per gameserver.
|
|
||||||
#
|
|
||||||
# This is the REFERENCE COPY of the deployed unit. The live source is
|
|
||||||
# the systemd/units reactor at ~/Projekte/ckn-bw/bundles/left4me/metadata.py
|
|
||||||
# (look for 'left4me-server@.service'). Hardening directives live in
|
|
||||||
# the HARDENING_SERVER constant near the top of the same file.
|
|
||||||
# This file is hand-synced; edit both together.
|
|
||||||
#
|
|
||||||
# Threat model: docs/superpowers/specs/2026-05-15-hardening-threat-model.md
|
|
||||||
# Defenses survey: docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md
|
|
||||||
# Test plan + results: docs/superpowers/specs/2026-05-15-hardening-test-plan.md
|
|
||||||
|
|
||||||
[Unit]
|
[Unit]
|
||||||
Description=left4me server instance %i
|
Description=left4me server instance %i
|
||||||
After=network-online.target
|
After=network-online.target
|
||||||
Wants=network-online.target
|
Wants=network-online.target
|
||||||
# Bound the restart loop. Without these, a persistent ExecStartPre or
|
# Bound the restart loop. Without these, a persistent ExecStartPre or
|
||||||
# ExecStart failure spins indefinitely.
|
# ExecStart failure spins indefinitely. Note: these are [Unit]-section
|
||||||
|
# directives (systemd 230+), not [Service].
|
||||||
StartLimitBurst=5
|
StartLimitBurst=5
|
||||||
StartLimitIntervalSec=60s
|
StartLimitIntervalSec=60s
|
||||||
|
|
||||||
|
|
@ -25,25 +14,49 @@ User=left4me
|
||||||
Group=left4me
|
Group=left4me
|
||||||
EnvironmentFile=/etc/left4me/host.env
|
EnvironmentFile=/etc/left4me/host.env
|
||||||
EnvironmentFile=/var/lib/left4me/instances/%i/instance.env
|
EnvironmentFile=/var/lib/left4me/instances/%i/instance.env
|
||||||
# `-` prefix: chdir failure is non-fatal. The merged dir only exists
|
# `-` prefix: chdir failure is non-fatal. systemd applies WorkingDirectory
|
||||||
# once ExecStartPre's overlay mount succeeds.
|
# before every Exec line — including ExecStartPre — but the merged dir only
|
||||||
|
# exists once ExecStartPre's overlay mount succeeds. With `-`, ExecStartPre
|
||||||
|
# runs in the unit's home (cwd doesn't matter for the mount helper); the
|
||||||
|
# ExecStart re-applies WorkingDirectory after the mount and finds the dir.
|
||||||
WorkingDirectory=-/var/lib/left4me/runtime/%i/merged/left4dead2
|
WorkingDirectory=-/var/lib/left4me/runtime/%i/merged/left4dead2
|
||||||
# `+` prefix runs the helper with full privileges (root, all caps, host
|
# Single source of truth for the kernel-overlayfs mount lifecycle: the web
|
||||||
# namespaces) — required because the unit has NoNewPrivileges=true
|
# app's start_instance only stages cfg files and asks systemd to enable+
|
||||||
# AND PrivateUsers=true; both block sudo's setuid path. nsenter into
|
# start this unit; the actual `mount -t overlay` lives here so reboot
|
||||||
# PID 1's mount namespace ensures the umount in ExecStopPost succeeds
|
# auto-start works the same as a UI-driven start. ExecStopPost mirrors it
|
||||||
# without EBUSY from the unit's own slave-mount tree.
|
# so the unmount lives in the same place — no Python-side _mounter needed
|
||||||
|
# in stop/delete/reset paths. Both helper verbs are idempotent.
|
||||||
|
#
|
||||||
|
# `+` prefix runs the helper with full privileges (root, no sandbox). Required because
|
||||||
|
# the unit has NoNewPrivileges=true, which blocks sudo's setuid escalation
|
||||||
|
# — and the helper itself needs root for the mount/umount syscalls.
|
||||||
|
#
|
||||||
|
# `nsenter --mount=/proc/1/ns/mnt --` runs the helper Python interpreter
|
||||||
|
# in PID 1's mount namespace. Without this, the `+` prefix removes the
|
||||||
|
# sandbox/credentials but does NOT detach from the unit's per-service
|
||||||
|
# mount namespace (created by PrivateTmp/Protect*) — so the helper
|
||||||
|
# process itself would hold a reference to that namespace, keeping the
|
||||||
|
# slave-mount tree alive after the cgroup empties, and umount in PID 1
|
||||||
|
# would return EBUSY for as long as the helper ran. Putting nsenter at
|
||||||
|
# the unit-level (as opposed to inside the helper, where only the
|
||||||
|
# umount syscall escaped) is what actually frees the namespace. Once
|
||||||
|
# the helper is in PID 1's namespace, ExecStopPost's umount succeeds
|
||||||
|
# on the first try with no retry/race window. ExecStopPost (not
|
||||||
|
# ExecStop) so unmount runs after the cgroup is cleared; ExecStop runs
|
||||||
|
# while srcds is still alive and would EBUSY.
|
||||||
ExecStartPre=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay mount %i
|
ExecStartPre=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay mount %i
|
||||||
# Run from the merged overlay, NOT installation/. srcds_run cds to its
|
# Run from the merged overlay, NOT installation/. srcds_run is a shell
|
||||||
# own dirname before exec'ing srcds_linux; the binary's path determines
|
# script that `cd`s to its own dirname before exec'ing srcds_linux, so the
|
||||||
# gameinfo + addons lookup.
|
# binary's path determines where the engine reads gameinfo.txt and addons
|
||||||
|
# from — WorkingDirectory has no effect. Invoking installation/srcds_run
|
||||||
|
# would resolve everything against the lower layer and never see overlay-
|
||||||
|
# provided plugins (Metamod/SourceMod) or cfgs (zonemod, confogl).
|
||||||
ExecStart=/var/lib/left4me/runtime/%i/merged/srcds_run -game left4dead2 +hostport ${L4D2_PORT} $L4D2_ARGS
|
ExecStart=/var/lib/left4me/runtime/%i/merged/srcds_run -game left4dead2 +hostport ${L4D2_PORT} $L4D2_ARGS
|
||||||
ExecStopPost=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay umount %i
|
ExecStopPost=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay umount %i
|
||||||
Restart=on-failure
|
Restart=on-failure
|
||||||
RestartSec=5
|
RestartSec=5
|
||||||
|
|
||||||
# === Resource control baseline ===
|
# Resource control baseline — see docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md
|
||||||
# See docs/superpowers/specs/2026-05-09-l4d2-server-host-perf-baseline-design.md
|
|
||||||
Slice=l4d2-game.slice
|
Slice=l4d2-game.slice
|
||||||
Nice=-5
|
Nice=-5
|
||||||
IOSchedulingClass=best-effort
|
IOSchedulingClass=best-effort
|
||||||
|
|
@ -57,72 +70,16 @@ KillSignal=SIGINT
|
||||||
TimeoutStopSec=15s
|
TimeoutStopSec=15s
|
||||||
LogRateLimitIntervalSec=0
|
LogRateLimitIntervalSec=0
|
||||||
|
|
||||||
# === Identity / privilege drop ===
|
# Hardening (unchanged from previous baseline).
|
||||||
NoNewPrivileges=true # block setuid escalation (defense: D3)
|
NoNewPrivileges=true
|
||||||
RestrictSUIDSGID=true # block creating setuid/setgid files (chmod +s)
|
|
||||||
CapabilityBoundingSet= # drop all caps — no privilege to escalate
|
|
||||||
AmbientCapabilities=
|
|
||||||
|
|
||||||
# === Filesystem virtualization ===
|
|
||||||
# Mask /var/lib, /etc, /opt, etc. with empty tmpfs; bind back only
|
|
||||||
# what srcds needs. The DB (/var/lib/left4me/left4me.db) and web.env
|
|
||||||
# (/etc/left4me/web.env) are intentionally not bound — they don't
|
|
||||||
# exist in this unit's filesystem view (defenses: D1.a, D1.b).
|
|
||||||
TemporaryFileSystem=/var/lib /etc /opt /home /root /srv /mnt /media
|
|
||||||
BindReadOnlyPaths=/var/lib/left4me/installation
|
|
||||||
BindReadOnlyPaths=/var/lib/left4me/overlays
|
|
||||||
BindReadOnlyPaths=/etc/left4me/host.env
|
|
||||||
BindReadOnlyPaths=/etc/ssl
|
|
||||||
BindReadOnlyPaths=/etc/ca-certificates
|
|
||||||
BindReadOnlyPaths=/etc/resolv.conf
|
|
||||||
BindReadOnlyPaths=/etc/nsswitch.conf
|
|
||||||
BindReadOnlyPaths=/etc/alternatives
|
|
||||||
BindPaths=/var/lib/left4me/runtime/%i
|
|
||||||
ProtectSystem=strict # belt-and-braces with TemporaryFileSystem
|
|
||||||
ProtectHome=true
|
|
||||||
|
|
||||||
# === Process namespacing ===
|
|
||||||
PrivateUsers=true # own user namespace; cross-uid ptrace blocked (D2)
|
|
||||||
PrivatePIDs=true # own PID namespace; hides peer-srcds + gunicorn (D2.b, D5)
|
|
||||||
PrivateTmp=true
|
PrivateTmp=true
|
||||||
PrivateDevices=true
|
PrivateDevices=true
|
||||||
PrivateIPC=true
|
ProtectHome=true
|
||||||
RestrictNamespaces=true # block unshare()/clone(CLONE_NEW*)
|
ProtectSystem=strict
|
||||||
|
ReadOnlyPaths=/var/lib/left4me/installation /var/lib/left4me/overlays
|
||||||
# === /proc and /sys ===
|
ReadWritePaths=/var/lib/left4me/runtime/%i
|
||||||
ProtectProc=invisible # foreign-uid /proc hidden (paired with PrivatePIDs for full hide)
|
RestrictSUIDSGID=true
|
||||||
ProcSubset=pid # /proc shows only PID dirs, no kallsyms/cpuinfo
|
LockPersonality=true
|
||||||
ProtectKernelTunables=true # /proc/sys, /sys read-only
|
|
||||||
ProtectKernelModules=true # no module load/unload
|
|
||||||
ProtectKernelLogs=true # no /dev/kmsg or syslog()
|
|
||||||
ProtectClock=true # no settimeofday()
|
|
||||||
ProtectControlGroups=true # /sys/fs/cgroup read-only
|
|
||||||
ProtectHostname=true # no sethostname()
|
|
||||||
LockPersonality=true # no personality() switches
|
|
||||||
|
|
||||||
# === Syscall filter ===
|
|
||||||
# srcds_linux is i386 (Source 2007 engine). 'native x86' allows both
|
|
||||||
# x86_64 (from srcds_run + the dynamic linker) and i386 (from srcds_linux).
|
|
||||||
# Bare 'native' traps srcds_run in a respawn loop.
|
|
||||||
SystemCallArchitectures=native x86
|
|
||||||
SystemCallFilter=@system-service
|
|
||||||
SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete @privileged
|
|
||||||
# ~@debug is the load-bearing block for D2.a: drops ptrace(), process_vm_readv/writev().
|
|
||||||
# ~@privileged blocks anything requiring CAP_*, redundant with empty bounding set.
|
|
||||||
# MemoryDenyWriteExecute=true is NOT set — Source engine i386 .so files
|
|
||||||
# have text relocations that need mprotect(W+X) during dynamic-linker pass.
|
|
||||||
|
|
||||||
# === Network ===
|
|
||||||
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX # AF_UNIX needed for journald
|
|
||||||
# Lock srcds bindable sockets to the game port range.
|
|
||||||
SocketBindAllow=udp:27000-27999
|
|
||||||
SocketBindAllow=tcp:27000-27999
|
|
||||||
|
|
||||||
# === Misc hygiene ===
|
|
||||||
RestrictRealtime=true # no real-time scheduling
|
|
||||||
RemoveIPC=true # clean up SysV IPC on unit stop
|
|
||||||
KeyringMode=private # private kernel keyring
|
|
||||||
UMask=0027
|
|
||||||
|
|
||||||
[Install]
|
[Install]
|
||||||
WantedBy=multi-user.target
|
WantedBy=multi-user.target
|
||||||
|
|
|
||||||
|
|
@ -1,25 +1,3 @@
|
||||||
# left4me web application — system unit.
|
|
||||||
#
|
|
||||||
# This is the REFERENCE COPY of the deployed unit. The live source is
|
|
||||||
# the systemd/units reactor at ~/Projekte/ckn-bw/bundles/left4me/metadata.py
|
|
||||||
# (look for 'left4me-web.service'). Hardening directives live in
|
|
||||||
# the HARDENING_WEB constant near the top of the same file.
|
|
||||||
# This file is hand-synced; edit both together.
|
|
||||||
#
|
|
||||||
# Several directives that the gameserver uses are intentionally absent
|
|
||||||
# from this unit:
|
|
||||||
# NoNewPrivileges — blocks sudo's setuid escalation
|
|
||||||
# PrivateUsers — breaks sudo's host-root mapping
|
|
||||||
# RestrictSUIDSGID — blocks setuid()/setgid()
|
|
||||||
# CapabilityBoundingSet= — empty value would deny sudo's caps
|
|
||||||
# ~@privileged in SystemCallFilter — blocks sudo's setuid syscall
|
|
||||||
# The web app invokes privileged helpers (left4me-systemctl,
|
|
||||||
# left4me-overlay, left4me-script-sandbox) via sudo, so these
|
|
||||||
# directives can't be applied here. A future refactor replacing sudo
|
|
||||||
# with systemctl-managed transient units would unlock them.
|
|
||||||
#
|
|
||||||
# Threat model + defenses + tests: see docs/superpowers/specs/2026-05-15-hardening-*
|
|
||||||
|
|
||||||
[Unit]
|
[Unit]
|
||||||
Description=left4me web application
|
Description=left4me web application
|
||||||
After=network-online.target
|
After=network-online.target
|
||||||
|
|
@ -29,53 +7,25 @@ Wants=network-online.target
|
||||||
Type=simple
|
Type=simple
|
||||||
User=left4me
|
User=left4me
|
||||||
Group=left4me
|
Group=left4me
|
||||||
WorkingDirectory=/opt/left4me/src
|
WorkingDirectory=/opt/left4me
|
||||||
Environment=HOME=/var/lib/left4me PATH=/opt/left4me/.venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
|
Environment=HOME=/var/lib/left4me
|
||||||
|
Environment=PATH=/opt/left4me/.venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
|
||||||
EnvironmentFile=/etc/left4me/host.env
|
EnvironmentFile=/etc/left4me/host.env
|
||||||
EnvironmentFile=/etc/left4me/web.env
|
EnvironmentFile=/etc/left4me/web.env
|
||||||
# Placeholder values for --workers / --threads. Live emission interpolates
|
ExecStart=/opt/left4me/.venv/bin/gunicorn --workers 1 --threads 32 --bind 0.0.0.0:8000 'l4d2web.app:create_app()'
|
||||||
# from metadata.get('left4me/gunicorn_workers') and gunicorn_threads.
|
|
||||||
ExecStart=/opt/left4me/.venv/bin/gunicorn --workers 4 --threads 4 --bind 127.0.0.1:8000 'l4d2web.app:create_app()'
|
|
||||||
Restart=on-failure
|
Restart=on-failure
|
||||||
RestartSec=3
|
RestartSec=3
|
||||||
|
# NoNewPrivileges intentionally not set: the worker invokes sudo to run
|
||||||
# Web writes broadly under /var/lib/left4me (DB, instance configs,
|
# the left4me-systemctl, left4me-journalctl, and left4me-overlay
|
||||||
# overlays, runtime). Kept inline because it's web-specific
|
# privileged helpers, all setuid via sudo.
|
||||||
# (server@ uses BindPaths to bind only its instance dir).
|
# ProtectSystem=full + ReadWritePaths implicitly give this unit a private
|
||||||
|
# mount namespace, but mount visibility no longer depends on it: overlay
|
||||||
|
# mounts are performed by the left4me-overlay helper, which nsenters into
|
||||||
|
# PID 1's mount namespace, so the resulting mount lives in the host
|
||||||
|
# namespace where the per-instance gameserver units can see it.
|
||||||
|
ProtectSystem=full
|
||||||
ReadWritePaths=/var/lib/left4me
|
ReadWritePaths=/var/lib/left4me
|
||||||
|
|
||||||
# === Filesystem ===
|
|
||||||
ProtectSystem=strict # tightened from prior 'full'; via HARDENING_COMMON
|
|
||||||
ProtectHome=true
|
|
||||||
PrivateTmp=true
|
PrivateTmp=true
|
||||||
|
|
||||||
# === /proc + kernel ===
|
|
||||||
ProtectProc=invisible # foreign-uid /proc hidden (defense: D4)
|
|
||||||
ProcSubset=pid
|
|
||||||
ProtectKernelTunables=true
|
|
||||||
ProtectKernelModules=true
|
|
||||||
ProtectKernelLogs=true
|
|
||||||
ProtectClock=true
|
|
||||||
ProtectControlGroups=true
|
|
||||||
ProtectHostname=true
|
|
||||||
LockPersonality=true
|
|
||||||
|
|
||||||
# === Syscall filter (sudo-compatible — note absence of ~@privileged) ===
|
|
||||||
SystemCallArchitectures=native
|
|
||||||
SystemCallFilter=@system-service
|
|
||||||
SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete
|
|
||||||
# ~@debug blocks ptrace + process_vm_readv/writev (D4).
|
|
||||||
# ~@privileged intentionally omitted — sudo needs setuid().
|
|
||||||
|
|
||||||
# === Network ===
|
|
||||||
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
|
|
||||||
|
|
||||||
# === Misc hygiene ===
|
|
||||||
RestrictNamespaces=true
|
|
||||||
RestrictRealtime=true
|
|
||||||
RemoveIPC=true
|
|
||||||
KeyringMode=private
|
|
||||||
UMask=0027
|
|
||||||
|
|
||||||
[Install]
|
[Install]
|
||||||
WantedBy=multi-user.target
|
WantedBy=multi-user.target
|
||||||
|
|
|
||||||
File diff suppressed because it is too large
@ -1,698 +0,0 @@
# left4me application hardening — defenses survey

**Status:** living spec. Companion to `2026-05-15-hardening-threat-model.md`
and `2026-05-15-hardening-test-plan.md`.

This document catalogs the Linux + systemd defense primitives applicable
to left4me, evaluates each against this codebase's needs, and proposes a
candidate composition. Each candidate is *testable* — the test plan
exercises it before commit.

Reference: the threat model defines defenses D1-D7. This document maps
primitives to those defenses.

## Section 1 — Linux kernel primitives

### Namespaces (`man 7 namespaces`)

| NS | Isolates | Relevance |
|---|---|---|
| **mount** | filesystem hierarchy view | Core. Gives `TemporaryFileSystem=` + bind primitives. |
| **user** | uid/gid mapping | Big for D2/D4 (cross-uid ptrace block). |
| **pid** | PID 1, /proc visibility | Pairs with `ProcSubset=pid` for D2. |
| **net** | netifs, ports, routes, abstract Unix sockets | Breaks gameservers; do **not** apply to server@. |
| **ipc** | SysV IPC + POSIX message queues | Hygienic; `PrivateIPC=true`. |
| **uts** | hostname | Cosmetic; doesn't matter for us. |
| **time** | CLOCK_MONOTONIC / CLOCK_BOOTTIME offsets | Irrelevant for us. |
| **cgroup** | cgroup view | Defense-in-depth against cgroup escape. |

**For left4me:** mount + user + pid + ipc on `left4me-server@.service`.
The web unit can use the same minus user-ns (incompatible with sudo).

### Capabilities (`man 7 capabilities`)

Per-process, granted at exec via file caps or by systemd at unit start.
Bounding set = upper bound; ambient = inherited across non-setuid exec.

- **CapabilityBoundingSet=** empty drops everything. Neither srcds nor
  gunicorn needs any capability after they start (no raw sockets, no
  mount, no module load, no setuid).
- **AmbientCapabilities=** empty (default).

Sharp edge: with `+`-prefixed ExecStartPre, the helper runs with full
privileges (root, all caps), unaffected by these. That's how we get the
privileged overlay mount without breaking the unit's caps.

### Seccomp-bpf (`man 2 seccomp`)

Filters the syscall set, per process. When several filters are loaded
they compose as an AND: a syscall has to pass every one of them.
systemd's `SystemCallFilter=` wraps it.

For us, two filter strategies:
- **Allow-list base** (`@system-service`): permissive enough for srcds
  and gunicorn; subtract dangerous groups.
- **Deny-list**: simpler but easier to leave holes.

Strategy: allow-list with subtractions.

Critical subtractions for D2:
- `~@debug` — drops `ptrace(2)`, `process_vm_readv/writev(2)`,
  `process_madvise(2)`. **Single most important syscall block** for our
  threat model.
- `~@mount` — `mount`, `umount2`, `pivot_root` (gameserver doesn't need;
  helper does, and helper runs as root via `+` prefix).
- `~@privileged` — anything requiring CAP_*; redundant with empty
  bounding set but defense-in-depth.
- `~@reboot`, `~@swap`, `~@cpu-emulation`, `~@obsolete` — cheap removal.

Sharp edges:
- `SystemCallFilter=` lines are merged in order: the first line sets the
  allow-list; subsequent `~` lines subtract from it.
- A `~` subtract on a group not in the allow-list is a no-op.
- `SystemCallArchitectures=native` blocks non-native (e.g. 32-bit)
  syscall entries that would otherwise bypass the filter. Always set
  this. On server@ the value has to be `native x86`, because
  srcds_linux is an i386 binary and bare `native` traps srcds_run in a
  respawn loop.
- `SystemCallErrorNumber=EPERM` vs. the default action (terminate the
  process with `SIGSYS`): `EPERM` is gentler for non-essential paths;
  the kill is loud and obvious. Start with the default for a clear
  signal, switch to `EPERM` if a benign caller trips it (e.g., a
  library probing for capabilities).

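As a minimal sketch of the allow-list-with-subtractions strategy, the directives could be combined roughly as follows. The group list simply mirrors the subtractions named in this section, `native x86` assumes the i386 srcds_linux binary, and the commented-out `EPERM` line is an optional softening rather than something this survey prescribes.

```ini
[Service]
# Accept the i386 ABI alongside the native one; other ABIs are rejected.
SystemCallArchitectures=native x86
# The first line establishes the allow-list.
SystemCallFilter=@system-service
# Later `~` lines subtract groups from it.
SystemCallFilter=~@debug @mount @privileged @reboot @swap @cpu-emulation @obsolete
# Optional: fail denied calls with EPERM instead of terminating the process.
#SystemCallErrorNumber=EPERM
```

The comments sit on their own lines because systemd unit files do not support trailing comments after a directive's value.
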
### Yama LSM — `kernel.yama.ptrace_scope`

System-wide sysctl. Values:
- 0: any same-uid process can ptrace (classic behaviour)
- 1: restricted: a process may only trace its own descendants, or a
  process that opted in via `PR_SET_PTRACER` (Debian default)
- 2: requires `CAP_SYS_PTRACE` (admin only)
- 3: ptrace attach disabled entirely; cannot be lowered again without a reboot

For left4me: setting to 2 system-wide is cheap and removes the same-uid
ptrace path entirely. Set via `/etc/sysctl.d/99-left4me.conf` (or
extend an existing file). Doesn't affect debuggability — if you ever
need to ptrace, do it as root.

Caveat: Yama is enforced at the time of the `ptrace` call. With seccomp
blocking the syscall entirely (`~@debug`), Yama becomes belt-and-braces;
keep both for defense-in-depth.

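A sketch of the sysctl drop-in this section describes, assuming the file name suggested in the text and that no other file in `/etc/sysctl.d/` sets the same key later in sort order:

```ini
# /etc/sysctl.d/99-left4me.conf
# Require CAP_SYS_PTRACE for ptrace attach (Yama scope 2).
kernel.yama.ptrace_scope = 2
```

Running `sysctl --system` as root should apply it without a reboot; `sysctl kernel.yama.ptrace_scope` shows the active value.
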
### LSMs other than Yama

| LSM | Status on Debian Trixie | Fit for us |
|---|---|---|
| **AppArmor** | Enabled by default (since Debian 10) | Could write profiles for srcds + gunicorn. Per-unit profile via `AppArmorProfile=` on systemd. Moderate effort. |
| **SELinux** | Available; not enabled by default | Heavy. Not worth the operational cost on a single-host VPS. |
| **landlock** | Kernel ≥5.13; available | Process-local sandboxing. Apps must opt in via the `landlock_*` syscalls (see `landlock(7)`). Python doesn't have a stdlib binding; need to call via ctypes or a wrapper. For us: would need to retrofit gunicorn or write a wrapper. Defer. |
| **BPF LSM** | Kernel ≥5.7; available | Programmable LSM hooks. Bleeding edge for personal infra. Defer. |
| **Tomoyo** | Available; not Debian-enabled | Path-based MAC. Niche. Skip. |

**For left4me:** Yama yes. AppArmor *maybe*, as a follow-up — a profile
limited to "deny path X" patterns for srcds would be small but adds an
audit/rollback surface. Skip in the first pass; revisit if test results
show systemd directives alone leave gaps.

### Filesystem ACLs and modes

POSIX permissions, supplementary groups, ACLs (`setfacl`), extended
attrs (`xattr`).

For us:
- DB and `web.env` already use `root:left4me 0640`. If we go uid-split,
  ownership changes; if we go hardening-only, mode is fine — what
  matters is *whether the unit's FS view contains them at all*.
- `setfacl` for fine-grained sharing (e.g., one supplementary group
  used by both web and game). Doable but adds complexity; consider
  only if uid split goes ahead.

### File attributes (chattr)

`chattr +i` (immutable) and `chattr +a` (append-only).

For us:
- `chattr +i /opt/left4me/src/**` — prevents post-deploy tampering by
  anything short of root removing the attr. But: `pip install -e`
  creates `*.egg-info` files in the tree; deploy of new code would need
  to `chattr -R -i ...` first. Too much friction. Skip.
- `chattr +i /etc/left4me/web.env` — keeps the env file from being
  rewritten by a malicious uid. Works because the env file is rewritten
  rarely (rotate SECRET_KEY explicitly via ckn-bw apply, which is root
  and can `chattr -i` first). Worth considering as a small extra.

### cgroups v2

Not a security primitive (not confidentiality/integrity), but a
**resource ceiling**. Already in use:
- `Slice=l4d2-game.slice`, `MemoryMax`, `TasksMax` — keep.

`MemoryDenyWriteExecute=true` is a kernel-level prctl + seccomp, not a
cgroup, but listed here because it's resource-adjacent. See systemd
section.

### Sudo / setuid

Sudoers grants narrow what a unit's uid can do as root. For us, the
helpers (`scripts/libexec/left4me-*`) already validate inputs tightly
(verified in audit). Two design options for the future:

- **Keep sudo path**, narrow the grants (per-uid via 3-user split, or
  per-action via tighter sudoers).
- **Replace sudo with systemctl-managed transient units triggered via
  dbus / `systemctl start`** — the build-overlay-unit spec already
  proposes this for the script-sandbox.

The web app needs to invoke the helpers somehow. `NoNewPrivileges=true`
on the web unit would stop sudo's setuid bit from taking effect. If we
move to systemctl-triggered units (no setuid involved), we can also
tighten the web unit. Sequenced in the implementation plan, not this
survey.

## Section 2 — systemd unit-config primitives

### Identity

- **`User=` / `Group=`** — drop privileges. Already set.
- **`DynamicUser=true`** — transient uid per run, persisted across runs
  via `StateDirectory=`. Strong default. **Bad fit for us** because
  multiple units share `/var/lib/left4me/` cross-unit; DynamicUser's
  per-unit `StateDirectory=` model fights that.
- **`SupplementaryGroups=`** — extra groups. Used if we add a shared
  read-only group (e.g., `l4d2-overlay-readers`).

### Filesystem virtualization

The lever the operator asked about ("can systemd have a fully virtual
filesystem"). Yes — composition:

- **`RootDirectory=path`** — chroot. Full FS substitution. Heavy;
  requires populating libs/binaries. Skip for the first pass.
- **`RootImage=path`** — same but from a disk image. Way too heavy.
- **`TemporaryFileSystem=path[:opts]`** — empty tmpfs at `path`.
  Cheap. Composes with bind paths.
- **`BindReadOnlyPaths=src[:dst]`** — RO bind. Composes over
  TemporaryFileSystem.
- **`BindPaths=src[:dst]`** — RW bind. Composes over TemporaryFileSystem.
- **`InaccessiblePaths=path`** — masks a path with an empty file/dir.
  Legacy; Bind* is cleaner.
- **`NoExecPaths=path`** / **`ExecPaths=path`** — restrict
  executable paths. Strong but easy to misconfigure.

Composition pattern (the one we want for srcds):
```ini
TemporaryFileSystem=/var/lib /etc /opt /home /root /srv
BindReadOnlyPaths=/var/lib/left4me/installation
BindReadOnlyPaths=/var/lib/left4me/overlays
BindReadOnlyPaths=/etc/left4me/host.env
BindReadOnlyPaths=/etc/ssl /etc/ca-certificates /etc/resolv.conf
BindReadOnlyPaths=/etc/nsswitch.conf /etc/alternatives
BindPaths=/var/lib/left4me/runtime/%i
```

Result: srcds has no DB, no `web.env`, no `/opt/left4me/src/` in its FS
view. Files outside the bound list are simply not there from srcds's
perspective — `open()` returns ENOENT, not EACCES.

Sharp edges:
- `TemporaryFileSystem=` size defaults to half RAM; clamp via
  `:size=NNM,nr_inodes=NN`.
- Bind source paths must exist on disk; a missing one (ENOENT) prevents
  unit start unless the path is prefixed with `-`.
- `BindReadOnlyPaths=` and `BindPaths=` ordering: bind mounts are
  applied in order; later wins.
- `RuntimeDirectory=` integrates with `TemporaryFileSystem=` cleanly:
  `RuntimeDirectory=left4me/foo` creates `/run/left4me/foo` and binds
  it in, auto-cleaning on stop.

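The first two sharp edges could be handled along these lines. This is a sketch only: the size and inode limits are placeholder values picked for illustration, not figures from this survey, and which bind sources deserve the lenient `-` prefix is a judgment call.

```ini
[Service]
# Cap the masking tmpfs instead of inheriting the half-of-RAM default.
# (16M / 1024 inodes are illustrative values.)
TemporaryFileSystem=/var/lib:size=16M,nr_inodes=1024
# A leading `-` on the source makes a missing path non-fatal at unit start.
BindReadOnlyPaths=-/etc/alternatives
BindPaths=/var/lib/left4me/runtime/%i
```

Leaving the `-` off the paths srcds cannot run without keeps a misconfigured host failing loudly at start rather than at first file access.
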
### Namespaces (systemd wrappers)

- **`PrivateTmp=true`** — already set.
- **`PrivateDevices=true`** — already set. Drops most of `/dev`.
- **`PrivateNetwork=true`** — **don't** for gameservers (breaks UDP).
- **`PrivateIPC=true`** — private IPC namespace (SysV IPC + POSIX
  message queues); cheap win.
- **`PrivateUsers=true`** — own userns. Only root and the configured
  `User=left4me` are mapped to themselves; every other host uid/gid
  shows up as `nobody` inside, and the unit's processes hold no
  capabilities in the host user namespace (defense for D2/D4 against
  cross-namespace ptrace).
  Sharp edge: incompatible with `sudo` from inside the unit (setuid +
  userns mapping = no host-root).
- **`PrivateMounts=true`** — own mount ns (implied by most
  Protect* / Private* directives).

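Put together for `left4me-server@.service`, the wrappers in this list might look like the fragment below. It is a sketch that restates the recommendations above, not a copy of the deployed unit.

```ini
[Service]
PrivateTmp=true
PrivateDevices=true
PrivateIPC=true
# Conflicts with sudo, so this one is for server@ only.
PrivateUsers=true
# PrivateNetwork= is deliberately left unset: it would cut off the game's UDP traffic.
```
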
### `/proc` and `/sys` protection

- **`ProtectProc=invisible|noaccess|ptraceable|default`** —
  `invisible` hides the `/proc/<pid>/` entries of processes owned by
  other users. **D2.**
- **`ProcSubset=pid|all`** — `pid` restricts `/proc/` to PID entries;
  hides `/proc/kallsyms`, `/proc/cpuinfo`, etc. Cheap.
- **`ProtectKernelTunables=true`** — `/proc/sys`, `/sys` read-only.
- **`ProtectKernelModules=true`** — block `init_module`, `delete_module`.
- **`ProtectKernelLogs=true`** — block `/dev/kmsg`, syslog().
- **`ProtectClock=true`** — block `clock_settime`, `settimeofday`.
- **`ProtectControlGroups=true`** — `/sys/fs/cgroup` read-only.
- **`ProtectHostname=true`** — block `sethostname`/`setdomainname`.

All of `ProtectKernel*`, `ProtectClock`, `ProtectControlGroups`,
`ProtectHostname` are cheap and have no downside for srcds or gunicorn.
Add all of them.

### Filesystem protection (legacy / not Bind*)

- **`ProtectSystem=false|true|full|strict`** — increasingly stringent
  RO of system paths. `true` covers `/usr` and the boot loader
  directories, `full` adds `/etc`, and `strict` makes the entire
  hierarchy RO (except `/dev`, `/proc`, `/sys`) apart from explicit
  writable paths.
- **`ProtectHome=false|true|read-only|tmpfs`** — `tmpfs` masks `/home`,
  `/root`, `/run/user` with empty tmpfs.

For us: `ProtectSystem=strict` + `ProtectHome=tmpfs` is the baseline.
But once we adopt `TemporaryFileSystem=` for the relevant trees, these
become secondary — TemporaryFileSystem fully supersedes them in the
covered subtrees. Keep both as defense-in-depth (cheap).

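A short sketch of the baseline named here, with the one writable path an instance needs re-opened explicitly. The `ReadWritePaths=` value is taken from the gameserver unit; treat the combination as illustrative rather than as the deployed configuration.

```ini
[Service]
# Whole hierarchy read-only except /dev, /proc and /sys.
ProtectSystem=strict
# /home, /root and /run/user replaced by empty tmpfs.
ProtectHome=tmpfs
# Re-open only this instance's runtime directory for writing.
ReadWritePaths=/var/lib/left4me/runtime/%i
```
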
### Syscall filtering

- **`SystemCallFilter=expr`** — discussed in Linux section.
- **`SystemCallArchitectures=native`** — always set (`native x86` on
  server@, since srcds_linux is i386).
- **`SystemCallLog=expr`** — opt-in logging without enforcement;
  useful for diagnosing what gets called before tightening.
- **`SystemCallErrorNumber=EPERM`** — soft denial vs. killing the
  process. Default is termination with `SIGSYS`; switch later if a
  benign caller trips.

### Capabilities

- **`CapabilityBoundingSet=`** — empty drops all. Use it.
- **`AmbientCapabilities=`** — empty (default).
- **`NoNewPrivileges=true`** — prevents setuid escalation. **Required
  on srcds**, **incompatible with sudo on web** until sudo is replaced.

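For the gameserver unit, the three directives above reduce to the following fragment (a sketch; an empty assignment is how systemd expresses "clear the set"):

```ini
[Service]
# Empty value drops every capability from the bounding set.
CapabilityBoundingSet=
AmbientCapabilities=
# Setuid binaries such as sudo can no longer gain privileges inside the unit.
NoNewPrivileges=true
```
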
### Network restrictions
|
|
||||||
|
|
||||||
- **`RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX`** — for srcds.
|
|
||||||
AF_UNIX needed for journald socket access.
|
|
||||||
- **`IPAddressAllow=` / `IPAddressDeny=`** — uses cgroup BPF; affects
|
|
||||||
outbound traffic. For srcds: probably overcomplicates; the firewall
|
|
||||||
already controls ingress. Skip for first pass.
|
|
||||||
- **`SocketBindAllow=` / `SocketBindDeny=`** — restricts which ports a
|
|
||||||
unit can `bind()`. For srcds, allow only the configured game port
|
|
||||||
range (this needs `SocketBindDeny=any` alongside, because allow rules alone deny nothing). Adds value but couples to config. Defer to a follow-up.
|
|
||||||
|
|
||||||
### Resource restrictions
|
|
||||||
|
|
||||||
- **`MemoryMax`**, **`TasksMax`**, **`LimitNOFILE`** — already set.
|
|
||||||
- **`OOMScoreAdjust`** — already set (`-200` in the current unit; a negative value makes the OOM killer less likely to pick the gameserver when memory is tight).
|
|
||||||
- **`MemoryDenyWriteExecute=true`** — blocks `mprotect(PROT_WRITE|PROT_EXEC)`.
|
|
||||||
Defends against shellcode in JIT memory. **Source engine likely
|
|
||||||
fine** (no JIT in the binary; the Squirrel script engine is an
|
|
||||||
interpreter, not JIT). **Sourcemod plugins**: most are compiled to
|
|
||||||
bytecode + run on SourcePawn VM (interpreter); no JIT either. Verify
|
|
||||||
in test.
|
|
||||||
|
|
||||||
### IPC and process hygiene
|
|
||||||
|
|
||||||
- **`RemoveIPC=true`** — clean up SysV IPC on unit stop.
|
|
||||||
- **`KeyringMode=private`** — own kernel keyring; no host-key access.
|
|
||||||
- **`LockPersonality=true`** — block `personality(2)` calls (no x86 vs
|
|
||||||
x86-64 mode toggle). Already set.
|
|
||||||
- **`RestrictRealtime=true`** — block real-time scheduling. srcds may
|
|
||||||
use SCHED_OTHER + nice; no realtime needed.
|
|
||||||
- **`RestrictNamespaces=true`** — block `unshare(2)` / `clone(CLONE_NEW*)`.
|
|
||||||
- **`RestrictSUIDSGID=true`** — already set.
|
|
||||||
- **`UMask=0027`** — narrow default umask.
|
|
||||||
|
|
||||||
### Capabilities of the `+` prefix
|
|
||||||
|
|
||||||
`ExecStartPre=+cmd` runs `cmd` as root in PID 1's namespaces, bypassing
|
|
||||||
the unit's User= and almost all Protect*/Private*/Restrict* directives.
|
|
||||||
This is how the existing overlay-mount helper runs. Critical to verify
|
|
||||||
in test:
|
|
||||||
- Does `+` preserve the bypass when `PrivateUsers=true` is set?
|
|
||||||
(Expected: yes — the userns is set up around the unit's processes;
|
|
||||||
`+` puts the helper outside it.)
|
|
||||||
|
|
||||||
### State management (per-unit)
|
|
||||||
|
|
||||||
- **`StateDirectory=path`** — creates `/var/lib/<path>` owned by User=.
|
|
||||||
- **`RuntimeDirectory=path`** — creates `/run/<path>`, auto-deleted on
|
|
||||||
stop.
|
|
||||||
- **`LogsDirectory=path`** — `/var/log/<path>`.
|
|
||||||
- **`CacheDirectory=path`** — `/var/cache/<path>`.
|
|
||||||
- **`ConfigurationDirectory=path`** — `/etc/<path>`.
|
|
||||||
|
|
||||||
Useful for cleanup hygiene if we redesign storage layout. Not required
|
|
||||||
for first pass.
|
|
||||||
|
|
||||||
### `systemd-analyze security`
|
|
||||||
|
|
||||||
`systemd-analyze security <unit>` produces a security score per unit
|
|
||||||
(lower = more secure). Output lists each directive with a ✓/✗.
|
|
||||||
Useful as:
|
|
||||||
- Regression check (record baseline, ensure score drops after refactor).
|
|
||||||
- Discovery tool ("which directives haven't I set?").
|
|
||||||
|
|
||||||
Baseline scores (to capture during test plan):
|
|
||||||
- `left4me-server@1.service` before refactor
|
|
||||||
- `left4me-web.service` before refactor
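
A minimal sketch of how the regression check could be automated, assuming the summary line of `systemd-analyze security <unit>` has the shape "Overall exposure level for UNIT: SCORE ...". The helper names `parse_exposure` and `regressed` and the sample line are illustrative only, not part of the existing tooling.

```python
import re

# Assumed shape of the summary line printed by `systemd-analyze security <unit>`.
_OVERALL = re.compile(r"Overall exposure level for (\S+): ([0-9]+(?:\.[0-9]+)?)")


def parse_exposure(output: str) -> tuple[str, float]:
    """Return (unit, score) from the text output of `systemd-analyze security`."""
    match = _OVERALL.search(output)
    if match is None:
        raise ValueError("no overall exposure line found")
    return match.group(1), float(match.group(2))


def regressed(baseline: float, current: float) -> bool:
    """Lower is more secure, so a score above the baseline is a regression."""
    return current > baseline


# Illustrative sample line; the rating word at the end is an assumption.
sample = "Overall exposure level for left4me-server@1.service: 7.5 EXPOSED"
unit, score = parse_exposure(sample)
```

Recording the parsed baseline per unit and calling `regressed(baseline, current)` after each refactor would turn the manual comparison into a scripted one.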
|
|
||||||
|
|
||||||
### Composability lookups
|
|
||||||
|
|
||||||
The systemd docs define predefined system call sets (`@name` groups) that are worth knowing:
|
|
||||||
|
|
||||||
- **`@privileged`** (syscall group) ⊃ `@chown`, `@clock`, `@module`, `@raw-io`, `@reboot`, `@swap`, etc.
|
|
||||||
- **`@system-service`** is the recommended base for "I want a normal
|
|
||||||
service to work."
|
|
||||||
- Subtracting `~@privileged` is broad; `~@debug @mount @raw-io` is
|
|
||||||
surgical.
|
|
||||||
|
|
||||||
## Section 3 — Application-level options
|
|
||||||
|
|
||||||
### AppArmor profile for srcds
|
|
||||||
|
|
||||||
If systemd directives leave gaps, an AppArmor profile would let us
|
|
||||||
deny specific paths or operations beyond what systemd's directives
|
|
||||||
cover. E.g., "deny network for srcds to a specific IP range" via
|
|
||||||
`network inet stream...` deny rules; or "deny mounting" beyond
|
|
||||||
`SystemCallFilter`.
|
|
||||||
|
|
||||||
Effort:
|
|
||||||
- Enable AppArmor in the kernel cmdline + boot config.
|
|
||||||
- Write a profile (e.g., `/etc/apparmor.d/usr.bin.srcds_linux`).
|
|
||||||
- Reference via systemd `AppArmorProfile=` per unit.
|
|
||||||
|
|
||||||
Skip for the first pass; revisit if test results show the systemd
|
|
||||||
directives alone leave a gap.
|
|
||||||
|
|
||||||
### Landlock for the web app
|
|
||||||
|
|
||||||
Python web app could call `landlock_create_ruleset` / `landlock_add_rule`
|
|
||||||
/ `landlock_restrict_self` via ctypes. Restricts FS access at runtime.
|
|
||||||
|
|
||||||
For us:
|
|
||||||
- Could restrict gunicorn to `/var/lib/left4me/` + `/etc/left4me/web.env`
|
|
||||||
+ `/opt/left4me/.venv` + `/tmp`.
|
|
||||||
- Symmetric to `TemporaryFileSystem=` + `Bind*` but at the
|
|
||||||
application layer (no systemd reach).
|
|
||||||
|
|
||||||
Skip; systemd directives are simpler. Reconsider if we move to a
|
|
||||||
DynamicUser-style world later.
|
|
||||||
|
|
||||||
### File-integrity tooling (Aide, Tripwire)
|
|
||||||
|
|
||||||
Out of scope for prevention; useful for detection. Not in this design.
|
|
||||||
|
|
||||||
### Custom seccomp profile (bypassing systemd)
|
|
||||||
|
|
||||||
The web app could call `seccomp(2)` from inside Python via libseccomp
|
|
||||||
+ ctypes to tighten its own filter beyond what systemd applies.
|
|
||||||
Symmetric to landlock; skip for the same reason.
|
|
||||||
|
|
||||||
## Section 4 — Per-defense mapping
|
|
||||||
|
|
||||||
For each defense from the threat model, the primitives that implement
|
|
||||||
it, in priority order:
|
|
||||||
|
|
||||||
### D1 — Gameserver RCE cannot exfiltrate DB or `web.env`
|
|
||||||
|
|
||||||
| Primitive | Strength | Notes |
|
|
||||||
|---|---|---|
|
|
||||||
| `TemporaryFileSystem=/var/lib /etc` + minimal bind set | Strong | The files simply aren't in the unit's FS view. ENOENT, not EACCES. |
|
|
||||||
| 3-user split (DB owned by `l4d2-web`) | Strong | Kernel-enforced; survives unit-config errors. |
|
|
||||||
| `BindReadOnlyPaths=/dev/null:/var/lib/left4me/left4me.db` | Medium | Masks the path; brittle (paths can move). |
|
|
||||||
| Filesystem permissions (DB mode 0600) | Weak | Kernel still allows the shared `left4me` uid that owns the file; only fixed by uid split. |
|
|
||||||
|
|
||||||
**Composition chosen:** `TemporaryFileSystem=` + Bind* (primary).
|
|
||||||
3-user split as defense-in-depth or deferred.
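
The "ENOENT, not EACCES" distinction in the table can be checked from inside a unit with a small probe. This is a sketch under stated assumptions: `probe` is a hypothetical helper, and the paths in the demo are temporary stand-ins rather than the real DB or env files.

```python
import errno
import os
import tempfile


def probe(path: str) -> str:
    """Classify how a path looks from the current process's filesystem view."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            return "hidden"    # path is absent from this view
        if exc.errno in (errno.EACCES, errno.EPERM):
            return "denied"    # path exists but permissions block it
        raise
    os.close(fd)
    return "readable"


# Demo against a scratch directory: one file that exists, one that does not.
with tempfile.TemporaryDirectory() as root:
    missing = probe(os.path.join(root, "left4me.db"))
    present_path = os.path.join(root, "host.env")
    with open(present_path, "w") as fh:
        fh.write("X=1\n")
    present = probe(present_path)
```

Run inside the hardened unit, a "hidden" result for the DB path would indicate the tmpfs view is in effect, while "denied" would indicate only permission bits are protecting it.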
|
|
||||||
|
|
||||||
### D2 — Gameserver RCE cannot ptrace web app or peers
|
|
||||||
|
|
||||||
| Primitive | Strength | Notes |
|
|
||||||
|---|---|---|
|
|
||||||
| `SystemCallFilter=~@debug` | Strong | Blocks `ptrace` and `perf_event_open`. `process_vm_readv/writev` are in `@ipc`, not `@debug`, so they need an explicit deny. |
|
|
||||||
| `kernel.yama.ptrace_scope=2` | Strong | Belt-and-braces at the kernel level. |
|
|
||||||
| `CapabilityBoundingSet=` empty | Strong | No CAP_SYS_PTRACE. |
|
|
||||||
| `PrivateUsers=true` | Strong | Cross-userns ptrace requires CAP_SYS_PTRACE. |
|
|
||||||
| 3-user split | Strong | Different uids; same-uid path doesn't exist. |
|
|
||||||
|
|
||||||
**Composition chosen:** All four (syscall + yama + caps + userns)
|
|
||||||
together; they compose redundantly.
|
|
||||||
|
|
||||||
### D3 — Gameserver RCE cannot use sudo helpers
|
|
||||||
|
|
||||||
| Primitive | Strength | Notes |
|
|
||||||
|---|---|---|
|
|
||||||
| `NoNewPrivileges=true` | Strong | Blocks sudo's setuid. Already set on server@. |
|
|
||||||
| `PrivateUsers=true` | Strong | sudo across userns boundary impossible. |
|
|
||||||
| Sudoers grants scoped to `l4d2-web` (uid split) | Strong | Different uid means sudo grant doesn't apply. |
|
|
||||||
| `RestrictSUIDSGID=true` | Strong | Already set. |
|
|
||||||
|
|
||||||
**Composition chosen:** NoNewPrivileges (already) + PrivateUsers (new)
|
|
||||||
+ RestrictSUIDSGID (already). The sudo path that a 3-user split would close is *also* covered by NNP + PrivateUsers, so the uid split would only be defense-in-depth.
|
|
||||||
|
|
||||||
### D4 — Web app RCE cannot ptrace gameservers
|
|
||||||
|
|
||||||
| Primitive | Strength | Notes |
|
|
||||||
|---|---|---|
|
|
||||||
| `SystemCallFilter=~@debug` on **web** | Strong | Symmetric to D2 but applied to web. |
|
|
||||||
| `kernel.yama.ptrace_scope=2` | Strong | System-wide, helps both directions. |
|
|
||||||
| 3-user split | Strong | Different uids. |
|
|
||||||
|
|
||||||
**Composition chosen:** SystemCallFilter on web + yama=2 system-wide.
|
|
||||||
PrivateUsers cannot be applied to web (sudo incompatibility). 3-user
|
|
||||||
split as defense-in-depth or deferred.
|
|
||||||
|
|
||||||
### D5 — Cross-server contamination
|
|
||||||
|
|
||||||
Each `left4me-server@<n>.service` is a separate unit instance. With
|
|
||||||
`PrivateUsers=true`, each gets its own user namespace. Cross-namespace
|
|
||||||
ptrace fails. With `TemporaryFileSystem=` and per-instance
|
|
||||||
`BindPaths=/var/lib/left4me/runtime/%i`, neither instance can read the
|
|
||||||
other's `runtime/<n>/` or attach to its process.
|
|
||||||
|
|
||||||
**Composition chosen:** PrivateUsers + per-instance Bind* (above).
|
|
||||||
Per-instance uids out of scope.
|
|
||||||
|
|
||||||
### D6 — Persistent compromise of `/opt/left4me/src/` blocked from gameserver
|
|
||||||
|
|
||||||
Already covered by `ProtectSystem=strict` on server@.service. With
|
|
||||||
`TemporaryFileSystem=/opt`, the path simply isn't visible to srcds.
|
|
||||||
**Stronger and redundant — both can stay.**
|
|
||||||
|
|
||||||
### D7 — Defenses survive a unit-config refactor in the wrong direction
|
|
||||||
|
|
||||||
`deploy/tests/test_deploy_artifacts.py` asserts the directives' presence
|
|
||||||
in the deployed unit. Add hardening invariants as test cases. Survives
|
|
||||||
because the test fails CI before deploy.
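
A sketch of what such an invariant check might look like, assuming the unit text is available as a string. `parse_directives`, `missing_invariants`, and the `REQUIRED` set are hypothetical names for illustration; the real test file may be organized differently, and this parser deliberately ignores section headers.

```python
def parse_directives(unit_text: str) -> dict[str, list[str]]:
    """Collect Key=Value lines; repeated keys accumulate in order."""
    directives: dict[str, list[str]] = {}
    for raw in unit_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and section headers such as [Service].
        if not line or line.startswith(("#", ";", "[")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            directives.setdefault(key.strip(), []).append(value.strip())
    return directives


# Illustrative subset of the hardening invariants.
REQUIRED = {
    "NoNewPrivileges": "true",
    "PrivateUsers": "true",
    "ProtectSystem": "strict",
}


def missing_invariants(unit_text: str, required: dict[str, str] = REQUIRED) -> list[str]:
    """Return required keys that are absent or carry a different value."""
    found = parse_directives(unit_text)
    return sorted(k for k, v in required.items() if v not in found.get(k, []))


sample_unit = """\
[Service]
User=left4me
NoNewPrivileges=true
SystemCallFilter=@system-service
SystemCallFilter=~@debug @mount
ProtectSystem=full
"""
problems = missing_invariants(sample_unit)
```

A test case would then assert that `missing_invariants(...)` is empty for each deployed unit, so a weakened or dropped directive fails CI.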
|
|
||||||
|
|
||||||
## Section 5 — Candidate composition
|
|
||||||
|
|
||||||
**For testing, not commitment.** Test plan validates each piece.
|
|
||||||
|
|
||||||
### `left4me-server@.service`
|
|
||||||
|
|
||||||
```ini
# Doc-only note: the trailing `# NEW` / `# was ...` annotations below are not valid in a real unit.
# systemd treats `#` as a comment only at the start of a line, so strip them before deploying.
|
|
||||||
[Service]
|
|
||||||
User=left4me
|
|
||||||
Group=left4me
|
|
||||||
|
|
||||||
# (existing)
|
|
||||||
Type=simple
|
|
||||||
WorkingDirectory=-/var/lib/left4me/runtime/%i/merged/left4dead2
|
|
||||||
EnvironmentFile=/etc/left4me/host.env
|
|
||||||
EnvironmentFile=/var/lib/left4me/instances/%i/instance.env
|
|
||||||
ExecStartPre=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay mount %i
|
|
||||||
ExecStart=/var/lib/left4me/runtime/%i/merged/srcds_run -game left4dead2 +hostport ${L4D2_PORT} $L4D2_ARGS
|
|
||||||
ExecStopPost=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay umount %i
|
|
||||||
Restart=on-failure
|
|
||||||
RestartSec=5
|
|
||||||
|
|
||||||
# Resource control (existing)
|
|
||||||
Slice=l4d2-game.slice
|
|
||||||
Nice=-5
|
|
||||||
IOSchedulingClass=best-effort
|
|
||||||
IOSchedulingPriority=4
|
|
||||||
OOMScoreAdjust=-200
|
|
||||||
MemoryHigh=1.5G
|
|
||||||
MemoryMax=2G
|
|
||||||
TasksMax=256
|
|
||||||
LimitNOFILE=65536
|
|
||||||
KillSignal=SIGINT
|
|
||||||
TimeoutStopSec=15s
|
|
||||||
LogRateLimitIntervalSec=0
|
|
||||||
|
|
||||||
# Hardening — identity
|
|
||||||
NoNewPrivileges=true
|
|
||||||
RestrictSUIDSGID=true
|
|
||||||
|
|
||||||
# Hardening — namespaces
|
|
||||||
PrivateTmp=true
|
|
||||||
PrivateDevices=true
|
|
||||||
PrivateIPC=true
|
|
||||||
PrivateUsers=true # NEW
|
|
||||||
ProtectHome=true
|
|
||||||
|
|
||||||
# Hardening — filesystem view
|
|
||||||
TemporaryFileSystem=/var/lib /etc /opt /home /root /srv /mnt /media # NEW
|
|
||||||
BindReadOnlyPaths=/var/lib/left4me/installation # was ReadOnlyPaths
|
|
||||||
BindReadOnlyPaths=/var/lib/left4me/overlays # was ReadOnlyPaths
|
|
||||||
BindReadOnlyPaths=/etc/left4me/host.env # NEW
|
|
||||||
BindReadOnlyPaths=/etc/ssl /etc/ca-certificates # NEW
|
|
||||||
BindReadOnlyPaths=/etc/resolv.conf /etc/nsswitch.conf /etc/alternatives # NEW
|
|
||||||
BindPaths=/var/lib/left4me/runtime/%i # was ReadWritePaths
|
|
||||||
ProtectSystem=strict
|
|
||||||
# (remove old ReadOnlyPaths= and ReadWritePaths= lines — superseded)
|
|
||||||
|
|
||||||
# Hardening — /proc, /sys, kernel
|
|
||||||
ProtectProc=invisible # NEW
|
|
||||||
ProcSubset=pid # NEW
|
|
||||||
ProtectKernelTunables=true # NEW
|
|
||||||
ProtectKernelModules=true # NEW
|
|
||||||
ProtectKernelLogs=true # NEW
|
|
||||||
ProtectClock=true # NEW
|
|
||||||
ProtectControlGroups=true # NEW
|
|
||||||
ProtectHostname=true # NEW
|
|
||||||
LockPersonality=true
|
|
||||||
|
|
||||||
# Hardening — caps + syscall
|
|
||||||
CapabilityBoundingSet= # NEW
|
|
||||||
AmbientCapabilities= # NEW
|
|
||||||
SystemCallArchitectures=native # NEW
|
|
||||||
SystemCallFilter=@system-service # NEW
|
|
||||||
SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete @privileged # NEW
|
|
||||||
|
|
||||||
# Hardening — network
|
|
||||||
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX # NEW (AF_UNIX for journald)
|
|
||||||
|
|
||||||
# Hardening — namespaces, realtime, IPC
|
|
||||||
RestrictNamespaces=true # NEW
|
|
||||||
RestrictRealtime=true # NEW
|
|
||||||
RemoveIPC=true # NEW
|
|
||||||
KeyringMode=private # NEW
|
|
||||||
UMask=0027 # NEW
|
|
||||||
|
|
||||||
# Deferred until test:
|
|
||||||
# MemoryDenyWriteExecute=true # MAY break sourcemod / Source engine; test first.
|
|
||||||
```
|
|
||||||
|
|
||||||
### `left4me-web.service`
|
|
||||||
|
|
||||||
```ini
# Doc-only note: the trailing `# NEW` / `# tightened ...` annotations below are not valid in a real unit.
# systemd treats `#` as a comment only at the start of a line, so strip them before deploying.
|
|
||||||
[Service]
|
|
||||||
User=left4me
|
|
||||||
Group=left4me
|
|
||||||
|
|
||||||
# (existing)
|
|
||||||
Type=simple
|
|
||||||
WorkingDirectory=/opt/left4me/src
|
|
||||||
Environment=HOME=/var/lib/left4me PATH=/opt/left4me/.venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
|
|
||||||
EnvironmentFile=/etc/left4me/host.env
|
|
||||||
EnvironmentFile=/etc/left4me/web.env
|
|
||||||
ExecStart=/opt/left4me/.venv/bin/gunicorn --workers ... --threads ... --bind 127.0.0.1:8000 'l4d2web.app:create_app()'
|
|
||||||
Restart=on-failure
|
|
||||||
RestartSec=3
|
|
||||||
|
|
||||||
# Hardening
|
|
||||||
PrivateTmp=true
|
|
||||||
ProtectSystem=strict # tightened from =full
|
|
||||||
ProtectHome=true
|
|
||||||
ReadWritePaths=/var/lib/left4me # web needs broad write access there
|
|
||||||
# NoNewPrivileges intentionally NOT set — sudo
|
|
||||||
# PrivateUsers intentionally NOT set — sudo
|
|
||||||
|
|
||||||
# /proc + kernel hardening (sudo-compatible)
|
|
||||||
ProtectProc=invisible # NEW
|
|
||||||
ProcSubset=pid # NEW
|
|
||||||
ProtectKernelTunables=true # NEW
|
|
||||||
ProtectKernelModules=true # NEW
|
|
||||||
ProtectKernelLogs=true # NEW
|
|
||||||
ProtectClock=true # NEW
|
|
||||||
ProtectControlGroups=true # NEW
|
|
||||||
ProtectHostname=true # NEW
|
|
||||||
LockPersonality=true # NEW
|
|
||||||
|
|
||||||
# Syscall filter — allow @system-service minus debug-class; keep @privileged
|
|
||||||
# because sudo needs setuid, chown, etc.
|
|
||||||
SystemCallArchitectures=native # NEW
|
|
||||||
SystemCallFilter=@system-service # NEW
|
|
||||||
SystemCallFilter=~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete # NEW
|
|
||||||
|
|
||||||
# Network
|
|
||||||
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX # NEW
|
|
||||||
|
|
||||||
# Misc hygiene
|
|
||||||
RestrictRealtime=true # NEW
|
|
||||||
RestrictNamespaces=true # NEW
|
|
||||||
RemoveIPC=true # NEW
|
|
||||||
UMask=0027 # NEW
|
|
||||||
|
|
||||||
# Deferred for sudo-removal future work:
|
|
||||||
# NoNewPrivileges=true
|
|
||||||
# CapabilityBoundingSet=
|
|
||||||
# PrivateUsers=true
|
|
||||||
```
|
|
||||||
|
|
||||||
### Host sysctl
|
|
||||||
|
|
||||||
`/etc/sysctl.d/99-left4me.conf` (or merge into existing):
|
|
||||||
```
|
|
||||||
kernel.yama.ptrace_scope=2
|
|
||||||
```
|
|
||||||
|
|
||||||
System-wide. Means: even if a unit-level config slips, host-level
|
|
||||||
ptrace is admin-only. Cost: zero for our use case (no debugging in
|
|
||||||
prod).
|
|
||||||
|
|
||||||
## Section 6 — Trade-offs and known sharp edges
|
|
||||||
|
|
||||||
To verify in the test plan:
|
|
||||||
|
|
||||||
1. **`PrivateUsers=true` + `+`-prefixed ExecStartPre**: expected to
|
|
||||||
work (the `+` runs outside the unit's namespaces). Sharp if it
|
|
||||||
doesn't — the overlay mount would fail and srcds wouldn't start.
|
|
||||||
2. **`TemporaryFileSystem=/etc` and missing files**: srcds and its
|
|
||||||
dependencies (libstdc++ runtime, libssl, libcurl) may read files
|
|
||||||
from `/etc` we haven't bound. Watch journalctl for ENOENT during
|
|
||||||
first start.
|
|
||||||
3. **`SystemCallFilter=~@privileged` and Source engine**: srcds is C++
|
|
||||||
and uses syscalls beyond the obvious. A `~@privileged` may trip
|
|
||||||
something. Mitigation: test with `SystemCallLog=` instead of
|
|
||||||
`SystemCallFilter=` first; observe what would have been blocked;
|
|
||||||
then narrow.
|
|
||||||
4. **`MemoryDenyWriteExecute=true` and sourcemod**: the notes above assume SourcePawn is bytecode-interpreted, but SourceMod ships an x86 JIT (`sourcepawn.jit.x86`), and a JIT needs memory that is written and then executed. Test before enabling.
|
|
||||||
5. **`RestrictAddressFamilies=` without AF_UNIX**: journald socket
|
|
||||||
needs it. Always include AF_UNIX.
|
|
||||||
6. **`ProcSubset=pid` and Python**: gunicorn shouldn't break (uses
|
|
||||||
/proc/self/* + signal-based ipc). Verify.
|
|
||||||
7. **sysctl `kernel.yama.ptrace_scope=2`**: blocks a non-root operator's own
|
|
||||||
`gdb` / `strace -p` against any running service. If you need to
|
|
||||||
debug, temporarily set back to 1 via sysctl, then revert.
|
|
||||||
8. **`ProtectSystem=strict` on web**: was `=full`. Tighter; might
|
|
||||||
break a write the web app does to a path outside `/var/lib/left4me`.
|
|
||||||
Audit `l4d2web/*` for `os.makedirs` or `open(...'w')` outside that
|
|
||||||
root.
|
|
||||||
|
|
||||||
## Open questions for the implementer
|
|
||||||
|
|
||||||
(After test plan results come back, finalize these.)
|
|
||||||
|
|
||||||
1. Do we adopt `MemoryDenyWriteExecute=true` if it works for srcds?
|
|
||||||
(Probably yes, defense-in-depth at low cost.)
|
|
||||||
2. Do we set `SocketBindAllow=` on srcds to lock the port range?
|
|
||||||
(Depends on whether `instance.env` exposes the range cleanly to a
|
|
||||||
unit directive.)
|
|
||||||
3. Do we deploy AppArmor profiles as a follow-up?
|
|
||||||
(Probably no — operational complexity exceeds the marginal gain on
|
|
||||||
single-host infra.)
|
|
||||||
4. Do we keep both `BindReadOnlyPaths=` and the legacy
|
|
||||||
`ReadOnlyPaths=` declarations, or simplify? (Simplify — use Bind*
|
|
||||||
exclusively once `TemporaryFileSystem=` is in place.)
|
|
||||||
5. Do we proceed with 3-user split as a follow-up, or close the spec
|
|
||||||
as "addressed by hardening"? Depends on operator's residual-risk
|
|
||||||
tolerance after Phase A lands and we observe.
|
|
||||||
|
|
||||||
## Pointers
|
|
||||||
|
|
||||||
- Threat model: `docs/superpowers/specs/2026-05-15-hardening-threat-model.md`
|
|
||||||
- Test plan: `docs/superpowers/specs/2026-05-15-hardening-test-plan.md`
|
|
||||||
- Original uid-split spec (still open): `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
|
|
||||||
- Live unit source (ckn-bw reactor): `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`
|
|
||||||
- Reference units (deploy-dir-rethink reference-only): `deploy/files/usr/local/lib/systemd/system/`
|
|
||||||
- systemd docs (latest, systemd 256+ on Trixie):
|
|
||||||
`man systemd.exec`, `man systemd.unit`, `man systemd-analyze`.
|
|
||||||
- L4D2 / Source engine docs:
|
|
||||||
- SourcePawn (bytecode-interpreted): https://wiki.alliedmods.net/SourcePawn
|
|
||||||
- srcds is a Source 2007 engine binary; closed-source, expect surprises.
|
|
||||||
|
|
@ -1,237 +0,0 @@
|
||||||
# Hardening refactor — design
|
|
||||||
|
|
||||||
**Status:** approved design; implementation plan to follow at
|
|
||||||
`docs/superpowers/plans/2026-05-15-hardening-refactor.md`.
|
|
||||||
Companion: `2026-05-15-hardening-threat-model.md`,
|
|
||||||
`2026-05-15-hardening-defenses-survey.md`,
|
|
||||||
`2026-05-15-hardening-test-plan.md` (executed 2026-05-15, results inline).
|
|
||||||
|
|
||||||
This doc records the *shape* of the refactor — where the artifacts live,
|
|
||||||
how they're factored, what's in scope. The implementation plan lays out
|
|
||||||
the steps.
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
The hardening test plan ran end-to-end on `left4.me` on 2026-05-15
|
|
||||||
(commit `461b8d0`). Outcome: `left4me-server@1` 7.5→1.3 systemd-analyze,
|
|
||||||
`left4me-web` 8.7→4.1, all 8 Test 8 attack vectors blocked. Two
|
|
||||||
amendments to the spec's proposed composition required: `SystemCallArchitectures=native x86`
|
|
||||||
(srcds_linux is i386), `PrivatePIDs=true` (same-uid
|
|
||||||
`ProtectProc=invisible` can't hide gunicorn from srcds; PID namespace
|
|
||||||
fixes it at the kernel level). `MemoryDenyWriteExecute=true` permanently
|
|
||||||
excluded (Source engine i386 `.so` files have text relocations).
|
|
||||||
|
|
||||||
Composition is *not currently deployed* — Test 7's drop-in was cleaned
|
|
||||||
up at session end; only the Test 9 sysctl (`kernel.yama.ptrace_scope=2`)
|
|
||||||
persists. This refactor lands the proven composition permanently via
|
|
||||||
the ckn-bw bundle.
|
|
||||||
|
|
||||||
## Approach
|
|
||||||
|
|
||||||
Keep the current responsibility split for now: ckn-bw owns systemd unit
|
|
||||||
emission (base + hardening), left4me owns the educational reference
|
|
||||||
copies and the threat-model/test docs. Hardening directives land in
|
|
||||||
ckn-bw's `systemd/units` reactor at
|
|
||||||
`~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`, factored via
|
|
||||||
shared Python dicts so the two units (and the future
|
|
||||||
build-overlay-unit refactor) reuse the common base.
|
|
||||||
|
|
||||||
The broader responsibility reshape — hardening as drop-in files
|
|
||||||
*living* in left4me with ckn-bw as a thin file-shipper — is a real
|
|
||||||
direction worth pursuing, but deserves its own session. Deferred.
|
|
||||||
|
|
||||||
## Factoring
|
|
||||||
|
|
||||||
Three dict constants at the top of `metadata.py` (or in a sibling
|
|
||||||
`hardening.py` module if `metadata.py` grows past a comfortable read):
|
|
||||||
|
|
||||||
### `HARDENING_COMMON`
|
|
||||||
|
|
||||||
Directives both units take verbatim. 18 keys:
|
|
||||||
|
|
||||||
```python
|
|
||||||
HARDENING_COMMON = {
|
|
||||||
'ProtectProc': 'invisible',
|
|
||||||
'ProcSubset': 'pid',
|
|
||||||
'ProtectKernelTunables': 'true',
|
|
||||||
'ProtectKernelModules': 'true',
|
|
||||||
'ProtectKernelLogs': 'true',
|
|
||||||
'ProtectClock': 'true',
|
|
||||||
'ProtectControlGroups': 'true',
|
|
||||||
'ProtectHostname': 'true',
|
|
||||||
'LockPersonality': 'true',
|
|
||||||
'ProtectSystem': 'strict',
|
|
||||||
'ProtectHome': 'true',
|
|
||||||
'PrivateTmp': 'true',
|
|
||||||
'RestrictNamespaces': 'true',
|
|
||||||
'RestrictRealtime': 'true',
|
|
||||||
'RemoveIPC': 'true',
|
|
||||||
'KeyringMode': 'private',
|
|
||||||
'UMask': '0027',
|
|
||||||
'RestrictAddressFamilies': 'AF_INET AF_INET6 AF_UNIX',
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
### `HARDENING_SERVER`
|
|
||||||
|
|
||||||
`{**HARDENING_COMMON, ...server-specific}`. Adds sudo-incompatible
|
|
||||||
flags + filesystem virtualization + i386 amendment + per-instance PID
|
|
||||||
namespace + bound socket binds:
|
|
||||||
|
|
||||||
- `NoNewPrivileges=true`
|
|
||||||
- `RestrictSUIDSGID=true`
|
|
||||||
- `PrivateUsers=true`
|
|
||||||
- **`PrivatePIDs=true`** *(Test amendment — D2.b / D5)*
|
|
||||||
- `PrivateIPC=true`
|
|
||||||
- `PrivateDevices=true`
|
|
||||||
- `CapabilityBoundingSet=` *(empty value → drop all)*
|
|
||||||
- `AmbientCapabilities=`
|
|
||||||
- `SystemCallArchitectures='native x86'` *(Test amendment — i386 srcds)*
|
|
||||||
- `SystemCallFilter=('@system-service', '~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete @privileged')` *(tuple → repeated key)*
|
|
||||||
- `TemporaryFileSystem='/var/lib /etc /opt /home /root /srv /mnt /media'`
|
|
||||||
- `BindReadOnlyPaths=('/var/lib/left4me/installation', '/var/lib/left4me/overlays', '/etc/left4me/host.env', '/etc/ssl', '/etc/ca-certificates', '/etc/resolv.conf', '/etc/nsswitch.conf', '/etc/alternatives')`
|
|
||||||
- `BindPaths='/var/lib/left4me/runtime/%i'`
|
|
||||||
- `SocketBindAllow=('udp:27000-27999', 'tcp:27000-27999')` plus `SocketBindDeny='any'` *(NEW — lock srcds bindable sockets to the game port range. Allow rules only take effect against a matching deny rule, so the deny-any default is required. Not tested in Test 7 but cheap defense-in-depth. Concrete range pending verification of `LEFT4ME_PORT_RANGE_*` substitution support in systemd directives; hard-coded range as fallback.)*
|
|
||||||
|
|
||||||
### `HARDENING_WEB`
|
|
||||||
|
|
||||||
`{**HARDENING_COMMON, ...web-specific}`. Web inherits `ProtectSystem=strict`
|
|
||||||
from COMMON (was `=full` in the current base unit; this tightens). Adds
|
|
||||||
a syscall filter *without* `~@privileged` (sudo needs setuid).
|
|
||||||
**Excludes** `NoNewPrivileges`, `PrivateUsers`, `RestrictSUIDSGID`,
|
|
||||||
empty `CapabilityBoundingSet` — all sudo-incompatible.
|
|
||||||
|
|
||||||
- `SystemCallArchitectures='native'`
|
|
||||||
- `SystemCallFilter=('@system-service', '~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete')` *(no `~@privileged`)*
|
|
||||||
|
|
||||||
Web's existing `ReadWritePaths=/var/lib/left4me` stays in its unit's
|
|
||||||
inline `Service` dict (web-specific, not common).
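
The factoring can be sketched as plain dict spreading. This is an abbreviated illustration, not the contents of `metadata.py`: the common dict is trimmed to three keys, and `SUDO_INCOMPATIBLE` is a hypothetical helper set used only to check the exclusion rule stated above.

```python
# Trimmed stand-in for the full common dict.
HARDENING_COMMON = {
    'ProtectSystem': 'strict',
    'PrivateTmp': 'true',
    'UMask': '0027',
}

# Keys the web unit must not carry while it still depends on sudo.
SUDO_INCOMPATIBLE = {
    'NoNewPrivileges',
    'PrivateUsers',
    'RestrictSUIDSGID',
    'CapabilityBoundingSet',
}

HARDENING_SERVER = {
    **HARDENING_COMMON,
    'NoNewPrivileges': 'true',
    'PrivateUsers': 'true',
    'CapabilityBoundingSet': '',            # empty value drops all capabilities
    'SystemCallArchitectures': 'native x86',
}

HARDENING_WEB = {
    **HARDENING_COMMON,
    'SystemCallArchitectures': 'native',
    'SystemCallFilter': (
        '@system-service',
        '~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete',
    ),
}

# Keys that would break sudo if they leaked into the web unit.
leaked = SUDO_INCOMPATIBLE & HARDENING_WEB.keys()
```

Because later keys win in a spread, a unit-specific entry can also override a common one without editing `HARDENING_COMMON`.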
|
|
||||||
|
|
||||||
### Multi-value directives and empty values
|
|
||||||
|
|
||||||
Tuples-of-strings → emitted as repeated `Key=Value` lines by ckn-bw's
|
|
||||||
systemd-bundle emitter. Existing precedent: `EnvironmentFile` at
|
|
||||||
`metadata.py:201-204`. Empty values (`CapabilityBoundingSet=`,
|
|
||||||
`AmbientCapabilities=`) need to emit as `Key=` with nothing after `=`.
|
|
||||||
Both behaviors verified as the first step of the implementation plan;
|
|
||||||
fallback approaches if the emitter doesn't handle them: inline-joined
|
|
||||||
strings where systemd accepts them, or extend the emitter.
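
The emission rules this section relies on can be sketched as follows. `render_options` is a hypothetical stand-in, not ckn-bw's actual emitter; treating `None` as "suppress the key" and sorting sets are assumptions of this sketch.

```python
def render_options(options: dict) -> list[str]:
    """Render a directive dict as unit-file lines."""
    lines = []
    for option, value in options.items():
        if value is None:
            continue  # assumption: None suppresses the key entirely
        if isinstance(value, (list, set, tuple)):
            # Sets have no stable order, so sort them; lists and tuples keep theirs.
            items = sorted(value) if isinstance(value, set) else value
            lines.extend(f"{option}={item}" for item in items)
        else:
            # An empty string renders as `Key=` with nothing after the `=`.
            lines.append(f"{option}={value}")
    return lines


rendered = render_options({
    'EnvironmentFile': (
        '/etc/left4me/host.env',
        '/var/lib/left4me/instances/%i/instance.env',
    ),
    'CapabilityBoundingSet': '',
    'AmbientCapabilities': None,
    'UMask': '0027',
})
```

Running the real emitter on the same input and comparing against this output is one way to do the first-step verification.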
|
|
||||||
|
|
||||||
## Reference units
|
|
||||||
|
|
||||||
Keep `deploy/files/usr/local/lib/systemd/system/left4me-server@.service`
|
|
||||||
and `deploy/files/usr/local/lib/systemd/system/left4me-web.service` as
|
|
||||||
**deliberately educational** copies of the deployed units. Each new
|
|
||||||
hardening directive in the reference gets a one-line comment
|
|
||||||
explaining the threat it addresses. A cold reader of the repo can open
|
|
||||||
the reference unit and read the threat model in code form, without
|
|
||||||
needing to read the ckn-bw bundle or systemd man pages.
|
|
||||||
|
|
||||||
Source-of-truth: ckn-bw reactor is what's deployed. Reference units in
|
|
||||||
left4me are hand-synced. No CI drift test (would be brittle against
|
|
||||||
comment ordering and structural human-readable formatting); operator
|
|
||||||
discipline at edit time keeps them aligned. A top-of-file note in each
|
|
||||||
reference unit points readers at the reactor.
|
|
||||||
|
|
||||||
## Scope of the refactor
|
|
||||||
|
|
||||||
1. **Ckn-bw reactor edits.** Three constants + spread into the two
|
|
||||||
units. Verify tuple-multi-value emission. `metadata.py`.
|
|
||||||
2. **Sysctl drop-in via ckn-bw.** `kernel.yama.ptrace_scope=2`. Move
|
|
||||||
from host-only `/etc/sysctl.d/99-left4me-ptrace.conf` (applied by
|
|
||||||
hand in Test 9) into the bundle's file management. Find the existing
|
|
||||||
sysctl pattern in ckn-bw and follow it.
|
|
||||||
3. **Reference unit mirror with educational comments.** Update
|
|
||||||
`deploy/files/usr/local/lib/systemd/system/{left4me-server@,left4me-web}.service`
|
|
||||||
to match the reactor's emission, with per-directive comments
|
|
||||||
explaining each hardening directive's purpose. Top-of-file note
|
|
||||||
pointing to the reactor.
|
|
||||||
4. **Spec bug fixes in the test plan.** Four bugs flagged in
|
|
||||||
`2026-05-15-hardening-test-plan.md`'s output section: PID-lookup
|
|
||||||
race (use `systemctl show -p MainPID --value`), gdb-from-host
|
|
||||||
verification flaw (probe via `systemd-run` inside the same
|
|
||||||
hardening profile, not via `nsenter` that bypasses it), D5 pgrep
|
|
||||||
pattern, `scmp_sys_resolver` package is `seccomp` not
|
|
||||||
`libseccomp-dev`. Doc-only.
|
|
||||||
5. **Mark `2026-05-15-user-uid-split-design.md` superseded.** Front-matter
|
|
||||||
status note + brief explanation that `PrivateUsers` + `PrivatePIDs`
|
|
||||||
+ `TemporaryFileSystem` close D1, D2, D3, D5 at the kernel level.
|
|
||||||
Reference this design + the refactor plan as the replacement.
|
|
||||||
6. **`SocketBindAllow=` for srcds** (in `HARDENING_SERVER`). Not tested
|
|
||||||
in Test 7; verify on deploy. Encoding pending — likely hard-coded
|
|
||||||
port range, since systemd directive variable substitution support
|
|
||||||
is uneven.
|
|
||||||
7. **Clean up unmanaged packages on left4.me.** `apt remove gdb seccomp
|
|
||||||
libseccomp-dev` after the refactor lands. Test-only tooling;
|
|
||||||
reinstall on demand for future test sessions.
|
|
||||||
|
|
||||||
## Sequencing the deploy
|
|
||||||
|
|
||||||
1. Land ckn-bw commit (reactor changes, sysctl drop-in entry).
|
|
||||||
2. Land left4me commit (reference units, spec bug fixes, uid-split
|
|
||||||
spec status update, this design doc, the refactor plan).
|
|
||||||
3. Push both repos.
|
|
||||||
4. `bw apply ovh.left4me` — applies reactor changes; systemd restarts
|
|
||||||
affected units automatically.
|
|
||||||
5. Verify on the host:
|
|
||||||
- `systemctl cat left4me-server@1` shows the new directives.
|
|
||||||
- Re-run a Test 8 subset (D1.a, D1.b, D2.b via PrivatePIDs, D5 with
|
|
||||||
the corrected pgrep) using the *corrected* probe pattern (per
|
|
||||||
spec bug fix in scope item 4). Test 8's full rerun is unnecessary
|
|
||||||
— composition is proven; only the *deployment* needs verifying.
|
|
||||||
- `sysctl kernel.yama.ptrace_scope` = 2.
|
|
||||||
- Smoke: server@1 + server@2 + web all active and stable for 10
|
|
||||||
minutes. Web UI: login, server start/stop, log view, overlay
|
|
||||||
rebuild.
|
|
||||||
6. Rollback if needed: `git revert` the ckn-bw commit + `bw apply`.
|
|
||||||
|
|
||||||
## What's out of scope
|
|
||||||
|
|
||||||
- **`MemoryDenyWriteExecute=true`** — permanently excluded.
|
|
||||||
- **AppArmor profile** — deferred per defenses-survey.
|
|
||||||
- **`build-overlay-unit` refactor**
|
|
||||||
(`2026-05-15-build-overlay-unit-design.md`) — sequenced after this.
|
|
||||||
Will reuse `HARDENING_COMMON` (or a variant) when it lands.
|
|
||||||
- **3-user uid split** — `2026-05-15-user-uid-split-design.md`
|
|
||||||
superseded by this refactor (scope item 5).
|
|
||||||
- **Broader configmgmt-responsibility reshape** — hardening as
|
|
||||||
drop-ins living in left4me, ckn-bw becoming a thin file-shipper.
|
|
||||||
Real direction worth pursuing; deserves a dedicated session.
|
|
||||||
Out of scope here.
|
|
||||||
- **Stale RCON port app bug** — flagged in executor's handoff. Separate
|
|
||||||
scope.
|
|
||||||
- **Pushing the branch** — operator decides when.
|
|
||||||
|
|
||||||
## Implementation notes (resolved during plan execution)
|
|
||||||
|
|
||||||
- The ckn-bw systemd-bundle emitter renders Python tuples as repeated
|
|
||||||
`Key=Value` lines and renders empty strings as `Key=` with no value.
|
|
||||||
Both behaviors confirmed by reading the Mako template in
|
|
||||||
`libs/systemd.py:17-23`. Tuple branch: `isinstance(value,
|
|
||||||
(list, set, tuple))` iterates and emits `${option}=${item}` per
|
|
||||||
element, preserving insertion order (sets are sorted; lists and
|
|
||||||
tuples are not). Empty-string branch: falls through to `else:
|
|
||||||
${option}=${str(value)}`, which emits `Key=` with nothing after `=`.
|
|
||||||
`None` suppresses the key entirely (distinct from empty string —
|
|
||||||
important). The `protection()` helper at `libs/systemd.py:94` already
|
|
||||||
uses `'CapabilityBoundingSet': ''` as a live in-repo example. Tuple
|
|
||||||
precedent in the left4me bundle: `EnvironmentFile` at
|
|
||||||
`bundles/left4me/metadata.py:201-204`. Verified 2026-05-15.
|
|
||||||
- `SocketBindAllow=` value: hard-coded port range `27000-27999` for
|
|
||||||
both `udp:` and `tcp:` lines (matches the `LEFT4ME_PORT_RANGE_*`
|
|
||||||
metadata values). Variable substitution in systemd directives is not
|
|
||||||
universally supported; hard-coded range avoids the hazard.
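
The rendering rules described above can be sketched as a small standalone function. This is a hedged illustration only, not the Mako template in `libs/systemd.py`: the name `render_options` and the sample dictionary are assumptions made for this example.

```python
def render_options(options):
    """Render a dict of unit options into systemd `Key=Value` lines.

    Hypothetical stand-in for the emitter behaviour described in the notes;
    it is not the real ckn-bw template.
    """
    lines = []
    for key, value in options.items():
        if value is None:
            # None suppresses the key entirely (distinct from empty string).
            continue
        if isinstance(value, (list, set, tuple)):
            # Sets are sorted for stable output; lists and tuples keep order.
            items = sorted(value) if isinstance(value, set) else value
            lines.extend(f"{key}={item}" for item in items)
        else:
            # Scalars, including '', which renders as `Key=` with no value.
            lines.append(f"{key}={value}")
    return lines


# Illustrative input: the empty-string and tuple cases named in the notes.
service = {
    "CapabilityBoundingSet": "",
    "SocketBindAllow": ("udp:27000-27999", "tcp:27000-27999"),
    "MemoryDenyWriteExecute": None,
}

if __name__ == "__main__":
    print("\n".join(render_options(service)))
```

Run as a script, this prints one `CapabilityBoundingSet=` line followed by the two `SocketBindAllow=` lines in tuple order; the `None` entry produces no output at all.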
|
|
||||||
|
|
||||||
## Pointers
|
|
||||||
|
|
||||||
- Threat model: `2026-05-15-hardening-threat-model.md`
|
|
||||||
- Defenses survey: `2026-05-15-hardening-defenses-survey.md` (§ 5
|
|
||||||
candidate composition is the basis for the factoring above)
|
|
||||||
- Test plan + results: `2026-05-15-hardening-test-plan.md`
|
|
||||||
(commit `461b8d0`)
|
|
||||||
- Executor's handoff: `2026-05-15-session-handoff.md`
|
|
||||||
(commit `152c313`)
|
|
||||||
- Live reactor: `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`
|
|
||||||
- Reference units: `deploy/files/usr/local/lib/systemd/system/`
|
|
||||||
- Deferred uid-split spec: `2026-05-15-user-uid-split-design.md`
|
|
||||||
- Adjacent (sequenced after): `2026-05-15-build-overlay-unit-design.md`
|
|
||||||
File diff suppressed because it is too large
|
|
@ -1,345 +0,0 @@
|
||||||
# left4me application hardening — threat model
|
|
||||||
|
|
||||||
**Status:** living spec, intended input to a hardening implementation plan.
|
|
||||||
Paired with `2026-05-15-hardening-defenses-survey.md` and
|
|
||||||
`2026-05-15-hardening-test-plan.md`.
|
|
||||||
|
|
||||||
This document establishes *what we defend against and what we accept losing*.
|
|
||||||
The defenses survey and test plan operationalize this against the codebase.
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
The 2026-05-15 work landed deploy-dir-rethink + build-time-idmap and
|
|
||||||
queued "uid split decision" as the next session's task
|
|
||||||
(`2026-05-15-user-uid-split-design.md`). Audit of the running 2-user
|
|
||||||
configuration found that the gameserver's systemd hardening blocks
|
|
||||||
privilege escalation but leaves same-uid attack surface wide open:
|
|
||||||
RCON passwords plaintext in `/var/lib/left4me/left4me.db` (readable by
|
|
||||||
srcds), Flask `SECRET_KEY` in `/etc/left4me/web.env` (also readable),
|
|
||||||
no ptrace block on `left4me-server@.service`, no `/proc` isolation.
|
|
||||||
Rather than answer the original "1/2/3 uids" question in isolation,
|
|
||||||
this work treats application hardening as a first-class refactor: ground
|
|
||||||
the decision in an explicit threat model, survey the full Linux+systemd
|
|
||||||
defense menu, test what composes safely with Source engine + the rest of
|
|
||||||
the stack, then implement.
|
|
||||||
|
|
||||||
## Operating posture (assumed)
|
|
||||||
|
|
||||||
Solo-operator, single-host infra (`left4.me` / `ovh.left4me`,
|
|
||||||
141.95.32.8). Host is a personal VPS, not multi-tenant. The only privileged
|
|
||||||
operator is the user. There are no shell logins as `left4me` or
|
|
||||||
`l4d2-sandbox`. All access to those uids is funneled through the
|
|
||||||
systemd-managed units (`left4me-web.service`, `left4me-server@.service`,
|
|
||||||
`left4me-script-sandbox`). The host runs nothing other than left4me +
|
|
||||||
ckn-bw-managed baseline (nginx, sshd, fail2ban-class basics).
|
|
||||||
|
|
||||||
If those assumptions don't hold (e.g., shared host with other tenants,
|
|
||||||
non-systemd-mediated access to the uids), revise this document before
|
|
||||||
proceeding — threat surface changes meaningfully.
|
|
||||||
|
|
||||||
## Assets
|
|
||||||
|
|
||||||
Ordered by impact-if-compromised. Compromise means the attacker can
|
|
||||||
exfiltrate, modify, or destroy the asset.
|
|
||||||
|
|
||||||
### Tier 1 — catastrophic, no easy recovery
|
|
||||||
|
|
||||||
| Asset | Where | Impact of compromise |
|---|---|---|
| Host root | the box | Total compromise of every service on the host. |
| `web.env` Flask `SECRET_KEY` | `/etc/left4me/web.env`, `root:left4me 0640` | Session forgery: attacker logs in as any admin without password. |
| `web.env` Steam Web API key | same | Attacker can query/operate Steam Web API as us. Rate-limited; reputational. |
| Server RCON passwords | DB: `Server.rcon_password` plaintext (`l4d2web/models.py:146-148`) | Attacker can execute arbitrary RCON on every gameserver: `sm_kick`, `rcon say`, server lockup, plugin abuse. |
| User password hashes (bcrypt) | DB: `User.password_digest` (`l4d2web/models.py:31`) | Offline cracking per user. bcrypt slows it but doesn't stop it. |
|
|
||||||
|
|
||||||
### Tier 2 — severe but bounded
|
|
||||||
|
|
||||||
| Asset | Where | Impact |
|---|---|---|
| `/opt/left4me/src/` Python source | `left4me:left4me` on disk | Persistent backdoor in web app via gunicorn reload. Currently RO from inside the server unit (`ProtectSystem=strict` covers `/opt`); RW from inside the web unit. |
| Overlay content | `/var/lib/left4me/overlays/<id>/` | Persistent sourcemod plugin or replaced binary; surfaces in every gameserver using that overlay. |
| Steam installation | `/var/lib/left4me/installation/` | Tampered `srcds_linux`; trivial persistence. Currently RO from server, RW from web. |
| Sourcemod admin lists | inside overlays | RCON-equivalent: admin commands in-game. |
| Workshop cache | `/var/lib/left4me/workshop_cache/` | Used by builds; tampered content surfaces in next overlay. |
|
|
||||||
|
|
||||||
### Tier 3 — limited, recoverable
|
|
||||||
|
|
||||||
Job history, build logs, the small subset of in-game state not covered by
|
|
||||||
the above (e.g., live player slot in a specific match).
|
|
||||||
|
|
||||||
## Trust boundaries
|
|
||||||
|
|
||||||
Lines we want enforced. "Enforced" = the kernel + systemd, not "the
|
|
||||||
process politely doesn't cross it."
|
|
||||||
|
|
||||||
| Id | From | To | Strength today | Strength wanted |
|---|---|---|---|---|
| TB1 | External network | host shell | Strong (firewall, no extra services) | Strong |
| TB2 | Gameserver process | rest of the host | Weak (same-uid + same-FS view) | Strong |
| TB3 | Web app | rest of the host | Weak (same-uid + same-FS view) | Medium (sudo path inherent) |
| TB4 | Sandbox | rest of the host | Strong (separate uid + hardened unit) | Strong |
| TB5 | Gameserver instance N | gameserver instance M | None (same-uid, same-DB) | Strong |
| TB6 | Web app | gameserver runtime state | None (same-uid, shared `runtime/<n>` access) | Medium (web needs to stage server.cfg) |
| TB7 | Gameserver | web-only secrets (DB, web.env) | None | Strong |
| TB8 | Workshop content | srcds-process | Inherent (content runs as data) | n/a (not a software boundary) |
|
|
||||||
|
|
||||||
TB2, TB5, and TB7 are the highest-leverage gaps. TB6 is only partial
because the web app legitimately writes per-instance config: writing
per-instance config stays allowed, while ptracing srcds is denied.
|
|
||||||
|
|
||||||
## Attackers
|
|
||||||
|
|
||||||
### A1 — Anonymous external attacker (primary)
|
|
||||||
|
|
||||||
Reaches public surfaces:
|
|
||||||
- gunicorn on `:8000` (behind nginx + admin auth)
|
|
||||||
- srcds on UDP `:27015`+ per instance (game protocol; no auth)
|
|
||||||
- (Maybe: workshop subscription endpoints if any; check.)
|
|
||||||
|
|
||||||
Capabilities: arbitrary network packets. Goal: code execution on the
|
|
||||||
host, then exfiltrate secrets and persist.
|
|
||||||
|
|
||||||
### A2 — Authenticated admin (operator)
|
|
||||||
|
|
||||||
In the assumed posture this is *the user*, single person. Out of scope as
|
|
||||||
a threat per operator's choice (insider == operator). If admin auth ever
|
|
||||||
expands to multiple operators, revise.
|
|
||||||
|
|
||||||
### A3 — Malicious workshop content
|
|
||||||
|
|
||||||
A workshop addon (map, plugin, asset pack) is published to the Steam
|
|
||||||
workshop and pulled into a build. The content runs inside srcds via
|
|
||||||
Source engine + sourcemod loading. Capabilities: same as A1 once loaded
|
|
||||||
into srcds (the engine doesn't have a strong privilege boundary against
|
|
||||||
its own loaded plugins). Distinct in that the entry vector is curated by
|
|
||||||
the operator (workshop link added to a blueprint), not arbitrary network
|
|
||||||
input. Risk floor: the operator vetted the source.
|
|
||||||
|
|
||||||
### A4 — Compromised player session
|
|
||||||
|
|
||||||
A connected player exploits a Source-engine protocol bug. Functionally a
|
|
||||||
subset of A1 — same capability set once code is running in srcds.
|
|
||||||
|
|
||||||
### A5 — Local attacker on the host
|
|
||||||
|
|
||||||
Out of scope per operating posture. No non-root local accounts beyond
|
|
||||||
the systemd-managed service uids.
|
|
||||||
|
|
||||||
### A6 — Steam binary supply-chain
|
|
||||||
|
|
||||||
`srcds_linux` is a binary from Valve. A compromised Valve build would
|
|
||||||
already be running as `left4me` and there's no practical defense at
|
|
||||||
this layer. Out of scope.
|
|
||||||
|
|
||||||
## Attack scenarios
|
|
||||||
|
|
||||||
### S1 — L4D2 engine RCE → exfil + persist
|
|
||||||
|
|
||||||
A1 sends a crafted packet to srcds; srcds executes attacker code as
|
|
||||||
`left4me` inside `left4me-server@.service`.
|
|
||||||
|
|
||||||
**Today, attacker can:**
|
|
||||||
- Read DB → all RCON passwords (plaintext), all bcrypt hashes.
|
|
||||||
- Read `web.env` → SECRET_KEY, Steam API key.
|
|
||||||
- ptrace gunicorn → in-memory secrets, current sessions.
|
|
||||||
- Read `/proc/<gunicorn-pid>/environ` → same env as `web.env`.
|
|
||||||
- ptrace peer `left4me-server@<n>` processes and read their RCON passwords from the DB: cross-server compromise.
|
|
||||||
- `sudo left4me-systemctl|journalctl|overlay` for any instance.
|
|
||||||
- Cannot write `/opt/left4me/src/` (ProtectSystem=strict covers `/opt`).
|
|
||||||
- Cannot acquire new caps (NoNewPrivileges).
|
|
||||||
|
|
||||||
**Defended outcome (goal):** Blast radius limited to "this gameserver's
|
|
||||||
runtime state during this session" — no peer-server compromise, no DB
|
|
||||||
access, no `web.env` access, no ptrace.
|
|
||||||
|
|
||||||
### S2 — Web app RCE → secrets + persistence
|
|
||||||
|
|
||||||
A1 finds a Flask vulnerability (Jinja SSTI, SQLAlchemy injection, auth
|
|
||||||
bypass, file-upload escape). Web executes attacker code as `left4me`
|
|
||||||
inside `left4me-web.service`.
|
|
||||||
|
|
||||||
**Today, attacker can:**
|
|
||||||
- Read + write DB (web's primary path).
|
|
||||||
- Read `web.env`.
|
|
||||||
- Write `/opt/left4me/src/` → backdoor next gunicorn reload.
|
|
||||||
- `sudo` all helper verbs.
|
|
||||||
- ptrace srcds peers, modify their `runtime/<n>/` upper layer.
|
|
||||||
- Modify overlays (writes to `/var/lib/left4me/overlays/`).
|
|
||||||
|
|
||||||
**Defended outcome (goal):** Cannot ptrace gameservers; cannot read
|
|
||||||
`/proc/<srcds-pid>/*`; web compromise still owns its DB and env (its
|
|
||||||
primary attack surface, so this is *acceptable residual*).
|
|
||||||
|
|
||||||
### S3 — Cross-server contamination
|
|
||||||
|
|
||||||
S1 played out on srcds@1; attacker pivots to srcds@2.
|
|
||||||
|
|
||||||
**Today:** trivial — ptrace srcds@2, read its memory; or just read the
|
|
||||||
DB to learn srcds@2's RCON password and send commands.
|
|
||||||
|
|
||||||
**Defended outcome (goal):** Blocked. Per-instance namespace isolation
|
|
||||||
(or per-instance uid) means kernel rejects ptrace; DB invisible to
|
|
||||||
gameserver uid hides the RCON list.
|
|
||||||
|
|
||||||
### S4 — Malicious workshop content
|
|
||||||
|
|
||||||
A3 adds an addon to a blueprint; addon includes a Squirrel/SourceMod
|
|
||||||
plugin that abuses engine APIs to do file I/O / network calls.
|
|
||||||
|
|
||||||
**Today + with hardening:** functionally equivalent to S1 — the plugin
|
|
||||||
runs as srcds, same blast radius. No software boundary prevents this;
|
|
||||||
the only defense is what's outside the unit. So this is *covered* if S1
|
|
||||||
is covered.
|
|
||||||
|
|
||||||
### S5 — Sudoers helper abuse
|
|
||||||
|
|
||||||
S1 or S2 attacker uses the sudo grants to widen access.
|
|
||||||
|
|
||||||
**Today:** sudoers grants (audit findings, `deploy/files/etc/sudoers.d/left4me`):
|
|
||||||
- `left4me-systemctl <name> {enable|disable|show}` — any instance, no
|
|
||||||
ownership check
|
|
||||||
- `left4me-journalctl <name>` — read any unit's journal
|
|
||||||
- `left4me-overlay mount|umount <name>` — any instance
|
|
||||||
- `left4me-script-sandbox <overlay_id> <script>` — runs as `l4d2-sandbox`
|
|
||||||
|
|
||||||
A compromised gameserver can enable/disable peer instances, read their
|
|
||||||
journals, mount/umount their overlays. Not root escalation, but a
|
|
||||||
significant escalation.
|
|
||||||
|
|
||||||
**Defended outcome:** sudoers reachable only from `left4me-web`. The
|
|
||||||
gameserver uid (or the gameserver's namespace) gets none of the helper
|
|
||||||
grants. This is naturally true if the helpers are invoked only by the
|
|
||||||
web app; ensure the gameserver unit cannot sudo (no PAM, no setuid bits
|
|
||||||
in its FS view).
|
|
||||||
|
|
||||||
### S6 — Sandbox escape
|
|
||||||
|
|
||||||
An attacker gains A1-equivalent code execution in `l4d2-script-sandbox`. The sandbox runs as
|
|
||||||
`l4d2-sandbox`, fully hardened (verified during 2026-05-15 work).
|
|
||||||
|
|
||||||
**Today:** sandbox-escape attacker has `l4d2-sandbox` capabilities only.
|
|
||||||
With build-time-idmap, writes through the bind land on disk as
|
|
||||||
`left4me`, but the sandbox process itself cannot interact with `left4me`
|
|
||||||
processes (different uid). Existing isolation is strong.
|
|
||||||
|
|
||||||
**Defended outcome:** unchanged, already strong. Document as a
load-bearing invariant; do not weaken.
|
|
||||||
|
|
||||||
## What we accept losing
|
|
||||||
|
|
||||||
Decisions to *not* defend, with reasoning. Future work might revisit.
|
|
||||||
|
|
||||||
- **Kernel CVEs** that escape namespaces or seccomp. No practical defense
|
|
||||||
short of running on a hypervisor + KVM. Out of scope.
|
|
||||||
- **systemd unit-config CVEs**. Unit hardening relies on systemd
|
|
||||||
honoring directives correctly. Out of scope.
|
|
||||||
- **Steam binary compromise**. `srcds_linux` is Valve's. Out of scope.
|
|
||||||
- **Sourcemod / Metamod plugin runtime weaknesses**. Plugins run as srcds
|
|
||||||
by design. Out of scope.
|
|
||||||
- **Player IP exposure via game protocol**. Inherent to UDP/Source. Out of
|
|
||||||
scope.
|
|
||||||
- **DoS via game protocol** (`A2S_INFO` flooding etc.). Out of scope for
|
|
||||||
*this* effort; covered by network-layer mitigations.
|
|
||||||
- **DoS via web HTTP**. Covered upstream by nginx + fail2ban; out of
|
|
||||||
scope for *this* effort.
|
|
||||||
- **Host root from operator error** (a misconfigured cron, an admin
|
|
||||||
shell). Out of scope; operator is single-person and aware.
|
|
||||||
- **Long-term forward secrecy** for past sessions (an attacker who
|
|
||||||
exfils SECRET_KEY can replay past sessions). Out of scope; rotation
|
|
||||||
on incident.
|
|
||||||
|
|
||||||
## What we defend (prioritized)
|
|
||||||
|
|
||||||
D1 — **Gameserver RCE cannot exfiltrate DB or web.env**, including RCON
|
|
||||||
passwords and SECRET_KEY. Highest value: catastrophic asset, plausible
|
|
||||||
attack (L4D2 engine RCE is the canonical "old engine, public traffic"
|
|
||||||
risk).
|
|
||||||
|
|
||||||
D2 — **Gameserver RCE cannot ptrace web app or peer gameservers**. Blocks
|
|
||||||
in-memory secret theft and cross-server contamination.
|
|
||||||
|
|
||||||
D3 — **Gameserver RCE cannot use sudo helpers** for instances other
|
|
||||||
than its own (or, ideally, cannot use sudo at all).
|
|
||||||
|
|
||||||
D4 — **Web app RCE cannot ptrace gameservers**. Symmetric to D2; web
|
|
||||||
still has full DB access (acceptable residual since it's the web app's
|
|
||||||
own data).
|
|
||||||
|
|
||||||
D5 — **Cross-server contamination blocked at the kernel level**. Per-
|
|
||||||
instance namespaces or per-instance uid.
|
|
||||||
|
|
||||||
D6 — **Persistent compromise of `/opt/left4me/src/` blocked from
|
|
||||||
gameserver context**. Already partially true via `ProtectSystem=strict`;
|
|
||||||
maintain.
|
|
||||||
|
|
||||||
D7 — **All defenses survive a unit-config refactor in the wrong
|
|
||||||
direction** — e.g., a future developer adding `ReadWritePaths=` widely.
|
|
||||||
Achieved via tests that assert hardening invariants
|
|
||||||
(`deploy/tests/test_deploy_artifacts.py`).
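
An invariant check of the kind D7 calls for might look like the following sketch. It is not the contents of `deploy/tests/test_deploy_artifacts.py`: the function names, the chosen invariants, and the `allowed_rw_prefixes` default are all assumptions for illustration, and the parser ignores backslash line continuations.

```python
def parse_unit(text):
    """Parse unit-file text into {section: {key: [values, ...]}}.

    Repeated keys are kept in order. Backslash line continuations are
    not handled in this sketch.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current.setdefault(key.strip(), []).append(value.strip())
    return sections


def _last(service, key):
    """Return the last assignment of a scalar directive, or None."""
    values = service.get(key)
    return values[-1] if values else None


def _is_true(value):
    """Interpret a systemd boolean string."""
    return value is not None and value.lower() in ("1", "yes", "true", "on")


def hardening_violations(unit_text, allowed_rw_prefixes=("/var/lib/left4me/",)):
    """Return violated invariants; an empty list means the unit passes.

    `allowed_rw_prefixes` is a hypothetical allow-list, not a value taken
    from the real deployment.
    """
    service = parse_unit(unit_text).get("Service", {})
    problems = []
    if _last(service, "ProtectSystem") != "strict":
        problems.append("ProtectSystem must be strict")
    if not _is_true(_last(service, "PrivatePIDs")):
        problems.append("PrivatePIDs must be enabled")
    if _is_true(_last(service, "MemoryDenyWriteExecute")):
        problems.append("MemoryDenyWriteExecute must stay off")
    if "x86" not in (_last(service, "SystemCallArchitectures") or "").split():
        problems.append("SystemCallArchitectures must include x86")
    for entry in service.get("ReadWritePaths", []):
        for path in entry.split():
            path = path.lstrip("-+")  # drop systemd's optional path prefixes
            if not path.startswith(allowed_rw_prefixes):
                problems.append(f"ReadWritePaths too wide: {path}")
    return problems
```

A pytest case would then read the reference unit from disk and assert that `hardening_violations(...)` returns an empty list, so a widened `ReadWritePaths=` fails the suite rather than slipping through review.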
|
|
||||||
|
|
||||||
## Acceptable user-experience cost
|
|
||||||
|
|
||||||
- **Unit start latency**: +5s tolerable; +30s not.
|
|
||||||
- **Memory overhead**: +tens of MB per unit fine; +hundreds not.
|
|
||||||
- **Operational complexity**: one well-documented unit-template
|
|
||||||
hardening profile reusable across units. Acceptable trade-off.
|
|
||||||
- **Debugging cost**: SECCOMP audit log discoverability via
|
|
||||||
`journalctl -k` acceptable. ptrace-based debugging in production
|
|
||||||
unnecessary; can re-enable via ad-hoc drop-in if needed.
|
|
||||||
- **Steam updates / pip installs**: must continue to work without
|
|
||||||
per-update operator action. Privileged paths (steamcmd self-update)
|
|
||||||
can run as `left4me` outside the unit if needed; document.
|
|
||||||
- **Workshop content**: must continue to load. Builds run in the
|
|
||||||
sandbox; the gameserver only reads pre-built overlays.
|
|
||||||
|
|
||||||
## Acceptance criteria for the implementation
|
|
||||||
|
|
||||||
The final composition (hardening directives + any uid changes) must:
|
|
||||||
|
|
||||||
1. **Functionally**: pass the smoke matrix from `2026-05-15-hardening-test-plan.md` (RCON, build, restart, file upload, multi-server, workshop).
|
|
||||||
2. **Defenses verified**:
   - srcds cannot read `/var/lib/left4me/left4me.db` or `/etc/left4me/web.env` (file not in FS view, or kernel denies)
   - srcds cannot ptrace gunicorn or peer srcds (syscall blocked, or kernel rejects across namespaces/uids)
   - srcds cannot read `/proc/<other-pid>/*`
   - web cannot ptrace srcds (symmetric)
|
|
||||||
3. **No regressions**: existing test suite passes
|
|
||||||
(`pytest deploy/tests/test_overlay_helper.py l4d2host/tests/`).
|
|
||||||
4. **Auditable**: invariants asserted in `deploy/tests/test_deploy_artifacts.py`; baseline `systemd-analyze security` score recorded.
|
|
||||||
5. **Documentable**: one paragraph per directive in the unit, explaining
|
|
||||||
*why* it's there. Future maintainers can reason about removal.
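
The file and `/proc` checks in criterion 2 could be probed from inside a unit with something like the sketch below. It is an assumption-laden illustration, not the test plan's actual probe: the function names are invented here, and the ptrace checks are deliberately left out because they need a different mechanism.

```python
import os


def can_read(path):
    """Return True if this process can open `path` for reading."""
    try:
        with open(path, "rb"):
            return True
    except OSError:
        return False


def visible_pids(proc_root="/proc"):
    """Return the PIDs that appear as numeric directories under proc_root."""
    return sorted(int(name) for name in os.listdir(proc_root) if name.isdigit())


def readable_environ_pids(proc_root="/proc", self_pid=None):
    """Return PIDs other than our own whose `environ` file is readable."""
    own = os.getpid() if self_pid is None else self_pid
    return [
        pid
        for pid in visible_pids(proc_root)
        if pid != own and can_read(os.path.join(proc_root, str(pid), "environ"))
    ]


def probe(secret_paths, proc_root="/proc", self_pid=None):
    """Collect what the current process can see; interpretation is left to the caller."""
    return {
        "readable_secrets": [p for p in secret_paths if can_read(p)],
        "readable_environ_pids": readable_environ_pids(proc_root, self_pid),
    }


if __name__ == "__main__":
    # Paths taken from the criteria above.
    print(probe(["/var/lib/left4me/left4me.db", "/etc/left4me/web.env"]))
```

Inside a hardened gameserver unit one would expect `readable_secrets` to be empty and `readable_environ_pids` to list only the unit's own processes; anything else is worth investigating.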
|
|
||||||
|
|
||||||
## Open questions to clarify with the operator
|
|
||||||
|
|
||||||
Before the defenses survey is final, clarify:
|
|
||||||
|
|
||||||
1. **Is gunicorn directly internet-reachable, or behind nginx?** The unit
|
|
||||||
binds `127.0.0.1:8000` (per `metadata.py:208`); presumably nginx
|
|
||||||
terminates TLS and forwards. Confirm.
|
|
||||||
2. **Auth model**: who can log into the web app? Is admin auth strong
|
|
||||||
(long passwords, 2FA), or default-grade? Defines how realistic S2 is.
|
|
||||||
3. **Workshop content sources**: curated by operator, or arbitrary
|
|
||||||
workshop subscriptions exposed to admins? Defines A3's realism.
|
|
||||||
4. **Test bench**: is `ckn@10.0.4.128` a real separate test host, or
|
|
||||||
ovh.left4me the only deployment target? Affects test plan choices.
|
|
||||||
5. **`kernel.yama.ptrace_scope` setting on the host?** Default Debian is
|
|
||||||
1; we may want 2 system-wide.
|
|
||||||
6. **Is the host running AppArmor?** Stock Debian has enabled it by
default since Buster, but provider images and kernels can differ;
check with `aa-status`. AppArmor profiles for srcds (in addition to
systemd directives) need it enabled system-wide.
|
|
||||||
|
|
||||||
## Pointers
|
|
||||||
|
|
||||||
- Audit synthesis (this session's conversation): unit hardening profile
|
|
||||||
`deploy/files/usr/local/lib/systemd/system/left4me-server@.service`,
|
|
||||||
metadata reactor `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`,
|
|
||||||
filesystem ACLs `~/Projekte/ckn-bw/bundles/left4me/items.py:21-115`,
|
|
||||||
DB schema `l4d2web/models.py:31, 146-148`, sudoers
|
|
||||||
`deploy/files/etc/sudoers.d/left4me`.
|
|
||||||
- Original uid-split spec: `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
|
|
||||||
— remains open; this work may supersede it.
|
|
||||||
- Companion docs:
|
|
||||||
`docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md`,
|
|
||||||
`docs/superpowers/specs/2026-05-15-hardening-test-plan.md`.
|
|
||||||
- Related work landed this session:
|
|
||||||
`docs/superpowers/plans/2026-05-15-build-time-idmap.md`,
|
|
||||||
`docs/superpowers/plans/2026-05-15-deploy-dir-rethink.md`.
|
|
||||||
|
|
@ -1,178 +0,0 @@
|
||||||
# Session handoff — next: write the hardening-refactor implementation plan
|
|
||||||
|
|
||||||
Short handoff. The hardening test plan was executed end-to-end on
|
|
||||||
`left4.me` this session. Results are recorded inline in the spec at
|
|
||||||
`docs/superpowers/specs/2026-05-15-hardening-test-plan.md` (commit
|
|
||||||
`461b8d0`). The next session writes the implementation plan that lands
|
|
||||||
the proven composition in ckn-bw.
|
|
||||||
|
|
||||||
## What just happened
|
|
||||||
|
|
||||||
Ran all 11 tests from the hardening test plan on
|
|
||||||
`left4me-server@1` (canary) and `left4me-web` against the live host
|
|
||||||
at `left4.me` / `left4me.ovh.ckn.li` (Debian 13, systemd 257). All
|
|
||||||
drop-ins cleaned up at session end; the Test 9 sysctl
|
|
||||||
(`kernel.yama.ptrace_scope=2`) is the one persistent host change.
|
|
||||||
`gdb` + `seccomp` packages left installed.
|
|
||||||
|
|
||||||
Headline numbers:
|
|
||||||
- `left4me-server@1.service`: **7.5 EXPOSED → 1.3 OK** (systemd-analyze)
|
|
||||||
- `left4me-web.service`: **8.7 EXPOSED → 4.1 OK**
|
|
||||||
- Test 8 attack matrix: all 8 vectors (D1.a/b/c, D2.a/b/c, D3, D5) blocked.
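
To keep scores like these comparable across sessions, the summary line of `systemd-analyze security <unit>` could be parsed and stored. The sketch below assumes the summary has the shape `Overall exposure level for <unit>: <score> <RATING>`; that format, and the helper names, are assumptions to verify against the systemd version on the host.

```python
import re

# Assumed shape of the summary line; check it against real output first.
_SCORE_RE = re.compile(
    r"Overall exposure level for (?P<unit>\S+): "
    r"(?P<score>\d+(?:\.\d+)?) (?P<rating>[A-Z]+)"
)


def parse_exposure(output):
    """Extract (unit, score, rating) from captured systemd-analyze output."""
    match = _SCORE_RE.search(output)
    if match is None:
        raise ValueError("no exposure summary line found")
    return match["unit"], float(match["score"]), match["rating"]


def regressed(baseline, current, tolerance=0.0):
    """Return True if the current score is worse (higher) than the baseline."""
    return current > baseline + tolerance
```

Feeding it the retained `/tmp/sec-*.txt` captures would give a baseline that a later test can compare against, on the assumption that those files hold raw `systemd-analyze security` output.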
|
|
||||||
|
|
||||||
Three things the test surfaced that change what the refactor must look like:
|
|
||||||
- **`SystemCallArchitectures=native x86`**, not bare `native`.
|
|
||||||
`srcds_linux` is 32-bit i386; with `native=AUDIT_ARCH_X86_64` only,
|
|
||||||
every i386 syscall is killed and srcds_run respawns every 10 s.
|
|
||||||
- **Add `PrivatePIDs=true`** to the composition. `ProtectProc=invisible`
|
|
||||||
alone cannot hide gunicorn from srcds because they share uid 980;
|
|
||||||
PrivatePIDs gives each instance its own PID namespace and closes
|
|
||||||
D2.b without needing the uid split.
|
|
||||||
- **Exclude `MemoryDenyWriteExecute=true`.** Source engine i386 `.so`
|
|
||||||
files have text relocations; MDW returns EPERM on the relocation
|
|
||||||
`mprotect`, dlopen aborts, srcds enters the respawn loop. Permanent
|
|
||||||
exclusion — not fixable without rebuilding Valve's closed-source
|
|
||||||
binary.
|
|
||||||
|
|
||||||
Full per-test detail is in the spec's "Results" section.
|
|
||||||
|
|
||||||
## What's next: write the refactor plan
|
|
||||||
|
|
||||||
Target file: `docs/superpowers/plans/2026-05-16-hardening-refactor.md`
|
|
||||||
(or whatever date the next session opens).
|
|
||||||
|
|
||||||
Scope:
|
|
||||||
|
|
||||||
1. **Land the proven composition in ckn-bw.** Live source for the
|
|
||||||
unit emission is `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`.
|
|
||||||
The reactor emits `left4me-server@.service` and `left4me-web.service`
|
|
||||||
— both need the new directives. Copy the Test 7 drop-in (from the
|
|
||||||
spec) into the reactor's unit body, with the two amendments above.
|
|
||||||
|
|
||||||
2. **Land the web composition** (sudo-compatible subset from Test 10)
|
|
||||||
in the same reactor.
|
|
||||||
|
|
||||||
3. **Land the sysctl drop-in in ckn-bw.** Currently
|
|
||||||
`/etc/sysctl.d/99-left4me-ptrace.conf` is host-only — if ckn-bw
|
|
||||||
later enforces unmanaged-file removal, this would disappear. Add
|
|
||||||
`pkg_files:` entry (or whatever the bundle convention is) for
|
|
||||||
`kernel.yama.ptrace_scope=2`.
|
|
||||||
|
|
||||||
4. **Update reference units** in
|
|
||||||
`deploy/files/usr/local/lib/systemd/system/{left4me-server@,left4me-web}.service`
|
|
||||||
to mirror the new emission (these are reference-only post the
|
|
||||||
deploy-dir-rethink, but should not drift from the live source).
|
|
||||||
|
|
||||||
5. **Decide on `SocketBindAllow=`** for game port range (27000–27999
|
|
||||||
per `LEFT4ME_PORT_RANGE_*`). Worth adding to lock srcds's bindable
|
|
||||||
sockets; not tested in this session.
|
|
||||||
|
|
||||||
6. **Resolve the deferred specs:**
   - `docs/superpowers/specs/2026-05-15-user-uid-split-design.md`:
     **mark as superseded.** PrivatePIDs + PrivateUsers close the
     same-uid /proc gap that motivated it. Note the residual app-level
     same-uid surface (DB ACLs, web.env mode) is a separate concern
     not addressed by uid split anyway.
   - AppArmor follow-up: defer further; defenses survey lists it.
     Revisit if directive-only hardening leaves observable gaps.
|
|
||||||
|
|
||||||
7. **Fix the four spec bugs documented at the bottom of the test plan**
|
|
||||||
(PID-lookup races, gdb-from-outside-NS verification flaw, D5
|
|
||||||
pgrep pattern, scmp_sys_resolver package name).
|
|
||||||
|
|
||||||
### Recommendation on sequencing
|
|
||||||
|
|
||||||
Before touching ckn-bw, run **superpowers:brainstorming** on the
|
|
||||||
refactor — there's a real design choice on emission shape. The
|
|
||||||
test-plan drop-in is ~50 lines of new directives; the existing
|
|
||||||
reactor emits a smaller unit. Options:
|
|
||||||
|
|
||||||
- **A. Inline.** All directives land directly in the
|
|
||||||
`[Service]` block emitted by the reactor. Simple, ckn-bw-idiomatic.
|
|
||||||
- **B. Profile-as-reusable-fragment.** Put the directive block in a
|
|
||||||
shared bundle (so the future build-overlay-unit refactor can reuse
|
|
||||||
it). Better factoring, more upfront design.
|
|
||||||
- **C. Drop-in pattern preserved.** Reactor emits the base unit
|
|
||||||
unchanged, plus a separate `*.service.d/hardening.conf` drop-in.
|
|
||||||
Mirrors the test methodology; easier to revert by removing the
|
|
||||||
drop-in.
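
For comparison, option B could take roughly the following shape in the reactor. This is a sketch under stated assumptions: `compose`, `SERVER_ONLY`, and `FORBIDDEN` are hypothetical names, and the directive membership is a placeholder rather than the Test 7 / Test 10 compositions, which are not reproduced here.

```python
# Placeholder fragment shared by both units; the real split comes from the
# test-plan drop-ins, not from this sketch.
HARDENING_COMMON = {
    "ProtectSystem": "strict",
    "ProtectProc": "invisible",
}

# Directives the handoff names as server-side only. The web unit omits
# NoNewPrivileges and PrivateUsers so its sudo helpers keep working.
SERVER_ONLY = {
    "NoNewPrivileges": "true",
    "PrivateUsers": "true",
    "PrivatePIDs": "true",
    "SystemCallArchitectures": "native x86",  # srcds_linux is 32-bit i386
}

# Permanently excluded: breaks dlopen of Source engine libraries.
FORBIDDEN = ("MemoryDenyWriteExecute",)


def compose(*fragments):
    """Merge option fragments left to right; later fragments win."""
    merged = {}
    for fragment in fragments:
        merged.update(fragment)
    for key in FORBIDDEN:
        if key in merged:
            raise ValueError(f"{key} must not appear in any composition")
    return merged


HARDENING_SERVER = compose(HARDENING_COMMON, SERVER_ONLY)
HARDENING_WEB = compose(HARDENING_COMMON)
```

The guard in `compose` is one way to make the `MemoryDenyWriteExecute` exclusion hard to undo by accident; option A would simply inline the same keys into each unit's `[Service]` dictionary.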
|
|
||||||
|
|
||||||
My weak preference is **A** for the first pass — get the production
|
|
||||||
state hardened, then refactor into shared shape (B) when the
|
|
||||||
build-overlay-unit work needs it. **C** is operationally clean but
|
|
||||||
introduces a new emission pattern just for this. Worth 10 minutes of
|
|
||||||
brainstorming before committing.
|
|
||||||
|
|
||||||
## Decision-relevant context
|
|
||||||
|
|
||||||
- **Source of truth is ckn-bw.** `deploy/files/.../*.service` copies
|
|
||||||
are reference-only post-deploy-dir-rethink. Don't edit them as the
|
|
||||||
primary change — emit-then-mirror.
|
|
||||||
- **Sandbox `l4d2-sandbox` unit is out of scope.** Verified during
|
|
||||||
prior build-time-idmap work; do not weaken.
|
|
||||||
- **Web sudo helpers must keep working.** `NoNewPrivileges` and
|
|
||||||
`PrivateUsers` are NOT in the web composition (Test 10 confirmed
|
|
||||||
the sudo-compat subset). The "replace sudo with systemctl-managed
|
|
||||||
unit triggering" refactor (build-overlay-unit spec is a step
|
|
||||||
toward it) would unlock deeper web hardening later.
|
|
||||||
- **App-level stale RCON port bug** surfaced during testing: each
|
|
||||||
srcds restart picks a new ephemeral RCON port; the web app
|
|
||||||
caches the previous one and logs `Connection refused`. Pre-exists
|
|
||||||
hardening (repro'd before any drop-in was applied). Separate issue,
|
|
||||||
not for this refactor. Mention to operator; track in whatever
|
|
||||||
issue-tracking the project uses.
|
|
||||||
- **gdb + seccomp packages on left4.me** are installed but not in
|
|
||||||
ckn-bw. Either add them to the bundle (so they're reproducible)
|
|
||||||
or `apt remove` them after the refactor lands — operator
|
|
||||||
preference.
|
|
||||||
|
|
||||||
## Host state at end of session
|
|
||||||
|
|
||||||
- `left4me-server@1`, `@2`, `left4me-web`: all `active`, baseline
|
|
||||||
(no drop-ins).
|
|
||||||
- `/etc/sysctl.d/99-left4me-ptrace.conf` present, `ptrace_scope=2`
|
|
||||||
effective.
|
|
||||||
- `gdb`, `seccomp` (provides `scmp_sys_resolver`), `libseccomp-dev`
|
|
||||||
installed.
|
|
||||||
- `/tmp/sec-{baseline,after}-{server,web}.txt`, `/tmp/unit-baseline-*.conf`,
|
|
||||||
`/tmp/sysctl-baseline.txt` retained (next session can pull diffs from
|
|
||||||
these if needed).
|
|
||||||
|
|
||||||
## What's NOT next
|
|
||||||
|
|
||||||
- **Don't re-run the test plan.** Already done; results committed.
|
|
||||||
- **Don't push to origin yet.** Repo is 3 ahead of
|
|
||||||
`origin/master` (the three hardening specs + this commit). User
|
|
||||||
said "commit locally" this session; they'll push when ready.
|
|
||||||
- **Don't fix the stale-RCON-port app bug as part of the refactor.**
|
|
||||||
Different system, different scope.
|
|
||||||
- **Don't do AppArmor.** Still deferred.
|
|
||||||
- **build-overlay-unit refactor** (`docs/superpowers/specs/2026-05-15-build-overlay-unit-design.md`)
|
|
||||||
remains sequenced behind this; not next.
|
|
||||||
|
|
||||||
## Open questions for the next session
|
|
||||||
|
|
||||||
- Should the refactor be a single PR/commit, or split into
|
|
||||||
"ckn-bw emission" + "reference unit mirror" + "sysctl drop-in"?
|
|
||||||
Operator preference. Recommend single bundle if the changes are
|
|
||||||
small.
|
|
||||||
- Should we land Test 7's composition on `@2` first as a longer
|
|
||||||
canary before rolling to all instances? Or trust the symmetric
|
|
||||||
emission and roll everywhere at once? Currently both are running
|
|
||||||
baseline; @1 was the only canary.
|
|
||||||
- `SocketBindAllow=` for the 27000–27999 game port range — include
|
|
||||||
in the first pass, or defer to a follow-up commit? Survey lists
|
|
||||||
it, test plan didn't exercise it.
|
|
||||||
|
|
||||||
## Pointers
|
|
||||||
|
|
||||||
- Test plan (executed; **read the Results section first**):
|
|
||||||
`docs/superpowers/specs/2026-05-15-hardening-test-plan.md`
|
|
||||||
- Threat model: `docs/superpowers/specs/2026-05-15-hardening-threat-model.md`
|
|
||||||
- Defenses survey: `docs/superpowers/specs/2026-05-15-hardening-defenses-survey.md`
|
|
||||||
- Original uid-split (to be marked superseded):
|
|
||||||
`docs/superpowers/specs/2026-05-15-user-uid-split-design.md`
|
|
||||||
- Live unit emission: `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`
|
|
||||||
- Reference units: `deploy/files/usr/local/lib/systemd/system/`
|
|
||||||
- Recent commit on this work: `461b8d0`
|
|
||||||
- Host SSH: `ssh left4.me` (config at `~/.ssh/config`, 1Password agent)
|
|
||||||
|
|
@ -1,31 +1,5 @@
|
||||||
# How many system users should left4me have? — 1, 2, or 3
|
|
||||||
|
|
||||||
**Status: SUPERSEDED 2026-05-15 by the hardening refactor.**
|
|
||||||
|
|
||||||
The original question — should left4me have 1, 2, or 3 system users — is
|
|
||||||
now answered: **2 users (current state) is correct.** The
|
|
||||||
defenses that motivated a 3-user split (DB readability from srcds,
|
|
||||||
cross-server ptrace, same-uid /proc visibility, web-side reach into
|
|
||||||
gameserver state) are closed by the systemd hardening composition
|
|
||||||
landed in the hardening-refactor plan (`docs/superpowers/plans/2026-05-15-hardening-refactor.md`):
|
|
||||||
- `PrivateUsers=true` blocks cross-uid ptrace at the kernel level.
|
|
||||||
- `PrivatePIDs=true` hides peer processes even when uids match.
|
|
||||||
- `TemporaryFileSystem=` + minimal binds hide the DB and web.env from
|
|
||||||
srcds entirely.
|
|
||||||
- `SystemCallFilter=~@debug` + empty `CapabilityBoundingSet=` block
|
|
||||||
ptrace at the syscall layer.
|
|
||||||
|
|
||||||
The residual filesystem-ACL surface (DB at `0640 root:left4me`,
|
|
||||||
web.env same) is a separate concern: a uid split would close it via
|
|
||||||
kernel ACLs, but for the current deployment shape it's covered by the
|
|
||||||
systemd-imposed FS view. If the deployment shape changes (multi-tenant
|
|
||||||
host, shell logins as the service uids, additional services running
|
|
||||||
as `left4me` outside these units) the uid split should be revisited.
|
|
||||||
|
|
||||||
The original content of this spec is preserved below for context.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
**Status: open question, not settled design.** This is a handoff
|
|
||||||
document. Today left4me has 2 system users: `left4me` (web app +
|
|
||||||
gameservers + workshop builds) and `l4d2-sandbox` (script-overlay
|
|
||||||
|
|
|
||||||