The hardening refactor that just landed closes the same-uid attack surface (FS view, ptrace, /proc visibility, signals) for the web + gameserver units via systemd directives plus system-wide kernel.yama.ptrace_scope=2. Keeping the script-sandbox on a separate uid was the inconsistent half-step: defense-in-depth only, with build-time-idmap complexity attached. One principle wins: harden once, share the uid.

scripts/libexec/left4me-script-sandbox: drop the idmap block (uid lookups, STAGING setup, cleanup_staging trap, mount --bind --map-users), switch User=/Group= to left4me, point BindPaths at $OVERLAY_DIR directly. Header comment updated to reflect hardening-not-uid as the same-uid defense. nsenter self-wrap kept; it's about mount-namespace escape, not uid.

Tests + comments + companion docs updated. Build-time-idmap and overlay-idmap plans marked SUPERSEDED; user-uid-split spec revised to "1 user is correct"; one-line update notes on the hardening specs and the build-overlay-unit-design.

Companion ckn-bw commit removes the l4d2-sandbox user + group and changes /var/lib/left4me from 0711 → 0755 (the traverse-only mode was specifically for the sandbox uid).
# Hardening refactor — design

**Status:** approved design; implementation plan to follow at
`docs/superpowers/plans/2026-05-15-hardening-refactor.md`.
Companion: `2026-05-15-hardening-threat-model.md`,
`2026-05-15-hardening-defenses-survey.md`,
`2026-05-15-hardening-test-plan.md` (executed 2026-05-15, results inline).

> **Updated 2026-05-15:** `l4d2-sandbox` was collapsed into `left4me`
> after this refactor landed; see
> `docs/superpowers/plans/2026-05-15-uid-collapse.md`. References below
> to the sandbox running as a separate uid describe the pre-collapse
> state; the directive composition this doc establishes is unchanged.

This doc records the *shape* of the refactor: where the artifacts live,
how they're factored, what's in scope. The implementation plan lays out
the steps.

## Context

The hardening test plan ran end-to-end on `left4.me` on 2026-05-15
(commit `461b8d0`). Outcome: the `systemd-analyze security` exposure
score dropped from 7.5 to 1.3 for `left4me-server@1` and from 8.7 to 4.1
for `left4me-web`, and all 8 Test 8 attack vectors were blocked. Two
amendments to the spec's proposed composition were required:
`SystemCallArchitectures=native x86` (srcds_linux is i386) and
`PrivatePIDs=true` (same-uid `ProtectProc=invisible` can't hide gunicorn
from srcds; a PID namespace fixes it at the kernel level).
`MemoryDenyWriteExecute=true` is permanently excluded (Source engine
i386 `.so` files have text relocations).

The composition is *not currently deployed*: Test 7's drop-in was cleaned
up at session end; only the Test 9 sysctl (`kernel.yama.ptrace_scope=2`)
persists. This refactor lands the proven composition permanently via
the ckn-bw bundle.

## Approach

Keep the current responsibility split for now: ckn-bw owns systemd unit
emission (base + hardening), left4me owns the educational reference
copies and the threat-model/test docs. Hardening directives land in
ckn-bw's `systemd/units` reactor at
`~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`, factored via
shared Python dicts so the two units (and the future
build-overlay-unit refactor) reuse the common base.

The broader responsibility reshape (hardening as drop-in files
*living* in left4me, with ckn-bw as a thin file-shipper) is a real
direction worth pursuing, but deserves its own session. Deferred.

## Factoring

Three dict constants at the top of `metadata.py` (or in a sibling
`hardening.py` module if `metadata.py` grows past a comfortable read):

### `HARDENING_COMMON`

Directives both units take verbatim. 18 keys:

```python
HARDENING_COMMON = {
    # /proc visibility
    'ProtectProc': 'invisible',
    'ProcSubset': 'pid',
    # kernel and host state
    'ProtectKernelTunables': 'true',
    'ProtectKernelModules': 'true',
    'ProtectKernelLogs': 'true',
    'ProtectClock': 'true',
    'ProtectControlGroups': 'true',
    'ProtectHostname': 'true',
    'LockPersonality': 'true',
    # filesystem view
    'ProtectSystem': 'strict',
    'ProtectHome': 'true',
    'PrivateTmp': 'true',
    # namespaces, scheduling, IPC, keyring, umask, socket families
    'RestrictNamespaces': 'true',
    'RestrictRealtime': 'true',
    'RemoveIPC': 'true',
    'KeyringMode': 'private',
    'UMask': '0027',
    'RestrictAddressFamilies': 'AF_INET AF_INET6 AF_UNIX',
}
```

### `HARDENING_SERVER`

`{**HARDENING_COMMON, ...server-specific}`. Adds sudo-incompatible
flags, filesystem virtualization, the i386 amendment, a per-instance PID
namespace, and restricted socket binds:

- `NoNewPrivileges=true`
- `RestrictSUIDSGID=true`
- `PrivateUsers=true`
- **`PrivatePIDs=true`** *(Test amendment: D2.b / D5)*
- `PrivateIPC=true`
- `PrivateDevices=true`
- `CapabilityBoundingSet=` *(empty value → drop all)*
- `AmbientCapabilities=`
- `SystemCallArchitectures='native x86'` *(Test amendment: i386 srcds)*
- `SystemCallFilter=('@system-service', '~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete @privileged')` *(tuple → repeated key)*
- `TemporaryFileSystem='/var/lib /etc /opt /home /root /srv /mnt /media'`
- `BindReadOnlyPaths=('/var/lib/left4me/installation', '/var/lib/left4me/overlays', '/etc/left4me/host.env', '/etc/ssl', '/etc/ca-certificates', '/etc/resolv.conf', '/etc/nsswitch.conf', '/etc/alternatives')`
- `BindPaths='/var/lib/left4me/runtime/%i'`
- `SocketBindAllow=('udp:27000-27999', 'tcp:27000-27999')` *(NEW: locks srcds bindable sockets to the game port range; not tested in Test 7 but cheap defense-in-depth. Concrete range pending verification of `LEFT4ME_PORT_RANGE_*` substitution support in systemd directives; hard-coded range as fallback.)*

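As a rough sketch, the list above might translate into a dict along these lines. The values are copied from the list; the two-entry `HARDENING_COMMON` is only a stand-in so the snippet runs on its own, and the exact layout inside `metadata.py` is an assumption.

```python
# Abbreviated stand-in for HARDENING_COMMON so this snippet runs on its own.
HARDENING_COMMON = {
    'ProtectSystem': 'strict',
    'PrivateTmp': 'true',
}

HARDENING_SERVER = {
    **HARDENING_COMMON,
    # sudo-incompatible flags
    'NoNewPrivileges': 'true',
    'RestrictSUIDSGID': 'true',
    'PrivateUsers': 'true',
    # per-instance PID namespace (test amendment, D2.b / D5)
    'PrivatePIDs': 'true',
    'PrivateIPC': 'true',
    'PrivateDevices': 'true',
    # empty strings are emitted as "Key="; an empty bounding set drops all capabilities
    'CapabilityBoundingSet': '',
    'AmbientCapabilities': '',
    # srcds_linux is i386, so x86 stays allowed next to native
    'SystemCallArchitectures': 'native x86',
    # tuples are emitted as one "Key=Value" line per element
    'SystemCallFilter': (
        '@system-service',
        '~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete @privileged',
    ),
    # filesystem virtualization: tmpfs over these trees, then bind selected paths back
    'TemporaryFileSystem': '/var/lib /etc /opt /home /root /srv /mnt /media',
    'BindReadOnlyPaths': (
        '/var/lib/left4me/installation',
        '/var/lib/left4me/overlays',
        '/etc/left4me/host.env',
        '/etc/ssl',
        '/etc/ca-certificates',
        '/etc/resolv.conf',
        '/etc/nsswitch.conf',
        '/etc/alternatives',
    ),
    'BindPaths': '/var/lib/left4me/runtime/%i',
    # hard-coded game port range
    'SocketBindAllow': ('udp:27000-27999', 'tcp:27000-27999'),
}
```

Spreading `HARDENING_COMMON` first means any server-specific key with the same name would override the common value, so the order of the spread matters if the two dicts ever overlap.
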
### `HARDENING_WEB`

`{**HARDENING_COMMON, ...web-specific}`. Web inherits `ProtectSystem=strict`
from COMMON (was `=full` in the current base unit; this tightens). Adds
a syscall filter *without* `~@privileged` (sudo needs setuid).
**Excludes** `NoNewPrivileges`, `PrivateUsers`, `RestrictSUIDSGID`, and
an empty `CapabilityBoundingSet`, all of which are sudo-incompatible.

- `SystemCallArchitectures='native'`
- `SystemCallFilter=('@system-service', '~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete')` *(no `~@privileged`)*

Web's existing `ReadWritePaths=/var/lib/left4me` stays in its unit's
inline `Service` dict (web-specific, not common).

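A minimal sketch of the web variant, together with a small guard for the exclusion rule stated above. The abbreviated `HARDENING_COMMON`, the `SUDO_INCOMPATIBLE` set, and the `sudo_conflicts` helper are hypothetical names introduced for illustration; they are not part of the ckn-bw bundle.

```python
# Abbreviated stand-in for HARDENING_COMMON so this snippet runs on its own.
HARDENING_COMMON = {
    'ProtectSystem': 'strict',
    'PrivateTmp': 'true',
}

HARDENING_WEB = {
    **HARDENING_COMMON,
    'SystemCallArchitectures': 'native',
    # same deny list as the server unit, minus @privileged (sudo needs setuid)
    'SystemCallFilter': (
        '@system-service',
        '~@debug @mount @raw-io @reboot @swap @cpu-emulation @obsolete',
    ),
}

# Hypothetical guard: directives the design lists as sudo-incompatible.
SUDO_INCOMPATIBLE = {
    'NoNewPrivileges',
    'PrivateUsers',
    'RestrictSUIDSGID',
    'CapabilityBoundingSet',
}


def sudo_conflicts(service):
    """Return the sudo-incompatible keys present in a Service dict, sorted."""
    return sorted(SUDO_INCOMPATIBLE.intersection(service))


print(sudo_conflicts(HARDENING_WEB))  # prints []
```

A check like this could catch a key that is later promoted into `HARDENING_COMMON` by mistake, since the web dict would then inherit it through the spread.
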
### Multi-value directives and empty values

Tuples of strings are emitted as repeated `Key=Value` lines by ckn-bw's
systemd-bundle emitter. Existing precedent: `EnvironmentFile` at
`metadata.py:201-204`. Empty values (`CapabilityBoundingSet=`,
`AmbientCapabilities=`) need to emit as `Key=` with nothing after `=`.
Both behaviors are verified as the first step of the implementation
plan. Fallbacks if the emitter doesn't handle them: inline-joined
strings where systemd accepts them, or extending the emitter.

## Reference units

Keep `deploy/files/usr/local/lib/systemd/system/left4me-server@.service`
and `deploy/files/usr/local/lib/systemd/system/left4me-web.service` as
**deliberately educational** copies of the deployed units. Each new
hardening directive in the reference gets a one-line comment
explaining the threat it addresses. A cold reader of the repo can open
the reference unit and read the threat model in code form, without
needing to read the ckn-bw bundle or systemd man pages.

Source-of-truth: ckn-bw reactor is what's deployed. Reference units in
left4me are hand-synced. No CI drift test (would be brittle against
comment ordering and structural human-readable formatting); operator
discipline at edit time keeps them aligned. A top-of-file note in each
reference unit points readers at the reactor.

## Scope of the refactor

1. **ckn-bw reactor edits.** Three constants + spread into the two
   units. Verify tuple-multi-value emission. `metadata.py`.
2. **Sysctl drop-in via ckn-bw.** `kernel.yama.ptrace_scope=2`. Move
   from host-only `/etc/sysctl.d/99-left4me-ptrace.conf` (applied by
   hand in Test 9) into the bundle's file management. Find the existing
   sysctl pattern in ckn-bw and follow it.
3. **Reference unit mirror with educational comments.** Update
   `deploy/files/usr/local/lib/systemd/system/{left4me-server@,left4me-web}.service`
   to match the reactor's emission, with per-directive comments
   explaining each hardening directive's purpose. Top-of-file note
   pointing to the reactor.
4. **Spec bug fixes in the test plan.** Four bugs flagged in
   `2026-05-15-hardening-test-plan.md`'s output section: the PID-lookup
   race (use `systemctl show -p MainPID --value`), the gdb-from-host
   verification flaw (probe via `systemd-run` inside the same
   hardening profile, not via `nsenter`, which bypasses it), the D5 pgrep
   pattern, and the package name for `scmp_sys_resolver` (it ships in
   `seccomp`, not `libseccomp-dev`). Doc-only.
5. **Mark `2026-05-15-user-uid-split-design.md` superseded.** Front-matter
   status note + brief explanation that `PrivateUsers`, `PrivatePIDs`,
   and `TemporaryFileSystem` close D1, D2, D3, D5 at the kernel level.
   Reference this design + the refactor plan as the replacement.
6. **`SocketBindAllow=` for srcds** (in `HARDENING_SERVER`). Not tested
   in Test 7; verify on deploy. Encoding pending; likely a hard-coded
   port range, since systemd directive variable substitution support
   is uneven.
7. **Cleanup unmanaged packages on left4.me.**
   `apt remove gdb seccomp libseccomp-dev` after the refactor lands.
   Test-only tooling; reinstall on demand for future test sessions.

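For the PID-lookup fix in item 4, a small helper along the following lines could wrap the `systemctl show -p MainPID --value` call. This is a sketch under assumptions: the function names are hypothetical, and the unit name passed in is up to the caller. `systemctl` reports `0` when a unit has no main process, which the parser maps to `None`.

```python
import subprocess


def main_pid_command(unit):
    """Build the systemctl invocation that prints only the unit's MainPID."""
    return ['systemctl', 'show', '-p', 'MainPID', '--value', unit]


def parse_main_pid(output):
    """Parse systemctl's output; 0 means the unit has no main process."""
    pid = int(output.strip())
    return pid if pid > 0 else None


def main_pid(unit):
    """Ask systemd for the unit's main PID instead of scanning the process list."""
    result = subprocess.run(
        main_pid_command(unit),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_main_pid(result.stdout)
```

Asking systemd for the PID avoids matching on process names, which is where a lookup race can come from when several instances of the same binary are running.
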
## Sequencing the deploy

1. Land ckn-bw commit (reactor changes, sysctl drop-in entry).
2. Land left4me commit (reference units, spec bug fixes, uid-split
   spec status update, this design doc, the refactor plan).
3. Push both repos.
4. `bw apply ovh.left4me`: applies reactor changes; systemd restarts
   affected units automatically.
5. Verify on the host:
   - `systemctl cat left4me-server@1` shows the new directives.
   - Re-run a Test 8 subset (D1.a, D1.b, D2.b via PrivatePIDs, D5 with
     the corrected pgrep) using the *corrected* probe pattern (per the
     spec bug fix in scope item 4). A full Test 8 rerun is unnecessary:
     the composition is proven; only the *deployment* needs verifying.
   - `sysctl kernel.yama.ptrace_scope` = 2.
   - Smoke: server@1 + server@2 + web all active and stable for 10
     minutes. Web UI: login, server start/stop, log view, overlay
     rebuild.
6. Rollback if needed: `git revert` the ckn-bw commit + `bw apply`.

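The first check in step 5 can be done by eye, but a short script is one way to make it repeatable. The sketch below parses text in the shape that `systemctl cat` prints and reports any expected directive that is absent. The helper names and the sample unit text are illustrative assumptions, and the parser deliberately ignores systemd's list-reset semantics (an empty assignment clearing earlier values).

```python
def parse_directives(unit_text):
    """Collect Key=Value lines from unit text into {key: [values, ...]}."""
    found = {}
    for raw in unit_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and section headers such as [Service].
        if not line or line.startswith(('#', ';', '[')) or '=' not in line:
            continue
        key, _, value = line.partition('=')
        found.setdefault(key.strip(), []).append(value.strip())
    return found


def missing_directives(unit_text, expected):
    """Return 'Key=Value' strings from expected that the unit text lacks."""
    found = parse_directives(unit_text)
    missing = []
    for key, value in expected.items():
        wanted = value if isinstance(value, tuple) else (value,)
        for item in wanted:
            if item not in found.get(key, []):
                missing.append(f'{key}={item}')
    return missing


# Illustrative sample only; real input would come from `systemctl cat`.
sample = """\
# /usr/local/lib/systemd/system/left4me-server@.service
[Service]
PrivatePIDs=true
CapabilityBoundingSet=
SocketBindAllow=udp:27000-27999
"""

expected = {
    'PrivatePIDs': 'true',
    'CapabilityBoundingSet': '',
    'SocketBindAllow': ('udp:27000-27999', 'tcp:27000-27999'),
}

print(missing_directives(sample, expected))
# prints ['SocketBindAllow=tcp:27000-27999']
```

Because `expected` uses the same shape as the hardening dicts (strings, empty strings, tuples), one of those dicts could be passed in directly.
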
## What's out of scope

- **`MemoryDenyWriteExecute=true`**: permanently excluded.
- **AppArmor profile**: deferred per defenses-survey.
- **`build-overlay-unit` refactor**
  (`2026-05-15-build-overlay-unit-design.md`): sequenced after this.
  Will reuse `HARDENING_COMMON` (or a variant) when it lands.
- **3-user uid split**: `2026-05-15-user-uid-split-design.md`
  superseded by this refactor (scope item 5).
- **Broader configmgmt-responsibility reshape**: hardening as
  drop-ins living in left4me, ckn-bw becoming a thin file-shipper.
  Real direction worth pursuing; deserves a dedicated session.
  Out of scope here.
- **Stale RCON port app bug**: flagged in executor's handoff. Separate
  scope.
- **Pushing the branch**: operator decides when.

## Implementation notes (resolved during plan execution)

- The ckn-bw systemd-bundle emitter renders Python tuples as repeated
  `Key=Value` lines and renders empty strings as `Key=` with no value.
  Both behaviors confirmed by reading the Mako template in
  `libs/systemd.py:17-23`. Tuple branch:
  `isinstance(value, (list, set, tuple))` iterates and emits
  `${option}=${item}` per element, preserving insertion order (sets are
  sorted; lists and tuples are not). Empty-string branch: falls through
  to `else: ${option}=${str(value)}`, which emits `Key=` with nothing
  after `=`. `None` suppresses the key entirely (distinct from an empty
  string, which matters here). The `protection()` helper at
  `libs/systemd.py:94` already uses `'CapabilityBoundingSet': ''` as a
  live in-repo example. Tuple precedent in the left4me bundle:
  `EnvironmentFile` at `bundles/left4me/metadata.py:201-204`.
  Verified 2026-05-15.
- `SocketBindAllow=` value: hard-coded port range `27000-27999` for
  both `udp:` and `tcp:` lines (matches the `LEFT4ME_PORT_RANGE_*`
  metadata values). Variable substitution in systemd directives is not
  universally supported; hard-coded range avoids the hazard.

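The emitter behavior described in the first note can be approximated in plain Python. This is a hypothetical stand-in written from the description above, not the Mako template in `libs/systemd.py`; the function name `render_options` is made up for the sketch.

```python
def render_options(options):
    """Render a dict of unit options as 'Key=Value' lines.

    Mirrors the described behavior: None suppresses the key, collections
    become repeated lines (sets sorted, lists and tuples in order), and
    everything else, including the empty string, goes through str().
    """
    lines = []
    for option, value in options.items():
        if value is None:
            continue  # None drops the key entirely
        if isinstance(value, (list, set, tuple)):
            items = sorted(value) if isinstance(value, set) else value
            for item in items:
                lines.append(f'{option}={item}')
        else:
            lines.append(f'{option}={str(value)}')
    return lines


example = {
    'CapabilityBoundingSet': '',          # empty string: emitted as "Key="
    'SocketBindAllow': ('udp:27000-27999', 'tcp:27000-27999'),
    'MemoryDenyWriteExecute': None,       # None: not emitted at all
}

for line in render_options(example):
    print(line)
# CapabilityBoundingSet=
# SocketBindAllow=udp:27000-27999
# SocketBindAllow=tcp:27000-27999
```

The distinction between `''` and `None` is the part worth keeping in mind: the first explicitly clears the setting in the rendered unit, the second leaves the directive out.
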
## Pointers

- Threat model: `2026-05-15-hardening-threat-model.md`
- Defenses survey: `2026-05-15-hardening-defenses-survey.md` (§ 5
  candidate composition is the basis for the factoring above)
- Test plan + results: `2026-05-15-hardening-test-plan.md`
  (commit `461b8d0`)
- Executor's handoff: `2026-05-15-session-handoff.md`
  (commit `152c313`)
- Live reactor: `~/Projekte/ckn-bw/bundles/left4me/metadata.py:150+`
- Reference units: `deploy/files/usr/local/lib/systemd/system/`
- Superseded uid-split spec: `2026-05-15-user-uid-split-design.md`
- Adjacent (sequenced after): `2026-05-15-build-overlay-unit-design.md`