docs(specs): script sandbox v3 — egress filter design + plan

Captures the v3 design: IPAddressDeny= alone (no IPAddressAllow=any
because the documented "more specific wins" semantics don't hold on
systemd 257 / kernel 6.12 — the allow trumps unconditionally), explicit
CIDRs (the -p parser rejects the localhost/link-local shorthand
keywords), and a static sandbox-only resolv.conf bind to keep DNS
reachable when private RFC1918 ranges are blocked.

Plan documents what was implemented (in 7e66936) and the lessons
surfaced during execution so the next person doesn't have to rediscover
them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mwiegand 2026-05-08 17:08:47 +02:00
parent 7e66936d03
commit abc907b14b
2 changed files with 202 additions and 0 deletions


@@ -0,0 +1,89 @@
# L4D2 Script Sandbox v3 Implementation Plan
> **Approval status:** User-approved 2026-05-08; implemented and pushed in `7e66936`. This plan is recorded retrospectively for symmetry with the v1 / v2 plans.
**Goal:** Restrict the sandbox to public-internet egress per `docs/superpowers/specs/2026-05-08-l4d2-script-sandbox-v3-egress-filter.md`. Bind a static public-resolver `resolv.conf` into the sandbox.
---
## Locked Decisions (see spec for rationale)
- `IPAddressDeny=` only; no `IPAddressAllow=any`.
- Explicit CIDRs (no `localhost` / `link-local` shorthand keywords — `systemd-run -p` parser rejects them).
- Static `nameserver 1.1.1.1` + `nameserver 8.8.8.8` in a sandbox-only resolv.conf.
- `AF_UNIX` left enabled.
---
## Current Gap (at start of this iteration)
- `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` (v2) shares the host network namespace with no egress filter.
- The helper bind-mounts `/etc/resolv.conf` from the host into the sandbox (which points at private-IP DNS).
- `deploy/deploy-test-server.sh` does not install a sandbox-only resolv.conf.
- No deploy-artifact tests for `IPAddressDeny=` or for the resolv.conf shape.
---
## Task 1: Add `IPAddressDeny=`, swap resolv.conf bind, ship the static file
**Files:**
- Create: `deploy/files/etc/left4me/sandbox-resolv.conf` — two `nameserver` lines + a header comment.
- Modify: `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` — add `-p IPAddressDeny="..."` directive (11 explicit CIDRs); replace the `/etc/resolv.conf:/etc/resolv.conf` token in `BindReadOnlyPaths=` with `/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf`.
- Modify: `deploy/deploy-test-server.sh` — add an `install -m 0644 -o root -g root .../sandbox-resolv.conf /etc/left4me/sandbox-resolv.conf` line near the existing `host.env` install.
- Modify: `deploy/tests/test_deploy_artifacts.py` — extend `test_script_sandbox_helper_invokes_systemd_run_with_hardening` to assert each CIDR is present and that `IPAddressAllow=any` is **absent** (regression guard); update the BindReadOnlyPaths assertion to expect the sandbox-resolv.conf bind; add `test_sandbox_resolv_conf_exists` and `test_deploy_script_installs_sandbox_resolv_conf`.
Test plan (RED-first not used here; the work was driven by smoke-test feedback against a live host):
1. `test_script_sandbox_helper_invokes_systemd_run_with_hardening`: `IPAddressDeny=` present with all 11 CIDRs; no `IPAddressAllow=any`; resolv.conf bind path is `/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf`.
2. `test_sandbox_resolv_conf_exists`: file present, ≥2 nameservers, all in non-private space.
3. `test_deploy_script_installs_sandbox_resolv_conf`: deploy script references both the source path under `deploy/files/etc/left4me/sandbox-resolv.conf` and the target path `/etc/left4me/sandbox-resolv.conf`.
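The helper-text assertions in item 1 could look roughly like the sketch below. It is an illustration only, not the contents of `test_deploy_artifacts.py`: the function name `helper_text_problems` is hypothetical, and the check is a plain substring match. The CIDR list and bind path are taken from the spec.

```python
# Hypothetical sketch; the names are illustrative and not taken from the repository.
DENIED_CIDRS = [
    "127.0.0.0/8", "::1/128", "169.254.0.0/16", "fe80::/10",
    "224.0.0.0/4", "ff00::/8", "10.0.0.0/8", "172.16.0.0/12",
    "192.168.0.0/16", "100.64.0.0/10", "fc00::/7",
]
RESOLV_BIND = "/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf"


def helper_text_problems(text: str) -> list[str]:
    """Return a list of problems found in the helper script text (empty if none)."""
    problems = []
    if "IPAddressDeny=" not in text:
        problems.append("IPAddressDeny= missing")
    for cidr in DENIED_CIDRS:
        if cidr not in text:
            problems.append(f"CIDR missing: {cidr}")
    # Regression guard: the allow rule must never come back.
    if "IPAddressAllow=any" in text:
        problems.append("IPAddressAllow=any present (regression)")
    if RESOLV_BIND not in text:
        problems.append("sandbox resolv.conf bind missing")
    return problems
```

A pytest wrapper would read the helper file and assert that `helper_text_problems(text) == []`, which reports every missing piece at once rather than stopping at the first failure.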
**Verification:**
```
sh -n deploy/deploy-test-server.sh
bash -n deploy/files/usr/local/libexec/left4me/left4me-script-sandbox
python3 -m pytest deploy/tests/ -q
```
**Commit:** `feat(deploy): restrict script-sandbox egress to public internet only`
---
## Task 2: Deploy + smoke-test on `ckn@10.0.4.128`
**Files:** none.
Run `deploy/deploy-test-server.sh ckn@10.0.4.128`. Then on the host, invoke the helper with a probe script that opens TCP connections to:
- `1.1.1.1:443` — must connect (public)
- `127.0.0.1:8000` — must block (web app on loopback)
- `127.0.0.1:22` — must block (sshd on loopback)
- `10.0.4.128:22` — must block (host's external SSH on private LAN)
- `10.0.0.1:53` — must block (LAN DNS resolver)
In addition, `curl -m 5 https://steamcommunity.com/` must succeed end-to-end (DNS + HTTPS) with HTTP 200.
Inside the sandbox, `cat /etc/resolv.conf` must show the two public resolvers.
If any of the localhost / private targets connects, the deny is being silently overridden — see spec §Locked Decisions point 1.
**Commit:** none — operational verification.
---
## Lessons surfaced during execution
These belong in the spec but are repeated here as the "things the next person should not have to rediscover":
- **`IPAddressAllow=any` silently overrides every `IPAddressDeny=` rule** on this systemd 257 / kernel 6.12 combo, despite documentation stating "more specific rule wins". The negative test (`IPAddressAllow=any not in text`) locks this in.
- **systemd-run's `-p` parser rejects the `localhost` / `link-local` / `multicast` shorthand keywords** even though they parse fine in unit files. Use explicit CIDRs.
- **`/var/lib/left4me/.../left4me.db` is mode 0644 by default**: the web app creates the file world-readable. Tightening to 0640 root:left4me happens in v2's deploy-script change; v3 does not touch it again.
- **bpftool ships separately on Debian.** It's not needed for runtime, but `apt-get install bpftool` is useful for inspecting `sd_fw_egress` attach state when debugging filter behaviour.
---
## Rollback
`git revert 7e66936` and redeploy. The change is purely in deploy artifacts; no app code, no DB migration. Reverting reopens the previous v2 reachability.


@@ -0,0 +1,113 @@
# L4D2 Script Sandbox v3 — Egress Filter (Public Internet Only)
**Goal:** Restrict the script-overlay sandbox to public-internet egress only. Block reachability to the host's own services (localhost), the LAN, and any private RFC1918 / link-local / multicast / CGNAT / ULA addresses. Public DNS is preserved by bind-mounting a sandbox-only `resolv.conf` pointing at Cloudflare + Google.
**Approval status:** User-approved 2026-05-08. Implemented and smoke-tested on `ckn@10.0.4.128`.
## Context
After the v2 (systemd-only) migration, the sandbox still shared the host's network namespace. A live probe demonstrated the script could:
- Reach the web app on `127.0.0.1:8000` (HTTP 200 from `/health`).
- Reach the host's SSH daemon on `127.0.0.1:22` (banner returned).
- Reach the host on the LAN at `10.0.4.128:22` (banner returned).
- Reach the LAN gateway / DNS server at `10.0.0.1`.
- See Unix sockets in `/run` (`AF_UNIX` allowed).
The threat model says the sandbox should reach the public internet to download Workshop / l4d2center / GitHub content, but should **not** be able to talk to the host or LAN. systemd's `IPAddressDeny=` BPF cgroup egress filter is the right tool. It attaches a BPF program (`sd_fw_egress`) to the unit's cgroup; matching packets are silently dropped at send time.
A complication: the host's `/etc/resolv.conf` typically points at a private-IP DNS server (10.0.0.1 in the test deploy). Naively blocking `10.0.0.0/8` kills DNS, which kills outbound HTTP. The fix is to give the sandbox a static `resolv.conf` with public resolvers; DNS traffic then targets allowed public IPs.
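The DNS interaction can be checked mechanically: parse the `nameserver` lines of a `resolv.conf` and flag any that fall inside a range the filter would drop. The sketch below is illustrative only; the function names are hypothetical, and only the RFC1918 and loopback ranges are modelled here.

```python
import ipaddress

# Assumption: only the ranges relevant to the DNS problem are listed here.
PRIVATE_NETS = [
    ipaddress.ip_network(cidr)
    for cidr in ("127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def nameservers(resolv_conf_text: str) -> list[str]:
    """Extract addresses from 'nameserver <addr>' lines, ignoring '#' comments."""
    found = []
    for line in resolv_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            found.append(fields[1])
    return found


def blocked_nameservers(resolv_conf_text: str) -> list[str]:
    """Return the nameservers that sit inside one of PRIVATE_NETS."""
    blocked = []
    for addr in nameservers(resolv_conf_text):
        ip = ipaddress.ip_address(addr)
        if any(ip.version == net.version and ip in net for net in PRIVATE_NETS):
            blocked.append(addr)
    return blocked
```

Run against the host's file from the test deploy, this would flag `10.0.0.1`; run against a file containing only `1.1.1.1` and `8.8.8.8`, it returns an empty list.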
## Locked Decisions
1. **`IPAddressDeny=` alone — no `IPAddressAllow=any`.** The systemd documentation claims "more specific rule wins" when both are set, but on systemd 257 + kernel 6.12 (and likely other combos), `IPAddressAllow=any` silently overrides every `IPAddressDeny=` rule. Verified empirically. With only `IPAddressDeny=` set, the kernel's default "allow all" applies to non-listed addresses; the listed CIDRs are dropped at the egress hook. **This must not be regressed** — adding back `IPAddressAllow=any` reopens every blocked range.
2. **Explicit CIDRs, no shorthand keywords.** systemd's unit-file parser accepts `localhost`, `link-local`, `multicast` shortcuts, but the `systemd-run -p` parser rejects them with `Failed to parse IP address prefix: localhost`. Use the CIDRs directly: `127.0.0.0/8 ::1/128 169.254.0.0/16 fe80::/10 224.0.0.0/4 ff00::/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 100.64.0.0/10 fc00::/7`.
3. **Static `/etc/left4me/sandbox-resolv.conf` with public resolvers** (Cloudflare 1.1.1.1, Google 8.8.8.8). Bind-mounted into the sandbox at `/etc/resolv.conf` via `BindReadOnlyPaths=/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf`. Two nameservers for redundancy. Picking other public resolvers (Quad9, OpenDNS) would also be acceptable; the file is the source of truth, not the helper.
4. **`AF_UNIX` stays in `RestrictAddressFamilies=`.** Dropping it would risk breaking NSS / syslog / D-Bus introspection paths for marginal gain — the IP-level filter handles the actual threat (reaching host TCP services). The Unix-socket surface (D-Bus system bus, systemd notify) is uid-gated and `l4d2-sandbox` has no special D-Bus permissions.
5. **No `PrivateNetwork=`.** That would block all networking, including the public internet. The whole point of script overlays is reaching public download sources.
6. **No DNS-over-HTTPS or DNSSEC.** Plain UDP-53 to public resolvers is sufficient; the threat is "egress targeting", not "DNS hijacking". Revisit if the trust model relaxes.
## Architecture
```text
sudo helper (root)
└─ chown overlay dir to l4d2-sandbox
└─ systemd-run --service [...all v2 directives...]
-p IPAddressDeny="<11 CIDRs>"
-p BindReadOnlyPaths="/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf [...]"
└─ /bin/bash /script.sh
(egress to listed CIDRs dropped at sd_fw_egress BPF hook;
DNS goes to 1.1.1.1 / 8.8.8.8; everything else
reaches the public internet normally)
```
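To make the shape of the invocation concrete, the sketch below assembles an argument list containing only the two v3 properties. It is not the helper itself (which is a bash script): the function names are hypothetical, the `--wait` and `--pipe` flags are assumptions about how the helper is run, and all v2 hardening directives are omitted.

```python
# Hypothetical sketch: builds an argv only, and does not execute anything.
DENIED_CIDRS = [
    "127.0.0.0/8", "::1/128", "169.254.0.0/16", "fe80::/10",
    "224.0.0.0/4", "ff00::/8", "10.0.0.0/8", "172.16.0.0/12",
    "192.168.0.0/16", "100.64.0.0/10", "fc00::/7",
]


def sandbox_properties() -> list[str]:
    """The two v3 properties; note there is deliberately no IPAddressAllow=."""
    return [
        "IPAddressDeny=" + " ".join(DENIED_CIDRS),
        "BindReadOnlyPaths=/etc/left4me/sandbox-resolv.conf:/etc/resolv.conf",
    ]


def systemd_run_argv(script: str = "/script.sh") -> list[str]:
    """Assemble a systemd-run command line (v2 directives omitted for brevity)."""
    argv = ["systemd-run", "--wait", "--pipe"]  # assumed flags, see lead-in
    for prop in sandbox_properties():
        argv += ["-p", prop]
    argv += ["/bin/bash", script]
    return argv
```

Passing each property as its own `-p` argument keeps the space-separated CIDR list inside a single argv element, so no shell quoting is involved.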
`IPAddressDeny=` blocks egress to:
| CIDR | Coverage |
|---|---|
| `127.0.0.0/8` | IPv4 loopback |
| `::1/128` | IPv6 loopback |
| `169.254.0.0/16` | IPv4 link-local (incl. AWS metadata, DHCP fallback) |
| `fe80::/10` | IPv6 link-local |
| `224.0.0.0/4` | IPv4 multicast |
| `ff00::/8` | IPv6 multicast |
| `10.0.0.0/8` | RFC1918 private |
| `172.16.0.0/12` | RFC1918 private |
| `192.168.0.0/16` | RFC1918 private |
| `100.64.0.0/10` | CGNAT (RFC6598) |
| `fc00::/7` | IPv6 ULA |
Public IPv4 / IPv6 destinations are unaffected.
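The table can be turned into a small lookup for reasoning about whether a given destination would be dropped. The sketch below models only the intended prefix matching with Python's `ipaddress` module; it does not inspect the BPF program, and `deny_reason` is a hypothetical name.

```python
import ipaddress
from typing import Optional

# CIDRs and labels mirror the table above.
DENY_TABLE = [
    ("127.0.0.0/8", "IPv4 loopback"),
    ("::1/128", "IPv6 loopback"),
    ("169.254.0.0/16", "IPv4 link-local"),
    ("fe80::/10", "IPv6 link-local"),
    ("224.0.0.0/4", "IPv4 multicast"),
    ("ff00::/8", "IPv6 multicast"),
    ("10.0.0.0/8", "RFC1918 private"),
    ("172.16.0.0/12", "RFC1918 private"),
    ("192.168.0.0/16", "RFC1918 private"),
    ("100.64.0.0/10", "CGNAT"),
    ("fc00::/7", "IPv6 ULA"),
]
_NETS = [(ipaddress.ip_network(cidr), label) for cidr, label in DENY_TABLE]


def deny_reason(addr: str) -> Optional[str]:
    """Return "<cidr> (<label>)" for a denied address, or None if it is not listed."""
    ip = ipaddress.ip_address(addr)
    for net, label in _NETS:
        # Compare only within the same address family.
        if ip.version == net.version and ip in net:
            return f"{net} ({label})"
    return None
```

For example, `deny_reason("10.0.4.128")` identifies the RFC1918 entry, while `deny_reason("1.1.1.1")` returns `None`, matching the expectation that public resolvers stay reachable.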
## Files
- `deploy/files/etc/left4me/sandbox-resolv.conf` *(new)*: `nameserver 1.1.1.1` + `nameserver 8.8.8.8`. Mode 0644 root-owned at deploy time.
- `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`: `IPAddressDeny=` directive added; `BindReadOnlyPaths=` references the sandbox-resolv.conf instead of `/etc/resolv.conf`.
- `deploy/deploy-test-server.sh`: `install -m 0644 -o root -g root .../sandbox-resolv.conf /etc/left4me/sandbox-resolv.conf`.
- `deploy/tests/test_deploy_artifacts.py`: assert all of the above + the **negative assertion `IPAddressAllow=any not in text`** (regression guard).
The web app, ScriptBuilder, routes, models, and migrations are all unchanged. Same as v2.
## Verification
Smoke battery on the deployed host (probe script invoked through the helper as root):
| Target | Expected | Actual |
|---|---|---|
| `1.1.1.1:443` | connected | ✓ CONNECTED |
| `https://steamcommunity.com/` (DNS + HTTPS) | 200 | ✓ 200 |
| `127.0.0.1:8000` (web app) | blocked | ✓ TimeoutError |
| `127.0.0.1:22` (sshd) | blocked | ✓ TimeoutError |
| `10.0.4.128:22` (host LAN ssh) | blocked | ✓ TimeoutError |
| `10.0.0.1:53` (host's DNS resolver) | blocked | ✓ TimeoutError |
| `cat /etc/resolv.conf` inside | shows 1.1.1.1 + 8.8.8.8 | ✓ |
`bpftool cgroup show` against the unit's cgroup confirms `sd_fw_egress` and `sd_fw_ingress` are attached.
## Risks
- **`IPAddressAllow=` accidentally added back.** Reopens every blocked range silently. Mitigation: explicit negative test in `test_deploy_artifacts.py` plus a comment in the helper.
- **Public DNS resolver outage.** If 1.1.1.1 and 8.8.8.8 are both down, DNS in the sandbox fails and builds fail. Using two resolvers from independent operators makes this very unlikely. The operator can edit `/etc/left4me/sandbox-resolv.conf` to use different resolvers; the helper picks up the change on the next invocation.
- **Public DNS resolver privacy.** Cloudflare and Google see hostnames the scripts query. Acceptable for the workload (Steam Workshop, GitHub, etc. are public anyway); switch to Quad9 or self-hosted if this is a concern.
- **Future kernel/systemd that changes the observed allow/deny precedence.** If a future systemd version actually implements the documented "more specific wins" behavior, a unit with only `IPAddressDeny=` continues to work; the negative test on `IPAddressAllow=any` keeps the regression-safe configuration locked in. Re-test on each major systemd upgrade.
- **Scripts that legitimately need a private IP.** E.g., a self-hosted internal mirror at 10.x. Not a use case today; if it arises, expose specific IPs via a future `IPAddressAllow=10.x.y.z/32` for that one host (not blanket).
## Out Of Scope
- **Per-overlay UID isolation.** Cross-script-overlay write access via the shared `l4d2-sandbox` UID is still possible after a hypothetical sandbox bypass. Deferred from earlier discussions.
- **Egress allowlist by hostname / domain.** Would require a forward proxy (Squid, mitmproxy). Heavier than warranted for the trust model.
- **Dropping `AF_UNIX` from `RestrictAddressFamilies=`.** Tangential to IP-level egress; risks breaking NSS / syslog.
- **DNSSEC / DoH.** Threat model is egress targeting, not DNS hijacking.
- **Network-namespace isolation (`PrivateNetwork=` + custom netns + NAT).** Heavier than `IPAddressDeny=` for equivalent outcome.
## Implementation Boundaries
- **No app code change.** Helper-side only.
- **No new systemd units.** Same transient `left4me-script-{id}-{pid}.service` pattern.
- **No new apt deps.** `bpftool` was used during smoke testing but is not required at runtime.
- **One new deploy artifact.** `sandbox-resolv.conf` shipped under `deploy/files/etc/left4me/`.