plan(left4me): move idmap from gameserver mount to script-sandbox build
Architectural cleanup: the uid translation is a build-time concern (the sandbox produces sandbox-uid files); having the gameserver path unwind that producer-side decision on every mount means the mount helper carries idmap lifecycle code it shouldn't need. Moving the idmap into the script-sandbox bind makes files land left4me-owned on disk, drops ~140 lines from left4me-overlay, and makes all overlay content (workshop + script-built) consistent on-disk. Verified on left4.me kernel 6.12.86 that the kernel idmap propagates through plain re-bind, so systemd-run's BindPaths can wrap a pre-created idmapped staging path. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
parent
dd918aca4b
commit
bc25d423aa
1 changed files with 264 additions and 0 deletions
264
docs/superpowers/plans/2026-05-15-build-time-idmap.md
Normal file
264
docs/superpowers/plans/2026-05-15-build-time-idmap.md
Normal file
|
|
@ -0,0 +1,264 @@
|
||||||
|
# Build-time idmap: move the uid translation from the gameserver mount
|
||||||
|
into the script sandbox
|
||||||
|
|
||||||
|
## Context
|
||||||
|
|
||||||
|
The current idmap implementation translates uids at **gameserver mount
|
||||||
|
time**: `left4me-overlay` stats each lowerdir, creates a per-lowerdir
|
||||||
|
idmapped bind under `runtime/<n>/idmap/<basename>` for the sandbox-
|
||||||
|
owned ones, then uses those bind paths in the overlay's `lowerdir=`.
|
||||||
|
On stop, the binds get torn down. Works correctly today, but spreads
|
||||||
|
the idmap concern across two helpers and adds mount lifecycle code on
|
||||||
|
every gameserver start.
|
||||||
|
|
||||||
|
Cleaner alternative: do the idmap translation at **script-sandbox
|
||||||
|
build time**, so files land on disk as `left4me`-owned. The on-disk
|
||||||
|
state then matches workshop-built overlays (also left4me-owned), and
|
||||||
|
the gameserver mount path becomes uniform — no per-lowerdir stat,
|
||||||
|
no idmap binds, no extra cleanup.
|
||||||
|
|
||||||
|
This plan switches the architecture to the build-time approach and
|
||||||
|
reverts the gameserver-mount idmap code.
|
||||||
|
|
||||||
|
## Verified mechanism
|
||||||
|
|
||||||
|
Tested end-to-end on `left4.me` (Trixie, kernel 6.12.86, ext4) on
|
||||||
|
2026-05-15:
|
||||||
|
|
||||||
|
1. `/source/` dir owned by `left4me` on disk.
|
||||||
|
2. `mount --bind --map-users=980:981:1 --map-groups=980:981:1
|
||||||
|
/source /idmapped` — inside `/idmapped`, files appear as uid 981
|
||||||
|
(sandbox view).
|
||||||
|
3. `mount --bind /idmapped /rebound` — a plain second bind. The idmap
|
||||||
|
**propagates** to `/rebound` (rebound view also shows uid 981).
|
||||||
|
This is what `BindPaths=` in the sandbox unit does.
|
||||||
|
4. `sudo -u l4d2-sandbox touch /rebound/x.txt` — write **succeeds**.
|
||||||
|
The file lands on disk owned by `left4me` (uid 980).
|
||||||
|
|
||||||
|
Map direction is the inverse of the gameserver-side map:
|
||||||
|
`--map-users=<disk_uid>:<mount_uid>:1` where disk is `left4me` and
|
||||||
|
mount-side is `l4d2-sandbox`. Inside the bind, the sandbox uid sees
|
||||||
|
its own uid as itself; writes from that uid get translated back to
|
||||||
|
the disk-side (left4me) for storage.
|
||||||
|
|
||||||
|
## Approach
|
||||||
|
|
||||||
|
### Script-sandbox helper (`deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`)
|
||||||
|
|
||||||
|
Pre-create an idmapped bind staging path, point the sandbox's
|
||||||
|
BindPaths at it, clean up on exit. Concretely:
|
||||||
|
|
||||||
|
1. **Remove** the existing `chown -R l4d2-sandbox:l4d2-sandbox
|
||||||
|
"$OVERLAY_DIR"` and `chmod 0755` lines. The overlay dir stays
|
||||||
|
`left4me`-owned (web app's creation default).
|
||||||
|
2. **Add** a setup block before `systemd-run`:
|
||||||
|
```bash
|
||||||
|
STAGING=/var/lib/left4me/tmp/sandbox-idmap-${OVERLAY_ID}
|
||||||
|
trap 'umount "$STAGING" 2>/dev/null || true; rmdir "$STAGING" 2>/dev/null || true' EXIT
|
||||||
|
mkdir -p "$STAGING"
|
||||||
|
mount --bind \
|
||||||
|
--map-users=$(id -u left4me):$(id -u l4d2-sandbox):1 \
|
||||||
|
--map-groups=$(id -g left4me):$(id -g l4d2-sandbox):1 \
|
||||||
|
"$OVERLAY_DIR" "$STAGING"
|
||||||
|
```
|
||||||
|
3. **Change** the systemd-run line:
|
||||||
|
- `BindPaths="${OVERLAY_DIR}:/overlay"` → `BindPaths="${STAGING}:/overlay"`
|
||||||
|
4. **Remove** the post-build `find ... chmod o+r` block. Files end up
|
||||||
|
left4me-owned, web app reads them via its primary uid. The
|
||||||
|
world-read kludge was only needed because of the old sandbox-
|
||||||
|
owned files; with this change it's obsolete.
|
||||||
|
|
||||||
|
`trap` ensures the staging bind is umounted even on errors / signals.
|
||||||
|
Idempotent: if the helper is re-run, `umount + rmdir` handle existing
|
||||||
|
state, and `mkdir -p` + `mount --bind` over an existing mountpoint
|
||||||
|
adds another bind that the next exit cleans up. The kernel 6.12 bind
|
||||||
|
nesting on the same path works fine (verified during the recent
|
||||||
|
gameserver-side idmap fix).
|
||||||
|
|
||||||
|
### Gameserver-mount helper (`deploy/files/usr/local/libexec/left4me/left4me-overlay`)
|
||||||
|
|
||||||
|
Revert the idmap logic added in commit `2f6a9cf` (+ fix in `9053186`,
|
||||||
|
+ mountpoint-detection fix in `dd918ac`). Specifically:
|
||||||
|
|
||||||
|
1. **Remove** the per-lowerdir stat + idmap-decision loop in `cmd_mount`.
|
||||||
|
`lowerdir=` becomes the simple colon-join of resolved lowerdirs
|
||||||
|
(the pre-2f6a9cf shape).
|
||||||
|
2. **Remove** the bind-umount loop in `cmd_umount` and the
|
||||||
|
`shutil.rmtree(idmap_dir, ...)` line.
|
||||||
|
3. **Remove** the `_is_mountpoint`, `_lookup_uid`, and `_get_user_ids`
|
||||||
|
helpers — no longer used. (Keep `os.path.ismount` for the merged
|
||||||
|
overlay check; that one's reliable.)
|
||||||
|
4. **Remove** the `LEFT4ME_TEST_*_UID/GID` test-only env-var stubs.
|
||||||
|
5. **Remove** the idmap PRINT_ONLY emission.
|
||||||
|
|
||||||
|
The helper shrinks back to the pre-idmap size (~242 lines from current 381).
|
||||||
|
|
||||||
|
### Tests
|
||||||
|
|
||||||
|
In `l4d2host/tests/test_overlay_helper.py`:
|
||||||
|
|
||||||
|
1. **Remove** `test_mount_idmaps_sandbox_owned_lowerdir`.
|
||||||
|
2. **Remove** `test_mount_skips_idmap_for_left4me_owned_lowerdir`.
|
||||||
|
3. **Remove** `test_umount_unwinds_idmap_binds`.
|
||||||
|
4. **Remove** `test_is_mountpoint_detects_same_fs_bind_mount` and the
|
||||||
|
`_load_helper_module` helper.
|
||||||
|
5. **Remove** `_setup_instance_with_uid` and the `FAKE_*_UID/GID`
|
||||||
|
constants.
|
||||||
|
6. **Remove** the `LEFT4ME_TEST_*` env-var injection in `_run`.
|
||||||
|
|
||||||
|
In `deploy/tests/test_deploy_artifacts.py`:
|
||||||
|
|
||||||
|
1. **Remove** `test_overlay_helper_idmaps_sandbox_owned_lowerdirs`
|
||||||
|
(the regression test for the soon-removed feature).
|
||||||
|
2. **Add** a new test `test_script_sandbox_uses_idmap_staging` that
|
||||||
|
asserts the sandbox helper contains:
|
||||||
|
- `--map-users=` and `--map-groups=` strings (the bind setup),
|
||||||
|
- `/var/lib/left4me/tmp/sandbox-idmap-` (the staging path prefix),
|
||||||
|
- `BindPaths="${STAGING}:/overlay"` (or close equivalent — point
|
||||||
|
the bind at the idmapped staging path, not at OVERLAY_DIR).
|
||||||
|
- A `trap` for cleanup.
|
||||||
|
3. **Remove** the existing `chown -R l4d2-sandbox` assertion in the
|
||||||
|
sandbox-helper test (if any).
|
||||||
|
|
||||||
|
### Migration
|
||||||
|
|
||||||
|
Existing overlays under `/var/lib/left4me/overlays/<id>/` are a mix:
|
||||||
|
|
||||||
|
- Workshop-built: already `left4me`-owned (no migration needed).
|
||||||
|
- Script-built (e.g. server 2's overlays 4 and 9): currently
|
||||||
|
`l4d2-sandbox`-owned from the prior helper version. **Need chown to
|
||||||
|
`left4me:left4me`.**
|
||||||
|
|
||||||
|
One-shot migration command on the test server (run before deploying
|
||||||
|
the new helpers, OR after — both work because the new script-sandbox
|
||||||
|
also expects left4me-owned dirs):
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo chown -R left4me:left4me /var/lib/left4me/overlays/
|
||||||
|
```
|
||||||
|
|
||||||
|
That's safe — overlays/* are all overlay content, no other tenants.
|
||||||
|
The workshop ones are already left4me; the chown is a no-op for them.
|
||||||
|
The script-built ones get flipped to the new ownership model.
|
||||||
|
|
||||||
|
Running gameservers using the old idmap-bind setup will keep working
|
||||||
|
on the old overlays/<id> files (which they bind via the now-orphan
|
||||||
|
idmap bind that's already in place). The next stop/start cycle picks
|
||||||
|
up the new helper, which:
|
||||||
|
|
||||||
|
- Doesn't create any new idmap binds (gameserver-side helper has
|
||||||
|
none),
|
||||||
|
- Cleans up the legacy idmap binds it finds (the existing umount loop
|
||||||
|
in the current helper handles this on the way out).
|
||||||
|
|
||||||
|
After the first stop/start cycle, no more idmap binds exist anywhere
|
||||||
|
in the system. Steady state.
|
||||||
|
|
||||||
|
### ckn-bw bundle
|
||||||
|
|
||||||
|
No changes needed. The `install_left4me_scripts` action picks up the
|
||||||
|
new helper contents from `/opt/left4me/src/deploy/files/usr/local/...`
|
||||||
|
on the next `git_deploy` apply. ckn-bw itself is content-agnostic
|
||||||
|
about the helper internals.
|
||||||
|
|
||||||
|
## Files to modify
|
||||||
|
|
||||||
|
- `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` — add
|
||||||
|
idmap bind setup + trap cleanup; remove old chown; switch BindPaths.
|
||||||
|
- `deploy/files/usr/local/libexec/left4me/left4me-overlay` — revert the
|
||||||
|
~140 lines of idmap-handling code; remove uid lookup, mountinfo
|
||||||
|
helper, test-stub env vars; drop the idmap PRINT_ONLY emission.
|
||||||
|
- `l4d2host/tests/test_overlay_helper.py` — drop idmap tests and
|
||||||
|
helpers.
|
||||||
|
- `deploy/tests/test_deploy_artifacts.py` — flip the asserted
|
||||||
|
invariant (helper has idmap → sandbox has idmap).
|
||||||
|
|
||||||
|
## Verification
|
||||||
|
|
||||||
|
End-to-end on `left4.me`:
|
||||||
|
|
||||||
|
1. Push left4me commit, `bw apply ovh.left4me`.
|
||||||
|
2. `sudo chown -R left4me:left4me /var/lib/left4me/overlays/` (one-shot
|
||||||
|
migration).
|
||||||
|
3. `sudo systemctl restart left4me-server@2`.
|
||||||
|
4. `sudo findmnt --task 1 -o TARGET | grep runtime/2` — expect *only*
|
||||||
|
`runtime/2/merged`, no `idmap/*` subdirs.
|
||||||
|
5. `sudo ls -ln /var/lib/left4me/overlays/9/` and a couple of other
|
||||||
|
script overlays — expect `left4me:left4me`.
|
||||||
|
6. Trigger an overlay rebuild from the web UI on a script overlay.
|
||||||
|
Confirm the build succeeds and the resulting files are
|
||||||
|
left4me-owned on disk.
|
||||||
|
7. `sudo -u left4me touch
|
||||||
|
/var/lib/left4me/runtime/2/merged/left4dead2/addons/sourcemod/logs/test.log`
|
||||||
|
— expect write to succeed (verifies SM logging path still works).
|
||||||
|
8. RCON `sm_cvar nb_update_frequency 0.0333` — no permission-denied
|
||||||
|
line in `journalctl -u left4me-server@2`.
|
||||||
|
|
||||||
|
Local tests:
|
||||||
|
|
||||||
|
```
|
||||||
|
pytest l4d2host/tests/test_overlay_helper.py -q
|
||||||
|
pytest deploy/tests/test_deploy_artifacts.py -q
|
||||||
|
```
|
||||||
|
|
||||||
|
Both should pass with reduced test count (removed idmap-on-mount
|
||||||
|
tests, added one sandbox-helper assertion).
|
||||||
|
|
||||||
|
## Risks
|
||||||
|
|
||||||
|
- **Kernel version dependency**: idmap propagation through plain
|
||||||
|
re-bind was verified on 6.12.86. Older kernels may behave
|
||||||
|
differently. ovh.left4me is on Trixie's 6.12, so we're fine; future
|
||||||
|
hosts on older kernels would need verification. Document the kernel
|
||||||
|
floor (≥ 6.6 for overlayfs+idmap, but ≥ 6.x for the propagation —
|
||||||
|
we have no exact lower bound documented).
|
||||||
|
- **Stale idmap binds during migration**: server 2 currently has two
|
||||||
|
active gameserver-side idmap binds (`runtime/2/idmap/overlays_4`
|
||||||
|
and `overlays_9`). The first stop after deploy uses the existing
|
||||||
|
helper code (with `_is_mountpoint` fix) to umount them. Verified
|
||||||
|
in the recent fix cycle. New starts won't create new binds.
|
||||||
|
- **Sandbox migration of in-flight builds**: if a script-overlay
|
||||||
|
build is running during the deploy + chown migration, the chown
|
||||||
|
could happen mid-write. Mitigation: don't run the chown while a
|
||||||
|
build is active; check via `systemctl list-units
|
||||||
|
'left4me-script-*'` first.
|
||||||
|
- **The trap-based cleanup in bash**: if the helper is hit with
|
||||||
|
SIGKILL, the trap doesn't fire and the staging bind leaks. Same
|
||||||
|
exposure as today's leaks (gameserver-side stale binds on similar
|
||||||
|
scenarios). Acceptable; the next sandbox run for the same overlay
|
||||||
|
id `umount`s the leftover bind first via the trap setup pattern
|
||||||
|
(`umount; rmdir; mkdir -p; mount --bind` is idempotent).
|
||||||
|
|
||||||
|
## Why this is worth doing despite the working current solution
|
||||||
|
|
||||||
|
Today's idmap-on-mount works and is correct. The reasons to refactor:
|
||||||
|
|
||||||
|
- **Architectural locality**: the uid translation is a build-time
|
||||||
|
concern (the sandbox creates files); having it as a mount-time
|
||||||
|
concern means the gameserver path needs to know about a producer-
|
||||||
|
side decision.
|
||||||
|
- **Code reduction**: helper shrinks by ~140 lines; tests by ~150.
|
||||||
|
Removed code is removed bug surface.
|
||||||
|
- **On-disk consistency**: all overlay content becomes `left4me`-
|
||||||
|
owned. Easier to reason about (no two-tier ownership), easier to
|
||||||
|
manually inspect (no per-overlay-type ownership).
|
||||||
|
- **Mount lifecycle simplification**: no per-instance idmap dir
|
||||||
|
creation, no per-start uid lookups, no per-stop bind teardown, no
|
||||||
|
stacked-bind regression hazard from the same-fs `os.path.ismount`
|
||||||
|
trap (we already fixed that once).
|
||||||
|
- **Web app read path**: drops the world-read chmod kludge in the
|
||||||
|
sandbox helper. File-tree download endpoint reads via primary uid.
|
||||||
|
|
||||||
|
The cost (refactor + migration) is paid once; the benefit is
|
||||||
|
permanent.
|
||||||
|
|
||||||
|
## Out of scope
|
||||||
|
|
||||||
|
- Splitting the web-app uid from the gameserver uid (future change
|
||||||
|
noted in earlier plans).
|
||||||
|
- Rewriting shell helpers in Python.
|
||||||
|
- `left4me-apply-cake` cleanup (still drifting along in the install
|
||||||
|
glob).
|
||||||
|
- Re-examining whether `l4d2-sandbox` should exist as a separate uid
|
||||||
|
at all (this plan keeps it, but the cost-benefit might shift
|
||||||
|
later).
|
||||||
Loading…
Reference in a new issue