# Build-time idmap: move the uid translation from the gameserver mount into the script sandbox ## Context The current idmap implementation translates uids at **gameserver mount time**: `left4me-overlay` stats each lowerdir, creates a per-lowerdir idmapped bind under `runtime//idmap/` for the sandbox- owned ones, then uses those bind paths in the overlay's `lowerdir=`. On stop, the binds get torn down. Works correctly today, but spreads the idmap concern across two helpers and adds mount lifecycle code on every gameserver start. Cleaner alternative: do the idmap translation at **script-sandbox build time**, so files land on disk as `left4me`-owned. The on-disk state then matches workshop-built overlays (also left4me-owned), and the gameserver mount path becomes uniform — no per-lowerdir stat, no idmap binds, no extra cleanup. This plan switches the architecture to the build-time approach and reverts the gameserver-mount idmap code. ## Verified mechanism Tested end-to-end on `left4.me` (Trixie, kernel 6.12.86, ext4) on 2026-05-15: 1. `/source/` dir owned by `left4me` on disk. 2. `mount --bind --map-users=980:981:1 --map-groups=980:981:1 /source /idmapped` — inside `/idmapped`, files appear as uid 981 (sandbox view). 3. `mount --bind /idmapped /rebound` — a plain second bind. The idmap **propagates** to `/rebound` (rebound view also shows uid 981). This is what `BindPaths=` in the sandbox unit does. 4. `sudo -u l4d2-sandbox touch /rebound/x.txt` — write **succeeds**. The file lands on disk owned by `left4me` (uid 980). Map direction is the inverse of the gameserver-side map: `--map-users=::1` where disk is `left4me` and mount-side is `l4d2-sandbox`. Inside the bind, the sandbox uid sees its own uid as itself; writes from that uid get translated back to the disk-side (left4me) for storage. ## Approach ### Script-sandbox helper (`deploy/files/usr/local/libexec/left4me/left4me-script-sandbox`) Pre-create an idmapped bind staging path, point the sandbox's BindPaths at it, clean up on exit. Concretely: 1. **Remove** the existing `chown -R l4d2-sandbox:l4d2-sandbox "$OVERLAY_DIR"` and `chmod 0755` lines. The overlay dir stays `left4me`-owned (web app's creation default). 2. **Add** a setup block before `systemd-run`: ```bash STAGING=/var/lib/left4me/tmp/sandbox-idmap-${OVERLAY_ID} trap 'umount "$STAGING" 2>/dev/null || true; rmdir "$STAGING" 2>/dev/null || true' EXIT mkdir -p "$STAGING" mount --bind \ --map-users=$(id -u left4me):$(id -u l4d2-sandbox):1 \ --map-groups=$(id -g left4me):$(id -g l4d2-sandbox):1 \ "$OVERLAY_DIR" "$STAGING" ``` 3. **Change** the systemd-run line: - `BindPaths="${OVERLAY_DIR}:/overlay"` → `BindPaths="${STAGING}:/overlay"` 4. **Remove** the post-build `find ... chmod o+r` block. Files end up left4me-owned, web app reads them via its primary uid. The world-read kludge was only needed because of the old sandbox- owned files; with this change it's obsolete. `trap` ensures the staging bind is umounted even on errors / signals. Idempotent: if the helper is re-run, `umount + rmdir` handle existing state, and `mkdir -p` + `mount --bind` over an existing mountpoint adds another bind that the next exit cleans up. The kernel 6.12 bind nesting on the same path works fine (verified during the recent gameserver-side idmap fix). ### Gameserver-mount helper (`deploy/files/usr/local/libexec/left4me/left4me-overlay`) Revert the idmap logic added in commit `2f6a9cf` (+ fix in `9053186`, + mountpoint-detection fix in `dd918ac`). Specifically: 1. **Remove** the per-lowerdir stat + idmap-decision loop in `cmd_mount`. `lowerdir=` becomes the simple colon-join of resolved lowerdirs (the pre-2f6a9cf shape). 2. **Remove** the bind-umount loop in `cmd_umount` and the `shutil.rmtree(idmap_dir, ...)` line. 3. **Remove** the `_is_mountpoint`, `_lookup_uid`, and `_get_user_ids` helpers — no longer used. (Keep `os.path.ismount` for the merged overlay check; that one's reliable.) 4. **Remove** the `LEFT4ME_TEST_*_UID/GID` test-only env-var stubs. 5. **Remove** the idmap PRINT_ONLY emission. The helper shrinks back to the pre-idmap size (~242 lines from current 381). ### Tests In `l4d2host/tests/test_overlay_helper.py`: 1. **Remove** `test_mount_idmaps_sandbox_owned_lowerdir`. 2. **Remove** `test_mount_skips_idmap_for_left4me_owned_lowerdir`. 3. **Remove** `test_umount_unwinds_idmap_binds`. 4. **Remove** `test_is_mountpoint_detects_same_fs_bind_mount` and the `_load_helper_module` helper. 5. **Remove** `_setup_instance_with_uid` and the `FAKE_*_UID/GID` constants. 6. **Remove** the `LEFT4ME_TEST_*` env-var injection in `_run`. In `deploy/tests/test_deploy_artifacts.py`: 1. **Remove** `test_overlay_helper_idmaps_sandbox_owned_lowerdirs` (the regression test for the soon-removed feature). 2. **Add** a new test `test_script_sandbox_uses_idmap_staging` that asserts the sandbox helper contains: - `--map-users=` and `--map-groups=` strings (the bind setup), - `/var/lib/left4me/tmp/sandbox-idmap-` (the staging path prefix), - `BindPaths="${STAGING}:/overlay"` (or close equivalent — point the bind at the idmapped staging path, not at OVERLAY_DIR). - A `trap` for cleanup. 3. **Remove** the existing `chown -R l4d2-sandbox` assertion in the sandbox-helper test (if any). ### Migration Existing overlays under `/var/lib/left4me/overlays//` are a mix: - Workshop-built: already `left4me`-owned (no migration needed). - Script-built (e.g. server 2's overlays 4 and 9): currently `l4d2-sandbox`-owned from the prior helper version. **Need chown to `left4me:left4me`.** One-shot migration command on the test server (run before deploying the new helpers, OR after — both work because the new script-sandbox also expects left4me-owned dirs): ```bash sudo chown -R left4me:left4me /var/lib/left4me/overlays/ ``` That's safe — overlays/* are all overlay content, no other tenants. The workshop ones are already left4me; the chown is a no-op for them. The script-built ones get flipped to the new ownership model. Running gameservers using the old idmap-bind setup will keep working on the old overlays/ files (which they bind via the now-orphan idmap bind that's already in place). The next stop/start cycle picks up the new helper, which: - Doesn't create any new idmap binds (gameserver-side helper has none), - Cleans up the legacy idmap binds it finds (the existing umount loop in the current helper handles this on the way out). After the first stop/start cycle, no more idmap binds exist anywhere in the system. Steady state. ### ckn-bw bundle No changes needed. The `install_left4me_scripts` action picks up the new helper contents from `/opt/left4me/src/deploy/files/usr/local/...` on the next `git_deploy` apply. ckn-bw itself is content-agnostic about the helper internals. ## Files to modify - `deploy/files/usr/local/libexec/left4me/left4me-script-sandbox` — add idmap bind setup + trap cleanup; remove old chown; switch BindPaths. - `deploy/files/usr/local/libexec/left4me/left4me-overlay` — revert the ~140 lines of idmap-handling code; remove uid lookup, mountinfo helper, test-stub env vars; drop the idmap PRINT_ONLY emission. - `l4d2host/tests/test_overlay_helper.py` — drop idmap tests and helpers. - `deploy/tests/test_deploy_artifacts.py` — flip the asserted invariant (helper has idmap → sandbox has idmap). ## Verification End-to-end on `left4.me`: 1. Push left4me commit, `bw apply ovh.left4me`. 2. `sudo chown -R left4me:left4me /var/lib/left4me/overlays/` (one-shot migration). 3. `sudo systemctl restart left4me-server@2`. 4. `sudo findmnt --task 1 -o TARGET | grep runtime/2` — expect *only* `runtime/2/merged`, no `idmap/*` subdirs. 5. `sudo ls -ln /var/lib/left4me/overlays/9/` and a couple of other script overlays — expect `left4me:left4me`. 6. Trigger an overlay rebuild from the web UI on a script overlay. Confirm the build succeeds and the resulting files are left4me-owned on disk. 7. `sudo -u left4me touch /var/lib/left4me/runtime/2/merged/left4dead2/addons/sourcemod/logs/test.log` — expect write to succeed (verifies SM logging path still works). 8. RCON `sm_cvar nb_update_frequency 0.0333` — no permission-denied line in `journalctl -u left4me-server@2`. Local tests: ``` pytest l4d2host/tests/test_overlay_helper.py -q pytest deploy/tests/test_deploy_artifacts.py -q ``` Both should pass with reduced test count (removed idmap-on-mount tests, added one sandbox-helper assertion). ## Risks - **Kernel version dependency**: idmap propagation through plain re-bind was verified on 6.12.86. Older kernels may behave differently. ovh.left4me is on Trixie's 6.12, so we're fine; future hosts on older kernels would need verification. Document the kernel floor (≥ 6.6 for overlayfs+idmap, but ≥ 6.x for the propagation — we have no exact lower bound documented). - **Stale idmap binds during migration**: server 2 currently has two active gameserver-side idmap binds (`runtime/2/idmap/overlays_4` and `overlays_9`). The first stop after deploy uses the existing helper code (with `_is_mountpoint` fix) to umount them. Verified in the recent fix cycle. New starts won't create new binds. - **Sandbox migration of in-flight builds**: if a script-overlay build is running during the deploy + chown migration, the chown could happen mid-write. Mitigation: don't run the chown while a build is active; check via `systemctl list-units 'left4me-script-*'` first. - **The trap-based cleanup in bash**: if the helper is hit with SIGKILL, the trap doesn't fire and the staging bind leaks. Same exposure as today's leaks (gameserver-side stale binds on similar scenarios). Acceptable; the next sandbox run for the same overlay id `umount`s the leftover bind first via the trap setup pattern (`umount; rmdir; mkdir -p; mount --bind` is idempotent). ## Why this is worth doing despite the working current solution Today's idmap-on-mount works and is correct. The reasons to refactor: - **Architectural locality**: the uid translation is a build-time concern (the sandbox creates files); having it as a mount-time concern means the gameserver path needs to know about a producer- side decision. - **Code reduction**: helper shrinks by ~140 lines; tests by ~150. Removed code is removed bug surface. - **On-disk consistency**: all overlay content becomes `left4me`- owned. Easier to reason about (no two-tier ownership), easier to manually inspect (no per-overlay-type ownership). - **Mount lifecycle simplification**: no per-instance idmap dir creation, no per-start uid lookups, no per-stop bind teardown, no stacked-bind regression hazard from the same-fs `os.path.ismount` trap (we already fixed that once). - **Web app read path**: drops the world-read chmod kludge in the sandbox helper. File-tree download endpoint reads via primary uid. The cost (refactor + migration) is paid once; the benefit is permanent. ## Out of scope - Splitting the web-app uid from the gameserver uid (future change noted in earlier plans). - Rewriting shell helpers in Python. - `left4me-apply-cake` cleanup (still drifting along in the install glob). - Re-examining whether `l4d2-sandbox` should exist as a separate uid at all (this plan keeps it, but the cost-benefit might shift later).