Architectural cleanup: the uid translation is a build-time concern (the sandbox produces sandbox-uid files); having the gameserver path unwind that producer-side decision on every mount means the mount helper carries idmap lifecycle code it shouldn't need. Moving the idmap into the script-sandbox bind makes files land left4me-owned on disk, drops ~140 lines from left4me-overlay, and makes all overlay content (workshop + script-built) consistent on-disk. Verified on left4.me kernel 6.12.86 that the kernel idmap propagates through plain re-bind, so systemd-run's BindPaths can wrap a pre-created idmapped staging path. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Build-time idmap: move the uid translation from the gameserver mount
into the script sandbox
Context
The current idmap implementation translates uids at gameserver mount
time: left4me-overlay stats each lowerdir, creates a per-lowerdir
idmapped bind under runtime/<n>/idmap/<basename> for the sandbox-
owned ones, then uses those bind paths in the overlay's lowerdir=.
On stop, the binds get torn down. Works correctly today, but spreads
the idmap concern across two helpers and adds mount lifecycle code on
every gameserver start.
Cleaner alternative: do the idmap translation at script-sandbox
build time, so files land on disk as left4me-owned. The on-disk
state then matches workshop-built overlays (also left4me-owned), and
the gameserver mount path becomes uniform — no per-lowerdir stat,
no idmap binds, no extra cleanup.
This plan switches the architecture to the build-time approach and reverts the gameserver-mount idmap code.
Verified mechanism
Tested end-to-end on left4.me (Trixie, kernel 6.12.86, ext4) on
2026-05-15:
- /source/ dir owned by left4me on disk.
- mount --bind --map-users=980:981:1 --map-groups=980:981:1 /source /idmapped: inside /idmapped, files appear as uid 981 (sandbox view).
- mount --bind /idmapped /rebound: a plain second bind. The idmap propagates to /rebound (rebound view also shows uid 981). This is what BindPaths= in the sandbox unit does.
- sudo -u l4d2-sandbox touch /rebound/x.txt: write succeeds. The file lands on disk owned by left4me (uid 980).
Map direction is the inverse of the gameserver-side map:
--map-users=<disk_uid>:<mount_uid>:1 where disk is left4me and
mount-side is l4d2-sandbox. Inside the bind, the sandbox uid sees the
files as owned by itself; writes from that uid get translated back to
the disk-side uid (left4me) for storage.
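To make the map direction concrete, here is a minimal Python sketch that builds the two flags in the <disk_id>:<mount_id>:1 order described above. The function name idmap_bind_args is hypothetical and not part of either helper; the ids 980/981 are the ones from the verification run.

```python
def idmap_bind_args(disk_uid: int, disk_gid: int,
                    mount_uid: int, mount_gid: int) -> list[str]:
    """Build the idmap flags for `mount --bind` (hypothetical helper).

    Order is <disk id>:<mount-side id>:<range>. A range of 1 maps
    exactly one uid and one gid.
    """
    return [
        f"--map-users={disk_uid}:{mount_uid}:1",
        f"--map-groups={disk_gid}:{mount_gid}:1",
    ]


# left4me is 980 on disk, l4d2-sandbox is 981 inside the bind.
print(" ".join(idmap_bind_args(980, 980, 981, 981)))
```

Swapping the disk and mount-side arguments would give the gameserver-side direction that this plan removes.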
Approach
Script-sandbox helper (deploy/files/usr/local/libexec/left4me/left4me-script-sandbox)
Pre-create an idmapped bind staging path, point the sandbox's BindPaths at it, clean up on exit. Concretely:
- Remove the existing chown -R l4d2-sandbox:l4d2-sandbox "$OVERLAY_DIR" and chmod 0755 lines. The overlay dir stays left4me-owned (web app's creation default).
- Add a setup block before systemd-run:

    STAGING=/var/lib/left4me/tmp/sandbox-idmap-${OVERLAY_ID}
    trap 'umount "$STAGING" 2>/dev/null || true; rmdir "$STAGING" 2>/dev/null || true' EXIT
    mkdir -p "$STAGING"
    mount --bind \
      --map-users=$(id -u left4me):$(id -u l4d2-sandbox):1 \
      --map-groups=$(id -g left4me):$(id -g l4d2-sandbox):1 \
      "$OVERLAY_DIR" "$STAGING"

- Change the systemd-run line: BindPaths="${OVERLAY_DIR}:/overlay" → BindPaths="${STAGING}:/overlay"
- Remove the post-build find ... chmod o+r block. Files end up left4me-owned, web app reads them via its primary uid. The world-read kludge was only needed because of the old sandbox-owned files; with this change it's obsolete.
trap ensures the staging bind is umounted even on errors / signals.
Idempotent: if the helper is re-run, umount + rmdir handle existing
state, and mkdir -p + mount --bind over an existing mountpoint
adds another bind that the next exit cleans up. The kernel 6.12 bind
nesting on the same path works fine (verified during the recent
gameserver-side idmap fix).
Gameserver-mount helper (deploy/files/usr/local/libexec/left4me/left4me-overlay)
Revert the idmap logic added in commit 2f6a9cf (+ fix in 9053186,
+ mountpoint-detection fix in dd918ac). Specifically:
- Remove the per-lowerdir stat + idmap-decision loop in cmd_mount. lowerdir= becomes the simple colon-join of resolved lowerdirs (the pre-2f6a9cf shape).
- Remove the bind-umount loop in cmd_umount and the shutil.rmtree(idmap_dir, ...) line.
- Remove the _is_mountpoint, _lookup_uid, and _get_user_ids helpers, which are no longer used. (Keep os.path.ismount for the merged overlay check; that one's reliable.)
- Remove the LEFT4ME_TEST_*_UID/GID test-only env-var stubs.
- Remove the idmap PRINT_ONLY emission.
The helper shrinks back to the pre-idmap size (~242 lines from current 381).
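As an illustration of the "simple colon-join" shape, a minimal sketch follows. The function name lowerdir_option and the example paths are assumptions for illustration; the actual pre-2f6a9cf code in left4me-overlay may differ.

```python
def lowerdir_option(lowerdirs: list[str]) -> str:
    """Join already-resolved lowerdirs into an overlayfs lowerdir= option.

    Hypothetical sketch: no per-lowerdir stat and no idmap decision,
    every lowerdir is used as-is.
    """
    for path in lowerdirs:
        # ':' separates lowerdirs and ',' separates mount options,
        # so a path containing either would corrupt the option string.
        if ":" in path or "," in path:
            raise ValueError(f"unsupported character in lowerdir: {path!r}")
    return "lowerdir=" + ":".join(lowerdirs)


print(lowerdir_option([
    "/var/lib/left4me/overlays/4",
    "/var/lib/left4me/overlays/9",
]))
```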
Tests
In l4d2host/tests/test_overlay_helper.py:
- Remove test_mount_idmaps_sandbox_owned_lowerdir.
- Remove test_mount_skips_idmap_for_left4me_owned_lowerdir.
- Remove test_umount_unwinds_idmap_binds.
- Remove test_is_mountpoint_detects_same_fs_bind_mount and the _load_helper_module helper.
- Remove _setup_instance_with_uid and the FAKE_*_UID/GID constants.
- Remove the LEFT4ME_TEST_* env-var injection in _run.
In deploy/tests/test_deploy_artifacts.py:
- Remove test_overlay_helper_idmaps_sandbox_owned_lowerdirs (the regression test for the soon-removed feature).
- Add a new test test_script_sandbox_uses_idmap_staging that asserts the sandbox helper contains:
  - --map-users= and --map-groups= strings (the bind setup),
  - /var/lib/left4me/tmp/sandbox-idmap- (the staging path prefix),
  - BindPaths="${STAGING}:/overlay" (or close equivalent: point the bind at the idmapped staging path, not at OVERLAY_DIR),
  - a trap for cleanup.
- Remove the existing chown -R l4d2-sandbox assertion in the sandbox-helper test (if any).
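One possible shape for the new test, as a sketch only. SAMPLE_HELPER stands in for the real helper text, which the actual test would read from deploy/files/usr/local/libexec/left4me/left4me-script-sandbox; REQUIRED_SNIPPETS and missing_snippets are hypothetical names, and the systemd-run line in the sample is illustrative rather than the helper's real invocation.

```python
# Strings the sandbox helper is expected to contain after this change.
REQUIRED_SNIPPETS = (
    "--map-users=",
    "--map-groups=",
    "/var/lib/left4me/tmp/sandbox-idmap-",
    'BindPaths="${STAGING}:/overlay"',
    "trap ",
)

# Stand-in for the helper file contents (assumption, for a runnable sketch).
SAMPLE_HELPER = r'''
STAGING=/var/lib/left4me/tmp/sandbox-idmap-${OVERLAY_ID}
trap 'umount "$STAGING" 2>/dev/null || true' EXIT
mkdir -p "$STAGING"
mount --bind --map-users=980:981:1 --map-groups=980:981:1 "$OVERLAY_DIR" "$STAGING"
systemd-run -p BindPaths="${STAGING}:/overlay" /bin/true
'''


def missing_snippets(helper_text: str) -> list[str]:
    """Return the required snippets that do not occur in helper_text."""
    return [s for s in REQUIRED_SNIPPETS if s not in helper_text]


def test_script_sandbox_uses_idmap_staging():
    # The real test would load the helper from the deploy tree instead.
    assert missing_snippets(SAMPLE_HELPER) == []
```

Reporting the missing snippets as a list keeps the failure message readable when more than one invariant regresses at once.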
Migration
Existing overlays under /var/lib/left4me/overlays/<id>/ are a mix:
- Workshop-built: already left4me-owned (no migration needed).
- Script-built (e.g. server 2's overlays 4 and 9): currently l4d2-sandbox-owned from the prior helper version. Need chown to left4me:left4me.
One-shot migration command on the test server (run before deploying the new helpers, OR after — both work because the new script-sandbox also expects left4me-owned dirs):
sudo chown -R left4me:left4me /var/lib/left4me/overlays/
That's safe — overlays/* are all overlay content, no other tenants. The workshop ones are already left4me; the chown is a no-op for them. The script-built ones get flipped to the new ownership model.
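To preview what the chown will touch, or to confirm afterwards that nothing was missed, a read-only check along these lines could be used. This is a sketch; not_owned_by is a hypothetical name and is not part of the plan's tooling.

```python
import os


def not_owned_by(root: str, uid: int) -> list[str]:
    """Return every path under root (root included) whose owner is not uid.

    Uses lstat so symlinks are reported by their own ownership rather
    than by the ownership of their target.
    """
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    return sorted(p for p in paths if os.lstat(p).st_uid != uid)


# Example (not executed here):
#   import pwd
#   uid = pwd.getpwnam("left4me").pw_uid
#   not_owned_by("/var/lib/left4me/overlays", uid)
# An empty list after the migration means every overlay is left4me-owned.
```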
Running gameservers using the old idmap-bind setup will keep working on the old overlays/ files (which they bind via the now-orphan idmap bind that's already in place). The next stop/start cycle picks up the new helper, which:
- Doesn't create any new idmap binds (gameserver-side helper has none),
- Cleans up the legacy idmap binds it finds (the existing umount loop in the current helper handles this on the way out).
After the first stop/start cycle, no more idmap binds exist anywhere in the system. Steady state.
ckn-bw bundle
No changes needed. The install_left4me_scripts action picks up the
new helper contents from /opt/left4me/src/deploy/files/usr/local/...
on the next git_deploy apply. ckn-bw itself is content-agnostic
about the helper internals.
Files to modify
- deploy/files/usr/local/libexec/left4me/left4me-script-sandbox: add idmap bind setup + trap cleanup; remove old chown; switch BindPaths.
- deploy/files/usr/local/libexec/left4me/left4me-overlay: revert the ~140 lines of idmap-handling code; remove uid lookup, mountinfo helper, test-stub env vars; drop the idmap PRINT_ONLY emission.
- l4d2host/tests/test_overlay_helper.py: drop idmap tests and helpers.
- deploy/tests/test_deploy_artifacts.py: flip the asserted invariant (helper has idmap → sandbox has idmap).
Verification
End-to-end on left4.me:
- Push left4me commit, bw apply ovh.left4me.
- sudo chown -R left4me:left4me /var/lib/left4me/overlays/ (one-shot migration).
- sudo systemctl restart left4me-server@2.
- sudo findmnt --task 1 -o TARGET | grep runtime/2: expect only runtime/2/merged, no idmap/* subdirs.
- sudo ls -ln /var/lib/left4me/overlays/9/ and a couple of other script overlays: expect left4me:left4me (shown as numeric ids because of -n).
- Trigger an overlay rebuild from the web UI on a script overlay. Confirm the build succeeds and the resulting files are left4me-owned on disk.
- sudo -u left4me touch /var/lib/left4me/runtime/2/merged/left4dead2/addons/sourcemod/logs/test.log: expect write to succeed (verifies SM logging path still works).
- RCON sm_cvar nb_update_frequency 0.0333: no permission-denied line in journalctl -u left4me-server@2.
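The findmnt check can also be scripted. The sketch below filters the TARGET column text for leftover per-instance idmap binds; leftover_idmap_mounts is a hypothetical name, and the sample paths assume the runtime dir lives under /var/lib/left4me/runtime/<n>/ as in the touch step above.

```python
def leftover_idmap_mounts(findmnt_targets: str, instance: int) -> list[str]:
    """Return mount targets under runtime/<instance>/idmap/.

    findmnt_targets is the text printed by `findmnt -o TARGET`,
    one mount target per line.
    """
    marker = f"/runtime/{instance}/idmap/"
    return [
        line.strip()
        for line in findmnt_targets.splitlines()
        if marker in line
    ]


sample = """/var/lib/left4me/runtime/2/merged
/var/lib/left4me/runtime/2/idmap/overlays_4
"""
print(leftover_idmap_mounts(sample, 2))
```

After the first stop/start cycle the function should return an empty list for every instance.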
Local tests:
pytest l4d2host/tests/test_overlay_helper.py -q
pytest deploy/tests/test_deploy_artifacts.py -q
Both should pass with reduced test count (removed idmap-on-mount tests, added one sandbox-helper assertion).
Risks
- Kernel version dependency: idmap propagation through plain re-bind was verified on 6.12.86. Older kernels may behave differently. ovh.left4me is on Trixie's 6.12, so we're fine; future hosts on older kernels would need verification. Document the kernel floor (≥ 6.6 for overlayfs+idmap, but ≥ 6.x for the propagation — we have no exact lower bound documented).
- Stale idmap binds during migration: server 2 currently has two active gameserver-side idmap binds (runtime/2/idmap/overlays_4 and overlays_9). The first stop after deploy uses the existing helper code (with _is_mountpoint fix) to umount them. Verified in the recent fix cycle. New starts won't create new binds.
- Sandbox migration of in-flight builds: if a script-overlay build is running during the deploy + chown migration, the chown could happen mid-write. Mitigation: don't run the chown while a build is active; check via systemctl list-units 'left4me-script-*' first.
- The trap-based cleanup in bash: if the helper is hit with SIGKILL, the trap doesn't fire and the staging bind leaks. Same exposure as today's leaks (gameserver-side stale binds on similar scenarios). Acceptable; the next sandbox run for the same overlay id umounts the leftover bind first via the trap setup pattern (umount; rmdir; mkdir -p; mount --bind is idempotent).
Why this is worth doing despite the working current solution
Today's idmap-on-mount works and is correct. The reasons to refactor:
- Architectural locality: the uid translation is a build-time concern (the sandbox creates files); having it as a mount-time concern means the gameserver path needs to know about a producer-side decision.
- Code reduction: helper shrinks by ~140 lines; tests by ~150. Removed code is removed bug surface.
- On-disk consistency: all overlay content becomes left4me-owned. Easier to reason about (no two-tier ownership), easier to manually inspect (no per-overlay-type ownership).
- Mount lifecycle simplification: no per-instance idmap dir creation, no per-start uid lookups, no per-stop bind teardown, no stacked-bind regression hazard from the same-fs os.path.ismount trap (we already fixed that once).
- Web app read path: drops the world-read chmod kludge in the sandbox helper. File-tree download endpoint reads via primary uid.
The cost (refactor + migration) is paid once; the benefit is permanent.
Out of scope
- Splitting the web-app uid from the gameserver uid (future change noted in earlier plans).
- Rewriting shell helpers in Python.
- left4me-apply-cake cleanup (still drifting along in the install glob).
- Re-examining whether l4d2-sandbox should exist as a separate uid at all (this plan keeps it, but the cost-benefit might shift later).