diff --git a/docs/superpowers/plans/2026-05-09-overlay-umount-helper-namespace-pin.md b/docs/superpowers/plans/2026-05-09-overlay-umount-helper-namespace-pin.md
new file mode 100644
index 0000000..ffe5811
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-09-overlay-umount-helper-namespace-pin.md
@@ -0,0 +1,161 @@
+# Overlay umount helper was pinning the unit's mount namespace alive
+
+> **Status:** fixed in `5eac51a` (helper nsenter wrap) and `87d56a0`
+> (modal delegation). This doc is a postmortem so future maintainers
+> don't walk the same path.
+
+## Symptom
+
+After commit `936c8bb` ("ExecStart srcds_run from merged overlay,
+not installation/"), every Reset job started failing:
+
+```
+OSError: [Errno 16] Device or resource busy:
+  '/var/lib/left4me/runtime/<id>/merged'
+```
+
+`shutil.rmtree(runtime_dir)` in `_purge_instance` tripped on the
+still-mounted `merged/`. The unit's `ExecStopPost` had run the umount
+helper, the helper had returned non-zero, the unit went `failed`, and
+the rmtree downstream couldn't proceed.
+
+## False starts (don't repeat these)
+
+We initially modeled this as an unavoidable kernel-level race between
+ExecStopPost and the deferred reaping of the unit's per-service mount
+namespace. The "fixes" applied in that frame:
+
+1. **Eager-retry loop in `cmd_umount`** (started at 4 s deadline,
+   bumped to 12 s, then 25 s). Each bump worked sometimes and broke
+   sometimes — because we were timing the helper's own life, not the
+   kernel's reaping (see root cause).
+2. **Lazy-umount (`umount -l`) fallback** if eager retries exhausted.
+   This *would* have made the unit not go `failed`, but it left
+   `work/work` half-finalized and just moved the EBUSY downstream.
+3. **`TimeoutStopSec=15s` → `60s`** to give ExecStopPost more retry
+   room. This made Stop sit in "stopping" for tens of seconds.
+
+All three workarounds shipped to the test box and were reverted in
+`5eac51a` once we found the actual cause.
+
+## Root cause
+
+A live empirical probe (`/tmp/probe-umount2.sh` on the test box,
+polling `/proc/*/ns/mnt` while a stop was in flight) showed:
+
+```
+[t= 0.00] mounted=Y holders=[]
+[t= 2.27] mounted=Y holders=[35259(left4me-overlay) ]
+[t= 4.53] mounted=Y holders=[35259(left4me-overlay) ]
+[t= …  ]                          (steady for ~22 s)
+[t=22.97] mounted=Y holders=[35259(left4me-overlay) ]
+[t=25.22] mounted=N holders=[]    ← helper finally exited
+```
+
+The single PID holding a reference to the unit's dying mount namespace
+was **our own umount helper** running as ExecStopPost. The EBUSY
+window matched the helper's retry budget exactly. The mount became
+unmountable the moment the helper exited.
+
+### Why the helper was holding the namespace
+
+systemd's `+` Exec prefix removes sandbox & credentials, but does
+**not** detach from the unit's per-service mount namespace (created
+by `PrivateTmp=true` + `Protect*` directives). The Python interpreter
+that runs `left4me-overlay` was launched inside the unit's namespace.
+
+Inside the helper we did:
+
+```python
+subprocess.run([NSENTER, "--mount=/proc/1/ns/mnt", "--", UMOUNT_BIN, ...])
+```
+
+That nsenter put the *child process* (the umount syscall) in PID 1's
+namespace — but the *parent process* (the helper Python interpreter)
+never left the unit's namespace. As long as the helper was alive, it
+held a reference to that namespace, which kept the slave-mount tree
+alive, which made `umount` in PID 1 return EBUSY (mount-propagation
+can't reconcile a slave that still has open references).
+
+Self-defeating loop: the helper tried to umount the namespace it was
+holding open. The mount only released when the helper gave up.
+
+### Why this didn't surface before commit `936c8bb`
+
+Before that commit, `ExecStart` invoked `srcds_run` from the
+`installation/` lower layer. Srcds processes had cwd / mmaps in
+`installation/`, **not** in the overlay mount. The unit's namespace
+still existed and the helper still pinned it, but the kernel didn't
+need to reconcile any references inside the overlay — so `umount` in
+PID 1 found nothing busy and succeeded immediately.
+
+Once srcds started running from inside `merged/`, the unit's namespace
+gained file references inside the overlay, and the helper's
+namespace-pin became the thing keeping those references in place.
+
+## Fix
+
+**One change at the systemd Exec line, two consequential cleanups.**
+
+### `deploy/files/usr/local/lib/systemd/system/left4me-server@.service`
+
+Wrap both helper invocations with nsenter at the unit level:
+
+```ini
+ExecStartPre=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay mount %i
+ExecStopPost=+/usr/bin/nsenter --mount=/proc/1/ns/mnt -- /usr/local/libexec/left4me/left4me-overlay umount %i
+```
+
+`nsenter` runs in the unit's namespace momentarily, switches its own
+mount namespace to PID 1's, then `execve`s the helper. From that
+point the helper Python interpreter — *the long-lived parent process*
+— lives in PID 1's namespace and holds no reference to the unit's
+namespace.
+
+`TimeoutStopSec` reverts to `15s`.
+
+### `deploy/files/usr/local/libexec/left4me/left4me-overlay`
+
+With the helper already in PID 1's namespace, internal nsenter is
+redundant. Removed:
+
+- `nsenter --mount=/proc/1/ns/mnt --` prefix on the mount/umount argv.
+- `cmd_umount`'s eager-retry loop (no race left to ride out).
+- Lazy-umount (`umount -l`) fallback (no fallback needed; eager
+  succeeds first try).
+- `work_inner` cleanup retry (no kernel-finalisation residual after a
+  successful eager umount).
+- `import time`.
+
+Kept: input validation, idempotency guards (`os.path.ismount`),
+`work_inner` rmtree (the kernel-overlayfs orphan dir is unrelated to
+the namespace issue and still needs cleaning up).
+
+## Verification
+
+After deploy on the test box:
+
+| Metric | Before fix | After fix |
+|---|---|---|
+| Reset duration (`l4d2ctl reset 3`) | ~25 s | ~0.5 s |
+| `holders=` of dying namespace | `[helper_pid]` for ~25 s | `[]` immediately |
+| Unit state after Stop | `failed` | `inactive` |
+| ExecStopPost exit code | 32 (EBUSY) | 0 |
+
+UI flow (`/servers/3` → Start → Reset): job `#164 reset succeeded`
+in 1.3 s end-to-end. No `failed` rows on subsequent resets.
+
+## Lessons
+
+- **A retry loop is a hint, not a fix.** If you find yourself reaching
+  for "retry until kernel finishes," check whether *your own process*
+  is what's blocking the kernel from finishing. nsenter at the
+  syscall level looks right, but only escapes the namespace for the
+  child process; the parent still pins it.
+- **Probe for the holder, don't assume async.** `/proc/*/ns/mnt` plus
+  a tight polling loop quickly tells you who's actually holding a
+  namespace alive. We jumped to "task_work_add reaping" as the
+  explanation and burned a round of workarounds before checking.
+- **`+` prefix only escapes sandbox & credentials.** Mount namespace
+  inheritance is unaffected; if you need PID 1's namespace, do
+  `nsenter` yourself at the Exec line.