fix(spawn): opt-in PTY_SPAWNER_PID watchdog to prevent orphaned daemons #39
Merged
myobie merged 1 commit on Jun 11, 2026
`spawnDaemon` produces a detached daemon process that has no mechanism to self-terminate when its spawner exits without calling `disconnect()` / `kill()`. Because `detached: true` puts the daemon in its own session, the kernel sends no signal when the spawner dies; the daemon is reparented to init and survives forever.
In practice this leaks daemons whenever a short-lived script, test harness, or scoped resource (e.g. `@overeng/pty-effect`) spawns a daemon and exits. We observed hundreds of `dist/server.js` processes accumulating multi-GiB RSS over days. See effect-utils#677.
Add a new `bindToSpawnerLifetime` option (opt-in, default off). When set, `spawn.ts` injects `PTY_SPAWNER_PID=<pid>` into the daemon's env, and the daemon polls `process.kill(pid, 0)` every 5 s. ESRCH triggers a clean shutdown via the existing `cleanShutdown(0)` path. An invalid or already-dead PID at startup causes immediate shutdown. Long-lived supervisors that want daemons to outlive them simply omit the option, so backwards compatibility is fully preserved.
Refs: overengineeringstudio/effect-utils#677
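The env-var handshake described in the commit message can be sketched in a few lines of TypeScript, assuming Node's `process.kill(pid, 0)` liveness probe. `buildDaemonEnv` is a hypothetical helper standing in for the injection in `spawn.ts`; `isProcessAlive` and `installSpawnerWatchdog` borrow the names of the helpers this PR adds to `server.ts`, but the bodies below are illustrative, not the merged code. The sketch assumes an unset or empty `PTY_SPAWNER_PID` means "not opted in", and treats any other value that is not a positive integer as invalid, which triggers immediate shutdown as the commit message describes.

```typescript
import { clearInterval, setInterval } from "node:timers";

type Env = Record<string, string | undefined>;

// Hypothetical helper: copy the parent env and add PTY_SPAWNER_PID only when
// the caller opted in. Stripping an inherited value otherwise is a choice of
// this sketch, so a non-opted daemon never binds to a stale grandparent PID.
export function buildDaemonEnv(base: Env, bindToSpawnerLifetime?: boolean): Env {
  const env: Env = { ...base };
  if (bindToSpawnerLifetime) {
    env.PTY_SPAWNER_PID = String(process.pid);
  } else {
    delete env.PTY_SPAWNER_PID;
  }
  return env;
}

// Signal 0 delivers nothing: the kernel only checks that the PID exists and
// that we may signal it. ESRCH means gone; EPERM means the process exists but
// belongs to another user, so it still counts as alive.
export function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return (err as { code?: string }).code === "EPERM";
  }
}

// Illustrative watchdog: returns the polling timer, or undefined when no
// polling is needed (option not set, or shutdown already triggered).
export function installSpawnerWatchdog(
  env: Env,
  onSpawnerGone: () => void,
  intervalMs = 5_000,
): ReturnType<typeof setInterval> | undefined {
  const raw = env.PTY_SPAWNER_PID;
  if (raw === undefined || raw === "") return undefined; // not opted in

  const pid = Number(raw);
  if (!Number.isInteger(pid) || pid <= 0 || !isProcessAlive(pid)) {
    onSpawnerGone(); // invalid or already-dead PID: shut down immediately
    return undefined;
  }

  const timer = setInterval(() => {
    if (!isProcessAlive(pid)) {
      clearInterval(timer);
      onSpawnerGone();
    }
  }, intervalMs);
  timer.unref(); // the watchdog alone never keeps the event loop alive
  return timer;
}
```

In a daemon entry point this would be wired roughly as `installSpawnerWatchdog(process.env, () => cleanShutdown(0))`, where `cleanShutdown` is the existing shutdown path named in the PR and is not defined in the sketch. One limitation of any PID-polling approach: if the spawner dies and its PID is recycled before the next poll, the probe sees the new process as alive; a short interval keeps that window small.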
Why
`spawnDaemon()` produces a detached daemon process that has no way to self-terminate when its spawner exits without calling `disconnect()`/`kill()`. Because `detached: true` puts the daemon in its own session, the kernel sends no signal when the spawner dies; the daemon is reparented to init and survives forever.
In practice this leaks daemons whenever a short-lived script, test harness, or scoped resource spawns a daemon and exits. We observed this in the wild: hundreds of `@myobie/pty/dist/server.js` processes accumulating multi-GiB RSS over days, pushing one host into swap thrashing.
Full write-up + scenario matrix (clean exit / SIGKILL / SIGHUP / SIGTERM): overengineeringstudio/effect-utils#677
Self-contained reproduction: https://github.com/schickling-repros/2026-05-myobie-pty-zombie-daemon-leak
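The leak needs only three ingredients, sketched below with Node's `child_process.spawn`: `detached: true`, ignored stdio, and `unref()`. The child is a stand-in idle Node process rather than the real `dist/server.js`, and `spawnDetachedSleeper` is a hypothetical name used only for illustration. Once the function returns, the parent is free to exit, and nothing tells the child to stop.

```typescript
import { spawn } from "node:child_process";

// Stand-in for a daemon: a Node process that idles forever.
const IDLE_FOREVER = "setInterval(() => {}, 60_000);";

// Spawns the child the way the PR describes spawnDaemon: in its own session
// (detached), with no stdio pipes, and unref'd so the parent's event loop does
// not wait for it. If the parent then exits without killing the child, the
// child is reparented and keeps running.
export function spawnDetachedSleeper(): number {
  const child = spawn(process.execPath, ["-e", IDLE_FOREVER], {
    detached: true,
    stdio: "ignore",
  });
  child.unref();
  if (child.pid === undefined) {
    throw new Error("failed to spawn child process");
  }
  return child.pid;
}
```

Running `console.log(spawnDetachedSleeper())` as a one-off script and then checking the printed PID with `ps -p <pid>` should show the child still running after the script has exited; without a binding such as `PTY_SPAWNER_PID`, it has to be killed by hand.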
What
New opt-in `bindToSpawnerLifetime` field on `SpawnDaemonOptions`. Default off: existing callers see no behaviour change.
When enabled:
- `spawn.ts` injects `PTY_SPAWNER_PID=<spawner pid>` into the daemon's env.
- `server.ts` reads the var on startup. If the PID is already dead (ESRCH), it exits via `cleanShutdown(0)` immediately. Otherwise it polls `process.kill(pid, 0)` every 5 s and calls `cleanShutdown(0)` once the spawner is gone.
How
- New field on `SpawnDaemonOptions`; conditional env-var injection in `spawnViaNode`.
- `installSpawnerWatchdog` + `isProcessAlive` helpers in `server.ts`, wired into the existing entry-point block alongside the `SIGTERM`/`SIGINT` handlers. Uses `setInterval(...).unref()` so the watchdog never holds the event loop open on its own.
Rationale
The issue lists four fix options:
1. … `server.js`: requires picking a default, and conflicts with the design intent that daemons survive client restarts.
2. `PTY_SPAWNER_PID` env var ← this PR. Opt-in, no config to tune, no behaviour change unless requested.
3. `prctl(PR_SET_PDEATHSIG)`: Linux-only, and order-sensitive after `detached: true` (the parent may already be init by the time we set it).
4. … (`@overeng/pty-effect`): wraps the symptom; the same leak recurs in any other caller.
Option 2 is the minimal upstream fix: tight binding when the caller wants it, zero impact when they don't. Long-lived supervisors (e.g. the bundled `pty supervisor`) that want daemons to outlive them simply omit the option.
Test plan
- `tests/spawner-pid-watchdog.test.ts` covers: … `PTY_SPAWNER_PID` values are ignored (no behaviour change).
- `npm run typecheck` and `npm run build` clean.
- `npm test`: the new tests pass; the only local failures are in `tests/stats-cli.test.ts` (macOS `ps` parsing, pre-existing on `main` and unrelated to this change).
Alternatives considered
- `prctl(PR_SET_PDEATHSIG, SIGTERM)`: would give a kernel-driven signal on Linux, but is platform-locked and prone to "parent already reparented to init" races right after `detached: true`. Could be layered on top of this PR later as a Linux-only fast path; the env-var watchdog still works on macOS and BSDs.
- … `PTY_SPAWNER_PID` unconditionally, but that would change semantics for the bundled supervisor (daemons would die when the supervisor restarts). Keeping it opt-in preserves the documented "long-lived sessions that survive process restarts" contract.
Posted on behalf of @schickling