Skip to content

SessionEnd cleanup fails when review cwd ≠ session cwd — broker.json looked up by cwd-hash, leaving orphan brokers even on graceful /quit #380

Description

@hongsu

Summary

handleSessionEnd cleans up the broker by calling loadBrokerSession(input.cwd), but broker.json is stored under a cwd-derived hash (resolveStateDir(cwd)). When the broker was spawned for a different cwd than the session's cwd, the lookup misses, endpoint is null, no broker/shutdown is sent, and the broker is orphaned — even when the session exits gracefully (/quit, SIGTERM).

This is distinct from #108 (which attributes orphans to the hook not firing on crash/kill). Here the hook fires normally (exit 0) but cannot locate the broker.

Why this happens in normal use

The broker cwd is decided by resolveCommandCwd (codex-companion.mjs:149):

options.cwd ? path.resolve(process.cwd(), options.cwd) : process.cwd()

→ flows to connect(cwd)ensureBrokerSession(cwd) → broker --cwd and broker.json location.

A very common workflow breaks this:

  • Session starts in the main tree (session cwd = /repo).
  • A review runs against a git worktree — either cd worktree && node companion … or node companion … --cwd worktree → broker cwd = worktree.
  • /quit → SessionEnd input.cwd = /repoloadBrokerSession(/repo) looks at the /repo hash, never finds the worktree broker.json → broker survives.

Reviewing a worktree's uncommitted working tree requires broker cwd = worktree, so for a main-tree session this mismatch is structural, not accidental.

Reproduction (verified)

# broker registered for cwd=A
ensureBrokerSession("/tmp/probe")            # broker --cwd=/tmp/probe, alive

# SessionEnd with matching cwd  -> broker is killed (OK)
echo '{"cwd":"/tmp/probe","session_id":"s1","hook_event_name":"SessionEnd"}' \
  | node session-lifecycle-hook.mjs SessionEnd   # exit 0, broker DEAD

# SessionEnd with mismatched cwd -> hook exits 0 but broker SURVIVES
echo '{"cwd":"/tmp","session_id":"s2","hook_event_name":"SessionEnd"}' \
  | node session-lifecycle-hook.mjs SessionEnd   # exit 0, broker ALIVE  <-- bug

Observed in the wild: every live session had cwd in the main tree, while every orphan broker's --cwd pointed at a worktree — a perfect mismatch pattern.

Root-cause fix (preferred)

Decouple broker cleanup from cwd:

  • A. Track brokers by session, not cwd-hash. Record the owning session_id in broker.json (and/or a global broker index). handleSessionEnd resolves brokers by input.session_id and tears down all of them regardless of cwd. This fixes the graceful-exit mismatch directly.
  • B. Parent-session liveness watchdog in the broker. The broker records the owning session PID and periodically checks process.kill(pid, 0); if the owner is gone it self-terminates. This additionally covers SIGKILL / crash / OOM, where SessionEnd never fires at all.

A handles graceful exits; B handles abnormal exits. Together they remove the cwd dependency entirely.

Fallback (defense-in-depth, not a root fix)

An idle timeout (#108) as a last-resort net if both A and B fail. Useful but secondary — it lets brokers linger for the timeout window and masks the real ownership/lookup defect.

Related

Environment

  • Plugin openai-codex/codex v1.0.4
  • Linux; reproduced directly against session-lifecycle-hook.mjs / broker-lifecycle.mjs

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions