Background task hangs forever in "queued" when --cwd is a git worktree (detached worker dies silently; no timeout) #367

@lilcadet101

Summary

A background task (task ... --background, as used by /codex:rescue and the adversarial-review flow) stays in queued forever and never executes when its --cwd points at a git worktree, i.e. a checkout created with git worktree add, where .git is a file pointing at the real gitdir. The detached worker dies during startup. Because it is spawned with stdio: "ignore" and there is no heartbeat or timeout, the failure is completely silent and the job stays at queued permanently.

Foreground tasks work fine in the same worktree, and background tasks work fine in non-worktree directories, so this is specific to the combination of detached worker and worktree, not to Codex itself: the CLI is installed and authenticated, and it runs the identical work in the foreground.
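For anyone triaging this, the "is this cwd a worktree?" condition can be checked by looking at the .git entry itself. The snippet below is a minimal sketch, not code from the plugin: describeGitEntry is a hypothetical helper name, and it is written as a standalone CommonJS script (inside an .mjs file the require calls would become import statements).

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper (not part of the plugin): classify the `.git` entry
// that sits directly inside `dir`.
function describeGitEntry(dir) {
  let stat;
  try {
    stat = fs.statSync(path.join(dir, ".git"));
  } catch {
    return "none"; // no .git entry here (dir may still be nested inside a repo)
  }
  if (stat.isDirectory()) return "main-clone";
  // A .git *file* holds a "gitdir: ..." pointer. Linked worktrees use this,
  // and so do submodules, so the label is deliberately generic.
  return stat.isFile() ? "gitfile" : "none";
}

module.exports = { describeGitEntry };
```

A result of "gitfile" for the --cwd directory matches the failing case in this report; "main-clone" matches the working one.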

Environment

  • OS: Windows 11 (10.0.26200), shell: Git-bash / MSYS (also reproduced via the plugin invoked from Claude Code)
  • Plugin: codex@openai-codex v1.0.4 (scripts/codex-companion.mjs)
  • codex-cli 0.137.0, authenticated ("Logged in using ChatGPT")
  • Node: v20+ (system node)

Reproduction

CC=".../codex/1.0.4/scripts/codex-companion.mjs"

# 1) Make a git worktree
git -C /path/to/repo worktree add /tmp/wt-probe origin/main   # .git is now a FILE

# 2) Background task in the worktree  -> stuck "queued" forever
node "$CC" task "Reply with exactly: OK" --cwd /tmp/wt-probe --background
node "$CC" status <jobId>     # phase: queued, never advances; worker pid is dead

# 3) Same task, FOREGROUND, same worktree -> works
node "$CC" task "Reply with exactly: OK" --cwd /tmp/wt-probe        # -> OK

# 4) Background task in a NON-worktree dir -> works
node "$CC" task "Reply with exactly: OK" --cwd /tmp/plain-dir --background   # -> completes

A/B matrix (all other factors equal):

cwd                     | mode         | result
plain dir / main clone  | --background | ✅ runs to completion
git worktree            | --background | stuck queued forever
git worktree            | foreground   | ✅ runs to completion

The stuck job's log dead-ends at:

[..] Starting Codex Task.
[..] Queued for background execution.

…and never reaches "Starting Codex task thread." Re-running the same task-worker job in the foreground completes it.

Root cause / analysis

enqueueBackgroundTask → spawnDetachedTaskWorker (codex-companion.mjs ~L641-650) spawns the worker as:

spawn(process.execPath, [scriptPath, "task-worker", "--cwd", cwd, "--job-id", jobId], {
  cwd, env: process.env, detached: true, stdio: "ignore", windowsHide: true
});

Two problems combine:

  1. Silent death + no timeout. stdio: "ignore" discards the worker's stderr, and there is no heartbeat or watchdog that flips a never-started job to failed. So when the detached worker throws during init (which it does in a worktree cwd, before it has wired its own logging to the job logFile), the error vanishes and the job is pinned at queued indefinitely. status keeps reporting queued against a dead pid.

  2. Worktree-specific worker startup failure. The detached worker (process cwd = the worktree) dies before logging Starting Codex task thread, whereas the identical foreground path in the same worktree succeeds. The only deltas are detached:true + stdio:"ignore" + the child's process cwd being the worktree. Something in the worker→broker→codex app-server spawn chain fails for a worktree checkout under those conditions on Windows.
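To make problem 1 concrete, here is a minimal sketch of a detached spawn that keeps the worker's output instead of discarding it. It is an illustration only, not the plugin's code: spawnDetachedWithLog and the logFile option are hypothetical names, and it is written as standalone CommonJS.

```javascript
const { spawn } = require("node:child_process");
const fs = require("node:fs");

// Hypothetical variant of the spawn call quoted above: stdout and stderr are
// appended to `logFile`, so a crash during worker startup leaves a trace.
function spawnDetachedWithLog(command, args, { cwd, logFile }) {
  const fd = fs.openSync(logFile, "a");
  try {
    const child = spawn(command, args, {
      cwd,
      env: process.env,
      detached: true,
      stdio: ["ignore", fd, fd], // stdin ignored, stdout + stderr go to the log
      windowsHide: true,
    });
    child.unref(); // let the parent exit without waiting for the worker
    return child;
  } finally {
    fs.closeSync(fd); // the child keeps its own copy of the descriptor
  }
}

module.exports = { spawnDetachedWithLog };
```

With something along these lines, whatever the worker throws in a worktree cwd would land in the log, which should also help pin down problem 2.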

Secondary bug (recovery is also broken on Windows/Git-bash)

/codex:cancel <job> cannot kill a stuck job here: the companion's taskkill /PID <pid> /T /F is run through a Git-bash/MSYS context that path-translates the /PID switch into a filesystem path, producing:

taskkill /PID ... : ERROR: Invalid argument/option - 'C:/Program Files/Git/PID'.

so the kill silently fails and the orphaned job/process can't be cleaned via the plugin.
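One possible shape for a fix, sketched under assumptions: I have not confirmed how the companion actually invokes taskkill, and buildTaskkillInvocation is a hypothetical helper. The idea is to pass the switches as an argv array with no intermediate shell, and to set the environment variables that Git for Windows (MSYS_NO_PATHCONV) and MSYS2 (MSYS2_ARG_CONV_EXCL) honor to turn off argument path conversion, in case an MSYS shell still ends up in the chain.

```javascript
// Hypothetical helper: describe a taskkill invocation whose "/PID" switch
// should not be rewritten into a filesystem path.
function buildTaskkillInvocation(pid, baseEnv = process.env) {
  if (!Number.isInteger(pid) || pid <= 0) {
    throw new TypeError(`invalid pid: ${pid}`);
  }
  return {
    command: "taskkill",
    args: ["/PID", String(pid), "/T", "/F"],
    options: {
      shell: false, // no intermediate shell to reinterpret the switches
      windowsHide: true,
      env: {
        ...baseEnv,
        MSYS_NO_PATHCONV: "1", // Git for Windows: disable path conversion
        MSYS2_ARG_CONV_EXCL: "*", // MSYS2: exclude all arguments from conversion
      },
    },
  };
}

module.exports = { buildTaskkillInvocation };
```

The returned command, args, and options are meant to be handed to child_process.spawnSync (or spawn) as-is.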

Tertiary: broker daemons leak

app-server-broker.mjs serve daemons (one per cwd) are never reaped — they accumulated to ~40 over a few days of normal use on this machine, one per working directory/worktree, long after those jobs finished.

Suggested fixes

  • Don't discard the detached worker's stderr to the void — redirect it to the job log file (or a worker log) so startup failures are recorded. At minimum, mark a job failed (with the captured error) if its worker exits/never advances within a timeout.
  • Add a heartbeat/timeout so a job that never leaves queued (dead worker pid) is surfaced as failed, not reported as queued forever.
  • Fix worktree support in the detached/background worker path (parity with the working foreground path).
  • On Windows, invoke taskkill via the comspec / an args array that isn't subject to MSYS path translation (e.g. spawn with explicit args, or prefix with the stop-parsing token), so /codex:cancel works under Git-bash.
  • Reap idle app-server-broker daemons.
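The heartbeat/timeout suggestion could look roughly like the sketch below, run whenever status is queried. Everything here is an assumption for illustration: the job shape ({ phase, pid, queuedAt }), the function names, and the 60-second default are mine, not the plugin's.

```javascript
// Returns true if a process with this pid currently exists.
function isProcessAlive(pid) {
  try {
    process.kill(pid, 0); // signal 0 only tests for existence, nothing is sent
    return true;
  } catch (err) {
    return err.code === "EPERM"; // exists, but we may not signal it
  }
}

// Hypothetical reconciliation step: a job still in "queued" is reported as
// failed when its worker is gone or it has waited longer than `timeoutMs`.
// `job.queuedAt` is assumed to be an epoch timestamp in milliseconds.
function reconcileQueuedJob(
  job,
  { now = Date.now(), timeoutMs = 60_000, isAlive = isProcessAlive } = {}
) {
  if (job.phase !== "queued") return job;
  if (!isAlive(job.pid)) {
    return { ...job, phase: "failed", error: `worker pid ${job.pid} exited before the task started` };
  }
  if (now - job.queuedAt > timeoutMs) {
    return { ...job, phase: "failed", error: `worker did not start within ${timeoutMs} ms` };
  }
  return job;
}

module.exports = { isProcessAlive, reconcileQueuedJob };
```

The liveness check is injectable so the logic can be exercised without real processes. Pid reuse is not handled here; a real implementation would probably want a worker-written heartbeat file as well.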

Workaround

Run the review/task from the main clone (non-worktree) directory, or run foreground (not --background), when the branch under review lives in a git worktree. Reviewing a worktree branch's diff still works from the main clone via git show <sha> / git diff.
