Skip to content

fix(providers/codex): remove stale attemptController.abort() that crashes after SDK cleanup#1739

Merged
Wirasm merged 1 commit into
coleam00:devfrom
kagura-agent:fix/codex-abort-crash
May 21, 2026
Merged

fix(providers/codex): remove stale attemptController.abort() that crashes after SDK cleanup#1739
Wirasm merged 1 commit into
coleam00:devfrom
kagura-agent:fix/codex-abort-crash

Conversation

@kagura-agent
Copy link
Copy Markdown
Contributor

@kagura-agent kagura-agent commented May 21, 2026

Problem

Closes #1735

The fix in #1371 added attemptController.abort() in the retry loop's finally block. By the time this runs, the codex-sdk's own finally has already called child.removeAllListeners() + child.kill(). The subsequent abort() fires Node's internal spawn-signal abort listener on the now-listenerless child, emitting an uncaught AbortError that bypasses any surrounding try/catch.

100% reproducible — crashes the workflow ~9-15s into the first codex node.

Fix

Remove attemptController.abort() from the finally block. The per-attempt AbortController is short-lived and goes out of scope at iteration end — no explicit cleanup needed. Caller signal cancellation is unaffected (removeEventListener for the caller's abort listener is retained in the same finally block).

Changes

Validation

  • bun test packages/providers/src/codex/provider.test.ts — 59 pass, 0 fail
  • New regression test successful attempt does not throw from stale abort cleanup (#1735) passes
  • Provider coverage at 99.67%

Summary by CodeRabbit

  • Bug Fixes
    • Resolved uncaught errors occurring in sendQuery operations caused by abort signal cleanup race conditions.

Review Change Stack

…shes after SDK cleanup (coleam00#1735)

The codex-sdk's own finally calls child.removeAllListeners() + child.kill()
before Archon's retry-loop finally runs. The subsequent attemptController.abort()
fires Node's internal spawn-signal abort listener on the now-listenerless child,
surfacing an uncaught AbortError that bypasses try/catch.

The per-attempt AbortController is short-lived and goes out of scope at iteration
end — no explicit abort() cleanup is needed. Caller signal cancellation is
unaffected (removed via removeEventListener in the same finally block).

Closes coleam00#1735
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a6def82a-fcff-4ffd-bcd3-23d254ca21e8

📥 Commits

Reviewing files that changed from the base of the PR and between 253321b and 156e2c8.

📒 Files selected for processing (2)
  • packages/providers/src/codex/provider.test.ts
  • packages/providers/src/codex/provider.ts

📝 Walkthrough

Walkthrough

This PR fixes a race condition in the Codex provider's retry loop where calling attemptController.abort() in the finally block collides with the Codex SDK's own subprocess cleanup, causing uncaught AbortError exceptions. The fix removes the explicit abort call and adds a regression test to verify the behavior.

Changes

Abort cleanup race fix

Layer / File(s) Summary
Remove abort() race and validate cleanup
packages/providers/src/codex/provider.ts, packages/providers/src/codex/provider.test.ts
The per-attempt finally block no longer calls attemptController.abort() after each retry attempt; scope exit provides sufficient cleanup. Comments explain that the abort call raced with Codex SDK's removeAllListeners() on the child process, triggering an uncaught AbortError in Node's spawn-signal wiring. A new regression test installs an uncaughtException handler, runs sendQuery to completion, waits for deferred errors, and verifies no uncaught exceptions occur.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

  • coleam00/Archon#1266: Per-attempt AbortController handling and attempt-poisoning mitigation directly address the signal-reuse and retry-loop abort behavior described in #1266.

Possibly related PRs

  • coleam00/Archon#1371: Both PRs modify CodexProvider.sendQuery per-retry AbortController cleanup logic and add/adjust regression coverage around stale signals and abort-error behavior.
  • coleam00/Archon#1162: Both PRs modify packages/providers/src/codex/provider.ts sendQuery abort/cleanup behavior and add/cover related tests around abort handling across retries.

Poem

🐰 A race condition met its match,
When abort() and cleanup couldn't latch.
Scope exit whispers, soft and clean,
No uncaught errors in between,
The test hops in with watchful sight! 🎯

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically identifies the main fix: removing the stale attemptController.abort() call that caused crashes after SDK cleanup, with reference to the addressed problem.
Description check ✅ Passed The description covers Problem, Fix, Changes, and Validation sections. It lacks formal UX Journey, Architecture Diagram, Risk/Label snapshot, and Compatibility sections required by the template, but provides sufficient technical detail for understanding the bug fix.
Linked Issues check ✅ Passed The PR fully addresses #1735 requirements: removes the problematic attemptController.abort() call from finally, retains caller signal cleanup, adds regression test for successful attempts without stale abort errors, and validates with passing tests and code coverage metrics.
Out of Scope Changes check ✅ Passed All changes directly address issue #1735: fix in provider.ts removes the abort call and adds explanatory comments, and test file adds a regression test for the specific crash scenario. No out-of-scope modifications detected.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Wirasm Wirasm merged commit a9288b4 into coleam00:dev May 21, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(providers/codex): attemptController.abort() in retry finally crashes via codex-sdk's removeAllListeners (regression from #1371)

2 participants