fix: stale PostToolUse notifications and background task output leak by timvisher-dd · Pull Request #353 · zed-industries/claude-agent-acp

timvisher-dd · 2026-02-27T01:01:42Z

Problem

Two independent bugs caused stale or garbled output in ACP clients (agent-shell):

PostToolUse hook race: The SDK fires PostToolUse hooks ~42ms before the streaming handler registers callbacks, causing "No onPostToolUseHook found" errors and lost session/update notifications. For subagent child tool uses, the gap is 10-60+ seconds.
Background task output leak: When a run_in_background Bash task completes, the SDK emits a task_notification followed by an internal model turn after the result message. prompt() returned at the first result, leaving the internal turn in the iterator buffer. The next prompt() call processed stale background task output instead of the user's actual input — the model responded to its own background task completion rather than "yes".

Root Cause Analysis

Bug 1: PostToolUse hook timing

The SDK fires hooks synchronously from its tool execution path, but streaming events that trigger registerHookCallback() are consumed asynchronously from the query iterator. For fast tools, the hook fires before the content_block_start event is processed (~42ms gap). For subagent child tools, the gap is 10-60+ seconds because messages are relayed only when the subagent finishes.

Bug 2: SDK internal turns from background tasks

Reading the minified SDK source revealed the root cause: the SDK's iP() function (minified; likely hasRunningDeferrableTasks() or shouldDeferResult() based on usage) checks whether to defer the result message for running background tasks, but only includes local_agent tasks — NOT local_bash tasks. Similarly, the internal queue class q4 (likely InputStreamQueue or MessageBuffer) is where we peek for buffered task_notification messages. When a background Bash task completes after the turn's result, the SDK emits:

[148] result/success         ← prompt() returns here (premature)
[149] system/task_notification ← bg task completed
[150] system/init            ← internal turn starts
[151-167] stream_event deltas ← model responds to task completion
[168] assistant message
[172] result/success         ← internal turn ends (leaked to next prompt)

Verified: the Claude TUI hides this by rendering internal turns inline, but ACP clients can't — they see the stale output on the next prompt.

Independence Proof

Empirically verified both fixes are independently necessary:

Reverted fire-and-stash → 10 PostToolUse tests fail, bg-task-leak tests pass
Reverted internal turn fix → bg-task-leak tests fail, PostToolUse tests pass

Fix 1: Non-blocking fire-and-stash (tools.ts)

When the hook fires before registration:

Stash { toolInput, toolResponse } and return { continue: true } immediately
When registerHookCallback() runs later, find stash, execute callback, clean up
Periodic sweep (60s, 5min TTL, unref()) cleans orphaned entries

Handles both the 42ms race and 10-60s subagent delay with zero blocking.

Fix 2: Internal turn consumption (acp-agent.ts)

After receiving a result/success:

Track local_bash background tasks via task_started messages (ignore local_agent — SDK handles those)
Yield to event loop (setTimeout(0) macrotask) to let the SDK enqueue task_notification
Peek at SDK's internal queue (inputStream.queue[0]) for task_notification
If found: save the prompt response, continue consuming the internal turn
If not found: log warning with task IDs, return (best-effort — upstream SDK fix needed)

Known limitation

The setTimeout(0) yield is best-effort. If task_notification arrives later than one macrotask tick, the internal turn can still leak. A warning is logged with the unresolved task IDs for observability. The proper fix requires an upstream SDK change to either defer local_bash results (like local_agent) or expose an idle/hasPending signal on the Query interface.

Changes

Source

src/tools.ts: Fire-and-stash mechanism replacing blocking Promise.race
src/acp-agent.ts: Internal turn detection (task_started tracking, queue peek, savedPromptResponse)
src/embed.d.ts: Type declaration shim for single-file bun build module

Tests (151 pass, 0 flaky)

src/tests/bg-task-leak.test.ts (13 tests): Full SDK message sequence from real trace data
- Sync/async internal turn consumption
- Cross-prompt stale message isolation
- local_agent vs local_bash filtering
- Warning path (notification never arrives)
- Terminal statuses (completed/failed/stopped)
- error_during_execution known limitation
- Multiple back-to-back internal turns
src/tests/tools.test.ts (10 new tests): Fire-and-stash contract
- Happy path, 42ms race, subagent delay, batch, error handling
- All deterministic (fake timers + microtask flushing, zero real delays)

Infrastructure

bin/test: Local CI mirror — parses .github/workflows/ci.yml with yq
src/tests/authorization.test.ts: Type fix for auth extension property
Build fixes: SDK import path correction, embed module declaration

E2E Verification

Standalone harness (x.bg-task-leak-harness.mjs) spawns real ACP agent, launches background Bash tasks via subagents, and validates Turn B responds to "yes" (not stale background output):

Pre-fix: 3/3 leak detected — Turn B returned in 1ms with "The background task completed... still waiting on your answer"
Post-fix: 3/3 clean — Turn B returned in 9-19s with file creation tool calls

Risk

Low for fire-and-stash: Happy path unchanged. Stash path replaces 5s block with immediate return.

Medium for internal turn fix: Accesses SDK internals (inputStream.queue) not in public API. The task_type === "local_bash" filter relies on an untyped SDK field verified by reading the minified SDK source. Both are documented as workarounds pending upstream SDK changes.

Test plan

bin/test — format, lint, build (0 errors), 151 tests pass
E2E harness: 3/3 iterations clean (background task consumed during Turn A)
4 rounds of Codex review — all findings addressed or documented as known limitations
Fire-and-stash independence empirically proven (revert → 10 tests fail)
Manual: verify no [hook-trace] errors in agent-shell during subagent workloads
Upstream: file SDK issue for local_bash result deferral

benbrandt · 2026-03-03T13:44:07Z

@timvisher-dd how did you testing go/

timvisher-dd · 2026-03-03T15:48:47Z

@timvisher-dd how did you testing go/

Still working through it! Do you mind if I leave this open in Draft?

benbrandt · 2026-03-03T19:38:43Z

All good, thanks for testing!

timvisher-dd · 2026-03-05T02:30:07Z

@benbrandt I'd like to do a bit of manual testing tomorrow with this change but I'm confident that it's directionally correct. Does that seem right to you?

timvisher-dd · 2026-03-06T17:03:42Z

@benbrandt This is feeling pretty good to me. Definitely give it a critical eye because it's a whack-a-mole style problem but I've been running it all day in this state and haven't seen a spurious Notice or odd shell behavior from agent-shell.

timvisher-dd · 2026-03-10T13:15:24Z

Actually I guess I take that back again. Even on my dev integration branches I still see problems with this every now and then. I'm clearly not understanding the full scope of the interactions. I'll drop this back to draft because it's helpful to remind me to keep pressing on it it but if y'all'd rather it be closed for now I'd be fine with that, too. Sorry for the noise here. (－‸ლ)

timvisher-dd · 2026-03-10T14:39:50Z

Actually I guess I take that back again. Even on my dev integration branches I still see problems with this every now and then. I'm clearly not understanding the full scope of the interactions. I'll drop this back to draft because it's helpful to remind me to keep pressing on it it but if y'all'd rather it be closed for now I'd be fine with that, too. Sorry for the noise here. (－‸ლ)

LOL globdammit actually I think the issue was that my local integration branch wasn't including this fix this morning. (ﾉಥ益ಥ）ﾉ ┻━┻

timvisher-dd · 2026-03-12T23:35:03Z

@benbrandt The latest code here is behaving promisingly. I think I may finally have cracked something. xD

The SDK fires PostToolUse hooks ~42ms before the streaming handler registers callbacks, causing a within-turn race condition. The previous code blocked for up to 5 seconds waiting for registration, which could delay the SDK. Replace the blocking Promise.race/timeout approach with a fire-and-stash model: when the hook fires before registration, stash the input data and return immediately. When registerHookCallback() runs later, it finds the stash and executes the callback. This handles both the common ~42ms race and the subagent case where callbacks arrive seconds later. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…g prompt When a background Bash task (run_in_background) completes, the SDK emits a task_notification followed by an internal model turn after the initial result message. prompt() was returning at the first result, leaving the internal turn in the iterator buffer. The next prompt() call would then process stale background task output instead of the user's actual input. Track pending background tasks via task_started messages and peek at the SDK's internal queue after receiving a result. If a task_notification is queued, defer returning and continue consuming the internal turn. This is a pragmatic workaround for the SDK not deferring local_bash background tasks the way it defers local_agent tasks (upstream issue to be filed). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…loop yield Two issues found during E2E harness testing: 1. Only track local_bash tasks in pendingTaskIds — the SDK already defers results for local_agent tasks, so Agent subagent task_started events were inflating the count and the notification never arrives via task_notification (agents complete through the tool flow). 2. Yield to the event loop (setTimeout(0)) before peeking the SDK queue. The task_notification is enqueued asynchronously after the result message; peeking synchronously missed it every time. The yield gives the SDK a tick to process the background task completion. E2E harness: 3/3 iterations now consume the internal turn during Turn A (verified by bg task completion text appearing in Turn A's response and Turn B correctly acting on "yes" by creating summary files). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Two gaps found during E2E harness testing: 1. local_agent task_started must not trigger internal turn detection — the SDK already handles agent task deferral. Without this test, agent task IDs inflated pendingTaskIds and were never resolved. 2. task_notification arriving during the setTimeout(0) yield must still be consumed. The mock simulates the real-world race by pushing internal turn messages into the queue asynchronously. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Codex review flagged that the comment said "microtask yield" but setTimeout(0) is a macrotask. Updated to accurately describe the timing semantics and document the best-effort nature. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Two unit test gaps identified from harness/codex analysis: 1. Warning path: when task_notification never arrives within the setTimeout(0) yield, prompt() should return (not hang) and log a warning with the unresolved task IDs. 2. Failed task_notification: status "failed" should clear pendingTaskIds just like "completed", preventing false-positive internal turn detection on subsequent results. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ack-to-back internal turns Three remaining gaps from codex review: 1. stopped task_notification status clears pendingTaskIds (same as failed/completed — verifies all terminal statuses work) 2. error_during_execution result with pending bg tasks does NOT drain internal turns (documents known limitation — drain only runs for result/success) 3. Multiple back-to-back background task completions producing two consecutive internal turns are both consumed in a single prompt() call Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…shing Real setTimeout delays (5-50ms) in tests are flaky under CI load and add unnecessary wall-clock time. Replace with: - tools.test.ts: flushMicrotasks() helper that chains Promise.resolve() calls to flush the .then() microtask queue deterministically - bg-task-leak.test.ts: vi.useFakeTimers() + vi.advanceTimersByTimeAsync(0) for the async task_notification test that needs both the production setTimeout(0) yield and the test's deferred push to fire Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…d document depth Two improvements from codex review of the timer fix: 1. Add Promise.resolve() flush before vi.advanceTimersByTimeAsync(0) to ensure the production setTimeout(0) is registered before advancing 2. Document the flushMicrotasks depth (5 iterations) with rationale Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- tools.test.ts: replace unused vi import with afterEach import - bg-task-leak.test.ts: remove unused updates destructuring - bin/test: new script that runs the same checks as CI (format, lint, build, test) against git-tracked files only Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The test helpers (makeResultMessage, makeNormalTurnMessages, makeBgTaskInternalTurnMessages) return untyped SDK message shapes that get spliced together with task_started/task_notification objects. TypeScript inferred strict element types from the literal returns, causing tsc errors when splice inserted objects with different shapes. Explicitly typing returns as any/any[] matches how the mock Query already uses any[] and avoids 11 spurious tsc errors. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Uses yq to extract run steps from .github/workflows/ci.yml so the script stays in sync automatically if CI changes. Skips npm ci (deps already installed locally) and overrides format:check to use git-tracked files only (matching CI's clean checkout behavior). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Three categories of build errors fixed: 1. @anthropic-ai/claude-code/sdk-tools.js: module moved to @anthropic-ai/claude-agent-sdk/sdk-tools.js — update import path 2. ClientCapabilities.auth: ACP extension property not yet in the type definition — cast to any for the two access sites 3. @anthropic-ai/claude-agent-sdk/embed: module only exists in single-file bun builds — add src/embed.d.ts type declaration shim Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

iP() → likely hasRunningDeferrableTasks q4 → likely InputStreamQueue or MessageBuffer Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The SDK ships sdk-tools.d.ts but doesn't export ./sdk-tools in its package.json exports map. With a stale local node_modules (0.2.68), tsc resolved the .js import via the loose .d.ts file. After npm ci installs 0.2.71 (matching package.json), tsc can't resolve the path. Add a declare module shim with a triple-slash reference to the SDK's sdk-tools.d.ts so tsc can resolve the import regardless of the SDK's exports configuration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… npm ci The previous version skipped npm ci entirely, assuming local deps were correct. This let a stale SDK version (0.2.68 vs 0.2.71) mask a build error that CI caught. Now bin/test checks if node_modules matches package-lock.json (via npm ls) and runs npm ci if out of sync. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The bg-task-leak tests hung because the mock query never produced a user replay message. In production, session.input.push() causes the SDK to replay the user message through the query iterator, setting promptReplayed = true before the result arrives. Without this, every result was consumed as a "background task result" and the loop blocked forever on the next query.next(). Patch createAgentWithSession so input.push() splices a replay message into the mock queue right before the first result, matching SDK behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cla-bot bot added the cla-signed label Feb 27, 2026

timvisher-dd mentioned this pull request Feb 27, 2026

Stale session/update notifications leak into next prompt turn xenodium/agent-shell#345

Closed

timvisher-dd marked this pull request as ready for review February 27, 2026 01:16

timvisher-dd marked this pull request as draft February 27, 2026 04:25

xenodium mentioned this pull request Feb 27, 2026

[Bug] Tool calls and messages being delayed until I send new prompts xenodium/agent-shell#201

Closed

ElleNajt mentioned this pull request Feb 27, 2026

fix: drain background task turns before processing prompt response #359

Closed

xenodium mentioned this pull request Mar 3, 2026

session/load support is currently broken xenodium/agent-shell#361

Closed

5 tasks

timvisher-dd force-pushed the fix/stale-post-tool-hook-notifications branch 2 times, most recently from 8068985 to 07a7bf1 Compare March 5, 2026 02:29

timvisher-dd changed the title ~~fix: await pending PostToolUse hooks before prompt() returns~~ fix: suppress expected PostToolUse errors for subagent and server tools Mar 5, 2026

timvisher-dd force-pushed the fix/stale-post-tool-hook-notifications branch from 07a7bf1 to 5ed2d93 Compare March 5, 2026 16:46

timvisher-dd changed the title ~~fix: suppress expected PostToolUse errors for subagent and server tools~~ fix: defer PostToolUse hook execution when callback not yet registered Mar 5, 2026

timvisher-dd force-pushed the fix/stale-post-tool-hook-notifications branch 4 times, most recently from a9aa618 to fa6761d Compare March 6, 2026 14:00

timvisher-dd marked this pull request as ready for review March 6, 2026 17:02

timvisher-dd marked this pull request as draft March 10, 2026 13:15

timvisher-dd force-pushed the fix/stale-post-tool-hook-notifications branch from fa6761d to c1746d8 Compare March 10, 2026 14:40

timvisher-dd changed the title ~~fix: defer PostToolUse hook execution when callback not yet registered~~ fix: replace blocking PostToolUse hook deferral with non-blocking fire-and-stash model Mar 11, 2026

timvisher-dd force-pushed the fix/stale-post-tool-hook-notifications branch from 17ec70d to 04c4e76 Compare March 12, 2026 17:51

timvisher-dd changed the title ~~fix: replace blocking PostToolUse hook deferral with non-blocking fire-and-stash model~~ fix: stale PostToolUse notifications and background task output leak Mar 12, 2026

timvisher-dd marked this pull request as ready for review March 13, 2026 17:18

timvisher-dd and others added 18 commits March 13, 2026 15:09

style: fix prettier formatting in test files

cdc17a1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: improve bin/test comment explaining git-tracked-only format check

4e296e7

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: annotate minified SDK names with likely original names

1d23a8d

iP() → likely hasRunningDeferrableTasks q4 → likely InputStreamQueue or MessageBuffer Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

timvisher-dd mentioned this pull request Mar 13, 2026

fix: use current model's context window for usage_update size #412

Open

1 task

timvisher-dd force-pushed the fix/stale-post-tool-hook-notifications branch from ad384d8 to ab025a4 Compare March 15, 2026 05:15

timvisher-dd mentioned this pull request Mar 15, 2026

Fix prompt hang when user message replay is absent #423

Draft

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: stale PostToolUse notifications and background task output leak#353

fix: stale PostToolUse notifications and background task output leak#353
timvisher-dd wants to merge 19 commits intozed-industries:mainfrom
timvisher-dd:fix/stale-post-tool-hook-notifications

timvisher-dd commented Feb 27, 2026 •

edited

Loading

Uh oh!

benbrandt commented Mar 3, 2026

Uh oh!

timvisher-dd commented Mar 3, 2026

Uh oh!

benbrandt commented Mar 3, 2026

Uh oh!

timvisher-dd commented Mar 5, 2026

Uh oh!

timvisher-dd commented Mar 6, 2026

Uh oh!

timvisher-dd commented Mar 10, 2026

Uh oh!

timvisher-dd commented Mar 10, 2026

Uh oh!

timvisher-dd commented Mar 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

timvisher-dd commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Root Cause Analysis

Bug 1: PostToolUse hook timing

Bug 2: SDK internal turns from background tasks

Independence Proof

Fix 1: Non-blocking fire-and-stash (tools.ts)

Fix 2: Internal turn consumption (acp-agent.ts)

Known limitation

Changes

Source

Tests (151 pass, 0 flaky)

Infrastructure

E2E Verification

Risk

Test plan

Uh oh!

benbrandt commented Mar 3, 2026

Uh oh!

timvisher-dd commented Mar 3, 2026

Uh oh!

benbrandt commented Mar 3, 2026

Uh oh!

timvisher-dd commented Mar 5, 2026

Uh oh!

timvisher-dd commented Mar 6, 2026

Uh oh!

timvisher-dd commented Mar 10, 2026

Uh oh!

timvisher-dd commented Mar 10, 2026

Uh oh!

timvisher-dd commented Mar 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

timvisher-dd commented Feb 27, 2026 •

edited

Loading