Fix stop button leaving agents stuck on "Delegating to sub-agent…"#157
Draft
gnguralnick wants to merge 5 commits into
Draft
Fix stop button leaving agents stuck on "Delegating to sub-agent…"#157gnguralnick wants to merge 5 commits into
gnguralnick wants to merge 5 commits into
Conversation
added 5 commits
June 11, 2026 17:26
Clicking stop during a sub-agent run kills Claude mid-delegation: the Agent tool_use never receives a tool_result (not even on resume), so has_unmatched_tool_use stayed true for the rest of the transcript's life. The stale-tail guard only covers the window before the next message; after that the activity state re-pinned at TOOL_RUNNING and the indicator read 'Delegating to sub-agent' forever, with the stop button stuck visible. Backend: track the timestamp of the newest unmatched tool_use and fence it against the claude_process_started marker (same principle as the stale-tail guard) -- a tool call issued before the current Claude process cannot still be running. The sub-agent card had the sibling problem: a delegation stopped before its linkage ever landed showed 'Running...' forever, and its parent was retried for re-linking on every poll cycle. The watcher now marks such never-linked dead delegations is_interrupted, re-broadcasts the parent once (terminal, so the retry stops), and the card renders 'Stopped' -- or, when a tool_result exists without a linked transcript, 'Stopped'/'Finished' by error state -- instead of 'Running...'.
The committed lock predated the vendor/mngr sync's version bumps (imbue-common/concurrency-group 0.1.19, imbue-mngr/mngr-claude 0.2.12, resource-guards 0.1.8); any uv run regenerates it to match the worktree's pyprojects.
Architecture-review cleanups: extract read_process_started_at / CLAUDE_PROCESS_STARTED_MARKER into activity_state.py (replacing the duplicated private readers in agent_manager and session_watcher), rename mergeLateSubagentMetadata to mergeLateSubagentState to match its widened role, and simplify the update_session_events short-circuit to a tuple comparison.
Problem: newest_unmatched_tool_use_timestamp rebuilt the matched tool_result id set that has_unmatched_tool_use also builds, so the matching rule was maintained twice in the same file. Fix: extract a shared pure matched_tool_result_ids helper and use it from both scanners; semantics are unchanged (has_unmatched_tool_use additionally gains an early return).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Clicking the chat stop button while a sub-agent was running left the agent in a permanently "working" state: the activity indicator kept reading "Delegating to sub-agent…" (with the stop button stuck visible) even after new messages, and the delegation's card could read "Running…" forever. Root-caused against a live agent's transcript: the stop endpoint restarts Claude mid-delegation, the abandoned
Agenttool_use never receives atool_result(not even on a later resume), and nothing downstream ever retires it.Activity indicator (
activity_state.py/agent_manager.py)has_unmatched_tool_usescans the whole transcript, so one danglingtool_usepinned the derived state atTOOL_RUNNINGfor the rest of the transcript's life. The existing guards each cover only a window: the interrupt endpoint's explicit reset lasts until the next watcher recompute, and the stale-tail guard (#146) stops applying the moment a new message lands — which is exactly when the dangling call re-pins the state.The tracker now also caches the timestamp of the newest unmatched
tool_useand fences it against theclaude_process_startedmarker (same positive-evidence principle as the stale-tail guard): a tool call issued before the current Claude process started cannot still be running. The newest-pending semantics keep in-flight turns correct — a fresh turn's running tool still derivesTOOL_RUNNING; once it resolves, only the abandoned call remains and the fence retires it. Missing marker or timestamp leaves behavior unchanged.Sub-agent card (
session_watcher.py+ frontend)A delegation killed before its session linkage ever landed (no
meta.jsontoolUseIddiscovered, and the closingtool_resultwill never arrive) showed a "Running…" card forever, and the watcher retried re-linking its parent on every poll cycle, indefinitely. The enrichment pass now marks such never-linked dead callsis_interruptedusing the same marker fence, the re-broadcast pass re-emits the parent once so the card flips live (terminal, so the retry loop also ends), and the frontend renders "Stopped". A card whosetool_resultexists but never linked to a transcript (denied, errored, or cleaned-up session) now also shows "Stopped"/"Finished" by error state instead of the previous permanent "Running…". Linked delegations are untouched — "View conversation" is already a terminal state.Both fixes are retroactive for already-stuck agents: the marker is already newer than the dead calls, so the next recompute/enrichment clears them.
Testing
Agenttool_uses from stop clicks with no matchingtool_resultanywhere, no interrupt sentinel, and a normal turn following — exactly the shape the new regression tests encode.system_interface:uv run pytest— 505 passed, 10 skipped, coverage 86.06%; ratchets pass. Two failures are pre-existing on the base commit and environmental to the local machine (mngr'sis_allowed_in_pytestopt-in, and the e2e page-title test needing the built frontend), verified by re-running both on a clean tree.npm run test145 passed;npm run lint(tsc + eslint) clean.