
fix(query): auto-recover from context-overflow errors #1169

Open
0xfandom wants to merge 3 commits into Gitlawb:main from 0xfandom:fix/1105-context-overflow-auto-recover

Conversation

@0xfandom
Contributor

Summary

  • Detect "context window exceeded" assistant messages from all three sources (Anthropic prompt-too-long, OpenAI-shim context_overflow category, Anthropic 500-with-context-keywords) via a new isContextOverflowMessage helper and an apiError: 'context_overflow' tag on the assistant message.
  • Wire a one-shot auto-compact + retry path in the query loop so the conversation recovers automatically instead of surfacing the error and dropping the current task. Covers external builds (no reactiveCompact / contextCollapse) and OpenAI-shim providers like Codex / GPT-5.5 that surface the limit through a 500 rather than the Anthropic PTL path.

Closes #1105.
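The single-predicate design described above can be pictured as a tag check plus a content-prefix fallback. A minimal TypeScript sketch — the message shape and the prefix strings are assumptions drawn from this discussion, not the repo's actual types or matchers:

```typescript
// Hypothetical message shape; the real assistant-message type is richer.
type AssistantMessage = {
  apiError?: string; // 'context_overflow' is set at all three emit sites
  content: string;
};

// Fallback prefixes for older emit sites that never received the tag.
// These strings are illustrative, not the repo's actual matchers.
const OVERFLOW_PREFIXES = [
  'Prompt is too long',                        // Anthropic PTL
  'exceeds the context window of this model',  // OpenAI-shim / Anthropic 500
];

function isContextOverflowMessage(message: AssistantMessage): boolean {
  if (message.apiError === 'context_overflow') return true;
  return OVERFLOW_PREFIXES.some((p) => message.content.startsWith(p));
}
```

All three sources then funnel through this one predicate, so the query-loop recovery branch needs no provider-specific string matching.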

Impact

  • user-facing impact: when a request fails with "exceeds the context window of this model" (Codex / GPT-5.5 today, plus the Anthropic PTL and Anthropic 500-with-context-keywords paths), the loop now silently compacts and retries the same intent. Previously the task halted and the user had to run /compact or /new manually.
  • developer/maintainer impact: new State.hasAttemptedContextOverflowRecovery field carried through every continue site in queryLoop; resets at the same boundaries as hasAttemptedReactiveCompact (new tool round, continuation nudge, token-budget continuation). One-shot per turn; the existing autocompact 3-strike circuit breaker (autoCompact.ts:274) handles deeper recursion if the post-compact retry overflows again. The new branch sits AFTER the existing reactiveCompact / contextCollapse branches, so internal builds keep their existing recovery and only fall through to this path if neither matched.

Testing

  • bun run build
  • bun run smoke — fails on main too (missing optional @orama/orama), not caused by this change.
  • focused tests: bun test src/services/api/errors.test.ts (new), bun test src/services/api/ src/services/compact/ (529 pass / 0 fail), bun test src/__tests__ (51 pass / 0 fail).
  • bun run typecheck — no new errors in the touched files; remaining errors are pre-existing on main.

Notes

  • provider/model path tested: I verified the Anthropic PTL and Anthropic 500-context paths via unit tests; couldn't reproduce the original Codex / GPT-5.5 1M-context overflow against a live account, so the OpenAI-shim path is exercised through the classifier's context_overflow category test only.
  • follow-up work or known limitations:
    • Doesn't implement option 1 from the issue (pre-flight token estimate + compact before send). That's the larger change @gnanam1990 flagged; it requires a request-body token estimator that doesn't exist yet. Option 2 (retry after failure) is what this PR ships.
    • On the post-compact retry the request still goes back through the provider; if a single tool result is so large that even the summary plus that tool result blows the budget, the loop surfaces the second context-overflow without a further retry (one-shot guard). The autocompact circuit breaker exists to stop pathological repeats of that cycle.

0xfandom added 2 commits May 14, 2026 18:12
Tag the three places that surface a 'context window exceeded' assistant
message (Anthropic PTL, OpenAI-shim context_overflow category, Anthropic
500 with context keywords) with apiError: 'context_overflow' and add
isContextOverflowMessage helper. Lets the query-loop recovery branch in
the follow-up commit catch all three via a single predicate instead of
duplicating string matchers, and keeps the content-prefix fallback so
older sites that didn't get the tag are still recognised.

Refs Gitlawb#1105
When a request fails because the conversation exceeds the provider
context window, run a single auto-compact + retry instead of surfacing
the error and stopping the turn. Covers external builds (no
reactiveCompact / contextCollapse compiled in) and OpenAI-shim providers
like Codex / GPT-5.5 that surface the limit through a 500 with
context-overflow keywords rather than the Anthropic prompt-too-long
path.

Withholds the error in the streaming loop (parallel to the existing
prompt-too-long withholding), runs compactConversation, replaces
messagesForQuery with the post-compact summary, and continues the loop.
Gated by hasAttemptedContextOverflowRecovery so a single turn cannot
loop compact -> error -> compact forever, and the autocompact 3-strike
circuit breaker in autoCompact.ts handles deeper recursion if the
post-compact retry overflows again. Resets on each fresh tool round at
the next_turn site so subsequent turns get a clean recovery attempt.

Closes Gitlawb#1105
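The withhold-compact-retry flow this commit describes can be sketched as a loop. Everything here except hasAttemptedContextOverflowRecovery and compactConversation is a hypothetical stand-in, and the real queryLoop streams responses and carries far more state:

```typescript
type Message = { role: 'user' | 'assistant'; content: string; apiError?: string };

// One-shot auto-compact + retry: a simplified model of the recovery branch.
// The guard is local per turn here, mirroring the reset on each fresh tool round.
async function runTurn(
  messages: Message[],
  sendRequest: (msgs: Message[]) => Promise<Message>,
  compactConversation: (msgs: Message[]) => Promise<Message[]>,
): Promise<Message> {
  let hasAttemptedContextOverflowRecovery = false;
  let messagesForQuery = messages;
  for (;;) {
    const reply = await sendRequest(messagesForQuery);
    if (reply.apiError === 'context_overflow' && !hasAttemptedContextOverflowRecovery) {
      // Withhold the error, compact once, and retry the same intent.
      hasAttemptedContextOverflowRecovery = true;
      messagesForQuery = await compactConversation(messagesForQuery);
      continue;
    }
    // Success, or a second overflow after the one-shot guard: surface it.
    return reply;
  }
}
```

A second context_overflow on the retry falls through to the normal error path, which is where the 3-strike circuit breaker in autoCompact.ts takes over.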
gnanam1990 previously approved these changes May 14, 2026
Collaborator

@gnanam1990 left a comment


Local checks on febcf78:

  • bun run build && node dist/cli.mjs --version — ✅ builds, prints 0.10.0 (OpenClaude)
  • bun test src/services/api/errors.test.ts — ✅ 6 pass
  • bun test src/services/compact src/services/api — ✅ 528 pass when isolated (the 1 fail in openaiShim.test.ts reproduces only under multi-file ordering; in isolation: 94/0 — pre-existing test-state-leak, unrelated)
  • tsc --noEmit — no new errors introduced (the 10 pre-existing failures on main remain unchanged)

Code review:

  • The one-shot hasAttemptedContextOverflowRecovery guard correctly mirrors hasAttemptedReactiveCompact — gated at the message-withhold site and the recovery branch, and reset at the same fresh-tool-round sites. No infinite-loop surface.
  • Reusing compactConversation(..., isAutoCompact=true) is a nice call — gets the existing 3-strike circuit breaker for free if the post-compact retry also overflows.
  • The isContextOverflowMessage fallback by content-prefix is a sensible safety net for older emit sites that didn't carry the apiError: 'context_overflow' tag. Tests cover all three sources (PTL, OpenAI-shim, Anthropic-500) plus the rejection cases.
  • No red flags (no tengu_*, no USER_TYPE === 'ant', no new network calls, no new deps, no CI diff).

LGTM.

Collaborator

@jatmn left a comment


Findings

  • [P1] The new overflow recovery path also applies to compact/session-memory forks
    src/query.ts:1300
    This branch currently runs even when querySource is 'compact' or 'session_memory'. Those flows already have specialized oversized-context handling, and the compact worker has its own prompt-too-long retry path. If a compact fork hits a non-PTL context_overflow, this code can re-enter compactConversation() from inside queryLoop using the forked compact prompt as messagesForQuery / forkContextMessages instead of the original oversized conversation. That risks compacting the compact worker's own prompt rather than the real conversation payload, and it bypasses the dedicated compaction retry logic. Please guard this branch the same way the existing oversized-context logic guards compact/session-memory sources, and let those specialized callers handle their own recovery.

Collaborator

@techbrewboss left a comment


Review summary

The context-overflow retry is valuable for the main user loop, but I think this needs one guard before merge: the new recovery path should not run for compact/session-memory fork queries. Those flows already have specialized oversized-context handling, and this branch can recursively invoke compaction from inside the compact worker with the fork prompt rather than the original conversation.

Findings

  • src/query.ts:1300 - Guard context-overflow recovery away from compact/session-memory query sources.
    Impact: isWithheldContextOverflow currently applies regardless of querySource, so a querySource === 'compact' or 'session_memory' fork that hits a non-PTL context overflow will enter this branch and call compactConversation(messagesForQuery, ..., forkContextMessages: messagesForQuery). In those forked flows, messagesForQuery is the compact/session-memory worker prompt, not the original oversized conversation payload. That can compact the worker's prompt, bypass the dedicated compact retry behavior, and produce a misleading post-compact retry instead of letting the specialized caller handle the failure.
    Suggested fix: mirror the existing compact/session-memory exclusion used by the pre-flight blocking-limit path and/or the specialized recovery paths, e.g. require querySource !== 'compact' && querySource !== 'session_memory' before setting isWithheldContextOverflow true.
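The suggested exclusion could take the following shape (the querySource values 'compact' and 'session_memory' come from this discussion; 'user' is a placeholder for any non-fork source, and the real condition in src/query.ts has more clauses):

```typescript
// 'user' is a hypothetical stand-in for non-fork query sources.
type QuerySource = 'user' | 'compact' | 'session_memory';

// Mirror of the pre-flight blocking-limit exclusion: compact and
// session-memory fork workers never enter overflow recovery, so their
// specialized callers keep handling oversized contexts themselves.
function eligibleForOverflowRecovery(
  querySource: QuerySource,
  hasAttemptedContextOverflowRecovery: boolean,
): boolean {
  return (
    !hasAttemptedContextOverflowRecovery &&
    querySource !== 'compact' &&
    querySource !== 'session_memory'
  );
}
```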

Validation

I reviewed the PR diff and checked the surrounding query-loop logic locally. The existing blocking-limit path explicitly skips compact/session-memory sources because those are forked agents that inherit the full conversation and need their dedicated handlers; the new context-overflow recovery branch currently lacks the same guard.

Per @jatmn and @techbrewboss review on Gitlawb#1169: the new
isWithheldContextOverflow branch was running regardless of querySource.
Compact and session_memory forks pass the worker prompt as
messagesForQuery, so recovering here would call compactConversation()
with the worker prompt as forkContextMessages — bypassing the dedicated
compact retry path and producing a misleading post-compact retry of the
worker prompt rather than the real conversation.

Mirror the existing pre-flight blocking-limit exclusion (query.ts:~691)
and let the specialized fork callers handle their own oversized-context
recovery.
@0xfandom
Contributor Author

Pushed 2d21f65 — added the querySource !== 'compact' && querySource !== 'session_memory' guard on isWithheldContextOverflow, mirroring the pre-flight blocking-limit exclusion at src/query.ts:~691. Compact/session-memory forks now fall through to their specialized handlers instead of re-entering compactConversation() with the worker prompt as forkContextMessages.

bun run build → green
bun test src/services/api/errors.test.ts src/services/compact → 14 pass / 0 fail

No new query-loop test added — there's no query.test.ts harness in the repo today and the pre-existing blocking-limit guard at line ~691 also relies on errors/compact unit coverage + manual repro. Happy to add one if you'd prefer, but it'd be the first of its kind.

Thanks @jatmn @techbrewboss for the catch.

Collaborator

@jatmn left a comment


Thanks for following up on the earlier review. The recovery branch now skips compact and session_memory sources, but I found one remaining issue in the companion withhold path.

Findings

  • [P1] Don't withhold context-overflow errors for compact/session-memory forks
    src/query.ts:923
    The recovery branch now has the querySource !== 'compact' && querySource !== 'session_memory' guard, but the streaming withhold condition above it still withholds any isContextOverflowMessage(message) regardless of querySource. For a compact or session-memory fork that returns a context_overflow API error, this hides the message from the stream, then the guarded recovery branch correctly skips it, and the generic API-error early return later exits with reason: 'completed' without yielding the original error. That means the specialized compact/session-memory caller does not get the diagnostic/retry path it needs. Please apply the same query-source exclusion to the withhold condition, or otherwise ensure the skipped recovery path surfaces the original error.

@Vasanthdev2004
Collaborator

Blockers

  1. Withhold path not guarded — The streaming withhold condition at src/query.ts:923 still withholds isContextOverflowMessage(message) regardless of querySource. For compact/session-memory forks, this hides the error from the stream, then the guarded recovery branch skips it, and the generic API-error early return exits without yielding the original error. The specialized caller doesn't get the diagnostic/retry path it needs.

Non-Blocking

  • Contributor has addressed the main recovery branch guard, but the withhold path still needs the same exclusion.
  • No query-loop test harness exists in the repo — relying on unit tests and manual repro.

Looks Good

  • Valuable feature — auto-recovers from context-overflow errors instead of dropping the task
  • One-shot guard prevents infinite loops
  • Reuses existing compactConversation with 3-strike circuit breaker
  • Covers all three overflow sources (Anthropic PTL, OpenAI-shim, Anthropic 500)
  • Good test coverage for the classifier

Verdict: Changes Requested — withhold path needs the same query-source exclusion as the recovery branch.

Collaborator

@techbrewboss left a comment


Review summary

The auto-recovery path is useful and the new context-overflow classifier looks good, but there is still one guard mismatch that should block merge. The recovery branch now skips compact/session-memory fork queries, while the earlier streaming withhold path still hides the same errors for those fork sources.

Findings

  • src/query.ts:922 - Don’t withhold context-overflow errors for compact/session-memory forks.
    Impact: The recovery branch now correctly skips querySource === 'compact' and 'session_memory', but the streaming withhold condition still hides any isContextOverflowMessage(message) before that branch runs. For compact/session-memory forks, that means the original context_overflow API error is withheld, the guarded recovery branch skips it, and the later generic API-error return exits without yielding the diagnostic to the specialized caller.
    Suggested fix: Add the same querySource !== 'compact' && querySource !== 'session_memory' guard to the withhold condition, or otherwise ensure the skipped recovery path surfaces the original error.

Validation

Reviewed the PR metadata, full diff, existing reviews, references/openclaude.md, and the surrounding query-loop logic. I also ran bun test src/services/api/errors.test.ts in a detached PR worktree using the repo’s installed dependencies: 6 pass / 0 fail. No malicious or suspicious behavior found.

@gnanam1990
Collaborator

Confirmed @jatmn's and @techbrewboss's finding against the code — they've independently landed on the same real issue. The recovery branch correctly carries the querySource !== 'compact' && querySource !== 'session_memory' guard, but the streaming withhold condition above it (!hasAttemptedContextOverflowRecovery && isContextOverflowMessage(message)) does not. So for a compact/session-memory fork that returns a context_overflow error: the message is withheld from the stream, the guarded recovery branch correctly skips it, and the generic API-error return then exits with completed without ever yielding the original error — the specialized caller loses its diagnostic/retry path. Applying the same query-source guard to the withhold condition (or surfacing the original error when the guarded recovery is skipped) resolves it. The rest of the auto-recovery design looks sound. Thanks — happy to re-review promptly once that guard is mirrored.



Development

Successfully merging this pull request may close these issues.

Auto-recover from context-window errors after large tool results
