feat(provider): auto-switch on rate limit via providerFallbackChain (#768) by 0xfandom · Pull Request #1176 · Gitlawb/openclaude

0xfandom · 2026-05-15T06:41:17Z

Summary

Implements the "smallest useful version" @paulerrr described in #768 — when the active provider returns a rate-limit error and the user has configured an ordered fallback list, swap to the next provider and retry the turn instead of dropping the task and surfacing the error.

Three pieces:

Settings schema. New optional providerFallbackChain: string[] field on the user-settings schema (src/utils/settings/types.ts). List of providerProfile ids, ordered by preference. Example:
```
{
  "providerFallbackChain": [
    "provider_anthropic",
    "provider_openai",
    "provider_ollama"
  ]
}
```
src/utils/providerFallback.ts resolver (pure, side-effect-free):
- getProviderFallbackChain() — reads + defensively filters non-string / empty entries so a malformed hand-edit doesn't crash the recovery flow
- resolveNextFallbackProvider(activeId, chain, profiles) — pure function: next entry past activeId; skips chain entries that no longer resolve to a real profile; refuses to wrap past the last entry (avoids a degraded-network churn loop); starts from chain[0] when the active profile isn't in the chain (treats the chain as an absolute priority list); returns null on exhaustion
- resolveNextFallbackProviderFromState() — convenience that pulls chain + active from settings/state
- 11 unit tests cover each branch + malformed input
Query-loop recovery branch in src/query.ts, sibling to the existing reactive-compact / context-overflow recovery paths:
- Detects lastMessage.error === 'rate_limit' on an isApiErrorMessage assistant message
- Skips for compact / session_memory fork query sources — those are forked workers operating against the parent turn's already-committed credential set; switching mid-fork would change credentials under the parent. Mirrors the pre-flight blocking-limit guard at src/query.ts:~691.
- Calls setActiveProviderProfile() (same code path /provider uses; persists active profile, swaps env vars including OPENAI_BASE_URL / OPENAI_API_KEY, refreshes the startup file)
- Emits an inline Provider <from> rate-limited — switched to <to> assistant message tagged error: 'rate_limit' so the UI / transcript renders it consistently with other rate-limit messages
- One-shot per turn via state.hasAttemptedProviderFallback; reset at next_turn / continuation_nudge / token_budget_continuation so a fresh user turn can fall back again
- On no-fallback-configured / chain-exhausted / activation-failed → falls through to the standard rate-limit termination so the user still sees the original error

Impact

user-facing impact: opt-in via providerFallbackChain setting. Users who don't set it see no behavior change. Users who set it get automatic recovery on rate-limit / 429 with an inline Provider X rate-limited — switched to Y notification. Backed by the same setActiveProviderProfile swap /provider performs, so subsequent turns continue against the new provider until the user picks differently.
developer/maintainer impact: one new optional settings field, one new pure module + tests, and one new query-loop branch that mirrors the existing reactive-compact / context-overflow recovery shape (same one-shot state-field + reset-site pattern). No changes to existing error classification or message creation.

What this does NOT do

Out of scope per the issue's explicit "smallest useful version" framing. Each is a clean follow-up:

A /switch slash command for manual provider switching without waiting for a failure (the issue lists this as a separate item).
A /provider next UI affordance.
Quota-vs-burst-429 disambiguation. Today the branch treats any error: 'rate_limit' assistant message the same way. If the chain is healthy this is fine; the one-shot guard prevents thrashing.
Per-provider error-type filters (e.g. "fall back on quota but not on burst 429"). Easy to layer on a providerFallbackErrorTypes field later without changing the recovery shape.

Testing

bun run build — green
focused tests: bun test src/utils/providerFallback.test.ts — 11/11 pass
combined run: bun test src/utils/providerFallback.test.ts src/utils/model/ src/utils/providerProfiles.test.ts src/services/api/errors.test.ts src/services/compact — 172/172 pass (mock-leak guard on the new test file via import * as actual from path spreads, per the 2026-04-30 lesson)
bun run smoke — fails on this checkout with Cannot find package '@orama/orama'; reproduces on clean upstream main HEAD 94e8ff3 (pre-existing local env issue, unrelated to this change)
no new query-loop integration tests — the existing reactive-compact / context-overflow / max-output-tokens recovery branches in src/query.ts aren't covered by query-loop tests either (no query.test.ts harness in the repo today). Happy to add one alongside this if you'd prefer — it would be the first of its kind.

Notes

provider/model path tested: any provider that surfaces error: 'rate_limit'. That's already emitted from 5+ sites in src/services/api/errors.ts (Anthropic 429, quota-limit, no-headers 429, etc.) plus the OpenAI-compat shim's rate_limited category.
The encoded fallback list is intentionally just profile ids — model selection follows the activated profile's model field, same as /provider switching. Cross-profile model overrides are a separate concern (matches the boundary @Vasanthdev2004 drew in Feature: Show agentModels from settings.json in /model picker instead of hardcoded GPT/Codex models #1119 piece 2).
The Provider X rate-limited — switched to Y notification is emitted as an assistant createAssistantAPIErrorMessage with error: 'rate_limit' rather than a user/system meta message so the existing rate-limit transcript / styling renders it. If you prefer a distinct surface (e.g. a meta-style user message instead), happy to swap.

Closes #768

Adds the "smallest useful version" described in Gitlawb#768 — when the active provider returns a rate-limit error and the user has configured an ordered list of provider profile ids in settings.providerFallbackChain, swap to the next chain entry and retry the turn instead of bubbling the error to the UI. Three pieces: - New `providerFallbackChain?: string[]` field on the user settings schema. List of providerProfile ids, ordered by preference. - New `src/utils/providerFallback.ts` resolver: - `getProviderFallbackChain()` — read + defensively filter - `resolveNextFallbackProvider(activeId, chain, profiles)` — pure function: returns the next chain entry past `activeId`, skipping chain entries that no longer resolve to a real profile, refusing to wrap past the last entry (avoids a degraded-network churn loop), starting from `chain[0]` when the active profile isn't in the chain (treats the chain as an absolute priority list) - `resolveNextFallbackProviderFromState()` — convenience over the pure resolver that reads chain + active from settings/state - 11 unit tests covering each branch + malformed input - Query loop recovery branch in `src/query.ts`, sibling to the existing reactive-compact / context-overflow recovery paths: - Detect `lastMessage.error === 'rate_limit'` on an isApiErrorMessage - Skip for compact / session_memory fork query sources — those run against the same conversation tail the outer turn just committed to a credential set, switching mid-fork would change credentials under the parent - Call `setActiveProviderProfile()` (same path /provider uses; persists active profile, swaps env vars including OPENAI_BASE_URL / OPENAI_API_KEY, refreshes startup file) - Emit an inline `Provider <from> rate-limited — switched to <to>` assistant message tagged `error: 'rate_limit'` so existing UI/transcript handling renders it consistently - One-shot per turn via `state.hasAttemptedProviderFallback`; reset at next_turn / continuation_nudge / token_budget_continuation so a fresh user turn can fall back again - On no-fallback-configured / chain-exhausted / activation-failed, fall through to the standard rate-limit termination so the user still sees the original error Out of scope (separate followups per the issue's "smallest useful version" framing): a `/switch` slash command, `/provider next` UI, and quota-vs-burst-429 disambiguation. Closes Gitlawb#768

CI surfaced 3 fails on the settings-cache path: setSessionSettingsCache() works locally but doesn't survive a fresh `import('?ts=...')` because the settings module loads its own cache instance on each nonced re-import. Locally bun reused the cache across re-imports, hiding the issue. Stub `getSettings_DEPRECATED` / `getInitialSettings` on the mocked `./settings/settings.js` factory so the resolver sees the test's intended `providerFallbackChain` regardless of how the settings module's session cache behaves under fresh imports. Spreads `...actualSettings` so the rest of the settings surface is preserved per the 2026-04-30 mock-leak lesson.

jatmn

Findings

[P1] Withhold the 429 before attempting provider fallback
src/query.ts:910
The new fallback branch runs after streaming completes, but rate-limit API-error messages are not marked as withheld in the streaming loop. As a result, the original rate_limit assistant error is yielded to the UI/SDK before isWithheldRateLimit gets a chance to switch providers and retry. That contradicts the intended "retry instead of surfacing the error" behavior, and it is especially risky for SDK consumers that terminate on any yielded error message. Please withhold eligible rate_limit messages under the same guard used by the recovery branch, then surface the original error only when no fallback is configured, activation fails, or the fallback attempt is exhausted.
[P2] Import ProviderProfile from an exported module
src/utils/providerFallback.ts:16
ProviderProfile is imported from ./providerProfiles.js, but that module only imports the type from ./config.js; it does not re-export it. bun test passes because the type is erased, but bun run typecheck reports the new errors in providerFallback.ts and providerFallback.test.ts. Please import the type from ./config.js directly, or explicitly re-export it from providerProfiles.ts.

Vasanthdev2004 · 2026-05-16T10:34:07Z

Blockers

Withhold the 429 before attempting provider fallback — Rate-limit API-error messages are not marked as withheld in the streaming loop. The original rate_limit assistant error is yielded to the UI/SDK before the fallback branch gets a chance to switch providers and retry. This contradicts the intended "retry instead of surfacing the error" behavior.

Non-Blocking

Type import issue — ProviderProfile imported from wrong module, causes typecheck errors.

Looks Good

Opt-in via providerFallbackChain setting — no behavior change for users who don't set it
Pure resolver module with 11 unit tests
One-shot per turn via state.hasAttemptedProviderFallback
Skips for compact/session_memory fork query sources
Clear inline notification when switching providers
Mirrors existing recovery branch patterns

Verdict: Changes Requested — rate-limit messages need to be withheld before fallback attempt.

gnanam1990 · 2026-05-17T07:19:04Z

Confirmed @jatmn's two points: the fallback branch runs after yield lastMessage, so the rate_limit error is surfaced to the UI/SDK before the provider switch (SDK consumers terminating on yielded errors won't benefit) — withholding eligible rate_limit messages under the same guard as the recovery branch is the right fix; and ProviderProfile is exported from config.ts, not re-exported by providerProfiles.ts, so the new import { ProviderProfile } from './providerProfiles.js' in providerFallback.ts:16 fails bun run typecheck even though bun test passes.

One additional design point worth raising since this touches provider routing: the one-shot hasAttemptedProviderFallback guard is only cleared at next_turn / nudge / budget-continuation, so an ordered providerFallbackChain only advances a single hop per turn. If the user configures a 3-entry chain and the second provider is also rate-limited within the same turn, the turn still fails despite the remaining chain entry. Could the retry loop advance through the chain within the turn (with a bounded attempt count to avoid oscillation) rather than one hop per turn? Worth confirming the intended semantics here. Thanks for tackling this one — it's a useful feature.

0xfandom added 2 commits May 15, 2026 12:10

jatmn requested changes May 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(provider): auto-switch on rate limit via providerFallbackChain (#768)#1176

feat(provider): auto-switch on rate limit via providerFallbackChain (#768)#1176
0xfandom wants to merge 2 commits into
Gitlawb:mainfrom
0xfandom:feat/768-provider-fallback-chain

0xfandom commented May 15, 2026

Uh oh!

jatmn left a comment

Uh oh!

Vasanthdev2004 commented May 16, 2026

Uh oh!

gnanam1990 commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

0xfandom commented May 15, 2026

Summary

Impact

What this does NOT do

Testing

Notes

Uh oh!

jatmn left a comment

Choose a reason for hiding this comment

Findings

Uh oh!

Vasanthdev2004 commented May 16, 2026

Blockers

Non-Blocking

Looks Good

Uh oh!

gnanam1990 commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants