feat(provider): auto-switch on rate limit via providerFallbackChain (#768) #1176
0xfandom wants to merge 2 commits into
Adds the "smallest useful version" described in Gitlawb#768: when the active provider returns a rate-limit error and the user has configured an ordered list of provider profile ids in settings.providerFallbackChain, swap to the next chain entry and retry the turn instead of bubbling the error to the UI.

Three pieces:

- New `providerFallbackChain?: string[]` field on the user settings schema. List of providerProfile ids, ordered by preference.
- New `src/utils/providerFallback.ts` resolver:
  - `getProviderFallbackChain()`: read + defensively filter
  - `resolveNextFallbackProvider(activeId, chain, profiles)`: pure function that returns the next chain entry past `activeId`. It skips chain entries that no longer resolve to a real profile, refuses to wrap past the last entry (avoids a degraded-network churn loop), and starts from `chain[0]` when the active profile isn't in the chain (treats the chain as an absolute priority list)
  - `resolveNextFallbackProviderFromState()`: convenience over the pure resolver that reads chain + active from settings/state
  - 11 unit tests covering each branch + malformed input
- Query loop recovery branch in `src/query.ts`, sibling to the existing reactive-compact / context-overflow recovery paths:
  - Detect `lastMessage.error === 'rate_limit'` on an isApiErrorMessage
  - Skip for compact / session_memory fork query sources; those run against the same conversation tail the outer turn just committed to a credential set, so switching mid-fork would change credentials under the parent
  - Call `setActiveProviderProfile()` (same path /provider uses; persists active profile, swaps env vars including OPENAI_BASE_URL / OPENAI_API_KEY, refreshes startup file)
  - Emit an inline `Provider <from> rate-limited — switched to <to>` assistant message tagged `error: 'rate_limit'` so existing UI/transcript handling renders it consistently
  - One-shot per turn via `state.hasAttemptedProviderFallback`; reset at next_turn / continuation_nudge / token_budget_continuation so a fresh user turn can fall back again
  - On no-fallback-configured / chain-exhausted / activation-failed, fall through to the standard rate-limit termination so the user still sees the original error

Out of scope (separate followups per the issue's "smallest useful version" framing): a `/switch` slash command, `/provider next` UI, and quota-vs-burst-429 disambiguation.

Closes Gitlawb#768
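The resolver rules listed in this commit message can be sketched as follows. This is a minimal illustration and not the code in `src/utils/providerFallback.ts`: the `ProviderProfile` shape is reduced to a hypothetical `{ id }` object, and `sanitizeFallbackChain` is an assumed name standing in for the defensive filtering that `getProviderFallbackChain()` is described as doing.

```typescript
// Hypothetical minimal profile shape; the real ProviderProfile has more fields.
interface ProviderProfile {
  id: string;
}

// Assumed helper name. Keeps only non-empty string entries so a hand-edited
// settings value cannot crash the recovery flow.
function sanitizeFallbackChain(raw: unknown): string[] {
  if (!Array.isArray(raw)) return [];
  return raw.filter(
    (entry): entry is string =>
      typeof entry === "string" && entry.trim() !== "",
  );
}

// Pure resolver: returns the next usable profile id, or null when exhausted.
function resolveNextFallbackProvider(
  activeId: string | null,
  chain: readonly string[],
  profiles: readonly ProviderProfile[],
): string | null {
  const known = new Set(profiles.map((profile) => profile.id));
  // indexOf returns -1 when the active profile is not in the chain, so the
  // scan below starts at chain[0] (chain treated as an absolute priority list).
  const activeIndex = activeId === null ? -1 : chain.indexOf(activeId);
  for (const candidate of chain.slice(activeIndex + 1)) {
    // Skip entries that no longer resolve to a real profile, and never
    // "fall back" to the profile that is already active.
    if (candidate !== activeId && known.has(candidate)) return candidate;
  }
  // No wrap-around past the last entry.
  return null;
}
```

Because the function takes the chain and profile list as arguments and touches no global state, each branch can be exercised with plain arrays in a unit test.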
CI surfaced 3 fails on the settings-cache path: setSessionSettingsCache()
works locally but doesn't survive a fresh `import('?ts=...')` because the
settings module loads its own cache instance on each nonced re-import.
Locally bun reused the cache across re-imports, hiding the issue.
Stub `getSettings_DEPRECATED` / `getInitialSettings` on the mocked
`./settings/settings.js` factory so the resolver sees the test's intended
`providerFallbackChain` regardless of how the settings module's session
cache behaves under fresh imports.
Spreads `...actualSettings` so the rest of the settings surface is
preserved per the 2026-04-30 mock-leak lesson.
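The stubbing approach described in this commit could look roughly like the factory below. It is a sketch under assumptions: `buildSettingsMock`, the `Settings` type, and the zero-argument signatures of `getSettings_DEPRECATED` / `getInitialSettings` are hypothetical stand-ins, not taken from the repository.

```typescript
// Hypothetical slice of the settings shape the resolver reads.
type Settings = { providerFallbackChain?: string[] };

// Hypothetical view of the settings module: two known getters plus
// whatever else the real module exports.
type SettingsModule = Record<string, unknown> & {
  getSettings_DEPRECATED: () => Settings;
  getInitialSettings: () => Settings;
};

// Builds the object a module-mock factory would return. Spreading `actual`
// first preserves every other export; the two getters are then overridden
// so the resolver sees the test's chain without relying on any session cache.
function buildSettingsMock(
  actual: SettingsModule,
  chain: string[],
): SettingsModule {
  const stubbed: Settings = { providerFallbackChain: chain };
  return {
    ...actual,
    getSettings_DEPRECATED: () => stubbed,
    getInitialSettings: () => stubbed,
  };
}
```

In a bun test, a factory like this would presumably be passed to `mock.module('./settings/settings.js', () => buildSettingsMock(actualSettings, chain))`, where `actualSettings` comes from an `import * as` of the real module.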
jatmn left a comment
Findings
- [P1] Withhold the 429 before attempting provider fallback
  src/query.ts:910
  The new fallback branch runs after streaming completes, but rate-limit API-error messages are not marked as `withheld` in the streaming loop. As a result, the original `rate_limit` assistant error is yielded to the UI/SDK before `isWithheldRateLimit` gets a chance to switch providers and retry. That contradicts the intended "retry instead of surfacing the error" behavior, and it is especially risky for SDK consumers that terminate on any yielded error message. Please withhold eligible `rate_limit` messages under the same guard used by the recovery branch, then surface the original error only when no fallback is configured, activation fails, or the fallback attempt is exhausted.
- [P2] Import `ProviderProfile` from an exported module
  src/utils/providerFallback.ts:16
  `ProviderProfile` is imported from `./providerProfiles.js`, but that module only imports the type from `./config.js`; it does not re-export it. `bun test` passes because the type is erased, but `bun run typecheck` reports the new errors in `providerFallback.ts` and `providerFallback.test.ts`. Please import the type from `./config.js` directly, or explicitly re-export it from `providerProfiles.ts`.
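The withholding behavior requested in [P1] might be sketched as below. This is an illustration only: the real query loop is asynchronous and considerably larger, and `QueryMessage`, `FallbackHooks`, and `runTurn` are hypothetical names, not identifiers from `src/query.ts`. A synchronous generator is used to keep the example short.

```typescript
// Hypothetical message shape for the sketch.
type QueryMessage =
  | { kind: "text"; text: string }
  | { kind: "api_error"; error: string };

// Hypothetical hooks standing in for the recovery branch's guard and the
// provider switch.
interface FallbackHooks {
  canAttemptFallback: () => boolean; // same guard the recovery branch uses
  switchProvider: () => boolean; // true when a fallback profile was activated
}

function* runTurn(
  stream: () => Iterable<QueryMessage>,
  hooks: FallbackHooks,
): Generator<QueryMessage> {
  let withheld: QueryMessage | null = null;
  for (const msg of stream()) {
    const eligible =
      msg.kind === "api_error" &&
      msg.error === "rate_limit" &&
      hooks.canAttemptFallback();
    if (eligible) {
      // Hold the error back instead of yielding it to the UI/SDK.
      withheld = msg;
      break;
    }
    yield msg;
  }
  if (withheld === null) return;
  if (hooks.switchProvider()) {
    // Retry once with the guard disabled, so a second rate limit surfaces.
    yield* runTurn(stream, {
      canAttemptFallback: () => false,
      switchProvider: () => false,
    });
    return;
  }
  // No fallback configured or activation failed: surface the original error.
  yield withheld;
}
```

With this ordering, a consumer that terminates on any yielded error never sees the first `rate_limit` message unless the fallback path has already been ruled out.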
Blockers
Non-Blocking
Looks Good
Verdict: Changes Requested. Rate-limit messages need to be withheld before the fallback attempt.
Confirmed @jatmn's two points: the fallback branch runs after
One additional design point worth raising since this touches provider routing: the one-shot
Summary
Implements the "smallest useful version" @paulerrr described in #768 — when the active provider returns a rate-limit error and the user has configured an ordered fallback list, swap to the next provider and retry the turn instead of dropping the task and surfacing the error.
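The "swap and retry" behavior is bounded by the one-shot guard and the fork-source skip described elsewhere in this PR. A rough sketch of that gating logic follows, assuming a simplified state object; `LoopState`, `shouldAttemptFallback`, `nextState`, the `provider_fallback_retry` transition name, and the `"main"` query source used in the usage note are hypothetical, while the three reset transitions and the `compact` / `session_memory` sources come from the description.

```typescript
// Transitions that matter for the guard. "provider_fallback_retry" is a
// hypothetical name for the retry issued after a provider switch.
type Transition =
  | "next_turn"
  | "continuation_nudge"
  | "token_budget_continuation"
  | "provider_fallback_retry";

interface LoopState {
  hasAttemptedProviderFallback: boolean;
}

// Decide whether the recovery branch may run for the last message.
function shouldAttemptFallback(
  state: LoopState,
  lastError: string | undefined,
  querySource: string,
): boolean {
  if (lastError !== "rate_limit") return false;
  // Forked queries keep the credentials the parent turn committed to.
  if (querySource === "compact" || querySource === "session_memory") return false;
  // One-shot per turn.
  return !state.hasAttemptedProviderFallback;
}

// Set the flag on a fallback retry; clear it on the listed transitions so a
// fresh user turn can fall back again.
function nextState(state: LoopState, transition: Transition): LoopState {
  switch (transition) {
    case "provider_fallback_retry":
      return { ...state, hasAttemptedProviderFallback: true };
    case "next_turn":
    case "continuation_nudge":
    case "token_budget_continuation":
      return { ...state, hasAttemptedProviderFallback: false };
  }
}
```

For example, after `nextState(state, "provider_fallback_retry")` a second `rate_limit` in the same turn returns `false` from `shouldAttemptFallback(state, "rate_limit", "main")`, so the error falls through to the standard termination path.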
Three pieces:
- Settings schema. New optional `providerFallbackChain: string[]` field on the user-settings schema (`src/utils/settings/types.ts`). List of `providerProfile` ids, ordered by preference. Example: `{ "providerFallbackChain": ["provider_anthropic", "provider_openai", "provider_ollama"] }`
- `src/utils/providerFallback.ts` resolver (pure, side-effect-free):
  - `getProviderFallbackChain()`: reads + defensively filters non-string / empty entries so a malformed hand-edit doesn't crash the recovery flow
  - `resolveNextFallbackProvider(activeId, chain, profiles)`: pure function that returns the next entry past `activeId`; skips chain entries that no longer resolve to a real profile; refuses to wrap past the last entry (avoids a degraded-network churn loop); starts from `chain[0]` when the active profile isn't in the chain (treats the chain as an absolute priority list); returns `null` on exhaustion
  - `resolveNextFallbackProviderFromState()`: convenience that pulls chain + active from settings/state
- Query-loop recovery branch in `src/query.ts`, sibling to the existing reactive-compact / context-overflow recovery paths:
  - Detect `lastMessage.error === 'rate_limit'` on an `isApiErrorMessage` assistant message
  - Skip for compact / session_memory fork query sources; see `src/query.ts:~691`.
  - Call `setActiveProviderProfile()` (same code path `/provider` uses; persists active profile, swaps env vars including `OPENAI_BASE_URL` / `OPENAI_API_KEY`, refreshes the startup file)
  - Emit an inline `Provider <from> rate-limited — switched to <to>` assistant message tagged `error: 'rate_limit'` so the UI / transcript renders it consistently with other rate-limit messages
  - One-shot per turn via `state.hasAttemptedProviderFallback`; reset at `next_turn` / `continuation_nudge` / `token_budget_continuation` so a fresh user turn can fall back again
Impact
`providerFallbackChain` setting. Users who don't set it see no behavior change. Users who set it get automatic recovery on rate-limit / 429 with an inline `Provider X rate-limited — switched to Y` notification. Backed by the same `setActiveProviderProfile` swap `/provider` performs, so subsequent turns continue against the new provider until the user picks differently.
What this does NOT do
Out of scope per the issue's explicit "smallest useful version" framing. Each is a clean follow-up:
- `/switch` slash command for manual provider switching without waiting for a failure (the issue lists this as a separate item).
- `/provider next` UI affordance.
- `error: 'rate_limit'` assistant message the same way. If the chain is healthy this is fine; the one-shot guard prevents thrashing.
- `providerFallbackErrorTypes` field later without changing the recovery shape.
Testing
- `bun run build`: green
- `bun test src/utils/providerFallback.test.ts`: 11/11 pass
- `bun test src/utils/providerFallback.test.ts src/utils/model/ src/utils/providerProfiles.test.ts src/services/api/errors.test.ts src/services/compact`: 172/172 pass (mock-leak guard on the new test file via `import * as actual from path` spreads, per the 2026-04-30 lesson)
- `bun run smoke`: fails on this checkout with `Cannot find package '@orama/orama'`; reproduces on clean upstream `main` HEAD `94e8ff3` (pre-existing local env issue, unrelated to this change)
- `src/query.ts` aren't covered by query-loop tests either (no `query.test.ts` harness in the repo today). Happy to add one alongside this if you'd prefer; it would be the first of its kind.
Notes
- `error: 'rate_limit'`. That's already emitted from 5+ sites in `src/services/api/errors.ts` (Anthropic 429, quota-limit, no-headers 429, etc.) plus the OpenAI-compat shim's `rate_limited` category.
- `model` field, same as `/provider` switching. Cross-profile model overrides are a separate concern (matches the boundary @Vasanthdev2004 drew in "Feature: Show agentModels from settings.json in /model picker instead of hardcoded GPT/Codex models" #1119, piece 2).
- `Provider X rate-limited — switched to Y` notification is emitted as an assistant `createAssistantAPIErrorMessage` with `error: 'rate_limit'` rather than a user/system meta message so the existing rate-limit transcript / styling renders it. If you prefer a distinct surface (e.g. a `meta`-style user message instead), happy to swap.
Closes #768