Skip to content

feat(provider): auto-switch on rate limit via providerFallbackChain (#768)#1176

Open
0xfandom wants to merge 2 commits into
Gitlawb:mainfrom
0xfandom:feat/768-provider-fallback-chain
Open

feat(provider): auto-switch on rate limit via providerFallbackChain (#768)#1176
0xfandom wants to merge 2 commits into
Gitlawb:mainfrom
0xfandom:feat/768-provider-fallback-chain

Conversation

@0xfandom
Copy link
Copy Markdown
Contributor

Summary

Implements the "smallest useful version" @paulerrr described in #768 — when the active provider returns a rate-limit error and the user has configured an ordered fallback list, swap to the next provider and retry the turn instead of dropping the task and surfacing the error.

Three pieces:

  1. Settings schema. New optional providerFallbackChain: string[] field on the user-settings schema (src/utils/settings/types.ts). List of providerProfile ids, ordered by preference. Example:

    {
      "providerFallbackChain": [
        "provider_anthropic",
        "provider_openai",
        "provider_ollama"
      ]
    }
  2. src/utils/providerFallback.ts resolver (pure, side-effect-free):

    • getProviderFallbackChain() — reads + defensively filters non-string / empty entries so a malformed hand-edit doesn't crash the recovery flow
    • resolveNextFallbackProvider(activeId, chain, profiles) — pure function: next entry past activeId; skips chain entries that no longer resolve to a real profile; refuses to wrap past the last entry (avoids a degraded-network churn loop); starts from chain[0] when the active profile isn't in the chain (treats the chain as an absolute priority list); returns null on exhaustion
    • resolveNextFallbackProviderFromState() — convenience that pulls chain + active from settings/state
    • 11 unit tests cover each branch + malformed input
  3. Query-loop recovery branch in src/query.ts, sibling to the existing reactive-compact / context-overflow recovery paths:

    • Detects lastMessage.error === 'rate_limit' on an isApiErrorMessage assistant message
    • Skips for compact / session_memory fork query sources — those are forked workers operating against the parent turn's already-committed credential set; switching mid-fork would change credentials under the parent. Mirrors the pre-flight blocking-limit guard at src/query.ts:~691.
    • Calls setActiveProviderProfile() (same code path /provider uses; persists active profile, swaps env vars including OPENAI_BASE_URL / OPENAI_API_KEY, refreshes the startup file)
    • Emits an inline Provider <from> rate-limited — switched to <to> assistant message tagged error: 'rate_limit' so the UI / transcript renders it consistently with other rate-limit messages
    • One-shot per turn via state.hasAttemptedProviderFallback; reset at next_turn / continuation_nudge / token_budget_continuation so a fresh user turn can fall back again
    • On no-fallback-configured / chain-exhausted / activation-failed → falls through to the standard rate-limit termination so the user still sees the original error

Impact

  • user-facing impact: opt-in via providerFallbackChain setting. Users who don't set it see no behavior change. Users who set it get automatic recovery on rate-limit / 429 with an inline Provider X rate-limited — switched to Y notification. Backed by the same setActiveProviderProfile swap /provider performs, so subsequent turns continue against the new provider until the user picks differently.
  • developer/maintainer impact: one new optional settings field, one new pure module + tests, and one new query-loop branch that mirrors the existing reactive-compact / context-overflow recovery shape (same one-shot state-field + reset-site pattern). No changes to existing error classification or message creation.

What this does NOT do

Out of scope per the issue's explicit "smallest useful version" framing. Each is a clean follow-up:

  • A /switch slash command for manual provider switching without waiting for a failure (the issue lists this as a separate item).
  • A /provider next UI affordance.
  • Quota-vs-burst-429 disambiguation. Today the branch treats any error: 'rate_limit' assistant message the same way. If the chain is healthy this is fine; the one-shot guard prevents thrashing.
  • Per-provider error-type filters (e.g. "fall back on quota but not on burst 429"). Easy to layer on a providerFallbackErrorTypes field later without changing the recovery shape.

Testing

  • bun run build — green
  • focused tests: bun test src/utils/providerFallback.test.ts — 11/11 pass
  • combined run: bun test src/utils/providerFallback.test.ts src/utils/model/ src/utils/providerProfiles.test.ts src/services/api/errors.test.ts src/services/compact — 172/172 pass (mock-leak guard on the new test file via import * as actual from path spreads, per the 2026-04-30 lesson)
  • bun run smoke — fails on this checkout with Cannot find package '@orama/orama'; reproduces on clean upstream main HEAD 94e8ff3 (pre-existing local env issue, unrelated to this change)
  • no new query-loop integration tests — the existing reactive-compact / context-overflow / max-output-tokens recovery branches in src/query.ts aren't covered by query-loop tests either (no query.test.ts harness in the repo today). Happy to add one alongside this if you'd prefer — it would be the first of its kind.

Notes

  • provider/model path tested: any provider that surfaces error: 'rate_limit'. That's already emitted from 5+ sites in src/services/api/errors.ts (Anthropic 429, quota-limit, no-headers 429, etc.) plus the OpenAI-compat shim's rate_limited category.
  • The encoded fallback list is intentionally just profile ids — model selection follows the activated profile's model field, same as /provider switching. Cross-profile model overrides are a separate concern (matches the boundary @Vasanthdev2004 drew in Feature: Show agentModels from settings.json in /model picker instead of hardcoded GPT/Codex models #1119 piece 2).
  • The Provider X rate-limited — switched to Y notification is emitted as an assistant createAssistantAPIErrorMessage with error: 'rate_limit' rather than a user/system meta message so the existing rate-limit transcript / styling renders it. If you prefer a distinct surface (e.g. a meta-style user message instead), happy to swap.

Closes #768

0xfandom added 2 commits May 15, 2026 12:10
Adds the "smallest useful version" described in Gitlawb#768 — when the active
provider returns a rate-limit error and the user has configured an
ordered list of provider profile ids in settings.providerFallbackChain,
swap to the next chain entry and retry the turn instead of bubbling the
error to the UI.

Three pieces:

- New `providerFallbackChain?: string[]` field on the user settings
  schema. List of providerProfile ids, ordered by preference.

- New `src/utils/providerFallback.ts` resolver:
  - `getProviderFallbackChain()` — read + defensively filter
  - `resolveNextFallbackProvider(activeId, chain, profiles)` — pure
    function: returns the next chain entry past `activeId`, skipping
    chain entries that no longer resolve to a real profile, refusing
    to wrap past the last entry (avoids a degraded-network churn
    loop), starting from `chain[0]` when the active profile isn't in
    the chain (treats the chain as an absolute priority list)
  - `resolveNextFallbackProviderFromState()` — convenience over the
    pure resolver that reads chain + active from settings/state
  - 11 unit tests covering each branch + malformed input

- Query loop recovery branch in `src/query.ts`, sibling to the
  existing reactive-compact / context-overflow recovery paths:
  - Detect `lastMessage.error === 'rate_limit'` on an isApiErrorMessage
  - Skip for compact / session_memory fork query sources — those run
    against the same conversation tail the outer turn just committed
    to a credential set, switching mid-fork would change credentials
    under the parent
  - Call `setActiveProviderProfile()` (same path /provider uses;
    persists active profile, swaps env vars including
    OPENAI_BASE_URL / OPENAI_API_KEY, refreshes startup file)
  - Emit an inline `Provider <from> rate-limited — switched to <to>`
    assistant message tagged `error: 'rate_limit'` so existing
    UI/transcript handling renders it consistently
  - One-shot per turn via `state.hasAttemptedProviderFallback`;
    reset at next_turn / continuation_nudge / token_budget_continuation
    so a fresh user turn can fall back again
  - On no-fallback-configured / chain-exhausted / activation-failed,
    fall through to the standard rate-limit termination so the user
    still sees the original error

Out of scope (separate followups per the issue's "smallest useful
version" framing): a `/switch` slash command, `/provider next` UI, and
quota-vs-burst-429 disambiguation.

Closes Gitlawb#768
CI surfaced 3 fails on the settings-cache path: setSessionSettingsCache()
works locally but doesn't survive a fresh `import('?ts=...')` because the
settings module loads its own cache instance on each nonced re-import.
Locally bun reused the cache across re-imports, hiding the issue.

Stub `getSettings_DEPRECATED` / `getInitialSettings` on the mocked
`./settings/settings.js` factory so the resolver sees the test's intended
`providerFallbackChain` regardless of how the settings module's session
cache behaves under fresh imports.

Spreads `...actualSettings` so the rest of the settings surface is
preserved per the 2026-04-30 mock-leak lesson.
Copy link
Copy Markdown
Collaborator

@jatmn jatmn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • [P1] Withhold the 429 before attempting provider fallback
    src/query.ts:910
    The new fallback branch runs after streaming completes, but rate-limit API-error messages are not marked as withheld in the streaming loop. As a result, the original rate_limit assistant error is yielded to the UI/SDK before isWithheldRateLimit gets a chance to switch providers and retry. That contradicts the intended "retry instead of surfacing the error" behavior, and it is especially risky for SDK consumers that terminate on any yielded error message. Please withhold eligible rate_limit messages under the same guard used by the recovery branch, then surface the original error only when no fallback is configured, activation fails, or the fallback attempt is exhausted.

  • [P2] Import ProviderProfile from an exported module
    src/utils/providerFallback.ts:16
    ProviderProfile is imported from ./providerProfiles.js, but that module only imports the type from ./config.js; it does not re-export it. bun test passes because the type is erased, but bun run typecheck reports the new errors in providerFallback.ts and providerFallback.test.ts. Please import the type from ./config.js directly, or explicitly re-export it from providerProfiles.ts.

@Vasanthdev2004
Copy link
Copy Markdown
Collaborator

Blockers

  1. Withhold the 429 before attempting provider fallback — Rate-limit API-error messages are not marked as withheld in the streaming loop. The original rate_limit assistant error is yielded to the UI/SDK before the fallback branch gets a chance to switch providers and retry. This contradicts the intended "retry instead of surfacing the error" behavior.

Non-Blocking

  • Type import issue — ProviderProfile imported from wrong module, causes typecheck errors.

Looks Good

  • Opt-in via providerFallbackChain setting — no behavior change for users who don't set it
  • Pure resolver module with 11 unit tests
  • One-shot per turn via state.hasAttemptedProviderFallback
  • Skips for compact/session_memory fork query sources
  • Clear inline notification when switching providers
  • Mirrors existing recovery branch patterns

Verdict: Changes Requested — rate-limit messages need to be withheld before fallback attempt.

@gnanam1990
Copy link
Copy Markdown
Collaborator

Confirmed @jatmn's two points: the fallback branch runs after yield lastMessage, so the rate_limit error is surfaced to the UI/SDK before the provider switch (SDK consumers terminating on yielded errors won't benefit) — withholding eligible rate_limit messages under the same guard as the recovery branch is the right fix; and ProviderProfile is exported from config.ts, not re-exported by providerProfiles.ts, so the new import { ProviderProfile } from './providerProfiles.js' in providerFallback.ts:16 fails bun run typecheck even though bun test passes.

One additional design point worth raising since this touches provider routing: the one-shot hasAttemptedProviderFallback guard is only cleared at next_turn / nudge / budget-continuation, so an ordered providerFallbackChain only advances a single hop per turn. If the user configures a 3-entry chain and the second provider is also rate-limited within the same turn, the turn still fails despite the remaining chain entry. Could the retry loop advance through the chain within the turn (with a bounded attempt count to avoid oscillation) rather than one hop per turn? Worth confirming the intended semantics here. Thanks for tackling this one — it's a useful feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

auto-switch to next configured provider on rate limit or quota exhaustion

4 participants