
[Bug]: runtime_fallback shows "Model Fallback" toast but primary session keeps retrying exhausted OpenAI model (OpenCode 1.17.5) #5435

Description

@samerfarida

Prerequisites

  • I will write this issue in English (see our Language Policy)
  • I have searched existing issues to avoid duplicates
  • I am using the latest version of oh-my-openagent
  • I have read the documentation or asked an AI coding agent with this project's GitHub URL loaded and couldn't find the answer

Bug Description

Summary

With runtime_fallback.enabled: true, the TUI displays "Model Fallback — Switching to …" when the primary model hits quota / usage-limit errors, but OpenCode logs show the primary ultraworker session continues streaming on the same exhausted provider/model (openai/gpt-5.4). The configured fallback (litellm/openai.eu.gpt-5.5) is never used for that session.

This is not a model-name mismatch: fallback models are registered in OpenCode, listed by opencode models, and match the gateway catalog. Subagent fallback does work via a different code path (background-agent respawn on LiteLLM).


Environment

| Component | Version |
| --- | --- |
| OpenCode | 1.17.5 |
| oh-my-openagent | 4.11.1 |
| OS | macOS (darwin 25.4.0) |
| Agent | Sisyphus - ultraworker (config key: sisyphus) |
| Primary model | openai/gpt-5.4 |
| First fallback | litellm/openai.eu.gpt-5.5 |

Relevant oh-my-openagent.json snippet

{
  "runtime_fallback": {
    "enabled": true,
    "max_fallback_attempts": 5,
    "cooldown_seconds": 60,
    "notify_on_fallback": true
  },
  "agents": {
    "sisyphus": {
      "model": "openai/gpt-5.4",
      "variant": "high",
      "fallback_models": [
        { "model": "litellm/openai.eu.gpt-5.5", "variant": "high" },
        { "model": "litellm/vertex_ai.anthropic.claude-opus-4-8", "variant": "max" },
        { "model": "opencode/deepseek-v4-flash-free" },
        { "model": "lmstudio/qwen/qwen3-coder-30b" }
      ]
    }
  }
}

Fallback LiteLLM models are also registered under opencode.json → provider.litellm.models and appear in opencode models.

Root cause analysis (from omO 4.11.1 bundled source)

Investigated in node_modules/oh-my-openagent/dist/index.js (maps to packages/omo-opencode/src/hooks/runtime-fallback/*).

1. Toast is shown before auto-retry dispatch succeeds

dispatchFallbackRetry() calls prepareFallback(), shows the toast, then calls autoRetryWithFallback():

// fallback-retry-dispatcher.ts (conceptual — from dist/index.js ~103196)
async function dispatchFallbackRetry(deps, helpers, options) {
  const result = prepareFallback(...);
  if (result.success && deps.config.notify_on_fallback) {
    await deps.ctx.client.tui.showToast({
      body: {
        title: "Model Fallback",
        message: `Switching to ${result.newModel?.split("/").pop()} for next request`,
        ...
      }
    });
  }
  if (result.success && result.newModel) {
    await helpers.autoRetryWithFallback(...);  // may fail or be skipped AFTER toast
  }
}

Impact: Users see "Switching to …" even when autoRetryWithFallback is gated, fails, or loses a race.

2. Auto-retry can be silently skipped by promptAsync gate

createAutoRetryDispatcher() dispatches via dispatchInternalPrompt({ mode: "async", ... }) and only treats status === "dispatched" || status === "queued" as success (isInternalPromptDispatchAccepted). Otherwise it logs and returns without fallback:

  • Auto-retry skipped by promptAsync gate
  • Retry already in flight, skipping
  • Session active, queueing fallback dispatch (may not result in fallback model stream)

OpenCode’s built-in post-error loop (stream error → cancel session → retry same model ~30–44s later) appears to win this race.
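The acceptance gate described in this section can be sketched as a small predicate plus an explicit result object. Only the two accepted statuses ("dispatched" and "queued") come from the bundled source; the other status names, `DispatchOutcome`, and `toDispatchOutcome` are hypothetical names used for illustration.

```typescript
// "dispatched" and "queued" are the statuses reported as accepted in the bundled source.
// "gated", "in_flight", and "failed" are hypothetical placeholders for every other outcome.
type DispatchStatus = "dispatched" | "queued" | "gated" | "in_flight" | "failed";

interface DispatchOutcome {
  accepted: boolean;
  reason?: string;
}

function isDispatchAccepted(status: DispatchStatus): boolean {
  return status === "dispatched" || status === "queued";
}

// Convert a raw status into an explicit result instead of logging and returning void,
// so the caller can tell that the fallback was never dispatched.
function toDispatchOutcome(status: DispatchStatus): DispatchOutcome {
  if (isDispatchAccepted(status)) {
    return { accepted: true };
  }
  return { accepted: false, reason: `dispatch not accepted (status: ${status})` };
}
```

Returning the outcome rather than swallowing it is what would let a caller decide whether a "Switching to …" toast is truthful.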

3. chat.message hook restores agent primary after cooldown

When runtime_fallback is enabled, createChatMessageHandler2() can override user messages back to originalModel (agent config primary) when:

  • currentModel !== originalModel
  • pendingFallbackModel is cleared
  • originalModel is not in cooldown (60s default)

// chat-message-handler.ts (conceptual — from dist/index.js ~103079)
if (state.currentModel !== state.originalModel && !state.pendingFallbackModel
    && !isModelInCooldown(state.originalModel, state, config.cooldown_seconds)) {
  // "Restoring preferred primary model"
  output.message.model = { providerID: "openai", modelID: "gpt-5.4" };
  return;
}

Impact: Even after a successful fallback, the next user message can revert the session to the exhausted OpenAI primary (matches log section D).
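To make the cooldown condition concrete, here is a minimal sketch of a time-based cooldown check. It is not the plugin's implementation: the `CooldownState` shape (a map from model ID to last-failure time) and the explicit `nowMs` parameter are assumptions made so the example is self-contained and deterministic.

```typescript
// Hypothetical state shape: model ID -> epoch milliseconds of its most recent failure.
interface CooldownState {
  failedModels: Map<string, number>;
}

// A model counts as "in cooldown" while fewer than cooldownSeconds have elapsed
// since its last recorded failure. Models with no recorded failure are never in cooldown.
function isInCooldown(
  model: string,
  state: CooldownState,
  cooldownSeconds: number,
  nowMs: number,
): boolean {
  const failedAt = state.failedModels.get(model);
  if (failedAt === undefined) return false;
  return nowMs - failedAt < cooldownSeconds * 1000;
}

// The primary failed at t = 0 with the default 60-second cooldown.
const exampleState: CooldownState = { failedModels: new Map([["openai/gpt-5.4", 0]]) };
const duringCooldown = isInCooldown("openai/gpt-5.4", exampleState, 60, 59000); // true
const afterCooldown = isInCooldown("openai/gpt-5.4", exampleState, 60, 61000); // false
```

Under this model, a quota that stays exhausted for hours looks "recovered" 60 seconds after the last failure, which is consistent with the revert seen in log section D.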

4. Two fallback systems — primary uses the fragile one

When runtime_fallback.enabled === true, legacy model-fallback event handlers are disabled (shouldHandleModelFallback() returns false). Primary ultraworker depends entirely on runtime_fallback. Subagents use background-agent fallback (tryFallbackRetry / respawn), which continues to work.

5. Quota errors are classified retryable, but integration is incomplete

classifyRuntimeFallbackError() correctly maps "The usage limit has been reached" → quota_exceeded, and isRuntimeFallbackRetryableError() returns true for that type. The classifier is fine; the dispatch / race / restore logic is not.
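For readers unfamiliar with this step, the classification can be pictured as follows. This is an illustrative sketch, not the bundled code: the function names, the regular expression, and the "unknown" type are assumptions; only the quota_exceeded mapping and its retryable status are taken from the analysis above.

```typescript
// "quota_exceeded" is the type named in the analysis; "unknown" is a hypothetical catch-all.
type FallbackErrorType = "quota_exceeded" | "unknown";

// Match the provider message case-insensitively, ignoring any error-class prefix
// such as "AI_APICallError: ".
function classifyError(message: string): FallbackErrorType {
  if (/usage limit has been reached/i.test(message)) return "quota_exceeded";
  return "unknown";
}

// Quota errors are retryable on a different model; unknown errors are not retried here.
function isRetryable(type: FallbackErrorType): boolean {
  return type === "quota_exceeded";
}

const classified = classifyError("AI_APICallError: The usage limit has been reached");
```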

6. OmO [runtime-fallback] logs not visible in OpenCode log file

No [runtime-fallback] lines appear in opencode.log, making production diagnosis difficult. Consider routing plugin logs to the same sink or documenting where to find them.


Suggested fixes (for maintainers)

Fix 1 — Defer toast until dispatch is accepted (high priority)

File: packages/omo-opencode/src/hooks/runtime-fallback/fallback-retry-dispatcher.ts

Move toast after successful autoRetryWithFallback, or pass a callback:

export async function dispatchFallbackRetry(deps, helpers, options) {
  const result = prepareFallback(options.sessionID, options.state, options.fallbackModels, deps.config);
  if (!result.success || !result.newModel) {
    log(`[runtime-fallback] Fallback preparation failed`, { ... });
    return { ok: false, reason: result.error };
  }

  const dispatchResult = await helpers.autoRetryWithFallback(
    options.sessionID,
    result.newModel,
    options.resolvedAgent,
    options.source,
  );

  if (dispatchResult?.accepted && deps.config.notify_on_fallback) {
    await deps.ctx.client.tui.showToast({
      body: {
        title: "Model Fallback",
        message: `Switched to ${formatModel(result.newModel)}`,
        variant: "warning",
        duration: 5000,
      },
    }).catch(() => {});
  } else if (deps.config.notify_on_fallback) {
    await deps.ctx.client.tui.showToast({
      body: {
        title: "Model Fallback Failed",
        message: `Could not switch to ${formatModel(result.newModel)}. ${dispatchResult?.reason ?? "Retry blocked."}`,
        variant: "error",
        duration: 8000,
      },
    }).catch(() => {});
  }

  return { ok: dispatchResult?.accepted ?? false, newModel: result.newModel };
}

autoRetryWithFallback should return { accepted: boolean, reason?: string } instead of void.
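Once the dispatcher returns a result, the choice of toast becomes a pure function that is easy to unit-test. The sketch below assumes the `{ accepted, reason }` shape proposed above; `ToastBody`, `formatModelName`, and `buildFallbackToast` are hypothetical names, and `formatModelName` simply mirrors the `split("/").pop()` call in the existing toast code.

```typescript
interface DispatchResult {
  accepted: boolean;
  reason?: string;
}

interface ToastBody {
  title: string;
  message: string;
  variant: "warning" | "error";
  duration: number;
}

// Keep only the last path segment of the model ID, as the current toast does.
function formatModelName(model: string): string {
  return model.split("/").pop() ?? model;
}

// Pick the success or failure toast from the dispatch result alone.
function buildFallbackToast(result: DispatchResult, newModel: string): ToastBody {
  const name = formatModelName(newModel);
  if (result.accepted) {
    return { title: "Model Fallback", message: `Switched to ${name}`, variant: "warning", duration: 5000 };
  }
  return {
    title: "Model Fallback Failed",
    message: `Could not switch to ${name}. ${result.reason ?? "Retry blocked."}`,
    variant: "error",
    duration: 8000,
  };
}
```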

Fix 2 — Abort OpenCode same-model retry before fallback dispatch (high priority)

Files:

  • auto-retry-dispatch.ts
  • message-update-handler.ts

On quota_exceeded / usage-limit errors, always call abortSessionRequest(sessionID, "message.updated.quota-fallback") and add the session to internallyAbortedSessions before dispatchFallbackRetry, similar to the existing session.status.retry-signal path.

Today, abort with internal marker is only guaranteed for:

source === "session.status.retry-signal"
  || source === "message.updated.retry-signal"
  || source === "session.timeout"

Extend to quota / usage-limit classification so session.error from the abort does not call resetRetryState().
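One way to express the extended condition is a single predicate that accepts either a known retry source or a quota classification. This is a sketch under assumptions: the function name, the optional `errorType` parameter, and the "message.updated" source string in the usage lines are hypothetical; the three source strings in the set are the ones listed above.

```typescript
// Sources that already get an abort with the internal marker today.
const INTERNAL_ABORT_SOURCES = new Set<string>([
  "session.status.retry-signal",
  "message.updated.retry-signal",
  "session.timeout",
]);

// Proposed extension: also abort (and mark as internal) when the error was
// classified as quota_exceeded, regardless of which event reported it.
function shouldAbortWithInternalMarker(source: string, errorType?: string): boolean {
  return INTERNAL_ABORT_SOURCES.has(source) || errorType === "quota_exceeded";
}

const abortOnQuota = shouldAbortWithInternalMarker("message.updated", "quota_exceeded"); // true
const abortOnPlainUpdate = shouldAbortWithInternalMarker("message.updated"); // false
```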

Fix 3 — Do not restore primary while provider is quota-blocked (medium priority)

File: chat-message-handler.ts

Skip "Restoring preferred primary model" when:

  • originalModel provider recently failed with quota_exceeded, or
  • state.failedModels.has(originalModel) and still in cooldown, or
  • state.currentModel is a successful fallback (fallbackIndex >= 0)

function shouldRestorePrimary(state: FallbackState, config: RuntimeFallbackConfig): boolean {
  if (state.pendingFallbackModel) return false;
  if (state.fallbackIndex >= 0 && state.currentModel !== state.originalModel) {
    return false; // stay on active fallback until user explicitly changes model
  }
  if (isModelInCooldown(state.originalModel, state, config.cooldown_seconds)) {
    return false;
  }
  return state.currentModel !== state.originalModel;
}

Optionally add a config option, runtime_fallback.restore_primary_after_cooldown, that defaults to false when fallback models are configured.

Fix 4 — Persist fallback model on session record (medium priority)

After accepted fallback dispatch, call OpenCode session update so the core loop picks up the new model:

await ctx.client.session.update({
  path: { id: sessionID },
  body: {
    model: {
      providerID: parsed.providerID,
      modelID: parsed.modelID,
    },
  },
  query: { directory: ctx.directory },
});

This reduces reliance on winning the promptAsync race against OpenCode’s internal retry.
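The `parsed` value in the snippet above has to come from splitting the configured model string. A minimal sketch follows, assuming the provider is everything before the first slash; that matters for entries such as lmstudio/qwen/qwen3-coder-30b, whose model ID itself contains a slash. `ModelRef` and `parseModelRef` are hypothetical names.

```typescript
interface ModelRef {
  providerID: string;
  modelID: string;
}

// Split on the FIRST slash only, so model IDs that contain slashes stay intact.
function parseModelRef(model: string): ModelRef {
  const slash = model.indexOf("/");
  if (slash <= 0 || slash === model.length - 1) {
    throw new Error(`Invalid model reference: ${model}`);
  }
  return { providerID: model.slice(0, slash), modelID: model.slice(slash + 1) };
}

const nested = parseModelRef("lmstudio/qwen/qwen3-coder-30b");
// nested.providerID is "lmstudio", nested.modelID is "qwen/qwen3-coder-30b"
```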

Fix 5 — Surface plugin logs in OpenCode log (low priority)

Route [runtime-fallback] log lines to the same structured logger OpenCode uses, or document OMO_LOG_LEVEL=debug and output path. Would have saved hours of diagnosis.

Fix 6 — Integration test (recommended)

Add a test that simulates:

  1. Primary stream error with message The usage limit has been reached
  2. message.updated with assistant error
  3. Assert promptAsync body contains fallback model
  4. Assert no second primary stream on original provider without fallback dispatch
  5. Assert toast fires only after accepted dispatch

Workaround (config only — not a fix)

Set agent primary to LiteLLM so quota errors hit the corporate gateway first:

"sisyphus": {
  "model": "litellm/openai.eu.gpt-5.5",
  "variant": "high",
  "fallback_models": [
    { "model": "litellm/vertex_ai.anthropic.claude-opus-4-8", "variant": "max" },
    ...
  ]
}

This avoids the broken OpenAI-primary → cross-provider fallback path but does not fix the underlying bug.

Steps to Reproduce

  1. Configure sisyphus primary on direct OpenAI (openai/gpt-5.4) with LiteLLM fallbacks as above.
  2. Enable runtime_fallback with notify_on_fallback: true.
  3. Start a long-lived Sisyphus - ultraworker session on a project.
  4. Exhaust the OpenAI subscription quota (or trigger repeated The usage limit has been reached errors).
  5. Observe the TUI toast: "Model Fallback — Switching to openai.eu.gpt-5.5(high) for next request" (or similar).
  6. Inspect OpenCode logs (~/.local/share/opencode/log/opencode.log) for stream providerID=… lines on the same session ID.

Checklist for repro confirmation

  • runtime_fallback.enabled: true
  • Primary on direct OpenAI (openai/*), fallback on different provider (litellm/*)
  • OpenAI quota exhausted (The usage limit has been reached)
  • Long-lived session (not a fresh session)
  • Compare stream providerID= lines vs TUI toast for same session.id
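The comparison in the last checklist item can be scripted. The sketch below parses the key=value log format seen in opencode.log and lists which provider/model each stream start used for one session; the parser is a small approximation written for this issue (it handles bare and double-quoted values only), not an official OpenCode tool.

```typescript
// Parse one key=value log line into a map. Values may be bare or double-quoted.
function parseLogLine(line: string): Record<string, string> {
  const fields: Record<string, string> = {};
  const pattern = /([\w.]+)=(?:"([^"]*)"|(\S+))/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(line)) !== null) {
    // Group 2 holds a quoted value, group 3 a bare one.
    fields[match[1]] = match[2] !== undefined ? match[2] : match[3];
  }
  return fields;
}

// Collect "providerID/modelID" for every stream start on the given session, in log order.
function streamModelsForSession(lines: string[], sessionId: string): string[] {
  return lines
    .map(parseLogLine)
    .filter((f) => f["message"] === "stream" && f["session.id"] === sessionId)
    .map((f) => `${f["providerID"]}/${f["modelID"]}`);
}
```

If the toast announced a fallback but every entry returned for the session is still openai/gpt-5.4, the fallback was never applied to that session.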

Expected Behavior

After a quota / usage-limit error on the primary model:

  1. OmO selects the next fallback from fallback_models.
  2. OmO dispatches a retry (promptAsync) with the fallback model in the request body.
  3. OpenCode logs show subsequent primary streams on the fallback provider, e.g. providerID=litellm modelID=openai.eu.gpt-5.5.
  4. Toast is shown only after the fallback dispatch is accepted (or clearly indicates failure).
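Step 1 of the expected behavior could look roughly like the following. This is a hypothetical sketch, not OmO's selection logic: the state shape, the rule of skipping entries that are still in cooldown, and the treatment of max_fallback_attempts as a hard cap are all assumptions made to illustrate how the configured values might interact.

```typescript
interface FallbackEntry {
  model: string;
  variant?: string;
}

// Hypothetical selection state. fallbackIndex is -1 while still on the primary model.
interface SelectionState {
  fallbackIndex: number;
  attempts: number;
  failedAt: Map<string, number>; // model ID -> epoch ms of last failure
}

// Return the next entry after the current one that is not in cooldown,
// or undefined when the attempt cap is reached or the chain is exhausted.
function selectNextFallback(
  entries: FallbackEntry[],
  state: SelectionState,
  maxAttempts: number,
  cooldownSeconds: number,
  nowMs: number,
): FallbackEntry | undefined {
  if (state.attempts >= maxAttempts) return undefined;
  for (let i = state.fallbackIndex + 1; i < entries.length; i++) {
    const entry = entries[i];
    const failedAt = state.failedAt.get(entry.model);
    const cooling = failedAt !== undefined && nowMs - failedAt < cooldownSeconds * 1000;
    if (!cooling) return entry;
  }
  return undefined;
}
```

With the configuration from this issue and a fresh state, the function would return litellm/openai.eu.gpt-5.5, which is the model the logs should then show.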

Actual Behavior

  1. TUI toast appears announcing fallback to litellm/openai.eu.gpt-5.5.
  2. OpenCode logs show repeated streams on the same exhausted model:
    • providerID=openai modelID=gpt-5.4
    • ERROR … The usage limit has been reached
    • cancel session
    • ~30–44s later → same openai/gpt-5.4 stream again
  3. litellm/openai.eu.gpt-5.5 never appears in logs for the affected session after quota hit.
  4. Session can remain stuck retrying OpenAI for hours until the user manually picks a LiteLLM model in the TUI.

Doctor Output

~ bunx oh-my-openagent doctor --verbose

 oMoMoMoMo Doctor

System Information
────────────────────────────────────────
  ✓ opencode    1.17.5
  ✓ oh-my-openagent 4.11.1
  ✓ loaded      4.11.1
  ✓ bun         1.3.14
  ✓ path        /opt/homebrew/bin/opencode

Configuration
────────────────────────────────────────
  ✓ /Users/sfarida002/.config/opencode/opencode.jsonc (valid)

Tools
────────────────────────────────────────
  ✓ LSP         1 server
                    lsp-tools-mcp (*)
  ✓ ast-grep CLI installed
  ✓ comment-checker installed
  ✓ gh CLI installed (samer-farida_pwcit)

MCPs
────────────────────────────────────────
  ✓ websearch
  ✓ context7
  ✓ grep_app
  ✓ lsp

System
────────────────────────────────────────
OpenCode: 1.17.5
Plugin expected: 4.11.1
Plugin loaded: 4.11.1
Bun: 1.3.14

Configuration
────────────────────────────────────────
Path: /Users/sfarida002/.config/opencode/oh-my-openagent.json

TUI Plugin
────────────────────────────────────────
opencode.json: /Users/sfarida002/.config/opencode/opencode.jsonc
tui.json: /Users/sfarida002/.config/opencode/tui.json

Tools
────────────────────────────────────────
AST-Grep CLI: yes
Comment checker: yes
LSP: 1 server(s)
GH CLI: installed (authenticated)
MCP: builtin=4, user=0

Models
────────────────────────────────────────
═══ Available Models (from cache) ═══

  Providers in cache: 146
  Sample: requesty, qiniu-ai, alibaba-cn, regolo-ai, stackit, vercel...
  Total models: 5278
  Cache: /Users/sfarida002/.cache/opencode/models.json
  ℹ Runtime: only connected providers used
  Refresh: opencode models --refresh

═══ Configured Models ═══

Agents:
  ● sisyphus: openai/gpt-5.4 (high) [capabilities: snapshot-backed]
  ● hephaestus: openai/gpt-5.4-mini (medium) [capabilities: snapshot-backed]
  ● oracle: openai/gpt-5.4-mini (high) [capabilities: snapshot-backed]
  ● librarian: openai/gpt-5.4-mini-fast [capabilities: snapshot-backed]
  ● explore: openai/gpt-5.4-mini-fast [capabilities: snapshot-backed]
  ● multimodal-looker: openai/gpt-5.4-mini (medium) [capabilities: snapshot-backed]
  ● prometheus: openai/gpt-5.4 (high) [capabilities: snapshot-backed]
  ● metis: openai/gpt-5.4-mini [capabilities: snapshot-backed]
  ● momus: openai/gpt-5.4-mini (high) [capabilities: snapshot-backed]
  ● atlas: openai/gpt-5.4-mini [capabilities: snapshot-backed]
  ● sisyphus-junior: openai/gpt-5.4-mini [capabilities: snapshot-backed]

Categories:
  ● visual-engineering: openai/gpt-5.4-mini (high) [capabilities: snapshot-backed]
  ● ultrabrain: openai/gpt-5.4 (high) [capabilities: snapshot-backed]
  ● deep: openai/gpt-5.4-mini (medium) [capabilities: snapshot-backed]
  ● artistry: openai/gpt-5.4-mini (high) [capabilities: snapshot-backed]
  ● quick: openai/gpt-5.4-mini-fast [capabilities: snapshot-backed]
  ● unspecified-low: openai/gpt-5.4-mini [capabilities: snapshot-backed]
  ● unspecified-high: openai/gpt-5.4-mini (max) [capabilities: snapshot-backed]
  ● writing: openai/gpt-5.4-mini-fast [capabilities: snapshot-backed]

● = user override, ○ = provider fallback

Summary
────────────────────────────────────────
  5 passed, 0 failed, 0 warnings
  Total: 6 checks in 511ms

Error Logs

---



## Log excerpts

Log file: `~/.local/share/opencode/log/opencode.log`

### A. Failing primary session — toast shown, fallback never used

Session: `ses_123c60247ffed3UX9zA1bZ0nNO`  
Agent: `Sisyphus - ultraworker`


timestamp=2026-06-19T13:48:54.839Z level=INFO run=2093686d message=stream providerID=openai modelID=gpt-5.4 session.id=ses_123c60247ffed3UX9zA1bZ0nNO small=false agent="Sisyphus - ultraworker" mode=primary
timestamp=2026-06-19T13:48:55.317Z level=ERROR run=2093686d message="stream error" providerID=openai modelID=gpt-5.4 session.id=ses_123c60247ffed3UX9zA1bZ0nNO small=false agent="Sisyphus - ultraworker" mode=primary error.error="AI_APICallError: The usage limit has been reached"
timestamp=2026-06-19T13:48:55.321Z level=INFO run=2093686d message=cancel session.id=ses_123c60247ffed3UX9zA1bZ0nNO

timestamp=2026-06-19T13:49:39.092Z level=INFO run=2093686d message=stream providerID=openai modelID=gpt-5.4 session.id=ses_123c60247ffed3UX9zA1bZ0nNO small=false agent="Sisyphus - ultraworker" mode=primary
timestamp=2026-06-19T13:49:39.633Z level=ERROR run=2093686d message="stream error" providerID=openai modelID=gpt-5.4 session.id=ses_123c60247ffed3UX9zA1bZ0nNO small=false agent="Sisyphus - ultraworker" mode=primary error.error="AI_APICallError: The usage limit has been reached"

timestamp=2026-06-19T13:54:55.343Z level=INFO run=2093686d message=stream providerID=openai modelID=gpt-5.4 session.id=ses_123c60247ffed3UX9zA1bZ0nNO small=false agent="Sisyphus - ultraworker" mode=primary
timestamp=2026-06-19T13:54:56.219Z level=ERROR run=2093686d message="stream error" providerID=openai modelID=gpt-5.4 session.id=ses_123c60247ffed3UX9zA1bZ0nNO small=false agent="Sisyphus - ultraworker" mode=primary error.error="AI_APICallError: The usage limit has been reached"
timestamp=2026-06-19T13:54:56.221Z level=INFO run=2093686d message=cancel session.id=ses_123c60247ffed3UX9zA1bZ0nNO


**Note:** A log-wide search for `openai.eu.gpt-5.5` on this session ID returns **zero** matches. The session eventually moved to `litellm/openai.eu.gpt-5.3-codex` only after manual model selection (~15:05), not via the configured fallback chain.
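The same-model retry delay can be read off the excerpt by subtracting timestamps. A small sketch, using the first cancel and the following stream start from the lines above (`gapMs` is a hypothetical helper name):

```typescript
// Difference between two ISO-8601 timestamps, in milliseconds.
function gapMs(earlierIso: string, laterIso: string): number {
  return Date.parse(laterIso) - Date.parse(earlierIso);
}

// First cancel -> next stream start on the same session.
const retryGapMs = gapMs("2026-06-19T13:48:55.321Z", "2026-06-19T13:49:39.092Z"); // 43771 ms, about 43.8 s
```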

### B. Session where fallback eventually worked (after new user message)

Session: `ses_12458f0d8ffelx0gCO7PDXmJgU`

Last OpenAI failure:


timestamp=2026-06-18T19:08:43.382Z level=INFO run=97a2d187 message=stream providerID=openai modelID=gpt-5.5 session.id=ses_12458f0d8ffelx0gCO7PDXmJgU small=false agent="Sisyphus - ultraworker" mode=primary
timestamp=2026-06-18T19:08:44.509Z level=ERROR run=97a2d187 message="stream error" providerID=openai modelID=gpt-5.5 session.id=ses_12458f0d8ffelx0gCO7PDXmJgU small=false agent="Sisyphus - ultraworker" mode=primary error.error="AI_APICallError: The usage limit has been reached"


First successful LiteLLM stream (~17 minutes later, new run id, new user message):


timestamp=2026-06-18T19:25:08.436Z level=INFO run=82341f16 message=stream providerID=litellm modelID=openai.eu.gpt-5.5 session.id=ses_12458f0d8ffelx0gCO7PDXmJgU small=false agent="Sisyphus - ultraworker" mode=primary


### C. Subagent fallback works (different code path)

Same parent session `ses_123c60247ffed3UX9zA1bZ0nNO`, subagents after OpenAI quota errors:


timestamp=2026-06-18T19:40:35.049Z level=INFO … model.id=vertex_ai.anthropic.claude-sonnet-4-6 model.providerID=litellm … parentID=ses_123c60247ffed3UX9zA1bZ0nNO


Subsequent subagent streams use `providerID=litellm modelID=vertex_ai.anthropic.claude-sonnet-4-6`. Background-agent fallback respawn works; **primary `runtime_fallback` does not.**

### D. Primary session forced back to OpenAI on user message

Session had been on `litellm/openai.eu.gpt-5.4-mini` for ~16 hours. On new user messages at 11:28, streams switched to agent-config primary:


timestamp=2026-06-19T11:28:20.163Z level=INFO run=2093686d message=stream providerID=openai modelID=gpt-5.4 session.id=ses_123c60247ffed3UX9zA1bZ0nNO small=false agent="Sisyphus - ultraworker" mode=primary


This aligns with `chat.message` hook behavior that restores agent-config primary after cooldown (see root cause).

Configuration

Additional Context

  • OpenCode version: 1.17.5
  • oh-my-openagent version: 4.11.1 (bunx oh-my-opencode doctor --verbose — all checks passed)
  • Model names verified: all litellm/* fallbacks exist in opencode.json, opencode models, and gateway catalog
  • When runtime_fallback is disabled, legacy model-fallback is also disabled for events — there is no automatic cross-provider fallback for primary sessions in the current config shape

Operating System

macOS

OpenCode Version

1.17.5
