Prerequisites
Bug Description
Summary
With runtime_fallback.enabled: true, the TUI displays "Model Fallback — Switching to …" when the primary model hits quota / usage-limit errors, but OpenCode logs show the primary ultraworker session continues streaming on the same exhausted provider/model (openai/gpt-5.4). The configured fallback (litellm/openai.eu.gpt-5.5) is never used for that session.
This is not a model-name mismatch: fallback models are registered in OpenCode, listed by opencode models, and match the gateway catalog. Subagent fallback does work via a different code path (background-agent respawn on LiteLLM).
Environment
| Component | Version |
| --- | --- |
| OpenCode | 1.17.5 |
| oh-my-openagent | 4.11.1 |
| OS | macOS (darwin 25.4.0) |
| Agent | Sisyphus - ultraworker (config key: sisyphus) |
| Primary model | openai/gpt-5.4 |
| First fallback | litellm/openai.eu.gpt-5.5 |
Relevant oh-my-openagent.json snippet
{
  "runtime_fallback": {
    "enabled": true,
    "max_fallback_attempts": 5,
    "cooldown_seconds": 60,
    "notify_on_fallback": true
  },
  "agents": {
    "sisyphus": {
      "model": "openai/gpt-5.4",
      "variant": "high",
      "fallback_models": [
        { "model": "litellm/openai.eu.gpt-5.5", "variant": "high" },
        { "model": "litellm/vertex_ai.anthropic.claude-opus-4-8", "variant": "max" },
        { "model": "opencode/deepseek-v4-flash-free" },
        { "model": "lmstudio/qwen/qwen3-coder-30b" }
      ]
    }
  }
}
Fallback LiteLLM models are also registered under opencode.json → provider.litellm.models and appear in opencode models.
Root cause analysis (from omO 4.11.1 bundled source)
Investigated in node_modules/oh-my-openagent/dist/index.js (maps to packages/omo-opencode/src/hooks/runtime-fallback/*).
1. Toast is shown before auto-retry dispatch succeeds
dispatchFallbackRetry() calls prepareFallback(), shows the toast, then calls autoRetryWithFallback():
// fallback-retry-dispatcher.ts (conceptual — from dist/index.js ~103196)
async function dispatchFallbackRetry(deps, helpers, options) {
  const result = prepareFallback(...);
  if (result.success && deps.config.notify_on_fallback) {
    await deps.ctx.client.tui.showToast({
      body: {
        title: "Model Fallback",
        message: `Switching to ${result.newModel?.split("/").pop()} for next request`,
        ...
      }
    });
  }
  if (result.success && result.newModel) {
    await helpers.autoRetryWithFallback(...); // may fail or be skipped AFTER toast
  }
}
Impact: Users see "Switching to …" even when autoRetryWithFallback is gated, fails, or loses a race.
2. Auto-retry can be silently skipped by promptAsync gate
createAutoRetryDispatcher() dispatches via dispatchInternalPrompt({ mode: "async", ... }) and only treats status === "dispatched" || status === "queued" as success (isInternalPromptDispatchAccepted). Otherwise it logs and returns without fallback:
- Auto-retry skipped by promptAsync gate
- Retry already in flight, skipping
- Session active, queueing fallback dispatch (may not result in fallback model stream)
OpenCode’s built-in post-error loop (stream error → cancel session → retry same model ~30–44s later) appears to win this race.
3. chat.message hook restores agent primary after cooldown
When runtime_fallback is enabled, createChatMessageHandler2() can override user messages back to originalModel (agent config primary) when:
- currentModel !== originalModel
- pendingFallbackModel is cleared
- originalModel is not in cooldown (60s default)
// chat-message-handler.ts (conceptual — from dist/index.js ~103079)
if (state.currentModel !== state.originalModel && !state.pendingFallbackModel
    && !isModelInCooldown(state.originalModel, state, config.cooldown_seconds)) {
  // "Restoring preferred primary model"
  output.message.model = { providerID: "openai", modelID: "gpt-5.4" };
  return;
}
Impact: Even after a successful fallback, the next user message can revert the session to the exhausted OpenAI primary (matches log section D).
4. Two fallback systems — primary uses the fragile one
When runtime_fallback.enabled === true, legacy model-fallback event handlers are disabled (shouldHandleModelFallback() returns false). Primary ultraworker depends entirely on runtime_fallback. Subagents use background-agent fallback (tryFallbackRetry / respawn), which continues to work.
5. Quota errors are classified retryable, but integration is incomplete
classifyRuntimeFallbackError() correctly maps "The usage limit has been reached" → quota_exceeded, and isRuntimeFallbackRetryableError() returns true for that type. The classifier is fine; the dispatch / race / restore logic is not.
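To make the classification step concrete, here is a minimal sketch of a substring-based classifier. It is not the plugin's actual implementation: the names `classifyError`, `isRetryable`, and `QUOTA_PATTERNS` are hypothetical, and the only pattern included is the error text that appears in the logs of this report.

```typescript
// Hypothetical sketch; not the bundled classifyRuntimeFallbackError().
type ErrorType = "quota_exceeded" | "unknown";

// Assumption: matching is case-insensitive on a known substring.
const QUOTA_PATTERNS: readonly string[] = ["the usage limit has been reached"];

function classifyError(message: string): ErrorType {
  const normalized = message.toLowerCase();
  return QUOTA_PATTERNS.some((p) => normalized.includes(p))
    ? "quota_exceeded"
    : "unknown";
}

// Only quota errors are treated as retryable in this sketch.
function isRetryable(type: ErrorType): boolean {
  return type === "quota_exceeded";
}
```

The point of the sketch is that this step is simple and appears to work; the failure described in this report happens after classification.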
6. OmO [runtime-fallback] logs not visible in OpenCode log file
No [runtime-fallback] lines appear in opencode.log, making production diagnosis difficult. Consider routing plugin logs to the same sink or documenting where to find them.
Suggested fixes (for maintainers)
Fix 1 — Defer toast until dispatch is accepted (high priority)
File: packages/omo-opencode/src/hooks/runtime-fallback/fallback-retry-dispatcher.ts
Move toast after successful autoRetryWithFallback, or pass a callback:
export async function dispatchFallbackRetry(deps, helpers, options) {
  const result = prepareFallback(options.sessionID, options.state, options.fallbackModels, deps.config);
  if (!result.success || !result.newModel) {
    log(`[runtime-fallback] Fallback preparation failed`, { ... });
    return { ok: false, reason: result.error };
  }
  const dispatchResult = await helpers.autoRetryWithFallback(
    options.sessionID,
    result.newModel,
    options.resolvedAgent,
    options.source,
  );
  if (dispatchResult?.accepted && deps.config.notify_on_fallback) {
    await deps.ctx.client.tui.showToast({
      body: {
        title: "Model Fallback",
        message: `Switched to ${formatModel(result.newModel)}`,
        variant: "warning",
        duration: 5000,
      },
    }).catch(() => {});
  } else if (deps.config.notify_on_fallback) {
    await deps.ctx.client.tui.showToast({
      body: {
        title: "Model Fallback Failed",
        message: `Could not switch to ${formatModel(result.newModel)}. ${dispatchResult?.reason ?? "Retry blocked."}`,
        variant: "error",
        duration: 8000,
      },
    }).catch(() => {});
  }
  return { ok: dispatchResult?.accepted ?? false, newModel: result.newModel };
}
autoRetryWithFallback should return { accepted: boolean, reason?: string } instead of void.
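One possible shape for that return value is sketched below. `FallbackDispatchResult` and `toDispatchResult` are hypothetical names, and the only status strings taken from the bundled source are "dispatched" and "queued"; any other status is treated as a rejection here.

```typescript
// Hypothetical result type for autoRetryWithFallback.
interface FallbackDispatchResult {
  accepted: boolean;
  reason?: string;
}

// Maps a dispatch status to the proposed result shape.
// Assumption: "dispatched" and "queued" are the only accepted statuses.
function toDispatchResult(status: string): FallbackDispatchResult {
  if (status === "dispatched" || status === "queued") {
    return { accepted: true };
  }
  return { accepted: false, reason: `dispatch status was "${status}"` };
}
```

Since a queued dispatch may not result in a fallback model stream (see root cause 2), maintainers may prefer to report "queued" separately rather than as a plain success.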
Fix 2 — Abort OpenCode same-model retry before fallback dispatch (high priority)
Files:
- auto-retry-dispatch.ts
- message-update-handler.ts
On quota_exceeded / usage-limit errors, always call abortSessionRequest(sessionID, "message.updated.quota-fallback") and add the session to internallyAbortedSessions before dispatchFallbackRetry, similar to the existing session.status.retry-signal path.
Today, abort with internal marker is only guaranteed for:
source === "session.status.retry-signal"
|| source === "message.updated.retry-signal"
|| source === "session.timeout"
Extend to quota / usage-limit classification so session.error from the abort does not call resetRetryState().
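The extended condition could look roughly like the sketch below. `shouldAbortBeforeFallback` and `ABORT_SOURCES` are hypothetical names; the three source strings are the ones listed above, and the error type string is the `quota_exceeded` classification from root cause 5.

```typescript
// Sources that already trigger an internally marked abort today.
const ABORT_SOURCES: ReadonlySet<string> = new Set([
  "session.status.retry-signal",
  "message.updated.retry-signal",
  "session.timeout",
]);

// Hypothetical predicate: abort before fallback dispatch for the existing
// sources, and additionally whenever the error was classified as quota_exceeded.
function shouldAbortBeforeFallback(source: string, errorType: string): boolean {
  return ABORT_SOURCES.has(source) || errorType === "quota_exceeded";
}
```

Keying the new case on the error classification rather than on a new source string keeps the existing source list unchanged.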
Fix 3 — Do not restore primary while provider is quota-blocked (medium priority)
File: chat-message-handler.ts
Skip "Restoring preferred primary model" when:
- originalModel provider recently failed with quota_exceeded, or
- state.failedModels.has(originalModel) and still in cooldown, or
- state.currentModel is a successful fallback (fallbackIndex >= 0)
function shouldRestorePrimary(state: FallbackState, config: RuntimeFallbackConfig): boolean {
  if (state.pendingFallbackModel) return false;
  if (state.fallbackIndex >= 0 && state.currentModel !== state.originalModel) {
    return false; // stay on active fallback until user explicitly changes model
  }
  if (isModelInCooldown(state.originalModel, state, config.cooldown_seconds)) {
    return false;
  }
  return state.currentModel !== state.originalModel;
}
Optionally add config: runtime_fallback.restore_primary_after_cooldown: false (default false when fallbacks configured).
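For context on why the cooldown check alone is not enough, here is a sketch of what a time-based cooldown test might look like. It is a guess at the semantics, not the bundled `isModelInCooldown`: the `CooldownState` shape (a map from model to failure timestamp in milliseconds) and the `now` parameter are assumptions made for illustration.

```typescript
// Assumed state shape: model reference -> time of last failure (ms since epoch).
interface CooldownState {
  failedModels: Map<string, number>;
}

// Returns true while fewer than cooldownSeconds have passed since the failure.
function isInCooldown(
  model: string,
  state: CooldownState,
  cooldownSeconds: number,
  now: number = Date.now(),
): boolean {
  const failedAt = state.failedModels.get(model);
  if (failedAt === undefined) return false; // never failed, so no cooldown
  return now - failedAt < cooldownSeconds * 1000;
}
```

Under these assumptions a 60-second cooldown expires while the primary is still exhausted, since the logs in this report show the quota error persisting for hours. That is why the check on an active fallback in shouldRestorePrimary matters.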
Fix 4 — Persist fallback model on session record (medium priority)
After accepted fallback dispatch, call OpenCode session update so the core loop picks up the new model:
await ctx.client.session.update({
  path: { id: sessionID },
  body: {
    model: {
      providerID: parsed.providerID,
      modelID: parsed.modelID,
    },
  },
  query: { directory: ctx.directory },
});
This reduces reliance on winning the promptAsync race against OpenCode’s internal retry.
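The `parsed` value in the snippet above is not defined in this report. A minimal sketch of how a "provider/model" reference could be split is shown here; `parseModelRef` and `ModelRef` are hypothetical names. It splits on the first slash only, because model IDs such as lmstudio/qwen/qwen3-coder-30b contain a slash themselves.

```typescript
interface ModelRef {
  providerID: string;
  modelID: string;
}

// Splits "provider/model" at the first slash; returns undefined if either
// part would be empty.
function parseModelRef(ref: string): ModelRef | undefined {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) return undefined;
  return {
    providerID: ref.slice(0, slash),
    modelID: ref.slice(slash + 1),
  };
}
```

For example, "litellm/openai.eu.gpt-5.5" yields providerID "litellm" and modelID "openai.eu.gpt-5.5", which matches the providerID=litellm modelID=openai.eu.gpt-5.5 pair seen in the logs.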
Fix 5 — Surface plugin logs in OpenCode log (low priority)
Route [runtime-fallback] log lines to the same structured logger OpenCode uses, or document OMO_LOG_LEVEL=debug and the output path. This would have saved hours of diagnosis.
Fix 6 — Integration test (recommended)
Add a test that simulates:
- Primary stream error with message The usage limit has been reached
- message.updated with assistant error
- Assert promptAsync body contains fallback model
- Assert no second primary stream on original provider without fallback dispatch
- Assert toast fires only after accepted dispatch
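The ordering assertions above could be expressed as a check over a recorded event sequence. The sketch below is not tied to the plugin's test harness: `FallbackEvent` and `findViolations` are hypothetical, and the test would need to record these events from its own fakes for the toast, the dispatcher, and the stream.

```typescript
// Hypothetical event log recorded by a test's fakes.
type FallbackEvent =
  | { kind: "quota_error" }
  | { kind: "dispatch"; accepted: boolean }
  | { kind: "toast" }
  | { kind: "stream"; model: string };

// Returns a description of each violated invariant, in event order.
function findViolations(events: FallbackEvent[], primary: string): string[] {
  const violations: string[] = [];
  let quotaHit = false;
  let accepted = false;
  for (const event of events) {
    switch (event.kind) {
      case "quota_error":
        quotaHit = true;
        accepted = false; // a new error needs a new accepted dispatch
        break;
      case "dispatch":
        accepted = accepted || event.accepted;
        break;
      case "toast":
        if (!accepted) violations.push("toast before accepted dispatch");
        break;
      case "stream":
        if (quotaHit && event.model === primary) {
          violations.push("primary stream after quota error");
        }
        break;
    }
  }
  return violations;
}
```

Fed the sequence observed in this report (quota error, toast, rejected dispatch, another stream on openai/gpt-5.4), the check reports both violations; a sequence with an accepted dispatch followed by a toast and a fallback stream reports none.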
Workaround (config only — not a fix)
Set agent primary to LiteLLM so quota errors hit the corporate gateway first:
"sisyphus": {
  "model": "litellm/openai.eu.gpt-5.5",
  "variant": "high",
  "fallback_models": [
    { "model": "litellm/vertex_ai.anthropic.claude-opus-4-8", "variant": "max" },
    ...
  ]
}
This avoids the broken OpenAI-primary → cross-provider fallback path but does not fix the underlying bug.
Steps to Reproduce
- Configure sisyphus primary on direct OpenAI (openai/gpt-5.4) with LiteLLM fallbacks as above.
- Enable runtime_fallback with notify_on_fallback: true.
- Start a long-lived Sisyphus - ultraworker session on a project.
- Exhaust the OpenAI subscription quota (or trigger repeated The usage limit has been reached errors).
- Observe the TUI toast: "Model Fallback — Switching to openai.eu.gpt-5.5(high) for next request" (or similar).
- Inspect OpenCode logs (~/.local/share/opencode/log/opencode.log) for stream providerID=… lines on the same session ID.
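For the last step, a small script can list which provider/model each stream used for a given session. This is a sketch based only on the key=value line format visible in the log excerpts of this report; `parseLogLine` and `streamsForSession` are hypothetical helpers, and the parser does not handle escaped quotes inside values.

```typescript
// Parses one key=value log line; values may be bare or double-quoted.
function parseLogLine(line: string): Record<string, string> {
  const fields: Record<string, string> = {};
  const pattern = /([\w.]+)=("[^"]*"|\S+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(line)) !== null) {
    const raw = match[2];
    // Strip surrounding quotes from quoted values.
    fields[match[1]] = raw.charAt(0) === '"' ? raw.slice(1, -1) : raw;
  }
  return fields;
}

// Returns "providerID/modelID" for every message=stream line of a session.
function streamsForSession(lines: string[], sessionID: string): string[] {
  return lines
    .map(parseLogLine)
    .filter((f) => f["message"] === "stream" && f["session.id"] === sessionID)
    .map((f) => `${f["providerID"]}/${f["modelID"]}`);
}
```

If the fallback worked, the returned list would switch from openai/gpt-5.4 to a litellm entry after the first quota error; in the failing session it stays on openai/gpt-5.4.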
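Checklist for repro confirmation
- runtime_fallback.enabled: true
- … (openai/*), fallback on different provider (litellm/*)
- … (The usage limit has been reached)
- … stream providerID= lines vs TUI toast for same session.id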
Expected Behavior
After a quota / usage-limit error on the primary model:
- OmO selects the next fallback from fallback_models.
- OmO dispatches a retry (promptAsync) with the fallback model in the request body.
- OpenCode logs show subsequent primary streams on the fallback provider, e.g. providerID=litellm modelID=openai.eu.gpt-5.5.
- Toast is shown only after the fallback dispatch is accepted (or clearly indicates failure).
Actual Behavior
- TUI toast appears announcing fallback to litellm/openai.eu.gpt-5.5.
- OpenCode logs show repeated streams on the same exhausted model:
  - providerID=openai modelID=gpt-5.4
  - ERROR … The usage limit has been reached
  - cancel session
  - ~30–44s later → same openai/gpt-5.4 stream again
- litellm/openai.eu.gpt-5.5 never appears in logs for the affected session after quota hit.
- Session can remain stuck retrying OpenAI for hours until the user manually picks a LiteLLM model in the TUI.
Doctor Output
~ bunx oh-my-openagent doctor --verbose
oMoMoMoMo Doctor
System Information
────────────────────────────────────────
✓ opencode 1.17.5
✓ oh-my-openagent 4.11.1
✓ loaded 4.11.1
✓ bun 1.3.14
✓ path /opt/homebrew/bin/opencode
Configuration
────────────────────────────────────────
✓ /Users/sfarida002/.config/opencode/opencode.jsonc (valid)
Tools
────────────────────────────────────────
✓ LSP 1 server
lsp-tools-mcp (*)
✓ ast-grep CLI installed
✓ comment-checker installed
✓ gh CLI installed (samer-farida_pwcit)
MCPs
────────────────────────────────────────
✓ websearch
✓ context7
✓ grep_app
✓ lsp
System
────────────────────────────────────────
OpenCode: 1.17.5
Plugin expected: 4.11.1
Plugin loaded: 4.11.1
Bun: 1.3.14
Configuration
────────────────────────────────────────
Path: /Users/sfarida002/.config/opencode/oh-my-openagent.json
TUI Plugin
────────────────────────────────────────
opencode.json: /Users/sfarida002/.config/opencode/opencode.jsonc
tui.json: /Users/sfarida002/.config/opencode/tui.json
Tools
────────────────────────────────────────
AST-Grep CLI: yes
Comment checker: yes
LSP: 1 server(s)
GH CLI: installed (authenticated)
MCP: builtin=4, user=0
Models
────────────────────────────────────────
═══ Available Models (from cache) ═══
Providers in cache: 146
Sample: requesty, qiniu-ai, alibaba-cn, regolo-ai, stackit, vercel...
Total models: 5278
Cache: /Users/sfarida002/.cache/opencode/models.json
ℹ Runtime: only connected providers used
Refresh: opencode models --refresh
═══ Configured Models ═══
Agents:
● sisyphus: openai/gpt-5.4 (high) [capabilities: snapshot-backed]
● hephaestus: openai/gpt-5.4-mini (medium) [capabilities: snapshot-backed]
● oracle: openai/gpt-5.4-mini (high) [capabilities: snapshot-backed]
● librarian: openai/gpt-5.4-mini-fast [capabilities: snapshot-backed]
● explore: openai/gpt-5.4-mini-fast [capabilities: snapshot-backed]
● multimodal-looker: openai/gpt-5.4-mini (medium) [capabilities: snapshot-backed]
● prometheus: openai/gpt-5.4 (high) [capabilities: snapshot-backed]
● metis: openai/gpt-5.4-mini [capabilities: snapshot-backed]
● momus: openai/gpt-5.4-mini (high) [capabilities: snapshot-backed]
● atlas: openai/gpt-5.4-mini [capabilities: snapshot-backed]
● sisyphus-junior: openai/gpt-5.4-mini [capabilities: snapshot-backed]
Categories:
● visual-engineering: openai/gpt-5.4-mini (high) [capabilities: snapshot-backed]
● ultrabrain: openai/gpt-5.4 (high) [capabilities: snapshot-backed]
● deep: openai/gpt-5.4-mini (medium) [capabilities: snapshot-backed]
● artistry: openai/gpt-5.4-mini (high) [capabilities: snapshot-backed]
● quick: openai/gpt-5.4-mini-fast [capabilities: snapshot-backed]
● unspecified-low: openai/gpt-5.4-mini [capabilities: snapshot-backed]
● unspecified-high: openai/gpt-5.4-mini (max) [capabilities: snapshot-backed]
● writing: openai/gpt-5.4-mini-fast [capabilities: snapshot-backed]
● = user override, ○ = provider fallback
Summary
────────────────────────────────────────
5 passed, 0 failed, 0 warnings
Total: 6 checks in 511ms
Error Logs
---
## Log excerpts
Log file: `~/.local/share/opencode/log/opencode.log`
### A. Failing primary session — toast shown, fallback never used
Session: `ses_123c60247ffed3UX9zA1bZ0nNO`
Agent: `Sisyphus - ultraworker`
timestamp=2026-06-19T13:48:54.839Z level=INFO run=2093686d message=stream providerID=openai modelID=gpt-5.4 session.id=ses_123c60247ffed3UX9zA1bZ0nNO small=false agent="Sisyphus - ultraworker" mode=primary
timestamp=2026-06-19T13:48:55.317Z level=ERROR run=2093686d message="stream error" providerID=openai modelID=gpt-5.4 session.id=ses_123c60247ffed3UX9zA1bZ0nNO small=false agent="Sisyphus - ultraworker" mode=primary error.error="AI_APICallError: The usage limit has been reached"
timestamp=2026-06-19T13:48:55.321Z level=INFO run=2093686d message=cancel session.id=ses_123c60247ffed3UX9zA1bZ0nNO
timestamp=2026-06-19T13:49:39.092Z level=INFO run=2093686d message=stream providerID=openai modelID=gpt-5.4 session.id=ses_123c60247ffed3UX9zA1bZ0nNO small=false agent="Sisyphus - ultraworker" mode=primary
timestamp=2026-06-19T13:49:39.633Z level=ERROR run=2093686d message="stream error" providerID=openai modelID=gpt-5.4 session.id=ses_123c60247ffed3UX9zA1bZ0nNO small=false agent="Sisyphus - ultraworker" mode=primary error.error="AI_APICallError: The usage limit has been reached"
timestamp=2026-06-19T13:54:55.343Z level=INFO run=2093686d message=stream providerID=openai modelID=gpt-5.4 session.id=ses_123c60247ffed3UX9zA1bZ0nNO small=false agent="Sisyphus - ultraworker" mode=primary
timestamp=2026-06-19T13:54:56.219Z level=ERROR run=2093686d message="stream error" providerID=openai modelID=gpt-5.4 session.id=ses_123c60247ffed3UX9zA1bZ0nNO small=false agent="Sisyphus - ultraworker" mode=primary error.error="AI_APICallError: The usage limit has been reached"
timestamp=2026-06-19T13:54:56.221Z level=INFO run=2093686d message=cancel session.id=ses_123c60247ffed3UX9zA1bZ0nNO
**Note:** A repo-wide search for `openai.eu.gpt-5.5` on this session ID returns **zero** matches. The session eventually moved to `litellm/openai.eu.gpt-5.3-codex` only after manual model selection (~15:05), not via the configured fallback chain.
### B. Session where fallback eventually worked (after new user message)
Session: `ses_12458f0d8ffelx0gCO7PDXmJgU`
Last OpenAI failure:
timestamp=2026-06-18T19:08:43.382Z level=INFO run=97a2d187 message=stream providerID=openai modelID=gpt-5.5 session.id=ses_12458f0d8ffelx0gCO7PDXmJgU small=false agent="Sisyphus - ultraworker" mode=primary
timestamp=2026-06-18T19:08:44.509Z level=ERROR run=97a2d187 message="stream error" providerID=openai modelID=gpt-5.5 session.id=ses_12458f0d8ffelx0gCO7PDXmJgU small=false agent="Sisyphus - ultraworker" mode=primary error.error="AI_APICallError: The usage limit has been reached"
First successful LiteLLM stream (~17 minutes later, new run id, new user message):
timestamp=2026-06-18T19:25:08.436Z level=INFO run=82341f16 message=stream providerID=litellm modelID=openai.eu.gpt-5.5 session.id=ses_12458f0d8ffelx0gCO7PDXmJgU small=false agent="Sisyphus - ultraworker" mode=primary
### C. Subagent fallback works (different code path)
Same parent session `ses_123c60247ffed3UX9zA1bZ0nNO`, subagents after OpenAI quota errors:
timestamp=2026-06-18T19:40:35.049Z level=INFO … model.id=vertex_ai.anthropic.claude-sonnet-4-6 model.providerID=litellm … parentID=ses_123c60247ffed3UX9zA1bZ0nNO
Subsequent subagent streams use `providerID=litellm modelID=vertex_ai.anthropic.claude-sonnet-4-6`. Background-agent fallback respawn works; **primary `runtime_fallback` does not.**
### D. Primary session forced back to OpenAI on user message
Session had been on `litellm/openai.eu.gpt-5.4-mini` for ~16 hours. On new user messages at 11:28, streams switched to agent-config primary:
timestamp=2026-06-19T11:28:20.163Z level=INFO run=2093686d message=stream providerID=openai modelID=gpt-5.4 session.id=ses_123c60247ffed3UX9zA1bZ0nNO small=false agent="Sisyphus - ultraworker" mode=primary
This aligns with `chat.message` hook behavior that restores agent-config primary after cooldown (see root cause).
Configuration
Additional Context
- OpenCode version: 1.17.5
- oh-my-openagent version: 4.11.1 (bunx oh-my-opencode doctor --verbose, all checks passed)
- Model names verified: all litellm/* fallbacks exist in opencode.json, opencode models, and gateway catalog
- When runtime_fallback is disabled, legacy model-fallback is also disabled for events, so there is no automatic cross-provider fallback for primary sessions in the current config shape
Operating System
macOS
OpenCode Version
1.17.5