
[Bug]: runtime_fallback persists duplicate user messages during model fallback #4595

@Edison-A-N

Prerequisites

  • I will write this issue in English (see the Language Policy)
  • I have searched existing issues to avoid duplicates
  • I am using the latest version of oh-my-openagent
  • I have read the documentation / asked an AI coding agent with this project's GitHub URL loaded and couldn't find the answer

Bug Description

runtime_fallback can persist the same user prompt as multiple distinct user messages in the same OpenCode session when a provider rate-limit/cooldown triggers fallback.

This is different from duplicated stream chunks or duplicated assistant output. In the OpenCode SQLite database, the repeated user turns have different message.id values and each fallback attempt is recorded as if the user had submitted the prompt again.

I inspected oh-my-openagent@4.5.1 and the current runtime fallback path still extracts the last user message parts and re-dispatches them via promptAsync:

// dist/index.js from oh-my-openagent@4.5.1
const lastUserMessage = messages?.filter((message) => message.info?.role === "user").pop();
const retryParts = ...

const promptResult = await dispatchInternalPrompt({
  mode: "async",
  source: `runtime-fallback:${source}`,
  input: {
    path: { id: sessionID },
    body: {
      ...retryModelPayload,
      parts: retryParts
    }
  }
});

dispatchInternalPrompt(... mode: "async") calls client.session.promptAsync(...). Because this is a normal prompt submission, OpenCode 1.15 persists each fallback retry as a new user message.
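
To illustrate the mechanism, here is a minimal in-memory sketch, not the plugin's actual code. `FakeSession` and `runWithFallback` are hypothetical stand-ins: the only assumption carried over from the report is that every prompt submission persists a new user message, so each fallback attempt adds one.

```javascript
// Hypothetical model of the reported behaviour; none of these names are
// real OpenCode or oh-my-openagent APIs.
class FakeSession {
  constructor() {
    this.messages = [];
    this.nextId = 1;
  }

  // Models a normal prompt submission: it always persists a new user message.
  prompt(model, parts) {
    const id = `msg_${this.nextId++}`;
    this.messages.push({ id, role: "user", model, parts });
    return id;
  }
}

// Tries each model in order and re-submits the same parts on every attempt.
function runWithFallback(session, models, parts, isRateLimited) {
  for (const model of models) {
    session.prompt(model, parts);
    if (!isRateLimited(model)) return model;
  }
  return null;
}

const session = new FakeSession();
const winner = runWithFallback(
  session,
  [
    "github-copilot/gpt-5.5",
    "horologium/qwen3-7-max",
    "horologium/deepseek-v4-pro",
  ],
  [{ type: "text", text: "reply OK" }],
  // Assumption for the demo: only the last model is not rate-limited.
  (model) => model !== "horologium/deepseek-v4-pro"
);

const userTurns = session.messages.filter((m) => m.role === "user");
console.log(winner, userTurns.length); // horologium/deepseek-v4-pro 3
```

One human submission ends up as three user messages with distinct ids, which matches the pattern seen in the database.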

Steps to Reproduce

  1. Use OpenCode with oh-my-openagent@latest (4.5.1) and OpenCode 1.15.11.
  2. Enable runtime/model fallback with multiple fallback models, for example:
{
  "model_fallback": true,
  "runtime_fallback": {
    "enabled": true,
    "retry_on_errors": [400, 429, 500, 502, 503, 529],
    "max_fallback_attempts": 3,
    "cooldown_seconds": 60,
    "timeout_seconds": 30,
    "notify_on_fallback": true
  },
  "agents": {
    "sisyphus": {
      "model": "github-copilot/gpt-5.5",
      "fallback_models": [
        "horologium/qwen3-7-max",
        "horologium/deepseek-v4-pro"
      ]
    }
  }
}
  3. Send a prompt while the primary provider/model is rate-limited or cooling down.
  4. Let runtime fallback switch to the fallback models.
  5. Inspect the OpenCode DB (message / part tables) or exported session history.

Expected Behavior

A runtime fallback retry should not create a second/third durable user turn for the same human-submitted prompt.

Possible acceptable behaviors:

  • Reuse the original user message and attach fallback attempts as assistant/model-attempt metadata.
  • Mark retry prompts with metadata such as fallback_retry_of: <original_message_id> so they can be hidden/deduplicated in history/export.
  • Use an OpenCode API/path that retries the model invocation without creating a new user message, if available.

The user-visible and persisted session history should contain one user turn for one human submission.
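
As a rough sketch of the metadata option, a history or export view could hide retry prompts that point back at an original message. The message shape (`id`, `role`, `metadata`) and the helper name `visibleHistory` are assumptions for illustration. `fallback_retry_of` is only the field name suggested above, not an existing OpenCode field.

```javascript
// Hypothetical helper: hides user messages marked as fallback retries.
function visibleHistory(messages) {
  const knownIds = new Set(messages.map((m) => m.id));
  return messages.filter((m) => {
    const original = m.metadata?.fallback_retry_of;
    // Hide a retry only if its original is present, so no prompt is lost
    // when the original message is missing from the export.
    return !(m.role === "user" && original && knownIds.has(original));
  });
}

const history = [
  { id: "msg_1", role: "user", metadata: {} },
  { id: "msg_2", role: "assistant", metadata: {} },
  { id: "msg_3", role: "user", metadata: { fallback_retry_of: "msg_1" } },
  { id: "msg_4", role: "assistant", metadata: {} },
  { id: "msg_5", role: "user", metadata: { fallback_retry_of: "msg_missing" } },
];

const visibleIds = visibleHistory(history).map((m) => m.id);
console.log(visibleIds.join(",")); // msg_1,msg_2,msg_4,msg_5
```

Assistant messages from each fallback attempt stay visible; only the repeated user turn is collapsed.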

Actual Behavior

The same human prompt is persisted multiple times as separate user messages, each with its own msg_* id, matching the fallback model sequence.

Observed sanitized timeline from OpenCode SQLite:

created              role       user/provider-model                  assistant/provider-model       finish/error
2026-05-28 06:14:40  user       github-copilot/gpt-5.5
2026-05-28 06:14:40  assistant                                      github-copilot/gpt-5.5
2026-05-28 06:14:43  user       horologium/qwen3-7-max
2026-05-28 06:14:45  assistant                                      horologium/qwen3-7-max          tool-calls
2026-05-28 06:15:10  assistant                                      horologium/qwen3-7-max          MessageAbortedError
2026-05-28 06:15:13  user       horologium/deepseek-v4-pro
2026-05-28 06:15:13  assistant                                      horologium/deepseek-v4-pro      stop

The same pattern happened again a few minutes later:

2026-05-28 06:18:55  user       github-copilot/gpt-5.5
2026-05-28 06:19:48  user       horologium/qwen3-7-max
2026-05-28 06:19:48  assistant                                      horologium/qwen3-7-max          stop
2026-05-28 06:20:18  user       horologium/deepseek-v4-pro
2026-05-28 06:20:18  assistant                                      horologium/deepseek-v4-pro      stop

The user did not manually resend those prompts. The only known trigger was provider-side rate limiting/cooldown.
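
For triage, a small heuristic like the following could flag suspected duplicate user turns in exported history. It is only a sketch: the row shape (`created` in milliseconds, `role`, `model`, `text`), the helper name `findRepeatedUserTurns`, and the five-minute window are all assumptions, and the prompt text below is a placeholder because the timeline above is sanitized.

```javascript
// Hypothetical heuristic: groups consecutive user rows with identical text
// that fall within a time window. Rows are assumed sorted by `created`.
function findRepeatedUserTurns(rows, windowMs = 5 * 60 * 1000) {
  const groups = [];
  for (const row of rows.filter((r) => r.role === "user")) {
    const last = groups[groups.length - 1];
    const sameText = last !== undefined && last[0].text === row.text;
    const close =
      last !== undefined &&
      row.created - last[last.length - 1].created <= windowMs;
    if (sameText && close) last.push(row);
    else groups.push([row]);
  }
  // Only groups with more than one row are suspected duplicates.
  return groups.filter((g) => g.length > 1);
}

const at = (h, m, s) => Date.UTC(2026, 4, 28, h, m, s);
const rows = [
  { created: at(6, 14, 40), role: "user", model: "github-copilot/gpt-5.5", text: "<prompt A>" },
  { created: at(6, 14, 40), role: "assistant", model: "github-copilot/gpt-5.5", text: "" },
  { created: at(6, 14, 43), role: "user", model: "horologium/qwen3-7-max", text: "<prompt A>" },
  { created: at(6, 15, 13), role: "user", model: "horologium/deepseek-v4-pro", text: "<prompt A>" },
];

const repeated = findRepeatedUserTurns(rows);
console.log(repeated.length, repeated[0].map((r) => r.model));
```

A user who deliberately resends the same prompt would also be flagged, so the result is a starting point for inspection rather than proof of a fallback retry.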

Doctor Output

$ bunx oh-my-openagent doctor
/bin/bash: bunx: command not found

$ npm exec --yes oh-my-openagent@latest -- doctor
# no output; command timed out after 180 seconds in this environment

$ opencode --version
1.15.11

$ npm view oh-my-openagent version
4.5.1

$ node --version
v24.13.0

$ npm --version
11.6.2

Error Logs

The relevant persisted session included a MessageAbortedError during the fallback sequence:

assistant horologium/qwen3-7-max error=MessageAbortedError

Prior related issues I found, but which do not seem to cover this exact persistence behavior:

This report is specifically about fallback retries being stored as additional user messages in OpenCode session history.

Configuration

Sanitized relevant config only:

{
  "plugin": ["oh-my-openagent@latest"],
  "default_agent": "sisyphus",
  "enabled_providers": ["horologium", "github-copilot"],
  "model_fallback": true,
  "runtime_fallback": {
    "enabled": true,
    "retry_on_errors": [400, 429, 500, 502, 503, 529],
    "max_fallback_attempts": 3,
    "cooldown_seconds": 60,
    "timeout_seconds": 30,
    "notify_on_fallback": true
  },
  "agents": {
    "sisyphus": {
      "model": "github-copilot/gpt-5.5",
      "fallback_models": [
        "horologium/qwen3-7-max",
        "horologium/deepseek-v4-pro"
      ]
    }
  }
}

Additional Context

I attempted a small CLI reproduction with:

opencode run --agent sisyphus --model horologium/qwen3-7-max --title omo-fallback-test --format json "reply OK"

In my non-interactive harness this started OpenCode/plugin loading but did not persist a test session, so the strongest evidence above is from a real affected OpenCode session plus inspection of the oh-my-openagent@4.5.1 packaged implementation.
