[Bug]: runtime_fallback persists duplicate user messages during model fallback

## Prerequisites

- [x] I will write this issue in English (see the Language Policy)
- [x] I have searched existing issues to avoid duplicates
- [x] I am using the latest version of oh-my-openagent
- [x] I have read the documentation / asked an AI coding agent with this project's GitHub URL loaded and couldn't find the answer

## Bug Description

`runtime_fallback` can persist the same user prompt as multiple distinct user messages in the same OpenCode session when a provider rate-limit/cooldown triggers fallback.

This is different from duplicated stream chunks or duplicated assistant output. In the OpenCode SQLite database, the repeated user turns have different `message.id` values and each fallback attempt is recorded as if the user had submitted the prompt again.

I inspected `oh-my-openagent@4.5.1` and the current runtime fallback path still extracts the last user message parts and re-dispatches them via `promptAsync`:

```js
// dist/index.js from oh-my-openagent@4.5.1
const lastUserMessage = messages?.filter((message) => message.info?.role === "user").pop();
const retryParts = ...

const promptResult = await dispatchInternalPrompt({
  mode: "async",
  source: `runtime-fallback:${source}`,
  input: {
    path: { id: sessionID },
    body: {
      ...retryModelPayload,
      parts: retryParts
    }
  }
});
```

`dispatchInternalPrompt(... mode: "async")` calls `client.session.promptAsync(...)`. Because this is a normal prompt submission, OpenCode 1.15 persists each fallback retry as a new user message.

## Steps to Reproduce

1. Use OpenCode with `oh-my-openagent@latest` (`4.5.1`) and OpenCode `1.15.11`.
2. Enable runtime/model fallback with multiple fallback models, for example:

```jsonc
{
  "model_fallback": true,
  "runtime_fallback": {
    "enabled": true,
    "retry_on_errors": [400, 429, 500, 502, 503, 529],
    "max_fallback_attempts": 3,
    "cooldown_seconds": 60,
    "timeout_seconds": 30,
    "notify_on_fallback": true
  },
  "agents": {
    "sisyphus": {
      "model": "github-copilot/gpt-5.5",
      "fallback_models": [
        "horologium/qwen3-7-max",
        "horologium/deepseek-v4-pro"
      ]
    }
  }
}
```

3. Send a prompt while the primary provider/model is rate-limited or cooling down.
4. Let runtime fallback switch to the fallback models.
5. Inspect the OpenCode DB (`message` / `part` tables) or exported session history.
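
The inspection in step 5 can be scripted once the rows are exported. Below is a minimal sketch, assuming each exported message has been mapped to a hypothetical `{ id, role, created, text }` shape (`created` in epoch milliseconds, `text` being the joined text of the message's parts). This is not the real OpenCode schema, and `findRepeatedUserTurns` is an illustrative helper, not part of either project. It flags runs of user messages that carry identical text within a short time window, which is the pattern described in this report.

```javascript
// Hypothetical row shape: { id, role, created, text }.
// `created` is epoch milliseconds; `text` is the joined text of the parts.
// Adapt the mapping to whatever your DB query or session export produces.
function findRepeatedUserTurns(messages, windowMs = 5 * 60 * 1000) {
  const users = messages
    .filter((m) => m.role === "user")
    .sort((a, b) => a.created - b.created);

  const groups = [];
  for (const msg of users) {
    const last = groups[groups.length - 1];
    const prev = last && last[last.length - 1];
    // Extend the current group only when the text matches exactly and the
    // message arrived within the window of the previous copy.
    if (prev && prev.text === msg.text && msg.created - prev.created <= windowMs) {
      last.push(msg);
    } else {
      groups.push([msg]);
    }
  }
  // Keep only groups where one prompt maps to more than one message id.
  return groups.filter((g) => g.length > 1).map((g) => g.map((m) => m.id));
}
```

The time window is a heuristic: a user can legitimately send the same prompt twice, so a non-empty result is a hint to inspect those message ids rather than proof of a fallback duplicate.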

## Expected Behavior

A runtime fallback retry should not create a second/third durable user turn for the same human-submitted prompt.

Possible acceptable behaviors:

- Reuse the original user message and attach fallback attempts as assistant/model-attempt metadata.
- Mark retry prompts with metadata such as `fallback_retry_of: <original_message_id>` so they can be hidden/deduplicated in history/export.
- Use an OpenCode API/path that retries the model invocation without creating a new user message, if available.

The user-visible and persisted session history should contain one user turn for one human submission.
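
To illustrate the second option, here is a minimal sketch of how a history or export view could hide marked retries. It assumes a hypothetical message shape `{ id, role, metadata }` and the `fallback_retry_of` field proposed above. Neither OpenCode nor oh-my-openagent is known to emit that field today, and `collapseFallbackRetries` is an illustrative name.

```javascript
// `fallback_retry_of` is the metadata name proposed in this issue; it is a
// hypothetical field, not something either project is known to write.
function collapseFallbackRetries(messages) {
  const knownIds = new Set(messages.map((m) => m.id));
  return messages.filter((m) => {
    const original = m.metadata && m.metadata.fallback_retry_of;
    // Hide a user message only if it is marked as a retry of a message that
    // is actually present, so the view never loses the original prompt.
    return !(m.role === "user" && original && knownIds.has(original));
  });
}
```

Assistant messages are never filtered, so every fallback attempt stays visible while the history shows one user turn per human submission.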

## Actual Behavior

The same human prompt is persisted multiple times as separate user messages, each with its own `msg_*` id, matching the fallback model sequence.

Observed sanitized timeline from OpenCode SQLite:

```text
created              role       user/provider-model                  assistant/provider-model       finish/error
2026-05-28 06:14:40  user       github-copilot/gpt-5.5
2026-05-28 06:14:40  assistant                                      github-copilot/gpt-5.5
2026-05-28 06:14:43  user       horologium/qwen3-7-max
2026-05-28 06:14:45  assistant                                      horologium/qwen3-7-max          tool-calls
2026-05-28 06:15:10  assistant                                      horologium/qwen3-7-max          MessageAbortedError
2026-05-28 06:15:13  user       horologium/deepseek-v4-pro
2026-05-28 06:15:13  assistant                                      horologium/deepseek-v4-pro      stop
```

The same pattern happened again a few minutes later:

```text
created              role       user/provider-model                  assistant/provider-model       finish/error
2026-05-28 06:18:55  user       github-copilot/gpt-5.5
2026-05-28 06:19:48  user       horologium/qwen3-7-max
2026-05-28 06:19:48  assistant                                      horologium/qwen3-7-max          stop
2026-05-28 06:20:18  user       horologium/deepseek-v4-pro
2026-05-28 06:20:18  assistant                                      horologium/deepseek-v4-pro      stop
```

The user did not manually resend those prompts. The only known trigger was provider-side rate limiting/cooldown.

## Doctor Output

```shell
$ bunx oh-my-openagent doctor
/bin/bash: bunx: command not found

$ npm exec --yes oh-my-openagent@latest -- doctor
# no output; command timed out after 180 seconds in this environment

$ opencode --version
1.15.11

$ npm view oh-my-openagent version
4.5.1

$ node --version
v24.13.0

$ npm --version
11.6.2
```

## Error Logs

The relevant persisted session included a `MessageAbortedError` during the fallback sequence:

```text
assistant horologium/qwen3-7-max error=MessageAbortedError
```

Related prior issues and release notes I found, none of which seems to cover this exact persistence behavior:

- #4019: duplicate output/thought streams / double delegation
- v4.2.0 release: promptAsync gate for duplicate assistant responses
- #2301: runtime_fallback / model_fallback conflict and `abort + promptAsync(prompt)` behavior
- #4006: internal abort misclassified as user cancellation causing retry loop

This report is specifically about fallback retries being stored as additional user messages in OpenCode session history.

## Configuration

Sanitized relevant config only:

```jsonc
{
  "plugin": ["oh-my-openagent@latest"],
  "default_agent": "sisyphus",
  "enabled_providers": ["horologium", "github-copilot"],
  "model_fallback": true,
  "runtime_fallback": {
    "enabled": true,
    "retry_on_errors": [400, 429, 500, 502, 503, 529],
    "max_fallback_attempts": 3,
    "cooldown_seconds": 60,
    "timeout_seconds": 30,
    "notify_on_fallback": true
  },
  "agents": {
    "sisyphus": {
      "model": "github-copilot/gpt-5.5",
      "fallback_models": [
        "horologium/qwen3-7-max",
        "horologium/deepseek-v4-pro"
      ]
    }
  }
}
```

## Additional Context

I attempted a small CLI reproduction with:

```shell
opencode run --agent sisyphus --model horologium/qwen3-7-max --title omo-fallback-test --format json "reply OK"
```

In my non-interactive harness, this command started OpenCode and plugin loading but did not persist a test session. The strongest evidence above therefore comes from a real affected OpenCode session, plus inspection of the packaged `oh-my-openagent@4.5.1` implementation.

