Runtime fallback and background subagents can stall or mark completed work interrupted #4528

@sigvardt

Summary

OpenCode sessions using Oh My OpenAgent can stall silently or mark completed subagent work as interrupted during provider retry/fallback and Atlas/background-task orchestration.

The failure appears either as a main command session that exits or stalls without a useful error, or as subagents that stall, are reported as interrupted, or fail internal retries with an agent-name mismatch.

Failure modes observed

  • Runtime fallback stores the OpenCode model payload as if it were always a string. When OpenCode sends an object-form model, the fallback comparison can throw current.toLowerCase is not a function.
  • First-prompt watchdog only covered subagents and used abort+prompt fallback. Main command sessions such as opencode run --command start-work --agent "Atlas - Plan Executor" could stay silent until OpenCode provider retry handling took over or the command exited.
  • session.status provider retry signals caused runtime fallback to abort the current request and dispatch an internal prompt immediately. In CLI command mode this can end the command runner instead of letting OpenCode's provider retry carry the queued fallback model.
  • Runtime fallback internal prompts pass plugin display-agent names such as Atlas - Plan Executor; OpenCode internal prompt routes only know core agent keys (build, explore, general, plan), producing Agent not found.
  • Background task completion aborts child sessions even after they have already reached idle/completed state, which can create false interrupted markers.
  • Atlas/Ralph continuation gates only considered running background tasks active, so pending background work could be skipped during parent continuation decisions.
  • Atlas task timers for background-launched work were not started through the same timer path as other tasks, and plan checkbox timer closure could happen after the orchestrator-only guard.
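
The first failure mode above (string vs. object model payloads) can be illustrated with a small normalization helper. This is only a sketch, not the code in the PR: the object shape { providerID, modelID } and the names ModelPayload, modelKey, and isSameModel are assumptions made for illustration.

```typescript
// Hypothetical payload shape: the field names of the object form are an assumption.
type ModelPayload = string | { providerID: string; modelID: string };

// Reduce either payload form to one lowercase "provider/model" key so that
// comparisons never call toLowerCase() on an object.
function modelKey(model: ModelPayload | undefined): string | undefined {
  if (model === undefined) return undefined;
  const raw =
    typeof model === "string" ? model : `${model.providerID}/${model.modelID}`;
  return raw.toLowerCase();
}

// True only when both sides are defined and normalize to the same key.
function isSameModel(
  a: ModelPayload | undefined,
  b: ModelPayload | undefined,
): boolean {
  const keyA = modelKey(a);
  const keyB = modelKey(b);
  return keyA !== undefined && keyA === keyB;
}
```

Normalizing once at the boundary keeps the stored fallback state a plain string, so later comparisons do not need to know which form OpenCode sent.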

Expected behavior

  • Fallback state should accept both string and object model payloads.
  • Main command sessions and subagents should recover through configured fallback models without silently exiting.
  • Provider auto-retry signals should queue the fallback model for OpenCode's next retry instead of aborting the request in command mode.
  • Internal fallback prompts should pass only OpenCode internal agent keys and omit the agent when only a plugin display-agent name is available.
  • Completed idle child sessions should not be aborted after successful background task completion.
  • Pending background tasks should block parent continuation just like running tasks.
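
Two of the expectations above (agent-key filtering and treating pending tasks as active) can be sketched as follows. This is illustrative only: the core agent keys are the four named in this issue, while internalAgentFor, hasActiveBackgroundTasks, and the BackgroundTask shape with its status values are hypothetical names, not the plugin's actual API.

```typescript
// Core agent keys as listed in this issue.
const CORE_AGENT_KEYS: ReadonlySet<string> = new Set([
  "build",
  "explore",
  "general",
  "plan",
]);

// Return the agent only if it is a core key; otherwise omit it so an
// internal prompt never carries a plugin display name.
function internalAgentFor(agent: string | undefined): string | undefined {
  return agent !== undefined && CORE_AGENT_KEYS.has(agent) ? agent : undefined;
}

// Hypothetical task model: status names are assumptions for illustration.
type TaskStatus = "pending" | "running" | "completed" | "error" | "cancelled";

interface BackgroundTask {
  id: string;
  status: TaskStatus;
}

// Pending work counts as active, so a parent continuation waits for it
// exactly as it waits for running work.
function hasActiveBackgroundTasks(tasks: readonly BackgroundTask[]): boolean {
  return tasks.some((t) => t.status === "pending" || t.status === "running");
}
```

With these helpers, a display name such as "Atlas - Plan Executor" resolves to no agent at all, and a parent with one pending task is still considered busy.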

Local verification

  • bun test (targeted runtime fallback tests)
  • bun test (Atlas task timer, background launch, and idle continuation tests)
  • bun test src/features/background-agent/manager.test.ts
  • bun test src/plugin/fallback.cliproxyapi-matrix.test.ts
  • bun run typecheck
  • bun run build
  • End-to-end stress run against a locally patched cache: Atlas command sessions launched multiple background subagents and completed without stalls or interrupted markers.

A PR with the fix is being opened from fix/runtime-fallback-background-stalls.
