bridge: ACP agent stderr swallowed — only opaque -32603 Internal error reaches Discord #854

@clsung

Description

When the ACP backend (e.g. codex-acp) hits a runtime error during a turn, openab forwards only the raw JSON-RPC error (-32603 Internal error) to the Discord channel. The agent's actual stderr, which carries the real cause, never reaches the bridge's log stream (kubectl logs), so remote debugging requires exec'ing into the pod and reproducing the ACP handshake by hand.

In our case the real stderr was:

ERROR codex_acp::thread: Unhandled error during turn:
{"detail":"The 'gpt-5.2-codex' model is not supported when using Codex with a ChatGPT account."}

…but the user only saw ⚠️ Internal Error (code: -32603) / Internal error in Discord, and kubectl logs <openab pod> had zero ERROR/WARN lines for the failed dispatch.

Discord discussion: https://discord.com/channels/1491295327620169908/1506171914282860575

Steps to Reproduce

  1. Deploy openab with agent.command = "codex-acp" (no args).
  2. Authenticate codex with a personal ChatGPT account (which does not include gpt-5.2-codex — the codex-acp default model).
  3. Cue the bot from Discord.
  4. Discord shows ⚠️ Internal Error (code: -32603) / Internal error.
  5. kubectl logs <openab pod> shows the dispatch completed at INFO level with the failure signature:
    batch dispatched ... agent_dispatch_ms=2926 tokens_per_event=[1] wait_ms=[429]
    
    No ERROR line carries the underlying cause. Restarting the bridge does not help (the root cause is per-session, not stale state).

Expected Behavior

  • ACP backend stderr should be captured and emitted at WARN/ERROR level in the bridge log stream, scoped to the channel/thread/session that triggered the failure, so operators don't need to enter the pod to find the cause.
  • (Nice-to-have) The user-facing Discord message could include a short hint that there is a backend error (without leaking raw stderr), so end users know whether to retry, switch model, or escalate.
  • (Possible heuristic) When a dispatch ends with tokens_per_event <= 1, a short agent_dispatch_ms, and a JSON-RPC error as the agent's last frame, the bridge could automatically log at WARN. That pattern was the only signal in our case, and we almost missed it.
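
To make the first bullet concrete, here is a minimal sketch of the general technique: spawn the agent with stderr piped, drain it on a background thread, and re-emit each line at WARN with the session key attached. It is written in Python purely for illustration and is not openab's code; the names `spawn_agent`, `drain_stderr`, `session_key`, and the logger name `bridge.agent_stderr` are all hypothetical.

```python
import collections
import logging
import subprocess
import sys
import threading

# Hypothetical logger name; a real bridge would use its own logging setup.
log = logging.getLogger("bridge.agent_stderr")


def drain_stderr(stream, session_key, tail):
    """Forward every stderr line from the agent to the bridge log at WARNING."""
    for raw in stream:
        line = raw.rstrip("\n")
        if line:
            tail.append(line)  # keep a bounded tail for later error reports
            log.warning("agent stderr session=%s: %s", session_key, line)


def spawn_agent(command, session_key, tail_lines=50):
    """Start the agent with stderr piped and drained on a background thread."""
    proc = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,   # JSON-RPC frames would be read from here
        stderr=subprocess.PIPE,   # captured instead of being discarded
        text=True,
    )
    tail = collections.deque(maxlen=tail_lines)
    thread = threading.Thread(
        target=drain_stderr, args=(proc.stderr, session_key, tail), daemon=True
    )
    thread.start()
    return proc, thread, tail
```

Draining on a separate thread keeps the stderr pipe from filling up and blocking the agent while the bridge is busy reading stdout. The bounded `tail` lets a later failure log include the last few stderr lines without unbounded memory growth.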

Environment

  • openab: k8s deployment via Helm chart; bridge binary mirrored from clsung/openab (recent main).
  • ACP agent: @zed-industries/codex-acp 0.10.0 (codex CLI @openai/codex 0.128.0).
  • Adapter: Discord.
  • Platform: linux/arm64 k8s pod (orbstack node).

Screenshots / Logs

Bridge log around the failure (no ERROR/WARN despite the failure):

2026-05-19T05:46:00.209454Z INFO dispatch{channel=... adapter="discord"}: openab::dispatch:
  batch dispatched thread_key=discord:... events_per_dispatch=1 packed_block_count=2
  agent_dispatch_ms=2926 tokens_per_event=[1] wait_ms=[429] senders=["clsung"]
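
Until the bridge logs such failures itself, existing logs can be scanned for the signature above. The following is a rough sketch, assuming the field names shown in this log line; the function names and the 5000 ms cutoff for a "short" dispatch are assumptions, and a log-only scan cannot check whether the agent's last frame was a JSON-RPC error, so it is weaker than the heuristic proposed under Expected Behavior.

```python
import re

# Matches the two fields of interest, with values either bare or in [..].
FIELD = re.compile(r"(agent_dispatch_ms|tokens_per_event)=(\[[^\]]*\]|\d+)")


def parse_dispatch_fields(line):
    """Extract integer values for the dispatch fields found in a log line."""
    fields = {}
    for name, value in FIELD.findall(line):
        fields[name] = [int(n) for n in re.findall(r"\d+", value)]
    return fields


def looks_like_silent_failure(line, max_dispatch_ms=5000):
    """Flag dispatches that ended quickly with at most one token per event.

    max_dispatch_ms is an assumed threshold, not a value from openab.
    """
    fields = parse_dispatch_fields(line)
    dispatch = fields.get("agent_dispatch_ms")
    tokens = fields.get("tokens_per_event")
    if not dispatch or not tokens:
        return False
    return dispatch[0] <= max_dispatch_ms and all(t <= 1 for t in tokens)
```

Short, legitimate replies can also match this pattern, so a hit is a prompt to investigate rather than proof of failure.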

Actual stderr from codex-acp (only visible by running the ACP handshake manually inside the pod):

2026-05-19T06:03:49.241962Z ERROR codex_acp::thread: Unhandled error during turn:
{"detail":"The 'gpt-5.2-codex' model is not supported when using Codex with a ChatGPT account."} Some(Other)
{"jsonrpc":"2.0","id":3,"error":{"code":-32603,"message":"Internal error","data":{
  "message":"{\"detail\":\"The 'gpt-5.2-codex' model is not supported when using Codex with a ChatGPT account.\"}",
  "codex_error_info":"other"
}}}
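
Note that the error frame above already carries the cause in `error.data.message`, so the bridge could log it even without capturing stderr. Here is a minimal sketch of pulling that cause out of a frame, assuming the `data` shape shown in this one example. JSON-RPC 2.0 leaves `data` optional and implementation-defined, so other agents or codex-acp versions may differ. The function name `describe_rpc_error` is hypothetical.

```python
import json


def describe_rpc_error(frame_text):
    """Return (code, cause) for a JSON-RPC error frame, or None otherwise."""
    frame = json.loads(frame_text)
    error = frame.get("error") if isinstance(frame, dict) else None
    if not isinstance(error, dict):
        return None

    # Fall back to the generic message when no richer detail is present.
    cause = error.get("message", "")
    data = error.get("data")
    if isinstance(data, dict) and isinstance(data.get("message"), str):
        inner = data["message"]
        try:
            parsed = json.loads(inner)  # codex-acp nests a JSON string here
        except ValueError:
            parsed = None
        if isinstance(parsed, dict) and isinstance(parsed.get("detail"), str):
            cause = parsed["detail"]
        else:
            cause = inner
    return error.get("code"), cause
```

The returned cause is meant for the operator log. Whether any part of it is safe to show in Discord is a separate decision, as the nice-to-have item notes.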

Notes

  • The triggering case (model mismatch with a ChatGPT account) is a codex-acp / upstream account-capability issue, not an openab bug per se. The actual openab gap is the lack of visibility into agent stderr — that's what this issue is about.
  • Workaround for our deployment: pinning model = "gpt-5.5" in ~/.codex/config.toml makes codex-acp stop selecting gpt-5.2-codex.
