Discord adapter: latency UX improvements (perception + proactive context management) #795

@RayKuo-Mantis

Description

Context

Running openab-claude:0.8.3-beta.7 + PR #791 swap on GB10 ARM64 host
(meta + mentor instances, ~28h uptime, normal daily-driver usage with Discord
adapter).

Observed behavior

Two distinct latency signals from openab::dispatch logs over a full day of
real usage:

Metric                                      Range            Comment
wait_ms (OpenAB queue wait)                 300–500 ms       Healthy across all turns
agent_dispatch_ms (inner agent processing)  2 sec – 395 sec  Wide variance

Breakdown by usage pattern:

  • Idle chat: 2–5 sec
  • Active sessions with tool chains: 30–100 sec common
  • Multi-tool reasoning-heavy turns: 100–400 sec
  • Peak observed today: 395 sec (about 6.6 min) on one turn

Root cause investigation

After issuing /compact inside the Claude session, agent_dispatch_ms
immediately dropped back to the 2–5 sec normal range. So:

  • ~80% of "Discord feels slow" is Claude CLI session jsonl context bloat
    (the full history is sent on every Anthropic API call, so the payload
    grows linearly with session size)
  • ~10% is ACP / Discord chunked-send overhead (json-rpc wrap + Discord API
    message chunking for long replies)
  • ~10% is perception (no streaming visibility — user sees nothing for
    30+ sec, which feels indistinguishable from "stuck")

OpenAB itself is not the latency root cause — wait_ms is consistently
healthy. The dominant factor is internal Claude/Anthropic processing, which
OpenAB only measures.

That said, OpenAB sits between the user and Claude CLI — it is the only layer
that can mitigate the user-facing experience of these long turns.

Suggested improvements

1. Typing indicator / partial output during long dispatch

Currently the Discord adapter waits for the agent to fully complete its turn
before pushing the reply. For 30+ sec turns this looks like the bot died.

  • Maintain Discord typing indicator while dispatch is in flight, OR
  • Stream partial output as Claude produces it (if ACP supports incremental
    chunks)
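
The typing-indicator option could be sketched roughly as below. This is a minimal illustration, not OpenAB's actual code: `dispatch` and `send_typing` are hypothetical callables standing in for the agent dispatch and for whatever call the adapter uses to trigger Discord's typing indicator. Discord clears the indicator after roughly ten seconds, so the sketch re-sends it on a shorter interval (the 8-second default is an assumption).

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def dispatch_with_typing(
    dispatch: Callable[[], Awaitable[T]],          # hypothetical: runs one agent turn
    send_typing: Callable[[], Awaitable[None]],    # hypothetical: triggers the typing indicator
    refresh_s: float = 8.0,                        # assumed refresh interval
) -> T:
    """Run dispatch() while re-sending the typing indicator every refresh_s seconds."""

    async def keep_typing() -> None:
        while True:
            await send_typing()
            await asyncio.sleep(refresh_s)

    typing_task = asyncio.create_task(keep_typing())
    try:
        return await dispatch()
    finally:
        # Stop the indicator loop whether the turn succeeded or failed.
        typing_task.cancel()
        # Wait for the cancellation to land so no task is left dangling.
        await asyncio.gather(typing_task, return_exceptions=True)
```

The indicator loop lives in its own task so that a slow or failing dispatch never blocks it, and the `finally` block guarantees it stops as soon as the reply is ready.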

2. Auto progress hint when agent_dispatch_ms is unusually long

When a turn exceeds a threshold (e.g. > 60 sec):

  • Auto-send a short "⏳ still processing..." message to keep the channel alive
  • Optionally include a hint like "session context is X% full, consider /compact"
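
One way the threshold-based hint might look, as a sketch under assumptions: `dispatch` and `send_hint` are hypothetical callables (the agent turn and a function that posts a message to the channel), and the 60-second default mirrors the example threshold above. The turn keeps running after the hint is sent; only the notification is added.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def dispatch_with_hint(
    dispatch: Callable[[], Awaitable[T]],           # hypothetical: runs one agent turn
    send_hint: Callable[[str], Awaitable[None]],    # hypothetical: posts a channel message
    threshold_s: float = 60.0,                      # example threshold from this issue
) -> T:
    """Send a one-off progress hint if the turn is still running after threshold_s."""
    task = asyncio.ensure_future(dispatch())
    try:
        # asyncio.wait with a timeout does not cancel the task when time runs out.
        done, _pending = await asyncio.wait({task}, timeout=threshold_s)
        if not done:
            await send_hint("⏳ still processing...")
        return await task
    except asyncio.CancelledError:
        # If the caller is cancelled, do not leave the turn running in the background.
        task.cancel()
        raise
```

Fast turns return without any extra message, so the hint only appears on the slow turns it is meant for.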

3. Expose session size to user

  • A slash command (e.g. /status) showing current jsonl size + last agent_dispatch_ms
  • Lets the user see context-bloat accumulating and proactively /compact
    before turns slow down
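
A status line of this kind could be produced along these lines. The function name, the output wording, and the idea that the adapter knows the session's jsonl path are all assumptions made for illustration; only the two reported values come from the suggestion above.

```python
import os


def format_status(jsonl_path: str, last_dispatch_ms: int) -> str:
    """Build a one-line status string from the session file size and the last turn's latency."""
    # Size of the session jsonl on disk, reported in MiB (1 MiB = 1024 * 1024 bytes).
    size_mib = os.path.getsize(jsonl_path) / (1024 * 1024)
    return (
        f"session jsonl: {size_mib:.1f} MiB | "
        f"last agent_dispatch_ms: {last_dispatch_ms} ms"
    )
```

For a 2 MiB session file and a 4200 ms last turn this yields `session jsonl: 2.0 MiB | last agent_dispatch_ms: 4200 ms`.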

4. Auto-compact (most impactful, but biggest change)

OpenAB is an agent layer with full ownership, so it can do proactive context
management that the upstream Claude Code CLI itself does not (Anthropic's
auto-compact only triggers near the hard context limit, not proactively).

Proposed mechanism:

  • Monitor signals (any of):
    • jsonl file size exceeds threshold (e.g. 10 MB)
    • rolling average agent_dispatch_ms over last N turns exceeds threshold (e.g. 30 sec)
    • time since last /compact exceeds T hours
  • Trigger: before delivering the next user message to the agent, OpenAB
    injects a synthetic /compact dispatch
  • Transparent to user: just a maintenance turn, not visible in Discord
  • Result: user never has to think about compacting, session stays in
    fresh-context regime indefinitely
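
The trigger decision could be sketched as a pure function over the three signals. Everything named here is hypothetical: `CompactPolicy`, `compact_reason`, and the defaults for N (5 turns) and T (12 hours) are placeholders, while the 10 MB and 30 sec values are the example thresholds from the list above. How the synthetic /compact is actually injected is left out, since that depends on the ACP interface.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CompactPolicy:
    max_jsonl_bytes: int = 10 * 1024 * 1024   # example from this issue: 10 MB
    max_avg_dispatch_ms: float = 30_000       # example from this issue: 30 sec
    window: int = 5                           # N: placeholder value
    max_hours_since_compact: float = 12.0     # T: placeholder value


def compact_reason(
    policy: CompactPolicy,
    jsonl_bytes: int,
    recent_dispatch_ms: Sequence[float],
    hours_since_compact: float,
) -> Optional[str]:
    """Return which signal fired, or None if no compaction is needed yet."""
    if jsonl_bytes > policy.max_jsonl_bytes:
        return "jsonl size"
    # Only trust the rolling average once a full window of turns is available.
    window = list(recent_dispatch_ms)[-policy.window:]
    if len(window) == policy.window:
        if sum(window) / len(window) > policy.max_avg_dispatch_ms:
            return "rolling dispatch average"
    if hours_since_compact > policy.max_hours_since_compact:
        return "time since last compact"
    return None
```

Returning the reason rather than a bare boolean makes it easy to log why a maintenance turn was injected, which helps when tuning the thresholds.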

This is structurally the kind of thing only an agent wrapper can do —
the upstream commercial Claude API can't modify Claude Code's own behavior,
but a layer that invokes Claude CLI can drive it proactively.

Priority

(1), (2), and (3) are perception-layer fixes: they keep the user informed that
the system is alive during slow turns. Cheap wins.

(4) is structural — eliminates the dominant 80% latency contributor entirely
for daily-driver use cases.

Logging suggestion (separate)

Consider splitting agent_dispatch_ms into:

  • claude_cli_ms (inner CLI + API time)
  • acp_overhead_ms (json-rpc serialization + Discord chunked-send)

Currently they're conflated. Splitting helps users (and you) distinguish
"Claude is slow today" from "OpenAB has overhead" when triaging issues.
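
As an illustration of the split, a small timing helper could accumulate each phase under its own key. This is only a sketch: the `timed` helper is hypothetical, and where exactly the boundary between CLI time and ACP/Discord overhead falls inside OpenAB is an assumption.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(timings: Dict[str, float], key: str) -> Iterator[None]:
    """Add the elapsed wall-clock time of the wrapped block, in ms, to timings[key]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        # Accumulate so a phase that runs several times per turn is summed.
        timings[key] = timings.get(key, 0.0) + elapsed_ms
```

A turn would then wrap the CLI call in `with timed(t, "claude_cli_ms"):` and the serialization plus chunked send in `with timed(t, "acp_overhead_ms"):`, and log both values alongside the existing total.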

Environment

  • Host: GB10 ARM64, Ubuntu 24.04
  • OpenAB image: ghcr.io/openabdev/openab-claude:0.8.3-beta.7 (also tested
    with PR #791, "fix: reconnect Discord gateway on silent WS disconnect",
    swapped in on the meta instance)
  • Two compose instances side by side (meta + mentor)
  • Use case: continuous daily-driver, Discord-only interface
