Stream in-progress assistant responses live in the chat view #153

Open
gnguralnick wants to merge 15 commits into main from gabriel/streaming-messages

@gnguralnick commented Jun 11, 2026

Shows a Claude agent's response in the system interface chat as it is being typed, instead of only after the turn finalizes.

How it works

mngr's Claude plugin can capture the agent's tmux pane on an interval, reverse-map the rendered assistant text back to markdown, and write the in-progress message to $MNGR_AGENT_STATE_DIR/plugin/claude/stream_buffer. This surfaces that buffer in the chat:

  • .mngr/settings.toml: enables streaming for chat agents with a conservative 1s capture interval (agent_types.claude.streaming_snapshot_interval_seconds) — the continuously-running, comparatively expensive half, kept modest.
  • Backend (stream_watcher.py): AgentStreamWatcher tails the buffer on a conservative 2s poll (poll-only, not watchdog, so the fan-out rate is capped regardless of how fast mngr rewrites the file) and broadcasts an assistant_streaming snapshot over the existing SSE channel only when the content changes — an idle agent costs one read per interval and no broadcasts. Frames carry no session_id, so they ride the main stream and are excluded from per-subagent streams. Wired in alongside the session/tickets watchers.
  • Frontend: renders a provisional, dimmed assistant bubble at the live tail, replaced by the durable assistant_message the instant it lands.
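
The backend half can be sketched as a poll loop that remembers the last snapshot it broadcast. This is a minimal illustration under stated assumptions, not the code in stream_watcher.py: `PollingStreamWatcher`, `poll_once`, the `broadcast` callback, and the frame's `body` field are hypothetical names, and the buffer is treated as plain text.

```python
from pathlib import Path
from typing import Callable, Optional


class PollingStreamWatcher:
    """Hypothetical poll-only watcher: one read per tick, broadcast only on change."""

    def __init__(self, buffer_path: Path, broadcast: Callable[[dict], None]) -> None:
        self._buffer_path = buffer_path
        self._broadcast = broadcast
        self._last_sent: Optional[str] = None

    def _read_buffer(self) -> Optional[str]:
        try:
            return self._buffer_path.read_text(encoding="utf-8")
        except OSError:
            # A missing buffer is expected before the agent has streamed anything.
            return None

    def poll_once(self) -> bool:
        """Read the buffer once; return True if a frame was broadcast."""
        body = self._read_buffer()
        if body is None or body == self._last_sent:
            # Unchanged or absent content costs one read and no broadcast.
            return False
        self._last_sent = body
        # The frame deliberately carries no session_id (assumed frame shape).
        self._broadcast({"type": "assistant_streaming", "body": body})
        return True
```

A caller would invoke `poll_once()` on a fixed timer (2s in this description), so the fan-out rate is bounded by the timer rather than by how often the file is rewritten.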

Keeping the preview honest

mngr's stream buffer is an approximate, raw view of the tmux pane — it keeps showing the last assistant block until the agent idles, and re-shows it at the start of the next turn. mngr's own reference consumer (mngr_robinhood) reconciles these raw snapshots downstream, so the system interface owns its reconciliation rather than changing the shared mngr contract:

  • Idle is a hard guarantee: no preview renders when the agent's activity state is IDLE.
  • No prior-turn lingering: the preview resets on a new user message (and on a finalized assistant message), so a just-sent turn drops any leftover in-progress text immediately.
  • Whitespace-tolerant staleness check: the preview is suppressed unless it carries content beyond the latest finalized message, compared with a whitespace-tolerant walk ported from mngr_robinhood's stream_buffer, so mngr's cosmetic rendering differences (trailing spaces, a collapsed blank line) don't defeat the check.
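
The staleness check can be illustrated with a two-pointer walk that skips whitespace on both sides. This is a sketch of the idea only, written in Python for illustration; `has_content_beyond` is a hypothetical name and the actual port of mngr_robinhood's logic may differ in detail.

```python
def has_content_beyond(preview: str, finalized: str) -> bool:
    """Return True if `preview` carries non-whitespace content that the
    latest finalized message does not already account for."""
    i, j = 0, 0
    while True:
        # Skip whitespace on both sides so cosmetic differences do not matter.
        while i < len(preview) and preview[i].isspace():
            i += 1
        while j < len(finalized) and finalized[j].isspace():
            j += 1
        if i == len(preview):
            # Preview is exhausted: it adds nothing beyond the finalized text.
            return False
        if j == len(finalized):
            # Finalized text is exhausted but the preview keeps going.
            return True
        if preview[i] != finalized[j]:
            # The texts diverge, so treat the preview as a new message.
            return True
        i += 1
        j += 1
```

For example, `has_content_beyond("Done.  \n---\nNext", "Done.\n\n---\n\nNext")` is false because the two texts differ only in whitespace. One consequence of this sketch is that a new message whose text so far is a prefix of the last finalized message stays suppressed until it diverges.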

Both intervals are intentionally conservative to keep CPU cost predictable and can be lowered if responses feel laggy. The preview is a live approximation only; the durable transcript event remains the source of truth.

Base

Built on a sync of vendor/mngr to upstream mngr main (10d7996e), with the root uv.lock regenerated to match the bumped vendored package versions.

Tests

  • Backend: apps/system_interface suite passing (492), coverage above threshold; new stream_watcher unit tests.
  • Frontend: lint clean, 301 tests passing; new tests for the streaming preview lifecycle and the whitespace-tolerant staleness logic.

Gabriel Guralnick added 6 commits June 11, 2026 10:27
Replaces the vendored mngr copy with the tracked tree of the mngr repo's
current main at commit 10d7996e567f74a805d26f2bba8a5a9a38cbe46b (via
`git archive HEAD` from the upstream checkout). Picks up everything merged
into mngr main since the previous vendor snapshot (94211c77), 1520 commits.

The synced packages bumped their versions (imbue-common 0.1.18->0.1.19,
resource-guards 0.1.7->0.1.8, concurrency-group 0.1.18->0.1.19,
imbue-mngr / imbue-mngr-claude 0.2.10->0.2.12), so the root uv.lock is
regenerated to match -- no third-party packages added or removed, only
the editable path-package versions bumped. Without this, a --frozen/--locked
Docker or CI build would fail to resolve.

origin/main re-synced vendor/mngr to mngr 815069d8, which is an ancestor
of this branch's sync to mngr 10d7996e (10 commits newer). Resolved the
two conflicting minds desktop_client files by keeping this branch's
version; the result matches the upstream mngr 10d7996e tracked tree
exactly and the root uv.lock remains consistent (uv lock --check passes).

Show a Claude agent's response as it is being typed, rather than only after
the turn finalizes. mngr's Claude plugin can capture the agent's tmux pane on
an interval and write the in-progress markdown to
$MNGR_AGENT_STATE_DIR/plugin/claude/stream_buffer; this surfaces that in the
system interface.

- .mngr/settings.toml: enable streaming for chat agents with a conservative 1s
  capture interval (agent_types.claude.streaming_snapshot_interval_seconds). This
  is the continuously-running, comparatively expensive half (a tmux capture +
  reverse-map per interval, per agent), so it is kept modest.
- stream_watcher.py: AgentStreamWatcher tails the stream_buffer on a conservative
  2s poll (deliberately poll-only, not watchdog, so the read/fan-out rate is
  capped regardless of how fast mngr rewrites the file) and broadcasts an
  assistant_streaming snapshot only when the (last_complete_id, body) pair
  changes -- so an idle agent costs one stat+read per interval and no broadcasts.
  Frames carry no session_id, so they ride the main stream and are excluded from
  per-subagent streams. Wired in alongside the session/tickets watchers.
- frontend: render a provisional, dimmed assistant bubble at the live tail from
  the latest assistant_streaming frame, replaced the instant the durable
  assistant_message lands (so the two never double-render). Cleared on idle and
  on stream disconnect.

Both intervals are intentionally conservative to keep CPU cost predictable; they
can be lowered if responses feel laggy. Streaming is a live preview only -- the
durable transcript event remains the source of truth.

The pulsing in-progress bubble lingered after the canonical assistant message
rendered, and re-appeared at the start of the next turn. Root cause: mngr's
stream buffer keeps the last assistant block as the "in-progress" body until the
agent goes idle, so the watcher re-broadcasts that text -- under a new
last_complete_id when the message commits, and again on the next turn's activity
before fresh output streams. The one-shot clear-on-assistant_message was
promptly undone by the next stale frame.

Fix presentation at render time rather than chasing every stale frame: a new
shouldShowStreamingPreview() suppresses the bubble when the agent is IDLE (no
response can be in flight) or when the preview text already equals the latest
finalized assistant message (whitespace-normalized, since mngr's reverse-mapped
markdown differs cosmetically from the transcript). A genuinely new, still-
streaming message differs from the last finalized one, so it still shows.

Pure decision + normalizer are unit-tested; the chat view calls them with the
preview text, the latest finalized assistant text, the agent's activity state,
and the tail-anchored flag.
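
The render-time decision described in this commit could look roughly like the following. It is a Python sketch of the logic, not the frontend code: the snake_case names are hypothetical, only the IDLE state name comes from the text above, and the assumption that the preview is hidden when the view is not tail-anchored is a guess at how that flag is used.

```python
import re
from typing import Optional


def normalize_whitespace(text: str) -> str:
    """Collapse every whitespace run to one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def should_show_streaming_preview(
    preview: Optional[str],
    latest_finalized: Optional[str],
    activity_state: str,
    is_tail_anchored: bool,
) -> bool:
    if activity_state == "IDLE":
        # No response can be in flight while the agent is idle.
        return False
    if not is_tail_anchored:
        # Assumption: the preview only renders at the live tail.
        return False
    if preview is None or not normalize_whitespace(preview):
        return False
    if latest_finalized is not None and (
        normalize_whitespace(preview) == normalize_whitespace(latest_finalized)
    ):
        # The preview is just the already-finalized message re-shown.
        return False
    return True
```

Deciding at render time means a stale frame that arrives after the finalized message is simply never displayed, so there is no need to clear state for each one.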
Following the established division of labor -- mngr's stream buffer is an
approximate, raw view of the tmux pane, and its reference consumer
(mngr_robinhood) owns the snapshot reconciliation -- harden the system
interface's own reconciliation instead of changing the shared mngr contract.

Two guarantees:
- Idle is a hard gate: when the agent's activity state is IDLE, the preview
  never renders (a settled agent has no response in flight).
- Prior-turn text can't linger: the preview is reset on a new user_message (as
  well as on a finalized assistant_message), so a just-sent turn immediately
  drops any in-progress text mngr is still showing.

Replace the brittle exact text-equality check with a whitespace-tolerant
"does the preview carry content beyond the latest finalized message" walk,
ported from mngr_robinhood's stream_buffer._unemitted_suffix_start. mngr's
reverse-mapped markdown differs cosmetically (trailing spaces, a collapsed
blank line around a rule) from the canonical transcript text, so an exact
compare would miss the lingering/re-shown message; the tolerant walk recognizes
it as already-finalized while still showing a genuinely new message.

…ant check

The previous commit replaced the exact text-equality staleness check with the
whitespace-tolerant 'adds nothing beyond the finalized message' walk; sync the
ChatPanel docstring wording to match.
@gnguralnick changed the title from "Sync vendored mngr to current upstream main" to "Stream in-progress assistant responses live in the chat view" Jun 11, 2026
@gnguralnick marked this pull request as ready for review June 11, 2026 22:39
@joshalbrecht

roughly lgtm, but:

  1. we really need to add something (at the mngr level) so that we can make it more efficient before we can really merge this
  2. resolve the merge conflicts

My vote for the simplest way to accomplish #1 is to:
A) ensure that it doesn't bother doing the streaming stuff unless the agent is actually working
B) disable streaming for worker agents (vs the chat agents) (in the FCT repo, via the config)
C) ensure that our polling interval here is ~5 seconds or so

I think that would make this efficient enough that we could have it on without worrying about the performance impacts too much (but we'll need to monitor as well).

Gabriel Guralnick added 9 commits June 15, 2026 10:46
origin/main's "simplify-progress" rework removed the tickets-watcher /
step_enrichment side-channel that drove the progress view, deriving
progress decoration from the transcript instead. This branch's
in-progress assistant-response streaming (AgentStreamWatcher,
assistant_streaming SSE frames, the streaming preview bubble) is
independent of that machinery, so the resolution keeps the streaming
feature and drops the removed watcher plumbing:

- server.py: keep the stream-watcher wiring, drop the tickets-watcher
  wiring (its module was deleted on origin/main).
- StreamingMessage.ts: keep the assistant_streaming SSE handler, drop the
  step_enrichment handler (origin/main removed its imports); restore the
  mithril import the preview redraw needs.
- test_ratchets.py: __init__ ratchet count is now 6 (origin/main's 5
  minus the removed tickets watcher, plus AgentStreamWatcher).

Three changes to reduce the cost of approximate response streaming:

- mngr (stream_snapshot.py): do no streaming work at all while an agent is
  idle -- no transcript read, no tmux capture, no buffer write. Previously
  the expensive pane capture was already gated on the `active` marker, but
  the loop still read the transcript and rewrote the buffer every interval
  while idle. Now the only idle-time work is a single clearing write on the
  active->idle edge (so a stale in-progress preview can't linger). Threaded
  via a `was_active` flag returned from each poll. (To be upstreamed to mngr
  separately.)

- .mngr/settings.toml: disable streaming for the `worker` agent type
  (streaming_snapshot_interval_seconds = 0). Delegated workers run headless
  with nobody watching the pane, so the per-interval capture is pure
  overhead. Covers the worker and crystallize-worker templates.

- Both poll intervals set to ~5s for a ~5s average end-to-end preview update:
  mngr's streaming_snapshot_interval_seconds 1.0 -> 5.0 (the expensive tmux
  capture) and the system interface's STREAM_POLL_INTERVAL_SECONDS 2.0 -> 5.0
  (the buffer read). The two loops are independent, so average latency is
  roughly the sum of their half-intervals (2.5s + 2.5s).
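
The idle gating and the latency estimate above can be sketched as follows. This is an illustration under assumptions, not the code in stream_snapshot.py: the function names, the `capture` and `write_buffer` callbacks, and clearing the buffer by writing an empty string are all hypothetical.

```python
from typing import Callable


def poll_stream_snapshot(
    is_active: bool,
    was_active: bool,
    capture: Callable[[], str],
    write_buffer: Callable[[str], None],
) -> bool:
    """Run one poll and return the value to pass as `was_active` next time."""
    if is_active:
        # Only an active agent pays for the capture and the buffer write.
        write_buffer(capture())
        return True
    if was_active:
        # Active -> idle edge: one clearing write so stale text cannot linger.
        write_buffer("")
    # Idle and already cleared: no capture, no write.
    return False


def average_preview_latency_seconds(capture_interval: float, poll_interval: float) -> float:
    """Two independent periodic loops each add half their interval on average."""
    return capture_interval / 2 + poll_interval / 2
```

With both intervals at 5.0 the estimate is 2.5 + 2.5 = 5.0 seconds; the previous 1.0 and 2.0 settings gave 0.5 + 1.0 = 1.5 seconds.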
The in-progress assistant preview was always rendered as a trailing bubble
below every open step, ignoring the progress-view structure. When the tail
turn has an open step, route the preview into that step's expanded body
instead, so the live output sits with the work it belongs to.

- turn-grouping: add tailFrontierStep() to identify the open step (if any)
  that owns the live preview on the tail turn.
- ProgressBlock: render the preview at the tail of the frontier step's
  expanded body, and make a frontier step expandable even before its first
  finalized event. Narration stays finalized-transcript-derived (it swaps,
  it does not grow).
- ChatPanel: read the preview live in the row render closure (kept out of the
  memoized rows cache) and suppress the standalone trailing bubble when an open
  step owns the stream. No open step (no-steps turn, or steps all closed)
  keeps the default-visible bubble.
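
The routing rule can be sketched as below. This is a Python illustration with made-up data shapes, not the frontend's turn-grouping code: `Step`, `Turn`, `is_open`, and `preview_placement` are hypothetical, and picking the last open step on the tail turn is an assumption about which step counts as the frontier.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    title: str
    is_open: bool


@dataclass
class Turn:
    steps: List[Step] = field(default_factory=list)


def tail_frontier_step(turns: List[Turn]) -> Optional[Step]:
    """Return the open step on the tail turn that should own the preview."""
    if not turns:
        return None
    for step in reversed(turns[-1].steps):
        if step.is_open:
            return step
    # No-steps turn, or every step already closed.
    return None


def preview_placement(turns: List[Turn]) -> str:
    """Decide where the live preview renders."""
    return "step_body" if tail_frontier_step(turns) is not None else "trailing_bubble"
```

Keeping the placement decision in one pure function makes it easy to check that exactly one of the two locations renders the preview.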
…sages

# Conflicts:
#	vendor/mngr/apps/minds/imbue/minds/desktop_client/agent_creator.py
#	vendor/mngr/apps/minds/imbue/minds/desktop_client/agent_creator_test.py
…messages

Problem: connectToStream cleared the live in-progress assistant preview on
every user_message event. Claude Code emits user_message events mid-turn for
non-boundary content (skill expansions, stop-hook feedback, /welcome), so a
skill invocation or stop hook firing while the agent was actively streaming
would flicker the live bubble off until the next ~5s snapshot frame.

Fix: only reset the preview on a genuine boundary user_message, reusing the
existing isNonBoundaryUserMessage predicate that the turn-grouping and
rendering layers already use to decide turn boundaries. The assistant_message
reset is unchanged. Added a regression test covering skill-expansion and
stop-hook user_messages arriving mid-stream.
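
The reset rule can be sketched as a small state transition over incoming events. It is a Python illustration rather than the frontend handler: the event dictionaries, the `kind` tags, and this stand-in for the isNonBoundaryUserMessage predicate are hypothetical, since the real predicate's logic is not shown here.

```python
from typing import Optional

# Hypothetical tags for user_message events that do not start a new turn.
NON_BOUNDARY_KINDS = {"skill_expansion", "stop_hook_feedback"}


def is_non_boundary_user_message(event: dict) -> bool:
    """Stand-in predicate; the real one may inspect the message differently."""
    return event.get("kind") in NON_BOUNDARY_KINDS


def next_preview(preview: Optional[str], event: dict) -> Optional[str]:
    """Return the preview text after applying one stream event."""
    event_type = event.get("type")
    if event_type == "assistant_streaming":
        return event.get("body")
    if event_type == "assistant_message":
        # The durable message replaces the provisional bubble.
        return None
    if event_type == "user_message" and not is_non_boundary_user_message(event):
        # Only a genuine turn boundary drops the in-progress text.
        return None
    return preview
```

A mid-turn skill expansion therefore leaves the preview untouched, while a user message that starts a new turn clears it immediately.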
Problem: AgentStreamWatcher._read_buffer caught OSError and returned None
without logging, silently swallowing genuine read failures (permission/I/O
errors), and the module-level loguru `logger` binding it was evidently meant
to use was left dead/unused.
Fix: log the caught OSError at debug level before returning None, matching the
sibling AgentSessionWatcher pattern in session_watcher.py -- quiet by default
for the expected absent-buffer case, present for diagnosing real failures, and
putting the previously-unused logger binding to use.
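
The shape of the fix can be sketched as follows. This example swaps in the standard library's `logging` module so it runs without extra dependencies, whereas the module described above uses loguru; `read_buffer` and the logger name are hypothetical.

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger("stream_watcher_sketch")


def read_buffer(path: Path) -> Optional[str]:
    """Return the buffer text, or None if it cannot be read."""
    try:
        return path.read_text(encoding="utf-8")
    except OSError as e:
        # Debug level: quiet for the expected missing-file case, but still
        # visible when diagnosing permission or I/O failures.
        logger.debug("Could not read stream buffer %s: %s", path, e)
        return None
```

Logging at debug level keeps normal output unchanged, since an absent buffer is the common case before an agent has streamed anything.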
Merging origin/main brought in the minds 0.3.1 vendored mngr refresh, whose
workspace packages declare new dependencies (anthropic for mngr_claude's
agents_to_message API, requests for mngr, docstring-parser) and version bumps
(concurrency-group, imbue-common, imbue-mngr). origin/main's committed uv.lock
lagged its own vendor/mngr pyproject files, so re-resolving here adds the
missing entries. Regenerated with `uv lock` (stable under `uv lock --check`).