Stream in-progress assistant responses live in the chat view by gnguralnick · Pull Request #153 · imbue-ai/forever-claude-template

gnguralnick · 2026-06-11T17:33:49Z

Shows a Claude agent's response in the system interface chat as it is being typed, instead of only after the turn finalizes.

How it works

mngr's Claude plugin can capture the agent's tmux pane on an interval, reverse-map the rendered assistant text back to markdown, and write the in-progress message to $MNGR_AGENT_STATE_DIR/plugin/claude/stream_buffer. This PR surfaces that buffer in the chat:

- .mngr/settings.toml: enables streaming for chat agents with a conservative 1s capture interval (agent_types.claude.streaming_snapshot_interval_seconds). This is the continuously running, comparatively expensive half, so it is kept modest.
- Backend (stream_watcher.py): AgentStreamWatcher tails the buffer on a conservative 2s poll (poll-only, not watchdog, so the fan-out rate is capped regardless of how fast mngr rewrites the file) and broadcasts an assistant_streaming snapshot over the existing SSE channel only when the content changes, so an idle agent costs one read per interval and no broadcasts. Frames carry no session_id, so they ride the main stream and are excluded from per-subagent streams. Wired in alongside the session/tickets watchers.
- Frontend: renders a provisional, dimmed assistant bubble at the live tail, replaced by the durable assistant_message the instant it lands.
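
The poll-and-broadcast-on-change behaviour described for the backend can be sketched roughly as follows. This is a minimal illustration, not the code in stream_watcher.py: the class name `StreamBufferPoller`, the `broadcast` callback and the frame's dictionary shape are assumptions made for the example.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Optional


@dataclass
class StreamBufferPoller:
    """Poll a stream buffer file and report its content only when it changes.

    Hypothetical sketch: the names and the frame shape are illustrative.
    """

    buffer_path: Path
    broadcast: Callable[[dict], None]  # hypothetical sink, e.g. an SSE fan-out
    _last_body: Optional[str] = field(default=None, init=False)

    def poll_once(self) -> bool:
        """Read the buffer once; broadcast and return True only on a change."""
        try:
            body = self.buffer_path.read_text(encoding="utf-8")
        except OSError:
            # An absent buffer is the normal case before streaming starts.
            return False
        if body == self._last_body:
            return False  # unchanged content: one read, no broadcast
        self._last_body = body
        # The frame deliberately carries no session_id, as described above.
        self.broadcast({"type": "assistant_streaming", "body": body})
        return True
```

A caller would invoke `poll_once()` from a loop that sleeps for the poll interval between calls, which is what caps the fan-out rate regardless of how often the file is rewritten.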

Keeping the preview honest

mngr's stream buffer is an approximate, raw view of the tmux pane — it keeps showing the last assistant block until the agent idles, and re-shows it at the start of the next turn. mngr's own reference consumer (mngr_robinhood) reconciles these raw snapshots downstream, so the system interface owns its reconciliation rather than changing the shared mngr contract:

- Idle is a hard guarantee: no preview renders when the agent's activity state is IDLE.
- No prior-turn lingering: the preview resets on a new user message (and on a finalized assistant message), so a just-sent turn drops any leftover in-progress text immediately.
- Whitespace-tolerant staleness check: the preview is suppressed unless it carries content beyond the latest finalized message, compared with a whitespace-tolerant walk ported from mngr_robinhood's stream_buffer, so mngr's cosmetic rendering differences (trailing spaces, a collapsed blank line) don't defeat the check.
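
As an illustration of the idle gate and the whitespace-tolerant check, here is a small Python sketch. It is not the ported function and not the frontend code (which lives in the TypeScript chat view); the function names and the `"RUNNING"` state used in the usage example are hypothetical, and only `"IDLE"` comes from the description above.

```python
def suffix_start_ignoring_whitespace(preview: str, finalized: str) -> int:
    """Return the index in `preview` where content beyond `finalized` begins.

    Both strings are walked in lockstep, skipping whitespace on either side,
    so trailing spaces or a collapsed blank line do not count as a difference.
    Returns len(preview) when the preview adds nothing, and 0 when the two
    texts diverge (the whole preview is then treated as new).
    """
    i = j = 0
    while True:
        while i < len(preview) and preview[i].isspace():
            i += 1
        while j < len(finalized) and finalized[j].isspace():
            j += 1
        if j == len(finalized):
            return i  # finalized text consumed; anything left is new
        if i == len(preview):
            return len(preview)  # preview ran out first: nothing new
        if preview[i] != finalized[j]:
            return 0  # diverged: a genuinely different message
        i += 1
        j += 1


def preview_adds_content(preview: str, finalized: str) -> bool:
    """True when the preview carries content beyond the finalized message."""
    return suffix_start_ignoring_whitespace(preview, finalized) < len(preview)


def should_show_preview(preview: str, finalized: str, activity_state: str) -> bool:
    """Idle is a hard gate; otherwise show only previews that add content."""
    if activity_state == "IDLE":
        return False
    return preview_adds_content(preview, finalized)
```

For example, `should_show_preview("Hello world ", "Hello world", "RUNNING")` is suppressed because the only difference is a trailing space, while a preview that continues past the finalized text, or differs from it outright, still shows.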

Both intervals are intentionally conservative to keep CPU cost predictable and can be lowered if responses feel laggy. The preview is a live approximation only; the durable transcript event remains the source of truth.

Base

Built on a sync of vendor/mngr to upstream mngr main (10d7996e), with the root uv.lock regenerated to match the bumped vendored package versions.

Tests

- Backend: apps/system_interface suite passing (492), coverage above threshold; new stream_watcher unit tests.
- Frontend: lint clean, 301 tests passing; new tests for the streaming preview lifecycle and the whitespace-tolerant staleness logic.

Replaces the vendored mngr copy with the tracked tree of the mngr repo's current main at commit 10d7996e567f74a805d26f2bba8a5a9a38cbe46b (via `git archive HEAD` from the upstream checkout). Picks up everything merged into mngr main since the previous vendor snapshot (94211c77), 1520 commits. The synced packages bumped their versions (imbue-common 0.1.18->0.1.19, resource-guards 0.1.7->0.1.8, concurrency-group 0.1.18->0.1.19, imbue-mngr / imbue-mngr-claude 0.2.10->0.2.12), so the root uv.lock is regenerated to match -- no third-party packages added or removed, only the editable path-package versions bumped. Without this a --frozen/--locked Docker or CI build would fail to resolve.

origin/main re-synced vendor/mngr to mngr 815069d8, which is an ancestor of this branch's sync to mngr 10d7996e (10 commits newer). Resolved the two conflicting minds desktop_client files by keeping this branch's version; the result matches the upstream mngr 10d7996e tracked tree exactly and the root uv.lock remains consistent (uv lock --check passes).

Show a Claude agent's response as it is being typed, rather than only after the turn finalizes. mngr's Claude plugin can capture the agent's tmux pane on an interval and write the in-progress markdown to $MNGR_AGENT_STATE_DIR/plugin/claude/stream_buffer; this surfaces that in the system interface.

- .mngr/settings.toml: enable streaming for chat agents with a conservative 1s capture interval (agent_types.claude.streaming_snapshot_interval_seconds). This is the continuously-running, comparatively expensive half (a tmux capture + reverse-map per interval, per agent), so it is kept modest.
- stream_watcher.py: AgentStreamWatcher tails the stream_buffer on a conservative 2s poll (deliberately poll-only, not watchdog, so the read/fan-out rate is capped regardless of how fast mngr rewrites the file) and broadcasts an assistant_streaming snapshot only when the (last_complete_id, body) pair changes -- so an idle agent costs one stat+read per interval and no broadcasts. Frames carry no session_id, so they ride the main stream and are excluded from per-subagent streams. Wired in alongside the session/tickets watchers.
- frontend: render a provisional, dimmed assistant bubble at the live tail from the latest assistant_streaming frame, replaced the instant the durable assistant_message lands (so the two never double-render). Cleared on idle and on stream disconnect.

Both intervals are intentionally conservative to keep CPU cost predictable; they can be lowered if responses feel laggy. Streaming is a live preview only -- the durable transcript event remains the source of truth.

The pulsing in-progress bubble lingered after the canonical assistant message rendered, and re-appeared at the start of the next turn.

Root cause: mngr's stream buffer keeps the last assistant block as the "in-progress" body until the agent goes idle, so the watcher re-broadcasts that text -- under a new last_complete_id when the message commits, and again on the next turn's activity before fresh output streams. The one-shot clear-on-assistant_message was promptly undone by the next stale frame.

Fix presentation at render time rather than chasing every stale frame: a new shouldShowStreamingPreview() suppresses the bubble when the agent is IDLE (no response can be in flight) or when the preview text already equals the latest finalized assistant message (whitespace-normalized, since mngr's reverse-mapped markdown differs cosmetically from the transcript). A genuinely new, still-streaming message differs from the last finalized one, so it still shows.

Pure decision + normalizer are unit-tested; the chat view calls them with the preview text, the latest finalized assistant text, the agent's activity state, and the tail-anchored flag.

Following the established division of labor -- mngr's stream buffer is an approximate, raw view of the tmux pane, and its reference consumer (mngr_robinhood) owns the snapshot reconciliation -- harden the system interface's own reconciliation instead of changing the shared mngr contract.

Two guarantees:

- Idle is a hard gate: when the agent's activity state is IDLE, the preview never renders (a settled agent has no response in flight).
- Prior-turn text can't linger: the preview is reset on a new user_message (as well as on a finalized assistant_message), so a just-sent turn immediately drops any in-progress text mngr is still showing.

Replace the brittle exact text-equality check with a whitespace-tolerant "does the preview carry content beyond the latest finalized message" walk, ported from mngr_robinhood's stream_buffer._unemitted_suffix_start. mngr's reverse-mapped markdown differs cosmetically (trailing spaces, a collapsed blank line around a rule) from the canonical transcript text, so an exact compare would miss the lingering/re-shown message; the tolerant walk recognizes it as already-finalized while still showing a genuinely new message.

…ant check

The previous commit replaced the exact text-equality staleness check with the whitespace-tolerant 'adds nothing beyond the finalized message' walk; sync the ChatPanel docstring wording to match.

joshalbrecht · 2026-06-13T17:03:40Z

roughly lgtm, but:

1. we really need to add something (at the mngr level) so that we can make it more efficient before we can really merge this
2. resolve the merge conflicts

My vote for the simplest way to accomplish #1 is to:
A) ensure that it doesn't bother doing the streaming stuff unless the agent is actually working
B) disable streaming for worker agents (vs the chat agents) (in the FCT repo, via the config)
C) ensure that our polling interval here is ~5 seconds or so

I think that would make this efficient enough that we could have it on without worrying about the performance impacts too much (but we'll need to monitor as well).

origin/main's "simplify-progress" rework removed the tickets-watcher / step_enrichment side-channel that drove the progress view, deriving progress decoration from the transcript instead. This branch's in-progress assistant-response streaming (AgentStreamWatcher, assistant_streaming SSE frames, the streaming preview bubble) is independent of that machinery, so the resolution keeps the streaming feature and drops the removed watcher plumbing:

- server.py: keep the stream-watcher wiring, drop the tickets-watcher wiring (its module was deleted on origin/main).
- StreamingMessage.ts: keep the assistant_streaming SSE handler, drop the step_enrichment handler (origin/main removed its imports); restore the mithril import the preview redraw needs.
- test_ratchets.py: __init__ ratchet count is now 6 (origin/main's 5 minus the removed tickets watcher, plus AgentStreamWatcher).

Three changes to reduce the cost of approximate response streaming:

- mngr (stream_snapshot.py): do no streaming work at all while an agent is idle -- no transcript read, no tmux capture, no buffer write. Previously the expensive pane capture was already gated on the `active` marker, but the loop still read the transcript and rewrote the buffer every interval while idle. Now the only idle-time work is a single clearing write on the active->idle edge (so a stale in-progress preview can't linger). Threaded via a `was_active` flag returned from each poll. (To be upstreamed to mngr separately.)
- .mngr/settings.toml: disable streaming for the `worker` agent type (streaming_snapshot_interval_seconds = 0). Delegated workers run headless with nobody watching the pane, so the per-interval capture is pure overhead. Covers the worker and crystallize-worker templates.
- Both poll intervals set to ~5s for a ~5s average end-to-end preview update: mngr's streaming_snapshot_interval_seconds 1.0 -> 5.0 (the expensive tmux capture) and the system interface's STREAM_POLL_INTERVAL_SECONDS 2.0 -> 5.0 (the buffer read). The two loops are independent, so average latency is roughly the sum of their half-intervals (2.5s + 2.5s).
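
The idle gating and the latency estimate in this commit message can be sketched as below. This is an illustrative reading, not the contents of stream_snapshot.py: `poll_snapshot`, its callback parameters and `expected_preview_latency_seconds` are hypothetical names, and only the `was_active` flag and the two interval values come from the message above.

```python
from typing import Callable


def poll_snapshot(
    is_active: bool,
    was_active: bool,
    capture: Callable[[], str],
    write_buffer: Callable[[str], None],
) -> bool:
    """Run one snapshot tick and return the flag to pass as `was_active` next."""
    if is_active:
        write_buffer(capture())  # the expensive capture runs only while active
    elif was_active:
        write_buffer("")  # single clearing write on the active -> idle edge
    # Idle now and idle on the previous tick: no work at all.
    return is_active


def expected_preview_latency_seconds(capture_interval: float, poll_interval: float) -> float:
    """Average wait across two independent periodic loops: half of each interval."""
    return capture_interval / 2 + poll_interval / 2
```

With both intervals at 5.0 the estimate is 2.5 + 2.5 = 5.0 seconds, compared with 0.5 + 1.0 = 1.5 seconds under the earlier 1s and 2s settings.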


The in-progress assistant preview was always rendered as a trailing bubble below every open step, ignoring the progress-view structure. When the tail turn has an open step, route the preview into that step's expanded body instead, so the live output sits with the work it belongs to.

- turn-grouping: add tailFrontierStep() to identify the open step (if any) that owns the live preview on the tail turn.
- ProgressBlock: render the preview at the tail of the frontier step's expanded body, and make a frontier step expandable even before its first finalized event. Narration stays finalized-transcript-derived (it swaps, it does not grow).
- ChatPanel: read the preview live in the row render closure (kept out of the memoized rows cache) and suppress the standalone trailing bubble when an open step owns the stream. No open step (no-steps turn, or steps all closed) keeps the default-visible bubble.

…sages

# Conflicts:
# vendor/mngr/apps/minds/imbue/minds/desktop_client/agent_creator.py
# vendor/mngr/apps/minds/imbue/minds/desktop_client/agent_creator_test.py

…messages

Problem: connectToStream cleared the live in-progress assistant preview on every user_message event. Claude Code emits user_message events mid-turn for non-boundary content (skill expansions, stop-hook feedback, /welcome), so a skill invocation or stop hook firing while the agent was actively streaming would flicker the live bubble off until the next ~5s snapshot frame.

Fix: only reset the preview on a genuine boundary user_message, reusing the existing isNonBoundaryUserMessage predicate that the turn-grouping and rendering layers already use to decide turn boundaries. The assistant_message reset is unchanged. Added a regression test covering skill-expansion and stop-hook user_messages arriving mid-stream.

Problem: AgentStreamWatcher._read_buffer caught OSError and returned None without logging, silently swallowing genuine read failures (permission/I/O errors), and the module-level loguru `logger` binding it was evidently meant to use was left dead/unused.

Fix: log the caught OSError at debug level before returning None, matching the sibling AgentSessionWatcher pattern in session_watcher.py -- quiet by default for the expected absent-buffer case, present for diagnosing real failures, and putting the previously-unused logger binding to use.
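
The shape of that fix might look like the following sketch. It is an assumption-laden illustration rather than the actual `_read_buffer`: it uses the standard-library `logging` module instead of loguru so the example stays self-contained, and the function name and message text are made up.

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def read_buffer(path: Path) -> Optional[str]:
    """Return the buffer text, or None if it cannot be read."""
    try:
        return path.read_text(encoding="utf-8")
    except OSError as e:
        # Debug level keeps the expected "buffer not there yet" case quiet
        # while still leaving a trace for permission or I/O failures.
        logger.debug("Could not read stream buffer %s: %s", path, e)
        return None
```

Logging at debug rather than warning is the design choice here: the buffer is routinely absent before streaming starts, so a louder level would flood the logs on every poll.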

Merging origin/main brought in the minds 0.3.1 vendored mngr refresh, whose workspace packages declare new dependencies (anthropic for mngr_claude's agents_to_message API, requests for mngr, docstring-parser) and version bumps (concurrency-group, imbue-common, imbue-mngr). origin/main's committed uv.lock lagged its own vendor/mngr pyproject files, so re-resolving here adds the missing entries. Regenerated with `uv lock` (stable under `uv lock --check`).

Gabriel Guralnick added 6 commits June 11, 2026 10:27

Update renderStreamingPreview docstring to match the whitespace-tolerant check

9bf8737

The previous commit replaced the exact text-equality staleness check with the whitespace-tolerant 'adds nothing beyond the finalized message' walk; sync the ChatPanel docstring wording to match.

gnguralnick changed the title ~~Sync vendored mngr to current upstream main~~ Stream in-progress assistant responses live in the chat view Jun 11, 2026

gnguralnick marked this pull request as ready for review June 11, 2026 22:39

Gabriel Guralnick added 9 commits June 15, 2026 10:46

Merge remote-tracking branch 'origin/main' into gabriel/streaming-messages

04b8fb5

Merge remote-tracking branch 'origin/main' into gabriel/streaming-messages

f5c52a9

Merge remote-tracking branch 'origin/main' into gabriel/streaming-messages

3fa951a

# Conflicts:
# vendor/mngr/apps/minds/imbue/minds/desktop_client/agent_creator.py
# vendor/mngr/apps/minds/imbue/minds/desktop_client/agent_creator_test.py

gnguralnick wants to merge 15 commits into main from gabriel/streaming-messages
