Stream in-progress assistant responses live in the chat view #153
Open
gnguralnick wants to merge 15 commits into
added 6 commits
June 11, 2026 10:27
Replaces the vendored mngr copy with the tracked tree of the mngr repo's current main at commit 10d7996e567f74a805d26f2bba8a5a9a38cbe46b (via `git archive HEAD` from the upstream checkout). Picks up everything merged into mngr main since the previous vendor snapshot (94211c77), 1520 commits. The synced packages bumped their versions (imbue-common 0.1.18->0.1.19, resource-guards 0.1.7->0.1.8, concurrency-group 0.1.18->0.1.19, imbue-mngr / imbue-mngr-claude 0.2.10->0.2.12), so the root uv.lock is regenerated to match -- no third-party packages added or removed, only the editable path-package versions bumped. Without this a --frozen/--locked Docker or CI build would fail to resolve.
origin/main re-synced vendor/mngr to mngr 815069d8, which is an ancestor of this branch's sync to mngr 10d7996e (10 commits newer). Resolved the two conflicting minds desktop_client files by keeping this branch's version; the result matches the upstream mngr 10d7996e tracked tree exactly and the root uv.lock remains consistent (uv lock --check passes).
Show a Claude agent's response as it is being typed, rather than only after the turn finalizes. mngr's Claude plugin can capture the agent's tmux pane on an interval and write the in-progress markdown to $MNGR_AGENT_STATE_DIR/plugin/claude/stream_buffer; this surfaces that in the system interface.
- .mngr/settings.toml: enable streaming for chat agents with a conservative 1s capture interval (agent_types.claude.streaming_snapshot_interval_seconds). This is the continuously-running, comparatively expensive half (a tmux capture + reverse-map per interval, per agent), so it is kept modest.
- stream_watcher.py: AgentStreamWatcher tails the stream_buffer on a conservative 2s poll (deliberately poll-only, not watchdog, so the read/fan-out rate is capped regardless of how fast mngr rewrites the file) and broadcasts an assistant_streaming snapshot only when the (last_complete_id, body) pair changes -- so an idle agent costs one stat+read per interval and no broadcasts. Frames carry no session_id, so they ride the main stream and are excluded from per-subagent streams. Wired in alongside the session/tickets watchers.
- frontend: render a provisional, dimmed assistant bubble at the live tail from the latest assistant_streaming frame, replaced the instant the durable assistant_message lands (so the two never double-render). Cleared on idle and on stream disconnect.
Both intervals are intentionally conservative to keep CPU cost predictable; they can be lowered if responses feel laggy. Streaming is a live preview only -- the durable transcript event remains the source of truth.
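The change-only broadcast described in this commit can be sketched roughly as below. This is an illustration under stated assumptions, not the actual `AgentStreamWatcher`: the names `ChangeOnlyPoller`, `StreamSnapshot`, and `parse_buffer` are hypothetical, and the buffer layout (first line is the last-complete id, the rest is the body) is invented for the example because the commit does not describe the file format.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional, Tuple


@dataclass
class StreamSnapshot:
    last_complete_id: str
    body: str


def parse_buffer(raw: str) -> StreamSnapshot:
    # ASSUMPTION: first line holds the last-complete id, the remainder is the
    # in-progress body. The real stream_buffer format may differ.
    head, _, body = raw.partition("\n")
    return StreamSnapshot(last_complete_id=head.strip(), body=body)


class ChangeOnlyPoller:
    """Reads the buffer when asked and broadcasts only when its content changed."""

    def __init__(self, path: Path, broadcast: Callable[[StreamSnapshot], None]) -> None:
        self._path = path
        self._broadcast = broadcast
        self._last_key: Optional[Tuple[str, str]] = None

    def poll_once(self) -> bool:
        """Return True if a snapshot was broadcast on this poll."""
        try:
            raw = self._path.read_text(encoding="utf-8")
        except OSError:
            # An absent or unreadable buffer means there is nothing to stream.
            return False
        snapshot = parse_buffer(raw)
        key = (snapshot.last_complete_id, snapshot.body)
        if key == self._last_key:
            # Unchanged since the last poll: the cost is one read, no broadcast.
            return False
        self._last_key = key
        self._broadcast(snapshot)
        return True
```

A caller would invoke `poll_once()` on a fixed timer (the 2s poll in this commit), so the fan-out rate is bounded by the timer rather than by how often the file is rewritten.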
The pulsing in-progress bubble lingered after the canonical assistant message rendered, and re-appeared at the start of the next turn.
Root cause: mngr's stream buffer keeps the last assistant block as the "in-progress" body until the agent goes idle, so the watcher re-broadcasts that text -- under a new last_complete_id when the message commits, and again on the next turn's activity before fresh output streams. The one-shot clear-on-assistant_message was promptly undone by the next stale frame.
Fix presentation at render time rather than chasing every stale frame: a new shouldShowStreamingPreview() suppresses the bubble when the agent is IDLE (no response can be in flight) or when the preview text already equals the latest finalized assistant message (whitespace-normalized, since mngr's reverse-mapped markdown differs cosmetically from the transcript). A genuinely new, still-streaming message differs from the last finalized one, so it still shows.
Pure decision + normalizer are unit-tested; the chat view calls them with the preview text, the latest finalized assistant text, the agent's activity state, and the tail-anchored flag.
Following the established division of labor -- mngr's stream buffer is an approximate, raw view of the tmux pane, and its reference consumer (mngr_robinhood) owns the snapshot reconciliation -- harden the system interface's own reconciliation instead of changing the shared mngr contract. Two guarantees:
- Idle is a hard gate: when the agent's activity state is IDLE, the preview never renders (a settled agent has no response in flight).
- Prior-turn text can't linger: the preview is reset on a new user_message (as well as on a finalized assistant_message), so a just-sent turn immediately drops any in-progress text mngr is still showing.
Replace the brittle exact text-equality check with a whitespace-tolerant "does the preview carry content beyond the latest finalized message" walk, ported from mngr_robinhood's stream_buffer._unemitted_suffix_start. mngr's reverse-mapped markdown differs cosmetically (trailing spaces, a collapsed blank line around a rule) from the canonical transcript text, so an exact compare would miss the lingering/re-shown message; the tolerant walk recognizes it as already-finalized while still showing a genuinely new message.
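One way to picture the idle gate and the whitespace-tolerant walk is the sketch below. It is written in Python for illustration only; the real check lives in the frontend, and neither it nor `_unemitted_suffix_start` is shown in this PR text. The function names `adds_content_beyond` and `should_show_preview` are hypothetical, and the tail-anchored flag mentioned in the previous commit is left out.

```python
def adds_content_beyond(preview: str, finalized: str) -> bool:
    """Return True if the preview carries non-whitespace content that the
    finalized message does not account for, ignoring all whitespace."""
    i = j = 0
    while i < len(preview):
        if preview[i].isspace():
            i += 1
            continue
        # Skip whitespace on the finalized side before comparing.
        while j < len(finalized) and finalized[j].isspace():
            j += 1
        if j >= len(finalized) or preview[i] != finalized[j]:
            # The preview ran past the finalized text or diverged from it.
            return True
        i += 1
        j += 1
    return False


def should_show_preview(preview: str, latest_finalized: str, is_idle: bool) -> bool:
    # Idle is a hard gate: a settled agent has no response in flight.
    if is_idle:
        return False
    # ASSUMPTION: an empty or whitespace-only preview is never worth rendering.
    if not preview.strip():
        return False
    return adds_content_beyond(preview, latest_finalized)
```

Under this sketch a preview that differs from the finalized message only by trailing spaces or a collapsed blank line is treated as already finalized, while a preview that diverges or extends past it still renders.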
…ant check
The previous commit replaced the exact text-equality staleness check with the whitespace-tolerant 'adds nothing beyond the finalized message' walk; sync the ChatPanel docstring wording to match.
Contributor
roughly lgtm, but:
My vote for the simplest way to accomplish #1 is to: I think that would make this efficient enough that we could have it on without worrying about the performance impacts too much (but we'll need to monitor as well)
added 9 commits
June 15, 2026 10:46
origin/main's "simplify-progress" rework removed the tickets-watcher / step_enrichment side-channel that drove the progress view, deriving progress decoration from the transcript instead. This branch's in-progress assistant-response streaming (AgentStreamWatcher, assistant_streaming SSE frames, the streaming preview bubble) is independent of that machinery, so the resolution keeps the streaming feature and drops the removed watcher plumbing:
- server.py: keep the stream-watcher wiring, drop the tickets-watcher wiring (its module was deleted on origin/main).
- StreamingMessage.ts: keep the assistant_streaming SSE handler, drop the step_enrichment handler (origin/main removed its imports); restore the mithril import the preview redraw needs.
- test_ratchets.py: __init__ ratchet count is now 6 (origin/main's 5 minus the removed tickets watcher, plus AgentStreamWatcher).
Three changes to reduce the cost of approximate response streaming:
- mngr (stream_snapshot.py): do no streaming work at all while an agent is idle -- no transcript read, no tmux capture, no buffer write. Previously the expensive pane capture was already gated on the `active` marker, but the loop still read the transcript and rewrote the buffer every interval while idle. Now the only idle-time work is a single clearing write on the active->idle edge (so a stale in-progress preview can't linger). Threaded via a `was_active` flag returned from each poll. (To be upstreamed to mngr separately.)
- .mngr/settings.toml: disable streaming for the `worker` agent type (streaming_snapshot_interval_seconds = 0). Delegated workers run headless with nobody watching the pane, so the per-interval capture is pure overhead. Covers the worker and crystallize-worker templates.
- Both poll intervals set to ~5s for a ~5s average end-to-end preview update: mngr's streaming_snapshot_interval_seconds 1.0 -> 5.0 (the expensive tmux capture) and the system interface's STREAM_POLL_INTERVAL_SECONDS 2.0 -> 5.0 (the buffer read). The two loops are independent, so average latency is roughly the sum of their half-intervals (2.5s + 2.5s).
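The idle gating in the first change might look something like the sketch below. It is not the code in `stream_snapshot.py`: the function name `poll_snapshot` and the `capture` / `write_buffer` callables are hypothetical stand-ins, and clearing the buffer by writing an empty string is an assumption about what the "clearing write" does.

```python
from typing import Callable


def poll_snapshot(
    is_active: bool,
    was_active: bool,
    capture: Callable[[], str],
    write_buffer: Callable[[str], None],
) -> bool:
    """Run one poll and return the activity flag to thread into the next poll."""
    if is_active:
        # Active agent: do the expensive capture and publish the snapshot.
        write_buffer(capture())
        return True
    if was_active:
        # active -> idle edge: one clearing write so a stale preview cannot linger.
        # ASSUMPTION: "clearing" is modelled here as writing an empty body.
        write_buffer("")
    # Idle and already cleared: no capture, no write.
    return False
```

A driver loop would keep the returned flag and pass it back as `was_active` on the next interval, so repeated idle polls do no work after the single clearing write.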
The in-progress assistant preview was always rendered as a trailing bubble below every open step, ignoring the progress-view structure. When the tail turn has an open step, route the preview into that step's expanded body instead, so the live output sits with the work it belongs to.
- turn-grouping: add tailFrontierStep() to identify the open step (if any) that owns the live preview on the tail turn.
- ProgressBlock: render the preview at the tail of the frontier step's expanded body, and make a frontier step expandable even before its first finalized event. Narration stays finalized-transcript-derived (it swaps, it does not grow).
- ChatPanel: read the preview live in the row render closure (kept out of the memoized rows cache) and suppress the standalone trailing bubble when an open step owns the stream. No open step (no-steps turn, or steps all closed) keeps the default-visible bubble.
…sages
# Conflicts:
# vendor/mngr/apps/minds/imbue/minds/desktop_client/agent_creator.py
# vendor/mngr/apps/minds/imbue/minds/desktop_client/agent_creator_test.py
…messages
Problem: connectToStream cleared the live in-progress assistant preview on every user_message event. Claude Code emits user_message events mid-turn for non-boundary content (skill expansions, stop-hook feedback, /welcome), so a skill invocation or stop hook firing while the agent was actively streaming would flicker the live bubble off until the next ~5s snapshot frame.
Fix: only reset the preview on a genuine boundary user_message, reusing the existing isNonBoundaryUserMessage predicate that the turn-grouping and rendering layers already use to decide turn boundaries. The assistant_message reset is unchanged. Added a regression test covering skill-expansion and stop-hook user_messages arriving mid-stream.
Problem: AgentStreamWatcher._read_buffer caught OSError and returned None without logging, silently swallowing genuine read failures (permission/I/O errors), and the module-level loguru `logger` binding it was evidently meant to use was left dead/unused.
Fix: log the caught OSError at debug level before returning None, matching the sibling AgentSessionWatcher pattern in session_watcher.py -- quiet by default for the expected absent-buffer case, present for diagnosing real failures, and putting the previously-unused logger binding to use.
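The shape of the fix is roughly as follows. This sketch substitutes the standard-library `logging` module for loguru so that it runs without extra dependencies, and the standalone `read_buffer` function and logger name are hypothetical; the actual method is `AgentStreamWatcher._read_buffer`, whose body is not shown here.

```python
import logging
from pathlib import Path
from typing import Optional

# ASSUMPTION: stdlib logging stands in for the module-level loguru logger.
logger = logging.getLogger("stream_watcher_sketch")


def read_buffer(path: Path) -> Optional[str]:
    """Return the buffer text, or None if it cannot be read."""
    try:
        return path.read_text(encoding="utf-8")
    except OSError as exc:
        # Debug level keeps the expected absent-buffer case quiet by default
        # while leaving a trace for genuine permission or I/O failures.
        logger.debug("Could not read stream buffer %s: %s", path, exc)
        return None
```

Logging at debug rather than warning reflects the trade-off named in the commit: a missing buffer is the normal state for an agent that is not streaming.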
Merging origin/main brought in the minds 0.3.1 vendored mngr refresh, whose workspace packages declare new dependencies (anthropic for mngr_claude's agents_to_message API, requests for mngr, docstring-parser) and version bumps (concurrency-group, imbue-common, imbue-mngr). origin/main's committed uv.lock lagged its own vendor/mngr pyproject files, so re-resolving here adds the missing entries. Regenerated with `uv lock` (stable under `uv lock --check`).
Shows a Claude agent's response in the system interface chat as it is being typed, instead of only after the turn finalizes.
How it works
mngr's Claude plugin can capture the agent's tmux pane on an interval, reverse-map the rendered assistant text back to markdown, and write the in-progress message to `$MNGR_AGENT_STATE_DIR/plugin/claude/stream_buffer`. This surfaces that buffer in the chat:
- `.mngr/settings.toml`: enables streaming for chat agents with a conservative 1s capture interval (`agent_types.claude.streaming_snapshot_interval_seconds`). This is the continuously-running, comparatively expensive half, so it is kept modest.
- `stream_watcher.py`: `AgentStreamWatcher` tails the buffer on a conservative 2s poll (poll-only, not watchdog, so the fan-out rate is capped regardless of how fast mngr rewrites the file) and broadcasts an `assistant_streaming` snapshot over the existing SSE channel only when the content changes, so an idle agent costs one read per interval and no broadcasts. Frames carry no `session_id`, so they ride the main stream and are excluded from per-subagent streams. Wired in alongside the session/tickets watchers.
- Frontend: renders a provisional, dimmed assistant bubble at the live tail from the latest `assistant_streaming` frame, replaced by the durable `assistant_message` the instant it lands.
Keeping the preview honest
mngr's stream buffer is an approximate, raw view of the tmux pane: it keeps showing the last assistant block until the agent idles, and re-shows it at the start of the next turn. mngr's own reference consumer (`mngr_robinhood`) reconciles these raw snapshots downstream, so the system interface owns its reconciliation rather than changing the shared mngr contract:
- Idle is a hard gate: the preview never renders while the agent's activity state is `IDLE`.
- The staleness check is a whitespace-tolerant walk ported from `mngr_robinhood`'s `stream_buffer`, so mngr's cosmetic rendering differences (trailing spaces, a collapsed blank line) don't defeat the check.
Both intervals are intentionally conservative to keep CPU cost predictable and can be lowered if responses feel laggy. The preview is a live approximation only; the durable transcript event remains the source of truth.
Base
Built on a sync of `vendor/mngr` to upstream mngr `main` (10d7996e), with the root `uv.lock` regenerated to match the bumped vendored package versions.
Tests
`apps/system_interface` suite passing (492), coverage above threshold; new `stream_watcher` unit tests.