Fix gateway tool output visibility and timing#2555
Fix gateway tool output visibility and timing#2555henrypark133 wants to merge 3 commits intostagingfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR improves how the web gateway surfaces tool activity by correlating live tool events via call_id, exposing actual tool output (when persisted), and carrying real execution timings (duration_ms) end-to-end so the UI can render accurate tool cards and durations.
Changes:
- Add
call_id(andduration_msfor completions) to tool-relatedStatusUpdate/AppEventvariants and preserve these fields through the bridge and web channel layers. - Use persisted tool-call
resultto populate expanded history tool cards (with display truncation), avoiding new history storage. - Refactor frontend tool activity rendering to a controller that correlates events by
call_idand formats durations in a millisecond-friendly way.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/ws_gateway_integration.rs | Asserts WS payloads include call_id and duration_ms for tool events. |
| tests/support_unit_tests.rs | Updates test fixtures for new duration_ms field on ToolCompleted. |
| src/channels/web/util.rs | Adds tool_result_for_display and threads persisted result + call_id into ToolCallInfo. |
| src/channels/web/types.rs | Extends ToolCallInfo DTO with call_id and result. |
| src/channels/web/tests/tool_event_passthrough.rs | Regression test ensuring gateway preserves tool identity/timing fields to SSE. |
| src/channels/web/tests/mod.rs | Registers new tool passthrough test module. |
| src/channels/web/server.rs | Includes call_id and display-ready result for in-memory turn tool calls. |
| src/channels/web/responses_api.rs | Correlates tool events by call_id when building response output items. |
| src/channels/web/mod.rs | Ensures call_id/duration_ms are passed through as AppEvents. |
| src/channels/wasm/wrapper.rs | Updates WASM channel tests for duration_ms. |
| src/channels/channel.rs | Adds duration_ms to StatusUpdate::ToolCompleted and propagates it in constructor helpers/tests. |
| src/bridge/router.rs | Preserves engine action call_id and forwards duration_ms via tool-status events. |
| src/agent/thread_ops.rs | Measures tool execution durations and passes duration_ms into status updates. |
| src/agent/dispatcher.rs | Measures tool execution durations and passes duration_ms into status updates. |
| crates/ironclaw_gateway/static/app.js | Refactors tool activity cards to correlate by call_id, show persisted results, and format ms durations. |
| crates/ironclaw_common/src/event.rs | Adds optional call_id and duration_ms fields to tool-related AppEvents. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Code Review
This pull request introduces call_id and duration_ms fields across the system to improve the tracking and display of tool activity. Key changes include refactoring the frontend tool activity state into a reusable controller, adding execution timing in the agent dispatcher, and updating event propagation to preserve tool identity. A logic error was identified in the frontend's tool correlation function where name-based fallback could lead to incorrect state updates during parallel tool calls.
|
Addressed the open review comments on
Validation after the follow-up patch:
Follow-up commit pushed: |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 733f5b95ee
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const MAX_HISTORY_IMAGE_DATA_URL_BYTES_PER_IMAGE: usize = 512 * 1024; | ||
| const MAX_HISTORY_IMAGE_DATA_URL_BYTES_PER_RESPONSE: usize = 1024 * 1024; | ||
| const MAX_TOOL_RESULT_DISPLAY_CHARS: usize = 1000; | ||
|
|
There was a problem hiding this comment.
MAX_TOOL_RESULT_DISPLAY_CHARS is passed to truncate_preview, but truncate_preview truncates by bytes (and can return max_bytes + 3 due to the appended "..."). To avoid confusion/misconfiguration, rename this constant (and any docs/tests) to reflect bytes (e.g., MAX_TOOL_RESULT_DISPLAY_BYTES) or switch to a char-count truncation helper if the intent is truly characters.
| let call_id = format!("call_{}", Uuid::new_v4().simple()); | ||
| let call_id = call_id | ||
| .clone() | ||
| .unwrap_or_else(|| format!("call_{}", Uuid::new_v4().simple())); |
There was a problem hiding this comment.
This starts threading call_id through the streaming worker, but the completion side still uses a single global current_tool_index later in the same match arm. With overlapping tool calls, a later ToolStarted overwrites that slot, so the first completion can emit response.output_item.done for the wrong function-call item. Since the PR already has the call_id here, the in-flight output index should be tracked by call_id as well instead of by one mutable slot.
henrypark133
left a comment
There was a problem hiding this comment.
Review: keep Responses API tool completions correlated per call
Most of the earlier tool-visibility feedback looks addressed, and the new call_id / duration_ms plumbing is much cleaner. One correctness issue still remains in the streaming Responses API path.
Critical: response.output_item.done still uses one global tool slot
File: src/channels/web/responses_api.rs:947
The streaming worker now threads call_id into the FunctionCall and FunctionCallOutput items, but it still tracks completion with a single current_tool_index. If tool A starts, tool B starts, and tool A completes first, the later start overwrites that slot and the code emits response.output_item.done for B instead of A. The final output list keeps the right call_id, but streamed clients can observe the wrong item transition to done and never get a done event for the actual completed call.
Suggested fix: track in-flight output indexes by call_id (or look them up by call_id on completion) instead of using a single mutable index.
Recommended verdict: Request changes.
Residual risk: I also kicked off a couple of targeted tests from the worktree, but they were still compiling when I posted this review.
Summary
Testing