
Fix gateway tool output visibility and timing#2555

Open
henrypark133 wants to merge 3 commits into staging from tool-visibility

Conversation

@henrypark133
Collaborator

Summary

  • fix gateway tool activity cards to correlate live tool events by call_id and show actual tool output when available
  • use the existing persisted result field for expanded history cards without adding new history storage, and bound active-thread history results to keep refreshes fast
  • thread call_id and live duration_ms through the web event surface, preserve engine_v2 action call IDs in the bridge, and replace misleading 0.0s with millisecond-friendly duration formatting
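The duration change in the last bullet can be illustrated with a small sketch. The actual formatter lives in the frontend (app.js) and is not shown in this PR page, so the function name `format_duration_ms`, the thresholds, and the one-decimal seconds format below are all assumptions; only the `<1ms` label and the goal of avoiding a misleading 0.0s come from the discussion.

```rust
/// Hypothetical formatter: renders a tool duration given in milliseconds.
/// Thresholds and output shapes are illustrative assumptions, not the PR's code.
fn format_duration_ms(duration_ms: u64) -> String {
    match duration_ms {
        // A measured duration of 0 ms is shown as "<1ms" rather than "0.0s".
        0 => "<1ms".to_string(),
        // Sub-second durations stay in whole milliseconds.
        1..=999 => format!("{duration_ms}ms"),
        // One second and above switches to seconds with one decimal place.
        _ => format!("{:.1}s", duration_ms as f64 / 1000.0),
    }
}
```

With this shape, a fast tool call reads as `42ms` instead of collapsing to `0.0s`, which is the misleading case the PR sets out to remove.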

Testing

  • cargo fmt --all
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo test gateway_send_status_preserves_tool_event_fields --lib
  • cargo test test_build_turns_with_persisted_tool_result_for_display --lib
  • cargo test accumulator_tool_flow --lib
  • cargo test test_ws_multiple_events_in_sequence --test ws_gateway_integration
  • cargo test test_tool_result_for_display_truncates_long_content --lib
  • cargo test forward_event_to_channel_preserves_call_id_for_action_events --lib
  • cargo test thread_event_to_app_events_preserves_call_id_for_action_events --lib

Copilot AI review requested due to automatic review settings April 16, 2026 23:27
@github-actions bot added the following labels on Apr 16, 2026: scope: agent (Agent core: agent loop, router, scheduler); scope: channel (Channel infrastructure); scope: channel/web (Web gateway channel); scope: channel/wasm (WASM channel runtime); size: XL (500+ changed lines); risk: medium (Business logic, config, or moderate-risk modules); contributor: core (20+ merged PRs)
Contributor

Copilot AI left a comment


Pull request overview

This PR improves how the web gateway surfaces tool activity by correlating live tool events via call_id, exposing actual tool output (when persisted), and carrying real execution timings (duration_ms) end-to-end so the UI can render accurate tool cards and durations.

Changes:

  • Add call_id (and duration_ms for completions) to tool-related StatusUpdate/AppEvent variants and preserve these fields through the bridge and web channel layers.
  • Use persisted tool-call result to populate expanded history tool cards (with display truncation), avoiding new history storage.
  • Refactor frontend tool activity rendering to a controller that correlates events by call_id and formats durations in a millisecond-friendly way.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/ws_gateway_integration.rs | Asserts WS payloads include call_id and duration_ms for tool events. |
| tests/support_unit_tests.rs | Updates test fixtures for new duration_ms field on ToolCompleted. |
| src/channels/web/util.rs | Adds tool_result_for_display and threads persisted result + call_id into ToolCallInfo. |
| src/channels/web/types.rs | Extends ToolCallInfo DTO with call_id and result. |
| src/channels/web/tests/tool_event_passthrough.rs | Regression test ensuring gateway preserves tool identity/timing fields to SSE. |
| src/channels/web/tests/mod.rs | Registers new tool passthrough test module. |
| src/channels/web/server.rs | Includes call_id and display-ready result for in-memory turn tool calls. |
| src/channels/web/responses_api.rs | Correlates tool events by call_id when building response output items. |
| src/channels/web/mod.rs | Ensures call_id/duration_ms are passed through as AppEvents. |
| src/channels/wasm/wrapper.rs | Updates WASM channel tests for duration_ms. |
| src/channels/channel.rs | Adds duration_ms to StatusUpdate::ToolCompleted and propagates it in constructor helpers/tests. |
| src/bridge/router.rs | Preserves engine action call_id and forwards duration_ms via tool-status events. |
| src/agent/thread_ops.rs | Measures tool execution durations and passes duration_ms into status updates. |
| src/agent/dispatcher.rs | Measures tool execution durations and passes duration_ms into status updates. |
| crates/ironclaw_gateway/static/app.js | Refactors tool activity cards to correlate by call_id, show persisted results, and format ms durations. |
| crates/ironclaw_common/src/event.rs | Adds optional call_id and duration_ms fields to tool-related AppEvents. |


Comment thread src/channels/web/util.rs
Comment thread crates/ironclaw_gateway/static/app.js Outdated
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces call_id and duration_ms fields across the system to improve the tracking and display of tool activity. Key changes include refactoring the frontend tool activity state into a reusable controller, adding execution timing in the agent dispatcher, and updating event propagation to preserve tool identity. A logic error was identified in the frontend's tool correlation function where name-based fallback could lead to incorrect state updates during parallel tool calls.
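The correlation concern raised here can be sketched as follows. The real logic is JavaScript in app.js (findRendered) and is not reproduced on this page, so the `ToolCard` type, the `find_card` function, and the exact fallback rule are hypothetical; the sketch only illustrates why a name-based fallback is unsafe when two calls to the same tool run in parallel.

```rust
/// Hypothetical rendered tool card; field names are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
struct ToolCard {
    call_id: Option<String>,
    name: String,
    done: bool,
}

/// Looks up the card an incoming event refers to.
/// If the event carries a call_id, that id is authoritative: a miss returns None
/// instead of guessing by name. Name matching is used only for events without an id.
fn find_card<'a>(
    cards: &'a mut [ToolCard],
    call_id: Option<&str>,
    name: &str,
) -> Option<&'a mut ToolCard> {
    match call_id {
        Some(id) => cards
            .iter_mut()
            .find(|c| c.call_id.as_deref() == Some(id)),
        // Legacy path: newest unfinished card with the same tool name.
        None => cards.iter_mut().rev().find(|c| c.name == name && !c.done),
    }
}
```

With two in-flight `shell` cards, a completion for call `a` updates only that card, and an unknown id matches nothing rather than the most recent card with the same name.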

Comment thread crates/ironclaw_gateway/static/app.js
Collaborator Author

Addressed the open review comments on tool-visibility:

  • tool_result_for_display() now returns None for JSON null and empty strings, so the UI no longer renders synthetic "null" output.
  • createToolActivitySummary() now builds the duration node with textContent instead of appending via innerHTML, so <1ms renders safely.
  • findRendered() now treats call_id as authoritative and no longer falls back to name-based matching when a call_id is present but does not satisfy the predicate.
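The first fix in the list above (no synthetic "null" output) can be sketched in isolation. The real tool_result_for_display() presumably operates on a JSON value and also truncates long content; to stay dependency-free, this sketch uses a hypothetical `PersistedResult` enum as a stand-in and omits truncation, so treat the types and the function name `result_for_display` as assumptions.

```rust
/// Hypothetical stand-in for a persisted tool result (the real code likely uses a JSON value).
#[derive(Debug, Clone, PartialEq)]
enum PersistedResult {
    Null,
    Text(String),
}

/// Returns display text only when there is something meaningful to show.
/// A missing result, a null, and an empty string all map to None,
/// so the UI has nothing to render instead of a literal "null".
fn result_for_display(result: Option<&PersistedResult>) -> Option<String> {
    match result? {
        PersistedResult::Null => None,
        PersistedResult::Text(s) if s.is_empty() => None,
        PersistedResult::Text(s) => Some(s.clone()),
    }
}
```

Returning `Option<String>` pushes the "nothing to show" decision to one place, so callers do not each need to special-case null or empty values.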

Validation after the follow-up patch:

  • cargo fmt --all
  • node --check crates/ironclaw_gateway/static/app.js
  • cargo test test_tool_result_for_display_skips_null --lib

Follow-up commit pushed: 733f5b95.

@henrypark133
Collaborator Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 733f5b95ee


Comment thread crates/ironclaw_gateway/static/app.js Outdated
henrypark133

This comment was marked as off-topic.

Comment thread src/channels/web/tests/tool_event_passthrough.rs
Copilot AI review requested due to automatic review settings April 17, 2026 04:17
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.



Comment thread src/channels/web/util.rs
Comment on lines 9 to 12
const MAX_HISTORY_IMAGE_DATA_URL_BYTES_PER_IMAGE: usize = 512 * 1024;
const MAX_HISTORY_IMAGE_DATA_URL_BYTES_PER_RESPONSE: usize = 1024 * 1024;
const MAX_TOOL_RESULT_DISPLAY_CHARS: usize = 1000;


Copilot AI Apr 17, 2026


MAX_TOOL_RESULT_DISPLAY_CHARS is passed to truncate_preview, but truncate_preview truncates by bytes (and can return max_bytes + 3 due to the appended "..."). To avoid confusion/misconfiguration, rename this constant (and any docs/tests) to reflect bytes (e.g., MAX_TOOL_RESULT_DISPLAY_BYTES) or switch to a char-count truncation helper if the intent is truly characters.
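The bytes-versus-characters distinction behind this comment can be sketched with two helpers. Neither is the project's truncate_preview, whose source is not shown here; `truncate_by_bytes` and `truncate_by_chars` are hypothetical names, and the only behaviour borrowed from the comment is that a byte-based cut appends "..." and can therefore return up to max_bytes + 3 bytes.

```rust
/// Hypothetical byte-based truncation: keeps at most `max_bytes` bytes of `s`,
/// backing up to a UTF-8 character boundary, then appends "...".
fn truncate_by_bytes(s: &str, max_bytes: usize) -> String {
    if s.len() <= max_bytes {
        return s.to_string();
    }
    let mut end = max_bytes;
    // Slicing in the middle of a multi-byte character would panic, so step back.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}...", &s[..end])
}

/// Hypothetical char-based truncation: keeps at most `max_chars` characters.
fn truncate_by_chars(s: &str, max_chars: usize) -> String {
    match s.char_indices().nth(max_chars) {
        // `idx` is the byte offset of the first character past the limit.
        Some((idx, _)) => format!("{}...", &s[..idx]),
        None => s.to_string(),
    }
}
```

For ASCII the two limits coincide, but for text with multi-byte characters a limit of 1000 "chars" enforced in bytes can cut the preview well short of 1000 characters, which is why the constant name matters.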

- let call_id = format!("call_{}", Uuid::new_v4().simple());
+ let call_id = call_id
+     .clone()
+     .unwrap_or_else(|| format!("call_{}", Uuid::new_v4().simple()));
Collaborator Author


This starts threading call_id through the streaming worker, but the completion side still uses a single global current_tool_index later in the same match arm. With overlapping tool calls, a later ToolStarted overwrites that slot, so the first completion can emit response.output_item.done for the wrong function-call item. Since the PR already has the call_id here, the in-flight output index should be tracked by call_id as well instead of by one mutable slot.

Collaborator Author

@henrypark133 henrypark133 left a comment


Review: keep Responses API tool completions correlated per call

Most of the earlier tool-visibility feedback looks addressed, and the new call_id / duration_ms plumbing is much cleaner. One correctness issue still remains in the streaming Responses API path.

Critical: response.output_item.done still uses one global tool slot

File: src/channels/web/responses_api.rs:947
The streaming worker now threads call_id into the FunctionCall and FunctionCallOutput items, but it still tracks completion with a single current_tool_index. If tool A starts, tool B starts, and tool A completes first, the later start overwrites that slot and the code emits response.output_item.done for B instead of A. The final output list keeps the right call_id, but streamed clients can observe the wrong item transition to done and never get a done event for the actual completed call.

Suggested fix: track in-flight output indexes by call_id (or look them up by call_id on completion) instead of using a single mutable index.
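A minimal sketch of the suggested fix, assuming output indexes are assigned in start order: the streaming worker's real state in responses_api.rs is not shown on this page, so the `InFlightTools` type and its method names are hypothetical. The point is only that a map keyed by call_id survives overlapping calls where a single `current_tool_index` slot does not.

```rust
use std::collections::HashMap;

/// Hypothetical per-stream bookkeeping for in-flight tool calls.
#[derive(Default)]
struct InFlightTools {
    next_index: usize,
    by_call_id: HashMap<String, usize>,
}

impl InFlightTools {
    /// Records a started tool call and returns the output index assigned to it.
    fn on_started(&mut self, call_id: &str) -> usize {
        let index = self.next_index;
        self.next_index += 1;
        self.by_call_id.insert(call_id.to_string(), index);
        index
    }

    /// Returns the output index for the completed call, or None if it is unknown.
    /// Removing the entry means a duplicate completion cannot emit a second "done".
    fn on_completed(&mut self, call_id: &str) -> Option<usize> {
        self.by_call_id.remove(call_id)
    }
}
```

In the A-starts, B-starts, A-completes scenario described above, `on_completed("A")` yields A's index, so a done event built from it would target A rather than the most recently started call.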

Recommended verdict: Request changes.

Residual risk: I also kicked off a couple of targeted tests from the worktree, but they were still compiling when I posted this review.


2 participants