Skip to content

Python: Fix streaming path to emit mcp_server_tool_result on output_item.done instead of output_item.added#4821

Open
giles17 wants to merge 5 commits intomicrosoft:mainfrom
giles17:agent/fix-4814-1
Open

Python: Fix streaming path to emit mcp_server_tool_result on output_item.done instead of output_item.added#4821
giles17 wants to merge 5 commits intomicrosoft:mainfrom
giles17:agent/fix-4814-1

Conversation

@giles17
Copy link
Contributor

@giles17 giles17 commented Mar 20, 2026

Motivation and Context

The streaming code path attempted to extract MCP tool results from response.output_item.added events, where the result/output fields are not yet populated. This meant mcp_server_tool_result content was either empty or malformed during streaming, breaking parity with the non-streaming and agent-as-tool paths.

Fixes #4814

Description

The root cause was that _parse_chunk_from_openai tried to read result/output/outputs from the MCP call item at the response.output_item.added stage, before the API has populated those fields. The fix removes that premature extraction and instead handles MCP call results in a new response.output_item.done case, where the completed item carries the final output string. This aligns streaming behavior with the non-streaming path and correctly emits Content.from_mcp_server_tool_result with the actual tool output.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Note: PR autogenerated by giles17's agent

Copilot and others added 3 commits March 20, 2026 19:39
…ft#4814)

Remove premature mcp_server_tool_result emission from the
response.output_item.added/mcp_call handler — at that point the MCP
server has not yet responded and output is always None.

Add a handler for response.mcp_call.completed that emits
mcp_server_tool_result with the actual tool output, matching the
non-streaming path behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ft#4814)

Stop eagerly emitting mcp_server_tool_result on response.output_item.added
(when output is always None). Instead, handle response.output_item.done for
mcp_call items, which carries the full McpCall with populated output.

This matches the non-streaming path which guards with 'if item.output is not
None' before emitting the result.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 20, 2026 19:52
@giles17 giles17 self-assigned this Mar 20, 2026
Copy link
Contributor Author

@giles17 giles17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 96% | Result: All clear

Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach


Automated review by giles17's agents

@markwallace-microsoft
Copy link
Member

markwallace-microsoft commented Mar 20, 2026

Python Test Coverage

Python Test Coverage Report •
FileStmtsMissCoverMissing
packages/core/agent_framework/openai
   _responses_client.py82212684%312–315, 319–320, 325–326, 336–337, 344, 359–365, 386, 394, 417, 514, 613, 672, 674, 676, 678, 746, 760, 840, 850, 855, 898, 977, 994, 1007, 1072, 1165, 1170, 1174–1176, 1180–1181, 1247, 1276, 1282, 1292, 1298, 1303, 1309, 1314–1315, 1376, 1398–1399, 1414–1415, 1433–1434, 1475–1478, 1640, 1695, 1697, 1777–1785, 1907, 1942, 1957, 1977–1987, 2000, 2011–2015, 2029, 2043–2054, 2063, 2095–2098, 2106–2107, 2109–2111, 2125–2127, 2137–2138, 2144, 2159
TOTAL27296321888% 

Python Unit Test Overview

Tests Skipped Failures Errors Time
5351 20 💤 0 ❌ 0 🔥 1m 27s ⏱️

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes the OpenAI Responses streaming parser so MCP tool results are emitted only once the MCP call item is complete, restoring parity with non-streaming behavior and enabling UIs to display MCP outputs during streaming.

Changes:

  • Stop emitting mcp_server_tool_result during response.output_item.added for mcp_call items (when output fields are not populated yet).
  • Add handling for response.output_item.done to emit mcp_server_tool_result with the finalized mcp_call.output.
  • Update/extend unit tests to validate deferral behavior and output_item.done result emission (including the no-output case).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
python/packages/core/agent_framework/openai/_responses_client.py Defers MCP result parsing to response.output_item.done and emits mcp_server_tool_result from the completed item output.
python/packages/core/tests/openai/test_openai_responses_client.py Adjusts tests to assert no result on output_item.added and adds coverage for output_item.done result emission.

…icrosoft#4814)

- Add call_id fallback in response.output_item.done mcp_call handler to
  match the output_item.added handler pattern
- Use done_item instead of event for raw_representation to keep
  consistent with other output_item branches and non-streaming path
- Add test for call_id fallback when id attribute is missing
- Add raw_representation assertions to existing done handler tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Contributor Author

@giles17 giles17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 93%

✓ Correctness

This PR makes two small, correct fixes to the streaming MCP call handling in _parse_chunk_from_openai: (1) adds a call_id attribute fallback when extracting the call identifier from a done mcp_call item, and (2) fixes raw_representation to pass the item object instead of the event wrapper, aligning with the non-streaming path and the response.output_item.added streaming path. Tests are well-constructed and correctly validate the changes. No issues found.

✓ Security Reliability

The changes are minor, correct, and improve reliability: (1) adding a call_id attribute fallback when id is missing on the done_item guards against different event object shapes from the OpenAI API, and (2) passing done_item instead of event as raw_representation is consistent with the non-streaming path (line 1560) and the streaming output_item.added handler (line 1932), which both pass the item object. The test coverage for the fallback and raw_representation assertions is adequate. No security or reliability issues found.

✗ Test Coverage

The diff fixes two bugs in the streaming MCP call handler: (1) adds a call_id fallback when id is missing, and (2) passes done_item instead of event as raw_representation. Tests cover the happy path, null-output, and call_id fallback cases, plus verify raw_representation. However, the new fallback test is missing a raw_representation assertion, and there is no test for the case where both id and call_id are absent (empty-string fallback). The broader issue from the linked discussion—missing handler for response.mcp_call.completed—is not addressed here, but that is a separate feature/fix beyond this PR's scope.

✓ Design Approach

The two changes are well-motivated and consistent with the rest of the codebase. The call_id fallback in response.output_item.done now mirrors the identical pattern already used in response.output_item.added (line 1925), and raw_representation=done_item aligns with both the non-streaming path (line 1560) and the streaming mcp_call added handler (line 1932), which all use the item rather than the outer event wrapper. No design-level concerns found. The only gap is a missing raw_representation assertion in the new fallback test.

Suggestions

  • Add assert result_content.raw_representation is mock_item to test_parse_chunk_from_openai_with_mcp_output_item_done_call_id_fallback for consistency with the other two MCP done tests, since the raw_representation=done_item fix is part of what this PR explicitly changes.
  • The non-streaming path at line 1545 uses item.id directly without the call_id fallback. Consider applying the same defensive fallback (getattr(item, 'id', None) or getattr(item, 'call_id', None) or '') there for consistency, so both paths handle unexpected API shapes gracefully.
  • Add a test case where the mock item has neither id nor call_id to verify the empty-string fallback path (the third or "" branch).

Automated review by giles17's agents

…rage (microsoft#4814)

- Apply defensive call_id fallback (getattr with id/call_id/empty) to
  non-streaming mcp_call path for consistency with streaming path
- Add raw_representation assertion to call_id fallback test
- Add test for empty-string fallback when neither id nor call_id exist

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Contributor Author

@giles17 giles17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 91%

✓ Correctness

The PR makes the non-streaming mcp_call path at line 1545 defensive against missing id attribute by using getattr with fallback to call_id then empty string, matching the pattern already used in the streaming path (line 2208). The code change is correct. The new test covers the streaming response.output_item.done handler for the edge case where neither id nor call_id exists. However, the new test (and the existing fallback test at line 1282) test the streaming path (_parse_chunk_from_openai), not the non-streaming path (_parse_response_from_openai) where the actual code change was made. The non-streaming path change at line 1545 lacks a dedicated test for the call_id fallback and the no-attribute fallback scenarios. This is a gap but not a blocking issue since the pattern is well-tested in the streaming path and the logic is identical.

✓ Security Reliability

The diff makes a small, focused reliability improvement: the non-streaming MCP call path now uses defensive getattr for extracting call_id, matching the already-defensive streaming path. The fallback chain (idcall_id → empty string) is consistent across both paths. The new test covers the edge case where neither attribute exists. No security concerns. One minor reliability gap exists in the same non-streaming block where other attributes (item.name, item.server_label, item.arguments, item.output) are still accessed directly without getattr, but these are pre-existing and out of scope for this PR.

✗ Test Coverage

The production code change is on line 1545 in the non-streaming path (_parse_response_from_openai), adding getattr fallback for call_id when item.id is missing. However, the new tests exclusively target the streaming path (_parse_chunk_from_openai via response.output_item.done), which already had the same getattr fallback logic before this PR (line 2208). This means the actual code change has zero new test coverage. The non-streaming path's mcp_call handling at line 1545 needs at least two new tests: one for call_id fallback (no id attribute) and one for empty-string fallback (neither id nor call_id). Additionally, the existing non-streaming test at line 1161 uses MagicMock() (without spec), which auto-creates .id, so it never exercised the old bug either.

✗ Design Approach

The diff changes item.id to a getattr fallback chain in _parse_response_from_openai (non-streaming path, line 1545) to make it consistent with the already-fixed streaming response.output_item.done handler at line 2208. The approach is defensible but has two design problems: (1) the new tests are written against _parse_chunk_from_openai (streaming path, line 2208), not against the code that was actually changed (_parse_response_from_openai, line 1545), leaving the changed line untested; (2) silently falling back to an empty-string call_id when neither attribute exists — and then cementing that with an assertion — codifies broken correlation behavior for concurrent MCP calls instead of failing fast or at least logging a warning.

Flagged Issues

  • All new tests target the streaming path _parse_chunk_from_openai (line 2208), which already had the getattr fallback before this PR. The actual code change is in the non-streaming _parse_response_from_openai (line 1545) and has no test coverage. Add tests for the non-streaming mcp_call case using MagicMock(spec=[]) (so .id isn't auto-created): one with call_id set to verify the fallback, and one with neither id nor call_id to verify the empty-string fallback.
  • When neither id nor call_id exists, silently falling back to "" means concurrent MCP calls become indistinguishable. The code should at minimum emit a warning log before falling back; a stronger alternative is raising a descriptive exception so integration failures are immediately visible.

Suggestions

  • Consider making item.name, item.server_label, item.arguments, and item.output access defensive with getattr in the non-streaming mcp_call block (lines 1549–1555) to match the defensive style now used for call_id. This is pre-existing but could be addressed for consistency.
  • The or chain getattr(item, 'id', None) or getattr(item, 'call_id', None) or "" silently swallows a falsy-but-present id value (e.g., id=""). Prefer an explicit is not None check or use hasattr to avoid conflating a missing attribute with a present-but-empty one.
  • The streaming response.output_item.added path (line 1925) also uses the same getattr fallback pattern but has no fallback tests—consider adding them for completeness.

Automated review by giles17's agents

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: [Bug]: Streaming path never delivers mcp_server_tool_result content — parity gap with non-streaming and agent-as-tool paths

3 participants