Handle Anthropic stop_reason=pause_turn and OpenAI background mode #4306
mattbrandman wants to merge 1 commit into pydantic:main
@DouweM this is a slightly different take leveraging the fact that we do in fact have a graph to work with.
Status update: branch restructured and review feedback addressed. This PR has been restructured into a focused stack; the CI fixes are in this push.
Review feedback addressed (from earlier commits + this restructure)
Remaining unresolved threads (will address in follow-up or are no longer applicable): most unresolved review threads from
```python
# On pause_turn continuation, pass just the container ID string to reconnect.
# Re-passing BetaContainerParams triggers a prefill rejection on some models
# (e.g. Sonnet 4-6) even though plain string ID works fine.
if messages and isinstance(messages[-1], ModelResponse) and messages[-1].state == 'suspended':
    if messages[-1].provider_details:
        return messages[-1].provider_details.get('container_id')
    return None
```
🚩 Anthropic _get_container returns raw string instead of BetaContainerParams for continuations
At anthropic.py:534-536, when the last message is a suspended response, _get_container returns the raw container_id string from provider_details instead of wrapping it in BetaContainerParams(id=...) as is done in the normal path at line 542. The comment explains this is intentional to avoid a prefill rejection on some models. This is a provider-specific workaround that may break if Anthropic changes their API contract. The return type annotation says BetaContainerParams | None but now it can also return str, which is a type inconsistency (though the API apparently accepts both).
```python
@staticmethod
def _merge_response(existing: _messages.ModelResponse, new: _messages.ModelResponse) -> _messages.ModelResponse:
    """Merge a new response into an existing one.

    If same `provider_response_id`, replace entirely with the new response.
    If the model changed between responses, replace entirely (incompatible responses should not be merged).
    Otherwise, accumulate parts, sum usage, and use other fields from the new response.
    """
    # Same response ID → the new response is a full replacement (e.g. OpenAI background retrieve).
    if existing.provider_response_id and existing.provider_response_id == new.provider_response_id:
        return new

    # Different model → replace (accumulating parts from different models is always wrong).
    # When either model_name is None/empty, we fall through to accumulation — this is intentional
    # because providers may not always populate model_name on continuation responses.
    if existing.model_name and new.model_name and existing.model_name != new.model_name:
        return new

    # Same model, different response → accumulate parts and sum usage.
    # Preserve existing provider response IDs when continuation responses omit them
    # (e.g. resumed OpenAI streams that start after a sequence number).
    merged_usage = existing.usage + new.usage
    return replace(
        new,
        parts=[*existing.parts, *new.parts],
        usage=merged_usage,
        provider_response_id=new.provider_response_id or existing.provider_response_id,
    )
```
🚩 _merge_response accumulates parts across continuations — duplicate parts possible if provider echoes earlier content
The merge logic at _agent_graph.py:1346-1351 concatenates [*existing.parts, *new.parts]. This works correctly for providers like Anthropic that return only NEW content in continuation responses. However, if a provider echoes earlier parts in the continuation response (e.g. some OpenAI retrieve responses return the full output), the merged response would contain duplicate parts. The same-provider_response_id check at line 1333 handles the OpenAI background case (where retrieve returns the full response with the same ID), but an edge case could arise if a provider returns a different response ID but includes earlier content.
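One defensive option for that edge case, sketched here as a standalone helper (not part of this PR, and assuming echoed parts compare equal to the originals), is to drop a leading run of new parts that exactly replays the tail of the existing parts:

```python
def merge_parts(existing_parts: list, new_parts: list) -> list:
    """Accumulate parts, dropping a leading run of new parts that
    exactly echoes the tail of the existing parts (provider replay)."""
    # Find the longest suffix of existing_parts that is a prefix of new_parts.
    max_overlap = min(len(existing_parts), len(new_parts))
    for overlap in range(max_overlap, 0, -1):
        if existing_parts[-overlap:] == new_parts[:overlap]:
            return [*existing_parts, *new_parts[overlap:]]
    # No echoed prefix detected: plain concatenation, as in the PR.
    return [*existing_parts, *new_parts]
```

The trade-off is that equality-based deduplication could wrongly drop content a model legitimately repeated, so it would need provider-specific gating rather than being applied unconditionally.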
…ck continuation pinning

Adds general-purpose continuation infrastructure for models that pause mid-turn:
- `ContinueRequestNode` in agent graph for automatic continuation requests
- `ModelResponseState` type (complete/suspended) on `ModelResponse`
- Fallback model continuation pinning: pin to the model that started a continuation
- Message rewinding for fallback recovery

This enables Anthropic pause_turn and OpenAI background mode (added in follow-up).
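The complete/suspended state machine described in the commit message can be sketched as a plain loop (all names here are hypothetical and drastically simplified compared to the real graph-node implementation):

```python
from dataclasses import dataclass

@dataclass
class Response:
    state: str   # 'complete' or 'suspended', mirroring the ModelResponseState idea
    text: str

def run_with_continuations(call_model, max_continuations: int = 5) -> str:
    """Keep issuing continuation requests while the model reports a suspended turn."""
    chunks: list[str] = []
    for _ in range(max_continuations + 1):
        response = call_model()
        chunks.append(response.text)
        if response.state == 'complete':
            # Turn finished: return the accumulated output.
            return ''.join(chunks)
    raise RuntimeError('model never completed its turn')
```

In the actual PR this role is played by the agent graph (the `ContinueRequestNode` re-enters the model request step), which is what allows fallback pinning and message rewinding to hook into the same loop.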
Linked issue: `stop_reason` `pause_turn` is not handled correctly, resulting in errors with long-running built-in tools #2600