Allow hooks to raise ModelRetry for retry control flow #4858
Extends `ModelRetry` support beyond tools and output validators to capability hooks. This lets post-processing hooks trigger model retries without manipulating graph nodes directly. Supported hooks:

- `after_model_request`: reject a model response and retry (response is kept in history for context)
- `wrap_model_request`: raise instead of returning to trigger a retry (skips `on_model_request_error`)
- `on_model_request_error`: recover from errors by retrying the model
- Tool hooks (`before/after_tool_validate`, `before/after_tool_execute`): raise to ask the model to redo the tool call
- `wrap_tool_execute`: raise to trigger a tool retry (skips `on_tool_execute_error`)
- `on_tool_execute_error`: recover from tool errors by retrying

Retry counting uses existing pools: `max_result_retries` for model request hooks, per-tool `max_retries` for tool hooks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
```python
    _messages.ModelRequest(parts=[m], instructions=instructions)
)
self._result = retry_node
return retry_node
```
The retry node creation logic (increment retries → build `RetryPromptPart` → get instructions → create `ModelRequestNode` → set `self._result`) is duplicated four times across `_make_request`, `_finish_handling`, and `stream` (twice). The only differences are a few context-specific lines before the shared part (e.g. appending the response in `_finish_handling`, setting `_did_stream` in `stream`).

Consider extracting a helper like `_build_retry_node(ctx, run_context, error)` that handles the common retry node construction, so each call site only needs to handle its own context-specific bookkeeping before calling the helper.
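A rough sketch of the shape this extraction could take — self-contained, with stand-in types (`State`, `RetryNode`, and the helper itself are illustrative; the real code would work with pydantic-ai's `GraphAgentState` and `ModelRequestNode` internals, which aren't shown in this diff):

```python
from dataclasses import dataclass, field


# Stand-in types: the real implementation would use pydantic-ai's
# GraphAgentState and ModelRequestNode, not these simplified stubs.
@dataclass
class State:
    retries: int = 0
    history: list = field(default_factory=list)


@dataclass
class RetryNode:
    prompt: str


def build_retry_node(state: State, max_retries: int, error: Exception) -> RetryNode:
    """Shared tail of every retry path: bump the counter, build the retry request."""
    state.retries += 1
    if state.retries > max_retries:
        raise RuntimeError(f'exceeded {max_retries} retries')
    return RetryNode(prompt=str(error))


# Each call site keeps only its context-specific bookkeeping, then delegates.
state = State()
state.history.append('model response')  # e.g. _finish_handling appends the response first
node = build_retry_node(state, max_retries=3, error=ValueError('bad response'))
```

The four call sites then differ only in the one or two lines before the helper call, which is exactly the part that varies today.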
```python
    )
except exceptions.SkipModelRequest as e:
    model_response = e.response
except exceptions.ModelRetry:
```
The `except ModelRetry: raise` followed by `except Exception` pattern is repeated in three places (`_make_request`, and twice in `stream`). The re-raise is necessary to prevent the broad `except Exception` from swallowing `ModelRetry`, but with the helper extraction suggested in the other comment you may be able to structure this more cleanly — e.g. by checking `isinstance(e, ModelRetry)` inside the `except Exception` block and re-raising if so, which would eliminate the inner try/except.
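A self-contained sketch of the suggested shape (`ModelRetry` here is a stand-in class, not the pydantic-ai import):

```python
class ModelRetry(Exception):
    """Stand-in for pydantic_ai.exceptions.ModelRetry."""


def handle_with_reraise_check(fn):
    # Single except block: catch everything, but re-raise ModelRetry so the
    # broad handler can't swallow it — no inner try/except needed.
    try:
        return fn()
    except Exception as e:
        if isinstance(e, ModelRetry):
            raise
        return f'recovered: {e}'


def ok():
    return 'ok'


def fails():
    raise ValueError('boom')


def wants_retry():
    raise ModelRetry('try again')
```

`handle_with_reraise_check(ok)` returns normally, `handle_with_reraise_check(fails)` is recovered by the broad handler, and `handle_with_reraise_check(wants_retry)` propagates `ModelRetry` to the caller.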
tests/test_capabilities.py
```python
result = await agent.run('hello')
assert result.output == 'good response'
assert call_count == 2
# Verify model response is in history before the retry
```
This test (`test_after_model_request_model_retry_non_streaming`) is nearly identical to `test_after_model_request_model_retry` above — same hook, same `FunctionModel`, same `agent.run()` call path. The name suggests it's meant to test the non-streaming path specifically, but `FunctionModel` with `agent.run()` already goes through `_make_request` in both tests.

If the intent was to have a streaming vs. non-streaming pair, the other test should use `agent.run_stream()`. Otherwise, one of these should be removed.
tests/test_capabilities.py
```python
assert call_count == 2
# Verify message flow: response is in history, then retry prompt, then new response
messages = result.all_messages()
assert any(isinstance(p, RetryPromptPart) for msg in messages for p in msg.parts)
```
These tests should use `snapshot()` for `result.all_messages()` to validate the complete message trace, not just check for the presence/absence of specific part types. Per the testing guidelines, agent tests should assert on the final output AND snapshot `result.all_messages()` to catch regressions in tool calls, intermediate steps, and message flow.

This applies to all tests in `TestModelRetryFromHooks`.
```python
class TestModelRetryFromHooks:
    """Tests for raising ModelRetry from capability hooks."""
```
There are no tests for the streaming code paths (`agent.run_stream()` or `agent.iter()`), even though the `stream()` method in `ModelRequestNode` has significant changes for `ModelRetry` handling in two separate branches (early wrap completion and normal streaming). Please add at least one streaming test to cover the `ModelRetry` path through `stream()`.
| """Wraps the model request. handler() calls the model.""" | ||
| """Wraps the model request. handler() calls the model. | ||
|
|
||
| Raise [`ModelRetry`][pydantic_ai.exceptions.ModelRetry] to skip `on_model_request_error` |
The docstring says "Raise `ModelRetry` to skip `on_model_request_error` and directly retry the model request with a retry prompt" — but looking at the implementation, `ModelRetry` raised from `wrap_model_request` after the handler has been called means the model's response is silently discarded (not appended to history). This is unlike `after_model_request`, where the response is preserved so the model can see what it said.

This is a meaningful behavioral difference that should be documented explicitly. Users might expect the response to be visible to the model on retry, especially if they're raising `ModelRetry` based on inspecting the response inside `wrap_model_request`.
…ests

- Extract `_build_retry_node` helper to deduplicate retry node creation across `_make_request`, `_finish_handling`, and `stream` (4 call sites)
- Fix bug: `stream()` short-circuit path must yield before returning (required by the `@asynccontextmanager` contract) — yield an empty stream
- Replace duplicate non-streaming test with an actual streaming test using `run_stream()` with a tool call scenario
- Update `wrap_model_request` docstring to note the response is not preserved (unlike `after_model_request`)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
```python
except (SkipToolExecution, CallDeferred, ApprovalRequired, ToolRetryError):
    raise  # Control flow, not errors
except ModelRetry:
    raise  # Propagate to outer handler
except Exception as e:
    tool_result = await cap.on_tool_execute_error(ctx, call=call, tool_def=tool_def, args=args, error=e)
```
🚩 Behavioral change: ModelRetry from wrap_tool_execute no longer reaches on_tool_execute_error
Previously, if `wrap_tool_execute` raised `ModelRetry`, it would be caught by `except Exception as e:` and passed to `on_tool_execute_error`. With the new `except ModelRetry: raise` clause at `pydantic_ai_slim/pydantic_ai/_tool_manager.py:311-312`, `ModelRetry` now bypasses `on_tool_execute_error` entirely and is converted directly to `ToolRetryError` by the outer handler. This is documented as intentional in the updated docstring for `on_tool_execute_error` at `pydantic_ai_slim/pydantic_ai/capabilities/abstract.py:491-492`, but it is a semantic change to the hook contract that could affect existing capability implementations relying on `on_tool_execute_error` seeing `ModelRetry` exceptions.
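The contract change reduces to where the `except ModelRetry: raise` clause sits relative to the broad handler. A self-contained sketch with stand-in functions (not the real `_tool_manager` code) shows the difference:

```python
class ModelRetry(Exception):
    pass


def execute_old(fn, error_hook_calls):
    # Old behavior: ModelRetry falls into the broad handler,
    # so the on_tool_execute_error hook sees it like any other error.
    try:
        return fn()
    except Exception as e:
        error_hook_calls.append(type(e).__name__)
        return 'recovered'


def execute_new(fn, error_hook_calls):
    # New behavior: ModelRetry is re-raised before the broad handler,
    # bypassing the error hook entirely.
    try:
        return fn()
    except ModelRetry:
        raise
    except Exception as e:
        error_hook_calls.append(type(e).__name__)
        return 'recovered'


def tool():
    raise ModelRetry('redo the call')
```

With `execute_old`, the error hook records the `ModelRetry`; with `execute_new`, the exception propagates and the hook is never invoked.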
Python 3.14 warns on `return` inside `finally` (slated to become a SyntaxError under PEP 765). Replace with a `try/except/else` pattern and extract a `_resolve_wrap_result` helper to reduce `stream()` complexity below the C901 threshold.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cover both streaming code paths:

- Short-circuit: `wrap_model_request` raises `ModelRetry` without calling the handler
- After handler: `wrap_model_request` raises `ModelRetry` after the stream is consumed (uses a tool call scenario since text output is already yielded)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The 'on_model_request_error' and 'on_tool_execute_error' methods in the "skips on error" tests are intentionally never called — that's what the tests verify. Mark with pragma: no cover. Remove dead ToolReturnPart branch from after_tool_execute test model function — tool never returns when the hook raises ModelRetry. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Make after_tool_execute and before_tool_execute tests retry the tool call (not just return text), so the hook's return path (second call) is also exercised. This covers the last 2 uncovered lines. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The stream function is called on the second wrap_model_request invocation (after retry), so it is covered. strict-no-cover correctly flagged this. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Conflicts:
#   pydantic_ai_slim/pydantic_ai/_agent_graph.py
```python
Increments the retry counter and creates a new request with a RetryPromptPart.
"""
ctx.state.increment_retries(ctx.deps.max_result_retries, error=error)
```
📝 Info: ModelRetry retries count against max_result_retries, sharing the budget with output validation retries
`_build_retry_node` calls `ctx.state.increment_retries(ctx.deps.max_result_retries, error=error)` at line 825, which shares the same global retry counter (`ctx.state.retries`) and budget (`max_result_retries`) as output validation retries in `CallToolsNode`. This means a `ModelRetry` from `after_model_request` consumes one of the retries that would otherwise be available for output validation. This is a design choice rather than a bug, but users setting `output_retries=2` might be surprised that a hook retry reduces their remaining output validation retries.
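A minimal sketch of the shared-budget behavior, using a stand-in state object (the real counter lives on pydantic-ai's `GraphAgentState`): with a budget of 2, one hook retry leaves only one retry for output validation.

```python
class UnexpectedModelBehavior(Exception):
    pass


class State:
    """Stand-in for the run state holding the single shared retry counter."""

    def __init__(self):
        self.retries = 0

    def increment_retries(self, max_result_retries: int):
        self.retries += 1
        if self.retries > max_result_retries:
            raise UnexpectedModelBehavior(f'exceeded {max_result_retries} retries')


state = State()
MAX = 2
state.increment_retries(MAX)  # ModelRetry from after_model_request: 1 of 2 used
state.increment_retries(MAX)  # output validation retry: 2 of 2 used
# A third increment — e.g. a second output validation retry — now raises,
# even though output validation itself has only retried once so far.
```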
…elRetry

When `wrap_model_request` calls `handler()` and then raises `ModelRetry`, the model DID respond. Capture the handler's response and append it to message history before retrying, so the model can see what it said — consistent with `after_model_request` behavior.

Extract an `_append_response` helper to deduplicate response-to-history logic across `_finish_handling`, `_make_request`, and `stream()`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In the normal streaming path, the handler is always called (that's how the stream is created), so _handler_response is always set. Use assert instead of an if-guard to avoid an unreachable branch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove spurious `ctx.state.usage.requests += 1` in the streaming `ModelRetry` path — `_streaming_handler` already incremented it when the handler ran
- Add `snapshot()` assertions to key tests per test guidelines: verify the complete message trace including retry prompts in history

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove spurious `usage.requests` increment in the streaming short-circuit `ModelRetry` path — the handler was never called, so no model request was made
- Add `snapshot()` assertions with `all_messages()` to all `ModelRetry` tests per test guidelines (rule:194)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
```python
# --- ModelRetry from hooks tests ---


class TestModelRetryFromHooks:
```
🚩 Tests only use FunctionModel/TestModel — no provider-specific coverage
Per `tests/AGENTS.md` rule:11, tests should be parametrized across providers (at minimum OpenAI, Anthropic, Google). All new tests in `TestModelRetryFromHooks` exclusively use `FunctionModel`. While this is common for capability/hook tests (testing framework behavior rather than provider behavior), `ModelRetry` from hooks interacts with the model request lifecycle, which varies across providers — particularly streaming behavior. Provider-specific cassette tests would increase confidence that the retry mechanism works end-to-end with real provider streaming implementations.
```python
except exceptions.ModelRetry as e:
    # ModelRetry from wrap_model_request or on_model_request_error — retry the model request.
    # If the handler was called, preserve the response in history for context.
    ctx.state.usage.requests += 1
```
`usage.requests` is unconditionally incremented here, even when `_handler_response` is `None` (i.e. `wrap_model_request` short-circuited without calling the handler, meaning no actual model request was made). This is inconsistent with the streaming short-circuit `ModelRetry` path at line 585, which explicitly does not increment `usage.requests` because "handler was never called."

The streaming path's logic seems more correct: if the handler was never called, no model request was made, so counting it as a request is wrong. This should probably be conditional:

```python
if _handler_response is not None:
    ctx.state.usage.requests += 1
    self._append_response(ctx, _handler_response)
```

Consider also adding a test that asserts on `result.usage().requests` to verify request counting is correct across all `ModelRetry` paths (short-circuit vs. after-handler, streaming vs. non-streaming).
```python
Raise [`ModelRetry`][pydantic_ai.exceptions.ModelRetry] to reject the response and
ask the model to try again. The original response is still appended to message history
so the model can see what it said. Retries count against `max_result_retries`.
"""
```
The new `ModelRetry` support from capability hooks is a meaningful feature addition, but there are no documentation updates. The capabilities docs should describe:

- Which hooks support `ModelRetry` and the semantics for each (model hooks vs tool hooks, which retry pool is used)
- The interaction between `ModelRetry` and `on_model_request_error`/`on_tool_execute_error` (i.e. `ModelRetry` bypasses the error hooks)
- Examples showing common use cases (e.g. post-processing validation in `after_model_request`, guardrails in `wrap_model_request`)

The docstring updates here are a good start, but users need documentation to discover and understand this feature.
- Only increment `usage.requests` in the `_make_request` `ModelRetry` path when the handler was actually called (consistent with streaming paths)
- Add a "Triggering retries with ModelRetry" section to docs/hooks.md documenting which hooks support it, retry counting, and an example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Make the ModelRetry docs example a proper tested example with full type annotations, title metadata, and correct expected output. Import Hooks from pydantic_ai.capabilities (public API). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
before_model_request doesn't support ModelRetry since the model hasn't responded yet. Change "Any hook" to "Most hooks" and note the exception. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The supported hooks are already enumerated in the bullet lists below. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
```python
else:
    self.last_request_context = wrap_request_context
    await self._finish_handling(ctx, model_response)
assert self._result is not None
```
🟡 Streaming _build_retry_node in finally block raises or silently ignores retry when FinalResultEvent was already yielded
When using `run_stream`, if the model returns text that triggers a `FinalResultEvent`, the agent loop yields `StreamedRunResult` to the user and breaks out of its while loop (at `agent/abstract.py:709`). However, the `async with node.stream(graph_ctx)` context is still active, so `ModelRequestNode.stream()`'s `finally` block runs afterward. Inside the `finally` block, `_finish_handling` calls `after_model_request`, which may raise `ModelRetry`. When it does, `_build_retry_node` calls `ctx.state.increment_retries()` — which can raise `UnexpectedModelBehavior` if `max_result_retries` is exceeded, crashing the user's `async with agent.run_stream(...)` exit even though they already received a valid response. If retries are NOT exceeded, the retry `ModelRequestNode` is stored in `self._result` but never processed (the while loop already broke at `agent/abstract.py:709`, so `_wrap_and_advance` at line 722 is never reached), meaning the `ModelRetry` is silently ignored — the user gets the response the hook intended to reject.
Trace of the silent-ignore path:

- `run_stream` → `node.stream(graph_ctx)` yields a stream with text → `FinalResultEvent` emitted
- Agent loop yields `StreamedRunResult` at `abstract.py:699`, sets `yielded=True`, `break` at line 709
- `async with node.stream()` exits → `finally` block runs at `_agent_graph.py:615`
- `_finish_handling` at line 643 → `after_model_request` raises `ModelRetry`
- `_append_response` appends the response, `_build_retry_node` creates a retry node at line 640 or 826
- Retry node stored in `self._result` but the agent loop already exited — the retry is never executed
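The silent-ignore path boils down to a generic hazard: work done in a context manager's `finally` block after the consumer has already taken its result goes nowhere. A self-contained sketch (illustrative names only, not the real agent loop):

```python
import asyncio
import contextlib

result_holder: dict = {}


@contextlib.asynccontextmanager
async def node_stream():
    try:
        # Streaming text triggers the FinalResultEvent path: the caller
        # receives the text and breaks out of its loop.
        yield 'streamed text'
    finally:
        # after_model_request effectively runs here; if it raises ModelRetry,
        # the retry node is recorded — but nothing will ever process it.
        result_holder['retry_node'] = 'retry requested'


async def run_stream():
    async with node_stream() as text:
        return text  # the loop has already exited; the pending retry is ignored


out = asyncio.run(run_stream())
```

`out` is still the streamed text even though `result_holder` records that a retry was requested — the caller keeps the response the hook meant to reject.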
```python
async def test_after_model_request_model_retry_streaming(self):
    """after_model_request raises ModelRetry during streaming with tool calls — model is called again."""
    call_count = 0

    async def stream_fn(messages: list[ModelMessage], info: AgentInfo) -> AsyncIterator[str | DeltaToolCalls]:
        nonlocal call_count
        call_count += 1
        if call_count == 1:
            # First call: return a tool call that after_model_request will reject
            yield {0: DeltaToolCall(name='my_tool', json_args='{}', tool_call_id='call-1')}
        elif call_count == 2:
            # Second call (after retry): return text
            yield 'good response'
        else:
            yield 'unexpected'  # pragma: no cover

    @dataclass
    class RetryCap(AbstractCapability[Any]):
        retried: bool = False

        async def after_model_request(
            self,
            ctx: RunContext[Any],
            *,
            request_context: ModelRequestContext,
            response: ModelResponse,
        ) -> ModelResponse:
            if not self.retried:
                self.retried = True
                raise ModelRetry('Response was bad, please try again')
            return response

    cap = RetryCap()
    agent = Agent(
        FunctionModel(simple_model_function, stream_function=stream_fn),
        capabilities=[cap],
    )

    @agent.tool_plain
    def my_tool() -> str:
        return 'tool result'  # pragma: no cover

    async with agent.run_stream('hello') as streamed:
        result = await streamed.get_output()
    assert result == 'good response'
    assert call_count == 2
```
🚩 Test coverage focuses on tool-call scenarios for streaming ModelRetry; text-response streaming path untested
The streaming `ModelRetry` tests (`test_after_model_request_model_retry_streaming` at line 6292, `test_wrap_model_request_model_retry_streaming_after_handler` at line 6430) use tool calls in their first model response, not text. When the model returns text in streaming, a `FinalResultEvent` is emitted and the agent loop yields `StreamedRunResult` before `after_model_request` runs in the `finally` block. A test with text as the first streaming response (triggering `FinalResultEvent`) followed by `after_model_request` raising `ModelRetry` would reveal that the retry is silently ignored.
- Merge main with PR #4858 (ModelRetry from hooks)
- Re-apply ModelRetry outer try/except for tool execute hooks (lost in merge)
- Add same ModelRetry pattern to output validate and execute hooks:
  - before/after/wrap can raise ModelRetry to trigger a retry
  - ModelRetry from `wrap_output_execute` bypasses `on_output_execute_error`
  - Converted to `ToolRetryError` for retry handling
- Fix streaming output tool path: use `handle_output_tool_call` instead of `handle_call` so output hooks fire (not tool hooks) during streaming
- Fix union + error recovery crash: `do_execute` returns recovered data as-is when `_union_kind` is empty
- Add `allow_partial`/`wrap_validation_errors` params to `validate_output_tool_call`
- Update docstrings and docs/hooks.md with output hook ModelRetry info
- Add 5 tests: before_validate, after_validate, after_execute, wrap_execute (skips error hook), before_execute all raising ModelRetry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- Extends `ModelRetry` support beyond tools and output validators to capability hooks
- Supported hooks: `after_model_request`, `wrap_model_request`, `on_model_request_error`, and all tool lifecycle hooks (`before/after_tool_validate`, `before/after_tool_execute`, `wrap_tool_execute`, `on_tool_execute_error`)
- Retry counting uses existing pools: `max_result_retries` for model request hooks, per-tool `max_retries` for tool hooks

Test plan
- `TestModelRetryFromHooks` covering all supported hooks

🤖 Generated with Claude Code
Closes #XXXX