Allow hooks to raise ModelRetry for retry control flow#4858

Merged
DouweM merged 18 commits into main from hook-model-retry on Mar 27, 2026
Conversation

@DouweM DouweM commented Mar 26, 2026

Summary

  • Extends ModelRetry support beyond tools and output validators to capability hooks
  • Lets post-processing hooks trigger model retries without manipulating graph nodes directly
  • Supported in after_model_request, wrap_model_request, on_model_request_error, and all tool lifecycle hooks (before/after_tool_validate, before/after_tool_execute, wrap_tool_execute, on_tool_execute_error)
  • Retry counting uses existing pools: max_result_retries for model request hooks, per-tool max_retries for tool hooks

Test plan

  • 12 new tests in TestModelRetryFromHooks covering all supported hooks
  • All 289 existing capability tests pass
  • All core agent/tool/streaming tests pass (324 tests)
  • Changed files typecheck clean with pyright

🤖 Generated with Claude Code

Closes #XXXX

Extends `ModelRetry` support beyond tools and output validators to capability
hooks. This lets post-processing hooks trigger model retries without
manipulating graph nodes directly.

Supported hooks:
- `after_model_request`: reject a model response and retry (response is kept
  in history for context)
- `wrap_model_request`: raise instead of returning to trigger retry (skips
  `on_model_request_error`)
- `on_model_request_error`: recover from errors by retrying the model
- Tool hooks (`before/after_tool_validate`, `before/after_tool_execute`):
  raise to ask the model to redo the tool call
- `wrap_tool_execute`: raise to trigger tool retry (skips
  `on_tool_execute_error`)
- `on_tool_execute_error`: recover from tool errors by retrying

Retry counting uses existing pools: `max_result_retries` for model request
hooks, per-tool `max_retries` for tool hooks.
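The control flow described above can be sketched in plain Python. All names here (`run_with_hook_retries`, `call_model`, the hook parameter) are hypothetical stand-ins for illustration, not the actual pydantic-ai API:

```python
class ModelRetry(Exception):
    """Raised by a hook to ask the model to try again."""


def run_with_hook_retries(call_model, after_model_request, max_result_retries=1):
    """Minimal sketch: a post-processing hook may raise ModelRetry to reject
    a response. The rejected response is kept in history, followed by a
    retry prompt, and retries count against a shared budget."""
    history = [('user', 'hello')]
    retries = 0
    while True:
        response = call_model(history)
        history.append(('model', response))  # response stays visible on retry
        try:
            after_model_request(response)
        except ModelRetry as e:
            retries += 1
            if retries > max_result_retries:
                raise RuntimeError('Exceeded maximum retries') from e
            history.append(('retry-prompt', str(e)))
            continue
        return response, history
```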

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size: M Medium PR (101-500 weighted lines) feature New feature request, or PR implementing a feature (enhancement) labels Mar 26, 2026
_messages.ModelRequest(parts=[m], instructions=instructions)
)
self._result = retry_node
return retry_node
The retry node creation logic (increment retries → build RetryPromptPart → get instructions → create ModelRequestNode → set self._result) is duplicated four times across _make_request, _finish_handling, and stream (twice). The only differences are a few context-specific lines before the shared part (e.g. appending the response in _finish_handling, setting _did_stream in stream).

Consider extracting a helper like _build_retry_node(ctx, run_context, error) that handles the common retry node construction, so each call site only needs to handle its own context-specific bookkeeping before calling the helper.
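The suggested extraction could look roughly like this — a generic sketch with hypothetical names and dict-based stand-ins for the real node types, not the actual internals:

```python
def _build_retry_node(state, error, max_retries):
    # Shared construction: bump the retry counter (raising if the budget
    # is exhausted), then package the error as a retry-prompt request.
    state['retries'] += 1
    if state['retries'] > max_retries:
        raise RuntimeError('Exceeded maximum retries') from error
    retry_part = {'kind': 'retry-prompt', 'content': str(error)}
    return {'kind': 'model-request', 'parts': [retry_part]}


def finish_handling(state, response, error, max_retries):
    # Call-site-specific bookkeeping stays local...
    state['history'].append(response)
    # ...and the duplicated construction collapses to one helper call.
    return _build_retry_node(state, error, max_retries)
```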

)
except exceptions.SkipModelRequest as e:
model_response = e.response
except exceptions.ModelRetry:
The except ModelRetry: raise followed by except Exception pattern is repeated in three places (_make_request, and twice in stream). The re-raise is necessary to prevent the broad except Exception from swallowing ModelRetry, but it's worth noting that with the helper extraction suggested in the other comment, you may be able to structure this more cleanly — e.g. by checking isinstance(e, ModelRetry) inside the except Exception block and re-raising if so, which would eliminate the inner try/except.
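The reviewer's alternative — letting one broad handler re-raise the control-flow exception — can be illustrated generically (hypothetical names, not the actual code):

```python
class ModelRetry(Exception):
    """Control-flow exception: ask the model to retry."""


def run_hook(op, on_error):
    try:
        return op()
    except Exception as e:
        # Re-raise control flow here instead of a separate inner
        # `except ModelRetry: raise`, so the recovery hook only
        # ever sees genuine errors.
        if isinstance(e, ModelRetry):
            raise
        return on_error(e)
```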

result = await agent.run('hello')
assert result.output == 'good response'
assert call_count == 2
# Verify model response is in history before the retry
This test (test_after_model_request_model_retry_non_streaming) is nearly identical to test_after_model_request_model_retry above — same hook, same FunctionModel, same agent.run() call path. The name suggests it's meant to test the non-streaming path specifically, but FunctionModel with agent.run() already goes through _make_request in both tests.

If the intent was to have a streaming vs non-streaming pair, the other test should use agent.run_stream(). Otherwise, one of these should be removed.

assert call_count == 2
# Verify message flow: response is in history, then retry prompt, then new response
messages = result.all_messages()
assert any(isinstance(p, RetryPromptPart) for msg in messages for p in msg.parts)
These tests should use snapshot() for result.all_messages() to validate the complete message trace, not just check for the presence/absence of specific part types. Per the testing guidelines, agent tests should assert on the final output AND snapshot result.all_messages() to catch regressions in tool calls, intermediate steps, and message flow.

This applies to all tests in TestModelRetryFromHooks.



class TestModelRetryFromHooks:
"""Tests for raising ModelRetry from capability hooks."""
There are no tests for the streaming code paths (agent.run_stream() or agent.iter()), even though the stream() method in ModelRequestNode has significant changes for ModelRetry handling in two separate branches (early wrap completion and normal streaming). Please add at least one streaming test to cover the ModelRetry path through stream().

"""Wraps the model request. handler() calls the model."""
"""Wraps the model request. handler() calls the model.

Raise [`ModelRetry`][pydantic_ai.exceptions.ModelRetry] to skip `on_model_request_error`
The docstring says "Raise ModelRetry to skip on_model_request_error and directly retry the model request with a retry prompt" — but looking at the implementation, ModelRetry raised from wrap_model_request after the handler has been called means the model's response is silently discarded (not appended to history). This is unlike after_model_request where the response is preserved so the model can see what it said.

This is a meaningful behavioral difference that should be documented explicitly. Users might expect the response to be visible to the model on retry, especially if they're raising ModelRetry based on inspecting the response inside wrap_model_request.

github-actions bot commented Mar 26, 2026

Docs Preview

commit: a753eef
Preview URL: https://b148861f-pydantic-ai-previews.pydantic.workers.dev

…ests

- Extract `_build_retry_node` helper to deduplicate retry node creation
  across `_make_request`, `_finish_handling`, and `stream` (4 call sites)
- Fix bug: `stream()` short-circuit path must yield before returning
  (required by @asynccontextmanager contract) — yield empty stream
- Replace duplicate non-streaming test with actual streaming test using
  `run_stream()` with tool call scenario
- Update `wrap_model_request` docstring to note response is not preserved
  (unlike `after_model_request`)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 3 new potential issues.

Comment on lines +309 to +314
except (SkipToolExecution, CallDeferred, ApprovalRequired, ToolRetryError):
raise # Control flow, not errors
except ModelRetry:
raise # Propagate to outer handler
except Exception as e:
tool_result = await cap.on_tool_execute_error(ctx, call=call, tool_def=tool_def, args=args, error=e)
🚩 Behavioral change: ModelRetry from wrap_tool_execute no longer reaches on_tool_execute_error

Previously, if wrap_tool_execute raised ModelRetry, it would be caught by except Exception as e: and passed to on_tool_execute_error. With the new except ModelRetry: raise clause at pydantic_ai_slim/pydantic_ai/_tool_manager.py:311-312, ModelRetry now bypasses on_tool_execute_error entirely and is converted directly to ToolRetryError by the outer handler. This is documented as intentional in the updated docstring for on_tool_execute_error at pydantic_ai_slim/pydantic_ai/capabilities/abstract.py:491-492, but it is a semantic change to the hook contract that could affect existing capability implementations relying on on_tool_execute_error seeing ModelRetry exceptions.


DouweM and others added 6 commits March 26, 2026 03:43
Python 3.14 makes `return` inside `finally` a SyntaxError. Replace with
`try/except/else` pattern and extract `_resolve_wrap_result` helper to
reduce `stream()` complexity below the C901 threshold.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cover both streaming code paths:
- Short-circuit: wrap_model_request raises ModelRetry without calling handler
- After handler: wrap_model_request raises ModelRetry after stream consumed
  (uses tool call scenario since text output is already yielded)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The 'on_model_request_error' and 'on_tool_execute_error' methods in the
"skips on error" tests are intentionally never called — that's what the
tests verify. Mark with pragma: no cover.

Remove dead ToolReturnPart branch from after_tool_execute test model
function — tool never returns when the hook raises ModelRetry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Make after_tool_execute and before_tool_execute tests retry the tool call
(not just return text), so the hook's return path (second call) is also
exercised. This covers the last 2 uncovered lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The stream function is called on the second wrap_model_request invocation
(after retry), so it is covered. strict-no-cover correctly flagged this.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Conflicts:
#	pydantic_ai_slim/pydantic_ai/_agent_graph.py

Increments the retry counter and creates a new request with a RetryPromptPart.
"""
ctx.state.increment_retries(ctx.deps.max_result_retries, error=error)
@devin-ai-integration devin-ai-integration bot Mar 26, 2026
📝 Info: ModelRetry retries count against max_result_retries, sharing the budget with output validation retries

_build_retry_node calls ctx.state.increment_retries(ctx.deps.max_result_retries, error=error) at line 825, which shares the same global retry counter (ctx.state.retries) and budget (max_result_retries) as output validation retries in CallToolsNode. This means a ModelRetry from after_model_request consumes one of the retries that would otherwise be available for output validation. This is a design choice rather than a bug, but users setting output_retries=2 might be surprised that a hook retry reduces their remaining output validation retries.


…elRetry

When wrap_model_request calls handler() and then raises ModelRetry, the
model DID respond. Capture the handler's response and append it to message
history before retrying, so the model can see what it said — consistent
with after_model_request behavior.

Extract _append_response helper to deduplicate response-to-history logic
across _finish_handling, _make_request, and stream().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size: L Large PR (501-1500 weighted lines) and removed size: M Medium PR (101-500 weighted lines) labels Mar 26, 2026
In the normal streaming path, the handler is always called (that's how
the stream is created), so _handler_response is always set. Use assert
instead of an if-guard to avoid an unreachable branch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove spurious `ctx.state.usage.requests += 1` in streaming ModelRetry
  path — `_streaming_handler` already incremented it when the handler ran
- Add `snapshot()` assertions to key tests per test guidelines: verify
  complete message trace including retry prompts in history

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove spurious usage.requests increment in streaming short-circuit
  ModelRetry path — handler was never called, no model request was made
- Add snapshot() assertions with all_messages() to all ModelRetry tests
  per test guidelines (rule:194)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.


# --- ModelRetry from hooks tests ---


class TestModelRetryFromHooks:
@devin-ai-integration devin-ai-integration bot Mar 27, 2026

🚩 Tests only use FunctionModel/TestModel — no provider-specific coverage

Per tests/AGENTS.md rule:11, tests should be parametrized across providers (at minimum OpenAI, Anthropic, Google). All new tests in TestModelRetryFromHooks exclusively use FunctionModel. While this is common for capability/hook tests (testing framework behavior rather than provider behavior), the ModelRetry from hooks interacts with the model request lifecycle which varies across providers — particularly streaming behavior. Provider-specific cassette tests would increase confidence that the retry mechanism works end-to-end with real provider streaming implementations.


except exceptions.ModelRetry as e:
# ModelRetry from wrap_model_request or on_model_request_error — retry the model request.
# If the handler was called, preserve the response in history for context.
ctx.state.usage.requests += 1
usage.requests is unconditionally incremented here, even when _handler_response is None (i.e. wrap_model_request short-circuited without calling the handler, meaning no actual model request was made). This is inconsistent with the streaming short-circuit ModelRetry path at line 585, which explicitly does not increment usage.requests because "handler was never called."

The streaming path's logic seems more correct: if the handler was never called, no model request was made, so counting it as a request is wrong. This should probably be conditional:

if _handler_response is not None:
    ctx.state.usage.requests += 1
    self._append_response(ctx, _handler_response)

Consider also adding a test that asserts on result.usage().requests to verify request counting is correct across all ModelRetry paths (short-circuit vs after-handler, streaming vs non-streaming).

Raise [`ModelRetry`][pydantic_ai.exceptions.ModelRetry] to reject the response and
ask the model to try again. The original response is still appended to message history
so the model can see what it said. Retries count against `max_result_retries`.
"""
The new ModelRetry support from capability hooks is a meaningful feature addition, but there are no documentation updates. The capabilities docs should describe:

  • Which hooks support ModelRetry and the semantics for each (model hooks vs tool hooks, which retry pool is used)
  • The interaction between ModelRetry and on_model_request_error / on_tool_execute_error (i.e. ModelRetry bypasses the error hooks)
  • Examples showing common use cases (e.g. post-processing validation in after_model_request, guardrails in wrap_model_request)

The docstring updates here are a good start, but users need documentation to discover and understand this feature.

The PR description has Closes #XXXX as a placeholder — please link the actual issue (or create one if this feature wasn't previously tracked). This helps maintain traceability for future reference.

DouweM and others added 2 commits March 27, 2026 02:47
- Only increment usage.requests in _make_request ModelRetry path when the
  handler was actually called (consistent with streaming paths)
- Add "Triggering retries with ModelRetry" section to docs/hooks.md
  documenting which hooks support it, retry counting, and an example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Make the ModelRetry docs example a proper tested example with full type
annotations, title metadata, and correct expected output. Import Hooks
from pydantic_ai.capabilities (public API).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

DouweM and others added 3 commits March 27, 2026 03:21
before_model_request doesn't support ModelRetry since the model hasn't
responded yet. Change "Any hook" to "Most hooks" and note the exception.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The supported hooks are already enumerated in the bullet lists below.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@DouweM DouweM enabled auto-merge (squash) March 27, 2026 03:27
@DouweM DouweM merged commit f82046b into main Mar 27, 2026
38 of 39 checks passed
@DouweM DouweM deleted the hook-model-retry branch March 27, 2026 03:29
@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 2 new potential issues.


Comment on lines +641 to +644
else:
self.last_request_context = wrap_request_context
await self._finish_handling(ctx, model_response)
assert self._result is not None

🟡 Streaming _build_retry_node in finally block raises or silently ignores retry when FinalResultEvent was already yielded

When using run_stream, if the model returns text that triggers a FinalResultEvent, the agent loop yields StreamedRunResult to the user and breaks out of its while loop (at agent/abstract.py:709). However, the async with node.stream(graph_ctx) context is still active, so ModelRequestNode.stream()'s finally block runs afterward. Inside the finally block, _finish_handling calls after_model_request, which may raise ModelRetry. When it does, _build_retry_node calls ctx.state.increment_retries() — which can raise UnexpectedModelBehavior if max_result_retries is exceeded, crashing the user's async with agent.run_stream(...) exit even though they already received a valid response. If retries are NOT exceeded, the retry ModelRequestNode is stored in self._result but never processed (the while loop already broke at agent/abstract.py:709, so _wrap_and_advance at line 722 is never reached), meaning the ModelRetry is silently ignored — the user gets the response the hook intended to reject.

Trace of the silent-ignore path
  1. run_stream → node.stream(graph_ctx) yields stream with text → FinalResultEvent emitted
  2. Agent loop yields StreamedRunResult at abstract.py:699, sets yielded=True, break at line 709
  3. async with node.stream() exits → finally block runs at _agent_graph.py:615
  4. _finish_handling at line 643 → after_model_request raises ModelRetry
  5. _append_response appends response, _build_retry_node creates retry node at line 640 or 826
  6. Retry node stored in self._result but agent loop already exited — retry never executed

Comment on lines +6292 to +6337
async def test_after_model_request_model_retry_streaming(self):
"""after_model_request raises ModelRetry during streaming with tool calls — model is called again."""
call_count = 0

async def stream_fn(messages: list[ModelMessage], info: AgentInfo) -> AsyncIterator[str | DeltaToolCalls]:
nonlocal call_count
call_count += 1
if call_count == 1:
# First call: return a tool call that after_model_request will reject
yield {0: DeltaToolCall(name='my_tool', json_args='{}', tool_call_id='call-1')}
elif call_count == 2:
# Second call (after retry): return text
yield 'good response'
else:
yield 'unexpected' # pragma: no cover

@dataclass
class RetryCap(AbstractCapability[Any]):
retried: bool = False

async def after_model_request(
self,
ctx: RunContext[Any],
*,
request_context: ModelRequestContext,
response: ModelResponse,
) -> ModelResponse:
if not self.retried:
self.retried = True
raise ModelRetry('Response was bad, please try again')
return response

cap = RetryCap()
agent = Agent(
FunctionModel(simple_model_function, stream_function=stream_fn),
capabilities=[cap],
)

@agent.tool_plain
def my_tool() -> str:
return 'tool result' # pragma: no cover

async with agent.run_stream('hello') as streamed:
result = await streamed.get_output()
assert result == 'good response'
assert call_count == 2

🚩 Test coverage focuses on tool-call scenarios for streaming ModelRetry; text-response streaming path untested

The streaming ModelRetry tests (test_after_model_request_model_retry_streaming at line 6292, test_wrap_model_request_model_retry_streaming_after_handler at line 6430) use tool calls in their first model response, not text. When the model returns text in streaming, a FinalResultEvent is emitted and the agent loop yields StreamedRunResult before after_model_request runs in the finally block. A test with text as the first streaming response (triggering FinalResultEvent) followed by after_model_request raising ModelRetry would reveal that the retry is silently ignored.


DouweM added a commit that referenced this pull request Mar 27, 2026
- Merge main with PR #4858 (ModelRetry from hooks)
- Re-apply ModelRetry outer try/except for tool execute hooks (lost in merge)
- Add same ModelRetry pattern to output validate and execute hooks:
  - before/after/wrap can raise ModelRetry to trigger a retry
  - ModelRetry from wrap_output_execute bypasses on_output_execute_error
  - Converted to ToolRetryError for retry handling
- Fix streaming output tool path: use handle_output_tool_call instead
  of handle_call so output hooks fire (not tool hooks) during streaming
- Fix union + error recovery crash: do_execute returns recovered data
  as-is when _union_kind is empty
- Add allow_partial/wrap_validation_errors params to validate_output_tool_call
- Update docstrings and docs/hooks.md with output hook ModelRetry info
- Add 5 tests: before_validate, after_validate, after_execute,
  wrap_execute (skips error hook), before_execute all raising ModelRetry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>