
feat: separate cache breakpoints for static vs dynamic instructions (Anthropic)#4865

Open
Alex-Resch wants to merge 1 commit into pydantic:main from Alex-Resch:feat/separate-cache-breakpoints

Conversation

@Alex-Resch

Closes #4543

Summary

Add support for caching static and dynamic instruction blocks separately when using Anthropic prompt caching.

New features:

  • anthropic_static_cache_instructions model setting to cache the static system prompt independently
  • add_cache_breakpoint parameter on @agent.instructions() to add per-instruction cache breakpoints
  • InstructionPart dataclass and instruction_parts field on ModelRequest for structured cache control

This allows users with large, stable system prompts combined with frequently changing dynamic context to avoid cache invalidation of the entire system prompt when only the dynamic part changes.

Pre-Review Checklist

  • Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it.
  • No breaking changes in accordance with the version policy.
  • Linting and type checking pass per make format and make typecheck.
  • PR title is fit for the release changelog.

Pre-Merge Checklist

  • New tests for any fix or new behavior, maintaining 100% coverage.
  • Updated documentation for new features and behaviors, including docstrings for API docs.

@github-actions github-actions bot added the "size: M" (Medium PR, 101-500 weighted lines) and "feature" (new feature request, or PR implementing a feature) labels on Mar 26, 2026
Contributor

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 4 potential issues.

View 2 additional findings in Devin Review.


Comment on lines +1816 to +1822

```python
        self._instructions.append(instruction_runner)  # pyright: ignore[reportArgumentType]
        return func_

    return decorator
else:
    self._instructions.append(func)
    runner = _system_prompt.SystemPromptRunner[AgentDepsT](func, add_cache_breakpoint=add_cache_breakpoint)
    self._instructions.append(runner)  # pyright: ignore[reportArgumentType]
```
Contributor


🚩 Type mismatch: SystemPromptRunner appended to list[str | SystemPromptFunc]

The _instructions field is typed as list[str | SystemPromptFunc[AgentDepsT]] at pydantic_ai_slim/pydantic_ai/agent/__init__.py:192, but the new code at lines 1816 and 1822 appends SystemPromptRunner instances to it (suppressed with pyright: ignore[reportArgumentType]). This works at runtime because _get_instructions() at line 2326 now has an explicit isinstance(instruction, SystemPromptRunner) check. However, it means the type annotation no longer reflects what the list actually contains. Per the AGENTS.md rule about fixing type errors properly instead of using pyright: ignore, this should be addressed — ideally by updating the type annotation of _instructions to list[str | SystemPromptFunc[AgentDepsT] | SystemPromptRunner[AgentDepsT]] rather than suppressing the type checker.


Comment on lines 1052 to 1053

```python
        ]
        return system_prompt_blocks, anthropic_messages
```
Contributor


🚩 Empty system prompt block possible when cache_instructions is set with no content

The old code guarded the cache_instructions branch with if system_prompt and (cache_instructions := ...), ensuring the branch only executed when there was actual system prompt content. The new code at line 1042 only checks if cache_instructions: without verifying system_prompt is non-empty. If a user sets anthropic_cache_instructions=True but has no system prompts and no instructions, the code would create a BetaTextBlockParam with text='', which might cause an Anthropic API error. In practice this is unlikely because users rarely enable cache_instructions without any instructions, but it's a subtle behavioral regression from the old code.

(Refers to lines 1042-1053)


@Alex-Resch Alex-Resch force-pushed the feat/separate-cache-breakpoints branch 2 times, most recently from abace67 to 9d4b089 Compare March 26, 2026 15:09
Contributor

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 1 new potential issue.

View 4 additional findings in Devin Review.


Comment on lines 1052 to 1053

```python
        ]
        return system_prompt_blocks, anthropic_messages
```
Contributor


🟡 Removed if system_prompt guard allows empty text block to be sent to Anthropic API

The old code guarded the cache_instructions path with if system_prompt and (cache_instructions := ...), ensuring no cached block was created when the system prompt was empty. The new code at line 1042 only checks if cache_instructions:, so when cache_instructions is set but both system_prompt is '' and instructions_str is None, a BetaTextBlockParam(type='text', text='', cache_control=...) is returned. Since this is a non-empty list, system=system_prompt or OMIT at pydantic_ai_slim/pydantic_ai/models/anthropic.py:493 sends it to the Anthropic API, which may reject an empty text block.

(Refers to lines 1042-1053)


@Alex-Resch Alex-Resch force-pushed the feat/separate-cache-breakpoints branch from 9d4b089 to a23175f Compare March 26, 2026 15:52
devin-ai-integration[bot]

This comment was marked as resolved.

@Alex-Resch Alex-Resch force-pushed the feat/separate-cache-breakpoints branch from a23175f to 09d8e48 Compare March 26, 2026 17:24
@github-actions github-actions bot added the "size: L" (Large PR, 501-1500 weighted lines) label and removed the "size: M" (Medium PR, 101-500 weighted lines) label on Mar 26, 2026
devin-ai-integration[bot]

This comment was marked as resolved.

@Alex-Resch Alex-Resch force-pushed the feat/separate-cache-breakpoints branch from 09d8e48 to 51c23e2 Compare March 26, 2026 18:19
@DouweM
Collaborator

DouweM commented Mar 27, 2026

@Alex-Resch Thanks Alex! I've asked our review bot to have a look and I'll check it out myself tomorrow; I agree we should support something like this.

@github-actions
Contributor

Thanks for working on this, @Alex-Resch! The underlying use case (separate caching for static vs dynamic instructions) is a real one, and the issue at #4543 articulates it well.

However, I have significant concerns about the approach taken here that I think need to be discussed with a maintainer before this can move forward. The core issue is that this PR introduces an Anthropic-specific caching concept into provider-agnostic core types and APIs:

  1. InstructionPart in messages.py — a new public type in the shared messages module whose add_cache_breakpoint field is an Anthropic-specific concept. Other providers don't have this notion of cache breakpoints with TTLs.

  2. instruction_parts on ModelRequest — adds a provider-specific field to a core message type that all providers use. Even though it's excluded from serialization, it's still a public field on a shared abstraction.

  3. add_cache_breakpoint on @agent.instructions() — adds an Anthropic-specific parameter to the provider-agnostic Agent decorator API. Users of other providers will see this parameter in their IDE autocomplete but it will do nothing.

  4. get_instructions return type change — changing from str | None to tuple[str | None, list[InstructionPart] | None] touches 6+ callsites in the agent graph, all to thread Anthropic-specific metadata through the framework.

The project guidelines are quite clear that provider-specific code should live in models/{provider}.py and that core types should remain provider-agnostic. The existing anthropic_cache_instructions setting achieves this well — it's a setting in AnthropicModelSettings that only affects the Anthropic model adapter.

An alternative approach that's more in line with the project's architecture would be to keep the instruction splitting logic entirely within the Anthropic model adapter, without modifying core types. For example, the Anthropic model could inspect the individual instruction parts itself (perhaps via a new field on ModelRequest that all models could potentially use, or via the existing instructions string with some structured metadata). But the right design here really needs @DouweM's input.

I'd recommend discussing the design approach with the maintainers before investing more in the implementation. The issue itself also has no maintainer comments yet, so there's no alignment on what shape the solution should take.

I've left specific inline comments below on the implementation issues I noticed, but the architectural question above is the most important thing to resolve first.

Comment on lines +1362 to +1368

```python
@dataclass(repr=False)
class InstructionPart:
    """A single instruction block with optional cache control metadata."""

    content: str
    add_cache_breakpoint: bool | Literal['5m', '1h'] = False
```
Contributor


InstructionPart with its add_cache_breakpoint: bool | Literal['5m', '1h'] field is fundamentally an Anthropic prompt caching concept. Placing it in the shared messages.py module (which defines the provider-agnostic message protocol) means every provider and every consumer of messages now has visibility into a concept that only Anthropic supports.

The project guidelines explicitly say to "place provider-specific code in models/{provider}.py, not shared modules" and to "store provider-specific metadata in structured provider_details or provider_metadata fields."

If this feature moves forward, the Anthropic-specific cache metadata should live in the Anthropic model adapter, not in the core message types. @DouweM — would appreciate your input on the right abstraction here.

Comment on lines +1387 to +1388

```python
    instruction_parts: Annotated[list[InstructionPart] | None, pydantic.Field(exclude=True)] = None
    """Structured instruction parts for models that support per-part cache control (e.g. Anthropic)."""
```
Contributor


Same concern as above: adding instruction_parts to ModelRequest threads Anthropic-specific cache metadata through the core message type. The exclude=True prevents serialization issues but doesn't address the fundamental coupling — every part of the system that creates or processes ModelRequest objects now needs to be aware of this field.

The 6+ callsites in _agent_graph.py that had to be updated to thread instruction_parts through are evidence of this coupling.

```python
instruction_runner = _system_prompt.SystemPromptRunner[AgentDepsT](
    func_, add_cache_breakpoint=add_cache_breakpoint
)
self._instructions.append(instruction_runner)  # pyright: ignore[reportArgumentType]
```
Contributor


Suppressing reportArgumentType with pyright: ignore to append a SystemPromptRunner to a list[str | SystemPromptFunc[AgentDepsT]] is a red flag — it means the type annotation doesn't match what the list actually contains. The project guidelines are clear: fix type errors properly instead of using pyright: ignore suppressions.

If _instructions can now contain SystemPromptRunner instances, its type annotation should be updated to reflect that.

Comment on lines +1770 to +1778

```python
    *,
    add_cache_breakpoint: bool | Literal['5m', '1h'] = False,
) -> Callable[[SystemPromptFunc[AgentDepsT]], SystemPromptFunc[AgentDepsT]]: ...

def instructions(
    self,
    func: _system_prompt.SystemPromptFunc[AgentDepsT] | None = None,
    /,
    add_cache_breakpoint: bool | Literal['5m', '1h'] = False,
```
Contributor


add_cache_breakpoint is an Anthropic-specific parameter on the provider-agnostic @agent.instructions() decorator. Users of OpenAI, Google, Groq, etc. will see this parameter in their IDE and might wonder what it does — the answer is "nothing, it only affects Anthropic."

This is the kind of API surface area expansion that needs careful consideration. The project philosophy prefers "strong primitives, powerful abstractions, and general solutions" over narrow provider-specific features. A more general abstraction (if one exists) or keeping this entirely in AnthropicModelSettings configuration would be more appropriate.

@DouweM — this is the public API change that most warrants your input. Is there a provider-agnostic concept here (like "instruction segmentation" or "instruction metadata") that would make sense, or should this stay Anthropic-only?

Comment on lines 1042 to 1053

```diff
@@ -1033,6 +1052,35 @@ async def _map_message(  # noqa: C901
         ]
         return system_prompt_blocks, anthropic_messages
```
Contributor


The old code had a guard if system_prompt and (cache_instructions := ...), but the new code only checks if cache_instructions:. If cache_instructions is set but system_prompt is empty and instructions_str is None, this creates a BetaTextBlockParam(type='text', text='') with cache control, which Anthropic will reject.

Additionally, there's no validation or error when both anthropic_cache_instructions and anthropic_static_cache_instructions are set simultaneously. These settings are conceptually conflicting (one caches everything as a single block, the other splits static from dynamic), and using both could waste cache points or cause confusing behavior. Consider raising a UserError when both are set.


This way, the expensive static instructions stay cached even when dynamic context changes between requests.

(Quoted docs example, fenced with `python {test="skip"}`.)
Contributor


The docs guidelines say to avoid test="skip" in code examples unless unavoidable — prefer mocks or fixtures instead. This example could use a test model (like the FunctionModel or VCR cassettes) to make it testable, ensuring it stays in sync with the actual API as it evolves.

```diff
 max_result_retries: int
 end_strategy: EndStrategy
-get_instructions: Callable[[RunContext[DepsT]], Awaitable[str | None]]
+get_instructions: Callable[[RunContext[DepsT]], Awaitable[tuple[str | None, list[InstructionPart] | None]]]
```
Contributor


Changing the return type of get_instructions from str | None to tuple[str | None, list[InstructionPart] | None] is a significant change to a core interface that ripples through the entire agent graph (every callsite that calls get_instructions and every callsite that constructs ModelRequest had to be updated). This is a lot of framework-level plumbing for a single provider's caching feature.

If this approach is accepted by the maintainers, consider whether get_instructions should instead return a structured object (e.g., a small dataclass with text and parts fields) rather than a bare tuple, for better readability at the callsites.

Comment on lines +1352 to +1383

```python
def _get_instruction_parts(
    messages: Sequence[ModelMessage],
    model_request_parameters: ModelRequestParameters | None = None,
) -> list[InstructionPart] | None:
    last_two_requests: list[ModelRequest] = []
    for message in reversed(messages):
        if isinstance(message, ModelRequest):
            last_two_requests.append(message)
            if len(last_two_requests) == 2:
                break

    parts: list[InstructionPart] | None = None

    if last_two_requests:
        most_recent = last_two_requests[0]
        if most_recent.instruction_parts is not None:
            parts = list(most_recent.instruction_parts)
        elif len(last_two_requests) == 2 and all(
            p.part_kind == 'tool-return' or p.part_kind == 'retry-prompt' for p in most_recent.parts
        ):
            second = last_two_requests[1]
            if second.instruction_parts is not None:
                parts = list(second.instruction_parts)

    if (
        parts is not None
        and model_request_parameters
        and (output_instr := model_request_parameters.prompted_output_instructions)
    ):
        parts.append(InstructionPart(content=output_instr))

    return parts or None
```
Contributor


This method largely duplicates the logic in the existing _get_instructions (from the Model base class) — both iterate messages in reverse, find the last two requests, handle the "tool-return-only" fallback case, and append prompted_output_instructions. The duplication is a maintenance hazard since changes to one will need to be mirrored in the other.

Per the guidelines, duplicated logic should be consolidated into shared helpers. If this approach is accepted, consider refactoring _get_instructions to return both the string and the parts, or having _get_instruction_parts delegate to _get_instructions for the common traversal logic.


Labels

feature (New feature request, or PR implementing a feature / enhancement) · size: L (Large PR, 501-1500 weighted lines)


Development

Successfully merging this pull request may close these issues.

Separate cache breakpoints for static vs dynamic instructions (Anthropic prompt caching)

2 participants