feat: separate cache breakpoints for static vs dynamic instructions (Anthropic)#4865
Alex-Resch wants to merge 1 commit into `pydantic:main`.
Conversation
```diff
             self._instructions.append(instruction_runner)  # pyright: ignore[reportArgumentType]
             return func_

         return decorator
     else:
-        self._instructions.append(func)
+        runner = _system_prompt.SystemPromptRunner[AgentDepsT](func, add_cache_breakpoint=add_cache_breakpoint)
+        self._instructions.append(runner)  # pyright: ignore[reportArgumentType]
```
🚩 Type mismatch: SystemPromptRunner appended to list[str | SystemPromptFunc]
The _instructions field is typed as list[str | SystemPromptFunc[AgentDepsT]] at pydantic_ai_slim/pydantic_ai/agent/__init__.py:192, but the new code at lines 1816 and 1822 appends SystemPromptRunner instances to it (suppressed with pyright: ignore[reportArgumentType]). This works at runtime because _get_instructions() at line 2326 now has an explicit isinstance(instruction, SystemPromptRunner) check. However, it means the type annotation no longer reflects what the list actually contains. Per the AGENTS.md rule about fixing type errors properly instead of using pyright: ignore, this should be addressed — ideally by updating the type annotation of _instructions to list[str | SystemPromptFunc[AgentDepsT] | SystemPromptRunner[AgentDepsT]] rather than suppressing the type checker.
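A minimal sketch of the suggested annotation change, using simplified stand-ins for the real `SystemPromptFunc` and `SystemPromptRunner` types (this is not the actual pydantic-ai code, just an illustration of widening the type instead of suppressing the checker):

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, TypeVar, Union

AgentDepsT = TypeVar('AgentDepsT')

# Simplified stand-ins for pydantic-ai's SystemPromptFunc / SystemPromptRunner.
SystemPromptFunc = Callable[[], str]


@dataclass
class SystemPromptRunner(Generic[AgentDepsT]):
    function: Callable[[], str]
    add_cache_breakpoint: bool = False


@dataclass
class Agent(Generic[AgentDepsT]):
    # Widened to include SystemPromptRunner, so appends need no pyright suppression.
    _instructions: list[Union[str, SystemPromptFunc, SystemPromptRunner[AgentDepsT]]] = field(
        default_factory=list
    )


agent = Agent[None]()
agent._instructions.append(SystemPromptRunner(lambda: 'be concise', add_cache_breakpoint=True))
```

With the widened annotation, `_get_instructions()` can still dispatch on `isinstance(instruction, SystemPromptRunner)` while the type checker passes cleanly.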
```python
        ]
        return system_prompt_blocks, anthropic_messages
```
🚩 Empty system prompt block possible when cache_instructions is set with no content
The old code guarded the cache_instructions branch with if system_prompt and (cache_instructions := ...), ensuring the branch only executed when there was actual system prompt content. The new code at line 1042 only checks if cache_instructions: without verifying system_prompt is non-empty. If a user sets anthropic_cache_instructions=True but has no system prompts and no instructions, the code would create a BetaTextBlockParam with text='', which might cause an Anthropic API error. In practice this is unlikely because users rarely enable cache_instructions without any instructions, but it's a subtle behavioral regression from the old code.
(Refers to lines 1042-1053)
Force-pushed abace67 to 9d4b089.
```python
        ]
        return system_prompt_blocks, anthropic_messages
```
🟡 Removed if system_prompt guard allows empty text block to be sent to Anthropic API
The old code guarded the cache_instructions path with if system_prompt and (cache_instructions := ...), ensuring no cached block was created when the system prompt was empty. The new code at line 1042 only checks if cache_instructions:, so when cache_instructions is set but both system_prompt is '' and instructions_str is None, a BetaTextBlockParam(type='text', text='', cache_control=...) is returned. Since this is a non-empty list, system=system_prompt or OMIT at pydantic_ai_slim/pydantic_ai/models/anthropic.py:493 sends it to the Anthropic API, which may reject an empty text block.
(Refers to lines 1042-1053)
Force-pushed 9d4b089 to a23175f.
Force-pushed a23175f to 09d8e48.
Force-pushed 09d8e48 to 51c23e2.
@Alex-Resch Thanks Alex! I've asked our review bot to have a look and I'll check it out myself tomorrow; I agree we should support something like this.
Thanks for working on this, @Alex-Resch! The underlying use case (separate caching for static vs dynamic instructions) is a real one, and the issue at #4543 articulates it well. However, I have significant concerns about the approach taken here that I think need to be discussed with a maintainer before this can move forward. The core issue is that this PR introduces an Anthropic-specific caching concept into provider-agnostic core types and APIs.

The project guidelines are quite clear that provider-specific code should not live in shared modules. An alternative approach that's more in line with the project's architecture would be to keep the instruction splitting logic entirely within the Anthropic model adapter, without modifying core types. For example, the Anthropic model could inspect the individual instruction parts itself.

I'd recommend discussing the design approach with the maintainers before investing more in the implementation. The issue itself also has no maintainer comments yet, so there's no alignment on what shape the solution should take. I've left specific inline comments below on the implementation issues I noticed, but the architectural question above is the most important thing to resolve first.
```python
@dataclass(repr=False)
class InstructionPart:
    """A single instruction block with optional cache control metadata."""

    content: str
    add_cache_breakpoint: bool | Literal['5m', '1h'] = False
```
InstructionPart with its add_cache_breakpoint: bool | Literal['5m', '1h'] field is fundamentally an Anthropic prompt caching concept. Placing it in the shared messages.py module (which defines the provider-agnostic message protocol) means every provider and every consumer of messages now has visibility into a concept that only Anthropic supports.
The project guidelines explicitly say to "place provider-specific code in models/{provider}.py, not shared modules" and to "store provider-specific metadata in structured provider_details or provider_metadata fields."
If this feature moves forward, the Anthropic-specific cache metadata should live in the Anthropic model adapter, not in the core message types. @DouweM — would appreciate your input on the right abstraction here.
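One possible shape for keeping this Anthropic-only, sketched with a hypothetical settings key and plain dicts in place of the real Anthropic block types (everything here is illustrative, not the PR's code or the actual pydantic-ai API):

```python
# Hypothetical Anthropic-adapter-local helper: cache breakpoints are
# configured per instruction index in model settings, so the core message
# types stay untouched.
def apply_cache_breakpoints(instructions: list[str], breakpoints: dict[int, str]) -> list[dict]:
    blocks: list[dict] = []
    for i, text in enumerate(instructions):
        block: dict = {'type': 'text', 'text': text}
        if i in breakpoints:
            # Mark this instruction as a cache breakpoint with the given TTL.
            block['cache_control'] = {'type': 'ephemeral', 'ttl': breakpoints[i]}
        blocks.append(block)
    return blocks


blocks = apply_cache_breakpoints(['static policy text', 'dynamic context'], {0: '1h'})
```

The design point is that only the adapter needs to know about `cache_control`; other providers never see the metadata.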
```python
instruction_parts: Annotated[list[InstructionPart] | None, pydantic.Field(exclude=True)] = None
"""Structured instruction parts for models that support per-part cache control (e.g. Anthropic)."""
```
Same concern as above: adding instruction_parts to ModelRequest threads Anthropic-specific cache metadata through the core message type. The exclude=True prevents serialization issues but doesn't address the fundamental coupling — every part of the system that creates or processes ModelRequest objects now needs to be aware of this field.
The 6+ callsites in _agent_graph.py that had to be updated to thread instruction_parts through are evidence of this coupling.
```python
instruction_runner = _system_prompt.SystemPromptRunner[AgentDepsT](
    func_, add_cache_breakpoint=add_cache_breakpoint
)
self._instructions.append(instruction_runner)  # pyright: ignore[reportArgumentType]
```
Suppressing reportArgumentType with pyright: ignore to append a SystemPromptRunner to a list[str | SystemPromptFunc[AgentDepsT]] is a red flag — it means the type annotation doesn't match what the list actually contains. The project guidelines are clear: fix type errors properly instead of using pyright: ignore suppressions.
If _instructions can now contain SystemPromptRunner instances, its type annotation should be updated to reflect that.
```python
    *,
    add_cache_breakpoint: bool | Literal['5m', '1h'] = False,
) -> Callable[[SystemPromptFunc[AgentDepsT]], SystemPromptFunc[AgentDepsT]]: ...

def instructions(
    self,
    func: _system_prompt.SystemPromptFunc[AgentDepsT] | None = None,
    /,
    add_cache_breakpoint: bool | Literal['5m', '1h'] = False,
```
add_cache_breakpoint is an Anthropic-specific parameter on the provider-agnostic @agent.instructions() decorator. Users of OpenAI, Google, Groq, etc. will see this parameter in their IDE and might wonder what it does — the answer is "nothing, it only affects Anthropic."
This is the kind of API surface area expansion that needs careful consideration. The project philosophy prefers "strong primitives, powerful abstractions, and general solutions" over narrow provider-specific features. A more general abstraction (if one exists) or keeping this entirely in AnthropicModelSettings configuration would be more appropriate.
@DouweM — this is the public API change that most warrants your input. Is there a provider-agnostic concept here (like "instruction segmentation" or "instruction metadata") that would make sense, or should this stay Anthropic-only?
```diff
@@ -1033,6 +1052,35 @@ async def _map_message(  # noqa: C901
         ]
         return system_prompt_blocks, anthropic_messages
```
The old code had a guard if system_prompt and (cache_instructions := ...), but the new code only checks if cache_instructions:. If cache_instructions is set but system_prompt is empty and instructions_str is None, this creates a BetaTextBlockParam(type='text', text='') with cache control, which Anthropic will reject.
Additionally, there's no validation or error when both anthropic_cache_instructions and anthropic_static_cache_instructions are set simultaneously. These settings are conceptually conflicting (one caches everything as a single block, the other splits static from dynamic), and using both could waste cache points or cause confusing behavior. Consider raising a UserError when both are set.
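A sketch of the suggested validation, with a local stand-in for `pydantic_ai.exceptions.UserError` and a plain dict in place of the real `AnthropicModelSettings` (names simplified for illustration):

```python
class UserError(Exception):
    """Stand-in for pydantic_ai.exceptions.UserError."""


def validate_cache_settings(settings: dict) -> None:
    # The two settings conflict: one caches the whole system prompt as a
    # single block, the other splits static from dynamic instructions.
    if settings.get('anthropic_cache_instructions') and settings.get('anthropic_static_cache_instructions'):
        raise UserError(
            '`anthropic_cache_instructions` and `anthropic_static_cache_instructions` '
            'are mutually exclusive; pick one caching strategy.'
        )
```

Failing loudly here is cheaper than letting users silently burn extra cache breakpoints.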
This way, the expensive static instructions stay cached even when dynamic context changes between requests.

The docs example is fenced as `` ```python {test="skip"} ``.
The docs guidelines say to avoid test="skip" in code examples unless unavoidable — prefer mocks or fixtures instead. This example could use a test model (like the FunctionModel or VCR cassettes) to make it testable, ensuring it stays in sync with the actual API as it evolves.
```diff
 max_result_retries: int
 end_strategy: EndStrategy
-get_instructions: Callable[[RunContext[DepsT]], Awaitable[str | None]]
+get_instructions: Callable[[RunContext[DepsT]], Awaitable[tuple[str | None, list[InstructionPart] | None]]]
```
Changing the return type of get_instructions from str | None to tuple[str | None, list[InstructionPart] | None] is a significant change to a core interface that ripples through the entire agent graph (every callsite that calls get_instructions and every callsite that constructs ModelRequest had to be updated). This is a lot of framework-level plumbing for a single provider's caching feature.
If this approach is accepted by the maintainers, consider whether get_instructions should instead return a structured object (e.g., a small dataclass with text and parts fields) rather than a bare tuple, for better readability at the callsites.
```python
def _get_instruction_parts(
    messages: Sequence[ModelMessage],
    model_request_parameters: ModelRequestParameters | None = None,
) -> list[InstructionPart] | None:
    last_two_requests: list[ModelRequest] = []
    for message in reversed(messages):
        if isinstance(message, ModelRequest):
            last_two_requests.append(message)
            if len(last_two_requests) == 2:
                break

    parts: list[InstructionPart] | None = None

    if last_two_requests:
        most_recent = last_two_requests[0]
        if most_recent.instruction_parts is not None:
            parts = list(most_recent.instruction_parts)
        elif len(last_two_requests) == 2 and all(
            p.part_kind == 'tool-return' or p.part_kind == 'retry-prompt' for p in most_recent.parts
        ):
            second = last_two_requests[1]
            if second.instruction_parts is not None:
                parts = list(second.instruction_parts)

    if (
        parts is not None
        and model_request_parameters
        and (output_instr := model_request_parameters.prompted_output_instructions)
    ):
        parts.append(InstructionPart(content=output_instr))

    return parts or None
```
This method largely duplicates the logic in the existing _get_instructions (from the Model base class) — both iterate messages in reverse, find the last two requests, handle the "tool-return-only" fallback case, and append prompted_output_instructions. The duplication is a maintenance hazard since changes to one will need to be mirrored in the other.
Per the guidelines, duplicated logic should be consolidated into shared helpers. If this approach is accepted, consider refactoring _get_instructions to return both the string and the parts, or having _get_instruction_parts delegate to _get_instructions for the common traversal logic.
Closes #4543
Summary
Add support for caching static and dynamic instruction blocks separately when using Anthropic prompt caching.
New features:
- `anthropic_static_cache_instructions` model setting to cache the static system prompt independently
- `add_cache_breakpoint` parameter on `@agent.instructions()` to add per-instruction cache breakpoints
- `InstructionPart` dataclass and `instruction_parts` field on `ModelRequest` for structured cache control

This allows users with large, stable system prompts combined with frequently changing dynamic context to avoid cache invalidation of the entire system prompt when only the dynamic part changes.
Pre-Review Checklist
- `make format` and `make typecheck`.

Pre-Merge Checklist