
feat: separate cache breakpoints for static vs dynamic instructions (Anthropic)#4865

Open
Alex-Resch wants to merge 1 commit into pydantic:main from Alex-Resch:feat/separate-cache-breakpoints

Conversation

@Alex-Resch

Closes #4543

Summary

Add support for caching static and dynamic instruction blocks separately when using Anthropic prompt caching.

New features:

  • anthropic_static_cache_instructions model setting to cache the static system prompt independently
  • add_cache_breakpoint parameter on @agent.instructions() to add per-instruction cache breakpoints
  • InstructionPart dataclass and instruction_parts field on ModelRequest for structured cache control

This allows users with large, stable system prompts combined with frequently changing dynamic context to avoid cache invalidation of the entire system prompt when only the dynamic part changes.

Pre-Review Checklist

  • Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it.
  • No breaking changes in accordance with the version policy.
  • Linting and type checking pass per make format and make typecheck.
  • PR title is fit for the release changelog.

Pre-Merge Checklist

  • New tests for any fix or new behavior, maintaining 100% coverage.
  • Updated documentation for new features and behaviors, including docstrings for API docs.

@github-actions github-actions bot added the "size: M" (Medium PR, 101-500 weighted lines) and "feature" (new feature request, or PR implementing a feature) labels on Mar 26, 2026
Contributor

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 4 potential issues.

View 2 additional findings in Devin Review.


Comment on lines +1816 to +1822

```python
        self._instructions.append(instruction_runner)  # pyright: ignore[reportArgumentType]
        return func_

    return decorator
else:
    self._instructions.append(func)
    runner = _system_prompt.SystemPromptRunner[AgentDepsT](func, add_cache_breakpoint=add_cache_breakpoint)
    self._instructions.append(runner)  # pyright: ignore[reportArgumentType]
```
Contributor


🚩 Type mismatch: SystemPromptRunner appended to list[str | SystemPromptFunc]

The _instructions field is typed as list[str | SystemPromptFunc[AgentDepsT]] at pydantic_ai_slim/pydantic_ai/agent/__init__.py:192, but the new code at lines 1816 and 1822 appends SystemPromptRunner instances to it (suppressed with pyright: ignore[reportArgumentType]). This works at runtime because _get_instructions() at line 2326 now has an explicit isinstance(instruction, SystemPromptRunner) check. However, it means the type annotation no longer reflects what the list actually contains. Per the AGENTS.md rule about fixing type errors properly instead of using pyright: ignore, this should be addressed — ideally by updating the type annotation of _instructions to list[str | SystemPromptFunc[AgentDepsT] | SystemPromptRunner[AgentDepsT]] rather than suppressing the type checker.


Comment on lines 1052 to 1053

```python
        ]
        return system_prompt_blocks, anthropic_messages
```
Contributor


🚩 Empty system prompt block possible when cache_instructions is set with no content

The old code guarded the cache_instructions branch with if system_prompt and (cache_instructions := ...), ensuring the branch only executed when there was actual system prompt content. The new code at line 1042 only checks if cache_instructions: without verifying system_prompt is non-empty. If a user sets anthropic_cache_instructions=True but has no system prompts and no instructions, the code would create a BetaTextBlockParam with text='', which might cause an Anthropic API error. In practice this is unlikely because users rarely enable cache_instructions without any instructions, but it's a subtle behavioral regression from the old code.

(Refers to lines 1042-1053)


@Alex-Resch Alex-Resch force-pushed the feat/separate-cache-breakpoints branch 2 times, most recently from abace67 to 9d4b089 Compare March 26, 2026 15:09
Contributor

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 1 new potential issue.

View 4 additional findings in Devin Review.


Comment on lines 1052 to 1053

```python
        ]
        return system_prompt_blocks, anthropic_messages
```
Contributor


🟡 Removed if system_prompt guard allows empty text block to be sent to Anthropic API

The old code guarded the cache_instructions path with if system_prompt and (cache_instructions := ...), ensuring no cached block was created when the system prompt was empty. The new code at line 1042 only checks if cache_instructions:, so when cache_instructions is set but both system_prompt is '' and instructions_str is None, a BetaTextBlockParam(type='text', text='', cache_control=...) is returned. Since this is a non-empty list, system=system_prompt or OMIT at pydantic_ai_slim/pydantic_ai/models/anthropic.py:493 sends it to the Anthropic API, which may reject an empty text block.

(Refers to lines 1042-1053)


@Alex-Resch Alex-Resch force-pushed the feat/separate-cache-breakpoints branch from 9d4b089 to a23175f Compare March 26, 2026 15:52
devin-ai-integration[bot]

This comment was marked as resolved.

@Alex-Resch Alex-Resch force-pushed the feat/separate-cache-breakpoints branch from a23175f to 09d8e48 Compare March 26, 2026 17:24
@github-actions github-actions bot added the "size: L" (Large PR, 501-1500 weighted lines) label and removed the "size: M" (Medium PR, 101-500 weighted lines) label on Mar 26, 2026
devin-ai-integration[bot]

This comment was marked as resolved.

@Alex-Resch Alex-Resch force-pushed the feat/separate-cache-breakpoints branch from 09d8e48 to 51c23e2 Compare March 26, 2026 18:19
@DouweM
Collaborator

DouweM commented Mar 27, 2026

@Alex-Resch Thanks Alex! I've asked our review bot to have a look and I'll check it out myself tomorrow; I agree we should support something like this.

@github-actions
Contributor

Thanks for working on this, @Alex-Resch! The underlying use case (separate caching for static vs dynamic instructions) is a real one, and the issue at #4543 articulates it well.

However, I have significant concerns about the approach taken here that I think need to be discussed with a maintainer before this can move forward. The core issue is that this PR introduces an Anthropic-specific caching concept into provider-agnostic core types and APIs:

  1. InstructionPart in messages.py — a new public type in the shared messages module whose add_cache_breakpoint field is an Anthropic-specific concept. Other providers don't have this notion of cache breakpoints with TTLs.

  2. instruction_parts on ModelRequest — adds a provider-specific field to a core message type that all providers use. Even though it's excluded from serialization, it's still a public field on a shared abstraction.

  3. add_cache_breakpoint on @agent.instructions() — adds an Anthropic-specific parameter to the provider-agnostic Agent decorator API. Users of other providers will see this parameter in their IDE autocomplete but it will do nothing.

  4. get_instructions return type change — changing from str | None to tuple[str | None, list[InstructionPart] | None] touches 6+ callsites in the agent graph, all to thread Anthropic-specific metadata through the framework.

The project guidelines are quite clear that provider-specific code should live in models/{provider}.py and that core types should remain provider-agnostic. The existing anthropic_cache_instructions setting achieves this well — it's a setting in AnthropicModelSettings that only affects the Anthropic model adapter.

An alternative approach that's more in line with the project's architecture would be to keep the instruction splitting logic entirely within the Anthropic model adapter, without modifying core types. For example, the Anthropic model could inspect the individual instruction parts itself (perhaps via a new field on ModelRequest that all models could potentially use, or via the existing instructions string with some structured metadata). But the right design here really needs @DouweM's input.

I'd recommend discussing the design approach with the maintainers before investing more in the implementation. The issue itself also has no maintainer comments yet, so there's no alignment on what shape the solution should take.

I've left specific inline comments below on the implementation issues I noticed, but the architectural question above is the most important thing to resolve first.

Comment on lines +1362 to +1368

```python
@dataclass(repr=False)
class InstructionPart:
    """A single instruction block with optional cache control metadata."""

    content: str
    add_cache_breakpoint: bool | Literal['5m', '1h'] = False
```
Contributor


InstructionPart with its add_cache_breakpoint: bool | Literal['5m', '1h'] field is fundamentally an Anthropic prompt caching concept. Placing it in the shared messages.py module (which defines the provider-agnostic message protocol) means every provider and every consumer of messages now has visibility into a concept that only Anthropic supports.

The project guidelines explicitly say to "place provider-specific code in models/{provider}.py, not shared modules" and to "store provider-specific metadata in structured provider_details or provider_metadata fields."

If this feature moves forward, the Anthropic-specific cache metadata should live in the Anthropic model adapter, not in the core message types. @DouweM — would appreciate your input on the right abstraction here.

Comment on lines +1387 to +1388

```python
    instruction_parts: Annotated[list[InstructionPart] | None, pydantic.Field(exclude=True)] = None
    """Structured instruction parts for models that support per-part cache control (e.g. Anthropic)."""
```
Contributor


Same concern as above: adding instruction_parts to ModelRequest threads Anthropic-specific cache metadata through the core message type. The exclude=True prevents serialization issues but doesn't address the fundamental coupling — every part of the system that creates or processes ModelRequest objects now needs to be aware of this field.

The 6+ callsites in _agent_graph.py that had to be updated to thread instruction_parts through are evidence of this coupling.

```python
instruction_runner = _system_prompt.SystemPromptRunner[AgentDepsT](
    func_, add_cache_breakpoint=add_cache_breakpoint
)
self._instructions.append(instruction_runner)  # pyright: ignore[reportArgumentType]
```
Contributor


Suppressing reportArgumentType with pyright: ignore to append a SystemPromptRunner to a list[str | SystemPromptFunc[AgentDepsT]] is a red flag — it means the type annotation doesn't match what the list actually contains. The project guidelines are clear: fix type errors properly instead of using pyright: ignore suppressions.

If _instructions can now contain SystemPromptRunner instances, its type annotation should be updated to reflect that.

Comment on lines +1770 to +1778

```python
    *,
    add_cache_breakpoint: bool | Literal['5m', '1h'] = False,
) -> Callable[[SystemPromptFunc[AgentDepsT]], SystemPromptFunc[AgentDepsT]]: ...

def instructions(
    self,
    func: _system_prompt.SystemPromptFunc[AgentDepsT] | None = None,
    /,
    add_cache_breakpoint: bool | Literal['5m', '1h'] = False,
```
Contributor


add_cache_breakpoint is an Anthropic-specific parameter on the provider-agnostic @agent.instructions() decorator. Users of OpenAI, Google, Groq, etc. will see this parameter in their IDE and might wonder what it does — the answer is "nothing, it only affects Anthropic."

This is the kind of API surface area expansion that needs careful consideration. The project philosophy prefers "strong primitives, powerful abstractions, and general solutions" over narrow provider-specific features. A more general abstraction (if one exists) or keeping this entirely in AnthropicModelSettings configuration would be more appropriate.

@DouweM — this is the public API change that most warrants your input. Is there a provider-agnostic concept here (like "instruction segmentation" or "instruction metadata") that would make sense, or should this stay Anthropic-only?

Comment on lines 1042 to 1053

```diff
@@ -1033,6 +1052,35 @@ async def _map_message(  # noqa: C901
         ]
         return system_prompt_blocks, anthropic_messages
```
Contributor


The old code had a guard if system_prompt and (cache_instructions := ...), but the new code only checks if cache_instructions:. If cache_instructions is set but system_prompt is empty and instructions_str is None, this creates a BetaTextBlockParam(type='text', text='') with cache control, which Anthropic will reject.

Additionally, there's no validation or error when both anthropic_cache_instructions and anthropic_static_cache_instructions are set simultaneously. These settings are conceptually conflicting (one caches everything as a single block, the other splits static from dynamic), and using both could waste cache points or cause confusing behavior. Consider raising a UserError when both are set.


This way, the expensive static instructions stay cached even when dynamic context changes between requests.

(Quoted docs example, fenced with `python {test="skip"}`.)
Contributor


The docs guidelines say to avoid test="skip" in code examples unless unavoidable — prefer mocks or fixtures instead. This example could use a test model (like the FunctionModel or VCR cassettes) to make it testable, ensuring it stays in sync with the actual API as it evolves.

```diff
 max_result_retries: int
 end_strategy: EndStrategy
-get_instructions: Callable[[RunContext[DepsT]], Awaitable[str | None]]
+get_instructions: Callable[[RunContext[DepsT]], Awaitable[tuple[str | None, list[InstructionPart] | None]]]
```
Contributor


Changing the return type of get_instructions from str | None to tuple[str | None, list[InstructionPart] | None] is a significant change to a core interface that ripples through the entire agent graph (every callsite that calls get_instructions and every callsite that constructs ModelRequest had to be updated). This is a lot of framework-level plumbing for a single provider's caching feature.

If this approach is accepted by the maintainers, consider whether get_instructions should instead return a structured object (e.g., a small dataclass with text and parts fields) rather than a bare tuple, for better readability at the callsites.

Comment on lines +1352 to +1383

```python
def _get_instruction_parts(
    messages: Sequence[ModelMessage],
    model_request_parameters: ModelRequestParameters | None = None,
) -> list[InstructionPart] | None:
    last_two_requests: list[ModelRequest] = []
    for message in reversed(messages):
        if isinstance(message, ModelRequest):
            last_two_requests.append(message)
            if len(last_two_requests) == 2:
                break

    parts: list[InstructionPart] | None = None

    if last_two_requests:
        most_recent = last_two_requests[0]
        if most_recent.instruction_parts is not None:
            parts = list(most_recent.instruction_parts)
        elif len(last_two_requests) == 2 and all(
            p.part_kind == 'tool-return' or p.part_kind == 'retry-prompt' for p in most_recent.parts
        ):
            second = last_two_requests[1]
            if second.instruction_parts is not None:
                parts = list(second.instruction_parts)

    if (
        parts is not None
        and model_request_parameters
        and (output_instr := model_request_parameters.prompted_output_instructions)
    ):
        parts.append(InstructionPart(content=output_instr))

    return parts or None
```
Contributor


This method largely duplicates the logic in the existing _get_instructions (from the Model base class) — both iterate messages in reverse, find the last two requests, handle the "tool-return-only" fallback case, and append prompted_output_instructions. The duplication is a maintenance hazard since changes to one will need to be mirrored in the other.

Per the guidelines, duplicated logic should be consolidated into shared helpers. If this approach is accepted, consider refactoring _get_instructions to return both the string and the parts, or having _get_instruction_parts delegate to _get_instructions for the common traversal logic.


Labels

feature (New feature request, or PR implementing a feature / enhancement) · size: L (Large PR, 501-1500 weighted lines)


Development

Successfully merging this pull request may close these issues.

Separate cache breakpoints for static vs dynamic instructions (Anthropic prompt caching)

2 participants