Commit 6f2aa11

dsfaccini, claude, and David SF authored
Add StepPersistence capability for step-event durability across delegates (#251)
* Add StepPersistence capability for step-event durability across delegates

Supersedes PR #176 (SessionPersistence): orchestrators like pydanty need visible event trails for delegate runs that may time out before a "save full session after the run" hook can fire, and need to continue or fork a delegate's prior investigation without rediscovering context. A single after-run snapshot is too coarse for that use case.

The capability now records

(a) append-only StepEvents at every boundary (run/model-request/tool-call start, completion, failure),
(b) a ContinuableSnapshot only when message history is provider-valid (every ToolCallPart has a matching ToolReturnPart / RetryPromptPart) -- saved mid-run after CallToolsNode and at after_run, and
(c) a ToolEffectRecord ledger so a run killed between before_tool_execute and after_tool_execute leaves an `unknown_after_crash`-style record rather than a falsely-continuable snapshot.

Lineage metadata (parent_run_id, agent_name) ties delegate runs back to their orchestrator. `continue_run` / `fork_run` helpers load the latest continuable snapshot for a run.

Backends: InMemoryStepStore (tests) and FileStepStore (JSONL events + JSON snapshots, with run_id path-traversal validation and anyio.to_thread for blocking I/O).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address pydanty review + ergonomics: tighten correctness and identity model

Correctness fixes from pydanty's PR review:

- FileStepStore: snapshot filenames are now a per-run monotonic counter, not `ctx.run_step` -- `run_step` resets each Agent.run, so re-using a `run_id` across calls would let an earlier run's higher step-index snapshot mask a later run's lower-step-index one.
- StepStore.get_tool_effect now takes both `run_id` and `tool_call_id`. TestModel and other providers can reuse deterministic tool-call ids across runs; the previous unscoped lookup let one run's effect leak into another's record (including `started_at`).
- is_provider_valid now rejects orphan, duplicate, and out-of-order tool returns -- the old `set.discard` pattern silently accepted any return regardless of whether a matching call was open.

Identity model:

- `run_id` resolution: explicit > `{agent_name}-{8-char-hex}` > UUID. Materialised per Agent.run in `for_run`, so reusing one capability instance never silently merges runs.
- `parent_run_id` auto-inferred via a module-level ContextVar set in `wrap_run`, so an orchestrator's tool that synchronously calls `delegate.run(...)` produces a delegate `RunRecord.parent_run_id` pointing at the orchestrator's `run_id` with zero threading. Explicit `parent_run_id=` still wins.
- `conversation_id` propagated to `StepEvent` and `RunRecord`; `store.list_runs(conversation_id=..., parent_run_id=...)` supports filtering by either or both. Mirrors pydantic_ai's three-level identity (conversation -> run -> step) so "run 1, run 2, run 3" of one dialogue is queryable as a group via `conversation_id`.
- `continue_from=` field dropped from the capability. Continuation is now only via `continue_run(store, run_id=...)` -> standard `Agent.run(message_history=...)`. One way to pass history into pydantic_ai, no parallel capability flag.

README rewritten around the final API. New sections: three-level identity, run lineage with auto-inferred parent, inspecting a run tree, failure recovery.

Tests: 168 total (up from 64), 100% branch coverage on the package. New coverage for the snapshot seq counter, cross-run tool-effect isolation, orphan/duplicate/out-of-order return rejection, ContextVar parent inference across nested agent.run, conversation_id propagation, and the agent_name-derived run_id default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address pydanty round-2 review: retry validity, list ordering, effect metadata

Correctness:

- is_provider_valid no longer rejects non-tool RetryPromptParts.
  Pydantic AI emits `RetryPromptPart(tool_name=None)` for output-validation failures and providers map those as plain user messages, not tool results. The previous check required every RetryPromptPart to resolve an open tool call, so a run with one output retry produced no final continuable snapshot despite being fully valid.
- StepStore.list_runs now guarantees chronological (started_at ascending) ordering across both backends. FileStepStore was previously returning directory-name order (lexicographic), so the README's `[-1]` pattern for "latest run in conversation" could pick the older run when run ids did not sort by recency.
- after_tool_execute and on_tool_execute_error preserve idempotency_key and effect_summary from the prior `started` record. Previously the terminal record was written without those fields, so any annotation the tool body wrote was lost on completion.
- from_spec raises ValueError for unknown backends instead of silently falling back to in-memory storage. For a persistence capability, turning a typo into accidental non-durability is the wrong failure mode.

API:

- New annotate_tool_effect(store, ctx, *, idempotency_key=None, effect_summary=None) helper. Tool bodies that write external state call it to attach idempotency + effect metadata to the in-flight ToolEffectRecord without knowing the (run_id, tool_call_id) plumbing. Resolves run_id from a ContextVar set by wrap_run; reads tool_call_id / tool_name from RunContext.
- ContextVar moved from `_capability.py` into a new `_context.py` module so the helper and the capability can share it without circular imports and without crossing the private-name barrier.

Docs: README fixes a non-existent `list_runs(agent_name=None)` call, documents the chronological-ordering guarantee, and replaces the hand-wavy "populate fields on the ToolEffectRecord" line with a concrete `annotate_tool_effect` example.

Tests: 178 total (was 168), 100% branch coverage on the package.
Added coverage for non-tool retry acceptance, chronological list_runs on both backends, metadata preservation across completed/failed transitions, annotate_tool_effect under realistic agent.tool, and from_spec backend validation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(step_persistence): align FileStepStore docstring with seq-counter layout

The class docstring still showed snapshots/<step_index>.json from the pre-fix layout, but both the README and _next_snapshot_seq document the monotonic counter. Bring the class docstring in line.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(step_persistence): fix broken README examples + clarify run_id sharing

Pydanty round-3 review:

- README continuation and lineage examples queried `list_runs(conversation_id=...)` on conversations the earlier `.run(...)` calls never set, so the examples crashed with IndexError on `[-1]`. Pass the conversation_id to the earlier calls so the lookup actually works.
- The capability docstring claimed reusing a `StepPersistence` instance across `Agent.run` calls does NOT share the id. That is true only for the auto-derived (`agent_name`-prefixed or `ctx.run_id`) cases -- an explicit `run_id=` is shared across every `.run()` by design, since that is the orchestrator pattern where the caller owns one logical identity across turns. Rewrite the resolution-order docs to spell out which cases share and which don't, and when to pick each.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(step_persistence): align run_id semantics with pydantic_ai (per-call, not shared)

Pydanty round-4 review: the prior round documented explicit `run_id` as shared across `.run()` calls on one capability instance -- that framing caused real correctness gaps. The `ToolEffectRecord` ledger is keyed by `(run_id, tool_call_id)` and providers reuse deterministic tool-call ids (e.g.
`TestModel` emits `pyd_ai_tool_call_id__{name}`), so a second `.run()` overwrites the first's effect record under the same key -- the `unknown_after_crash` signal from turn 1 disappears when turn 2 lands.

Realign:

- `run_id` is per-`Agent.run`, matching `pydantic_ai.RunContext.run_id`.
- For multi-turn logical grouping, use `conversation_id=` on `Agent.run(...)` -- that is the pyai-native primitive. The orchestrator pattern is `conversation_id='orch'` with each turn auto-deriving its own `run_id`.
- Explicit `run_id=` remains supported but is documented as single-shot (testing, replay, debugging). Reusing it across calls is a caller contract violation, not an implementation feature.

Code is unchanged -- the implementation was already correct under the right contract. Only the docs were misleading.

Tests:

- `TestRunIdIsPerCall::test_multi_turn_orchestrator_uses_conversation_id` exercises the recommended pattern: three turns sharing a `conversation_id`, three distinct auto-derived `run_id`s, all queryable as a group.
- `TestRunIdIsPerCall::test_explicit_run_id_reuse_collides_ledger` locks down the misuse contract: reusing one explicit `run_id` across two `.run()` calls produces colliding effect records under the `(run_id, tool_call_id)` key. The behavior is documented; the test exists so a future refactor cannot silently change it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(step_persistence): explain why for_run uses a local ContextVar

Pyai-aligned review flagged this as a P3 explainer: pydantic_ai already has three single-slot cross-run signals (RUN_ID_BAGGAGE_KEY, ctx.run_id, _CURRENT_RUN_CONTEXT). All three get overwritten by the inner Instrumentation.wrap_run before any nested capability can see the parent identity. A separate harness-local ContextVar, snapshotted before our own wrap_run rebinds it, is the only correct mechanism today. Spell this out so the next reader doesn't try to 'simplify' it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(step_persistence): enforce explicit-run_id reuse with ValueError

Pydanty round-5 review accepted the docs-only contract but flagged that "documented but not enforced" is a soft spot. Enforce it: `before_run` calls `store.get_run(run_id=...)` when the user supplied an explicit `run_id`, and raises `ValueError` if a record with that id already exists.

The auto-derived cases cannot trigger this check (each call materialises a fresh id in `for_run`). The check is one extra store read per Agent.run when an explicit run_id is set, only. The error message points the caller at `conversation_id` for multi-turn grouping.

Test renamed from `test_explicit_run_id_reuse_collides_ledger` to `test_explicit_run_id_reuse_raises` -- asserts the second `.run()` raises and the first run's records survive untouched.

README + capability docstring updated: the misuse path is now "raises" not "caller's contract."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: gitignore branch-context skill artifacts + AGENTS.local.md

Two patterns that match the existing CLAUDE.local.md ignore convention:

- AGENTS.local.md -- canonical local-instructions file (CLAUDE.local.md is symlinked to it where the worktree follows the same AGENTS.md/CLAUDE.md symlink pattern).
- .agents/skills/branch-context/ -- per-worktree decisions log (`pr-decisions.md`) and the skill's local SKILL.md. Pattern lifted from `~/pydantic/ai/base/.claude/skills/branch-context/` where pyai uses an identical setup.

Neither is intended to land in PRs -- they record cross-iteration design calls so future AICA sessions in this worktree don't silently undo them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(step_persistence): add SqliteStepStore + MediaStore externalization

Adds a new `pydantic_ai_harness.media` package (MediaStore protocol + DiskMediaStore / SqliteMediaStore / S3MediaStore) and wires it into the file/sqlite step-persistence backends so large BinaryContent payloads get externalized out of snapshot JSON / table rows by default.

Defaults are zero-config: FileStepStore writes blobs under `<root>/media/<sha256>.bin`; SqliteStepStore writes them to a sibling `media` table in the same DB. Threshold is 64 KiB and URI scheme is `media+sha256://<hex>` so blobs are content-addressed across stores. Pass `media_store=None` to keep bytes inline, or a custom `MediaStore` to redirect (e.g. `S3MediaStore` for R2 / AWS / MinIO).

S3MediaStore handrolls SigV4 over httpx to avoid a botocore/boto3 dependency. Verified working against Cloudflare R2.

`StepPersistence.from_spec(backend='sqlite', database=...)` now resolves.

180 → 261 tests, 100% branch coverage maintained.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(media): VCR cassettes for S3MediaStore against R2

Adds replay-driven tests under `tests/media/test_s3_cassettes.py` that exercise `S3MediaStore.put/get/exists` against pre-recorded Cloudflare R2 responses. CI runs them without any S3 creds via the committed cassettes under `tests/media/cassettes/`.

Sanitisation policy:

- `before_record_request`/`before_record_response` swap the real R2 account-id subdomain and bucket name for fixed placeholders (`account.r2.cloudflarestorage.com`, `harness-test-bucket`)
- `Authorization` and `x-amz-date` filtered to `REDACTED`
- CF-RAY, x-amz-version-id, x-amz-checksum-*, x-amz-request-id headers dropped (none load-bearing for tests; some carry identifying info)
- Non-2xx response bodies blanked (R2's gzipped XML error envelope leaks the bucket name; our code only checks status code)

The `s3_credentials` fixture uses `os.environ.get(NAME, PLACEHOLDER)` per field, so real R2 creds are used when recording locally with `.env` loaded, and the placeholder constants match the scrubbed cassettes during replay. Because the placeholders are fixed, any scrubber miss during a future re-record shows up as a replay URL mismatch -- built-in canary against credential / private-data leakage in committed cassettes.

Adds `pytest-recording` (pulls `vcrpy`) to the dev deps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(media): public_url resolver -- protocol method + per-store callable

Adds `MediaStore.public_url(uri) -> str | None` plus a `public_url=` constructor parameter on every concrete store. The parameter accepts a sync or async callable; the store auto-detects and awaits if needed.

This is the bottom-layer primitive for the forthcoming `MediaExternalizer` capability -- that capability will call `store.public_url(...)` per externalized blob and swap `BinaryContent` for `ImageUrl` / `AudioUrl` parts before the model sees the message. The callable shape covers both static URLs (public bucket / CDN -- use `make_static_public_url` helper) and dynamic URLs (presigned, per-request signing -- pass any async callable with TTL captured in its closure).
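
A minimal sketch of the "sync or async callable, auto-detected" behaviour described above. This is an assumption about one way such detection can be written, not the store's actual code: `resolve_public_url`, `static_resolver`, `presigned_resolver`, and the example hosts are all hypothetical, and the resolver takes only `uri`, matching the signature stated in this commit item.

```python
import asyncio
import inspect
from typing import Awaitable, Callable, Optional, Union

# Hypothetical alias: a resolver returns a URL (or None), directly or via an awaitable.
PublicUrlResolver = Callable[[str], Union[Optional[str], Awaitable[Optional[str]]]]

_SCHEME = 'media+sha256://'


async def resolve_public_url(resolver: Optional[PublicUrlResolver], uri: str) -> Optional[str]:
    """Call the resolver and await the result only if it is awaitable."""
    if resolver is None:
        return None
    result = resolver(uri)
    if inspect.isawaitable(result):
        result = await result
    return result


def static_resolver(uri: str) -> Optional[str]:
    # Static case: map the content-addressed URI onto a fixed (made-up) CDN host.
    digest = uri[len(_SCHEME):]
    return f'https://cdn.example.com/{digest}.bin'


async def presigned_resolver(uri: str) -> Optional[str]:
    # Dynamic case: a real resolver would sign the URL here; this one only fakes it.
    await asyncio.sleep(0)
    digest = uri[len(_SCHEME):]
    return f'https://signed.example.com/{digest}?ttl=60'


static_url = asyncio.run(resolve_public_url(static_resolver, 'media+sha256://abc'))
signed_url = asyncio.run(resolve_public_url(presigned_resolver, 'media+sha256://abc'))
```

Checking the return value with `inspect.isawaitable` rather than inspecting the callable itself also covers sync functions that return a coroutine or future.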

Why a callable rather than a static config: a public bucket's URL host is not derivable from the bucket creds (R2 public buckets use `pub-<hash>.r2.dev`, AWS public buckets use a different scheme than the path-style endpoint we sign for). The URL is always user-supplied information, so a callable is the right primitive -- same shape for the static and presigned cases, and `get` stays untouched (it serves the harness's internal byte fetch, not the model's external HTTP fetch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(media): introduce MediaContext + key_strategy for extensible operations

Adds `MediaContext` (frozen, kw-only dataclass with `media_type`, `filename`, `metadata`) and threads it through every `MediaStore` method and both user callables (`PublicUrlResolver`, `KeyStrategy`). New context fields can be added non-breakingly; existing call sites and resolvers keep working.

Also adds:

- `KeyStrategy = Callable[[str, MediaContext], str]` for per-store layout control. Default `default_key_strategy` produces `<sha256>.bin`. Disk store validates the result against `..` traversal.
- `metadata` persistence on `SqliteMediaStore` (new JSON column) and `S3MediaStore` (signed `x-amz-meta-*` headers, ASCII key validation). Disk store explicitly does NOT persist metadata in v1 -- sidecar / xattr options each have load-bearing drawbacks; we ship nothing rather than a half-true persistence promise.
- `make_static_public_url(...)` updated to the new `(uri, ctx)` signature.

The shift is motivated by the same principle as pydantic_ai's `RunContext`: extension via fields on a context bag rather than via breaking signature changes. Every new requirement (TTL hints for presigned URLs, audit ids, response-header overrides, etc.) becomes a field addition, not an API revision.

Cassettes from the previous commit replay unchanged -- match-on does not include the signed headers and the request URLs are stable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(media): close metadata round-trip; drop vestigial sqlite key_strategy

Adds `MediaStore.get_metadata(uri) -> Mapping[str, str]` to the protocol and implements it on all three concrete stores:

- `DiskMediaStore`: writes a sidecar `<resolved>.meta.json` alongside the blob on put (atomic tmp + rename), reads it back on `get_metadata`. Returns `{}` when no metadata was supplied. v1 had documented this as a deliberate gap -- sidecar JSON is straightforward and the xattr / ADS drawbacks don't apply.
- `SqliteMediaStore`: `SELECT metadata FROM <table> WHERE sha256=?` + `json.loads`. Raises `FileNotFoundError` for unknown URIs.
- `S3MediaStore`: HEAD + collects `x-amz-meta-*` response headers, strips the prefix. Reuses the existing 404 / non-2xx error shape.

Drops `key_strategy=` from `SqliteMediaStore`. The digest is the primary key by content-addressing construction -- a user-chosen key would either break dedup or be a no-op. Kept on Disk + S3 where bucket / directory layout is a real concern.

README + branch-context entries updated to reflect: all three stores round-trip metadata; key_strategy is Disk + S3 only.

Coverage stays at 100% branch.

* fix(media): reject absolute paths from DiskMediaStore key_strategy

`Path(root) / absolute_path` returns `absolute_path` -- the root is silently discarded -- so a custom `key_strategy` returning `/etc/passwd` (or similar) escapes the store directory even though the previous check only blocked `..`. Tighten the validator to reject both shapes.

Caught by pydanty during its #251 integration review.

* fix(step_persistence): dedupe terminal snapshot; document sqlite thread-affinity

The terminal CallToolsNode already saves the final provider-valid snapshot with the correct step_index. after_run was re-saving the same tail stamped with step_index=0 (ctx.run_step is reset by then), so latest_snapshot reported a misleading step and every run wrote a duplicate.
Track whether a node snapshot was taken via a task-local ContextVar and make after_run a fallback that only fires when the run reached no provider-valid boundary.

Also document that a caller-owned sqlite connection= must set check_same_thread=False (store SQL runs on anyio worker threads), on both SqliteStepStore and SqliteMediaStore, and correct the WAL-on-every-connection claim.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(media): SigV4 wire path matches signed path; walker preserves unknown fields

S3MediaStore signed `_canonical_uri(path)` (each segment percent-encoded) but sent the raw path, letting httpx apply looser encoding. A custom key_prefix / key_strategy emitting reserved chars (`@`, `(`, `=`, ...) diverged from the signed path -> SignatureDoesNotMatch. Send the canonical bytes via httpx `raw_path` so signer and sender agree. Default `<hex>.bin` keys are unaffected.

The externalize/restore walker hardcoded the BinaryContent key set, silently dropping any field pydantic_ai adds upstream. Copy the node and swap only `data` <-> marker keys so unknown fields round-trip.

Adds tests for reserved-char path agreement, unknown-field preservation, and restore over a pruned blob.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor: gate step_persistence + media behind experimental namespace

Move both packages under `pydantic_ai_harness.experimental`, matching the convention introduced for compaction/planning/subagents (#191): the old `pydantic_ai_harness.step_persistence` / `.media` paths and the top-level re-exports are gone, so the only import path is now

    from pydantic_ai_harness.experimental.step_persistence import StepPersistence
    from pydantic_ai_harness.experimental.media import S3MediaStore

Both package __init__s call `warn_experimental(...)`, so importing either emits a `HarnessExperimentalWarning` (silenced category-wide by one filter).
This keeps us from committing to a public surface before the capability has real usage.

README gains the standard experimental banner; warning tests cover both new packages. 100% branch coverage retained.

* docs(media): fix metadata/key_strategy drift, convert em-dashes

Subagent review of the experimental move surfaced doc drift:

- README "Persistence by store" implied `get_metadata` returns `media_type`; it returns only the user `metadata` mapping. Reworded.
- `KeyStrategy` docstring still listed `SqliteMediaStore` / "DB primary key" as a user; sqlite has no `key_strategy`. Dropped it.
- README understated `DiskMediaStore` traversal protection (it rejects absolute paths as well as `..`).
- README had a stale paragraph that read as `key_strategy` but described `public_url`, and omitted S3. Rewritten to name `public_url` and all three stores.
- README `MediaContext` method list now includes `get_metadata`.
- Converted em-dashes to `--` across the step_persistence + media trees per the writing-style rule (#270). Other experimental packages' pre-existing em-dashes are left for a separate sweep.

No code behavior change; 100% branch coverage retained.

* docs(experimental): add warning banner to planning/subagents, finish em-dash sweep

Uniformity pass across the experimental packages:

- Add the standard `[!WARNING]` experimental banner to planning/README.md and subagents/README.md (compaction and step_persistence already had it; these two lacked it).
- Convert remaining em-dashes to `--` in the compaction package and the shared `_warn.py`, per the writing-style rule (#270). The whole experimental source tree is now em-dash-free.

Docs only; 100% branch coverage retained.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: David SF <david.sanchez@pydantic.dev>
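
The provider-validity rules spelled out across the review rounds above (every tool call answered; orphan, duplicate, and out-of-order returns rejected; non-tool retry prompts treated as plain user messages) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the package's `is_provider_valid`: the `ToolCall`, `ToolReturn`, and `RetryPrompt` dataclasses are simplified stand-ins for pydantic_ai's message parts, and the history is flattened to a single list of parts.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass
class ToolCall:
    tool_call_id: str


@dataclass
class ToolReturn:
    tool_call_id: str


@dataclass
class RetryPrompt:
    tool_call_id: str
    tool_name: Optional[str] = None  # None models an output-validation retry


Part = Union[ToolCall, ToolReturn, RetryPrompt]


def is_history_continuable(parts: Sequence[Part]) -> bool:
    """Return True only if every tool call is closed exactly once, after it was opened."""
    open_calls = set()  # ids of tool calls that have not been answered yet
    for part in parts:
        if isinstance(part, ToolCall):
            open_calls.add(part.tool_call_id)
        elif isinstance(part, RetryPrompt) and part.tool_name is None:
            # Non-tool retry: a plain user message, never closes a tool call.
            continue
        else:
            # ToolReturn or tool-scoped RetryPrompt must close an OPEN call.
            # A plain `set.discard` here would silently accept orphan,
            # duplicate, and out-of-order returns.
            if part.tool_call_id not in open_calls:
                return False
            open_calls.remove(part.tool_call_id)
    return not open_calls
```

For example, `[ToolCall('a'), ToolReturn('a')]` passes, while a lone `ToolReturn('a')` or a second `ToolReturn('a')` after the first fails the check.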
1 parent d347d08 commit 6f2aa11

39 files changed

Lines changed: 6605 additions & 27 deletions

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -2,6 +2,8 @@
 .mcp.json
 .DS_Store
 .agents/settings.local.json
+.agents/skills/branch-context/
+AGENTS.local.md
 CLAUDE.local.md
 LOCAL_WORKTREES.md

pydantic_ai_harness/__init__.py

Lines changed: 7 additions & 2 deletions
@@ -1,4 +1,4 @@
-"""Pydantic AI capability library."""
+"""The batteries for your Pydantic AI agent -- the official capability library."""

 from typing import TYPE_CHECKING

@@ -8,7 +8,12 @@
 from .logfire import ManagedPrompt
 from .shell import Shell

-__all__ = ['CodeMode', 'FileSystem', 'ManagedPrompt', 'Shell']
+__all__ = [
+    'CodeMode',
+    'FileSystem',
+    'ManagedPrompt',
+    'Shell',
+]


 def __getattr__(name: str) -> object:

pydantic_ai_harness/experimental/_warn.py

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ class HarnessExperimentalWarning(UserWarning):
 def warn_experimental(feature: str) -> None:
     """Emit a `HarnessExperimentalWarning` for *feature*, including how to silence all of them.

-    One filter silences the whole category every experimental capability so users never
+    One filter silences the whole category (every experimental capability), so users never
     need a suppression line per capability.
     """
     warnings.warn(
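
The "one filter silences the whole category" claim in the docstring above can be demonstrated with the standard `warnings` module. The sketch below redefines a stand-in `HarnessExperimentalWarning` and a simplified `warn_experimental` so it runs on its own; the real message text is not shown in this excerpt, so the message here is an assumption.

```python
import warnings


class HarnessExperimentalWarning(UserWarning):
    """Stand-in with the same name and base class as the category in the diff."""


def warn_experimental(feature: str) -> None:
    # Simplified stand-in; the real message wording is assumed, not copied.
    warnings.warn(f'{feature} is experimental', HarnessExperimentalWarning, stacklevel=2)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # A single category-wide filter; it is inserted ahead of the 'always' rule.
    warnings.filterwarnings('ignore', category=HarnessExperimentalWarning)
    warn_experimental('step_persistence')
    warn_experimental('media')
    warnings.warn('unrelated', UserWarning)

remaining = [str(w.message) for w in caught]
```

Only the unrelated `UserWarning` is recorded: the filter matches on the category, so it covers every feature that warns through `HarnessExperimentalWarning` without touching other warnings.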

pydantic_ai_harness/experimental/compaction/README.md

Lines changed: 9 additions & 9 deletions
@@ -3,7 +3,7 @@
 > [!WARNING]
 > **Experimental.** These capabilities live under `pydantic_ai_harness.experimental` and may
 > change or be removed in any release, without a deprecation period. Import them from the
-> experimental path there is no top-level export:
+> experimental path -- there is no top-level export:
 >
 > ```python
 > from pydantic_ai_harness.experimental.compaction import TieredCompaction
@@ -24,7 +24,7 @@ window. Each is a Pydantic AI `Capability` that runs in the `before_model_reques
 **persist** into the run's message history, so a trim/clear/summary carries forward to later
 steps (it is not recomputed from the full history every turn).

-All strategies preserve tool-call / tool-return **pairing** core does not validate this, and a
+All strategies preserve tool-call / tool-return **pairing** -- core does not validate this, and a
 provider rejects an orphaned pair. The zero-LLM strategies never call a model.

 ## The menu
@@ -48,9 +48,9 @@ near-lossless). `TieredCompaction` triggers and stops on a single `target_tokens
 ## Cost: why summarization is the last resort

 Summarization turns input tokens into output tokens, which are billed at a premium and generated
-serially so it is genuinely expensive. The zero-LLM strategies touch only the cheaper input side.
+serially -- so it is genuinely expensive. The zero-LLM strategies touch only the cheaper input side.
 The field consensus (Anthropic, OpenCode, Letta) is to clear/dedupe first and summarize only when
-that is not enough which is exactly what `TieredCompaction` encodes:
+that is not enough -- which is exactly what `TieredCompaction` encodes:

 ```python
 from pydantic_ai import Agent
@@ -77,14 +77,14 @@ agent = Agent(
 ```

 A tier inside `TieredCompaction` is driven directly by the orchestrator, which re-measures after each
-and stops once under `target_tokens` so a tier's own `max_*` trigger is irrelevant there (set it to
+and stops once under `target_tokens` -- so a tier's own `max_*` trigger is irrelevant there (set it to
 anything valid). Any object with `async def compact(messages, ctx) -> list[ModelMessage]`
 (`CompactionStrategy`) can be a tier, so you can plug in your own.

 ## Cache tradeoff (read before using `ClearToolResults`)

 Clearing or deduplicating rewrites message content, which invalidates the provider's prompt cache
-from the edit point onward the next request pays a cache-write. Use `ClearToolResults`'
+from the edit point onward -- the next request pays a cache-write. Use `ClearToolResults`'
 `min_clear_tokens` to skip clearing that reclaims too little to be worth busting the cache.

 ## Model inheritance
@@ -94,8 +94,8 @@ running agent's model. No token caps are imposed on the summary call.

 ## Usage accounting

-The summary call is a real request to the model, so its full usage tokens **and** the request
-itself is folded into the run's `ctx.usage`. This is deliberate: it keeps cost honest, keeps the
+The summary call is a real request to the model, so its full usage -- tokens **and** the request
+itself -- is folded into the run's `ctx.usage`. This is deliberate: it keeps cost honest, keeps the
 request count consistent (a model request that didn't count as one would be the surprise), and lets a
 `UsageLimits` request limit catch a runaway compaction. A run-request / iteration limiter will
 therefore see compaction calls among its requests.
@@ -120,5 +120,5 @@ def my_file_key(call: ToolCallPart) -> str | None:
 ## Out of scope

 These strategies compress or drop context *inside* the window. Moving large tool outputs *out* of the
-window overflowing them to a file the agent (or a subagent) can query on demand is a separate
+window -- overflowing them to a file the agent (or a subagent) can query on demand -- is a separate
 capability, not lossy truncation. Prefer it over capping individual tool outputs.
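
The README text in this diff notes that any object with `async def compact(messages, ctx) -> list[ModelMessage]` can serve as a tier. A minimal sketch of that shape follows. To stay self-contained it does not import pydantic_ai: messages are plain strings and `ctx` is untyped, and `KeepLastMessages` is a hypothetical class name, not part of the package.

```python
import asyncio
from typing import Any, List


class KeepLastMessages:
    """Hypothetical tier: keep only the newest `keep` messages.

    Assumption: messages are opaque items. A real tier working on
    pydantic_ai `ModelMessage` objects would also have to keep
    tool-call / tool-return pairs together, which this sketch ignores.
    """

    def __init__(self, keep: int) -> None:
        if keep < 1:
            raise ValueError('keep must be at least 1')
        self.keep = keep

    async def compact(self, messages: List[Any], ctx: Any) -> List[Any]:
        # Return a new list; do not mutate the caller's history in place.
        return list(messages[-self.keep:])


compacted = asyncio.run(KeepLastMessages(keep=2).compact(['m1', 'm2', 'm3'], ctx=None))
```

Because the interface is a single async method, such an object needs no base class; it only has to return the compacted message list.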

pydantic_ai_harness/experimental/compaction/_clear_tool_results.py

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-"""`ClearToolResults` zero-cost in-place clearing of old tool results."""
+"""`ClearToolResults` -- zero-cost in-place clearing of old tool results."""

 from __future__ import annotations

@@ -31,7 +31,7 @@ class ClearToolResults(AbstractCapability[AgentDepsT]):
     calls remain paired with their (now-blanked) results, so the history stays valid.
     No LLM calls are made.

-    This is the cheap first tier of compaction tool results typically dominate
+    This is the cheap first tier of compaction -- tool results typically dominate
     context, and the agent can re-run a tool if it needs the data again.

     Cache tradeoff: clearing rewrites message content, which invalidates the provider's

pydantic_ai_harness/experimental/compaction/_deduplicate_file_reads.py

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-"""`DeduplicateFileReads` zero-cost in-place clearing of superseded file reads."""
+"""`DeduplicateFileReads` -- zero-cost in-place clearing of superseded file reads."""

 from __future__ import annotations

@@ -25,7 +25,7 @@ class DeduplicateFileReads(AbstractCapability[AgentDepsT]):
     earlier reads are blanked with a placeholder. Tool-call pairing is preserved. No LLM
     calls are made.

-    File identity is supplied by the ``file_key`` seam given a ``ToolCallPart`` it returns
+    File identity is supplied by the ``file_key`` seam -- given a ``ToolCallPart`` it returns
     a stable key for the file being read, or ``None`` if the call is not a file read. There
     is no default: file-read identification is agent-specific, and a wrong guess would drop
     live data.

pydantic_ai_harness/experimental/compaction/_limit_warner.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-"""`LimitWarner` injects warnings as the run approaches configured limits."""
+"""`LimitWarner` -- injects warnings as the run approaches configured limits."""

 from __future__ import annotations

pydantic_ai_harness/experimental/compaction/_shared.py

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 """Shared utilities for the compaction capabilities.

 Token estimation, the `CompactionStrategy` protocol, tool-pair-safe cutoff logic, first-user
-preservation, and in-place tool-result clearing anything used by more than one capability.
+preservation, and in-place tool-result clearing -- anything used by more than one capability.
 """

 from __future__ import annotations
@@ -121,7 +121,7 @@ async def compact(


 # ---------------------------------------------------------------------------
-# Safe cutoff logic preserves tool-call / tool-return pairs
+# Safe cutoff logic -- preserves tool-call / tool-return pairs
 # ---------------------------------------------------------------------------

 _TOOL_PAIR_SEARCH_RANGE = 5

pydantic_ai_harness/experimental/compaction/_sliding_window.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-"""`SlidingWindow` zero-cost trimming of the oldest messages."""
+"""`SlidingWindow` -- zero-cost trimming of the oldest messages."""

 from __future__ import annotations

pydantic_ai_harness/experimental/compaction/_summarizing_compaction.py

Lines changed: 5 additions & 5 deletions
@@ -1,4 +1,4 @@
-"""`SummarizingCompaction` LLM-powered summarization of older messages."""
+"""`SummarizingCompaction` -- LLM-powered summarization of older messages."""

 from __future__ import annotations

@@ -44,7 +44,7 @@
 Choices made and the reasoning, so they are not relitigated.

 ## Artifacts
-Files, paths, identifiers, commands, and APIs touched quote exact names.
+Files, paths, identifiers, commands, and APIs touched -- quote exact names.

 ## Current state
 What is done and what is in progress right now.
@@ -55,7 +55,7 @@
 ## Open questions
 Unresolved questions or blockers.

-Focus on results, not a replay of completed actions. Respond ONLY with the summary no \
+Focus on results, not a replay of completed actions. Respond ONLY with the summary -- no \
 preamble, no markdown fences.

 <messages>
@@ -140,8 +140,8 @@ class SummarizingCompaction(AbstractCapability[AgentDepsT]):
     summarized using a dedicated model call and replaced with a compact, structured
     summary message, preserving recent context and tool-call integrity.

-    This is the expensive tier summarization turns input tokens into (pricier) output
-    tokens so it is best used behind cheaper passes (see `TieredCompaction`).
+    This is the expensive tier -- summarization turns input tokens into (pricier) output
+    tokens -- so it is best used behind cheaper passes (see `TieredCompaction`).

     The summary call's usage is folded into the parent run's usage (it counts as a real
     request), so cost accounting stays honest; note this also increments the run's request
