This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
- Install dependencies: `make install` (requires uv and pre-commit)
- Run all checks: `make all` or `pre-commit run --all-files`
- Run tests: `make test`
- Build docs: `make docs` or `make docs-serve` (local development)
- Run specific test: `uv run pytest tests/test_agent.py::test_function_name -v`
- Run test file: `uv run pytest tests/test_agent.py -v`
- Run with debug output: `uv run pytest tests/test_agent.py -v -s`
- `pydantic_deep/` — Core library (agent, deps, toolsets, middleware, processors)
- `cli/` — CLI application (terminal AI assistant)
- `apps/swebench_agent/` — SWE-bench evaluation agent
- `apps/harbor_agent/` — Harbor benchmark agent
- `apps/deepresearch/` — Full-featured research reference app
- `tests/` — Unit tests
- `docs/` — Documentation source (MkDocs)
### Agent Factory (`pydantic_deep/agent.py`)

- `create_deep_agent()`: Main factory function for creating configured agents
- `create_default_deps()`: Helper for creating `DeepAgentDeps` with sensible defaults
- Built on top of pydantic-ai's `Agent` class
### Dependencies (`pydantic_deep/deps.py`)

- `DeepAgentDeps`: Dataclass holding agent dependencies (backend, working_dir, skills_dirs, subagents)
- Passed to `agent.run()` for runtime configuration
### Backends (from pydantic-ai-backend)

- `BackendProtocol`: Interface for file storage backends
- `StateBackend`: In-memory file storage (for testing, ephemeral use)
- `LocalBackend`: Real filesystem operations
- `DockerSandbox`: Isolated Docker container execution
- `CompositeBackend`: Combines multiple backends with routing
### Toolsets (`pydantic_deep/toolsets/`)

- `TodoToolset`: Task planning and tracking tools (read_todos, write_todos) — from pydantic-ai-todo
- `create_console_toolset`: File operations (ls, read, write, edit, glob, grep, execute) — from pydantic-ai-backend
- `SubAgentToolset`: Spawn and delegate to subagents — from subagents-pydantic-ai
- `SkillsToolset`: Load and use skill definitions from markdown files
### Subagents (from subagents-pydantic-ai)

- `create_subagent_toolset()`: Factory function to create subagent toolsets
- `get_subagent_system_prompt()`: Generate system prompt for subagent tools
- Dual-mode execution: sync (blocking) or async (background)
- Task management: `check_task`, `list_active_tasks`, `soft_cancel_task`, `hard_cancel_task`
- Types: `SubAgentConfig`, `CompiledSubAgent`, `TaskHandle`, `TaskStatus`, `TaskPriority`
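The background mode and soft/hard cancel semantics can be sketched with plain asyncio. This is only an illustration of the pattern — the class below is a stand-in, and its method names and signatures are assumptions, not the actual `TaskHandle` API:

```python
import asyncio


class BackgroundTaskHandle:
    """Stand-in sketch of a background subagent task with soft/hard cancel."""

    def __init__(self, coro_factory):
        # Soft cancel is cooperative: the worker checks this event between steps.
        self.soft_cancelled = asyncio.Event()
        self.task = asyncio.create_task(coro_factory(self))

    def soft_cancel(self):
        # Ask the worker to stop at its next checkpoint.
        self.soft_cancelled.set()

    def hard_cancel(self):
        # Cancel immediately, raising CancelledError inside the worker.
        self.task.cancel()


async def worker(handle):
    for step in range(100):
        if handle.soft_cancelled.is_set():
            return f"stopped early at step {step}"
        await asyncio.sleep(0)  # yield so cancellation can be observed
    return "finished"


async def main():
    handle = BackgroundTaskHandle(worker)
    await asyncio.sleep(0)  # let the worker start
    handle.soft_cancel()
    return await handle.task


result = asyncio.run(main())
```

The key distinction: soft cancel lets the subagent finish its current step and return a partial result, while hard cancel tears the task down immediately.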
### Processors (from summarization-pydantic-ai)

- `SummarizationProcessor`: LLM-based conversation summarization for token management
- `SlidingWindowProcessor`: Zero-cost message trimming without LLM calls
- `create_summarization_processor()`: Factory function for summarization processors
- `create_sliding_window_processor()`: Factory function for sliding window processors
### Types (`pydantic_deep/types.py`)

- Pydantic models for all data structures
- `FileData`, `FileInfo`, `WriteResult`, `EditResult`, `GrepMatch`
- `Todo`, `SubAgentConfig`, `CompiledSubAgent`
- `Skill`, `SkillDirectory`, `SkillFrontmatter`
- `ResponseFormat`: Alias for structured output specification
### Checkpointing (`pydantic_deep/toolsets/checkpointing.py`)

- `Checkpoint`: Immutable snapshot of conversation state (id, label, turn, messages, metadata)
- `CheckpointStore`: Protocol for storage backends (save, get, list_all, remove, etc.)
- `InMemoryCheckpointStore`: Default in-memory store
- `FileCheckpointStore`: Persistent JSON file store
- `CheckpointMiddleware`: Auto-checkpoint via middleware hooks (every_tool, every_turn, manual_only)
- `CheckpointToolset`: Agent tools (save_checkpoint, list_checkpoints, rewind_to)
- `RewindRequested`: Exception for app-level rewind (propagates out of `agent.run()`)
- `fork_from_checkpoint()`: Utility for session forking
### Agent Teams (`pydantic_deep/toolsets/teams.py`)

- `SharedTodoItem`: Task with assignment, dependencies, and status tracking
- `SharedTodoList`: Asyncio-safe shared TODO list with claiming and dependency blocking
- `TeamMessage`: Message between team members
- `TeamMessageBus`: Peer-to-peer message bus using one asyncio.Queue per agent
- `TeamMember`: Member definition (name, role, description, instructions, model)
- `TeamMemberHandle`: Runtime handle to a running team member
- `AgentTeam`: Coordinator — spawn, assign, broadcast, wait_all, dissolve
- `create_team_toolset()`: Factory for team management tools (spawn_team, assign_task, check_teammates, message_teammate, dissolve_team)
### Output Styles (`pydantic_deep/styles.py`)

- `OutputStyle`: Dataclass (name, description, content)
- `BUILTIN_STYLES`: Dict of 4 built-in styles (concise, explanatory, formal, conversational)
- `resolve_style()`: Resolve style name → `OutputStyle` (built-ins → styles_dir → error)
- `discover_styles()`: Discover `.md` style files from a directory
- `load_style_from_file()`: Load a single style with frontmatter parsing
- `format_style_prompt()`: Format for system prompt injection
### Hooks (`pydantic_deep/middleware/hooks.py`)

- `HookEvent`: Enum (PRE_TOOL_USE, POST_TOOL_USE, POST_TOOL_USE_FAILURE)
- `Hook`: Definition — event, command/handler, matcher regex, timeout, background
- `HookInput`: Data passed to hooks (event, tool_name, tool_input, tool_result, tool_error)
- `HookResult`: Result from hook (allow, reason, modified_args, modified_result)
- `HooksMiddleware`: `AgentMiddleware` that dispatches hooks on tool events
- `EXIT_ALLOW = 0`, `EXIT_DENY = 2`: Claude Code exit code conventions
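The exit-code convention mirrors Claude Code hooks: exit 0 allows the tool call, exit 2 denies it with stderr as the reason. A sketch of the mapping (the treatment of other exit codes here is an assumption):

```python
EXIT_ALLOW = 0
EXIT_DENY = 2


def decision_from_exit_code(code: int, stderr: str = "") -> dict:
    """Map a hook command's exit code to an allow/deny decision."""
    if code == EXIT_ALLOW:
        return {"allow": True, "reason": None}
    if code == EXIT_DENY:
        # stderr conventionally carries the denial reason back to the model
        return {"allow": False, "reason": stderr or "denied by hook"}
    # Other exit codes: treated here as non-blocking (assumption)
    return {"allow": True, "reason": None}
```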
### Persistent Memory (`pydantic_deep/toolsets/memory.py`)

- `MemoryFile`: Loaded memory (agent_name, path, content)
- `AgentMemoryToolset`: `FunctionToolset` with read_memory, write_memory, update_memory
- `get_instructions()`: Injects memory into system prompt (first N lines)
- `load_memory()`, `format_memory_prompt()`, `get_memory_path()`
- Default path: `{memory_dir}/{agent_name}/MEMORY.md`
### Context Files (`pydantic_deep/toolsets/context.py`)

- `ContextFile`: Loaded context file (name, path, content)
- `ContextToolset`: `FunctionToolset` that injects context files via `get_instructions()`
- `discover_context_files()`: Auto-discover DEEP.md, AGENTS.md, CLAUDE.md, SOUL.md
- `load_context_files()`: Load from backend (missing files silently skipped)
- `format_context_prompt()`: Format with subagent filtering and truncation
- `DEFAULT_CONTEXT_FILENAMES`: [DEEP.md, AGENTS.md, CLAUDE.md, SOUL.md]
- `SUBAGENT_CONTEXT_ALLOWLIST`: {DEEP.md, AGENTS.md} — subagents don't see SOUL.md/CLAUDE.md
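The subagent filtering rule reduces to a simple allowlist check. The helper name below is hypothetical (the actual filtering happens inside `format_context_prompt()`), but the constants match the ones listed above:

```python
DEFAULT_CONTEXT_FILENAMES = ["DEEP.md", "AGENTS.md", "CLAUDE.md", "SOUL.md"]
SUBAGENT_CONTEXT_ALLOWLIST = {"DEEP.md", "AGENTS.md"}


def filter_context_files(loaded: dict[str, str], for_subagent: bool) -> dict[str, str]:
    """Sketch: the main agent sees everything; subagents only see
    allowlisted files (SOUL.md and CLAUDE.md are withheld)."""
    if not for_subagent:
        return loaded
    return {
        name: text
        for name, text in loaded.items()
        if name in SUBAGENT_CONTEXT_ALLOWLIST
    }
```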
### Eviction Processor (`pydantic_deep/processors/eviction.py`)

- `EvictionProcessor`: History processor — saves large tool outputs to files, replaces with preview
- `create_eviction_processor()`: Factory function
- `create_content_preview()`: Head/tail preview with truncation marker
- Default threshold: 20,000 tokens (80,000 chars)
- Uses runtime `ctx.deps.backend` for writing
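The head/tail preview can be sketched as below; the default sizes and the marker wording are assumptions, not the library's actual output:

```python
def create_content_preview(content: str, head: int = 200, tail: int = 200) -> str:
    """Sketch: keep the first and last chunks of an oversized tool output,
    with a marker noting how much was evicted to a file."""
    if len(content) <= head + tail:
        return content  # small enough to keep inline
    omitted = len(content) - head - tail
    return (
        content[:head]
        + f"\n... [{omitted} characters evicted to file] ...\n"
        + content[-tail:]
    )
```

The full output is written via `ctx.deps.backend`, so the agent can still `read` the evicted file on demand.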
### Cost Tracking (from pydantic-ai-middleware)

- Enabled by default via `cost_tracking=True`
- `CostTrackingMiddleware`: Tracks token usage and USD costs per run and cumulatively
- `CostInfo`: Per-run and cumulative token/cost data
- `BudgetExceededError`: Raised when cumulative cost exceeds `cost_budget_usd`
- Pricing from the `genai-prices` package
### Patch Tool Calls (`pydantic_deep/processors/patch.py`)

- `patch_tool_calls_processor()`: History processor that fixes orphaned tool calls
- Injects a synthetic `ToolReturnPart` with a "Tool call was cancelled." message
- Used when resuming interrupted conversations (`patch_tool_calls=True`)
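The core idea: any tool call in the history without a matching return gets a synthetic cancellation return, so the model API accepts the resumed conversation. A sketch over plain dicts (the real processor operates on pydantic-ai message parts, not dicts):

```python
def patch_orphaned_tool_calls(messages: list[dict]) -> list[dict]:
    """Sketch: inject a synthetic return for any tool call without one."""
    called = {m["tool_call_id"] for m in messages if m["kind"] == "tool_call"}
    returned = {m["tool_call_id"] for m in messages if m["kind"] == "tool_return"}
    patched = list(messages)
    for orphan in sorted(called - returned):
        patched.append({
            "kind": "tool_return",
            "tool_call_id": orphan,
            "content": "Tool call was cancelled.",
        })
    return patched
```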
### Plan Mode (`pydantic_deep/toolsets/plan/`)

- `create_plan_toolset()`: Factory for ask_user + save_plan tools
- Built-in 'planner' subagent registered when `include_plan=True`
- `PLANNER_INSTRUCTIONS`, `PLANNER_DESCRIPTION`: Planner configuration
- Plans saved as markdown files in `plans_dir` (default: `/plans`)
- `ask_user` supports headless mode (auto-selects the recommended option)
### Context Manager (from summarization-pydantic-ai)

- Enabled by default via `context_manager=True`
- `ContextManagerMiddleware`: Dual-protocol — history processor + `AgentMiddleware`
- Token tracking with `on_context_update` callback (percentage, current, max)
- Auto-compression when approaching the token budget (`compress_threshold=0.9`)
- `create_context_manager_middleware()`: Factory function
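The trigger arithmetic is simple: compression fires once usage reaches the threshold fraction of the budget. A sketch (the callback payload shape is inferred from the fields listed above; the exact structure is an assumption):

```python
def should_compress(current_tokens: int, max_tokens: int,
                    compress_threshold: float = 0.9) -> bool:
    """Fire auto-compression at 90% of the token budget by default."""
    return current_tokens / max_tokens >= compress_threshold


def context_update(current: int, maximum: int) -> dict:
    """Assumed shape of the on_context_update callback payload."""
    return {
        "percentage": round(100 * current / maximum, 1),
        "current": current,
        "max": maximum,
    }
```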
### Middleware Integration (from pydantic-ai-middleware)

- `middleware` param: List of `AgentMiddleware` instances
- `permission_handler`: Async callback for `ToolDecision.ASK`
- `middleware_context`: Shared state between hooks
- Automatically wraps the `Agent` in a `MiddlewareAgent` when any middleware is used
### `share_todos` on `DeepAgentDeps`

- `DeepAgentDeps.share_todos: bool = False`
- When True, `clone_for_subagent()` passes the same todos list reference (shared)
- When False (default), subagents get an empty todos list (isolated)
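The shared-reference vs. fresh-list distinction can be demonstrated with a minimal stand-in (the real `DeepAgentDeps` has many more fields; only the sharing behavior is sketched here):

```python
from dataclasses import dataclass, field, replace


@dataclass
class DeepAgentDeps:
    """Minimal stand-in: only the fields relevant to todo sharing."""
    todos: list = field(default_factory=list)
    share_todos: bool = False

    def clone_for_subagent(self):
        if self.share_todos:
            # Same list object: subagent edits are visible to the parent
            return replace(self)
        # Fresh empty list: subagent todos stay isolated
        return replace(self, todos=[])


parent = DeepAgentDeps(todos=["write tests"], share_todos=True)
shared = parent.clone_for_subagent()
shared.todos.append("fix lint")  # parent sees this, since the list is shared
```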
### Backend Abstraction

```python
from pydantic_ai_backends import StateBackend, LocalBackend, CompositeBackend

# In-memory for testing
backend = StateBackend()

# Real filesystem
backend = LocalBackend(root_dir="/path/to/workspace")

# Combined backends with routing
backend = CompositeBackend(
    default=StateBackend(),
    routes={
        "/project/": LocalBackend(root_dir="/home/user/project"),
    },
)
```

### Toolset Registration
```python
from pydantic_deep import create_deep_agent, DeepAgentDeps
from pydantic_ai_backends import create_console_toolset
from pydantic_ai_todo import create_todo_toolset

agent = create_deep_agent(
    model="openai:gpt-4.1",
    toolsets=[create_todo_toolset(), create_console_toolset()],
)
```

### Skills System
```python
# Skills are markdown files with YAML frontmatter,
# located in the skills_dirs specified on DeepAgentDeps
deps = DeepAgentDeps(
    backend=StateBackend(),
    skills_dirs=["/path/to/skills"],
)
```

### Structured Output
```python
from pydantic import BaseModel
from pydantic_deep import create_deep_agent

class TaskResult(BaseModel):
    status: str
    details: str

# Agent returns TaskResult instead of str
agent = create_deep_agent(output_type=TaskResult)
```

### Context Management / Summarization
```python
from pydantic_deep import (
    create_deep_agent,
    create_summarization_processor,
    create_sliding_window_processor,
)

# Automatically summarize when reaching token limits
processor = create_summarization_processor(
    trigger=("tokens", 100000),  # or ("messages", 50) or ("fraction", 0.8)
    keep=("messages", 20),  # Keep last N messages after summarization
)

# Or use a sliding window for zero-cost trimming
window = create_sliding_window_processor(
    trigger=("messages", 100),
    keep=("messages", 50),
)

agent = create_deep_agent(history_processors=[processor])
```

- Unit tests: `tests/` directory with comprehensive coverage
- Test models: Use `TestModel` from pydantic-ai for deterministic testing
- Async testing: pytest-asyncio with `asyncio_mode = "auto"`
- Coverage requirement: 100% coverage is required for all PRs
- `pyproject.toml`: Main configuration (dependencies, tools, coverage)
- `Makefile`: Development task automation
- `mkdocs.yml`: Documentation configuration
- `.pre-commit-config.yaml`: Pre-commit hook configuration
- Backend Protocol: All backends implement `BackendProtocol` for consistent file operations
- Async-First: Most operations are async; use `await` appropriately
- Type Safety: Full type annotations with Pyright strict mode
- Sandbox Support: `DockerSandbox` requires the `docker` optional dependency
- Local docs: `make docs-serve` (serves at http://localhost:8000)
- Docs source: `docs/` directory (MkDocs with Material theme)
- API reference: Auto-generated from docstrings using mkdocstrings
- Package manager: uv (fast Python package manager)
- Lock file: `uv.lock` (commit this file)
- Sync command: `make sync` to update dependencies
- Optional extras: sandbox, cli, dev
Every pull request MUST have 100% coverage; check it by running `make test`.
Use `# pragma: no cover` for legitimately untestable code (e.g., platform-specific branches).
All code must pass both Pyright and MyPy strict checking:

- `make typecheck` for Pyright
- `make typecheck-mypy` for MyPy
Always reference Python objects with backticks and link to the API reference:

```markdown
The [`create_deep_agent`][pydantic_deep.agent.create_deep_agent] function creates a configured agent.
```

When renaming a class, add a deprecation warning:

```python
from typing_extensions import deprecated

class NewClass: ...

@deprecated("Use `NewClass` instead.")
class OldClass(NewClass): ...
```