Learn Pydantic AI by building a multi-tenant SQL Safety Assistant, one level at a time.
## Setup

```bash
cd pydantic-ai-learn
uv sync
export GOOGLE_API_KEY=your-key-here  # from aistudio.google.com
```

## Level 0 — Vanilla DI

**What you learn:** What runtime context injection actually IS, in pure Python. No pydantic-ai.

- `Agent` = stateless class, created once at startup
- `ChatDeps` = per-user environment (conn, permissions)
- `RunContext` = just a wrapper that carries deps into tools (not a resource manager!)
- YOU manage the conn lifecycle, not RunContext

```bash
uv run level-0-vanilla-di/main.py
```

**Key insight:** RunContext is not magic. It's just `ctx.deps = deps`. Whoever creates the conn closes the conn. The agent holds zero user state. This is the mental model for everything that follows.
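The whole mechanism fits in a few lines of plain Python. This is a minimal sketch — the `ChatDeps` fields and the `list_datasets` tool here are illustrative stand-ins, not the repo's actual code:

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

DepsT = TypeVar("DepsT")

@dataclass
class RunContext(Generic[DepsT]):
    """Just a wrapper that carries deps into tools -- no lifecycle management."""
    deps: DepsT

@dataclass
class ChatDeps:
    """Per-user environment, created fresh for each run."""
    conn: dict                  # stand-in for a real DB connection
    allowed_datasets: list[str]

class Agent:
    """Stateless: created once at startup, holds zero user state."""
    def __init__(self, tools):
        self.tools = tools

    def run(self, tool_name: str, deps: ChatDeps):
        ctx = RunContext(deps=deps)   # "injection" is literally just this
        return self.tools[tool_name](ctx)

def list_datasets(ctx: RunContext[ChatDeps]) -> list[str]:
    return ctx.deps.allowed_datasets

agent = Agent({"list_datasets": list_datasets})   # one agent for all users

# Same agent, different deps per request -- the caller owns the conn lifecycle
print(agent.run("list_datasets", ChatDeps(conn={}, allowed_datasets=["sales"])))
```

Note that `Agent` never stores a `ChatDeps`: deps exist only for the duration of one `run` call.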
## Level 1 — RunContext[Deps] + agent.iter()

**What you learn:** `RunContext[Deps]` is a security boundary, not just clean code.

- DuckDB as mock BQ — real SQL, fixture-style setup (`db.py`)
- `ChatDeps` with `allowed_datasets` — the LLM never sees this
- `agent.iter()` — step through the ReAct loop node by node
- Multi-tenant demo: same agent, different permissions per user

```bash
uv run level-1-basic-di/main.py
```

**Key insight:** User A gets `allowed_datasets=["sales"]`, User B gets `["sales", "marketing", "hr"]`. Same agent, same tools — the permission boundary lives in deps, invisible to the LLM.
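A hypothetical tool-side check shows where the boundary lives: the gate runs in your code, never in the prompt. The `query_dataset` function and its fields are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class ChatDeps:
    user_id: str
    allowed_datasets: list[str]   # set by YOUR code, never shown to the LLM

def query_dataset(deps: ChatDeps, dataset: str, sql: str) -> str:
    # The check runs inside the tool, outside the model's control
    if dataset not in deps.allowed_datasets:
        raise PermissionError(f"{deps.user_id} may not read {dataset!r}")
    return f"ran {sql!r} against {dataset}"

alice = ChatDeps(user_id="alice", allowed_datasets=["sales"])
bob = ChatDeps(user_id="bob", allowed_datasets=["sales", "marketing", "hr"])

print(query_dataset(bob, "hr", "SELECT 1"))   # bob is allowed
# query_dataset(alice, "hr", "SELECT 1")      # -> PermissionError
```

No amount of prompt injection can widen `allowed_datasets`, because the model never touches it.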
## Level 2 — Deferred tools

**What you learn:** Tools can return intent, not result. The agent pauses for human approval.

- `execute_sql` has `requires_approval=True` — always pauses
- Agent generates SQL → dry-runs → hits the approval gate → human decides
- `DeferredToolRequests` / `DeferredToolResults` for pause/resume

```bash
uv run level-2-deferred-tools/main.py
```

**Key insight:** The agent proposes, the human disposes. `execute_sql` never runs without explicit approval.
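The propose/approve split can be mimicked in plain Python. `DeferredRequest`, `pending`, and `resume` below are illustrative stand-ins for pydantic-ai's deferred-tool machinery, not its real API:

```python
from dataclasses import dataclass

@dataclass
class DeferredRequest:
    """The tool's *intent* -- nothing has executed yet."""
    tool: str
    sql: str

pending: dict[int, DeferredRequest] = {}

def execute_sql(sql: str) -> DeferredRequest:
    # requires_approval semantics: return intent instead of running anything
    req = DeferredRequest(tool="execute_sql", sql=sql)
    pending[id(req)] = req
    return req

def resume(req_id: int, approved: bool) -> str:
    req = pending.pop(req_id)
    if not approved:
        return "denied: query never ran"
    return f"executed: {req.sql}"

req = execute_sql("DELETE FROM sales WHERE year < 2020")
print(resume(id(req), approved=True))
```

The key property: the destructive call site lives entirely on the `resume` side of the gate.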
## Level 3 — Cost guardrail

**What you learn:** Guardrails the LLM must NOT control. Conditional approval based on cost.

- `cost_limit_usd` in deps — cheap queries auto-execute, expensive ones need approval
- `ApprovalRequired` raised conditionally (not `requires_approval=True`)
- 4 demo scenarios: cheap/auto, expensive/approved, expensive/denied, no-permission

```bash
uv run level-3-cost-guardrail/main.py
```

**Key insight:** The cost guardrail lives in deps and is enforced at runtime. The LLM proposes SQL, but the runtime decides whether to pause for approval. This is infra-level design.
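A rough stdlib-only sketch of the conditional gate — the `estimate_cost_usd` dry-run stub and the dollar figures are made up for illustration:

```python
from dataclasses import dataclass

class ApprovalRequired(Exception):
    """Raised by the runtime, not the model, when a query exceeds the cap."""

@dataclass
class ChatDeps:
    cost_limit_usd: float   # guardrail lives in deps, invisible to the LLM

def estimate_cost_usd(sql: str) -> float:
    # stand-in for a dry run; real code would ask the warehouse
    return 12.0 if "JOIN" in sql else 0.01

def execute_sql(deps: ChatDeps, sql: str, approved: bool = False) -> str:
    cost = estimate_cost_usd(sql)
    if cost > deps.cost_limit_usd and not approved:
        raise ApprovalRequired(f"${cost:.2f} exceeds ${deps.cost_limit_usd:.2f} cap")
    return f"ran (cost ${cost:.2f})"

deps = ChatDeps(cost_limit_usd=1.0)
print(execute_sql(deps, "SELECT 1"))                 # cheap: auto-executes
try:
    execute_sql(deps, "SELECT * FROM a JOIN b")      # expensive: pauses
except ApprovalRequired as e:
    print("needs approval:", e)
print(execute_sql(deps, "SELECT * FROM a JOIN b", approved=True))
```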
## Level 4 — FastAPI

**What you learn:** Agent approval across two HTTP requests. State bridged by an in-memory store.

- `POST /query` → agent runs → pauses → returns `{approval_id}`
- `GET /pending/{id}` → see the SQL + cost waiting for approval
- `POST /approve/{id}` → `{"approved": true}` → agent resumes → final result
- `conn` is NOT stored — it is re-injected on resume (this is why DI matters for durable patterns)

```bash
uv run uvicorn level-4-fastapi.main:app --reload
# then open http://localhost:8000/docs
```

**Key insight:** The agent's message history is serializable. The DB connection is not. DI lets you re-inject the connection cleanly on resume without touching agent or tool code.
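The two-request bridge reduces to a plain-Python sketch — no FastAPI here, and `start_query`/`approve` are illustrative names, but the shape is the same: serialize what's serializable, re-inject what isn't:

```python
import json
import uuid

store: dict[str, str] = {}   # approval_id -> serialized agent state

def start_query(sql: str, cost: float) -> str:
    """First HTTP request: agent pauses, state is serialized, conn is dropped."""
    approval_id = str(uuid.uuid4())
    # messages are plain data -> serializable; the DB conn is deliberately absent
    store[approval_id] = json.dumps({"sql": sql, "cost": cost, "messages": []})
    return approval_id

def approve(approval_id: str, conn: object) -> str:
    """Second HTTP request: deserialize state, re-inject a *fresh* conn via deps."""
    state = json.loads(store.pop(approval_id))
    return f"resumed with fresh conn, ran: {state['sql']}"

aid = start_query("SELECT * FROM sales", cost=3.5)
print(approve(aid, conn=object()))   # conn created by this request, never stored
```

Because `conn` is a parameter of `approve` rather than part of the stored state, the second request can supply any valid connection — which is exactly what makes the pattern survive serialization.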
## Level 5 — Multi-turn session

**What you learn:** `message_history` is the session — you own persistence, not the framework.

- `POST /session` — create a session (gets a `session_id`)
- `POST /session/{id}/chat` — send a message; the agent sees the full history each turn
- `GET /session/{id}/history` — inspect the conversation so far
- Approval mid-session: history writes back only after resolution

```bash
uv run uvicorn level-5-multi-turn.main:app --reload --port 8001
# then open http://localhost:8001/docs
```

**Key insight:** The one-line difference from level-4: `message_history=session["messages"]`. That list grows every turn. The LLM sees all of it. You decide where to store it — a dict today, Redis in level-6.
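Stripped of FastAPI and the LLM, the session mechanics look roughly like this (the echo "agent" is a stand-in for a real model call):

```python
sessions: dict[str, dict] = {}

def create_session(session_id: str) -> None:
    sessions[session_id] = {"messages": []}   # you own this list, not the framework

def chat(session_id: str, user_msg: str) -> str:
    session = sessions[session_id]
    history = session["messages"]             # full history passed every turn
    reply = f"(saw {len(history)} prior messages) echo: {user_msg}"
    # write back only after the turn resolves
    history += [{"role": "user", "content": user_msg},
                {"role": "assistant", "content": reply}]
    return reply

create_session("s1")
print(chat("s1", "hello"))   # sees 0 prior messages
print(chat("s1", "again"))   # sees 2 prior messages
```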
## Level 6 — Redis store

**What you learn:** Session state that survives process restarts. One file swap — everything else is identical to level-5.

- `store.py` swapped: in-memory dict → Redis (`redis.setex`)
- Messages serialized with `ModelMessagesTypeAdapter` (pydantic-ai's own serializer)
- 24h session TTL, 1h approval TTL
- `main.py` is byte-for-byte the same as level-5

```bash
docker run -d -p 6379:6379 redis:7-alpine
uv sync --extra redis
uv run uvicorn level-6-redis.main:app --reload --port 8002
# then open http://localhost:8002/docs
```

**Key insight:** Restart the server — sessions survive. That's the only difference from level-5. This is the foundation for durable async queues and crash recovery.
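One way to see why the swap is a single file: hide the store behind a small interface. The `SessionStore` protocol below is an illustrative sketch, not the repo's actual `store.py`; only the in-memory variant is shown in full:

```python
from __future__ import annotations

import json
import time
from typing import Protocol

class SessionStore(Protocol):
    def save(self, key: str, messages: list, ttl_s: int) -> None: ...
    def load(self, key: str) -> list | None: ...

class InMemoryStore:
    """level-5 behavior: gone on restart."""
    def __init__(self):
        self._data: dict[str, tuple[float, str]] = {}

    def save(self, key: str, messages: list, ttl_s: int) -> None:
        self._data[key] = (time.monotonic() + ttl_s, json.dumps(messages))

    def load(self, key: str) -> list | None:
        item = self._data.get(key)
        if item is None or time.monotonic() > item[0]:
            return None   # missing or past its TTL
        return json.loads(item[1])

# A Redis-backed variant keeps the same interface, e.g. something like:
#   self.r.setex(key, ttl_s, json.dumps(messages))   # 24h session TTL
# so main.py never changes -- only the store is swapped.

store: SessionStore = InMemoryStore()
store.save("s1", [{"role": "user", "content": "hi"}], ttl_s=24 * 3600)
print(store.load("s1"))
```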
## Level 7 — Multi-agent

**What you learn:** Two agents, two models, same tools. Pay for the expensive model only when needed.

- Agent A (`gemini-2.0-flash-lite`) handles simple queries directly
- If Agent A outputs `ESCALATE:`, Agent B (`gemini-2.0-flash`) takes over
- Agent B receives Agent A's SQL + data as context — no duplicate work
- Tools registered on both agents via a `for agent in (a, b)` loop

```bash
uv run level-7-multi-agent/main.py
```

**Key insight:** The "routing" is just a string check: `output.startswith("ESCALATE:")`. No graph, no router agent. Agent A passes its work to Agent B — 80% of queries handled cheaply, 20% escalated.
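The escalation pattern, minus the real models — both "agents" below are stub functions, and the `"complex"` trigger is a placeholder for what the cheap model actually decides:

```python
def agent_a(question: str) -> str:
    """Cheap model: answers directly, or hands off with an ESCALATE: prefix."""
    if "complex" in question:
        return "ESCALATE: SELECT * FROM sales -- partial work attached"
    return "answer: 42"

def agent_b(question: str, context: str) -> str:
    """Expensive model: reuses Agent A's SQL + data instead of starting over."""
    return f"expensive answer using [{context}]"

def route(question: str) -> str:
    output = agent_a(question)
    if output.startswith("ESCALATE:"):        # the whole "router" is this check
        return agent_b(question, output.removeprefix("ESCALATE:").strip())
    return output

print(route("simple question"))    # stays on the cheap path
print(route("complex question"))   # escalates with context attached
```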
| Level | Concept | Key idea |
|---|---|---|
| 0 | Vanilla DI | The mental model behind everything |
| 1 | RunContext[Deps] + agent.iter() | Security boundary + step-by-step visibility |
| 2 | Deferred tools | Human-in-the-loop (always pauses) |
| 3 | Cost guardrail | Human-in-the-loop (conditional) |
| 4 | FastAPI | Approval across two HTTP requests |
| 5 | Multi-turn session | message_history = session state |
| 6 | Redis store | Session survives restarts (durable) |
| 7 | Multi-agent | Model escalation — cheap first, expensive when needed |
## Next steps

- Real BigQuery → replace `db.py` with `google.cloud.bigquery.Client`
- Real auth → move `user_id` + `allowed_datasets` into JWT claims
- Streaming → AG-UI protocol for real-time frontend updates