Common questions about the SDK and this workshop.
You can build everything in this workshop with the raw Messages API. The SDK gives you:
- The agentic loop for free. You don't write the `while stop_reason == "tool_use"` loop yourself; you subscribe to a message stream.
- Simpler tool definition. The `@tool` decorator + Python types instead of hand-written JSON Schema.
- Sub-agents as a primitive. The `Task` tool and `AgentDefinition` pattern don't exist in the raw API — you'd build your own.
- Lifecycle hooks. `PreToolUse`, `PostToolUse`, and `UserPromptSubmit` fire automatically — no manual interception.
- Session management, retries, and streaming out of the box.
Rough LOC reduction: ~50% for equivalent behavior. See docs/CHEATSHEET.md for the full pattern reference.
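To make "the agentic loop for free" concrete, here is a sketch of the loop you would hand-roll against the raw API. `FakeMessagesAPI` is a scripted stand-in for the real Messages client, and all names here are illustrative, not the SDK's:

```python
# Sketch of the hand-rolled agentic loop the SDK replaces.
# FakeMessagesAPI stands in for the raw Messages API client.

TOOLS = {"add": lambda args: str(args["a"] + args["b"])}

class FakeMessagesAPI:
    """Scripted responses: one tool call, then a final answer."""
    def __init__(self):
        self._turns = [
            {"stop_reason": "tool_use",
             "tool_calls": [{"name": "add", "args": {"a": 2, "b": 3}}]},
            {"stop_reason": "end_turn", "text": "The answer is 5."},
        ]

    def create(self, messages):
        return self._turns.pop(0)

def run_agent(prompt):
    client = FakeMessagesAPI()
    messages = [{"role": "user", "content": prompt}]
    response = client.create(messages)
    # This is the loop the SDK writes for you: keep going while the
    # model wants to call tools, feeding results back in.
    while response["stop_reason"] == "tool_use":
        for call in response["tool_calls"]:
            result = TOOLS[call["name"]](call["args"])
            messages.append({"role": "tool", "content": result})
        response = client.create(messages)
    return response["text"]

print(run_agent("What is 2 + 3?"))  # The answer is 5.
```

With the SDK, everything in `run_agent` collapses to subscribing to the message stream.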
| | Main agent | Sub-agent |
|---|---|---|
| Who defines it | You, via `ClaudeAgentOptions` | You, via `AgentDefinition` |
| Who invokes it | You, via `client.query()` | The main agent, via `Task(subagent_type="...", prompt="...")` |
| Context window | The one you're watching | Its own, isolated |
| Sees conversation history? | Yes | No — only the prompt passed to `Task` |
The main agent is the one you talk to. Sub-agents are its workers — spun up on demand, run to completion, return one answer, discarded.
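The isolation rule in the table can be shown with a toy simulation (plain Python, not the SDK's internals): only a prompt string crosses the boundary into the sub-agent, never the main conversation history.

```python
# Toy simulation of sub-agent isolation (not the SDK's actual internals).

def sub_agent(task_prompt):
    # A sub-agent starts with a fresh context: it sees only the Task prompt.
    return f"report on: {task_prompt}"

main_history = ["user: compare vendors A and B",
                "assistant: spawning a researcher"]

# The main agent "calls Task": a prompt goes in, one answer comes back,
# and the worker is discarded. main_history never crosses the boundary.
result = sub_agent("research vendor A pricing")
main_history.append(f"tool_result: {result}")
```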
| | Tool | Hook |
|---|---|---|
| Who triggers it | The model, mid-turn | The SDK, on lifecycle events |
| Model aware of it? | Yes — in the tool list | No — invisible |
| Use for | Capabilities the agent needs | Guardrails, context injection, logging, blocking |
A tool is something the model chooses to call. A hook is a callback that fires automatically when something happens (user submits a prompt, tool is about to run, etc.).
The memory system in Stage 3 uses both: a tool to let the model choose what to save, a hook to automatically inject saved memories into every new conversation.
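The tool-plus-hook split can be sketched in a few lines (toy code with invented names, not the SDK's API): the tool only runs when the model decides to call it, while the hook fires on every new prompt without the model knowing.

```python
# Toy sketch of the Stage 3 pattern; names are invented, not the SDK's API.
MEMORIES = []

def save_memory_tool(text):
    """Tool: the model chooses when to call this, mid-turn."""
    MEMORIES.append(text)
    return f"saved: {text}"

def on_user_prompt_submit(prompt):
    """Hook: fires automatically on every prompt; invisible to the model."""
    if MEMORIES:
        return "Known facts: " + "; ".join(MEMORIES) + "\n\n" + prompt
    return prompt

save_memory_tool("user prefers metric units")        # model-initiated
enriched = on_user_prompt_submit("How tall is K2?")  # SDK-initiated
print(enriched)
```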
Two reasons:
- The workshop is about SDK patterns, not integrations. If we used real APIs, half the session would be "get your Salesforce/Jira/whatever credentials working." Mock data lets you focus on the interesting part.
- Reliability. Real APIs flake, rate-limit, change. A workshop that depends on third-party uptime is a workshop waiting to go sideways.
The @tool contract is the same either way. In production you'd swap the mock implementation for a real API call — the agent code doesn't change.
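One way to structure that swap (an illustrative pattern, not the workshop's actual code) is to inject the backend while keeping the tool function's signature and return shape fixed:

```python
# Illustrative pattern: the tool contract is fixed; only the backend changes.

def mock_fetch_ticket(ticket_id):
    """Workshop backend: canned data, no network."""
    return {"id": ticket_id, "status": "open", "title": "Sample ticket"}

def make_ticket_tool(fetch=mock_fetch_ticket):
    """In production, pass a `fetch` that hits the real API; the function
    the agent sees keeps the same signature and return shape."""
    def get_ticket(ticket_id: str) -> dict:
        return fetch(ticket_id)
    return get_ticket

get_ticket = make_ticket_tool()       # workshop: mock backend
print(get_ticket("T-123")["status"])  # open
```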
Yes. Change `MODEL` in `config.py` to any model ID your API key has access to:
- `claude-sonnet-4-6` — default, fast and cheap
- `claude-opus-4-6` — slower and pricier but highest quality
- `claude-haiku-4-5` — fastest, good for sub-agents
You can also set different models for sub-agents vs. the main agent (the starter sub-agent configs all use "haiku").
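A per-agent override looks roughly like the fragment below. The field names here are assumptions based on recent SDK docs, so verify them against the version you have installed:

```python
# Assumed field names from claude-agent-sdk docs; check your installed version.
from claude_agent_sdk import AgentDefinition, ClaudeAgentOptions

options = ClaudeAgentOptions(
    model="claude-sonnet-4-6",  # main agent
    agents={
        "researcher": AgentDefinition(
            description="Researches a single topic and reports back.",
            prompt="You are a focused researcher. Answer only the given question.",
            model="haiku",      # cheaper, faster model for sub-agent work
        ),
    },
)
```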
Roughly: the four-stage guided demo at default settings runs under $0.10 for sonnet, maybe $0.30–0.50 for opus. The SDK prints the cost after each turn ([done — 3 turn(s), $0.0234]).
Breakout iteration depends on how much you run. Budget a dollar or two for an active session.
Yes, but it's prompt-driven, not code-driven. The model spawns sub-agents by calling the Task tool. If the model emits multiple Task calls in a single turn, the SDK runs them in parallel.
To get parallelism, your system prompt needs to tell the orchestrator to do this: "When researching multiple topics, spawn a separate researcher for each in the same turn — don't wait for one to finish before starting the next."
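What "runs them in parallel" means can be pictured with a toy asyncio fan-out (a simulation of the behavior, not the SDK's scheduler):

```python
import asyncio

async def fake_task(topic):
    # Stand-in for one Task sub-agent run.
    await asyncio.sleep(0.01)
    return f"summary of {topic}"

async def orchestrate(topics):
    # Multiple Task calls emitted in one turn run concurrently,
    # like asyncio.gather; results come back in call order.
    return await asyncio.gather(*(fake_task(t) for t in topics))

results = asyncio.run(orchestrate(["pricing", "security", "support"]))
print(results)
```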
It does — but that's not the thing ENABLE_MEMORY unlocks.
Within-session memory (ask a follow-up, it remembers the context) works at every stage because ClaudeSDKClient preserves conversation history while connected. You get this for free with no toggle.
Cross-session memory (close the terminal, run again tomorrow, it still knows your preferences) is what Stage 3 adds. That's the part that needs a persistence layer.
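That persistence layer can be as small as a JSON file on disk. A toy sketch (not the workshop's Stage 3 implementation):

```python
import json
import os
import tempfile

class MemoryStore:
    """Survives process restarts by writing memories to disk."""
    def __init__(self, path):
        self.path = path

    def load(self):
        if os.path.exists(self.path):
            with open(self.path) as f:
                return json.load(f)
        return []

    def save(self, memories):
        with open(self.path, "w") as f:
            json.dump(memories, f)

path = os.path.join(tempfile.gettempdir(), "agent_memories.json")
store = MemoryStore(path)
memories = store.load()
memories.append("prefers terse answers")
store.save(memories)
# A new process tomorrow calls store.load() and gets the same list back.
```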
Great question — and the biggest gap folks flagged in the last workshop. Short answer for now:
- Write down 5–10 test prompts that cover what the agent should handle
- Run them after each config change and score pass/fail manually
- Look for regressions — did enabling a sub-agent make something worse?
The SDK doesn't have a built-in eval harness yet. For production agents, most teams build a simple harness: a list of (prompt, expected_behavior) pairs, run them, have a model judge the outputs. We're working on better guidance here.
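Such a harness can be tiny. In this sketch a keyword check stands in for the model judge, and `run_agent` is a stub for your real agent call:

```python
# Minimal eval harness sketch; a keyword check stands in for a model judge.

EVALS = [
    # (prompt, expected_behavior expressed as a keyword the output must contain)
    ("What's the refund policy?", "refund"),
    ("Summarize ticket T-1", "T-1"),
]

def run_agent(prompt):
    # Stand-in for your real agent call.
    return f"Echo: {prompt}"

def judge(output, expected_keyword):
    # Production version: ask a model "does this output satisfy X?"
    return expected_keyword.lower() in output.lower()

def run_evals():
    results = [(prompt, judge(run_agent(prompt), kw)) for prompt, kw in EVALS]
    passed = sum(ok for _, ok in results)
    return passed, len(results)

print(run_evals())  # (2, 2)
```

Re-run this after each config change; a drop in the pass count is your regression signal.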
Not for this workshop — the exercises are specifically about the SDK's Python API. But the Agent SDK and Claude Code share the same underlying primitives (tools, sub-agents, hooks), so the mental models transfer directly.
- The SDK README and its `examples/` directory
- The cookbook agents — production-shaped reference implementations
- The breakout you worked on here — swap the mock tools for real integrations and you've got a real agent