feat(managed_agents): add multiagent and outcomes cookbooks #599
- `CMA_coordinate_specialist_team.ipynb`: heterogeneous team via the multiagent coordinator config. A coordinator runs three specialists with scoped toolsets to assemble a sales proposal.
- `CMA_verify_with_outcome_grader.ipynb`: grade-and-revise loop with Outcomes. A writer drafts a cited brief, a grader checks it against a rubric, and feedback drives revisions until it passes. Includes a rubric-writing section with failure modes and a six-principle table.
- Update `managed_agents/README.md` and `registry.yaml`.
Notebook Changes
This PR modifies the following notebooks: 📓
Model check
Validated model references in the changed files against the current public models list.
Findings
`MODEL = os.environ.get("COOKBOOK_MODEL", "claude-opus-4-6")`
Note: the project's `CLAUDE.md` lists `claude-opus-4-6` as the current Opus, but the canonical docs now list `claude-opus-4-7` as the latest. `CLAUDE.md` may itself be stale.
Other checks
Recommendation
Update the default in `CMA_coordinate_specialist_team.ipynb` from `claude-opus-4-6` to `claude-opus-4-7` (and re-run the notebook to refresh outputs if the agent IDs / sample outputs were tied to the older model).
Link Review
Reviewed links in:
✅ Valid links
Anthropic documentation (HTTPS, current
Internal relative paths in
External research citations in
PR Review
Recommendation: COMMENT
Summary
Adds two well-crafted Managed Agents cookbooks: a multiagent coordinator demo (heterogeneous specialist team for sales-proposal generation) and an Outcomes/grader demo (writer + stateless grader loop with rubric-driven revision). Plus matching managed_agents/README.md and registry.yaml entries.
Pedagogy is excellent — rubric-design tables, "Why not just put the rubric in the system prompt?" sidebar, and post-run analysis sections are model examples for the cookbook. No critical issues; the items below are improvements rather than blockers.
Actionable Feedback (5 items)
- `managed_agents/CMA_coordinate_specialist_team.ipynb` (end) and `managed_agents/CMA_verify_with_outcome_grader.ipynb` (end): Add cleanup cells that call `client.beta.sessions.archive(...)` and `client.beta.environments.delete(...)`. Other CMA cookbooks tear down sessions/environments to avoid accumulating live billed resources; a learner running these repeatedly will leave a trail of `proposal-meridian` / `research-brief` environments behind.
- `CMA_coordinate_specialist_team.ipynb` (in cell with `display(Markdown(ev.input["content"]))`): Use `ev.input.get("file_path", "")` defensively, matching the verify notebook's equivalent. Direct `["file_path"]` access raises `KeyError` if a future SDK ever emits a `write` tool use without `file_path`.
- `managed_agents/README.md` and `registry.yaml`: Both new entries paraphrase the cookbook descriptions slightly differently. Keeping the README table cell text and `registry.yaml` `description` field identical (or sourced from one canonical string) prevents drift between the website and the README.
- General: Neither notebook nor the README mentions that the `managed-agents-2026-04-01` beta requires allowlisted access. A user outside the beta will get a confusing 4xx. One sentence in the README "Getting started" or in each notebook's setup section would save real debugging.
- `CMA_coordinate_specialist_team.ipynb` (in cell with `PROSPECT = {...}`): Minor: `~{PROSPECT['employees']} employees` formats as `~8500`. Use `:,` to render `8,500` so it reads consistently with the case-study summaries (`6,200`, `2,800`).
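The two smallest items above (defensive key access and thousands separators) can be illustrated with plain Python. This is only a sketch: `tool_input` is a hypothetical stand-in for the event's `input` payload, and the `PROSPECT` dict here carries just the one key mentioned in the review, not the notebook's actual contents.

```python
# Hypothetical stand-in for a `write` tool-use event's input payload.
# The real object comes from the SDK and is not reproduced here.
tool_input = {"content": "# Proposal draft"}

# Direct indexing (tool_input["file_path"]) would raise KeyError because the
# key is absent; .get() with a default returns an empty string instead.
file_path = tool_input.get("file_path", "")

# Hypothetical prospect record; only the `employees` key mirrors the review.
PROSPECT = {"employees": 8500}

# Without a format spec the integer is rendered without grouping.
plain = f"~{PROSPECT['employees']} employees"

# The `,` format spec inserts thousands separators.
grouped = f"~{PROSPECT['employees']:,} employees"

print(repr(file_path))
print(plain)
print(grouped)
```

The `,` option is part of Python's standard format specification, so no locale setup is needed for this rendering.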
Detailed Review
Code Quality
- Both notebooks correctly use `dotenv.load_dotenv()` plus `os.environ.get("COOKBOOK_MODEL", ...)` for model selection.
- Models use the non-dated aliases (`claude-opus-4-6`, `claude-sonnet-4-6`) per CLAUDE.md.
- The two notebooks pick different default models (Opus for coordinator, Sonnet for writer/grader). That is plausibly intentional, since the coordinator orchestrates while the writer/grader are simpler, but a one-line note explaining why would help readers calibrate their own choices.
- Both cookbooks reimplement an inline streaming loop instead of using `utilities.stream_until_end_turn`, because the coordinator emits `thread_created` / `thread_message_received` and the Outcomes loop emits `span.outcome_evaluation_*` events. That's the right call, but a one-sentence note (mirroring the gate-notebook commentary in `README.md`) would clarify why.
- The `make_agent` helper in the coordinator cookbook lacks type hints; not load-bearing, just a project-style nit.
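To make the type-hint nit concrete, here is a rough sketch of what an annotated helper could look like. The notebook's real `make_agent` is not shown in this review, so `make_agent_config`, its parameters, and the dict it returns are all hypothetical; only the `COOKBOOK_MODEL` override pattern is taken from the review.

```python
import os
from typing import Any

# Environment override with a fallback, as quoted in the model check.
MODEL = os.environ.get("COOKBOOK_MODEL", "claude-opus-4-6")


def make_agent_config(
    name: str,
    system: str,
    tools: list[str],
    model: str = MODEL,
) -> dict[str, Any]:
    """Build a plain config dict for one specialist (hypothetical shape).

    This does not call any SDK; it only shows where annotations would go.
    """
    return {
        "name": name,
        "system": system,
        "tools": list(tools),  # copy so callers cannot mutate shared state
        "model": model,
    }


# Hypothetical usage: a pricer specialist scoped to a single tool.
pricer = make_agent_config("pricer", "Apply the pricing rules.", ["read"])
print(pricer["model"])
```

Annotating the return type also documents the shape that downstream cells rely on, which is most of the value of hints in a teaching notebook.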
Security
- No hardcoded keys, no `os.environ` assignment of credentials, no shell-injection-prone strings. Beta header is correctly listed via `BETAS`.
- Mounted resources (`/mnt/user-data/...`) are scoped to the session's environment; nothing leaks to the host.
Suggestions
- Coordinate notebook prints diagnostic glyphs (`━━━`); verify notebook uses `✓` / `⟳`. Both render fine in Jupyter; just be aware of Windows console caveats outside Jupyter.
- `import re` and `import time` in the verify notebook are both used (`render_feedback`, elapsed-time math). No dead imports.
- Registry categories (`Agent Patterns`, `Tools`, `Evals`) match existing taxonomy. Both authors `markn-ant` and `gaganb-ant` are present in `authors.yaml`.
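If the Windows console caveat ever matters, one possible mitigation is to fall back to ASCII when the output encoding cannot represent a glyph. The helper below is a hypothetical sketch, not something either notebook contains; `safe_glyph` and its parameters are illustrative names.

```python
import sys
from typing import Optional


def safe_glyph(glyph: str, fallback: str, encoding: Optional[str] = None) -> str:
    """Return `glyph` if the target encoding can represent it, else `fallback`.

    Hypothetical helper: by default it inspects sys.stdout's encoding.
    """
    enc = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    try:
        glyph.encode(enc)
        return glyph
    except (UnicodeEncodeError, LookupError):
        # LookupError covers an unknown encoding name.
        return fallback


# A legacy Windows code page such as cp1252 has no check mark character.
print(safe_glyph("✓", "OK", "cp1252"))
print(safe_glyph("✓", "OK", "utf-8"))
```

Inside Jupyter this is unnecessary, which matches the reviewer's note that both notebooks render fine there.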
Positive Notes
- The verify notebook's "What you'll learn" → rubric-design table → live trace → "What just happened" arc is exactly the structure these explainers should follow. Catching the 8-K Exhibit 99.1 vs. 10-K distinction in pass 2 is a great concrete teaching moment.
- The coordinator notebook's `send_to_parent` payload printout (showing each subagent's raw return) is a clean way to make the multi-agent boundary visible before showing the assembled artifact.
- "Why three subagents instead of one" closing section in the coordinator notebook directly addresses the obvious reader question.
Adds two Claude Managed Agents cookbooks:
- `CMA_coordinate_specialist_team.ipynb`: Heterogeneous team via the `multiagent` coordinator config: a coordinator runs three specialists (web-search researcher, file-reading librarian, rules-based pricer) with scoped toolsets to assemble a sales proposal. Covers the `multiagent` field, the `thread_created` / `thread_message_received` event types, and why per-role tool scoping matters.
- `CMA_verify_with_outcome_grader.ipynb`: Build a grade-and-revise loop with Outcomes: a writer drafts a cited research brief, a stateless grader fetches every URL and checks every quote against a rubric, and feedback drives revisions until the brief passes. Covers `user.define_outcome`, the `span.outcome_evaluation_*` events, and how to write a rubric the grader can act on.

Plus `managed_agents/README.md` table rows and `registry.yaml` entries.