feat(managed_agents): add callable_agents and define_outcome cookbooks#592
markn-ant wants to merge 1 commit into
Moving to claude-cookbooks-private for internal review first.
PR Review
Recommendation: REQUEST_CHANGES
Summary
Adds two well-crafted CMA tutorial notebooks: CMA_coordinate_specialist_team.ipynb (heterogeneous multiagent via callable_agents) and CMA_verify_with_outcome_grader.ipynb (iterative grader loop via define_outcome). Code quality is high, model aliases are correct, and the pedagogical approach is strong. A few issues need addressing before merge.
Actionable Feedback (5 items)
- `CMA_verify_with_outcome_grader.ipynb` (in cell with `USER_MESSSAGE = """`): Typo. The variable is named `USER_MESSSAGE` (three S's). It works at runtime since definition and usage match, but it is visible to all readers of the source. Rename to `USER_MESSAGE` in both the definition and the reference two cells later.
- `CMA_verify_with_outcome_grader.ipynb` (inside the `RUBRIC` string): Incomplete sentence. `"COVERAGE CHECKLIST. Each item has a specific area"` ends without a predicate. This is sent verbatim to the grader agent. It should read something like `"Each item has a specific bar that must be cleared"` to make the instruction unambiguous.
- `CMA_verify_with_outcome_grader.ipynb` (polling cell, `elif et == "session.status_idle": done = True`): Premature loop exit. The session can go idle between the writer finishing and the grader spinning up. If this fires before `span.outcome_evaluation_end`, the loop exits with `res = None` and displays `None after 0 iterations`. The coordinate notebook guards this with `if created > 0 and idled >= created:`; add an analogous guard here (e.g. only set `done = True` on `session.status_idle` when `it > 0` or `res is not None`).
- Both notebooks (last cell): No resource cleanup. The coordinate notebook creates 4 agents, 1 environment, 9 files, and 1 session; the verify notebook creates 1 agent, 1 environment, and 1 session. None are archived. Other CMA notebooks (e.g. `CMA_iterate_fix_failing_tests.ipynb`) include a final cleanup cell calling `client.beta.sessions.archive`, `client.beta.environments.archive`, and `client.beta.agents.archive`. Add equivalent cleanup cells to both new notebooks to avoid dangling resources (especially important for tutorial readers who may run the notebook multiple times).
- Both notebooks (polling loops): No wall-clock timeout. If the session stalls without firing `session.status_idle`, both loops poll indefinitely. The coordinate notebook has an additional edge case: if `created` stays 0 (coordinator spawns no subagents), the `if created > 0` guard blocks every subsequent idle event from breaking the loop. Add a deadline guard (e.g. set `deadline = time.time() + 600` once before the loop and check `if time.time() > deadline: break` inside it; comparing against `time.time() + 600` directly would never be true), or add a prose note that this is a simplified tutorial loop, and fix the `created == 0` deadlock case with a comment or guard.
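The two polling items above (the premature idle exit and the missing wall-clock timeout) can be sketched together. This is a minimal illustration, not the notebooks' actual loop: `wait_for_outcome`, the `fetch_events` callable, and the assumption that each poll returns the full event list as plain dicts are all hypothetical. Only the event type strings `session.status_idle` and `span.outcome_evaluation_end` come from the review.

```python
import time
from typing import Callable, Iterable, Optional


def wait_for_outcome(
    fetch_events: Callable[[], Iterable[dict]],  # hypothetical: returns all events so far
    timeout_s: float = 600.0,
    poll_interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll until the grader has reported and the session is idle, or time out."""
    # The deadline is computed once, before the loop, so the comparison can trip.
    deadline = clock() + timeout_s
    seen = 0        # number of events already processed
    result = None   # last outcome-evaluation event, if any

    while clock() < deadline:
        events = list(fetch_events())
        for event in events[seen:]:
            event_type = event.get("type")
            if event_type == "span.outcome_evaluation_end":
                result = event
            elif event_type == "session.status_idle" and result is not None:
                # Idle only counts once the grader has produced a result;
                # an earlier idle (writer done, grader not started) is ignored.
                return result
        seen = len(events)
        sleep(poll_interval_s)

    raise TimeoutError(f"no outcome within {timeout_s} seconds")
```

Injecting `clock` and `sleep` is a design choice made here so the loop can be exercised without real waiting; a notebook cell would more likely call `time.monotonic()` and `time.sleep()` directly.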
Detailed Review
Code Quality
Both notebooks follow the CMA tutorial conventions well: COOKBOOK_MODEL env var, numbered sections, committed outputs, httpx raw polling explained with a comment. The for/else/break pattern in the coordinate notebook's polling loop is correct and idiomatic. The text_of() helper defensively handles both string and typed-block content shapes. The render_feedback regex stripping in the verify notebook is pragmatic and well-commented.
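For readers unfamiliar with the "string or typed-block" distinction mentioned above, here is a rough sketch of what such a defensive helper can look like. It is a hypothetical reconstruction, not the notebook's `text_of()`: the assumption that blocks are dicts or objects carrying `type` and `text` fields is mine.

```python
from typing import Any


def text_of(content: Any) -> str:
    """Return the plain text of a message's content, whatever its shape."""
    if content is None:
        return ""
    if isinstance(content, str):
        # Simple shape: the content is already a string.
        return content

    parts = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict):
            # Typed block as a plain dict (e.g. decoded JSON).
            if block.get("type") == "text":
                parts.append(block.get("text", ""))
        elif getattr(block, "type", None) == "text":
            # Typed block as an object with attributes.
            parts.append(getattr(block, "text", ""))
    # Non-text blocks (tool calls and similar) are skipped.
    return "".join(parts)
```

Usage: `text_of("hello")` returns the string unchanged, while a list of blocks is reduced to the concatenation of its text blocks.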
Security
No hardcoded secrets. client.api_key is used in the httpx header — this correctly reads from the SDK client (which sources it from ANTHROPIC_API_KEY), not from any hardcoded value. No injection risks in the notebooks.
Model Usage
Correct non-dated aliases used throughout: claude-opus-4-7 for the coordinate notebook and claude-sonnet-4-6 for the verify notebook. The choice of Opus for the coordinator (which orchestrates three parallel specialists) is appropriate.
Suggestions (non-blocking)
- Both notebooks use `%pip install anthropic` without the `-q` flag. Other CMA notebooks use `-q` to suppress noisy output. Consider changing to `%pip install -q anthropic`.
- The verify notebook re-imports `time` in the polling cell (`import re, time`) even though `time` was already imported at the top. The duplicate is harmless but slightly confusing for readers.
- Both notebooks could add 2–4 bullet-point "By the end of this notebook you will have…" learning objectives to the intro cell, following the style of `CMA_iterate_fix_failing_tests.ipynb`. Minor given the README table fills this role, but it would bring these up to par with the rest of the series.
Positive Notes
- The RUBRIC in the verify notebook is a standout artifact: the five-point specificity on the named-operator citation (GAAP, 10-K/10-Q, sec.gov only) is precise enough that a grader agent can actually verify it, and the narrative explaining what the grader caught (8-K exhibit vs. 10-K) gives readers a concrete, non-obvious example of rubric precision in practice.
- README entries are accurate and slot cleanly into the existing table format. Registry entries have correct paths, author, date, and categories.
- The `markn-ant` author is present in `authors.yaml`.
Adds two guided tutorials for the Managed Agents research-preview features:
- `CMA_coordinate_specialist_team.ipynb`: heterogeneous multiagent via `callable_agents`. A coordinator runs three specialists with scoped toolsets to assemble a sales proposal.
- `CMA_verify_with_outcome_grader.ipynb`: iterative grader loop via `define_outcome`. A writer drafts a cited brief, a grader independently verifies every citation against a rubric, feedback drives revisions.

Both follow the existing CMA tutorial conventions (COOKBOOK_MODEL env var, numbered sections, committed outputs). README table and registry.yaml updated.