From 09b89727477b38b14bf2007a73d2d0c4d8eb2450 Mon Sep 17 00:00:00 2001 From: Charmaine Lee Date: Wed, 8 Apr 2026 17:58:24 -0400 Subject: [PATCH 1/3] fix(managed_agents): unwrap markdown paragraphs for MDX rendering The docs site's MDX renderer treats single newlines as hard breaks, so hard-wrapped source paragraphs rendered with mid-sentence line breaks (most visibly, inline links appearing alone on their own line). - Reflow all prose paragraphs to single lines across 8 notebooks - Convert the 4-space-indented flow diagram in slack_data_bot to a fenced code block (MDX does not render indented code blocks) --- .../CMA_explore_unfamiliar_codebase.ipynb | 46 +--- .../CMA_gate_human_in_the_loop.ipynb | 81 ++----- .../CMA_iterate_fix_failing_tests.ipynb | 155 +++---------- .../CMA_operate_in_production.ipynb | 170 +++------------ .../CMA_orchestrate_issue_to_pr.ipynb | 76 +------ managed_agents/data_analyst_agent.ipynb | 108 +++------ managed_agents/slack_data_bot.ipynb | 111 ++-------- managed_agents/sre_incident_responder.ipynb | 206 ++++++------------ 8 files changed, 200 insertions(+), 753 deletions(-) diff --git a/managed_agents/CMA_explore_unfamiliar_codebase.ipynb b/managed_agents/CMA_explore_unfamiliar_codebase.ipynb index 73651383..9ab723d8 100644 --- a/managed_agents/CMA_explore_unfamiliar_codebase.ipynb +++ b/managed_agents/CMA_explore_unfamiliar_codebase.ipynb @@ -7,28 +7,14 @@ "source": [ "# Explore: grounding in an unfamiliar codebase\n", "\n", - "This notebook drops the agent into a repository it's never seen\n", - "before and asks it to figure out the real architecture. The\n", - "filesystem is the agent's only workspace, and only the files it\n", - "chooses to read end up in its context window, so exploration with\n", - "`ls`, `grep`, and `read` is how it builds up a mental model.\n", + "This notebook drops the agent into a repository it's never seen before and asks it to figure out the real architecture. 
The filesystem is the agent's only workspace, and only the files it chooses to read end up in its context window, so exploration with `ls`, `grep`, and `read` is how it builds up a mental model.\n", "\n", - "The interesting part is a trap we've planted in the fixture.\n", - "`ARCHITECTURE.md` describes a layout that the code no longer\n", - "follows, so an agent that trusts the docs without checking the\n", - "code will confidently give the wrong answer. Grounding, in this\n", - "context, means verifying what you read against what's actually\n", - "there rather than treating documentation as authoritative.\n", + "The interesting part is a trap we've planted in the fixture. `ARCHITECTURE.md` describes a layout that the code no longer follows, so an agent that trusts the docs without checking the code will confidently give the wrong answer. Grounding, in this context, means verifying what you read against what's actually there rather than treating documentation as authoritative.\n", "\n", "What this teaches beyond the iterate notebook:\n", "\n", - "- **Exploration before action.** A good agent reads enough of the\n", - " tree to understand it, then answers, not the other way around.\n", - "- **Adding resources mid-session.** The sidebar at the end shows\n", - " how to push more files into a running session via\n", - " `sessions.resources.add` rather than re-creating the session.\n", - " Useful when exploration uncovers something the agent should\n", - " look at next." + "- **Exploration before action.** A good agent reads enough of the tree to understand it, then answers, not the other way around.\n", + "- **Adding resources mid-session.** The sidebar at the end shows how to push more files into a running session via `sessions.resources.add` rather than re-creating the session. This is useful when exploration uncovers something the agent should look at next." ] }, { @@ -60,10 +46,7 @@ "source": [ "## 1. 
Generate the repo fixture\n", "\n", - "The repo is small enough that we build it in memory with a helper\n", - "rather than keeping a disk fixture alongside the notebook. The\n", - "helper plants a `services/` microservices layout and a stale\n", - "`ARCHITECTURE.md` that still describes the old monolithic layout." + "The repo is small enough that we build it in memory with a helper rather than keeping a disk fixture alongside the notebook. The helper plants a `services/` microservices layout and a stale `ARCHITECTURE.md` that still describes the old monolithic layout." ] }, { @@ -133,9 +116,7 @@ "source": [ "## 3. Explore and watch for the stale-doc trap\n", "\n", - "A grounded answer mentions the real `services/` layout and flags\n", - "`ARCHITECTURE.md` as out of date. An ungrounded answer parrots\n", - "the monolith layout the stale doc describes." + "A grounded answer mentions the real `services/` layout and flags `ARCHITECTURE.md` as out of date. An ungrounded answer parrots the monolith layout the stale doc describes." ] }, { @@ -176,9 +157,7 @@ "source": [ "## 4. Read back the agent's notes\n", "\n", - "The agent was told to keep notes in `/tmp/NOTES.md` as it worked.\n", - "Printing that file is a useful way to see how its understanding\n", - "of the codebase developed during exploration." + "The agent was told to keep notes in `/tmp/NOTES.md` as it worked. Printing that file is a useful way to see how its understanding of the codebase developed during exploration." ] }, { @@ -207,16 +186,9 @@ "source": [ "## Sidebar: adding more context to a running session\n", "\n", - "The `resources=` argument on `sessions.create` is the most common\n", - "way to mount files, but the API also exposes a\n", - "`/v1/sessions//resources` sub-resource for managing mounts on\n", - "an existing session. 
This is useful here: if exploration uncovers\n", - "a question that needs additional context (a config file, a\n", - "changelog, an external schema), you can drop it in without\n", - "tearing down the session.\n", + "The `resources=` argument on `sessions.create` is the most common way to mount files, but the API also exposes a `/v1/sessions//resources` sub-resource for managing mounts on an existing session. This is useful here: if exploration uncovers a question that needs additional context (a config file, a changelog, an external schema), you can drop it in without tearing down the session.\n", "\n", - "The pattern is the same upload-then-attach loop you already know,\n", - "just split across two calls instead of one:" + "The pattern is the same upload-then-attach loop you already know, just split across two calls instead of one:" ] }, { diff --git a/managed_agents/CMA_gate_human_in_the_loop.ipynb b/managed_agents/CMA_gate_human_in_the_loop.ipynb index c1012ee4..74d4bf04 100644 --- a/managed_agents/CMA_gate_human_in_the_loop.ipynb +++ b/managed_agents/CMA_gate_human_in_the_loop.ipynb @@ -7,54 +7,24 @@ "source": [ "# Gate: human-in-the-loop with custom tools\n", "\n", - "Many workflows sit in the gap between \"fully automate\" and\n", - "\"always ask a human.\" Expense approval is a classic example: the\n", - "agent can handle the clear cases on its own, but it should know\n", - "when to escalate ambiguous ones for human review. Calibration\n", - "matters here, an agent that escalates everything is exhausting\n", - "to work with, and an agent that escalates nothing is dangerous.\n", + "Many workflows sit in the gap between \"fully automate\" and \"always ask a human.\" Expense approval is a classic example: the agent can handle the clear cases on its own, but it should know when to escalate ambiguous ones for human review. 
Calibration matters here: an agent that escalates everything is exhausting to work with, and an agent that escalates nothing is dangerous.\n", "\n", - "This notebook builds an expense approver around two **custom\n", - "tools**: `decide()` for clear-cut cases and `escalate()` for\n", - "ambiguous ones. Both round-trip through your application, which\n", - "is where you either log the outcome (decide) or put it in front\n", - "of a reviewer (escalate).\n", + "This notebook builds an expense approver around two **custom tools**: `decide()` for clear-cut cases and `escalate()` for ambiguous ones. Both round-trip through your application, which is where you either log the outcome (decide) or put it in front of a reviewer (escalate).\n", "\n", "## What custom tools are\n", "\n", - "Up until now the cookbook has used the built-in `agent_toolset`\n", - "(bash, read, write, etc.), all of which run inside the sandbox\n", - "container. **Custom tools** are different: when the agent calls\n", - "one, the session pauses and emits an `agent.custom_tool_use`\n", - "event, your application sees the call, runs whatever code you\n", - "want, and POSTs back a `user.custom_tool_result` event. The\n", - "session resumes with that result in the agent's context.\n", + "Up until now the cookbook has used the built-in `agent_toolset` (bash, read, write, etc.), all of which run inside the sandbox container. **Custom tools** are different: when the agent calls one, the session pauses and emits an `agent.custom_tool_use` event; your application sees the call, runs whatever code you want, and POSTs back a `user.custom_tool_result` event. The session resumes with that result in the agent's context.\n", "\n", "This is the right shape for two situations:\n", "\n", - "1. **The data lives somewhere the sandbox can't reach.** Anything\n", - " behind your own network boundary. The agent calls back into\n", - " your application via the round-trip.\n", - "2. 
**You want a human in the loop, or your own audit and approval\n", - " layer in front of every call.** That's what this notebook does:\n", - " `decide` and `escalate` aren't just \"tools\" in the abstract,\n", - " they're the seam where your business logic and human reviewers\n", - " take over from the agent.\n", + "1. **The data lives somewhere the sandbox can't reach.** Anything behind your own network boundary. The agent calls back into your application via the round-trip.\n", + "2. **You want a human in the loop, or your own audit and approval layer in front of every call.** That's what this notebook does: `decide` and `escalate` aren't just \"tools\" in the abstract; they're the seam where your business logic and human reviewers take over from the agent.\n", "\n", - "(The other extension patterns, MCP toolsets and `resources=`\n", - "repo mounts, are covered in the operate notebook and the orchestrate notebook\n", - "respectively.)\n", + "(The other extension patterns, MCP toolsets and `resources=` repo mounts, are covered in the operate notebook and the orchestrate notebook respectively.)\n", "\n", - "The notebook has two parts. Part A drives the session by\n", - "streaming events locally and responding to each custom tool call\n", - "as it arrives, convenient during development because everything\n", - "happens in one process and you can see the behavior live. Part B\n", - "is a short pointer to the production webhook pattern, which is\n", - "walked through end-to-end in the operate notebook.\n", + "The notebook has two parts. Part A drives the session by streaming events locally and responding to each custom tool call as it arrives, which is convenient during development because everything happens in one process and you can see the behavior live. 
Part B is a short pointer to the production webhook pattern, which is walked through end-to-end in the operate notebook.\n", "\n", - "The fixture lives in `example_data/gate/` and contains a\n", - "`policy.yaml` plus twelve receipts that exercise every branch of\n", - "the policy." + "The fixture lives in `example_data/gate/` and contains a `policy.yaml` plus twelve receipts that exercise every branch of the policy." ] }, { @@ -112,17 +82,9 @@ "source": [ "## 2. Define the agent with two custom tools\n", "\n", - "Custom tools are declared in the same `tools=` array as the\n", - "built-in toolset, with `\"type\": \"custom\"` and a JSON schema for\n", - "the input. Each declaration tells the model what the tool is for\n", - "(`description`), what to call it with (`input_schema`), and what\n", - "its name is. The agent decides when to call them; your code\n", - "decides what they do when called.\n", + "Custom tools are declared in the same `tools=` array as the built-in toolset, with `\"type\": \"custom\"` and a JSON schema for the input. Each declaration tells the model what the tool is for (`description`), what to call it with (`input_schema`), and what its name is. The agent decides when to call them; your code decides what they do when called.\n", "\n", - "Here we keep the built-in `agent_toolset_20260401` enabled too,\n", - "so the agent can read the policy file and the receipts inline.\n", - "`decide` and `escalate` are the two custom tools that make every\n", - "decision a round-trip." + "Here we keep the built-in `agent_toolset_20260401` enabled too, so the agent can read the policy file and the receipts inline. `decide` and `escalate` are the two custom tools that make every decision a round-trip." 
] }, { @@ -210,12 +172,7 @@ "source": [ "## Part A: streaming locally during development\n", "\n", - "The simplest way to drive a custom-tool agent is to stream the\n", - "session's events and react to each tool call as it arrives.\n", - "`decide` calls get logged and `escalate` calls get a simulated\n", - "human decision inline. In production you would queue the\n", - "escalation and have a real reviewer come back to it later, which\n", - "is what the operate notebook covers." + "The simplest way to drive a custom-tool agent is to stream the session's events and react to each tool call as it arrives. `decide` calls get logged and `escalate` calls get a simulated human decision inline. In production you would queue the escalation and have a real reviewer come back to it later, which is what the operate notebook covers." ] }, { @@ -335,21 +292,9 @@ "source": [ "## Part B: webhooks for production\n", "\n", - "The local streaming pattern works fine during development, but\n", - "it holds an HTTP connection open while humans think, which\n", - "doesn't scale well. The production pattern instead registers a\n", - "webhook in the Console that fires on `session.status_idled`,\n", - "which is the signal that the agent is either done or waiting on\n", - "a tool result. Your server inspects the events, puts any pending\n", - "escalation in front of a reviewer, and POSTs the\n", - "`user.custom_tool_result` back whenever the human finishes, no\n", - "long-lived connection on your side.\n", + "The local streaming pattern works fine during development, but it holds an HTTP connection open while humans think, which doesn't scale well. The production pattern instead registers a webhook in the Console that fires on `session.status_idled`, which is the signal that the agent is either done or waiting on a tool result. 
Your server inspects the events, puts any pending escalation in front of a reviewer, and POSTs the `user.custom_tool_result` back whenever the human finishes, with no long-lived connection held open on your side.\n", "\n", - "The operate notebook walks through the full webhook setup end-to-end:\n", - "Console registration, HMAC signature verification, the FastAPI\n", - "handler, and the round-trip back to `events.send`. The code that\n", - "responds to the agent is identical to Part A above; only the\n", - "trigger changes (webhook push instead of streaming pull)." + "The operate notebook walks through the full webhook setup end-to-end: Console registration, HMAC signature verification, the FastAPI handler, and the round-trip back to `events.send`. The code that responds to the agent is identical to Part A above; only the trigger changes (webhook push instead of streaming pull)." ] } ], diff --git a/managed_agents/CMA_iterate_fix_failing_tests.ipynb b/managed_agents/CMA_iterate_fix_failing_tests.ipynb index 14e1f839..c7d93640 100644 --- a/managed_agents/CMA_iterate_fix_failing_tests.ipynb +++ b/managed_agents/CMA_iterate_fix_failing_tests.ipynb @@ -7,17 +7,9 @@ "source": [ "# Iterate: do → observe → fix\n", "\n", - "This is the entry-point notebook. You'll learn the Managed Agents\n", - "API surface by doing the most universal thing an agent does: try\n", - "something, read what happened, try again. We upload a tiny package\n", - "with two planted bugs, tell the agent to make the tests pass,\n", - "and watch it work the loop, run the tests, read the traceback,\n", - "edit the code, rerun, repeat until green.\n", + "This is the entry-point notebook. You'll learn the Managed Agents API surface by doing the most universal thing an agent does: try something, read what happened, try again. 
We upload a tiny package with two planted bugs, tell the agent to make the tests pass, and watch it work the loop: run the tests, read the traceback, edit the code, rerun, and repeat until green.\n", "\n", - "Along the way you'll see every API shape the rest of the cookbook\n", - "builds on: agent / environment / session, the file mount, the\n", - "event stream, and the archive call. By the end of this notebook\n", - "you'll have done everything you need to drive an agent end-to-end.\n", + "Along the way you'll see every API shape the rest of the cookbook builds on: agent / environment / session, the file mount, the event stream, and the archive call. By the end of this notebook you'll have done everything you need to drive an agent end-to-end.\n", "\n", "## Concepts\n", "\n", @@ -25,11 +17,9 @@ "\n", "- **Agent**, a reusable config (model, system prompt, tools)\n", "- **Environment**, a container template (packages, networking)\n", - "- **Session**, binds an agent and an environment, mounts any\n", - " files the agent needs, and produces an event stream\n", + "- **Session**, binds an agent and an environment, mounts any files the agent needs, and produces an event stream\n", "\n", - "You create an agent and an environment once and reuse them across\n", - "many sessions. Each session is one self-contained run." + "You create an agent and an environment once and reuse them across many sessions. Each session is one self-contained run." ] }, { @@ -57,15 +47,9 @@ "source": [ "## 1. Create the agent\n", "\n", - "The system prompt is deliberately sparse. 
We want the agent to figure out the iterate loop for itself rather than follow a step-by-step script; the test output makes the task obvious enough without further hand-holding.\n", "\n", - "`agent_toolset_20260401` is the built-in toolset: bash, read,\n", - "write, edit, glob, grep, web_fetch, and web_search. Setting\n", - "`permission_policy` to `always_allow` lets the agent run them\n", - "without round-tripping for confirmation." + "`agent_toolset_20260401` is the built-in toolset: bash, read, write, edit, glob, grep, web_fetch, and web_search. Setting `permission_policy` to `always_allow` lets the agent run them without round-tripping for confirmation." ] }, { @@ -102,10 +86,7 @@ "source": [ "## 2. Create the environment\n", "\n", - "An environment is a container template. `type: cloud` runs in\n", - "Anthropic's hosted sandbox. `networking: limited` blocks\n", - "arbitrary outbound traffic, this notebook doesn't need network\n", - "access at all, so we keep it locked down." + "An environment is a container template. `type: cloud` runs in Anthropic's hosted sandbox. `networking: limited` blocks arbitrary outbound traffic; this notebook doesn't need network access at all, so we keep it locked down." ] }, { @@ -128,15 +109,9 @@ "source": [ "## 3. Upload the failing tests\n", "\n", - "Upload files through the Files API to get back IDs. We'll mount\n", - "them on the session in step 4.\n", + "Upload files through the Files API to get back IDs. We'll mount them on the session in step 4.\n", "\n", - "`calc.py` has two planted bugs and `test_calc.py` has three\n", - "assertions that catch them. One of the failures (`test_mean`)\n", - "is downstream of the other two, which quietly teaches the agent\n", - "not to over-fix: `mean()` calls `add` and `divide` internally,\n", - "so once those are fixed `test_mean` starts passing on its own\n", - "without any direct edit to `mean()`." + "`calc.py` has two planted bugs and `test_calc.py` has three assertions that catch them. 
One of the failures (`test_mean`) is downstream of the other two, which quietly teaches the agent not to over-fix: `mean()` calls `add` and `divide` internally, so once those are fixed, `test_mean` starts passing on its own without any direct edit to `mean()`." ] }, { @@ -162,16 +137,9 @@ "source": [ "## 4. Create the session\n", "\n", - "A session binds the agent and the environment, mounts any files\n", - "the agent needs, and starts a fresh container. `resources=` is\n", - "how you put data into the container before the agent starts —\n", - "the orchestrate notebook shows how to use the same field to\n", - "clone a GitHub repo instead of mounting individual files.\n", + "A session binds the agent and the environment, mounts any files the agent needs, and starts a fresh container. `resources=` is how you put data into the container before the agent starts — the orchestrate notebook shows how to use the same field to clone a GitHub repo instead of mounting individual files.\n", "\n", - "Files mount under `/mnt/session/uploads/`, which is\n", - "read-only. The agent has to copy files into a writable directory\n", - "like `/mnt/user` or `/tmp` before it can edit them, and anything\n", - "you want to retrieve later goes in `/mnt/session/outputs/`." + "Files mount under `/mnt/session/uploads/`, which is read-only. The agent has to copy files into a writable directory like `/mnt/user` or `/tmp` before it can edit them, and anything you want to retrieve later goes in `/mnt/session/outputs/`." ] }, { @@ -200,25 +168,14 @@ "source": [ "## 5. Drive the agent and watch it work\n", "\n", - "Two steps: send a `user.message` event with the task, then read\n", - "the event stream until the agent reaches `end_turn`.\n", + "Two steps: send a `user.message` event with the task, then read the event stream until the agent reaches `end_turn`.\n", "\n", - "The stream is a server-sent event connection. 
We use it (rather\n", - "than polling, see the sidebar at the end) because the agent will\n", - "spend ~30 seconds iterating and we want to see each round live.\n", + "The stream is a server-sent event connection. We use it (rather than polling; see the sidebar at the end) because the agent will spend ~30 seconds iterating and we want to see each round live.\n", "\n", "Two patterns to internalize:\n", "\n", - "1. **Open the stream first, then send.** The `with` block opens\n", - " the SSE connection; anything you `send` inside the block is\n", - " guaranteed to be observable. Sending before opening risks\n", - " losing events that fire in the race window.\n", - "2. **Exit on `session.status_idle` with `stop_reason.type ==\n", - " \"end_turn\"`.** The session goes idle any time it's waiting for\n", - " input, both at end of turn AND when a custom tool call needs\n", - " a response. `stop_reason.type` disambiguates; `end_turn` is\n", - " our exit signal. The gate notebook shows the\n", - " `requires_action` side of the same loop." + "1. **Open the stream first, then send.** The `with` block opens the SSE connection; anything you `send` inside the block is guaranteed to be observable. Sending before opening risks losing events that fire in the race window.\n", + "2. **Exit on `session.status_idle` with `stop_reason.type == \"end_turn\"`.** The session goes idle any time it's waiting for input, both at end of turn AND when a custom tool call needs a response. `stop_reason.type` disambiguates; `end_turn` is our exit signal. The gate notebook shows the `requires_action` side of the same loop." ] }, { @@ -270,18 +227,9 @@ "id": "f17c3bb8", "metadata": {}, "source": [ - "That `match ev.type:` block is the canonical streaming pattern.\n", - "Every other notebook in this cookbook imports it as\n", - "`stream_until_end_turn` from `utilities.py` instead of repeating\n", - "the loop. 
We use it for the verify step below.\n", + "That `match ev.type:` block is the canonical streaming pattern. Every other notebook in this cookbook imports it as `stream_until_end_turn` from `utilities.py` instead of repeating the loop. We use it for the verify step below.\n", "\n", - "`wait_for_idle_status` is the second helper from `utilities.py`.\n", - "It absorbs the race described in the callout in step 7: even\n", - "after a stream has yielded `session.status_idle`, the server-side\n", - "`status` field on the session record can briefly still read\n", - "`running`, and an immediate `archive()` call would 400. The\n", - "helper just polls `sessions.retrieve` until the field settles.\n", - "Code that streams and then archives in the next breath needs it." + "`wait_for_idle_status` is the second helper from `utilities.py`. It absorbs the race described in the callout in step 7: even after a stream has yielded `session.status_idle`, the server-side `status` field on the session record can briefly still read `running`, and an immediate `archive()` call would 400. The helper just polls `sessions.retrieve` until the field settles. Code that streams and then archives in the next breath needs it." ] }, { @@ -301,10 +249,7 @@ "source": [ "## 6. Verify\n", "\n", - "Don't take the agent's word for it. Re-run every assertion one\n", - "more time independently and print the final `calc.py`. If the\n", - "agent over-fixed or regressed something between the last\n", - "in-loop run and the end of the turn, this catches it." + "Don't take the agent's word for it. Re-run every assertion one more time independently and print the final `calc.py`. If the agent over-fixed or regressed something between the last in-loop run and the end of the turn, this catches it." ] }, { @@ -344,16 +289,7 @@ "source": [ "## 7. Cleanup\n", "\n", - "Archiving is how you mark a session, environment, or agent as\n", - "finished. 
It tears down any live container, stops the resource\n", - "from counting against your workspace quotas, and hides it from\n", - "default list views, but it keeps the record, configuration,\n", - "and event history around for audit and for anyone who wants to\n", - "retrieve the resource later by ID. If you want to remove the\n", - "record entirely, most resources also expose a separate `delete`\n", - "endpoint (the operate notebook walks through resource lifecycle\n", - "in detail), but `archive` is almost always what you want at the\n", - "end of a run." + "Archiving is how you mark a session, environment, or agent as finished. It tears down any live container, stops the resource from counting against your workspace quotas, and hides it from default list views, but it keeps the record, configuration, and event history around for audit and for anyone who wants to retrieve the resource later by ID. If you want to remove the record entirely, most resources also expose a separate `delete` endpoint (the operate notebook walks through resource lifecycle in detail), but `archive` is almost always what you want at the end of a run." ] }, { @@ -377,11 +313,7 @@ "source": [ "## Sidebar: polling instead of streaming\n", "\n", - "The streaming pattern in step 5 is the right choice when you\n", - "want live progress on something the agent will spend more than a\n", - "few seconds on. For shorter tasks, or for production code where\n", - "you don't want a long-lived HTTP connection, you can do the\n", - "same thing with `events.list` polling instead:\n", + "The streaming pattern in step 5 is the right choice when you want live progress on something the agent will spend more than a few seconds on. 
For shorter tasks, or for production code where you don't want a long-lived HTTP connection, you can do the same thing with `events.list` polling instead:\n", "\n", "```python\n", "client.beta.sessions.events.send(session_id=..., events=[...])\n", @@ -403,27 +335,11 @@ "\n", "Tradeoffs:\n", "\n", - "Streaming wins when you want to watch the agent work. Every\n", - "tool call, every partial message, every state transition\n", - "arrives as soon as the server emits it, which is exactly what\n", - "you want while developing a new workflow. The cost is a\n", - "long-lived SSE connection: your process has to stay alive for\n", - "the duration of the turn, it can't pause and resume across\n", - "gaps, and a network blip at the wrong moment can drop you\n", - "mid-stream with no clean way to recover where you left off.\n", + "Streaming wins when you want to watch the agent work. Every tool call, every partial message, every state transition arrives as soon as the server emits it, which is exactly what you want while developing a new workflow. The cost is a long-lived SSE connection: your process has to stay alive for the duration of the turn, it can't pause and resume across gaps, and a network blip at the wrong moment can drop you mid-stream with no clean way to recover where you left off.\n", "\n", - "Polling wins in the opposite situation. It's stateless,\n", - "survives process restarts, and composes cleanly with webhook\n", - "handlers and queue workers that don't want to hold connections\n", - "open. The cost is latency and hidden progress: you don't see\n", - "anything until you poll again, so feedback is bounded by your\n", - "poll interval, and a long turn looks like silence until it's\n", - "done.\n", + "Polling wins in the opposite situation. It's stateless, survives process restarts, and composes cleanly with webhook handlers and queue workers that don't want to hold connections open. 
The cost is latency and hidden progress: you don't see anything until you poll again, so feedback is bounded by your poll interval, and a long turn looks like silence until it's done.\n", "\n", - "In production setups where the agent might run for minutes and\n", - "your handler can't hold a connection open, the polling pattern\n", - "(or its production cousin, the `session.status_idled` webhook\n", - "shown in the gate notebook) is what you want." + "In production setups where the agent might run for minutes and your handler can't hold a connection open, the polling pattern (or its production cousin, the `session.status_idled` webhook shown in the gate notebook) is what you want." ] }, { @@ -433,27 +349,12 @@ "source": [ "## Where to go next\n", "\n", - "The iterate loop is the simplest shape an agent loop takes. Four\n", - "companion notebooks in this directory build on the same API\n", - "shapes and show other workflows you can drive:\n", + "The iterate loop is the simplest shape an agent loop takes. Four companion notebooks in this directory build on the same API shapes and show other workflows you can drive:\n", "\n", - "- [`CMA_orchestrate_issue_to_pr.ipynb`](CMA_orchestrate_issue_to_pr.ipynb) — multi-turn\n", - " agent that carries state through a longer tool chain: read an\n", - " issue, write a fix, open a PR, recover from a CI failure,\n", - " address a review comment, and merge.\n", - "- [`CMA_explore_unfamiliar_codebase.ipynb`](CMA_explore_unfamiliar_codebase.ipynb) — the\n", - " grounding pattern for an agent dropped into a repo it has\n", - " never seen, with a planted stale-doc trap. Also shows\n", - " `sessions.resources.add` for pushing more context into a\n", - " running session.\n", - "- [`CMA_gate_human_in_the_loop.ipynb`](CMA_gate_human_in_the_loop.ipynb) — custom-tool\n", - " `decide()` and `escalate()` round-trip for human-in-the-loop\n", - " workflows. 
Covers the `requires_action` idle bounce and\n", - " parallel-tool-call dedupe.\n", - "- [`CMA_operate_in_production.ipynb`](CMA_operate_in_production.ipynb) — production\n", - " setup story: vault-backed MCP credentials, the\n", - " `session.status_idled` webhook for HITL without long-lived\n", - " connections, and the resource lifecycle CRUD verbs." + "- [`CMA_orchestrate_issue_to_pr.ipynb`](CMA_orchestrate_issue_to_pr.ipynb) — multi-turn agent that carries state through a longer tool chain: read an issue, write a fix, open a PR, recover from a CI failure, address a review comment, and merge.\n", + "- [`CMA_explore_unfamiliar_codebase.ipynb`](CMA_explore_unfamiliar_codebase.ipynb) — the grounding pattern for an agent dropped into a repo it has never seen, with a planted stale-doc trap. Also shows `sessions.resources.add` for pushing more context into a running session.\n", + "- [`CMA_gate_human_in_the_loop.ipynb`](CMA_gate_human_in_the_loop.ipynb) — custom-tool `decide()` and `escalate()` round-trip for human-in-the-loop workflows. Covers the `requires_action` idle bounce and parallel-tool-call dedupe.\n", + "- [`CMA_operate_in_production.ipynb`](CMA_operate_in_production.ipynb) — production setup story: vault-backed MCP credentials, the `session.status_idled` webhook for HITL without long-lived connections, and the resource lifecycle CRUD verbs." ] } ], diff --git a/managed_agents/CMA_operate_in_production.ipynb b/managed_agents/CMA_operate_in_production.ipynb index b0d8ed41..8ea94921 100644 --- a/managed_agents/CMA_operate_in_production.ipynb +++ b/managed_agents/CMA_operate_in_production.ipynb @@ -7,29 +7,14 @@ "source": [ "# Operate: running Managed Agents in production\n", "\n", - "Most of the other Managed Agents cookbooks focus on the agent\n", - "loop itself, getting an agent to do something useful against a\n", - "fixture. 
This one is about the machinery around that loop, the\n", - "pieces you need before you can put a Managed Agents app in\n", - "front of real users:\n", - "\n", - "1. **MCP toolsets** instead of custom tools, when your agent\n", - " needs to talk to a SaaS API without round-tripping every call\n", - " through your application.\n", - "2. **Vaults** to hold per-end-user credentials, so each user's\n", - " GitHub / Linear / Slack tokens stay separate from everyone\n", - " else's and your audit trail is clean.\n", - "3. **Webhooks** to drive human-in-the-loop work without holding\n", - " a long-lived HTTP connection open the whole time.\n", - "4. **Resource lifecycle** verbs (list, retrieve, update, archive,\n", - " delete) for managing what your workspace accumulates over time.\n", - "\n", - "We'll build one end-to-end flow that touches all four: create a\n", - "vault for a fictional end user, attach a GitHub MCP credential to\n", - "it, run an agent session that uses the credential server-side,\n", - "show the webhook handler you'd register to drive the same session\n", - "from a real production server, and walk through the management\n", - "verbs you'd use to clean up afterwards.\n", + "Most of the other Managed Agents cookbooks focus on the agent loop itself, getting an agent to do something useful against a fixture. This one is about the machinery around that loop, the pieces you need before you can put a Managed Agents app in front of real users:\n", + "\n", + "1. **MCP toolsets** instead of custom tools, when your agent needs to talk to a SaaS API without round-tripping every call through your application.\n", + "2. **Vaults** to hold per-end-user credentials, so each user's GitHub / Linear / Slack tokens stay separate from everyone else's and your audit trail is clean.\n", + "3. **Webhooks** to drive human-in-the-loop work without holding a long-lived HTTP connection open the whole time.\n", + "4. 
**Resource lifecycle** verbs (list, retrieve, update, archive, delete) for managing what your workspace accumulates over time.\n", + "\n", + "We'll build one end-to-end flow that touches all four: create a vault for a fictional end user, attach a GitHub MCP credential to it, run an agent session that uses the credential server-side, show the webhook handler you'd register to drive the same session from a real production server, and walk through the management verbs you'd use to clean up afterwards.\n", "\n", "This notebook needs `GITHUB_TOKEN` in your environment." ] @@ -61,33 +46,11 @@ "source": [ "## Concepts: MCP toolsets and vaults\n", "\n", - "**MCP toolsets** are the third extension pattern, after custom\n", - "tools (the gate notebook) and `resources=` mounts (the orchestrate notebook).\n", - "An MCP toolset points the agent at an external server that\n", - "implements the [Model Context Protocol](https://modelcontextprotocol.io/).\n", - "The agent calls tools on that server directly from inside the\n", - "sandbox, with no round-trip through your application, Anthropic\n", - "proxies the calls, the server responds, and the agent keeps\n", - "going. The vast majority of public SaaS APIs (GitHub, Slack,\n", - "Linear, Stripe, Notion, Salesforce, Asana...) either already have\n", - "an MCP server or can be wrapped in one in an afternoon, and any\n", - "of them are good MCP candidates.\n", - "\n", - "Rule of thumb: if the service is reachable over the public\n", - "internet with a bearer token, an MCP toolset will work. If\n", - "it's only reachable from inside your own network, use a custom\n", - "tool instead, which is what the gate notebook covers.\n", - "\n", - "**Vaults** are the answer to the question \"where do I put the\n", - "tokens?\" Hard-coding a single token at session creation time\n", - "works for a one-tenant setup, but it falls apart the moment you\n", - "have end users. 
Each user needs their own GitHub credential, and\n", - "you need to keep them isolated from each other. A vault is a\n", - "per-user container of credentials that you register once and\n", - "then reference by ID on every session you create for that user.\n", - "You don't run your own secret store, you don't pass tokens on\n", - "every request, and the audit trail is tied to the vault so you\n", - "always know which end user an agent was acting for." + "**MCP toolsets** are the third extension pattern, after custom tools (the gate notebook) and `resources=` mounts (the orchestrate notebook). An MCP toolset points the agent at an external server that implements the [Model Context Protocol](https://modelcontextprotocol.io/). The agent calls tools on that server directly from inside the sandbox, with no round-trip through your application, Anthropic proxies the calls, the server responds, and the agent keeps going. The vast majority of public SaaS APIs (GitHub, Slack, Linear, Stripe, Notion, Salesforce, Asana...) either already have an MCP server or can be wrapped in one in an afternoon, and any of them are good MCP candidates.\n", + "\n", + "Rule of thumb: if the service is reachable over the public internet with a bearer token, an MCP toolset will work. If it's only reachable from inside your own network, use a custom tool instead, which is what the gate notebook covers.\n", + "\n", + "**Vaults** are the answer to the question \"where do I put the tokens?\" Hard-coding a single token at session creation time works for a one-tenant setup, but it falls apart the moment you have end users. Each user needs their own GitHub credential, and you need to keep them isolated from each other. A vault is a per-user container of credentials that you register once and then reference by ID on every session you create for that user. 
You don't run your own secret store, you don't pass tokens on every request, and the audit trail is tied to the vault so you always know which end user an agent was acting for." ] }, { @@ -97,10 +60,7 @@ "source": [ "## 1. Create a vault for an end user\n", "\n", - "A vault has a `display_name` that shows up in the Console and a\n", - "`metadata` dict where you'd typically store your internal user\n", - "ID, so you can map the vault back to a record in your own\n", - "database." + "A vault has a `display_name` that shows up in the Console and a `metadata` dict where you'd typically store your internal user ID, so you can map the vault back to a record in your own database." ] }, { @@ -124,12 +84,7 @@ "source": [ "## 2. Attach an MCP credential\n", "\n", - "Credentials live under a vault. Each credential pairs an MCP\n", - "server URL with a token the agent uses when calling that server.\n", - "For the GitHub Copilot MCP server, a static bearer token (your\n", - "GitHub PAT) is the simplest form. The API also supports a full\n", - "OAuth flow with refresh for services that require it, both\n", - "shapes are handled through `auth=`." + "Credentials live under a vault. Each credential pairs an MCP server URL with the token used when the agent calls that server. For the GitHub Copilot MCP server, a static bearer token (your GitHub PAT) is the simplest form. The API also supports a full OAuth flow with refresh for services that require it; both shapes are handled through `auth=`." ] }, { @@ -158,12 +113,7 @@ "source": [ "## 3. Reference the vault on a session\n", "\n", - "Pass `vault_ids=[vault.id]` on `sessions.create` and the API\n", - "looks up the matching MCP server URL on every tool call. The\n", - "agent never sees the token itself, and you don't have to pass\n", - "it on the request. The agent definition just lists the MCP\n", - "server as usual, the credential wiring happens at session\n", - "creation time."
+ "Pass `vault_ids=[vault.id]` on `sessions.create` and the API looks up the credential matching the MCP server URL on every tool call. The agent never sees the token itself, and you don't have to pass it on the request. The agent definition just lists the MCP server as usual; the credential wiring happens at session creation time." ] }, { @@ -217,11 +167,7 @@ "source": [ "## 4. Run a turn as the end user\n", "\n", - "Everything the agent does against GitHub now flows through the\n", - "vault's credential. Auditing in your own systems is\n", - "straightforward: you know exactly which end user was acting\n", - "because the vault is tied to them via the metadata you set in\n", - "step 1." + "Everything the agent does against GitHub now flows through the vault's credential. Auditing in your own systems is straightforward: you know exactly which end user was acting because the vault is tied to them via the metadata you set in step 1." ] }, { @@ -256,31 +202,13 @@ "source": [ "## 5. Webhooks for production HITL\n", "\n", - "The streaming pattern in the gate notebook is convenient during\n", - "development because everything happens in one process, but it\n", - "holds an HTTP connection open while a human reviews, that\n", - "doesn't scale, and it doesn't survive process restarts. The\n", - "production pattern instead registers a webhook in the Console\n", - "that fires on `session.status_idled`, which is the signal that\n", - "the agent is either done OR waiting on a tool result.\n", - "\n", - "When the webhook fires, your server inspects the events, puts\n", - "any pending escalation in front of a reviewer, and POSTs the\n", - "`user.custom_tool_result` back whenever the human finishes. The\n", - "session simply sits idle until you respond, with no long-lived\n", - "connection on your side.\n", - "\n", - "Webhook registration is a one-time Console step under\n", - "**Settings → Webhooks**.
You'll get a `whsec_...` signing\n", - "secret that is shown only once at creation; store it in your\n", - "secrets manager.\n", - "\n", - "**The block below is a reference implementation, not a notebook\n", - "cell.** Copy it into your own server, it depends on FastAPI,\n", - "which the cookbook doesn't install, and it's not run as part of\n", - "this notebook's flow. Paired with the agent definition from\n", - "the gate notebook, it's enough to drive the gate workflow end-to-end\n", - "from a production server.\n", + "The streaming pattern in the gate notebook is convenient during development because everything happens in one process, but it holds an HTTP connection open while a human reviews. That doesn't scale, and it doesn't survive process restarts. The production pattern instead registers a webhook in the Console that fires on `session.status_idled`, which is the signal that the agent is either done OR waiting on a tool result.\n", + "\n", + "When the webhook fires, your server inspects the events, puts any pending escalation in front of a reviewer, and POSTs the `user.custom_tool_result` back whenever the human finishes. The session simply sits idle until you respond, with no long-lived connection on your side.\n", + "\n", + "Webhook registration is a one-time Console step under **Settings → Webhooks**. You'll get a `whsec_...` signing secret that is shown only once at creation; store it in your secrets manager.\n", + "\n", + "**The block below is a reference implementation, not a notebook cell.** Copy it into your own server. It depends on FastAPI, which the cookbook doesn't install, and it isn't run as part of this notebook's flow. Paired with the agent definition from the gate notebook, it's enough to drive the gate workflow end-to-end from a production server.\n", "\n", "```python\n", "import hmac\n", @@ -339,10 +267,7 @@ " )\n", "```\n", "\n", - "The code that responds to the agent is identical to the Part A\n", - "loop in the gate notebook.
The only thing that changes is how your\n", - "server learns there's work to do: instead of a local loop pulling\n", - "events, webhooks push notifications on your schedule." + "The code that responds to the agent is identical to the Part A loop in the gate notebook. The only thing that changes is how your server learns there's work to do: instead of a local loop pulling events, a webhook pushes a notification when the session goes idle, and your server responds on its own schedule." ] }, { @@ -352,18 +277,9 @@ "source": [ "## 6. Resource lifecycle: list, retrieve, update, archive\n", "\n", - "Every resource in the API, agents, environments, sessions,\n", - "vaults, credentials, exposes the same five-verb pattern:\n", - "`list`, `retrieve`, `update`, `archive`, and (for some)\n", - "`delete`. We'll demonstrate the full set on agents, then list\n", - "the verbs available on each other resource as a quick reference.\n", - "\n", - "**archive vs delete:** `archive` keeps the record around for\n", - "audit and retrieval but tears down any live container and stops\n", - "the resource counting against your workspace quotas. `delete`\n", - "removes the record entirely. For most workflows `archive` is the\n", - "right call; reach for `delete` only when you specifically need\n", - "the record gone (e.g. test cleanup)." + "Every resource in the API (agents, environments, sessions, vaults, credentials) exposes the same five-verb pattern: `list`, `retrieve`, `update`, `archive`, and (for some) `delete`. We'll demonstrate the full set on agents, then list the verbs available on each other resource as a quick reference.\n", + "\n", + "**archive vs delete:** `archive` keeps the record around for audit and retrieval but tears down any live container and stops the resource counting against your workspace quotas. `delete` removes the record entirely. For most workflows `archive` is the right call; reach for `delete` only when you specifically need the record gone (e.g. test cleanup)." ] }, { @@ -402,10 +318,7 @@ "source": [ "## 7.
Cleanup\n", "\n", - "Credentials and vaults have their own archive endpoints.\n", - "Archiving a vault does NOT automatically archive its\n", - "credentials, so do the credentials first if you want a clean\n", - "sweep." + "Credentials and vaults have their own archive endpoints. Archiving a vault does NOT automatically archive its credentials, so do the credentials first if you want a clean sweep." ] }, { @@ -431,25 +344,12 @@ "source": [ "## The other cookbooks\n", "\n", - "This notebook is the production-shaped bookend. The workflow\n", - "notebooks it wraps around are worth running first if you\n", - "haven't already:\n", - "\n", - "- [`CMA_iterate_fix_failing_tests.ipynb`](CMA_iterate_fix_failing_tests.ipynb) — the\n", - " entry-point notebook. Introduces agents, environments,\n", - " sessions, file mounts, and the streaming event loop through\n", - " a do-observe-fix loop on a failing test suite.\n", - "- [`CMA_orchestrate_issue_to_pr.ipynb`](CMA_orchestrate_issue_to_pr.ipynb) — multi-turn\n", - " agent that drives an issue all the way to a merged PR through\n", - " a mock gh CLI, with mid-chain recovery from a CI failure and\n", - " a review comment.\n", - "- [`CMA_explore_unfamiliar_codebase.ipynb`](CMA_explore_unfamiliar_codebase.ipynb) — the\n", - " grounding pattern, with a planted stale-doc trap. Also shows\n", - " `sessions.resources.add` for pushing more context into a\n", - " running session.\n", - "- [`CMA_gate_human_in_the_loop.ipynb`](CMA_gate_human_in_the_loop.ipynb) — custom-tool\n", - " `decide()` and `escalate()` round-trip for human-in-the-loop\n", - " workflows, paired with the webhook reference block above." + "This notebook is the production-shaped bookend. The workflow notebooks it wraps around are worth running first if you haven't already:\n", + "\n", + "- [`CMA_iterate_fix_failing_tests.ipynb`](CMA_iterate_fix_failing_tests.ipynb) — the entry-point notebook. 
Introduces agents, environments, sessions, file mounts, and the streaming event loop through a do-observe-fix loop on a failing test suite.\n", + "- [`CMA_orchestrate_issue_to_pr.ipynb`](CMA_orchestrate_issue_to_pr.ipynb) — multi-turn agent that drives an issue all the way to a merged PR through a mock gh CLI, with mid-chain recovery from a CI failure and a review comment.\n", + "- [`CMA_explore_unfamiliar_codebase.ipynb`](CMA_explore_unfamiliar_codebase.ipynb) — the grounding pattern, with a planted stale-doc trap. Also shows `sessions.resources.add` for pushing more context into a running session.\n", + "- [`CMA_gate_human_in_the_loop.ipynb`](CMA_gate_human_in_the_loop.ipynb) — custom-tool `decide()` and `escalate()` round-trip for human-in-the-loop workflows, paired with the webhook reference block above." ] } ], diff --git a/managed_agents/CMA_orchestrate_issue_to_pr.ipynb b/managed_agents/CMA_orchestrate_issue_to_pr.ipynb index 26d2eeff..61b95730 100644 --- a/managed_agents/CMA_orchestrate_issue_to_pr.ipynb +++ b/managed_agents/CMA_orchestrate_issue_to_pr.ipynb @@ -7,33 +7,16 @@ "source": [ "# Orchestrate: from issue to merged PR\n", "\n", - "This notebook walks the agent through a realistic end-to-end loop:\n", - "read a vague bug report, find the bug, fix it, open a PR, survive\n", - "CI, address review feedback, and merge. A real maintainer workflow\n", - "is never linear, and that's the point of the exercise, the agent\n", - "has to carry state across many different tool types (reading JSON,\n", - "grepping code, editing files, running a mock CLI, parsing CI\n", - "output) while recovering from two mid-chain surprises: a CI\n", - "failure and a review bot that demands a docstring.\n", + "This notebook walks the agent through a realistic end-to-end loop: read a vague bug report, find the bug, fix it, open a PR, survive CI, address review feedback, and merge. 
A real maintainer workflow is never linear, and that's the point of the exercise. The agent has to carry state across many different tool types (reading JSON, grepping code, editing files, running a mock CLI, parsing CI output) while recovering from two mid-chain surprises: a CI failure and a review bot that demands a docstring.\n", "\n", - "State flows through the chain as issue body → file paths → fix\n", - "diff → PR number → CI output → review comment → final merge. The\n", - "`gh-mock` CLI in the fixture persists everything in `.gh-state/`\n", - "so each step can see what the previous ones did, mimicking a real\n", - "GitHub workflow without any network access.\n", + "State flows through the chain as issue body → file paths → fix diff → PR number → CI output → review comment → final merge. The `gh-mock` CLI in the fixture persists everything in `.gh-state/` so each step can see what the previous ones did, mimicking a real GitHub workflow without any network access.\n", "\n", "What this teaches beyond the iterate notebook:\n", "\n", - "- **Multi-turn steering across a long chain.** The session\n", - " filesystem and conversation history persist across turns, so\n", - " each user message picks up where the last one left off. We use\n", - " that to verify the final state at the end.\n", - "- **Mid-chain recovery.** The agent has to read a CI failure or a\n", - " review comment and adapt, not just retry blindly.\n", + "- **Multi-turn steering across a long chain.** The session filesystem and conversation history persist across turns, so each user message picks up where the last one left off. We use that to verify the final state at the end.\n", + "- **Mid-chain recovery.** The agent has to read a CI failure or a review comment and adapt, not just retry blindly.\n", "\n", - "The fixture lives in `example_data/orchestrate/` and contains a\n", - "mock `gh` CLI, an issue JSON file, and a `src/` + `tests/` layout\n", - "with a planted bug."
+ "The fixture lives in `example_data/orchestrate/` and contains a mock `gh` CLI, an issue JSON file, and a `src/` + `tests/` layout with a planted bug." ] }, { @@ -65,13 +48,7 @@ "source": [ "## 1. Pack the fixture\n", "\n", - "The mock repository bundles a handful of files: `src/url_utils.py`\n", - "contains the actual bug, `src/blog.py` is a caller that makes the\n", - "bug easier to observe, `tests/test_urls.py` fails until the bug is\n", - "fixed, and the `gh-mock` CLI plus `issue_42.json` provide the\n", - "GitHub-like workflow the agent will drive. We zip the directory in\n", - "memory and upload it as a single file resource, see the sidebar\n", - "at the end for how to mount a real GitHub repository instead." + "The mock repository bundles a handful of files: `src/url_utils.py` contains the actual bug, `src/blog.py` is a caller that makes the bug easier to observe, `tests/test_urls.py` fails until the bug is fixed, and the `gh-mock` CLI plus `issue_42.json` provide the GitHub-like workflow the agent will drive. We zip the directory in memory and upload it as a single file resource; see the sidebar at the end for how to mount a real GitHub repository instead." ] }, { @@ -98,11 +75,7 @@ "source": [ "## 2. Agent + environment + session\n", "\n", - "The environment declares `pytest` as a pip dependency so the agent\n", - "can actually run the test suite as part of its CI loop. This is\n", - "the first notebook in the cookbook that needs network access for\n", - "package installation, hence the `allow_package_managers: True`\n", - "alongside the otherwise-`limited` networking config." + "The environment declares `pytest` as a pip dependency so the agent can actually run the test suite as part of its CI loop. This is the first notebook in the cookbook that needs network access for package installation, hence the `allow_package_managers: True` alongside the otherwise-`limited` networking config." ] }, { @@ -158,14 +131,7 @@ "source": [ "## 3.
Run the full chain\n", "\n", - "A single instruction kicks off the whole loop. The two recovery\n", - "points to watch for are a CI failure and a review comment. If the\n", - "agent's first fix is incomplete, for example if it only handles\n", - "`é` and misses `ü`, `gh-mock pr checks` will exit non-zero with\n", - "pytest output that the agent needs to read and iterate on. Then,\n", - "once CI is green, the reviewer bot will block the merge if\n", - "`slugify()` is missing a docstring, giving the agent one more\n", - "chance to adapt before the final merge step." + "A single instruction kicks off the whole loop. The two recovery points to watch for are a CI failure and a review comment. If the agent's first fix is incomplete, for example if it only handles `é` and misses `ü`, `gh-mock pr checks` will exit non-zero with pytest output that the agent needs to read and iterate on. Then, once CI is green, the reviewer bot will block the merge if `slugify()` is missing a docstring, giving the agent one more chance to adapt before the final merge step." ] }, { @@ -208,13 +174,7 @@ "source": [ "## 4. Multi-turn verification\n", "\n", - "Sessions are stateful: the container filesystem and the\n", - "conversation history persist across turns, so a follow-up just\n", - "sends another `user.message`. We use that here to independently\n", - "verify the final state, the mock CLI persists the PR state in\n", - "`.gh-state/pr_101.json`, so printing that file is the simplest\n", - "way to confirm the PR ended up merged with CI green and at least\n", - "one review approved before the merge." + "Sessions are stateful: the container filesystem and the conversation history persist across turns, so a follow-up just sends another `user.message`. 
We use that here to independently verify the final state: the mock CLI persists the PR state in `.gh-state/pr_101.json`, so printing that file is the simplest way to confirm the PR ended up merged with CI green and at least one review approved before the merge." ] }, { @@ -257,18 +217,9 @@ "source": [ "## Sidebar: mounting a real GitHub repository\n", "\n", - "The fixture above is a mock so the notebook can run offline and\n", - "you don't need any GitHub credentials to try it. For real work\n", - "against a real repository, swap the `{\"type\": \"file\", ...}` mount\n", - "above for a `{\"type\": \"github_repository\", ...}` resource and the\n", - "API will clone the repo into the container at session start.\n", + "The fixture above is a mock so the notebook can run offline and you don't need any GitHub credentials to try it. For real work against a real repository, swap the `{\"type\": \"file\", ...}` mount above for a `{\"type\": \"github_repository\", ...}` resource and the API will clone the repo into the container at session start.\n", "\n", - "This is the same `resources=` field as the file mount in\n", - "the iterate notebook, the list takes a mix of types, so you could also\n", - "clone the repo AND mount a separate config file in the same call.\n", - "The agent's bash/read/grep tools see the working tree as a normal\n", - "directory; the only difference is that the API handles the clone\n", - "instead of you.\n", + "This is the same `resources=` field as the file mount in the iterate notebook. The list takes a mix of types, so you could also clone the repo AND mount a separate config file in the same call. The agent's bash/read/grep tools see the working tree as a normal directory; the only difference is that the API handles the clone instead of you.\n", "\n", "```python\n", "session = client.beta.sessions.create(\n", @@ -287,10 +238,7 @@ ")\n", "```\n", "\n", - "A `GITHUB_TOKEN` is required for both private repos and to\n", - "authenticate the clone.
The clone happens once at session creation;\n", - "subsequent turns work against the same working tree without\n", - "re-cloning." + "A `GITHUB_TOKEN` is required both to authenticate the clone and to access private repos. The clone happens once at session creation; subsequent turns work against the same working tree without re-cloning." ] } ], diff --git a/managed_agents/data_analyst_agent.ipynb b/managed_agents/data_analyst_agent.ipynb index 9c47db5b..0b8ca82f 100644 --- a/managed_agents/data_analyst_agent.ipynb +++ b/managed_agents/data_analyst_agent.ipynb @@ -9,33 +9,18 @@ "\n", "## Introduction\n", "\n", - "Every team has someone who gets handed a CSV and asked \"what's\n", - "interesting in here?\" In this cookbook you'll build an agent that\n", - "answers for them: upload a CSV, get back a narrative HTML report\n", - "with interactive charts.\n", + "Every team has someone who gets handed a CSV and asked \"what's interesting in here?\" In this cookbook you'll build an agent that answers for them: upload a CSV, get back a narrative HTML report with interactive charts.\n", "\n", - "You'll run it on\n", - "[Claude Managed Agents](https://platform.claude.com/docs/en/managed-agents/overview),\n", - "Anthropic's hosted runtime for stateful, tool-using agents, built on\n", - "four core concepts:\n", + "You'll run it on [Claude Managed Agents](https://platform.claude.com/docs/en/managed-agents/overview), Anthropic's hosted runtime for stateful, tool-using agents, built on four core concepts:\n", "\n", "- **Agent**: the model, system prompt, tools, MCP servers, and skills\n", - "- **Environment**: a configured container template (packages, network\n", - " access)\n", - "- **Session**: a running agent instance within an environment,\n", - " performing a specific task and generating outputs\n", - "- **Events**: messages exchanged between your application and the\n", - " agent (user turns, tool results, status updates)\n", - "\n", - "An agent plus an environment gives you a
session. You attach your\n", - "data to it as resources, then drive it by sending events and reading\n", - "back the stream.\n", - "\n", - "Anthropic handles the sandbox, tool execution, and context\n", - "management for you. If you need full control over the agent loop and\n", - "deployment, try the\n", - "[Claude Agent SDK](https://platform.claude.com/docs/en/api/agent-sdk/overview)\n", - "instead.\n", + "- **Environment**: a configured container template (packages, network access)\n", + "- **Session**: a running agent instance within an environment, performing a specific task and generating outputs\n", + "- **Events**: messages exchanged between your application and the agent (user turns, tool results, status updates)\n", + "\n", + "An agent plus an environment gives you a session. You attach your data to it as resources, then drive it by sending events and reading back the stream.\n", + "\n", + "Anthropic handles the sandbox, tool execution, and context management for you. If you need full control over the agent loop and deployment, try the [Claude Agent SDK](https://platform.claude.com/docs/en/api/agent-sdk/overview) instead.\n", "\n", "### What you'll learn\n", "\n", @@ -49,9 +34,7 @@ "### Prerequisites\n", "\n", "- Python 3.11+\n", - "- An Anthropic API key from the\n", - " [Console](https://platform.claude.com/settings/keys), set as\n", - " `ANTHROPIC_API_KEY`\n", + "- An Anthropic API key from the [Console](https://platform.claude.com/settings/keys), set as `ANTHROPIC_API_KEY`\n", "\n", "Install dependencies:" ] @@ -91,16 +74,9 @@ "source": [ "## 1. Create an environment\n", "\n", - "An **environment** is a reusable container spec. Declaring `pandas`\n", - "and `plotly` here means every session starts with them preinstalled,\n", - "so the agent can begin analyzing immediately instead of running\n", - "`pip install` first.\n", + "An **environment** is a reusable container spec. 
Declaring `pandas` and `plotly` here means every session starts with them preinstalled, so the agent can begin analyzing immediately instead of running `pip install` first.\n", "\n", - "Networking is `unrestricted` here so the agent can load plotly from\n", - "its CDN – but that lets it reach anywhere on the internet, so for\n", - "production use a\n", - "[host allowlist](https://platform.claude.com/docs/en/managed-agents/environments)\n", - "instead." + "Networking is `unrestricted` here so the agent can load plotly from its CDN – but that lets it reach anywhere on the internet, so for production use a [host allowlist](https://platform.claude.com/docs/en/managed-agents/environments) instead." ] }, { @@ -130,17 +106,9 @@ "source": [ "## 2. Create the agent\n", "\n", - "An **agent** pairs a model with a system prompt and a set of tools.\n", - "Most of the output quality comes from the system prompt; this one\n", - "pushes for narrative structure, findings backed by specific figures,\n", - "and the right pattern for embedding multiple plotly charts in one\n", - "HTML file.\n", - "\n", - "[`agent_toolset_20260401`](https://platform.claude.com/docs/en/managed-agents/tools)\n", - "provides eight tools: `bash`, `read`, `write`, `edit`, `glob`,\n", - "`grep`, `web_fetch`, and `web_search`. Here they all run under\n", - "`always_allow`, with the two web tools disabled because this\n", - "analysis is offline." + "An **agent** pairs a model with a system prompt and a set of tools. Most of the output quality comes from the system prompt; this one pushes for narrative structure, findings backed by specific figures, and the right pattern for embedding multiple plotly charts in one HTML file.\n", + "\n", + "[`agent_toolset_20260401`](https://platform.claude.com/docs/en/managed-agents/tools) provides eight tools: `bash`, `read`, `write`, `edit`, `glob`, `grep`, `web_fetch`, and `web_search`. 
The enabled tools run under `always_allow`; the two web tools are disabled because this analysis is offline." ] }, { @@ -204,9 +172,7 @@ "source": [ "## 3. Upload the dataset\n", "\n", - "The included sample CSV has 50 rows, so the analysis completes in a\n", - "few minutes. Swap in any CSV (or a zip of CSVs) here; the rest of\n", - "the flow is identical." + "The included sample CSV has 50 rows, so the analysis completes in a few minutes. Swap in any CSV (or a zip of CSVs) here; the rest of the flow is identical." ] }, { @@ -239,13 +205,9 @@ "source": [ "## 4. Create a session and send the task\n", "\n", - "A **session** binds the agent to the environment and any mounted\n", - "files. Passing `{\"type\": \"agent\", \"id\": ..., \"version\": ...}` reuses\n", - "the versioned agent you created above. `resources` mounts the\n", - "uploaded file at the given absolute path inside the container.\n", + "A **session** binds the agent to the environment and any mounted files. Passing `{\"type\": \"agent\", \"id\": ..., \"version\": ...}` reuses the versioned agent you created above. `resources` mounts the uploaded file at the given absolute path inside the container.\n", "\n", - "After creating the session, send a `user.message` event with the\n", - "task. The agent will start working immediately." + "After creating the session, send a `user.message` event with the task. The agent will start working immediately." ] }, { @@ -300,15 +262,11 @@ "source": [ "## 5.
Stream the run\n", "\n", - "Open the session in the [Console](https://platform.claude.com/)\n", - "under **Sessions** to watch every event, tool call, and token count\n", - "live:\n", + "Open the session in the [Console](https://platform.claude.com/) under **Sessions** to watch every event, tool call, and token count live:\n", "\n", "\"Session\n", "\n", - "The helper below tails the same event stream, printing\n", - "`agent.message` text and `agent.tool_use` calls as they arrive, and\n", - "returns on `session.status_idle`." + "The helper below tails the same event stream, printing `agent.message` text and `agent.tool_use` calls as they arrive, and returns on `session.status_idle`." ] }, { @@ -347,13 +305,9 @@ "source": [ "## 6. Retrieve the report\n", "\n", - "Anything the agent writes to `/mnt/session/outputs/` is persisted\n", - "and surfaced via the Files API with `scope_id=`. Files\n", - "written elsewhere in the container are not persisted.\n", + "Anything the agent writes to `/mnt/session/outputs/` is persisted and surfaced via the Files API with `scope_id=`. Files written elsewhere in the container are not persisted.\n", "\n", - "The [Files API](https://platform.claude.com/docs/en/api/beta/files/list)\n", - "is a separate feature in beta, so to use `scope_id` here you also\n", - "need to pass the Managed Agents beta header." + "The [Files API](https://platform.claude.com/docs/en/api/beta/files/list) is a separate feature in beta, so to use `scope_id` here you also need to pass the Managed Agents beta header." ] }, { @@ -393,12 +347,7 @@ "source": [ "## 7. Clean up and next steps\n", "\n", - "You create the agent and environment once and reuse them across\n", - "runs; you create a new session for each conversation. Now that you\n", - "have the report, archive this session to release its container. 
The\n", - "lines below save the agent and environment IDs to `.env` so\n", - "[`slack_data_bot.ipynb`](slack_data_bot.ipynb) can start new\n", - "sessions with them.\n", + "You create the agent and environment once and reuse them across runs; you create a new session for each conversation. Now that you have the report, archive this session to release its container. The lines below save the agent and environment IDs to `.env` so [`slack_data_bot.ipynb`](slack_data_bot.ipynb) can start new sessions with them.\n", "\n", "> **Warning:** make sure `.env` is listed in `.gitignore` before\n", "> running the next cell – never commit it." @@ -432,18 +381,13 @@ "id": "cell-17", "metadata": {}, "source": [ - "You've built and run a data analyst agent end to end: a reusable\n", - "environment and agent, a session that mounted your CSV, a live event\n", - "stream, and a downloaded HTML report.\n", + "You've built and run a data analyst agent end to end: a reusable environment and agent, a session that mounted your CSV, a live event stream, and a downloaded HTML report.\n", "\n", "From here:\n", "\n", "- Open the downloaded `report.html` to see the narrative and charts.\n", - "- Open the session in the\n", - " [Console](https://platform.claude.com/) to inspect token usage\n", - " and the full event log.\n", - "- Continue to [`slack_data_bot.ipynb`](slack_data_bot.ipynb)\n", - " to drive this agent from Slack." + "- Open the session in the [Console](https://platform.claude.com/) to inspect token usage and the full event log.\n", + "- Continue to [`slack_data_bot.ipynb`](slack_data_bot.ipynb) to drive this agent from Slack." 
] } ], diff --git a/managed_agents/slack_data_bot.ipynb b/managed_agents/slack_data_bot.ipynb index 617ccd93..a40e6392 100644 --- a/managed_agents/slack_data_bot.ipynb +++ b/managed_agents/slack_data_bot.ipynb @@ -4,66 +4,7 @@ "cell_type": "markdown", "id": "cell-0", "metadata": {}, - "source": [ - "# Build a Slack data analyst bot with Claude Managed Agents\n", - "\n", - "## Introduction\n", - "\n", - "You'll wrap the agent from\n", - "[`data_analyst_agent.ipynb`](data_analyst_agent.ipynb) in a\n", - "Slack bot built with\n", - "[Bolt for Python](https://docs.slack.dev/tools/bolt-python/), Slack's\n", - "official framework for building apps. Mention the bot with a question\n", - "and a CSV attachment to get a narrative report posted back to the\n", - "thread. Follow-up messages continue the same session.\n", - "\n", - " user: @databot what's driving Q1 revenue? [sales.csv]\n", - " │\n", - " ▼\n", - " bot uploads the CSV and starts an agent session\n", - " │\n", - " ▼\n", - " bot streams the agent's progress back to the thread\n", - " │\n", - " ▼\n", - " bot posts the finished report to the thread\n", - "\n", - "### What you'll learn\n", - "\n", - "- Kick off an agent run from a Slack mention\n", - "- Show the agent's progress as thread updates\n", - "- Post the finished report back to the thread\n", - "- Keep the conversation going with follow-up replies\n", - "\n", - "### Prerequisites\n", - "\n", - "1. Run the install cell below.\n", - "\n", - "2. Create a [Slack app](https://api.slack.com/apps): choose\n", - " **Create New App → From a manifest**, paste\n", - " [`slack_app_manifest.yaml`](example_data/slack_data_bot/slack_app_manifest.yaml), and install\n", - " it to your workspace. The manifest enables Socket Mode (Slack\n", - " delivers events over a WebSocket, so you don't need a public URL)\n", - " and the required scopes. 
Then grab two tokens:\n", - "\n", - " - **OAuth & Permissions** → copy the Bot User OAuth Token\n", - " (`xoxb-...`)\n", - " - **Basic Information → App-Level Tokens** → generate one with\n", - " scope `connections:write` (`xapp-...`)\n", - "\n", - " In a channel you want the bot in, run `/invite @databot`.\n", - "\n", - "3. Run [`data_analyst_agent.ipynb`](data_analyst_agent.ipynb),\n", - " which saves `ANALYST_ENV_ID`, `ANALYST_AGENT_ID`, and\n", - " `ANALYST_AGENT_VERSION` to `.env`.\n", - "\n", - "The setup cell below prompts for your Slack tokens and saves them to\n", - "`.env` so you don't re-enter them on restart (or add them to `.env`\n", - "beforehand to skip the prompt). `.env` is already in `.gitignore` –\n", - "never commit it to version control. If you don't have a Slack\n", - "workspace handy you can still read through the code – each section\n", - "explains what it does – but you'll need one to run the bot." - ] + "source": "# Build a Slack data analyst bot with Claude Managed Agents\n\n## Introduction\n\nYou'll wrap the agent from [`data_analyst_agent.ipynb`](data_analyst_agent.ipynb) in a Slack bot built with [Bolt for Python](https://docs.slack.dev/tools/bolt-python/), Slack's official framework for building apps. Mention the bot with a question and a CSV attachment to get a narrative report posted back to the thread. Follow-up messages continue the same session.\n\n```text\nuser: @databot what's driving Q1 revenue? [sales.csv]\n │\n ▼\nbot uploads the CSV and starts an agent session\n │\n ▼\nbot streams the agent's progress back to the thread\n │\n ▼\nbot posts the finished report to the thread\n```\n\n### What you'll learn\n\n- Kick off an agent run from a Slack mention\n- Show the agent's progress as thread updates\n- Post the finished report back to the thread\n- Keep the conversation going with follow-up replies\n\n### Prerequisites\n\n1. Run the install cell below.\n\n2. 
Create a [Slack app](https://api.slack.com/apps): choose **Create New App → From a manifest**, paste [`slack_app_manifest.yaml`](example_data/slack_data_bot/slack_app_manifest.yaml), and install it to your workspace. The manifest enables Socket Mode (Slack delivers events over a WebSocket, so you don't need a public URL) and the required scopes. Then grab two tokens:\n\n - **OAuth & Permissions** → copy the Bot User OAuth Token (`xoxb-...`)\n - **Basic Information → App-Level Tokens** → generate one with scope `connections:write` (`xapp-...`)\n\n In a channel you want the bot in, run `/invite @databot`.\n\n3. Run [`data_analyst_agent.ipynb`](data_analyst_agent.ipynb), which saves `ANALYST_ENV_ID`, `ANALYST_AGENT_ID`, and `ANALYST_AGENT_VERSION` to `.env`.\n\nThe setup cell below prompts for your Slack tokens and saves them to `.env` so you don't re-enter them on restart (or add them to `.env` beforehand to skip the prompt). `.env` is already in `.gitignore` – never commit it to version control. If you don't have a Slack workspace handy you can still read through the code – each section explains what it does – but you'll need one to run the bot." }, { "cell_type": "code", @@ -135,16 +76,9 @@ "source": [ "## 1. Start a session when the bot is mentioned\n", "\n", - "Bolt passes an `ack` callback into every handler; calling it tells\n", - "Slack the event was received. Slack retries anything not acknowledged\n", - "[within three seconds](https://docs.slack.dev/apis/events-api/#responding),\n", - "so `on_mention` calls `ack()` immediately and hands the slow work\n", - "(file upload, session creation, streaming) to `start_analysis` on a\n", - "background thread.\n", + "Bolt passes an `ack` callback into every handler; calling it tells Slack the event was received. 
Slack retries anything not acknowledged [within three seconds](https://docs.slack.dev/apis/events-api/#responding), so `on_mention` calls `ack()` immediately and hands the slow work (file upload, session creation, streaming) to `start_analysis` on a background thread.\n", "\n", - "Each mention creates a session you can open in the\n", - "[Console](https://platform.claude.com/) under **Sessions** to\n", - "watch the full trace." + "Each mention creates a session you can open in the [Console](https://platform.claude.com/) under **Sessions** to watch the full trace." ] }, { @@ -224,15 +158,9 @@ "source": [ "## 2. Relay progress and results to the thread\n", "\n", - "The `relay_stream` function defined below is the bridge between the\n", - "two APIs: it reads from the Anthropic session event stream and posts\n", - "to Slack. It loops until the agent goes idle, then posts the final\n", - "summary and uploads any files the agent wrote.\n", + "The `relay_stream` function defined below is the bridge between the two APIs: it reads from the Anthropic session event stream and posts to Slack. It loops until the agent goes idle, then posts the final summary and uploads any files the agent wrote.\n", "\n", - "`files.list(scope_id=...)` returns every file in the session – both\n", - "the CSV we uploaded and anything the agent wrote. We filter to\n", - "`downloadable == True` so only agent-generated outputs (the report,\n", - "charts) get posted back to Slack, not the user's own input." + "`files.list(scope_id=...)` returns every file in the session – both the CSV we uploaded and anything the agent wrote. We filter to `downloadable == True` so only agent-generated outputs (the report, charts) get posted back to Slack, not the user's own input." ] }, { @@ -295,9 +223,7 @@ "source": [ "## 3. Handle follow-ups in the same session\n", "\n", - "A reply in the thread becomes another turn in the existing\n", - "session – you don't need to `@mention` the bot again. 
The container\n", - "filesystem and conversation history persist across turns." + "A reply in the thread becomes another turn in the existing session – you don't need to `@mention` the bot again. The container filesystem and conversation history persist across turns." ] }, { @@ -345,12 +271,9 @@ "source": [ "## 4. Run the bot\n", "\n", - "The cell below connects to Slack and starts listening. It blocks\n", - "while the bot runs – stop it with the ■ interrupt button when\n", - "you're done.\n", + "The cell below connects to Slack and starts listening. It blocks while the bot runs – stop it with the ■ interrupt button when you're done.\n", "\n", - "In any channel the bot is in, mention it with a CSV attached. It\n", - "posts progress, then the summary and `report.html` in the thread:\n", + "In any channel the bot is in, mention it with a CSV attached. It posts progress, then the summary and `report.html` in the thread:\n", "\n", "\"Slack\n", "\n", @@ -376,19 +299,11 @@ "source": [ "## Next steps\n", "\n", - "You've wrapped the analyst agent in a Slack bot: mentions start a\n", - "session, the event stream relays progress to the thread, outputs get\n", - "uploaded, and replies continue the same conversation.\n", + "You've wrapped the analyst agent in a Slack bot: mentions start a session, the event stream relays progress to the thread, outputs get uploaded, and replies continue the same conversation.\n", "\n", - "- Swap the agent's system prompt in\n", - " [`data_analyst_agent.ipynb`](data_analyst_agent.ipynb) to\n", - " change its analysis style. Re-running that notebook creates a new\n", - " agent and saves its ID to `.env` for the bot to pick up.\n", - "- Persist `thread_sessions` to a database so conversations survive\n", - " bot restarts.\n", - "- Move the bot out of this notebook: copy the code to a `.py` file\n", - " and deploy it anywhere that can hold a long-lived WebSocket\n", - " connection." 
+ "- Swap the agent's system prompt in [`data_analyst_agent.ipynb`](data_analyst_agent.ipynb) to change its analysis style. Re-running that notebook creates a new agent and saves its ID to `.env` for the bot to pick up.\n", + "- Persist `thread_sessions` to a database so conversations survive bot restarts.\n", + "- Move the bot out of this notebook: copy the code to a `.py` file and deploy it anywhere that can hold a long-lived WebSocket connection." ] } ], @@ -410,4 +325,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} +} \ No newline at end of file diff --git a/managed_agents/sre_incident_responder.ipynb b/managed_agents/sre_incident_responder.ipynb index ee356324..407776aa 100644 --- a/managed_agents/sre_incident_responder.ipynb +++ b/managed_agents/sre_incident_responder.ipynb @@ -9,30 +9,17 @@ "\n", "## Introduction\n", "\n", - "When a production alert fires at 3 a.m., someone has to pull the\n", - "logs, find the right runbook, trace the misconfiguration, open a PR,\n", - "and get it approved. An agent can take that first pass for you and\n", - "have a fix waiting for review by the time you're at the keyboard \u2014 as\n", - "long as it has the right context and a human makes the final call.\n", + "When a production alert fires at 3 a.m., someone has to pull the logs, find the right runbook, trace the misconfiguration, open a PR, and get it approved. An agent can take that first pass for you and have a fix waiting for review by the time you're at the keyboard — as long as it has the right context and a human makes the final call.\n", "\n", - "[Claude Managed Agents](https://platform.claude.com/docs/en/managed-agents/overview)\n", - "gives you the scalable infrastructure, sandboxing, & security pieces to build that with ease. In this\n", - "tutorial you'll wire them together:\n", + "[Claude Managed Agents](https://platform.claude.com/docs/en/managed-agents/overview) gives you the scalable infrastructure, sandboxing, and security pieces to build that with ease.
In this tutorial you'll wire them together:\n", "\n", "- A simulated **PagerDuty webhook** triggers your Claude Managed Agent with one API call.\n", - "- A **Skill** teaches the agent your team's runbook conventions, so it\n", - " knows where to look.\n", - "- The built-in `bash`/`read`/`edit` tools let it investigate logs and\n", - " infrastructure code in a sandbox.\n", - "- **Custom tools** let it open a pull request and ask a human to\n", - " approve before merging \u2014 your code handles those calls, so you\n", - " decide what \"open a PR\" actually does.\n", + "- A **Skill** teaches the agent your team's runbook conventions, so it knows where to look.\n", + "- The built-in `bash`/`read`/`edit` tools let it investigate logs and infrastructure code in a sandbox.\n", + "- **Custom tools** let it open a pull request and ask a human to approve before merging — your code handles those calls, so you decide what \"open a PR\" actually does.\n", "- The **Anthropic Console** records every step automatically, providing you complete observability.\n", "\n", - "Everything below runs with only `ANTHROPIC_API_KEY`. PagerDuty,\n", - "GitHub, and Datadog are mocked with local fixtures so you can focus on\n", - "the Managed Agents pieces; the closing section shows how to swap each\n", - "mock for the real service.\n", + "Everything below runs with only `ANTHROPIC_API_KEY`. PagerDuty, GitHub, and Datadog are mocked with local fixtures so you can focus on the Managed Agents pieces; the closing section shows how to swap each mock for the real service.\n", "\n", "### What you'll learn\n", "\n", @@ -44,8 +31,8 @@ "\n", "### Prerequisites\n", "\n", - "Set `ANTHROPIC_API_KEY` in your environment, then install\n", - "dependencies:\n" + "Set `ANTHROPIC_API_KEY` in your environment, then install dependencies:\n", + "" ] }, { @@ -101,18 +88,10 @@ "source": [ "## 1. 
Upload a runbook skill\n", "\n", - "A\n", - "[**Skill**](https://platform.claude.com/docs/en/managed-agents/skills)\n", - "is a small filesystem bundle the platform mounts into the agent's\n", - "context with progressive disclosure: the agent sees a one-line\n", - "description up front and reads the body only when it's relevant. It's\n", - "a good place for team conventions that shouldn't live in the system\n", - "prompt.\n", - "\n", - "The sample skill below encodes one rule \u2014 *consult the runbook before\n", - "touching infrastructure* \u2014 the way a real team playbook would. You\n", - "upload it once via the Skills API and reference it by ID on every\n", - "agent that needs it.\n" + "A [**Skill**](https://platform.claude.com/docs/en/managed-agents/skills) is a small filesystem bundle the platform mounts into the agent's context with progressive disclosure: the agent sees a one-line description up front and reads the body only when it's relevant. It's a good place for team conventions that shouldn't live in the system prompt.\n", + "\n", + "The sample skill below encodes one rule — *consult the runbook before touching infrastructure* — the way a real team playbook would. You upload it once via the Skills API and reference it by ID on every agent that needs it.\n", + "" ] }, { @@ -153,7 +132,7 @@ "code, or status pattern).\n", "\n", "Consult the team runbooks before proposing any fix. Runbooks are\n", - "organised by failure signature \u2014 for example `oom.md`, `5xx.md`,\n", + "organised by failure signature — for example `oom.md`, `5xx.md`,\n", "`latency.md`. 
Each one lists the triage steps for that class of\n", "failure and the configuration that usually needs to change.\n", "\n", @@ -177,18 +156,12 @@ "\n", "The agent's `tools` list combines three kinds of capability:\n", "\n", - "- [`agent_toolset_20260401`](https://platform.claude.com/docs/en/managed-agents/tools)\n", - " \u2014 the built-in `bash`, `read`, `grep`, `edit`, \u2026 tools that run\n", - " *inside* the sandbox. The agent uses these to investigate.\n", + "- [`agent_toolset_20260401`](https://platform.claude.com/docs/en/managed-agents/tools) — the built-in `bash`, `read`, `grep`, `edit`, … tools that run *inside* the sandbox. The agent uses these to investigate.\n", "- The runbook **skill** from step 1.\n", - "- Three **custom tools** \u2014 `open_pull_request`, `request_approval`,\n", - " and `merge_pull_request` \u2014 that the agent can call but *your\n", - " application* executes. They're how the agent reaches systems\n", - " outside the sandbox, and how you put a human in the loop.\n", - "\n", - "The system prompt is persona and workflow only. The alert itself\n", - "arrives as the first user event, so the same agent handles any\n", - "incident.\n" + "- Three **custom tools** — `open_pull_request`, `request_approval`, and `merge_pull_request` — that the agent can call but *your application* executes. They're how the agent reaches systems outside the sandbox, and how you put a human in the loop.\n", + "\n", + "The system prompt is persona and workflow only. The alert itself arrives as the first user event, so the same agent handles any incident.\n", + "" ] }, { @@ -232,7 +205,7 @@ " Otherwise stop and report.\n", "\n", "Never call merge_pull_request unless request_approval returned\n", - "\"approved\". Keep the fix minimal \u2014 do not refactor unrelated config.\n", + "\"approved\". Keep the fix minimal — do not refactor unrelated config.\n", "\"\"\"\n", "\n", "agent = client.beta.agents.create(\n", @@ -302,18 +275,9 @@ "source": [ "## 3. 
Create an environment and mount the data\n", "\n", - "The agent needs three things in its workspace to investigate: the\n", - "recent service logs, the infrastructure repo, and the team runbooks.\n", - "Upload each via the Files API and list them as `resources` so they're\n", - "mounted into every session at the paths the system prompt expects. A\n", - "`limited`-networking cloud environment is enough because the agent\n", - "only needs its own filesystem.\n", + "The agent needs three things in its workspace to investigate: the recent service logs, the infrastructure repo, and the team runbooks. Upload each via the Files API and list them as `resources` so they're mounted into every session at the paths the system prompt expects. A `limited`-networking cloud environment is enough because the agent only needs its own filesystem.\n", "\n", - "To keep this notebook runnable with only `ANTHROPIC_API_KEY`, the\n", - "infra \"repo\" is a single manifest with a too-low `memory: 128Mi`\n", - "limit. In production you'd replace that upload with a\n", - "`github_repository` resource that clones the real repo straight into\n", - "the sandbox:\n", + "To keep this notebook runnable with only `ANTHROPIC_API_KEY`, the infra \"repo\" is a single manifest with a too-low `memory: 128Mi` limit. In production you'd replace that upload with a `github_repository` resource that clones the real repo straight into the sandbox:\n", "\n", "```python\n", "{\n", @@ -323,7 +287,8 @@ " \"checkout\": {\"type\": \"branch\", \"name\": \"main\"},\n", " \"mount_path\": \"infra\",\n", "}\n", - "```\n" + "```\n", + "" ] }, { @@ -380,14 +345,8 @@ "source": [ "## 4. 
Handle the incident alert\n", "\n", - "The handler below is the one function you'd deploy \u2014 a Flask or\n", - "FastAPI route that your alerting system calls when an incident fires.\n", - "It creates a session referencing the agent and environment, mounts\n", - "the data, and sends the alert JSON as the first `user.message` event.\n", - "This example uses a [PagerDuty V3\n", - "webhook](https://developer.pagerduty.com/docs/webhooks-overview)\n", - "payload, but any pager that can POST JSON works the same way; here\n", - "you call the handler directly with the fixture.\n" + "The handler below is the one function you'd deploy — a Flask or FastAPI route that your alerting system calls when an incident fires. It creates a session referencing the agent and environment, mounts the data, and sends the alert JSON as the first `user.message` event. This example uses a [PagerDuty V3 webhook](https://developer.pagerduty.com/docs/webhooks-overview) payload, but any pager that can POST JSON works the same way; here you call the handler directly with the fixture.\n", + "" ] }, { @@ -446,25 +405,12 @@ "source": [ "## 5. Service the agent's custom tool calls\n", "\n", - "This is where the built-in tools and your custom tools come together.\n", - "The agent's `read`/`bash`/`edit` calls run on the container and\n", - "appear in the event log as `agent.tool_use` \u2014 that's the\n", - "investigation, and you just print it. 
But when the agent calls one of\n", - "your custom tools, the session goes `idle` with\n", - "`stop_reason.type == \"requires_action\"` and waits for *your\n", - "application* to respond with a `user.custom_tool_result`.\n", - "\n", - "The loop below polls `events.list`, answers `open_pull_request` and\n", - "`merge_pull_request` inline by writing to a local list \u2014 that's the\n", - "GitHub mock \u2014 but **returns** when `request_approval` arrives,\n", - "because that one needs a human.\n", - "\n", - "In production, \"needs a human\" usually means *post it to Slack*: drop\n", - "the agent's summary into the on-call channel with an **Approve**\n", - "button, and send the result back when someone clicks. The\n", - "[`slack_data_bot` cookbook](slack_data_bot.ipynb) shows the Bolt\n", - "wiring for that; here you'll approve inline in the next cell so the\n", - "notebook stays self-contained.\n" + "This is where the built-in tools and your custom tools come together. The agent's `read`/`bash`/`edit` calls run on the container and appear in the event log as `agent.tool_use` — that's the investigation, and you just print it. But when the agent calls one of your custom tools, the session goes `idle` with `stop_reason.type == \"requires_action\"` and waits for *your application* to respond with a `user.custom_tool_result`.\n", + "\n", + "The loop below polls `events.list`, answers `open_pull_request` and `merge_pull_request` inline by writing to a local list — that's the GitHub mock — but **returns** when `request_approval` arrives, because that one needs a human.\n", + "\n", + "In production, \"needs a human\" usually means *post it to Slack*: drop the agent's summary into the on-call channel with an **Approve** button, and send the result back when someone clicks. 
The [`slack_data_bot` cookbook](slack_data_bot.ipynb) shows the Bolt wiring for that; here you'll approve inline in the next cell so the notebook stays self-contained.\n", + "" ] }, { @@ -494,7 +440,7 @@ "\n", "**Failure Signature:**\n", "- Service starts and warms pricing cache (14,092 entries)\n", - "- Heap memory grows rapidly: 101MB \u2192 118MB \u2192 121MB (against 128MB limit)\n", + "- Heap memory grows rapidly: 101MB → 118MB → 121MB (against 128MB limit)\n", "- `pricing.recompute` function attempts allocation during garbage collection pause (412ms GC pause observed)\n", "- Container gets OOMKilled (exit 137) after ~2 minutes\n", "- Service restarts and immediately repeats the cycle\n", @@ -532,11 +478,11 @@ "Now let me create a unified diff:\n", " [bash]\n", "Perfect! Now let me open a pull request with this fix:\n", - "\u2192 open_pull_request\n", + "→ open_pull_request\n", "\n", - "\u2500\u2500 PR #1: Fix checkout-svc OOMKilled crash-loop by increasing memory limits \u2500\u2500\n", + "── PR #1: Fix checkout-svc OOMKilled crash-loop by increasing memory limits ──\n", "Excellent! PR #1 has been created. 
Now requesting approval from the on-call engineer:\n", - "\u2192 request_approval\n" + "→ request_approval\n" ] } ], @@ -550,7 +496,7 @@ " if name == \"open_pull_request\":\n", " n = len(prs) + 1\n", " prs.append({\"number\": n, \"merged\": False, **args})\n", - " print(f\"\\n\u2500\u2500 PR #{n}: {args['title']} \u2500\u2500\")\n", + " print(f\"\\n── PR #{n}: {args['title']} ──\")\n", " return {\"pr_number\": n, \"url\": f\"mock://infra/pull/{n}\"}\n", " if name == \"merge_pull_request\":\n", " prs[args[\"pr_number\"] - 1][\"merged\"] = True\n", @@ -578,7 +524,7 @@ " print(f\"\\n [{ev.name}]\")\n", " elif ev.type == \"agent.custom_tool_use\":\n", " custom_calls[ev.id] = ev\n", - " print(f\"\\n\u2192 {ev.name}\")\n", + " print(f\"\\n→ {ev.name}\")\n", " elif ev.type == \"session.status_idle\":\n", " idle_stop = ev.stop_reason\n", " elif ev.type == \"session.status_terminated\":\n", @@ -619,11 +565,8 @@ "id": "186ea378", "metadata": {}, "source": [ - "The agent has read the logs, matched the `OOMKilled` signature to\n", - "`runbooks/oom.md` via the skill, found the 128Mi memory limit in\n", - "`infra/k8s/checkout-deploy.yaml`, edited it, opened a PR, and is now\n", - "waiting on you. This is the message that would land in your `#oncall`\n", - "Slack channel:\n" + "The agent has read the logs, matched the `OOMKilled` signature to `runbooks/oom.md` via the skill, found the 128Mi memory limit in `infra/k8s/checkout-deploy.yaml`, edited it, opened a PR, and is now waiting on you. 
This is the message that would land in your `#oncall` Slack channel:\n", + "" ] }, { @@ -655,8 +598,8 @@ "\n", "## Fix\n", "Increase memory allocation to provide adequate headroom:\n", - "- **Memory request:** 128Mi \u2192 256Mi\n", - "- **Memory limit:** 128Mi \u2192 512Mi\n", + "- **Memory request:** 128Mi → 256Mi\n", + "- **Memory limit:** 128Mi → 512Mi\n", "\n", "This provides 4x headroom for the pricing cache and normal operations while remaining resource-efficient (512Mi limit is standard for Java/similar workloads with caching).\n", "\n", @@ -679,7 +622,7 @@ " httpGet:\n", " path: /healthz\n", "\n", - "\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n", + "────────────────────────────────────────────────────────────\n", "APPROVAL REQUESTED: **Incident:** checkout-svc CrashLoopBackOff with OOMKilled (7 restarts in 5 min)\n", "\n", "**Root Cause:** Memory limit (128Mi) insufficient for 14k-entry pricing cache\n", @@ -694,7 +637,7 @@ "pr = prs[0]\n", "print(pr[\"body\"], \"\\n\")\n", "print(pr[\"diff\"])\n", - "print(\"\\n\" + \"\u2500\" * 60)\n", + "print(\"\\n\" + \"─\" * 60)\n", "print(\"APPROVAL REQUESTED:\", pending_approvals[0][\"summary\"])" ] }, @@ -705,10 +648,8 @@ "source": [ "## 6. Approve and let the agent merge\n", "\n", - "Send `\"approved\"` back as the `request_approval` result. The agent\n", - "resumes, calls `merge_pull_request`, and ends its turn. In the Slack\n", - "version this send happens in your button-click handler \u2014 the payload\n", - "is identical.\n" + "Send `\"approved\"` back as the `request_approval` result. The agent resumes, calls `merge_pull_request`, and ends its turn. 
In the Slack version this send happens in your button-click handler — the payload is identical.\n", + "" ] }, { @@ -729,16 +670,16 @@ "output_type": "stream", "text": [ "Great! The fix has been approved. Now merging:\n", - "\u2192 merge_pull_request\n", - "## \u2705 Incident Resolved\n", + "→ merge_pull_request\n", + "## ✅ Incident Resolved\n", "\n", "**Summary:**\n", "- **Status:** MERGED (PR #1)\n", "- **Failure:** checkout-svc crash-loop with OOMKilled (exit 137)\n", "- **Root Cause:** Memory limit of 128Mi was insufficient for pricing cache (14,092 entries)\n", "- **Fix Applied:** \n", - " - Memory request: 128Mi \u2192 256Mi\n", - " - Memory limit: 128Mi \u2192 512Mi\n", + " - Memory request: 128Mi → 256Mi\n", + " - Memory limit: 128Mi → 512Mi\n", "\n", "**What Happened:**\n", "1. Service loaded 14k+ pricing cache entries during startup\n", @@ -782,18 +723,14 @@ "source": [ "## 7. Review the run in the Console\n", "\n", - "Because the investigation ran as a Managed Agents session, every step\n", - "above \u2014 the file reads, the `bash` diff, the manifest edit, the three\n", - "custom tool calls, and the approval you sent \u2014 is persisted as an\n", - "event on the session. Open it in the\n", - "[Console](https://platform.claude.com/) under **Managed Agents \u2192\n", - "Sessions** for the full audit trail with no extra instrumentation:\n", + "Because the investigation ran as a Managed Agents session, every step above — the file reads, the `bash` diff, the manifest edit, the three custom tool calls, and the approval you sent — is persisted as an event on the session. 
Open it in the [Console](https://platform.claude.com/) under **Managed Agents → Sessions** for the full audit trail with no extra instrumentation:\n", "\n", "![Console session view for the incident-response run](attachment:console_session.png)\n", "\n", "### Cleanup\n", "\n", - "Archive the session and the resources you created.\n" + "Archive the session and the resources you created.\n", + "" ], "attachments": { "console_session.png": { @@ -841,11 +778,7 @@ "\n", "Three swaps take this from notebook to on-call.\n", "\n", - "**Approve in Slack.** When `request_approval` arrives, post it to the\n", - "on-call channel with Block Kit buttons and send the\n", - "`user.custom_tool_result` back from the action handler. The\n", - "[`slack_data_bot` cookbook](slack_data_bot.ipynb) covers the Bolt app\n", - "setup; the approval-specific bit is small:\n", + "**Approve in Slack.** When `request_approval` arrives, post it to the on-call channel with Block Kit buttons and send the `user.custom_tool_result` back from the action handler. 
The [`slack_data_bot` cookbook](slack_data_bot.ipynb) covers the Bolt app setup; the approval-specific bit is small:\n", "\n", "```python\n", "def post_for_approval(session_id, event_id, summary):\n", @@ -876,16 +809,9 @@ " )\n", "```\n", "\n", - "To drop the polling loop entirely, register a Console webhook on\n", - "`session.requires_action` \u2014 the platform calls your endpoint the\n", - "moment the agent pauses, and you post to Slack from there.\n", + "To drop the polling loop entirely, register a Console webhook on `session.requires_action` — the platform calls your endpoint the moment the agent pauses, and you post to Slack from there.\n", "\n", - "**GitHub instead of the mock.** Drop the `open_pull_request` /\n", - "`merge_pull_request` custom tools and give the agent the GitHub MCP\n", - "server instead, with the token stored in a vault so it never appears\n", - "in your code.\n", - "[`CMA_operate_in_production.ipynb`](CMA_operate_in_production.ipynb)\n", - "walks through per-user credentials.\n", + "**GitHub instead of the mock.** Drop the `open_pull_request` / `merge_pull_request` custom tools and give the agent the GitHub MCP server instead, with the token stored in a vault so it never appears in your code. 
[`CMA_operate_in_production.ipynb`](CMA_operate_in_production.ipynb) walks through per-user credentials.\n", "\n", "```python\n", "agent = client.beta.agents.create(\n", @@ -899,14 +825,13 @@ "session = client.beta.sessions.create(..., vault_ids=[github_vault.id])\n", "```\n", "\n", - "**Live logs instead of a fixture.** Pass `DD_API_KEY` / `DD_APP_KEY`\n", - "through the environment config and let the agent `curl` the Datadog\n", - "Logs API from `bash` instead of reading a mounted file.\n", + "**Live logs instead of a fixture.** Pass `DD_API_KEY` / `DD_APP_KEY` through the environment config and let the agent `curl` the Datadog Logs API from `bash` instead of reading a mounted file.\n", "\n", "> See also the [Agent SDK site-reliability\n", "> agent](https://github.com/anthropics/claude-cookbooks/blob/main/claude_agent_sdk/03_The_site_reliability_agent.ipynb)\n", "> for the same problem solved with the local Agent SDK instead of the\n", - "> hosted Managed Agents runtime.\n" + "> hosted Managed Agents runtime.\n", + "" ] }, { @@ -916,17 +841,14 @@ "source": [ "## What you learned\n", "\n", - "- Trigger a session from any external event \u2014 one API call from a\n", - " PagerDuty webhook started the whole run.\n", + "- Trigger a session from any external event — one API call from a PagerDuty webhook started the whole run.\n", "- Attach a **Skill** to give the agent your team's conventions.\n", - "- Mount data with **resources**: `github_repository` for code, `file`\n", - " for logs and runbooks.\n", - "- Use **custom tools** to call back into your app and gate actions on\n", - " human approval via `requires_action`.\n", + "- Mount data with **resources**: `github_repository` for code, `file` for logs and runbooks.\n", + "- Use **custom tools** to call back into your app and gate actions on human approval via `requires_action`.\n", "- Get the full audit trail for free in the **Console** session view.\n", "\n", - "Swap the mocks for GitHub MCP, a Slack approval 
button, and live\n", - "logs, and it's ready for on-call.\n" + "Swap the mocks for GitHub MCP, a Slack approval button, and live logs, and it's ready for on-call.\n", + "" ] } ], @@ -942,4 +864,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} \ No newline at end of file +} From 65da848d7632a11c3ee692ef2e39c23f088d2729 Mon Sep 17 00:00:00 2001 From: Charmaine Lee Date: Wed, 8 Apr 2026 18:07:08 -0400 Subject: [PATCH 2/3] =?UTF-8?q?fix(managed=5Fagents):=20address=20review?= =?UTF-8?q?=20=E2=80=94=20blockquotes,=20source=20normalization?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Join multi-line blockquotes in sre_incident_responder (Next steps) and data_analyst_agent (Warning) onto single lines so inline links and prose render correctly in MDX - Remove trailing empty-string source elements in sre_incident_responder markdown cells (reflow artifact) - Normalize slack_data_bot cell 0 source from string to array form for consistency with all other cells --- managed_agents/data_analyst_agent.ipynb | 3 +- managed_agents/slack_data_bot.ipynb | 45 ++++++++++++++++++++- managed_agents/sre_incident_responder.ipynb | 36 +++++------------ 3 files changed, 55 insertions(+), 29 deletions(-) diff --git a/managed_agents/data_analyst_agent.ipynb b/managed_agents/data_analyst_agent.ipynb index 0b8ca82f..37d1c8c8 100644 --- a/managed_agents/data_analyst_agent.ipynb +++ b/managed_agents/data_analyst_agent.ipynb @@ -349,8 +349,7 @@ "\n", "You create the agent and environment once and reuse them across runs; you create a new session for each conversation. Now that you have the report, archive this session to release its container. The lines below save the agent and environment IDs to `.env` so [`slack_data_bot.ipynb`](slack_data_bot.ipynb) can start new sessions with them.\n", "\n", - "> **Warning:** make sure `.env` is listed in `.gitignore` before\n", - "> running the next cell – never commit it." 
+ "> **Warning:** make sure `.env` is listed in `.gitignore` before running the next cell – never commit it." ] }, { diff --git a/managed_agents/slack_data_bot.ipynb b/managed_agents/slack_data_bot.ipynb index a40e6392..9ece0e31 100644 --- a/managed_agents/slack_data_bot.ipynb +++ b/managed_agents/slack_data_bot.ipynb @@ -4,7 +4,48 @@ "cell_type": "markdown", "id": "cell-0", "metadata": {}, - "source": "# Build a Slack data analyst bot with Claude Managed Agents\n\n## Introduction\n\nYou'll wrap the agent from [`data_analyst_agent.ipynb`](data_analyst_agent.ipynb) in a Slack bot built with [Bolt for Python](https://docs.slack.dev/tools/bolt-python/), Slack's official framework for building apps. Mention the bot with a question and a CSV attachment to get a narrative report posted back to the thread. Follow-up messages continue the same session.\n\n```text\nuser: @databot what's driving Q1 revenue? [sales.csv]\n │\n ▼\nbot uploads the CSV and starts an agent session\n │\n ▼\nbot streams the agent's progress back to the thread\n │\n ▼\nbot posts the finished report to the thread\n```\n\n### What you'll learn\n\n- Kick off an agent run from a Slack mention\n- Show the agent's progress as thread updates\n- Post the finished report back to the thread\n- Keep the conversation going with follow-up replies\n\n### Prerequisites\n\n1. Run the install cell below.\n\n2. Create a [Slack app](https://api.slack.com/apps): choose **Create New App → From a manifest**, paste [`slack_app_manifest.yaml`](example_data/slack_data_bot/slack_app_manifest.yaml), and install it to your workspace. The manifest enables Socket Mode (Slack delivers events over a WebSocket, so you don't need a public URL) and the required scopes. Then grab two tokens:\n\n - **OAuth & Permissions** → copy the Bot User OAuth Token (`xoxb-...`)\n - **Basic Information → App-Level Tokens** → generate one with scope `connections:write` (`xapp-...`)\n\n In a channel you want the bot in, run `/invite @databot`.\n\n3. 
Run [`data_analyst_agent.ipynb`](data_analyst_agent.ipynb), which saves `ANALYST_ENV_ID`, `ANALYST_AGENT_ID`, and `ANALYST_AGENT_VERSION` to `.env`.\n\nThe setup cell below prompts for your Slack tokens and saves them to `.env` so you don't re-enter them on restart (or add them to `.env` beforehand to skip the prompt). `.env` is already in `.gitignore` – never commit it to version control. If you don't have a Slack workspace handy you can still read through the code – each section explains what it does – but you'll need one to run the bot." + "source": [ + "# Build a Slack data analyst bot with Claude Managed Agents\n", + "\n", + "## Introduction\n", + "\n", + "You'll wrap the agent from [`data_analyst_agent.ipynb`](data_analyst_agent.ipynb) in a Slack bot built with [Bolt for Python](https://docs.slack.dev/tools/bolt-python/), Slack's official framework for building apps. Mention the bot with a question and a CSV attachment to get a narrative report posted back to the thread. Follow-up messages continue the same session.\n", + "\n", + "```text\n", + "user: @databot what's driving Q1 revenue? [sales.csv]\n", + " │\n", + " ▼\n", + "bot uploads the CSV and starts an agent session\n", + " │\n", + " ▼\n", + "bot streams the agent's progress back to the thread\n", + " │\n", + " ▼\n", + "bot posts the finished report to the thread\n", + "```\n", + "\n", + "### What you'll learn\n", + "\n", + "- Kick off an agent run from a Slack mention\n", + "- Show the agent's progress as thread updates\n", + "- Post the finished report back to the thread\n", + "- Keep the conversation going with follow-up replies\n", + "\n", + "### Prerequisites\n", + "\n", + "1. Run the install cell below.\n", + "\n", + "2. Create a [Slack app](https://api.slack.com/apps): choose **Create New App → From a manifest**, paste [`slack_app_manifest.yaml`](example_data/slack_data_bot/slack_app_manifest.yaml), and install it to your workspace. 
The manifest enables Socket Mode (Slack delivers events over a WebSocket, so you don't need a public URL) and the required scopes. Then grab two tokens:\n", + "\n", + " - **OAuth & Permissions** → copy the Bot User OAuth Token (`xoxb-...`)\n", + " - **Basic Information → App-Level Tokens** → generate one with scope `connections:write` (`xapp-...`)\n", + "\n", + " In a channel you want the bot in, run `/invite @databot`.\n", + "\n", + "3. Run [`data_analyst_agent.ipynb`](data_analyst_agent.ipynb), which saves `ANALYST_ENV_ID`, `ANALYST_AGENT_ID`, and `ANALYST_AGENT_VERSION` to `.env`.\n", + "\n", + "The setup cell below prompts for your Slack tokens and saves them to `.env` so you don't re-enter them on restart (or add them to `.env` beforehand to skip the prompt). `.env` is already in `.gitignore` – never commit it to version control. If you don't have a Slack workspace handy you can still read through the code – each section explains what it does – but you'll need one to run the bot." + ] }, { "cell_type": "code", @@ -325,4 +366,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} \ No newline at end of file +} diff --git a/managed_agents/sre_incident_responder.ipynb b/managed_agents/sre_incident_responder.ipynb index 407776aa..53b78264 100644 --- a/managed_agents/sre_incident_responder.ipynb +++ b/managed_agents/sre_incident_responder.ipynb @@ -31,8 +31,7 @@ "\n", "### Prerequisites\n", "\n", - "Set `ANTHROPIC_API_KEY` in your environment, then install dependencies:\n", - "" + "Set `ANTHROPIC_API_KEY` in your environment, then install dependencies:" ] }, { @@ -90,8 +89,7 @@ "\n", "A [**Skill**](https://platform.claude.com/docs/en/managed-agents/skills) is a small filesystem bundle the platform mounts into the agent's context with progressive disclosure: the agent sees a one-line description up front and reads the body only when it's relevant. 
It's a good place for team conventions that shouldn't live in the system prompt.\n", "\n", - "The sample skill below encodes one rule — *consult the runbook before touching infrastructure* — the way a real team playbook would. You upload it once via the Skills API and reference it by ID on every agent that needs it.\n", - "" + "The sample skill below encodes one rule — *consult the runbook before touching infrastructure* — the way a real team playbook would. You upload it once via the Skills API and reference it by ID on every agent that needs it." ] }, { @@ -160,8 +158,7 @@ "- The runbook **skill** from step 1.\n", "- Three **custom tools** — `open_pull_request`, `request_approval`, and `merge_pull_request` — that the agent can call but *your application* executes. They're how the agent reaches systems outside the sandbox, and how you put a human in the loop.\n", "\n", - "The system prompt is persona and workflow only. The alert itself arrives as the first user event, so the same agent handles any incident.\n", - "" + "The system prompt is persona and workflow only. The alert itself arrives as the first user event, so the same agent handles any incident." ] }, { @@ -287,8 +284,7 @@ " \"checkout\": {\"type\": \"branch\", \"name\": \"main\"},\n", " \"mount_path\": \"infra\",\n", "}\n", - "```\n", - "" + "```" ] }, { @@ -345,8 +341,7 @@ "source": [ "## 4. Handle the incident alert\n", "\n", - "The handler below is the one function you'd deploy — a Flask or FastAPI route that your alerting system calls when an incident fires. It creates a session referencing the agent and environment, mounts the data, and sends the alert JSON as the first `user.message` event. 
This example uses a [PagerDuty V3 webhook](https://developer.pagerduty.com/docs/webhooks-overview) payload, but any pager that can POST JSON works the same way; here you call the handler directly with the fixture.\n", - "" + "The handler below is the one function you'd deploy — a Flask or FastAPI route that your alerting system calls when an incident fires. It creates a session referencing the agent and environment, mounts the data, and sends the alert JSON as the first `user.message` event. This example uses a [PagerDuty V3 webhook](https://developer.pagerduty.com/docs/webhooks-overview) payload, but any pager that can POST JSON works the same way; here you call the handler directly with the fixture." ] }, { @@ -409,8 +404,7 @@ "\n", "The loop below polls `events.list`, answers `open_pull_request` and `merge_pull_request` inline by writing to a local list — that's the GitHub mock — but **returns** when `request_approval` arrives, because that one needs a human.\n", "\n", - "In production, \"needs a human\" usually means *post it to Slack*: drop the agent's summary into the on-call channel with an **Approve** button, and send the result back when someone clicks. The [`slack_data_bot` cookbook](slack_data_bot.ipynb) shows the Bolt wiring for that; here you'll approve inline in the next cell so the notebook stays self-contained.\n", - "" + "In production, \"needs a human\" usually means *post it to Slack*: drop the agent's summary into the on-call channel with an **Approve** button, and send the result back when someone clicks. The [`slack_data_bot` cookbook](slack_data_bot.ipynb) shows the Bolt wiring for that; here you'll approve inline in the next cell so the notebook stays self-contained." 
] }, { @@ -565,8 +559,7 @@ "id": "186ea378", "metadata": {}, "source": [ - "The agent has read the logs, matched the `OOMKilled` signature to `runbooks/oom.md` via the skill, found the 128Mi memory limit in `infra/k8s/checkout-deploy.yaml`, edited it, opened a PR, and is now waiting on you. This is the message that would land in your `#oncall` Slack channel:\n", - "" + "The agent has read the logs, matched the `OOMKilled` signature to `runbooks/oom.md` via the skill, found the 128Mi memory limit in `infra/k8s/checkout-deploy.yaml`, edited it, opened a PR, and is now waiting on you. This is the message that would land in your `#oncall` Slack channel:" ] }, { @@ -648,8 +641,7 @@ "source": [ "## 6. Approve and let the agent merge\n", "\n", - "Send `\"approved\"` back as the `request_approval` result. The agent resumes, calls `merge_pull_request`, and ends its turn. In the Slack version this send happens in your button-click handler — the payload is identical.\n", - "" + "Send `\"approved\"` back as the `request_approval` result. The agent resumes, calls `merge_pull_request`, and ends its turn. In the Slack version this send happens in your button-click handler — the payload is identical." ] }, { @@ -729,8 +721,7 @@ "\n", "### Cleanup\n", "\n", - "Archive the session and the resources you created.\n", - "" + "Archive the session and the resources you created." 
], "attachments": { "console_session.png": { @@ -827,11 +818,7 @@ "\n", "**Live logs instead of a fixture.** Pass `DD_API_KEY` / `DD_APP_KEY` through the environment config and let the agent `curl` the Datadog Logs API from `bash` instead of reading a mounted file.\n", "\n", - "> See also the [Agent SDK site-reliability\n", - "> agent](https://github.com/anthropics/claude-cookbooks/blob/main/claude_agent_sdk/03_The_site_reliability_agent.ipynb)\n", - "> for the same problem solved with the local Agent SDK instead of the\n", - "> hosted Managed Agents runtime.\n", - "" + "> See also the [Agent SDK site-reliability agent](https://github.com/anthropics/claude-cookbooks/blob/main/claude_agent_sdk/03_The_site_reliability_agent.ipynb) for the same problem solved with the local Agent SDK instead of the hosted Managed Agents runtime.\n" ] }, { @@ -847,8 +834,7 @@ "- Use **custom tools** to call back into your app and gate actions on human approval via `requires_action`.\n", "- Get the full audit trail for free in the **Console** session view.\n", "\n", - "Swap the mocks for GitHub MCP, a Slack approval button, and live logs, and it's ready for on-call.\n", - "" + "Swap the mocks for GitHub MCP, a Slack approval button, and live logs, and it's ready for on-call." ] } ], From 370c1f5d24d05603ecfa648d2d40369413dd3f40 Mon Sep 17 00:00:00 2001 From: Charmaine Lee Date: Wed, 8 Apr 2026 18:38:11 -0400 Subject: [PATCH 3/3] fix(registry): correct SRE incident responder published date to 2026-04-08 --- registry.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/registry.yaml b/registry.yaml index 68551eaf..8bf1a394 100644 --- a/registry.yaml +++ b/registry.yaml @@ -49,7 +49,7 @@ path: managed_agents/sre_incident_responder.ipynb authors: - gaganb-ant - date: '2026-04-07' + date: '2026-04-08' categories: - Agent Patterns - Observability