Commit 942b071

Merge pull request #750 from anthropics/cj-ant/roadtrip-planner
Add roadtrip_planner managed agents cookbook
2 parents 8231472 + d146098 commit 942b071

25 files changed

Lines changed: 3531 additions & 0 deletions
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
# Copy to .env.local (gitignored). `npm run setup` reads the three keys
# below and appends the six ROADTRIP_PLANNER_* ids it provisions.

# Anthropic (sk-ant-...): https://console.anthropic.com/settings/keys
ANTHROPIC_API_KEY=

# National Park Service, free and emailed instantly:
# https://www.nps.gov/subjects/developer/get-started.htm
NATIONAL_PARK_SERVICE_API_KEY=

# Windy Point Forecast, free tier:
# https://api.windy.com/point-forecast/docs
WINDY_API_KEY=

# Written by `npm run setup`
# ROADTRIP_PLANNER_ENVIRONMENT_ID=env_...
# ROADTRIP_PLANNER_AGENT_ID=agent_...
# ROADTRIP_PLANNER_REVIEWER_AGENT_ID=agent_...
# ROADTRIP_PLANNER_VAULT_ID=vlt_...
# ROADTRIP_PLANNER_NATIONAL_PARK_SERVICE_CREDENTIAL_ID=vcrd_...
# ROADTRIP_PLANNER_WINDY_CREDENTIAL_ID=vcrd_...

# Optional
# ROADTRIP_PLANNER_MODEL=claude-sonnet-5
# ROADTRIP_PLANNER_REVIEWER_MODEL=claude-opus-4-8
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
# The repo-root .gitignore has a Python-era `lib/` pattern that would
# silently drop src/lib/, which is most of this app.
!src/lib/

# A cookbook is a starting point: no lockfile, resolve fresh.
package-lock.json
node_modules/
.next/
out/
next-env.d.ts
*.tsbuildinfo
.env
.env.local
events.json
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
1+
# Road trip planner (Claude Managed Agents + Next.js)
2+
3+
A Next.js chat built directly on a Managed Agent session: no chat framework,
4+
the event log is the app state. Teaches session event streaming
5+
(`event_deltas`), vault credential `injection_location`, per-session
6+
`agent_with_overrides`, and `multiagent` coordinator rosters. The planner
7+
calls the National Park Service and Windy APIs with keys it never holds,
8+
then hands its draft to a second agent - an Opus reviewer running as a
9+
session thread - for a quick critique.
10+
11+
## When the user asks to set this up, run it, or debug it
12+
13+
1. **Invoke `/claude-api` first.** It loads the Managed Agents API reference
14+
(agents, environments, sessions, events, vaults, credentials, webhooks).
15+
Use it as the source of truth for any SDK call you write or edit here.
16+
Don't guess field names.
17+
2. **Read [`./skill.md`](./skill.md)** and walk the user through it in order.
18+
It has the key signups, the provisioning step, a pointer to the four
19+
README beats, and the debugging table.
20+
3. The two files worth reading before changing anything are
21+
`src/lib/use-managed-agent-session.ts` (the client runtime: one EventSource, the
22+
SDK accumulator, the re-sync-on-connect habit) and `setup/create.ts`
23+
(where `injection_location` and the `multiagent` roster are set).
24+
`src/lib/transcript.ts` is the one fold both first paint and live
25+
streaming render through.
26+
27+
## Invariants to preserve when editing
28+
29+
- The stream is a tail, not a replay, and previews are never persisted: every
30+
EventSource (re)connect re-fetches the event log (via `/api/session`)
31+
before trusting the tail. Do not "optimize" that fetch away. It is the
32+
entire resume story.
33+
- Previews are speculative. The accumulator's snapshot retires when the
34+
buffered `agent.message` with the same id lands in the log, an orphan
35+
`event_delta` (attached mid-generation) is dropped, and an errored
36+
`span.model_request_end` discards the open snapshot.
37+
- One fold (`transcript.ts`) renders both history and live state. If a new
38+
event type should render, it goes in the fold, not in a second mapping.
39+
- `web_search` and `web_fetch` stay disabled on the agent. With them on, the
40+
model answers from the open web and the vault demo proves nothing.
41+
- The reviewer stays tool-free: its toolset is deny-by-default
42+
(`default_config: { enabled: false }`), and the prompt says the same. The
43+
review must come from the draft alone - a poisoned draft must have no tool
44+
to reach, and the rail must not fill with a second agent's curls.
45+
- The handoff is prompt-triggered. The planner's "Review step" prompt
46+
section decides when to message the reviewer; the user never asks and the
47+
app sends nothing. Weaken that section and beat 4 silently disappears.
48+
- The environment's `allowed_hosts` and each credential's `allowed_hosts`
49+
both list the vendor host. Drop either and the calls fail differently.
50+
- The header's model picker shows `session.agent.model` from the API
51+
response, never client state. An override must be visible as the resolved
52+
snapshot or the demo proves nothing.
53+
- No database. If a change needs one, it does not belong in this cookbook.
Lines changed: 175 additions & 0 deletions
@@ -0,0 +1,175 @@
1+
# Road trip planner: stream sessions, scope vault credentials, override models, and review plans agent-to-agent
2+
3+
A Next.js chat built directly on a [Claude Managed Agent](https://platform.claude.com/docs/en/managed-agents/overview) [session](https://platform.claude.com/docs/en/managed-agents/sessions). There is no chat framework and no database: the session's event log is the message list, the live tokens are its SSE tail, and the only server code in the streaming path is a thin proxy that keeps the API key out of the browser. The agent plans road trips around any US national park. You'll see four API features on one screen:
4+
5+
1. **Stream the turn live.** `event_deltas[]=...` on [`GET /v1/sessions/{id}/events/stream`](https://platform.claude.com/docs/en/managed-agents/events-and-streaming) interleaves previews with the buffered event log. `agent.message` previews carry the reply token by token, and `agent.thinking` previews fire the moment extended thinking starts. Tool calls, results, retries, and usage ride the same stream, so the UI renders the whole turn as it happens. Without previews the same chat is seconds of blank, then a paragraph.
6+
2. **Inject [vault credentials](https://platform.claude.com/docs/en/managed-agents/vaults) at a specific request location.** The agent authenticates to two vendor APIs with keys it never holds. The National Park Service wants its key in a request header and Windy wants it inside the JSON body. The `injection_location` field on each credential controls where the placeholder may be swapped for the real value at egress.
7+
3. **Override agent settings per session.** Every session is created with the `agent_with_overrides` selector. The model picker adds a `model` override to it, rerunning the same stored agent on a different model. One agent, no per-user copies, no new versions.
8+
4. **Hand the draft to a second agent.** The planner is a `multiagent` coordinator with one roster entry: a reviewer agent running Opus. When it drafts an itinerary it spawns the reviewer as a session thread, messages it the draft, and waits for the critique. The whole exchange is ordinary session events (`session.thread_created`, `agent.thread_message_sent`, `agent.thread_message_received`) on the same stream the chat already renders, so you watch the two agents talk to each other.
9+
10+
```
11+
browser ──▶ POST /api/chat ──▶ sessions.events.send(user.message)
12+
13+
│ one EventSource GET /v1/sessions/{id}/events/stream
14+
└── GET /api/stream ◀────────── ?event_deltas[]=agent.message
15+
(a thin SSE proxy) &event_deltas[]=agent.thinking
16+
17+
src/lib/use-managed-agent-session.ts: accumulate previews (SDK helper),
18+
append buffered events, fold the log into turns, render
19+
20+
sandbox ── curl -H "X-Api-Key: $NATIONAL_PARK_SERVICE_API_KEY" ──▶ egress ──▶ developer.nps.gov
21+
the env var is an opaque placeholder └─ swaps in the real key when the
22+
host and the location are allowed
23+
24+
planner ── agent.thread_message_sent (the draft) ──▶ reviewer thread (Opus, same session)
25+
◀── agent.thread_message_received (the critique) ──┘
26+
```
27+
28+
The session is the backend: the agent remembers the conversation, the transcript is `GET /v1/sessions/{id}/events`, and the browser holds one httpOnly cookie with the session id.
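
The `event_deltas[]` parameters in the diagram are ordinary repeated query keys. As a minimal sketch of how a proxy might assemble the upstream URL: `buildStreamUrl` is a hypothetical helper, and the default base URL is an assumption, not something read from `src/app/api/stream/route.ts`.

```typescript
// Hypothetical helper for illustration only. The path and the event_deltas[]
// parameter come from the README; the function name and the default base URL
// are assumptions.
type PreviewType = "agent.message" | "agent.thinking";

function buildStreamUrl(
  sessionId: string,
  previews: PreviewType[],
  baseUrl: string = "https://api.anthropic.com",
): string {
  const params = new URLSearchParams();
  for (const previewType of previews) {
    // One repeated key per preview type: event_deltas[]=agent.message&...
    params.append("event_deltas[]", previewType);
  }
  const path = `/v1/sessions/${encodeURIComponent(sessionId)}/events/stream`;
  const query = params.toString();
  return query ? `${baseUrl}${path}?${query}` : `${baseUrl}${path}`;
}
```

Passing an empty list yields the plain stream URL, which corresponds to the "seconds of blank, then a paragraph" behavior described above, since no previews are requested.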
29+
30+
The new API calls, and where to read them:
31+
32+
- `event_deltas` on the session event stream: [`src/app/api/stream/route.ts`](./src/app/api/stream/route.ts), the SSE proxy
33+
- the SDK's `accumulateManagedAgentsEvent` folding previews in the browser: [`src/lib/use-managed-agent-session.ts`](./src/lib/use-managed-agent-session.ts)
34+
- the event log folded into renderable turns: [`src/lib/transcript.ts`](./src/lib/transcript.ts)
35+
- `injection_location` provisioned on each credential: [`setup/create.ts`](./setup/create.ts), step 5
36+
- `injection_location` flipped on a live credential: one `ant` CLI call, beat 2 below
37+
- `agent_with_overrides` on session create: [`src/app/api/session/route.ts`](./src/app/api/session/route.ts)
38+
- the `multiagent` coordinator roster on the planner agent: [`setup/create.ts`](./setup/create.ts), step 3
39+
- thread events folded into the rail and the chat: [`src/lib/transcript.ts`](./src/lib/transcript.ts)
40+
41+
## Prerequisites
42+
43+
- Node 20 or later
44+
- Anthropic credentials for an organization with Managed Agents access (`ANTHROPIC_API_KEY`, or the login `ant auth login` saves)
45+
- A National Park Service API key, free and emailed instantly: <https://www.nps.gov/subjects/developer/get-started.htm>
46+
- A Windy Point Forecast API key, free tier: <https://api.windy.com/point-forecast/docs>
47+
- `@anthropic-ai/sdk` at the latest release
48+
49+
## Run it
50+
51+
```bash
cd managed_agents/roadtrip_planner
npm install
cp .env.example .env.local   # fill in the three keys
npm run setup                # environment + 2 agents + vault + 2 credentials, ids -> .env.local
npm run dev                  # http://localhost:3000
```
58+
59+
Setup provisions each credential with the injection location its vendor documents, hardcoded in [`setup/create.ts`](./setup/create.ts):
60+
61+
| secret | host | injected in |
62+
|---|---|---|
63+
| `NATIONAL_PARK_SERVICE_API_KEY` | `developer.nps.gov` | header (`X-Api-Key`) |
64+
| `WINDY_API_KEY` | `api.windy.com` | body (`"key": ...`) |
65+
66+
Same credential type, same vault, opposite `injection_location`. The model never sees either value.
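
To make the rule concrete, here is a toy model of substitution at egress. It is a sketch under stated assumptions, not the platform's implementation: the type names, the `placeholder` field, and `substituteAtEgress` are all hypothetical, and the only behavior taken from this README is that the real value is swapped in when both the host and the location are allowed.

```typescript
// Toy model only: every name here is hypothetical.
interface InjectionLocationSketch {
  header: boolean;
  body: boolean;
}

interface VaultCredentialSketch {
  placeholder: string; // the opaque value the sandbox sees
  secret: string; // the real key, never visible to the model
  allowedHosts: string[];
  injectionLocation: InjectionLocationSketch;
}

interface OutboundRequestSketch {
  host: string;
  headers: Record<string, string>;
  body: string;
}

function substituteAtEgress(
  request: OutboundRequestSketch,
  credential: VaultCredentialSketch,
): OutboundRequestSketch {
  // Host not allowed for this credential: the placeholder goes out literally.
  if (!credential.allowedHosts.includes(request.host)) return request;

  const swap = (text: string): string =>
    text.split(credential.placeholder).join(credential.secret);

  const headers: Record<string, string> = {};
  for (const [key, value] of Object.entries(request.headers)) {
    headers[key] = credential.injectionLocation.header ? swap(value) : value;
  }
  const body = credential.injectionLocation.body ? swap(request.body) : request.body;
  return { host: request.host, headers, body };
}
```

With `header: true, body: false`, a placeholder in `X-Api-Key` is replaced and one in the body is left alone; inverting the flags reverses that, which is the difference between the two rows of the table.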
67+
68+
## Four things to do with it
69+
70+
Each one is a runnable step. Together they are the cookbook.
71+
72+
### 1. Ask for a trip
73+
74+
"Plan a 5 day road trip split between Zion and Bryce Canyon for the first week of October." The agent resolves the parks first (`/parks?q=...`), then plans from what the APIs return. The status line flips to "thinking..." the moment the model starts reasoning: that is the `agent.thinking` preview, where the buffered event only lands when thinking ends. The reply renders as the model writes it. The right rail shows each `curl` the agent runs in its sandbox (it budgets itself to five per question), each against an allowed host, each authenticated with a key it cannot read. Any park works: swap in Acadia, Yellowstone, Joshua Tree.
75+
76+
### 2. Flip one field and watch a vendor reject the placeholder
77+
78+
Update the live credential with the [`ant` CLI](https://platform.claude.com/docs/en/api/sdks/cli) (it shares the credentials `npm run setup` used; the ids are in `.env.local`):
79+
80+
```bash
eval "$(grep '^ROADTRIP_PLANNER_' .env.local)"
ant beta:vaults:credentials update \
  --vault-id "$ROADTRIP_PLANNER_VAULT_ID" \
  --credential-id "$ROADTRIP_PLANNER_NATIONAL_PARK_SERVICE_CREDENTIAL_ID" \
  --auth '{type: environment_variable, injection_location: {header: false, body: true}}'
```
87+
88+
The National Park Service only accepts its key in a header, and header injection is now off for that credential. Nothing substitutes the placeholder, the next NPS call carries it literally, and NPS rejects it. Ask "is anything closed at the park right now" and watch the 4xx land in the tool rail: the agent shows the status and body, retries the documented header location once, then says plainly that NPS is rejecting its key while it keeps planning with the weather API. Heal it by setting the credential back to the location its vendor documents:
89+
90+
```bash
eval "$(grep '^ROADTRIP_PLANNER_' .env.local)"
ant beta:vaults:credentials update \
  --vault-id "$ROADTRIP_PLANNER_VAULT_ID" \
  --credential-id "$ROADTRIP_PLANNER_NATIONAL_PARK_SERVICE_CREDENTIAL_ID" \
  --auth '{type: environment_variable, injection_location: {header: true, body: false}}'
```
97+
98+
One field, no secret rotation, no redeploy, visible consequence. The same flip works the other way on the Windy credential (`$ROADTRIP_PLANNER_WINDY_CREDENTIAL_ID`, with the two booleans inverted), because Windy only documents body auth.
99+
100+
### 3. Run the same agent on a different model
101+
102+
Pick another model in the header. A new trip starts with a `model` override on the selector every session already uses:
103+
104+
```ts
anthropic.beta.sessions.create({
  agent: { type: "agent_with_overrides", id: agentId, model: "claude-opus-4-8" },
  // ...
});
```
110+
111+
The override replaces the stored agent's model for this session only. The header shows the model from the session's resolved `agent` snapshot (`session.agent.model`), so what you read is what the API resolved, not what the client asked for. The stored agent never changes: no copy, no new version, and "New trip" returns to its configured model. `system`, `tools`, `mcp_servers`, and `skills` are overridable the same way.
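
Because every session already uses the same selector, the picker only has to decide whether a `model` key is present. A minimal sketch, assuming the selector shape shown in the snippet above; `agentSelector` is a hypothetical helper, not the code in `src/app/api/session/route.ts`.

```typescript
// Hypothetical helper. The selector fields mirror the README snippet; the
// function itself is an illustration.
type AgentSelectorSketch = {
  type: "agent_with_overrides";
  id: string;
  model?: string;
};

function agentSelector(agentId: string, pickedModel?: string): AgentSelectorSketch {
  const selector: AgentSelectorSketch = { type: "agent_with_overrides", id: agentId };
  // Only attach the override when the user actually picked a model, so the
  // stored agent's configured model applies otherwise.
  if (pickedModel) selector.model = pickedModel;
  return selector;
}
```

Omitting the key rather than sending the stored model back keeps "New trip" tied to whatever the stored agent is configured with at that moment.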
112+
113+
### 4. Watch the planner get its plan reviewed
114+
115+
Ask for a full multi-day itinerary (beat 1's Zion and Bryce prompt works). After the draft is written, the status line flips to "waiting on Plan reviewer...": the planner spawned its roster agent as a session thread and messaged it the draft. The reviewer is a second stored agent created by `npm run setup`:
116+
117+
```ts
const reviewer = await anthropic.beta.agents.create({
  name: "Plan reviewer",
  model: "claude-opus-4-8",
  system: REVIEWER_SYSTEM, // a quick gut-check: verdict line + at most 3 issues
  tools: [{ type: "agent_toolset_20260401", default_config: { enabled: false }, configs: [] }],
});

const planner = await anthropic.beta.agents.create({
  name: "Road trip planner",
  model: "claude-sonnet-5",
  multiagent: { type: "coordinator", agents: [reviewer.id] },
  // ...
});
```
132+
133+
Nobody asks for the review. The roster grants the capability, and the planner's own system prompt decides when to use it: a new day-by-day itinerary gets reviewed, a quick factual answer does not. The send is not a tool call and not an app request; the only trace is the thread events themselves.
134+
135+
Two models in one session: the planner drafts on Sonnet, the gut-check runs on Opus. The reviewer's toolset is deny-by-default, which is both a security boundary (a poisoned draft has no tool to reach) and what keeps the review fast: it judges the draft text alone. Its reply still routes back, because agent-to-agent messaging is platform machinery, not a tool.
136+
137+
The exchange renders in two places, fed by the same events. In the chat, two chips: "to Plan reviewer" (the draft going out) and "from Plan reviewer" (the critique coming back). In the rail: `session.thread_created`, then both `agent.thread_message_*` events with their full text, then the thread's status events as it goes idle. The planner's final reply ends with a "Reviewer flagged" line when the critique raised something it could not verify from the APIs.
138+
139+
The review costs one extra agent and zero plumbing: no queue, no webhook, no second session. The thread lives inside the session, its lifecycle events cross-post to the stream the page already holds open, and the agent-to-agent messages are persisted events, so the exchange survives a reload like everything else.
140+
141+
## The client
142+
143+
A Managed Agents session already is a resumable chat backend, so the client is three pieces, all in [`src/lib/use-managed-agent-session.ts`](./src/lib/use-managed-agent-session.ts) and [`src/lib/transcript.ts`](./src/lib/transcript.ts):
144+
145+
- **One EventSource** on `/api/stream`, a thin authenticated proxy of the session's SSE tail. The browser owns reconnect, and every connect re-fetches the event log first: the stream is a tail, not a replay, and previews are never persisted. That habit is the whole resume story: reload mid-answer and the page renders what streaming rendered.
146+
- **The SDK's `accumulateManagedAgentsEvent`**, run in the browser, folds `event_start` / `event_delta` previews into one growing `agent.message` snapshot. Previews are speculative: the buffered event with the same id retires the snapshot, and because the ids match, the rendered part does not jump. A delta for a preview the client never saw open (it attached mid-generation) is dropped, and the buffered event delivers the whole thing. An errored model request never produces its buffered event, so its terminal `span.model_request_end` drops the snapshot instead.
147+
- **One fold** from the event array to renderable turns. The same fold runs over hydrated history and the live array, so a reload renders what streaming rendered, turn stats included.
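
The preview rules in the list above can be sketched as a small reducer. This is not the SDK's `accumulateManagedAgentsEvent`: the event shapes are simplified, the `is_error` field and all type names are assumptions, and only the four rules (grow, retire on matching id, drop orphans, discard on error) come from this README.

```typescript
// Simplified, hypothetical event shapes for illustration.
type StreamItemSketch =
  | { type: "event_start"; id: string }
  | { type: "event_delta"; id: string; text: string }
  | { type: "agent.message"; id: string; text: string }
  | { type: "span.model_request_end"; is_error: boolean };

interface MessageSketch {
  id: string;
  text: string;
}

interface ViewStateSketch {
  log: MessageSketch[]; // buffered (persisted) messages
  preview: MessageSketch | null; // the one speculative snapshot
}

function applyStreamItem(state: ViewStateSketch, item: StreamItemSketch): ViewStateSketch {
  switch (item.type) {
    case "event_start":
      // A preview opens: start an empty snapshot under the event's id.
      return { ...state, preview: { id: item.id, text: "" } };
    case "event_delta":
      // Orphan delta (no open preview with this id): drop it.
      if (!state.preview || state.preview.id !== item.id) return state;
      return { ...state, preview: { id: item.id, text: state.preview.text + item.text } };
    case "agent.message":
      // The buffered event lands: append it and retire a matching snapshot.
      return {
        log: [...state.log, { id: item.id, text: item.text }],
        preview: state.preview?.id === item.id ? null : state.preview,
      };
    case "span.model_request_end":
      // An errored request never produces its buffered event: discard.
      return item.is_error ? { ...state, preview: null } : state;
  }
}
```

Keeping the snapshot and the log entry under the same id is what lets a renderer key both on that id, so the rendered part does not jump when the buffered event replaces the preview.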
148+
149+
Everything else on the stream is a status signal: thinking starts drive the activity line, `span.model_request_*` drive the working state and token stats, `agent.thread_message_sent` flips the activity line to "waiting on Plan reviewer..." until the reply lands, and `session.error` distinguishes a retry in progress from a dead turn. Stop is a real API verb: the button POSTs `user.interrupt`, the agent winds down, and the `session.status_idle` that follows flips the UI back to ready.
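
As a rough illustration of status signals driving the activity line, the mapping can be written as one function from event type to label. The event names and the "thinking..." and "waiting on Plan reviewer..." labels come from this README; the "ready" and "working" labels and the function itself are assumptions, and the real logic in `src/lib/use-managed-agent-session.ts` may differ.

```typescript
// Hypothetical mapping for illustration.
type ActivityLine = "ready" | "working" | "thinking..." | "waiting on Plan reviewer...";

function activityAfter(current: ActivityLine, eventType: string): ActivityLine {
  switch (eventType) {
    case "agent.thinking":
      return "thinking...";
    case "agent.thread_message_sent":
      return "waiting on Plan reviewer...";
    case "agent.thread_message_received":
      // The critique landed; the planner is back to finishing its reply.
      return "working";
    case "session.status_idle":
      return "ready";
    default:
      // Events that are not status signals leave the line unchanged.
      return current;
  }
}
```

Because the function depends only on the previous label and the next event type, it gives the same answer whether it runs over hydrated history or the live stream.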
150+
151+
<details>
152+
<summary><strong>Caveats</strong></summary>
153+
154+
- `injection_location` has two flags: `header` and `body`. There is no query-string option, so a key in `?api_key=...` is never substituted. APIs that only accept query-param auth can't use environment variable credentials, and if the agent falls back to NPS's `?api_key=` form the request carries the placeholder.
155+
- Two allowlists must agree. The [environment](https://platform.claude.com/docs/en/managed-agents/environments)'s `networking.allowed_hosts` controls what the sandbox can reach at all, and the credential's `networking.allowed_hosts` controls where its secret may be substituted. A host missing from the first never connects, and a host missing from the second gets the placeholder.
156+
- Streaming previews cover `agent.message` (text deltas) and `agent.thinking` (start only) on the session's primary thread. Tool calls and subagent threads arrive as buffered events, so the reviewer's critique lands whole, not token by token.
157+
- The roster is flat: 1 to 20 entries, and a roster agent cannot have a roster of its own (depth limit 1). The model picker overrides the planner only; the reviewer thread always runs the reviewer agent's stored model.
158+
- Vaults and the model both attach at `sessions.create`. You can update a credential in a vault a running session already holds, but a different vault or a different model means a new session, which is why the picker starts a new trip.
159+
160+
</details>
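
The two-allowlist caveat reduces to a three-way outcome per host. A minimal sketch for reasoning about a failing call: `egressOutcome` and its outcome labels are hypothetical, and it deliberately ignores `injection_location`, which adds a further condition on top of the host checks.

```typescript
// Hypothetical diagnostic helper; the labels are illustrative, not API values.
type EgressOutcome = "blocked" | "placeholder_sent" | "secret_substituted";

function egressOutcome(
  host: string,
  environmentHosts: string[],
  credentialHosts: string[],
): EgressOutcome {
  // Missing from the environment allowlist: the sandbox never connects.
  if (!environmentHosts.includes(host)) return "blocked";
  // Reachable, but the credential does not cover this host: the vendor
  // receives the literal placeholder.
  if (!credentialHosts.includes(host)) return "placeholder_sent";
  return "secret_substituted";
}
```

A connection failure therefore points at the environment, while a vendor-side auth rejection points at the credential's hosts or its injection location.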
161+
162+
## Files
163+
164+
| | |
165+
|---|---|
166+
| `setup/create.ts` | environment + planner + reviewer + vault + two credentials, idempotent, prints ids |
167+
| `setup/teardown.ts` | archive everything the setup created |
168+
| `src/lib/client.ts` | the shared SDK client and the session cookie name |
169+
| `src/lib/use-managed-agent-session.ts` | the client runtime: one EventSource, the SDK accumulator, send/stop |
170+
| `src/lib/transcript.ts` | the event log folded into renderable turns |
171+
| `src/app/api/session/route.ts` | create or resume the session behind the cookie, return the event log |
172+
| `src/app/api/stream/route.ts` | SSE proxy of the session tail (the API key stays server-side) |
173+
| `src/app/api/chat/route.ts` | send one `user.message` |
174+
| `src/app/api/interrupt/route.ts` | send `user.interrupt` (the stop button) |
175+
| `src/app/page.tsx` | the chat, the model picker, the tool rail |
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  // The chat page calls /api/session once on mount to create (or resume) the
  // Managed Agents session. React strict mode double-runs effects in dev, which would
  // race two session creates, so the effect guards itself with a ref instead
  // of relying on this flag. Nothing here is load-bearing.
  reactStrictMode: true,
};

export default nextConfig;
