anthropics
diff --git a/‎managed_agents/roadtrip_planner/.env.example‎
Lines changed: 25 additions & 0 deletions b/‎managed_agents/roadtrip_planner/.env.example‎
Lines changed: 25 additions & 0 deletions
diff --git a/‎managed_agents/roadtrip_planner/.gitignore‎
Lines changed: 14 additions & 0 deletions b/‎managed_agents/roadtrip_planner/.gitignore‎
Lines changed: 14 additions & 0 deletions
diff --git a/‎managed_agents/roadtrip_planner/CLAUDE.md‎
Lines changed: 53 additions & 0 deletions b/‎managed_agents/roadtrip_planner/CLAUDE.md‎
Lines changed: 53 additions & 0 deletions
diff --git a/‎managed_agents/roadtrip_planner/README.md‎
Lines changed: 175 additions & 0 deletions b/‎managed_agents/roadtrip_planner/README.md‎
Lines changed: 175 additions & 0 deletions
diff --git a/‎managed_agents/roadtrip_planner/next.config.ts‎
Lines changed: 11 additions & 0 deletions b/‎managed_agents/roadtrip_planner/next.config.ts‎
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,25 @@
+# Copy to .env.local (gitignored). `npm run setup` reads the three keys
+# below and appends the six ROADTRIP_PLANNER_* ids it provisions.
+
+# Anthropic (sk-ant-...): https://console.anthropic.com/settings/keys
+ANTHROPIC_API_KEY=
+
+# National Park Service, free and emailed instantly:
+# https://www.nps.gov/subjects/developer/get-started.htm
+NATIONAL_PARK_SERVICE_API_KEY=
+
+# Windy Point Forecast, free tier:
+# https://api.windy.com/point-forecast/docs
+WINDY_API_KEY=
+
+# Written by `npm run setup`
+# ROADTRIP_PLANNER_ENVIRONMENT_ID=env_...
+# ROADTRIP_PLANNER_AGENT_ID=agent_...
+# ROADTRIP_PLANNER_REVIEWER_AGENT_ID=agent_...
+# ROADTRIP_PLANNER_VAULT_ID=vlt_...
+# ROADTRIP_PLANNER_NATIONAL_PARK_SERVICE_CREDENTIAL_ID=vcrd_...
+# ROADTRIP_PLANNER_WINDY_CREDENTIAL_ID=vcrd_...
+
+# Optional
+# ROADTRIP_PLANNER_MODEL=claude-sonnet-5
+# ROADTRIP_PLANNER_REVIEWER_MODEL=claude-opus-4-8
@@ -0,0 +1,14 @@
+# The repo-root .gitignore has a Python-era `lib/` pattern that would
+# silently drop src/lib/, which is most of this app.
+!src/lib/
+
+# A cookbook is a starting point: no lockfile, resolve fresh.
+package-lock.json
+node_modules/
+.next/
+out/
+next-env.d.ts
+*.tsbuildinfo
+.env
+.env.local
+events.json
@@ -0,0 +1,53 @@
+# Road trip planner (Claude Managed Agents + Next.js)
+
+A Next.js chat built directly on a Managed Agent session: no chat framework,
+the event log is the app state. Teaches session event streaming
+(`event_deltas`), vault credential `injection_location`, per-session
+`agent_with_overrides`, and `multiagent` coordinator rosters. The planner
+calls the National Park Service and Windy APIs with keys it never holds,
+then hands its draft to a second agent - an Opus reviewer running as a
+session thread - for a quick critique.
+
+## When the user asks to set this up, run it, or debug it
+
+1. **Invoke `/claude-api` first.** It loads the Managed Agents API reference
+   (agents, environments, sessions, events, vaults, credentials, webhooks).
+   Use it as the source of truth for any SDK call you write or edit here.
+   Don't guess field names.
+2. **Read [`./skill.md`](./skill.md)** and walk the user through it in order.
+   It has the key signups, the provisioning step, a pointer to the four
+   README beats, and the debugging table.
+3. The two files worth reading before changing anything are
+   `src/lib/use-managed-agent-session.ts` (the client runtime: one EventSource, the
+   SDK accumulator, the re-sync-on-connect habit) and `setup/create.ts`
+   (where `injection_location` and the `multiagent` roster are set).
+   `src/lib/transcript.ts` is the one fold both first paint and live
+   streaming render through.
+
+## Invariants to preserve when editing
+
+- The stream is a tail, not a replay, and previews are never persisted: every
+  EventSource (re)connect re-fetches the event log (via `/api/session`)
+  before trusting the tail. Do not "optimize" that fetch away. It is the
+  entire resume story.
+- Previews are speculative. The accumulator's snapshot retires when the
+  buffered `agent.message` with the same id lands in the log, an orphan
+  `event_delta` (attached mid-generation) is dropped, and an errored
+  `span.model_request_end` discards the open snapshot.
+- One fold (`transcript.ts`) renders both history and live state. If a new
+  event type should render, it goes in the fold, not in a second mapping.
+- `web_search` and `web_fetch` stay disabled on the agent. With them on, the
+  model answers from the open web and the vault demo proves nothing.
+- The reviewer stays tool-free: its toolset is deny-by-default
+  (`default_config: { enabled: false }`), and the prompt says the same. The
+  review must come from the draft alone - a poisoned draft must have no tool
+  to reach, and the rail must not fill with a second agent's curls.
+- The handoff is prompt-triggered. The planner's "Review step" prompt
+  section decides when to message the reviewer; the user never asks and the
+  app sends nothing. Weaken that section and beat 4 silently disappears.
+- The environment's `allowed_hosts` and each credential's `allowed_hosts`
+  both list the vendor host. Drop either and the calls fail differently.
+- The header's model picker shows `session.agent.model` from the API
+  response, never client state. An override must be visible as the resolved
+  snapshot or the demo proves nothing.
+- No database. If a change needs one, it does not belong in this cookbook.
@@ -0,0 +1,175 @@
+# Road trip planner: stream sessions, scope vault credentials, override models, and review plans agent-to-agent
+
+A Next.js chat built directly on a [Claude Managed Agent](https://platform.claude.com/docs/en/managed-agents/overview) [session](https://platform.claude.com/docs/en/managed-agents/sessions). There is no chat framework and no database: the session's event log is the message list, the live tokens are its SSE tail, and the only server code in the streaming path is a thin proxy that keeps the API key out of the browser. The agent plans road trips around any US national park. You'll see four API features on one screen:
+
+1. **Stream the turn live.** `event_deltas[]=...` on [`GET /v1/sessions/{id}/events/stream`](https://platform.claude.com/docs/en/managed-agents/events-and-streaming) interleaves previews with the buffered event log. `agent.message` previews carry the reply token by token, and `agent.thinking` previews fire the moment extended thinking starts. Tool calls, results, retries, and usage ride the same stream, so the UI renders the whole turn as it happens. Without previews the same chat is seconds of blank, then a paragraph.
+2. **Inject [vault credentials](https://platform.claude.com/docs/en/managed-agents/vaults) at a specific request location.** The agent authenticates to two vendor APIs with keys it never holds. The National Park Service wants its key in a request header and Windy wants it inside the JSON body. The `injection_location` field on each credential controls where the placeholder may be swapped for the real value at egress.
+3. **Override agent settings per session.** Every session is created with the `agent_with_overrides` selector. The model picker adds a `model` override to it, rerunning the same stored agent on a different model. One agent, no per-user copies, no new versions.
+4. **Hand the draft to a second agent.** The planner is a `multiagent` coordinator with one roster entry: a reviewer agent running Opus. When it drafts an itinerary it spawns the reviewer as a session thread, messages it the draft, and waits for the critique. The whole exchange is ordinary session events (`session.thread_created`, `agent.thread_message_sent`, `agent.thread_message_received`) on the same stream the chat already renders, so you watch the two agents talk to each other.
+
+```
+browser ──▶ POST /api/chat ──▶ sessions.events.send(user.message)
+   ▲
+   │ one EventSource              GET /v1/sessions/{id}/events/stream
+   └── GET /api/stream ◀──────────   ?event_deltas[]=agent.message
+       (a thin SSE proxy)            &event_deltas[]=agent.thinking
+
+   src/lib/use-managed-agent-session.ts: accumulate previews (SDK helper),
+   append buffered events, fold the log into turns, render
+
+sandbox ── curl -H "X-Api-Key: $NATIONAL_PARK_SERVICE_API_KEY" ──▶ egress ──▶ developer.nps.gov
+           the env var is an opaque placeholder                    └─ swaps in the real key when the
+                                                                      host and the location are allowed
+
+planner ── agent.thread_message_sent (the draft) ──▶ reviewer thread (Opus, same session)
+        ◀── agent.thread_message_received (the critique) ──┘
+```
+
+The session is the backend: the agent remembers the conversation, the transcript is `GET /v1/sessions/{id}/events`, and the browser holds one httpOnly cookie with the session id.
+
+The new API calls, and where to read them:
+
+- `event_deltas` on the session event stream: [`src/app/api/stream/route.ts`](./src/app/api/stream/route.ts), the SSE proxy
+- the SDK's `accumulateManagedAgentsEvent` folding previews in the browser: [`src/lib/use-managed-agent-session.ts`](./src/lib/use-managed-agent-session.ts)
+- the event log folded into renderable turns: [`src/lib/transcript.ts`](./src/lib/transcript.ts)
+- `injection_location` provisioned on each credential: [`setup/create.ts`](./setup/create.ts), step 5
+- `injection_location` flipped on a live credential: one `ant` CLI call, beat 2 below
+- `agent_with_overrides` on session create: [`src/app/api/session/route.ts`](./src/app/api/session/route.ts)
+- the `multiagent` coordinator roster on the planner agent: [`setup/create.ts`](./setup/create.ts), step 3
+- thread events folded into the rail and the chat: [`src/lib/transcript.ts`](./src/lib/transcript.ts)
+
+## Prerequisites
+
+- Node 20 or later
+- Anthropic credentials for an organization with Managed Agents access (`ANTHROPIC_API_KEY`, or the login `ant auth login` saves)
+- A National Park Service API key, free and emailed instantly: <https://www.nps.gov/subjects/developer/get-started.htm>
+- A Windy Point Forecast API key, free tier: <https://api.windy.com/point-forecast/docs>
+- `@anthropic-ai/sdk` at the latest release
+
+## Run it
+
+```bash
+cd managed_agents/roadtrip_planner
+npm install
+cp .env.example .env.local   # fill in the three keys
+npm run setup                # environment + 2 agents + vault + 2 credentials, ids -> .env.local
+npm run dev                  # http://localhost:3000
+```
+
+Setup provisions each credential with the injection location its vendor documents, hardcoded in [`setup/create.ts`](./setup/create.ts):
+
+| secret | host | injected in |
+|---|---|---|
+| `NATIONAL_PARK_SERVICE_API_KEY` | `developer.nps.gov` | header (`X-Api-Key`) |
+| `WINDY_API_KEY` | `api.windy.com` | body (`"key": ...`) |
+
+Same credential type, same vault, opposite `injection_location`. The model never sees either value.
+
+## Four things to do with it
+
+Each one is a runnable step. Together they are the cookbook.
+
+### 1. Ask for a trip
+
+"Plan a 5 day road trip split between Zion and Bryce Canyon for the first week of October." The agent resolves the parks first (`/parks?q=...`), then plans from what the APIs return. The status line flips to "thinking..." the moment the model starts reasoning: that is the `agent.thinking` preview, where the buffered event only lands when thinking ends. The reply renders as the model writes it. The right rail shows each `curl` the agent runs in its sandbox (it budgets itself to five per question), each against an allowed host, each authenticated with a key it cannot read. Any park works: swap in Acadia, Yellowstone, Joshua Tree.
+
+### 2. Flip one field and watch a vendor reject the placeholder
+
+Update the live credential with the [`ant` CLI](https://platform.claude.com/docs/en/api/sdks/cli) (it shares the credentials `npm run setup` used; the ids are in `.env.local`):
+
+```bash
+eval "$(grep '^ROADTRIP_PLANNER_' .env.local)"
+ant beta:vaults:credentials update \
+  --vault-id "$ROADTRIP_PLANNER_VAULT_ID" \
+  --credential-id "$ROADTRIP_PLANNER_NATIONAL_PARK_SERVICE_CREDENTIAL_ID" \
+  --auth '{type: environment_variable, injection_location: {header: false, body: true}}'
+```
+
+The National Park Service only accepts its key in a header, and header injection is now off for that credential. Nothing substitutes the placeholder, the next NPS call carries it literally, and NPS rejects it. Ask "is anything closed at the park right now" and watch the 4xx land in the tool rail: the agent shows the status and body, retries the documented header location once, then says plainly that NPS is rejecting its key while it keeps planning with the weather API. Heal it by setting the credential back to the location its vendor documents:
+
+```bash
+eval "$(grep '^ROADTRIP_PLANNER_' .env.local)"
+ant beta:vaults:credentials update \
+  --vault-id "$ROADTRIP_PLANNER_VAULT_ID" \
+  --credential-id "$ROADTRIP_PLANNER_NATIONAL_PARK_SERVICE_CREDENTIAL_ID" \
+  --auth '{type: environment_variable, injection_location: {header: true, body: false}}'
+```
+
+One field, no secret rotation, no redeploy, visible consequence. The same flip works the other way on the Windy credential (`$ROADTRIP_PLANNER_WINDY_CREDENTIAL_ID`, with the two booleans inverted), because Windy only documents body auth.
+
+### 3. Run the same agent on a different model
+
+Pick another model in the header. A new trip starts with a `model` override on the selector every session already uses:
+
+```ts
+anthropic.beta.sessions.create({
+  agent: { type: "agent_with_overrides", id: agentId, model: "claude-opus-4-8" },
+  // ...
+});
+```
+
+The override replaces the stored agent's model for this session only. The header shows the model from the session's resolved `agent` snapshot (`session.agent.model`), so what you read is what the API resolved, not what the client asked for. The stored agent never changes: no copy, no new version, and "New trip" returns to its configured model. `system`, `tools`, `mcp_servers`, and `skills` are overridable the same way.
+
+### 4. Watch the planner get its plan reviewed
+
+Ask for a full multi-day itinerary (beat 1's Zion and Bryce prompt works). After the draft is written, the status line flips to "waiting on Plan reviewer...": the planner spawned its roster agent as a session thread and messaged it the draft. The reviewer is a second stored agent created by `npm run setup`:
+
+```ts
+const reviewer = await anthropic.beta.agents.create({
+  name: "Plan reviewer",
+  model: "claude-opus-4-8",
+  system: REVIEWER_SYSTEM, // a quick gut-check: verdict line + at most 3 issues
+  tools: [{ type: "agent_toolset_20260401", default_config: { enabled: false }, configs: [] }],
+});
+
+const planner = await anthropic.beta.agents.create({
+  name: "Road trip planner",
+  model: "claude-sonnet-5",
+  multiagent: { type: "coordinator", agents: [reviewer.id] },
+  // ...
+});
+```
+
+Nobody asks for the review. The roster grants the capability, and the planner's own system prompt decides when to use it: a new day-by-day itinerary gets reviewed, a quick factual answer does not. The send is not a tool call and not an app request; the only trace is the thread events themselves.
+
+Two models in one session: the planner drafts on Sonnet, the gut-check runs on Opus. The reviewer's toolset is deny-by-default, which is both a security boundary (a poisoned draft has no tool to reach) and what keeps the review fast: it judges the draft text alone. Its reply still routes back, because agent-to-agent messaging is platform machinery, not a tool.
+
+The exchange renders in two places, fed by the same events. In the chat, two chips: "to Plan reviewer" (the draft going out) and "from Plan reviewer" (the critique coming back). In the rail: `session.thread_created`, then both `agent.thread_message_*` events with their full text, then the thread's status events as it goes idle. The planner's final reply ends with a "Reviewer flagged" line when the critique raised something it could not verify from the APIs.
+
+The review costs one extra agent and zero plumbing: no queue, no webhook, no second session. The thread lives inside the session, its lifecycle events cross-post to the stream the page already holds open, and the agent-to-agent messages are persisted events, so the exchange survives a reload like everything else.
+
+## The client
+
+A Managed Agents session already is a resumable chat backend, so the client is three pieces, all in [`src/lib/use-managed-agent-session.ts`](./src/lib/use-managed-agent-session.ts) and [`src/lib/transcript.ts`](./src/lib/transcript.ts):
+
+- **One EventSource** on `/api/stream`, a thin authenticated proxy of the session's SSE tail. The browser owns reconnect, and every connect re-fetches the event log first: the stream is a tail, not a replay, and previews are never persisted. That habit is the whole resume story: reload mid-answer and the page renders what streaming rendered.
+- **The SDK's `accumulateManagedAgentsEvent`**, run in the browser, folds `event_start` / `event_delta` previews into one growing `agent.message` snapshot. Previews are speculative: the buffered event with the same id retires the snapshot, and because the ids match, the rendered part does not jump. A delta for a preview the client never saw open (it attached mid-generation) is dropped, and the buffered event delivers the whole thing. An errored model request never produces its buffered event, so its terminal `span.model_request_end` drops the snapshot instead.
+- **One fold** from the event array to renderable turns. The same fold runs over hydrated history and the live array, so a reload renders what streaming rendered, turn stats included.
+
+Everything else on the stream is a status signal: thinking starts drive the activity line, `span.model_request_*` drive the working state and token stats, `agent.thread_message_sent` flips the activity line to "waiting on Plan reviewer..." until the reply lands, and `session.error` distinguishes a retry in progress from a dead turn. Stop is a real API verb: the button POSTs `user.interrupt`, the agent winds down, and the `session.status_idle` that follows flips the UI back to ready.
+
+<details>
+<summary><strong>Caveats</strong></summary>
+
+- `injection_location` has two flags: `header` and `body`. There is no query-string option, so a key in `?api_key=...` is never substituted. APIs that only accept query-param auth can't use environment variable credentials, and if the agent falls back to NPS's `?api_key=` form the request carries the placeholder.
+- Two allowlists must agree. The [environment](https://platform.claude.com/docs/en/managed-agents/environments)'s `networking.allowed_hosts` controls what the sandbox can reach at all, and the credential's `networking.allowed_hosts` controls where its secret may be substituted. A host missing from the first never connects, and a host missing from the second gets the placeholder.
+- Streaming previews cover `agent.message` (text deltas) and `agent.thinking` (start only) on the session's primary thread. Tool calls and subagent threads arrive as buffered events, so the reviewer's critique lands whole, not token by token.
+- The roster is flat: 1 to 20 entries, and a roster agent cannot have a roster of its own (depth limit 1). The model picker overrides the planner only; the reviewer thread always runs the reviewer agent's stored model.
+- Vaults and the model both attach at `sessions.create`. You can update a credential in a vault a running session already holds, but a different vault or a different model means a new session, which is why the picker starts a new trip.
+
+</details>
+
+## Files
+
+| | |
+|---|---|
+| `setup/create.ts` | environment + planner + reviewer + vault + two credentials, idempotent, prints ids |
+| `setup/teardown.ts` | archive everything the setup created |
+| `src/lib/client.ts` | the shared SDK client and the session cookie name |
+| `src/lib/use-managed-agent-session.ts` | the client runtime: one EventSource, the SDK accumulator, send/stop |
+| `src/lib/transcript.ts` | the event log folded into renderable turns |
+| `src/app/api/session/route.ts` | create or resume the session behind the cookie, return the event log |
+| `src/app/api/stream/route.ts` | SSE proxy of the session tail (the API key stays server-side) |
+| `src/app/api/chat/route.ts` | send one `user.message` |
+| `src/app/api/interrupt/route.ts` | send `user.interrupt` (the stop button) |
+| `src/app/page.tsx` | the chat, the model picker, the tool rail |
@@ -0,0 +1,11 @@
+import type { NextConfig } from "next";
+
+const nextConfig: NextConfig = {
+  // The chat page calls /api/session once on mount to create (or resume) the
+  // Managed Agents session. React strict mode double-runs effects in dev, which would
+  // race two session creates, so the effect guards itself with a ref instead
+  // of relying on this flag. Nothing here is load-bearing.
+  reactStrictMode: true,
+};
+
+export default nextConfig;