feat(agent-sdk): add session browser cookbook#480

Merged
qing-ant merged 2 commits into main from qing/agent-sdk-session-browser
Apr 1, 2026


qing-ant commented Mar 30, 2026

Contributor

What

Adds notebook 05 to the Agent SDK cookbook series, covering the session management primitives that shipped in claude-agent-sdk v0.1.51: list_sessions, get_session_info, get_session_messages, rename_session, tag_session, fork_session, delete_session.

The cookbook walks through building a session browser — the sidebar pattern from Claude Code Desktop and the VS Code extension:

  1. Generate demo sessions: three short query() calls with cwd scoped to an isolated demo dir (Haiku, single-turn, no tools, to keep token spend minimal)
  2. List and inspect: list_sessions() rendered as a pandas table, get_session_info() for single lookup, get_session_messages() with content-block extraction
  3. Organize: rename_session(), tag_session(), soft-delete via __hidden tag + a filter helper
  4. Fork and resume: fork_session() then continue the fork with query(options=ClaudeAgentOptions(resume=fork.session_id))
  5. Cleanup: delete_session() over everything the demo created

Why

Companion cookbook to the session management docs page. The docs page is reference material; this is the hands-on recipe.

Also in this PR

  • registry.yaml — cookbook entry (category: Claude Agent SDK)
  • authors.yaml — added qing-ant
  • claude_agent_sdk/pyproject.toml — bumped SDK pin from >=0.0.20 to >=0.1.51 so the session functions are available in the tutorial kernel
  • claude_agent_sdk/session_browser_demo/.gitkeep — demo workspace dir for cwd=

Testing

  • ruff check / ruff format pass
  • verify_registry.py passes (schema + GitHub handle validation)
  • pre-commit hooks pass
  • Notebook executed end-to-end and outputs committed

Adds notebook 05 covering the session management primitives that shipped
in claude-agent-sdk v0.1.51 (list_sessions, get_session_info,
get_session_messages, rename_session, tag_session, fork_session,
delete_session).

The cookbook walks through building a session browser: generating demo
sessions, listing them with pagination, reading messages back, organizing
with titles and tags, and forking into a live query() call. This is the
sidebar pattern from Claude Code Desktop and the VS Code extension.

Also bumps the agent_sdk pyproject pin to >=0.1.51 so the functions are
available in the tutorial kernel.
@github-actions


Notebook Changes

This PR modifies the following notebooks:

📓 claude_agent_sdk/05_Building_a_session_browser.ipynb

View diff
nbdiff /dev/null claude_agent_sdk/05_Building_a_session_browser.ipynb (158d64c3c40ef365096d0969475b595125b6d815)
--- /dev/null  2026-03-30 21:26:00.696285
+++ claude_agent_sdk/05_Building_a_session_browser.ipynb (158d64c3c40ef365096d0969475b595125b6d815)  (no timestamp)
## added /cells:
+  markdown cell:
+    source:
+      # Building a Session Browser
+      
+      When you ship an agent as a product (a desktop app, an IDE extension, an internal chatbot), the first thing users ask for is the sidebar. They want to see the conversation from last Tuesday, jump back into it, and maybe branch off in a new direction without losing the original. The agent loop is half the product; the other half is session management.
+      
+      The Claude Agent SDK writes every conversation to a JSONL transcript on disk. It also ships a set of functions that read those transcripts back and organize them, so you can build that sidebar without writing a file parser or walking `~/.claude/projects/` by hand.
+      
+      **By the end of this cookbook, you'll be able to:**
+      
+      - List and render past sessions for a project, with pagination and metadata like branch, title, and last-modified time
+      - Read a stored session's messages back into your UI without spawning the agent
+      - Rename, tag, and filter sessions so users can organize their history
+      - Fork a session at any point and resume the fork as a live `query()` call
+      
+      This is the pattern behind the session sidebar in Claude Code Desktop and the VS Code extension. The same primitives work for any UI you want to put on top of the Agent SDK.
+  markdown cell:
+    source:
+      ## Prerequisites
+      
+      Before following this guide, ensure you have:
+      
+      **Required Knowledge**
+      
+      - Python fundamentals, including `async`/`await`
+      - Basic familiarity with the Agent SDK's `query()` function (see [Notebook 00](00_The_one_liner_research_agent.ipynb) for an introduction)
+      
+      **Required Tools**
+      
+      - Python 3.11 or higher
+      - The Claude Code CLI (`npm install -g @anthropic-ai/claude-code`)
+      - An Anthropic API key ([get one here](https://console.anthropic.com))
+      
+      ## Setup
+      
+      Install the required dependencies. Session management functions landed in `claude-agent-sdk` v0.1.51.
+  code cell:
+    source:
+      %%capture
+      %pip install -U "claude-agent-sdk>=0.1.51" python-dotenv pandas
+  markdown cell:
+    source:
+      Load your API key from `.env` and configure the model. We use Haiku here because the demo sessions are short and we want them cheap and fast; in a real product you'd pick whatever model fits your agent.
+  code cell:
+    source:
+      import os
+      from pathlib import Path
+      
+      from dotenv import load_dotenv
+      
+      load_dotenv()
+      
+      MODEL = "claude-haiku-4-5"
+      
+      # All demo sessions live under this project directory. Using a dedicated
+      # cwd keeps the demo isolated from your real Claude Code sessions.
+      DEMO_DIR = str(Path("session_browser_demo").resolve())
+      os.makedirs(DEMO_DIR, exist_ok=True)
+      print(f"Demo project dir: {DEMO_DIR}")
+  markdown cell:
+    source:
+      # Part 1: Create some sessions to manage
+      
+      Session management functions read transcripts that `query()` (or the Claude Code CLI) has already written. To have something to browse, we first run three short conversations and capture their session IDs.
+      
+      The `cwd` option tells the SDK which project directory this conversation belongs to. Transcripts are filed under `~/.claude/projects/<encoded-cwd>/<session-id>.jsonl`, so every call with the same `cwd` ends up in the same bucket. That bucket is what `list_sessions(directory=...)` reads later.
+      
+      We disable tools and cap each run at one turn to keep token usage minimal.
+  code cell:
+    source:
+      from claude_agent_sdk import ClaudeAgentOptions, ResultMessage, query
+      
+      
+      async def run_one_turn(prompt: str) -> str:
+          """Run a single-turn conversation and return its session_id."""
+          opts = ClaudeAgentOptions(
+              model=MODEL,
+              cwd=DEMO_DIR,
+              max_turns=1,
+              allowed_tools=[],  # text-only, no tool loop
+          )
+          session_id = None
+          async for msg in query(prompt=prompt, options=opts):
+              if isinstance(msg, ResultMessage):
+                  session_id = msg.session_id
+                  preview = (msg.result or "")[:80]
+                  print(f"[{session_id[:8]}] {preview}...")
+          assert session_id is not None
+          return session_id
+  markdown cell:
+    source:
+      Three prompts, three sessions. In a real product these would be whatever your users asked the agent.
+  code cell:
+    source:
+      prompts = [
+          "Give me three name ideas for a CLI tool that manages git worktrees.",
+          "Explain the difference between a mutex and a semaphore in one paragraph.",
+          "Write a haiku about merge conflicts.",
+      ]
+      
+      demo_session_ids = []
+      for p in prompts:
+          sid = await run_one_turn(p)
+          demo_session_ids.append(sid)
+      
+      print(f"\nCreated {len(demo_session_ids)} sessions.")
+  markdown cell:
+    source:
+      # Part 2: List and inspect sessions
+      
+      ## Build the session list
+      
+      `list_sessions()` scans the project's transcript directory and returns metadata for each session, sorted newest first. It reads file stats plus the head and tail of each transcript, so it stays fast even when the directory has hundreds of files. No subprocess is spawned and no API call is made.
+      
+      Each `SDKSessionInfo` carries what you need to render a row in a picker: a display summary, last-modified timestamp, git branch, working directory, and any custom title or tag you've set.
+  code cell:
+    source:
+      from datetime import datetime
+      
+      import pandas as pd
+      
+      from claude_agent_sdk import list_sessions
+      
+      sessions = list_sessions(directory=DEMO_DIR)
+      
+      # Render as a table. In a real UI this would be your sidebar component.
+      rows = []
+      for s in sessions:
+          rows.append(
+              {
+                  "id": s.session_id[:8],
+                  "summary": (s.summary[:50] + "...") if len(s.summary) > 50 else s.summary,
+                  "modified": datetime.fromtimestamp(s.last_modified / 1000).strftime("%H:%M:%S"),
+                  "branch": s.git_branch or "-",
+                  "tag": s.tag or "-",
+              }
+          )
+      
+      pd.DataFrame(rows)
+  markdown cell:
+    source:
+      For long histories, pass `limit` and `offset` to page through results. A session picker typically loads the first page, then fetches more as the user scrolls:
+      
+      ```python
+      page_2 = list_sessions(directory=DEMO_DIR, limit=20, offset=20)
+      ```
+      
+      If your app already stored a session ID (say, in your own database alongside a user record) and you just need that one row, `get_session_info()` is cheaper than listing everything.
+  code cell:
+    source:
+      from claude_agent_sdk import get_session_info
+      
+      info = get_session_info(demo_session_ids[0], directory=DEMO_DIR)
+      
+      print(f"Session:      {info.session_id}")
+      print(f"Summary:      {info.summary}")
+      print(f"First prompt: {info.first_prompt}")
+      print(f"Created:      {datetime.fromtimestamp(info.created_at / 1000)}")
+      print(f"Size:         {info.file_size:,} bytes")
+  markdown cell:
+    source:
+      ## Read a session's messages
+      
+      Once a user clicks a session in the sidebar, you load its conversation into the main view. `get_session_messages()` reconstructs the message chain from the transcript and returns user and assistant turns in order. Like the listing function, it's a pure file read: the agent doesn't need to be running.
+      
+      Each `SessionMessage` has a `type` (`"user"` or `"assistant"`), a `uuid`, and a `message` dict in the same shape as the Anthropic Messages API (`role`, `content`).
+  code cell:
+    source:
+      from claude_agent_sdk import get_session_messages
+      
+      messages = get_session_messages(demo_session_ids[0], directory=DEMO_DIR)
+      
+      for m in messages:
+          role = m.type
+          # content is a list of blocks; pull out the text ones
+          text_parts = [
+              b.get("text", "")
+              for b in m.message.get("content", [])
+              if isinstance(b, dict) and b.get("type") == "text"
+          ]
+          text = " ".join(text_parts).strip()
+          print(f"[{role:>9}] {text[:100]}")
+  markdown cell:
+    source:
+      For long sessions, `limit` and `offset` let you load a window at a time. A chat view might load the last 50 messages on open, then fetch older pages as the user scrolls up. Offsets are applied in chronological order (oldest first), so page 0 is the start of the conversation.
+  markdown cell:
+    source:
+      # Part 3: Organize with titles and tags
+      
+      ## Rename a session
+      
+      Auto-generated summaries are fine for a quick glance, but users often want to give a session a real name. `rename_session()` appends a title entry to the transcript; `list_sessions()` picks it up as `custom_title` on the next read.
+      
+      Appends are cheap and idempotent: calling rename twice just means the newer title wins. No file rewrite happens.
+  code cell:
+    source:
+      from claude_agent_sdk import rename_session
+      
+      rename_session(demo_session_ids[0], "Worktree CLI naming brainstorm", directory=DEMO_DIR)
+      rename_session(demo_session_ids[2], "Haiku corner", directory=DEMO_DIR)
+      
+      # Verify the titles stuck
+      for s in list_sessions(directory=DEMO_DIR):
+          label = s.custom_title or "(auto)"
+          print(f"{s.session_id[:8]}  custom_title={label!r}  summary={s.summary[:40]!r}")
+  markdown cell:
+    source:
+      ## Tag and filter
+      
+      Tags are a single string attached to a session. Use them for whatever categorization your product needs: `"archived"`, `"needs-review"`, `"favorite"`. Pass `None` to clear a tag.
+      
+      A common pattern is soft-delete: instead of removing the transcript file, tag it `"__hidden"` and filter that out in your list view. The data stays recoverable.
+  code cell:
+    source:
+      from claude_agent_sdk import tag_session
+      
+      # Mark two sessions as favorites, hide the other
+      tag_session(demo_session_ids[0], "favorite", directory=DEMO_DIR)
+      tag_session(demo_session_ids[2], "favorite", directory=DEMO_DIR)
+      tag_session(demo_session_ids[1], "__hidden", directory=DEMO_DIR)
+      
+      
+      def visible_sessions(directory: str, tag_filter: str | None = None):
+          """List sessions, hiding soft-deletes and optionally filtering by tag."""
+          results = []
+          for s in list_sessions(directory=directory):
+              if s.tag == "__hidden":
+                  continue
+              if tag_filter is not None and s.tag != tag_filter:
+                  continue
+              results.append(s)
+          return results
+      
+      
+      favorites = visible_sessions(DEMO_DIR, tag_filter="favorite")
+      print(f"Visible favorites: {len(favorites)}")
+      for s in favorites:
+          print(f"  {s.session_id[:8]}  [{s.tag}]  {s.custom_title or s.summary}")
+  markdown cell:
+    source:
+      Tags are single values, not lists. If you need multiple axes (say, a status plus a category), encode them into one string like `"review:urgent"` and parse on read, or store richer state in your own database keyed on `session_id`.
+  markdown cell:
+    source:
+      # Part 4: Fork and resume
+      
+      ## Branch from an existing conversation
+      
+      Forking copies a session's transcript into a new file with freshly remapped message IDs. The original stays untouched. This is the primitive behind "try a different approach" features: the user keeps their original thread and gets a new one to experiment in.
+      
+      `fork_session()` writes the new file and returns its ID. It doesn't run the agent, so the fork sits on disk until you resume it with `query()`.
+  code cell:
+    source:
+      from claude_agent_sdk import fork_session
+      
+      source = demo_session_ids[0]
+      
+      fork = fork_session(
+          source,
+          directory=DEMO_DIR,
+          title="Worktree CLI names (round 2)",
+      )
+      
+      print(f"Source: {source}")
+      print(f"Fork:   {fork.session_id}")
+      
+      # The fork starts with the same message history as the source
+      source_msgs = get_session_messages(source, directory=DEMO_DIR)
+      fork_msgs = get_session_messages(fork.session_id, directory=DEMO_DIR)
+      print(f"Source has {len(source_msgs)} messages, fork has {len(fork_msgs)}")
+  markdown cell:
+    source:
+      To branch from a specific point rather than the full history, pass `up_to_message_id`. The fork will contain the source transcript up to and including that message. You can get message UUIDs from `get_session_messages()[i].uuid`.
+      
+      ## Resume the fork into a live query
+      
+      The fork is just a transcript file. To turn it back into a running conversation, hand its ID to `ClaudeAgentOptions.resume`. The agent loads the forked history and continues from there.
+  code cell:
+    source:
+      resume_opts = ClaudeAgentOptions(
+          model=MODEL,
+          cwd=DEMO_DIR,
+          max_turns=1,
+          allowed_tools=[],
+          resume=fork.session_id,
+      )
+      
+      async for msg in query(
+          prompt="Those were okay. Give me three more names, but punnier.",
+          options=resume_opts,
+      ):
+          if isinstance(msg, ResultMessage):
+              print(f"[fork {fork.session_id[:8]} resumed]")
+              print(msg.result)
+  markdown cell:
+    source:
+      The original session is still sitting there unchanged. List again and you'll see both the source and the fork as separate rows with independent histories.
+  code cell:
+    source:
+      for s in list_sessions(directory=DEMO_DIR):
+          marker = "(fork)" if s.session_id == fork.session_id else "      "
+          print(f"{marker} {s.session_id[:8]}  {s.custom_title or s.summary[:50]}")
+  markdown cell:
+    source:
+      # Cleanup
+      
+      `delete_session()` removes a transcript file. It's a hard delete, which is why the soft-delete tag pattern from Part 3 is usually the safer default for user-facing UIs.
+      
+      Here we use it to tidy up everything the demo created.
+  code cell:
+    source:
+      from claude_agent_sdk import delete_session
+      
+      # Clean up every session in the demo dir, including the fork
+      for s in list_sessions(directory=DEMO_DIR):
+          delete_session(s.session_id, directory=DEMO_DIR)
+          print(f"Deleted {s.session_id[:8]}")
+      
+      remaining = list_sessions(directory=DEMO_DIR)
+      print(f"\n{len(remaining)} session(s) remaining.")
+  markdown cell:
+    source:
+      # Recap
+      
+      We built the core of a session browser against the Agent SDK's local transcript store:
+      
+      - **Listing** with `list_sessions()` gives you everything needed to render a sidebar, and it scales because it reads file stats and head/tail slices rather than parsing whole transcripts.
+      - **Reading** with `get_session_messages()` loads a conversation back for display without spawning the agent.
+      - **Organizing** with `rename_session()` and `tag_session()` appends metadata entries, so it's cheap and the most recent call wins.
+      - **Forking** with `fork_session()` plus `options.resume` lets users branch a conversation and keep going without touching the original.
+      
+      All of these are pure file operations on `~/.claude/projects/`. They work whether or not the agent subprocess is running, and they see the same transcripts the Claude Code CLI writes.
+      
+      ## Where to go next
+      
+      - **Wire it to a UI.** These functions are UI-agnostic; drop them behind a FastAPI route or an Electron IPC handler and you have the backend for a session sidebar.
+      - **Cross-host sessions.** Transcripts live on the local disk. To share sessions across machines, sync the files or index session IDs and messages into your own store. See [Manage sessions on disk](https://docs.claude.com/en/agent-sdk/local-session-management) for patterns.
+      - **TypeScript.** The same API exists in `@anthropic-ai/claude-agent-sdk` with camelCase names (`listSessions`, `forkSession`, and so on). See the [TypeScript SDK reference](https://docs.claude.com/en/agent-sdk/typescript).
+      - **The bigger picture.** Notebooks [00](00_The_one_liner_research_agent.ipynb) through [03](03_The_site_reliability_agent.ipynb) cover building the agent itself. This notebook covers managing what it leaves behind.

Generated by nbdime
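
Part 3 of the notebook notes that tags are single values and suggests encoding several axes into one string such as "review:urgent", then parsing on read. One way that could look is sketched below. The `encode_tag` and `decode_tag` helpers and the choice of `:` as separator are assumptions made for illustration; they are not part of claude-agent-sdk.

```python
from typing import Optional, Tuple

SEPARATOR = ":"  # assumed separator, taken from the "review:urgent" example


def encode_tag(status: str, category: str) -> str:
    """Pack two axes into a single tag string (hypothetical helper)."""
    if SEPARATOR in status:
        # A separator inside the first field would make decoding ambiguous.
        raise ValueError(f"status must not contain {SEPARATOR!r}")
    return f"{status}{SEPARATOR}{category}"


def decode_tag(tag: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Split a tag back into (status, category); category is None if absent."""
    if tag is None:
        return (None, None)
    status, sep, category = tag.partition(SEPARATOR)
    return (status, category if sep else None)


packed = encode_tag("review", "urgent")
unpacked = decode_tag(packed)
```

The packed string is what would be handed to tag_session(); plain tags such as "favorite" still decode, with the category left as None.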

github-actions Bot commented Mar 30, 2026

PR Review

Recommendation: COMMENT

Summary

This notebook is a well-structured, clear walkthrough of the Agent SDK session management API, covering all key primitives (list, inspect, rename, tag, fork, resume, delete). The writing quality is high and the four-part structure maps cleanly to the stated learning objectives.

Actionable Feedback (4 items)
  • 05_Building_a_session_browser.ipynb (all cells) — Notebook was not executed before submission; all cells have empty outputs and null execution counts. Per the project guidelines, outputs must be kept in notebooks as demonstration artifacts. Please run the notebook end-to-end and commit with outputs.
  • 05_Building_a_session_browser.ipynb (in cell with async def run_one_turn) — Replace assert session_id is not None with an explicit check: if session_id is None: raise RuntimeError("No ResultMessage received; check SDK version >=0.1.51 and API key."). assert gives no diagnostic information on failure and is stripped with -O.
  • 05_Building_a_session_browser.ipynb (in cell with def visible_sessions) — Add a return type annotation (-> list). The run_one_turn helper has -> str; consistency is worth maintaining.
  • 05_Building_a_session_browser.ipynb (markdown cell "Build the session list") — SDKSessionInfo is described by name in prose but never imported. Add it to the relevant import cell so readers have a copy-pasteable example.
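
The `assert` point in the second item can be shown without the SDK. The sketch below uses a hypothetical helper, `require_session_id` (not part of claude-agent-sdk), to illustrate the explicit check being requested; the error message is the one suggested above.

```python
from typing import Optional


def require_session_id(session_id: Optional[str]) -> str:
    """Return session_id, or raise with a diagnostic message if it is missing.

    Hypothetical helper for illustration; not part of claude-agent-sdk.
    """
    if session_id is None:
        # Unlike `assert`, this check still runs under `python -O`.
        raise RuntimeError(
            "No ResultMessage received; check SDK version >=0.1.51 and API key."
        )
    return session_id
```

A caller would wrap the value collected in the `async for` loop, for example `return require_session_id(session_id)`.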
Detailed Review

Code Quality

Code is clean and idiomatic. The visible_sessions helper is a good pattern to expose (soft-delete via __hidden tag). Imports are scattered one-per-section, which keeps cell outputs focused but differs from the other notebooks in this series that aggregate imports in the setup cell.
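
As a rough illustration of the soft-delete pattern with a typed return value, the following sketch filters a list of stand-in rows. `SessionRow`, `HIDDEN_TAG`, and `visible` are hypothetical names introduced here; the notebook's own helper works on the objects returned by list_sessions().

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

HIDDEN_TAG = "__hidden"  # the soft-delete marker used in the notebook


@dataclass
class SessionRow:
    """Hypothetical stand-in for the two session fields used below."""

    session_id: str
    tag: Optional[str] = None


def visible(rows: Iterable[SessionRow], tag_filter: Optional[str] = None) -> List[SessionRow]:
    """Drop soft-deleted rows and optionally keep only rows tagged `tag_filter`."""
    return [
        row
        for row in rows
        if row.tag != HIDDEN_TAG and (tag_filter is None or row.tag == tag_filter)
    ]


rows = [
    SessionRow("a1", "favorite"),
    SessionRow("b2", HIDDEN_TAG),
    SessionRow("c3"),
]
favorites = visible(rows, tag_filter="favorite")
everything_visible = visible(rows)
```

Because hidden rows are only filtered out, not deleted, clearing the tag makes them reappear in the list.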

The DEMO_DIR = str(Path("session_browser_demo").resolve()) resolves relative to the kernel cwd at execution time. This works in the expected Jupyter environment but can point to an unexpected location under nbconvert --execute from the repo root. A comment like # Assumes kernel cwd is the claude_agent_sdk/ directory would surface the assumption.
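
The working-directory dependence is easy to reproduce with the standard library alone. This sketch (the `resolve_demo_dir` helper is hypothetical) resolves the same relative name from two temporary directories and ends up with two different absolute paths.

```python
import os
import tempfile
from pathlib import Path


def resolve_demo_dir(name: str = "session_browser_demo") -> str:
    """Resolve `name` against the current working directory, as the notebook does."""
    return str(Path(name).resolve())


# Resolve the same relative name from two different working directories.
original_cwd = os.getcwd()
resolved = []
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    try:
        for workdir in (first, second):
            os.chdir(workdir)
            resolved.append(resolve_demo_dir())
    finally:
        os.chdir(original_cwd)  # always restore the caller's cwd

paths_differ = resolved[0] != resolved[1]
```

Since sessions are bucketed by `cwd`, a different resolved path means a different bucket, which is why surfacing the assumption in a comment is worthwhile.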

The pagination and up_to_message_id examples are shown only in markdown code fences, never as live cells. Even a minimal live example (e.g., getting a UUID from get_session_messages()[0].uuid) would make the feature tangible for readers.
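
For the up_to_message_id case, the notebook states that the fork contains the source transcript up to and including the given message. A minimal sketch of that semantics on plain dicts follows; `history_up_to` and the sample UUID strings are hypothetical stand-ins, and this is not how the SDK implements forking.

```python
from typing import Dict, List


def history_up_to(messages: List[Dict[str, str]], message_id: str) -> List[Dict[str, str]]:
    """Return the messages up to and including the one whose uuid matches.

    Hypothetical illustration of the documented semantics, not SDK code.
    """
    for index, message in enumerate(messages):
        if message["uuid"] == message_id:
            return messages[: index + 1]
    raise ValueError(f"unknown message id: {message_id}")


transcript = [
    {"uuid": "u-1", "type": "user"},
    {"uuid": "u-2", "type": "assistant"},
    {"uuid": "u-3", "type": "user"},
]
branch = history_up_to(transcript, "u-2")
```

In the notebook, the UUID would come from `get_session_messages()[i].uuid` rather than a hand-written list.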

Security

No concerns. API key is loaded via dotenv, no secrets are hardcoded.

Suggestions

  • Convert the inline pagination snippet (page_2 = list_sessions(...)) from a markdown fence to a live cell; with only 3 demo sessions the result (empty list) is itself instructive.
  • **Required Knowledge** in the prerequisites cell is missing the colon present in the template style.
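
To see why an empty second page is instructive, here is a sketch of limit/offset paging over a plain list. It assumes, without confirmation from the SDK source, that list_sessions() windows its results the way a slice would; `page` is a hypothetical stand-in.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def page(items: Sequence[T], limit: Optional[int] = None, offset: int = 0) -> List[T]:
    """Return one window of `items` (hypothetical stand-in for limit/offset paging)."""
    end = None if limit is None else offset + limit
    return list(items[offset:end])


session_ids = ["a1", "b2", "c3"]  # stand-ins for the three demo sessions
first_page = page(session_ids, limit=20, offset=0)
second_page = page(session_ids, limit=20, offset=20)
```

With three items and a page size of 20, the first page holds everything and the second is empty, which is the signal a scrolling picker would use to stop fetching.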

Positive Notes

  • Introduction leads with the user problem ("the sidebar users ask for first"), not the API surface — exactly right for a cookbook.
  • Haiku model choice for demo sessions is explicitly justified (token cost), which models good habits for readers.
  • The cleanup cell (hard delete) paired with the earlier soft-delete pattern gives readers both options with a clear rationale for each.
  • "Where to go next" section is thorough: covers UI wiring, cross-host patterns, and the TypeScript equivalent with correct camelCase names.

@github-actions


Notebook Changes

This PR modifies the following notebooks:

📓 claude_agent_sdk/05_Building_a_session_browser.ipynb

View diff
nbdiff /dev/null claude_agent_sdk/05_Building_a_session_browser.ipynb (67874c17f87d53b80fc8c365eb7aa3c1c5325bd5)
--- /dev/null  2026-03-30 22:20:01.620895
+++ claude_agent_sdk/05_Building_a_session_browser.ipynb (67874c17f87d53b80fc8c365eb7aa3c1c5325bd5)  (no timestamp)
## added /cells:
+  markdown cell:
+    source:
+      # Building a Session Browser
+      
+      When you ship an agent as a product (a desktop app, an IDE extension, an internal chatbot), the first thing users ask for is the sidebar. They want to see the conversation from last Tuesday, jump back into it, and maybe branch off in a new direction without losing the original. The agent loop is half the product; the other half is session management.
+      
+      The Claude Agent SDK writes every conversation to a JSONL transcript on disk. It also ships a set of functions that read those transcripts back and organize them, so you can build that sidebar without writing a file parser or walking `~/.claude/projects/` by hand.
+      
+      **By the end of this cookbook, you'll be able to:**
+      
+      - List and render past sessions for a project, with pagination and metadata like branch, title, and last-modified time
+      - Read a stored session's messages back into your UI without spawning the agent
+      - Rename, tag, and filter sessions so users can organize their history
+      - Fork a session at any point and resume the fork as a live `query()` call
+      
+      This is the pattern behind the session sidebar in Claude Code Desktop and the VS Code extension. The same primitives work for any UI you want to put on top of the Agent SDK.
+  markdown cell:
+    source:
+      ## Prerequisites
+      
+      Before following this guide, ensure you have:
+      
+      **Required Knowledge**
+      
+      - Python fundamentals, including `async`/`await`
+      - Basic familiarity with the Agent SDK's `query()` function (see [Notebook 00](00_The_one_liner_research_agent.ipynb) for an introduction)
+      
+      **Required Tools**
+      
+      - Python 3.11 or higher
+      - The Claude Code CLI (`npm install -g @anthropic-ai/claude-code`)
+      - An Anthropic API key ([get one here](https://console.anthropic.com))
+      
+      ## Setup
+      
+      Install the required dependencies. Session management functions landed in `claude-agent-sdk` v0.1.51.
+  code cell:
+    source:
+      %%capture
+      %pip install -U "claude-agent-sdk>=0.1.51" python-dotenv pandas
+  markdown cell:
+    source:
+      Load your API key from `.env` and configure the model. We use Haiku here because the demo sessions are short and we want them cheap and fast; in a real product you'd pick whatever model fits your agent.
+  code cell:
+    source:
+      import os
+      from pathlib import Path
+      
+      from dotenv import load_dotenv
+      
+      load_dotenv()
+      
+      MODEL = "claude-haiku-4-5"
+      
+      # All demo sessions live under this project directory. Using a dedicated
+      # cwd keeps the demo isolated from your real Claude Code sessions.
+      # Note: this path resolves relative to the kernel's working directory
+      # (claude_agent_sdk/ when launched per the README).
+      DEMO_DIR = str(Path("session_browser_demo").resolve())
+      os.makedirs(DEMO_DIR, exist_ok=True)
+      print(f"Demo project dir: {DEMO_DIR}")
+  markdown cell:
+    source:
+      # Part 1: Create some sessions to manage
+      
+      Session management functions read transcripts that `query()` (or the Claude Code CLI) has already written. To have something to browse, we first run three short conversations and capture their session IDs.
+      
+      The `cwd` option tells the SDK which project directory this conversation belongs to. Transcripts are filed under `~/.claude/projects/<encoded-cwd>/<session-id>.jsonl`, so every call with the same `cwd` ends up in the same bucket. That bucket is what `list_sessions(directory=...)` reads later.
+      
+      We disable tools and cap each run at one turn to keep token usage minimal.
+  code cell:
+    source:
+      from claude_agent_sdk import ClaudeAgentOptions, ResultMessage, query
+      
+      
+      async def run_one_turn(prompt: str) -> str:
+          """Run a single-turn conversation and return its session_id."""
+          opts = ClaudeAgentOptions(
+              model=MODEL,
+              cwd=DEMO_DIR,
+              max_turns=1,
+              allowed_tools=[],  # text-only, no tool loop
+          )
+          session_id = None
+          async for msg in query(prompt=prompt, options=opts):
+              if isinstance(msg, ResultMessage):
+                  session_id = msg.session_id
+                  preview = (msg.result or "")[:80]
+                  print(f"[{session_id[:8]}] {preview}...")
+          if session_id is None:
+              raise RuntimeError("No ResultMessage received; check API key and SDK version.")
+          return session_id
+  markdown cell:
+    source:
+      Three prompts, three sessions. In a real product these would be whatever your users asked the agent.
+  code cell:
+    source:
+      prompts = [
+          "Give me three name ideas for a CLI tool that manages git worktrees.",
+          "Explain the difference between a mutex and a semaphore in one paragraph.",
+          "Write a haiku about merge conflicts.",
+      ]
+      
+      demo_session_ids = []
+      for p in prompts:
+          sid = await run_one_turn(p)
+          demo_session_ids.append(sid)
+      
+      print(f"\nCreated {len(demo_session_ids)} sessions.")
+  markdown cell:
+    source:
+      # Part 2: List and inspect sessions
+      
+      ## Build the session list
+      
+      `list_sessions()` scans the project's transcript directory and returns metadata for each session, sorted newest first. It reads file stats plus the head and tail of each transcript, so it stays fast even when the directory has hundreds of files. No subprocess is spawned and no API call is made.
+      
+      Each `SDKSessionInfo` carries what you need to render a row in a picker: a display summary, last-modified timestamp, git branch, working directory, and any custom title or tag you've set.
+  code cell:
+    source:
+      from datetime import datetime
+      
+      import pandas as pd
+      
+      from claude_agent_sdk import SDKSessionInfo, list_sessions
+      
+      sessions = list_sessions(directory=DEMO_DIR)
+      
+      # Render as a table. In a real UI this would be your sidebar component.
+      rows = []
+      for s in sessions:
+          rows.append(
+              {
+                  "id": s.session_id[:8],
+                  "summary": (s.summary[:50] + "...") if len(s.summary) > 50 else s.summary,
+                  "modified": datetime.fromtimestamp(s.last_modified / 1000).strftime("%H:%M:%S"),
+                  "branch": s.git_branch or "-",
+                  "tag": s.tag or "-",
+              }
+          )
+      
+      pd.DataFrame(rows)
+  markdown cell:
+    source:
+      For long histories, pass `limit` and `offset` to page through results. A session picker typically loads the first page, then fetches more as the user scrolls:
+      
+      ```python
+      page_2 = list_sessions(directory=DEMO_DIR, limit=20, offset=20)
+      ```
+      
+      If your app already stored a session ID (say, in your own database alongside a user record) and you just need that one row, `get_session_info()` is cheaper than listing everything.
+  code cell:
+    source:
+      from claude_agent_sdk import get_session_info
+      
+      info = get_session_info(demo_session_ids[0], directory=DEMO_DIR)
+      
+      print(f"Session:      {info.session_id}")
+      print(f"Summary:      {info.summary}")
+      print(f"First prompt: {info.first_prompt}")
+      print(f"Created:      {datetime.fromtimestamp(info.created_at / 1000)}")
+      print(f"Size:         {info.file_size:,} bytes")
+  markdown cell:
+    source:
+      ## Read a session's messages
+      
+      Once a user clicks a session in the sidebar, you load its conversation into the main view. `get_session_messages()` reconstructs the message chain from the transcript and returns user and assistant turns in order. Like the listing function, it's a pure file read: the agent doesn't need to be running.
+      
+      Each `SessionMessage` has a `type` (`"user"` or `"assistant"`), a `uuid`, and a `message` dict in the same shape as the Anthropic Messages API (`role`, `content`).
+  code cell:
+    source:
+      from claude_agent_sdk import get_session_messages
+      
+      messages = get_session_messages(demo_session_ids[0], directory=DEMO_DIR)
+      
+      for m in messages:
+          role = m.type
+          # content is a list of blocks; pull out the text ones
+          text_parts = [
+              b.get("text", "")
+              for b in m.message.get("content", [])
+              if isinstance(b, dict) and b.get("type") == "text"
+          ]
+          text = " ".join(text_parts).strip()
+          print(f"[{role:>9}] {text[:100]}")
+  markdown cell:
+    source:
+      For long sessions, `limit` and `offset` let you load a window at a time. A chat view might load the last 50 messages on open, then fetch older pages as the user scrolls up. Offsets are applied in chronological order (oldest first), so page 0 is the start of the conversation.
+  markdown cell:
+    source:
+      # Part 3: Organize with titles and tags
+      
+      ## Rename a session
+      
+      Auto-generated summaries are fine for a quick glance, but users often want to give a session a real name. `rename_session()` appends a title entry to the transcript; `list_sessions()` picks it up as `custom_title` on the next read.
+      
+      Appends are cheap and idempotent: calling rename twice just means the newer title wins. No file rewrite happens.
+  code cell:
+    source:
+      from claude_agent_sdk import rename_session
+      
+      rename_session(demo_session_ids[0], "Worktree CLI naming brainstorm", directory=DEMO_DIR)
+      rename_session(demo_session_ids[2], "Haiku corner", directory=DEMO_DIR)
+      
+      # Verify the titles stuck
+      for s in list_sessions(directory=DEMO_DIR):
+          label = s.custom_title or "(auto)"
+          print(f"{s.session_id[:8]}  custom_title={label!r}  summary={s.summary[:40]!r}")
+  markdown cell:
+    source:
+      ## Tag and filter
+      
+      Tags are a single string attached to a session. Use them for whatever categorization your product needs: `"archived"`, `"needs-review"`, `"favorite"`. Pass `None` to clear a tag.
+      
+      A common pattern is soft-delete: instead of removing the transcript file, tag it `"__hidden"` and filter that out in your list view. The data stays recoverable.
+  code cell:
+    source:
+      from claude_agent_sdk import tag_session
+      
+      # Mark two sessions as favorites, hide the other
+      tag_session(demo_session_ids[0], "favorite", directory=DEMO_DIR)
+      tag_session(demo_session_ids[2], "favorite", directory=DEMO_DIR)
+      tag_session(demo_session_ids[1], "__hidden", directory=DEMO_DIR)
+      
+      
+      def visible_sessions(directory: str, tag_filter: str | None = None) -> list[SDKSessionInfo]:
+          """List sessions, hiding soft-deletes and optionally filtering by tag."""
+          results = []
+          for s in list_sessions(directory=directory):
+              if s.tag == "__hidden":
+                  continue
+              if tag_filter is not None and s.tag != tag_filter:
+                  continue
+              results.append(s)
+          return results
+      
+      
+      favorites = visible_sessions(DEMO_DIR, tag_filter="favorite")
+      print(f"Visible favorites: {len(favorites)}")
+      for s in favorites:
+          print(f"  {s.session_id[:8]}  [{s.tag}]  {s.custom_title or s.summary}")
+  markdown cell:
+    source:
+      Tags are single values, not lists. If you need multiple axes (say, a status plus a category), encode them into one string like `"review:urgent"` and parse on read, or store richer state in your own database keyed on `session_id`.
+  markdown cell:
+    source:
+      # Part 4: Fork and resume
+      
+      ## Branch from an existing conversation
+      
+      Forking copies a session's transcript into a new file with freshly remapped message IDs. The original stays untouched. This is the primitive behind "try a different approach" features: the user keeps their original thread and gets a new one to experiment in.
+      
+      `fork_session()` writes the new file and returns its ID. It doesn't run the agent, so the fork sits on disk until you resume it with `query()`.
+  code cell:
+    source:
+      from claude_agent_sdk import fork_session
+      
+      source = demo_session_ids[0]
+      
+      fork = fork_session(
+          source,
+          directory=DEMO_DIR,
+          title="Worktree CLI names (round 2)",
+      )
+      
+      print(f"Source: {source}")
+      print(f"Fork:   {fork.session_id}")
+      
+      # The fork starts with the same message history as the source
+      source_msgs = get_session_messages(source, directory=DEMO_DIR)
+      fork_msgs = get_session_messages(fork.session_id, directory=DEMO_DIR)
+      print(f"Source has {len(source_msgs)} messages, fork has {len(fork_msgs)}")
+  markdown cell:
+    source:
+      To branch from a specific point rather than the full history, pass `up_to_message_id`. The fork will contain the source transcript up to and including that message. You can get message UUIDs from `get_session_messages()[i].uuid`.
+      
+      ## Resume the fork into a live query
+      
+      The fork is just a transcript file. To turn it back into a running conversation, hand its ID to `ClaudeAgentOptions.resume`. The agent loads the forked history and continues from there.
+  code cell:
+    source:
+      resume_opts = ClaudeAgentOptions(
+          model=MODEL,
+          cwd=DEMO_DIR,
+          max_turns=1,
+          allowed_tools=[],
+          resume=fork.session_id,
+      )
+      
+      async for msg in query(
+          prompt="Those were okay. Give me three more names, but punnier.",
+          options=resume_opts,
+      ):
+          if isinstance(msg, ResultMessage):
+              print(f"[fork {fork.session_id[:8]} resumed]")
+              print(msg.result)
+  markdown cell:
+    source:
+      The original session is still sitting there unchanged. List again and you'll see both the source and the fork as separate rows with independent histories.
+  code cell:
+    source:
+      for s in list_sessions(directory=DEMO_DIR):
+          marker = "(fork)" if s.session_id == fork.session_id else "      "
+          print(f"{marker} {s.session_id[:8]}  {s.custom_title or s.summary[:50]}")
+  markdown cell:
+    source:
+      # Cleanup
+      
+      `delete_session()` removes a transcript file. It's a hard delete, which is why the soft-delete tag pattern from Part 3 is usually the safer default for user-facing UIs.
+      
+      Here we use it to tidy up everything the demo created.
+  code cell:
+    source:
+      from claude_agent_sdk import delete_session
+      
+      # Clean up every session in the demo dir, including the fork
+      for s in list_sessions(directory=DEMO_DIR):
+          delete_session(s.session_id, directory=DEMO_DIR)
+          print(f"Deleted {s.session_id[:8]}")
+      
+      remaining = list_sessions(directory=DEMO_DIR)
+      print(f"\n{len(remaining)} session(s) remaining.")
+  markdown cell:
+    source:
+      # Recap
+      
+      We built the core of a session browser against the Agent SDK's local transcript store:
+      
+      - **Listing** with `list_sessions()` gives you everything needed to render a sidebar, and it scales because it reads file stats and head/tail slices rather than parsing whole transcripts.
+      - **Reading** with `get_session_messages()` loads a conversation back for display without spawning the agent.
+      - **Organizing** with `rename_session()` and `tag_session()` appends metadata entries, so it's cheap and the most recent call wins.
+      - **Forking** with `fork_session()` plus `options.resume` lets users branch a conversation and keep going without touching the original.
+      
+      All of these are pure file operations on `~/.claude/projects/`. They work whether or not the agent subprocess is running, and they see the same transcripts the Claude Code CLI writes.
+      
+      ## Where to go next
+      
+      - **Wire it to a UI.** These functions are UI-agnostic; drop them behind a FastAPI route or an Electron IPC handler and you have the backend for a session sidebar.
+      - **Cross-host sessions.** Transcripts live on the local disk. To share sessions across machines, sync the files or index session IDs and messages into your own store. See [Manage sessions on disk](https://docs.claude.com/en/agent-sdk/local-session-management) for patterns.
+      - **TypeScript.** The same API exists in `@anthropic-ai/claude-agent-sdk` with camelCase names (`listSessions`, `forkSession`, and so on). See the [TypeScript SDK reference](https://docs.claude.com/en/agent-sdk/typescript).
+      - **The bigger picture.** Notebooks [00](00_The_one_liner_research_agent.ipynb) through [03](03_The_site_reliability_agent.ipynb) cover building the agent itself. This notebook covers managing what it leaves behind.

Generated by nbdime

@qing-ant qing-ant marked this pull request as ready for review March 30, 2026 22:22

@github-actions github-actions Bot left a comment

PR Review

Recommendation: COMMENT

Summary

Adds a well-structured session browser cookbook (notebook 05) demonstrating Agent SDK session management primitives: list, inspect, rename, tag, fork, and delete sessions. The narrative framing is strong and follows project conventions closely.

Actionable Feedback (4 items)
  • 05_Building_a_session_browser.ipynb (cleanup cell with for s in list_sessions(...)) — Snapshot the list before iterating to avoid potential mutation-during-iteration issues: sessions_to_delete = list_sessions(directory=DEMO_DIR) then iterate over that. Teaches a safer pattern that readers will copy.
  • 05_Building_a_session_browser.ipynb (cell with def visible_sessions(...)) — SDKSessionInfo is used in the function's return type annotation but is only imported in an earlier cell. Add from claude_agent_sdk import SDKSessionInfo, tag_session to this cell so it's self-contained and won't raise NameError if run standalone.
  • session_browser_demo/.gitkeep — Convention is that .gitkeep files are empty (their only purpose is to force git to track an empty directory). The comment # Session browser demo workspace should be removed; use an empty file or a README.md if a note is needed.
  • 05_Building_a_session_browser.ipynb (list-sessions cell with datetime.fromtimestamp(s.last_modified / 1000)) — fromtimestamp() returns a naive local-time datetime. Add a # local time comment or use datetime.fromtimestamp(..., tz=timezone.utc) to avoid reader confusion across timezones.
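
The last item above can be sketched as a small helper. This is only an illustration under stated assumptions: `format_modified` is a hypothetical name that is not part of the notebook or the SDK, and the millisecond unit is inferred from the notebook's `/ 1000` division.

```python
from datetime import datetime, timezone


def format_modified(ms: int) -> str:
    """Format a Unix-millisecond timestamp as a UTC string (hypothetical helper)."""
    # Assumption: the value is in milliseconds, as the notebook's "/ 1000" implies.
    seconds = ms / 1000
    # Passing tz= returns an aware datetime, so the rendered time does not
    # depend on the timezone of the machine running the notebook.
    aware = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return aware.strftime("%Y-%m-%d %H:%M:%S UTC")


print(format_modified(0))           # 1970-01-01 00:00:00 UTC
print(format_modified(86_400_000))  # 1970-01-02 00:00:00 UTC
```

If local time is the intended display, keeping the naive call and adding a `# local time` comment is the lighter change. The aware variant mainly helps when committed outputs are read in another timezone.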

Detailed Review

Code Quality

The notebook follows project conventions well. %%capture is used correctly for pip installation, load_dotenv() is used for API key access, MODEL is defined as a top-level constant, and the structure flows logically from setup → create → list → organize → fork → cleanup. The visible_sessions helper demonstrating soft-delete via __hidden tag is a good real-world pattern worth keeping.

Security

No concerns. API key is loaded via dotenv, no hardcoded credentials, and DEMO_DIR uses an isolated subdirectory to avoid interfering with real sessions.

Suggestions

The note explaining cwd path resolution ("Note: this path resolves relative to the kernel's working directory") is buried in a code comment. Surfacing it in the adjacent markdown cell would help readers who skim prose but skip code comments.

Positive Notes

  • Strong narrative hook — opens with the user problem ("the sidebar") rather than the API machinery.
  • pyproject.toml bump is well-justified: the markdown explains why >=0.1.51 is needed.
  • Cleanup cell demonstrates delete_session clearly and explains the soft-delete alternative.
  • "Where to go next" section includes a TypeScript callout and links to both related notebooks and docs — genuinely useful.
  • Model choice (claude-haiku-4-5) is correct per CLAUDE.md and the reasoning (cheap demo sessions) is explained.

@github-actions github-actions Bot left a comment

PR Review

Recommendation: COMMENT

Summary

Adds a well-crafted session browser cookbook (notebook 05) covering all the major Agent SDK session-management primitives — list, inspect, rename, tag, fork, and resume. The code is idiomatic, the narrative is clear, and all project conventions are followed correctly.

Actionable Feedback (3 items)
  • claude_agent_sdk/session_browser_demo/.gitkeep — Remove the comment line (# Session browser demo workspace). A .gitkeep file should be zero bytes; the convention is broken by adding content, and some tooling checks for empty files to determine if a directory is "pristine". The directory's purpose is already communicated by its name and the notebook prose.

  • 05_Building_a_session_browser.ipynb (setup cell with DEMO_DIR = str(Path("session_browser_demo").resolve())) — The comment already acknowledges this is fragile, but it will silently break for most users who open the notebook from the repo root (e.g., from VS Code or JupyterLab), resolving to <repo-root>/session_browser_demo/ instead of <repo-root>/claude_agent_sdk/session_browser_demo/. Consider using Path(__file__).resolve().parent / "session_browser_demo" (works in script mode), or for notebooks a __vsc_ipynb_file__-based fallback. At minimum, add a visible warning cell prompting the user to verify the printed path before running further cells.

  • 05_Building_a_session_browser.ipynb (Part headings using #) — "Part 1/2/3/4" headings use # (H1), the same level as the notebook title. Move them to ## and their sub-sections to ### to produce a correct document hierarchy in JupyterBook/nbconvert output.
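
For the DEMO_DIR item above, one possible shape for a more forgiving lookup is sketched below. It is a sketch under stated assumptions, not the notebook's code: `resolve_demo_dir` is a hypothetical helper, and it assumes the kernel starts either in the repo root or in claude_agent_sdk/. It fails loudly in any other location instead of silently creating a directory in the wrong place.

```python
from pathlib import Path


def resolve_demo_dir(
    start: Path,
    package: str = "claude_agent_sdk",
    demo: str = "session_browser_demo",
) -> Path:
    """Locate <package>/<demo> from either the repo root or the package dir."""
    start = start.resolve()
    if start.name == package:
        # Kernel launched from claude_agent_sdk/ (the case the code comment describes).
        return start / demo
    if (start / package).is_dir():
        # Kernel launched from the repo root, e.g. from VS Code or JupyterLab.
        return start / package / demo
    # Anywhere else: refuse to guess rather than write into an unexpected dir.
    raise RuntimeError(f"Cannot find {package}/ from {start}; set DEMO_DIR by hand.")


# In the notebook this could replace the relative Path(...).resolve() call:
# DEMO_DIR = str(resolve_demo_dir(Path.cwd()))
```

Printing the resolved path, as the setup cell already does, remains useful as a visual check either way.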

Detailed Review

Code Quality

The code is clean, idiomatic, and follows all project conventions: non-dated model alias (claude-haiku-4-5), load_dotenv() + os.environ for the API key, %%capture on the install cell, and a raise RuntimeError instead of silently returning None when no result arrives. The visible_sessions soft-delete helper is a genuinely useful real-world pattern. The cleanup cell at the end is a thoughtful addition that keeps the demo re-runnable.

Security

No concerns. No hardcoded keys; authentication follows the correct dotenv pattern throughout.

Suggestions

  • The timestamp division / 1000 (milliseconds → seconds) is correct but unexplained. A brief inline comment (# SDK timestamps are Unix milliseconds) would help readers who inspect the raw JSONL.
  • run_one_turn silently discards non-ResultMessage types. A one-line comment noting this is intentional would reduce reader confusion.
  • The Recap section summarises function signatures more than it closes the loop on the four learning objectives (TLOs) stated in the introduction. Restating them as accomplished outcomes ("You can now list sessions… fork and resume from any point…") would strengthen the conclusion.
  • The pyproject.toml bump from >=0.0.20 to >=0.1.51 is a breaking change for environments pinned to older versions. Worth a brief comment noting the constraint is driven by the session-management APIs added in 0.1.51.
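
A related detail for readers comparing the committed outputs against the raw JSONL: the message-reading cell prints an empty `[     user]` line. One plausible cause, which is an assumption and not confirmed anywhere in this thread, is that the user turn stores `content` as a plain string. The Messages API shape allows `content` to be either a string or a list of blocks, and the cell's list comprehension only handles the list form. A minimal sketch of an extraction helper covering both shapes (`extract_text` is a hypothetical name, not an SDK function):

```python
from typing import Any


def extract_text(content: Any) -> str:
    """Return the visible text of a message's content field (hypothetical helper)."""
    if isinstance(content, str):
        # Plain-string content: the text is the content itself.
        return content.strip()
    if not isinstance(content, list):
        # Missing or unexpected content: show nothing rather than fail.
        return ""
    # List-of-blocks content: keep only the text blocks, as the notebook does.
    parts = [
        block.get("text", "")
        for block in content
        if isinstance(block, dict) and block.get("type") == "text"
    ]
    return " ".join(parts).strip()


print(extract_text("Write a haiku about merge conflicts."))
print(extract_text([{"type": "text", "text": "Branches diverge fast"}]))
```

In the notebook's loop this would be called as `extract_text(m.message.get("content"))`. Non-text blocks still render as empty lines, which may be the intended behaviour for a preview.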

Positive Notes

  • Excellent opening paragraph — "the first thing users ask for is the sidebar" immediately grounds the notebook in a real product scenario.
  • The fork → resume round-trip section is the clearest demonstration of this pattern I've seen in the series.
  • authors.yaml and registry.yaml entries are complete and well-formed.
  • The notebook progresses logically from generating test data → reading → organizing → branching → cleanup, making it easy to follow even without running cells.

@qing-ant

Contributor Author

@github-actions

Notebook Changes

This PR modifies the following notebooks:

📓 claude_agent_sdk/05_Building_a_session_browser.ipynb

nbdiff /dev/null claude_agent_sdk/05_Building_a_session_browser.ipynb (565110aecf1348b0dca4bd30112505c1e3c5ea43)
--- /dev/null  2026-03-30 22:26:37.002839
+++ claude_agent_sdk/05_Building_a_session_browser.ipynb (565110aecf1348b0dca4bd30112505c1e3c5ea43)  (no timestamp)
## added /cells:
+  markdown cell:
+    source:
+      # Building a Session Browser
+      
+      When you ship an agent as a product (a desktop app, an IDE extension, an internal chatbot), the first thing users ask for is the sidebar. They want to see the conversation from last Tuesday, jump back into it, and maybe branch off in a new direction without losing the original. The agent loop is half the product; the other half is session management.
+      
+      The Claude Agent SDK writes every conversation to a JSONL transcript on disk. It also ships a set of functions that read those transcripts back and organize them, so you can build that sidebar without writing a file parser or walking `~/.claude/projects/` by hand.
+      
+      **By the end of this cookbook, you'll be able to:**
+      
+      - List and render past sessions for a project, with pagination and metadata like branch, title, and last-modified time
+      - Read a stored session's messages back into your UI without spawning the agent
+      - Rename, tag, and filter sessions so users can organize their history
+      - Fork a session at any point and resume the fork as a live `query()` call
+      
+      This is the pattern behind the session sidebar in Claude Code Desktop and the VS Code extension. The same primitives work for any UI you want to put on top of the Agent SDK.
+  markdown cell:
+    source:
+      ## Prerequisites
+      
+      Before following this guide, ensure you have:
+      
+      **Required Knowledge**
+      
+      - Python fundamentals, including `async`/`await`
+      - Basic familiarity with the Agent SDK's `query()` function (see [Notebook 00](00_The_one_liner_research_agent.ipynb) for an introduction)
+      
+      **Required Tools**
+      
+      - Python 3.11 or higher
+      - The Claude Code CLI (`npm install -g @anthropic-ai/claude-code`)
+      - An Anthropic API key ([get one here](https://console.anthropic.com))
+      
+      ## Setup
+      
+      Install the required dependencies. Session management functions landed in `claude-agent-sdk` v0.1.51.
+  code cell:
+    source:
+      %%capture
+      %pip install -U "claude-agent-sdk>=0.1.51" python-dotenv pandas
+  markdown cell:
+    source:
+      Load your API key from `.env` and configure the model. We use Haiku here because the demo sessions are short and we want them cheap and fast; in a real product you'd pick whatever model fits your agent.
+  code cell:
+    source:
+      import os
+      from pathlib import Path
+      
+      from dotenv import load_dotenv
+      
+      load_dotenv()
+      
+      MODEL = "claude-haiku-4-5"
+      
+      # All demo sessions live under this project directory. Using a dedicated
+      # cwd keeps the demo isolated from your real Claude Code sessions.
+      # Note: this path resolves relative to the kernel's working directory
+      # (claude_agent_sdk/ when launched per the README).
+      DEMO_DIR = str(Path("session_browser_demo").resolve())
+      os.makedirs(DEMO_DIR, exist_ok=True)
+      print(f"Demo project dir: {DEMO_DIR}")
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+          Demo project dir: /path/to/claude-cookbooks/claude_agent_sdk/session_browser_demo
+  markdown cell:
+    source:
+      # Part 1: Create some sessions to manage
+      
+      Session management functions read transcripts that `query()` (or the Claude Code CLI) has already written. To have something to browse, we first run three short conversations and capture their session IDs.
+      
+      The `cwd` option tells the SDK which project directory this conversation belongs to. Transcripts are filed under `~/.claude/projects/<encoded-cwd>/<session-id>.jsonl`, so every call with the same `cwd` ends up in the same bucket. That bucket is what `list_sessions(directory=...)` reads later.
+      
+      We disable tools and cap each run at one turn to keep token usage minimal.
+  code cell:
+    source:
+      from claude_agent_sdk import ClaudeAgentOptions, ResultMessage, query
+      
+      
+      async def run_one_turn(prompt: str) -> str:
+          """Run a single-turn conversation and return its session_id."""
+          opts = ClaudeAgentOptions(
+              model=MODEL,
+              cwd=DEMO_DIR,
+              max_turns=1,
+              allowed_tools=[],  # text-only, no tool loop
+          )
+          session_id = None
+          async for msg in query(prompt=prompt, options=opts):
+              if isinstance(msg, ResultMessage):
+                  session_id = msg.session_id
+                  preview = (msg.result or "")[:80]
+                  print(f"[{session_id[:8]}] {preview}...")
+          if session_id is None:
+              raise RuntimeError("No ResultMessage received; check API key and SDK version.")
+          return session_id
+  markdown cell:
+    source:
+      Three prompts, three sessions. In a real product these would be whatever your users asked the agent.
+  code cell:
+    source:
+      prompts = [
+          "Give me three name ideas for a CLI tool that manages git worktrees.",
+          "Explain the difference between a mutex and a semaphore in one paragraph.",
+          "Write a haiku about merge conflicts.",
+      ]
+      
+      demo_session_ids = []
+      for p in prompts:
+          sid = await run_one_turn(p)
+          demo_session_ids.append(sid)
+      
+      print(f"\nCreated {len(demo_session_ids)} sessions.")
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+          [6c06fad2] Here are three name ideas for a git worktree management CLI tool:
+          
+          1. **Trunk** ...
+      output 1:
+        output_type: stream
+        name: stdout
+        text:
+          [1e81ee11] A **mutex** (mutual exclusion lock) is a binary synchronization primitive that e...
+      output 2:
+        output_type: stream
+        name: stdout
+        text:
+          [cf35447e] Branches diverge fast
+          Conflicting changes collide
+          Rebase saves the day...
+      output 3:
+        output_type: stream
+        name: stdout
+        text:
+          
+          Created 3 sessions.
+  markdown cell:
+    source:
+      # Part 2: List and inspect sessions
+      
+      ## Build the session list
+      
+      `list_sessions()` scans the project's transcript directory and returns metadata for each session, sorted newest first. It reads file stats plus the head and tail of each transcript, so it stays fast even when the directory has hundreds of files. No subprocess is spawned and no API call is made.
+      
+      Each `SDKSessionInfo` carries what you need to render a row in a picker: a display summary, last-modified timestamp, git branch, working directory, and any custom title or tag you've set.
+  code cell:
+    source:
+      from datetime import datetime
+      
+      import pandas as pd
+      
+      from claude_agent_sdk import SDKSessionInfo, list_sessions
+      
+      sessions = list_sessions(directory=DEMO_DIR)
+      
+      # Render as a table. In a real UI this would be your sidebar component.
+      rows = []
+      for s in sessions:
+          rows.append(
+              {
+                  "id": s.session_id[:8],
+                  "summary": (s.summary[:50] + "...") if len(s.summary) > 50 else s.summary,
+                  "modified": datetime.fromtimestamp(s.last_modified / 1000).strftime("%H:%M:%S"),
+                  "branch": s.git_branch or "-",
+                  "tag": s.tag or "-",
+              }
+          )
+      
+      pd.DataFrame(rows)
+    outputs:
+      output 0:
+        output_type: execute_result
+        execution_count: 5
+        data:
+          text/html:
+            <div>
+            <style scoped>
+                .dataframe tbody tr th:only-of-type {
+                    vertical-align: middle;
+                }
+            
+                .dataframe tbody tr th {
+                    vertical-align: top;
+                }
+            
+                .dataframe thead th {
+                    text-align: right;
+                }
+            </style>
+            <table border="1" class="dataframe">
+              <thead>
+                <tr style="text-align: right;">
+                  <th></th>
+                  <th>id</th>
+                  <th>summary</th>
+                  <th>modified</th>
+                  <th>branch</th>
+                  <th>tag</th>
+                </tr>
+              </thead>
+              <tbody>
+                <tr>
+                  <th>0</th>
+                  <td>cf35447e</td>
+                  <td>Write a haiku about merge conflicts.</td>
+                  <td>15:26:23</td>
+                  <td>qing/agent-sdk-session-browser</td>
+                  <td>-</td>
+                </tr>
+                <tr>
+                  <th>1</th>
+                  <td>1e81ee11</td>
+                  <td>Explain the difference between a mutex and a s...</td>
+                  <td>15:26:18</td>
+                  <td>qing/agent-sdk-session-browser</td>
+                  <td>-</td>
+                </tr>
+                <tr>
+                  <th>2</th>
+                  <td>6c06fad2</td>
+                  <td>Give me three name ideas for a CLI tool that m...</td>
+                  <td>15:26:12</td>
+                  <td>qing/agent-sdk-session-browser</td>
+                  <td>-</td>
+                </tr>
+              </tbody>
+            </table>
+            </div>
+          text/plain:
+                     id                                            summary  modified  \
+            0  cf35447e               Write a haiku about merge conflicts.  15:26:23   
+            1  1e81ee11  Explain the difference between a mutex and a s...  15:26:18   
+            2  6c06fad2  Give me three name ideas for a CLI tool that m...  15:26:12   
+            
+                                       branch tag  
+            0  qing/agent-sdk-session-browser   -  
+            1  qing/agent-sdk-session-browser   -  
+            2  qing/agent-sdk-session-browser   -  
+  markdown cell:
+    source:
+      For long histories, pass `limit` and `offset` to page through results. A session picker typically loads the first page, then fetches more as the user scrolls:
+      
+      ```python
+      page_2 = list_sessions(directory=DEMO_DIR, limit=20, offset=20)
+      ```
+      
+      If your app already stored a session ID (say, in your own database alongside a user record) and you just need that one row, `get_session_info()` is cheaper than listing everything.
+  code cell:
+    source:
+      from claude_agent_sdk import get_session_info
+      
+      info = get_session_info(demo_session_ids[0], directory=DEMO_DIR)
+      
+      print(f"Session:      {info.session_id}")
+      print(f"Summary:      {info.summary}")
+      print(f"First prompt: {info.first_prompt}")
+      print(f"Created:      {datetime.fromtimestamp(info.created_at / 1000)}")
+      print(f"Size:         {info.file_size:,} bytes")
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+          Session:      6c06fad2-0e7d-4b2b-8dc7-8ab68284034d
+          Summary:      Give me three name ideas for a CLI tool that manages git worktrees.
+          First prompt: Give me three name ideas for a CLI tool that manages git worktrees.
+          Created:      2026-03-30 15:26:06.310000
+          Size:         8,508 bytes
+  markdown cell:
+    source:
+      ## Read a session's messages
+      
+      Once a user clicks a session in the sidebar, you load its conversation into the main view. `get_session_messages()` reconstructs the message chain from the transcript and returns user and assistant turns in order. Like the listing function, it's a pure file read: the agent doesn't need to be running.
+      
+      Each `SessionMessage` has a `type` (`"user"` or `"assistant"`), a `uuid`, and a `message` dict in the same shape as the Anthropic Messages API (`role`, `content`).
+  code cell:
+    source:
+      from claude_agent_sdk import get_session_messages
+      
+      messages = get_session_messages(demo_session_ids[0], directory=DEMO_DIR)
+      
+      for m in messages:
+          role = m.type
+          # content is a list of blocks; pull out the text ones
+          text_parts = [
+              b.get("text", "")
+              for b in m.message.get("content", [])
+              if isinstance(b, dict) and b.get("type") == "text"
+          ]
+          text = " ".join(text_parts).strip()
+          print(f"[{role:>9}] {text[:100]}")
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+          [     user] 
+          [assistant] 
+          [assistant] Here are three name ideas for a git worktree management CLI tool:
+          
+          1. **Trunk** — Simple, memorable,
+  markdown cell:
+    source:
+      For long sessions, `limit` and `offset` let you load a window at a time. A chat view might load the last 50 messages on open, then fetch older pages as the user scrolls up. Offsets are applied in chronological order (oldest first), so page 0 is the start of the conversation.
+  markdown cell:
+    source:
+      # Part 3: Organize with titles and tags
+      
+      ## Rename a session
+      
+      Auto-generated summaries are fine for a quick glance, but users often want to give a session a real name. `rename_session()` appends a title entry to the transcript; `list_sessions()` picks it up as `custom_title` on the next read.
+      
+      Appends are cheap and idempotent: calling rename twice just means the newer title wins. No file rewrite happens.
+  code cell:
+    source:
+      from claude_agent_sdk import rename_session
+      
+      rename_session(demo_session_ids[0], "Worktree CLI naming brainstorm", directory=DEMO_DIR)
+      rename_session(demo_session_ids[2], "Haiku corner", directory=DEMO_DIR)
+      
+      # Verify the titles stuck
+      for s in list_sessions(directory=DEMO_DIR):
+          label = s.custom_title or "(auto)"
+          print(f"{s.session_id[:8]}  custom_title={label!r}  summary={s.summary[:40]!r}")
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+          cf35447e  custom_title='Haiku corner'  summary='Haiku corner'
+          6c06fad2  custom_title='Worktree CLI naming brainstorm'  summary='Worktree CLI naming brainstorm'
+          1e81ee11  custom_title='(auto)'  summary='Explain the difference between a mutex a'
+  markdown cell:
+    source:
+      ## Tag and filter
+      
+      Tags are a single string attached to a session. Use them for whatever categorization your product needs: `"archived"`, `"needs-review"`, `"favorite"`. Pass `None` to clear a tag.
+      
+      A common pattern is soft-delete: instead of removing the transcript file, tag it `"__hidden"` and filter that out in your list view. The data stays recoverable.
+  code cell:
+    source:
+      from claude_agent_sdk import tag_session
+      
+      # Mark two sessions as favorites, hide the other
+      tag_session(demo_session_ids[0], "favorite", directory=DEMO_DIR)
+      tag_session(demo_session_ids[2], "favorite", directory=DEMO_DIR)
+      tag_session(demo_session_ids[1], "__hidden", directory=DEMO_DIR)
+      
+      
+      def visible_sessions(directory: str, tag_filter: str | None = None) -> list[SDKSessionInfo]:
+          """List sessions, hiding soft-deletes and optionally filtering by tag."""
+          results = []
+          for s in list_sessions(directory=directory):
+              if s.tag == "__hidden":
+                  continue
+              if tag_filter is not None and s.tag != tag_filter:
+                  continue
+              results.append(s)
+          return results
+      
+      
+      favorites = visible_sessions(DEMO_DIR, tag_filter="favorite")
+      print(f"Visible favorites: {len(favorites)}")
+      for s in favorites:
+          print(f"  {s.session_id[:8]}  [{s.tag}]  {s.custom_title or s.summary}")
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+          Visible favorites: 2
+            cf35447e  [favorite]  Haiku corner
+            6c06fad2  [favorite]  Worktree CLI naming brainstorm
+  markdown cell:
+    source:
+      Tags are single values, not lists. If you need multiple axes (say, a status plus a category), encode them into one string like `"review:urgent"` and parse on read, or store richer state in your own database keyed on `session_id`.
+  markdown cell:
+    source:
+      # Part 4: Fork and resume
+      
+      ## Branch from an existing conversation
+      
+      Forking copies a session's transcript into a new file with freshly remapped message IDs. The original stays untouched. This is the primitive behind "try a different approach" features: the user keeps their original thread and gets a new one to experiment in.
+      
+      `fork_session()` writes the new file and returns its ID. It doesn't run the agent, so the fork sits on disk until you resume it with `query()`.
+  code cell:
+    source:
+      from claude_agent_sdk import fork_session
+      
+      source = demo_session_ids[0]
+      
+      fork = fork_session(
+          source,
+          directory=DEMO_DIR,
+          title="Worktree CLI names (round 2)",
+      )
+      
+      print(f"Source: {source}")
+      print(f"Fork:   {fork.session_id}")
+      
+      # The fork starts with the same message history as the source
+      source_msgs = get_session_messages(source, directory=DEMO_DIR)
+      fork_msgs = get_session_messages(fork.session_id, directory=DEMO_DIR)
+      print(f"Source has {len(source_msgs)} messages, fork has {len(fork_msgs)}")
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+          Source: 6c06fad2-0e7d-4b2b-8dc7-8ab68284034d
+          Fork:   03f8fe23-9c48-4337-a0f3-f05c93e6f15f
+          Source has 3 messages, fork has 3
+  markdown cell:
+    source:
+      To branch from a specific point rather than the full history, pass `up_to_message_id`. The fork will contain the source transcript up to and including that message. You can get message UUIDs from `get_session_messages()[i].uuid`.
+      
+      ## Resume the fork into a live query
+      
+      The fork is just a transcript file. To turn it back into a running conversation, hand its ID to `ClaudeAgentOptions.resume`. The agent loads the forked history and continues from there.
+  code cell:
+    source:
+      resume_opts = ClaudeAgentOptions(
+          model=MODEL,
+          cwd=DEMO_DIR,
+          max_turns=1,
+          allowed_tools=[],
+          resume=fork.session_id,
+      )
+      
+      async for msg in query(
+          prompt="Those were okay. Give me three more names, but punnier.",
+          options=resume_opts,
+      ):
+          if isinstance(msg, ResultMessage):
+              print(f"[fork {fork.session_id[:8]} resumed]")
+              print(msg.result)
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+          [fork 03f8fe23 resumed]
+          Here are three punchier names with actual puns:
+          
+          1. **Fir Real** — A "for real" pun using "fir" (the tree). Perfect for when you want to emphasize this tool actually gets the job done. Has that groan-inducing pun quality.
+          
+          2. **Leaf It To Me** — A "leave it to me" pun with "leaf." Classic pun structure that works especially well if your worktree management is reliable enough to trust completely.
+          
+          3. **Birch, Please** — A cheeky "please" pun using "birch" (the tree). Bold and memorable, with that perfectly cheesy programmer-humor vibe.
+          
+          All three have that satisfying "why would you name it that?" followed by "oh, I get it" moment that good puns should deliver. 😄
+  markdown cell:
+    source:
+      The original session is still sitting there unchanged. List again and you'll see both the source and the fork as separate rows with independent histories.
+  code cell:
+    source:
+      for s in list_sessions(directory=DEMO_DIR):
+          marker = "(fork)" if s.session_id == fork.session_id else "      "
+          print(f"{marker} {s.session_id[:8]}  {s.custom_title or s.summary[:50]}")
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+          (fork) 03f8fe23  Worktree CLI names (round 2)
+                 cf35447e  Haiku corner
+                 1e81ee11  Explain the difference between a mutex and a semap
+                 6c06fad2  Worktree CLI naming brainstorm
+  markdown cell:
+    source:
+      # Cleanup
+      
+      `delete_session()` removes a transcript file. It's a hard delete, which is why the soft-delete tag pattern from Part 3 is usually the safer default for user-facing UIs.
+      
+      Here we use it to tidy up everything the demo created.
+  code cell:
+    source:
+      from claude_agent_sdk import delete_session
+      
+      # Clean up every session in the demo dir, including the fork
+      for s in list_sessions(directory=DEMO_DIR):
+          delete_session(s.session_id, directory=DEMO_DIR)
+          print(f"Deleted {s.session_id[:8]}")
+      
+      remaining = list_sessions(directory=DEMO_DIR)
+      print(f"\n{len(remaining)} session(s) remaining.")
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+          Deleted 03f8fe23
+          Deleted cf35447e
+          Deleted 1e81ee11
+          Deleted 6c06fad2
+          
+          0 session(s) remaining.
+  markdown cell:
+    source:
+      # Recap
+      
+      We built the core of a session browser against the Agent SDK's local transcript store:
+      
+      - **Listing** with `list_sessions()` gives you everything needed to render a sidebar, and it scales because it reads file stats and head/tail slices rather than parsing whole transcripts.
+      - **Reading** with `get_session_messages()` loads a conversation back for display without spawning the agent.
+      - **Organizing** with `rename_session()` and `tag_session()` appends metadata entries, so it's cheap and the most recent call wins.
+      - **Forking** with `fork_session()` plus `options.resume` lets users branch a conversation and keep going without touching the original.
+      
+      All of these are pure file operations on `~/.claude/projects/`. They work whether or not the agent subprocess is running, and they see the same transcripts the Claude Code CLI writes.
+      
+      ## Where to go next
+      
+      - **Wire it to a UI.** These functions are UI-agnostic; drop them behind a FastAPI route or an Electron IPC handler and you have the backend for a session sidebar.
+      - **Cross-host sessions.** Transcripts live on the local disk. To share sessions across machines, sync the files or index session IDs and messages into your own store. See [Manage sessions on disk](https://docs.claude.com/en/agent-sdk/local-session-management) for patterns.
+      - **TypeScript.** The same API exists in `@anthropic-ai/claude-agent-sdk` with camelCase names (`listSessions`, `forkSession`, and so on). See the [TypeScript SDK reference](https://docs.claude.com/en/agent-sdk/typescript).
+      - **The bigger picture.** Notebooks [00](00_The_one_liner_research_agent.ipynb) through [03](03_The_site_reliability_agent.ipynb) cover building the agent itself. This notebook covers managing what it leaves behind.

Generated by nbdime

@github-actions github-actions Bot left a comment

PR Review

Recommendation: COMMENT

Summary

This PR adds a well-structured session browser cookbook for the Claude Agent SDK, demonstrating list_sessions, get_session_messages, rename_session, tag_session, fork_session, and delete_session. The notebook is pedagogically coherent and follows project standards (correct model ID, dotenv pattern, outputs committed).

Actionable Feedback (4 items)
  • 05_Building_a_session_browser.ipynb (in cell with for m in messages:) — The output shows blank lines for both the user turn and first assistant turn. The text extraction only handles {"type": "text", ...} dicts, but the user message content is likely a plain string, not a list of content blocks. Either guard against isinstance(content, str) or add a markdown note explaining why the user-turn line is blank, so readers aren't confused.

  • 05_Building_a_session_browser.ipynb (in cell with DEMO_DIR = str(Path("session_browser_demo").resolve())) — This path resolves relative to the kernel's cwd. If a reader launches Jupyter from the repo root (the common default), it resolves to the wrong location. The inline comment warns about this, but a more prominent callout in the Prerequisites section — e.g. "you must launch Jupyter from the claude_agent_sdk/ directory" — would prevent silent failures.

  • 05_Building_a_session_browser.ipynb (in cell with def visible_sessions(...)) — The function unconditionally filters out __hidden sessions before checking tag_filter. Passing tag_filter="__hidden" (e.g. to build a "trash/recover" view) always returns an empty list. The prose says "the data stays recoverable" but the helper as written makes recovery impossible through the same function. Either fix the guard order or document this limitation.

  • 05_Building_a_session_browser.ipynb — Verify that the two docs.claude.com links in the "Where to go next" section resolve: https://docs.claude.com/en/agent-sdk/local-session-management and https://docs.claude.com/en/agent-sdk/typescript. These appear to be forward-references to new docs pages; broken links in a published cookbook are a common trust issue.
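
One way the first and third items could be addressed, as a minimal sketch rather than the notebook's actual code. `SessionRow` is a hypothetical stand-in for the two `SDKSessionInfo` fields used here, and `extract_text` and `filter_sessions` are illustrative names, not SDK functions. The sketch assumes message content is either a plain string or a list of block dicts, and that an explicit `tag_filter` should take precedence over the soft-delete guard.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Optional, Union

HIDDEN_TAG = "__hidden"


@dataclass
class SessionRow:
    """Hypothetical stand-in for the SDKSessionInfo fields used below."""

    session_id: str
    tag: Optional[str] = None


def extract_text(content: Union[str, List[Any]]) -> str:
    """Return display text whether content is a plain string or a list of blocks."""
    if isinstance(content, str):
        return content.strip()
    parts = [
        block.get("text", "")
        for block in content
        if isinstance(block, dict) and block.get("type") == "text"
    ]
    return " ".join(parts).strip()


def filter_sessions(
    sessions: Iterable[SessionRow], tag_filter: Optional[str] = None
) -> List[SessionRow]:
    """Hide soft-deleted rows unless the caller explicitly asks for a tag."""
    results = []
    for s in sessions:
        if tag_filter is not None:
            # An explicit filter wins, so tag_filter="__hidden" yields a trash view.
            if s.tag == tag_filter:
                results.append(s)
        elif s.tag != HIDDEN_TAG:
            results.append(s)
    return results
```

With this ordering, `filter_sessions(rows)` still hides soft-deleted rows by default, while `filter_sessions(rows, tag_filter="__hidden")` returns only the soft-deleted ones, which is what a recover view would need.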

Detailed Review

Code Quality

Overall high quality. Model ID (claude-haiku-4-5) is correct per project standards. %%capture on pip install, dotenv.load_dotenv(), and imports are all idiomatic. The run_one_turn function correctly raises RuntimeError if no ResultMessage arrives rather than silently returning None. The visible_sessions return type annotation (-> list[SDKSessionInfo]) and the str | None parameter type are good modern Python.

Security

No issues. No hardcoded API keys; keys are loaded via dotenv. No subprocess injection risks.

Suggestions

  • The sequential for p in prompts: await run_one_turn(p) loop could use a comment noting it's intentionally sequential (vs. asyncio.gather) to keep printed output in order.
  • The fork-resume cell doesn't guard against a missing ResultMessage the way run_one_turn does — a small inconsistency.
  • The datetime.fromtimestamp(s.last_modified / 1000) pattern (milliseconds → seconds) appears twice without explanation; a brief inline comment # SDK timestamps are milliseconds on the first occurrence would help readers.
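
The second and third suggestions could look roughly like this. It is a sketch under assumptions: `FakeResult` and `fake_stream` are hypothetical stand-ins for the SDK's `ResultMessage` and the `query()` stream so the example runs without an API key, and `final_result` and `ms_to_datetime` are illustrative helper names. UTC is used only to make the conversion deterministic; the notebook itself uses local time.

```python
import asyncio
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import AsyncIterator, Optional


@dataclass
class FakeResult:
    """Hypothetical stand-in for the SDK's ResultMessage."""

    session_id: str
    result: Optional[str]


async def final_result(stream: AsyncIterator[object]) -> FakeResult:
    """Drain a message stream and return its last result, or raise if none arrived."""
    found: Optional[FakeResult] = None
    async for msg in stream:
        if isinstance(msg, FakeResult):
            found = msg
    if found is None:
        raise RuntimeError("No result message received.")
    return found


async def fake_stream(with_result: bool) -> AsyncIterator[object]:
    """Hypothetical stream: one non-result item, then optionally a result."""
    yield "assistant text"
    if with_result:
        yield FakeResult(session_id="03f8fe23", result="three more names")


def ms_to_datetime(ms: float) -> datetime:
    """Convert a millisecond epoch timestamp to an aware UTC datetime."""
    # Timestamps in this notebook are milliseconds, so divide by 1000 first.
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
```

In a script, `asyncio.run(final_result(fake_stream(True)))` returns the result object, and the same call with `False` raises `RuntimeError`, mirroring the guard in `run_one_turn`. In a notebook cell you would `await` the helper directly.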

Positive Notes

  • Notebook structure maps cleanly to its stated learning objectives. Each Part has a concrete action goal, and the Recap section ties back to every bullet in the intro.
  • The delete_session() section correctly frames hard-delete vs. soft-delete (__hidden tag) as a UX tradeoff, which is good advice for anyone building a real product on this.
  • Committed outputs confirm the notebook runs end-to-end.
  • pyproject.toml SDK pin bump (>=0.0.20 to >=0.1.51) is correctly scoped to the claude_agent_sdk/ subdirectory.

bogini
bogini previously approved these changes Mar 31, 2026

@bogini bogini left a comment

stamped 🧑‍🍳

@github-actions

github-actions Bot commented Apr 1, 2026

Notebook Changes

This PR modifies the following notebooks:

📓 claude_agent_sdk/05_Building_a_session_browser.ipynb

View diff
nbdiff /dev/null claude_agent_sdk/05_Building_a_session_browser.ipynb (a8c50fb5f30f168ccca07634749c3c4fe0e02a8c)
--- /dev/null  2026-04-01 00:08:08.313548
+++ claude_agent_sdk/05_Building_a_session_browser.ipynb (a8c50fb5f30f168ccca07634749c3c4fe0e02a8c)  (no timestamp)
## added /cells:
+  markdown cell:
+    source:
+      # Building a Session Browser
+      
+      When you ship an agent as a product (a desktop app, an IDE extension, an internal chatbot), the first thing users ask for is the sidebar. They want to see the conversation from last Tuesday, jump back into it, and maybe branch off in a new direction without losing the original. The agent loop is half the product; the other half is session management.
+      
+      The Claude Agent SDK writes every conversation to a JSONL transcript on disk. It also ships a set of functions that read those transcripts back and organize them, so you can build that sidebar without writing a file parser or walking `~/.claude/projects/` by hand.
+      
+      **By the end of this cookbook, you'll be able to:**
+      
+      - List and render past sessions for a project, with pagination and metadata like branch, title, and last-modified time
+      - Read a stored session's messages back into your UI without spawning the agent
+      - Rename, tag, and filter sessions so users can organize their history
+      - Fork a session at any point and resume the fork as a live `query()` call
+      
+      This is the pattern behind the session sidebar in Claude Code Desktop and the VS Code extension. The same primitives work for any UI you want to put on top of the Agent SDK.
+  markdown cell:
+    source:
+      ## Prerequisites
+      
+      Before following this guide, ensure you have:
+      
+      **Required Knowledge**
+      
+      - Python fundamentals, including `async`/`await`
+      - Basic familiarity with the Agent SDK's `query()` function (see [Notebook 00](00_The_one_liner_research_agent.ipynb) for an introduction)
+      
+      **Required Tools**
+      
+      - Python 3.11 or higher
+      - The Claude Code CLI (`npm install -g @anthropic-ai/claude-code`)
+      - An Anthropic API key ([get one here](https://console.anthropic.com))
+      
+      ## Setup
+      
+      Install the required dependencies. Session management functions landed in `claude-agent-sdk` v0.1.51.
+  code cell:
+    source:
+      %%capture
+      %pip install -U "claude-agent-sdk>=0.1.51" python-dotenv pandas
+  markdown cell:
+    source:
+      Load your API key from `.env` and configure the model. We use Haiku here because the demo sessions are short and we want them cheap and fast; in a real product you'd pick whatever model fits your agent.
+  code cell:
+    source:
+      import os
+      from pathlib import Path
+      
+      from dotenv import load_dotenv
+      
+      load_dotenv()
+      
+      MODEL = "claude-haiku-4-5"
+      
+      # All demo sessions live under this project directory. Using a dedicated
+      # cwd keeps the demo isolated from your real Claude Code sessions.
+      # Note: this path resolves relative to the kernel's working directory
+      # (claude_agent_sdk/ when launched per the README).
+      DEMO_DIR = str(Path("session_browser_demo").resolve())
+      os.makedirs(DEMO_DIR, exist_ok=True)
+      print(f"Demo project dir: {DEMO_DIR}")
+  markdown cell:
+    source:
+      # Part 1: Create some sessions to manage
+      
+      Session management functions read transcripts that `query()` (or the Claude Code CLI) has already written. To have something to browse, we first run three short conversations and capture their session IDs.
+      
+      The `cwd` option tells the SDK which project directory this conversation belongs to. Transcripts are filed under `~/.claude/projects/<encoded-cwd>/<session-id>.jsonl`, so every call with the same `cwd` ends up in the same bucket. That bucket is what `list_sessions(directory=...)` reads later.
+      
+      We disable tools and cap each run at one turn to keep token usage minimal.
+  code cell:
+    source:
+      from claude_agent_sdk import ClaudeAgentOptions, ResultMessage, query
+      
+      
+      async def run_one_turn(prompt: str) -> str:
+          """Run a single-turn conversation and return its session_id."""
+          opts = ClaudeAgentOptions(
+              model=MODEL,
+              cwd=DEMO_DIR,
+              max_turns=1,
+              allowed_tools=[],  # text-only, no tool loop
+          )
+          session_id = None
+          async for msg in query(prompt=prompt, options=opts):
+              if isinstance(msg, ResultMessage):
+                  session_id = msg.session_id
+                  preview = (msg.result or "")[:80]
+                  print(f"[{session_id[:8]}] {preview}...")
+          if session_id is None:
+              raise RuntimeError("No ResultMessage received; check API key and SDK version.")
+          return session_id
+  markdown cell:
+    source:
+      Three prompts, three sessions. In a real product these would be whatever your users asked the agent.
+  code cell:
+    source:
+      prompts = [
+          "Give me three name ideas for a CLI tool that manages git worktrees.",
+          "Explain the difference between a mutex and a semaphore in one paragraph.",
+          "Write a haiku about merge conflicts.",
+      ]
+      
+      demo_session_ids = []
+      for p in prompts:
+          sid = await run_one_turn(p)
+          demo_session_ids.append(sid)
+      
+      print(f"\nCreated {len(demo_session_ids)} sessions.")
+  markdown cell:
+    source:
+      # Part 2: List and inspect sessions
+      
+      ## Build the session list
+      
+      `list_sessions()` scans the project's transcript directory and returns metadata for each session, sorted newest first. It reads file stats plus the head and tail of each transcript, so it stays fast even when the directory has hundreds of files. No subprocess is spawned and no API call is made.
+      
+      Each `SDKSessionInfo` carries what you need to render a row in a picker: a display summary, last-modified timestamp, git branch, working directory, and any custom title or tag you've set.
+  code cell:
+    source:
+      from datetime import datetime
+      
+      import pandas as pd
+      
+      from claude_agent_sdk import SDKSessionInfo, list_sessions
+      
+      sessions = list_sessions(directory=DEMO_DIR)
+      
+      # Render as a table. In a real UI this would be your sidebar component.
+      rows = []
+      for s in sessions:
+          rows.append(
+              {
+                  "id": s.session_id[:8],
+                  "summary": (s.summary[:50] + "...") if len(s.summary) > 50 else s.summary,
+                  "modified": datetime.fromtimestamp(s.last_modified / 1000).strftime("%H:%M:%S"),
+                  "branch": s.git_branch or "-",
+                  "tag": s.tag or "-",
+              }
+          )
+      
+      pd.DataFrame(rows)
+  markdown cell:
+    source:
+      For long histories, pass `limit` and `offset` to page through results. A session picker typically loads the first page, then fetches more as the user scrolls:
+      
+      ```python
+      page_2 = list_sessions(directory=DEMO_DIR, limit=20, offset=20)
+      ```
+      
+      If your app already stored a session ID (say, in your own database alongside a user record) and you just need that one row, `get_session_info()` is cheaper than listing everything.
+  code cell:
+    source:
+      from claude_agent_sdk import get_session_info
+      
+      info = get_session_info(demo_session_ids[0], directory=DEMO_DIR)
+      
+      print(f"Session:      {info.session_id}")
+      print(f"Summary:      {info.summary}")
+      print(f"First prompt: {info.first_prompt}")
+      print(f"Created:      {datetime.fromtimestamp(info.created_at / 1000)}")
+      print(f"Size:         {info.file_size:,} bytes")
+  markdown cell:
+    source:
+      ## Read a session's messages
+      
+      Once a user clicks a session in the sidebar, you load its conversation into the main view. `get_session_messages()` reconstructs the message chain from the transcript and returns user and assistant turns in order. Like the listing function, it's a pure file read: the agent doesn't need to be running.
+      
+      Each `SessionMessage` has a `type` (`"user"` or `"assistant"`), a `uuid`, and a `message` dict in the same shape as the Anthropic Messages API (`role`, `content`).
+  code cell:
+    source:
+      from claude_agent_sdk import get_session_messages
+      
+      messages = get_session_messages(demo_session_ids[0], directory=DEMO_DIR)
+      
+      for m in messages:
+          role = m.type
+          # content is a list of blocks; pull out the text ones
+          text_parts = [
+              b.get("text", "")
+              for b in m.message.get("content", [])
+              if isinstance(b, dict) and b.get("type") == "text"
+          ]
+          text = " ".join(text_parts).strip()
+          print(f"[{role:>9}] {text[:100]}")
+  markdown cell:
+    source:
+      For long sessions, `limit` and `offset` let you load a window at a time. A chat view might load the last 50 messages on open, then fetch older pages as the user scrolls up. Offsets are applied in chronological order (oldest first), so page 0 is the start of the conversation.
+  markdown cell:
+    source:
+      # Part 3: Organize with titles and tags
+      
+      ## Rename a session
+      
+      Auto-generated summaries are fine for a quick glance, but users often want to give a session a real name. `rename_session()` appends a title entry to the transcript; `list_sessions()` picks it up as `custom_title` on the next read.
+      
+      Appends are cheap and last-write-wins: calling rename twice adds two title entries, and the newer one wins. No file rewrite happens.
+  code cell:
+    source:
+      from claude_agent_sdk import rename_session
+      
+      rename_session(demo_session_ids[0], "Worktree CLI naming brainstorm", directory=DEMO_DIR)
+      rename_session(demo_session_ids[2], "Haiku corner", directory=DEMO_DIR)
+      
+      # Verify the titles stuck
+      for s in list_sessions(directory=DEMO_DIR):
+          label = s.custom_title or "(auto)"
+          print(f"{s.session_id[:8]}  custom_title={label!r}  summary={s.summary[:40]!r}")
+  markdown cell:
+    source:
+      ## Tag and filter
+      
+      Tags are a single string attached to a session. Use them for whatever categorization your product needs: `"archived"`, `"needs-review"`, `"favorite"`. Pass `None` to clear a tag.
+      
+      A common pattern is soft-delete: instead of removing the transcript file, tag it `"__hidden"` and filter that out in your list view. The data stays recoverable.
+  code cell:
+    source:
+      from claude_agent_sdk import tag_session
+      
+      # Mark two sessions as favorites, hide the other
+      tag_session(demo_session_ids[0], "favorite", directory=DEMO_DIR)
+      tag_session(demo_session_ids[2], "favorite", directory=DEMO_DIR)
+      tag_session(demo_session_ids[1], "__hidden", directory=DEMO_DIR)
+      
+      
+      def visible_sessions(directory: str, tag_filter: str | None = None) -> list[SDKSessionInfo]:
+          """List sessions, hiding soft-deletes and optionally filtering by tag."""
+          results = []
+          for s in list_sessions(directory=directory):
+              if s.tag == "__hidden":
+                  continue
+              if tag_filter is not None and s.tag != tag_filter:
+                  continue
+              results.append(s)
+          return results
+      
+      
+      favorites = visible_sessions(DEMO_DIR, tag_filter="favorite")
+      print(f"Visible favorites: {len(favorites)}")
+      for s in favorites:
+          print(f"  {s.session_id[:8]}  [{s.tag}]  {s.custom_title or s.summary}")
+  markdown cell:
+    source:
+      Tags are single values, not lists. If you need multiple axes (say, a status plus a category), encode them into one string like `"review:urgent"` and parse on read, or store richer state in your own database keyed on `session_id`.
+  markdown cell:
+    source:
+      # Part 4: Fork and resume
+      
+      ## Branch from an existing conversation
+      
+      Forking copies a session's transcript into a new file with freshly remapped message IDs. The original stays untouched. This is the primitive behind "try a different approach" features: the user keeps their original thread and gets a new one to experiment in.
+      
+      `fork_session()` writes the new file and returns its ID. It doesn't run the agent, so the fork sits on disk until you resume it with `query()`.
+  code cell:
+    source:
+      from claude_agent_sdk import fork_session
+      
+      source = demo_session_ids[0]
+      
+      fork = fork_session(
+          source,
+          directory=DEMO_DIR,
+          title="Worktree CLI names (round 2)",
+      )
+      
+      print(f"Source: {source}")
+      print(f"Fork:   {fork.session_id}")
+      
+      # The fork starts with the same message history as the source
+      source_msgs = get_session_messages(source, directory=DEMO_DIR)
+      fork_msgs = get_session_messages(fork.session_id, directory=DEMO_DIR)
+      print(f"Source has {len(source_msgs)} messages, fork has {len(fork_msgs)}")
+  markdown cell:
+    source:
+      To branch from a specific point rather than the full history, pass `up_to_message_id`. The fork will contain the source transcript up to and including that message. You can get message UUIDs from `get_session_messages()[i].uuid`.
+      
+      ## Resume the fork into a live query
+      
+      The fork is just a transcript file. To turn it back into a running conversation, hand its ID to `ClaudeAgentOptions.resume`. The agent loads the forked history and continues from there.
+  code cell:
+    source:
+      resume_opts = ClaudeAgentOptions(
+          model=MODEL,
+          cwd=DEMO_DIR,
+          max_turns=1,
+          allowed_tools=[],
+          resume=fork.session_id,
+      )
+      
+      async for msg in query(
+          prompt="Those were okay. Give me three more names, but punnier.",
+          options=resume_opts,
+      ):
+          if isinstance(msg, ResultMessage):
+              print(f"[fork {fork.session_id[:8]} resumed]")
+              print(msg.result)
+  markdown cell:
+    source:
+      The original session is still sitting there unchanged. List again and you'll see both the source and the fork as separate rows with independent histories.
+  code cell:
+    source:
+      for s in list_sessions(directory=DEMO_DIR):
+          marker = "(fork)" if s.session_id == fork.session_id else "      "
+          print(f"{marker} {s.session_id[:8]}  {s.custom_title or s.summary[:50]}")
+  markdown cell:
+    source:
+      # Cleanup
+      
+      `delete_session()` removes a transcript file. It's a hard delete, which is why the soft-delete tag pattern from Part 3 is usually the safer default for user-facing UIs.
+      
+      Here we use it to tidy up everything the demo created.
+  code cell:
+    source:
+      from claude_agent_sdk import delete_session
+      
+      # Clean up every session in the demo dir, including the fork
+      for s in list_sessions(directory=DEMO_DIR):
+          delete_session(s.session_id, directory=DEMO_DIR)
+          print(f"Deleted {s.session_id[:8]}")
+      
+      remaining = list_sessions(directory=DEMO_DIR)
+      print(f"\n{len(remaining)} session(s) remaining.")
+  markdown cell:
+    source:
+      # Recap
+      
+      We built the core of a session browser against the Agent SDK's local transcript store:
+      
+      - **Listing** with `list_sessions()` gives you everything needed to render a sidebar, and it scales because it reads file stats and head/tail slices rather than parsing whole transcripts.
+      - **Reading** with `get_session_messages()` loads a conversation back for display without spawning the agent.
+      - **Organizing** with `rename_session()` and `tag_session()` appends metadata entries, so it's cheap and the most recent call wins.
+      - **Forking** with `fork_session()` plus `options.resume` lets users branch a conversation and keep going without touching the original.
+      
+      All of these are pure file operations on `~/.claude/projects/`. They work whether or not the agent subprocess is running, and they see the same transcripts the Claude Code CLI writes.
+      
+      ## Where to go next
+      
+      - **Wire it to a UI.** These functions are UI-agnostic; drop them behind a FastAPI route or an Electron IPC handler and you have the backend for a session sidebar.
+      - **Cross-host sessions.** Transcripts live on the local disk. To share sessions across machines, sync the files or index session IDs and messages into your own store. See [Manage sessions on disk](https://docs.claude.com/en/agent-sdk/local-session-management) for patterns.
+      - **TypeScript.** The same API exists in `@anthropic-ai/claude-agent-sdk` with camelCase names (`listSessions`, `forkSession`, and so on). See the [TypeScript SDK reference](https://docs.claude.com/en/agent-sdk/typescript).
+      - **The bigger picture.** Notebooks [00](00_The_one_liner_research_agent.ipynb) through [03](03_The_site_reliability_agent.ipynb) cover building the agent itself. This notebook covers managing what it leaves behind.

Generated by nbdime

@github-actions github-actions Bot left a comment

PR Review

Recommendation: APPROVE with minor suggestions

Summary

This notebook covers the Agent SDK session management API through a coherent end-to-end demo. Well-structured, idiomatic Python, good pedagogical flow. No critical issues.

Actionable Feedback (4 items)
  • Notebook has empty outputs. Per repo rules, execute top-to-bottom and commit outputs, as done in commit 565110a.
  • pyproject.toml requires-python is >=3.11,<3.13 but notebook metadata shows Python 3.13.12. Either widen to <3.14 or align the kernel metadata.
  • session_browser_demo/.gitkeep has a comment line; convention is an empty file.
  • Prerequisites list the Claude Code CLI as required but no notebook cell uses it directly. Clarify if truly required or downgrade to Recommended.

Code Quality

Code quality is solid: run_one_turn correctly raises on missing ResultMessage, the visible_sessions soft-delete pattern is a great teaching point, and the isinstance guard in get_session_messages is defensive and correct. MODEL=claude-haiku-4-5 uses the correct non-dated alias. The SDK version bump to >=0.1.51 is correctly a minimum, not a pin. Prose is consistently strong throughout.

@qing-ant qing-ant force-pushed the qing/agent-sdk-session-browser branch from a8c50fb to 67874c1 Compare April 1, 2026 00:21
@github-actions

github-actions Bot commented Apr 1, 2026

Notebook Changes

This PR modifies the following notebooks:

📓 claude_agent_sdk/05_Building_a_session_browser.ipynb

View diff
nbdiff /dev/null claude_agent_sdk/05_Building_a_session_browser.ipynb (67874c17f87d53b80fc8c365eb7aa3c1c5325bd5)
--- /dev/null  2026-04-01 00:20:52.886347
+++ claude_agent_sdk/05_Building_a_session_browser.ipynb (67874c17f87d53b80fc8c365eb7aa3c1c5325bd5)  (no timestamp)
## added /cells:
+  markdown cell:
+    source:
+      # Building a Session Browser
+      
+      When you ship an agent as a product (a desktop app, an IDE extension, an internal chatbot), the first thing users ask for is the sidebar. They want to see the conversation from last Tuesday, jump back into it, and maybe branch off in a new direction without losing the original. The agent loop is half the product; the other half is session management.
+      
+      The Claude Agent SDK writes every conversation to a JSONL transcript on disk. It also ships a set of functions that read those transcripts back and organize them, so you can build that sidebar without writing a file parser or walking `~/.claude/projects/` by hand.
+      
+      **By the end of this cookbook, you'll be able to:**
+      
+      - List and render past sessions for a project, with pagination and metadata like branch, title, and last-modified time
+      - Read a stored session's messages back into your UI without spawning the agent
+      - Rename, tag, and filter sessions so users can organize their history
+      - Fork a session at any point and resume the fork as a live `query()` call
+      
+      This is the pattern behind the session sidebar in Claude Code Desktop and the VS Code extension. The same primitives work for any UI you want to put on top of the Agent SDK.
+  markdown cell:
+    source:
+      ## Prerequisites
+      
+      Before following this guide, ensure you have:
+      
+      **Required Knowledge**
+      
+      - Python fundamentals, including `async`/`await`
+      - Basic familiarity with the Agent SDK's `query()` function (see [Notebook 00](00_The_one_liner_research_agent.ipynb) for an introduction)
+      
+      **Required Tools**
+      
+      - Python 3.11 or higher
+      - The Claude Code CLI (`npm install -g @anthropic-ai/claude-code`)
+      - An Anthropic API key ([get one here](https://console.anthropic.com))
+      
+      ## Setup
+      
+      Install the required dependencies. Session management functions landed in `claude-agent-sdk` v0.1.51.
+  code cell:
+    source:
+      %%capture
+      %pip install -U "claude-agent-sdk>=0.1.51" python-dotenv pandas
+  markdown cell:
+    source:
+      Load your API key from `.env` and configure the model. We use Haiku here because the demo sessions are short and we want them cheap and fast; in a real product you'd pick whatever model fits your agent.
+  code cell:
+    source:
+      import os
+      from pathlib import Path
+      
+      from dotenv import load_dotenv
+      
+      load_dotenv()
+      
+      MODEL = "claude-haiku-4-5"
+      
+      # All demo sessions live under this project directory. Using a dedicated
+      # cwd keeps the demo isolated from your real Claude Code sessions.
+      # Note: this path resolves relative to the kernel's working directory
+      # (claude_agent_sdk/ when launched per the README).
+      DEMO_DIR = str(Path("session_browser_demo").resolve())
+      os.makedirs(DEMO_DIR, exist_ok=True)
+      print(f"Demo project dir: {DEMO_DIR}")
+  markdown cell:
+    source:
+      # Part 1: Create some sessions to manage
+      
+      Session management functions read transcripts that `query()` (or the Claude Code CLI) has already written. To have something to browse, we first run three short conversations and capture their session IDs.
+      
+      The `cwd` option tells the SDK which project directory this conversation belongs to. Transcripts are filed under `~/.claude/projects/<encoded-cwd>/<session-id>.jsonl`, so every call with the same `cwd` ends up in the same bucket. That bucket is what `list_sessions(directory=...)` reads later.
+      
+      We disable tools and cap each run at one turn to keep token usage minimal.
+  code cell:
+    source:
+      from claude_agent_sdk import ClaudeAgentOptions, ResultMessage, query
+      
+      
+      async def run_one_turn(prompt: str) -> str:
+          """Run a single-turn conversation and return its session_id."""
+          opts = ClaudeAgentOptions(
+              model=MODEL,
+              cwd=DEMO_DIR,
+              max_turns=1,
+              allowed_tools=[],  # text-only, no tool loop
+          )
+          session_id = None
+          async for msg in query(prompt=prompt, options=opts):
+              if isinstance(msg, ResultMessage):
+                  session_id = msg.session_id
+                  preview = (msg.result or "")[:80]
+                  print(f"[{session_id[:8]}] {preview}...")
+          if session_id is None:
+              raise RuntimeError("No ResultMessage received; check API key and SDK version.")
+          return session_id
+  markdown cell:
+    source:
+      Three prompts, three sessions. In a real product these would be whatever your users asked the agent.
+  code cell:
+    source:
+      prompts = [
+          "Give me three name ideas for a CLI tool that manages git worktrees.",
+          "Explain the difference between a mutex and a semaphore in one paragraph.",
+          "Write a haiku about merge conflicts.",
+      ]
+      
+      demo_session_ids = []
+      for p in prompts:
+          sid = await run_one_turn(p)
+          demo_session_ids.append(sid)
+      
+      print(f"\nCreated {len(demo_session_ids)} sessions.")
+  markdown cell:
+    source:
+      # Part 2: List and inspect sessions
+      
+      ## Build the session list
+      
+      `list_sessions()` scans the project's transcript directory and returns metadata for each session, sorted newest first. It reads file stats plus the head and tail of each transcript, so it stays fast even when the directory has hundreds of files. No subprocess is spawned and no API call is made.
+      
+      Each `SDKSessionInfo` carries what you need to render a row in a picker: a display summary, last-modified timestamp, git branch, working directory, and any custom title or tag you've set.
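The head-and-tail read mentioned above is a general file technique, and a minimal sketch of it looks like the following. This is not the SDK's implementation: the helper name `read_head_tail` and the chunk size are assumptions made for illustration, and the demo file is synthetic.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def read_head_tail(path: Path, chunk_size: int = 4096) -> Tuple[bytes, bytes]:
    """Return the first and last chunk_size bytes without reading the middle.

    Hypothetical helper: the SDK's real chunk sizes and parsing are not shown here.
    """
    size = path.stat().st_size
    with path.open("rb") as f:
        head = f.read(chunk_size)
        # Jump straight to the final chunk; the bytes in between are never read.
        f.seek(max(0, size - chunk_size))
        tail = f.read(chunk_size)
    return head, tail


with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo.jsonl"
    # Synthetic file: 100 header bytes, a large middle, 100 trailer bytes.
    demo_file.write_bytes(b"H" * 100 + b"m" * 100_000 + b"T" * 100)
    head, tail = read_head_tail(demo_file, chunk_size=100)

print(head[:5], tail[-5:])
```

The cost per file is bounded by the chunk size rather than the transcript length, which is why a listing built this way can stay fast on large directories. For files smaller than the chunk size, the head and tail simply overlap.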
+  code cell:
+    source:
+      from datetime import datetime
+      
+      import pandas as pd
+      
+      from claude_agent_sdk import SDKSessionInfo, list_sessions
+      
+      sessions = list_sessions(directory=DEMO_DIR)
+      
+      # Render as a table. In a real UI this would be your sidebar component.
+      rows = []
+      for s in sessions:
+          rows.append(
+              {
+                  "id": s.session_id[:8],
+                  "summary": (s.summary[:50] + "...") if len(s.summary) > 50 else s.summary,
+                  "modified": datetime.fromtimestamp(s.last_modified / 1000).strftime("%H:%M:%S"),
+                  "branch": s.git_branch or "-",
+                  "tag": s.tag or "-",
+              }
+          )
+      
+      pd.DataFrame(rows)
+  markdown cell:
+    source:
+      For long histories, pass `limit` and `offset` to page through results. A session picker typically loads the first page, then fetches more as the user scrolls:
+      
+      ```python
+      page_2 = list_sessions(directory=DEMO_DIR, limit=20, offset=20)
+      ```
+      
+      If your app already stored a session ID (say, in your own database alongside a user record) and you just need that one row, `get_session_info()` is cheaper than listing everything.
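A minimal sketch of the scroll-to-load-more loop described above, written against a generic page fetcher so it runs without the SDK installed. `iter_pages` and `fetch_fake` are hypothetical names, and the stop condition (a short or empty page means the end) is an assumption about how you would detect the last page.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_pages(
    fetch_page: Callable[[int, int], Sequence[T]],
    page_size: int = 20,
) -> Iterator[Sequence[T]]:
    """Yield successive pages until a short or empty page signals the end."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        if not page:
            return
        yield page
        if len(page) < page_size:
            return
        offset += page_size


# Stand-in data source so the sketch is self-contained.
fake_sessions = [f"session-{i:02d}" for i in range(45)]


def fetch_fake(limit: int, offset: int) -> List[str]:
    return fake_sessions[offset : offset + limit]


pages = list(iter_pages(fetch_fake, page_size=20))
print([len(p) for p in pages])
```

In the notebook, the fetcher would presumably be `lambda limit, offset: list_sessions(directory=DEMO_DIR, limit=limit, offset=offset)`, using the same `limit` and `offset` keywords shown above.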
+  code cell:
+    source:
+      from claude_agent_sdk import get_session_info
+      
+      info = get_session_info(demo_session_ids[0], directory=DEMO_DIR)
+      
+      print(f"Session:      {info.session_id}")
+      print(f"Summary:      {info.summary}")
+      print(f"First prompt: {info.first_prompt}")
+      print(f"Created:      {datetime.fromtimestamp(info.created_at / 1000)}")
+      print(f"Size:         {info.file_size:,} bytes")
+  markdown cell:
+    source:
+      ## Read a session's messages
+      
+      Once a user clicks a session in the sidebar, you load its conversation into the main view. `get_session_messages()` reconstructs the message chain from the transcript and returns user and assistant turns in order. Like the listing function, it's a pure file read: the agent doesn't need to be running.
+      
+      Each `SessionMessage` has a `type` (`"user"` or `"assistant"`), a `uuid`, and a `message` dict in the same shape as the Anthropic Messages API (`role`, `content`).
+  code cell:
+    source:
+      from claude_agent_sdk import get_session_messages
+      
+      messages = get_session_messages(demo_session_ids[0], directory=DEMO_DIR)
+      
+      for m in messages:
+          role = m.type
+          # content is a list of blocks; pull out the text ones
+          text_parts = [
+              b.get("text", "")
+              for b in m.message.get("content", [])
+              if isinstance(b, dict) and b.get("type") == "text"
+          ]
+          text = " ".join(text_parts).strip()
+          print(f"[{role:>9}] {text[:100]}")
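The Messages API allows `content` to be either a plain string or a list of blocks. Whether stored transcripts ever use the string form is an assumption here, but a display helper that tolerates both is cheap. A sketch, with `extract_text` as a hypothetical name:

```python
from typing import Any, Dict


def extract_text(message: Dict[str, Any]) -> str:
    """Return the displayable text of a Messages-API-shaped dict.

    Handles string content, a list of content blocks, and missing content.
    """
    content = message.get("content") or []
    if isinstance(content, str):
        return content.strip()
    parts = [
        block.get("text", "")
        for block in content
        if isinstance(block, dict) and block.get("type") == "text"
    ]
    return " ".join(parts).strip()


# Both shapes yield the same text; non-text blocks are skipped.
as_string = {"role": "user", "content": "  Write a haiku.  "}
as_blocks = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Branches collide,"},
        {"type": "tool_use", "id": "demo", "name": "noop", "input": {}},
        {"type": "text", "text": "markers bloom."},
    ],
}
print(extract_text(as_string))
print(extract_text(as_blocks))
```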
+  markdown cell:
+    source:
+      For long sessions, `limit` and `offset` let you load a window at a time. A chat view might load the last 50 messages on open, then fetch older pages as the user scrolls up. Offsets are applied in chronological order (oldest first), so page 0 is the start of the conversation.
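Because offsets count from the oldest message, loading the newest page first means computing the offset from the total. The arithmetic can be sketched as follows, assuming your app knows the message count (for example from an earlier full read or its own index). Both helper names are hypothetical.

```python
from typing import Optional, Tuple


def last_page_window(total: int, page_size: int = 50) -> Tuple[int, int]:
    """Return (limit, offset) covering the newest page_size messages."""
    offset = max(0, total - page_size)
    return min(page_size, total), offset


def older_page_window(
    current_offset: int, page_size: int = 50
) -> Optional[Tuple[int, int]]:
    """Return (limit, offset) for the page just before current_offset.

    Returns None once the start of the conversation has been reached.
    """
    if current_offset <= 0:
        return None
    new_offset = max(0, current_offset - page_size)
    return current_offset - new_offset, new_offset


# A 120-message session: open on messages 70..119, then scroll up twice.
print(last_page_window(120))
print(older_page_window(70))
print(older_page_window(20))
```

Each returned pair would be passed as `limit` and `offset` to `get_session_messages()`.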
+  markdown cell:
+    source:
+      # Part 3: Organize with titles and tags
+      
+      ## Rename a session
+      
+      Auto-generated summaries are fine for a quick glance, but users often want to give a session a real name. `rename_session()` appends a title entry to the transcript; `list_sessions()` picks it up as `custom_title` on the next read.
+      
+      Appends are cheap, and the most recent entry wins: calling rename twice just means the newer title is the one readers see. No file rewrite happens.
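The append-then-latest-wins behavior described above can be illustrated with a toy JSONL file. The entry shape (`"type": "demo-title"`) is invented for this sketch and is not the SDK's actual transcript format. The point is only that a rename adds a line and a reader keeps the last matching one.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def append_title(path: Path, title: str) -> None:
    """Append a title entry; existing lines are never rewritten."""
    with path.open("a", encoding="utf-8") as f:
        # Hypothetical entry shape, for illustration only.
        f.write(json.dumps({"type": "demo-title", "title": title}) + "\n")


def read_title(path: Path) -> Optional[str]:
    """Return the most recently appended title, or None if there is none."""
    title = None
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line)
            if entry.get("type") == "demo-title":
                title = entry["title"]  # later entries overwrite earlier ones
    return title


with tempfile.TemporaryDirectory() as tmp:
    transcript = Path(tmp) / "demo.jsonl"
    transcript.write_text(json.dumps({"type": "user", "text": "hi"}) + "\n", encoding="utf-8")
    append_title(transcript, "First name")
    append_title(transcript, "Second name")
    final_title = read_title(transcript)
    line_count = len(transcript.read_text(encoding="utf-8").splitlines())

print(final_title, line_count)
```

This toy reader scans the whole file for simplicity. Both title entries remain on disk, and only the newer one is surfaced.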
+  code cell:
+    source:
+      from claude_agent_sdk import rename_session
+      
+      rename_session(demo_session_ids[0], "Worktree CLI naming brainstorm", directory=DEMO_DIR)
+      rename_session(demo_session_ids[2], "Haiku corner", directory=DEMO_DIR)
+      
+      # Verify the titles stuck
+      for s in list_sessions(directory=DEMO_DIR):
+          label = s.custom_title or "(auto)"
+          print(f"{s.session_id[:8]}  custom_title={label!r}  summary={s.summary[:40]!r}")
+  markdown cell:
+    source:
+      ## Tag and filter
+      
+      A tag is a single string attached to a session. Use it for whatever categorization your product needs: `"archived"`, `"needs-review"`, `"favorite"`. Pass `None` to clear a tag.
+      
+      A common pattern is soft-delete: instead of removing the transcript file, tag it `"__hidden"` and filter that out in your list view. The data stays recoverable.
+  code cell:
+    source:
+      from claude_agent_sdk import tag_session
+      
+      # Mark two sessions as favorites, hide the other
+      tag_session(demo_session_ids[0], "favorite", directory=DEMO_DIR)
+      tag_session(demo_session_ids[2], "favorite", directory=DEMO_DIR)
+      tag_session(demo_session_ids[1], "__hidden", directory=DEMO_DIR)
+      
+      
+      def visible_sessions(directory: str, tag_filter: str | None = None) -> list[SDKSessionInfo]:
+          """List sessions, hiding soft-deletes and optionally filtering by tag."""
+          results = []
+          for s in list_sessions(directory=directory):
+              if s.tag == "__hidden":
+                  continue
+              if tag_filter is not None and s.tag != tag_filter:
+                  continue
+              results.append(s)
+          return results
+      
+      
+      favorites = visible_sessions(DEMO_DIR, tag_filter="favorite")
+      print(f"Visible favorites: {len(favorites)}")
+      for s in favorites:
+          print(f"  {s.session_id[:8]}  [{s.tag}]  {s.custom_title or s.summary}")
+  markdown cell:
+    source:
+      Tags are single values, not lists. If you need multiple axes (say, a status plus a category), encode them into one string like `"review:urgent"` and parse on read, or store richer state in your own database keyed on `session_id`.
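One way to do the encode-and-parse approach suggested above, as a sketch. The `status:category` convention, the `CompositeTag` type, and both helper names are assumptions for illustration; nothing in the SDK interprets the colon.

```python
from typing import NamedTuple, Optional


class CompositeTag(NamedTuple):
    status: str
    category: Optional[str]


def encode_tag(status: str, category: Optional[str] = None) -> str:
    """Pack a status and optional category into one tag string."""
    if ":" in status:
        raise ValueError("status must not contain ':'")
    return status if category is None else f"{status}:{category}"


def parse_tag(tag: Optional[str]) -> Optional[CompositeTag]:
    """Split a tag on its first ':'; plain tags have no category."""
    if not tag:
        return None
    status, sep, category = tag.partition(":")
    return CompositeTag(status, category if sep else None)


print(encode_tag("review", "urgent"))
print(parse_tag("review:urgent"))
print(parse_tag("favorite"))
```

The encoded string is what you would pass to `tag_session()`, and `parse_tag(s.tag)` recovers the two axes when rendering the list.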
+  markdown cell:
+    source:
+      # Part 4: Fork and resume
+      
+      ## Branch from an existing conversation
+      
+      Forking copies a session's transcript into a new file with freshly remapped message IDs. The original stays untouched. This is the primitive behind "try a different approach" features: the user keeps their original thread and gets a new one to experiment in.
+      
+      `fork_session()` writes the new file and returns its ID. It doesn't run the agent, so the fork sits on disk until you resume it with `query()`.
+  code cell:
+    source:
+      from claude_agent_sdk import fork_session
+      
+      source = demo_session_ids[0]
+      
+      fork = fork_session(
+          source,
+          directory=DEMO_DIR,
+          title="Worktree CLI names (round 2)",
+      )
+      
+      print(f"Source: {source}")
+      print(f"Fork:   {fork.session_id}")
+      
+      # The fork starts with the same message history as the source
+      source_msgs = get_session_messages(source, directory=DEMO_DIR)
+      fork_msgs = get_session_messages(fork.session_id, directory=DEMO_DIR)
+      print(f"Source has {len(source_msgs)} messages, fork has {len(fork_msgs)}")
+  markdown cell:
+    source:
+      To branch from a specific point rather than the full history, pass `up_to_message_id`. The fork will contain the source transcript up to and including that message. You can get message UUIDs from `get_session_messages()[i].uuid`.
+      
+      ## Resume the fork into a live query
+      
+      The fork is just a transcript file. To turn it back into a running conversation, hand its ID to `ClaudeAgentOptions.resume`. The agent loads the forked history and continues from there.
+  code cell:
+    source:
+      resume_opts = ClaudeAgentOptions(
+          model=MODEL,
+          cwd=DEMO_DIR,
+          max_turns=1,
+          allowed_tools=[],
+          resume=fork.session_id,
+      )
+      
+      async for msg in query(
+          prompt="Those were okay. Give me three more names, but punnier.",
+          options=resume_opts,
+      ):
+          if isinstance(msg, ResultMessage):
+              print(f"[fork {fork.session_id[:8]} resumed]")
+              print(msg.result)
+  markdown cell:
+    source:
+      The original session is still sitting there unchanged. List again and you'll see both the source and the fork as separate rows with independent histories.
+  code cell:
+    source:
+      for s in list_sessions(directory=DEMO_DIR):
+          marker = "(fork)" if s.session_id == fork.session_id else "      "
+          print(f"{marker} {s.session_id[:8]}  {s.custom_title or s.summary[:50]}")
+  markdown cell:
+    source:
+      # Cleanup
+      
+      `delete_session()` removes a transcript file. It's a hard delete, which is why the soft-delete tag pattern from Part 3 is usually the safer default for user-facing UIs.
+      
+      Here we use it to tidy up everything the demo created.
+  code cell:
+    source:
+      from claude_agent_sdk import delete_session
+      
+      # Clean up every session in the demo dir, including the fork
+      for s in list_sessions(directory=DEMO_DIR):
+          delete_session(s.session_id, directory=DEMO_DIR)
+          print(f"Deleted {s.session_id[:8]}")
+      
+      remaining = list_sessions(directory=DEMO_DIR)
+      print(f"\n{len(remaining)} session(s) remaining.")
+  markdown cell:
+    source:
+      # Recap
+      
+      We built the core of a session browser against the Agent SDK's local transcript store:
+      
+      - **Listing** with `list_sessions()` gives you everything needed to render a sidebar, and it scales because it reads file stats and head/tail slices rather than parsing whole transcripts.
+      - **Reading** with `get_session_messages()` loads a conversation back for display without spawning the agent.
+      - **Organizing** with `rename_session()` and `tag_session()` appends metadata entries, so it's cheap and the most recent call wins.
+      - **Forking** with `fork_session()` plus `options.resume` lets users branch a conversation and keep going without touching the original.
+      
+      All of these are pure file operations on `~/.claude/projects/`. They work whether or not the agent subprocess is running, and they see the same transcripts the Claude Code CLI writes.
+      
+      ## Where to go next
+      
+      - **Wire it to a UI.** These functions are UI-agnostic; drop them behind a FastAPI route or an Electron IPC handler and you have the backend for a session sidebar.
+      - **Cross-host sessions.** Transcripts live on the local disk. To share sessions across machines, sync the files or index session IDs and messages into your own store. See [Manage sessions on disk](https://docs.claude.com/en/agent-sdk/local-session-management) for patterns.
+      - **TypeScript.** The same API exists in `@anthropic-ai/claude-agent-sdk` with camelCase names (`listSessions`, `forkSession`, and so on). See the [TypeScript SDK reference](https://docs.claude.com/en/agent-sdk/typescript).
+      - **The bigger picture.** Notebooks [00](00_The_one_liner_research_agent.ipynb) through [03](03_The_site_reliability_agent.ipynb) cover building the agent itself. This notebook covers managing what it leaves behind.

📓 tool_use/context_engineering/context_engineering_tools.ipynb

View diff
nbdiff tool_use/context_engineering/context_engineering_tools.ipynb (a5a6b21048dc8e8f00a254183130bd92ccc48f66) /dev/null
--- tool_use/context_engineering/context_engineering_tools.ipynb (a5a6b21048dc8e8f00a254183130bd92ccc48f66)  (no timestamp)
+++ /dev/null  2026-04-01 00:20:52.886347
## deleted /cells:
-  markdown cell:
-    source:
-      # Context Engineering for AI Agents: Memory vs. Compaction vs. Tool Clearing
-      
-      ## Introduction
-      
-      A common challenge when building long-horizon agents is managing context. Tool results, the model's own reasoning, and user messages all accumulate, and eventually you either hit the token limit or start paying for context that isn't helping anymore. Needle-in-a-haystack style benchmarks have surfaced a phenomenon called [context rot](https://research.trychroma.com/context-rot): as the number of tokens in the context window increases, the model's ability to accurately recall information from that context decreases. So, even before the hard context limit is reached, the agent may be getting less out of each token.
-      
-      Our engineering blog on [effective context engineering for AI agents](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) frames this as a resource problem: context is finite with diminishing marginal returns, and the core discipline is finding the smallest set of high-signal tokens that maximize the likelihood of your desired outcome. There are several levers for this: subagents that isolate work in their own context, programmatic tool calling that keeps large results out of the window entirely, and others.
-      
-      This cookbook focuses on three: **compaction**, **tool-result clearing**, and **memory**. All three are effective strategies for context engineering, but since they all operate to make the context window more efficient in different ways, they can be hard to distinguish. Understanding those distinctions is what lets you map each tool to the part of your workload it actually helps with. Alongside other core context management strategies like utilizing subagents, these three are crucial for teams building long-running agents to understand. They also all have first-party API support, so you can adopt them without building orchestration infrastructure.
-      
-      - **Compaction** distills the contents of a context window into a high-fidelity summary, letting the agent continue with minimal performance degradation when the conversation gets long.
-      - **Tool-result clearing** addresses the bloat from tool use itself. As an agent pulls in tools and calls them, the results pile up, and deciding how much of that tool output to keep becomes an increasingly important part of managing context. Clearing drops old, re-fetchable results while keeping the record that the call happened.
-      - **Memory** is structured note-taking: the agent writes to persistent external storage so it can track progress across tasks and sessions without keeping everything in active context.
-      
-      [Claude Code](https://claude.com/product/claude-code) employs several of these strategies in production: compaction for long conversations and two complementary memory systems for cross-session persistence. Our API offers first-party implementations of all three: [server-side compaction](https://platform.claude.com/docs/en/build-with-claude/compaction), [context editing](https://platform.claude.com/docs/en/build-with-claude/context-editing) (which includes tool-result clearing), and the [memory tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool). This cookbook works through how to think about designing with them: when each one applies, how to configure them, what changes when you use them independently vs. together, and sample use-cases where different combinations make sense.
-      
-      The examples center on a **long-running research agent**: one that reads a corpus of documents, takes notes, and builds on its findings across multiple sessions. It's a useful test case because it naturally hits all three problems: bulky document reads (clearing), long analytical conversations (compaction), and knowledge that needs to survive between sessions (memory).
-      
-      ### What you'll learn
-      
-      - How to **cap in-session token growth** with `clear_tool_uses` when an agent's context is dominated by large, re-fetchable tool results like file reads and API responses
-      - How to **keep long conversations going** with server-side compaction, including how to serialize the `compaction` block back and probe what survives the summary
-      - How to **persist agent knowledge across sessions** by implementing a file-backed memory handler that the model drives itself, so Session 2 picks up where Session 1 left off
-      - How to **implement each primitive most effectively**, replacing the default compaction prompt to preserve what your agent needs, guiding what the agent writes to `/memories`, and testing clearing configs against your own workload's tool-use pattern
-      - How to **diagnose which part of the context problem your workload actually has**, and pick the primitive that targets it, with a framework for mapping workload characteristics to the right tool
-      
-      ### Prerequisites
-      
-      To run this notebook, you will need:
-      
-      - **Anthropic API key** set as `ANTHROPIC_API_KEY` in your environment or a `.env` file ([get one here](https://platform.claude.com/))
-      - **Python 3.11+** with the `anthropic`, `python-dotenv`, and `matplotlib` packages installed
-      - **`research_corpus.py`** alongside this notebook (included in the repository). It defines `CORPUS`, a dict of eight synthetic review documents on model organisms for aging research (~40K tokens each, ~320K tokens total), plus probe questions used later to test what survives compaction. You can swap in your own documents by replacing the dict.
-      
-      > **Running from the cookbooks repo?** Ensure your working directory is `tool_use/context_engineering` before running the notebook.
-  markdown cell:
-    source:
-      ## Step 0: Environment Setup
-      
-      Create a `.env` file in this directory with your Anthropic API key:
-      
-      ```
-      ANTHROPIC_API_KEY=your-key-here
-      ```
-  code cell:
-    source:
-      %%capture
-      %pip install anthropic python-dotenv matplotlib
-  code cell:
-    source:
-      import json
-      import os
-      import tempfile
-      from collections import namedtuple
-      from pathlib import Path
-      
-      import anthropic
-      import matplotlib.pyplot as plt
-      from dotenv import load_dotenv
-      
-      load_dotenv()
-      
-      if not os.environ.get("ANTHROPIC_API_KEY"):
-          raise ValueError("ANTHROPIC_API_KEY not set. Add it to a .env file or export it.")
-      
-      CORPUS_PATH = Path("research_corpus.py")
-      assert CORPUS_PATH.exists(), (
-          f"research_corpus.py not found in {Path.cwd()}. It should be alongside this notebook."
-      )
-      
-      client = anthropic.Anthropic()
-      MODEL = "claude-sonnet-4-6"
-      print(f"anthropic SDK {anthropic.__version__}, model {MODEL}")
-    outputs:
-      output 0:
-        output_type: stream
-        name: stdout
-        text:
-          anthropic SDK 0.84.0, model claude-sonnet-4-6
-  code cell:
-    source:
-      # Force-reload the corpus module in case this kernel has a stale cached version
-      import importlib
-      
-      import research_corpus
-      
-      importlib.reload(research_corpus)
-      from research_corpus import COMPACTION_PROBES, CORPUS
-      
-      # Count tokens via the API (cached in-memory so repeat kernel runs are fast).
-      _token_cache: dict[str, int] = {}
-      
-      
-      def count_tokens(text: str) -> int:
-          if text not in _token_cache:
-              _token_cache[text] = client.messages.count_tokens(
-                  model=MODEL, messages=[{"role": "user", "content": text}]
-              ).input_tokens
-          return _token_cache[text]
-      
-      
-      print(f"CORPUS is a dict of {len(CORPUS)} synthetic documents held in Python memory.")
-      print("When the agent calls read_file, the content is served from this dict and")
-      print("lands directly in the agent's context window — no disk I/O involved.\n")
-      _total_tokens = 0
-      for path, content in CORPUS.items():
-          n_tok = count_tokens(content)
-          _total_tokens += n_tok
-          # Strip the virtual directory prefix for display clarity
-          display_name = path.removeprefix("/research/")
-          print(f"  {display_name:<26} ~{n_tok:>6,} tokens")
-      print(f"\n  Total corpus: ~{_total_tokens:,} tokens")
-      
-      # Sanity check: the corpus should be large enough to push context well past
-      # 200K in a single read-pass. If this fires, research_corpus.py may be stale.
-      assert _total_tokens > 250_000, (
-          f"Corpus is only {_total_tokens:,} tokens; expected >250K. "
-          "Restart the kernel and re-run, or verify research_corpus.py is current."
-      )
-    outputs:
-      output 0:
-        output_type: stream
-        name: stdout
-        text:
-          CORPUS is a dict of 8 synthetic documents held in Python memory.
-          When the agent calls read_file, the content is served from this dict and
-          lands directly in the agent's context window — no disk I/O involved.
-          
-            celegans_review.md         ~41,260 tokens
-            drosophila_review.md       ~41,200 tokens
-            mouse_review.md            ~41,155 tokens
-            zebrafish_review.md        ~41,095 tokens
-            killifish_review.md        ~41,115 tokens
-            yeast_review.md            ~41,101 tokens
-            nmr_review.md              ~41,017 tokens
-            rhesus_review.md           ~41,012 tokens
-          
-            Total corpus: ~328,955 tokens
-  markdown cell:
-    source:
-      ## The Problem: A Long-Running Research Agent
-      
-      The agent in this cookbook plays the role of a biology researcher writing a comparative review of model organisms for aging and longevity research. The task is realistic enough to matter: it involves reading through a corpus of review documents (one per organism), extracting comparable facts (lifespan, genetic tractability, translational relevance), taking structured notes, and synthesizing findings across everything read.
-      
-      This kind of work is where context management starts to bite. Each document is around 40K tokens (narrative plus extensive appendix tables of intervention data), and the task asks the agent to read them in two batches: four high-throughput organisms (C. elegans, Drosophila, yeast, killifish) first, then four low-throughput organisms (mouse, zebrafish, naked mole-rat, rhesus). The two-batch structure is an experimental design choice for this cookbook: it produces a context trajectory that climbs past the compaction trigger on the first batch and past the 200K reference line on the second, so each primitive's effect on the trajectory is visible in the same run. Without context management, the agent's context grows to hundreds of thousands of tokens mid-task. And since the work spans sessions, even a completed run starts the next session with no memory of what was learned.
-      
-      ### The research task
-      
-      The agent's concrete assignment: compare the model organisms in `/research/` on three dimensions (lifespan and experimental throughput, genetic tractability, and translational relevance to human aging), reading the eight review documents in two batches and taking notes as it goes, then writing a comparative synthesis.
-  markdown cell:
-    source:
-      ## How the Three APIs Map to the Problem
-      
-      Each API targets a different kind of context growth. Understanding which kind you're facing is the first step to picking the right tool.
-      
-      ### Conceptually
-      
-      **Compaction** is the practice of taking a conversation nearing the context window limit, summarizing its contents, and reinitiating with that summary. It aims to distill the context window in a high-fidelity manner so the agent can continue with minimal performance degradation. The art of compaction lies in what to keep versus what to discard: overly aggressive compaction can lose subtle but critical context whose importance only becomes apparent later. The summary preserves architectural decisions, unresolved questions, and key facts while discarding redundant content; it's lossy by design, but handles all context growth, not just tool results. Compaction is a *whole-transcript* operation: user messages, assistant messages, tool calls, tool results, even prior compaction blocks are all flattened into the summary.
-      
-      **Tool-result clearing**, by contrast, is a *sub-transcript* operation. It walks the message list and surgically replaces `tool_result` content blocks, leaving everything else — user messages, assistant reasoning, the `tool_use` record — untouched. When an agent calls tools, the results become part of the conversation history and count against the context budget on every subsequent turn. Much of that content is re-fetchable: file contents the agent can re-read, API responses it can re-request. Clearing replaces old `tool_result` blocks with a short placeholder, keeping the `tool_use` record so the model still knows it made the call, but dropping the bulky payload. Once a tool has been called deep in the message history, the agent rarely needs to see the raw result again; clearing is one of the safest, lightest-touch ways to recover that space. If the agent does need the data, it just calls the tool again.
-      
-      **Memory**, or structured note-taking, is a technique where the agent regularly writes notes persisted outside the context window, then pulls them back in at later times. This provides persistent memory with minimal overhead: the agent tracks progress across complex tasks, maintaining critical context that would otherwise be lost across dozens of tool calls or across context resets. After a reset (a new session, or after compaction), the agent reads its own notes and continues. You implement the storage backend, so you control what's stored and for how long.
-      
-      Beyond enabling these primitives, it's also important to understand how to implement them most effectively: the default behavior gets you started, but the quality of a compaction summary and the usefulness of what lands in memory both depend on guidance you provide. Each primitive's section below includes a subsection on effective implementation.
-      
-      ### Tactically
-      
-      | API | Identifier | Beta header | Triggered by | Configurable knobs |
-      |---|---|---|---|---|
-      | Compaction | `compact_20260112` | `compact-2026-01-12` | Token threshold (server-side, min 50K) | `trigger` (default 150K), `instructions`, `pause_after_compaction` |
-      | Tool clearing | `clear_tool_uses_20250919` | `context-management-2025-06-27` | Token threshold (server-side) | `trigger` (default 100K), `keep` (default 3 tool uses), `clear_at_least`, `exclude_tools`, `clear_tool_inputs` |
-      | Memory tool | `memory_20250818` | none (standalone) | The model (it's a tool call) | Client implements: `view`, `create`, `str_replace`, `insert`, `delete`, `rename` |
-      
-      ### Mapped to the research agent
-      
-      For the research agent specifically, the three problems line up cleanly:
-      
-      - The agent's running commentary ("C. elegans is 18-day lifespan with genome-wide RNAi, mouse is 30 months but costs $100K per cohort...") and the user's follow-up questions accumulate into a long dialogue. That's a **compaction** problem.
-      - Reading eight ~40K-token review documents produces roughly 320K tokens of tool-result volume, well into the range where context rot degrades model performance. Most of that content the agent could re-read on demand. That's a **clearing** problem.
-      - The work spans multiple sessions. If Session 1 determined that killifish is the shortest-lived vertebrate (4-6 months), we want Session 2 to retain that finding and build on it rather than rediscover it from scratch. That's a **memory** problem.
-      
-      A rough mental model for prioritizing: compaction compresses the whole window when it grows too large, clearing drops stale re-fetchable data inside the window, and memory moves information out of the window so it survives across sessions. Each layer adds config to tune and interactions to understand, so it's worth starting with the one that matches the bottleneck you're actually observing.
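The tool-result clearing behavior described above can be illustrated client-side. The sketch below is an assumption-laden approximation, not the server-side implementation: the function name `clear_old_tool_results` and the placeholder text are hypothetical, and the real API applies clearing on a token threshold rather than on every call. It only shows the shape of the operation: `tool_result` payloads older than the last `keep` are replaced, while user messages, assistant text, and `tool_use` records are left alone.

```python
from copy import deepcopy

# Hypothetical placeholder text; the real API chooses its own wording.
PLACEHOLDER = "[tool result cleared]"


def clear_old_tool_results(messages: list[dict], keep: int = 3) -> list[dict]:
    """Return a copy of `messages` with all but the last `keep` tool_result payloads replaced."""
    cleared = deepcopy(messages)  # never mutate the caller's transcript
    # Collect references to every tool_result block, in transcript order.
    results = [
        block
        for msg in cleared
        if isinstance(msg.get("content"), list)
        for block in msg["content"]
        if block.get("type") == "tool_result"
    ]
    # Everything before the cutoff is "old"; the last `keep` results survive.
    cutoff = max(len(results) - keep, 0)
    for block in results[:cutoff]:
        block["content"] = PLACEHOLDER
    return cleared
```

Because only the `content` of old `tool_result` blocks changes, each `tool_use_id` still pairs with its `tool_use` record, so the model can see that the call happened and can repeat it if it needs the data again.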
-  markdown cell:
-    source:
-      ## The Research Agent
-      
-      Before exploring each primitive, we set up the agent itself: tool schemas, tool execution, and an agent loop that can be run with or without any context-management configuration. Everything is inline so you can see the full loop.
-  code cell:
-    source:
-      # ── Tool schemas ─────────────────────────────────────────────────────────
-      RESEARCH_TASK = """Compare the model organisms in /research/ on three dimensions:
-      1. Lifespan and experimental throughput (how fast can you get a lifespan result?)
-      2. Genetic tractability (what tools exist for manipulation and screening?)
-      3. Translational relevance (how well do findings transfer to human aging?)
-      
-      Work through the organisms in two batches, recording a note after each
-      batch before moving on:
-      
-      BATCH 1 (high-throughput models): Read celegans_review.md,
-      drosophila_review.md, yeast_review.md, and killifish_review.md. Record a
-      single detailed note covering all four organisms on all three dimensions.
-      Do not begin Batch 2 until this note is recorded.
-      
-      BATCH 2 (low-throughput models): Read mouse_review.md,
-      zebrafish_review.md, nmr_review.md, and rhesus_review.md. Record a
-      single detailed note covering all four organisms on all three
-      dimensions.
-      
-      After both batches, write a comprehensive comparative synthesis
-      contrasting the high-throughput and low-throughput models."""
-      
-      BASE_TOOLS: list[dict] = [
-          {
-              "name": "search_files",
-              "description": "Search document filenames and contents for a keyword. Returns matching paths with a short excerpt.",
-              "input_schema": {
-                  "type": "object",
-                  "properties": {"query": {"type": "string", "description": "Search term"}},
-                  "required": ["query"],
-              },
-          },
-          {
-              "name": "read_file",
-              "description": "Read the full content of a document by path.",
-              "input_schema": {
-                  "type": "object",
-                  "properties": {
-                      "path": {"type": "string", "description": "e.g. /research/celegans_review.md"}
-                  },
-                  "required": ["path"],
-              },
-          },
-          {
-              "name": "record_finding",
-              "description": "Record a research finding (freeform text). Findings are held for the duration of this session only; they are not persisted across sessions.",
-              "input_schema": {
-                  "type": "object",
-                  "properties": {"finding": {"type": "string"}},
-                  "required": ["finding"],
-              },
-          },
-      ]
-      
-      # The memory tool spec is defined here but only added to the agent's tool
-      # list when a memory_handler is passed to run_research_session. Baseline
-      # and clearing/compaction runs do NOT get the memory tool.
-      MEMORY_TOOL_SPEC = {"type": "memory_20250818", "name": "memory"}
-      
-      SYSTEM_PROMPT = """You are a biology research analyst writing a comparative review of model organisms for aging research.
-      
-      Work systematically: search and read review documents, take notes on key facts (lifespans, genetic tools, translational strengths and caveats), and build a comparative picture.
-      
-      Be concise in your reasoning text; the goal is notes, not essays."""
-  code cell:
-    source:
-      # ── Tool execution ───────────────────────────────────────────────────────
-      def execute_research_tool(name: str, tool_input: dict, notes: list[str]) -> str:
-          if name == "search_files":
-              query = tool_input.get("query", "").lower()
-              if not query:
-                  return "Error: query is required."
-              hits: list[str] = []
-              for path, content in CORPUS.items():
-                  if query in path.lower() or query in content.lower():
-                      for line in content.split("\n"):
-                          if query in line.lower():
-                              hits.append(f"  {path}\n    → {line.strip()[:120]}")
-                              break
-                      else:
-                          hits.append(f"  {path}")
-              return (
-                  f"Found {len(hits)} match(es):\n" + "\n".join(hits)
-                  if hits
-                  else f"No matches for '{query}'."
-              )
-      
-          if name == "read_file":
-              path = tool_input.get("path", "")
-              content = CORPUS.get(path)
-              if content is None:
-                  return f"Error: '{path}' not found. Available: {', '.join(CORPUS.keys())}"
-              return content
-      
-          if name == "record_finding":
-              finding = tool_input.get("finding", "")
-              notes.append(finding)
-              return f"Finding #{len(notes)} recorded (session-local)."
-      
-          return f"Error: unknown tool '{name}'"
-  code cell:
-    source:
-      # ── Session result container ─────────────────────────────────────────────
-      SessionResult = namedtuple(
-          "SessionResult",
-          [
-              "messages",  # final message list
-              "notes",  # notes taken this session
-              "token_trajectory",  # list of (turn, total_context_tokens)
-              "events",  # list of dicts describing compaction / clearing events
-              "tool_counts",  # dict of tool_name -> call count
-              "file_reads",  # list of (turn, path) for each read_file call
-              "hit_limit",  # True if the session stopped because it hit the context window
-              "final_text",  # final assistant text (convenience)
-          ],
-      )
-      
-      
-      def _format_tool_arg(name: str, tool_input: dict) -> str:
-          """Human-readable one-line summary of a tool call's arguments for verbose output."""
-          if name == "search_files":
-              return repr(tool_input.get("query", ""))
-          if name == "read_file":
-              return tool_input.get("path", "?")
-          if name == "record_finding":
-              finding = tool_input.get("finding", "")
-              preview = finding[:50].replace("\n", " ")
-              return f'"{preview}{"..." if len(finding) > 50 else ""}"'
-          if name == "memory":
-              cmd = tool_input.get("command", "?")
-              path = tool_input.get("path", tool_input.get("old_path", ""))
-              return f"{cmd} {path}"
-          return str(tool_input)[:60]
-      
-      
-      # ── Agent loop ───────────────────────────────────────────────────────────
-      def run_research_session(
-          initial_prompt: str,
-          *,
-          context_management: dict | None = None,
-          betas: list[str] | None = None,
-          memory_handler=None,
-          max_turns: int = 12,
-          label: str = "session",
-          verbose: bool = True,
-      ) -> SessionResult:
-          """Run the research agent. Catches context-window overflow gracefully."""
-          tools = list(BASE_TOOLS)
-          if memory_handler is not None:
-              tools.append(MEMORY_TOOL_SPEC)
-      
-          messages: list[dict] = [{"role": "user", "content": initial_prompt}]
-          notes: list[str] = []
-          token_trajectory: list[tuple[int, int]] = []
-          events: list[dict] = []
-          tool_counts: dict[str, int] = {}
-          file_reads: list[tuple[int, str]] = []  # (turn, path) for each read_file
-          hit_limit = False
-          final_text = ""
-      
-          if verbose:
-              print(f"┌─ [{label}]")
-      
-          for turn in range(1, max_turns + 1):
-              kwargs: dict = dict(
-                  model=MODEL,
-                  max_tokens=4096,
-                  system=SYSTEM_PROMPT,
-                  tools=tools,
-                  messages=messages,
-              )
-              if context_management:
-                  kwargs["context_management"] = context_management
-              if betas:
-                  kwargs["betas"] = betas
-      
-              # Call the API; catch context-window overflow
-              try:
-                  if betas:
-                      response = client.beta.messages.create(**kwargs)
-                  else:
-                      response = client.messages.create(**kwargs)
-              except anthropic.BadRequestError as e:
-                  # Treated as context-window overflow (or a similar input-too-large error).
-                  # Any other 400 response, such as a malformed request, also lands here.
-                  hit_limit = True
-                  if verbose:
-                      print(f"│  ⚠ CONTEXT WINDOW LIMIT REACHED at turn {turn} (API rejected)")
-                      print(f"│    {str(e)[:200]}")
-                  break
-      
-              # Track TOTAL context size: uncached + cache-read + cache-created.
-              # usage.input_tokens alone excludes cached tokens, which makes the
-              # plot show dips that are just cache hits, not context management.
-              u = response.usage
-              total_in = (
-                  u.input_tokens
-                  + (getattr(u, "cache_read_input_tokens", None) or 0)
-                  + (getattr(u, "cache_creation_input_tokens", None) or 0)
-              )
-              token_trajectory.append((turn, total_in))
-      
-              # Surface context-management events on their own prominent lines
-              cm = getattr(response, "context_management", None)
-              if cm is not None and getattr(cm, "applied_edits", None):
-                  for edit in cm.applied_edits:
-                      cleared_uses = getattr(edit, "cleared_tool_uses", None)
-                      cleared_toks = getattr(edit, "cleared_input_tokens", None)
-                      events.append(
-                          {
-                              "turn": turn,
-                              "kind": "clearing",
-                              "cleared_tool_uses": cleared_uses,
-                              "cleared_input_tokens": cleared_toks,
-                          }
-                      )
-                      if verbose:
-                          print(
-                              f"│  ✂ CLEARING (turn {turn}): {cleared_uses or '?'} tool results cleared, "
-                              f"~{cleared_toks:,} tokens freed"
-                              if cleared_toks
-                              else f"│  ✂ CLEARING (turn {turn}): applied"
-                          )
-      
-              # Serialize assistant content, track tool calls and compaction
-              serialized: list[dict] = []
-              tool_calls: list[dict] = []
-              turn_tool_calls: list[tuple[str, dict]] = []  # (name, input) for verbose display
-              compaction_this_turn = False
-              for block in response.content:
-                  if block.type == "text":
-                      serialized.append({"type": "text", "text": block.text})
-                      if block.text.strip():
-                          final_text = block.text
-                  elif block.type == "tool_use":
-                      serialized.append(
-                          {
-                              "type": "tool_use",
-                              "id": block.id,
-                              "name": block.name,
-                              "input": block.input,
-                          }
-                      )
-                      tool_calls.append({"id": block.id, "name": block.name, "input": block.input})
-                      turn_tool_calls.append((block.name, block.input))
-                      tool_counts[block.name] = tool_counts.get(block.name, 0) + 1
-                      # Track file reads specifically so we can show what clearing drops
-                      if block.name == "read_file":
-                          file_reads.append((turn, block.input.get("path", "?")))
-                  elif block.type == "thinking":
-                      serialized.append(
-                          {"type": "thinking", "thinking": block.thinking, "signature": block.signature}
-                      )
-                  elif block.type == "compaction":
-                      serialized.append({"type": "compaction", "content": block.content})
-                      events.append({"turn": turn, "kind": "compaction", "summary": block.content})
-                      compaction_this_turn = True
-                      if verbose:
-                          print(
-                              f"│  ⊟ COMPACTION (turn {turn}): "
-                              f"~{count_tokens(block.content):,}-token summary replaces prior turns"
-                          )
-              messages.append({"role": "assistant", "content": serialized})
-      
-              if not tool_calls:
-                  if verbose and not compaction_this_turn:
-                      print(f"│  turn {turn:2d}  ctx={total_in:>7,}  (final answer)")
-                  break
-      
-              # Execute tools, collecting result sizes so the verbose print
-              # can show how much each call added to context
-              tool_results: list[dict] = []
-              result_sizes: list[int] = []
-              for call in tool_calls:
-                  if call["name"] == "memory" and memory_handler is not None:
-                      result = memory_handler.handle(call["input"])
-                  else:
-                      result = execute_research_tool(call["name"], call["input"], notes)
-                  tool_results.append(
-                      {
-                          "type": "tool_result",
-                          "tool_use_id": call["id"],
-                          "content": result,
-                      }
-                  )
-                  result_sizes.append(len(result) if isinstance(result, str) else 0)
-              messages.append({"role": "user", "content": tool_results})
-      
-              if verbose and not compaction_this_turn:
-                  # Header with context size, then one line per tool call with
-                  # its argument and approximate result size (so you can see
-                  # which calls are responsible for the next turn's ctx jump)
-                  print(f"│  turn {turn:2d}  ctx={total_in:>7,}")
-                  for (name, tinput), rsize in zip(turn_tool_calls, result_sizes, strict=False):
-                      size_note = f"  → ~{rsize // 4:,} tok" if rsize > 200 else ""
-                      print(f"│            {name:<16} {_format_tool_arg(name, tinput)}{size_note}")
-      
-          if verbose:
-              if token_trajectory:
-                  peak = max(t for _, t in token_trajectory)
-                  status = "⚠ HIT CONTEXT LIMIT" if hit_limit else "completed"
-                  print(
-                      f"└─ {status}: {len(token_trajectory)} turns, peak ctx {peak:,}, "
-                      f"final ctx {token_trajectory[-1][1]:,}, {len(events)} context event(s)\n"
-                  )
-              else:
-                  print("└─ HIT CONTEXT LIMIT on first turn: 0 turns completed\n")
-      
-          return SessionResult(
-              messages, notes, token_trajectory, events, tool_counts, file_reads, hit_limit, final_text
-          )
-      
-      
-      def show_cleared_reads(result: SessionResult, keep: int):
-          """Show which file reads are no longer in context after clearing.
-      
-          Clearing replaces tool results older than the last `keep` tool uses
-          with placeholders. We reconstruct the tool-use order and mark any
-          read that falls outside the surviving `keep`-window as cleared.
-      
-          Note: if clearing fires multiple times, only the last event's
-          boundary is considered and earlier cleared-then-re-read files may
-          be misclassified.
-          """
-          if not result.file_reads:
-              print("No file reads in this session.")
-              return
-          clearing_events = [e for e in result.events if e["kind"] == "clearing"]
-          if not clearing_events:
-              print("Clearing never fired; all file reads remain in context.")
-              return
-      
-          # Walk messages in order to reconstruct the sequence of tool_use blocks
-          # and which turn each one came from. The last `keep` of these survive
-          # the most recent clearing; earlier ones are cleared.
-          tool_use_seq: list[tuple[int, str, str]] = []  # (turn, name, path-if-read)
-          turn = 0
-          for msg in result.messages:
-              if msg["role"] == "assistant" and isinstance(msg["content"], list):
-                  turn += 1
-                  for block in msg["content"]:
-                      if block.get("type") == "tool_use":
-                          name = block.get("name", "?")
-                          path = block.get("input", {}).get("path", "") if name == "read_file" else ""
-                          tool_use_seq.append((turn, name, path))
-      
-          last_clear_turn = clearing_events[-1]["turn"]
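-          seq_before = [t for t in tool_use_seq if t[0] < last_clear_turn]
-          # Split by explicit index: slicing with -keep misbehaves when keep == 0
-          # (seq[:-0] is empty and seq[-0:] is the whole list).
-          cut = max(len(seq_before) - keep, 0)
-          cleared_reads = [(t, p) for (t, n, p) in seq_before[:cut] if n == "read_file"]
-          surviving_reads = [(t, p) for (t, n, p) in seq_before[cut:] if n == "read_file"]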
-          # Reads at or after the last clearing turn are untouched by it.
-          surviving_reads += [
-              (t, p) for (t, n, p) in tool_use_seq if t >= last_clear_turn and n == "read_file"
-          ]
-      
-          total = len(result.file_reads)
-          print(f"Total file reads across session: {total}")
-          print(f"Last clearing event fired at turn {last_clear_turn} (keep={keep})")
-          print(f"\nReads cleared from context: {len(cleared_reads)}")
-          for turn, path in cleared_reads[:12]:
-              print(f"  ✗ turn {turn:2d}: {path}")
-          if len(cleared_reads) > 12:
-              print(f"  ... and {len(cleared_reads) - 12} more")
-          print(
-              f"\nReads still in context (within the keep={keep} window or after "
-              f"the last clearing): {len(surviving_reads)}"
-          )
-          for turn, path in surviving_reads[:6]:
-              print(f"  ✓ turn {turn:2d}: {path}")
-          if len(surviving_reads) > 6:
-              print(f"  ... and {len(surviving_reads) - 6} more")
-      
-      
-      # ── Plot helpers ─────────────────────────────────────────────────────────
-      
-      
-      def plot_trajectories(
-          results: dict[str, SessionResult],
-          title: str = "Context size per turn",
-          triggers: dict[str, int] | None = None,
-          project_growth_for: str | None = None,
-      ):
-          """Line plot of context tokens per turn.
-      
-          Vertical dashed lines mark turns where clearing fired; dash-dot lines
-          mark compaction. Horizontal dotted lines mark configured trigger
-          thresholds (pass triggers={"clearing": 20000, "compaction": 50000} etc).
-      
-          project_growth_for: label of a run to extrapolate. Fits a line to the
-          last 5 points and draws a dotted segment forward ~8 turns to show where
-          unmanaged growth is heading.
-          """
-          fig, ax = plt.subplots(figsize=(10, 4.5))
-          colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
-      
-          for i, (label, res) in enumerate(results.items()):
-              turns = [t for t, _ in res.token_trajectory]
-              tokens = [tok for _, tok in res.token_trajectory]
-              color = colors[i % len(colors)]
-              ax.plot(turns, tokens, marker="o", label=label, markersize=4, color=color)
-              # Mark events on the x-axis
-              for ev in res.events:
-                  style = "--" if ev["kind"] == "clearing" else "-."
-                  ax.axvline(x=ev["turn"], color=color, linestyle=style, alpha=0.25, linewidth=1)
-              # Dotted growth projection for the named run, capped at 1M
-              if label == project_growth_for and len(turns) >= 3:
-                  HARD_LIMIT = 1_000_000
-                  fit_n = min(5, len(turns))
-                  xs, ys = turns[-fit_n:], tokens[-fit_n:]
-                  n = len(xs)
-                  sx, sy = sum(xs), sum(ys)
-                  slope = (n * sum(x * y for x, y in zip(xs, ys, strict=False)) - sx * sy) / (
-                      n * sum(x * x for x in xs) - sx * sx
-                  )
-                  intercept = (sy - slope * sx) / n
-                  proj_x, proj_y = [], []
-                  for x in range(turns[-1], turns[-1] + 9):
-                      y = slope * x + intercept
-                      if y > HARD_LIMIT:
-                          # Clip the last segment to the 1M ceiling and stop
-                          if proj_y and slope > 0:
-                              proj_x.append(proj_x[-1] + (HARD_LIMIT - proj_y[-1]) / slope)
-                              proj_y.append(HARD_LIMIT)
-                          break
-                      proj_x.append(x)
-                      proj_y.append(y)
-                  if proj_y:
-                      ax.plot(proj_x, proj_y, linestyle=":", color=color, alpha=0.6, linewidth=1.5)
-      
-          # Horizontal reference lines for trigger thresholds
-          if triggers:
-              for name, value in triggers.items():
-                  ax.axhline(y=value, color="gray", linestyle=":", alpha=0.6, linewidth=1)
-                  ax.annotate(
-                      f"{name} trigger: {value:,}",
-                      xy=(ax.get_xlim()[1], value),
-                      xytext=(-5, 3),
-                      textcoords="offset points",
-                      ha="right",
-                      va="bottom",
-                      fontsize=8,
-                      color="gray",
-                  )
-      
-          # 200K reference: earlier models cap here and would hard-stop
-          if ax.get_ylim()[1] > 30_000:
-              ax.axhline(y=200_000, color="gray", linestyle="--", alpha=0.5, linewidth=1)
-              ax.annotate(
-                  "200K: earlier models stop here",
-                  xy=(ax.get_xlim()[0], 200_000),
-                  xytext=(5, -12),
-                  textcoords="offset points",
-                  ha="left",
-                  va="top",
-                  fontsize=8,
-                  color="gray",
-                  alpha=0.8,
-              )
-      
-          ax.set_xlabel("Turn")
-          ax.set_ylabel("Context tokens (incl. cached)")
-          ax.set_title(title)
-          ax.legend(loc="best")
-          ax.grid(alpha=0.3)
-          plt.tight_layout()
-          plt.show()
-      
-      
-      def plot_summary_bars(results: dict[str, SessionResult], title: str = "Session outcomes"):
-          """Side-by-side bars: final context, file reads."""
-          labels = list(results.keys())
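-          # A run that hit the limit on its first turn has an empty trajectory; plot it as 0.
-          final_ctx = [r.token_trajectory[-1][1] if r.token_trajectory else 0 for r in results.values()]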
-          reads = [r.tool_counts.get("read_file", 0) for r in results.values()]
-      
-          fig, axes = plt.subplots(1, 2, figsize=(9, 3.5))
-          fig.suptitle(title, y=1.02)
-      
-          for ax, values, ylabel in zip(
-              axes,
-              [final_ctx, reads],
-              ["Final context (tokens)", "File reads"],
-              strict=False,
-          ):
-              bars = ax.bar(range(len(labels)), values, color=plt.cm.Set2(range(len(labels))))
-              ax.set_ylabel(ylabel)
-              ax.set_xticks(range(len(labels)))
-              ax.set_xticklabels(labels, rotation=30, ha="right", fontsize=8)
-              ax.grid(axis="y", alpha=0.3)
-              for bar, val in zip(bars, values, strict=False):
-                  ax.text(
-                      bar.get_x() + bar.get_width() / 2,
-                      bar.get_height(),
-                      f"{val:,}" if val > 1000 else str(val),
-                      ha="center",
-                      va="bottom",
-                      fontsize=7,
-                  )
-      
-          plt.tight_layout()
-          plt.show()
-  markdown cell:
-    source:
-      ### Baseline: no context management
-      
-      First we run the agent with no context-management configuration. With the large corpus (each document is ~40K tokens with its appendix tables), context accumulates fast. We'll look at the same run under two lenses: what happens on a 1M-token window, and what would happen on a 200K window.
-      
-      #### Part 1: On a 1M-token window
-      
-      Claude Sonnet 4.6 and Claude Opus 4.6 both provide a [1M-token context window](https://platform.claude.com/docs/en/build-with-claude/context-windows). For this task, the baseline's total input stays under that limit: the agent reads the full corpus and synthesizes without hitting a hard wall. The trajectory below shows the run climbing to hundreds of thousands of tokens, with the dotted line projecting continued growth at the same rate.
-  code cell:
-    source:
-      baseline = run_research_session(RESEARCH_TASK, label="baseline", max_turns=12)
-      
-      plot_trajectories(
-          {"baseline": baseline},
-          "Baseline on a 1M window: context grows past 200K",
-          project_growth_for="baseline",
-      )
-      
-      peak = max(t for _, t in baseline.token_trajectory)
-      n_turns = len(baseline.token_trajectory)
-      print(f"Peak context: {peak:,} tokens across {n_turns} turns")
-      print(f"File reads: {baseline.tool_counts.get('read_file', 0)}, Notes: {len(baseline.notes)}")
-      if baseline.hit_limit:
-          print("\n⚠ This run hit the 1M hard limit.")
-      else:
-          print(
-              "\nThe run stayed within the 1M window. The dotted line projects "
-              "where continued growth at the same rate would land, capped at 1M."
-          )
-    outputs:
-      output 0:
-        output_type: stream
-        name: stdout
-        text:
-          ┌─ [baseline]
-          │  turn  1  ctx=  1,058
-          │            read_file        /research/celegans_review.md  → ~27,465 tok
-          │            read_file        /research/drosophila_review.md  → ~27,435 tok
-          │            read_file        /research/yeast_review.md  → ~27,320 tok
-          │            read_file        /research/killifish_review.md  → ~27,349 tok
-          │  turn  2  ctx=166,043
-          │            record_finding   "## BATCH 1 NOTES: High-Throughput Model Organisms ..."
-          │  turn  3  ctx=168,242
-          │            read_file        /research/mouse_review.md  → ~27,390 tok
-          │            read_file        /research/zebrafish_review.md  → ~27,363 tok
-          │            read_file        /research/nmr_review.md  → ~27,287 tok
-          │            read_file        /research/rhesus_review.md  → ~27,285 tok
-          │  turn  4  ctx=332,821
-          │            record_finding   "## BATCH 2 NOTES: Low-Throughput Model Organisms —..."
-          │  turn  5  ctx=335,279  (final answer)
-          └─ completed: 5 turns, peak ctx 335,279, final ctx 335,279, 0 context event(s)
-          
-      output 1:
-        output_type: display_data
-        data:
-          image/png: iVBORw0K...<snip base64, md5=7c711ab132fc64d3...>
-          text/plain: <Figure size 1000x450 with 1 Axes>
-      output 2:
-        output_type: stream
-        name: stdout
-        text:
-          Peak context: 335,279 tokens across 5 turns
-          File reads: 8, Notes: 2
-          
-          The run stayed within the 1M window. The dotted line projects where continued growth at the same rate would land, capped at 1M.
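The log above allows a quick sanity check on the "~40K tokens per document" figure. The sketch below uses only numbers printed in that log. The per-document deltas are slightly overstated because each delta also includes the assistant's own tool-call blocks from the previous turn.

```python
# Total context sizes reported by the API in the baseline log (turn -> tokens).
ctx = {1: 1_058, 2: 166_043, 3: 168_242, 4: 332_821}
DOCS_PER_BATCH = 4

# Context growth attributable to each batch of four file reads.
batch1_per_doc = (ctx[2] - ctx[1]) / DOCS_PER_BATCH
batch2_per_doc = (ctx[4] - ctx[3]) / DOCS_PER_BATCH

# The verbose log estimates result size as len(text) // 4, e.g. ~27,465
# for the first read. Comparing that with the API-reported growth gives
# an approximate characters-per-token ratio for this corpus.
char_based_estimate = 27_465
chars_per_token = (char_based_estimate * 4) / batch1_per_doc

print(f"batch 1: ~{batch1_per_doc:,.0f} tokens per document")
print(f"batch 2: ~{batch2_per_doc:,.0f} tokens per document")
print(f"approx. chars per token: {chars_per_token:.2f}")
```

Both batches come out near 41K tokens per document, consistent with the ~40K figure, while the `chars // 4` heuristic in the log underestimates by roughly a third for this corpus.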
-  code cell:
-    source:
-      # ── What's actually sitting in the context window at the end of the run ──
-      # Walk the message list and bucket everything by type. This is what the
-      # model has to attend to on every turn: no hard wall stopped it, but the
-      # pile keeps growing and every fact from early turns is buried deeper.
-      
-      
-      def _bucket_context(messages: list[dict]) -> dict[str, tuple[int, int]]:
-          """Categorize message content into (count, approx_tokens) per bucket."""
-          buckets: dict[str, list[int]] = {
-              "File-read results": [],
-              "Other tool results": [],
-              "Agent reasoning text": [],
-              "Tool-call records": [],
-              "User/task prompts": [],
-          }
-          for msg in messages:
-              content = msg.get("content")
-              if isinstance(content, str):
-                  buckets["User/task prompts"].append(len(content))
-              elif isinstance(content, list):
-                  for block in content:
-                      bt = block.get("type")
-                      if bt == "tool_result":
-                          body = block.get("content", "")
-                          body = body if isinstance(body, str) else json.dumps(body)
-                          # Heuristic: large tool results are file reads
-                          key = "File-read results" if len(body) > 5000 else "Other tool results"
-                          buckets[key].append(len(body))
-                      elif bt == "text":
-                          buckets["Agent reasoning text"].append(len(block.get("text", "")))
-                      elif bt == "tool_use":
-                          buckets["Tool-call records"].append(len(json.dumps(block.get("input", {}))))
-          # chars // 4 as a rough token estimate; the exact number isn't the point
-          return {k: (len(v), sum(v) // 4) for k, v in buckets.items() if v}
-      
-      
-      print("=" * 72)
-      print("WHAT THE MODEL IS ATTENDING TO at the end of the baseline run")
-      print("=" * 72)
-      breakdown = _bucket_context(baseline.messages)
-      total_est = sum(tok for _, tok in breakdown.values())
-      # Scale the char-based estimates so the total matches the API's reported
-      # context size. This keeps the proportions meaningful and the absolute
-      # numbers consistent with the usage figures shown elsewhere.
-      api_total = baseline.token_trajectory[-1][1]
-      scale = api_total / total_est if total_est else 1.0
-      print(f"\nContext breakdown ({api_total:,} tokens total per API usage):\n")
-      for label, (count, tokens) in sorted(breakdown.items(), key=lambda kv: -kv[1][1]):
-          scaled = int(tokens * scale)
-          pct = 100 * tokens / total_est if total_est else 0
-          print(f"  {label:<26} ~{scaled:>8,} tokens  ({count:>3} blocks, {pct:4.1f}%)")
-      
-      if baseline.file_reads:
-          first_turn, first_path = baseline.file_reads[0]
-          doc_name = first_path.split("/")[-1]
-          print(f"\nFirst document read: {doc_name} at turn {first_turn}.")
-    outputs:
-      output 0:
-        output_type: stream
-        name: stdout
-        text:
-          ========================================================================
-          WHAT THE MODEL IS ATTENDING TO at the end of the baseline run
-          ========================================================================
-          
-          Context breakdown (335,279 tokens total per API usage):
-          
-            File-read results          ~ 322,946 tokens  (  8 blocks, 96.3%)
-            Tool-call records          ~   6,287 tokens  ( 10 blocks,  1.9%)
-            Agent reasoning text       ~   5,660 tokens  (  5 blocks,  1.7%)
-            User/task prompts          ~     357 tokens  (  1 blocks,  0.1%)
-            Other tool results         ~      26 tokens  (  2 blocks,  0.0%)
-          
-          First document read: celegans_review.md at turn 1.
-  markdown cell:
-    source:
-      The breakdown above makes the scale concrete. The model is carrying hundreds of thousands of tokens of file contents on every turn, most of it documents the agent already processed and took notes on. The first document read is still in the window, but by the end of the run it's sitting behind hundreds of thousands of tokens of other tool results plus all the agent's reasoning and notes. It hasn't been removed; it's competing with everything else for attention. This is where context rot shows up: recall of details from that depth degrades as the window fills, even though the content is technically present. And prefill latency scales with context length, so every turn pays to process the full pile.
-      
-      #### Part 2: On a 200K-token window
-      
-      Earlier models cap at 200K tokens. On those models, the same baseline run hits a hard wall: the API rejects the next request once context exceeds the limit, and the task stops mid-run.
-      
-      The cell below finds the turn where the baseline first crossed 200K and shows what the run looks like from a 200K model's perspective: same trajectory up to that point, then a hard stop.
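A pre-flight guard is one way to see the hard wall coming before the API rejects a request. The sketch below is an illustration only: `estimate_tokens`, `fits_window`, and the `reserve` headroom are hypothetical names and values, and the characters-divided-by-four estimate is the same rough heuristic used in the breakdown cell, not an exact count.

```python
import json


def estimate_tokens(messages: list[dict]) -> int:
    """Very rough size estimate: serialized characters // 4 (heuristic only)."""
    return len(json.dumps(messages)) // 4


def fits_window(messages: list[dict], window: int = 200_000, reserve: int = 8_000) -> bool:
    """True if the estimate leaves `reserve` tokens of headroom under `window`.

    Both defaults are illustrative assumptions, not documented limits.
    """
    return estimate_tokens(messages) + reserve <= window


small = [{"role": "user", "content": "hello"}]
large = [{"role": "user", "content": "x" * 1_000_000}]  # roughly 250K estimated tokens
print(fits_window(small), fits_window(large))
```

Because the estimate is coarse, a guard like this is best treated as an early warning; the API's own usage figures remain the source of truth.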
-  code cell:
-    source:
-      # No new agent run here. We reuse the Part-1 baseline's token_trajectory
-      # and clip it at the turn where context first exceeded 200K, to show what
-      # the same run would look like from a 200K model's perspective.
-      cross_turn = next((t for t, tok in baseline.token_trajectory if tok > 200_000), None)
-      
-      if cross_turn is None:
-          print(
-              "Baseline never exceeded 200K tokens this run; a 200K model "
-              "would have seen the same result."
-          )
-      else:
-          # Plot the turns that fit under 200K, plus a dotted segment to the
-          # rejected turn and a red cutoff marker between them.
-          clipped_traj = [(t, tok) for t, tok in baseline.token_trajectory if t < cross_turn]
-          turns_c = [t for t, _ in clipped_traj]
-          toks_c = [tok for _, tok in clipped_traj]
-          rejected_ctx = dict(baseline.token_trajectory).get(cross_turn, 200_000)
-      
-          fig, ax = plt.subplots(figsize=(10, 4.5))
-          ax.plot(turns_c, toks_c, marker="o", markersize=4, label="baseline")
-          # Dotted segment from last successful turn to the rejected turn
-          ax.plot(
-              [turns_c[-1], cross_turn],
-              [toks_c[-1], min(rejected_ctx, 200_000)],
-              linestyle=":",
-              color="#1f77b4",
-              alpha=0.6,
-              linewidth=1.5,
-          )
-          # Hard red vertical line marking the cutoff between the last
-          # successful turn and the rejected one
-          cutoff_x = (turns_c[-1] + cross_turn) / 2
-          ax.axvline(x=cutoff_x, color="#d62728", linewidth=2)
-          ax.set_xlim(0.5, cross_turn + 0.5)
-          ax.set_ylim(0, 205_000)
-          ax.text(
-              0.02,
-              0.96,
-              "context window: 200,000",
-              transform=ax.transAxes,
-              ha="left",
-              va="top",
-              fontsize=9,
-              color="#d62728",
-          )
-          ax.set_xlabel("Turn")
-          ax.set_ylabel("Context tokens (incl. cached)")
-          ax.set_title("Baseline on a 200K window: agent stopped mid-task")
-          ax.legend(loc="lower right")
-          ax.grid(alpha=0.3)
-          plt.tight_layout()
-          plt.show()
-      
-          last_turn = clipped_traj[-1][0]
-          last_ctx = clipped_traj[-1][1]
-          reads_done = sum(1 for t, _ in baseline.file_reads if t < cross_turn)
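-          # Rough cut: the slice below assumes about two messages per turn
-          # (one assistant reply plus one user message carrying tool results).
-          notes_at_stop = sum(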
-              1
-              for m in baseline.messages[: 2 * cross_turn]
-              if isinstance(m.get("content"), list)
-              for b in m["content"]
-              if b.get("type") == "tool_use" and b.get("name") == "record_finding"
-          )
-          print("⚠ Baseline HIT THE CONTEXT WINDOW LIMIT.")
-          print(f"  Completed {last_turn} turns before the API rejected the next request.")
-          print(f"  Last successful context: {last_ctx:,} tokens")
-          print(f"  File reads attempted: {reads_done}")
-          print(f"  Notes taken: {notes_at_stop}")
-          print()
-          print(
-              "  The agent stopped mid-task. Without context management, it "
-              "cannot continue past this point."
-          )
-    outputs:
-      output 0:
-        output_type: display_data
-        data:
-          image/png: iVBORw0K...<snip base64, md5=9396cc02933bd972...>
-          text/plain: <Figure size 1000x450 with 1 Axes>
-      output 1:
-        output_type: stream
-        name: stdout
-        text:
-          ⚠ Baseline HIT THE CONTEXT WINDOW LIMIT.
-            Completed 3 turns before the API rejected the next request.
-            Last successful context: 168,242 tokens
-            File reads attempted: 8
-            Notes taken: 2
-          
-            The agent stopped mid-task. Without context management, it cannot continue past this point.
-  markdown cell:
-    source:
-      Both failure modes come from the same underlying problem: the context window fills with hundreds of thousands of tokens of file content, most of it already processed and noted. What differs is how the failure surfaces. On a 200K window it's a hard stop: the API rejects the next request and the task ends mid-phase. On a 1M window the agent keeps running, but context rot sets in as the window fills: an early document read is still technically present, but by the end of the run it's buried under everything read since, and the model's ability to recall its details degrades. The agent completes, but the quality of the synthesis depends on recall that's fighting against that pile. Prefill latency scales with it too: every turn pays to process the full context, regardless of how much of it is still useful.
-      
-      The primitives below each address this by keeping the working set small enough that neither failure mode bites: the window doesn't fill, so smaller models don't stop and larger models don't degrade. Plots include the dashed 200K reference line so you can see where an earlier model would have been cut off.
-  markdown cell:
-    source:
-      ---
-      
-      ## Compaction
-      
-      [Compaction](https://platform.claude.com/docs/en/build-with-claude/compaction) is a strategy for managing context in long-running conversations: it takes a conversation nearing the context window limit, summarizes its contents, and restarts the conversation from that summary. This addresses the agent's own reasoning text, user back-and-forth, and decisions made over the course of a session. The specific sequence of actions and exact wording from earlier turns won't be preserved, but the goals, decisions, and major discoveries the agent made are summarized. What the summary retains depends on your compaction prompt, which we cover below.
-      
-      At its core, compaction distills the contents of a context window with high fidelity, so the agent can continue with minimal performance degradation. The trade-off is in choosing what the summary must retain versus what it can safely drop: overly aggressive compaction can lose subtle but critical context whose importance only becomes apparent later. The summary preserves key decisions and facts but may drop specific numbers or exact phrasing. Compaction costs inference (the summarizer model runs), but it handles all context growth, not just tool results.
-      
-      ### How it works under the hood
-      
-      Here's a minimal sample implementation of compaction. Our first-party API provides a robust, tested version (automatic triggering at a token threshold, a typed content block that slots natively into the conversation, correct tool-use pairing), but the short function below makes the mechanism concrete: render the conversation to text, ask the model to summarize it, replace the old messages with that summary.
-  code cell:
-    source:
-      def demo_compact(client, messages: list[dict], model: str) -> list[dict]:
-          """Summarize the conversation and return a fresh message list containing only the summary."""
-          # Render to a plain-text transcript
-          lines: list[str] = []
-          for msg in messages:
-              role = msg["role"]
-              content = msg.get("content", "")
-              if isinstance(content, str):
-                  lines.append(f"[{role}] {content}")
-              elif isinstance(content, list):
-                  for block in content:
-                      bt = block.get("type", "?")
-                      if bt == "text":
-                          lines.append(f"[{role}] {block['text']}")
-                      elif bt == "tool_use":
-                          lines.append(f"[{role}→tool] {block['name']}({block.get('input', {})})")
-                      elif bt == "tool_result":
-                          body = block.get("content", "")
-                          body = body if isinstance(body, str) else json.dumps(body)
-                          lines.append(f"[tool_result] {body[:200]}{'…' if len(body) > 200 else ''}")
-          transcript = "\n".join(lines)
-      
-          resp = client.messages.create(
-              model=model,
-              max_tokens=1024,
-              messages=[
-                  {
-                      "role": "user",
-                      "content": (
-                          "Summarize the following agent transcript, preserving key "
-                          "decisions, facts learned, and open threads. Be concise.\n\n"
-                          f"<transcript>\n{transcript}\n</transcript>"
-                      ),
-                  }
-              ],
-          )
-          summary = resp.content[0].text
-          return [{"role": "user", "content": f"[conversation summary]\n{summary}"}]
-  code cell:
-    source:
-      # Demonstrate on a longer synthetic conversation so the summary is meaningfully shorter
-      demo_convo = []
-      organisms = [
-          (
-              "C. elegans",
-              "18-day median lifespan at 20°C. Survival curve takes 4-6 weeks. Automated platforms handle thousands of animals. Genome-wide RNAi by feeding, fast CRISPR knock-ins. ~40% human orthology, no adaptive immunity, post-mitotic soma.",
-          ),
-          (
-              "Drosophila",
-              "60-80 day median lifespan at 25°C. Survival curves take 3-4 months. GAL4/UAS tissue-specific expression, FLP/FRT mosaic analysis, mature CRISPR. ~60% human disease gene orthology. Good for cardiac aging and neurodegeneration models.",
-          ),
-          (
-              "Mouse",
-              "~24-30 month median lifespan. Full study is 3-year commitment, $1-2/day/animal, $100K+ per 100-mouse cohort. Cre-lox conditionals, inbred strain panels, humanized models. ~85% orthology, best translational predictor.",
-          ),
-          (
-              "Killifish",
-              "4-6 month median lifespan (GRZ strain), shortest-lived vertebrate in the lab. Survival curves under a year. CRISPR works, toolkit still thin. ~70% orthology. Rapid vertebrate aging phenotypes on a compressed timeline.",
-          ),
-          (
-              "Yeast",
-              "Replicative lifespan ~25 divisions, ~1-2 weeks to result. Genome-wide deletion collection, trivial CRISPR, pooled competitive assays. ~30% orthology. Great for mechanism, not for translation.",
-          ),
-          (
-              "Naked mole-rat",
-              "Max lifespan >37 years, negligible senescence. No controlled genetics (eusocial breeding). Cell-line studies only. Near-zero cancer incidence. Comparative biology resource, not a screening organism.",
-          ),
-      ]
-      for name, facts in organisms:
-          demo_convo.append({"role": "user", "content": f"Tell me about {name} for aging research."})
-          demo_convo.append({"role": "assistant", "content": f"{name}: {facts}"})
-      demo_convo.append({"role": "user", "content": "What are the key tradeoffs across all of these?"})
-      demo_convo.append(
-          {
-              "role": "assistant",
-              "content": "The core tradeoff is throughput vs. translational relevance. Short-lived invertebrates (worm, fly, yeast) let you screen fast but translate weakly. Mouse is the translational gatekeeper but slow and expensive. Killifish tries to split the difference. The mole-rat is a comparative outlier you study to understand exceptional longevity, not to screen interventions.",
-          }
-      )
-      
-      before_tokens = count_tokens(json.dumps(demo_convo))
-      compacted = demo_compact(client, demo_convo, MODEL)
-      after_tokens = count_tokens(json.dumps(compacted))
-      print(
-          f"Conversation size: ~{before_tokens:,} → ~{after_tokens:,} tokens "
-          f"({(before_tokens - after_tokens) / before_tokens:.0%} reduction)"
-          if after_tokens < before_tokens
-          else f"Conversation size: ~{before_tokens:,} → ~{after_tokens:,} tokens"
-      )
-      print(f"\nSummary produced:\n{compacted[0]['content'][:800]}")
-      if len(compacted[0]["content"]) > 800:
-          print("...")
-    outputs:
-      output 0:
-        output_type: stream
-        name: stdout
-        text:
-          Conversation size: ~741 → ~449 tokens (39% reduction)
-          
-          Summary produced:
-          [conversation summary]
-          ## Summary: Model Organisms for Aging Research
-          
-          ### Key Facts Learned
-          | Organism | Lifespan | Study Duration | Human Orthology | Standout Feature |
-          |---|---|---|---|---|
-          | *C. elegans* | 18-day median | 4–6 weeks | ~40% | Genome-wide RNAi; high throughput |
-          | *Drosophila* | 60–80 days | 3–4 months | ~60% | Tissue-specific genetics; cardiac/neuro models |
-          | Mouse | 24–30 months | ~3 years | ~85% | Best translational predictor; expensive ($100K+/cohort) |
-          | Killifish | 4–6 months | <1 year | ~70% | Shortest-lived vertebrate; thin toolkit |
-          | Yeast | ~25 divisions | 1–2 weeks | ~30% | Mechanistic screening; poor translation |
-          | Naked mole-rat | >37 years | N/A | — | Negligible senescence; no genetic control |
-          
-          ### Core Tradeoff
-          **Throughput ↔ Translational relevance** —
-          ...
-  markdown cell:
-    source:
-      The sample above demonstrates the mechanism: the model produces a condensed version of the conversation that the agent can continue from.
-      
-      ### Using the API
-      
-      Our API provides this natively as the `compact_20260112` context edit. It triggers automatically at a token threshold (minimum 50K), returns a typed `compaction` content block that slots into the conversation natively, and handles tool-use pairing across the summary boundary. When compaction fires, you serialize the compaction block back (`{"type": "compaction", "content": block.content}`) and the API drops everything before it on the next request.
-      
-      **API Documentation:** [Compaction — platform.claude.com](https://platform.claude.com/docs/en/build-with-claude/compaction)
-      
-      Here's the research agent running with compaction configured. We set the trigger at 180K so the first batch of reads (~165K) stays under it: the compaction trajectory tracks the baseline through that batch, then diverges when the second batch pushes context past the trigger. Watch for `⊟ COMPACTION` lines in the output and the drop on the plot where the summary replaces the earlier conversation.
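The round trip described above (serialize the compaction block back into the next request) can be sketched as follows. This is a minimal illustration, not the SDK's own helper: `blocks_to_params` is a hypothetical name, and the block objects are assumed to expose `.type` plus `.content`, `.text`, or `.id`/`.name`/`.input` attributes. Only the `{"type": "compaction", "content": ...}` shape comes from the text above.

```python
from types import SimpleNamespace


def blocks_to_params(blocks) -> list[dict]:
    """Convert response content blocks into dicts for the next request.

    Hypothetical helper; the attribute names on `blocks` are assumptions.
    """
    params: list[dict] = []
    for block in blocks:
        if block.type == "compaction":
            # Shape taken from the text above: the compaction block is sent
            # back as-is so earlier history can be dropped on the next request.
            params.append({"type": "compaction", "content": block.content})
        elif block.type == "text":
            params.append({"type": "text", "text": block.text})
        elif block.type == "tool_use":
            params.append(
                {"type": "tool_use", "id": block.id, "name": block.name, "input": block.input}
            )
    return params


# Stand-in objects so the sketch runs without an API call.
fake_response = [
    SimpleNamespace(type="compaction", content="Batch 1 done; batch 2 files read."),
    SimpleNamespace(type="text", text="Recording the batch 2 note next."),
]
assistant_turn = {"role": "assistant", "content": blocks_to_params(fake_response)}
print(assistant_turn["content"][0]["type"])
```

Block types the sketch does not recognize are silently skipped; a real agent loop would want to handle or at least log them.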
-  code cell:
-    source:
-      COMPACTION_TRIGGER = 180_000  # above batch-1's ~165K so the rise is visible before compaction fires
-      
-      compaction_run = run_research_session(
-          RESEARCH_TASK,
-          betas=["compact-2026-01-12"],
-          context_management={
-              "edits": [
-                  {
-                      "type": "compact_20260112",
-                      "trigger": {"type": "input_tokens", "value": COMPACTION_TRIGGER},
-                  }
-              ]
-          },
-          max_turns=12,
-          label="compaction",
-      )
-      
-      plot_trajectories(
-          {"baseline": baseline, "compaction": compaction_run},
-          "Compaction: tracks baseline through batch 1, then summarizes and continues",
-          triggers={"compaction": COMPACTION_TRIGGER},
-      )
-      
-      compaction_events = [e for e in compaction_run.events if e["kind"] == "compaction"]
-      
-      # Outcome comparison
-      print(
-          f"Baseline:    {'HIT LIMIT at' if baseline.hit_limit else 'completed'} "
-          f"turn {len(baseline.token_trajectory)}, "
-          f"peak {max(t for _, t in baseline.token_trajectory):,} tokens"
-      )
-      print(
-          f"Compaction:  {'HIT LIMIT at' if compaction_run.hit_limit else 'completed'} "
-          f"turn {len(compaction_run.token_trajectory)}, "
-          f"peak {max(t for _, t in compaction_run.token_trajectory):,} tokens, "
-          f"{len(compaction_events)} compaction event(s)"
-      )
-      
-      if compaction_events:
-          for i, ev in enumerate(compaction_events, 1):
-              print(
-                  f"  Compaction #{i} at turn {ev['turn']}: ~{count_tokens(ev['summary']):,}-token summary"
-              )
-      
-      baseline_peak = max(t for _, t in baseline.token_trajectory)
-      compaction_peak = max(t for _, t in compaction_run.token_trajectory)
-      print(
-          f"\n→ Baseline climbed to {baseline_peak:,} tokens; compaction kept "
-          f"the peak at {compaction_peak:,}. On a smaller context window the "
-          f"baseline would have been cut off mid-task; here it completed, but "
-          f"every turn paid to process the full pile."
-      )
-    outputs:
-      output 0:
-        output_type: stream
-        name: stdout
-        text:
-          ┌─ [compaction]
-          │  turn  1  ctx=  1,129
-          │            read_file        /research/celegans_review.md  → ~27,465 tok
-          │            read_file        /research/drosophila_review.md  → ~27,435 tok
-          │            read_file        /research/yeast_review.md  → ~27,320 tok
-          │            read_file        /research/killifish_review.md  → ~27,349 tok
-          │  turn  2  ctx=166,103
-          │            record_finding   "## BATCH 1 DETAILED NOTE: High-Throughput Model Or..."
-          │  turn  3  ctx=169,164
-          │            read_file        /research/mouse_review.md  → ~27,390 tok
-          │            read_file        /research/zebrafish_review.md  → ~27,363 tok
-          │            read_file        /research/nmr_review.md  → ~27,287 tok
-          │            read_file        /research/rhesus_review.md  → ~27,285 tok
-          │  ⊟ COMPACTION (turn 4): ~2,783-token summary replaces prior turns
-          │  turn  5  ctx=  5,635
-          │            search_files     'comparative aging model organisms synthesis pipeline translational'
-          │  turn  6  ctx=  5,751
-          │            search_files     'aging model organism review'
-          │  turn  7  ctx=  5,829  (final answer)
-          └─ completed: 7 turns, peak ctx 169,164, final ctx 5,829, 1 context event(s)
-          
-      output 1:
-        output_type: display_data
-        data:
-          image/png: iVBORw0K...<snip base64, md5=2839675bba50ff4f...>
-          text/plain: <Figure size 1000x450 with 1 Axes>
-      output 2:
-        output_type: stream
-        name: stdout
-        text:
-          Baseline:    completed turn 5, peak 335,279 tokens
-          Compaction:  completed turn 7, peak 169,164 tokens, 1 compaction event(s)
-            Compaction #1 at turn 4: ~2,783-token summary
-          
-          → Baseline climbed to 335,279 tokens; compaction kept the peak at 169,164. On a smaller context window the baseline would have been cut off mid-task; here it completed, but every turn paid to process the full pile.
-  code cell:
-    source:
-      # What compaction costs: check the summary text directly for a MIX of
-      # details. We inspect what the summary PRESERVED, not what the agent
-      # happens to have in context at end-of-run (which could include reads
-      # that happened after compaction fired).
-      compaction_events = [e for e in compaction_run.events if e["kind"] == "compaction"]
-      
-      print("=" * 70)
-      print("WHAT COMPACTION COSTS: what the summary preserved vs. dropped")
-      print("=" * 70)
-      
-      if not compaction_events:
-          print("(Compaction did not fire this run; probe skipped)")
-      else:
-          # Use the last compaction summary — it covers the most history.
-          last_ev = compaction_events[-1]
-          summary_text = last_ev["summary"]
-          n_tok = count_tokens(summary_text)
-          print(
-              f"Last compaction at turn {last_ev['turn']} produced a "
-              f"~{n_tok:,}-token summary. The file content read before "
-              f"that point was distilled into this.\n"
-          )
-          print(
-              "Checking the summary text directly for six details: three "
-              "HIGH-LEVEL facts (central to the task, likely summarized) and "
-              "three OBSCURE specifics (appendix table cells, unlikely to make "
-              "the cut).\n"
-          )
-      
-          results = []
-          for question, answer_key, expected_survive in COMPACTION_PROBES:
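-              # Plain case-insensitive substring match: a short numeric key such
-              # as "18" also matches "180" or "2018", so a hit here is an upper
-              # bound on what the summary actually preserved.
-              survived = answer_key.lower() in summary_text.lower()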
-              results.append((question, answer_key, expected_survive, survived))
-      
-          for label, expect in [
-              ("HIGH-LEVEL FACTS (expected to survive)", True),
-              ("OBSCURE SPECIFICS (expected to be lost)", False),
-          ]:
-              print(f"── {label} ──")
-              for q, key, exp, surv in results:
-                  if exp != expect:
-                      continue
-                  status = "✓ in summary" if surv else "✗ not in summary"
-                  print(f"  {status:<18} looking for '{key}' — {q}")
-              print()
-      
-          hl_survived = sum(1 for _, _, e, s in results if e and s)
-          hl_total = sum(1 for _, _, e, _ in results if e)
-          ob_survived = sum(1 for _, _, e, s in results if not e and s)
-          ob_total = sum(1 for _, _, e, _ in results if not e)
-      
-          print("=" * 70)
-          print(
-              f"RESULT: high-level {hl_survived}/{hl_total} preserved, "
-              f"obscure {ob_survived}/{ob_total} preserved"
-          )
-          print("=" * 70)
-      
-          # Show a representative excerpt of the summary so the reader can see
-          # the compression directly.
-          excerpt = summary_text.strip()
-          if len(excerpt) > 600:
-              excerpt = excerpt[:600] + "…"
-          print("\nExcerpt from the summary (first ~600 chars):\n")
-          for line in excerpt.splitlines():
-              print(f"  {line}")
-    outputs:
-      output 0:
-        output_type: stream
-        name: stdout
-        text:
-          ======================================================================
-          WHAT COMPACTION COSTS: what the summary preserved vs. dropped
-          ======================================================================
-          Last compaction at turn 4 produced a ~2,783-token summary. The file content read before that point was distilled into this.
-          
-          Checking the summary text directly for six details: three HIGH-LEVEL facts (central to the task, likely summarized) and three OBSCURE specifics (appendix table cells, unlikely to make the cut).
-          
-          ── HIGH-LEVEL FACTS (expected to survive) ──
-            ✓ in summary       looking for '18' — What is the approximate median lifespan of C. elegans at 20°C?
-            ✓ in summary       looking for 'killifish' — Which model organism is the shortest-lived vertebrate used in aging research?
-            ✓ in summary       looking for '60' — Roughly what percentage of human disease genes have Drosophila orthologs?
-          
-          ── OBSCURE SPECIFICS (expected to be lost) ──
-            ✗ not in summary   looking for '61' — In the appendix Table A5, what is the I-squared heterogeneity value for the NAD+ precursor intervention?
-            ✗ not in summary   looking for '55' — In appendix Table A2, what was the effect magnitude for the IIS reduction intervention in cohort 2?
-            ✗ not in summary   looking for '0.72' — In appendix Table A7, what is the PhenoAge-like epigenetic clock acceleration ratio under DR for liver tissue?
-          
-          ======================================================================
-          RESULT: high-level 3/3 preserved, obscure 0/3 preserved
-          ======================================================================
-          
-          Excerpt from the summary (first ~600 chars):
-          
-            ## Task Overview
-            I am a biology research analyst writing a comparative review of model organisms for aging research. The task has three phases:
-            1. **BATCH 1 (DONE):** Read and record a note on four high-throughput models (C. elegans, Drosophila, yeast, killifish)
-            2. **BATCH 2 (IN PROGRESS):** Read four low-throughput models (mouse, zebrafish, NMR, rhesus) and record a single note — **FILES HAVE BEEN READ, NOTE NOT YET RECORDED**
-            3. **FINAL:** Write a comprehensive comparative synthesis contrasting both batches
-            
-            ---
-            
-            ## Current State
-            
-            ### Batch 1 — COMPLETE
-            - All four files read: `celegans_revi…
-  markdown cell:
-    source:
-      ### Analysis
-      
-      The baseline keeps climbing until it either hits a context-window limit (a hard stop on smaller windows) or accumulates enough tokens that context rot meaningfully degrades recall. Compaction addresses both: when context crosses the trigger, the older conversation is replaced by a model-generated summary and context drops sharply. The agent continues with a lean window instead of an ever-growing one.
-      
-      The probe above checks the summary text directly for a *mix* of details. The pattern that tends to emerge: high-level facts central to the task (lifespan figures the agent noted, organism identities, major comparisons) usually survive in the summary. Obscure specifics (a single cell in an appendix table, a heterogeneity statistic) usually don't. This is a meaningful difference from tool-result clearing: clearing drops tool results *wholesale* so the content is gone until re-fetched, while compaction keeps the *substance* in compressed form but loses verbatim detail.
-      
-      What compaction gets you is a general-purpose way to keep the window lean: it handles dialogue and tool results together, the important content survives in summarized form, and the agent keeps working under conditions where it would otherwise be cut off or swamped. What it doesn't get you is verbatim fidelity on specifics, or cross-session persistence. If your context bloat is mostly re-fetchable tool output, clearing is cheaper and lossless (the agent can just call the tool again). If it's dialogue and reasoning that can't be re-fetched, compaction is the right fit.
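One way to act on the clearing-versus-compaction guidance above is to measure how much of the message list is tool output. The sketch below is an illustration only: `tool_result_share`, `suggest_primitive`, and the 0.5 cutoff are hypothetical choices, and character counts stand in for tokens.

```python
import json


def tool_result_share(messages: list[dict]) -> float:
    """Fraction of content characters that sit inside tool_result blocks."""
    tool_chars = 0
    total_chars = 0
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, str):
            total_chars += len(content)
            continue
        for block in content or []:
            btype = block.get("type")
            if btype == "tool_result":
                body = block.get("content", "")
                body = body if isinstance(body, str) else json.dumps(body)
                tool_chars += len(body)
                total_chars += len(body)
            elif btype == "text":
                total_chars += len(block.get("text", ""))
            elif btype == "tool_use":
                total_chars += len(json.dumps(block.get("input", {})))
    return tool_chars / total_chars if total_chars else 0.0


def suggest_primitive(messages: list[dict], cutoff: float = 0.5) -> str:
    """Hypothetical rule of thumb; the cutoff is an arbitrary illustration."""
    return "clearing" if tool_result_share(messages) >= cutoff else "compaction"


sample = [
    {"role": "user", "content": "Read the review."},
    {
        "role": "assistant",
        "content": [{"type": "tool_use", "id": "t1", "name": "read_file", "input": {"path": "a.md"}}],
    },
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "t1", "content": "x" * 5000}]},
]
print(suggest_primitive(sample))
```

The share only says where the bulk sits. Whether those tool results are actually re-fetchable is a judgment about your tools that no measurement of the message list can make.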
-  markdown cell:
-    source:
-      ### Implementing compaction effectively
-      
-      The `instructions` parameter lets you replace the default summarization prompt entirely. The [compaction docs](https://platform.claude.com/docs/en/build-with-claude/compaction#custom-summarization-instructions) give the default prompt verbatim:
-      
-      > You have written a partial transcript for the initial task above. Please write a summary of the transcript. The purpose of this summary is to provide continuity so you can continue to make progress towards solving the task in a future context, where the raw history above may not be accessible and will be replaced with this summary. Write down anything that would be helpful, including the state, next steps, learnings etc. You must wrap your summary in a `<summary></summary>` block.
-      
-      This gives you a starting point. Note that custom `instructions` don't supplement this prompt; they replace it entirely. So if you provide your own, you're responsible for the full framing. The docs' example for a coding context is `"Focus on preserving code snippets, variable names, and technical decisions."`
-      
-      For this cookbook's research agent, you might write something that names the specific details the probe above showed are at risk of being lost:
-      
-      ```python
-      context_management={
-          "edits": [{
-              "type": "compact_20260112",
-              "trigger": {"type": "input_tokens", "value": 150_000},
-              "instructions": (
-                  "Summarize this research agent transcript. Preserve every "
-                  "quantitative lifespan figure and effect size with its "
-                  "source organism, and note which documents have been read "
-                  "and which remain. Wrap the summary in <summary></summary>."
-              ),
-          }]
-      }
-      ```
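Since custom instructions replace the default prompt, it is easy to drop part of the framing by accident. The sketch below wraps the config in a small builder that checks two things before a run. `build_compaction_edit` is a hypothetical helper, the 50K minimum is the trigger floor mentioned earlier in this notebook, and the check for `<summary>` wording mirrors the default prompt quoted above; treat it as a local convention, not a documented API requirement.

```python
from typing import Optional

MIN_TRIGGER_TOKENS = 50_000  # trigger minimum stated earlier in this notebook


def build_compaction_edit(trigger_tokens: int, instructions: Optional[str] = None) -> dict:
    """Build a context_management payload containing one compaction edit."""
    if trigger_tokens < MIN_TRIGGER_TOKENS:
        raise ValueError(f"trigger must be at least {MIN_TRIGGER_TOKENS:,} tokens")
    edit: dict = {
        "type": "compact_20260112",
        "trigger": {"type": "input_tokens", "value": trigger_tokens},
    }
    if instructions is not None:
        # Local convention (assumption): custom instructions should restate
        # the <summary> wrapping, because they replace the default prompt.
        if "<summary>" not in instructions:
            raise ValueError("custom instructions should ask for a <summary></summary> block")
        edit["instructions"] = instructions
    return {"edits": [edit]}


config = build_compaction_edit(
    150_000,
    "Preserve every quantitative lifespan figure with its source organism. "
    "Wrap the summary in <summary></summary>.",
)
print(config["edits"][0]["trigger"])
```

Omitting `instructions` leaves the key out of the payload entirely, so the default prompt applies.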
-  markdown cell:
-    source:
-      ---
-      
-      ## Tool-Result Clearing
-      
-      When an agent calls tools, each result gets appended to the conversation as a `tool_result` block ([context editing docs](https://platform.claude.com/docs/en/build-with-claude/context-editing)). Those blocks count toward the input-token budget on every subsequent turn, even after the agent has processed the content and moved on. For tools that are re-callable (file reads, API queries, search), carrying the verbatim result forward is often unnecessary; the agent can just call the tool again if it needs to.
-      
-      Clearing replaces old `tool_result` blocks with a short placeholder string. The `tool_use` block that preceded it stays, so the model retains a record that it made the call (and with what input), but the bulky response body is gone. This is the cheapest of the three primitives: no inference cost, just a mechanical edit to the message list.
-      
-      ### How it works under the hood
-      
-      To make the mechanism concrete, here's a minimal sample implementation of tool-result clearing. Our first-party API provides a robust, tested version of this (automatic triggering, correct block-pairing invariants, tool exclusions, and more), but seeing the ~15-line version makes the core operation tangible: walk the message list, find `tool_result` blocks, replace the content of all but the most recent few with a placeholder.
-  code cell:
-    source:
-      def demo_clear_tool_results(messages: list[dict], keep: int = 3) -> int:
-          """Replace the content of all but the last `keep` tool_result blocks.
-          Returns the number of blocks cleared."""
-          positions: list[tuple[int, int]] = []
-          for mi, msg in enumerate(messages):
-              if msg.get("role") != "user" or not isinstance(msg.get("content"), list):
-                  continue
-              for bi, block in enumerate(msg["content"]):
-                  if isinstance(block, dict) and block.get("type") == "tool_result":
-                      positions.append((mi, bi))
-      
-          to_clear = positions[:-keep] if keep > 0 else positions
-          for mi, bi in to_clear:
-              messages[mi]["content"][bi]["content"] = "[cleared to save context]"
-          return len(to_clear)
-  code cell:
-    source:
-      # Demonstrate on a small message list using the actual research corpus
-      sample_paths = list(CORPUS.keys())[:3]  # first three docs
-      
-      demo = [{"role": "user", "content": "Read three files and summarize."}]
-      for i, path in enumerate(sample_paths):
-          demo.append(
-              {
-                  "role": "assistant",
-                  "content": [
-                      {"type": "tool_use", "id": f"t{i}", "name": "read_file", "input": {"path": path}},
-                  ],
-              }
-          )
-          demo.append(
-              {
-                  "role": "user",
-                  "content": [
-                      {"type": "tool_result", "tool_use_id": f"t{i}", "content": CORPUS[path]},
-                  ],
-              }
-          )
-      
-      before_tokens = count_tokens(json.dumps(demo))
-      cleared = demo_clear_tool_results(demo, keep=1)
-      after_tokens = count_tokens(json.dumps(demo))
-      
-      print(f"Cleared {cleared} of 3 tool results (keep=1 leaves the most recent)")
-      print(
-          f"Message-list size: ~{before_tokens:,} → ~{after_tokens:,} tokens "
-          f"({(before_tokens - after_tokens) / before_tokens:.0%} reduction)"
-      )
-      print()
-      print("What each tool_result looks like now:")
-      for msg in demo:
-          if not isinstance(msg.get("content"), list):
-              continue
-          for block in msg["content"]:
-              if block.get("type") == "tool_use":
-                  print(f"\n  [tool_use]  read_file({block['input']['path']})")
-              elif block.get("type") == "tool_result":
-                  body = block["content"]
-                  if body == "[cleared to save context]":
-                      print(f"  [tool_result] {body!r}  (← cleared)")
-                  else:
-                      preview = body[:80].replace("\n", " ")
-                      print(f"  [tool_result] {preview!r}... (~{count_tokens(body):,} tokens retained)")
-    outputs:
-      output 0:
-        output_type: stream
-        name: stdout
-        text:
-          Cleared 2 of 3 tool results (keep=1 leaves the most recent)
-          Message-list size: ~128,740 → ~43,060 tokens (67% reduction)
-          
-          What each tool_result looks like now:
-          
-            [tool_use]  read_file(/research/celegans_review.md)
-            [tool_result] '[cleared to save context]'  (← cleared)
-          
-            [tool_use]  read_file(/research/drosophila_review.md)
-            [tool_result] '[cleared to save context]'  (← cleared)
-          
-            [tool_use]  read_file(/research/mouse_review.md)
-            [tool_result] '# Model Organism Review: Mus musculus in Aging Research  The laboratory mouse is'... (~41,155 tokens retained)
-  markdown cell:
-    source:
-      The sample above shows the mechanism. What's missing from it: token counting and automatic triggering, correct `tool_use`/`tool_result` pairing invariants, tool-specific exclusions, and awareness on the model side that clearing happened.
-      
-      ### Using the API
-      
-      Our API provides this natively as the `clear_tool_uses_20250919` context edit. It handles token counting and triggering server-side, preserves block pairing, and lets you exempt specific tools from clearing (useful when the memory tool is also active, as we'll see later). When clearing fires, the response includes `context_management.applied_edits` with details on how many tool uses were cleared and how many tokens were freed.
-      
-      > There's also `clear_thinking_20251015` for extended-thinking blocks. Same config shape, different `type`. It must be the first entry in the `edits` array if you're using both.
-      
-      **API Documentation:** [Context editing — platform.claude.com](https://platform.claude.com/docs/en/build-with-claude/context-editing)
-      
-      Here's the research agent running with clearing enabled. The baseline's context climbed with every file read; clearing keeps this run bounded by dropping old tool results whenever context climbs past the trigger. Watch for `✂ CLEARING` lines in the output; dashed vertical lines on the plot mark each firing.
-  code cell:
-    source:
-      CLEARING_KEEP = 4  # How many most-recent tool results survive each clearing
-      
-      clearing_run = run_research_session(
-          RESEARCH_TASK,
-          betas=["context-management-2025-06-27"],
-          context_management={
-              "edits": [
-                  {
-                      "type": "clear_tool_uses_20250919",
-                      "trigger": {"type": "input_tokens", "value": 30_000},
-                      "keep": {"type": "tool_uses", "value": CLEARING_KEEP},
-                      "clear_at_least": {"type": "input_tokens", "value": 10_000},
-                  }
-              ]
-          },
-          label="clearing",
-          max_turns=12,
-      )
-      
-      plot_trajectories(
-          {"baseline": baseline, "clearing": clearing_run},
-          "Clearing: stays bounded while the baseline climbs",
-          triggers={"clearing": 30_000},
-      )
-      
-      # Show outcome comparison
-      baseline_turns = len(baseline.token_trajectory)
-      clearing_turns = len(clearing_run.token_trajectory)
-      print(
-          f"Baseline:  {'HIT LIMIT at' if baseline.hit_limit else 'completed'} "
-          f"turn {baseline_turns}, peak {max(t for _, t in baseline.token_trajectory):,} tokens"
-      )
-      print(
-          f"Clearing:  {'HIT LIMIT at' if clearing_run.hit_limit else 'completed'} "
-          f"turn {clearing_turns}, peak {max(t for _, t in clearing_run.token_trajectory):,} tokens"
-      )
-      
-      baseline_peak = max(t for _, t in baseline.token_trajectory)
-      clearing_peak = max(t for _, t in clearing_run.token_trajectory)
-      print(
-          f"\n→ Baseline climbed to {baseline_peak:,} tokens; clearing kept "
-          f"the peak at {clearing_peak:,}. On a smaller context window the "
-          f"baseline would have been cut off; here both completed, but the "
-          f"baseline was attending to far more on every turn."
-      )
-      
-      print(f"\nClearing events: {len([e for e in clearing_run.events if e['kind'] == 'clearing'])}")
-      print(
-          f"File reads: baseline={baseline.tool_counts.get('read_file', 0)}, "
-          f"clearing={clearing_run.tool_counts.get('read_file', 0)}"
-      )
-    outputs:
-      output 0:
-        output_type: stream
-        name: stdout
-        text:
-          ┌─ [clearing]
-          │  turn  1  ctx=  1,058
-          │            read_file        /research/celegans_review.md  → ~27,465 tok
-          │            read_file        /research/drosophila_review.md  → ~27,435 tok
-          │            read_file        /research/yeast_review.md  → ~27,320 tok
-          │            read_file        /research/killifish_review.md  → ~27,349 tok
-          │  turn  2  ctx=166,043
-          │            record_finding   "## BATCH 1 NOTES: High-Throughput Model Organisms ..."
-          │  turn  3  ctx=169,211
-          │            read_file        /research/mouse_review.md  → ~27,390 tok
-          │            read_file        /research/zebrafish_review.md  → ~27,363 tok
-          │            read_file        /research/nmr_review.md  → ~27,287 tok
-          │            read_file        /research/rhesus_review.md  → ~27,285 tok
-          │  ✂ CLEARING (turn 4): 4 tool results cleared, ~163,817 tokens freed
-          │  turn  4  ctx=169,174
-          │            record_finding   ""
-          │  ✂ CLEARING (turn 5): 4 tool results cleared, ~163,811 tokens freed
-          │  turn  5  ctx=169,255
-          │            record_finding   ""
-          │  ✂ CLEARING (turn 6): 4 tool results cleared, ~163,807 tokens freed
-          │  turn  6  ctx=169,313
-          │            record_finding   "## BATCH 2 NOTES: Low-Throughput Model Organisms (..."
-          │  ✂ CLEARING (turn 7): 4 tool results cleared, ~162,930 tokens freed
-          │  turn  7  ctx=173,137  (final answer)
-          └─ completed: 7 turns, peak ctx 173,137, final ctx 173,137, 4 context event(s)
-          
-      output 1:
-        output_type: display_data
-        data:
-          image/png: iVBORw0K...<snip base64, md5=d8b9b7a6cc31f234...>
-          text/plain: <Figure size 1000x450 with 1 Axes>
-      output 2:
-        output_type: stream
-        name: stdout
-        text:
-          Baseline:  completed turn 5, peak 335,279 tokens
-          Clearing:  completed turn 7, peak 173,137 tokens
-          
-          → Baseline climbed to 335,279 tokens; clearing kept the peak at 173,137. On a smaller context window the baseline would have been cut off; here both completed, but the baseline was attending to far more on every turn.
-          
-          Clearing events: 4
-          File reads: baseline=8, clearing=8
-  code cell:
-    source:
-      # What's lost: which file reads are no longer in the clearing run's context?
-      print("=" * 60)
-      print("WHAT CLEARING COSTS: reads no longer in context")
-      print("=" * 60)
-      show_cleared_reads(clearing_run, keep=CLEARING_KEEP)
-      print(
-          "\nThe cleared reads above are gone from the conversation. If the "
-          "agent needs that content again, it must call read_file again; the "
-          "information is re-fetchable, but the original read is no longer "
-          "visible in context."
-      )
-    outputs:
-      output 0:
-        output_type: stream
-        name: stdout
-        text:
-          ============================================================
-          WHAT CLEARING COSTS: reads no longer in context
-          ============================================================
-          Total file reads across session: 8
-          Last clearing event fired at turn 7 (keep=4)
-          
-          Reads cleared from context: 7
-            ✗ turn  1: /research/celegans_review.md
-            ✗ turn  1: /research/drosophila_review.md
-            ✗ turn  1: /research/yeast_review.md
-            ✗ turn  1: /research/killifish_review.md
-            ✗ turn  3: /research/mouse_review.md
-            ✗ turn  3: /research/zebrafish_review.md
-            ✗ turn  3: /research/nmr_review.md
-          
-          Reads still in context (within the keep=4 window or after the last clearing): 1
-            ✓ turn  3: /research/rhesus_review.md
-          
-          The cleared reads above are gone from the conversation. If the agent needs that content again, it must call read_file again; the information is re-fetchable, but the original read is no longer visible in context.
-  markdown cell:
-    source:
-      ### Analysis
-      
-      The baseline keeps climbing; the clearing run stays bounded. Once context is past the trigger (30K here) and there are more than `keep` tool uses on record, clearing fires server-side: tool results older than the most recent `keep` are replaced with placeholders and context drops back down. The dashed lines on the plot mark each firing. That bounded window means the run doesn't hit a hard limit on smaller models, and it doesn't accumulate into the range where context rot degrades recall.
-      
-      The second cell above shows what this costs. Every file read except the most recent few is gone from context. When the agent reaches the synthesis phase, it has two options. It can work from its own notes plus whatever recent reads survived the last clearing: if the notes were thorough, this is fine; if they were sparse, the synthesis misses details the agent saw but didn't record. Or it can re-fetch cleared content by calling `read_file` again: the clearing run may show more file reads than the baseline for the same documents, because some reads were cleared before the agent was done with them. How much the second path costs depends on your tools: re-reading a local file is nearly free, but re-calling a rate-limited or slow API is not. Tuning `keep` and `trigger` shifts where the agent lands between these two.
-      
-      What clearing gets you is a bounded window at no inference cost, avoiding both the hard-limit cutoff and the recall degradation that comes with a large accumulated context. What it doesn't get you is any help with content that isn't a tool result (the agent's own reasoning, user messages) or any persistence across sessions.
-  markdown cell:
-    source:
-      ### Implementing clearing effectively
-      
-      Unlike compaction and memory, clearing has no prompt to tune, and the knobs are all numeric (`trigger`, `keep`, `clear_at_least`) or list-based (`exclude_tools`). One trade-off to understand: clearing invalidates cached prompt prefixes. To account for this, clear enough tokens to make the cache invalidation worthwhile; the `clear_at_least` parameter ensures a minimum number of tokens is cleared each time. You'll incur cache write costs each time clearing fires, but subsequent requests can reuse the newly cached prefix.
-      
-      The right values for `trigger` and `keep` depend on how your agent uses tool results: how large they are, how often the agent revisits them, whether re-fetching is cheap. The clearing run above used trigger=30K and keep=4; the all-three run later uses a higher trigger and keep=6 so clearing and compaction split the work. Test a few configurations against your own agent's workload: the `context_management.applied_edits` field in each response shows how many tool uses and tokens were cleared, which makes the effect of each config directly observable.
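One way to organise that kind of sweep is a small builder that produces one `context_management` payload per candidate configuration. This is a sketch only: `build_clearing_edit` is a hypothetical helper, the `trigger`, `keep`, and `clear_at_least` shapes are copied from the clearing run above, and the assumption that `exclude_tools` takes a list of tool-name strings should be checked against the context editing docs before use.

```python
def build_clearing_edit(trigger_tokens, keep_tool_uses,
                        clear_at_least_tokens=None, exclude_tools=None):
    """Build one clear_tool_uses edit (hypothetical helper, not part of any SDK)."""
    if trigger_tokens <= 0:
        raise ValueError("trigger_tokens must be positive")
    if keep_tool_uses < 0:
        raise ValueError("keep_tool_uses must be non-negative")
    edit = {
        "type": "clear_tool_uses_20250919",
        "trigger": {"type": "input_tokens", "value": trigger_tokens},
        "keep": {"type": "tool_uses", "value": keep_tool_uses},
    }
    if clear_at_least_tokens is not None:
        edit["clear_at_least"] = {"type": "input_tokens", "value": clear_at_least_tokens}
    if exclude_tools:
        # Assumed shape: a list of tool names whose results are never cleared.
        edit["exclude_tools"] = list(exclude_tools)
    return edit


# A few candidate configurations to compare against the same workload.
candidate_configs = {
    f"trigger={trigger // 1000}K/keep={keep}": {
        "edits": [build_clearing_edit(trigger, keep, clear_at_least_tokens=10_000)]
    }
    for trigger in (30_000, 60_000)
    for keep in (2, 4)
}

for label, config in candidate_configs.items():
    print(label, config["edits"][0]["keep"])
```

Each payload can then be passed as `context_management=` to the same task, with the reported clearing events and peak context compared across runs.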
-  markdown cell:
-    source:
-      ---
-      
-      ## Memory Tool
-      
-      The [memory tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool) enables Claude to store and retrieve information across conversations through a memory file directory. Claude can create, read, update, and delete files that persist between sessions, allowing it to build knowledge over time without keeping everything in the context window.
-      
-      This is the key primitive for just-in-time context retrieval: rather than loading all relevant information upfront, agents store what they learn in memory and pull it back on demand. This keeps the active context focused on what's currently relevant, which is critical for long-running workflows where loading everything at once would overwhelm the window. Clearing and compaction both operate on the current context; neither helps when a new session starts and the window is empty. Memory solves that problem.
-      
-      The memory tool operates client-side: Claude makes tool calls to perform memory operations, and your application executes those operations locally. This gives you complete control over where and how the data is stored. The API provides the tool protocol and auto-injects a system prompt establishing the memory-checking behavior; you implement the storage backend.
-      
-      ### How it works under the hood
-      
-      Here's a minimal sample implementation: a key-value store you write to after a session and read from before the next one. Our first-party API provides the robust version (the model decides what and when to save as part of its reasoning, full file operations, auto-injected protocol prompt), but this ~10-line version makes the core pattern concrete.
-  code cell:
-    source:
-      class SimpleMemoryStore:
-          def __init__(self):
-              self._store: dict[str, str] = {}
-      
-          def save(self, key: str, value: str) -> None:
-              self._store[key] = value
-      
-          def load(self, key: str) -> str | None:
-              return self._store.get(key)
-      
-          def keys(self) -> list[str]:
-              return list(self._store)
-      
-      
-      # Demonstrate the pattern manually
-      store = SimpleMemoryStore()
-      store.save(
-          "lifespans",
-          "C. elegans: ~18 days. Mouse: ~24-30 months. Killifish: 4-6 months (shortest vertebrate).",
-      )
-      store.save(
-          "tractability",
-          "Worm has genome-wide RNAi by feeding. Mouse has Cre-lox conditionals. Killifish toolkit still thin.",
-      )
-      
-      # In "session 2", you'd inject these into the first prompt:
-      preamble = "\n".join(f"- {k}: {store.load(k)}" for k in store.keys())
-      print(f"Session-2 opening prompt would include:\n\nPrior research notes:\n{preamble}")
-    outputs:
-      output 0:
-        output_type: stream
-        name: stdout
-        text:
-          Session-2 opening prompt would include:
-          
-          Prior research notes:
-          - lifespans: C. elegans: ~18 days. Mouse: ~24-30 months. Killifish: 4-6 months (shortest vertebrate).
-          - tractability: Worm has genome-wide RNAi by feeding. Mouse has Cre-lox conditionals. Killifish toolkit still thin.
-  markdown cell:
-    source:
-      The sample above shows the pattern, but it puts you in charge of deciding what to save and when to load it. That's exactly the work the model is better positioned to do: it knows, mid-reasoning, what facts matter and when it needs to recall them.
-      
-      ### Using the API
-      
-      Our API provides this natively as the `memory_20250818` tool. The model decides what to save and when as part of its tool-use loop, an auto-injected system prompt establishes the protocol ("always view your memory directory before doing anything else"), and the tool offers full file operations rather than a key-value interface. This is a client-side tool: the API provides the protocol, you implement the file backend.
-      
-      **API Documentation:** [Memory tool — platform.claude.com](https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool)
-      
-      Here's a complete handler implementing all six commands.
-  code cell:
-    source:
-      class MemoryToolHandler:
-          """Client-side backend for the memory_20250818 tool.
-          Implements view, create, str_replace, insert, delete, rename.
-          File-backed (tempdir) so writes persist across notebook cells.
-          """
-      
-          def __init__(self, base_dir: str | None = None):
-              self.base = Path(base_dir or tempfile.mkdtemp(prefix="memory_"))
-              self.base.mkdir(parents=True, exist_ok=True)
-      
-          def _resolve(self, path: str) -> Path:
-              rel = path.removeprefix("/memories").removeprefix("/") or "."
-              resolved = (self.base / rel).resolve()
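-              # Compare path components, not string prefixes, so a sibling such as
-              # <base>_other cannot pass as being inside <base>.
-              if not resolved.is_relative_to(self.base.resolve()):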
-                  raise ValueError(f"path escapes memory directory: {path}")
-              return resolved
-      
-          def handle(self, tool_input: dict) -> str:
-              cmd = tool_input.get("command")
-              try:
-                  if cmd == "view":
-                      return self._view(tool_input.get("path", "/memories"))
-                  if cmd == "create":
-                      return self._create(tool_input["path"], tool_input["file_text"])
-                  if cmd == "str_replace":
-                      return self._str_replace(
-                          tool_input["path"], tool_input["old_str"], tool_input["new_str"]
-                      )
-                  if cmd == "insert":
-                      return self._insert(
-                          tool_input["path"], tool_input["insert_line"], tool_input["insert_text"]
-                      )
-                  if cmd == "delete":
-                      return self._delete(tool_input["path"])
-                  if cmd == "rename":
-                      return self._rename(tool_input["old_path"], tool_input["new_path"])
-                  return f"Error: unknown command '{cmd}'"
-              except Exception as e:
-                  return f"Error: {e}"
-      
-          def _view(self, path: str) -> str:
-              p = self._resolve(path)
-              if not p.exists():
-                  return (
-                      "Directory /memories is empty."
-                      if path in ("/memories", "/memories/")
-                      else f"Error: '{path}' does not exist."
-                  )
-              if p.is_dir():
-                  entries = sorted(p.iterdir())
-                  if not entries:
-                      return "(empty directory)"
-                  return "\n".join((f"{e.name}/" if e.is_dir() else e.name) for e in entries)
-              content = p.read_text()
-              lines = content.split("\n")
-              return "\n".join(f"{i + 1:4d}\t{line}" for i, line in enumerate(lines))
-      
-          def _create(self, path: str, file_text: str) -> str:
-              p = self._resolve(path)
-              p.parent.mkdir(parents=True, exist_ok=True)
-              p.write_text(file_text)
-              return f"Created {path}"
-      
-          def _str_replace(self, path: str, old_str: str, new_str: str) -> str:
-              p = self._resolve(path)
-              content = p.read_text()
-              if old_str not in content:
-                  return f"Error: old_str not found in {path}"
-              if content.count(old_str) > 1:
-                  return f"Error: old_str appears {content.count(old_str)} times; be more specific."
-              p.write_text(content.replace(old_str, new_str, 1))
-              return f"Replaced in {path}"
-      
-          def _insert(self, path: str, insert_line: int, insert_text: str) -> str:
-              p = self._resolve(path)
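-              lines = p.read_text().split("\n")
-              # list.insert silently clamps out-of-range indexes, so reject them explicitly
-              if not 0 <= insert_line <= len(lines):
-                  return f"Error: insert_line {insert_line} is out of range (0-{len(lines)})."
-              lines.insert(insert_line, insert_text)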
-              p.write_text("\n".join(lines))
-              return f"Inserted at line {insert_line} in {path}"
-      
-          def _delete(self, path: str) -> str:
-              p = self._resolve(path)
-              if p.is_dir():
-                  import shutil
-      
-                  shutil.rmtree(p)
-              else:
-                  p.unlink()
-              return f"Deleted {path}"
-      
-          def _rename(self, old_path: str, new_path: str) -> str:
-              old_p, new_p = self._resolve(old_path), self._resolve(new_path)
-              new_p.parent.mkdir(parents=True, exist_ok=True)
-              old_p.rename(new_p)
-              return f"Renamed {old_path} → {new_path}"
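When writing a containment check for a file-backed handler like this one, it helps to see how a string-prefix comparison and a component-wise comparison differ. The sketch below is standalone and illustrative: the function names and the `/tmp/memory_abc` paths are made up, and none of the paths need to exist because `Path.resolve()` is non-strict by default.

```python
from pathlib import Path


def inside_by_prefix(base: Path, candidate: Path) -> bool:
    """Containment test that compares the resolved paths as plain strings."""
    return str(candidate.resolve()).startswith(str(base.resolve()))


def inside_by_components(base: Path, candidate: Path) -> bool:
    """Containment test that compares resolved path components (Python 3.9+)."""
    return candidate.resolve().is_relative_to(base.resolve())


base = Path("/tmp/memory_abc")
child = Path("/tmp/memory_abc/notes.md")          # genuinely inside base
escape = Path("/tmp/memory_abc/../secrets.txt")   # classic ../ traversal
sibling = Path("/tmp/memory_abc_other/notes.md")  # shares a string prefix only

for name, path in [("child", child), ("escape", escape), ("sibling", sibling)]:
    print(f"{name:8s} prefix={inside_by_prefix(base, path)!s:5s} "
          f"components={inside_by_components(base, path)}")
```

Both checks accept the child and reject the `../` escape. Only the component-wise check rejects the sibling directory, whose name merely begins with the base directory's name.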
-  markdown cell:
-    source:
-      > Security note: the `_resolve` method guards against path traversal (`../../etc/passwd`). In production you'd also want to cap file sizes and total directory size. See the [Memory Cookbook](https://platform.claude.com/cookbook/tool-use-memory-cookbook) for deeper memory patterns.
-      
-      ### Demonstrating the benefit
-      
-      To see the effect concretely, we run the agent across three sessions:
-      
-      1. **Session 1** does the initial research pass and writes its findings to `/memories`.
-      2. **Session 2 (without memory)** runs a follow-up task with an empty memory directory. It has to rediscover everything from scratch.
-      3. **Session 2 (with memory)** runs the same follow-up task but with access to Session 1's saved files. It reads those first and builds on them instead of re-researching.
-      
-      The comparison between the two Session 2 runs is where the memory benefit becomes visible.
-  code cell:
-    source:
-      # Session 1: initial research pass, memory tool enabled
-      memory = MemoryToolHandler()
-      # Focused task for S1: read a handful of docs, save the comparative findings
-      S1_TASK = """Compare four model organisms for aging research: C. elegans, mouse,
-      killifish, and yeast. For each one, summarize lifespan, genetic tractability,
-      and translational relevance. Read the review documents and save your
-      comparative findings to /memories so a follow-up session can build on them."""
-      
-      memory_s1 = run_research_session(
-          S1_TASK,
-          memory_handler=memory,
-          max_turns=15,
-          label="memory/s1",
-      )
-      
-      print("=" * 60)
-      print("What Session 1 wrote to /memories:")
-      print("=" * 60)
-      for f in sorted(memory.base.iterdir()):
-          content = f.read_text()
-          print(f"\n▸ {f.name} (~{count_tokens(content):,} tokens)")
-          for line in content.split("\n")[:10]:
-              print(f"  {line}")
-          if len(content.split("\n")) > 10:
-              print("  ...")
-    outputs:
-      output 0:
-        output_type: stream
-        name: stdout
-        text:
-          ┌─ [memory/s1]
-          │  turn  1  ctx=  1,941
-          │            memory           view /memories
-          │  turn  2  ctx=  2,025
-          │            search_files     'C. elegans aging lifespan'
-          │            search_files     'mouse aging model organism'
-          │            search_files     'killifish aging'
-          │            search_files     'yeast aging replicative chronological'
-          │  turn  3  ctx=  2,386
-          │            search_files     'aging model organism review'
-          │            search_files     'lifespan genetics'
-          │  turn  4  ctx=  2,576
-          │            search_files     'review'  → ~205 tok
-          │            search_files     'organism'  → ~205 tok
-          │  turn  5  ctx=  3,296
-          │            read_file        /research/celegans_review.md  → ~27,465 tok
-          │            read_file        /research/mouse_review.md  → ~27,390 tok
-          │            read_file        /research/killifish_review.md  → ~27,349 tok
-          │            read_file        /research/yeast_review.md  → ~27,320 tok
-          │  turn  6  ctx=168,218
-          │            memory           create /memories/aging_model_organisms_comparison.md
-          │  turn  7  ctx=171,354
-          │            record_finding   "Four aging model organism reviews fully read and s..."
-          │  turn  8  ctx=171,935  (final answer)
-          └─ completed: 8 turns, peak ctx 171,935, final ctx 171,935, 0 context event(s)
-          
-          ============================================================
-          What Session 1 wrote to /memories:
-          ============================================================
-          
-          ▸ aging_model_organisms_comparison.md (~2,999 tokens)
-            # Comparative Notes: Four Model Organisms in Aging Research
-            ## Source files read: /research/celegans_review.md, mouse_review.md, killifish_review.md, yeast_review.md
-            ## Session status: COMPLETE — all four reviews read, notes synthesised
-            
-            ---
-            
-            ## 1. C. elegans (Caenorhabditis elegans)
-            
-            ### Lifespan
-            - Wild-type median: ~18 days at 20°C (N2 strain)
-            ...
-  code cell:
-    source:
-      # Session 2 WITHOUT memory: follow-up task with empty /memories
-      # This shows what happens when the agent has no prior knowledge to draw on.
-      S2_TASK = (
-          "Write a final synthesis comparing all eight model organisms for "
-          "aging research (C. elegans, Drosophila, yeast, killifish, mouse, "
-          "zebrafish, naked mole-rat, rhesus), focusing on which organisms "
-          "are strongest for which kinds of questions. Use information from "
-          "/memories for organisms already covered there; read source "
-          "documents for any organism not yet captured."
-      )
-      
-      empty_memory = MemoryToolHandler()  # fresh, empty /memories
-      memory_s2_without = run_research_session(
-          S2_TASK,
-          memory_handler=empty_memory,
-          max_turns=12,
-          label="memory/s2-without",
-      )
-    outputs:
-      output 0:
-        output_type: stream
-        name: stdout
-        text:
-          ┌─ [memory/s2-without]
-          │  turn  1  ctx=  1,952
-          │            memory           view /memories
-          │  turn  2  ctx=  2,056
-          │            search_files     'C. elegans aging model organism'
-          │            search_files     'Drosophila aging model organism'
-          │            search_files     'yeast aging model organism'
-          │            search_files     'killifish aging model organism'
-          │  turn  3  ctx=  2,415
-          │            search_files     'aging model organism lifespan'
-          │            search_files     'mouse aging research'
-          │            search_files     'naked mole-rat longevity'
-          │            search_files     'rhesus macaque aging'
-          │  turn  4  ctx=  2,755
-          │            search_files     'model organism'  → ~205 tok
-          │            search_files     'aging research review'
-          │            search_files     'lifespan genetics'
-          │  turn  5  ctx=  3,277
-          │            read_file        /research/celegans_review.md  → ~27,465 tok
-          │            read_file        /research/drosophila_review.md  → ~27,435 tok
-          │            read_file        /research/yeast_review.md  → ~27,320 tok
-          │            read_file        /research/killifish_review.md  → ~27,349 tok
-          │  turn  6  ctx=168,250
-          │            read_file        /research/mouse_review.md  → ~27,390 tok
-          │            read_file        /research/zebrafish_review.md  → ~27,363 tok
-          │            read_file        /research/nmr_review.md  → ~27,287 tok
-          │            read_file        /research/rhesus_review.md  → ~27,285 tok
-          │  turn  7  ctx=332,824
-          │            memory           create /memories/organism_notes.md
-          │  turn  8  ctx=333,872
-          │            record_finding   "All 8 organism review documents read and key facts..."
-          │  turn  9  ctx=333,977  (final answer)
-          └─ completed: 9 turns, peak ctx 333,977, final ctx 333,977, 0 context event(s)
-          
-  code cell:
-    source:
-      # Session 2 WITH memory: same follow-up task, but with access to Session 1's files
-      # The agent should read /memories first and build on it instead of re-researching.
-      memory_s2 = run_research_session(
-          S2_TASK,
-          memory_handler=memory,  # SAME handler as Session 1
-          max_turns=12,
-          label="memory/s2-with",
-      )
-      
-      # Show what Session 2 read from /memories in its opening turns
-      print("=" * 60)
-      print("What Session 2 (with memory) read from /memories:")
-      print("=" * 60)
-      for msg_i, msg in enumerate(memory_s2.messages[:8]):
-          # Look for memory tool calls in assistant turns
-          if msg["role"] == "assistant" and isinstance(msg["content"], list):
-              for block in msg["content"]:
-                  if block.get("type") == "tool_use" and block.get("name") == "memory":
-                      inp = block.get("input", {})
-                      print(f"\n[memory {inp.get('command', '?')}] {inp.get('path', '')}")
-          # Look for the corresponding tool results (what was read)
-          if msg["role"] == "user" and isinstance(msg["content"], list) and msg_i <= 6:
-              for block in msg["content"]:
-                  if block.get("type") == "tool_result":
-                      result = block.get("content", "")
-                      if isinstance(result, str) and len(result) > 20:
-                          # Show a preview of what was read
-                          preview = result[:350].replace("\n", "\n  ")
-                          print(f"  └ {preview}{'...' if len(result) > 350 else ''}")
-    outputs:
-      output 0:
-        output_type: stream
-        name: stdout
-        text:
-          ┌─ [memory/s2-with]
-          │  turn  1  ctx=  1,952
-          │            memory           view /memories
-          │  turn  2  ctx=  2,043
-          │            memory           view /memories/aging_model_organisms_comparison.md  → ~2,707 tok
-          │  turn  3  ctx=  5,737
-          │            read_file        /research/drosophila_review.md  → ~27,435 tok
-          │            read_file        /research/zebrafish_review.md  → ~27,363 tok
-          │            read_file        /research/nmr_review.md  → ~27,287 tok
-          │            read_file        /research/rhesus_review.md  → ~27,285 tok
-          │  turn  4  ctx=170,401
-          │            memory           str_replace /memories/aging_model_organisms_comparison.md
-          │  turn  5  ctx=172,415
-          │            record_finding   "All eight model organism reviews fully read and sy..."
-          │  turn  6  ctx=172,623  (final answer)
-          └─ completed: 6 turns, peak ctx 172,623, final ctx 172,623, 0 context event(s)
-          
-          ============================================================
-          What Session 2 (with memory) read from /memories:
-          ============================================================
-          
-          [memory view] /memories
-            └ aging_model_organisms_comparison.md
-          
-          [memory view] /memories/aging_model_organisms_comparison.md
-            └    1	# Comparative Notes: Four Model Organisms in Aging Research
-               2	## Source files read: /research/celegans_review.md, mouse_review.md, killifish_review.md, yeast_review.md
-               3	## Session status: COMPLETE — all four reviews read, notes synthesised
-               4	
-               5	---
-               6	
-               7	## 1. C. elegans (Caenorhabditis elegans)
-               8	
-               9	### Lifespan
-              10	-...
-            └ # Model Organism Review: Drosophila melanogaster in Aging Research
-            
-            Drosophila melanogaster, the common fruit fly, has been a genetic model for
-            over a century and an aging model since Pearl's demographic work in the
-            1920s. It occupies a useful middle ground: more complex than C. elegans
-            (tissue diversity, a functional heart, behavioral repertoire) ...
-            └ # Model Organism Review: Danio rerio (Zebrafish) in Aging Research
-            
-            Zebrafish are a vertebrate model with particular strengths in developmental
-            biology and regeneration that have been increasingly adopted for aging
-            research over the past decade. Their transparent larvae and high fecundity
-            make them competitive with invertebrates for some screening ...
-            └ # Model Organism Review: Heterocephalus glaber (Naked Mole-Rat)
-            
-            The naked mole-rat is an eusocial rodent from East African burrows with a
-            maximum lifespan exceeding 30 years, roughly ten times that of a similarly
-            sized mouse. It is studied as a model of exceptional longevity rather than
-            as a general aging model: the question is what makes it long-...
-            └ # Model Organism Review: Macaca mulatta (Rhesus Macaque)
-            
-            Rhesus macaques are the primary non-human primate model for aging research.
-            With a maximum lifespan of ~40 years and physiology closely mirroring human,
-            they represent the closest experimentally accessible approximation to human
-            aging, at correspondingly high cost.
-            
-            ## Lifespan and throughpu...
-          
-          [memory str_replace] /memories/aging_model_organisms_comparison.md
-  code cell:
-    source:
-      # Compare: the bar chart shows final context and file reads side by side.
-      plot_summary_bars(
-          {
-              "S2 without memory": memory_s2_without,
-              "S2 with memory": memory_s2,
-          },
-          "Memory benefit: Session 2 with vs. without Session 1's saved findings",
-      )
-      
-      s2_with_reads = memory_s2.tool_counts.get("read_file", 0)
-      s2_without_reads = memory_s2_without.tool_counts.get("read_file", 0)
-      s2_with_mem = memory_s2.tool_counts.get("memory", 0)
-      s2_without_mem = memory_s2_without.tool_counts.get("memory", 0)
-      
-      print("Session 2 comparison:")
-      print(
-          f"  Without memory: {s2_without_reads} file reads, {s2_without_mem} memory ops, "
-          f"peak ctx {max(t for _, t in memory_s2_without.token_trajectory):,}"
-      )
-      print(
-          f"  With memory:    {s2_with_reads} file reads, {s2_with_mem} memory ops, "
-          f"peak ctx {max(t for _, t in memory_s2.token_trajectory):,}"
-      )
-      
-      # Check the zero-read case first; otherwise the general "fewer reads"
-      # branch shadows it whenever the no-memory run read at least one file.
-      if s2_with_reads == 0 and s2_with_mem > 0:
-          print(
-              "\n→ Session 2 with memory completed using only /memories; "
-              "no source-document re-reads needed."
-          )
-      elif s2_with_reads < s2_without_reads:
-          saved = s2_without_reads - s2_with_reads
-          print(
-              f"\n→ Memory saved {saved} file read(s): the agent pulled Session 1's "
-              "findings from /memories instead of re-reading the source documents."
-          )
-      else:
-          print(
-              "\n→ Read counts are similar this run. The benefit depends on how "
-              "comprehensive Session 1's notes were and how the agent chose to use them."
-          )
-      
-      print(
-          "\nThis comparison shows memory working well because Session 1's notes "
-          "were comprehensive. If Session 1 had saved sparse or poorly organized "
-          "notes, Session 2 would fall back to re-reading source documents. "
-          "Memory's value depends on the agent's judgment about what to write."
-      )
-    outputs:
-      output 0:
-        output_type: display_data
-        data:
-          image/png: iVBORw0K...<snip base64, md5=dbd91dc3af629570...>
-          text/plain: <Figure size 900x350 with 2 Axes>
-      output 1:
-        output_type: stream
-        name: stdout
-        text:
-          Session 2 comparison:
-            Without memory: 8 file reads, 2 memory ops, peak ctx 333,977
-            With memory:    4 file reads, 3 memory ops, peak ctx 172,623
-          
-          → Memory saved 4 file read(s): the agent pulled Session 1's findings from /memories instead of re-reading the source documents.
-          
-          This comparison shows memory working well because Session 1's notes were comprehensive. If Session 1 had saved sparse or poorly organized notes, Session 2 would fall back to re-reading source documents. Memory's value depends on the agent's judgment about what to write.
-  markdown cell:
-    source:
-      ### Analysis
-      
-      The comparison makes the benefit concrete. Session 2 without memory has nothing to draw on; `/memories` is empty, so it has to go back to the source documents to rediscover the same facts. Session 2 with memory opens by reading `/memories` (the auto-injected protocol makes this a default first move), finds Session 1's saved findings, and can build a synthesis from those instead of re-reading every source document.
-      
-      This is just-in-time retrieval in practice: rather than loading all prior knowledge into the first prompt, the agent pulls the relevant pieces from memory on demand. The file-read counts and final context in the bar chart quantify the difference directly.
-      
-      What memory gets you is cross-session persistence with lossless fidelity on whatever the agent chose to save. What it doesn't give you is help with in-session context growth (Session 1's peak context is still high), and it adds tool-call overhead for every read and write. Memory solves the cross-session problem; clearing and compaction solve the in-session one.
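
To put numbers on the comparison, the short calculation below takes the file-read counts and peak-context figures printed in the Session 2 comparison output and expresses them as relative reductions. The helper name `relative_reduction` is illustrative and not part of the notebook; the figures apply to this one run only.

```python
# Figures copied from the "Session 2 comparison" output of this run.
without_memory = {"file_reads": 8, "peak_ctx": 333_977}
with_memory = {"file_reads": 4, "peak_ctx": 172_623}


def relative_reduction(before: float, after: float) -> float:
    """Return the fraction by which `after` is smaller than `before`."""
    return (before - after) / before


read_reduction = relative_reduction(
    without_memory["file_reads"], with_memory["file_reads"]
)
ctx_reduction = relative_reduction(
    without_memory["peak_ctx"], with_memory["peak_ctx"]
)

print(f"File reads avoided:     {read_reduction:.0%}")
print(f"Peak context reduction: {ctx_reduction:.1%}")
```

Both figures come from a single pair of runs, so they are an illustration of the effect and not a benchmark.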
-  markdown cell:
-    source:
-      ### Implementing memory effectively
-      
-      The `memory_20250818` tool auto-injects a system prompt establishing a check-memory-first protocol and an assume-interruption mindset ("ALWAYS VIEW YOUR MEMORY DIRECTORY BEFORE DOING ANYTHING ELSE... Your context window might be reset at any moment"). This handles the basic mechanics. Beyond that, the [memory tool docs](https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool#prompting-guidance) describe several ways to shape what the model saves:
-      
-      **Topical guidance.** You can steer what gets written with a simple system-prompt instruction: `"Only write down information relevant to <topic> in your memory system."` For this cookbook's research agent, that might be "save comparative findings and key figures, not raw document contents."
-      
-      **Keeping `/memories` organized.** If you observe the model creating cluttered memory files, try adding: `"when editing your memory folder, always try to keep its content up-to-date, coherent and organized. You can rename or delete files that are no longer relevant. Do not create new files unless necessary."` This keeps the directory from accumulating half-overlapping notes across sessions.
-      
-      **Initializer-session structure.** For multi-session work, try running a dedicated first session that sets up memory artifacts before substantive work begins: a progress log, a feature checklist, references to any setup scripts. Subsequent sessions open by reading those artifacts to recover state. Pre-seeding `/memories` this way gives later sessions a consistent structure to work within instead of each session inventing its own organization.
-      
-      **Storage hygiene.** On the client side, you can also track file sizes to prevent unbounded growth, clear out memory files that haven't been accessed for an extended period, and validate paths against traversal.
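
The storage-hygiene points can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the function names are hypothetical and not part of the notebook's `MemoryToolHandler`, the tool is assumed to send paths rooted at `/memories`, and staleness is approximated with modification time because access times are often disabled at the filesystem level.

```python
import time
from pathlib import Path


def resolve_memory_path(base: Path, requested: str) -> Path:
    """Map a tool-supplied path such as '/memories/notes.md' into `base`.

    Raises ValueError for paths outside /memories or that escape `base`
    via '..' segments or symlinks. (Hypothetical helper.)
    """
    if requested != "/memories" and not requested.startswith("/memories/"):
        raise ValueError(f"path is not under /memories: {requested!r}")
    relative = requested.removeprefix("/memories").lstrip("/")
    root = base.resolve()
    candidate = (root / relative).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes memory directory: {requested!r}")
    return candidate


def total_size_bytes(base: Path) -> int:
    """Sum the sizes of all files under `base`, to enforce a storage budget."""
    return sum(p.stat().st_size for p in base.rglob("*") if p.is_file())


def stale_files(base: Path, max_age_days: float) -> list[Path]:
    """List files not modified within `max_age_days` (mtime as a proxy for access)."""
    cutoff = time.time() - max_age_days * 86_400
    return sorted(
        p for p in base.rglob("*") if p.is_file() and p.stat().st_mtime < cutoff
    )
```

A handler would call `resolve_memory_path` before every read or write, and could run the other two checks on a schedule or before each session starts.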
-  markdown cell:
-    source:
-      ---
-      
-      ## Summary: What Each Primitive Does
-      
-      | Primitive | Operates on | What's traded away | Solves |
-      |---|---|---|---|
-      | **Compaction** | The whole conversation in the current window | Verbatim detail compressed into a summary; obscure specifics may be lost | All in-session growth |
-      | **Clearing** | Tool results in the current window | Old tool results are gone from context (must re-fetch if needed again) | Tool-result bloat |
-      | **Memory** | External storage, across windows | Tool-call overhead; only as good as what the agent chose to save | Cross-session persistence |
-      
-      The chart below puts the three solo runs side by side, plus the baseline. Note that memory's Session 2 is a different task (follow-up synthesis) so the absolute numbers aren't directly comparable to the others; what matters for memory is the S2-with vs. S2-without comparison shown above.
-  code cell:
-    source:
-      # Side-by-side view of each primitive's solo run
-      solo_results = {
-          "baseline": baseline,
-          "clearing": clearing_run,
-          "compaction": compaction_run,
-          "memory (S2 with)": memory_s2,
-      }
-      plot_summary_bars(solo_results, "Each primitive alone: session outcomes")
-      
-      # Completion status and events per run
-      print(f"{'Run':<20} {'Status':<18} {'Context events'}")
-      print("-" * 60)
-      for name, res in solo_results.items():
-          status = "HIT LIMIT" if res.hit_limit else "completed"
-          n_clear = sum(1 for e in res.events if e["kind"] == "clearing")
-          n_compact = sum(1 for e in res.events if e["kind"] == "compaction")
-          ev_parts = []
-          if n_clear:
-              ev_parts.append(f"{n_clear} clearing")
-          if n_compact:
-              ev_parts.append(f"{n_compact} compaction")
-          print(f"  {name:<18} {status:<18} {', '.join(ev_parts) if ev_parts else 'none'}")
-    outputs:
-      output 0:
-        output_type: display_data
-        data:
-          image/png: iVBORw0K...<snip base64, md5=abfcc091bb9c498c...>
-          text/plain: <Figure size 900x350 with 2 Axes>
-      output 1:
-        output_type: stream
-        name: stdout
-        text:
-          Run                  Status             Context events
-          ------------------------------------------------------------
-            baseline           completed          none
-            clearing           completed          4 clearing
-            compaction         completed          1 compaction
-            memory (S2 with)   completed          none
-  markdown cell:
-    source:
-      The three primitives address different slices of the context problem, which is why they compose rather than compete. Clearing and compaction manage what's inside the current window; memory moves information out of the window so it survives across sessions. Which ones you need depends on which parts of the problem your workload actually hits.
-  markdown cell:
-    source:
-      ---
-      
-      ## Using Them Together
-      
-      The three primitives target different parts of the context problem, so they can be layered. [Claude Code](https://code.claude.com/docs/en/memory) is a real-world example that employs compaction alongside [two complementary memory systems](https://code.claude.com/docs/en/memory#claudemd-vs-auto-memory): `CLAUDE.md` files hold user-defined instructions and rules (coding standards, project architecture, workflows) that the developer writes and checks into source control; auto memory holds learnings and patterns Claude writes itself (build commands, debugging insights, preferences discovered from corrections). Both are useful forms of memory for Claude Code.
-      
-      The Claude Code design shows that memory can take different shapes for the same agent; one form written by the user, another written by the model. The same applies to compaction and clearing: each has configuration knobs (trigger thresholds, custom instructions, which tools to exclude) that let you tune behavior to your use case. This is why the prompting and configuration guidance in the "Implementing effectively" sections above matters: the default behavior is a starting point, but the right settings depend on what your agent actually does.
-      
-      > **Note on `exclude_tools`**: when combining clearing with the memory tool, the `exclude_tools: ["memory"]` setting (shown in the config below) prevents the agent's memory reads and writes from being cleared. Without it, the agent could lose track of what it just saved. The [memory tool docs](https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool#using-with-context-editing) recommend this explicitly when layering the two.
-      
-      Below we run the research agent with all three primitives active at once and trace what each one does over the course of the session.
-      
-      > **Note on the config**: both triggers are set above the first batch's size (~167K) so the trajectory tracks the baseline through batch 1. When batch 2 pushes context to ~330K, clearing fires first (keep=6 drops the earliest reads, leaving ~210K) and compaction fires on what clearing left. Memory is active throughout. This tuning is for demonstration, so that all three primitives activate in one run. A production config depends on your agent's specific context-growth pattern.
-  code cell:
-    source:
-      # Both triggers sit above batch-1's ~167K so the rise is visible. At
-      # batch-2 (~330K) clearing fires first (keep=6 drops the earliest 3-4
-      # reads, leaving ~210K) and compaction then fires on what clearing
-      # left. Tuned for demo visibility, not a production recommendation.
-      combo_memory = MemoryToolHandler()
-      COMBO_CLEARING_TRIGGER = 200_000
-      COMBO_COMPACTION_TRIGGER = 180_000
-      
-      combo_run = run_research_session(
-          """Compare the model organisms in /research/ on lifespan, genetic
-      tractability, and translational relevance.
-      
-      Work in two batches, recording a note after each batch before moving on.
-      Batch 1 (high-throughput): read celegans_review.md, drosophila_review.md,
-      yeast_review.md, killifish_review.md, then record a note. Batch 2
-      (low-throughput): read mouse_review.md, zebrafish_review.md,
-      nmr_review.md, rhesus_review.md, then record a note. Save a comparative
-      summary to /memories so future sessions can build on it. Finish with a
-      detailed synthesis.""",
-          betas=["context-management-2025-06-27", "compact-2026-01-12"],
-          context_management={
-              "edits": [
-                  {
-                      "type": "clear_tool_uses_20250919",
-                      "trigger": {"type": "input_tokens", "value": COMBO_CLEARING_TRIGGER},
-                      "keep": {"type": "tool_uses", "value": 6},
-                      "clear_at_least": {"type": "input_tokens", "value": 30_000},
-                      "exclude_tools": ["memory"],
-                  },
-                  {
-                      "type": "compact_20260112",
-                      "trigger": {"type": "input_tokens", "value": COMBO_COMPACTION_TRIGGER},
-                  },
-              ],
-          },
-          memory_handler=combo_memory,
-          max_turns=20,
-          label="all-three",
-      )
-    outputs:
-      output 0:
-        output_type: stream
-        name: stdout
-        text:
-          ┌─ [all-three]
-          │  turn  1  ctx=  2,106
-          │            memory           view /memories
-          │  turn  2  ctx=  2,206
-          │            read_file        /research/celegans_review.md  → ~27,465 tok
-          │            read_file        /research/drosophila_review.md  → ~27,435 tok
-          │            read_file        /research/yeast_review.md  → ~27,320 tok
-          │            read_file        /research/killifish_review.md  → ~27,349 tok
-          │  turn  3  ctx=167,185
-          │            memory           create /memories/batch1_notes.md
-          │  turn  4  ctx=169,938
-          │            read_file        /research/mouse_review.md  → ~27,390 tok
-          │            read_file        /research/zebrafish_review.md  → ~27,363 tok
-          │            read_file        /research/nmr_review.md  → ~27,287 tok
-          │            read_file        /research/rhesus_review.md  → ~27,285 tok
-          │  ✂ CLEARING (turn 5): 2 tool results cleared, ~81,993 tokens freed
-          │  ⊟ COMPACTION (turn 5): ~2,839-token summary replaces prior turns
-          │  ✂ CLEARING (turn 6): 2 tool results cleared, ~81,810 tokens freed
-          │  turn  6  ctx=  4,901
-          │            memory           view /memories/batch1_notes.md  → ~2,370 tok
-          │  ✂ CLEARING (turn 7): 2 tool results cleared, ~81,683 tokens freed
-          │  turn  7  ctx=  8,121
-          │            memory           create /memories/batch2_notes.md
-          │  ✂ CLEARING (turn 8): 2 tool results cleared, ~81,508 tokens freed
-          │  turn  8  ctx= 10,524
-          │            memory           create /memories/comparative_summary.md
-          │  ✂ CLEARING (turn 9): 2 tool results cleared, ~81,413 tokens freed
-          │  turn  9  ctx= 12,799
-          │            search_files     'translational relevance aging pipeline'
-          │            search_files     'lifespan maximum cohort size intervention'
-          │  ✂ CLEARING (turn 10): 4 tool results cleared, ~162,379 tokens freed
-          │  turn 10  ctx= 13,096
-          │            search_files     'yeast replicative lifespan'
-          │            search_files     'rhesus macaque caloric restriction'
-          │  ✂ CLEARING (turn 11): 4 tool results cleared, ~162,371 tokens freed
-          │  turn 11  ctx= 13,290
-          │            record_finding   "All 8 model organisms reviewed and compared. Notes..."
-          │  ✂ CLEARING (turn 12): 4 tool results cleared, ~162,337 tokens freed
-          │  turn 12  ctx= 13,461
-          │            memory           str_replace /memories/comparative_summary.md
-          │  ✂ CLEARING (turn 13): 4 tool results cleared, ~162,331 tokens freed
-          │  turn 13  ctx= 13,749  (final answer)
-          └─ completed: 13 turns, peak ctx 169,938, final ctx 13,749, 10 context event(s)
-          
-  code cell:
-    source:
-      # Build a timeline showing when each primitive activated over the session
-      print("=" * 70)
-      print("SESSION TIMELINE: all three primitives active")
-      print("=" * 70)
-      
-      # Build turn-indexed event markers
-      clear_turns = {e["turn"] for e in combo_run.events if e["kind"] == "clearing"}
-      compact_turns = {e["turn"] for e in combo_run.events if e["kind"] == "compaction"}
-      
-      # Find per-turn memory tool calls by walking messages
-      mem_call_turns = set()
-      turn = 0
-      for msg in combo_run.messages:
-          if msg["role"] == "assistant" and isinstance(msg["content"], list):
-              turn += 1
-              for block in msg["content"]:
-                  if block.get("type") == "tool_use" and block.get("name") == "memory":
-                      mem_call_turns.add(turn)
-      
-      # Print turn-by-turn with markers
-      for t, ctx in combo_run.token_trajectory:
-          markers = []
-          if t in clear_turns:
-              markers.append("✂ CLEARING")
-          if t in compact_turns:
-              markers.append("⊟ COMPACTION")
-          if t in mem_call_turns:
-              markers.append("◇ memory")
-          marker_str = "  " + "  ".join(markers) if markers else ""
-          print(f"  turn {t:2d}  ctx={ctx:>7,}{marker_str}")
-      
-      # Summary of what each primitive did
-      n_clearing = len(clear_turns)
-      n_compaction = len(compact_turns)
-      n_memory_calls = combo_run.tool_counts.get("memory", 0)
-      mem_files = sorted(combo_memory.base.iterdir())
-      
-      print("\n" + "─" * 70)
-      print("What each primitive did this session:")
-      print(f"\n  ✂ Tool-result clearing fired {n_clearing} time(s)")
-      print(
-          f"    → Dropped old tool results to keep context capped at ~{max(t for _, t in combo_run.token_trajectory):,} tokens"
-      )
-      print(
-          f"    → {combo_run.tool_counts.get('read_file', 0)} file reads total "
-          f"(vs baseline's {baseline.tool_counts.get('read_file', 0)}); some were re-fetches after context was reset"
-      )
-      
-      if n_compaction > 0:
-          print(f"\n  ⊟ Compaction fired {n_compaction} time(s)")
-          for ev in combo_run.events:
-              if ev["kind"] == "compaction":
-                  print(
-                      f"    → At turn {ev['turn']}: replaced prior conversation with ~{count_tokens(ev['summary']):,}-token summary"
-                  )
-      else:
-          print(
-              f"\n  ⊟ Compaction did not fire (clearing kept context below the {COMBO_COMPACTION_TRIGGER:,} trigger)"
-          )
-      
-      print(
-          f"\n  ◇ Memory tool called {n_memory_calls} time(s), wrote {len(mem_files)} file(s) to /memories"
-      )
-      for f in mem_files:
-          content = f.read_text()
-          print(f"    → {f.name} (~{count_tokens(content):,} tokens)")
-          lines = content.split("\n")
-          for line in lines[:6]:
-              print(f"        {line}")
-          if len(lines) > 6:
-              print("        ...")
-      
-      print("\n" + "─" * 70)
-      print(
-          f"Peak context: {max(t for _, t in combo_run.token_trajectory):,} tokens "
-          f"(vs baseline's {max(t for _, t in baseline.token_trajectory):,})"
-      )
-      print(f"Final context: {combo_run.token_trajectory[-1][1]:,} tokens")
-      print(
-          f"Session {'completed' if not combo_run.hit_limit else 'HIT LIMIT'} "
-          f"in {len(combo_run.token_trajectory)} turns"
-      )
-    outputs:
-      output 0:
-        output_type: stream
-        name: stdout
-        text:
-          ======================================================================
-          SESSION TIMELINE: all three primitives active
-          ======================================================================
-            turn  1  ctx=  2,106  ◇ memory
-            turn  2  ctx=  2,206
-            turn  3  ctx=167,185  ◇ memory
-            turn  4  ctx=169,938
-            turn  5  ctx=  4,798  ✂ CLEARING  ⊟ COMPACTION  ◇ memory
-            turn  6  ctx=  4,901  ✂ CLEARING  ◇ memory
-            turn  7  ctx=  8,121  ✂ CLEARING  ◇ memory
-            turn  8  ctx= 10,524  ✂ CLEARING  ◇ memory
-            turn  9  ctx= 12,799  ✂ CLEARING
-            turn 10  ctx= 13,096  ✂ CLEARING
-            turn 11  ctx= 13,290  ✂ CLEARING
-            turn 12  ctx= 13,461  ✂ CLEARING  ◇ memory
-            turn 13  ctx= 13,749  ✂ CLEARING
-          
-          ──────────────────────────────────────────────────────────────────────
-          What each primitive did this session:
-          
-            ✂ Tool-result clearing fired 9 time(s)
-              → Dropped old tool results to keep context capped at ~169,938 tokens
-              → 8 file reads total (vs baseline's 8); some were re-fetches after context was reset
-          
-            ⊟ Compaction fired 1 time(s)
-              → At turn 5: replaced prior conversation with ~2,839-token summary
-          
-            ◇ Memory tool called 7 time(s), wrote 3 file(s) to /memories
-              → batch1_notes.md (~2,610 tokens)
-                  # Batch 1: High-Throughput Model Organisms — Notes
-                  
-                  ## 1. C. elegans (Caenorhabditis elegans)
-                  
-                  ### Lifespan
-                  - Wild-type median: ~18 days at 20°C (N2 strain)
-                  ...
-              → batch2_notes.md (~2,254 tokens)
-                  # Batch 2: Vertebrate & Exceptional Longevity Models — Notes
-                  
-                  ## 5. Mus musculus (Mouse)
-                  
-                  ### Lifespan
-                  - Median: ~24–30 months (C57BL/6J); max ~4.5 yrs (Ames dwarf + CR)
-                  ...
-              → comparative_summary.md (~2,173 tokens)
-                  # Comparative Summary: Model Organisms for Aging Research
-                  
-                  ## Quick Reference Table
-                  
-                  | Organism | Median Lifespan | Time to Result | Human Orthology | Genetic Tractability | Translational Relevance | Primary Role |
-                  |---|---|---|---|---|---|---|
-                  ...
-          
-          ──────────────────────────────────────────────────────────────────────
-          Peak context: 169,938 tokens (vs baseline's 335,279)
-          Final context: 13,749 tokens
-          Session completed in 13 turns
-  code cell:
-    source:
-      plot_trajectories(
-          {"baseline": baseline, "all three active": combo_run},
-          "All three primitives active: context stays bounded while the session continues",
-          triggers={"clearing": COMBO_CLEARING_TRIGGER, "compaction": COMBO_COMPACTION_TRIGGER},
-      )
-    outputs:
-      output 0:
-        output_type: display_data
-        data:
-          image/png: iVBORw0K...<snip base64, md5=0535b18fb23bc5f2...>
-          text/plain: <Figure size 1000x450 with 1 Axes>
-  markdown cell:
-    source:
-      ### What the timeline shows
-      
-      With all three primitives active, each one activated for its own reason during the session. The trajectory tracked the baseline through batch 1: both triggers sit above the first batch's size, so neither edit fired until batch 2 pushed context to ~330K, past both triggers. At that point clearing dropped the earliest reads and compaction summarized what remained, letting the agent continue. Memory was active throughout, with the agent checking `/memories` at the start and saving its comparative notes for future sessions. The timeline above shows all three cooperating across one session.
-      
-      Getting the primitives to split the work usefully takes some tuning; plan to experiment with the values against your own workload.
-      
-      The point isn't that running all three produces the "best" numbers; it's that they each handle a different part of the problem when that problem actually arises. The useful question isn't "should I use all three?" but "which of the three problems does my workload actually have?"
-      
-      ### When you might NOT want a primitive
-      
-      Not every workload needs every tool. A few cases where you'd deliberately leave one out:
-      
-      - **Skip memory** if you want each session to start fresh. A user-facing chatbot where every conversation should be independent doesn't need cross-session persistence; adding memory would carry state you don't want.
-      - **Skip compaction** if your sessions are short enough to stay under the context limit naturally. Compaction is lossy (specific details get summarized away), so if you don't need the headroom, you're paying fidelity for nothing.
-      - **Skip clearing** if the agent genuinely needs to see past tool results in full. An agent doing cross-document analysis that compares passages side by side needs those results to stay in context; clearing would force redundant reads.
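
One way to make these opt-outs explicit in code is to assemble the `context_management` payload from only the primitives a workload needs. The sketch below reuses the edit types and field shapes from the combined-run config earlier in this notebook; the function name `build_context_management` and its parameters are hypothetical, and the defaults are placeholders, not recommendations.

```python
from __future__ import annotations

from typing import Any


def build_context_management(
    *,
    clearing_trigger: int | None = None,
    compaction_trigger: int | None = None,
    keep_tool_uses: int = 6,
    uses_memory_tool: bool = False,
) -> dict[str, Any] | None:
    """Return a `context_management` dict with only the enabled edits.

    Passing None for a trigger leaves that primitive out. Returns None
    when neither clearing nor compaction is enabled.
    """
    edits: list[dict[str, Any]] = []

    if clearing_trigger is not None:
        clearing: dict[str, Any] = {
            "type": "clear_tool_uses_20250919",
            "trigger": {"type": "input_tokens", "value": clearing_trigger},
            "keep": {"type": "tool_uses", "value": keep_tool_uses},
        }
        if uses_memory_tool:
            # Keep memory reads and writes out of the clearing pass.
            clearing["exclude_tools"] = ["memory"]
        edits.append(clearing)

    if compaction_trigger is not None:
        edits.append(
            {
                "type": "compact_20260112",
                "trigger": {"type": "input_tokens", "value": compaction_trigger},
            }
        )

    return {"edits": edits} if edits else None
```

For example, `build_context_management(compaction_trigger=150_000)` yields a compaction-only config, while calling it with no arguments returns `None` for a short-session agent that needs neither edit.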
-  markdown cell:
-    source:
-      ---
-      
-      ## Takeaways and Next Steps
-      
-      ### Lessons from the experiments
-      
-      Running the research agent under these different configurations surfaces a few practical lessons:
-      
-      **The shape of the trajectory reflects what each tool does.** When clearing fires you see a step-down on the turn where old tool results were removed; in longer sessions this can repeat as context climbs back over the trigger. Compaction produces a larger drop each time it fires, since the summary replaces an entire run of turns rather than just the tool results within them. The plots in this cookbook are meant to make those effects visible, so you can see concretely what changes when you turn a knob. Which tool fits your workload is a separate question, driven by what the agent needs to do.
-      
-      **Lossiness is a spectrum, not a binary.** Clearing is lossless as long as the tool is re-callable. Compaction is lossy in a controlled way: the summarizer prompt (default or custom) determines what survives. Memory is lossless on what gets saved but is only as good as the agent's judgment about what to save. Each primitive trades fidelity differently.
-      
-      **Layering adds capability and complexity in equal measure.** Using all three together covers more of the context problem, but also means more knobs to tune and more interactions to trace. The useful question before adding a primitive is what specific problem in your workload it solves.
-      
-      **On larger context windows.** With Sonnet 4.6 and Opus 4.6 providing 1M-token context, that headroom is useful: more verbatim detail can stay around, and lossy operations can be spaced out. But as the baseline's context breakdown showed, the working set on a 1M model fills with stale tool results just as fast as on a 200K model; the difference is where the hard limit sits, not how quickly context accumulates. Context rot and prefill latency scale with how much is in the window, not with the window's limit, so keeping the working set lean is still worth doing even when the hard wall is far away.
-      
-      ### Thinking about your workload
-      
-      This table sketches workload characteristics and which primitive is worth trying first. Treat these as hypotheses to test on your own agent, not as answers. Every workload has quirks a table can't capture.
-      
-      | If your workload has... | Worth trying first | Watch for |
-      |---|---|---|
-      | Sessions spanning days or weeks | Memory tool | Tool-call overhead; stale memory if facts change |
-      | An agent that should learn user preferences across sessions | Memory tool | PII/sensitive data policy; stale preferences if the user changes their mind |
-      | Large, re-fetchable tool results | Clearing | Agent re-reading what was just cleared; tune `keep` and trigger |
-      | Dialogue as the primary context | Compaction | Specific figures getting summarized away |
-      | Tool results that aren't easily re-fetchable (ephemeral APIs, uploads) | Compaction over clearing | Summary fidelity on those specific results |
-      | Every session should start fresh | Skip memory | Cross-session state you don't want |
-      | Sessions stay well under the window | Skip compaction | Lossiness you don't need |
-      
-      ### What this cookbook didn't cover
-      
-      **Tuning beyond the basics.** The "Implementing effectively" sections above give you a starting point for each primitive. The next step is experimentation: different use cases will get different value out of the same primitive depending on parameters and prompts. A coding agent and a research agent might both use compaction, but the `instructions` string that works for one won't work for the other; the same is true of clearing thresholds and what you guide the model to write to `/memories`.
-      
-      Setting up a test harness helps here. For a simple example, the agent loop in this cookbook (`run_research_session`) returns `token_trajectory`, `events`, and `tool_counts`: you can run your agent under a handful of configs, plot the trajectories side by side, and measure what matters to you (task quality, latency, token spend).
-      
-      **Adjacent features.** [Programmatic tool calling (PTC)](https://www.anthropic.com/engineering/advanced-tool-use) prevents large results from entering context at all by running tools inside a model-authored program, which is a different approach to the tool-bloat problem. [Tool search](https://www.anthropic.com/engineering/advanced-tool-use) trims tool-definition bloat when you have many tools.
-      
-      ### Related reading
-      
-      The [Memory Cookbook](https://platform.claude.com/cookbook/tool-use-memory-cookbook) goes deeper on memory patterns with a code-review agent, and the [Compaction Cookbook](https://platform.claude.com/cookbook/tool-use-automatic-context-compaction) covers compaction in isolation. For a detailed case study of context management techniques in a multi-session software agent, see [Effective harnesses for long-running agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents).
-      
-      ## Conclusion
-      
-      This notebook walked through three context-management primitives for long-running agents: compaction to compress conversational history, tool-result clearing to drop re-fetchable tool output, and the memory tool to persist knowledge across sessions. Each addresses a different slice of the context problem.
-      
-      Which primitives matter for your agent depends on where its context growth actually comes from. The configs and agent loop in this cookbook are a starting point for running your own workload under different configurations and seeing what changes.

Generated by nbdime

@github-actions github-actions Bot left a comment

PR Review

Recommendation: COMMENT

Summary

Adds a well-structured session browser cookbook (Notebook 05) covering the seven Agent SDK session management primitives. The prose is clear, the four-part structure is logical, and prior review feedback has already been incorporated.

Actionable Feedback (5 items)
  • 05_Building_a_session_browser.ipynb (in cell with for s in list_sessions(directory=DEMO_DIR): delete_session(...)) — Iteration-during-deletion bug: collect sessions into a list first to avoid skipping files. Fix: sessions_to_delete = list(list_sessions(directory=DEMO_DIR)) then iterate over that.
  • 05_Building_a_session_browser.ipynb (in cell with datetime.fromtimestamp(s.last_modified / 1000)) — Uses timezone-naive datetime.fromtimestamp(). Prefer datetime.fromtimestamp(..., tz=timezone.utc) — readers will copy this into real products.
  • 05_Building_a_session_browser.ipynb (in cells with # Part 1:, # Part 2: headings) — Part headings use # (H1), same level as the notebook title. Consider ## for parts and ### for subsections to establish a clear hierarchy.
  • session_browser_demo/.gitkeep — Contains a comment (# Session browser demo workspace) rather than being empty. Conventional .gitkeep files are empty.
  • General: Add a link to the Python session management API reference in "Where to go next" — up_to_message_id is mentioned inline but never linked, leaving readers without a path to the full parameter details.
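
The timezone item above can be illustrated with a minimal sketch. It assumes, as the division by 1000 in the flagged cell suggests, that `last_modified` is an epoch timestamp in milliseconds; `format_last_modified` is a hypothetical helper name, not part of the SDK or the notebook.

```python
from datetime import datetime, timezone


def format_last_modified(last_modified_ms: int) -> str:
    """Convert epoch milliseconds to an ISO-8601 string in UTC.

    Hypothetical helper: assumes the timestamp is in milliseconds,
    as the `/ 1000` in the reviewed cell implies.
    """
    # Passing tz=timezone.utc yields an aware datetime, so the result does
    # not depend on the local timezone of the machine running the notebook.
    return datetime.fromtimestamp(last_modified_ms / 1000, tz=timezone.utc).isoformat()


epoch_start = format_last_modified(0)
stamp = format_last_modified(1_700_000_000_000)
```

An aware datetime carries its offset in the string, so rows rendered on different machines stay comparable.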

Detailed Review

Code Quality

The code is clean and idiomatic. The run_one_turn helper, visible_sessions filter, and fork-then-resume pattern are all good teaching examples. The follow-up commit already addressed the prior review's assert → raise RuntimeError change, return type annotation, and SDKSessionInfo import placement; all three are meaningful improvements.

One remaining bug: the cleanup cell iterates over list_sessions() while calling delete_session() in the same loop. If list_sessions() returns a lazy iterator, deleting sessions mid-scan mutates the directory being scanned and may silently skip some of them; if it returns an eagerly evaluated list, the loop happens to be safe, but only by relying on that implementation detail. Snapshotting the listing first (sessions_to_delete = list(list_sessions(...))) is the safe fix either way and sets the right mental model for readers who will copy this pattern.
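
A minimal sketch of the snapshot-then-delete pattern, using plain files in a temporary directory rather than the SDK. `iter_session_files` and `delete_all_sessions` are hypothetical stand-ins for a lazy listing and the cleanup loop; they are not the SDK's `list_sessions` or `delete_session`, and the `.jsonl` file layout is an assumption made only for the demo.

```python
import tempfile
from pathlib import Path
from typing import Iterator


def iter_session_files(directory: Path) -> Iterator[Path]:
    """Hypothetical stand-in for a lazy session listing (not the SDK call)."""
    yield from directory.glob("*.jsonl")


def delete_all_sessions(directory: Path) -> int:
    """Snapshot the listing first, then delete from the snapshot."""
    # Materialize the listing before mutating the directory, so the
    # deletions below cannot affect which entries the loop visits.
    to_delete = list(iter_session_files(directory))
    for path in to_delete:
        path.unlink()
    return len(to_delete)


with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    # Create five fake session files (assumed layout, for illustration only).
    for i in range(5):
        (demo_dir / f"session-{i}.jsonl").write_text("{}\n")
    deleted = delete_all_sessions(demo_dir)
    remaining = list(iter_session_files(demo_dir))
```

Returning the count makes it easy to check that every session created by the demo was actually removed.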

Security

No issues. load_dotenv() is used correctly, no secrets are hardcoded, and %%capture hides pip noise without suppressing errors.

Suggestions

  • pandas is imported for a single pd.DataFrame(rows) display. A brief comment noting it's only for the tabular UI mock would help readers adapting this to production (where they may not want the dependency).
  • The fork return value type (SDKSessionInfo) is never named in prose. The sentence before Cell 11 could add "fork.session_id is the identifier you pass to resume" to close the loop without requiring readers to inspect the SDK source.

Positive Notes

  • Introduction leads with the user problem ("the first thing users ask for is the sidebar") rather than the API surface — exactly the right framing for a cookbook.
  • MODEL = "claude-haiku-4-5" uses the non-dated alias per project conventions.
  • pyproject.toml version bump to >=0.1.51 is consistent with the pip install cell, preventing kernel/environment skew.
  • Soft-delete via __hidden tag is taught before hard-delete, which is the correct pedagogical order.
  • up_to_message_id is mentioned in prose without a demo cell — the right call for scope control while still surfacing the full API.
  • Conventional commit format with a substantive body explaining the version bump.
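
For readers who have not opened the notebook, the soft-delete pattern noted above might look roughly like the sketch below. `SessionStub` is a hypothetical stand-in for the SDK's session info object, and the single `tag` field is an assumption; only the `__hidden` tag name and the `visible_sessions` helper name come from the PR.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

HIDDEN_TAG = "__hidden"  # tag name used by the cookbook's soft-delete convention


@dataclass
class SessionStub:
    """Hypothetical stand-in for the SDK's session info object."""

    session_id: str
    tag: Optional[str] = None  # assumed field; the real object may differ


def visible_sessions(sessions: Iterable[SessionStub]) -> List[SessionStub]:
    """Return only the sessions that have not been soft-deleted."""
    return [s for s in sessions if s.tag != HIDDEN_TAG]


sessions = [
    SessionStub("a1"),
    SessionStub("b2", tag=HIDDEN_TAG),  # soft-deleted: hidden from the sidebar
    SessionStub("c3", tag="research"),
]
shown = [s.session_id for s in visible_sessions(sessions)]
```

Because the hidden session still exists on disk, clearing the tag restores it, which is why soft-delete makes a sensible first step before a hard delete.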

@qing-ant qing-ant merged commit ca0dd33 into main Apr 1, 2026
18 of 19 checks passed