Skip to content

fix(tool_use): unwrap hard-wrapped paragraphs in context engineering notebook#483

Merged
isabella-anthropic merged 1 commit into
mainfrom
isabella/fix-context-engineering-newlines
Mar 31, 2026
Merged

fix(tool_use): unwrap hard-wrapped paragraphs in context engineering notebook#483
isabella-anthropic merged 1 commit into
mainfrom
isabella/fix-context-engineering-newlines

Conversation

@isabella-anthropic

@isabella-anthropic isabella-anthropic commented Mar 30, 2026

Copy link
Copy Markdown
Contributor

Follow-up to #481. Markdown cells had prose hard-wrapped at ~80 chars, which renders as mid-paragraph line breaks in the cookbook UI.

Change: Joined wrapped prose lines into single-line paragraphs across 25 markdown cells

Preserved: Code blocks, tables, lists, blockquotes, all 29 output blocks, execution counts, and notebook metadata are byte-identical to main.

…notebook

Markdown cells had prose hard-wrapped at ~80 chars, which renders as
mid-paragraph line breaks in the UI. Joined wrapped lines into single-line
paragraphs, matching the format used in claude_agent_sdk notebooks. Code
blocks, tables, lists, and blockquotes unchanged; all cell outputs intact.
@github-actions

Copy link
Copy Markdown

Notebook Changes

This PR modifies the following notebooks:

📓 tool_use/context_engineering/context_engineering_tools.ipynb

View diff
nbdiff tool_use/context_engineering/context_engineering_tools.ipynb (ee3dfe1e245e90ed0fee452b1a8c9ab9e33927f4) tool_use/context_engineering/context_engineering_tools.ipynb (acbcebdce9394c79271ac924dd9802039bf30e8c)
--- tool_use/context_engineering/context_engineering_tools.ipynb (ee3dfe1e245e90ed0fee452b1a8c9ab9e33927f4)  (no timestamp)
+++ tool_use/context_engineering/context_engineering_tools.ipynb (acbcebdce9394c79271ac924dd9802039bf30e8c)  (no timestamp)
## modified /cells/0/source:
@@ -2,80 +2,27 @@
 
 ## Introduction
 
-A common challenge when building long-horizon agents is managing context.
-Tool results, the model's own reasoning, and user messages all accumulate,
-and eventually you either hit the token limit or start paying for context
-that isn't helping anymore. Studies on needle-in-a-haystack style benchmarking have uncovered the concept of [context rot](https://research.trychroma.com/context-rot): as the number of tokens in the context window increases, the model's ability to accurately recall information from that context decreases. So, even before the hard context limit is reached, the agent may be getting
-less out of each token.
+A common challenge when building long-horizon agents is managing context. Tool results, the model's own reasoning, and user messages all accumulate, and eventually you either hit the token limit or start paying for context that isn't helping anymore. Studies on needle-in-a-haystack style benchmarking have uncovered the concept of [context rot](https://research.trychroma.com/context-rot): as the number of tokens in the context window increases, the model's ability to accurately recall information from that context decreases. So, even before the hard context limit is reached, the agent may be getting less out of each token.
 
-Our engineering blog on [effective context engineering for AI
-agents](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)
-frames this as a resource problem: context is finite with diminishing
-marginal returns, and the core discipline is finding the smallest set of
-high-signal tokens that maximize the likelihood of your desired outcome.
-There are several levers for this: subagents that isolate work in their
-own context, programmatic tool calling that keeps large results out of the
-window entirely, and others.
+Our engineering blog on [effective context engineering for AI agents](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) frames this as a resource problem: context is finite with diminishing marginal returns, and the core discipline is finding the smallest set of high-signal tokens that maximize the likelihood of your desired outcome. There are several levers for this: subagents that isolate work in their own context, programmatic tool calling that keeps large results out of the window entirely, and others.
 
-This cookbook focuses on three: **compaction**, **tool-result clearing**,
-and **memory**. All three are effective strategies for context engineering,
-but since they all operate to make the context window more efficient in different ways, they
-can be hard to distinguish. Understanding those distinctions is what lets you map each tool to the part
-of your workload it actually helps with. Alongside other core context management strategies
-like utilizing subagents, these three are crucial for teams building long-running
-agents to understand. They also all have first-party API support, so you
-can adopt them without building orchestration infrastructure.
+This cookbook focuses on three: **compaction**, **tool-result clearing**, and **memory**. All three are effective strategies for context engineering, but since they all operate to make the context window more efficient in different ways, they can be hard to distinguish. Understanding those distinctions is what lets you map each tool to the part of your workload it actually helps with. Alongside other core context management strategies like utilizing subagents, these three are crucial for teams building long-running agents to understand. They also all have first-party API support, so you can adopt them without building orchestration infrastructure.
 
-- **Compaction** distills the contents of a context window into a
-  high-fidelity summary, letting the agent continue with minimal
-  performance degradation when the conversation gets long.
-- **Tool-result clearing** addresses the bloat from tool use itself. As an
-  agent pulls in tools and calls them, the results pile up, and deciding how
-  much of that tool output to keep becomes an increasingly important part
-  of managing context. Clearing drops old, re-fetchable results while
-  keeping the record that the call happened.
-- **Memory** is structured note-taking: the agent writes to persistent
-  external storage so it can track progress across tasks and sessions
-  without keeping everything in active context.
+- **Compaction** distills the contents of a context window into a high-fidelity summary, letting the agent continue with minimal performance degradation when the conversation gets long.
+- **Tool-result clearing** addresses the bloat from tool use itself. As an agent pulls in tools and calls them, the results pile up, and deciding how much of that tool output to keep becomes an increasingly important part of managing context. Clearing drops old, re-fetchable results while keeping the record that the call happened.
+- **Memory** is structured note-taking: the agent writes to persistent external storage so it can track progress across tasks and sessions without keeping everything in active context.
 
-[Claude Code](https://claude.com/product/claude-code) employs multiple
-of these strategies in production: compaction for long conversations
-and two complementary memory systems for cross-session persistence. Our API
-offers first-party implementations of all three:
-[server-side compaction](https://platform.claude.com/docs/en/build-with-claude/compaction),
-[context editing](https://platform.claude.com/docs/en/build-with-claude/context-editing)
-(which includes tool-result clearing), and the
-[memory tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool).
-This cookbook works through how to
-think about designing with them: when each one applies, how to configure
-them, what changes when you use them independently vs. together, and sample
-use-cases where different combinations make sense.
+[Claude Code](https://claude.com/product/claude-code) employs multiple of these strategies in production: compaction for long conversations and two complementary memory systems for cross-session persistence. Our API offers first-party implementations of all three: [server-side compaction](https://platform.claude.com/docs/en/build-with-claude/compaction), [context editing](https://platform.claude.com/docs/en/build-with-claude/context-editing) (which includes tool-result clearing), and the [memory tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool). This cookbook works through how to think about designing with them: when each one applies, how to configure them, what changes when you use them independently vs. together, and sample use-cases where different combinations make sense.
 
-The examples center on a **long-running research agent**: one that reads a
-corpus of documents, takes notes, and builds on its findings across
-multiple sessions. It's a useful test case because it naturally hits all
-three problems: bulky document reads (clearing), long analytical
-conversations (compaction), and knowledge that needs to survive between
-sessions (memory).
+The examples center on a **long-running research agent**: one that reads a corpus of documents, takes notes, and builds on its findings across multiple sessions. It's a useful test case because it naturally hits all three problems: bulky document reads (clearing), long analytical conversations (compaction), and knowledge that needs to survive between sessions (memory).
 
 ### What you'll learn
 
-- How to **cap in-session token growth** with `clear_tool_uses` when an
-  agent's context is dominated by large, re-fetchable tool results like
-  file reads and API responses
-- How to **keep long conversations going** with server-side compaction,
-  including how to serialize the `compaction` block back and probe what
-  survives the summary
-- How to **persist agent knowledge across sessions** by implementing a
-  file-backed memory handler that the model drives itself, so Session 2
-  picks up where Session 1 left off
-- How to **implement each primitive most effectively**, replacing the
-  default compaction prompt to preserve what your agent needs, guiding
-  what the agent writes to `/memories`, and testing clearing configs
-  against your own workload's tool-use pattern
-- How to **diagnose which part of the context problem your workload
-  actually has**, and pick the primitive that targets it, with a
-  framework for mapping workload characteristics to the right tool
+- How to **cap in-session token growth** with `clear_tool_uses` when an agent's context is dominated by large, re-fetchable tool results like file reads and API responses
+- How to **keep long conversations going** with server-side compaction, including how to serialize the `compaction` block back and probe what survives the summary
+- How to **persist agent knowledge across sessions** by implementing a file-backed memory handler that the model drives itself, so Session 2 picks up where Session 1 left off
+- How to **implement each primitive most effectively**, replacing the default compaction prompt to preserve what your agent needs, guiding what the agent writes to `/memories`, and testing clearing configs against your own workload's tool-use pattern
+- How to **diagnose which part of the context problem your workload actually has**, and pick the primitive that targets it, with a framework for mapping workload characteristics to the right tool
 
 ### Prerequisites
 
@@ -83,10 +30,6 @@ To run this notebook, you will need:
 
 - **Anthropic API key** set as `ANTHROPIC_API_KEY` in your environment or a `.env` file ([get one here](https://platform.claude.com/))
 - **Python 3.11+** with the `anthropic`, `python-dotenv`, and `matplotlib` packages installed
-- **`research_corpus.py`** alongside this notebook (included
-  in the repository). It defines `CORPUS`, a dict of eight synthetic review
-  documents on model organisms for aging research (~40K tokens each, ~320K
-  tokens total), plus probe questions used later to test what survives
-  compaction. You can swap in your own documents by replacing the dict.
+- **`research_corpus.py`** alongside this notebook (included in the repository). It defines `CORPUS`, a dict of eight synthetic review documents on model organisms for aging research (~40K tokens each, ~320K tokens total), plus probe questions used later to test what survives compaction. You can swap in your own documents by replacing the dict.
 
 > **Running from the cookbooks repo?** Ensure your working directory is `tool_use/context_engineering` before running the notebook.

## modified /cells/5/source:
@@ -1,30 +1,9 @@
 ## The Problem: A Long-Running Research Agent
 
-The agent in this cookbook plays the role of a biology researcher writing
-a comparative review of model organisms for aging and longevity research.
-The task is realistic enough to matter: it involves reading through a
-corpus of review documents (one per organism), extracting comparable
-facts (lifespan, genetic tractability, translational relevance), taking
-structured notes, and synthesizing findings across everything read.
+The agent in this cookbook plays the role of a biology researcher writing a comparative review of model organisms for aging and longevity research. The task is realistic enough to matter: it involves reading through a corpus of review documents (one per organism), extracting comparable facts (lifespan, genetic tractability, translational relevance), taking structured notes, and synthesizing findings across everything read.
 
-This kind of work is where context management starts to bite. Each
-document is around 40K tokens (narrative plus extensive appendix tables
-of intervention data), and the task asks the agent to read them in two
-batches: four high-throughput organisms (C. elegans, Drosophila, yeast,
-killifish) first, then four low-throughput organisms (mouse, zebrafish,
-naked mole-rat, rhesus). The two-batch structure is an experimental
-design choice for this cookbook: it produces a context trajectory that
-climbs past the compaction trigger on the first batch and past the 200K
-reference line on the second, so each primitive's effect on the
-trajectory is visible in the same run. Without context management, the
-agent's context grows to hundreds of thousands of tokens mid-task. And
-since the work spans sessions, even a completed run starts the next
-session with no memory of what was learned.
+This kind of work is where context management starts to bite. Each document is around 40K tokens (narrative plus extensive appendix tables of intervention data), and the task asks the agent to read them in two batches: four high-throughput organisms (C. elegans, Drosophila, yeast, killifish) first, then four low-throughput organisms (mouse, zebrafish, naked mole-rat, rhesus). The two-batch structure is an experimental design choice for this cookbook: it produces a context trajectory that climbs past the compaction trigger on the first batch and past the 200K reference line on the second, so each primitive's effect on the trajectory is visible in the same run. Without context management, the agent's context grows to hundreds of thousands of tokens mid-task. And since the work spans sessions, even a completed run starts the next session with no memory of what was learned.
 
 ### The research task
 
-The agent's concrete assignment: compare the model organisms in
-`/research/` on three dimensions (lifespan and experimental throughput,
-genetic tractability, and translational relevance to human aging),
-reading the eight review documents in two batches and taking notes as it
-goes, then writing a comparative synthesis.
+The agent's concrete assignment: compare the model organisms in `/research/` on three dimensions (lifespan and experimental throughput, genetic tractability, and translational relevance to human aging), reading the eight review documents in two batches and taking notes as it goes, then writing a comparative synthesis.

## modified /cells/6/source:
@@ -1,52 +1,16 @@
 ## How the Three APIs Map to the Problem
 
-Each API targets a different kind of context growth. Understanding which
-kind you're facing is the first step to picking the right tool.
+Each API targets a different kind of context growth. Understanding which kind you're facing is the first step to picking the right tool.
 
 ### Conceptually
 
-**Compaction** is the practice of taking a conversation nearing the context
-window limit, summarizing its contents, and reinitiating with that summary.
-It aims to distill the context window in a high-fidelity manner so the agent can
-continue with minimal performance degradation. The art of compaction lies
-in what to keep versus what to discard: overly aggressive compaction can
-lose subtle but critical context whose importance only becomes apparent
-later. The summary preserves architectural decisions, unresolved questions,
-and key facts while discarding redundant content; it's lossy by design, but
-handles all context growth, not just tool results. Compaction is a
-*whole-transcript* operation: user messages, assistant messages, tool
-calls, tool results, even prior compaction blocks are all flattened into
-the summary.
+**Compaction** is the practice of taking a conversation nearing the context window limit, summarizing its contents, and reinitiating with that summary. It aims to distill the context window in a high-fidelity manner so the agent can continue with minimal performance degradation. The art of compaction lies in what to keep versus what to discard: overly aggressive compaction can lose subtle but critical context whose importance only becomes apparent later. The summary preserves architectural decisions, unresolved questions, and key facts while discarding redundant content; it's lossy by design, but handles all context growth, not just tool results. Compaction is a *whole-transcript* operation: user messages, assistant messages, tool calls, tool results, even prior compaction blocks are all flattened into the summary.
 
-**Tool-result clearing**, by contrast, is a *sub-transcript* operation. It
-walks the message list and surgically replaces `tool_result` content
-blocks, leaving everything else — user messages, assistant reasoning, the
-`tool_use` record — untouched. When an agent calls tools, the results
-become part of the conversation history and count against the context
-budget on every subsequent turn. Much of that content is re-fetchable:
-file contents the agent can re-read, API responses it can re-request.
-Clearing replaces old `tool_result` blocks with a short placeholder,
-keeping the `tool_use` record so the model still knows it made the call,
-but dropping the bulky payload. Once a tool has been called deep in the
-message history, the agent rarely needs to see the raw result again;
-clearing is one of the safest, lightest-touch ways to recover that space.
-If the agent does need the data, it just calls the tool again.
+**Tool-result clearing**, by contrast, is a *sub-transcript* operation. It walks the message list and surgically replaces `tool_result` content blocks, leaving everything else — user messages, assistant reasoning, the `tool_use` record — untouched. When an agent calls tools, the results become part of the conversation history and count against the context budget on every subsequent turn. Much of that content is re-fetchable: file contents the agent can re-read, API responses it can re-request. Clearing replaces old `tool_result` blocks with a short placeholder, keeping the `tool_use` record so the model still knows it made the call, but dropping the bulky payload. Once a tool has been called deep in the message history, the agent rarely needs to see the raw result again; clearing is one of the safest, lightest-touch ways to recover that space. If the agent does need the data, it just calls the tool again.
 
-**Memory**, or structured note-taking, is a technique where the agent
-regularly writes notes persisted outside the context window, then pulls
-them back in at later times. This provides persistent memory with minimal
-overhead: the agent tracks progress across complex tasks, maintaining
-critical context that would otherwise be lost across dozens of tool calls
-or across context resets. After a reset (a new session, or after
-compaction), the agent reads its own notes and continues. You implement
-the storage backend, so you control what's stored and for how long.
+**Memory**, or structured note-taking, is a technique where the agent regularly writes notes persisted outside the context window, then pulls them back in at later times. This provides persistent memory with minimal overhead: the agent tracks progress across complex tasks, maintaining critical context that would otherwise be lost across dozens of tool calls or across context resets. After a reset (a new session, or after compaction), the agent reads its own notes and continues. You implement the storage backend, so you control what's stored and for how long.
 
-Beyond enabling these primitives, it's also important to understand how
-to implement them most effectively: the default behavior gets you
-started, but the quality of a compaction summary and the usefulness of
-what lands in memory both depend on guidance you provide. Each
-primitive's section below includes a subsection on effective
-implementation.
+Beyond enabling these primitives, it's also important to understand how to implement them most effectively: the default behavior gets you started, but the quality of a compaction summary and the usefulness of what lands in memory both depend on guidance you provide. Each primitive's section below includes a subsection on effective implementation.
 
 ### Tactically
 
@@ -60,22 +24,8 @@ implementation.
 
 For the research agent specifically, the three problems line up cleanly:
 
-- The agent's running commentary ("C. elegans is 18-day lifespan with
-  genome-wide RNAi, mouse is 30 months but costs $100K per cohort...")
-  and the user's follow-up questions accumulate into a long dialogue.
-  That's a **compaction** problem.
-- Reading eight ~40K-token review documents produces roughly 320K
-  tokens of tool-result volume, significantly into the range where
-  model performance decays from context rot. Most of it the agent
-  could re-read on demand. That's a **clearing** problem.
-- The work spans multiple sessions. If Session 1 determined that
-  killifish is the shortest-lived vertebrate (4-6 months), we want
-  Session 2 to retain that finding and build on it rather than
-  rediscover it from scratch. That's a **memory** problem.
+- The agent's running commentary ("C. elegans is 18-day lifespan with genome-wide RNAi, mouse is 30 months but costs $100K per cohort...") and the user's follow-up questions accumulate into a long dialogue. That's a **compaction** problem.
+- Reading eight ~40K-token review documents produces roughly 320K tokens of tool-result volume, significantly into the range where model performance decays from context rot. Most of it the agent could re-read on demand. That's a **clearing** problem.
+- The work spans multiple sessions. If Session 1 determined that killifish is the shortest-lived vertebrate (4-6 months), we want Session 2 to retain that finding and build on it rather than rediscover it from scratch. That's a **memory** problem.
 
-A rough mental model for prioritizing: compaction compresses the whole
-window when it grows too large, clearing drops stale re-fetchable data
-inside the window, and memory moves information out of the window so it survives
-across sessions. Each layer adds config to tune and interactions to
-understand, so it's worth starting with the one that matches the bottleneck
-you're actually observing.
+A rough mental model for prioritizing: compaction compresses the whole window when it grows too large, clearing drops stale re-fetchable data inside the window, and memory moves information out of the window so it survives across sessions. Each layer adds config to tune and interactions to understand, so it's worth starting with the one that matches the bottleneck you're actually observing.

## modified /cells/7/source:
@@ -1,6 +1,3 @@
 ## The Research Agent
 
-Before exploring each primitive, we set up the agent itself: tool schemas,
-tool execution, and an agent loop that can be run with or without any
-context-management configuration. Everything is inline so you can see the
-full loop.
+Before exploring each primitive, we set up the agent itself: tool schemas, tool execution, and an agent loop that can be run with or without any context-management configuration. Everything is inline so you can see the full loop.

## modified /cells/11/source:
@@ -1,16 +1,7 @@
 ### Baseline: no context management
 
-First we run the agent with no context-management configuration. With the
-large corpus (each document is ~40K tokens with its appendix tables),
-context accumulates fast. We'll look at the same run under two lenses:
-what happens on a 1M-token window, and what would happen on a 200K
-window.
+First we run the agent with no context-management configuration. With the large corpus (each document is ~40K tokens with its appendix tables), context accumulates fast. We'll look at the same run under two lenses: what happens on a 1M-token window, and what would happen on a 200K window.
 
 #### Part 1: On a 1M-token window
 
-Claude Sonnet 4.6 and Claude Opus 4.6 both provide a
-[1M-token context window](https://platform.claude.com/docs/en/build-with-claude/context-windows).
-For this task, the baseline's total input stays under that limit: the
-agent reads the full corpus and synthesizes without hitting a hard wall.
-The trajectory below shows the run climbing to hundreds of thousands of tokens, with the
-dotted line projecting continued growth at the same rate.
+Claude Sonnet 4.6 and Claude Opus 4.6 both provide a [1M-token context window](https://platform.claude.com/docs/en/build-with-claude/context-windows). For this task, the baseline's total input stays under that limit: the agent reads the full corpus and synthesizes without hitting a hard wall. The trajectory below shows the run climbing to hundreds of thousands of tokens, with the dotted line projecting continued growth at the same rate.

## modified /cells/14/source:
@@ -1,21 +1,7 @@
-The breakdown above makes the scale concrete. The model is carrying
-hundreds of thousands of tokens of file contents on every turn, most of
-it documents the agent already processed and took notes on. The first
-document read is still in the window, but by the end of the run it's
-sitting behind hundreds of thousands of tokens of other tool results
-plus all the agent's reasoning and notes. It hasn't been removed; it's
-competing with everything else for attention. This is where context rot
-shows up: recall of details from that depth degrades as the window
-fills, even though the content is technically present. And prefill
-latency scales with context length, so every turn pays to process the
-full pile.
+The breakdown above makes the scale concrete. The model is carrying hundreds of thousands of tokens of file contents on every turn, most of it documents the agent already processed and took notes on. The first document read is still in the window, but by the end of the run it's sitting behind hundreds of thousands of tokens of other tool results plus all the agent's reasoning and notes. It hasn't been removed; it's competing with everything else for attention. This is where context rot shows up: recall of details from that depth degrades as the window fills, even though the content is technically present. And prefill latency scales with context length, so every turn pays to process the full pile.
 
 #### Part 2: On a 200K-token window
 
-Earlier models cap at 200K tokens. On those models, the same baseline
-run hits a hard wall: the API rejects the next request once context
-exceeds the limit, and the task stops mid-run.
+Earlier models cap at 200K tokens. On those models, the same baseline run hits a hard wall: the API rejects the next request once context exceeds the limit, and the task stops mid-run.
 
-The cell below finds the turn where the baseline first crossed 200K and
-shows what the run looks like from a 200K model's perspective: same
-trajectory up to that point, then a hard stop.
+The cell below finds the turn where the baseline first crossed 200K and shows what the run looks like from a 200K model's perspective: same trajectory up to that point, then a hard stop.

## modified /cells/16/source:
@@ -1,18 +1,3 @@
-Both failure modes come from the same underlying problem: the context
-window fills with hundreds of thousands of tokens of file content, most
-of it already processed and noted. What differs is how the failure
-surfaces. On a 200K window it's a hard stop: the API rejects the next
-request and the task ends mid-phase. On a 1M window the agent keeps
-running, but context rot sets in as the window fills: an early document
-read is still technically present, but by the end of the run it's
-buried under everything read since, and the model's ability to recall
-its details degrades. The agent completes, but the quality of the
-synthesis depends on recall that's fighting against that pile. Prefill
-latency scales with it too: every turn pays to process the full
-context, regardless of how much of it is still useful.
+Both failure modes come from the same underlying problem: the context window fills with hundreds of thousands of tokens of file content, most of it already processed and noted. What differs is how the failure surfaces. On a 200K window it's a hard stop: the API rejects the next request and the task ends mid-phase. On a 1M window the agent keeps running, but context rot sets in as the window fills: an early document read is still technically present, but by the end of the run it's buried under everything read since, and the model's ability to recall its details degrades. The agent completes, but the quality of the synthesis depends on recall that's fighting against that pile. Prefill latency scales with it too: every turn pays to process the full context, regardless of how much of it is still useful.
 
-The primitives below each address this by keeping the working set small
-enough that neither failure mode bites: the window doesn't fill, so
-smaller models don't stop and larger models don't degrade. Plots
-include the dashed 200K reference line so you can see where an earlier
-model would have been cut off.
+The primitives below each address this by keeping the working set small enough that neither failure mode bites: the window doesn't fill, so smaller models don't stop and larger models don't degrade. Plots include the dashed 200K reference line so you can see where an earlier model would have been cut off.

## modified /cells/17/source:
@@ -2,33 +2,10 @@
 
 ## Compaction
 
-[Compaction](https://platform.claude.com/docs/en/build-with-claude/compaction)
-is a useful strategy for managing context in long-running
-conversations: it takes a conversation nearing the context window limit,
-summarizes its contents, and reinitiates with that summary. This
-addresses the agent's own reasoning text, user back-and-forth, and
-decisions made over the course of a session. The specific sequence of
-actions and exact wording from earlier turns won't be preserved, but
-the goals, decisions, and major discoveries the agent made are
-summarized — what the summary retains depends on your compaction
-prompt, which we cover below.
+[Compaction](https://platform.claude.com/docs/en/build-with-claude/compaction) is a useful strategy for managing context in long-running conversations: it takes a conversation nearing the context window limit, summarizes its contents, and reinitiates with that summary. This addresses the agent's own reasoning text, user back-and-forth, and decisions made over the course of a session. The specific sequence of actions and exact wording from earlier turns won't be preserved, but the goals, decisions, and major discoveries the agent made are summarized — what the summary retains depends on your compaction prompt, which we cover below.
 
-At its core, compaction distills the contents of a context window in a
-high-fidelity manner, enabling the agent to continue with minimal
-performance degradation. The trade-off is in choosing what the summary
-must retain versus what it can safely drop: overly aggressive compaction
-can lose subtle but critical context whose importance only becomes
-apparent later. The summary preserves key decisions and facts but may
-drop specific numbers or exact phrasing. It costs inference (the
-summarizer model runs), but handles all context growth, not just tool
-results.
+At its core, compaction distills the contents of a context window in a high-fidelity manner, enabling the agent to continue with minimal performance degradation. The trade-off is in choosing what the summary must retain versus what it can safely drop: overly aggressive compaction can lose subtle but critical context whose importance only becomes apparent later. The summary preserves key decisions and facts but may drop specific numbers or exact phrasing. It costs inference (the summarizer model runs), but handles all context growth, not just tool results.
 
 ### How it works under the hood
 
-Here's a minimal sample implementation of compaction. Our first-party
-API provides a robust, tested version (automatic triggering at a token
-threshold, a typed content block that slots natively into the
-conversation, correct tool-use pairing), but the ~25-line version
-below makes the mechanism concrete: render the conversation to text,
-ask the model to summarize it, replace the old messages with that
-summary.
+Here's a minimal sample implementation of compaction. Our first-party API provides a robust, tested version (automatic triggering at a token threshold, a typed content block that slots natively into the conversation, correct tool-use pairing), but the ~25-line version below makes the mechanism concrete: render the conversation to text, ask the model to summarize it, replace the old messages with that summary.

## modified /cells/20/source:
@@ -1,21 +1,9 @@
-The sample above demonstrates the mechanism: the model produces a
-condensed version of the conversation that the agent can continue from.
+The sample above demonstrates the mechanism: the model produces a condensed version of the conversation that the agent can continue from.
 
 ### Using the API
 
-Our API provides this natively as the `compact_20260112` context edit. It
-triggers automatically at a token threshold (minimum 50K), returns a typed
-`compaction` content block that slots into the conversation natively, and
-handles tool-use pairing across the summary boundary. When compaction
-fires, you serialize the compaction block back (`{"type": "compaction",
-"content": block.content}`) and the API drops everything before it on the
-next request.
+Our API provides this natively as the `compact_20260112` context edit. It triggers automatically at a token threshold (minimum 50K), returns a typed `compaction` content block that slots into the conversation natively, and handles tool-use pairing across the summary boundary. When compaction fires, you serialize the compaction block back (`{"type": "compaction", "content": block.content}`) and the API drops everything before it on the next request.
 
 **API Documentation:** [Compaction — platform.claude.com](https://platform.claude.com/docs/en/build-with-claude/compaction)
 
-Here's the research agent running with compaction configured. We set the
-trigger at 180K so the first batch of reads (~165K) stays under it:
-the compaction trajectory tracks the baseline through that batch, then
-diverges when the second batch pushes context past the trigger. Watch
-for `⊟ COMPACTION` lines in the output and the drop on the plot where
-the summary replaces the earlier conversation.
+Here's the research agent running with compaction configured. We set the trigger at 180K so the first batch of reads (~165K) stays under it: the compaction trajectory tracks the baseline through that batch, then diverges when the second batch pushes context past the trigger. Watch for `⊟ COMPACTION` lines in the output and the drop on the plot where the summary replaces the earlier conversation.

## modified /cells/23/source:
@@ -1,28 +1,7 @@
 ### Analysis
 
-The baseline keeps climbing until it either hits a context-window limit
-(a hard stop on smaller windows) or accumulates enough tokens that
-context rot meaningfully degrades recall. Compaction addresses both: when
-context crosses the trigger, the older conversation is replaced by a
-model-generated summary and context drops sharply. The agent continues
-with a lean window instead of an ever-growing one.
+The baseline keeps climbing until it either hits a context-window limit (a hard stop on smaller windows) or accumulates enough tokens that context rot meaningfully degrades recall. Compaction addresses both: when context crosses the trigger, the older conversation is replaced by a model-generated summary and context drops sharply. The agent continues with a lean window instead of an ever-growing one.
 
-The probe above checks the summary text directly for a *mix* of details.
-The pattern that tends to emerge: high-level facts central to the task
-(lifespan figures the agent noted, organism identities, major
-comparisons) usually survive in the summary. Obscure specifics (a single
-cell in an appendix table, a heterogeneity statistic) usually don't.
-This is a meaningful difference from tool-result clearing: clearing
-drops tool results *wholesale* so the content is gone until re-fetched,
-while compaction keeps the *substance* in compressed form but loses
-verbatim detail.
+The probe above checks the summary text directly for a *mix* of details. The pattern that tends to emerge: high-level facts central to the task (lifespan figures the agent noted, organism identities, major comparisons) usually survive in the summary. Obscure specifics (a single cell in an appendix table, a heterogeneity statistic) usually don't. This is a meaningful difference from tool-result clearing: clearing drops tool results *wholesale* so the content is gone until re-fetched, while compaction keeps the *substance* in compressed form but loses verbatim detail.
 
-What compaction gets you is a general-purpose way to keep the window
-lean: it handles dialogue and tool results together, the important
-content survives in summarized form, and the agent keeps working under
-conditions where it would otherwise be cut off or swamped. What it
-doesn't get you is verbatim fidelity on specifics, or cross-session
-persistence. If your context bloat is mostly re-fetchable tool output,
-clearing is cheaper and lossless (the agent can just call the tool
-again). If it's dialogue and reasoning that can't be re-fetched,
-compaction is the right fit.
+What compaction gets you is a general-purpose way to keep the window lean: it handles dialogue and tool results together, the important content survives in summarized form, and the agent keeps working under conditions where it would otherwise be cut off or swamped. What it doesn't get you is verbatim fidelity on specifics, or cross-session persistence. If your context bloat is mostly re-fetchable tool output, clearing is cheaper and lossless (the agent can just call the tool again). If it's dialogue and reasoning that can't be re-fetched, compaction is the right fit.

## modified /cells/24/source:
@@ -1,9 +1,6 @@
 ### Implementing compaction effectively
 
-The `instructions` parameter lets you replace the default summarization
-prompt entirely. The [compaction
-docs](https://platform.claude.com/docs/en/build-with-claude/compaction#custom-summarization-instructions)
-give the default prompt verbatim:
+The `instructions` parameter lets you replace the default summarization prompt entirely. The [compaction docs](https://platform.claude.com/docs/en/build-with-claude/compaction#custom-summarization-instructions) give the default prompt verbatim:
 
 > You have written a partial transcript for the initial task above.
 > Please write a summary of the transcript. The purpose of this summary
@@ -14,15 +11,9 @@ give the default prompt verbatim:
 > learnings etc. You must wrap your summary in a `<summary></summary>`
 > block.
 
-This helps give you a place to start. However, custom `instructions`
-don't supplement this prompt — they completely replace it. So if you
-provide your own, you're responsible for the full framing. The docs'
-example for a coding context is `"Focus on preserving code snippets,
-variable names, and technical decisions."`
+This helps give you a place to start. However, custom `instructions` don't supplement this prompt — they completely replace it. So if you provide your own, you're responsible for the full framing. The docs' example for a coding context is `"Focus on preserving code snippets, variable names, and technical decisions."`
 
-For this cookbook's research agent, you might write something that
-names the specific details the probe above showed are at risk of being
-lost:
+For this cookbook's research agent, you might write something that names the specific details the probe above showed are at risk of being lost:
 
 ```python
 context_management={

## modified /cells/25/source:
@@ -2,26 +2,10 @@
 
 ## Tool-Result Clearing
 
-When an agent calls tools, each result gets appended to the conversation as
-a `tool_result` block
-([context editing docs](https://platform.claude.com/docs/en/build-with-claude/context-editing)). Those blocks count toward the input-token budget on
-every subsequent turn, even after the agent has processed the content and
-moved on. For tools that are re-callable (file reads, API queries, search),
-carrying the verbatim result forward is often unnecessary; the agent can
-just call the tool again if it needs to.
+When an agent calls tools, each result gets appended to the conversation as a `tool_result` block ([context editing docs](https://platform.claude.com/docs/en/build-with-claude/context-editing)). Those blocks count toward the input-token budget on every subsequent turn, even after the agent has processed the content and moved on. For tools that are re-callable (file reads, API queries, search), carrying the verbatim result forward is often unnecessary; the agent can just call the tool again if it needs to.
 
-Clearing replaces old `tool_result` blocks with a short placeholder string.
-The `tool_use` block that preceded it stays, so the model retains a record
-that it made the call (and with what input), but the bulky response body is
-gone. This is the cheapest of the three primitives: no inference cost, just
-a mechanical edit to the message list.
+Clearing replaces old `tool_result` blocks with a short placeholder string. The `tool_use` block that preceded it stays, so the model retains a record that it made the call (and with what input), but the bulky response body is gone. This is the cheapest of the three primitives: no inference cost, just a mechanical edit to the message list.
 
 ### How it works under the hood
 
-To make the mechanism concrete, here's a minimal sample implementation
-of tool-result clearing. Our first-party API provides a robust, tested
-version of this (automatic triggering, correct block-pairing invariants,
-tool exclusions, and more), but seeing the ~15-line version makes the
-core operation tangible: walk the message list, find `tool_result`
-blocks, replace the content of all but the most recent few with a
-placeholder.
+To make the mechanism concrete, here's a minimal sample implementation of tool-result clearing. Our first-party API provides a robust, tested version of this (automatic triggering, correct block-pairing invariants, tool exclusions, and more), but seeing the ~15-line version makes the core operation tangible: walk the message list, find `tool_result` blocks, replace the content of all but the most recent few with a placeholder.

## modified /cells/28/source:
@@ -1,16 +1,8 @@
-The sample above shows the mechanism. What's missing from it: token
-counting and automatic triggering, correct `tool_use`/`tool_result`
-pairing invariants, tool-specific exclusions, and awareness on the
-model side that clearing happened.
+The sample above shows the mechanism. What's missing from it: token counting and automatic triggering, correct `tool_use`/`tool_result` pairing invariants, tool-specific exclusions, and awareness on the model side that clearing happened.
 
 ### Using the API
 
-Our API provides this natively as the `clear_tool_uses_20250919` context
-edit. It handles token counting and triggering server-side, preserves block
-pairing, and lets you exempt specific tools from clearing (useful when the
-memory tool is also active, as we'll see later). When clearing fires, the
-response includes `context_management.applied_edits` with details on how
-many tool uses were cleared and how many tokens were freed.
+Our API provides this natively as the `clear_tool_uses_20250919` context edit. It handles token counting and triggering server-side, preserves block pairing, and lets you exempt specific tools from clearing (useful when the memory tool is also active, as we'll see later). When clearing fires, the response includes `context_management.applied_edits` with details on how many tool uses were cleared and how many tokens were freed.
 
 > There's also `clear_thinking_20251015` for extended-thinking blocks.
 > Same config shape, different `type`. It must be the first entry in the
@@ -18,8 +10,4 @@ many tool uses were cleared and how many tokens were freed.
 
 **API Documentation:** [Context editing — platform.claude.com](https://platform.claude.com/docs/en/build-with-claude/context-editing)
 
-Here's the research agent running with clearing enabled. The baseline's
-context climbed with every file read; clearing keeps this run bounded
-by dropping old tool results whenever context climbs past the trigger.
-Watch for `✂ CLEARING` lines in the output; dashed vertical lines on
-the plot mark each firing.
+Here's the research agent running with clearing enabled. The baseline's context climbed with every file read; clearing keeps this run bounded by dropping old tool results whenever context climbs past the trigger. Watch for `✂ CLEARING` lines in the output; dashed vertical lines on the plot mark each firing.

## modified /cells/31/source:
@@ -1,29 +1,7 @@
 ### Analysis
 
-The baseline keeps climbing; the clearing run stays bounded. Once
-context is past the trigger (30K here) and there are more than `keep`
-tool uses on record, clearing fires server-side: tool results older
-than the most recent `keep` are replaced with placeholders and context
-drops back down. The dashed lines on the plot mark each firing. That
-bounded
-window means the run doesn't hit a hard limit on smaller models, and it
-doesn't accumulate into the range where context rot degrades recall.
+The baseline keeps climbing; the clearing run stays bounded. Once context is past the trigger (30K here) and there are more than `keep` tool uses on record, clearing fires server-side: tool results older than the most recent `keep` are replaced with placeholders and context drops back down. The dashed lines on the plot mark each firing. That bounded window means the run doesn't hit a hard limit on smaller models, and it doesn't accumulate into the range where context rot degrades recall.
 
-The second cell above shows what this costs. Every file read except the
-most recent few is gone from context. When the agent reaches the
-synthesis phase, it has two options. It can work from its own notes
-plus whatever recent reads survived the last clearing: if the notes
-were thorough, this is fine; if they were sparse, the synthesis misses
-details the agent saw but didn't record. Or it can re-fetch cleared
-content by calling `read_file` again: the clearing run may show more
-file reads than the baseline for the same documents, because some reads
-were cleared before the agent was done with them. How much the second
-path costs depends on your tools: re-reading a local file is nearly
-free, but re-calling a rate-limited or slow API is not. Tuning `keep`
-and `trigger` shifts where the agent lands between these two.
+The second cell above shows what this costs. Every file read except the most recent few is gone from context. When the agent reaches the synthesis phase, it has two options. It can work from its own notes plus whatever recent reads survived the last clearing: if the notes were thorough, this is fine; if they were sparse, the synthesis misses details the agent saw but didn't record. Or it can re-fetch cleared content by calling `read_file` again: the clearing run may show more file reads than the baseline for the same documents, because some reads were cleared before the agent was done with them. How much the second path costs depends on your tools: re-reading a local file is nearly free, but re-calling a rate-limited or slow API is not. Tuning `keep` and `trigger` shifts where the agent lands between these two.
 
-What clearing gets you is a bounded window at no inference cost,
-avoiding both the hard-limit cutoff and the recall degradation that
-comes with a large accumulated context. What it doesn't get you is any
-help with content that isn't a tool result (the agent's own reasoning,
-user messages) or any persistence across sessions.
+What clearing gets you is a bounded window at no inference cost, avoiding both the hard-limit cutoff and the recall degradation that comes with a large accumulated context. What it doesn't get you is any help with content that isn't a tool result (the agent's own reasoning, user messages) or any persistence across sessions.

## modified /cells/32/source:
@@ -1,20 +1,5 @@
 ### Implementing clearing effectively
 
-Unlike compaction and memory, clearing has no prompt to tune, and the
-knobs are all numeric (`trigger`, `keep`, `clear_at_least`) or
-list-based (`exclude_tools`). One trade-off to understand: clearing
-invalidates cached prompt prefixes. To account for this, clear enough
-tokens to make the cache invalidation worthwhile; the `clear_at_least`
-parameter ensures a minimum number of tokens is cleared each time.
-You'll incur cache write costs each time clearing fires, but subsequent
-requests can reuse the newly cached prefix.
+Unlike compaction and memory, clearing has no prompt to tune, and the knobs are all numeric (`trigger`, `keep`, `clear_at_least`) or list-based (`exclude_tools`). One trade-off to understand: clearing invalidates cached prompt prefixes. To account for this, clear enough tokens to make the cache invalidation worthwhile; the `clear_at_least` parameter ensures a minimum number of tokens is cleared each time. You'll incur cache write costs each time clearing fires, but subsequent requests can reuse the newly cached prefix.
 
-The right values for `trigger` and `keep` depend on how your agent
-uses tool results: how large they are, how often the agent revisits
-them, whether re-fetching is cheap. The clearing run above used
-trigger=30K and keep=4; the all-three run later uses a higher trigger
-and keep=6 so clearing and compaction split the work. Test a few
-configurations
-against your own agent's workload: the `context_management.applied_edits`
-field in each response shows how many tool uses and tokens were
-cleared, which makes the effect of each config directly observable.
+The right values for `trigger` and `keep` depend on how your agent uses tool results: how large they are, how often the agent revisits them, whether re-fetching is cheap. The clearing run above used trigger=30K and keep=4; the all-three run later uses a higher trigger and keep=6 so clearing and compaction split the work. Test a few configurations against your own agent's workload: the `context_management.applied_edits` field in each response shows how many tool uses and tokens were cleared, which makes the effect of each config directly observable.

## modified /cells/33/source:
@@ -2,33 +2,12 @@
 
 ## Memory Tool
 
-The [memory tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool)
-enables Claude to store and retrieve information across
-conversations through a memory file directory. Claude can create, read,
-update, and delete files that persist between sessions, allowing it to
-build knowledge over time without keeping everything in the context
-window.
+The [memory tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool) enables Claude to store and retrieve information across conversations through a memory file directory. Claude can create, read, update, and delete files that persist between sessions, allowing it to build knowledge over time without keeping everything in the context window.
 
-This is the key primitive for just-in-time context retrieval: rather than
-loading all relevant information upfront, agents store what they learn in
-memory and pull it back on demand. This keeps the active context focused
-on what's currently relevant, which is critical for long-running
-workflows where loading everything at once would overwhelm the window.
-Clearing and compaction both operate on the current context; neither
-helps when a new session starts and the window is empty. Memory solves
-that problem.
+This is the key primitive for just-in-time context retrieval: rather than loading all relevant information upfront, agents store what they learn in memory and pull it back on demand. This keeps the active context focused on what's currently relevant, which is critical for long-running workflows where loading everything at once would overwhelm the window. Clearing and compaction both operate on the current context; neither helps when a new session starts and the window is empty. Memory solves that problem.
 
-The memory tool operates client-side: Claude makes tool calls to perform
-memory operations, and your application executes those operations
-locally. This gives you complete control over where and how the data is
-stored. The API provides the tool protocol and auto-injects a system
-prompt establishing the memory-checking behavior; you implement the
-storage backend.
+The memory tool operates client-side: Claude makes tool calls to perform memory operations, and your application executes those operations locally. This gives you complete control over where and how the data is stored. The API provides the tool protocol and auto-injects a system prompt establishing the memory-checking behavior; you implement the storage backend.
 
 ### How it works under the hood
 
-Here's a minimal sample implementation: a key-value store you write to
-after a session and read from before the next one. Our first-party API
-provides the robust version (the model decides what and when to save as
-part of its reasoning, full file operations, auto-injected protocol
-prompt), but this ~10-line version makes the core pattern concrete.
+Here's a minimal sample implementation: a key-value store you write to after a session and read from before the next one. Our first-party API provides the robust version (the model decides what and when to save as part of its reasoning, full file operations, auto-injected protocol prompt), but this ~10-line version makes the core pattern concrete.

## modified /cells/35/source:
@@ -1,16 +1,8 @@
-The sample above shows the pattern, but it puts you in charge of
-deciding what to save and when to load it. That's exactly the work the
-model is better positioned to do: it knows, mid-reasoning, what facts
-matter and when it needs to recall them.
+The sample above shows the pattern, but it puts you in charge of deciding what to save and when to load it. That's exactly the work the model is better positioned to do: it knows, mid-reasoning, what facts matter and when it needs to recall them.
 
 ### Using the API
 
-Our API provides this natively as the `memory_20250818` tool. The model
-decides what and when to save as part of its tool-use loop, an
-auto-injected system prompt establishes the protocol ("always view your
-memory directory before doing anything else"), and the tool offers full
-file operations rather than key-value. This is a client-side tool: the API
-provides the protocol, you implement the file backend.
+Our API provides this natively as the `memory_20250818` tool. The model decides what and when to save as part of its tool-use loop, an auto-injected system prompt establishes the protocol ("always view your memory directory before doing anything else"), and the tool offers full file operations rather than key-value. This is a client-side tool: the API provides the protocol, you implement the file backend.
 
 **API Documentation:** [Memory tool — platform.claude.com](https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool)
 

## modified /cells/37/source:
@@ -8,13 +8,8 @@
 
 To see the effect concretely, we run the agent across three sessions:
 
-1. **Session 1** does the initial research pass and writes its findings
-   to `/memories`.
-2. **Session 2 (without memory)** runs a follow-up task with an empty
-   memory directory. It has to rediscover everything from scratch.
-3. **Session 2 (with memory)** runs the same follow-up task but with
-   access to Session 1's saved files. It reads those first and builds on
-   them instead of re-researching.
+1. **Session 1** does the initial research pass and writes its findings to `/memories`.
+2. **Session 2 (without memory)** runs a follow-up task with an empty memory directory. It has to rediscover everything from scratch.
+3. **Session 2 (with memory)** runs the same follow-up task but with access to Session 1's saved files. It reads those first and builds on them instead of re-researching.
 
-The comparison between the two Session 2 runs is where the memory benefit
-becomes visible.
+The comparison between the two Session 2 runs is where the memory benefit becomes visible.

## modified /cells/42/source:
@@ -1,19 +1,7 @@
 ### Analysis
 
-The comparison makes the benefit concrete. Session 2 without memory has
-nothing to draw on; `/memories` is empty, so it has to go back to the
-source documents to rediscover the same facts. Session 2 with memory
-opens by reading `/memories` (the auto-injected protocol makes this a
-default first move), finds Session 1's saved findings, and can build a
-synthesis from those instead of re-reading every source document.
+The comparison makes the benefit concrete. Session 2 without memory has nothing to draw on; `/memories` is empty, so it has to go back to the source documents to rediscover the same facts. Session 2 with memory opens by reading `/memories` (the auto-injected protocol makes this a default first move), finds Session 1's saved findings, and can build a synthesis from those instead of re-reading every source document.
 
-This is just-in-time retrieval in practice: rather than loading all prior
-knowledge into the first prompt, the agent pulls the relevant pieces from
-memory on demand. The file-read counts and final context in the bar
-chart quantify the difference directly.
+This is just-in-time retrieval in practice: rather than loading all prior knowledge into the first prompt, the agent pulls the relevant pieces from memory on demand. The file-read counts and final context in the bar chart quantify the difference directly.
 
-What memory gets you is cross-session persistence with lossless fidelity
-on whatever the agent chose to save. What it doesn't get you is any help
-with in-session context growth (Session 1's peak context is still high)
-and it adds tool-call overhead for every read and write. Memory solves the
-cross-session problem; clearing and compaction solve the in-session one.
+What memory gets you is cross-session persistence with lossless fidelity on whatever the agent chose to save. What it doesn't get you is any help with in-session context growth (Session 1's peak context is still high) and it adds tool-call overhead for every read and write. Memory solves the cross-session problem; clearing and compaction solve the in-session one.

## modified /cells/43/source:
@@ -1,33 +1,11 @@
 ### Implementing memory effectively
 
-The `memory_20250818` tool auto-injects a system prompt establishing a
-check-memory-first protocol and an assume-interruption mindset ("ALWAYS
-VIEW YOUR MEMORY DIRECTORY BEFORE DOING ANYTHING ELSE... Your context
-window might be reset at any moment"). This handles the basic mechanics.
-Beyond that, the
-[memory tool docs](https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool#prompting-guidance)
-describe several ways to shape what the model saves:
+The `memory_20250818` tool auto-injects a system prompt establishing a check-memory-first protocol and an assume-interruption mindset ("ALWAYS VIEW YOUR MEMORY DIRECTORY BEFORE DOING ANYTHING ELSE... Your context window might be reset at any moment"). This handles the basic mechanics. Beyond that, the [memory tool docs](https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool#prompting-guidance) describe several ways to shape what the model saves:
 
-**Topical guidance.** You can steer what gets written with a simple
-system-prompt instruction: `"Only write down information relevant to
-<topic> in your memory system."` For this cookbook's research agent,
-that might be "save comparative findings and key figures, not raw
-document contents."
+**Topical guidance.** You can steer what gets written with a simple system-prompt instruction: `"Only write down information relevant to <topic> in your memory system."` For this cookbook's research agent, that might be "save comparative findings and key figures, not raw document contents."
 
-**Keeping `/memories` organized.** If you observe the model creating
-cluttered memory files, try adding: `"when editing your
-memory folder, always try to keep its content up-to-date, coherent and
-organized. You can rename or delete files that are no longer relevant.
-Do not create new files unless necessary."` This keeps the directory
-from accumulating half-overlapping notes across sessions.
+**Keeping `/memories` organized.** If you observe the model creating cluttered memory files, try adding: `"when editing your memory folder, always try to keep its content up-to-date, coherent and organized. You can rename or delete files that are no longer relevant. Do not create new files unless necessary."` This keeps the directory from accumulating half-overlapping notes across sessions.
 
-**Initializer-session structure.** For multi-session work, try running a dedicated first session that sets up memory
-artifacts before substantive work begins: a progress log, a feature
-checklist, references to any setup scripts. Subsequent sessions open by
-reading those artifacts to recover state. Pre-seeding `/memories` this
-way gives later sessions a consistent structure to work within instead
-of each session inventing its own organization.
+**Initializer-session structure.** For multi-session work, try running a dedicated first session that sets up memory artifacts before substantive work begins: a progress log, a feature checklist, references to any setup scripts. Subsequent sessions open by reading those artifacts to recover state. Pre-seeding `/memories` this way gives later sessions a consistent structure to work within instead of each session inventing its own organization.
 
-**Storage hygiene.** On the client-side, you can also track file sizes to prevent unbounded growth, consider clearing out
-memory files that haven't been accessed in an extended time, and
-validate against path traversal.
+**Storage hygiene.** On the client-side, you can also track file sizes to prevent unbounded growth, consider clearing out memory files that haven't been accessed in an extended time, and validate against path traversal.

## modified /cells/44/source:
@@ -8,7 +8,4 @@
 | **Clearing** | Tool results in the current window | Old tool results are gone from context (must re-fetch if needed again) | Tool-result bloat |
 | **Memory** | External storage, across windows | Tool-call overhead; only as good as what the agent chose to save | Cross-session persistence |
 
-The chart below puts the three solo runs side by side, plus the baseline.
-Note that memory's Session 2 is a different task (follow-up synthesis) so
-the absolute numbers aren't directly comparable to the others; what
-matters for memory is the S2-with vs. S2-without comparison shown above.
+The chart below puts the three solo runs side by side, plus the baseline. Note that memory's Session 2 is a different task (follow-up synthesis) so the absolute numbers aren't directly comparable to the others; what matters for memory is the S2-with vs. S2-without comparison shown above.

## modified /cells/46/source:
@@ -1,5 +1 @@
-The three primitives address different slices of the context problem,
-which is why they compose rather than compete. Clearing and compaction
-manage what's inside the current window; memory moves information out of
-the window so it survives across sessions. Which ones you need depends on
-which parts of the problem your workload actually hits.
+The three primitives address different slices of the context problem, which is why they compose rather than compete. Clearing and compaction manage what's inside the current window; memory moves information out of the window so it survives across sessions. Which ones you need depends on which parts of the problem your workload actually hits.

## modified /cells/47/source:
@@ -2,27 +2,9 @@
 
 ## Using Them Together
 
-The three primitives target different parts of the context problem, so
-they can be layered. [Claude
-Code](https://code.claude.com/docs/en/memory) is a real-world example
-that employs compaction alongside
-[two complementary memory systems](https://code.claude.com/docs/en/memory#claudemd-vs-auto-memory):
-`CLAUDE.md` files hold user-defined instructions and rules (coding
-standards, project architecture, workflows) that the developer writes
-and checks into source control; auto memory holds learnings and
-patterns Claude writes itself (build commands, debugging insights,
-preferences discovered from corrections). Both are useful forms of
-memory for Claude Code.
+The three primitives target different parts of the context problem, so they can be layered. [Claude Code](https://code.claude.com/docs/en/memory) is a real-world example that employs compaction alongside [two complementary memory systems](https://code.claude.com/docs/en/memory#claudemd-vs-auto-memory): `CLAUDE.md` files hold user-defined instructions and rules (coding standards, project architecture, workflows) that the developer writes and checks into source control; auto memory holds learnings and patterns Claude writes itself (build commands, debugging insights, preferences discovered from corrections). Both are useful forms of memory for Claude Code.
 
-The Claude Code design shows that memory can take different shapes for
-the same agent; one form written by the user, another written by the
-model. The same applies to compaction and clearing: each has
-configuration knobs (trigger thresholds, custom instructions, which
-tools to exclude) that let you tune behavior to your use case. This is
-why the prompting and configuration guidance in the "Implementing
-effectively" sections above matters: the default behavior is a
-starting point, but the right settings depend on what your agent
-actually does.
+The Claude Code design shows that memory can take different shapes for the same agent; one form written by the user, another written by the model. The same applies to compaction and clearing: each has configuration knobs (trigger thresholds, custom instructions, which tools to exclude) that let you tune behavior to your use case. This is why the prompting and configuration guidance in the "Implementing effectively" sections above matters: the default behavior is a starting point, but the right settings depend on what your agent actually does.
 
 > **Note on `exclude_tools`**: when combining clearing with the memory tool,
 > the `exclude_tools: ["memory"]` setting (shown in the config below)
@@ -31,8 +13,7 @@ actually does.
 > [memory tool docs](https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool#using-with-context-editing)
 > recommend this explicitly when layering the two.
 
-Below we run the research agent with all three primitives active at once
-and trace what each one does over the course of the session.
+Below we run the research agent with all three primitives active at once and trace what each one does over the course of the session.
 
 > **Note on the config**: both triggers are set above the first
 > batch's size (~167K) so the trajectory tracks the baseline through

## modified /cells/51/source:
@@ -1,37 +1,15 @@
 ### What the timeline shows
 
-With all three primitives active, each one activated for its own reason
-during the session. The trajectory tracked the baseline through batch
-1: both triggers sit above the first batch's size, so neither edit
-fired until batch 2 pushed context past ~330K. At that point clearing
-dropped the earliest reads and compaction summarized what remained,
-letting the agent continue. Memory was active throughout, with the
-agent checking `/memories` at the start and saving its comparative
-notes for future sessions. The timeline above shows all three
-cooperating across one session.
+With all three primitives active, each one activated for its own reason during the session. The trajectory tracked the baseline through batch 1: both triggers sit above the first batch's size, so neither edit fired until batch 2 pushed context past ~330K. At that point clearing dropped the earliest reads and compaction summarized what remained, letting the agent continue. Memory was active throughout, with the agent checking `/memories` at the start and saving its comparative notes for future sessions. The timeline above shows all three cooperating across one session.
 
-Getting the primitives to split the work usefully takes some 
-tuning; plan to experiment with the values against your own workload.
+Getting the primitives to split the work usefully takes some tuning; plan to experiment with the values against your own workload.
 
-The point isn't that running all three produces the "best" numbers; it's
-that they each handle a different part of the problem when that problem
-actually arises. The useful question isn't "should I use all three?" but
-"which of the three problems does my workload actually have?"
+The point isn't that running all three produces the "best" numbers; it's that they each handle a different part of the problem when that problem actually arises. The useful question isn't "should I use all three?" but "which of the three problems does my workload actually have?"
 
 ### When you might NOT want a primitive
 
-Not every workload needs every tool. A few cases where you'd
-deliberately leave one out:
+Not every workload needs every tool. A few cases where you'd deliberately leave one out:
 
-- **Skip memory** if you want each session to start fresh. A user-facing
-  chatbot where every conversation should be independent doesn't need
-  cross-session persistence; adding memory would carry state you don't
-  want.
-- **Skip compaction** if your sessions are short enough to stay under the
-  context limit naturally. Compaction is lossy (specific details get
-  summarized away), so if you don't need the headroom, you're paying
-  fidelity for nothing.
-- **Skip clearing** if the agent genuinely needs to see past tool results
-  in full. An agent doing cross-document analysis where it compares
-  passages side by side can't re-fetch its way back to a cleared result
-  fast enough; clearing would force redundant reads.
+- **Skip memory** if you want each session to start fresh. A user-facing chatbot where every conversation should be independent doesn't need cross-session persistence; adding memory would carry state you don't want.
+- **Skip compaction** if your sessions are short enough to stay under the context limit naturally. Compaction is lossy (specific details get summarized away), so if you don't need the headroom, you're paying fidelity for nothing.
+- **Skip clearing** if the agent genuinely needs to see past tool results in full. An agent doing cross-document analysis where it compares passages side by side can't re-fetch its way back to a cleared result fast enough; clearing would force redundant reads.

## modified /cells/52/source:
@@ -4,45 +4,19 @@
 
 ### Lessons from the experiments
 
-Running the research agent under these different configurations surfaces
-a few practical lessons:
-
-**The shape of the trajectory reflects what each tool does.** When
-clearing fires you see a step-down on the turn where old tool results
-were removed; in longer sessions this can repeat as context climbs back
-over the trigger. Compaction produces a larger drop each time it fires,
-since the summary replaces an entire run of turns rather than just the
-tool results within them. The plots in this cookbook are meant to
-make those effects visible, so you can see concretely what changes when
-you turn a knob. Which tool fits your workload is a separate question,
-driven by what the agent needs to do.
-
-**Lossiness is a spectrum, not a binary.** Clearing is lossless as long
-as the tool is re-callable. Compaction is lossy in a controlled way: the
-summarizer prompt (default or custom) determines what survives. Memory
-is lossless on what gets saved but is only as good as the agent's
-judgment about what to save. Each primitive trades fidelity differently.
-
-**Layering adds capability and complexity in equal measure.** Using all
-three together covers more of the context problem, but also means more
-knobs to tune and more interactions to trace. The useful question before
-adding a primitive is what specific problem in your workload it solves.
-
-**On larger context windows.** With Sonnet 4.6 and Opus 4.6 providing 1M-token
-context, that headroom is useful: more verbatim detail can stay around, and
-lossy operations can be spaced out. But as the baseline's context
-breakdown showed, the working set on a 1M model fills with stale tool
-results just as fast as on a 200K model; the difference is where the
-hard limit sits, not how quickly context accumulates.
-Context rot and prefill latency scale with how much is in the window,
-not with the window's limit, so keeping the working set lean is still
-worth doing even when the hard wall is far away.
+Running the research agent under these different configurations surfaces a few practical lessons:
+
+**The shape of the trajectory reflects what each tool does.** When clearing fires you see a step-down on the turn where old tool results were removed; in longer sessions this can repeat as context climbs back over the trigger. Compaction produces a larger drop each time it fires, since the summary replaces an entire run of turns rather than just the tool results within them. The plots in this cookbook are meant to make those effects visible, so you can see concretely what changes when you turn a knob. Which tool fits your workload is a separate question, driven by what the agent needs to do.
+
+**Lossiness is a spectrum, not a binary.** Clearing is lossless as long as the tool is re-callable. Compaction is lossy in a controlled way: the summarizer prompt (default or custom) determines what survives. Memory is lossless on what gets saved but is only as good as the agent's judgment about what to save. Each primitive trades fidelity differently.
+
+**Layering adds capability and complexity in equal measure.** Using all three together covers more of the context problem, but also means more knobs to tune and more interactions to trace. The useful question before adding a primitive is what specific problem in your workload it solves.
+
+**On larger context windows.** With Sonnet 4.6 and Opus 4.6 providing 1M-token context, that headroom is useful: more verbatim detail can stay around, and lossy operations can be spaced out. But as the baseline's context breakdown showed, the working set on a 1M model fills with stale tool results just as fast as on a 200K model; the difference is where the hard limit sits, not how quickly context accumulates. Context rot and prefill latency scale with how much is in the window, not with the window's limit, so keeping the working set lean is still worth doing even when the hard wall is far away.
 
 ### Thinking about your workload
 
-This table sketches workload characteristics and which primitive is
-worth trying first. Treat these as hypotheses to test on your own agent,
-not as answers. Every workload has quirks a table can't capture.
+This table sketches workload characteristics and which primitive is worth trying first. Treat these as hypotheses to test on your own agent, not as answers. Every workload has quirks a table can't capture.
 
 | If your workload has... | Worth trying first | Watch for |
 |---|---|---|
@@ -56,48 +30,18 @@ not as answers. Every workload has quirks a table can't capture.
 
 ### What this cookbook didn't cover
 
-**Tuning beyond the basics.** The "Implementing effectively" sections
-above give you a starting point for each primitive. The next step is
-experimentation: different use cases will get different value out of
-the same primitive depending on parameters and prompts. A coding agent
-and a research agent might both use compaction, but the `instructions`
-string that works for one won't work for the other; the same is true
-of clearing thresholds and what you guide the model to write to
-`/memories`.
-
-Setting up a test harness helps here. For a simple example, the agent loop in this
-cookbook (`run_research_session`) returns `token_trajectory`,
-`events`, and `tool_counts`: you can run
-your agent under a handful of configs, plot the trajectories side by
-side, and measure what matters to you (task quality, latency, token
-spend).
-
-**Adjacent features.** [Programmatic tool calling
-(PTC)](https://www.anthropic.com/engineering/advanced-tool-use) prevents
-large results from entering context at all by running tools inside a
-model-authored program, which is a different approach to the tool-bloat
-problem. [Tool search](https://www.anthropic.com/engineering/advanced-tool-use)
-trims tool-definition bloat when you have many tools.
+**Tuning beyond the basics.** The "Implementing effectively" sections above give you a starting point for each primitive. The next step is experimentation: different use cases will get different value out of the same primitive depending on parameters and prompts. A coding agent and a research agent might both use compaction, but the `instructions` string that works for one won't work for the other; the same is true of clearing thresholds and what you guide the model to write to `/memories`.
+
+Setting up a test harness helps here. For a simple example, the agent loop in this cookbook (`run_research_session`) returns `token_trajectory`, `events`, and `tool_counts`: you can run your agent under a handful of configs, plot the trajectories side by side, and measure what matters to you (task quality, latency, token spend).
+
+**Adjacent features.** [Programmatic tool calling (PTC)](https://www.anthropic.com/engineering/advanced-tool-use) prevents large results from entering context at all by running tools inside a model-authored program, which is a different approach to the tool-bloat problem. [Tool search](https://www.anthropic.com/engineering/advanced-tool-use) trims tool-definition bloat when you have many tools.
 
 ### Related reading
 
-The [Memory Cookbook](https://platform.claude.com/cookbook/tool-use-memory-cookbook)
-goes deeper on memory patterns with a code-review agent, and the
-[Compaction Cookbook](https://platform.claude.com/cookbook/tool-use-automatic-context-compaction)
-covers compaction in isolation. For
-a detailed case study of context management techniques in a multi-session
-software agent, see [Effective harnesses for long-running
-agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents).
+The [Memory Cookbook](https://platform.claude.com/cookbook/tool-use-memory-cookbook) goes deeper on memory patterns with a code-review agent, and the [Compaction Cookbook](https://platform.claude.com/cookbook/tool-use-automatic-context-compaction) covers compaction in isolation. For a detailed case study of context management techniques in a multi-session software agent, see [Effective harnesses for long-running agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents).
 
 ## Conclusion
 
-This notebook walked through three context-management primitives for
-long-running agents: compaction to compress conversational history,
-tool-result clearing to drop re-fetchable tool output, and the memory
-tool to persist knowledge across sessions. Each addresses a different
-slice of the context problem.
+This notebook walked through three context-management primitives for long-running agents: compaction to compress conversational history, tool-result clearing to drop re-fetchable tool output, and the memory tool to persist knowledge across sessions. Each addresses a different slice of the context problem.
 
-Which primitives matter for your agent depends on where its context growth
-actually comes from. The configs and agent loop in this cookbook are a
-starting point for running your own workload under different
-configurations and seeing what changes.
+Which primitives matter for your agent depends on where its context growth actually comes from. The configs and agent loop in this cookbook are a starting point for running your own workload under different configurations and seeing what changes.

Generated by nbdime

@github-actions

Copy link
Copy Markdown

@isabella-anthropic isabella-anthropic merged commit 50871d8 into main Mar 31, 2026
8 checks passed

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review

Recommendation: APPROVE

Summary

This PR makes purely mechanical formatting changes to the context engineering notebook, joining hard-wrapped prose lines (~80 chars) into single-line paragraphs across 25 markdown cells to fix rendering in the cookbook UI.

Actionable Feedback

No actionable issues found. This PR is clean and ready to merge.

Detailed Review

Code Quality: Changes are exactly as described. All 25 affected markdown cells had their hard-wrapped prose joined into single-line paragraphs. No prose content was altered - words, punctuation, hyperlinks, and inline code are unchanged. All list items, headers, blockquotes, tables, and fenced code blocks are correctly preserved.

Security: No security concerns - purely a formatting change.

Positive Notes:

  • All 27 code cells are byte-identical to main (none appear in the diff).
  • All output blocks and execution counts are preserved.
  • Single-line paragraphs render consistently across all Markdown environments - this is the right fix for the cookbook UI rendering issue.
  • The PR description is accurate about what was and was not modified.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants