
SummarizationMiddleware breaks Anthropic extended thinking by removing thinking blocks during active assistant turns #34794

@Nilesh3105


Checked other resources

  • This is a bug, not a usage question.
  • I added a clear and descriptive title that summarizes this issue.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • This is not related to the langchain-community package.
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Package (Required)

  • langchain
  • langchain-openai
  • langchain-anthropic
  • langchain-classic
  • langchain-core
  • langchain-cli
  • langchain-model-profiles
  • langchain-tests
  • langchain-text-splitters
  • langchain-chroma
  • langchain-deepseek
  • langchain-exa
  • langchain-fireworks
  • langchain-groq
  • langchain-huggingface
  • langchain-mistralai
  • langchain-nomic
  • langchain-ollama
  • langchain-perplexity
  • langchain-prompty
  • langchain-qdrant
  • langchain-xai
  • Other / not sure / general

Reproduction Steps / Example Code (Python)

from langchain_anthropic import ChatAnthropic
from langchain.agents import create_agent
from langchain.agents.middleware.summarization import SummarizationMiddleware
from langchain_core.tools import tool

@tool
def get_data(source: str) -> str:
    """Fetch data from a source."""
    return f"Data from {source}: [sample data]"

@tool  
def process_data(data: str) -> str:
    """Process the fetched data."""
    return f"Processed: {data}"

@tool
def format_output(processed: str) -> str:
    """Format the final output."""
    return f"Formatted: {processed}"

model = ChatAnthropic(
    model_name="claude-sonnet-4-5-20250929",
    thinking={"type": "enabled", "budget_tokens": 4000}
)

agent = create_agent(
    model=model,
    tools=[get_data, process_data, format_output],
    middleware=[
        SummarizationMiddleware(
            model=ChatAnthropic(model_name="claude-sonnet-4-5-20250929"),
            trigger=("messages", 2),  # Aggressive trigger to reproduce the bug
            keep=("messages", 2)
        )
    ]
)

# This will trigger multiple tool calls in sequence:
# 1. User: asks question
# 2. AI: [thinking] + [tool_call: get_data]
# 3. [tool_result]
# 4. AI: [thinking] + [tool_call: process_data]  ← Summarization triggers here
#    (Removes thinking blocks from step 2, breaking the turn)
# 5. [tool_result]
# 6. AI: [thinking] + [tool_call: format_output]
# 7. API ERROR: Missing thinking block in turn

result = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "Get data from 'database', process it, and format the output"
    }]
})

Error Message and Stack Trace (if applicable)

anthropic.BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'messages.1.content.0.type: Expected `thinking` or `redacted_thinking`, but found `text`. When `thinking` is enabled, a final `assistant` message must start with a thinking block (preceeding the lastmost set of `tool_use` and `tool_result` blocks). We recommend you include thinking blocks from previous turns. To avoid this requirement, disable `thinking`. Please consult our documentation at https://docs.claude.com/en/docs/build-with-claude/extended-thinking'}, 'request_id': 'req_<redacted>'}

Description

When using Claude with extended thinking (thinking={"type": "enabled"}) alongside SummarizationMiddleware, the middleware can remove thinking blocks from assistant messages during active tool use sequences, causing Anthropic API errors.

According to Anthropic's extended thinking documentation, thinking blocks must be preserved for the entire "assistant turn", which includes tool use sequences. An assistant turn is only complete after the final response with no pending tool calls.

During tool use, you must pass thinking blocks back to the API, complete and unmodified. This is critical for maintaining the model's reasoning flow and conversation integrity.

When Claude invokes tools, it is pausing its construction of a response to await external information. When tool results are returned, Claude will continue building that existing response. This necessitates preserving thinking blocks during tool use, for a couple of reasons:

  1. Reasoning continuity: The thinking blocks capture Claude's step-by-step reasoning that led to tool requests. When you post tool results, including the original thinking ensures Claude can continue its reasoning from where it left off.
  2. Context maintenance: While tool results appear as user messages in the API structure, they're part of a continuous reasoning flow. Preserving thinking blocks maintains this conceptual flow across multiple API calls. For more information on context management, see our guide on context windows.
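To make the requirement above concrete, here is a sketch of the raw content-block shapes involved in one step of an assistant turn. All field values are illustrative placeholders, not real API output; in particular, real `signature` values are opaque strings returned by the API that must be echoed back byte-for-byte:

```python
# Illustrative shapes of Anthropic content blocks inside one assistant turn.
# With extended thinking enabled, each assistant step carries a "thinking"
# block followed by a "tool_use" block, and both must be sent back verbatim.
assistant_step = {
    "role": "assistant",
    "content": [
        {
            "type": "thinking",
            "thinking": "I should fetch the raw data before processing it.",
            "signature": "<opaque-signature-from-api>",  # placeholder
        },
        {
            "type": "tool_use",
            "id": "toolu_01",  # placeholder id
            "name": "get_data",
            "input": {"source": "database"},
        },
    ],
}

# Tool results come back as a user-role message referencing the tool_use id.
tool_result_step = {
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": "toolu_01",
            "content": "Data from database: [sample data]",
        }
    ],
}

# Dropping the "thinking" block from assistant_step before the next request
# is exactly the mutation that triggers the 400 error quoted above.
```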

The Problem: If summarization triggers in the middle of a multi-step tool use sequence, it removes thinking blocks from earlier AI messages in the same turn, violating Anthropic's requirement and causing the API to reject subsequent requests.

Note: The latest version of SummarizationMiddleware (#34609) has improved AI/Tool message pair handling in _find_safe_cutoff_point() by searching backward to find matching AIMessages. However, the middleware still lacks turn-awareness: it does not know when thinking blocks must be preserved.

Expected Behavior

The middleware should understand that an assistant turn includes all AI responses until one with no tool calls.

For example, this entire sequence is ONE turn:

User: "Get data, process it, format it"
AI: [thinking] + [tool_call: get_data]      ← Turn starts
[tool_result]
AI: [thinking] + [tool_call: process_data]  ← Still same turn
[tool_result]
AI: [thinking] + [tool_call: format_output] ← Still same turn
[tool_result]
AI: [thinking] + [text: "Here's the result"] ← Turn completes (no tool calls)

All thinking blocks from all AI messages within this turn must be preserved until the turn completes.

From Anthropic's documentation:

From the model's perspective, tool use loops are part of the assistant turn. An assistant turn doesn't complete until Claude finishes its full response, which may include multiple tool calls and results.
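The turn boundary described above can be expressed as a backward scan: walk from the end of the history until hitting a message that closes a turn (an AI message with no tool calls, or a user message); everything after that point is one open turn. A minimal sketch, using simplified dict stand-ins for langchain_core message objects (the helper name is hypothetical):

```python
def turn_start_index(messages):
    """Index where the current open assistant turn begins, or
    len(messages) if no turn is in progress.

    Messages are simplified dict stand-ins for langchain_core objects:
    {"role": "human" | "ai" | "tool", "tool_calls": [...]}.
    """
    for i in range(len(messages) - 1, -1, -1):
        msg = messages[i]
        if msg["role"] == "human":
            return i + 1  # a new user message closes the previous turn
        if msg["role"] == "ai" and not msg.get("tool_calls"):
            return i + 1  # final response with no pending tool calls
    return 0  # the whole history is one open turn

history = [
    {"role": "human"},                               # 0: question
    {"role": "ai", "tool_calls": ["get_data"]},      # 1: turn starts
    {"role": "tool"},                                # 2
    {"role": "ai", "tool_calls": ["process_data"]},  # 3: same turn
    {"role": "tool"},                                # 4
]
assert turn_start_index(history) == 1  # indices 1-4 are one open turn
```

No message at or after the returned index can safely lose its thinking blocks until the turn completes.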

Proposed Solution

The _find_safe_cutoff_point() method needs to be made turn-aware (potentially only for Anthropic models with extended thinking enabled). Two possible approaches:

Approach 1: Preserve Entire Turn (Recommended)

Detect when an assistant turn is in progress and preserve all messages from the start of that turn. A turn is complete only when an AI message has no tool calls or a new HumanMessage appears.

Benefits:

  • Simple to implement
  • Safest approach: fully complies with Anthropic's requirement to preserve all thinking blocks in active turns
  • Clear semantics
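Under Approach 1, the existing cutoff search could be wrapped with a turn-aware clamp: compute the candidate cutoff as today, then pull it back to the start of any open turn it would split. A hedged sketch, again on dict stand-ins rather than real langchain message objects (the function name is hypothetical):

```python
def clamp_cutoff_to_turn(messages, cutoff):
    """Pull a proposed summarization cutoff back so it never lands
    inside an in-progress assistant turn.

    Messages are dict stand-ins: {"role": "human" | "ai" | "tool",
    "tool_calls": [...]}; `cutoff` is the index proposed by the
    existing safe-cutoff search.
    """
    # The open turn starts just after the last turn-closing message
    # (a human message, or an AI message with no tool calls).
    turn_start = 0
    for i, msg in enumerate(messages):
        closes_turn = msg["role"] == "human" or (
            msg["role"] == "ai" and not msg.get("tool_calls")
        )
        if closes_turn:
            turn_start = i + 1
    # Never cut past the start of an unfinished turn; everything from
    # turn_start onward keeps its thinking blocks intact.
    return min(cutoff, turn_start)
```

If the history ends on a completed turn, `turn_start` points past the last message and the original cutoff is returned unchanged, so summarization behaves exactly as it does today.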

Approach 2: Selective Preservation

Keep the first AI message with thinking blocks in the turn, but allow summarization of intermediate tool results and responses.

Benefits:

  • More token-efficient for very long turns with many sequential tool calls
  • Could handle edge cases with 20+ tool calls in a single turn

Challenges:

  • More complex to implement
  • Potential for model performance degradation, since the model loses the full context of previous tool calls
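One possible reading of Approach 2, sketched below under stated assumptions: leave the AI steps of the open turn (and their thinking blocks) untouched, but shrink the intermediate tool results with a caller-supplied summarizer, keeping the most recent result whole. The function name and the dict stand-ins are hypothetical, not the middleware's actual API:

```python
def compress_open_turn(messages, summarize):
    """Approach 2 sketch: preserve the open turn's AI messages (and
    their thinking blocks) verbatim, but compress intermediate tool
    results via `summarize(text) -> text`.

    Dict stand-ins as above, with an optional "content" field on
    tool messages.
    """
    # Locate where the open turn starts (backward scan, as in Approach 1).
    start = 0
    for i in range(len(messages) - 1, -1, -1):
        m = messages[i]
        if m["role"] == "human" or (m["role"] == "ai" and not m.get("tool_calls")):
            start = i + 1
            break
    out = list(messages)
    # Compress every tool result in the turn except the latest one,
    # which the model has not yet acted on.
    for i in range(start, len(out) - 1):
        m = out[i]
        if m["role"] == "tool":
            out[i] = {**m, "content": summarize(m.get("content", ""))}
    return out
```

Whether compressed tool results still satisfy the model's expectations is exactly the performance-degradation risk listed above, which is why Approach 1 is the safer default.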

System Info

System Information

OS: Darwin
OS Version: Darwin Kernel Version 25.2.0: Tue Nov 18 21:09:56 PST 2025; root:xnu-12377.61.12~1/RELEASE_ARM64_T6041
Python Version: 3.13.9 (main, Oct 14 2025, 21:10:40) [Clang 20.1.4 ]

Package Information

langchain_core: 1.2.7
langchain: 1.2.6
langsmith: 0.6.4
langchain_anthropic: 1.3.1
langgraph_sdk: 0.3.3

Optional packages not installed

langserve

Other Dependencies

anthropic: 0.76.0
httpx: 0.28.1
jsonpatch: 1.33
langgraph: 1.0.6
orjson: 3.11.5
packaging: 25.0
pydantic: 2.12.5
pyyaml: 6.0.3
requests: 2.32.5
requests-toolbelt: 1.0.0
tenacity: 9.1.2
typing-extensions: 4.15.0
uuid-utils: 0.13.0
zstandard: 0.25.0


Labels

bug, external, langchain
