
SummarizationMiddleware breaks Anthropic extended thinking by removing thinking blocks during active assistant turns #34794

@Nilesh3105


Checked other resources

  • This is a bug, not a usage question.
  • I added a clear and descriptive title that summarizes this issue.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • This is not related to the langchain-community package.
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Package (Required)

  • langchain
  • langchain-openai
  • langchain-anthropic
  • langchain-classic
  • langchain-core
  • langchain-cli
  • langchain-model-profiles
  • langchain-tests
  • langchain-text-splitters
  • langchain-chroma
  • langchain-deepseek
  • langchain-exa
  • langchain-fireworks
  • langchain-groq
  • langchain-huggingface
  • langchain-mistralai
  • langchain-nomic
  • langchain-ollama
  • langchain-perplexity
  • langchain-prompty
  • langchain-qdrant
  • langchain-xai
  • Other / not sure / general

Reproduction Steps / Example Code (Python)

from langchain_anthropic import ChatAnthropic
from langchain.agents import create_agent
from langchain.agents.middleware.summarization import SummarizationMiddleware
from langchain_core.tools import tool

@tool
def get_data(source: str) -> str:
    """Fetch data from a source."""
    return f"Data from {source}: [sample data]"

@tool  
def process_data(data: str) -> str:
    """Process the fetched data."""
    return f"Processed: {data}"

@tool
def format_output(processed: str) -> str:
    """Format the final output."""
    return f"Formatted: {processed}"

model = ChatAnthropic(
    model_name="claude-sonnet-4-5-20250929",
    thinking={"type": "enabled", "budget_tokens": 4000}
)

agent = create_agent(
    model=model,
    tools=[get_data, process_data, format_output],
    middleware=[
        SummarizationMiddleware(
            model=ChatAnthropic(model_name="claude-sonnet-4-5-20250929"),
            trigger=("messages", 2),  # Aggressive trigger to reproduce the bug
            keep=("messages", 2)
        )
    ]
)

# This will trigger multiple tool calls in sequence:
# 1. User: asks question
# 2. AI: [thinking] + [tool_call: get_data]
# 3. [tool_result]
# 4. AI: [thinking] + [tool_call: process_data]  ← Summarization triggers here
#    (Removes thinking blocks from step 2, breaking the turn)
# 5. [tool_result]
# 6. AI: [thinking] + [tool_call: format_output]
# 7. API ERROR: Missing thinking block in turn

result = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "Get data from 'database', process it, and format the output"
    }]
})

Error Message and Stack Trace (if applicable)

anthropic.BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'messages.1.content.0.type: Expected `thinking` or `redacted_thinking`, but found `text`. When `thinking` is enabled, a final `assistant` message must start with a thinking block (preceeding the lastmost set of `tool_use` and `tool_result` blocks). We recommend you include thinking blocks from previous turns. To avoid this requirement, disable `thinking`. Please consult our documentation at https://docs.claude.com/en/docs/build-with-claude/extended-thinking'}, 'request_id': 'req_<redacted>'}

Description

When using Claude with extended thinking (thinking={"type": "enabled"}) alongside SummarizationMiddleware, the middleware can remove thinking blocks from assistant messages during active tool use sequences, causing Anthropic API errors.

According to Anthropic's extended thinking documentation, thinking blocks must be preserved for the entire "assistant turn", which includes tool use sequences. An assistant turn is only complete after the final response with no pending tool calls.

During tool use, you must pass thinking blocks back to the API, complete and unmodified. This is critical for maintaining the model's reasoning flow and conversation integrity.

When Claude invokes tools, it is pausing its construction of a response to await external information. When tool results are returned, Claude will continue building that existing response. This necessitates preserving thinking blocks during tool use, for a couple of reasons:

  1. Reasoning continuity: The thinking blocks capture Claude's step-by-step reasoning that led to tool requests. When you post tool results, including the original thinking ensures Claude can continue its reasoning from where it left off.
  2. Context maintenance: While tool results appear as user messages in the API structure, they're part of a continuous reasoning flow. Preserving thinking blocks maintains this conceptual flow across multiple API calls. For more information on context management, see our guide on context windows.
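To make the requirement above concrete, here is a sketch of the raw content-block shapes involved in one step of an assistant turn. All field values are illustrative placeholders, not real API output; in particular, real `signature` values are opaque strings returned by the API that must be echoed back byte-for-byte:

```python
# Illustrative shapes of Anthropic content blocks inside one assistant turn.
# With extended thinking enabled, each assistant step carries a "thinking"
# block followed by a "tool_use" block, and both must be sent back verbatim.
assistant_step = {
    "role": "assistant",
    "content": [
        {
            "type": "thinking",
            "thinking": "I should fetch the raw data before processing it.",
            "signature": "<opaque-signature-from-api>",  # placeholder
        },
        {
            "type": "tool_use",
            "id": "toolu_01",  # placeholder id
            "name": "get_data",
            "input": {"source": "database"},
        },
    ],
}

# Tool results come back as a user-role message referencing the tool_use id.
tool_result_step = {
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": "toolu_01",
            "content": "Data from database: [sample data]",
        }
    ],
}

# Dropping the "thinking" block from assistant_step before the next request
# is exactly the mutation that triggers the 400 error quoted above.
```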

The Problem: If summarization triggers in the middle of a multi-step tool use sequence, it removes thinking blocks from earlier AI messages in the same turn, violating Anthropic's requirement and causing the API to reject subsequent requests.

Note: The latest version of SummarizationMiddleware (#34609) has improved AI/Tool message pair handling in _find_safe_cutoff_point() by searching backward to find matching AIMessages. However, the middleware still lacks turn-awareness: it does not know when thinking blocks must be preserved.

Expected Behavior

The middleware should understand that an assistant turn includes all AI responses until one with no tool calls.

For example, this entire sequence is ONE turn:

User: "Get data, process it, format it"
AI: [thinking] + [tool_call: get_data]      ← Turn starts
[tool_result]
AI: [thinking] + [tool_call: process_data]  ← Still same turn
[tool_result]
AI: [thinking] + [tool_call: format_output] ← Still same turn
[tool_result]
AI: [thinking] + [text: "Here's the result"] ← Turn completes (no tool calls)

All thinking blocks from all AI messages within this turn must be preserved until the turn completes.

From Anthropic's documentation:

From the model's perspective, tool use loops are part of the assistant turn. An assistant turn doesn't complete until Claude finishes its full response, which may include multiple tool calls and results.
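The turn boundary described above can be expressed as a backward scan: walk from the end of the history until hitting a message that closes a turn (an AI message with no tool calls, or a user message); everything after that point is one open turn. A minimal sketch, using simplified dict stand-ins for langchain_core message objects (the helper name is hypothetical):

```python
def turn_start_index(messages):
    """Index where the current open assistant turn begins, or
    len(messages) if no turn is in progress.

    Messages are simplified dict stand-ins for langchain_core objects:
    {"role": "human" | "ai" | "tool", "tool_calls": [...]}.
    """
    for i in range(len(messages) - 1, -1, -1):
        msg = messages[i]
        if msg["role"] == "human":
            return i + 1  # a new user message closes the previous turn
        if msg["role"] == "ai" and not msg.get("tool_calls"):
            return i + 1  # final response with no pending tool calls
    return 0  # the whole history is one open turn

history = [
    {"role": "human"},                               # 0: question
    {"role": "ai", "tool_calls": ["get_data"]},      # 1: turn starts
    {"role": "tool"},                                # 2
    {"role": "ai", "tool_calls": ["process_data"]},  # 3: same turn
    {"role": "tool"},                                # 4
]
assert turn_start_index(history) == 1  # indices 1-4 are one open turn
```

No message at or after the returned index can safely lose its thinking blocks until the turn completes.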

Proposed Solution

The _find_safe_cutoff_point() method needs to be made turn-aware (potentially only for Anthropic models with extended thinking enabled). Two possible approaches:

Approach 1: Preserve Entire Turn (Recommended)

Detect when an assistant turn is in progress and preserve all messages from the start of that turn. A turn is complete only when an AI message has no tool calls or a new HumanMessage appears.

Benefits:

  • Simple to implement
  • Safest approach: fully complies with Anthropic's requirement to preserve all thinking blocks in active turns
  • Clear semantics
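Under Approach 1, the existing cutoff search could be wrapped with a turn-aware clamp: compute the candidate cutoff as today, then pull it back to the start of any open turn it would split. A hedged sketch, again on dict stand-ins rather than real langchain message objects (the function name is hypothetical):

```python
def clamp_cutoff_to_turn(messages, cutoff):
    """Pull a proposed summarization cutoff back so it never lands
    inside an in-progress assistant turn.

    Messages are dict stand-ins: {"role": "human" | "ai" | "tool",
    "tool_calls": [...]}; `cutoff` is the index proposed by the
    existing safe-cutoff search.
    """
    # The open turn starts just after the last turn-closing message
    # (a human message, or an AI message with no tool calls).
    turn_start = 0
    for i, msg in enumerate(messages):
        closes_turn = msg["role"] == "human" or (
            msg["role"] == "ai" and not msg.get("tool_calls")
        )
        if closes_turn:
            turn_start = i + 1
    # Never cut past the start of an unfinished turn; everything from
    # turn_start onward keeps its thinking blocks intact.
    return min(cutoff, turn_start)
```

If the history ends on a completed turn, `turn_start` points past the last message and the original cutoff is returned unchanged, so summarization behaves exactly as it does today.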

Approach 2: Selective Preservation

Keep the first AI message with thinking blocks in the turn, but allow summarization of intermediate tool results and responses.

Benefits:

  • More token-efficient for very long turns with many sequential tool calls
  • Could handle edge cases with 20+ tool calls in a single turn

Challenges:

  • More complex to implement
  • Potential for model performance degradation, since the model loses the full context of previous tool calls
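One possible reading of Approach 2, sketched below under stated assumptions: leave the AI steps of the open turn (and their thinking blocks) untouched, but shrink the intermediate tool results with a caller-supplied summarizer, keeping the most recent result whole. The function name and the dict stand-ins are hypothetical, not the middleware's actual API:

```python
def compress_open_turn(messages, summarize):
    """Approach 2 sketch: preserve the open turn's AI messages (and
    their thinking blocks) verbatim, but compress intermediate tool
    results via `summarize(text) -> text`.

    Dict stand-ins as above, with an optional "content" field on
    tool messages.
    """
    # Locate where the open turn starts (backward scan, as in Approach 1).
    start = 0
    for i in range(len(messages) - 1, -1, -1):
        m = messages[i]
        if m["role"] == "human" or (m["role"] == "ai" and not m.get("tool_calls")):
            start = i + 1
            break
    out = list(messages)
    # Compress every tool result in the turn except the latest one,
    # which the model has not yet acted on.
    for i in range(start, len(out) - 1):
        m = out[i]
        if m["role"] == "tool":
            out[i] = {**m, "content": summarize(m.get("content", ""))}
    return out
```

Whether compressed tool results still satisfy the model's expectations is exactly the performance-degradation risk listed above, which is why Approach 1 is the safer default.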

System Info

System Information

OS: Darwin
OS Version: Darwin Kernel Version 25.2.0: Tue Nov 18 21:09:56 PST 2025; root:xnu-12377.61.12~1/RELEASE_ARM64_T6041
Python Version: 3.13.9 (main, Oct 14 2025, 21:10:40) [Clang 20.1.4 ]

Package Information

langchain_core: 1.2.7
langchain: 1.2.6
langsmith: 0.6.4
langchain_anthropic: 1.3.1
langgraph_sdk: 0.3.3

Optional packages not installed

langserve

Other Dependencies

anthropic: 0.76.0
httpx: 0.28.1
jsonpatch: 1.33
langgraph: 1.0.6
orjson: 3.11.5
packaging: 25.0
pydantic: 2.12.5
pyyaml: 6.0.3
requests: 2.32.5
requests-toolbelt: 1.0.0
tenacity: 9.1.2
typing-extensions: 4.15.0
uuid-utils: 0.13.0
zstandard: 0.25.0


Labels

bug, external, langchain
