Checked other resources
- This is a bug, not a usage question.
- I added a clear and descriptive title that summarizes this issue.
- I used the GitHub search to find a similar question and didn't find it.
- I am sure that this is a bug in LangChain rather than my code.
- The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
- This is not related to the langchain-community package.
- I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.
Package (Required)
- langchain
- langchain-openai
- langchain-anthropic
- langchain-classic
- langchain-core
- langchain-cli
- langchain-model-profiles
- langchain-tests
- langchain-text-splitters
- langchain-chroma
- langchain-deepseek
- langchain-exa
- langchain-fireworks
- langchain-groq
- langchain-huggingface
- langchain-mistralai
- langchain-nomic
- langchain-ollama
- langchain-perplexity
- langchain-prompty
- langchain-qdrant
- langchain-xai
- Other / not sure / general
Reproduction Steps / Example Code (Python)
from langchain_anthropic import ChatAnthropic
from langchain.agents import create_agent
from langchain.agents.middleware.summarization import SummarizationMiddleware
from langchain_core.tools import tool


@tool
def get_data(source: str) -> str:
    """Fetch data from a source."""
    return f"Data from {source}: [sample data]"


@tool
def process_data(data: str) -> str:
    """Process the fetched data."""
    return f"Processed: {data}"


@tool
def format_output(processed: str) -> str:
    """Format the final output."""
    return f"Formatted: {processed}"


model = ChatAnthropic(
    model_name="claude-sonnet-4-5-20250929",
    max_tokens=8000,  # must exceed budget_tokens; the default (1024) is too small
    thinking={"type": "enabled", "budget_tokens": 4000},
)
agent = create_agent(
    model=model,
    tools=[get_data, process_data, format_output],
    middleware=[
        SummarizationMiddleware(
            model=ChatAnthropic(model_name="claude-sonnet-4-5-20250929"),
            trigger=("messages", 2),  # aggressive trigger to reproduce the bug
            keep=("messages", 2),
        )
    ],
)
# This will trigger multiple tool calls in sequence:
# 1. User: asks question
# 2. AI: [thinking] + [tool_call: get_data]
# 3. [tool_result]
# 4. AI: [thinking] + [tool_call: process_data] ← Summarization triggers here
# (Removes thinking blocks from step 2, breaking the turn)
# 5. [tool_result]
# 6. AI: [thinking] + [tool_call: format_output]
# 7. API ERROR: Missing thinking block in turn
result = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "Get data from 'database', process it, and format the output"
    }]
})
Error Message and Stack Trace (if applicable)
anthropic.BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'messages.1.content.0.type: Expected `thinking` or `redacted_thinking`, but found `text`. When `thinking` is enabled, a final `assistant` message must start with a thinking block (preceeding the lastmost set of `tool_use` and `tool_result` blocks). We recommend you include thinking blocks from previous turns. To avoid this requirement, disable `thinking`. Please consult our documentation at https://docs.claude.com/en/docs/build-with-claude/extended-thinking'}, 'request_id': 'req_<redacted>'}
Description
When using Claude with extended thinking (thinking={"type": "enabled"}) alongside SummarizationMiddleware, the middleware can remove thinking blocks from assistant messages during active tool use sequences, causing Anthropic API errors.
According to Anthropic's extended thinking documentation, thinking blocks must be preserved during the entire "assistant turn" - which includes tool use sequences. An assistant turn is only complete after the final response with no pending tool calls.
During tool use, thinking blocks must be passed back to the API complete and unmodified. This is critical for maintaining the model's reasoning flow and conversation integrity.
When Claude invokes tools, it is pausing its construction of a response to await external information. When tool results are returned, Claude will continue building that existing response. This necessitates preserving thinking blocks during tool use, for a couple of reasons:
- Reasoning continuity: The thinking blocks capture Claude's step-by-step reasoning that led to tool requests. When you post tool results, including the original thinking ensures Claude can continue its reasoning from where it left off.
- Context maintenance: While tool results appear as user messages in the API structure, they're part of a continuous reasoning flow. Preserving thinking blocks maintains this conceptual flow across multiple API calls. For more information on context management, see our guide on context windows.
The Problem: If summarization triggers in the middle of a multi-step tool use sequence, it removes thinking blocks from earlier AI messages in the same turn, violating Anthropic's requirement and causing the API to reject subsequent requests.
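For concreteness, here is a hand-written approximation (not actual middleware output) of what the history looks like after an aggressive summarization, expressed in Anthropic's content-block format:

# Illustrative only: a hand-constructed approximation of the post-summarization
# history. Note the assistant message whose leading thinking block is gone
# while its tool_use is still part of the open turn.
messages = [
    {"role": "user", "content": "Summary of the conversation so far: ..."},
    {
        "role": "assistant",
        "content": [
            # The {"type": "thinking", ...} block that used to lead this
            # message was dropped by summarization.
            {"type": "tool_use", "id": "toolu_01", "name": "process_data",
             "input": {"data": "Data from database: [sample data]"}},
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": "toolu_01",
             "content": "Processed: Data from database: [sample data]"},
        ],
    },
]
# With thinking enabled, the assistant message preceding the latest
# tool_use/tool_result pair must start with a thinking block, so the next
# request is rejected with the 400 shown above.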
Note: The latest version of SummarizationMiddleware (#34609) has improved AI/Tool message pair handling in _find_safe_cutoff_point() by searching backward to find matching AIMessages. However, the middleware still lacks turn-awareness and doesn't understand when thinking blocks need to be preserved.
Expected Behavior
The middleware should understand that an assistant turn includes all AI responses up to and including the first one that makes no tool calls.
For example, this entire sequence is ONE turn:
User: "Get data, process it, format it"
AI: [thinking] + [tool_call: get_data] ← Turn starts
[tool_result]
AI: [thinking] + [tool_call: process_data] ← Still same turn
[tool_result]
AI: [thinking] + [tool_call: format_output] ← Still same turn
[tool_result]
AI: [thinking] + [text: "Here's the result"] ← Turn completes (no tool calls)
All thinking blocks from all AI messages within this turn must be preserved until the turn completes.
From Anthropic's documentation:
From the model's perspective, tool use loops are part of the assistant turn. An assistant turn doesn't complete until Claude finishes its full response, which may include multiple tool calls and results.
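Mechanically, that turn boundary can be detected from message types alone. A minimal sketch, assuming LangChain's message classes (the helper name is hypothetical, not an existing middleware API):

from langchain_core.messages import AIMessage, BaseMessage, HumanMessage

def index_of_open_turn_start(messages: list[BaseMessage]) -> int | None:
    """Hypothetical helper: index where the still-open assistant turn starts,
    or None if the last turn already completed."""
    if not messages:
        return None
    last = messages[-1]
    if isinstance(last, AIMessage) and not last.tool_calls:
        return None  # final AI message made no tool calls: turn is complete
    # Otherwise everything after the most recent HumanMessage (AI messages
    # with thinking blocks, tool calls, and ToolMessage results) is one turn.
    for i in range(len(messages) - 1, -1, -1):
        if isinstance(messages[i], HumanMessage):
            return i + 1
    return 0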
Proposed Solution
The _find_safe_cutoff_point() method needs to be enhanced with turn-awareness (potentially only for Anthropic models with extended thinking). Two possible approaches:
Approach 1: Preserve Entire Turn (Recommended)
Detect when an assistant turn is in progress and preserve all messages from the start of that turn. A turn is complete only when an AI message has no tool calls or a new HumanMessage appears. (A sketch follows the list below.)
Benefits:
- Simple to implement
- Safest approach - fully complies with Anthropic's requirement to preserve all thinking blocks in active turns
- Clear semantics
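A rough sketch of how Approach 1 could clamp the existing cutoff (the wrapper name is hypothetical; it reuses the helper sketched under Expected Behavior):

def clamp_cutoff_to_turn_start(
    messages: list[BaseMessage], proposed_cutoff: int
) -> int:
    """Hypothetical wrapper around the result of _find_safe_cutoff_point():
    never let the cutoff land inside an open assistant turn, so every
    thinking/tool_use/tool_result block of that turn survives."""
    turn_start = index_of_open_turn_start(messages)
    if turn_start is None:
        return proposed_cutoff  # no turn in progress: cutoff is already safe
    return min(proposed_cutoff, turn_start)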
Approach 2: Selective Preservation
Keep the first AI message with thinking blocks in the turn, but allow summarization of intermediate tool results and responses (sketched after the challenges list below).
Benefits:
- More token-efficient for very long turns with many sequential tool calls
- Could handle edge cases with 20+ tool calls in a single turn
Challenges:
- More complex to implement
- Potential for model performance degradation, since the model loses the full context of its previous tool calls
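For completeness, a shape-of-the-idea sketch of Approach 2 (entirely hypothetical and untested; whether the API accepts a turn whose intermediate messages were summarized away is exactly the open question above):

def selectively_preserve(turn_messages: list[BaseMessage]) -> list[BaseMessage]:
    """Hypothetical Approach 2: within an open turn, keep the first AIMessage
    (it carries the thinking block that opened the turn) plus the most recent
    tool exchange, and hand everything in between to the summarizer."""
    first_ai = next(
        i for i, m in enumerate(turn_messages) if isinstance(m, AIMessage)
    )
    head = turn_messages[: first_ai + 1]
    tail = turn_messages[-2:]  # latest tool-calling AIMessage + its result
    # turn_messages[first_ai + 1 : -2] would be summarized separately
    return head + tail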
System Info
System Information
OS: Darwin
OS Version: Darwin Kernel Version 25.2.0: Tue Nov 18 21:09:56 PST 2025; root:xnu-12377.61.12~1/RELEASE_ARM64_T6041
Python Version: 3.13.9 (main, Oct 14 2025, 21:10:40) [Clang 20.1.4 ]
Package Information
langchain_core: 1.2.7
langchain: 1.2.6
langsmith: 0.6.4
langchain_anthropic: 1.3.1
langgraph_sdk: 0.3.3
Optional packages not installed
langserve
Other Dependencies
anthropic: 0.76.0
httpx: 0.28.1
jsonpatch: 1.33
langgraph: 1.0.6
orjson: 3.11.5
packaging: 25.0
pydantic: 2.12.5
pyyaml: 6.0.3
requests: 2.32.5
requests-toolbelt: 1.0.0
tenacity: 9.1.2
typing-extensions: 4.15.0
uuid-utils: 0.13.0
zstandard: 0.25.0