Claude extended thinking 400 error caused by summarizeVirtualTools gpt-4o-mini responses in conversation history #4919

@rlerikse

Bug Description

When using Claude models (Opus 4.6 / Sonnet 4.6) with extended thinking enabled, pipeline-style workflows that span multiple tool calls eventually fail with an API 400 error. The root cause is summarizeVirtualTools responses generated by gpt-4o-mini being included in the conversation history sent to Claude's API.

Error Message

Request Failed: 400 {"message":"messages.9.content.0.type: Expected thinking or redacted_thinking,
but found text. When thinking is enabled, a final assistant message must start with a thinking block
(preceeding the lastmost set of tool_use and tool_result blocks). We recommend you include thinking
blocks from previous turns. To avoid this requirement, disable thinking."}

Reproduction Steps

  1. Open a workspace with agent .md files configured for Claude Opus 4.6 / Sonnet 4.6
  2. Have deferred tool groups active (e.g., SonarQube, Python environment tools)
  3. Start a multi-step agent workflow that uses runSubagent multiple times
  4. Observe that summarizeVirtualTools calls use gpt-4o-mini at session start
  5. After ~8-10 conversation turns (including subagent invocations and user checkpoints), the next Claude API call fails with the 400 error above

Root Cause Analysis

With extended thinking enabled, Claude's API validates the assistant messages in the conversation history. Per the error above, the final assistant message (the one preceding the last set of tool_use and tool_result blocks) must begin with a thinking or redacted_thinking content block. The summarizeVirtualTools mechanism uses gpt-4o-mini-2024-07-18 to generate tool group summaries, and these GPT-4o-mini responses are stored as assistant messages without thinking blocks. When VS Code later constructs the full messages[] array for a Claude API call, a non-thinking assistant message in the validated position violates the API contract.

From debug logs, the sequence is:

[20:57:31.429Z] MODEL_TURN: summarizeVirtualTools | model: gpt-4o-mini-2024-07-18  <- no thinking block
[20:57:31.450Z] MODEL_TURN: summarizeVirtualTools | model: gpt-4o-mini-2024-07-18  <- no thinking block
[20:57:34.657Z] MODEL_TURN: panel/editAgent   | model: claude-sonnet-4              <- has thinking
...
[later]  400 error at messages.9  <- mixed-model history causes validation failure

The error manifests non-deterministically, and only after enough conversation turns accumulate, because Claude's validation depends on which message ends up at the validated index position.
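
For anyone trying to confirm this locally, here is a minimal sketch of a pre-flight check that lists the assistant messages in a history that do not start with a thinking or redacted_thinking block. The `ContentBlock` and `ChatMessage` shapes and the function name are hypothetical simplifications, not the actual Copilot Chat types. The check is also deliberately stricter than the error message, which only names the final assistant message.

```typescript
// Hypothetical, simplified message shapes (assumption: not the real Copilot Chat types).
type ContentBlock = {
  type: "thinking" | "redacted_thinking" | "text" | "tool_use" | "tool_result";
  [key: string]: unknown;
};

interface ChatMessage {
  role: "user" | "assistant";
  content: ContentBlock[];
}

// Returns the indices of assistant messages whose first content block is
// neither `thinking` nor `redacted_thinking` (an empty content array also counts).
function findNonThinkingAssistantMessages(messages: ChatMessage[]): number[] {
  const offenders: number[] = [];
  messages.forEach((message, index) => {
    if (message.role !== "assistant") {
      return;
    }
    const first = message.content[0];
    const startsWithThinking =
      first !== undefined &&
      (first.type === "thinking" || first.type === "redacted_thinking");
    if (!startsWithThinking) {
      offenders.push(index);
    }
  });
  return offenders;
}

// Example: a mixed-model history similar in spirit to the debug log above.
const exampleHistory: ChatMessage[] = [
  { role: "user", content: [{ type: "text", text: "Run the pipeline" }] },
  // Stands in for a gpt-4o-mini summary: assistant role, no thinking block.
  { role: "assistant", content: [{ type: "text", text: "Tool group summary" }] },
  // Stands in for a Claude turn: starts with a thinking block.
  {
    role: "assistant",
    content: [
      { type: "thinking", thinking: "..." },
      { type: "text", text: "Done" },
    ],
  },
];

console.log(findNonThinkingAssistantMessages(exampleHistory)); // [ 1 ]
```

Logging these indices just before the request is sent would show whether the index in the 400 response (messages.9 here) corresponds to a summarizeVirtualTools message.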

Suggested Fixes

  1. Inject redacted_thinking stubs into non-Claude assistant messages before sending them to Claude's API
  2. Strip summarizeVirtualTools responses from the conversation history after the first turn (they are only needed once for tool discovery)
  3. Use the same model (Claude) for summarizeVirtualTools when the primary session model is Claude
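
As a rough illustration of fix 2, the sketch below filters summary responses out of the history before it is sent to Claude. It assumes each stored message carries a `source` tag naming the request that produced it. That field, the `TaggedMessage` shape, and the function name are hypothetical, since the issue does not show how Copilot Chat actually records this.

```typescript
// Hypothetical message shape. The `source` tag is an assumption for illustration only.
interface TaggedMessage {
  role: "user" | "assistant";
  source?: string; // e.g. "summarizeVirtualTools" or "panel/editAgent"
  content: { type: string; text?: string }[];
}

// Drops assistant messages produced by summarizeVirtualTools and keeps
// everything else in its original order. The input array is not mutated.
function stripToolSummaries(history: TaggedMessage[]): TaggedMessage[] {
  return history.filter(
    (message) =>
      !(message.role === "assistant" && message.source === "summarizeVirtualTools")
  );
}

// Example history: one summary response followed by a normal agent exchange.
const taggedHistory: TaggedMessage[] = [
  {
    role: "assistant",
    source: "summarizeVirtualTools",
    content: [{ type: "text", text: "SonarQube tools: ..." }],
  },
  { role: "user", content: [{ type: "text", text: "Start the workflow" }] },
  {
    role: "assistant",
    source: "panel/editAgent",
    content: [{ type: "thinking" }, { type: "text", text: "Working on it" }],
  },
];

const cleanedHistory = stripToolSummaries(taggedHistory);
console.log(cleanedHistory.length); // 2
```

Filtering at request-construction time leaves the stored summaries available for tool discovery while keeping them out of the messages[] array that Claude validates.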

Environment

  • VS Code: Latest Insiders (April 2026)
  • Copilot Chat Extension: Latest
  • Primary Model: Claude Opus 4.6 / Claude Sonnet 4.6
  • OS: macOS

Workaround

Using shorter sessions (fewer conversation turns) reduces the probability of hitting the message index threshold. Retrying in a fresh conversation also works, since the new history lacks the stale gpt-4o-mini messages.

Copilot Request ID

a8e136b1-cb39-47e8-9a84-5031c536c723

GH Request ID

B1D2:22790A:1D09BD5:20104C6:69CD8C28
