translator: gemini non-stream reasoning as string and thinking_blocks #1995

Open
Flgado wants to merge 1 commit into envoyproxy:main from
Flgado:fix/gemini-reasoning-content-string-thinking-blocks
Conversation

@Flgado Flgado commented Mar 29, 2026

Description
Non-streaming Vertex / Gemini chat completions now follow the LiteLLM-style split already discussed in the community: reasoning_content is always a plain string (the visible thinking-summary text), while the optional thinking_blocks field carries structured metadata (e.g. signatures) that OpenAI's core schema does not define. This matches the direction agreed on in Slack (link) and avoids exposing a Bedrock-shaped nested object in reasoning_content, which breaks common OpenAI-compatible clients.

Implementation highlights:

  • Extend the OpenAI-shaped ChatCompletionResponseChoiceMessage schema with ThinkingBlock

  • In the Gemini helper, map thought parts to the string form of reasoning_content and populate thinking_blocks. When the model attaches a thought signature to the first function-call part (the parallel-tools case), or only there, merge that signature into an existing thinking block or attach it as a new one, so clients can round-trip history together with tool_calls and assistant content parts of type thinking + signature.

  • Add unit tests in internal/translator/gemini_helper_test.go for geminiCandidatesToOpenAIChoices and for signature extraction in extractTextAndThoughtSummaryFromGeminiParts.
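The extraction step in the second bullet can be sketched as follows. This is a simplified illustration, not the PR's code: the Part type below is a hypothetical stand-in for Gemini's content parts, and the merge-or-attach rule for a signature found on a function-call part follows the behavior described above.

```go
package main

import "fmt"

// Part is a simplified stand-in for a Gemini content part; the real
// genai types are richer.
type Part struct {
	Text             string
	Thought          bool   // true for thinking-summary parts
	ThoughtSignature string // may appear on a thought part or a function-call part
	FunctionCall     bool
}

// ThinkingBlock is the hypothetical LiteLLM-style structured block.
type ThinkingBlock struct {
	Type      string
	Thinking  string
	Signature string
}

// extractThoughts concatenates thought-part text into a plain-string
// reasoning summary and collects structured thinking blocks. A signature
// attached only to a function-call part is merged into the last unsigned
// block, or attached as a standalone block when none exists.
func extractThoughts(parts []Part) (reasoning string, blocks []ThinkingBlock) {
	for _, p := range parts {
		switch {
		case p.Thought:
			reasoning += p.Text
			blocks = append(blocks, ThinkingBlock{
				Type: "thinking", Thinking: p.Text, Signature: p.ThoughtSignature,
			})
		case p.FunctionCall && p.ThoughtSignature != "":
			if n := len(blocks); n > 0 && blocks[n-1].Signature == "" {
				blocks[n-1].Signature = p.ThoughtSignature // merge into last block
			} else {
				blocks = append(blocks, ThinkingBlock{
					Type: "thinking", Signature: p.ThoughtSignature, // attach standalone
				})
			}
		}
	}
	return reasoning, blocks
}

func main() {
	parts := []Part{
		{Thought: true, Text: "Plan: call the weather tool."},
		{FunctionCall: true, ThoughtSignature: "sig-123"},
	}
	r, b := extractThoughts(parts)
	fmt.Println(r, b[0].Signature)
}
```

The merge rule matters for the parallel-tools case: the model may place the thought signature on the first function-call part rather than on the thought part itself, and clients still need it inside thinking_blocks to replay history.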

This commit improves interoperability with LiteLLM, LangChain, and other clients that expect reasoning_content to be a string, while still preserving provider-specific signatures for advanced use cases.

Fixes #1974

Special notes for reviewers

  • Request-side replay of thinking continues to use assistant content as structured parts (type: "thinking", text, signature) plus tool_calls where applicable; this PR focuses on response shape and mapping from Gemini parts.
  • Optional follow-up (not in this PR): emit or accept Google’s OpenAI-compat extra_content.google.thought_signature on tool_calls for clients that follow that dialect verbatim; the LiteLLM-style fields remain the primary contract.

Testing:

  • Manually verified chat completions against Vertex with gemini-2.5-flash and gemini-3.1-pro-preview (thinking enabled): reasoning_content is a plain string, thinking_blocks are present when expected, and multi-turn replay with thinking + signature in assistant content works.

Signed-off-by: Joao Folgado <jfolgado94@gmail.com>
@Flgado Flgado requested a review from a team as a code owner March 29, 2026 16:14
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Mar 29, 2026
dosubot bot commented Mar 29, 2026

Related Documentation

2 document(s) may need updating based on files changed in this PR:

Envoy's Space

vendor-specific-fields /ai-gateway/blob/main/site/docs/capabilities/llm-integrations/vendor-specific-fields.md — ⏳ Awaiting Merge
vendor-specific-fields /ai-gateway/blob/main/site/versioned_docs/version-0.5/capabilities/llm-integrations/vendor-specific-fields.md — ⏳ Awaiting Merge


Labels

size:L This PR changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

OpenAI-GCP Vertex AI translator: reasoning_content returned as nested dict instead of string (non-streaming)