
feat(google): support OpenAI-compatible reasoning_effort for Gemini thinking models #160

Open
Tushar49 wants to merge 5 commits into tashfeenahmed:main from Tushar49:Tushar49-patch-1

Tushar49 commented Jun 1, 2026

Adds an OpenAI-compatible reasoning_effort parameter to the Google provider, mapped onto Gemini's thinkingConfig, so callers can opt into Gemini's thinking budget without leaving the OpenAI request shape.
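For illustration, here is a minimal sketch of the request side in TypeScript. It assumes a simplified ChatCompletionRequest (the real type in shared/types.ts has more fields), and the isReasoningEffort guard and REASONING_EFFORTS constant are hypothetical names, not code from this PR.

```typescript
// Simplified, hypothetical version of the request type; the real
// ChatCompletionRequest in shared/types.ts has more fields than this.
type ReasoningEffort = "minimal" | "low" | "medium" | "high";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatCompletionRequest {
  model: string;
  messages: ChatMessage[];
  // Optional: when absent, no thinkingConfig is sent upstream.
  reasoning_effort?: ReasoningEffort;
}

const REASONING_EFFORTS: readonly ReasoningEffort[] = ["minimal", "low", "medium", "high"];

// Hypothetical runtime guard for validating an untrusted JSON body.
function isReasoningEffort(value: unknown): value is ReasoningEffort {
  return typeof value === "string" && (REASONING_EFFORTS as readonly string[]).includes(value);
}

// Example body an OpenAI-style client could send to the proxy.
const request: ChatCompletionRequest = {
  model: "gemini-2.5-flash",
  messages: [{ role: "user", content: "Summarize the trade-offs of B-trees." }],
  reasoning_effort: "medium",
};

console.log(isReasoningEffort(request.reasoning_effort)); // true
console.log(isReasoningEffort("extreme")); // false
```

Keeping the field optional is what lets existing clients keep working unchanged: a request without reasoning_effort never reaches the thinkingConfig mapping.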

What this PR does

  • Schema: shared/types.ts + server/src/routes/proxy.ts accept reasoning_effort: "minimal" | "low" | "medium" | "high" on ChatCompletionRequest. Omitting it preserves existing behavior exactly (no thinkingConfig is sent).
  • Mapping (server/src/providers/google.ts):
    • Gemini 3.x → thinkingConfig.thinkingLevel (MINIMAL / LOW / MEDIUM / HIGH). gemini-3.1-pro does not accept MINIMAL, so it falls back to LOW.
    • Other Gemini models (2.x, etc.) → thinkingConfig.thinkingBudget integer. Canonical mapping: minimal → 512 (Pro: 128), low → 1024, medium → 8192, high → 24576. Clamped per model: gemini-2.5-pro min 128 / max 24576, Flash 0–24576, Flash-Lite 512–24576.
    • Always sets includeThoughts: true when reasoning_effort is provided.
  • Response parsing: Gemini returns candidates[0].content.parts[], where parts may carry thought: true. Non-thought parts go to message.content (unchanged); thought parts are aggregated into message.reasoning_content (an OpenAI extension used by opencode and other clients). usageMetadata.thoughtsTokenCount → usage.completion_tokens_details.reasoning_tokens.
  • Streaming: thought-part deltas emit as delta.reasoning_content, normal text deltas as delta.content.
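The effort mapping above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: models are classified by substring match on the model id, the helper names toThinkingConfig and budgetRange are hypothetical, and the fallback range for models not listed in the description is a guess. The budgets and clamp ranges themselves are the ones stated in this PR.

```typescript
type ReasoningEffort = "minimal" | "low" | "medium" | "high";
type ThinkingLevel = "MINIMAL" | "LOW" | "MEDIUM" | "HIGH";

type ThinkingConfig =
  | { thinkingLevel: ThinkingLevel; includeThoughts: true }
  | { thinkingBudget: number; includeThoughts: true };

// Canonical budgets from the PR description (minimal is 128 for Pro models).
const BUDGETS: Record<ReasoningEffort, number> = {
  minimal: 512,
  low: 1024,
  medium: 8192,
  high: 24576,
};

// Per-model clamp ranges as listed in the PR. Substring matching is an
// assumption of this sketch; "flash-lite" must be checked before "flash".
function budgetRange(model: string): [number, number] {
  if (model.includes("flash-lite")) return [512, 24576];
  if (model.includes("flash")) return [0, 24576];
  if (model.includes("pro")) return [128, 24576];
  return [0, 24576]; // hypothetical default for models the PR does not list
}

function toThinkingConfig(model: string, effort?: ReasoningEffort): ThinkingConfig | undefined {
  // No reasoning_effort: send no thinkingConfig at all (existing behavior).
  if (effort === undefined) return undefined;

  // Gemini 3.x: use the discrete thinkingLevel enum.
  if (/^gemini-3(\.|-)/.test(model)) {
    let level = effort.toUpperCase() as ThinkingLevel;
    // gemini-3.1-pro rejects MINIMAL, so fall back to LOW.
    if (model.startsWith("gemini-3.1-pro") && level === "MINIMAL") level = "LOW";
    return { thinkingLevel: level, includeThoughts: true };
  }

  // Other models: integer thinkingBudget, clamped to the model's range.
  const isPro = model.includes("pro");
  const raw = effort === "minimal" && isPro ? 128 : BUDGETS[effort];
  const [min, max] = budgetRange(model);
  return { thinkingBudget: Math.min(max, Math.max(min, raw)), includeThoughts: true };
}

console.log(JSON.stringify(toThinkingConfig("gemini-2.5-pro", "minimal")));
// {"thinkingBudget":128,"includeThoughts":true}
console.log(JSON.stringify(toThinkingConfig("gemini-3.1-pro-preview", "minimal")));
// {"thinkingLevel":"LOW","includeThoughts":true}
console.log(toThinkingConfig("gemini-2.5-flash")); // undefined
```

Clamping after the lookup keeps the canonical table small: one effort-to-budget table plus per-model bounds, rather than a full model-by-effort matrix.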

Why

OpenAI clients (opencode, Continue, Cursor, etc.) already use reasoning_effort to drive reasoning models. Today they have no way to dial Gemini's thinking budget through freellmapi without bypassing the proxy. Google's own OpenAI-compatible endpoint maps the same field to the same thinkingConfig — this PR brings freellmapi to parity.

Verification

From server/:

New tests cover: effort → budget mapping per model class, Gemini 3.x thinkingLevel path, gemini-3.1-pro MINIMAL → LOW fallback, includeThoughts on/off semantics, thought-part parsing into reasoning_content, reasoning_tokens in usage, and streaming delta.reasoning_content.
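The response-side behavior those tests exercise can be sketched as below. This is a minimal illustration, not the PR's code: the interfaces model only the fields this PR names (candidates[0].content.parts[], thought, usageMetadata.thoughtsTokenCount), and splitParts, toOpenAIResult and toDelta are hypothetical helper names.

```typescript
// Minimal slices of Gemini's generateContent response; only the fields
// named in the PR description are modelled here.
interface GeminiPart {
  text?: string;
  thought?: boolean;
}

interface GeminiResponse {
  candidates?: { content: { parts: GeminiPart[] } }[];
  usageMetadata?: { thoughtsTokenCount?: number };
}

interface AssistantMessage {
  role: "assistant";
  content: string;
  reasoning_content?: string; // OpenAI-style extension field
}

interface Usage {
  completion_tokens_details?: { reasoning_tokens: number };
}

interface Delta {
  content?: string;
  reasoning_content?: string;
}

// Hypothetical helper: concatenates thought parts and non-thought parts separately.
function splitParts(parts: GeminiPart[]): { content: string; reasoning: string } {
  let content = "";
  let reasoning = "";
  for (const part of parts) {
    if (part.text === undefined) continue; // skip non-text parts
    if (part.thought === true) reasoning += part.text;
    else content += part.text;
  }
  return { content, reasoning };
}

// Non-streaming: build the assistant message and usage details.
function toOpenAIResult(res: GeminiResponse): { message: AssistantMessage; usage: Usage } {
  const parts = res.candidates?.[0]?.content.parts ?? [];
  const { content, reasoning } = splitParts(parts);
  const message: AssistantMessage = { role: "assistant", content };
  if (reasoning !== "") message.reasoning_content = reasoning;

  const usage: Usage = {};
  const thoughts = res.usageMetadata?.thoughtsTokenCount;
  if (thoughts !== undefined) usage.completion_tokens_details = { reasoning_tokens: thoughts };
  return { message, usage };
}

// Streaming: a chunk's parts become one OpenAI-style delta; a chunk that
// mixes thought and text parts yields both fields in the same delta.
function toDelta(parts: GeminiPart[]): Delta {
  const { content, reasoning } = splitParts(parts);
  const delta: Delta = {};
  if (content !== "") delta.content = content;
  if (reasoning !== "") delta.reasoning_content = reasoning;
  return delta;
}

const result = toOpenAIResult({
  candidates: [{ content: { parts: [
    { text: "Compare both options first.", thought: true },
    { text: "Option B is cheaper." },
  ] } }],
  usageMetadata: { thoughtsTokenCount: 42 },
});
console.log(result.message.content); // Option B is cheaper.
console.log(result.message.reasoning_content); // Compare both options first.
console.log(result.usage.completion_tokens_details?.reasoning_tokens); // 42
```

Because non-thought parts are concatenated exactly as before, clients that ignore reasoning_content see the same message.content they would have received without this change.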

Style / scope

Happy to iterate on naming, default budgets, or split into smaller PRs if preferred.
