
Provide unified thinking config among different providers #1463

@hustxiayang

Description

Feature Proposal: Unified Thinking/Reasoning Configuration Interface

1. Goal

Provide a single, unified interface for configuring advanced thinking and reasoning settings across different Large Language Model (LLM) providers. This standardization is a key requirement and major benefit of using our gateway proxy, significantly improving the developer experience.

2. Current Status: Vendor-Specific Configuration

Currently, users must manage different configuration fields and structures for similar functionality when calling different vendor APIs. This complexity defeats the purpose of a unified gateway.

For Anthropic models, we have:

type AnthropicVendorFields struct {
	Thinking *anthropic.ThinkingConfigParamUnion `json:"thinking,omitzero"`
}

type ThinkingConfigParamUnion struct {
	OfEnabled  *ThinkingConfigEnabledParam  `json:",omitzero,inline"`
	OfDisabled *ThinkingConfigDisabledParam `json:",omitzero,inline"`
	paramUnion
}

type ThinkingConfigEnabledParam struct {
	BudgetTokens int64            `json:"budget_tokens,required"`
	Type         constant.Enabled `json:"type,required"`
}

type ThinkingConfigDisabledParam struct {
	Type constant.Disabled `json:"type,required"`
}

For Gemini models, we have:

type GCPVertexAIGenerationConfig struct {
	ThinkingConfig *genai.ThinkingConfig `json:"thinkingConfig,omitzero"`
}

type ThinkingConfig struct {
	IncludeThoughts bool   `json:"includeThoughts,omitempty"`
	ThinkingBudget  *int32 `json:"thinkingBudget,omitempty"`
}

As a result, users need to write different inference configs for different models. For example:

if (model is an Anthropic model) {
    client.completions.create(
        ...
        thinking = {"type": "enabled", "budget_tokens": 1024}
        ...
    )
} elif (model is a Gemini model) {
    client.completions.create(
        ...
        thinkingConfig = {"thinkingBudget": 1024, "includeThoughts": false}
        ...
    )
}

3. Proposal

3.1 Reason

However, the two interfaces are in fact very similar: most fields express the same concept under different names, e.g. ThinkingConfig vs. Thinking, and ThinkingBudget vs. BudgetTokens.

3.2 Unified interface

Thus, we propose to use a single unified interface directly in ChatCompletionRequest for both providers. The unified interface follows Anthropic's thinking config, extended with includeThoughts.

Here is the proposed definition:

Thinking *ThinkingUnion `json:"thinking,omitzero"`

type ThinkingUnion struct {
	OfEnabled  *ThinkingEnabled  `json:",omitzero,inline"`
	OfDisabled *ThinkingDisabled `json:",omitzero,inline"`
}

type ThinkingEnabled struct {
	BudgetTokens    int64  `json:"budget_tokens"`
	Type            string `json:"type"`
	IncludeThoughts bool   `json:"includeThoughts,omitempty"`
}

type ThinkingDisabled struct {
	Type string `json:"type"`
}

3.3 Translation

The translation from ThinkingUnion into Anthropic's thinking is trivial: it only needs to ignore IncludeThoughts and copy the other fields.

The translation from ThinkingUnion into Gemini's thinkingConfig is also straightforward: it only needs to map the field BudgetTokens to thinkingBudget and use IncludeThoughts directly.
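The two translations above could be sketched as follows. This is a simplified illustration, not the gateway's actual code: the vendor structs are reduced to the fields relevant here, and the helper names toAnthropic/toGemini are hypothetical:

```go
package main

import "fmt"

// ThinkingEnabled is the unified input (simplified from the proposal above).
type ThinkingEnabled struct {
	BudgetTokens    int64
	IncludeThoughts bool
}

// AnthropicThinking mirrors Anthropic's enabled thinking config, simplified.
type AnthropicThinking struct {
	Type         string `json:"type"`
	BudgetTokens int64  `json:"budget_tokens"`
}

// GeminiThinkingConfig mirrors Gemini's thinkingConfig, simplified.
type GeminiThinkingConfig struct {
	IncludeThoughts bool   `json:"includeThoughts,omitempty"`
	ThinkingBudget  *int32 `json:"thinkingBudget,omitempty"`
}

// toAnthropic drops IncludeThoughts and copies the budget.
func toAnthropic(t ThinkingEnabled) AnthropicThinking {
	return AnthropicThinking{Type: "enabled", BudgetTokens: t.BudgetTokens}
}

// toGemini maps BudgetTokens to thinkingBudget and keeps IncludeThoughts.
func toGemini(t ThinkingEnabled) GeminiThinkingConfig {
	budget := int32(t.BudgetTokens)
	return GeminiThinkingConfig{IncludeThoughts: t.IncludeThoughts, ThinkingBudget: &budget}
}

func main() {
	in := ThinkingEnabled{BudgetTokens: 1024, IncludeThoughts: false}
	fmt.Println(toAnthropic(in))
	fmt.Println(*toGemini(in).ThinkingBudget)
}
```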

With this change, users can use the same code for models from both providers, which is a key advantage of using a proxy.

4. Further scope of unified interface

Reasoning models from other providers/backends also have different config definitions. For example, in vLLM, to disable thinking for Qwen3 models, users need to use:

client.completions.create(
    ...
    extra_body={"chat_template_kwargs": {"enable_thinking": False}}
    ...
)

We can also provide a unified interface for this case, then users can write:

client.completions.create(
    ...
    thinking={"type": "disabled"}
    ...
)

and we translate it into extra_body={"chat_template_kwargs": {"enable_thinking": False}} internally.

In summary, we propose to achieve the goal as follows:

  • For OpenAI's GPT models, we keep using reasoning_effort to configure thinking parameters.
  • For all other backends, including models served via the Anthropic API, Converse API, GCP Vertex AI API, vLLM, etc., we provide a single unified ThinkingUnion interface.
