Feature Proposal: Unified Thinking/Reasoning Configuration Interface
1. Goal
Provide a single, unified interface for configuring advanced thinking and reasoning settings across different Large Language Model (LLM) providers. This standardization is a key requirement and major benefit of using our gateway proxy, significantly improving the developer experience.
2. Current Status: Vendor-Specific Configuration
Currently, users must manage different configuration fields and structures for similar functionality when calling different vendor APIs. This complexity defeats the purpose of a unified gateway.
For Anthropic models, we have:

```go
type AnthropicVendorFields struct {
	Thinking *anthropic.ThinkingConfigParamUnion `json:"thinking,omitzero"`
}

type ThinkingConfigParamUnion struct {
	OfEnabled  *ThinkingConfigEnabledParam  `json:",omitzero,inline"`
	OfDisabled *ThinkingConfigDisabledParam `json:",omitzero,inline"`
	paramUnion
}

type ThinkingConfigEnabledParam struct {
	BudgetTokens int64            `json:"budget_tokens,required"`
	Type         constant.Enabled `json:"type,required"`
}

type ThinkingConfigDisabledParam struct {
	Type constant.Disabled `json:"type,required"`
}
```

For Gemini models, we have:
```go
type GCPVertexAIGenerationConfig struct {
	ThinkingConfig *genai.ThinkingConfig `json:"thinkingConfig,omitzero"`
}

type ThinkingConfig struct {
	IncludeThoughts bool   `json:"includeThoughts,omitempty"`
	ThinkingBudget  *int32 `json:"thinkingBudget,omitempty"`
}
```

As a result, users need to use different inference configs for different models. For example:
```python
if is_anthropic_model(model):
    client.completions.create(
        ...,
        thinking={"type": "enabled", "budget_tokens": 1024},
        ...,
    )
elif is_gemini_model(model):
    client.completions.create(
        ...,
        thinkingConfig={"thinkingBudget": 1024, "includeThoughts": False},
        ...,
    )
```

3. Proposal
3.1 Reason
The interfaces of these two providers are actually very similar: most fields express the same concept under different names, e.g. ThinkingConfig vs. Thinking, and ThinkingBudget vs. BudgetTokens.
3.2 Unified interface
Thus, we propose a single unified interface directly in ChatCompletionRequest for both providers. The unified interface is modeled on Anthropic's thinking config, extended with includeThoughts.
Here is the proposed definition:

```go
Thinking *ThinkingUnion `json:"thinking,omitzero"`

type ThinkingUnion struct {
	OfEnabled  *ThinkingEnabled  `json:",omitzero,inline"`
	OfDisabled *ThinkingDisabled `json:",omitzero,inline"`
}

type ThinkingEnabled struct {
	BudgetTokens    int64  `json:"budget_tokens"`
	Type            string `json:"type"`
	IncludeThoughts bool   `json:"includeThoughts,omitempty"`
}

type ThinkingDisabled struct {
	Type string `json:"type"`
}
```

3.3 Translation
The translation from ThinkingUnion into Anthropic's thinking is trivial: ignore IncludeThoughts and copy the remaining fields.
The translation from ThinkingUnion into Gemini's thinkingConfig is also straightforward: map BudgetTokens to thinkingBudget and pass IncludeThoughts through unchanged.
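As a rough sketch of these two mappings (the vendor-side type and field names here are simplified stand-ins, not the actual SDK types, and representing the disabled case on Gemini as a zero thinking budget is an assumption of this sketch):

```go
package main

import "fmt"

// Simplified stand-ins for the proposed unified interface.
type ThinkingEnabled struct {
	BudgetTokens    int64
	IncludeThoughts bool
}

type ThinkingUnion struct {
	OfEnabled *ThinkingEnabled // nil means thinking is disabled
}

// Simplified stand-ins for the vendor-specific configs.
type AnthropicThinking struct {
	Type         string
	BudgetTokens int64
}

type GeminiThinkingConfig struct {
	IncludeThoughts bool
	ThinkingBudget  *int32
}

// toAnthropic ignores IncludeThoughts and copies the remaining fields.
func toAnthropic(t ThinkingUnion) AnthropicThinking {
	if t.OfEnabled != nil {
		return AnthropicThinking{Type: "enabled", BudgetTokens: t.OfEnabled.BudgetTokens}
	}
	return AnthropicThinking{Type: "disabled"}
}

// toGemini maps BudgetTokens to ThinkingBudget and passes IncludeThoughts through.
func toGemini(t ThinkingUnion) GeminiThinkingConfig {
	if t.OfEnabled != nil {
		budget := int32(t.OfEnabled.BudgetTokens)
		return GeminiThinkingConfig{
			IncludeThoughts: t.OfEnabled.IncludeThoughts,
			ThinkingBudget:  &budget,
		}
	}
	// Assumption for this sketch: a zero budget represents the disabled case.
	zero := int32(0)
	return GeminiThinkingConfig{ThinkingBudget: &zero}
}

func main() {
	u := ThinkingUnion{OfEnabled: &ThinkingEnabled{BudgetTokens: 1024}}
	fmt.Println(toAnthropic(u).BudgetTokens) // 1024
	fmt.Println(*toGemini(u).ThinkingBudget) // 1024
}
```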
With this change, users can use the same code for models from both providers, which is a key advantage of using a proxy.
4. Further scope of unified interface
Reasoning models from other providers/backends also have different config definitions. For example, in vLLM, to disable thinking for Qwen3 models, users need to write:

```python
client.completions.create(
    ...,
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
    ...,
)
```

We can also provide a unified interface for this case, so users can write:
```python
client.completions.create(
    ...,
    thinking={"type": "disabled"},
    ...,
)
```

and we translate it into extra_body={"chat_template_kwargs": {"enable_thinking": False}} internally.
In summary, we propose to achieve the goal as follows:
- For OpenAI's GPT models, we use reasoning_effort to configure thinking parameters.
- For all other backends, including models using the Anthropic API, Converse API, GCP Vertex AI API, vLLM, etc., we provide a single unified ThinkingUnion interface.