Feature Proposal: Unified Thinking/Reasoning Configuration Interface
1. Goal
Provide a single, unified interface for configuring advanced thinking and reasoning settings across different Large Language Model (LLM) providers. This standardization is a key requirement and major benefit of using our gateway proxy, significantly improving the developer experience.
2. Current Status: Vendor-Specific Configuration
Currently, users must manage different configuration fields and structures for similar functionality when calling different vendor APIs. This complexity defeats the purpose of a unified gateway.
For Anthropic models, we have:

```go
type AnthropicVendorFields struct {
	Thinking *anthropic.ThinkingConfigParamUnion `json:"thinking,omitzero"`
}

type ThinkingConfigParamUnion struct {
	OfEnabled  *ThinkingConfigEnabledParam  `json:",omitzero,inline"`
	OfDisabled *ThinkingConfigDisabledParam `json:",omitzero,inline"`
	paramUnion
}

type ThinkingConfigEnabledParam struct {
	BudgetTokens int64            `json:"budget_tokens,required"`
	Type         constant.Enabled `json:"type,required"`
}

type ThinkingConfigDisabledParam struct {
	Type constant.Disabled `json:"type,required"`
}
```

For Gemini models, we have:
```go
type GCPVertexAIGenerationConfig struct {
	ThinkingConfig *genai.ThinkingConfig `json:"thinkingConfig,omitzero"`
}

type ThinkingConfig struct {
	IncludeThoughts bool   `json:"includeThoughts,omitempty"`
	ThinkingBudget  *int32 `json:"thinkingBudget,omitempty"`
}
```

As a result, users need to use different inference configs for different models. For example:
```python
if is_anthropic_model(model):
    client.completions.create(
        ...,
        thinking={"type": "enabled", "budget_tokens": 1024},
        ...,
    )
elif is_gemini_model(model):
    client.completions.create(
        ...,
        thinkingConfig={"thinkingBudget": 1024, "includeThoughts": False},
        ...,
    )
```

3. Proposal
3.1 Reason
The interfaces of these two providers are actually very similar: most fields express the same concept under different names, e.g. ThinkingConfig vs. Thinking, and ThinkingBudget vs. BudgetTokens.
3.2 Unified interface
Thus, we propose a single unified interface directly in ChatCompletionRequest for both providers. The unified interface is modeled on Anthropic's thinking config, extended with includeThoughts.
Here is the proposed definition:

```go
Thinking *ThinkingUnion `json:"thinking,omitzero"`

type ThinkingUnion struct {
	OfEnabled  *ThinkingEnabled  `json:",omitzero,inline"`
	OfDisabled *ThinkingDisabled `json:",omitzero,inline"`
}

type ThinkingEnabled struct {
	BudgetTokens    int64  `json:"budget_tokens"`
	Type            string `json:"type"`
	IncludeThoughts bool   `json:"includeThoughts,omitempty"`
}

type ThinkingDisabled struct {
	Type string `json:"type"`
}
```

3.3 Translation
The translation from ThinkingUnion into Anthropic's thinking is trivial: ignore IncludeThoughts and copy the remaining fields.
The translation from ThinkingUnion into Gemini's thinkingConfig is also straightforward: map BudgetTokens to thinkingBudget and pass IncludeThoughts through unchanged.
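As a rough sketch of these two mappings (the vendor-side type and field names here are simplified stand-ins, not the actual SDK types, and representing the disabled case on Gemini as a zero thinking budget is an assumption of this sketch):

```go
package main

import "fmt"

// Simplified stand-ins for the proposed unified interface.
type ThinkingEnabled struct {
	BudgetTokens    int64
	IncludeThoughts bool
}

type ThinkingUnion struct {
	OfEnabled *ThinkingEnabled // nil means thinking is disabled
}

// Simplified stand-ins for the vendor-specific configs.
type AnthropicThinking struct {
	Type         string
	BudgetTokens int64
}

type GeminiThinkingConfig struct {
	IncludeThoughts bool
	ThinkingBudget  *int32
}

// toAnthropic ignores IncludeThoughts and copies the remaining fields.
func toAnthropic(t ThinkingUnion) AnthropicThinking {
	if t.OfEnabled != nil {
		return AnthropicThinking{Type: "enabled", BudgetTokens: t.OfEnabled.BudgetTokens}
	}
	return AnthropicThinking{Type: "disabled"}
}

// toGemini maps BudgetTokens to ThinkingBudget and passes IncludeThoughts through.
func toGemini(t ThinkingUnion) GeminiThinkingConfig {
	if t.OfEnabled != nil {
		budget := int32(t.OfEnabled.BudgetTokens)
		return GeminiThinkingConfig{
			IncludeThoughts: t.OfEnabled.IncludeThoughts,
			ThinkingBudget:  &budget,
		}
	}
	// Assumption for this sketch: a zero budget represents the disabled case.
	zero := int32(0)
	return GeminiThinkingConfig{ThinkingBudget: &zero}
}

func main() {
	u := ThinkingUnion{OfEnabled: &ThinkingEnabled{BudgetTokens: 1024}}
	fmt.Println(toAnthropic(u).BudgetTokens) // 1024
	fmt.Println(*toGemini(u).ThinkingBudget) // 1024
}
```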
With this change, users can use the same code for models from both providers, which is a key advantage of using a proxy.
4. Further scope of unified interface
Reasoning models from other providers/backends also have different config definitions. For example, in vLLM, to disable thinking for Qwen3 models, users need to write:

```python
client.completions.create(
    ...,
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
    ...,
)
```

We can also provide a unified interface for this case, so users can write:
```python
client.completions.create(
    ...,
    thinking={"type": "disabled"},
    ...,
)
```

and we translate it into extra_body={"chat_template_kwargs": {"enable_thinking": False}} internally.
In summary, we propose to achieve the goal as follows:
- For OpenAI's GPT models, we use reasoning_effort to configure thinking parameters.
- For all other backends, including models using the Anthropic API, Converse API, GCP Vertex AI API, vLLM, etc., we provide a single unified ThinkingUnion interface.