Plan: Unified Thinking Capability Improvements

Branch: thinking-cap-improvements (off upstream/main, post-merge of thinking-cap)

Context: The unified thinking feature was merged via the capabilities PR. This plan addresses review findings from REVIEW_COMMENTS.md: correctness bugs, missing implementations, effort mapping fixes, and comprehensive tests.

Phase 1: Correctness Fixes (provider translation bugs)

These are behavioral bugs where the unified thinking setting produces wrong or no-op results. Each fix is small and self-contained.

1.1 Anthropic: `'medium'` should map to adaptive, not fixed budget

File: pydantic_ai_slim/pydantic_ai/profiles/anthropic.py

Change ANTHROPIC_THINKING_BUDGET_MAP so that for adaptive-capable models:

True → {'type': 'adaptive'}
'medium' → {'type': 'adaptive'}
'low' → budget_tokens: 1024 (see 1.2)
'high' → budget_tokens: 16384 (unchanged)

The translation method needs to check whether the model supports adaptive thinking and use the appropriate mapping. For budget-only models, keep existing fixed-budget behavior.
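A minimal sketch of the adaptive-versus-budget dispatch described above, not the actual implementation. The names `translate_anthropic_thinking`, `supports_adaptive`, and `budget_only_map` are hypothetical, and `ThinkingLevel` is a local stand-in for the library's type. The existing fixed-budget table for budget-only models is passed in rather than guessed, because this plan does not list all of its values.

```python
from typing import Any, Literal, Mapping, Union

# Local stand-in for the unified thinking type used in this plan.
ThinkingLevel = Union[bool, Literal['low', 'medium', 'high']]

# Target mapping for adaptive-capable models (values taken from 1.1 and 1.2).
ADAPTIVE_CAPABLE_MAP: dict[Any, dict[str, Any]] = {
    True: {'type': 'adaptive'},
    'medium': {'type': 'adaptive'},
    'low': {'type': 'enabled', 'budget_tokens': 1024},
    'high': {'type': 'enabled', 'budget_tokens': 16384},
}


def translate_anthropic_thinking(
    thinking: ThinkingLevel,
    *,
    supports_adaptive: bool,
    budget_only_map: Mapping[Any, dict[str, Any]],
) -> dict[str, Any]:
    """Translate the unified setting into an Anthropic-style thinking param."""
    if thinking is False:
        return {'type': 'disabled'}
    # Budget-only models keep whatever fixed-budget table already exists.
    mapping = ADAPTIVE_CAPABLE_MAP if supports_adaptive else budget_only_map
    # Return a copy so callers cannot mutate the shared table.
    return dict(mapping[thinking])
```

Keeping the two tables separate means the budget-only path is untouched by this change, which matches the intent of leaving existing behavior in place for those models.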

1.2 Anthropic: `'low'` should use 1024 (documented minimum)

File: pydantic_ai_slim/pydantic_ai/profiles/anthropic.py

Change 'low' mapping from 2048 to 1024. Anthropic's SDK docs state budget_tokens >= 1024. Since 'low' means "as little thinking as possible," the floor is the correct semantic match.

1.3 xAI: Invert effort mapping (round down, not up)

File: xAI model/helper where thinking translation lives

xAI Grok supports only 'low' and 'high'. Current mapping rounds UP for ambiguous values. Fix to round DOWN (conservative = cheaper, safer):

True → 'low' (match xAI SDK default)
'low' → 'low'
'medium' → 'low' (round down)
'high' → 'high'
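The round-down table above could be expressed roughly as follows. This is a sketch only: `translate_xai_thinking` is a hypothetical name, and returning `None` for `False` or an unset value (meaning "send no effort parameter") is an assumption, since this plan does not specify how xAI handles disabling.

```python
from typing import Literal, Optional, Union

# Local stand-in for the unified thinking type used in this plan.
ThinkingLevel = Union[bool, Literal['low', 'medium', 'high']]

# Ambiguous values round DOWN to 'low'; only an explicit 'high' maps to 'high'.
XAI_EFFORT_MAP = {True: 'low', 'low': 'low', 'medium': 'low', 'high': 'high'}


def translate_xai_thinking(
    thinking: Optional[ThinkingLevel],
) -> Optional[Literal['low', 'high']]:
    """Return the xAI effort value, or None when no effort should be sent."""
    # Assumption: unset and False both mean "do not send an effort value".
    if thinking is None or thinking is False:
        return None
    return XAI_EFFORT_MAP[thinking]
```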

1.4 Groq: `thinking=False` should return `'hidden'`

File: pydantic_ai_slim/pydantic_ai/models/groq.py (lines 245-256)

Groq's SDK accepts 'hidden', 'raw', 'parsed' — no true disable exists. But 'hidden' is the closest to "disabled" semantics (hides reasoning output). Change:

thinking=False → 'hidden' (instead of NOT_GIVEN)

Add a code comment explaining the SDK limitation. Also add a note in docs/thinking.md in the Groq section.
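As an illustration of the intended behavior, including the kind of code comment mentioned above, a sketch might look like this. The function name is hypothetical, and `None` stands in for the SDK's NOT_GIVEN sentinel so the example stays self-contained. The `'parsed'` value for enabled thinking follows the Groq expectations listed in 4.4.

```python
from typing import Literal, Optional, Union

# Local stand-in for the unified thinking type used in this plan.
ThinkingLevel = Union[bool, Literal['low', 'medium', 'high']]


def translate_groq_thinking(
    thinking: Optional[ThinkingLevel],
) -> Optional[Literal['hidden', 'parsed']]:
    """Map the unified setting to a Groq reasoning_format value."""
    if thinking is None:
        # Nothing requested: leave the parameter unset (NOT_GIVEN in the SDK).
        return None
    if thinking is False:
        # SDK limitation: Groq has no true "disable". 'hidden' only hides the
        # reasoning output, which is the closest match to "disabled".
        return 'hidden'
    # True or any effort level: ask for parsed reasoning output.
    return 'parsed'
```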

1.5 Cerebras: Handle `thinking=True` (not just `False`)

File: Cerebras model/helper (_cerebras_settings_to_openai_settings())

Currently only maps thinking=False → disable_reasoning: True. Add the reverse:

thinking=True or effort level → disable_reasoning: False
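A rough sketch of the symmetric mapping, assuming the result is merged into the settings that _cerebras_settings_to_openai_settings() produces. The helper name `cerebras_thinking_overrides` is hypothetical, and leaving the key out entirely when thinking is unset is an assumption.

```python
from typing import Literal, Optional, Union

# Local stand-in for the unified thinking type used in this plan.
ThinkingLevel = Union[bool, Literal['low', 'medium', 'high']]


def cerebras_thinking_overrides(
    thinking: Optional[ThinkingLevel],
) -> dict[str, bool]:
    """Return the disable_reasoning override implied by the unified setting."""
    if thinking is None:
        # Nothing requested: do not send disable_reasoning at all.
        return {}
    # False disables reasoning; True or any effort level explicitly enables it.
    return {'disable_reasoning': thinking is False}
```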

Phase 2: Missing Implementations (silent no-ops)

These providers have profile flags set but their model classes never read model_request_parameters.thinking. The unified setting is silently ignored.

2.1 Cohere: Implement `_translate_thinking()` or remove flags

File: pydantic_ai_slim/pydantic_ai/models/cohere.py
Also: pydantic_ai_slim/pydantic_ai/profiles/cohere.py

Research how Cohere's API controls thinking for Command R+ models. Implement _translate_thinking() that reads model_request_parameters.thinking and maps it to Cohere's API parameter.

If Cohere has no API-level thinking control: Remove supports_thinking=True and thinking_always_enabled=True from the profile, and add a comment explaining why. Models that always think with no API knob don't benefit from profile flags.

2.2 Mistral: Implement `_translate_thinking()` or remove flags

File: pydantic_ai_slim/pydantic_ai/models/mistral.py
Also: pydantic_ai_slim/pydantic_ai/profiles/mistral.py

Same as Cohere — research Magistral's API, implement translation, or remove flags.

2.3 Missing profile flags for profile-only providers

Files:

profiles/meta.py — Llama-4 thinking models: supports_thinking=True, thinking_tags
profiles/qwen.py — QwQ models: supports_thinking=True, thinking_always_enabled=True, thinking_tags
profiles/moonshotai.py — kimi-thinking-*: supports_thinking=True
profiles/harmony.py — gpt-oss-120b: supports_thinking=True
profiles/zai.py — zai-glm-4.6: supports_thinking=True

These enable cascading to 15+ hosting providers (Ollama, Fireworks, Together, etc.).

Research each model family to determine correct flag values (especially thinking_always_enabled and whether thinking_tags are needed for tag-based parsing).

Phase 3: Safety & Hygiene

3.1 Strip `thinking` from `model_settings` after resolution

File: pydantic_ai_slim/pydantic_ai/models/__init__.py (in prepare_request())

After copying model_settings['thinking'] to model_request_parameters.thinking, pop it from a copy of the dict, never from the caller's original (see 4.2):

model_settings.pop('thinking', None)

Prevents downstream provider code from accidentally reading the raw unresolved value.
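One way to strip the key without touching the caller's dict is to work on a shallow copy. The helper below is a hypothetical sketch of that step in isolation, not the real prepare_request() code.

```python
from typing import Any, Optional


def split_thinking(
    model_settings: Optional[dict[str, Any]],
) -> tuple[dict[str, Any], Any]:
    """Return (settings without 'thinking', the raw thinking value)."""
    # Shallow copy so the caller's dict is never mutated.
    resolved = dict(model_settings or {})
    thinking = resolved.pop('thinking', None)
    return resolved, thinking
```

Provider code that only ever sees the first element of the returned pair cannot read the unresolved value by accident, and the original settings stay reusable across requests.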

3.2 Add `thinking_mode` property to `ModelProfile`

File: pydantic_ai_slim/pydantic_ai/profiles/__init__.py

@property
def thinking_mode(self) -> Literal['unsupported', 'optional', 'required']:
    if self.thinking_always_enabled:
        return 'required'
    if self.supports_thinking:
        return 'optional'
    return 'unsupported'

Update prepare_request() to use it instead of inline boolean checks.
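A sketch of how the resolver might branch on thinking_mode. `ProfileStub` and `resolve_thinking` are hypothetical stand-ins for ModelProfile and the logic inside prepare_request(). The outcomes follow the cases listed in 4.1. What should happen for thinking=False on a 'required' model is not specified in this plan, so the sketch simply passes the value through.

```python
from dataclasses import dataclass
from typing import Literal, Optional, Union

# Local stand-in for the unified thinking type used in this plan.
ThinkingLevel = Union[bool, Literal['low', 'medium', 'high']]


@dataclass
class ProfileStub:
    """Minimal stand-in for ModelProfile with only the thinking flags."""

    supports_thinking: bool = False
    thinking_always_enabled: bool = False

    @property
    def thinking_mode(self) -> Literal['unsupported', 'optional', 'required']:
        if self.thinking_always_enabled:
            return 'required'
        if self.supports_thinking:
            return 'optional'
        return 'unsupported'


def resolve_thinking(
    profile: ProfileStub, requested: Optional[ThinkingLevel]
) -> Optional[ThinkingLevel]:
    """Resolve the requested setting against the profile's thinking mode."""
    mode = profile.thinking_mode
    if mode == 'unsupported':
        return None  # silently dropped
    if requested is None:
        # Auto-enable only when the model always thinks.
        return True if mode == 'required' else None
    # Unspecified in the plan: False on a 'required' model is passed through.
    return requested
```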

3.3 Rename Google profile field

File: GoogleModelProfile definition + Google model class

Change google_supports_thinking_level: bool to google_thinking_api: Literal['budget', 'level'] | None.

Update all usages in the Google model class and profile factory.
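To show why a tri-state field reads better than a boolean, here is a hedged sketch of dispatching on google_thinking_api. The function name and the plain-dict return shape are assumptions. The budgets and level names come from the test expectations in 4.4, assuming 2048/8192/24576 correspond to low/medium/high in that order. Cases this plan leaves unspecified return `None`.

```python
from typing import Any, Literal, Optional, Union

# Local stand-ins for the types referenced in this plan.
ThinkingLevel = Union[bool, Literal['low', 'medium', 'high']]
GoogleThinkingApi = Optional[Literal['budget', 'level']]

# Assumed ordering: low/medium/high map to 2048/8192/24576 (see 4.4).
EFFORT_TO_BUDGET = {'low': 2048, 'medium': 8192, 'high': 24576}
EFFORT_TO_LEVEL = {'low': 'LOW', 'medium': 'MEDIUM', 'high': 'HIGH'}


def translate_google_thinking(
    api: GoogleThinkingApi, thinking: Optional[ThinkingLevel]
) -> Optional[dict[str, Any]]:
    """Build a thinking-config fragment for the given API flavor."""
    if api is None or thinking is None:
        return None
    if api == 'budget':
        if thinking is False:
            return {'thinking_budget': 0}
        if thinking is True:
            # The plan says "default budget" without a number, so this
            # sketch sends no explicit budget.
            return None
        return {'thinking_budget': EFFORT_TO_BUDGET[thinking]}
    # api == 'level': True/False handling is not specified in this plan.
    if isinstance(thinking, bool):
        return None
    return {'thinking_level': EFFORT_TO_LEVEL[thinking]}
```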

3.4 Rename all thinking translation methods to `_translate_thinking()`

Files: All provider model classes with thinking translation methods

Rename for consistency and grep-ability:

Anthropic: _get_thinking_param() → _translate_thinking()
OpenAI: _get_reasoning_effort() → _translate_thinking()
Groq: _get_reasoning_format() → _translate_thinking()
Google: _get_thinking_config() → _translate_thinking()
Bedrock: _get_thinking_fields() → _translate_thinking()

Return types differ per provider — the convention is about naming, not signatures.

Phase 4: Tests

File: tests/test_unified_thinking.py (new)

Every behavior from Phases 1-3 should have a corresponding test.

4.1 Core resolver tests (`TestPrepareRequestThinking`)

Test Model.prepare_request() thinking resolution:

thinking=True + supports_thinking=True → params.thinking = True
thinking='high' + supports_thinking=True → params.thinking = 'high'
thinking='medium' + supports_thinking=False → params.thinking = None (silently dropped)
thinking=True + thinking_always_enabled=True → params.thinking = True
No thinking + thinking_always_enabled=True → params.thinking = True (auto-enabled)
No thinking + supports_thinking=True → params.thinking = None (not auto-enabled)

4.2 No-mutation regression test

Test that prepare_request() does not mutate the original model_settings dict. (This was a real bug on a prior branch.)

4.3 Provider-specific settings precedence

For each provider with both unified and provider-specific settings:

anthropic_thinking set + thinking set → anthropic_thinking wins
openai_reasoning_effort set + thinking set → openai_reasoning_effort wins
etc.
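The precedence rule can be illustrated with OpenAI, using the unified mapping listed in 4.4. The helper name `effective_openai_effort` is hypothetical, and treating a missing provider-specific key as "not set" is an assumption.

```python
from typing import Any, Literal, Mapping, Optional, Union

# Local stand-in for the unified thinking type used in this plan.
ThinkingLevel = Union[bool, Literal['low', 'medium', 'high']]


def effective_openai_effort(
    model_settings: Mapping[str, Any],
    resolved_thinking: Optional[ThinkingLevel],
) -> Optional[str]:
    """Pick the reasoning effort, letting the provider-specific key win."""
    explicit = model_settings.get('openai_reasoning_effort')
    if explicit is not None:
        # Provider-specific setting always takes precedence.
        return explicit
    if resolved_thinking is None:
        return None
    if resolved_thinking is True:
        return 'medium'
    if resolved_thinking is False:
        return 'none'
    # 'low' / 'medium' / 'high' pass through unchanged.
    return resolved_thinking
```

Because the explicit key is checked first, values outside the unified API such as 'minimal' remain reachable through openai_reasoning_effort.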

4.4 Per-provider translation tests

One test class per provider. Each tests the full ThinkingLevel → provider parameter mapping after the Phase 1 fixes:

Anthropic:

True → adaptive (adaptive-capable models) / budget (budget-only models)
False → {'type': 'disabled'}
'low' → budget_tokens: 1024
'medium' → adaptive
'high' → budget_tokens: 16384

OpenAI:

True → reasoning_effort='medium'
False → reasoning_effort='none'
'low'/'medium'/'high' → direct passthrough

Google (budget models, i.e. Gemini 2.5):

True → default budget
False → thinking_budget: 0
Effort levels → specific budgets (2048/8192/24576)

Google (level models, i.e. Gemini 3+):

Effort levels → LOW/MEDIUM/HIGH enum values

Groq:

True/effort → reasoning_format='parsed'
False → reasoning_format='hidden'

xAI:

True → 'low'
'medium' → 'low'
'high' → 'high'

Bedrock (Anthropic variant): Same as Anthropic through Bedrock's layer

Bedrock (OpenAI variant): Same as OpenAI through Bedrock's layer

4.5 Anthropic conflict test

thinking=True + output type requiring tool → raises UserError
thinking=False + output type requiring tool → no error
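The two cases above could be checked against a guard shaped roughly like the one below. Everything here is a stand-in: `UserError` is a local class rather than the library's exception, `check_thinking_tool_conflict` is a hypothetical name, and treating effort levels the same as `True` is an assumption.

```python
from typing import Literal, Optional, Union

# Local stand-in for the unified thinking type used in this plan.
ThinkingLevel = Union[bool, Literal['low', 'medium', 'high']]


class UserError(RuntimeError):
    """Local stand-in for the library's user-facing error type."""


def check_thinking_tool_conflict(
    thinking: Optional[ThinkingLevel], output_requires_tool: bool
) -> None:
    """Raise if thinking is enabled while the output type forces a tool call."""
    # Assumption: any enabled value (True or an effort level) conflicts.
    thinking_enabled = thinking is not None and thinking is not False
    if thinking_enabled and output_requires_tool:
        raise UserError(
            'Thinking cannot be combined with an output type that requires a tool call.'
        )
```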

4.6 Settings stripping test

After prepare_request(), the resolved model_settings should not contain the 'thinking' key, while the caller's original dict is left unchanged (see 4.2)

4.7 Cross-provider portability

Same Thinking(effort='high') capability produces sensible (non-error, non-no-op) results across all providers with supports_thinking=True

Phase 5: Documentation

5.1 Update `docs/thinking.md`

Add Groq limitation note (no true disable — 'hidden' hides output only)
Verify effort mapping tables match the code after Phase 1 fixes
Ensure all providers with profile flags are represented

5.2 Update effort mapping reference table

After Phase 1 fixes, update the mapping table to reflect:

Anthropic: True/'medium' → adaptive, 'low' → budget 1024, 'high' → budget 16384
xAI: True/'medium' → low, 'high' → high
Groq: False → hidden

Execution Order

Phase 1 (correctness fixes) ─┐
Phase 2 (missing impls)      ├─ can be parallelized
Phase 3 (safety & hygiene)   ─┘
         │
         ▼
Phase 4 (tests) ── validates Phases 1-3
         │
         ▼
Phase 5 (docs) ── reflects final state

Out of Scope

Thought summaries: No unified summary field. Provider-specific settings remain.
ResolvedThinkingConfig dataclass: Superseded by capabilities architecture. Raw ThinkingLevel on ModelRequestParameters is the right design.
OpenAI 'minimal'/'xhigh': Valid SDK values, not exposed in unified API. Users can reach them via openai_reasoning_effort provider-specific setting.
Broader Bedrock vendor families: Existing variants cover current needs.
