Branch: thinking-cap-improvements (off upstream/main, post-merge of thinking-cap)

Context: The unified thinking feature was merged via the capabilities PR. This plan addresses review findings from REVIEW_COMMENTS.md: correctness bugs, missing implementations, effort mapping fixes, and comprehensive tests.
Phase 1: Correctness fixes

These are behavioral bugs where the unified thinking setting produces wrong or no-op results. Each fix is small and self-contained.
1.1 Anthropic: True/'medium' should map to adaptive thinking

File: pydantic_ai_slim/pydantic_ai/profiles/anthropic.py

Change ANTHROPIC_THINKING_BUDGET_MAP so that for adaptive-capable models:

- True → {'type': 'adaptive'}
- 'medium' → {'type': 'adaptive'}
- 'low' → budget_tokens: 1024 (see 1.2)
- 'high' → budget_tokens: 16384 (unchanged)
The translation method needs to check whether the model supports adaptive thinking and use the appropriate mapping. For budget-only models, keep existing fixed-budget behavior.
1.2 Anthropic: 'low' should use the 1024-token floor

File: pydantic_ai_slim/pydantic_ai/profiles/anthropic.py

Change the 'low' mapping from 2048 to 1024. Anthropic's SDK docs state budget_tokens >= 1024. Since 'low' means "as little thinking as possible," the floor is the correct semantic match.
1.3 xAI: round ambiguous effort levels down

File: xAI model/helper where thinking translation lives

xAI Grok supports only 'low' and 'high'. The current mapping rounds UP for ambiguous values; fix it to round DOWN (conservative = cheaper, safer):

- True → 'low' (match the xAI SDK default)
- 'low' → 'low'
- 'medium' → 'low' (round down)
- 'high' → 'high'
1.4 Groq: thinking=False should return 'hidden'
File: pydantic_ai_slim/pydantic_ai/models/groq.py (lines 245-256)
Groq's SDK accepts 'hidden', 'raw', 'parsed' — no true disable exists. But
'hidden' is the closest to "disabled" semantics (hides reasoning output). Change:
- thinking=False → 'hidden' (instead of NOT_GIVEN)
Add a code comment explaining the SDK limitation. Also add a note in docs/thinking.md
in the Groq section.
1.5 Cerebras: map enabled thinking to disable_reasoning: False

File: Cerebras model/helper (_cerebras_settings_to_openai_settings())

Currently only thinking=False is mapped (to disable_reasoning: True). Add the reverse:

- thinking=True or an effort level → disable_reasoning: False
Phase 2: Missing implementations

These providers have profile flags set but their model classes never read model_request_parameters.thinking, so the unified setting is silently ignored.
File: pydantic_ai_slim/pydantic_ai/models/cohere.py
Also: pydantic_ai_slim/pydantic_ai/profiles/cohere.py
Research how Cohere's API controls thinking for Command R+ models. Implement
_translate_thinking() that reads model_request_parameters.thinking and maps it
to Cohere's API parameter.
If Cohere has no API-level thinking control: Remove supports_thinking=True and
thinking_always_enabled=True from the profile, and add a comment explaining why.
Models that always think with no API knob don't benefit from profile flags.
File: pydantic_ai_slim/pydantic_ai/models/mistral.py
Also: pydantic_ai_slim/pydantic_ai/profiles/mistral.py
Same as Cohere — research Magistral's API, implement translation, or remove flags.
Files:

- profiles/meta.py — Llama-4 thinking models: supports_thinking=True, thinking_tags
- profiles/qwen.py — QwQ models: supports_thinking=True, thinking_always_enabled=True, thinking_tags
- profiles/moonshotai.py — kimi-thinking-*: supports_thinking=True
- profiles/harmony.py — gpt-oss-120b: supports_thinking=True
- profiles/zai.py — zai-glm-4.6: supports_thinking=True
These enable cascading to 15+ hosting providers (Ollama, Fireworks, Together, etc.).
Research each model family to determine correct flag values (especially
thinking_always_enabled and whether thinking_tags are needed for tag-based parsing).
Phase 3: Safety & hygiene

File: pydantic_ai_slim/pydantic_ai/models/__init__.py (in prepare_request())

After copying model_settings['thinking'] to model_request_parameters.thinking, pop it from the dict:

model_settings.pop('thinking', None)

This prevents downstream provider code from accidentally reading the raw unresolved value.
File: pydantic_ai_slim/pydantic_ai/profiles/__init__.py

@property
def thinking_mode(self) -> Literal['unsupported', 'optional', 'required']:
    if self.thinking_always_enabled:
        return 'required'
    if self.supports_thinking:
        return 'optional'
    return 'unsupported'

Update prepare_request() to use it instead of inline boolean checks.
File: GoogleModelProfile definition + Google model class
Change google_supports_thinking_level: bool to
google_thinking_api: Literal['budget', 'level'] | None.
Update all usages in the Google model class and profile factory.
Files: All provider model classes with thinking translation methods
Rename for consistency and grep-ability:

- Anthropic: _get_thinking_param() → _translate_thinking()
- OpenAI: _get_reasoning_effort() → _translate_thinking()
- Groq: _get_reasoning_format() → _translate_thinking()
- Google: _get_thinking_config() → _translate_thinking()
- Bedrock: _get_thinking_fields() → _translate_thinking()

Return types differ per provider — the convention is about naming, not signatures.
Phase 4: Tests

File: tests/test_unified_thinking.py (new)

Every behavior from Phases 1-3 should have a corresponding test.
Test Model.prepare_request() thinking resolution:
- thinking=True + supports_thinking=True → params.thinking = True
- thinking='high' + supports_thinking=True → params.thinking = 'high'
- thinking='medium' + supports_thinking=False → params.thinking = None (silently dropped)
- thinking=True + thinking_always_enabled=True → params.thinking = True
- No thinking + thinking_always_enabled=True → params.thinking = True (auto-enabled)
- No thinking + supports_thinking=True → params.thinking = None (not auto-enabled)
Test that prepare_request() does not mutate the original model_settings dict.
(This was a real bug on a prior branch.)
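A pytest-style sketch of that regression test; `fake_prepare_request` is a stand-in for the real Model.prepare_request():

```python
def fake_prepare_request(model_settings: dict) -> dict:
    settings = dict(model_settings)     # correct behavior: copy before popping
    settings.pop('thinking', None)
    return settings


def test_prepare_request_does_not_mutate_settings():
    original = {'thinking': 'high', 'temperature': 0.2}
    snapshot = dict(original)
    fake_prepare_request(original)
    assert original == snapshot         # the caller's dict is untouched
```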
For each provider with both unified and provider-specific settings:
- anthropic_thinking set + thinking set → anthropic_thinking wins
- openai_reasoning_effort set + thinking set → openai_reasoning_effort wins
- etc.
One test class per provider. Each tests the full ThinkingLevel → provider parameter
mapping after the Phase 1 fixes:
Anthropic:

- True → adaptive (adaptive-capable models) / budget (budget-only models)
- False → {'type': 'disabled'}
- 'low' → budget_tokens: 1024
- 'medium' → adaptive
- 'high' → budget_tokens: 16384
OpenAI:

- True → reasoning_effort='medium'
- False → reasoning_effort='none'
- 'low'/'medium'/'high' → direct passthrough
Google (budget models, i.e. Gemini 2.5):

- True → default budget
- False → thinking_budget: 0
- Effort levels → specific budgets (2048/8192/24576)
Google (level models, i.e. Gemini 3+):

- Effort levels → LOW/MEDIUM/HIGH enum values
Groq:

- True/effort → reasoning_format='parsed'
- False → reasoning_format='hidden'
xAI:

- True → 'low'
- 'medium' → 'low'
- 'high' → 'high'
Bedrock (Anthropic variant): same as Anthropic through Bedrock's layer.
Bedrock (OpenAI variant): same as OpenAI through Bedrock's layer.
- thinking=True + output type requiring a tool → raises UserError
- thinking=False + output type requiring a tool → no error
- After prepare_request(), model_settings should not contain the 'thinking' key
- The same Thinking(effort='high') capability produces sensible (non-error, non-no-op) results across all providers with supports_thinking=True
Phase 5: Docs

- Add the Groq limitation note (no true disable — 'hidden' hides output only)
- Verify the effort mapping tables match the code after the Phase 1 fixes
- Ensure all providers with profile flags are represented

After the Phase 1 fixes, update the mapping table to reflect:

- Anthropic: True/'medium' → adaptive, 'low' → budget 1024, 'high' → budget 16384
- xAI: True/'medium' → low, 'high' → high
- Groq: False → hidden
Phase 1 (correctness fixes) ─┐
Phase 2 (missing impls)     ─┼─ can be parallelized
Phase 3 (safety & hygiene)  ─┘
             │
             ▼
Phase 4 (tests) ── validates Phases 1-3
             │
             ▼
Phase 5 (docs) ── reflects final state
Out of scope / explicitly deferred:

- Thought summaries: no unified summary field; provider-specific settings remain.
- ResolvedThinkingConfig dataclass: superseded by the capabilities architecture. A raw ThinkingLevel on ModelRequestParameters is the right design.
- OpenAI 'minimal'/'xhigh': valid SDK values, not exposed in the unified API. Users can reach them via the openai_reasoning_effort provider-specific setting.
- Broader Bedrock vendor families: existing variants cover current needs.