
Commit 0b7d92e

Rework thinking budget: opt-in by default, adaptive thinking, effort levels
Thinking was unconditionally enabled for all models with provider-specific defaults (e.g. 'medium' for OpenAI, 8192 tokens for Anthropic). This meant every model paid the latency and cost of thinking even when the user never asked for it. This commit makes thinking opt-in: it is only enabled when the user sets thinking_budget in their YAML config, with one exception: thinking-only models (OpenAI o-series) still get a default of 'medium' since they cannot function without it.

New features:

- Adaptive thinking for Anthropic (thinking_budget: adaptive). Uses thinking.type=adaptive, which lets the model decide when and how much to think. Recommended for Claude 4.6 models.
- Effort-level strings for Anthropic (thinking_budget: low/medium/high/max). Translated to adaptive thinking + output_config.effort in the API. Previously these strings were silently ignored because the Anthropic client only checked for token-based budgets.
- Effort-level strings for Bedrock Claude. Mapped to token budgets via EffortTokens(), since the Bedrock API does not support adaptive thinking natively.

Bug fixes:

- The Anthropic and Bedrock clients silently ignored string effort levels (minimal/low/medium/high). A config with thinking_budget: high produced no thinking at all because the code only checked .Tokens > 0.
- applyOverrides and applyProviderDefaults used shallow struct copies that shared the underlying ProviderOpts map. Disabling thinking via /think deleted interleaved_thinking from the original config's map. Introduced cloneModelConfig() to deep-copy the map.
- /think on a Gemini 2.0 model (which does not support thinking) returned a 'medium' budget that caused API errors. The default case now returns nil for unknown/older Gemini models.

Code quality:

- Extracted resolveProviderType() to replace three copies of the same provider-type resolution logic.
- Extracted ensureInterleavedThinking() to replace four copies of the same ProviderOpts write pattern.
- Separated setThinkingDefaults (used by the /think toggle, generous) from applyModelDefaults (used at config load, conservative).
- Removed the empty applyGoogleDefaults; merged applyAnthropicDefaults and applyBedrockDefaults into shared helpers.
- Consolidated test files from 8+ test functions into compact table-driven tests with a unified assertion pattern.
- Moved ThinkingBudget method tests (IsDisabled, IsAdaptive, EffortTokens) to pkg/config/latest, where the type lives.

Schema and examples updated to document adaptive, max, and effort levels.

Assisted-By: docker-agent
1 parent d871092 commit 0b7d92e
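The ProviderOpts bug fixed above comes from Go's struct copy semantics: copying a struct copies the map header, not the map contents, so a "copy" still aliases the original map. The following self-contained sketch reproduces the bug and the cloneModelConfig fix; the trimmed-down ModelConfig here is illustrative, not the project's real struct.

```go
package main

import "fmt"

// ModelConfig is a stand-in for the real config struct; only ProviderOpts
// matters for demonstrating the aliasing bug.
type ModelConfig struct {
	Provider     string
	ProviderOpts map[string]any
}

// cloneModelConfig deep-copies the config, including the ProviderOpts map,
// so later mutations do not leak back into the original.
func cloneModelConfig(c *ModelConfig) *ModelConfig {
	clone := *c // copies scalar fields, but the map is still shared here
	if c.ProviderOpts != nil {
		clone.ProviderOpts = make(map[string]any, len(c.ProviderOpts))
		for k, v := range c.ProviderOpts {
			clone.ProviderOpts[k] = v
		}
	}
	return &clone
}

func main() {
	orig := &ModelConfig{
		Provider:     "anthropic",
		ProviderOpts: map[string]any{"interleaved_thinking": true},
	}

	// Buggy pattern: a plain struct copy shares the underlying map, so
	// deleting from the copy mutates the original config too.
	shallow := *orig
	delete(shallow.ProviderOpts, "interleaved_thinking")
	_, survives := orig.ProviderOpts["interleaved_thinking"]
	fmt.Println("key survives after shallow-copy delete:", survives) // false

	// Fixed pattern: the deep copy isolates the map.
	orig.ProviderOpts["interleaved_thinking"] = true
	deep := cloneModelConfig(orig)
	delete(deep.ProviderOpts, "interleaved_thinking")
	_, survives = orig.ProviderOpts["interleaved_thinking"]
	fmt.Println("key survives after deep-copy delete:", survives) // true
}
```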

File tree

11 files changed: +688 -1351 lines changed


agent-schema.json

Lines changed: 7 additions & 3 deletions
@@ -535,7 +535,7 @@
         "description": "Whether to track usage"
       },
       "thinking_budget": {
-        "description": "Controls reasoning effort/budget. Use 'none' or 0 to disable thinking. OpenAI: string levels ('minimal','low','medium','high'), default 'medium'. Anthropic: integer token budget (1024-32768), default 8192. Amazon Bedrock (Claude): same as Anthropic. Google Gemini 2.5: integer token budget (-1 for dynamic, 0 to disable, 24576 max), default -1. Google Gemini 3: string levels ('minimal' Flash only,'low','medium','high'), default 'high' for Pro, 'medium' for Flash.",
+        "description": "Controls reasoning effort/budget. Use 'none' or 0 to disable thinking. OpenAI: string levels ('minimal','low','medium','high'). Anthropic: integer token budget (1024-32768), 'adaptive' (lets the model decide), or effort levels ('low','medium','high','max') which use adaptive thinking with the given effort. Amazon Bedrock (Claude): integer token budget or effort levels ('low','medium','high') mapped to token budgets. Google Gemini 2.5: integer token budget (-1 for dynamic, 0 to disable, 24576 max). Google Gemini 3: string levels ('minimal' Flash only,'low','medium','high'). Thinking is only enabled when explicitly configured.",
         "oneOf": [
           {
             "type": "string",
@@ -544,9 +544,11 @@
               "minimal",
               "low",
               "medium",
-              "high"
+              "high",
+              "max",
+              "adaptive"
             ],
-            "description": "Reasoning effort level (OpenAI, Gemini 3). Use 'none' to disable thinking."
+            "description": "Reasoning effort level. 'adaptive'/'max' are Anthropic-specific. Use 'none' to disable thinking."
           },
           {
             "type": "integer",
@@ -562,6 +564,8 @@
             "low",
             "medium",
             "high",
+            "max",
+            "adaptive",
             -1,
             1024,
             8192,

examples/thinking_budget.yaml

Lines changed: 13 additions & 3 deletions
@@ -6,7 +6,7 @@
 agents:
   root:
     model: gpt-5-mini-min # <- try with gpt-5-mini-high
-    # model: claude-4-5-sonnet-min # <- try with claude-4-5-sonnet-high
+    # model: claude-4-5-sonnet-min # <- try with claude-4-5-sonnet-high or claude-opus-4-6-adaptive
     # model: gemini-2-5-flash-dynamic-thinking # <- try with -no-thinking, -low or -high variants
     description: a helpful assistant that thinks
     instruction: you are a helpful assistant who can also use tools, but only if you need to
@@ -29,15 +29,25 @@ models:
   claude-4-5-sonnet-min:
     provider: anthropic
     model: claude-sonnet-4-5-20250929
-    thinking_budget: 1024 # <- tokens, 1024 is the minimum
+    thinking_budget: 1024 # <- explicit token budget (1024-32768) for older models

   claude-4-5-sonnet-high:
     provider: anthropic
     model: claude-sonnet-4-5-20250929
-    thinking_budget: 32768 # <- tokens, 32768 is the Anthropic suggested maximum without batching
+    thinking_budget: 32768 # <- explicit token budget (32768 is the Anthropic suggested maximum)
     provider_opts:
       interleaved_thinking: true # <- enables interleaved thinking, aka tool calling during model reasoning

+  claude-opus-4-6-adaptive:
+    provider: anthropic
+    model: claude-opus-4-6
+    thinking_budget: adaptive # <- lets the model decide when and how much to think (recommended for 4.6)
+
+  claude-opus-4-6-low:
+    provider: anthropic
+    model: claude-opus-4-6
+    thinking_budget: low # <- adaptive thinking with low effort: "low", "medium", "high", "max"
+
   gemini-2-5-flash-dynamic-thinking:
     provider: google
     model: gemini-2.5-flash

pkg/config/latest/types.go

Lines changed: 42 additions & 2 deletions
@@ -397,7 +397,10 @@ type ModelConfig struct {
 	TrackUsage *bool `json:"track_usage,omitempty"`
 	// ThinkingBudget controls reasoning effort/budget:
 	// - For OpenAI: accepts string levels "minimal", "low", "medium", "high"
-	// - For Anthropic: accepts integer token budget (1024-32000)
+	// - For Anthropic: accepts integer token budget (1024-32000), "adaptive",
+	//   or string levels "low", "medium", "high", "max" (uses adaptive thinking with effort)
+	// - For Bedrock Claude: accepts integer token budget or string levels
+	//   "minimal", "low", "medium", "high" (mapped to token budgets via EffortTokens)
 	// - For other providers: may be ignored
 	ThinkingBudget *ThinkingBudget `json:"thinking_budget,omitempty"`
 	// Routing defines rules for routing requests to different models.
@@ -670,6 +673,7 @@ func (d DeferConfig) MarshalYAML() (any, error) {
 // ThinkingBudget represents reasoning budget configuration.
 // It accepts either a string effort level or an integer token budget:
 // - String: "minimal", "low", "medium", "high" (for OpenAI)
+// - String: "adaptive" (for Anthropic models that support adaptive thinking)
 // - Integer: token count (for Anthropic, range 1024-32768)
 type ThinkingBudget struct {
 	// Effort stores string-based reasoning effort levels
@@ -717,14 +721,50 @@ func (t ThinkingBudget) MarshalYAML() (any, error) {
 // NOT disabled when:
 // - Tokens > 0 or Tokens == -1 (explicit token budget)
 // - Effort is a real level like "medium" or "high"
+// - Effort is "adaptive"
 func (t *ThinkingBudget) IsDisabled() bool {
 	if t == nil {
 		return false
 	}
 	if t.Tokens == 0 && t.Effort == "" {
 		return true
 	}
-	return t.Effort == "none"
+	return strings.EqualFold(t.Effort, "none")
+}
+
+// IsAdaptive returns true if the thinking budget is set to adaptive mode.
+// Adaptive thinking lets the model decide how much thinking to do.
+func (t *ThinkingBudget) IsAdaptive() bool {
+	if t == nil {
+		return false
+	}
+	return strings.EqualFold(t.Effort, "adaptive")
+}
+
+// EffortTokens maps a string effort level to a token budget for providers
+// that only support token-based thinking (e.g. Bedrock Claude).
+//
+// The Anthropic direct API uses adaptive thinking + output_config.effort
+// for string levels instead; see anthropicEffort in the anthropic package.
+//
+// Returns (tokens, true) when a mapping exists, or (0, false) when
+// the budget uses an explicit token count or an unrecognised effort string.
+func (t *ThinkingBudget) EffortTokens() (int, bool) {
+	if t == nil || t.Effort == "" {
+		return 0, false
+	}
+	switch strings.ToLower(strings.TrimSpace(t.Effort)) {
+	case "minimal":
+		return 1024, true
+	case "low":
+		return 2048, true
+	case "medium":
+		return 8192, true
+	case "high":
+		return 16384, true
+	default:
+		return 0, false
+	}
 }

 // MarshalJSON implements custom marshaling to output simple string or int format

pkg/config/latest/types_test.go

Lines changed: 71 additions & 0 deletions
@@ -121,6 +121,77 @@ func TestThinkingBudget_MarshalUnmarshal_Zero(t *testing.T) {
 	require.Equal(t, "thinking_budget: 0\n", string(output))
 }

+func TestThinkingBudget_IsDisabled(t *testing.T) {
+	t.Parallel()
+
+	for _, tt := range []struct {
+		name string
+		b    *ThinkingBudget
+		want bool
+	}{
+		{"nil", nil, false},
+		{"zero tokens", &ThinkingBudget{Tokens: 0}, true},
+		{"none effort", &ThinkingBudget{Effort: "none"}, true},
+		{"positive tokens", &ThinkingBudget{Tokens: 8192}, false},
+		{"medium effort", &ThinkingBudget{Effort: "medium"}, false},
+		{"adaptive effort", &ThinkingBudget{Effort: "adaptive"}, false},
+		{"negative tokens (dynamic)", &ThinkingBudget{Tokens: -1}, false},
+	} {
+		t.Run(tt.name, func(t *testing.T) {
+			t.Parallel()
+			require.Equal(t, tt.want, tt.b.IsDisabled())
+		})
+	}
+}
+
+func TestThinkingBudget_IsAdaptive(t *testing.T) {
+	t.Parallel()
+
+	for _, tt := range []struct {
+		name string
+		b    *ThinkingBudget
+		want bool
+	}{
+		{"nil", nil, false},
+		{"adaptive", &ThinkingBudget{Effort: "adaptive"}, true},
+		{"medium", &ThinkingBudget{Effort: "medium"}, false},
+		{"tokens", &ThinkingBudget{Tokens: 8192}, false},
+	} {
+		t.Run(tt.name, func(t *testing.T) {
+			t.Parallel()
+			require.Equal(t, tt.want, tt.b.IsAdaptive())
+		})
+	}
+}
+
+func TestThinkingBudget_EffortTokens(t *testing.T) {
+	t.Parallel()
+
+	for _, tt := range []struct {
+		name       string
+		b          *ThinkingBudget
+		wantTokens int
+		wantOK     bool
+	}{
+		{"nil", nil, 0, false},
+		{"minimal", &ThinkingBudget{Effort: "minimal"}, 1024, true},
+		{"low", &ThinkingBudget{Effort: "low"}, 2048, true},
+		{"medium", &ThinkingBudget{Effort: "medium"}, 8192, true},
+		{"high", &ThinkingBudget{Effort: "high"}, 16384, true},
+		{"adaptive", &ThinkingBudget{Effort: "adaptive"}, 0, false},
+		{"none", &ThinkingBudget{Effort: "none"}, 0, false},
+		{"explicit tokens", &ThinkingBudget{Tokens: 4096}, 0, false},
+		{"empty effort", &ThinkingBudget{}, 0, false},
+	} {
+		t.Run(tt.name, func(t *testing.T) {
+			t.Parallel()
+			tokens, ok := tt.b.EffortTokens()
+			require.Equal(t, tt.wantOK, ok)
+			require.Equal(t, tt.wantTokens, tokens)
+		})
+	}
+}
+
 func TestAgents_UnmarshalYAML_RejectsUnknownFields(t *testing.T) {
 	t.Parallel()

pkg/model/provider/anthropic/beta_client.go

Lines changed: 31 additions & 13 deletions
@@ -95,20 +95,38 @@ func (c *Client) createBetaStream(
 	// For interleaved thinking to make sense, we use a default of 16384 tokens for the thinking budget
 	thinkingEnabled := c.ModelOptions.Thinking() == nil || *c.ModelOptions.Thinking()
 	if thinkingEnabled {
-		thinkingTokens := int64(16384)
-		if c.ModelConfig.ThinkingBudget != nil {
-			thinkingTokens = int64(c.ModelConfig.ThinkingBudget.Tokens)
+		if c.ModelConfig.ThinkingBudget != nil && c.ModelConfig.ThinkingBudget.IsAdaptive() {
+			// Adaptive thinking: let the model decide how much thinking to do
+			adaptive := anthropic.NewBetaThinkingConfigAdaptiveParam()
+			params.Thinking = anthropic.BetaThinkingConfigParamUnion{
+				OfAdaptive: &adaptive,
+			}
+			slog.Debug("Anthropic Beta API using adaptive thinking")
+		} else if effort, ok := anthropicEffort(c.ModelConfig.ThinkingBudget); ok {
+			// Effort level: use adaptive thinking + output_config.effort
+			adaptive := anthropic.NewBetaThinkingConfigAdaptiveParam()
+			params.Thinking = anthropic.BetaThinkingConfigParamUnion{
+				OfAdaptive: &adaptive,
+			}
+			params.OutputConfig.Effort = anthropic.BetaOutputConfigEffort(effort)
+			slog.Debug("Anthropic Beta API using adaptive thinking with effort",
+				"effort", effort)
 		} else {
-			slog.Info("Anthropic Beta API using default thinking_budget with interleaved thinking", "budget_tokens", thinkingTokens)
-		}
-		switch {
-		case thinkingTokens >= 1024 && thinkingTokens < maxTokens:
-			params.Thinking = anthropic.BetaThinkingConfigParamOfEnabled(thinkingTokens)
-			slog.Debug("Anthropic Beta API using thinking_budget with interleaved thinking", "budget_tokens", thinkingTokens)
-		case thinkingTokens >= maxTokens:
-			slog.Warn("Anthropic Beta API thinking_budget must be less than max_tokens, ignoring", "tokens", thinkingTokens, "max_tokens", maxTokens)
-		default:
-			slog.Warn("Anthropic Beta API thinking_budget below minimum (1024), ignoring", "tokens", thinkingTokens)
+			thinkingTokens := int64(16384)
+			if c.ModelConfig.ThinkingBudget != nil {
+				thinkingTokens = int64(c.ModelConfig.ThinkingBudget.Tokens)
+			} else {
+				slog.Info("Anthropic Beta API using default thinking_budget with interleaved thinking", "budget_tokens", thinkingTokens)
+			}
+			switch {
+			case thinkingTokens >= 1024 && thinkingTokens < maxTokens:
+				params.Thinking = anthropic.BetaThinkingConfigParamOfEnabled(thinkingTokens)
+				slog.Debug("Anthropic Beta API using thinking_budget with interleaved thinking", "budget_tokens", thinkingTokens)
+			case thinkingTokens >= maxTokens:
+				slog.Warn("Anthropic Beta API thinking_budget must be less than max_tokens, ignoring", "tokens", thinkingTokens, "max_tokens", maxTokens)
+			default:
+				slog.Warn("Anthropic Beta API thinking_budget below minimum (1024), ignoring", "tokens", thinkingTokens)
+			}
 		}
 	} else {
 		slog.Debug("Anthropic Beta API: Thinking disabled via /think command")

pkg/model/provider/anthropic/client.go

Lines changed: 54 additions & 2 deletions
@@ -50,12 +50,23 @@ func (c *Client) getResponseTrailer() http.Header {
 // adjustMaxTokensForThinking checks if max_tokens needs adjustment for thinking_budget.
 // Anthropic's max_tokens represents the combined budget for thinking + output tokens.
 // Returns the adjusted maxTokens value and an error if user-set max_tokens is too low.
+//
+// This only applies to fixed token budgets. Adaptive thinking and effort-based
+// budgets don't need adjustment since the model manages its own thinking allocation.
 func (c *Client) adjustMaxTokensForThinking(maxTokens int64) (int64, error) {
-	if c.ModelConfig.ThinkingBudget == nil || c.ModelConfig.ThinkingBudget.Tokens <= 0 {
+	if c.ModelConfig.ThinkingBudget == nil || c.ModelConfig.ThinkingBudget.IsAdaptive() {
+		return maxTokens, nil
+	}
+	// Effort-based budgets use adaptive thinking, so no token adjustment is needed.
+	if _, ok := anthropicEffort(c.ModelConfig.ThinkingBudget); ok {
 		return maxTokens, nil
 	}

 	thinkingTokens := int64(c.ModelConfig.ThinkingBudget.Tokens)
+	if thinkingTokens <= 0 {
+		return maxTokens, nil
+	}
+
 	minRequired := thinkingTokens + 1024 // configured thinking budget + minimum output buffer

 	if maxTokens <= thinkingTokens {
@@ -297,7 +308,25 @@ func (c *Client) CreateChatCompletionStream(

 	// Apply thinking budget first, as it affects whether we can set temperature
 	thinkingEnabled := false
-	if c.ModelConfig.ThinkingBudget != nil && c.ModelConfig.ThinkingBudget.Tokens > 0 {
+	if c.ModelConfig.ThinkingBudget != nil && c.ModelConfig.ThinkingBudget.IsAdaptive() {
+		// Adaptive thinking: let the model decide how much thinking to do
+		adaptive := anthropic.NewThinkingConfigAdaptiveParam()
+		params.Thinking = anthropic.ThinkingConfigParamUnion{
+			OfAdaptive: &adaptive,
+		}
+		thinkingEnabled = true
+		slog.Debug("Anthropic API using adaptive thinking (standard messages)")
+	} else if effort, ok := anthropicEffort(c.ModelConfig.ThinkingBudget); ok {
+		// Effort level: use adaptive thinking + output_config.effort
+		adaptive := anthropic.NewThinkingConfigAdaptiveParam()
+		params.Thinking = anthropic.ThinkingConfigParamUnion{
+			OfAdaptive: &adaptive,
+		}
+		params.OutputConfig.Effort = anthropic.OutputConfigEffort(effort)
+		thinkingEnabled = true
+		slog.Debug("Anthropic API using adaptive thinking with effort",
+			"effort", effort)
+	} else if c.ModelConfig.ThinkingBudget != nil && c.ModelConfig.ThinkingBudget.Tokens > 0 {
 		thinkingTokens := int64(c.ModelConfig.ThinkingBudget.Tokens)
 		switch {
 		case thinkingTokens >= 1024 && thinkingTokens < maxTokens:
@@ -895,6 +924,29 @@ func differenceIDs(a, b map[string]struct{}) []string {
 	return missing
 }

+// anthropicEffort maps a ThinkingBudget effort string to an Anthropic API
+// effort level ("low", "medium", "high", "max"). Returns ("", false) when
+// the budget uses token counts, adaptive mode, or an unrecognised string.
+func anthropicEffort(b *latest.ThinkingBudget) (string, bool) {
+	if b == nil {
+		return "", false
+	}
+	switch strings.ToLower(strings.TrimSpace(b.Effort)) {
+	case "low":
+		return "low", true
+	case "minimal": // "minimal" is not in the Anthropic API; map to closest
+		return "low", true
+	case "medium":
+		return "medium", true
+	case "high":
+		return "high", true
+	case "max":
+		return "max", true
+	default:
+		return "", false
+	}
+}
+
 // anthropicContextLimit returns a reasonable default context window for Anthropic models.
 // We default to 200k tokens, which is what 3.5-4.5 models support; adjust as needed over time.
 func anthropicContextLimit(model string) int64 {

pkg/model/provider/bedrock/client.go

Lines changed: 16 additions & 10 deletions
@@ -275,16 +275,23 @@ func (c *Client) buildInferenceConfig() *types.InferenceConfiguration {
 	return cfg
 }

+// resolveThinkingTokens returns the effective token budget for thinking.
+// It handles both explicit token counts and effort-level strings.
+// Returns 0 if no valid thinking budget is configured.
+func (c *Client) resolveThinkingTokens() int {
+	if c.ModelConfig.ThinkingBudget == nil {
+		return 0
+	}
+	if tokens, ok := c.ModelConfig.ThinkingBudget.EffortTokens(); ok {
+		return tokens
+	}
+	return c.ModelConfig.ThinkingBudget.Tokens
+}
+
 // isThinkingEnabled mirrors the validation in buildAdditionalModelRequestFields
 // to determine if thinking params will affect inference config (temp/topP constraints).
 func (c *Client) isThinkingEnabled() bool {
-	if c.ModelConfig.ThinkingBudget == nil || c.ModelConfig.ThinkingBudget.Tokens <= 0 {
-		return false
-	}
-
-	tokens := c.ModelConfig.ThinkingBudget.Tokens
-
-	// Check minimum (Claude requires at least 1024 tokens for thinking)
+	tokens := c.resolveThinkingTokens()
 	if tokens < 1024 {
 		return false
 	}
@@ -310,12 +317,11 @@ func (c *Client) promptCachingEnabled() bool {

 // buildAdditionalModelRequestFields configures Claude's extended thinking (reasoning) mode.
 func (c *Client) buildAdditionalModelRequestFields() document.Interface {
-	if c.ModelConfig.ThinkingBudget == nil || c.ModelConfig.ThinkingBudget.Tokens <= 0 {
+	tokens := c.resolveThinkingTokens()
+	if tokens <= 0 {
 		return nil
 	}

-	tokens := c.ModelConfig.ThinkingBudget.Tokens
-
 	// Validate minimum (Claude requires at least 1024 tokens for thinking)
 	if tokens < 1024 {
 		slog.Warn("Bedrock thinking_budget below minimum (1024), ignoring",
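The Bedrock resolution order above (effort string first, then the explicit token count, with adaptive mode yielding no fixed budget) can be exercised in isolation. This standalone sketch re-implements ThinkingBudget, EffortTokens, and a simplified resolveTokens outside the package; names mirror the diff, but the code is illustrative rather than the project's source.

```go
package main

import (
	"fmt"
	"strings"
)

// ThinkingBudget mirrors the config type: either a string effort level or a
// token count (JSON tags and other fields omitted).
type ThinkingBudget struct {
	Effort string
	Tokens int
}

func (t *ThinkingBudget) IsAdaptive() bool {
	return t != nil && strings.EqualFold(t.Effort, "adaptive")
}

// EffortTokens maps effort strings to token budgets, as in the diff above.
func (t *ThinkingBudget) EffortTokens() (int, bool) {
	if t == nil || t.Effort == "" {
		return 0, false
	}
	switch strings.ToLower(strings.TrimSpace(t.Effort)) {
	case "minimal":
		return 1024, true
	case "low":
		return 2048, true
	case "medium":
		return 8192, true
	case "high":
		return 16384, true
	default:
		return 0, false
	}
}

// resolveTokens is a simplified stand-in for the client's resolveThinkingTokens:
// effort strings win over explicit tokens, and adaptive has no fixed budget.
func resolveTokens(b *ThinkingBudget) int {
	if b == nil || b.IsAdaptive() {
		return 0
	}
	if tokens, ok := b.EffortTokens(); ok {
		return tokens
	}
	return b.Tokens
}

func main() {
	fmt.Println(resolveTokens(&ThinkingBudget{Effort: "high"}))     // 16384
	fmt.Println(resolveTokens(&ThinkingBudget{Tokens: 4096}))       // 4096
	fmt.Println(resolveTokens(&ThinkingBudget{Effort: "adaptive"})) // 0
}
```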
