Opal supports configuring reasoning effort (extended thinking) for models that support it. This controls how much internal reasoning the model performs before responding: higher effort means more thorough analysis at the cost of latency and tokens.
| Level | Description |
|---|---|
| `low` | Minimal reasoning – fast, economical |
| `medium` | Balanced reasoning and speed |
| `high` | Thorough reasoning – slower, more tokens (default for thinking-capable models) |
| `max` | Unconstrained reasoning (Opus 4.6+ only) |
Thinking-capable models (Claude, GPT-5, o3, o4) default to `high`. Models that don't support reasoning (e.g. GPT-4o, GPT-4.1) always run without it. The `off` level is not user-selectable for thinking-capable models; reasoning is always enabled at `low` or above.
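As a minimal sketch of the rules above (hypothetical helper names – the real validation lives elsewhere in Opal), default and selectable levels might be derived like this:

```elixir
defmodule LevelRulesSketch do
  # Thinking-capable models: reasoning is always on, default :high, :off rejected.
  # Non-capable models: reasoning never runs, so no level is selectable.
  def default_level(thinking_capable?), do: if(thinking_capable?, do: :high, else: :off)

  def selectable?(_level, false), do: false
  def selectable?(level, true), do: level in [:low, :medium, :high, :max]
end
```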
```mermaid
graph TD
    User["User selects<br/>model + thinking level"]
    Model["Opal.Provider.Model<br/><small>thinking_level: :high</small>"]
    Copilot["Provider.Copilot"]
    ChatAPI["Chat Completions API<br/><small>reasoning_effort: high</small><br/>reasoning_content deltas when provided"]
    RespAPI["Responses API<br/><small>reasoning.effort: high<br/>reasoning.summary: auto</small><br/>reasoning_summary_text deltas"]

    User --> Model
    Model --> Copilot
    Copilot -->|"Claude / o3 / o4 / others"| ChatAPI
    Copilot -->|"GPT-5 / oswe families"| RespAPI
```
The Copilot API proxies multiple model families through two API variants. Reasoning effort is sent in the request body for thinking-capable models, and Opal parses thinking deltas when providers emit them.
**Chat Completions:** Adds a `reasoning_effort` string to the request body for thinking-capable model IDs.
```elixir
# In Provider.Copilot – delegates to shared Provider module
defp maybe_add_reasoning(body, %{thinking_level: level, id: id}, :completions) do
  if thinking_capable?(id) do
    Map.put(body, :reasoning_effort, Opal.Provider.reasoning_effort(level))
  else
    body
  end
end
```

**Responses API** (`gpt-5*` and `oswe*`): Adds a `reasoning` object with the effort level and `summary: "auto"`, and switches the system role to `"developer"` when thinking is enabled.
```elixir
# In Provider.Copilot – Responses API reasoning
defp maybe_add_reasoning(body, %{thinking_level: :off}, :responses), do: body

defp maybe_add_reasoning(body, %{thinking_level: level}, :responses) do
  Map.put(body, :reasoning, %{effort: Opal.Provider.reasoning_effort(level), summary: "auto"})
end
```

Level mapping (via `Opal.Provider.reasoning_effort/1`): `:low` → `"low"`, `:medium` → `"medium"`, `:high` → `"high"`, `:max` → `"high"` (clamped; the Copilot proxy doesn't support `"max"` natively).
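The clamping behaviour can be sketched as a plain mapping function (a hypothetical standalone version of `Opal.Provider.reasoning_effort/1`; the real module may differ):

```elixir
defmodule ReasoningEffortSketch do
  # Map Opal's thinking levels to the effort strings the Copilot proxy accepts.
  # :max is clamped to "high" because the proxy has no native "max" level.
  def reasoning_effort(:low), do: "low"
  def reasoning_effort(:medium), do: "medium"
  def reasoning_effort(:high), do: "high"
  def reasoning_effort(:max), do: "high"
end
```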
Thinking-capable models are detected by model ID prefix: `gpt-5*`, `claude-sonnet-4*`, `claude-opus-4*`, `claude-haiku-4.5`, `o3*`, `o4*`. Other models (e.g. `gpt-4o`, `gpt-4.1`) don't receive reasoning params.
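A prefix check like the one described could look like this (a sketch with an assumed prefix list taken from the text above; the real predicate lives in `Provider.Copilot` and may match differently):

```elixir
defmodule ThinkingCapableSketch do
  # Prefixes from the support list above.
  @prefixes ~w(gpt-5 claude-sonnet-4 claude-opus-4 claude-haiku-4.5 o3 o4)

  def thinking_capable?(model_id) when is_binary(model_id) do
    Enum.any?(@prefixes, &String.starts_with?(model_id, &1))
  end
end
```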
Copilot thinking support matrix:
| Model Family | API Path | Reasoning Sent? | Thinking Event Parsed |
|---|---|---|---|
| GPT-5, GPT-5.1, GPT-5.2, GPT-5.3, `oswe*` | Responses API | ✅ `reasoning.effort` | `response.reasoning_summary_text.delta` → `{:thinking_delta, text}` |
| Claude Sonnet 4, Opus 4.x, Haiku 4.5, o3/o4 | Chat Completions | ✅ `reasoning_effort` (thinking-capable IDs) | `choices[].delta.reasoning_text` → `{:thinking_delta, text}` (summary) |
| GPT-4o, GPT-4.1 | Chat Completions | ❌ Not sent | N/A |
Thinking output is parsed from SSE into Opal's semantic events:
| SSE Data | Opal Event | Notes |
|---|---|---|
| `{"choices": [{"delta": {"reasoning_text": "..."}}]}` | `{:thinking_delta, text}` | Copilot proxy for Claude – human-readable summary |
| `{"choices": [{"delta": {"reasoning_content": "..."}}]}` | `{:thinking_delta, text}` | Standard Chat Completions (Anthropic direct) |
| `{"choices": [{"delta": {"reasoning_opaque": "..."}}]}` | (ignored) | Encrypted round-trip data; not displayable |
| `{"type": "response.output_item.added", "item": {"type": "reasoning"}}` | `{:thinking_start, %{item_id: id}}` | Responses API |
| `{"type": "response.reasoning_summary_text.delta", "delta": "..."}` | `{:thinking_delta, text}` | Responses API |
The Copilot proxy wraps Claude's extended thinking in a non-standard Chat Completions SSE format that differs from both Anthropic's native API and OpenAI's `reasoning_content` convention. Each thinking chunk includes three keys:
```json
{
  "choices": [{
    "delta": {
      "role": "assistant",
      "content": "",
      "reasoning_text": "Let me analyze this step by step..."
    }
  }]
}
```

| Key | Purpose |
|---|---|
| `reasoning_text` | Human-readable thinking text (displayed in UI) |
| `reasoning_opaque` | Encrypted blob for round-tripping to the API (not displayable) |
| `content` | Always `""` during thinking chunks |
Every thinking chunk also carries `role: "assistant"` and `content: ""`. Without special handling, the role-start parser would emit a spurious `{:text_start}` on every thinking chunk, creating dozens of empty assistant message entries. The parser guards against this by checking for `reasoning_text`, `reasoning_content`, and `reasoning_opaque` keys before emitting `text_start`. As a second safety net, `stream.ex` deduplicates `{:message_start}` broadcasts via the `message_started` flag (reset each streaming cycle in `begin_stream/2`).
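The guard can be sketched as a small predicate (hypothetical module and function names; the real check sits inside the shared event parser):

```elixir
defmodule ThinkingDeltaGuard do
  # A delta that carries any reasoning key is a thinking chunk, so its
  # `role: "assistant"` / `content: ""` must not trigger a new text_start.
  @reasoning_keys ["reasoning_text", "reasoning_content", "reasoning_opaque"]

  def thinking_chunk?(delta) when is_map(delta) do
    Enum.any?(@reasoning_keys, &Map.has_key?(delta, &1))
  end
end
```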
After the thinking phase, an opaque-only chunk signals the end of reasoning:

```json
{"choices": [{"delta": {"role": "assistant", "content": "", "reasoning_opaque": "1b9UghY8..."}}]}
```

Then normal text content follows with non-empty `content` values.
Note: Direct LLM provider support (Anthropic, OpenAI, etc.) has been removed in the current version. The behaviour is designed for future re-addition. The section below is kept for reference.
For direct provider access, the thinking level would map to provider-specific reasoning parameters:
| Opal Level | Anthropic (adaptive) | OpenAI |
|---|---|---|
| `:low` | `effort: "low"` | `reasoning.effort: "low"` |
| `:medium` | `effort: "medium"` | `reasoning.effort: "medium"` |
| `:high` | `effort: "high"` | `reasoning.effort: "high"` |
| `:max` | `effort: "max"` | clamped → `"high"` |
Thinking content is persisted in messages via the `thinking` field on `Opal.Message`. When the agent finalizes a response, accumulated thinking text is stored alongside the assistant message content and tool calls.
```mermaid
graph LR
    Stream["SSE Stream"] --> Accumulate["stream.ex<br/><small>current_thinking</small>"]
    Accumulate --> Finalize["agent.ex<br/><small>Message.assistant(text, calls, thinking: ...)</small>"]
    Finalize --> Session["session/session.ex<br/><small>DETS persistence</small>"]
    Finalize --> NextTurn["Next API call<br/><small>Thinking included in<br/>message conversion</small>"]
```
Why roundtrip? OpenAI's docs recommend passing back reasoning items between tool calls to maintain reasoning continuity. Anthropic's thinking blocks have cryptographic signatures for context continuity. Without roundtripping, the model loses its chain of thought at every tool call.
How it roundtrips per API variant:
- **Chat Completions:** Previous assistant messages include a `reasoning_content` field if they had thinking content.
- **Responses API:** A `reasoning` item with `summary: [%{type: "summary_text", text: ...}]` is prepended to the assistant's output items in the input array.
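The Chat Completions side of the roundtrip can be sketched like this (hypothetical function name; Opal's real conversion lives in the shared provider module and handles more message shapes):

```elixir
defmodule RoundtripSketch do
  # Reattach stored thinking text as `reasoning_content` on assistant
  # messages so the model keeps its chain of thought across tool calls.
  def to_openai_message(%{role: :assistant, content: text, thinking: thinking})
      when is_binary(thinking) and thinking != "" do
    %{role: "assistant", content: text, reasoning_content: thinking}
  end

  def to_openai_message(%{role: role, content: text}) do
    %{role: to_string(role), content: text}
  end
end
```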
The `stream.ex` handler auto-emits a `thinking_start` event before the first `thinking_delta` if the provider didn't send one (Chat Completions doesn't emit explicit start events). This is detected via `current_thinking` being `nil` (not started) vs `""` (started).
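The nil-vs-empty-string distinction can be sketched as a pure function over the accumulator (a simplified, hypothetical shape; the real handler threads this through the stream state):

```elixir
defmodule AutoStartSketch do
  # nil -> no thinking seen this cycle; first delta also emits a start event.
  # ""  -> thinking already started; deltas just accumulate.
  def handle_thinking_delta(nil, text), do: {[:thinking_start, {:thinking_delta, text}], text}
  def handle_thinking_delta(acc, text), do: {[{:thinking_delta, text}], acc <> text}
end
```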
Thinking content is displayed in the CLI timeline when Ctrl+O is toggled on (the same toggle that shows tool output):
- Thinking blocks appear as `{ kind: "thinking", text: string }` entries in the timeline, interspersed with messages and tool calls in the order they occurred.
- Rendered with a 💭 prefix in dim/italic gray to distinguish from assistant text.
- Truncated to 8 lines when viewing historical thinking blocks.
- During streaming, the animated `ThinkingIndicator` kaomoji still appears below the timeline.
When Ctrl+O is off (default), thinking entries are hidden from the timeline view.
The `/models` command opens an interactive picker. After selecting a model that supports thinking, a second picker appears with the available thinking levels (fetched from the server, not hardcoded).
```jsonc
// Set model with thinking level
{"method": "model/set", "params": {
  "session_id": "...",
  "model_id": "claude-opus-4.6",
  "thinking_level": "high"
}}

// Change thinking level without switching models
{"method": "thinking/set", "params": {
  "session_id": "...",
  "level": "medium"
}}
```

```elixir
# At session start
Opal.start_session(%{
  model: {:copilot, "claude-opus-4.6"},
  thinking_level: :high
})

# Mid-session
model = Opal.Provider.Model.new(:copilot, "claude-opus-4.6", thinking_level: :high)
Opal.set_model(agent, model)
```

The `models/list` RPC returns reasoning capability per model:
```json
{
  "models": [
    {
      "id": "claude-opus-4.6",
      "name": "Claude Opus 4.6",
      "provider": "copilot",
      "supports_thinking": true,
      "thinking_levels": ["low", "medium", "high", "max"]
    },
    {
      "id": "claude-sonnet-4.5",
      "name": "Claude Sonnet 4.5",
      "provider": "copilot",
      "supports_thinking": true,
      "thinking_levels": ["low", "medium", "high"]
    },
    {
      "id": "gpt-4o",
      "name": "GPT-4o",
      "provider": "copilot",
      "supports_thinking": false,
      "thinking_levels": []
    }
  ]
}
```

This data comes from LLMDB's `reasoning.enabled` capability flag. Opus 4.6+ models get the additional `"max"` level. The CLI model picker uses this to decide whether to show the thinking level step.
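A plausible reconstruction of that level-list derivation (hypothetical field names standing in for LLMDB's capability data; the real registry code may differ):

```elixir
defmodule ThinkingLevelsSketch do
  # reasoning.enabled gates thinking entirely; Opus 4.6+ adds :max on top.
  def thinking_levels(%{reasoning_enabled: false}), do: []
  def thinking_levels(%{reasoning_enabled: true, supports_max: true}), do: [:low, :medium, :high, :max]
  def thinking_levels(%{reasoning_enabled: true}), do: [:low, :medium, :high]
end
```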
```
Provider SSE → parse_stream_event/1 → {:thinking_start, %{}}
(Copilot)                           → {:thinking_delta, "Let me analyze..."}
  → stream.ex         → Accumulates into current_thinking
                      → Auto-emits thinking_start if missing
  → Agent broadcasts  → {:thinking_start}
                      → {:thinking_delta, %{delta: "..."}}
  → RPC serializes    → {"type": "thinking_start"}
                      → {"type": "thinking_delta", "delta": "..."}
  → CLI reducer       → Timeline: {kind: "thinking", text: "..."}
                      → AgentView.thinking: "..." (indicator)
  → finalize_response → Message.assistant(text, calls, thinking: "...")
                      → Persisted via Session DETS store
```
- `lib/opal/provider/model.ex` – `thinking_level` field and validation (`:off | :low | :medium | :high | :max`)
- `lib/opal/provider/registry.ex` – Per-model `thinking_levels` from LLMDB, `supports_max_thinking?/1`
- `lib/opal/message.ex` – `thinking` field on the Message struct
- `lib/opal/provider/provider.ex` – Shared `parse_chat_event/1`, `convert_messages_openai/2`, `reasoning_effort/1`. Parses `reasoning_text` and `reasoning_content` as `{:thinking_delta, text}`.
- `lib/opal/provider/copilot.ex` – `maybe_add_reasoning/3`, `thinking_capable?/1`, Responses API conversion
- `lib/opal/agent/stream.ex` – Thinking accumulation, auto-start detection, and `message_started` dedup guard
- `lib/opal/agent/state.ex` – `message_started` boolean prevents duplicate `{:message_start}` broadcasts per streaming cycle
- `lib/opal/agent/agent.ex` – `current_thinking` state, SSE stream handling, `finalize_response`
- `lib/opal/session/session.ex` – Thinking persistence on messages in the session store (DETS)
- `lib/opal/rpc/server.ex` – `thinking/set` and `model/set` with `thinking_level`
- `src/hooks/use-opal.ts` – Timeline thinking entries, `appendThinkingDelta`
- `src/components/message-list.tsx` – `ThinkingBlock` component
- `src/components/model-picker.tsx` – Two-step picker (model → thinking level)
- `test/opal/reasoning_effort_test.exs` – Reasoning effort unit tests
- `test/opal/thinking_integration_test.exs` – Full-stack thinking integration tests (fixture replay)
- `test/opal/live_thinking_test.exs` – Live API tests that record thinking fixtures
- `test/opal/provider/openai_test.exs` – Shared provider helpers tests
- **OpenAI Reasoning Guide** – Official docs for the `reasoning.effort` and `reasoning.summary` parameters on the Responses API.
- **Anthropic Extended Thinking** – Official docs for budget-based and adaptive thinking modes, including `output_config.effort` levels.