Advisor-Tool - Enhance fast models with smart ones #25790

krrish-berri-2 · 2026-04-15T15:19:57Z

krrish-berri-2
Apr 15, 2026
Maintainer

We're discussing a possible improvement to model routing on LiteLLM. The goal is to let cheaper, faster models tackle most turns while still delivering frontier-quality reasoning when the task genuinely needs it — without forcing users/apps to pick one or the other upfront.

The idea: if an admin sets advisor_model under a model's litellm_params, LiteLLM automatically exposes an enhanced variant of that model alongside the base one — no second entry in model_list needed. The enhanced variant runs the base model as the primary responder and gives it access to an advisor tool backed by the configured smarter model. The base model calls the advisor only when it gets stuck on a hard sub-step, and uses the advice to continue.

This productizes the Anthropic advisor tool pattern, so any downstream app (OpenWebUI, Cursor, Continue, custom apps talking OpenAI-compatible endpoints) can get "smart-when-it-matters" behavior with zero code changes.

1. Example config.yaml

model_list:
  - model_name: claude-sonnet-4-6
    litellm_params:
      model: anthropic/claude-sonnet-4-6
      advisor_model: anthropic/claude-opus-4-6   # ← enables enhanced variant
      advisor_max_calls: 3
      advisor_budget_usd: 0.50

  - model_name: gpt-5-mini
    litellm_params:
      model: openai/gpt-5-mini
      advisor_model: openai/gpt-5-pro

With the config above, the proxy exposes:

claude-sonnet-4-6              # base, unchanged
enhanced_model/claude-sonnet-4-6   # auto-exposed because advisor_model is set
gpt-5-mini
enhanced_model/gpt-5-mini

Clients pick whichever they want at call time. No duplicate model_list entries, no separate aliasing block to maintain. If advisor_model isn't set, the enhanced variant simply isn't exposed.
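
A minimal sketch of the exposure rule described above, assuming the config has already been parsed into a list of dicts. The function name `exposed_model_names` and the `plain-model` entry are hypothetical and only illustrate the rule; this is not LiteLLM's actual implementation.

```python
# Prefix proposed in this thread; the final naming is still an open question.
ENHANCED_PREFIX = "enhanced_model/"


def exposed_model_names(model_list):
    """Return the model names a proxy would expose for a parsed model_list."""
    names = []
    for entry in model_list:
        name = entry["model_name"]
        names.append(name)  # the base model is always exposed, unchanged
        # The enhanced variant appears only when advisor_model is configured.
        if entry.get("litellm_params", {}).get("advisor_model"):
            names.append(ENHANCED_PREFIX + name)
    return names


model_list = [
    {
        "model_name": "claude-sonnet-4-6",
        "litellm_params": {
            "model": "anthropic/claude-sonnet-4-6",
            "advisor_model": "anthropic/claude-opus-4-6",
        },
    },
    {
        "model_name": "gpt-5-mini",
        "litellm_params": {
            "model": "openai/gpt-5-mini",
            "advisor_model": "openai/gpt-5-pro",
        },
    },
    # Hypothetical entry without advisor_model: no enhanced variant is exposed.
    {"model_name": "plain-model", "litellm_params": {"model": "openai/plain-model"}},
]

print(exposed_model_names(model_list))
```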

2. User Request

curl https://litellm-proxy/v1/chat/completions \
  -H "Authorization: Bearer sk-user-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "enhanced_model/claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "refactor this auth module and explain the trade-offs"}]
  }'

Response headers expose what happened:

X-LiteLLM-Base-Model: claude-sonnet-4-6
X-LiteLLM-Advisor-Model: claude-opus-4-6
X-LiteLLM-Advisor-Calls: 2

And the usage block reports both spends:

{
  "usage": {
    "prompt_tokens": 1820,
    "completion_tokens": 940,
    "advisor_prompt_tokens": 3100,
    "advisor_completion_tokens": 420,
    "advisor_cost_usd": 0.031
  }
}
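
One way the advisor fields in the usage block above could be produced is by summing the usage of each advisor call and attaching the totals to the base model's usage. The sketch below assumes each advisor call reports `prompt_tokens`, `completion_tokens`, and `cost_usd`; the helper name `merge_advisor_usage` and the per-call split are invented for illustration (the split was chosen to add up to the example figures above).

```python
def merge_advisor_usage(base_usage, advisor_calls):
    """Fold per-call advisor usage into a copy of the base model's usage block."""
    merged = dict(base_usage)  # leave the caller's dict untouched
    merged["advisor_prompt_tokens"] = sum(c["prompt_tokens"] for c in advisor_calls)
    merged["advisor_completion_tokens"] = sum(
        c["completion_tokens"] for c in advisor_calls
    )
    # Round to avoid floating-point noise in the reported cost.
    merged["advisor_cost_usd"] = round(sum(c["cost_usd"] for c in advisor_calls), 6)
    return merged


base_usage = {"prompt_tokens": 1820, "completion_tokens": 940}

# Illustrative per-call numbers only; two calls, matching X-LiteLLM-Advisor-Calls: 2.
advisor_calls = [
    {"prompt_tokens": 1500, "completion_tokens": 200, "cost_usd": 0.015},
    {"prompt_tokens": 1600, "completion_tokens": 220, "cost_usd": 0.016},
]

usage = merge_advisor_usage(base_usage, advisor_calls)
print(usage)
```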

3. Preview / Trace Endpoint

curl https://litellm-proxy/v1/routing/preview \
  -H "Content-Type: application/json" \
  -d '{"model": "enhanced_model/claude-sonnet-4-6", "messages": [{"role": "user", "content": "..."}]}'
# → {"base_model": "claude-sonnet-4-6",
#    "advisor_model": "claude-opus-4-6",
#    "advisor_tool_injected": true,
#    "advisor_budget_usd": 0.50}

How it might work

Request arrives with model=enhanced_model/claude-sonnet-4-6
      │
      ▼
Resolve base_model + advisor_model from config
  → admin override in litellm_params (if set)
  → else LiteLLM's built-in pairing registry
      │
      ▼
Inject `advisor` tool into the request's tool list
  (merged alongside any user-defined tools)
      │
      ▼
Dispatch to base model (claude-sonnet-4-6) as normal
  → streaming preserved
      │
      ▼
If base model calls advisor(question, context):
  - intercept tool call
  - forward to advisor_model (claude-opus-4-6)
  - return advice as tool_result
  - enforce advisor_max_calls + advisor_budget_usd
      │
      ▼
Base model continues, produces final response
      │
      ▼
Stream response back, add X-LiteLLM-Advisor-* headers,
merge advisor usage into response.usage
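
The flow above can be sketched as a small loop. This is a sketch under stated assumptions, not the proposed implementation: `call_base` and `call_advisor` are hypothetical callables standing in for the real model dispatch, the tool schema is a guess at what an `advisor(question, context)` tool might look like, and the "continue without advice" behavior at the cap is only one of the fallback options listed under the open questions.

```python
from dataclasses import dataclass

# Hypothetical schema for the injected tool (OpenAI-style function tool).
ADVISOR_TOOL = {
    "type": "function",
    "function": {
        "name": "advisor",
        "description": "Ask a stronger model for help on a hard sub-step.",
        "parameters": {
            "type": "object",
            "properties": {
                "question": {"type": "string"},
                "context": {"type": "string"},
            },
            "required": ["question"],
        },
    },
}


@dataclass
class AdvisorLimits:
    max_calls: int = 3        # mirrors advisor_max_calls
    budget_usd: float = 0.50  # mirrors advisor_budget_usd
    calls: int = 0
    spent_usd: float = 0.0

    def allows_another_call(self) -> bool:
        # Checked before each call, so the last call may overshoot the budget.
        return self.calls < self.max_calls and self.spent_usd < self.budget_usd


def run_enhanced(messages, user_tools, call_base, call_advisor, limits, max_turns=10):
    """Run the base model with an injected advisor tool.

    call_base(history, tools) -> {"content": str} or
                                 {"tool_call": {"id", "name", "arguments"}}
    call_advisor(question, context) -> (advice, cost_usd)
    """
    tools = list(user_tools) + [ADVISOR_TOOL]  # merge alongside user-defined tools
    history = list(messages)
    for _ in range(max_turns):
        reply = call_base(history, tools)
        call = reply.get("tool_call")
        if call is None or call["name"] != "advisor":
            return reply  # final answer, or a user-defined tool call for the client
        args = call["arguments"]
        if limits.allows_another_call():
            advice, cost_usd = call_advisor(args["question"], args.get("context", ""))
            limits.calls += 1
            limits.spent_usd += cost_usd
        else:
            # Assumed fallback: the base model continues without advice.
            advice = "Advisor unavailable (call or budget limit reached); continue without advice."
        history.append({"role": "assistant", "tool_calls": [call]})
        history.append({"role": "tool", "tool_call_id": call["id"], "content": advice})
    raise RuntimeError("base model did not produce a final response")
```

After the call returns, `limits.calls` and `limits.spent_usd` hold the values that would feed the X-LiteLLM-Advisor-Calls header and the advisor cost in the usage block.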

Why

Today users pick one model per request:

Pick the smart model → pay Opus/GPT-5-pro prices on every token, even for trivial turns.
Pick the cheap model → save money and latency, but hit a quality cliff on hard sub-steps.

Most real workloads are bimodal: ~90% of turns are easy, ~10% need a smarter brain. The advisor pattern solves this within a single request — this proposal just makes it one model string away for every LiteLLM user.

Open questions

Prefix naming — enhanced_model/<name> vs. something shorter like smart/<name> or +<name>?
Should advisor_model accept a reference to another model_name in model_list (so it picks up that entry's auth/rate limits), a raw provider string, or both?
Should the advisor tool schema be fixed, or customizable per entry (custom advisor prompt/schema)?
Logging/tracing: advisor calls as sub-spans of the parent request in Langfuse/OTEL?
Caching identical advisor questions within a session?
Fallback when the advisor fails or hits the budget cap — base model continues without advice, or return an error?
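
For the second question, a sketch of the "both" option: try advisor_model as a reference to a model_name in model_list first, and fall back to treating it as a raw provider string. The function name `resolve_advisor` and the returned labels are hypothetical, and the "contains a slash" test for provider strings is an assumption made only for this illustration.

```python
def resolve_advisor(advisor_model, model_list):
    """Return (kind, litellm_params) for an advisor_model value."""
    # Prefer a managed entry so its auth and rate limits apply.
    for entry in model_list:
        if entry["model_name"] == advisor_model:
            return "model_list_ref", entry["litellm_params"]
    # Otherwise accept a raw "<provider>/<model>" string.
    if "/" in advisor_model:
        return "raw_provider_string", {"model": advisor_model}
    raise ValueError(f"unknown advisor_model: {advisor_model!r}")


managed_models = [
    {
        "model_name": "claude-opus-4-6",
        "litellm_params": {"model": "anthropic/claude-opus-4-6"},
    },
]

print(resolve_advisor("claude-opus-4-6", managed_models))
print(resolve_advisor("openai/gpt-5-pro", managed_models))
```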

Prior art

Anthropic advisor tool: https://docs.litellm.ai/docs/completion/anthropic_advisor_tool
Auto-router discussion (Auto-router - Content-Aware Preference-Aligned Routing #25703) — picks one model per request; this picks within a request.
Speculative decoding — same intuition at the token layer.

krrish-berri-2 · 2026-04-15T15:20:22Z

krrish-berri-2
Apr 15, 2026
Maintainer Author

cc: @DmitriyAlergant , @marty-sullivan

0 replies

DmitriyAlergant · 2026-04-15T15:46:51Z

DmitriyAlergant
Apr 15, 2026

I'm all for "server-side-tools". Both on LLM provider side (e.g. web search) and on LiteLLM/router side. Advisor tool is a fine example of a server-side tool.

How I would expect this to work:

advisor_model can be specified by admin (litellm config) or by client request
accepts a reference to another managed model (model_list), not raw string!
schema/prompts: hardcoded defaults + system-level env var overrides + optional model-level overrides
no special magical prefixes, no "enhanced_model/" etc. In the example above, any call to "claude-sonnet-4-6" will have advisor tool available. If admins need to expose both variants (with or without advisor), they can explicitly create these entries separately. KISS, unix-way.
model access permissions when advisor is added via litellm config: bypasses user/key access checks. If the client can access base model, advisor is assumed to be permissible since it was introduced by the admin
model access permissions when advisor is added by the user: apply user/key access checks, client needs to have access to base model and to the advisor model. Also, no special permissions required to do that from the client side, since this is not a privilege escalation: a client having access to both models could always implement a tool on the client side. Short-circuiting on LiteLLM side is a convenience.
caching: not sure. I am not a believer in LiteLLM-level caching. I don't see value in it. Prompt caching is done on the providers side. I guess normal LiteLLM caching can still work if enabled.
fallback when the advisor fails or hits the budget cap: default "base model continues without advice", but can be overridden system-wide or per-model
logging/tracing: not sure about langfuse/OTEL but please make sure token counts in LiteLLM_SpendLogs json are accurately reported and broken down by models
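
The two permission rules in the list above can be sketched as a single check. The function name `advisor_allowed`, the `advisor_source` values, and the representation of a key's permissions as a set of model names are all assumptions made for illustration.

```python
def advisor_allowed(key_models, base_model, advisor_model, advisor_source):
    """Decide whether a key may use the advisor for this request.

    key_models: set of model names the user/key can access
    advisor_source: "config" (set by the admin) or "request" (set by the client)
    """
    if base_model not in key_models:
        return False  # access to the base model is always required
    if advisor_source == "config":
        return True  # admin introduced the advisor, so no extra check
    if advisor_source == "request":
        return advisor_model in key_models  # client needs access to both models
    raise ValueError(f"unknown advisor_source: {advisor_source!r}")


key_models = {"claude-sonnet-4-6"}

# Admin-configured advisor: allowed even without direct access to the advisor model.
print(advisor_allowed(key_models, "claude-sonnet-4-6", "claude-opus-4-6", "config"))
# Client-specified advisor: rejected, since the key cannot access the advisor model.
print(advisor_allowed(key_models, "claude-sonnet-4-6", "claude-opus-4-6", "request"))
```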

4 replies

krrish-berri-2 Apr 15, 2026
Maintainer Author

If admins need to expose both variants (with or without advisor), they can explicitly create these entries separately. KISS, unix-way.

this seems like more, unnecessary work. If I'm adding an 'advisor_model' to a model list entry, am I not doing that already?

by exposing it via a model name, it would work inside ai tools that don't allow client selection of tools (e.g. openwebui)

DmitriyAlergant Apr 15, 2026

This means precisely following admin's intent.

  - model_name: claude-sonnet-4-6
    litellm_params:
      model: anthropic/claude-sonnet-4-6
      advisor_model: anthropic/claude-opus-4-6

Should mean precisely that: we introduced one model variant ("claude-sonnet-4-6") that always has advisor model enabled with it. The actual LLM may decide to invoke it or not. Not introducing two variants with opinionated prefixes.

There are MANY other things that an admin can configure and influence with litellm_params, you are not auto-generating different model entries based on those. What makes advisor_model special? It should not be special.

krrish-berri-2 Apr 15, 2026
Maintainer Author

help me understand, why don't you want to expose it as an option for the user?

DmitriyAlergant Apr 15, 2026

This introduces unneeded complexity for users whose clients expose the /v1/models list for selection. Users are expected to understand what "enhanced_model/" is and how it differs from the other Sonnet etc. For unsophisticated users this is often undesirable. It also takes away the admin's flexibility to expose only ONE option to less sophisticated users.

On the other hand, for sophisticated clients who build using API, you don't need a pre-built configuration and you can advise them to include the advisor_model parameters with the request body.

Implicit creation of opinionated model variants is inconsistent with how the rest of LiteLLM works.

E.g. today, I can use litellm_params to forcefully enable provider-side Web Search and Web Fetch tools for claude-sonnet-4-6. It works, and it does not produce additional model_list entries - I still only have one claude-sonnet-4-6 available but now it has web search by default.

DmitriyAlergant · 2026-04-15T15:56:57Z

DmitriyAlergant
Apr 15, 2026

From the config and request body viewpoint, I would perhaps wrap the settings as a nested agent? E.g.

model_list:
  - model_name: claude-sonnet-4-6
    litellm_params:
      model: anthropic/claude-sonnet-4-6
      litellm_tools:
        - type: advisor_tool
          model: claude-opus-4-6
          ...

And a similar nested object in request body.

Something like that (?) but that's up to you

1 reply

krrish-berri-2 Apr 15, 2026
Maintainer Author

interesting approach
