Advisor-Tool - Enhance fast models with smart ones #25790
krrish-berri-2
started this conversation in
Ideas
Replies: 4 comments 5 replies
I'm all for "server-side tools", both on the LLM provider side (e.g. web search) and on the LiteLLM/router side. The advisor tool is a fine example of a server-side tool. How I would expect this to work:
From the config and request body viewpoint, I would perhaps wrap the settings as a nested agent? E.g.:

    model_list:
      - model_name: claude-sonnet-4-6
        litellm_params:
          model: anthropic/claude-sonnet-4-6
          litellm_tools:
            - type: advisor_tool
            - model: claude-opus-4-6
            - ...

And a similar nested object in the request body. Something like that (?) but that's up to you.
We're discussing a possible improvement to model routing on LiteLLM. The goal is to let cheaper, faster models tackle most turns while still delivering frontier-quality reasoning when the task genuinely needs it — without forcing users/apps to pick one or the other upfront.
The idea: if an admin sets `advisor_model` under a model's `litellm_params`, LiteLLM automatically exposes an enhanced variant of that model alongside the base one, with no second entry in `model_list` needed. The enhanced variant runs the base model as the primary responder and gives it access to an advisor tool backed by the configured smarter model. The base model calls the advisor only when it gets stuck on a hard sub-step, and uses the advice to continue.

This productizes the Anthropic advisor tool pattern, so any downstream app (OpenWebUI, Cursor, Continue, custom apps talking OpenAI-compatible endpoints) can get "smart-when-it-matters" behavior with zero code changes.
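To make the "no second entry" idea concrete, here is a rough sketch of how the proxy could derive the exposed model names from `model_list`. The function name `exposed_model_names`, the `enhanced_model/` prefix (the naming is still an open question in this post), and the model names in the sample config are all assumptions for illustration, not existing LiteLLM behavior.

```python
from typing import Any

# Hypothetical prefix; the final naming is an open question in this proposal.
ENHANCED_PREFIX = "enhanced_model/"


def exposed_model_names(model_list: list[dict[str, Any]]) -> list[str]:
    """Return every base model name, plus an enhanced variant for each
    entry whose litellm_params sets advisor_model."""
    names: list[str] = []
    for entry in model_list:
        name = entry["model_name"]
        names.append(name)
        params = entry.get("litellm_params") or {}
        if params.get("advisor_model"):
            names.append(f"{ENHANCED_PREFIX}{name}")
    return names


# Placeholder config, mirroring the shape of a config.yaml model_list.
model_list = [
    {
        "model_name": "fast-model",
        "litellm_params": {
            "model": "provider/fast-model",
            "advisor_model": "provider/smart-model",
        },
    },
    {
        "model_name": "plain-model",
        "litellm_params": {"model": "provider/plain-model"},
    },
]

print(exposed_model_names(model_list))
# ['fast-model', 'enhanced_model/fast-model', 'plain-model']
```

An entry without `advisor_model` contributes only its base name, which matches the intent that the enhanced variant is opt-in per model.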
1. Example config.yaml
With the config above, the proxy exposes:
Clients pick whichever they want at call time. No duplicate `model_list` entries, no separate aliasing block to maintain. If `advisor_model` isn't set, the enhanced variant simply isn't exposed.
2. User Request
Response headers expose what happened:
And the `usage` block reports both spends:

    {
      "usage": {
        "prompt_tokens": 1820,
        "completion_tokens": 940,
        "advisor_prompt_tokens": 3100,
        "advisor_completion_tokens": 420,
        "advisor_cost_usd": 0.031
      }
    }

3. Preview / Trace Endpoint
How it might work
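Going by the description earlier in this post (the base model is the primary responder and calls the advisor only when it gets stuck), the loop could look roughly like this. It is a minimal sketch with stubbed models: `run_enhanced`, `Step`, `Trace`, and the `max_advisor_calls` cap are hypothetical names and choices for illustration, not LiteLLM or Anthropic APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    """One turn from the base model: a final answer or an advisor request."""
    answer: Optional[str] = None
    advisor_question: Optional[str] = None


@dataclass
class Trace:
    """What happened during the request (could feed headers or usage)."""
    advisor_calls: int = 0
    advice: list[str] = field(default_factory=list)


def run_enhanced(
    prompt: str,
    base_model: Callable[[str, list[str]], Step],
    advisor_model: Callable[[str], str],
    max_advisor_calls: int = 2,  # assumed safety cap, not part of the proposal
) -> tuple[str, Trace]:
    """Run the base model, consulting the advisor only when it asks."""
    trace = Trace()
    while True:
        step = base_model(prompt, trace.advice)
        if step.answer is not None:
            return step.answer, trace
        if trace.advisor_calls >= max_advisor_calls:
            raise RuntimeError("advisor budget exhausted without a final answer")
        trace.advice.append(advisor_model(step.advisor_question or prompt))
        trace.advisor_calls += 1


# Stub models standing in for real LLM calls.
def stub_base(prompt: str, advice: list[str]) -> Step:
    if "hard" in prompt and not advice:
        return Step(advisor_question=f"How should I approach: {prompt}")
    suffix = f" (using advice: {advice[-1]})" if advice else ""
    return Step(answer=f"answer to '{prompt}'{suffix}")


def stub_advisor(question: str) -> str:
    return "break it into smaller steps"


easy_answer, easy_trace = run_enhanced("easy question", stub_base, stub_advisor)
hard_answer, hard_trace = run_enhanced("hard question", stub_base, stub_advisor)
print(easy_trace.advisor_calls, hard_trace.advisor_calls)  # 0 1
```

The easy turn never touches the advisor, so it costs only base-model tokens; the hard turn pays for one advisor call and then continues on the base model.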
Why
Today users pick one model per request:
Most real workloads are bimodal: ~90% of turns are easy, ~10% need a smarter brain. The advisor pattern solves this within a single request — this proposal just makes it one model string away for every LiteLLM user.
Open questions
- `enhanced_model/<name>` vs. something shorter like `smart/<name>` or `+<name>`?
- Should `advisor_model` accept a reference to another `model_name` in `model_list` (so it picks up that entry's auth/rate limits), a raw provider string, or both?

Prior art