
Litellm krrish staging 04 20 2026 #26138

Merged
krrish-berri-2 merged 7 commits into litellm_internal_staging from litellm_krrish_staging_4_20_2026
Apr 20, 2026

Conversation

@krrish-berri-2
Contributor

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement; see details)
  • My PR passes all unit tests via make test-unit
  • My PR's scope is as isolated as possible; it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Screenshots / Proof of Fix

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

yuneng-berri and others added 3 commits April 18, 2026 19:33
…#25987)

* feat(router): add auto_router/quality_router for quality-tier routing

Adds a new auto-router type that routes a request to a model at a target
quality tier. The quality tier is inferred by re-using the existing
ComplexityRouter's classification, then mapped through an admin-configured
complexity_to_quality table. Each candidate model declares its own
quality_tier in model_info.litellm_routing_preferences.

Resolution strategy: exact tier match, else round up to the next higher
tier, else fall back to default_model.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
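The resolution strategy described above can be sketched roughly as follows. This is an illustrative outline only; the function and variable names are hypothetical, not LiteLLM's actual API.

```python
from typing import Dict, List, Optional

def resolve_model(
    target_tier: int,
    tier_to_models: Dict[int, List[str]],
    default_model: Optional[str],
) -> Optional[str]:
    """Exact tier match, else round up to the next higher tier, else default."""
    # Exact tier match wins outright.
    if tier_to_models.get(target_tier):
        return tier_to_models[target_tier][0]
    # Round up: pick the closest populated tier strictly above the target.
    higher_tiers = sorted(
        t for t, models in tier_to_models.items() if t > target_tier and models
    )
    if higher_tiers:
        return tier_to_models[higher_tiers[0]][0]
    # Otherwise fall back to the admin-configured default.
    return default_model
```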

* feat(quality_router): add capability-based filtering

Each deployment can declare a `capabilities: List[str]` field in
`model_info.litellm_routing_preferences` (e.g. ["vision",
"function_calling"]). Requests can pass `litellm_capabilities` in
`request_kwargs` to require specific capabilities — the router will only
route to deployments whose declared capabilities are a superset.

Resolution still walks tier (exact → round up), but at each tier filters
by capability before picking. Falls back to default_model only when it
also satisfies the required capabilities; otherwise raises rather than
silently routing to a model that lacks a required capability.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
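The superset check described here reduces to set containment; a minimal sketch (the helper name is illustrative):

```python
from typing import List

def supports_capabilities(declared: List[str], required: List[str]) -> bool:
    """True when every required capability appears in the deployment's declared list."""
    return set(required) <= set(declared)
```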

* feat(quality_router): expose routing decision in response headers

For transparency, expose the QualityRouter's routing decision in the
proxy response headers:

  x-litellm-quality-router-model       → picked model_name (e.g. "haiku-vision")
  x-litellm-quality-router-tier        → resolved quality tier (e.g. "1")
  x-litellm-quality-router-complexity  → ComplexityTier name (e.g. "SIMPLE")

Mechanism: the pre-routing hook stashes the decision in
request_kwargs["metadata"]["quality_router_decision"]. After the call
returns, Router.set_response_headers lifts the decision into
response._hidden_params["additional_headers"] alongside the existing
x-litellm-model-group / x-litellm-model-id headers. Existing metadata
keys (trace_id, user_id, etc.) are preserved.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
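A rough sketch of that lift, assuming the dict shapes implied by the description above; the helper name and decision keys are hypothetical, not the actual Router internals.

```python
def lift_quality_router_headers(request_kwargs: dict, hidden_params: dict) -> None:
    """Copy the stashed routing decision into response headers, preserving existing keys."""
    decision = request_kwargs.get("metadata", {}).get("quality_router_decision")
    if not decision:
        return
    # setdefault keeps any headers already present (e.g. x-litellm-model-group).
    headers = hidden_params.setdefault("additional_headers", {})
    headers["x-litellm-quality-router-model"] = decision["model_name"]
    headers["x-litellm-quality-router-tier"] = str(decision["quality_tier"])
    headers["x-litellm-quality-router-complexity"] = decision["complexity"]
```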

* feat(quality_router): replace capabilities with keyword override

Drops the capability-based filtering in favor of a keyword-based override
for v0:

- RoutingPreferences.keywords: List[str] (replaces capabilities) — each
  deployment can declare substring keywords.
- If any declared keyword (case-insensitive) appears in the user message,
  the router short-circuits the complexity-classification flow and routes
  to the matching deployment.
- Tiebreaker for overlapping keyword matches: quality_tier DESC, then
  cheapest model_info.input_cost_per_token ASC. Unpriced models lose ties
  to priced ones.

Decision metadata + headers now expose the override:
  x-litellm-quality-router-via       → "keyword" | "quality_tier"
  x-litellm-quality-router-keyword   → matched keyword (only on keyword route)
  x-litellm-quality-router-complexity → complexity tier (only on tier route)

Removes:
- request_kwargs["litellm_capabilities"] reading
- _model_capabilities, _model_supports_capabilities,
  _first_capable_model_at_tier, capability filter in
  _resolve_model_for_quality_tier

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
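The case-insensitive substring match can be sketched as below (names are illustrative, not the router's actual helpers):

```python
from typing import List, Optional

def match_keyword(user_message: str, keywords: List[str]) -> Optional[str]:
    """Return the first declared keyword appearing (case-insensitively) in the message."""
    text = user_message.lower()
    for keyword in keywords:
        if keyword.lower() in text:
            return keyword
    return None
```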

* feat(quality_router): add explicit `order` to RoutingPreferences

Adds an explicit priority field to RoutingPreferences for resolving
collisions deterministically:

  RoutingPreferences.order: Optional[int]   # lower wins; unset = +inf

Used as the PRIMARY tiebreaker in two places:

1. Keyword overlap: when multiple deployments declare the same matching
   keyword, sort by (order ASC, quality_tier DESC, input_cost_per_token
   ASC, model_name ASC). Explicit always beats implicit.

2. Tier resolution: when multiple deployments share a quality tier,
   `_resolve_model_for_quality_tier` picks the one with the lowest
   order. The tier list is now sorted at index-build time.

This lets admins make routing decisions explicit when the natural
quality-and-price ordering would pick the wrong model.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>

* feat(quality_router): reorder tiebreak to (quality, order, price)

Changes the tiebreak ordering so quality_tier always wins first, then
explicit `order` is used to break ties within the same tier, then price
breaks the rest:

  1. quality_tier DESC      ← best model wins first
  2. order ASC              ← explicit priority within a tier
  3. input_cost_per_token ASC
  4. model_name ASC

Previously `order` was the primary key — that meant a tier-2 model with
`order=1` would beat a tier-3 model with no `order`, which is the wrong
default. Now `order` only resolves collisions among same-tier candidates.

Tier resolution (within a single tier) keeps the same key minus quality:
(order ASC, cost ASC, name).

Test renames + flips:
  - test_explicit_order_overrides_quality_tier → test_quality_wins_over_explicit_order
  - new: test_order_breaks_tie_within_same_quality_tier

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
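The four-part ordering above maps naturally onto a Python sort key, where unset `order` and unpriced models compare as +inf and therefore lose ties. A hedged sketch under the assumption that deployments are plain dicts (the real code uses typed objects):

```python
import math
from typing import Any, Dict, List

def tiebreak_key(deployment: Dict[str, Any]):
    """Sort key for the ordering above; min() over candidates picks the winner."""
    return (
        -deployment.get("quality_tier", 0),                 # 1. quality_tier DESC
        deployment.get("order", math.inf),                  # 2. order ASC (unset = +inf)
        deployment.get("input_cost_per_token", math.inf),   # 3. price ASC (unpriced = +inf)
        deployment["model_name"],                           # 4. model_name ASC
    )

def pick_winner(candidates: List[Dict[str, Any]]) -> str:
    return min(candidates, key=tiebreak_key)["model_name"]
```

With this key, a tier-3 model with no `order` beats a tier-2 model with `order=1`, matching the flipped test above.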

* fix(quality_router): resolve Greptile review feedback

Addresses four P1 findings from PR review plus test coverage:

1. set_model_list missing quality_routers reset
   - Hot-reloading the Router would leave stale QualityRouter instances
     pointing at the old model_list. `set_model_list` now clears
     `self.quality_routers` alongside the other indices.

2. Round-down fallback before default_model
   - `_resolve_model_for_quality_tier` now rounds DOWN to the closest
     lower tier after round-up fails, before falling back to
     `default_model`. Degrades gracefully rather than jumping straight
     off-tier.

3. RoutingPreferences validation bypass
   - `_build_tier_index` now instantiates `RoutingPreferences(**prefs)`
     so invalid shapes (e.g. non-int quality_tier) raise a clear
     ValueError instead of silently succeeding.

4. Config-ordering dependency
   - `_tier_to_models` is now built lazily on first access. Previously,
     eager construction in `__init__` meant a QualityRouter deployment
     had to appear AFTER all its referenced models in config.yaml,
     because `Router._create_deployment` populates `model_list`
     incrementally. Any `available_models` defined after the router
     entry would silently be reported as missing.

Also adds 6 new tests covering each fix:
- test_invalid_quality_tier_type_raises_clear_error
- test_router_can_be_instantiated_before_its_targets_exist
- test_set_model_list_clears_quality_routers_registry
- test_rounds_down_when_no_higher_tier_exists
- test_rounds_down_prefers_closest_lower_tier
- test_prefers_round_up_over_round_down

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
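With fix 2 applied, the full resolution order becomes exact match, then round up, then round down, then default. A sketch under the same illustrative names as before (not LiteLLM's actual signatures):

```python
from typing import Dict, List, Optional

def resolve_tier(
    target: int,
    tier_to_models: Dict[int, List[str]],
    default_model: Optional[str],
) -> Optional[str]:
    """Exact match, then closest higher tier, then closest lower tier, then default."""
    populated = sorted(t for t, models in tier_to_models.items() if models)
    if target in populated:
        return tier_to_models[target][0]
    higher = [t for t in populated if t > target]
    if higher:
        return tier_to_models[higher[0]][0]    # round up: closest higher tier
    lower = [t for t in populated if t < target]
    if lower:
        return tier_to_models[lower[-1]][0]    # round down: closest lower tier
    return default_model
```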

* style: apply black 24.10.0 formatting to pre-existing offenders

Unblocks the LiteLLM Linting check for this PR — these 12 files are already
failing `black --check` on main (the lint workflow only runs on PRs, so main
drifts). No behavior changes; formatting-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update litellm/router.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4 (1M context) <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* feat(proxy): add --reload flag for uvicorn hot reload (dev only)

Opt-in CLI flag, off by default, no env var. Only affects the uvicorn
run path; gunicorn/hypercorn paths and prod (which doesn't pass the
flag) are unaffected.

* Feature/add audio support for scaleway (#26110)

* feat(scaleway): add SCALEWAY to LlmProviders enum

* feat(scaleway): add audio transcription config and dispatch wiring

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(scaleway): add behavior tests for audio transcription config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore(scaleway): advertise audio_transcriptions in endpoint-support JSON

* docs(scaleway): document audio transcription support

* fix(scaleway): address PR review — plain-text response_format + missing-key fail-fast

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(scaleway): cover new response paths, drop gettysburg.wav coupling

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Prompt Compression - add it to the proxy (#25729)

* refactor: new agentic loop event hook

simplifies how to create logic for tool based multi llm calls

* fix: compress - make it work on anthropic input as well

* fix(compress.py): working prompt compression for claude code

ensures claude code messages can run through proxy easily

* docs: add agentic loop hook guide

* docs: add agentic_loop_hook to sidebar

* fix: fix multiple arguments error

* fix: fix tool call loop for compression on streaming /v1/messages

* fix: fix linting errors

* fix: fix ci/cd errors

* feat(litellm_pre_call_utils.py): use claude code session for litellm session id

allows claude code logs to be stitched together, making it easy to know they were all part of the same conversation

* fix: suppress incorrect mypy warning rE: module

* revert: drop PR's changes to litellm/proxy/_experimental/out/

Restores the 34 HTML files under _experimental/out/ to their pre-PR
paths (X/index.html -> X.html). All renames are R100 (content
unchanged); no other files are touched.

* fix: address greptile review comments on PR #25729

- Skip ``kwargs["tools"] = []`` injection when compression is a no-op —
  Anthropic Messages rejects empty tool arrays on requests that did not
  originally declare tools.
- Move agentic-loop safety guards (fingerprint cycle / max depth) out of
  the per-callback try/except so they propagate instead of being swallowed
  by the generic exception handler. Extracted _check_agentic_loop_safety.
- Gate generic ``x-<vendor>-session-id`` capture behind the
  LITELLM_CAPTURE_VENDOR_SESSION_HEADERS env var (off by default) to
  preserve backwards compatibility; explicit x-litellm-* headers are
  unaffected.
- Fix monkeypatch target in pre-call-hook test to patch the actual
  module-level binding
  (litellm.integrations.compression_interception.handler.compress).
- Add regression tests for empty-tools skip and opt-in session capture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
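The empty-tools skip in the first bullet amounts to a guard like the one below. This is a sketch only; the function and variable names are hypothetical, not the handler's actual code.

```python
def apply_compressed_tools(kwargs: dict, compressed_tools: list) -> None:
    """Only inject tools when compression actually produced some.

    Anthropic's /v1/messages rejects an empty tools array on requests
    that did not originally declare tools, so a no-op compression must
    leave kwargs untouched.
    """
    if compressed_tools:
        kwargs["tools"] = compressed_tools
```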

* revert: drop LITELLM_CAPTURE_VENDOR_SESSION_HEADERS flag

Generic x-<vendor>-session-id header capture is a new feature and only
runs *after* the explicit x-litellm-trace-id / x-litellm-session-id
checks, so it does not change behavior for any existing caller that was
already using the LiteLLM headers — no backwards-incompatibility to gate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(compress): replace input_type with CallTypes call_type

Drop the bespoke ``CompressionInputType`` literal and use the existing
``litellm.types.utils.CallTypes`` enum instead.  ``litellm.compress()``
now takes ``call_type: Union[CallTypes, str]`` (default
``CallTypes.completion``) — no new concept to learn, and the enum is
already the way the rest of the codebase talks about request shapes.

Supported values: ``completion`` / ``acompletion`` (OpenAI chat-completions
shape) and ``anthropic_messages`` (Anthropic structured content blocks).

Updated: compress(), the compression_interception handler, tests, docs,
and the two eval scripts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Support /v1/responses in complexity router

Adds cross-format support to the complexity router via the guardrail
translation handler dispatch. Adds get_structured_messages to base
translation plus OpenAI chat, Responses, and Anthropic handlers.
Auto-router helper _extract_text_from_messages handles tool-call and
multimodal messages. Widens async_pre_routing_hook messages type to
Dict[str, Any].

Fixes #25134

* chore: apply black formatting

* fix: fallback to trying each handler when route inference fails

---------

Co-authored-by: Ryan Crabbe <ryan@berri.ai>
Co-authored-by: nhyy244 <106547304+nhyy244@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ yuneng-berri
❌ krrish-berri-2
You have signed the CLA already but the status is still pending? Let us recheck it.

@gitguardian

gitguardian bot commented Apr 20, 2026

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
  • GitGuardian id: 29203065
  • Status: Triggered
  • Secret: JSON Web Token
  • Commit: 26fcbc9
  • Filename: tests/test_litellm/proxy/test_litellm_pre_call_utils.py
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.


@greptile-apps
Contributor

greptile-apps bot commented Apr 20, 2026

Greptile Summary

This PR introduces three new request-routing strategies — AutoRouter (semantic/embedding-based), ComplexityRouter (rule-based, <1ms), and QualityRouter (complexity tier → quality tier mapping with keyword override) — plus fixes the auto_routers hot-reload regression and lifts QualityRouter decisions into response headers.

Confidence Score: 5/5

Safe to merge; all open findings are P2 style/config issues that do not block the primary routing path.

The hot-reload regression (auto_routers not cleared) from the prior review is fixed, tests are comprehensive and mock-only, and the two remaining findings are a guard-ordering inefficiency and a broken example config — neither affects correctness of the routing logic itself.

Remaining P2 findings: litellm/proxy/new_secret_config.yaml (broken default model reference) and the litellm/router.py init_*_deployment methods (guard after construction).

Important Files Changed

Filename Overview
litellm/router_strategy/auto_router/auto_router.py New AutoRouter using semantic_router to classify messages and route to configured deployments; logic and fallback paths look correct.
litellm/router_strategy/quality_router/quality_router.py New QualityRouter with lazy tier index, keyword override, and complexity-based routing; tiebreak logic is well tested, but the side-index dicts are never cleared before index rebuild.
litellm/router.py Hot-reload fix (auto_routers now cleared in set_model_list), QualityRouter headers lifted into set_response_headers, but all three init_*_deployment methods check for duplicates after constructing the router object.
litellm/proxy/_new_secret_config.yaml Example config references non-existent model "small-model" as the complexity router's default_model, which would cause routing failures for MEDIUM/REASONING tier requests.
tests/test_litellm/router_strategy/test_quality_router.py Comprehensive mock-only tests covering tier index, resolution, keyword override, tiebreaking, and decision metadata; no real network calls detected.
tests/test_litellm/router_strategy/test_auto_router.py Tests for AutoRouter._extract_text_from_messages are solid; routing hook tests are marked skip-beta and have the known RouteChoice isinstance issue (surfaced in a prior review).
litellm/router_strategy/quality_router/config.py Clean Pydantic config models for QualityRouter with sensible defaults and extra='allow' for forward-compatibility.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming Request] --> B{async_pre_routing_hook}
    B --> C{model in auto_routers?}
    C -- Yes --> D[AutoRouter\nSemanticRouter embedding lookup]
    D --> E[RouteChoice.name or default_model]
    C -- No --> F{model in complexity_routers?}
    F -- Yes --> G[ComplexityRouter\nRule-based scoring]
    G --> H[classify → SIMPLE/MEDIUM/COMPLEX/REASONING]
    H --> I[get_model_for_tier]
    F -- No --> J{model in quality_routers?}
    J -- Yes --> K{keyword match?}
    K -- Yes --> L[Keyword override\nhighest quality_tier wins]
    K -- No --> M[ComplexityRouter.classify\n→ complexity_to_quality mapping]
    M --> N[_resolve_model_for_quality_tier\nexact → round-up → round-down → default]
    J -- No --> O[Normal routing]
    E --> P[PreRoutingHookResponse]
    I --> P
    L --> P
    N --> P
    P --> Q[Router selects deployment\nset_response_headers adds x-litellm-quality-router-* headers]

Reviews (3): Last reviewed commit: "style: apply black formatting to websear..."

Comment thread litellm/router.py
Comment on lines +7030 to +7031
self.quality_routers = {}
self.complexity_routers = {}

P1 auto_routers not reset on hot-reload

set_model_list resets quality_routers and complexity_routers to prevent stale state after hot-reload, but auto_routers is never cleared. Any set_model_list call that re-registers an existing auto_router/ deployment will hit the guard in init_auto_router_deployment and raise a ValueError, breaking hot-reload. Fix: add self.auto_routers = {} alongside the other resets.

Comment on lines +93 to +97
def _tier_to_models(self) -> Dict[int, List[str]]:
    """Lazy tier→models index; built on first access."""
    if self._tier_to_models_cache is None:
        self._tier_to_models_cache = self._build_tier_index()
    return self._tier_to_models_cache

P2 Tier index not invalidated on dynamic model additions

_tier_to_models_cache is built lazily on first access and never cleared. Models added via add_deployment after the cache is populated are silently excluded from quality routing until set_model_list is called.

@krrish-berri-2 krrish-berri-2 temporarily deployed to integration-postgres April 20, 2026 23:09 — with GitHub Actions Inactive
@krrish-berri-2 krrish-berri-2 merged commit e7bc316 into litellm_internal_staging Apr 20, 2026
87 of 95 checks passed
@krrish-berri-2 krrish-berri-2 deleted the litellm_krrish_staging_4_20_2026 branch April 20, 2026 23:22
