
Litellm krrish staging 04 20 2026 #26138

Merged
krrish-berri-2 merged 7 commits into litellm_internal_staging from litellm_krrish_staging_4_20_2026
Apr 20, 2026

Conversation

@krrish-berri-2
Contributor

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement; see details)
  • My PR passes all unit tests via make test-unit
  • My PR's scope is as isolated as possible; it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Screenshots / Proof of Fix

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

yuneng-berri and others added 3 commits April 18, 2026 19:33
…#25987)

* feat(router): add auto_router/quality_router for quality-tier routing

Adds a new auto-router type that routes a request to a model at a target
quality tier. The quality tier is inferred by re-using the existing
ComplexityRouter's classification, then mapped through an admin-configured
complexity_to_quality table. Each candidate model declares its own
quality_tier in model_info.litellm_routing_preferences.

Resolution strategy: exact tier match, else round up to the next higher
tier, else fall back to default_model.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
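The resolution strategy described above can be sketched roughly as follows. This is an illustrative outline only; the function and variable names are hypothetical, not LiteLLM's actual API.

```python
from typing import Dict, List, Optional

def resolve_model(
    target_tier: int,
    tier_to_models: Dict[int, List[str]],
    default_model: Optional[str],
) -> Optional[str]:
    """Exact tier match, else round up to the next higher tier, else default."""
    # Exact tier match wins outright.
    if tier_to_models.get(target_tier):
        return tier_to_models[target_tier][0]
    # Round up: pick the closest populated tier strictly above the target.
    higher_tiers = sorted(
        t for t, models in tier_to_models.items() if t > target_tier and models
    )
    if higher_tiers:
        return tier_to_models[higher_tiers[0]][0]
    # Otherwise fall back to the admin-configured default.
    return default_model
```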

* feat(quality_router): add capability-based filtering

Each deployment can declare a `capabilities: List[str]` field in
`model_info.litellm_routing_preferences` (e.g. ["vision",
"function_calling"]). Requests can pass `litellm_capabilities` in
`request_kwargs` to require specific capabilities — the router will only
route to deployments whose declared capabilities are a superset.

Resolution still walks tier (exact → round up), but at each tier filters
by capability before picking. Falls back to default_model only when it
also satisfies the required capabilities; otherwise raises rather than
silently routing to a model that lacks a required capability.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
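The superset check described here reduces to set containment; a minimal sketch (the helper name is illustrative):

```python
from typing import List

def supports_capabilities(declared: List[str], required: List[str]) -> bool:
    """True when every required capability appears in the deployment's declared list."""
    return set(required) <= set(declared)
```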

* feat(quality_router): expose routing decision in response headers

For transparency, expose the QualityRouter's routing decision in the
proxy response headers:

  x-litellm-quality-router-model       → picked model_name (e.g. "haiku-vision")
  x-litellm-quality-router-tier        → resolved quality tier (e.g. "1")
  x-litellm-quality-router-complexity  → ComplexityTier name (e.g. "SIMPLE")

Mechanism: the pre-routing hook stashes the decision in
request_kwargs["metadata"]["quality_router_decision"]. After the call
returns, Router.set_response_headers lifts the decision into
response._hidden_params["additional_headers"] alongside the existing
x-litellm-model-group / x-litellm-model-id headers. Existing metadata
keys (trace_id, user_id, etc.) are preserved.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
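A rough sketch of that lift, assuming the dict shapes implied by the description above; the helper name and decision keys are hypothetical, not the actual Router internals.

```python
def lift_quality_router_headers(request_kwargs: dict, hidden_params: dict) -> None:
    """Copy the stashed routing decision into response headers, preserving existing keys."""
    decision = request_kwargs.get("metadata", {}).get("quality_router_decision")
    if not decision:
        return
    # setdefault keeps any headers already present (e.g. x-litellm-model-group).
    headers = hidden_params.setdefault("additional_headers", {})
    headers["x-litellm-quality-router-model"] = decision["model_name"]
    headers["x-litellm-quality-router-tier"] = str(decision["quality_tier"])
    headers["x-litellm-quality-router-complexity"] = decision["complexity"]
```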

* feat(quality_router): replace capabilities with keyword override

Drops the capability-based filtering in favor of a keyword-based override
for v0:

- RoutingPreferences.keywords: List[str] (replaces capabilities) — each
  deployment can declare substring keywords.
- If any declared keyword (case-insensitive) appears in the user message,
  the router short-circuits the complexity-classification flow and routes
  to the matching deployment.
- Tiebreaker for overlapping keyword matches: quality_tier DESC, then
  cheapest model_info.input_cost_per_token ASC. Unpriced models lose ties
  to priced ones.

Decision metadata + headers now expose the override:
  x-litellm-quality-router-via       → "keyword" | "quality_tier"
  x-litellm-quality-router-keyword   → matched keyword (only on keyword route)
  x-litellm-quality-router-complexity → complexity tier (only on tier route)

Removes:
- request_kwargs["litellm_capabilities"] reading
- _model_capabilities, _model_supports_capabilities,
  _first_capable_model_at_tier, capability filter in
  _resolve_model_for_quality_tier

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
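The case-insensitive substring match can be sketched as below (names are illustrative, not the router's actual helpers):

```python
from typing import List, Optional

def match_keyword(user_message: str, keywords: List[str]) -> Optional[str]:
    """Return the first declared keyword appearing (case-insensitively) in the message."""
    text = user_message.lower()
    for keyword in keywords:
        if keyword.lower() in text:
            return keyword
    return None
```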

* feat(quality_router): add explicit `order` to RoutingPreferences

Adds an explicit priority field to RoutingPreferences for resolving
collisions deterministically:

  RoutingPreferences.order: Optional[int]   # lower wins; unset = +inf

Used as the PRIMARY tiebreaker in two places:

1. Keyword overlap: when multiple deployments declare the same matching
   keyword, sort by (order ASC, quality_tier DESC, input_cost_per_token
   ASC, model_name ASC). Explicit always beats implicit.

2. Tier resolution: when multiple deployments share a quality tier,
   `_resolve_model_for_quality_tier` picks the one with the lowest
   order. The tier list is now sorted at index-build time.

This lets admins make routing decisions explicit when the natural
quality-and-price ordering would pick the wrong model.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>

* feat(quality_router): reorder tiebreak to (quality, order, price)

Changes the tiebreak ordering so quality_tier always wins first, then
explicit `order` is used to break ties within the same tier, then price
breaks the rest:

  1. quality_tier DESC      ← best model wins first
  2. order ASC              ← explicit priority within a tier
  3. input_cost_per_token ASC
  4. model_name ASC

Previously `order` was the primary key — that meant a tier-2 model with
`order=1` would beat a tier-3 model with no `order`, which is the wrong
default. Now `order` only resolves collisions among same-tier candidates.

Tier resolution (within a single tier) keeps the same key minus quality:
(order ASC, cost ASC, name).

Test renames + flips:
  - test_explicit_order_overrides_quality_tier → test_quality_wins_over_explicit_order
  - new: test_order_breaks_tie_within_same_quality_tier

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
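The four-part ordering above maps naturally onto a Python sort key, where unset `order` and unpriced models compare as +inf and therefore lose ties. A hedged sketch under the assumption that deployments are plain dicts (the real code uses typed objects):

```python
import math
from typing import Any, Dict, List

def tiebreak_key(deployment: Dict[str, Any]):
    """Sort key for the ordering above; min() over candidates picks the winner."""
    return (
        -deployment.get("quality_tier", 0),                 # 1. quality_tier DESC
        deployment.get("order", math.inf),                  # 2. order ASC (unset = +inf)
        deployment.get("input_cost_per_token", math.inf),   # 3. price ASC (unpriced = +inf)
        deployment["model_name"],                           # 4. model_name ASC
    )

def pick_winner(candidates: List[Dict[str, Any]]) -> str:
    return min(candidates, key=tiebreak_key)["model_name"]
```

With this key, a tier-3 model with no `order` beats a tier-2 model with `order=1`, matching the flipped test above.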

* fix(quality_router): resolve Greptile review feedback

Addresses four P1 findings from PR review plus test coverage:

1. set_model_list missing quality_routers reset
   - Hot-reloading the Router would leave stale QualityRouter instances
     pointing at the old model_list. `set_model_list` now clears
     `self.quality_routers` alongside the other indices.

2. Round-down fallback before default_model
   - `_resolve_model_for_quality_tier` now rounds DOWN to the closest
     lower tier after round-up fails, before falling back to
     `default_model`. Degrades gracefully rather than jumping straight
     off-tier.

3. RoutingPreferences validation bypass
   - `_build_tier_index` now instantiates `RoutingPreferences(**prefs)`
     so invalid shapes (e.g. non-int quality_tier) raise a clear
     ValueError instead of silently succeeding.

4. Config-ordering dependency
   - `_tier_to_models` is now built lazily on first access. Previously,
     eager construction in `__init__` meant a QualityRouter deployment
     had to appear AFTER all its referenced models in config.yaml,
     because `Router._create_deployment` populates `model_list`
     incrementally. Any `available_models` defined after the router
     entry would silently be reported as missing.

Also adds 6 new tests covering each fix:
- test_invalid_quality_tier_type_raises_clear_error
- test_router_can_be_instantiated_before_its_targets_exist
- test_set_model_list_clears_quality_routers_registry
- test_rounds_down_when_no_higher_tier_exists
- test_rounds_down_prefers_closest_lower_tier
- test_prefers_round_up_over_round_down

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
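With fix 2 applied, the full resolution order becomes exact match, then round up, then round down, then default. A sketch under the same illustrative names as before (not LiteLLM's actual signatures):

```python
from typing import Dict, List, Optional

def resolve_tier(
    target: int,
    tier_to_models: Dict[int, List[str]],
    default_model: Optional[str],
) -> Optional[str]:
    """Exact match, then closest higher tier, then closest lower tier, then default."""
    populated = sorted(t for t, models in tier_to_models.items() if models)
    if target in populated:
        return tier_to_models[target][0]
    higher = [t for t in populated if t > target]
    if higher:
        return tier_to_models[higher[0]][0]    # round up: closest higher tier
    lower = [t for t in populated if t < target]
    if lower:
        return tier_to_models[lower[-1]][0]    # round down: closest lower tier
    return default_model
```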

* style: apply black 24.10.0 formatting to pre-existing offenders

Unblocks the LiteLLM Linting check for this PR — these 12 files are already
failing `black --check` on main (the lint workflow only runs on PRs, so main
drifts). No behavior changes; formatting-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update litellm/router.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4 (1M context) <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* feat(proxy): add --reload flag for uvicorn hot reload (dev only)

Opt-in CLI flag, off by default, no env var. Only affects the uvicorn
run path; gunicorn/hypercorn paths and prod (which doesn't pass the
flag) are unaffected.

* Feature/add audio support for scaleway (#26110)

* feat(scaleway): add SCALEWAY to LlmProviders enum

* feat(scaleway): add audio transcription config and dispatch wiring

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(scaleway): add behavior tests for audio transcription config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore(scaleway): advertise audio_transcriptions in endpoint-support JSON

* docs(scaleway): document audio transcription support

* fix(scaleway): address PR review — plain-text response_format + missing-key fail-fast

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(scaleway): cover new response paths, drop gettysburg.wav coupling

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Prompt Compression - add it to the proxy (#25729)

* refactor: new agentic loop event hook

simplifies how to create logic for tool based multi llm calls

* fix: compress - make it work on anthropic input as well

* fix(compress.py): working prompt compression for claude code

ensures claude code messages can run through proxy easily

* docs: add agentic loop hook guide

* docs: add agentic_loop_hook to sidebar

* fix: fix multiple arguments error

* fix: fix tool call loop for compression on streaming /v1/messages

* fix: fix linting errors

* fix: fix ci/cd errors

* feat(litellm_pre_call_utils.py): use claude code session for litellm session id

allows claude code logs to be stitched together, making it easy to know they were all part of the same conversation

* fix: suppress incorrect mypy warning rE: module

* revert: drop PR's changes to litellm/proxy/_experimental/out/

Restores the 34 HTML files under _experimental/out/ to their pre-PR
paths (X/index.html -> X.html). All renames are R100 (content
unchanged); no other files are touched.

* fix: address greptile review comments on PR #25729

- Skip ``kwargs["tools"] = []`` injection when compression is a no-op —
  Anthropic Messages rejects empty tool arrays on requests that did not
  originally declare tools.
- Move agentic-loop safety guards (fingerprint cycle / max depth) out of
  the per-callback try/except so they propagate instead of being swallowed
  by the generic exception handler. Extracted _check_agentic_loop_safety.
- Gate generic ``x-<vendor>-session-id`` capture behind the
  LITELLM_CAPTURE_VENDOR_SESSION_HEADERS env var (off by default) to
  preserve backwards compatibility; explicit x-litellm-* headers are
  unaffected.
- Fix monkeypatch target in pre-call-hook test to patch the actual
  module-level binding
  (litellm.integrations.compression_interception.handler.compress).
- Add regression tests for empty-tools skip and opt-in session capture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
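The empty-tools skip in the first bullet amounts to a guard like the one below. This is a sketch only; the function and variable names are hypothetical, not the handler's actual code.

```python
def apply_compressed_tools(kwargs: dict, compressed_tools: list) -> None:
    """Only inject tools when compression actually produced some.

    Anthropic's /v1/messages rejects an empty tools array on requests
    that did not originally declare tools, so a no-op compression must
    leave kwargs untouched.
    """
    if compressed_tools:
        kwargs["tools"] = compressed_tools
```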

* revert: drop LITELLM_CAPTURE_VENDOR_SESSION_HEADERS flag

Generic x-<vendor>-session-id header capture is a new feature and only
runs *after* the explicit x-litellm-trace-id / x-litellm-session-id
checks, so it does not change behavior for any existing caller that was
already using the LiteLLM headers — no backwards-incompatibility to gate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(compress): replace input_type with CallTypes call_type

Drop the bespoke ``CompressionInputType`` literal and use the existing
``litellm.types.utils.CallTypes`` enum instead.  ``litellm.compress()``
now takes ``call_type: Union[CallTypes, str]`` (default
``CallTypes.completion``) — no new concept to learn, and the enum is
already the way the rest of the codebase talks about request shapes.

Supported values: ``completion`` / ``acompletion`` (OpenAI chat-completions
shape) and ``anthropic_messages`` (Anthropic structured content blocks).

Updated: compress(), the compression_interception handler, tests, docs,
and the two eval scripts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Support /v1/responses in complexity router

Adds cross-format support to the complexity router via the guardrail
translation handler dispatch. Adds get_structured_messages to base
translation plus OpenAI chat, Responses, and Anthropic handlers.
Auto-router helper _extract_text_from_messages handles tool-call and
multimodal messages. Widens async_pre_routing_hook messages type to
Dict[str, Any].

Fixes #25134

* chore: apply black formatting

* fix: fallback to trying each handler when route inference fails

---------

Co-authored-by: Ryan Crabbe <ryan@berri.ai>
Co-authored-by: nhyy244 <106547304+nhyy244@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ yuneng-berri
❌ krrish-berri-2
You have signed the CLA already but the status is still pending? Let us recheck it.

@gitguardian

gitguardian bot commented Apr 20, 2026

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
  • GitGuardian id: 29203065
  • Status: Triggered
  • Secret: JSON Web Token
  • Commit: 26fcbc9
  • Filename: tests/test_litellm/proxy/test_litellm_pre_call_utils.py
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.


@greptile-apps
Contributor

greptile-apps bot commented Apr 20, 2026

Greptile Summary

This PR introduces three new request-routing strategies — AutoRouter (semantic/embedding-based), ComplexityRouter (rule-based, <1ms), and QualityRouter (complexity tier → quality tier mapping with keyword override) — plus fixes the auto_routers hot-reload regression and lifts QualityRouter decisions into response headers.

Confidence Score: 5/5

Safe to merge; all open findings are P2 style/config issues that do not block the primary routing path.

The hot-reload regression (auto_routers not cleared) from the prior review is fixed, tests are comprehensive and mock-only, and the two remaining findings are a guard-ordering inefficiency and a broken example config — neither affects correctness of the routing logic itself.

Remaining P2 findings: litellm/proxy/new_secret_config.yaml (broken default model reference) and the litellm/router.py init_*_deployment methods (guard after construction).

Important Files Changed

Filename Overview
litellm/router_strategy/auto_router/auto_router.py New AutoRouter using semantic_router to classify messages and route to configured deployments; logic and fallback paths look correct.
litellm/router_strategy/quality_router/quality_router.py New QualityRouter with lazy tier index, keyword override, and complexity-based routing; tiebreak logic is well tested, but the side-index dicts are never cleared before index rebuild.
litellm/router.py Hot-reload fix (auto_routers now cleared in set_model_list), QualityRouter headers lifted into set_response_headers, but all three init_*_deployment methods check for duplicates after constructing the router object.
litellm/proxy/_new_secret_config.yaml Example config references non-existent model "small-model" as the complexity router's default_model, which would cause routing failures for MEDIUM/REASONING tier requests.
tests/test_litellm/router_strategy/test_quality_router.py Comprehensive mock-only tests covering tier index, resolution, keyword override, tiebreaking, and decision metadata; no real network calls detected.
tests/test_litellm/router_strategy/test_auto_router.py Tests for AutoRouter._extract_text_from_messages are solid; routing hook tests are marked skip-beta and have the known RouteChoice isinstance issue (surfaced in a prior review).
litellm/router_strategy/quality_router/config.py Clean Pydantic config models for QualityRouter with sensible defaults and extra='allow' for forward-compatibility.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming Request] --> B{async_pre_routing_hook}
    B --> C{model in auto_routers?}
    C -- Yes --> D[AutoRouter\nSemanticRouter embedding lookup]
    D --> E[RouteChoice.name or default_model]
    C -- No --> F{model in complexity_routers?}
    F -- Yes --> G[ComplexityRouter\nRule-based scoring]
    G --> H[classify → SIMPLE/MEDIUM/COMPLEX/REASONING]
    H --> I[get_model_for_tier]
    F -- No --> J{model in quality_routers?}
    J -- Yes --> K{keyword match?}
    K -- Yes --> L[Keyword override\nhighest quality_tier wins]
    K -- No --> M[ComplexityRouter.classify\n→ complexity_to_quality mapping]
    M --> N[_resolve_model_for_quality_tier\nexact → round-up → round-down → default]
    J -- No --> O[Normal routing]
    E --> P[PreRoutingHookResponse]
    I --> P
    L --> P
    N --> P
    P --> Q[Router selects deployment\nset_response_headers adds x-litellm-quality-router-* headers]

Reviews (3): Last reviewed commit: "style: apply black formatting to websear..."

Comment thread litellm/router.py
Comment on lines +7030 to +7031
self.quality_routers = {}
self.complexity_routers = {}

P1 auto_routers not reset on hot-reload

set_model_list resets quality_routers and complexity_routers to prevent stale state after hot-reload, but auto_routers is never cleared. Any set_model_list call that re-registers an existing auto_router/ deployment will hit the guard in init_auto_router_deployment and raise a ValueError, breaking hot-reload. Fix: add self.auto_routers = {} alongside the other resets.

Comment on lines +93 to +97
def _tier_to_models(self) -> Dict[int, List[str]]:
    """Lazy tier→models index; built on first access."""
    if self._tier_to_models_cache is None:
        self._tier_to_models_cache = self._build_tier_index()
    return self._tier_to_models_cache

P2 Tier index not invalidated on dynamic model additions

_tier_to_models_cache is built lazily on first access and never cleared. Models added via add_deployment after the cache is populated are silently excluded from quality routing until set_model_list is called.

@krrish-berri-2 krrish-berri-2 temporarily deployed to integration-postgres April 20, 2026 23:09 — with GitHub Actions Inactive
@krrish-berri-2 krrish-berri-2 merged commit e7bc316 into litellm_internal_staging Apr 20, 2026
87 of 95 checks passed
@krrish-berri-2 krrish-berri-2 deleted the litellm_krrish_staging_4_20_2026 branch April 20, 2026 23:22
