Litellm krrish staging 04 20 2026 #26138

krrish-berri-2 merged 7 commits into litellm_internal_staging
Conversation
[Infra] Promote staging to main (…#25987)

* feat(router): add auto_router/quality_router for quality-tier routing

  Adds a new auto-router type that routes a request to a model at a target quality tier. The quality tier is inferred by re-using the existing ComplexityRouter's classification, then mapped through an admin-configured complexity_to_quality table. Each candidate model declares its own quality_tier in model_info.litellm_routing_preferences. Resolution strategy: exact tier match, else round up to the next higher tier, else fall back to default_model.

  Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>

* feat(quality_router): add capability-based filtering

  Each deployment can declare a `capabilities: List[str]` field in `model_info.litellm_routing_preferences` (e.g. ["vision", "function_calling"]). Requests can pass `litellm_capabilities` in `request_kwargs` to require specific capabilities — the router will only route to deployments whose declared capabilities are a superset. Resolution still walks the tiers (exact → round up), but at each tier filters by capability before picking. Falls back to default_model only when it also satisfies the required capabilities; otherwise raises rather than silently routing to a model that lacks a required capability.

  Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>

* feat(quality_router): expose routing decision in response headers

  For transparency, expose the QualityRouter's routing decision in the proxy response headers:

  - x-litellm-quality-router-model → picked model_name (e.g. "haiku-vision")
  - x-litellm-quality-router-tier → resolved quality tier (e.g. "1")
  - x-litellm-quality-router-complexity → ComplexityTier name (e.g. "SIMPLE")

  Mechanism: the pre-routing hook stashes the decision in request_kwargs["metadata"]["quality_router_decision"]. After the call returns, Router.set_response_headers lifts the decision into response._hidden_params["additional_headers"] alongside the existing x-litellm-model-group / x-litellm-model-id headers. Existing metadata keys (trace_id, user_id, etc.) are preserved.

  Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>

* feat(quality_router): replace capabilities with keyword override

  Drops the capability-based filtering in favor of a keyword-based override for v0:

  - RoutingPreferences.keywords: List[str] (replaces capabilities) — each deployment can declare substring keywords.
  - If any declared keyword (case-insensitive) appears in the user message, the router short-circuits the complexity-classification flow and routes to the matching deployment.
  - Tiebreaker for overlapping keyword matches: quality_tier DESC, then cheapest model_info.input_cost_per_token ASC. Unpriced models lose ties to priced ones.

  Decision metadata + headers now expose the override:

  - x-litellm-quality-router-via → "keyword" | "quality_tier"
  - x-litellm-quality-router-keyword → matched keyword (only on keyword route)
  - x-litellm-quality-router-complexity → complexity tier (only on tier route)

  Removes:

  - request_kwargs["litellm_capabilities"] reading
  - _model_capabilities, _model_supports_capabilities, _first_capable_model_at_tier, capability filter in _resolve_model_for_quality_tier

  Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>

* feat(quality_router): add explicit `order` to RoutingPreferences

  Adds an explicit priority field to RoutingPreferences for resolving collisions deterministically:

  RoutingPreferences.order: Optional[int]  # lower wins; unset = +inf

  Used as the PRIMARY tiebreaker in two places:

  1. Keyword overlap: when multiple deployments declare the same matching keyword, sort by (order ASC, quality_tier DESC, input_cost_per_token ASC, model_name ASC). Explicit always beats implicit.
  2. Tier resolution: when multiple deployments share a quality tier, `_resolve_model_for_quality_tier` picks the one with the lowest order. The tier list is now sorted at index-build time.

  This lets admins make routing decisions explicit when the natural quality-and-price ordering would pick the wrong model.

  Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>

* feat(quality_router): reorder tiebreak to (quality, order, price)

  Changes the tiebreak ordering so quality_tier always wins first, then explicit `order` is used to break ties within the same tier, then price breaks the rest:

  1. quality_tier DESC ← best model wins first
  2. order ASC ← explicit priority within a tier
  3. input_cost_per_token ASC
  4. model_name ASC

  Previously `order` was the primary key — that meant a tier-2 model with `order=1` would beat a tier-3 model with no `order`, which is the wrong default. Now `order` only resolves collisions among same-tier candidates. Tier resolution (within a single tier) keeps the same key minus quality: (order ASC, cost ASC, name).

  Test renames + flips:

  - test_explicit_order_overrides_quality_tier → test_quality_wins_over_explicit_order
  - new: test_order_breaks_tie_within_same_quality_tier

  Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>

* fix(quality_router): resolve Greptile review feedback

  Addresses four P1 findings from PR review plus test coverage:

  1. set_model_list missing quality_routers reset - Hot-reloading the Router would leave stale QualityRouter instances pointing at the old model_list. `set_model_list` now clears `self.quality_routers` alongside the other indices.
  2. Round-down fallback before default_model - `_resolve_model_for_quality_tier` now rounds DOWN to the closest lower tier after round-up fails, before falling back to `default_model`. Degrades gracefully rather than jumping straight off-tier.
  3. RoutingPreferences validation bypass - `_build_tier_index` now instantiates `RoutingPreferences(**prefs)` so invalid shapes (e.g. non-int quality_tier) raise a clear ValueError instead of silently succeeding.
  4. Config-ordering dependency - `_tier_to_models` is now built lazily on first access. Previously, eager construction in `__init__` meant a QualityRouter deployment had to appear AFTER all its referenced models in config.yaml, because `Router._create_deployment` populates `model_list` incrementally. Any `available_models` defined after the router entry would silently be reported as missing.

  Also adds 6 new tests covering each fix:

  - test_invalid_quality_tier_type_raises_clear_error
  - test_router_can_be_instantiated_before_its_targets_exist
  - test_set_model_list_clears_quality_routers_registry
  - test_rounds_down_when_no_higher_tier_exists
  - test_rounds_down_prefers_closest_lower_tier
  - test_prefers_round_up_over_round_down

  Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>

* style: apply black 24.10.0 formatting to pre-existing offenders

  Unblocks the LiteLLM Linting check for this PR — these 12 files are already failing `black --check` on main (the lint workflow only runs on PRs, so main drifts). No behavior changes; formatting-only.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update litellm/router.py

  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4 (1M context) <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
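The final resolution walk for the quality router (exact tier match, then round up to the closest higher tier, then round down to the closest lower tier, then `default_model`) can be sketched as a small standalone function. This is an illustrative sketch, not the actual `_resolve_model_for_quality_tier` implementation; names and the `tier_to_models` shape are assumptions.

```python
from typing import Dict, List, Optional


def resolve_model_for_quality_tier(
    tier_to_models: Dict[int, List[str]],
    target_tier: int,
    default_model: Optional[str] = None,
) -> Optional[str]:
    """Illustrative walk: exact tier -> round up -> round down -> default."""
    # Exact match first.
    if tier_to_models.get(target_tier):
        return tier_to_models[target_tier][0]
    # Round up: closest tier strictly above the target.
    higher = sorted(t for t in tier_to_models if t > target_tier and tier_to_models[t])
    if higher:
        return tier_to_models[higher[0]][0]
    # Round down: closest tier strictly below the target (added by the fix above,
    # so resolution degrades gracefully instead of jumping straight off-tier).
    lower = sorted(
        (t for t in tier_to_models if t < target_tier and tier_to_models[t]),
        reverse=True,
    )
    if lower:
        return tier_to_models[lower[0]][0]
    return default_model


print(resolve_model_for_quality_tier({1: ["cheap"], 3: ["best"]}, 2))  # round-up: best
print(resolve_model_for_quality_tier({1: ["cheap"]}, 2))  # round-down: cheap
```

Note round-up is preferred over round-down even when the lower tier is numerically closer, matching `test_prefers_round_up_over_round_down` above.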
* feat(proxy): add --reload flag for uvicorn hot reload (dev only)

  Opt-in CLI flag, off by default, no env var. Only affects the uvicorn run path; gunicorn/hypercorn paths and prod (which doesn't pass the flag) are unaffected.

* Feature/add audio support for scaleway (#26110)

  * feat(scaleway): add SCALEWAY to LlmProviders enum
  * feat(scaleway): add audio transcription config and dispatch wiring
  * test(scaleway): add behavior tests for audio transcription config
  * chore(scaleway): advertise audio_transcriptions in endpoint-support JSON
  * docs(scaleway): document audio transcription support
  * fix(scaleway): address PR review — plain-text response_format + missing-key fail-fast
  * test(scaleway): cover new response paths, drop gettysburg.wav coupling

  Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Prompt Compression - add it to the proxy (#25729)

  * refactor: new agentic loop event hook — simplifies how to create logic for tool-based multi-LLM calls
  * fix: compress - make it work on anthropic input as well
  * fix(compress.py): working prompt compression for claude code — ensures claude code messages can run through the proxy easily
  * docs: add agentic loop hook guide
  * docs: add agentic_loop_hook to sidebar
  * fix: fix multiple-arguments error
  * fix: fix tool call loop for compression on streaming /v1/messages
  * fix: fix linting errors
  * fix: fix ci/cd errors
  * feat(litellm_pre_call_utils.py): use claude code session for litellm session id — allows claude code logs to be stitched together, making it easy to know they were all part of the same conversation
  * fix: suppress incorrect mypy warning re: module
  * revert: drop PR's changes to litellm/proxy/_experimental/out/ — restores the 34 HTML files under _experimental/out/ to their pre-PR paths (X/index.html -> X.html). All renames are R100 (content unchanged); no other files are touched.
  * fix: address greptile review comments on PR #25729
    - Skip ``kwargs["tools"] = []`` injection when compression is a no-op — Anthropic Messages rejects empty tool arrays on requests that did not originally declare tools.
    - Move agentic-loop safety guards (fingerprint cycle / max depth) out of the per-callback try/except so they propagate instead of being swallowed by the generic exception handler. Extracted _check_agentic_loop_safety.
    - Gate generic ``x-<vendor>-session-id`` capture behind the LITELLM_CAPTURE_VENDOR_SESSION_HEADERS env var (off by default) to preserve backwards compatibility; explicit x-litellm-* headers are unaffected.
    - Fix monkeypatch target in pre-call-hook test to patch the actual module-level binding (litellm.integrations.compression_interception.handler.compress).
    - Add regression tests for empty-tools skip and opt-in session capture.
  * revert: drop LITELLM_CAPTURE_VENDOR_SESSION_HEADERS flag — generic x-<vendor>-session-id header capture is a new feature and only runs *after* the explicit x-litellm-trace-id / x-litellm-session-id checks, so it does not change behavior for any existing caller that was already using the LiteLLM headers — no backwards-incompatibility to gate.
  * refactor(compress): replace input_type with CallTypes call_type — drops the bespoke ``CompressionInputType`` literal and uses the existing ``litellm.types.utils.CallTypes`` enum instead. ``litellm.compress()`` now takes ``call_type: Union[CallTypes, str]`` (default ``CallTypes.completion``) — no new concept to learn, and the enum is already the way the rest of the codebase talks about request shapes. Supported values: ``completion`` / ``acompletion`` (OpenAI chat-completions shape) and ``anthropic_messages`` (Anthropic structured content blocks). Updated: compress(), the compression_interception handler, tests, docs, and the two eval scripts.

  Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Support /v1/responses in complexity router

  Adds cross-format support to the complexity router via the guardrail translation handler dispatch. Adds get_structured_messages to base translation plus OpenAI chat, Responses, and Anthropic handlers. Auto-router helper _extract_text_from_messages handles tool-call and multimodal messages. Widens async_pre_routing_hook messages type to Dict[str, Any]. Fixes #25134

* chore: apply black formatting

* fix: fallback to trying each handler when route inference fails

---------

Co-authored-by: Ryan Crabbe <ryan@berri.ai>
Co-authored-by: nhyy244 <106547304+nhyy244@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
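The cross-format text extraction mentioned above can be sketched as a flattening pass over chat messages. This is an illustrative stand-in for the `_extract_text_from_messages` helper, assuming OpenAI-style message dicts; the handling of tool-call arguments is an assumption for demonstration.

```python
from typing import Any, Dict, List


def extract_text_from_messages(messages: List[Dict[str, Any]]) -> str:
    """Flatten chat messages to plain text for complexity classification.
    Handles plain string content, multimodal content blocks, and
    tool-call messages that carry no content."""
    parts: List[str] = []
    for message in messages:
        content = message.get("content")
        if isinstance(content, str):
            parts.append(content)
        elif isinstance(content, list):
            # Multimodal content: keep only the text blocks.
            for block in content:
                if isinstance(block, dict) and block.get("type") == "text":
                    parts.append(block.get("text", ""))
        # Tool-call messages may have content=None; fall back to the tool
        # arguments so the classifier still sees something meaningful.
        for tool_call in message.get("tool_calls") or []:
            parts.append(tool_call.get("function", {}).get("arguments", ""))
    return " ".join(p for p in parts if p)
```

A usage example: a conversation mixing a plain user turn, a multimodal turn, and an assistant tool call still yields a single classifiable string.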
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 29203065 | Triggered | JSON Web Token | 26fcbc9 | tests/test_litellm/proxy/test_litellm_pre_call_utils.py | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely, following best practices for secret storage.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future, consider:
- following best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection on pre-commit to catch secrets before they leave your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Greptile Summary

This PR introduces three new request-routing strategies.

Confidence Score: 5/5 — safe to merge; all open findings are P2 style/config issues that do not block the primary routing path. The hot-reload regression (auto_routers not cleared) from the prior review is fixed, tests are comprehensive and mock-only, and the two remaining findings are a guard-ordering inefficiency and a broken example config; neither affects correctness of the routing logic itself. The remaining findings are in litellm/proxy/_new_secret_config.yaml (broken default model reference) and the litellm/router.py init_*_deployment methods (guard after construction).
| Filename | Overview |
|---|---|
| litellm/router_strategy/auto_router/auto_router.py | New AutoRouter using semantic_router to classify messages and route to configured deployments; logic and fallback paths look correct. |
| litellm/router_strategy/quality_router/quality_router.py | New QualityRouter with lazy tier index, keyword override, and complexity-based routing; tiebreak logic is well tested, but the side-index dicts are never cleared before index rebuild. |
| litellm/router.py | Hot-reload fix (auto_routers now cleared in set_model_list), QualityRouter headers lifted into set_response_headers, but all three init_*_deployment methods check for duplicates after constructing the router object. |
| litellm/proxy/_new_secret_config.yaml | Example config references non-existent model "small-model" as the complexity router's default_model, which would cause routing failures for MEDIUM/REASONING tier requests. |
| tests/test_litellm/router_strategy/test_quality_router.py | Comprehensive mock-only tests covering tier index, resolution, keyword override, tiebreaking, and decision metadata; no real network calls detected. |
| tests/test_litellm/router_strategy/test_auto_router.py | Tests for AutoRouter._extract_text_from_messages are solid; routing hook tests are marked skip-beta and have the known RouteChoice isinstance issue (surfaced in a prior review). |
| litellm/router_strategy/quality_router/config.py | Clean Pydantic config models for QualityRouter with sensible defaults and extra='allow' for forward-compatibility. |
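The per-deployment routing preferences described in the table above can be sketched as follows. This uses a stdlib dataclass as a stand-in for the actual Pydantic config model (which uses extra='allow'); the field names (quality_tier, keywords, order) come from the PR, while the defaults and the validation in `__post_init__` are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RoutingPreferences:
    """Stdlib sketch of the quality-router routing preferences."""
    quality_tier: Optional[int] = None  # higher tier = better model
    keywords: List[str] = field(default_factory=list)  # substring overrides
    order: Optional[int] = None  # lower wins; unset = +inf

    def __post_init__(self) -> None:
        # Mirrors the "validation bypass" fix: invalid shapes such as a
        # non-int quality_tier should raise a clear error, not pass silently.
        if self.quality_tier is not None and not isinstance(self.quality_tier, int):
            raise ValueError(f"quality_tier must be an int, got {self.quality_tier!r}")


prefs = RoutingPreferences(quality_tier=2, keywords=["vision"])
print(prefs.order)  # None -> treated as +inf when tiebreaking
```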
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming Request] --> B{async_pre_routing_hook}
    B --> C{model in auto_routers?}
    C -- Yes --> D[AutoRouter\nSemanticRouter embedding lookup]
    D --> E[RouteChoice.name or default_model]
    C -- No --> F{model in complexity_routers?}
    F -- Yes --> G[ComplexityRouter\nRule-based scoring]
    G --> H[classify → SIMPLE/MEDIUM/COMPLEX/REASONING]
    H --> I[get_model_for_tier]
    F -- No --> J{model in quality_routers?}
    J -- Yes --> K{keyword match?}
    K -- Yes --> L[Keyword override\nhighest quality_tier wins]
    K -- No --> M[ComplexityRouter.classify\n→ complexity_to_quality mapping]
    M --> N[_resolve_model_for_quality_tier\nexact → round-up → round-down → default]
    J -- No --> O[Normal routing]
    E --> P[PreRoutingHookResponse]
    I --> P
    L --> P
    N --> P
    P --> Q[Router selects deployment\nset_response_headers adds x-litellm-quality-router-* headers]
```
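The keyword-override tiebreak used in the flowchart's "highest quality_tier wins" step (quality_tier DESC, then explicit order ASC, then input_cost_per_token ASC, then model_name ASC) can be expressed as a single sort key. A minimal sketch with hypothetical candidate tuples; the real router sorts deployment dicts, not tuples.

```python
import math
from typing import Optional, Tuple


def tiebreak_key(
    model_name: str,
    quality_tier: int,
    order: Optional[int],
    input_cost_per_token: Optional[float],
) -> Tuple:
    """Sort key per the PR: quality first, then explicit order within a
    tier (unset = +inf), then price (unpriced loses ties), then name."""
    return (
        -quality_tier,  # DESC: best tier wins first
        order if order is not None else math.inf,  # ASC: explicit beats implicit
        input_cost_per_token if input_cost_per_token is not None else math.inf,
        model_name,  # ASC: deterministic final tiebreak
    )


candidates = [
    ("tier3-no-order", 3, None, 1e-5),
    ("tier2-order1", 2, 1, 1e-6),
    ("tier3-order1", 3, 1, 2e-5),
]
winner = min(candidates, key=lambda c: tiebreak_key(*c))
print(winner[0])  # tier3-order1: quality wins over order, order wins within a tier
```

Note how a tier-2 model with an explicit order=1 still loses to any tier-3 model, matching the "quality wins over explicit order" reordering described in the commits.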
Reviews (3): Last reviewed commit: "style: apply black formatting to websear..."
```python
self.quality_routers = {}
self.complexity_routers = {}
```
auto_routers not reset on hot-reload
set_model_list resets quality_routers and complexity_routers to prevent stale state after hot-reload, but auto_routers is never cleared. Any set_model_list call that re-registers an existing auto_router/ deployment will hit the guard in init_auto_router_deployment and raise a ValueError, breaking hot-reload. Fix: add self.auto_routers = {} alongside the other resets.
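The suggested fix can be sketched with a toy router class. The attribute and method names come from the review comment; the real `set_model_list` rebuilds deployments and other indices as well, which is elided here.

```python
class RouterSketch:
    """Toy stand-in showing only the registry-reset behavior under review."""

    def __init__(self) -> None:
        self.quality_routers: dict = {}
        self.complexity_routers: dict = {}
        self.auto_routers: dict = {}

    def set_model_list(self, model_list: list) -> None:
        # Clear every router registry before re-registering deployments;
        # otherwise a hot-reload that re-registers an existing auto_router
        # deployment trips the duplicate guard in init_auto_router_deployment
        # and raises ValueError.
        self.quality_routers = {}
        self.complexity_routers = {}
        self.auto_routers = {}  # the missing reset flagged in this review
        # ...re-register deployments from model_list here...
```

With the extra reset in place, calling set_model_list twice with the same auto-router config is idempotent instead of raising.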
```python
def _tier_to_models(self) -> Dict[int, List[str]]:
    """Lazy tier→models index; built on first access."""
    if self._tier_to_models_cache is None:
        self._tier_to_models_cache = self._build_tier_index()
    return self._tier_to_models_cache
```
Merged commit e7bc316 from branch …treaming_iterator into litellm_internal_staging.
Relevant issues
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- Added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement - see details)
- make test-unit passes
- Asked @greptileai for review and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Screenshots / Proof of Fix
Type
🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test
Changes