[Bug]: /rerank does not apply Virtual Key fallback chain (Fallbacks=None) #25298
Open
Description
Check for existing issues
- I have searched the existing issues and confirmed that this is not a duplicate.
- I found a related rerank issue (#20473, "Facing the Issue with the rerank model of cohere.rerank-v3-5:0"), but it is about provider auth for Cohere rerank, not Virtual Key fallback handling on /rerank.
What happened?
When configuring a Virtual Key in the LiteLLM UI with a fallback chain for a rerank model, requests to /rerank do not see any fallback configuration.
In the Virtual Keys UI:
- Primary model: `Qwen3-Reranker-8B`
- Fallback chain includes: `deepinfra/Qwen/Qwen3-Reranker-8B`
Then I create/copy the virtual key and call /rerank with:
- `model: "Qwen3-Reranker-8B"`
- `mock_testing_fallbacks: true`
Expected behavior:
- LiteLLM should pick up the fallback chain configured on the key
- `mock_testing_fallbacks: true` should trigger fallback routing
- The request should continue to the fallback model instead of failing with `Fallbacks=None`
Actual behavior:
- LiteLLM throws an internal mock exception and reports:
  `Fallbacks=None`
  `Available Model Group Fallbacks=None`
This makes it look like the fallback chain saved on the Virtual Key is not being applied to /rerank requests at all.
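For context, the router-style fallback lookup I would expect /rerank to perform on the key's configuration can be sketched as follows. This is a minimal illustration of the expected behavior; `resolve_fallbacks` and the data shapes are hypothetical, not actual LiteLLM internals:

```python
def resolve_fallbacks(model, fallbacks):
    """Return the fallback list configured for `model`, or [] if none.

    `fallbacks` mirrors the router-style config shape:
    [{"<primary-model>": ["<fallback-1>", ...]}, ...]
    """
    for entry in fallbacks:
        if model in entry:
            return entry[model]
    return []


# Fallback chain as configured on the Virtual Key in this report
fallbacks = [{"Qwen3-Reranker-8B": ["deepinfra/Qwen/Qwen3-Reranker-8B"]}]

# Expected: the /rerank handler finds the chain and retries on the fallback,
# instead of reporting Fallbacks=None.
print(resolve_fallbacks("Qwen3-Reranker-8B", fallbacks))
# → ['deepinfra/Qwen/Qwen3-Reranker-8B']
```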
Steps to Reproduce
- Open LiteLLM UI -> Virtual Keys
- Create a new virtual key
- Set primary model to `Qwen3-Reranker-8B`
- In Router Settings, configure a fallback chain with `deepinfra/Qwen/Qwen3-Reranker-8B`
- Save the key and copy the generated key
- Send a rerank request like this:
```shell
curl -X POST "https://<proxy>/rerank" \
  -H "Authorization: Bearer <VIRTUAL_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-Reranker-8B",
    "query": "ping",
    "documents": [
      "say ping",
      "say pong",
      "hello world"
    ],
    "top_n": 2,
    "mock_testing_fallbacks": true
  }'
```
- Observe that the request fails immediately instead of routing to the configured fallback
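For comparison, the same fallback chain can also be expressed in the proxy config file rather than on the Virtual Key. A rough sketch of that wiring (the `hosted_vllm` upstream and exact deployment names here are illustrative, not taken from my setup):

```yaml
model_list:
  - model_name: Qwen3-Reranker-8B
    litellm_params:
      model: hosted_vllm/Qwen/Qwen3-Reranker-8B   # illustrative upstream
  - model_name: deepinfra/Qwen/Qwen3-Reranker-8B
    litellm_params:
      model: deepinfra/Qwen/Qwen3-Reranker-8B

router_settings:
  fallbacks:
    - Qwen3-Reranker-8B: ["deepinfra/Qwen/Qwen3-Reranker-8B"]
```

The bug report is specifically about the UI-configured, per-key chain not reaching /rerank, so a config-file chain is only a point of comparison, not a confirmed workaround.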
Relevant log output
```json
{
  "error": {
    "message": "litellm.InternalServerError: This is a mock exception for model=Qwen3-Reranker-8B, to trigger a fallback. Fallbacks=None. Received Model Group=Qwen3-Reranker-8B\nAvailable Model Group Fallbacks=None",
    "type": null,
    "param": null,
    "code": "500"
  }
}
```
What part of LiteLLM is this about?
Proxy
What LiteLLM version are you on?
v1.83.3
Twitter / LinkedIn details
No response