This app is a proof of concept (PoC). It acts as a proxy in front of your existing, running Infinity inference server: it uses Infinity's classifier endpoint on the backend and exposes a reranking endpoint on the frontend, so you can use it with LiteLLM.

Michael from Infinity recommended running this model as a classifier. If you want to use the model through a reranking endpoint with an application (such as LiteLLM) that expects the reranking API format, run the proxy as described below.
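For orientation, here is a rough sketch of the translation the proxy performs: each (query, document) pair is turned into a single input string, the batch is scored against Infinity's classify route, and the scores are reshaped into the rerank response format. This is not the actual proxy-classifier.py logic; the `/v1/classify` path, the request/response shapes, and the plain string concatenation are illustrative assumptions, so check your Infinity version's API docs and the model's prompt template for the real formats.

```python
# Illustrative sketch only; the real logic lives in proxy-classifier.py.
import requests

INFINITY_BASE = "http://localhost:7997/v1"  # the Infinity server started below


def rerank(model: str, query: str, documents: list[str], top_n: int) -> list[dict]:
    # The classifier variant scores one combined string per (query, document)
    # pair. Plain concatenation is a placeholder; the real template is
    # model-specific.
    inputs = [f"{query}\n{doc}" for doc in documents]

    # Assumed request/response shape for Infinity's classify route:
    # one list of {"label", "score"} predictions per input string.
    resp = requests.post(
        f"{INFINITY_BASE}/classify",
        json={"model": model, "input": inputs},
        timeout=60,
    )
    resp.raise_for_status()
    scores = [preds[0]["score"] for preds in resp.json()["data"]]

    # Reshape into the rerank response format: original index, descending
    # relevance, truncated to top_n, with documents omitted (null) as in the
    # example response below.
    results = [
        {"relevance_score": score, "index": i, "document": None}
        for i, score in enumerate(scores)
    ]
    results.sort(key=lambda r: r["relevance_score"], reverse=True)
    return results[:top_n]
```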
Run the proxy using the setup.sh script, or start it directly.

Using Python:

```bash
python proxy-classifier.py
```

Or using Docker:

```bash
docker compose -f docker-compose.yaml up -d
```

In a separate terminal, start Infinity with the model Michael Feil already converted for us (thanks, Michael!):
```bash
infinity_emb v2 --port 7997 \
--model-id michaelfeil/mxbai-rerank-large-v2-seq --batch-size 1 --revision "refs/heads/main" \
--url-prefix "/v1"
```

Then call the proxy's rerank endpoint:

```bash
curl --location 'http://localhost:8002/v1/rerank' \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--data '{
"model": "michaelfeil/mxbai-rerank-large-v2-seq",
"query": "What is the capital of the United States?",
"documents": [
"Carson City is the capital city of the American state of Nevada.",
"The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
"Washington, D.C. is the capital of the United States.",
"Capital punishment has existed in the United States since before it was a country."
],
"top_n": 4
}'
```

Example response:

```json
{
"object": "rerank",
"results": [
{
"relevance_score": 0.9999301433563232,
"index": 2,
"document": null
},
{
"relevance_score": 0.8872248530387878,
"index": 0,
"document": null
},
{
"relevance_score": 0.8324779272079468,
"index": 1,
"document": null
},
{
"relevance_score": 0.44686806201934814,
"index": 3,
"document": null
}
],
"model": "michaelfeil/mxbai-rerank-large-v2-seq",
"usage": {
"prompt_tokens": 2282,
"total_tokens": 2282
},
"id": "infinity-9d00343b-20a9-4d69-9367-438a68bc08cb",
"created": 1742163022
}
```
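Note that `document` is null in each result, so the caller maps `index` back onto the documents it sent. A small, hypothetical helper:

```python
# Re-attach the original documents to the proxy's rerank results.
# `response` is the JSON body shown above; `documents` is the list sent in the request.
def attach_documents(response: dict, documents: list[str]) -> list[tuple[float, str]]:
    return [
        (item["relevance_score"], documents[item["index"]])
        for item in response["results"]
    ]


# For the example above, the top entry is:
# (0.9999301433563232, "Washington, D.C. is the capital of the United States.")
```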