Description
Hello,
Has anybody else tried the Qdrant samples and failed to reproduce the expected outputs when using their ColBERT samples:
And
for fastembed-colbert => it also fails to rank the desired movies. For example, with their proposed query "A story about a strong historically significant female figure", the movie "The Passion of Joan of Arc" is ranked in 3rd or 4th position, depending on the ColBERT model used. Result No. 1 is "A world-weary political journalist picks up the story of a woman's search for her son, who was taken away from her decades ago after she became pregnant and was forced to live in a convent" and No. 2 is "Oskar, an overlooked and bullied boy, finds love and revenge through Eli, a beautiful but peculiar girl", across all ColBERT models used.
for multivector => it consistently fails to rank the desired best match first. The expected answer to "How does AI help in medicine?" would be "Artificial intelligence is used in hospitals for cancer diagnosis and treatment.", but that expected output is ranked last. The same goes for the movie dataset: "Joan of Arc" cannot be ranked first, and other queries also fail.
I have found that the standard reranking approach (intfloat/multilingual-e5-large and jinaai/jina-reranker-v2-base-multilingual) gives the most accurate results.
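To check whether the odd rankings come from the models or from the scoring step, it may help to compute the ColBERT-style late-interaction (MaxSim) score by hand and compare the order with what Qdrant returns. Below is a minimal sketch in plain Python. The token vectors and the names `doc_a` / `doc_b` are toy placeholders, not output from any real model, and cosine similarity is an assumption on my side (it matches a dot product when the token embeddings are normalized).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def max_sim(query_tokens, doc_tokens):
    """Late-interaction score: for each query token vector, take its best
    match among the document token vectors, then sum over query tokens."""
    return sum(
        max(cosine(q, d) for d in doc_tokens)
        for q in query_tokens
    )


def rank(query_tokens, docs):
    """Return (name, score) pairs sorted from best to worst MaxSim score."""
    scores = {name: max_sim(query_tokens, toks) for name, toks in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (hypothetical): two query token vectors, two small "documents".
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0]],  # one exact match per query token
    "doc_b": [[1.0, 1.0]],              # a single partially matching token
}

ranking = rank(query, docs)
for name, score in ranking:
    print(name, round(score, 4))
```

Feeding the per-token embeddings produced by the ColBERT model into `rank` and comparing the order with the Qdrant result should show whether the mismatch is in the embeddings themselves or in how the collection is configured and queried.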
Config:
- Local on MacBook M1
- Qdrant database either on localhost:6333 or in memory
- Python 3.13