
Qdrant ColBERT tutorials fail #1776

@rwaner-fr


Hello,

Has anybody else tried the Qdrant ColBERT samples and failed to reproduce the expected outputs? I am using these two tutorials:

https://github.com/qdrant/landing_page/blob/master/qdrant-landing/content/documentation/fastembed/fastembed-colbert.md

and

https://github.com/qdrant/landing_page/blob/master/qdrant-landing/content/documentation/advanced-tutorials/using-multivector-representations.md

For fastembed-colbert: it also fails to rank the desired movies. For example, with their proposed query "A story about a strong historically significant female figure", the movie "The Passion of Joan of Arc" ranks in 3rd or 4th position, depending on the ColBERT model used. Result no. 1 is "A world-weary political journalist picks up the story of a woman's search for her son, who was taken away from her decades ago after she became pregnant and was forced to live in a convent" and result no. 2 is "Oskar, an overlooked and bullied boy, finds love and revenge through Eli, a beautiful but peculiar girl", across all ColBERT models used.

For multivector: it consistently fails to rank the desired best match. The expected answer to "How does AI help in medicine?" would be "Artificial intelligence is used in hospitals for cancer diagnosis and treatment.", but that expected output is ranked last. The same goes for the movie dataset: we can't get "Joan of Arc" ranked first. Other queries also fail.
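To check whether the ordering comes from the embeddings or from the Qdrant setup, the ColBERT-style late-interaction score (MaxSim) can be recomputed by hand and compared with what the database returns. Below is a minimal sketch in plain Python. The function names (`max_sim`, `rank`) and the toy 2-dimensional vectors are hypothetical placeholders and are not taken from the tutorials; in a real check the lists would hold the per-token embeddings produced by the ColBERT model.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def max_sim(query_vecs, doc_vecs):
    """Late-interaction score: for each query token, take the best cosine
    similarity against any document token, then sum over query tokens."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )


def rank(query_vecs, docs):
    """Return document names sorted from best to worst MaxSim score."""
    scores = {name: max_sim(query_vecs, vecs) for name, vecs in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (assumption): stand-ins for real per-token ColBERT embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens exactly
    "doc_b": [[1.0, 1.0]],              # partially matches both
    "doc_c": [[-1.0, 0.0]],             # points away from the first token
}

ranking = rank(query, docs)
print(ranking)
```

If the ranking computed this way from the model's own token embeddings matches what Qdrant returns, the collection configuration is probably not the cause and the difference lies in the model or the data.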

I have found that the standard reranking approach (intfloat/multilingual-e5-large and jinaai/jina-reranker-v2-base-multilingual) gives the most accurate results.

Config:

  • Local on a MacBook M1
  • Qdrant database either on localhost:6333 or in memory
  • Python 3.13
