Description
Is your feature request related to a problem? Please describe
Late interaction models such as ColBERT and ColPali produce multiple token-level vectors for a document or query, and multiple patch-level vectors for an image. When these token/patch-level vectors are used for similarity search, they should be scored with the MaxSim operation defined by the late interaction mechanism. Weaviate has an excellent blog post on late interaction models, MaxSim, etc.
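For reference, MaxSim as described in the ColBERT work takes, for each query token vector, the maximum similarity over all document token vectors, then sums those maxima over the query tokens. A minimal sketch, assuming dot-product similarity and NumPy arrays of shape (tokens, dims); the function name `max_sim` is illustrative only:

```python
import numpy as np

def max_sim(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Sum over query tokens of the best dot-product match among document tokens."""
    # Pairwise similarities, shape (num_query_tokens, num_doc_tokens).
    sim = query_vecs @ doc_vecs.T
    # Best-matching document token per query token, summed over query tokens.
    return float(sim.max(axis=1).sum())
```

The per-query-token argmax (`sim.argmax(axis=1)`) would identify which document token or image patch contributed to the score, which is the basis for highlighting.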
When used for search, these models provide better relevance because every query token interacts with every document token. This is not possible with ordinary vector search over a single vector per document and query, where similarity is calculated as Cosine(1 doc vector, 1 query vector). Late interaction models also make vector search interpretable: you can see which token (for a document) or patch (for an image) causes a document to appear in the results, and you can highlight those tokens/patches in your results (semantic highlighting). Examples,

Describe the solution you'd like
The workflow can be depicted in two ways:
Type 1. Pooling of query token vectors:
- During ingestion, the multiple vectors per document are stored using OpenSearch nested vector fields
- Phase 1 (Retrieval): At query time, the multiple token vectors of the query are converted into a single query vector (Max/Min Pooling), which is used for ANN search against the document nested vector field.
- The unique documents fetched (de-duplication is already handled by the OpenSearch nested vector field) enter the ranking phase.
- Phase 2 (Ranking): In the ranking phase, all the token-level vectors of both the document and the query are used to calculate the score for each document-query pair using the MaxSim formula.
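The Type 1 flow above can be sketched in plain NumPy. This is an illustration under stated assumptions, not a proposed implementation: exact search stands in for ANN, max pooling is chosen as the pooling step, each document is scored in phase 1 by its best-matching nested vector, and the names `max_sim`, `search_with_pooled_query`, and `docs` are hypothetical.

```python
import numpy as np

def max_sim(query_vecs, doc_vecs):
    # Sum over query tokens of the best dot-product match among document tokens.
    return float((query_vecs @ doc_vecs.T).max(axis=1).sum())

def search_with_pooled_query(query_vecs, docs, k=10):
    """docs maps a document id to an array of shape (num_doc_tokens, dims)."""
    # Pool the query token vectors into one vector (element-wise max pooling).
    pooled = query_vecs.max(axis=0)
    # Phase 1 (Retrieval): exact search stands in for ANN; a document's score
    # is that of its best-matching vector, so each document appears once.
    first_pass = {doc_id: float((vecs @ pooled).max()) for doc_id, vecs in docs.items()}
    candidates = sorted(first_pass, key=first_pass.get, reverse=True)[:k]
    # Phase 2 (Ranking): rescore candidates with MaxSim over all token vectors.
    reranked = [(doc_id, max_sim(query_vecs, docs[doc_id])) for doc_id in candidates]
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)
```

Only one first-pass search is issued per query, so the cost of phase 1 does not grow with the number of query tokens.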
Type 2. No pooling done on query token vectors:
- During ingestion, the multiple vectors per document are stored using OpenSearch nested vector fields
- Phase 1 (Retrieval): At query time, each token vector of the query is used for a separate ANN search against the document nested vector field.
- The unique documents fetched (de-duplication is already handled by the OpenSearch nested vector field) enter the ranking phase. These documents are the union of all candidates returned by the ANN searches for the individual query token vectors.
- Phase 2 (Ranking): In the ranking phase, all the token-level vectors of both the document and the query are used to calculate the score for each document-query pair using the MaxSim formula.
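The Type 2 flow differs only in how candidates are gathered. A minimal sketch under the same assumptions as before (exact search standing in for ANN, dot-product similarity); the names `max_sim`, `search_per_token`, `docs`, and `k_per_token` are hypothetical:

```python
import numpy as np

def max_sim(query_vecs, doc_vecs):
    # Sum over query tokens of the best dot-product match among document tokens.
    return float((query_vecs @ doc_vecs.T).max(axis=1).sum())

def search_per_token(query_vecs, docs, k_per_token=10):
    """docs maps a document id to an array of shape (num_doc_tokens, dims)."""
    candidates = set()
    # Phase 1 (Retrieval): one search per query token vector.
    for q in query_vecs:
        token_scores = {doc_id: float((vecs @ q).max()) for doc_id, vecs in docs.items()}
        top = sorted(token_scores, key=token_scores.get, reverse=True)[:k_per_token]
        # The union across query tokens de-duplicates documents.
        candidates.update(top)
    # Phase 2 (Ranking): rescore the union with MaxSim over all token vectors.
    reranked = [(doc_id, max_sim(query_vecs, docs[doc_id])) for doc_id in candidates]
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)
```

In this sketch the number of first-pass searches equals the number of query tokens, and a document that is a moderate match for every token can still be missed if `k_per_token` is too small.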
For comparison with a similar feature: Elasticsearch supports the Type 1 multi-phase retrieval through a new rescorer retriever query type. MaxSim ranking is supported through two script_score functions, maxSimDotProduct (MaxSim over the dot product of full-precision vectors) and maxSimInvHamming (MaxSim over the Hamming distance of bit vector dimensions), which operate on multiple vectors per document stored in a field type called rank_vectors.
Related component
Search
Describe alternatives you've considered
Client-side MaxSim scoring
An OpenSearch hybrid query, which supports ANN search for up to 5 query token vectors against the document multi-vectors in the nested vector field
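As a rough illustration of the hybrid query alternative, the request body could be assembled as below: one nested k-NN sub-query per query token vector. The field names `token_vectors` and `token_vectors.vector`, the helper name, and the choice to simply truncate the query to the first 5 token vectors are all assumptions made for this sketch.

```python
def build_hybrid_query(query_token_vecs, path="token_vectors",
                       field="token_vectors.vector", k=10, max_sub_queries=5):
    """Build a hybrid query body with one nested k-NN clause per query token."""
    sub_queries = [
        {
            "nested": {
                "path": path,
                # Score each document by its best-matching nested vector.
                "score_mode": "max",
                "query": {"knn": {field: {"vector": list(vec), "k": k}}},
            }
        }
        # Assumption: query tokens beyond the sub-query limit are dropped.
        for vec in query_token_vecs[:max_sub_queries]
    ]
    return {"query": {"hybrid": {"queries": sub_queries}}}
```

Any query token vectors beyond the limit are ignored at retrieval time, which is the main drawback of this approach, and the final MaxSim scoring would still have to happen on the client.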
Additional context
No response