
[FEATURE] Integrate Jvector engine as another vector engine of choice #2386

@sam-herman

Is your feature request related to a problem?
Currently, the k-NN plugin supports three engines: NMSLIB, Faiss, and Lucene.
In this change, I would like to integrate JVector as another engine of choice.
Doing so has a number of unique advantages:

  1. Disk ANN - JVector can perform search without loading the entire index into RAM. This functionality is not available today through Lucene, and JVector provides it without native dependencies (Faiss) or the cumbersome JNI mechanism.
  2. Thread safety - JVector is a thread-safe index that supports concurrent modifications and inserts, with near-perfect scalability as you add cores. Lucene is not thread-safe. OpenSearch works around this with multiple segments, but then has to merge them, so insert performance still suffers (and I believe you can't read from a Lucene segment while it is being built).
  3. Quantized index construction - JVector can build the index from quantized vectors. This saves memory, which allows larger segments, which means fewer segments and faster searches.
  4. Quantized Disk ANN - JVector supports DiskANN-style quantization with reranking. In principle, it is easy to demonstrate that this makes a massive performance difference for indexes larger than memory. In practice, it takes days or weeks to insert enough vectors into Lucene to show this because of its single-threaded insertion, and that is the only hard part.
  5. PQ and BQ support - Building on (3), JVector supports product quantization (PQ) in addition to the binary quantization (BQ) that Lucene offers. PQ support seems fairly rare (pgvector doesn't offer it either) for two reasons: the code required to get high-performance ADC (asymmetric distance computation) with SIMD is somewhat involved, and PQ requires a separate codebook, which Lucene isn't set up to accommodate easily. PQ at 64x compression gives higher relevance than BQ at 32x.
  6. Fused ADC and more - JVector has features that no other engine offers, such as Fused ADC, NVQ, and anisotropic PQ.
  7. Compatibility - JVector is compatible with Cassandra, which makes it easier to transfer vector-encoded data from Cassandra to OpenSearch and vice versa.
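The PQ, ADC, and rerank ideas in items 4 and 5 are easiest to see in code. Below is a minimal brute-force Java sketch, assuming squared Euclidean distance and codebooks that are already trained (real PQ trains them with k-means). The class `PqAdcSketch` and all of its methods are hypothetical names for illustration only. They are not JVector's API, and the sketch omits the SIMD optimizations, graph traversal, and disk layout a real engine needs. It shows why PQ needs a separate codebook, how ADC scores a query against compressed codes by summing precomputed table lookups, and how a rerank pass over full-precision vectors refines the approximate results.

```java
import java.util.Arrays;

// Hypothetical illustration of product quantization (PQ) with asymmetric
// distance computation (ADC) and a rerank step. This is NOT JVector's API.
final class PqAdcSketch {
    private final float[][][] codebooks; // [subspace][centroid][subDim], assumed pre-trained
    private final int subspaces;         // number of subvectors per vector
    private final int subDim;            // dimensions per subvector

    PqAdcSketch(float[][][] codebooks) {
        this.codebooks = codebooks;
        this.subspaces = codebooks.length;
        this.subDim = codebooks[0][0].length;
    }

    // Encode a full-precision vector as one centroid index (one byte) per subspace.
    byte[] encode(float[] v) {
        byte[] code = new byte[subspaces];
        for (int m = 0; m < subspaces; m++) {
            int best = 0;
            float bestDist = Float.MAX_VALUE;
            for (int k = 0; k < codebooks[m].length; k++) {
                float d = squaredL2(v, m * subDim, codebooks[m][k]);
                if (d < bestDist) {
                    bestDist = d;
                    best = k;
                }
            }
            code[m] = (byte) best; // assumes at most 256 centroids per subspace
        }
        return code;
    }

    // ADC step 1: distances from each query subvector to every centroid, computed once per query.
    float[][] lookupTable(float[] query) {
        float[][] table = new float[subspaces][];
        for (int m = 0; m < subspaces; m++) {
            table[m] = new float[codebooks[m].length];
            for (int k = 0; k < codebooks[m].length; k++) {
                table[m][k] = squaredL2(query, m * subDim, codebooks[m][k]);
            }
        }
        return table;
    }

    // ADC step 2: approximate squared distance = sum of table lookups; codes are never decoded.
    static float adcDistance(float[][] table, byte[] code) {
        float sum = 0f;
        for (int m = 0; m < code.length; m++) {
            sum += table[m][code[m] & 0xFF]; // read the byte as unsigned 0..255
        }
        return sum;
    }

    // Two-phase search: rank all codes by ADC, then rerank the best rerankK
    // candidates with exact distances on the full-precision vectors.
    // Brute force for brevity; a real index would traverse a graph instead.
    int[] search(float[] query, byte[][] codes, float[][] fullVectors, int rerankK, int topK) {
        float[][] table = lookupTable(query);
        float[] approx = new float[codes.length];
        Integer[] ids = new Integer[codes.length];
        for (int i = 0; i < codes.length; i++) {
            approx[i] = adcDistance(table, codes[i]);
            ids[i] = i;
        }
        Arrays.sort(ids, (a, b) -> Float.compare(approx[a], approx[b]));
        Integer[] candidates = Arrays.copyOf(ids, Math.min(rerankK, ids.length));
        Arrays.sort(candidates, (a, b) -> Float.compare(
                exactSquaredL2(query, fullVectors[a]), exactSquaredL2(query, fullVectors[b])));
        int k = Math.min(topK, candidates.length);
        int[] result = new int[k];
        for (int i = 0; i < k; i++) {
            result[i] = candidates[i];
        }
        return result;
    }

    // Squared L2 between one subvector of v (starting at offset) and a centroid.
    private float squaredL2(float[] v, int offset, float[] centroid) {
        float s = 0f;
        for (int i = 0; i < subDim; i++) {
            float diff = v[offset + i] - centroid[i];
            s += diff * diff;
        }
        return s;
    }

    // Squared L2 between two full-precision vectors, used only for reranking.
    private static float exactSquaredL2(float[] a, float[] b) {
        float s = 0f;
        for (int i = 0; i < a.length; i++) {
            float diff = a[i] - b[i];
            s += diff * diff;
        }
        return s;
    }
}
```

To put the compression ratios in item 5 in concrete terms, take a 1024-dimensional float32 vector (4096 bytes) as an example: PQ with 64 subspaces and one byte per code stores 64 bytes (64x), while BQ at one bit per dimension stores 128 bytes (32x).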

What solution would you like?
Introduce JVector into the k-NN plugin as another supported engine.

**Benchmarks**
I will be adding benchmarks to illustrate the advantages above.

What alternatives have you considered?
NA
