Contributor

@adityamachiroutu adityamachiroutu commented Aug 7, 2025

Description

Integrates Lucene’s Better Binary Quantization (BBQ) into the OpenSearch k-NN plugin as a memory-efficient encoding option for high-dimensional vector search. Compared to Faiss binary quantization, BBQ achieves higher recall while keeping memory usage low, and it fits into the existing query and rescoring pipelines.

This integration exposes BBQ through the Lucene engine with a new "encoder": "binary" parameter in index mappings, leveraging Lucene’s native vector quantization and storage formats. BBQ improves recall on large datasets at a cost in latency and throughput, as shown in the benchmarks below. Users can adjust the oversample factor in queries to balance recall and performance. By default, the oversample factor is 3x for vectors with 1000 or more dimensions and 5x otherwise, which matches the existing Faiss defaults for 32x compression.

Alternatively, users can enable the BBQ encoder implicitly by setting the engine to Lucene and the compression level to 32x; BBQ is then selected automatically.
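
The default oversample rule described above can be sketched as follows. This is a minimal illustration rather than the plugin's code: the class and method names are hypothetical, and the assumption that the first pass fetches ceil(k × factor) candidates is mine (the review thread refers to this value as firstPassK).

```java
// Hypothetical sketch of the default oversample rule described in this PR.
// The names are illustrative and do not mirror the k-NN plugin's classes.
final class OversampleDefaults {
    // Dimension threshold taken from the PR description.
    static final int DIMENSION_THRESHOLD = 1000;

    private OversampleDefaults() {}

    /** Returns 3.0 for vectors with 1000 or more dimensions, and 5.0 otherwise. */
    static float defaultOversampleFactor(int dimension) {
        return dimension >= DIMENSION_THRESHOLD ? 3.0f : 5.0f;
    }

    /** Assumption: the first pass fetches ceil(k * factor) candidates before rescoring. */
    static int firstPassK(int k, int dimension) {
        return (int) Math.ceil(k * (double) defaultOversampleFactor(dimension));
    }

    public static void main(String[] args) {
        System.out.println(firstPassK(100, 128));   // low-dimensional vectors use 5x
        System.out.println(firstPassK(100, 1024));  // high-dimensional vectors use 3x
    }
}
```

With k=100, this gives 500 first-pass candidates for a 128-dimensional field and 300 for a 1024-dimensional one.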

Benchmarking Results:

No Rescoring Tests Comparing FAISS BQ to Lucene BBQ

| Metric | Sift-128, Lucene BBQ | Sift-128, Faiss BQ | Cohere-1m, Lucene BBQ | Cohere-1m, Faiss BQ | Unit |
|---|---|---|---|---|---|
| Mean recall@k | 0.32 | 0.18 | 0.63 | 0.3 | |
| 50th percentile service time | 5.69515 | 5.71014 | 5.88795 | 5.58729 | ms |
| 90th percentile service time | 8.88647 | 7.59109 | 7.74107 | 6.48138 | ms |
| 99th percentile service time | 18.26953 | 17.57099 | 19.85139 | 17.39427 | ms |
| Mean Throughput | 6310.84 | 9569.38 | 3011.83 | 3112.96 | docs/s |
| Median Throughput | 6226.56 | 9582.13 | 3065.34 | 3152.94 | docs/s |

Config:
1 shard, 1 segment, 10 indexing clients, k=100, ef_search: 100, ef_construction: 100

Single Node Tests Comparing FAISS BQ (on disk) to Lucene BBQ

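| Dataset | 32x Compression Technique | Recall@100 | p90 latency in ms (1 shard, 1 segment) |
|---|---|---|---|
| clip-flickr-image-text-queries | Lucene BBQ | 0.91 | 96.1 |
| clip-flickr-image-text-queries | Faiss BQ (on disk) | 0.81 | 58.24 |
| cohere-v2-dbpedia | Lucene BBQ | 0.75 | 52.4 |
| cohere-v2-dbpedia | Faiss BQ (on disk) | 0.76 | 12.29 |
| cohere-v2-wiki | Lucene BBQ | 0.9 | 58.9 |
| cohere-v2-wiki | Faiss BQ (on disk) | 0.9 | 15.74 |
| cohere-v3-bioasq | Lucene BBQ | 0.73 | 226.63 |
| cohere-v3-bioasq | Faiss BQ (on disk) | 0.73 | 17.97 |
| e5small-msmarco | Lucene BBQ | 0.9 | 40.06 |
| e5small-msmarco | Faiss BQ (on disk) | 0.8 | 16.04 |
| gist | Lucene BBQ | 0.64 | 49.67 |
| gist | Faiss BQ (on disk) | 0.12 | 46.84 |
| glove | Lucene BBQ | 0.63 | 4.37 |
| glove | Faiss BQ (on disk) | 0.42 | 4.37 |
| minilm-msmarco | Lucene BBQ | 0.97 | 2.81 |
| minilm-msmarco | Faiss BQ (on disk) | 0.9 | 3.6 |
| mpnet-msmarco | Lucene BBQ | 0.98 | 4.98 |
| mpnet-msmarco | Faiss BQ (on disk) | 0.95 | 3.68 |
| mbread_marco | Lucene BBQ | 0.96 | 5.93 |
| mbread_marco | Faiss BQ (on disk) | 0.92 | 4.96 |
| sift | Lucene BBQ | 0.69 | 5.23 |
| sift | Faiss BQ (on disk) | 0.39 | 3.72 |
| snowflake-msmarco | Lucene BBQ | 0.94 | 70.41 |
| snowflake-msmarco | Faiss BQ (on disk) | 0.93 | 47.79 |
| tasb-msmarco | Lucene BBQ | 0.95 | 9.73 |
| tasb-msmarco | Faiss BQ (on disk) | 0.87 | 3.93 |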

Config: ef_search: 256, ef_construction: 256, m: 16, 1 shard, 1 client, 1 segment

graph size spacetype dimension embeddingtype
  • These tests compare Lucene’s Better Binary Quantization with Faiss BQ (on disk), which is the quantization method OpenSearch currently uses by default for 32x compression.
  • Overall, Lucene BBQ matches or exceeds the recall of Faiss BQ (on disk) on nearly all datasets, regardless of space type, dimensionality, dataset size, or data modality. The one exception is cohere-v2-dbpedia (0.75 vs. 0.76).
  • Faiss BQ has lower p90 latency on most datasets. The exceptions are glove (tied at 4.37 ms) and minilm-msmarco (2.81 ms for Lucene BBQ vs. 3.6 ms for Faiss BQ).
  • Both techniques seem to struggle more with image datasets than with text datasets, although Lucene BBQ still outperforms Faiss BQ on recall.
  • As dataset size increases, both quantization techniques scale effectively, and the recall-versus-latency trade-off persists.
  • When accuracy is the priority, Lucene BBQ seems to be the better fit, while Faiss BQ is better suited to latency-sensitive workloads.
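
For readers comparing the recall columns, the metric can be sketched as below. The PR does not state how the benchmark tool computes recall, so this assumes the standard definition (the fraction of the true top-k neighbors that appear in the returned top-k); the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating the standard recall@k definition.
// It is not taken from the benchmark tooling used in this PR.
final class RecallAtK {
    private RecallAtK() {}

    /** Fraction of the true top-k neighbor IDs that appear in the returned top-k. */
    static double recallAtK(List<Integer> groundTruth, List<Integer> retrieved, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        // Only the first k ground-truth neighbors count toward the metric.
        Set<Integer> truth = new HashSet<>(groundTruth.subList(0, Math.min(k, groundTruth.size())));
        if (truth.isEmpty()) {
            return 0.0;
        }
        // Count distinct returned IDs (within the top k) that are true neighbors.
        long hits = retrieved.stream().limit(k).distinct().filter(truth::contains).count();
        return (double) hits / truth.size();
    }

    public static void main(String[] args) {
        // Two of the four true neighbors (1 and 3) are returned.
        System.out.println(recallAtK(List.of(1, 2, 3, 4), List.of(1, 9, 3, 8), 4));
    }
}
```

Under this definition, a Recall@100 of 0.91 means that on average 91 of the 100 exact nearest neighbors were returned.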

Related Issues

Resolves #2805

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@naveentatikonda
Member

@adityamachiroutu as mentioned offline, can you double-check the oversampling factor of Faiss BQ with a debugger (check the value of firstPassK)? I believe it's 5x, not 3x, if the vector dimension is less than 1000 - https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/mapper/CompressionLevel.java#L119

@finnroblin
Contributor

@naveentatikonda it's 5 for dimension < 1000 and 3 for >= 1000 as you mentioned. I checked with debugger last month. We also have a unit test confirming this: https://github.com/opensearch-project/k-NN/blob/b7fc5dd98072ea157a9b301e7f93d79e97[…]java/org/opensearch/knn/index/mapper/CompressionLevelTests.java

@adityamachiroutu adityamachiroutu changed the title from "[DRAFT] Integrating Lucene's Better Binary Quantization" to "Integrating Lucene's Better Binary Quantization" Aug 11, 2025
…which calls the lucene codec that implements better binary quantization

Signed-off-by: Aditya Machiroutu <[email protected]>
@adityamachiroutu adityamachiroutu force-pushed the bbq-lucene-integration branch 2 times, most recently from e9ee3b5 to b814e1f Compare August 25, 2025 17:12
Signed-off-by: Aditya Machiroutu <[email protected]>
@adityamachiroutu adityamachiroutu force-pushed the bbq-lucene-integration branch 2 times, most recently from 988d959 to f8aea0c Compare August 26, 2025 17:59
Contributor

@finnroblin finnroblin left a comment

Few small comments and questions, thanks Aditya!

CHANGELOG.md Outdated
## [Unreleased 3.3](https://github.com/opensearch-project/k-NN/compare/main...HEAD)

### Features
* Integrated Lucene's better binary quantization [#2838](https://github.com/opensearch-project/k-NN/pull/2838)
Contributor

Nit: present tense

}
}

@AwaitsFix(bugUrl = "https://github.com/opensearch-project/k-NN/issues/2805")
Contributor

Is this meant to be here?

if (engine == KNNEngine.LUCENE) {
if (params != null && params.containsKey(METHOD_ENCODER_PARAMETER)) {
KNNBBQVectorsFormatParams bbqParams = new KNNBBQVectorsFormatParams(params, defaultMaxConnections, defaultBeamWidth);
if (bbqParams.validate(params)) {
Contributor

Can we please add debug log like the below?

),
knnBBQVectorsFormatParams -> new Lucene102HnswBinaryQuantizedVectorsFormat(
knnBBQVectorsFormatParams.getMaxConnections(),
knnBBQVectorsFormatParams.getBeamWidth(),
Contributor

Is there no constructor parameter for bits like above?

Contributor Author

No, Lucene does not have that constructor.

* Check if BBQ is enabled
* @return true if BBQ is enabled, false otherwise
*/
public boolean isBBQEnabled() {
Contributor

What's the purpose of this method?

MethodComponentContext methodComponentContext = resolvedKNNMethodContext.getMethodComponentContext();
MethodComponentContext encoderComponentContext = new MethodComponentContext(SQ_ENCODER.getName(), new HashMap<>());

String encoderName = (resolvedCompressionLevel == CompressionLevel.x32)
Contributor

Nit: since there are multiple clauses let's make this one if check of x32 compression

* is invalid.
*/
public RescoreContext getDefaultRescoreContext(Mode mode, int dimension, Version version) {
// TODO move this to separate class called resolver to resolve rescore context
Contributor

Does your PR address this TODO? If not can we please leave it in?

);

// Invalid compression
expectThrows(
Contributor

Let's keep this in and change to x16 to still throw with a comment explaining that x32 added in this PR

Signed-off-by: Aditya Machiroutu <[email protected]>
@navneet1v
Collaborator

@adityamachiroutu during your tests on different datasets was the rescoring added? or this is just bare bone testing of the quantization techniques?

@adityamachiroutu
Contributor Author

> @adityamachiroutu during your tests on different datasets was the rescoring added? or this is just bare bone testing of the quantization techniques?

Yes, rescoring was enabled during my tests. For consistency, the same oversampling factor (5x for vectors with < 1000 dimensions, 3x for vectors with >= 1000 dimensions) was used for both the Faiss and Lucene runs.
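
The oversample-then-rescore flow referred to here can be sketched as follows. This is a simplified illustration under stated assumptions, not the plugin's implementation: the first pass is a brute-force scan standing in for the HNSW traversal over quantized vectors, the two distance functions are supplied by the caller, and all names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.IntToDoubleFunction;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch of two-phase search: oversample with a cheap approximate
// distance, then rescore the candidates with the exact distance.
final class OversampledRescore {
    private OversampledRescore() {}

    static List<Integer> search(int numDocs, int k, float oversample,
                                IntToDoubleFunction approxDistance,
                                IntToDoubleFunction exactDistance) {
        // Assumption: the first pass keeps ceil(k * oversample) candidates.
        int firstPassK = Math.min(numDocs, (int) Math.ceil(k * (double) oversample));

        // First pass: rank documents by the approximate (quantized) distance.
        // A brute-force scan stands in for the graph search here.
        List<Integer> candidates = IntStream.range(0, numDocs)
            .boxed()
            .sorted(Comparator.comparingDouble((Integer id) -> approxDistance.applyAsDouble(id)))
            .limit(firstPassK)
            .collect(Collectors.toList());

        // Second pass: re-rank only the candidates by the exact distance.
        return candidates.stream()
            .sorted(Comparator.comparingDouble((Integer id) -> exactDistance.applyAsDouble(id)))
            .limit(k)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        double[] approx = {0.1, 0.2, 0.3, 0.9}; // distances on quantized vectors
        double[] exact = {0.5, 0.4, 0.1, 0.0};  // distances on full-precision vectors
        System.out.println(search(4, 1, 3.0f, i -> approx[i], i -> exact[i]));
        System.out.println(search(4, 1, 1.0f, i -> approx[i], i -> exact[i]));
    }
}
```

In the toy data, document 2 is only recovered when the oversample factor is large enough to let it into the candidate set, which is why recall with rescoring depends on that factor.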

@adityamachiroutu adityamachiroutu changed the base branch from main to feature/lucene-bbq August 27, 2025 17:33
KNNBBQVectorsFormatParams bbqParams = new KNNBBQVectorsFormatParams(params, defaultMaxConnections, defaultBeamWidth);
if (bbqParams.validate(params)) {
log.debug(
"Initialize KNN vector format for field [{}] with params [{}] = \"{}\", [{}] = \"{}\"",
Contributor

Will the log statement have BBQ in it? Idea is for the log to distinguish whether BBQ or ScalarQuantized format was instantiated.

return false;
}

if (!(params.get(METHOD_ENCODER_PARAMETER) instanceof MethodComponentContext)) {
Member

nit: use == false check. It's standard across OpenSearch

if ((params.get(METHOD_ENCODER_PARAMETER) instanceof MethodComponentContext) == false)

// Add new method signature with KNNEngine parameter
public RescoreContext getDefaultRescoreContext(Mode mode, int dimension, Version version, KNNEngine engine) {
// TODO move this to separate class called resolver to resolve rescore context
if (modesForRescore.contains(mode)) {
Member

We would also need a version check similar to here - it could be a restore scenario across versions and the encoder might not be supported, unless we backport it.

Contributor Author

thanks for reviewing, addressed your comments.

Signed-off-by: Aditya Machiroutu <[email protected]>
knn_bwc_version.startsWith("2.15.")) {
filter {
excludeTestsMatching "org.opensearch.knn.bwc.IndexingIT.testKNNIndexLuceneQuantization"
excludeTestsMatching "org.opensearch.knn.bwc.IndexingIT.testKNNIndexLuceneBBQ"
Member

Why are you trying to exclude this test only until version 2.15? It won't work until version 3.2?

private static final int NUM_DOCS = 10;
private static int QUERY_COUNT = 0;

private static final String ALGO = "hnsw";
Member

remove this constant

}

public void testKNNIndexLuceneBBQ() throws Exception {
waitForClusterHealthGreen(NODES_BWC_CLUSTER);
Member

We don't need to add the same condition to validate BWC version twice in if and else blocks, probably add it here after the cluster is green

if (!isBBQEncoderSupported(getBWCVersion())) {
    logger.info("Skipping testKNNIndexLuceneBBQ as BBQ encoder is not supported in version: {}", getBWCVersion());
    return;
}

private static final Set<VectorDataType> SUPPORTED_DATA_TYPES = ImmutableSet.of(VectorDataType.FLOAT);

private final static MethodComponent METHOD_COMPONENT = MethodComponent.Builder.builder(ENCODER_BBQ)
.addSupportedDataTypes(SUPPORTED_DATA_TYPES)
Member

As discussed offline in the past, can you add support for a bits parameter and set its default to 1 bit (32x compression), so that we can use this parameter if Lucene supports 2 and 4 bits in the future?

Member

For the naming convention, pls keep it consistent with Faiss BQ

@naveentatikonda
Member

@adityamachiroutu can you also share the benchmarking results that you have without rescoring? Also, pls share your findings once you identify the hotspots for the reason behind this huge spike in latencies especially with cohere datasets (4x to 12x higher when compared to Faiss). Thanks!

@naveentatikonda
Member

naveentatikonda commented Aug 28, 2025

> Benchmarking Results:
>
> Single Node Tests Comparing FAISS BQ (on disk) to Lucene BBQ
>
> Dataset 32x Compression Technique Recall@100 p90 ms latency (1 shard, 1 segment)

@adityamachiroutu can you add the configuration used for running these tests, specifically m and ef_construction?

Signed-off-by: Aditya Machiroutu <[email protected]>
RescoreContext getDefaultRescoreContext(Mode mode, int dimension) {
return getDefaultRescoreContext(mode, dimension, Version.CURRENT);
// Special handling for Lucene BBQ (x32 compression)
if (this == x32 && engine == KNNEngine.LUCENE && version.onOrAfter(Version.V_3_2_0)) {
Contributor

Should be after V_3_3_0, since this was not added as a part of 3.2.

Member

no, version.onOrAfter(Version.V_3_3_0)
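
The condition this thread converges on can be sketched as below: the Lucene BBQ defaults apply only for x32 compression, the Lucene engine, and an index created on or after 3.3.0. This is a minimal sketch with hypothetical stand-in types (SimpleVersion, Engine, Compression); it does not use OpenSearch's Version, KNNEngine, or CompressionLevel classes, and it assumes the version being checked is the one the index was created with.

```java
// Hypothetical stand-ins for the version/engine/compression checks discussed above.
final class BbqDefaultsGate {
    private BbqDefaultsGate() {}

    /** Minimal semantic version; not OpenSearch's Version class. */
    static final class SimpleVersion {
        final int major;
        final int minor;
        final int patch;

        SimpleVersion(int major, int minor, int patch) {
            this.major = major;
            this.minor = minor;
            this.patch = patch;
        }

        /** True when this version is the same as or newer than the other one. */
        boolean onOrAfter(SimpleVersion other) {
            if (major != other.major) {
                return major > other.major;
            }
            if (minor != other.minor) {
                return minor > other.minor;
            }
            return patch >= other.patch;
        }
    }

    enum Engine { LUCENE, FAISS }

    enum Compression { X16, X32 }

    // First version that ships the BBQ encoder, per the review comments.
    static final SimpleVersion V_3_3_0 = new SimpleVersion(3, 3, 0);

    /** All three conditions must hold before the BBQ-specific defaults are used. */
    static boolean usesLuceneBbqDefaults(Compression compression, Engine engine, SimpleVersion indexVersion) {
        return compression == Compression.X32
            && engine == Engine.LUCENE
            && indexVersion.onOrAfter(V_3_3_0);
    }

    public static void main(String[] args) {
        System.out.println(usesLuceneBbqDefaults(Compression.X32, Engine.LUCENE, new SimpleVersion(3, 2, 0)));
        System.out.println(usesLuceneBbqDefaults(Compression.X32, Engine.LUCENE, new SimpleVersion(3, 3, 0)));
    }
}
```

Gating on 3.3.0 rather than 3.2.0 keeps indices created on 3.2 (for example, ones restored from a snapshot) on the previous behavior, since the encoder was not part of that release.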

Development

Successfully merging this pull request may close these issues.

[FEATURE] [RFC] Integrating Lucene's Better Binary Quantization into OpenSearch

5 participants