Skip to content

[FEATURE] [RFC] Integrating Lucene's Better Binary Quantization into OpenSearch #2805

@adityamachiroutu

Description

@adityamachiroutu

1. Problem Statement

This project aims to integrate BBQ into the OpenSearch k-NN plugin to offer users a memory-efficient alternative, ideal for large-scale vector workloads in constrained compute environments.

The OpenSearch k-NN plugin currently supports two engines, each with distinct quantization capabilities:

  • FAISS Engine: Supports Product Quantization, Scalar Quantization, Binary Quantization, and raw float32 storage
  • Lucene Engine: Supports scalar quantization (7-bit) byte vectors, and raw float32 storage

While both engines offer quantization options, OpenSearch lacks access to Lucene’s Better Binary Quantization, which provides a better compress to quality ratio. BBQ is available in the underlying Lucene library, but not exposed through OpenSearch’s k-NN plugin.

This project aims to address this gap and integrate Lucene’s BBQ into OpenSearch’s k-NN plugin, which provides users with a new memory efficient quantization option.

With this optimization binary quantization achieves a recall enhancement without oversampling from 0.18 using Faiss Binary Quantization to 0.32 using BBQ with the sift-128 dataset, and an improvement of 0.3 to 0.63 using Cohere's-1M dataset.


2. Overview

2.1 Introduction

Better Binary Quantization is a vector compression technique developed by Lucene which focuses on reducing memory usage for approximate nearest neighbor search. It encodes high dimensional float vectors into compact binary representations using advanced quantization methods optimized for search efficiency. BBQ was developed with RaBitQ as a reference, drawing on its approach of combining product quantization with binary encoding.


2.2 Understanding Better Binary Quantization

Technical Process:

This diagram illustrates the process of computing the global centroid of a set of vectors and how the vectors are then centered around this centroid. After centering, we calculate basic statistics and apply an optimization process to quantize the vectors efficiently. This quantization is ultimately used to produce the company binary representations of the vectors.

Centroid Calculation

Quantization Process

Compared to naive binary quantization, the centroid calculation is identical. Naive binary quantization does not, however, calculate stats, optimize intervals, and hold on to corrective factors. Once a centered vector has been calculated, naive binary quantization quantizes without making any optimization.


2.3 Comparison of Quantization Methods

Method Compression Ratio Recall Memory Usage
Scalar Quantization 2x, 4x Great Moderate
Faiss Binary Quantization 8x, 16x, 32x Moderate Low
Lucene Better Binary Quantization 32x Good Low

3. Proposed Solution

  • Lucene BBQ Integration: Incorporate Lucene’s Lucene102BinaryQuantizedVectorsFormat into OpenSearch.
  • Query Pipeline: Ensure BBQ-based search scoring is invoked properly.
  • Parameterization: Allow customizable BBQ configurations via mappings.

4. User Interface

Users will configure BBQ through the field mapping:

{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 2,
        "space_type": "l2",
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "parameters": {
            "encoder": {
              // CHANGED PARAMETER
              "name": "binary"
            },
            "ef_construction": 256,
            "m": 8
          }
        }
      }
    }
  }
}

5. Implementation Details

5.1 Architecture Flow

  • User specifies "encoder": {"name": "binary"} in index mapping
  • LuceneHNSWMethod resolves BBQ encoder from SUPPORTED_ENCODERS map
  • BasePerFieldKnnVectorsFormat validates BBQ parameters via KNNBBQVectorsFormatParams
  • Codec instantiates Lucene102HnswBinaryQuantizedVectorsFormat for BBQ-enabled fields
  • Lucene's native BBQ implementation handles vector quantization and storage

5.2 Core Components

LuceneBBQEncoder

Implements Encoder interface, not much configuration as it is set similar to SQ, but without 3 variables.

public class LuceneBBQEncoder implements Encoder {
    private static final Set<VectorDataType> SUPPORTED_DATA_TYPES = ImmutableSet.of(VectorDataType.FLOAT);

    private final static MethodComponent METHOD_COMPONENT = MethodComponent.Builder.builder(ENCODER_BBQ)
        .addSupportedDataTypes(SUPPORTED_DATA_TYPES)
        .build();

    @Override
    public MethodComponent getMethodComponent() {
        return METHOD_COMPONENT;
    }

    @Override
    public CompressionLevel calculateCompressionLevel(
        MethodComponentContext methodComponentContext,
        KNNMethodConfigContext knnMethodConfigContext
    ) {
        return CompressionLevel.NOT_CONFIGURED;
    }
}

LuceneHNSWMethod

BBQ encoder is registered the same way as SQ encoder

final static Encoder SQ_ENCODER = new LuceneSQEncoder();
final static Encoder BBQ_ENCODER = new LuceneBBQEncoder();
final static Map<String, Encoder> SUPPORTED_ENCODERS = Map.of(
    SQ_ENCODER.getName(), SQ_ENCODER,
    BBQ_ENCODER.getName(), BBQ_ENCODER
);

KNNBBQVectorsFormatParams

Used for validation, ensures proper bbq encoder identification

public class KNNBBQVectorsFormatParams extends KNNVectorsFormatParams {
    @Override
    public boolean validate(Map<String, Object> params) {
        if (params.get(METHOD_ENCODER_PARAMETER) == null) {
            return false;
        }

        if (!(params.get(METHOD_ENCODER_PARAMETER) instanceof MethodComponentContext)) {
            return false;
        }

        MethodComponentContext encoderMethodComponentContext = 
            (MethodComponentContext) params.get(METHOD_ENCODER_PARAMETER);
        return ENCODER_BBQ.equals(encoderMethodComponentContext.getName());
    }
}

5.3 Codec Integration

BasePerFieldKnnVectorsFormat Changes

Check for logic that the user select BBQ

if (engine == KNNEngine.LUCENE) {
    if (params != null && params.containsKey(METHOD_ENCODER_PARAMETER)) {
        KNNBBQVectorsFormatParams bbqParams = new KNNBBQVectorsFormatParams(
            params, defaultMaxConnections, defaultBeamWidth
        );
        if (bbqParams.validate(params)) {
            log.debug("Initialize KNN vector format for field [{}] with BBQ encoder", field);
            return bbqVectorsFormatSupplier.apply(bbqParams);
        }
       SQ encoder validation...
    }
}

KNN9120PerFieldVectorsFormat Implementation

The concrete format supplier instantiates Lucene’s BBQ format

knnBBQVectorsFormatParams -> {
    final Tuple<Integer, ExecutorService> mergeThreadCountAndExecutorService = 
        getMergeThreadCountAndExecutorService();
    return new Lucene102HnswBinaryQuantizedVectorsFormat(
        knnBBQVectorsFormatParams.getMaxConnections(),
        knnBBQVectorsFormatParams.getBeamWidth(),
        mergeThreadCountAndExecutorService.v1(),
        mergeThreadCountAndExecutorService.v2()
    );
}

5.4 Rescoring/Oversampling

5.4.1 Leveraging Existing Infrastructure

BBQ integrates with OpenSearch k-NN's existing rescoring framework through Lucene's built-in search mechanism. No additional rescoring classes are required as Lucene's Lucene102HnswBinaryQuantizedVectorsFormat handles rescoring automatically.

5.4.2 Automatic Search Implementation

Lucene's BBQ implementation automatically performs:

  • HNSW graph traversal using binary quantized vectors for fast candidate retrieval
  • Final scoring using original FP32 vectors from rawVectorDelegate for precision

5.4.3 Query Parameter Usage

Users control BBQ rescoring through existing query structure:

{
  "query": {                                                                                                                                                                         
    "knn": {                                                                                                                                                                         
      "my_vector1": {                                                                                                                                                                
        "vector": [0.1, 0.2, 0.3],                                                                                                                                                   
        "k": 10,                                                                                                                                                                     
        "rescore": {                                                                                                                                                                 
          "oversample_factor": 3.0                                                                                                                                                   
        }                                                                                                                                                                            
      }                                                                                                                                                                              
    }                                                                                                                                                                                
  }                                                                                                                                                                                  
}                                                                                                                                                                                    
  • Rescoring needs to happen through oversampling_factor parameter.
  • Scalar quantization by default sets a compression level of 4x. This compression level is not configured with BBQ, so instead a user must pass an oversample factor by query to handle these differences
  • Unlike Faiss, Lucene mmaps the index, which allows search without fully loading the entire index into memory
  • The segment size is larger because we store the .veb file in addition to the .vemb file, which contains extra information

5.4.4 Built-in Performance Optimizations

Lucene's BBQ format automatically provides:

  • Selective Vector Access: Only loads FP32 vectors when needed for final scoring
  • Memory Efficiency: Leverages dual storage architecture to minimize memory usage
  • Optimized Scoring: Uses corrective terms and binary representations for efficient similarity computation

6. Benchmarking

Config

Config Item Sift-128 Cohere-768-1M
Nodes 1 1
Primary Shards 1 1
Replica Shards 1 1
ef_construction 100 100
ef_search 100 100
m 16 16
space type l2 l2

Metrics

Metric Task Sift Dataset, Lucene BBQ Sift Dataset, FaissBQ Cohere-768-1M, Lucene BBQ Cohere-768-1M, Faiss BQ Unit
Mean recall@100 prod-queries 0.32 0.18 0.63 0.3
Mean Throughput custom-vector-bulk 6310.84 9569.38 3011.83 3112.96 docs/s
Median Throughput custom-vector-bulk 6226.56 9582.13 3065.34 3152.94 docs/s
Mean Throughput prod-queries 1.82 1.61 1.66 1.69 ops/s
Median Throughput prod-queries 1.82 1.61 1.66 1.69 ops/s
50th percentile service time prod-queries 5.69515 5.71014 5.88795 5.58729 ms
90th percentile service time prod-queries 8.88647 7.59109 7.74107 6.48138 ms
99th percentile service time prod-queries 18.26953 17.57099 19.85139 17.39427 ms

Dataset: Cohere-10M

Metric Task No Oversample Oversample Factor 1 Oversample Factor 1.5 Oversample Factor 2 Oversample Factor 2.5 Oversample Factor 3 Oversample Factor 3.5 Oversample Factor 4 Oversample Factor 4.5 Oversample Factor 5 Unit
Translog size 0.00108 0.00036 8.20E-07 0.00036 8.20E-07 0.00072 0.00036 0.00108 0.00072 0.00108 GB
Mean Throughput prod-queries 181 169.54 152.16 138.07 125.6 115.96 107.24 100.06 93.94 88.6 ops/s
Median Throughput prod-queries 181.35 170.12 152.33 138.36 125.87 116.11 107.39 100.28 94.16 88.78 ops/s
50th percentile latency, 50th percentile service time prod-queries 3.33675 3.67427 4.42162 5.10511 5.81401 6.49634 7.18395 7.81621 8.47487 9.09765 ms
Mean recall@k prod-queries 0.62 0.87 0.9 0.92 0.94 0.95 0.95 0.96 0.96 0.97
Mean recall@1 prod-queries 0.57 0.94 0.96 0.96 0.97 0.98 0.98 0.98 0.98 0.98

Without any oversampling, there is a definite recall improvement when using BBQ as opposed to the existing Faiss binary quantization, with minor affects on latency and throughput. With rescoring enabled, BBQ has significant recall improvement, reaching recall > 0.95 with oversample factors above 3.

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions