1. Problem Statement
This project aims to integrate BBQ into the OpenSearch k-NN plugin to offer users a memory-efficient alternative, ideal for large-scale vector workloads in constrained compute environments.
The OpenSearch k-NN plugin currently supports two engines, each with distinct quantization capabilities:
- FAISS Engine: Supports Product Quantization, Scalar Quantization, Binary Quantization, and raw float32 storage
- Lucene Engine: Supports scalar quantization (7-bit) byte vectors, and raw float32 storage
While both engines offer quantization options, OpenSearch does not expose Lucene's Better Binary Quantization (BBQ), which offers a better compression-to-quality trade-off. BBQ is available in the underlying Lucene library but is not surfaced through OpenSearch's k-NN plugin.
This project addresses that gap by integrating Lucene's BBQ into the k-NN plugin, giving users a new memory-efficient quantization option.
With this change, binary quantization recall without oversampling improves from 0.18 with Faiss Binary Quantization to 0.32 with BBQ on the sift-128 dataset, and from 0.3 to 0.63 on the Cohere-1M dataset.
2. Overview
2.1 Introduction
Better Binary Quantization is a vector compression technique developed in Lucene that focuses on reducing memory usage for approximate nearest neighbor search. It encodes high-dimensional float vectors into compact binary representations using quantization methods optimized for search efficiency. BBQ was developed with RaBitQ as a reference, drawing on its approach of combining product-quantization ideas with binary encoding.
2.2 Understanding Better Binary Quantization
Technical Process:
The process works as follows: compute the global centroid of the vector set and center each vector around it. After centering, basic statistics are calculated and an optimization step chooses the quantization intervals, which are ultimately used to produce the compact binary representation of each vector.
Compared to naive binary quantization, the centroid calculation is identical. Naive binary quantization, however, does not calculate statistics, optimize intervals, or retain corrective factors: once a centered vector has been computed, it is quantized directly without any optimization.
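To make the contrast concrete, here is a minimal, self-contained Java sketch of the naive baseline: center each vector on the global centroid and keep only the sign of each component, packed into a bit array. It is purely illustrative and is not Lucene's BBQ code; BBQ additionally optimizes the quantization intervals and stores per-vector corrective factors so distances computed on the binary codes can be adjusted toward the true float distances.

```java
// Illustrative only: naive sign-based binary quantization around a global centroid.
// Lucene's BBQ goes further (interval optimization, corrective factors), but the
// centering step shown here is common to both approaches.
final class NaiveBinaryQuantizer {

    /** Component-wise mean of all vectors, used as the global centroid. */
    static float[] centroid(float[][] vectors) {
        int dim = vectors[0].length;
        float[] c = new float[dim];
        for (float[] v : vectors) {
            for (int i = 0; i < dim; i++) {
                c[i] += v[i];
            }
        }
        for (int i = 0; i < dim; i++) {
            c[i] /= vectors.length;
        }
        return c;
    }

    /** Center the vector on the centroid, then keep one bit per dimension (the sign). */
    static long[] quantize(float[] vector, float[] centroid) {
        long[] bits = new long[(vector.length + 63) / 64];
        for (int i = 0; i < vector.length; i++) {
            if (vector[i] - centroid[i] > 0f) {
                bits[i >> 6] |= 1L << (i & 63);
            }
        }
        return bits;
    }

    /** Hamming distance between packed binary codes: a cheap proxy for vector distance. */
    static int hammingDistance(long[] a, long[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) {
            d += Long.bitCount(a[i] ^ b[i]);
        }
        return d;
    }
}
```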
2.3 Comparison of Quantization Methods
Method | Compression Ratio | Recall | Memory Usage |
---|---|---|---|
Scalar Quantization | 2x, 4x | Great | Moderate |
Faiss Binary Quantization | 8x, 16x, 32x | Moderate | Low |
Lucene Better Binary Quantization | 32x | Good | Low |
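To put the 32x figure in concrete terms: a 768-dimensional float32 vector occupies 768 × 4 = 3,072 bytes, while a 1-bit-per-dimension binary code packs into 96 bytes, a 32x reduction before accounting for the small per-vector corrective terms that BBQ also stores.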
3. Proposed Solution
- Lucene BBQ Integration: Incorporate Lucene's `Lucene102BinaryQuantizedVectorsFormat` into OpenSearch.
- Query Pipeline: Ensure BBQ-based search scoring is invoked properly.
- Parameterization: Allow customizable BBQ configurations via mappings.
4. User Interface
Users will configure BBQ through the field mapping:
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 2,
        "space_type": "l2",
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "parameters": {
            "encoder": {
              // CHANGED PARAMETER
              "name": "binary"
            },
            "ef_construction": 256,
            "m": 8
          }
        }
      }
    }
  }
}
5. Implementation Details
5.1 Architecture Flow

- User specifies `"encoder": {"name": "binary"}` in the index mapping
- `LuceneHNSWMethod` resolves the BBQ encoder from the `SUPPORTED_ENCODERS` map
- `BasePerFieldKnnVectorsFormat` validates BBQ parameters via `KNNBBQVectorsFormatParams`
- Codec instantiates `Lucene102HnswBinaryQuantizedVectorsFormat` for BBQ-enabled fields
- Lucene's native BBQ implementation handles vector quantization and storage
5.2 Core Components
LuceneBBQEncoder
Implements the `Encoder` interface. It needs very little configuration: it is set up like the SQ encoder but without SQ's three tuning parameters, and it reports its compression level as `NOT_CONFIGURED` because BBQ's compression is fixed rather than user-configurable.
public class LuceneBBQEncoder implements Encoder {
    private static final Set<VectorDataType> SUPPORTED_DATA_TYPES = ImmutableSet.of(VectorDataType.FLOAT);

    private static final MethodComponent METHOD_COMPONENT = MethodComponent.Builder.builder(ENCODER_BBQ)
        .addSupportedDataTypes(SUPPORTED_DATA_TYPES)
        .build();

    @Override
    public MethodComponent getMethodComponent() {
        return METHOD_COMPONENT;
    }

    @Override
    public CompressionLevel calculateCompressionLevel(
        MethodComponentContext methodComponentContext,
        KNNMethodConfigContext knnMethodConfigContext
    ) {
        return CompressionLevel.NOT_CONFIGURED;
    }
}
LuceneHNSWMethod
The BBQ encoder is registered the same way as the SQ encoder; its name (`binary`) is the value users supply in the mapping's encoder block:
final static Encoder SQ_ENCODER = new LuceneSQEncoder();
final static Encoder BBQ_ENCODER = new LuceneBBQEncoder();
final static Map<String, Encoder> SUPPORTED_ENCODERS = Map.of(
    SQ_ENCODER.getName(), SQ_ENCODER,
    BBQ_ENCODER.getName(), BBQ_ENCODER
);
KNNBBQVectorsFormatParams
Used for validation; it ensures the encoder in the method parameters is correctly identified as the BBQ encoder.
public class KNNBBQVectorsFormatParams extends KNNVectorsFormatParams {

    @Override
    public boolean validate(Map<String, Object> params) {
        if (params.get(METHOD_ENCODER_PARAMETER) == null) {
            return false;
        }
        if (!(params.get(METHOD_ENCODER_PARAMETER) instanceof MethodComponentContext)) {
            return false;
        }
        MethodComponentContext encoderMethodComponentContext =
            (MethodComponentContext) params.get(METHOD_ENCODER_PARAMETER);
        return ENCODER_BBQ.equals(encoderMethodComponentContext.getName());
    }
}
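A hypothetical usage sketch of the validation path (not from the plugin's test suite); it assumes `ENCODER_BBQ` resolves to `"binary"` and that `MethodComponentContext` offers a `(String name, Map<String, Object> parameters)` constructor:

```java
// Hypothetical sketch only: exercising validate() with a params map shaped like the
// parsed method parameters. The MethodComponentContext(String, Map) constructor and the
// ENCODER_BBQ / METHOD_ENCODER_PARAMETER constants are assumed from the snippets above.
Map<String, Object> methodParams = Map.of(
    METHOD_ENCODER_PARAMETER, new MethodComponentContext(ENCODER_BBQ, Map.of())
);
KNNBBQVectorsFormatParams bbqParams = new KNNBBQVectorsFormatParams(
    methodParams, /* defaultMaxConnections */ 16, /* defaultBeamWidth */ 100
);
boolean isBbq = bbqParams.validate(methodParams);  // true only for the "binary" encoder
```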
5.3 Codec Integration
BasePerFieldKnnVectorsFormat Changes
Add a check for whether the user selected the BBQ encoder:
if (engine == KNNEngine.LUCENE) {
    if (params != null && params.containsKey(METHOD_ENCODER_PARAMETER)) {
        KNNBBQVectorsFormatParams bbqParams = new KNNBBQVectorsFormatParams(
            params, defaultMaxConnections, defaultBeamWidth
        );
        if (bbqParams.validate(params)) {
            log.debug("Initialize KNN vector format for field [{}] with BBQ encoder", field);
            return bbqVectorsFormatSupplier.apply(bbqParams);
        }
        // existing SQ encoder validation follows...
    }
}
KNN9120PerFieldVectorsFormat Implementation
The concrete format supplier instantiates Lucene's BBQ format, passing through the HNSW graph parameters (max connections and beam width) together with the merge thread count and executor:
knnBBQVectorsFormatParams -> {
    final Tuple<Integer, ExecutorService> mergeThreadCountAndExecutorService =
        getMergeThreadCountAndExecutorService();
    return new Lucene102HnswBinaryQuantizedVectorsFormat(
        knnBBQVectorsFormatParams.getMaxConnections(),
        knnBBQVectorsFormatParams.getBeamWidth(),
        mergeThreadCountAndExecutorService.v1(),
        mergeThreadCountAndExecutorService.v2()
    );
}
5.4 Rescoring/Oversampling
5.4.1 Leveraging Existing Infrastructure
BBQ integrates with OpenSearch k-NN's existing rescoring framework through Lucene's built-in search mechanism. No additional rescoring classes are required as Lucene's Lucene102HnswBinaryQuantizedVectorsFormat handles rescoring automatically.
5.4.2 Automatic Search Implementation
Lucene's BBQ implementation automatically performs:
- HNSW graph traversal using binary quantized vectors for fast candidate retrieval
- Final scoring using original FP32 vectors from rawVectorDelegate for precision
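To make the two-phase flow concrete, here is a minimal, self-contained Java sketch of the general oversample-then-rescore pattern: gather extra candidates with the cheap quantized distance, then re-rank the survivors with exact float32 distance. It is illustrative only and does not reproduce Lucene's internal scorer or corrective-term math.

```java
import java.util.Comparator;
import java.util.List;

// Illustrative two-phase search: approximate (binary/Hamming) candidate retrieval,
// followed by exact float32 rescoring of the oversampled candidate set.
final class OversampleThenRescore {

    record Candidate(int docId, float[] rawVector, long[] binaryCode) {}

    static List<Integer> search(float[] query, long[] queryCode, List<Candidate> index,
                                int k, float oversampleFactor) {
        int candidates = Math.max(k, Math.round(k * oversampleFactor));

        // Phase 1: rank by cheap Hamming distance on the packed binary codes.
        List<Candidate> shortlist = index.stream()
            .sorted(Comparator.comparingInt((Candidate c) -> hamming(queryCode, c.binaryCode())))
            .limit(candidates)
            .toList();

        // Phase 2: rescore the shortlist with exact squared L2 distance on float32 vectors.
        return shortlist.stream()
            .sorted(Comparator.comparingDouble((Candidate c) -> l2Squared(query, c.rawVector())))
            .limit(k)
            .map(Candidate::docId)
            .toList();
    }

    static int hamming(long[] a, long[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) d += Long.bitCount(a[i] ^ b[i]);
        return d;
    }

    static double l2Squared(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }
}
```

In the actual integration, the query-time rescore block (shown in 5.4.3) plays the role of the oversample factor here, and the final scoring pass reads the original float32 vectors from the raw vector delegate.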
5.4.3 Query Parameter Usage
Users control BBQ rescoring through the existing k-NN query structure:
{
  "query": {
    "knn": {
      "my_vector1": {
        "vector": [0.1, 0.2, 0.3],
        "k": 10,
        "rescore": {
          "oversample_factor": 3.0
        }
      }
    }
  }
}
- Rescoring is controlled through the `oversample_factor` parameter (worked example below).
- Scalar quantization defaults to a 4x compression level; BBQ does not expose a configurable compression level, so users instead pass an oversample factor per query to compensate.
- Unlike Faiss, Lucene memory-maps the index, which allows searching without fully loading it into memory.
- Segments are larger because a `.veb` file is stored alongside the `.vemb` file, adding extra quantization information.
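For example, with `"k": 10` and `"oversample_factor": 3.0`, the quantized HNSW search gathers roughly 30 candidates, which are then rescored against the original float32 vectors so that only the best 10 are returned.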
5.4.4 Built-in Performance Optimizations
Lucene's BBQ format automatically provides:
- Selective Vector Access: Only loads FP32 vectors when needed for final scoring
- Memory Efficiency: Leverages dual storage architecture to minimize memory usage
- Optimized Scoring: Uses corrective terms and binary representations for efficient similarity computation
6. Benchmarking
Config
Config Item | Sift-128 | Cohere-768-1M |
---|---|---|
Nodes | 1 | 1 |
Primary Shards | 1 | 1 |
Replica Shards | 1 | 1 |
ef_construction | 100 | 100 |
ef_search | 100 | 100 |
m | 16 | 16 |
space type | l2 | l2 |
Metrics
Metric | Task | Sift-128, Lucene BBQ | Sift-128, Faiss BQ | Cohere-768-1M, Lucene BBQ | Cohere-768-1M, Faiss BQ | Unit |
---|---|---|---|---|---|---|
Mean recall@100 | prod-queries | 0.32 | 0.18 | 0.63 | 0.3 | |
Mean Throughput | custom-vector-bulk | 6310.84 | 9569.38 | 3011.83 | 3112.96 | docs/s |
Median Throughput | custom-vector-bulk | 6226.56 | 9582.13 | 3065.34 | 3152.94 | docs/s |
Mean Throughput | prod-queries | 1.82 | 1.61 | 1.66 | 1.69 | ops/s |
Median Throughput | prod-queries | 1.82 | 1.61 | 1.66 | 1.69 | ops/s |
50th percentile service time | prod-queries | 5.69515 | 5.71014 | 5.88795 | 5.58729 | ms |
90th percentile service time | prod-queries | 8.88647 | 7.59109 | 7.74107 | 6.48138 | ms |
99th percentile service time | prod-queries | 18.26953 | 17.57099 | 19.85139 | 17.39427 | ms |
Dataset: Cohere-10M
Metric | Task | No Oversample | Oversample Factor 1 | Oversample Factor 1.5 | Oversample Factor 2 | Oversample Factor 2.5 | Oversample Factor 3 | Oversample Factor 3.5 | Oversample Factor 4 | Oversample Factor 4.5 | Oversample Factor 5 | Unit |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Translog size | | 0.00108 | 0.00036 | 8.20E-07 | 0.00036 | 8.20E-07 | 0.00072 | 0.00036 | 0.00108 | 0.00072 | 0.00108 | GB |
Mean Throughput | prod-queries | 181 | 169.54 | 152.16 | 138.07 | 125.6 | 115.96 | 107.24 | 100.06 | 93.94 | 88.6 | ops/s |
Median Throughput | prod-queries | 181.35 | 170.12 | 152.33 | 138.36 | 125.87 | 116.11 | 107.39 | 100.28 | 94.16 | 88.78 | ops/s |
50th percentile latency, 50th percentile service time | prod-queries | 3.33675 | 3.67427 | 4.42162 | 5.10511 | 5.81401 | 6.49634 | 7.18395 | 7.81621 | 8.47487 | 9.09765 | ms |
Mean recall@k | prod-queries | 0.62 | 0.87 | 0.9 | 0.92 | 0.94 | 0.95 | 0.95 | 0.96 | 0.96 | 0.97 | |
Mean recall@1 | prod-queries | 0.57 | 0.94 | 0.96 | 0.96 | 0.97 | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 | |
Without any oversampling, there is a clear recall improvement when using BBQ compared with the existing Faiss binary quantization, with only minor effects on latency and throughput. With rescoring enabled, BBQ shows a significant further recall improvement, reaching recall of 0.95 and above at oversample factors of 3 and higher.