1. Problem Statement
This project aims to integrate BBQ into the OpenSearch k-NN plugin to offer users a memory-efficient alternative, ideal for large-scale vector workloads in constrained compute environments.
The OpenSearch k-NN plugin currently supports two engines, each with distinct quantization capabilities:
- FAISS Engine: Supports Product Quantization, Scalar Quantization, Binary Quantization, and raw float32 storage
- Lucene Engine: Supports scalar quantization (7-bit) byte vectors, and raw float32 storage
While both engines offer quantization options, OpenSearch does not expose Lucene's Better Binary Quantization (BBQ), which offers a better compression-to-quality trade-off. BBQ is available in the underlying Lucene library but is not surfaced through OpenSearch's k-NN plugin.
This project addresses that gap by integrating Lucene's BBQ into the k-NN plugin, giving users a new memory-efficient quantization option.
With this change, binary quantization recall without oversampling improves from 0.18 with Faiss Binary Quantization to 0.32 with BBQ on the sift-128 dataset, and from 0.3 to 0.63 on the Cohere-1M dataset.
2. Overview
2.1 Introduction
Better Binary Quantization is a vector compression technique developed in Lucene that focuses on reducing memory usage for approximate nearest neighbor search. It encodes high-dimensional float vectors into compact binary representations using quantization methods optimized for search efficiency. BBQ was developed with RaBitQ as a reference, drawing on its approach of combining product-quantization ideas with binary encoding.
2.2 Understanding Better Binary Quantization
Technical Process:
The process works as follows: compute the global centroid of the vector set and center each vector around it. After centering, basic statistics are calculated and an optimization step chooses the quantization intervals, which are ultimately used to produce the compact binary representation of each vector.
Compared to naive binary quantization, the centroid calculation is identical. Naive binary quantization, however, does not calculate statistics, optimize intervals, or retain corrective factors: once a centered vector has been computed, it is quantized directly without any optimization.
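To make the contrast concrete, here is a minimal, self-contained Java sketch of the naive baseline: center each vector on the global centroid and keep only the sign of each component, packed into a bit array. It is purely illustrative and is not Lucene's BBQ code; BBQ additionally optimizes the quantization intervals and stores per-vector corrective factors so distances computed on the binary codes can be adjusted toward the true float distances.

```java
// Illustrative only: naive sign-based binary quantization around a global centroid.
// Lucene's BBQ goes further (interval optimization, corrective factors), but the
// centering step shown here is common to both approaches.
final class NaiveBinaryQuantizer {

    /** Component-wise mean of all vectors, used as the global centroid. */
    static float[] centroid(float[][] vectors) {
        int dim = vectors[0].length;
        float[] c = new float[dim];
        for (float[] v : vectors) {
            for (int i = 0; i < dim; i++) {
                c[i] += v[i];
            }
        }
        for (int i = 0; i < dim; i++) {
            c[i] /= vectors.length;
        }
        return c;
    }

    /** Center the vector on the centroid, then keep one bit per dimension (the sign). */
    static long[] quantize(float[] vector, float[] centroid) {
        long[] bits = new long[(vector.length + 63) / 64];
        for (int i = 0; i < vector.length; i++) {
            if (vector[i] - centroid[i] > 0f) {
                bits[i >> 6] |= 1L << (i & 63);
            }
        }
        return bits;
    }

    /** Hamming distance between packed binary codes: a cheap proxy for vector distance. */
    static int hammingDistance(long[] a, long[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) {
            d += Long.bitCount(a[i] ^ b[i]);
        }
        return d;
    }
}
```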
2.3 Comparison of Quantization Methods
Method | Compression Ratio | Recall | Memory Usage |
---|---|---|---|
Scalar Quantization | 2x, 4x | Great | Moderate |
Faiss Binary Quantization | 8x, 16x, 32x | Moderate | Low |
Lucene Better Binary Quantization | 32x | Good | Low |
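To put the 32x figure in concrete terms: a 768-dimensional float32 vector occupies 768 × 4 = 3,072 bytes, while a 1-bit-per-dimension binary code packs into 96 bytes, a 32x reduction before accounting for the small per-vector corrective terms that BBQ also stores.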
3. Proposed Solution
- Lucene BBQ Integration: Incorporate Lucene's `Lucene102BinaryQuantizedVectorsFormat` into OpenSearch.
- Query Pipeline: Ensure BBQ-based search scoring is invoked properly.
- Parameterization: Allow customizable BBQ configurations via mappings.
4. User Interface
Users will configure BBQ through the field mapping:
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 2,
        "space_type": "l2",
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "parameters": {
            "encoder": {
              // CHANGED PARAMETER
              "name": "binary"
            },
            "ef_construction": 256,
            "m": 8
          }
        }
      }
    }
  }
}
5. Implementation Details
5.1 Architecture Flow

- User specifies `"encoder": {"name": "binary"}` in the index mapping
- `LuceneHNSWMethod` resolves the BBQ encoder from the `SUPPORTED_ENCODERS` map
- `BasePerFieldKnnVectorsFormat` validates BBQ parameters via `KNNBBQVectorsFormatParams`
- Codec instantiates `Lucene102HnswBinaryQuantizedVectorsFormat` for BBQ-enabled fields
- Lucene's native BBQ implementation handles vector quantization and storage
5.2 Core Components
LuceneBBQEncoder
Implements the `Encoder` interface. It needs very little configuration: it is set up like the SQ encoder but without SQ's three tuning parameters, and it reports its compression level as `NOT_CONFIGURED` because BBQ's compression is fixed rather than user-configurable.
public class LuceneBBQEncoder implements Encoder {
    private static final Set<VectorDataType> SUPPORTED_DATA_TYPES = ImmutableSet.of(VectorDataType.FLOAT);

    private static final MethodComponent METHOD_COMPONENT = MethodComponent.Builder.builder(ENCODER_BBQ)
        .addSupportedDataTypes(SUPPORTED_DATA_TYPES)
        .build();

    @Override
    public MethodComponent getMethodComponent() {
        return METHOD_COMPONENT;
    }

    @Override
    public CompressionLevel calculateCompressionLevel(
        MethodComponentContext methodComponentContext,
        KNNMethodConfigContext knnMethodConfigContext
    ) {
        return CompressionLevel.NOT_CONFIGURED;
    }
}
LuceneHNSWMethod
The BBQ encoder is registered the same way as the SQ encoder; its name (`binary`) is the value users supply in the mapping's encoder block:
final static Encoder SQ_ENCODER = new LuceneSQEncoder();
final static Encoder BBQ_ENCODER = new LuceneBBQEncoder();
final static Map<String, Encoder> SUPPORTED_ENCODERS = Map.of(
    SQ_ENCODER.getName(), SQ_ENCODER,
    BBQ_ENCODER.getName(), BBQ_ENCODER
);
KNNBBQVectorsFormatParams
Used for validation; it ensures the encoder in the method parameters is correctly identified as the BBQ encoder.
public class KNNBBQVectorsFormatParams extends KNNVectorsFormatParams {

    @Override
    public boolean validate(Map<String, Object> params) {
        if (params.get(METHOD_ENCODER_PARAMETER) == null) {
            return false;
        }
        if (!(params.get(METHOD_ENCODER_PARAMETER) instanceof MethodComponentContext)) {
            return false;
        }
        MethodComponentContext encoderMethodComponentContext =
            (MethodComponentContext) params.get(METHOD_ENCODER_PARAMETER);
        return ENCODER_BBQ.equals(encoderMethodComponentContext.getName());
    }
}
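A hypothetical usage sketch of the validation path (not from the plugin's test suite); it assumes `ENCODER_BBQ` resolves to `"binary"` and that `MethodComponentContext` offers a `(String name, Map<String, Object> parameters)` constructor:

```java
// Hypothetical sketch only: exercising validate() with a params map shaped like the
// parsed method parameters. The MethodComponentContext(String, Map) constructor and the
// ENCODER_BBQ / METHOD_ENCODER_PARAMETER constants are assumed from the snippets above.
Map<String, Object> methodParams = Map.of(
    METHOD_ENCODER_PARAMETER, new MethodComponentContext(ENCODER_BBQ, Map.of())
);
KNNBBQVectorsFormatParams bbqParams = new KNNBBQVectorsFormatParams(
    methodParams, /* defaultMaxConnections */ 16, /* defaultBeamWidth */ 100
);
boolean isBbq = bbqParams.validate(methodParams);  // true only for the "binary" encoder
```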
5.3 Codec Integration
BasePerFieldKnnVectorsFormat Changes
Add a check for whether the user selected the BBQ encoder:
if (engine == KNNEngine.LUCENE) {
    if (params != null && params.containsKey(METHOD_ENCODER_PARAMETER)) {
        KNNBBQVectorsFormatParams bbqParams = new KNNBBQVectorsFormatParams(
            params, defaultMaxConnections, defaultBeamWidth
        );
        if (bbqParams.validate(params)) {
            log.debug("Initialize KNN vector format for field [{}] with BBQ encoder", field);
            return bbqVectorsFormatSupplier.apply(bbqParams);
        }
        // existing SQ encoder validation follows...
    }
}
KNN9120PerFieldVectorsFormat Implementation
The concrete format supplier instantiates Lucene's BBQ format, passing through the HNSW graph parameters (max connections and beam width) together with the merge thread count and executor:
knnBBQVectorsFormatParams -> {
    final Tuple<Integer, ExecutorService> mergeThreadCountAndExecutorService =
        getMergeThreadCountAndExecutorService();
    return new Lucene102HnswBinaryQuantizedVectorsFormat(
        knnBBQVectorsFormatParams.getMaxConnections(),
        knnBBQVectorsFormatParams.getBeamWidth(),
        mergeThreadCountAndExecutorService.v1(),
        mergeThreadCountAndExecutorService.v2()
    );
}
5.4 Rescoring/Oversampling
5.4.1 Leveraging Existing Infrastructure
BBQ integrates with OpenSearch k-NN's existing rescoring framework through Lucene's built-in search mechanism. No additional rescoring classes are required as Lucene's Lucene102HnswBinaryQuantizedVectorsFormat handles rescoring automatically.
5.4.2 Automatic Search Implementation
Lucene's BBQ implementation automatically performs:
- HNSW graph traversal using binary quantized vectors for fast candidate retrieval
- Final scoring using original FP32 vectors from rawVectorDelegate for precision
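To make the two-phase flow concrete, here is a minimal, self-contained Java sketch of the general oversample-then-rescore pattern: gather extra candidates with the cheap quantized distance, then re-rank the survivors with exact float32 distance. It is illustrative only and does not reproduce Lucene's internal scorer or corrective-term math.

```java
import java.util.Comparator;
import java.util.List;

// Illustrative two-phase search: approximate (binary/Hamming) candidate retrieval,
// followed by exact float32 rescoring of the oversampled candidate set.
final class OversampleThenRescore {

    record Candidate(int docId, float[] rawVector, long[] binaryCode) {}

    static List<Integer> search(float[] query, long[] queryCode, List<Candidate> index,
                                int k, float oversampleFactor) {
        int candidates = Math.max(k, Math.round(k * oversampleFactor));

        // Phase 1: rank by cheap Hamming distance on the packed binary codes.
        List<Candidate> shortlist = index.stream()
            .sorted(Comparator.comparingInt((Candidate c) -> hamming(queryCode, c.binaryCode())))
            .limit(candidates)
            .toList();

        // Phase 2: rescore the shortlist with exact squared L2 distance on float32 vectors.
        return shortlist.stream()
            .sorted(Comparator.comparingDouble((Candidate c) -> l2Squared(query, c.rawVector())))
            .limit(k)
            .map(Candidate::docId)
            .toList();
    }

    static int hamming(long[] a, long[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) d += Long.bitCount(a[i] ^ b[i]);
        return d;
    }

    static double l2Squared(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }
}
```

In the actual integration, the query-time rescore block (shown in 5.4.3) plays the role of the oversample factor here, and the final scoring pass reads the original float32 vectors from the raw vector delegate.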
5.4.3 Query Parameter Usage
Users control BBQ rescoring through the existing k-NN query structure:
{
  "query": {
    "knn": {
      "my_vector1": {
        "vector": [0.1, 0.2, 0.3],
        "k": 10,
        "rescore": {
          "oversample_factor": 3.0
        }
      }
    }
  }
}
- Rescoring is controlled through the `oversample_factor` parameter (worked example below).
- Scalar quantization defaults to a 4x compression level; BBQ does not expose a configurable compression level, so users instead pass an oversample factor per query to compensate.
- Unlike Faiss, Lucene memory-maps the index, which allows searching without fully loading it into memory.
- Segments are larger because a `.veb` file is stored alongside the `.vemb` file, adding extra quantization information.
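For example, with `"k": 10` and `"oversample_factor": 3.0`, the quantized HNSW search gathers roughly 30 candidates, which are then rescored against the original float32 vectors so that only the best 10 are returned.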
5.4.4 Built-in Performance Optimizations
Lucene's BBQ format automatically provides:
- Selective Vector Access: Only loads FP32 vectors when needed for final scoring
- Memory Efficiency: Leverages dual storage architecture to minimize memory usage
- Optimized Scoring: Uses corrective terms and binary representations for efficient similarity computation
6. Benchmarking
Config
Config Item | Sift-128 | Cohere-768-1M |
---|---|---|
Nodes | 1 | 1 |
Primary Shards | 1 | 1 |
Replica Shards | 1 | 1 |
ef_construction | 100 | 100 |
ef_search | 100 | 100 |
m | 16 | 16 |
space type | l2 | l2 |
Metrics
Metric | Task | Sift-128, Lucene BBQ | Sift-128, Faiss BQ | Cohere-768-1M, Lucene BBQ | Cohere-768-1M, Faiss BQ | Unit |
---|---|---|---|---|---|---|
Mean recall@100 | prod-queries | 0.32 | 0.18 | 0.63 | 0.3 | |
Mean Throughput | custom-vector-bulk | 6310.84 | 9569.38 | 3011.83 | 3112.96 | docs/s |
Median Throughput | custom-vector-bulk | 6226.56 | 9582.13 | 3065.34 | 3152.94 | docs/s |
Mean Throughput | prod-queries | 1.82 | 1.61 | 1.66 | 1.69 | ops/s |
Median Throughput | prod-queries | 1.82 | 1.61 | 1.66 | 1.69 | ops/s |
50th percentile service time | prod-queries | 5.69515 | 5.71014 | 5.88795 | 5.58729 | ms |
90th percentile service time | prod-queries | 8.88647 | 7.59109 | 7.74107 | 6.48138 | ms |
99th percentile service time | prod-queries | 18.26953 | 17.57099 | 19.85139 | 17.39427 | ms |
Dataset: Cohere-10M
Metric | Task | No Oversample | Oversample Factor 1 | Oversample Factor 1.5 | Oversample Factor 2 | Oversample Factor 2.5 | Oversample Factor 3 | Oversample Factor 3.5 | Oversample Factor 4 | Oversample Factor 4.5 | Oversample Factor 5 | Unit |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Translog size | | 0.00108 | 0.00036 | 8.20E-07 | 0.00036 | 8.20E-07 | 0.00072 | 0.00036 | 0.00108 | 0.00072 | 0.00108 | GB |
Mean Throughput | prod-queries | 181 | 169.54 | 152.16 | 138.07 | 125.6 | 115.96 | 107.24 | 100.06 | 93.94 | 88.6 | ops/s |
Median Throughput | prod-queries | 181.35 | 170.12 | 152.33 | 138.36 | 125.87 | 116.11 | 107.39 | 100.28 | 94.16 | 88.78 | ops/s |
50th percentile latency, 50th percentile service time | prod-queries | 3.33675 | 3.67427 | 4.42162 | 5.10511 | 5.81401 | 6.49634 | 7.18395 | 7.81621 | 8.47487 | 9.09765 | ms |
Mean recall@k | prod-queries | 0.62 | 0.87 | 0.9 | 0.92 | 0.94 | 0.95 | 0.95 | 0.96 | 0.96 | 0.97 | |
Mean recall@1 | prod-queries | 0.57 | 0.94 | 0.96 | 0.96 | 0.97 | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 | |
Without any oversampling, there is a clear recall improvement when using BBQ compared with the existing Faiss binary quantization, with only minor effects on latency and throughput. With rescoring enabled, BBQ shows a significant further recall improvement, reaching recall of 0.95 and above at oversample factors of 3 and higher.