feat: add VectorIndex and VectorIndexConfiguration for vector search #43
Add support for vector search indexes in Couchbase Lite:
- VectorIndex class for QueryBuilder-style index creation
- VectorIndexConfiguration for configuration-based index creation
- DistanceMetric enum (cosine, euclidean, euclideanSquared, dot)
- VectorEncoding class (none, scalarQuantizer, productQuantizer)
- ScalarQuantizerType enum (SQ4, SQ6, SQ8)

This enables APPROX_VECTOR_DISTANCE() queries for similarity search.

Ref: Couchbase-Ecosystem/cbl-reactnative#86
Summary of Changes
Hello @Fujio-Turner, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances Couchbase Lite React Native by integrating robust vector search capabilities. It introduces new classes and enums that empower developers to define and manage vector indexes with fine-grained control over distance metrics, vector encoding, and other crucial parameters. This addition facilitates efficient approximate nearest neighbor queries, opening up possibilities for advanced similarity search functionalities within applications.
Code Review
This pull request introduces support for vector search indexes by adding VectorIndex and VectorIndexConfiguration. The implementation is well-structured, but there are a few critical issues to address. The scalar quantization logic is currently non-functional as it ignores the quantization level parameter. There is also significant code duplication between VectorIndex and VectorIndexConfiguration which has led to inconsistencies like missing validation. Finally, a magic number is used for the index type instead of updating the IndexType enum. Addressing these points will improve the correctness and maintainability of the new feature.
static scalarQuantizer(type: ScalarQuantizerType = ScalarQuantizerType.SQ8): VectorEncoding {
  return new VectorEncoding('SQ', undefined, undefined);
}
The type parameter is unused. This means the specific scalar quantization level (SQ4, SQ6, SQ8) is lost, and a generic 'SQ' type is used instead. This prevents users from properly configuring the index encoding. This method should use the type parameter to create the VectorEncoding. Note that this change will also require an update to the toJson() method to handle the new types.
static scalarQuantizer(type: ScalarQuantizerType = ScalarQuantizerType.SQ8): VectorEncoding {
  if (type === ScalarQuantizerType.NONE) {
    return VectorEncoding.none();
  }
  return new VectorEncoding(type);
}

toJson(): any {
  if (this._type === 'none') {
    return { type: 'none' };
  } else if (this._type === 'SQ') {
    return { type: 'SQ' };
  } else {
    return {
      type: 'PQ',
      bits: this._bits,
      subquantizers: this._subquantizers,
    };
  }
}
Following the required change in scalarQuantizer, this toJson method will fail because this._type will be SQ4, SQ6, or SQ8, not 'SQ'. The method doesn't correctly serialize the specific scalar quantization type and needs to be updated to handle these specific types.
toJson(): any {
  if (this._type === 'none') {
    return { type: 'none' };
  } else if (this._type.startsWith('SQ')) {
    return { type: this._type };
  } else {
    return {
      type: 'PQ',
      bits: this._bits,
      subquantizers: this._subquantizers,
    };
  }
}

type(): IndexType {
  // Vector = 3 in IndexType enum
  return 3 as IndexType;
}
The type() method returns a magic number 3 cast to IndexType. The comment indicates this should correspond to a Vector index type, but the IndexType enum in abstract-index.ts has not been updated to include a Vector member. Using a magic number is brittle and defeats the purpose of the enum. The IndexType enum should be updated to include Vector = 3. While abstract-index.ts may not be part of this PR, this is an important issue to address for code correctness and maintainability.
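As a rough sketch of the enum-based alternative: the contents of abstract-index.ts are not visible in this PR, so the `Value` and `FullText` members and the minimal `AbstractIndex` base class below are hypothetical placeholders. Only `Vector = 3` comes from the comment above.

```typescript
// Hypothetical stand-in for the enum in abstract-index.ts.
// Only `Vector = 3` is taken from the review; the other members are assumed.
enum IndexType {
  Value = 0,
  FullText = 1,
  Vector = 3,
}

// Hypothetical minimal base class standing in for the real AbstractIndex.
abstract class AbstractIndex {
  abstract type(): IndexType;
}

class VectorIndex extends AbstractIndex {
  type(): IndexType {
    // Named enum member instead of `3 as IndexType`.
    return IndexType.Vector;
  }
}
```

With a named member, the compiler rejects a mistyped or removed index type, whereas the cast accepts any number silently.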
export enum ScalarQuantizerType {
  /** No quantization - full precision */
  NONE = 'none',
  /** 4-bit quantization */
  SQ4 = 'SQ4',
  /** 6-bit quantization */
  SQ6 = 'SQ6',
  /** 8-bit quantization (recommended) */
  SQ8 = 'SQ8',
}
The ScalarQuantizerType enum includes a NONE member. This is confusing because there is also a VectorEncoding.none() static method, suggesting two ways to specify no encoding. To improve API clarity, consider removing NONE from this enum and guiding users to use VectorEncoding.none() exclusively for uncompressed vectors.
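One way this could look, as a sketch only: the `VectorEncoding` class below is a simplified, hypothetical stand-in for the one in the PR, and the serialized shape (`{ type: 'SQ8' }` and so on) is an assumption based on the suggestions in this review, not a confirmed wire format.

```typescript
// Sketch: the enum without a NONE member, as suggested above.
enum ScalarQuantizerType {
  SQ4 = 'SQ4',
  SQ6 = 'SQ6',
  SQ8 = 'SQ8',
}

// Simplified, hypothetical stand-in for the PR's VectorEncoding class.
class VectorEncoding {
  private readonly _type: string;
  private readonly _bits?: number;
  private readonly _subquantizers?: number;

  private constructor(type: string, bits?: number, subquantizers?: number) {
    this._type = type;
    this._bits = bits;
    this._subquantizers = subquantizers;
  }

  // The single way to request uncompressed vectors.
  static none(): VectorEncoding {
    return new VectorEncoding('none');
  }

  // The quantizer level is preserved instead of collapsing to a generic 'SQ'.
  static scalarQuantizer(type: ScalarQuantizerType = ScalarQuantizerType.SQ8): VectorEncoding {
    return new VectorEncoding(type);
  }

  static productQuantizer(subquantizers: number, bits: number): VectorEncoding {
    return new VectorEncoding('PQ', bits, subquantizers);
  }

  toJson(): { type: string; bits?: number; subquantizers?: number } {
    if (this._type === 'PQ') {
      return { type: 'PQ', bits: this._bits, subquantizers: this._subquantizers };
    }
    // 'none', 'SQ4', 'SQ6' and 'SQ8' serialize as their own type string (assumed format).
    return { type: this._type };
  }
}
```

Because `NONE` is no longer an enum member, `scalarQuantizer()` needs no special case, and callers have exactly one spelling for "no encoding".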
 * await collection.createIndex('my_vector_idx', index);
 * ```
 */
export class VectorIndex extends AbstractIndex {
The VectorIndex class duplicates a significant amount of code from VectorIndexConfiguration, including properties, default values, and the toJson() method. This duplication can lead to maintenance issues. For instance, the constructor of VectorIndexConfiguration validates dimensions and centroids, but this validation is missing in the VectorIndex constructor. To improve maintainability and ensure consistency, consider refactoring to eliminate this duplication. One approach is to use composition by creating a shared helper class that holds the common configuration, validation, and serialization logic.
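A minimal sketch of the composition approach, under stated assumptions: `VectorIndexSettings` and `VectorIndexOptions` are hypothetical names, the two classes are reduced to the shared parts, and the numeric bounds (dimensions 2 to 4096, centroids 1 to 64000) are assumed limits that should be checked against the Couchbase Lite documentation and the existing VectorIndexConfiguration constructor.

```typescript
// Hypothetical shared options; the field names follow those mentioned in the review.
interface VectorIndexOptions {
  expression: string;
  dimensions: number;
  centroids: number;
}

// Hypothetical helper that owns validation and serialization in one place.
class VectorIndexSettings {
  readonly expression: string;
  readonly dimensions: number;
  readonly centroids: number;

  constructor(options: VectorIndexOptions) {
    // Assumed bounds; confirm against the Couchbase Lite documentation.
    VectorIndexSettings.checkRange('dimensions', options.dimensions, 2, 4096);
    VectorIndexSettings.checkRange('centroids', options.centroids, 1, 64000);
    this.expression = options.expression;
    this.dimensions = options.dimensions;
    this.centroids = options.centroids;
  }

  private static checkRange(name: string, value: number, min: number, max: number): void {
    if (value % 1 !== 0 || value < min || value > max) {
      throw new RangeError(`${name} must be an integer between ${min} and ${max}`);
    }
  }

  toJson(): Record<string, unknown> {
    return {
      expression: this.expression,
      dimensions: this.dimensions,
      centroids: this.centroids,
    };
  }
}

// Both public classes delegate to the helper, so they cannot drift apart.
class VectorIndex {
  private readonly settings: VectorIndexSettings;
  constructor(options: VectorIndexOptions) {
    this.settings = new VectorIndexSettings(options);
  }
  toJson(): Record<string, unknown> {
    return this.settings.toJson();
  }
}

class VectorIndexConfiguration {
  private readonly settings: VectorIndexSettings;
  constructor(options: VectorIndexOptions) {
    this.settings = new VectorIndexSettings(options);
  }
  toJson(): Record<string, unknown> {
    return this.settings.toJson();
  }
}
```

With this shape, adding a new option or tightening a validation rule happens once in the helper and applies to both entry points.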
- Add vector-index.ts export to index.ts
- Add Vector enum to IndexType
- Add vectorIndex() method to IndexBuilder
Summary
Add support for vector search indexes in Couchbase Lite React Native.

New Files
- cblite/src/vector-index.ts - Vector index classes and enums

Features
- VectorIndex class for QueryBuilder-style index creation
- VectorIndexConfiguration for configuration-based index creation
- DistanceMetric enum (cosine, euclidean, euclideanSquared, dot)
- VectorEncoding class (none, scalarQuantizer, productQuantizer)
- ScalarQuantizerType enum (SQ4, SQ6, SQ8)

Usage
This enables APPROX_VECTOR_DISTANCE() queries for similarity search.

Related: Couchbase-Ecosystem/cbl-reactnative#86