Skip to content

support IVF index #276

Open
Open
@wxyucs

Description

IVF index is a partition type index, which consists of a set of inverted buckets. In the retrieval stage, an inverted bucket with a certain amount of data is selected, and then scanned in the bucket to obtain several candidate points that finally meet the nearest neighbors.

Compared with HNSW type algorithms, IVF often requires a certain amount of data for training bucketing. The usual bucketing method is to use K-means clustering to generate K centroids. For the bucket selection strategy during query, we support indexing these K centroids, such as graph indexing for routing.
The vectors in the bucket support multiple encoding methods. Due to the continuous arrangement of data, the access overhead is relatively low, but the total computational effort is higher than that of graph algorithms with similar configurations.

Below we will introduce the basic design framework of IVF index

  1. Construct BucketDatacell to manage the data in the bucket (excluding the centroid)

  2. Construct a data structure called Router to manage the centroid and the corresponding routing method. A simple implementation of Router is composed of centroids. Its classic construction method is k-means. Of course, it also supports importing from the outside. It can contain an Index entity

  3. IVF also supports a reordering mechanism

The relevant Pull Requests are as follows:

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions