support IVF index

An IVF index is a partition-based index that consists of a set of inverted buckets. At retrieval time, the inverted buckets covering a certain amount of data are selected and then scanned to obtain candidate points, from which the final nearest neighbors are chosen.

Unlike HNSW-style algorithms, IVF often requires a certain amount of data to train the bucketing. The usual approach is to run K-means clustering to generate K centroids. For bucket selection at query time, we support building an index over these K centroids, such as a graph index, for routing.
The vectors in a bucket support multiple encoding methods. Because the data in a bucket is stored contiguously, the access overhead is relatively low, but the total amount of computation is higher than that of graph algorithms with similar configurations.
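As a rough illustration of the flow described above (train centroids, assign vectors to buckets, then scan only the closest buckets at query time), here is a minimal pure-Python sketch. It is not the project's implementation: the function names, the `nprobe` parameter, and the brute-force centroid routing are all assumptions made for the example, and a real index would route through an index over the centroids and scan encoded vectors.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, v):
    """Position of the centroid closest to v (brute force)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], v))


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        for i, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(c) / len(g) for c in zip(*g)]
    return centroids


def build_ivf(points, k):
    """Train centroids, then place each point id in its closest bucket."""
    centroids = kmeans(points, k)
    buckets = [[] for _ in range(k)]
    for idx, p in enumerate(points):
        buckets[nearest(centroids, p)].append(idx)
    return centroids, buckets


def search_ivf(points, centroids, buckets, query, topk, nprobe):
    """Scan the nprobe closest buckets and return the topk closest ids."""
    order = sorted(range(len(centroids)),
                   key=lambda i: sq_dist(centroids[i], query))
    candidates = [idx for b in order[:nprobe] for idx in buckets[b]]
    candidates.sort(key=lambda idx: sq_dist(points[idx], query))
    return candidates[:topk]
```

With `nprobe` equal to the number of buckets the search degenerates to an exact scan; smaller values trade recall for less computation.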

Below we introduce the basic design framework of the IVF index:

1. Construct `BucketDatacell` to manage the data in the buckets (excluding the centroids).

2. Construct a data structure called `Router` to manage the centroids and the corresponding routing method. A simple `Router` implementation consists of just the centroids. They are classically built with k-means, but importing them from an external source is also supported. The `Router` can contain an Index entity.

3. IVF also supports a reordering mechanism.
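To make the three components above more concrete, here is a hedged Python sketch of how they could fit together. Only the names `BucketDatacell` and `Router` come from this issue; every method, parameter (`encode`, `nprobe`, `reorder_factor`), and the choice to reorder by re-scoring a shortlist against full-precision vectors are assumptions for illustration, not the planned API.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class BucketDatacell:
    """Stores ids and encoded vectors per bucket; centroids live in Router."""

    def __init__(self, bucket_count, encode=lambda v: v):
        self.encode = encode  # hypothetical hook for the encoding method
        self.ids = [[] for _ in range(bucket_count)]
        self.codes = [[] for _ in range(bucket_count)]

    def insert(self, bucket_id, vec_id, vector):
        self.ids[bucket_id].append(vec_id)
        self.codes[bucket_id].append(self.encode(vector))

    def scan(self, bucket_id, query):
        """Return (approximate distance, id) pairs for one bucket."""
        return [(sq_dist(code, query), vid)
                for vid, code in zip(self.ids[bucket_id], self.codes[bucket_id])]


class Router:
    """Holds the centroids; here routing is a brute-force scan over them.
    The centroids may come from k-means or be imported from outside."""

    def __init__(self, centroids):
        self.centroids = centroids

    def route(self, vector, nprobe=1):
        order = sorted(range(len(self.centroids)),
                       key=lambda i: sq_dist(self.centroids[i], vector))
        return order[:nprobe]


def search(router, datacell, raw_vectors, query, topk, nprobe, reorder_factor=2):
    """Scan the routed buckets on encoded data, then reorder a shortlist
    using the full-precision vectors (assumed meaning of 'reordering')."""
    approx = []
    for bucket_id in router.route(query, nprobe):
        approx.extend(datacell.scan(bucket_id, query))
    approx.sort()
    shortlist = [vid for _, vid in approx[: topk * reorder_factor]]
    shortlist.sort(key=lambda vid: sq_dist(raw_vectors[vid], query))
    return shortlist[:topk]
```

In this sketch a lossy `encode` (for example, rounding each component) can make several vectors in a bucket look equally close; the reorder step is what separates them again, at the cost of touching the raw vectors for the shortlist only.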

Tasks:
- [x] implement bucket datacell
- [x] implement partition strategy
- [ ] support search

support IVF index #276

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

support IVF index #276

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions