
[Feature] Skip adding flat vector storage during CAGRA to HNSW graph conversion #4931

@rchitale7

Description


Background/Motivation

The OpenSearch k-NN plugin supports building vector indexes using a GPU-accelerated remote index build service. (Reference blog post: https://opensearch.org/blog/GPU-Accelerated-Vector-Search-OpenSearch-New-Frontier/)

The remote index build service performs the following steps:

  1. Download vectors from remote object storage
  2. Build a CAGRA graph using GPUs
  3. Convert the CAGRA graph to an HNSW graph
  4. Serialize the HNSW graph via faiss.write_index
  5. Upload the serialized HNSW graph to remote object storage

Once the HNSW graph has been uploaded, the k-NN plugin can download and search it as if it had been built on a CPU.

One of the main limitations is the amount of CPU memory consumed during step 3. The HNSW object contains both the flat vector storage and the graph structure. The flat vector storage is loaded here:

index->storage->add(n_train, train_dataset);

However, the k-NN plugin already has a copy of the vectors before it sends them to the remote index build service. In theory, the flat vector storage does not need to be loaded into the HNSW object; instead, the k-NN plugin can stitch the graph structure together with its own copy of the flat vectors at search time. With this approach, memory consumption on the remote build service is reduced by ~50%. Upload and download performance also improves, since the serialized HNSW index is much smaller.
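As a back-of-envelope check on the savings (float32 vectors and d = 128 are illustrative assumptions, not figures from this issue): during step 3 the build service holds both the downloaded training vectors and the copy that copyTo adds to the HNSW flat storage, so skipping the second copy roughly halves the vector memory:

```cpp
#include <cstddef>

// Illustrative dimensionality; real workloads vary.
constexpr std::size_t kDim = 128;

// Bytes of vector memory held during conversion when the flat storage
// copy is kept: train_dataset + index->storage each hold n * d floats.
constexpr std::size_t bytes_with_storage(std::size_t n) {
    return 2 * n * kDim * sizeof(float);
}

// Bytes held when the storage copy is skipped: train_dataset only.
constexpr std::size_t bytes_skip_storage(std::size_t n) {
    return n * kDim * sizeof(float);
}
```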

Proposal

There may be other use cases besides OpenSearch k-NN GPU acceleration that follow this architecture, where the storage is managed separately from the converted IndexHNSWCagra graph.

Thus, I propose changing the GpuIndexCagra::copyTo signature like so:

void copyTo(faiss::IndexHNSWCagra* index, bool skip_storage = false) const;

where, if skip_storage is true, the code skips adding the vector storage to the index. This is analogous to the IO_FLAG_SKIP_STORAGE flag in faiss.write_index:

if (io_flags & IO_FLAG_SKIP_STORAGE) {

When copyTo is called with skip_storage = true and write_index is then called with IO_FLAG_SKIP_STORAGE, the serialized HNSW graph will not contain the flat vector storage. skip_storage defaults to false for backwards compatibility.
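The default-argument pattern can be sketched with mock types (MockCagraIndex and MockHnswIndex below are illustrative stand-ins, not faiss's GpuIndexCagra / IndexHNSWCagra):

```cpp
#include <vector>

// Illustrative stand-in for the CPU-side HNSW index.
struct MockHnswIndex {
    std::vector<int> graph_links;    // graph structure, always copied
    std::vector<float> flat_storage; // vector data, optionally skipped
};

// Illustrative stand-in for the GPU-side CAGRA index.
struct MockCagraIndex {
    std::vector<int> links;
    std::vector<float> vectors;

    // skip_storage defaults to false, so existing callers keep the old
    // behavior; true leaves the flat storage empty so the caller can
    // stitch in its own copy of the vectors at search time.
    void copyTo(MockHnswIndex* index, bool skip_storage = false) const {
        index->graph_links = links;
        if (!skip_storage) {
            index->flat_storage = vectors; // analogous to storage->add(...)
        }
    }
};
```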

I also propose an analogous change to GpuIndexBinaryCagra::copyTo:

void copyTo(faiss::IndexBinaryHNSWCagra* index, bool skip_storage = false) const;

and adding support for io_flags in write_index_binary:

void write_index_binary(const IndexBinary* idx, const char* fname, int io_flags = 0);
void write_index_binary(const IndexBinary* idx, FILE* f, int io_flags = 0);
void write_index_binary(const IndexBinary* idx, IOWriter* writer, int io_flags = 0);
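A minimal sketch of how the defaulted io_flags could thread through the overload chain (SketchIndex and SketchWriter are mock types, not faiss's; only the forwarding pattern is the point):

```cpp
#include <cstdio>
#include <string>

// Mock index and writer; real code would use IndexBinary and IOWriter.
struct SketchIndex { std::string graph, storage; };
struct SketchWriter { std::string out; };

// Innermost overload: the only place io_flags is interpreted.
inline void write_binary_sketch(const SketchIndex* idx, SketchWriter* w,
                                int io_flags = 0) {
    const int kSkipStorage = 1; // hypothetical IO_FLAG_SKIP_STORAGE bit
    w->out = idx->graph;
    if (!(io_flags & kSkipStorage)) {
        w->out += idx->storage;
    }
}

// The FILE* overload just builds a writer and forwards the flags; the
// io_flags = 0 default keeps every existing call site compiling unchanged.
inline void write_binary_sketch(const SketchIndex* idx, FILE* f,
                                int io_flags = 0) {
    SketchWriter w;
    write_binary_sketch(idx, &w, io_flags);
    std::fwrite(w.out.data(), 1, w.out.size(), f);
}
```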

Within copyTo, I am thinking we could add a check for skip_storage=true before this line:

if (numeric_type_ == NumericType::Float32) {

Then, in write_index, we need to make sure io_flags is forwarded in the recursive call:

write_index(idxmap->index, f);
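To illustrate why the forwarding matters, here is a minimal sketch (mock types, not faiss's): if a wrapper index serializes its inner index with the default flags instead of passing io_flags down, the inner index's storage is written even when the caller asked to skip it.

```cpp
#include <string>

// Mock stand-ins for an inner index and a wrapper like an IDMap.
struct InnerSketch { std::string graph, storage; };
struct WrapperSketch { InnerSketch inner; };

const int kSkipStorageSketch = 1; // hypothetical IO_FLAG_SKIP_STORAGE bit

inline std::string serialize(const InnerSketch& idx, int io_flags) {
    std::string out = idx.graph;
    if (!(io_flags & kSkipStorageSketch)) {
        out += idx.storage;
    }
    return out;
}

// Correct version: forward io_flags. Calling serialize(idx.inner, 0)
// here would silently re-include the storage in the output.
inline std::string serialize(const WrapperSketch& idx, int io_flags) {
    return serialize(idx.inner, io_flags);
}
```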

For the binary case, the changes would be similar; we'd also need to add support for reading the fourcc("null") storage marker and returning a nullptr storage:

if (h == fourcc("IBxF")) {
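A sketch of what the binary reader's "null" handling could look like (the fourcc packing below follows the usual little-endian convention; the exact bit layout should be checked against faiss's io.h, and the types here are illustrative):

```cpp
#include <cstdint>

// Pack a 4-character tag into a 32-bit code, little-endian.
constexpr std::uint32_t fourcc_sketch(const char s[5]) {
    return std::uint32_t(std::uint8_t(s[0])) |
           std::uint32_t(std::uint8_t(s[1])) << 8 |
           std::uint32_t(std::uint8_t(s[2])) << 16 |
           std::uint32_t(std::uint8_t(s[3])) << 24;
}

struct StorageSketch { /* flat vectors would live here */ };

// On read: a "null" marker means the storage was skipped at write time,
// so the caller gets nullptr and must stitch in its own vectors before
// searching. Any other header takes the normal deserialization path.
inline StorageSketch* read_storage_sketch(std::uint32_t h) {
    if (h == fourcc_sketch("null")) {
        return nullptr;
    }
    return new StorageSketch(); // placeholder for the real read path
}
```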

I'm looking for feedback on this and would love a maintainer's opinion. If there is alignment on the interface changes and the general approach, I can go ahead and raise a PR.
