Background/Motivation
The OpenSearch k-NN plugin supports building vector indexes using a GPU-accelerated remote index build service. (Reference blog post: https://opensearch.org/blog/GPU-Accelerated-Vector-Search-OpenSearch-New-Frontier/)
The remote index build service performs the following steps:
1. Download the vectors from remote object storage
2. Build the CAGRA graph using GPUs
3. Convert the CAGRA graph to an HNSW graph
4. Serialize the HNSW graph via faiss.write_index
5. Upload the serialized HNSW graph to remote object storage
Then, once the HNSW graph has been uploaded, the k-NN plugin can download this graph and search it, as if it had been built on a CPU.
One of the main limitations is the amount of CPU memory consumed during step 3. The converted HNSW object contains both the flat vector storage and the graph structure. The flat vector storage is loaded here (faiss/faiss/gpu/GpuIndexCagra.cu, line 480 at commit 9ea026c):
index->storage->add(n_train, train_dataset);
However, the k-NN plugin already has a copy of the vectors before it sends them to the remote index build service. In theory, the flat vector storage does not need to be loaded into the HNSW object at all; instead, the k-NN plugin can stitch the graph structure together with its own copy of the flat vectors at search time. With this approach, memory consumption on the remote build service is reduced by roughly 50%. Transfer performance also improves, since the HNSW index uploaded to remote object storage is much smaller.
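For illustration, here is a minimal sketch of what the search-side stitching could look like. It assumes the graph was serialized without storage, that the consumer still has the original vectors in the same order, and that an IndexFlat is an acceptable storage backend; the function and variable names are placeholders, and the exact read-side behavior for a storage-less file depends on the faiss version.

```cpp
#include <faiss/IndexFlat.h>
#include <faiss/IndexHNSW.h>
#include <faiss/index_io.h>

// Sketch only: load an HNSW graph that was serialized without its flat storage
// and re-attach a locally built flat index so the graph is searchable again.
// Assumes local_vectors are the same vectors (same order/ids) the graph was built from.
faiss::Index* load_graph_and_attach_storage(
        const char* graph_path,
        const float* local_vectors, // the consumer's own copy of the vectors
        faiss::idx_t n,
        int d) {
    // With storage skipped at write time, the reader leaves storage == nullptr.
    auto* hnsw = dynamic_cast<faiss::IndexHNSW*>(faiss::read_index(graph_path));
    // (error handling for a failed read/cast omitted for brevity)

    // Rebuild the flat storage from the local copy and hand it to the graph.
    auto* storage = new faiss::IndexFlatL2(d);
    storage->add(n, local_vectors);
    hnsw->storage = storage;
    hnsw->own_fields = true; // the HNSW index now owns and will delete the storage
    return hnsw;
}
```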
Proposal
There may be other use cases besides OpenSearch k-NN GPU acceleration that follow this architecture, where the storage is managed separately from the converted IndexHNSWCagra graph.
Thus, I propose changing the GpuIndexCagra::copyTo signature like so:
void copyTo(faiss::IndexHNSWCagra* index, bool skip_storage = false) const;
where, if skip_storage = true, the code skips adding the vector storage to the index. This is analogous to the existing faiss.write_index IO_FLAG_SKIP_STORAGE flag (faiss/faiss/impl/index_write.cpp, line 872 at commit 0d147a7):
if (io_flags & IO_FLAG_SKIP_STORAGE) {
When copyTo is called with skip_storage = true, and write_index is then called with IO_FLAG_SKIP_STORAGE set in io_flags, the serialized HNSW graph will not contain the flat vector storage. skip_storage defaults to false for backwards compatibility.
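To make the intended flow concrete, here is a rough build-service-side usage sketch. The skip_storage argument to copyTo is the proposed addition; the constructor arguments, metric, M value, and function/variable names are placeholders rather than part of the proposal.

```cpp
#include <faiss/IndexHNSW.h>
#include <faiss/MetricType.h>
#include <faiss/gpu/GpuIndexCagra.h>
#include <faiss/gpu/StandardGpuResources.h>
#include <faiss/index_io.h>

// Sketch only: build a CAGRA graph on the GPU, convert it to HNSW without
// materializing the flat vector storage, and serialize just the graph structure.
void build_and_serialize_graph_only(
        const float* vectors,
        faiss::idx_t n,
        int d,
        const char* out_path) {
    faiss::gpu::StandardGpuResources res;
    faiss::gpu::GpuIndexCagraConfig config; // defaults; tuning omitted

    faiss::gpu::GpuIndexCagra gpu_index(&res, d, faiss::METRIC_L2, config);
    gpu_index.train(n, vectors); // builds the CAGRA graph on the GPU

    faiss::IndexHNSWCagra cpu_index(d, /*M=*/16); // M is a placeholder value
    gpu_index.copyTo(&cpu_index, /*skip_storage=*/true); // proposed: copy the graph only

    // With the proposed flow, the file contains the HNSW graph but no flat vectors.
    faiss::write_index(&cpu_index, out_path, faiss::IO_FLAG_SKIP_STORAGE);
}
```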
I also propose changing the corresponding GpuIndexBinaryCagra::copyTo signature like so:
void copyTo(faiss::IndexBinaryHNSWCagra* index, bool skip_storage = false) const;
and adding support for io_flags in write_index_binary:
void write_index_binary(const IndexBinary* idx, const char* fname, int io_flags = 0);
void write_index_binary(const IndexBinary* idx, FILE* f, int io_flags = 0);
void write_index_binary(const IndexBinary* idx, IOWriter* writer, int io_flags = 0);
Within copyTo, I am thinking we could add a check for skip_storage = true before this line (faiss/faiss/gpu/GpuIndexCagra.cu, line 456 at commit 0d147a7):
if (numeric_type_ == NumericType::Float32) {
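For illustration, the shape of that guard might be something like the following. This is a fragment-level sketch of the change, not the actual implementation, and the surrounding copyTo code is elided.

```cpp
// Inside GpuIndexCagra::copyTo(faiss::IndexHNSWCagra* index, bool skip_storage) -- sketch only
if (!skip_storage) {
    if (numeric_type_ == NumericType::Float32) {
        // ... existing float path, including
        //     index->storage->add(n_train, train_dataset);
    }
    // ... existing paths for the other numeric types
}
// the HNSW graph structure itself is still copied regardless of skip_storage
```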
And then in write_index, we need to make sure the io_flags get passed along in the recursive call (faiss/faiss/impl/index_write.cpp, line 844 at commit 0d147a7):
write_index(idxmap->index, f);
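That is, the call would presumably become:

```cpp
// sketch: forward the caller's io_flags so IO_FLAG_SKIP_STORAGE reaches the wrapped inner index
write_index(idxmap->index, f, io_flags);
```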
For the binary case, the changes would be similar, and we'd also need to add support for reading the fourcc("null") header and returning nullptr, near this check (faiss/faiss/impl/index_read.cpp, line 2048 at commit 0d147a7):
if (h == fourcc("IBxF")) {
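Concretely, the binary read path would presumably need a case like the following near that check. This is a sketch, under the assumption that the binary writer emits the same fourcc("null") marker the float writer does when storage is skipped.

```cpp
// sketch: mirror the float read path -- a "null" header means the storage
// sub-index was skipped at write time, so return nullptr instead of failing
if (h == fourcc("null")) {
    return nullptr;
}
```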
I'm looking for feedback on this and would love to get a maintainer's opinion. If there is alignment on the interface changes and the general approach, I can go ahead and raise a PR.