Description
Meta issue for all things related to optimizing memory usage for vector search (in particular quantization and HNSW), with the goal of choosing defaults that give the best out-of-the-box experience.
At a high level, HNSW works best when "everything" is in memory. Quantization helps because it significantly reduces the size of the vectors that must be kept there. When using HNSW with quantization we want to offer the best out-of-the-box experience, so the implementation should be aligned to make the best use of memory.
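To make the trade-off concrete, here is a rough back-of-envelope (the numbers are illustrative assumptions, not from this issue): 1M 768-dimensional float32 vectors, BBQ at roughly 1 bit per dimension, and an HNSW graph with m = 16 (about 32 neighbours per node at the bottom layer):

$$
\begin{aligned}
\text{raw float32 vectors} &: 10^6 \times 768 \times 4\,\text{B} \approx 3.1\,\text{GB} \\
\text{BBQ vectors } (\sim 1\text{ bit/dim}) &: 10^6 \times 768 / 8\,\text{B} \approx 96\,\text{MB (plus small per-vector corrections)} \\
\text{HNSW graph } (m = 16) &: 10^6 \times 32 \times 4\,\text{B} \approx 128\,\text{MB (bottom layer alone)}
\end{aligned}
$$

Under these assumptions the graph is comparable in size to the quantized vectors themselves, which is why graph residency matters as much as vector residency.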
The general idea is to use the page cache as efficiently as possible, and avoid critical data structures being unnecessarily paged out. Specifically: 1) preload the HNSW graph (since not having it in memory results in very poor performance) and offer insight into its residency, and 2) avoid perturbing the page cache when rescoring.
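As a minimal sketch of the first point (illustrative only; Lucene's MMapDirectory already supports preloading, and the file name here is hypothetical), mapping a graph file and forcing it into the page cache could look like:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Illustrative sketch: warm a file (e.g. an HNSW graph segment file) into the
 * page cache and report a residency hint. Not the actual implementation.
 */
public class PreloadSketch {
    public static void main(String[] args) throws IOException {
        Path graphFile = Path.of(args[0]); // hypothetical path to a graph file
        try (FileChannel ch = FileChannel.open(graphFile, StandardOpenOption.READ)) {
            // A single MappedByteBuffer is limited to 2 GB; larger files would
            // need to be mapped in chunks.
            MappedByteBuffer mmap = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            mmap.load();     // touch every page to fault the file into memory
            // isLoaded() is only a hint, but it is the kind of residency
            // signal the statistics tasks below could build on.
            System.out.println("resident (hint): " + mmap.isLoaded());
        }
    }
}
```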
Tasks
- Use Direct I/O for rescoring in BBQ (see the sketch after this list)
- Preload the HNSW graph and quantized vectors for BBQ
- Add low-level statistics about the memory requirements and usage of vectors in the system #125681
- Enhance the API to raise the level at which preloading is configured, from file extensions to, say, dense vector index type
- Expose low-level statistics to inform scale-up / scale-down events.
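For the Direct I/O task, a minimal sketch of the mechanism (not the actual implementation), using the JDK's `ExtendedOpenOption.DIRECT`; the path and offset are hypothetical:

```java
import com.sun.nio.file.ExtendedOpenOption;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Illustrative sketch: read a block of full-fidelity vectors with O_DIRECT so
 * the read bypasses the page cache instead of evicting hotter data such as
 * the HNSW graph or the quantized vectors.
 */
public class DirectIoSketch {
    public static void main(String[] args) throws IOException {
        Path vectorsFile = Path.of(args[0]); // hypothetical path to a vectors file
        int blockSize = Math.toIntExact(Files.getFileStore(vectorsFile).getBlockSize());
        try (FileChannel ch = FileChannel.open(vectorsFile,
                StandardOpenOption.READ, ExtendedOpenOption.DIRECT)) {
            // Direct I/O requires the buffer address, the file position, and
            // the transfer length to be aligned to the file store's block size.
            ByteBuffer buf = ByteBuffer.allocateDirect(blockSize * 2).alignedSlice(blockSize);
            buf.limit(blockSize);
            ch.read(buf, 0); // read one aligned block at an aligned offset
            buf.flip();
            System.out.println("read " + buf.remaining() + " bytes, bypassing the page cache");
        }
    }
}
```

The rationale: rescoring reads each candidate's full-fidelity vector once and does not benefit from caching it, so those reads should not perturb the pages holding the graph and quantized vectors.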
The focus of this issue is on solving the general case of HNSW and quantization, but we should keep the BBQ use case top of mind and not make it any more complex than it needs to be.