Open
Description
Vector similarity search using HNSW accesses the vectors (the vec or veq files) very heavily during the search, even more than the HNSW graph itself (the vex file). If the vector files don't fit into the page cache, performance drops very significantly (around 100x in our particular case). Users typically configure their search servers to have enough RAM to fit these files.
Lucene currently uses ReadAdvice.RANDOM when opening these files. I think it would be better to use RANDOM_PRELOAD.
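To illustrate what I mean by preloading, here is a minimal sketch using only plain java.nio, not Lucene's actual code. It maps a file and asks the OS to bring its pages into memory up front, so the first searches don't pay for page faults on random reads. The class name `PreloadSketch` and the method `mapAndPreload` are hypothetical, and I am assuming the effect of RANDOM_PRELOAD is roughly comparable to calling `load()` on the mapping.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; this is not how Lucene opens its files.
class PreloadSketch {

    /** Maps the file read-only and asks the OS to load its pages into memory. */
    static MappedByteBuffer mapAndPreload(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // A single MappedByteBuffer is limited to 2 GB; larger files would
            // need to be mapped in several chunks.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Best effort: the OS may still evict pages later under memory pressure.
            buf.load();
            // The mapping stays valid after the channel is closed.
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a vector data file: 4 MiB of zeros.
        Path tmp = Files.createTempFile("vectors", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[4 * 1024 * 1024]);

        MappedByteBuffer buf = mapAndPreload(tmp);
        System.out.println("mapped bytes: " + buf.capacity());
    }
}
```

Without the `load()` call, each page is only read from disk the first time a search touches it, which is the slow path when accesses are random.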
If you agree, I can provide a PR.
Version and environment details
No response