Summary
Enabling prefetch with the default number of prefetch threads (32) for onDiskInvertedLists seems to degrade overall performance.
Platform
OS: Linux
Faiss version: v1.13.2-7-gc9bab48d1
Running on: CPU with 64 cores
Interface: C++ (Python should work as well)
Dataset used: hotpotqa
Reproduction instructions
I prebuilt the vector database on the hotpotqa dataset with the all-MiniLM-L6-v2 model and saved the trained index as an OnDiskInvertedLists. The core search code is attached below:
```cpp
std::vector<float> embed_dist(nq * k);
std::vector<faiss::idx_t> embed_idx(nq * k);

// Load the on-disk index and recover the IVF layer and its quantizer
faiss::Index* index = faiss::read_index(index_path.c_str());
faiss::IndexIVFFlat* index_ivf = dynamic_cast<faiss::IndexIVFFlat*>(index);
faiss::IndexFlatL2* quantizer =
        dynamic_cast<faiss::IndexFlatL2*>(index_ivf->quantizer);

// Cap the OpenMP thread count at the number of queries
int nt = std::min(omp_get_max_threads(), (int)nq);

index_ivf->nprobe = nprobe;
index_ivf->search(nq, xq, k, embed_dist.data(), embed_idx.data());
```
In the `index_ivf->search` path, if I comment out the call to `prefetch_lists` in `sub_search_func`, the overall search latency is lower than with the prefetch call left in.
I also tried reducing the number of prefetch threads, but performance remains worse than with no prefetching at all. The experiments are summarized below:
| Config | Overall latency | search_time captured by ivf_stats |
| --- | --- | --- |
| No prefetch | 10995 ms | 674501 ms |
| Prefetch_nthread=32 | 11928 ms | 711488 ms |
| Prefetch_nthread=8 | 11685 ms | 693821 ms |
| Prefetch_nthread=4 | 11557 ms | 689031 ms |
| Prefetch_nthread=1 | 11862 ms | 693961 ms |
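For reference, a sketch of how I varied the prefetch thread count in these runs, by setting the `prefetch_nthread` field on the on-disk inverted lists after loading the index (this assumes the loaded index's `invlists` really is an `OnDiskInvertedLists`):

```cpp
#include <faiss/invlists/OnDiskInvertedLists.h>

// index_ivf was obtained from read_index() as in the snippet above.
// If the inverted lists are stored on disk, tune the prefetch threads.
auto* on_disk =
        dynamic_cast<faiss::OnDiskInvertedLists*>(index_ivf->invlists);
if (on_disk) {
    on_disk->prefetch_nthread = 8; // default is 32
}
```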
Is this performance trend expected, i.e., is it normal to see degradation with the default 32 prefetch threads? Under what configuration would using 32 prefetch threads be beneficial performance-wise?