Summary
Enabling prefetch with the default number of prefetch threads (32) for onDiskInvertedLists seems to degrade overall performance.
Platform
OS: Linux
Faiss version: v1.13.2-7-gc9bab48d1
Running on: CPU with 64 cores
Interface: C++ (Python should work as well)
Dataset used: hotpotqa
Reproduction instructions
I prebuilt the vector database on the hotpotqa dataset with the all-MiniLM-L6-v2 model and saved the trained index as an OnDiskInvertedLists. The core search code is attached below:
```cpp
std::vector<float> embed_dist(nq * k);
std::vector<faiss::idx_t> embed_idx(nq * k);

// Load the on-disk index and recover the IVF layer and its quantizer
faiss::Index* index = faiss::read_index(index_path.c_str());
faiss::IndexIVFFlat* index_ivf = dynamic_cast<faiss::IndexIVFFlat*>(index);
faiss::IndexFlatL2* quantizer =
        dynamic_cast<faiss::IndexFlatL2*>(index_ivf->quantizer);

// Cap the OpenMP thread count at the number of queries
int nt = std::min(omp_get_max_threads(), (int)nq);

index_ivf->nprobe = nprobe;
index_ivf->search(nq, xq, k, embed_dist.data(), embed_idx.data());
```
In the `index_ivf->search` path, if I comment out the call to `prefetch_lists` in `sub_search_func`, the overall search latency is lower than with the prefetch call left in.
I also tried reducing the number of prefetch threads, but performance remains worse than with no prefetching at all. The experiments are summarized below:
| Config | Overall latency | search_time captured by ivf_stats |
| --- | --- | --- |
| No prefetch | 10995 ms | 674501 ms |
| Prefetch_nthread=32 | 11928 ms | 711488 ms |
| Prefetch_nthread=8 | 11685 ms | 693821 ms |
| Prefetch_nthread=4 | 11557 ms | 689031 ms |
| Prefetch_nthread=1 | 11862 ms | 693961 ms |
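For reference, a sketch of how I varied the prefetch thread count in these runs, by setting the `prefetch_nthread` field on the on-disk inverted lists after loading the index (this assumes the loaded index's `invlists` really is an `OnDiskInvertedLists`):

```cpp
#include <faiss/invlists/OnDiskInvertedLists.h>

// index_ivf was obtained from read_index() as in the snippet above.
// If the inverted lists are stored on disk, tune the prefetch threads.
auto* on_disk =
        dynamic_cast<faiss::OnDiskInvertedLists*>(index_ivf->invlists);
if (on_disk) {
    on_disk->prefetch_nthread = 8; // default is 32
}
```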
Is this performance trend expected, i.e., is it normal to see degradation with the default 32 prefetch threads? Under what configuration would using 32 prefetch threads be beneficial performance-wise?