Panorama HNSW Optimizations (~x1.2)#5190
Conversation
|
On another note, could we also get a quick update on the IVFPQ PR? I know it's a big one to review, but we would want to clarify the concern @mdouze had about the storage of codes. Thanks! |
|
We are working on setting up benchmarking for a broad swath of available indexes internally to enable easier review of new index types like IVFPQ Panorama. Sorry for the delay. For this PR: Can you clarify this part?
Is this about the index serialization? We have been discussing removing the "forward compatibility" version of the test (conda write -> cmake read), so we should be able to bypass that one. |
Instead of storing the coefficients in a different location in memory, we embed them within each point. Point 1: [coeff 1, coeff 2, coeff 3, dim 1, dim 2, dim 3, dim 4] etc. Instead of: Point 1: [dim 1, dim 2, dim 3, dim 4] Metadata 1: [coeff 1, coeff 2, coeff 3] This would break backward compatibility. We can integrate it (and speedup HNSW by another 10%), but this would mean we need to add quite a bit of code, as we cannot delete the existing one. |
|
@mnorris11 has imported this pull request. If you are a Meta employee, you can view this in D105385379. |
| bool is_new = vt.set(v1); | ||
| bool is_selected = !sel || sel->is_member(v1); | ||
| if (is_new && is_selected) { | ||
| const float vsum = |
There was a problem hiding this comment.
How about checking at top of loop:
if (initial_size >= buf_cap) {
break;
}
There was a problem hiding this comment.
We allocate 2*M = nb_per_parent extra in this buffer:
const size_t buf_cap = kTargetBatch + nb_per_parent;
and then we have this condition:
while (initial_size < kTargetBatch
So it means that in one iteration, at most, we add 2*M elements in our buffer.
FWIW I think the same optimization can be applied for vanilla HNSW, it just so happens to be really needed by Panorama given how much more memory-bound we are.
| // We already have parents queued; un-pop this | ||
| // one so the next outer iteration sees it and | ||
| // re-applies the stop check from a clean state. | ||
| candidates.push(v0, d0); |
There was a problem hiding this comment.
Do we need to stop_flag = true; in this branch too?
There was a problem hiding this comment.
I do not think so, because we will have some points staged for exploration, which might populate the candidates heap again. This keeps it consistent with the original code path.
That being said, I must admit that this MiniMaxHeap is beyond my (and @aknayar) comprehension - so if you think we need to add it, we will gladly do so :-)
| @@ -801,177 +802,129 @@ int search_from_candidates_panorama( | |||
| flat_codes_qdis, | |||
| "DistanceComputer must be a FlatCodesDistanceComputer"); | |||
|
|
|||
There was a problem hiding this comment.
Just in case:
FAISS_THROW_IF_NOT_MSG(
level >= 0 && level <= hnsw.max_level, "Invalid HNSW level");
There was a problem hiding this comment.
Isn't this function only called at the lowest HNSW level?
Also FWIW the existing search_from_candidates function does not have that check either.
If really we only use level = 0, we should consider removing the argument altogether.
Perf improvements on HNSWPanorama, removing regressions on lower-dimensional datasets (such as SIFT or Deep). Overall achieves a 1.2x speedup over the existing code.
There is another optimization that improves performance by an additional ~10%, but it requires inlining the coefficients directly into the vectors (i.e. the
cum_sums) to reduce cache misses. The downside is that this breaks backward compatibility. Any advice, @mnorris11?A ~10% gain feels meaningful enough that it might be worth it. From the results we currently have in our WIP paper, this optimization seems to push Faiss' (Panorama) HNSW much closer to the top tier on vector search benchmarks. The problem is that keeping backward compatibility cleanly has been pretty awkward so far, since we’d likely need to maintain two code paths 😓
The overall goal is to make the overhead low enough that there’s little reason to ever use the standard HNSW path, and changing the layout seems to help quite a bit there.
Thanks @aknayar for his help on this PR.