Skip to content

Panorama HNSW Optimizations (~x1.2)#5190

Open
AlSchlo wants to merge 17 commits into
facebookresearch:mainfrom
AlSchlo:hnsw-optim
Open

Panorama HNSW Optimizations (~x1.2)#5190
AlSchlo wants to merge 17 commits into
facebookresearch:mainfrom
AlSchlo:hnsw-optim

Conversation

@AlSchlo
Copy link
Copy Markdown
Contributor

@AlSchlo AlSchlo commented May 7, 2026

Perf improvements on HNSWPanorama, removing regressions on lower-dimensional datasets (such as SIFT or Deep). Overall achieves a 1.2x speedup over the existing code.

  1. Hides cache misses in HNSW by popping multiple elements at once from the candidate heap.
  2. Applies FixedWidth for levels that are multiples of 8.
  3. Disables (in benchmarks) the visited-set optimization: the hash set fallback performs terribly on the golden datasets (e.g. SIFT and GIST). We should consider updating this threshold in a different PR.

There is another optimization that improves performance by an additional ~10%, but it requires inlining the coefficients directly into the vectors (i.e. the cum_sums) to reduce cache misses. The downside is that this breaks backward compatibility. Any advice, @mnorris11?

A ~10% gain feels meaningful enough that it might be worth it. From the results we currently have in our WIP paper, this optimization seems to push Faiss' (Panorama) HNSW much closer to the top tier on vector search benchmarks. The problem is that keeping backward compatibility cleanly has been pretty awkward so far, since we’d likely need to maintain two code paths 😓

The overall goal is to make the overhead low enough that there’s little reason to ever use the standard HNSW path, and changing the layout seems to help quite a bit there.

Screenshot 2026-05-07 at 12 25 15 AM

Thanks @aknayar for his help on this PR.

@meta-cla meta-cla Bot added the CLA Signed label May 7, 2026
@AlSchlo
Copy link
Copy Markdown
Contributor Author

AlSchlo commented May 7, 2026

On another note, could we also get a quick update on the IVFPQ PR? I know it's a big one to review, but we would want to clarify the concern @mdouze had about the storage of codes. Thanks!

@mnorris11
Copy link
Copy Markdown
Contributor

We are working on setting up benchmarking for a broad swath of available indexes internally to enable easier review of new index types like IVFPQ Panorama. Sorry for the delay.

For this PR: Can you clarify this part?

There is another optimization that improves performance by an additional ~10%, but it requires inlining the coefficients directly into the vectors (i.e. the cum_sums) to reduce cache misses. The downside is that this breaks backward compatibility.

Is this about the index serialization? We have been discussing removing the "forward compatibility" version of the test (conda write -> cmake read), so we should be able to bypass that one.

@AlSchlo
Copy link
Copy Markdown
Contributor Author

AlSchlo commented May 12, 2026

Is this about the index serialization? We have been discussing removing the "forward compatibility" version of the test (conda write -> cmake read), so we should be able to bypass that one.

Instead of storing the coefficients in a different location in memory, we embed them within each point.
E.g. for 2 levels and 4 dimensions

Point 1: [coeff 1, coeff 2, coeff 3, dim 1, dim 2, dim 3, dim 4]
Point 2: [coeff 1, coeff 2, coeff 3, dim 1, dim 2, dim 3, dim 4]

etc.

Instead of:

Point 1: [dim 1, dim 2, dim 3, dim 4]
Point 2: [dim 1, dim 2, dim 3, dim 4]

Metadata 1: [coeff 1, coeff 2, coeff 3]
Metadata 2: [coeff 1, coeff 2, coeff 3]

This would break backward compatibility. We can integrate it (and speedup HNSW by another 10%), but this would mean we need to add quite a bit of code, as we cannot delete the existing one.

@meta-codesync
Copy link
Copy Markdown
Contributor

meta-codesync Bot commented May 15, 2026

@mnorris11 has imported this pull request. If you are a Meta employee, you can view this in D105385379.

Comment thread faiss/impl/HNSW.cpp
bool is_new = vt.set(v1);
bool is_selected = !sel || sel->is_member(v1);
if (is_new && is_selected) {
const float vsum =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about checking at top of loop:

                    if (initial_size >= buf_cap) {
                        break;
                    }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We allocate 2*M = nb_per_parent extra in this buffer:

const size_t buf_cap = kTargetBatch + nb_per_parent;

and then we have this condition:

 while (initial_size < kTargetBatch

So it means that in one iteration, at most, we add 2*M elements in our buffer.

FWIW I think the same optimization can be applied for vanilla HNSW, it just so happens to be really needed by Panorama given how much more memory-bound we are.

Comment thread faiss/impl/HNSW.cpp
// We already have parents queued; un-pop this
// one so the next outer iteration sees it and
// re-applies the stop check from a clean state.
candidates.push(v0, d0);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to stop_flag = true; in this branch too?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think so, because we will have some points staged for exploration, which might populate the candidates heap again. This keeps it consistent with the original code path.

That being said, I must admit that this MiniMaxHeap is beyond my (and @aknayar) comprehension - so if you think we need to add it, we will gladly do so :-)

Comment thread faiss/impl/HNSW.cpp
@@ -801,177 +802,129 @@ int search_from_candidates_panorama(
flat_codes_qdis,
"DistanceComputer must be a FlatCodesDistanceComputer");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case:

FAISS_THROW_IF_NOT_MSG(
          level >= 0 && level <= hnsw.max_level, "Invalid HNSW level");

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this function only called at the lowest HNSW level?
Also FWIW the existing search_from_candidates function does not have that check either.

If really we only use level = 0, we should consider removing the argument altogether.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants