Panorama HNSW Optimizations (~x1.2) by AlSchlo · Pull Request #5190 · facebookresearch/faiss

AlSchlo · 2026-05-07T07:28:32Z

Perf improvements on HNSWPanorama, removing regressions on lower-dimensional datasets (such as SIFT or Deep). Overall achieves a 1.2x speedup over the existing code.

Hides cache misses in HNSW by popping multiple elements at once from the candidate heap.
Applies FixedWidth for levels that are multiples of 8.
Disables (in benchmarks) the visited-set optimization: the hash set fallback performs terribly on the golden datasets (e.g. SIFT and GIST). We should consider updating this threshold in a different PR.
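
A minimal sketch of the multi-pop idea from the first item above, not the code in this PR. The names `Candidate`, `MinHeap`, and `pop_batch` are hypothetical, and prefetching the candidate's own vector is an assumption: the actual change may prefetch neighbor lists or codes instead. `__builtin_prefetch` is a GCC/Clang builtin, so it is guarded.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not the Faiss MinimaxHeap.
using Candidate = std::pair<float, int>; // (distance, node id)
using MinHeap = std::priority_queue<
        Candidate,
        std::vector<Candidate>,
        std::greater<Candidate>>;

// Pop up to `batch` closest candidates in one go and issue a prefetch for
// each one's data, so the memory loads overlap instead of stalling one by one.
inline std::vector<Candidate> pop_batch(
        MinHeap& heap,
        size_t batch,
        const float* vectors, // assumed row-major storage, `dim` floats per node
        size_t dim) {
    std::vector<Candidate> out;
    out.reserve(batch);
    while (out.size() < batch && !heap.empty()) {
        Candidate c = heap.top();
        heap.pop();
#if defined(__GNUC__) || defined(__clang__)
        // Hint only: start fetching this node's vector before it is needed.
        __builtin_prefetch(vectors + static_cast<size_t>(c.second) * dim);
#endif
        out.push_back(c);
    }
    return out;
}
```

The caller would then compute distances for the whole batch, by which time the prefetched lines are more likely to be in cache.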

There is another optimization that improves performance by an additional ~10%, but it requires inlining the coefficients directly into the vectors (i.e. the cum_sums) to reduce cache misses. The downside is that this breaks backward compatibility. Any advice, @mnorris11?

A ~10% gain feels meaningful enough that it might be worth it. From the results we currently have in our WIP paper, this optimization seems to push Faiss' (Panorama) HNSW much closer to the top tier on vector search benchmarks. The problem is that keeping backward compatibility cleanly has been pretty awkward so far, since we’d likely need to maintain two code paths 😓

The overall goal is to make the overhead low enough that there’s little reason to ever use the standard HNSW path, and changing the layout seems to help quite a bit there.

Thanks @aknayar for his help on this PR.

AlSchlo · 2026-05-07T07:38:36Z

On another note, could we also get a quick update on the IVFPQ PR? I know it's a big one to review, but we would want to clarify the concern @mdouze had about the storage of codes. Thanks!

mnorris11 · 2026-05-12T02:36:56Z

We are working on setting up benchmarking for a broad swath of available indexes internally to enable easier review of new index types like IVFPQ Panorama. Sorry for the delay.

For this PR: Can you clarify this part?

> There is another optimization that improves performance by an additional ~10%, but it requires inlining the coefficients directly into the vectors (i.e. the cum_sums) to reduce cache misses. The downside is that this breaks backward compatibility.

Is this about the index serialization? We have been discussing removing the "forward compatibility" version of the test (conda write -> cmake read), so we should be able to bypass that one.

AlSchlo · 2026-05-12T04:35:44Z

> Is this about the index serialization? We have been discussing removing the "forward compatibility" version of the test (conda write -> cmake read), so we should be able to bypass that one.

Instead of storing the coefficients in a different location in memory, we embed them within each point.
E.g. for 2 levels and 4 dimensions

Point 1: [coeff 1, coeff 2, coeff 3, dim 1, dim 2, dim 3, dim 4]
Point 2: [coeff 1, coeff 2, coeff 3, dim 1, dim 2, dim 3, dim 4]

etc.

Instead of:

Point 1: [dim 1, dim 2, dim 3, dim 4]
Point 2: [dim 1, dim 2, dim 3, dim 4]

Metadata 1: [coeff 1, coeff 2, coeff 3]
Metadata 2: [coeff 1, coeff 2, coeff 3]

This would break backward compatibility. We can integrate it (and speed up HNSW by another 10%), but that would mean adding quite a bit of code, since we cannot delete the existing code.
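
To make the two layouts concrete, here is a small sketch that converts the separate points/metadata arrays into the interleaved form. `InlineLayout` and `interleave` are hypothetical names, and the float element type is an assumption; this only illustrates the memory arrangement described above, not the PR's storage code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: each point is stored as [coeffs | dims], contiguous.
struct InlineLayout {
    size_t n_coeffs; // 3 in the example above
    size_t dim;      // 4 in the example above
    std::vector<float> data;

    size_t stride() const { return n_coeffs + dim; }
    // Coefficients and dimensions of point i sit next to each other,
    // so one cache-line fetch is more likely to cover both.
    const float* coeffs(size_t i) const { return data.data() + i * stride(); }
    const float* dims(size_t i) const { return coeffs(i) + n_coeffs; }
};

// Build the interleaved layout from the two separate arrays.
inline InlineLayout interleave(
        const std::vector<float>& points, // n * dim values
        const std::vector<float>& meta,   // n * n_coeffs values
        size_t n_coeffs,
        size_t dim) {
    assert(dim > 0 && points.size() % dim == 0);
    const size_t n = points.size() / dim;
    assert(meta.size() == n * n_coeffs);

    InlineLayout out{n_coeffs, dim, {}};
    out.data.reserve(n * out.stride());
    for (size_t i = 0; i < n; ++i) {
        out.data.insert(
                out.data.end(),
                meta.begin() + i * n_coeffs,
                meta.begin() + (i + 1) * n_coeffs);
        out.data.insert(
                out.data.end(),
                points.begin() + i * dim,
                points.begin() + (i + 1) * dim);
    }
    return out;
}
```

An index serialized with the separate arrays cannot be read as this layout without a conversion step like `interleave`, which is where the compatibility concern comes from.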

meta-codesync · 2026-05-15T23:15:49Z

@mnorris11 has imported this pull request. If you are a Meta employee, you can view this in D105385379.

mnorris11 · 2026-05-16T03:21:24Z

+                bool is_new = vt.set(v1);
+                bool is_selected = !sel || sel->is_member(v1);
+                if (is_new && is_selected) {
+                    const float vsum =


How about checking at top of loop:

if (initial_size >= buf_cap) { break; }

AlSchlo · 2026-05-16

We allocate 2*M = nb_per_parent extra slots in this buffer:

const size_t buf_cap = kTargetBatch + nb_per_parent;

and then we have this condition:

while (initial_size < kTargetBatch

So a single iteration adds at most 2*M elements to the buffer. Since the loop only runs while initial_size < kTargetBatch, initial_size stays below buf_cap.

FWIW, I think the same optimization could be applied to vanilla HNSW; it just happens to matter more for Panorama, given how much more memory-bound we are.
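
The capacity argument can be checked with a small stand-in for the staging loop. Only `kTargetBatch`, `nb_per_parent`, and `buf_cap` come from the snippets quoted above; `fill_buffer` and its input are hypothetical, and the real loop has other exit conditions that are not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the staging loop: each parent contributes at most
// nb_per_parent neighbors, and we stop once kTargetBatch is reached.
inline size_t fill_buffer(
        const std::vector<size_t>& neighbors_per_parent,
        size_t kTargetBatch,
        size_t nb_per_parent) {
    const size_t buf_cap = kTargetBatch + nb_per_parent;
    size_t size = 0;
    size_t next = 0;
    while (size < kTargetBatch && next < neighbors_per_parent.size()) {
        const size_t added = neighbors_per_parent[next++];
        assert(added <= nb_per_parent);
        size += added;
        // Before the add: size <= kTargetBatch - 1.
        // After the add:  size <= kTargetBatch - 1 + nb_per_parent < buf_cap.
        assert(size < buf_cap);
    }
    return size;
}
```

With kTargetBatch = 64 and nb_per_parent = 32, the worst case is entering the last iteration at 63 and adding 32, which gives 95 and still fits in a buffer of 96.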

mnorris11 · 2026-05-16T03:21:26Z

+                        // We already have parents queued; un-pop this
+                        // one so the next outer iteration sees it and
+                        // re-applies the stop check from a clean state.
+                        candidates.push(v0, d0);


Do we need to set stop_flag = true; in this branch too?

AlSchlo · 2026-05-16

I do not think so, because we will still have some points staged for exploration, which might populate the candidates heap again. This keeps it consistent with the original code path.

That being said, I must admit that this MinimaxHeap is beyond my (and @aknayar's) comprehension, so if you think we need to add it, we will gladly do so :-)

mnorris11 · 2026-05-16T03:21:28Z

@@ -801,177 +802,129 @@ int search_from_candidates_panorama(
            flat_codes_qdis,
            "DistanceComputer must be a FlatCodesDistanceComputer");



Just in case:

FAISS_THROW_IF_NOT_MSG(level >= 0 && level <= hnsw.max_level, "Invalid HNSW level");

AlSchlo · 2026-05-16

Isn't this function only called at the lowest HNSW level?
Also FWIW the existing search_from_candidates function does not have that check either.

If we really only use level = 0, we should consider removing the argument altogether.

AlSchlo added 10 commits May 7, 2026 04:49

0a7883e First iteration
d214aec bench
c10cbc5 bench
d89c5c7 revert layout optim for sanity of Faiss reviewers
b729735 Simplify
f592ab4 better
2642ec7 fix
eb0e2f7 fix
c62a2e4 useless claude comment
abe7c26 useless claude comment

meta-cla Bot added the CLA Signed label May 7, 2026

37e3d4d Merge branch 'main' into hnsw-optim

AlSchlo added 2 commits May 8, 2026 10:26

ebeeba5 Merge branch 'main' into hnsw-optim
74acda9 Merge branch 'main' into hnsw-optim

AlSchlo and others added 3 commits May 11, 2026 21:35

26ff076 Merge branch 'main' into hnsw-optim
1095d65 Merge branch 'main' into hnsw-optim
64e2d6d Merge branch 'main' into hnsw-optim

mnorris11 reviewed May 16, 2026

d89ae54 Merge branch 'main' into hnsw-optim

AlSchlo wants to merge 17 commits into facebookresearch:main from AlSchlo:hnsw-optim

2 participants
