Convert binary index Hamming callers to dynamic dispatch (#5071) by algoriddle · Pull Request #5071 · facebookresearch/faiss

algoriddle · 2026-04-09T15:46:07Z

Summary:

Phase 2 of the Hamming dynamic dispatch (DD) conversion. Converts 5
external callers of dispatch_HammingComputer to use per-ISA translation
units (TUs) so the correct HammingComputer structs (AVX2/NEON/generic)
are used at runtime.

Previously, these files compiled as common TUs with no ISA flags,
causing hamdis-inl.h to fall through to generic-inl.h. As a result, the
scalar HammingComputer structs were used even on AVX2-capable machines
in DD mode.
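To illustrate the fall-through described above, here is a minimal sketch of an ISA ladder. It is not the contents of hamdis-inl.h: `TOY_HAMMING_IMPL`, `toy_hamming_impl_name` and `toy_hamming` are hypothetical names, and the function body is a plain scalar loop. The point is only that the branch taken depends on the compiler flags of the including TU, so a TU built without `-mavx2` lands in the generic branch regardless of the machine it later runs on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical ISA ladder: the branch taken depends only on the compiler
// flags of the translation unit that includes this code, not on the CPU
// the binary eventually runs on.
#if defined(__AVX2__)
#define TOY_HAMMING_IMPL "avx2"
#elif defined(__aarch64__)
#define TOY_HAMMING_IMPL "neon"
#else
#define TOY_HAMMING_IMPL "generic"
#endif

// Name of the variant this TU was compiled with.
inline const char* toy_hamming_impl_name() {
    return TOY_HAMMING_IMPL;
}

// Scalar body, standing in for what a generic branch would provide.
inline int toy_hamming(const uint8_t* a, const uint8_t* b, size_t code_size) {
    assert(code_size % 8 == 0); // this sketch handles whole 64-bit words only
    int dist = 0;
    for (size_t i = 0; i < code_size; i += 8) {
        uint64_t x, y;
        std::memcpy(&x, a + i, 8); // memcpy avoids unaligned loads
        std::memcpy(&y, b + i, 8);
        uint64_t diff = x ^ y;
        while (diff) { // clear the lowest set bit until none remain
            diff &= diff - 1;
            ++dist;
        }
    }
    return dist;
}
```

Compiling the same file once with and once without `-mavx2` changes what `toy_hamming_impl_name()` returns, which is the effect the per-ISA TUs rely on.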

Converted files:

- IndexBinaryHNSW.cpp: get_distance_computer() factory
- IndexBinaryIVF.cpp: get_InvertedListScanner() + search_preassigned()
- IndexBinaryHash.cpp: 4 search entry points (knn+range × 2 index types)
- IndexIVFSpectralHash.cpp: get_InvertedListScanner() factory
- IndexPQ.cpp: polysemous_inner_loop dispatch
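Several of the converted call sites are factories that pick a computer type from the code size and hide it behind a virtual interface. The following is a rough sketch of that general pattern only. Every name in it (`ToyDistanceComputer`, `ToyHammingComputerFixed`, `toy_get_distance_computer`, and so on) is hypothetical, and it does not reproduce the signature of `dispatch_HammingComputer` or of any faiss factory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Bit count of one byte (scalar, for illustration).
inline int toy_popcount8(uint8_t x) {
    int n = 0;
    while (x) {
        x &= static_cast<uint8_t>(x - 1);
        ++n;
    }
    return n;
}

// Hypothetical computer for a code size known at compile time.
template <size_t NBYTES>
struct ToyHammingComputerFixed {
    uint8_t q[NBYTES];
    ToyHammingComputerFixed(const uint8_t* query, size_t code_size) {
        assert(code_size == NBYTES);
        std::memcpy(q, query, NBYTES);
    }
    int hamming(const uint8_t* code) const {
        int d = 0;
        for (size_t i = 0; i < NBYTES; i++) {
            d += toy_popcount8(static_cast<uint8_t>(q[i] ^ code[i]));
        }
        return d;
    }
};

// Hypothetical fallback for any other code size.
struct ToyHammingComputerDefault {
    std::vector<uint8_t> q;
    ToyHammingComputerDefault(const uint8_t* query, size_t code_size)
            : q(query, query + code_size) {}
    int hamming(const uint8_t* code) const {
        int d = 0;
        for (size_t i = 0; i < q.size(); i++) {
            d += toy_popcount8(static_cast<uint8_t>(q[i] ^ code[i]));
        }
        return d;
    }
};

// Type-erased interface returned by the factory.
struct ToyDistanceComputer {
    virtual ~ToyDistanceComputer() = default;
    virtual int distance_to(const uint8_t* code) const = 0;
};

// The computer is a template parameter so that hamming() can be inlined.
template <class HC>
struct ToyDistanceComputerImpl : ToyDistanceComputer {
    HC hc;
    ToyDistanceComputerImpl(const uint8_t* query, size_t code_size)
            : hc(query, code_size) {}
    int distance_to(const uint8_t* code) const override {
        return hc.hamming(code);
    }
};

// Factory: choose the computer type from the runtime code size.
inline std::unique_ptr<ToyDistanceComputer> toy_get_distance_computer(
        const uint8_t* query,
        size_t code_size) {
    switch (code_size) {
        case 4:
            return std::make_unique<
                    ToyDistanceComputerImpl<ToyHammingComputerFixed<4>>>(
                    query, code_size);
        case 8:
            return std::make_unique<
                    ToyDistanceComputerImpl<ToyHammingComputerFixed<8>>>(
                    query, code_size);
        default:
            return std::make_unique<
                    ToyDistanceComputerImpl<ToyHammingComputerDefault>>(
                    query, code_size);
    }
}
```

Because the computer type is baked into the template instantiation, whichever TU instantiates the factory body decides which struct layout is used. That is presumably why these call sites needed to move into per-ISA TUs.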

NOT converted: the IndexIVFPQ.cpp polysemous path. It is deeply
interleaved with PQ distance computation inside IVFPQScannerT and is
niche (off by default, polysemous_ht=0). See ~/ivfpq_polysemous_dd.md
for analysis.

Differential Revision: D100020358

meta-codesync · 2026-04-09T15:46:39Z

@algoriddle has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100020358.

…5070)

Summary:

Convert all 8 public Hamming distance functions in `hamming.cpp` to dynamic dispatch so they use the correct SIMD implementation at runtime instead of always running at scalar speed in DD mode.

This is Phase 1 of the Hamming DD conversion. It converts the public API functions in `hamming.cpp`; Phase 2 (future diff) will migrate external callers that use `dispatch_HammingComputer` directly.

**Why Hamming is harder than previous DD conversions:** The SIMD code lives inside 11 HammingComputer/GenHammingComputer structs with ISA-specific memory layouts (e.g., NEON `HammingComputer16` stores `uint8x16_t a0` vs x86 `uint64_t a0, a1`). The structs use raw intrinsics (not simdlib) and are template parameters in hot inner loops where `.hamming()` must be inlined.

**Approach:**

- Extract all template code from `hamming.cpp` into a shared header `hamming_impl.h`, compiled once per ISA TU (AVX2, NEON, NONE).
- The `hamdis-inl.h` ISA ladder (`#ifdef __AVX2__` etc.) selects the correct struct definitions based on per-TU compiler flags.
- All internal templates live in an anonymous namespace for ODR safety (different TUs see different struct layouts but never share instances).
- 8 entry point template specializations (`_dispatch<SL>`) provide the external linkage symbols that the dispatch wrappers call.
- `hamming.cpp` includes `hamming_impl.h` with `THE_SIMD_LEVEL=NONE` for the scalar fallback, then dispatches via `with_simd_level_256bit`.
- `hamming.h` keeps including `hamdis-inl.h` for backward compatibility with external callers (they continue to work with generic structs).

**Functions dispatched:**

- `hammings_knn_hc` (binary flat/IVF kNN search)
- `hammings_knn_mc` (max-count kNN)
- `hamming_range_search`
- `hammings` (all-pairs distance)
- `generalized_hammings_knn_hc` (byte-level Hamming)
- `hamming_count_thres`, `crosshamming_count_thres`, `match_hamming_thres`

**Performance impact:**

- NEON: HammingComputer16/20/32/64 gain real vectorization (`vcntq_u8`)
- AVX2: GenHammingComputer16/32/M8 gain SSE2/AVX2 byte comparison
- AVX2: All HammingComputers get hardware `popcnt` (implied by `-mpopcnt`)

Differential Revision: D99994593
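The Phase 1 approach (internal templates in an anonymous namespace, one externally visible specialization per SIMD level, and a wrapper that picks one at runtime) can be sketched in a single file as follows. This is an illustration under stated assumptions, not the faiss code: `ToySIMDLevel`, `toy_hamming_impl`, `toy_hamming_dispatch` and `toy_hamming` are hypothetical names, the level is passed in explicitly instead of being detected, and all three specializations share one scalar body because a single file cannot be compiled with three sets of ISA flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical SIMD level tag.
enum class ToySIMDLevel { NONE, AVX2, NEON };

namespace {
// Internal linkage: in a per-ISA build each TU would get its own private
// copy, so differing struct layouts never meet across TUs.
template <ToySIMDLevel SL>
int toy_hamming_impl(const uint8_t* a, const uint8_t* b, size_t n) {
    int dist = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned diff = static_cast<unsigned>(a[i] ^ b[i]);
        while (diff) { // clear the lowest set bit until none remain
            diff &= diff - 1;
            ++dist;
        }
    }
    return dist;
}
} // namespace

// Externally visible entry point, declared once and specialized per level.
template <ToySIMDLevel SL>
int toy_hamming_dispatch(const uint8_t* a, const uint8_t* b, size_t n);

// In a real per-ISA build each specialization would live in its own TU,
// compiled with the matching flags.
template <>
int toy_hamming_dispatch<ToySIMDLevel::NONE>(
        const uint8_t* a, const uint8_t* b, size_t n) {
    return toy_hamming_impl<ToySIMDLevel::NONE>(a, b, n);
}

template <>
int toy_hamming_dispatch<ToySIMDLevel::AVX2>(
        const uint8_t* a, const uint8_t* b, size_t n) {
    return toy_hamming_impl<ToySIMDLevel::AVX2>(a, b, n);
}

template <>
int toy_hamming_dispatch<ToySIMDLevel::NEON>(
        const uint8_t* a, const uint8_t* b, size_t n) {
    return toy_hamming_impl<ToySIMDLevel::NEON>(a, b, n);
}

// Runtime wrapper: selects the specialization for the given level.
int toy_hamming(
        ToySIMDLevel level, const uint8_t* a, const uint8_t* b, size_t n) {
    assert(a != nullptr && b != nullptr);
    switch (level) {
        case ToySIMDLevel::AVX2:
            return toy_hamming_dispatch<ToySIMDLevel::AVX2>(a, b, n);
        case ToySIMDLevel::NEON:
            return toy_hamming_dispatch<ToySIMDLevel::NEON>(a, b, n);
        case ToySIMDLevel::NONE:
            break;
    }
    return toy_hamming_dispatch<ToySIMDLevel::NONE>(a, b, n);
}
```

All levels must return the same distance for the same inputs; only the speed should differ. That property is a natural thing to assert in a test that iterates over the available levels.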

meta-cla bot added the CLA Signed label Apr 9, 2026

meta-codesync bot added fb-exported meta-exported labels Apr 9, 2026

algoriddle added 2 commits April 10, 2026 04:56

algoriddle force-pushed the export-D100020358 branch from cdd97db to 79d54ee Compare April 10, 2026 11:56

meta-codesync bot changed the title ~~Convert binary index Hamming callers to dynamic dispatch~~ Convert binary index Hamming callers to dynamic dispatch (#5071) Apr 10, 2026

algoriddle wants to merge 2 commits into facebookresearch:main from algoriddle:export-D100020358
