Optimize multibit sign-bit unpacking in RaBitQ FastScan handlers by alibeklfc · Pull Request #5097 · facebookresearch/faiss

alibeklfc · 2026-04-13T23:43:03Z

Summary:
Replace CodePackerRaBitQ::unpack_1() with rabitq_utils::unpack_sign_bits_from_packed() in both RaBitQHeapHandler and IVFRaBitQHeapHandler multibit refinement paths.

The old path called pq4_get_packed_element twice per output byte, each call recomputing the vector's in-block position from scratch (division, modulo, branches). The new function precomputes the PQ4 address once and iterates with simple strided byte loads. It also skips the unnecessary auxiliary data copy that unpack_1 performed.
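The difference between the two access patterns can be sketched on a toy block-interleaved layout. This is an illustration only: the layout below is a hypothetical simplification, not the real PQ4 layout, and `kBlock`, `get_byte_slow`, `unpack_slow`, `unpack_strided`, and `pack_blocked` are made-up names, not faiss functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy layout (NOT the real PQ4 layout): vectors are grouped in
// blocks of kBlock. Inside a block, byte j of all vectors is stored
// contiguously, so byte j of vector v lives at
//   (v / kBlock) * kBlock * code_size + j * kBlock + (v % kBlock).
constexpr size_t kBlock = 32;

// Per-element accessor: recomputes block and lane (division, modulo) per call.
inline uint8_t get_byte_slow(
        const uint8_t* data, size_t code_size, size_t vec_id, size_t j) {
    assert(j < code_size);
    size_t block = vec_id / kBlock;
    size_t lane = vec_id % kBlock;
    return data[block * kBlock * code_size + j * kBlock + lane];
}

// Baseline: one accessor call (and one address recomputation) per output byte.
void unpack_slow(
        const uint8_t* data, size_t code_size, size_t vec_id, uint8_t* out) {
    for (size_t j = 0; j < code_size; j++) {
        out[j] = get_byte_slow(data, code_size, vec_id, j);
    }
}

// Optimized: resolve the vector's base address once, then do strided loads.
void unpack_strided(
        const uint8_t* data, size_t code_size, size_t vec_id, uint8_t* out) {
    const uint8_t* src =
            data + (vec_id / kBlock) * kBlock * code_size + (vec_id % kBlock);
    for (size_t j = 0; j < code_size; j++) {
        out[j] = src[j * kBlock];
    }
}

// Helper for experiments: pack flat per-vector codes into the toy layout.
std::vector<uint8_t> pack_blocked(
        const std::vector<std::vector<uint8_t>>& codes, size_t code_size) {
    size_t nblocks = (codes.size() + kBlock - 1) / kBlock;
    std::vector<uint8_t> data(nblocks * kBlock * code_size, 0);
    for (size_t v = 0; v < codes.size(); v++) {
        for (size_t j = 0; j < code_size; j++) {
            data[(v / kBlock) * kBlock * code_size + j * kBlock + v % kBlock] =
                    codes[v][j];
        }
    }
    return data;
}
```

Both functions produce the same bytes; the strided version simply hoists the division and modulo out of the loop, which is the kind of saving the unpack-only numbers below measure.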

Micro-benchmark results (unpack-only, median ns/call):

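| d (dimension) | Old (ns) | New (ns) | Speedup |
|------|----------|----------|---------|
| 64 | 627 | 166 | 3.8x |
| 128 | 1204 | 279 | 4.3x |
| 256 | 2329 | 525 | 4.4x |
| 512 | 4583 | 996 | 4.6x |
| 768 | 6731 | 1376 | 4.9x |
| 1024 | 9344 | 1819 | 5.1x |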

End-to-end (unpack + SIMD distance) speedup is 1.4-1.6x.
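For readers who want to reproduce a "median ns/call" figure, a minimal timing harness might look like the sketch below. It is not the benchmark used for the numbers above (the PR does not include it); `median` and `median_ns_per_call` are hypothetical helpers, and the sample and batch counts are left to the caller.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Median of a non-empty sample set (upper median for even sizes).
inline double median(std::vector<double> v) {
    std::nth_element(v.begin(), v.begin() + v.size() / 2, v.end());
    return v[v.size() / 2];
}

// Times `calls_per_sample` back-to-back calls of fn, `samples` times, and
// returns the median per-call cost in nanoseconds.
// Requires samples > 0 and calls_per_sample > 0.
template <class F>
double median_ns_per_call(F&& fn, size_t samples, size_t calls_per_sample) {
    std::vector<double> ns(samples);
    for (size_t s = 0; s < samples; s++) {
        auto t0 = std::chrono::steady_clock::now();
        for (size_t c = 0; c < calls_per_sample; c++) {
            fn();
        }
        auto t1 = std::chrono::steady_clock::now();
        ns[s] = std::chrono::duration<double, std::nano>(t1 - t0).count() /
                static_cast<double>(calls_per_sample);
    }
    return median(ns);
}
```

Using the median rather than the mean keeps occasional scheduler or cache outliers from skewing the per-call figure.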

Additional cleanup: removed CodePacker heap allocation and virtual dispatch from both handlers.

Differential Revision: D100718832

…cebookresearch#5095)

Summary: D100399519 added IVFRaBitQSearchParameters support to the FastScan scanner but only patched the distance_to_code fallback path. The main search path (LUT construction and SIMD distance correction in handle()) still read qb/centered from the index, ignoring the search params override.

This diff completes the fix by:
1. Adding qb/centered fields to FastScanDistancePostProcessing context
2. Threading them through compute_LUT → compute_residual_LUT
3. Reading them from context in the handler's handle() method
4. Extracting them from IVFRaBitQSearchParameters in search_preassigned

Differential Revision: D100674751
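The override pattern described in that commit message can be sketched as follows. All types and functions here (`ToyIndex`, `ToySearchParams`, `ToyContext`, `make_context`, `lut_query_bits`) are hypothetical stand-ins, not the faiss classes; the sketch only assumes that qb is a small integer and centered is a flag.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the index defaults and per-search overrides.
struct ToyIndex {
    uint8_t qb = 4;
    bool centered = false;
};

struct ToySearchParams {
    uint8_t qb = 0;
    bool centered = false;
};

// Context passed down the search path; it carries the effective values.
struct ToyContext {
    uint8_t qb;
    bool centered;
};

// Resolve the effective settings once, at the entry point of the search:
// use the per-search parameters when given, else the index defaults.
ToyContext make_context(const ToyIndex& index, const ToySearchParams* params) {
    if (params != nullptr) {
        return {params->qb, params->centered};
    }
    return {index.qb, index.centered};
}

// Downstream stages (LUT construction, handler) read only from the context,
// so they cannot accidentally fall back to the index values.
uint8_t lut_query_bits(const ToyContext& ctx) {
    return ctx.qb;
}
```

Resolving the override in one place and passing the result down means every stage of the search path sees the same qb/centered values.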


meta-codesync · 2026-04-13T23:43:26Z

@alibeklfc has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100718832.

alibeklfc added 2 commits April 13, 2026 16:42

meta-cla bot added the CLA Signed label Apr 13, 2026

meta-codesync bot added fb-exported meta-exported labels Apr 13, 2026

alibeklfc wants to merge 2 commits into facebookresearch:main from alibeklfc:export-D100718832
