SIMD-optimize multi-bit RaBitQ inner product by alibeklfc · Pull Request #4850 · facebookresearch/faiss

alibeklfc · 2026-03-02T18:58:15Z

Summary:
The multi-bit RaBitQ distance computation (`compute_full_multibit_distance`) previously extracted each code value bit by bit using `extract_code_inline`, which iterated `ex_bits` times per dimension. That is O(d × ex_bits) work in total, with a data-dependent branch per bit.

This diff replaces it with two complementary optimizations:

1. Improved scalar extraction (all platforms):
Replaces the per-bit extraction loop with a 64-bit window read (`memcpy` + shift + mask) that extracts each code value in O(1) regardless of `ex_bits`. This alone yields a 25–142% QPS improvement, with larger gains at higher bit counts.
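The contrast between the two extraction strategies can be sketched as follows. This is a minimal illustration, not the Faiss code: it assumes codes are packed LSB-first with no gaps between dimensions, a little-endian host, and 1 <= `ex_bits` <= 8. The names `extract_code_per_bit`, `extract_code_window`, `pack_codes` and `check_roundtrip` are hypothetical, and the per-bit version only approximates the shape of the old loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical "before": reads ex_bits bits one at a time, O(ex_bits) per code.
inline uint32_t extract_code_per_bit(const uint8_t* codes, size_t i, int ex_bits) {
    uint32_t value = 0;
    size_t bit_pos = i * static_cast<size_t>(ex_bits);
    for (int b = 0; b < ex_bits; ++b, ++bit_pos) {
        if (codes[bit_pos >> 3] & (1u << (bit_pos & 7))) {  // data-dependent branch
            value |= 1u << b;
        }
    }
    return value;
}

// Hypothetical "after": one 64-bit window read, then shift + mask, O(1) per code.
// Assumes a little-endian host, so byte k of the window lands in bits 8k..8k+7.
inline uint32_t extract_code_window(const uint8_t* codes, size_t n_bytes, size_t i, int ex_bits) {
    size_t bit_pos = i * static_cast<size_t>(ex_bits);
    size_t byte_pos = bit_pos >> 3;
    uint64_t window = 0;
    // Clamp the copy at the buffer end; a padded buffer would allow an unconditional 8-byte copy.
    std::memcpy(&window, codes + byte_pos, std::min<size_t>(8, n_bytes - byte_pos));
    uint64_t mask = (uint64_t{1} << ex_bits) - 1;
    return static_cast<uint32_t>((window >> (bit_pos & 7)) & mask);
}

// Packs codes of ex_bits bits each, LSB-first, into ceil(d * ex_bits / 8) bytes.
inline std::vector<uint8_t> pack_codes(const std::vector<uint32_t>& values, int ex_bits) {
    std::vector<uint8_t> out((values.size() * ex_bits + 7) / 8, 0);
    for (size_t i = 0; i < values.size(); ++i) {
        for (int b = 0; b < ex_bits; ++b) {
            if ((values[i] >> b) & 1u) {
                size_t pos = i * ex_bits + b;
                out[pos >> 3] |= static_cast<uint8_t>(1u << (pos & 7));
            }
        }
    }
    return out;
}

// Round-trip check: both extractors must recover every packed value.
inline void check_roundtrip(int ex_bits, size_t d) {
    std::vector<uint32_t> values(d);
    const uint32_t mask = (1u << ex_bits) - 1;
    for (size_t i = 0; i < d; ++i) values[i] = static_cast<uint32_t>(i * 37 + 5) & mask;
    const std::vector<uint8_t> packed = pack_codes(values, ex_bits);
    for (size_t i = 0; i < d; ++i) {
        assert(extract_code_per_bit(packed.data(), i, ex_bits) == values[i]);
        assert(extract_code_window(packed.data(), packed.size(), i, ex_bits) == values[i]);
    }
}
```

With `ex_bits = 7` the per-bit loop runs 7 branchy iterations per dimension, while the window version always does one load, one shift and one mask. That difference is consistent with the larger gains reported at more bits.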

2. SIMD bit-plane decomposition (AVX2 + BMI2):
Instead of extracting per-element multi-bit codes, this decomposes the inner product into `(1 + ex_bits)` bit-plane dot products. Each plane is a float × bit-vector dot product computed via a bit→mask→float conversion. For `ex_bits == 1`, both the sign and ex codes are 1-bit packed, which enables zero-extraction kernels (AVX-512 and AVX2). For `ex_bits` 2–7, BMI2 `PEXT` extracts each bit plane with one instruction per 8 dimensions.
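The bit-plane idea rests on the identity ⟨q, c⟩ = Σ_b 2^b · ⟨q, bit_b(c)⟩, where bit_b(c) is the 0/1 vector holding bit b of every code. The sketch below is a minimal, portable illustration under stated assumptions. Each dimension carries one unsigned `nbits`-bit code packed LSB-first with no gaps, `d` is a multiple of 8, and the host is little-endian. It does not model how Faiss weights the sign plane against the `ex_bits` planes. `pext64`, `plane_dot8`, `ip_bitplane` and `ip_reference` are hypothetical names.

Because 8 codes of `nbits` bits occupy exactly `nbits` bytes, every group of 8 dimensions starts on a byte boundary. One window load plus one PEXT per plane therefore yields an 8-bit lane mask. When built with `-mbmi2` on x86-64 (GCC and Clang then define `__BMI2__`), `pext64` uses the `_pext_u64` intrinsic; otherwise it falls back to a bit-by-bit emulation with the same semantics.

```cpp
#include <cassert>  // used by the checks in the usage example
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>  // _pext_u64
#endif

// PEXT semantics: gather the bits of `src` selected by `mask` into the low bits of the result.
inline uint64_t pext64(uint64_t src, uint64_t mask) {
#if defined(__BMI2__) && defined(__x86_64__)
    return _pext_u64(src, mask);  // a single instruction when compiled with -mbmi2
#else
    uint64_t result = 0;
    uint64_t out_bit = 1;
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1);  // lowest still-selected bit
        if (src & lowest) result |= out_bit;
        out_bit <<= 1;
        mask &= mask - 1;  // clear that bit
    }
    return result;
#endif
}

// Sum of q[j] over the 8 lanes j whose bit is set in `bits`.
// One plausible AVX2 mapping: broadcast `bits`, AND with {1,2,...,128},
// compare-equal to get an all-ones lane mask, then AND with the 8 query floats.
inline float plane_dot8(const float* q, uint32_t bits) {
    float acc = 0.0f;
    for (int j = 0; j < 8; ++j) {
        acc += ((bits >> j) & 1u) ? q[j] : 0.0f;
    }
    return acc;
}

// <q, c> computed plane by plane. Assumes d % 8 == 0, 1 <= nbits <= 7, little-endian host.
inline float ip_bitplane(const float* q, const uint8_t* codes, size_t d, int nbits) {
    float total = 0.0f;
    for (int b = 0; b < nbits; ++b) {
        // Select bit b of each of 8 consecutive nbits-wide fields.
        uint64_t plane_mask = 0;
        for (int j = 0; j < 8; ++j) plane_mask |= uint64_t{1} << (j * nbits + b);
        float plane_sum = 0.0f;
        for (size_t i0 = 0; i0 < d; i0 += 8) {
            // 8 codes of nbits bits occupy exactly nbits bytes, starting byte-aligned.
            uint64_t window = 0;
            std::memcpy(&window, codes + (i0 / 8) * nbits, static_cast<size_t>(nbits));
            uint32_t bits = static_cast<uint32_t>(pext64(window, plane_mask));
            plane_sum += plane_dot8(q + i0, bits);
        }
        total += static_cast<float>(1u << b) * plane_sum;  // weight plane b by 2^b
    }
    return total;
}

// Reference: extract each code bit by bit and accumulate q[i] * code.
inline float ip_reference(const float* q, const uint8_t* codes, size_t d, int nbits) {
    float total = 0.0f;
    for (size_t i = 0; i < d; ++i) {
        uint32_t c = 0;
        for (int b = 0; b < nbits; ++b) {
            size_t pos = i * nbits + b;
            c |= ((codes[pos >> 3] >> (pos & 7)) & 1u) << b;
        }
        total += q[i] * static_cast<float>(c);
    }
    return total;
}
```

For example, with `nbits = 2`, codes `{0,1,2,3,3,2,1,0}` (packed bytes `228, 27`) and `q = {1,...,8}`, plane 0 sums q at the odd codes (2+4+5+7 = 18) and plane 1 sums q at codes ≥ 2 (3+4+5+6 = 18). The total is 18 + 2·18 = 54, matching the direct inner product.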

Also adds `-mbmi2` to the AVX2 compiler flags in `xplat.bzl`.

Recall@10 is identical before and after the change for every `nb_bits` setting.

Differential Revision: D94587233


meta-codesync · 2026-03-02T18:58:24Z

@alibeklfc has exported this pull request. If you are a Meta employee, you can view the originating Diff in D94587233.

alexanderguzhva · 2026-03-02T23:09:32Z

@alibeklfc BMI2 is VERY slow on AMD Zen3 and below

meta-codesync · 2026-03-03T17:00:24Z

This pull request has been merged in 8af77fe.

AlSchlo · 2026-03-15T09:37:17Z

Hi @alibeklfc and @alexanderguzhva,

Does Faiss support BMI2 in the open-source build? I noticed BMI2 checks in the code guards, but I do not see it enabled in the public compile targets. I imagine xplat.bzl is a file internal to Meta?

The reason I ask is that I am experimenting with a feature that relies on PEXT (specifically for PQ with Panorama). For now I have been adding the -mbmi2 flag manually to enable it.

Would it be acceptable to add this flag to the OSS build configuration as well, or is there a reason it is intentionally omitted? What would be the best workaround?

Thanks.

meta-cla Bot added the CLA Signed label Mar 2, 2026

meta-codesync Bot added fb-exported meta-exported labels Mar 2, 2026

meta-codesync Bot closed this in 8af77fe Mar 3, 2026

facebook-github-bot added the Merged label Mar 3, 2026


alibeklfc wants to merge 1 commit into facebookresearch:main from alibeklfc:export-D94587233
