Fix: static SIMD dispatch falls to scalar for avx512_spr/avx512/arm_sve builds#5057

Open
mulugetam wants to merge 1 commit into facebookresearch:main from mulugetam:static-dispatch-fix
Contributor

@mulugetam mulugetam commented Apr 7, 2026

While running benchs/bench_rabitq.py, I observed that the execution path uses scalar instructions instead of AVX-512, even though FAISS was built with a static dispatch target of avx512_spr. Below are a summary and a fix. If the current behavior was a deliberate design decision, I am not sure what the reasoning is.

Summary

Static (non-DD) builds with avx512_spr, and partially avx512 and arm_sve, silently fall back to scalar (SIMDLevel::NONE) for all SIMD-dispatched functions. This is because the static dispatch path in with_selected_simd_levels lacks the fallthrough logic that the DD path implements via switch/[[fallthrough]].

Problem

The static dispatch in faiss/impl/simd_dispatch.h performs a single check:

if constexpr (available_levels & (1 << int(SINGLE_SIMD_LEVEL))) {
    return action.template operator()<SINGLE_SIMD_LEVEL>();
} else {
    return action.template operator()<SIMDLevel::NONE>();
}

This is a binary decision: either SINGLE_SIMD_LEVEL is in the mask, or the call resolves to NONE. None of the predefined masks (A0, A1, A2, AVX2_NEON, MINIMAX_HEAP_SIMD_LEVELS) includes AVX512_SPR, so a static avx512_spr build always resolves to scalar.

For example, RaBitQuantizer dispatches via with_selected_simd_levels<AVAILABLE_SIMD_LEVELS_A0>(...).
AVAILABLE_SIMD_LEVELS_A0 is defined as:

constexpr int AVAILABLE_SIMD_LEVELS_A0 =
       AVAILABLE_SIMD_LEVELS_AVX2_NEON | (1 << int(SIMDLevel::AVX512));

This mask includes bits for NONE (0), AVX2 (1), AVX512 (2), and ARM_NEON (4) but it does NOT include AVX512_SPR (3).
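The bit arithmetic can be checked in isolation. The following is a minimal sketch, not the FAISS header itself: the enumerator values are the bit positions quoted above, the composition of AVAILABLE_SIMD_LEVELS_AVX2_NEON is inferred from that description, and the helpers `level_bit` and `has_level` are hypothetical names introduced only for this illustration.

```cpp
// Enumerator values follow the bit positions quoted in the PR description.
enum class SIMDLevel : int {
    NONE = 0,
    AVX2 = 1,
    AVX512 = 2,
    AVX512_SPR = 3,
    ARM_NEON = 4,
};

// Hypothetical helper: the mask bit that corresponds to one level.
constexpr int level_bit(SIMDLevel level) {
    return 1 << int(level);
}

// Assumed composition, inferred from the description of A0 above.
constexpr int AVAILABLE_SIMD_LEVELS_AVX2_NEON = level_bit(SIMDLevel::NONE) |
        level_bit(SIMDLevel::AVX2) | level_bit(SIMDLevel::ARM_NEON);

// Same definition as quoted in the PR description.
constexpr int AVAILABLE_SIMD_LEVELS_A0 =
        AVAILABLE_SIMD_LEVELS_AVX2_NEON | (1 << int(SIMDLevel::AVX512));

// Hypothetical helper: is this level present in the mask?
constexpr bool has_level(int mask, SIMDLevel level) {
    return (mask & level_bit(level)) != 0;
}

// Bits 0, 1, 2 and 4 are set; bit 3 (AVX512_SPR) is not.
static_assert(AVAILABLE_SIMD_LEVELS_A0 == 0b10111);
static_assert(has_level(AVAILABLE_SIMD_LEVELS_A0, SIMDLevel::AVX512));
static_assert(!has_level(AVAILABLE_SIMD_LEVELS_A0, SIMDLevel::AVX512_SPR));
```

With SINGLE_SIMD_LEVEL set to AVX512_SPR, the exact-match test `available_levels & (1 << 3)` is zero for this mask, which is why the static path takes the NONE branch.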

The DD path handles this correctly because its switch statement falls through from AVX512_SPR --> AVX512 --> AVX2 --> NONE, picking the best implementation present in the mask. The static path has no equivalent mechanism.
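For readers unfamiliar with the DD behavior, the fallthrough can be sketched roughly as follows. This is an illustration of the switch/[[fallthrough]] pattern described above, not the actual FAISS code: `dd_select` is a hypothetical function, it returns the chosen level instead of invoking an action, the value ARM_SVE = 5 is an assumption, and the mask composition is inferred from the description.

```cpp
// Bit positions as quoted in the PR description; ARM_SVE = 5 is assumed.
enum class SIMDLevel : int {
    NONE = 0,
    AVX2 = 1,
    AVX512 = 2,
    AVX512_SPR = 3,
    ARM_NEON = 4,
    ARM_SVE = 5,
};

constexpr bool has_level(int mask, SIMDLevel level) {
    return (mask & (1 << int(level))) != 0;
}

// Hypothetical stand-in for the DD dispatch: start at the detected level and
// fall through to weaker levels until one is present in the mask.
constexpr SIMDLevel dd_select(int available_levels, SIMDLevel detected) {
    switch (detected) {
        case SIMDLevel::AVX512_SPR:
            if (has_level(available_levels, SIMDLevel::AVX512_SPR))
                return SIMDLevel::AVX512_SPR;
            [[fallthrough]];
        case SIMDLevel::AVX512:
            if (has_level(available_levels, SIMDLevel::AVX512))
                return SIMDLevel::AVX512;
            [[fallthrough]];
        case SIMDLevel::AVX2:
            if (has_level(available_levels, SIMDLevel::AVX2))
                return SIMDLevel::AVX2;
            break; // do not leak from the x86 chain into the ARM chain
        case SIMDLevel::ARM_SVE:
            if (has_level(available_levels, SIMDLevel::ARM_SVE))
                return SIMDLevel::ARM_SVE;
            [[fallthrough]];
        case SIMDLevel::ARM_NEON:
            if (has_level(available_levels, SIMDLevel::ARM_NEON))
                return SIMDLevel::ARM_NEON;
            break;
        case SIMDLevel::NONE:
            break;
    }
    return SIMDLevel::NONE;
}

// Mask with NONE, AVX2, AVX512 and ARM_NEON, matching the description of A0.
constexpr int MASK_A0 = (1 << 0) | (1 << 1) | (1 << 2) | (1 << 4);

static_assert(dd_select(MASK_A0, SIMDLevel::AVX512_SPR) == SIMDLevel::AVX512);
static_assert(dd_select(MASK_A0, SIMDLevel::ARM_SVE) == SIMDLevel::ARM_NEON);
```

The `break` after the AVX2 case is a choice made for this sketch so that an x86 level can never resolve to an ARM implementation; how the real switch separates the two families is not shown in this PR.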

Affected configurations

| Static build | Mask | Dispatches to (current) | Should dispatch to |
| --- | --- | --- | --- |
| avx512_spr | A0, A1, MINIMAX | NONE | AVX512 |
| avx512_spr | A2, AVX2_NEON | NONE | AVX2 |
| avx512 | A2, AVX2_NEON | NONE | AVX2 |
| arm_sve | A0, AVX2_NEON | NONE | ARM_NEON |

DD builds are not affected.

Fix

Add a compile-time fallthrough chain in the static dispatch path that mirrors the DD runtime behavior:

  • AVX512_SPR --> try AVX512 --> try AVX2 --> NONE
  • AVX512 --> try AVX2 --> NONE
  • ARM_SVE --> try ARM_NEON --> NONE

This would fix broken (level x mask) combinations across distances, RaBitQ, scalar/product quantizer, HNSW, IndexFlat, IVF, and fused distances.
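One way such a compile-time chain could look is sketched below. It is not the code from this PR's diff: `resolve_static_level`, `with_selected_simd_levels_static`, and `ReportLevel` are hypothetical names, the compiled level is passed as a template parameter instead of coming from SINGLE_SIMD_LEVEL, ARM_SVE = 5 is assumed, and the mask contents are inferred from the description. It requires C++17 for `if constexpr`.

```cpp
// Bit positions as quoted in the PR description; ARM_SVE = 5 is assumed.
enum class SIMDLevel : int {
    NONE = 0,
    AVX2 = 1,
    AVX512 = 2,
    AVX512_SPR = 3,
    ARM_NEON = 4,
    ARM_SVE = 5,
};

// Assumed mask contents, inferred from the PR description.
constexpr int AVAILABLE_SIMD_LEVELS_AVX2_NEON = (1 << 0) | (1 << 1) | (1 << 4);
constexpr int AVAILABLE_SIMD_LEVELS_A0 =
        AVAILABLE_SIMD_LEVELS_AVX2_NEON | (1 << int(SIMDLevel::AVX512));

// Walk the chain at compile time: use `level` if the mask has it, otherwise
// retry with the next weaker level of the same architecture family.
template <int available_levels, SIMDLevel level>
constexpr SIMDLevel resolve_static_level() {
    if constexpr ((available_levels & (1 << int(level))) != 0) {
        return level;
    } else if constexpr (level == SIMDLevel::AVX512_SPR) {
        return resolve_static_level<available_levels, SIMDLevel::AVX512>();
    } else if constexpr (level == SIMDLevel::AVX512) {
        return resolve_static_level<available_levels, SIMDLevel::AVX2>();
    } else if constexpr (level == SIMDLevel::ARM_SVE) {
        return resolve_static_level<available_levels, SIMDLevel::ARM_NEON>();
    } else {
        return SIMDLevel::NONE; // end of both chains
    }
}

// Hypothetical static dispatcher: invoke the action at the resolved level.
template <int available_levels, SIMDLevel single_level, typename Action>
constexpr auto with_selected_simd_levels_static(const Action& action) {
    constexpr SIMDLevel chosen =
            resolve_static_level<available_levels, single_level>();
    return action.template operator()<chosen>();
}

// Example action that just reports which level it was instantiated with.
struct ReportLevel {
    template <SIMDLevel L>
    constexpr SIMDLevel operator()() const {
        return L;
    }
};

// Rows of the "Affected configurations" table, now resolving as intended.
static_assert(
        with_selected_simd_levels_static<
                AVAILABLE_SIMD_LEVELS_A0,
                SIMDLevel::AVX512_SPR>(ReportLevel{}) == SIMDLevel::AVX512);
static_assert(
        with_selected_simd_levels_static<
                AVAILABLE_SIMD_LEVELS_AVX2_NEON,
                SIMDLevel::AVX512>(ReportLevel{}) == SIMDLevel::AVX2);
```

Because the whole chain is resolved with `if constexpr`, only the selected specialization is instantiated, so a static build would still contain a single code path per dispatched function.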

…ve builds

The static (non-DD) dispatch path in with_selected_simd_levels performs a
single exact-match check against SINGLE_SIMD_LEVEL. When the compiled
level is not in the available-levels mask, it falls directly to NONE
(scalar) instead of trying lower SIMD levels.

No predefined mask includes AVX512_SPR, and no AVX512_SPR template
specializations exist, so static avx512_spr builds dispatch every
SIMD-accelerated function to scalar. Static avx512 builds also regress
to scalar for 256-bit operations (AVX2_NEON mask), and static arm_sve
builds lose ARM_NEON fallback.

Add a compile-time fallthrough chain mirroring the DD switch/fallthrough:
  x86: AVX512_SPR -> AVX512 -> AVX2 -> NONE
  ARM: ARM_SVE -> ARM_NEON -> NONE

Fixes 9 broken (level x mask) combinations across distances, RaBitQ,
scalar/product quantizer, HNSW, IndexFlat, IVF, and fused distances.

Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
@meta-cla meta-cla bot added the CLA Signed label Apr 7, 2026
@mulugetam mulugetam changed the title FIX: static SIMD dispatch falls to scalar for avx512_spr/avx512/arm_sve builds Fix: static SIMD dispatch falls to scalar for avx512_spr/avx512/arm_sve builds Apr 7, 2026
@alibeklfc
Contributor

Hi! Thank you for your contribution!
We are now almost done refactoring SIMD code in FAISS. Once we finish, we will review this diff.
