Fix: static SIMD dispatch falls to scalar for avx512_spr/avx512/arm_sve builds#5057
Open
mulugetam wants to merge 1 commit intofacebookresearch:mainfrom
Open
Fix: static SIMD dispatch falls to scalar for avx512_spr/avx512/arm_sve builds#5057mulugetam wants to merge 1 commit intofacebookresearch:mainfrom
mulugetam wants to merge 1 commit intofacebookresearch:mainfrom
Conversation
…ve builds The static (non-DD) dispatch path in with_selected_simd_levels performs a single exact-match check against SINGLE_SIMD_LEVEL. When the compiled level is not in the available-levels mask, it falls directly to NONE (scalar) instead of trying lower SIMD levels. No predefined mask includes AVX512_SPR, and no AVX512_SPR template specializations exist, so static avx512_spr builds dispatch every SIMD-accelerated function to scalar. Static avx512 builds also regress to scalar for 256-bit operations (AVX2_NEON mask), and static arm_sve builds lose ARM_NEON fallback. Add a compile-time fallthrough chain mirroring the DD switch/fallthrough: x86: AVX512_SPR -> AVX512 -> AVX2 -> NONE ARM: ARM_SVE -> ARM_NEON -> NONE Fixes 9 broken (level x mask) combinations across distances, RaBitQ, scalar/product quantizer, HNSW, IndexFlat, IVF, and fused distances. Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
Contributor
|
Hi! Thank you for your contribution! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
While running
benchs/bench_rabitq.py, I observed that the execution path is using scalar instructions instead of AVX-512, despite FAISS being built with a static dispatch target ofavx512_spr. Below is a summary and a fix (Unless this was a deliberate design decision, I’m not sure what the reasoning is).Summary
Static (non-DD) builds with
avx512_spr, and partiallyavx512andarm_sve, silently fall back to scalar (SIMDLevel::NONE) for all SIMD-dispatched functions. This is because the static dispatch path inwith_selected_simd_levelslacks the fallthrough logic that the DD path implements viaswitch/[[fallthrough]].Problem
The static dispatch in
faiss/impl/simd_dispatch.hperforms a single check:This is a binary decision and none of the predefined masks (
A0,A1,A2,AVX2_NEON,MINIMAX_HEAP_SIMD_LEVELS) includeAVX512_SPR, so in a staticavx512_sprbuild, it resolves to scalar.For example,
RaBitQuantizerdispatches viawith_selected_simd_levels<AVAILABLE_SIMD_LEVELS_A0>(...).AVAILABLE_SIMD_LEVELS_A0is defined as:This mask includes bits for
NONE (0),AVX2 (1),AVX512 (2), andARM_NEON (4)but it does NOT includeAVX512_SPR (3).The DD path handles this correctly because its
switchstatement falls through fromAVX512_SPR-->AVX512-->AVX2-->NONE, picking the best available implementation. The static path had no equivalent mechanism.Affected configurations
avx512_sprNONEAVX512avx512_sprNONEAVX2avx512NONEAVX2arm_sveNONEARM_NEONDD builds are not affected.
Fix
Add a compile-time fallthrough chain in the static dispatch path that mirrors the DD runtime behavior:
AVX512_SPR--> tryAVX512--> tryAVX2-->NONEAVX512--> tryAVX2-->NONEARM_SVE--> tryARM_NEON-->NONEThis would fix broken (level x mask) combinations across distances, RaBitQ, scalar/product quantizer, HNSW, IndexFlat, IVF, and fused distances.