ScalarQuantizer: split SIMD specializations into per-SIMD TUs + DD dispatch#4839
Closed
algoriddle wants to merge 1 commit into
Contributor
@algoriddle has exported this pull request. If you are a Meta employee, you can view the originating Diff in D94375408.
8d56538 to
2f94b02
Compare
algoriddle
added a commit
to algoriddle/faiss
that referenced
this pull request
Feb 25, 2026
…spatch (facebookresearch#4839)

Summary: Split the SIMD-gated template specializations out of ScalarQuantizer.cpp into per-SIMD compilation units and wire up the Dynamic Dispatch (DD) infrastructure (`COMPILE_SIMD_*`, `with_simd_level`, `DISPATCH_SIMDLevel`). This follows the established pattern from `pq_code_distance/` and `distances/`.

**New files:**
- `sq_impl.h`: declares `sq_select_quantizer<SL>`, `sq_select_distance_computer<SL>`, `sq_select_InvertedListScanner<SL>`
- `sq-inl.h`: private implementation header with shared template bodies (`select_quantizer_1_body`, `select_distance_computer_body`, `select_InvertedListScanner_body`) and scanner class templates (`IVFSQScannerIP`, `IVFSQScannerL2`)
- `sq-generic.cpp`: `SIMDLevel::NONE` specializations (always compiled)
- `sq-avx2.cpp`: `SIMDLevel::AVX2` specializations (`d%8` alignment)
- `sq-avx512.cpp`: `SIMDLevel::AVX512` + `AVX512_SPR` forwarding
- `sq-neon.cpp`: `SIMDLevel::ARM_NEON` specializations (`d%8` alignment)

**Modified files:**
- `ScalarQuantizer.cpp`: rewritten to use `with_simd_level` dispatch with nullptr-fallback to NONE
- `quantizers.h`, `distance_computers.h`: lint formatting only
- `xplat.bzl`, `CMakeLists.txt`: register new SIMD files and headers

Each per-SIMD factory returns `nullptr` when the dimension doesn't align, and the caller falls back to NONE. This avoids ODR issues from instantiating `<NONE>` templates in multiple TUs.

The sub-headers (codecs.h, quantizers.h, similarities.h, distance_computers.h) keep their original compiler-defined guards (`__AVX512F__`, `__AVX2__`, `USE_NEON`, etc.) because `COMPILE_SIMD_*` macros are globally visible in DD mode but the SIMD intrinsics are only available in per-SIMD TUs. The `USE_*` macros are now defined in `sq-inl.h`.

Differential Revision: D94375408
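The nullptr-fallback described in this commit message can be illustrated with a small single-file sketch. It is not the faiss code: `Quantizer`, `ScalarImpl`, `Avx2Impl`, `select_quantizer`, and `make_quantizer` are hypothetical stand-ins, the two-value `SIMDLevel` enum is a simplification, and the use of `std::unique_ptr` is an assumption made for the example. In the PR each specialization would live in its own translation unit; here they share one file so the example compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical, simplified stand-in for the library's SIMD level enum.
enum class SIMDLevel { NONE, AVX2 };

// Hypothetical interface; the real quantizer interface is richer.
struct Quantizer {
    virtual ~Quantizer() = default;
    virtual std::string name() const = 0;
};

struct ScalarImpl : Quantizer {
    std::string name() const override { return "NONE"; }
};

struct Avx2Impl : Quantizer {
    std::string name() const override { return "AVX2"; }
};

// Declared once; every level is provided as an explicit specialization.
template <SIMDLevel SL>
std::unique_ptr<Quantizer> select_quantizer(std::size_t d);

// The scalar level accepts every dimension.
template <>
std::unique_ptr<Quantizer> select_quantizer<SIMDLevel::NONE>(std::size_t) {
    return std::make_unique<ScalarImpl>();
}

// The SIMD level declines (returns nullptr) when d is not a multiple of 8.
template <>
std::unique_ptr<Quantizer> select_quantizer<SIMDLevel::AVX2>(std::size_t d) {
    if (d % 8 != 0) {
        return nullptr;
    }
    return std::make_unique<Avx2Impl>();
}

// Caller side: try the requested level, then fall back to NONE.
std::unique_ptr<Quantizer> make_quantizer(SIMDLevel level, std::size_t d) {
    std::unique_ptr<Quantizer> q;
    switch (level) {
        case SIMDLevel::AVX2:
            q = select_quantizer<SIMDLevel::AVX2>(d);
            break;
        case SIMDLevel::NONE:
            break;
    }
    if (!q) {
        q = select_quantizer<SIMDLevel::NONE>(d);
    }
    assert(q != nullptr); // the scalar level must always succeed
    return q;
}
```

With this shape, only the caller ever names the `NONE` specialization, so a SIMD translation unit has no reason to instantiate scalar templates itself, which is the ODR concern the message mentions.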
2f94b02 to
2fa9109
Compare
algoriddle
added a commit
to algoriddle/faiss
that referenced
this pull request
Feb 26, 2026
…spatch (facebookresearch#4839)
2fa9109 to
9607c46
Compare
algoriddle
added a commit
to algoriddle/faiss
that referenced
this pull request
Feb 26, 2026
…spatch (facebookresearch#4839)
algoriddle
added a commit
to algoriddle/faiss
that referenced
this pull request
Mar 2, 2026
…spatch (facebookresearch#4839)
9607c46 to
e47544e
Compare
algoriddle
added a commit
to algoriddle/faiss
that referenced
this pull request
Mar 2, 2026
…spatch (facebookresearch#4839)
e47544e to
744fa00
Compare
…spatch (facebookresearch#4839)

Summary: Split the SIMD-gated template specializations out of ScalarQuantizer.cpp and the shared headers into per-SIMD compilation units and wire up the Dynamic Dispatch (DD) infrastructure (COMPILE_SIMD_*, with_simd_level).

**What moved where**
- SIMD specializations removed from `codecs.h`, `quantizers.h`, `similarities.h`, `distance_computers.h`. These now contain only primary templates and scalar (`SIMDLevel::NONE`) specializations. (Most use empty primary templates; `quantizers.h` uses an inheriting fallback pattern for `QuantizerFP16`, `QuantizerBF16`, etc.)
- SIMD specializations moved into `sq-avx2.cpp` / `sq-avx512.cpp` / `sq-neon.cpp`, each guarded by `COMPILE_SIMD_*`.
- `sq-generic.cpp` deleted. The `NONE` level is now instantiated directly in `ScalarQuantizer.cpp` via `sq-dispatch.h`.
- `sq-inl.h` renamed to `scanners.h`.

**Dispatch mechanism**
- `sq-dispatch.h` is an X-macro-style header: each per-SIMD `.cpp` file `#define`s `THE_LEVEL_TO_DISPATCH` and `#include`s it to stamp out explicit template specializations of the selection functions (`sq_select_quantizer`, `sq_select_distance_computer`, `sq_select_InvertedListScanner`).
- `ScalarQuantizer.cpp` uses `with_simd_level` for runtime dispatch and instantiates the `NONE` level via the same `sq-dispatch.h`.
- Each per-SIMD selection function returns `nullptr` when the dimension doesn't align, and the caller falls back to `NONE`.
- `sq-neon.cpp` handles both `ARM_NEON` and `ARM_SVE` (SVE forwards to NEON; there is no dedicated SVE SQ implementation yet).

**Build**
- `xplat.bzl`, `CMakeLists.txt`: register new SIMD source files and headers.
- Within the SQ module, `COMPILE_SIMD_*` macros gate all SIMD code paths. (Compiler-defined macros like `__AVX2__` are still used in lower-level shared headers like `simdlib.h` and `fp16.h`.)

Differential Revision: D94375408
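To make the X-macro-style stamping concrete, here is a minimal single-file sketch under stated assumptions. A real X-macro header is `#include`d after the level macro is defined; to stay self-contained, this example replaces the header with an object-like macro, `SQ_DISPATCH_BODY`, whose replacement text mentions `THE_LEVEL_TO_DISPATCH` and therefore picks up whatever level is defined at the point of use. `SQ_DISPATCH_BODY`, `Lanes`, and `select_lane_width` are hypothetical names, and returning `0` stands in for the `nullptr` "does not align" result.

```cpp
#include <cstddef>

// Hypothetical, simplified stand-in for the library's SIMD level enum.
enum class SIMDLevel { NONE, AVX2 };

// Hypothetical per-level trait: number of floats processed per step.
template <SIMDLevel SL>
struct Lanes;
template <>
struct Lanes<SIMDLevel::NONE> {
    static constexpr std::size_t value = 1;
};
template <>
struct Lanes<SIMDLevel::AVX2> {
    static constexpr std::size_t value = 8;
};

// Shared declaration that every translation unit would see.
template <SIMDLevel SL>
std::size_t select_lane_width(std::size_t d);

// Stand-in for the contents of the dispatch header. THE_LEVEL_TO_DISPATCH
// is expanded when SQ_DISPATCH_BODY is used, not when it is defined.
#define SQ_DISPATCH_BODY                                                 \
    template <>                                                          \
    std::size_t select_lane_width<THE_LEVEL_TO_DISPATCH>(std::size_t d) { \
        constexpr std::size_t w = Lanes<THE_LEVEL_TO_DISPATCH>::value;   \
        return d % w == 0 ? w : 0; /* 0 means "dimension not aligned" */ \
    }

// What the scalar translation unit would do.
#define THE_LEVEL_TO_DISPATCH SIMDLevel::NONE
SQ_DISPATCH_BODY
#undef THE_LEVEL_TO_DISPATCH

// What a per-SIMD translation unit would do.
#define THE_LEVEL_TO_DISPATCH SIMDLevel::AVX2
SQ_DISPATCH_BODY
#undef THE_LEVEL_TO_DISPATCH

#undef SQ_DISPATCH_BODY
```

The point of the pattern is that the selection logic is written once and each level gets its own explicit specialization, so a per-SIMD file only has to set the level and pull in the shared body.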
744fa00 to
455e03e
Compare
Contributor
This pull request has been merged in ccc934f.