ScalarQuantizer: split SIMD specializations into per-SIMD TUs + DD dispatch #4839

Closed
algoriddle wants to merge 1 commit into facebookresearch:main from algoriddle:export-D94375408

@algoriddle
Contributor

Summary:
Split the SIMD-gated template specializations out of ScalarQuantizer.cpp
into per-SIMD compilation units and wire up the Dynamic Dispatch (DD)
infrastructure (COMPILE_SIMD_*, with_simd_level, DISPATCH_SIMDLevel).

This follows the established pattern from pq_code_distance/ and
distances/.

New files:

  • sq_impl.h: declares sq_select_quantizer<SL>,
    sq_select_distance_computer<SL>, sq_select_InvertedListScanner<SL>
  • sq-inl.h: private implementation header with shared template bodies
    (select_quantizer_1_body, select_distance_computer_body,
    select_InvertedListScanner_body) and scanner class templates
    (IVFSQScannerIP, IVFSQScannerL2)
  • sq-generic.cpp: SIMDLevel::NONE specializations (always compiled)
  • sq-avx2.cpp: SIMDLevel::AVX2 specializations (d%8 alignment)
  • sq-avx512.cpp: SIMDLevel::AVX512 + AVX512_SPR forwarding
  • sq-neon.cpp: SIMDLevel::ARM_NEON specializations (d%8 alignment)

Modified files:

  • ScalarQuantizer.cpp: rewritten to use with_simd_level dispatch
    with nullptr-fallback to NONE
  • quantizers.h, distance_computers.h: lint formatting only
  • xplat.bzl, CMakeLists.txt: register new SIMD files and headers

Each per-SIMD factory returns nullptr when the dimension doesn't align,
and the caller falls back to NONE. This avoids ODR issues from
instantiating <NONE> templates in multiple TUs.
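
The nullptr-fallback described above can be sketched roughly as follows. This is a single-file illustration, not the code in this PR: `Quantizer`, `select_quantizer`, and `make_quantizer` are hypothetical stand-ins, the enum is trimmed to two levels, and the plain `if` stands in for whatever `with_simd_level` does at runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-ins for illustration only.
enum class SIMDLevel { NONE, AVX2 };

struct Quantizer {
    virtual ~Quantizer() = default;
    virtual std::string name() const = 0;
};

struct ScalarQuantizerImpl : Quantizer {
    std::string name() const override { return "scalar"; }
};

struct Avx2QuantizerImpl : Quantizer {
    std::string name() const override { return "avx2"; }
};

// The primary template is declared but never defined, so each level has
// exactly one definition. In a multi-file layout each specialization would
// live in its own translation unit.
template <SIMDLevel SL>
std::unique_ptr<Quantizer> select_quantizer(std::size_t d);

// Scalar level: accepts any dimension.
template <>
std::unique_ptr<Quantizer> select_quantizer<SIMDLevel::NONE>(std::size_t) {
    return std::make_unique<ScalarQuantizerImpl>();
}

// SIMD level: declines (returns nullptr) when d is not a multiple of 8.
template <>
std::unique_ptr<Quantizer> select_quantizer<SIMDLevel::AVX2>(std::size_t d) {
    if (d % 8 != 0) {
        return nullptr;
    }
    return std::make_unique<Avx2QuantizerImpl>();
}

// Caller side: try the requested level, then fall back to NONE.
std::unique_ptr<Quantizer> make_quantizer(SIMDLevel level, std::size_t d) {
    std::unique_ptr<Quantizer> q;
    if (level == SIMDLevel::AVX2) {
        q = select_quantizer<SIMDLevel::AVX2>(d);
    }
    if (!q) {
        q = select_quantizer<SIMDLevel::NONE>(d);
    }
    return q;
}
```

With this shape the SIMD specialization never needs to instantiate the scalar path itself; only the caller touches `NONE`, which is the property the ODR remark above relies on.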

The sub-headers (codecs.h, quantizers.h, similarities.h,
distance_computers.h) keep their original compiler-defined guards
(__AVX512F__, __AVX2__, USE_NEON, etc.) because COMPILE_SIMD_*
macros are globally visible in DD mode but the SIMD intrinsics are only
available in per-SIMD TUs. The USE_* macros are now defined in
sq-inl.h.
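
To illustrate why a shared header might key on the compiler-defined macro, here is a minimal sketch. The helper `add8` is hypothetical and does not appear in the PR. The point is that `__AVX2__` is only defined in translation units built with AVX2 enabled (for example `-mavx2` on GCC or Clang), so the intrinsic branch is only seen where the intrinsics can actually be compiled. A build-wide macro would be defined in every translation unit, including ones compiled without those flags.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical helper: element-wise sum of two 8-float arrays.
inline void add8(const float* a, const float* b, float* out) {
#if defined(__AVX2__)
    // Compiled only when this TU is built with AVX2 enabled.
    __m256 va = _mm256_loadu_ps(a);
    __m256 vb = _mm256_loadu_ps(b);
    _mm256_storeu_ps(out, _mm256_add_ps(va, vb));
#else
    // Scalar path for every other TU.
    for (std::size_t i = 0; i < 8; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

Both branches produce the same result; which one is compiled depends only on the flags of the including translation unit.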

Differential Revision: D94375408

@meta-cla meta-cla Bot added the CLA Signed label Feb 25, 2026
@meta-codesync
Contributor

meta-codesync Bot commented Feb 25, 2026

@algoriddle has exported this pull request. If you are a Meta employee, you can view the originating Diff in D94375408.

algoriddle added a commit to algoriddle/faiss that referenced this pull request Feb 25, 2026
…spatch (facebookresearch#4839)

Summary:

Split the SIMD-gated template specializations out of ScalarQuantizer.cpp
into per-SIMD compilation units and wire up the Dynamic Dispatch (DD)
infrastructure (`COMPILE_SIMD_*`, `with_simd_level`, `DISPATCH_SIMDLevel`).

This follows the established pattern from `pq_code_distance/` and
`distances/`.

**New files:**
- `sq_impl.h` — declares `sq_select_quantizer<SL>`,
  `sq_select_distance_computer<SL>`, `sq_select_InvertedListScanner<SL>`
- `sq-inl.h` — private implementation header with shared template bodies
  (`select_quantizer_1_body`, `select_distance_computer_body`,
  `select_InvertedListScanner_body`) and scanner class templates
  (`IVFSQScannerIP`, `IVFSQScannerL2`)
- `sq-generic.cpp` — `SIMDLevel::NONE` specializations (always compiled)
- `sq-avx2.cpp` — `SIMDLevel::AVX2` specializations (`d%8` alignment)
- `sq-avx512.cpp` — `SIMDLevel::AVX512` + `AVX512_SPR` forwarding
- `sq-neon.cpp` — `SIMDLevel::ARM_NEON` specializations (`d%8` alignment)

**Modified files:**
- `ScalarQuantizer.cpp` — rewritten to use `with_simd_level` dispatch
  with nullptr-fallback to NONE
- `quantizers.h`, `distance_computers.h` — lint formatting only
- `xplat.bzl`, `CMakeLists.txt` — register new SIMD files and headers

Each per-SIMD factory returns `nullptr` when the dimension doesn't align,
and the caller falls back to NONE. This avoids ODR issues from
instantiating `<NONE>` templates in multiple TUs.

The sub-headers (codecs.h, quantizers.h, similarities.h,
distance_computers.h) keep their original compiler-defined guards
(`__AVX512F__`, `__AVX2__`, `USE_NEON`, etc.) because `COMPILE_SIMD_*`
macros are globally visible in DD mode but the SIMD intrinsics are only
available in per-SIMD TUs. The `USE_*` macros are now defined in
`sq-inl.h`.

Differential Revision: D94375408
algoriddle added a commit to algoriddle/faiss that referenced this pull request Feb 26, 2026
…spatch (facebookresearch#4839)

algoriddle added a commit to algoriddle/faiss that referenced this pull request Feb 26, 2026
…spatch (facebookresearch#4839)

algoriddle added a commit to algoriddle/faiss that referenced this pull request Mar 2, 2026
…spatch (facebookresearch#4839)

algoriddle added a commit to algoriddle/faiss that referenced this pull request Mar 2, 2026
…spatch (facebookresearch#4839)

…spatch (facebookresearch#4839)

Summary:

Split the SIMD-gated template specializations out of ScalarQuantizer.cpp
and the shared headers into per-SIMD compilation units and wire up the
Dynamic Dispatch (DD) infrastructure (COMPILE_SIMD_*, with_simd_level).

**What moved where**

- SIMD specializations removed from `codecs.h`, `quantizers.h`,
  `similarities.h`, `distance_computers.h` — these now contain only
  primary templates and scalar (`SIMDLevel::NONE`) specializations.
  (Most use empty primary templates; `quantizers.h` uses an inheriting
  fallback pattern for `QuantizerFP16`, `QuantizerBF16`, etc.)
- SIMD specializations moved into `sq-avx2.cpp` / `sq-avx512.cpp` /
  `sq-neon.cpp`, each guarded by `COMPILE_SIMD_*`.
- `sq-generic.cpp` deleted — the `NONE` level is now instantiated
  directly in `ScalarQuantizer.cpp` via `sq-dispatch.h`.
- `sq-inl.h` renamed to `scanners.h`.
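
One plausible reading of the "inheriting fallback pattern" mentioned in the list above is that the primary template derives from the `NONE` specialization, so any level without its own specialization silently gets the scalar behaviour. The sketch below shows that reading only; `QuantizerDemo` and `lanes()` are hypothetical names, not the declarations in `quantizers.h`.

```cpp
#include <cassert>

// Hypothetical trimmed-down level enum.
enum class SIMDLevel { NONE, AVX2, AVX512 };

// Declare the primary template first so NONE can be specialized before
// the primary template is defined.
template <SIMDLevel SL>
struct QuantizerDemo;

// Scalar implementation.
template <>
struct QuantizerDemo<SIMDLevel::NONE> {
    int lanes() const { return 1; }
};

// Inheriting fallback: levels without a specialization reuse NONE.
template <SIMDLevel SL>
struct QuantizerDemo : QuantizerDemo<SIMDLevel::NONE> {};

// A level that does provide its own implementation overrides the fallback.
template <>
struct QuantizerDemo<SIMDLevel::AVX2> : QuantizerDemo<SIMDLevel::NONE> {
    int lanes() const { return 8; }
};
```

By contrast, an empty primary template (`template <SIMDLevel SL> struct Codec {};`) has no members, so using a level that lacks a specialization fails at compile time instead of falling back.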

**Dispatch mechanism**

- `sq-dispatch.h` is an X-macro-style header: each per-SIMD `.cpp` file
  `#define`s `THE_LEVEL_TO_DISPATCH` and `#include`s it to stamp out
  explicit template specializations of the selection functions
  (`sq_select_quantizer`, `sq_select_distance_computer`,
  `sq_select_InvertedListScanner`).
- `ScalarQuantizer.cpp` uses `with_simd_level` for runtime dispatch
  and instantiates the `NONE` level via the same `sq-dispatch.h`.
- Each per-SIMD selection function returns `nullptr` when the dimension
  doesn't align, and the caller falls back to `NONE`.
- `sq-neon.cpp` handles both `ARM_NEON` and `ARM_SVE` (SVE forwards
  to NEON — no dedicated SVE SQ implementation yet).
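
The stamping idea behind `sq-dispatch.h` can be approximated in a single file as shown below. This is a sketch under stated assumptions: the macro `SQ_DISPATCH_BODY` plays the role of the header's contents (in the real layout that text would be pulled in with `#include` after defining `THE_LEVEL_TO_DISPATCH`), and `DemoQuantizer`, `select_quantizer_demo`, and `required_alignment` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-ins for illustration only.
enum class SIMDLevel { NONE, AVX2 };

struct DemoQuantizer {
    SIMDLevel level;
    explicit DemoQuantizer(SIMDLevel l) : level(l) {}
};

// Assumed alignment rule: scalar accepts any d, SIMD wants multiples of 8.
constexpr std::size_t required_alignment(SIMDLevel sl) {
    return sl == SIMDLevel::NONE ? 1 : 8;
}

// Declared once; each level is defined by stamping the body below.
template <SIMDLevel SL>
std::unique_ptr<DemoQuantizer> select_quantizer_demo(std::size_t d);

// Stand-in for the dispatch header: reads THE_LEVEL_TO_DISPATCH at the
// point of expansion and emits one explicit specialization.
#define SQ_DISPATCH_BODY \
    template <> \
    std::unique_ptr<DemoQuantizer> \
    select_quantizer_demo<THE_LEVEL_TO_DISPATCH>(std::size_t d) { \
        if (d % required_alignment(THE_LEVEL_TO_DISPATCH) != 0) { \
            return nullptr; \
        } \
        return std::make_unique<DemoQuantizer>(THE_LEVEL_TO_DISPATCH); \
    }

// What the dispatching source file would do for the scalar level.
#define THE_LEVEL_TO_DISPATCH SIMDLevel::NONE
SQ_DISPATCH_BODY
#undef THE_LEVEL_TO_DISPATCH

// What a per-SIMD source file would do for its own level.
#define THE_LEVEL_TO_DISPATCH SIMDLevel::AVX2
SQ_DISPATCH_BODY
#undef THE_LEVEL_TO_DISPATCH
```

Because the body is written once and parameterized by a macro, every level gets structurally identical selection functions, and each translation unit emits only the specialization for the level it defines.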

**Build**

- `xplat.bzl`, `CMakeLists.txt` — register new SIMD source files and
  headers.
- Within the SQ module, `COMPILE_SIMD_*` macros gate all SIMD code
  paths. (Compiler-defined macros like `__AVX2__` are still used in
  lower-level shared headers like `simdlib.h` and `fp16.h`.)

Differential Revision: D94375408
@meta-codesync
Contributor

meta-codesync Bot commented Mar 3, 2026

This pull request has been merged in ccc934f.
