
Commit 28b2b66

mdouze authored and meta-codesync[bot] committed
Inline PQ code distance kernels into scanner TUs (#5159)
Summary:
Pull Request resolved: #5159

After the SIMD dispatch refactoring, the PQ code distance implementations (pq_code_distance_single_impl<SL>, pq_code_distance_four_impl<SL>) lived in separate translation units from the scanner loops (IVFPQScannerT::scan_list_with_table, PQDistanceComputer::distance_to_code). The compiler could not inline the SIMD gather/accumulate code into the hot inner loops.

This diff converts the per-SIMD pq_code_distance .cpp files to .h headers and includes them in the corresponding scanner TUs before the scanner _impl.h includes. This puts the kernel definitions in the same TU, enabling the compiler to inline the AVX2/AVX512 vgatherdps code directly into scan_list_with_table and scan_list_polysemous_hc.

Changes:
- pq_code_distance-avx2.cpp → pq_code_distance-avx2.h (header, #pragma once)
- pq_code_distance-avx512.cpp → pq_code_distance-avx512.h (same)
- New pq_code_distance-generic.h with inline NONE/ARM_NEON specializations
- Scanner TUs (avx2.cpp, avx512.cpp, neon.cpp) include the PQ distance headers
- PQCodeDistance gains static constexpr simd_level member
- scan_list_polysemous uses PQCodeDist::simd_level for Hamming computer dispatch (was hardcoded to SIMDLevel::NONE)
- Scanner TUs include per-ISA hamming_computer headers for SIMD Hamming dispatch
- Build files (xplat.bzl, CMakeLists.txt) updated

Verified via objdump that scan_list_with_table and scan_list_polysemous_hc contain zero calls to pq_code_distance_*_impl; all AVX2/AVX512 gather code is fully inlined.

Reviewed By: algoriddle

Differential Revision: D102942787

fbshipit-source-id: d2979f9cd629652ac1de4886afd3c335fbb4fac8
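The pattern described in the summary can be sketched roughly as follows. This is a simplified illustration, not the code from this commit: `code_distance_single`, `CodeDistance`, `level_of`, and `scan_list` are hypothetical names, the `SIMDLevel` enum is a two-value stand-in, and the kernel body is plain scalar code rather than an AVX2/AVX512 gather. It shows only the two structural ideas: the kernel is an `inline` template visible in the same TU as the scan loop, and the distance policy exposes its level as a `static constexpr` member so callers need not hardcode it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the library's SIMD level enum.
enum class SIMDLevel { NONE, AVX2 };

// Hypothetical kernel, written as it would appear in a header: because the
// definition is visible in the including TU, the compiler is free to inline
// it into the caller's loop. The body is scalar for every SL in this sketch.
template <SIMDLevel SL>
inline float code_distance_single(
        size_t M, size_t ksub, const float* table, const uint8_t* code) {
    float acc = 0.0f;
    for (size_t m = 0; m < M; m++) {
        acc += table[m * ksub + code[m]]; // one table lookup per sub-quantizer
    }
    return acc;
}

// Hypothetical policy type that carries its SIMD level as a compile-time
// constant, mirroring the "static constexpr simd_level member" idea.
template <SIMDLevel SL>
struct CodeDistance {
    static constexpr SIMDLevel simd_level = SL;

    static float distance_single(
            size_t M, size_t ksub, const float* table, const uint8_t* code) {
        return code_distance_single<SL>(M, ksub, table, code);
    }
};

// Callers can derive the level from the policy instead of hardcoding NONE.
template <class Dist>
constexpr SIMDLevel level_of() {
    return Dist::simd_level;
}

// Hypothetical scan loop: one distance per M-byte code in `codes`.
template <class Dist>
std::vector<float> scan_list(
        size_t M,
        size_t ksub,
        const std::vector<float>& table,
        const std::vector<uint8_t>& codes) {
    assert(M > 0);
    assert(table.size() == M * ksub);
    assert(codes.size() % M == 0);
    std::vector<float> out;
    out.reserve(codes.size() / M);
    for (size_t i = 0; i < codes.size(); i += M) {
        out.push_back(
                Dist::distance_single(M, ksub, table.data(), codes.data() + i));
    }
    return out;
}
```

Whether a compiler actually inlines the kernel still depends on its heuristics and optimization flags, which is presumably why the commit checks the result with objdump rather than assuming it.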
1 parent 417c53e commit 28b2b66

19 files changed

Lines changed: 1954 additions & 1037 deletions

faiss/CMakeLists.txt

Lines changed: 10 additions & 3 deletions
@@ -11,7 +11,7 @@
 set(FAISS_SIMD_AVX2_SRC
 impl/fast_scan/impl-avx2.cpp
 impl/hnsw/avx2.cpp
-impl/pq_code_distance/pq_code_distance-avx2.cpp
+impl/pq_code_distance/avx2.cpp
 impl/scalar_quantizer/sq-avx2.cpp
 impl/approx_topk/avx2.cpp
 impl/binary_hamming/avx2.cpp
@@ -25,7 +25,7 @@ set(FAISS_SIMD_AVX2_SRC
 set(FAISS_SIMD_AVX512_SRC
 impl/fast_scan/impl-avx512.cpp
 impl/hnsw/avx512.cpp
-impl/pq_code_distance/pq_code_distance-avx512.cpp
+impl/pq_code_distance/avx512.cpp
 impl/scalar_quantizer/sq-avx512.cpp
 impl/binary_hamming/avx512.cpp
 utils/simd_impl/distances_avx512.cpp
@@ -39,6 +39,7 @@ set(FAISS_SIMD_NEON_SRC
 impl/scalar_quantizer/sq-neon.cpp
 impl/approx_topk/neon.cpp
 impl/binary_hamming/neon.cpp
+impl/pq_code_distance/neon.cpp
 utils/simd_impl/distances_aarch64.cpp
 utils/hamming_distance/hamming_neon.cpp
 utils/simd_impl/partitioning_neon.cpp
@@ -133,6 +134,7 @@ set(FAISS_SRC
 impl/PolysemousTraining.cpp
 impl/ProductQuantizer.cpp
 impl/pq_code_distance/pq_code_distance-generic.cpp
+impl/pq_code_distance/IVFPQ_QueryTables.cpp
 impl/AdditiveQuantizer.cpp
 impl/RaBitQuantizer.cpp
 impl/RaBitQuantizerMultiBit.cpp
@@ -311,8 +313,13 @@ set(FAISS_HEADERS
 impl/simd_dispatch.h
 impl/fast_scan/simd_result_handlers.h
 impl/zerocopy_io.h
-utils/pq_code_distance.h
 impl/pq_code_distance/pq_code_distance-inl.h
+impl/pq_code_distance/pq_code_distance-avx2.h
+impl/pq_code_distance/pq_code_distance-avx512.h
+impl/pq_code_distance/pq_code_distance-generic.h
+impl/pq_code_distance/IVFPQ_QueryTables.h
+impl/pq_code_distance/IVFPQScanner_impl.h
+impl/pq_code_distance/PQDistanceComputer_impl.h
 invlists/BlockInvertedLists.h
 invlists/DirectMap.h
 invlists/InvertedLists.h
