Skip to content

Commit 0320279

Browse files
alexanderguzhvameta-codesync[bot]
authored andcommitted
Introduce RVV (#5156)
Summary: I've verified the successful compilation via docker + qemu combination. Also, this PR introduces all the needed overrides, which rely on default implementation, but later can be populated with the vectorized code. Also, fixes a typo in SQ4U. algoriddle I would appreciate if you could take a look whenever you have time. Thanks! Pull Request resolved: #5156 Reviewed By: mdouze Differential Revision: D102911491 Pulled By: mnorris11 fbshipit-source-id: f3555987cdbf8d32a4fbc9d07ec2307f0f8a112a
1 parent 715725d commit 0320279

16 files changed

Lines changed: 539 additions & 22 deletions

File tree

faiss/CMakeLists.txt

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -51,7 +51,12 @@ set(FAISS_SIMD_SVE_SRC
5151
utils/simd_impl/distances_arm_sve.cpp
5252
)
5353
set(FAISS_SIMD_RVV_SRC
54+
impl/pq_code_distance/rvv.cpp
5455
impl/scalar_quantizer/sq-rvv.cpp
56+
impl/binary_hamming/rvv.cpp
57+
utils/simd_impl/distances_rvv.cpp
58+
utils/hamming_distance/hamming_rvv.cpp
59+
utils/simd_impl/rabitq_rvv.cpp
5560
)
5661
# Select SIMD sources based on target architecture
5762
if(CMAKE_SYSTEM_PROCESSOR MATCHES "(x86_64|amd64|AMD64)")
@@ -370,6 +375,7 @@ set(FAISS_HEADERS
370375
utils/hamming_distance/hamming_computer-avx512.h
371376
utils/hamming_distance/hamming_computer-generic.h
372377
utils/hamming_distance/hamming_computer-neon.h
378+
utils/hamming_distance/hamming_computer-rvv.h
373379
utils/simd_impl/distances_autovec-inl.h
374380
utils/simd_impl/distances_simdlib256.h
375381
utils/hamming_distance/hamming_impl.h

faiss/docs/simd_dynamic_dispatch_migration.md

Lines changed: 33 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -4,10 +4,10 @@
44

55
Single Instruction, Multiple Data (SIMD) is used heavily in Faiss to speed up
66
many types of operations. This includes AVX2, AVX512 (various flavors) for
7-
x86_64 CPUs and NEON, SVE for ARM CPUs. SIMD code that is run on a machine
8-
that does not support it will crash with SIGILL (illegal instruction signal),
9-
therefore it is important to select the right implementation for the current
10-
machine.
7+
x86_64 CPUs, NEON and SVE for ARM CPUs, and RVV for RISC-V CPUs. SIMD code
8+
that is run on a machine that does not support it will crash with SIGILL
9+
(illegal instruction signal), therefore it is important to select the right
10+
implementation for the current machine.
1111

1212
Faiss is transitioning from a **monolithic SIMD** model to a **dynamic
1313
dispatch** model. New code should be written with dynamic dispatch in mind.
@@ -211,12 +211,13 @@ FlatCodesDistanceComputer* get_distance_computer() {
211211
```
212212

213213
**Dispatch masks:** `with_simd_level` assumes NONE + AVX2 + AVX512 +
214-
ARM_NEON implementations exist. If your function has another subset of available
215-
implementations, it can be passed with
214+
ARM_NEON + RISCV_RVV implementations exist. If your function has another subset
215+
of available implementations, it can be passed with
216216
`with_selected_simd_levels<mask>` with a bitmask of available levels. Missing
217217
levels in the mask cause the dispatch to **fall through** to the next lower
218218
level in the same architecture family (x86: AVX512_SPR → AVX512 → AVX2 →
219-
NONE; ARM: ARM_SVE → ARM_NEON → NONE — x86 and ARM chains are independent):
219+
NONE; ARM: ARM_SVE → ARM_NEON → NONE; RISC-V: RISCV_RVV → NONE —
220+
architecture chains are independent):
220221

221222
```cpp
222223
// Only NONE, AVX2, and ARM_SVE implementations exist.
@@ -237,7 +238,7 @@ your own with `(1 << int(SIMDLevel::X)) | ...`):
237238
|------|--------|---------|
238239
| `AVAILABLE_SIMD_LEVELS_NONE` | NONE only | Scalar-only functions |
239240
| `AVAILABLE_SIMD_LEVELS_AVX2_NEON` | NONE, AVX2, ARM_NEON | 256-bit `simdlib` ops (`with_simd_level_256bit`) |
240-
| `AVAILABLE_SIMD_LEVELS_A0` | NONE, AVX2, AVX512, ARM_NEON | Default (`with_simd_level`) |
241+
| `AVAILABLE_SIMD_LEVELS_A0` | NONE, AVX2, AVX512, ARM_NEON, RISCV_RVV | Default (`with_simd_level`) |
241242
| `AVAILABLE_SIMD_LEVELS_A1` | A0 + ARM_SVE | Functions with dedicated SVE implementations |
242243
| `AVAILABLE_SIMD_LEVELS_ALL` | All levels | Identity / diagnostic functions |
243244

@@ -265,6 +266,10 @@ set(FAISS_SIMD_SVE_SRC
265266
# ... existing entries ...
266267
path/to/functions_sve.cpp # <-- add (if SVE implementation exists)
267268
)
269+
set(FAISS_SIMD_RVV_SRC
270+
# ... existing entries ...
271+
path/to/functions_rvv.cpp # <-- add (if RVV implementation exists)
272+
)
268273
# Also add any new headers to FAISS_HEADERS
269274
```
270275

@@ -289,6 +294,7 @@ SIMD_FILES = {
289294
"path/to/functions_avx2.cpp": (X86_64, AVX2),
290295
"path/to/functions_avx512.cpp": (X86_64, AVX512),
291296
"path/to/functions_neon.cpp": (AARCH64, ARM_NEON),
297+
"path/to/functions_rvv.cpp": (RISCV64, RISCV_RVV),
292298
}
293299
# Also add headers to header_files()
294300
```
@@ -377,9 +383,9 @@ The `simdlib` wrappers (`simd8float32_tpl`,
377383
`simd8uint32_tpl`) provide portable 256-bit and 512-bit operations
378384
across AVX2, AVX512 and NEON (two 128 bit NEON registers are clumped
379385
together in 256 bits)
380-
There is **no simdlib for SVE** (`simdlib_sve.h` does not exist).
381-
Use raw intrinsics when you need SVE
382-
(variable-length vectors via `svcntw()`).
386+
There is **no simdlib for SVE or RVV** (`simdlib_sve.h` and `simdlib_rvv.h`
387+
do not exist). Use raw intrinsics when you need SVE (variable-length vectors
388+
via `svcntw()`) or RVV (variable-length vectors via `__riscv_vsetvl*`).
383389
An example of usage is with `-inl.h` files
384390
385391
**The include order matters** —
@@ -423,11 +429,17 @@ void my_kernel(...) {
423429
factory/constructor boundary. The constructed object carries its
424430
`SIMDLevel` as a compile-time template parameter.
425431

426-
5. **Private dispatch machinery.** `simd_dispatch.h` is internal — do not
432+
5. **Variable-width SIMD is not fixed-width simdlib.** SVE and RVV are
433+
variable-width architectures. Do not route them through fixed-width helpers
434+
such as `with_simd_level_256bit`, `with_simd_level_512bit`, or
435+
`simd8float32_tpl` unless an explicit selector maps them to a supported
436+
fixed-width fallback.
437+
438+
6. **Private dispatch machinery.** `simd_dispatch.h` is internal — do not
427439
include in public headers. The public API is `SIMDConfig` and `SIMDLevel`
428440
in `utils/simd_levels.h`.
429441

430-
6. **Build system parity.** Every change must be reflected in both
442+
7. **Build system parity.** Every change must be reflected in both
431443
CMakeLists.txt and Buck's xplat.bzl.
432444

433445
## Conversion approach
@@ -470,12 +482,17 @@ cd build_dd && ctest --output-on-failure
470482
# Verify dispatch at different levels (DD mode only)
471483
FAISS_SIMD_LEVEL=NONE ctest --output-on-failure
472484
FAISS_SIMD_LEVEL=AVX2 ctest --output-on-failure
485+
FAISS_SIMD_LEVEL=RISCV_RVV ctest --output-on-failure
473486

474487
# Also build/test static modes for comparison
475488
cmake -B build_avx2 -DFAISS_OPT_LEVEL=avx2 -DBUILD_TESTING=ON .
476489
cmake --build build_avx2 -j$(nproc) && cd build_avx2 && ctest --output-on-failure
477490
```
478491

492+
For RVV, build on `riscv64` or cross-build with RISC-V flags and run the
493+
resulting tests under hardware or QEMU with vector support enabled, for example
494+
`QEMU_CPU=rv64,v=true`.
495+
479496
### Buck (internal)
480497

481498
```bash
@@ -503,3 +520,6 @@ buck2 test -c faiss.dynamic_dispatch=true fbcode//faiss/tests:test_your_module
503520
- Building with CMake's default `FAISS_OPT_LEVEL=generic` and thinking DD is
504521
enabled — generic mode has no SIMD and no dispatch. Use
505522
`FAISS_OPT_LEVEL=dd` explicitly.
523+
- Treating SVE or RVV as fixed-width `simdlib` backends — they are
524+
variable-width ISAs and need raw-intrinsic implementations or explicit scalar
525+
fallbacks for fixed-width helper paths.

faiss/impl/binary_hamming/rvv.cpp

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,26 @@
1+
/*
2+
* Copyright (c) Meta Platforms, Inc. and affiliates.
3+
*
4+
* This source code is licensed under the MIT license found in the
5+
* LICENSE file in the root directory of this source tree.
6+
*/
7+
8+
#ifdef COMPILE_SIMD_RISCV_RVV
9+
10+
#define THE_SIMD_LEVEL SIMDLevel::RISCV_RVV
11+
12+
// NOLINTNEXTLINE(facebook-hte-InlineHeader)
13+
#include <faiss/utils/hamming_distance/hamming_computer-rvv.h>
14+
15+
// NOLINTNEXTLINE(facebook-hte-InlineHeader)
16+
#include <faiss/impl/binary_hamming/IndexBinaryHNSW_impl.h>
17+
// NOLINTNEXTLINE(facebook-hte-InlineHeader)
18+
#include <faiss/impl/binary_hamming/IndexBinaryHash_impl.h>
19+
// NOLINTNEXTLINE(facebook-hte-InlineHeader)
20+
#include <faiss/impl/binary_hamming/IndexBinaryIVF_impl.h>
21+
// NOLINTNEXTLINE(facebook-hte-InlineHeader)
22+
#include <faiss/impl/binary_hamming/IndexIVFSpectralHash_impl.h>
23+
// NOLINTNEXTLINE(facebook-hte-InlineHeader)
24+
#include <faiss/impl/binary_hamming/IndexPQ_impl.h>
25+
26+
#endif // COMPILE_SIMD_RISCV_RVV

faiss/impl/fast_scan/fast_scan.cpp

Lines changed: 76 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -416,6 +416,82 @@ void accumulate_to_mem(
416416

417417
namespace faiss {
418418

419+
#ifdef COMPILE_SIMD_RISCV_RVV
420+
template <>
421+
std::unique_ptr<FastScanCodeScanner> make_fast_scan_scanner_impl<
422+
SIMDLevel::RISCV_RVV>(
423+
bool is_max,
424+
int impl,
425+
size_t nq,
426+
size_t ntotal,
427+
int64_t k,
428+
float* distances,
429+
int64_t* ids,
430+
const IDSelector* sel,
431+
bool with_id_map) {
432+
return make_fast_scan_scanner_impl<SIMDLevel::NONE>(
433+
is_max, impl, nq, ntotal, k, distances, ids, sel, with_id_map);
434+
}
435+
436+
template <>
437+
std::unique_ptr<FastScanCodeScanner> make_range_scanner_impl<
438+
SIMDLevel::RISCV_RVV>(
439+
bool is_max,
440+
RangeSearchResult& rres,
441+
float radius,
442+
size_t ntotal,
443+
const IDSelector* sel) {
444+
return make_range_scanner_impl<SIMDLevel::NONE>(
445+
is_max, rres, radius, ntotal, sel);
446+
}
447+
448+
template <>
449+
std::unique_ptr<FastScanCodeScanner> make_partial_range_scanner_impl<
450+
SIMDLevel::RISCV_RVV>(
451+
bool is_max,
452+
RangeSearchPartialResult& pres,
453+
float radius,
454+
size_t ntotal,
455+
size_t q0,
456+
size_t q1,
457+
const IDSelector* sel) {
458+
return make_partial_range_scanner_impl<SIMDLevel::NONE>(
459+
is_max, pres, radius, ntotal, q0, q1, sel);
460+
}
461+
462+
template <>
463+
std::unique_ptr<FastScanCodeScanner> rabitq_make_knn_scanner_impl<
464+
SIMDLevel::RISCV_RVV>(
465+
const IndexRaBitQFastScan* index,
466+
bool is_max,
467+
size_t nq,
468+
int64_t k,
469+
float* distances,
470+
int64_t* ids,
471+
const IDSelector* sel,
472+
const FastScanDistancePostProcessing& context,
473+
bool is_multi_bit) {
474+
return rabitq_make_knn_scanner_impl<SIMDLevel::NONE>(
475+
index, is_max, nq, k, distances, ids, sel, context, is_multi_bit);
476+
}
477+
478+
template <>
479+
std::unique_ptr<FastScanCodeScanner> rabitq_ivf_make_knn_scanner_impl<
480+
SIMDLevel::RISCV_RVV>(
481+
bool is_max,
482+
const IndexIVFRaBitQFastScan* index,
483+
size_t nq,
484+
size_t k,
485+
float* distances,
486+
int64_t* ids,
487+
const IDSelector* sel,
488+
const FastScanDistancePostProcessing* context,
489+
bool multi_bit) {
490+
return rabitq_ivf_make_knn_scanner_impl<SIMDLevel::NONE>(
491+
is_max, index, nq, k, distances, ids, sel, context, multi_bit);
492+
}
493+
#endif // COMPILE_SIMD_RISCV_RVV
494+
419495
std::unique_ptr<FastScanCodeScanner> make_fast_scan_knn_scanner(
420496
bool is_max,
421497
int impl,
Lines changed: 68 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,68 @@
1+
/*
2+
* Copyright (c) Meta Platforms, Inc. and affiliates.
3+
*
4+
* This source code is licensed under the MIT license found in the
5+
* LICENSE file in the root directory of this source tree.
6+
*/
7+
8+
#ifdef COMPILE_SIMD_RISCV_RVV
9+
10+
#include <faiss/impl/pq_code_distance/pq_code_distance-inl.h>
11+
12+
namespace faiss {
13+
namespace pq_code_distance {
14+
15+
// RISCV_RVV: no RVV-optimized PQ code distance exists yet. Use scalar.
16+
17+
// NOLINTNEXTLINE(facebook-hte-MisplacedTemplateSpecialization)
18+
template <>
19+
float pq_code_distance_8bit_single_impl<SIMDLevel::RISCV_RVV>(
20+
size_t M,
21+
const float* sim_table,
22+
const uint8_t* code) {
23+
return PQCodeDistanceScalar<PQDecoder8>::distance_single_code(
24+
M, 8, sim_table, code);
25+
}
26+
27+
// NOLINTNEXTLINE(facebook-hte-MisplacedTemplateSpecialization)
28+
template <>
29+
void pq_code_distance_8bit_four_impl<SIMDLevel::RISCV_RVV>(
30+
size_t M,
31+
const float* sim_table,
32+
const uint8_t* __restrict code0,
33+
const uint8_t* __restrict code1,
34+
const uint8_t* __restrict code2,
35+
const uint8_t* __restrict code3,
36+
float& result0,
37+
float& result1,
38+
float& result2,
39+
float& result3) {
40+
PQCodeDistanceScalar<PQDecoder8>::distance_four_codes(
41+
M,
42+
8,
43+
sim_table,
44+
code0,
45+
code1,
46+
code2,
47+
code3,
48+
result0,
49+
result1,
50+
result2,
51+
result3);
52+
}
53+
54+
} // namespace pq_code_distance
55+
} // namespace faiss
56+
57+
#define THE_SIMD_LEVEL SIMDLevel::RISCV_RVV
58+
59+
// NOLINTNEXTLINE(facebook-hte-InlineHeader)
60+
#include <faiss/utils/hamming_distance/hamming_computer-rvv.h>
61+
// NOLINTNEXTLINE(facebook-hte-InlineHeader)
62+
#include <faiss/impl/pq_code_distance/PQDistanceComputer_impl.h>
63+
// NOLINTNEXTLINE(facebook-hte-InlineHeader)
64+
#include <faiss/impl/pq_code_distance/IVFPQScanner_impl.h>
65+
66+
#undef THE_SIMD_LEVEL
67+
68+
#endif // COMPILE_SIMD_RISCV_RVV

faiss/impl/scalar_quantizer/sq-rvv.cpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -186,7 +186,7 @@ struct DCTemplate<
186186
const float inv_scale = (vdiff == 0.0f) ? 0.0f : 15.0f / vdiff;
187187
for (size_t i = 0; i < d; i++) {
188188
float val = (x[i] - vmin) * inv_scale;
189-
int code = static_cast<int>(std::floor(val + 0.5f));
189+
int code = static_cast<int>(val);
190190
if (code < 0) {
191191
code = 0;
192192
}

faiss/impl/simd_dispatch.h

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -32,9 +32,9 @@ constexpr int AVAILABLE_SIMD_LEVELS_NONE = (1 << int(SIMDLevel::NONE));
3232
constexpr int AVAILABLE_SIMD_LEVELS_AVX2_NEON = AVAILABLE_SIMD_LEVELS_NONE |
3333
(1 << int(SIMDLevel::AVX2)) | (1 << int(SIMDLevel::ARM_NEON));
3434

35-
// A0: same + AVX512
36-
constexpr int AVAILABLE_SIMD_LEVELS_A0 =
37-
AVAILABLE_SIMD_LEVELS_AVX2_NEON | (1 << int(SIMDLevel::AVX512));
35+
// A0: same + AVX512 + RISCV_RVV
36+
constexpr int AVAILABLE_SIMD_LEVELS_A0 = AVAILABLE_SIMD_LEVELS_AVX2_NEON |
37+
(1 << int(SIMDLevel::AVX512)) | (1 << int(SIMDLevel::RISCV_RVV));
3838

3939
// A1: same + ARM_SVE (for functions with dedicated SVE implementations)
4040
constexpr int AVAILABLE_SIMD_LEVELS_A1 =
@@ -147,8 +147,8 @@ inline auto with_selected_simd_levels(LambdaType&& action) {
147147
* });
148148
*
149149
* The lambda must be a generic lambda with a SIMDLevel template parameter.
150-
* By default, the lambda uses levels AVX2 + AVX512 + NEON, since these are the
151-
* most common cases.
150+
* By default, the lambda uses levels AVX2 + AVX512 + NEON + RVV, since these
151+
* are the most common cases.
152152
*
153153
* @param action A generic lambda with signature `template<SIMDLevel> T
154154
* operator()()`

faiss/utils/distances_fused/distances_fused.cpp

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,20 @@ bool exhaustive_L2sqr_fused_cmax<SIMDLevel::NONE>(
2424
return false;
2525
}
2626

27+
#ifdef COMPILE_SIMD_RISCV_RVV
28+
template <>
29+
bool exhaustive_L2sqr_fused_cmax<SIMDLevel::RISCV_RVV>(
30+
const float*,
31+
const float*,
32+
size_t,
33+
size_t,
34+
size_t,
35+
Top1BlockResultHandler<CMax<float, int64_t>>&,
36+
const float*) {
37+
return false;
38+
}
39+
#endif // COMPILE_SIMD_RISCV_RVV
40+
2741
bool exhaustive_L2sqr_fused_cmax(
2842
const float* x,
2943
const float* y,

0 commit comments

Comments
 (0)