IVF-PQ: low-precision coarse search #715
base: branch-25.06
Conversation
Hi Artem, thanks for the PR! Could you add tests for the new options?

Sure, thanks for pointing this out! It's worth mentioning that the int8 coarse search often gives garbage recall, and that is rather unavoidable. The problem is that we keep cluster norms as part of the cluster vectors and compute a GEMM over the whole thing for the L2 case. But the norms are not normalized, so they grow very fast with the number of dimensions, which makes an int8 representation impossible. I slightly improved the situation by encoding the norms into several int8 slots, but even that didn't help in many cases.
Thank you Artem for the PR! It looks good overall, but I have a few questions.
```cpp
// 8-bit coarse search is experimental and there's no guarantee of any recall
// if the data is not normalized. Especially for L2, because we store vector
// norms alongside the cluster centers.
x.min_recall = 0.1;
```
- Is the normalization requirement documented elsewhere?
- Can't we use our quantization API to set proper normalization constants?
- Can we have a larger `min_recall` by increasing `nprobes`?
To make sure the `int8_t` coarse search works correctly, we need an even stricter requirement: that all elements are smaller than one. I'm also not sure it makes sense to require L2 normalization (that all norms are smaller than 2m), because that means reduced precision (if both the norm and the components are divided by the same constant).

I think increasing the `nprobes` won't help a lot, because if the norm is out of range we basically get a random selection.

All in all, I doubt the `int8_t` variant will be useful, but we may reuse the code later by changing it to fp8 (and we can estimate the performance by running `int8_t` now). Therefore I suppose there's value in having it as an experimental feature without investing too much in documentation and testing.
Can we add the half/int8 precision directly to balanced kmeans so that we can reuse the solution across other algorithms which use that?
Hi @cjnolet, the coarse search bits in IVF-PQ are not really portable, as they do cluster search and query type mapping at the same time and rely on an IVF-PQ-specific representation (cluster center norms stored alongside the vectors), so unfortunately there's no code to share between the two.
Thanks Artem for the update! The PR looks good to me.
I am not completely convinced that we need the int8 option, but you added a clear explanation of its limitations, so I am fine with it.
Enable a low-precision (half / int8) element type for the cuBLAS GEMM performed during the coarse search (selecting the clusters to probe). This makes cuBLAS use tensor cores and thus speeds up the coarse search.

Also promote the `kMaxQueries` compile-time constant to a runtime search parameter: this improves GPU utilization for extremely large batch sizes, such as when using IVF-PQ to construct a nearest-neighbor graph over the whole dataset.