Skip to content

IVF-PQ: low-precision coarse search #715

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 21 commits into
base: branch-25.06
Choose a base branch
from

Conversation

achirkin
Copy link
Contributor

Enable low-precision (half / int8) element type for use in the cuBLAS GEMM performed during coarse search (select clusters to probe). This makes cuBLAS use tensor cores and thus speeds up the coarse search.

Also propagate kMaxQueries compile time constant to a runtime search parameter: this allows to improve GPU utilization in extremely large batch size case, such as using IVF-PQ for constructing a nearest-neighbor graph for the whole dataset.

@achirkin achirkin added feature request New feature or request non-breaking Introduces a non-breaking change labels Feb 21, 2025
@achirkin achirkin self-assigned this Feb 21, 2025
@achirkin achirkin requested review from a team as code owners February 21, 2025 08:15
@tfeher
Copy link
Contributor

tfeher commented Mar 17, 2025

Hi Artem, thanks for the PR! Could you add tests for the new options?

@achirkin
Copy link
Contributor Author

Sure, thanks for pointing this out!

It's worth mentioning the int8 coarse search often gives the garbage recall and that is rather unavoidable. The problem is that we keep cluster norms as a part of cluster vectors and compute GEMM of the whole thing for the L2 case. But the norms are not normalized, so they grow very fast with the number of dimensions, which makes int8 representation impossible. I slightly improved the situation by encoding the norms into several int8 slots, but even that didn't help in many cases.

Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Artem for the PR! It looks good overall, but I have a few questions.

// 8-bit coarse search is experimental and there's no go guarantee of any recall
// if the data is not normalized. Especially for L2, because we store vector norms alongside the
// cluster centers.
x.min_recall = 0.1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Is the normalization requirement documented elsewhere?
  • Can't we use our quantization API to set proper normalizaiton constants?
  • Can we have larger min_recall by increasing nprobes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure the int8_t coarse search works correctly, we need even stricter requirements that all elements are smaller than one. I'm also not sure it makes sense to require L2 normalization (that all norms are smaller than 2m), because that means reduce precision (if both the norm and the components are divided by the same constant).
I think increasing the nprobes won't help a lot, because if the norm is out of range we basically get the random selection.
All in all I doubt the int8_t variant will be useful, but we may reuse the code later by changing it to fp8 (and we can estimate the performance by running int8_t now). Therefore I suppose there's a value in having it as an experimental feature and not invest too much in documentation and testing.

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add the half/int8 precision directly to balanced kmeans so that we can reuse the solution across other algorithms which use that?

@achirkin
Copy link
Contributor Author

Hi @cjnolet, the coarse search bits in IVF-PQ are rather not portable as it does cluster search + query type mapping at the same time and relies on IVF-PQ-specific representation (cluster center norms stored alongside the vectors), so unfortunately there's no code to share between the two.

@achirkin achirkin requested review from tfeher and cjnolet March 31, 2025 09:12
Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Artem for the update! The PR looks good to me.

I am not completely convinced that we need to add the int8 option, but you did add a clear explanation on its limitation, therefore I am fine with it.

@achirkin achirkin removed the request for review from cjnolet April 3, 2025 08:11
@achirkin achirkin changed the base branch from branch-25.04 to branch-25.06 April 16, 2025 07:18
@achirkin achirkin requested a review from cjnolet April 16, 2025 07:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake cpp feature request New feature or request non-breaking Introduces a non-breaking change
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

3 participants