[WIP][BUG] Fix CAGRA search recall with a graph built by NN Descent #819

Open · wants to merge 13 commits into base: branch-25.06

@enp1s0 (Member) commented Apr 12, 2025

This PR fixes unexpectedly low recall in CAGRA search when the graph is built by NN Descent.
To do so, it updates the initial NN Descent graph generation so that every node index is included; in the branch-25.06 implementation, some nodes do not appear in it.
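
The "every node index is included" property can be illustrated on the host with a small sketch. This is not the PR's code: the function name `make_covering_random_graph`, the row-major layout, and the ring-over-a-random-permutation construction are all assumptions chosen only to show one way a random initial graph can be made to reference every node at least once, with no self-loops and no duplicates within a row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not from the PR): returns a row-major n x degree
// neighbor list in which every node index appears as a neighbor at least once.
std::vector<int> make_covering_random_graph(int n, int degree, unsigned seed)
{
  assert(degree >= 1 && degree < n);

  // Random ordering of all node indices.
  std::vector<int> perm(n);
  std::iota(perm.begin(), perm.end(), 0);
  std::mt19937 rng(seed);
  std::shuffle(perm.begin(), perm.end(), rng);

  // Offsets 1, 1 + stride, ... stay within [1, n - 1], so a node never
  // links to itself and the neighbors within one row are distinct.
  const int stride = (n - 1) / degree;

  std::vector<int> graph(static_cast<std::size_t>(n) * degree);
  for (int j = 0; j < n; ++j) {
    const int node = perm[j];
    for (int k = 0; k < degree; ++k) {
      // Slot k = 0 links each node to its successor in the shuffled ring,
      // which is what guarantees that every index is referenced.
      graph[static_cast<std::size_t>(node) * degree + k] = perm[(j + 1 + k * stride) % n];
    }
  }
  return graph;
}
```

Because slot 0 forms a single ring through all nodes, this particular construction is also strongly connected; whether the PR's initial graph has that property is not stated here.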

This PR also makes the following changes:

  • Adds a workaround to remove duplicate node indices from the kNN result
  • Adds test cases for different graph degrees
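
As a rough illustration of the duplicate-removal workaround, the sketch below deduplicates one row of a kNN result on the host. It is not the PR's implementation: the function `dedup_knn_row`, the `kInvalid` sentinel, and the choice to keep the first occurrence and pad the tail are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical sentinel marking an unused slot after deduplication.
constexpr std::uint32_t kInvalid = 0xFFFFFFFFu;

// Hypothetical helper (not from the PR): removes repeated node indices from
// one kNN row in place, keeping the first occurrence so that the original
// ordering of the surviving neighbors is preserved. Returns the number of
// unique neighbors; the remaining slots are filled with kInvalid.
std::size_t dedup_knn_row(std::vector<std::uint32_t>& row)
{
  std::unordered_set<std::uint32_t> seen;
  std::size_t write = 0;
  for (std::size_t read = 0; read < row.size(); ++read) {
    // insert().second is true only the first time a value is seen.
    if (seen.insert(row[read]).second) { row[write++] = row[read]; }
  }
  for (std::size_t i = write; i < row.size(); ++i) { row[i] = kInvalid; }
  return write;
}
```

For example, the row `{5, 2, 5, 7, 2, 9}` becomes `{5, 2, 7, 9, kInvalid, kInvalid}` and the function returns 4.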

@enp1s0 requested a review from a team as a code owner April 12, 2025 16:00
@enp1s0 self-assigned this Apr 12, 2025
@github-actions bot added the cpp label Apr 12, 2025
@enp1s0 added the bug, non-breaking, and cpp labels and removed the cpp label Apr 12, 2025
@tfeher (Contributor) left a comment

Thank you @enp1s0 for this PR! Looks good overall; I have just one request: please clarify the expectations for the random graph.

@@ -829,36 +830,29 @@ void GnndGraph<Index_t>::sample_graph_new(InternalID_t<Index_t>* new_neighbors,
template <typename Index_t>
void GnndGraph<Index_t>::init_random_graph()

Could you add a docstring to describe what properties the random graph is expected to have? What is ensured by the current PR to fix the low recall issue? E.g. is it guaranteed to be fully connected?

Labels
bug (Something isn't working), cpp, non-breaking (Introduces a non-breaking change)
Projects
Status: In Progress
2 participants