Query: Adds support for weighted RRF in Hybrid Search #45328

Open
wants to merge 10 commits into
base: main

aayush3011
Member

Weighted RRF support

This change adds support for weighted RRF in hybrid search.

Weights may be negative. A negative sign does not reduce the score; it signals that the scores of the corresponding component should be sorted in ascending order. The final WRRF score is then computed using the absolute value of the weight.

In this approach, the sign of the weight indicates the interpretation of the ranking itself rather than directly affecting the calculated score:
WRRF(d) = ∑ |w_i| × 1/(k + r_i'(d))

Where:

|w_i| is the absolute value of the weight for the i-th component
r_i'(d) is the rank of document d in the i-th component, where the sort order depends on the sign of the weight:

  • If w_i > 0: r_i'(d) is the original rank (scores sorted in descending order, so the highest score gets rank 1)
  • If w_i < 0: r_i'(d) is the inverted rank (scores sorted in ascending order, so the lowest score gets rank 1)

k = 60
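
A minimal sketch of the formula above, not the SDK's implementation. The class name `WeightedRrfSketch`, the method names, and the map-based inputs are hypothetical and chosen only for illustration. It assumes every document appears in every component, and it breaks ties by input order, which may differ from how the SDK treats equal scores.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of weighted RRF; not code from this PR.
final class WeightedRrfSketch {
    static final int K = 60; // RRF constant k from the description

    // Assigns 1-based ranks for one component.
    // Positive weight: descending sort, highest score gets rank 1.
    // Negative weight: ascending sort, lowest score gets rank 1.
    static Map<String, Integer> rank(Map<String, Double> scores, double weight) {
        Comparator<Map.Entry<String, Double>> byScore = Map.Entry.comparingByValue();
        if (weight > 0) {
            byScore = byScore.reversed();
        }
        List<Map.Entry<String, Double>> sorted = new ArrayList<>(scores.entrySet());
        sorted.sort(byScore);
        Map<String, Integer> ranks = new HashMap<>();
        for (int i = 0; i < sorted.size(); i++) {
            ranks.put(sorted.get(i).getKey(), i + 1);
        }
        return ranks;
    }

    // WRRF(d) = sum over components of |w_i| / (k + r_i'(d)).
    static Map<String, Double> wrrf(List<Map<String, Double>> componentScores, double[] weights) {
        Map<String, Double> fused = new HashMap<>();
        for (int i = 0; i < componentScores.size(); i++) {
            Map<String, Integer> ranks = rank(componentScores.get(i), weights[i]);
            for (Map.Entry<String, Integer> e : ranks.entrySet()) {
                // Only the absolute value of the weight scales the contribution.
                double contribution = Math.abs(weights[i]) / (K + e.getValue());
                fused.merge(e.getKey(), contribution, Double::sum);
            }
        }
        return fused;
    }
}
```

The sign of each weight is consumed entirely by `rank`; `wrrf` only ever sees `Math.abs(weights[i])`, which mirrors the split between ranking interpretation and score magnitude described above.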

Intuition Behind This Modification

This modification addresses a key scenario in information retrieval and ranking fusion: sometimes "lower is better" rather than "higher is better."

  • Handling Different Ranking Interpretations: Some systems naturally produce rankings where lower values are better (like error rates, distances, or pricing). The sign becomes a semantic indicator of how to interpret the ranking.
  • Unifying Disparate Metrics: This allows you to combine rankings based on completely different metrics (relevance scores, error rates, freshness, etc.) without having to pre-process them into a common format.
  • Integration of Minimization Metrics: You can directly incorporate metrics that should be minimized (like latency or error) alongside metrics that should be maximized (like relevance or user engagement).
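
As a worked illustration of mixing a maximized and a minimized metric (the documents, scores, and weights here are made up, not taken from the PR's tests): suppose component 1 is a relevance score with weight 2, component 2 is a distance with weight -1, document A has relevance 0.9 and distance 0.8, and document B has relevance 0.5 and distance 0.1. Relevance is sorted descending, so A gets rank 1 and B rank 2. Distance is sorted ascending, so B gets rank 1 and A rank 2.

```latex
\begin{aligned}
\mathrm{WRRF}(A) &= |2| \cdot \frac{1}{60 + 1} + |-1| \cdot \frac{1}{60 + 2}
                  = \frac{2}{61} + \frac{1}{62} \approx 0.04892 \\
\mathrm{WRRF}(B) &= |2| \cdot \frac{1}{60 + 2} + |-1| \cdot \frac{1}{60 + 1}
                  = \frac{2}{62} + \frac{1}{61} \approx 0.04865
\end{aligned}
```

A ends up slightly ahead because the relevance component carries twice the weight, even though B wins on distance.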

All SDK Contribution checklist:

  • The pull request does not introduce breaking changes.
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

Copilot AI review requested due to automatic review settings, May 14, 2025 16:59
aayush3011 requested review from kirankumarkolli and a team as code owners, May 14, 2025 16:59
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds support for weighted RRF in hybrid search by introducing a new component weight property and updating the ranking calculations accordingly. Key changes include:

  • Adding a new property “componentWeights” and its getter in HybridSearchQueryInfo.
  • Updating query feature lists and ranking computation in HybridSearchDocumentQueryExecutionContext to incorporate weight values.
  • Adjusting tests to cover various weighted RRF scenarios.

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| HybridSearchQueryInfo.java | Introduces a new componentWeights field and getter supporting weighted queries. |
| QueryPlanRetriever.java | Updates the supported query features list to include WeightedRankFusion. |
| QueryFeature.java | Adds the WeightedRankFusion enum value. |
| HybridSearchDocumentQueryExecutionContext.java | Modifies methods to retrieve component weights, sort component scores using the provided comparator, and use weights in RRF score calculations. |
| Constants.java | Updates JSON property mapping constants for hybrid search, though the key names appear to be swapped. |
| HybridSearchQueryTest.java | Adds new tests covering weighted RRF scenarios. |

Contributor

github-actions bot commented May 14, 2025

API Change Check

APIView identified API level changes in this PR and created the following API reviews

com.azure:azure-cosmos

Member

@kushagraThapar kushagraThapar left a comment

Thanks @aayush3011, looks good to me. Can you please get approval from @simorenoh and @bambriz on this? Thanks!

@aayush3011
Member Author

/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).

Member

@kushagraThapar kushagraThapar left a comment

Requesting changes since we don't want to merge this to main

Member

@simorenoh simorenoh left a comment

One small comment but not blocking on it since it doesn't affect the code, thanks!
