Query: Adds support for weighted RRF in Hybrid Search #45328
base: main
Pull Request Overview
This PR adds support for weighted RRF in hybrid search by introducing a new component weight property and updating the ranking calculations accordingly. Key changes include:
- Adding a new property “componentWeights” and its getter in HybridSearchQueryInfo.
- Updating query feature lists and ranking computation in HybridSearchDocumentQueryExecutionContext to incorporate weight values.
- Adjusting tests to cover various weighted RRF scenarios.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
File | Description |
---|---|
HybridSearchQueryInfo.java | Introduces a new componentWeights field and getter supporting weighted queries. |
QueryPlanRetriever.java | Updates the supported query features list to include WeightedRankFusion. |
QueryFeature.java | Adds the WeightedRankFusion enum value. |
HybridSearchDocumentQueryExecutionContext.java | Modifies methods to retrieve component weights, sort component scores using the provided comparator, and use weights in RRF score calculations. |
Constants.java | Updates JSON property mapping constants for hybrid search, though the key names appear to be swapped. |
HybridSearchQueryTest.java | Adds new tests covering weighted RRF scenarios. |

API Change Check
APIView identified API level changes in this PR and created the following API reviews
Thanks @aayush3011, looks good to me. Can you please get approval from @simorenoh and @bambriz on this? Thanks!
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
Requesting changes since we don't want to merge this to main
One small comment but not blocking on it since it doesn't affect the code, thanks!
Weighted RRF support
This change adds support for weighted RRF in hybrid search.
Weights may be negative, but the negative sign only signals that the scores of the corresponding component should be sorted in ascending order. The final WRRF score is then computed using the absolute value of the weight.
In this approach, the sign of the weight indicates the interpretation of the ranking itself rather than directly affecting the calculated score:
WRRF(d) = ∑ |w_i| × 1/(k + r_i'(d))
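Where:
- |w_i| is the absolute value of the weight for the i-th component.
- r_i'(d) is the rank of document d in the i-th component, with a crucial difference: when w_i is negative, ranks are assigned with that component's scores sorted in ascending order, so the lowest score gets the best rank; otherwise the scores are sorted in descending order.
- k = 60 is the constant in the denominator.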
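
The formula can be illustrated with a small standalone sketch. This is not the SDK's code from HybridSearchDocumentQueryExecutionContext: the class `WeightedRrfSketch` and the methods `rank` and `fuse` are hypothetical names used only for illustration. It also assumes that ranks start at 1, that ties are broken arbitrarily, and that a document missing from a component contributes nothing for that component.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; names and tie handling are assumptions, not the SDK's API.
public class WeightedRrfSketch {
    static final int K = 60; // constant k from the formula above

    // Assigns 1-based ranks for one component.
    // A negative weight means "lower is better", so scores are sorted ascending;
    // otherwise they are sorted descending.
    static Map<String, Integer> rank(Map<String, Double> scores, double weight) {
        Comparator<Map.Entry<String, Double>> byScore = Map.Entry.comparingByValue();
        if (weight >= 0) {
            byScore = byScore.reversed();
        }
        List<Map.Entry<String, Double>> sorted = new ArrayList<>(scores.entrySet());
        sorted.sort(byScore);
        Map<String, Integer> ranks = new HashMap<>();
        for (int i = 0; i < sorted.size(); i++) {
            ranks.put(sorted.get(i).getKey(), i + 1);
        }
        return ranks;
    }

    // Computes WRRF(d) = sum over components of |w_i| / (k + r_i'(d)).
    static Map<String, Double> fuse(List<Map<String, Double>> componentScores, double[] weights) {
        Map<String, Double> fused = new HashMap<>();
        for (int i = 0; i < componentScores.size(); i++) {
            Map<String, Integer> ranks = rank(componentScores.get(i), weights[i]);
            double absWeight = Math.abs(weights[i]); // the sign only affected the sort order
            for (Map.Entry<String, Integer> e : ranks.entrySet()) {
                fused.merge(e.getKey(), absWeight / (K + e.getValue()), Double::sum);
            }
        }
        return fused;
    }

    public static void main(String[] args) {
        // Component 0: a relevance-like score, higher is better (positive weight).
        Map<String, Double> textScores = Map.of("a", 0.9, "b", 0.5, "c", 0.1);
        // Component 1: a distance-like score, lower is better (negative weight).
        Map<String, Double> distances = Map.of("a", 0.8, "b", 0.3, "c", 0.1);

        Map<String, Double> fused =
            fuse(List.of(textScores, distances), new double[] {1.0, -2.0});
        fused.forEach((doc, score) -> System.out.println(doc + " -> " + score));
    }
}
```

In this made-up data, document "c" is ranked last by the first component but first by the second, which carries twice the weight, so it ends up with the highest fused score: 1/63 + 2/61, against 3/62 for "b" and 1/61 + 2/63 for "a".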
Intuition Behind This Modification
This modification addresses a key scenario in information retrieval and ranking fusion: sometimes "lower is better" rather than "higher is better."
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines