Adding MinScore support to Linear Retriever #124182
Closed
21 commits
a41dac9 MinScore implementation in Linear retriever (mridula-s109)
61dd8df [CI] Auto commit changes from spotless (elasticsearchmachine)
6be41de Merge remote-tracking branch 'origin/minscore-linear' into minscore-l… (mridula-s109)
a52628c Resolving PR comments (mridula-s109)
a225c01 Fixed PR comments, added yaml and made changes to the markdown (mridula-s109)
429a620 Merge branch 'main' into minscore-linear (mridula-s109)
95710e6 Update docs/changelog/124182.yaml (mridula-s109)
a32f947 [CI] Auto commit changes from spotless (elasticsearchmachine)
a424e77 Resolved on the PR comments (mridula-s109)
38a9b50 [CI] Auto commit changes from spotless (elasticsearchmachine)
810d151 Added changes wrt to yaml testing from PR comments (mridula-s109)
88703fd Worked on kathleen comments first half (mridula-s109)
d9b44e2 Reverted the integration test in line with the main branch (mridula-s109)
46a1b94 Resolved comments in the PR and its in compiling state (mridula-s109)
73a9bad Unit tests passing (mridula-s109)
6f93d3d [CI] Auto commit changes from spotless (elasticsearchmachine)
1fd4a22 reverted inclusion of pit in retrierver it file (mridula-s109)
f53cd0a Removed transport versions (mridula-s109)
9be3783 Modified rrfRank doc to the way main was (mridula-s109)
7e2f732 Committing the changes done until now, will be doing a clean commit next (mridula-s109)
3feaa3f [CI] Auto commit changes from spotless (elasticsearchmachine)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
pr: 124182 | ||
summary: Add `min_score` support to linear retriever | ||
area: Search | ||
type: enhancement | ||
issues: [] |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -29,14 +29,36 @@ public class RankDocsQueryBuilder extends AbstractQueryBuilder<RankDocsQueryBuil | |
|
||
public static final String NAME = "rank_docs_query"; | ||
|
||
/** | ||
* Default minimum score threshold for documents to be included in results. | ||
* Using Float.MIN_VALUE as the default ensures that by default no documents | ||
* are filtered out based on score, as virtually all scores will be above this threshold. | ||
* | ||
* This threshold is separate from the special handling of scores that are exactly 0: | ||
* - The minScore parameter determines which documents are included in results based on their score | ||
* - Documents with a score of exactly 0 will always be assigned Float.MIN_VALUE internally | ||
* to differentiate them from filtered matches, regardless of the minScore value | ||
* | ||
* Setting minScore to a higher value (e.g., 0.0f) would filter out documents with scores below that threshold, | ||
* which can be useful to remove documents that only match filters but have no relevance score contribution. | ||
*/ | ||
public static final float DEFAULT_MIN_SCORE = Float.MIN_VALUE; | ||
|
||
private final RankDoc[] rankDocs; | ||
private final QueryBuilder[] queryBuilders; | ||
private final boolean onlyRankDocs; | ||
private final float minScore; | ||
private boolean countFilteredHits = false; | ||
|
||
public RankDocsQueryBuilder(RankDoc[] rankDocs, QueryBuilder[] queryBuilders, boolean onlyRankDocs) { | ||
this(rankDocs, queryBuilders, onlyRankDocs, DEFAULT_MIN_SCORE); | ||
} | ||
|
||
public RankDocsQueryBuilder(RankDoc[] rankDocs, QueryBuilder[] queryBuilders, boolean onlyRankDocs, float minScore) { | ||
this.rankDocs = rankDocs; | ||
this.queryBuilders = queryBuilders; | ||
this.onlyRankDocs = onlyRankDocs; | ||
this.minScore = minScore; | ||
} | ||
|
||
public RankDocsQueryBuilder(StreamInput in) throws IOException { | ||
|
@@ -45,9 +67,17 @@ public RankDocsQueryBuilder(StreamInput in) throws IOException { | |
if (in.getTransportVersion().onOrAfter(TransportVersions.V_8_16_0)) { | ||
this.queryBuilders = in.readOptionalArray(c -> c.readNamedWriteable(QueryBuilder.class), QueryBuilder[]::new); | ||
this.onlyRankDocs = in.readBoolean(); | ||
this.minScore = in.readFloat(); | ||
if (in.getTransportVersion().onOrAfter(TransportVersions.V_8_17_0)) { | ||
this.countFilteredHits = in.readBoolean(); | ||
} else { | ||
this.countFilteredHits = false; | ||
} | ||
} else { | ||
this.queryBuilders = null; | ||
this.onlyRankDocs = false; | ||
|
||
this.minScore = DEFAULT_MIN_SCORE; | ||
this.countFilteredHits = false; | ||
} | ||
} | ||
|
||
|
@@ -70,7 +100,7 @@ protected QueryBuilder doRewrite(QueryRewriteContext queryRewriteContext) throws | |
changed |= newQueryBuilders[i] != queryBuilders[i]; | ||
} | ||
if (changed) { | ||
RankDocsQueryBuilder clone = new RankDocsQueryBuilder(rankDocs, newQueryBuilders, onlyRankDocs); | ||
RankDocsQueryBuilder clone = new RankDocsQueryBuilder(rankDocs, newQueryBuilders, onlyRankDocs, minScore); | ||
clone.queryName(queryName()); | ||
return clone; | ||
} | ||
|
@@ -88,6 +118,10 @@ protected void doWriteTo(StreamOutput out) throws IOException { | |
if (out.getTransportVersion().onOrAfter(TransportVersions.V_8_16_0)) { | ||
out.writeOptionalArray(StreamOutput::writeNamedWriteable, queryBuilders); | ||
out.writeBoolean(onlyRankDocs); | ||
out.writeFloat(minScore); | ||
Review comment: You'll need to reference the new transport version you create here as well, to ensure that serialization is consistent.
||
if (out.getTransportVersion().onOrAfter(TransportVersions.V_8_17_0)) { | ||
out.writeBoolean(countFilteredHits); | ||
} | ||
} | ||
} | ||
|
||
|
@@ -115,7 +149,12 @@ protected Query doToQuery(SearchExecutionContext context) throws IOException { | |
queries = new Query[0]; | ||
queryNames = Strings.EMPTY_ARRAY; | ||
} | ||
return new RankDocsQuery(reader, shardRankDocs, queries, queryNames, onlyRankDocs); | ||
|
||
RankDocsQuery query = new RankDocsQuery(reader, shardRankDocs, queries, queryNames, onlyRankDocs, minScore); | ||
if (countFilteredHits) { | ||
query.setCountFilteredHits(true); | ||
} | ||
return query; | ||
} | ||
|
||
@Override | ||
|
@@ -135,16 +174,31 @@ protected void doXContent(XContentBuilder builder, Params params) throws IOExcep | |
protected boolean doEquals(RankDocsQueryBuilder other) { | ||
return Arrays.equals(rankDocs, other.rankDocs) | ||
&& Arrays.equals(queryBuilders, other.queryBuilders) | ||
&& onlyRankDocs == other.onlyRankDocs; | ||
&& onlyRankDocs == other.onlyRankDocs | ||
&& minScore == other.minScore | ||
&& countFilteredHits == other.countFilteredHits; | ||
} | ||
|
||
@Override | ||
protected int doHashCode() { | ||
return Objects.hash(Arrays.hashCode(rankDocs), Arrays.hashCode(queryBuilders), onlyRankDocs); | ||
return Objects.hash(Arrays.hashCode(rankDocs), Arrays.hashCode(queryBuilders), onlyRankDocs, minScore, countFilteredHits); | ||
} | ||
|
||
@Override | ||
public TransportVersion getMinimalSupportedVersion() { | ||
return TransportVersions.V_8_16_0; | ||
} | ||
|
||
/** | ||
* Sets whether this query should count only documents that pass the min_score filter. | ||
* When true, the total hits count will reflect the number of documents meeting the minimum score threshold. | ||
* When false (default), the total hits count will include all matching documents regardless of score. | ||
* | ||
* @param countFilteredHits true to count only documents passing min_score, false to count all matches | ||
* @return this builder | ||
*/ | ||
public RankDocsQueryBuilder setCountFilteredHits(boolean countFilteredHits) { | ||
this.countFilteredHits = countFilteredHits; | ||
return this; | ||
} | ||
} |
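
The Javadoc on `DEFAULT_MIN_SCORE` depends on `Float.MIN_VALUE` being the smallest positive `float`, not the most negative one. This means `0.0f` is below the default, so the Javadoc's example of "a higher value (e.g., 0.0f)" reads as inverted. The sketch below illustrates this with a hypothetical `passes` helper that assumes an inclusive `score >= minScore` comparison. The actual comparison in `RankDocsQuery` is not shown in this diff, so treat the helper as an assumption rather than the PR's logic.

```java
// Hypothetical sketch: not part of the PR. It only illustrates how
// Float.MIN_VALUE behaves when used as a default score threshold.
final class MinScoreDefaultSketch {
    // Same default as in the diff: the smallest positive float (2^-149).
    static final float DEFAULT_MIN_SCORE = Float.MIN_VALUE;

    // Assumed inclusive comparison; the real RankDocsQuery check may differ.
    static boolean passes(float score, float minScore) {
        return score >= minScore;
    }

    public static void main(String[] args) {
        // Float.MIN_VALUE is strictly greater than zero.
        System.out.println(Float.MIN_VALUE > 0.0f);
        // A score of exactly 0 would fall below the default threshold,
        // unless it is first bumped to Float.MIN_VALUE as the Javadoc describes.
        System.out.println(passes(0.0f, DEFAULT_MIN_SCORE));
        System.out.println(passes(Float.MIN_VALUE, DEFAULT_MIN_SCORE));
        // A threshold of 0.0f is lower than the default, so it admits zero scores.
        System.out.println(passes(0.0f, 0.0f));
    }
}
```

Under that assumed comparison, a true "no filtering" default for arbitrary scores would be a negative bound such as `-Float.MAX_VALUE`; whether that matters here depends on whether negative or zero scores can reach this query.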
As this is written, it will break transport serialization. What you want to do is register a new `TransportVersion` for your change in `TransportVersions.java`. You'll then reference this new transport version to serialize the min score; older transport versions will always just use a default value.

If I want to include min_score without transport versioning changes, is the way it is currently written fine?
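
The review advice in this thread amounts to gating the new field on a transport version on both the write side and the read side, so that a node on an older version never sees bytes it does not expect. The sketch below shows that pattern using only `java.io` streams. The integer constants `V_OLD` and `V_MIN_SCORE_ADDED` are hypothetical stand-ins for entries in `TransportVersions.java`, and the class does not use Elasticsearch's `StreamInput`/`StreamOutput`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of version-gated serialization; not the PR's code.
final class MinScoreWireSketch {
    static final int V_OLD = 1;             // hypothetical: a version before min score existed
    static final int V_MIN_SCORE_ADDED = 2; // hypothetical: the newly registered version
    static final float DEFAULT_MIN_SCORE = Float.MIN_VALUE;

    // Writes the payload as a peer speaking wireVersion expects it.
    static byte[] write(int wireVersion, boolean onlyRankDocs, float minScore) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeBoolean(onlyRankDocs);
            if (wireVersion >= V_MIN_SCORE_ADDED) {
                out.writeFloat(minScore); // only newer peers know about this field
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the min score, falling back to the default for older peers.
    static float readMinScore(int wireVersion, byte[] payload) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload));
            in.readBoolean(); // onlyRankDocs, present in every version of this sketch
            return wireVersion >= V_MIN_SCORE_ADDED ? in.readFloat() : DEFAULT_MIN_SCORE;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] newer = write(V_MIN_SCORE_ADDED, true, 0.5f);
        byte[] older = write(V_OLD, true, 0.5f);
        System.out.println(readMinScore(V_MIN_SCORE_ADDED, newer));
        System.out.println(readMinScore(V_OLD, older));
    }
}
```

Without the version check, a writer that always emits the float and a reader on an older version that never consumes it would disagree on the byte layout, which is the kind of breakage the review comment warns about.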