-
Notifications
You must be signed in to change notification settings - Fork 25.2k
lucene_snapshot: Update to new Lucene 10.3 postings format #128240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: lucene_snapshot
Are you sure you want to change the base?
lucene_snapshot: Update to new Lucene 10.3 postings format #128240
Conversation
Pinging @elastic/es-search (Team:Search) |
@@ -318,7 +318,10 @@ public void testRangeQuery() throws IOException { | |||
LongPoint.newRangeQuery("field", instant1, instant2), | |||
SortedNumericDocValuesField.newSlowRangeQuery("field", instant1, instant2) | |||
); | |||
// TODO: temporarily disable - the range query rewrites to a MatchNoDocsQuery |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to follow up on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, shout if you need help on this! this is probably due to apache/lucene#14609 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that this case can no longer be tested the way that we do, because we have no data in the index have we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, again I think that the assertions are too strict, and in fact assert an implementation detail. Likely just need to adjust what is expected ( once I spend a little more time on it ).
@@ -952,8 +959,9 @@ public void testMaxScoreQueryVisitor() { | |||
|
|||
// assert score docs are in order and their number is as expected | |||
private static void assertSortResults(TopDocs topDocs, long totalNumDocs, boolean isDoubleSort) { | |||
assertEquals(TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO, topDocs.totalHits.relation()); | |||
assertThat(topDocs.totalHits.value(), lessThan(totalNumDocs)); // we collected less docs than total number | |||
// TODO: fix java.lang.AssertionError: expected:<GREATER_THAN_OR_EQUAL_TO> but was:<EQUAL_TO> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to follow up on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything to do other than update the assertions here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I think that the outcome here will be to adjust the assertions - I just wanna spend a little more time on it before doing that, so as to satisfy myself complete! :-)
@@ -35,12 +36,12 @@ | |||
*/ | |||
public class PerFieldFormatSupplier { | |||
public static final FeatureFlag USE_LUCENE101_POSTINGS_FORMAT = new FeatureFlag("use_lucene101_postings_format"); | |||
public static final FeatureFlag USE_LUCENE103_POSTINGS_FORMAT = new FeatureFlag("use_lucene103_postings_format"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be removed once #128236 is merged.
@Override | ||
public FlatVectorsReader fieldsReader(SegmentReadState state) throws IOException { | ||
SegmentReadState directIOState = new SegmentReadState( | ||
state.directory, | ||
state.segmentInfo, | ||
state.fieldInfos, | ||
DIRECT_IO_CONTEXT, | ||
state.context.withHints(DirectIOHint.INSTANCE), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used a specific implementation of IOContext
so the hints weren't overridden by calls to withHints
within Lucene99FlatVectorsReader
. I should have commented that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that we should not be changing the context if it was a merge or flush. Should withHints
be additive rather than replace? Or we otherwise build a new hint set from the existing hints along with directIO ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, we have just discovered that other directory implementations can be used. So DirectIOHint should probably be added to all the others - we can modify our implementation here to add DirectIO
to any withHints
that are specified
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments, thanks @ChrisHegarty !
@@ -461,7 +461,8 @@ | |||
org.elasticsearch.index.codec.Elasticsearch814Codec, | |||
org.elasticsearch.index.codec.Elasticsearch816Codec, | |||
org.elasticsearch.index.codec.Elasticsearch900Codec, | |||
org.elasticsearch.index.codec.Elasticsearch900Lucene101Codec; | |||
org.elasticsearch.index.codec.Elasticsearch900Lucene101Codec, | |||
org.elasticsearch.index.codec.Elasticsearch902Lucene103Codec; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that makes me wonder why I used 900 instead of 90 back when I added it in the first place. 9.2.0 could have been 92 as well and perhaps 902 makes it look like a bugfix release which it is not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh good point. So this should be Elasticsearch92Lucene103Codec
, then right ? I'll update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better yea. pardon the inconsistency with 900 that we will have to live with.
@@ -318,7 +318,10 @@ public void testRangeQuery() throws IOException { | |||
LongPoint.newRangeQuery("field", instant1, instant2), | |||
SortedNumericDocValuesField.newSlowRangeQuery("field", instant1, instant2) | |||
); | |||
// TODO: temporarily disable - the range query rewrites to a MatchNoDocsQuery |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, shout if you need help on this! this is probably due to apache/lucene#14609 ?
@@ -318,7 +318,10 @@ public void testRangeQuery() throws IOException { | |||
LongPoint.newRangeQuery("field", instant1, instant2), | |||
SortedNumericDocValuesField.newSlowRangeQuery("field", instant1, instant2) | |||
); | |||
// TODO: temporarily disable - the range query rewrites to a MatchNoDocsQuery |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that this case can no longer be tested the way that we do, because we have no data in the index have we?
@@ -952,8 +959,9 @@ public void testMaxScoreQueryVisitor() { | |||
|
|||
// assert score docs are in order and their number is as expected | |||
private static void assertSortResults(TopDocs topDocs, long totalNumDocs, boolean isDoubleSort) { | |||
assertEquals(TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO, topDocs.totalHits.relation()); | |||
assertThat(topDocs.totalHits.value(), lessThan(totalNumDocs)); // we collected less docs than total number | |||
// TODO: fix java.lang.AssertionError: expected:<GREATER_THAN_OR_EQUAL_TO> but was:<EQUAL_TO> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything to do other than update the assertions here?
@@ -817,6 +817,13 @@ public void testNumericSortOptimization() throws Exception { | |||
|
|||
Query q = LongPoint.newRangeQuery(fieldNameLong, startLongValue, startLongValue + numDocs); | |||
|
|||
// 0. test assertion - the query rewritten to a match all - https://github.com/apache/lucene/pull/14609/ | |||
// TODO: reflow total hits expectations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this refers to the other TODO below or is it something else?
@@ -0,0 +1,1191 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we forking this from Lucene? Why so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately we have to copy (not fork per se). Since the writer has been moved to backward compat test in Lucene - so is not in any shipping Lucene jar :-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And, why do we need it again? Sorry if it's a dumb question, it is not obvious to me.
import java.io.IOException; | ||
|
||
/** Compression algorithm used for suffixes of a block of terms. */ | ||
public enum CompressionAlgorithm { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this class is somehow surprising, what makes it required to fork on our end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's adjust the access modifier in Lucene to avoid this needless copying - apache/lucene#14695
There are a few different things going on in this PR, all of which are required to get the
lucene_snspahot
branch building again, but the most substantial is the update to the new Lucene 10.3 postings format,Lucene103PostingsFormat
.Additionally, changes are required because of the removal of deprecated methods in IOContext, as well as the override of hints for direct IO.