Skip to content

lucene_snapshot: Update to new Lucene 10.3 postings format #128240

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: lucene_snapshot
Choose a base branch
from

Conversation

ChrisHegarty
Copy link
Contributor

There are a few different things going on in this PR, all of which are required to get the lucene_snspahot branch building again, but the most substantial is the update to the new Lucene 10.3 postings format, Lucene103PostingsFormat.

Additionally, changes are required because of the removal of deprecated methods in IOContext, as well as the override of hints for direct IO.

@ChrisHegarty ChrisHegarty added >non-issue :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team labels May 21, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@@ -318,7 +318,10 @@ public void testRangeQuery() throws IOException {
LongPoint.newRangeQuery("field", instant1, instant2),
SortedNumericDocValuesField.newSlowRangeQuery("field", instant1, instant2)
);
// TODO: temporarily disable - the range query rewrites to a MatchNoDocsQuery
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to follow up on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, shout if you need help on this! this is probably due to apache/lucene#14609 ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that this case can no longer be tested the way that we do, because we have no data in the index have we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, again I think that the assertions are too strict, and in fact assert an implementation detail. Likely just need to adjust what is expected ( once I spend a little more time on it ).

@@ -952,8 +959,9 @@ public void testMaxScoreQueryVisitor() {

// assert score docs are in order and their number is as expected
private static void assertSortResults(TopDocs topDocs, long totalNumDocs, boolean isDoubleSort) {
assertEquals(TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO, topDocs.totalHits.relation());
assertThat(topDocs.totalHits.value(), lessThan(totalNumDocs)); // we collected less docs than total number
// TODO: fix java.lang.AssertionError: expected:<GREATER_THAN_OR_EQUAL_TO> but was:<EQUAL_TO>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to follow up on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything to do other than update the assertions here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I think that the outcome here will be to adjust the assertions - I just wanna spend a little more time on it before doing that, so as to satisfy myself complete! :-)

@@ -35,12 +36,12 @@
*/
public class PerFieldFormatSupplier {
public static final FeatureFlag USE_LUCENE101_POSTINGS_FORMAT = new FeatureFlag("use_lucene101_postings_format");
public static final FeatureFlag USE_LUCENE103_POSTINGS_FORMAT = new FeatureFlag("use_lucene103_postings_format");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be removed once #128236 is merged.

@Override
public FlatVectorsReader fieldsReader(SegmentReadState state) throws IOException {
SegmentReadState directIOState = new SegmentReadState(
state.directory,
state.segmentInfo,
state.fieldInfos,
DIRECT_IO_CONTEXT,
state.context.withHints(DirectIOHint.INSTANCE),
Copy link
Member

@thecoop thecoop May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used a specific implementation of IOContext so the hints weren't overridden by calls to withHints within Lucene99FlatVectorsReader. I should have commented that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that we should not be changing the context if it was a merge or flush. Should withHints be additive rather than replace? Or we otherwise build a new hint set from the existing hints along with directIO ?

Copy link
Member

@thecoop thecoop May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, we have just discovered that other directory implementations can be used. So DirectIOHint should probably be added to all the others - we can modify our implementation here to add DirectIO to any withHints that are specified

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments, thanks @ChrisHegarty !

@@ -461,7 +461,8 @@
org.elasticsearch.index.codec.Elasticsearch814Codec,
org.elasticsearch.index.codec.Elasticsearch816Codec,
org.elasticsearch.index.codec.Elasticsearch900Codec,
org.elasticsearch.index.codec.Elasticsearch900Lucene101Codec;
org.elasticsearch.index.codec.Elasticsearch900Lucene101Codec,
org.elasticsearch.index.codec.Elasticsearch902Lucene103Codec;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes me wonder why I used 900 instead of 90 back when I added it in the first place. 9.2.0 could have been 92 as well and perhaps 902 makes it look like a bugfix release which it is not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good point. So this should be Elasticsearch92Lucene103Codec, then right ? I'll update it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better yea. pardon the inconsistency with 900 that we will have to live with.

@@ -318,7 +318,10 @@ public void testRangeQuery() throws IOException {
LongPoint.newRangeQuery("field", instant1, instant2),
SortedNumericDocValuesField.newSlowRangeQuery("field", instant1, instant2)
);
// TODO: temporarily disable - the range query rewrites to a MatchNoDocsQuery
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, shout if you need help on this! this is probably due to apache/lucene#14609 ?

@@ -318,7 +318,10 @@ public void testRangeQuery() throws IOException {
LongPoint.newRangeQuery("field", instant1, instant2),
SortedNumericDocValuesField.newSlowRangeQuery("field", instant1, instant2)
);
// TODO: temporarily disable - the range query rewrites to a MatchNoDocsQuery
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that this case can no longer be tested the way that we do, because we have no data in the index have we?

@@ -952,8 +959,9 @@ public void testMaxScoreQueryVisitor() {

// assert score docs are in order and their number is as expected
private static void assertSortResults(TopDocs topDocs, long totalNumDocs, boolean isDoubleSort) {
assertEquals(TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO, topDocs.totalHits.relation());
assertThat(topDocs.totalHits.value(), lessThan(totalNumDocs)); // we collected less docs than total number
// TODO: fix java.lang.AssertionError: expected:<GREATER_THAN_OR_EQUAL_TO> but was:<EQUAL_TO>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything to do other than update the assertions here?

@@ -817,6 +817,13 @@ public void testNumericSortOptimization() throws Exception {

Query q = LongPoint.newRangeQuery(fieldNameLong, startLongValue, startLongValue + numDocs);

// 0. test assertion - the query rewritten to a match all - https://github.com/apache/lucene/pull/14609/
// TODO: reflow total hits expectations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this refers to the other TODO below or is it something else?

@@ -0,0 +1,1191 @@
/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we forking this from Lucene? Why so?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we have to copy (not fork per se). Since the writer has been moved to backward compat test in Lucene - so is not in any shipping Lucene jar :-(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, why do we need it again? Sorry if it's a dumb question, it is not obvious to me.

import java.io.IOException;

/** Compression algorithm used for suffixes of a block of terms. */
public enum CompressionAlgorithm {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this class is somehow surprising, what makes it required to fork on our end?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's adjust the access modifier in Lucene to avoid this needless copying - apache/lucene#14695

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants