
Upgrade Lucene to version 9.12.2 #2971

Closed
suraj-subrahmanyan wants to merge 4 commits into castorini:master from suraj-subrahmanyan:upgrade-lucene


@suraj-subrahmanyan
Contributor

This PR addresses issue #2947.

Start: 9.9.1

9.9.1 --> 9.10.0

Error:

[ERROR] Failures:
[ERROR]   SearchFlatDenseVectorsTest.testBasicCosDprQuantized:376 expected:<[384]> but was:<[136]>
[ERROR]   SearchHnswDenseVectorsTest.testBasicCosDprQuantized:394 expected:<[384]> but was:<[136]>

Caused by: GITHUB#13090.
Solution: Updated test output with new relevance scores and other metrics.

9.10.0 --> 9.11.0

Error:

[ERROR] /home/ssubr/TASK/anserini/src/main/java/io/anserini/index/codecs/AnseriniLucene99ScalarQuantizedVectorsFormat.java:[19,32] cannot find symbol
[ERROR]   symbol:   class FlatVectorsFormat
[ERROR]   location: package org.apache.lucene.codecs
[ERROR] /home/ssubr/TASK/anserini/src/main/java/io/anserini/index/codecs/AnseriniLucene99ScalarQuantizedVectorsFormat.java:[20,32] cannot find symbol
[ERROR]   symbol:   class FlatVectorsReader
[ERROR]   location: package org.apache.lucene.codecs
[ERROR] /home/ssubr/TASK/anserini/src/main/java/io/anserini/index/codecs/AnseriniLucene99ScalarQuantizedVectorsFormat.java:[21,32] cannot find symbol
[ERROR]   symbol:   class FlatVectorsWriter
......

Caused by: GITHUB#13288.
Solution: Updated imports and later the expected test output (similar to 9.9.1 --> 9.10.0).
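The "cannot find symbol" errors above indicate that the three flat-vector classes no longer live in org.apache.lucene.codecs. One way to confirm where a class sits on a given classpath before editing imports is a small reflection probe. The sketch below is not part of this PR; ClassLocator is a hypothetical helper, and the org.apache.lucene.codecs.hnsw package name is an assumption about the new location that should be checked against the Lucene Javadoc for the target version.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: reports the first candidate class name present on the classpath. */
final class ClassLocator {
  private ClassLocator() {}

  static Optional<String> firstAvailable(List<String> candidates) {
    for (String name : candidates) {
      try {
        // initialize=false: only check that the class can be found, do not run static initializers.
        Class.forName(name, false, ClassLocator.class.getClassLoader());
        return Optional.of(name);
      } catch (ClassNotFoundException | LinkageError e) {
        // Not available on this classpath; try the next candidate.
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    List<String> candidates = List.of(
        "org.apache.lucene.codecs.hnsw.FlatVectorsFormat", // assumed new location (not stated in this PR)
        "org.apache.lucene.codecs.FlatVectorsFormat");     // location named in the compiler error
    System.out.println(
        firstAvailable(candidates).orElse("FlatVectorsFormat not found on classpath"));
  }
}
```

Run with the Lucene jars of each version on the classpath, the probe prints whichever name resolves; without Lucene on the classpath it prints the fallback message.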

9.11.0 --> 9.11.1
No errors or changes.

9.11.1 --> 9.12.0
Caused by: GITHUB#13469.
Solution: Refactored Lucene99Codec to Lucene912Codec and removed the second parameter from FlatVectorsWriter.addField().

9.12.0 --> 9.12.2
No errors or changes.

After these changes, mvn clean test passes all tests.

@lintool
Member

lintool commented Sep 14, 2025

Hi @suraj-subrahmanyan thanks for working on this!

Do the prebuilt indexes (e.g., https://github.com/castorini/anserini/blob/master/docs/fatjar-regressions/fatjar-regressions-v1.2.2.md ) still work? Or will we have to completely rebuild indexes?

@suraj-subrahmanyan
Contributor Author

The prebuilt indexes are working correctly. In particular, I was able to download prebuilt indexes listed at https://github.com/castorini/anserini/blob/master/docs/prebuilt-indexes.md and read them with IndexReaderUtils -stats. BM25 search, dense vector search, the BRIGHT benchmark, and the BEIR collections also ran correctly, which very likely means that indexes built with older Lucene versions can still be read.

@lintool
Member

lintool commented Sep 15, 2025

hi @suraj-subrahmanyan I noticed that the HNSW scores change... which means we'll need to run regressions from scratch and manually verify/fix all scores... it's going to be a lot of work :(

we'll have to do it all over again when we upgrade to Lucene 10... which makes me think... should we just bite the bullet and upgrade to Lucene 10 directly?

wdyt?
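For the score verification described above, a minimal sketch of the comparison step, assuming the expected and observed metrics have already been collected into maps keyed by regression and metric name. ScoreDiff, the key names, and the numbers in main are hypothetical placeholders, not values from any Anserini regression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: lists metrics whose observed value drifted beyond a tolerance. */
final class ScoreDiff {
  private ScoreDiff() {}

  static List<String> changed(Map<String, Double> expected,
                              Map<String, Double> observed,
                              double tolerance) {
    List<String> report = new ArrayList<>();
    // TreeMap gives a stable, sorted order so reports are comparable across runs.
    for (Map.Entry<String, Double> e : new TreeMap<>(expected).entrySet()) {
      Double got = observed.get(e.getKey());
      if (got == null) {
        report.add(e.getKey() + ": missing");
      } else if (Math.abs(got - e.getValue()) > tolerance) {
        report.add(String.format(Locale.ROOT, "%s: expected %.4f, observed %.4f",
            e.getKey(), e.getValue(), got));
      }
    }
    return report;
  }

  public static void main(String[] args) {
    // Placeholder numbers for illustration only.
    Map<String, Double> expected = Map.of("example-hnsw/nDCG@10", 0.5000, "example-bm25/nDCG@10", 0.4000);
    Map<String, Double> observed = Map.of("example-hnsw/nDCG@10", 0.5020, "example-bm25/nDCG@10", 0.4000);
    changed(expected, observed, 0.001).forEach(System.out::println);
  }
}
```

An empty result means every metric stayed within tolerance; anything listed still needs a manual decision on whether the new score is acceptable.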

@suraj-subrahmanyan
Contributor Author

suraj-subrahmanyan commented Sep 16, 2025

Oh, I now realize that during the Lucene upgrade I had to fix test cases because the vector ranking changed. I should have realized that this likely applies to the rest of the HNSW scores as well. I see your point, and I agree -- I don't think updating all the regressions for 9.12.2 is worth it for the minor benefits it introduces. Although Lucene 10 seems relatively new, upgrading to it might help restructure Anserini for the better in the long term?

@lintool
Member

lintool commented Jan 11, 2026

Closing, superseded by #3082

@lintool lintool closed this Jan 11, 2026
