
Upgrade to Lucene 10.4 #4195

Open
rahulgoswami wants to merge 7 commits into apache:main from rahulgoswami:lucene1040

rahulgoswami (Member) commented Mar 7, 2026

https://issues.apache.org/jira/browse/SOLR-18143

Description

Upgrade Lucene dependency to 10.4

Solution

Followed the instructions in dev-docs/lucene-upgrade.md and resolved the compilation and test failures. Also updated the documentation and upgrade guide where applicable.

rahulgoswami (Member, Author) commented:

WIP... fixing failures in the DenseVectorField tests

rahulgoswami (Member, Author) commented Mar 9, 2026

Put Claude (Opus 4.6) and Codex (GPT 5.4) to work on the compilation errors and on understanding the test failures. The main test failures were in ScalarQuantizedDenseVectorField and BinaryQuantizedDenseVectorField, caused by breaking changes in Lucene104ScalarQuantizedVectorsFormat.

Major changes:

  • Lucene104HnswScalarQuantizedVectorsFormat has moved to an encoding-based API. It no longer accepts the "confidenceInterval" or "compression" params, so those are now no-ops in ScalarQuantizedDenseVectorField and have been removed from the tests and documentation.

  • Added a note in the documentation saying that older Solr 10.x schemas may contain those params, but they should not be used going forward. Existing schemas will continue to be supported.

  • There is no separate binary quantization format in Lucene 10.4. Binary quantization is now just another encoding of Lucene104ScalarQuantizedVectorsFormat (encoding=ScalarEncoding.SINGLE_BIT_QUERY_NIBBLE). However, we need to keep exposing it at the Solr level as a separate type through BinaryQuantizedDenseVectorField for backward compatibility.
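A minimal sketch of the relationship described in the last bullet, assuming a local `Encoding` enum as a stand-in for Lucene's `ScalarEncoding` (only the two constants named in this PR are mirrored) and a hypothetical `FormatSpec` record standing in for the arguments passed to `Lucene104HnswScalarQuantizedVectorsFormat`. It only illustrates that the binary field type can remain a separate Solr type while reusing the scalar format with a pinned encoding; it is not the PR's actual code.

```java
import java.util.Objects;

// Hypothetical sketch, not Solr's code: the binary field type stays a separate Solr type
// but produces the same scalar-quantized format description with a pinned 1-bit encoding.
class QuantizedFieldSketch {

  // Stand-in for Lucene's ScalarEncoding; only the constants named in this PR are mirrored.
  enum Encoding {
    UNSIGNED_BYTE,
    SINGLE_BIT_QUERY_NIBBLE
  }

  // Hypothetical value object for the arguments that would be passed to
  // new Lucene104HnswScalarQuantizedVectorsFormat(encoding, m, efConstruction).
  record FormatSpec(Encoding encoding, int hnswM, int hnswEfConstruction) {
    FormatSpec {
      Objects.requireNonNull(encoding, "encoding");
    }
  }

  // Scalar quantization: the caller chooses the encoding.
  static FormatSpec scalarQuantized(Encoding encoding, int hnswM, int hnswEfConstruction) {
    return new FormatSpec(encoding, hnswM, hnswEfConstruction);
  }

  // Binary quantization: the same format, with the encoding fixed to SINGLE_BIT_QUERY_NIBBLE.
  static FormatSpec binaryQuantized(int hnswM, int hnswEfConstruction) {
    return scalarQuantized(Encoding.SINGLE_BIT_QUERY_NIBBLE, hnswM, hnswEfConstruction);
  }

  public static void main(String[] args) {
    System.out.println(binaryQuantized(16, 100));
    System.out.println(scalarQuantized(Encoding.UNSIGNED_BYTE, 16, 100));
  }
}
```

Keeping the binary type as a thin wrapper means existing schemas that reference it need no change, while only one Lucene format has to be maintained underneath.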

rahulgoswami marked this pull request as ready for review on March 9, 2026 06:21
rahulgoswami requested a review from dsmiley on March 9, 2026 06:21

rahulgoswami (Member, Author) commented Mar 9, 2026

Additionally, Lucene104ScalarQuantizedVectorsFormat now supports 1, 2, 4, 7 and 8 bits, as opposed to only 4 and 7 earlier. I guess that should be scoped to a separate PR with its own test and documentation changes, instead of squashing everything into this upgrade PR.
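To make the difference concrete, here is a hypothetical validation helper (`QuantizationBitsSketch` is not part of Solr) that checks a configured `bits` value against the bit widths listed above. It assumes validation is a simple set lookup; the real field type may validate differently.

```java
import java.util.Set;

// Hypothetical validation helper; not Solr's actual code.
class QuantizationBitsSketch {

  // Bit widths per the PR discussion: the Lucene 10.4 format accepts 1, 2, 4, 7 and 8,
  // while the earlier format accepted only 4 and 7.
  static final Set<Integer> LUCENE_104_BITS = Set.of(1, 2, 4, 7, 8);
  static final Set<Integer> LEGACY_BITS = Set.of(4, 7);

  static boolean isSupported(int bits, boolean lucene104) {
    return (lucene104 ? LUCENE_104_BITS : LEGACY_BITS).contains(bits);
  }

  // Returns the value unchanged, or rejects it with a message listing the allowed widths.
  static int validate(int bits, boolean lucene104) {
    if (!isSupported(bits, lucene104)) {
      throw new IllegalArgumentException(
          "Unsupported bits=" + bits + "; allowed: " + (lucene104 ? LUCENE_104_BITS : LEGACY_BITS));
    }
    return bits;
  }

  public static void main(String[] args) {
    System.out.println(isSupported(2, true)); // accepted under the Lucene 10.4 rule
    System.out.println(isSupported(2, false)); // rejected under the old rule
  }
}
```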

rahulgoswami (Member, Author) commented:

@dsmiley @alessandrobenedetti Requesting a review please.

rahulgoswami (Member, Author) commented Mar 9, 2026

To-do: Lucene104HnswScalarQuantizedVectorsFormat now defaults to 8 bits (ScalarEncoding.UNSIGNED_BYTE), whereas Lucene99HnswScalarQuantizedVectorsFormat defaulted to 7 bits. Should we make 8 the new default for ScalarQuantizedDenseVectorField too (ScalarQuantizedDenseVectorField.DEFAULT_BITS)?
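A hypothetical sketch of what this to-do would change, assuming the default is only consulted when a schema omits `bits`: fields that set `bits` explicitly are unaffected, while fields that omit it would pick up 8 instead of 7. The constant and method names here are illustrative, not the field type's real ones.

```java
// Hypothetical sketch of default resolution; names are illustrative, not Solr's.
class DefaultBitsSketch {

  static final int LEGACY_DEFAULT_BITS = 7; // default of Lucene99HnswScalarQuantizedVectorsFormat
  static final int PROPOSED_DEFAULT_BITS = 8; // Lucene 10.4 default (ScalarEncoding.UNSIGNED_BYTE)

  // Returns the schema's configured value, or the given default when "bits" is absent (null).
  static int resolveBits(Integer configuredBits, int defaultBits) {
    return configuredBits != null ? configuredBits : defaultBits;
  }

  public static void main(String[] args) {
    System.out.println(resolveBits(null, LEGACY_DEFAULT_BITS)); // field omits bits, old default
    System.out.println(resolveBits(null, PROPOSED_DEFAULT_BITS)); // field omits bits, new default
    System.out.println(resolveBits(4, PROPOSED_DEFAULT_BITS)); // explicit value wins
  }
}
```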

rahulgoswami (Member, Author) commented:

Also adding @liangkaiwen since he worked on these quantization types for 10.0. Kevin, could you please take a look at the related changes?

dsmiley (Contributor) left a review comment:

Remember to update solr/solr-ref-guide/modules/upgrade-notes/pages/major-changes-in-solr-10.adoc.

Contributor

Remember the discussion thread about Lucene upgrades that include a codec update, and the need to announce this in the upgrade notes page? This markdown file should explicitly add that step.

rahulgoswami (Member, Author), Mar 22, 2026

I don't remember a discussion thread about this, and a quick search of the dev list wasn't helpful either. There is also no mention of codec changes in the current solr/solr-ref-guide/modules/upgrade-notes/pages/major-changes-in-solr-10.adoc. Do you have a link to the discussion?

// Retain parsing of these legacy Solr 10 schema params ("compress", "confidenceInterval") for
// compatibility, even though Lucene 10.4's scalar-quantized vector format no longer consumes
// them directly. They are no-op going forward.
Contributor

Then maybe let's mark the params as deprecated, with a version param of 10.1.

Contributor

Also, use the DeprecationLog so that users see a warning if they provide these params.
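One way these review suggestions could fit together, sketched under assumptions: the legacy params are removed from the field-type args map (parsed and ignored), and each one found is reported through a `warn` callback. The callback is a placeholder for Solr's `DeprecationLog`, whose real signature is not reproduced here; `LegacyParamsSketch`, its method names, the feature ids, and the message text (which uses the 10.1 version proposed above) are all hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;

// Hypothetical sketch; the warn callback stands in for Solr's DeprecationLog.
class LegacyParamsSketch {

  // Legacy params named in this PR's code comment and documentation note.
  static final List<String> LEGACY_PARAMS =
      List.of("compress", "confidenceInterval", "dynamicConfidenceInterval");

  // Removes any legacy params from the args map so they are consumed but ignored,
  // reports each one through warn(featureId, message), and returns the names found.
  static Set<String> consumeLegacyParams(
      Map<String, String> args, BiConsumer<String, String> warn) {
    Set<String> found = new LinkedHashSet<>();
    for (String name : LEGACY_PARAMS) {
      if (args.containsKey(name)) {
        args.remove(name);
        found.add(name);
        warn.accept(
            "ScalarQuantizedDenseVectorField." + name,
            "'" + name + "' is deprecated since 10.1 and ignored; remove it from the schema");
      }
    }
    return found;
  }

  public static void main(String[] args) {
    Map<String, String> fieldArgs =
        new HashMap<>(Map.of("compress", "true", "vectorDimension", "4"));
    Set<String> ignored =
        consumeLegacyParams(fieldArgs, (id, msg) -> System.err.println("WARN " + id + ": " + msg));
    System.out.println(ignored + " " + fieldArgs.keySet());
  }
}
```

Removing the keys, rather than merely reading them, is the design choice that lets an older schema load cleanly if the surrounding code rejects unconsumed arguments; that behavior is an assumption here, not something this PR states.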

Comment on lines -26 to +28
- return new Lucene102HnswBinaryQuantizedVectorsFormat(getHnswM(), getHnswEfConstruction());
+ return new Lucene104HnswScalarQuantizedVectorsFormat(
+     ScalarEncoding.SINGLE_BIT_QUERY_NIBBLE, getHnswM(), getHnswEfConstruction());
Contributor

Comment on lines -103 to -104
- useCompression(),
- getConfidenceInterval(),
Contributor

Perhaps reduce the visibility of the accessors and/or mark them deprecated.

- public boolean useCompression() {
+ @Deprecated
+ @VisibleForTesting
+ boolean useCompression() {
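A dependency-free sketch of the suggested accessor change, assuming Java's `@Deprecated(since = ...)` element is used to record the 10.1 version proposed earlier in the review. `@VisibleForTesting` is omitted so the example compiles without Solr on the classpath, and `DeprecatedAccessorSketch` is a hypothetical class, not `ScalarQuantizedDenseVectorField`.

```java
// Hypothetical sketch of the reviewer's suggestion; not the actual Solr field type.
class DeprecatedAccessorSketch {

  // Legacy value is still parsed from the schema but no longer passed to the Lucene format.
  private final boolean compress;

  DeprecatedAccessorSketch(boolean compress) {
    this.compress = compress;
  }

  /**
   * @deprecated Lucene 10.4's scalar-quantized format no longer consumes this value; it is
   *     retained only so tests can check that legacy schemas still parse.
   */
  @Deprecated(since = "10.1")
  boolean useCompression() { // package-private, narrower than the former public accessor
    return compress;
  }

  public static void main(String[] args) {
    System.out.println(new DeprecatedAccessorSketch(true).useCompression());
  }
}
```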

+
Accepted values: `BOOLEAN`
`NOTE:` Existing Solr 10 schemas may still contain legacy scalar-quantization parameters such as
`confidenceInterval`, `dynamicConfidenceInterval`, or `compress`. They are accepted for backward
Contributor

Suggested change
- `confidenceInterval`, `dynamicConfidenceInterval`, or `compress`. They are accepted for backward
+ `confidenceInterval`, `dynamicConfidenceInterval`, or `compress`. They are deprecated (i.e. parsed and ignored but not used) for backward

rahulgoswami (Member, Author) commented:

Thank you for the review @dsmiley and @cpoerschke. I have incorporated all the comments except one by David (pending clarification).


3 participants