Conversation
|
WIP...fixing failures with DenseVectorField tests |
…tSize_shouldThrowException
…to changes in Lucene104ScalarQuantizedVectorField format in Lucene 10.4
|
Put Claude(Opus 4.6) and Codex (GPT 5.4) to work on compilation errors and to understand test failures. Main test failures were around ScalarQuantizedDenseVectorField and BinaryQuantizedDenseVectorField due to breaking changes in Lucene104ScalarQuantizedVectorsFormat. Major changes:
|
|
Additionally, Lucene104ScalarQuantizedVectorsFormat now supports 1,2,4,7 and 8 bits in the format as opposed to only 4 and 7 earlier. Guess that should be scoped under a separate PR with test and documentation changes instead of squashing everything together in this upgrade PR. |
|
@dsmiley @alessandrobenedetti Requesting a review please. |
|
To-Do: Lucene104HnswScalarQuantizedVectorsFormat now defaults to 8 bits (ScalarEncoding.UNSIGNED_BYTE) instead of 7 bits in Lucene99HnswScalarQuantizedVectorsFormat earlier. Make that the new default for ScalarQuantizedDenseVectorField too (ScalarQuantizedDenseVectorField.DEFAULT_BITS)? |
|
Also adding @liangkaiwen since he worked on these quantization types for 10.0. Kevin, could you please take a look at the related changes? |
dsmiley
left a comment
There was a problem hiding this comment.
Remember solr/solr-ref-guide/modules/upgrade-notes/pages/major-changes-in-solr-10.adoc
There was a problem hiding this comment.
Remember the discussion thread about a Lucene upgrade that includes a Codec update, and needing to broadcast this fact in the upgrade notes page? This markdown file should explicitly add this step.
There was a problem hiding this comment.
I don't think I remember a discussion thread regarding this. Quick search on the dev list wasn't helpful either. No mention of codec changes in the current olr/solr-ref-guide/modules/upgrade-notes/pages/major-changes-in-solr-10.adoc either. Do you have a link to the discussion ?
| // Retain parsing of these ("compress", "confidenceInterval") legacy Solr 10 schema params for | ||
| // compatibility even though | ||
| // Lucene 10.4's scalar-quantized vector format no longer consumes them directly. They are no-op | ||
| // going forward. |
There was a problem hiding this comment.
Then maybe lets mark the params deprecated with a version param of 10.1
There was a problem hiding this comment.
Also, use the DeprecationLog so that the user sees a warning if they provide these params
| return new Lucene102HnswBinaryQuantizedVectorsFormat(getHnswM(), getHnswEfConstruction()); | ||
| return new Lucene104HnswScalarQuantizedVectorsFormat( | ||
| ScalarEncoding.SINGLE_BIT_QUERY_NIBBLE, getHnswM(), getHnswEfConstruction()); |
| useCompression(), | ||
| getConfidenceInterval(), |
There was a problem hiding this comment.
Perhaps reduce visibility of the accessors and/or mark them deprecated.
- public boolean useCompression() {
+ @Deprecated
+ @VisibleForTesting
+ boolean useCompression() {
| + | ||
| Accepted values: `BOOLEAN` | ||
| `NOTE:` Existing Solr 10 schemas may still contain legacy scalar-quantization parameters such as | ||
| `confidenceInterval`, `dynamicConfidenceInterval`, or `compress`. They are accepted for backward |
There was a problem hiding this comment.
| `confidenceInterval`, `dynamicConfidenceInterval`, or `compress`. They are accepted for backward | |
| `confidenceInterval`, `dynamicConfidenceInterval`, or `compress`. They are deprecated (i.e. parsed and ignored but not used) for backward |
…Log to notify the user
|
Thank you for the review @dsmiley and @cpoerschke. I have incorporated all the comments except one by David (pending clarification). |
https://issues.apache.org/jira/browse/SOLR-18143
Description
Upgrade Lucene dependency to 10.4
Solution
Followed instructions in dev-docs/lucene-upgrade.md and resolved compilation/test failures. Also made changes to documentation and upgrade guide wherever applicable.