Open
kelockhart (Member) approved these changes on Apr 15, 2025 and left a comment.
The expected behavior in the unit tests looks good to me.
What?
This PR masks first-author searches created using the dedicated syntactic sugar. It also introduces a configuration path for extending this desugaring to other fields as needed.
Why?
After the Solr upgrade from 7 to 9, we saw a large number of position queries fail. The cause is the max clause limit: previously enforced only for boolean clauses, it now applies to more query types. Unfortunately, prefix queries, which are commonly used in author searches, are expanded to all matching terms prior to lookup. Each of these terms is represented as a single clause, and the clauses are joined together into a conjunction or disjunction. Given the size of our collection, it's not uncommon for an author search to pull in more than 10,000 author names, which is far higher than the default limit.
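To make the failure mode concrete, here is a toy model of the expansion described above. It is a sketch only, not Solr's implementation: the function names, the `TooManyClauses` exception, the sample author terms, and the limit of 1024 (assumed here to match Solr's stock `maxBooleanClauses` setting) are all illustrative assumptions.

```python
from bisect import bisect_left

# Assumption: 1024 stands in for the default clause limit.
MAX_CLAUSES = 1024


class TooManyClauses(Exception):
    """Hypothetical stand-in for the error raised when the limit is exceeded."""


def expand_prefix(sorted_terms, prefix):
    """Return every indexed term that starts with `prefix`.

    `sorted_terms` must be sorted so matching terms are contiguous.
    """
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


def rewrite_prefix_query(sorted_terms, prefix, max_clauses=MAX_CLAUSES):
    """Expand a prefix into one clause per matching term, enforcing the limit."""
    clauses = expand_prefix(sorted_terms, prefix)
    if len(clauses) > max_clauses:
        raise TooManyClauses(
            f"prefix {prefix!r} expands to {len(clauses)} clauses "
            f"(limit {max_clauses})"
        )
    return clauses


# Toy index: one rare name and 10,000 synthetic variants of a common one.
author_terms = sorted(
    ["jones, a"] + [f"smith, j{i:05d}" for i in range(10_000)]
)
```

In this model, `rewrite_prefix_query(author_terms, "jones")` returns a single clause, while `rewrite_prefix_query(author_terms, "smith, j")` raises `TooManyClauses` because the prefix expands to 10,000 terms.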
Alternatives
We could have raised the max clause limit to something ludicrously high, which would have mostly restored the previous behavior. However, in some limited circumstances the limit would still be breached, and users would likely be even more confused than they are today. There's also the chance that a user might discover the increased limit and use it to DDoS ADS/SciX.
Thanks to @sstults for the pairing time spent on this PR.