-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Block loaders for MV_MIN and MV_MAX for keywords #137473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Adds special purpose `BlockLoader` implementations for the `MV_MIN` and `MV_MAX` functions for `keyword` fields with doc values. These are a noop for single valued keywords but should be *much* faster for multivalued keywords. These aren't plugged in yet. We can plug them in and performance test them in elastic#137382. And they give us two more functions we can use to demonstrate elastic#137382.
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thank you, Nik!
| if (docs.count() - offset == 1) { | ||
| return readSingleDoc(factory, docs.get(offset)); | ||
| } | ||
| try (var builder = factory.sortedSetOrdinalsBuilder(ordinals, docs.count() - offset)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use singletonOrdinalsBuilder since each row should have at most one value. However, we are not using it because it requires SortedDocValues. This is not an issue here, but if this pattern recurs, we might consider relaxing singletonOrdinalsBuilder to accept an interface that supports looking up terms by ordinal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
|
||
| @Override | ||
| protected AllReader sortedSetReader(SortedSetDocValues docValues) { | ||
| return new MvMaxSortedSet(docValues); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered wrapping SortedSetDocValues as SortedDocValues and then calling Singleton, but I think the approach in this PR is safer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just a little more paranoid, yeah.
| /** | ||
| * Loads {@code keyword} style fields that are stored as a lookup table. | ||
| */ | ||
| abstract class AbstractBytesRefsFromOrdsBlockLoader extends BlockDocValuesReader.DocValuesBlockLoader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
|
|
||
| @Override | ||
| protected AllReader sortedReader(SortedNumericDocValues docValues) { | ||
| return new MvMinSorted(docValues); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: MvMaxSorted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
| return "IntsFromDocValues[" + fieldName + "]"; | ||
| } | ||
|
|
||
| public static class MvMinSorted extends BlockDocValuesReader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: MvMaxSorted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Adds special purpose
BlockLoaderimplementations for theMV_MINandMV_MAXfunctions forkeywordfields with doc values. These are a noop for single valued keywords but should be much faster for multivalued keywords.These aren't plugged in yet. We can plug them in and performance test them in #137382. And they give us two more functions we can use to demonstrate #137382.