Combining filter rewrite and skip list to optimize sub aggregation #19573
Open
jainankitk wants to merge 8 commits into opensearch-project:main from jainankitk:agg-perf
Changes from all commits (8, all by jainankitk):

469cf51  Combining filter rewrite and skip list approaches for further optimiz…
e20f702  Removing parent aggregation check for perf benchmark
82bc95d  Adding changelog entry
aff3dc6  Applying the skip list optimization for AutoDateHistogram
1c29540  Addressing checkstyle failures
b9e9f2b  Apply spotless
a28b9c1  Merge branch 'main' into agg-perf
0a9ef40  Minor bug fix
server/src/main/java/org/opensearch/search/aggregations/BitSetDocIdStream.java (new file, +61 lines)
/*
 * SPDX-License-Identifier: Apache-2.0
 *
 * The OpenSearch Contributors require contributions made to
 * this file be licensed under the Apache-2.0 license or a
 * compatible open source license.
 */

package org.opensearch.search.aggregations;

import org.apache.lucene.search.CheckedIntConsumer;
import org.apache.lucene.search.DocIdStream;
import org.apache.lucene.util.FixedBitSet;
import org.apache.lucene.util.MathUtil;

import java.io.IOException;

/**
 * DocIdStream implementation using FixedBitSet. This is a duplicate of the implementation in Lucene
 * and should eventually be removed.
 *
 * @opensearch.internal
 */
public final class BitSetDocIdStream extends DocIdStream {
    private final FixedBitSet bitSet;
    private final int offset, max;
    private int upTo;

    public BitSetDocIdStream(FixedBitSet bitSet, int offset) {
        this.bitSet = bitSet;
        this.offset = offset;
        upTo = offset;
        max = MathUtil.unsignedMin(Integer.MAX_VALUE, offset + bitSet.length());
    }

    @Override
    public boolean mayHaveRemaining() {
        return upTo < max;
    }

    @Override
    public void forEach(int upTo, CheckedIntConsumer<IOException> consumer) throws IOException {
        if (upTo > this.upTo) {
            upTo = Math.min(upTo, max);
            bitSet.forEach(this.upTo - offset, upTo - offset, offset, consumer);
            this.upTo = upTo;
        }
    }

    @Override
    public int count(int upTo) throws IOException {
        if (upTo > this.upTo) {
            upTo = Math.min(upTo, max);
            int count = bitSet.cardinality(this.upTo - offset, upTo - offset);
            this.upTo = upTo;
            return count;
        } else {
            return 0;
        }
    }
}
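The key property of this stream is that it is consumed statefully: each forEach or count call advances an internal cursor, so successive calls with increasing bounds never revisit a doc. Below is a minimal sketch of that contract without the Lucene dependency, backed by a sorted array instead of a FixedBitSet; the class and method names here (SortedArrayDocIdStream, demo) are ours, not from the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Stand-in for the stateful DocIdStream contract that BitSetDocIdStream implements:
// forEach(upToExclusive, c) consumes only docs below the bound and remembers its
// position, so alternating forEach/count calls each see a disjoint range of docs.
final class SortedArrayDocIdStream {
    private final int[] docs;   // sorted doc IDs
    private int index = 0;      // next doc to consume

    SortedArrayDocIdStream(int[] docs) {
        this.docs = docs;
    }

    boolean mayHaveRemaining() {
        return index < docs.length;
    }

    void forEach(int upToExclusive, IntConsumer consumer) {
        while (index < docs.length && docs[index] < upToExclusive) {
            consumer.accept(docs[index++]);
        }
    }

    int count(int upToExclusive) {
        int start = index;
        while (index < docs.length && docs[index] < upToExclusive) {
            index++;
        }
        return index - start;
    }
}

public class DocIdStreamDemo {
    public static String demo() {
        SortedArrayDocIdStream stream = new SortedArrayDocIdStream(new int[] { 3, 7, 10, 15, 22 });
        List<Integer> seen = new ArrayList<>();
        stream.forEach(10, seen::add);      // consumes 3 and 7
        int counted = stream.count(20);     // counts 10 and 15 without materializing them
        return seen + " " + counted + " " + stream.mayHaveRemaining();
    }

    public static void main(String[] args) {
        System.out.println(demo());  // [3, 7] 2 true
    }
}
```

This count/forEach split is what lets the collector below fall back to cheap counting when no sub-aggregation needs the individual doc IDs.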
...c/main/java/org/opensearch/search/aggregations/bucket/HistogramSkiplistLeafCollector.java (new file, +168 lines)
/*
 * SPDX-License-Identifier: Apache-2.0
 *
 * The OpenSearch Contributors require contributions made to
 * this file be licensed under the Apache-2.0 license or a
 * compatible open source license.
 */

package org.opensearch.search.aggregations.bucket;

import org.apache.lucene.index.DocValuesSkipper;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.DocIdStream;
import org.apache.lucene.search.Scorable;
import org.opensearch.common.Rounding;
import org.opensearch.search.aggregations.LeafBucketCollector;
import org.opensearch.search.aggregations.bucket.terms.LongKeyedBucketOrds;

import java.io.IOException;

/**
 * Histogram collection logic using the doc-values skip list.
 *
 * @opensearch.internal
 */
public class HistogramSkiplistLeafCollector extends LeafBucketCollector {

    private final NumericDocValues values;
    private final DocValuesSkipper skipper;
    private final Rounding.Prepared preparedRounding;
    private final LongKeyedBucketOrds bucketOrds;
    private final LeafBucketCollector sub;
    private final BucketsAggregator aggregator;

    /**
     * Max doc ID (inclusive) up to which all doc values may map to the same bucket.
     */
    private int upToInclusive = -1;

    /**
     * Whether all docs up to {@link #upToInclusive} have values that map to the same bucket.
     */
    private boolean upToSameBucket;

    /**
     * Index in bucketOrds for docs up to {@link #upToInclusive}.
     */
    private long upToBucketIndex;

    public HistogramSkiplistLeafCollector(
        NumericDocValues values,
        DocValuesSkipper skipper,
        Rounding.Prepared preparedRounding,
        LongKeyedBucketOrds bucketOrds,
        LeafBucketCollector sub,
        BucketsAggregator aggregator
    ) {
        this.values = values;
        this.skipper = skipper;
        this.preparedRounding = preparedRounding;
        this.bucketOrds = bucketOrds;
        this.sub = sub;
        this.aggregator = aggregator;
    }

    @Override
    public void setScorer(Scorable scorer) throws IOException {
        if (sub != null) {
            sub.setScorer(scorer);
        }
    }

    private void advanceSkipper(int doc, long owningBucketOrd) throws IOException {
        if (doc > skipper.maxDocID(0)) {
            skipper.advance(doc);
        }
        upToSameBucket = false;

        if (skipper.minDocID(0) > doc) {
            // Corner case which happens if `doc` doesn't have a value and is between two intervals of
            // the doc-value skip index.
            upToInclusive = skipper.minDocID(0) - 1;
            return;
        }

        upToInclusive = skipper.maxDocID(0);

        // Now find the highest level where all docs map to the same bucket.
        for (int level = 0; level < skipper.numLevels(); ++level) {
            int totalDocsAtLevel = skipper.maxDocID(level) - skipper.minDocID(level) + 1;
            long minBucket = preparedRounding.round(skipper.minValue(level));
            long maxBucket = preparedRounding.round(skipper.maxValue(level));

            if (skipper.docCount(level) == totalDocsAtLevel && minBucket == maxBucket) {
                // All docs at this level have a value, and all values map to the same bucket.
                upToInclusive = skipper.maxDocID(level);
                upToSameBucket = true;
                upToBucketIndex = bucketOrds.add(owningBucketOrd, maxBucket);
                if (upToBucketIndex < 0) {
                    upToBucketIndex = -1 - upToBucketIndex;
                }
            } else {
                break;
            }
        }
    }

    @Override
    public void collect(int doc, long owningBucketOrd) throws IOException {
        if (doc > upToInclusive) {
            advanceSkipper(doc, owningBucketOrd);
        }

        if (upToSameBucket) {
            aggregator.incrementBucketDocCount(upToBucketIndex, 1L);
            sub.collect(doc, upToBucketIndex);
        } else if (values.advanceExact(doc)) {
            final long value = values.longValue();
            long bucketIndex = bucketOrds.add(owningBucketOrd, preparedRounding.round(value));
            if (bucketIndex < 0) {
                bucketIndex = -1 - bucketIndex;
                aggregator.collectExistingBucket(sub, doc, bucketIndex);
            } else {
                aggregator.collectBucket(sub, doc, bucketIndex);
            }
        }
    }

    @Override
    public void collect(DocIdStream stream) throws IOException {
        // This will only be called if it's the top-level agg
        collect(stream, 0);
    }

    @Override
    public void collect(DocIdStream stream, long owningBucketOrd) throws IOException {
        // This will only be called if it's a sub-aggregation
        for (;;) {
            int upToExclusive = upToInclusive + 1;
            if (upToExclusive < 0) { // overflow
                upToExclusive = Integer.MAX_VALUE;
            }

            if (upToSameBucket) {
                if (sub == NO_OP_COLLECTOR) {
                    // stream.count may be faster when we don't need to handle sub-aggs
                    long count = stream.count(upToExclusive);
                    aggregator.incrementBucketDocCount(upToBucketIndex, count);
                } else {
                    final int[] count = { 0 };
                    stream.forEach(upToExclusive, doc -> {
                        sub.collect(doc, upToBucketIndex);
                        count[0]++;
                    });
                    aggregator.incrementBucketDocCount(upToBucketIndex, count[0]);
                }
            } else {
                stream.forEach(upToExclusive, doc -> collect(doc, owningBucketOrd));
            }

            if (stream.mayHaveRemaining()) {
                advanceSkipper(upToExclusive, owningBucketOrd);
            } else {
                break;
            }
        }
    }
}
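The collector leans on one encoding convention from LongKeyedBucketOrds.add: a non-negative return means a bucket was just created for the key, while a negative return means the key was seen before and -1 - ord recovers the existing ordinal. That is why advanceSkipper normalizes upToBucketIndex and why collect branches between collectBucket and collectExistingBucket. A toy stand-in (the class ToyBucketOrds and its backing HashMap are ours, purely to illustrate the sign convention):

```java
import java.util.HashMap;
import java.util.Map;

// Toy version of the LongKeyedBucketOrds.add sign convention:
// returns the new ordinal for a first-seen key, and -1 - ordinal for a
// key that already has a bucket.
final class ToyBucketOrds {
    private final Map<Long, Long> ords = new HashMap<>();

    long add(long key) {
        Long existing = ords.get(key);
        if (existing != null) {
            return -1 - existing;        // negative => bucket already exists
        }
        long ord = ords.size();
        ords.put(key, ord);
        return ord;                      // non-negative => newly created bucket
    }
}

public class BucketOrdsDemo {
    public static String demo() {
        ToyBucketOrds ords = new ToyBucketOrds();
        long first = ords.add(1000L);    // new bucket, ordinal 0
        long second = ords.add(2000L);   // new bucket, ordinal 1
        long repeat = ords.add(1000L);   // existing key => encoded negative
        long decoded = -1 - repeat;      // decode back to ordinal 0
        return first + " " + second + " " + repeat + " " + decoded;
    }

    public static void main(String[] args) {
        System.out.println(demo());  // 0 1 -1 0
    }
}
```

Because -1 - ord is its own inverse, a single subtraction both encodes and decodes, avoiding any extra "exists" lookup on the hot collection path.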
Review comment:

So it turns out this preparedRounding is too granular, so upToSameBucket is always false,
i.e. within the same skip list block (4k docs) the min bucket != max bucket.
Auto date histogram does some estimation of the rounding based on a starting interval, so we'll need to do the same within the skip list version.
For now the game plan is:
getRounding()
method)
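The granularity issue in the comment above is easy to reproduce: whether a skip-list block collapses into a single bucket depends entirely on the rounding interval relative to the spread of values in the block. A small sketch with made-up timestamps and intervals (the round helper here mimics fixed-interval rounding, not OpenSearch's Rounding.Prepared):

```java
// Fine-grained rounding makes min and max of a skip-list block land in different
// buckets (upToSameBucket stays false); coarse rounding lets the whole block
// collapse into one bucket. All values below are illustrative only.
public class RoundingGranularityDemo {
    // Simplified fixed-interval rounding: floor the value to a multiple of interval.
    static long round(long value, long intervalMillis) {
        return Math.floorDiv(value, intervalMillis) * intervalMillis;
    }

    public static String demo() {
        long blockMin = 1_700_000_000_000L;        // min timestamp in a block (epoch ms)
        long blockMax = blockMin + 3_600_000L;     // max timestamp ~1 hour later
        long fine = 1_000L;                        // 1-second interval: too granular
        long coarse = 86_400_000L;                 // 1-day interval
        boolean sameBucketFine = round(blockMin, fine) == round(blockMax, fine);
        boolean sameBucketCoarse = round(blockMin, coarse) == round(blockMax, coarse);
        return sameBucketFine + " " + sameBucketCoarse;
    }

    public static void main(String[] args) {
        System.out.println(demo());  // false true
    }
}
```

This is why the auto date histogram case needs the same interval estimation inside the skip-list path: with the initial, too-fine rounding, the minBucket == maxBucket check in advanceSkipper can never succeed and the optimization never kicks in.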