Adding logic for histogram aggregation using skiplist #19130
Conversation
❌ Gradle check result for 2747c0d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
@asimmahmood1 - As discussed offline, I realized that we can get the maximum benefit from skip_list if the index itself is sorted on the field for which skip_list is being used, so that the docIds are aligned with the docValues for that specific field. Let us discuss further once we have updated numbers on data indexed with the sort field specified.
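For context, here is a minimal sketch (not part of this PR) of what sorting an index on the timestamp field looks like at the Lucene level; the OpenSearch index settings `index.sort.field` / `index.sort.order` ultimately configure this, and `@timestamp` is used purely as an example field name:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.SortedNumericSortField;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class IndexSortSketch {
    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory();
        IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
        // Sort each segment by the @timestamp doc values so that doc IDs are laid out
        // in the same order as the field the skip list is built on.
        iwc.setIndexSort(new Sort(new SortedNumericSortField("@timestamp", SortField.Type.LONG)));
        try (IndexWriter writer = new IndexWriter(dir, iwc)) {
            // Index documents here; each flushed segment is written in @timestamp order,
            // so a contiguous range of doc IDs covers a narrow range of values.
        }
    }
}
```

With segments physically sorted this way, a block of consecutive doc IDs spans a narrow value range, which is what lets the skip list prune or bulk-count whole blocks during collection.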
I tested this change with index sort enabled on
query:
Results
Let me capture flamegraphs.
❌ Gradle check result for b39ac57: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: Asim Mahmood <[email protected]>
Added nyc_taxis operation:
Baseline vs Candidate -
I see some significant regression as part of this benchmark run for non-filtered queries. Is the regression due to some setting causing multi range traversal to get skipped?
My baseline is 3.2, so it's not a direct comparison; I'll need to create a new baseline on 3.3. In the meantime, let me check if I can run nyc_taxis manually on another setup.
Signed-off-by: Ankit Jain <[email protected]>
❌ Gradle check result for 0acbf1d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Code changes LGTM! We can merge this PR once we ascertain that the regression seen in opensearch-project/opensearch-benchmark-workloads#697 is not due to this PR. While most of the changes have been done by @asimmahmood1, I initiated the POC PR, hence I am unable to approve it myself.
So this is comparing only the SkiplistAgg, on the same index:

[ec2-user@ip-172-31-61-197 ~]$ opensearch-benchmark compare -b bf0a911e-35c7-4fb1-8588-c6456ed8fa73 -c 5e89aafd-d01a-41bb-ab6b-f3e0e2144ea5

Comparing baseline with contender
Metric | Task | Baseline | Contender | %Diff | Diff | Unit |
---|---|---|---|---|---|---|
Cumulative indexing time of primary shards | | 0 | 0 | 0.00% | 0 | min |
Min cumulative indexing time across primary shard | | 0 | 0 | 0.00% | 0 | min |
Median cumulative indexing time across primary shard | | 0 | 0 | 0.00% | 0 | min |
Max cumulative indexing time across primary shard | | 0 | 0 | 0.00% | 0 | min |
Cumulative indexing throttle time of primary shards | | 0 | 0 | 0.00% | 0 | min |
Min cumulative indexing throttle time across primary shard | | 0 | 0 | 0.00% | 0 | min |
Median cumulative indexing throttle time across primary shard | | 0 | 0 | 0.00% | 0 | min |
Max cumulative indexing throttle time across primary shard | | 0 | 0 | 0.00% | 0 | min |
Cumulative merge time of primary shards | | 0 | 0 | 0.00% | 0 | min |
Cumulative merge count of primary shards | | 0 | 0 | 0.00% | 0 | |
Min cumulative merge time across primary shard | | 0 | 0 | 0.00% | 0 | min |
Median cumulative merge time across primary shard | | 0 | 0 | 0.00% | 0 | min |
Max cumulative merge time across primary shard | | 0 | 0 | 0.00% | 0 | min |
Cumulative merge throttle time of primary shards | | 0 | 0 | 0.00% | 0 | min |
Min cumulative merge throttle time across primary shard | | 0 | 0 | 0.00% | 0 | min |
Median cumulative merge throttle time across primary shard | | 0 | 0 | 0.00% | 0 | min |
Max cumulative merge throttle time across primary shard | | 0 | 0 | 0.00% | 0 | min |
Cumulative refresh time of primary shards | | 0 | 0 | 0.00% | 0 | min |
Cumulative refresh count of primary shards | | 2 | 2 | 0.00% | 0 | |
Min cumulative refresh time across primary shard | | 0 | 0 | 0.00% | 0 | min |
Median cumulative refresh time across primary shard | | 0 | 0 | 0.00% | 0 | min |
Max cumulative refresh time across primary shard | | 0 | 0 | 0.00% | 0 | min |
Cumulative flush time of primary shards | | 0 | 0 | 0.00% | 0 | min |
Cumulative flush count of primary shards | | 0 | 0 | 0.00% | 0 | |
Min cumulative flush time across primary shard | | 0 | 0 | 0.00% | 0 | min |
Median cumulative flush time across primary shard | | 0 | 0 | 0.00% | 0 | min |
Max cumulative flush time across primary shard | | 0 | 0 | 0.00% | 0 | min |
Total Young Gen GC time | | 0 | 0 | 0.00% | 0 | s |
Total Young Gen GC count | | 0 | 0 | 0.00% | 0 | |
Total Old Gen GC time | | 0 | 0 | 0.00% | 0 | s |
Total Old Gen GC count | | 0 | 0 | 0.00% | 0 | |
Store size | | 4.36969 | 4.36969 | 0.00% | 0 | GB |
Translog size | | 5.12227e-08 | 5.12227e-08 | 0.00% | 0 | GB |
Heap used for segments | | 0 | 0 | 0.00% | 0 | MB |
Heap used for doc values | | 0 | 0 | 0.00% | 0 | MB |
Heap used for terms | | 0 | 0 | 0.00% | 0 | MB |
Heap used for norms | | 0 | 0 | 0.00% | 0 | MB |
Heap used for points | | 0 | 0 | 0.00% | 0 | MB |
Heap used for stored fields | | 0 | 0 | 0.00% | 0 | MB |
Segment count | | 10 | 10 | 0.00% | 0 | |
Min Throughput | date_histogram_calendar_interval | 1.22775 | 1.50107 | +22.26% 🔴 | 0.27332 | ops/s |
Mean Throughput | date_histogram_calendar_interval | 1.23948 | 1.50174 | +21.16% 🔴 | 0.26226 | ops/s |
Median Throughput | date_histogram_calendar_interval | 1.24216 | 1.5016 | +20.89% 🔴 | 0.25944 | ops/s |
Max Throughput | date_histogram_calendar_interval | 1.24348 | 1.50306 | +20.88% 🔴 | 0.25958 | ops/s |
50th percentile latency | date_histogram_calendar_interval | 14112.5 | 146.298 | -98.96% 🟢 | -13966.2 | ms |
90th percentile latency | date_histogram_calendar_interval | 19572.2 | 149.045 | -99.24% 🟢 | -19423.1 | ms |
99th percentile latency | date_histogram_calendar_interval | 20780.3 | 173.572 | -99.16% 🟢 | -20606.7 | ms |
100th percentile latency | date_histogram_calendar_interval | 20906.5 | 260.943 | -98.75% 🟢 | -20645.6 | ms |
50th percentile service time | date_histogram_calendar_interval | 795.485 | 144.83 | -81.79% 🟢 | -650.656 | ms |
90th percentile service time | date_histogram_calendar_interval | 816.351 | 148.061 | -81.86% 🟢 | -668.29 | ms |
99th percentile service time | date_histogram_calendar_interval | 843.014 | 171.686 | -79.63% 🟢 | -671.328 | ms |
100th percentile service time | date_histogram_calendar_interval | 849.253 | 259.405 | -69.45% 🟢 | -589.849 | ms |
error rate | date_histogram_calendar_interval | 0 | 0 | 0.00% | 0 | % |
Min Throughput | date_histogram_calendar_interval_with_filter | 1.50912 | 1.50928 | 0.01% | 0.00016 | ops/s |
Mean Throughput | date_histogram_calendar_interval_with_filter | 1.51506 | 1.51533 | 0.02% | 0.00026 | ops/s |
Median Throughput | date_histogram_calendar_interval_with_filter | 1.5137 | 1.51395 | 0.02% | 0.00025 | ops/s |
Max Throughput | date_histogram_calendar_interval_with_filter | 1.52712 | 1.52761 | 0.03% | 0.00049 | ops/s |
50th percentile latency | date_histogram_calendar_interval_with_filter | 19.6856 | 9.4227 | -52.13% 🟢 | -10.2629 | ms |
90th percentile latency | date_histogram_calendar_interval_with_filter | 20.7572 | 10.3136 | -50.31% 🟢 | -10.4436 | ms |
99th percentile latency | date_histogram_calendar_interval_with_filter | 25.5856 | 12.8673 | -49.71% 🟢 | -12.7183 | ms |
100th percentile latency | date_histogram_calendar_interval_with_filter | 26.2323 | 13.2452 | -49.51% 🟢 | -12.9871 | ms |
50th percentile service time | date_histogram_calendar_interval_with_filter | 18.1354 | 7.93881 | -56.22% 🟢 | -10.1965 | ms |
90th percentile service time | date_histogram_calendar_interval_with_filter | 19.2861 | 8.82866 | -54.22% 🟢 | -10.4574 | ms |
99th percentile service time | date_histogram_calendar_interval_with_filter | 23.9196 | 11.6412 | -51.33% 🟢 | -12.2784 | ms |
100th percentile service time | date_histogram_calendar_interval_with_filter | 24.4274 | 11.6436 | -52.33% 🟢 | -12.7838 | ms |
error rate | date_histogram_calendar_interval_with_filter | 0 | 0 | 0.00% | 0 | % |
[INFO] SUCCESS (took 0 seconds)
Okay, as per the latest benchmark numbers in #19130 (comment), this change looks really promising. @rishabh6788 - the benchmark should reflect an increase in throughput as green instead of red. I was slightly
❌ Gradle check result for 0acbf1d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Unrelated test failures:
Retrying gradle check
❌ Gradle check result for 0acbf1d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for 0acbf1d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
…ject#19130) --------- Signed-off-by: Ankit Jain <[email protected]> Signed-off-by: Asim Mahmood <[email protected]> Signed-off-by: Ankit Jain <[email protected]> Co-authored-by: Asim Mahmood <[email protected]>
Description
This PR adds logic to efficiently collect the matching documents for date histogram bucket aggregation using a skip list. It started as a proof of concept for how a skip list might help efficiently collect matching documents for bucket aggregation use cases.
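To illustrate the idea, here is a simplified sketch, not the PR's actual implementation, assuming the aggregation is backed by the doc-values skip list exposed through Lucene 10's DocValuesSkipper: when a skip-list block is dense and its min/max values round to the same bucket, the whole block can be counted at once instead of reading doc values per document. bucketKey() stands in for the aggregator's prepared rounding, and the sketch covers the unfiltered (match-all) case with single-valued timestamps.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.index.DocValuesSkipper;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.SortedNumericDocValues;
import org.apache.lucene.search.DocIdSetIterator;

final class SkiplistHistogramSketch {

    /** Example rounding only: hourly buckets over epoch-millisecond timestamps. */
    private static long bucketKey(long value) {
        return value / 3_600_000L;
    }

    /** Counts bucket -> doc count for one segment, using the skip list where possible. */
    static Map<Long, Long> collect(LeafReader reader, String field) throws IOException {
        Map<Long, Long> counts = new HashMap<>();
        DocValuesSkipper skipper = reader.getDocValuesSkipper(field);
        SortedNumericDocValues values = reader.getSortedNumericDocValues(field);
        if (values == null) {
            return counts;
        }
        int doc = 0;
        while (doc < reader.maxDoc()) {
            if (skipper != null) {
                if (doc > skipper.maxDocID(0)) {
                    skipper.advance(doc);
                }
                int blockMin = skipper.minDocID(0);
                int blockMax = skipper.maxDocID(0);
                if (blockMax != DocIdSetIterator.NO_MORE_DOCS
                    && skipper.docCount(0) == blockMax - blockMin + 1
                    && bucketKey(skipper.minValue(0)) == bucketKey(skipper.maxValue(0))) {
                    // Every doc in this level-0 block has a value and all values fall in the
                    // same bucket: count the block wholesale and jump past it without reading
                    // per-document doc values.
                    counts.merge(bucketKey(skipper.minValue(0)), (long) skipper.docCount(0), Long::sum);
                    doc = blockMax + 1;
                    continue;
                }
            }
            // Fallback: read this document's values individually.
            if (values.advanceExact(doc)) {
                for (int i = 0; i < values.docValueCount(); i++) {
                    counts.merge(bucketKey(values.nextValue()), 1L, Long::sum);
                }
            }
            doc++;
        }
        return counts;
    }
}
```

When the index is sorted on the same timestamp field (as discussed in the review comments above), blocks of consecutive doc IDs cover narrow value ranges, so the bulk-count branch applies far more often; that is the scenario where the reviewers expect the biggest win.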
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
[ ] API changes companion pull request created, if applicable.
[ ] Public documentation issue/PR created, if applicable.
co-author: @asimmahmood1
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.