Adding logic for histogram aggregation using skiplist #19130

jainankitk · 2025-08-22T23:12:52Z

Description

~~This PR adds logic for histogram collection using skiplist. PR not to be reviewed, just poc for how skiplist might help efficiently collect the matching documents for bucket aggregation use cases~~
This PR adds logic to efficiently collect the matching documents for date histogram bucket aggregation using skiplist.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

Functionality includes testing.
~~[ ] API changes companion pull request created, if applicable.~~
~~[ ] Public documentation issue/PR created, if applicable.~~

co-author: @asimmahmood1

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2025-08-22T23:18:24Z

❌ Gradle check result for 2747c0d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

asimmahmood1 · 2025-08-27T21:51:33Z

Thanks for the draft.

I tested the changes using 20% of nyc_taxis corpus, so ~4GB.

Query

curl -XGET "http://localhost:9200/nyc_taxis/_search" -H "Content-Type: application/json" -d '{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "dropoffs_over_time": {
      "date_histogram": {
        "field": "dropoff_datetime",
        "calendar_interval": "month"
      }
    }
  }
}'

The values aren't exact: the baseline averaged about 600 and the skiplist run about 550, so not a huge difference.

The flame graph shows that the skiplist collector spent more of its time adding documents to buckets, so your idea of collecting locally until the next bucket would really help.
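To make the "collect locally until the next bucket" idea concrete, here is a minimal sketch. It is not the code from this PR: the function names are hypothetical, and a fixed numeric interval stands in for calendar-month rounding. It only illustrates that consecutive documents falling in the same bucket can be counted in a local variable and flushed with a single bucket update.

```python
from collections import Counter


def bucket_of(value, interval):
    # Hypothetical stand-in for date rounding: round down to a fixed interval.
    return value - (value % interval)


def collect_per_doc(values, interval):
    """Baseline: one bucket update per document."""
    counts = Counter()
    updates = 0
    for v in values:
        counts[bucket_of(v, interval)] += 1
        updates += 1
    return dict(counts), updates


def collect_with_local_run(values, interval):
    """Count a run of same-bucket documents locally, flush once per run."""
    counts = Counter()
    updates = 0
    current_key = None
    pending = 0
    for v in values:
        key = bucket_of(v, interval)
        if key != current_key:
            if pending:
                counts[current_key] += pending  # single update for the whole run
                updates += 1
            current_key, pending = key, 0
        pending += 1
    if pending:
        counts[current_key] += pending
        updates += 1
    return dict(counts), updates


values = [1, 2, 3, 11, 12, 25, 26, 27, 4]
print(collect_per_doc(values, 10))
print(collect_with_local_run(values, 10))
```

Both functions produce the same counts; the second one touches the bucket structure once per run rather than once per document, so the saving depends on how long the runs are.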

Baseline

baseline_dateagg_nync.html

Skiplist

skiplist_dateagg_nync.html

...c/main/java/org/opensearch/search/aggregations/bucket/histogram/DateHistogramAggregator.java

jainankitk · 2025-09-03T20:50:08Z

@asimmahmood1 - As discussed offline, I realized that we get the maximum benefit from skip_list when the index itself is sorted on the field the skip_list is built for, because that aligns docId order with the docValues order for that field. Let us discuss further once we have updated numbers on data indexed with the sort field specified.
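A simplified model of why index sort matters here, under stated assumptions: suppose the skip structure records the minimum and maximum value for each block of consecutive docIds. If a block's minimum and maximum round to the same bucket, the whole block can be counted without reading individual values. The block layout and function names below are hypothetical and much simpler than a real multi-level skip list.

```python
def build_blocks(values, block_size):
    """Summarize consecutive docs (in docId order) as (min, max, start, count)."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append((min(chunk), max(chunk), start, len(chunk)))
    return blocks


def histogram_with_skipping(values, interval, block_size):
    """Return (bucket counts, number of individual values that had to be read)."""
    counts = {}
    values_read = 0
    for lo, hi, start, n in build_blocks(values, block_size):
        lo_key = lo - lo % interval
        hi_key = hi - hi % interval
        if lo_key == hi_key:
            # Whole block lies in one bucket: count it in bulk.
            counts[lo_key] = counts.get(lo_key, 0) + n
        else:
            # Block spans buckets: fall back to reading each value.
            for v in values[start:start + n]:
                key = v - v % interval
                counts[key] = counts.get(key, 0) + 1
                values_read += 1
    return counts, values_read


sorted_values = list(range(100))                         # docId order matches value order
shuffled_values = [(i * 37) % 100 for i in range(100)]   # same values, scattered

print(histogram_with_skipping(sorted_values, 25, 5))
print(histogram_with_skipping(shuffled_values, 25, 5))
```

With the sorted input every block falls inside a single bucket, so no individual value is read. With the scattered input every block spans buckets and all 100 values are read, which matches the intuition that the skip list helps most when docId order follows the field's value order.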

asimmahmood1 · 2025-09-04T18:25:55Z

I tested this change with index sort enabled on dropoff_datetime (nyc_taxis does not have an @timestamp field). There is a major speedup compared to the doc value aggregation.

"sort": [
  {
    "field": "dropoff_datetime",
    "mode": "min",
    "missing": "9223372036854775807",
    "reverse": false
  }
]

Query:

curl -XGET "http://localhost:9200/nyc_taxis/_search" -H "Content-Type: application/json" -d '{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "dropoffs_over_time": {
      "date_histogram": {
        "field": "dropoff_datetime",
        "calendar_interval": "month"
      }
    }
  }
}'

Results

| baseline (bkd) | histo (skiplist) | histo (no skiplist) |
| --- | --- | --- |
| 5 | 11 | 630 |

Let me capture flamegraphs.

github-actions · 2025-09-23T18:13:32Z

❌ Gradle check result for b39ac57: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Asim Mahmood <[email protected]>

asimmahmood1 · 2025-09-23T19:18:06Z

Added nyc_taxis operation:

    {
      "name": "date_histogram_calendar_interval_with_filter",
      "operation-type": "search",
      "body": {
        "size": 0,
        "query": {
          "bool": {
            "filter": {
              "term": {
                "trip_type": 2
              }
            }
          }
        },
        "aggs": {
          "dropoffs_over_time": {
            "date_histogram": {
              "field": "dropoff_datetime",
              "calendar_interval": "month"
            }
          }
        }
      }
    },

Baseline vs Candidate:

| Metric | Task | Baseline | Contender | %Diff | Diff | Unit |
| --- | --- | --- | --- | --- | --- | --- |
| Min Throughput | date_histogram_calendar_interval_with_filter | 1.48533 | 1.50975 | 1.64% | 0.02441 | ops/s |
| Mean Throughput | date_histogram_calendar_interval_with_filter | 1.49176 | 1.51611 | 1.63% | 0.02435 | ops/s |
| Median Throughput | date_histogram_calendar_interval_with_filter | 1.49247 | 1.51467 | 1.49% | 0.0222 | ops/s |
| Max Throughput | date_histogram_calendar_interval_with_filter | 1.49498 | 1.52903 | 2.28% | 0.03405 | ops/s |
| 50th percentile latency | date_histogram_calendar_interval_with_filter | 33.6987 | 9.22881 | -72.61% 🟢 | -24.4699 | ms |
| 90th percentile latency | date_histogram_calendar_interval_with_filter | 38.0945 | 10.5454 | -72.32% 🟢 | -27.5491 | ms |
| 99th percentile latency | date_histogram_calendar_interval_with_filter | 38.9266 | 12.8582 | -66.97% 🟢 | -26.0685 | ms |
| 100th percentile latency | date_histogram_calendar_interval_with_filter | 39.0018 | 13.3591 | -65.75% 🟢 | -25.6427 | ms |
| 50th percentile service time | date_histogram_calendar_interval_with_filter | 32.2213 | 7.70275 | -76.09% 🟢 | -24.5186 | ms |
| 90th percentile service time | date_histogram_calendar_interval_with_filter | 36.5675 | 8.91834 | -75.61% 🟢 | -27.6492 | ms |
| 99th percentile service time | date_histogram_calendar_interval_with_filter | 37.1727 | 10.9156 | -70.64% 🟢 | -26.2571 | ms |
| 100th percentile service time | date_histogram_calendar_interval_with_filter | 37.7295 | 11.7682 | -68.81% 🟢 | -25.9612 | ms |
| error rate | date_histogram_calendar_interval_with_filter | 0 | 0 | 0.00% | 0 | % |

asimmahmood1 · 2025-09-23T19:29:01Z

skiplist_sorted_index_nyc_skiplist_final.html

asimmahmood1 · 2025-09-23T19:36:00Z

opensearch-project/opensearch-benchmark-workloads#697

github-actions · 2025-09-23T20:05:14Z

✅ Gradle check result for bacbeb4: SUCCESS

jainankitk · 2025-09-23T20:31:39Z

> opensearch-project/opensearch-benchmark-workloads#697

I see some significant regression as part of this benchmark run for non-filtered queries. Is the regression due to some setting causing multi range traversal to get skipped?

asimmahmood1 · 2025-09-23T20:47:45Z

> I see some significant regression as part of this benchmark run for non-filtered queries. Is the regression due to some setting causing multi range traversal to get skipped?

My baseline is on 3.2, so this is not a direct comparison; I'll need to create a new baseline on 3.3.

In the meantime, let me check if I can run nyc_taxis manually on another setup.

Signed-off-by: Ankit Jain <[email protected]>

github-actions · 2025-09-24T19:58:54Z

❌ Gradle check result for 0acbf1d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

jainankitk

Code changes LGTM! We can merge this PR once we ascertain that the regression seen in opensearch-project/opensearch-benchmark-workloads#697 is not due to this PR. While most of the changes have been done by @asimmahmood1, I initiated the POC PR and hence am unable to approve it myself.

asimmahmood1 · 2025-09-24T23:02:07Z

So this is comparing only the SkiplistAgg, on the same index:

[ec2-user@ip-172-31-61-197 ~]$ opensearch-benchmark compare -b bf0a911e-35c7-4fb1-8588-c6456ed8fa73 -c 5e89aafd-d01a-41bb-ab6b-f3e0e2144ea5

Comparing baseline
TestExecution ID: bf0a911e-35c7-4fb1-8588-c6456ed8fa73
TestExecution timestamp: 2025-09-24 22:47:29
TestProcedure: append-no-conflicts
ProvisionConfigInstance: external
User tags: asimmahm=3.3-nyc-dateskiplist-baseline

with contender
TestExecution ID: 5e89aafd-d01a-41bb-ab6b-f3e0e2144ea5
TestExecution timestamp: 2025-09-24 22:40:35
TestProcedure: append-no-conflicts
ProvisionConfigInstance: external
User tags: asimmahm=3.3-nyc-dateskiplist-candidate

| Metric | Task | Baseline | Contender | %Diff | Diff | Unit |
| --- | --- | --- | --- | --- | --- | --- |
| Cumulative indexing time of primary shards | | 0 | 0 | 0.00% | 0 | min |
| Min cumulative indexing time across primary shard | | 0 | 0 | 0.00% | 0 | min |
| Median cumulative indexing time across primary shard | | 0 | 0 | 0.00% | 0 | min |
| Max cumulative indexing time across primary shard | | 0 | 0 | 0.00% | 0 | min |
| Cumulative indexing throttle time of primary shards | | 0 | 0 | 0.00% | 0 | min |
| Min cumulative indexing throttle time across primary shard | | 0 | 0 | 0.00% | 0 | min |
| Median cumulative indexing throttle time across primary shard | | 0 | 0 | 0.00% | 0 | min |
| Max cumulative indexing throttle time across primary shard | | 0 | 0 | 0.00% | 0 | min |
| Cumulative merge time of primary shards | | 0 | 0 | 0.00% | 0 | min |
| Cumulative merge count of primary shards | | 0 | 0 | 0.00% | 0 | |
| Min cumulative merge time across primary shard | | 0 | 0 | 0.00% | 0 | min |
| Median cumulative merge time across primary shard | | 0 | 0 | 0.00% | 0 | min |
| Max cumulative merge time across primary shard | | 0 | 0 | 0.00% | 0 | min |
| Cumulative merge throttle time of primary shards | | 0 | 0 | 0.00% | 0 | min |
| Min cumulative merge throttle time across primary shard | | 0 | 0 | 0.00% | 0 | min |
| Median cumulative merge throttle time across primary shard | | 0 | 0 | 0.00% | 0 | min |
| Max cumulative merge throttle time across primary shard | | 0 | 0 | 0.00% | 0 | min |
| Cumulative refresh time of primary shards | | 0 | 0 | 0.00% | 0 | min |
| Cumulative refresh count of primary shards | | 2 | 2 | 0.00% | 0 | |
| Min cumulative refresh time across primary shard | | 0 | 0 | 0.00% | 0 | min |
| Median cumulative refresh time across primary shard | | 0 | 0 | 0.00% | 0 | min |
| Max cumulative refresh time across primary shard | | 0 | 0 | 0.00% | 0 | min |
| Cumulative flush time of primary shards | | 0 | 0 | 0.00% | 0 | min |
| Cumulative flush count of primary shards | | 0 | 0 | 0.00% | 0 | |
| Min cumulative flush time across primary shard | | 0 | 0 | 0.00% | 0 | min |
| Median cumulative flush time across primary shard | | 0 | 0 | 0.00% | 0 | min |
| Max cumulative flush time across primary shard | | 0 | 0 | 0.00% | 0 | min |
| Total Young Gen GC time | | 0 | 0 | 0.00% | 0 | s |
| Total Young Gen GC count | | 0 | 0 | 0.00% | 0 | |
| Total Old Gen GC time | | 0 | 0 | 0.00% | 0 | s |
| Total Old Gen GC count | | 0 | 0 | 0.00% | 0 | |
| Store size | | 4.36969 | 4.36969 | 0.00% | 0 | GB |
| Translog size | | 5.12227e-08 | 5.12227e-08 | 0.00% | 0 | GB |
| Heap used for segments | | 0 | 0 | 0.00% | 0 | MB |
| Heap used for doc values | | 0 | 0 | 0.00% | 0 | MB |
| Heap used for terms | | 0 | 0 | 0.00% | 0 | MB |
| Heap used for norms | | 0 | 0 | 0.00% | 0 | MB |
| Heap used for points | | 0 | 0 | 0.00% | 0 | MB |
| Heap used for stored fields | | 0 | 0 | 0.00% | 0 | MB |
| Segment count | | 10 | 10 | 0.00% | 0 | |
| Min Throughput | date_histogram_calendar_interval | 1.22775 | 1.50107 | +22.26% 🔴 | 0.27332 | ops/s |
| Mean Throughput | date_histogram_calendar_interval | 1.23948 | 1.50174 | +21.16% 🔴 | 0.26226 | ops/s |
| Median Throughput | date_histogram_calendar_interval | 1.24216 | 1.5016 | +20.89% 🔴 | 0.25944 | ops/s |
| Max Throughput | date_histogram_calendar_interval | 1.24348 | 1.50306 | +20.88% 🔴 | 0.25958 | ops/s |
| 50th percentile latency | date_histogram_calendar_interval | 14112.5 | 146.298 | -98.96% 🟢 | -13966.2 | ms |
| 90th percentile latency | date_histogram_calendar_interval | 19572.2 | 149.045 | -99.24% 🟢 | -19423.1 | ms |
| 99th percentile latency | date_histogram_calendar_interval | 20780.3 | 173.572 | -99.16% 🟢 | -20606.7 | ms |
| 100th percentile latency | date_histogram_calendar_interval | 20906.5 | 260.943 | -98.75% 🟢 | -20645.6 | ms |
| 50th percentile service time | date_histogram_calendar_interval | 795.485 | 144.83 | -81.79% 🟢 | -650.656 | ms |
| 90th percentile service time | date_histogram_calendar_interval | 816.351 | 148.061 | -81.86% 🟢 | -668.29 | ms |
| 99th percentile service time | date_histogram_calendar_interval | 843.014 | 171.686 | -79.63% 🟢 | -671.328 | ms |
| 100th percentile service time | date_histogram_calendar_interval | 849.253 | 259.405 | -69.45% 🟢 | -589.849 | ms |
| error rate | date_histogram_calendar_interval | 0 | 0 | 0.00% | 0 | % |
| Min Throughput | date_histogram_calendar_interval_with_filter | 1.50912 | 1.50928 | 0.01% | 0.00016 | ops/s |
| Mean Throughput | date_histogram_calendar_interval_with_filter | 1.51506 | 1.51533 | 0.02% | 0.00026 | ops/s |
| Median Throughput | date_histogram_calendar_interval_with_filter | 1.5137 | 1.51395 | 0.02% | 0.00025 | ops/s |
| Max Throughput | date_histogram_calendar_interval_with_filter | 1.52712 | 1.52761 | 0.03% | 0.00049 | ops/s |
| 50th percentile latency | date_histogram_calendar_interval_with_filter | 19.6856 | 9.4227 | -52.13% 🟢 | -10.2629 | ms |
| 90th percentile latency | date_histogram_calendar_interval_with_filter | 20.7572 | 10.3136 | -50.31% 🟢 | -10.4436 | ms |
| 99th percentile latency | date_histogram_calendar_interval_with_filter | 25.5856 | 12.8673 | -49.71% 🟢 | -12.7183 | ms |
| 100th percentile latency | date_histogram_calendar_interval_with_filter | 26.2323 | 13.2452 | -49.51% 🟢 | -12.9871 | ms |
| 50th percentile service time | date_histogram_calendar_interval_with_filter | 18.1354 | 7.93881 | -56.22% 🟢 | -10.1965 | ms |
| 90th percentile service time | date_histogram_calendar_interval_with_filter | 19.2861 | 8.82866 | -54.22% 🟢 | -10.4574 | ms |
| 99th percentile service time | date_histogram_calendar_interval_with_filter | 23.9196 | 11.6412 | -51.33% 🟢 | -12.2784 | ms |
| 100th percentile service time | date_histogram_calendar_interval_with_filter | 24.4274 | 11.6436 | -52.33% 🟢 | -12.7838 | ms |
| error rate | date_histogram_calendar_interval_with_filter | 0 | 0 | 0.00% | 0 | % |

[INFO] SUCCESS (took 0 seconds)

jainankitk · 2025-09-24T23:11:19Z

Okay, as per the latest benchmark numbers (#19130 (comment)), this change looks really promising.

@rishabh6788 - The benchmark should show an increase in throughput as green instead of red. I was slightly confused by that initially.
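For reference, the %Diff column can be reproduced from the Baseline and Contender columns, and whether a change is an improvement depends on the metric's direction. The sketch below is not the benchmark tool's code; the function names and the `higher_is_better` flag are hypothetical, used only to show the direction-aware marking suggested above.

```python
def percent_diff(baseline, contender):
    """Relative change of contender versus baseline, in percent."""
    return (contender - baseline) / baseline * 100.0


def verdict(baseline, contender, higher_is_better):
    """Mark a change green if it moves in the metric's good direction."""
    diff = percent_diff(baseline, contender)
    if diff == 0:
        return "neutral"
    improved = diff > 0 if higher_is_better else diff < 0
    return "green" if improved else "red"


# Min Throughput (ops/s), date_histogram_calendar_interval: higher is better.
print(round(percent_diff(1.22775, 1.50107), 2), verdict(1.22775, 1.50107, True))

# 50th percentile latency (ms), same task: lower is better.
print(round(percent_diff(14112.5, 146.298), 2), verdict(14112.5, 146.298, False))
```

Under this rule both rows come out green: throughput went up by about 22.26% and median latency went down by about 98.96%.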

github-actions · 2025-09-24T23:13:01Z

❌ Gradle check result for 0acbf1d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

jainankitk · 2025-09-24T23:31:13Z

Unrelated test failures:

[Test Result](https://build.ci.opensearch.org/job/gradle-check/64286/testReport/) (2 failures / -3)

    [org.opensearch.plugin.kafka.KafkaSingleNodeTests.testShardInitializationUsingUnknownTopic](https://build.ci.opensearch.org/job/gradle-check/64286/testReport/junit/org.opensearch.plugin.kafka/KafkaSingleNodeTests/testShardInitializationUsingUnknownTopic/)
    [org.opensearch.plugin.kafka.KafkaSingleNodeTests.testPauseAndResumeAPIs](https://build.ci.opensearch.org/job/gradle-check/64286/testReport/junit/org.opensearch.plugin.kafka/KafkaSingleNodeTests/testPauseAndResumeAPIs/)

Retrying gradle check

github-actions · 2025-09-24T23:42:07Z

❌ Gradle check result for 0acbf1d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2025-09-25T00:45:25Z

❌ Gradle check result for 0acbf1d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2025-09-25T07:04:54Z

✅ Gradle check result for 0acbf1d: SUCCESS

…ject#19130) --------- Signed-off-by: Ankit Jain <[email protected]> Signed-off-by: Asim Mahmood <[email protected]> Signed-off-by: Ankit Jain <[email protected]> Co-authored-by: Asim Mahmood <[email protected]>

jainankitk requested review from a team, Bukhtawar, CEHENKLE, Rishikesh1159, VachaShah, anasalkouz, andrross, ashking94, cwperks, dbwiddis, gbbafna, jed326, kotwanikunal, mch2, msfroh, owaiskazi19, reta, sachinpkale, saratvemulapalli, shwetathareja and sohami as code owners August 22, 2025 23:12

jainankitk marked this pull request as draft August 22, 2025 23:13

asimmahmood1 reviewed Aug 27, 2025

...c/main/java/org/opensearch/search/aggregations/bucket/histogram/DateHistogramAggregator.java (outdated, resolved)

bowenlan-amzn reviewed Sep 2, 2025

...c/main/java/org/opensearch/search/aggregations/bucket/histogram/DateHistogramAggregator.java (outdated, resolved)

asimmahmood1 added this to Performance Roadmap Sep 4, 2025

github-project-automation bot moved this to Todo in Performance Roadmap Sep 4, 2025

github-project-automation bot moved this from Done to In Progress in Performance Roadmap Sep 23, 2025

Fix style

bacbeb4

Signed-off-by: Asim Mahmood <[email protected]>

jainankitk mentioned this pull request Sep 23, 2025

[META] Skip List Based Optimization for Aggregations in OpenSearch #19384

Open

Merge branch 'main' into skip-agg

0acbf1d

Signed-off-by: Ankit Jain <[email protected]>

jainankitk commented Sep 24, 2025

opensearch-ci-bot mentioned this pull request Sep 24, 2025

[AUTOCUT] Gradle Check Flaky Test Report for IndexServiceTests #14407

Open

rishabhmaurya approved these changes Sep 24, 2025

jainankitk merged commit 1c171b7 into opensearch-project:main Sep 25, 2025
36 of 45 checks passed

github-project-automation bot moved this from In Progress to Done in Performance Roadmap Sep 25, 2025

jainankitk deleted the skip-agg branch September 25, 2025 16:29

opensearch-ci-bot mentioned this pull request Sep 24, 2025

[AUTOCUT] Gradle Check Flaky Test Report for RemoteStoreReplicationSourceTests #16683

Open

asimmahmood1 mentioned this pull request Sep 26, 2025

Add sub aggregation support for histogram aggregation using skiplist #19438

Merged

1 task

asimmahmood1 mentioned this pull request Sep 30, 2025

Enable skip_list for @timestamp field or index sort field by default #19480

Merged

2 tasks
