Add Hybrid Cardinality collector to prioritize Ordinals Collector #19524

anandpatel9998 · 2025-10-04T01:53:59Z

Description

The current cardinality aggregator logic selects DirectCollector over OrdinalsCollector when the memory overhead of OrdinalsCollector relative to DirectCollector is too high. Because of this relative-memory logic, DirectCollector is selected for high-cardinality aggregation queries, even though it is slower than OrdinalsCollector. This default selection leads to higher search latency even when the OpenSearch process has enough available memory to use the ordinals collector for faster query performance.
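
For illustration, the selection logic described above can be modeled roughly as follows. This is a simplified sketch, not the code in CardinalityAggregator: the class name, the two byte estimates, and the `maxRelativeOverhead` parameter are assumptions made for this example.

```java
/** Toy model of choosing a collector by relative memory overhead (illustrative only). */
final class CollectorChoiceSketch {
    enum Choice { ORDINALS, DIRECT }

    /** Assumed per-bucket cost of the ordinals path: one bit per ordinal. */
    static long ordinalsBytes(long maxOrd) {
        return (maxOrd + 7) / 8;
    }

    /** Assumed per-bucket cost of the direct path: 2^precision one-byte registers. */
    static long directBytes(int precision) {
        return 1L << precision;
    }

    /**
     * Picks ORDINALS only when its estimated cost stays below a fraction
     * (maxRelativeOverhead, a hypothetical knob) of the direct path's cost.
     */
    static Choice pick(long maxOrd, int precision, double maxRelativeOverhead) {
        boolean ordinalsCheapEnough = ordinalsBytes(maxOrd) < directBytes(precision) * maxRelativeOverhead;
        return ordinalsCheapEnough ? Choice.ORDINALS : Choice.DIRECT;
    }
}
```

Under this model, a field with a very large number of ordinals always falls through to the direct path regardless of how much memory is actually free, which is the behavior this PR targets.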

There is no way to determine the memory requirement for a nested aggregation up front, because buckets are created dynamically as we traverse the matching document IDs. To overcome this limitation, this change creates a hybrid collector that starts with OrdinalsCollector and switches to DirectCollector if OrdinalsCollector's memory usage grows beyond a certain threshold. When the hybrid collector switches from OrdinalsCollector to DirectCollector, it reuses the aggregation data already computed by OrdinalsCollector, so the result does not have to be rebuilt using DirectCollector.
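
The hybrid idea can be sketched in isolation as below. This is a toy model under stated assumptions, not the code in this PR: it uses `java.util.BitSet` in place of the per-bucket bit arrays, a plain `HashSet` of terms as a stand-in for the direct collector's sketch structure, and a hypothetical `memoryThresholdBytes` budget that is checked only when a new bucket needs a bitset.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy hybrid collector: ordinals first, direct after the budget is exceeded. */
final class HybridCardinalitySketch {
    private final String[] ordToValue;        // ordinal -> term (stand-in for ordinals lookup)
    private final long memoryThresholdBytes;  // hypothetical budget for the ordinals phase
    private final Map<Integer, BitSet> visitedOrds = new HashMap<>();
    private final Map<Integer, Set<String>> directSets = new HashMap<>();
    private long currentMemoryUsage = 0;
    private boolean usingOrdinals = true;

    HybridCardinalitySketch(String[] ordToValue, long memoryThresholdBytes) {
        this.ordToValue = ordToValue;
        this.memoryThresholdBytes = memoryThresholdBytes;
    }

    void collect(int bucket, int ord) {
        if (usingOrdinals) {
            BitSet bits = visitedOrds.get(bucket);
            if (bits == null) {
                // Memory grows only when a new bucket appears, so check the budget here.
                long overhead = (ordToValue.length + 7) / 8;
                if (currentMemoryUsage + overhead > memoryThresholdBytes) {
                    switchToDirect();
                } else {
                    bits = new BitSet(ordToValue.length);
                    visitedOrds.put(bucket, bits);
                    currentMemoryUsage += overhead;
                }
            }
            if (usingOrdinals) {
                bits.set(ord);
                return;
            }
        }
        directSets.computeIfAbsent(bucket, b -> new HashSet<>()).add(ordToValue[ord]);
    }

    /** Carries the ordinals already seen into the direct structures, so nothing is recollected. */
    private void switchToDirect() {
        for (Map.Entry<Integer, BitSet> entry : visitedOrds.entrySet()) {
            Set<String> values = directSets.computeIfAbsent(entry.getKey(), b -> new HashSet<>());
            BitSet bits = entry.getValue();
            for (int ord = bits.nextSetBit(0); ord >= 0; ord = bits.nextSetBit(ord + 1)) {
                values.add(ordToValue[ord]);
            }
        }
        visitedOrds.clear();
        currentMemoryUsage = 0;
        usingOrdinals = false;
    }

    long cardinality(int bucket) {
        if (usingOrdinals) {
            BitSet bits = visitedOrds.get(bucket);
            return bits == null ? 0 : bits.cardinality();
        }
        Set<String> values = directSets.get(bucket);
        return values == null ? 0 : values.size();
    }

    boolean isUsingOrdinals() {
        return usingOrdinals;
    }
}
```

The point of the sketch is the hand-off in `switchToDirect`: per-bucket state from the ordinals phase is converted once, and collection continues from there instead of starting over.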

Signed-off-by: Anand Pravinbhai Patel [email protected]

Related Issues

Resolves #19260

Check List

[ Done ] Functionality includes testing.
[ Not Applicable ] API changes companion pull request created, if applicable.
[ Is it required ? ] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2025-10-04T03:56:13Z

❌ Gradle check result for a2f5dd7: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2025-10-04T23:20:52Z

❌ Gradle check result for 41a9e69: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2025-10-05T02:44:38Z

❌ Gradle check result for c142ac4: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2025-10-06T00:47:00Z

❌ Gradle check result for 88989f3: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2025-10-06T17:19:51Z

❌ Gradle check result for c142ac4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2025-10-06T19:24:51Z

❌ Gradle check result for 06ce5c3: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2025-10-07T02:57:16Z

❌ Gradle check result for fc328a2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

anandpatel9998 · 2025-10-07T21:21:44Z

Thanks for the suggestion @owaiskazi19

I am wondering whether that will help: if one process is running without the latest commit's changes, the test may still fail. Can you help me understand how the mixed-cluster tests execute?

owaiskazi19 · 2025-10-07T21:25:34Z

Mixed-cluster tests exercise mixed-version clusters to ensure that newer versions interoperate correctly with older nodes. The :qa:mixed-cluster task spins up a test cluster composed of different versions (old and new nodes). The tests then validate behavior across upgrades or during rolling restarts.
There is also a blog post on the BWC framework: https://opensearch.org/blog/bwc-testing-for-opensearch/
You can also try conditional matching:

- is_one_of: 
    profile.shards.0.aggregations.0.debug.ordinals_collectors_used: [0, 1]

github-actions · 2025-10-07T23:09:46Z

❕ Gradle check result for 4ee0fd1: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

codecov · 2025-10-07T23:10:09Z

Codecov Report

❌ Patch coverage is 87.27273% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.06%. Comparing base (39b7a59) to head (a34c044).
⚠️ Report is 11 commits behind head on main.

Files with missing lines	Patch %	Lines
...ch/aggregations/metrics/CardinalityAggregator.java	92.68%	1 Missing and 2 partials ⚠️
...va/org/opensearch/search/DefaultSearchContext.java	83.33%	2 Missing ⚠️
.../org/opensearch/search/internal/SearchContext.java	0.00%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main   #19524      +/-   ##
============================================
+ Coverage     73.00%   73.06%   +0.05%     
+ Complexity    70534    70522      -12     
============================================
  Files          5719     5719              
  Lines        323260   323310      +50     
  Branches      46816    46818       +2     
============================================
+ Hits         235993   236217     +224     
+ Misses        68224    67995     -229     
- Partials      19043    19098      +55

anandpatel9998 · 2025-10-07T23:32:12Z

Thanks @owaiskazi19 for your suggestions. Adding a skip filter fixed the mixed-cluster tests.

github-actions · 2025-10-08T02:07:36Z

❕ Gradle check result for 522a92b: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions · 2025-10-08T16:20:19Z

❌ Gradle check result for 6375b70: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2025-10-08T20:24:53Z

❌ Gradle check result for b666de2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2025-10-08T22:01:45Z

❌ Gradle check result for e9e7fe0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2025-10-09T00:20:34Z

❌ Gradle check result for 5848513: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2025-10-09T05:56:56Z

✅ Gradle check result for a34c044: SUCCESS

github-actions · 2025-10-10T21:53:31Z

❌ Gradle check result for 871ff0a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

rishabhmaurya · 2025-10-11T00:45:07Z

server/src/main/java/org/opensearch/search/aggregations/metrics/CardinalityAggregator.java

                bits = new BitArray(maxOrd, bigArrays);
                visitedOrds.set(bucketOrd, bits);
+                // Update memory usage when new BitArray is created
+                currentMemoryUsage += memoryOverhead(maxOrd);


Can we maintain a flag here that is set when the memory limit is breached, and use it in the hybrid collector's check?
Maybe we could also switch the active collector here. That may avoid an additional check on each collect call.

Another option would be to throw a special exception here, catch it in the hybrid collector, and switch the collector to direct.
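
To make the exception variant concrete, here is a minimal standalone sketch. All names are hypothetical and nothing here is taken from the PR's code: the fast path checks its budget only at the point where its state would grow, and the wrapper swaps the active delegate once instead of testing a threshold on every collect call.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy wrapper that switches its delegate when the fast path signals a breach. */
final class SwitchOnExceptionSketch {
    /** Hypothetical control-flow signal; created without a stack trace to keep it cheap. */
    static final class MemoryLimitBreached extends RuntimeException {
        MemoryLimitBreached() {
            super(null, null, false, false);
        }
    }

    interface Collector {
        void collect(int value);
    }

    private final Set<Integer> seen = new HashSet<>(); // state shared by both paths
    private final int maxEntriesForFastPath;           // stand-in for a memory limit
    private Collector active;
    private boolean switched = false;

    SwitchOnExceptionSketch(int maxEntriesForFastPath) {
        this.maxEntriesForFastPath = maxEntriesForFastPath;
        this.active = this::fastCollect;
    }

    /** Fast path: refuses to grow past its budget and signals the wrapper instead. */
    private void fastCollect(int value) {
        if (!seen.contains(value) && seen.size() >= maxEntriesForFastPath) {
            throw new MemoryLimitBreached();
        }
        seen.add(value);
    }

    /** Fallback path: no budget check. */
    private void slowCollect(int value) {
        seen.add(value);
    }

    void collect(int value) {
        try {
            active.collect(value);
        } catch (MemoryLimitBreached e) {
            active = this::slowCollect; // swap the delegate once
            switched = true;
            active.collect(value);      // replay the value that triggered the switch
        }
    }

    boolean hasSwitched() {
        return switched;
    }

    int distinctCount() {
        return seen.size();
    }
}
```

The value that triggers the breach is replayed on the fallback path, so no input is dropped at the switch.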

rishabhmaurya · 2025-10-11T00:53:26Z

server/src/main/java/org/opensearch/search/DefaultSearchContext.java

        return 0;
    }

+    private boolean evaluateCardinalityAggregationHybridCollectorEnabled() {


Any reason for placing these methods in SearchContext?

rishabhmaurya · 2025-10-11T00:57:03Z

@anandpatel9998 I have added a couple of comments; the changes mostly look good. Thanks for working on it.
Did you happen to run a benchmark in big5 against cases where the hybrid collector comes into action? If not, we should add a query that hits this code path and compare the performance against what the direct collector would have delivered.

If we are able to prove decent gains, this change calls for a blog post. Terms with cardinality aggs are a pain point for a lot of users.

anandpatel9998 force-pushed the hybrid-collector branch from 88989f3 to c142ac4 Compare October 6, 2025 16:12

anandpatel9998 force-pushed the hybrid-collector branch from c142ac4 to 06ce5c3 Compare October 6, 2025 17:21

anandpatel9998 force-pushed the hybrid-collector branch from 06ce5c3 to fc328a2 Compare October 7, 2025 01:50

anandpatel9998 marked this pull request as ready for review October 7, 2025 02:58

anandpatel9998 requested review from Bukhtawar, CEHENKLE, Rishikesh1159, anasalkouz, andrross, ashking94, cwperks, dbwiddis, gbbafna, jed326, kotwanikunal, mch2, msfroh, owaiskazi19, reta, sachinpkale, saratvemulapalli, shwetathareja and sohami as code owners October 7, 2025 02:58

anandpatel9998 force-pushed the hybrid-collector branch from 3d9432e to 4ee0fd1 Compare October 7, 2025 21:43

anandpatel9998 force-pushed the hybrid-collector branch from 4ee0fd1 to d67fc4f Compare October 8, 2025 00:35

Merge branch 'main' into hybrid-collector

522a92b

Signed-off-by: Anand Patel <[email protected]>

anandpatel9998 force-pushed the hybrid-collector branch from 6375b70 to b666de2 Compare October 8, 2025 19:56

anandpatel9998 force-pushed the hybrid-collector branch from b666de2 to e9e7fe0 Compare October 8, 2025 20:52

anandpatel9998 force-pushed the hybrid-collector branch from e9e7fe0 to 5848513 Compare October 8, 2025 22:53

anandpatel9998 force-pushed the hybrid-collector branch from 5848513 to a34c044 Compare October 9, 2025 04:31

This was referenced Oct 9, 2025

[AUTOCUT] Gradle Check Flaky Test Report for IndexServiceTests #14407

Open

[AUTOCUT] Gradle Check Flaky Test Report for InstallPluginCommandTests #19256

Closed

[AUTOCUT] Gradle Check Flaky Test Report for ResourceAwareTasksTests #14293

Open

sandeshkr419 reviewed Oct 10, 2025

CHANGELOG.md (outdated, resolved)

...api-spec/src/main/resources/rest-api-spec/test/search.aggregation/170_cardinality_metric.yml (outdated, resolved)

Merge branch 'main' into hybrid-collector

9d3118f

Signed-off-by: Anand Patel <[email protected]>

anandpatel9998 force-pushed the hybrid-collector branch from a34c044 to 9d3118f Compare October 10, 2025 20:46

Merge branch 'main' into hybrid-collector

871ff0a

Signed-off-by: Anand Patel <[email protected]>

rishabhmaurya reviewed Oct 11, 2025
