Skip to content

Conversation

gaobinlong
Copy link
Contributor

@gaobinlong gaobinlong commented Aug 29, 2025

Description

Similar to SimpleTopDocsCollectorContext, we can omit the MaxScoreCollector when sorting by score in CollapsingTopDocsCollectorContext, this can reduce the search latency and cpu usage.

By a benchmark test against the big5 index and other indices, this change can achieve 10%-20% performance improvement in both concurrent segment search enabled and disabled.

The query DSL on big5 index is:

GET big5/_search
{
  "query": {
    "match": {
      "message": "50-136-239-27"
    }
  },
  "collapse": {
    "field": "process.name.keyword"
  }
}

, the benchmark result:

Before:

|                                        50th percentile latency | collapsing |     740.061 |     ms |
|                                        90th percentile latency | collapsing |     748.504 |     ms |
|                                        99th percentile latency | collapsing |     771.312 |     ms |
|                                      99.9th percentile latency | collapsing |     843.527 |     ms |
|                                       100th percentile latency | collapsing |     857.434 |     ms |
|                                   50th percentile service time | collapsing |     738.741 |     ms |
|                                   90th percentile service time | collapsing |     747.404 |     ms |
|                                   99th percentile service time | collapsing |     770.124 |     ms |
|                                 99.9th percentile service time | collapsing |     842.116 |     ms |
|                                  100th percentile service time | collapsing |     856.587 |     ms |


After

|                                        50th percentile latency | collapsing |     568.893 |     ms |
|                                        90th percentile latency | collapsing |     579.025 |     ms |
|                                        99th percentile latency | collapsing |      621.08 |     ms |
|                                      99.9th percentile latency | collapsing |     660.756 |     ms |
|                                       100th percentile latency | collapsing |     709.702 |     ms |
|                                   50th percentile service time | collapsing |     567.327 |     ms |
|                                   90th percentile service time | collapsing |     577.377 |     ms |
|                                   99th percentile service time | collapsing |      619.84 |     ms |
|                                 99.9th percentile service time | collapsing |      659.55 |     ms |
|                                  100th percentile service time | collapsing |     708.664 |     ms |

In addition, this PR fixes the bug that max_score is null when we have multiple sort fields and _score is used as a primary sort.

Related Issues

Resolves #19179 and partly resolves #19180.

Check List

  • Functionality includes testing.
    - [ ] API changes companion pull request created, if applicable.
    - [ ] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added bug Something isn't working enhancement Enhancement or improvement to existing feature or request Search Search query, autocomplete ...etc Search:Performance labels Aug 29, 2025
Signed-off-by: Binlong Gao <[email protected]>
Signed-off-by: Binlong Gao <[email protected]>
Copy link
Contributor

❌ Gradle check result for 23018ff: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❕ Gradle check result for 23018ff: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link

codecov bot commented Aug 29, 2025

Codecov Report

❌ Patch coverage is 82.35294% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.87%. Comparing base (af75e0b) to head (c5b8098).
⚠️ Report is 25 commits behind head on main.

Files with missing lines Patch % Lines
...ensearch/search/query/TopDocsCollectorContext.java 82.35% 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #19181      +/-   ##
============================================
+ Coverage     72.84%   72.87%   +0.02%     
+ Complexity    69857    69823      -34     
============================================
  Files          5674     5674              
  Lines        320901   320911      +10     
  Branches      46394    46397       +3     
============================================
+ Hits         233756   233856     +100     
+ Misses        68205    68116      -89     
+ Partials      18940    18939       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: Binlong Gao <[email protected]>
Copy link
Contributor

❌ Gradle check result for 82eb922: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for c5b8098: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

✅ Gradle check result for c5b8098: SUCCESS

@gaobinlong
Copy link
Contributor Author

@reta @kkewwei could you help to review this PR? Thanks!

Copy link
Contributor

@reta reta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @gaobinlong

@gaobinlong gaobinlong merged commit 01425df into opensearch-project:main Sep 29, 2025
33 of 35 checks passed
karenyrx pushed a commit to karenyrx/OpenSearch that referenced this pull request Sep 29, 2025
…ding (opensearch-project#19181)

* Omit maxScoreCollector for field collapsing when sort by score descending

Signed-off-by: Binlong Gao <[email protected]>

* Modify change log

Signed-off-by: Binlong Gao <[email protected]>

* Add yaml rest test

Signed-off-by: Binlong Gao <[email protected]>

---------

Signed-off-by: Binlong Gao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enhancement Enhancement or improvement to existing feature or request Search:Performance Search Search query, autocomplete ...etc

Projects

None yet

3 participants