Skip to content

Enable sort optimization on int, short and byte fields #127968

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

mayya-sharipova
Copy link
Contributor

Before this PR sorting on integer, short and byte fields types used SortField.Type.LONG. This made sort optimization impossible for these field types.

This PR uses SortField.Type.INT for integer, short and byte fields. This enables sort optimization.

There are several caveats with changing sort type that are addressed:

  • Before mixed sort on integer and long fields was automatically supported, as both field types used SortField.TYPE.LONG. Now when merging results from different shards, we need to convert sort to LONG and results to long values.
  • Similar for collapsing when there is mixed INT and LONG sort types.
  • Index sorting. Similarly, before for index sorting on integer field, SortField.Type.LONG was used. This sort type is stored in the index writer config on disk and can't be modified. Now when providing sortField() for index sorting, we need to account for index version: for older indices return sort with SortField.Type.LONG and for new indices return SortField.Type.INT.

Before this PR sorting on integer, short and byte fields types
used SortField.Type.LONG. This made sort optimization impossible
for these field types.

This PR uses SortField.Type.INT for integer, short and byte fields.
This enables sort optimization.

There are several caveats with changing sort type that are addressed:
- Before mixed sort on integer and long fields was automatically supported,
as both field types used SortField.TYPE.LONG. Now when merging results
from different shards, we need to convert sort to LONG and results to
long values.
- Similar for collapsing when there is mixed INT and LONG sort types.
- Index sorting. Similarly, before for index sorting on integer field,
SortField.Type.LONG was used. This sort type is stored in the index
writer config on disk and can't be modified. Now when providing
sortField() for index sorting, we need to account for index version:
for older indices return sort with SortField.Type.LONG and for new
indices return SortField.Type.INT.
@mayya-sharipova mayya-sharipova added >enhancement :Search Relevance/Search Catch all for Search Relevance labels May 9, 2025
@elasticsearchmachine elasticsearchmachine added v9.1.0 Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch labels May 9, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Copy link
Collaborator

Hi @mayya-sharipova, I've created a changelog YAML for you.

@mayya-sharipova
Copy link
Contributor Author

mayya-sharipova commented May 9, 2025

Benchmarks done on the geonames track where population and elevation were indexed as integer:
Results, the contender has over 2.4x - 6.7x times improvement in queries speed than the baseline

  • baseline (current main branch)
  • contender (this PR)
|                                                        Metric |                                       Task |         Baseline |        Contender |        Diff |   Unit |   Diff % |
|--------------------------------------------------------------:|-------------------------------------------:|-----------------:|-----------------:|------------:|-------:|---------:|
|                                                 Segment count |                                            |     69           |     71           |     2       |        |   +2.90% |
|                                  90th percentile service time |                       desc_sort_population |     55.8922      |      9.73511     |   -46.1571  |     ms |  -82.58% |
|                                  90th percentile service time |    desc_sort_population_can_match_shortcut |     27.5603      |     11.3325      |   -16.2278  |     ms |  -58.88% |
|                                  90th percentile service time | desc_sort_population_no_can_match_shortcut |     27.6777      |     10.4456      |   -17.232   |     ms |  -62.26% |
|                                  90th percentile service time |                        asc_sort_population |     43.7791      |      7.9269      |   -35.8522  |     ms |  -81.89% |
|                                  90th percentile service time |             asc_sort_with_after_population |     62.1788      |      8.2714      |   -53.9074  |     ms |  -86.70% |
|                                  90th percentile service time |                        desc_sort_elevation |     52.0221      |      8.24807     |   -43.774   |     ms |  -84.15% |
|                                  90th percentile service time |                         asc_sort_elevation |     41.8961      |      6.16761     |   -35.7284  |     ms |  -85.28% |

Benchmarks done on the http_logs track where "ingest_percentage:20, where status and size were indexed as integer:
Results, the contender has over 6x-136x times improvement in queries speed than the baseline

  • baseline (current main branch)
  • contender (this PR)
|                                                        Metric |                                                   Task |         Baseline |        Contender |         Diff |   Unit |   Diff % |
|--------------------------------------------------------------:|-------------------------------------------------------:|-----------------:|-----------------:|-------------:|-------:|---------:|
|                                                 Segment count |                                                        |     25           |     25           |      0       |        |    0.00% |
|                                  90th percentile service time |                                         sort_size_desc |    111.687       |     15.1521      |    -96.5345  |     ms |  -86.43% |
|                                  90th percentile service time |                                          sort_size_asc |     93.394       |     14.8044      |    -78.5896  |     ms |  -84.15% |
|                                  90th percentile service time |                                       sort_status_desc |    123.959       |     13.4924      |   -110.466   |     ms |  -89.12% |
|                                  90th percentile service time |                                        sort_status_asc |    105.192       |     11.857       |    -93.3347  |     ms |  -88.73% |
|                                  90th percentile service time |                 sort-size-desc-after-force-merge-1-seg |    111.995       |     21.2422      |    -90.7533  |     ms |  -81.03% |
|                                       90th percentile latency |                  sort-size-asc-after-force-merge-1-seg |  12386.2         |     11.8291      | -12374.4     |     ms |  -99.90% |
|                                       90th percentile latency |               sort-status-desc-after-force-merge-1-seg |    126.58        |     13.4277      |   -113.152   |     ms |  -89.39% |
|                                  90th percentile service time |                sort-status-asc-after-force-merge-1-seg |    104.873       |      7.42143     |    -97.4519  |     ms |  -92.92% |

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search Relevance/Search Catch all for Search Relevance Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants