Enable sort optimization on int, short and byte fields #127968

mayya-sharipova · 2025-05-09T13:18:57Z

Before this PR, sorting on integer, short and byte field types used SortField.Type.LONG. This made sort optimization impossible for these field types.

This PR uses SortField.Type.INT for integer, short and byte fields. This enables sort optimization.

There are several caveats with changing the sort type, all of which are addressed:

- Mixed sort. Previously, a mixed sort on integer and long fields was supported automatically, as both field types used SortField.Type.LONG. Now, when merging results from different shards, we need to convert the sort to LONG and the results to long values.
- Collapsing. The same applies to collapsing when there is a mix of INT and LONG sort types.
- Index sorting. Similarly, index sorting on an integer field previously used SortField.Type.LONG. This sort type is stored in the index writer config on disk and can't be modified. Now, when providing sortField() for index sorting, we need to account for the index version: for older indices return a sort with SortField.Type.LONG, and for new indices return SortField.Type.INT.
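
The index-sorting caveat can be pictured with a minimal sketch. The names below (`SortType`, `sortTypeForIndexSort`, `FIRST_INT_SORT_VERSION` and its value) are hypothetical stand-ins and not the identifiers used in this PR; the only point is that the sort type is derived from the version the index was created with.

```java
// Hypothetical sketch: pick the index-sort type from the index-created version.
// None of these names or values come from the Elasticsearch code base.
class IndexSortTypeSketch {
    enum SortType { INT, LONG }

    // Assumed cut-over version id, chosen only for illustration.
    static final int FIRST_INT_SORT_VERSION = 100;

    static SortType sortTypeForIndexSort(int indexCreatedVersion) {
        // Older indices persisted LONG in their index writer config, so they keep LONG.
        return indexCreatedVersion >= FIRST_INT_SORT_VERSION ? SortType.INT : SortType.LONG;
    }

    public static void main(String[] args) {
        System.out.println(sortTypeForIndexSort(99));   // index created before the cut-over
        System.out.println(sortTypeForIndexSort(100));  // index created at or after the cut-over
    }
}
```

Keeping the decision in one place means an existing index never sees a sort type that differs from the one already stored on disk.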

There is only one change that may be considered not backwards compatible:
Previously, if an integer field was missing a value, its sort value was returned as Long.MAX_VALUE in a search response. With this PR, its sort value is returned as Integer.MAX_VALUE. But I think this change is OK, as our documentation doesn't say which value will be returned; it only says the document will be sorted last.
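
To make the visible difference concrete, here is a small sketch of the sort value reported for a document with a missing integer field when missing values sort last in ascending order. The class and method names are hypothetical; only the two constants come from the description above.

```java
// Hypothetical sketch of the sort value reported for a missing integer field.
class MissingSortValueSketch {
    static Number missingSortValue(boolean intSortType) {
        if (intSortType) {
            return Integer.MAX_VALUE; // 2147483647, with SortField.Type.INT
        }
        return Long.MAX_VALUE;        // 9223372036854775807, with SortField.Type.LONG
    }

    public static void main(String[] args) {
        System.out.println("before: " + missingSortValue(false));
        System.out.println("after:  " + missingSortValue(true));
    }
}
```

Either value still places the document last among int values, which is the only behaviour the documentation promises.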

Also closes #127965 (as same-type validation is added for collapse queries)

elasticsearchmachine · 2025-05-09T13:19:21Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

elasticsearchmachine · 2025-05-09T13:19:55Z

Hi @mayya-sharipova, I've created a changelog YAML for you.

mayya-sharipova · 2025-05-09T13:26:42Z

Benchmarks were run on the geonames track, with population and elevation indexed as integer.
Results: queries on the contender are roughly 2.4x to 6.7x faster than on the baseline.

baseline (current main branch)
contender (this PR)

|                                                        Metric |                                       Task |         Baseline |        Contender |        Diff |   Unit |   Diff % |
|--------------------------------------------------------------:|-------------------------------------------:|-----------------:|-----------------:|------------:|-------:|---------:|
|                                                 Segment count |                                            |     69           |     71           |     2       |        |   +2.90% |
|                                  90th percentile service time |                       desc_sort_population |     55.8922      |      9.73511     |   -46.1571  |     ms |  -82.58% |
|                                  90th percentile service time |    desc_sort_population_can_match_shortcut |     27.5603      |     11.3325      |   -16.2278  |     ms |  -58.88% |
|                                  90th percentile service time | desc_sort_population_no_can_match_shortcut |     27.6777      |     10.4456      |   -17.232   |     ms |  -62.26% |
|                                  90th percentile service time |                        asc_sort_population |     43.7791      |      7.9269      |   -35.8522  |     ms |  -81.89% |
|                                  90th percentile service time |             asc_sort_with_after_population |     62.1788      |      8.2714      |   -53.9074  |     ms |  -86.70% |
|                                  90th percentile service time |                        desc_sort_elevation |     52.0221      |      8.24807     |   -43.774   |     ms |  -84.15% |
|                                  90th percentile service time |                         asc_sort_elevation |     41.8961      |      6.16761     |   -35.7284  |     ms |  -85.28% |

Benchmarks were run on the http_logs track with ingest_percentage:20, with status and size indexed as integer.
Results: queries on the contender are roughly 6x to 136x faster than on the baseline.

baseline (current main branch)
contender (this PR)

|                                                        Metric |                                                   Task |         Baseline |        Contender |         Diff |   Unit |   Diff % |
|--------------------------------------------------------------:|-------------------------------------------------------:|-----------------:|-----------------:|-------------:|-------:|---------:|
|                                                 Segment count |                                                        |     25           |     25           |      0       |        |    0.00% |
|                                  90th percentile service time |                                         sort_size_desc |    111.687       |     15.1521      |    -96.5345  |     ms |  -86.43% |
|                                  90th percentile service time |                                          sort_size_asc |     93.394       |     14.8044      |    -78.5896  |     ms |  -84.15% |
|                                  90th percentile service time |                                       sort_status_desc |    123.959       |     13.4924      |   -110.466   |     ms |  -89.12% |
|                                  90th percentile service time |                                        sort_status_asc |    105.192       |     11.857       |    -93.3347  |     ms |  -88.73% |
|                                  90th percentile service time |                 sort-size-desc-after-force-merge-1-seg |    111.995       |     21.2422      |    -90.7533  |     ms |  -81.03% |
|                                       90th percentile latency |                  sort-size-asc-after-force-merge-1-seg |  12386.2         |     11.8291      | -12374.4     |     ms |  -99.90% |
|                                       90th percentile latency |               sort-status-desc-after-force-merge-1-seg |    126.58        |     13.4277      |   -113.152   |     ms |  -89.39% |
|                                  90th percentile service time |                sort-status-asc-after-force-merge-1-seg |    104.873       |      7.42143     |    -97.4519  |     ms |  -92.92% |

john-wagster · 2025-05-19T20:45:46Z

server/src/main/java/org/elasticsearch/search/searchafter/SearchAfterBuilder.java

@@ -154,8 +154,11 @@ private static Object convertValueFromSortType(String fieldName, SortField.Type
        try {
            switch (sortType) {
                case DOC, INT:
-                    if (value instanceof Number) {
-                        return ((Number) value).intValue();
+                    if (value instanceof Number valueNumber) {


Mostly thinking out loud, since I'm not entirely sure what a user's expectations would be here. But when I read this snippet I was wondering whether this logic should throw an error instead of defaulting the value to max int.

@john-wagster I've added a comment for clarification.

This is to support the current behaviour for compatibility. Before this PR, every sort on INT fields was treated as a Long sort, so search_after values > Integer.MAX_VALUE were allowed.
To keep supporting this, and also to support sort on mixed shards of int and long values, we should convert values larger than Integer.MAX_VALUE.
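
The conversion discussed in this thread can be sketched as a clamp. This is an illustration of the idea only, not the actual body of convertValueFromSortType; the class and method names are hypothetical, and the symmetric handling of values below Integer.MIN_VALUE is an assumption.

```java
// Hypothetical sketch: clamp a search_after value into the int range instead of rejecting it.
class SearchAfterIntSketch {
    static int toIntSortValue(Number value) {
        long v = value.longValue();
        if (v > Integer.MAX_VALUE) {
            return Integer.MAX_VALUE; // out-of-range values from a long shard map to max int
        }
        if (v < Integer.MIN_VALUE) {
            return Integer.MIN_VALUE; // assumption: the lower bound is handled the same way
        }
        return (int) v;
    }

    public static void main(String[] args) {
        System.out.println(toIntSortValue(42));
        System.out.println(toIntSortValue(5_000_000_000L));
    }
}
```

A plain `intValue()` call on a long above Integer.MAX_VALUE would wrap around rather than clamp, so the explicit range check is what keeps such a search_after value ordered after every int.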

john-wagster

Left one comment, took a couple of passes through the PR, and overall LGTM.

elasticsearchmachine · 2025-05-21T14:55:49Z

Hi @mayya-sharipova, I've updated the changelog YAML for you.

mayya-sharipova · 2025-05-21T16:08:52Z

@elasticsearchmachine run elasticsearch-ci/part-2

mayya-sharipova · 2025-05-21T17:07:40Z

@elasticsearchmachine run elasticsearch-ci/part-3

mayya-sharipova · 2025-05-21T17:12:05Z

@elasticsearchmachine run "Elasticsearch Serverless Checks"

jimczi

Nice results on the benchmark!
From what I understand bytes and shorts would benefit even more, right?
Let's follow up on that to ensure we have a rally track for these new optimisations.
It would be helpful to add dedicated YAML tests for this behavior to ensure full coverage.

qa/rolling-upgrade/src/javaRestTest/java/org/elasticsearch/upgrades/IndexSortUpgradeIT.java

server/src/main/java/org/elasticsearch/index/fielddata/IndexNumericFieldData.java

mayya-sharipova · 2025-05-23T13:45:44Z

@jimczi Thanks for the review, I will address it:

Let's follow up on that to ensure we have a rally track for these new optimisations.

I've added a PR for integer_sort to http_logs.

This adds sorting on the integer fields "size" and "status". We are optimizing integer sort in elastic/elasticsearch/pull/127968, and it would be nice to have dedicated operations for integer sort. Note that there is no target-throughput for these operations, as we expect massive speedups once the optimization is merged.

mayya-sharipova · 2025-05-26T17:46:54Z

@jimczi Thanks for the review.
I added more tests in 8404d0f.

mayya-sharipova · 2025-05-28T14:02:08Z

@jimczi Do you have more comments for this PR, or could it be merged?

mayya-sharipova · 2025-05-29T20:06:33Z

The transforms test is failing because of this: apache/lucene#14732

benwtrent

I was digging around and attempting to fix the transforms failure; indeed, I think it's due to the bug you found here: apache/lucene#14732

It's frustrating that this class is just full of anonymous classes & private methods.

Is there any way for us to apply the fix here and use an XIndexSortSortedNumericDocValuesRangeQuery extends IndexSortSortedNumericDocValuesRangeQuery in the appropriate places?

benwtrent · 2025-05-29T19:15:11Z

server/src/main/java/org/elasticsearch/action/search/SearchPhaseController.java

+                if (getType(sortFields[fieldIdx]) == SortField.Type.INT) {
+                    for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
+                        FieldDoc fieldDoc = (FieldDoc) scoreDoc;
+                        fieldDoc.fields[fieldIdx] = ((Number) fieldDoc.fields[fieldIdx]).longValue();
+                    }


Why don't we need to change the internal sort field type to LONG as well?

We do need to convert the sort field types to LONG as well, as TopDocs.merge uses this type to get a Long comparator for sorting the long values.
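
A dependency-free sketch of why the values are widened before merging: if one shard reports Integer sort values and another reports Long, a comparator that expects Long cannot compare them until every value in that column has the same type. The class and method names are hypothetical, and plain Object arrays stand in for the per-document sort fields.

```java
// Hypothetical sketch: widen one sort column to Long across the results of all shards.
class MixedSortMergeSketch {
    static void widenToLong(Object[][] sortValuesPerDoc, int fieldIdx) {
        for (Object[] fields : sortValuesPerDoc) {
            // Integer and Long are both Numbers, so longValue() works for either shard type.
            fields[fieldIdx] = ((Number) fields[fieldIdx]).longValue();
        }
    }

    public static void main(String[] args) {
        Object[][] docs = { { 7 }, { 3L } }; // first doc from an int shard, second from a long shard
        widenToLong(docs, 0);
        // Both values are now Long, so a single long comparison is safe.
        System.out.println(Long.compare((Long) docs[0][0], (Long) docs[1][0]) > 0);
    }
}
```

Without the widening step, casting the Integer value to Long would fail with a ClassCastException, which is why both the values and the sort field type have to agree.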

benwtrent · 2025-05-29T19:59:37Z

...c/main/java/org/elasticsearch/index/fielddata/fieldcomparator/IntValuesComparatorSource.java

+/**
+ * Comparator source for integer values.
+ */
+public class IntValuesComparatorSource extends LongValuesComparatorSource {


Is there a reason why we don't provide a public BucketedSort newBucketedSort override for ints?

I guess we can do that in a follow up, but it should reduce memory consumption as we could tell BigArrays to use an IntArray, instead of a LongArray.

Thanks Ben, this is indeed a good follow up, I will add TODO:

mayya-sharipova · 2025-05-30T20:05:53Z

I've updated http_logs benchmarks to compare integer sort before and after these changes. The following charts have been added:

for multiple segments: sort_size_desc, sort_size_asc, sort_status_desc, sort_status_asc,
for single segment: sort-size-desc-after-force-merge-1-seg, sort-size-asc-after-force-merge-1-seg, sort-status-desc-after-force-merge-1-seg, sort-status-asc-after-force-merge-1-seg

mayya-sharipova · 2025-06-02T19:13:49Z

@elasticsearchmachine run elasticsearch-ci/part-1

elasticsearchmachine · 2025-06-02T21:51:32Z

💔 Backport failed

Status	Branch	Result
❌	8.19	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 127968

mayya-sharipova added the >enhancement and :Search Relevance/Search labels May 9, 2025

elasticsearchmachine added the v9.1.0 and Team:Search Relevance labels May 9, 2025

Update docs/changelog/127968.yaml

7beadf3

mayya-sharipova added 3 commits May 9, 2025 14:00

Fix tests

a9f5a8b

Fix test failures

44e64b2

Merge branch 'main' into sort_optimization_int_short_byte

25cebb8

john-wagster reviewed May 19, 2025

john-wagster approved these changes May 19, 2025

mayya-sharipova added 4 commits May 20, 2025 12:11

Fix test and add clarification to SearchAfterBuilder

2ea6efd

Fix BWC tests

443e7df

Merge branch 'main' into sort_optimization_int_short_byte

2cfaacd

Update docs/changelog/127968.yaml

4341c8d

Merge branch 'main' into sort_optimization_int_short_byte

2796301

Merge branch 'main' into sort_optimization_int_short_byte

b13cba1

jimczi reviewed May 23, 2025

qa/rolling-upgrade/src/javaRestTest/java/org/elasticsearch/upgrades/IndexSortUpgradeIT.java

server/src/main/java/org/elasticsearch/index/fielddata/IndexNumericFieldData.java

mayya-sharipova mentioned this pull request May 23, 2025

http_logs add search with int sort elastic/rally-tracks#778

Merged

Add more tests

8404d0f

mayya-sharipova added the v8.19.0 label May 26, 2025

Merge remote-tracking branch 'upstream/main' into sort_optimization_i…

b40dc57

…nt_short_byte

benwtrent reviewed May 29, 2025

benwtrent mentioned this pull request May 30, 2025

Applying Apache Lucene fix: https://github.com/apache/lucene/pull/14732 #128671

Merged

mayya-sharipova added 2 commits June 2, 2025 13:22

Merge remote-tracking branch 'upstream/main' into sort_optimization_i…

0f9c65e

…nt_short_byte

Add TODO for bucketSort

b8397f5

mayya-sharipova added the auto-backport and auto-merge-without-approval labels Jun 2, 2025

Merge branch 'main' into sort_optimization_int_short_byte

056862d

elasticsearchmachine merged commit 080a0cd into elastic:main Jun 2, 2025
18 checks passed

mayya-sharipova deleted the sort_optimization_int_short_byte branch June 2, 2025 21:51

elasticsearchmachine added the backport pending label Jun 2, 2025

mayya-sharipova mentioned this pull request Jun 3, 2025

Enable sort optimization on int, short and byte fields (#127968) #128832

Merged

mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request Jun 3, 2025

Add bucketedSort based on int

55e2fd4

Follow up on elastic#127968

mayya-sharipova mentioned this pull request Jun 3, 2025

Add bucketedSort based on int #128848

Merged

elasticsearchmachine pushed a commit that referenced this pull request Jun 3, 2025

Add bucketedSort based on int (#128848)

1ba21c2

Add bucketedSort on Int Follow up on #127968

mayya-sharipova removed the backport pending label Jun 4, 2025

mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request Jun 4, 2025

Fix IndexSortUpgradeIT test

1f7a621

Relates to PR elastic#127968 Closes elastic#128861, elastic#128862, elastic#128863

mayya-sharipova mentioned this pull request Jun 4, 2025

Fix IndexSortUpgradeIT test #128900

Merged

elasticsearchmachine pushed a commit that referenced this pull request Jun 4, 2025

Fix IndexSortUpgradeIT test (#128900)

4880245

Relates to PR #127968 Closes #128861, #128862, #128863
