-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Enable sort optimization on int, short and byte fields #127968
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Enable sort optimization on int, short and byte fields #127968
Conversation
Before this PR sorting on integer, short and byte fields types used SortField.Type.LONG. This made sort optimization impossible for these field types. This PR uses SortField.Type.INT for integer, short and byte fields. This enables sort optimization. There are several caveats with changing sort type that are addressed: - Before mixed sort on integer and long fields was automatically supported, as both field types used SortField.TYPE.LONG. Now when merging results from different shards, we need to convert sort to LONG and results to long values. - Similar for collapsing when there is mixed INT and LONG sort types. - Index sorting. Similarly, before for index sorting on integer field, SortField.Type.LONG was used. This sort type is stored in the index writer config on disk and can't be modified. Now when providing sortField() for index sorting, we need to account for index version: for older indices return sort with SortField.Type.LONG and for new indices return SortField.Type.INT.
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
Hi @mayya-sharipova, I've created a changelog YAML for you. |
Benchmarks done on the geonames track where
| Metric | Task | Baseline | Contender | Diff | Unit | Diff % |
|--------------------------------------------------------------:|-------------------------------------------:|-----------------:|-----------------:|------------:|-------:|---------:|
| Segment count | | 69 | 71 | 2 | | +2.90% |
| 90th percentile service time | desc_sort_population | 55.8922 | 9.73511 | -46.1571 | ms | -82.58% |
| 90th percentile service time | desc_sort_population_can_match_shortcut | 27.5603 | 11.3325 | -16.2278 | ms | -58.88% |
| 90th percentile service time | desc_sort_population_no_can_match_shortcut | 27.6777 | 10.4456 | -17.232 | ms | -62.26% |
| 90th percentile service time | asc_sort_population | 43.7791 | 7.9269 | -35.8522 | ms | -81.89% |
| 90th percentile service time | asc_sort_with_after_population | 62.1788 | 8.2714 | -53.9074 | ms | -86.70% |
| 90th percentile service time | desc_sort_elevation | 52.0221 | 8.24807 | -43.774 | ms | -84.15% |
| 90th percentile service time | asc_sort_elevation | 41.8961 | 6.16761 | -35.7284 | ms | -85.28% | Benchmarks done on the http_logs track where "ingest_percentage:20, where
| Metric | Task | Baseline | Contender | Diff | Unit | Diff % |
|--------------------------------------------------------------:|-------------------------------------------------------:|-----------------:|-----------------:|-------------:|-------:|---------:|
| Segment count | | 25 | 25 | 0 | | 0.00% |
| 90th percentile service time | sort_size_desc | 111.687 | 15.1521 | -96.5345 | ms | -86.43% |
| 90th percentile service time | sort_size_asc | 93.394 | 14.8044 | -78.5896 | ms | -84.15% |
| 90th percentile service time | sort_status_desc | 123.959 | 13.4924 | -110.466 | ms | -89.12% |
| 90th percentile service time | sort_status_asc | 105.192 | 11.857 | -93.3347 | ms | -88.73% |
| 90th percentile service time | sort-size-desc-after-force-merge-1-seg | 111.995 | 21.2422 | -90.7533 | ms | -81.03% |
| 90th percentile latency | sort-size-asc-after-force-merge-1-seg | 12386.2 | 11.8291 | -12374.4 | ms | -99.90% |
| 90th percentile latency | sort-status-desc-after-force-merge-1-seg | 126.58 | 13.4277 | -113.152 | ms | -89.39% |
| 90th percentile service time | sort-status-asc-after-force-merge-1-seg | 104.873 | 7.42143 | -97.4519 | ms | -92.92% | |
@@ -154,8 +154,11 @@ private static Object convertValueFromSortType(String fieldName, SortField.Type | |||
try { | |||
switch (sortType) { | |||
case DOC, INT: | |||
if (value instanceof Number) { | |||
return ((Number) value).intValue(); | |||
if (value instanceof Number valueNumber) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly thinking outloud since I'm not entirely sure what a user's expectations would be here. But when I read this snippet I was wondering if this logic should throw an error instead of defaulting the value to max int.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@john-wagster I've added a comment for clarification.
This is to support the current behaviour for compatibility. Currently, before this PR, all sort on INT fields was treated as Long sort, so search_after of values > Integer.MAX_VALUE were allowed.
Now to support this and also to support sort on mixed shards of int and long value, we should convert values larger than Integer.MAX_VALUE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left one comment, took a couple of passes through the PR, and overall lgtm to me.
Hi @mayya-sharipova, I've updated the changelog YAML for you. |
@elasticsearchmachine run elasticsearch-ci/part-2 |
@elasticsearchmachine run elasticsearch-ci/part-3 |
@elasticsearchmachine run "Elasticsearch Serverless Checks" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice results on the benchmark!
From what I understand bytes and shorts would benefit even more, right?
Let's follow up on that to ensure we have a rally track for these new optimisations.
It would be helpful to add dedicated YAML tests for this behavior to ensure full coverage.
} | ||
""".formatted(config.fieldName())); | ||
var searchResponse = client().performRequest(searchRequest); | ||
assertOK(searchResponse); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's check that the returned sort values are correct?
break; | ||
} | ||
|
||
sortField.setOptimizeSortWithPoints(isIndexed()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Before this PR sorting on integer, short and byte fields types used SortField.Type.LONG. This made sort optimization impossible for these field types.
This PR uses SortField.Type.INT for integer, short and byte fields. This enables sort optimization.
There are several caveats with changing sort type that are addressed:
Also closes #127965 (as same type validation in added for collapse queries)