Closed
Description
Feature description
Consider the following 4 documents with a keyword field "names":
_doc/0: { names: [A, B] }
_doc/1: { names: [A, E] }
_doc/2: { names: [A, D] }
_doc/4: { names: [A, C] }
With the currently supported behavior, these 4 docs are treated as ties when sorted in ascending order by names.
I would like to be able to sort these so that the lists are sorted lexicographically as a whole; i.e. when the first element is a tie, it compares the second element, and so on, like so:
_doc/0: { names: [A, B] }
_doc/4: { names: [A, C] }
_doc/2: { names: [A, D] }
_doc/1: { names: [A, E] }
More examples:
[ [A], [B], [A, B] ] would sort as [ [A], [A, B], [B] ]
[ [A], [A, B, C], [A, B, B] ] would sort as [ [A], [A, B, B], [A, B, C] ]
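The ordering requested here is the ordinary lexicographic order on sequences. As a minimal sketch outside Elasticsearch (Python, whose built-in list comparison happens to behave this way; the `docs` mapping and the `lex_sorted_ids` helper are illustrative names that only mirror the examples above):

```python
# Python compares lists element by element and, when one list is a prefix of
# the other, puts the shorter list first. That is the ordering requested here.
docs = {
    "_doc/0": ["A", "B"],
    "_doc/1": ["A", "E"],
    "_doc/2": ["A", "D"],
    "_doc/4": ["A", "C"],
}


def lex_sorted_ids(docs):
    """Return document ids ordered by their whole `names` list."""
    return sorted(docs, key=lambda doc_id: docs[doc_id])


order = lex_sorted_ids(docs)
example_1 = sorted([["A"], ["B"], ["A", "B"]])
example_2 = sorted([["A"], ["A", "B", "C"], ["A", "B", "B"]])
```

This only illustrates the comparison rule; it says nothing about how such a mode would be implemented inside Elasticsearch.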
This could be solved in Elasticsearch by introducing a new sort mode (e.g. lex, as a placeholder name for now) for multi-valued fields.
POST /_search
{
  "query" : {
    "match_all" : {}
  },
  "sort" : [
    {"names" : {"order" : "asc", "mode" : "lex"}}
  ]
}
How I had to solve this instead
I created a new field names_sortKey of type keyword. In my application, I join the list elements with a delimiter character that sorts before all printable characters (e.g. \u001F), and then I sort on this field.
Example:
{
  "names": ["A", "B", "C"],
  "names_sortKey": "A\u001fB\u001fC"
}
POST /_search
{
  "query" : {
    "match_all" : {}
  },
  "sort" : [
    {"names_sortKey" : {"order" : "asc"}}
  ]
}
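The application-side join could look roughly like the following sketch. It is in Python, which may not be the language the application uses; `build_sort_key` and `with_sort_key` are hypothetical helper names, and the check that values do not contain the delimiter is an added assumption, not something stated above.

```python
DELIMITER = "\u001f"  # unit separator; sorts before all printable characters


def build_sort_key(names):
    """Join the values into one string. Values must not contain DELIMITER,
    otherwise the joined key no longer reflects the list boundaries."""
    if any(DELIMITER in name for name in names):
        raise ValueError("value contains the delimiter character")
    return DELIMITER.join(names)


def with_sort_key(doc):
    """Return a copy of the document with the derived names_sortKey field."""
    return {**doc, "names_sortKey": build_sort_key(doc["names"])}


# Sorting by the joined key gives the same order as comparing the lists
# element by element, as long as no value contains a character that sorts
# at or below the delimiter.
lists = [["A"], ["A", "B", "C"], ["A", "B", "B"], ["B"], ["A", "B"]]
by_key = sorted(lists, key=build_sort_key)
```

Because the delimiter sorts below every printable character, a list that is a prefix of another (e.g. [A] versus [A, B]) still comes first after joining.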
Activity
Title changed from "New sort mode for multi-valued keyword fields" to "New lexicographical sort mode for multi-valued keyword fields"
elasticsearchmachine commented on May 9, 2025
Pinging @elastic/es-search-relevance (Team:Search Relevance)
mayya-sharipova commented on May 9, 2025
You can also do that through a script for sort, like this (there is no need to index an extra field):
Is this a good option for you and can we close this issue?
igordemiranda commented on May 9, 2025
Hi Mayya, thanks for the quick response.
That does sound like a good general workaround. In my case I'm using the icu_collation_keyword for text sorting. For example:
I wonder:
An alternative would be to create a custom analyzer that does the join at index time before going through the ICU analyzer. I couldn't find a way to do that, though. If I understood correctly, an analyzer cannot transform multi-valued fields into a single value? If you have any insights in that direction, that would be great too.
mayya-sharipova commented on May 21, 2025
@igordemiranda When you index a document with an icu_collation_keyword field, Elasticsearch uses the ICU library to generate a binary collation key for the field's string value. This precomputed binary value is stored in doc_values and is used later for sorting. Since it is already precomputed, sorting during search is very efficient, as it involves just a binary comparison.
Answering your specific questions:
I think we can consider this issue closed.
Please feel free to reopen if you think it is not addressed.