Description
Elasticsearch Version
Serverless
Installed Plugins
No response
Java Version
bundled
OS Version
Serverless
Problem Description
A query on the uri wildcard field being expanded in 260k term queries; this resulted in 2.4GB consumed by one query.
Looking at the heap dump, we can see that the source contained more than 9K terms on a couple of url.* fields, that based on mappings they're both defined as wildcard fields.
... "url": { "properties": { ..., "full": { "type": "wildcard", "fields": { "text": { "type": "match_only_text" } } }, "original": { "type": "wildcard", "fields": { "text": { "type": "match_only_text" } } }, ... } ...
This query generates ~27K BinaryDvConfirmedAutomatonQuery
queries, which in turn, each one of them holds 10 (based on MAX_CLAUSES_IN_APPROXIMATION_QUERY
)TermQueries with a size of ~0.01 MB. So we end up with a single query consuming ~2GB of heap.
There is a failover mechanism for the individual wildcard queries to prevent "expensive" ones, but these are all batched underneath a should clause, so even though could be individually ok, collectively, they could cause OOMs.
We should identify in advance potential issues with the queries and fail early (to avoid OOMs).
Steps to Reproduce
Unavailable
Logs (if relevant)
No response