Description
A remote enrich query that uses a policy whose enrich fields include a field already present in the query is planned slightly incorrectly: the query works, but the plan fails verification.
Example: policy hosts matches on ip and has ip and os as enrich_fields:
FROM *:events,events
| EVAL ip= TO_STR(host)
| SORT timestamp, user, ip
| LIMIT 5
| ENRICH _REMOTE:hosts ON ip
| KEEP host, timestamp, user, os
This produces the optimised physical plan:
ProjectExec[[host{f}#14, timestamp{f}#16, user{f}#15, os{r}#21]]
\_TopNExec[[Order[timestamp{f}#16,ASC,LAST], Order[user{f}#15,ASC,LAST], Order[ip{r}#3,ASC,LAST]],5[INTEGER],null]
\_ExchangeExec[[host{f}#14, timestamp{f}#16, user{f}#15, os{r}#21, ip{r}#3],false]
\_FragmentExec[filter=null, estimatedRowSize=0, reducer=[], fragment=[<>
Project[[host{f}#14, timestamp{f}#16, user{f}#15, os{r}#21, ip{r}#3]]
\_Enrich[REMOTE,[68 6f 73 74 73][KEYWORD],ip{r}#3,{"match":{"indices":[],"match_field":"ip","enrich_fields":["ip","os"]}},{=.enrich-hosts-1733836249291, c1=.enrich-hosts-1733836248939, c2=.enrich-hosts-1733836249107},[ip{r}#20, os{r}#21]]
\_TopN[[Order[timestamp{f}#16,ASC,LAST], Order[user{f}#15,ASC,LAST], Order[ip{r}#3,ASC,LAST]],5[INTEGER]]
\_Eval[[TOSTRING(host{f}#14) AS ip]]
\_EsRelation[events,c1:events,c2:events][host{f}#14, timestamp{f}#16, user{f}#15]<>]]
Note that in the fragment, Project outputs ip{r}#3 (the node is produced by ProjectAwayColumns based on the TopN below Enrich), but the Enrich below it outputs ip{r}#20 (since it also enriches with its own ip field). So the verification fails later when remapping the fragment plan on ProjectExec, since its inputs don't provide an ip{r}#3. (If we KEEP ip too, the verification would also fail due to attributes with duplicate names.)
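The mismatch can be reproduced in a toy model. The sketch below is not Elasticsearch code: Attr, enrich_output and missing_references are hypothetical names, and it assumes that an enrich field shadows a same-named attribute coming from the child, which is what the plan above suggests. Attribute names and ids are taken from the plan.

```python
from typing import NamedTuple


class Attr(NamedTuple):
    """Toy stand-in for a plan attribute such as ip{r}#3 (name plus id)."""
    name: str
    id: int


def enrich_output(child_output, enrich_fields):
    # Assumption: enrich fields shadow same-named attributes from the child.
    shadowed = {a.name for a in enrich_fields}
    kept = [a for a in child_output if a.name not in shadowed]
    return kept + list(enrich_fields)


def missing_references(required, available):
    # Attributes a node refers to that its input does not provide.
    have = set(available)
    return [a for a in required if a not in have]


# Output of the TopN below Enrich: the relation's fields plus the evaluated ip.
topn_output = [Attr("host", 14), Attr("timestamp", 16), Attr("user", 15), Attr("ip", 3)]

# Enrich adds its own ip and os, replacing ip{r}#3 with ip{r}#20.
enrich_out = enrich_output(topn_output, [Attr("ip", 20), Attr("os", 21)])

# The Project above Enrich still refers to ip{r}#3.
project_refs = [Attr("host", 14), Attr("timestamp", 16), Attr("user", 15),
                Attr("os", 21), Attr("ip", 3)]

missing = missing_references(project_refs, enrich_out)
print(missing)  # [Attr(name='ip', id=3)]
```

In this model the only unresolved reference is ip with id 3, matching the attribute the verifier reports as missing.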
Normally we would drop the ip after TopN, but the Enrich remote planning pushes the TopN to the remote cluster, and the ip is still needed for the coordinator TopN.
Related: #118307.