@ncordon
Contributor

@ncordon ncordon commented Nov 3, 2025

Addresses #133992 and #136598, partially.

Still missing from this PR: at the moment the runtime tries to avoid duplicate computations, which results in exceptions when the plan is correct but not optimal. In other words, queries like:

from airports
| rename scalerank AS x
| stats a = count(x), b = count(x) + count(x), c = count_distinct(x)

should never have failed at runtime, even if the plan was not optimal for repeated aggregations.
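
To make the query shape above concrete, here is a minimal sketch in plain Java (not the Elasticsearch code) of pulling each distinct aggregate call out once and letting the remaining expressions refer to it by name. The regex matching, the `_agg` naming scheme, and the class `AggDedupSketch` are all assumptions made for illustration; the real rule works on expression trees, not strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only: expressions are strings and aggregate calls are found with a regex.
final class AggDedupSketch {
    static final class Result {
        final List<String> stats = new ArrayList<>(); // unique aggregates, computed once
        final List<String> evals = new ArrayList<>(); // expressions over the aggregate names
    }

    // Matches simple, non-nested calls such as count(x) or count_distinct(x).
    private static final Pattern AGG_CALL = Pattern.compile("\\b(count_distinct|count|min|max)\\([^()]*\\)");

    static Result split(Map<String, String> fields) {
        Map<String, String> aliasByAgg = new LinkedHashMap<>(); // aggregate text -> name
        int[] counter = { 0 };
        Result result = new Result();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String name = field.getKey();
            String expr = field.getValue().trim();
            if (AGG_CALL.matcher(expr).matches()) {
                // A bare aggregate: reuse an earlier identical one if there is any.
                String existing = aliasByAgg.putIfAbsent(expr, name);
                if (existing != null) {
                    result.evals.add(name + " = " + existing);
                }
                continue;
            }
            // An expression over aggregates: replace each call with its (possibly new) name.
            String rewritten = AGG_CALL.matcher(expr).replaceAll(m -> {
                String alias = aliasByAgg.computeIfAbsent(m.group(), agg -> "_agg" + counter[0]++);
                return Matcher.quoteReplacement(alias);
            });
            result.evals.add(name + " = " + rewritten);
        }
        aliasByAgg.forEach((agg, alias) -> result.stats.add(alias + " = " + agg));
        return result;
    }
}
```

The point of the sketch is that the bare aggregate and the aggregate nested inside an expression share one lookup key, so both paths agree on what counts as a duplicate.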

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.3.0 labels Nov 3, 2025
@ncordon ncordon added :Analytics/ES|QL AKA ESQL Team:ES|QL and removed needs:triage Requires assignment of a team area label labels Nov 3, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Nov 3, 2025
@ncordon ncordon added the >bug label Nov 3, 2025
@elasticsearchmachine
Collaborator

Hi @ncordon, I've created a changelog YAML for you.

@astefan astefan requested a review from alex-spies November 3, 2025 12:18
Contributor

@alex-spies alex-spies left a comment

Heya, no review yet, except for a very quick first glance.

This also fixes #136598, nice! Let's note that down in the PR description so the other issue gets auto-closed on merge, as well.

That said, I don't think this addresses problems like

| stats median(foo), percentile(foo, 50), count_distinct(foo)

because the substitution median(foo) -> percentile(foo, 50) happens after ReplaceAggregateAggExpressionWithEval, right?
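
A small sketch of why the order matters here, assuming (as the comment above says) that median(foo) is rewritten to percentile(foo, 50). The class `SurrogateOrderSketch` and its string-based rewriting are hypothetical stand-ins, not the optimizer's actual rules.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Toy model only: aggregates are plain strings.
final class SurrogateOrderSketch {
    // Hypothetical stand-in for the substitution median(f) -> percentile(f, 50).
    static String substituteSurrogate(String agg) {
        if (agg.startsWith("median(") && agg.endsWith(")")) {
            String field = agg.substring("median(".length(), agg.length() - 1);
            return "percentile(" + field + ", 50)";
        }
        return agg;
    }

    // Deduplicating first misses duplicates that only appear after substitution.
    static List<String> dedupThenSubstitute(List<String> aggs) {
        List<String> out = new ArrayList<>();
        for (String agg : new LinkedHashSet<>(aggs)) {
            out.add(substituteSurrogate(agg));
        }
        return out;
    }

    // Substituting first lets the deduplication see the final form of every aggregate.
    static List<String> substituteThenDedup(List<String> aggs) {
        LinkedHashSet<String> unique = new LinkedHashSet<>();
        for (String agg : aggs) {
            unique.add(substituteSurrogate(agg));
        }
        return new ArrayList<>(unique);
    }
}
```

With the three aggregates from the query above, the first ordering still ends up with percentile(foo, 50) twice, while the second keeps it once.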

The PR description says this partially addresses #133992; what else is not yet addressed?

Contributor

@alex-spies alex-spies left a comment

Heya, could you please add some tests to the logical plan optimizer tests that demonstrate what the plans for some relevant STATS queries will look like? Actually, we should create a test class similar to ReplaceStatsFilteredAggWithEvalTests; there are probably some tests in LogicalPlanOptimizerTests that could be moved there, too, but that's optional.

I'm interested in seeing a bunch of cases, esp. ones with a BY clause and with per-agg-function WHERE clauses. We seem to have little coverage of per-agg-function WHERE clauses that are different from their canonicalization (otherwise I'd have expected some test failures).

Other than that, I think the approach in the fix is good! Clearly, when deduplicating aggs in expressions, we need to be consistent between a single agg function and an expression with agg functions within it.

  if (alias == null) {
      // create synthetic alias ove the found agg function
-     alias = new Alias(af.source(), syntheticName(canonical, child, counter[0]++), canonical, null, true);
+     alias = new Alias(af.source(), syntheticName(canonical, child, counter[0]++), af, null, true);
Contributor

Why do we not want to use the canonicalized agg function here anymore?

Contributor

I think we should keep af.canonicalize(). The canonicalization still affects the per-agg filter, as in STATS c = count(field) WHERE other_field*1 > 10
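
As a rough illustration of a per-agg filter that differs from its canonical form, the sketch below builds a lookup key from a toy canonicalization that sorts the operands of a product. Whether the real canonical() reorders this particular expression is an assumption, and `FilterCanonicalSketch` is a hypothetical name.

```java
import java.util.Arrays;

// Toy model only: the filter is split by hand into "left > right".
final class FilterCanonicalSketch {
    // Toy canonical form: strip spaces and sort the operands of a single '*' product.
    static String canonicalProduct(String product) {
        String[] operands = product.replace(" ", "").split("\\*");
        Arrays.sort(operands);
        return String.join("*", operands);
    }

    // Key for "agg WHERE left > right"; only the left side is canonicalized in this sketch.
    static String key(String agg, String filterLeft, String filterRight) {
        return agg + " WHERE " + canonicalProduct(filterLeft) + " > " + filterRight;
    }
}
```

Under that assumption, count(field) WHERE other_field*1 > 10 and count(field) WHERE 1 * other_field > 10 produce the same key, which is the kind of case the existing tests apparently do not cover.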

Contributor Author

I don't understand the explanation of why it is important to keep af.canonical() here. I'd swear that at some point I had to change this because tests were breaking otherwise. But if all tests pass with it, does that mean that either it is not that important or that we are missing specific tests that would break because of this?

  Expression aggExpression = child.transformUp(AggregateFunction.class, af -> {
-     AggregateFunction canonical = (AggregateFunction) af.canonical();
+     // canonical representation, with resolved aliases
+     AggregateFunction canonical = (AggregateFunction) af.canonical().transformUp(e -> aliases.resolve(e, e));
Contributor

Maybe we should use a helper function for this line to prevent this from being different from how we canonicalize agg functions above (line 91)?
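
A sketch of what such a helper could look like, using strings and a plain Map in place of the real expression and alias types. The method name `canonicalWithResolvedAliases` and the class `CanonicalHelperSketch` are hypothetical; the only point is that both call sites would go through one function.

```java
import java.util.Locale;
import java.util.Map;

// Toy model only: an aggregate call is a string of the form "function(argument)".
final class CanonicalHelperSketch {
    // Hypothetical single entry point: normalize the call, then resolve its argument
    // through the alias map, mirroring "canonical, with resolved aliases".
    static String canonicalWithResolvedAliases(String aggCall, Map<String, String> aliases) {
        int open = aggCall.indexOf('(');
        int close = aggCall.lastIndexOf(')');
        String function = aggCall.substring(0, open).trim().toLowerCase(Locale.ROOT);
        String argument = aggCall.substring(open + 1, close).trim();
        return function + "(" + aliases.getOrDefault(argument, argument) + ")";
    }
}
```

If both the single-aggregate path and the nested-expression path call the same helper, they cannot drift apart in how they compute the lookup key.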

891 | 1782 | 8
;

fixClassCastBug2
Contributor

Nit: it would be nice to avoid numbers in test case names.
A more descriptive name could also hint at which aspects were broken before:

  • combining results of two aggregate functions
  • nesting functions
  • multiplying result by constant
  • etc

PUSHING_DOWN_EVAL_WITH_SCORE,

/**
* Fix for ClassCastException in STATS
Contributor

Suggested change
- * Fix for ClassCastException in STATS
+ * Fix for ClassCastException in STATS
+ * https://github.com/elastic/elasticsearch/issues/133992

I realize it is a bit tricky to describe the change in Javadoc.
It might be worth linking the issue, as it has a more detailed description.
There are some prior examples with such links.

@ncordon
Contributor Author

ncordon commented Nov 3, 2025

That said, I don't think this addresses problems like

| stats median(foo), percentile(foo, 50), count_distinct(foo)

@alex-spies, I've now included another planning phase after the constant folding that should take care of cases like these that @astefan suggested.

The PR description says this partially addresses #133992; what else is not yet addressed?

Philosophically, I don't think the design of the compute engine part is correct at the moment. We try to optimize at runtime to avoid computing duplicates, and that breaks when the plan is not optimal because we end up accessing the wrong positions in our buffers.

For many (if not all) of the tests I added, the plans were correct (but not optimal), yet we throw at runtime. I've been in touch with @dnhatn about this part, and he's helping me solve it.

* becomes
* stats a = min(x), c = count(*) by g | eval b = a, d = c | keep a, b, c, d, g
*/
public final class ReplaceDuplicatedAggs extends OptimizerRules.OptimizerRule<Aggregate> implements OptimizerRules.CoordinatorOnly {
Contributor Author

Please ignore the code in this file that duplicates ReplaceAggregateAggExpressionWithEval.java; I'll try to share as much code as possible once I've checked that this passes all tests.
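
For readers skimming the Javadoc above, here is a string-level sketch of the rewrite it describes: keep the first occurrence of each aggregate in STATS and turn later duplicates into EVAL assignments. The input used in the usage note is an assumption inferred from the "becomes" line, and `DuplicatedAggsSketch` is a hypothetical name; the real rule operates on the logical plan.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: aggregates are strings keyed by their output name.
final class DuplicatedAggsSketch {
    static String rewrite(Map<String, String> aggsByName, String grouping) {
        Map<String, String> firstNameByAgg = new LinkedHashMap<>(); // aggregate -> first name seen
        List<String> evals = new ArrayList<>();
        List<String> keep = new ArrayList<>();
        for (Map.Entry<String, String> e : aggsByName.entrySet()) {
            String first = firstNameByAgg.putIfAbsent(e.getValue(), e.getKey());
            if (first != null) {
                // Duplicate aggregate: compute it once and copy the value instead.
                evals.add(e.getKey() + " = " + first);
            }
            keep.add(e.getKey());
        }
        keep.add(grouping);
        List<String> stats = new ArrayList<>();
        firstNameByAgg.forEach((agg, name) -> stats.add(name + " = " + agg));
        String query = "stats " + String.join(", ", stats) + " by " + grouping;
        if (!evals.isEmpty()) {
            // The trailing KEEP restores the original column order.
            query += " | eval " + String.join(", ", evals) + " | keep " + String.join(", ", keep);
        }
        return query;
    }
}
```

Assuming an input of a = min(x), b = min(x), c = count(*), d = count(*) grouped by g, the sketch yields the same shape as the Javadoc's output line.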


Labels

:Analytics/ES|QL AKA ESQL >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:ES|QL v9.3.0
