[BUG] Private optimizer subquery shared-scan IT uses brittle plan alias marker #15073

Description

@amahussein

Describe the bug

The Databricks premerge pipeline failed in the rapids-databricks_premerge-github Jenkins job because private_optimizer_subquery_shared_scan_test.py::test_optimize_subquery_shared_scan checks for the exact optimized-plan substring generated_agg_list.

The optimized plan produced during the failing run appears to contain the expected shared-scan rewrite shape, but the final printed alias is generated_aggs instead of generated_agg_list. Because the test validates a display alias string rather than the logical structure of the rewrite, it reports that the rule did not fire even though the plan contains the combined named_struct(c_0, ..., c_1, ..., c_2, ...) aggregate and scalar subquery field extraction through .c_0, .c_1, and .c_2.

Sanitized failure excerpt:

private_optimizer_subquery_shared_scan_test.py::test_optimize_subquery_shared_scan

> assert_rule_fires(fn, on, off, marker="generated_agg_list")

E AssertionError: rule did not fire: marker 'generated_agg_list' absent with rule ON
E Project [scalar-subquery#... [].c_0 AS a#..., scalar-subquery#... [].c_1 AS b#..., scalar-subquery#... [].c_2 AS c#...]
E :  :- Aggregate [named_struct(c_0, max(if ((g#... = 1)) id#... else null), c_1, min(if ((g#... = 2)) id#... else null), c_2, count(if ((g#... = 3)) 1 else null)) AS generated_aggs#...]
E :  :  +- Filter (((g#... = 1) OR (g#... = 2)) OR (g#... = 3))
E :  :     +- Relation [id#...,g#...] parquet
E :  :- Aggregate [named_struct(c_0, max(if ((g#... = 1)) id#... else null), c_1, min(if ((g#... = 2)) id#... else null), c_2, count(if ((g#... = 3)) 1 else null)) AS generated_aggs#...]
E :  +- Aggregate [named_struct(c_0, max(if ((g#... = 1)) id#... else null), c_1, min(if ((g#... = 2)) id#... else null), c_2, count(if ((g#... = 3)) 1 else null)) AS generated_aggs#...]
E +- OneRowRelation

This looks like a test-marker robustness issue rather than a functional regression in the optimizer rule.

Observed Failure

Failure seen in PR #15064:

AssertionError: rule did not fire: marker 'generated_agg_list' absent with rule ON

The ON plan included this shape:

Project [scalar-subquery#... [].c_0 AS a#..., scalar-subquery#... [].c_1 AS b#..., scalar-subquery#... [].c_2 AS c#...]
:  :- Aggregate [named_struct(c_0, max(if ((g#... = 1)) id#... else null), c_1, min(if ((g#... = 2)) id#... else null), c_2, count(if ((g#... = 3)) 1 else null)) AS generated_aggs#...]

The test currently expects:

assert_rule_fires(fn, on, off, marker="generated_agg_list")

Expected Behavior

The test should pass when the private optimizer shared-scan rewrite is present, regardless of non-semantic alias text chosen or normalized by later optimizer processing.

Investigation Notes

  • The public test checks optimizedPlan().toString() for the exact string generated_agg_list.
  • The private optimizer artifact/implementation still constructs the combined aggregate using an alias named generated_agg_list.
  • The same implementation also gives the struct fields stable names such as c_0, c_1, and c_2.
  • The final optimized plan printed by the Databricks runtime used generated_aggs, while preserving the combined struct and field extraction that indicate the rewrite happened.
  • The private-side unit coverage checks for the structural shape of the combined aggregate rather than asserting the exact printed alias name.
  • The failure appears unrelated to the functional changes in PR #15064 (Fix skip-merge shuffle handle lifetime).

Suggested Fix

Update the integration test to avoid relying on the exact outer alias generated_agg_list.

Preferred options:

  1. Use a more structural marker or predicate that validates the shared-scan rewrite shape, such as the presence of the c_0/c_1/c_2 named struct fields and scalar-subquery field extraction.
  2. Extend the test helper to accept multiple valid ON markers and use both generated_agg_list and generated_aggs as accepted aliases.
  3. If feasible from Python, inspect the logical plan structure instead of matching optimizedPlan().toString().
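As a rough illustration of option 2, the check below accepts any one of several aliases. This is a sketch only: `assert_any_marker_fires` and `ACCEPTED_ALIASES` are hypothetical names, it works on plan strings rather than the `fn`/`on`/`off` arguments of the real `assert_rule_fires` helper, and the OFF-plan check is an assumption about what that helper does.

```python
from typing import Iterable

# Aliases seen so far: the one the rule constructs and the one Databricks printed.
ACCEPTED_ALIASES = ("generated_agg_list", "generated_aggs")


def assert_any_marker_fires(plan_on: str, plan_off: str, markers: Iterable[str]) -> str:
    """Hypothetical helper: pass if at least one marker is in the ON plan
    and no marker is in the OFF plan. Returns the first marker that matched."""
    markers = list(markers)
    found = [m for m in markers if m in plan_on]
    assert found, f"rule did not fire: none of {markers} present with rule ON\n{plan_on}"
    # Assumption: the rule-OFF plan must not contain any of the markers.
    leaked = [m for m in markers if m in plan_off]
    assert not leaked, f"markers {leaked} present with rule OFF\n{plan_off}"
    return found[0]
```

This unblocks the current failure, but any third alias would break the test again.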

The most robust fix is structural validation. Accepting both aliases would unblock the immediate failure, but it would still leave the test coupled to printed plan aliases.
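A structural check along the lines of option 1 could look roughly like the following. It is a minimal sketch that still reads the printed plan text, and it assumes the `named_struct(c_N, ...)` and `scalar-subquery#... [].c_N` forms shown in the failure excerpt. `has_shared_scan_shape` and the `min_fields` parameter are hypothetical names, not part of the existing test helpers.

```python
import re

# Field names inside the combined aggregate, e.g. "c_0, " in named_struct(c_0, max(...), ...).
_STRUCT_FIELD = re.compile(r"\bc_(\d+), ")
# Field extraction from the scalar subquery, e.g. "scalar-subquery#12 [].c_0".
_FIELD_EXTRACT = re.compile(r"scalar-subquery#\S* \[[^\]]*\]\.c_(\d+)")


def has_shared_scan_shape(plan_text: str, min_fields: int = 2) -> bool:
    """Return True if the plan shows a combined named_struct aggregate whose
    fields are read back through scalar-subquery field extraction.
    The outer alias (generated_agg_list, generated_aggs, ...) is ignored."""
    struct_fields = set()
    for line in plan_text.splitlines():
        if "Aggregate [named_struct(" in line:
            struct_fields.update(_STRUCT_FIELD.findall(line))
    extracted = set(_FIELD_EXTRACT.findall(plan_text))
    # Every extracted field must exist in the combined struct.
    return len(extracted) >= min_fields and extracted <= struct_fields


# ON plan shape taken from the sanitized failure excerpt in this issue.
ON_PLAN = """\
Project [scalar-subquery#... [].c_0 AS a#..., scalar-subquery#... [].c_1 AS b#..., scalar-subquery#... [].c_2 AS c#...]
:  :- Aggregate [named_struct(c_0, max(if ((g#... = 1)) id#... else null), c_1, min(if ((g#... = 2)) id#... else null), c_2, count(if ((g#... = 3)) 1 else null)) AS generated_aggs#...]
+- OneRowRelation
"""

assert has_shared_scan_shape(ON_PLAN)
```

Because the predicate never looks at the text after `AS`, it should hold for either alias. It is still tied to how the plan is printed, so inspecting the logical plan objects (option 3) would be sturdier if that turns out to be practical.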

Impact

This can cause unrelated PRs to fail Databricks premerge when the optimizer rewrite is present but the final printed plan alias differs from the test's hardcoded marker.

Metadata


Labels

bot_watch (Slack bot watched issue for LLM analyzer), bug (Something isn't working), test (Only impacts tests)
