Describe the bug
The Databricks premerge pipeline failed in the rapids-databricks_premerge-github Jenkins job because private_optimizer_subquery_shared_scan_test.py::test_optimize_subquery_shared_scan checks for the exact optimized-plan substring generated_agg_list.
The optimized plan produced during the failing run appears to contain the expected shared-scan rewrite shape, but the final printed alias is generated_aggs instead of generated_agg_list. Because the test validates a display alias string rather than the logical structure of the rewrite, it reports that the rule did not fire even though the plan contains the combined named_struct(c_0, ..., c_1, ..., c_2, ...) aggregate and scalar subquery field extraction through .c_0, .c_1, and .c_2.
Sanitized failure excerpt:

```text
private_optimizer_subquery_shared_scan_test.py::test_optimize_subquery_shared_scan
> assert_rule_fires(fn, on, off, marker="generated_agg_list")
E AssertionError: rule did not fire: marker 'generated_agg_list' absent with rule ON
E Project [scalar-subquery#... [].c_0 AS a#..., scalar-subquery#... [].c_1 AS b#..., scalar-subquery#... [].c_2 AS c#...]
E : :- Aggregate [named_struct(c_0, max(if ((g#... = 1)) id#... else null), c_1, min(if ((g#... = 2)) id#... else null), c_2, count(if ((g#... = 3)) 1 else null)) AS generated_aggs#...]
E : : +- Filter (((g#... = 1) OR (g#... = 2)) OR (g#... = 3))
E : : +- Relation [id#...,g#...] parquet
E : :- Aggregate [named_struct(c_0, max(if ((g#... = 1)) id#... else null), c_1, min(if ((g#... = 2)) id#... else null), c_2, count(if ((g#... = 3)) 1 else null)) AS generated_aggs#...]
E : +- Aggregate [named_struct(c_0, max(if ((g#... = 1)) id#... else null), c_1, min(if ((g#... = 2)) id#... else null), c_2, count(if ((g#... = 3)) 1 else null)) AS generated_aggs#...]
E +- OneRowRelation
```
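The combined Aggregate in the excerpt appears to replace three scalar subqueries (a, b and c) that would otherwise each scan the relation. In their place it uses a single scan, filtered on g = 1, 2 or 3, whose conditional max/min/count results are packed into a named_struct and read back through .c_0, .c_1 and .c_2. Below is a conceptual sketch of that equivalence in plain Python. It uses no Spark and is not the optimizer's implementation. The rows are made-up sample data, and the function names are hypothetical.

```python
# Conceptual sketch only: contrasts separate scalar subqueries with a single
# shared scan that fills a struct-like dict with fields c_0, c_1, c_2.
# Column names (id, g) mirror the sanitized plan; the data is made up.

rows = [
    {"id": 5, "g": 1},
    {"id": 9, "g": 1},
    {"id": 3, "g": 2},
    {"id": 7, "g": 2},
    {"id": 4, "g": 3},
    {"id": 8, "g": 4},  # matches none of the three predicates
]

def separate_subqueries(rows):
    # Before the rewrite: each scalar subquery scans the relation on its own.
    a = max((r["id"] for r in rows if r["g"] == 1), default=None)
    b = min((r["id"] for r in rows if r["g"] == 2), default=None)
    c = sum(1 for r in rows if r["g"] == 3)  # count() of non-null values
    return a, b, c

def shared_scan(rows):
    # After the rewrite: one pass over rows that satisfy any predicate
    # (the Filter), with conditional aggregates (max/min/count over
    # if(...) else null) accumulated into a single struct.
    acc = {"c_0": None, "c_1": None, "c_2": 0}
    for r in rows:
        g, v = r["g"], r["id"]
        if g not in (1, 2, 3):
            continue
        if g == 1:
            acc["c_0"] = v if acc["c_0"] is None else max(acc["c_0"], v)
        elif g == 2:
            acc["c_1"] = v if acc["c_1"] is None else min(acc["c_1"], v)
        else:
            acc["c_2"] += 1
    # The outer Project then extracts .c_0, .c_1 and .c_2 as a, b and c.
    return acc["c_0"], acc["c_1"], acc["c_2"]

print(separate_subqueries(rows))  # (9, 3, 1)
print(shared_scan(rows))          # (9, 3, 1)
```

Both shapes compute the same values, which is why the alias chosen for the combined aggregate carries no meaning for correctness.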
This looks like a test-marker robustness issue rather than a functional regression in the optimizer rule.
The test currently expects:

```python
assert_rule_fires(fn, on, off, marker="generated_agg_list")
```
Expected Behavior
The test should pass when the private optimizer shared-scan rewrite is present, regardless of non-semantic alias text chosen or normalized by later optimizer processing.
Investigation Notes
- The public test checks optimizedPlan().toString() for the exact string generated_agg_list.
- The private optimizer artifact/implementation still constructs the combined aggregate using an alias named generated_agg_list.
- The same implementation also gives the struct fields stable names such as c_0, c_1, and c_2.
- The final optimized plan printed by the Databricks runtime used generated_aggs, while preserving the combined struct and field extraction that indicate the rewrite happened.
- The private-side unit coverage checks for the structural shape of the combined aggregate rather than asserting the exact printed alias name.
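These notes can be reproduced without a cluster. The following minimal sketch runs the test's current substring check against a string abridged from the sanitized excerpt. It drops pytest's E prefixes and most of the tree characters.

```python
# Abridged from the sanitized ON plan; "#..." expression IDs are kept as-is.
on_plan = (
    "Project [scalar-subquery#... [].c_0 AS a#..., "
    "scalar-subquery#... [].c_1 AS b#..., scalar-subquery#... [].c_2 AS c#...]\n"
    ":- Aggregate [named_struct(c_0, max(if ((g#... = 1)) id#... else null), "
    "c_1, min(if ((g#... = 2)) id#... else null), "
    "c_2, count(if ((g#... = 3)) 1 else null)) AS generated_aggs#...]"
)

# The current check: exact substring match on the outer alias.
print("generated_agg_list" in on_plan)  # False, so the test reports "rule did not fire"

# The structural evidence of the rewrite is nevertheless present.
print("named_struct(c_0, " in on_plan)                              # True
print(all(f"].{f} AS" in on_plan for f in ("c_0", "c_1", "c_2")))   # True
```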
Suggested Fix
Update the integration test so that it no longer relies on the exact outer alias generated_agg_list.
Preferred options:
1. Use a more structural marker or predicate that validates the shared-scan rewrite shape, such as the presence of the c_0/c_1/c_2 named struct fields and scalar-subquery field extraction.
2. Extend the test helper to accept multiple valid ON markers, with both generated_agg_list and generated_aggs as accepted aliases.
3. If feasible from Python, inspect the logical plan structure instead of matching optimizedPlan().toString().
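Here is a sketch of the first option. It assumes the printed plan keeps the named_struct(c_0, ...) and scalar-subquery#... [].c_N forms seen in the excerpt. has_shared_scan_shape is a hypothetical helper and is not part of the existing test utilities. It still works on the printed string, but it keys on the struct field names that the rule assigns, not on the outer alias.

```python
import re

def has_shared_scan_shape(plan_text, num_fields=3):
    """Hypothetical predicate: True if plan_text packs aggregates into a
    named_struct with fields c_0..c_{n-1} and the outer projection extracts
    every one of those fields from a scalar subquery. The alias on the
    combined aggregate (generated_agg_list, generated_aggs, ...) is ignored."""
    # named_struct(c_0, <agg>, c_1, <agg>, ..., c_{n-1}, <agg>) on one line.
    struct_re = re.compile(
        re.escape("named_struct(")
        + ".*?".join(re.escape(f"c_{i}, ") for i in range(num_fields))
    )
    if not struct_re.search(plan_text):
        return False
    # Field extraction such as "scalar-subquery#12 [].c_0".
    field_re = re.compile(r"scalar-subquery#\S* \[\]\.(c_\d+)")
    extracted = set(field_re.findall(plan_text))
    expected = {f"c_{i}" for i in range(num_fields)}
    return expected <= extracted

# Example: passes regardless of which alias the runtime prints.
plan = (
    "Project [scalar-subquery#1 [].c_0 AS a#2, scalar-subquery#1 [].c_1 AS b#3, "
    "scalar-subquery#1 [].c_2 AS c#4]\n"
    ":- Aggregate [named_struct(c_0, max(id#5), c_1, min(id#5), "
    "c_2, count(1)) AS generated_aggs#6]"
)
print(has_shared_scan_shape(plan))  # True
```

The trade-off is that this check survives alias renaming but would still break if the rule renamed its struct fields. Inspecting the plan tree itself (the third option) would remove that dependence as well.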
The most robust fix is structural validation. Accepting both aliases would unblock the immediate failure, but it would still leave the test coupled to printed plan aliases.
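For the stopgap, a hypothetical helper that accepts either alias might look like the sketch below. How it would plug into assert_rule_fires depends on that helper's internals, and those are not shown in this issue.

```python
# Hypothetical stopgap: accept any of several known aliases for the combined
# aggregate. This unblocks the immediate failure but keeps the test coupled
# to printed alias text.
ACCEPTED_MARKERS = ("generated_agg_list", "generated_aggs")

def marker_present(plan_text, markers=ACCEPTED_MARKERS):
    """Return the first accepted marker found in plan_text, or None."""
    return next((m for m in markers if m in plan_text), None)

print(marker_present("Aggregate [...] AS generated_aggs#7]"))  # generated_aggs
```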
Impact
This can cause unrelated PRs to fail Databricks premerge when the optimizer rewrite is present but the final printed plan alias differs from the test's hardcoded marker.
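Observed Failure
The failure was seen in PR #15064. With the rule enabled, the ON plan included the shape shown in the sanitized failure excerpt above, while the test expects the marker generated_agg_list.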