[Coverage] Enable 4 private optimizer rules from public IT (AggPushdown / DecomposeStddevPop / SkewedBHJ / SubquerySharedScan) #14900

@wjxiz1992

Coverage gap

4 private optimizer rules show low Line coverage in the nightly JaCoCo W0 baseline (2026-05-27, build b3, anchor 972be678). Three of them are near zero. In the table, LM is lines missed and LC is lines covered:

| Rule | LM | LC | Line% |
| --- | --- | --- | --- |
| AggPushdownRule | 210 | 5 | 2.3% |
| DecomposeStddevPop | 168 | 4 | 2.3% |
| OptimizeSkewedBHJJoinRule | 80 | 6 | 7.0% |
| OptimizeSubquerySharedScanRule | 60 | 32 | 34.8% |

These 4 LC values (5/4/6/32) correspond to the rules' early-return paths — every apply(plan) call in public IT hits the conf check and exits because each rule's gate conf is default-off. The 4 rules ARE registered in every public IT run via SQLPlugin's extension chain (which includes SQLOptimizerPlugin); they just don't transform any plans.
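The Line% column is consistent with LC / (LM + LC), reading LM as lines missed and LC as lines covered. That reading is an assumption based on JaCoCo's usual counters, since the issue does not define the abbreviations. A minimal Python check of the baseline numbers:

```python
# Assumption: LM = lines missed, LC = lines covered (JaCoCo-style counters).
# The (LM, LC) pairs are copied from the baseline table in this issue.
BASELINE = {
    "AggPushdownRule": (210, 5),
    "DecomposeStddevPop": (168, 4),
    "OptimizeSkewedBHJJoinRule": (80, 6),
    "OptimizeSubquerySharedScanRule": (60, 32),
}


def line_pct(missed: int, covered: int) -> float:
    """Line coverage as a percentage, rounded to one decimal place."""
    return round(100.0 * covered / (missed + covered), 1)


for rule, (lm, lc) in BASELINE.items():
    print(f"{rule}: {line_pct(lm, lc)}%")
```

Under that assumption the computed values reproduce the reported 2.3%, 2.3%, 7.0% and 34.8%.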

What this PR does

Adds 4 Python IT tests under a new pytest marker @pytest.mark.private_optimizer, each flipping its rule's gate conf via test conf={} and producing a query that satisfies the rule's apply() predicate. The tests run in the same it-cuda13 JVM as the rest of the IT (JaCoCo agent attached), so the rule bodies execute and contribute to the nightly aggregate.
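As a rough illustration of the conf flipping described above, the sketch below builds one conf dict per rule. The per-rule gate keys are HYPOTHETICAL placeholders (the issue does not name them), and `rule_conf` is an illustrative helper, not part of the test harness. Only `spark.rapids.sql.private.enabled` is taken from this issue.

```python
from typing import Dict, Optional

# Taken from this issue: master enable that some private rules also need.
PRIVATE_MASTER_ENABLE = "spark.rapids.sql.private.enabled"

# HYPOTHETICAL gate keys, placeholders only: the issue does not name them.
HYPOTHETICAL_GATE_KEYS = {
    "AggPushdownRule": "hypothetical.gate.aggPushdown",
    "DecomposeStddevPop": "hypothetical.gate.decomposeStddevPop",
    "OptimizeSkewedBHJJoinRule": "hypothetical.gate.skewedBHJ",
    "OptimizeSubquerySharedScanRule": "hypothetical.gate.subquerySharedScan",
}


def rule_conf(rule: str, needs_master: bool = False,
              extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build the conf dict that turns one default-off rule on for one test."""
    conf = {HYPOTHETICAL_GATE_KEYS[rule]: "true"}
    if needs_master:
        conf[PRIVATE_MASTER_ENABLE] = "true"
    conf.update(extra or {})  # per-test extras, e.g. AQE settings
    return conf


# DecomposeStddevPop needs the master enable on top of its own gate.
stddev_conf = rule_conf("DecomposeStddevPop", needs_master=True)
print(stddev_conf)
```

In a real test the resulting dict would be passed as the test's conf={} argument, and the test function would carry the private_optimizer marker.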

Probe verification (local jacococli + Artifactory nightly-ut.exec)

Single-query probes (one query per rule, run locally with JaCoCo agent) measured outer-class LC deltas vs the W0 baseline:

| Rule | Baseline LC | Probe LC | Δ outer LC |
| --- | --- | --- | --- |
| AggPushdownRule | 5 | 27 | +22 |
| DecomposeStddevPop | 4 | 158 | +154 |
| OptimizeSkewedBHJJoinRule | 6 | 11 | +5 |
| OptimizeSubquerySharedScanRule | 32 | 83 | +51 |
| Combined outer | | | +232 LC |

The probes also covered ~115 LC across inner-class anonfun closures and helper classes, which will appear as new rows in the next nightly. Extending each rule to multiple test cases (varying agg fns, distinct subquery shapes, skew patterns) is expected to push the combined contribution to +0.55–0.85pp on the global sql-plugin Line% KPI.
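The probe arithmetic can be re-derived in a few lines of Python. The baseline and probe LC values come from the probe table. `uplift_pp` is an illustrative helper that assumes newly covered lines simply move from missed to covered while the total line count stays fixed. The issue does not state the global total, so it is left as a parameter.

```python
# (baseline LC, probe LC) per rule, copied from the probe table in this issue.
PROBE = {
    "AggPushdownRule": (5, 27),
    "DecomposeStddevPop": (4, 158),
    "OptimizeSkewedBHJJoinRule": (6, 11),
    "OptimizeSubquerySharedScanRule": (32, 83),
}
APPROX_INNER_LC = 115  # "~115 LC" for closures and helpers, per the issue

deltas = {rule: probe - base for rule, (base, probe) in PROBE.items()}
combined_outer = sum(deltas.values())
combined_with_inner = combined_outer + APPROX_INNER_LC


def uplift_pp(delta_lc: int, total_lines: int) -> float:
    """Percentage-point change in Line% for a fixed total line count.

    total_lines is NOT given in the issue; callers must supply it.
    """
    return 100.0 * delta_lc / total_lines


print(deltas)
print("combined outer:", combined_outer)  # 232
print("outer + inner (approx):", combined_with_inner)  # 347
```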

Implementation notes

  • DecomposeStddevPop requires the master enable spark.rapids.sql.private.enabled=true in addition to its per-rule conf, and its test includes an explicit CPU-GPU parity assertion (the rule has a documented catastrophic-cancellation risk when the mean is large relative to the stddev).
  • OptimizeSkewedBHJJoinRule requires AQE + lowered spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes (1 MB) + raised spark.sql.autoBroadcastJoinThreshold so the BHJ doesn't get auto-converted to a small-side broadcast that bypasses the rule.
  • OptimizeSubquerySharedScanRule requires a LogicalRelation target (parquet file or registered temp table); spark.range() doesn't qualify because Range is not a LogicalRelation.
  • All 4 tests register under the new private_optimizer marker in pytest.ini and run alongside the rest of it-cuda13 (no separate Jenkins lane).
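To illustrate why a parity assertion matters for a decomposed stddev_pop, the sketch below compares the textbook sum-of-squares form, sqrt(E[x²] − E[x]²), with a two-pass computation. It assumes the rule rewrites the aggregate into sum, sum-of-squares and count terms. The issue does not spell out the decomposition, so treat this as a generic float64 demonstration rather than the rule's actual arithmetic.

```python
import math


def stddev_pop_naive(xs):
    """sqrt(E[x^2] - E[x]^2): single-pass sums, prone to cancellation."""
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    # Rounding can push the difference below zero, so clamp before sqrt.
    return math.sqrt(max(mean_sq - mean * mean, 0.0))


def stddev_pop_two_pass(xs):
    """Subtract the mean first, then average the squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)


small_mean = [4.0, 7.0, 13.0, 16.0]
large_mean = [1e9 + x for x in small_mean]  # same spread, huge offset

for name, data in (("small mean", small_mean), ("large mean", large_mean)):
    print(name, stddev_pop_naive(data), stddev_pop_two_pass(data))
```

Both inputs have the same spread (population variance 22.5), but with the 1e9 offset the squares are near 1e18, where float64 spacing exceeds the variance itself, so the naive form cannot recover it. A parity test would want at least one column shaped like `large_mean`.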

Tracking

This issue is PR-B-coverage under epic #14899 (sql-plugin Line coverage uplift). The earlier infra issue #14898 was closed not-planned after a probe confirmed SQLPlugin already chains SQLOptimizerPlugin into spark.sql.extensions — no infra change is needed.

The mandatory pre-PR gates from the epic (line-level pre-verification + jacococli merge net-Δ check) are completed for this issue's scope. The PR body will cite the verified numbers.

Labels: ? - Needs Triage (Need team to review and classify); test (Only impacts tests)
