Coverage gap
4 private optimizer rules show low Line coverage (near zero for three of them) in the nightly JaCoCo W0 baseline (2026-05-27, build b3, anchor `972be678`):
| Rule | LM (lines missed) | LC (lines covered) | Line% |
|---|---|---|---|
| `AggPushdownRule` | 210 | 5 | 2.3% |
| `DecomposeStddevPop` | 168 | 4 | 2.3% |
| `OptimizeSkewedBHJJoinRule` | 80 | 6 | 7.0% |
| `OptimizeSubquerySharedScanRule` | 60 | 32 | 34.8% |
These 4 LC values (5/4/6/32) correspond to the rules' early-return paths: every `apply(plan)` call in public IT hits the conf check and exits, because each rule's gate conf is default-off. The 4 rules ARE registered in every public IT run via `SQLPlugin`'s extension chain (which includes `SQLOptimizerPlugin`); they just don't transform any plans.
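The Line% column follows from the other two, since JaCoCo's line coverage ratio is LC / (LM + LC). A short Python check that reproduces the column from the W0 numbers:

```python
# (lines missed, lines covered) per rule, from the W0 baseline table.
W0_BASELINE = {
    "AggPushdownRule": (210, 5),
    "DecomposeStddevPop": (168, 4),
    "OptimizeSkewedBHJJoinRule": (80, 6),
    "OptimizeSubquerySharedScanRule": (60, 32),
}


def line_pct(missed, covered):
    """JaCoCo-style line coverage: covered / (missed + covered), in percent."""
    total = missed + covered
    return 100.0 * covered / total if total else 0.0


for rule, (lm, lc) in W0_BASELINE.items():
    print(f"{rule}: {line_pct(lm, lc):.1f}%")
```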
What this PR does
Adds 4 Python IT tests under a new pytest marker, `@pytest.mark.private_optimizer`. Each test flips its rule's gate conf via the test's `conf={}` argument and runs a query that satisfies the rule's `apply()` predicate. The tests run in the same `it-cuda13` JVM as the rest of the IT (JaCoCo agent attached), so the rule bodies execute and contribute to the nightly aggregate.
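A minimal sketch of how a test could flip a default-off rule gate and carry the new marker. The per-rule gate keys below are hypothetical placeholders, because the issue does not name them. `spark.rapids.sql.private.enabled`, AQE, the 1MB skew threshold and the broadcast-threshold key come from this issue's implementation notes. The 100MB broadcast threshold is only an illustrative value. The harness helper mentioned in the comment is assumed from the spark-rapids integration tests and may differ in this repo.

```python
import pytest

# Named in this issue: the master enable that DecomposeStddevPop also needs.
MASTER_ENABLE = "spark.rapids.sql.private.enabled"

# HYPOTHETICAL per-rule gate keys; the real conf names are not given here.
RULE_GATES = {
    "AggPushdownRule": "spark.rapids.sql.private.aggPushdown.enabled",
    "DecomposeStddevPop": "spark.rapids.sql.private.decomposeStddevPop.enabled",
    "OptimizeSkewedBHJJoinRule": "spark.rapids.sql.private.skewedBHJ.enabled",
    "OptimizeSubquerySharedScanRule": "spark.rapids.sql.private.subquerySharedScan.enabled",
}

# Standard Spark confs for the skewed-BHJ test. The issue only says the
# broadcast threshold is "raised"; 100MB is an illustrative value.
SKEWED_BHJ_CONF = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes": "1MB",
    "spark.sql.autoBroadcastJoinThreshold": "100MB",
}


def rule_conf(rule):
    """conf={} dict that turns one default-off rule on."""
    conf = {RULE_GATES[rule]: "true"}
    if rule == "DecomposeStddevPop":
        conf[MASTER_ENABLE] = "true"
    elif rule == "OptimizeSkewedBHJJoinRule":
        conf.update(SKEWED_BHJ_CONF)
    return conf


def shared_scan_source(spark, path):
    """File-backed input for OptimizeSubquerySharedScanRule: spark.range()
    plans a Range node, not a LogicalRelation, so write parquet and read it
    back."""
    spark.range(1000).selectExpr("id", "id % 7 AS k") \
        .write.mode("overwrite").parquet(path)
    return spark.read.parquet(path)


@pytest.mark.private_optimizer
def test_decompose_stddev_pop_conf():
    # A real test would hand rule_conf(...) to the IT harness, e.g.
    # assert_gpu_and_cpu_are_equal_collect(query, conf=...) (assumed API).
    conf = rule_conf("DecomposeStddevPop")
    assert conf[MASTER_ENABLE] == "true"
    assert conf[RULE_GATES["DecomposeStddevPop"]] == "true"
```

With the marker registered, `pytest -m private_optimizer` selects just these tests locally. In CI they run with the rest of `it-cuda13`.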
Probe verification (local `jacococli` + Artifactory `nightly-ut.exec`)
Single-query probes (one query per rule, run locally with the JaCoCo agent) measured the following outer-class LC deltas vs the W0 baseline:
| Rule | Baseline LC | Probe LC | Δ outer LC |
|---|---|---|---|
| `AggPushdownRule` | 5 | 27 | +22 |
| `DecomposeStddevPop` | 4 | 158 | +154 |
| `OptimizeSkewedBHJJoinRule` | 6 | 11 | +5 |
| `OptimizeSubquerySharedScanRule` | 32 | 83 | +51 |
| Combined outer | 47 | 279 | +232 |
In addition, ~115 LC across inner-class anonfun closures and helper classes will appear as new rows in the next nightly. Extending each rule to multiple test cases (varying aggregate functions, distinct subquery shapes, skew patterns) is expected to raise the combined contribution to +0.55–0.85pp on the global `sql-plugin` Line% KPI.
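The Δ column is probe LC minus baseline LC. A gain in a global Line% KPI, in percentage points, is the number of newly covered lines divided by the global line total (LM + LC). A small Python sketch of both calculations follows. The `sql-plugin` line total is not given in this issue, so `pp_uplift` takes it as a parameter.

```python
# Outer-class LC per rule: (W0 baseline, single-query probe).
OUTER_LC = {
    "AggPushdownRule": (5, 27),
    "DecomposeStddevPop": (4, 158),
    "OptimizeSkewedBHJJoinRule": (6, 11),
    "OptimizeSubquerySharedScanRule": (32, 83),
}


def outer_deltas(table):
    """Per-rule delta LC = probe LC - baseline LC."""
    return {rule: probe - base for rule, (base, probe) in table.items()}


def pp_uplift(newly_covered, total_lines):
    """Percentage-point gain in a global Line% KPI, assuming the newly
    covered lines were already counted as missed, so total_lines
    (LM + LC) does not change."""
    return 100.0 * newly_covered / total_lines


deltas = outer_deltas(OUTER_LC)
print(deltas)
print(sum(deltas.values()))  # 232
```

Classes that first show up as new rows also add to the denominator, so for those lines the formula is only an approximation.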
Implementation notes
- `DecomposeStddevPop` requires the master enable `spark.rapids.sql.private.enabled=true` in addition to its per-rule conf. Its test includes an explicit CPU-GPU parity assertion, because the rule has a documented catastrophic-cancellation risk when the mean is large relative to the stddev.
- `OptimizeSkewedBHJ` requires AQE, a lowered `spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes` (1MB), and a raised `spark.sql.autoBroadcastJoinThreshold`, so the BHJ doesn't get auto-converted to a small-side broadcast that bypasses the rule.
- `OptimizeSubquerySharedScan` requires a `LogicalRelation` target (a parquet file or registered temp table). `spark.range()` doesn't qualify because `Range` is not a `LogicalRelation`.
- All 4 tests register under the new `private_optimizer` marker in `pytest.ini` and run alongside the rest of `it-cuda13` (no separate Jenkins lane).
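Plain Python floats show why the parity assertion matters. This sketch assumes the rule decomposes `stddev_pop` into sum and sum-of-squares aggregates, i.e. sqrt(E[x²] − E[x]²). The rule's actual formula is not shown in this issue. Shifting the values by 1e9 keeps the spread fixed but makes both terms round to the same double. As a result, the one-pass form returns 0.0, while a two-pass computation (subtract the mean first) still returns the true value.

```python
import math


def stddev_pop_one_pass(xs):
    """sqrt(E[x^2] - E[x]^2) from a sum and a sum of squares (assumed
    decomposition; not taken from the rule's source)."""
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    var = mean_sq - mean * mean
    # Cancellation can even make var slightly negative; clamp before sqrt.
    return math.sqrt(max(var, 0.0))


def stddev_pop_two_pass(xs):
    """Reference: subtract the mean first, then average the squares."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) * (x - mean) for x in xs) / n)


small_mean = [1.0, 2.0, 3.0, 4.0]           # mean 2.5, stddev_pop ~1.118
large_mean = [1e9 + x for x in small_mean]  # same spread, mean ~1e9

print(stddev_pop_one_pass(small_mean), stddev_pop_two_pass(small_mean))
print(stddev_pop_one_pass(large_mean), stddev_pop_two_pass(large_mean))
# second line: 0.0 from the one-pass form, ~1.118 from the two-pass form
```

A GPU result computed through such a decomposition could diverge from the CPU result on inputs like `large_mean`, and that is the kind of gap an explicit CPU-GPU comparison catches.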
Tracking
This issue is `PR-B-coverage` under epic #14899 (sql-plugin Line coverage uplift). The earlier infra issue #14898 was closed as not planned after a probe confirmed that `SQLPlugin` already chains `SQLOptimizerPlugin` into `spark.sql.extensions`, so no infra change is needed.
The mandatory pre-PR gates from the epic (line-level pre-verification + `jacococli merge` net-Δ check) are complete for this issue's scope. The PR body will cite the verified numbers.
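The epic's exact net-Δ procedure is not spelled out here. A minimal sketch, assuming the check merges the probe `.exec` into the nightly baseline with `jacococli merge` and compares per-class `LINE_COVERED` between two `jacococli report --csv` outputs. The `merge`/`report` options and CSV column names are JaCoCo's. The helper functions and any file names other than `nightly-ut.exec` are hypothetical.

```python
import csv
import io


def jacococli_merge(cli_jar, exec_files, dest):
    """argv for `jacococli merge`: combine .exec files into one."""
    return ["java", "-jar", cli_jar, "merge", *exec_files, "--destfile", dest]


def jacococli_report(cli_jar, exec_file, classfiles, csv_out):
    """argv for `jacococli report` writing a per-class CSV report."""
    return ["java", "-jar", cli_jar, "report", exec_file,
            "--classfiles", classfiles, "--csv", csv_out]


def line_covered(csv_text):
    """(PACKAGE, CLASS) -> LINE_COVERED from a JaCoCo CSV report."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["PACKAGE"], row["CLASS"]): int(row["LINE_COVERED"])
            for row in reader}


def net_delta(baseline_csv, merged_csv):
    """Per-class and total change in covered lines, merged vs baseline.
    Classes missing from one report count as 0 covered lines there."""
    base = line_covered(baseline_csv)
    merged = line_covered(merged_csv)
    per_class = {key: merged.get(key, 0) - base.get(key, 0)
                 for key in base.keys() | merged.keys()}
    return per_class, sum(per_class.values())

# Example wiring (hypothetical paths), e.g. via subprocess.run(argv, check=True):
#   jacococli_merge("jacococli.jar", ["nightly-ut.exec", "probe.exec"], "merged.exec")
#   jacococli_report("jacococli.jar", "merged.exec", "classes/", "merged.csv")
```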