[Coverage] Widen shim json-lines on 15 existing test suites to cover Spark 3.5+ by wjxiz1992 · Pull Request #14907 · NVIDIA/spark-rapids

wjxiz1992 · 2026-05-29T08:58:08Z

Summary

Widens shim-json-lines headers on 15 Scala UT files under tests/src/test/spark330/scala/ so they participate in shimplify codegen for Spark 3.5+ builds.
7 files extend to {330..411} (cleanly compile under Scala 2.13 / Spark 4.x); 8 files extend to {330..356} only (referenced Spark 3.x-only API surfaces — Dataset[Row] Connect-split, SubqueryBroadcastExec.index Int→Seq[Int], FileSourceScanExec 9→10 arg, ManagedBuffer.convertToNettyForSsl — make 4.x compile fail; deferred to follow-up).
JaCoCo Δ vs nightly W0 baseline (972be678_20260525): Δ LINE_COVERED = +1,564 on sql-plugin scope (+2.42pp), 0 fallers across 3,449 common classes.

Contributes to #14906 (under epic #14899).

Test plan

Local validation: mvn package -pl tests -am -Dbuildver=350 -DwildcardSuites=<15 suites> -Dmaven.repo.local=./.mvn-repo -Drapids.test.gpu.allocFraction=0.3 (etc.) — Tests: succeeded 319, failed 0, canceled 1, ignored 0, pending 0
Compile-only on shim 400 (Scala 2.13 / Spark 4.0): mvn -f scala2.13/pom.xml package -DskipTests -pl tests -am -Dbuildver=400 — BUILD SUCCESS
Compile-only on shim 411 (Scala 2.13 / Spark 4.1.1): mvn -f scala2.13/pom.xml package -DskipTests -pl tests -am -Dbuildver=411 — BUILD SUCCESS
JaCoCo net-Δ gate: merged local PR jacoco.exec with nightly final-aggregate.exec; per-class report confirms Δ LINE_COVERED > 0 with no fallers.
Blossom premerge

Files and scope

Scope	Files
`{330..411}` (7)	ConcurrentWriterMetricsSuite, GpuIntervalUtilsTest, IntervalCastSuite, IntervalDivisionSuite, IntervalMultiplySuite, OrcEncryptionSuite, RapidsShuffleThreadedWriterSuite
`{330..356}` (8)	DynamicPruningSuite, CsvScanForIntervalSuite, IntervalArithmeticSuite, IntervalSuite, ParquetUDTSuite, SampleSuite, TimestampSuite, RapidsShuffleThreadedReaderSuite

For the 2 files where 350db143 was pre-existing (ConcurrentWriterMetricsSuite, CsvScanForIntervalSuite), the line is reordered to its sorted position immediately after 350 to satisfy shimplify's sort assertion. Copyright bumped to 2026.

No production code touched; no test logic changed.

Why this increases coverage

Mechanism. shimplify reads each test file's shim-json-lines header at build time. If 350 is not in the header, the file is not symlinked into target/spark350/generated/src/... and the Scala compiler for buildver=350 literally cannot see it. The test class doesn't exist in shim 350's classpath, it never runs, and it contributes zero to the JaCoCo LINE_COVERED recorded by the nightly UT job at shim 350. This was the state for all 15 files in the W0 baseline. Adding {"spark": "350"} (and other 3.5+ entries) restores the symlink → the suite compiles and runs → the JaCoCo agent records every line it executes against the sql-plugin class set already loaded in the same JVM.

Why the gain concentrates on RapidsConf$. RapidsConf is a Scala object (companion), so its 1,309-line body — every conf registration and every conf-parse helper — runs once per JVM at first access. The W0 baseline reports RapidsConf$ at 0/1,309 covered, meaning no other suite in the shim 350 nightly triggers RapidsConf initialization on the Spark 3.5 conf surface. Each widened suite's SparkSession startup forces the object init, picking up +1,212 LC (the remaining 97 lines stay missed — likely error/deprecated paths). The other contributors (GpuCSVScan$ +47, IntervalUtils$ +34, SerializedBatchIterator +27, GpuDivideYMInterval +26, …) follow the same pattern on smaller scope: each widened suite walks a code path that no other shim-350 suite walked.

Why LINE_TOTAL (denominator) does not move. This PR doesn't add or change any production class. With no new classes and no <exclude> changes, LINE_TOTAL stays at the W0 baseline value. The +1,564 LINE_COVERED flows straight to the numerator, giving +1,564 / 39,727 ≈ +2.42pp. This is a covered-lines gain, not a denominator-shrink trick.

Coverage impact

Top +LINE_COVERED contributors from the JaCoCo merge (drift-cleaned across 3,449 common classes):

Class	Δ `LINE_COVERED`
`RapidsConf$` (Spark 3.5 conf-parse path, 0/1309 in baseline)	+1,212
`GpuCSVScan$` (CsvScanForIntervalSuite)	+47
`IntervalUtils$` (Interval cluster)	+34
`SerializedBatchIterator` (Shuffle suites)	+27
`GpuDivideYMInterval` (IntervalDivisionSuite)	+26

5 of the 15 suites contribute 0 measurable LC because their target classes are already near-full coverage in baseline; keeping them widened anyway is forward-compatible with future shim additions.

Checklist

Documentation

Updated for new or modified user-facing features or behaviors
No user-facing change

Testing

Added or modified tests to cover new code paths
Covered by existing tests
Not required

Performance

Tests ran and results are added in the PR description
Issue filed with a link in the PR description
Not required

… 3.5+ 15 Scala UT files under tests/src/test/spark330/scala/ were gated at {330..344}, so shimplify did not generate copies for shim 350+. This PR widens each file in one of two scopes depending on Spark 4.x API surface compatibility: - {330..411} (7 files): API surface compiles cleanly on Scala 2.13 / Spark 4.x. Files: ConcurrentWriterMetricsSuite, GpuIntervalUtilsTest, IntervalCastSuite, IntervalDivisionSuite, IntervalMultiplySuite, OrcEncryptionSuite, RapidsShuffleThreadedWriterSuite. - {330..356} (8 files): reference Spark 3.x-only API surfaces and break under Spark 4.0's Connect-split (Dataset[Row] type) or signature drift (SubqueryBroadcastExec.index Int -> indices Seq[Int], FileSourceScanExec 9-arg -> 10-arg, ManagedBuffer.convertToNettyForSsl). Files: DynamicPruningSuite, CsvScanForIntervalSuite, IntervalArithmeticSuite, IntervalSuite, ParquetUDTSuite, SampleSuite, TimestampSuite, RapidsShuffleThreadedReaderSuite. For files where 350db143 was already present (ConcurrentWriterMetricsSuite, CsvScanForIntervalSuite), it is reordered to its sorted position immediately after 350. Copyright bumped to 2026. Validation: - mvn package -pl tests -am -Dbuildver=350 -DwildcardSuites=<15 suites>: Tests: succeeded 319, failed 0, canceled 1, ignored 0, pending 0 - mvn -f scala2.13/pom.xml package -DskipTests -pl tests -am -Dbuildver=400: BUILD SUCCESS - mvn -f scala2.13/pom.xml package -DskipTests -pl tests -am -Dbuildver=411: BUILD SUCCESS - JaCoCo merge vs nightly W0 baseline (972be678_20260525): Δ LINE_COVERED = +1,564 on sql-plugin scope (+2.42pp), 0 fallers across 3,449 common classes. Top contributors: RapidsConf$ +1212, GpuCSVScan$ +47, IntervalUtils$ +34, SerializedBatchIterator +27, GpuDivideYMInterval +26. Contributes to NVIDIA#14906 (under epic NVIDIA#14899). ### Review notes - Suggestion (not addressed, out of scope): widening could be extended to shim 357/358 on the Scala 2.12 lane and 400db173/401/402 on the 2.13 lane to match production shim parity in sql-plugin/src/main/spark350/.../GpuIntervalUtils.scala. Limiting to 356/411 keeps the PR mechanical; broader matrix is a follow-up. - Suggestion (pre-existing, not addressed): OrcEncryptionSuite's isValidTestForSparkVersion predicate matches only major == 3, so the widened 4.x runs will silently skip via `assume`. Coverage-neutral but harmless; broadening the predicate is a separate change. - Suggestion (pre-existing, not addressed): GpuIntervalUtilsTest.scala name violates the *Suite.scala project convention. Renaming is out-of-scope for this header-only PR. Signed-off-by: Allen Xu <allxu@nvidia.com>

greptile-apps · 2026-05-29T09:02:33Z

Greptile Summary

This PR widens the shimplify JSON-lines headers on 15 Scala unit-test files under tests/src/test/spark330/ so they participate in codegen for Spark 3.5.x (and, where the API surface allows, Spark 4.x). No test logic or production code is changed.

7 files → {330..411}: ConcurrentWriterMetricsSuite, GpuIntervalUtilsTest, IntervalCastSuite, IntervalDivisionSuite, IntervalMultiplySuite, OrcEncryptionSuite, RapidsShuffleThreadedWriterSuite — these compile cleanly under Scala 2.13 / Spark 4.x and have been verified with buildver=400 and buildver=411.
8 files → {330..356}: DynamicPruningSuite, CsvScanForIntervalSuite, IntervalArithmeticSuite, IntervalSuite, ParquetUDTSuite, SampleSuite, TimestampSuite, RapidsShuffleThreadedReaderSuite — capped at 356 due to API incompatibilities in Spark 4.x.
For files where 350db143 was pre-existing (ConcurrentWriterMetricsSuite), the entry is correctly reordered to sorted position (after 350, before 351) to satisfy shimplify's sort assertion.

Confidence Score: 4/5

Safe to merge — all changes are shim metadata headers with no test or production logic modified, and local compilation on buildver=350/400/411 is confirmed.

The widening is well-scoped and the rationale for each boundary (330..356 vs 330..411) is clearly documented and tested. The only gap is that six of the seven Spark-4.x-capable files skip the 350db143 shim, meaning those suites won't execute under Databricks 3.5.0 CI; this is easy to address in a follow-up but does leave a Databricks coverage hole for the interval/shuffle/ORC paths.

The six {330..411} files that currently omit 350db143 — GpuIntervalUtilsTest, IntervalCastSuite, IntervalDivisionSuite, IntervalMultiplySuite, OrcEncryptionSuite, and RapidsShuffleThreadedWriterSuite.

Important Files Changed

Filename	Overview
tests/src/test/spark330/scala/com/nvidia/spark/rapids/ConcurrentWriterMetricsSuite.scala	Shim header widened to {330..411}; pre-existing 350db143 correctly reordered to follow 350
tests/src/test/spark330/scala/com/nvidia/spark/rapids/GpuIntervalUtilsTest.scala	Shim header widened to {330..411}; includes 400/411 but omits 350db143 between 350 and 351
tests/src/test/spark330/scala/com/nvidia/spark/rapids/IntervalCastSuite.scala	Shim header widened to {330..411}; includes 400/411 but omits 350db143
tests/src/test/spark330/scala/com/nvidia/spark/rapids/OrcEncryptionSuite.scala	Shim header widened to {330..411}; includes 400/411 but omits 350db143
tests/src/test/spark330/scala/org/apache/spark/sql/rapids/RapidsShuffleThreadedWriterSuite.scala	Shim header widened to {330..411}; includes 400/411 but omits 350db143
tests/src/test/spark330/scala/com/nvidia/spark/rapids/DynamicPruningSuite.scala	Shim header widened to {330..356}; capped at 356 due to SubqueryBroadcastExec.index API change in Spark 4.x
tests/src/test/spark330/scala/org/apache/spark/sql/rapids/RapidsShuffleThreadedReaderSuite.scala	Shim header widened to {330..356}; correctly excludes 350db143 (ManagedBuffer.convertToNettyForSsl is DB-specific) and 400/411

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[15 spark330 test suites] --> B{Spark 4.x API compatible?}
    B -- Yes --> C[7 files: 330..411 group]
    B -- No --> D[8 files: 330..356 group]
    C --> C1[Add 350-356 + 400 + 411]
    C1 --> C2{350db143 present?}
    C2 -- ConcurrentWriterMetricsSuite --> C3[Yes — reorder to after 350]
    C2 -- Other 6 files --> C4[No — potential DB coverage gap]
    D --> D1[Add 350-356 only]
    D --> D2{Reason for cap}
    D2 --> D3[SubqueryBroadcastExec.index: DynamicPruningSuite]
    D2 --> D4[FileSourceScanExec arity / Dataset Connect-split: IntervalSuite etc.]
    D2 --> D5[ManagedBuffer.convertToNettyForSsl: RapidsShuffleThreadedReaderSuite]

_{Reviews (1): Last reviewed commit: "test: widen shim json-lines on 15 existi..." | Re-trigger Greptile}

greptile-apps · 2026-05-29T09:02:37Z

+{"spark": "350"}
+{"spark": "351"}


Missing 350db143 in Spark-4.x-capable suites

Six of the seven {330..411} files (GpuIntervalUtilsTest, IntervalCastSuite, IntervalDivisionSuite, IntervalMultiplySuite, OrcEncryptionSuite, RapidsShuffleThreadedWriterSuite) jump directly from 350 to 351, skipping the 350db143 (Databricks Spark 3.5.0) shim. Because these files already compile cleanly on 400/411 — which introduce the API breaks cited for the {330..356} cap — the absence of 350db143 appears to be an oversight rather than a deliberate exclusion. Without it, none of these suites will run in Databricks CI for the 3.5.0 tier, leaving a gap in DB coverage for the interval/shuffle/ORC encryption paths. ConcurrentWriterMetricsSuite (the only file in this group that already carried 350db143) shows the correct ordering as a reference.

Copilot

Pull request overview

Updates shim-json-lines headers in existing Spark 3.3 test sources so shimplify codegen includes these suites in Spark 3.5+ (and some Spark 4.x) test builds, improving JaCoCo coverage on newer shims without changing test logic.

Changes:

Widened shim-json-lines buildver lists in 15 Scala test suites under tests/src/test/spark330/scala/ to include Spark 3.5.x+ buildvers (and, for a subset, Spark 4.x buildvers).
Reordered 350db143 in ConcurrentWriterMetricsSuite to keep buildver entries sorted.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 30 comments.

Show a summary per file

File	Description
tests/src/test/spark330/scala/org/apache/spark/sql/rapids/RapidsShuffleThreadedWriterSuite.scala	Widen shim-json-lines coverage to include newer Spark buildvers.
tests/src/test/spark330/scala/org/apache/spark/sql/rapids/RapidsShuffleThreadedReaderSuite.scala	Widen shim-json-lines coverage to include newer Spark buildvers.
tests/src/test/spark330/scala/com/nvidia/spark/rapids/TimestampSuite.scala	Widen shim-json-lines coverage for Spark 3.5+ buildvers.
tests/src/test/spark330/scala/com/nvidia/spark/rapids/SampleSuite.scala	Widen shim-json-lines coverage for Spark 3.5+ buildvers.
tests/src/test/spark330/scala/com/nvidia/spark/rapids/ParquetUDTSuite.scala	Widen shim-json-lines coverage for Spark 3.5+ buildvers.
tests/src/test/spark330/scala/com/nvidia/spark/rapids/OrcEncryptionSuite.scala	Widen shim-json-lines coverage to include newer Spark buildvers (incl. Spark 4.x where supported).
tests/src/test/spark330/scala/com/nvidia/spark/rapids/IntervalSuite.scala	Widen shim-json-lines coverage for Spark 3.5+ buildvers.
tests/src/test/spark330/scala/com/nvidia/spark/rapids/IntervalMultiplySuite.scala	Widen shim-json-lines coverage to include newer Spark buildvers (incl. Spark 4.x where supported).
tests/src/test/spark330/scala/com/nvidia/spark/rapids/IntervalDivisionSuite.scala	Widen shim-json-lines coverage to include newer Spark buildvers (incl. Spark 4.x where supported).
tests/src/test/spark330/scala/com/nvidia/spark/rapids/IntervalCastSuite.scala	Widen shim-json-lines coverage to include newer Spark buildvers (incl. Spark 4.x where supported).
tests/src/test/spark330/scala/com/nvidia/spark/rapids/IntervalArithmeticSuite.scala	Widen shim-json-lines coverage for Spark 3.5+ buildvers.
tests/src/test/spark330/scala/com/nvidia/spark/rapids/GpuIntervalUtilsTest.scala	Widen shim-json-lines coverage to include newer Spark buildvers (incl. Spark 4.x where supported).
tests/src/test/spark330/scala/com/nvidia/spark/rapids/DynamicPruningSuite.scala	Widen shim-json-lines coverage for Spark 3.5+ buildvers.
tests/src/test/spark330/scala/com/nvidia/spark/rapids/CsvScanForIntervalSuite.scala	Widen shim-json-lines coverage for Spark 3.5+ buildvers.
tests/src/test/spark330/scala/com/nvidia/spark/rapids/ConcurrentWriterMetricsSuite.scala	Widen shim-json-lines coverage (incl. Spark 4.x where supported) and keep buildver list sorted.

+{"spark": "355"}
+{"spark": "356"}
 spark-rapids-shim-json-lines ***/


+{"spark": "355"}
+{"spark": "356"}
 spark-rapids-shim-json-lines ***/


+{"spark": "355"}
+{"spark": "356"}
 spark-rapids-shim-json-lines ***/


+{"spark": "355"}
+{"spark": "356"}
 spark-rapids-shim-json-lines ***/


+{"spark": "355"}
+{"spark": "356"}
 spark-rapids-shim-json-lines ***/


+{"spark": "355"}
+{"spark": "356"}
+{"spark": "400"}
+{"spark": "411"}
 spark-rapids-shim-json-lines ***/


+{"spark": "355"}
+{"spark": "356"}
+{"spark": "400"}
+{"spark": "411"}
 spark-rapids-shim-json-lines ***/


+{"spark": "355"}
+{"spark": "356"}
+{"spark": "400"}
+{"spark": "411"}
 spark-rapids-shim-json-lines ***/


+{"spark": "355"}
+{"spark": "356"}
+{"spark": "400"}
+{"spark": "411"}
 spark-rapids-shim-json-lines ***/


+{"spark": "355"}
+{"spark": "356"}
+{"spark": "400"}
+{"spark": "411"}
 spark-rapids-shim-json-lines ***/


Copilot AI review requested due to automatic review settings May 29, 2026 08:58

Copilot started reviewing on behalf of wjxiz1992 May 29, 2026 08:58 View session

greptile-apps Bot reviewed May 29, 2026

View reviewed changes

Copilot AI reviewed May 29, 2026

View reviewed changes

wjxiz1992 marked this pull request as draft May 29, 2026 09:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Coverage] Widen shim json-lines on 15 existing test suites to cover Spark 3.5+#14907

[Coverage] Widen shim json-lines on 15 existing test suites to cover Spark 3.5+#14907
wjxiz1992 wants to merge 1 commit into
NVIDIA:mainfrom
wjxiz1992:cov-gpu-exec

wjxiz1992 commented May 29, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented May 29, 2026

Uh oh!

greptile-apps Bot May 29, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		{"spark": "350"}
		{"spark": "351"}

Conversation

wjxiz1992 commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Files and scope

Why this increases coverage

Coverage impact

Checklist

Uh oh!

greptile-apps Bot commented May 29, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

wjxiz1992 commented May 29, 2026 •

edited

Loading