[Coverage] Widen shim json-lines on 15 existing test suites to cover Spark 3.5+#14907
[Coverage] Widen shim json-lines on 15 existing test suites to cover Spark 3.5+#14907wjxiz1992 wants to merge 1 commit into
Conversation
… 3.5+
15 Scala UT files under tests/src/test/spark330/scala/ were gated at
{330..344}, so shimplify did not generate copies for shim 350+. This
PR widens each file in one of two scopes depending on Spark 4.x API
surface compatibility:
- {330..411} (7 files): API surface compiles cleanly on Scala 2.13 /
Spark 4.x. Files: ConcurrentWriterMetricsSuite, GpuIntervalUtilsTest,
IntervalCastSuite, IntervalDivisionSuite, IntervalMultiplySuite,
OrcEncryptionSuite, RapidsShuffleThreadedWriterSuite.
- {330..356} (8 files): reference Spark 3.x-only API surfaces and
break under Spark 4.0's Connect-split (Dataset[Row] type) or
signature drift (SubqueryBroadcastExec.index Int -> indices Seq[Int],
FileSourceScanExec 9-arg -> 10-arg, ManagedBuffer.convertToNettyForSsl).
Files: DynamicPruningSuite, CsvScanForIntervalSuite,
IntervalArithmeticSuite, IntervalSuite, ParquetUDTSuite, SampleSuite,
TimestampSuite, RapidsShuffleThreadedReaderSuite.
For files where 350db143 was already present (ConcurrentWriterMetricsSuite,
CsvScanForIntervalSuite), it is reordered to its sorted position
immediately after 350. Copyright bumped to 2026.
Validation:
- mvn package -pl tests -am -Dbuildver=350 -DwildcardSuites=<15 suites>:
Tests: succeeded 319, failed 0, canceled 1, ignored 0, pending 0
- mvn -f scala2.13/pom.xml package -DskipTests -pl tests -am -Dbuildver=400:
BUILD SUCCESS
- mvn -f scala2.13/pom.xml package -DskipTests -pl tests -am -Dbuildver=411:
BUILD SUCCESS
- JaCoCo merge vs nightly W0 baseline (972be678_20260525): Δ LINE_COVERED
= +1,564 on sql-plugin scope (+2.42pp), 0 fallers across 3,449 common
classes. Top contributors: RapidsConf$ +1212, GpuCSVScan$ +47,
IntervalUtils$ +34, SerializedBatchIterator +27, GpuDivideYMInterval +26.
Contributes to NVIDIA#14906 (under epic NVIDIA#14899).
### Review notes
- Suggestion (not addressed, out of scope): widening could be extended to
shim 357/358 on the Scala 2.12 lane and 400db173/401/402 on the 2.13
lane to match production shim parity in
sql-plugin/src/main/spark350/.../GpuIntervalUtils.scala. Limiting to
356/411 keeps the PR mechanical; broader matrix is a follow-up.
- Suggestion (pre-existing, not addressed): OrcEncryptionSuite's
isValidTestForSparkVersion predicate matches only major == 3, so the
widened 4.x runs will silently skip via `assume`. Coverage-neutral but
harmless; broadening the predicate is a separate change.
- Suggestion (pre-existing, not addressed): GpuIntervalUtilsTest.scala
name violates the *Suite.scala project convention. Renaming is
out-of-scope for this header-only PR.
Signed-off-by: Allen Xu <allxu@nvidia.com>
Greptile SummaryThis PR widens the shimplify JSON-lines headers on 15 Scala unit-test files under
Confidence Score: 4/5Safe to merge — all changes are shim metadata headers with no test or production logic modified, and local compilation on buildver=350/400/411 is confirmed. The widening is well-scoped and the rationale for each boundary (330..356 vs 330..411) is clearly documented and tested. The only gap is that six of the seven Spark-4.x-capable files skip the 350db143 shim, meaning those suites won't execute under Databricks 3.5.0 CI; this is easy to address in a follow-up but does leave a Databricks coverage hole for the interval/shuffle/ORC paths. The six Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[15 spark330 test suites] --> B{Spark 4.x API compatible?}
B -- Yes --> C[7 files: 330..411 group]
B -- No --> D[8 files: 330..356 group]
C --> C1[Add 350-356 + 400 + 411]
C1 --> C2{350db143 present?}
C2 -- ConcurrentWriterMetricsSuite --> C3[Yes — reorder to after 350]
C2 -- Other 6 files --> C4[No — potential DB coverage gap]
D --> D1[Add 350-356 only]
D --> D2{Reason for cap}
D2 --> D3[SubqueryBroadcastExec.index: DynamicPruningSuite]
D2 --> D4[FileSourceScanExec arity / Dataset Connect-split: IntervalSuite etc.]
D2 --> D5[ManagedBuffer.convertToNettyForSsl: RapidsShuffleThreadedReaderSuite]
Reviews (1): Last reviewed commit: "test: widen shim json-lines on 15 existi..." | Re-trigger Greptile |
| {"spark": "350"} | ||
| {"spark": "351"} |
There was a problem hiding this comment.
Missing
350db143 in Spark-4.x-capable suites
Six of the seven {330..411} files (GpuIntervalUtilsTest, IntervalCastSuite, IntervalDivisionSuite, IntervalMultiplySuite, OrcEncryptionSuite, RapidsShuffleThreadedWriterSuite) jump directly from 350 to 351, skipping the 350db143 (Databricks Spark 3.5.0) shim. Because these files already compile cleanly on 400/411 — which introduce the API breaks cited for the {330..356} cap — the absence of 350db143 appears to be an oversight rather than a deliberate exclusion. Without it, none of these suites will run in Databricks CI for the 3.5.0 tier, leaving a gap in DB coverage for the interval/shuffle/ORC encryption paths. ConcurrentWriterMetricsSuite (the only file in this group that already carried 350db143) shows the correct ordering as a reference.
There was a problem hiding this comment.
Pull request overview
Updates shim-json-lines headers in existing Spark 3.3 test sources so shimplify codegen includes these suites in Spark 3.5+ (and some Spark 4.x) test builds, improving JaCoCo coverage on newer shims without changing test logic.
Changes:
- Widened shim-json-lines buildver lists in 15 Scala test suites under
tests/src/test/spark330/scala/to include Spark 3.5.x+ buildvers (and, for a subset, Spark 4.x buildvers). - Updated copyright headers to include 2026 in the touched files.
- Reordered
350db143inConcurrentWriterMetricsSuiteto keep buildver entries sorted.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 30 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/src/test/spark330/scala/org/apache/spark/sql/rapids/RapidsShuffleThreadedWriterSuite.scala | Widen shim-json-lines coverage to include newer Spark buildvers. |
| tests/src/test/spark330/scala/org/apache/spark/sql/rapids/RapidsShuffleThreadedReaderSuite.scala | Widen shim-json-lines coverage to include newer Spark buildvers. |
| tests/src/test/spark330/scala/com/nvidia/spark/rapids/TimestampSuite.scala | Widen shim-json-lines coverage for Spark 3.5+ buildvers. |
| tests/src/test/spark330/scala/com/nvidia/spark/rapids/SampleSuite.scala | Widen shim-json-lines coverage for Spark 3.5+ buildvers. |
| tests/src/test/spark330/scala/com/nvidia/spark/rapids/ParquetUDTSuite.scala | Widen shim-json-lines coverage for Spark 3.5+ buildvers. |
| tests/src/test/spark330/scala/com/nvidia/spark/rapids/OrcEncryptionSuite.scala | Widen shim-json-lines coverage to include newer Spark buildvers (incl. Spark 4.x where supported). |
| tests/src/test/spark330/scala/com/nvidia/spark/rapids/IntervalSuite.scala | Widen shim-json-lines coverage for Spark 3.5+ buildvers. |
| tests/src/test/spark330/scala/com/nvidia/spark/rapids/IntervalMultiplySuite.scala | Widen shim-json-lines coverage to include newer Spark buildvers (incl. Spark 4.x where supported). |
| tests/src/test/spark330/scala/com/nvidia/spark/rapids/IntervalDivisionSuite.scala | Widen shim-json-lines coverage to include newer Spark buildvers (incl. Spark 4.x where supported). |
| tests/src/test/spark330/scala/com/nvidia/spark/rapids/IntervalCastSuite.scala | Widen shim-json-lines coverage to include newer Spark buildvers (incl. Spark 4.x where supported). |
| tests/src/test/spark330/scala/com/nvidia/spark/rapids/IntervalArithmeticSuite.scala | Widen shim-json-lines coverage for Spark 3.5+ buildvers. |
| tests/src/test/spark330/scala/com/nvidia/spark/rapids/GpuIntervalUtilsTest.scala | Widen shim-json-lines coverage to include newer Spark buildvers (incl. Spark 4.x where supported). |
| tests/src/test/spark330/scala/com/nvidia/spark/rapids/DynamicPruningSuite.scala | Widen shim-json-lines coverage for Spark 3.5+ buildvers. |
| tests/src/test/spark330/scala/com/nvidia/spark/rapids/CsvScanForIntervalSuite.scala | Widen shim-json-lines coverage for Spark 3.5+ buildvers. |
| tests/src/test/spark330/scala/com/nvidia/spark/rapids/ConcurrentWriterMetricsSuite.scala | Widen shim-json-lines coverage (incl. Spark 4.x where supported) and keep buildver list sorted. |
| {"spark": "355"} | ||
| {"spark": "356"} | ||
| spark-rapids-shim-json-lines ***/ |
| {"spark": "355"} | ||
| {"spark": "356"} | ||
| spark-rapids-shim-json-lines ***/ |
| {"spark": "355"} | ||
| {"spark": "356"} | ||
| spark-rapids-shim-json-lines ***/ |
| {"spark": "355"} | ||
| {"spark": "356"} | ||
| spark-rapids-shim-json-lines ***/ |
| {"spark": "355"} | ||
| {"spark": "356"} | ||
| spark-rapids-shim-json-lines ***/ |
| {"spark": "355"} | ||
| {"spark": "356"} | ||
| {"spark": "400"} | ||
| {"spark": "411"} | ||
| spark-rapids-shim-json-lines ***/ |
| {"spark": "355"} | ||
| {"spark": "356"} | ||
| {"spark": "400"} | ||
| {"spark": "411"} | ||
| spark-rapids-shim-json-lines ***/ |
| {"spark": "355"} | ||
| {"spark": "356"} | ||
| {"spark": "400"} | ||
| {"spark": "411"} | ||
| spark-rapids-shim-json-lines ***/ |
| {"spark": "355"} | ||
| {"spark": "356"} | ||
| {"spark": "400"} | ||
| {"spark": "411"} | ||
| spark-rapids-shim-json-lines ***/ |
| {"spark": "355"} | ||
| {"spark": "356"} | ||
| {"spark": "400"} | ||
| {"spark": "411"} | ||
| spark-rapids-shim-json-lines ***/ |
Summary
tests/src/test/spark330/scala/so they participate in shimplify codegen for Spark 3.5+ builds.{330..411}(cleanly compile under Scala 2.13 / Spark 4.x); 8 files extend to{330..356}only (referenced Spark 3.x-only API surfaces —Dataset[Row]Connect-split,SubqueryBroadcastExec.indexInt→Seq[Int],FileSourceScanExec9→10 arg,ManagedBuffer.convertToNettyForSsl— make 4.x compile fail; deferred to follow-up).972be678_20260525): ΔLINE_COVERED= +1,564 on sql-plugin scope (+2.42pp), 0 fallers across 3,449 common classes.Contributes to #14906 (under epic #14899).
Test plan
mvn package -pl tests -am -Dbuildver=350 -DwildcardSuites=<15 suites> -Dmaven.repo.local=./.mvn-repo -Drapids.test.gpu.allocFraction=0.3 (etc.)—Tests: succeeded 319, failed 0, canceled 1, ignored 0, pending 0mvn -f scala2.13/pom.xml package -DskipTests -pl tests -am -Dbuildver=400— BUILD SUCCESSmvn -f scala2.13/pom.xml package -DskipTests -pl tests -am -Dbuildver=411— BUILD SUCCESSjacoco.execwith nightlyfinal-aggregate.exec; per-class report confirms ΔLINE_COVERED> 0 with no fallers.Files and scope
{330..411}(7){330..356}(8)For the 2 files where
350db143was pre-existing (ConcurrentWriterMetricsSuite, CsvScanForIntervalSuite), the line is reordered to its sorted position immediately after350to satisfy shimplify's sort assertion. Copyright bumped to 2026.No production code touched; no test logic changed.
Why this increases coverage
Mechanism. shimplify reads each test file's shim-json-lines header at build time. If
350is not in the header, the file is not symlinked intotarget/spark350/generated/src/...and the Scala compiler forbuildver=350literally cannot see it. The test class doesn't exist in shim 350's classpath, it never runs, and it contributes zero to the JaCoCoLINE_COVEREDrecorded by the nightly UT job at shim 350. This was the state for all 15 files in the W0 baseline. Adding{"spark": "350"}(and other 3.5+ entries) restores the symlink → the suite compiles and runs → the JaCoCo agent records every line it executes against the sql-plugin class set already loaded in the same JVM.Why the gain concentrates on
RapidsConf$.RapidsConfis a Scalaobject(companion), so its 1,309-line body — every conf registration and every conf-parse helper — runs once per JVM at first access. The W0 baseline reportsRapidsConf$at 0/1,309 covered, meaning no other suite in the shim 350 nightly triggersRapidsConfinitialization on the Spark 3.5 conf surface. Each widened suite's SparkSession startup forces the object init, picking up +1,212 LC (the remaining 97 lines stay missed — likely error/deprecated paths). The other contributors (GpuCSVScan$ +47,IntervalUtils$ +34,SerializedBatchIterator +27,GpuDivideYMInterval +26, …) follow the same pattern on smaller scope: each widened suite walks a code path that no other shim-350 suite walked.Why
LINE_TOTAL(denominator) does not move. This PR doesn't add or change any production class. With no new classes and no<exclude>changes,LINE_TOTALstays at the W0 baseline value. The +1,564LINE_COVEREDflows straight to the numerator, giving+1,564 / 39,727 ≈ +2.42pp. This is a covered-lines gain, not a denominator-shrink trick.Coverage impact
Top +
LINE_COVEREDcontributors from the JaCoCo merge (drift-cleaned across 3,449 common classes):LINE_COVEREDRapidsConf$(Spark 3.5 conf-parse path, 0/1309 in baseline)GpuCSVScan$(CsvScanForIntervalSuite)IntervalUtils$(Interval cluster)SerializedBatchIterator(Shuffle suites)GpuDivideYMInterval(IntervalDivisionSuite)5 of the 15 suites contribute 0 measurable LC because their target classes are already near-full coverage in baseline; keeping them widened anyway is forward-compatible with future shim additions.
Checklist
Documentation
Testing
Performance