Skip to content

[Coverage] Widen shim json-lines on 15 existing test suites to cover Spark 3.5+#14907

Draft
wjxiz1992 wants to merge 1 commit into
NVIDIA:mainfrom
wjxiz1992:cov-gpu-exec
Draft

[Coverage] Widen shim json-lines on 15 existing test suites to cover Spark 3.5+#14907
wjxiz1992 wants to merge 1 commit into
NVIDIA:mainfrom
wjxiz1992:cov-gpu-exec

Conversation

@wjxiz1992
Copy link
Copy Markdown
Collaborator

@wjxiz1992 wjxiz1992 commented May 29, 2026

Summary

  • Widens shim-json-lines headers on 15 Scala UT files under tests/src/test/spark330/scala/ so they participate in shimplify codegen for Spark 3.5+ builds.
  • 7 files extend to {330..411} (cleanly compile under Scala 2.13 / Spark 4.x); 8 files extend to {330..356} only (referenced Spark 3.x-only API surfaces — Dataset[Row] Connect-split, SubqueryBroadcastExec.index Int→Seq[Int], FileSourceScanExec 9→10 arg, ManagedBuffer.convertToNettyForSsl — make 4.x compile fail; deferred to follow-up).
  • JaCoCo Δ vs nightly W0 baseline (972be678_20260525): Δ LINE_COVERED = +1,564 on sql-plugin scope (+2.42pp), 0 fallers across 3,449 common classes.

Contributes to #14906 (under epic #14899).

Test plan

  • Local validation: mvn package -pl tests -am -Dbuildver=350 -DwildcardSuites=<15 suites> -Dmaven.repo.local=./.mvn-repo -Drapids.test.gpu.allocFraction=0.3 (etc.)Tests: succeeded 319, failed 0, canceled 1, ignored 0, pending 0
  • Compile-only on shim 400 (Scala 2.13 / Spark 4.0): mvn -f scala2.13/pom.xml package -DskipTests -pl tests -am -Dbuildver=400 — BUILD SUCCESS
  • Compile-only on shim 411 (Scala 2.13 / Spark 4.1.1): mvn -f scala2.13/pom.xml package -DskipTests -pl tests -am -Dbuildver=411 — BUILD SUCCESS
  • JaCoCo net-Δ gate: merged local PR jacoco.exec with nightly final-aggregate.exec; per-class report confirms Δ LINE_COVERED > 0 with no fallers.
  • Blossom premerge

Files and scope

Scope Files
{330..411} (7) ConcurrentWriterMetricsSuite, GpuIntervalUtilsTest, IntervalCastSuite, IntervalDivisionSuite, IntervalMultiplySuite, OrcEncryptionSuite, RapidsShuffleThreadedWriterSuite
{330..356} (8) DynamicPruningSuite, CsvScanForIntervalSuite, IntervalArithmeticSuite, IntervalSuite, ParquetUDTSuite, SampleSuite, TimestampSuite, RapidsShuffleThreadedReaderSuite

For the 2 files where 350db143 was pre-existing (ConcurrentWriterMetricsSuite, CsvScanForIntervalSuite), the line is reordered to its sorted position immediately after 350 to satisfy shimplify's sort assertion. Copyright bumped to 2026.

No production code touched; no test logic changed.

Why this increases coverage

Mechanism. shimplify reads each test file's shim-json-lines header at build time. If 350 is not in the header, the file is not symlinked into target/spark350/generated/src/... and the Scala compiler for buildver=350 literally cannot see it. The test class doesn't exist in shim 350's classpath, it never runs, and it contributes zero to the JaCoCo LINE_COVERED recorded by the nightly UT job at shim 350. This was the state for all 15 files in the W0 baseline. Adding {"spark": "350"} (and other 3.5+ entries) restores the symlink → the suite compiles and runs → the JaCoCo agent records every line it executes against the sql-plugin class set already loaded in the same JVM.

Why the gain concentrates on RapidsConf$. RapidsConf is a Scala object (companion), so its 1,309-line body — every conf registration and every conf-parse helper — runs once per JVM at first access. The W0 baseline reports RapidsConf$ at 0/1,309 covered, meaning no other suite in the shim 350 nightly triggers RapidsConf initialization on the Spark 3.5 conf surface. Each widened suite's SparkSession startup forces the object init, picking up +1,212 LC (the remaining 97 lines stay missed — likely error/deprecated paths). The other contributors (GpuCSVScan$ +47, IntervalUtils$ +34, SerializedBatchIterator +27, GpuDivideYMInterval +26, …) follow the same pattern on smaller scope: each widened suite walks a code path that no other shim-350 suite walked.

Why LINE_TOTAL (denominator) does not move. This PR doesn't add or change any production class. With no new classes and no <exclude> changes, LINE_TOTAL stays at the W0 baseline value. The +1,564 LINE_COVERED flows straight to the numerator, giving +1,564 / 39,727 ≈ +2.42pp. This is a covered-lines gain, not a denominator-shrink trick.

Coverage impact

Top +LINE_COVERED contributors from the JaCoCo merge (drift-cleaned across 3,449 common classes):

Class Δ LINE_COVERED
RapidsConf$ (Spark 3.5 conf-parse path, 0/1309 in baseline) +1,212
GpuCSVScan$ (CsvScanForIntervalSuite) +47
IntervalUtils$ (Interval cluster) +34
SerializedBatchIterator (Shuffle suites) +27
GpuDivideYMInterval (IntervalDivisionSuite) +26

5 of the 15 suites contribute 0 measurable LC because their target classes are already near-full coverage in baseline; keeping them widened anyway is forward-compatible with future shim additions.

Checklist

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

… 3.5+

15 Scala UT files under tests/src/test/spark330/scala/ were gated at
{330..344}, so shimplify did not generate copies for shim 350+. This
PR widens each file in one of two scopes depending on Spark 4.x API
surface compatibility:

  - {330..411} (7 files): API surface compiles cleanly on Scala 2.13 /
    Spark 4.x. Files: ConcurrentWriterMetricsSuite, GpuIntervalUtilsTest,
    IntervalCastSuite, IntervalDivisionSuite, IntervalMultiplySuite,
    OrcEncryptionSuite, RapidsShuffleThreadedWriterSuite.

  - {330..356} (8 files): reference Spark 3.x-only API surfaces and
    break under Spark 4.0's Connect-split (Dataset[Row] type) or
    signature drift (SubqueryBroadcastExec.index Int -> indices Seq[Int],
    FileSourceScanExec 9-arg -> 10-arg, ManagedBuffer.convertToNettyForSsl).
    Files: DynamicPruningSuite, CsvScanForIntervalSuite,
    IntervalArithmeticSuite, IntervalSuite, ParquetUDTSuite, SampleSuite,
    TimestampSuite, RapidsShuffleThreadedReaderSuite.

For files where 350db143 was already present (ConcurrentWriterMetricsSuite,
CsvScanForIntervalSuite), it is reordered to its sorted position
immediately after 350. Copyright bumped to 2026.

Validation:
- mvn package -pl tests -am -Dbuildver=350 -DwildcardSuites=<15 suites>:
  Tests: succeeded 319, failed 0, canceled 1, ignored 0, pending 0
- mvn -f scala2.13/pom.xml package -DskipTests -pl tests -am -Dbuildver=400:
  BUILD SUCCESS
- mvn -f scala2.13/pom.xml package -DskipTests -pl tests -am -Dbuildver=411:
  BUILD SUCCESS
- JaCoCo merge vs nightly W0 baseline (972be678_20260525): Δ LINE_COVERED
  = +1,564 on sql-plugin scope (+2.42pp), 0 fallers across 3,449 common
  classes. Top contributors: RapidsConf$ +1212, GpuCSVScan$ +47,
  IntervalUtils$ +34, SerializedBatchIterator +27, GpuDivideYMInterval +26.

Contributes to NVIDIA#14906 (under epic NVIDIA#14899).

### Review notes
- Suggestion (not addressed, out of scope): widening could be extended to
  shim 357/358 on the Scala 2.12 lane and 400db173/401/402 on the 2.13
  lane to match production shim parity in
  sql-plugin/src/main/spark350/.../GpuIntervalUtils.scala. Limiting to
  356/411 keeps the PR mechanical; broader matrix is a follow-up.
- Suggestion (pre-existing, not addressed): OrcEncryptionSuite's
  isValidTestForSparkVersion predicate matches only major == 3, so the
  widened 4.x runs will silently skip via `assume`. Coverage-neutral but
  harmless; broadening the predicate is a separate change.
- Suggestion (pre-existing, not addressed): GpuIntervalUtilsTest.scala
  name violates the *Suite.scala project convention. Renaming is
  out-of-scope for this header-only PR.

Signed-off-by: Allen Xu <allxu@nvidia.com>
Copilot AI review requested due to automatic review settings May 29, 2026 08:58
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 29, 2026

Greptile Summary

This PR widens the shimplify JSON-lines headers on 15 Scala unit-test files under tests/src/test/spark330/ so they participate in codegen for Spark 3.5.x (and, where the API surface allows, Spark 4.x). No test logic or production code is changed.

  • 7 files → {330..411}: ConcurrentWriterMetricsSuite, GpuIntervalUtilsTest, IntervalCastSuite, IntervalDivisionSuite, IntervalMultiplySuite, OrcEncryptionSuite, RapidsShuffleThreadedWriterSuite — these compile cleanly under Scala 2.13 / Spark 4.x and have been verified with buildver=400 and buildver=411.
  • 8 files → {330..356}: DynamicPruningSuite, CsvScanForIntervalSuite, IntervalArithmeticSuite, IntervalSuite, ParquetUDTSuite, SampleSuite, TimestampSuite, RapidsShuffleThreadedReaderSuite — capped at 356 due to API incompatibilities in Spark 4.x.
  • For files where 350db143 was pre-existing (ConcurrentWriterMetricsSuite), the entry is correctly reordered to sorted position (after 350, before 351) to satisfy shimplify's sort assertion.

Confidence Score: 4/5

Safe to merge — all changes are shim metadata headers with no test or production logic modified, and local compilation on buildver=350/400/411 is confirmed.

The widening is well-scoped and the rationale for each boundary (330..356 vs 330..411) is clearly documented and tested. The only gap is that six of the seven Spark-4.x-capable files skip the 350db143 shim, meaning those suites won't execute under Databricks 3.5.0 CI; this is easy to address in a follow-up but does leave a Databricks coverage hole for the interval/shuffle/ORC paths.

The six {330..411} files that currently omit 350db143GpuIntervalUtilsTest, IntervalCastSuite, IntervalDivisionSuite, IntervalMultiplySuite, OrcEncryptionSuite, and RapidsShuffleThreadedWriterSuite.

Important Files Changed

Filename Overview
tests/src/test/spark330/scala/com/nvidia/spark/rapids/ConcurrentWriterMetricsSuite.scala Shim header widened to {330..411}; pre-existing 350db143 correctly reordered to follow 350
tests/src/test/spark330/scala/com/nvidia/spark/rapids/GpuIntervalUtilsTest.scala Shim header widened to {330..411}; includes 400/411 but omits 350db143 between 350 and 351
tests/src/test/spark330/scala/com/nvidia/spark/rapids/IntervalCastSuite.scala Shim header widened to {330..411}; includes 400/411 but omits 350db143
tests/src/test/spark330/scala/com/nvidia/spark/rapids/OrcEncryptionSuite.scala Shim header widened to {330..411}; includes 400/411 but omits 350db143
tests/src/test/spark330/scala/org/apache/spark/sql/rapids/RapidsShuffleThreadedWriterSuite.scala Shim header widened to {330..411}; includes 400/411 but omits 350db143
tests/src/test/spark330/scala/com/nvidia/spark/rapids/DynamicPruningSuite.scala Shim header widened to {330..356}; capped at 356 due to SubqueryBroadcastExec.index API change in Spark 4.x
tests/src/test/spark330/scala/org/apache/spark/sql/rapids/RapidsShuffleThreadedReaderSuite.scala Shim header widened to {330..356}; correctly excludes 350db143 (ManagedBuffer.convertToNettyForSsl is DB-specific) and 400/411

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[15 spark330 test suites] --> B{Spark 4.x API compatible?}
    B -- Yes --> C[7 files: 330..411 group]
    B -- No --> D[8 files: 330..356 group]
    C --> C1[Add 350-356 + 400 + 411]
    C1 --> C2{350db143 present?}
    C2 -- ConcurrentWriterMetricsSuite --> C3[Yes — reorder to after 350]
    C2 -- Other 6 files --> C4[No — potential DB coverage gap]
    D --> D1[Add 350-356 only]
    D --> D2{Reason for cap}
    D2 --> D3[SubqueryBroadcastExec.index: DynamicPruningSuite]
    D2 --> D4[FileSourceScanExec arity / Dataset Connect-split: IntervalSuite etc.]
    D2 --> D5[ManagedBuffer.convertToNettyForSsl: RapidsShuffleThreadedReaderSuite]
Loading

Reviews (1): Last reviewed commit: "test: widen shim json-lines on 15 existi..." | Re-trigger Greptile

Comment on lines +27 to +28
{"spark": "350"}
{"spark": "351"}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Missing 350db143 in Spark-4.x-capable suites

Six of the seven {330..411} files (GpuIntervalUtilsTest, IntervalCastSuite, IntervalDivisionSuite, IntervalMultiplySuite, OrcEncryptionSuite, RapidsShuffleThreadedWriterSuite) jump directly from 350 to 351, skipping the 350db143 (Databricks Spark 3.5.0) shim. Because these files already compile cleanly on 400/411 — which introduce the API breaks cited for the {330..356} cap — the absence of 350db143 appears to be an oversight rather than a deliberate exclusion. Without it, none of these suites will run in Databricks CI for the 3.5.0 tier, leaving a gap in DB coverage for the interval/shuffle/ORC encryption paths. ConcurrentWriterMetricsSuite (the only file in this group that already carried 350db143) shows the correct ordering as a reference.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates shim-json-lines headers in existing Spark 3.3 test sources so shimplify codegen includes these suites in Spark 3.5+ (and some Spark 4.x) test builds, improving JaCoCo coverage on newer shims without changing test logic.

Changes:

  • Widened shim-json-lines buildver lists in 15 Scala test suites under tests/src/test/spark330/scala/ to include Spark 3.5.x+ buildvers (and, for a subset, Spark 4.x buildvers).
  • Updated copyright headers to include 2026 in the touched files.
  • Reordered 350db143 in ConcurrentWriterMetricsSuite to keep buildver entries sorted.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 30 comments.

Show a summary per file
File Description
tests/src/test/spark330/scala/org/apache/spark/sql/rapids/RapidsShuffleThreadedWriterSuite.scala Widen shim-json-lines coverage to include newer Spark buildvers.
tests/src/test/spark330/scala/org/apache/spark/sql/rapids/RapidsShuffleThreadedReaderSuite.scala Widen shim-json-lines coverage to include newer Spark buildvers.
tests/src/test/spark330/scala/com/nvidia/spark/rapids/TimestampSuite.scala Widen shim-json-lines coverage for Spark 3.5+ buildvers.
tests/src/test/spark330/scala/com/nvidia/spark/rapids/SampleSuite.scala Widen shim-json-lines coverage for Spark 3.5+ buildvers.
tests/src/test/spark330/scala/com/nvidia/spark/rapids/ParquetUDTSuite.scala Widen shim-json-lines coverage for Spark 3.5+ buildvers.
tests/src/test/spark330/scala/com/nvidia/spark/rapids/OrcEncryptionSuite.scala Widen shim-json-lines coverage to include newer Spark buildvers (incl. Spark 4.x where supported).
tests/src/test/spark330/scala/com/nvidia/spark/rapids/IntervalSuite.scala Widen shim-json-lines coverage for Spark 3.5+ buildvers.
tests/src/test/spark330/scala/com/nvidia/spark/rapids/IntervalMultiplySuite.scala Widen shim-json-lines coverage to include newer Spark buildvers (incl. Spark 4.x where supported).
tests/src/test/spark330/scala/com/nvidia/spark/rapids/IntervalDivisionSuite.scala Widen shim-json-lines coverage to include newer Spark buildvers (incl. Spark 4.x where supported).
tests/src/test/spark330/scala/com/nvidia/spark/rapids/IntervalCastSuite.scala Widen shim-json-lines coverage to include newer Spark buildvers (incl. Spark 4.x where supported).
tests/src/test/spark330/scala/com/nvidia/spark/rapids/IntervalArithmeticSuite.scala Widen shim-json-lines coverage for Spark 3.5+ buildvers.
tests/src/test/spark330/scala/com/nvidia/spark/rapids/GpuIntervalUtilsTest.scala Widen shim-json-lines coverage to include newer Spark buildvers (incl. Spark 4.x where supported).
tests/src/test/spark330/scala/com/nvidia/spark/rapids/DynamicPruningSuite.scala Widen shim-json-lines coverage for Spark 3.5+ buildvers.
tests/src/test/spark330/scala/com/nvidia/spark/rapids/CsvScanForIntervalSuite.scala Widen shim-json-lines coverage for Spark 3.5+ buildvers.
tests/src/test/spark330/scala/com/nvidia/spark/rapids/ConcurrentWriterMetricsSuite.scala Widen shim-json-lines coverage (incl. Spark 4.x where supported) and keep buildver list sorted.

Comment on lines +33 to 35
{"spark": "355"}
{"spark": "356"}
spark-rapids-shim-json-lines ***/
Comment on lines +32 to 34
{"spark": "355"}
{"spark": "356"}
spark-rapids-shim-json-lines ***/
Comment on lines +32 to 34
{"spark": "355"}
{"spark": "356"}
spark-rapids-shim-json-lines ***/
Comment on lines +33 to 35
{"spark": "355"}
{"spark": "356"}
spark-rapids-shim-json-lines ***/
Comment on lines +32 to 34
{"spark": "355"}
{"spark": "356"}
spark-rapids-shim-json-lines ***/
Comment on lines +33 to 37
{"spark": "355"}
{"spark": "356"}
{"spark": "400"}
{"spark": "411"}
spark-rapids-shim-json-lines ***/
Comment on lines +33 to 37
{"spark": "355"}
{"spark": "356"}
{"spark": "400"}
{"spark": "411"}
spark-rapids-shim-json-lines ***/
Comment on lines +32 to 36
{"spark": "355"}
{"spark": "356"}
{"spark": "400"}
{"spark": "411"}
spark-rapids-shim-json-lines ***/
Comment on lines +32 to 36
{"spark": "355"}
{"spark": "356"}
{"spark": "400"}
{"spark": "411"}
spark-rapids-shim-json-lines ***/
Comment on lines +37 to 41
{"spark": "355"}
{"spark": "356"}
{"spark": "400"}
{"spark": "411"}
spark-rapids-shim-json-lines ***/
@wjxiz1992 wjxiz1992 marked this pull request as draft May 29, 2026 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants