
Remove old Spark 4 shim sources and update tests#15038

Open
gerashegalov wants to merge 2 commits into codex/unshim-stack-02v-shim-cleanup-35 from codex/unshim-stack-02w-shim-cleanup-40-tests

Conversation

@gerashegalov gerashegalov commented Jun 10, 2026

Collaborator

Related to #14834.

Description

This PR is one reviewable layer in the unshim stack introduced by #15025. It removes old Spark 4 shim sources and updates tests for the shared helper layout. This is the final shim-source cleanup layer before the Delta/Iceberg follow-up.

Stack context

Testing and validation notes

  • This PR includes test updates for the shared helper layout and is also covered by the full-stack packaging/build validation described in #15025 (Add default common unshim packaging flow).
  • The full split stack was verified to be tree-equivalent to the pre-split stack top.

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Please provide the names of the existing tests in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

@gerashegalov gerashegalov changed the title from "codex/unshim stack 02w shim cleanup 40 tests" to "Remove old Spark 4 shim sources and update tests" Jun 10, 2026
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02v-shim-cleanup-35 branch from f9b5a85 to 7da249c Compare June 10, 2026 20:49
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from a16bac9 to 526b855 Compare June 10, 2026 20:49
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02v-shim-cleanup-35 branch from 7da249c to f41eeda Compare June 10, 2026 21:13
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch 2 times, most recently from 8a241ea to de16c4d Compare June 10, 2026 21:32
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02v-shim-cleanup-35 branch from 3e22bb7 to dd6902a Compare June 10, 2026 21:36
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch 2 times, most recently from bc39c85 to f21d8b4 Compare June 10, 2026 22:20
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02v-shim-cleanup-35 branch from 3f348a1 to c27f3a3 Compare June 10, 2026 22:37
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch 2 times, most recently from 0deb6a7 to e4fc381 Compare June 10, 2026 22:41
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02v-shim-cleanup-35 branch from c27f3a3 to 9f8ea3d Compare June 10, 2026 22:41
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from e4fc381 to a19a1be Compare June 10, 2026 22:46
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02v-shim-cleanup-35 branch 2 times, most recently from 7b24086 to 207d6eb Compare June 10, 2026 22:59
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from a19a1be to 2081b5f Compare June 10, 2026 22:59
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02v-shim-cleanup-35 branch from 207d6eb to 517d325 Compare June 10, 2026 23:12
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from 2081b5f to 0e399a5 Compare June 10, 2026 23:12
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02v-shim-cleanup-35 branch from 517d325 to 95559eb Compare June 10, 2026 23:15
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch 2 times, most recently from b7c4280 to f8dd2c4 Compare June 10, 2026 23:29
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02v-shim-cleanup-35 branch from 95559eb to 6f205d5 Compare June 10, 2026 23:29
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from f8dd2c4 to f5bec86 Compare June 10, 2026 23:33
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02v-shim-cleanup-35 branch from 6f205d5 to 916f019 Compare June 10, 2026 23:33
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from f5bec86 to 489d80a Compare June 10, 2026 23:48
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02v-shim-cleanup-35 branch from 916f019 to 4e38c2c Compare June 10, 2026 23:48
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from 489d80a to f05fe36 Compare June 10, 2026 23:59
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02v-shim-cleanup-35 branch 2 times, most recently from f039201 to fa220fe Compare June 11, 2026 00:25
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from f05fe36 to d8de428 Compare June 11, 2026 00:25
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02v-shim-cleanup-35 branch from fa220fe to b99a61c Compare June 11, 2026 00:37
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from d8de428 to 6d48d15 Compare June 11, 2026 00:37
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02v-shim-cleanup-35 branch from b99a61c to b48ffdf Compare June 11, 2026 00:51
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from 6d48d15 to 5ee22f3 Compare June 11, 2026 00:51
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02v-shim-cleanup-35 branch from b48ffdf to 2dfb359 Compare June 11, 2026 01:18
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch 2 times, most recently from 168505f to 8e3712f Compare June 11, 2026 01:32
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02v-shim-cleanup-35 branch from 2dfb359 to 25ad0c1 Compare June 11, 2026 01:32
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch 2 times, most recently from bde56e7 to 91bc880 Compare June 11, 2026 01:58
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02v-shim-cleanup-35 branch from 3e2628f to 021a80c Compare June 11, 2026 01:58
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from 91bc880 to b0846ff Compare June 11, 2026 02:26
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02v-shim-cleanup-35 branch 2 times, most recently from 0f6f59c to 7aae5ec Compare June 13, 2026 12:13
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from b0846ff to 19c3ec8 Compare June 13, 2026 12:13
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from 19c3ec8 to b9557df Compare June 13, 2026 12:20
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02v-shim-cleanup-35 branch from 7aae5ec to 9e83c67 Compare June 13, 2026 12:20
@gerashegalov gerashegalov marked this pull request as ready for review June 13, 2026 12:49
greptile-apps Bot commented Jun 13, 2026

Contributor

Greptile Summary

This PR removes old Spark 4 shim source files from sql-plugin/src/main/spark{400,401,402,411} that were relocated to the new sql-plugin-shims module in an earlier stack layer, and adapts the remaining shim code and tests to the revised class layout.

  • Deletions: SparkShimServiceProvider, DateTimeUtilsShims, OriginContextShim, SparkSessionUtils, TrampolineConnectShims, ShuffleManagerShims, ShuffleClientShims, and FileCommitProtocolShims are all removed from sql-plugin; their replacements exist in sql-plugin-shims or are unused.
  • case class → class migrations: InvokeExprMeta, GpuGroupedPythonRunnerFactory, GpuUnboundedToUnboundedAggStages, and SecondPassAggResult are converted to regular classes; call sites and tests are updated accordingly.
  • Functional addition: TimeAddShims in spark400db173 changes from an empty map to a full TimestampAddInterval GPU override (DB 17.3 and Spark 4.1+), and GpuLiteralShim is added to NullIntolerantShim.scala as a shared helper for Spark-4 shim stacks.
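
To make the call-site impact of the case class → class migrations concrete, here is a minimal Scala sketch. StagesBefore and StagesAfter are hypothetical stand-ins, not the plugin's classes, and the sketch assumes Scala 2 semantics, where a plain class needs an explicit `new` and gets no generated apply, equals, copy, or unapply.

```scala
// Hypothetical stand-ins for illustration; these are not spark-rapids classes.
object CaseClassMigrationSketch {
  // Before: a case class gets apply, equals/hashCode, copy and unapply generated.
  final case class StagesBefore(name: String, passes: Int)

  // After: a plain class keeps only what it declares; `val` preserves the accessors.
  final class StagesAfter(val name: String, val passes: Int) extends java.io.Serializable

  // Factories as function values: the companion's apply vs. an explicit lambda.
  val makeBefore: (String, Int) => StagesBefore = StagesBefore.apply
  val makeAfter: (String, Int) => StagesAfter = (n, p) => new StagesAfter(n, p)

  // Case classes compare structurally; plain classes fall back to reference equality.
  def structurallyEqualBefore: Boolean = makeBefore("agg", 2) == makeBefore("agg", 2)
  def structurallyEqualAfter: Boolean = makeAfter("agg", 2) == makeAfter("agg", 2)
}
```

Under these assumptions, any test that compared instances with == or pattern-matched on them would need to compare fields explicitly after the migration.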

Confidence Score: 4/5

The structural cleanup is well-executed — all deleted service providers and utility shims have verified replacements in the new sql-plugin-shims module, and the case-class-to-class migrations are consistently applied across callers and tests.

The bulk of the change is safe bookkeeping: deletions with confirmed replacements, blank-line padding, and test adaptations. The two items worth a second look are (1) GpuLiteralShim.jsonFields not covering TimestampNTZType — affects only explain plan readability, not query correctness — and (2) the new TimestampAddInterval override omitting TIMESTAMP_NTZ from its TypeSig, which may be intentional parity with the old TimeAdd shims but is worth confirming.

Files worth a second look: sql-plugin/src/main/spark400/scala/com/nvidia/spark/rapids/shims/NullIntolerantShim.scala (GpuLiteralShim JSON serialization) and sql-plugin/src/main/spark400db173/scala/com/nvidia/spark/rapids/shims/TimeAddShims.scala (TIMESTAMP_NTZ coverage).

Important Files Changed

Filename: Overview

  • sql-plugin/src/main/spark400db173/scala/com/nvidia/spark/rapids/shims/TimeAddShims.scala: Replaces empty Map with a full TimestampAddInterval GPU override; only TypeSig.TIMESTAMP (LTZ) is accepted, so TIMESTAMP_NTZ falls back to CPU, consistent with historical TimeAdd shims.
  • sql-plugin/src/main/spark400/scala/com/nvidia/spark/rapids/shims/NullIntolerantShim.scala: Adds GpuLiteralShim abstract class for JSON serialization; handles TimestampType but not TimestampNTZType, so NTZ literals will emit raw microsecond longs in explain output.
  • sql-plugin/src/main/spark411/scala/org/apache/spark/sql/rapids/execution/python/shims/GpuGroupedPythonRunnerFactory.scala: Converted from case class to class + Serializable; argNames default removed. Both callers already pass it explicitly, so no compilation breakage.
  • sql-plugin/src/main/spark400/scala/org/apache/spark/sql/rapids/shims/InvokeExprMeta.scala: Converted from case class to plain class; both Spark400PlusCommonShims and Spark400PlusDBShims updated to use explicit lambda constructors.
  • tests/src/test/scala/com/nvidia/spark/rapids/GpuSortRetrySuite.scala: Positional GpuSortEachBatchIterator arguments replaced with named parameters; readability improvement, no logic change.
  • tests/src/test/scala/com/nvidia/spark/rapids/window/GpuUnboundedToUnboundedAggWindowSuite.scala: Updated to use new constructor syntax (case→class) for GpuUnboundedToUnboundedAggStages and SecondPassAggResult.
  • sql-plugin/src/main/spark400/scala/com/nvidia/spark/rapids/shims/CudfUnsafeRow.scala: Adds blank-line padding before companion object for binary-dedupe alignment with pre-Spark-4 shims.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[PR #15038: Remove old Spark 4 shim sources] --> B[Deletions from sql-plugin]
    A --> C[In-place mutations in sql-plugin]
    A --> D[Test updates]

    B --> B1[SparkShimServiceProvider ×5\nspark400/401/402/400db173/411]
    B --> B2[Utility shims ×7\nDateTimeUtilsShims, OriginContextShim\nSparkSessionUtils, TrampolineConnectShims\nShuffleManagerShims, ShuffleClientShims\nFileCommitProtocolShims]

    B1 --> E[Replaced in sql-plugin-shims module]
    B2 --> F[Replaced in sql-plugin-shims\nor unused — safe to drop]

    C --> C1[InvokeExprMeta\ncase class → class\nExplicit lambda constructor in callers]
    C --> C2[GpuGroupedPythonRunnerFactory\ncase class → class + Serializable\nargNames default removed]
    C --> C3[TimeAddShims\nempty Map → TimestampAddInterval\nGPU override for 400db173 + 411]
    C --> C4[NullIntolerantShim\nAdds GpuLiteralShim helper]
    C --> C5[Blank-line padding\nBinary-dedupe alignment]

    D --> D1[GpuSortRetrySuite\nPositional → named params]
    D --> D2[GpuUnboundedToUnboundedAggWindowSuite\nnew-keyword constructors]

Reviews (1): Last reviewed commit: "Update tests for shared helper layout" | Re-trigger Greptile

Comment on lines +42 to +44
case (l: Long, org.apache.spark.sql.types.TimestampType) =>
  org.json4s.JsonAST.JString(
    org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaTimestamp(l).toString)

Contributor

P2 Missing TimestampNTZType case in jsonFields

GpuLiteralShim.jsonFields handles TimestampType (LTZ) via DateTimeUtils.toJavaTimestamp, but TimestampNTZType falls through to the catch-all other branch, which calls .toString on the raw Long microsecond value. Any GpuLiteral with a TIMESTAMP_NTZ type will therefore serialize as a bare integer in explain plans and plan JSON output rather than a human-readable timestamp string. The parallel path for DateType already shows the expected pattern — TIMESTAMP_NTZ would need something like DateTimeUtils.microsToLocalDateTime instead of toJavaTimestamp (which would incorrectly apply a timezone offset).
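
As a rough illustration of the zone-free rendering suggested here, the following sketch converts a microsecond count into a LocalDateTime using only java.time. The object and the renderNtz helper are hypothetical, and the local microsToLocalDateTime is a stand-in written for this example, assumed to behave like the Spark utility of the same name; it is not the shim's code.

```scala
import java.time.{LocalDateTime, ZoneOffset}

// Hypothetical sketch; names below are illustrative, not the shim's API.
object NtzLiteralSketch {
  private val MicrosPerSecond = 1000000L

  // Stand-in conversion: interpret micros since the epoch with no time zone shift.
  def microsToLocalDateTime(micros: Long): LocalDateTime = {
    // floorDiv/floorMod keep the nanosecond part non-negative for pre-1970 values.
    val seconds = Math.floorDiv(micros, MicrosPerSecond)
    val nanos = Math.floorMod(micros, MicrosPerSecond) * 1000L
    LocalDateTime.ofEpochSecond(seconds, nanos.toInt, ZoneOffset.UTC)
  }

  // Renders an NTZ literal as a readable string instead of a bare Long.
  def renderNtz(micros: Long): String = microsToLocalDateTime(micros).toString
}
```

Because the conversion never consults the JVM default time zone, the rendered string is the same on every host, which is the property that distinguishes it from going through java.sql.Timestamp.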

Comment on lines +38 to +42
ExprChecks.binaryProject(TypeSig.TIMESTAMP, TypeSig.TIMESTAMP,
  ("start", TypeSig.TIMESTAMP, TypeSig.TIMESTAMP),
  ("interval", TypeSig.DAYTIME + TypeSig.lit(TypeEnum.CALENDAR)
      .withPsNote(TypeEnum.CALENDAR, "month intervals are not supported"),
    TypeSig.DAYTIME + TypeSig.CALENDAR)),

Contributor

P2 TIMESTAMP_NTZ not covered by the new override

ExprChecks.binaryProject is registered with TypeSig.TIMESTAMP only, so TimestampAddInterval on TIMESTAMP_NTZ columns will silently fall back to CPU. In Spark 4.1 / DB 17.3, TimestampAddInterval supports both timestamp variants, and the CPU implementation is timezone-agnostic for TIMESTAMP_NTZ, so adding TypeSig.TIMESTAMP_NTZ to both the result and the start input sig should be straightforward. Is the omission intentional (preserving prior TimeAdd shim behaviour), or should TypeSig.TIMESTAMP_NTZ be added to both the result and start signatures?
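
The fallback behaviour in question can be pictured with a toy model of a type-signature gate. Everything in this sketch (ColType, Sig, placement) is hypothetical and deliberately much simpler than the plugin's TypeSig and ExprChecks; it only shows why a signature that lists one timestamp variant sends the other to CPU, and how a union of signatures would change that.

```scala
// Toy model only; these names are assumptions, not the spark-rapids TypeSig API.
object TypeSigSketch {
  sealed trait ColType
  case object Timestamp extends ColType
  case object TimestampNtz extends ColType

  // A signature is modelled as the set of types an override accepts for one parameter.
  final case class Sig(accepted: Set[ColType]) {
    def +(other: Sig): Sig = Sig(accepted ++ other.accepted)
    def supports(t: ColType): Boolean = accepted.contains(t)
  }

  val TIMESTAMP: Sig = Sig(Set[ColType](Timestamp))
  val TIMESTAMP_NTZ: Sig = Sig(Set[ColType](TimestampNtz))

  // An input type outside the signature means the expression stays on CPU.
  def placement(startSig: Sig, startType: ColType): String =
    if (startSig.supports(startType)) "GPU" else "CPU fallback"
}
```

In this model, placement(TIMESTAMP, TimestampNtz) yields "CPU fallback", while placement(TIMESTAMP + TIMESTAMP_NTZ, TimestampNtz) yields "GPU"; whether the real GPU kernel would handle NTZ inputs correctly is a separate question the sketch does not address.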

2 participants