Remove old Spark 4 shim sources and update tests#15038
Conversation
f9b5a85 to
7da249c
Compare
a16bac9 to
526b855
Compare
7da249c to
f41eeda
Compare
8a241ea to
de16c4d
Compare
3e22bb7 to
dd6902a
Compare
bc39c85 to
f21d8b4
Compare
3f348a1 to
c27f3a3
Compare
0deb6a7 to
e4fc381
Compare
c27f3a3 to
9f8ea3d
Compare
e4fc381 to
a19a1be
Compare
7b24086 to
207d6eb
Compare
a19a1be to
2081b5f
Compare
207d6eb to
517d325
Compare
2081b5f to
0e399a5
Compare
517d325 to
95559eb
Compare
b7c4280 to
f8dd2c4
Compare
95559eb to
6f205d5
Compare
f8dd2c4 to
f5bec86
Compare
6f205d5 to
916f019
Compare
f5bec86 to
489d80a
Compare
916f019 to
4e38c2c
Compare
489d80a to
f05fe36
Compare
f039201 to
fa220fe
Compare
f05fe36 to
d8de428
Compare
fa220fe to
b99a61c
Compare
d8de428 to
6d48d15
Compare
b99a61c to
b48ffdf
Compare
6d48d15 to
5ee22f3
Compare
b48ffdf to
2dfb359
Compare
168505f to
8e3712f
Compare
2dfb359 to
25ad0c1
Compare
bde56e7 to
91bc880
Compare
3e2628f to
021a80c
Compare
91bc880 to
b0846ff
Compare
0f6f59c to
7aae5ec
Compare
b0846ff to
19c3ec8
Compare
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
19c3ec8 to
b9557df
Compare
7aae5ec to
9e83c67
Compare
Greptile SummaryThis PR removes old Spark 4 shim source files from
Confidence Score: 4/5The structural cleanup is well-executed — all deleted service providers and utility shims have verified replacements in the new sql-plugin-shims module, and the case-class-to-class migrations are consistently applied across callers and tests. The bulk of the change is safe bookkeeping: deletions with confirmed replacements, blank-line padding, and test adaptations. The two items worth a second look are (1) GpuLiteralShim.jsonFields not covering TimestampNTZType — affects only explain plan readability, not query correctness — and (2) the new TimestampAddInterval override omitting TIMESTAMP_NTZ from its TypeSig, which may be intentional parity with the old TimeAdd shims but is worth confirming. sql-plugin/src/main/spark400/scala/com/nvidia/spark/rapids/shims/NullIntolerantShim.scala (GpuLiteralShim JSON serialization) and sql-plugin/src/main/spark400db173/scala/com/nvidia/spark/rapids/shims/TimeAddShims.scala (TIMESTAMP_NTZ coverage). Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[PR #15038: Remove old Spark 4 shim sources] --> B[Deletions from sql-plugin]
A --> C[In-place mutations in sql-plugin]
A --> D[Test updates]
B --> B1[SparkShimServiceProvider ×5\nspark400/401/402/400db173/411]
B --> B2[Utility shims ×7\nDateTimeUtilsShims, OriginContextShim\nSparkSessionUtils, TrampolineConnectShims\nShuffleManagerShims, ShuffleClientShims\nFileCommitProtocolShims]
B1 --> E[Replaced in sql-plugin-shims module]
B2 --> F[Replaced in sql-plugin-shims\nor unused — safe to drop]
C --> C1[InvokeExprMeta\ncase class → class\nExplicit lambda constructor in callers]
C --> C2[GpuGroupedPythonRunnerFactory\ncase class → class + Serializable\nargNames default removed]
C --> C3[TimeAddShims\nempty Map → TimestampAddInterval\nGPU override for 400db173 + 411]
C --> C4[NullIntolerantShim\nAdds GpuLiteralShim helper]
C --> C5[Blank-line padding\nBinary-dedupe alignment]
D --> D1[GpuSortRetrySuite\nPositional → named params]
D --> D2[GpuUnboundedToUnboundedAggWindowSuite\nnew-keyword constructors]
Reviews (1): Last reviewed commit: "Update tests for shared helper layout" | Re-trigger Greptile |
| case (l: Long, org.apache.spark.sql.types.TimestampType) => | ||
| org.json4s.JsonAST.JString( | ||
| org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaTimestamp(l).toString) |
There was a problem hiding this comment.
Missing
TimestampNTZType case in jsonFields
GpuLiteralShim.jsonFields handles TimestampType (LTZ) via DateTimeUtils.toJavaTimestamp, but TimestampNTZType falls through to the catch-all other branch, which calls .toString on the raw Long microsecond value. Any GpuLiteral with a TIMESTAMP_NTZ type will therefore serialize as a bare integer in explain plans and plan JSON output rather than a human-readable timestamp string. The parallel path for DateType already shows the expected pattern — TIMESTAMP_NTZ would need something like DateTimeUtils.microsToLocalDateTime instead of toJavaTimestamp (which would incorrectly apply a timezone offset).
| ExprChecks.binaryProject(TypeSig.TIMESTAMP, TypeSig.TIMESTAMP, | ||
| ("start", TypeSig.TIMESTAMP, TypeSig.TIMESTAMP), | ||
| ("interval", TypeSig.DAYTIME + TypeSig.lit(TypeEnum.CALENDAR) | ||
| .withPsNote(TypeEnum.CALENDAR, "month intervals are not supported"), | ||
| TypeSig.DAYTIME + TypeSig.CALENDAR)), |
There was a problem hiding this comment.
TIMESTAMP_NTZ not covered by the new override
ExprChecks.binaryProject is registered with TypeSig.TIMESTAMP only, so TimestampAddInterval on TIMESTAMP_NTZ columns will silently fall back to CPU. In Spark 4.1 / DB 17.3, TimestampAddInterval supports both timestamp variants — the CPU implementation is timezone-ignorant for TIMESTAMP_NTZ, making it straightforward to add TypeSig.TIMESTAMP_NTZ to both the result and the start input sig. Worth confirming whether this omission is intentional (matching old TimeAdd shim behaviour) or an oversight given the new override. Is the omission of TIMESTAMP_NTZ intentional here (preserving prior TimeAdd behaviour) or should TypeSig.TIMESTAMP_NTZ be added to both the result and start signatures?
Related to #14834.
Description
This PR is one reviewable layer in the unshim stack introduced by #15025. It removes old Spark 4 shim sources and updates tests for the shared helper layout. This is the final shim-source cleanup layer before the Delta/Iceberg follow-up.
Stack context
Testing and validation notes
Checklists
Documentation
Testing
(Please provide the names of the existing tests in the PR description.)
Performance