Adapt Delta and Iceberg to unshimmed helpers #15027
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
Greptile Summary
This PR adapts Delta Lake and Iceberg integration points to the unshimmed helper layout, primarily by mechanically converting
Confidence Score: 4/5
The conversion is largely mechanical and correct, but
Almost all converted types are either provably local (e.g.
delta-lake/common/src/main/scala/com/nvidia/spark/rapids/delta/GpuDeltaTaskStatisticsTracker.scala - verify
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Executor
participant GpuDeltaTaskStatisticsTracker
participant GpuDeltaFileStatistics
participant Spark Task Result
participant Driver
participant GpuDeltaJobStatisticsTracker
Executor->>GpuDeltaTaskStatisticsTracker: getFinalStats()
GpuDeltaTaskStatisticsTracker->>GpuDeltaFileStatistics: new GpuDeltaFileStatistics(results.toMap)
GpuDeltaFileStatistics-->>Executor: stats object
Executor->>Spark Task Result: serialize(WriteTaskStats)
Note over GpuDeltaFileStatistics,Spark Task Result: Missing Serializable — was auto-provided by case class
Spark Task Result->>Driver: send serialized task result
Driver->>GpuDeltaJobStatisticsTracker: processStats(Seq[WriteTaskStats])
Reviews (1): Last reviewed commit: "Adapt Delta and Iceberg to unshimmed hel..."
   * of the collected statistics.
   */
- case class GpuDeltaFileStatistics(stats: Map[String, String]) extends WriteTaskStats
+ class GpuDeltaFileStatistics(val stats: Map[String, String]) extends WriteTaskStats
Missing Serializable after case-class-to-class conversion
A Scala case class automatically mixes in java.io.Serializable; converting it to a plain class removes that. WriteTaskStats is a plain sealed trait (no Serializable), so GpuDeltaFileStatistics is no longer serializable. Spark collects WriteTaskStats from executors to the driver as part of the task result, so a NotSerializableException would surface during any Delta write that uses stats collection. Every other converted class in this PR that needs to cross a serialization boundary (GpuIncrementMetricMeta, DeltaParquetChunkedReader, DeltaParquetTableReader) explicitly adds `with Serializable`. Delta Lake's own DeltaFileStatistics remains a case class for the same reason.
Suggested change
- class GpuDeltaFileStatistics(val stats: Map[String, String]) extends WriteTaskStats
+ class GpuDeltaFileStatistics(val stats: Map[String, String]) extends WriteTaskStats with Serializable
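The concern raised in this comment can be reproduced in isolation. The following is a minimal sketch, not code from this PR: `TaskStats`, `CaseStats`, `PlainStats`, `FixedStats`, and `SerializationCheck` are hypothetical names, and `TaskStats` is deliberately declared without `Serializable` to match the assumption the review makes about the base trait. Whether Spark's actual `WriteTaskStats` already extends `Serializable` should be checked against the Spark version in use; if it does, the plain class would inherit the marker and the suggested change would be redundant but harmless.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a stats trait that does NOT extend Serializable.
trait TaskStats

// Case classes get Serializable mixed in by the compiler.
case class CaseStats(stats: Map[String, String]) extends TaskStats

// A plain class gets no Serializable marker from this trait.
class PlainStats(val stats: Map[String, String]) extends TaskStats

// A plain class with the marker restored explicitly.
class FixedStats(val stats: Map[String, String]) extends TaskStats with Serializable

object SerializationCheck {
  /** Returns true if Java serialization of `obj` succeeds. */
  def javaSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val data = Map("numRecords" -> "3")
    println(javaSerializable(CaseStats(data)))      // case class
    println(javaSerializable(new PlainStats(data))) // plain class, no marker
    println(javaSerializable(new FixedStats(data))) // plain class with Serializable
  }
}
```

Under these assumptions, only `PlainStats` fails the check. A unit test of this shape against the real `GpuDeltaFileStatistics` would settle the question for the Spark versions the plugin supports.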
Related to #14834.
Description
This PR is one reviewable layer in the unshim stack introduced by #15025. It adapts Delta and Iceberg integration points to the unshimmed helper layout after the core SQL plugin helper movement and cleanup layers are in place.
Stack context
Testing and validation notes
Checklists
Documentation
Testing
(Covered by the validation notes in the PR description.)
Performance