Skip to content

Adapt Delta and Iceberg to unshimmed helpers#15027

Open
gerashegalov wants to merge 1 commit into
codex/unshim-stack-02w-shim-cleanup-40-testsfrom
codex/unshim-stack-03-delta-iceberg
Open

Adapt Delta and Iceberg to unshimmed helpers#15027
gerashegalov wants to merge 1 commit into
codex/unshim-stack-02w-shim-cleanup-40-testsfrom
codex/unshim-stack-03-delta-iceberg

Conversation

@gerashegalov

@gerashegalov gerashegalov commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Related to #14834.

Description

This PR is one reviewable layer in the unshim stack introduced by #15025. It adapts Delta and Iceberg integration points to the unshimmed helper layout after the core SQL plugin helper movement and cleanup layers are in place.

Stack context

Testing and validation notes

  • Covered by the full-stack packaging/build validation described in Add default common unshim packaging flow #15025 and the existing Delta/Iceberg test coverage for the affected integrations.
  • The full split stack was verified to be tree-equivalent to the pre-split stack top.

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Covered by the validation notes in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

@gerashegalov gerashegalov changed the title codex/unshim stack 03 delta iceberg Adapt Delta and Iceberg to unshimmed helpers Jun 10, 2026
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02-sql-plugin-modules branch from 8e3496f to 9c05b41 Compare June 10, 2026 15:08
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-03-delta-iceberg branch from aabd751 to 6ebd35c Compare June 10, 2026 15:08
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02-sql-plugin-modules branch from 9c05b41 to 24e3c48 Compare June 10, 2026 15:43
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-03-delta-iceberg branch from 6ebd35c to f608c12 Compare June 10, 2026 15:44
@gerashegalov gerashegalov changed the base branch from codex/unshim-stack-02-sql-plugin-modules to codex/unshim-stack-02w-shim-cleanup-40-tests June 10, 2026 15:54
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-03-delta-iceberg branch from f608c12 to f04280c Compare June 10, 2026 20:49
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from 526b855 to 8a241ea Compare June 10, 2026 21:13
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-03-delta-iceberg branch from f04280c to 2aad668 Compare June 10, 2026 21:13
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from 8a241ea to de16c4d Compare June 10, 2026 21:32
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-03-delta-iceberg branch from 2aad668 to 7918b20 Compare June 10, 2026 21:32
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from de16c4d to bc39c85 Compare June 10, 2026 21:36
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-03-delta-iceberg branch from 7918b20 to 9974576 Compare June 10, 2026 21:36
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from bc39c85 to f21d8b4 Compare June 10, 2026 22:20
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-03-delta-iceberg branch 2 times, most recently from bc3dcf9 to 09556c1 Compare June 10, 2026 22:37
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch 2 times, most recently from 0deb6a7 to e4fc381 Compare June 10, 2026 22:41
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-03-delta-iceberg branch 2 times, most recently from 951af07 to d0da6bd Compare June 10, 2026 22:46
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from e4fc381 to a19a1be Compare June 10, 2026 22:46
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-03-delta-iceberg branch from d0da6bd to 9523c22 Compare June 10, 2026 22:59
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from a19a1be to 2081b5f Compare June 10, 2026 22:59
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-03-delta-iceberg branch from 9523c22 to 6a78825 Compare June 10, 2026 23:12
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from 2081b5f to 0e399a5 Compare June 10, 2026 23:12
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-03-delta-iceberg branch from 389bcfb to 1c955c0 Compare June 10, 2026 23:29
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from f8dd2c4 to f5bec86 Compare June 10, 2026 23:33
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-03-delta-iceberg branch 2 times, most recently from 57a7219 to a1e022a Compare June 10, 2026 23:48
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from f5bec86 to 489d80a Compare June 10, 2026 23:48
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-03-delta-iceberg branch from a1e022a to fa4c038 Compare June 10, 2026 23:59
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch 2 times, most recently from f05fe36 to d8de428 Compare June 11, 2026 00:25
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-03-delta-iceberg branch 4 times, most recently from e9c90de to afbcd10 Compare June 11, 2026 01:18
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch 2 times, most recently from 168505f to 8e3712f Compare June 11, 2026 01:32
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-03-delta-iceberg branch from afbcd10 to 632eb14 Compare June 11, 2026 01:32
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from 8e3712f to bde56e7 Compare June 11, 2026 01:43
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-03-delta-iceberg branch from 632eb14 to d630815 Compare June 11, 2026 01:43
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from bde56e7 to 91bc880 Compare June 11, 2026 01:58
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-03-delta-iceberg branch from d630815 to 7c174ff Compare June 11, 2026 01:58
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from 91bc880 to b0846ff Compare June 11, 2026 02:26
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-03-delta-iceberg branch 3 times, most recently from 2c95704 to 2db9df3 Compare June 13, 2026 12:13
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from b0846ff to 19c3ec8 Compare June 13, 2026 12:13
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-02w-shim-cleanup-40-tests branch from 19c3ec8 to b9557df Compare June 13, 2026 12:20
@gerashegalov gerashegalov force-pushed the codex/unshim-stack-03-delta-iceberg branch from 2db9df3 to e557937 Compare June 13, 2026 12:20
@gerashegalov gerashegalov marked this pull request as ready for review June 13, 2026 12:49
@greptile-apps

greptile-apps Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adapts Delta Lake and Iceberg integration points to the unshimmed helper layout, primarily by mechanically converting case class definitions to plain class with explicit val fields across both modules. It also adds reflection-based accessors for abort, commitOperation, and taskCommitFiles to GpuSparkWriteAccess, and threads the corresponding wrapper methods through GpuSparkWrite.

  • All case classclass conversions add explicit val accessors; classes that provably need to cross JVM serialization boundaries (GpuIncrementMetricMeta, DeltaParquetChunkedReader, DeltaParquetTableReader) correctly add with Serializable.
  • DeleteWriteContext.close() replaces the productIterator-based close with an explicit Seq[AutoCloseable]; a manual copy() with default-parameter semantics is added to preserve existing call sites.
  • New invokeMethod/findMethod helpers in GpuSparkWriteAccess follow the existing readField reflection pattern with proper exception unwrapping and class-hierarchy traversal.

Confidence Score: 4/5

The conversion is largely mechanical and correct, but GpuDeltaFileStatistics lost its implicit Serializable from the case-class conversion without getting an explicit replacement, unlike other converted classes in the same PR.

Almost all converted types are either provably local (e.g. DeltaWriteV1Config, StartTransactionArg) or explicitly gained with Serializable (e.g. GpuIncrementMetricMeta, DeltaParquetChunkedReader). GpuDeltaFileStatistics is the outlier: it carries write stats back from executors to the driver, and its WriteTaskStats supertrait does not extend Serializable, meaning the now-absent case-class auto-Serializable was load-bearing. Everything else — including the new reflection accessors in GpuSparkWriteAccess, the manual copy() on DeleteWriteContext, and the DeltaBatchExtraInfo constructor-call expansion — looks correct.

delta-lake/common/src/main/scala/com/nvidia/spark/rapids/delta/GpuDeltaTaskStatisticsTracker.scala — verify GpuDeltaFileStatistics serialization requirement and add with Serializable if confirmed.

Important Files Changed

Filename Overview
delta-lake/common/src/main/scala/com/nvidia/spark/rapids/delta/GpuDeltaTaskStatisticsTracker.scala Converts GpuDeltaFileStatistics from case class to plain class; loses auto-Serializable without explicitly adding with Serializable, unlike peer converted classes in this PR.
delta-lake/common/src/main/delta-33x-41x/scala/com/nvidia/spark/rapids/delta/common/GpuDeltaParquetFileFormatBase2.scala Mechanical case-class-to-class conversion for several inner types; DeltaBatchExtraInfo.loadedDVResults and DeltaParquetHostMemoryEmptyMetaData.allPartValues lose their defaults and are now explicit at every call site. Manual consumeHeadBuffer reconstruction and withLoadedDVResults correctly pass all fields.
iceberg/common/src/main/java/org/apache/iceberg/spark/source/GpuSparkWriteAccess.java Adds reflection-based abort, commitOperation, taskCommitFiles accessors; invokeMethod/findMethod helpers follow existing readField pattern, with proper hierarchy traversal and exception unwrapping.
iceberg/common/src/main/scala/org/apache/iceberg/spark/source/GpuSparkPositionDeltaWrite.scala Converts DeleteWriteContext from case class to plain class; explicit copy() method correctly mirrors case-class semantics, and close() now uses an explicit Seq[AutoCloseable] instead of productIterator.
delta-lake/common/src/main/delta-33x-41x/scala/com/nvidia/spark/rapids/delta/common/DeltaProviderBase.scala Converts GpuIncrementMetricMeta from case class to plain class; correctly adds with Serializable since ExprMeta subclasses can be serialized as part of query plans.
iceberg/common/src/main/scala/org/apache/iceberg/spark/source/GpuSparkWrite.scala Adds abort, files, and commitOperation wrapper methods delegating to GpuSparkWriteAccess; correct delegation pattern.
delta-lake/common/src/main/databricks/scala/com/nvidia/spark/rapids/delta/DatabricksDeltaProviderBase.scala Local-only DeltaWriteV1Config converted from case class to plain class; correct accessor semantics maintained with explicit val fields.
delta-lake/common/src/main/delta-io/scala/org/apache/spark/sql/delta/rapids/DeltaRuntimeShim.scala Converts StartTransactionArg from case class to plain class; function-scoped argument object, no serialization required, correct.
iceberg/common/src/main/scala/com/nvidia/spark/rapids/iceberg/GpuIcebergPartitioner.scala Converts ColumnarBatchWithPartition from case class to plain class; AutoCloseable semantics preserved.

Sequence Diagram

sequenceDiagram
    participant Executor
    participant GpuDeltaTaskStatisticsTracker
    participant GpuDeltaFileStatistics
    participant Spark Task Result
    participant Driver
    participant GpuDeltaJobStatisticsTracker

    Executor->>GpuDeltaTaskStatisticsTracker: getFinalStats()
    GpuDeltaTaskStatisticsTracker->>GpuDeltaFileStatistics: new GpuDeltaFileStatistics(results.toMap)
    GpuDeltaFileStatistics-->>Executor: stats object
    Executor->>Spark Task Result: serialize(WriteTaskStats)
    Note over GpuDeltaFileStatistics,Spark Task Result: Missing Serializable — was auto-provided by case class
    Spark Task Result->>Driver: send serialized task result
    Driver->>GpuDeltaJobStatisticsTracker: processStats(Seq[WriteTaskStats])
Loading

Reviews (1): Last reviewed commit: "Adapt Delta and Iceberg to unshimmed hel..." | Re-trigger Greptile

* of the collected statistics.
*/
case class GpuDeltaFileStatistics(stats: Map[String, String]) extends WriteTaskStats
class GpuDeltaFileStatistics(val stats: Map[String, String]) extends WriteTaskStats

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Missing Serializable after case-class-to-class conversion

Scala case class automatically mixes in java.io.Serializable. Converting to a plain class removes that. WriteTaskStats is a plain sealed trait (no Serializable), so GpuDeltaFileStatistics is no longer serializable. Spark collects WriteTaskStats from executors to the driver as part of the task result, so a NotSerializableException would surface during any Delta write that uses stats collection. Every other converted class in this PR that needs to cross a serialization boundary (GpuIncrementMetricMeta, DeltaParquetChunkedReader, DeltaParquetTableReader) explicitly adds with Serializable. Delta Lake's own DeltaFileStatistics remains a case class for the same reason.

Suggested change
class GpuDeltaFileStatistics(val stats: Map[String, String]) extends WriteTaskStats
class GpuDeltaFileStatistics(val stats: Map[String, String]) extends WriteTaskStats with Serializable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants