Iceberg 1.11 support for Spark 411, part (2/3): add iceberg-1-11-x module #14882

Open
res-life wants to merge 4 commits into NVIDIA:main from res-life:iceberg-1.11/pr2-new-module-and-411

@res-life res-life commented May 26, 2026

Collaborator

Part (2/3) of #14853. Part (1/3) (#14881) is merged; this PR is now based
directly on main. Part (3/3) (#14883) stacks on this one.

Description

Apache Iceberg has published iceberg-spark-runtime-4.1_2.13 only since version 1.11.0 (apache/iceberg#14155 added Spark 4.1 support and shipped in 1.11.0). iceberg-1-10-x is therefore not an option on Spark 4.1: Iceberg never released a 4.1 runtime before 1.11.

Adds a new Maven submodule iceberg/iceberg-1-11-x and switches the release411 profile from iceberg/iceberg-stub to use it.

Module skeleton (mirrors iceberg-1-10-x with the iceberg111x sub-package and a spark411 shim source dir):

  • iceberg/iceberg-1-11-x/pom.xml + scala2.13 mirror
  • iceberg111x/IcebergProviderImpl, ShimUtilsImpl, GpuParquetIOShim
  • org.apache.iceberg.spark.source.GpuSparkCopyOnWriteScan — Iceberg 1.11 copy-on-write scan: SupportsRuntimeV2Filtering with filter(Predicate[])
  • spark411/.../GpuInternalRow overrides the new SpecializedGetters methods Spark 4.1 added (getGeometry, getGeography) alongside the existing getVariant

Wiring:

  • parent pom: add spark41x.iceberg.artifact.suffix=4.1 and iceberg.111x.version=1.11.0; swap release411 from iceberg-stub to iceberg-1-11-x
  • scala2.13 mirror regenerated via build/make-scala-version-build-files.sh
  • IcebergProbeImpl: lift the < 4.1.0 Spark cap to < 4.2.0; add the 1.11.0 commit-id mapping and "1.11" -> "iceberg111x" shim sub-package
  • iceberg/README.md: document the new row in the Iceberg/Spark support matrix
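
As a rough illustration of the IcebergProbeImpl change described above, the sketch below maps an Iceberg "major.minor" version to a shim sub-package and applies an exclusive Spark cap of 4.2.0. It is not the real IcebergProbeImpl: the class `ShimPackageSketch` and its method names are hypothetical, the lower Spark bound and the commit-id mapping are omitted, and only the `"1.11" -> "iceberg111x"` entry is quoted from this description (the other entries are inferred from the existing module names).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for the version-to-shim lookup; not the real IcebergProbeImpl. */
final class ShimPackageSketch {
    // Iceberg "major.minor" -> shim sub-package. Only the 1.11 entry is quoted from the
    // PR description; the other three are assumptions based on the module names.
    private static final Map<String, String> SHIM_PACKAGES = new LinkedHashMap<>();
    static {
        SHIM_PACKAGES.put("1.6", "iceberg16x");
        SHIM_PACKAGES.put("1.9", "iceberg19x");
        SHIM_PACKAGES.put("1.10", "iceberg110x");
        SHIM_PACKAGES.put("1.11", "iceberg111x");
    }

    /** Parses strings such as "4.1.0" or "4.1.0-SNAPSHOT" into {major, minor}. */
    static int[] majorMinor(String version) {
        String[] parts = version.split("[.-]");
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    /** True when the Spark version is below the exclusive 4.2.0 cap. */
    static boolean sparkBelowCap(String sparkVersion) {
        int[] v = majorMinor(sparkVersion);
        return v[0] < 4 || (v[0] == 4 && v[1] < 2);
    }

    /** Returns the shim sub-package, or empty when Spark or Iceberg is out of range. */
    static Optional<String> shimPackage(String sparkVersion, String icebergVersion) {
        if (!sparkBelowCap(sparkVersion)) {
            return Optional.empty();
        }
        int[] v = majorMinor(icebergVersion);
        return Optional.ofNullable(SHIM_PACKAGES.get(v[0] + "." + v[1]));
    }

    public static void main(String[] args) {
        System.out.println(shimPackage("4.1.0", "1.11.0")); // Optional[iceberg111x]
        System.out.println(shimPackage("4.2.0", "1.11.0")); // Optional.empty
    }
}
```

Keying the table on "major.minor" means a patch release such as 1.11.1 would resolve to the same sub-package without a new entry; whether the real probe behaves that way is not stated here.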

Enable iceberg integration tests on Spark 4.1.x:

  • spark_session.py: add is_spark_41x() and include 4.1.x in is_iceberg_supported_spark()
  • iceberg/__init__.py: update the skip reason to mention 4.1.x
  • iceberg_test.py::test_iceberg_read_timetravel: Iceberg 1.11 removed the .option("snapshot-id", ...) read API and directs users at Spark's built-in versionAsOf (works on both 1.10 and 1.11)

Also folds in two non-blocking review nits from the merged part (1/3) (#14881):

  • Dedup the V1 copy-on-write scan: the identical 1.6.x/1.9.x/1.10.x GpuSparkCopyOnWriteScan classes are replaced by a single common GpuSparkCopyOnWriteV1Scan in iceberg/common, instantiated by all three V1 ShimUtilsImpls. This is possible because, after #14866 ("Fix Iceberg package-private access after shim isolation"), those classes depend only on the public Scan and SupportsRuntimeFiltering interfaces. Only the 1.11 V2 path keeps a version-specific class.
  • GpuSparkScanAccess.branch() reads the private branch field directly instead of the removed-in-1.11 SparkScan.branch() method, so it resolves across 1.6.x–1.11.x (was returning null on 1.11).
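
The field-based lookup in the second nit can be sketched roughly as follows. This is a minimal illustration, not the actual GpuSparkScanAccess code: `FieldReflectionSketch`, `BaseScan`, `CopyOnWriteScan`, and `readStringField` are hypothetical names, and only the field name `branch` comes from the PR. The helper walks up the class hierarchy until it finds the declared field and returns null on any access failure.

```java
import java.lang.reflect.Field;

/** Hypothetical sketch of reading a private field declared on a superclass. */
final class FieldReflectionSketch {
    // Stand-ins for a scan class hierarchy; not Iceberg classes.
    static class BaseScan {
        private final String branch;
        BaseScan(String branch) { this.branch = branch; }
    }

    static class CopyOnWriteScan extends BaseScan {
        CopyOnWriteScan(String branch) { super(branch); }
    }

    /** Returns the field's string value, or null if it is absent, null, or inaccessible. */
    static String readStringField(Object target, String name) {
        for (Class<?> c = target.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Field f = c.getDeclaredField(name);
                f.setAccessible(true);
                Object v = f.get(target);
                return v == null ? null : v.toString();
            } catch (NoSuchFieldException e) {
                // Not declared on this class; continue with the superclass.
            } catch (IllegalAccessException | RuntimeException e) {
                // Any access failure degrades to null.
                return null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(readStringField(new CopyOnWriteScan("main"), "branch")); // main
    }
}
```

Because `getDeclaredField` only sees fields declared on the exact class, the superclass loop is what lets a subclass instance resolve a field declared on a parent.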

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
    (iceberg/README.md adds the Iceberg 1.11.x / Spark 4.1.x row to the support matrix.)
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
    (spark_session.py + iceberg/__init__.py re-enable the iceberg suite on Spark 4.1.x; test_iceberg_read_timetravel switched to versionAsOf.)
  • Covered by existing tests
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

@res-life res-life force-pushed the iceberg-1.11/pr2-new-module-and-411 branch 4 times, most recently from 6491216 to b276ae1 on June 5, 2026 02:50
@res-life res-life force-pushed the iceberg-1.11/pr2-new-module-and-411 branch from b276ae1 to c04c056 on June 5, 2026 03:25
@res-life res-life changed the title from "Iceberg: add iceberg-1-11-x module wired to release411 (Spark 4.1)" to "Iceberg 1.11 support for Spark 411, part (2/3): add iceberg-1-11-x module" on Jun 5, 2026
…dule

Adds the iceberg-1-11-x module and wires it into the release411 profile so
Spark 4.1.x uses Iceberg 1.11 (Iceberg only publishes iceberg-spark-runtime-4.1
from 1.11.0 on; iceberg-1-10-x is not an option on 4.1).

Module:
- iceberg/iceberg-1-11-x (pom + scala2.13 mirror), iceberg111x sub-package
  (IcebergProviderImpl, ShimUtilsImpl, GpuParquetIOShim), the 1.11 V2
  copy-on-write scan, and a spark411 GpuInternalRow overriding the new
  SpecializedGetters methods (getGeometry/getGeography) added in Spark 4.1.
- Parent pom: spark41x.iceberg.artifact.suffix=4.1, iceberg.111x.version=1.11.0;
  release411 swapped from iceberg-stub to iceberg-1-11-x.
- IcebergProbeImpl: lift the Spark cap to < 4.2.0; add the 1.11.0 commit-id
  mapping and the "1.11" -> "iceberg111x" shim sub-package.
- README + integration_tests wiring (is_spark_41x(), versionAsOf timetravel).

Folds in two non-blocking review nits from the merged part (1/3) (NVIDIA#14881):
- Dedup the V1 copy-on-write scan into a single common GpuSparkCopyOnWriteV1Scan
  in iceberg/common, instantiated by all three V1 ShimUtilsImpls. Only the 1.11
  V2 path keeps a version-specific class.
- GpuSparkScanAccess.branch() reads the private `branch` field directly instead
  of the removed-in-1.11 SparkScan.branch() method (was returning null on 1.11).

Review follow-ups on this PR:
- iceberg_version_detection_test.py now gates on iceberg_unsupported_mark
  (3.5/4.0/4.1) instead of is_spark_35x, and jenkins/spark-premerge-build.sh
  runs the version-detection test for Spark 4.1 with Iceberg 1.11.0, so the new
  1.11.0 commit-id mapping in IcebergProbeImpl gets automated coverage.
- New iceberg111x/IcebergProviderImpl.scala uses the current-year-only copyright
  header per the new-file convention.

Signed-off-by: Chong Gao <res_life@163.com>
@res-life res-life force-pushed the iceberg-1.11/pr2-new-module-and-411 branch from c04c056 to 911e7cc on June 5, 2026 07:20
res-life commented Jun 5, 2026

Collaborator Author

build

@res-life res-life marked this pull request as ready for review June 5, 2026 08:45
@res-life res-life requested a review from a team as a code owner June 5, 2026 08:45
greptile-apps Bot commented Jun 5, 2026

Contributor

Greptile Summary

Adds the iceberg-1-11-x Maven submodule to wire Apache Iceberg 1.11.0 support into the Spark 4.1 (release411) build profile, replacing the previous stub. The module skeleton follows the established pattern of prior Iceberg shims and includes a Spark 4.1-specific GpuInternalRow that overrides getGeometry and getGeography, the SpecializedGetters methods new in Spark 4.1, alongside the existing getVariant.

  • V2 filter path for Iceberg 1.11: the new GpuSparkCopyOnWriteScan correctly implements SupportsRuntimeV2Filtering with filter(Array[Predicate]), matching Iceberg 1.11's changed copy-on-write API; the V1 scan is consolidated into iceberg/common as GpuSparkCopyOnWriteV1Scan and shared by the 1.6/1.9/1.10 shims.
  • branch() reflection fix in GpuSparkScanAccess: reads the private branch field directly (traversing the class hierarchy) instead of the removed SparkScan.branch() method, catching both IllegalAccessException and RuntimeException so module-encapsulation failures degrade to null on this display-only path.
  • Integration-test enablement: is_iceberg_supported_spark() now includes Spark 4.1.x; test_iceberg_read_timetravel switches from the Iceberg-1.11-removed snapshot-id option to the standard Spark versionAsOf option, which is backward-compatible with 1.6.x–1.10.x.

Confidence Score: 5/5

All new and modified code follows the established iceberg-shim patterns; no GPU resource, OOM-retry, or correctness concerns in the changed paths.

The new iceberg-1-11-x module is a faithful port of iceberg-1-10-x with targeted API changes for Iceberg 1.11 (SupportsRuntimeV2Filtering, renamed fields). Resource management in GpuParquetIOShim.scala uses withResource/closeOnExcept correctly. The branch() reflection fix properly broadens the exception catch. The test change from snapshot-id to versionAsOf is semantically equivalent for Iceberg tables across all supported versions. The V1Scan consolidation is clean and all three V1 ShimUtilsImpls are updated consistently.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| iceberg/common/src/main/java/org/apache/iceberg/spark/source/GpuSparkScanAccess.java | branch() now reads the private branch field via reflection instead of the removed SparkScan.branch() method; catches both IllegalAccessException and RuntimeException (covering InaccessibleObjectException), correctly addressing the previous review thread |
| iceberg/iceberg-1-11-x/src/main/java/com/nvidia/spark/rapids/iceberg/iceberg111x/ShimUtilsImpl.java | New Iceberg 1.11.x shim; mirrors iceberg-1-10-x structure with SparkUtil::internalToSpark, storage-credential overlays, and delegates to GpuParquetIOShim for Parquet reader; correctly calls GpuSparkCopyOnWriteScan (V2 path) |
| iceberg/iceberg-1-11-x/src/main/scala/org/apache/iceberg/spark/source/GpuSparkCopyOnWriteScan.scala | Iceberg 1.11 copy-on-write scan correctly implementing SupportsRuntimeV2Filtering with filter(Array[Predicate]); moved/evolved from the 1.10.x SupportsRuntimeFiltering version |
| iceberg/common/src/main/scala/org/apache/iceberg/spark/source/GpuSparkCopyOnWriteV1Scan.scala | V1 copy-on-write scan consolidated from iceberg-1-6-x into common; cleanly renamed and updated companion object; shared by 1.6.x, 1.9.x, and 1.10.x ShimUtilsImpls |
| iceberg/iceberg-1-11-x/src/main/spark411/java/com/nvidia/spark/rapids/iceberg/GpuInternalRow.java | Spark 4.1 shim for GpuInternalRow; adds delegation for getVariant plus getGeometry and getGeography, the two SpecializedGetters methods new in Spark 4.1, following the established shim pattern with a proper spark-rapids-shim-json-lines header |
| iceberg/iceberg-1-11-x/src/main/scala/com/nvidia/spark/rapids/iceberg/iceberg111x/GpuParquetIOShim.scala | Parquet footer reader shim for Iceberg 1.11.x; resource management via withResource/closeOnExcept follows project conventions; mirrors iceberg-1-10-x pattern |
| iceberg/common/src/main/scala/com/nvidia/spark/rapids/iceberg/IcebergProbeImpl.scala | Spark cap lifted from 4.1.0 to 4.2.0; 1.11.0 commit-ID mapping and iceberg111x shim-package entry added; straightforward extension of existing pattern |
| integration_tests/src/main/python/iceberg/iceberg_test.py | test_iceberg_read_timetravel switched from deprecated snapshot-id option to versionAsOf; semantically equivalent for Iceberg tables and compatible with both 1.10.x and 1.11.x |
| pom.xml | release411 profile: switches iceberg dependency from iceberg-stub to iceberg-1-11-x, adds spark41x.iceberg.artifact.suffix=4.1 and iceberg.111x.version=1.11.0 |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[IcebergProbeImpl] -->|probes classpath jar| B{Iceberg version?}
    B -->|1.6.x| C[iceberg16x.ShimUtilsImpl]
    B -->|1.9.x| D[iceberg19x.ShimUtilsImpl]
    B -->|1.10.x| E[iceberg110x.ShimUtilsImpl]
    B -->|1.11.x| F[iceberg111x.ShimUtilsImpl]
    C -->|newCopyOnWriteScan| G[GpuSparkCopyOnWriteV1Scan\ncommon — SupportsRuntimeFiltering]
    D -->|newCopyOnWriteScan| G
    E -->|newCopyOnWriteScan| G
    F -->|newCopyOnWriteScan| H[GpuSparkCopyOnWriteScan\niceberg-1-11-x — SupportsRuntimeV2Filtering]
    G --> I[GpuSparkScanAccess.branch\nfield reflection - works 1.6-1.11]
    H --> I

Reviews (3): Last reviewed commit: "Note Spark 4.1 iceberg version-detect is..."

Comment on lines +84 to +90
    try {
      f.setAccessible(true);
      Object v = f.get(target);
      return v == null ? null : v.toString();
    } catch (IllegalAccessException e) {
      return null;
    }


P2 setAccessible can throw uncaught InaccessibleObjectException

f.setAccessible(true) can throw InaccessibleObjectException (which extends RuntimeException, not IllegalAccessException) in Java 9+ environments with module encapsulation. Only IllegalAccessException is caught here, so a security or module access rejection would propagate as an uncaught exception. Since branch() is used only for display purposes (query description), the catch block should also cover RuntimeException (or InaccessibleObjectException) to maintain the existing "return null on any access failure" contract.
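
A small standalone probe can make the failure mode concrete. The sketch below is illustrative only: `SetAccessibleSketch` and its methods are hypothetical, and it uses the private `value` field of `java.lang.String` purely as a convenient JDK-internal target. On a JVM that strongly encapsulates `java.base` (the default since Java 16, unless `--add-opens` is passed), `setAccessible(true)` is expected to throw InaccessibleObjectException; on older or explicitly opened JVMs it succeeds.

```java
import java.lang.reflect.Field;
import java.lang.reflect.InaccessibleObjectException;

/** Hypothetical probe showing which exception setAccessible raises, if any. */
final class SetAccessibleSketch {
    /** Returns the thrown exception's class name, or "none" if access was granted. */
    static String tryOpenJdkInternalField() {
        try {
            Field f = String.class.getDeclaredField("value");
            // setAccessible does not declare IllegalAccessException; its failures
            // are unchecked, so a catch for IllegalAccessException alone misses them.
            f.setAccessible(true);
            return "none";
        } catch (NoSuchFieldException e) {
            return e.getClass().getName();
        } catch (InaccessibleObjectException | SecurityException e) {
            return e.getClass().getName();
        }
    }

    /** True because InaccessibleObjectException is a RuntimeException subclass. */
    static boolean isUnchecked() {
        return RuntimeException.class.isAssignableFrom(InaccessibleObjectException.class);
    }

    public static void main(String[] args) {
        System.out.println(tryOpenJdkInternalField());
        System.out.println(isUnchecked());
    }
}
```

Since the result depends on the JVM version and launch flags, the probe reports what happened rather than assuming one outcome.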

@res-life res-life requested a review from a team June 5, 2026 08:55
Chong Gao and others added 3 commits June 10, 2026 13:03
…anch()

The branch() field-reflection fallback documents a "return null on any
access failure" contract but only caught IllegalAccessException. On Java
9+ (Spark 4.1 requires Java 17+) f.setAccessible(true) can also throw
InaccessibleObjectException (a RuntimeException) under module
encapsulation, or SecurityException, which would escape uncaught from
this display-only path. Widen the catch to also cover RuntimeException so
the documented contract holds across module-encapsulated JVMs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Chong Gao <res_life@163.com>
run_iceberg_version_detect_tests() now maps Spark 4.1 to Iceberg 1.11.0,
but run_iceberg_tests() in spark-tests.sh has no 4.1 entry yet, so the
"must stay in sync" comment over-claimed. The full 4.1 integration-suite
entry lands in the stacked follow-up PR; note that in the comment so the
stated invariant is accurate for this part.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Chong Gao <res_life@163.com>
…mment

ci_scala213() invokes run_iceberg_version_detect_tests() with SPARK_VER=4.0.1,
so the 4.1 -> 1.11.0 branch never runs in pre-merge CI; the 1.11.0 commit-ID
mapping is only exercised in nightly until the stacked follow-up PR adds the
full 4.1 integration suite. Update the sync comment to state this accurately.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Chong Gao <res_life@163.com>