Iceberg 1.11 support for Spark 411, part (2/3): add iceberg-1-11-x module #14882
res-life wants to merge 4 commits into
…dule

Adds the iceberg-1-11-x module and wires it into the release411 profile so Spark 4.1.x uses Iceberg 1.11 (Iceberg only publishes iceberg-spark-runtime-4.1 from 1.11.0 on; iceberg-1-10-x is not an option on 4.1).

Module:
- iceberg/iceberg-1-11-x (pom + scala2.13 mirror), iceberg111x sub-package (IcebergProviderImpl, ShimUtilsImpl, GpuParquetIOShim), the 1.11 V2 copy-on-write scan, and a spark411 GpuInternalRow overriding the new SpecializedGetters methods (getGeometry/getGeography) added in Spark 4.1.
- Parent pom: spark41x.iceberg.artifact.suffix=4.1, iceberg.111x.version=1.11.0; release411 swapped from iceberg-stub to iceberg-1-11-x.
- IcebergProbeImpl: lift the Spark cap to < 4.2.0; add the 1.11.0 commit-id mapping and the "1.11" -> "iceberg111x" shim sub-package.
- README + integration_tests wiring (is_spark_41x(), versionAsOf timetravel).

Folds in two non-blocking review nits from the merged part (1/3) (NVIDIA#14881):
- Dedup the V1 copy-on-write scan into a single common GpuSparkCopyOnWriteV1Scan in iceberg/common, instantiated by all three V1 ShimUtilsImpls. Only the 1.11 V2 path keeps a version-specific class.
- GpuSparkScanAccess.branch() reads the private `branch` field directly instead of the removed-in-1.11 SparkScan.branch() method (was returning null on 1.11).

Review follow-ups on this PR:
- iceberg_version_detection_test.py now gates on iceberg_unsupported_mark (3.5/4.0/4.1) instead of is_spark_35x, and jenkins/spark-premerge-build.sh runs the version-detection test for Spark 4.1 with Iceberg 1.11.0, so the new 1.11.0 commit-id mapping in IcebergProbeImpl gets automated coverage.
- New iceberg111x/IcebergProviderImpl.scala uses the current-year-only copyright header per the new-file convention.

Signed-off-by: Chong Gao <res_life@163.com>
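To illustrate why the Spark cap had to move from `< 4.1.0` to `< 4.2.0`, here is a minimal sketch of a numeric version-cap check. It is not the actual `IcebergProbeImpl` logic: the class name `SparkCapSketch` and both helpers are hypothetical, and no lower bound is modeled.

```java
// Hypothetical sketch of an upper-bound ("cap") check on a Spark version string.
// Not taken from IcebergProbeImpl; names and parsing rules are assumptions.
class SparkCapSketch {

    // Parse "major.minor.patch" into three ints; missing parts default to 0.
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        int[] nums = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            // Keep only the leading digits so a suffix like "0-SNAPSHOT" still parses.
            String digits = parts[i].replaceAll("[^0-9].*$", "");
            nums[i] = digits.isEmpty() ? 0 : Integer.parseInt(digits);
        }
        return nums;
    }

    // True when `version` is strictly below `cap`, compared component by component.
    static boolean isBelow(String version, String cap) {
        int[] v = parse(version);
        int[] c = parse(cap);
        for (int i = 0; i < 3; i++) {
            if (v[i] != c[i]) {
                return v[i] < c[i];
            }
        }
        return false; // equal versions are not below the cap
    }

    public static void main(String[] args) {
        System.out.println(isBelow("4.1.1", "4.1.0")); // old cap
        System.out.println(isBelow("4.1.1", "4.2.0")); // lifted cap
    }
}
```

Under these assumptions, a 4.1.x version is rejected by the old cap and accepted by the new one, which is the behavior change the commit message describes.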
Greptile Summary
Adds the

Confidence Score: 5/5
All new and modified code follows the established iceberg-shim patterns; no GPU resource, OOM-retry, or correctness concerns in the changed paths. The new iceberg-1-11-x module is a faithful port of iceberg-1-10-x with targeted API changes for Iceberg 1.11 (SupportsRuntimeV2Filtering, renamed fields). Resource management in GpuParquetIOShim.scala uses withResource/closeOnExcept correctly. The branch() reflection fix properly broadens the exception catch. The test change from snapshot-id to versionAsOf is semantically equivalent for Iceberg tables across all supported versions. The V1Scan consolidation is clean and all three V1 ShimUtilsImpls are updated consistently. No files require special attention.

Important Files Changed

Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[IcebergProbeImpl] -->|probes classpath jar| B{Iceberg version?}
B -->|1.6.x| C[iceberg16x.ShimUtilsImpl]
B -->|1.9.x| D[iceberg19x.ShimUtilsImpl]
B -->|1.10.x| E[iceberg110x.ShimUtilsImpl]
B -->|1.11.x| F[iceberg111x.ShimUtilsImpl]
C -->|newCopyOnWriteScan| G[GpuSparkCopyOnWriteV1Scan\ncommon — SupportsRuntimeFiltering]
D -->|newCopyOnWriteScan| G
E -->|newCopyOnWriteScan| G
F -->|newCopyOnWriteScan| H[GpuSparkCopyOnWriteScan\niceberg-1-11-x — SupportsRuntimeV2Filtering]
G --> I[GpuSparkScanAccess.branch\nfield reflection - works 1.6-1.11]
H --> I
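
The dispatch in the flowchart can be sketched as a lookup from the Iceberg `major.minor` version to a shim sub-package. This is only an illustration under stated assumptions: the four sub-package names come from the flowchart, but the class `ShimLookupSketch`, the method `shimPackageFor`, and the string-parsing approach are hypothetical and not the real `IcebergProbeImpl` code (which also uses a commit-id mapping not modeled here).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: resolve a shim sub-package from an Iceberg version string.
class ShimLookupSketch {

    // Keys are "major.minor"; values are the sub-package names shown in the flowchart.
    private static final Map<String, String> SHIM_BY_MINOR = new LinkedHashMap<>();
    static {
        SHIM_BY_MINOR.put("1.6", "iceberg16x");
        SHIM_BY_MINOR.put("1.9", "iceberg19x");
        SHIM_BY_MINOR.put("1.10", "iceberg110x");
        SHIM_BY_MINOR.put("1.11", "iceberg111x");
    }

    // Returns the shim sub-package for a full version, or empty if unsupported.
    static Optional<String> shimPackageFor(String icebergVersion) {
        String[] parts = icebergVersion.split("\\.");
        if (parts.length < 2) {
            return Optional.empty();
        }
        // Match on exact "major.minor" so that "1.1.0" is not mistaken for "1.11" or "1.10".
        return Optional.ofNullable(SHIM_BY_MINOR.get(parts[0] + "." + parts[1]));
    }

    public static void main(String[] args) {
        System.out.println(shimPackageFor("1.11.0").orElse("unsupported"));
    }
}
```

Matching on the exact `major.minor` key rather than a string prefix is a deliberate choice in this sketch: a prefix test on `"1.1"` would also accept `1.10.x` and `1.11.x`.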
    try {
      f.setAccessible(true);
      Object v = f.get(target);
      return v == null ? null : v.toString();
    } catch (IllegalAccessException e) {
      return null;
    }
setAccessible can throw uncaught InaccessibleObjectException
f.setAccessible(true) can throw InaccessibleObjectException (which extends RuntimeException, not IllegalAccessException) in Java 9+ environments with module encapsulation. Only IllegalAccessException is caught here, so a security or module access rejection would propagate as an uncaught exception. Since branch() is used only for display purposes (query description), the catch block should also cover RuntimeException (or InaccessibleObjectException) to maintain the existing "return null on any access failure" contract.
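
A minimal, self-contained sketch of the widened catch suggested above. The class `FieldReadSketch`, the helper `readFieldAsString`, and the stand-in `FakeScan` are hypothetical names for illustration; this is not the actual `GpuSparkScanAccess` code, only one way to keep the "return null on any access failure" contract.

```java
import java.lang.reflect.Field;

// Hypothetical sketch of a display-only reflective field read that never throws.
class FieldReadSketch {

    // Stand-in for a scan object holding a private `branch` field (assumption).
    static class FakeScan {
        private final String branch;

        FakeScan(String branch) {
            this.branch = branch;
        }
    }

    // Reads a (possibly private) field by name, walking up the class hierarchy.
    // Returns null if the field is missing, null-valued, or inaccessible.
    static String readFieldAsString(Object target, String fieldName) {
        for (Class<?> c = target.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Field f = c.getDeclaredField(fieldName);
                f.setAccessible(true);
                Object v = f.get(target);
                return v == null ? null : v.toString();
            } catch (NoSuchFieldException e) {
                // Not declared on this class; try the superclass.
            } catch (IllegalAccessException | RuntimeException e) {
                // RuntimeException covers InaccessibleObjectException (Java 9+ module
                // encapsulation) and SecurityException raised by setAccessible.
                return null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(readFieldAsString(new FakeScan("main"), "branch"));
    }
}
```

Catching `RuntimeException` is broader than catching `InaccessibleObjectException` alone; that trade-off seems acceptable here only because the value is used for display and a `null` result is already part of the contract.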
…anch()

The branch() field-reflection fallback documents a "return null on any access failure" contract but only caught IllegalAccessException. On Java 9+ (Spark 4.1 requires Java 17+) f.setAccessible(true) can also throw InaccessibleObjectException (a RuntimeException) under module encapsulation, or SecurityException, which would escape uncaught from this display-only path. Widen the catch to also cover RuntimeException so the documented contract holds across module-encapsulated JVMs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Chong Gao <res_life@163.com>

run_iceberg_version_detect_tests() now maps Spark 4.1 to Iceberg 1.11.0, but run_iceberg_tests() in spark-tests.sh has no 4.1 entry yet, so the "must stay in sync" comment over-claimed. The full 4.1 integration-suite entry lands in the stacked follow-up PR; note that in the comment so the stated invariant is accurate for this part.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Chong Gao <res_life@163.com>

…mment

ci_scala213() invokes run_iceberg_version_detect_tests() with SPARK_VER=4.0.1, so the 4.1 -> 1.11.0 branch never runs in pre-merge CI; the 1.11.0 commit-ID mapping is only exercised in nightly until the stacked follow-up PR adds the full 4.1 integration suite. Update the sync comment to state this accurately.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Chong Gao <res_life@163.com>
Part (2/3) of #14853. Part (1/3) (#14881) is merged; this PR is now based directly on `main`. Part (3/3) (#14883) stacks on this one.

Description
Apache Iceberg published `iceberg-spark-runtime-4.1_2.13` starting at version 1.11.0 (apache/iceberg#14155 added Spark 4.1 support, released in 1.11.0). `iceberg-1-10-x` is not an option on Spark 4.1 because Iceberg never released a 4.1 runtime before 1.11.

Adds a new Maven submodule `iceberg/iceberg-1-11-x` and switches the `release411` profile from `iceberg/iceberg-stub` to use it.

Module skeleton (mirrors `iceberg-1-10-x` with the `iceberg111x` sub-package and a `spark411` shim source dir):

- `iceberg/iceberg-1-11-x/pom.xml` + scala2.13 mirror
- `iceberg111x/IcebergProviderImpl`, `ShimUtilsImpl`, `GpuParquetIOShim`
- `org.apache.iceberg.spark.source.GpuSparkCopyOnWriteScan`: Iceberg 1.11 copy-on-write scan (`SupportsRuntimeV2Filtering` with `filter(Predicate[])`)
- `spark411/.../GpuInternalRow` overrides the new `SpecializedGetters` methods Spark 4.1 added (`getGeometry`, `getGeography`) alongside the existing `getVariant`

Wiring:

- Parent pom: `spark41x.iceberg.artifact.suffix=4.1` and `iceberg.111x.version=1.11.0`; swap `release411` from `iceberg-stub` to `iceberg-1-11-x`
- `scala2.13` mirror regenerated via `build/make-scala-version-build-files.sh`
- `IcebergProbeImpl`: lift the `< 4.1.0` Spark cap to `< 4.2.0`; add the 1.11.0 commit-id mapping and `"1.11" -> "iceberg111x"` shim sub-package
- `iceberg/README.md`: document the new row in the Iceberg/Spark support matrix

Enable iceberg integration tests on Spark 4.1.x:

- `spark_session.py`: add `is_spark_41x()` and include 4.1.x in `is_iceberg_supported_spark()`
- `iceberg/__init__.py`: update the skip reason to mention 4.1.x
- `iceberg_test.py::test_iceberg_read_timetravel`: Iceberg 1.11 removed the `.option("snapshot-id", ...)` read API and directs users at Spark's built-in `versionAsOf` (works on both 1.10 and 1.11)

Also folds in two non-blocking review nits from the merged part (1/3) (#14881):

- The V1 `GpuSparkCopyOnWriteScan` classes (post-#14866, "Fix Iceberg package-private access after shim isolation", they depend only on public `Scan` + `SupportsRuntimeFiltering`) are replaced by a single common `GpuSparkCopyOnWriteV1Scan` in `iceberg/common`, instantiated by all three V1 `ShimUtilsImpl`s. Only the 1.11 V2 path keeps a version-specific class.
- `GpuSparkScanAccess.branch()` reads the private `branch` field directly instead of the removed-in-1.11 `SparkScan.branch()` method, so it resolves across 1.6.x–1.11.x (was returning `null` on 1.11).

Checklists
Documentation
(`iceberg/README.md` adds the Iceberg 1.11.x / Spark 4.1.x row to the support matrix.)

Testing
(`spark_session.py` + `iceberg/__init__.py` re-enable the iceberg suite on Spark 4.1.x; `test_iceberg_read_timetravel` switched to `versionAsOf`.)

Performance