
Iceberg: add iceberg-1-11-x module wired to release411 (Spark 4.1) #14882

Draft
res-life wants to merge 2 commits into NVIDIA:main from res-life:iceberg-1.11/pr2-new-module-and-411


@res-life
Collaborator

Stacked work for #14853 (2/3) — adds the new module and switches Spark 4.1 to use it. Stacked on top of #14881 — please review only the latest commit; the diff vs. `main` includes #14881's changes too.

Description

Apache Iceberg published `iceberg-spark-runtime-4.1_2.13` starting at version 1.11.0 (apache/iceberg#14155 added Spark 4.1 support, released 2026-05-19 in 1.11.0). `iceberg-1-10-x` is not an option on Spark 4.1 because Iceberg never released a 4.1 runtime before 1.11.

Adds a new Maven submodule `iceberg/iceberg-1-11-x` and switches the `release411` profile from `iceberg/iceberg-stub` to use it.

Module skeleton (mirrors `iceberg-1-10-x` with the `iceberg111x` sub-package and a `spark411` shim source dir):

  • `iceberg/iceberg-1-11-x/pom.xml` + scala2.13 mirror
  • `iceberg111x/IcebergProviderImpl`, `ShimUtilsImpl`, `GpuParquetIOShim`
  • `org.apache.iceberg.spark.source.GpuSparkCopyOnWriteScan` — Iceberg 1.11 copy-on-write scan: `SupportsRuntimeV2Filtering` with `filter(Predicate[])`
  • `spark411/.../GpuInternalRow` overrides the new `SpecializedGetters` methods Spark 4.1 added (`getGeometry`, `getGeography`) alongside the existing `getVariant`

Wiring:

  • parent pom: add `spark41x.iceberg.artifact.suffix=4.1` and `iceberg.111x.version=1.11.0` properties; swap `release411` from `iceberg-stub` to `iceberg-1-11-x` with the 4.1 runtime suffix
  • `scala2.13` mirror regenerated via `build/make-scala-version-build-files.sh`
  • `IcebergProbeImpl`: lift the `< 4.1.0` Spark cap to `< 4.2.0`; add the 1.11.0 commit-id mapping and `"1.11" -> "iceberg111x"` shim sub-package
  • `iceberg/README.md`: document the new row in the Iceberg/Spark support matrix
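
The `IcebergProbeImpl` change can be pictured roughly as follows. This is only an illustrative sketch in Python; the real probe is part of the plugin's JVM code, and the names `SHIM_SUBPACKAGES`, `SPARK_UPPER_BOUND`, `parse_version`, and `shim_subpackage` are hypothetical. Only the `"1.11" -> "iceberg111x"` mapping and the `< 4.2.0` cap come from this PR; any lower bound or per-Spark pairing check the real probe performs is left out.

```python
# Hypothetical sketch; not the plugin's actual IcebergProbeImpl.
SHIM_SUBPACKAGES = {
    "1.11": "iceberg111x",  # mapping added by this PR
}

# Spark cap after this PR: supported versions must be < 4.2.0 (previously < 4.1.0).
SPARK_UPPER_BOUND = (4, 2, 0)


def parse_version(version):
    """Turn '4.1.0' or '4.1.1-SNAPSHOT' into a tuple of ints, dropping any '-suffix'."""
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def shim_subpackage(spark_version, iceberg_version):
    """Return the shim sub-package name, or None if the combination is not handled here."""
    if parse_version(spark_version) >= SPARK_UPPER_BOUND:
        return None
    major_minor = ".".join(iceberg_version.split(".")[:2])
    return SHIM_SUBPACKAGES.get(major_minor)
```

Under these assumptions, `shim_subpackage("4.1.0", "1.11.0")` resolves to `iceberg111x`, while any Spark version at or above 4.2.0 is rejected.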

Enable iceberg integration tests on Spark 4.1.x:

  • `integration_tests/src/main/python/spark_session.py`: add `is_spark_41x()` helper and include 4.1.x in `is_iceberg_supported_spark()` so the `@iceberg`-marked tests no longer skip on 4.1.
  • `integration_tests/src/main/python/iceberg/__init__.py`: update the skip reason to mention 4.1.x.
  • `integration_tests/.../iceberg_test.py::test_iceberg_read_timetravel`: Iceberg 1.11 removed the `.option("snapshot-id", ...)` read API and directs users to Spark's built-in `versionAsOf`, so the test now uses `versionAsOf`, which works on both 1.10 and 1.11.
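
For reviewers unfamiliar with the two test-side changes, here is a minimal sketch of what they amount to. It is not the code in this PR: the real `is_spark_41x()` lives in `spark_session.py` and presumably reads the running Spark version itself, whereas this version takes the version string as an argument, and `read_snapshot` is a hypothetical helper name. The sketch assumes an Iceberg-enabled `SparkSession` and that the Iceberg source accepts the `versionAsOf` reader option.

```python
def is_spark_41x(version):
    """True for any 4.1.x Spark version string, e.g. '4.1.0' (but not '4.10.0')."""
    return version.startswith("4.1.")


def read_snapshot(spark, table, snapshot_id):
    """Read one Iceberg snapshot via Spark's versionAsOf reader option.

    Stands in for the older .option("snapshot-id", ...) read path, which
    Iceberg 1.11 removed according to this PR.
    """
    return (spark.read
            .format("iceberg")
            .option("versionAsOf", snapshot_id)
            .load(table))
```

A test would call something like `read_snapshot(spark, table_name, snapshot_id)` and compare the CPU and GPU results as before; only the reader option changes.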

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
    (`iceberg/README.md` adds the Iceberg 1.11.x / Spark 4.1.x row to the support matrix.)
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
    (`spark_session.py` + `iceberg/__init__.py` re-enable the existing iceberg suite on Spark 4.1.x; `test_iceberg_read_timetravel` switched to `versionAsOf` so it exercises both 1.10 and 1.11.)
  • Covered by existing tests
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

@res-life res-life force-pushed the iceberg-1.11/pr2-new-module-and-411 branch from 9635a1d to 36fec5b Compare May 29, 2026 03:11
Chong Gao added 2 commits May 29, 2026 11:36
Refactors iceberg/common so the {SparkScan, SparkBatchQueryScan,
SparkCopyOnWriteScan, SparkBatch, DataWriteResult} APIs that diverge
between Iceberg 1.10.x and 1.11.x are hidden behind a small interface,
with per-version implementations in iceberg-1-6-x / iceberg-1-9-x /
iceberg-1-10-x. No behavior change for the existing Iceberg versions
this PR ships; sets the stage for a follow-up that adds iceberg-1-11-x.

Common:
- GpuSparkCopyOnWriteScan -> renamed to GpuSparkCopyOnWriteScanBase
  (abstract); per-version concrete subclass mixes in the right runtime-
  filter trait (SupportsRuntimeFiltering vs SupportsRuntimeV2Filtering)
  and the matching filter() signature.
- GpuSparkScan: rewrite hasNestedType via Spark's readSchema() + Spark
  types so it no longer depends on the Iceberg 1.10-only
  cpuScan.expectedSchema(); dispatch SparkCopyOnWriteScan construction
  through ShimUtils.newCopyOnWriteScan.
- GpuSparkBatchQueryScan: toString uses cpuScan.description() (public,
  available in both Iceberg 1.10 and 1.11) instead of branch /
  expectedSchema / filterExpressions which 1.11 removed.
  runtimeFilterExpressions field read tolerates both 1.10 name
  (runtimeFilterExpressions) and 1.11 name (runtimeFilters) — a tactical
  fallback to be replaced with proper per-version shim methods.
- GpuSparkBatch: same tolerance for expectedSchema (1.10) vs projection
  (1.11).
- GpuSparkWrite: type-annotate `new Array[DataFile](0)` so Scala 2.13
  doesn't infer Array[Nothing] under 1.11's wildcarded
  DataWriteResult.dataFiles().
- IcebergShimUtils / ShimUtils: add newCopyOnWriteScan(Scan, ...) factory
  whose parameter is Spark's public Scan because Iceberg's
  SparkCopyOnWriteScan is package-private — cross-package callers cannot
  reference it directly.

Per-Iceberg-version module:
- New GpuSparkCopyOnWriteScan in org.apache.iceberg.spark.source (so it
  can reference the package-private SparkCopyOnWriteScan). Companion
  object exposes create(Scan, ...): GpuScan for cross-package callers.
  1.6/1.9/1.10 mix in SupportsRuntimeFiltering + filter(Filter[]).
- ShimUtilsImpl.java: implement newCopyOnWriteScan via
  GpuSparkCopyOnWriteScan.create.

Signed-off-by: Chong Gao <res_life@163.com>
Iceberg published iceberg-spark-runtime-4.1_2.13 starting at version
1.11.0; iceberg-1-10-x is not an option on Spark 4.1 because Iceberg
never released a 4.1 runtime before 1.11. Adds a new Maven submodule
iceberg/iceberg-1-11-x and switches the release411 profile from
iceberg/iceberg-stub to use it.

Module skeleton (mirrors iceberg-1-10-x with the iceberg111x sub-package
and a spark411 shim source dir):
- iceberg/iceberg-1-11-x/pom.xml + scala2.13 mirror
- iceberg111x/IcebergProviderImpl, ShimUtilsImpl, GpuParquetIOShim
- org.apache.iceberg.spark.source.GpuSparkCopyOnWriteScan (Iceberg 1.11
  copy-on-write scan: SupportsRuntimeV2Filtering with filter(Predicate[]))
- spark411/.../GpuInternalRow overriding the new SpecializedGetters
  methods Spark 4.1 added (getGeometry, getGeography) alongside the
  existing getVariant

Wiring:
- parent pom: add spark41x.iceberg.artifact.suffix=4.1 and
  iceberg.111x.version=1.11.0 properties; swap release411 from
  iceberg-stub to iceberg-1-11-x with the 4.1 runtime suffix
- scala2.13 mirror regenerated via build/make-scala-version-build-files.sh
- IcebergProbeImpl: lift the < 4.1.0 Spark cap to < 4.2.0; add 1.11.0
  commit-id mapping and "1.11" -> "iceberg111x" shim sub-package
- README: document the new row in the Iceberg/Spark support matrix

Integration tests: enable iceberg suite on Spark 4.1.x:
- spark_session.py: add is_spark_41x() helper and include 4.1.x in
  is_iceberg_supported_spark() so the @iceberg-marked tests no longer
  skip on 4.1.
- iceberg/__init__.py: update the skip reason to mention 4.1.x.
- iceberg_test.py::test_iceberg_read_timetravel: Iceberg 1.11 removed
  the `.option("snapshot-id", ...)` read path and directs users at
  Spark's built-in `versionAsOf`. Switching to versionAsOf works on both
  1.10 and 1.11.

Signed-off-by: Chong Gao <res_life@163.com>
@res-life res-life force-pushed the iceberg-1.11/pr2-new-module-and-411 branch from 36fec5b to e111369 Compare May 29, 2026 03:41