
[FLINK-39342][Iceberg] Support hadoop.conf.* prefix to pass Hadoop configuration properties #4351

Merged

lvyanquan merged 3 commits into apache:master from eric666666:FLINK-39342 on Apr 1, 2026

Conversation

@eric666666
Contributor

This PR adds support for a hadoop.conf.* configuration prefix in the Iceberg pipeline sink connector, allowing users to pass arbitrary Hadoop configuration properties directly through the pipeline job definition.
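As an illustration only (the sink type and catalog keys below are assumptions based on typical Iceberg pipeline sink definitions, not taken verbatim from this PR), a pipeline job definition could forward Hadoop properties like this:

```yaml
sink:
  type: iceberg
  catalog.properties.type: hadoop
  catalog.properties.warehouse: hdfs://namenode:8020/warehouse
  # Any key under the hadoop.conf.* prefix is forwarded (with the prefix
  # stripped) into the Hadoop Configuration used to build the Iceberg catalog.
  hadoop.conf.fs.defaultFS: hdfs://namenode:8020
  hadoop.conf.dfs.client.use.datanode.hostname: "true"
```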

@github-actions bot added the docs (Improvements or additions to documentation) and iceberg-pipeline-connector labels on Mar 27, 2026
@lvyanquan added the 3.7 label on Mar 27, 2026

Copilot AI left a comment


Pull request overview

Adds a new hadoop.conf.* option prefix for the Iceberg pipeline sink connector so users can pass arbitrary Hadoop Configuration properties through the pipeline job config, and wires those options into Iceberg catalog/table operations.

Changes:

  • Parse and allow hadoop.conf.* options in IcebergDataSinkFactory (strip prefix) and propagate them through sink components.
  • Apply the propagated options when creating the Hadoop Configuration used to build Iceberg catalogs (writer/committer/metadata applier/compaction).
  • Update tests and docs to reflect the new configuration capability.
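The parse-and-strip step described above can be sketched in plain Java. This is a minimal sketch, not the PR's actual code: the class name HadoopConfPrefixSketch, the method name, and the use of a plain Map (instead of Hadoop's Configuration class) are illustrative assumptions; only the hadoop.conf. prefix itself comes from the PR.

```java
import java.util.HashMap;
import java.util.Map;

public class HadoopConfPrefixSketch {

    // Assumed to mirror the PREFIX_HADOOP_CONF constant introduced in
    // IcebergDataSinkOptions; the exact value is the documented prefix.
    static final String PREFIX_HADOOP_CONF = "hadoop.conf.";

    // Collect every "hadoop.conf.*" entry from the sink options and strip the
    // prefix, yielding plain Hadoop property names ready to be applied to a
    // Hadoop Configuration when building the Iceberg catalog.
    static Map<String, String> stripPrefix(Map<String, String> sinkOptions) {
        Map<String, String> hadoopConf = new HashMap<>();
        for (Map.Entry<String, String> e : sinkOptions.entrySet()) {
            if (e.getKey().startsWith(PREFIX_HADOOP_CONF)) {
                hadoopConf.put(
                        e.getKey().substring(PREFIX_HADOOP_CONF.length()),
                        e.getValue());
            }
        }
        return hadoopConf;
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("hadoop.conf.fs.defaultFS", "hdfs://namenode:8020");
        options.put("catalog.properties.warehouse", "hdfs://namenode:8020/warehouse");
        // Prints {fs.defaultFS=hdfs://namenode:8020}: the non-prefixed option
        // is left out, the prefixed one is kept with the prefix removed.
        System.out.println(stripPrefix(options));
    }
}
```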

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.

Summary per file:

  • flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-iceberg/src/main/java/org/apache/flink/cdc/connectors/iceberg/sink/IcebergDataSinkOptions.java: Introduces the PREFIX_HADOOP_CONF constant for the new option namespace.
  • flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-iceberg/src/main/java/org/apache/flink/cdc/connectors/iceberg/sink/IcebergDataSinkFactory.java: Allows hadoop.conf.* during validation and extracts the stripped Hadoop conf options.
  • flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-iceberg/src/main/java/org/apache/flink/cdc/connectors/iceberg/sink/IcebergDataSink.java: Stores the Hadoop conf options and passes them to the sink and metadata applier.
  • flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-iceberg/src/main/java/org/apache/flink/cdc/connectors/iceberg/sink/IcebergMetadataApplier.java: Builds the Iceberg catalog with a Hadoop Configuration created from the passed Hadoop conf options.
  • flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-iceberg/src/main/java/org/apache/flink/cdc/connectors/iceberg/sink/utils/HadoopConfUtils.java: New helper to build/apply a Hadoop Configuration from option maps.
  • flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-iceberg/src/main/java/org/apache/flink/cdc/connectors/iceberg/sink/v2/IcebergSink.java: Propagates the Hadoop conf options into writer/committer/compaction operator construction.
  • flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-iceberg/src/main/java/org/apache/flink/cdc/connectors/iceberg/sink/v2/IcebergWriter.java: Uses the Hadoop conf options when building the Iceberg catalog.
  • flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-iceberg/src/main/java/org/apache/flink/cdc/connectors/iceberg/sink/v2/IcebergCommitter.java: Uses the Hadoop conf options when building the Iceberg catalog.
  • flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-iceberg/src/main/java/org/apache/flink/cdc/connectors/iceberg/sink/v2/compaction/CompactionOperator.java: Uses the Hadoop conf options when lazily building the Iceberg catalog for compaction.
  • flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-iceberg/src/test/java/org/apache/flink/cdc/connectors/iceberg/sink/IcebergDataSinkFactoryTest.java: Adds a test that hadoop.conf.* options are accepted by factory validation.
  • flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-iceberg/src/test/java/org/apache/flink/cdc/connectors/iceberg/sink/v2/IcebergWriterTest.java: Updates writer/committer construction to pass the Hadoop conf options.
  • flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-iceberg/src/test/java/org/apache/flink/cdc/connectors/iceberg/sink/v2/IcebergSinkITCase.java: Updates sink construction to pass the Hadoop conf options.
  • flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-iceberg/src/test/java/org/apache/flink/cdc/connectors/iceberg/sink/v2/CompactionOperatorTest.java: Updates writer/committer/operator construction to pass the Hadoop conf options.
  • docs/content/docs/connectors/pipeline-connectors/iceberg.md: Documents the new hadoop.conf.* option prefix.
  • docs/content.zh/docs/connectors/pipeline-connectors/iceberg.md: Chinese documentation for the new hadoop.conf.* option prefix.


Comment on lines +115 to +119

        sinkFactory.createDataSink(
                new FactoryHelper.DefaultContext(
                        conf, conf, Thread.currentThread().getContextClassLoader()));
        Assertions.assertThat(dataSink).isInstanceOf(IcebergDataSink.class);
    }

Copilot AI Mar 30, 2026


This test only asserts that the factory accepts hadoop.conf.* options, but it doesn't verify that the prefix is stripped and the resulting Hadoop configuration options are actually carried into the created sink (and later used when building the Iceberg catalog). Consider asserting the produced hadoopConfOptions contents (e.g., via reflection or a package-private accessor) to ensure the new feature works end-to-end.

Contributor


I agree with this comment.
So far the test only proves that the DataSink can be created; it does not demonstrate that the Hadoop conf actually takes effect. Please verify in some way that the Hadoop conf is applied as intended.
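The kind of end-to-end check the reviewers ask for could follow the pattern sketched below. FakeSink and the hadoopConfOptions field name are stand-ins invented for this sketch; a real test would target IcebergDataSink and whatever field the PR actually introduces (or a package-private accessor, which would avoid reflection entirely).

```java
import java.lang.reflect.Field;
import java.util.Collections;
import java.util.Map;

public class ReflectionAssertSketch {

    // Stand-in for IcebergDataSink: holds the stripped hadoop.conf.* options
    // in a private field, as the PR's sink presumably does.
    static class FakeSink {
        private final Map<String, String> hadoopConfOptions;

        FakeSink(Map<String, String> opts) {
            this.hadoopConfOptions = opts;
        }
    }

    // Read the private field via reflection, as the review comment suggests
    // when no package-private accessor is available.
    static Map<String, String> readHadoopConf(Object sink) {
        try {
            Field f = sink.getClass().getDeclaredField("hadoopConfOptions");
            f.setAccessible(true);
            @SuppressWarnings("unchecked")
            Map<String, String> conf = (Map<String, String>) f.get(sink);
            return conf;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        FakeSink sink =
                new FakeSink(Collections.singletonMap("fs.defaultFS", "hdfs://nn:8020"));
        Map<String, String> conf = readHadoopConf(sink);
        if (!"hdfs://nn:8020".equals(conf.get("fs.defaultFS"))) {
            throw new AssertionError("hadoop.conf.* option was not propagated");
        }
        System.out.println("ok"); // prints "ok" when the option is carried through
    }
}
```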

@eric666666
Contributor Author

The test failure in NewlyAddedTableITCase.testJobManagerFailoverForNewlyAddedTableWithAheadBinlog is unrelated to the changes in this PR (which only touch Iceberg connector test files). It is a known flaky test: the IllegalStateException in ensureJmLeaderServiceExists is a timing-sensitive JM failover issue that occasionally occurs in CI environments. Could someone with permissions please rerun the failed job? Thanks!
@lvyanquan

@lvyanquan added this to the V3.7.0 milestone on Mar 30, 2026
@eric666666
Contributor Author

The test failure in PaimonSinkITCase.testDuplicateCommitAfterRestore is a pre-existing flaky test unrelated to this PR. It fails because the COMPACT snapshot is generated asynchronously (since waitCompaction=false for non-deletion-vector tables after commit 3447798), and the test queries snapshots before compaction completes. Could someone with access please retry the CI? @lvyanquan

Contributor

@lvyanquan left a comment


LGTM.

@lvyanquan merged commit 1a1e81c into apache:master on Apr 1, 2026
39 of 40 checks passed