Add pre-publish step for IcebergSource and fix other issues in Iceberg file based copy #4155
/**
 * Invoke the private addDeleteStepIfNeeded method using reflection
 */
private void invokeAddDeleteStepIfNeeded(List<IcebergTable.FilePathWithPartition> sourceFiles,
sourceFiles is not used when invoking the method addDeleteStepIfNeeded.
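To illustrate the point, here is a minimal sketch of a reflection helper that forwards its argument to the private method instead of dropping it. The real signature of addDeleteStepIfNeeded is not shown in this thread, so DeleteStepPlanner, its single List<String> parameter, and the int return value are all hypothetical stand-ins, not Gobblin code.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReflectionHelperSketch {

  /** Hypothetical stand-in for the class under test; not the Gobblin class. */
  static class DeleteStepPlanner {
    final List<String> seenFiles = new ArrayList<>();

    // Private method that a test can only reach through reflection.
    // Assumed signature: the real method's parameters are not shown in the PR.
    private int addDeleteStepIfNeeded(List<String> sourceFiles) {
      seenFiles.addAll(sourceFiles);
      return sourceFiles.size();
    }
  }

  /** Invokes the private method and passes sourceFiles through to it. */
  static int invokeAddDeleteStepIfNeeded(DeleteStepPlanner target, List<String> sourceFiles)
      throws Exception {
    Method method = DeleteStepPlanner.class.getDeclaredMethod("addDeleteStepIfNeeded", List.class);
    method.setAccessible(true);
    // The helper's parameter is forwarded here; omitting it is the issue raised above.
    return (Integer) method.invoke(target, sourceFiles);
  }

  public static void main(String[] args) throws Exception {
    DeleteStepPlanner planner = new DeleteStepPlanner();
    int count = invokeAddDeleteStepIfNeeded(
        planner, Arrays.asList("/source/path/file1.parquet", "/source/path/file4.parquet"));
    System.out.println(count + " files forwarded: " + planner.seenFiles);
  }
}
```

If the real method does not take the source files as a parameter at all, the simpler fix is to drop the unused sourceFiles parameter from the helper.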
List<IcebergTable.FilePathWithPartition> sourceFiles = Lists.newArrayList();
Map<String, String> partitionData = new HashMap<>();
partitionData.put("datepartition", "2025-10-11");
sourceFiles.add(new IcebergTable.FilePathWithPartition("/source/path/file1.parquet", partitionData, 1000L));
sourceFiles.add(new IcebergTable.FilePathWithPartition("/source/path/file4.parquet", partitionData, 1000L));
sourceFiles is not used in invokeAddDeleteStepIfNeeded, so this test effectively verifies the same behaviour as testDeleteSinglePartitionDirectory.
public void testDeleteTargetDirectoryNotExists() throws Exception {
  // Configure: delete enabled but point to non-existent directory
  state.setProp(IcebergSource.DELETE_FILES_NOT_IN_SOURCE, true);
  state.setProp(ConfigurationKeys.DATA_PUBLISHER_FINAL_DIR, "/non/existent/path");
Could also add a check for the case where DATA_PUBLISHER_FINAL_DIR is not set, i.e. assert before setting this property in this same test, and rename the test to something like testDeleteTargetDirectoryNotConfiguredOrNotExists.
    .datasetOutputPath(targetFs.getUri().getPath())
    .ancestorsOwnerAndPermission(ancestorOwnerAndPermissionList)
    .build();
copyableFile.setFsDatasets(originFs, targetFs);
Let's also add a test verifying that, after extraction, deserializing the CopyableFile yields a non-null destinationData; that would also cover the lineage fix.
LGTM!
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
stream closed exception during process work unit step
FileAwareInputStreamDataWriter can only process one file and cannot be reused
Tests
Commits