[DRAFT] Add changes to populate the data source metadata details #25127

Draft: evanvdia wants to merge 1 commit into master from datasource_details_at_event

@evanvdia (Contributor) commented May 15, 2025

Description

Populate the data source metadata details needed for combined lineage tracking.

Motivation and Context

Issue: #25123

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* ... 
* ... 

Hive Connector Changes
* ... 
* ... 

If release note is NOT required, use:

== NO RELEASE NOTE ==

@evanvdia evanvdia requested review from hantangwangd, ZacBlanco and a team as code owners May 15, 2025 15:00
@evanvdia evanvdia requested a review from jaystarshot May 15, 2025 15:00
@prestodb-ci prestodb-ci added the from:IBM (PR from IBM) label May 15, 2025
@prestodb-ci prestodb-ci requested review from a team, ScrapCodes and BryanCutler and removed request for a team May 15, 2025 15:00
@evanvdia evanvdia force-pushed the datasource_details_at_event branch from 2bf5773 to 9f9bc37 Compare May 15, 2025 15:02
@evanvdia evanvdia marked this pull request as draft May 15, 2025 15:53
@evanvdia evanvdia force-pushed the datasource_details_at_event branch from 9f9bc37 to cd7ec48 Compare May 16, 2025 09:14
@imjalpreet imjalpreet left a comment

@evanvdia Thanks for the PR. I did a first pass and added a few suggestions.

Member

@evanvdia I don't see this class being referenced anywhere. Is the PR missing some changes?

Contributor Author

@imjalpreet I have added the changes.


import static java.util.Objects.requireNonNull;

public class JdbcOutputMetaData

Member

nit: please rename to JdbcOutputMetadata

Contributor Author

Done

}

@JsonProperty
public String getInfo()

Member

nit: please add @Override annotation as well.

Contributor Author

Done
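
The point of the `@Override` request can be sketched as follows. This is only an illustration, not the PR's code: `OutputMetadata` is a hypothetical stand-in for the SPI interface the class implements, `JdbcOutputMetadataSketch` and its `info` field are made-up names, and the `@JsonProperty` annotation is left out so the sketch has no dependencies.

```java
import java.util.Objects;

// Hypothetical stand-in for the connector SPI interface; the real one may differ.
interface OutputMetadata
{
    Object getInfo();
}

// Illustrative implementation whose getter is explicitly marked as an override.
final class JdbcOutputMetadataSketch
        implements OutputMetadata
{
    private final String info;

    JdbcOutputMetadataSketch(String info)
    {
        this.info = Objects.requireNonNull(info, "info is null");
    }

    // With @Override, compilation fails if the interface method is renamed or
    // its signature changes, instead of silently leaving an unrelated getter.
    // Returning String where the interface declares Object is a legal
    // covariant return type.
    @Override
    public String getInfo()
    {
        return info;
    }
}
```

Without the annotation the class would still compile today, so the request is about catching future drift between the interface and its implementations.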

import java.util.List;
import java.util.Objects;

public class HiveConnectorOutputMetadata

Member

nit: please rename this class as HiveOutputInfo

Contributor Author

Done

@JsonProperty("partitionNames") List<String> partitionNames,
@JsonProperty("tableLocation") String tableLocation)
{
this.partitionNames = partitionNames;

Member

please add null checks

Contributor Author

Done
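
A minimal sketch of what the requested null checks might look like, assuming a value class with the two fields shown in the snippet. `HiveOutputInfoSketch` is a hypothetical name, the Jackson annotations are omitted, and plain JDK collections stand in for Guava's `ImmutableList`, so this is not the code that was actually pushed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Illustrative value class; names and field types are assumptions.
final class HiveOutputInfoSketch
{
    private final List<String> partitionNames;
    private final String tableLocation;

    HiveOutputInfoSketch(List<String> partitionNames, String tableLocation)
    {
        // Fail fast with a descriptive message, then take a defensive,
        // unmodifiable copy so later changes to the caller's list are not visible.
        Objects.requireNonNull(partitionNames, "partitionNames is null");
        this.partitionNames = Collections.unmodifiableList(new ArrayList<>(partitionNames));
        this.tableLocation = Objects.requireNonNull(tableLocation, "tableLocation is null");
    }

    List<String> getPartitionNames()
    {
        return partitionNames;
    }

    String getTableLocation()
    {
        return tableLocation;
    }
}
```

Checking in the constructor means a missing value surfaces where the object is built rather than later, when the event is serialized.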

  partitionUpdates.stream()
  .map(PartitionUpdate::getName)
- .collect(toList())));
+ .collect(toList()), writeInfo.getTargetPath().getName())));

Member

Why are we using writeInfo.getTargetPath().toString() in one case and writeInfo.getTargetPath().getName() in the other?

Contributor Author

Done
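
The difference the reviewer is asking about can be illustrated with a small sketch. It assumes `getTargetPath()` returns a Hadoop `Path`, whose `getName()` yields only the final path component. To stay dependency-free, the sketch uses `java.nio.file.Path` as a stand-in, and `TargetPathSketch` and its methods are hypothetical.

```java
import java.nio.file.Path;

// Stand-in helpers; java.nio.file.Path is used here in place of a Hadoop Path.
final class TargetPathSketch
{
    private TargetPathSketch() {}

    // Full location, analogous to calling toString() on the target path.
    static String fullLocation(Path targetPath)
    {
        return targetPath.toString();
    }

    // Final path component only, analogous to getName() on a Hadoop Path.
    // Note that getFileName() returns null for a root path such as "/".
    static String lastComponent(Path targetPath)
    {
        return targetPath.getFileName().toString();
    }
}
```

For a path like `/warehouse/sales/orders` on a Unix-like system, the first helper keeps the whole location while the second returns only `orders`. Mixing the two across code paths would make the recorded table location inconsistent between events.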


import static java.util.Objects.requireNonNull;

public class HiveWrittenPartitions
implements ConnectorOutputMetadata
{
private final List<String> partitionNames;
private final HiveConnectorOutputMetadata connectorOutputInfo;

Member

please rename the object as hiveOutputInfo

Contributor Author

Done

{
this.partitionNames = ImmutableList.copyOf(requireNonNull(partitionNames, "partitionNames is null"));
this.connectorOutputInfo = requireNonNull(connectorOutputInfo, "connectorOutputInfo is null");
}

@JsonProperty

Member

Add an @Override annotation here as well

Contributor Author

Done

IcebergTableLayoutHandle icebergTableHandle = (IcebergTableLayoutHandle) tableHandle;
return Optional.of(new IcebergInputInfo(
icebergTableHandle.getTable().getIcebergTableName().getSnapshotId(),
icebergTableHandle.getTable().getOutputPath().get()));

Member

nit: please add null checks

Contributor Author

Done
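
One way to read this request: the snippet calls `getOutputPath().get()` unconditionally, which throws if the `Optional` is empty. Below is a hedged sketch of a guarded alternative. `IcebergInputInfoSketch`, its `create` factory, and the field types are assumptions for illustration. Whether a missing output path should yield an empty result or an error is a decision for the PR, not something this sketch settles.

```java
import java.util.Objects;
import java.util.Optional;

// Illustrative input-info holder; names and types are assumptions.
final class IcebergInputInfoSketch
{
    private final Optional<Long> snapshotId;
    private final String tableLocation;

    private IcebergInputInfoSketch(Optional<Long> snapshotId, String tableLocation)
    {
        this.snapshotId = Objects.requireNonNull(snapshotId, "snapshotId is null");
        this.tableLocation = Objects.requireNonNull(tableLocation, "tableLocation is null");
    }

    // Builds the info only when an output path is present, instead of calling
    // get() on a possibly empty Optional.
    static Optional<IcebergInputInfoSketch> create(Optional<Long> snapshotId, Optional<String> outputPath)
    {
        Objects.requireNonNull(snapshotId, "snapshotId is null");
        Objects.requireNonNull(outputPath, "outputPath is null");
        return outputPath.map(path -> new IcebergInputInfoSketch(snapshotId, path));
    }

    Optional<Long> getSnapshotId()
    {
        return snapshotId;
    }

    String getTableLocation()
    {
        return tableLocation;
    }
}
```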

@@ -570,9 +571,9 @@ private Optional<ConnectorOutputMetadata> finishInsert(ConnectorSession session,
throw new PrestoException(ICEBERG_COMMIT_ERROR, "Failed to commit Iceberg update to table: " + writableTableHandle.getTableName(), e);
}

- return Optional.of(new HiveWrittenPartitions(commitTasks.stream()
+ return Optional.of(new HiveWrittenPartitions(new HiveConnectorOutputMetadata(commitTasks.stream()

Member

I noticed we are not using IcebergWrittenPartitions here, and the class is unused. We can do a small refactor. Let's remove IcebergWrittenPartitions and also move HiveWrittenPartitions from presto-hive to presto-hive-common.

Contributor Author

Done

@evanvdia evanvdia force-pushed the datasource_details_at_event branch 2 times, most recently from facd048 to 5aaf7e9 Compare May 21, 2025 09:20
@evanvdia evanvdia requested a review from imjalpreet May 21, 2025 16:59
@evanvdia evanvdia force-pushed the datasource_details_at_event branch from 5aaf7e9 to a1a4003 Compare May 22, 2025 03:43
Labels
from:IBM PR from IBM

3 participants