
LIN-160 - Summary Lineage #4641


Open
wants to merge 16 commits into
base: master

Conversation

akshaysw

Change description

Connection level lineage
notion doc - https://www.notion.so/atlanhq/Summary-Lineage-eef721c5acfb48f9b5d3b84aa3da0d43

I have added a new file, LineagePreProcessor, which does the following:

  1. Checks the bulk API payload for create-process requests.
  2. Checks whether the input/output assets belong to different connections.
  3. If they do, Atlas verifies whether a connection lineage already exists:
  • If a connection lineage exists, no further action is taken.
  • If no connection lineage exists, Atlas creates one.
  4. Delete-process calls are handled similarly.
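
The steps above can be sketched roughly as follows. This is a simplified in-memory model, not the code in this PR: the class `ConnectionLineageSketch`, its methods, and the idea of counting child processes per connection pair are all assumptions made for illustration. In particular, removing the connection lineage once its last child process is deleted is a guess at what "handled similarly" means.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of connection-level lineage bookkeeping.
// None of these names come from the PR.
class ConnectionLineageSketch {
    // Key: "inputConnection->outputConnection".
    // Value: number of asset-level processes backing that connection lineage.
    private final Map<String, Integer> childProcessCount = new HashMap<>();

    /** Returns true only when a new connection lineage had to be created. */
    boolean onProcessCreate(String inputConnection, String outputConnection) {
        if (inputConnection.equals(outputConnection)) {
            return false; // same connection: no connection-level lineage needed
        }
        String key = inputConnection + "->" + outputConnection;
        // merge returns the new count; 1 means the lineage did not exist before.
        return childProcessCount.merge(key, 1, Integer::sum) == 1;
    }

    /** Returns true only when the connection lineage was removed (assumed behavior). */
    boolean onProcessDelete(String inputConnection, String outputConnection) {
        if (inputConnection.equals(outputConnection)) {
            return false;
        }
        String key = inputConnection + "->" + outputConnection;
        Integer count = childProcessCount.get(key);
        if (count == null) {
            return false; // nothing to clean up
        }
        if (count > 1) {
            childProcessCount.put(key, count - 1); // other child processes remain
            return false;
        }
        childProcessCount.remove(key); // last child process is gone
        return true;
    }
}
```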

Type of change

  • Bug fix (fixes an issue)
  • New feature (adds functionality)

Related issues

Fix #1

Helm Config Changes for Running Tests (Staging PR)

Does this PR require Helm config changes for testing?

  • Tests are NOT required for this commit. (You can proceed with the PR.) ✅
  • No, Helm config changes are not needed. (You can proceed with the PR.) ✅
  • Yes, I have already updated the config-values on enpla9up36. (You can proceed with the PR.) ✅
  • Yes, but I have NOT updated the config-values. (Please update them before proceeding; or, tests will run with default values.)⚠️

Checklists

Development

  • Lint rules pass locally
  • Application changes have been tested thoroughly
  • Automated tests covering modified code pass

Security

  • Security impact of change has been considered
  • Code follows company security practices and guidelines

Code review

  • Pull request has a descriptive title and context useful to a reviewer. Screenshots or screencasts are attached as necessary
  • "Ready for review" label attached and reviewers assigned
  • Changes have been reviewed by at least one other contributor
  • Pull request linked to task tracker where applicable

AtlasVertex datasetVertex = AtlasGraphUtilsV2.findByGuid(this.graph, guid);
if (direction == AtlasLineageOnDemandInfo.LineageDirection.INPUT || direction == AtlasLineageOnDemandInfo.LineageDirection.BOTH)
- traverseEdgesOnDemand(datasetVertex, true, depth, level, new HashSet<>(), atlasLineageOnDemandContext, ret, guid, inputEntitiesTraversed, traversalOrder);
+ traverseEdgesOnDemand(datasetVertex, true, depth, level, new HashSet<>(), atlasLineageOnDemandContext, ret, guid, inputEntitiesTraversed, traversalOrder, entityValidationResult);

traverseEdgesOnDemand has 11 parameters now. This is not getting any easier to maintain.

Always try to leave the code better than you found it! cc: @sumandas0 @aarshi0301
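
One possible direction, sketched under assumptions: bundle the values that stay constant or accumulate across the recursion into a single state object, so the recursive method takes only what changes per call. `TraversalSketch`, `TraversalState`, and the adjacency map standing in for the graph are hypothetical and are not taken from the Atlas code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of a parameter object; not the Atlas implementation.
class TraversalSketch {
    // Holds everything that is shared across the whole traversal.
    static final class TraversalState {
        final Map<String, List<String>> upstream; // stand-in for the graph
        final int maxDepth;
        final Set<String> visited = new HashSet<>();
        final List<String> traversalOrder = new ArrayList<>();

        TraversalState(Map<String, List<String>> upstream, int maxDepth) {
            this.upstream = upstream;
            this.maxDepth = maxDepth;
        }
    }

    // Only the per-call values (current node and level) remain as parameters.
    static void traverse(String guid, int level, TraversalState state) {
        if (level > state.maxDepth || !state.visited.add(guid)) {
            return; // depth limit reached or node already visited
        }
        state.traversalOrder.add(guid);
        for (String next : state.upstream.getOrDefault(guid, List.of())) {
            traverse(next, level + 1, state);
        }
    }
}
```

Adding another accumulator later (for example a validation result) would then mean adding a field to the state object rather than another positional argument at every call site.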

@akshaysw requested a review from sriram-atlan on May 27, 2025 07:20
@@ -35,6 +36,9 @@ public void setImmediateNeighbours(Boolean immediateNeighbours) {
this.immediateNeighbours = immediateNeighbours;
}


private String lineageType = LINEAGE_TYPE_DATASET_PROCESS_LINEAGE;

Would lineageType always be LINEAGE_TYPE_DATASET_PROCESS_LINEAGE? I do not see any code that conditionally sets it to CONNECTION_PROCESS_LINEAGE.

If so, we do not need this field at all.

Author

We pass lineageType from the payload; if it is not present in the payload, we assume the default value LINEAGE_TYPE_DATASET_PROCESS_LINEAGE.

This field can be set from the request payload. If not provided, it defaults to LINEAGE_TYPE_DATASET_PROCESS_LINEAGE. Supported values: "DatasetProcessLineage" (default), "ConnectionProcessLineage".
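
The defaulting behavior described here could look roughly like the sketch below. The class name `LineageRequestSketch` and the constant name `LINEAGE_TYPE_CONNECTION_PROCESS_LINEAGE` are assumptions; only the two string values and the default come from this thread. Rejecting unsupported values is also an assumption, not something the PR is known to do.

```java
import java.util.Set;

// Hypothetical request object; only the two string values come from the thread.
class LineageRequestSketch {
    static final String LINEAGE_TYPE_DATASET_PROCESS_LINEAGE = "DatasetProcessLineage";
    // Assumed constant name for the second supported value.
    static final String LINEAGE_TYPE_CONNECTION_PROCESS_LINEAGE = "ConnectionProcessLineage";

    private static final Set<String> SUPPORTED = Set.of(
            LINEAGE_TYPE_DATASET_PROCESS_LINEAGE,
            LINEAGE_TYPE_CONNECTION_PROCESS_LINEAGE);

    // Default used when the payload does not carry lineageType.
    private String lineageType = LINEAGE_TYPE_DATASET_PROCESS_LINEAGE;

    String getLineageType() {
        return lineageType;
    }

    void setLineageType(String lineageType) {
        if (lineageType == null || lineageType.isEmpty()) {
            // Missing in the payload: fall back to the default.
            this.lineageType = LINEAGE_TYPE_DATASET_PROCESS_LINEAGE;
            return;
        }
        if (!SUPPORTED.contains(lineageType)) {
            // Assumed behavior: fail fast on unknown values.
            throw new IllegalArgumentException("Unsupported lineageType: " + lineageType);
        }
        this.lineageType = lineageType;
    }
}
```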

@@ -90,6 +94,13 @@ public Integer getDepth() {
public void setDepth(Integer depth) {
this.depth = depth;
}
public String getLineageType() {

This will always return LINEAGE_TYPE_DATASET_PROCESS_LINEAGE. Please review whether the getter and the field are needed.

Updated the logic to set it based on the payload.

@@ -22,6 +23,7 @@ public class LineageOnDemandRequest {
private Set<String> attributes;
private Set<String> relationAttributes;
private LineageOnDemandBaseParams defaultParams;
private String lineageType = LINEAGE_TYPE_DATASET_PROCESS_LINEAGE;

Same concern for lineageType: it will always be LINEAGE_TYPE_DATASET_PROCESS_LINEAGE.

Done, see above.

@@ -35,6 +38,7 @@ public AtlasLineageListContext(LineageListRequest lineageListRequest, AtlasTypeR
this.vertexTraversalPredicate = constructInMemoryPredicate(typeRegistry, lineageListRequest.getEntityTraversalFilters());
this.edgeTraversalPredicate = constructInMemoryPredicate(typeRegistry, lineageListRequest.getRelationshipTraversalFilters());
this.attributes = lineageListRequest.getAttributes();
this.lineageType = lineageListRequest.getLineageType();

This always sets the value to LINEAGE_TYPE_DATASET_PROCESS_LINEAGE.

Updated the setter to set a different value based on the payload.

}

for (String connectionProcessQn : connectionProcessQNs) {
if (!checkIfChildProcessExistForConnectionProcess(connectionProcessQn)) {

If checkIfChildProcessExistForConnectionProcess returns GUIDs, you can avoid the further getEntityVertex lookup by atlasObjectId, which does not use the vertex cache, whereas a GUID lookup does.

}
} catch (Exception e) {
try {
String value = vertex.getProperty(PARENT_CONNECTION_PROCESS_QUALIFIED_NAME, String.class);

Again, why are you expecting a String for PARENT_CONNECTION_PROCESS_QUALIFIED_NAME? I think this is not needed.


4 participants