-
Notifications
You must be signed in to change notification settings - Fork 4.3k
Correcting the pipeline object definition #34899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correcting the pipeline object definition #34899
Conversation
Assigning reviewers: R: @shunping for label python. Note: If you would like to opt out of this review, comment Available commands:
The PR bot will only process comments in the main thread (not review comments). |
Thanks for trying to clarify the concept. However, I actually prefer the comments prior to this change, i.e. a pipeline is a DAG of PTransforms. One reason is that when we construct a pipeline in any Beam SDKs, we are explicitly putting the PTransforms together. PCollections, on the other hand, only serve as the output of one PTransform and the input of another. |
Thanks for your review. Isn't this contradicting here, then ? |
If every PTransform has only one input PCollection and only one output PCollection, then it sounds also ok to say a Pipeline is a DAG of PCollections. However, some PTransform takes multiple inputs, like Flatten. Say the input PCollections are A and B and the output of Flatten is C. It will be weird to say A,B,C are nodes, because now we have two edges A-C and B-C, and both edges represent the same Flatten Transform. |
Looks like there is inconsistency here. I checked our pipeline proto, and it also says a pipeline is a graph of PTransforms. |
|
My appologies. I closed the PR by mistake. Could you help modify the comment here instead? Thanks! beam/sdks/python/apache_beam/pipeline.py Line 122 in 5b862dd
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #34899 +/- ##
============================================
+ Coverage 54.52% 54.54% +0.01%
Complexity 1479 1479
============================================
Files 1010 1011 +1
Lines 160461 160513 +52
Branches 1079 1079
============================================
+ Hits 87499 87544 +45
- Misses 70864 70871 +7
Partials 2098 2098
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Run Prism_Python PreCommit 3.12 |
Just noticed two small typos in the change. Could you correct them so we can merge the PR? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for clarifying the concepts.
* Correcting the pipeline object definition * Corrected the definition * lint correction * Corrected formatting
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.