Correcting the pipeline object definition #34899

Merged on May 21, 2025 (4 commits)

Conversation

TanuSharma2511
Contributor


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@github-actions github-actions bot added the python label May 9, 2025
@TanuSharma2511
Contributor Author

@liferoad

Contributor

github-actions bot commented May 9, 2025

Assigning reviewers:

R: @shunping for label python.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@shunping
Collaborator

Thanks for trying to clarify the concept.

However, I actually prefer the comments prior to this change, i.e. a pipeline is a DAG of PTransforms. One reason is that when we construct a pipeline in any Beam SDK, we explicitly put the PTransforms together. PCollections, on the other hand, only serve as the output of one PTransform and the input of another.

@TanuSharma2511
Contributor Author

Thanks for your review. Isn't this contradicting what is said here, then?

@shunping
Collaborator

If every PTransform had exactly one input PCollection and exactly one output PCollection, it would also be fine to say that a Pipeline is a DAG of PCollections.

However, some PTransforms take multiple inputs, such as Flatten. Say the input PCollections are A and B and the output of Flatten is C. It would be awkward to treat A, B, and C as nodes, because we would then have two edges, A-C and B-C, and both edges would represent the same Flatten transform.
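
The Flatten case can be sketched in plain Python, without the Beam SDK. Everything here is an assumption made for illustration: `Transform`, `pcollection_view`, and `ptransform_view` are hypothetical names, not Beam APIs, and the two source transforms `CreateA` and `CreateB` are invented so that A and B have producers. The sketch only shows how the same toy pipeline looks under each choice of node.

```python
from collections import namedtuple

# Hypothetical stand-in for a Beam transform: a name plus the names of its
# input and output PCollections. This is not a Beam class.
Transform = namedtuple("Transform", ["name", "inputs", "outputs"])

# Toy pipeline: two (invented) sources produce A and B; Flatten merges them into C.
transforms = [
    Transform("CreateA", inputs=(), outputs=("A",)),
    Transform("CreateB", inputs=(), outputs=("B",)),
    Transform("Flatten", inputs=("A", "B"), outputs=("C",)),
]


def pcollection_view(transforms):
    """PCollections as nodes: one edge per (input, output) pair, labeled by transform."""
    return [(i, o, t.name) for t in transforms for i in t.inputs for o in t.outputs]


def ptransform_view(transforms):
    """PTransforms as nodes: one edge per PCollection, from its producer to its consumer."""
    producer = {o: t.name for t in transforms for o in t.outputs}
    return [(producer[i], t.name, i) for t in transforms for i in t.inputs]


pcoll_edges = pcollection_view(transforms)
ptrans_edges = ptransform_view(transforms)

print(pcoll_edges)   # two edges, A-C and B-C, both labeled "Flatten"
print(ptrans_edges)  # Flatten is a single node with two incoming edges, A and B
```

With PCollections as nodes, the single Flatten transform is spread over two edges, and the source transforms, which have no input PCollection, do not show up as edges at all. With PTransforms as nodes, each transform appears exactly once and each PCollection is exactly one edge.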

@shunping
Collaborator

shunping commented May 10, 2025

Thanks for your review. Isn't this contradicting what is said here, then?

Looks like there is an inconsistency here. I checked our pipeline proto, and it also says a pipeline is a graph of PTransforms.
https://github.com/apache/beam/blob/release-2.65/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto#L80

@shunping shunping closed this May 10, 2025
@shunping shunping reopened this May 10, 2025
@shunping
Collaborator

shunping commented May 10, 2025

My apologies. I closed the PR by mistake.

Could you help modify the comment here instead? Thanks!

Conceptually the :class:`~apache_beam.pvalue.PValue` s are the DAG's nodes and


codecov bot commented May 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 54.54%. Comparing base (9e8e3c3) to head (f3cd157).
Report is 57 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #34899      +/-   ##
============================================
+ Coverage     54.52%   54.54%   +0.01%     
  Complexity     1479     1479              
============================================
  Files          1010     1011       +1     
  Lines        160461   160513      +52     
  Branches       1079     1079              
============================================
+ Hits          87499    87544      +45     
- Misses        70864    70871       +7     
  Partials       2098     2098              
Flag Coverage Δ
python 81.10% <ø> (+<0.01%) ⬆️


@TanuSharma2511
Contributor Author

Run Prism_Python PreCommit 3.12

@shunping
Collaborator

Just noticed two small typos in the change.

Could you correct them so we can merge the PR? Thanks!

Collaborator

@shunping shunping left a comment

LGTM. Thanks for clarifying the concepts.

@liferoad liferoad merged commit 270c2b0 into apache:master May 21, 2025
89 of 90 checks passed
changliiu pushed a commit to changliiu/beam that referenced this pull request May 22, 2025
* Correcting the pipeline object definition

* Corrected the definition

* lint correction

* Corrected formatting
3 participants