Skip to content

Skip rewriting internal pcollection coders in a stage #34655

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 17, 2025

Conversation

shunping
Copy link
Collaborator

@shunping shunping commented Apr 17, 2025

Prior to the PR, unknown coders were potentially replaced (e.g., length-prefixed) for every transform within a processing stage. This approach, however, caused issues in Java pipelines (like #32931), where worker nodes required the original coder (e.g., SchemaCoder) to correctly relay data between internal transforms.

An observation driving this change is that only the input and output coders of a stage are critical for the runner, as actual serialization/deserialization occurs at these stage boundaries. Coders for internal transforms should be irrelevant to the runner's processing and will now be kept intact. This minimizes potential side effects from modifying the pipeline graph and ensures internal data flow remains correct.

#fixes #32931

@shunping shunping changed the title Skip rewriting internal pcollection coders Skip rewriting internal pcollection coders in a stage Apr 17, 2025
@shunping shunping self-assigned this Apr 17, 2025
@shunping shunping marked this pull request as ready for review April 17, 2025 03:51
@shunping shunping requested a review from lostluck April 17, 2025 03:51
@shunping shunping added this to the 2.65.0 Release milestone Apr 17, 2025
Copy link
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @jrmccluskey for label go.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

Copy link

codecov bot commented Apr 17, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 54.59%. Comparing base (4ead940) to head (b38e32a).
Report is 2 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #34655      +/-   ##
============================================
- Coverage     54.61%   54.59%   -0.03%     
  Complexity     1480     1480              
============================================
  Files          1008     1008              
  Lines        159758   159657     -101     
  Branches       1079     1079              
============================================
- Hits          87259    87162      -97     
+ Misses        70399    70397       -2     
+ Partials       2100     2098       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@lostluck lostluck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elegant! I thought for sure this would have required a fix Java side as well.

But i suppose if it doesn't happen in practice then what's to worry about.

@liferoad liferoad merged commit 56f1aa1 into apache:master Apr 17, 2025
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[prism] org.apache.beam.sdk.transforms.ParDoSchemaTest - UserCoder not a row coder exceptions.
3 participants