Pipe Valuekind through DoFn and output builders#38490
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enables ValueKind metadata support throughout the Apache Beam pipeline. By allowing DoFns to output elements with specific ValueKind tags and receive them as parameters in @ProcessElement, this change provides greater flexibility for metadata-aware data processing. The implementation spans core SDKs, runner-specific overrides, and the Java harness, ensuring consistent behavior across different execution environments. Highlights
New Features🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces comprehensive support for ValueKind (e.g., INSERT, UPDATE, DELETE) across the Beam Java SDK and several runners, including Direct and Dataflow. Key changes include adding outputWithKind methods to DoFn contexts, allowing ValueKind as a parameter in @ProcessElement methods, and ensuring its propagation through transforms like Reshuffle and Reify. Review feedback identified critical bugs in SimpleDoFnRunner where variables were undefined and the ValueKind parameter was ignored in new method implementations. Additionally, an improvement was suggested for DoFnTester to ensure OpenTelemetry context is correctly propagated.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces ValueKind support to the Apache Beam Java SDK and runners, enabling elements to carry metadata such as INSERT, UPDATE_BEFORE, UPDATE_AFTER, or DELETE. Key changes include adding outputWithKind methods to DoFn contexts, updating WindowedValue to store ValueKind, and implementing the necessary logic in SimpleDoFnRunner, FnApiDoFnRunner, and the Dataflow runner. Review feedback identifies critical compilation errors in SimpleDoFnRunner where the timestamp variable is undefined in new methods, as well as a missing static import for assumeFalse in DataflowRunnerTest that will cause build failures.
|
|
||
| @Override | ||
| public <T> void outputWithKind(TupleTag<T> tag, T output, ValueKind kind) { | ||
| checkTimestamp(timestamp(), timestamp); |
There was a problem hiding this comment.
The variable timestamp is not defined in the scope of the outputWithKind method, which will lead to a compilation error. This appears to be a copy-paste error from the outputWithTimestamp method. Since the output timestamp is implicitly the context's firing timestamp (timestamp()), this check is redundant and should be removed.
|
|
||
| @Override | ||
| public <T> void outputWithKind(TupleTag<T> tag, T output, ValueKind kind) { | ||
| checkTimestamp(this.timestamp, timestamp); |
| } | ||
| // Skipp runner v2 because its Create uses a splittable DoFn, which contains a shuffle. | ||
| // ValueKind is not supported in Dataflow shuffle yet | ||
| assumeFalse(isRunnerV2); |
|
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
Allows DoFns to output elements with a ValueKind, and receive elements with a ValueKind parameter in the @ProcessElement method
Part of #38278
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>instead.CHANGES.mdwith noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.