Normalize tf record io #34411
When running the YAML integration tests, I get an error that I haven't yet figured out how to solve.
  Returns:
    A WriteToTFRecord transform object.
  """
  return WriteToTFRecord(
Because WriteToTFRecord takes a PCollection of bytes but YAML produces a PCollection of rows, we need a small conversion layer here, similar to
def write_to_text(pcoll, path: str):
We will probably need a similar construct for reading, to map the bytes back to rows.
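To make the suggestion concrete, here is a minimal sketch of the two conversions in plain Python. It is not the code from this PR: the `BytesRow` type, the `record` field name, and the helper names `row_to_bytes` and `bytes_to_row` are all hypothetical, and a `NamedTuple` stands in for a Beam schema'd row so the snippet runs without Beam installed.

```python
from typing import NamedTuple


class BytesRow(NamedTuple):
    """Hypothetical single-field row; stands in for a Beam schema'd row."""
    record: bytes


def row_to_bytes(row) -> bytes:
    """Unwrap a single-field row into the raw bytes a TFRecord sink expects."""
    field_names = getattr(row, "_fields", None)
    if field_names is None or len(field_names) != 1:
        raise ValueError(
            "Expected a row with exactly one field, got fields: %r"
            % (field_names,))
    value = getattr(row, field_names[0])
    if not isinstance(value, (bytes, bytearray)):
        raise TypeError(
            "Field %r must hold bytes, got %s"
            % (field_names[0], type(value).__name__))
    return bytes(value)


def bytes_to_row(data: bytes) -> BytesRow:
    """Wrap raw record bytes in a single-field row for the read path."""
    return BytesRow(record=data)


if __name__ == "__main__":
    # Round trip: row -> bytes (write side) -> row (read side).
    payload = row_to_bytes(BytesRow(record=b"example"))
    print(bytes_to_row(payload))
```

In a pipeline, the write side would presumably look like `pcoll | beam.Map(row_to_bytes) | beam.io.WriteToTFRecord(path)`, with the read side applying `beam.Map(bytes_to_row)` after `beam.io.ReadFromTFRecord`. Rejecting rows with more than one field keeps the mapping unambiguous, which is the same constraint a text sink would need.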
@@ -0,0 +1,111 @@
/*
Once this is ready for full review (IMO that is probably now, if we can add a few integration tests), I'd recommend doing separate PRs for read and write, since they're not really tied together at all. That will make it easier to review and iterate. It will also hopefully unblock the read review while you deal with the issues you're seeing on write.
If you have integration test code, could you include that in the draft PR (or whatever follow up PR you have here)? I think it will help to see the full thing. Noting the |
This error is fixed.
Add TFRecordIO normalization for Java and Python.
This fixes #28692.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
- Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
- Update CHANGES.md with noteworthy changes.
See the Contributor Guide for more tips on how to make the review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.