BUG FIX : fix-ReadFromKafkaViaSDF-bug-shall-set-coder #34505
Assigning reviewers. If you would like to opt out of this review, comment R: @chamikaramj for label java. Available commands:
The PR bot will only process comments in the main thread (not review comments).
.withKeyDeserializerProviderAndCoder(
    kafkaRead.getKeyDeserializerProvider(), keyCoder)
.withValueDeserializerProviderAndCoder(
    kafkaRead.getValueDeserializerProvider(), valueCoder)
keyCoder and valueCoder must be non-null. They are resolved here:
beam/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
Lines 1562 to 1563 in 800d434
Coder<K> keyCoder = getKeyCoder(coderRegistry);
Coder<V> valueCoder = getValueCoder(coderRegistry);
They come either from user input or are inferred from the CoderRegistry.
This is at the entry point of KafkaIO, and it always yields a coder:
beam/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
Lines 1940 to 1943 in 800d434
private Coder<V> getValueCoder(CoderRegistry coderRegistry) {
  return (getValueCoder() != null)
      ? getValueCoder()
      : Preconditions.checkStateNotNull(getValueDeserializerProvider()).getCoder(coderRegistry);
If a coder is given, it returns the coder specified by the user. If not, it attempts to retrieve a coder from the registry, which only has coders for built-in deserializers.
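The fallback described in this comment can be sketched in isolation. This is a simplified model under stated assumptions, not Beam's implementation: coders are represented as plain strings, and CoderResolutionSketch, CustomRow, REGISTRY, and resolveCoder are hypothetical names introduced only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified, hypothetical model of coder resolution; these are not Beam's real classes. */
class CoderResolutionSketch {

  /** Stand-in for a custom value type such as Beam's Row. */
  static final class CustomRow {}

  /** Stand-in registry that only knows coders for built-in types (names are illustrative). */
  static final Map<Class<?>, String> REGISTRY = new HashMap<>();

  static {
    REGISTRY.put(String.class, "StringUtf8Coder");
    REGISTRY.put(Long.class, "VarLongCoder");
  }

  /**
   * Returns the user-supplied coder if present; otherwise looks one up by the
   * deserializer's output type and fails when the registry has no entry.
   */
  static String resolveCoder(String userCoder, Class<?> deserializedType) {
    if (userCoder != null) {
      return userCoder;
    }
    String inferred = REGISTRY.get(deserializedType);
    if (inferred == null) {
      throw new IllegalStateException(
          "Unable to infer a coder for " + deserializedType.getSimpleName());
    }
    return inferred;
  }

  public static void main(String[] args) {
    // Built-in type, no user coder: inference succeeds.
    System.out.println(resolveCoder(null, String.class));
    // Custom type with an explicit coder: the explicit coder is used.
    System.out.println(resolveCoder("RowCoder", CustomRow.class));
    // Custom type without a coder: inference fails.
    try {
      resolveCoder(null, CustomRow.class);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The third call is the case that matters for this PR: a custom type with no explicit coder has nothing to fall back on, so the explicit coder has to reach every place that performs this lookup.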
Thanks! Would it be possible to extend the test coverage? I think you could modify a simpler test like testKafkaIOWriteWithErrorHandler in KafkaIOIT.java to have a custom deserializer (perhaps just a no-op subclass of StringSerializer for example) without a registered coder. That would help prevent regressions as well.
Yes, will do. Actually, I was waiting for someone to point me to the place where I can write unit/integration tests 🤣. Thanks. 🙇
I think they are still there just grouped together differently.
On Thu, Apr 3, 2025, 4:23 PM johnjcasey wrote:
> This change is technically not backwards compatible, because it removes the old, but public facing, methods on ReadSourceDescriptors. Can you keep those methods?
stop reviewer notifications
Stopping reviewer notifications for this pull request: requested by reviewer. If you'd like to restart, comment
Sorry, I do not follow what you mean here. Would you mind giving more details, @scwhittle? 🙇 Thanks!!
assign set of reviewers
Fix for #34506
This PR fixes a bug when using KafkaIO with a customized Deserializer that implements Deserializer<Row>. We built this Deserializer to deserialize byte arrays into Beam Rows with a schema we specified.
When using KafkaIO, we use withValueDeserializerAndCoder. It runs without any issues with the legacy Kafka IO path, ReadFromKafkaViaUnbounded. However, it fails when using ReadFromKafkaViaSDF. The error originates here.
This happens because ReadFromKafkaViaSDF currently does not set the coder, even if we explicitly provide both the deserializer and the coder using withValueDeserializerAndCoder. Since no coder is explicitly set, Beam infers the coder from the deserializer:
beam/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
Lines 1940 to 1943 in 800d434
However, if we use a customized deserializer, such as foo_bar_Deserializer implements Deserializer<Row>, Beam will be unable to infer the coder and will throw an error.
ReadFromKafkaViaUnbounded sets both the deserializer and the coder based on the input.
ReadFromKafkaViaSDF currently does not set the coder explicitly. It does not pass the coder along and use it; instead it relies on inferring the coder from the Deserializer, which throws an error because the registry has no coder for Row.
To resolve this issue, we need to explicitly set the coder based on the input.
At the entry of KafkaIO, coders are resolved here:
beam/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
Lines 1562 to 1563 in 800d434
The coder is either supplied by the user through withValueDeserializerAndCoder, or Beam infers it by type from the coderRegistry.
This coder is passed to both ReadFromKafkaViaUnbounded and ReadFromKafkaViaSDF. ReadFromKafkaViaUnbounded uses the coder; ReadFromKafkaViaSDF does not.
This PR addresses the issue for ReadFromKafkaViaSDF.
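The difference between the two read paths can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: it does not use Beam's API, coders and deserializers are represented as plain strings, and every name in it (CoderPropagationSketch, Settings, forwardWithoutCoder, forwardWithCoder, coderUsedDownstream, FooBarDeserializer) is hypothetical.

```java
/** Hypothetical model of the two read paths; names do not mirror Beam's real API. */
class CoderPropagationSketch {

  /** Settings handed from the top-level read to the transform that executes it. */
  static final class Settings {
    final String deserializer;
    final String coder; // null means "infer from the deserializer later"

    Settings(String deserializer, String coder) {
      this.deserializer = deserializer;
      this.coder = coder;
    }
  }

  /** Models the behavior described for the SDF path before the fix: the coder is dropped. */
  static Settings forwardWithoutCoder(String deserializer, String resolvedCoder) {
    return new Settings(deserializer, null);
  }

  /** Models the fix: the already-resolved coder travels with the deserializer. */
  static Settings forwardWithCoder(String deserializer, String resolvedCoder) {
    return new Settings(deserializer, resolvedCoder);
  }

  /** Downstream lookup: use the forwarded coder, else infer (built-in deserializers only). */
  static String coderUsedDownstream(Settings s) {
    if (s.coder != null) {
      return s.coder;
    }
    if (s.deserializer.equals("StringDeserializer")) {
      return "StringUtf8Coder";
    }
    throw new IllegalStateException("Unable to infer a coder for " + s.deserializer);
  }

  public static void main(String[] args) {
    String custom = "FooBarDeserializer"; // hypothetical custom Deserializer<Row>
    // With the coder forwarded, the explicit coder is used.
    System.out.println(coderUsedDownstream(forwardWithCoder(custom, "RowCoder")));
    // With the coder dropped, inference fails for the custom deserializer.
    try {
      coderUsedDownstream(forwardWithoutCoder(custom, "RowCoder"));
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Dropping the coder goes unnoticed with a built-in deserializer because inference still succeeds, which may be why the problem only shows up with a custom Deserializer<Row>.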