-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increase consistency for BQ typed read with avro source #27971
Conversation
(AvroSource<T>) | ||
AvroSource.from(file.toString()) | ||
.withSchema(avroSchema) | ||
.withDatumReaderFactory(factory)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the factory returns a type T and coder is not propagated
Assigning reviewers. If you would like to opt out of this review, comment R: @bvolpato for label java. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
Thanks for solving this long-standing issue! The PR looks pretty solid to me. One concern is that could it be substantial breaking change if changing the signature from AvroSource to AvroSource<AvroT, T>. As Java Generic does not support default (e.g. class A<T, V=T>), one solution for less singature change is to define AvroSource extends CodableAvroSource<T, T>; or make CodableAvroSource<AvroT, T> extends AvroSource (did a little bit search: https://www.google.com/search?q=codable&oq=codable&aqs=chrome..69i57j0i512l6j69i64.1163j0j4&sourceid=chrome&ie=UTF-8) The goal is make the change needed for the code base smaller / which also means the possibility user needs to rewrite their pipeline smaller |
if (getMode() == SINGLE_FILE_OR_SUBRANGE) { | ||
// emptyMatchTreatment is unused for mode SINGLE_FILE_OR_SUBRANGE | ||
return this; | ||
} | ||
return new AvroSource<>( | ||
getFileOrPatternSpecProvider(), emptyMatchTreatment, getMinBundleSize(), mode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new source was created, regardless of the mode, which may extract wrong parameters in case of SINGLE_FILE_OR_SUBRANGE
. Looking at parent constructor, field is unused in that case
I tried to avoid breaking changes as much as possible, but this gets extremely complex, or create lots of code duplication to go around that.
breaking would be 'fine' |
I see the difficulty here. Thanks again for addressing this. In this case, would you mind sending the proposed change to Beam devlist for the discussion to the change of AvroSource? Also CC: @aromanenko-dev who worked heavily on avro for thoughts |
Will do! In the meantime, can you check the linked PR ? |
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment R: @kennknowles for label java. Available commands:
|
Sorry for delay with a response. What is a current state of this PR? Is it blocked by something? |
If I understood correctly the original issue was resolved by another approach #28143 |
Yes, it looks so. Should we close this one then? |
I did not have time to start the discussion on the mailing list to discuss the breaking changes and unify the API between Avro and BQ Avro dump. We can close this one, and in case this gets accepted, I'll re-open a new PR. |
Fix #26329
AvroT
type parameter inAvroSource
to memorize the materialized avro record type (the one produced by the avro reading).AvroDatumReader
andparseFn
in the source.parseFn
toBigqueryIO.TypedRead
so it behaves as theAvroSource
.