fix(source-github): Resolve duplicate field errors for +1/-1 reaction columns in Parquet/Avro destinations #67609
… columns in Parquet/Avro destinations
- Transform reaction fields +1 and -1 to plus_one and minus_one in GithubStream.transform()
- Update all schema files to use plus_one and minus_one field names
- Update integration test expected records to match new field names
- Bump version to 1.9.1
Fixes airbytehq/oncall#9234
Co-Authored-By: unknown <>
…c.1 for progressive rollout Co-Authored-By: unknown <>
What
Resolves duplicate field errors for the GitHub reaction columns `+1` and `-1` when syncing to S3 Parquet/Avro destinations.
Fixes airbytehq/oncall#9234
The GitHub API returns reaction fields named `+1` and `-1`. These names cause duplicate field errors in Parquet/Avro destinations because the special characters are not valid in Avro field names and different systems normalize them differently.
How
Implements a source-side field renaming solution by:
- Updating `GithubStream.transform()` to rename reaction fields from `+1`/`-1` to `plus_one`/`minus_one` during data processing
- Updating the schema files to use the new field names:
  - shared/reactions.json (main reactions schema)
  - shared/events/comment.json
  - shared/events/commented.json
  - shared/events/cross_referenced.json
The transformation preserves the original GitHub API field names in unit test response files while using the sanitized field names in actual data output and integration tests.
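The renaming step described above can be sketched roughly as follows. This is a minimal illustration, not the code in `GithubStream.transform()`: the helper name `rename_reaction_fields`, the `REACTION_RENAMES` mapping, and the assumption that reactions live under a top-level `reactions` key are all hypothetical. The `AVRO_NAME` pattern reflects the Avro naming rule (a letter or underscore first, then letters, digits, or underscores).

```python
import re
from typing import Any, Dict

# Hypothetical mapping for illustration; the PR's real logic lives in
# GithubStream.transform() and may be structured differently.
REACTION_RENAMES = {"+1": "plus_one", "-1": "minus_one"}

# Avro names must start with [A-Za-z_] and continue with [A-Za-z0-9_].
AVRO_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def rename_reaction_fields(record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename '+1'/'-1' keys inside record['reactions'], if present.

    Assumes reactions sit under a top-level 'reactions' key. Records with
    a missing, null, or non-dict reactions value are returned unchanged.
    """
    reactions = record.get("reactions")
    if not isinstance(reactions, dict):
        return record
    # Rebuild the dict so every other reaction key passes through as-is.
    record["reactions"] = {
        REACTION_RENAMES.get(key, key): value for key, value in reactions.items()
    }
    return record


if __name__ == "__main__":
    raw = {"id": 1, "reactions": {"+1": 3, "-1": 0, "heart": 2}}
    print(rename_reaction_fields(raw))
    # The original names fail the Avro naming rule; the new ones pass.
    print(bool(AVRO_NAME.match("+1")), bool(AVRO_NAME.match("plus_one")))
```

Null and empty `reactions` values pass through untouched, which matches the edge cases the review guide asks reviewers to check.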
Review Guide
- source_github/streams.py - Verify transformation logic handles edge cases (null reactions, empty objects)
- Schema files - Verify they use `plus_one`/`minus_one` consistently
- integration_tests/expected_records.jsonl - Verify records use new field names
- Unit test response files - Verify they keep the `+1`/`-1` field names (these should preserve raw API responses)
User Impact
Positive Impact:
Breaking Change:
- References to the `+1`/`-1` reaction fields will need to be updated to use `plus_one`/`minus_one`
Can this PR be safely reverted and rolled back?
This change is isolated to the GitHub source connector and only affects reaction field naming. Reverting would restore the original field names and schema definitions.
Link to Devin run: https://app.devin.ai/sessions/a37c96affcd24179b577950c5cc54791
Requested by: @vai-airbyte via `/ai-fix` command