devin-ai-integration[bot]
Contributor

What

Resolves duplicate field errors for GitHub reaction columns +1 and -1 when syncing to S3 Parquet/Avro destinations.

Fixes airbytehq/oncall#9234

The GitHub API returns reaction fields named +1 and -1. Neither is a valid Avro field name because of the special characters, and different systems normalize them differently, which produces duplicate field errors in Parquet/Avro destinations.
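To illustrate the collision, here is a minimal sketch. The name pattern follows the Avro specification (`[A-Za-z_][A-Za-z0-9_]*`); `naive_sanitize` is a hypothetical normalizer written for this example and is not the actual destination code, which may normalize names differently.

```python
import re

# Avro names must start with a letter or underscore and then contain
# only letters, digits, or underscores.
AVRO_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def naive_sanitize(name: str) -> str:
    """Hypothetical normalizer: replace each invalid character with '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # Prefix an underscore if the first character is still not allowed.
    if not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned


for field in ("+1", "-1", "plus_one", "minus_one"):
    print(field, bool(AVRO_NAME.match(field)), naive_sanitize(field))
```

Under this assumed normalizer, both `+1` and `-1` become `_1`, so a record schema would contain the same field twice. The names `plus_one` and `minus_one` already match the Avro pattern and pass through unchanged.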

How

Implements a source-side field renaming solution by:

  1. Runtime Transformation: Added logic in GithubStream.transform() to rename reaction fields from +1/-1 to plus_one/minus_one during data processing
  2. Schema Updates: Updated all schema files that reference reaction fields to use the new field names:
    • shared/reactions.json (main reactions schema)
    • shared/events/comment.json
    • shared/events/commented.json
    • shared/events/cross_referenced.json
  3. Test Data Updates: Updated integration test expected records to match the new field names
  4. Version Bump: Incremented version to 1.9.1 as this is a schema-breaking change

Unit test response files keep the original GitHub API field names (+1/-1), since they represent raw API responses. Actual data output and integration tests use the sanitized names (plus_one/minus_one).
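The renaming step could look roughly like the sketch below. This is not the code from this PR: `rename_reaction_fields` and `REACTION_RENAMES` are hypothetical names, and the sketch assumes the reactions object sits at the top-level `reactions` key of a record (event streams with nested reactions would need the same logic applied at the nested path).

```python
from typing import Any, MutableMapping

# Mapping from raw GitHub API names to Avro-safe names (from the PR description).
REACTION_RENAMES = {"+1": "plus_one", "-1": "minus_one"}


def rename_reaction_fields(record: MutableMapping[str, Any]) -> MutableMapping[str, Any]:
    """Rename +1/-1 inside record["reactions"] in place and return the record."""
    reactions = record.get("reactions")
    # Null or missing reactions: nothing to rename.
    if not isinstance(reactions, dict):
        return record
    for old_name, new_name in REACTION_RENAMES.items():
        if old_name in reactions:
            reactions[new_name] = reactions.pop(old_name)
    return record


raw = {"id": 1, "reactions": {"+1": 2, "-1": 0, "heart": 1}}
print(rename_reaction_fields(raw))
```

Records with `reactions: null` or an empty reactions object pass through unchanged, and other reaction keys such as `heart` are left as they are.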

Review Guide

  1. source_github/streams.py - Verify transformation logic handles edge cases (null reactions, empty objects)
  2. Schema files - Confirm all reaction field references use plus_one/minus_one consistently
  3. integration_tests/expected_records.jsonl - Verify records use new field names
  4. Unit test response files - Ensure they still use original +1/-1 field names (these should preserve raw API responses)
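For review item 2, a small helper along these lines could flag any schema that still references the old names. It is only a sketch: `find_old_reaction_keys` is a hypothetical function, and the inline schema is a made-up fragment rather than the contents of shared/reactions.json.

```python
import json
from typing import Any, Iterator

OLD_NAMES = {"+1", "-1"}


def find_old_reaction_keys(node: Any, path: str = "$") -> Iterator[str]:
    """Yield the path of every dict key named +1 or -1 in a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}"
            if key in OLD_NAMES:
                yield child_path
            yield from find_old_reaction_keys(value, child_path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_old_reaction_keys(item, f"{path}[{index}]")


# Made-up schema fragment that still contains one stale field name.
schema = json.loads(
    '{"properties": {"reactions": {"properties": '
    '{"+1": {"type": "integer"}, "minus_one": {"type": "integer"}}}}}'
)
print(list(find_old_reaction_keys(schema)))
```

Running the same check over the unit test response files should report matches, since those files are expected to keep the raw +1/-1 names.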

User Impact

Positive Impact:

  • Eliminates duplicate field errors when syncing GitHub data to Parquet/Avro destinations
  • Resolves long-standing compatibility issues with S3 destinations

Breaking Change:

  • Users will need to refresh their GitHub source schema after upgrading
  • Existing dashboards/queries referencing +1/-1 reaction fields will need to be updated to use plus_one/minus_one

Can this PR be safely reverted and rolled back?

  • YES 💚

This change is isolated to the GitHub source connector and only affects reaction field naming. Reverting would restore the original field names and schema definitions.


Link to Devin run: https://app.devin.ai/sessions/a37c96affcd24179b577950c5cc54791
Requested by: @vai-airbyte via /ai-fix command

… columns in Parquet/Avro destinations

- Transform reaction fields +1 and -1 to plus_one and minus_one in GithubStream.transform()
- Update all schema files to use plus_one and minus_one field names
- Update integration test expected records to match new field names
- Bump version to 1.9.1

Fixes airbytehq/oncall#9234

Co-Authored-By: unknown <>
Contributor Author

Original prompt from API User
Comment from @vai-airbyte: /ai-fix

IMPORTANT: The user will expect a response posted back to the PR. You should post exactly one comment back to the respective issue PR. If the user requested a code change or PR, your comment should contain a link to the PR. Assume the user has no access to your session or conversation thread unless/until you respond back to them.

Issue #9234 by @octavia-squidington-iii: Maintain - Source Github: Columns `+1` and `-1` causing duplicate field errors in S3 Parquet/Avro Destinations

Issue URL: https://github.com/airbytehq/oncall/issues/9234

Please use playbook macro: !issue_fix

PLAYBOOK_md:
# AI Fix Playbook

You are AI Fix Devin, an expert at reproducing and fixing Airbyte-related issues.

## Context
You are working on the issue linked above in context. You will also need to pull issue comments for full context.

## Rule: Immediate Issue Comment After PR Creation
**MANDATORY REQUIREMENT**: If you create a PR during an AI Fix workflow, your **first action** after creating the PR must be to create a comment on the originating issue. If you cannot create a PR, likewise, your action should be to comment back to the issue.

## Your Task: Reproduce and Fix

1. **Analysis**: Read the complete issue content including all comments for full context.

2. **Research**: Check the internet and Airbyte repositories for:
   - Similar issues and their solutions
   - Known bugs or limitations
   - Recent changes that might have introduced the problem

3. **Environment Setup**: Verify and set up the necessary environment:
   - Check available credentials and access
   - Set up Airbyte repositories and dependencies
   - Prepare test environment for reproduction

4. **Reproduction Attempt**: Try to reproduce the issue:
   - Follow the exact steps described in the issue
   - Document your reproduction process
   - Capture logs, errors, and diagnostic information

5. **Root Cause Analysis**: If reproduction is successful:
   - Analyze the root ... (1704 chars truncated...)

Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Contributor

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Helpful Resources

PR Slash Commands

Airbyte Maintainers (that's you!) can execute the following slash commands on your PR:

  • /format-fix - Fixes most formatting issues.
  • /bump-version - Bumps connector versions.
    • You can specify a custom changelog by passing changelog. Example: /bump-version changelog="My cool update"
    • Leaving the changelog arg blank will auto-populate the changelog from the PR title.
  • /run-cat-tests - Runs legacy CAT tests (Connector Acceptance Tests)
  • /build-connector-images - Builds and publishes a pre-release docker image for the modified connector(s).
  • JVM connectors:
    • /update-connector-cdk-version connector=<CONNECTOR_NAME> - Updates the specified connector to the latest CDK version.
      Example: /update-connector-cdk-version connector=destination-bigquery
    • /bump-bulk-cdk-version type=patch changelog='foo' - Bump the Bulk CDK's version. type can be major/minor/patch.
  • Python connectors:
    • /poe connector source-example lock - Run the Poe lock task on the source-example connector, committing the results back to the branch.
    • /poe source example lock - Alias for /poe connector source-example lock.
    • /poe source example use-cdk-branch my/branch - Pin the source-example CDK reference to the branch name specified.
    • /poe source example use-cdk-latest - Update the source-example CDK dependency to the latest available version.


Contributor

github-actions bot commented Oct 10, 2025

source-github Connector Test Results

91 tests: 87 passed ✅, 4 skipped 💤, 0 failed ❌
3 suites, 3 files, 20s ⏱️

Results for commit 3f3429c.

♻️ This comment has been updated with latest results.

…c.1 for progressive rollout

Co-Authored-By: unknown <>
Contributor

github-actions bot commented Oct 10, 2025

Deploy preview for airbyte-docs ready!

✅ Preview
https://airbyte-docs-dvzluhnaf-airbyte-growth.vercel.app

Built with commit 3f3429c.
This pull request is being automatically deployed with vercel-action

@DanyloGL DanyloGL moved this from New PRs to Waiting Eng Team in 🧑‍🏭 Community Pull Requests Oct 13, 2025
