Skip to content

Conversation

gbikhazi20
Copy link

@gbikhazi20 gbikhazi20 commented Jul 30, 2025

Summary by CodeRabbit

  • New Features

    • Improved handling of numeric types for BigQuery destinations to reduce precision errors.
    • Enhanced timestamp parsing for Snowflake destinations, supporting multiple timestamp formats during data import.
  • Bug Fixes

    • Updated Snowflake authentication to use private key authentication instead of password-based authentication for increased security.
  • Chores

    • Internal updates to credential handling for Snowflake destinations.

aman-immersa and others added 4 commits January 25, 2025 16:01
data type in the COPY statement - in order to fix failures due to
Snowflake not directly supporting the timestamp formats generated
by the Salesforce connector.
Copy link
Contributor

coderabbitai bot commented Jul 30, 2025

📝 Walkthrough

Walkthrough

This change updates type conversion logic for BigQuery to map DECIMAL and NUMERIC to FLOAT64, revises Snowflake authentication from password-based to private key-based, and adjusts the Snowflake destination configuration to use key pair authentication. Timestamp column handling in Snowflake's file ingestion is also improved.

Changes

Cohort / File(s) Change Summary
BigQuery Type Conversion
airbyte/_processors/sql/bigquery.py
Enhanced type conversion in BigQueryTypeConverter.to_sql_type to explicitly map SQLAlchemy DECIMAL and NUMERIC types to BigQuery's FLOAT64, mitigating precision issues.
Snowflake Authentication & Timestamp Handling
airbyte/_processors/sql/snowflake.py
Switched SnowflakeConfig authentication from password to private key. Updated connection URL and client logic accordingly. Improved timestamp ingestion by wrapping timestamp columns in COALESCE of TRY_TO_TIMESTAMP calls with multiple formats during file ingestion.
Snowflake Destination Credentials
airbyte/destinations/_translate_cache_to_dest.py
Changed credential construction in snowflake_cache_to_destination_configuration to use KeyPairAuthentication with a private key instead of UsernameAndPassword with a password.

Sequence Diagram(s)

sequenceDiagram
    participant Cache as Cache
    participant DestConfig as Destination Configuration
    participant Snowflake as Snowflake

    Cache->>DestConfig: Provide Snowflake cache (with private_key)
    DestConfig->>DestConfig: Construct KeyPairAuthentication using private_key
    DestConfig->>Snowflake: Use private_key for authentication
Loading
sequenceDiagram
    participant Files as Files
    participant SnowflakeProc as SnowflakeSqlProcessor
    participant Snowflake as Snowflake

    Files->>SnowflakeProc: Provide data files for ingestion
    SnowflakeProc->>SnowflakeProc: Build column expressions
    alt Timestamp column
        SnowflakeProc->>SnowflakeProc: Wrap with COALESCE(TRY_TO_TIMESTAMP(...))
    else Other column
        SnowflakeProc->>SnowflakeProc: Reference column directly
    end
    SnowflakeProc->>Snowflake: Execute COPY INTO with new column expressions
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15 minutes

Possibly related PRs

Suggested reviewers

  • aaronsteers

Would you like to add reviewers who have recently contributed to BigQuery or Snowflake integration logic as well, wdyt?

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
airbyte/_processors/sql/bigquery.py (1)

120-124: Consider returning a SQLAlchemy type object for consistency?

The mapping to "FLOAT64" string looks good for addressing precision issues with BigQuery decimals. However, I noticed this returns a string while other mappings return SQLAlchemy type objects. Would it make sense to use something like sqlalchemy_types.Float() or a BigQuery-specific float type for consistency with the rest of the method? Just wondering if this might cause issues downstream where SQLAlchemy type objects are expected - wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1266018 and ab27781.

📒 Files selected for processing (3)
  • airbyte/_processors/sql/bigquery.py (1 hunks)
  • airbyte/_processors/sql/snowflake.py (4 hunks)
  • airbyte/destinations/_translate_cache_to_dest.py (2 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: aaronsteers
PR: airbytehq/PyAirbyte#396
File: airbyte/_processors/sql/bigquery.py:119-121
Timestamp: 2024-09-23T21:27:04.266Z
Learning: In BigQuery, `sqlalchemy_types.Integer()` maps to `INT64`, and `sqlalchemy_bigquery` aliases `Integer` as `INT64`.
Learnt from: aaronsteers
PR: airbytehq/PyAirbyte#396
File: airbyte/_processors/sql/bigquery.py:119-121
Timestamp: 2024-10-08T15:34:31.026Z
Learning: In BigQuery, `sqlalchemy_types.Integer()` maps to `INT64`, and `sqlalchemy_bigquery` aliases `Integer` as `INT64`.
airbyte/destinations/_translate_cache_to_dest.py (1)

Learnt from: aaronsteers
PR: #415
File: examples/run_perf_test_reads.py:117-127
Timestamp: 2024-10-09T19:21:45.994Z
Learning: In examples/run_perf_test_reads.py, the code for setting up Snowflake configuration in get_cache and get_destination cannot be refactored into a shared helper function because there are differences between them.

airbyte/_processors/sql/bigquery.py (2)

Learnt from: aaronsteers
PR: #396
File: airbyte/_processors/sql/bigquery.py:119-121
Timestamp: 2024-09-23T21:27:04.266Z
Learning: In BigQuery, sqlalchemy_types.Integer() maps to INT64, and sqlalchemy_bigquery aliases Integer as INT64.

Learnt from: aaronsteers
PR: #396
File: airbyte/_processors/sql/bigquery.py:119-121
Timestamp: 2024-10-08T15:34:31.026Z
Learning: In BigQuery, sqlalchemy_types.Integer() maps to INT64, and sqlalchemy_bigquery aliases Integer as INT64.

airbyte/_processors/sql/snowflake.py (1)

Learnt from: aaronsteers
PR: #415
File: examples/run_perf_test_reads.py:117-127
Timestamp: 2024-10-09T19:21:45.994Z
Learning: In examples/run_perf_test_reads.py, the code for setting up Snowflake configuration in get_cache and get_destination cannot be refactored into a shared helper function because there are differences between them.

🔇 Additional comments (5)
airbyte/_processors/sql/snowflake.py (3)

41-41: Authentication method updated to private key - looks good!

The switch from password to private key authentication is a good security improvement. I can see this change is consistently applied across get_sql_alchemy_url and get_vendor_client methods.


34-34: Timestamp format constants look reasonable

The AUTO format with fallback to a specific ISO-like format should handle most timestamp variations. Just curious - have you tested this with the common timestamp formats you expect to encounter in your data sources?


162-172: Nice improvement to timestamp handling robustness!

The COALESCE approach with multiple TRY_TO_TIMESTAMP calls is a solid way to handle various timestamp formats gracefully. The logic correctly differentiates between timestamp columns (which get the special handling) and other columns (which are referenced directly). This should make the data loading much more resilient to timestamp format variations.

airbyte/destinations/_translate_cache_to_dest.py (2)

15-15: Consistent authentication update - great coordination!

The import change from UsernameAndPassword to KeyPairAuthentication aligns perfectly with the private key authentication changes in the Snowflake processor. Nice to see the coordinated effort across the codebase.


110-112: Credentials construction updated correctly

The switch to KeyPairAuthentication with private_key from the cache is consistent with the authentication method changes in the Snowflake processor. The implementation looks correct and maintains the same structure as before.

@gbikhazi20 gbikhazi20 closed this Jul 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants