-
Notifications
You must be signed in to change notification settings - Fork 67
Decimal bug #729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decimal bug #729
Conversation
data type in the COPY statement - in order to fix failures due to Snowflake not directly supporting the timestamp formats generated by the Salesforce connector.
📝 WalkthroughWalkthroughThis change updates type conversion logic for BigQuery to map DECIMAL and NUMERIC to FLOAT64, revises Snowflake authentication from password-based to private key-based, and adjusts the Snowflake destination configuration to use key pair authentication. Timestamp column handling in Snowflake's file ingestion is also improved. Changes
Sequence Diagram(s)sequenceDiagram
participant Cache as Cache
participant DestConfig as Destination Configuration
participant Snowflake as Snowflake
Cache->>DestConfig: Provide Snowflake cache (with private_key)
DestConfig->>DestConfig: Construct KeyPairAuthentication using private_key
DestConfig->>Snowflake: Use private_key for authentication
sequenceDiagram
participant Files as Files
participant SnowflakeProc as SnowflakeSqlProcessor
participant Snowflake as Snowflake
Files->>SnowflakeProc: Provide data files for ingestion
SnowflakeProc->>SnowflakeProc: Build column expressions
alt Timestamp column
SnowflakeProc->>SnowflakeProc: Wrap with COALESCE(TRY_TO_TIMESTAMP(...))
else Other column
SnowflakeProc->>SnowflakeProc: Reference column directly
end
SnowflakeProc->>Snowflake: Execute COPY INTO with new column expressions
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~15 minutes Possibly related PRs
Suggested reviewers
Would you like to add reviewers who have recently contributed to BigQuery or Snowflake integration logic as well, wdyt? Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte/_processors/sql/bigquery.py (1)
120-124
: Consider returning a SQLAlchemy type object for consistency?The mapping to "FLOAT64" string looks good for addressing precision issues with BigQuery decimals. However, I noticed this returns a string while other mappings return SQLAlchemy type objects. Would it make sense to use something like
sqlalchemy_types.Float()
or a BigQuery-specific float type for consistency with the rest of the method? Just wondering if this might cause issues downstream where SQLAlchemy type objects are expected - wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte/_processors/sql/bigquery.py
(1 hunks)airbyte/_processors/sql/snowflake.py
(4 hunks)airbyte/destinations/_translate_cache_to_dest.py
(2 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: aaronsteers
PR: airbytehq/PyAirbyte#396
File: airbyte/_processors/sql/bigquery.py:119-121
Timestamp: 2024-09-23T21:27:04.266Z
Learning: In BigQuery, `sqlalchemy_types.Integer()` maps to `INT64`, and `sqlalchemy_bigquery` aliases `Integer` as `INT64`.
Learnt from: aaronsteers
PR: airbytehq/PyAirbyte#396
File: airbyte/_processors/sql/bigquery.py:119-121
Timestamp: 2024-10-08T15:34:31.026Z
Learning: In BigQuery, `sqlalchemy_types.Integer()` maps to `INT64`, and `sqlalchemy_bigquery` aliases `Integer` as `INT64`.
airbyte/destinations/_translate_cache_to_dest.py (1)
Learnt from: aaronsteers
PR: #415
File: examples/run_perf_test_reads.py:117-127
Timestamp: 2024-10-09T19:21:45.994Z
Learning: In examples/run_perf_test_reads.py
, the code for setting up Snowflake configuration in get_cache
and get_destination
cannot be refactored into a shared helper function because there are differences between them.
airbyte/_processors/sql/bigquery.py (2)
Learnt from: aaronsteers
PR: #396
File: airbyte/_processors/sql/bigquery.py:119-121
Timestamp: 2024-09-23T21:27:04.266Z
Learning: In BigQuery, sqlalchemy_types.Integer()
maps to INT64
, and sqlalchemy_bigquery
aliases Integer
as INT64
.
Learnt from: aaronsteers
PR: #396
File: airbyte/_processors/sql/bigquery.py:119-121
Timestamp: 2024-10-08T15:34:31.026Z
Learning: In BigQuery, sqlalchemy_types.Integer()
maps to INT64
, and sqlalchemy_bigquery
aliases Integer
as INT64
.
airbyte/_processors/sql/snowflake.py (1)
Learnt from: aaronsteers
PR: #415
File: examples/run_perf_test_reads.py:117-127
Timestamp: 2024-10-09T19:21:45.994Z
Learning: In examples/run_perf_test_reads.py
, the code for setting up Snowflake configuration in get_cache
and get_destination
cannot be refactored into a shared helper function because there are differences between them.
🔇 Additional comments (5)
airbyte/_processors/sql/snowflake.py (3)
41-41
: Authentication method updated to private key - looks good!The switch from password to private key authentication is a good security improvement. I can see this change is consistently applied across
get_sql_alchemy_url
andget_vendor_client
methods.
34-34
: Timestamp format constants look reasonableThe AUTO format with fallback to a specific ISO-like format should handle most timestamp variations. Just curious - have you tested this with the common timestamp formats you expect to encounter in your data sources?
162-172
: Nice improvement to timestamp handling robustness!The COALESCE approach with multiple TRY_TO_TIMESTAMP calls is a solid way to handle various timestamp formats gracefully. The logic correctly differentiates between timestamp columns (which get the special handling) and other columns (which are referenced directly). This should make the data loading much more resilient to timestamp format variations.
airbyte/destinations/_translate_cache_to_dest.py (2)
15-15
: Consistent authentication update - great coordination!The import change from
UsernameAndPassword
toKeyPairAuthentication
aligns perfectly with the private key authentication changes in the Snowflake processor. Nice to see the coordinated effort across the codebase.
110-112
: Credentials construction updated correctlyThe switch to
KeyPairAuthentication
withprivate_key
from the cache is consistent with the authentication method changes in the Snowflake processor. The implementation looks correct and maintains the same structure as before.
Summary by CodeRabbit
New Features
Bug Fixes
Chores