-
Notifications
You must be signed in to change notification settings - Fork 67
fix: store timestamps with timezone (from @uptickmetachu PR) #814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: store timestamps with timezone (from @uptickmetachu PR) #814
Conversation
This subsequently means type hints are also correct
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This PyAirbyte VersionYou can test this version of PyAirbyte using the following: # Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@aj/uptickmetachu/fix/store-timestamps-with-timezone' pyairbyte --help
# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@aj/uptickmetachu/fix/store-timestamps-with-timezone' Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
Community SupportQuestions? Join the #pyairbyte channel in our Slack workspace. |
📝 WalkthroughWalkthroughTIMESTAMP usage was made explicit about timezone: code now constructs TIMESTAMP(timezone=True/False) for mapped types and extracted-at columns, and tests were added/updated to validate the timezone parameter in generated SQL and type translations. Changes
Sequence Diagram(s)sequenceDiagram
participant Schema as JSON Schema
participant Mapper as types.to_sql_type
participant Map as CONVERSION_MAP
participant Processor as SQL Processor (_get_sql_column_definitions)
participant DB as DB exec (e.g., DuckDB)
rect rgb(235,245,255)
Schema->>Mapper: request SQL type for field (format: date-time)
Mapper->>Map: lookup "timestamp_with_timezone"/"date-time"
Map-->>Mapper: return TIMESTAMP(timezone=True)/TIMESTAMP(timezone=False)
end
rect rgb(245,255,235)
Mapper->>Processor: emit column definition using TIMESTAMP(timezone=...)
Processor->>DB: execute CREATE/ALTER TABLE with TIMESTAMP(timezone=...)
DB-->>Processor: success / error
end
Note over Mapper,Processor: Tests assert TIMESTAMP instances include correct timezone flag
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Want me to add a comparison diagram for the old vs new column-creation flows, wdyt? Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
PyTest Results (Fast Tests Only, No Creds)304 tests ±0 304 ✅ ±0 4m 22s ⏱️ -1s Results for commit e7fac19. ± Comparison against base commit be288bb. This pull request removes 2 and adds 2 tests. Note that renamed tests count towards both.
♻️ This comment has been updated with latest results. |
PyTest Results (Full)368 tests +1 352 ✅ +1 19m 52s ⏱️ +26s Results for commit e7fac19. ± Comparison against base commit be288bb. This pull request removes 2 and adds 3 tests. Note that renamed tests count towards both.
♻️ This comment has been updated with latest results. |
Co-Authored-By: AJ Steers <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/integration_tests/test_all_cache_types.py (1)
307-307
: Clarify the temp_dir parameter.
Path()
creates a Path object pointing to the current directory. Is this the intended behavior, or should it bePath.cwd()
or a proper temporary directory? The parameter name suggests it should be a temporary location. Wdyt?Consider using a more explicit temporary directory:
processor = DuckDBSqlProcessor( catalog_provider=CatalogProvider(source.configured_catalog), - temp_dir=Path(), + temp_dir=Path.cwd() / "temp", # or use tempfile.mkdtemp() temp_file_cleanup=True, sql_config=config, )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/integration_tests/test_all_cache_types.py
(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/integration_tests/test_all_cache_types.py (4)
airbyte/sources/util.py (1)
get_source
(47-132)airbyte/_processors/sql/duckdb.py (2)
DuckDBConfig
(27-87)DuckDBSqlProcessor
(90-184)airbyte/shared/catalog_providers.py (1)
CatalogProvider
(31-222)airbyte/shared/sql_processor.py (3)
catalog_provider
(226-242)sql_config
(385-387)_ensure_final_table_exists
(625-646)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (2)
tests/integration_tests/test_all_cache_types.py (2)
15-15
: LGTM!The import additions are all used in the new test function and are appropriate for the testing approach.
Also applies to: 20-20, 23-23
294-297
: Manually verify source-faker required config fields
The script couldn’t retrieve any required fields—ensure that an empty config is valid or add any mandatory fields to the test, wdyt?
Addresses CodeRabbit feedback to properly verify timezone-awareness. Instead of checking compiled SQL (which varies by dialect), now checks repr() of SQLAlchemy type objects which shows 'TIMESTAMP(timezone=True)'. This is dialect-agnostic and actually verifies the timezone parameter. Co-Authored-By: AJ Steers <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/integration_tests/test_all_cache_types.py (1)
310-318
: Consider checking the.timezone
attribute directly?The current implementation verifies timezone awareness by checking
repr(col_type)
for the substring"TIMESTAMP(timezone=True)"
. While this works, it relies on string matching against the type's representation, which could be fragile if SQLAlchemy changes itsrepr()
format.Wdyt about checking the
.timezone
attribute directly instead? This would be more robust and future-proof:Apply this diff to check the attribute directly:
column_definitions = processor._get_sql_column_definitions("products") for col_name in ["created_at", "updated_at", "_airbyte_extracted_at"]: assert col_name in column_definitions, f"{col_name} column should exist" col_type = column_definitions[col_name] - col_type_repr = repr(col_type) - assert "TIMESTAMP(timezone=True)" in col_type_repr, ( - f"{col_name} should be timezone-aware. Got: {col_type_repr}" + assert hasattr(col_type, 'timezone'), ( + f"{col_name} should be a TIMESTAMP type with timezone attribute" + ) + assert col_type.timezone is True, ( + f"{col_name} should be timezone-aware (timezone=True). Got: {col_type.timezone}" )Based on learnings from the past review comment.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/integration_tests/test_all_cache_types.py
(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/integration_tests/test_all_cache_types.py (3)
airbyte/_processors/sql/duckdb.py (1)
DuckDBConfig
(27-87)airbyte/shared/catalog_providers.py (1)
CatalogProvider
(31-222)airbyte/shared/sql_processor.py (2)
catalog_provider
(226-242)_get_sql_column_definitions
(687-704)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (No Creds)
🔇 Additional comments (2)
tests/integration_tests/test_all_cache_types.py (2)
19-19
: LGTM!The new imports are necessary for the test function and follow the existing patterns in the file.
Also applies to: 22-22
290-308
: LGTM!The test setup correctly instantiates the necessary components for validating timezone-aware TIMESTAMP columns.
I think this PR critically does not include these lines of code (I just did a test on 0.31.7) and |
Migrated from community PR:
Summary by CodeRabbit
Bug Fixes
Tests