Skip to content

Conversation

aaronsteers
Copy link
Contributor

@aaronsteers aaronsteers commented Oct 2, 2025

Migrated from community PR:

Summary by CodeRabbit

  • Bug Fixes

    • Timestamp columns are now consistently timezone-aware where applicable.
    • Corrected type mappings for timestamps with and without timezone.
    • Date-time strings now map to timezone-aware timestamps for improved consistency across databases.
    • Final tables and cache columns reflect accurate timezone settings in generated SQL.
  • Tests

    • Added an integration test validating SQL generation for timezone-aware timestamp columns.
    • Updated unit tests to verify timezone attributes on timestamp types.

Copy link

github-actions bot commented Oct 2, 2025

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This PyAirbyte Version

You can test this version of PyAirbyte using the following:

# Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@aj/uptickmetachu/fix/store-timestamps-with-timezone' pyairbyte --help

# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@aj/uptickmetachu/fix/store-timestamps-with-timezone'

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /fix-pr - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test-pr - Runs tests with the updated PyAirbyte

Community Support

Questions? Join the #pyairbyte channel in our Slack workspace.

📝 Edit this welcome message.

Copy link
Contributor

coderabbitai bot commented Oct 2, 2025

📝 Walkthrough

Walkthrough

TIMESTAMP usage was made explicit about timezone: code now constructs TIMESTAMP(timezone=True/False) for mapped types and extracted-at columns, and tests were added/updated to validate the timezone parameter in generated SQL and type translations.

Changes

Cohort / File(s) Summary of Changes
SQL Processor timestamp tweak
airbyte/shared/sql_processor.py
Set AB_EXTRACTED_AT_COLUMN to TIMESTAMP(timezone=True) in _get_sql_column_definitions.
Type translation timezone explicitness
airbyte/types.py
CONVERSION_MAP entries for timestamp types now use TIMESTAMP(timezone=True) / TIMESTAMP(timezone=False); to_sql_type for string with date-time returns TIMESTAMP(timezone=True).
Integration test for timezone-aware columns
tests/integration_tests/test_all_cache_types.py
Added test_cache_columns_for_datetime_types_are_timezone_aware() asserting generated SQL uses TIMESTAMP(timezone=...) for datetime columns via a patched DuckDB processor.
Unit tests updated for parameterized TIMESTAMP
tests/unit_tests/test_type_translation.py
Tests now construct types.TIMESTAMP(timezone=True/False) and, when expecting a TIMESTAMP, assert isinstance(..., TIMESTAMP) plus sql_type.timezone == expected.timezone; other type assertions unchanged.

Sequence Diagram(s)

sequenceDiagram
  participant Schema as JSON Schema
  participant Mapper as types.to_sql_type
  participant Map as CONVERSION_MAP
  participant Processor as SQL Processor (_get_sql_column_definitions)
  participant DB as DB exec (e.g., DuckDB)

  rect rgb(235,245,255)
    Schema->>Mapper: request SQL type for field (format: date-time)
    Mapper->>Map: lookup "timestamp_with_timezone"/"date-time"
    Map-->>Mapper: return TIMESTAMP(timezone=True)/TIMESTAMP(timezone=False)
  end

  rect rgb(245,255,235)
    Mapper->>Processor: emit column definition using TIMESTAMP(timezone=...)
    Processor->>DB: execute CREATE/ALTER TABLE with TIMESTAMP(timezone=...)
    DB-->>Processor: success / error
  end

  Note over Mapper,Processor: Tests assert TIMESTAMP instances include correct timezone flag
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Want me to add a comparison diagram for the old vs new column-creation flows, wdyt?

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit's high-level summary is enabled.
Title Check ✅ Passed The title clearly summarizes the main change—making timestamp storage timezone aware—and succinctly indicates that this is a fix without including irrelevant details.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch aj/uptickmetachu/fix/store-timestamps-with-timezone

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

github-actions bot commented Oct 2, 2025

PyTest Results (Fast Tests Only, No Creds)

304 tests  ±0   304 ✅ ±0   4m 22s ⏱️ -1s
  1 suites ±0     0 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit e7fac19. ± Comparison against base commit be288bb.

This pull request removes 2 and adds 2 tests. Note that renamed tests count towards both.
tests.unit_tests.test_type_translation ‑ test_to_sql_type[json_schema_property_def10-TIMESTAMP]
tests.unit_tests.test_type_translation ‑ test_to_sql_type[json_schema_property_def11-TIMESTAMP]
tests.unit_tests.test_type_translation ‑ test_to_sql_type[json_schema_property_def10-expected_sql_type10]
tests.unit_tests.test_type_translation ‑ test_to_sql_type[json_schema_property_def11-expected_sql_type11]

♻️ This comment has been updated with latest results.

Copy link

github-actions bot commented Oct 2, 2025

PyTest Results (Full)

368 tests  +1   352 ✅ +1   19m 52s ⏱️ +26s
  1 suites ±0    16 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit e7fac19. ± Comparison against base commit be288bb.

This pull request removes 2 and adds 3 tests. Note that renamed tests count towards both.
tests.unit_tests.test_type_translation ‑ test_to_sql_type[json_schema_property_def10-TIMESTAMP]
tests.unit_tests.test_type_translation ‑ test_to_sql_type[json_schema_property_def11-TIMESTAMP]
tests.integration_tests.test_all_cache_types ‑ test_cache_columns_for_datetime_types_are_timezone_aware
tests.unit_tests.test_type_translation ‑ test_to_sql_type[json_schema_property_def10-expected_sql_type10]
tests.unit_tests.test_type_translation ‑ test_to_sql_type[json_schema_property_def11-expected_sql_type11]

♻️ This comment has been updated with latest results.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/integration_tests/test_all_cache_types.py (1)

307-307: Clarify the temp_dir parameter.

Path() creates a Path object pointing to the current directory. Is this the intended behavior, or should it be Path.cwd() or a proper temporary directory? The parameter name suggests it should be a temporary location. Wdyt?

Consider using a more explicit temporary directory:

     processor = DuckDBSqlProcessor(
         catalog_provider=CatalogProvider(source.configured_catalog),
-        temp_dir=Path(),
+        temp_dir=Path.cwd() / "temp",  # or use tempfile.mkdtemp()
         temp_file_cleanup=True,
         sql_config=config,
     )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dfd19ac and d63b82f.

📒 Files selected for processing (1)
  • tests/integration_tests/test_all_cache_types.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/integration_tests/test_all_cache_types.py (4)
airbyte/sources/util.py (1)
  • get_source (47-132)
airbyte/_processors/sql/duckdb.py (2)
  • DuckDBConfig (27-87)
  • DuckDBSqlProcessor (90-184)
airbyte/shared/catalog_providers.py (1)
  • CatalogProvider (31-222)
airbyte/shared/sql_processor.py (3)
  • catalog_provider (226-242)
  • sql_config (385-387)
  • _ensure_final_table_exists (625-646)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Windows)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (No Creds)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (2)
tests/integration_tests/test_all_cache_types.py (2)

15-15: LGTM!

The import additions are all used in the new test function and are appropriate for the testing approach.

Also applies to: 20-20, 23-23


294-297: Manually verify source-faker required config fields
The script couldn’t retrieve any required fields—ensure that an empty config is valid or add any mandatory fields to the test, wdyt?

Addresses CodeRabbit feedback to properly verify timezone-awareness.
Instead of checking compiled SQL (which varies by dialect), now checks
repr() of SQLAlchemy type objects which shows 'TIMESTAMP(timezone=True)'.
This is dialect-agnostic and actually verifies the timezone parameter.

Co-Authored-By: AJ Steers <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/integration_tests/test_all_cache_types.py (1)

310-318: Consider checking the .timezone attribute directly?

The current implementation verifies timezone awareness by checking repr(col_type) for the substring "TIMESTAMP(timezone=True)". While this works, it relies on string matching against the type's representation, which could be fragile if SQLAlchemy changes its repr() format.

Wdyt about checking the .timezone attribute directly instead? This would be more robust and future-proof:

Apply this diff to check the attribute directly:

     column_definitions = processor._get_sql_column_definitions("products")
 
     for col_name in ["created_at", "updated_at", "_airbyte_extracted_at"]:
         assert col_name in column_definitions, f"{col_name} column should exist"
         col_type = column_definitions[col_name]
-        col_type_repr = repr(col_type)
-        assert "TIMESTAMP(timezone=True)" in col_type_repr, (
-            f"{col_name} should be timezone-aware. Got: {col_type_repr}"
+        assert hasattr(col_type, 'timezone'), (
+            f"{col_name} should be a TIMESTAMP type with timezone attribute"
+        )
+        assert col_type.timezone is True, (
+            f"{col_name} should be timezone-aware (timezone=True). Got: {col_type.timezone}"
         )

Based on learnings from the past review comment.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d63b82f and e7fac19.

📒 Files selected for processing (1)
  • tests/integration_tests/test_all_cache_types.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/integration_tests/test_all_cache_types.py (3)
airbyte/_processors/sql/duckdb.py (1)
  • DuckDBConfig (27-87)
airbyte/shared/catalog_providers.py (1)
  • CatalogProvider (31-222)
airbyte/shared/sql_processor.py (2)
  • catalog_provider (226-242)
  • _get_sql_column_definitions (687-704)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
  • GitHub Check: Pytest (All, Python 3.10, Windows)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (No Creds)
🔇 Additional comments (2)
tests/integration_tests/test_all_cache_types.py (2)

19-19: LGTM!

The new imports are necessary for the test function and follow the existing patterns in the file.

Also applies to: 22-22


290-308: LGTM!

The test setup correctly instantiates the necessary components for validating timezone-aware TIMESTAMP columns.

@aaronsteers aaronsteers merged commit 99d2ccf into main Oct 2, 2025
21 checks passed
@aaronsteers aaronsteers deleted the aj/uptickmetachu/fix/store-timestamps-with-timezone branch October 2, 2025 22:52
@uptickmetachu
Copy link
Contributor

uptickmetachu commented Oct 3, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants