feat(sql): support TIMESTAMPTZ #7020
Map TIMESTAMPTZ to Timestamp(unit, Some("UTC")), storing values internally as UTC, consistent with PostgreSQL semantics. Also parse TIMESTAMP WITHOUT TIME ZONE and TIME WITHOUT TIME ZONE for specification completeness.

TIMETZ is deliberately rejected, because Arrow's Time64 type has no timezone representation. This aligns with PostgreSQL's own documentation, which notes that the TIME WITH TIME ZONE type "exhibits properties which lead to questionable usefulness" and recommends using TIMESTAMPTZ instead [1].

Fix supertype computation for Timestamp(None) vs Timestamp(Some(tz)) so that timezone-naive and timezone-aware timestamps can be compared, using "localize" semantics (equivalent to PostgreSQL with session timezone = UTC).

[1] https://www.postgresql.org/docs/current/datatype-datetime.html

Closes Eventual-Inc#3957
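The type mapping described above can be modeled as a small Python function. This is a hypothetical sketch, not Daft's Rust code in schema.rs: the names map_sql_temporal, store_timestamptz_us, Timestamp and Time are illustrative, the microsecond default unit is an assumption, and the second helper only illustrates the PostgreSQL rule that a TIMESTAMPTZ literal is normalized to a UTC instant (a literal without an offset is assumed to be in a UTC session timezone).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Union


@dataclass(frozen=True)
class Timestamp:
    unit: str          # hypothetical unit tag, e.g. "us"
    tz: Optional[str]  # None means timezone-naive


@dataclass(frozen=True)
class Time:
    unit: str


EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def map_sql_temporal(type_name: str, with_time_zone: bool, unit: str = "us") -> Union[Timestamp, Time]:
    """Map a parsed SQL temporal type to an Arrow-style type (hypothetical model).

    with_time_zone is True for WITH TIME ZONE / *TZ spellings and False for
    WITHOUT TIME ZONE or when no clause is given.
    """
    if type_name == "TIMESTAMP":
        # TIMESTAMPTZ values are stored as UTC instants, so the label is always "UTC".
        return Timestamp(unit, "UTC" if with_time_zone else None)
    if type_name == "TIME":
        if with_time_zone:
            # Arrow's Time64 carries no timezone, so TIMETZ is rejected.
            raise TypeError("TIME WITH TIME ZONE (TIMETZ) is not supported")
        return Time(unit)
    raise ValueError(f"not a temporal type: {type_name}")


def store_timestamptz_us(literal: str) -> int:
    """Return microseconds since the Unix epoch; the original offset is not retained."""
    dt = datetime.fromisoformat(literal)
    if dt.tzinfo is None:
        # Assumption: session timezone = UTC, so a bare literal is read as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return (dt - EPOCH) // timedelta(microseconds=1)
```

Two literals naming the same instant with different offsets store the same value, which is why a single "UTC" label suffices for the column type.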
Greptile Summary
This PR adds SQL support for
Confidence Score: 4/5
The core SQL type mapping and supertype logic are correct and well-tested; the main concern is that silently accepting naive/aware timestamp comparisons (previously a hard error) is a behavioral change that should be flagged in the PR title. The type-mapping logic in
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["SQL TIMESTAMP type token"] --> B{TimezoneInfo?}
B -->|None / WITHOUT TIME ZONE| C["Timestamp(unit, None)"]
B -->|WITH TIME ZONE / Tz| D["Timestamp(unit, Some('UTC'))"]
E["SQL TIME type token"] --> F{TimezoneInfo?}
F -->|None / WITHOUT TIME ZONE| G["Time(unit)"]
F -->|WITH TIME ZONE / Tz| H["Error: Arrow Time64 has no timezone"]
C --> I[supertype resolution]
D --> I
J["Timestamp(unit, None)"] --> I
I --> K{Both sides of comparison}
K -->|"None + None"| L["Timestamp(max_tu, None)"]
K -->|"Some(tz) + Some(tz) same"| M["Timestamp(max_tu, Some(tz))"]
K -->|"Some(tz1) + Some(tz2) different"| N["Timestamp(max_tu, Some('UTC'))"]
K -->|"None + Some(tz) [NEW]"| O["Timestamp(max_tu, Some(tz)) — localize"]
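The resolution rules in the flowchart can be modeled as a pure function over (unit, tz) pairs. This is a hypothetical sketch, not the code in supertype.rs: timestamp_supertype and _UNIT_RANK are illustrative names, seconds are included in the unit ordering as an assumption, and empty-string timezones are not modeled.

```python
from typing import Optional, Tuple

# (unit, tz); tz None means timezone-naive.
TimestampType = Tuple[str, Optional[str]]

# Finer units rank higher; the supertype keeps the finer (max) unit.
_UNIT_RANK = {"s": 0, "ms": 1, "us": 2, "ns": 3}


def timestamp_supertype(left: TimestampType, right: TimestampType) -> TimestampType:
    (lu, ltz), (ru, rtz) = left, right
    unit = lu if _UNIT_RANK[lu] >= _UNIT_RANK[ru] else ru
    if ltz is None and rtz is None:
        return (unit, None)                       # None + None
    if ltz is None or rtz is None:
        # [NEW] localize: the naive side takes the aware side's label;
        # stored values are not shifted.
        return (unit, rtz if ltz is None else ltz)
    if ltz == rtz:
        return (unit, ltz)                        # same timezone
    return (unit, "UTC")                          # different timezones
```

The naive/aware branch is the one that previously had no resolution and surfaced as a TypeError.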
Reviews (1): Last reviewed commit: "feat(sql): support TIMESTAMPTZ"
Add a test case with a non-UTC offset (+05:30) where the naive and timezone-aware epochs differ by the offset amount, verifying that localize semantics produce the correct (not equal) result.
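A sketch of the suggested +05:30 check, using Python's datetime as a stand-in for stored epoch values. It assumes the label-only reading of "localize" given in the PR description (the naive value's stored epoch is kept and only the timezone label is attached); epoch_seconds and localize_label_only are hypothetical helpers, not Daft APIs.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
IST = timezone(timedelta(hours=5, minutes=30))


def epoch_seconds(dt: datetime) -> int:
    # Naive values are stored as if their wall-clock reading were UTC.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return (dt - EPOCH) // timedelta(seconds=1)


def localize_label_only(naive_epoch: int, tz: str) -> Tuple[int, str]:
    # Keep the stored value unchanged and attach the timezone label.
    return (naive_epoch, tz)


naive = epoch_seconds(datetime(2024, 1, 1, 12, 0, 0))            # same wall clock...
aware = epoch_seconds(datetime(2024, 1, 1, 12, 0, 0, tzinfo=IST))  # ...different instant

promoted = localize_label_only(naive, "+05:30")
aware_value = (aware, "+05:30")
same_instant = promoted[0] == aware_value[0]
```

The two epochs differ by exactly the 19,800-second offset, so under these semantics the comparison must come out not equal; a test that only uses UTC inputs cannot distinguish this from a wall-clock reinterpretation.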
The get_time_units function was selecting the wrong time unit when comparing timestamps with different precisions (e.g., Microseconds vs Milliseconds). Fix it to always return the higher-precision unit (Nanoseconds > Microseconds > Milliseconds).
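To see why the finer unit matters, the hypothetical helpers below (not Daft's get_time_units) rescale integer timestamps between units. Seconds are included as an assumption; coarsening uses floor division, so a 1,500 µs value collapses to 1 ms and spuriously compares equal to 1 ms.

```python
# Ticks per second for each unit (seconds included as an assumption).
_TICKS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


def finer_unit(a: str, b: str) -> str:
    """Return the higher-precision of two units."""
    return a if _TICKS_PER_SECOND[a] >= _TICKS_PER_SECOND[b] else b


def convert(value: int, src: str, dst: str) -> int:
    """Rescale an integer timestamp; coarsening floors and drops precision."""
    src_t, dst_t = _TICKS_PER_SECOND[src], _TICKS_PER_SECOND[dst]
    if dst_t >= src_t:
        return value * (dst_t // src_t)
    return value // (src_t // dst_t)


left, left_unit = 1_500, "us"
right, right_unit = 1, "ms"

# Wrong choice: the coarser unit makes distinct values look equal.
coarse_equal = convert(left, left_unit, "ms") == convert(right, right_unit, "ms")

# Fixed choice: compare in the finer unit.
unit = finer_unit(left_unit, right_unit)
fine_equal = convert(left, left_unit, unit) == convert(right, right_unit, unit)
```

Upcasting to the finer unit is lossless (only multiplication), which is why it is the safe common unit.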
Changes Made
Add SQL support for TIMESTAMPTZ, backed by PostgreSQL's documented behavior.

SQL type parsing (schema.rs):
- TIMESTAMPTZ → Timestamp(unit, Some("UTC")). As PostgreSQL states: "the value is stored internally as UTC, and the originally stated or assumed time zone is not retained." Our mapping is consistent with these semantics.
- TIMESTAMP WITHOUT TIME ZONE / TIME WITHOUT TIME ZONE → parsed for completeness.
- TIMETZ → deliberately rejected. PostgreSQL itself advises in Section 8.5.3: "The type time with time zone is defined by the SQL standard, but the definition exhibits properties which lead to questionable usefulness." Arrow's Time64 also has no timezone representation.

Supertype fix (supertype.rs):
- Timestamp(None) vs Timestamp(Some(tz)) now resolves by promoting the naive side with "localize" semantics (no conversion; the timezone label is simply attached). This is equivalent to PostgreSQL with session timezone = UTC. Without this fix, comparisons and joins between TIMESTAMP and TIMESTAMPTZ fail with a TypeError.

Tests:
- schema.rs
- None vs Some(tz), same-None, two-different-tz, and empty-tz edges in supertype.rs
- test_temporal_exprs.py

Related Issues
Closes #3957