[CLAUDE] fix(databricks)!: disambiguate 2-arg date_add from 3-arg dateadd #7588

Open
RichardHughes-amp wants to merge 2 commits into tobymao:main from RichardHughes-amp:storm-2954-databricks-dateadd-disambiguation

@RichardHughes-amp
Contributor

Follow-up to #3609.

That PR added Databricks support for the 3-arg DATE_ADD(unit, value, expr)
shape and noted: "the old / 2-arg version is still supported alongside the
new one and should be preserved to ensure the DATE vs TIMESTAMP semantics."

The transpile output preserves the unit, but at the AST level both shapes
collapsed to exp.DateAdd(unit=DAY|<unit>, this=expr, expression=value) with
bit-identical args. A type annotator on exp.DateAdd therefore cannot honor
the differing return-type contracts:

| Form | Documented return |
| --- | --- |
| date_add(startDate, numDays) | DATE |
| dateadd(unit, value, expr) | preserves expr's type |

Empirical Databricks behavior

Both function names accept both arities, and arity alone selects the
semantic. Verified against a live Databricks SQL warehouse:

| SQL | Result | Type |
| --- | --- | --- |
| date_add(DATE'2024-01-01', 5) | 2024-01-06 | date |
| date_add(MONTH, 1, TIMESTAMP'2024-01-01 00:00') | 2024-02-01T00:00:00.000+00:00 | timestamp |
| dateadd(MONTH, 1, TIMESTAMP'2024-01-01 00:00') | 2024-02-01T00:00:00.000+00:00 | timestamp |
| dateadd(DATE'2024-01-01', 5) | 2024-01-06 | date |
| date_add(TIMESTAMP'2024-01-01 12:00', 5) | 2024-01-06 | date |

Note the last row: 2-arg date_add coerces a TIMESTAMP operand to DATE on
return — a contract that cannot be expressed by _annotate_timeunit's
type-preserving _coerce_date path.
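
The contract described above can be written down as a small, self-contained model. This is only a sketch of the observed behavior as this PR describes it: `SqlType` and `observed_return_type` are hypothetical names for illustration and are not part of sqlglot or Databricks.

```python
from enum import Enum


class SqlType(Enum):
    DATE = "date"
    TIMESTAMP = "timestamp"


def observed_return_type(arg_count: int, operand_type: SqlType) -> SqlType:
    """Hypothetical model of the return-type contract described in this PR."""
    if arg_count == 2:
        # (startDate, numDays): always DATE, even for a TIMESTAMP operand.
        return SqlType.DATE
    if arg_count == 3:
        # (unit, value, expr): the operand's type is preserved.
        return operand_type
    raise ValueError(f"unsupported arity: {arg_count}")


# The function name is deliberately not a parameter: date_add and dateadd
# behaved identically in the verification table, so only arity matters.
for arity, operand in [
    (2, SqlType.DATE),
    (2, SqlType.TIMESTAMP),
    (3, SqlType.TIMESTAMP),
]:
    print(arity, operand.value, "->", observed_return_type(arity, operand).value)
```

The second case is the one a type-preserving annotator cannot express: the operand is a TIMESTAMP but the result is a DATE.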

Docs:

(The docs do not document the off-diagonal arities; the 2×2 above was
filled in by direct verification.)

Change

  • Parser (sqlglot/parsers/databricks.py): a single arity-dispatching
    builder is keyed to both DATE_ADD and DATEADD, since the names are
    aliases in Databricks. The 2-arg form routes to exp.TsOrDsAdd (matching
    the Hive parser's existing routing for 2-arg DATE_ADD); the 3-arg form
    continues to produce exp.DateAdd.

  • Typing (sqlglot/typing/spark.py): registers a DATE-returning
    annotator for exp.TsOrDsAdd. Scoped to the Spark typing module — not
    Hive — because older Hive returned STRING from date_add. Spark and
    Databricks inherit the new entry; Hive is unchanged.
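
As a rough illustration of the parser change, the sketch below shows arity dispatch with one builder registered under both names. It does not reproduce the code in sqlglot/parsers/databricks.py: `ToyTsOrDsAdd`, `ToyDateAdd`, `build_date_add`, and `TOY_FUNCTIONS` are hypothetical stand-ins, and arguments are plain strings rather than AST nodes.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class ToyTsOrDsAdd:
    """Stand-in for the 2-arg shape (always DATE-returning)."""
    this: str
    expression: str
    unit: str = "DAY"


@dataclass(frozen=True)
class ToyDateAdd:
    """Stand-in for the 3-arg shape (operand type preserved)."""
    this: str
    expression: str
    unit: str


ToyNode = Union[ToyTsOrDsAdd, ToyDateAdd]


def build_date_add(args: list[str]) -> ToyNode:
    """Pick the node type from the argument count alone."""
    if len(args) == 2:
        start_date, num_days = args
        return ToyTsOrDsAdd(this=start_date, expression=num_days)
    if len(args) == 3:
        unit, value, expr = args
        return ToyDateAdd(this=expr, expression=value, unit=unit)
    raise ValueError(f"expected 2 or 3 arguments, got {len(args)}")


# Both names map to the same builder because they are aliases.
TOY_FUNCTIONS: dict[str, Callable[[list[str]], ToyNode]] = {
    "DATE_ADD": build_date_add,
    "DATEADD": build_date_add,
}

print(TOY_FUNCTIONS["DATE_ADD"](["start_date", "5"]))
print(TOY_FUNCTIONS["DATEADD"](["MONTH", "1", "ts"]))
```

Keeping the two shapes in distinct node types is what lets a later typing pass assign each one its own return type.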

Tests

  • tests/dialects/test_databricks.py::test_add_date — round-trip and
    cross-dialect output for 2-arg, 3-arg, and off-diagonal forms.
  • tests/test_optimizer.py::test_databricks_date_add_annotation — full
    2×2 type-annotation matrix.
  • tests/test_optimizer.py::test_hive_chain_date_add_descent — pins
    Hive (UNKNOWN) → Spark (DATE) → Databricks (DATE) to demonstrate the
    typing change lands at Spark, not Hive.
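
The dialect chain that the last test pins can be pictured with plain dictionaries. This is a simplified sketch, assuming each dialect's typing table starts from its parent's entries; the table names and the `annotate` helper are hypothetical and do not mirror the real sqlglot typing modules.

```python
UNKNOWN = "UNKNOWN"

# Hive registers nothing for the 2-arg node in this sketch.
HIVE_ANNOTATORS: dict[str, str] = {}

# Spark adds the DATE-returning entry on top of what it inherits from Hive.
SPARK_ANNOTATORS = {**HIVE_ANNOTATORS, "TsOrDsAdd": "DATE"}

# Databricks inherits Spark's table without changes.
DATABRICKS_ANNOTATORS = {**SPARK_ANNOTATORS}


def annotate(table: dict[str, str], node_name: str) -> str:
    """Look up a node's return type, falling back to UNKNOWN."""
    return table.get(node_name, UNKNOWN)


for dialect, table in [
    ("hive", HIVE_ANNOTATORS),
    ("spark", SPARK_ANNOTATORS),
    ("databricks", DATABRICKS_ANNOTATORS),
]:
    print(dialect, annotate(table, "TsOrDsAdd"))
```

Because the entry is added at the Spark level, the Hive table is never touched, which is the property the descent test is meant to protect.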

Full upstream suite: 1209 tests pass.

Breaking change

AST consumers that previously matched 2-arg date_add(...) via
find_all(exp.DateAdd) will need to also match exp.TsOrDsAdd, or
switch to the latter for that shape specifically.

Databricks treats `date_add` and `dateadd` as full aliases with arity
selecting the semantic:
- 2-arg `(startDate, numDays)`: always returns DATE
- 3-arg `(unit, value, expr)`: preserves the operand's type

Previously the Databricks parser routed both shapes through
`build_date_delta(exp.DateAdd)`, collapsing them to a single AST node
with identical args. Type annotation could not honor both contracts.

Route the 2-arg form to `exp.TsOrDsAdd` (matching the Hive parser's
existing behavior) and register a DATE-returning annotator for
`exp.TsOrDsAdd` at the Spark typing level. The 3-arg form continues to
produce `exp.DateAdd`, annotated by `_annotate_timeunit`.

Follow-up to tobymao#3609, which preserved the unit during transpile but
collapsed both forms at the AST level.

ruff UP006. Drops the now-unused typing import.
RichardHughes-amp marked this pull request as ready for review on April 30, 2026, 20:22

RichardHughes-amp commented May 1, 2026

I still need to audit every place where AST consumers could be affected by this change.
