[CLAUDE] fix(databricks)!: disambiguate 2-arg date_add from 3-arg dateadd #7588
Open
RichardHughes-amp wants to merge 2 commits into tobymao:main from
Databricks treats `date_add` and `dateadd` as full aliases, with arity selecting the semantic:
- 2-arg `(startDate, numDays)`: always returns DATE
- 3-arg `(unit, value, expr)`: preserves the operand's type

Previously the Databricks parser routed both shapes through `build_date_delta(exp.DateAdd)`, collapsing them to a single AST node with identical args, so type annotation could not honor both contracts. Route the 2-arg form to `exp.TsOrDsAdd` (matching the Hive parser's existing behavior) and register a DATE-returning annotator for `exp.TsOrDsAdd` at the Spark typing level. The 3-arg form continues to produce `exp.DateAdd`, annotated by `_annotate_timeunit`. Follow-up to tobymao#3609, which preserved the unit during transpile but collapsed both forms at the AST level.
ruff UP006. Drops the now-unused typing import.
Comment from the PR author: I need to audit every place where AST consumers could be affected by this shift.
Follow-up to #3609.
That PR added Databricks support for the 3-arg `DATE_ADD(unit, value, expr)` shape and noted: "the old / 2-arg version is still supported alongside the new one and should be preserved to ensure the DATE vs TIMESTAMP semantics."
The transpile output preserves the unit, but at the AST level both shapes collapsed to `exp.DateAdd(unit=DAY|<unit>, this=expr, expression=value)` with the same argument slots; a 2-arg call and a 3-arg call with unit DAY produced bit-identical nodes. A type annotator on `exp.DateAdd` therefore cannot honor the differing return-type contracts:

| Form | Return type |
| --- | --- |
| `date_add(startDate, numDays)` | DATE |
| `dateadd(unit, value, expr)` | type of `expr` |

Empirical Databricks behavior
Both function names accept both arities, and arity alone selects the semantic. Verified against a live Databricks SQL warehouse:

| Expression | Result | Type |
| --- | --- | --- |
| `date_add(DATE'2024-01-01', 5)` | 2024-01-06 | date |
| `date_add(MONTH, 1, TIMESTAMP'2024-01-01 00:00')` | 2024-02-01T00:00:00.000+00:00 | timestamp |
| `dateadd(MONTH, 1, TIMESTAMP'2024-01-01 00:00')` | 2024-02-01T00:00:00.000+00:00 | timestamp |
| `dateadd(DATE'2024-01-01', 5)` | 2024-01-06 | date |
| `date_add(TIMESTAMP'2024-01-01 12:00', 5)` | 2024-01-06 | date |

Note the last row: 2-arg `date_add` coerces a TIMESTAMP operand to DATE on return, a contract that cannot be expressed by `_annotate_timeunit`'s type-preserving `_coerce_date` path.

Docs:
(The docs do not document the off-diagonal combinations, 3-arg `date_add` and 2-arg `dateadd`; those cells of the 2×2 name-by-arity matrix above were filled in by direct verification.)
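The arity rule in the tables above can be restated as a small function. This is a toy sketch, not sqlglot code: the name `databricks_date_add_return_type` and the string type labels (including the placeholder `"UNIT"` for the unit keyword) are hypothetical, and the 3-arg branch simply encodes this PR's summary ("preserves the operand's type") rather than any further Databricks rule.

```python
# Toy restatement of the Databricks return-type contract for date_add/dateadd.
# Hypothetical helper for illustration only; not part of sqlglot or Databricks.


def databricks_date_add_return_type(arg_types: list[str]) -> str:
    """Return the result type given the SQL types of the call's arguments.

    Arity alone selects the semantic; the name (date_add vs dateadd) does not matter.
    """
    if len(arg_types) == 2:
        # (startDate, numDays): always DATE, even for a TIMESTAMP startDate.
        return "DATE"
    if len(arg_types) == 3:
        # (unit, value, expr): the result takes the type of expr, the third argument.
        return arg_types[2]
    raise ValueError(f"expected 2 or 3 arguments, got {len(arg_types)}")


# Rows from the verification table as (argument types, observed result type).
# "UNIT" is a hypothetical placeholder for the MONTH keyword argument.
OBSERVED = [
    (["DATE", "INT"], "DATE"),                    # date_add(DATE'2024-01-01', 5)
    (["UNIT", "INT", "TIMESTAMP"], "TIMESTAMP"),  # date_add(MONTH, 1, TIMESTAMP'...')
    (["UNIT", "INT", "TIMESTAMP"], "TIMESTAMP"),  # dateadd(MONTH, 1, TIMESTAMP'...')
    (["DATE", "INT"], "DATE"),                    # dateadd(DATE'2024-01-01', 5)
    (["TIMESTAMP", "INT"], "DATE"),               # date_add(TIMESTAMP'2024-01-01 12:00', 5)
]

for args, expected in OBSERVED:
    assert databricks_date_add_return_type(args) == expected
```

The function never looks at the call's name, which is the point of the PR: the name carries no information, so the AST has to record the arity-derived shape instead.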
Change
- Parser (`sqlglot/parsers/databricks.py`): a single arity-dispatching builder is keyed to both `DATE_ADD` and `DATEADD`, since the names are aliases in Databricks. The 2-arg form routes to `exp.TsOrDsAdd` (matching the Hive parser's existing routing for 2-arg `DATE_ADD`); the 3-arg form continues to produce `exp.DateAdd`.
- Typing (`sqlglot/typing/spark.py`): registers a DATE-returning annotator for `exp.TsOrDsAdd`. This is scoped to the Spark typing module (not Hive) because older Hive returned STRING from `date_add`. Spark and Databricks inherit the new entry; Hive is unchanged.
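The parser change can be pictured with toy stand-ins. `DateAdd`, `TsOrDsAdd`, `build_date_add`, and `FUNCTIONS` below are hypothetical illustrations that only mirror the shape of sqlglot's `exp.DateAdd` / `exp.TsOrDsAdd` nodes and a parser's name-to-builder table; they are not the code in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DateAdd:
    """Stand-in for exp.DateAdd: the 3-arg, type-preserving shape."""

    this: Any        # the operand being shifted (expr)
    expression: Any  # the amount to add (value)
    unit: str        # e.g. "DAY", "MONTH"


@dataclass
class TsOrDsAdd:
    """Stand-in for exp.TsOrDsAdd: the 2-arg, DATE-returning shape."""

    this: Any          # startDate
    expression: Any    # numDays
    unit: str = "DAY"  # the 2-arg form always adds days


def build_date_add(args: list[Any]) -> DateAdd | TsOrDsAdd:
    """Hypothetical arity-dispatching builder shared by DATE_ADD and DATEADD."""
    if len(args) == 2:
        start_date, num_days = args
        return TsOrDsAdd(this=start_date, expression=num_days)
    if len(args) == 3:
        unit, value, expr = args
        return DateAdd(this=expr, expression=value, unit=unit)
    raise ValueError(f"DATE_ADD/DATEADD takes 2 or 3 arguments, got {len(args)}")


# Both names point at the same builder because they are aliases in Databricks.
FUNCTIONS: dict[str, Callable[[list[Any]], DateAdd | TsOrDsAdd]] = {
    "DATE_ADD": build_date_add,
    "DATEADD": build_date_add,
}
```

With this layout, `FUNCTIONS["DATEADD"](["d", 5])` yields a `TsOrDsAdd` while `FUNCTIONS["DATE_ADD"](["MONTH", 1, "ts"])` yields a `DateAdd`, so a downstream annotator can key on the node type alone instead of re-inspecting arity.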
Tests
- `tests/dialects/test_databricks.py::test_add_date`: round-trip and cross-dialect output for 2-arg, 3-arg, and off-diagonal forms.
- `tests/test_optimizer.py::test_databricks_date_add_annotation`: full 2×2 type-annotation matrix.
- `tests/test_optimizer.py::test_hive_chain_date_add_descent`: pins Hive (UNKNOWN) → Spark (DATE) → Databricks (DATE) to demonstrate that the typing change lands at Spark, not Hive.
Full upstream suite: 1209 tests pass.
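The Hive → Spark → Databricks descent pinned by `test_hive_chain_date_add_descent` can be modelled as layered lookup tables, where each dialect copies its parent's entries and adds its own. This is a toy sketch assuming that layering; the table names and the `"UNKNOWN"` fallback are hypothetical and are not sqlglot's typing internals.

```python
# Toy model of per-dialect annotator tables layered Hive -> Spark -> Databricks.
# Table names and the "UNKNOWN" fallback are hypothetical, not sqlglot internals.

# Hive layer: no DATE-returning entry for TsOrDsAdd, because older Hive
# returned STRING from date_add.
HIVE_ANNOTATORS: dict[str, str] = {}

# Spark layer: copy the parent table and add the new entry.
SPARK_ANNOTATORS: dict[str, str] = {**HIVE_ANNOTATORS, "TsOrDsAdd": "DATE"}

# Databricks layer: inherits Spark's table; nothing overrides TsOrDsAdd.
DATABRICKS_ANNOTATORS: dict[str, str] = {**SPARK_ANNOTATORS}


def annotate(node_type: str, annotators: dict[str, str]) -> str:
    """Return the result type registered for node_type, or UNKNOWN if none."""
    return annotators.get(node_type, "UNKNOWN")


for dialect, table in [
    ("hive", HIVE_ANNOTATORS),
    ("spark", SPARK_ANNOTATORS),
    ("databricks", DATABRICKS_ANNOTATORS),
]:
    print(dialect, annotate("TsOrDsAdd", table))
# hive UNKNOWN
# spark DATE
# databricks DATE
```

Registering the entry one level below Hive is what keeps Hive's result unchanged while both descendants pick it up.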
Breaking change
AST consumers that previously matched 2-arg `date_add(...)` via `find_all(exp.DateAdd)` will need to also match `exp.TsOrDsAdd`, or switch to the latter for that shape specifically.
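A consumer-side migration might look like the toy below. In sqlglot itself, `Expression.find_all` accepts several expression types, so the real-world change is likely `find_all(exp.DateAdd, exp.TsOrDsAdd)`. The `Node` class and its pre-order `find_all` here are hypothetical stand-ins so the example runs without sqlglot installed.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    """Hypothetical minimal AST node; not sqlglot's Expression."""

    children: list[Node] = field(default_factory=list)

    def find_all(self, *types: type) -> Iterator[Node]:
        """Yield this node and descendants matching any of types (pre-order)."""
        if isinstance(self, types):
            yield self
        for child in self.children:
            yield from child.find_all(*types)


class Select(Node): ...
class DateAdd(Node): ...    # 3-arg shape after this PR
class TsOrDsAdd(Node): ...  # 2-arg shape after this PR


# Roughly: SELECT date_add(d, 5), dateadd(MONTH, 1, ts), as parsed after the change.
tree = Select(children=[TsOrDsAdd(), DateAdd()])

# Old consumer query: silently misses the 2-arg call now.
old_matches = list(tree.find_all(DateAdd))

# Migrated query: matches both shapes.
new_matches = list(tree.find_all(DateAdd, TsOrDsAdd))
```

Consumers that only care about the 2-arg, DATE-returning form can instead query `TsOrDsAdd` alone.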