feat: unify to_timestamp/try_to_timestamp, honor ANSI, drop legacy UDF#2100
feat: unify to_timestamp/try_to_timestamp, honor ANSI, drop legacy UDF#2100davidlghellin wants to merge 3 commits into
to_timestamp/try_to_timestamp, honor ANSI, drop legacy UDF#2100Conversation
There was a problem hiding this comment.
Pull request overview
Unifies Spark-compatible to_timestamp / try_to_timestamp (and _ntz) planning and execution by routing parsing through a single ANSI-aware SparkTimestamp UDF, removing the legacy SparkTryToTimestamp implementation. This aligns timestamp parsing behavior with spark.sql.ansi.enabled while keeping cluster-mode serialization working via updated physical plan codec/proto.
Changes:
- Added new BDD coverage for
to_timestamp(strict/ANSI-aware) andtry_to_timestamp(NULL-on-failure) including timezone and per-row format cases. - Refactored planner logic to share
to_timestamp/try_to_timestampimplementation and consistently passansi_modeintoSparkTimestamp. - Removed legacy
SparkTryToTimestampUDF and extended remote execution serialization (codec.rs+physical.proto) to carryansi_mode.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| python/pysail/tests/spark/function/features/try_to_timestamp.feature | New BDD scenarios for safe timestamp parsing semantics. |
| python/pysail/tests/spark/function/features/to_timestamp.feature | New BDD scenarios for strict/ANSI-aware parsing + timezone/ntz behavior. |
| crates/sail-plan/src/resolver/expression/cast.rs | Passes ANSI mode into string→timestamp casting via SparkTimestamp. |
| crates/sail-plan/src/function/scalar/datetime.rs | Unifies planner path for to_timestamp/try_to_timestamp and uses SparkTimestamp for parsing. |
| crates/sail-plan/src/function/scalar/conversion.rs | Threads ANSI mode into cast_to_timestamp UDF planning. |
| crates/sail-plan/src/function/scalar/conditional.rs | Updates coalesce temporal coercion to use ANSI-strict SparkTimestamp. |
| crates/sail-function/src/scalar/datetime/spark_try_to_timestamp.rs | Removes the legacy UDF implementation. |
| crates/sail-function/src/scalar/datetime/spark_timestamp.rs | Extends SparkTimestamp to handle ANSI mode, try_* variants, and optional per-row formats. |
| crates/sail-function/src/scalar/datetime/mod.rs | Drops module export for removed legacy UDF. |
| crates/sail-execution/src/codec.rs | Updates physical plan codec to serialize/deserialize ansi_mode for SparkTimestamp. |
| crates/sail-execution/proto/sail/plan/physical.proto | Adds ansi_mode field to SparkTimestampUdf for cluster execution parity. |
| When query | ||
| """ | ||
| SELECT to_timestamp('2024-01-15 10:30:45') AS result | ||
| """ | ||
| Then query result |
| When query | ||
| """ | ||
| SELECT try_to_timestamp('2024-01-15 10:30:45') AS result | ||
| """ | ||
| Then query result | ||
| | result | |
There was a problem hiding this comment.
Fixed — reindented the docstring (""")
| Err(PlanError::invalid( | ||
| "try_to_timestamp requires 1 or 2 arguments", | ||
| )) | ||
| Err(PlanError::invalid("to_timestamp requires 1 or 2 arguments")) |
| let mut coerced = arg_types.to_vec(); | ||
| if let Some(format) = arg_types.get(1) { | ||
| match format { | ||
| DataType::Utf8 | DataType::LargeUtf8 | DataType::Utf8View => {} | ||
| // A NULL format yields a NULL result; coerce it to a Utf8 null. | ||
| DataType::Null => coerced[1] = DataType::Utf8, | ||
| other => { | ||
| return Err(unsupported_data_type_exec_err(self.name(), "STRING", other)); | ||
| } | ||
| } | ||
| } | ||
| Ok(coerced) | ||
| } |
There was a problem hiding this comment.
Fixed — but I went with the "reject non-string types" option rather than coercing to Utf8.
Spark 3.5.7 Test ReportCommit Information
Test Summary
Test DetailsError CountsPassed Tests Diff(empty) Failed Tests |
Spark 4.1.1 Test ReportCommit Information
Test Summary
Test DetailsError CountsPassed Tests Diff(empty) Failed Tests(truncated) |
Codecov Report❌ Patch coverage is @@ Coverage Diff @@
## main #2100 +/- ##
==========================================
- Coverage 77.17% 76.03% -1.15%
==========================================
Files 1004 1004
Lines 162436 162505 +69
==========================================
- Hits 125365 123565 -1800
- Misses 37071 38940 +1869
*This pull request uses carry forward flags. Click here to find out more.
... and 51 files with indirect coverage changes 🚀 New features to boost your workflow:
|
Ibis Test ReportCommit Information
Test Summary
Test DetailsError CountsPassed Tests Diff--- before.txt 2026-06-18 18:27:01.670252618 +0000
+++ after.txt 2026-06-18 18:27:01.866254582 +0000
@@ -502,0 +503 @@
+ibis/backends/tests/test_export.py::test_table_to_csv[pyspark]Failed Tests |
No description provided.