
Commit 5488ec0

feat: support dayname and monthname expressions
Add Comet support for the Spark 4.0+ dayname and monthname expressions by routing them through the Arrow-direct codegen dispatcher in the spark-4.0 shim, so they run Spark's own generated code for exact parity including locale handling.

Also add tests confirming that several datetime functions already execute in Comet via Spark's rewrite rules, and update the expression support doc to reflect this coverage:

- to_date / to_timestamp / to_timestamp_ntz / to_timestamp_ltz reduce to Cast (no format) or GetTimestamp (with format)
- make_timestamp_ntz / make_timestamp_ltz (6-argument form) reduce to MakeTimestamp
- current_date / current_timestamp / now / curdate are constant-folded to literals before Comet sees the plan

1 parent e79183e commit 5488ec0
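
As a rough illustration of the values involved, the sketch below approximates what `dayname` and `monthname` produce, using only `java.time`. It assumes Spark's documented behaviour of returning three-letter abbreviated names, modelled here as `TextStyle.SHORT` with `Locale.US`. The object and method names are hypothetical, and this is neither Comet's nor Spark's actual code path, which runs Spark's generated code through the dispatcher.

```scala
import java.time.LocalDate
import java.time.format.TextStyle
import java.util.Locale

// Hypothetical helper for illustration only; not part of Comet or Spark.
object DayMonthNameSketch {
  // Abbreviated day-of-week name. Locale.US is an assumption of this sketch.
  def dayName(date: LocalDate, locale: Locale = Locale.US): String =
    date.getDayOfWeek.getDisplayName(TextStyle.SHORT, locale)

  // Abbreviated month name, under the same locale assumption.
  def monthName(date: LocalDate, locale: Locale = Locale.US): String =
    date.getMonth.getDisplayName(TextStyle.SHORT, locale)

  def main(args: Array[String]): Unit = {
    // The same dates as the dayname / monthname test added in this commit.
    Seq("2024-01-15", "2024-06-30", "2020-12-31").foreach { s =>
      val d = LocalDate.parse(s)
      println(s"$s -> ${dayName(d)}, ${monthName(d)}")
    }
  }
}
```

Because the result depends on locale data, running Spark's own generated code (as the commit does) avoids having to reproduce that lookup natively.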

5 files changed

Lines changed: 110 additions & 14 deletions


docs/source/contributor-guide/spark_expressions_support.md

Lines changed: 22 additions & 11 deletions
@@ -407,10 +407,13 @@
 
 - [x] add_months
 - [x] convert_timezone
-- [ ] curdate
-- [ ] current_date
+- [x] curdate
+  - Alias of `current_date`; constant-folded to a literal by Spark's `ComputeCurrentTime` rule before Comet sees the plan.
+- [x] current_date
+  - Constant-folded to a literal by Spark's `ComputeCurrentTime` rule before Comet sees the plan.
 - [ ] current_time
-- [ ] current_timestamp
+- [x] current_timestamp
+  - Constant-folded to a literal by Spark's `ComputeCurrentTime` rule before Comet sees the plan.
 - [x] current_timezone
 - [x] date_add
 - [x] date_diff
@@ -423,7 +426,8 @@
 - [x] datediff
 - [x] datepart
 - [x] day
-- [ ] dayname
+- [x] dayname
+  - Spark 4.0+. Has no native lowering; routed through the codegen dispatcher (runs Spark's own `doGenCode`), gated by `spark.comet.exec.scalaUDF.codegen.enabled` (default true).
 - [x] dayofmonth
 - [x] dayofweek
 - [x] dayofyear
@@ -442,15 +446,18 @@
 - [ ] make_interval
 - [ ] make_time
 - [x] make_timestamp
-- [ ] make_timestamp_ltz
-- [ ] make_timestamp_ntz
+- [x] make_timestamp_ltz
+  - The 6-argument form rewrites to `MakeTimestamp` and runs via the codegen dispatcher. The 2-argument `(date, time)` form requires the Spark 4.1 TIME type and falls back.
+- [x] make_timestamp_ntz
+  - The 6-argument form rewrites to `MakeTimestamp` and runs via the codegen dispatcher. The 2-argument `(date, time)` form requires the Spark 4.1 TIME type and falls back.
 - [ ] make_ym_interval
 - [x] minute
 - [x] month
 - [ ] monthname
 - [x] months_between
 - [x] next_day
-- [ ] now
+- [x] now
+  - Alias of `current_timestamp`; constant-folded to a literal by Spark's `ComputeCurrentTime` rule before Comet sees the plan.
 - [x] quarter
 - [x] second
 - [ ] session_window
@@ -459,11 +466,15 @@
 - [x] timestamp_micros
 - [x] timestamp_millis
 - [x] timestamp_seconds
-- [ ] to_date
+- [x] to_date
+  - Rewrites to `Cast` (no format, native) or `Cast(GetTimestamp(...))` (with format, via the codegen dispatcher) before Comet sees the plan.
 - [ ] to_time
-- [ ] to_timestamp
-- [ ] to_timestamp_ltz
-- [ ] to_timestamp_ntz
+- [x] to_timestamp
+  - Rewrites to `Cast` (no format, native) or `GetTimestamp` (with format, via the codegen dispatcher) before Comet sees the plan.
+- [x] to_timestamp_ltz
+  - Rewrites to `to_timestamp` with `TimestampType`; same support as `to_timestamp`.
+- [x] to_timestamp_ntz
+  - Rewrites to `to_timestamp` with `TimestampNTZType`; same support as `to_timestamp`.
 - [x] to_unix_timestamp
 - [x] to_utc_timestamp
   - Spark 3.4.3 (audited 2026-05-12): identical to 3.5.8.
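
The "constant-folded to a literal" notes in this file mean that by the time a plan reaches Comet, nodes such as `current_date` have already been replaced by literal values, so there is nothing left for Comet to translate. The toy sketch below illustrates that idea with a hand-rolled expression tree. Every type and function in it is a hypothetical stand-in, not Spark's Catalyst classes or the real `ComputeCurrentTime` rule.

```scala
import java.time.LocalDate

// Toy model for illustration only; none of these types are Spark classes.
object ConstantFoldSketch {
  sealed trait Expr
  case object CurrentDate extends Expr
  final case class Literal(value: Any) extends Expr
  final case class Alias(child: Expr, name: String) extends Expr

  // Stand-in for an optimizer rule: replace the time-dependent node with a
  // literal computed once, so later planning stages only ever see a constant.
  def fold(e: Expr, today: LocalDate): Expr = e match {
    case CurrentDate        => Literal(today)
    case Alias(child, name) => Alias(fold(child, today), name)
    case other              => other
  }

  // Same shape of check as the test in this commit: is the projection Alias(Literal)?
  def isFolded(e: Expr): Boolean = e match {
    case Alias(_: Literal, _) => true
    case _                    => false
  }

  def main(args: Array[String]): Unit = {
    val before = Alias(CurrentDate, "r")
    val after = fold(before, LocalDate.of(2024, 1, 15))
    println(s"before folded: ${isFolded(before)}, after folded: ${isFolded(after)}")
  }
}
```

In this model the downstream consumer never needs a case for `CurrentDate`, which is why the doc can mark these functions as supported without any Comet-side implementation.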

spark/src/main/spark-4.0/org/apache/comet/shims/CometExprShim.scala

Lines changed: 8 additions & 1 deletion
@@ -31,7 +31,7 @@ import org.apache.spark.sql.types.{ArrayType, BinaryType, BooleanType, DataTypes
 import org.apache.comet.{CometConf, CometExplainInfo}
 import org.apache.comet.CometSparkSessionExtensions.withFallbackReason
 import org.apache.comet.expressions.{CometCast, CometEvalMode}
-import org.apache.comet.serde.{CommonStringExprs, Compatible, ExprOuterClass, Incompatible, SupportLevel}
+import org.apache.comet.serde.{CometScalaUDF, CommonStringExprs, Compatible, ExprOuterClass, Incompatible, SupportLevel}
 import org.apache.comet.serde.ExprOuterClass.{BinaryOutputStyle, Expr}
 import org.apache.comet.serde.QueryPlanSerde.{exprToProtoInternal, optExprWithFallbackReason, scalarFunctionExprToProto, scalarFunctionExprToProtoWithReturnType, supportedScalarSortElementType}
 
@@ -183,6 +183,13 @@ trait CometExprShim extends CommonStringExprs {
           optExprWithFallbackReason(mapSortExpr, ms, ms.child)
         }
 
+      // dayname / monthname (Spark 4.0+) have no native lowering. Route them through the
+      // Arrow-direct codegen dispatcher so they run Spark's own doGenCode for exact parity
+      // (including locale handling). Returns None and falls back cleanly when the dispatcher is
+      // disabled via spark.comet.exec.scalaUDF.codegen.enabled.
+      case _: DayName | _: MonthName =>
+        CometScalaUDF.emitJvmCodegenDispatch(expr, inputs, binding)
+
       case _ => None
     }
   }
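
The shim change relies on a common pattern: a conversion function returns an `Option`, and `None` signals that the expression should fall back to regular Spark execution. The sketch below shows that control flow in isolation. All names in it (`emitCodegenDispatch`, `convert`, `planFor`, the toy `Expr` types, and the string placeholders) are hypothetical; it does not reproduce the signature or behaviour of `CometScalaUDF.emitJvmCodegenDispatch`.

```scala
// Toy model for illustration only; not Comet's real types or API.
object DispatchSketch {
  sealed trait Expr
  case object DayName extends Expr
  case object MonthName extends Expr
  case object Unsupported extends Expr

  // Stand-in for the dispatcher: produces a result only when the feature flag
  // is on, mirroring the "returns None when disabled" note in the diff comment.
  def emitCodegenDispatch(e: Expr, codegenEnabled: Boolean): Option[String] =
    if (codegenEnabled) Some(s"jvm-codegen($e)") else None

  // Stand-in for the shim's match: route the two expressions to the
  // dispatcher and return None for everything else.
  def convert(e: Expr, codegenEnabled: Boolean): Option[String] = e match {
    case DayName | MonthName => emitCodegenDispatch(e, codegenEnabled)
    case _                   => None
  }

  // A caller treats None as "leave this expression to Spark".
  def planFor(e: Expr, codegenEnabled: Boolean): String =
    convert(e, codegenEnabled).getOrElse("fallback-to-spark")

  def main(args: Array[String]): Unit = {
    println(planFor(DayName, codegenEnabled = true))
    println(planFor(DayName, codegenEnabled = false))
    println(planFor(Unsupported, codegenEnabled = true))
  }
}
```

Returning `None` rather than throwing keeps the disabled-flag case and the unsupported-expression case on the same fallback path.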

spark/src/main/spark-4.1/org/apache/comet/shims/CometExprShim.scala

Lines changed: 8 additions & 1 deletion
@@ -32,7 +32,7 @@ import org.apache.spark.sql.types.{ArrayType, BinaryType, BooleanType, DataTypes
 import org.apache.comet.{CometConf, CometExplainInfo}
 import org.apache.comet.CometSparkSessionExtensions.withFallbackReason
 import org.apache.comet.expressions.{CometCast, CometEvalMode}
-import org.apache.comet.serde.{CommonStringExprs, Compatible, ExprOuterClass, Incompatible, SupportLevel}
+import org.apache.comet.serde.{CometScalaUDF, CommonStringExprs, Compatible, ExprOuterClass, Incompatible, SupportLevel}
 import org.apache.comet.serde.ExprOuterClass.{BinaryOutputStyle, Expr}
 import org.apache.comet.serde.QueryPlanSerde.{exprToProtoInternal, optExprWithFallbackReason, scalarFunctionExprToProto, scalarFunctionExprToProtoWithReturnType, supportedScalarSortElementType}
 
@@ -214,6 +214,13 @@ trait CometExprShim extends CommonStringExprs {
           optExprWithFallbackReason(mapSortExpr, ms, ms.child)
         }
 
+      // dayname / monthname (Spark 4.0+) have no native lowering. Route them through the
+      // Arrow-direct codegen dispatcher so they run Spark's own doGenCode for exact parity
+      // (including locale handling). Returns None and falls back cleanly when the dispatcher is
+      // disabled via spark.comet.exec.scalaUDF.codegen.enabled.
+      case _: DayName | _: MonthName =>
+        CometScalaUDF.emitJvmCodegenDispatch(expr, inputs, binding)
+
       case _ => None
     }
   }

spark/src/main/spark-4.2/org/apache/comet/shims/CometExprShim.scala

Lines changed: 8 additions & 1 deletion
@@ -32,7 +32,7 @@ import org.apache.spark.sql.types.{ArrayType, BinaryType, BooleanType, DataTypes
 import org.apache.comet.{CometConf, CometExplainInfo}
 import org.apache.comet.CometSparkSessionExtensions.withFallbackReason
 import org.apache.comet.expressions.{CometCast, CometEvalMode}
-import org.apache.comet.serde.{CommonStringExprs, Compatible, ExprOuterClass, Incompatible, SupportLevel}
+import org.apache.comet.serde.{CometScalaUDF, CommonStringExprs, Compatible, ExprOuterClass, Incompatible, SupportLevel}
 import org.apache.comet.serde.ExprOuterClass.{BinaryOutputStyle, Expr}
 import org.apache.comet.serde.QueryPlanSerde.{exprToProtoInternal, optExprWithFallbackReason, scalarFunctionExprToProto, scalarFunctionExprToProtoWithReturnType, supportedScalarSortElementType}
 
@@ -214,6 +214,13 @@ trait CometExprShim extends CommonStringExprs {
           optExprWithFallbackReason(mapSortExpr, ms, ms.child)
         }
 
+      // dayname / monthname (Spark 4.0+) have no native lowering. Route them through the
+      // Arrow-direct codegen dispatcher so they run Spark's own doGenCode for exact parity
+      // (including locale handling). Returns None and falls back cleanly when the dispatcher is
+      // disabled via spark.comet.exec.scalaUDF.codegen.enabled.
+      case _: DayName | _: MonthName =>
+        CometScalaUDF.emitJvmCodegenDispatch(expr, inputs, binding)
+
       case _ => None
     }
   }

spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala

Lines changed: 64 additions & 0 deletions
@@ -538,6 +538,70 @@ class CometExpressionSuite extends CometTestBase with AdaptiveSparkPlanHelper {
     }
   }
 
+  // The following tests cover datetime functions that are not registered under their own name in
+  // the Comet expression maps but are still handled: Spark rewrites them to supported expressions
+  // (Cast / GetTimestamp / MakeTimestamp) before Comet sees the plan, or the codegen dispatcher
+  // runs Spark's own generated code.
+
+  test("to_date / to_timestamp without format run natively") {
+    withParquetTable(
+      Seq(("2024-01-15", "2024-01-15 10:30:45"), ("2020-12-31", "2020-12-31 23:59:59")),
+      "tbl") {
+      // These rewrite to Cast(string -> date/timestamp), which Comet supports natively.
+      checkSparkAnswerAndOperator("SELECT to_date(_1) FROM tbl")
+      checkSparkAnswerAndOperator("SELECT to_timestamp(_2) FROM tbl")
+      checkSparkAnswerAndOperator("SELECT to_timestamp_ntz(_2) FROM tbl")
+      checkSparkAnswerAndOperator("SELECT to_timestamp_ltz(_2) FROM tbl")
+      checkSparkAnswerAndOperator("SELECT try_to_timestamp(_2) FROM tbl")
+    }
+  }
+
+  test("to_date / to_timestamp with format run via codegen dispatcher") {
+    withSQLConf(CometConf.COMET_SCALA_UDF_CODEGEN_ENABLED.key -> "true") {
+      withParquetTable(
+        Seq(("2024-01-15", "2024-01-15 10:30:45"), ("2020-12-31", "2020-12-31 23:59:59")),
+        "tbl") {
+        // With a format these rewrite to GetTimestamp, which is routed through the dispatcher.
+        checkSparkAnswerAndOperator("SELECT to_date(_1, 'yyyy-MM-dd') FROM tbl")
+        checkSparkAnswerAndOperator("SELECT to_timestamp(_2, 'yyyy-MM-dd HH:mm:ss') FROM tbl")
+      }
+    }
+  }
+
+  test("make_timestamp_ntz / make_timestamp_ltz run via codegen dispatcher") {
+    withSQLConf(CometConf.COMET_SCALA_UDF_CODEGEN_ENABLED.key -> "true") {
+      withParquetTable(Seq((2024, 1, 15, 10, 30, 45), (2020, 12, 31, 23, 59, 59)), "tbl") {
+        // The 6-argument forms rewrite to MakeTimestamp, which is routed through the dispatcher.
+        checkSparkAnswerAndOperator("SELECT make_timestamp_ntz(_1, _2, _3, _4, _5, _6) FROM tbl")
+        checkSparkAnswerAndOperator("SELECT make_timestamp_ltz(_1, _2, _3, _4, _5, _6) FROM tbl")
+      }
+    }
+  }
+
+  test("dayname / monthname run via codegen dispatcher") {
+    assume(isSpark40Plus)
+    withSQLConf(CometConf.COMET_SCALA_UDF_CODEGEN_ENABLED.key -> "true") {
+      withParquetTable(
+        Seq(Tuple1("2024-01-15"), Tuple1("2024-06-30"), Tuple1("2020-12-31")),
+        "tbl") {
+        checkSparkAnswerAndOperator("SELECT dayname(CAST(_1 AS DATE)) FROM tbl")
+        checkSparkAnswerAndOperator("SELECT monthname(CAST(_1 AS DATE)) FROM tbl")
+      }
+    }
+  }
+
+  test("current_date / current_timestamp / now constant-folded before Comet") {
+    Seq("current_date()", "current_timestamp()", "now()", "current_date", "current_timestamp")
+      .foreach { fn =>
+        val plan = spark.sql(s"SELECT $fn AS r").queryExecution.optimizedPlan
+        val folded = plan.expressions.exists {
+          case Alias(_: Literal, _) => true
+          case _ => false
+        }
+        assert(folded, s"expected '$fn' to be constant-folded to a Literal before Comet")
+      }
+  }
+
   test("hour on int96 timestamp column") {
     import testImplicits._
 