[SPARK-51419][SQL] Get hours of TIME datatype #50355

Open: wants to merge 10 commits into base: master

@senthh (Contributor) commented Mar 23, 2025

What changes were proposed in this pull request?

This PR adds support for extracting the hour component from TIME (TimeType) values in Spark SQL.

scala> spark.sql("SELECT hour(TIME'07:01:09.12312321231232');").show()
+----------------------------+
|hour(TIME '07:01:09.123123')|
+----------------------------+
|                           7|
+----------------------------+

scala> spark.sql("SELECT hour('2009-07-30 12:58:59')").show()
+-------------------------+
|hour(2009-07-30 12:58:59)|
+-------------------------+
|                       12|
+-------------------------+
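To make the extraction concrete, here is a minimal Scala sketch of the arithmetic. It assumes, as the test comment later in this PR suggests, that a TIME value is held as a Long count of microseconds since midnight. The object TimeHourSketch and its methods are hypothetical illustrations, not Spark's implementation.

```scala
import java.time.LocalTime
import java.util.concurrent.TimeUnit

// Hypothetical helper for illustration only; not Spark's implementation.
// Assumption: a TIME value is a Long count of microseconds since midnight.
object TimeHourSketch {
  private val MicrosPerHour: Long = TimeUnit.HOURS.toMicros(1) // 3,600,000,000
  private val MicrosPerDay: Long = TimeUnit.DAYS.toMicros(1)   // 86,400,000,000

  /** Returns the hour (0-23) of a time given as microseconds since midnight. */
  def hourOfMicros(micros: Long): Int = {
    require(micros >= 0L && micros < MicrosPerDay, s"not a valid time of day: $micros")
    // Whole hours elapsed since midnight; integer division drops minutes and seconds.
    (micros / MicrosPerHour).toInt
  }

  /** Converts a LocalTime to microseconds since midnight, dropping sub-microsecond digits. */
  def toMicros(t: LocalTime): Long = t.toNanoOfDay / 1000L
}
```

For example, TimeHourSketch.hourOfMicros(TimeHourSketch.toMicros(LocalTime.parse("07:01:09.123123"))) returns 7, matching the first query above. Integer division is the natural choice here because the hour is simply the number of whole hours that fit into the time elapsed since midnight.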

Why are the changes needed?

Previously, Spark supported hour() only for TIMESTAMP values. Because TIME support was missing, calling hour() on a TIME value led to an implicit cast attempt to TIMESTAMP, which is incorrect. This PR ensures that hour(TIME'HH:MM:SS.######') behaves correctly without unnecessary type coercion.
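The distinction above amounts to dispatching on the input type: a TIME input is read as a time of day directly, while a string such as '2009-07-30 12:58:59' is still parsed as a timestamp. The sketch below models that idea with hypothetical types (SqlValue, TimeValue, TimestampValue, StringValue); they are not Spark classes, and the way this PR actually wires hour() to TIME values may differ.

```scala
import java.time.{LocalDateTime, LocalTime}

// Hypothetical, simplified model of type-based dispatch for hour();
// these types are illustrative only and are not Spark classes.
sealed trait SqlValue
final case class TimeValue(t: LocalTime) extends SqlValue
final case class TimestampValue(ts: LocalDateTime) extends SqlValue
final case class StringValue(s: String) extends SqlValue

object HourDispatchSketch {
  def hour(v: SqlValue): Int = v match {
    // TIME: read the hour directly; no date component and no cast to a timestamp.
    case TimeValue(t) => t.getHour
    case TimestampValue(ts) => ts.getHour
    // String: parsed as a timestamp, as in hour('2009-07-30 12:58:59').
    case StringValue(s) => hour(TimestampValue(LocalDateTime.parse(s.replace(' ', 'T'))))
  }
}
```

Keeping the TIME branch separate means a TIME value never has to be paired with an artificial date just to read its hour.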

Does this PR introduce any user-facing change?

Yes

  • Before this PR, calling hour(TIME'HH:MM:SS.######') resulted in a type mismatch error or an implicit cast attempt to TIMESTAMP, which was incorrect.
  • With this PR, hour(TIME'HH:MM:SS.######') now works correctly for TIME values without implicit casting.
  • Users can now extract the hour component from TIME values natively.

How was this patch tested?

By running new tests:

$ build/sbt "test:testOnly *TimeExpressionsSuite"

Was this patch authored or co-authored using generative AI tooling?

No

github-actions bot added the SQL label Mar 23, 2025
@senthh (Contributor, Author) commented Mar 23, 2025

Hi @MaxGekk

Could you please review this PR?

@MaxGekk (Member) left a comment

Could you fix the output of the example:

scala> spark.sql("SELECT hour(TIME'07:01:09.12312321231232');").show()
+------------------------------+
|minute(TIME '07:01:09.123123')|
+------------------------------+
|                             7|
+------------------------------+

It should be hour, not minute.

@senthh (Contributor, Author) commented Mar 23, 2025


Yes, corrected.

senthh requested a review from MaxGekk March 23, 2025 11:44
@senthh (Contributor, Author) commented Mar 23, 2025

@MaxGekk, I modified it as per your review feedback. Please check whether it looks good.

@senthh (Contributor, Author) commented Mar 23, 2025

ExpressionsSchemaSuite failed with the error below:

 - Check schemas for expression examples *** FAILED *** (444 milliseconds)
[info]   "SELECT hour('20[09-07-30] 12:58:59')" did not equal "SELECT hour('20[18-02-14] 12:58:59')" SQL query did not match (ExpressionsSchemaSuite.scala:190)
[info]   Analysis:
[info]   "SELECT hour('20[09-07-30] 12:58:59')" -> "SELECT hour('20[18-02-14] 12:58:59')"

senthh requested a review from HyukjinKwon March 24, 2025 06:22

test("Hour with TIME type") {
// A few test times in microseconds since midnight:
// time in microseconds -> expected minute
A Member commented on these lines:

Suggested change:
- // time in microseconds -> expected minute
+ // time in microseconds -> expected hours
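The test comment describes pairs of "time in microseconds -> expected hours". A Spark-free sketch of such a table-driven check might look like the following. The object HourTableCheck, its stand-in hourOfMicros helper, and the case values are hypothetical and are not taken from TimeExpressionsSuite; the sketch again assumes TIME values are microseconds since midnight.

```scala
import java.time.LocalTime
import java.util.concurrent.TimeUnit

// Spark-free sketch of a table-driven check in the spirit of the
// "Hour with TIME type" test; the helper and cases are hypothetical.
object HourTableCheck {
  private val MicrosPerHour: Long = TimeUnit.HOURS.toMicros(1)

  // Hypothetical stand-in for the expression under test.
  def hourOfMicros(micros: Long): Int = (micros / MicrosPerHour).toInt

  // Converts an ISO time string to microseconds since midnight.
  private def micros(t: String): Long = LocalTime.parse(t).toNanoOfDay / 1000L

  // time in microseconds -> expected hours
  val cases: Seq[(Long, Int)] = Seq(
    micros("00:00:00") -> 0,
    micros("07:01:09.123123") -> 7,
    micros("12:00:00") -> 12,
    micros("23:59:59.999999") -> 23
  )

  /** Returns (micros, expected, actual) for every case whose computed hour is wrong. */
  def failures: Seq[(Long, Int, Int)] =
    cases.collect {
      case (m, expected) if hourOfMicros(m) != expected => (m, expected, hourOfMicros(m))
    }
}
```

Including the boundaries of the day (midnight and the last microsecond before midnight) is what makes a table like this useful: off-by-one errors in the hour computation show up there first.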

senthh requested a review from MaxGekk March 25, 2025 03:33
MaxGekk changed the title from "[SPARK-51419][SQL] Get hour of TIME datatype" to "[SPARK-51419][SQL] Get hours of TIME datatype" Mar 25, 2025
@beliefer (Contributor) left a comment

Overall, LGTM except for @MaxGekk's comments.

@MaxGekk (Member) left a comment

LGTM except for a couple of comments.

senthh requested a review from MaxGekk March 26, 2025 02:01