
Conversation

@lucas-nelson-uiuc (Contributor) commented on May 10, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

  • Related issue #<issue number>
  • Closes #<issue number>

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

Implementation for `expr_dt.to_string` - disclaimer: probably glossing over some dt/tz details, feel free to call me out
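For context, here's roughly what this enables on the PySpark backend (a sketch only - it assumes a narwhals-compatible DataFrame `df` with a datetime column `ts`):

```python
import narwhals as nw

# Format a datetime column as strings; on the PySpark backend the strptime
# format is translated via strptime_to_pyspark_format below.
df = df.with_columns(ts_str=nw.col("ts").dt.to_string("%Y-%m-%d %H:%M:%S"))
```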

Makes use of the convention mapping from `strptime_to_pyspark_format` - can move this to `_spark_like.utils` for `expr_dt` and `expr_str` to import from:

```python
def strptime_to_pyspark_format(format: str) -> str:
    """Converts a Python strptime datetime format string to a PySpark datetime format string."""
    # Mapping from Python strptime format to PySpark format
    format_mapping = {
        "%Y": "y",  # Year with century
        "%y": "y",  # Year without century
        "%m": "M",  # Month
        "%d": "d",  # Day of the month
        "%H": "H",  # Hour (24-hour clock) 0-23
        "%I": "h",  # Hour (12-hour clock) 1-12
        "%M": "m",  # Minute
        "%S": "s",  # Second
        "%f": "S",  # Microseconds -> Milliseconds
        "%p": "a",  # AM/PM
        "%a": "E",  # Abbreviated weekday name
        "%A": "E",  # Full weekday name
        "%j": "D",  # Day of the year
        "%z": "Z",  # Timezone offset
        "%s": "X",  # Unix timestamp
    }

    # Replace each strptime directive with its PySpark counterpart, then swap
    # out the literal "T" separator (see the notes below)
    pyspark_format = format
    for py_format, spark_format in format_mapping.items():
        pyspark_format = pyspark_format.replace(py_format, spark_format)
    return pyspark_format.replace("T", " ")
```

a couple of notes:

  • some pyspark formats aren't zero-padded - e.g. the format `%Y-%m-%d` should evaluate to `yyyy-MM-dd` in pyspark, but this returns `y-M-d`, which causes some minor differences (instead of `2020-01-09` we get `2020-1-9`). wondering if we should update the pyspark mappings to their zero-padded equivalents? (rough sketch below these notes)

  • not all tests pass, for a couple of reasons:
    • week-based patterns: any pattern containing `W` fails with `IllegalArgumentException: All week-based patterns are unsupported since Spark 3.0, detected: W, Please use the SQL function EXTRACT instead`
    • timestamp handling: two things going on here:
      • similar to the above, `%f` shortchanges the fractional seconds by only returning one decimal place; could default to a higher number of places
      • `strptime_to_pyspark_format` removes all instances of the literal `T` separator from timestamps
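
For illustration, a minimal sketch of what the zero-padded mapping floated above could look like - hypothetical, not what this PR currently ships; it leans on Spark's datetime patterns, where repeating a letter zero-pads the field:

```python
# Hypothetical zero-padded variant of format_mapping (a suggestion only).
# In Spark datetime patterns, repeating a letter pads the field:
# "yyyy" is a 4-digit year, "MM" a 2-digit month, and so on.
zero_padded_mapping = {
    "%Y": "yyyy",    # Year with century, 4 digits
    "%y": "yy",      # Year without century, 2 digits
    "%m": "MM",      # Month, zero-padded (01-12)
    "%d": "dd",      # Day of the month, zero-padded (01-31)
    "%H": "HH",      # Hour (24-hour clock), zero-padded (00-23)
    "%I": "hh",      # Hour (12-hour clock), zero-padded (01-12)
    "%M": "mm",      # Minute, zero-padded (00-59)
    "%S": "ss",      # Second, zero-padded (00-59)
    "%f": "SSSSSS",  # Fraction of second to six places (addresses the %f note)
    "%p": "a",       # AM/PM
    "%j": "DDD",     # Day of the year, zero-padded (001-366)
}


def to_padded_spark_format(format: str) -> str:
    """Convert a strptime format string using the zero-padded mapping."""
    for py_format, spark_format in zero_padded_mapping.items():
        format = format.replace(py_format, spark_format)
    return format


assert to_padded_spark_format("%Y-%m-%d") == "yyyy-MM-dd"
# date_format(col, "yyyy-MM-dd") then renders 2020-01-09 rather than 2020-1-9
```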

Let me know if there's anything else I'm missing that I can implement.

@FBruzzesi added the spark-like and enhancement labels on May 10, 2025
@FBruzzesi (Member)

Hey @lucas-nelson-uiuc, thanks for the contribution.

There is an open PR (#1842) to add the same feature - we couldn't finish all the details in there, but would love some help and/or feedback to push it forward.

@lucas-nelson-uiuc (Contributor, Author)

ah, totally missed that - thanks @FBruzzesi, I'll check it out

@FBruzzesi (Member)

@lucas-nelson-uiuc thanks again for the effort here! I am going to close this one as I am trying to finalize #1842.
Feel free to comment/review there if needed 🙌🏼

@FBruzzesi closed this on May 24, 2025