Skip to content

Do not drop freq when constructing DatetimeIndex from pandas #18778

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: branch-25.08
Choose a base branch
from

Conversation

brandon-b-miller
Copy link
Contributor

Closes #18753

Needs test.

@brandon-b-miller brandon-b-miller added bug Something isn't working Python Affects Python cuDF API. non-breaking Non-breaking change labels May 13, 2025
Copy link

copy-pr-bot bot commented May 13, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@brandon-b-miller brandon-b-miller marked this pull request as ready for review May 14, 2025 19:20
@brandon-b-miller brandon-b-miller requested a review from a team as a code owner May 14, 2025 19:20
@@ -83,6 +83,14 @@ def test_from_pandas_rangeindex():
assert idx1.name == idx2.name


def test_from_pandas_datetimeindex_freq():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brandon-b-miller Can we also add cudf.Index, cudf.Index.from_pandas flows into this pytest? Apart from that, I think this PR is good to go.

@galipremsagar galipremsagar added the 5 - Ready to Merge Testing and reviews complete, ready to merge label May 20, 2025
@galipremsagar
Copy link
Contributor

Thanks @brandon-b-miller !

@galipremsagar
Copy link
Contributor

/merge

@brandon-b-miller
Copy link
Contributor Author

I have a failing test locally - looks like there's an existing issue with the way we read in frequencies that is being exposed by this. Looking into it

@vyasr
Copy link
Contributor

vyasr commented May 20, 2025

Looks like we have some conflicts, probably from #18751.

@brandon-b-miller
Copy link
Contributor Author

brandon-b-miller commented May 21, 2025

Argh, there's a couple tricky cases that have bubbled up through the tests that aren't simple fixes.

Firstly, pandas adjusts the frequency when slicing with a step. Theres no equivalent computation in cuDF python, instead we simply apply the original frequency which causes a mismatch.

>>> ps
2001-01-01    1
2001-01-02    2
2001-01-03    3
2001-01-04    4
2001-01-05    5
Freq: D, dtype: int64
>>> ps[::2]
2001-01-01    1
2001-01-03    3
2001-01-05    5
Freq: 2D, dtype: int64

Secondly, pandas has deprecated certain frequencies such as M in favor of others such as ME (month end) and automatically promotes them, probably to push a user towards using an exact number of days or explicitly request the end of the month

>>> pd.date_range(start="1990-01-01", periods=10, freq="M").freq
<python-input-11>:1: FutureWarning: 'M' is deprecated and will be removed in a future version, please use 'ME' instead.
<MonthEnd>

However cudf.DateOffset doesn't implement MonthEnd and if we just try and keep track of it as months, we error later on anytime we hit to_pandas as pandas tries to validate the frequency against the data itself and you'll get

Inferred frequency ME from passed values does not conform to passed frequency <DateOffset: months=1> 

Finally, [this section of the code here ] attempts to validate that there is equal spacing between datetime values when initting a datetimeindex. This wasn't being hit before, because freq wasn't carried through to this point, but now that it's being hit, it doesn't look like the logic works for months which can have len({28, 30, 31}) == 3 days.

This will likely take a little more time to fix since carrying it through has elucidated a few areas where our freq support isn't complete/working more generally.

@brandon-b-miller brandon-b-miller added 2 - In Progress Currently a work in progress and removed 5 - Ready to Merge Testing and reviews complete, ready to merge labels May 21, 2025
@brandon-b-miller
Copy link
Contributor Author

Converted this back to in progress in light of the recent findings, pushed some changes, still some failures.

@vyasr vyasr changed the base branch from branch-25.06 to branch-25.08 May 21, 2025 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - In Progress Currently a work in progress bug Something isn't working non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

Index frequency lost when using cudf.from_pandas
3 participants