
Conversation

@chengzhuzhang
Collaborator

Description

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@github-actions github-actions bot added the type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors. label May 20, 2025
@codecov

codecov bot commented May 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (14f876b) to head (4dc841a).

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #761   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           16        16           
  Lines         1702      1702           
=========================================
  Hits          1702      1702           


@chengzhuzhang
Collaborator Author

@tomvothecoder this fix works okay for the e3sm_diags PR E3SM-Project/e3sm_diags#982, though I don't know if there are other tests needed. Please review, thanks.

@tomvothecoder tomvothecoder requested a review from Copilot May 20, 2025 19:42
@tomvothecoder tomvothecoder moved this from Todo to In Review in xCDAT Development May 20, 2025

Copilot AI left a comment


Pull Request Overview

This PR fixes a ZeroDivisionError when computing hourly intervals by using total_seconds() instead of the seconds attribute on timedelta objects.

  • Uses diff.total_seconds() for accurate hour calculation
  • Maintains downstream daily sub-frequency logic
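The core of the fix can be illustrated with a minimal sketch (hypothetical values; the real code lives in xcdat/bounds.py):

```python
from datetime import timedelta

# For a one-day delta, the `.seconds` attribute holds only the seconds
# *within* the day component, while `.total_seconds()` covers the whole
# duration -- which is what the hour calculation actually needs.
diff = timedelta(days=1)
print(diff.seconds)          # 0 -> hrs == 0, so 24 / hrs raised ZeroDivisionError
print(diff.total_seconds())  # 86400.0 -> hrs == 24.0, so int(24 / hrs) == 1
```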
Comments suppressed due to low confidence (1)

xcdat/bounds.py:652

  • Add unit tests for edge cases where diff is zero, less than one hour, exactly 24 hours, and greater than 24 hours to verify daily_subfreq is computed correctly and no division-by-zero or invalid frequency occurs.
daily_subfreq = int(24 / hrs)  # type: ignore

  diff = pd.to_timedelta(diff)

- hrs = diff.seconds / 3600
+ hrs = diff.total_seconds() / 3600

Copilot AI May 20, 2025


If diff exceeds 24 hours, hrs will be >=24 and daily_subfreq = int(24 / hrs) becomes zero, leading to downstream errors. Consider adding a guard to ensure hrs is within (0, 24] or handle diff > 24h explicitly before computing daily_subfreq.

Suggested change

  hrs = diff.total_seconds() / 3600
+ if not (0 < hrs <= 24):
+     raise ValueError(
+         f"Invalid time difference: {hrs} hours. Expected a value in the range (0, 24]."
+     )
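A sketch of how the suggested guard would behave (`hours_from_diff` is a hypothetical helper name, not part of xcdat):

```python
from datetime import timedelta

def hours_from_diff(diff: timedelta) -> float:
    # Mirrors the suggested guard: reject deltas outside (0, 24] hours
    # before they reach `int(24 / hrs)` and truncate to zero.
    hrs = diff.total_seconds() / 3600
    if not (0 < hrs <= 24):
        raise ValueError(
            f"Invalid time difference: {hrs} hours. "
            "Expected a value in the range (0, 24]."
        )
    return hrs

print(hours_from_diff(timedelta(hours=3)))  # 3.0 -> daily_subfreq == 8
# hours_from_diff(timedelta(days=2))        # raises ValueError (48.0 hours)
```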

@pochedls
Collaborator

@chengzhuzhang – is there a compact explanation for why .total_seconds() works and .seconds does not? It seems like .seconds gives you the seconds into a single day (i.e., <86400), whereas .total_seconds() can go beyond a day (see the caution statement here).

I can see how .seconds could create problems (and we probably should have been using .total_seconds all along), but this seems to arise when xcdat is processing bounds for temporal resolution that is finer than daily resolution (e.g., hourly or 3-hourly). Shouldn't .seconds and .total_seconds give the same result in this case?

It might be helpful to add a little bit of context if we ever revisit this section of code.

@chengzhuzhang
Collaborator Author

chengzhuzhang commented May 20, 2025

@pochedls yes, I should have written a better description... I'm just copying over the caution statement you linked:

It is a somewhat common bug for code to unintentionally use this attribute when it is actually intended to get a total_seconds() value instead:

>>> from datetime import timedelta
>>> duration = timedelta(seconds=11235813)
>>> duration.days, duration.seconds
(130, 3813)
>>> duration.total_seconds()
11235813.0

In my case, I'm processing daily data. For example, I have a timedelta object like the following:

>>> diff
datetime.timedelta(days=1)
>>> diff.seconds    # hrs = diff.seconds / 3600 == 0 here, so 24 / hrs raised the ZeroDivisionError
0
>>> diff.total_seconds()
86400.0

@pochedls
Collaborator

I realize it is daily data, so I would have thought the inferred frequency is daily and it would have skipped the problematic lines of code altogether.

@chengzhuzhang
Collaborator Author

I realize it is daily data, so I would have thought the inferred frequency is daily and it would have skipped the problematic lines of code altogether.

Interesting! I only looked at the lines of code that caused trouble and didn't go through the full logic. Yes, it is interesting that this was caught by the freq = "hour" condition.

@tomvothecoder
Collaborator

I realize it is daily data, so I would have thought the inferred frequency is daily and it would have skipped the problematic lines of code altogether.

Interesting! I only looked at the lines of code that caused trouble and didn't go through the full logic. Yes, it is interesting that this was caught by the freq = "hour" condition.

xCDAT's add_missing_bounds() will attempt to add time bounds by inferring the frequency. In this case, it looks like the inferred frequency is hour, but we actually need day.

The call stack looks like:

  1. add_missing_bounds() call to self._create_time_bounds():
  2. freq is inferred with _infer_freq()
  3. _infer_freq() implementation:
    • xcdat/xcdat/temporal.py

      Lines 2080 to 2111 in 14f876b

      def _infer_freq(time_coords: xr.DataArray) -> Frequency:
          """Infers the time frequency from the coordinates.

          This method infers the time frequency from the coordinates by
          calculating the minimum delta and comparing it against a set of
          conditionals.

          The native ``xr.infer_freq()`` method does not work for all cases
          because the frequency can be irregular (e.g., different hour
          measurements), which ends up returning None.

          Parameters
          ----------
          time_coords : xr.DataArray
              A DataArray for the time dimension coordinate variable.

          Returns
          -------
          Frequency
              The time frequency.
          """
          # TODO: Raise exception if the frequency cannot be inferred.
          min_delta = pd.to_timedelta(np.diff(time_coords).min(), unit="ns")

          if min_delta < pd.Timedelta(days=1):
              return "hour"
          elif min_delta >= pd.Timedelta(days=1) and min_delta < pd.Timedelta(days=21):
              return "day"
          elif min_delta >= pd.Timedelta(days=21) and min_delta < pd.Timedelta(days=300):
              return "month"
          else:
              return "year"

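The misclassification can be reproduced with a minimal sketch (hypothetical timestamps mirroring the MVCE, using plain pandas timestamps rather than cftime):

```python
import numpy as np
import pandas as pd

# One 23:45 gap in otherwise-daily data drags the *minimum* delta under one
# day, so the threshold logic in _infer_freq() returns "hour" instead of "day".
times = pd.to_datetime(
    ["2002-02-27 12:00", "2002-02-28 12:15", "2002-03-01 12:00"]
)
min_delta = pd.to_timedelta(np.diff(times).min())
print(min_delta)                         # 0 days 23:45:00
print(min_delta < pd.Timedelta(days=1))  # True -> classified as "hour"
```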
Is there a bug in _infer_freq()? @pochedls @chengzhuzhang

@chengzhuzhang
Collaborator Author

chengzhuzhang commented May 21, 2025

Yep, probably the inference mechanism (bullet #3) is not very reliable. The following is based on data from the MVCE.

>>> time_coords
<xarray.DataArray 'time' (time: 7305)> Size: 58kB
array([cftime.DatetimeGregorian(2001, 1, 1, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2001, 1, 2, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2001, 1, 3, 12, 0, 0, 0, has_year_zero=False),
       ...,
       cftime.DatetimeGregorian(2020, 12, 29, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2020, 12, 30, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2020, 12, 31, 12, 0, 0, 0, has_year_zero=False)],
      shape=(7305,), dtype=object)
Coordinates:
  * time     (time) object 58kB 2001-01-01 12:00:00 ... 2020-12-31 12:00:00
Attributes:
    axis:           T
    cell_methods:   time: mean
    standard_name:  time
>>> min_delta = pd.to_timedelta(np.diff(time_coords).min(), unit="ns")
>>> min_delta
Timedelta('0 days 23:45:00') 

Changing min_delta = pd.to_timedelta(np.diff(time_coords).min(), unit="ns") to min_delta = pd.to_timedelta(np.diff(time_coords).max(), unit="ns") would classify this dataset as day.
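A quick sketch of that min-vs-max difference on the same hypothetical timestamps as the MVCE:

```python
import numpy as np
import pandas as pd

# With one short 23:45 gap in otherwise-daily data, the minimum delta lands
# under one day ("hour") while the maximum lands at or above it ("day").
times = pd.to_datetime(
    ["2002-02-27 12:00", "2002-02-28 12:15", "2002-03-01 12:00"]
)
deltas = np.diff(times)
print(pd.to_timedelta(deltas.min()))  # 0 days 23:45:00 -> "hour"
print(pd.to_timedelta(deltas.max()))  # 1 days 00:15:00 -> "day"
```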

@chengzhuzhang
Collaborator Author

Having a closer look at the data, I think it might be a data problem. The minimum time difference shorter than one day is actually caused by odd time values around Feb of a leap year:

>>> time_coords[424].values
array(cftime.DatetimeGregorian(2002, 3, 1, 12, 0, 0, 0, has_year_zero=False),
      dtype=object)
>>> time_coords[423].values
array(cftime.DatetimeGregorian(2002, 2, 28, 12, 15, 0, 0, has_year_zero=False),
      dtype=object)
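The 23:45 gap follows directly from those two timestamps (plain datetime stand-ins for time_coords[423] and time_coords[424], instead of cftime objects):

```python
from datetime import datetime

# The odd 12:15 timestamp makes the next step only 23 hours 45 minutes long,
# which is the min_delta that _infer_freq() saw.
t_prev = datetime(2002, 2, 28, 12, 15)
t_next = datetime(2002, 3, 1, 12, 0)
print(t_next - t_prev)  # 23:45:00
```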


Labels

type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Bug]: Error:ZeroDivisionError: float division by zero when adding missing bounds for daily frequency data

4 participants