(Requires design discussion) handling all NaNs - #703 #776

JinghanFu · 2024-11-26T04:40:20Z

Please work through the following checklists. Delete anything that isn't relevant.

Development for new xarray-based metrics

Works with n-dimensional data and includes reduce_dims, preserve_dims, and weights args.
Typehints added
Add error handling
Imported into the API
Works with both xr.DataArrays and xr.Datasets if possible

Docstrings

Docstrings complete and follow Napoleon (google) style
Maths equation added
Reference to paper/webpage is in docstring. The preferred referencing style for journal articles is APA (7th edition)
Code example added

Testing of new xarray-based metrics

100% unit test coverage
Test that metric is compatible with dask.
Test that metrics work with inputs that contain NaNs
Test that broadcasting with xarray works
Test both reduce and preserve dims arguments work
Test that errors are raised as expected
Test that it works with both xr.DataArrays and xr.Datasets

Tutorial notebook

Short introduction to why you would use that metric and what it tells you
A link to a reference
A "things to try next" section at the end
Add notebook to Tutorial_Gallery.ipynb
Optional - a detailed discussion of how the metric works at the end of the notebook

Documentation

Add the score to the API documentation
Add the score to the included list of metrics and tools

…earsonr will return a NaN, generate a python warning

…earsonr will return a NaN, generate a python warning (after black)

…earsonr will return a NaN, generate a python warning (after black and isort)

tennlee · 2024-11-26T12:36:03Z

@nicholasloveday I think this PR is interesting from the perspective of what kinds of conditions should raise a user warning. Perhaps for all our methods, anything which is all NaN should raise a user warning? Correct setting of NaNs is intended behaviour, so doesn't indicate a program failure. On the other hand, something being entirely NaN could be an indication of a data issue or user error, and a warning could be useful. We should probably consider adopting something consistent across the codebase. Take a look at what @JinghanFu has done in this example and have a think. This work was done because @durgals mentioned it following his KGE work as being useful. @durgals you might also like to make a comment about what kinds of user warning should perhaps be raised across all our methods. Thanks @JinghanFu for putting together a concrete example for us to discuss.

JinghanFu · 2024-12-01T05:02:44Z

@tennlee Thank you very much for your guidance throughout the day!! Looking forward to joining the whole PyCon next year and contributing more!

nikeethr · 2025-03-15T00:53:25Z

Im guessing this is from a sprint of sorts. But in the interest of getting PRs through before they are stale. I have a strong opinion on how this should be handled see issue for detail.

At least one dimension MUST be reduced for ANY operation that requires accumulation of a computation along a dimension (such as a sum) for a successful computation. Dividing by variance is one common example of this. Which is common to quite a few scores actually. MSE for example doesn’t require this - so it’s fine to not raise anything, although it’s still silly and I actually wouldn’t mind if it threw an exception for attempting use dask to compute an identity function in an overly contrived way.

This should be an EARLY check on preserve and reduce dimensions rather than a post score check, because I already know the result without doing any scoring - it’s nan…

We don’t want to mix up failure states where the user input is invalid for a score vs a score returning an invalid result. The latter could be a warning but the former MUST be an exception because that fundamentally invalid input for the score.

having said that even if the score were to return all NaNs given a valid input id still be expect an error because it seems to me something fundamentally unhandled within the computation. The only exception to this is to raise a warning for partially computed results if they are still valid and there’s no trivial way of avoiding it (such as divide by 0 on some elements but not others)

#815 currently handles it the way I’d like it to. I also put some comments in #703

nikeethr · 2025-03-15T01:00:59Z

Also minor comments though I understand this may have been done as a proof of concept:

even though “never” is strong and I typically don’t like to use it - in this case it’s warranted - never assign a default value to an empty (or any) mutable structure. The original implementation with None was correct.
try not to use math when we already have access to numpy/xarray’s port of numpys ufuncs
I’m guessing the gobals are just for PoC we just need to handle them appropriately within internal scope.

nikeethr · 2025-03-24T02:26:31Z

This implementation needs to be reconsidered - I'm leaving it open but converting to draft.

jifu2274 added 6 commits November 26, 2024 14:35

all of the 'chainging_func_kwargs' are changed to 'chaining_func_kwargs'

57ae3ed

Jupyter notebook rerun everything to reflect the changes on method name

a315c11

flag enable or disable user warning is created + if is enable, when p…

4c16325

…earsonr will return a NaN, generate a python warning

flag enable or disable user warning is created + if is enable, when p…

789f441

…earsonr will return a NaN, generate a python warning

flag enable or disable user warning is created + if is enable, when p…

51e6829

…earsonr will return a NaN, generate a python warning (after black)

flag enable or disable user warning is created + if is enable, when p…

9e28354

…earsonr will return a NaN, generate a python warning (after black and isort)

tennlee added this to the Wishlist milestone Dec 5, 2024

nikeethr marked this pull request as draft March 24, 2025 02:25

nikeethr changed the title ~~703 warn user when pearsonr returns nan + a flag enable or disable the warning message (default is disable)~~ (Requires design discussion) handling all NaNs - #703 Mar 24, 2025

nikeethr linked an issue Mar 24, 2025 that may be closed by this pull request

Review docstring and possibly raise a userwarning for pearsonr based on preserve_dims=all #703

Open

nikeethr mentioned this pull request May 15, 2025

Add a UserWarning (or maybe add an exception) for negative weights #829

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

(Requires design discussion) handling all NaNs - #703 #776

(Requires design discussion) handling all NaNs - #703 #776

JinghanFu commented Nov 26, 2024

Uh oh!

tennlee commented Nov 26, 2024

Uh oh!

JinghanFu commented Dec 1, 2024

Uh oh!

nikeethr commented Mar 15, 2025 •

edited

Loading

Uh oh!

nikeethr commented Mar 15, 2025 •

edited

Loading

Uh oh!

nikeethr commented Mar 24, 2025

Uh oh!

Uh oh!

(Requires design discussion) handling all NaNs - #703 #776

Are you sure you want to change the base?

(Requires design discussion) handling all NaNs - #703 #776

Conversation

JinghanFu commented Nov 26, 2024

Development for new xarray-based metrics

Docstrings

Testing of new xarray-based metrics

Tutorial notebook

Documentation

Uh oh!

tennlee commented Nov 26, 2024

Uh oh!

JinghanFu commented Dec 1, 2024

Uh oh!

nikeethr commented Mar 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nikeethr commented Mar 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nikeethr commented Mar 24, 2025

Uh oh!

Uh oh!

nikeethr commented Mar 15, 2025 •

edited

Loading

nikeethr commented Mar 15, 2025 •

edited

Loading