(Requires design discussion) handling all NaNs - #703 #776
…earsonr will return a NaN, generate a python warning
…earsonr will return a NaN, generate a python warning
…earsonr will return a NaN, generate a python warning (after black)
…earsonr will return a NaN, generate a python warning (after black and isort)
@nicholasloveday I think this PR is interesting from the perspective of what kinds of conditions should raise a user warning. Perhaps for all our methods, anything which is all NaN should raise a user warning? Correct setting of NaNs is intended behaviour, so doesn't indicate a program failure. On the other hand, something being entirely NaN could be an indication of a data issue or user error, and a warning could be useful. We should probably consider adopting something consistent across the codebase. Take a look at what @JinghanFu has done in this example and have a think. This work was done because @durgals mentioned it following his KGE work as being useful. @durgals you might also like to make a comment about what kinds of user warning should perhaps be raised across all our methods. Thanks @JinghanFu for putting together a concrete example for us to discuss. |
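To make the discussion concrete, here is a minimal sketch of the "warn when everything is NaN" idea. It is not the code from this PR: `warn_if_all_nan` is a hypothetical helper, and a flat list of floats stands in for an xarray result so the example runs with the standard library alone.

```python
import math
import warnings


def warn_if_all_nan(values, score_name):
    """Emit a UserWarning when every element of `values` is NaN.

    Hypothetical helper for illustration only. `values` is a flat iterable
    of floats standing in for a score result; it is returned unchanged
    (as a list) so the check can wrap a return value.
    """
    values = list(values)
    # An empty result is not treated as "all NaN" in this sketch.
    if values and all(math.isnan(v) for v in values):
        warnings.warn(
            f"{score_name} returned only NaN values; check the input data.",
            UserWarning,
            stacklevel=2,
        )
    return values
```

A partially NaN result passes through silently here, which matches the view that correctly placed NaNs are intended behaviour; only the all-NaN case is flagged.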
@tennlee Thank you very much for your guidance throughout the day!! Looking forward to joining the whole PyCon next year and contributing more! |
I'm guessing this is from a sprint of sorts, but in the interest of getting PRs through before they go stale: I have a strong opinion on how this should be handled; see the issue for detail.
At least one dimension MUST be reduced for ANY operation that requires accumulating a computation along a dimension (such as a sum) for the computation to succeed. Dividing by variance is one common example of this, and it is common to quite a few scores. MSE, for example, doesn't require this, so it's fine not to raise anything there, although it's still silly, and I wouldn't mind if it threw an exception for attempting to use dask to compute an identity function in an overly contrived way.
This should be an EARLY check on the preserve and reduce dimensions rather than a post-score check, because I already know the result without doing any scoring: it's NaN…
We don't want to mix up failure states where the user input is invalid for a score with those where a score returns an invalid result. The latter could be a warning, but the former MUST be an exception, because that is fundamentally invalid input for the score. Having said that, even if the score were to return all NaNs given a valid input, I'd still expect an error, because it seems to me that something is fundamentally unhandled within the computation. The only exception to this is to raise a warning for partially computed results if they are still valid and there's no trivial way of avoiding it (such as divide by 0 on some elements but not others).
#815 currently handles it the way I'd like it to. I also put some comments in #703. |
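A rough sketch of the early check described above, for discussion only. The helper name `check_reduces_a_dimension` is hypothetical, dimension names are passed as plain lists of strings, and the default of reducing every dimension when neither argument is given is an assumption made for this example; it is not taken from #815.

```python
def check_reduces_a_dimension(all_dims, reduce_dims=None, preserve_dims=None):
    """Return the dimensions that will be reduced, or raise if there are none.

    Hypothetical helper: runs before any scoring, so invalid input fails
    with an exception instead of producing an all-NaN result later.
    Arguments are lists of dimension names.
    """
    all_dims = list(all_dims)
    if reduce_dims is not None and preserve_dims is not None:
        raise ValueError("Specify only one of reduce_dims and preserve_dims.")

    if preserve_dims is not None:
        kept = set(preserve_dims)
        to_reduce = [d for d in all_dims if d not in kept]
    elif reduce_dims is not None:
        wanted = set(reduce_dims)
        to_reduce = [d for d in all_dims if d in wanted]
    else:
        # Assumption for this sketch: no arguments means reduce everything.
        to_reduce = all_dims

    if not to_reduce:
        raise ValueError(
            "At least one dimension must be reduced for this score; "
            "preserving every dimension leaves nothing to accumulate over."
        )
    return to_reduce
```

A score that divides by a variance would call this first, so preserving every dimension raises a `ValueError` up front, and warnings stay reserved for partially valid results.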
Also minor comments though I understand this may have been done as a proof of concept:
|
This implementation needs to be reconsidered - I'm leaving it open but converting to draft. |
Please work through the following checklists. Delete anything that isn't relevant.
Development for new xarray-based metrics
reduce_dims, preserve_dims, and weights args.
xr.DataArrays and xr.Datasets if possible
Docstrings
Testing of new xarray-based metrics
xr.DataArrays and xr.Datasets
Tutorial notebook
Documentation