Add Index.validate_dataarray_coord #10137
base: main
... when check_default_indexes=False.
Some quick thoughts
Functions with a leading underscore are marked by pyright as unused if they are not used from within the module in which they are defined. Also remove unneeded nested import.
Move check_dataarray_coords to the xarray.core.coordinates module and rename it to validate_dataarray_coords (a name consistent with Index.validate_dataarray_coord). Move CoordinateValidationError from xarray.core.indexes to the xarray.core.coordinates module.
Thanks @shoyer for taking a look! This is now ready for (another round of) review. Compared to #10116, this PR still represents a major data model change for DataArray (in the sense that it allows overriding strict enforcement of the DataArray model in certain cases), but I think that the risk is mitigated since here it is done explicitly via

@dcherian regarding your #10116 (comment): in the example notebook, the DataArray reprs shown without any list of dimensions in the Coordinates section don't bother me much. That said, the
):
"""Validate an index coordinate variable to include in a DataArray.

This method is called repeatedly for each coordinate associated with
- This method is called repeatedly for each coordinate associated with
+ This method is called repeatedly for each Variable associated with
When a DataArray is constructed from a Dataset (variable access), if the
validation fails Xarray will fall back to propagating the coordinate
according to the default rules for DataArray (i.e., depending on its
dimensions), which may drop this index.
- When a DataArray is constructed from a Dataset (variable access), if the
- validation fails Xarray will fall back to propagating the coordinate
- according to the default rules for DataArray (i.e., depending on its
- dimensions), which may drop this index.
+ If this method raises when a DataArray is constructed from a Dataset,
+ Xarray will fall back to propagating the coordinate
+ according to the default rules for DataArray --- i.e., the dimensions of every
+ coordinate variable must be a subset of DataArray.dims --- which may drop this index.
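A minimal, self-contained sketch of the fallback described in this docstring, using plain-Python stand-ins rather than Xarray itself: `Var` replaces `xarray.Variable`, `BoundsIndex` is a hypothetical index that also accepts coordinates carrying an extra `"bnd"` dimension, and `propagate_coords` is a hypothetical helper, not Xarray's actual code path. It only models the rule: ask the index first, and if it raises, apply the default "dims must be a subset of DataArray.dims" rule.

```python
from collections.abc import Hashable
from dataclasses import dataclass


class CoordinateValidationError(ValueError):
    """Stand-in for the error an index raises when it rejects a coordinate."""


@dataclass
class Var:
    # Hypothetical stand-in for xarray.Variable: only dimension names matter here.
    dims: tuple[Hashable, ...]


def default_rule_ok(var: Var, dims: set[Hashable]) -> bool:
    # Default DataArray rule: a coordinate's dims must be a subset of DataArray.dims.
    return set(var.dims) <= dims


class BoundsIndex:
    """Hypothetical index that also accepts coordinates with an extra 'bnd' dimension."""

    def validate_dataarray_coord(self, name: Hashable, var: Var, dims: set[Hashable]) -> None:
        extra = set(var.dims) - dims - {"bnd"}
        if extra:
            raise CoordinateValidationError(
                f"coordinate {name!r} has dimension(s) {sorted(map(str, extra))} "
                "that are not dimensions of the DataArray"
            )


def propagate_coords(
    coords: dict[Hashable, Var],
    indexes: dict[Hashable, BoundsIndex],
    dims: set[Hashable],
) -> dict[Hashable, Var]:
    """Hypothetical helper: pick the coordinates kept when extracting a DataArray."""
    kept = {}
    for name, var in coords.items():
        index = indexes.get(name)
        if index is not None:
            try:
                index.validate_dataarray_coord(name, var, dims)
                kept[name] = var
                continue
            except CoordinateValidationError:
                pass  # the index rejected it: fall back to the default rule below
        # Default rule (also used as the fallback); this may drop the coordinate.
        if default_rule_ok(var, dims):
            kept[name] = var
    return kept


coords = {
    "x": Var(("x",)),
    "x_bounds": Var(("x", "bnd")),  # accepted by the index despite the extra dim
    "y_bounds": Var(("y", "bnd")),  # no index: the default rule drops it
}
bounds_index = BoundsIndex()
kept = propagate_coords(coords, {"x": bounds_index, "x_bounds": bounds_index}, {"x"})
print(sorted(kept))  # ['x', 'x_bounds']
```

With the default rule alone, `x_bounds` would also be dropped, since `bnd` is not one of the DataArray's dimensions; delegating the check to the index is what lets it survive.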
Nice!
Yeah, I think it could be improved, but it's certainly not a blocker here. IMO we can merge. I'm not a huge fan of the method name
whats-new.rst
api.rst
Same goal as #10116, using the alternative approach suggested in #10116 (comment), where the propagation (validation) of coordinates in a DataArray is delegated to their index (if any).
I find this approach cleaner than #10116. Here is the same notebook example adapted for this PR.
Index API alternatives
1. Index.validate_dataarray_coord(self, name: Hashable, var: Variable, dims: set[Hashable]) -> None
2. Index.validate_dataarray_coords(self, variables: dict[Hashable, Variable], dims: set[Hashable]) -> None (using Indexes.group_by_index, while in option 1 we simply need to iterate over _variables, _coords and/or _indexes).
3. Index.validate_dataarray_coords(self, variables: dict[Hashable, Variable], dims: set[Hashable]) -> Coordinates
Option 1, the simplest and working well with IntervalIndex, is currently implemented in this PR.
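To illustrate why option 1 is the simplest to wire in, here is a sketch contrasting the call patterns of options 1 and 2. It assumes a hypothetical `RecordingIndex` stand-in (coordinates are plain tuples of dimension names instead of `Variable` objects) and a hypothetical `group_by_index` helper written in the spirit of `Indexes.group_by_index` rather than calling it. Option 1 needs only a loop over the coordinates; option 2 first has to group the coordinates that share one index object.

```python
from collections.abc import Hashable


class RecordingIndex:
    """Hypothetical stand-in for xarray.Index that records how it is called."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    # Option 1: called once per coordinate.
    def validate_dataarray_coord(self, name: Hashable, var_dims: tuple, dims: set) -> None:
        self.calls.append(f"coord:{name}")

    # Option 2: called once per index, with all of its coordinates at once.
    def validate_dataarray_coords(self, variables: dict, dims: set) -> None:
        self.calls.append("coords:" + ",".join(sorted(map(str, variables))))


def validate_option1(coords: dict, indexes: dict, dims: set) -> None:
    # Plain iteration over coordinates; no grouping step needed.
    for name, var_dims in coords.items():
        index = indexes.get(name)
        if index is not None:
            index.validate_dataarray_coord(name, var_dims, dims)


def group_by_index(indexes: dict) -> list:
    # Hypothetical helper: one index object may back several coordinates
    # (e.g. a multi-index), so group coordinate names by object identity.
    groups: dict[int, tuple[object, list[Hashable]]] = {}
    for name, index in indexes.items():
        groups.setdefault(id(index), (index, []))[1].append(name)
    return list(groups.values())


def validate_option2(coords: dict, indexes: dict, dims: set) -> None:
    for index, names in group_by_index(indexes):
        index.validate_dataarray_coords({n: coords[n] for n in names}, dims)


# Two coordinates backed by the same index, plus one coordinate without an index.
coords = {"x": ("x",), "x_level": ("x",), "label": ("x",)}
shared = RecordingIndex()
indexes = {"x": shared, "x_level": shared}

validate_option1(coords, indexes, {"x"})
print(shared.calls)  # ['coord:x', 'coord:x_level']

shared.calls.clear()
validate_option2(coords, indexes, {"x"})
print(shared.calls)  # ['coords:x,x_level']
```

Option 3 takes the same inputs as option 2 but returns Coordinates, so it would need the same grouping step.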
cc @shoyer @dcherian