-
Notifications
You must be signed in to change notification settings - Fork 40
Propose design doc and roadmap for VirtualiZarr 2.0 #568
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
@kylebarron @chuckwondo @d-v-b I wasn't able to request reviews from you all but it'd be great to get your thoughts on this if you have time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of my suggestions are punctuation and syntax fixes, but I do have a couple of more significant comments in here.
Co-authored-by: Chuck Daniels <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking the initiative here @maxrjones ! It's clear from this that there was far too much intention just scattered around various issues - it's nice to discuss it in one place.
Co-authored-by: Tom Nicholas <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #568 +/- ##
===========================================
+ Coverage 88.80% 89.34% +0.53%
===========================================
Files 34 34
Lines 1948 1943 -5
===========================================
+ Hits 1730 1736 +6
+ Misses 218 207 -11 🚀 New features to boost your workflow:
|
Thanks for putting this together @maxrjones it is super helpful to have these conversations synthesized into this design doc, especially with the associated issues linked. Take my comments with a grain of salt as I haven't been involved directly in these discussions. |
Co-authored-by: Aimee Barciauskas <[email protected]>
Co-authored-by: Tom Nicholas <[email protected]>
Not sure which Phase in the roadmap it fits into, but having https support for virtual stores would be really clutch for a lot of usecases. |
Co-authored-by: Raphael Hagen <[email protected]>
join: "JoinOptions" = "outer", | ||
attrs_file: str | os.PathLike | None = None, | ||
combine_attrs: "CombineAttrsOptions" = "override", | ||
**kwargs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest considering removing **kwargs
from any top-level function because it's often unclear where they go. Instead, if those kwargs are intended for the object reader, they should be passed into the reader before the instance is passed into this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It provides these arguments to open_virtual_dataset
:
drop_variables: Iterable[str] | None = None,
loadable_variables: Iterable[str] | None = None,
decode_times: bool | None = None,
cftime_variables: Iterable[str] | None = None,
indexes: Mapping[str, Index] | None = None,
Since I think the motivation was to follow the design of https://docs.xarray.dev/en/stable/generated/xarray.open_mfdataset.html closely, I doubt we'll want to change this even though I also dislike pass-through **kwargs
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you know it's always going to be those 5 arguments, why not re-declare them here so you get accurate type hinting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I defer to Tom on this to decide the preferred balance of mirroring Xarray vs accurate type hinting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We decided to match xarray's **kwargs
syntax in today's coordination call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean we are having **kwargs
in our signature or we're going to hoist the specific argument parameters that xarray supports into our signature so that we don't have to accept **kwargs
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean we are having **kwargs in our signature
Yes, following the design pattern established in https://docs.xarray.dev/en/stable/generated/xarray.open_mfdataset.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, we just want the signature of open_virtual_X
to match the signature of open_X
as closely as possible. Even though that signature includes untyped **kwargs
(for open_mfdataset
). We're choosing for now to value consistency with xarray over strong typing.
There's a lot of interconnected technical design issues open. This design doc should help us discuss solutions that impact multiple issues.
The key differences to the current design is using protocols more than ABCs, defining a "Reader" as a protocol required by VirtualiZarr that can be used by backends and the ManifestStore, and orienting backends around ManifestStore creation rather than Dataset creation.
Just a starting point building off ideas shared by @kylebarron and @sharkinsspatial.