Skip to content

Propose design doc and roadmap for VirtualiZarr 2.0 #568

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 33 commits into
base: develop
Choose a base branch
from

Conversation

maxrjones
Copy link
Member

There's a lot of interconnected technical design issues open. This design doc should help us discuss solutions that impact multiple issues.

The key differences to the current design is using protocols more than ABCs, defining a "Reader" as a protocol required by VirtualiZarr that can be used by backends and the ManifestStore, and orienting backends around ManifestStore creation rather than Dataset creation.

Just a starting point building off ideas shared by @kylebarron and @sharkinsspatial.

@maxrjones
Copy link
Member Author

@kylebarron @chuckwondo @d-v-b I wasn't able to request reviews from you all but it'd be great to get your thoughts on this if you have time

@maxrjones maxrjones changed the title Propose design doc for VirtualiZarr V2 architecture discussion Propose design doc for VirtualiZarr V2 architecture Apr 25, 2025
Copy link
Collaborator

@chuckwondo chuckwondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of my suggestions are punctuation and syntax fixes, but I do have a couple of more significant comments in here.

Copy link
Member

@TomNicholas TomNicholas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the initiative here @maxrjones ! It's clear from this that there was far too much intention just scattered around various issues - it's nice to discuss it in one place.

Copy link

codecov bot commented Apr 26, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.34%. Comparing base (ff1ddb4) to head (81705e9).
Report is 8 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #568      +/-   ##
===========================================
+ Coverage    88.80%   89.34%   +0.53%     
===========================================
  Files           34       34              
  Lines         1948     1943       -5     
===========================================
+ Hits          1730     1736       +6     
+ Misses         218      207      -11     

see 4 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@abarciauskas-bgse
Copy link
Collaborator

Thanks for putting this together @maxrjones it is super helpful to have these conversations synthesized into this design doc, especially with the associated issues linked. Take my comments with a grain of salt as I haven't been involved directly in these discussions.

Co-authored-by: Aimee Barciauskas <[email protected]>
@maxrjones maxrjones changed the title Propose design doc for VirtualiZarr V2 architecture Propose design doc and roadmap for VirtualiZarr 2.0 May 15, 2025
Co-authored-by: Tom Nicholas <[email protected]>
@norlandrhagen
Copy link
Collaborator

Not sure which Phase in the roadmap it fits into, but having https support for virtual stores would be really clutch for a lot of usecases.

Co-authored-by: Raphael Hagen <[email protected]>
join: "JoinOptions" = "outer",
attrs_file: str | os.PathLike | None = None,
combine_attrs: "CombineAttrsOptions" = "override",
**kwargs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest considering removing **kwargs from any top-level function because it's often unclear where they go. Instead, if those kwargs are intended for the object reader, they should be passed into the reader before the instance is passed into this function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It provides these arguments to open_virtual_dataset:

    drop_variables: Iterable[str] | None = None,
    loadable_variables: Iterable[str] | None = None,
    decode_times: bool | None = None,
    cftime_variables: Iterable[str] | None = None,
    indexes: Mapping[str, Index] | None = None,

Since I think the motivation was to follow the design of https://docs.xarray.dev/en/stable/generated/xarray.open_mfdataset.html closely, I doubt we'll want to change this even though I also dislike pass-through **kwargs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you know it's always going to be those 5 arguments, why not re-declare them here so you get accurate type hinting?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I defer to Tom on this to decide the preferred balance of mirroring Xarray vs accurate type hinting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided to match xarray's **kwargs syntax in today's coordination call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean we are having **kwargs in our signature or we're going to hoist the specific argument parameters that xarray supports into our signature so that we don't have to accept **kwargs here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean we are having **kwargs in our signature

Yes, following the design pattern established in https://docs.xarray.dev/en/stable/generated/xarray.open_mfdataset.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, we just want the signature of open_virtual_X to match the signature of open_X as closely as possible. Even though that signature includes untyped **kwargs (for open_mfdataset). We're choosing for now to value consistency with xarray over strong typing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants