Description
This is a very early-stage idea, but I'm putting a ticket up to document and share ideas. I think it would be really interesting for an H5Dataset to be a subclassed numpy array: an object that acts like a numpy array but has additional h5coro-specific attributes and methods.
I still need to figure out exactly what this means, probably by doing some playing around. I oscillate between being really excited about it and feeling like we should leave super user-oriented data structures up to the higher level libraries like pandas or xarray.
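To make the idea concrete, here is a minimal sketch of what a subclassed array could look like. It follows numpy's standard subclassing pattern (`__new__` plus `__array_finalize__`); the `dataset_path` attribute and the constructor signature are hypothetical and are not part of h5coro's current API.

```python
import numpy as np


class H5Dataset(np.ndarray):
    """Hypothetical sketch: an ndarray subclass carrying one extra attribute."""

    def __new__(cls, values, dataset_path=None):
        # View the input data as this subclass, then attach the extra attribute.
        obj = np.asarray(values).view(cls)
        obj.dataset_path = dataset_path  # assumed attribute name, for illustration
        return obj

    def __array_finalize__(self, obj):
        # Called for views and slices too, so copy the attribute from the parent.
        if obj is None:
            return
        self.dataset_path = getattr(obj, "dataset_path", None)


ds = H5Dataset([1.0, 2.0, 3.0], dataset_path="/group/heights")
tail = ds[1:]     # slicing keeps the subclass and its attribute
doubled = ds * 2  # arithmetic also returns the subclass
```

Every extra attribute has to be threaded through `__array_finalize__` like this, which is part of why the approach gets fiddly quickly.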
I dug into the feasibility of this, and it turns out that subclassing numpy arrays is tricky and, overall, not encouraged (reference). There are, however, a few pages that discuss creating numpy-compatible containers: see interoperability or custom array containers. Of particular interest was the use of pandas dataframes as an example of achieving numpy interoperability with a separate array-like data structure. Dask arrays were another interesting example, in particular because dask also has to deal with an async/lazy loading paradigm.
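As a rough sketch of the container approach, the class below is not an ndarray subclass but still works with numpy through the `__array__` and `__array_ufunc__` protocols. The class name `LazyDataset`, the `loader` callable, and the `name` attribute are all assumptions made for illustration; the loader just stands in for whatever deferred read h5coro would actually perform.

```python
import numpy as np


class LazyDataset:
    """Hypothetical array-like container; data is loaded on first use."""

    def __init__(self, loader, name):
        self._loader = loader  # callable returning the data (stand-in for a deferred read)
        self._values = None    # filled in on first access
        self.name = name       # example of an extra, non-numpy attribute

    def _load(self):
        # Run the loader once and cache the result as a plain ndarray.
        if self._values is None:
            self._values = np.asarray(self._loader())
        return self._values

    def __array__(self, dtype=None, copy=None):
        # Lets np.asarray(obj) and similar calls convert the container.
        values = self._load()
        if dtype is not None:
            values = values.astype(dtype, copy=False)
        return values

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap any LazyDataset inputs and defer to numpy's own ufunc.
        # Simplification: an `out=` argument holding a LazyDataset is not handled.
        unwrapped = [x._load() if isinstance(x, LazyDataset) else x for x in inputs]
        return getattr(ufunc, method)(*unwrapped, **kwargs)


ds = LazyDataset(lambda: [1.0, 4.0, 9.0], name="heights")
roots = np.sqrt(ds)      # first use triggers the load
shifted = np.add(ds, 1)  # later calls reuse the cached data
```

Here ufunc results come back as plain ndarrays. Python operators such as `ds + 1` would need extra work, for example inheriting from `numpy.lib.mixins.NDArrayOperatorsMixin`.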