Description
What would you like to see added to PyNWB?
Chunking generally improves read/write performance and is more cloud-friendly (and LINDI-friendly). (Related to NeurodataWithoutBorders/lindi#84.)
I suggest that TimeSeries.__init__ wrap data and timestamps in an H5DataIO or ZarrDataIO, depending on the backend, with chunks=True, if the input data/timestamps are not already wrapped. We could add flags such as chunk_data=True and chunk_timestamps=True that users can set to False to turn off this behavior. One challenge will be determining the backend from within TimeSeries, since a TimeSeries is normally constructed before any IO object (and therefore any backend) has been chosen.
We could use the h5py defaults for now, and add more targeted defaults for ElectricalSeries data and TwoPhotonSeries data later.
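To give a feel for what "the h5py defaults" might produce, here is a pure-Python approximation modeled on the heuristic h5py uses when chunks=True: a target chunk size that grows with the dataset size, clamped to roughly 8 KiB to 1 MiB, reached by halving axes round-robin. The constants and loop are written from memory of h5py's chunk guesser and are an assumption; h5py's own source is authoritative, and guess_chunk_shape is a hypothetical name.

```python
import math

# Assumed to mirror h5py's chunk-guessing constants (bytes)
CHUNK_BASE = 16 * 1024   # base multiplier for the target chunk size
CHUNK_MIN = 8 * 1024     # soft lower bound on the target
CHUNK_MAX = 1024 * 1024  # hard upper bound on chunk size


def guess_chunk_shape(shape, typesize):
    """Approximate an h5py-style automatic chunk shape.

    shape: dataset shape; typesize: bytes per element.
    """
    if len(shape) == 0:
        raise ValueError("Scalar datasets cannot be chunked")
    # Zero-length (e.g. extendable) dimensions are treated as 1024
    chunks = [float(x) if x != 0 else 1024.0 for x in shape]
    dset_size = math.prod(chunks) * typesize
    # Target grows with dataset size, clamped to [CHUNK_MIN, CHUNK_MAX]
    target = CHUNK_BASE * 2 ** math.log10(dset_size / (1024.0 * 1024))
    target = min(max(target, CHUNK_MIN), CHUNK_MAX)
    idx = 0
    while True:
        chunk_bytes = math.prod(chunks) * typesize
        near_target = chunk_bytes < target or abs(chunk_bytes - target) / target < 0.5
        if near_target and chunk_bytes < CHUNK_MAX:
            break
        if math.prod(chunks) == 1:
            break  # a single element is already larger than CHUNK_MAX
        # Halve one axis at a time, cycling through axes, rounding up
        axis = idx % len(chunks)
        chunks[axis] = math.ceil(chunks[axis] / 2.0)
        idx += 1
    return tuple(int(c) for c in chunks)


# Example: 1,000,000 samples x 384 channels of int16 (2 bytes per element)
ephys_chunks = guess_chunk_shape((1_000_000, 384), 2)
```

Because every axis is halved in turn regardless of its meaning, a generic heuristic like this knows nothing about time versus channels or frames versus pixels, which is why more targeted defaults for ElectricalSeries and TwoPhotonSeries may be worthwhile later.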
I believe all Zarr data are already chunked by default.
Is your feature request related to a problem?
Reading contiguous HDF5 datasets is slow along the non-contiguous dimensions, and such datasets are difficult to stream or to use with Zarr/LINDI.
What solution would you like?
Chunking of time series data by default.
Do you have any interest in helping implement the feature?
Yes.
Code of Conduct
- I agree to follow this project's Code of Conduct
- Have you checked the Contributing document?
- Have you ensured this change was not already requested?