[Feature]: Chunk all TimeSeries data and timestamps by default #1945

@rly

Description

What would you like to see added to PyNWB?

Chunking generally improves read/write performance and is more cloud-friendly (and LINDI-friendly). (Related to NeurodataWithoutBorders/lindi#84.)

I suggest that TimeSeries.__init__ wrap data and timestamps in an H5DataIO or ZarrDataIO, depending on the backend, with chunks=True, if the input data/timestamps are not already wrapped. We could add flags, e.g., chunk_data=True and chunk_timestamps=True, that users can flip to turn off this behavior. A challenge will be determining the backend within TimeSeries...

We could use the h5py defaults for now, and more targeted defaults for ElectricalSeries data / TwoPhotonSeries data later.

I believe all Zarr data are already chunked by default.
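The auto-wrapping logic proposed above can be sketched in plain Python. This is a minimal illustration, not PyNWB code: `DataIO` here is a stand-in for hdmf's DataIO wrappers (H5DataIO/ZarrDataIO), and `maybe_wrap` is a hypothetical helper name.

```python
class DataIO:
    """Stand-in for hdmf's DataIO wrappers (e.g., H5DataIO, ZarrDataIO)."""

    def __init__(self, data, chunks=True):
        self.data = data
        self.chunks = chunks


def maybe_wrap(data, chunk=True):
    """Wrap raw data for chunked storage unless it is already wrapped
    or the user has opted out (chunk=False)."""
    if chunk and not isinstance(data, DataIO):
        return DataIO(data, chunks=True)
    return data
```

In the proposal, TimeSeries.__init__ would apply this to both data and timestamps, with chunk_data / chunk_timestamps as the opt-out flags; already-wrapped inputs pass through untouched so explicit user settings are never overridden.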

Is your feature request related to a problem?

Contiguous HDF5 datasets have slow read performance along the non-contiguous dimensions and are difficult to stream or to use with Zarr/LINDI.
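The contrast can be seen directly with h5py (assumed installed; the filename is arbitrary). A contiguous dataset reports `chunks is None`, while `chunks=True` lets h5py auto-select a chunk shape, so readers only fetch the chunks overlapping a slice:

```python
import h5py
import numpy as np

with h5py.File("chunking_demo.h5", "w") as f:
    data = np.zeros((10000, 64), dtype="f4")
    # Contiguous (h5py default): stored as one block; slicing along the
    # second axis must seek across the whole dataset on disk.
    f.create_dataset("contiguous", data=data)
    # Chunked: chunks=True asks h5py to guess a chunk shape; partial
    # reads touch only the overlapping chunks, which also suits
    # cloud/streaming access.
    chunked = f.create_dataset("chunked", data=data, chunks=True)
    print(chunked.chunks)  # tuple chunk shape chosen by h5py
```

This mirrors what wrapping with H5DataIO(chunks=True) would produce, and is why chunked layout is the more cloud-friendly default.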

What solution would you like?

Chunking of time series data by default.

Do you have any interest in helping implement the feature?

Yes.

Labels

category: proposal (proposed enhancements or new features); priority: medium (non-critical problem and/or affecting only a small set of NWB users)