[Feature]: Streaming DANDI:000541 takes a long time #1889

@rly

What would you like to see added to PyNWB?

From @dysprague: Looping through all files in dandiset 000541 and extracting the NeuroPAL images takes ~33 minutes. There are 21 files, each on the order of 2 GB. This is much slower than for the other dandisets that also have NeuroPAL images (e.g., 000714, 000692, and 000776). The problem occurs when streaming with both PyNWB and MatNWB.

On my computer and connection, it is actually faster to download and open a file than to stream it.
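For anyone who wants to reproduce the timing, here is a minimal sketch of how the streaming open could be measured for a single file. It assumes the `remfile`, `h5py`, and `pynwb` packages are installed. `open_streamed`, `time_call`, and `asset_url` are hypothetical names used only for illustration, and the NeuroPAL extraction step is left out because its exact location in the file is not stated here.

```python
import time


def open_streamed(url):
    """Open a remote NWB file over HTTP without downloading it first."""
    # Imported lazily so the timing helper below works without these packages.
    import h5py
    import remfile
    from pynwb import NWBHDF5IO

    remote_file = remfile.File(url)  # reads byte ranges over HTTP on demand
    h5_file = h5py.File(remote_file, "r")
    io = NWBHDF5IO(file=h5_file, load_namespaces=True)
    return io, io.read()


def time_call(fn, *args):
    """Return (result, elapsed_seconds) for a single call to fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Usage (needs network access; `asset_url` is a placeholder for the
# download URL of one asset in the dandiset):
# (io, nwbfile), seconds = time_call(open_streamed, asset_url)
# print(f"opened in {seconds:.1f} s")
# io.close()
```

Timing the same call against a locally downloaded copy (opened with `NWBHDF5IO(path, "r")`) would give the download-then-open baseline for comparison.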

I suspect it has to do with the fact that this dandiset has one set of 960 PlaneSegmentation tables for the "CalciumSeriesSegmentation" ImageSegmentation group, another set of 960 for the "CalciumSeriesSegmentationdNMF" ImageSegmentation group, and another set of 960 for the "NeuronIDs/ImageSegmentation" group. Each table represents the segmentation at a particular time point. That is 2,880 PlaneSegmentation groups in total.
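A quick back-of-the-envelope check of the numbers above, as a rough sketch rather than a measurement. It assumes that every one of the 21 files contains all three sets of 960 tables and that the whole 33 minutes is spent on those tables; neither assumption has been verified.

```python
# Figures taken from the issue text.
image_segmentation_groups = 3
tables_per_group = 960
n_files = 21
total_minutes = 33

total_tables = image_segmentation_groups * tables_per_group
seconds_per_file = total_minutes * 60 / n_files

# ASSUMPTION: each file holds all 2,880 tables and all of the time is
# attributable to them.
ms_per_table = 1000 * seconds_per_file / total_tables

print(total_tables)                # 2880
print(round(seconds_per_file, 1))  # 94.3
print(round(ms_per_table, 1))      # 32.7
```

Under those assumptions the cost works out to roughly 33 ms per table. That would be consistent with something like one remote request per table, but this is a guess that profiling would need to confirm.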

Is your feature request related to a problem?

No response

What solution would you like?

Provide a recommendation for how to reorganize this data for more efficient streaming. I can do this, but I first need to look more closely at what changes across the tables and ImageSegmentation groups. It may be possible to combine everything into one or two PlaneSegmentation tables with a column for the time sample.
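To illustrate the idea of a single table with a time-sample column, here is a minimal sketch in plain Python. Dictionaries stand in for the PlaneSegmentation tables. The column names (`roi_id`, `voxel`, `time_sample`) and the toy values are hypothetical, and the real tables may have columns that do not merge this cleanly.

```python
def combine_tables(tables_by_sample):
    """Flatten {time_sample: [row, ...]} into one list of rows.

    Each output row gains a 'time_sample' column recording which
    per-time-point table it came from.
    """
    combined = []
    for sample in sorted(tables_by_sample):
        for row in tables_by_sample[sample]:
            combined.append({"time_sample": sample, **row})
    return combined


# Hypothetical toy data: two time points, two ROIs each.
per_sample = {
    0: [{"roi_id": 0, "voxel": (10, 12, 3)}, {"roi_id": 1, "voxel": (40, 8, 5)}],
    1: [{"roi_id": 0, "voxel": (11, 12, 3)}, {"roi_id": 1, "voxel": (40, 9, 5)}],
}

combined = combine_tables(per_sample)

# Rows for one time point can still be recovered by filtering on the column.
rows_at_1 = [row for row in combined if row["time_sample"] == 1]
print(len(combined), len(rows_at_1))  # 4 2
```

In an NWB file, the equivalent would presumably be one PlaneSegmentation with an extra custom column, so that a reader touches one group with a few large datasets instead of 960 small groups.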

Do you have any interest in helping implement the feature?

Yes.

Metadata

Labels

category: enhancement (improvements of code or code behavior); priority: low (alternative solution already working and/or relevant to only specific user(s))
