Description
While working on the Microscopy BEP, it was brought to our attention that very large microscopy datasets sometimes need to be split across different folders, for example because of limitations or performance issues with large files or a large number of files in a single repository.
I was wondering if this issue has come up in BIDS in the past and if there is an official mechanism for dealing with such situations?
Here is an example to illustrate my thoughts.
In this example, one subject (sub-01) has 2000 samples (sample-0001 to sample-2000), and each sample has 20 chunks (chunk-01 to chunk-20), as illustrated below:
dataset
└── sub-01
    └── microscopy
        ├── sub-01_sample-0001_chunk-01_BF.tif
        ├── sub-01_sample-0001_chunk-02_BF.tif
        ├── ...
        ├── sub-01_sample-0001_chunk-20_BF.tif
        ├── ...
        ├── sub-01_sample-2000_chunk-01_BF.tif
        ├── sub-01_sample-2000_chunk-02_BF.tif
        ├── ...
        └── sub-01_sample-2000_chunk-20_BF.tif
Let’s say that the dataset needs to be split in two. I would suggest putting the first 1000 samples in one dataset (dataset-01) and samples 1001 to 2000 in another dataset (dataset-02), as follows:
dataset-01
└── sub-01
    └── microscopy
        ├── sub-01_sample-0001_chunk-01_BF.tif
        ├── sub-01_sample-0001_chunk-02_BF.tif
        ├── ...
        ├── sub-01_sample-0001_chunk-20_BF.tif
        ├── ...
        ├── sub-01_sample-1000_chunk-01_BF.tif
        ├── sub-01_sample-1000_chunk-02_BF.tif
        ├── ...
        └── sub-01_sample-1000_chunk-20_BF.tif
dataset-02
└── sub-01
    └── microscopy
        ├── sub-01_sample-1001_chunk-01_BF.tif
        ├── sub-01_sample-1001_chunk-02_BF.tif
        ├── ...
        ├── sub-01_sample-1001_chunk-20_BF.tif
        ├── ...
        ├── sub-01_sample-2000_chunk-01_BF.tif
        ├── sub-01_sample-2000_chunk-02_BF.tif
        ├── ...
        └── sub-01_sample-2000_chunk-20_BF.tif
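To make the proposed rule concrete, here is a minimal Python sketch of how each file could be assigned to a dataset from its sample index. The helper names (`target_dataset`, `split_path`) and the 1000-samples-per-dataset threshold are assumptions for illustration only; none of this is part of BIDS or of any existing BIDS tool.

```python
import re
from pathlib import PurePosixPath

# Hypothetical pattern for the "sample-<index>" entity in a filename.
SAMPLE_RE = re.compile(r"_sample-(\d+)_")


def target_dataset(filename: str, samples_per_dataset: int = 1000) -> str:
    """Return the dataset folder name for a file, based on its sample index."""
    match = SAMPLE_RE.search(filename)
    if match is None:
        raise ValueError(f"no sample entity found in {filename!r}")
    sample_index = int(match.group(1))
    # Samples 1-1000 map to dataset-01, 1001-2000 to dataset-02, and so on.
    dataset_index = (sample_index - 1) // samples_per_dataset + 1
    return f"dataset-{dataset_index:02d}"


def split_path(filename: str, subject: str = "sub-01") -> PurePosixPath:
    """Build the destination path of a file inside its target dataset."""
    return PurePosixPath(target_dataset(filename), subject, "microscopy", filename)


print(split_path("sub-01_sample-1000_chunk-20_BF.tif"))
print(split_path("sub-01_sample-1001_chunk-01_BF.tif"))
```

With this rule, all chunks of a given sample always end up in the same dataset, since only the sample index is used to pick the destination.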
Would that splitting method make sense with BIDS?
And in a case like this, is there a way to "link" the 2 datasets together, in dataset_description.json for example?
Thank you!