Commit c9ef93a

Clarify BIDS dataset organization and distribution considerations

- Add note about BIDS Raw datasets being distributable without derivatives
- Include dataset_description.json in directory structure examples to emphasize where we observe legit BIDS datasets
- Explain disadvantages of nested dataset organization for distribution
- Clarify that sourcedata can contain Raw, non-BIDS, or derivative datasets
- Add requirement for BIDSVersion key to identify BIDS datasets in subdirectories
- Re-Include example of non-nested dataset organization in my_study folder (I based this change on top of the removal proposal in #687)

1 parent 1ef8732 commit c9ef93a

1 file changed: +34 -4 lines changed

docs/getting_started/folders_and_files/derivatives.md

Lines changed: 34 additions & 4 deletions
@@ -184,6 +184,8 @@ BIDS Derivatives datasets are intended to be interpretable and distributable
 with or without the datasets used to generate them.
 This is necessary for storage and bandwidth constraints,
 as well as to permit the distribution of derivatives when the source data are restricted.
+Similarly, BIDS Raw datasets should be interpretable and distributable without
+the derivatives produced from them.
 
 This independence affords flexibility in the relative organization of datasets.
 The following examples show three ways to organize, relative to each other,
@@ -199,22 +201,50 @@ my_dataset/
 analysis/
 sub-01/
 ...
+dataset_description.json
 ```
 
-A BIDS Derivatives dataset may contain references to its input datasets
-in the `sourcedata/` subdirectory:
+The disadvantage is that this organization complicates distribution of the BIDS Raw dataset
+by itself, as it requires explicit exclusion of the datasets within its `derivatives/` folder.
+
+A BIDS Derivatives dataset may contain references to its input datasets
+(which may be BIDS Raw, non-BIDS, or even other BIDS Derivatives datasets) in the `sourcedata/` subdirectory:
 
 ```bash
 my_analysis/
 sourcedata/
 raw/
+sub-01/
+...
+dataset_description.json
 preprocessed/
 sub-01/
 ...
+dataset_description.json
 ```
 
+The disadvantage here is similar: distributing such a BIDS Derivatives dataset alone would
+require explicit exclusion of the datasets within its `sourcedata/` folder.
+
 Note that the `sourcedata/` and `derivatives/` subdirectories constitute dataset boundaries.
-Any contents of these directories may be validated independently,
-but their contents must not affect the interpretation of the nested or containing datasets.
+Any subfolders of these directories may be validated independently if they are BIDS datasets,
+which is indicated by the presence of a `dataset_description.json` file in them with a
+REQUIRED `"BIDSVersion"` key.
+Note that their contents must not affect the interpretation of the nested
+or containing datasets.
+
+It is also possible to avoid nesting datasets entirely by placing them in a folder
+containing both `sourcedata/` and `derivatives/`:
+
+```bash
+my_study/
+sourcedata/
+raw/
+sub-01/
+...
+derivatives/
+preprocessed/
+analysis/
+```
 
 <!-- TODO derivatives JSON -->
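
The identification rule added in this change (a subfolder of `sourcedata/` or `derivatives/` is a BIDS dataset when it holds a `dataset_description.json` with a `"BIDSVersion"` key) could be sketched in Python roughly as follows. This is an illustration only and not part of the commit: the function names `is_bids_dataset` and `find_nested_datasets` are hypothetical, and the check looks only for the presence of the key; it does not validate the dataset.

```python
import json
from pathlib import Path


def is_bids_dataset(folder):
    """Return True if `folder` holds a dataset_description.json with a "BIDSVersion" key."""
    description = Path(folder) / "dataset_description.json"
    if not description.is_file():
        return False
    try:
        content = json.loads(description.read_text(encoding="utf-8"))
    except ValueError:
        # Covers malformed JSON and undecodable bytes: treat as "not a BIDS dataset".
        return False
    return isinstance(content, dict) and "BIDSVersion" in content


def find_nested_datasets(dataset_root):
    """List direct subfolders of sourcedata/ and derivatives/ that look like BIDS datasets."""
    found = []
    for boundary in ("sourcedata", "derivatives"):
        boundary_dir = Path(dataset_root) / boundary
        if not boundary_dir.is_dir():
            continue
        for child in sorted(boundary_dir.iterdir()):
            if child.is_dir() and is_bids_dataset(child):
                found.append(child)
    return found
```

Under these assumptions, a folder such as `sourcedata/raw/` would be reported only if its `dataset_description.json` declares `"BIDSVersion"`, while a non-BIDS folder next to it would be skipped.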

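
The "explicit exclusion" that the nested layouts require when distributing a dataset on its own could look roughly like the sketch below. It is an assumption-laden illustration, not a tool referenced by the commit: the helper name `copy_dataset_alone` and its `boundaries` parameter are hypothetical, and it simply skips the named boundary folders found directly under the dataset root while copying.

```python
import shutil
from pathlib import Path


def copy_dataset_alone(src, dst, boundaries=("derivatives",)):
    """Copy dataset `src` to a new folder `dst`, skipping top-level boundary folders.

    `dst` must not exist yet (a requirement of shutil.copytree's default mode).
    """
    src = Path(src).resolve()

    def ignore(current_dir, names):
        # Skip boundary folders only directly under the dataset root, so that a
        # folder with the same name deeper in the tree is still copied.
        if Path(current_dir).resolve() == src:
            return [name for name in names if name in boundaries]
        return []

    return shutil.copytree(src, dst, ignore=ignore)
```

For a BIDS Raw dataset with nested derivatives, the default `boundaries=("derivatives",)` matches the first case described in the diff; for a BIDS Derivatives dataset that nests its inputs, passing `boundaries=("sourcedata",)` would match the second. The non-nested `my_study/` layout needs no such step, since each dataset can be copied as is.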