
Commit 132a9a4

yarikoptic, effigies, kabilar, and julia-pfarr authored
[ENH] Add "study" DatasetType to organize a collection of source and derivative datasets (#1972)
* Add the notion that example layout can in fact be a valid BIDS dataset

This reverts commit a3c12f8 where I have tried to introduce it in #1741 but it required a little more of further detailing.

* Move and extend description and definition of DatasetType "project"

* Add rule to ensure that "project" DatasetType has no subject folders

Idea from @effigies while discussing this PR at BIDS Maintainers meeting 2025

* Make NoSubjectFolders into a "warning" from "error" to have aligned with SubjectFolders check

Also adjusted wording to be aligned too

* Rename "project" to "study"

While discussing with @jbpoline we wondered if maybe `study` would be a better descriptor to use here in favor of `project`. One of the rationales is that, for example, in [BEP035](https://bids.neuroimaging.io/extensions/beps/bep_035.html) (attn @bids-standard/bep035) on Mega-analysis they introduce the `study-` entity as a grouping element. It kinda then would match natively.

We also mention "study" in various places in BIDS, which seems to align nicely here

```shell
❯ git grep study
src/CHANGES.md:- \[FIX] update physio bids name in longitudinal study page examples [#863](#863) ([Remi-Gau](https://github.com/Remi-Gau))
src/appendices/coordinate-systems.md:The following template identifiers are RECOMMENDED for individual- and study-specific reference
src/appendices/coordinate-systems.md:In the case of multiple study templates, additional names may need to be defined.
src/appendices/coordinate-systems.md:| study | Custom space defined using a group/study-specific template. This coordinate system requires specifying an additional file to be fully defined. |
src/appendices/hed.md:numerical values that are similar across the recordings in the study.
src/appendices/hed.md:repository on GitHub should be used to validate the study event annotations.
src/common-principles.md: unless when appropriate given the study goals, for example, when scanning babies.
src/introduction.md:> The data used in the study were organized using the
src/modality-specific-files/genetic-descriptor.md: "Dataset": "https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001364.v1.p1",
src/modality-specific-files/intracranial-electroencephalography.md:Note that the date and time information SHOULD be stored in the study key file
src/modality-specific-files/magnetic-resonance-spectroscopy.md:acquisition parameters in filenames is helpful or necessary to distinguish datasets in a given study.
src/modality-specific-files/motion.md:Note that the onsets of the recordings SHOULD be stored in the study key file [(`scans.tsv`)](../modality-agnostic-files.md#scans-file).
src/modality-specific-files/positron-emission-tomography.md:This entity is OPTIONAL if only one tracer is used in the study,
src/modality-specific-files/task-events.md:Please mind that this does not imply that only so called "event related" study designs
src/schema/objects/common_principles.yaml: A set of neuroimaging and behavioral data acquired for a purpose of a particular study.
src/schema/objects/common_principles.yaml: Session can (but doesn't have to) be synonymous to a visit in a longitudinal study.
src/schema/objects/common_principles.yaml: A person or animal participating in the study.
src/schema/objects/entities.yaml: For example, this should be used when a study includes two T1w images -
src/schema/objects/entities.yaml: Session can (but doesn't have to) be synonymous to a visit in a longitudinal study.
src/schema/objects/entities.yaml: A person or animal participating in the study.
src/schema/objects/enums.yaml:study:
src/schema/objects/enums.yaml: value: study
src/schema/objects/enums.yaml: display_name: study
src/schema/objects/enums.yaml: Custom space defined using a group/study-specific template.
src/schema/objects/metadata.yaml: Reference to the study/studies on which the implementation is based.
src/schema/objects/metadata.yaml: The version of the HED schema used to validate HED tags for study.
tools/schemacode/src/bidsschematools/tests/data/broken_dataset_description.json:"EthicsApprovals": ["The original study from which this BIDS example dataset was derived was approved by the Ethics committee of Ghent University Hospital with identifier EC 2017/1103."]
```

and "project" mentions are not particularly aligned. So, I think, we should just make it a "study", hence renaming accordingly.

* Update src/schema/objects/metadata.yaml

* Remove divergence: do alert about absent subjects in any "non-study" dataset

Well, any BIDS dataset is a "study" dataset, but the point here is that at the moment, for both "raw" and "derivative" types, we expect to have sub- folders. That was the prior behavior, which should not be affected by this PR.

This should address the review comment of @effigies https://github.com/bids-standard/bids-specification/pull/1972/files#r2142639988

* Clarify description of "study BIDS dataset"

Co-authored-by: Kabilar Gunalan <[email protected]>

---------

Co-authored-by: Chris Markiewicz <[email protected]>
Co-authored-by: Kabilar Gunalan <[email protected]>
Co-authored-by: Julia-Katharina Pfarr <[email protected]>
1 parent 5a357dd commit 132a9a4

4 files changed: 69 additions and 45 deletions

src/common-principles.md

Lines changed: 47 additions & 43 deletions
@@ -275,49 +275,6 @@ However, in the case that these data are to be included:
 We RECOMMEND including the PDF print-out with the actual sequence
 parameters generated by the scanner in the `sourcedata` directory.

-Alternatively one can organize their data in the following way
-
-<!-- This block generates a file tree.
-A guide for using macros can be found at
- https://github.com/bids-standard/bids-specification/blob/master/macros_doc.md
--->
-{{ MACROS___make_filetree_example(
-    {
-    "my_project-1": {
-        "sourcedata": {
-            "dicoms": {},
-            "raw": {
-                "sub-01": {},
-                "sub-02": {},
-                "...": "",
-                "dataset_description.json": "",
-                "...": "",
-            },
-            "..." : "",
-        },
-        "derivatives": {
-            "pipeline_1": {},
-            "pipeline_2": {},
-            "...": "",
-        }
-    }
-    }
-) }}
-
-In this example, `sourcedata/dicoms` is not nested inside
-`sourcedata/raw`, **and only the `sourcedata/raw` subdirectory** is a BIDS-compliant dataset among `sourcedata/` subfolders.
-The subdirectories of `derivatives` MAY be BIDS-compliant derivatives datasets
-(see [Non-compliant derivatives](#non-compliant-derivatives) for further discussion).
-The above example is just a convention useful for organizing source, raw BIDS, and derived BIDS data while maintaining BIDS compliance of the raw data directory.
-When using this convention it is RECOMMENDED to set the `SourceDatasets`
-field in `dataset_description.json` of each subdirectory of `derivatives` to:
-
-```JSON
-{
-  "SourceDatasets": [ {"URL": "../../sourcedata/raw/"} ]
-}
-```
-
 !!! danger "Caution"

     Sharing source data may help amend errors and missing data discovered
@@ -443,6 +400,53 @@ In particular, if a BIDS dataset contains a `derivatives/` subdirectory,
 the contents of that directory may be a heterogeneous mix of BIDS Derivatives
 datasets and non-compliant derivatives.

+## Study dataset
+
+BIDS allows one to organize the data for the entire study (original source data, raw BIDS, derivatives) as a valid BIDS dataset in the following way
+
+<!-- This block generates a file tree.
+A guide for using macros can be found at
+ https://github.com/bids-standard/bids-specification/blob/master/macros_doc.md
+-->
+{{ MACROS___make_filetree_example(
+    {
+    "study-1": {
+        "sourcedata": {
+            "dicoms": {},
+            "raw": {
+                "sub-01": {},
+                "sub-02": {},
+                "...": "",
+                "dataset_description.json": "",
+                "...": "",
+            },
+            "..." : "",
+        },
+        "derivatives": {
+            "pipeline_1": {},
+            "pipeline_2": {},
+            "...": "",
+        },
+        "dataset_description.json": "",
+        "...": "",
+    }
+    }
+) }}
+
+In this example, `sourcedata/dicoms` is not nested inside
+`sourcedata/raw`, **and only the `sourcedata/raw` subdirectory** is a BIDS-compliant dataset among `sourcedata/` subfolders.
+The subdirectories of `derivatives` MAY be BIDS-compliant derivatives datasets
+(see [Non-compliant derivatives](#non-compliant-derivatives) for further discussion).
+The above example is a fully compliant BIDS dataset, providing a convention useful for organizing source, raw BIDS, and derived BIDS data while maintaining overall BIDS compliance.
+When using this convention, `dataset_description.json` MUST have `DatasetType` to be set to `"study"`. It is also RECOMMENDED to set the `SourceDatasets`
+field in `dataset_description.json` of each subdirectory of `derivatives` to:
+
+```JSON
+{
+  "SourceDatasets": [ {"URL": "../../sourcedata/raw/"} ]
+}
+```
+
 ## File format specification

 ### Imaging files
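
The study layout added in this hunk can be sketched as a small scaffold script. This is only an illustration under stated assumptions: the helper names `write_description` and `scaffold_study`, the `Name` value, and the `BIDSVersion` placeholder are hypothetical and do not come from the commit; only the directory names, the `DatasetType` values, and the `SourceDatasets` URL are taken from the diff.

```python
import json
import tempfile
from pathlib import Path


def write_description(directory: Path, fields: dict) -> None:
    """Write a dataset_description.json into `directory` (hypothetical helper)."""
    directory.mkdir(parents=True, exist_ok=True)
    text = json.dumps(fields, indent=2) + "\n"
    (directory / "dataset_description.json").write_text(text)


def scaffold_study(root: Path, pipelines=("pipeline_1", "pipeline_2")) -> Path:
    """Create the skeleton of the study layout shown in the diff."""
    study = root / "study-1"
    # Non-BIDS source data lives next to, not inside, sourcedata/raw.
    (study / "sourcedata" / "dicoms").mkdir(parents=True, exist_ok=True)

    # Placeholder values, NOT taken from the commit.
    common = {"Name": "Example study", "BIDSVersion": "PLACEHOLDER"}

    # The study-level description MUST set DatasetType to "study".
    write_description(study, {**common, "DatasetType": "study"})
    # The raw BIDS dataset is nested under sourcedata/raw.
    write_description(study / "sourcedata" / "raw", {**common, "DatasetType": "raw"})
    # Each derivatives subdirectory points back at the raw dataset.
    # A real derivative description needs further fields (for example GeneratedBy).
    for name in pipelines:
        write_description(
            study / "derivatives" / name,
            {
                **common,
                "DatasetType": "derivative",
                "SourceDatasets": [{"URL": "../../sourcedata/raw/"}],
            },
        )
    return study


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold_study(Path(tmp))
        for path in sorted(created.rglob("dataset_description.json")):
            print(path.relative_to(created))
```

The scaffold creates no `sub-*` directories at the study root; subject directories appear only inside `sourcedata/raw` and, where applicable, inside each derivatives dataset.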

src/schema/objects/enums.yaml

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -360,14 +360,20 @@ individual:
360360
In context of surfaces this space has been referred to as `fsnative`.
361361
362362
In order for this space to be interpretable, `SpatialReference` metadata MUST be provided.
363-
study:
363+
study__space:
364364
value: study
365365
display_name: study
366366
description: |
367367
Custom space defined using a group/study-specific template.
368368
This coordinate system requires specifying an additional file to be fully defined.
369369
370370
In order for this space to be interpretable, `SpatialReference` metadata MUST be provided.
371+
study__datasettype:
372+
value: study
373+
display_name: study
374+
description: |
375+
A study BIDS dataset to organize the data for the entire study (original source data, raw BIDS,
376+
derivatives) as a valid BIDS dataset.
371377
scanner:
372378
value: scanner
373379
display_name: scanner

src/schema/objects/metadata.yaml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -582,6 +582,7 @@ DatasetType:
582582
enum:
583583
- $ref: objects.enums.raw.value
584584
- $ref: objects.enums.derivative.value
585+
- $ref: objects.enums.study__datasettype.value
585586
DecayCorrectionFactor:
586587
name: DecayCorrectionFactor
587588
display_name: Decay Correction Factor
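
The effect of extending the `DatasetType` enum can be illustrated with a minimal check. This is a sketch, not the validator's implementation: `ALLOWED_DATASET_TYPES` and `resolve_dataset_type` are hypothetical names. The fallback to `"raw"` for a missing field reflects the specification's backwards-compatible default.

```python
# Values permitted for DatasetType once this commit is applied.
ALLOWED_DATASET_TYPES = ("raw", "derivative", "study")


def resolve_dataset_type(description: dict) -> str:
    """Return the effective DatasetType of a parsed dataset_description.json."""
    # A missing DatasetType is interpreted as "raw" for backwards compatibility.
    dataset_type = description.get("DatasetType", "raw")
    if dataset_type not in ALLOWED_DATASET_TYPES:
        raise ValueError(
            f"DatasetType must be one of {ALLOWED_DATASET_TYPES}, got {dataset_type!r}"
        )
    return dataset_type


if __name__ == "__main__":
    print(resolve_dataset_type({"DatasetType": "study"}))
    print(resolve_dataset_type({}))
```

Note that the earlier name discussed in the commit message, `"project"`, is rejected by this check, since only `"study"` was added to the enum.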

src/schema/rules/checks/dataset.yaml

Lines changed: 14 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,13 +6,26 @@ SubjectFolders:
66
issue:
77
code: SUBJECT_FOLDERS
88
message: |
9-
There are no subject directories (labeled "sub-*") in the root of this dataset.
9+
There are no subject directories (labeled "sub-*") in the root of this BIDS dataset.
1010
level: warning
1111
selectors:
1212
- path == '/dataset_description.json'
13+
- dataset.dataset_description.DatasetType != "study"
1314
checks:
1415
- length(dataset.subjects.sub_dirs) > 0
1516

17+
NoSubjectFolders:
18+
issue:
19+
code: NOSUBJECT_FOLDERS
20+
message: |
21+
There should be no subject directories (labeled "sub-*") in the root of this study BIDS dataset.
22+
level: warning
23+
selectors:
24+
- path == '/dataset_description.json'
25+
- dataset.dataset_description.DatasetType == "study"
26+
checks:
27+
- length(dataset.subjects.sub_dirs) == 0
28+
1629
# 49
1730
ParticipantIDMismatch:
1831
issue:
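
The two rules in this hunk are complementary, which a few lines of plain Python may make easier to see. This sketch only mirrors the selector and check logic; it is not how the BIDS validator evaluates schema rules. The function name `subject_folder_issues` is hypothetical, and treating a missing `DatasetType` (`None`) as "not study" is an assumption.

```python
from typing import Iterable, List, Optional


def subject_folder_issues(
    dataset_type: Optional[str], root_dirs: Iterable[str]
) -> List[str]:
    """Return the warning codes the two rules would raise for a dataset root."""
    # Directories named "sub-*" directly under the dataset root.
    sub_dirs = [name for name in root_dirs if name.startswith("sub-")]
    issues = []
    # SubjectFolders: any non-study dataset is expected to have subject directories.
    if dataset_type != "study" and len(sub_dirs) == 0:
        issues.append("SUBJECT_FOLDERS")
    # NoSubjectFolders: a study dataset is expected to have none at its root.
    if dataset_type == "study" and len(sub_dirs) != 0:
        issues.append("NOSUBJECT_FOLDERS")
    return issues


if __name__ == "__main__":
    print(subject_folder_issues("study", ["sourcedata", "derivatives"]))
    print(subject_folder_issues("study", ["sub-01", "sourcedata"]))
    print(subject_folder_issues("raw", ["code"]))
```

Because the selectors are mutually exclusive, at most one of the two codes can be reported for a given dataset, and both are warnings rather than errors.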
