From c529ca6818f5cdeb0b85edff9ec3bf820a8c0b5a Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Thu, 20 Jun 2024 00:16:10 -0400 Subject: [PATCH 1/8] Add the notion that example layout can in fact be a valid BIDS dataset This reverts commit a3c12f89bbca7a57f77832d146a808f6c6ca0194, where I tried to introduce it in https://github.com/bids-standard/bids-specification/pull/1741, but it required some further detailing. --- src/common-principles.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/src/common-principles.md b/src/common-principles.md index d15cfa937a..64cf87ef23 100644 --- a/src/common-principles.md +++ b/src/common-principles.md @@ -283,7 +283,7 @@ A guide for using macros can be found at --> {{ MACROS___make_filetree_example( { - "my_project-1": { + "my_dataset-1": { "sourcedata": { "dicoms": {}, "raw": { @@ -299,7 +299,9 @@ A guide for using macros can be found at "pipeline_1": {}, "pipeline_2": {}, "...": "", - } + }, + "dataset_description.json": "", + "...": "", } } ) }} @@ -308,7 +310,7 @@ In this example, `sourcedata/dicoms` is not nested inside `sourcedata/raw`, **and only the `sourcedata/raw` subdirectory** is a BIDS-compliant dataset among `sourcedata/` subfolders. The subdirectories of `derivatives` MAY be BIDS-compliant derivatives datasets (see [Non-compliant derivatives](#non-compliant-derivatives) for further discussion). -The above example is just a convention useful for organizing source, raw BIDS, and derived BIDS data while maintaining BIDS compliance of the raw data directory. +The above example is a fully compliant BIDS dataset, providing a convention useful for organizing source, raw BIDS, and derived BIDS data while maintaining overall BIDS compliance.
When using this convention it is RECOMMENDED to set the `SourceDatasets` field in `dataset_description.json` of each subdirectory of `derivatives` to: From 8eb8769490c6488570376e12f64c40d2099088df Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Thu, 20 Jun 2024 03:57:23 -0400 Subject: [PATCH 2/8] Move and extend description and definition of DatasetType "project" --- src/common-principles.md | 92 ++++++++++++++-------------- src/schema/objects/enums.yaml | 5 ++ src/schema/objects/metadata.yaml | 1 + src/schema/rules/checks/dataset.yaml | 3 +- 4 files changed, 55 insertions(+), 46 deletions(-) diff --git a/src/common-principles.md b/src/common-principles.md index 64cf87ef23..3321c7c491 100644 --- a/src/common-principles.md +++ b/src/common-principles.md @@ -275,51 +275,6 @@ However, in the case that these data are to be included: We RECOMMEND including the PDF print-out with the actual sequence parameters generated by the scanner in the `sourcedata` directory. -Alternatively one can organize their data in the following way - - -{{ MACROS___make_filetree_example( - { - "my_dataset-1": { - "sourcedata": { - "dicoms": {}, - "raw": { - "sub-01": {}, - "sub-02": {}, - "...": "", - "dataset_description.json": "", - "...": "", - }, - "..." : "", - }, - "derivatives": { - "pipeline_1": {}, - "pipeline_2": {}, - "...": "", - }, - "dataset_description.json": "", - "...": "", - } - } -) }} - -In this example, `sourcedata/dicoms` is not nested inside -`sourcedata/raw`, **and only the `sourcedata/raw` subdirectory** is a BIDS-compliant dataset among `sourcedata/` subfolders. -The subdirectories of `derivatives` MAY be BIDS-compliant derivatives datasets -(see [Non-compliant derivatives](#non-compliant-derivatives) for further discussion). -The above example is a fully compliant BIDS dataset, providing a convention useful for organizing source, raw BIDS, and derived BIDS data while maintaining overall BIDS compliance. 
-When using this convention it is RECOMMENDED to set the `SourceDatasets` -field in `dataset_description.json` of each subdirectory of `derivatives` to: - -```JSON -{ - "SourceDatasets": [ {"URL": "../../sourcedata/raw/"} ] -} -``` - !!! danger "Caution" Sharing source data may help amend errors and missing data discovered @@ -445,6 +400,53 @@ In particular, if a BIDS dataset contains a `derivatives/` subdirectory, the contents of that directory may be a heterogeneous mix of BIDS Derivatives datasets and non-compliant derivatives. +## Project dataset + +BIDS allows one to organize the data for the entire project (original source data, raw BIDS, derivatives) as a valid BIDS dataset in the following way + + +{{ MACROS___make_filetree_example( + { + "my_project-1": { + "sourcedata": { + "dicoms": {}, + "raw": { + "sub-01": {}, + "sub-02": {}, + "...": "", + "dataset_description.json": "", + "...": "", + }, + "..." : "", + }, + "derivatives": { + "pipeline_1": {}, + "pipeline_2": {}, + "...": "", + }, + "dataset_description.json": "", + "...": "", + } + } +) }} + +In this example, `sourcedata/dicoms` is not nested inside +`sourcedata/raw`, **and only the `sourcedata/raw` subdirectory** is a BIDS-compliant dataset among `sourcedata/` subfolders. +The subdirectories of `derivatives` MAY be BIDS-compliant derivatives datasets +(see [Non-compliant derivatives](#non-compliant-derivatives) for further discussion). +The above example is a fully compliant BIDS dataset, providing a convention useful for organizing source, raw BIDS, and derived BIDS data while maintaining overall BIDS compliance. +When using this convention, `dataset_description.json` MUST have `DatasetType` to be set to `"project"`. 
It is also RECOMMENDED to set the `SourceDatasets` +field in `dataset_description.json` of each subdirectory of `derivatives` to: + +```JSON +{ + "SourceDatasets": [ {"URL": "../../sourcedata/raw/"} ] +} +``` + ## File format specification ### Imaging files diff --git a/src/schema/objects/enums.yaml b/src/schema/objects/enums.yaml index d2da4fd509..7075d00be0 100644 --- a/src/schema/objects/enums.yaml +++ b/src/schema/objects/enums.yaml @@ -1319,6 +1319,11 @@ derivative: display_name: derivative description: | A derived BIDS dataset. +project: + value: project + display_name: project + description: | + A project BIDS dataset. balanced: value: balanced display_name: balanced diff --git a/src/schema/objects/metadata.yaml b/src/schema/objects/metadata.yaml index 8af01e23a4..3d0c606b88 100644 --- a/src/schema/objects/metadata.yaml +++ b/src/schema/objects/metadata.yaml @@ -583,6 +583,7 @@ DatasetType: enum: - $ref: objects.enums.raw.value - $ref: objects.enums.derivative.value + - $ref: objects.enums.project.value DecayCorrectionFactor: name: DecayCorrectionFactor display_name: Decay Correction Factor diff --git a/src/schema/rules/checks/dataset.yaml b/src/schema/rules/checks/dataset.yaml index eee9abb17c..a33004e4d1 100644 --- a/src/schema/rules/checks/dataset.yaml +++ b/src/schema/rules/checks/dataset.yaml @@ -6,10 +6,11 @@ SubjectFolders: issue: code: SUBJECT_FOLDERS message: | - There are no subject directories (labeled "sub-*") in the root of this dataset. + There are no subject directories (labeled "sub-*") in the root of this raw BIDS dataset. 
level: warning selectors: - path == '/dataset_description.json' + - dataset.dataset_description.DatasetType == "raw" checks: - length(dataset.subjects.sub_dirs) > 0 From 1929135999bbc2c68e215febef6ac296745730eb Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Tue, 10 Jun 2025 08:38:54 -0400 Subject: [PATCH 3/8] Add rule to ensure that "project" DatasetType has no subject folders Idea from @effigies while discussing this PR at BIDS Maintainers meeting 2025 --- src/schema/rules/checks/dataset.yaml | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/src/schema/rules/checks/dataset.yaml b/src/schema/rules/checks/dataset.yaml index a33004e4d1..14c7d548a1 100644 --- a/src/schema/rules/checks/dataset.yaml +++ b/src/schema/rules/checks/dataset.yaml @@ -14,6 +14,18 @@ SubjectFolders: checks: - length(dataset.subjects.sub_dirs) > 0 +NoSubjectFolders: + issue: + code: NOSUBJECT_FOLDERS + message: | + There must be no subject directories (labeled "sub-*") in the root of the "project" type BIDS dataset. 
+ level: error + selectors: + - path == '/dataset_description.json' + - dataset.dataset_description.DatasetType == "project" + checks: + - length(dataset.subjects.sub_dirs) == 0 + # 49 ParticipantIDMismatch: issue: From adfcc79bdd21f73519a9fe97ad51420ddf40c684 Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Tue, 10 Jun 2025 08:43:08 -0400 Subject: [PATCH 4/8] Make NoSubjectFolders into a "warning" from "error" to have aligned with SubjectFolders check Also adjusted wording to be aligned too --- src/schema/rules/checks/dataset.yaml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/schema/rules/checks/dataset.yaml b/src/schema/rules/checks/dataset.yaml index 14c7d548a1..2c1851a230 100644 --- a/src/schema/rules/checks/dataset.yaml +++ b/src/schema/rules/checks/dataset.yaml @@ -18,8 +18,8 @@ NoSubjectFolders: issue: code: NOSUBJECT_FOLDERS message: | - There must be no subject directories (labeled "sub-*") in the root of the "project" type BIDS dataset. - level: error + There should be no subject directories (labeled "sub-*") in the root of this project BIDS dataset. + level: warning selectors: - path == '/dataset_description.json' - dataset.dataset_description.DatasetType == "project" From 315c08fb40d4bde609a3d28d470d568b5e7b7f70 Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Thu, 12 Jun 2025 04:23:43 -0400 Subject: [PATCH 5/8] Rename "project" to "study" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit While discussing with @jbpoline we wondered, if may be `study` would be a better descriptor to use here in favor of `project`. One of the rationales, is that e.g. in [BEP035](https://bids.neuroimaging.io/extensions/beps/bep_035.html) (attn @bids-standard/bep035) on Mega-analysis they introduce `study-` entity as a groupping element. It kinda then would match natively. 
We also mention "study" in various places in BIDS, which seems to align nicely here ```shell ❯ git grep study src/CHANGES.md:- \[FIX] update physio bids name in longitudinal study page examples [#863](https://github.com/bids-standard/bids-specification/pull/863) ([Remi-Gau](https://github.com/Remi-Gau)) src/appendices/coordinate-systems.md:The following template identifiers are RECOMMENDED for individual- and study-specific reference src/appendices/coordinate-systems.md:In the case of multiple study templates, additional names may need to be defined. src/appendices/coordinate-systems.md:| study | Custom space defined using a group/study-specific template. This coordinate system requires specifying an additional file to be fully defined. | src/appendices/hed.md:numerical values that are similar across the recordings in the study. src/appendices/hed.md:repository on GitHub should be used to validate the study event annotations. src/common-principles.md: unless when appropriate given the study goals, for example, when scanning babies. src/introduction.md:> The data used in the study were organized using the src/modality-specific-files/genetic-descriptor.md: "Dataset": "https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001364.v1.p1", src/modality-specific-files/intracranial-electroencephalography.md:Note that the date and time information SHOULD be stored in the study key file src/modality-specific-files/magnetic-resonance-spectroscopy.md:acquisition parameters in filenames is helpful or necessary to distinguish datasets in a given study. src/modality-specific-files/motion.md:Note that the onsets of the recordings SHOULD be stored in the study key file [(`scans.tsv`)](../modality-agnostic-files.md#scans-file).
src/modality-specific-files/positron-emission-tomography.md:This entity is OPTIONAL if only one tracer is used in the study, src/modality-specific-files/task-events.md:Please mind that this does not imply that only so called "event related" study designs src/schema/objects/common_principles.yaml: A set of neuroimaging and behavioral data acquired for a purpose of a particular study. src/schema/objects/common_principles.yaml: Session can (but doesn't have to) be synonymous to a visit in a longitudinal study. src/schema/objects/common_principles.yaml: A person or animal participating in the study. src/schema/objects/entities.yaml: For example, this should be used when a study includes two T1w images - src/schema/objects/entities.yaml: Session can (but doesn't have to) be synonymous to a visit in a longitudinal study. src/schema/objects/entities.yaml: A person or animal participating in the study. src/schema/objects/enums.yaml:study: src/schema/objects/enums.yaml: value: study src/schema/objects/enums.yaml: display_name: study src/schema/objects/enums.yaml: Custom space defined using a group/study-specific template. src/schema/objects/metadata.yaml: Reference to the study/studies on which the implementation is based. src/schema/objects/metadata.yaml: The version of the HED schema used to validate HED tags for study. tools/schemacode/src/bidsschematools/tests/data/broken_dataset_description.json:"EthicsApprovals": ["The original study from which this BIDS example dataset was derived was approved by the Ethics committee of Ghent University Hospital with identifier EC 2017/1103."] ``` and the existing mentions of "project" are not particularly aligned. So I think we should just make it a "study", hence the renaming.
--- src/common-principles.md | 8 ++++---- src/schema/objects/enums.yaml | 12 ++++++------ src/schema/objects/metadata.yaml | 2 +- src/schema/rules/checks/dataset.yaml | 4 ++-- 4 files changed, 13 insertions(+), 13 deletions(-) diff --git a/src/common-principles.md b/src/common-principles.md index 3321c7c491..e86d9042f2 100644 --- a/src/common-principles.md +++ b/src/common-principles.md @@ -400,9 +400,9 @@ In particular, if a BIDS dataset contains a `derivatives/` subdirectory, the contents of that directory may be a heterogeneous mix of BIDS Derivatives datasets and non-compliant derivatives. -## Project dataset +## Study dataset -BIDS allows one to organize the data for the entire project (original source data, raw BIDS, derivatives) as a valid BIDS dataset in the following way +BIDS allows one to organize the data for the entire study (original source data, raw BIDS, derivatives) as a valid BIDS dataset in the following way {{ MACROS___make_filetree_example( { - "my_project-1": { + "study-1": { "sourcedata": { "dicoms": {}, "raw": { @@ -438,7 +438,7 @@ In this example, `sourcedata/dicoms` is not nested inside The subdirectories of `derivatives` MAY be BIDS-compliant derivatives datasets (see [Non-compliant derivatives](#non-compliant-derivatives) for further discussion). The above example is a fully compliant BIDS dataset, providing a convention useful for organizing source, raw BIDS, and derived BIDS data while maintaining overall BIDS compliance. -When using this convention, `dataset_description.json` MUST have `DatasetType` to be set to `"project"`. It is also RECOMMENDED to set the `SourceDatasets` +When using this convention, `dataset_description.json` MUST have `DatasetType` to be set to `"study"`. 
It is also RECOMMENDED to set the `SourceDatasets` field in `dataset_description.json` of each subdirectory of `derivatives` to: ```JSON diff --git a/src/schema/objects/enums.yaml b/src/schema/objects/enums.yaml index 7075d00be0..388fdd29cf 100644 --- a/src/schema/objects/enums.yaml +++ b/src/schema/objects/enums.yaml @@ -360,7 +360,7 @@ individual: In context of surfaces this space has been referred to as `fsnative`. In order for this space to be interpretable, `SpatialReference` metadata MUST be provided. -study: +study__space: value: study display_name: study description: | @@ -368,6 +368,11 @@ study: This coordinate system requires specifying an additional file to be fully defined. In order for this space to be interpretable, `SpatialReference` metadata MUST be provided. +study__datasettype: + value: study + display_name: study + description: | + A study BIDS dataset. scanner: value: scanner display_name: scanner @@ -1319,11 +1324,6 @@ derivative: display_name: derivative description: | A derived BIDS dataset. -project: - value: project - display_name: project - description: | - A project BIDS dataset. 
balanced: value: balanced display_name: balanced diff --git a/src/schema/objects/metadata.yaml b/src/schema/objects/metadata.yaml index 3d0c606b88..a229178c0d 100644 --- a/src/schema/objects/metadata.yaml +++ b/src/schema/objects/metadata.yaml @@ -583,7 +583,7 @@ DatasetType: enum: - $ref: objects.enums.raw.value - $ref: objects.enums.derivative.value - - $ref: objects.enums.project.value + - $ref: objects.enums.study__datasetype.value DecayCorrectionFactor: name: DecayCorrectionFactor display_name: Decay Correction Factor diff --git a/src/schema/rules/checks/dataset.yaml b/src/schema/rules/checks/dataset.yaml index 2c1851a230..104884ab3c 100644 --- a/src/schema/rules/checks/dataset.yaml +++ b/src/schema/rules/checks/dataset.yaml @@ -18,11 +18,11 @@ NoSubjectFolders: issue: code: NOSUBJECT_FOLDERS message: | - There should be no subject directories (labeled "sub-*") in the root of this project BIDS dataset. + There should be no subject directories (labeled "sub-*") in the root of this study BIDS dataset. 
level: warning selectors: - path == '/dataset_description.json' - - dataset.dataset_description.DatasetType == "project" + - dataset.dataset_description.DatasetType == "study" checks: - length(dataset.subjects.sub_dirs) == 0 From e8c5d84a83305ee7d48e8267ced1a41727e2adb5 Mon Sep 17 00:00:00 2001 From: Chris Markiewicz Date: Thu, 12 Jun 2025 06:03:33 -0400 Subject: [PATCH 6/8] Update src/schema/objects/metadata.yaml --- src/schema/objects/metadata.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/schema/objects/metadata.yaml b/src/schema/objects/metadata.yaml index a229178c0d..a612f91fe3 100644 --- a/src/schema/objects/metadata.yaml +++ b/src/schema/objects/metadata.yaml @@ -583,7 +583,7 @@ DatasetType: enum: - $ref: objects.enums.raw.value - $ref: objects.enums.derivative.value - - $ref: objects.enums.study__datasetype.value + - $ref: objects.enums.study__datasettype.value DecayCorrectionFactor: name: DecayCorrectionFactor display_name: Decay Correction Factor From 1ac57a04604663e58516ef7b2a0f4bcaaacf9ee5 Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Fri, 13 Jun 2025 15:30:23 -0400 Subject: [PATCH 7/8] Remove divergence: do alert about absent subjects in any "non-study" dataset Well, any BIDS dataset is a "study" dataset, but the point here is that, at the moment, for both "raw" and "derivative" types we expect to have sub- folders, and that was the prior behavior, which should not be affected by this PR.
This should address the review comment of @effigies https://github.com/bids-standard/bids-specification/pull/1972/files#r2142639988 --- src/schema/rules/checks/dataset.yaml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/schema/rules/checks/dataset.yaml b/src/schema/rules/checks/dataset.yaml index 104884ab3c..5fbf91c4a6 100644 --- a/src/schema/rules/checks/dataset.yaml +++ b/src/schema/rules/checks/dataset.yaml @@ -6,11 +6,11 @@ SubjectFolders: issue: code: SUBJECT_FOLDERS message: | - There are no subject directories (labeled "sub-*") in the root of this raw BIDS dataset. + There are no subject directories (labeled "sub-*") in the root of this BIDS dataset. level: warning selectors: - path == '/dataset_description.json' - - dataset.dataset_description.DatasetType == "raw" + - dataset.dataset_description.DatasetType != "study" checks: - length(dataset.subjects.sub_dirs) > 0 From 02a80743fcc8a9432313b47838cd7765275886a8 Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Fri, 8 Aug 2025 13:05:28 -0400 Subject: [PATCH 8/8] Clarify description of "study BIDS dataset" Co-authored-by: Kabilar Gunalan --- src/schema/objects/enums.yaml | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/schema/objects/enums.yaml b/src/schema/objects/enums.yaml index 388fdd29cf..75251d945a 100644 --- a/src/schema/objects/enums.yaml +++ b/src/schema/objects/enums.yaml @@ -372,7 +372,8 @@ study__datasettype: value: study display_name: study description: | - A study BIDS dataset. + A study BIDS dataset to organize the data for the entire study (original source data, raw BIDS, + derivatives) as a valid BIDS dataset. scanner: value: scanner display_name: scanner
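
As an illustration only (not part of this patch series and not the BIDS validator's implementation), the end state of the two `dataset.yaml` checks after PATCH 7/8 can be sketched in Python. The function names `subject_folder_warnings` and `make_dataset` are hypothetical, the sketch assumes that a missing `DatasetType` is treated as `"raw"`, and it approximates `dataset.subjects.sub_dirs` by listing `sub-*` directories directly under the dataset root.

```python
import json
import tempfile
from pathlib import Path


def subject_folder_warnings(root: Path) -> list[str]:
    """Return the issue codes that the two schema checks would raise for `root`."""
    description = json.loads((root / "dataset_description.json").read_text())
    # Assumption: an absent DatasetType is handled as "raw".
    dataset_type = description.get("DatasetType", "raw")
    # Approximation of dataset.subjects.sub_dirs: sub-* directories in the root only.
    sub_dirs = [p for p in root.iterdir() if p.is_dir() and p.name.startswith("sub-")]

    codes = []
    # SubjectFolders: any non-study dataset is expected to have subject directories.
    if dataset_type != "study" and len(sub_dirs) == 0:
        codes.append("SUBJECT_FOLDERS")
    # NoSubjectFolders: a study dataset should have none in its root.
    if dataset_type == "study" and len(sub_dirs) > 0:
        codes.append("NOSUBJECT_FOLDERS")
    return codes


def make_dataset(root: Path, dataset_type: str, subjects: tuple[str, ...] = ()) -> Path:
    """Create a toy dataset directory (hypothetical helper for this sketch)."""
    root.mkdir(parents=True, exist_ok=True)
    # Only DatasetType is written here; a real file also carries other required fields.
    (root / "dataset_description.json").write_text(json.dumps({"DatasetType": dataset_type}))
    for subject in subjects:
        (root / subject).mkdir()
    return root


with tempfile.TemporaryDirectory() as tmp:
    study = make_dataset(Path(tmp) / "study-1", "study")
    raw = make_dataset(study / "sourcedata" / "raw", "raw", ("sub-01", "sub-02"))
    print(subject_folder_warnings(study))  # study root without sub-* directories
    print(subject_folder_warnings(raw))    # nested raw dataset with subjects
```

Both calls return an empty list for the layout shown in the "Study dataset" section, because the `sub-*` directories live under `sourcedata/raw` rather than in the study root.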