Commit 6938a3b

Merge pull request #159 from bclenet/spec
Provenance description levels
2 parents aa81f81 + 3a0a832 commit 6938a3b

1 file changed (+69, -125 lines)

bep028spec.md

Lines changed: 69 additions & 125 deletions
@@ -443,9 +443,6 @@ BIDS-Prov introduces the following entity:
 In the following example, two separated processings (`conversion` and `smoothing`) were performed on the data, resulting in two groups of provenance records.
 ```
 └─ dataset
-   ├─ sub-001/
-   │  └─ prov/
-   │     └─ sub-001_prov-smoothing_act.json
    └─ prov/
       ├─ prov-conversion_all.jsonld
       ├─ prov-smoothing_base.json
@@ -527,20 +524,48 @@ This section describes the places where BIDS-Prov metadata can be stored.

 For further information about organization conventions, please consult the BIDS specification ([https://bids-specification.readthedocs.io](https://bids-specification.readthedocs.io)). Until these conventions are established in BIDS, it is RECOMMENDED to use the following.

-BIDS-Prov metadata can be stored at different levels:
-* at dataset level ;
-* inside dataset subdirectories ;
-* at file level.
+BIDS-Prov metadata can be stored in different places:
+* inside a top-level `prov/` directory;
+* inside sidecar JSON files;
+* inside the `dataset_description.json` file.

-It is recommended that the records are stored at the level they describe. E.g.:
-* an Activity that generated as set of files for several subjects of the dataset must be described at the dataset level ;
-* an Activity that generated as set of files for one subject only must be described at the subject's subdirectory level ;
-* an Activity that generated one file only can be described at this file's level.
+#### 3.2.1 `prov/` directory

-#### 3.2.1 File level provenance
+BIDS-Prov files can be stored in a `prov/` directory immediately below the BIDS dataset (or BIDS-Derivatives dataset) root. At the dataset level, provenance can be about any BIDS file in the dataset.

-> [!CAUTION]
-> TODO: at this level, how do we know to which provenance group belongs the records in the sidecar JSONs? (As no `prov` entity is used)
+Each BIDS-Prov file MUST meet the following naming convention. The `label` of the `prov` entity is arbitrary, `suffix` is one of listed in [3.1.3 Suffixes](#3-1-3-suffixes), and `extension` is either `json` or `jsonld`
+
+```
+prov/
+    [<subdirectories>/]
+        prov-<label>_<suffix>.<extension>
+```
+
+Here is an example:
+```
+└─ dataset
+   ├─ prov/
+   │  ├─ dcm2niix/
+   │  │  └─ prov-dcm2niix_base.jsonld
+   │  ├─ prov-preprocessing_base.json
+   │  ├─ prov-preprocessing_soft.json
+   │  └─ ...
+   ├─ sub-001/
+   ├─ sub-002/
+   ├─ sub-003/
+   ├─ ...
+   └─ dataset_description.json
+```
+
+> [!WARNING]
+> When using `.json` files, the `@context` and `BIDSProvVersion` fields MUST be defined inside a `*_base.json` file, e.g.:
+> ```JSON
+> {
+>   "@context": "https://purl.org/nidash/bidsprov/context.json",
+>   "BIDSProvVersion": "0.0.1"
+> }
+
+#### 3.2.2 File level provenance

 BIDS-Prov provenance metadata can be stored inside the sidecar JSON of any BIDS file (or BIDS-Derivatives file) it applies to.
 In this case, the BIDS-Prov content only refers to the associated data file.
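The `prov/` naming convention added in this hunk can be checked with a regular expression. A minimal Python sketch, assuming paths are given relative to the dataset root, that `<label>` and `<suffix>` are alphanumeric (as for other BIDS entity labels), and that the accepted suffixes are only those appearing in this commit's examples (`all`, `base`, `act`, `soft`); the authoritative list lives in 3.1.3 Suffixes, which is not part of this diff. `parse_prov_path` is a hypothetical helper, not part of any BIDS tool:

```python
import re
from typing import Dict, Optional

# Assumption: only the suffixes used in this commit's examples; 3.1.3 Suffixes holds the real list.
KNOWN_SUFFIXES = {"all", "base", "act", "soft"}

# prov/[<subdirectories>/]prov-<label>_<suffix>.<extension>
# Assumption: <label> and <suffix> are alphanumeric, as for other BIDS entity labels.
PROV_PATH = re.compile(
    r"^prov/(?:[^/]+/)*"
    r"prov-(?P<label>[A-Za-z0-9]+)_(?P<suffix>[A-Za-z0-9]+)\.(?P<extension>json|jsonld)$"
)


def parse_prov_path(relpath: str) -> Optional[Dict[str, str]]:
    """Return label, suffix and extension if relpath follows the convention, else None."""
    match = PROV_PATH.match(relpath)
    if match is None or match.group("suffix") not in KNOWN_SUFFIXES:
        return None
    return match.groupdict()


if __name__ == "__main__":
    for path in [
        "prov/dcm2niix/prov-dcm2niix_base.jsonld",
        "prov/prov-preprocessing_soft.json",
        "sub-001/prov/sub-001_prov-dcm2niix_act.json",  # per-subject layout removed by this commit
    ]:
        print(path, "->", parse_prov_path(path))
```

Returning `None` rather than raising keeps the helper usable as a filter over a directory listing.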
@@ -564,23 +589,11 @@ The sidecar JSON naming convention is already defined by BIDS. Here is an exampl
    └─ dataset_description.json
 ```

-Inside the sidecar JSON, the `GenearatedBy` field must describe the `Activity` that generated the data file, either with a reference to an existing `Id`:
-
-```JSON
-{
-  "GeneratedBy": "urn:conversion-00f3a18f",
-}
-```
-
-or with a complete description of the `Activity` if it was not described elsewhere.
+Inside the sidecar JSON, the `GenearatedBy` field must describe the `Activity` that generated the data file, with a reference to an existing `Id`:

 ```JSON
 {
-  "GeneratedBy": {
-    "Id": "urn:conversion-00f3a18f",
-    "Label": "Conversion",
-    "Command": "convert -i raw_file.ext -o sub-001_ses-01_T1w.nii.gz"
-  }
+  "GeneratedBy": "bids::prov#conversion-00f3a18f",
 }
 ```

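After this hunk, a sidecar `GeneratedBy` value is only a reference to the `Id` of an `Activity` described elsewhere; the inline-object form is removed. A minimal Python sketch of the corresponding check, where `known_ids` is a hypothetical set of record IRIs collected beforehand from the `prov/` files and `check_generated_by` is a hypothetical helper. Note that the new sidecar example keeps a trailing comma after its only member, which strict JSON parsers such as Python's `json` module reject, so the sketch parses the object without it:

```python
import json
from typing import Set


def check_generated_by(sidecar_text: str, known_ids: Set[str]) -> str:
    """Return the Activity IRI referenced by a sidecar JSON, or raise if it is not a valid reference."""
    sidecar = json.loads(sidecar_text)  # raises json.JSONDecodeError on trailing commas
    ref = sidecar.get("GeneratedBy")
    if ref is None:
        raise KeyError("sidecar has no GeneratedBy field")
    if not isinstance(ref, str):
        # Inline Activity descriptions are no longer accepted in sidecar JSONs.
        raise TypeError("GeneratedBy must be a reference to an existing Id")
    if ref not in known_ids:
        raise LookupError(f"GeneratedBy refers to an unknown Id: {ref}")
    return ref


if __name__ == "__main__":
    ids = {"bids::prov#conversion-00f3a18f"}  # hypothetical: gathered from prov/ files
    print(check_generated_by('{"GeneratedBy": "bids::prov#conversion-00f3a18f"}', ids))
```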
@@ -590,94 +603,14 @@ If the `SidecarGenearatedBy` field is not defined, BIDS-Prov assumes that the si
 No other field is allowed to describe provenance inside sidecar JSONs.

 > [!WARNING]
-> When using sidecar JSON files to describe provenance, the `@context` and `BIDSProvVersion` fields MUST be defined inside a `*_base.json` file, e.g.:
+> When using sidecar JSON files to describe provenance, the `@context` and `BIDSProvVersion` fields MUST be defined inside a `prov/prov-<label>_base.json` file, e.g.:
 > ```JSON
 > {
 >   "@context": "https://purl.org/nidash/bidsprov/context.json",
 >   "BIDSProvVersion": "0.0.1"
 > }

-#### 3.2.2 Subdirectories level provenance
-
-BIDS-Prov files can be stored in a `prov/` directory in any subdirectory of the dataset (or BIDS-Derivatives directories).
-
-In this case, the provenance metadata applies to the data files inside or below in the directory tree ; as stated by [BIDS common principles](https://bids-specification.readthedocs.io/en/stable/common-principles.html#filesystem-structure).
-
-Each BIDS-Prov file MUST meet the following naming convention. The `label` of the `prov` entity is arbitrary, `suffix` is one of listed in [3.3.1 Suffixes](#3-1-3-suffixes), and `extension` is either `json` or `jsonld`.
-
-```
-sub-<label>/
-    [ses-<label>/]
-        prov/
-            sub-<label>[_ses-<label>]_prov-<label>_<suffix>.<extension>
-```
-
-> [!TIP]
-> Here is an example dataset tree:
-> ```
-> └─ dataset
->    ├─ prov/
->    │  └─ prov-dcm2niix_base.json
->    ├─ sub-001/
->    │  ├─ prov/
->    │  │  └─ sub-001_prov-dcm2niix_act.json
->    │  └─ ses-01/
->    │     ├─ prov/
->    │     │  └─ sub-001_ses-01_prov-dcm2niix_act.json
->    │     └─ ...
->    ├─ sub-002/
->    │  ├─ prov/
->    │  │  └─ sub-002_prov-dcm2niix_act.json
->    │  └─ ...
->    ├─ ...
->    └─ dataset_description.json
->```
-
-> [!WARNING]
-> When using `.json` files, the `@context` and `BIDSProvVersion` fields MUST be defined inside a `*_base.json` file, e.g.:
-> ```JSON
-> {
->   "@context": "https://purl.org/nidash/bidsprov/context.json",
->   "BIDSProvVersion": "0.0.1"
-> }
-
-#### 3.2.3 Dataset level provenance - `prov/` directory
-
-BIDS-Prov files can be stored in a `prov/` directory immediately below the BIDS dataset (or BIDS-Derivatives dataset) root. At the dataset level, provenance can be about any BIDS file in the dataset.
-
-Each BIDS-Prov file MUST meet the following naming convention. The `label` of the `prov` entity is arbitrary, `suffix` is one of listed in [3.1.3 Suffixes](#3-1-3-suffixes), and `extension` is either `json` or `jsonld`
-
-```
-prov/
-    [<subdirectories>/]
-        prov-<label>_<suffix>.<extension>
-```
-
-Here is an example:
-```
-└─ dataset
-   ├─ prov/
-   │  ├─ dcm2niix/
-   │  │  └─ prov-dcm2niix_base.jsonld
-   │  ├─ prov-preprocessing_base.json
-   │  ├─ prov-preprocessing_soft.json
-   │  └─ ...
-   ├─ sub-001/
-   ├─ sub-002/
-   ├─ sub-003/
-   ├─ ...
-   └─ dataset_description.json
-```
-
-> [!WARNING]
-> When using `.json` files, the `@context` and `BIDSProvVersion` fields MUST be defined inside a `*_base.json` file, e.g.:
-> ```JSON
-> {
->   "@context": "https://purl.org/nidash/bidsprov/context.json",
->   "BIDSProvVersion": "0.0.1"
-> }
-
-#### 3.2.4 Dataset level provenance - `dataset_description.json` file
+#### 3.2.3 Dataset level provenance - `dataset_description.json` file

 > [!CAUTION]
 > TODO: how do we know to which provenance group belongs the records in the `dataset_description.json`? (As no `prov` entity is used)
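The warnings kept in this hunk require `@context` and `BIDSProvVersion` to be defined in a `*_base.json` file whenever provenance is written as `.json`, and the sidecar warning now names `prov/prov-<label>_base.json` explicitly. A minimal Python sketch of that check, assuming the `prov/` files have already been loaded into a dictionary mapping relative paths to parsed JSON; `missing_base_fields` is hypothetical, and it ignores `.jsonld` files on the assumption that they carry their own `@context`:

```python
import re
from typing import Dict, List, Set

BASE_KEYS = ("@context", "BIDSProvVersion")
# prov-<label>_<suffix>.json (assumption: alphanumeric label and suffix; .jsonld is not matched)
JSON_NAME = re.compile(r"(?:^|/)prov-(?P<label>[A-Za-z0-9]+)_(?P<suffix>[A-Za-z0-9]+)\.json$")


def missing_base_fields(files: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each prov label used by .json files to the base fields its *_base.json lacks."""
    bases: Dict[str, dict] = {}
    labels: Set[str] = set()
    for path, content in files.items():
        match = JSON_NAME.search(path)
        if match is None:
            continue  # .jsonld files and non-conforming names are skipped
        labels.add(match.group("label"))
        if match.group("suffix") == "base":
            bases[match.group("label")] = content
    problems: Dict[str, List[str]] = {}
    for label in sorted(labels):
        base = bases.get(label, {})  # no base file at all: every key is missing
        missing = [key for key in BASE_KEYS if key not in base]
        if missing:
            problems[label] = missing
    return problems


if __name__ == "__main__":
    files = {
        "prov/prov-dcm2niix_base.json": {
            "@context": "https://purl.org/nidash/bidsprov/context.json",
            "BIDSProvVersion": "0.0.1",
        },
        "prov/prov-dcm2niix_act.json": {},
        "prov/prov-smoothing_act.json": {},  # no prov-smoothing_base.json
    }
    print(missing_base_fields(files))  # {'smoothing': ['@context', 'BIDSProvVersion']}
```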
@@ -691,7 +624,7 @@ Here is an example of a `GeneratedByProv` field containing a complete descriptio
 ```JSON
 {
   "GeneratedByProv": {
-    "Id": "bids::#conversion-00f3a18f",
+    "Id": "bids::prov#conversion-00f3a18f",
     "Label": "Dicom to Nifti conversion",
     "Command": "dcm2niix -o . -f sub-%i/anat/sub-%i_T1w sourcedata/dicoms",
     "AssociatedWith": {
@@ -713,7 +646,7 @@ Here is an example of a `GeneratedByProv` field containing the IRI of an `Entity

 ```JSON
 {
-  "GeneratedByProv": "bids::#conversion-00f3a18f"
+  "GeneratedByProv": "bids::prov#conversion-00f3a18f"
 }
 ```

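This hunk and the previous one show the two forms a `GeneratedByProv` value can take in `dataset_description.json`: a complete record carrying an `Id`, or a bare IRI string, both now using the `bids::prov#` prefix. A minimal Python sketch that reduces either form to the referenced IRI; `generated_by_prov_id` is a hypothetical helper:

```python
import json
from typing import Union


def generated_by_prov_id(value: Union[str, dict]) -> str:
    """Return the IRI referenced or described by a GeneratedByProv value."""
    if isinstance(value, str):
        return value  # reference form: "GeneratedByProv": "bids::prov#..."
    if isinstance(value, dict) and isinstance(value.get("Id"), str):
        return value["Id"]  # complete description form: {"Id": ..., "Label": ..., ...}
    raise ValueError("GeneratedByProv must be an IRI string or an object with an Id")


if __name__ == "__main__":
    as_reference = json.loads('{"GeneratedByProv": "bids::prov#conversion-00f3a18f"}')
    as_record = json.loads(
        '{"GeneratedByProv": {"Id": "bids::prov#conversion-00f3a18f",'
        ' "Label": "Dicom to Nifti conversion"}}'
    )
    for description in (as_reference, as_record):
        print(generated_by_prov_id(description["GeneratedByProv"]))
```

Reducing both forms to one IRI lets a reader of `dataset_description.json` look the record up in the same index, whichever form the dataset author chose.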
@@ -732,22 +665,19 @@ BIDS-Prov recommends the following conventions in order to have consistent, huma
 IRIs identifying `Activity`, `Agent`, and `Environment` provenance records inside files stored in a directory `<directory>` relatively to a BIDS dataset `<dataset>` SHOULD have the following form, where `<label>` is a human readable label for the record and `<uid>` is a unique group of chars:

 ```
-bids:<dataset>:<directory>#<name>-<uid>
+bids:<dataset>:prov#<name>-<uid>
 ```

 Here are a few naming examples:
-* `bids:ds001734:prov#conversion-xfMMbHK1`: an `Activity` described at dataset level inside the `ds001734` dataset;
-* `bids::sub-001/prov#dcm2niix-70ug8pl5"`: an `Agent` described at subject level inside the current dataset ;
-* `bids::prov#fedora-uldfv058"`: an `Environment` described at dataset level inside the current dataset.
+* `bids:ds001734:prov#conversion-xfMMbHK1`: an `Activity` described inside the `ds001734` dataset;
+* `bids::prov#fedora-uldfv058"`: an `Environment` described inside the current dataset.

 IRI identifying `Entity` provenance records for a file `<file>` relatively to a BIDS dataset `<dataset>` SHOULD have the following form:

 ```
 bids:<dataset>:<file>
 ```

-derivatives/fmriprep/sub-001/func/sub-001_task-MGT_run-01_bold_space-MNI152NLin2009cAsym_preproc.nii.gz
-
 Here are a few naming examples:
 * `bids:ds001734:sub-002/anat/sub-02_T1w.nii`: an `Entity` describing a T1w file for subject `sub-002` in the `ds001734` dataset ;
 * `bids:derivatives:fmriprep/sub-001/func/sub-001_task-MGT_run-01_bold_space-MNI152NLin2009cAsym_preproc.nii.gz`: an `Entity` describing a bold file for subject `sub-001` in the `derivatives` dataset.
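With this hunk, record IRIs always take the form `bids:<dataset>:prov#<name>-<uid>`, while `Entity` IRIs take the form `bids:<dataset>:<file>`. A minimal Python sketch that splits either form and mints record IRIs, assuming an empty `<dataset>` means the current dataset (as in `bids::prov#...`) and that `<uid>` is 8 random lowercase letters and digits; the text only asks for "a unique group of chars", so the uid format is an assumption. `BidsIri`, `parse_bids_iri` and `make_record_iri` are hypothetical names:

```python
import secrets
import string
from typing import NamedTuple, Optional


class BidsIri(NamedTuple):
    dataset: str             # "" means the current dataset ("bids::...")
    path: str                # "prov" for records, a relative file path for Entities
    fragment: Optional[str]  # "<name>-<uid>" for records, None for Entities


def parse_bids_iri(iri: str) -> BidsIri:
    """Split a bids:<dataset>:<path>[#<fragment>] IRI into its parts."""
    if not iri.startswith("bids:"):
        raise ValueError(f"not a bids: IRI: {iri!r}")
    dataset, sep, rest = iri[len("bids:"):].partition(":")
    if not sep:
        raise ValueError(f"missing ':' after the dataset name: {iri!r}")
    path, hash_sign, fragment = rest.partition("#")
    return BidsIri(dataset, path, fragment if hash_sign else None)


def make_record_iri(name: str, dataset: str = "", uid_length: int = 8) -> str:
    """Build bids:<dataset>:prov#<name>-<uid> with a random uid (assumed lowercase alphanumeric)."""
    alphabet = string.ascii_lowercase + string.digits
    uid = "".join(secrets.choice(alphabet) for _ in range(uid_length))
    return f"bids:{dataset}:prov#{name}-{uid}"


if __name__ == "__main__":
    print(parse_bids_iri("bids:ds001734:prov#conversion-xfMMbHK1"))
    print(parse_bids_iri("bids:ds001734:sub-002/anat/sub-02_T1w.nii"))
    print(make_record_iri("conversion"))
```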
@@ -760,17 +690,18 @@ Here is another example that considers the following dataset:
    │  └─ dicoms/
    │     └─ ...
    ├─ sub-001/
-   │  ├─ anat/
-   │  │  └─ sub-001_T1W.nii.gz
-   │  └─ prov/
-   │     └─ sub-001_prov-dcm2niix_act.json
+   │  └─ anat/
+   │     ├─ sub-001_T1w.nii.gz
+   │     └─ sub-001_T1w.json
    ├─ ...
    └─ prov/
+      ├─ prov-dcm2niix_act.json
       ├─ prov-dcm2niix_base.json
       └─ prov-dcm2niix_soft.json
 ```

 IRIs of provenance records defined in `prov/prov-dcm2niix_soft.json` should start with `bids:dataset:prov#` or `bids::prov#`.
+
 ```JSON
 {
   "bids:dataset:prov#dcm2niix-70ug8pl5": {
@@ -780,17 +711,26 @@ IRIs of provenance records defined in `prov/prov-dcm2niix_soft.json` should star
 }
 ```

-This `Agent` can be referred to in the `sub-001/prov/sub-001_prov-dcm2niix_act.json` file:
+The previously described `Agent` can be referred to in the `prov/prov-dcm2niix_act.json` file:
+
 ```JSON
 {
-  "bids:dataset:sub-001/prov#conversion-00f3a18f": {
+  "bids:dataset:prov#conversion-00f3a18f": {
     "Label": "Conversion",
     "Command": "dcm2niix -o . -f sub-%i/anat/sub-%i_T1w sourcedata/dicoms",
     "AssociatedWith": "bids:dataset:prov#dcm2niix-70ug8pl5"
   }
 }
 ```

+The previously described `Activity` can be referred to in the `sub-001/anat/sub-001_T1w.json` sidecar JSON file:
+
+```JSON
+{
+  "GeneratedBy":"bids:dataset:prov#conversion-00f3a18f"
+}
+```
+
 ## 4. Examples

 A list of examples for BIDS-Prov are available in https://github.com/bids-standard/BEP028_BIDSprov/tree/master/examples
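After this hunk, the worked example forms a chain of references: the sidecar's `GeneratedBy` points to the `Activity` in `prov/prov-dcm2niix_act.json`, whose `AssociatedWith` points to the `Agent` in `prov/prov-dcm2niix_soft.json`. A minimal Python sketch that follows this chain in memory, assuming that `bids::` is shorthand for the current dataset (named `dataset` here, as in the example IRIs) and that each `prov/` file is an object keyed by record IRI, as in the snippets above. The fields of the `Agent` record are not shown in the diff, so the `Label` used for it below is hypothetical, as are the helper names:

```python
from typing import Dict

# In-memory copies of the example's records, keyed by IRI as in the JSON snippets.
SOFT = {  # prov/prov-dcm2niix_soft.json
    "bids:dataset:prov#dcm2niix-70ug8pl5": {"Label": "dcm2niix"},  # hypothetical fields
}
ACT = {  # prov/prov-dcm2niix_act.json
    "bids:dataset:prov#conversion-00f3a18f": {
        "Label": "Conversion",
        "Command": "dcm2niix -o . -f sub-%i/anat/sub-%i_T1w sourcedata/dicoms",
        "AssociatedWith": "bids:dataset:prov#dcm2niix-70ug8pl5",
    },
}
SIDECAR = {"GeneratedBy": "bids:dataset:prov#conversion-00f3a18f"}  # sub-001/anat/sub-001_T1w.json


def normalize(iri: str, dataset: str) -> str:
    """Rewrite the current-dataset shorthand bids::... as bids:<dataset>:... (assumed equivalence)."""
    if iri.startswith("bids::"):
        return f"bids:{dataset}:{iri[len('bids::'):]}"
    return iri


def build_index(*record_files: Dict[str, dict], dataset: str = "dataset") -> Dict[str, dict]:
    """Merge records from several prov/ files into one IRI -> record map."""
    index: Dict[str, dict] = {}
    for records in record_files:
        for iri, record in records.items():
            key = normalize(iri, dataset)
            if key in index:
                raise ValueError(f"record defined twice: {key}")
            index[key] = record
    return index


def provenance_of(sidecar: dict, index: Dict[str, dict], dataset: str = "dataset") -> dict:
    """Follow GeneratedBy -> Activity -> AssociatedWith -> Agent."""
    activity_iri = normalize(sidecar["GeneratedBy"], dataset)
    activity = index[activity_iri]
    agent_iri = normalize(activity["AssociatedWith"], dataset)
    return {
        "activity": activity_iri,
        "command": activity["Command"],
        "agent": agent_iri,
        "agent_record": index[agent_iri],
    }


if __name__ == "__main__":
    print(provenance_of(SIDECAR, build_index(SOFT, ACT)))
```

Normalizing both spellings before indexing means `bids::prov#...` and `bids:dataset:prov#...` resolve to the same record, which is the equivalence the example text relies on.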
@@ -837,7 +777,11 @@ A list of examples for BIDS-Prov are available in https://github.com/bids-standa
 <tr>
 <td><a href="https://github.com/bids-standard/BEP028_BIDSprov/tree/master/examples/dcm2niix/">dcm2niix/</a>
 </td>
-<td>A set of examples describing dicom to nifti conversion using dcm2niix. These aim at showing different ways to organise the exact same provenance records inside a dataset.
+<td>A set of examples describing dicom to nifti conversion using dcm2niix. These aim at showing different ways to organise the exact same provenance records inside a dataset:
+<ul>
+<li><code>dcm2niix_1</code>: all provenance records inside one JSON-LD file at dataset level.</li>
+<li><code>dcm2niix_4</code>: all provenance records inside several JSON files at dataset level, sidecar JSON use references to these files.</li>
+</ul>
 </td>
 </tr>

0 commit comments
