Commit 7b4d26a

committed
Description levels
1 parent c6bebf5 commit 7b4d26a

1 file changed

+109
-21
lines changed

bep028spec.md

Lines changed: 109 additions & 21 deletions
Original file line number | Diff line number | Diff line change
@@ -51,47 +51,135 @@ Note that some level of provenance is already encoded in BIDS (cf. [`GeneratedBy
5151

5252
### 1.3 File naming {#1-3-file-naming}
5353

54-
This document describes the contents of a BIDS Prov file; for naming and organization conventions, please consult the BIDS specification ([https://bids-specification.readthedocs.io](https://bids-specification.readthedocs.io)). Until these conventions are established in BIDS, it is RECOMMENDED to use the following:
54+
This section describes the places where BIDS-Prov contents can be stored; for naming and organization conventions, please consult the BIDS specification ([https://bids-specification.readthedocs.io](https://bids-specification.readthedocs.io)). Until these conventions are established in BIDS, it is RECOMMENDED to use the following.
5555

56-
BIDS-Prov files are JSON-LD files -i.e. a specific type of JSON files that allows encoding graph-like structures with the Resource Description Framework[^1]-
56+
BIDS-Prov files contain JSON or JSON-LD data. JSON-LD is a specific type of JSON that allows encoding graph-like structures with the Resource Description Framework[^1].
5757

58-
They can be stored in two different locations:
58+
They can be stored in different locations:
59+
* at dataset level;
60+
* inside dataset subdirectories;
61+
* at file level.
5962

60-
**File level provenance.** BIDS-Prov files can be stored immediately alongside the BIDS file (or BIDS-Derivatives file) they apply to. Each BIDS-Prov file must meet the following naming convention:
63+
It is recommended that records be stored at the level they describe. For example:
64+
* an Activity that generated a set of files for several subjects of the dataset must be described at the dataset level;
65+
* an Activity that generated a set of files for one subject only must be described at the subject's subdirectory level;
66+
* an Activity that generated one file only can be described at this file's level.
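
The rule above can be illustrated with a minimal Python sketch. The function name `provenance_level` and the returned strings are hypothetical, not part of BIDS-Prov. The sketch also simplifies the rule: it only distinguishes file, subject and dataset levels, and it assumes that the subject is identified by the first `sub-` component of each path.

```python
from pathlib import PurePosixPath


def provenance_level(output_files):
    """Suggest where to store the record of an Activity (hypothetical helper).

    Returns "file", "subject" or "dataset" depending on the files
    the Activity generated.
    """
    files = [PurePosixPath(f) for f in output_files]
    if not files:
        raise ValueError("an Activity must have generated at least one file")
    if len(files) == 1:
        return "file"
    # Assumption: the subject is the first path component starting with "sub-".
    subjects = {
        next((part for part in f.parts if part.startswith("sub-")), None)
        for f in files
    }
    if len(subjects) == 1 and None not in subjects:
        return "subject"
    return "dataset"
```

Session-level or other subdirectory-level records would need a finer comparison of the common parent directory, which this sketch leaves out.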
6167

68+
#### File level provenance
69+
70+
BIDS-Prov provenance metadata can be stored inside the [JSON sidecar of any BIDS file]() (or BIDS-Derivatives file) it applies to.
71+
In this case, the BIDS-Prov content only refers to the associated data file.
72+
The JSON sidecar file must have the following naming convention:
73+
74+
```
75+
sub-<label>/
76+
[ses-<label>/]
77+
sub-<label>[_ses-<label>]_<suffix>.json
6278
```
6379

64-
sub-<label>/[ses-<label>/]sub-<label>[_ses-<label>]_<suffix>_prov.jsonld
65-
prov/<sub_file_path>.prov.jsonld
80+
The `GeneratedBy` field must describe the `Activity` that generated the data file, either with a reference to an existing `Id`:
81+
82+
```JSON
83+
{
84+
"GeneratedBy": "urn:conversion-00f3a18f"
85+
}
6686
```
6787

68-
At the file level, provenance follows some of the same concepts at the dataset level, but is specifically about the current file under consideration.
88+
or with a complete definition of the `Activity` if it was not defined elsewhere.
6989

70-
**Participant level provenance.**
71-
BIDS-Prov files can be stored in a `prov/` folder inside a subfolder. Each BIDS-Prov file must meet the following naming convention:
90+
```JSON
91+
{
92+
"GeneratedBy": {
93+
"Id": "urn:conversion-00f3a18f",
94+
"Label": "Conversion",
95+
"Command": "convert -i raw_file.ext -o sub-001_ses-01_T1w.nii.gz"
96+
}
97+
}
98+
```
7299

100+
No other field is allowed to describe provenance.
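
As an illustration, the two accepted forms of `GeneratedBy` could be told apart with a short Python check. This is only a sketch: the function name `check_generated_by` is hypothetical, and requiring an `Id` key in the inline definition is an assumption made here, not a rule stated by this section.

```python
import json


def check_generated_by(sidecar):
    """Classify the GeneratedBy field of a sidecar (hypothetical helper).

    Returns "reference" when the field is an Id string and "definition"
    when it is an inline Activity object.
    """
    value = sidecar.get("GeneratedBy")
    if isinstance(value, str):
        return "reference"
    # Assumption: an inline Activity definition carries at least an "Id".
    if isinstance(value, dict) and "Id" in value:
        return "definition"
    raise ValueError("GeneratedBy must be an Id or an Activity definition")


# The two sidecar contents shown above, parsed from JSON text.
by_reference = json.loads('{"GeneratedBy": "urn:conversion-00f3a18f"}')
inline = json.loads(
    '{"GeneratedBy": {"Id": "urn:conversion-00f3a18f", "Label": "Conversion",'
    ' "Command": "convert -i raw_file.ext -o sub-001_ses-01_T1w.nii.gz"}}'
)
```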
73101

102+
Here is an example:
74103
```
75-
prov/sub-<label>/[ses-<label>/]sub-<label>[_ses-<label>]_<suffix>_prov.jsonld
76-
prov/sub-<label>/[ses-<label>/]<modality>/sub-<label>[_ses-<label>]_<suffix>_prov.jsonld
77-
prov/<label>_prov.jsonld
104+
└─ example_dataset
105+
├─ sub-001/
106+
│ └─ ses-01/
107+
│ └─ anat/
108+
│ ├─ sub-001_ses-01_T1w.nii.gz
109+
│ └─ sub-001_ses-01_T1w.json
110+
├─ sub-002/
111+
│ └─ ses-01/
112+
│ └─ anat/
113+
│ ├─ sub-002_ses-01_T1w.nii.gz
114+
│ └─ sub-002_ses-01_T1w.json
115+
├─ ...
116+
└─ dataset_description.json
78117
```
79118

80-
Participant-level provenance -- Not related to a given file but all related to a given subject!
119+
#### Subdirectories level provenance
120+
121+
BIDS-Prov files can be stored in a `prov/` directory in any subdirectory of the dataset (or BIDS-Derivatives directories).
81122

82-
We need to think about provenance of BIDS-derivatives !!
123+
In this case, the provenance metadata applies to the data files in that directory or below it in the directory tree, as stated by the [BIDS common principles](https://bids-specification.readthedocs.io/en/stable/common-principles.html#filesystem-structure).
83124

84-
**Dataset level provenance.** BIDS-Prov files can be stored in a `prov/` directory immediately below the BIDS dataset (or BIDS-Derivatives dataset) root. Each BIDS-Prov file must meet the following naming convention:
125+
Each BIDS-Prov file must meet the following naming convention. The `label` of the `prov` entity is arbitrary, and `suffix` is one of those listed in [§ Suffixes](#suffixes).
85126

86127
```
87-
<label>_prov.jsonld
128+
sub-<label>/
129+
[ses-<label>/]
130+
prov/
131+
sub-<label>[_ses-<label>]_prov-<label>_<suffix>.json
88132
```
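
The naming convention above can be checked with a regular expression, as in the following Python sketch. The names `PROV_NAME` and `parse_prov_name` are hypothetical. The pattern assumes that labels are alphanumeric and does not restrict `suffix` to the values listed in § Suffixes.

```python
import re

# Assumption: labels are alphanumeric; the suffix is not checked
# against the list given in § Suffixes.
PROV_NAME = re.compile(
    r"^sub-(?P<sub>[A-Za-z0-9]+)"
    r"(?:_ses-(?P<ses>[A-Za-z0-9]+))?"
    r"_prov-(?P<prov>[A-Za-z0-9]+)"
    r"_(?P<suffix>[A-Za-z0-9]+)\.json$"
)


def parse_prov_name(filename):
    """Return the entities of a BIDS-Prov file name, or None if it does not match."""
    match = PROV_NAME.match(filename)
    return match.groupdict() if match else None
```

For instance, `parse_prov_name("sub-001_ses-01_prov-dcm2niix_act.json")` yields the `sub`, `ses`, `prov` and `suffix` values, while a regular sidecar name such as `sub-001_T1w.json` yields `None`.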
89133

90-
At the dataset level, provenance could be about the dataset itself, or about any BIDS file in the dataset.
134+
Here is an example:
91135

92-
It is RECOMMENDED to place entity (file) related provenance alongside the files where it is possible (i.e. file level provenance). Dataset level provenance may evolve as new data are added, which may include sourcedata, BIDS data, and BIDS derived data. One option is to make use of <code>[https://w3c.github.io/json-ld-syntax/#named-graphs](https://w3c.github.io/json-ld-syntax/#named-graphs)</code>.
136+
```
137+
└─ dataset
138+
├─ sub-001/
139+
│ ├─ prov/
140+
│ │ └─ sub-001_prov-dcm2niix_act.json
141+
│ ├─ ses-01/
142+
│ │ ├─ prov/
143+
│ │ │ └─ sub-001_ses-01_prov-dcm2niix_act.json
144+
│ │ └─ ...
145+
│ ├─ ses-02/
146+
│ └─ ...
147+
├─ sub-002/
148+
│ ├─ prov/
149+
│ │ └─ sub-002_prov-dcm2niix_act.json
150+
│ └─ ...
151+
├─ ...
152+
└─ dataset_description.json
153+
```
93154

94-
Note: since these jsonld documents are graph objects, they can be aggregated using RDF tools without the need to apply the inheritance principle.
155+
#### Dataset level provenance
156+
157+
BIDS-Prov files can be stored in a `prov/` directory immediately below the BIDS dataset (or BIDS-Derivatives dataset) root. At the dataset level, provenance can be about any BIDS file in the dataset.
158+
159+
Each BIDS-Prov file must meet the following naming convention, where `label` can be arbitrary, `suffix` is one of those listed in [§ Suffixes](#suffixes), and `extension` is either `json` or `jsonld`.
160+
161+
```
162+
prov/
163+
[<subdirectories>/]
164+
prov-<label>_<suffix>.<extension>
165+
```
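
A dataset-level path following this convention could be assembled as in the Python sketch below. The function name `dataset_prov_path` and its parameters are hypothetical. The sketch reads the convention as allowing only the `json` and `jsonld` extensions and does not validate `suffix`.

```python
from pathlib import PurePosixPath


def dataset_prov_path(label, suffix, extension="json", subdirectories=()):
    """Build the path of a dataset-level BIDS-Prov file, relative to the dataset root."""
    # Assumption: only these two extensions are accepted at the dataset level.
    if extension not in ("json", "jsonld"):
        raise ValueError("extension must be 'json' or 'jsonld'")
    return PurePosixPath("prov", *subdirectories, f"prov-{label}_{suffix}.{extension}")
```

Called as `dataset_prov_path("dcm2niix", "base", "jsonld", ("dcm2niix",))`, this gives `prov/dcm2niix/prov-dcm2niix_base.jsonld`.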
166+
167+
Here is an example:
168+
169+
```
170+
└─ dataset
171+
├─ prov/
172+
│ ├─ dcm2niix/
173+
│ │ └─ prov-dcm2niix_base.jsonld
174+
│ ├─ prov-preprocessing_base.json
175+
│ ├─ prov-preprocessing_soft.json
176+
│ └─ ...
177+
├─ sub-001/
178+
├─ sub-002/
179+
├─ sub-003/
180+
├─ ...
181+
└─ dataset_description.json
182+
```
95183

96184
### 1.4 Top-level structure {#1-4-top-level-structure}
97185

@@ -239,7 +327,7 @@ A complete schema for the model file to facilitate specification and validation
239327

240328
## 2. Provenance records {#2-provenance-records}
241329

242-
Each provenance record is composed of a set of Activities that represent the transformations that have been applied to the data. Each Activity can use Entities as inputs and outputs. The Agent specifies the software package.
330+
Each provenance record is composed of a set of Activities that represent the transformations that have been applied to the data. Each Activity can use Entities as inputs and outputs. The Agent specifies the software package. Environments specify the software environment in which the provenance record was obtained.
243331

244332
![](img/records.svg)
245333

@@ -389,7 +477,7 @@ Including an Agent record is OPTIONAL. If included, each Agent record has the fo
389477
</tr>
390478
</table>
391479

392-
### 2.4 Environments (Optional) {#2-4-environments-optional}
480+
### 2.4 Environment (Optional) {#2-4-environments-optional}
393481

394482
Information about the environment in which the provenance record was obtained is modeled with an environment record.
395483
