Commit bf9c6b6

committed
Conventions for naming and structure
1 parent c1ca56f commit bf9c6b6

File tree

1 file changed

+113
-76
lines changed

bep028spec.md

Lines changed: 113 additions & 76 deletions
@@ -47,7 +47,7 @@ But provenance comes up in other contexts as well, which might be addressed at a
4747

4848
Provenance can be captured using different mechanisms, but independent of encoding, always reflects transformations by either humans or software. The interpretability of provenance records requires a consistent vocabulary for provenance as well as an expectation for a consistent terminology for the objects being encoded.
4949

50-
Note that some level of provenance is already encoded in BIDS (cf. the [`GeneratedBy`](https://bids-specification.readthedocs.io/en/stable/glossary.html#generatedby-metadata) of the `dataset_description.json` file that contains the metadata of a dataset), this BEP avoids duplicating information already available in sidecar JSONs.
50+
Note that some level of provenance is already encoded in BIDS (cf. the [`GeneratedBy`](https://bids-specification.readthedocs.io/en/stable/glossary.html#generatedby-metadata) of the `dataset_description.json` file that contains the provenance metadata for the dataset). This BEP avoids duplicating information already available in sidecar JSONs.
5151

5252
### 1.3 Provenance format {#1-3-provenance-format}
5353

@@ -56,11 +56,14 @@ BIDS-Prov metadata is written in JSON or JSON-LD.
5656
[JSON-LD](https://www.w3.org/TR/json-ld11/) is a specific type of JSON that allows encoding graph-like structures with the Resource Description Framework[^1].
5757

5858
TODO: written in a single file (JSON-LD) or several JSON files that can be aggregated into one JSON-LD
59+
5960
TODO: BIDS-Prov tools
6061

6162
## 2. Provenance records {#2-provenance-records}
6263

63-
Each provenance record is composed of a set of Activities that represent the transformations that have been applied to the data. Each Activity can use Entities as inputs and outputs. The Agent specifies the software package. Environments specify the software environment in which the provenance record was obtained.
64+
BIDS-Prov metadata consists of a set of records. There are four types of records: `Activity`, `Entity`, `Agent`, and `Environment`.
65+
66+
Activities represent the transformations that have been applied to the data. Each Activity can use Entities as inputs and outputs. The Agent specifies the software package. Environments specify the software environment in which the provenance record was obtained.
6467

6568
![](img/records.svg)
6669

@@ -124,16 +127,16 @@ Each Activity record is a JSON Object with the following fields:
124127
</tr>
125128
</table>
126129

127-
Example of an Activity record:
130+
Here is an example of an Activity record:
128131
```JSON
129132
{
130133
"Id": "bids::prov/#conversion-00f3a18f",
131134
"Label": "Dicom to Nifti conversion",
132135
"Command": "dcm2niix -o . -f sub-%i/anat/sub-%i_T1w sourcedata/dicoms",
133136
"AssociatedWith": "bids::prov/#dcm2niix-khhkm7u1",
134137
"Used": [
135-
"bids::prov/#fedora-uldfv058",
136-
"bids::sourcedata/dicoms"
138+
"bids::prov/#fedora-uldfv058",
139+
"bids::sourcedata/dicoms"
137140
],
138141
"Type": "Activity",
139142
"StartedAtTime": "2025-03-13T10:26:00",
@@ -178,7 +181,7 @@ Each Entity record is a JSON Object with the following fields:
178181
<tr>
179182
<td><code>Type</code>
180183
</td>
181-
<td>OPTIONAL. URI. A term from a controlled vocabulary that more specifically describes the activity.
184+
<td>OPTIONAL. URI. A term from a controlled vocabulary that more specifically describes the entity.
182185
</td>
183186
</tr>
184187
<tr>
@@ -304,24 +307,16 @@ Here is an example of an Environment record:
304307

305308
### 3.1 File naming {#3-1-file-naming}
306309

307-
hence having either a `.json` or a `.jsonld` extension.
308-
When using a `.jsonld` extension, the contents of the file must be JSON-LD.
309-
As JSON-LD is JSON, `*.jsonld` files can contain JSON.
310-
311310
This section describes additions to the BIDS naming conventions for BIDS-Prov files.
312311

313312
For further information about naming conventions, please consult the BIDS specification ([https://bids-specification.readthedocs.io](https://bids-specification.readthedocs.io)). Until these conventions are established in BIDS, it is RECOMMENDED to use the following.
314313

315-
#### File formats
314+
#### 3.1.1 File extensions {#3-1-1-file-extensions}
316315

317316
BIDS-Prov files contain JSON or JSON-LD data and therefore have either a `.json` or a `.jsonld` extension.
317+
When using a `.jsonld` extension, the contents of the file must be JSON-LD. As JSON-LD is JSON, `*.jsonld` files can contain JSON.
318318

319-
[JSON-LD](https://www.w3.org/TR/json-ld11/) is a specific type of JSON that allows encoding graph-like structures with the Resource Description Framework[^1].
320-
321-
When using a `.jsonld` extension, the contents of the file must be JSON-LD.
322-
As JSON-LD is JSON, `*.jsonld` files can contain JSON.
323-
324-
#### The `prov` entity
319+
#### 3.1.2 The `prov` entity {#3-1-2-the-prov-entity}
325320

326321
BIDS-Prov introduces the following entity:
327322

@@ -347,23 +342,68 @@ In the following example, two separated processings (`conversion` and `smoothing
347342
└─ ...
348343
```
349344

350-
#### Suffixes
345+
#### 3.1.3 Suffixes {#3-1-3-suffixes}
351346

352-
The following BIDS suffixes (cf. [Definitions](https://bids-specification.readthedocs.io/en/stable/common-principles.html#definitions)) specify the contents of a provenance file:
347+
The following BIDS suffixes (cf. [Definitions](https://bids-specification.readthedocs.io/en/stable/common-principles.html#definitions)) specify the contents of a provenance file.
353348

354-
* `act`: the file describes BIDS Prov Activities for the group of provenance records
355-
* `soft`: the file describes BIDS Prov Software for the group of provenance records
356-
* `ent`: the file describes BIDS Prov Entities for the group of provenance records
357-
* `env`: the file describes BIDS Prov Environments for the group of provenance records
358-
* `base`: the file describes common BIDS Prov parameters for the group of provenance records (version and context for BIDS Prov)
349+
<table>
350+
<tr>
351+
<td><strong>Suffix</strong>
352+
</td>
353+
<td><strong>Description</strong>
354+
</td>
355+
<td><strong>File extension</strong>
356+
</td>
357+
</tr>
358+
<tr>
359+
<td><code>act</code>
360+
</td>
361+
<td>Activities for the group of provenance records.
362+
</td>
363+
<td><code>.json</code>
364+
</td>
365+
</tr>
366+
<tr>
367+
<td><code>ent</code>
368+
</td>
369+
<td>Entities for the group of provenance records.
370+
</td>
371+
<td><code>.json</code>
372+
</td>
373+
</tr>
374+
<tr>
375+
<td><code>env</code>
376+
</td>
377+
<td>Environments for the group of provenance records.
378+
</td>
379+
<td><code>.json</code>
380+
</td>
381+
</tr>
382+
<tr>
383+
<td><code>base</code>
384+
</td>
385+
<td>Common parameters for the group of provenance records (version and context for BIDS-Prov).
386+
</td>
386+
<td><code>.json</code>
388+
</td>
389+
</tr>
390+
<tr>
391+
<td><code>all</code>
392+
</td>
393+
<td>All records for the group of provenance records.
394+
</td>
395+
<td><code>.jsonld</code>
396+
</td>
397+
</tr>
398+
</table>
359399

360400
### 3.2 Provenance description levels {#3-2-provenance-description-levels}
361401

362-
This section describes the places where BIDS-Prov contents can be stored.
402+
This section describes the places where BIDS-Prov metadata can be stored.
363403

364404
For further information about organization conventions, please consult the BIDS specification ([https://bids-specification.readthedocs.io](https://bids-specification.readthedocs.io)). Until these conventions are established in BIDS, it is RECOMMENDED to use the following.
365405

366-
BIDS-Prov contents can be stored in different locations:
406+
BIDS-Prov metadata can be stored at different levels:
367407
* at dataset level ;
368408
* inside dataset subdirectories ;
369409
* at file level.
@@ -373,41 +413,13 @@ It is recommanded that the records are stored at the level they describe. E.g.:
373413
* an Activity that generated a set of files for one subject only must be described at the subject's subdirectory level ;
374414
* an Activity that generated one file only can be described at this file's level.
375415

376-
#### File level provenance
416+
#### 3.2.1 File level provenance {#3-2-1-file-level-provenance}
377417

378-
BIDS-Prov provenance metadata can be stored inside the [JSON sidecar of any BIDS file]() (or BIDS-Derivatives file) it applies to.
418+
BIDS-Prov provenance metadata can be stored inside the sidecar JSON of any BIDS file (or BIDS-Derivatives file) it applies to.
379419
In this case, the BIDS-Prov content only refers to the associated data file.
380-
The JSON sidecar file must have the following naming convention:
381-
382-
```
383-
sub-<label>/
384-
[ses-<label>/]
385-
sub-<label>[_ses-<label>]_<suffix>.json
386-
```
387-
388-
The `GenearatedBy` field must describe the `Activity` that generated the data file, either with a reference to an existing `Id`:
389-
390-
```JSON
391-
{
392-
"GeneratedBy": "urn:conversion-00f3a18f",
393-
}
394-
```
395-
396-
or with a complete definition of the `Activity` if it was not defined elsewhere.
397-
398-
```JSON
399-
{
400-
"GeneratedBy": {
401-
"Id": "urn:conversion-00f3a18f",
402-
"Label": "Conversion",
403-
"Command": "convert -i raw_file.ext -o sub-001_ses-01_T1w.nii.gz"
404-
}
405-
}
406-
```
407420

408-
No other field is allowed to describe provenance.
421+
The sidecar JSON naming convention is already defined by BIDS. Here is an example dataset tree:
409422

410-
Here is an example:
411423
```
412424
└─ example_dataset
413425
├─ sub-001/
@@ -424,22 +436,47 @@ Here is an example:
424436
└─ dataset_description.json
425437
```
426438

427-
#### Subdirectories level provenance
439+
Inside the sidecar JSON, the `GeneratedBy` field must describe the `Activity` that generated the data file, either with a reference to an existing `Id`:
440+
441+
```JSON
442+
{
443+
"GeneratedBy": "urn:conversion-00f3a18f"
444+
}
445+
```
446+
447+
or with a complete description of the `Activity` if it was not described elsewhere.
448+
449+
```JSON
450+
{
451+
"GeneratedBy": {
452+
"Id": "urn:conversion-00f3a18f",
453+
"Label": "Conversion",
454+
"Command": "convert -i raw_file.ext -o sub-001_ses-01_T1w.nii.gz"
455+
}
456+
}
457+
```
458+
459+
Based on the same principle, the `SidecarGeneratedBy` field can be defined to describe the `Activity` that generated the sidecar JSON file.
460+
If the `SidecarGeneratedBy` field is not defined, BIDS-Prov assumes that the sidecar JSON was generated by the `Activity` described in the `GeneratedBy` field.
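
The fallback rule above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `resolve_activity` and `sidecar_activities`, and the `records_by_id` dictionary (assumed to map each `Id` to a record described elsewhere), are hypothetical.

```python
def resolve_activity(value, records_by_id):
    """Return an Activity record from a reference (str) or an inline description (dict)."""
    if isinstance(value, str):
        # A string is treated as the Id of an Activity described elsewhere.
        return records_by_id[value]
    if isinstance(value, dict):
        # A JSON object is treated as a complete inline description.
        return value
    raise TypeError("expected an Id string or an Activity object")


def sidecar_activities(sidecar, records_by_id):
    """Return (activity for the data file, activity for the sidecar JSON)."""
    data_activity = resolve_activity(sidecar["GeneratedBy"], records_by_id)
    if "SidecarGeneratedBy" in sidecar:
        sidecar_activity = resolve_activity(sidecar["SidecarGeneratedBy"], records_by_id)
    else:
        # Fallback: the sidecar is assumed to come from the same Activity.
        sidecar_activity = data_activity
    return data_activity, sidecar_activity
```

Accepting both a string and an object mirrors the two forms of `GeneratedBy` shown in the JSON examples of this section.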
461+
462+
No other field is allowed to describe provenance inside sidecar JSONs.
463+
464+
#### 3.2.2 Subdirectories level provenance {#3-2-2-subdirectories-level-provenance}
428465

429466
BIDS-Prov files can be stored in a `prov/` directory in any subdirectory of the dataset (or BIDS-Derivatives directories).
430467

431468
In this case, the provenance metadata applies to the data files in that subdirectory and below it in the directory tree, as stated by the [BIDS common principles](https://bids-specification.readthedocs.io/en/stable/common-principles.html#filesystem-structure).
432469

433-
Each BIDS-Prov file must meet the following naming convention. The `label` of the `prov` entity is arbitrary, and `suffix` is one of listed in [§ Suffixes](#suffixes).
470+
Each BIDS-Prov file must meet the following naming convention. The `label` of the `prov` entity is arbitrary, `suffix` is one of those listed in [3.1.3 Suffixes](#3-1-3-suffixes), and `extension` is either `json` or `jsonld`.
434471

435472
```
436473
sub-<label>/
437474
[ses-<label>/]
438475
prov/
439-
sub-<label>[_ses-<label>]_prov-<label>_<suffix>.json
476+
sub-<label>[_ses-<label>]_prov-<label>_<suffix>.<extension>
440477
```
441478

442-
Here is an example:
479+
Here is an example dataset tree:
443480

444481
```
445482
└─ dataset
@@ -458,11 +495,11 @@ Here is an example:
458495
└─ dataset_description.json
459496
```
460497

461-
#### Dataset level provenance - `prov/` directory
498+
#### 3.2.3 Dataset level provenance - `prov/` directory {#3-2-3-dataset-level-provenance-prov-directory}
462499

463500
BIDS-Prov files can be stored in a `prov/` directory immediately below the BIDS dataset (or BIDS-Derivatives dataset) root. At the dataset level, provenance can be about any BIDS file in the dataset.
464501

465-
Each BIDS-Prov file must meet the following naming convention, where `label` can be arbitrary, `suffix` is one of listed in [§ Suffixes](#suffixes), and `suffix` is either `json` or `jsonld`
502+
Each BIDS-Prov file must meet the following naming convention. The `label` of the `prov` entity is arbitrary, `suffix` is one of those listed in [3.1.3 Suffixes](#3-1-3-suffixes), and `extension` is either `json` or `jsonld`.
466503

467504
```
468505
prov/
@@ -487,29 +524,29 @@ Here is an example:
487524
└─ dataset_description.json
488525
```
489526

490-
#### Dataset level provenance - `dataset_description.json` file
527+
#### 3.2.4 Dataset level provenance - `dataset_description.json` file {#3-2-4-dataset-level-provenance-dataset-description}
491528

492529
In the current version of the BIDS specification (1.10.0), the [`GeneratedBy`](https://bids-specification.readthedocs.io/en/stable/glossary.html#generatedby-metadata) field of the `dataset_description.json` file allows specifying the provenance of the dataset.
493530

494-
> [!NOTE] BEP028 proposes that the following description replaces the `GeneratedBy` field as part of a major revision of the BIDS specification. Until this happens, BIDS Prov provenance records can be stored in a `GeneratedByProv` field.
531+
BEP028 proposes that the following description replaces the `GeneratedBy` field as part of a major revision of the BIDS specification. Until this happens, BIDS-Prov provenance records can be stored in a `GeneratedByProv` field.
495532

496-
Here is an example of a `GeneratedByProv` field containing a complete description of an `Entity`:
533+
Here is an example of a `GeneratedByProv` field containing a complete description of an `Activity`:
497534

498535
```JSON
499536
{
500537
"GeneratedByProv": {
501-
"Id": "urn:conversion-00f3a18f",
502-
"Label": "Conversion",
503-
"Command": "dcm2niix -i -o ",
538+
"Id": "bids::#conversion-00f3a18f",
539+
"Label": "Dicom to Nifti conversion",
540+
"Command": "dcm2niix -o . -f sub-%i/anat/sub-%i_T1w sourcedata/dicoms",
504541
"AssociatedWith": {
505-
"Id": "urn:dcm2niix-70ug8pl5",
506-
"Type": "Agent",
542+
"Id": "bids::#dcm2niix-khhkm7u1",
543+
"AltIdentifier": "RRID:SCR_023517",
507544
"Label": "dcm2niix",
508-
"Version": "v1.1.3",
545+
"Version": "v1.0.20220720",
509546
"Used": {
510-
"Id": "urn:environment-gjqhxnbc",
511-
"Type": "Environment",
512-
"Label": "Docker container"
547+
"Id": "bids::#fedora-uldfv058",
548+
"Label": "Fedora release 36 (Thirty Six)",
549+
"OperatingSystem": "GNU/Linux 6.2.15-100.fc36.x86_64"
513550
}
514551
}
515552
}
@@ -520,7 +557,7 @@ Here is an example of a `GeneratedByProv` field containing the IRI of an `Entity
520557

521558
```JSON
522559
{
523-
"GeneratedByProv": "urn:conversion-00f3a18f"
560+
"GeneratedByProv": "bids::#conversion-00f3a18f"
524561
}
525562
```
526563
