Commit dcee2e8

Merge pull request bids-standard#156 from bclenet/fmriprep_example
[Example] provenance of fMRIPrep derivatives
2 parents d2e49ea + 638a294 commit dcee2e8

File tree

86 files changed: +233214 -4 lines changed


.codespellrc

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,4 +1,4 @@
 [codespell]
-skip = .git,*.pdf,*.svg,venvs,./examples/from_parsers
+skip = .git,*.pdf,*.svg,venvs,./examples/from_parsers,*.trig
 #
 # ignore-words-list =
```

.gitmodules

Lines changed: 4 additions & 0 deletions
```diff
@@ -1,3 +1,7 @@
+[submodule "examples/fmriprep/ds001734"]
+	path = examples/fmriprep/ds001734
+	url = git@github.com:OpenNeuroDatasets/ds001734.git
+	datalad-id = ca05bc10-29a0-11e9-9a7b-0242ac13000d
 [submodule "examples/heudiconv/sourcedata/hirni-demo"]
 	path = examples/heudiconv/sourcedata/hirni-demo
 	url = https://github.com/psychoinformatics-de/hirni-demo.git
```

bids_prov/visualize.py

Lines changed: 3 additions & 1 deletion
```diff
@@ -28,7 +28,9 @@ def viz_turtle(content=None, img_file=None, source=None, **kwargs) -> None:
     # TODO : show attributes has optional arg
     dot = prov_to_dot(prov_doc, use_labels=True,
                       show_element_attributes=False, show_relation_attributes=False)
-    dot.write_png(img_file)
+    # dot.write_png(img_file)
+    with open(img_file, 'wb') as file:
+        file.write(dot.create_svg())
 
 
 def viz_jsonld11(jsonld11: dict, img_file: str) -> None:
```

examples/fmriprep/README.md

Lines changed: 235 additions & 0 deletions

# A `fMRIPrep` example for BIDS-Prov

This example shows provenance records for the [fMRIPrep](https://fmriprep.org/en/23.1.3/index.html) preprocessing software, as a typical use case of how to store provenance inside a BIDS derivatives dataset.

> [!NOTE]
> The command lines described in this documentation are meant to be run from the `examples/fmriprep/` directory.

> [!WARNING]
> This example needs the BIDS-Prov visualizer to export `.svg` files instead of `.png` files (whose resolution is far too low to display a whole `fMRIPrep` graph), hence the modifications to `bids_prov/visualize.py`.

## Source dataset

We use the dataset from https://openneuro.org/datasets/ds001734/versions/1.0.5, containing raw and preprocessed fMRI data of two versions of the mixed gambles task, from the Neuroimaging Analysis Replication and Prediction Study (NARPS).

```shell
datalad install https://github.com/OpenNeuroDatasets/ds001734.git
git submodule add https://github.com/OpenNeuroDatasets/ds001734.git ds001734
cd ds001734
datalad get sub-001/*
```

## `fMRIPrep` installation

```shell
pip install fmriprep-docker==1.1.4
docker pull poldracklab/fmriprep:1.1.4
mkdir derivatives/
```

## Getting provenance records from nipype

Create a `nipype.cfg` file to set up provenance recording in nipype. The file contains the following lines:
```
[execution]
write_provenance = true
hash_method = content
```
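
As an optional sanity check (assuming `nipype` is also installed in your local Python environment), you can ask nipype for the values it picked up; inside the container, `fmriprep-docker` mounts the same file as `/root/.nipype/nipype.cfg`:

```python
# Optional check: nipype reads nipype.cfg from the working directory (and from
# ~/.nipype/), so both provenance options set above should be reported here.
from nipype import config

print(config.get('execution', 'write_provenance'))  # expected: 'true'
print(config.get('execution', 'hash_method'))        # expected: 'content'
```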

Launch `fMRIPrep` on one subject (sub-001):
```shell
fmriprep-docker --participant-label=001 --fs-license-file=freesurfer_license.txt --config=nipype.cfg -w=derivatives/work/ ds001734/ derivatives/ participant
```

> [!NOTE]
> This launches the following command line:
> ```shell
> docker run --rm -it -v <absolute_path_to>/freesurfer_license.txt:/opt/freesurfer/license.txt:ro -v <absolute_path_to>/nipype.cfg:/root/.nipype/nipype.cfg:ro -v <absolute_path_to>/ds001734/:/data:ro -v <absolute_path_to>/derivatives/:/out -v <absolute_path_to>/derivatives/work:/scratch poldracklab/fmriprep:1.1.4 /data /out participant --participant-label=001 -w /scratch
> ```

## Converting nipype provenance to BIDS-Prov

Nipype generates RDF provenance records in TriG format, as contained in `derivatives/fmriprep/prov/nipype/workflow_provenance_20250314T155959.trig`.
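
If you want a quick look at these records before converting them, they can be loaded with `rdflib` (a minimal sketch; the path assumes you are in `examples/fmriprep/`):

```python
# Load the nipype provenance file and count the recorded statements.
from rdflib import Dataset

nipype_prov = Dataset()
nipype_prov.parse(
    'derivatives/fmriprep/prov/nipype/workflow_provenance_20250314T155959.trig',
    format='trig')
print(len(list(nipype_prov.quads())))  # number of (subject, predicate, object, graph) statements
```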

We use the `code/convert_prov.py` script to convert it to BIDS-Prov compliant provenance:

```shell
cd derivatives/fmriprep/
python code/convert_prov.py
```

This script performs SPARQL queries to extract a simplified version of the RDF graph, containing activities, entities, environments and agents with the following relations:

| Record | Relations |
| --- | --- |
| Activities | `Label`<br>`Type`<br>`Command`<br>`AssociatedWith`<br>`Used`<br>`StartedAtTime`<br>`EndedAtTime` |
| Entities | `Label`<br>`AtLocation`<br>`GeneratedBy`<br>`Type`<br>`Digest` |
| Agents | `Label`<br>`Type`<br>`Version` |
| Environments | `Label`<br>`Type`<br>`EnvVar` |

> [!NOTE]
> The script works with the `code/queries.py` module, which contains a set of exhaustive queries and a set of simplified ones. The example uses the simplified queries (which do not extract Environments or Agents) to keep the output graph simple.
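
To give an idea of what such a query looks like, here is a minimal sketch of a simplified Activities query (illustrative only; the exact predicates and BIDS-Prov terms used in `code/queries.py` may differ). The `prov:` and `rdfs:` prefixes are bound via `initNs` when the query is prepared in `code/convert_prov.py`:

```python
# Illustrative sketch of a CONSTRUCT query pulling Activities and some of their
# relations out of the nipype graph (not the actual queries.py content).
ACTIVITIES_QUERY = """
CONSTRUCT {
    ?activity a prov:Activity ;
        rdfs:label ?label ;
        prov:startedAtTime ?start ;
        prov:endedAtTime ?end .
}
WHERE {
    ?activity a prov:Activity ;
        rdfs:label ?label .
    OPTIONAL { ?activity prov:startedAtTime ?start . }
    OPTIONAL { ?activity prov:endedAtTime ?end . }
}
"""
```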

The script generates:
* `derivatives/fmriprep/prov/nipype/workflow_provenance_20250314T155959_compacted.jsonld`: a JSON-LD file, which is the serialization of the simplified RDF graph
* `derivatives/fmriprep/prov/nipype/workflow_provenance_20250314T155959_bidsprov.jsonld`: a BIDS-Prov file created by adapting the previous JSON-LD file to a BIDS-Prov skeleton
* provenance records split into JSON files `derivatives/fmriprep/prov/prov-fmriprep_*.json`

We can then visualize the BIDS-Prov graph:
```shell
pip install bids-prov==0.1.0
bids_prov_visualizer --input_file derivatives/fmriprep/prov/nipype/workflow_provenance_20250314T155959_bidsprov.jsonld --output_file derivatives/fmriprep/prov/nipype/workflow_provenance_20250314T155959_bidsprov.svg
```

![](/examples/fmriprep/derivatives/fmriprep/prov/nipype/workflow_provenance_20250314T155959_bidsprov.svg)

## Storing provenance in sidecar JSONs

We use the `code/split_prov.py` script to create (or complement) sidecar JSON files from the Entity records of `derivatives/fmriprep/prov/nipype/workflow_provenance_20250314T155959_bidsprov.jsonld`.

```shell
python code/split_prov.py -i prov/nipype/workflow_provenance_20250314T155959_bidsprov.jsonld -o .
```
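
Conceptually, the splitting step boils down to something like the following sketch (a simplified illustration of the idea, not the actual `code/split_prov.py` implementation; the path handling and key spellings are assumptions):

```python
# For every Entity that has a single location and a GeneratedBy relation, write
# a sidecar JSON next to the corresponding derivative file.
import json
from pathlib import Path

def write_sidecars(bidsprov_file: str, dataset_root: str) -> None:
    with open(bidsprov_file, encoding='utf-8') as file:
        entities = json.load(file)['Records']['Entities']

    for entity in entities:
        location = entity.get('AtLocation') or entity.get('Atlocation')
        generated_by = entity.get('GeneratedBy')
        if not location or not generated_by or isinstance(location, list):
            continue

        # Hypothetical mapping from the recorded location to a path inside the
        # derivatives dataset; the real script resolves container paths properly.
        target = Path(dataset_root) / Path(location).name
        sidecar = target.with_suffix('').with_suffix('.json')  # foo.nii.gz -> foo.json
        sidecar.write_text(json.dumps({'GeneratedBy': generated_by}, indent=2))
```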

This gives the following tree (`code/` and `prov/nipype/` directories are ignored):

```
.
├── prov
│   ├── prov-fmriprep_act.json
│   ├── prov-fmriprep_base.json
│   ├── prov-fmriprep_ent.json
│   ├── prov-fmriprep_env.json
│   └── prov-fmriprep_soft.json
└── sub-001
    ├── anat
    │   ├── sub-001_T1w_brainmask.json
    │   ├── sub-001_T1w_brainmask.nii.gz
    │   ├── sub-001_T1w_dtissue.json
    │   ├── sub-001_T1w_dtissue.nii.gz
    │   ├── sub-001_T1w_inflated.L.surf.gii
    │   ├── sub-001_T1w_inflated.L.surf.json
    │   ├── sub-001_T1w_inflated.R.surf.gii
    │   ├── sub-001_T1w_inflated.R.surf.json
    │   ├── sub-001_T1w_label-aparcaseg_roi.json
    │   ├── sub-001_T1w_label-aparcaseg_roi.nii.gz
    │   ├── sub-001_T1w_label-aseg_roi.json
    │   ├── sub-001_T1w_label-aseg_roi.nii.gz
    │   ├── sub-001_T1w_midthickness.L.surf.gii
    │   ├── sub-001_T1w_midthickness.L.surf.json
    │   ├── sub-001_T1w_midthickness.R.surf.gii
    │   ├── sub-001_T1w_midthickness.R.surf.json
    │   ├── sub-001_T1w_pial.L.surf.gii
    │   ├── sub-001_T1w_pial.L.surf.json
    │   ├── sub-001_T1w_pial.R.surf.gii
    │   ├── sub-001_T1w_pial.R.surf.json
    │   ├── sub-001_T1w_preproc.json
    │   ├── sub-001_T1w_preproc.nii.gz
    │   ├── sub-001_T1w_smoothwm.L.surf.gii
    │   ├── sub-001_T1w_smoothwm.L.surf.json
    │   ├── sub-001_T1w_smoothwm.R.surf.gii
    │   ├── sub-001_T1w_smoothwm.R.surf.json
    │   ├── sub-001_T1w_space-MNI152NLin2009cAsym_brainmask.json
    │   ├── sub-001_T1w_space-MNI152NLin2009cAsym_brainmask.nii.gz
    │   ├── sub-001_T1w_space-MNI152NLin2009cAsym_dtissue.json
    │   ├── sub-001_T1w_space-MNI152NLin2009cAsym_dtissue.nii.gz
    │   ├── sub-001_T1w_space-MNI152NLin2009cAsym_preproc.json
    │   ├── sub-001_T1w_space-MNI152NLin2009cAsym_preproc.nii.gz
    │   ├── sub-001_T1w_space-MNI152NLin2009cAsym_target-T1w_warp.h5
    │   ├── sub-001_T1w_space-MNI152NLin2009cAsym_target-T1w_warp.json
    │   ├── sub-001_T1w_space-orig_target-T1w_affine.json
    │   ├── sub-001_T1w_space-orig_target-T1w_affine.txt
    │   ├── sub-001_T1w_target-fsnative_affine.json
    │   ├── sub-001_T1w_target-fsnative_affine.txt
    │   ├── sub-001_T1w_target-MNI152NLin2009cAsym_warp.h5
    │   └── sub-001_T1w_target-MNI152NLin2009cAsym_warp.json
    └── func
        ├── sub-001_task-MGT_run-01_bold_confounds.json
        ├── sub-001_task-MGT_run-01_bold_confounds.tsv
        ├── sub-001_task-MGT_run-01_bold_space-fsaverage5.L.func.gii
        ├── sub-001_task-MGT_run-01_bold_space-fsaverage5.L.func.json
        ├── sub-001_task-MGT_run-01_bold_space-fsaverage5.R.func.gii
        ├── sub-001_task-MGT_run-01_bold_space-fsaverage5.R.func.json
        ├── sub-001_task-MGT_run-01_bold_space-MNI152NLin2009cAsym_brainmask.json
        ├── sub-001_task-MGT_run-01_bold_space-MNI152NLin2009cAsym_brainmask.nii.gz
        ├── sub-001_task-MGT_run-01_bold_space-MNI152NLin2009cAsym_preproc.json
        ├── sub-001_task-MGT_run-01_bold_space-MNI152NLin2009cAsym_preproc.nii.gz
        ├── sub-001_task-MGT_run-01_bold_space-T1w_label-aparcaseg_roi.json
        ├── sub-001_task-MGT_run-01_bold_space-T1w_label-aparcaseg_roi.nii.gz
        ├── sub-001_task-MGT_run-01_bold_space-T1w_label-aseg_roi.json
        ├── sub-001_task-MGT_run-01_bold_space-T1w_label-aseg_roi.nii.gz
        ├── sub-001_task-MGT_run-02_bold_confounds.json
        ├── sub-001_task-MGT_run-02_bold_confounds.tsv
        ├── sub-001_task-MGT_run-02_bold_space-fsaverage5.L.func.gii
        ├── sub-001_task-MGT_run-02_bold_space-fsaverage5.L.func.json
        ├── sub-001_task-MGT_run-02_bold_space-fsaverage5.R.func.gii
        ├── sub-001_task-MGT_run-02_bold_space-fsaverage5.R.func.json
        ├── sub-001_task-MGT_run-02_bold_space-MNI152NLin2009cAsym_brainmask.json
        ├── sub-001_task-MGT_run-02_bold_space-MNI152NLin2009cAsym_brainmask.nii.gz
        ├── sub-001_task-MGT_run-02_bold_space-MNI152NLin2009cAsym_preproc.json
        ├── sub-001_task-MGT_run-02_bold_space-MNI152NLin2009cAsym_preproc.nii.gz
        ├── sub-001_task-MGT_run-02_bold_space-T1w_label-aparcaseg_roi.json
        ├── sub-001_task-MGT_run-02_bold_space-T1w_label-aparcaseg_roi.nii.gz
        ├── sub-001_task-MGT_run-02_bold_space-T1w_label-aseg_roi.json
        ├── sub-001_task-MGT_run-02_bold_space-T1w_label-aseg_roi.nii.gz
        ├── sub-001_task-MGT_run-03_bold_confounds.json
        ├── sub-001_task-MGT_run-03_bold_confounds.tsv
        ├── sub-001_task-MGT_run-03_bold_space-fsaverage5.L.func.gii
        ├── sub-001_task-MGT_run-03_bold_space-fsaverage5.L.func.json
        ├── sub-001_task-MGT_run-03_bold_space-fsaverage5.R.func.gii
        ├── sub-001_task-MGT_run-03_bold_space-fsaverage5.R.func.json
        ├── sub-001_task-MGT_run-03_bold_space-MNI152NLin2009cAsym_brainmask.json
        ├── sub-001_task-MGT_run-03_bold_space-MNI152NLin2009cAsym_brainmask.nii.gz
        ├── sub-001_task-MGT_run-03_bold_space-MNI152NLin2009cAsym_preproc.json
        ├── sub-001_task-MGT_run-03_bold_space-MNI152NLin2009cAsym_preproc.nii.gz
        ├── sub-001_task-MGT_run-03_bold_space-T1w_label-aparcaseg_roi.json
        ├── sub-001_task-MGT_run-03_bold_space-T1w_label-aparcaseg_roi.nii.gz
        ├── sub-001_task-MGT_run-03_bold_space-T1w_label-aseg_roi.json
        ├── sub-001_task-MGT_run-03_bold_space-T1w_label-aseg_roi.nii.gz
        ├── sub-001_task-MGT_run-04_bold_confounds.json
        ├── sub-001_task-MGT_run-04_bold_confounds.tsv
        ├── sub-001_task-MGT_run-04_bold_space-fsaverage5.L.func.gii
        ├── sub-001_task-MGT_run-04_bold_space-fsaverage5.L.func.json
        ├── sub-001_task-MGT_run-04_bold_space-fsaverage5.R.func.gii
        ├── sub-001_task-MGT_run-04_bold_space-fsaverage5.R.func.json
        ├── sub-001_task-MGT_run-04_bold_space-MNI152NLin2009cAsym_brainmask.json
        ├── sub-001_task-MGT_run-04_bold_space-MNI152NLin2009cAsym_brainmask.nii.gz
        ├── sub-001_task-MGT_run-04_bold_space-MNI152NLin2009cAsym_preproc.json
        ├── sub-001_task-MGT_run-04_bold_space-MNI152NLin2009cAsym_preproc.nii.gz
        ├── sub-001_task-MGT_run-04_bold_space-T1w_label-aparcaseg_roi.json
        ├── sub-001_task-MGT_run-04_bold_space-T1w_label-aparcaseg_roi.nii.gz
        ├── sub-001_task-MGT_run-04_bold_space-T1w_label-aseg_roi.json
        └── sub-001_task-MGT_run-04_bold_space-T1w_label-aseg_roi.nii.gz
```

### Limitations

* For now, we use a simplified description of the provenance, leaving aside software and environments as well as keys such as `Digest`, `Version`, `EnvVar`, `StartedAtTime`, `EndedAtTime`.
* Some entities end up with several `Label` / `AtLocation` values (see the detection sketch after this list). E.g.:
```JSON-LD
{
    "Id": "http://iri.nidash.org/262c247816c9fc071309a1da8bad277d",
    "Type": "Entities",
    "Label": [
        "file://b330d9dac87a/scratch/fmriprep_wf/single_subject_001_wf/func_preproc_task_MGT_run_03_wf/sdc_wf/phdiff_wf/demean/sub-001_phasediff_rads_unwrapped_filt_demean.nii.gz",
        "file://b330d9dac87a/scratch/fmriprep_wf/single_subject_001_wf/func_preproc_task_MGT_run_02_wf/sdc_wf/phdiff_wf/demean/sub-001_phasediff_rads_unwrapped_filt_demean.nii.gz"
    ],
    "Atlocation": [
        "file://b330d9dac87a/scratch/fmriprep_wf/single_subject_001_wf/func_preproc_task_MGT_run_03_wf/sdc_wf/phdiff_wf/demean/sub-001_phasediff_rads_unwrapped_filt_demean.nii.gz",
        "file://b330d9dac87a/scratch/fmriprep_wf/single_subject_001_wf/func_preproc_task_MGT_run_02_wf/sdc_wf/phdiff_wf/demean/sub-001_phasediff_rads_unwrapped_filt_demean.nii.gz"
    ],
    "https://github.com/bids-standard/BEP028_BIDSprov/terms/Digest": "sha512:c585500ee6565b5e8277e3cf72dcdef81768439e7998c258d9e3cfc4042cf2d3fa80ecd359400deda90a4ed141e3180b78a942b32827bd41fb0ca367c8f91c9c"
}
```
* As a result of the previous point, we are not able to fully replace these Entities from the `prov/prov-fmriprep_ent.json` file by a `GeneratedBy` field inside a sidecar JSON.
* Some terms are missing from the BIDS-Prov context although they are in the specification (such as `Digest`, `Version`, `EnvVar`).
* For now, the conversion script is not able to transform RDF triples into dictionaries, as required for `Digest` or `EnvVar` objects.
* IRIs are not human readable enough (e.g.: `http://iri.nidash.org/262c247816c9fc071309a1da8bad277d`).
* Some "Function" and other activity nodes Use and Generate the same entity. Does this really mean that they read and write the same file?
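
The following sketch (a hypothetical helper, not part of the example code) lists the Entity records that ended up with several labels or locations after conversion:

```python
# List Entities whose Label or AtLocation holds more than one value
# (key spelling follows the compacted output shown above).
import json

with open('prov/prov-fmriprep_ent.json', encoding='utf-8') as file:
    entities = json.load(file)

for entity_id, record in entities.items():
    label = record.get('Label')
    location = record.get('AtLocation') or record.get('Atlocation')
    if isinstance(label, list) or isinstance(location, list):
        print(entity_id, label)
```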

### Next steps

* How to represent entities with two labels and locations?
* Then, use file names for the Ids of entities.
* Make extractions based on consistent use of qualifiedUsage and qualifiedGeneration (vs. Used and GeneratedBy).
* Investigate activities Using and Generating the same file (e.g.: `http://iri.nidash.org/4650c7ac00df11f0992d72ca464e997e` with entity `http://iri.nidash.org/72737575a38dda35b8ab6530a55aa543`, which is `file://b330d9dac87a/data/sub-001/anat/sub-001_T1w.nii.gz`).

examples/fmriprep/derivatives/fmriprep/code/convert_prov.py

Lines changed: 102 additions & 0 deletions

```python
#!/usr/bin/python
# coding: utf-8

""" Convert nipype provenance traces into one BIDS-Prov compliant JSON-LD graph """

import json
from pyld import jsonld
from rdflib import Dataset, Graph, Namespace
from rdflib.namespace import RDF, RDFS, PROV
from rdflib.plugins.sparql import prepareQuery

from queries import queries, simple_queries

# Dict of namespaces to be used in queries
NAMESPACES = {
    'rdfs': RDFS,
    'rdf': RDF,
    'prov': PROV,
    'nipype': Namespace("http://nipy.org/nipype/terms/"),
    'niiri': Namespace("http://iri.nidash.org/"),
    'crypto': Namespace("http://id.loc.gov/vocabulary/preservation/cryptographicHashFunctions/"),
    'bidsprov': Namespace("https://github.com/bids-standard/BEP028_BIDSprov/terms/")
}

# Parse the nipype RDF provenance file
# We use Dataset as there might be several graphs in the file
nipype_prov = Dataset()
nipype_prov.parse('prov/nipype/workflow_provenance_20250314T155959.trig', format='trig')


# Create an empty graph for output provenance
bids_prov = Graph()

# Query input graphs
for label, query in simple_queries.items():

    for graph in nipype_prov.graphs():
        q = prepareQuery(query, initNs=NAMESPACES)
        queried_graph = graph.query(q)

        if len(queried_graph) > 0:
            bids_prov += queried_graph

# Serialize output graph to JSON-LD and compact
compacted = jsonld.compact(
    json.loads(bids_prov.serialize(format='json-ld')),
    'https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json'
)

# Write compacted JSON-LD
with open('prov/nipype/workflow_provenance_20250314T155959_compacted.jsonld', 'w', encoding='utf-8') as file:
    file.write(json.dumps(compacted, indent=2))

# Merge records into a BIDS-Prov skeleton and write separated JSON files
bids_prov_skeleton = {
    "@context": "https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json",
    "BIDSProvVersion": "0.0.1",
    "Records": {
        "Software": [],
        "Activities": [],
        "Entities": [],
        "Environments": []
    }
}
software_records = {}
activities_records = {}
entities_records = {}
environments_records = {}

for record in compacted['@graph']:

    record_without_id = record.copy()
    record_without_id.pop('Id')

    if 'Type' not in record:
        continue
    if record['Type'] == 'Software':
        bids_prov_skeleton['Records']['Software'].append(record)
        software_records[record['Id']] = record_without_id
    elif record['Type'] == 'Activities':
        bids_prov_skeleton['Records']['Activities'].append(record)
        activities_records[record['Id']] = record_without_id
    elif 'Environment' in record['Type']:
        bids_prov_skeleton['Records']['Environments'].append(record)
        environments_records[record['Id']] = record_without_id
    else:
        bids_prov_skeleton['Records']['Entities'].append(record)
        entities_records[record['Id']] = record_without_id

# Write BIDS-Prov JSON-LD
with open('prov/nipype/workflow_provenance_20250314T155959_bidsprov.jsonld', 'w', encoding='utf-8') as file:
    file.write(json.dumps(bids_prov_skeleton, indent=2))

# Write split JSONs
with open('prov/prov-fmriprep_soft.json', 'w', encoding='utf-8') as file:
    file.write(json.dumps(software_records, indent=2))
with open('prov/prov-fmriprep_act.json', 'w', encoding='utf-8') as file:
    file.write(json.dumps(activities_records, indent=2))
with open('prov/prov-fmriprep_env.json', 'w', encoding='utf-8') as file:
    file.write(json.dumps(environments_records, indent=2))
with open('prov/prov-fmriprep_ent.json', 'w', encoding='utf-8') as file:
    file.write(json.dumps(entities_records, indent=2))
```
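
After running the conversion, a quick way to inspect the result (a small sketch, assuming the output files above were written and you are in `derivatives/fmriprep/`) is to count the records per type:

```python
# Count the records collected in the BIDS-Prov JSON-LD output.
import json

with open('prov/nipype/workflow_provenance_20250314T155959_bidsprov.jsonld', encoding='utf-8') as file:
    records = json.load(file)['Records']

for record_type, items in records.items():
    print(record_type, len(items))
```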
