Commit eccb336

fmriprep example from nipype provenance
1 parent 0cd7cf9 commit eccb336

6 files changed

+215366
-43
lines changed

examples/fmriprep/README.md

Lines changed: 49 additions & 43 deletions
@@ -1,54 +1,82 @@
11
# A `fMRIPrep` example for BIDS-Prov
22

3-
This example aims at showing provenance traces for the [fMRIPrep](https://fmriprep.org/en/23.1.3/index.html) preprocessing software on a Linux-based (Fedora) operating system.
3+
This example shows provenance records for the [fMRIPrep](https://fmriprep.org/en/23.1.3/index.html) preprocessing software, as a typical use case for storing provenance inside a BIDS derivatives dataset.
44

5-
## `fMRIPrep` installation
5+
> [!NOTE]
6+
> The command lines in this documentation are meant to be run from the `examples/fmriprep/` directory.
67
7-
```shell
8-
pip install fmriprep-docker==1.1.4
8+
## Source dataset
99

10-
docker pull poldracklab/fmriprep:1.1.4
10+
We use the dataset from https://openneuro.org/datasets/ds001734/versions/1.0.5, containing raw and preprocessed fMRI data of two versions of the mixed gambles task, from the Neuroimaging Analysis Replication and Prediction Study (NARPS).
1111

12-
mkdir derivatives/
12+
```shell
13+
datalad install https://github.com/OpenNeuroDatasets/ds001734.git
14+
git submodule add https://github.com/OpenNeuroDatasets/ds001734.git ds001734
15+
cd ds001734
16+
datalad get sub-001/*
1317
```
1418

15-
Launching `fMRIPrep` on one subject.
19+
## `fMRIPrep` installation
1620

1721
```shell
18-
fmriprep-docker --participant-label=001 --fs-license-file=soft/freesurfer/license.txt --config=nipype.cfg -w=data/ds001734_fmriprep/work/ dev/BEP028_BIDSprov/examples/fmriprep/ds001734/ data/ds001734_fmriprep/ participant
22+
pip install fmriprep-docker==1.1.4
23+
docker pull poldracklab/fmriprep:1.1.4
24+
mkdir derivatives/
1925
```
2026

21-
TODO : alternative nipype configuration to enable provenance
27+
## Getting provenance records from nipype
2228

23-
nipype.cfg:
29+
Create a `nipype.cfg` file to set up provenance recording in nipype. The file contains the following lines:
2430
```
2531
[execution]
2632
write_provenance = true
2733
hash_method = content
2834
```
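The same configuration file can also be generated programmatically. The following is a minimal sketch using Python's standard `configparser` module; the function name `write_nipype_config` is an assumption made for illustration and is not part of nipype or of this example.

```python
import configparser
from pathlib import Path


def write_nipype_config(path="nipype.cfg"):
    """Write a nipype configuration file that enables provenance recording.

    Hypothetical helper: only the section and the two option names come
    from the README; everything else is illustrative.
    """
    config = configparser.ConfigParser()
    config["execution"] = {
        "write_provenance": "true",   # ask nipype to record provenance
        "hash_method": "content",     # hash files by content
    }
    with open(path, "w", encoding="utf-8") as file:
        config.write(file)
    return Path(path)
```

Calling `write_nipype_config()` from the `examples/fmriprep/` directory should produce a file equivalent to the one shown above.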
2935

30-
docker run --rm -it -v /home/$USER/soft/freesurfer/license.txt:/opt/freesurfer/license.txt:ro -v /home/$USER/dev/bidsprov/nipype.cfg:/root/.nipype/nipype.cfg:ro -v /home/$USER/nas-empenn/share/dbs/narps_open/data/original/ds001734/:/data:ro -v /data/$USER/ds001734_fmriprep:/out -v /data/$USER/ds001734_fmriprep/work:/scratch poldracklab/fmriprep:1.1.4 /data /out participant --participant-label=001 -w /scratch
36+
Launch `fMRIPrep` on one subject (sub-001):
37+
```shell
38+
fmriprep-docker --participant-label=001 --fs-license-file=freesurfer_license.txt --config=nipype.cfg -w=derivatives/work/ ds001734/ derivatives/ participant
39+
```
3140

41+
> [!NOTE]
42+
> This command in turn launches the following `docker run` command line:
43+
> ```shell
44+
> docker run --rm -it -v <absolute_path_to>/freesurfer_license.txt:/opt/freesurfer/license.txt:ro -v <absolute_path_to>/nipype.cfg:/root/.nipype/nipype.cfg:ro -v <absolute_path_to>/ds001734/:/data:ro -v <absolute_path_to>/derivatives/:/out -v <absolute_path_to>/derivatives/work:/scratch poldracklab/fmriprep:1.1.4 /data /out participant --participant-label=001 -w /scratch
45+
> ```
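To make the mapping between the relative paths given to `fmriprep-docker` and the absolute paths mounted in the container more concrete, here is a minimal sketch that rebuilds the `-v` arguments. It is not how `fmriprep-docker` is implemented; the function `docker_mounts` is hypothetical, and POSIX-style paths are assumed.

```python
from pathlib import Path


def docker_mounts(license_file, config_file, bids_dir, out_dir, work_dir):
    """Return the `-v` arguments that map local paths to container paths.

    Hypothetical helper for illustration only; the container-side paths
    are the ones shown in the `docker run` command above.
    """
    mapping = [
        (license_file, "/opt/freesurfer/license.txt", "ro"),
        (config_file, "/root/.nipype/nipype.cfg", "ro"),
        (bids_dir, "/data", "ro"),
        (out_dir, "/out", None),
        (work_dir, "/scratch", None),
    ]
    args = []
    for local, container, mode in mapping:
        # Docker needs absolute host paths, so resolve each relative one
        spec = f"{Path(local).resolve()}:{container}"
        if mode:
            spec += f":{mode}"
        args += ["-v", spec]
    return args


mounts = docker_mounts(
    "freesurfer_license.txt", "nipype.cfg",
    "ds001734/", "derivatives/", "derivatives/work/",
)
print(" ".join(mounts))
```

Run from `examples/fmriprep/`, this prints the five mounts with `<absolute_path_to>` replaced by the current working directory.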
3246
47+
## Converting nipype provenance to BIDS-Prov
3348
34-
## Source dataset
49+
Nipype generates RDF provenance records in TriG format, stored in `derivatives/fmriprep/prov/workflow_provenance_20250314T155959.trig`.
3550
36-
We use the dataset from https://openneuro.org/datasets/ds001734/versions/1.0.5, containing raw and preprocessed fMRI data of two versions of the mixed gambles task, from the Neuroimaging Analysis Replication and Prediction Study (NARPS).
51+
We use the `code/convert_prov.py` script to convert it to BIDS-Prov compliant provenance:
3752
3853
```shell
39-
datalad install https://github.com/OpenNeuroDatasets/ds001734.git
54+
cd derivatives/fmriprep/
55+
python code/convert_prov.py
56+
```
4057
41-
git submodule add https://github.com/OpenNeuroDatasets/ds001734.git examples/fmriprep/ds001734
58+
This script performs SPARQL queries to extract a simplified version of the RDF graph, containing activities, entities, agents, and environments with these relations:
4259

43-
datalad get sub-001/*
44-
```
60+
| Record | Relations |
61+
| --- | --- |
62+
| Activities | Label<br>Type<br>Command<br>AssociatedWith<br>Used<br>StartedAtTime<br>EndedAtTime |
63+
| Entities | Label<br>AtLocation<br>GeneratedBy<br>Type<br>Digest |
64+
| Agents | Label<br>Type<br>Version |
65+
| Environments | Label<br>Type<br>EnvVar |
4566

46-
## Associated provenance
67+
The script generates:
68+
* `derivatives/fmriprep/prov/workflow_provenance_20250314T155959_compacted.jsonld`: a JSON-LD file, which is the serialization of the simplified RDF graph
69+
* `derivatives/fmriprep/prov/workflow_provenance_20250314T155959_bidsprov.jsonld`: a BIDS-Prov file created by adapting the previous JSON-LD file to a BIDS-Prov skeleton
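
A quick way to check the generated BIDS-Prov file is to count the records in each category. The sketch below assumes the file follows the skeleton written by `code/convert_prov.py` (a top-level `Records` object holding one list per category); the helper names and the example document are hypothetical.

```python
import json


def summarize_records(bids_prov):
    """Count the records of each category in a BIDS-Prov document."""
    return {
        category: len(items)
        for category, items in bids_prov.get("Records", {}).items()
    }


def summarize_file(path):
    """Load a BIDS-Prov JSON-LD file and count its records."""
    with open(path, encoding="utf-8") as file:
        return summarize_records(json.load(file))


# Hypothetical minimal document; identifiers are invented for illustration
example = {
    "@context": "https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json",
    "BIDSProvVersion": "0.0.1",
    "Records": {
        "Software": [{"@id": "urn:example:soft"}],
        "Activities": [{"@id": "urn:example:act1"}, {"@id": "urn:example:act2"}],
        "Entities": [],
        "Environments": [],
    },
}
print(summarize_records(example))
```

For the real output, `summarize_file` can be pointed at the `_bidsprov.jsonld` file listed above.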
70+
71+
We can then visualize the BIDS-Prov graph:
72+
```shell
73+
pip install bids-prov==0.1.0
74+
bids_prov_visualizer --input_file derivatives/fmriprep/prov/workflow_provenance_20250314T155959_bidsprov.jsonld --output_file derivatives/fmriprep/prov/workflow_provenance_20250314T155959_bidsprov.svg
75+
```
4776

48-
In order to describe provenance records using BIDS Prov, we use:
77+
![](/examples/fmriprep/derivatives/fmriprep/prov/workflow_provenance_20250314T155959_bidsprov.svg)
4978

50-
* modality agnostic files inside the `prov/` directory
51-
* subject / modality level provenance files
79+
## Storing provenance in the dataset
5280

5381
```
5482
.
@@ -71,23 +99,6 @@ In order to describe provenance records using BIDS Prov, we use:
7199
└── sub-001_task-MGT_bold_prov-fmriprep_ent.prov.json
72100
```
73101

74-
## New features for BIDS / BIDS Prov
75-
76-
We introduce the following BIDS entity that is currently not existing:
77-
78-
* `prov`
79-
* Full name: Provenance traces
80-
* Format: `prov-<label>`
81-
* Definition: A grouping of provenance traces. Defining multiple provenance traces groups is appropriate when several processings have been performed on data.
82-
83-
We introduce the following BIDS suffixes that are currently not existing:
84-
85-
* `act`: the file describes BIDS Prov Activities for the group of provenance traces
86-
* `soft`: the file describes BIDS Prov Software for the group of provenance traces
87-
* `env`: the file describes BIDS Prov Environments for the group of provenance traces
88-
* `ent`: the file describes BIDS Prov Entities for the group of provenance traces
89-
* `base`: the file describes common BIDS Prov parameters for the group of provenance traces (version and context for BIDS Prov)
90-
91102
## Merging JSON in a JSON-LD file and plotting graph
92103

93104
The python script `code/merge_prov.py` aims at merging all these provenance records into one JSON-LD graph.
@@ -99,12 +110,7 @@ python code/merge_prov.py
99110

100111
From that, we generate the JSON-LD graph `prov/merge/prov-fmriprep.prov.jsonld`. Then we were able to plot the graph as a png file. We used this command:
101112

102-
```shell
103-
pip install bids-prov==0.1.0
104-
bids_prov_visualizer --input_file prov/merged/prov-fmriprep.prov.jsonld --output_file prov/merged/prov-fmriprep.prov.png
105-
```
106113

107-
![](/examples/fmriprep/prov/merged/prov-fmriprep.prov.png)
108114

109115
### Notes
110116

Lines changed: 197 additions & 0 deletions
@@ -0,0 +1,197 @@
1+
#!/usr/bin/python
2+
# coding: utf-8
3+
4+
""" Convert nipype provenance traces into one BIDS-Prov compliant JSON-LD graph """
5+
6+
import json
7+
from pyld import jsonld
8+
from rdflib import Dataset, Graph, Namespace
9+
from rdflib.namespace import RDF, RDFS, PROV
10+
from rdflib.plugins.sparql import prepareQuery
11+
12+
# Dict of namespaces to be used in queries
13+
NAMESPACES = {
14+
    'rdfs': RDFS,
15+
    'rdf': RDF,
16+
    'prov': PROV,
17+
    'nipype': Namespace("http://nipy.org/nipype/terms/"),
18+
    'niiri': Namespace("http://iri.nidash.org/"),
19+
    'crypto': Namespace("http://id.loc.gov/vocabulary/preservation/cryptographicHashFunctions/"),
20+
    'bidsprov': Namespace("https://github.com/bids-standard/BEP028_BIDSprov/terms/")
21+
}
22+
23+
# Parse the nipype RDF provenance file
24+
# We use Dataset as there might be several graphs in the file
25+
nipype_prov = Dataset()
26+
nipype_prov.parse('prov/workflow_provenance_20250314T155959.trig', format='trig')
27+
28+
# Create an empty graph for output provenance
29+
bids_prov = Graph()
30+
31+
# Create a list of queries to extract data from the input file
32+
query_labels = [
33+
    '1. Extract output file entities',
34+
    '2. Extract input file entities',
35+
    '3. Extract activities',
36+
    '4. Extract agents',
37+
    '5. Extract environments'
38+
]
39+
queries = [
40+
# 1. Extract output file entities
41+
"""
42+
CONSTRUCT {
43+
?s rdfs:label ?label .
44+
?s prov:atLocation ?atlocation .
45+
?s prov:wasGeneratedBy ?act .
46+
?s rdf:type ?type .
47+
}
48+
WHERE {
49+
?s ?p ?o .
50+
?s prov:qualifiedGeneration ?gen . # entity has a qualified generation
51+
?gen prov:activity ?act . # this qualified generation has an activity
52+
?act nipype:command ?x . # this activity has a command (disables activities representing nipype interfaces)
53+
?s prov:value ?label .
54+
?s prov:atLocation ?atlocation .
55+
?s rdf:type prov:Entity .
56+
?s rdf:type ?type .
57+
?s crypto:sha512 ?sha .
58+
BIND(STR(?label) as ?label)
59+
BIND(STR(?atlocation) as ?atlocation)
60+
}
61+
""",
62+
# 2. Extract input file entities
63+
"""
64+
CONSTRUCT {
65+
?s rdfs:label ?label .
66+
?s prov:atLocation ?atlocation .
67+
?s rdf:type prov:Entity .
68+
?s bidsprov:Digest ?sha .
69+
}
70+
WHERE {
71+
?s ?p ?o .
72+
?collection prov:hadMember ?s .
73+
?collection rdf:type nipype:Inputs .
74+
?s prov:value ?label .
75+
?s prov:atLocation ?atlocation .
76+
?s rdf:type prov:Entity .
77+
?s crypto:sha512 ?sha .
78+
FILTER NOT EXISTS { ?s prov:wasGeneratedBy ?x . } # Entity was not generated by anything
79+
BIND(STR(?label) as ?label)
80+
BIND(STR(?atlocation) as ?atlocation)
81+
BIND(CONCAT("sha512:", STR(?sha)) as ?sha)
82+
}
83+
""",
84+
# 3. Extract activities
85+
"""
86+
CONSTRUCT {
87+
?s rdfs:label ?label .
88+
?s rdf:type prov:Activity .
89+
?s bidsprov:Command ?command . # we select activities with commands only (disables activities representing nipype interfaces)
90+
?s prov:wasAssociatedWith ?associated .
91+
# ?s prov:used ?used . # comment this line to remove prov:used environments
92+
?s prov:used ?usedent .
93+
?s prov:startedAtTime ?started .
94+
?s prov:endedAtTime ?ended .
95+
}
96+
WHERE {
97+
?s ?p ?o .
98+
?s rdfs:label ?label .
99+
?s rdf:type prov:Activity .
100+
?s nipype:command ?command .
101+
?s prov:wasAssociatedWith ?associated .
102+
?s prov:used ?used .
103+
?s prov:startedAtTime ?started .
104+
?s prov:endedAtTime ?ended .
105+
?s prov:qualifiedUsage ?qu .
106+
?qu prov:entity ?usedent .
107+
?usedent prov:atLocation ?x .
108+
BIND(STR(?label) as ?label)
109+
BIND(STR(?command) as ?command)
110+
}
111+
""",
112+
# 4. Extract agents
113+
"""
114+
CONSTRUCT {
115+
?s rdfs:label ?label .
116+
?s rdf:type prov:Agent .
117+
?s bidsprov:Version ?version .
118+
}
119+
WHERE {
120+
?s ?p ?o .
121+
?s rdfs:label ?label .
122+
?s rdf:type prov:SoftwareAgent .
123+
?s nipype:version ?version .
124+
BIND(STR(?label) as ?label)
125+
BIND(STR(?version) as ?version)
126+
}
127+
""",
128+
# 5. Extract environments
129+
"""
130+
CONSTRUCT {
131+
?s rdfs:label ?label .
132+
?s rdf:type bidsprov:Environment .
133+
?s bidsprov:EnvVar ?envvar .
134+
?envvar rdfs:label ?envvarkey .
135+
?envvar prov:value ?envvarval .
136+
}
137+
WHERE {
138+
?s ?p ?o .
139+
?s rdfs:label ?label .
140+
?s rdf:type nipype:Environment .
141+
?envvar a prov:Entity .
142+
?envvar nipype:environmentVariable ?envvarkey .
143+
?envvar prov:value ?envvarval .
144+
?s prov:hadMember ?envvar .
145+
BIND(STR(?label) as ?label)
146+
BIND(STR(?envvarkey) as ?envvarkey)
147+
BIND(STR(?envvarval) as ?envvarval)
148+
}
149+
"""
150+
]
151+
152+
# Query input graph
153+
for label, query in zip(query_labels, queries):
154+
    print(label)
155+
    if 'environments' not in label:
156+
        q = prepareQuery(query, initNs = NAMESPACES)
157+
        for graph in nipype_prov.graphs():
158+
            queried_graph = graph.query(q)
159+
            if len(queried_graph) > 0:
160+
                bids_prov += queried_graph
161+
162+
# Serialize output graph to JSON-LD and compact
163+
compacted = jsonld.compact(
164+
    json.loads(bids_prov.serialize(format='json-ld')),
165+
    'https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json'
166+
)
167+
168+
# Write compacted JSON-LD
169+
with open('prov/workflow_provenance_20250314T155959_compacted.jsonld', 'w', encoding='utf-8') as file:
170+
    file.write(json.dumps(compacted, indent=2))
171+
172+
# Merge records into a BIDS-Prov skeleton
173+
bids_prov_skeleton = {
174+
    "@context": "https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json",
175+
    "BIDSProvVersion": "0.0.1",
176+
    "Records": {
177+
        "Software": [],
178+
        "Activities": [],
179+
        "Entities": [],
180+
        "Environments": []
181+
    }
182+
}
183+
for record in compacted['@graph']:
184+
    if 'Type' not in record:
185+
        continue
186+
    if record['Type'] == 'Software':
187+
        bids_prov_skeleton['Records']['Software'].append(record)
188+
    elif record['Type'] == 'Activities':
189+
        bids_prov_skeleton['Records']['Activities'].append(record)
190+
    elif 'Environment' in record['Type']:
191+
        bids_prov_skeleton['Records']['Environments'].append(record)
192+
    else:
193+
        bids_prov_skeleton['Records']['Entities'].append(record)
194+
195+
# Write BIDS-Prov JSON-LD
196+
with open('prov/workflow_provenance_20250314T155959_bidsprov.jsonld', 'w', encoding='utf-8') as file:
197+
    file.write(json.dumps(bids_prov_skeleton, indent=2))
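
The record-sorting step at the end of the script can be exercised in isolation. The sketch below wraps the same branching logic in a function and runs it on a few invented records; the function name `group_records` and the sample `Type` values are assumptions, since the actual values depend on how the BIDS-Prov context compacts the graph.

```python
def group_records(graph):
    """Sort compacted JSON-LD records into the four BIDS-Prov record lists."""
    records = {"Software": [], "Activities": [], "Entities": [], "Environments": []}
    for record in graph:
        if "Type" not in record:
            continue  # untyped nodes are dropped, as in the script
        if record["Type"] == "Software":
            records["Software"].append(record)
        elif record["Type"] == "Activities":
            records["Activities"].append(record)
        elif "Environment" in record["Type"]:
            records["Environments"].append(record)
        else:
            # Every other typed record falls back to the Entities list
            records["Entities"].append(record)
    return records


# Hypothetical records; identifiers, labels and types are invented
sample = [
    {"@id": "urn:example:1", "Type": "Software", "Label": "fmriprep"},
    {"@id": "urn:example:2", "Type": "Activities", "Label": "bet"},
    {"@id": "urn:example:3", "Type": "Environment", "Label": "env"},
    {"@id": "urn:example:4", "Type": "Entity", "Label": "sub-001_T1w.nii.gz"},
    {"@id": "urn:example:5", "Label": "untyped node"},
]
grouped = group_records(sample)
print({name: len(items) for name, items in grouped.items()})
```

Note that the `else` branch is a catch-all: any record whose type is not matched by an earlier branch ends up under `Entities`, so an unexpected type spelling would silently be filed there.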
