|
| 1 | +# A `fMRIPrep` example for BIDS-Prov |
| 2 | + |
| 3 | +This example shows provenance records for the [fMRIPrep](https://fmriprep.org/en/23.1.3/index.html) preprocessing software, as a typical use case of storing provenance inside a BIDS derivatives dataset. |
| 4 | + |
| 5 | +> [!NOTE] |
| 6 | +> The command lines in this documentation are meant to be run from the `examples/fmriprep/` directory. |
| 7 | + |
| 8 | +> [!WARNING] |
| 9 | +> This example needs the BIDS-Prov visualizer to export `.svg` files instead of `.png` files (whose resolution is too low to display a whole fMRIPrep graph), hence the modifications in `bids_prov/visualize.py`. |
| 10 | + |
| 11 | +## Source dataset |
| 12 | + |
| 13 | +We use the dataset from https://openneuro.org/datasets/ds001734/versions/1.0.5, containing raw and preprocessed fMRI data of two versions of the mixed gambles task, from the Neuroimaging Analysis Replication and Prediction Study (NARPS). |
| 14 | + |
| 15 | +```shell |
| 16 | +datalad install https://github.com/OpenNeuroDatasets/ds001734.git |
| 17 | +git submodule add https://github.com/OpenNeuroDatasets/ds001734.git ds001734 |
| 18 | +cd ds001734 |
| 19 | +datalad get sub-001/* |
| 20 | +``` |
| 21 | + |
| 22 | +## `fMRIPrep` installation |
| 23 | + |
| 24 | +```shell |
| 25 | +pip install fmriprep-docker==1.1.4 |
| 26 | +docker pull poldracklab/fmriprep:1.1.4 |
| 27 | +mkdir derivatives/ |
| 28 | +``` |
| 29 | + |
| 30 | +## Getting provenance records from nipype |
| 31 | + |
| 32 | +Create a `nipype.cfg` file to set up provenance recording in nipype. The file contains the following lines: |
| 33 | +``` |
| 34 | +[execution] |
| 35 | +write_provenance = true |
| 36 | +hash_method = content |
| 37 | +``` |
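The same file can also be generated from a script, which helps when the example is rerun in a fresh directory. The following is only a minimal sketch: the helper name `write_nipype_cfg` is hypothetical and not part of this example's `code/` directory; it relies solely on Python's standard `configparser` module.

```python
import configparser
from pathlib import Path


def write_nipype_cfg(path="nipype.cfg"):
    """Write the [execution] section shown above to `path` (hypothetical helper)."""
    config = configparser.ConfigParser()
    config["execution"] = {
        # Ask nipype to record provenance while the workflow runs
        "write_provenance": "true",
        # Hash files by content, as in the configuration above
        "hash_method": "content",
    }
    with open(path, "w") as handle:
        config.write(handle)
    return Path(path)
```

Calling `write_nipype_cfg()` from the `examples/fmriprep/` directory creates a `nipype.cfg` with the same two options as the hand-written file.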
| 38 | + |
| 39 | +Launch `fMRIPrep` on one subject (sub-001): |
| 40 | +```shell |
| 41 | +fmriprep-docker --participant-label=001 --fs-license-file=freesurfer_license.txt --config=nipype.cfg -w=derivatives/work/ ds001734/ derivatives/ participant |
| 42 | +``` |
| 43 | + |
| 44 | +> [!NOTE] |
| 45 | +> This is responsible for launching the following command line: |
| 46 | +> ```shell |
| 47 | +> docker run --rm -it -v <absolute_path_to>/freesurfer_license.txt:/opt/freesurfer/license.txt:ro -v <absolute_path_to>/nipype.cfg:/root/.nipype/nipype.cfg:ro -v <absolute_path_to>ds001734/:/data:ro -v <absolute_path_to>derivatives/:/out -v <absolute_path_to>derivatives/work:/scratch poldracklab/fmriprep:1.1.4 /data /out participant --participant-label=001 -w /scratch |
| 48 | +> ``` |
| 49 | + |
| 50 | +## Converting nipype provenance to BIDS-Prov |
| 51 | + |
| 52 | +Nipype generates RDF provenance records in TriG format, as contained in `derivatives/fmriprep/prov/nipype/workflow_provenance_20250314T155959.trig`. |
| 53 | + |
| 54 | +We use the `code/convert_prov.py` script to convert it to BIDS-Prov compliant provenance: |
| 55 | + |
| 56 | +```shell |
| 57 | +cd derivatives/fmriprep/ |
| 58 | +python code/convert_prov.py |
| 59 | +``` |
| 60 | + |
| 61 | +This script performs SPARQL queries to extract a simplified version of the RDF graph, containing activities, entities, environments and agents with these relations: |
| 62 | + |
| 63 | +| Record | Relations | |
| 64 | +| --- | --- | |
| 65 | +| Activities | `Label`<br>`Type`<br>`Command`<br>`AssociatedWith`<br>`Used`<br>`StartedAtTime`<br>`EndedAtTime` | |
| 66 | +| Entities | `Label`<br>`AtLocation`<br>`GeneratedBy`<br>`Type`<br>`Digest` | |
| 67 | +| Agents | `Label`<br>`Type`<br>`Version` | |
| 68 | +| Environments | `Label`<br>`Type`<br>`EnvVar` | |
| 69 | + |
| 70 | +> [!NOTE] |
| 71 | +> The script works with the `code/queries.py` module, which contains a set of exhaustive queries and a set of simplified ones. The example uses the simplified queries (which do not extract Environments or Agents) to simplify the output graph. |
| 72 | + |
| 73 | +The script generates: |
| 74 | +* `derivatives/fmriprep/prov/nipype/workflow_provenance_20250314T155959_compacted.jsonld`: a JSON-LD file, which is the serialization of the simplified RDF graph |
| 75 | +* `derivatives/fmriprep/prov/nipype/workflow_provenance_20250314T155959_bidsprov.jsonld`: a BIDS-Prov file created by adapting the previous JSON-LD file to a BIDS-Prov skeleton |
| 76 | +* provenance records split into JSON files `derivatives/fmriprep/prov/prov-fmriprep_*.json` |
| 77 | + |
| 78 | +We can then visualize the BIDS-Prov graph: |
| 79 | +```shell |
| 80 | +pip install bids-prov==0.1.0 |
| 81 | +bids_prov_visualizer --input_file derivatives/fmriprep/prov/nipype/workflow_provenance_20250314T155959_bidsprov.jsonld --output_file derivatives/fmriprep/prov/nipype/workflow_provenance_20250314T155959_bidsprov.svg |
| 82 | +``` |
| 83 | + |
| 84 | + |
| 85 | + |
| 86 | +## Storing provenance in sidecar JSONs |
| 87 | + |
| 88 | +We use the `code/split_prov.py` script to create (or complement) sidecar JSON files from Entity records of `derivatives/fmriprep/prov/nipype/workflow_provenance_20250314T155959_bidsprov.jsonld`. |
| 89 | + |
| 90 | +```shell |
| 91 | +python code/split_prov.py -i prov/nipype/workflow_provenance_20250314T155959_bidsprov.jsonld -o . |
| 92 | +``` |
| 93 | + |
| 94 | +This gives the following tree (`code/` and `prov/nipype/` directories are ignored): |
| 95 | + |
| 96 | +``` |
| 97 | +. |
| 98 | +├── prov |
| 99 | +│ ├── prov-fmriprep_act.json |
| 100 | +│ ├── prov-fmriprep_base.json |
| 101 | +│ ├── prov-fmriprep_ent.json |
| 102 | +│ ├── prov-fmriprep_env.json |
| 103 | +│ └── prov-fmriprep_soft.json |
| 104 | +└── sub-001 |
| 105 | + ├── anat |
| 106 | + │ ├── sub-001_T1w_brainmask.json |
| 107 | + │ ├── sub-001_T1w_brainmask.nii.gz |
| 108 | + │ ├── sub-001_T1w_dtissue.json |
| 109 | + │ ├── sub-001_T1w_dtissue.nii.gz |
| 110 | + │ ├── sub-001_T1w_inflated.L.surf.gii |
| 111 | + │ ├── sub-001_T1w_inflated.L.surf.json |
| 112 | + │ ├── sub-001_T1w_inflated.R.surf.gii |
| 113 | + │ ├── sub-001_T1w_inflated.R.surf.json |
| 114 | + │ ├── sub-001_T1w_label-aparcaseg_roi.json |
| 115 | + │ ├── sub-001_T1w_label-aparcaseg_roi.nii.gz |
| 116 | + │ ├── sub-001_T1w_label-aseg_roi.json |
| 117 | + │ ├── sub-001_T1w_label-aseg_roi.nii.gz |
| 118 | + │ ├── sub-001_T1w_midthickness.L.surf.gii |
| 119 | + │ ├── sub-001_T1w_midthickness.L.surf.json |
| 120 | + │ ├── sub-001_T1w_midthickness.R.surf.gii |
| 121 | + │ ├── sub-001_T1w_midthickness.R.surf.json |
| 122 | + │ ├── sub-001_T1w_pial.L.surf.gii |
| 123 | + │ ├── sub-001_T1w_pial.L.surf.json |
| 124 | + │ ├── sub-001_T1w_pial.R.surf.gii |
| 125 | + │ ├── sub-001_T1w_pial.R.surf.json |
| 126 | + │ ├── sub-001_T1w_preproc.json |
| 127 | + │ ├── sub-001_T1w_preproc.nii.gz |
| 128 | + │ ├── sub-001_T1w_smoothwm.L.surf.gii |
| 129 | + │ ├── sub-001_T1w_smoothwm.L.surf.json |
| 130 | + │ ├── sub-001_T1w_smoothwm.R.surf.gii |
| 131 | + │ ├── sub-001_T1w_smoothwm.R.surf.json |
| 132 | + │ ├── sub-001_T1w_space-MNI152NLin2009cAsym_brainmask.json |
| 133 | + │ ├── sub-001_T1w_space-MNI152NLin2009cAsym_brainmask.nii.gz |
| 134 | + │ ├── sub-001_T1w_space-MNI152NLin2009cAsym_dtissue.json |
| 135 | + │ ├── sub-001_T1w_space-MNI152NLin2009cAsym_dtissue.nii.gz |
| 136 | + │ ├── sub-001_T1w_space-MNI152NLin2009cAsym_preproc.json |
| 137 | + │ ├── sub-001_T1w_space-MNI152NLin2009cAsym_preproc.nii.gz |
| 138 | + │ ├── sub-001_T1w_space-MNI152NLin2009cAsym_target-T1w_warp.h5 |
| 139 | + │ ├── sub-001_T1w_space-MNI152NLin2009cAsym_target-T1w_warp.json |
| 140 | + │ ├── sub-001_T1w_space-orig_target-T1w_affine.json |
| 141 | + │ ├── sub-001_T1w_space-orig_target-T1w_affine.txt |
| 142 | + │ ├── sub-001_T1w_target-fsnative_affine.json |
| 143 | + │ ├── sub-001_T1w_target-fsnative_affine.txt |
| 144 | + │ ├── sub-001_T1w_target-MNI152NLin2009cAsym_warp.h5 |
| 145 | + │ └── sub-001_T1w_target-MNI152NLin2009cAsym_warp.json |
| 146 | + └── func |
| 147 | + ├── sub-001_task-MGT_run-01_bold_confounds.json |
| 148 | + ├── sub-001_task-MGT_run-01_bold_confounds.tsv |
| 149 | + ├── sub-001_task-MGT_run-01_bold_space-fsaverage5.L.func.gii |
| 150 | + ├── sub-001_task-MGT_run-01_bold_space-fsaverage5.L.func.json |
| 151 | + ├── sub-001_task-MGT_run-01_bold_space-fsaverage5.R.func.gii |
| 152 | + ├── sub-001_task-MGT_run-01_bold_space-fsaverage5.R.func.json |
| 153 | + ├── sub-001_task-MGT_run-01_bold_space-MNI152NLin2009cAsym_brainmask.json |
| 154 | + ├── sub-001_task-MGT_run-01_bold_space-MNI152NLin2009cAsym_brainmask.nii.gz |
| 155 | + ├── sub-001_task-MGT_run-01_bold_space-MNI152NLin2009cAsym_preproc.json |
| 156 | + ├── sub-001_task-MGT_run-01_bold_space-MNI152NLin2009cAsym_preproc.nii.gz |
| 157 | + ├── sub-001_task-MGT_run-01_bold_space-T1w_label-aparcaseg_roi.json |
| 158 | + ├── sub-001_task-MGT_run-01_bold_space-T1w_label-aparcaseg_roi.nii.gz |
| 159 | + ├── sub-001_task-MGT_run-01_bold_space-T1w_label-aseg_roi.json |
| 160 | + ├── sub-001_task-MGT_run-01_bold_space-T1w_label-aseg_roi.nii.gz |
| 161 | + ├── sub-001_task-MGT_run-02_bold_confounds.json |
| 162 | + ├── sub-001_task-MGT_run-02_bold_confounds.tsv |
| 163 | + ├── sub-001_task-MGT_run-02_bold_space-fsaverage5.L.func.gii |
| 164 | + ├── sub-001_task-MGT_run-02_bold_space-fsaverage5.L.func.json |
| 165 | + ├── sub-001_task-MGT_run-02_bold_space-fsaverage5.R.func.gii |
| 166 | + ├── sub-001_task-MGT_run-02_bold_space-fsaverage5.R.func.json |
| 167 | + ├── sub-001_task-MGT_run-02_bold_space-MNI152NLin2009cAsym_brainmask.json |
| 168 | + ├── sub-001_task-MGT_run-02_bold_space-MNI152NLin2009cAsym_brainmask.nii.gz |
| 169 | + ├── sub-001_task-MGT_run-02_bold_space-MNI152NLin2009cAsym_preproc.json |
| 170 | + ├── sub-001_task-MGT_run-02_bold_space-MNI152NLin2009cAsym_preproc.nii.gz |
| 171 | + ├── sub-001_task-MGT_run-02_bold_space-T1w_label-aparcaseg_roi.json |
| 172 | + ├── sub-001_task-MGT_run-02_bold_space-T1w_label-aparcaseg_roi.nii.gz |
| 173 | + ├── sub-001_task-MGT_run-02_bold_space-T1w_label-aseg_roi.json |
| 174 | + ├── sub-001_task-MGT_run-02_bold_space-T1w_label-aseg_roi.nii.gz |
| 175 | + ├── sub-001_task-MGT_run-03_bold_confounds.json |
| 176 | + ├── sub-001_task-MGT_run-03_bold_confounds.tsv |
| 177 | + ├── sub-001_task-MGT_run-03_bold_space-fsaverage5.L.func.gii |
| 178 | + ├── sub-001_task-MGT_run-03_bold_space-fsaverage5.L.func.json |
| 179 | + ├── sub-001_task-MGT_run-03_bold_space-fsaverage5.R.func.gii |
| 180 | + ├── sub-001_task-MGT_run-03_bold_space-fsaverage5.R.func.json |
| 181 | + ├── sub-001_task-MGT_run-03_bold_space-MNI152NLin2009cAsym_brainmask.json |
| 182 | + ├── sub-001_task-MGT_run-03_bold_space-MNI152NLin2009cAsym_brainmask.nii.gz |
| 183 | + ├── sub-001_task-MGT_run-03_bold_space-MNI152NLin2009cAsym_preproc.json |
| 184 | + ├── sub-001_task-MGT_run-03_bold_space-MNI152NLin2009cAsym_preproc.nii.gz |
| 185 | + ├── sub-001_task-MGT_run-03_bold_space-T1w_label-aparcaseg_roi.json |
| 186 | + ├── sub-001_task-MGT_run-03_bold_space-T1w_label-aparcaseg_roi.nii.gz |
| 187 | + ├── sub-001_task-MGT_run-03_bold_space-T1w_label-aseg_roi.json |
| 188 | + ├── sub-001_task-MGT_run-03_bold_space-T1w_label-aseg_roi.nii.gz |
| 189 | + ├── sub-001_task-MGT_run-04_bold_confounds.json |
| 190 | + ├── sub-001_task-MGT_run-04_bold_confounds.tsv |
| 191 | + ├── sub-001_task-MGT_run-04_bold_space-fsaverage5.L.func.gii |
| 192 | + ├── sub-001_task-MGT_run-04_bold_space-fsaverage5.L.func.json |
| 193 | + ├── sub-001_task-MGT_run-04_bold_space-fsaverage5.R.func.gii |
| 194 | + ├── sub-001_task-MGT_run-04_bold_space-fsaverage5.R.func.json |
| 195 | + ├── sub-001_task-MGT_run-04_bold_space-MNI152NLin2009cAsym_brainmask.json |
| 196 | + ├── sub-001_task-MGT_run-04_bold_space-MNI152NLin2009cAsym_brainmask.nii.gz |
| 197 | + ├── sub-001_task-MGT_run-04_bold_space-MNI152NLin2009cAsym_preproc.json |
| 198 | + ├── sub-001_task-MGT_run-04_bold_space-MNI152NLin2009cAsym_preproc.nii.gz |
| 199 | + ├── sub-001_task-MGT_run-04_bold_space-T1w_label-aparcaseg_roi.json |
| 200 | + ├── sub-001_task-MGT_run-04_bold_space-T1w_label-aparcaseg_roi.nii.gz |
| 201 | + ├── sub-001_task-MGT_run-04_bold_space-T1w_label-aseg_roi.json |
| 202 | + └── sub-001_task-MGT_run-04_bold_space-T1w_label-aseg_roi.nii.gz |
| 203 | +``` |
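In the tree above, each sidecar sits next to its data file and shares its name, with the data extension replaced by `.json`. The sketch below illustrates that naming rule only. `sidecar_path` is a hypothetical helper, not the actual logic of `code/split_prov.py`, and it assumes `.nii.gz` is the only two-part extension (every other file just loses its final suffix).

```python
from pathlib import PurePosixPath


def sidecar_path(data_file):
    """Return the sidecar JSON path for a data file (hypothetical helper).

    Assumption: ".nii.gz" is the only multi-part extension; other files
    (".gii", ".h5", ".tsv", ".txt") only lose their last suffix.
    """
    path = PurePosixPath(data_file)
    name = path.name
    if name.endswith(".nii.gz"):
        stem = name[: -len(".nii.gz")]
    else:
        stem = path.stem
    # Keep the directory, swap the extension for ".json"
    return str(path.with_name(stem + ".json"))
```

For instance, `sidecar_path("sub-001/anat/sub-001_T1w_inflated.L.surf.gii")` gives `sub-001/anat/sub-001_T1w_inflated.L.surf.json`, matching the pair listed in the tree.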
| 204 | + |
| 205 | +### Limitations |
| 206 | + |
| 207 | +* For now, we use a simplified description of the provenance, leaving aside software and environments as well as keys such as `Digest`, `Version`, `EnvVar`, `StartedAtTime`, `EndedAtTime`. |
| 208 | +* Some entities end up with several labels and locations, for example: |
| 209 | +```JSON-LD |
| 210 | +{ |
| 211 | + "Id": "http://iri.nidash.org/262c247816c9fc071309a1da8bad277d", |
| 212 | + "Type": "Entities", |
| 213 | + "Label": [ |
| 214 | + "file://b330d9dac87a/scratch/fmriprep_wf/single_subject_001_wf/func_preproc_task_MGT_run_03_wf/sdc_wf/phdiff_wf/demean/sub-001_phasediff_rads_unwrapped_filt_demean.nii.gz", |
| 215 | + "file://b330d9dac87a/scratch/fmriprep_wf/single_subject_001_wf/func_preproc_task_MGT_run_02_wf/sdc_wf/phdiff_wf/demean/sub-001_phasediff_rads_unwrapped_filt_demean.nii.gz" |
| 216 | + ], |
| 217 | + "Atlocation": [ |
| 218 | + "file://b330d9dac87a/scratch/fmriprep_wf/single_subject_001_wf/func_preproc_task_MGT_run_03_wf/sdc_wf/phdiff_wf/demean/sub-001_phasediff_rads_unwrapped_filt_demean.nii.gz", |
| 219 | + "file://b330d9dac87a/scratch/fmriprep_wf/single_subject_001_wf/func_preproc_task_MGT_run_02_wf/sdc_wf/phdiff_wf/demean/sub-001_phasediff_rads_unwrapped_filt_demean.nii.gz" |
| 220 | + ], |
| 221 | + "https://github.com/bids-standard/BEP028_BIDSprov/terms/Digest": "sha512:c585500ee6565b5e8277e3cf72dcdef81768439e7998c258d9e3cfc4042cf2d3fa80ecd359400deda90a4ed141e3180b78a942b32827bd41fb0ca367c8f91c9c" |
| 222 | +} |
| 223 | +``` |
| 224 | +* As a result of the previous point, we are not able to fully replace these Entities from the `prov/prov-fmriprep_ent.json` file with a `GeneratedBy` field inside a sidecar JSON. |
| 225 | +* Some terms are missing from the BIDS-Prov context although they are in the specification (such as `Digest`, `Version`, `EnvVar`). |
| 226 | +* For now, the conversion script is not able to transform RDF triples into dictionaries, as required for `Digest` or `EnvVar` objects. |
| 227 | +* IRIs are not human-readable enough (e.g.: `http://iri.nidash.org/262c247816c9fc071309a1da8bad277d`). |
| 228 | +* Some "Function" and other activity nodes Use and Generate the same entity. Does this really mean that they read and write the same file? |
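To gauge how many records are affected by the multiple labels and locations issue listed above, the entity records can be scanned for list-valued fields. This is a sketch under assumptions: `multi_valued_entities` is a hypothetical helper, the records are assumed to be plain dictionaries shaped like the example above, and both the `Atlocation` spelling from that example and the `AtLocation` spelling from the relations table are checked.

```python
def multi_valued_entities(entities, keys=("Label", "AtLocation", "Atlocation")):
    """Return the Ids of entity records where one of `keys` holds several values.

    Hypothetical helper; `entities` is assumed to be a list of dictionaries
    with an "Id" key, as in the JSON-LD example above.
    """
    flagged = []
    for record in entities:
        for key in keys:
            value = record.get(key)
            # A list with more than one item means the entity is ambiguous
            if isinstance(value, list) and len(value) > 1:
                flagged.append(record.get("Id"))
                break
    return flagged
```

Entities returned by this function are the ones that cannot yet be mapped to a single sidecar JSON.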
| 229 | + |
| 230 | +### Next steps |
| 231 | + |
| 232 | +* how to represent entities with two labels and locations? |
| 233 | +* then, use file names for Ids of entities |
| 234 | +* make extractions based on consistent use of qualifiedUsage and qualifiedGeneration (vs. Used and GeneratedBy) |
| 235 | +* investigate activities Using and Generating the same file (e.g.: `http://iri.nidash.org/4650c7ac00df11f0992d72ca464e997e` with entity `http://iri.nidash.org/72737575a38dda35b8ab6530a55aa543` which is `file://b330d9dac87a/data/sub-001/anat/sub-001_T1w.nii.gz`) |