Commit dcee2e8

Merge pull request bids-standard#156 from bclenet/fmriprep_example
[Example] provenance of fMRIPrep derivatives
2 parents d2e49ea + 638a294 commit dcee2e8

File tree

86 files changed: +233214 -4 lines changed


.codespellrc

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,4 +1,4 @@
 [codespell]
-skip = .git,*.pdf,*.svg,venvs,./examples/from_parsers
+skip = .git,*.pdf,*.svg,venvs,./examples/from_parsers,*.trig
 #
 # ignore-words-list =
```

.gitmodules

Lines changed: 4 additions & 0 deletions
```diff
@@ -1,3 +1,7 @@
+[submodule "examples/fmriprep/ds001734"]
+	path = examples/fmriprep/ds001734
+	url = git@github.com:OpenNeuroDatasets/ds001734.git
+	datalad-id = ca05bc10-29a0-11e9-9a7b-0242ac13000d
 [submodule "examples/heudiconv/sourcedata/hirni-demo"]
 	path = examples/heudiconv/sourcedata/hirni-demo
 	url = https://github.com/psychoinformatics-de/hirni-demo.git
```

bids_prov/visualize.py

Lines changed: 3 additions & 1 deletion
```diff
@@ -28,7 +28,9 @@ def viz_turtle(content=None, img_file=None, source=None, **kwargs) -> None:
     # TODO : show attributes has optional arg
     dot = prov_to_dot(prov_doc, use_labels=True,
                       show_element_attributes=False, show_relation_attributes=False)
-    dot.write_png(img_file)
+    # dot.write_png(img_file)
+    with open(img_file, 'wb') as file:
+        file.write(dot.create_svg())
 
 
 def viz_jsonld11(jsonld11: dict, img_file: str) -> None:
```

examples/fmriprep/README.md

Lines changed: 235 additions & 0 deletions

# A `fMRIPrep` example for BIDS-Prov

This example shows provenance records for the [fMRIPrep](https://fmriprep.org/en/23.1.3/index.html) preprocessing software, as a typical use case of how to store provenance inside a BIDS derivatives dataset.

> [!NOTE]
> The command lines described in this documentation are meant to be run from the `examples/fmriprep/` directory.

> [!WARNING]
> This example needs the BIDS-Prov visualizer to export `.svg` files instead of `.png` files (whose resolution is far too low to display a whole `fMRIPrep` graph), hence the modifications to `bids_prov/visualize.py`.

## Source dataset

We use the dataset from https://openneuro.org/datasets/ds001734/versions/1.0.5, containing raw and preprocessed fMRI data of two versions of the mixed gambles task, from the Neuroimaging Analysis Replication and Prediction Study (NARPS).

```shell
datalad install https://github.com/OpenNeuroDatasets/ds001734.git
git submodule add https://github.com/OpenNeuroDatasets/ds001734.git ds001734
cd ds001734
datalad get sub-001/*
```

## `fMRIPrep` installation

```shell
pip install fmriprep-docker==1.1.4
docker pull poldracklab/fmriprep:1.1.4
mkdir derivatives/
```

## Getting provenance records from nipype

Create a `nipype.cfg` file to set up provenance recording in nipype. The file contains the following lines:
```
[execution]
write_provenance = true
hash_method = content
```
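
As an optional sanity check (assuming `nipype` is also installed in your local Python environment), you can ask nipype for the values it picked up; inside the container, `fmriprep-docker` mounts the same file as `/root/.nipype/nipype.cfg`:

```python
# Optional check: nipype reads nipype.cfg from the working directory (and from
# ~/.nipype/), so both provenance options set above should be reported here.
from nipype import config

print(config.get('execution', 'write_provenance'))  # expected: 'true'
print(config.get('execution', 'hash_method'))        # expected: 'content'
```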

Launch `fMRIPrep` on one subject (sub-001):
```shell
fmriprep-docker --participant-label=001 --fs-license-file=freesurfer_license.txt --config=nipype.cfg -w=derivatives/work/ ds001734/ derivatives/ participant
```

> [!NOTE]
> This launches the following command line:
> ```shell
> docker run --rm -it -v <absolute_path_to>/freesurfer_license.txt:/opt/freesurfer/license.txt:ro -v <absolute_path_to>/nipype.cfg:/root/.nipype/nipype.cfg:ro -v <absolute_path_to>/ds001734/:/data:ro -v <absolute_path_to>/derivatives/:/out -v <absolute_path_to>/derivatives/work:/scratch poldracklab/fmriprep:1.1.4 /data /out participant --participant-label=001 -w /scratch
> ```

## Converting nipype provenance to BIDS-Prov

Nipype generates RDF provenance records in TriG format, as contained in `derivatives/fmriprep/prov/nipype/workflow_provenance_20250314T155959.trig`.
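
If you want a quick look at these records before converting them, they can be loaded with `rdflib` (a minimal sketch; the path assumes you are in `examples/fmriprep/`):

```python
# Load the nipype provenance file and count the recorded statements.
from rdflib import Dataset

nipype_prov = Dataset()
nipype_prov.parse(
    'derivatives/fmriprep/prov/nipype/workflow_provenance_20250314T155959.trig',
    format='trig')
print(len(list(nipype_prov.quads())))  # number of (subject, predicate, object, graph) statements
```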

We use the `code/convert_prov.py` script to convert it to BIDS-Prov compliant provenance:

```shell
cd derivatives/fmriprep/
python code/convert_prov.py
```

This script performs SPARQL queries to extract a simplified version of the RDF graph, containing activities, entities, environments and agents with the following relations:

| Record | Relations |
| --- | --- |
| Activities | `Label`<br>`Type`<br>`Command`<br>`AssociatedWith`<br>`Used`<br>`StartedAtTime`<br>`EndedAtTime` |
| Entities | `Label`<br>`AtLocation`<br>`GeneratedBy`<br>`Type`<br>`Digest` |
| Agents | `Label`<br>`Type`<br>`Version` |
| Environments | `Label`<br>`Type`<br>`EnvVar` |

> [!NOTE]
> The script works with the `code/queries.py` module, which contains a set of exhaustive queries and a set of simplified ones. The example uses the simplified queries (which do not extract Environments or Agents) to keep the output graph simple.
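
To give an idea of what such a query looks like, here is a minimal sketch of a simplified Activities query (illustrative only; the exact predicates and BIDS-Prov terms used in `code/queries.py` may differ). The `prov:` and `rdfs:` prefixes are bound via `initNs` when the query is prepared in `code/convert_prov.py`:

```python
# Illustrative sketch of a CONSTRUCT query pulling Activities and some of their
# relations out of the nipype graph (not the actual queries.py content).
ACTIVITIES_QUERY = """
CONSTRUCT {
    ?activity a prov:Activity ;
        rdfs:label ?label ;
        prov:startedAtTime ?start ;
        prov:endedAtTime ?end .
}
WHERE {
    ?activity a prov:Activity ;
        rdfs:label ?label .
    OPTIONAL { ?activity prov:startedAtTime ?start . }
    OPTIONAL { ?activity prov:endedAtTime ?end . }
}
"""
```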

The script generates:
* `derivatives/fmriprep/prov/nipype/workflow_provenance_20250314T155959_compacted.jsonld`: a JSON-LD file, which is the serialization of the simplified RDF graph
* `derivatives/fmriprep/prov/nipype/workflow_provenance_20250314T155959_bidsprov.jsonld`: a BIDS-Prov file created by adapting the previous JSON-LD file to a BIDS-Prov skeleton
* provenance records split into JSON files `derivatives/fmriprep/prov/prov-fmriprep_*.json`

We can then visualize the BIDS-Prov graph:
```shell
pip install bids-prov==0.1.0
bids_prov_visualizer --input_file derivatives/fmriprep/prov/nipype/workflow_provenance_20250314T155959_bidsprov.jsonld --output_file derivatives/fmriprep/prov/nipype/workflow_provenance_20250314T155959_bidsprov.svg
```

![](/examples/fmriprep/derivatives/fmriprep/prov/nipype/workflow_provenance_20250314T155959_bidsprov.svg)

## Storing provenance in sidecar JSONs

We use the `code/split_prov.py` script to create (or complement) sidecar JSON files from the Entity records of `derivatives/fmriprep/prov/nipype/workflow_provenance_20250314T155959_bidsprov.jsonld`.

```shell
python code/split_prov.py -i prov/nipype/workflow_provenance_20250314T155959_bidsprov.jsonld -o .
```
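
Conceptually, the splitting step boils down to something like the following sketch (a simplified illustration of the idea, not the actual `code/split_prov.py` implementation; the path handling and key spellings are assumptions):

```python
# For every Entity that has a single location and a GeneratedBy relation, write
# a sidecar JSON next to the corresponding derivative file.
import json
from pathlib import Path

def write_sidecars(bidsprov_file: str, dataset_root: str) -> None:
    with open(bidsprov_file, encoding='utf-8') as file:
        entities = json.load(file)['Records']['Entities']

    for entity in entities:
        location = entity.get('AtLocation') or entity.get('Atlocation')
        generated_by = entity.get('GeneratedBy')
        if not location or not generated_by or isinstance(location, list):
            continue

        # Hypothetical mapping from the recorded location to a path inside the
        # derivatives dataset; the real script resolves container paths properly.
        target = Path(dataset_root) / Path(location).name
        sidecar = target.with_suffix('').with_suffix('.json')  # foo.nii.gz -> foo.json
        sidecar.write_text(json.dumps({'GeneratedBy': generated_by}, indent=2))
```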

This gives the following tree (`code/` and `prov/nipype/` directories are ignored):

```
.
├── prov
│   ├── prov-fmriprep_act.json
│   ├── prov-fmriprep_base.json
│   ├── prov-fmriprep_ent.json
│   ├── prov-fmriprep_env.json
│   └── prov-fmriprep_soft.json
└── sub-001
    ├── anat
    │   ├── sub-001_T1w_brainmask.json
    │   ├── sub-001_T1w_brainmask.nii.gz
    │   ├── sub-001_T1w_dtissue.json
    │   ├── sub-001_T1w_dtissue.nii.gz
    │   ├── sub-001_T1w_inflated.L.surf.gii
    │   ├── sub-001_T1w_inflated.L.surf.json
    │   ├── sub-001_T1w_inflated.R.surf.gii
    │   ├── sub-001_T1w_inflated.R.surf.json
    │   ├── sub-001_T1w_label-aparcaseg_roi.json
    │   ├── sub-001_T1w_label-aparcaseg_roi.nii.gz
    │   ├── sub-001_T1w_label-aseg_roi.json
    │   ├── sub-001_T1w_label-aseg_roi.nii.gz
    │   ├── sub-001_T1w_midthickness.L.surf.gii
    │   ├── sub-001_T1w_midthickness.L.surf.json
    │   ├── sub-001_T1w_midthickness.R.surf.gii
    │   ├── sub-001_T1w_midthickness.R.surf.json
    │   ├── sub-001_T1w_pial.L.surf.gii
    │   ├── sub-001_T1w_pial.L.surf.json
    │   ├── sub-001_T1w_pial.R.surf.gii
    │   ├── sub-001_T1w_pial.R.surf.json
    │   ├── sub-001_T1w_preproc.json
    │   ├── sub-001_T1w_preproc.nii.gz
    │   ├── sub-001_T1w_smoothwm.L.surf.gii
    │   ├── sub-001_T1w_smoothwm.L.surf.json
    │   ├── sub-001_T1w_smoothwm.R.surf.gii
    │   ├── sub-001_T1w_smoothwm.R.surf.json
    │   ├── sub-001_T1w_space-MNI152NLin2009cAsym_brainmask.json
    │   ├── sub-001_T1w_space-MNI152NLin2009cAsym_brainmask.nii.gz
    │   ├── sub-001_T1w_space-MNI152NLin2009cAsym_dtissue.json
    │   ├── sub-001_T1w_space-MNI152NLin2009cAsym_dtissue.nii.gz
    │   ├── sub-001_T1w_space-MNI152NLin2009cAsym_preproc.json
    │   ├── sub-001_T1w_space-MNI152NLin2009cAsym_preproc.nii.gz
    │   ├── sub-001_T1w_space-MNI152NLin2009cAsym_target-T1w_warp.h5
    │   ├── sub-001_T1w_space-MNI152NLin2009cAsym_target-T1w_warp.json
    │   ├── sub-001_T1w_space-orig_target-T1w_affine.json
    │   ├── sub-001_T1w_space-orig_target-T1w_affine.txt
    │   ├── sub-001_T1w_target-fsnative_affine.json
    │   ├── sub-001_T1w_target-fsnative_affine.txt
    │   ├── sub-001_T1w_target-MNI152NLin2009cAsym_warp.h5
    │   └── sub-001_T1w_target-MNI152NLin2009cAsym_warp.json
    └── func
        ├── sub-001_task-MGT_run-01_bold_confounds.json
        ├── sub-001_task-MGT_run-01_bold_confounds.tsv
        ├── sub-001_task-MGT_run-01_bold_space-fsaverage5.L.func.gii
        ├── sub-001_task-MGT_run-01_bold_space-fsaverage5.L.func.json
        ├── sub-001_task-MGT_run-01_bold_space-fsaverage5.R.func.gii
        ├── sub-001_task-MGT_run-01_bold_space-fsaverage5.R.func.json
        ├── sub-001_task-MGT_run-01_bold_space-MNI152NLin2009cAsym_brainmask.json
        ├── sub-001_task-MGT_run-01_bold_space-MNI152NLin2009cAsym_brainmask.nii.gz
        ├── sub-001_task-MGT_run-01_bold_space-MNI152NLin2009cAsym_preproc.json
        ├── sub-001_task-MGT_run-01_bold_space-MNI152NLin2009cAsym_preproc.nii.gz
        ├── sub-001_task-MGT_run-01_bold_space-T1w_label-aparcaseg_roi.json
        ├── sub-001_task-MGT_run-01_bold_space-T1w_label-aparcaseg_roi.nii.gz
        ├── sub-001_task-MGT_run-01_bold_space-T1w_label-aseg_roi.json
        ├── sub-001_task-MGT_run-01_bold_space-T1w_label-aseg_roi.nii.gz
        ├── sub-001_task-MGT_run-02_bold_confounds.json
        ├── sub-001_task-MGT_run-02_bold_confounds.tsv
        ├── sub-001_task-MGT_run-02_bold_space-fsaverage5.L.func.gii
        ├── sub-001_task-MGT_run-02_bold_space-fsaverage5.L.func.json
        ├── sub-001_task-MGT_run-02_bold_space-fsaverage5.R.func.gii
        ├── sub-001_task-MGT_run-02_bold_space-fsaverage5.R.func.json
        ├── sub-001_task-MGT_run-02_bold_space-MNI152NLin2009cAsym_brainmask.json
        ├── sub-001_task-MGT_run-02_bold_space-MNI152NLin2009cAsym_brainmask.nii.gz
        ├── sub-001_task-MGT_run-02_bold_space-MNI152NLin2009cAsym_preproc.json
        ├── sub-001_task-MGT_run-02_bold_space-MNI152NLin2009cAsym_preproc.nii.gz
        ├── sub-001_task-MGT_run-02_bold_space-T1w_label-aparcaseg_roi.json
        ├── sub-001_task-MGT_run-02_bold_space-T1w_label-aparcaseg_roi.nii.gz
        ├── sub-001_task-MGT_run-02_bold_space-T1w_label-aseg_roi.json
        ├── sub-001_task-MGT_run-02_bold_space-T1w_label-aseg_roi.nii.gz
        ├── sub-001_task-MGT_run-03_bold_confounds.json
        ├── sub-001_task-MGT_run-03_bold_confounds.tsv
        ├── sub-001_task-MGT_run-03_bold_space-fsaverage5.L.func.gii
        ├── sub-001_task-MGT_run-03_bold_space-fsaverage5.L.func.json
        ├── sub-001_task-MGT_run-03_bold_space-fsaverage5.R.func.gii
        ├── sub-001_task-MGT_run-03_bold_space-fsaverage5.R.func.json
        ├── sub-001_task-MGT_run-03_bold_space-MNI152NLin2009cAsym_brainmask.json
        ├── sub-001_task-MGT_run-03_bold_space-MNI152NLin2009cAsym_brainmask.nii.gz
        ├── sub-001_task-MGT_run-03_bold_space-MNI152NLin2009cAsym_preproc.json
        ├── sub-001_task-MGT_run-03_bold_space-MNI152NLin2009cAsym_preproc.nii.gz
        ├── sub-001_task-MGT_run-03_bold_space-T1w_label-aparcaseg_roi.json
        ├── sub-001_task-MGT_run-03_bold_space-T1w_label-aparcaseg_roi.nii.gz
        ├── sub-001_task-MGT_run-03_bold_space-T1w_label-aseg_roi.json
        ├── sub-001_task-MGT_run-03_bold_space-T1w_label-aseg_roi.nii.gz
        ├── sub-001_task-MGT_run-04_bold_confounds.json
        ├── sub-001_task-MGT_run-04_bold_confounds.tsv
        ├── sub-001_task-MGT_run-04_bold_space-fsaverage5.L.func.gii
        ├── sub-001_task-MGT_run-04_bold_space-fsaverage5.L.func.json
        ├── sub-001_task-MGT_run-04_bold_space-fsaverage5.R.func.gii
        ├── sub-001_task-MGT_run-04_bold_space-fsaverage5.R.func.json
        ├── sub-001_task-MGT_run-04_bold_space-MNI152NLin2009cAsym_brainmask.json
        ├── sub-001_task-MGT_run-04_bold_space-MNI152NLin2009cAsym_brainmask.nii.gz
        ├── sub-001_task-MGT_run-04_bold_space-MNI152NLin2009cAsym_preproc.json
        ├── sub-001_task-MGT_run-04_bold_space-MNI152NLin2009cAsym_preproc.nii.gz
        ├── sub-001_task-MGT_run-04_bold_space-T1w_label-aparcaseg_roi.json
        ├── sub-001_task-MGT_run-04_bold_space-T1w_label-aparcaseg_roi.nii.gz
        ├── sub-001_task-MGT_run-04_bold_space-T1w_label-aseg_roi.json
        └── sub-001_task-MGT_run-04_bold_space-T1w_label-aseg_roi.nii.gz
```

### Limitations

* For now, we use a simplified description of the provenance, leaving aside software and environments as well as keys such as `Digest`, `Version`, `EnvVar`, `StartedAtTime`, `EndedAtTime`.
* Some entities end up with several `Label` / `AtLocation` values (see the detection sketch after this list). E.g.:
```JSON-LD
{
    "Id": "http://iri.nidash.org/262c247816c9fc071309a1da8bad277d",
    "Type": "Entities",
    "Label": [
        "file://b330d9dac87a/scratch/fmriprep_wf/single_subject_001_wf/func_preproc_task_MGT_run_03_wf/sdc_wf/phdiff_wf/demean/sub-001_phasediff_rads_unwrapped_filt_demean.nii.gz",
        "file://b330d9dac87a/scratch/fmriprep_wf/single_subject_001_wf/func_preproc_task_MGT_run_02_wf/sdc_wf/phdiff_wf/demean/sub-001_phasediff_rads_unwrapped_filt_demean.nii.gz"
    ],
    "Atlocation": [
        "file://b330d9dac87a/scratch/fmriprep_wf/single_subject_001_wf/func_preproc_task_MGT_run_03_wf/sdc_wf/phdiff_wf/demean/sub-001_phasediff_rads_unwrapped_filt_demean.nii.gz",
        "file://b330d9dac87a/scratch/fmriprep_wf/single_subject_001_wf/func_preproc_task_MGT_run_02_wf/sdc_wf/phdiff_wf/demean/sub-001_phasediff_rads_unwrapped_filt_demean.nii.gz"
    ],
    "https://github.com/bids-standard/BEP028_BIDSprov/terms/Digest": "sha512:c585500ee6565b5e8277e3cf72dcdef81768439e7998c258d9e3cfc4042cf2d3fa80ecd359400deda90a4ed141e3180b78a942b32827bd41fb0ca367c8f91c9c"
}
```
* As a result of the previous point, we are not able to fully replace these Entities from the `prov/prov-fmriprep_ent.json` file by a `GeneratedBy` field inside a sidecar JSON.
* Some terms are missing from the BIDS-Prov context although they are in the specification (such as `Digest`, `Version`, `EnvVar`).
* For now, the conversion script is not able to transform RDF triples into dictionaries, as required for `Digest` or `EnvVar` objects.
* IRIs are not human readable enough (e.g.: `http://iri.nidash.org/262c247816c9fc071309a1da8bad277d`).
* Some "Function" and other activity nodes Use and Generate the same entity. Does this really mean that they read and write the same file?
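
The following sketch (a hypothetical helper, not part of the example code) lists the Entity records that ended up with several labels or locations after conversion:

```python
# List Entities whose Label or AtLocation holds more than one value
# (key spelling follows the compacted output shown above).
import json

with open('prov/prov-fmriprep_ent.json', encoding='utf-8') as file:
    entities = json.load(file)

for entity_id, record in entities.items():
    label = record.get('Label')
    location = record.get('AtLocation') or record.get('Atlocation')
    if isinstance(label, list) or isinstance(location, list):
        print(entity_id, label)
```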

### Next steps

* How to represent entities with two labels and locations?
* Then, use file names for the Ids of entities.
* Make extractions based on consistent use of qualifiedUsage and qualifiedGeneration (vs. Used and GeneratedBy).
* Investigate activities Using and Generating the same file (e.g.: `http://iri.nidash.org/4650c7ac00df11f0992d72ca464e997e` with entity `http://iri.nidash.org/72737575a38dda35b8ab6530a55aa543`, which is `file://b330d9dac87a/data/sub-001/anat/sub-001_T1w.nii.gz`).

examples/fmriprep/derivatives/fmriprep/code/convert_prov.py

Lines changed: 102 additions & 0 deletions

```python
#!/usr/bin/python
# coding: utf-8

""" Convert nipype provenance traces into one BIDS-Prov compliant JSON-LD graph """

import json
from pyld import jsonld
from rdflib import Dataset, Graph, Namespace
from rdflib.namespace import RDF, RDFS, PROV
from rdflib.plugins.sparql import prepareQuery

from queries import queries, simple_queries

# Dict of namespaces to be used in queries
NAMESPACES = {
    'rdfs': RDFS,
    'rdf': RDF,
    'prov': PROV,
    'nipype': Namespace("http://nipy.org/nipype/terms/"),
    'niiri': Namespace("http://iri.nidash.org/"),
    'crypto': Namespace("http://id.loc.gov/vocabulary/preservation/cryptographicHashFunctions/"),
    'bidsprov': Namespace("https://github.com/bids-standard/BEP028_BIDSprov/terms/")
}

# Parse the nipype RDF provenance file
# We use Dataset as there might be several graphs in the file
nipype_prov = Dataset()
nipype_prov.parse('prov/nipype/workflow_provenance_20250314T155959.trig', format='trig')


# Create an empty graph for output provenance
bids_prov = Graph()

# Query input graphs
for label, query in simple_queries.items():

    for graph in nipype_prov.graphs():
        q = prepareQuery(query, initNs=NAMESPACES)
        queried_graph = graph.query(q)

        if len(queried_graph) > 0:
            bids_prov += queried_graph

# Serialize output graph to JSON-LD and compact
compacted = jsonld.compact(
    json.loads(bids_prov.serialize(format='json-ld')),
    'https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json'
)

# Write compacted JSON-LD
with open('prov/nipype/workflow_provenance_20250314T155959_compacted.jsonld', 'w', encoding='utf-8') as file:
    file.write(json.dumps(compacted, indent=2))

# Merge records into a BIDS-Prov skeleton and write separated JSON files
bids_prov_skeleton = {
    "@context": "https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json",
    "BIDSProvVersion": "0.0.1",
    "Records": {
        "Software": [],
        "Activities": [],
        "Entities": [],
        "Environments": []
    }
}
software_records = {}
activities_records = {}
entities_records = {}
environments_records = {}

for record in compacted['@graph']:

    record_without_id = record.copy()
    record_without_id.pop('Id')

    if 'Type' not in record:
        continue
    if record['Type'] == 'Software':
        bids_prov_skeleton['Records']['Software'].append(record)
        software_records[record['Id']] = record_without_id
    elif record['Type'] == 'Activities':
        bids_prov_skeleton['Records']['Activities'].append(record)
        activities_records[record['Id']] = record_without_id
    elif 'Environment' in record['Type']:
        bids_prov_skeleton['Records']['Environments'].append(record)
        environments_records[record['Id']] = record_without_id
    else:
        bids_prov_skeleton['Records']['Entities'].append(record)
        entities_records[record['Id']] = record_without_id

# Write BIDS-Prov JSON-LD
with open('prov/nipype/workflow_provenance_20250314T155959_bidsprov.jsonld', 'w', encoding='utf-8') as file:
    file.write(json.dumps(bids_prov_skeleton, indent=2))

# Write split JSONs
with open('prov/prov-fmriprep_soft.json', 'w', encoding='utf-8') as file:
    file.write(json.dumps(software_records, indent=2))
with open('prov/prov-fmriprep_act.json', 'w', encoding='utf-8') as file:
    file.write(json.dumps(activities_records, indent=2))
with open('prov/prov-fmriprep_env.json', 'w', encoding='utf-8') as file:
    file.write(json.dumps(environments_records, indent=2))
with open('prov/prov-fmriprep_ent.json', 'w', encoding='utf-8') as file:
    file.write(json.dumps(entities_records, indent=2))
```
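
After running the conversion, a quick way to inspect the result (a small sketch, assuming the output files above were written and you are in `derivatives/fmriprep/`) is to count the records per type:

```python
# Count the records collected in the BIDS-Prov JSON-LD output.
import json

with open('prov/nipype/workflow_provenance_20250314T155959_bidsprov.jsonld', encoding='utf-8') as file:
    records = json.load(file)['Records']

for record_type, items in records.items():
    print(record_type, len(items))
```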
