Description
This issue is open to keep track of the discussions, decisions and remaining points about BIDS URIs, their lack of uniqueness and their use as Identifiers for ProvEntities in the BIDS-Prov BEP.
@bclenet, @yarikoptic, @satra: I hope this description captures where we stand. If you have comments, or if you identify more drawbacks to include in the "problem statement", please let me know.
Problem statement
The BIDS-Prov specification assumes that the identifier of a given file in the BIDS hierarchy is the BIDS URI of that file. Yet BIDS URIs are not unique, which can create issues when recording provenance over time.
In particular:
- A BIDS file that is modified but keeps the same filename (and path) will have a different identifier at different points in time (see example use-case below)
Example use-case
Let's say that a file named sub-01_T1w.nii.gz is created by activity Activity01 and that this file is modified by another activity Activity02.
Behaviour with current specification
Provenance recorded after both activities
If provenance is recorded after both activities have ended then it will look like this:
In sub-01_T1w.json we'll have the link to the last activity that led to the creation of this file:
[...] GeneratedBy: "bids::prov#Activity-abcd02" [...]
In prov/prov-myprocess_act.json, we'll have the description of both activities:
{
  "Activities": [
    {
      "Id": "bids::prov#Activity-abcd02",
      "Label": "A fix"
    },
    {
      "Id": "bids::prov#Activity-abcd01",
      "Label": "Dicom to Nifti conversion",
      "Used": "bids::prov#provEntity-abcd02"
    }
  ]
}
and the first version of the sub-01_T1w.json (which no longer exists) will be described in prov/prov-myprocess_ent.json:
{
  "ProvEntities": [
    {
      "Id": "bids::prov#provEntity-abcd02",
      "Label": "sub-01_T1w.json",
      "GeneratedBy": "bids::prov#Activity-abcd01"
    }
  ]
}
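As a quick consistency check on the records above, here is a minimal Python sketch that indexes the activities and entities by Id and reports any GeneratedBy or Used reference that no record describes. The function name unresolved_references is hypothetical, the records are copied inline as dictionaries rather than read from the prov/ files, and each Used / GeneratedBy value is assumed to be a single string as in the example.

```python
def unresolved_references(activities, entities, sidecar):
    """Return the referenced identifiers that no record describes."""
    records = activities + entities
    known = {record["Id"] for record in records}

    # Collect every identifier that is pointed to, starting with the sidecar.
    referenced = [sidecar["GeneratedBy"]]
    for record in records:
        for key in ("Used", "GeneratedBy"):
            if key in record:
                referenced.append(record[key])

    return [ref for ref in referenced if ref not in known]


# Records copied from the example above.
activities = [
    {"Id": "bids::prov#Activity-abcd02", "Label": "A fix"},
    {
        "Id": "bids::prov#Activity-abcd01",
        "Label": "Dicom to Nifti conversion",
        "Used": "bids::prov#provEntity-abcd02",
    },
]
entities = [
    {
        "Id": "bids::prov#provEntity-abcd02",
        "Label": "sub-01_T1w.json",
        "GeneratedBy": "bids::prov#Activity-abcd01",
    },
]
sidecar = {"GeneratedBy": "bids::prov#Activity-abcd02"}

print(unresolved_references(activities, entities, sidecar))  # prints []
```

An empty list means that every identifier used in the sidecar and in the two provenance files points to a described activity or entity.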
Provenance recorded after each activity
Now consider instead that provenance metadata are stored after each activity. After Activity01 has run, Activity01 is described in prov/prov-myprocess_act.json (as before) and referenced in the sidecar JSON:
[...] GeneratedBy: "bids::prov#Activity-abcd01" [...]
Then, after Activity02:
- The sidecar JSON will have to be updated to:
[...] GeneratedBy: "bids::prov#Activity-abcd02" [...]
- The metadata about the old sub-01_T1w.json (which no longer exists) will have to be moved to prov/prov-myprocess_ent.json (to make up the same entity as before).
- Activity02 will be described in prov/prov-myprocess_act.json (same as before).
Drawbacks
Importantly, the identifier of the first version of the file sub-01_T1w.json changes when Activity02 is run (before, it was sub-01_T1w.nii.gz; after, it is bids::prov#provEntity-abcd02), although both identifiers refer to the same file.
Behaviour with unique BIDS URIs (i.e. including fragments)
We envision that in the future, the BIDS specification will be updated so that BIDS URIs are adapted to include a fragment that can be automatically computed from a given BIDS file and its (mandatory) metadata. Then the problem identified above in BIDS-Prov would be solved as a given file in the BIDS hierarchy would keep its identifier (BIDS-URI+fragment) regardless of whether or not it is currently available in the BIDS dataset.
About defining unique fragments for the URI of a given BIDS file
The question of how to define a unique fragment of a BIDS file is challenging because:
- By definition, the fragment has to be unique to the content of a given file.
- The fragment cannot rely on optional metadata (as those may be absent or present while the BIDS file is still the same).
- If the fragment relies on the content of the BIDS file (e.g. using a shasum), then the fragment of a given version can no longer be computed after the BIDS file is modified. This means that, in the example above, a provenance engine would have to move the provenance (from the sidecar JSON to a provenance file in the prov folder) of all BIDS files that are about to be modified by an upcoming activity before the activity happens, so that it can compute the BIDS-URI+fragment of those files.
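The ordering constraint in the last point can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the fragment is the SHA-256 digest of the file content, written as `#sha256-<digest>`. Neither the function name bids_uri_with_fragment nor this fragment syntax is defined by BIDS or BIDS-Prov, and the directory sub-01/anat/ is an assumed location for the file.

```python
import hashlib
import tempfile
from pathlib import Path


def bids_uri_with_fragment(dataset_root, relative_path):
    """Build a BIDS URI plus a content-derived fragment (hypothetical syntax)."""
    content = (Path(dataset_root) / relative_path).read_bytes()
    digest = hashlib.sha256(content).hexdigest()
    return f"bids::{relative_path}#sha256-{digest}"


with tempfile.TemporaryDirectory() as root:
    relative_path = "sub-01/anat/sub-01_T1w.nii.gz"  # assumed location
    target = Path(root) / relative_path
    target.parent.mkdir(parents=True)

    # Activity01 creates the file (placeholder bytes stand in for real data).
    target.write_bytes(b"first version")

    # The identifier of the first version has to be computed now,
    # before Activity02 overwrites the content it is derived from.
    id_before = bids_uri_with_fragment(root, relative_path)

    # Activity02 modifies the file in place (same path, new content).
    target.write_bytes(b"second version")
    id_after = bids_uri_with_fragment(root, relative_path)

print(id_before != id_after)  # prints True
```

Once the second write has happened, the first identifier can only be recovered if it was stored beforehand, which is why the provenance engine would need to act before the activity runs.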
Conclusion
For now, we leave it to (a future version of) the BIDS specification to specify how to compute unique BIDS URIs with fragments and we consider this problem to be out of the scope of the BIDS-Prov BEP.