Description
Introduction
In this section about modality-agnostic phenotypic data, there is a sentence that I don't agree with (after preparing A LOT of phenotypic data):
... the entries of [the participant_id] column MUST correspond to the subjects in the BIDS dataset and participants.tsv file.
In our case, the phenotype/ TSV data contain more subjects than the MEG and MRI participant data combined, because many subjects were screened out before MRI or MEG data collection (their phenotypic data can still be shared). Our participants.tsv file contained every subject across all phenotype/ TSV data, so reading the sentence above, I would think we are in compliance except for the "BIDS dataset and participants.tsv file" part of that sentence.
I discovered this behavior by way of running the validator, but I don't think my proposal is a particularly difficult one. I propose one of three changes. They are listed in my order of preference, so I would prefer Option A over the others.
Option A: Inclusive OR/Union
Change the sentence explicitly to accept a union (an inclusive OR) of both the BIDS dataset and the participants.tsv file:
... the entries of [the participant_id] column MUST correspond to the subjects in the BIDS dataset unioned with (inclusive or) the participants.tsv file.
This way phenotype/ TSV data can correspond to subjects in either participants.tsv or the BIDS dataset, all-inclusive.
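As a rough sketch of the check Option A implies (an illustration only, not the validator's actual implementation): every participant_id in a phenotype/ TSV would need to appear in the union of the dataset's subjects and the participants.tsv entries. The function name and the example IDs below are hypothetical.

```python
def unmatched_under_union(phenotype_ids, dataset_subjects, participants_tsv_ids):
    """Return phenotype IDs found in neither the dataset nor participants.tsv."""
    # Option A: an ID is acceptable if it appears in either source.
    allowed = set(dataset_subjects) | set(participants_tsv_ids)
    return sorted(set(phenotype_ids) - allowed)


# Hypothetical example: sub-03 was screened out before imaging, so it has
# no subject directory but is still listed in participants.tsv.
dataset_subjects = ["sub-01", "sub-02"]
participants_tsv_ids = ["sub-01", "sub-02", "sub-03"]
phenotype_ids = ["sub-01", "sub-02", "sub-03"]

print(unmatched_under_union(phenotype_ids, dataset_subjects, participants_tsv_ids))
```

An empty result means every phenotype entry is accounted for under the union rule.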
Option B: Exclusive participants.tsv
Change the sentence to accept only the participants.tsv and ignore the BIDS dataset:
... the entries of [the participant_id] column MUST correspond to the subjects in the participants.tsv file.
This is a little stricter: it requires that if phenotype/ TSV data is prepared, then a valid participants.tsv is prepared as well.
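A minimal sketch of how Option B might be checked, assuming the rule reduces to a subset test against participants.tsv alone (the function name and IDs are hypothetical, and this is not the validator's code):

```python
def unmatched_under_participants_only(phenotype_ids, participants_tsv_ids):
    """Return phenotype IDs that are missing from participants.tsv."""
    # Option B: the subject directories in the dataset are not consulted.
    return sorted(set(phenotype_ids) - set(participants_tsv_ids))


# Hypothetical example: sub-03 was screened out but is listed in
# participants.tsv, while sub-04 was never added to participants.tsv.
participants_tsv_ids = ["sub-01", "sub-02", "sub-03"]

print(unmatched_under_participants_only(["sub-01", "sub-03"], participants_tsv_ids))
print(unmatched_under_participants_only(["sub-01", "sub-04"], participants_tsv_ids))
```

The first call reports nothing, while the second flags sub-04, which is the case where phenotype data exists without a matching participants.tsv row.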
Option C: Soften requirement
Change the sentence to make it a soft and optional thing rather than a hard and rigid rule:
... the entries of [the participant_id] column SHOULD correspond to the subjects in the BIDS dataset and participants.tsv file.
I don't like this option because it is less prescriptive and could be more confusing to someone receiving a BIDS dataset, but it is an option. This would imply that phenotype/ TSV data could be out of sync with the BIDS dataset and participants.tsv, and the validator would only WARN about the mismatch instead of raising an ERROR.
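To illustrate the difference between MUST and SHOULD, here is a hedged sketch in which the same mismatch is reported with a different severity depending on the keyword. It assumes "the BIDS dataset and participants.tsv" means an ID must appear in both places. That reading, the function name, and the IDs are my assumptions, not the validator's actual behavior.

```python
def check_phenotype_ids(phenotype_ids, dataset_subjects, participants_tsv_ids, keyword="MUST"):
    """Return (severity, participant_id) pairs for IDs that fail the rule.

    Assumption: an ID passes only if it appears in both the dataset's
    subjects and participants.tsv.
    """
    # MUST violations are errors; SHOULD violations are only warnings.
    severity = {"MUST": "ERROR", "SHOULD": "WARNING"}[keyword]
    required = set(dataset_subjects) & set(participants_tsv_ids)
    return [(severity, pid) for pid in sorted(set(phenotype_ids) - required)]


# Hypothetical example: sub-02 has phenotype data but no imaging data.
dataset_subjects = ["sub-01"]
participants_tsv_ids = ["sub-01", "sub-02"]
phenotype_ids = ["sub-01", "sub-02"]

print(check_phenotype_ids(phenotype_ids, dataset_subjects, participants_tsv_ids, "MUST"))
print(check_phenotype_ids(phenotype_ids, dataset_subjects, participants_tsv_ids, "SHOULD"))
```

The flagged subject is the same in both calls; only the severity attached to it changes.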
References
For more on this conversation, you can see my NeuroStars post about this. Thank you all for reading, your consideration, and the great work being done here!