
Extra participants in phenotype TSVs but not in dataset folders should not be an ERROR in validation #914

Open
@ericearl

Description


Introduction

In the section of the specification about modality-agnostic phenotypic data, there is a sentence that I don't agree with (after preparing A LOT of phenotypic data):

... the entries of [the participant_id] column MUST correspond to the subjects in the BIDS dataset and participants.tsv file.

In our case, the phenotype/ TSV data include more subjects who were screened out before MRI or MEG data collection (but whose data can still be shared) than there are MRI and MEG participants combined. Our participants.tsv file lists every subject across all phenotype/ TSV data, so by the sentence above we would be compliant except for the "BIDS dataset and participants.tsv file" part: the screened-out subjects appear in participants.tsv but have no subject folders in the BIDS dataset.
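To make the conflict concrete, here is a minimal sketch of the strict reading of the current rule, in which every participant_id in a phenotype/ TSV must appear both as a subject folder and in participants.tsv. The function name and the example IDs are hypothetical, and this is not the bids-validator's actual code:

```python
def strict_rule_violations(phenotype_ids, folder_ids, participants_ids):
    """Return phenotype/ IDs that fail the strict reading of the current rule,
    where each ID must appear in BOTH the subject folders AND participants.tsv."""
    allowed = set(folder_ids) & set(participants_ids)  # set intersection
    return sorted(set(phenotype_ids) - allowed)


# Hypothetical example: sub-03 was screened out before MRI/MEG collection,
# so it has phenotype data and a participants.tsv row but no sub-03/ folder.
folders = {"sub-01", "sub-02"}
participants = {"sub-01", "sub-02", "sub-03"}
phenotype = {"sub-01", "sub-02", "sub-03"}

print(strict_rule_violations(phenotype, folders, participants))  # ['sub-03']
```

Under this reading, a screened-out subject is flagged even though participants.tsv lists it.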

I discovered this behavior when running the validator, which reports these extra phenotype/ subjects as an ERROR. I don't think my proposal is a particularly difficult one: I propose one of three changes, listed in my order of preference (I prefer Option A over the others).

Option A: Inclusive OR/Union

Change the sentence to explicitly accept the union (an inclusive OR) of the subjects in the BIDS dataset and those in the participants.tsv file:

... the entries of [the participant_id] column MUST correspond to the subjects in the BIDS dataset unioned with (inclusive or) the participants.tsv file.

This way, each phenotype/ TSV entry may correspond to a subject in participants.tsv, a subject in the BIDS dataset, or both.
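A minimal sketch of how a check could implement Option A, assuming the subject folder IDs and participants.tsv IDs have already been collected into sets. The function name and example IDs are hypothetical, not the bids-validator's API:

```python
def option_a_violations(phenotype_ids, folder_ids, participants_ids):
    """Option A: a phenotype/ ID is valid if it appears in the subject folders,
    in participants.tsv, or in both (set union, i.e. an inclusive OR)."""
    allowed = set(folder_ids) | set(participants_ids)
    return sorted(set(phenotype_ids) - allowed)


folders = {"sub-01", "sub-02"}
participants = {"sub-01", "sub-02", "sub-03"}
# sub-03 (screened out, no folder) now passes; sub-99 appears nowhere else.
phenotype = {"sub-01", "sub-03", "sub-99"}

print(option_a_violations(phenotype, folders, participants))  # ['sub-99']
```

Compared with the strict reading, the only change is replacing a set intersection with a set union, so IDs that appear in neither place are still caught.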

Option B: Exclusive participants.tsv

Change the sentence to accept only the participants.tsv and ignore the BIDS dataset:

... the entries of [the participant_id] column MUST correspond to the subjects in the participants.tsv file.

This is a little stricter: it requires that if phenotype/ TSV data are provided, a valid participants.tsv is provided as well.
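Option B could be checked roughly as follows. This is a minimal sketch assuming tab-separated files whose header row contains a participant_id column, with phenotype/*.tsv and participants.tsv at the dataset root; the function names and the screening.tsv file name are hypothetical, and this is not the bids-validator's implementation:

```python
import csv
from pathlib import Path


def read_participant_ids(tsv_path):
    """Read the participant_id column from a tab-separated file with a header row."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        return {row["participant_id"] for row in csv.DictReader(f, delimiter="\t")}


def option_b_errors(dataset_root):
    """Option B: check phenotype/ IDs against participants.tsv only, ignoring
    subject folders. A missing participants.tsv is itself an error whenever
    phenotype/ TSVs exist."""
    root = Path(dataset_root)
    phenotype_files = sorted((root / "phenotype").glob("*.tsv"))
    if not phenotype_files:
        return []  # no phenotype/ data, nothing to check
    participants_tsv = root / "participants.tsv"
    if not participants_tsv.is_file():
        return ["phenotype/ TSVs present but participants.tsv is missing"]
    known = read_participant_ids(participants_tsv)
    errors = []
    for tsv in phenotype_files:
        for pid in sorted(read_participant_ids(tsv) - known):
            errors.append(f"{tsv.name}: {pid} not in participants.tsv")
    return errors
```

Because subject folders are never consulted, a screened-out subject passes as long as participants.tsv lists it.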

Option C: Soften requirement

Change the sentence to make it a recommendation rather than a strict requirement:

... the entries of [the participant_id] column SHOULD correspond to the subjects in the BIDS dataset and participants.tsv file.

I like this option least because it is less prescriptive and could confuse someone receiving a BIDS dataset, but it is an option. It would allow the phenotype/ TSV data to be out of sync with the BIDS dataset and participants.tsv, and the validator would only WARN about the mismatch instead of raising an ERROR.
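A minimal sketch of what Option C changes in a check: the set of flagged subjects stays the same, and only the severity depends on whether the rule says MUST or SHOULD. The function name, the level parameter, and the example IDs are hypothetical, not the bids-validator's API:

```python
def option_c_findings(phenotype_ids, folder_ids, participants_ids, level="SHOULD"):
    """Report phenotype/ IDs missing from the subject folders or participants.tsv.
    With level="SHOULD" (Option C) each mismatch is a WARNING;
    with level="MUST" (the current wording) it is an ERROR."""
    if level not in ("MUST", "SHOULD"):
        raise ValueError(f"unsupported requirement level: {level!r}")
    severity = "ERROR" if level == "MUST" else "WARNING"
    allowed = set(folder_ids) & set(participants_ids)
    return [(severity, pid) for pid in sorted(set(phenotype_ids) - allowed)]


folders = {"sub-01"}
participants = {"sub-01", "sub-03"}
phenotype = {"sub-01", "sub-03"}

print(option_c_findings(phenotype, folders, participants))          # [('WARNING', 'sub-03')]
print(option_c_findings(phenotype, folders, participants, "MUST"))  # [('ERROR', 'sub-03')]
```

Since the mismatch is still detected but no longer fails validation, datasets could be shared with phenotype/ data that silently disagree with the rest of the dataset.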

References

For more on this conversation, see my NeuroStars post about this. Thank you all for reading and for your consideration, and thank you for the great work being done here!
