Extra participants in phenotype TSVs, but not in dataset folders, should not be an ERROR in validation

## Introduction

In [this section about modality-agnostic phenotypic data](https://bids-specification.readthedocs.io/en/stable/03-modality-agnostic-files.html#phenotypic-and-assessment-data), there is a sentence that I don't agree with (after preparing A LOT of phenotypic data):

> ... the entries of [the `participant_id`] column MUST correspond to the subjects in the BIDS dataset and `participants.tsv` file.

In our case, the `phenotype/` TSV data include subjects who were screened out before MRI or MEG data collection (but whose data can still be shared), and there are more of them than there are MEG and MRI participants combined. Our `participants.tsv` file contained every subject across all `phenotype/` TSV data, so from the sentence above I would think we could be in compliance, aside from the *"BIDS dataset and `participants.tsv` file"* part of that sentence, which reads as requiring a match with both.
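To make the situation concrete, here is a minimal sketch using plain Python sets. The subject labels are hypothetical, and this is only an illustration of the mismatch, not how the validator is actually implemented.

```python
# Hypothetical subject labels, for illustration only.
dataset_subjects = {"sub-01", "sub-02"}  # sub-* folders with MRI or MEG data
participants_ids = {"sub-01", "sub-02", "sub-03", "sub-04"}  # participants.tsv
phenotype_ids = {"sub-01", "sub-02", "sub-03", "sub-04"}  # phenotype/*.tsv


def extra_ids(phenotype, allowed):
    """Return phenotype participant IDs that are absent from `allowed`."""
    return sorted(set(phenotype) - set(allowed))


# Screened-out subjects have phenotype rows but no data folders.
missing_from_folders = extra_ids(phenotype_ids, dataset_subjects)
# Every phenotype subject is listed in participants.tsv.
missing_from_participants = extra_ids(phenotype_ids, participants_ids)
```

Here the phenotype data match `participants.tsv` completely, yet two subjects have no data folder, which is the case that currently produces an ERROR for us.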

I discovered this behavior by running the validator. I don't think my proposal is a particularly difficult one: I propose one of three changes, listed in my order of preference, so I would prefer Option A over the others.

## Option A: Inclusive OR/Union

Change the sentence explicitly to accept a union (an inclusive OR) of both the BIDS dataset and the `participants.tsv` file:

> ... the entries of [the `participant_id`] column MUST correspond to the subjects in the BIDS dataset **unioned with (inclusive or) the** `participants.tsv` file.

This way, `participant_id` entries in `phenotype/` TSV data can correspond to subjects in `participants.tsv`, in the BIDS dataset, or in both.
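As a rough sketch of what Option A would mean in practice (the function name and subject labels are hypothetical, not taken from the validator), the check becomes a set difference against the union:

```python
def unmatched_under_option_a(phenotype_ids, dataset_subjects, participants_ids):
    """Return phenotype IDs found in neither the dataset nor participants.tsv."""
    # Option A: an ID is acceptable if it appears in either source.
    allowed = set(dataset_subjects) | set(participants_ids)
    return sorted(set(phenotype_ids) - allowed)


# Hypothetical example: sub-03 was screened out before imaging,
# and sub-99 appears nowhere except the phenotype TSV.
unmatched = unmatched_under_option_a(
    phenotype_ids={"sub-01", "sub-03", "sub-99"},
    dataset_subjects={"sub-01"},
    participants_ids={"sub-01", "sub-03"},
)
```

Only `sub-99` would be reported here, because `sub-03` is covered by `participants.tsv`.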

## Option B: Exclusive `participants.tsv`

Change the sentence to accept only the `participants.tsv` and ignore the BIDS dataset:

> ... the entries of [the `participant_id`] column MUST correspond to the subjects in the ~~BIDS dataset and~~ `participants.tsv` file.

This is a little stricter: if `phenotype/` TSV data are prepared, then a valid `participants.tsv` must be prepared as well.
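A minimal sketch of Option B, assuming both files are tab-separated with a `participant_id` column as the specification describes. The helper names and the TSV contents below are hypothetical, and the files are passed as strings to keep the example self-contained:

```python
import csv
import io


def read_participant_ids(tsv_text):
    """Collect the participant_id column from tab-separated text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["participant_id"] for row in reader}


def unmatched_under_option_b(phenotype_tsv, participants_tsv):
    """Return phenotype IDs that are not listed in participants.tsv."""
    # Option B: only participants.tsv matters; data folders are ignored.
    phenotype_ids = read_participant_ids(phenotype_tsv)
    return sorted(phenotype_ids - read_participant_ids(participants_tsv))


# Hypothetical file contents.
participants_tsv = "participant_id\tage\nsub-01\t30\nsub-02\t41\nsub-03\t25\n"
phenotype_tsv = "participant_id\tscore\nsub-01\t10\nsub-03\t12\nsub-05\t7\n"

unmatched = unmatched_under_option_b(phenotype_tsv, participants_tsv)
```

In this example only `sub-05` is unmatched, regardless of which subjects have imaging data.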

## Option C: Soften requirement

Change the sentence to make it a soft and optional thing rather than a hard and rigid rule:

> ... the entries of [the `participant_id`] column **SHOULD** correspond to the subjects in the BIDS dataset and `participants.tsv` file.

I don't like this option because it is less prescriptive and could be more confusing to someone receiving a BIDS dataset, but it is an option. It would imply that `phenotype/` TSV data could be out of sync with the BIDS dataset and `participants.tsv`, and the validator would only give a WARNING about this instead of an ERROR.
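A sketch of how Option C might look from the validator's side, under the assumption that the same comparison is kept and only the severity changes. The function, its `severity` parameter, and the message format are hypothetical and do not reflect the validator's real output:

```python
def phenotype_issues(phenotype_ids, dataset_subjects, participants_ids,
                     severity="WARNING"):
    """Report phenotype IDs missing from the dataset or participants.tsv.

    With severity="WARNING" this mimics Option C (SHOULD);
    severity="ERROR" mimics the current MUST wording.
    """
    issues = []
    for pid in sorted(set(phenotype_ids)):
        missing_from = []
        if pid not in dataset_subjects:
            missing_from.append("BIDS dataset")
        if pid not in participants_ids:
            missing_from.append("participants.tsv")
        if missing_from:
            message = f"{pid} not found in: {', '.join(missing_from)}"
            issues.append((severity, message))
    return issues


# Hypothetical example: sub-03 has phenotype data but no data folder.
issues = phenotype_issues(
    phenotype_ids={"sub-01", "sub-03"},
    dataset_subjects={"sub-01"},
    participants_ids={"sub-01", "sub-03"},
)
```

The mismatch is still surfaced to whoever receives the dataset, but it no longer blocks validation.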

## References

For more on this conversation, you can see [my NeuroStars post about this](https://neurostars.org/t/bids-validator-error-code-51-phenotype-subjects-missing/20167).  Thank you all for reading, your consideration, and the great work being done here!

Extra participants in phenotype TSVs, but not in data set folders should not be an ERROR in validation #914
