Improve spec/validation of participants.tsv

# TLDR

In the `participants.tsv` file, the `age` and `sex` columns are sometimes not well defined, and this leads to (unnecessary) issues on the side of tool developers (and thus eventually the users). We should improve either the spec or the validator, or both.

cc @jasmainak @agramfort @adam2392 @hoechenberger 

Came up in: https://github.com/mne-tools/mne-bids/issues/396

# Intro

The specification says the following about the [participants.tsv file](https://bids-specification.readthedocs.io/en/stable/03-modality-agnostic-files.html#participants-file):

>  In case of single session studies this file has one compulsory column participant_id that consists of sub-<label>, followed by a list of optional columns describing participants.

So, strictly speaking, all columns other than `participant_id` are OPTIONAL, and thus SHOULD be described in an accompanying `participants.json`.

For optional columns that are not described, the validator currently emits a warning such as this:

```
1: [WARN] Tabular file contains custom columns not described in a data dictionary (code: 82 - CUSTOM_COLUMN_WITHOUT_DESCRIPTION)
  ./participants.tsv
    Evidence: Columns: group not defined, please define in: /participants.json

```

Yet, the validator treats some "optional" columns differently, i.e., these columns are accepted WITHOUT warning. Examples of these are:

- age
- sex

However, the specification does not state anywhere that these two variables are "expected optional columns". The expected behavior would therefore be to raise a warning for `age` and `sex` as well.
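To make the expected behavior concrete, here is a minimal Python sketch (not the validator's actual JavaScript implementation). The helper name `undescribed_columns` is hypothetical, and the sidecar is passed in as an already-parsed dict. It flags every column except `participant_id` that has no entry in `participants.json`, without special-casing `age` or `sex`:

```python
import csv
import io


def undescribed_columns(tsv_text, sidecar):
    """Return the header columns of a TSV that have no entry in the sidecar dict."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    # participant_id is the only compulsory column, so it needs no description
    return [col for col in header if col != "participant_id" and col not in sidecar]


# Illustrative data: only `group` is described in the (hypothetical) sidecar
tsv_text = "participant_id\tage\tsex\tgroup\nsub-05\t25\tfem\tcontrol\n"
sidecar = {"group": {"Description": "experimental group"}}

print(undescribed_columns(tsv_text, sidecar))
```

With this logic, `age` and `sex` would be reported just like any other custom column.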

I could not pin down the exact part of the validator that is responsible for this behavior, but it may be this line:

https://github.com/bids-standard/bids-validator/blob/dfabbfb058daca406ed1d0897c3a25be059a5ad6/bids-validator/utils/summary/collectSubjectMetadata.js#L31

Perhaps @nellh or @rwblair can help.

# The problem 

The issue that arises from this (apart from the inconsistency) is that users define their own levels for the `sex` column and are NOT reminded by the validator to define those levels in a `participants.json`.

As a result, these values are hard (or impossible) to parse by software.

E.g., we may have the following `participants.tsv`:

```Text
participant_id	age	sex
sub-05	25	fem
sub-06	30	ma
sub-07	26	ma
```

what's `fem`? what's `ma`?
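A short Python sketch of why the description matters for software: assuming the sidecar documents the column with a `Levels` mapping (the field BIDS data dictionaries use for categorical values), a tool can decode the raw values; without it, it cannot. The sidecar content and the helper `decode_level` below are hypothetical, and the level meanings are guesses made up for illustration.

```python
import json

# Hypothetical participants.json content; the level meanings are guesses
sidecar = json.loads("""
{
  "sex": {
    "Description": "sex of the participant",
    "Levels": {"fem": "female", "ma": "male"}
  }
}
""")


def decode_level(column, value, sidecar):
    """Look up a raw value in the column's Levels; return None if undocumented."""
    return sidecar.get(column, {}).get("Levels", {}).get(value)


print(decode_level("sex", "fem", sidecar))  # documented level
print(decode_level("sex", "ma", {}))        # no sidecar entry: nothing to decode
```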

# How to fix?

I think we should do one of the following:

1. fix the validator so that it emits a warning if age and sex are columns in `participants.tsv` but have no description in an accompanying `participants.json`

OR 

2. Amend the `participants.tsv` part of the specification to explicitly say that age and sex are "to-be-expected" columns ... and then also define the expected inputs:

- age MUST be a float (years since birth)
    - if a user wants to specify age differently, they must make their own custom column, e.g. `age_in_months`
- sex MUST be a string (here we need to discuss which strings we accept; the most straightforward would perhaps be "male", "female", ~"undefined"~, "other", but I would like somebody with more experience in inclusive language to make a suggestion here)
    - again: if a user wants to do their own sex column they can make their own custom column with a wide range of acceptable factor levels
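
As a rough illustration of what option 2 could mean for a checker, here is a Python sketch under stated assumptions: the accepted strings in `ALLOWED_SEX` are only the candidates floated above (the final set is still open), `check_row` is a hypothetical helper, and `n/a` is skipped because BIDS uses it to mark missing values.

```python
# Assumed set of accepted strings; the final choice is still up for discussion
ALLOWED_SEX = {"male", "female", "other"}


def check_row(row):
    """Return a list of problems for one participants.tsv row given as a dict."""
    problems = []
    age = row.get("age", "n/a")
    sex = row.get("sex", "n/a")
    if age != "n/a":  # n/a marks a missing value and is skipped
        try:
            float(age)
        except ValueError:
            problems.append(f"age {age!r} is not a number")
    if sex != "n/a" and sex not in ALLOWED_SEX:
        problems.append(f"sex {sex!r} is not one of {sorted(ALLOWED_SEX)}")
    return problems


print(check_row({"participant_id": "sub-05", "age": "25", "sex": "fem"}))
```

Rows like the ones in the example table above would be flagged for `fem` and `ma`, while `age` values such as `25` would pass.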
