Formalize "life-cycle"/movement and placement of metadata

A brief discussion came up on this topic in the recent BIDS 2.0 WG meeting.

ATM we have two principal formats and locations for metadata:
- sidecar .json files -- metadata applicable to a specific data file
  - due to the inheritance principle, a single `.json` file (e.g. at a higher level in the hierarchy) can already provide metadata for **many** data files, grouping on the entities or suffix used in the filename (e.g. `task-rest_bold.json`)
- .tsv file(s), of which I see a few major types
  - `{entity_plural}.tsv` (e.g. `participants.tsv`, `sessions.tsv`, etc) - metadata which is typically not placed into individual sidecar files and which is grouped by a specific single entity, per `{entity_id}` value (so similar to the aforementioned `{entity}-{value}*.json`, where it is not grouped)
  - `scans.tsv` - summarization of metadata about individual data files at a higher level in the hierarchy (TODO: link issues on the need to rename, since it is no longer a good name, having initially been MRI specific)
  - `{nonentity_plural}.tsv` (e.g. `channels.tsv`, etc) -- metadata on some grouping level present **within** data files, and thus without (yet) an explicit `{entity}` defined
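
To illustrate how a higher-level sidecar can serve many data files, here is a minimal sketch of inheritance-style resolution. It assumes the applicable sidecars have already been found and ordered from the dataset root down to the data file; the function name `resolve_metadata` and the example values are hypothetical, and the filename matching on entities and suffix that the real inheritance principle requires is deliberately left out.

```python
def resolve_metadata(sidecars):
    """Merge sidecar dicts ordered from dataset root to the data file.

    Later (more specific) sidecars override earlier (more general) ones.
    """
    merged = {}
    for sidecar in sidecars:
        merged.update(sidecar)
    return merged


# Hypothetical content of a root-level task-rest_bold.json
root_level = {"RepetitionTime": 2.0, "TaskName": "rest"}
# Hypothetical content of a subject-level sidecar overriding one field
subject_level = {"RepetitionTime": 1.5}

resolved = resolve_metadata([root_level, subject_level])
print(resolved)
```

Note that a reader of the subject-level file alone sees only `RepetitionTime`; the value of `TaskName` is only known after walking up the hierarchy.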

There is of course also the notion of an `entity` itself, which sometimes (e.g. `sub`, `ses` etc) contains the actual metadata "value" which could also be present in `.tsv` or `.json` file(s). But for those we are in agreement that "use of the entity values for metadata storage is discouraged and they are used more for indexing and identification" (TODO: replace with quote and ref)

In particular, both "sidecar `.json`" and `{entity_plural}.tsv` (and `scans.tsv`) are places for metadata, in grouped or ungrouped fashion.

`.json` and `.tsv` formalizations have some similarities
- inheritance principle
- for BIDS-prescribed metadata fields/columns we define names and types in the schema, allowing for validation
- TODO: more?
 
but also different "features" (TODO: make into a table?), e.g.
- `.tsv` files have a formalization to describe their columns, and the validator complains whenever an undescribed column is included
   - `.json` files have nothing like that!
   - there is discussion on introducing a prefix for non-schema-defined fields in .json: #80
- We use `CamelCase` for fields in `.json` but `snake_case` for columns in `.tsv`
- TODO: more?
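
To make these two differences concrete, here is a rough sketch that applies one and the same check to sidecar keys and `.tsv` columns. Everything named here is an assumption for illustration: `SCHEMA_NAMES` is a made-up stand-in for schema-defined names, the `x-` prefix is just the kind of marker under discussion, and `camel_to_snake` is a naive conversion, not anything the validator does.

```python
import re

# Hypothetical set of schema-defined names, written in snake_case.
SCHEMA_NAMES = {"participant_id", "species", "repetition_time", "task_name"}


def camel_to_snake(name):
    """Naively convert e.g. 'RepetitionTime' to 'repetition_time'."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def undescribed(names, described=SCHEMA_NAMES, prefix="x-"):
    """Return names that are neither described nor marked with the prefix."""
    return [n for n in names if n not in described and not n.startswith(prefix)]


# Hypothetical sidecar keys (CamelCase) and .tsv header (snake_case).
sidecar_keys = ["RepetitionTime", "TaskName", "CoilName"]
tsv_columns = ["participant_id", "species", "x-handedness_score", "favorite_color"]

json_report = undescribed([camel_to_snake(k) for k in sidecar_keys])
tsv_report = undescribed(tsv_columns)
print(json_report, tsv_report)
```

The naive regex does not split acronyms (`EEGReference` becomes `eegreference`), so any real renaming would likely need an explicit mapping rather than a mechanical conversion.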

Ideally, for consistency, and also for various needs (e.g. in BEP036) where metadata clearly could be defined in two forms ("summarized" in .tsv, hence also see the ["inheritance->summarization" issue](#65)), and thus for overall standard-forming common principles (#66), it would be great if

- **naming** of metadata fields in .tsv and .json were harmonized (e.g. all `snake_case`, with some metadata field describing all or non-standard fields etc)
- **"semantics"** were unified - metadata field `blah` in .json would have the same meaning/type/etc as column `blah` in .tsv.
- **specification** were unified - treat/describe non-spec fields uniformly across .tsv and .json (e.g. `x-` prefix for all non-standard names, not only sidecar fields but columns as well)

and if we provided recommendations on where/when to place metadata
- #65 is highly relevant since it would help to reduce cognitive load: seeing a value for a metadata field at any level immediately tells you the value, without needing to resort to tools to establish it by traversing the full hierarchy
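
As a sketch of what "summarization" could mean in practice, the snippet below computes the fields whose values are identical across all lower-level records, i.e. the ones that could be stated once at a higher level. The function name `summarize` and the participant records are hypothetical; this is one possible reading of #65, not an agreed-upon algorithm.

```python
def summarize(children):
    """Return the key/value pairs shared by every dict in `children`."""
    if not children:
        return {}
    common = dict(children[0])
    for child in children[1:]:
        common = {
            key: value
            for key, value in common.items()
            if key in child and child[key] == value
        }
    return common


# Hypothetical per-participant metadata
participants = [
    {"species": "mus musculus", "age": 3},
    {"species": "mus musculus", "age": 5},
    {"species": "mus musculus", "age": 4},
]

summary = summarize(participants)
print(summary)
```

Unlike inheritance, the per-participant records stay complete here; the higher-level summary only repeats what is already true everywhere below it.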

edits:
- `dataset_description.json` is currently the "summary" of what is common across all data (e.g. all participants). So that is where, e.g., [species](https://bids-specification.readthedocs.io/en/stable/glossary.html#species-columns) would belong when it is common across all subjects (quite typical ;-) ). But ATM we do not have the ability to provide species at the level of the whole dataset AFAIK.


- we have the `_stain-` entity and then instruct users to [place related metadata into sidecar .json](https://bids-specification.readthedocs.io/en/stable/modality-specific-files/microscopy.html#filename-entities) instead of some `stains.tsv`.
