A brief discussion came up on this topic in the recent BIDS 2.0 WG meeting.
ATM we have two major principled formats and locations for metadata:
- sidecar .json files -- metadata applicable to a specific data file
  - due to the inheritance principle, a single .json file (e.g. at a higher level in the hierarchy) can already provide metadata for many data files, grouping on the entities or suffix used in the filename (e.g. task-rest_bold.json)
- .tsv file(s), of which I see a few major types
  - {entity_plural}.tsv (e.g. participants.tsv, sessions.tsv, etc) - metadata which is typically not placed into individual sidecar files and which groups metadata based on a specific single entity per {entity_id} value (so similar to the aforementioned {entity}-{value}*.json where it is not grouped)
  - scans.tsv - summarization of metadata about individual data files at the higher level in the hierarchy (TODO: link issues on the need to rename, since it is no longer a good name, having initially been MRI specific)
  - {nonentity_plural}.tsv (e.g. channels.tsv, etc) -- metadata on some grouping level present within data files, and thus without (yet) an explicit {entity} defined
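To make the inheritance behavior of sidecar .json files concrete, here is a minimal Python sketch. It is only an illustration under simplifying assumptions, not the validator's or the specification's algorithm: the helper names `applies_to` and `resolve_metadata`, the matching rule, and the example paths and values are all hypothetical.

```python
from pathlib import PurePosixPath


def applies_to(sidecar: str, data_file: str) -> bool:
    """Return True if a sidecar could apply to a data file.

    Simplified rule (an assumption for illustration): the sidecar sits in the
    data file's directory or an ancestor of it, shares its suffix, and every
    entity-value pair in the sidecar name also appears in the data file name.
    """
    side, data = PurePosixPath(sidecar), PurePosixPath(data_file)
    if side.parent != data.parent and side.parent not in data.parents:
        return False
    side_parts = side.name.split(".")[0].split("_")
    data_parts = data.name.split(".")[0].split("_")
    # The last underscore-separated chunk is the suffix (e.g. "bold").
    if side_parts[-1] != data_parts[-1]:
        return False
    return set(side_parts[:-1]) <= set(data_parts[:-1])


def resolve_metadata(data_file: str, sidecars: dict) -> dict:
    """Merge applicable sidecars, shallowest first, so deeper files win."""
    matches = [path for path in sidecars if applies_to(path, data_file)]
    matches.sort(key=lambda path: len(PurePosixPath(path).parts))
    merged = {}
    for path in matches:
        merged.update(sidecars[path])
    return merged


# Hypothetical dataset: one top-level sidecar and one subject-level override.
sidecars = {
    "task-rest_bold.json": {"TaskName": "rest", "RepetitionTime": 2.0},
    "sub-01/func/sub-01_task-rest_bold.json": {"RepetitionTime": 1.5},
}
resolved = resolve_metadata("sub-01/func/sub-01_task-rest_bold.nii.gz", sidecars)
```

Note that a reader has to walk the hierarchy like this to learn the effective value of a field, which is the cost that the "summarization" discussion is about.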
There is of course also the notion of an entity itself, which sometimes (e.g. sub, ses etc) contains the actual metadata "value" which could also be present in .tsv or .json file(s). But for those we are in agreement that "use of the entity values for metadata storage is discouraged and they are used more for indexing and identification" (TODO: replace with quote and ref)
In particular, both "sidecar .json" and {entity_plural}.tsv (and scans.tsv) are the places for metadata, whether in a grouped "fashion" or not.
.json and .tsv formalizations have some similarities
- inheritance principle
- for BIDS prescribed metadata fields/columns we define names and types in the schema allowing for validation
- TODO: more?
but also different "features", (TODO: make into a table?) e.g.
- .tsv have a formalization to describe their columns, and the validator complains whenever an undescribed column is included. .json files have nothing like that!
  - there is discussion on introducing a prefix to non-schema defined fields in .json: Custom prefix (X-) for arbitrary metadata in .json files #80
- We use CamelCase-ing for fields in .json but snake_case for columns in .tsv
- TODO: more?
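As a rough illustration of the .tsv-side check mentioned above, the following Python sketch lists columns that are neither known to the schema nor described in the accompanying sidecar. The function name `undescribed_columns`, the `known` set, and the example table are hypothetical; the real validator works from the BIDS schema rather than from a hand-written set.

```python
import csv
import io


def undescribed_columns(tsv_text: str, sidecar: dict, known=frozenset()) -> list:
    """List TSV columns that are neither schema-known nor described in the sidecar."""
    # Only the header row is needed to learn the column names.
    header = next(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    return [col for col in header if col not in known and col not in sidecar]


# Hypothetical participants.tsv content and its participants.json sidecar.
tsv_text = "participant_id\tage\tfavorite_color\nsub-01\t34\tblue\n"
sidecar = {"age": {"Description": "Age of the participant", "Units": "years"}}
missing = undescribed_columns(tsv_text, sidecar, known=frozenset({"participant_id"}))
```

There is no equivalent description layer to check arbitrary fields in a sidecar .json against, which is the asymmetry pointed out above.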
Ideally, for consistency, and also for various needs (e.g. in BEP036) where metadata clearly could be defined in two forms ("summarized" in .tsv, hence also see the "inheritance->summarization" issue), and thus for overall "standard forming common principles" (#66), it would be great if
- naming of metadata fields in .tsv and .json was harmonized (e.g. all snake_case, with some metadata field describing all or non-standard fields etc)
- "semantic" unified - metadata field blah in .json would have the same meaning/type/etc as column blah in .tsv.
- specification unified - treat/describe non-spec fields uniformly across .tsv and .json (e.g. x- prefix for all non-standard fields, not only in sidecars but for columns as well)
and provide recommendations on where/when to place metadata
- Replace "inheritance" with "summarization" principle #65 is highly relevant here, since it would help to reduce cognitive load: seeing the value of a metadata field at any level immediately tells you the value, without needing to resort to tools to establish it by traversing the full hierarchy
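To give a feel for what harmonized naming could look like in practice, here is a small Python sketch that converts CamelCase sidecar fields to snake_case and prefixes anything not in a given set of standard names. This is only one possible reading of the proposal: the helper names `camel_to_snake` and `harmonize`, the `schema_names` set, and the example fields are hypothetical, and the x- prefix itself is still under discussion in #80.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert e.g. 'RepetitionTime' to 'repetition_time'."""
    # Break between a lowercase letter or digit and a following capital.
    step = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Break inside acronym runs, e.g. 'MRAcquisition' -> 'MR_Acquisition'.
    step = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", step)
    return step.lower()


def harmonize(fields: dict, schema_names: set, prefix: str = "x-") -> dict:
    """Rename fields to snake_case and prefix the ones outside schema_names."""
    out = {}
    for name, value in fields.items():
        snake = camel_to_snake(name)
        if snake not in schema_names and not snake.startswith(prefix):
            snake = prefix + snake
        out[snake] = value
    return out


# Hypothetical sidecar content: one standard field and one custom field.
sidecar = {"RepetitionTime": 2.0, "ScannerMood": "grumpy"}
harmonized = harmonize(sidecar, schema_names={"repetition_time"})
```

With names normalized like this, the same key could be used both as a .json field and as a .tsv column, so one description would serve both forms.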
edits:
- dataset_description.json is currently the "summary" of what is common across all data (e.g. all participants). So, e.g., that is where species would belong when it is common across all subjects (quite typical ;-) ). But ATM we do not have the ability to provide species at the level of the whole dataset, AFAIK.
- we have the _stain- entity and then instruct to place related metadata into a sidecar .json instead of some stains.tsv.