
Formalize "life-cycle"/movement and placement of metadata #85

@yarikoptic


A brief discussion came up on this topic in the recent BIDS 2.0 WG meeting.

ATM we have two major principled formats and locations for metadata:

  • sidecar .json files -- metadata applicable to a specific data file
    • due to the inheritance principle, a single .json file (e.g. at a higher level in the hierarchy) can already provide metadata for many data files, grouping them on the entities or suffix used in the filename (e.g. task-rest_bold.json)
  • .tsv file(s), of which I see a few major types
    • {entity_plural}.tsv (e.g. participants.tsv, sessions.tsv, etc.) - metadata which is typically not placed into individual sidecar files; it groups metadata by a single specific entity, per {entity_id} value (so similar to the aforementioned {entity}-{value}*.json, where it is not grouped)
    • scans.tsv - summarization of metadata about individual data files at a higher level in the hierarchy (TODO: link issues on the need to rename it, since the name, initially MRI-specific, is no longer a good fit)
    • {nonentity_plural}.tsv (e.g. channels.tsv, etc.) -- metadata at some grouping level present within data files, and thus without (yet) an explicitly defined {entity}
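
The inheritance principle mentioned above can be sketched roughly as follows. This is a simplified illustration, assuming that a sidecar applies to a data file when its suffix matches and each of its entities appears with the same value in the data file's name, and that sidecars are supplied ordered from the top of the hierarchy to the bottom so deeper ones override. The function names (`parse_entities`, `sidecar_applies`, `resolve_metadata`) are hypothetical, and the spec's full rules (e.g. extension matching, at most one applicable file per directory level) are not enforced here.

```python
from pathlib import PurePosixPath


def parse_entities(filename: str) -> tuple[dict[str, str], str]:
    """Split a BIDS-like file name into its key-value entities and its suffix."""
    stem = PurePosixPath(filename).name.split(".", 1)[0]  # drop directories and extensions
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, suffix


def sidecar_applies(sidecar: str, datafile: str) -> bool:
    """A sidecar applies if suffixes match and each of its entities matches the data file."""
    side_entities, side_suffix = parse_entities(sidecar)
    data_entities, data_suffix = parse_entities(datafile)
    return side_suffix == data_suffix and all(
        data_entities.get(key) == value for key, value in side_entities.items()
    )


def resolve_metadata(datafile: str, sidecars: list[tuple[str, dict]]) -> dict:
    """Merge applicable sidecars, ordered top (dataset root) to bottom; deeper values win."""
    merged: dict = {}
    for name, content in sidecars:
        if sidecar_applies(name, datafile):
            merged.update(content)
    return merged


# Illustrative hierarchy: a dataset-level task sidecar and a subject-level override.
sidecars = [
    ("task-rest_bold.json", {"RepetitionTime": 2.0, "TaskName": "rest"}),
    ("sub-01/func/sub-01_task-rest_bold.json", {"RepetitionTime": 1.5}),
]
print(resolve_metadata("sub-01/func/sub-01_task-rest_bold.nii.gz", sidecars))
# {'RepetitionTime': 1.5, 'TaskName': 'rest'}
```

Note that to learn the effective RepetitionTime for sub-01 one has to walk the whole chain, which is exactly the cognitive load discussed further below.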

There is, of course, also the notion of an entity itself, which sometimes (e.g. sub, ses, etc.) contains the actual metadata "value" that could also be present in .tsv or .json file(s). But for those we are in agreement that "use of the entity values for metadata storage is discouraged and they are used more for indexing and identification" (TODO: replace with quote and ref)

In particular, both "sidecar .json" and {entity_plural}.tsv (and scans.tsv) are places for metadata, in either grouped or ungrouped "fashion".

.json and .tsv formalizations have some similarities:

  • inheritance principle
  • for BIDS prescribed metadata fields/columns we define names and types in the schema allowing for validation
  • TODO: more?

but also different "features" (TODO: make into a table?), e.g.:

  • .tsv files have a formalization for describing their columns, and the validator complains whenever an undescribed column is included
  • We use CamelCase for fields in .json but snake_case for columns in .tsv
  • TODO: more?
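
To illustrate what harmonizing the naming would involve, here is a minimal sketch of a mechanical CamelCase-to-snake_case mapping. The regular expression and both helper names are assumptions for illustration, not something the schema defines. Note that the reverse direction is lossy for acronyms (EEGReference becomes eeg_reference, which maps back to EegReference), which suggests an explicit name table rather than purely algorithmic conversion if both spellings had to coexist.

```python
import re

# Insert "_" between a lowercase letter/digit and an uppercase letter, and before the
# last capital of an acronym run that is followed by a lowercase letter (EEGReference).
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def camel_to_snake(name: str) -> str:
    """Map a CamelCase .json field name to a snake_case .tsv column name."""
    return _BOUNDARY.sub("_", name).lower()


def snake_to_camel(name: str) -> str:
    """Naive inverse: capitalizes each part, so acronyms are not restored."""
    return "".join(part.capitalize() for part in name.split("_"))


for field in ["RepetitionTime", "EEGReference", "SamplingFrequency"]:
    column = camel_to_snake(field)
    print(field, "->", column, "->", snake_to_camel(column))
```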

Ideally, for consistency, for various needs (e.g. in BEP036) where metadata could clearly be defined in two forms ("summarized" in .tsv; see also the "inheritance->summarization" issue), and thus for overall "standard forming common principles" (#66), it would be great if:

  • naming of metadata fields in .tsv and .json was harmonized (e.g. all snake_case, with some metadata field describing all or only the non-standard fields, etc.)
  • "semantics" unified - metadata field blah in .json would have the same meaning/type/etc. as column blah in .tsv
  • specification unified - treat/describe non-spec fields uniformly across .tsv and .json (e.g. an x- prefix for all non-standard names, not only sidecar fields but columns as well)
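
A validator applying the same rules to .json fields and .tsv columns could look roughly like the sketch below. Everything here is hypothetical: the `x-` prefix is only the convention proposed above, and `SCHEMA` is a toy stand-in for the BIDS schema, with a single name and type shared by both formats (snake_case, as suggested in the first point).

```python
import csv
import io
import json

# Toy stand-in for a unified schema: one name and one type, shared by .json and .tsv.
SCHEMA = {"participant_id": str, "species": str, "repetition_time": float}


def check_names(names, origin: str) -> list[str]:
    """Flag names that are neither in the schema nor marked non-standard with x-."""
    return [
        f"{origin}: non-standard name {name!r} lacks the x- prefix"
        for name in names
        if name not in SCHEMA and not name.startswith("x-")
    ]


def _json_value_ok(value, expected: type) -> bool:
    if expected is float:  # JSON integers decode as int; accept them, but not booleans
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, expected)


def _tsv_cell_ok(cell: str, expected: type) -> bool:
    if cell == "n/a":  # BIDS marker for a missing value in .tsv files
        return True
    try:
        expected(cell)
    except ValueError:
        return False
    return True


def check_json(text: str) -> list[str]:
    data = json.loads(text)
    problems = check_names(data, ".json field")
    for name, value in data.items():
        if name in SCHEMA and not _json_value_ok(value, SCHEMA[name]):
            problems.append(f".json field {name!r}: expected {SCHEMA[name].__name__}")
    return problems


def check_tsv(text: str) -> list[str]:
    header, *rows = csv.reader(io.StringIO(text), delimiter="\t")
    problems = check_names(header, ".tsv column")
    for row in rows:
        for name, cell in zip(header, row):
            if name in SCHEMA and not _tsv_cell_ok(cell, SCHEMA[name]):
                problems.append(f".tsv column {name!r}: cannot parse {cell!r}")
    return problems


print(check_json('{"repetition_time": 2, "scanner_room": "B"}'))
print(check_tsv("participant_id\tx-handedness_score\nsub-01\t0.8\n"))
```

The point of the design is that `check_names` and the type table are shared, so a field means the same thing whichever file it lives in.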

and provide recommendations on where/when to place metadata:

  • Replace "inheritance" with "summarization" principle #65 is highly relevant, since it would help reduce cognitive load: seeing the value of a metadata field at any level immediately tells you the value, without needing to resort to tools that establish it by traversing the full hierarchy
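
One possible reading of the "summarization" idea, sketched under the assumption that a higher-level file may only state values shared by every file below it (rather than defaults that lower levels override): the summary is computed as the fields common to all children, and a check reports any child that contradicts it. Both function names are hypothetical.

```python
def summarize(children: list[dict]) -> dict:
    """Return the fields whose value is identical in every child's metadata."""
    if not children:
        return {}
    first, *rest = children
    return {
        key: value
        for key, value in first.items()
        if all(key in other and other[key] == value for other in rest)
    }


def contradictions(summary: dict, children: list[dict]) -> list[tuple[int, str]]:
    """Report (child index, field) pairs where a child states a different value.

    Under summarization a child may repeat a summarized value or omit it, but it
    must not override it, so reading either level gives the same answer.
    """
    return [
        (index, key)
        for index, child in enumerate(children)
        for key, value in summary.items()
        if key in child and child[key] != value
    ]


runs = [
    {"RepetitionTime": 2.0, "TaskName": "rest", "EchoTime": 0.03},
    {"RepetitionTime": 2.0, "TaskName": "rest", "EchoTime": 0.035},
]
summary = summarize(runs)
print(summary)                        # {'RepetitionTime': 2.0, 'TaskName': 'rest'}
print(contradictions(summary, runs))  # []
```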

edits:

  • dataset_description.json is currently the "summary" of what is common across all data (e.g. all participants). So, e.g., that is where species would go when it is common across all subjects (quite typical ;-) ). But AFAIK, ATM we do not have the ability to provide species at the level of the whole dataset.
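
Moving a value such as species from participants.tsv up to the dataset level could be sketched as follows, assuming (hypothetically) that dataset_description.json gained a dataset-level field for it; the key name `Species` and the helper `dataset_level_value` are illustrations only, not part of the current specification. The value is lifted only when every participant row agrees.

```python
import csv
import io
import json


def dataset_level_value(participants_tsv: str, column: str):
    """Return the column's value if it is identical for all participants, else None."""
    rows = list(csv.DictReader(io.StringIO(participants_tsv), delimiter="\t"))
    values = {row.get(column) for row in rows}  # None if the column is absent
    if len(values) == 1 and None not in values:
        return values.pop()
    return None


participants = "participant_id\tspecies\nsub-01\tmus musculus\nsub-02\tmus musculus\n"
description = {"Name": "example", "BIDSVersion": "1.9.0"}
species = dataset_level_value(participants, "species")
if species is not None:
    description["Species"] = species  # hypothetical dataset-level field
print(json.dumps(description, indent=2))
```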

  • we have the _stain- entity and yet instruct users to place related metadata into sidecar .json files instead of some stains.tsv.
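
The opposite movement, collecting metadata that is currently spread over per-stain sidecars into a single {entity_plural}.tsv-style table, might look like this. The stains.tsv layout (a `stain_id` column holding `stain-<label>`, mirroring `participant_id` in participants.tsv, and `n/a` for fields a sidecar lacks) is a hypothetical proposal, `build_entity_table` is a hypothetical helper, and the field names and values in the example are illustrative placeholders.

```python
import csv
import io


def build_entity_table(entity: str, sidecars: dict[str, dict]) -> str:
    """Pivot {label: sidecar metadata} into a tab-separated {entity_plural}.tsv-style table."""
    columns = sorted({key for meta in sidecars.values() for key in meta})
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([f"{entity}_id", *columns])
    for label in sorted(sidecars):
        meta = sidecars[label]
        writer.writerow([f"{entity}-{label}", *(meta.get(col, "n/a") for col in columns)])
    return out.getvalue()


stain_sidecars = {
    "LFB": {"SampleStaining": "LFB"},
    "GFAP": {"SampleStaining": "GFAP", "SamplePrimaryAntibody": "placeholder-antibody"},
}
print(build_entity_table("stain", stain_sidecars), end="")
```

The columns here keep the sidecars' CamelCase spelling; under the naming harmonization proposed above they would become snake_case.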


Labels: consistency, metadata
