Additional "output" mode for bids-validator -- "bids-derivative" #253

@yarikoptic

ATM there are two principal output modes of bids-validator:

  • text mode -- human oriented, shortened for consumption unless --verbose etc
  • json -- machine oriented

An additional "bids-derivative" mode could also, in principle, help change the "mental picture" of what bids-validator output is. A few months back I started to "ship" bids-validator output under derivatives/bids-validator, e.g. https://github.com/OpenNeuroDatasets/ds005256/tree/main/derivatives/bids-validator (work with @jungheejung), and the topic came up a few times last week on random occasions, e.g. in a dialog between @jbpoline's and my groups (hence attn @asmacdo and @michellewang @nikhil153).

Moreover, I really think that bids-validator output is nearly the only derivative of any BIDS (raw or not) dataset worth shipping under that dataset's derivatives/ folder (as opposed to outside of that dataset, e.g. while composing into a study-type layout or following YODA). This is because

  • those outputs are typically small (a large one is a sign of trouble!!!), smaller than MRIQC output etc.
  • they are highly relevant for describing the state of this particular BIDS dataset: ideally the output would be just a summary statement that no errors are present, based on which publishers could make claims of "BIDS compliance"

Cons: there is the problem of ensuring that the output is up to date with the dataset it is contained in, but that is a common issue for any derivative / raw relation, so it should be ok.

NB: the bids-validator folder name under derivatives/ is suboptimal since it includes a "-", and BIDS recommends {pipeline}-{flavor} naming, so to a machine it would read as the "validator" flavor of "bids". That is not entirely incorrect, but maybe we should indeed store it under bids-validator-{version} and adjust the description in BIDS to say that the flavor is the part which must not contain "-"?

With that in mind, and given the major shortcomings of the hard-to-handle "text mode" output for humans, I think it is very well worth making the output of bids-validator into a BIDS derivative dataset itself!!! Then we could rely on having more human-accessible summaries within .tsv files etc., potentially with some even more convenient renderers on top, or we might just let humans run tools like visidata, e.g. here is a sample on ds000221 files:

Image Image
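
To make the ".tsv summaries" idea a bit more concrete, here is a minimal sketch of flattening issue records into a TSV that visidata or a spreadsheet could open. The record fields (path, severity, code), the example codes, and the function name are all hypothetical; the actual bids-validator JSON schema may well differ.

```python
import csv
import io

# Hypothetical issue records; NOT the real bids-validator JSON schema.
issues = [
    {"path": "/sub-02/anat/sub-02_T1w.json", "severity": "warning", "code": "EXAMPLE_CODE_A"},
    {"path": "/sub-01/anat/sub-01_T1w.json", "severity": "warning", "code": "EXAMPLE_CODE_A"},
    {"path": "/sub-01/func/sub-01_task-rest_bold.json", "severity": "error", "code": "EXAMPLE_CODE_B"},
]


def issues_to_tsv(records, columns=("path", "severity", "code")):
    """Render issue records as TSV text, one row per (file, issue) pair."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=list(columns),
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",  # drop any fields not listed in `columns`
    )
    writer.writeheader()
    # Sort for a stable, diff-friendly file when stored under version control.
    for rec in sorted(records, key=lambda r: (r["path"], r["code"])):
        writer.writerow(rec)
    return buf.getvalue()


tsv_text = issues_to_tsv(issues)
print(tsv_text)
```

Sorting the rows keeps the file stable across reruns, which matters if it is committed alongside the dataset.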

TODOs

  • seek feedback
  • provide prototype and examples.
  • Potential improvements which came to mind
    • since many if not most records just repeat across paths, instead of storing them for each file we could identify "unique collections" of issues by removing paths from the records. Then create issue_collections.tsv and .json (with the exact detailed records), and each per-file .json would just link or point to those specific collection ids. That would be much more compact and handier IMHO!
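
The "unique collections" idea above could be sketched roughly as follows. This is only an illustration under assumptions: the record fields, the example codes, the "ic-" id scheme, and the function name are hypothetical and not part of bids-validator.

```python
import hashlib
import json

# Hypothetical per-file issue records; NOT the real bids-validator schema.
per_file_issues = {
    "/sub-01/func/sub-01_task-rest_bold.nii.gz": [
        {"code": "EXAMPLE_CODE_A", "severity": "warning",
         "path": "/sub-01/func/sub-01_task-rest_bold.nii.gz"},
    ],
    "/sub-02/func/sub-02_task-rest_bold.nii.gz": [
        {"code": "EXAMPLE_CODE_A", "severity": "warning",
         "path": "/sub-02/func/sub-02_task-rest_bold.nii.gz"},
    ],
    "/sub-02/anat/sub-02_T1w.nii.gz": [
        {"code": "EXAMPLE_CODE_B", "severity": "error",
         "path": "/sub-02/anat/sub-02_T1w.nii.gz"},
    ],
}


def build_issue_collections(per_file):
    """Group files by their set of issues once paths are removed.

    Returns (collections, file_to_collection): collection id -> path-free
    records, and file path -> collection id.
    """
    collections = {}
    file_to_collection = {}
    for path, records in per_file.items():
        # Drop the path so identical issues on different files compare equal.
        stripped = [{k: v for k, v in rec.items() if k != "path"} for rec in records]
        # Canonical ordering so the id does not depend on record order.
        stripped.sort(key=lambda r: json.dumps(r, sort_keys=True))
        canonical = json.dumps(stripped, sort_keys=True)
        cid = "ic-" + hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:8]
        collections.setdefault(cid, stripped)
        file_to_collection[path] = cid
    return collections, file_to_collection


collections, file_to_collection = build_issue_collections(per_file_issues)
print(len(collections), "collections for", len(file_to_collection), "files")
```

Here `collections` would back issue_collections.tsv/.json, while `file_to_collection` holds what each per-file .json would point to. Content-derived ids are one possible choice; sequential ids would be shorter but less stable across reruns.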
