Description
ATM there are two principal output modes of bids-validator:
- text mode -- human oriented, shortened for consumption unless --verbose etc
- json -- machine oriented
Also, it could in principle contribute to changing the "mental picture" of what bids-validator output is. A few months back I started to "ship" bids-validator output under derivatives/bids-validator, e.g. https://github.com/OpenNeuroDatasets/ds005256/tree/main/derivatives/bids-validator (work with @jungheejung), and the topic came up a few times last week on random occasions, e.g. in a dialog between @jbpoline's and my groups (hence attn @asmacdo, @michellewang and @nikhil153).
Moreover, I really think that bids-validator output is nearly the only derivative for any BIDS (raw or not) dataset worth shipping under that dataset's derivatives/ folder (as opposed to outside of that dataset, e.g. while composing into a study-type layout or following YODA). That is because
- those outputs are typically small (a large one is a sign of worry!!!), smaller than MRIQC output etc.
- they are highly relevant for describing the state of this particular BIDS dataset: ideally the output should be just a summary statement that no errors are present, based on which publishers could make claims of "BIDS compliance"
Cons: there is the problem of ensuring that it is "up to date" with the dataset it is contained in, but that is a common issue for any derivative / raw relation, so it should be ok.
NB: the bids-validator folder name under derivatives/ is suboptimal since it includes - and BIDS recommends {pipeline}-{flavor} naming, so to a machine it would read as the validator flavor of bids, which is not entirely incorrect. But maybe we should indeed store it under bids-validator-{version} and adjust the description in BIDS to say that the flavor is the part which must be without -?
But with that in mind, and given the major shortcomings of the hard-to-handle "text mode" output for humans, I think it is very well worth making the output of bids-validator into a BIDS derivative dataset itself!!! Then we could rely on having more human-accessible summaries within .tsv files etc., potentially with some even more convenient renderers on top, or we might just let humans run tools like visidata, e.g. here is a sample on ds000221 files:
TODOs
- seek feedback
- provide prototype and examples.
- crude prototype by claude and yours truly: https://github.com/yarikoptic/bids-validator-derivative (see code/bids_validator_relayout.py)
- example outputs on openneuro datasets (ran for now with --ignoreNiftiHeaders... will do later via datalad-fuse): find under https://github.com/yarikoptic/bids-validator-derivatives/tree/master/openneuro
- share repo with openneuro validator outputs
- difficulty: it can't go into a single git repo. So far it totals 1.1 million files across openneuro datasets, so we are doomed to create an organization for that etc.
- finish cooking for all openneuro datasets
- rerun without --ignoreNiftiHeaders
- Potential improvements which came to mind
- since many if not most records just repeat across paths, instead of storing them per file, we could identify "unique collections" of issues by removing paths from records. Then create issue_collections.tsv and .json (with exact detailed records), and the per-file .json would then just link or point to those specific collection ids. Will be much more compact and more handy IMHO!
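
To make the "unique collections" idea a bit more concrete, here is a minimal sketch in Python. It is not what code/bids_validator_relayout.py does. The record fields (location, code, severity), the sample issue codes and paths, the 8-character hash ids and the TSV columns are all assumptions made up for illustration. It strips the path from every record, hashes what remains per file, and keeps one copy per distinct collection plus a path-to-id index.

```python
import csv
import hashlib
import io
import json


def collection_id(records):
    """Return a short, order-independent id for a list of path-free records."""
    canonical = sorted(json.dumps(r, sort_keys=True) for r in records)
    digest = hashlib.sha1("\n".join(canonical).encode("utf-8"))
    return digest.hexdigest()[:8]  # 8 hex chars: an arbitrary choice for this sketch


def group_issues(issues, path_key="location"):
    """Split flat issue records into unique collections and a per-file index."""
    per_file = {}
    for issue in issues:
        # Drop the path so identical issues on different files compare equal.
        stripped = {k: v for k, v in issue.items() if k != path_key}
        per_file.setdefault(issue[path_key], []).append(stripped)

    collections, file_index = {}, {}
    for path, records in per_file.items():
        cid = collection_id(records)
        collections.setdefault(cid, records)  # keep the detailed records once
        file_index[path] = cid                # per-file entry is only a pointer
    return collections, file_index


def collections_tsv(collections, file_index):
    """Render a hypothetical issue_collections.tsv summary as a string."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["collection_id", "n_files", "n_issues", "codes"])
    for cid, records in sorted(collections.items()):
        n_files = sum(1 for c in file_index.values() if c == cid)
        codes = ",".join(sorted({r.get("code", "n/a") for r in records}))
        writer.writerow([cid, n_files, len(records), codes])
    return out.getvalue()


# Hypothetical records; real validator output may use different field names.
issues = [
    {"location": "/sub-01/anat/sub-01_T1w.nii.gz", "code": "CODE_A", "severity": "warning"},
    {"location": "/sub-02/anat/sub-02_T1w.nii.gz", "code": "CODE_A", "severity": "warning"},
    {"location": "/sub-02/func/sub-02_task-rest_bold.nii.gz", "code": "CODE_B", "severity": "error"},
]
collections, file_index = group_issues(issues)
print(collections_tsv(collections, file_index))
```

With such a layout the two T1w files above would share one collection id, so the detailed records are stored once and the TSV stays small enough to browse with visidata.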