[ENH] Allow participants.tsv to contain a superset of subject directories and subjects listed in phenotype files #2044
The participants schema description now contains the comprehensive superset rule from bids-standard#914.
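Reading the PR title literally, the rule can be sketched as a set inclusion: the IDs in participants.tsv must cover the union of the sub-XX directory names and the participant_id values found in phenotype files. The Python below is a minimal sketch under that assumption. `participants_is_superset` is a hypothetical helper, not part of the schema or the validator.

```python
def participants_is_superset(
    participant_ids: list[str],
    sub_dirs: list[str],
    phenotype_ids: list[str],
) -> bool:
    # Hypothetical helper: True if participants.tsv lists every sub-XX
    # directory and every participant_id that appears in a phenotype file.
    # Rows beyond the required IDs are not checked here.
    listed = set(participant_ids)
    required = set(sub_dirs) | set(phenotype_ids)
    return required <= listed


# sub-03 has phenotype data but no sub-03/ directory: accepted.
print(participants_is_superset(
    ["sub-01", "sub-02", "sub-03"], ["sub-01", "sub-02"], ["sub-01", "sub-03"]
))  # True
# sub-04 appears in a phenotype file but not in participants.tsv: rejected.
print(participants_is_superset(
    ["sub-01", "sub-02"], ["sub-01", "sub-02"], ["sub-04"]
))  # False
```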
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files:

    @@           Coverage Diff           @@
    ##           master    #2044   +/-   ##
    =======================================
      Coverage   82.44%   82.44%
    =======================================
      Files          17       17
      Lines        1504     1504
    =======================================
      Hits         1240     1240
      Misses        264      264
|
Committing the good suggestion.
Co-authored-by: Chris Markiewicz <effigies@gmail.com>
|
Yes, that looks like it satisfies our need. Thanks for the suggestion @effigies! |
|
@rwblair We pre-load all phenotype files at the beginning of the run in order to populate

    RuleName:
      selectors:
        - datatype == 'phenotype'
        - extension == '.tsv'
      checks:
        - |
          allequal(
            sorted(intersects(dataset.subjects.participant_id, columns.participant_id)),
            sorted(columns.participant_id)
          )

I'm curious which one would be more inefficient:
It would also be worth considering which one could be optimized under the hood. While it is simplest if the context continues to be serializable to a JSON object, we could consider set-like structures that make it more efficient to run |
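For the efficiency question, the rule's check can be mirrored in Python next to a set-based formulation. This is a sketch under stated assumptions: `intersects` is taken to return the values of its first array that also occur in its second, and `allequal` to compare two arrays element by element. Both functions below are hypothetical stand-ins, not the validator's code. The list form sorts both sides (about O(n log n)), while the set form is a single subset test (about linear on average).

```python
def intersects(a: list[str], b: list[str]) -> list[str]:
    # Hypothetical stand-in: the values of `a` that also occur in `b`.
    b_set = set(b)
    return [x for x in a if x in b_set]


def allequal(a: list[str], b: list[str]) -> bool:
    # Hypothetical stand-in: same length and equal element by element.
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


def rule_as_written(participant_ids: list[str], column: list[str]) -> bool:
    # allequal(sorted(intersects(participants, column)), sorted(column))
    return allequal(sorted(intersects(participant_ids, column)), sorted(column))


def rule_as_subset(participant_ids: list[str], column: list[str]) -> bool:
    # Set-like formulation: every phenotype participant_id is listed.
    return set(column) <= set(participant_ids)


participants = ["sub-01", "sub-02", "sub-03"]
print(rule_as_written(participants, ["sub-03", "sub-01"]))  # True
print(rule_as_subset(participants, ["sub-03", "sub-01"]))   # True
print(rule_as_written(participants, ["sub-01", "sub-04"]))  # False
print(rule_as_subset(participants, ["sub-01", "sub-04"]))   # False
```

Under these stand-ins the two forms agree when the phenotype column has no repeated IDs. If a phenotype file repeated an ID, `rule_as_written` would return False (the intersection keeps one copy per row of participants.tsv) while `rule_as_subset` would still return True, so the choice of structure also fixes how duplicates are treated.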
|
My post above overlapped with @effigies actually providing the "how-to" ;) |
|
I crossed out my prior note, but reflecting on the rule by @effigies above: do we already provide top level directory? ATM no rule mentions it as a datatype; here is the list/counts:

    ❯ git grep -h 'datatype ==' | sed -e 's,^ *,,g' | sort | uniq -c | sort -n
          1 - datatype == 'fmap'
          2 - datatype == "beh"
          2 - datatype == "dwi"
          2 - datatype == "mrs"
          3 - datatype == "anat"
          6 - datatype == "motion"
          7 - datatype == "micr"
          9 - datatype == "fmap"
          9 - datatype == "func"
         13 - datatype == "ieeg"
         17 - datatype == "eeg"
         18 - datatype == "pet"
         20 - datatype == "perf"
         21 - datatype == "meg"
         24 - datatype == "nirs"

Would we similarly define |
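The counting pipeline above can also be reproduced in Python, for example to run it outside a git checkout. This sketch assumes the rules are YAML files under a local `src/schema` directory (an assumption; `git grep` searches every tracked file), and `count_datatype_selectors` and `scan` are hypothetical names. Like `sed -e 's,^ *,,g'`, it strips only leading spaces, so `'fmap'` and `"fmap"` stay distinct entries, matching the two `fmap` rows in the output.

```python
from collections import Counter
from pathlib import Path


def count_datatype_selectors(lines: list[str]) -> Counter:
    # grep 'datatype ==' | sed -e 's,^ *,,g' | sort | uniq -c:
    # keep matching lines, strip leading spaces, count identical lines.
    return Counter(line.lstrip(" ") for line in lines if "datatype ==" in line)


def scan(root: str = "src/schema") -> Counter:
    # Hypothetical location of the rule files; adjust to the checkout.
    counts: Counter = Counter()
    base = Path(root)
    if not base.is_dir():
        return counts
    for path in sorted(base.rglob("*.yaml")):
        text = path.read_text(encoding="utf-8")
        counts.update(count_datatype_selectors(text.splitlines()))
    return counts


if __name__ == "__main__":
    # sort -n: ascending by count, ties broken by the line text.
    for line, n in sorted(scan().items(), key=lambda kv: (kv[1], kv[0])):
        print(f"{n:7d} {line}")
```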
|
I did pragmatically use it as a datatype in #1672. I don't think there's a call to make stimuli a datatype, as long as there is no constraint on the contents of the stimuli directory. My understanding was that your preference was to classify stimuli as a new dataset type and validate its contents separately? |
|
@ericearl I took a quick pass at updating the schema. Would you mind putting together a small example for bids-examples? Maybe one with |
|
@effigies I marked our draft PR as ready for review over on bids-examples at bids-standard/bids-examples#465. You'll want pheno004 for the example you're asking for. |
|
@effigies What else needs to happen to finish off this PR? I know it still needs two reviews from people other than you or me. |
|
We need to get the examples validating. |
|
@effigies All 4 or just |
|
I guess just 004 for this, but if the others aren't going to be fixed, it probably makes sense to pull out into its own PR. |
|
I added just the |
|
2 independent reviews and more than a week since substantive changes. Merging. |
The participants description in src/schema/objects/files.yaml now contains the comprehensive superset rule from #914. This change allows phenotype-only participant_ids (participants not present in the sub-XX folders) to be included in the participants.tsv file. I believe @effigies has a plan to integrate this change into the next BIDS release for the BIDS schema validator.
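To illustrate what "phenotype-only" means in practice, the sketch below lists the participant IDs that appear in participants.tsv but have no sub-XX directory. It assumes the participants.tsv text and the names of the subject directories are already in hand; `phenotype_only_ids` is a hypothetical helper, and the sample rows are made up for the example.

```python
import csv
import io


def phenotype_only_ids(participants_tsv: str, sub_dirs: list[str]) -> list[str]:
    # participants_tsv: text of participants.tsv (tab-separated, with a
    # participant_id column). sub_dirs: names of the sub-XX directories present.
    reader = csv.DictReader(io.StringIO(participants_tsv), delimiter="\t")
    listed = {row["participant_id"] for row in reader}
    # IDs listed in participants.tsv with no matching subject directory.
    return sorted(listed - set(sub_dirs))


tsv = "participant_id\tage\nsub-01\t34\nsub-02\t29\nsub-03\tn/a\n"
print(phenotype_only_ids(tsv, ["sub-01", "sub-02"]))  # ['sub-03']
```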