feat(schema): Rewrite PHENOTYPE_SUBJECTS_MISSING to run on each phenotype file #2107

effigies · 2025-04-25T12:24:17Z

This builds on and takes advantage of #2050 to select phenotype files and compare the participant_id column to the dataset-global dataset.subjects.participant_id, which is populated from participants.tsv.

This does two things for validation implementations:

It allows the specific phenotype file with extra participants to be identified, since the phenotype participants are no longer collapsed into one superset.
It avoids calculating the union at validator startup, which could be costly for datasets with many phenotype files, and avoids either loading each phenotype file twice or holding a long-lived cache of all phenotype data.
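
To illustrate the difference between the two approaches, here is a minimal Python sketch. It is not the validator's implementation, and the rule itself lives in the schema; the function names and the in-memory dict of file names to `participant_id` values are hypothetical stand-ins for however an implementation reads phenotype files and participants.tsv.

```python
def check_each_phenotype_file(phenotype_files, participants_ids):
    """Per-file check: map each offending file to its unlisted participants.

    phenotype_files: dict of file name -> iterable of participant_id values
    (hypothetical stand-in for reading each phenotype TSV once).
    participants_ids: iterable of IDs from participants.tsv.
    """
    known = set(participants_ids)
    issues = {}
    for name, ids in phenotype_files.items():
        extra = sorted(set(ids) - known)
        if extra:
            issues[name] = extra
    return issues


def check_union(phenotype_files, participants_ids):
    """Collapsed check: one superset of phenotype participants, so the
    result cannot say which file introduced an unlisted participant."""
    union = set()
    for ids in phenotype_files.values():
        union.update(ids)
    return sorted(union - set(participants_ids))


participants = ["sub-01", "sub-02"]
phenotype = {
    "phenotype/a.tsv": ["sub-01", "sub-02"],
    "phenotype/b.tsv": ["sub-01", "sub-03"],
}

per_file = check_each_phenotype_file(phenotype, participants)
collapsed = check_union(phenotype, participants)
```

With this toy data the per-file check attributes `sub-03` to `phenotype/b.tsv`, while the collapsed check only reports that `sub-03` appears somewhere among the phenotype files.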

Note that #2050 is not strictly required in order to use this check, as validating phenotype file names already requires implementations to treat phenotype as a pseudo-datatype. I do think a second use case (with more expected from BEP36) justifies moving this from implementation detail to official spec language.

codecov · 2025-04-25T12:28:27Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.71%. Comparing base (254e22e) to head (2c9e882).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #2107   +/-   ##
=======================================
  Coverage   82.71%   82.71%           
=======================================
  Files          17       17           
  Lines        1533     1533           
=======================================
  Hits         1268     1268           
  Misses        265      265


effigies added the schema and schema-structure labels Apr 25, 2025

effigies requested a review from ericearl April 25, 2025 12:24

effigies mentioned this pull request Apr 25, 2025

[SCHEMA] harmonize into dirs, ids, ids_phenotype for "subjects" and "sessions" #1981 (Draft, 2 tasks)

effigies added 2 commits May 15, 2025 16:47

rf(schema): Rewrite PHENOTYPE_SUBJECTS_MISSING to run on each phenotype file

1ff73de

rf(schema): Remove expensive-to-construct context fields

2c9e882

effigies force-pushed the schema/phenotype-specificity branch from 170af5c to 2c9e882 Compare May 15, 2025 20:47

effigies marked this pull request as ready for review May 15, 2025 20:47

effigies requested a review from erdalkaraca as a code owner May 15, 2025 20:47

effigies requested review from rwblair and removed request for erdalkaraca May 15, 2025 20:49

effigies changed the title ~~rf(schema): Rewrite PHENOTYPE_SUBJECTS_MISSING to run on each phenotype file~~ feat(schema): Rewrite PHENOTYPE_SUBJECTS_MISSING to run on each phenotype file May 15, 2025

rwblair approved these changes May 15, 2025


effigies added the exclude-from-changelog and needs review labels May 16, 2025
