feat(schema): Rewrite PHENOTYPE_SUBJECTS_MISSING to run on each phenotype file #2107

Open: effigies wants to merge 2 commits into master from schema/phenotype-specificity

effigies
Collaborator

This builds on and takes advantage of #2050 to select phenotype files and compare the participant_id column to the dataset-global dataset.subjects.participant_id, which is populated from participants.tsv.

This does two things for validation implementations:

  1. It allows the specific phenotype file with extra participants to be identified, since the phenotype participants are not collapsed into one superset.
  2. It allows us to avoid calculating the union at validator startup, which could be costly for datasets with many phenotype files, and to avoid loading each phenotype file twice (or holding a long-lived cache of all phenotype data).
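
To make the per-file comparison concrete, here is a minimal Python sketch. It is not the schema rule or any validator's implementation: the function name `phenotype_subjects_missing`, the dictionary of file paths, and the sample IDs are all hypothetical, and the `participants` list only stands in for `dataset.subjects.participant_id`.

```python
from typing import Dict, Iterable, List


def phenotype_subjects_missing(
    participants: Iterable[str],
    phenotype_columns: Dict[str, Iterable[str]],
) -> Dict[str, List[str]]:
    """Return, per phenotype file, participant IDs absent from participants.tsv.

    Hypothetical helper for illustration only.
    """
    # Stand-in for dataset.subjects.participant_id (from participants.tsv).
    known = set(participants)
    issues: Dict[str, List[str]] = {}
    for path, column in phenotype_columns.items():
        # Each file is checked on its own; no union across files is built.
        extra = sorted(set(column) - known)
        if extra:
            issues[path] = extra
    return issues


# Made-up example data: one file is clean, one has an unknown participant.
participants = ["sub-01", "sub-02", "sub-03"]
phenotype_columns = {
    "phenotype/age.tsv": ["sub-01", "sub-02"],
    "phenotype/iq.tsv": ["sub-02", "sub-04"],
}
report = phenotype_subjects_missing(participants, phenotype_columns)
print(report)  # {'phenotype/iq.tsv': ['sub-04']}
```

Because the result is keyed by file, the offending phenotype file is named directly, and each file's column can be dropped once it has been compared.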

Note that #2050 is not strictly required in order to use this check, since validating phenotype file names already requires us to treat phenotype as a pseudo-datatype. I do think a second use case (with more expected from BEP36) justifies moving this from an implementation detail to official spec language.

@effigies added the schema and schema-structure labels Apr 25, 2025
@effigies requested a review from ericearl Apr 25, 2025 12:24

codecov bot commented Apr 25, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.71%. Comparing base (254e22e) to head (2c9e882).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2107   +/-   ##
=======================================
  Coverage   82.71%   82.71%           
=======================================
  Files          17       17           
  Lines        1533     1533           
=======================================
  Hits         1268     1268           
  Misses        265      265           

@effigies force-pushed the schema/phenotype-specificity branch from 170af5c to 2c9e882 May 15, 2025 20:47
@effigies marked this pull request as ready for review May 15, 2025 20:47
@effigies requested a review from erdalkaraca as a code owner May 15, 2025 20:47
@effigies requested a review from rwblair and removed the request for erdalkaraca May 15, 2025 20:49
@effigies changed the title from "rf(schema): Rewrite PHENOTYPE_SUBJECTS_MISSING to run on each phenotype file" to "feat(schema): Rewrite PHENOTYPE_SUBJECTS_MISSING to run on each phenotype file" May 15, 2025
@effigies added the exclude-from-changelog and needs review labels May 16, 2025
Labels
  • exclude-from-changelog: This item will not feature in the automatically generated changelog
  • needs review
  • schema: Issues related to the YAML schema representation of the specification. Patch version release.
  • schema-structure: Changes to the fundamental organization/structure of the YAML schema. Minor version release.
2 participants