-
Notifications
You must be signed in to change notification settings - Fork 6
Molecular Consequence Schema and Tests #336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: v1
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've only gone through the MolecularConsequence piece thus far. I think I'd like to chat asap to make sure I understand the full use cases. I'm not 100% sure why the proteinMolecule and proteinAllele are included as I assume they will only ever be included if they are associated with a transcriptMolecule and transcriptAllele. If that is correct then I think we need to consider a structure that represents that relationship a bit more clearly. If not, then I need to understand that use case.
@ahwagner Since I am assuming that the reason we would not include transcriptMolecule and/or transcriptAllele is because of intronic c. alleles (as they don't actually reside on a transcript. I think this is very clearly indicating VRS NEEDS a solution to showing variants with relationships between underlying sequences of different molecular types (e.g. genomic to transcript, etc...). I'm not super keen on skirting past that issue since intronic c. represented alleles are fundamental to this kind of study result. We should discuss this a bit to see if there's a good solution that we can propose to VRS 2.1?
const: "MolecularConsequence" | ||
default: "MolecularConsequence" | ||
description: MUST be "MolecularConsequence". | ||
focusVariant: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Mrinal-Thomas-Epic if the only type of variant for this is Allele
then we should consider calling it focusAllele
like we did in the CohortAlleleFrequencyStudyResult
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and the description should specify 'allele' instead of 'variant'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the VEP documentation, it seems like they support annotating adjacencies as well. I'll try to get an example of this
extends: sourceDataSet | ||
description: >- | ||
The full data set that provided the reported the functional impact score. | ||
MolecularConsequence: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this name, but if we stick with it then we should consider renaming the other XxxStudyResult
base profiles Xxx
for consistency (and apply the same to all inherited classes).
- $refCurie: vrs:Allele | ||
- $refCurie: gks.core:iriReference | ||
description: The variant which this annotation applies to. | ||
transcriptMolecule: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahwagner it would be helpful if there was a validation mechanism to assure that only transcript sequences could be used in this attribute. Wondering if you feel the same and whether or not we should do something about that now.
oneOf: | ||
- $refCurie: vrs:SequenceReference | ||
- $refCurie: gks.core:iriReference | ||
transcriptVariant: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Mrinal-Thomas-Epic same as above re: focusAllele
vs focusVariant
.
oneOf: | ||
- $refCurie: vrs:SequenceReference | ||
- $refCurie: gks.core:iriReference | ||
proteinVariant: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Mrinal-Thomas-Epic same as above re: focusAllele
vs focusVariant
@Mrinal-Thomas-Epic If it's not too much trouble, I would love it if you could create separate PRs for the InSilico stuff and the MoleConsq stuff. If that's too difficult let me know. |
b477dce
to
9803028
Compare
@ahwagner @Mrinal-Thomas-Epic maybe we should find a block of time asap to draft the intronic vrs class we need to support some of these use cases. we can do this with a small group like the 3 of us and try it out in these va-spec statements before bringing it to the larger group. And since it's draft we can add it to VRS so that EPIC and VEP can move forward with these statements. |
My primary comment on this PR is that Molecular Consequence really should be a Statement, not a Study Result. Sara Hunt raised this point on the last VA Community Call, and I strongly concur. Some additional notes about the rationale for this and a proposed model are in the document here. Also, with Mrinal‘s help I worked up a draft PR defining the classes/attributes needed for a Statement-based representation, along with some data examples. Note that the proposal re-introduces a modeling pattern from a few years ago for capturing "ancillary information" within a Statement. This pattern goes back to the ACM days, and was originally informed by several use cases from our original landscape analysis. This pattern seemed a perfect fit for capturing some of the information accompanying the central MC annotation in VEP data, and I think will be a generally useful pattern to have in the model. More info in the doc and PR, and looking forward to discussing this further. |
368930a
to
ae64766
Compare
Updates to VariantMolecularConcequenceProposition model to reflect decisions on 9-25-25 VA call: - Subject: MUST be a genomic variant - Predicate: hasMolecularConsequence (constant) - Object: remains the place where we capture the MC terms - but we are dropping the MolecularConsequence class and using a more generic 'Concept or Concept Set' class - Qualifiers: separate qualifiers for transcript and protein variant context
Additions reflect decisions made on 9-25-25 VA Call ConceptOrConceptSet class: defined as one of MappableConcept (for a single concept) or ConceptSet (necessarily 2 or more concepts) I made ConceptOrConceptSet the parent of Condition and Therapeutic, and I made ConceptSet the parent of ConditionSet and TherapyGroup (@ahwagner @larrybabb we didn’t explicitly discuss this, but I assumed it was the plan)
This attribute is no longer inherited, as the MC Proposition class now descends directly from SubjectVariantProposition.
@Mrinal-Thomas-Epic @ahwagner @larrybabb - I updated this MC Profile PR (schema, and test fixture examples) according to implement decisions from the 9-25-25 VA Call. Please review to make sure I got things right. I made a separate PR (#375) to rename ClinicalVariantProposition -> GeneticContextVariantPropositon. Below is a summary of these decisions, the corresponding model changes, and some questions: va-core-source.yaml
domain-entities-source.yaml
Test Fixtures:
Questions:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job @mbrush ! Overall I like the changes and think they work. I still need to pull this down and run the tests, etc. but I'm staring with this preliminary review.
One topic we should discuss before merging is whether ConceptSet
belongs in the gks core so that Cat-VRS and VRS can use it if needed. This class seems to be a further wrapping and use of the general use MappableConcept
class and therefore makes more sense to be in the core library IMO.
This is potentially a bigger decision and should be thoughtfully considered. Doing this would cause us to have to do a point release for every product (I think), maybe a patch release.
I also want to verify with @ahwagner whether va-spec would need a minor release change or path release change to it's version. Since the ConditionSet is in trial use
I think this would be a breaking change (i.e. the "conditions" attibritue becomes "concepts".
I think it would be good to include a full Allele in one of them, just for demonstration purposes. Regarding the genomic subjectVariant requirement, VEP does allow you to provide a c. variant as input, but I believe it converts it to genomic before then calculating consequences across all transcripts (example). Let's check with @sarahhunt , whether it makes sense to introduce this requirement
I would argue "AND" fits better, since both terms apply. Super minor, but where did we land on the word |
Thanks for the feedback @Mrinal-Thomas-Epic . Addressing your points below:
|
@larrybabb Thanks for the feedback as well. I agree that Finally, @larrybabb it says you 'requested changes' - but I’m not sure how to tell what these changes are? |
Initial draft of schema for representing molecular consequence predictions from VEP (and potentially other sources in the future).
Out of scope: