Skip to content

Conversation

Mrinal-Thomas-Epic
Copy link

@Mrinal-Thomas-Epic Mrinal-Thomas-Epic commented Apr 9, 2025

Initial draft of schema for representing molecular consequence predictions from VEP (and potentially other sources in the future).

Out of scope:

  • Handling intronic variants. This will be added in a later time, once VRS issue #650 (actively being worked on) is resolved

Copy link
Contributor

@larrybabb larrybabb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only gone through the MolecularConsequence piece thus far. I think I'd like to chat asap to make sure I understand the full use cases. I'm not 100% sure why the proteinMolecule and proteinAllele are included as I assume they will only ever be included if they are associated with a transcriptMolecule and transcriptAllele. If that is correct then I think we need to consider a structure that represents that relationship a bit more clearly. If not, then I need to understand that use case.

@ahwagner Since I am assuming that the reason we would not include transcriptMolecule and/or transcriptAllele is because of intronic c. alleles (as they don't actually reside on a transcript. I think this is very clearly indicating VRS NEEDS a solution to showing variants with relationships between underlying sequences of different molecular types (e.g. genomic to transcript, etc...). I'm not super keen on skirting past that issue since intronic c. represented alleles are fundamental to this kind of study result. We should discuss this a bit to see if there's a good solution that we can propose to VRS 2.1?

const: "MolecularConsequence"
default: "MolecularConsequence"
description: MUST be "MolecularConsequence".
focusVariant:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mrinal-Thomas-Epic if the only type of variant for this is Allele then we should consider calling it focusAllele like we did in the CohortAlleleFrequencyStudyResult

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and the description should specify 'allele' instead of 'variant'

Copy link
Author

@Mrinal-Thomas-Epic Mrinal-Thomas-Epic Apr 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the VEP documentation, it seems like they support annotating adjacencies as well. I'll try to get an example of this

extends: sourceDataSet
description: >-
The full data set that provided the reported the functional impact score.
MolecularConsequence:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this name, but if we stick with it then we should consider renaming the other XxxStudyResult base profiles Xxx for consistency (and apply the same to all inherited classes).

- $refCurie: vrs:Allele
- $refCurie: gks.core:iriReference
description: The variant which this annotation applies to.
transcriptMolecule:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahwagner it would be helpful if there was a validation mechanism to assure that only transcript sequences could be used in this attribute. Wondering if you feel the same and whether or not we should do something about that now.

oneOf:
- $refCurie: vrs:SequenceReference
- $refCurie: gks.core:iriReference
transcriptVariant:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mrinal-Thomas-Epic same as above re: focusAllele vs focusVariant.

oneOf:
- $refCurie: vrs:SequenceReference
- $refCurie: gks.core:iriReference
proteinVariant:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mrinal-Thomas-Epic same as above re: focusAllele vs focusVariant

@larrybabb
Copy link
Contributor

@Mrinal-Thomas-Epic If it's not too much trouble, I would love it if you could create separate PRs for the InSilico stuff and the MoleConsq stuff. If that's too difficult let me know.

@Mrinal-Thomas-Epic Mrinal-Thomas-Epic force-pushed the epic-vep-molecular-consequence branch from b477dce to 9803028 Compare April 9, 2025 22:14
@larrybabb
Copy link
Contributor

@ahwagner @Mrinal-Thomas-Epic maybe we should find a block of time asap to draft the intronic vrs class we need to support some of these use cases. we can do this with a small group like the 3 of us and try it out in these va-spec statements before bringing it to the larger group. And since it's draft we can add it to VRS so that EPIC and VEP can move forward with these statements.

@mbrush
Copy link
Contributor

mbrush commented May 14, 2025

My primary comment on this PR is that Molecular Consequence really should be a Statement, not a Study Result. Sara Hunt raised this point on the last VA Community Call, and I strongly concur.

Some additional notes about the rationale for this and a proposed model are in the document here.

Also, with Mrinal‘s help I worked up a draft PR defining the classes/attributes needed for a Statement-based representation, along with some data examples.

Note that the proposal re-introduces a modeling pattern from a few years ago for capturing "ancillary information" within a Statement. This pattern goes back to the ACM days, and was originally informed by several use cases from our original landscape analysis. This pattern seemed a perfect fit for capturing some of the information accompanying the central MC annotation in VEP data, and I think will be a generally useful pattern to have in the model.

More info in the doc and PR, and looking forward to discussing this further.

Base automatically changed from 1.0.0-ballot.2025-03 to 1.x June 4, 2025 18:56
Updates to VariantMolecularConcequenceProposition model to reflect decisions on 9-25-25 VA call:

- Subject: MUST be a genomic variant 
- Predicate: hasMolecularConsequence (constant)
- Object: remains the place where we capture the MC terms - but we are dropping the MolecularConsequence class and using a more generic 'Concept or Concept Set' class 
- Qualifiers: separate qualifiers for transcript and protein variant context
Additions reflect decisions made on 9-25-25 VA Call

ConceptOrConceptSet class: defined as one of MappableConcept (for a single concept) or ConceptSet (necessarily 2 or more concepts) 

I made ConceptOrConceptSet the parent of Condition and Therapeutic, and I made ConceptSet the parent of ConditionSet and TherapyGroup  (@ahwagner  @larrybabb we didn’t explicitly discuss this, but I assumed it was the plan)
This attribute is no longer inherited, as the MC Proposition class now descends directly from SubjectVariantProposition.
@mbrush
Copy link
Contributor

mbrush commented Sep 26, 2025

@Mrinal-Thomas-Epic @ahwagner @larrybabb - I updated this MC Profile PR (schema, and test fixture examples) according to implement decisions from the 9-25-25 VA Call. Please review to make sure I got things right.

I made a separate PR (#375) to rename ClinicalVariantProposition -> GeneticContextVariantPropositon.

Below is a summary of these decisions, the corresponding model changes, and some questions:

va-core-source.yaml

  • subjectVariant: MUST be a genomic variant
    • I specified this informally in the attribute description as a MUST convention (AFAIK there is no way to formally encode this in the model?)
    • Also, @ahwagner given this convention, do the four variation classes in the oneOf clause for this attribute make sense? i.e. does it make sense to allow Categorical variants, or Adjacencies, in the subject field given the requirement to be genomic?
  • objectConsequence: remains the place where we capture the MC terms - but we are dropping the MolecularConsequence class and using a more generic 'Concept or Concept Set' class (I don’t love the name, but we can change later if we want)
  • qualifiers: separate, single-valued qualifiers for transcript and protein variant context
    • transcriptVariationContextQualifier, and proteinVariationContextQualifier
    • @ahwagner @larrybabb please review descriptions and informal requirements I added here - which reflect what I understood from our conversations.

domain-entities-source.yaml

  • ConceptOrConceptSet:

    • defined as oneOf MappableConcept (for a single concept) or ConceptSet (necessarily 2 or more concepts)
    • made this the parent of Therapy and Condition - @ahwagner @larrybabb we didn’t explicitly discuss this, but I assumed it was the plan.
    • This parallels pattern for Therapeutic as one of MappableConcept or TherapyGroup, and Condition as one of Mappable:Concept or Conditionset.
  • ConceptSet:

    • defined as the parent of TherapyGroup and ConditionSet - @ahwagner @larrybabb we didn’t explicitly discuss this, but I assumed it was the plan.

Test Fixtures:

  • updated to comply with the changes above

Questions:

  • should we add a proper genomic variant Allele object as the subject in test fixture examples to illustrate this requirement and data structure
  • confirm that "OR" is the proper membershipOperator in the intronic-consequence example (see my description of this attribute in the ConceptSet class description)
  • given the inheritance pattern for Concept Sets above, will the membershipOperator attribute in TherapyGroup and ConditionSet inherit the enumeration binding from the parent ConceptSet class? If so, we can remove these duplications. Note that the descriptions/comments on these child classes are specialized however, so keep these as extensions.

Copy link
Contributor

@larrybabb larrybabb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job @mbrush ! Overall I like the changes and think they work. I still need to pull this down and run the tests, etc. but I'm staring with this preliminary review.

One topic we should discuss before merging is whether ConceptSet belongs in the gks core so that Cat-VRS and VRS can use it if needed. This class seems to be a further wrapping and use of the general use MappableConcept class and therefore makes more sense to be in the core library IMO.

This is potentially a bigger decision and should be thoughtfully considered. Doing this would cause us to have to do a point release for every product (I think), maybe a patch release.

I also want to verify with @ahwagner whether va-spec would need a minor release change or path release change to it's version. Since the ConditionSet is in trial use I think this would be a breaking change (i.e. the "conditions" attibritue becomes "concepts".

@Mrinal-Thomas-Epic
Copy link
Author

  • should we add a proper genomic variant Allele object as the subject in test fixture examples to illustrate this requirement and data structure

I think it would be good to include a full Allele in one of them, just for demonstration purposes. Regarding the genomic subjectVariant requirement, VEP does allow you to provide a c. variant as input, but I believe it converts it to genomic before then calculating consequences across all transcripts (example). Let's check with @sarahhunt , whether it makes sense to introduce this requirement

  • confirm that "OR" is the proper membershipOperator in the intronic-consequence example (see my description of this attribute in the ConceptSet class description)

I would argue "AND" fits better, since both terms apply.

Super minor, but where did we land on the word Context in qualifier names again? Everything else looks good to me!

@mbrush
Copy link
Contributor

mbrush commented Oct 15, 2025

Thanks for the feedback @Mrinal-Thomas-Epic . Addressing your points below:

  • If others agree that the test fixture data examples should include a proper genomic variant, hoping @larrybabb or @ahwagner or @korikuzma might add this in. (Or at least a proper va hash id reference for the variant)
  • Re: the correct logical operator - the semantics of "OR" is that all members of the set apply on their own, independent of the other (not that one of the set applies but possible not the other). This is why I think "OR" is correct here. The "AND" operator means that the members of the set are considered together as a co-occurring collective. This is applicable for things like therapy sets, or condition sets - but I don’t think makes much sense for MC term sets.
  • Finally, I think we agreed that "context" should remain in the qualifier names as they were implemented in my recent updates to the PR.

@mbrush
Copy link
Contributor

mbrush commented Oct 15, 2025

@larrybabb Thanks for the feedback as well.

I agree that ConceptSet and ConceptOrConceptSet may be candidates for gks-core. And that it is not critical to address now as it may require broader conversation, and add more work that would slow things now. Curious to hear from @ahwagner about this, your release question, and the other questions I posed in my comment above so we can merge this and be on our way!

Finally, @larrybabb it says you 'requested changes' - but I’m not sure how to tell what these changes are?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants