Add warning if optional Sequence Location and Sequence Reference fields are not populated #20

@brendanreardon

Feature description

Could we update cat-vrs-python so that validating a cat-vrs object produces a warning when any of the following Sequence Location and Sequence Reference fields are not populated?

Within a Sequence Location:

  • start
  • end
  • sequence

Within Sequence Reference:

  • id
  • name
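To make the list above concrete, here is a minimal sketch of how the missing fields could be detected on a plain dict. The field names come from the lists above; the helper name `missing_recommended_fields`, the dict layout, and the placeholder accession are assumptions for illustration only, not part of cat-vrs-python or vrs-python.

```python
# Hypothetical helper: report which recommended fields are absent from a
# Sequence Location represented as a plain dict (layout is an assumption).
RECOMMENDED_LOCATION_FIELDS = ("start", "end", "sequence")
RECOMMENDED_REFERENCE_FIELDS = ("id", "name")


def missing_recommended_fields(location: dict) -> list[str]:
    # Compare against None so that a legitimate start of 0 is not flagged.
    missing = [
        f"location.{field}"
        for field in RECOMMENDED_LOCATION_FIELDS
        if location.get(field) is None
    ]
    reference = location.get("sequenceReference")
    if not isinstance(reference, dict):
        # Treat an absent or non-dict reference as having no fields set.
        reference = {}
    missing += [
        f"sequenceReference.{field}"
        for field in RECOMMENDED_REFERENCE_FIELDS
        if reference.get(field) is None
    ]
    return missing


# A hash-only location: only the accession is present (placeholder value).
hash_only = {
    "type": "SequenceLocation",
    "sequenceReference": {
        "type": "SequenceReference",
        "refgetAccession": "SQ.placeholder",
    },
}
print(missing_recommended_fields(hash_only))
```

For the hash-only example, all five recommended fields are reported as missing.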

Use case

We would like to encourage implementers to populate these fields/attributes within their cat-vrs/vrs objects, and a warning from the reference implementation(s) is a gentle way to nudge them to do so. Because these attributes are optional for Sequence Location and Sequence Reference, implementations and tools will sometimes include only the hashes, which reduces the utility of the cat-vrs/vrs objects.

@rhdolin opened an Issue emphasizing the importance of having these location and reference attributes populated within cat-vrs/vrs objects to facilitate matching and searching across categorical variants by location. At today's meeting he shared a really nice illustration of their use case. @cmprocknow also added that they are trying to do something similar at EPIC and are facing similar limitations from receiving only hashed ids from some data sources.

For example, while the variant normalizer produces start, end, and sequence for the location object, it only produces the refgetAccession for sequenceReference. @korikuzma explained to me a while ago that this could be resolved by updating seqrepo, if I remember correctly.

Proposed solution

I think this could be addressed by adding a function to the DefiningAlleleConstraint and DefiningLocationConstraint classes within cat-vrs/src/ga4gh/cat-vrs/python that checks whether these fields are populated.

I'm not sure of the interplay between vrs-python and cat-vrs-python, but this may be more appropriate to put in the vrs-python repository.
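A rough sketch of what such a check might look like, wherever it ends up living. The dataclasses below are stand-ins, not the real cat-vrs-python or vrs-python models, and `warn_on_unpopulated` is a hypothetical name; only the field names are taken from this issue. It uses Python's standard `warnings` module so that validation still succeeds and callers can filter or escalate the warnings as they see fit.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


# Stand-in models (assumption): the real classes live in vrs-python.
@dataclass
class SequenceReferenceStub:
    refgetAccession: str
    id: Optional[str] = None
    name: Optional[str] = None


@dataclass
class SequenceLocationStub:
    sequenceReference: SequenceReferenceStub
    start: Optional[int] = None
    end: Optional[int] = None
    sequence: Optional[str] = None


def warn_on_unpopulated(location: SequenceLocationStub) -> None:
    """Emit one UserWarning per recommended field that is left as None."""
    checks = [
        (location, ("start", "end", "sequence")),
        (location.sequenceReference, ("id", "name")),
    ]
    for obj, fields in checks:
        for field in fields:
            if getattr(obj, field) is None:
                warnings.warn(
                    f"{type(obj).__name__}.{field} is not populated",
                    UserWarning,
                    stacklevel=2,
                )


# Mirrors the normalizer case described above: the location fields are
# set, but the reference carries only an accession (placeholder value).
location = SequenceLocationStub(
    sequenceReference=SequenceReferenceStub(refgetAccession="SQ.placeholder"),
    start=100,
    end=101,
    sequence="A",
)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_on_unpopulated(location)
print([str(w.message) for w in caught])
```

In this example only the two Sequence Reference fields trigger warnings, since start, end, and sequence are populated.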

Labels

enhancement