Description
Feature description
Could we update cat-vrs-python so that, when validating a cat-vrs object, it produces warnings if the following Sequence Location and Sequence Reference fields are not populated?
Within a Sequence Location:
- start
- end
- sequence
Within Sequence Reference:
- id
- name
Use case
We would like to encourage implementers to populate these fields / attributes within their cat-vrs/vrs objects, and adding a warning from the reference implementation(s) is a gentle way to suggest that they do so. Because these are optional attributes of Sequence Location and Sequence Reference, implementations and tools sometimes include only the hashes, which reduces the utility of the cat-vrs/vrs objects.
@rhdolin opened an Issue emphasizing the importance of having these location and reference attributes populated within cat-vrs / vrs objects to facilitate matching and searching across categorical variants via location. At today's meeting he shared a really nice illustration going over their use case. @cmprocknow also added that they are trying to do something similar at EPIC, and are facing similar limitations from receiving only hashed ids from some data sources.
For example, while the variant normalizer produces start, end, and sequence for the location object, it only produces the refgetAccession for sequenceReference. @korikuzma explained to me a bit ago that this could be resolved by updating seqrepo, if my memory is serving me correctly.
Proposed solution
I think this could be addressed by adding a function to the DefiningAlleleConstraint and DefiningLocationConstraint classes within cat-vrs/src/ga4gh/cat-vrs/python that checks whether these fields are populated.
I'm not sure of the interplay between vrs-python and cat-vrs-python, but this may be more appropriate to put in the vrs-python repository.
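To make the idea concrete, here is a rough sketch of what such a check might look like. It works on plain dicts rather than the real cat-vrs-python or vrs-python models, and the function name `warn_on_missing_fields` and the two field tuples are hypothetical, taken only from the lists in this issue. It assumes the Sequence Reference is nested under a `sequenceReference` key, and it skips the reference checks when that value is not an inlined object.

```python
import warnings

# Field lists copied from this issue; hypothetical constants, not an existing API.
LOCATION_FIELDS = ("start", "end", "sequence")
REFERENCE_FIELDS = ("id", "name")


def warn_on_missing_fields(location: dict) -> list[str]:
    """Warn about unpopulated optional fields on a SequenceLocation-like dict.

    Returns the missing field paths so callers can also inspect them directly.
    """
    # "is None" is used on purpose so that a legitimate start of 0 is not flagged.
    missing = [field for field in LOCATION_FIELDS if location.get(field) is None]

    # Only inspect the reference when it is an inlined object (assumption:
    # it may otherwise be absent or given as a string identifier).
    seq_ref = location.get("sequenceReference")
    if isinstance(seq_ref, dict):
        missing += [
            f"sequenceReference.{field}"
            for field in REFERENCE_FIELDS
            if seq_ref.get(field) is None
        ]

    # Emit one warning per missing field; validation itself does not fail.
    for path in missing:
        warnings.warn(
            f"Optional field '{path}' is not populated",
            UserWarning,
            stacklevel=2,
        )
    return missing
```

Using warnings rather than exceptions keeps the objects valid while still nudging implementers. In the actual repositories this logic would presumably live on the model classes themselves, but where exactly is an open question.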
Alternatives considered
No response
Implementation details
No response
Potential Impact
No response
Additional context
No response
Contribution
None