Open
Labels
priority:high
Description
2. Compare the two Allele sequences, if:
   a. both are empty, the input Allele is a reference Allele. Return a new Allele with:
      1. the `location` from the input Allele.
      2. a `ReferenceLengthExpression` for the `state` with `length` and `repeatSubunitLength` both set to the length of the input `location`.
https://vrs.ga4gh.org/en/latest/conventions/normalization.html#allele-normalization
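The rule quoted above can be sketched on plain dictionaries, independent of vrs-python. This is only an illustration under stated assumptions: `trim_shared_ends` and `reference_allele_to_rle` are hypothetical helper names, not functions from `normalize.py`, and the reference sequence is passed in as a string rather than fetched through a data proxy. The `sequence` field is kept on the new state only because the expected output in this issue includes it.

```python
from __future__ import annotations

from copy import deepcopy


def trim_shared_ends(ref: str, alt: str) -> tuple[str, str]:
    """Remove the common prefix, then the common suffix, from both sequences."""
    prefix = 0
    while prefix < min(len(ref), len(alt)) and ref[prefix] == alt[prefix]:
        prefix += 1
    ref, alt = ref[prefix:], alt[prefix:]

    suffix = 0
    while suffix < min(len(ref), len(alt)) and ref[-1 - suffix] == alt[-1 - suffix]:
        suffix += 1
    return ref[: len(ref) - suffix], alt[: len(alt) - suffix]


def reference_allele_to_rle(allele: dict, ref_seq: str) -> dict | None:
    """Return a copy of `allele` with an RLE state if REF == ALT, else None.

    Hypothetical helper: `allele` is a plain dict shaped like a dumped VRS
    Allele, and `ref_seq` is the reference sequence at its location.
    """
    alt_seq = allele["state"]["sequence"]
    trimmed_ref, trimmed_alt = trim_shared_ends(ref_seq, alt_seq)
    if trimmed_ref or trimmed_alt:
        # Not a reference allele; the remaining normalization steps apply.
        return None

    location = allele["location"]
    length = location["end"] - location["start"]

    result = deepcopy(allele)  # keep the input location, leave the input untouched
    result["state"] = {
        "type": "ReferenceLengthExpression",
        "length": length,
        "repeatSubunitLength": length,
        "sequence": alt_seq,  # assumption: retained to match the expected test output
    }
    return result
```

Returning `None` for non-reference alleles keeps the sketch limited to step 2a; a real implementation would presumably continue with the rest of the normalization procedure instead.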
Intended implementation marked in code here:
vrs-python/src/ga4gh/vrs/normalize.py
Lines 137 to 140 in f01f648
    except ValueError:
        # Occurs for ref agree Alleles (when alt = ref)
        len_trimmed_ref = len_trimmed_alt = 0
        # TODO: Return RLE for ref agree Alleles
Adding a test to exercise this:
@pytest.mark.vcr
def test_reference_allele_rle(tlr):
    """Test that reference alleles (REF==ALT) are normalized to ReferenceLengthExpression."""
    # Test with gnomad format
    gnomad_ref_allele = "1-100210778-AA-AA"
    allele = tlr._from_gnomad(gnomad_ref_allele)
    expected = {
        "type": "Allele",
        "location": {
            "type": "SequenceLocation",
            "sequenceReference": {
                "type": "SequenceReference",
                "refgetAccession": "SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO",
            },
            "start": 100210777,
            "end": 100210779,
        },
        "state": {
            "type": "ReferenceLengthExpression",
            "length": 2,
            "repeatSubunitLength": 2,
            "sequence": "AA",
        },
    }
    assert allele.model_dump(exclude_none=True) == expected
Fails with:
FAILED tests/extras/test_allele_translator.py::test_reference_allele_rle - AssertionError: assert {'type': 'Allele', 'location': {'type': 'SequenceLocation', 'sequenceReference': {'type': 'SequenceReference', 'refgetAccession': 'SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO'}, 'start': 100210777, 'end': 100210779}, 'state': {'type': 'LiteralSequenceExpression', 'sequence': 'AA'}} == {'type': 'Allele', 'location': {'type': 'SequenceLocation', 'sequenceReference': {'type': 'SequenceReference', 'refgetAccession': 'SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO'}, 'start': 100210777, 'end': 100210779}, 'state': {'type': 'ReferenceLengthExpression', 'length': 2, 'repeatSubunitLength': 2, 'sequence': 'AA'}}
Common items:
{'location': {'end': 100210779,
              'sequenceReference': {'refgetAccession': 'SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO',
                                    'type': 'SequenceReference'},
              'start': 100210777,
              'type': 'SequenceLocation'},
 'type': 'Allele'}
Differing items:
{'state': {'sequence': 'AA', 'type': 'LiteralSequenceExpression'}} != {'state': {'length': 2, 'repeatSubunitLength': 2, 'sequence': 'AA', 'type': 'ReferenceLengthExpression'}}
Full diff:
  {
      'location': {
          'end': 100210779,
          'sequenceReference': {
              'refgetAccession': 'SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO',
              'type': 'SequenceReference',
          },
          'start': 100210777,
          'type': 'SequenceLocation',
      },
      'state': {
-         'length': 2,
-         'repeatSubunitLength': 2,
          'sequence': 'AA',
-         'type': 'ReferenceLengthExpression',
?                  ^^^      ------
+         'type': 'LiteralSequenceExpression',
?                  ^^^  ++++++
      },
      'type': 'Allele',
  }