Same-as-reference Alleles should have a ReferenceLengthExpression state #587

@theferrit32

2. Compare the two Allele sequences, if:

   a. both are empty, the input Allele is a reference Allele. Return a new
      Allele with:

      1. the `location` from the input Allele.

      2. a `ReferenceLengthExpression` for the `state` with `length` and
         `repeatSubunitLength` both set to the length of the input `location`.

https://vrs.ga4gh.org/en/latest/conventions/normalization.html#allele-normalization
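
The rule quoted above can be sketched with plain dicts. This is only an illustration of the expected output shape, not the vrs-python implementation: the helper names `reference_allele_state` and `normalize_reference_allele` are hypothetical, and the sketch assumes `start` and `end` are plain integers in inter-residue coordinates, so the location length is `end - start`.

```python
from typing import Optional


def reference_allele_state(location: dict, ref_sequence: Optional[str] = None) -> dict:
    """Build a ReferenceLengthExpression state for a same-as-reference Allele.

    Hypothetical helper: assumes integer start/end (no Range values).
    """
    # Inter-residue coordinates: the span length is simply end - start.
    length = location["end"] - location["start"]
    state = {
        "type": "ReferenceLengthExpression",
        "length": length,
        # For a reference Allele both values equal the location length.
        "repeatSubunitLength": length,
    }
    # Include the literal sequence only when the caller provides it.
    if ref_sequence is not None:
        state["sequence"] = ref_sequence
    return state


def normalize_reference_allele(allele: dict, ref_sequence: Optional[str] = None) -> dict:
    """Return a new Allele dict that keeps the input location and uses an RLE state."""
    return {
        "type": "Allele",
        "location": allele["location"],
        "state": reference_allele_state(allele["location"], ref_sequence),
    }
```

For the location used in the test further down (`start` 100210777, `end` 100210779), this yields `length` and `repeatSubunitLength` of 2.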

The intended implementation point is already marked with a TODO in the code:

except ValueError:
    # Occurs for ref agree Alleles (when alt = ref)
    len_trimmed_ref = len_trimmed_alt = 0
    # TODO: Return RLE for ref agree Alleles

Adding a test to exercise this:

@pytest.mark.vcr
def test_reference_allele_rle(tlr):
    """Test that reference alleles (REF==ALT) are normalized to ReferenceLengthExpression."""
    # Test with gnomad format
    gnomad_ref_allele = "1-100210778-AA-AA"
    allele = tlr._from_gnomad(gnomad_ref_allele)

    expected = {
        "type": "Allele",
        "location": {
            "type": "SequenceLocation",
            "sequenceReference": {
                "type": "SequenceReference",
                "refgetAccession": "SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO",
            },
            "start": 100210777,
            "end": 100210779,
        },
        "state": {
            "type": "ReferenceLengthExpression",
            "length": 2,
            "repeatSubunitLength": 2,
            "sequence": "AA",
        },
    }

    assert allele.model_dump(exclude_none=True) == expected

Fails with:

FAILED tests/extras/test_allele_translator.py::test_reference_allele_rle - AssertionError: assert {'type': 'Allele', 'location': {'type': 'SequenceLocation', 'sequenceReference': {'type': 'SequenceReference', 'refgetAccession': 'SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO'}, 'start': 100210777, 'end': 100210779}, 'state': {'type': 'LiteralSequenceExpression', 'sequence': 'AA'}} == {'type': 'Allele', 'location': {'type': 'SequenceLocation', 'sequenceReference': {'type': 'SequenceReference', 'refgetAccession': 'SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO'}, 'start': 100210777, 'end': 100210779}, 'state': {'type': 'ReferenceLengthExpression', 'length': 2, 'repeatSubunitLength': 2, 'sequence': 'AA'}}

  Common items:
  {'location': {'end': 100210779,
                'sequenceReference': {'refgetAccession': 'SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO',
                                      'type': 'SequenceReference'},
                'start': 100210777,
                'type': 'SequenceLocation'},
   'type': 'Allele'}
  Differing items:
  {'state': {'sequence': 'AA', 'type': 'LiteralSequenceExpression'}} != {'state': {'length': 2, 'repeatSubunitLength': 2, 'sequence': 'AA', 'type': 'ReferenceLengthExpression'}}

  Full diff:
    {
        'location': {
            'end': 100210779,
            'sequenceReference': {
                'refgetAccession': 'SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO',
                'type': 'SequenceReference',
            },
            'start': 100210777,
            'type': 'SequenceLocation',
        },
        'state': {
  -         'length': 2,
  -         'repeatSubunitLength': 2,
            'sequence': 'AA',
  -         'type': 'ReferenceLengthExpression',
  ?                  ^^^      ------
  +         'type': 'LiteralSequenceExpression',
  ?                  ^^^  ++++++
        },
        'type': 'Allele',
    }
