
hgvs to vrs conversion fails with repeat sequence notation #582

@spencerseale

I'm translating the info_CLNHGVS field in the ClinVar GRCh38 build into VRS IDs.

Many HGVS expressions that use repeat sequence notation (a trailing [int]) fail to translate. Is there a transformation I should apply to the source HGVS data before passing it to the allele translator?

Error translating NC_000001.11:g.930090TTCCTCTCCTCCTGCCCCACC[2]: NC_000001.11:g.930090TTCCTCTCCTCCTGCCCCACC[2]: char 42: expected the character '='
Error translating NC_000001.11:g.930139CCT[1]: NC_000001.11:g.930139CCT[1]: char 24: expected the character '='
Error translating NC_000001.11:g.930212AAG[1]: NC_000001.11:g.930212AAG[1]: char 24: expected the character '='

Using ga4gh.vrs==2.1.3:

from ga4gh.vrs.dataproxy import create_dataproxy
from ga4gh.vrs.extras.translator import AlleleTranslator
import os

os.environ["UTA_DB_URL"] = "postgresql://anonymous:[email protected]:5432/uta/uta_20241220"
seqrepo_rest_service_url = "seqrepo+https://services.genomicmedlab.org/seqrepo"
dataproxy = create_dataproxy(uri=seqrepo_rest_service_url)
translator = AlleleTranslator(dataproxy) 

hgvs = [
    "NC_000001.11:g.930090TTCCTCTCCTCCTGCCCCACC[2]",
    "NC_000001.11:g.930139CCT[1]",
    "NC_000001.11:g.930212AAG[1]",
]

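for h in hgvs:
    try:
        translated = translator.translate_from(h, "hgvs")
    except Exception as e:
        # Report inputs the translator cannot parse instead of stopping
        print(f"Error translating {h}: {e}")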
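
Until there is a recommended transformation, one possible stopgap is to separate the repeat-notation expressions from the rest before calling the translator. The following is only a sketch under assumptions: `REPEAT_RE` and `split_repeat_notation` are hypothetical names, not part of ga4gh.vrs, and the pattern covers only genomic (`g.`) expressions shaped like the failing examples above (position, optional end position, repeat unit, `[count]`). It does not convert the repeats into anything the translator accepts; it only sets them aside with their parsed parts.

```python
import re

# Hypothetical helper pattern (not part of ga4gh.vrs): matches genomic HGVS
# strings in repeat notation, e.g. "NC_000001.11:g.930139CCT[1]".
REPEAT_RE = re.compile(
    r"^(?P<accession>[^:]+):g\."
    r"(?P<start>\d+)(?:_(?P<end>\d+))?"
    r"(?P<unit>[ACGTN]+)\[(?P<count>\d+)\]$"
)


def split_repeat_notation(expressions):
    """Partition HGVS strings into (plain, repeats).

    `plain` holds strings that do not match the repeat pattern;
    `repeats` holds one dict per repeat-notation string with its parsed parts.
    """
    plain, repeats = [], []
    for expr in expressions:
        m = REPEAT_RE.match(expr)
        if m is None:
            plain.append(expr)
            continue
        repeats.append(
            {
                "hgvs": expr,
                "accession": m["accession"],
                "start": int(m["start"]),
                "end": int(m["end"]) if m["end"] else None,
                "unit": m["unit"],
                "count": int(m["count"]),
            }
        )
    return plain, repeats
```

With this split, only the `plain` list would be passed to `translator.translate_from(h, "hgvs")`, and the `repeats` list could be logged or handled separately once a supported approach is known.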
