Open
Description
Expected: Converting a long HGVS dup to variant coordinates then back again will make a dup
Actual: A long dup is converted to a delins:
from pyhgvs import parse_hgvs_name, variant_to_hgvs_name
g_hgvs_str = "NC_000001.10:g.235611675_235611994dup"
c_hgvs_str = "NM_003193.4(TBCE):c.1411_1501dup"
chrom, offset, ref, alt = parse_hgvs_name(g_hgvs_str, f, None)
g_hgvs_name = variant_to_hgvs_name(chrom, offset, ref, alt, f, None)
print(f"{g_hgvs_str=} => {g_hgvs_name=}")
chrom, offset, ref, alt = parse_hgvs_name(c_hgvs_str, f, transcript)
c_hgvs_name = variant_to_hgvs_name(chrom, offset, ref, alt, f, transcript)
print(f"{c_hgvs_str=} => {c_hgvs_name=}")
Output:
g_hgvs_str='NC_000001.10:g.235611675_235611994dup' => g_hgvs_name=HGVSName('g.235611773_235611774ins320')
c_hgvs_str='NM_003193.4(TBCE):c.1411_1501dup' => c_hgvs_name=HGVSName('NM_003193.4(TBCE):c.1491+18_1491+19ins320')
This is because hgvs_justify_indel only looks a hardcoded 100 bases around the indel
If you change the code to:
size = max(len(ref), len(alt)) + 1
start = max(offset - size, 0)
end = offset + size
It keeps the dup:
g_hgvs_str='NC_000001.10:g.235611675_235611994dup' => g_hgvs_name=HGVSName('g.235611675_235611994dup320')
c_hgvs_str='NM_003193.4(TBCE):c.1411_1501dup' => c_hgvs_name=HGVSName('NM_003193.4(TBCE):c.1411_1501dup320')
Metadata
Metadata
Assignees
Labels
No labels
Activity