When aligning sequences with indels in repeat regions, the alignment is ambiguous. Most aligners implicitly shuffle gaps to the left or to the right, but this choice is arbitrary: the real region of ambiguity spans from the leftmost to the rightmost possible gap position. As a result, it’s hard to know whether a particular position falls within a region of ambiguity.
Proposal: Based on work from NCBI, GA4GH adopted fully-justified normalization for the GA4GH VR specification.
Migrating UTA alignments to fully-justified form would allow relatively simple tests for whether a variant overlaps a region of ambiguity.
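To make the idea concrete, here is a minimal sketch of how a region of ambiguity could be computed and tested. It is not UTA's or the VR specification's implementation: the function names `ambiguity_region` and `overlaps` are hypothetical, coordinates are assumed to be interbase (0-based, half-open), and only pure insertions and pure deletions are handled. The gap is shuffled as far left and as far right as the sequence allows, and the two extremes bound the region.

```python
def ambiguity_region(ref: str, start: int, end: int, alt: str = "") -> tuple[int, int]:
    """Return (left, right) interbase bounds over which an indel can be shifted.

    ref[start:end] is the deleted sequence (empty for an insertion);
    alt is the inserted sequence (empty for a deletion).
    Hypothetical helper for illustration only.
    """
    deleted = ref[start:end]
    if bool(deleted) == bool(alt):
        raise ValueError("expects a pure insertion or a pure deletion")
    unit = deleted or alt  # the sequence being shuffled

    # Shuffle left: the base before the gap must equal the last base of the unit.
    left, u = start, unit
    while left > 0 and ref[left - 1] == u[-1]:
        u = u[-1] + u[:-1]  # rotate the unit one base to the right
        left -= 1

    # Shuffle right: the base after the gap must equal the first base of the unit.
    right, u = end, unit
    while right < len(ref) and ref[right] == u[0]:
        u = u[1:] + u[0]  # rotate the unit one base to the left
        right += 1

    return left, right


def overlaps(region: tuple[int, int], q_start: int, q_end: int) -> bool:
    """True if the half-open interval [q_start, q_end) intersects the region.

    Assumption: intervals that merely touch the region boundary do not overlap.
    """
    left, right = region
    return q_start < right and left < q_end


# Deleting "AC" at [3, 5) of GACACACT can be shifted anywhere within ACACAC.
region = ambiguity_region("GACACACT", 3, 5)
print(region, overlaps(region, 6, 7), overlaps(region, 7, 8))
```

With fully-justified alignments stored, the first function would not be needed at query time: the stored gap bounds would already be the region, and the check reduces to the interval comparison in `overlaps`.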