Description
Attached is a toy dataset that illustrates the problem: sniffles_wrong_insertion_example.zip
It consists of:
- a reference of 10k random nucleotides
- an insertion of 30k random nucleotides
- a read created by taking the first 3500nt of the reference, then the insertion, then the next 2500nt of the reference, and reverse complementing
- a bam file from minimap2, sorted and indexed
- the calls from sniffles 2.6.2 (installed with pip)
- a simple script to reproduce the bam file starting from the read and reference
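For reference, the construction of the toy read can be sketched roughly as follows. The seed and helper names are illustrative only; they are not the ones used to generate the attached files, so the actual sequences will differ.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def random_dna(n: int, rng: random.Random) -> str:
    """Return n random nucleotides."""
    return "".join(rng.choice("ACGT") for _ in range(n))


rng = random.Random(0)  # arbitrary seed, NOT the one behind the attached data
reference = random_dna(10_000, rng)
insertion = random_dna(30_000, rng)

# Reference-orientation layout: 3500nt flank, insertion, 2500nt flank
forward = reference[:3500] + insertion + reference[3500:6000]
read = revcomp(forward)  # the read as sequenced (minus strand)

# Where the (reverse-complemented) insertion sits in the read as sequenced
ins_start = len(read) - (3500 + len(insertion))
ins_end = len(read) - 3500
print(len(read), ins_start, ins_end)
```

The insertion occupies different intervals in the two orientations of the read, and the two intervals are shifted by 1000nt relative to each other.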
Sniffles calls an insertion at the correct location on the reference and with the expected length. However, the reported inserted sequence is extracted from a window that is shifted by 1000nt relative to the true position of the insertion in the read. The sequence is set in sv.py here: https://github.com/fritzsedlazeck/Sniffles/blob/8a017b2b9047380abe2a6b0e5696f778ec092d4b/src/sniffles/sv.py#L576C1-L576C84
In this particular case, the correct coordinates are obtained by setting the end index to the read length - last.qry_end, and setting the start index to read length - current.qry_start.
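A minimal sketch of that coordinate flip, with numbers for the toy read. The values of `last_qry_end` and `current_qry_start` below are what I would expect if both flanks align end to end; the values in the actual bam may differ by a few bases. `flip_interval` is a hypothetical helper, not a Sniffles function.

```python
def flip_interval(start: int, end: int, read_length: int) -> tuple[int, int]:
    """Map a half-open interval [start, end) onto the opposite strand of the read."""
    return read_length - end, read_length - start


read_length = 36_000         # 3500 + 30000 + 2500
last_qry_end = 2_500         # assumed: end of the first aligned flank on the read
current_qry_start = 32_500   # assumed: start of the second aligned flank on the read

start, end = flip_interval(last_qry_end, current_qry_start, read_length)
print(start, end)
```

With these assumed values, the uncorrected slice [2500, 32500) and the flipped slice [3500, 33500) differ by exactly the 1000nt offset seen in the call.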
There are, however, many cases:
- primary +, supplementary +, insertion between them
- primary -, supplementary -, insertion between them (this is the attached data)
- primary +, supplementary1 +, supplementary2 +, insertion between the two supplementary alignments
- primary -, supplementary1 +, supplementary2 +, insertion between the two supplementary alignments
- primary +, supplementary1 -, supplementary2 -, insertion between the two supplementary alignments
- primary -, supplementary1 -, supplementary2 -, insertion between the two supplementary alignments
So far as I can tell, in the general case this subtraction is needed whenever the primary alignment is on the minus strand. If the strand of the alignments used to infer the insertion differs from the strand of the primary alignment, then one must also reverse complement the result (even if the primary alignment is on the plus strand).
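To make the proposed rule concrete, here is a small sketch. It assumes that the query coordinates are expressed in the orientation of the read as sequenced, and that the stored sequence is the one from the primary alignment record (so it is reverse complemented when the primary is on the minus strand). The function and argument names are hypothetical and do not correspond to the Sniffles code.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def insertion_sequence(
    bam_seq: str,
    last_qry_end: int,
    current_qry_start: int,
    primary_is_reverse: bool,
    alignments_are_reverse: bool,
) -> str:
    """Extract the inserted sequence in reference orientation.

    bam_seq is the sequence stored with the primary alignment;
    alignments_are_reverse is the strand of the two alignments
    flanking the insertion.
    """
    start, end = last_qry_end, current_qry_start
    if primary_is_reverse:
        # The stored sequence is flipped relative to the query coordinates
        start, end = len(bam_seq) - end, len(bam_seq) - start
    seq = bam_seq[start:end]
    if alignments_are_reverse != primary_is_reverse:
        # Flanking alignments sit on the opposite strand from the stored sequence
        seq = revcomp(seq)
    return seq


# Tiny example mirroring the attached data: primary -, flanking alignments -
left, ins, right = "AAAAC", "GGTCATC", "TTG"
forward = left + ins + right  # what the bam stores for a minus-strand primary
print(insertion_sequence(forward, 3, 10, True, True))
```

Checking the same tiny example in the other three strand combinations gives the inserted sequence in reference orientation each time, which is why I think the rule covers all the cases listed above.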
Thanks for your time, and your continuous improvements to Sniffles2!