
Wrong insertion can be inferred from split alignments #561

@HillJamie

Attached is a toy dataset that illustrates the problem: sniffles_wrong_insertion_example.zip

It consists of:

  • a reference of 10k random nucleotides
  • an insertion of 30k random nucleotides
  • a read created by taking the first 3500nt of the reference, then the insertion, then the next 2500nt of the reference, and reverse complementing the result
  • a bam file from minimap2, sorted and indexed
  • the calls from sniffles 2.6.2 (installed with pip)
  • a simple script to reproduce the bam file starting from the read and reference
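The construction of the read described in the list above can be sketched as follows. This is not the attached script: the function names `revcomp` and `make_toy_read` and the seed are hypothetical, and the minimap2 mapping, sorting and indexing steps from the attachment are not reproduced here.

```python
import random

# Complement table for uppercase DNA (assumes only A, C, G, T appear).
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(_COMP)[::-1]


def make_toy_read(seed: int = 0):
    """Build a random reference, a random insertion and the read (hypothetical helper)."""
    rng = random.Random(seed)
    reference = "".join(rng.choice("ACGT") for _ in range(10_000))
    insertion = "".join(rng.choice("ACGT") for _ in range(30_000))
    # First 3500 nt of the reference, then the insertion, then the next
    # 2500 nt of the reference; the whole read is then reverse complemented.
    forward = reference[:3500] + insertion + reference[3500:6000]
    read = revcomp(forward)
    return reference, insertion, read


if __name__ == "__main__":
    reference, insertion, read = make_toy_read()
    print(len(reference), len(insertion), len(read))  # 10000 30000 36000
```

In the original read orientation the insertion occupies positions 2500 to 32500, while in the reference orientation (the reverse complement of the read) it occupies 3500 to 33500; the 1000nt difference between these two intervals is the offset described below.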

Sniffles calls an insertion at the correct location on the reference and with the expected length. However, the inserted sequence is offset by 1000nt from the true position in the read. The inserted sequence is extracted in sv.py here: https://github.com/fritzsedlazeck/Sniffles/blob/8a017b2b9047380abe2a6b0e5696f778ec092d4b/src/sniffles/sv.py#L576C1-L576C84

In this particular case, the correct coordinates are obtained by setting the end index to the read length - last.qry_end, and setting the start index to read length - current.qry_start.
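The arithmetic of this fix for the toy example can be worked through as below. This is a sketch that assumes `qry_start`/`qry_end` are expressed in the original read orientation while the stored SEQ is in reference orientation; `flip_interval` is a hypothetical helper, not a Sniffles function.

```python
def flip_interval(start: int, end: int, read_len: int) -> tuple[int, int]:
    """Map a half-open [start, end) interval on a read to the interval
    covering the same bases on the read's reverse complement."""
    return read_len - end, read_len - start


# Toy example: the original read is
# rc(ref[3500:6000]) + rc(insertion) + rc(ref[0:3500]), 36000 nt in total.
READ_LEN = 2_500 + 30_000 + 3_500

# Assumption: these are read-orientation query coordinates.
last_qry_end = 2_500        # end of the alignment to ref[3500:6000]
current_qry_start = 32_500  # start of the alignment to ref[0:3500]

# Using read-orientation coordinates directly on the reference-oriented SEQ
# gives the shifted interval; mirroring them gives the correct one.
buggy = (last_qry_end, current_qry_start)
fixed = flip_interval(last_qry_end, current_qry_start, READ_LEN)

print(buggy)  # (2500, 32500)
print(fixed)  # (3500, 33500)
```

The mirrored start is `READ_LEN - current_qry_start` and the mirrored end is `READ_LEN - last_qry_end`, which is the substitution proposed above.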

There are, however, many cases:

  • primary +, supplementary +, insertion between them
  • primary -, supplementary -, insertion between them (this is the attached data)
  • primary +, supplementary1 +, supplementary2 +, insertion between the two supplementary alignments
  • primary -, supplementary1 +, supplementary2 +, insertion between the two supplementary alignments
  • primary +, supplementary1 -, supplementary2 -, insertion between the two supplementary alignments
  • primary -, supplementary1 -, supplementary2 -, insertion between the two supplementary alignments

So far as I can tell, in the general case this subtraction is needed whenever the primary alignment is on the minus strand. If the strand of the alignments used to infer the insertion differs from the strand of the primary alignment, then one must also reverse complement the result (even if the primary alignment is on the plus strand).
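The general rule stated above can be expressed as a small function. This is a minimal sketch under stated assumptions, not the Sniffles implementation: it assumes the sequence is taken from the primary record's SEQ (the reverse complement of the original read when the primary is on the minus strand), that the gap between the flanking alignments is given in original read orientation, and that the result should be in the reference orientation of the flanking alignments. The function name `inserted_sequence` and its parameters are hypothetical.

```python
# Complement table; N is kept as N.
_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMP)[::-1]


def inserted_sequence(stored_seq: str, gap_start: int, gap_end: int,
                      primary_reverse: bool, flanks_reverse: bool) -> str:
    """Extract an insertion from the primary record's SEQ (hypothetical helper).

    stored_seq: SEQ of the primary record (reverse complement of the
        original read if primary_reverse is True).
    gap_start, gap_end: half-open gap between the flanking alignments,
        in original read orientation.
    flanks_reverse: strand of the alignments used to infer the insertion.
    """
    read_len = len(stored_seq)
    if primary_reverse:
        # Primary on the minus strand: mirror the read-orientation
        # coordinates onto the reverse-complemented SEQ.
        start, end = read_len - gap_end, read_len - gap_start
    else:
        start, end = gap_start, gap_end
    seq = stored_seq[start:end]
    if primary_reverse != flanks_reverse:
        # SEQ orientation differs from the flanks' orientation.
        seq = revcomp(seq)
    return seq


if __name__ == "__main__":
    # The six (primary, flanking alignments) strand cases listed above.
    cases = [("+", "+"), ("-", "-"), ("+", "+"), ("-", "+"), ("+", "-"), ("-", "-")]
    for primary, flanks in cases:
        flip = primary == "-"
        rc = primary != flanks
        print(f"primary {primary}, flanks {flanks}: mirror={flip}, revcomp={rc}")
```

Under these assumptions the attached case (primary minus, supplementary minus) needs only the coordinate mirroring, while a primary and flanks on opposite strands also needs a reverse complement, regardless of the primary's strand.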

Thanks for your time, and your continuous improvements to Sniffles2!
