
Strand not taken into account inside GroupReadsByUmi #1084

@SPPearce

Hi fgbio,

I think I have found an unexpected issue in the GroupReadsByUmi code.

I have a bam file containing 9 paired-end reads with the same start/end position, but 7 are on the forward strand and 2 are on the reverse strand:

[Image: the nine read pairs]

The UMIs are all set to be the same, AAAAAA (see #1077 for the reason why!).
When I ran GroupReadsByUmi, I expected it to group the two strands separately, leading to two consensus reads, one per strand.
Instead, with either the "Edit" or "Adjacency" strategy, I get only one group and one consensus read (corresponding to the forward strand, although that may just be because it was the first read in the set).

If I instead use a paired UMI such as "AAAA-AAAA" with the Paired strategy, it correctly identifies these reads as the same molecule.
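
For comparison, the idea behind a paired (duplex) UMI can be sketched as below. This is a hypothetical simplification, not fgbio's actual code: assuming each UMI has the form "X-Y" and that "forward strand" here means R1 is the leftmost read, a pair with R1 leftmost keeps its UMI as-is and is labelled strand A, while a pair with R2 leftmost has its two halves swapped and is labelled strand B. Both strands of one molecule then share a single key but stay distinguishable. The names `paired_key` and `r1_is_earlier` are illustrative only.

```python
from collections import defaultdict


def paired_key(umi: str, r1_is_earlier: bool) -> tuple[str, str]:
    """Return (oriented UMI, strand label) for a hypothetical dual UMI "X-Y".

    Pairs whose R1 is leftmost keep the UMI as-is (strand "A"); pairs whose
    R2 is leftmost have the halves swapped (strand "B"), so both strands of
    the same source molecule end up with the same oriented UMI.
    """
    left, right = umi.split("-")
    if r1_is_earlier:
        return f"{left}-{right}", "A"
    return f"{right}-{left}", "B"


# Hypothetical data mirroring the issue: 7 pairs with R1 leftmost, 2 with R2 leftmost.
orientations = [True] * 7 + [False] * 2
molecules: dict[str, list[str]] = defaultdict(list)
for r1_first in orientations:
    key, strand = paired_key("AAAA-AAAA", r1_first)
    molecules[key].append(strand)

# One molecule key, with the strand labels kept separate.
print({key: (labels.count("A"), labels.count("B")) for key, labels in molecules.items()})
```

Because the strand label travels alongside the shared key, a downstream consensus step can still build one consensus per strand.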

It looks like strand is not taken into account when assigning reads to groups, so reads from both strands are combined whenever they share the same start/end and the same UMI. This is probably a niche case for most library preps, but it breaks everything when all the UMIs are set to the same value!
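
The behaviour described above can be reproduced in miniature with a toy model. This is a hypothetical Python sketch, not fgbio's grouping code: each read pair is reduced to a grouping key, and the only difference between the two modes is whether R1's strand is part of that key. The `ReadPair` type, the `count_groups` helper and the coordinates 100 and 250 are made-up stand-ins for the shared start/end in the example BAM.

```python
from collections import Counter
from typing import NamedTuple


class ReadPair(NamedTuple):
    start: int
    end: int
    r1_forward: bool  # True if read 1 maps to the forward strand
    umi: str


def count_groups(pairs: list[ReadPair], use_strand: bool) -> Counter:
    """Count read pairs per grouping key, optionally including R1's strand."""
    groups: Counter = Counter()
    for p in pairs:
        key: tuple = (p.start, p.end, p.umi)
        if use_strand:
            # Adding strand to the key splits same-position, same-UMI pairs by strand.
            key += (p.r1_forward,)
        groups[key] += 1
    return groups


# Hypothetical data mirroring the issue: 7 forward and 2 reverse pairs, identical UMIs.
reads = [ReadPair(100, 250, True, "AAAAAA")] * 7 + [ReadPair(100, 250, False, "AAAAAA")] * 2

print(len(count_groups(reads, use_strand=False)))  # strand ignored: one group
print(len(count_groups(reads, use_strand=True)))   # strand included: two groups
```

With strand ignored, all nine pairs collapse into one group of 9; with strand in the key, they split into groups of 7 and 2, which is the grouping the issue expected.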
