Hi fgbio,
I think I have found some behaviour in GroupReadsByUmi that I wasn't expecting.
I have a bam file containing 9 paired-end reads with the same start/end position, but 7 are on the forward strand and 2 are on the reverse strand:
The UMIs are all set to be the same, AAAAAA (see #1077 for the reason why!).
When I run GroupReadsByUmi, I was expecting it to group the two strands separately, leading to two consensus reads, one on each strand.
However, instead I get (with "Edit" or "Adjacency") only one group and one consensus read (that corresponds to the forward strand, although that could be because it was the first read in the set).
If I set the UMIs to a joint UMI like "AAAA-AAAA", the Paired strategy will correctly say these are the same.
It looks like the strand is not taken into account when assigning reads to a group, so both strands can end up combined whenever reads share the same start/end and the same UMI. This is probably a niche case for most library preps, but it breaks everything when you set all the UMIs to be the same!
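To illustrate what I think is happening, here is a toy Python sketch. It is not fgbio's actual implementation, just a minimal model of grouping by a key. The `Read` type, the `group_reads` function, and the coordinates 100/250 are all made up for illustration; only the 7 forward / 2 reverse split and the AAAAAA UMI come from my data.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical, simplified stand-in for a read pair."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"
    umi: str


def group_reads(reads, use_strand):
    """Group read names by (start, end, umi), optionally adding strand to the key."""
    groups = defaultdict(list)
    for r in reads:
        key = (r.start, r.end, r.umi)
        if use_strand:
            key = key + (r.strand,)
        groups[key].append(r.name)
    return dict(groups)


# 7 forward and 2 reverse reads, same (made-up) start/end, same UMI.
reads = [Read(f"fwd{i}", 100, 250, "+", "AAAAAA") for i in range(7)]
reads += [Read(f"rev{i}", 100, 250, "-", "AAAAAA") for i in range(2)]

without_strand = group_reads(reads, use_strand=False)
with_strand = group_reads(reads, use_strand=True)

# Number of groups without and with strand in the key.
print(len(without_strand), len(with_strand))  # prints "1 2"
```

Without strand in the key, all 9 reads fall into a single group, which matches what I see. With strand in the key, I get the two groups (7 and 2 reads) that I was expecting.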