Description
Is your feature request related to a problem? Please describe.
When you use a list of barcodes, you may end up with this error:
ValueError: There are overlapping barcodes in the list (difficult to determine which barcode is correct).
The error message itself is not very explicit: to understand what it actually means, you have to locate the corresponding Python script and read its code, namely:
https://github.com/pughlab/ConsensusCruncher/blob/master/ConsensusCruncher/extract_barcodes.py
The error is raised here:
elif check_overlap(blist):
    raise ValueError("There are overlapping barcodes in the list (difficult to determine which barcode is correct).")
The function check_overlap(blist) makes it clearer what the error actually means:
def check_overlap(blist):
    """(list) -> bool
    Return boolean indicating whether or not there's overlapping barcodes within the list.

    >>> check_overlap(['AACT', 'AGCT'])
    False
    >>> check_overlap(['AACTCT', 'AACT'])
    True
    """
    overlap = False
    for barcode in blist:
        # Each barcode is always "in" itself, so a count > 1 means it also occurs
        # inside another entry of the list (a longer barcode or an exact duplicate).
        if sum([barcode in b for b in blist]) > 1:
            overlap = True
    return overlap
In other words, "overlapping barcodes" does not only mean exact duplicates! It's a bit more subtle than that.
AACTCT and AACT are considered overlapping barcodes because AACT is a substring of AACTCT.
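As a sketch of what a more explicit check could look like, the helpers below list every pair that check_overlap() would flag and put those pairs into the error message. `describe_overlaps` and `validate_barcodes` are hypothetical names, not part of ConsensusCruncher, and the sketch assumes (as check_overlap() does) that barcodes are plain strings compared with Python's substring operator `in`.

```python
from collections import Counter


def describe_overlaps(blist):
    """Hypothetical helper: describe every problem that check_overlap() would flag.

    Returns a list of human-readable strings (empty if the list is fine).
    """
    problems = []
    # Exact duplicates: each copy is "in" the other, so check_overlap() flags them too.
    for barcode, count in Counter(blist).items():
        if count > 1:
            problems.append(f"{barcode} appears {count} times")
    # Distinct barcodes where one is a substring of another.
    unique = list(dict.fromkeys(blist))  # de-duplicate, preserving order
    for short in unique:
        for long_ in unique:
            if short != long_ and short in long_:
                problems.append(f"{short} is a substring of {long_}")
    return problems


def validate_barcodes(blist):
    """Hypothetical validation step raising a more explicit ValueError."""
    problems = describe_overlaps(blist)
    if problems:
        raise ValueError(
            "There are overlapping barcodes in the list (a barcode must not be "
            "a substring of another barcode): " + "; ".join(problems)
        )
```

For example, describe_overlaps(['AACTCT', 'AACT', 'AGCT']) returns ['AACT is a substring of AACTCT'], which tells the user exactly which entries to fix.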
Describe the solution you'd like
Explain more explicitly what "overlapping barcodes" means and, if there are any, remove them automatically in the code before Consensus Cruncher proceeds.
Describe alternatives you've considered
To fix this error, you need to verify that all your barcodes are "unique" in the sense that they are not a substring of another barcode.
So I wrote my own script to remove all those "overlapping barcodes" before running Consensus Cruncher.
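The script itself is not included in this issue. A minimal sketch of such a pre-processing step, assuming the barcodes are available as a Python list of strings, could look like the following. `remove_overlapping_barcodes` is a hypothetical name, and the policy of dropping every barcode involved in an overlap (rather than keeping one of the pair) is an assumption, chosen so that check_overlap() returns False on the result.

```python
def remove_overlapping_barcodes(blist):
    """Hypothetical pre-processing step (not the script mentioned above).

    Exact duplicates are collapsed to a single copy. Then every barcode that is a
    substring of another barcode, or contains another barcode, is dropped, since
    it is difficult to determine which barcode of such a pair is correct.
    Returns (kept, removed), both in original order.
    """
    unique = list(dict.fromkeys(blist))  # de-duplicate, preserving order
    overlapping = set()
    for short in unique:
        for long_ in unique:
            if short != long_ and short in long_:
                # Flag both members of the overlapping pair.
                overlapping.update((short, long_))
    kept = [b for b in unique if b not in overlapping]
    removed = [b for b in unique if b in overlapping]
    return kept, removed
```

For example, remove_overlapping_barcodes(['AACTCT', 'AACT', 'AGCT', 'AGCT']) returns (['AGCT'], ['AACTCT', 'AACT']). If you would rather keep the longer barcode of each pair, adding only `short` (not both) to `overlapping` is enough: no remaining barcode is then a substring of another remaining barcode.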