Open
Description of bug
- Compared to the main branch, the E. coli TruSeq 100x dataset loses the longest contig.
main: NODE_1_length_405507_cov_64.322436 (405507 bp)
try-no-mapper: NODE_1_length_361537_cov_63.875850 (361537 bp)
See alignment viewer
- It appears that path extend does not have enough information to thread through a big loop, ending at the edge with ID 2006. The proper path continues to 1429, but path extend loops back to 381962 (see the attached PDF).
- The main cause of the lost paired information (381962 -> 1429 and 128578 -> 1429) is that 3 out of 6 reads do not map to these edges. More precisely, the reads still map to these edges, but with much smaller ranges (text file attached).
- In turn, the key reason for that is a high number of mismatches at the end of edge 381962 (see IGV screenshot). Previously, the k-mer mapper was helping to map these reads.
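To put a number on the regression, the two NODE_1 headers quoted above can be compared directly. This is a minimal sketch that assumes only the `NODE_<id>_length_<bp>_cov_<coverage>` header layout visible in this report; `parse_header` is a hypothetical helper, not part of SPAdes.

```python
import re

# Header layout as it appears in this report: NODE_<id>_length_<bp>_cov_<coverage>
HEADER_RE = re.compile(r"NODE_(\d+)_length_(\d+)_cov_([\d.]+)")

def parse_header(header):
    """Hypothetical helper: return (node_id, length_bp, coverage) from a contig header."""
    match = HEADER_RE.search(header)
    if match is None:
        raise ValueError(f"unrecognized contig header: {header!r}")
    return int(match.group(1)), int(match.group(2)), float(match.group(3))

# Headers copied from the two runs described in this issue.
main_header = "NODE_1_length_405507_cov_64.322436"
branch_header = "NODE_1_length_361537_cov_63.875850"

_, main_len, main_cov = parse_header(main_header)
_, branch_len, branch_cov = parse_header(branch_header)

lost_bp = main_len - branch_len
print(f"longest contig is shorter by {lost_bp} bp on try-no-mapper")
```

The difference is 43970 bp, while the reported coverage of the two contigs stays nearly the same.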
Proposed ideas for fix:
- Improve mapping to thread reads through mismatches
- Currently, the mismatch corrector selects mismatch candidate positions based on the k-mers from the k-mer mapper. It also uses only reads that map entirely inside one edge. Proper read mapping is needed to correct all possible mismatches.
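As a rough illustration of the second idea, here is a toy pileup that picks mismatch candidates from read alignments, including reads that only partly overlap the edge. It is not the SPAdes mismatch corrector: the function name, the `(start, read_seq)` input format, the ungapped-alignment assumption, and the `min_depth` / `min_fraction` thresholds are all assumptions made for the sketch.

```python
from collections import Counter

def mismatch_candidates(edge_seq, alignments, min_depth=3, min_fraction=0.6):
    """Toy sketch: report positions where most aligned read bases disagree with the edge.

    alignments: iterable of (start, read_seq) ungapped placements on the edge.
    start may be negative, and a read may run past the edge end; only the
    overlapping part is counted, so reads spanning edge boundaries still contribute.
    """
    piles = [Counter() for _ in edge_seq]
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(edge_seq):
                piles[pos][base] += 1

    candidates = []
    for pos, pile in enumerate(piles):
        depth = sum(pile.values())
        if depth < min_depth:
            continue  # too few reads to trust this position
        base, count = pile.most_common(1)[0]
        if base != edge_seq[pos] and count / depth >= min_fraction:
            candidates.append((pos, edge_seq[pos], base))
    return candidates

# Made-up example: four reads agree on T at the last edge position, where the edge has C.
edge = "ACGTACGTAC"
reads = [(4, "ACGTAT"), (5, "CGTAT"), (6, "GTAT"), (2, "GTACGTAT")]
print(mismatch_candidates(edge, reads))
```

The point of the sketch is that candidate positions come from the alignments themselves rather than from k-mers, so mismatches clustered at the end of an edge are still visible.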
E.coli 100x lost paired info.pdf
spades.log
params.txt
SPAdes version
4.2.0
Operating System
Linux-6.8.0-65-generic-x86_64-with-glibc2.35
Python Version
3.10.12
Method of SPAdes installation
manual, try-no-mapper branch
No errors reported in spades.log
- Yes