Paired information is lost when k-mer mapper is turned off #1534

@andrewprzh

Description of bug

  1. Compared to the main branch, the E. coli TruSeq 100x dataset loses its longest contig (43,970 bp shorter).
    main: NODE_1_length_405507_cov_64.322436 (405507 bp)
    try-no-mapper: NODE_1_length_361537_cov_63.875850 (361537 bp)
    See alignment viewer
  2. It appears that path extend does not have enough information to thread through a big loop, ending at the edge with ID 2006. The proper path continues to 1429, but path extend loops back to 381962 (see attached PDF).
  3. The main cause of the lost paired information (381962 -> 1429 and 128578 -> 1429) is that 3 out of 6 reads do not map to these edges. In fact, the reads still map to these edges, but with much smaller ranges (see attached text file).
  4. In turn, the key reason for that is the high number of mismatches at the end of edge 381962 (see IGV screenshot). Previously, the k-mer mapper helped to map these reads.
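
To illustrate points 3 and 4, here is a minimal toy sketch (not SPAdes code) of why a cluster of mismatches near the end of an edge shrinks the mapped range of a read when mapping relies on exact k-mer matches. The function `mapped_range`, the sequences, the offset, and `k` are all hypothetical and chosen only for illustration.

```python
def mapped_range(read, edge, offset, k):
    """Return the longest read span (start, end) covered by consecutive
    exact k-mer matches against `edge`, assuming the read starts at
    position `offset` of the edge. Returns None if no k-mer matches."""
    best = None
    run_start = None
    for i in range(len(read) - k + 1):
        j = offset + i
        in_bounds = j >= 0 and j + k <= len(edge)
        if in_bounds and read[i:i + k] == edge[j:j + k]:
            if run_start is None:
                run_start = i
            current = (run_start, i + k)
            if best is None or current[1] - current[0] > best[1] - best[0]:
                best = current
        else:
            run_start = None  # a mismatch breaks the run of exact k-mers
    return best


# Toy edge and a 20 bp read taken from it (hypothetical data).
edge = "ACGTTGCAAGCTTAGGCATCGGATCCGTAA"
offset, k = 5, 5
clean = edge[offset:offset + 20]

# Introduce two mismatches close to the end of the read.
noisy = list(clean)
noisy[14] = "A"  # was C
noisy[17] = "C"  # was A
noisy = "".join(noisy)

print(mapped_range(clean, edge, offset, k))
print(mapped_range(noisy, edge, offset, k))
```

In this toy case the clean read is covered over its full length, (0, 20), while the read with two mismatches is covered only over (0, 14): every k-mer overlapping a mismatch fails to match, so the tail of the read is lost even though the read still maps.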

Proposed ideas for fix:

  1. Improve mapping so that reads can be threaded through mismatches.
  2. Currently, the mismatch corrector selects mismatch candidate positions based on the k-mers from the k-mer mapper. It also uses only reads that map entirely inside one edge. It needs to use proper read mapping to correct all possible mismatches.
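
As a rough sketch of the second idea, mismatch candidates could be collected from a pileup of read alignments, with reads that overhang the edge boundary clipped rather than discarded. This is an assumption-laden toy, not the SPAdes mismatch corrector: `mismatch_candidates`, the `(offset, read)` alignment format, and `min_support` are hypothetical.

```python
from collections import Counter


def mismatch_candidates(edge, alignments, min_support=2):
    """Return {position: base} for edge positions where the most common
    read base differs from the edge base.

    `alignments` is a list of (offset, read) pairs; `offset` may be
    negative or run past the edge end, in which case the overhanging
    part of the read is ignored instead of dropping the whole read."""
    votes = [Counter() for _ in edge]
    for offset, read in alignments:
        for i, base in enumerate(read):
            pos = offset + i
            if 0 <= pos < len(edge):
                votes[pos][base] += 1

    candidates = {}
    for pos, counter in enumerate(votes):
        if not counter:
            continue  # no read covers this position
        base, count = counter.most_common(1)[0]
        edge_base = edge[pos]
        if base != edge_base and count >= min_support and count > counter[edge_base]:
            candidates[pos] = base
    return candidates


# Hypothetical data: three reads agree on 'A' at edge position 3 (edge has 'T').
edge = "ACGTACGTAC"
alignments = [(-2, "TTACGAAC"), (0, "ACGAACGT"), (2, "GAACGTAC")]
print(mismatch_candidates(edge, alignments))
```

The first read starts two bases before the edge; a corrector that only accepts reads lying entirely inside one edge would ignore it, whereas here its in-edge bases still contribute votes.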

alignment_viewer.zip

E.coli 100x lost paired info.pdf

Image (IGV screenshot)

pe_fill.txt

spades.log

params.txt

SPAdes version

4.2.0

Operating System

Linux-6.8.0-65-generic-x86_64-with-glibc2.35

Python Version

3.10.12

Method of SPAdes installation

manual, try-no-mapper branch

No errors reported in spades.log

  • Yes
