Help Understanding seed_depth #224

@cizydorczyk

Is there a simple explanation of what the seed_depth parameter means? It seems quite important, yet I am not grasping its significance despite having read the manuscript a couple of times.

I am interested in trying NextDenovo on some bacterial genomes sequenced with Nanopore. My sequencing depth is low (20-50x), but a preliminary run on a single genome at 20x depth gives me 5 contigs for a 5.5 Mbp genome. This is in line with what I get from Canu and Flye on the same data (5 contigs from each of these assemblers as well).

I am specifically interested in NextDenovo's read correction, but its assembly would be a plus. I understand it may not have been created for this purpose (i.e., bacterial genomes).

Its performance so far seems adequate, but I do not fully understand how it works or the significance of the various parameters, most importantly seed_depth.

  1. If I understand correctly, seed_depth is the sequencing depth used to select the longest reads for error correction? So 45X means that, starting with the longest read and proceeding in order of decreasing length, reads are selected from the input dataset until 45X sequencing depth is reached, and these reads are then used in error correction?

  2. Where does seed_cutoff come in? Is this the read length below which reads will not be selected in the above step, no matter the current depth?

  3. Lastly, --blacklist is used to somehow keep more reads in the corrected reads set, yes? I saw in another issue that setting this is what I want if I want to also assemble the corrected reads using another assembler?
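
The selection described in points 1 and 2 can be sketched as follows. This is only an illustration of that reading, not NextDenovo's actual code: the function `select_seeds`, its arguments, and the toy read lengths are all hypothetical, and whether NextDenovo really combines seed_depth and seed_cutoff this way is exactly what is being asked.

```python
def select_seeds(read_lengths, genome_size, seed_depth, seed_cutoff=0):
    """Pick the longest reads until their total bases reach seed_depth * genome_size.

    Assumption (not confirmed by NextDenovo): reads shorter than seed_cutoff
    are never selected, even if the target depth has not been reached.
    """
    target_bases = seed_depth * genome_size
    selected = []
    total = 0
    # Walk through the reads from longest to shortest.
    for length in sorted(read_lengths, reverse=True):
        if length < seed_cutoff:
            break  # every remaining read is shorter than the cutoff
        if total >= target_bases:
            break  # the requested depth has been reached
        selected.append(length)
        total += length
    return selected, total


# Toy example: a 1,000 bp "genome" and a seed_depth of 3x (target: 3,000 bases).
reads = [900, 800, 700, 600, 500, 400, 300]
print(select_seeds(reads, genome_size=1000, seed_depth=3))
print(select_seeds(reads, genome_size=1000, seed_depth=3, seed_cutoff=750))
```

Under this reading, the first call stops once the four longest reads cover 3x, while the second stops earlier because the cutoff excludes every read below 750 bp before the depth target is met. For my real data the target would be 20 x 5.5 Mbp = 110 Mbp of reads at seed_depth = 20.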

Thank you kindly,
Conrad
