
pileup on (large) metagenome - parallelization #404

Open
@brambloemen


Recently, I've been running modkit pileup with default settings on a metagenomic PromethION run (130 Gb).

The assembled metagenome consists of about 400 Mbp of contigs, ranging in size from a couple of thousand bp to 5 Mbp, with coverage ranging from 1X to 1000X or more.

The modkit pileup step is the bottleneck in my pipeline: it requires very high memory (peak about 350 GB) and very long compute times (e.g. >12 h on 24 cores).

The documentation states that --interval-size, --sampling-interval-size, and --chunk-size can be modified to improve parallelism.

What would be the best settings for my use case?

Thanks!

Bram
