
New pipeline: nf-core/nanotax #104

@dialvarezs


Pipeline title/name

nanotax

Keywords

nanopore, 16s, 18s, its, metabarcoding

What is it about?

A metabarcoding pipeline specially designed for Oxford Nanopore data.

Please provide a schematic diagram of the proposed pipeline


I confirm my proposed pipeline will follow nf-core guidelines. Most importantly, my pipeline will:

  • be built with Nextflow.
  • pass nf-core lint tests and use standardized parameters.
  • be community-owned and developed within the nf-core organization.
  • be open source under the MIT license with proper credits and acknowledgments.
  • have a descriptive, all-lowercase name without punctuation.
  • use the nf-core pipeline template and predominantly use official nf-core modules.
  • focus on a specific data/analysis type with appropriate scope.
  • have properly maintained documentation.
  • be bundled using versioned Docker/Singularity containers.

Why do we need a new pipeline?

Metabarcoding analysis in nf-core is handled by ampliseq, which supports short-read technologies and PacBio sequencing. However, the higher error rate of Oxford Nanopore reads makes traditional ASV inference and OTU clustering approaches difficult to apply.

Instead, Nanopore reads are typically assigned a taxonomy directly rather than being grouped into molecular units; this is the approach taken by the official EPI2ME workflow. There are also specialized tools such as EMU, which uses an expectation-maximization (EM) algorithm to generate more accurate abundance profiles.
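To illustrate the general idea behind EM-based abundance estimation, here is a minimal Python sketch. It is not EMU's implementation: it assumes that per-read likelihoods P(read | taxon) are already available (for example, derived from alignments), and the function name `em_abundance` is hypothetical. Each iteration assigns every read to taxa in proportion to the current abundance estimates (E-step), then re-estimates the abundances as the average of those assignments (M-step).

```python
def em_abundance(likelihoods, max_iter=200, tol=1e-9):
    """Estimate relative taxon abundances with a basic EM loop.

    Hypothetical helper. likelihoods[r][t] is an assumed P(read r | taxon t);
    how these values are computed is outside the scope of this sketch.
    """
    n_taxa = len(likelihoods[0])
    abundance = [1.0 / n_taxa] * n_taxa  # uniform starting profile
    for _ in range(max_iter):
        totals = [0.0] * n_taxa
        used_reads = 0
        for row in likelihoods:
            # E-step: weight each taxon's likelihood by its current abundance
            weighted = [lik * ab for lik, ab in zip(row, abundance)]
            z = sum(weighted)
            if z == 0:
                continue  # read with no support for any taxon is ignored
            used_reads += 1
            for t in range(n_taxa):
                totals[t] += weighted[t] / z  # posterior P(taxon t | read r)
        if used_reads == 0:
            raise ValueError("no read has a nonzero likelihood for any taxon")
        # M-step: new abundance is the mean posterior over all used reads
        new_abundance = [x / used_reads for x in totals]
        delta = max(abs(a - b) for a, b in zip(new_abundance, abundance))
        abundance = new_abundance
        if delta < tol:
            break
    return abundance


# Toy input: two reads clearly from taxon A, one read equally likely A or B
reads = [[1.0, 0.0], [1.0, 0.0], [0.5, 0.5]]
print(em_abundance(reads))  # estimate for taxon A approaches 1.0
```

Under this simplified model, the ambiguous third read is progressively attributed to the taxon supported by the other reads, so the estimate converges towards [1, 0] instead of the naive even split of that read ([0.83, 0.17]). This reallocation of ambiguous reads is what separates EM-based profiling from simple per-read assignment.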

This use case is not yet fully covered in nf-core. Adding it to ampliseq could be difficult because ampliseq relies heavily on ASV/OTU formation, but we can develop and share subworkflows where applicable (mainly for post-analysis).

Who would be interested?

Researchers analyzing the composition of microbial samples sequenced with Oxford Nanopore, a technology that keeps gaining popularity given its low entry barrier and ease of use.

What has been done so far

We have an initial version for 16S using MMseqs that we use internally in our lab.
We are currently refactoring it into subworkflows to achieve a more modular design and make it easy to add support for other amplicons (18S, ITS) and other classification tools. In the process, we have also been porting local modules to nf-core.

URL to existing work (if applicable)

https://github.com/catg-umag/nanotax

Are there any similar existing nf-core pipelines?

ampliseq
