
New pipeline: nf-core/protaga #102

@michalxlevin

Description


Pipeline title/name

proteotranscriptomics-assisted gene annotation

Keywords

transcriptome assembly, gene annotation, peptide evidence

What is it about?

The pipeline takes raw sequencing reads (FASTQ) and performs quality control, read correction, trimming, and alignment. The cleaned data are then used for either de novo transcriptome assembly (genome-free, GF) or genome-guided transcriptome assembly (GG). The resulting assemblies undergo annotation, quantification, quality assessment, and finally proteomics-based validation, using tools such as BUSCO, TransDecoder, MaxQuant, TransRate, and DETONATE.
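The workflow described above can be pictured as one ordered list of stages per assembly route, with shared preprocessing and shared downstream steps. The following is a minimal Python sketch only: the stage names are hypothetical paraphrases of the description, not the pipeline's actual Nextflow processes, and it ignores parallelism and the specific tools run at each step.

```python
from typing import List

# Hypothetical stage names paraphrasing the description above;
# they are NOT the real process names of the proposed pipeline.
PREPROCESSING = ["quality_control", "read_correction", "trimming", "alignment"]
ASSEMBLY = {
    "GF": "de_novo_assembly",        # genome-free route
    "GG": "genome_guided_assembly",  # genome-guided route (needs a reference genome)
}
DOWNSTREAM = ["annotation", "quantification", "quality_assessment", "proteomics_validation"]


def plan_stages(mode: str) -> List[str]:
    """Return the ordered stages for one assembly route ('GF' or 'GG')."""
    if mode not in ASSEMBLY:
        raise ValueError(f"unknown assembly mode {mode!r}; expected 'GF' or 'GG'")
    return PREPROCESSING + [ASSEMBLY[mode]] + DOWNSTREAM


if __name__ == "__main__":
    for route in ("GF", "GG"):
        print(route, ": ", " -> ".join(plan_stages(route)), sep="")
```

Only the assembly step differs between the two routes; everything before and after it is shared, which is why both assemblies can feed the same annotation and validation steps.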

Please provide a schematic diagram of the proposed pipeline

[Image: schematic diagram of the proposed pipeline]

I confirm my proposed pipeline will follow nf-core guidelines. Most importantly, my pipeline will:

  • be built with Nextflow.
  • pass nf-core lint tests and use standardized parameters.
  • be community-owned and developed within the nf-core organization.
  • be open source under the MIT license, with proper credits and acknowledgments.
  • have a descriptive, all-lowercase name without punctuation.
  • use the nf-core pipeline template and predominantly use official nf-core modules.
  • focus on a specific data/analysis type with appropriate scope.
  • have properly maintained documentation.
  • be bundled using versioned Docker/Singularity containers.

Why do we need a new pipeline?

We have been working with this pipeline for quite some time and have found it useful in many different research fields, especially where genome assemblies and annotations are of low quality.

Who would be interested?

Any researcher who wants to apply transcriptomics or proteomics to an organism that is not satisfactorily annotated will be able to build their own database of protein-coding genes supported by peptide evidence. The data can either be generated from scratch, with simple paired-end sequencing and mass-spectrometry measurements from the organism and tissue of interest, or taken from publicly available datasets.
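The core idea of "protein-coding genes with peptide evidence" can be illustrated by keeping only predicted proteins whose sequences contain at least one identified peptide. This is a minimal Python sketch under stated assumptions: the predicted proteins come as FASTA (for example, TransDecoder-style `.pep` output) and the identified peptides as plain sequences (for example, exported from a MaxQuant search; the export step is not shown). The function names are hypothetical, and exact substring matching is a simplification: it does not do protein inference and ignores the isoleucine/leucine ambiguity of mass spectrometry.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs; the identifier is the first header word."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def proteins_with_peptide_evidence(
    proteins: Dict[str, str], peptides: Iterable[str], min_peptides: int = 1
) -> Dict[str, Set[str]]:
    """Map each protein ID to the distinct peptides found verbatim in its sequence,
    keeping only proteins supported by at least `min_peptides` peptides."""
    peptide_set = {p.strip().upper() for p in peptides if p.strip()}
    support: Dict[str, Set[str]] = {}
    for pid, seq in proteins.items():
        seq = seq.upper()
        hits = {p for p in peptide_set if p in seq}  # naive exact substring match
        if len(hits) >= min_peptides:
            support[pid] = hits
    return support


if __name__ == "__main__":
    fasta = """>tx1.p1 hypothetical ORF
MKTAYIAKQRQISFVKSHFSRQ
>tx2.p1 hypothetical ORF
MSSHHHHHHGLVPRGS*""".splitlines()
    proteins = dict(read_fasta(fasta))
    print(proteins_with_peptide_evidence(proteins, ["AYIAKQR", "QISFVK", "NOTPRESENT"]))
```

Raising `min_peptides` gives a stricter database; in practice the peptide-to-protein assignment would come from the proteomics search engine rather than from this naive matching.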

What has been done so far

I have a working Nextflow pipeline that performs all steps smoothly.

URL to existing work (if applicable)

The pipeline is hosted on my GitLab account.

Are there any similar existing nf-core pipelines?

No response
