New pipeline: nf-core/protaga #102

### Pipeline title/name

Proteotranscriptomics-assisted gene annotation

### Keywords

transcriptome assembly, gene annotation, peptide evidence

### What is it about?

The pipeline takes raw sequencing reads (FASTQ) and performs quality control, read correction, trimming, and alignment. The cleaned data are then used for either de novo (genome-free, GF) or genome-guided (GG) transcriptome assembly. The outputs of both assembly routes undergo annotation, quantification, and quality assessment, followed by proteomics-based validation. Tools used include BUSCO, TransDecoder, MaxQuant, TransRate, and DETONATE.
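To make the stage order easier to follow, here is a minimal Python sketch of the flow described above. It is not the pipeline's Nextflow code: the stage names are illustrative labels taken from the description, and the strictly linear ordering of the downstream steps is an assumption, since the actual pipeline may run some of them in parallel.

```python
from graphlib import TopologicalSorter

# Illustrative stage labels taken from the description; these are NOT the
# actual process names used in the pipeline.
SHARED_PREPROCESSING = ["quality_control", "read_correction", "trimming", "alignment"]
DOWNSTREAM = ["annotation", "quantification", "quality_assessment", "proteomics_validation"]


def build_stage_graph(mode: str) -> dict[str, set[str]]:
    """Return a stage -> dependencies mapping for one assembly route.

    mode is "GF" (genome-free, de novo) or "GG" (genome-guided).
    """
    if mode not in ("GF", "GG"):
        raise ValueError(f"unknown assembly mode: {mode!r}")
    assembly = "de_novo_assembly" if mode == "GF" else "genome_guided_assembly"
    order = ["raw_fastq", *SHARED_PREPROCESSING, assembly, *DOWNSTREAM]
    # Assumption: a simple chain in which every stage depends on the previous one.
    return {stage: {previous} for previous, stage in zip(order, order[1:])}


def execution_order(mode: str) -> list[str]:
    """Resolve the dependency graph into a runnable stage order."""
    return list(TopologicalSorter(build_stage_graph(mode)).static_order())


if __name__ == "__main__":
    for mode in ("GF", "GG"):
        print(mode, " -> ".join(execution_order(mode)))
```

Both routes share the same preprocessing and downstream stages and differ only in the assembly step, which is what the GF and GG labels distinguish.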


### Please provide a schematic diagram of the proposed pipeline

<img width="2266" height="1647" alt="Image" src="https://github.com/user-attachments/assets/98f3dc9e-057e-4953-b212-2cf6f9075e17" />

### I confirm my proposed pipeline will follow nf-core guidelines. Most importantly, my pipeline will:

- [x] be built with Nextflow.
- [x] pass nf-core lint tests and use standardized parameters.
- [x] be community-owned and developed within the nf-core organization.
- [x] be open source under the MIT license with proper credits and acknowledgments.
- [x] have a descriptive, all-lowercase name without punctuation.
- [x] use the nf-core pipeline template and predominantly use official nf-core modules.
- [x] focus on a specific data/analysis type with appropriate scope.
- [x] have properly maintained documentation.
- [x] be bundled using versioned Docker/Singularity containers.

### Why do we need a new pipeline?

We have worked with this pipeline for quite some time and have found it useful across many different research fields, especially where genome assemblies and annotations are not of high quality.

### Who would be interested?

Any researcher who wants to apply transcriptomics or proteomics to an organism that is not satisfactorily annotated will be able to build their own database of protein-coding genes supported by peptide evidence. The data can either be produced from scratch, using simple paired-end sequencing and mass spectrometry measurements from the organism and tissue of interest, or be taken from publicly available sources.

### What has been done so far

I have a working Nextflow pipeline that performs all steps smoothly.

### URL to existing work (if applicable)

The pipeline is hosted on my GitLab account.

### Are there any similar existing nf-core pipelines?

_No response_
