New pipeline: nf-core/transporter #126

@Lumimar

Pipeline title/name

transporter

Keywords

genome annotation, annotation transfer, comparative genomics, viral genomics, microbial genomics, reference-guided annotation

What is it about?

TheTransporter is a bioinformatics workflow designed to transfer annotated
genomic features from a reference genome to a related genome sequence.
It identifies homologous loci by combining several BLAST-based similarity
searches and sequence comparisons, enabling automated annotation transfer
even when the genomes are not identical.

The workflow takes as input a reference genome with annotated features and a
target genome sequence. Using nucleotide and protein similarity searches,
the pipeline identifies the best matching loci in the target genome and
transfers the corresponding annotations while accounting for sequence
divergence.
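
As a rough illustration of the "best matching locus" idea, the sketch below picks the highest-scoring hit per reference feature from BLAST tabular output. It assumes the searches were run with the standard 12-column tabular format (`-outfmt 6`), which the proposal does not specify, and the names `Hit` and `best_hits` are hypothetical. It is not taken from TheTransporter, whose actual selection logic may weigh several searches together.

```python
import csv
import io
from typing import NamedTuple


class Hit(NamedTuple):
    feature: str    # query ID, i.e. the reference feature
    contig: str     # subject ID, i.e. the target sequence
    start: int      # 1-based, inclusive, always <= end
    end: int
    strand: str     # "+" or "-"
    bitscore: float


def best_hits(blast_tsv: str) -> dict[str, Hit]:
    """Return the highest-bitscore hit for each query in outfmt 6 text."""
    best: dict[str, Hit] = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        sstart, send = int(row[8]), int(row[9])
        hit = Hit(
            feature=row[0],
            contig=row[1],
            start=min(sstart, send),
            end=max(sstart, send),
            # BLAST reports minus-strand subject hits with sstart > send
            strand="+" if sstart <= send else "-",
            bitscore=float(row[11]),
        )
        current = best.get(hit.feature)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.feature] = hit
    return best
```

Bitscore is used here only because it is a simple, comparable ranking key within one search; a real implementation would likely also consider query coverage and identity.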

Originally developed for viral genome annotation workflows, the approach is
applicable to any relatively small genome where a curated reference
annotation exists and a new genome assembly needs to be annotated quickly
and consistently.

Please provide a schematic diagram of the proposed pipeline

[Image: schematic diagram of the proposed pipeline]

What would a minimal first release of this pipeline include?

The nf-core pipeline could consist of the following logical stages:

  1. Input validation
  • Validate the reference genome, annotation files, and target genome inputs.
  2. Reference database preparation
  • Extract coding sequences and proteins from the reference annotation.
  • Build BLAST databases.
  3. Similarity searches
  • Run BLAST-based searches (e.g. blastn, blastp, tblastn) between reference features and the target genome.
  4. Feature mapping
  • Identify the best matching locus for each annotated feature.
  • Resolve overlaps or conflicting mappings.
  5. Annotation transfer
  • Transfer features to the target genome coordinates.
  • Adjust coordinates and feature metadata as needed.
  6. Output generation
  • Produce annotated genome files and summary reports.
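
The feature mapping and output stages could be sketched roughly as follows. This is a minimal illustration, not the method used by TheTransporter: it assumes a greedy strategy (keep higher-scoring mappings first, reject a candidate that overlaps an already kept mapping by more than a chosen fraction of the shorter feature) and writes each kept mapping as a 9-column GFF3 line. The names `Mapping`, `resolve_conflicts`, and `to_gff3_line`, the `max_overlap_fraction` parameter, the fixed `gene` feature type, and the source label are all hypothetical.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    feature: str   # reference feature ID
    contig: str    # target sequence ID
    start: int     # 1-based, inclusive
    end: int
    strand: str    # "+" or "-"
    score: float   # e.g. a BLAST bitscore


def overlap_len(a: Mapping, b: Mapping) -> int:
    """Number of shared bases between two mappings (0 if on different contigs)."""
    if a.contig != b.contig:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def resolve_conflicts(mappings, max_overlap_fraction=0.5):
    """Greedily keep the best-scoring mappings that do not overlap too much."""
    kept = []
    for cand in sorted(mappings, key=lambda m: m.score, reverse=True):
        cand_len = cand.end - cand.start + 1
        conflict = False
        for k in kept:
            shorter = min(cand_len, k.end - k.start + 1)
            if overlap_len(cand, k) / shorter > max_overlap_fraction:
                conflict = True
                break
        if not conflict:
            kept.append(cand)
    # Report in genome order
    return sorted(kept, key=lambda m: (m.contig, m.start))


def to_gff3_line(m: Mapping, source: str = "transporter_sketch") -> str:
    """Format one mapping as a tab-separated GFF3 feature line."""
    return "\t".join([
        m.contig, source, "gene", str(m.start), str(m.end),
        f"{m.score:.1f}", m.strand, ".", f"ID={m.feature}",
    ])
```

A tolerance for partial overlap is used rather than a strict no-overlap rule because small genomes, viral ones in particular, often contain genuinely overlapping genes. The threshold of 0.5 is an arbitrary placeholder.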

I confirm my proposed pipeline will follow nf-core guidelines. Most importantly, my pipeline will:

  • be built with Nextflow.
  • pass nf-core lint tests and use standardized parameters.
  • be community-owned and developed within the nf-core organization.
  • be open source under the MIT license with proper credits and acknowledgments.
  • have a descriptive name that is all lowercase and contains no punctuation.
  • use the nf-core pipeline template and predominantly use official nf-core modules.
  • focus on a specific data/analysis type with appropriate scope.
  • have properly maintained documentation.
  • be bundled using versioned Docker/Singularity containers.

Why do we need a new pipeline?

The proposed pipeline addresses a common problem in genomics: transferring curated genome annotations from a well-characterised reference genome to newly assembled genomes that are closely related.

In many biological contexts, particularly viral and microbial genomics, new genomes are frequently generated that are highly similar to existing references. In these cases, de novo annotation can be unnecessary and time-consuming, and may produce gene models that are inconsistent with previously curated annotations. A reference-guided annotation transfer approach allows researchers to propagate existing high-quality annotations to new genomes in a consistent and reproducible way.

Currently, nf-core provides pipelines for many genomic analysis tasks (such as assembly, variant calling, transcriptomics, and functional annotation), but it does not include a workflow specifically designed for annotation transfer between related genomes. This type of workflow fills an important gap between genome assembly and downstream comparative genomics.

TheTransporter implements a structured approach for annotation transfer that combines multiple sources of evidence, including sequence similarity searches and ORF detection, to identify homologous loci and propagate annotations to the target genome.

An nf-core implementation would make this approach easier to adopt and integrate into broader genomic analysis workflows.

Who would be interested?

The pipeline would primarily benefit researchers working with viral and microbial genomes, where reference-guided annotation transfer is a common task.

Potential users include:

  • Viral genomics researchers who regularly assemble and annotate new viral genomes during surveillance or outbreak investigations.
  • Microbial genomics and comparative genomics groups analysing collections of related bacterial or archaeal genomes.
  • Public health and genomic surveillance laboratories that need rapid and consistent annotation of newly sequenced pathogens.
  • Genome curation teams maintaining reference genome annotations and propagating updates to related sequences.
  • Bioinformatics facilities and core labs supporting genomics projects that generate multiple related genome assemblies.

Because the pipeline focuses on transferring curated annotations rather than predicting genes from scratch, it is particularly useful in projects where consistency of annotation across genomes is important, such as comparative genomics, evolutionary studies, and reference database maintenance.

What has been done so far

The transporter pipeline has already been implemented as a set of Nextflow modules and runs as a Nextflow pipeline. The next step would be to convert those modules to nf-core format.

URL to existing work (if applicable)

https://github.com/PaoloRibeca/TheTransporter

Are there any similar existing nf-core pipelines?

viralrecon

Metadata

Status: proposed