Description
Pipeline title/name
musa (multi-source variant annotation)
Keywords
clinical genomics, variant annotation, ACMG classification, FAIR data
What is it about?
MuSA (Multi-Source variant Annotation) is a Nextflow pipeline designed for large-scale, multi-tool variant annotation and interpretation in clinical and research genomics.
It automates the integration of multiple annotation engines and databases to produce standardized, high-quality variant annotations, ACMG-based classifications, and VUS reclassification in a fully reproducible and scalable fashion.
MuSA bridges the gap between raw variant calls and clinical interpretation by harmonizing the outputs of different tools such as VEP, ANNOVAR, Genebe, and RENOVO into unified, standardized MAF outputs.
Its modular design supports diverse data sources and reproducible database updates, streamlining the entire variant interpretation workflow.
Please provide a schematic diagram of the proposed pipeline
I confirm my proposed pipeline will follow nf-core guidelines. Most importantly, my pipeline will:
- be built with Nextflow.
- pass nf-core lint tests and use standardized parameters.
- be community-owned and developed within the nf-core organization.
- open source under the MIT license with proper credits and acknowledgments.
- have a descriptive, all lowercase, and without punctuation name.
- use the nf-core pipeline template and predominantly use official nf-core modules.
- focus on a specific data/analysis type with appropriate scope.
- have properly maintained documentation.
- be bundled using versioned Docker/Singularity containers.
Why do we need a new pipeline?
Variant interpretation remains a major bottleneck in clinical genomics. Existing online annotation tools often process one variant at a time, while local tools such as VEP, ANNOVAR, Genebe, and RENOVO require extensive manual setup, making large-scale or reproducible analyses difficult.
Current workflows lack integration, resulting in fragmented outputs and inconsistent ACMG classification strategies.
MuSA addresses these needs by:
- Combining complementary annotation tools (VEP, ANNOVAR, Genebe, RENOVO) into a unified, automated pipeline.
- Enabling standardized, multi-sample outputs in MAF format compatible with downstream tools like maftools.
- Providing automated ACMG classification with rule-level evidence and customizable reclassification using machine learning.
- Supporting flexible integration of patient metadata (HPO, relation to proband) for phenotype-driven variant prioritization.
- Managing database downloads automatically for full reproducibility.
MuSA therefore provides a single, modular, FAIR-compliant workflow for scalable variant interpretation, a key unmet need in the clinical genomics community.
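To illustrate the harmonization idea, here is a minimal sketch in plain Python (not MuSA's actual Nextflow code) of merging per-tool annotations into one row per variant. The record layout, the field names, and the `merge_annotations` helper are all assumptions made for this example; they are not taken from the real VEP, ANNOVAR, Genebe, or RENOVO outputs.

```python
from collections import defaultdict

# Fields that identify a variant; everything else is tool-specific.
KEY_FIELDS = ("chrom", "pos", "ref", "alt")


def variant_key(rec):
    """Return the tuple that identifies a variant across tools."""
    return tuple(rec[f] for f in KEY_FIELDS)


def merge_annotations(per_tool):
    """Merge {tool_name: [records]} into one row per variant.

    Tool-specific fields are prefixed with the tool name so that columns
    coming from different tools never overwrite each other.
    """
    merged = defaultdict(dict)
    for tool, records in per_tool.items():
        for rec in records:
            key = variant_key(rec)
            row = merged[key]
            row.update(zip(KEY_FIELDS, key))
            for field, value in rec.items():
                if field not in KEY_FIELDS:
                    row[f"{tool}_{field}"] = value
    # Sort by key so the output order is deterministic.
    return [merged[k] for k in sorted(merged)]


# Hypothetical input: illustrative values only.
per_tool = {
    "vep": [
        {"chrom": "1", "pos": 100, "ref": "A", "alt": "G",
         "consequence": "missense_variant"},
    ],
    "annovar": [
        {"chrom": "1", "pos": 100, "ref": "A", "alt": "G", "gene": "GENE1"},
        {"chrom": "2", "pos": 50, "ref": "C", "alt": "T", "gene": "GENE2"},
    ],
}
rows = merge_annotations(per_tool)
```

Prefixing each column with its source tool is one possible design choice: it keeps provenance visible and lets a variant seen by only some tools still appear in the unified table with the remaining columns absent.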
To our knowledge, there is no nf-core pipeline specifically dedicated to comprehensive, multi-source variant annotation and ACMG-based interpretation.
While nf-core/sarek is an excellent, widely used workflow for variant calling and upstream processing (from FASTQ to VCF), its annotation component is relatively general and not designed for deep, multi-database interpretation.
MuSA, in contrast, is specifically built for annotation and interpretation, integrating dozens of external databases, over 40 VEP plugins, and dedicated modules for ACMG classification, variant prioritization, and VUS reclassification.
MuSA is therefore complementary to sarek and to many other "all in one" variant calling pipelines: it focuses exclusively on the high-resolution, clinical-grade annotation layer that follows variant calling. We believe its scope, depth, and specialized outputs go well beyond what general-purpose variant calling pipelines are intended to provide, making it a robust standalone solution for downstream interpretation.
Who would be interested?
- Clinical genomics and molecular diagnostics researchers
- Bioinformaticians performing germline or somatic variant interpretation
- Institutions aiming to standardize annotation workflows across projects
- Developers and users of ACMG classification and variant reclassification tools
What has been done so far
A complete germline module (v1.0) is implemented and tested, integrating VEP, ANNOVAR, Genebe, and RENOVO.
The pipeline currently produces:
- ACMG-focused MAF file highlighting clinically relevant variants.
- Phenotype-based filtered MAF file, when HPO terms are provided.
- Raw MAF file with complete annotations (up to 700 columns).
- Interactive HTML report summarizing results and including maftools plots and tables.
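As a rough illustration of the phenotype-based filtered output listed above, the sketch below keeps only rows whose gene is linked to at least one of the patient's HPO terms. It is plain Python written for this example, not MuSA's implementation: the `filter_by_phenotype` helper, the `gene` column name, the gene symbols, the HPO identifiers, and the gene-to-HPO mapping are all placeholders.

```python
def filter_by_phenotype(rows, patient_hpo, gene_to_hpo):
    """Keep rows whose gene shares at least one HPO term with the patient.

    Returns None when no HPO terms are supplied, mirroring the idea that
    the phenotype-filtered file is only produced when terms are provided.
    """
    if not patient_hpo:
        return None
    wanted = set(patient_hpo)
    return [
        row for row in rows
        if wanted & gene_to_hpo.get(row["gene"], set())
    ]


# Hypothetical annotated rows and gene-to-HPO mapping (placeholder values).
annotated = [
    {"gene": "GENE1", "variant": "1:100:A>G"},
    {"gene": "GENE2", "variant": "2:50:C>T"},
    {"gene": "GENE3", "variant": "3:75:G>A"},
]
gene_to_hpo = {
    "GENE1": {"HP:0000001", "HP:0000002"},
    "GENE2": {"HP:0000003"},
}

filtered = filter_by_phenotype(annotated, ["HP:0000002"], gene_to_hpo)
skipped = filter_by_phenotype(annotated, [], gene_to_hpo)
```

Genes with no entry in the mapping are dropped rather than kept, which is a conservative choice for this sketch; a real workflow might instead flag them for review.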
A setup workflow for automated database downloads is already implemented. A somatic (cancer) module (v2.0) is planned and will expand MuSA’s scope to oncology applications. Note: some databases require users to provide their own free license or API access credentials prior to download.
URL to existing work (if applicable)
No response
Are there any similar existing nf-core pipelines?
sarek