Description
Pipeline title/name
swgsrelate
Keywords
genomics, relatedness, lcWGS, variant calling, GATK, bcftools, BQSR, population genetics, non-model organisms
What is it about?
The pipeline provides a complete workflow for estimating pairwise relatedness in diploid eukaryotic organisms using low-coverage whole-genome sequencing (lcWGS) data.
It performs read preprocessing, base quality score recalibration (BQSR), variant calling (using both GATK and bcftools), and multiple complementary relatedness estimation approaches.
Originally developed for wild guineafowl populations, the workflow generalizes to any diploid species that has a reference genome but lacks high-confidence variant resources. The analytical logic follows the general approach described in Snyder-Mackler et al. (2016), adapted for lcWGS data in non-model organisms.
Please provide a schematic diagram of the proposed pipeline
A schematic overview will be added once the new DSL2 workflow structure is finalized.
At present, the pipeline conceptually consists of four modular stages: (1) preprocessing, (2) base quality score recalibration (BQSR), (3) variant calling, and (4) relatedness estimation.
I confirm my proposed pipeline will follow nf-core guidelines. Most importantly, my pipeline will:
- be built with Nextflow.
- pass nf-core lint tests and use standardized parameters.
- be community-owned and developed within the nf-core organization.
- be open source under the MIT license with proper credits and acknowledgments.
- have a descriptive, all-lowercase name without punctuation.
- use the nf-core pipeline template and predominantly use official nf-core modules.
- focus on a specific data/analysis type with appropriate scope.
- have properly maintained documentation.
- be bundled using versioned Docker/Singularity containers.
Why do we need a new pipeline?
An older, non–nf-core, DSL1 version of this workflow already exists, but:
- It predates nf-core and modern Nextflow DSL2 module/subworkflow standards.
- It uses a monolithic structure that prevents modular reuse and containerized reproducibility.
- It was licensed under GPLv3, which prevents its code from being integrated into nf-core, where pipelines are MIT-licensed.
- Nearly all of its code would have to be rewritten from scratch to meet nf-core style and configuration standards.
The new pipeline will therefore be a complete reimplementation, retaining only the workflow logic while adopting the nf-core structure and community standards.
Who would be interested?
- Population and evolutionary biologists working with low-coverage WGS data on non-model species
- Research groups estimating kinship and relatedness in field-collected samples
- Core sequencing and bioinformatics facilities supporting such projects
- Comparative genomics researchers using GATK, bcftools, ANGSD, and related tools in lcWGS contexts
What has been done so far
- A fresh nf-core template repository (DSL2, MIT) has been initialized.
- Early modularization of the preprocessing and variant-calling stages is in progress.
- The previous GPLv3-based DSL1 workflow has been reviewed for its conceptual structure, but none of its code will be reused.
URL to existing work (if applicable)
https://github.com/GiselaHKopp/sWGS_Relatedness_Pipeline_v1
Are there any similar existing nf-core pipelines?
No response