This pipeline performs DNA sequencing analysis tailored for cardiology research. It uses Snakemake for workflow management and Conda for environment management.
- Clone the repository:
    git clone https://github.com/yourusername/icc-pipeline.git
    cd icc-pipeline
- Create the Conda environment:
    conda env create -f environment.yml
- Activate the environment:
    conda activate icc_pipeline
To run the pipeline, use the cidna.py script with the required arguments:
    ./cidna.py run workflow/config.yml -i /path/to/input -o /path/to/output -- -c88 --printshellcmds --rerun-incomplete

The pipeline consists of several steps, each managed by Snakemake rules. The main steps include:
- Quality Control: Initial quality checks on raw sequencing data.
- Trimming: Removing adapters and low-quality bases.
- Alignment: Aligning reads to the reference genome.
- Variant Calling: Identifying variants from the aligned reads.
- Variant Filtering: Filtering the identified variants.
- Annotation: Annotating the filtered variants.
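The `./cidna.py run` invocation above places `--` between two groups of options. The source does not show how `cidna.py` handles this, so the following is only a minimal sketch, assuming that everything after `--` is forwarded unchanged to Snakemake. The function names, the argparse layout, and the long option names `--input` and `--output` are hypothetical.

```python
import argparse


def split_cli_args(argv):
    """Split argv at the first "--" into (wrapper_args, passthrough_args)."""
    if "--" in argv:
        idx = argv.index("--")
        return argv[:idx], argv[idx + 1:]
    return list(argv), []


def parse_run_args(argv):
    """Parse the wrapper's own options (hypothetical layout modelled on the
    README command) and return them with the untouched pass-through list."""
    wrapper_args, snakemake_args = split_cli_args(argv)
    parser = argparse.ArgumentParser(prog="cidna.py")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run")
    run.add_argument("config")                          # e.g. workflow/config.yml
    run.add_argument("-i", "--input", required=True)    # long name is assumed
    run.add_argument("-o", "--output", required=True)   # long name is assumed
    return parser.parse_args(wrapper_args), snakemake_args


# Example with the arguments from the README command.
ns, snakemake_args = parse_run_args([
    "run", "workflow/config.yml",
    "-i", "/path/to/input", "-o", "/path/to/output",
    "--", "-c88", "--printshellcmds", "--rerun-incomplete",
])
print(ns.config, ns.input, ns.output)
print(snakemake_args)  # options that would be handed to Snakemake as-is
```

Under that assumption, `-c88` would reach Snakemake as its short form of `--cores 88`, alongside `--printshellcmds` and `--rerun-incomplete`.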
- Rule: raw_fastqc
  - Description: Runs FastQC on raw sequencing data.
  - Input: Raw FASTQ files.
  - Output: FastQC reports.
- Rule: trimming
  - Description: Trims adapters and low-quality bases using Prinseq.
  - Input: Raw FASTQ files.
  - Output: Trimmed FASTQ files.
- Rule: bwa_alignment
  - Description: Aligns trimmed reads to the reference genome using BWA.
  - Input: Trimmed FASTQ files.
  - Output: BAM files.
- Rule: haplotypecaller
  - Description: Calls variants using GATK HaplotypeCaller.
  - Input: Realigned BAM files.
  - Output: VCF files.
- Rule: filter_variants
  - Description: Filters variants using GATK VariantFiltration.
  - Input: VCF files.
  - Output: Filtered VCF files.
- Rule: annotate_variants
  - Description: Annotates variants using VEP.
  - Input: Filtered VCF files.
  - Output: Annotated VCF files.
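Snakemake derives the execution order from each rule's inputs and outputs rather than from the order in which the rules are written. The sketch below mimics that idea on the six rules listed above, using plain labels instead of file patterns. The `RULES` table and `execution_order` function are illustrative only and are not taken from the pipeline's Snakefile. The sketch also assumes that `haplotypecaller` consumes the BAM output of `bwa_alignment` directly, since the separate realignment steps were removed.

```python
# Rules deliberately listed out of order to show that the order is derived
# from inputs and outputs. Labels are simplified stand-ins for file patterns.
RULES = {
    "annotate_variants": {"input": ["filtered_vcf"], "output": ["annotated_vcf"]},
    "filter_variants": {"input": ["vcf"], "output": ["filtered_vcf"]},
    "haplotypecaller": {"input": ["bam"], "output": ["vcf"]},  # assumed input
    "bwa_alignment": {"input": ["trimmed_fastq"], "output": ["bam"]},
    "trimming": {"input": ["raw_fastq"], "output": ["trimmed_fastq"]},
    "raw_fastqc": {"input": ["raw_fastq"], "output": ["fastqc_report"]},
}


def execution_order(rules):
    """Return rule names so that every rule comes after the rules that
    produce its inputs (depth-first traversal; assumes no cycles)."""
    producers = {out: name for name, spec in rules.items() for out in spec["output"]}
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for item in rules[name]["input"]:
            if item in producers:  # raw inputs have no producing rule
                visit(producers[item])
        order.append(name)

    for name in rules:
        visit(name)
    return order


order = execution_order(RULES)
print(order)
```

Note that `raw_fastqc` and `trimming` both depend only on the raw FASTQ files, so neither has to wait for the other.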
The pipeline tracks CPU, memory, and network usage, and generates a detailed report at the end of each run.
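The source does not describe how usage is measured. As a rough illustration of per-step tracking, the sketch below runs a command and records wall-clock time, CPU time, and peak memory of child processes using only the Python standard library (Unix only). Network usage is not covered, and the function name `run_with_usage` is hypothetical. Within Snakemake itself, the `benchmark` directive records comparable per-rule measurements.

```python
import resource
import subprocess
import sys
import time


def run_with_usage(cmd):
    """Run cmd and return its exit code plus basic resource figures."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    completed = subprocess.run(cmd, check=False)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "returncode": completed.returncode,
        "wall_seconds": wall,
        # User plus system CPU time consumed by children during this call.
        "cpu_seconds": (after.ru_utime - before.ru_utime)
        + (after.ru_stime - before.ru_stime),
        # Peak resident set size over all waited-for children so far
        # (kilobytes on Linux, bytes on macOS), not only this command.
        "max_rss": after.ru_maxrss,
    }


# Stand-in for a pipeline step: a trivial Python child process.
usage = run_with_usage([sys.executable, "-c", "print('step done')"])
print(usage)
```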
- Removed IndelRealigner and RealignerTargetCreator because GATK4 HaplotypeCaller reassembles reads locally in active regions, so a separate realignment step is no longer needed.
- Replaced samtools with sambamba, keeping the same filters as the original command.
- Removed gatk CallableLoci and kept gatk DepthOfCoverage. Both are coverage-based, so a single step can cover both with the right configuration. The following arguments still need to be added:
    --omitDepthOutputAtEachBase false
    --includeDeletions true
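To illustrate why per-base depth output can stand in for a separate callable-loci step, here is a minimal sketch that collapses per-base depths into intervals meeting a minimum depth. It works on plain `(contig, position, depth)` tuples rather than parsing DepthOfCoverage output, and the `min_depth` default of 10 is an arbitrary assumption, not a value from this pipeline. It does not reproduce the other criteria CallableLoci applies, such as mapping-quality checks.

```python
def callable_intervals(depths, min_depth=10):
    """Merge consecutive positions with depth >= min_depth into
    (contig, start, end) tuples.

    `depths` must be sorted by contig and position; coordinates are
    1-based and inclusive.
    """
    intervals = []
    current = None  # [contig, start, end] of the interval being extended
    for contig, pos, depth in depths:
        if depth >= min_depth:
            if current and current[0] == contig and pos == current[2] + 1:
                current[2] = pos  # extend the open interval by one base
            else:
                if current:
                    intervals.append(tuple(current))
                current = [contig, pos, pos]  # start a new interval
        elif current:
            # Depth dropped below the threshold: close the open interval.
            intervals.append(tuple(current))
            current = None
    if current:
        intervals.append(tuple(current))
    return intervals


# Made-up depths for illustration only.
example_depths = [
    ("chr1", 1, 12), ("chr1", 2, 15), ("chr1", 3, 4),
    ("chr1", 4, 20), ("chr2", 1, 30),
]
print(callable_intervals(example_depths))
```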