ICC Pipeline

The ICC Pipeline analyzes DNA sequencing data for cardiology research. It uses Snakemake for workflow management and Conda for environment management.

Installation

Clone the repository:

```
git clone https://github.com/yourusername/icc-pipeline.git
cd icc-pipeline
```

Create the Conda environment:
```
conda env create -f environment.yml
```
Activate the environment:
```
conda activate icc_pipeline
```

Usage

To run the pipeline, use the cidna.py script with the required arguments:

```
./cidna.py run workflow/config.yml -i /path/to/input -o /path/to/output -- -c88 --printshellcmds --rerun-incomplete
```
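
The following is a minimal sketch of how a wrapper script could turn arguments like those above into a Snakemake invocation. It is not the actual cidna.py implementation. The function name `build_snakemake_command` and the config keys `input_dir` and `output_dir` are hypothetical; only the Snakemake options `--configfile`, `--config`, `-c`, `--printshellcmds`, and `--rerun-incomplete` are real.

```python
import shlex


def build_snakemake_command(config, input_dir, output_dir, extra_args=()):
    """Assemble a Snakemake command line as a list of arguments."""
    return [
        "snakemake",
        # Options forwarded unchanged, e.g. everything given after "--".
        *extra_args,
        "--configfile", str(config),
        # Hypothetical config keys; the real pipeline may name them differently.
        "--config", f"input_dir={input_dir}", f"output_dir={output_dir}",
    ]


cmd = build_snakemake_command(
    "workflow/config.yml",
    "/path/to/input",
    "/path/to/output",
    extra_args=["-c88", "--printshellcmds", "--rerun-incomplete"],
)
# Print the command as a single shell-quoted string for inspection.
print(shlex.join(cmd))
```

Building the command as a list rather than a string avoids quoting problems if it is later passed to `subprocess.run`.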

Workflow Overview

The pipeline consists of several steps, each managed by Snakemake rules. The main steps include:

Quality Control: Initial quality checks on raw sequencing data.
Trimming: Removing adapters and low-quality bases.
Alignment: Aligning reads to the reference genome.
Variant Calling: Identifying variants from the aligned reads.
Variant Filtering: Filtering the identified variants.
Annotation: Annotating the filtered variants.

Rules

Quality Control

Rule: raw_fastqc
Description: Runs FastQC on raw sequencing data.
Input: Raw FASTQ files.
Output: FastQC reports.

Trimming

Rule: trimming
Description: Trims adapters and low-quality bases using Prinseq.
Input: Raw FASTQ files.
Output: Trimmed FASTQ files.

Alignment

Rule: bwa_alignment
Description: Aligns trimmed reads to the reference genome using BWA.
Input: Trimmed FASTQ files.
Output: BAM files.

Variant Calling

Rule: haplotypecaller
Description: Calls variants using GATK HaplotypeCaller.
Input: Aligned BAM files.
Output: VCF files.

Variant Filtering

Rule: filter_variants
Description: Filters variants using GATK VariantFiltration.
Input: VCF files.
Output: Filtered VCF files.

Annotation

Rule: annotate_variants
Description: Annotates variants using VEP.
Input: Filtered VCF files.
Output: Annotated VCF files.
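
The rules above form a dependency chain through their inputs and outputs. The sketch below illustrates that ordering with Python's standard `graphlib` module. The dependency map is inferred from the Input and Output descriptions in this README, not read from the Snakefile, so the real workflow may contain intermediate rules that are not shown here.

```python
from graphlib import TopologicalSorter

# Rule name -> rules whose output it consumes.
# Inferred from the descriptions above; an assumption, not the real Snakefile.
RULE_DEPENDENCIES = {
    "raw_fastqc": set(),          # reads raw FASTQ files directly
    "trimming": set(),            # reads raw FASTQ files directly
    "bwa_alignment": {"trimming"},
    "haplotypecaller": {"bwa_alignment"},
    "filter_variants": {"haplotypecaller"},
    "annotate_variants": {"filter_variants"},
}


def execution_order(dependencies):
    """Return one valid order in which the rules could run."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(RULE_DEPENDENCIES)
print(order)
```

Snakemake resolves this order itself from file names. The sketch only makes the implied structure explicit, including the fact that `raw_fastqc` does not block any downstream rule.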

Resource Tracking

The pipeline tracks resource usage, including CPU, memory, and network usage. A detailed report is generated at the end of the pipeline run.
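
As an illustration of the kind of measurement involved, the sketch below records wall time, CPU time, and peak memory for a single child process using only the Python standard library. It is an assumption-laden example, not the pipeline's actual tracking mechanism. The helper name `run_and_measure` is hypothetical, the `resource` module is available on Unix only, and network usage is not covered.

```python
import resource
import subprocess
import sys
import time


def run_and_measure(cmd):
    """Run a command and return basic resource usage for it."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    completed = subprocess.run(cmd, check=False)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "returncode": completed.returncode,
        "wall_seconds": wall,
        # User plus system CPU time added by children since the first snapshot.
        "cpu_seconds": (after.ru_utime - before.ru_utime)
        + (after.ru_stime - before.ru_stime),
        # Peak resident set size over all waited-for children so far
        # (kilobytes on Linux, bytes on macOS).
        "max_rss": after.ru_maxrss,
    }


stats = run_and_measure([sys.executable, "-c", "sum(range(10**6))"])
print(stats)
```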

Inhouse vs new pipeline changelog

Removed IndelRealigner and RealignerTargetCreator because GATK4 HaplotypeCaller realigns reads on the fly.
Replaced samtools with sambamba, using the same filters as the original command.
Removed GATK CallableLoci and kept GATK DepthOfCoverage. Both are coverage-based, so a single step covers both with the right configuration. The following arguments still need to be added: --omitDepthOutputAtEachBase false --includeDeletions true
