JackSherry6/TFBindingMap

ChIP-seq Analysis Pipeline

A reproducible, modular ChIP-seq analysis pipeline for studying transcription factor binding, built with Nextflow. It covers alignment, QC, peak calling, annotation, motif analysis, and signal-based visualization. The workflow automates the processing of multiple replicates using user-provided metadata and reference files.

Table of Contents

  1. Features
  2. Workflow Visualization
  3. Requirements
  4. Installation
  5. Usage
  6. Configuration
  7. Example Output Figures
  8. Contributing

Features

  • Modular Nextflow pipeline with clear separation of steps:
    • Read QC (FastQC + MultiQC)
    • Adapter trimming
    • Alignment with Bowtie2
    • Sorted BAM generation and indexing
    • Coverage track generation (BigWig)
    • Multi-sample correlation and signal profiling
    • Peak calling (HOMER)
    • Peak intersection across replicates
    • Blacklist removal
    • Peak annotation
    • Motif analysis
  • Docker/Singularity container support for reproducibility
  • Automatic logging and error handling
  • Scalable to large ChIP-seq datasets
  • Supports both BU SCC and AWS Batch execution
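
To make the "peak intersection across replicates" and "blacklist removal" steps concrete, here is a minimal Python sketch of the underlying interval logic. It is not the pipeline's implementation, which is not shown in this README; the function names, the tuple layout, and the example coordinates are all assumptions made for illustration. The nested loop is quadratic, so it only suits small examples.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, start, end), half-open as in BED.
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portions of peaks found in both replicates."""
    shared = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # keep only non-empty overlaps
                shared.append((chrom_a, start, end))
    return sorted(shared)


def remove_blacklisted(peaks: List[Interval], blacklist: List[Interval]) -> List[Interval]:
    """Drop every peak that overlaps any blacklisted region."""
    return [
        (chrom, start, end)
        for chrom, start, end in peaks
        if not any(chrom == b_chrom and start < b_end and b_start < end
                   for b_chrom, b_start, b_end in blacklist)
    ]


# Made-up coordinates, for illustration only.
rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
rep2 = [("chr1", 150, 250), ("chr2", 60, 100), ("chr3", 10, 20)]
reproducible = intersect(rep1, rep2)
filtered = remove_blacklisted(reproducible, [("chr2", 0, 80)])
print(filtered)
```

Intersecting before filtering means only peaks seen in both replicates are checked against the blacklist. This mirrors the order of the feature list above, though the pipeline's actual step order is an assumption here.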

Workflow Visualization

Mermaid-plot

Requirements

  • A conda environment with Nextflow installed is required to run the pipeline
  • The required modules are already installed on the BU Shared Computing Cluster (SCC)
  • If using AWS, see the envs file for all packages to install
  • If not using the BU SCC, see the envs directory for software and version information
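
Before launching a run, it can help to confirm that the required executables are visible in the active environment. The Python sketch below is one way to do that. The helper name `missing_tools` is hypothetical, and the only tool checked is `nextflow`, since this README does not list the others (see the envs directory for those).

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    # Only nextflow is named in the requirements; extend the list as needed.
    missing = missing_tools(["nextflow"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

The `which` parameter defaults to `shutil.which` and exists only so the lookup can be swapped out for testing.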

Installation

Usage

Basic execution:

  • module load miniconda
  • conda activate <name_of_your_nextflow_conda_env>
  • Add your samples to the samplesheet in the format specified in the CSV file
  • Set all params in the config files to the locations of your files
  • nextflow run main.nf -profile conda,cluster (Waxman lab users should always run on the cluster; if using AWS, substitute aws for cluster)
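
The launch command above differs only in the second profile name, so it can be wrapped in a small helper when runs are scripted. This is a sketch rather than part of the pipeline. The function `build_command` is hypothetical, and it assumes the only valid executor profiles are the two mentioned above, `cluster` and `aws`.

```python
from typing import List

# Assumption: these are the only executor profiles, as named in the usage notes.
VALID_EXECUTORS = {"cluster", "aws"}


def build_command(executor: str = "cluster", script: str = "main.nf") -> List[str]:
    """Build the nextflow invocation for the chosen executor profile."""
    if executor not in VALID_EXECUTORS:
        raise ValueError(f"unknown executor profile: {executor!r}")
    return ["nextflow", "run", script, "-profile", f"conda,{executor}"]


print(" ".join(build_command()))
print(" ".join(build_command("aws")))
```

To actually start a run, the returned list could be passed to `subprocess.run(build_command(), check=True)` from within the activated conda environment.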

Configuration

  • Edit the nextflow.config file to:
    • Set input paths (reads, GTF, blacklist, etc.)
    • Adjust queueSize based on the number of samples
    • Optionally set resume = true to continue interrupted runs
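
Since queueSize is tuned to the number of samples, a quick count of samplesheet rows can inform that choice. The Python sketch below assumes the samplesheet is a CSV whose first row is a header. The column names in the example are invented, because the real format is defined by the CSV file in the repository. The sketch does not prescribe how to derive queueSize from the count.

```python
import csv
import io
from typing import TextIO


def count_samples(handle: TextIO) -> int:
    """Count non-empty data rows, assuming the first row is a header."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row if present
    return sum(1 for row in reader if any(cell.strip() for cell in row))


# Hypothetical samplesheet contents; the real columns may differ.
example = io.StringIO(
    "name,path\n"
    "rep1,/path/to/rep1.fastq.gz\n"
    "rep2,/path/to/rep2.fastq.gz\n"
    "\n"
)
print(count_samples(example))
```

For a file on disk, the same function can be called as `count_samples(open("samplesheet.csv", newline=""))`, where the file name is again an assumption.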

Example Output Figures

Sample heatmap

Example Binding Profile Plot

Post processing example figures (not generated by pipeline directly):

NEAT1 UCSC track

RunX1 general binding locations

Bound gene enrichment terms

Contributing

  • Email me at jgsherry@bu.edu for additional information or to discuss contributing
