Skip to content

Mariacherepkova/GATA6_bulk_ATAC-seq-analyisis

Repository files navigation

GATA6_bulk_ATAC-seq-analyisis

This repository contains the workflow and scripts used for processing bulk ATAC-seq data for the GATA6 perturbation experiment. The pipeline consists of read trimming and QC, reference indexing, read alignment with QuasR/Rbowtie, peak calling, and differential accessibility analysis.

All steps are run on the DMLS cluster.
Environments are managed with micromamba and R project reproducibility is controlled with renv.


Environment Setup

Before starting to set up the environments request the interactive session on cluster. For installing the environments it's better to request more memory and at least 2-3 h. Also, you can use tmux to not loose the interactive session when loosing internet connection.

srun --pty -n 1 -c 2 --time=02:00:00 --mem=32G bash -l

Load micromamba in your shell (add to ~/.bashrc if not already present). Read ScienceCluster manual for more information.

eval "$(micromamba shell hook --shell bash)"

1) Trimming + QC Environment (atac-trim)

micromamba create -n atac-trim -c bioconda -c conda-forge \
  trimmomatic fastqc multiqc pigz openjdk=17 -y

Activate:

micromamba activate atac-trim

Check:

trimmomatic -version
fastqc --version
multiqc --version

The Nextera adapter file used in trimming is automatically provided: ${CONDA_PREFIX}/share/trimmomatic/adapters/NexteraPE-PE.fa

2) Reference Preparation Environment (refprep)

Used in: bowtie_index.sbatch.sh Tools required: bowtie-build + samtools

micromamba create -n refprep -c bioconda -c conda-forge \
  bowtie samtools -y

Activate:

micromamba activate refprep

Check:

which bowtie-build
samtools --version

_Useful commands: _

list installed environments: bash mamba -list env

3) R + QuasR Alignment Environment (renv)

This env provides R and system libraries; R packages are tracked inside the project via renv.

micromamba create -p /home/mchere/data/conda/envs/renv \
  -c conda-forge -c bioconda \
  r-base=4.4 r-devtools r-xml r-data.table r-rcpp r-rlang r-cli \
  bowtie samtools -y

Activate:

micromamba activate /home/mchere/data/conda/envs/renv

This is what worked for Lena:

micromamba create -p /home/lewolff/data/conda/envs/R_envr -c conda-forge -c bioconda r-base=4.4 r-devtools r-xml r-data.table r-rcpp r-rlang r-cli bioconductor-biocparallel bioconductor-biocgenerics bioconductor-genomicranges bioconductor-genomicalignments bioconductor-rsamtools bioconductor-rhtslib bioconductor-rtracklayer bioconductor-bsgenome bioconductor-genomicfeatures bioconductor-variantannotation bioconductor-genomicfiles bowtie samtools -y
micromamba install -p /home/lewolff/data/conda/envs/R_envr -c conda-forge -c bioconda -y xz zlib bzip2 curl make gcc gxx

Inside R (first setup only):

install.packages("BiocManager")
BiocManager::install(c("QuasR","Rbowtie","GenomicRanges","rtracklayer","BiocParallel"))
install.packages("renv")
renv::init()      # or renv::restore() if lockfile already exists
renv::snapshot()

4) MACS3 Environment for peak calling

mamba create -y -n macs3-env -c conda-forge python=3.10 samtools bedtools pip
mamba activate macs3-env
pip install "macs3>=3.0.1"

Directory Structure - NEEDS TO BE UPDATED

GATA6_bulk-ATAC-seq/
├─ scripts/                     # sbatch scripts and helper scripts
│  ├─ run_trimmomatic.sbatch.sh
│  ├─ quasr_align_mm10.sbatch.sh
│  └─ bowtie_index.sbatch.sh
├─ trimmed_fastqs/              # output of trimming
├─ bams/                        # aligned reads
├─ peaks/                       # MACS2 outputs
├─ qc/                          # FastQC / MultiQC reports
├─ R/                           # R analysis scripts
│  ├─ 01_quasr_align.R
│  ├─ 02_count_and_merge.R
│  ├─ 03_deseq2_analysis.R
│  └─ renv.lock
└─ README.md

Workflow Summary

Files location

/shares/domcke.dmls.uzh/external/data/Domcke_bulkATAC-seq

1) Trim reads

sbatch scripts/run_trimmomatic.sbatch.sh

This script also does FastQC before and after trimming:

sbatch run_trimmomatic_fastqc.sbatch.sh 

2) Prepare bowtie index (if needed)

sbatch scripts/bowtie_index.sbatch.sh /path/to/genome.fa

3) Align reads with QuasR/Rbowtie

Edit quasr_samples_full.tsv to list sample names and paired FASTQs. Run:

sbatch scripts/quasr_align_mm10.sbatch.sh

This generates:

proj.rds alignment project

alignment QC report (qAlign_QC_mm10.pdf)

4) Peak calling

5) merging, filtering, DE analysis

Performed in R using Science Apps - scripts will be added later

Specific parameters used

  1. For Trimmomatic:
# ---- Adapters from the environment ----
ADAPTERS="${CONDA_PREFIX}/share/trimmomatic/adapters/NexteraPE-PE.fa"

# ---- Threads ----
THREADS="${SLURM_CPUS_PER_TASK:-8}"

# ---- Run Trimmomatic ----
for f1 in "${INDIR}"/bulk*R1.fastq.gz; do
    [[ -e "$f1" ]] || { echo "No input files matched. Exiting."; exit 1; }

    echo "Processing ${f1}"
    f2="${f1/_R1/_R2}"

    # Safe basenames and stems
    b1="$(basename "$f1")"
    b2="$(basename "$f2")"
    s1="${b1%.fastq.gz}"
    s2="${b2%.fastq.gz}"

    # Output files
    o1="${OUTDIR}/${s1}.trim.fastq.gz"
    u1="${OUTDIR}/${s1}.trim.unpaired.fastq.gz"
    o2="${OUTDIR}/${s2}.trim.fastq.gz"
    u2="${OUTDIR}/${s2}.trim.unpaired.fastq.gz"

    trimmomatic PE -threads "${THREADS}" \
        "$f1" "$f2" \
        "$o1" "$u1" \
        "$o2" "$u2" \
        ILLUMINACLIP:"${ADAPTERS}":2:30:10:1:true \
        TRAILING:3 SLIDINGWINDOW:4:10 MINLEN:20
done
  1. For alignment:
Rscript -e '
library(QuasR)
proj <- qAlign(
  sampleFile = "quasr_samples_full.tsv",
  genome     = "/shares/domcke.dmls.uzh/genomes/mm10/mm10.fa",
  aligner    = "Rbowtie",
  paired     = "fr"
)
qQCReport(proj, pdfFile = "/home/mchere/scratch/GATA6_bulk-ATAC-seq_MCLW/qAlign_QC_mm10.pdf")
'

Genome used:

/shares/domcke.dmls.uzh/genomes/mm10
  1. For peak calling:
# --- MACS3 settings for ATAC-seq (paired-end) ---
# For paired-end ATAC, use -f BAMPE and no shifting (MACS uses fragment spans).
# Genome: mm10 -> use -g mm
QVAL=0.01

# Loop over BAMs and call peaks individually
for BAM in "${BAM_DIR}"/*.bam; do
  # skip index files just in case
  [[ "${BAM}" == *.bam ]] || continue

  SAMPLE="$(basename "${BAM}" .bam)"
  OUTDIR="${OUT_ROOT}/${SAMPLE}"
  mkdir -p "${OUTDIR}"

  echo "[$(date)] Calling peaks for ${SAMPLE}"
  macs3 callpeak \
    -t "${BAM}" \
    -f BAMPE \
    -g mm \
    -n "${SAMPLE}" \
    --outdir "${OUTDIR}" \
    --keep-dup all \
    -q "${QVAL}"

  # Produce a narrowPeak BED sorted and indexed
  if [ -f "${OUTDIR}/${SAMPLE}_peaks.narrowPeak" ]; then
    sort -k1,1 -k2,2n "${OUTDIR}/${SAMPLE}_peaks.narrowPeak" > "${OUTDIR}/${SAMPLE}.narrowPeak.sorted.bed"
    bgzip -f "${OUTDIR}/${SAMPLE}.narrowPeak.sorted.bed"
    tabix -p bed "${OUTDIR}/${SAMPLE}.narrowPeak.sorted.bed.gz"
  fi

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages