A complete, reproducible RNA-Seq differential expression analysis pipeline with an interactive Shiny dashboard. Built by independently replicating Shemer et al., Immunity 2020 from raw sequencing data.
⚠️ First load takes ~2 minutes — DESeq2 is running live on real data.
What happens to microglia when they lose the ability to sense IL-10 after an immune challenge — and which genes drive that failure?
More precisely: which genes fail to return to baseline in IL-10R deficient (Mutant) microglia compared to controls at 48h post-LPS — the point of peak hyperactivation described in Figure 3E of the paper?
| Dataset | GSE157234 — Mouse microglia, 48h after peripheral LPS challenge |
| Comparison | IL-10R Mutant (deficient) vs Control (intact signalling) |
| Key finding | Without IL-10 signalling, microglia hyperactivate and overproduce TNF, causing neuronal damage |
| Paper | Shemer et al., Immunity 53, 1033–1049, 2020 |
| DOI | 10.1016/j.immuni.2020.09.018 |
The complete V2 workflow from raw sequencing reads to differential expression results:
flowchart TD
A[📥 Download raw FASTQs from SRA — 9 samples via Galaxy] --> B
B[🔍 FastQC + MultiQC\nPer-sample quality assessment, Check read quality, adapter content, duplication] --> C
C[✂️ Trimmomatic, Adapter trimming + quality filtering, Remove low-quality bases and reads] --> D
D[🧬 HISAT2 Alignment, Map trimmed reads to mm10 genome, NCBI RefSeq annotation] --> E
E{✅ Check Mapping Rate, Acceptable: > 80% per sample} --> F
F[📊 Post-Alignment QC, Samtools flagstat, Verify alignment statistics] --> G
G[🔢 featureCounts, Count reads per gene, All exons, Entrez Gene IDs] --> H
H[📈 DESeq2 Differential Expression, Normalization → Dispersion → GLM → Wald test\napeglm LFC shrinkage] --> I
I[🖼️ Visualisation, Volcano Plot · PCA Plot · Heatmap, Shiny Interactive Dashboard]
style A fill:#3498db,color:#fff
style B fill:#9b59b6,color:#fff
style C fill:#9b59b6,color:#fff
style D fill:#27ae60,color:#fff
style E fill:#f39c12,color:#fff
style F fill:#27ae60,color:#fff
style G fill:#27ae60,color:#fff
style H fill:#e74c3c,color:#fff
style I fill:#e74c3c,color:#fff
| Feature | Description |
|---|---|
| 🌋 Volcano Plot | Interactive — hover any gene, adjust padj and LFC thresholds live |
| 🔵 PCA Plot | Sample clustering — confirms Mutant vs Control separation at 48h |
| 🟥 Heatmap | Top N DEGs with z-scored expression, adjustable gene count |
| 📋 Results Table | Searchable, filterable DEG table with CSV download |
| 📤 Upload Your Data | Upload your own count matrix + metadata to reuse the full pipeline |
| ⬇️ Downloads | PNG, PDF, and CSV exports for all plots and results |
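The z-scoring used for the heatmap can be sketched in base R. This is a minimal illustration on a toy matrix, assuming rows are genes and columns are samples; `zscore_rows` is a hypothetical helper name, not a function taken from the project scripts.

```r
# Row-wise z-score: centre each gene on its mean and divide by its standard deviation.
zscore_rows <- function(mat) {
  z <- t(scale(t(mat)))   # scale() works column-wise, so transpose before and after
  z[is.nan(z)] <- 0       # genes with zero variance would be NaN; show them as 0
  z
}

# Toy expression values (not real data).
toy <- matrix(c( 10, 12, 14,
                  5,  5,  5,
                100, 50,  0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2", "s3")))

z <- zscore_rows(toy)
print(round(z, 2))
```

After this transformation every gene is on the same scale, so colour in the heatmap reflects relative change across samples rather than absolute expression level.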
rna-seq-shiny-pipeline/
│
├── README.md ← This file
├── .gitignore
├── LICENSE ← MIT License
│
├── files/ ← Analysis scripts (4 items)
│ ├── analysis.final.R ← V1 pipeline (UTAP-normalized input)
│ ├── analysis_v2.R ← V2 pipeline (true raw counts) ← USE THIS
│ ├── app.R ← V1 Shiny app
│ └── app_v2.R ← V2 Shiny app ← USE THIS
│
├── data/ ← Data files (2 items)
│ ├── v2/ ← V2 processed data
│ │ ├── count_matrix_raw_v2.csv ← True raw counts (9 samples)
│ │ └── metadata_v2.csv ← Sample metadata (condition assignments)
│ └── raw_counts_featurecounts.tabular ← Galaxy featureCounts output
│
│
├── results/
│ └── v2/
│ ├── DESeq2_results_v2_Mutant_vs_Control.csv
│ ├── top100_upregulated_v2.csv
│ ├── top100_downregulated_v2.csv
│ ├── session_info_v2.txt ← R environment record
│ ├── dds_object_v2.rds ← Pre-computed DESeq2 object*
│ ├── vsd_object_v2.rds ← VST object*
│ └── res_df_v2.rds ← Annotated results dataframe*
│
├── plots/
│ └── v2/
│ ├── volcano_plot_v2.png / .pdf
│ ├── pca_plot_v2.png / .pdf
│ └── heatmap_top50_DEGs_v2.png / .pdf
│
└── deploy/ ← Posit Cloud deployment (3 items)
├── app.R ← V1 deployment app
├── app_v2.R ← V2 deployment app
└── manifest_v2.json ← Auto-generated by rsconnect
*RDS files are excluded from GitHub via .gitignore; regenerate them by running analysis_v2.R Block 13.
git clone https://github.com/mdabrarfaiyaj/rna-seq-shiny-pipeline.git
cd rna-seq-shiny-pipeline

File → Open Project → select rna-seq-shiny-pipeline.Rproj
⚠️ Always open the project via its .Rproj file. This sets the working directory correctly so all relative paths work on any machine.
# Place your Galaxy featureCounts tabular file in data/
# Rename it to: raw_counts_featurecounts.tabular
cp "Galaxy363-[Column_join].tabular" data/raw_counts_featurecounts.tabular

install.packages("BiocManager")
BiocManager::install(c("DESeq2", "apeglm", "org.Mm.eg.db", "AnnotationDbi"))
install.packages(c("shiny", "shinydashboard", "ggplot2", "ggrepel",
"pheatmap", "dplyr", "RColorBrewer", "plotly",
"DT", "viridis"))

source("files/analysis_v2.R")

shiny::runApp("files/app_v2.R", launch.browser = TRUE)

| Step | Tool | Details |
|---|---|---|
| Raw data source | SRA | FASTQs for 9 samples (6 Control, 3 Mutant) |
| Quality control | FastQC + MultiQC (via Galaxy) | Per-sample read quality assessment |
| Trimming | Trimmomatic (via Galaxy) | Adapter removal, quality filtering |
| Alignment | HISAT2 (via Galaxy) | mm10, NCBI RefSeq annotation |
| Mapping QC | Samtools flagstat | Post-alignment statistics |
| Quantification | featureCounts (via Galaxy) | All exons, Entrez Gene IDs |
| Gene ID mapping | org.Mm.eg.db | Entrez IDs → gene symbols |
| Sample subset | All 9 samples | All are 48h post-LPS |
| Excluded | None | DKO samples not present in this SRA subset |
| Differential expression | DESeq2 | design = ~ condition |
| LFC shrinkage | apeglm | Modern replacement for deprecated betaPrior=TRUE |
| Low-count filter | DESeq2 | ≥10 counts in ≥2 samples |
| Significance | DESeq2 | padj < 0.05 AND \|log2FC\| > 1 |
| Transformation | DESeq2 VST | blind=FALSE, for PCA and heatmap |
| Visualisation | ggplot2, pheatmap, plotly | Volcano, PCA, Heatmap |
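The DESeq2 steps in the table can be sketched as a single function. This is an illustrative outline, not the contents of analysis_v2.R: the function name `run_deseq2_sketch` and its arguments are hypothetical, `counts` is assumed to be an integer matrix with genes in rows and samples in columns, and `metadata` is assumed to be a data frame whose row names match the count columns and which has a `condition` column containing Control and Mutant.

```r
# Sketch only: wraps the DESeq2 calls listed in the methods table in one function.
# The DESeq2 and apeglm packages are needed when the function is actually called.
run_deseq2_sketch <- function(counts, metadata) {
  # Control is the reference level, so log2 fold changes read as Mutant vs Control.
  metadata$condition <- factor(metadata$condition, levels = c("Control", "Mutant"))

  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData   = metadata,
                                        design    = ~ condition)

  # Low-count filter: keep genes with at least 10 counts in at least 2 samples.
  keep <- rowSums(DESeq2::counts(dds) >= 10) >= 2
  dds  <- dds[keep, ]

  set.seed(123)
  dds <- DESeq2::DESeq(dds)   # size factors, dispersions, GLM fit, Wald test

  # apeglm shrinkage of the Mutant vs Control coefficient.
  res <- DESeq2::lfcShrink(dds, coef = "condition_Mutant_vs_Control", type = "apeglm")

  # Variance-stabilised values for PCA and heatmaps.
  vsd <- DESeq2::vst(dds, blind = FALSE)

  list(dds = dds, res = res, vsd = vsd)
}
```

The coefficient name passed to `lfcShrink()` follows from the factor levels set at the top of the function; `DESeq2::resultsNames(dds)` lists the names available for a fitted object.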
V1 downloaded UTAP-normalized counts from GEO (GSE157234) — the only file publicly available at the time. UTAP is the Weizmann Institute's transcriptome pipeline; its output is DESeq2's own size-factor normalized counts. Feeding these back into a new DESeq2 run caused double-normalization, explaining the DEG count difference from the paper. This was a data availability constraint, not a pipeline design error, and was documented transparently in V1.
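Why already-normalized counts are a poor input can be illustrated with a toy example. The sketch below re-implements the median-of-ratios size-factor idea in base R; `size_factors` is a hypothetical helper rather than DESeq2's own function, and the counts are simulated. It shows that counts divided by their size factors are no longer integers, and that a second size-factor estimate comes out identical for every sample, so the original sequencing-depth information is no longer visible to the model.

```r
# Median-of-ratios size factors, re-implemented in base R for illustration only.
size_factors <- function(counts) {
  log_geo_means <- rowMeans(log(counts))   # per-gene log geometric mean
  keep <- is.finite(log_geo_means)         # drop genes with a zero in any sample
  apply(counts, 2, function(col) {
    exp(median(log(col[keep]) - log_geo_means[keep]))
  })
}

set.seed(123)
depth <- c(0.5, 1, 2)                          # toy relative sequencing depths
base  <- rgamma(200, shape = 2, scale = 50)    # toy per-gene expression levels
raw   <- sapply(depth, function(d) rpois(200, base * d))

sf_raw     <- size_factors(raw)                # reflects the depth differences
normalized <- sweep(raw, 2, sf_raw, "/")       # what a normalized export contains
sf_again   <- size_factors(normalized)         # second round of normalization

print(round(sf_raw, 3))
print(round(sf_again, 3))
print(any(normalized != round(normalized)))    # normalized values are not integers
```

The toy does not reproduce the DEG differences reported for V1; it only shows the mechanism by which a normalized matrix differs from the raw integer counts DESeq2 expects.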
| Direction | Genes | Key markers |
|---|---|---|
| ⬆️ Up in Mutant | 621 | Tnf, Ccl5, Il12b, Il6, Il1b |
| ⬇️ Down in Mutant | 976 | P2ry12, Sall1, Tmem119, Il10ra |
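The Up and Down counts come from applying the significance thresholds (padj < 0.05 and |log2FC| > 1) to the DESeq2 results. A minimal base R sketch of that classification follows; `classify_deg` is a hypothetical helper, the column names `log2FoldChange` and `padj` follow DESeq2's results table, and the toy rows are invented for illustration.

```r
# Label each gene as Up, Down, or NS (not significant) from padj and log2 fold change.
classify_deg <- function(res, padj_cutoff = 0.05, lfc_cutoff = 1) {
  lfc <- res$log2FoldChange
  # Genes with NA padj (for example after independent filtering) are treated as NS.
  sig <- !is.na(res$padj) & !is.na(lfc) & res$padj < padj_cutoff
  direction <- rep("NS", nrow(res))
  direction[sig & lfc >  lfc_cutoff] <- "Up"
  direction[sig & lfc < -lfc_cutoff] <- "Down"
  factor(direction, levels = c("Up", "Down", "NS"))
}

# Toy results table (placeholder gene names, not real output).
toy <- data.frame(
  gene           = c("g1", "g2", "g3", "g4", "g5"),
  log2FoldChange = c(2.5, -1.8, 0.4, 3.0, -2.2),
  padj           = c(0.001, 0.01, 0.0001, 0.20, NA)
)
toy$direction <- classify_deg(toy)
print(table(toy$direction))
```

Both conditions must hold: g3 is highly significant but changes too little, and g4 changes a lot but is not significant, so both stay NS.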
| Version | Input | Up | Down | Total |
|---|---|---|---|---|
| Paper (Fig 3E) | UTAP raw counts (internal) | 954 | 693 | 1647 |
| V1 (this project) | UTAP-normalized (GEO) | 669 | 894 | 1563 |
| V2 (this project) | True raw counts (SRA) | 621 | 976 | 1597 |
Three documented reasons — none indicate incorrect analysis:
- Annotation: Paper used Gencode vM10 with MARS-seq 3'UTR counting window (1000bp upstream of 3'end). V2 uses NCBI RefSeq counting all exons — a fundamentally different quantification strategy.
- LFC shrinkage: Paper used deprecated betaPrior=TRUE. V2 uses modern apeglm shrinkage, the correct current approach.
- Samples: Paper may have used additional samples not present in the SRA deposit for this comparison.
Key biological markers are confirmed in the correct direction in V2:
- Tnf ↑, Ccl5 ↑, Il6 ↑, Il12b ↑, Il1b ↑ — pro-inflammatory hyperactivation ✅
- P2ry12 ↓, Sall1 ↓, Tmem119 ↓ — loss of homeostatic identity ✅
- PCA shows clean Mutant/Control separation consistent with Figure 3B ✅
All 9 samples are 48h post-LPS (the peak hyperactivation timepoint).
| Column Order | Galaxy Dataset | SRR Accession | Condition |
|---|---|---|---|
| 1 | 196 | SRR12564699 | Mutant |
| 2 | 190 | SRR12564698 | Mutant |
| 3 | 184 | SRR12564697 | Mutant |
| 4 | 178 | SRR12564671 | Control |
| 5 | 172 | SRR12564670 | Control |
| 6 | 166 | SRR12564669 | Control |
| 7 | 160 | SRR12564668 | Control |
| 8 | 154 | SRR12564667 | Control |
| 9 | 148 | SRR12564666 | Control |
Column order confirmed by reading the actual tabular file header — Mutant samples appear first (Galaxy 196 → 148).
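The sample table can be written out as a metadata data frame in the same column order. This is a sketch under assumptions: the column names `sample` and `condition` are illustrative, and the actual layout of metadata_v2.csv may differ.

```r
# Sample metadata in featureCounts column order (Mutant samples first).
metadata <- data.frame(
  sample = c("SRR12564699", "SRR12564698", "SRR12564697",
             "SRR12564671", "SRR12564670", "SRR12564669",
             "SRR12564668", "SRR12564667", "SRR12564666"),
  condition = c(rep("Mutant", 3), rep("Control", 6)),
  stringsAsFactors = FALSE
)

# Control as the first level makes it the reference in a ~ condition design.
metadata$condition <- factor(metadata$condition, levels = c("Control", "Mutant"))
rownames(metadata) <- metadata$sample

print(table(metadata$condition))
```

Keeping the rows in the same order as the count matrix columns matters, because a mismatch would silently assign samples to the wrong condition.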
Raw FASTQ files are available from NCBI SRA (linked from GEO accession GSE157234). The raw count matrix was not deposited on GEO — only UTAP-normalized counts were publicly available. V2 re-quantifies from SRA FASTQs using HISAT2 + featureCounts via Galaxy, generating true raw counts for methodologically correct DESeq2 input.
Original data: Shemer et al., Immunity 2020. All rights to the original data remain with the submitting authors.
The V2 Shiny dashboard accepts custom uploads:
- Count matrix: CSV, rows = genes, columns = samples, raw integer counts
- Metadata: CSV, rows = samples, must include a condition column with exactly 2 groups
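The upload requirements above can be checked before running DESeq2. The sketch below is a hypothetical validator, not the check implemented in app_v2.R; it assumes sample names are stored as the metadata row names and as the count matrix column names.

```r
# Return a character vector of problems; an empty vector means the upload looks usable.
validate_upload <- function(counts, metadata) {
  problems <- character(0)

  if (!"condition" %in% colnames(metadata)) {
    problems <- c(problems, "metadata has no 'condition' column")
  } else if (length(unique(metadata$condition)) != 2) {
    problems <- c(problems, "'condition' must contain exactly 2 groups")
  }

  m <- as.matrix(counts)
  if (!is.numeric(m) || anyNA(m) || any(m < 0) || any(m != round(m))) {
    problems <- c(problems, "counts must be non-negative integers")
  }

  if (!setequal(colnames(m), rownames(metadata))) {
    problems <- c(problems, "count columns and metadata rows name different samples")
  }

  problems
}

# Toy inputs (invented values).
good_counts <- matrix(c(5L, 0L, 12L, 7L, 3L, 9L, 1L, 4L), nrow = 2,
                      dimnames = list(c("gene1", "gene2"), c("s1", "s2", "s3", "s4")))
good_meta <- data.frame(condition = c("A", "A", "B", "B"),
                        row.names = c("s1", "s2", "s3", "s4"))
bad_meta  <- data.frame(condition = c("A", "B", "C", "C"),
                        row.names = c("s1", "s2", "s3", "s4"))

print(validate_upload(good_counts, good_meta))
print(validate_upload(good_counts, bad_meta))
```

The integer check also catches normalized matrices, which typically contain fractional values.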
This project implements four reproducibility layers:
| Layer | What | How |
|---|---|---|
| set.seed(123) | Reproducible DESeq2 runs | Set before DESeq() call |
| Session info | Exact R and package versions | Saved to results/v2/session_info_v2.txt |
| RDS objects | Pre-computed results | Saved to results/v2/ — app loads in ~1 second |
| renv | Package version locking | Run renv::init() then renv::snapshot() |
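The session info layer can be sketched in a few lines of base R. The helper name `save_session_info` is hypothetical, and the example writes to a temporary directory rather than to results/v2/.

```r
# Write the current R version and loaded package versions to a text file.
save_session_info <- function(path) {
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  writeLines(capture.output(sessionInfo()), path)
  invisible(path)
}

out <- save_session_info(file.path(tempdir(), "session_info_example.txt"))
print(head(readLines(out), 3))
```

Calling this at the end of the analysis script, after all packages have been loaded, records the versions that actually produced the results.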
To restore exact package versions:
renv::restore()

Md. Abrar Faiyaj
MSc Biotechnology (Thesis Track) | Junior Research Collaborator, ABCD Laboratory
BRAC University, Dhaka, Bangladesh
GEO: GSE157234
Paper: Shemer A, Scheyltjens I, Frumer GR, et al. Interleukin-10 Prevents Pathological Microglia Hyperactivation following Peripheral Endotoxin Challenge. Immunity. 2020;53(5):1033–1049.
DOI: 10.1016/j.immuni.2020.09.018
The V2 alignment and quantification workflow (HISAT2 → featureCounts via Galaxy) was guided by the following Galaxy Training Network resources:
Doyle M, Phipson B, Dashnow H (2026). RNA-Seq reads to counts (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/rna-seq-reads-to-counts/tutorial.html [Accessed: Mon Apr 20 2026]
Hiltemann S, Rasche H, Gladman S, et al. (2023). Galaxy Training: A powerful framework for teaching! PLOS Computational Biology 19(1):e1010752. doi:10.1371/journal.pcbi.1010752
Batut B, Hiltemann S, Bagnacani A, et al. (2018). Community-Driven Data Analysis Training for Biology. Cell Systems 6(6):752–758. doi:10.1016/j.cels.2018.05.012
@misc{transcriptomics-rna-seq-reads-to-counts,
author = {Maria Doyle and Belinda Phipson and Harriet Dashnow},
title = {{RNA-Seq reads to counts (Galaxy Training Materials)}},
year = {2026},
url = {https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/rna-seq-reads-to-counts/tutorial.html},
note = {[Online; accessed Mon Apr 20 2026]}
}
@article{Hiltemann_2023,
doi = {10.1371/journal.pcbi.1010752},
url = {https://doi.org/10.1371/journal.pcbi.1010752},
year = {2023},
month = {jan},
publisher = {Public Library of Science ({PLoS})},
volume = {19},
number = {1},
pages = {e1010752},
author = {Saskia Hiltemann and Helena Rasche and Simon Gladman and
Hans-Rudolf Hotz and Delphine Larivière and Daniel Blankenberg
and Pratik D. Jagtap and Thomas Wollmann and Anthony Bretaudeau
and Nadia Goué and Timothy J. Griffin and Coline Royaux and
Yvan Le Bras and Subina Mehta and Anna Syme and Frederik Coppens
and Bert Droesbeke and Nicola Soranzo and Wendi Bacon and
Fotis Psomopoulos and Cristóbal Gallardo-Alba and John Davis and
Melanie Christine Föll and Matthias Fahrner and Maria A. Doyle
and Beatriz Serrano-Solano and Anne Claire Fouilloux and
Peter van Heusden and Wolfgang Maier and Dave Clements and
Florian Heyl and Björn Grüning and Bérénice Batut},
editor = {Francis Ouellette},
title = {{Galaxy Training: A powerful framework for teaching!}},
journal = {PLoS Comput Biol}
}
@article{Batut_2018,
doi = {10.1016/j.cels.2018.05.012},
url = {https://doi.org/10.1016/j.cels.2018.05.012},
year = {2018},
publisher = {Elsevier},
volume = {6},
number = {6},
pages = {752--758},
author = {Bérénice Batut and Saskia Hiltemann and Andrea Bagnacani and
Dannon Baker and Vivek Bhardwaj and Clemens Blank and
Anthony Bretaudeau and Loraine Brillet-Guéguen and Björn Grüning
and others},
title = {{Community-Driven Data Analysis Training for Biology}},
journal = {Cell Systems}
}

Code: MIT License, free to use and adapt with attribution.
Data: Original GEO data (GSE157234) remains subject to Shemer et al. 2020 terms. Data not redistributed in this repository — download directly from NCBI GEO or SRA.