Releases: Ekin-Kahraman/bulk-rnaseq-differential-expression

v2.1.0 - Covariate-adjusted model + workflow diagram

05 Apr 14:40

Changes since v2.0.0

  • The primary DE model now adjusts for sex as a covariate (design: ~ condition + gender). This addresses the main biological weakness of earlier releases, which fit an unadjusted model even though sex metadata were available. Results regenerated: 1,773 DE genes (previously 1,902), with 99.8% sign concordance against the full-cohort analysis.
  • Workflow diagram added to README (ASCII pipeline visualization).
  • Samples with missing (NA) gender are now filtered out before balanced sampling and before the full-cohort analysis.
  • All figures, tables, and tests regenerated with the covariate-adjusted model. CI passes including full pipeline rebuild.
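
As a rough illustration of the first and third bullets, the sketch below drops samples with missing gender and expands the ~ condition + gender design with base R's model.matrix(). The metadata frame, its column values, and the sample IDs are hypothetical stand-ins, not the repository's data, and the actual scripts may be organised differently.

```r
# Hypothetical sample metadata; only the column names follow the design formula.
meta <- data.frame(
  sample    = paste0("S", 1:6),
  condition = factor(
    c("negative", "positive", "positive", "negative", "positive", "negative"),
    levels = c("negative", "positive")
  ),
  gender    = factor(c("F", "M", NA, "M", "F", "F"))
)

# Drop samples with missing gender before any sampling or modelling.
meta_complete <- meta[!is.na(meta$gender), , drop = FALSE]

# Expand the covariate-adjusted design into a model matrix.
design <- model.matrix(~ condition + gender, data = meta_complete)
colnames(design)
```

One design detail worth keeping in mind: DESeq2's results() defaults to the last variable in the design formula, so with gender listed last the condition effect would typically be requested explicitly (for example through the contrast or name argument). How the repository does this is not stated in these notes.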

v2.0.0 - Extended covariate analyses

23 Mar 15:12
cd2da9a

What's New

Extended Analyses

  • Viral load stratification: High vs low Ct differential expression with ISG dose-response correlation (Script 10)
  • Sex-stratified interaction model: condition x gender interaction effects identifying sex-biased genes (Script 11)

Key Results

  • 1,510 DE genes between high/low viral load groups
  • 12 genes with significant condition x sex interaction (9 male-biased, 3 female-biased)
  • ISG dose-dependent gradient confirmed across viral load strata

Infrastructure

  • Numeric comparison tolerance widened (to 1e-3, see PRs #7 and #8) so CI results reproduce across platforms
  • Updated KEGG pathway database compatibility
  • All linting issues resolved

PRs Merged

  • #3: Add viral load stratification and sex-stratified interaction analyses
  • #4: Rename Novel to Extended in script comments
  • #5: Fix lint spacing in expression operators
  • #6: Update KEGG pathway table for current database version
  • #7: Widen numeric tolerance for cross-platform reproducibility
  • #8: Bump tolerance to 1e-3 for cross-platform p-value drift
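
To make the tolerance change in PRs #7 and #8 concrete, here is a minimal sketch of an element-wise absolute-tolerance comparison. The helper name within_tolerance and the example values are hypothetical; the repository's tests may use a different comparison (for instance a relative tolerance through testthat).

```r
# Hypothetical helper: TRUE when every rebuilt value lies within `tol`
# (absolute difference) of the committed reference value.
within_tolerance <- function(reference, rebuilt, tol = 1e-3) {
  stopifnot(length(reference) == length(rebuilt))
  all(abs(reference - rebuilt) <= tol)
}

# Toy p-values standing in for a committed table and a rebuild on another platform.
reference <- c(0.0123, 0.5, 0.049)
rebuilt   <- reference + c(4e-5, -2e-4, 9e-4)

loose_ok  <- within_tolerance(reference, rebuilt, tol = 1e-3)
strict_ok <- within_tolerance(reference, rebuilt, tol = 1e-4)
```

With these toy values the comparison passes at 1e-3 and fails at 1e-4, which is the kind of platform-level drift a wider tolerance is meant to absorb.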

v1.1.2

07 Mar 12:22

Archival release of a reproducible bulk RNA-seq differential expression workflow for nasopharyngeal SARS-CoV-2 host-response analysis using GEO GSE152075.

Scope

This repository starts from the published count matrix and sample metadata provided through GEO. It does not perform raw-read processing, alignment, or quantification. The focus is downstream differential expression, enrichment analysis, reproducibility, and result validation.

Contents

  • Quality control and balanced subset construction
  • PCA and exploratory visualization
  • DESeq2 differential expression analysis
  • apeglm log2 fold-change shrinkage for ranking and visualization
  • GO Biological Process and KEGG pathway enrichment
  • Full-cohort sensitivity analysis
  • Committed figures and result tables
  • Pinned software environment via renv
  • GitHub Actions checks for environment consistency, linting, tests, and rebuild validation

Main outputs

  • Balanced primary analysis: 1,902 thresholded differentially expressed genes
  • Full-cohort sensitivity analysis: 4,371 thresholded differentially expressed genes
  • Shared thresholded genes: 1,314
  • Shared effect-direction concordance: 99.7%
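
The overlap and concordance figures above can be understood through a sketch like the following, which assumes two thresholded DE tables keyed by gene. The gene labels and fold changes are toy values, not repository results, and the column names are hypothetical.

```r
# Toy thresholded DE tables (hypothetical genes and log2 fold changes).
balanced <- data.frame(
  gene   = c("gene_a", "gene_b", "gene_c", "gene_d"),
  log2fc = c(3.1, 2.4, 1.8, 2.9)
)
full <- data.frame(
  gene   = c("gene_b", "gene_c", "gene_d", "gene_e", "gene_f"),
  log2fc = c(2.2, 1.5, -1.2, 2.0, 2.6)
)

# Genes passing thresholds in both analyses.
shared <- merge(balanced, full, by = "gene", suffixes = c("_balanced", "_full"))
n_shared <- nrow(shared)

# Fraction of shared genes whose effect has the same sign in both analyses.
concordance <- mean(sign(shared$log2fc_balanced) == sign(shared$log2fc_full))
```

In this toy case three genes are shared and two of them agree in direction, so the concordance is 2/3.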

Reproducibility

From the repository root:

  • Rscript 000_install_dependencies.R
  • Rscript run_all.R
  • Rscript -e 'renv::status()'
  • Rscript dev/lint.R
  • Rscript -e 'testthat::test_dir("tests/testthat")'

Positioning

Relative to broader workflow standards, the repository includes pinned dependencies, deterministic seeds, committed derived outputs, CI rebuild checks, and explicit session provenance. It remains intentionally lightweight and reviewable, while leaving upstream FASTQ-level processing and covariate-rich modeling out of scope.

Data source

Lieberman NAP et al. (2020). In vivo antiviral host transcriptional response to SARS-CoV-2 by viral load, sex, and age. PLoS Biology 18(9): e3000849.
GEO accession: GSE152075
DOI: https://doi.org/10.1371/journal.pbio.3000849

Robustness and reproducibility update

05 Feb 21:45

Summary

This release improves reproducibility for the SARS-CoV-2 host-response bulk RNA-seq pipeline (GSE152075), with no intended changes to the core analysis design.

What changed

  • Hardened data ingestion in scripts/00_get_data.R:
    • stricter sample ID validation
    • explicit checks for count/metadata overlap
    • controlled handling of unknown condition labels
  • Improved QC safety in scripts/01_qc.R:
    • fail-fast checks for sample/metadata alignment
    • explicit handling when one condition has too few samples
  • Improved enrichment robustness in scripts/06_enrichment.R:
    • explicit stop when no DE genes pass thresholds
    • graceful KEGG failure handling for transient network/service issues
    • deterministic writing of expected enrichment output tables
  • Expanded smoke tests in tests/testthat/test-smoke.R:
    • p-value sanity checks
    • verification of key derived enrichment tables
  • Documentation update in README.md:
    • added “Data and Code Availability”
    • added “Peer Review Checklist”
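
The enrichment hardening described above (explicit stop on an empty gene list, graceful handling of a failed KEGG call, and a predictable output table) might look roughly like this. The wrapper safe_enrich, the stand-in flaky_kegg, and the output column names are all hypothetical; this is a sketch of the pattern, not the code in scripts/06_enrichment.R.

```r
# Hypothetical wrapper: `enrich_fun` stands in for whatever call queries KEGG.
safe_enrich <- function(enrich_fun, genes) {
  # Explicit stop when no DE genes passed the thresholds.
  if (length(genes) == 0) {
    stop("No DE genes passed thresholds; cannot run enrichment.")
  }
  tryCatch(
    enrich_fun(genes),
    error = function(e) {
      warning("KEGG enrichment failed: ", conditionMessage(e))
      # Return an empty table with fixed columns so the output file
      # still has a predictable shape.
      data.frame(
        ID          = character(),
        Description = character(),
        p.adjust    = numeric()
      )
    }
  )
}

# Stand-in for a transient service failure.
flaky_kegg <- function(genes) stop("HTTP 503: service unavailable")

result <- suppressWarnings(safe_enrich(flaky_kegg, c("gene_a", "gene_b")))
empty_input <- try(safe_enrich(flaky_kegg, character()), silent = TRUE)
```

Returning an empty table with the expected columns keeps downstream steps and smoke tests from depending on whether the remote service happened to be reachable.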

Validation

  • renv::status() clean
  • Rscript dev/lint.R passes
  • testthat::test_dir("tests/testthat") passes
  • GitHub Actions CI passes on main

Data and citation

  • Source dataset: GEO GSE152075
  • Repository DOI (Zenodo concept DOI): 10.5281/zenodo.18432519
  • CITATION.cff included for citation metadata

v1.1.0 - Reproducibility and bug fixes

01 Feb 06:17

What's Changed

  • Added run_all.R for one-command pipeline execution
  • Added 000_install_dependencies.R for easy setup
  • Added input validation to all scripts
  • Added results tables (CSV files)
  • Fixed library size description (20M reads)
  • Fixed NaN handling in heatmap scaling
  • Fixed scree plot assignment before ggsave
  • Removed temp files from repo
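
For context on the heatmap fix: row-wise z-scoring divides by each gene's standard deviation, so a gene with constant expression produces 0/0 = NaN. The sketch below shows one common way to guard against that; the helper row_zscore is hypothetical and the repository's actual fix is not described in these notes.

```r
# Hypothetical helper: row-wise z-scores with non-finite values set to zero.
row_zscore <- function(mat) {
  centered <- mat - rowMeans(mat)   # subtract each row's mean
  sds <- apply(mat, 1, sd)          # per-row standard deviation
  z <- centered / sds               # zero-variance rows give 0/0 = NaN
  z[!is.finite(z)] <- 0             # treat constant rows as "no deviation"
  z
}

# Toy matrix: gene_b is constant across samples.
mat <- rbind(gene_a = c(1, 2, 3), gene_b = c(5, 5, 5))
z <- row_zscore(mat)
```

Dropping zero-variance genes before scaling would be an equally reasonable choice; setting them to zero simply keeps the matrix dimensions unchanged.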

Quick Start

source("000_install_dependencies.R")
source("run_all.R")

v1.0.1 - MA plot fix and documentation updates

30 Jan 21:06

Fixed MA plot rendering issue.

v1.0.0 - Bulk RNA-seq differential expression pipeline

30 Jan 14:41

Reproducible bulk RNA-seq analysis pipeline for SARS-CoV-2 host response (GEO GSE152075).

Key results:

  • 1,902 differentially expressed genes (FDR < 0.05, |log₂FC| > 1)
  • Top pathway: Coronavirus disease - COVID-19 (FDR = 1.5×10⁻⁴⁰)
  • 529 enriched GO terms, 28 KEGG pathways
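
The thresholds quoted above (FDR < 0.05, |log₂FC| > 1) correspond to a filter along these lines. The table is a toy stand-in using the padj and log2FoldChange column names that DESeq2 results carry; the genes and values are hypothetical.

```r
# Toy results table (hypothetical genes and values).
res <- data.frame(
  gene           = c("gene_a", "gene_b", "gene_c", "gene_d", "gene_e"),
  padj           = c(0.001, 0.20, 0.03, NA, 0.04),
  log2FoldChange = c(2.5, 3.0, -1.4, 1.9, 0.6)
)

# Keep genes passing both thresholds; padj can be NA (for example after
# independent filtering), so NA values are excluded explicitly.
is_de <- !is.na(res$padj) & res$padj < 0.05 & abs(res$log2FoldChange) > 1
de_genes <- res$gene[is_de]
```

Here only gene_a and gene_c pass: gene_b fails the FDR cutoff, gene_d has no adjusted p-value, and gene_e falls short on effect size.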

Pipeline:

  • Quality control and filtering
  • PCA and exploratory analysis
  • DESeq2 differential expression
  • GO/KEGG pathway enrichment
  • Model diagnostics and visualization

Reference: Lieberman et al. (2020) PLoS Biology
License: MIT