Releases · Ekin-Kahraman/bulk-rnaseq-differential-expression

05 Apr 14:40

Ekin-Kahraman

v2.1.0

2a4a104

v2.1.0 - Covariate-adjusted model + workflow diagram

Changes since v2.0.0

Primary DE model now adjusts for sex as a covariate (~ condition + gender). This addresses the main biological weakness of earlier releases: the model was unadjusted even though sex metadata was available. Results regenerated: 1,773 DE genes (was 1,902), with 99.8% sign concordance with the full cohort.
Workflow diagram added to README (ASCII pipeline visualization).
NA gender filtering added before balanced sampling and full-cohort analysis.
All figures, tables, and tests regenerated with the covariate-adjusted model. CI passes including full pipeline rebuild.
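
As a rough illustration of the first and third items, the following base-R sketch drops samples with missing gender and then builds the design matrix for ~ condition + gender. The column names follow the formula above; the sample IDs, condition labels, and values are invented for illustration and are not taken from the dataset. The pipeline fits its model with DESeq2, so only the design-matrix step is shown here.

```r
# Toy sample metadata (hypothetical values, for illustration only).
meta <- data.frame(
  sample    = paste0("S", 1:6),
  condition = factor(c("negative", "negative", "negative",
                       "positive", "positive", "positive"),
                     levels = c("negative", "positive")),
  gender    = factor(c("F", "M", NA, "F", "M", "M")),
  stringsAsFactors = FALSE
)

# Drop samples with missing gender before any modelling step.
meta_complete <- meta[!is.na(meta$gender), ]

# Design matrix for the covariate-adjusted model:
# intercept, condition effect, and sex effect.
design <- model.matrix(~ condition + gender, data = meta_complete)

print(colnames(design))  # "(Intercept)" "conditionpositive" "genderM"
print(nrow(design))      # 5 samples remain after filtering
```

With this design, the condition coefficient is estimated while holding sex fixed, which is what "covariate-adjusted" means in this release.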

23 Mar 15:12

Ekin-Kahraman

v2.0.0

cd2da9a

v2.0.0 — Extended covariate analyses

What's New

Extended Analyses

Viral load stratification: differential expression between high- and low-Ct samples (Ct is inversely related to viral load), with interferon-stimulated gene (ISG) dose-response correlation (Script 10)
Sex-stratified interaction model: condition x gender interaction effects identifying sex-biased genes (Script 11)
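
To make the interaction term concrete, here is a small base-R sketch of a design matrix with a condition x gender interaction. The exact formula used in Script 11 is not given in these notes, so the formula, factor levels, and toy layout below are assumptions.

```r
# Toy balanced layout (hypothetical labels): four samples per condition,
# alternating F and M.
meta <- data.frame(
  condition = factor(rep(c("negative", "positive"), each = 4),
                     levels = c("negative", "positive")),
  gender    = factor(rep(c("F", "M"), times = 4), levels = c("F", "M"))
)

# Main effects plus their interaction.
design <- model.matrix(~ condition + gender + condition:gender, data = meta)

print(colnames(design))

# The interaction column is 1 only for samples that are both positive and male.
n_positive_male <- sum(design[, "conditionpositive:genderM"])
print(n_positive_male)
```

Under R's default treatment contrasts, the interaction coefficient measures how the condition effect in males differs from the condition effect in the reference level (here, females). Genes with a significant interaction coefficient are the sex-biased candidates.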

Key Results

1,510 DE genes between high/low viral load groups
12 genes with significant condition x sex interaction (9 male-biased, 3 female-biased)
ISG dose-dependent gradient confirmed across viral load strata

Infrastructure

Cross-platform numeric tolerance for CI reproducibility
Updated KEGG pathway database compatibility
All linting issues resolved

PRs Merged

#3: Add viral load stratification and sex-stratified interaction analyses
#4: Rename Novel to Extended in script comments
#5: Fix lint spacing in expression operators
#6: Update KEGG pathway table for current database version
#7: Widen numeric tolerance for cross-platform reproducibility
#8: Bump tolerance to 1e-3 for cross-platform p-value drift
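
The tolerance change in #7 and #8 can be pictured with base R's all.equal(). The notes do not say which comparison function the repository's tests use, so treat this as a sketch; the numeric values are invented.

```r
# Hypothetical reference values and values recomputed on another platform.
reference  <- c(0.012340, 0.000451, 0.250000)
recomputed <- c(0.012341, 0.000451, 0.250010)

# For numeric vectors, all.equal() compares the mean relative difference
# against `tolerance` and returns TRUE only when it is smaller.
strict <- isTRUE(all.equal(reference, recomputed, tolerance = 1e-8))
loose  <- isTRUE(all.equal(reference, recomputed, tolerance = 1e-3))

print(strict)  # FALSE: tiny platform drift fails a near-exact comparison
print(loose)   # TRUE: the same drift passes at 1e-3
```

A wider tolerance trades sensitivity for stability: it will not catch changes smaller than the tolerance, which is acceptable when those changes are floating-point drift rather than analysis differences.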

07 Mar 12:22

Ekin-Kahraman

v1.1.2

3fb071a

v1.1.2

Archival release of a reproducible bulk RNA-seq differential expression workflow for nasopharyngeal SARS-CoV-2 host-response analysis using GEO GSE152075.

Scope

This repository starts from the published count matrix and sample metadata provided through GEO. It does not perform raw-read processing, alignment, or quantification. The focus is downstream differential expression, enrichment analysis, reproducibility, and result validation.

Contents

Quality control and balanced subset construction
PCA and exploratory visualization
DESeq2 differential expression analysis
apeglm log2 fold-change shrinkage for ranking and visualization
GO Biological Process and KEGG pathway enrichment
Full-cohort sensitivity analysis
Committed figures and result tables
Pinned software environment via renv
GitHub Actions checks for environment consistency, linting, tests, and rebuild validation

Main outputs

Balanced primary analysis: 1,902 thresholded differentially expressed genes
Full-cohort sensitivity analysis: 4,371 thresholded differentially expressed genes
Shared thresholded genes: 1,314
Shared effect-direction concordance: 99.7%
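
For readers who want to see how overlap and direction concordance of this kind can be computed, here is a minimal base-R sketch. The gene IDs and fold changes are invented, and the notes do not spell out the exact procedure used, so this is one plausible reading: concordance is the fraction of shared thresholded genes whose log2 fold change has the same sign in both analyses.

```r
# Hypothetical thresholded DE tables from the two analyses.
balanced <- data.frame(gene = c("G1", "G2", "G3", "G4"),
                       log2FoldChange = c(2.1, -1.4, 1.8, -3.0),
                       stringsAsFactors = FALSE)
full     <- data.frame(gene = c("G2", "G3", "G4", "G5", "G6"),
                       log2FoldChange = c(-1.1, 1.2, 2.5, 1.9, -1.3),
                       stringsAsFactors = FALSE)

# Keep genes present in both tables.
shared <- merge(balanced, full, by = "gene",
                suffixes = c("_balanced", "_full"))

# A gene is concordant when both analyses agree on the direction of change.
same_direction <- sign(shared$log2FoldChange_balanced) ==
  sign(shared$log2FoldChange_full)

n_shared    <- nrow(shared)
concordance <- mean(same_direction)

print(n_shared)
print(round(100 * concordance, 1))
```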

Reproducibility

From the repository root:

Rscript 000_install_dependencies.R
Rscript run_all.R
Rscript -e 'renv::status()'
Rscript dev/lint.R
Rscript -e 'testthat::test_dir("tests/testthat")'

Positioning

Relative to broader workflow standards, the repository includes pinned dependencies, deterministic seeds, committed derived outputs, CI rebuild checks, and explicit session provenance. It remains intentionally lightweight and reviewable, while leaving upstream FASTQ-level processing and covariate-rich modeling out of scope.

Data source

Lieberman NAP et al. (2020). In vivo antiviral host transcriptional response to SARS-CoV-2 by viral load, sex, and age. PLoS Biology 18(9): e3000849.
GEO accession: GSE152075
DOI: https://doi.org/10.1371/journal.pbio.3000849

05 Feb 21:45

Ekin-Kahraman

v1.1.1

de81844

Robustness and reproducibility update

Summary

This release improves reproducibility for the SARS-CoV-2 host-response bulk RNA-seq pipeline (GSE152075), with no intended changes to the core analysis design.

What changed

Hardened data ingestion in scripts/00_get_data.R:
- stricter sample ID validation
- explicit checks for count/metadata overlap
- controlled handling of unknown condition labels
Improved QC safety in scripts/01_qc.R:
- fail fast checks for sample/metadata alignment
- explicit handling when one condition has too few samples
Improved enrichment robustness in scripts/06_enrichment.R:
- explicit stop when no DE genes pass thresholds
- graceful KEGG failure handling for transient network/service issues
- deterministic writing of expected enrichment output tables
Expanded smoke tests in tests/testthat/test-smoke.R:
- p-value sanity checks
- verification of key derived enrichment tables
Documentation update in README.md:
- added “Data and Code Availability”
- added “Peer Review Checklist”
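
The kind of fail-fast alignment check described for scripts/00_get_data.R and scripts/01_qc.R might look like the sketch below. The helper name check_alignment, its arguments, and the toy data are hypothetical and are not the repository's code.

```r
# Hypothetical helper: verifies that every count-matrix column has metadata
# and returns the metadata reordered to match the count-matrix columns.
check_alignment <- function(counts, meta, id_col = "sample") {
  ids <- meta[[id_col]]
  if (anyDuplicated(ids) > 0) {
    stop("Duplicate sample IDs in metadata.")
  }
  missing_in_meta <- setdiff(colnames(counts), ids)
  if (length(missing_in_meta) > 0) {
    stop("Samples in counts but not in metadata: ",
         paste(missing_in_meta, collapse = ", "))
  }
  meta[match(colnames(counts), ids), , drop = FALSE]
}

# Toy inputs (invented values).
counts <- matrix(1:6, nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("S2", "S1", "S3")))
meta <- data.frame(sample = c("S1", "S2", "S3"),
                   condition = c("negative", "positive", "positive"),
                   stringsAsFactors = FALSE)

aligned <- check_alignment(counts, meta)

# Dropping a metadata row triggers an explicit error instead of a silent mismatch.
bad <- tryCatch(check_alignment(counts, meta[1:2, ]),
                error = function(e) conditionMessage(e))
print(bad)
```

Stopping early with a named list of offending samples is usually easier to debug than a downstream failure caused by misordered columns.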

Validation

renv::status() clean
Rscript dev/lint.R passes
testthat::test_dir("tests/testthat") passes
GitHub Actions CI passes on main

Data and citation

Source dataset: GEO GSE152075
Repository DOI (Zenodo concept DOI): 10.5281/zenodo.18432519
CITATION.cff included for citation metadata

01 Feb 06:17

Ekin-Kahraman

v1.1.0

340011e

v1.1.0 - Reproducibility and bug fixes

What's Changed

Added run_all.R for one-command pipeline execution
Added 000_install_dependencies.R for easy setup
Added input validation to all scripts
Added results tables (CSV files)
Fixed library size description (20M reads)
Fixed NaN handling in heatmap scaling
Fixed scree plot assignment before ggsave
Removed temp files from repo
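
The notes do not describe the heatmap fix in detail. One common source of NaN in heatmap scaling is row-wise z-scoring of a gene with zero variance, sketched here in base R with invented values; the remedy shown (dropping such rows) is an assumption, not necessarily what the repository does.

```r
# Toy expression matrix: geneB is constant across samples.
mat <- rbind(geneA = c(1, 2, 3),
             geneB = c(5, 5, 5))

# Row-wise z-scores: scale() works on columns, so transpose before and after.
z <- t(scale(t(mat)))

# geneB has standard deviation 0, so its z-scores are 0/0 = NaN.
print(z)

# Drop rows containing NaN before passing the matrix to a heatmap function.
has_nan <- apply(z, 1, function(row) any(is.nan(row)))
z_clean <- z[!has_nan, , drop = FALSE]
print(rownames(z_clean))
```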

Quick Start

source("000_install_dependencies.R")
source("run_all.R")

30 Jan 21:06

Ekin-Kahraman

v1.0.1

5ffc734

v1.0.1 - MA plot fix and documentation updates

Fixed MA plot rendering issue.

30 Jan 14:41

Ekin-Kahraman

v1.0.0

69a79ee

v1.0.0 - Bulk RNA-seq differential expression pipeline

Reproducible bulk RNA-seq analysis pipeline for SARS-CoV-2 host response (GEO GSE152075).

Key results:

1,902 differentially expressed genes (FDR < 0.05, |log₂FC| > 1)
Top pathway: Coronavirus disease - COVID-19 (FDR = 1.5×10⁻⁴⁰)
529 enriched GO terms, 28 KEGG pathways
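
The thresholds quoted above (FDR < 0.05, |log2FC| > 1) can be applied to a results table as in the sketch below. The column names log2FoldChange and padj match DESeq2's results output; the gene IDs and values are invented.

```r
# Hypothetical results table in the shape DESeq2 produces.
res <- data.frame(
  gene           = c("G1", "G2", "G3", "G4", "G5"),
  log2FoldChange = c(2.3, 0.4, -1.7, 1.2, -2.0),
  padj           = c(0.001, 0.0001, 0.03, 0.20, NA),
  stringsAsFactors = FALSE
)

# DESeq2 can report padj as NA (for example after independent filtering),
# so NA is treated explicitly as "not significant".
is_de <- !is.na(res$padj) &
  res$padj < 0.05 &
  abs(res$log2FoldChange) > 1

de_genes <- res$gene[is_de]
print(de_genes)  # "G1" "G3"
```

Both conditions must hold: G2 is significant but has a small effect, G4 has a large effect but is not significant, and G5 has no adjusted p-value.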

Pipeline:

Quality control and filtering
PCA and exploratory analysis
DESeq2 differential expression
GO/KEGG pathway enrichment
Model diagnostics and visualization

Reference: Lieberman et al. (2020) PLoS Biology
License: MIT
