v1.1.2
Archival release of a reproducible bulk RNA-seq differential expression workflow for nasopharyngeal SARS-CoV-2 host-response analysis using GEO GSE152075.
Scope
This repository starts from the published count matrix and sample metadata provided through GEO. It does not perform raw-read processing, alignment, or quantification. The focus is downstream differential expression, enrichment analysis, reproducibility, and result validation.
Contents
- Quality control and balanced subset construction
- PCA and exploratory visualization
- DESeq2 differential expression analysis
- apeglm log2 fold-change shrinkage for ranking and visualization
- GO Biological Process and KEGG pathway enrichment
- Full-cohort sensitivity analysis
- Committed figures and result tables
- Pinned software environment via renv
- GitHub Actions checks for environment consistency, linting, tests, and rebuild validation
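The differential expression and shrinkage steps listed above can be sketched roughly as follows. This is a minimal illustration on simulated counts, not the repository's actual script: the simulated dataset, the coefficient name `condition_B_vs_A`, the seed, and both cutoffs are assumptions chosen for the example. It assumes the Bioconductor packages DESeq2 and apeglm are installed.

```r
# Sketch only: simulated counts stand in for the GEO count matrix.
library(DESeq2)  # lfcShrink(type = "apeglm") also needs the apeglm package

set.seed(1)  # arbitrary seed so the simulated data are repeatable

# Simulated dataset with a two-level 'condition' factor (levels A and B)
# and the default design ~ condition.
dds <- makeExampleDESeqDataSet(n = 1000, m = 12, betaSD = 1)

# Estimate size factors and dispersions, then fit the model.
dds <- DESeq(dds)

# Unshrunken results with adjusted p-values.
res <- results(dds, name = "condition_B_vs_A", alpha = 0.05)

# apeglm-shrunken log2 fold changes, intended for ranking and plotting.
res_shrunk <- lfcShrink(dds, coef = "condition_B_vs_A", type = "apeglm")

# Hypothetical thresholds; the repository's actual cutoffs are not stated here.
padj_cutoff <- 0.05
lfc_cutoff <- 1
is_de <- !is.na(res$padj) &
  res$padj < padj_cutoff &
  abs(res$log2FoldChange) >= lfc_cutoff

sum(is_de)  # number of genes passing both thresholds in the simulated data
```

Whether thresholding uses shrunken or unshrunken fold changes is a design choice; the sketch thresholds on unshrunken values and keeps the shrunken estimates for ranking, which is one common arrangement and may differ from what the repository does.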
Main outputs
- Balanced primary analysis: 1,902 thresholded differentially expressed genes
- Full-cohort sensitivity analysis: 4,371 thresholded differentially expressed genes
- Shared thresholded genes: 1,314
- Shared effect-direction concordance: 99.7%
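To put the overlap in context, the counts above can be turned into simple proportions. The short calculation below uses only the four reported numbers; the variable names are illustrative, and the concordant-gene count is approximate because the 99.7% figure is rounded.

```r
# Reported counts from the two analyses.
n_balanced <- 1902    # thresholded genes, balanced primary analysis
n_full <- 4371        # thresholded genes, full-cohort sensitivity analysis
n_shared <- 1314      # genes passing thresholds in both
concordance <- 0.997  # reported direction concordance among shared genes

# Fraction of each gene list that is also found in the other analysis.
frac_balanced <- n_shared / n_balanced
frac_full <- n_shared / n_full

# Jaccard index: shared genes divided by the union of both lists.
n_union <- n_balanced + n_full - n_shared
jaccard <- n_shared / n_union

# Approximate number of shared genes with the same effect direction.
n_concordant <- round(concordance * n_shared)

cat(sprintf("Balanced genes recovered in full cohort: %.1f%%\n", 100 * frac_balanced))
cat(sprintf("Full-cohort genes present in balanced set: %.1f%%\n", 100 * frac_full))
cat(sprintf("Jaccard index: %.3f\n", jaccard))
cat(sprintf("Approx. concordant shared genes: %d of %d\n", n_concordant, n_shared))
```

Under these numbers, about 69.1% of the balanced-analysis genes reappear in the full cohort, while the shared set makes up about 30.1% of the larger full-cohort list.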
Reproducibility
From the repository root:
Rscript 000_install_dependencies.R
Rscript run_all.R
Rscript -e 'renv::status()'
Rscript dev/lint.R
Rscript -e 'testthat::test_dir("tests/testthat")'
Positioning
Relative to broader workflow standards, the repository includes pinned dependencies, deterministic seeds, committed derived outputs, CI rebuild checks, and explicit session provenance. It remains intentionally lightweight and reviewable, while leaving upstream FASTQ-level processing and covariate-rich modeling out of scope.
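As an illustration of how deterministic seeds and session provenance fit together, the sketch below draws a balanced subset from toy metadata and writes the session details to a file. The group sizes, column names, seed value, and output file name are all hypothetical, and downsampling to the smallest group is only one plausible way to balance a cohort; it is not necessarily how the repository does it.

```r
# Toy metadata; group sizes are illustrative, not the GEO cohort's.
meta <- data.frame(
  sample_id = sprintf("S%03d", 1:50),
  status = rep(c("positive", "negative"), times = c(40, 10)),
  stringsAsFactors = FALSE
)

set.seed(42)  # hypothetical seed; fixing it makes the draw repeatable

# Downsample every group to the size of the smallest group.
n_min <- min(table(meta$status))
rows_by_group <- split(seq_len(nrow(meta)), meta$status)
keep <- unlist(lapply(rows_by_group, function(i) sample(i, n_min)))
balanced <- meta[sort(keep), ]

print(table(balanced$status))  # both groups now have n_min samples

# Record session provenance alongside the results.
# The file name and location are hypothetical.
session_file <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(sessionInfo()), session_file)
```

Rerunning the script with the same seed and the same R version selects the same samples, and the saved session file records which R and package versions produced them.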
Data source
Lieberman NAP et al. (2020). In vivo antiviral host transcriptional response to SARS-CoV-2 by viral load, sex, and age. PLoS Biology 18(9): e3000849.
GEO accession: GSE152075
DOI: https://doi.org/10.1371/journal.pbio.3000849