This repository is set up so a reviewer can reproduce the analysis with a small number of commands.

- Code: analysis scripts in `scripts/` and an orchestrator in `run_all.R`.
- Version pinning: `renv.lock` pins CRAN and Bioconductor package versions.
- Pre-computed outputs: key figures and tables are committed under `results/` for convenience and quick verification.
- Analysis summary: `results/tables/analysis_summary.csv` captures the main counts used in the narrative.
- Output manifest: `results/tables/output_manifest.csv` records file sizes and MD5 checksums for committed figures and tables.
- Pinned pathway snapshot: `data/reference/kegg_hsa_pathway_*.tsv` freezes the KEGG human pathway universe used by enrichment.

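
The checksums in the output manifest can be spot-checked without R. The following is a minimal shell sketch, not the repository's own tooling. It assumes a hypothetical manifest layout of one header row followed by `path,size_bytes,md5` rows, with paths relative to the current directory, and it assumes GNU `md5sum` is available. The function name `verify_manifest` is illustrative; adjust the column order to match the actual file.

```shell
# Sketch only: the column layout (path,size_bytes,md5) is an assumption,
# not something the manifest is documented to guarantee.
verify_manifest() {
  manifest="$1"
  rc=0
  # Read every row after the header; quotes and carriage returns are stripped
  # because CSV writers often quote string fields.
  while IFS=, read -r path size md5; do
    [ -n "$path" ] || continue
    if [ ! -f "$path" ]; then
      echo "MISSING $path"
      rc=1
      continue
    fi
    # The size column is read but not checked in this sketch.
    actual=$(md5sum "$path" | cut -d' ' -f1)
    if [ "$actual" = "$md5" ]; then
      echo "OK $path"
    else
      echo "MISMATCH $path"
      rc=1
    fi
  done <<EOF
$(tail -n +2 "$manifest" | tr -d '"\r')
EOF
  return "$rc"
}
```

From the repository root, `verify_manifest results/tables/output_manifest.csv` would print one line per file and return a non-zero status if any file is missing or differs.
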
Run these commands from the root of a fresh clone of the repository:

    # Restore/install pinned dependencies from renv.lock
    Rscript 000_install_dependencies.R

    # Run the full pipeline (downloads data if needed, then regenerates results)
    Rscript run_all.R

Maintainers who intentionally change dependencies should refresh the lockfile explicitly:

    Rscript dev/snapshot_lockfile.R

The data download step (`scripts/00_get_data.R`) is idempotent:

- If `data/counts_raw.rds` and `data/metadata.rds` already exist, it skips re-downloading.
- To force a fresh download from GEO:

      FORCE_DOWNLOAD=true Rscript scripts/00_get_data.R

The GEO download step (`scripts/00_get_data.R`) requires network access on first run. KEGG enrichment does not query live KEGG during routine analysis; it reads the pinned human pathway snapshot in `data/reference/`, so exact table comparisons remain meaningful when KEGG changes upstream.
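
The skip-or-download rule of the data step can be sketched in a few lines of shell. This illustrates the decision as described above and is not a transcription of `scripts/00_get_data.R`, which is written in R and may parse `FORCE_DOWNLOAD` differently (for example, case-insensitively). The function name `needs_download` is hypothetical.

```shell
# Sketch only: prints "download" or "skip" for a given data directory.
# File names follow the README; the exact FORCE_DOWNLOAD parsing is an assumption.
needs_download() {
  data_dir="${1:-data}"
  if [ "${FORCE_DOWNLOAD:-false}" = "true" ]; then
    # An explicit override always triggers a fresh download.
    echo "download"
  elif [ -f "$data_dir/counts_raw.rds" ] && [ -f "$data_dir/metadata.rds" ]; then
    # Both cached files are present, so the step has nothing to do.
    echo "skip"
  else
    echo "download"
  fi
}
```

Because the decision depends only on the two cached files and the override flag, running the step repeatedly leaves an already populated `data/` directory unchanged.
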
The balanced subset selection uses a fixed seed (`set.seed(123)` in `scripts/01_qc.R`), so repeated runs should yield the same subset and downstream results, given the same package versions. Label placement in ggrepel-based figures is also seeded, and `results/session_info.txt` records the active git commit, branch, and analysis configuration.
After a successful run, you should see (among others):

- `results/tables/deseq2_results.csv`
- `results/tables/deseq2_results_shrunken.csv`
- `results/tables/full_cohort_deseq2_results.csv`
- `results/tables/analysis_summary.csv`
- `results/tables/output_manifest.csv`
- `results/figures/volcano_plot.png`
- `results/figures/sensitivity_lfc_scatter.png`
- `results/figures/pca_plot.png`
- `results/session_info.txt` (records R and package versions, git commit, and config for the run)

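
A quick way to confirm that the files listed above exist after a run is a small shell check. This is a sketch with a hypothetical function name (`check_outputs`); it only tests for presence, not content, and covers only the files named in this README.

```shell
# Sketch only: report which of the listed outputs are missing under a given root.
check_outputs() {
  root="${1:-.}"
  missing=0
  for f in \
    results/tables/deseq2_results.csv \
    results/tables/deseq2_results_shrunken.csv \
    results/tables/full_cohort_deseq2_results.csv \
    results/tables/analysis_summary.csv \
    results/tables/output_manifest.csv \
    results/figures/volcano_plot.png \
    results/figures/sensitivity_lfc_scatter.png \
    results/figures/pca_plot.png \
    results/session_info.txt
  do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  # Non-zero status signals that at least one expected file is absent.
  return "$missing"
}
```

Calling `check_outputs` from the repository root prints nothing and returns zero when every listed file is present.
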
To validate the environment and outputs locally:

    # Check environment consistency against renv.lock
    Rscript -e 'renv::status()'

    # Run output validation tests
    Rscript -e 'testthat::test_dir("tests/testthat")'

    # Lint the analysis scripts
    Rscript dev/lint.R

GitHub Actions also performs a clean rebuild of the tracked analysis outputs, compares regenerated tables against the committed versions, and checks that key figures were regenerated successfully.
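
A comparable table check can be approximated locally. The sketch below assumes a byte-for-byte comparison with `cmp` between a directory of committed CSVs and a directory of regenerated ones. The actual CI job may compare differently (for example, with a numeric tolerance), and the function name `compare_tables` is hypothetical.

```shell
# Sketch only: byte-wise comparison of every committed CSV against its regenerated copy.
compare_tables() {
  committed="$1"
  regenerated="$2"
  rc=0
  for ref in "$committed"/*.csv; do
    # Skip the unexpanded glob when the directory holds no CSV files.
    [ -e "$ref" ] || continue
    name=$(basename "$ref")
    if [ ! -f "$regenerated/$name" ]; then
      echo "MISSING $name"
      rc=1
    elif cmp -s "$ref" "$regenerated/$name"; then
      echo "SAME $name"
    else
      echo "DIFF $name"
      rc=1
    fi
  done
  return "$rc"
}
```

One way to use it is to copy `results/tables/` to a scratch directory before running `Rscript run_all.R`, then call `compare_tables` with the scratch copy and the regenerated `results/tables/`.
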
For a workflow-level comparison against DESeq2, nf-core/rnaseq, targets, and workflowr, see WORKFLOW_BENCHMARK.md.