Skip to content

Code to run analyses and generate figures for JUNIPER: Reconstructing Transmission Events from Next-Generation Sequencing Data at Scale

Notifications You must be signed in to change notification settings

broadinstitute/juniper-analyses

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

juniper-analyses

Code to run analyses and generate figures for JUNIPER: Reconstructing Transmission Events from Next-Generation Sequencing Data at Scale by Specht et al. (2025).

Setup

Large File Merging

Before running code in this repository, please download the file juniper-analyses-oversize.zip from the Zenodo repository 10.5281/zenodo.14841603 and merge the contents with the juniper-analyses directory stored here. For example, the large file juniper-analyses-oversize/h5n1/res.RData from the Zenodo repository should be moved to juniper-analyses/h5n1/res.RData in this repository. The files deposited in the Zenodo repository were too large to host on this GitHub.

Packages

The analyses in this repository require the following R packages: outbreaker2, ape, ggplot2, igraph, readxl, reshape2, lubridate, coda, and cowplot. Additionally, synthetic data generation requires our custom R package simulatR. To install it, first install the devtools package, and then run devtools::install_github("broadinstitute/simulatR").

BEAST

Our comparison to the BadTrIP outbreak reconstruction tool requires the BEAST software, which may be downloaded and installed from beast2.org. We use BEAST version 2.7.5 for Mac OS X.

IQTree and TreeTime

Our comparison to the TransPhylo outbreak reconstruction tool requires the IQ-TREE and TreeTime software packages, which may be downloaded and installed from iqtree.org and treetime.readthedocs.io/en/latest/installation.html. We use IQ-TREE version 2.3.6 and TreeTime version 0.11.4 for Mac OS X.

De Novo iSNV Frequency Model

The code used to validate our model for the frequencies of de novo iSNVs can be found in isnv_frequency_model.R, and executed using

Rscript isnv_frequency_model.R

The estimate of the parameter r will be printed in the terminal, and a figure will be saved to the figs subdirectory.

Validation of JUNIPER on Synthetic Data

Data Generation and Model Execution

The directories experiment_1 through experiment_23, which contain the synthetic outbreaks used to validate JUNIPER and reconstructions thereof using JUNIPER, outbreaker2, BadTRIP, and TransPhylo, can be generated using the script synthetic_generation.R. This script may be run from the command line using five arguments, the first being the index of the synthetic outbreak to generate and reconstruction, and the second through fifth being the number of global iterations at which to run JUNIPER, outbreaker2, TransPhylo, and BadTrip. Note that the same number of global iterations for different methods may have drastically different runtimes and effective sample sizes. On the synthetic datasets, to achieve reasonable effective sample sizes, we recommend a default of 10,000 iterations each for JUNIPER and outbreaker2, 100,000 for BadTrIP, and 1,000,000 for TransPhylo.

As an example, to generate the directory experiment_1 with the above defaults, first navigate to the juniper-analyses directory. Then, from the command line, run

Rscript synthetic_generation.R 1 10000 10000 100000 1000000

Note that if the directory experiment_1 already exists, the script will prompt the user to delete it and try again. Additionally, running different versions of BEAST, IQ-TREE, and TreeTime and/or using different operating systems may require the user to modify the calls to said software packages, located in the functions compare_badtrip and compare_outbreaker within synthetic_generation.R.

Data Processing and Visualization

All performance statistics and figures for the synthetic data validation section of the JUNIPER manuscript were generated using the script synthetic_analysis.R. It can be run from the command line as follows:

Rscript synthetic_analysis.R

Summary statistics will be printed in the terminal, and figures will be saved to the figs subdirectory.

About

Code to run analyses and generate figures for JUNIPER: Reconstructing Transmission Events from Next-Generation Sequencing Data at Scale

Resources

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published