Releases: gpaasi/ebolavirus-gl-phylogeography
Hierarchical Patterns of Ebola-Virus Movement: A 1976–2025 Multilayered Phylogeographic Analysis
This is the first official release of the “Ebola virus Great-Lakes phylogeography” repository, providing all data, code, environments, and documentation needed to fully reproduce and explore the analyses described in:
“A five-decade spatial history of Zaire, Sudan and Bundibugyo ebolaviruses in the Great-Lakes basin.”
Included Materials
- Conda Environments (`envs/`)
  - `phylo.yml` – All software (MAFFT, BEAST2, IQ-TREE, R packages, etc.) required for sequence alignment, quality control, and BEAST phylogenetic inference.
  - `visualization.yml` – Tools for building and serving Nextstrain/Auspice visualizations (Augur, Auspice, Biopython, Node.js, etc.).
- Raw & Curated Data (`data/`)
  - `raw/sequences/ebov_all.fasta`, `sudv_all.fasta`, `bdbv_all.fasta` – Original FASTA files downloaded from GenBank, containing all publicly available complete genomes of Zaire ebolavirus (EBOV), Sudan ebolavirus (SUDV), and Bundibugyo ebolavirus (BDBV) sampled in the Great-Lakes region (Uganda, DRC, Rwanda, Tanzania; 1976–2022).
  - `raw/metadata/ebov_metadata.tsv`, `sudv_metadata.tsv`, `bdbv_metadata.tsv` – Tab-delimited metadata (accession, strain name, collection date, country, region, district, host, sequence length, GenBank link).
  - `aligned/ebov_aligned.fasta`, `sudv_aligned.fasta`, `bdbv_aligned.fasta` – MAFFT alignments after initial quality filtering and trimming.
  - `processed/`
    - `ebov_subset.fasta` – Filtered alignment used for BEAST analysis.
    - `combined_metadata.tsv` – Unified, standardized metadata for all three species (with Nextstrain-compatible columns).
    - `nextstrain_metadata.tsv` – Metadata formatted for Augur/Auspice (including `strain`, `date`, `region`, `division`, `country`).
- Snakemake Workflow & Helper Scripts (`code/`)
  - `Snakefile` – Main Snakemake workflow orchestrating all steps from raw inputs to final visualizations.
  - `config.yaml` – Configuration parameters (alignment settings, BEAST priors, subsampling thresholds, Nextstrain build settings).
  - `scripts/`
    - `align_sequences.py` – Wraps MAFFT to align raw FASTAs and perform basic sequence QC.
    - `build_beast_xml.py` – Generates BEAST2 XML files with pre-configured models (GTR+Γ, relaxed clock, Skygrid).
    - `parse_beast_output.R` – R script that summarizes BEAST posterior trees, extracts the MCC tree, computes migration rates, and produces skyline plots.
    - `generate_nextstrain.py` – Automates Augur commands to produce Auspice JSON for EBOV, SUDV, and BDBV.
  - `visualization/`
    - `auspice_config.json` – Custom Auspice theme (node colors, trait panels, layout).
    - `custom_theme.css` – CSS overrides for Auspice visualization.
- Results & Outputs (`results/`)
  - `trees/ebov_beast_tree.nexus`, `sudv_beast_tree.nexus`, `bdbv_beast_tree.nexus` – Time-calibrated MCC trees for each species.
  - `summary_statistics/`
    - `skyline_plots.pdf` – Bayesian Skygrid reconstructions of effective population size (Nₑ) over time.
    - `phylogeography_maps.pdf` – Maps showing inferred discrete trait transitions (district-to-district migration).
    - `migration_rates.tsv` – Bayes factors and Markov jumps for each inferred cross-border route.
  - `nextstrain_build/`
    - `auspice/ebov.json`, `auspice/sudv.json`, `auspice/bdbv.json` – Nextstrain/Auspice JSON files for interactive visualization.
    - `static/ebov_clock.png`, `static/sudv_clock.png`, `static/bdbv_clock.png` – Time-scaled phylogeny snapshots.
- Documentation (`docs/`)
  - `methodology.md` – Detailed description of sequence retrieval, alignment, BEAST model choices, and phylogeographic inference parameters.
  - `FAQ.md` – Frequently asked questions on data licensing, environment setup, and pipeline execution.
  - `LICENSE_AGREEMENT.txt` – Data usage agreements and GenBank/WHO metadata licensing information.
- Additional Files (Root Level)
  - `LICENSE` – Creative Commons Attribution 4.0 International (CC BY 4.0).
  - `CITATION.cff` – Citation metadata for the entire repository.
  - `zenodo.json` – Metadata for Zenodo deposition (title, authors, DOI, license).
  - `README.md` – Overview and usage instructions (this file).
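
The Nextstrain-formatted metadata listed under `data/` is described as carrying the columns `strain`, `date`, `region`, `division`, and `country`. As a rough illustration of how such a file might be sanity-checked before a build, the sketch below reports missing columns and rows with empty values. It is not part of the repository: the function name `check_metadata` and the two example rows are hypothetical, and only the column names come from the listing above.

```python
import csv
import io

# Columns the listing above gives for nextstrain_metadata.tsv.
REQUIRED_COLUMNS = ["strain", "date", "region", "division", "country"]


def check_metadata(handle):
    """Return (missing_columns, incomplete_strains) for a tab-delimited file.

    Hypothetical helper, not a script shipped with the repository.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        # Without the full header, per-row checks are not meaningful.
        return missing, []
    incomplete = []
    for row in reader:
        # Short rows yield None for absent fields, so normalize to "".
        if any(not (row[col] or "").strip() for col in REQUIRED_COLUMNS):
            incomplete.append(row["strain"])
    return missing, incomplete


# Made-up rows standing in for the real metadata file (illustration only).
example = io.StringIO(
    "strain\tdate\tregion\tdivision\tcountry\n"
    "sample_A\t2018-08-01\tAfrica\tNord-Kivu\tDRC\n"
    "sample_B\t\tAfrica\tKasese\tUganda\n"
)
print(check_metadata(example))  # ([], ['sample_B'])
```

To run the same check on a real file, open it with `open(path, newline="")` and pass the handle to `check_metadata`. Going by the listing, the path would be `data/processed/nextstrain_metadata.tsv`.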
How to Reproduce
- Clone the repository

      git clone https://github.com/gpaasi/ebolavirus-gl-phylogeography.git
      cd ebolavirus-gl-phylogeography