Releases: gpaasi/ebolavirus-gl-phylogeography

Hierarchical Patterns of Ebola-Virus Movement: A 1976–2025 Multilayered Phylogeographic Analysis

28 May 15:03
1a5e704

This is the first official release of the “Ebola virus Great-Lakes phylogeography” repository. It provides the data, code, environments, and documentation needed to reproduce and explore the analyses described in:

“A five-decade spatial history of Zaire, Sudan and Bundibugyo ebolaviruses in the Great-Lakes basin.”


Included Materials

  • Conda Environments (envs/)

    • phylo.yml – All software (MAFFT, BEAST2, IQ-TREE, R packages, etc.) required for sequence alignment, quality control, and BEAST phylogenetic inference.
    • visualization.yml – Tools for building and serving Nextstrain/Auspice visualizations (Augur, Auspice, Biopython, Node.js, etc.).
  • Raw & Curated Data (data/)

    • raw/sequences/
      • ebov_all.fasta, sudv_all.fasta, bdbv_all.fasta – Original FASTA files downloaded from GenBank containing all publicly available complete genomes of Zaire ebolavirus (EBOV), Sudan ebolavirus (SUDV), and Bundibugyo ebolavirus (BDBV) sampled in the Great-Lakes region (Uganda, DRC, Rwanda, Tanzania; 1976–2022).
    • raw/metadata/
      • ebov_metadata.tsv, sudv_metadata.tsv, bdbv_metadata.tsv – Tab-delimited metadata (accession, strain name, collection date, country, region, district, host, sequence length, GenBank link).
    • aligned/
      • ebov_aligned.fasta, sudv_aligned.fasta, bdbv_aligned.fasta – MAFFT alignments after initial quality filtering and trimming.
    • processed/
      • ebov_subset.fasta – Filtered alignment used for BEAST analysis.
      • combined_metadata.tsv – Unified, standardized metadata for all three species (with Nextstrain-compatible columns).
      • nextstrain_metadata.tsv – Metadata formatted for Augur/Auspice (including strain, date, region, division, country).
  • Snakemake Workflow & Helper Scripts (code/)

    • Snakefile – Main Snakemake workflow orchestrating all steps from raw inputs to final visualizations.
    • config.yaml – Configuration parameters (alignment settings, BEAST priors, subsampling thresholds, Nextstrain build settings).
    • scripts/
      • align_sequences.py – Wraps MAFFT to align raw FASTAs and perform basic sequence QC.
      • build_beast_xml.py – Generates BEAST2 XML files with pre-configured models (GTR+Γ, relaxed clock, Skygrid).
      • parse_beast_output.R – R script that summarizes the BEAST posterior trees, extracts the MCC tree, computes migration rates, and produces skyline plots.
      • generate_nextstrain.py – Automates Augur commands to produce Auspice JSON for EBOV, SUDV, and BDBV.
    • visualization/
      • auspice_config.json – Custom Auspice theme (node colors, trait panels, layout).
      • custom_theme.css – CSS overrides for Auspice visualization.
  • Results & Outputs (results/)

    • trees/
      • ebov_beast_tree.nexus, sudv_beast_tree.nexus, bdbv_beast_tree.nexus – Time-calibrated MCC trees for each species.
    • summary_statistics/
      • skyline_plots.pdf – Bayesian Skygrid reconstructions of effective population size (Nₑ) over time.
      • phylogeography_maps.pdf – Maps showing inferred discrete trait transitions (district-to-district migration).
      • migration_rates.tsv – Bayes factors and Markov jumps for each inferred cross-border route.
    • nextstrain_build/
      • auspice/ebov.json, auspice/sudv.json, auspice/bdbv.json – Nextstrain/Auspice JSON files for interactive visualization.
      • static/ebov_clock.png, static/sudv_clock.png, static/bdbv_clock.png – Time-scaled phylogeny snapshots.
  • Documentation (docs/)

    • methodology.md – Detailed description of sequence retrieval, alignment, BEAST model choices, and phylogeographic inference parameters.
    • FAQ.md – Frequently asked questions on data licensing, environment setup, and pipeline execution.
    • LICENSE_AGREEMENT.txt – Data usage agreements and GenBank/WHO metadata licensing information.
  • Additional Files (Root Level)

    • LICENSE – Creative Commons Attribution 4.0 International (CC BY 4.0).
    • CITATION.cff – Citation metadata for the entire repository.
    • zenodo.json – Metadata for Zenodo deposition (title, authors, DOI, license).
    • README.md – Overview and usage instructions (this file).
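
After downloading or cloning the release, it can be useful to confirm that the layout listed above is actually present before starting the workflow. The following is a minimal sketch, not part of the repository: the function names `missing_files` and `missing_columns` are hypothetical, the path list is a partial copy of the entries above, and the lowercase column names are an assumption based on the description of nextstrain_metadata.tsv.

```python
import csv
from pathlib import Path

# Partial list of paths taken from "Included Materials"; extend as needed.
EXPECTED_FILES = [
    "envs/phylo.yml",
    "envs/visualization.yml",
    "data/raw/sequences/ebov_all.fasta",
    "data/raw/sequences/sudv_all.fasta",
    "data/raw/sequences/bdbv_all.fasta",
    "data/processed/combined_metadata.tsv",
    "data/processed/nextstrain_metadata.tsv",
    "code/Snakefile",
    "code/config.yaml",
]

# Assumption: the header uses these exact lowercase names.
NEXTSTRAIN_COLUMNS = ["strain", "date", "region", "division", "country"]


def missing_files(root, expected=EXPECTED_FILES):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


def missing_columns(tsv_path, required=NEXTSTRAIN_COLUMNS):
    """Return the required column names absent from the TSV header row."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle, delimiter="\t"), [])
    return [col for col in required if col not in header]


if __name__ == "__main__":
    # Run from the repository root.
    for rel in missing_files("."):
        print(f"missing file: {rel}")
    metadata = Path("data/processed/nextstrain_metadata.tsv")
    if metadata.is_file():
        for col in missing_columns(metadata):
            print(f"missing column: {col}")
```

Run from the repository root, the script prints nothing when every listed file and column is found. Checking only the header row keeps the check fast and avoids loading the full metadata table.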

How to Reproduce

  1. Clone the repository
    git clone https://github.com/gpaasi/ebolavirus-gl-phylogeography.git
    cd ebolavirus-gl-phylogeography