This workflow uses metadata and sequences to produce one or more Nextstrain datasets that can be visualized in Auspice.
The workflow can be run from the top level pathogen repo directory:
# Build the genotype/genome trees
nextstrain build phylogenetic
# Build the all/gene trees
nextstrain build phylogenetic --configfile defaults/all/config.yaml
Alternatively, the workflow can be run from within the phylogenetic directory:
cd phylogenetic
nextstrain build .
nextstrain build . --configfile defaults/all/config.yaml
This produces the default outputs of the phylogenetic workflow:
- auspice_json(s) = auspice/*.json
The core phylogenetic workflow uses metadata values as-is, so please do any desired data formatting and curation as part of the ingest workflow.
- The metadata must include an ID column that can be used as an exact match for the sequence ID present in the FASTA headers.
- The date column in the metadata must be in ISO 8601 date format (i.e. YYYY-MM-DD).
- Ambiguous dates should be masked with XX (e.g. 2023-01-XX).
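The requirements above can be checked before running the workflow. The following is a minimal sketch, not part of the workflow itself. It assumes the metadata is tab-separated, that the ID column is named `strain` (a hypothetical name; use whatever your ingest workflow produces), and that the sequence ID is the first whitespace-delimited token of each FASTA header. It only handles XX masking of the month and day.

```python
import csv
import datetime
import io
import re

# YYYY-MM-DD, where the month and/or day may be masked as XX.
DATE_PATTERN = re.compile(r"(\d{4})-(\d{2}|XX)-(\d{2}|XX)")


def is_valid_date(value):
    """Return True for an ISO 8601 date, optionally masked with XX."""
    match = DATE_PATTERN.fullmatch(value)
    if match is None:
        return False
    year, month, day = match.groups()
    if month == "XX":
        # Assumption: a known day under an unknown month is rejected.
        return day == "XX"
    if day == "XX":
        return 1 <= int(month) <= 12
    try:
        # Fully specified dates must exist on the calendar.
        datetime.date(int(year), int(month), int(day))
    except ValueError:
        return False
    return True


def fasta_ids(fasta_text):
    """Collect the first whitespace-delimited token of each FASTA header."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">") and line[1:].split():
            ids.add(line[1:].split()[0])
    return ids


def find_metadata_problems(metadata_tsv, fasta_text, id_column="strain"):
    """Return a list of messages for rows that break the requirements."""
    known_ids = fasta_ids(fasta_text)
    problems = []
    reader = csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t")
    for row in reader:
        record_id = row[id_column]
        if record_id not in known_ids:
            problems.append(f"{record_id}: no matching FASTA header")
        if not is_valid_date(row["date"]):
            problems.append(f"{record_id}: invalid date {row['date']!r}")
    return problems


# Made-up example data, for illustration only.
example_metadata = "strain\tdate\nA1\t2023-01-XX\nB2\t2023/01/05\nC3\t2022-12-31\n"
example_fasta = ">A1\nACGT\n>C3 extra description\nACGT\n"

for problem in find_metadata_problems(example_metadata, example_fasta):
    print(problem)
```

With the example data, only the B2 record is reported: it has no FASTA header and its date uses slashes instead of hyphens.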
The defaults directory contains all of the default configurations for the phylogenetic workflow.
defaults/config.yaml contains all of the default configuration parameters
used for the phylogenetic workflow. Use Snakemake's --configfile/--config
options to override these default values.
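To illustrate what overriding a default means, here is a minimal sketch of a recursive merge in which values from a later config replace the defaults while untouched keys are kept. This is not Snakemake's own implementation, and it assumes nested keys are merged rather than replaced wholesale. The keys `filter`, `min_length`, `group_by`, and `threads` are hypothetical and are not taken from defaults/config.yaml.

```python
import copy


def merge_config(defaults, overrides):
    """Return a new dict where values in overrides win over defaults."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical default parameters and a hypothetical override.
example_defaults = {"filter": {"min_length": 1000, "group_by": "year"}, "threads": 1}
example_overrides = {"filter": {"min_length": 5000}}

example_merged = merge_config(example_defaults, example_overrides)
print(example_merged)
```

In this sketch only `min_length` changes; `group_by` and `threads` keep their default values, and the defaults dict itself is left unmodified.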
The rules directory contains separate Snakefiles (*.smk) as modules of the core phylogenetic workflow.
The modules of the workflow are in separate files to keep the main phylogenetic Snakefile succinct and organized.
The workdir is hardcoded to be the phylogenetic directory so all filepaths for
inputs/outputs should be relative to the phylogenetic directory.
Modules are all included in the main Snakefile in the order that they are expected to run.
The build-configs directory contains custom configs and rules that override and/or extend the default workflow.
- ci - CI build that runs with example data