This workflow uses already-curated WNV metadata and sequences to produce Nextstrain datasets that can be visualized in Auspice. By default, the workflow sources the curated data from our public S3 bucket and produces three analyses:
- all-lineages (auspice/WNV_all-lineages.json)
- Lineage 1A (auspice/WNV_lineage-1A.json)
- Lineage 2 (auspice/WNV_lineage-2.json)
All default Nextstrain builds can be run from the top-level directory:
nextstrain build phylogenetic
The Washington-focused build can also be run from the top-level directory:
nextstrain build phylogenetic --configfile build-configs/washington-state/config.yaml
Alternatively, the workflow can be run from within the phylogenetic directory:
cd phylogenetic
nextstrain build .
This produces the default outputs of the phylogenetic workflow:
- auspice_json(s) = auspice/*.json
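After a run, it can help to confirm that every default dataset was written. The sketch below is a hypothetical helper, not part of the workflow; it assumes the three build names implied by the Auspice JSON filenames listed above and that outputs land in auspice/ under the phylogenetic directory.

```python
from pathlib import Path

# Assumption: build names inferred from the default Auspice JSON filenames.
DEFAULT_BUILDS = ["all-lineages", "lineage-1A", "lineage-2"]


def expected_auspice_jsons(builds=DEFAULT_BUILDS, auspice_dir="auspice"):
    """Return the Auspice JSON paths (relative to the workflow dir) for each build."""
    return [Path(auspice_dir) / f"WNV_{build}.json" for build in builds]


def missing_outputs(workflow_dir, builds=DEFAULT_BUILDS):
    """List expected Auspice JSONs that do not exist under workflow_dir."""
    root = Path(workflow_dir)
    return [p for p in expected_auspice_jsons(builds) if not (root / p).exists()]
```

From the top-level directory, `missing_outputs("phylogenetic")` returns an empty list when all three datasets are present.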
The core phylogenetic workflow uses two files output by the ingest workflow: 'metadata_all.tsv' and 'sequences_all.fasta'. Any desired data formatting and curation should be done as part of the ingest workflow.
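Because the metadata and sequences arrive as separate files, a quick consistency check can catch records that exist in one but not the other before subsampling. This is a minimal sketch using only the standard library; the ID column name `strain` is an assumption and should be adjusted to match the curated metadata.

```python
import csv


def read_fasta_ids(path):
    """Return record IDs (the first word after '>') from a FASTA file."""
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                if header:
                    ids.append(header.split()[0])
    return ids


def read_metadata_ids(path, id_column="strain"):
    """Return the values of id_column (assumed name) from a TSV metadata file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [row[id_column] for row in reader]


def compare_ids(metadata_path, sequences_path, id_column="strain"):
    """Report IDs found only in the metadata or only in the sequences."""
    meta = set(read_metadata_ids(metadata_path, id_column))
    seqs = set(read_fasta_ids(sequences_path))
    return {
        "metadata_only": sorted(meta - seqs),
        "sequences_only": sorted(seqs - meta),
    }
```

Calling `compare_ids` on local copies of 'metadata_all.tsv' and 'sequences_all.fasta' should return two empty lists if every record is paired.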
The first step in the phylogenetic workflow is to subsample (or filter) the data. The subsampling criteria are specified in phylogenetic/defaults/config.yaml and are applied in the Snakefile using wildcards and an input function. For more on subsampling, see the Nextstrain documentation on filtering and subsampling.
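The pattern of "wildcards plus a function" can be illustrated outside Snakemake. Snakemake calls such functions with the rule's wildcards object; here `SimpleNamespace` stands in for it. The config keys (`group_by`, `sequences_per_group`, `query`) and the `lineage` metadata column are hypothetical, not the workflow's actual schema. The returned flags (`--query`, `--group-by`, `--sequences-per-group`) are real `augur filter` options.

```python
from types import SimpleNamespace

# Hypothetical subsampling section, keyed by the {build} wildcard.
# The real criteria live in the workflow's defaults/config.yaml.
SUBSAMPLING = {
    "all-lineages": {"group_by": "country year", "sequences_per_group": 10},
    "lineage-2": {
        "group_by": "country",
        "sequences_per_group": 20,
        "query": "lineage == '2'",  # hypothetical metadata column
    },
}


def filter_args(wildcards):
    """Translate the criteria for wildcards.build into augur filter arguments."""
    criteria = SUBSAMPLING[wildcards.build]
    args = []
    if "query" in criteria:
        args += ["--query", criteria["query"]]
    if "group_by" in criteria:
        # --group-by accepts several column names.
        args += ["--group-by", *criteria["group_by"].split()]
    if "sequences_per_group" in criteria:
        args += ["--sequences-per-group", str(criteria["sequences_per_group"])]
    return args


# Snakemake would supply the wildcards object; we fake one here.
print(filter_args(SimpleNamespace(build="lineage-2")))
```

Keeping the per-build criteria in config and resolving them through one function means adding a build is a config change rather than a new rule.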
The defaults directory contains all of the default configurations for the phylogenetic workflow.
defaults/config.yaml contains all of the default configuration parameters
used for the phylogenetic workflow. Use Snakemake's --configfile/--config
options to override these default values.
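When an extra config file such as build-configs/washington-state/config.yaml is passed with --configfile, its values are layered over the defaults. The sketch below models that layering as a recursive dictionary merge, assuming behaviour like Snakemake's `snakemake.utils.update_config` (nested mappings merged, other values replaced). The keys shown are hypothetical.

```python
import copy


def merge_config(base, override):
    """Recursively overlay `override` onto a copy of `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars, lists, or new keys: the override wins outright.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical keys, for illustration only.
defaults = {"subsampling": {"all-lineages": {"group_by": "country year",
                                             "sequences_per_group": 10}}}
override = {"subsampling": {"all-lineages": {"sequences_per_group": 50}}}
print(merge_config(defaults, override))
```

Under this model, an override file only needs to list the values it changes; untouched nested keys such as `group_by` above keep their defaults.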
The rules directory contains separate Snakefiles (*.smk) as modules of the core phylogenetic workflow.
The modules of the workflow are in separate files to keep the main phylogenetic Snakefile succinct and organized.
The workdir is hardcoded to be the phylogenetic directory so all filepaths for
inputs/outputs should be relative to the phylogenetic directory.
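A fixed workdir means a relative path in a rule always lands in the same place, whether the build was started from the top-level directory or from inside phylogenetic. This small sketch models that resolution with `pathlib`; `resolve_in_workdir` is a hypothetical helper for illustration only.

```python
from pathlib import Path


def resolve_in_workdir(relative_path, workdir):
    """Resolve a rule's relative input/output path against the fixed workdir."""
    path = Path(relative_path)
    if path.is_absolute():
        # Absolute paths bypass the workdir, which this layout avoids.
        raise ValueError(f"expected a path relative to the workdir, got {path}")
    return Path(workdir) / path


# From the top-level directory, an Auspice output ends up under phylogenetic/.
print(resolve_in_workdir("auspice/WNV_lineage-2.json", "phylogenetic"))
```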
Modules are all included in the main Snakefile in the order that they are expected to run.
The build-configs directory contains custom configs and rules that override and/or extend the default workflow.
- chores - chores that are run separately from the main workflow
- ci - CI build that runs with example data
The example data should be updated occasionally. To update it, run the following from within the phylogenetic directory:
nextstrain build . update_example_data -F \
--configfiles defaults/config.yaml build-configs/chores/config.yaml