Phylogenetic

This workflow uses already-curated WNV metadata and sequences to produce Nextstrain datasets that can be visualized in Auspice. By default, the curated data is sourced from our public S3 bucket, and the workflow produces three analyses:

  1. all-lineages (auspice/WNV_all-lineages.json)
  2. Lineage 1A (auspice/WNV_lineage-1A.json)
  3. Lineage 2 (auspice/WNV_lineage-2.json)
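The three outputs share one naming pattern, auspice/WNV_{build}.json. A Snakemake workflow typically requests such a family of targets by expanding a single wildcard pattern. The plain-Python sketch below imitates that step; it is a simplified stand-in for Snakemake's expand(), and the wildcard name `build` is an assumption, not necessarily the name used in this workflow's Snakefile.

```python
from itertools import product


def expand(pattern: str, **wildcards: list[str]) -> list[str]:
    """Fill every combination of wildcard values into `pattern`.

    Simplified stand-in for Snakemake's expand(); the real function
    supports more options (e.g. custom combinators).
    """
    keys = list(wildcards)
    return [
        pattern.format(**dict(zip(keys, values)))
        for values in product(*(wildcards[k] for k in keys))
    ]


# Hypothetical wildcard name "build"; the values match the three default analyses.
targets = expand(
    "auspice/WNV_{build}.json",
    build=["all-lineages", "lineage-1A", "lineage-2"],
)
print(targets)
```

Because every dataset comes from the same pattern, adding a new analysis is mostly a matter of adding one more value for the wildcard plus its subsampling settings.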

Workflow Usage

To run all of the Nextstrain builds, run the workflow from the top-level directory:

nextstrain build phylogenetic

To run the Washington-focused build, also from the top-level directory, pass its config file:

nextstrain build phylogenetic --configfile build-configs/washington-state/config.yaml

Alternatively, the workflow can be run from within the phylogenetic directory:

cd phylogenetic
nextstrain build .

This produces the default outputs of the phylogenetic workflow:

  • auspice_json(s) = auspice/*.json
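After a build finishes, a quick sanity check is to count the tips in each output tree. The sketch below assumes the Auspice v2 JSON layout, where the top-level "tree" key holds a single root node and each internal node lists its descendants under "children". Files in auspice/ without a "tree" object (for example, sidecar files) are skipped. The helper names are hypothetical and not part of this workflow.

```python
import json
from pathlib import Path


def count_tips(root: dict) -> int:
    """Count leaf nodes (nodes with no 'children') in an Auspice v2 tree.

    Uses an explicit stack instead of recursion so very deep trees
    cannot hit Python's recursion limit.
    """
    tips = 0
    stack = [root]
    while stack:
        node = stack.pop()
        children = node.get("children", [])
        if children:
            stack.extend(children)
        else:
            tips += 1
    return tips


def summarize(auspice_dir: str = "auspice") -> dict[str, int]:
    """Map each dataset JSON in auspice_dir to its number of tips."""
    summary = {}
    for path in sorted(Path(auspice_dir).glob("*.json")):
        dataset = json.loads(path.read_text())
        # Skip files that are not main datasets (no single-root "tree").
        if isinstance(dataset, dict) and isinstance(dataset.get("tree"), dict):
            summary[path.name] = count_tips(dataset["tree"])
    return summary
```

Running summarize() from the phylogenetic directory would report one tip count per dataset, which should roughly match the number of sequences kept by subsampling.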

Data Requirements

The core phylogenetic workflow uses two files output by the ingest workflow: 'metadata_all.tsv' and 'sequences_all.fasta'. Any desired data formatting and curation should be done as part of the ingest workflow.
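Because the two files are consumed together, it can help to confirm that every sequence has a metadata row and vice versa before starting a build. The sketch below compares the IDs in both files. It assumes the metadata ID column is named `accession` and that each FASTA header begins with the same ID; both are assumptions to check against the actual ingest output. The functions take iterables of lines, so they work on open file handles as well as in-memory lists.

```python
import csv
from typing import Iterable


def fasta_ids(lines: Iterable[str]) -> set[str]:
    """Collect record IDs: the first whitespace-delimited token after '>'."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def metadata_ids(lines: Iterable[str], id_column: str = "accession") -> set[str]:
    """Collect IDs from a TSV; `id_column` is an assumed column name."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row[id_column] for row in reader}


def compare_ids(meta: set[str], seqs: set[str]) -> dict[str, list[str]]:
    """Report IDs present in one input but not the other."""
    return {
        "sequences_without_metadata": sorted(seqs - meta),
        "metadata_without_sequences": sorted(meta - seqs),
    }
```

Usage: open both files (the metadata file with `newline=""`, as the csv module recommends) and pass the handles, e.g. `compare_ids(metadata_ids(meta_handle), fasta_ids(fasta_handle))`. Non-empty lists point to records that should be fixed in the ingest workflow rather than here.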

Subsampling

The first step in the phylogenetic workflow is to subsample (or filter) the data. The subsampling criteria are specified in the phylogenetic/defaults/config.yaml file and are applied in the Snakefile using wildcards and an input function. For more on subsampling, see the Nextstrain documentation on filtering and subsampling.
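The wildcard-plus-input-function pattern works like this: Snakemake calls a function with the rule's wildcards, and the function looks up the per-build criteria in the config. The sketch below mimics that outside Snakemake, using `SimpleNamespace` in place of Snakemake's wildcards object. The config layout (`subsample`, `group_by`, `sequences_per_group`, `query`) and the values are hypothetical; only the emitted flags (`--group-by`, `--sequences-per-group`, `--query`) are real `augur filter` options.

```python
from types import SimpleNamespace

# Hypothetical config layout and values; the real keys live in defaults/config.yaml.
CONFIG = {
    "subsample": {
        "all-lineages": {"group_by": "country year", "sequences_per_group": 10},
        "lineage-1A": {"group_by": "state year", "sequences_per_group": 20,
                       "query": "lineage == '1A'"},
        "lineage-2": {"group_by": "country", "sequences_per_group": 50,
                      "query": "lineage == '2'"},
    }
}


def filter_arguments(wildcards, config=CONFIG) -> str:
    """Build augur filter arguments for the build named by wildcards.build."""
    params = config["subsample"][wildcards.build]
    args = [
        f"--group-by {params['group_by']}",
        f"--sequences-per-group {params['sequences_per_group']}",
    ]
    # Only some builds restrict to a lineage subset.
    if "query" in params:
        args.append(f'--query "{params["query"]}"')
    return " ".join(args)


print(filter_arguments(SimpleNamespace(build="lineage-2")))
```

In a Snakefile, a function like this would be passed to a rule's `params` so one filter rule serves every build; the criteria change per build while the rule itself stays the same.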

Defaults

The defaults directory contains all of the default configurations for the phylogenetic workflow.

defaults/config.yaml contains all of the default configuration parameters used for the phylogenetic workflow. Use Snakemake's --configfile/--config options to override these default values.
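Overrides are layered on top of the defaults rather than replacing the whole file, so a build config only needs to list what it changes. The sketch below is a simplified model of that layering as a recursive dictionary merge: nested mappings are merged key by key, while any other value (including a list) replaces the default outright. All keys and values here are hypothetical, chosen only to illustrate the behavior.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where override's values replace base's.

    Nested dicts are merged recursively; any other value, including a
    list, replaces the base value entirely. Inputs are not modified.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical keys for illustration only.
defaults = {
    "builds": ["all-lineages", "lineage-1A", "lineage-2"],
    "filter": {"min_length": 8000, "min_date": 1950},
}
custom = {"builds": ["washington"], "filter": {"min_date": 1999}}

config = deep_merge(defaults, custom)
print(config)
```

Note that `filter.min_length` survives from the defaults while `builds` is replaced wholesale, which is why a custom config that changes a list must restate every entry it wants to keep.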

Snakefile and rules

The rules directory contains separate Snakefiles (*.smk) that serve as modules of the core phylogenetic workflow. Keeping the modules in separate files keeps the main phylogenetic Snakefile succinct and organized.

The workdir is hardcoded to the phylogenetic directory, so all file paths for inputs and outputs should be relative to the phylogenetic directory.

Modules are all included in the main Snakefile in the order that they are expected to run.

Build configs

The build-configs directory contains custom configs and rules that override and/or extend the default workflow.

Update example data

The example data should be updated occasionally. To update it, run the following from within the phylogenetic directory:

nextstrain build . update_example_data -F \
    --configfiles defaults/config.yaml build-configs/chores/config.yaml