Commit e67fd85

Merge pull request #120: Update example data
2 parents 951aea3 + f662819 commit e67fd85

7 files changed

Lines changed: 21469 additions & 8207 deletions

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -2,6 +2,10 @@
 
 We use this CHANGELOG to document breaking changes, new features, bug fixes, and config value changes that may affect both the usage of the workflows and the outputs of the workflows.
 
+## 2026
+
+* 9 March 2026: Added date annotations to some sequences from 1998 and 1999. [PR #120](https://github.com/nextstrain/WNV/pull/120) @victorlin
+
 ## 2025
 
 * 25 February 2026: Changes to files referenced in `subsample` config will trigger a re-run of the rule. [PR #116](https://github.com/nextstrain/WNV/pull/116) @victorlin
```

ingest/defaults/annotations.tsv

Lines changed: 6 additions & 1 deletion
```diff
@@ -17,6 +17,11 @@ EF631123 date XXXX-XX-XX
 DQ116961 date 2004-XX-XX
 AY603654 date 1976-XX-XX
 AM404308 date 1971-XX-XX
-AF260968 date 1951-XX-XX
 AY660002 date 2003-XX-XX
 AY268132 date 2000-XX-XX
+AF202541 date 1999-XX-XX
+AF206518 date 1999-XX-XX
+AF481864 date 1998-XX-XX
+AY277251 date 1998-XX-XX
+NC_001563 date 1937-XX-XX
+NC_009942 date 1999-XX-XX
```
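Each row of `ingest/defaults/annotations.tsv` names a record, a metadata field, and the value to use for that field; `X` marks an unknown date component, so `1999-XX-XX` pins down only the year. To make that concrete, here is a minimal Python sketch, assuming tab-separated rows and metadata records keyed by a hypothetical `accession` column. The helpers `parse_annotations`, `apply_annotations`, and `date_bounds` are illustrative names, not functions from the ingest workflow or from augur.

```python
import calendar
from datetime import date

# Hypothetical helpers for illustration only. Assumes rows of the form
# "<record id>\t<field>\t<value>" and metadata records keyed by "accession".


def parse_annotations(text):
    """Map each record ID to a {field: value} dict of overrides."""
    annotations = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        record_id, field, value = line.split("\t", 2)
        annotations.setdefault(record_id, {})[field] = value
    return annotations


def apply_annotations(records, annotations, id_column="accession"):
    """Return copies of the records with annotated fields overwritten."""
    updated = []
    for record in records:
        new_record = dict(record)
        new_record.update(annotations.get(record[id_column], {}))
        updated.append(new_record)
    return updated


def date_bounds(value):
    """Return (earliest, latest) dates consistent with a YYYY-MM-DD string.

    An "X" anywhere in a component makes that component unknown. This sketch
    returns None when the year itself is not fully known.
    """
    year_text, month_text, day_text = value.split("-")
    if "X" in year_text:
        return None
    year = int(year_text)
    if "X" in month_text:
        return date(year, 1, 1), date(year, 12, 31)
    month = int(month_text)
    if "X" in day_text:
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    exact = date(year, month, int(day_text))
    return exact, exact


if __name__ == "__main__":
    rows = "AF202541\tdate\t1999-XX-XX\nNC_001563\tdate\t1937-XX-XX\n"
    annotations = parse_annotations(rows)
    print(apply_annotations([{"accession": "AF202541", "date": ""}], annotations))
    print(date_bounds("1999-XX-XX"))
```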

phylogenetic/README.md

Lines changed: 14 additions & 2 deletions
```diff
@@ -38,7 +38,7 @@ Any desired data formatting and curations should be done as part of the [ingest]
 
 The first step in the phylogenetic workflow is to subsample (or filter) the data. The subsampling criteria are specified in the
 phylogenetic/config/defaults.yaml file. The criteria are then executed in the Snakefile using wildcards and an input function.
-Documentation about subsampling can be found here [filtering and subsampling] (https://docs.nextstrain.org/en/latest/guides/bioinformatics/filtering-and-subsampling.html#subsampling-within-augur-filter)
+Documentation about subsampling can be found here [filtering and subsampling](https://docs.nextstrain.org/en/latest/guides/bioinformatics/filtering-and-subsampling.html#subsampling-within-augur-filter)
 
 
 ## Defaults
@@ -65,6 +65,18 @@ in the main Snakefile in the order that they are expected to run.
 The build-configs directory contains custom configs and rules that override and/or
 extend the default workflow.
 
-- [ci](build-configs/ci/) - CI build that runs with example data
+- [chores](build-configs/chores/) - chores that are run separately from the main workflow
+- [ci](build-configs/ci/) - [CI][] build that runs with [example data][]
 
+## Update example data
+
+[example data][] should be updated occasionally. To update, run:
+
+```bash
+nextstrain build . update_example_data -F \
+--configfiles defaults/config.yaml build-configs/chores/config.yaml
+```
+
+[CI]: https://github.com/nextstrain/WNV/actions/workflows/ci.yaml
+[example data]: ./example_data/
 [Nextstrain datasets]: https://docs.nextstrain.org/en/latest/reference/glossary.html#term-dataset
```
Lines changed: 29 additions & 0 deletions
```diff
@@ -0,0 +1,29 @@
+rule update_example_data:
+    """This updates the files under example_data/ based on latest available data from data.nextstrain.org.
+
+    The subset of data is generated by an augur filter call which:
+    - sets the subsampling size to 50
+    - samples evenly across time
+    - adds everything in defaults/all-lineages/include.txt
+    """
+    input:
+        sequences="results/sequences.fasta",
+        metadata="results/metadata.tsv",
+    output:
+        sequences="example_data/sequences.fasta",
+        metadata="example_data/metadata.tsv",
+    params:
+        strain_id=config["strain_id_field"],
+    shell:
+        r"""
+        augur filter \
+            --metadata {input.metadata} \
+            --metadata-id-columns {params.strain_id} \
+            --sequences {input.sequences} \
+            --subsample-max-sequences 50 \
+            --group-by month \
+            --subsample-seed 0 \
+            --include defaults/all-lineages/include.txt \
+            --output-metadata {output.metadata} \
+            --output-sequences {output.sequences}
+        """
```
Lines changed: 2 additions & 0 deletions
```diff
@@ -0,0 +1,2 @@
+custom_rules:
+- build-configs/chores/chores.smk
```
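The README's update command layers two files with `--configfiles defaults/config.yaml build-configs/chores/config.yaml`, and the two-line config added above contributes only a `custom_rules` entry. Snakemake combines multiple config files with later files taking precedence; the hypothetical `merge_configs` helper below approximates that layering on already-parsed dictionaries (recursive merge for nested mappings, outright replacement for everything else). It is a sketch for intuition, not Snakemake's implementation, and the example values other than the `custom_rules` path are made up.

```python
import copy


def merge_configs(*configs):
    """Merge parsed config dicts from left to right; later configs win.

    Nested dicts are merged key by key; any other value (including lists)
    is replaced outright by the later config's value.
    """
    merged = {}
    for config in configs:
        _merge_into(merged, config)
    return merged


def _merge_into(target, overrides):
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _merge_into(target[key], value)  # recurse into nested mappings
        else:
            target[key] = copy.deepcopy(value)  # copy so inputs stay untouched


if __name__ == "__main__":
    # Hypothetical contents; only the custom_rules path comes from this commit.
    defaults = {"strain_id_field": "accession"}
    chores = {"custom_rules": ["build-configs/chores/chores.smk"]}
    print(merge_configs(defaults, chores))
```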

0 commit comments

Comments
 (0)