Add species to ingest/nextstrain-automation#56
Both the ingest and phylo workflows completed successfully in the GitHub Actions workflow. I'll merge this before tomorrow's automated run if there's no feedback.
```yaml
bdbv/metadata.tsv.zst: results/bdbv/metadata.tsv
bdbv/metadata_open.tsv.zst: results/bdbv/metadata_open.tsv
bdbv/sequences.fasta.zst: results/bdbv/sequences.fasta
bdbv/sequences_open.fasta.zst: results/bdbv/sequences_open.fasta
sudv/metadata.tsv.zst: results/sudv/metadata.tsv
sudv/metadata_open.tsv.zst: results/sudv/metadata_open.tsv
sudv/sequences.fasta.zst: results/sudv/sequences.fasta
sudv/sequences_open.fasta.zst: results/sudv/sequences_open.fasta
```
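Each line above maps a compressed S3 object key to the local ingest output it is built from. As a minimal sketch (the helper name `upload_mapping` and the file list are assumptions read off this diff, not code from the repository), the same mapping can be generated for any list of species:

```python
# Hypothetical helper: rebuild the remote-key -> local-path mapping shown above
# for any list of species. The naming pattern is taken from the diff; the
# function itself is illustrative and not part of the workflow.
from typing import Dict, Iterable

FILES = ("metadata.tsv", "metadata_open.tsv", "sequences.fasta", "sequences_open.fasta")


def upload_mapping(species: Iterable[str]) -> Dict[str, str]:
    """Map each compressed S3 key to the uncompressed local result file."""
    mapping: Dict[str, str] = {}
    for sp in species:
        for name in FILES:
            # Remote objects are the local files with a .zst suffix added.
            mapping[f"{sp}/{name}.zst"] = f"results/{sp}/{name}"
    return mapping


if __name__ == "__main__":
    for key, path in upload_mapping(["bdbv", "sudv"]).items():
        print(f"{key}: {path}")
```

Passing a shorter species list (for example only `["ebov"]`) shrinks the upload set accordingly, which is the same lever the later discussion uses to stop uploading files nobody consumes yet.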
Tangentially (i.e., non-blocking): the ingest for bdbv and sudv is so quick that it makes me wonder whether there are situations or pathogens where we'd skip the intermediate files and just re-ingest from PPX each time. Has this come up for other pathogens?
Oh, hmm, I don't think that's been considered elsewhere.
If we were using the standard inputs config that points to the local ingest outputs, I'd think this would be pretty easy to do without any custom workflow handling:
```yaml
# phylogenetic/defaults/bdbv/config.yaml
inputs:
  - name: bdbv
    metadata: ../ingest/results/bdbv/metadata.tsv
    sequences: ../ingest/results/bdbv/sequences.fasta
```

```sh
nextstrain build ingest && nextstrain build phylogenetic --configfile defaults/bdbv/config.yaml
```
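For illustration, here is how a workflow might read that `inputs` entry once Snakemake has loaded the YAML into its global `config` dict. The `input_paths` helper is hypothetical, and the phylogenetic workflow may handle `inputs` differently; the point is only that each input is a name plus local metadata and sequences paths, so no S3 download is needed when ingest runs first.

```python
# Minimal sketch, assuming the `inputs` list from the YAML above has already
# been loaded into a dict (as Snakemake does with its global `config`).
# `input_paths` is a hypothetical helper, not part of the Nextstrain workflows.
from typing import Dict


def input_paths(config: dict) -> Dict[str, Dict[str, str]]:
    """Index each input by name, checking that the required keys are present."""
    result: Dict[str, Dict[str, str]] = {}
    for entry in config.get("inputs", []):
        missing = [k for k in ("name", "metadata", "sequences") if k not in entry]
        if missing:
            raise ValueError(f"input entry is missing {missing}: {entry}")
        result[entry["name"]] = {
            "metadata": entry["metadata"],
            "sequences": entry["sequences"],
        }
    return result


config = {
    "inputs": [
        {
            "name": "bdbv",
            "metadata": "../ingest/results/bdbv/metadata.tsv",
            "sequences": "../ingest/results/bdbv/sequences.fasta",
        }
    ]
}
print(input_paths(config)["bdbv"]["sequences"])
```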
The BDBV/SUDV builds in #54 are currently hardcoded to use only local ingest files.
Ah, then the ingest workflow should be updated so that it can ingest a single species. We can change the hardcoded `SPECIES` to

```python
SPECIES = config["species"]
```

Then you should be able to run

```sh
nextstrain build ingest --config species="['bdbv']" && nextstrain build phylogenetic -s species-workflows/bdbv.snakefile
```
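A slightly stricter variant of `SPECIES = config["species"]`, sketched under stated assumptions: the supported species are taken to be ebov, bdbv and sudv, the helper name `resolve_species` is hypothetical, and Snakemake is assumed to have already parsed `--config species="['bdbv']"` into a list. If the value instead arrived as the literal string `"['bdbv']"`, this helper would reject it as an unknown species rather than passing it through.

```python
# Sketch of a stricter `SPECIES = config["species"]`.
# Assumptions (hypothetical, not from the PR): the supported species are
# ebov, bdbv and sudv, and a bare string means a single species.
from typing import List, Sequence, Union

SUPPORTED = ("ebov", "bdbv", "sudv")


def resolve_species(value: Union[str, Sequence[str]]) -> List[str]:
    """Return the requested species as a list, rejecting unknown names."""
    species = [value] if isinstance(value, str) else list(value)
    unknown = sorted(set(species) - set(SUPPORTED))
    if unknown:
        raise ValueError(f"unsupported species: {unknown}")
    return species


# In a Snakefile this might read: SPECIES = resolve_species(config["species"])
print(resolve_species(["bdbv"]))
print(resolve_species("sudv"))
```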
Added ingest config for species in c97ed1a.
But also, on reflection, let's drop the S3 uploads for files we ourselves are not (yet) using.
Okay, the automated workflow only runs ebov as of f077ccb.
Also deleted the bdbv/sudv objects on S3 so we don't get confused.
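One way to express "only ebov in the automated workflow" is for the automation config to override the default species list. The config values and the merge rule below are assumptions, not taken from the PR: nested dicts are merged recursively and other values (including lists) are replaced outright. `merge_config` is a hypothetical helper.

```python
# Hypothetical illustration: defaults list all three species, while the
# automation config narrows `species` to ebov. Nested dicts merge recursively;
# any other value in the override (including lists) replaces the base value.
from copy import deepcopy


def merge_config(base: dict, override: dict) -> dict:
    """Return a new dict with `override` layered on top of `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


defaults = {"species": ["ebov", "bdbv", "sudv"]}
automation = {"species": ["ebov"]}
print(merge_config(defaults, automation)["species"])
```

Replacing lists rather than concatenating them is what lets the override drop bdbv and sudv; with concatenation the automated run would still ingest all three.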
Add ability to configure species so that users can run ingest for a subset of species. Motivated by discussion in <#56 (comment)>
Note that only ebov has Nextclade outputs for now. This still follows the old pattern of all-data files plus separate `_open` files for PPX data, because the phylo workflow does not support multiple inputs yet. Follow up to <#53>
Run the phylo job regardless of the cache check for manual runs triggered via `workflow_dispatch`, since manual runs are expected to be used for testing and for forcing a full workflow run.
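The condition in this commit can be modelled as a small predicate. In the actual workflow it would be a GitHub Actions `if:` expression that checks `github.event_name`; the `cache_hit` flag here is a hypothetical stand-in for whatever the cache-check step reports, and treating a cache hit as "skip" for non-manual triggers is an assumption.

```python
# Sketch of the phylo-job condition described above.
# "workflow_dispatch" and "schedule" are real GitHub Actions event names;
# `cache_hit` is a hypothetical flag standing in for the cache-check result.


def should_run_phylo(event_name: str, cache_hit: bool) -> bool:
    """Return True when the phylo job should run."""
    if event_name == "workflow_dispatch":
        # Manual runs are for testing or forcing a full run, so never skip them.
        return True
    # Other triggers (e.g. the scheduled run) skip phylo when nothing changed.
    return not cache_hit


for event, hit in [("workflow_dispatch", True), ("schedule", True), ("schedule", False)]:
    print(event, hit, should_run_phylo(event, hit))
```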
bdbv/sudv workflows do not pull data from S3 (yet), so only run the ebov ingest for now.
Description of proposed changes
Related issue(s)
Follow up to #53
Resolves #55
Checklist