Add species to ingest/nextstrain-automation by joverlee521 · Pull Request #56 · nextstrain/ebola

joverlee521 · 2026-05-18T17:41:57Z

Description of proposed changes

Add species to ingest/nextstrain-automation uploads
Update phylo/all-outbreaks to start from ebov files

Related issue(s)

Follow up to #53
Resolves #55

Checklist

Checks pass
Update changelog
ingest-to-phylo run
ingest-to-phylo run #2

joverlee521 · 2026-05-18T18:37:35Z

Both the ingest and phylo workflows completed successfully in the GH Action workflow. I'll merge this before tomorrow's automated run if there's no feedback.

jameshadfield

Thanks Jover!

jameshadfield · 2026-05-18T18:57:58Z

+  bdbv/metadata.tsv.zst: results/bdbv/metadata.tsv
+  bdbv/metadata_open.tsv.zst: results/bdbv/metadata_open.tsv
+  bdbv/sequences.fasta.zst: results/bdbv/sequences.fasta
+  bdbv/sequences_open.fasta.zst: results/bdbv/sequences_open.fasta
+  sudv/metadata.tsv.zst: results/sudv/metadata.tsv
+  sudv/metadata_open.tsv.zst: results/sudv/metadata_open.tsv
+  sudv/sequences.fasta.zst: results/sudv/sequences.fasta
+  sudv/sequences_open.fasta.zst: results/sudv/sequences_open.fasta
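The upload entries in this diff follow a regular pattern: each remote `.zst` key maps to a local path under `results/<species>/`. As a rough sketch only (the helper name `upload_entries` and its parameters are hypothetical and not part of this repository), the same mapping could be generated per species like this:

```python
# Hypothetical helper: builds the "remote key -> local file" pairs shown in the diff.
# Only the file naming pattern comes from the PR; the function itself is an assumption.
def upload_entries(species, include_open=True):
    """Return a dict mapping remote .zst keys to local ingest result paths."""
    suffixes = ["", "_open"] if include_open else [""]
    entries = {}
    for name in species:
        for stem, ext in (("metadata", "tsv"), ("sequences", "fasta")):
            for suffix in suffixes:
                remote = f"{name}/{stem}{suffix}.{ext}.zst"
                local = f"results/{name}/{stem}{suffix}.{ext}"
                entries[remote] = local
    return entries


if __name__ == "__main__":
    for remote, local in upload_entries(["bdbv", "sudv"]).items():
        print(f"{remote}: {local}")
```

Calling `upload_entries(["bdbv", "sudv"])` yields the eight entries added in the diff, in the same order.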


Tangentially (i.e. not-blocking), the ingest for bdbv & sudv is so quick it makes me wonder if there are situations / pathogens where we'd skip intermediate files and just re-ingest from PPX each time. Has this come up for other pathogens?

Oh, hmm, I don't think that's been considered elsewhere.

If we were using the standard inputs config that points to the local ingest outputs, I'd think this would be pretty easy to do without any custom workflow handling:

# phylogenetic/defaults/bdbv/config.yaml
inputs:
  - name: bdbv
    metadata: ../ingest/results/bdbv/metadata.tsv
    sequences: ../ingest/results/bdbv/sequences.fasta

nextstrain build ingest && nextstrain build phylogenetic --configfile defaults/bdbv/config.yaml

The BDBV/SUDV builds in #54 are hardcoded to only use local ingest files at the moment.

Ah, the ingest workflow should be updated to be able to ingest a single species then. We can update the hardcoded SPECIES to

SPECIES = config["species"]

Then you should be able to run

nextstrain build ingest --config species="['bdbv']" && nextstrain build phylogenetic -s species-workflows/bdbv.snakefile
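To illustrate the `SPECIES = config["species"]` change suggested above, here is a minimal sketch of how a workflow might normalize and validate that value. It assumes Snakemake parses `--config species="['bdbv']"` into a Python list (its usual YAML-style handling of `--config` values). The helper `resolve_species` and the `KNOWN_SPECIES` tuple are hypothetical; only the species names come from this PR.

```python
# Hypothetical helper; the species names appear in this PR, everything else is assumed.
KNOWN_SPECIES = ("ebov", "bdbv", "sudv")


def resolve_species(config, known=KNOWN_SPECIES):
    """Return the list of species to ingest, defaulting to every known species."""
    requested = config.get("species", list(known))
    # Accept a bare string such as species=bdbv as a one-element list.
    if isinstance(requested, str):
        requested = [requested]
    unknown = [name for name in requested if name not in known]
    if unknown:
        raise ValueError(f"Unknown species in config: {unknown}")
    return list(requested)


if __name__ == "__main__":
    # Stand-in for the dict Snakemake would provide as `config`.
    print(resolve_species({"species": ["bdbv"]}))
```

In a Snakefile this would be used as `SPECIES = resolve_species(config)`, so a typo in the species name fails early instead of producing an empty run.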

Added ingest config for species in c97ed1a.

But also, upon reflection, let's drop the S3 uploads for files we ourselves are not (yet) using

Okay, only running ebov in the automated workflow with f077ccb.

f077ccb ⭐

Also deleted the bdbv/sudv objects on S3 so we don't get confused.


jameshadfield approved these changes May 18, 2026


joverlee521 added a commit that referenced this pull request May 18, 2026

ingest: add species config param

ba334ef

Add ability to configure species so that users can run ingest for a subset of species. Motivated by discussion in <#56 (comment)>

joverlee521 added 5 commits May 18, 2026 16:37

ingest/nextstrain-automation: Upload species specific files

2a3aec3

Note that only ebov has Nextclade outputs for now. This still follows the old pattern of all data files and separate OPEN files for PPX data because the phylo workflow does not support multiple inputs yet. Follow up to <#53>

phylo/all-outbreaks: add ebov prefix to inputs

e6c9805

.github/ingest-to-phylo: add ebov prefix to cache check

022c7e3

.github/ingest-to-phylo: workflow_dispatch always runs phylo job

7583c82

Run the phylo job regardless of cache check for manual runs via `workflow_dispatch` since manual runs are expected to be used for testing and for forcing the full workflow run.
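The decision described in this commit message can be sketched as a small predicate. This is an illustration of the logic only, not the actual GitHub Actions `if:` expression used in the workflow. The function name and the `cache_hit` parameter are assumptions: `cache_hit` is taken to mean the cache check found nothing new, so phylo would normally be skipped.

```python
# Illustrative only: the real condition lives in the GitHub Actions workflow file.
def should_run_phylo(event_name: str, cache_hit: bool) -> bool:
    """Decide whether the phylo job should run after ingest."""
    # Manual runs are used for testing and for forcing the full workflow,
    # so they bypass the cache check entirely.
    if event_name == "workflow_dispatch":
        return True
    # Other runs (e.g. scheduled) only continue when the cache check missed.
    return not cache_hit


if __name__ == "__main__":
    print(should_run_phylo("workflow_dispatch", cache_hit=True))
    print(should_run_phylo("schedule", cache_hit=True))
```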

ingest: add species config param

c97ed1a

Add ability to configure species so that users can run ingest for a subset of species. Motivated by discussion in <#56 (comment)>

joverlee521 force-pushed the fix-species-ingest branch from ba334ef to c97ed1a Compare May 18, 2026 23:38

ingest/nextstrain-automation: Only run ebov

f077ccb

bdbv/sudv workflows do not pull data from S3 (yet), so only run the ebov ingest for now.

joverlee521 merged commit 1ab9e80 into main May 19, 2026
5 checks passed

joverlee521 deleted the fix-species-ingest branch May 19, 2026 00:28

joverlee521 merged 6 commits into main from fix-species-ingest
