Note
External users can ignore this directory! This build is tailored for the internal Nextstrain team to run manually to upload GISAID files to our private AWS S3 bucket.
This workflow uploads files which have been manually downloaded from GISAID to S3, storing them as "unprocessed" pairs of files.
This workflow is expected to be run manually after downloading files from GISAID. The GISAID files are expected to be saved as
<YYYY-MM-DD-N>-metadata.tsv
<YYYY-MM-DD-N>-sequences.fasta
<YYYY-MM-DD> is the date the files were downloaded from GISAID.
<N> is the sequence number of the download on that date, needed because GISAID limits the number of records per download.
For example, if you had to split the data between two downloads on 2025-04-11, then save the files as
2025-04-11-1-metadata.tsv
2025-04-11-1-sequences.fasta
2025-04-11-2-metadata.tsv
2025-04-11-2-sequences.fasta
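The naming convention can be checked before uploading. The following is a minimal sketch for a POSIX shell. check_gisaid_pairs is a hypothetical helper and not part of ncov-ingest. It prints the name of every complete pair in a directory and returns a non-zero status if a name does not match <YYYY-MM-DD-N> or a pair is missing one of its two files.

```shell
# Hypothetical helper, not part of ncov-ingest.
# Prints each complete pair name found in a directory (default: data) and
# exits non-zero if a name is malformed or a pair is missing a file.
# The body runs in a subshell, so its variables do not leak to the caller.
check_gisaid_pairs() (
    data_dir="${1:-data}"
    status=0

    # Collect the unique <YYYY-MM-DD-N> prefixes from both file types.
    pairs="$(
        for file in "$data_dir"/*-metadata.tsv "$data_dir"/*-sequences.fasta; do
            [ -e "$file" ] || continue   # an unmatched glob stays literal; skip it
            name="${file##*/}"
            name="${name%-metadata.tsv}"
            echo "${name%-sequences.fasta}"
        done | sort -u
    )"

    for pair in $pairs; do
        # Strip the date prefix; what remains must be the download number.
        n="${pair#[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-}"
        case "$n" in
            "$pair" | "" | *[!0-9]*)
                echo "unexpected name: $pair" >&2
                status=1
                continue
                ;;
        esac
        if [ -f "$data_dir/${pair}-metadata.tsv" ] && [ -f "$data_dir/${pair}-sequences.fasta" ]; then
            echo "$pair"
        else
            echo "incomplete pair: $pair" >&2
            status=1
        fi
    done
    exit "$status"
)
```

With the four example files in place, running check_gisaid_pairs on their directory prints 2025-04-11-1 and 2025-04-11-2, one per line.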
The directory in which you save the GISAID files depends on which command you are using to run the workflow.
When running with nextstrain run, you can save the GISAID files in the data
directory within any arbitrary analysis directory. However, you must also create
a config.yaml within the analysis directory to specify the gisaid_pairs to upload.
Continuing the example above, your analysis directory should look like
<analysis-dir>
├── config.yaml
└── data
├── 2025-04-11-1-metadata.tsv
├── 2025-04-11-1-sequences.fasta
├── 2025-04-11-2-metadata.tsv
└── 2025-04-11-2-sequences.fasta
The config.yaml specifies the gisaid_pairs you want to upload:
gisaid_pairs:
- 2025-04-11-1
- 2025-04-11-2
Make sure you have the latest ncov-ingest pathogen setup.
$ nextstrain update ncov-ingest@master
Checking for newer versions of Nextstrain CLI…
nextstrain-cli is up to date!
Updating ncov-ingest@master pathogen version…
'ncov-ingest@master' already up-to-date.
Updated ncov-ingest@master pathogen version!
All updates successful!
Then run the workflow
nextstrain run \
--env AWS_ACCESS_KEY_ID \
--env AWS_SECRET_ACCESS_KEY \
ncov-ingest@master \
manual-upload \
<analysis-dir>
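Before running, the config.yaml for this setup can be written from the shell rather than by hand. The following is a minimal sketch for a POSIX shell. write_gisaid_config is a hypothetical helper and not part of ncov-ingest. It takes the analysis directory followed by the pair names and overwrites any existing config.yaml there.

```shell
# Hypothetical helper, not part of ncov-ingest.
# Usage: write_gisaid_config <analysis-dir> <pair> [<pair> ...]
# Writes <analysis-dir>/config.yaml with the given pairs under gisaid_pairs.
# The body runs in a subshell, so its variables do not leak to the caller.
write_gisaid_config() (
    analysis_dir="$1"
    shift
    {
        echo "gisaid_pairs:"
        for pair in "$@"; do
            printf '  - %s\n' "$pair"
        done
    } > "$analysis_dir/config.yaml"
)
```

For the example above, the call would be write_gisaid_config followed by the analysis directory, 2025-04-11-1 and 2025-04-11-2.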
When running with nextstrain build, the files must be saved within the
ncov-ingest repo.
Save the downloaded GISAID metadata and sequences as:
manual-upload/data/<YYYY-MM-DD-N>-metadata.tsv
manual-upload/data/<YYYY-MM-DD-N>-sequences.fasta
The workflow can be run from the top-level directory of the pathogen repo with:
nextstrain build \
--env AWS_ACCESS_KEY_ID \
--env AWS_SECRET_ACCESS_KEY \
manual-upload \
--config gisaid_pairs='["2025-04-11-1", "2025-04-11-2"]'
You need to have AWS credentials with permissions to upload to the private
AWS S3 bucket nextstrain-ncov-private, set in the environment variables:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
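Both commands name these variables with --env, so they need to be set in the shell that launches the workflow. The following is a minimal pre-flight sketch for a POSIX shell. check_aws_env is a hypothetical helper and not part of ncov-ingest. It only confirms that the variables are non-empty, not that the credentials are valid or have upload permission.

```shell
# Hypothetical pre-flight check, not part of ncov-ingest.
# Exits non-zero and names each required variable that is unset or empty.
# The body runs in a subshell, so its variables do not leak to the caller.
check_aws_env() (
    status=0
    if [ -z "${AWS_ACCESS_KEY_ID:-}" ]; then
        echo "missing environment variable: AWS_ACCESS_KEY_ID" >&2
        status=1
    fi
    if [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "missing environment variable: AWS_SECRET_ACCESS_KEY" >&2
        status=1
    fi
    exit "$status"
)
```

Running check_aws_env before nextstrain run or nextstrain build surfaces a missing variable before the workflow starts.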