Nextstrain manual upload

Note

External users can ignore this directory! This build is tailored for the internal Nextstrain team to run manually to upload GISAID files to our private AWS S3 bucket.

This workflow takes files that have been manually downloaded from GISAID and uploads them to S3, storing them as "unprocessed" pairs of files.

Run the workflow

Run this workflow manually after downloading files from GISAID. Save the GISAID files as:

<YYYY-MM-DD-N>-metadata.tsv
<YYYY-MM-DD-N>-sequences.fasta

<YYYY-MM-DD> is the date the files were downloaded from GISAID. <N> is the number of the download since GISAID limits the number of records per download.

For example, if you had to split the data between two downloads on 2025-04-11, save the files as:

2025-04-11-1-metadata.tsv
2025-04-11-1-sequences.fasta
2025-04-11-2-metadata.tsv
2025-04-11-2-sequences.fasta
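
Before uploading, it can help to confirm that every download has both of its files. The following is a minimal sketch of such a check, not part of the workflow itself; the function name list_gisaid_pairs is hypothetical, and the pattern only loosely checks the <YYYY-MM-DD-N> shape.

```shell
# Hypothetical helper: print the <YYYY-MM-DD-N> prefix of every complete
# metadata/sequences pair in a directory, and warn on stderr about the rest.
list_gisaid_pairs() {
    dir="$1"
    for metadata in "$dir"/*-metadata.tsv; do
        # Skip the unexpanded glob pattern when no metadata files exist
        [ -e "$metadata" ] || continue
        prefix="$(basename "$metadata" -metadata.tsv)"
        # Loose check: the prefix must start with YYYY-MM-DD- followed by a digit
        case "$prefix" in
            [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9]*) ;;
            *) echo "Unexpected file name: $metadata" >&2; continue ;;
        esac
        if [ -e "$dir/$prefix-sequences.fasta" ]; then
            echo "$prefix"
        else
            echo "Missing $prefix-sequences.fasta" >&2
        fi
    done
}

# Usage: list_gisaid_pairs <directory-with-gisaid-files>
```

The check starts from the metadata files, so a sequences file without matching metadata is not reported.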

The directory in which you save the GISAID files depends on which command you are using to run the workflow.

With `nextstrain run`

When running with nextstrain run, you can save the GISAID files in the data directory of any analysis directory. You must also create a config.yaml within the analysis directory to specify the gisaid_pairs to upload.

Continuing the example above, your analysis directory should look like:

<analysis-dir>
├── config.yaml
└── data
    ├── 2025-04-11-1-metadata.tsv
    ├── 2025-04-11-1-sequences.fasta
    ├── 2025-04-11-2-metadata.tsv
    └── 2025-04-11-2-sequences.fasta

The config.yaml lists the gisaid_pairs you want to upload:

gisaid_pairs:
 - 2025-04-11-1
 - 2025-04-11-2
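
With many downloads, the config.yaml can be generated rather than typed by hand. A minimal sketch, assuming the prefixes are passed as arguments; write_gisaid_config is a hypothetical helper and not part of ncov-ingest.

```shell
# Hypothetical helper: print a config.yaml body that lists each
# <YYYY-MM-DD-N> prefix given as an argument under gisaid_pairs.
write_gisaid_config() {
    echo "gisaid_pairs:"
    for prefix in "$@"; do
        printf ' - %s\n' "$prefix"
    done
}

# Usage, from inside the analysis directory:
#   write_gisaid_config 2025-04-11-1 2025-04-11-2 > config.yaml
```

The output uses the same layout as the example config.yaml, so it can be redirected straight into the file.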

Make sure you have the latest ncov-ingest pathogen setup:

$ nextstrain update ncov-ingest@master
Checking for newer versions of Nextstrain CLI…

nextstrain-cli is up to date!

Updating ncov-ingest@master pathogen version…
'ncov-ingest@master' already up-to-date.

Updated ncov-ingest@master pathogen version!

All updates successful!

Then run the workflow:

nextstrain run \
    --env AWS_ACCESS_KEY_ID \
    --env AWS_SECRET_ACCESS_KEY \
    ncov-ingest@master \
    manual-upload \
    <analysis-directory>

With `nextstrain build`

When running with nextstrain build, the files must be saved within the ncov-ingest repo.

Save the downloaded GISAID metadata and sequences as:

manual-upload/data/<YYYY-MM-DD-N>-metadata.tsv
manual-upload/data/<YYYY-MM-DD-N>-sequences.fasta

The workflow can be run from the top-level pathogen repo directory with:

nextstrain build \
    --env AWS_ACCESS_KEY_ID \
    --env AWS_SECRET_ACCESS_KEY \
    manual-upload \
        --config 'gisaid_pairs=["2025-04-11-1", "2025-04-11-2"]'
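
The gisaid_pairs list has to reach the workflow as a single argument. Left unquoted, the shell strips the double quotes and splits the value at the space. As a minimal sketch, the value can be built from the prefixes and passed inside a quoted command substitution; format_gisaid_pairs is a hypothetical helper and not part of ncov-ingest.

```shell
# Hypothetical helper: format the prefixes given as arguments into a
# single key=value string such as gisaid_pairs=["a", "b"].
format_gisaid_pairs() {
    list=""
    for prefix in "$@"; do
        # Append a comma separator before every item except the first
        list="${list:+$list, }\"$prefix\""
    done
    printf 'gisaid_pairs=[%s]\n' "$list"
}

# Usage: keep the command substitution inside double quotes so the
# shell passes the result as one argument.
#   --config "$(format_gisaid_pairs 2025-04-11-1 2025-04-11-2)"
```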

Required environment variables

You need AWS credentials with permission to upload to the private AWS S3 bucket nextstrain-ncov-private, provided through the following environment variables:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
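
A missing variable can be caught before starting the workflow. A minimal sketch, assuming a POSIX shell; check_aws_env is a hypothetical helper. It only checks that the variables are non-empty in the current shell, not that the credentials are valid or exported.

```shell
# Hypothetical helper: return non-zero and name the variable on stderr
# if either required AWS variable is unset or empty.
check_aws_env() {
    status=0
    if [ -z "${AWS_ACCESS_KEY_ID:-}" ]; then
        echo "AWS_ACCESS_KEY_ID is not set" >&2
        status=1
    fi
    if [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "AWS_SECRET_ACCESS_KEY is not set" >&2
        status=1
    fi
    return "$status"
}

# Usage: check_aws_env && echo "AWS variables are set"
```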
