This Nextflow workflow is designed to process a sample sheet (samplesheet.csv), retrieve files from Synapse based on entityId, and upload them to an AWS S3 bucket.
The workflow consists of two main steps:
- `synapse_get`: Downloads the files from Synapse using the `entityId` from the sample sheet.
- `cds_upload`: Uploads the downloaded files to a specified AWS S3 bucket.
nextflow run ncihtan/nf-cdstransfer --input samplesheet.csv

| Parameter | Type | Description |
|---|---|---|
| input | str | Path to input samplesheet CSV file containing entityId and aws_uri columns. Required. |
| take_n | int | Number of samples to process from the samplesheet. Use -1 to process all samples. Default: -1 |
| dryrun | bool | If true, adds --dryrun flag to AWS copy commands for testing without actual file transfer. Default: false |
| aws_secret_prefix | str | Prefix for AWS credential environment variables. Used to construct variable names like ${aws_secret_prefix}_AWS_ACCESS_KEY_ID. Useful for managing multiple AWS credential sets. Default: "" |
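The `take_n` semantics in the parameter table can be illustrated with a small shell sketch. This is not the workflow's own implementation; the helper name `take_n` and the assumption that the samplesheet has a single header row are only for illustration.

```shell
# Illustrative helper (hypothetical, not part of the workflow):
# print the header plus the first N data rows of a samplesheet.
# A value of -1 means "all rows", mirroring the take_n parameter.
take_n() {
  sheet="$1"   # path to the samplesheet CSV
  n="$2"       # number of samples to keep, or -1 for all
  if [ "$n" -eq -1 ]; then
    cat "$sheet"
  else
    # +1 so that the header row is kept alongside the N data rows
    head -n "$(( n + 1 ))" "$sheet"
  fi
}

# Example usage:
#   take_n samplesheet.csv 5    # header + first 5 samples
#   take_n samplesheet.csv -1   # whole samplesheet
```

Running a small subset first, together with `--dryrun true`, is one way to check a samplesheet before a full transfer.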
The workflow transfers files from Synapse to CDS (Cloud Data Service) in four main steps:
- Read and parse input samplesheet
- Download files from Synapse
- Upload files to CDS S3 bucket
- Generate transfer report
Nextflow secrets are used to ensure that tokens, keys, and secrets are not exposed.
The following Nextflow secrets should be set:
- `SYNAPSE_AUTH_TOKEN`: Synapse authentication token.
- `<params.aws_secret_prefix>_AWS_ACCESS_KEY_ID`: AWS access key ID, e.g. `CDS_AWS_ACCESS_KEY_ID`.
- `<params.aws_secret_prefix>_AWS_SECRET_ACCESS_KEY`: AWS secret access key, e.g. `CDS_AWS_SECRET_ACCESS_KEY`.
nextflow secrets set SYNAPSE_AUTH_TOKEN <SUPER_SECRET_THING>
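As a rough sketch of how the prefix determines the secret names, the hypothetical helper below prints the three names expected for a given prefix. The function `secret_names` is an illustration only and is not shipped with the workflow.

```shell
# Hypothetical helper: list the Nextflow secret names the workflow
# would look for, given a value of params.aws_secret_prefix.
secret_names() {
  prefix="$1"
  echo "SYNAPSE_AUTH_TOKEN"
  echo "${prefix}_AWS_ACCESS_KEY_ID"
  echo "${prefix}_AWS_SECRET_ACCESS_KEY"
}

# Show the names for the CDS prefix
secret_names CDS

# Each name can then be stored with `nextflow secrets set`
# (the values shown are placeholders):
#   nextflow secrets set CDS_AWS_ACCESS_KEY_ID <ACCESS_KEY_ID>
#   nextflow secrets set CDS_AWS_SECRET_ACCESS_KEY <SECRET_ACCESS_KEY>
```

`nextflow secrets list` can be used afterwards to confirm which names are stored.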
| Field | Required | Pattern | Description | Example |
|---|---|---|---|---|
| entityId | Yes | `^syn\d+$` | Synapse entity ID starting with 'syn' followed by numbers | syn123456 |
| file_url_in_cds | Yes | `^s3://.+` | URL to the file location in AWS S3, must start with 's3://' | s3://mybucket/path/to/file |
Notes:
- Additional columns are allowed but not validated
- Both fields are mapped internally:
  - `entityId` → `entityid`
  - `file_url_in_cds` → `aws_uri`
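The two patterns above can be checked before launching the workflow. The sketch below is a hypothetical pre-flight check, separate from the workflow's own schema validation. It assumes `entityId` is the first column, `file_url_in_cds` is the second, and no field contains a quoted comma. Because awk has no `\d`, the pattern is written with `[0-9]`.

```shell
# Hypothetical pre-flight check for a samplesheet.
# Assumes: column 1 = entityId, column 2 = file_url_in_cds,
# one header row, and no quoted commas inside fields.
validate_samplesheet() {
  awk -F',' '
    NR == 1 { next }                     # skip the header row
    $1 !~ /^syn[0-9]+$/ {
      printf "line %d: invalid entityId: %s\n", NR, $1
      bad = 1
    }
    $2 !~ /^s3:\/\/.+/ {
      printf "line %d: invalid file_url_in_cds: %s\n", NR, $2
      bad = 1
    }
    END { exit bad }                     # non-zero exit if any row failed
  ' "$1"
}

# Example usage:
#   validate_samplesheet samplesheet.csv && echo "samplesheet looks valid"
```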
The workflow uses the following plugins:
- `nf-schema`: For parameter validation and schema management
- `nf-boost`: For enhanced functionality and utilities
The included nextflow.config file specifies the following default options. These are used if not overridden by a custom config or profile.
docker.enabled = true
The nextflow.config file defines several profiles to customize the workflow execution. Below are the available profiles and the parameters/settings they configure:
| Setting / Profile | test | CDS | local | docker | tower |
|---|---|---|---|---|---|
| params.input | $projectDir/samplesheet.csv | - | - | - | - |
| params.aws_secret_prefix | TEST | CDS | - | - | - |
| params.dryrun | true | - | - | - | - |
| docker.enabled | true | true | true | true | true |
| process.executor | local | - | local | - | - |
| process.cpus | - | - | - | 1 * task.attempt | - |
| process.memory | - | - | - | 1.GB * task.attempt | - |
| process.maxRetries | - | - | - | 3 | - |
| process.errorStrategy | - | - | - | retry | - |
`synapse_get`: Downloads files from Synapse using entityIds.

Input:
- `meta`: Object containing `entityId` and `aws_uri`

Output:
- Tuple of (`meta`, downloaded file path)

Notes:
- Requires `SYNAPSE_AUTH_TOKEN` secret
- Uses `synapsepythonclient` container
`cds_upload`: Uploads downloaded files to CDS S3 bucket.

Input:
- Tuple of (`meta`, file path) from `synapse_get`

Output:
- Tuple of (`meta`, upload success boolean)

Notes:
- Requires AWS credentials:
  - `${aws_secret_prefix}_AWS_ACCESS_KEY_ID`
  - `${aws_secret_prefix}_AWS_SECRET_ACCESS_KEY`
- Uses AWS CLI container
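To make the effect of the `dryrun` parameter concrete, here is a minimal sketch of how an upload command might be assembled. It assumes the copy is done with `aws s3 cp`, which the text above does not state explicitly. The helper `build_upload_cmd` is hypothetical and only prints the command rather than running it.

```shell
# Hypothetical helper: print the AWS CLI copy command for one file.
# Assumes the upload uses `aws s3 cp`; --dryrun is a standard flag
# of that command that reports what would be copied without copying.
build_upload_cmd() {
  file="$1"     # local path of the downloaded file
  aws_uri="$2"  # destination, e.g. s3://mybucket/path/to/file
  dryrun="$3"   # "true" or "false"
  if [ "$dryrun" = "true" ]; then
    echo "aws s3 cp $file $aws_uri --dryrun"
  else
    echo "aws s3 cp $file $aws_uri"
  fi
}

# Example usage:
#   build_upload_cmd data.tif s3://mybucket/path/to/file true
```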
No specific outputs are generated by the workflow.
By default, a trace file is saved to reports/trace.csv.
- Ensure Nextflow is installed.
- Ensure you have access to the necessary containers (`synapseclient`, `awscli`).
- Ensure you have the appropriate credentials for Synapse and AWS.
Run the workflow with the following command:
nextflow run ncihtan/nf-cdstransfer --input path/to/samplesheet.csv

The test profile uses samplesheet.csv if it is stored in your projectDir. Generate your own samplesheet and use the aws_secret_prefix TEST when setting the relevant AWS Nextflow secrets.

nextflow run ncihtan/nf-cdstransfer -profile test

To avoid having to reset secrets when moving between destination accounts, you can set your secrets using a prefix:

nextflow secrets set MYCREDS_AWS_ACCESS_KEY_ID <ACCESS_KEY_ID>
nextflow secrets set MYCREDS_AWS_SECRET_ACCESS_KEY <SECRET_ACCESS_KEY>
nextflow run ncihtan/nf-cdstransfer --aws_secret_prefix MYCREDS

Alternatively, use a configured profile in which params.aws_secret_prefix is set:

nextflow run ncihtan/nf-cdstransfer -profile CDS --input samplesheet.csv