A home for experimental scripts used to launch Nextflow Tower workflows for iAtlas Data Processing.
These instructions assume that you already have Python, a Python version manager (pyenv), and pipenv installed.
Set up your Python environment for using these scripts by running:
pipenv install
pipenv shell
In order for the scripts leveraging py-orca to connect to Nextflow Tower, you must configure a NEXTFLOWTOWER_CONNECTION_URI in your local environment. To do so, you will need a Nextflow Tower token with access to the workspace that you wish to use (Sage-Bionetworks/iatlas-project in this case). You can then copy .env.example into a local .env file, replace <tower-access-token>, and run source .env in your terminal.
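As a quick sanity check before launching anything, a short Python sketch like the one below can confirm that the variable is visible to Python and that the placeholder was actually replaced. The helper name `check_tower_env` is hypothetical and not part of this repository or of py-orca; it only inspects the environment and does not contact Nextflow Tower or validate the token.

```python
import os

# Names taken from the setup instructions above.
VAR_NAME = "NEXTFLOWTOWER_CONNECTION_URI"
PLACEHOLDER = "<tower-access-token>"


def check_tower_env(environ=None):
    """Return a list of human-readable problems (empty if none were found).

    Hypothetical helper: it only looks at the environment mapping and
    never connects to Nextflow Tower.
    """
    env = os.environ if environ is None else environ
    value = env.get(VAR_NAME, "")
    problems = []
    if not value:
        problems.append(f"{VAR_NAME} is not set; did you run `source .env`?")
    elif PLACEHOLDER in value:
        problems.append(f"{VAR_NAME} still contains the {PLACEHOLDER} placeholder")
    return problems


if __name__ == "__main__":
    for problem in check_tower_env():
        print(problem)
```

Running the file inside the `pipenv shell` session prints nothing when the variable looks usable.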
Steps to run the Immune Subtype Classifier workflow:
- Upload all sample files to a folder on Synapse. Make sure that the only .tsv files in the folder are the ones that you want to be processed.
- Prepare the master data sheet by executing immune_subtype_classifier/prepare_data_sheet.py with three arguments:
  - parent: Synapse ID of the folder where your data files are
  - export_name: Name that you want the master data file to be exported to
  - upload_location: Synapse ID of the folder that you want to upload the master data file to
python immune_subtype_classifier/prepare_data_sheet.py <parent> <export_name> <upload_location>
- Create and store your CWL .json configuration file in the same location in Synapse as the data file produced by the previous step. Example .json file:
{
"input_file": {
"path": <export_name>,
"class": "File"
},
"input_gene_column": "gene"
}
- Create and store your nf-synstage input .csv file in an S3 bucket accessible to the Nextflow Tower workspace (s3://iatlas-project-tower-bucket or s3://iatlas-project-tower-scratch). Example .csv file:
data_file,input_file
<synapse_id_for_master_data_sheet>,<synapse_id_for_json_input_file>
- Stage the master data sheet and your CWL .json configuration file to S3 buckets by executing immune_subtype_classifier/nf_stage.py with three arguments:
  - run_name: What you want the workflow on Nextflow Tower to be named
  - input: S3 URI for your nf-synstage-friendly input .csv file
  - outdir: S3 URI for where you want the output of nf-synstage to be stored
python immune_subtype_classifier/nf_stage.py <run_name> <input> <outdir>
- Execute the Immune Subtype Classifier workflow on Nextflow Tower by executing immune_subtype_classifier/nf_launch.py with three arguments:
  - run_name: What you want the workflow on Nextflow Tower to be named
  - s3_file: S3 URI of the output from nf-synstage
  - cwl_file: File path to your CWL workflow file
python immune_subtype_classifier/nf_launch.py <run_name> <s3_file> <cwl_file>
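
The two small input files described in the steps above (the CWL .json configuration and the nf-synstage input .csv) can also be generated rather than written by hand. The following is a minimal sketch using only the Python standard library; the function names `render_cwl_config` and `render_synstage_input` are hypothetical and not part of this repository, and the Synapse IDs in the usage lines are made-up placeholders. Uploading the results to Synapse or S3 is left to the tooling you already use.

```python
import csv
import io
import json


def render_cwl_config(export_name, gene_column="gene"):
    """Return the CWL .json configuration as a string.

    Mirrors the example .json file above; json.dumps takes care of
    quoting the export name as a JSON string.
    """
    config = {
        "input_file": {"path": export_name, "class": "File"},
        "input_gene_column": gene_column,
    }
    return json.dumps(config, indent=2)


def render_synstage_input(data_sheet_id, json_config_id):
    """Return the nf-synstage input .csv as a string (header plus one row)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["data_file", "input_file"])
    writer.writerow([data_sheet_id, json_config_id])
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values for illustration only; substitute your own.
    print(render_cwl_config("master_data_sheet.tsv"))
    print(render_synstage_input("syn00000001", "syn00000002"))
```

Returning strings instead of writing files keeps the sketch easy to check by eye before anything is uploaded.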