
nf-genie

Nextflow workflow for main GENIE processing. This follows the SOP outlined in the GENIE confluence page.

Processing and developing locally

Follow instructions here for running the main GENIE processing locally.

It's recommended to use an EC2 instance with Docker to run processing and develop locally. Follow the instructions in Service-Catalog-Provisioning to create an EC2 instance via Service Catalog. You will also want to follow the SSM with SSH section if you want to use VS Code to run/develop.

Dependencies

  1. Install Nextflow and its dependencies (e.g., Java) by following the instructions here: Get started — Nextflow
  2. Be sure to pull the latest version of the main GENIE Docker image into your environment; see here for more details: GENIE Dockerhub (a combined setup sketch follows this list)
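
For quick reference, a minimal setup might look like the following (a sketch; the install one-liner is the standard one from the Nextflow docs, and the image tag matches the develop tag used in the commands below):

curl -s https://get.nextflow.io | bash               # install the nextflow launcher into the current directory
docker pull ghcr.io/sage-bionetworks/genie:develop   # pull the latest main GENIE image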

Using an EC2

For an EC2 instance with Linux and Docker, see here for installing Java 11: How do I install a software package from the Extras Library on an EC2 instance running Amazon Linux 2?
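
On Amazon Linux 2, the Extras Library route from that article boils down to roughly the following (hedged; the extras topic name may vary by AMI version):

sudo amazon-linux-extras install java-openjdk11   # install OpenJDK 11 from the Extras Library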

Configuration

Prior to running the test pipeline, you will need to create a Nextflow secret called SYNAPSE_AUTH_TOKEN with a Synapse personal access token (docs).

Authentication

This workflow takes care of transferring files to and from Synapse. Hence, it requires a secret with a personal access token for authentication. To configure Nextflow with such a token, follow these steps:

  1. Generate a personal access token (PAT) on Synapse using this dashboard. Make sure to enable the view, download, and modify scopes since this workflow both downloads and uploads to Synapse.
  2. Create a secret called SYNAPSE_AUTH_TOKEN containing a Synapse personal access token using the Nextflow CLI or Nextflow Tower (see the CLI example after this list).
  3. (Tower only) When launching the workflow, include the SYNAPSE_AUTH_TOKEN as a pipeline secret from either your user or workspace secrets.
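
For the CLI route in step 2, the secret can be set like this (a sketch; replace the placeholder with the PAT generated in step 1):

nextflow secrets set SYNAPSE_AUTH_TOKEN <your_personal_access_token>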

Running the pipeline

The commands under Commands run on the test pipeline. To run on the production pipeline, specify a value for the release parameter, e.g. one of the following (a full command example follows the list):

  • 13.1-public (for public releases)
  • 13.1-consortium (for consortium releases)
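
For instance, a production consortium release run could look like the following (a sketch combining the aws_prod profile with the release parameter; other parameters from the Commands section apply as usual):

nextflow run main.nf -profile aws_prod \
    --process_type consortium_release \
    --release 13.1-consortium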

Parameters

You can run the following command to get a list of the currently available parameters, their defaults, and their descriptions.

nextflow run main.nf --help

See nextflow_schema.json for the same information.

Config Profiles

We use two profiles for the pipeline, which contain the docker container defaults and resource specifications for running the pipeline:

  • aws_prod - used for production pipeline runs
  • aws_test - used for test pipeline runs

See nextflow.config for more details on the profiles' content. Read more about config profiles and how to call them here: Config Profiles
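For orientation, a profile block in a Nextflow config generally follows this shape (an illustrative sketch, not the actual contents of this repo's nextflow.config; the container value mirrors the develop image used below and the memory setting is a made-up example):

profiles {
    aws_test {
        // default container for pipeline processes
        process.container = 'ghcr.io/sage-bionetworks/genie:develop'
        // resource specification (illustrative value)
        process.memory = '8 GB'
    }
}
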

Running with docker locally

Add -with-docker <docker_image_name> to every nextflow command to run the pipeline with Docker, and set the relevant docker image parameters as needed. See docker-containers for more details.

See nextflow.config for the docker parameters available.

Note that all of the docker parameters have default containers set based on the profile you select. If you want to use a different container from the profile defaults, you must:

  1. docker pull <container_name> for the container(s) you want to use on your local machine / EC2 instance
  2. Specify the docker parameter(s) in your command call(s) below as the name of the container(s) you pulled

Example: To use a feature branch docker image for your nextflow pipeline step run, specify:

nextflow run main.nf ... \
    --main_pipeline_docker <name_of_your_feature_branch_docker_image>

Commands

  • Only validate files on test pipeline

    nextflow run main.nf -profile aws_test \
            --process_type only_validate \
        -with-docker ghcr.io/sage-bionetworks/genie:develop \
            --main_pipeline_docker ghcr.io/sage-bionetworks/genie:develop
    
  • Processes non-mutation files on test pipeline

    nextflow run main.nf -profile aws_test \
            --process_type main_process \
        -with-docker ghcr.io/sage-bionetworks/genie:develop \
            --main_pipeline_docker ghcr.io/sage-bionetworks/genie:develop
    
  • Processes mutation files on test pipeline

  1. To execute the MAF process for all centers, you can either specify the maf_centers parameter as "ALL" or leave it blank.

    nextflow run main.nf -profile aws_test \
            --process_type maf_process \
            --create_new_maf_db \
        -with-docker ghcr.io/sage-bionetworks/genie:develop \
            --main_pipeline_docker ghcr.io/sage-bionetworks/genie:develop
    

    Or

    nextflow run main.nf -profile aws_test \
            --process_type maf_process \
            --maf_centers ALL \
            --create_new_maf_db \
            -with-docker ghcr.io/sage-bionetworks/genie:develop \
            --main_pipeline_docker ghcr.io/sage-bionetworks/genie:develop
    
  2. To execute the MAF process for a single center, you can specify the maf_centers parameter using the name of that center.

    nextflow run main.nf -profile aws_test \
            --process_type maf_process \
            --maf_centers TEST \
            --create_new_maf_db \
            -with-docker ghcr.io/sage-bionetworks/genie:develop \
            --main_pipeline_docker ghcr.io/sage-bionetworks/genie:develop
    
  3. To execute the MAF process for multiple centers, specify the maf_centers parameter as a comma-separated list of center names. Setting --create_new_maf_db false appends results to the existing MAF table instead of creating a new database.

    nextflow run main.nf -profile aws_test \
            --process_type maf_process \
            --maf_centers TEST,SAGE \
            --create_new_maf_db false \
            -with-docker ghcr.io/sage-bionetworks/genie:develop \
            --main_pipeline_docker ghcr.io/sage-bionetworks/genie:develop
    
  • Runs processing and consortium release (including data guide creation) on test pipeline

    nextflow run main.nf -profile aws_test \
            --process_type consortium_release \
            --create_new_maf_db \
        -with-docker ghcr.io/sage-bionetworks/genie:develop \
            --main_pipeline_docker ghcr.io/sage-bionetworks/genie:develop
    
  • Runs public release (including data guide creation) on test pipeline

    nextflow run main.nf -profile aws_test \
            --process_type public_release \
        -with-docker ghcr.io/sage-bionetworks/genie:develop \
            --main_pipeline_docker ghcr.io/sage-bionetworks/genie:develop
    

Testing

Run unit tests from the root of the repo. These unit tests cover the code in the scripts/ directory.

python3 -m pytest tests

Unit tests have to be run manually for now. You will need pandas and synapseclient to run them; see the Dockerfile for the version of synapseclient to use.
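
A minimal local setup could look like this (a sketch assuming pip; in practice, pin synapseclient to whatever version the Dockerfile specifies):

python3 -m pip install pandas synapseclient   # test dependencies
python3 -m pytest tests                       # run the unit tests from the repo root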

Processing on Seqera Platform

Follow instructions here for running the main GENIE processing directly on Seqera Platform:

  1. Please create an IBCDPE help desk request to gain access to the genie-bpc-project on Seqera Platform.
  2. After you have access, head to the Launchpad.
  3. Click on the test_main_genie pipeline
  4. Fill out the parameters for launching the specific parts of the pipeline (see launch_nf.png).
  5. If you need to modify any of the underlying default launch settings, such as config profiles, or run the pipeline on a feature branch rather than develop or main, navigate back to the Launchpad and click on Add pipeline (see add_pipeline.png). Typically, the relevant settings to modify are the following:
    • Config profiles - the profile to launch with; see Config Profiles for more details
    • Revision number - the branch of nf-genie that you're launching the pipeline on

Visit the Nextflow Tower docs for more info/training.

Other Modules

There are other scripts that are not part of the direct GENIE pipeline steps. Those will have their own READMEs and will be linked here.
