
NVIDIA-AI-Blueprints/genomics-analysis


Genomics Analysis Developer Example

Run essential genomics workflows quickly with NVIDIA Parabricks and CodonFM.

Overview

This developer example enables bioinformaticians to run GPU-accelerated genomics workflows in minutes on any cloud through Brev.dev. NVIDIA® Parabricks® powers both linear and graph-based read alignment along with variant calling via DeepVariant. CodonFM, NVIDIA's RNA foundation model, can then be used to predict the functional impact of each detected variant on specific genes.

Experience Workflow

This developer example shows how to use GPU-accelerated tools for alignment (linear and graph), variant calling, and variant effect prediction.

Architecture Diagram

The exact steps to run this workflow are outlined below:

Notebook Outline

All the code can be found in Jupyter notebooks in the notebooks directory of the GitHub repo.

germline_wes.ipynb

Runs a standard germline variant calling workflow on whole exome sequencing (WES) data. Downloads the NA12878 sample from the Genome in a Bottle consortium, aligns reads to the GRCh38 reference using GPU-accelerated BWA-MEM via Parabricks fq2bam, and calls variants with GPU-accelerated DeepVariant, producing a final .vcf file.
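The two Parabricks steps in this notebook can be sketched as shell commands. File names below are placeholders, and each command is printed rather than executed so the sketch runs without a GPU; replace the `echo` wrapper with direct execution on a GPU host with Parabricks installed.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the germline WES steps; paths are placeholders.
# 'run' only prints each command -- swap 'echo' for direct execution
# on a GPU host where the pbrun CLI is available.
run() { echo "+ $*"; }

REF=GRCh38.fa                # reference genome FASTA
FQ1=NA12878_R1.fastq.gz      # paired-end reads
FQ2=NA12878_R2.fastq.gz

# 1. GPU-accelerated BWA-MEM alignment via fq2bam
run pbrun fq2bam --ref "$REF" --in-fq "$FQ1" "$FQ2" --out-bam NA12878.bam

# 2. GPU-accelerated DeepVariant variant calling, producing the final VCF
run pbrun deepvariant --ref "$REF" --in-bam NA12878.bam --out-variants NA12878.vcf
```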

pangenome.ipynb

Demonstrates a pangenome analysis workflow as an alternative to single-reference alignment. Downloads the HPRC v1.1 pangenome graph, aligns short-read FASTQ samples using GPU-accelerated Giraffe, and calls variants with Pangenome-Aware DeepVariant — a variant of DeepVariant that uses the pangenome graph to improve alignment accuracy and variant detection across diverse populations.
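The graph-alignment step can be sketched with the open-source vg giraffe CLI, used here as a stand-in for the GPU-accelerated Giraffe invoked in the notebook; the graph file name is a placeholder and the flag spellings may differ from the notebook's exact invocation. The command is printed as a dry run so the sketch works without the graph or a GPU.

```shell
#!/usr/bin/env bash
# Dry-run sketch of short-read alignment to the HPRC pangenome graph.
# Uses the open-source 'vg giraffe' CLI as an illustrative stand-in;
# the graph file name is a placeholder.
run() { echo "+ $*"; }

GRAPH=hprc-v1.1-grch38.gbz   # HPRC v1.1 pangenome graph (placeholder name)

# Map paired-end short reads to the graph (-f given once per mate),
# requesting BAM output for downstream variant calling
run vg giraffe -Z "$GRAPH" -f sample_R1.fastq.gz -f sample_R2.fastq.gz -o BAM
```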

variant_effect_prediction.ipynb

Runs a full variant effect prediction pipeline starting from raw FASTQ files. Uses Parabricks to align reads and call variants, processes GENCODE gene annotations to extract protein-coding sequences, maps detected variants onto transcripts, and uses CodonFM (NVIDIA's RNA foundation model) to predict the functional impact of each variant via log likelihood ratios.
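The final scoring step compares the model's likelihoods for the reference and alternate sequences: a variant's effect score is the log likelihood ratio log P(alt) − log P(ref), with strongly negative values suggesting the model disfavors the variant. A minimal awk sketch over a hypothetical per-variant TSV of log likelihoods (the column layout is an assumption, not the notebook's actual output format):

```shell
#!/usr/bin/env bash
# Minimal sketch of the log-likelihood-ratio scoring step.
# Hypothetical input columns: variant_id, logP_ref, logP_alt.
# LLR = logP_alt - logP_ref; negative values mean the model assigns
# lower likelihood to the variant than to the reference sequence.
printf 'chr1:12345A>G\t-10.2\t-14.7\nchr2:67890C>T\t-9.1\t-8.8\n' |
awk -F'\t' '{ printf "%s\tLLR=%.1f\n", $1, $3 - $2 }'
```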

How to Run

Hardware Requirements

An L40S GPU, with at least 48 GB of GPU memory, is recommended for the best balance of cost and performance. Users can also try the L4 or T4 (lower cost) or the RTX PRO 6000 (higher performance).

NVIDIA Parabricks can be run on any NVIDIA GPU with CUDA® compute capability 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, or 12.0 and at least 16 GB of GPU RAM.

Parabricks has been tested specifically on the following NVIDIA GPUs:

  • T4
  • A10, A30, A40, A100, A6000
  • L4, L40
  • H100, H200
  • GH200
  • B200, B300
  • GB200, GB300
  • RTX PRO 6000 Blackwell Server Edition
  • RTX PRO 4500
  • DGX Spark
  • DGX Station

The minimum amount of CPU RAM and CPU threads depends on the number of GPUs. Please refer to the table below:

GPUs    Minimum CPU RAM (GB)    Minimum CPU Threads
2       100                     24
4       196                     32
8       392                     48

Software Requirements

  • Any NVIDIA driver that is compatible with CUDA 12.9 (535, 550, 570, 575, or similar). Please check here for more details on forward compatibility.
  • Any Linux operating system that supports Docker version 20.10 (or higher) with the NVIDIA GPU runtime.
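With a compatible driver and Docker's NVIDIA GPU runtime in place, the Parabricks container can be pulled from NGC and launched with GPU access. The image tag below is an example only (check NGC for the current release), and the commands are printed as a dry run so the sketch works on any machine.

```shell
#!/usr/bin/env bash
# Dry-run sketch of launching the Parabricks container with GPU access.
# The image tag is an example -- check NGC for the current release.
run() { echo "+ $*"; }

IMG=nvcr.io/nvidia/clara/clara-parabricks:4.5.0-1   # example tag

run docker pull "$IMG"

# --gpus all exposes every GPU; the host working directory is mounted
# at /workdir so input files and outputs are visible to the container
run docker run --rm --gpus all -v "$PWD:/workdir" -w /workdir "$IMG" \
    pbrun fq2bam --ref GRCh38.fa --in-fq R1.fq.gz R2.fq.gz --out-bam out.bam
```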

Pre-configured Instances

These notebooks are available as a launchable on Brev. This is a one-click method that automatically installs dependencies, provisions hardware, and loads this repository.

 Click here to deploy.

Manual installation

For users who prefer to run on their own hardware, installation instructions are provided below:

Prerequisites: Python3

# Create Python virtual environment and activate it
python3 -m venv .venv 
source .venv/bin/activate

# Run the setup script 
./scripts/local_setup.sh

# Start Jupyter lab 
jupyter lab


Terms of Use

Governing Terms: The Parabricks container is governed by the NVIDIA Software License Agreement and the Product-Specific Terms for NVIDIA AI Products. This Genomics Analysis Blueprint GitHub repository is provided under the Apache License 2.0.

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using the models in accordance with our terms of service, developers should work with their supporting model team to ensure the models meet requirements for the relevant industry and use case and address unforeseen product misuse. For more detailed information on ethical considerations for the models, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.
