The NVIDIA Biomedical AI-Q Research Agent Developer Blueprint lets you build a deep research agent with virtual screening capabilities that runs on-premise and produces detailed research reports from on-premise data and web search. It is built on top of the AI-Q NVIDIA Research Assistant Blueprint and adds capabilities from the Virtual Screening Blueprint. When a biomedical researcher investigates a condition or disease and has identified a target protein and a recent small-molecule therapy, virtual screening can help discover novel small-molecule therapies through guided molecular generation and docking. The blueprint also demonstrates how you can add your own custom functionality, which need not be virtual screening, to the research and report generation functionality of the foundational AI-Q NVIDIA Research Assistant Blueprint.
- Key Features
- Target Audience
- Software Components
- Technical Diagram
- Minimum System Requirements
- Getting Started
- License
- Security Considerations
- Deep Research: Given a report topic and desired report structure, an agent (1) creates a report plan, (2) searches data sources for answers, (3) writes a report, (4) reflects on gaps in the report for further queries, (5) finishes a report with a list of sources.
- Parallel Search: During the research phase, multiple research questions are searched in parallel. For each query, the RAG service is consulted and an LLM-as-a-judge is used to check the relevancy of the results. If more information is needed, a fallback web search is performed. This search approach ensures internal documents are given preference over generic web results while maintaining accuracy. Performing query search in parallel allows for many data sources to be consulted in an efficient manner.
- Human-in-the-loop: Human feedback on the report plan, interactive report edits, and Q&A with the final report.
- Data Sources: Integration with the NVIDIA RAG blueprint to search multimodal documents with text, charts, and tables. For the full list of supported file formats, such as `pdf`, `pptx`, `docx`, and `jpeg`, and the supported extraction modalities, please see the RAG Blueprint. Optional web search through Tavily.
- Demo Web Application: Frontend web application showcasing end-to-end use of the Biomedical AI-Q Research Agent.
- Virtual Screening: Virtual screening capabilities for discovering novel small-molecule therapies when researching a disease or condition. These include MolMIM for novel molecular generation and DiffDock for predicting the 3D structure of how a molecule interacts with a protein.
- Demonstration of Customizing the AI-Q Research Assistant Blueprint: This blueprint demonstrates how the Virtual Screening capabilities were added to the foundational AI-Q Research Assistant Blueprint and, by the same pattern, how other custom capabilities can be added to it.
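The Parallel Search feature described above can be illustrated with a minimal sketch. It is not the blueprint's implementation: the function names (`answer_query`, `parallel_search`) and the three stand-in services (`fake_rag`, `fake_judge`, `fake_web`) are hypothetical, and a real judge would prompt an LLM rather than check for an empty string. The sketch only shows the control flow: RAG first, a relevancy check, web search as a fallback, and all queries running concurrently.

```python
import asyncio
from typing import Awaitable, Callable, List

# All names below are hypothetical stand-ins, not the blueprint's API.
SearchFn = Callable[[str], Awaitable[str]]
JudgeFn = Callable[[str, str], Awaitable[bool]]


async def answer_query(query: str, rag: SearchFn, judge: JudgeFn, web: SearchFn) -> dict:
    """Consult RAG first; fall back to web search only if the judge rejects the result."""
    rag_answer = await rag(query)
    if await judge(query, rag_answer):
        return {"query": query, "answer": rag_answer, "source": "rag"}
    web_answer = await web(query)
    return {"query": query, "answer": web_answer, "source": "web"}


async def parallel_search(
    queries: List[str], rag: SearchFn, judge: JudgeFn, web: SearchFn
) -> List[dict]:
    """Run all research queries concurrently; results keep the order of `queries`."""
    tasks = [answer_query(q, rag, judge, web) for q in queries]
    return await asyncio.gather(*tasks)


# Toy stand-ins so the sketch runs without any services.
async def fake_rag(query: str) -> str:
    return "internal doc on KRAS" if "KRAS" in query else ""


async def fake_judge(query: str, answer: str) -> bool:
    return bool(answer)  # a real judge would ask an LLM whether the answer is relevant


async def fake_web(query: str) -> str:
    return f"web result for: {query}"


results = asyncio.run(
    parallel_search(["KRAS inhibitors", "recent trial outcomes"], fake_rag, fake_judge, fake_web)
)
for r in results:
    print(r["query"], "->", r["source"])
```

Because the web search is only awaited when the judge rejects the RAG result, internal documents take precedence whenever they are relevant.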
- Biomedical Researchers: This developer blueprint can be deployed by IT organizations to provide an on-premise deep research application with virtual screening capabilities for biomedical researchers.
- Developers: This developer blueprint serves as a reference architecture for teams to add their own custom capabilities to a research-only agent.
The Biomedical AI-Q Research Agent Developer Blueprint provides these components:
- Demo Frontend: A Docker container with a fully functional demo web application is provided. This web application is deployed by default if you follow the getting started guides and is the easiest way to quickly experiment with deep research using internal data sources via the NVIDIA RAG blueprint. The source code for this demo web application is not distributed.
- Backend Service via RESTful API: The main Biomedical AI-Q Research Agent code is distributed as the `aiq-aira` Python package located in the `/aira` directory. These backend functions are available directly or via a RESTful API.
- Middleware Proxy: An nginx proxy is deployed as part of the getting started guides. This proxy enables frontend web applications to interact with a single backend service. In turn, the proxy routes requests between the NVIDIA RAG blueprint services and the Biomedical AI-Q Research Agent service.
Additionally, the blueprint uses these components:
- AI-Q NVIDIA Research Assistant Blueprint: Provides research and report generation capabilities. This is the foundational blueprint that this developer blueprint adapts and modifies.
- Virtual Screening Blueprint: Provides virtual screening capabilities for discovering novel small-molecule therapies. This blueprint specifically uses its MolMIM and DiffDock components.
- NVIDIA NeMo Agent Toolkit: Provides a toolkit for managing a LangGraph codebase, with observability, API services and documentation, and easy configuration of different LLMs.
- NVIDIA RAG Blueprint: Provides a solution for querying large sets of on-premise multi-modal documents.
- NVIDIA NeMo Retriever Microservices
- NVIDIA NIM Microservices: Used through the RAG blueprint for multi-modal document ingestion. Provides the foundational LLMs used for report writing and reasoning, including the Llama-3.3-Nemotron-Super-49B-v1 reasoning model. Provides the BioNeMo NIMs MolMIM and DiffDock for the virtual screening capability.
- Web search powered by Tavily: Supplements on-premise sources with real-time web search.
- The rcsb-api package: Provides a Python interface to RCSB PDB API services. It is used to search RCSB PDB at RCSB.org and fetch candidate protein IDs based on the text of the retrieved target protein name.
- The pubchempy package: Provides a Python interface for querying the PubChem website and database to find a molecule's SMILES string based on its name.
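As an illustration of the name-to-SMILES lookup that pubchempy provides, here is a minimal sketch. It is not taken from the blueprint's code: the helper names `name_to_smiles` and `_pubchem_lookup` are hypothetical, and the `canonical_smiles` attribute is the one exposed by pubchempy 1.0.4, so newer releases may prefer a different attribute. The lookup is injectable so the function can be exercised without the package or network access.

```python
from typing import Callable, Optional, Sequence


def _pubchem_lookup(name: str) -> Sequence[object]:
    """Query PubChem by compound name (needs the pubchempy package and network access)."""
    import pubchempy as pcp  # imported lazily so the rest of the sketch has no dependency

    return pcp.get_compounds(name, "name")


def name_to_smiles(
    name: str, lookup: Callable[[str], Sequence[object]] = _pubchem_lookup
) -> Optional[str]:
    """Return the SMILES string of the first PubChem match, or None if nothing matches."""
    compounds = lookup(name)
    if not compounds:
        return None
    # Assumption: `canonical_smiles` is available (true for pubchempy 1.0.4).
    return compounds[0].canonical_smiles


# Real usage, left commented out because it performs a network request:
# print(name_to_smiles("aspirin"))
```

Returning `None` for an unknown name lets the caller decide whether to skip virtual screening or ask the user for a different molecule name.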
72 GB minimum (assuming only RAG Ingestion is deployed locally, as recommended)
37 GB minimum (assuming all NVIDIA NIM microservices are hosted and nothing is deployed locally)
435 GB minimum
Ubuntu 22.04
16 CPUs
- Docker Compose, running a minimal set of NIMs locally and using NIMs hosted on NVIDIA AI Endpoints
- Docker Compose, running all NIMs locally
NVIDIA Container Toolkit
GPU Driver - 530.30.02 or later
CUDA version - 12.6 or later
This blueprint can be run entirely with hosted NVIDIA NIM Microservices, without local NIM deployments. See https://build.nvidia.com/ for details on each NIM. In this case, no GPU is required.
Although the blueprint can run without local NIM deployments, we recommend deploying the RAG Ingestion services locally. In this case, the GPU requirement is 1 x L40S or comparable.
| Use | Service(s) | Recommended GPU* |
|---|---|---|
| NeMo Retriever Microservices for multi-modal document ingest | graphic-elements, table-structure, paddle-ocr, nv-ingest, embedqa | 1 x H100 80GB* or 1 x A100 80GB |
| Reasoning Model for Report Generation and RAG Q&A Retrieval | llama-3.3-nemotron-super-49b-v1 with an FP8 profile | 1 x H100 80GB* or 2 x A100 80GB |
| Instruct Model for Report Generation | llama-3.3-70b-instruct | 2 x H100 80GB* or 4 x A100 80GB |
| Generative Model for Small Molecule Drug Development | nvcr.io/nim/nvidia/molmim:1.0.0 | Single Ampere/L40 GPU with at least 3 GB memory (doc) |
| Generative Model for Molecular Docking | nvcr.io/nim/mit/diffdock:2.1.0 | 1 x H100 80GB, 1 x A100 40GB, 1 x A6000 48GB, 1 x A10 24GB, or 1 x L40S 48GB (doc) |
| Total | Entire Biomedical AI-Q Research Agent Developer Blueprint | 5 x H100 80GB* or 8 x A100 80GB |
*This recommendation is based on the configuration used to test the blueprint. For alternative configurations, view the RAG blueprint documentation.
- An NVIDIA AI Enterprise developer license is required to host NVIDIA NIM Microservices locally.
- NVIDIA API catalog or NGC API keys for container download and access to hosted NVIDIA NIM Microservices
- Tavily API key for optional web search
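Before deploying, it can help to confirm that the keys listed above are actually set in your shell. The following is a minimal sketch, not part of the blueprint: the environment variable names `NVIDIA_API_KEY` and `TAVILY_API_KEY` are assumptions, so substitute whatever names your deployment files expect.

```python
import os
from typing import List, Mapping

# Variable names are assumptions; use the names your deployment files expect.
REQUIRED_KEYS = ["NVIDIA_API_KEY"]
OPTIONAL_KEYS = ["TAVILY_API_KEY"]  # only needed when web search is enabled


def missing_keys(env: Mapping[str, str], names: List[str]) -> List[str]:
    """Return the names that are unset or blank in `env`."""
    return [name for name in names if not env.get(name, "").strip()]


required_missing = missing_keys(os.environ, REQUIRED_KEYS)
optional_missing = missing_keys(os.environ, OPTIONAL_KEYS)

if required_missing:
    print("Missing required keys:", ", ".join(required_missing))
if optional_missing:
    print("Missing optional keys (web search will be unavailable):", ", ".join(optional_missing))
if not required_missing and not optional_missing:
    print("All keys are set.")
```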
- Deploy all components in the Biomedical AI-Q Research Agent with Docker Compose
  - Follow the Get Started Notebook to deploy with a minimal local NIM deployment
  - Follow Docker Compose for a full local deployment of all NIMs used in the blueprint
- Research a new condition or disease of interest to you, other than the example topic. You could upload documents related to the new topic to the existing collection or to a new collection, and use the limited web search for context retrieval.
- Add your own custom functionality to the foundational research functionality in the AI-Q NVIDIA Research Assistant Blueprint. This Biomedical AI-Q Research Agent developer blueprint can serve as a starting point for your customization, or as a reference for the additions needed. See this Customization Guide.
- If you would like to customize your backend application in NVIDIA NeMo Agent Toolkit
  - with Docker Compose, make your customizations, and follow the same processes in the first bullet point to bring up `aira-backend`
  - without Docker Compose (virtual environment on bare metal), please visit the Local Development Guide.
Please visit the troubleshooting guide.
This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.
- The Biomedical AI-Q Research Agent Developer Blueprint doesn't generate any code that may require sandboxing.
- The Biomedical AI-Q Research Agent Developer Blueprint is shared as a reference and is provided "as is". Security in the production environment is the responsibility of the end users deploying it. When deploying in a production environment, please have security experts review any potential risks and threats: define the trust boundaries, implement logging and monitoring capabilities, secure the communication channels, integrate AuthN & AuthZ with appropriate access controls, keep the deployment up to date, and ensure the containers and source code are secure and free of known vulnerabilities.
- A frontend that handles AuthN & AuthZ should be in place. Without AuthN & AuthZ, a directly exposed deployment (for example, one reachable from the internet) could give ungated access to customer models, resulting in cost to the customer, resource exhaustion, or denial of service.
- The Biomedical AI-Q Research Agent Developer Blueprint doesn't require any privileged access to the system.
- The end users are responsible for ensuring the availability of their deployment.
- The end users are responsible for building the container images and keeping them up to date.
- The end users are responsible for ensuring that OSS packages used by the developer blueprint are current.
- The logs from nginx proxy, backend, and demo app are printed to standard out. They can include input prompts and output completions for development purposes. The end users are advised to handle logging securely and avoid information leakage for production use cases.
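One way to limit how much prompt and completion text reaches the logs of a Python service is a logging filter. The sketch below uses only the standard `logging` module and is an illustration, not the blueprint's logging setup: the logger name `aira.demo` and the 80-character limit are hypothetical, and it does not apply to the nginx proxy, which has its own log configuration.

```python
import logging

MAX_CHARS = 80  # hypothetical limit; tune it to your own policy


class TruncateMessageFilter(logging.Filter):
    """Shorten long log messages so full prompts and completions are not written out."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        if len(message) > MAX_CHARS:
            record.msg = message[:MAX_CHARS] + "... [truncated]"
            record.args = ()  # the message is already formatted
        return True  # keep the record, only with a shorter message


logger = logging.getLogger("aira.demo")  # hypothetical logger name
handler = logging.StreamHandler()
handler.addFilter(TruncateMessageFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# A long prompt is cut to MAX_CHARS characters plus a truncation marker.
logger.info("prompt: %s", "x" * 500)
```

Truncation still leaves the start of each message in the log, so production deployments that handle sensitive inputs may prefer to drop such records entirely or route them to a restricted sink.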
GOVERNING TERMS: The NVIDIA Biomedical AI-Q Research Agent Developer Blueprint and Biomedical AI-Q Research Agent Brev launchable are governed by the Apache 2.0 License. The remaining software and materials are governed by the NVIDIA Software License Agreement and Product-Specific Terms for NVIDIA AI Products; except as follows: (a) the models, other than the Llama-3.3-Nemotron-Super-49B-v1 model, are governed by the NVIDIA Community Model License; (b) the Llama-3.3-Nemotron-Super-49B-v1 model is governed by the NVIDIA Open Model License Agreement; (c) the NeMo Retriever extraction is governed by the Apache 2.0 license, and (d) data from the RCSB Protein Data Bank is governed by CC0 1.0 Universal.
ADDITIONAL INFORMATION: For NVIDIA Retrieval QA Llama 3.2 1B Reranking v2 model, NeMo Retriever Graphic Elements v1 model, and NVIDIA Retrieval QA Llama 3.2 1B Embedding v2: Llama 3.2 Community License Agreement. For Llama-3.3-70b-Instruct model, Llama 3.3 Community License Agreement. Built with Llama.
