🧬 AI for Genomics Automation Workshop

A comprehensive workshop demonstrating AI-driven genomics workflow automation using AWS HealthOmics, Strands Agents, and multi-agent systems.

🎯 Overview

This workshop teaches you to build AI agents that automate genomics workflows on AWS HealthOmics. You'll create agents that manage workflows, monitor runs, analyze results, and troubleshoot failures autonomously.

🏗️ Architecture

Strands Agents Framework - Python framework for building AI agents
AWS HealthOmics - Managed genomics service for workflow execution
Model Context Protocol (MCP) - Tool connectivity for external systems
Multi-Agent Systems - Coordinated agents for complex genomics pipelines
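
As a rough illustration of how these pieces fit together, the sketch below exposes a plain Python function as a Strands tool and hands it to an agent. `describe_run_status` and `build_agent` are hypothetical names, not code from the workshop notebooks, and the status explanations are illustrative wording. It assumes `strands-agents` is installed and that Amazon Bedrock model access is configured for the agent's default model.

```python
def describe_run_status(status: str) -> str:
    """Explain a HealthOmics run status in plain language.

    Args:
        status: Run status string such as RUNNING, COMPLETED, or FAILED.
    """
    explanations = {
        "PENDING": "The run is queued and has not started yet.",
        "STARTING": "The run is provisioning resources.",
        "RUNNING": "The run is executing workflow tasks.",
        "COMPLETED": "The run finished successfully.",
        "FAILED": "The run stopped with an error; check the task logs.",
        "CANCELLED": "The run was cancelled before it finished.",
    }
    return explanations.get(status.upper(), f"Unrecognized status: {status}")


def build_agent():
    """Create an agent that can call describe_run_status (hypothetical setup)."""
    # Imported lazily so the helper above can be used without strands installed.
    from strands import Agent, tool

    return Agent(
        system_prompt="You help users understand AWS HealthOmics workflow runs.",
        tools=[tool(describe_run_status)],
    )
```

Calling `build_agent()("What does a FAILED run mean?")` would let the model decide when to invoke the tool; the notebooks build considerably richer agents than this.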

📚 Workshop Structure

1. Introduction to Strands Agents (`01-strands-agents-introduction.ipynb`)

Core concepts and architecture
Building your first HealthOmics agent
MCP integration and tool connectivity
Interactive experimentation
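
To give a feel for the MCP integration covered in this notebook, here is a minimal sketch of connecting a Strands agent to the HealthOmics MCP server over stdio. The `uvx` launch command and the function names are assumptions made for illustration; the notebook and `mcp_clients.py` define the configuration the workshop actually uses.

```python
def healthomics_server_command() -> dict:
    """Return the command assumed to launch the HealthOmics MCP server."""
    # Assumption: the server is started with uvx from its published package.
    return {"command": "uvx", "args": ["awslabs.aws-healthomics-mcp-server@latest"]}


def ask_with_healthomics_tools(prompt: str):
    """Start the MCP server, give its tools to an agent, and run one prompt."""
    # Imported lazily so the helper above works without these packages.
    from mcp import StdioServerParameters, stdio_client
    from strands import Agent
    from strands.tools.mcp import MCPClient

    params = StdioServerParameters(**healthomics_server_command())
    client = MCPClient(lambda: stdio_client(params))
    # The agent must be used while the MCP session is open.
    with client:
        agent = Agent(tools=client.list_tools_sync())
        return agent(prompt)
```

A call such as `ask_with_healthomics_tools("List my HealthOmics workflows")` requires AWS credentials and Bedrock model access in the current environment.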

2. Genomics Supervisor Agent (`02-genomics-supervisor-agent.ipynb`)

Advanced agent orchestration
Workflow management and monitoring
Performance optimization strategies

3. Multi-Agent Genomics Pipeline (`03-multi-agent-genomics-pipeline.ipynb`)

Coordinated multi-agent systems
Specialized agents for different pipeline stages
End-to-end automation workflows

🛠️ Prerequisites

AWS Account with HealthOmics access
Python 3.12+
Basic understanding of genomics workflows
Familiarity with WDL/CWL workflow languages

🚀 Quick Start

Clone the repository

git clone <repository-url>
cd sample-healthomics-automation-with-ai-agents

Install dependencies

pip install -r notebooks/requirements.txt

Build workflow

cd somatic-variant-calling-pipeline
zip mutect2.zip main.wdl
aws s3 cp mutect2.zip s3://<your-bucket>/<your-prefix>/mutect2.zip
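
If the `zip` utility is not available, the packaging step could be approximated in Python. This is a minimal sketch; `package_workflow` is a hypothetical helper, not part of the repository.

```python
import zipfile
from pathlib import Path


def package_workflow(workflow_dir: str, zip_name: str = "mutect2.zip") -> Path:
    """Zip main.wdl at the archive root, like `zip mutect2.zip main.wdl`."""
    source = Path(workflow_dir) / "main.wdl"
    if not source.is_file():
        raise FileNotFoundError(f"No main.wdl found in {workflow_dir}")
    archive = Path(workflow_dir) / zip_name
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        # arcname keeps the file at the top level of the archive.
        bundle.write(source, arcname="main.wdl")
    return archive
```

The resulting archive can then be uploaded with the `aws s3 cp` command shown above.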

Deploy infrastructure

aws cloudformation deploy \
  --template-file infrastructure/infrastructure_cfn.yaml \
  --stack-name genomics-ai-workshop \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameter-overrides \
    OmicsResourcesS3Bucket=<your-bucket> \
    OmicsResourcesS3Prefix=<your-prefix> \
    OmicsWorkflowDefinitionZipS3=mutect2.zip
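
Once the stack is deployed, its outputs can be read programmatically. The sketch below assumes a boto3 CloudFormation client is passed in; `stack_outputs` is a hypothetical helper, and the output keys depend on what the template declares.

```python
def stack_outputs(cloudformation, stack_name: str = "genomics-ai-workshop") -> dict:
    """Return the stack's outputs as an {OutputKey: OutputValue} mapping."""
    response = cloudformation.describe_stacks(StackName=stack_name)
    stack = response["Stacks"][0]
    # "Outputs" is absent when the template declares none.
    return {
        output["OutputKey"]: output["OutputValue"]
        for output in stack.get("Outputs", [])
    }
```

For example, `stack_outputs(boto3.client("cloudformation"))` returns the mapping for the stack name used above.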

Start the workshop
- Open notebooks/01-strands-agents-introduction.ipynb
- Follow the step-by-step instructions

📁 Project Structure

├── notebooks/                          # Interactive Jupyter notebooks
│   ├── 01-strands-agents-introduction.ipynb
│   ├── 02-genomics-supervisor-agent.ipynb
│   ├── 03-multi-agent-genomics-pipeline.ipynb
│   ├── civic-data/                     # Sample genomics data
│   │   ├── AssertionSummaries.tsv
│   │   ├── ClinicalEvidenceSummaries.tsv
│   │   ├── FeatureSummaries.tsv
│   │   └── VariantSummaries.tsv
│   ├── data_discovery_agent.py         # Data discovery agent implementation
│   ├── interpretation_and_reporting_agent.py # Reporting agent
│   ├── mcp_clients.py                  # MCP client configurations
│   ├── qc_agent.py                     # Quality control agent
│   ├── run_graph_agent.py              # Run monitoring agent
│   ├── workflow_orchestrator_agent.py  # Workflow orchestration agent
│   ├── test_workflow_orchestrator.py   # Test utilities
│   └── requirements.txt                # Python dependencies
├── infrastructure/                     # AWS CloudFormation templates
│   ├── infrastructure_cfn.yaml        # Main infrastructure
│   └── start_workflow/                # Lambda functions
│       ├── start_workflow_lambda.py   # Workflow starter Lambda
│       ├── build.sh                   # Build script dependencies
├── somatic-variant-calling-pipeline/   # Sample WDL workflow
│   └── main.wdl                       # Mutect2 workflow
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
└── README.md

🤖 Agent Capabilities

Core Agents

Data Discovery Agent - Find and catalog genomics datasets
QC Agent - Quality control and validation
Workflow Orchestrator - Manage workflow execution
Interpretation & Reporting - Analyze results and generate reports
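
One simple way to picture how these agents cooperate is a sequential hand-off in which each stage reads a shared context and adds to it. The sketch below uses plain functions as stand-ins for the agents; the stage bodies and values are placeholders, and the workshop's own coordination logic lives in the notebooks and may work differently.

```python
from typing import Callable

Stage = Callable[[dict], dict]


def run_pipeline(stages: list[tuple[str, Stage]], context: dict) -> dict:
    """Run each stage in order, merging its output into the shared context."""
    completed = []
    for name, stage in stages:
        context = {**context, **stage(context)}
        completed.append(name)
    return {**context, "completed_stages": completed}


# Placeholder stages standing in for the four core agents.
def discover_data(ctx: dict) -> dict:
    return {"samples": ["tumor", "normal"]}


def quality_control(ctx: dict) -> dict:
    return {"qc_passed": len(ctx["samples"]) == 2}


def orchestrate_workflow(ctx: dict) -> dict:
    return {"run_requested": ctx["qc_passed"]}


def interpret_and_report(ctx: dict) -> dict:
    return {"report": "run requested" if ctx["run_requested"] else "blocked by QC"}


STAGES = [
    ("data_discovery", discover_data),
    ("qc", quality_control),
    ("workflow_orchestrator", orchestrate_workflow),
    ("interpretation_and_reporting", interpret_and_report),
]
```

`run_pipeline(STAGES, {})` walks the four stages in order and returns the accumulated context.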

Key Features

Workflow Management - Create, deploy, and version workflows
Real-time Monitoring - Track execution with automatic polling
Performance Analysis - Resource optimization recommendations
Failure Diagnostics - Automated troubleshooting
Validation - WDL/CWL syntax checking and best practices
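
The polling behind real-time monitoring can be sketched with the HealthOmics `get_run` API. This assumes a boto3 `omics` client is passed in; `wait_for_run`, its defaults, and the injectable `sleep` parameter are hypothetical choices for illustration, not the workshop's implementation.

```python
import time

# Statuses after which a run will not change state again.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED", "DELETED"}


def wait_for_run(omics, run_id: str, poll_seconds: float = 60.0,
                 max_polls: int = 120, sleep=time.sleep) -> str:
    """Poll get_run until the run reaches a terminal status, then return it."""
    for _ in range(max_polls):
        status = omics.get_run(id=run_id)["status"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"Run {run_id} did not finish after {max_polls} polls")
```

Bounding the number of polls keeps an agent from waiting indefinitely on a stuck run.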

🔧 Infrastructure Components

HealthOmics Workflows - Pre-configured Mutect2 somatic variant calling
HealthOmics Workflow Run - Test run of the Mutect2 workflow using publicly available data
S3 Storage - Workflow results and genomics data
IAM Roles - Secure access management
SageMaker Notebook - Interactive development environment
ECR Repositories - Container image management
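
Further runs of the deployed workflow could be started with the HealthOmics `start_run` API. The sketch below assumes a boto3 `omics` client is passed in; `start_test_run` and the default run name are hypothetical, and the workflow ID, role ARN, output location, and parameters must come from your own deployment. It is not the code of the workflow starter Lambda.

```python
def start_test_run(omics, workflow_id: str, role_arn: str, output_uri: str,
                   parameters: dict, name: str = "mutect2-test-run") -> str:
    """Start a HealthOmics run and return its run ID."""
    response = omics.start_run(
        workflowId=workflow_id,
        roleArn=role_arn,        # IAM role HealthOmics assumes for the run
        outputUri=output_uri,    # S3 location for workflow results
        parameters=parameters,   # workflow inputs
        name=name,
    )
    return response["id"]
```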

📊 Sample Workflows

Mutect2 Somatic Variant Calling

Tumor/normal pair analysis
Scatter-gather parallelization
VCF to MAF conversion
Configurable "cooking show" mode for demonstrations
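
For readers new to scatter-gather, the idea is to split the genome into shards, process the shards in parallel, and merge the results back in order. The Python sketch below illustrates only the pattern: the workshop implements it in WDL, and `call_variants` is a placeholder that returns labels rather than calling any variants.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(intervals: list[str], shard_count: int) -> list[list[str]]:
    """Deal intervals round-robin into at most shard_count non-empty shards."""
    shards = [intervals[i::shard_count] for i in range(shard_count)]
    return [shard for shard in shards if shard]


def call_variants(shard: list[str]) -> list[str]:
    """Placeholder for per-shard variant calling; returns one label per interval."""
    return [f"calls:{interval}" for interval in shard]


def scatter_gather(intervals: list[str], shard_count: int = 2) -> list[str]:
    """Process shards in parallel, then merge results in the original order."""
    shards = scatter(intervals, shard_count)
    with ThreadPoolExecutor(max_workers=len(shards) or 1) as pool:
        per_shard = list(pool.map(call_variants, shards))
    # Gather: flatten shard results and restore the input interval order.
    merged = [call for shard in per_shard for call in shard]
    order = {f"calls:{interval}": i for i, interval in enumerate(intervals)}
    return sorted(merged, key=order.__getitem__)
```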

🎓 Learning Outcomes

By completing this workshop, you will:

Build Production AI Agents - Create robust agents using the Strands Agents framework
Integrate MCP Tools - Connect agents to external systems seamlessly
Automate Genomics Workflows - End-to-end pipeline automation
Implement Multi-Agent Systems - Coordinate specialized agents
Optimize Performance - Resource usage and cost optimization
Handle Failures - Automated error detection and recovery

🔍 Key Technologies

Strands Agents - AI agent framework
AWS HealthOmics - Genomics workflow service
Amazon Bedrock - Foundation models (Claude)
Model Context Protocol - Tool integration standard
WDL - Workflow Description Language
GATK - Genomics analysis toolkit

📝 Requirements

strands-agents>=1.0.0
boto3
pandas>=2.3.0
bedrock-agentcore
awslabs-aws-healthomics-mcp-server
awslabs.aws-api-mcp-server>=0.0.13
uv

🆘 Support

For workshop-related questions:

Check the notebook documentation
Review the infrastructure logs
Consult AWS HealthOmics documentation

aws-samples/sample-healthomics-automation-with-ai-agents