Genomic Intelligence Powered by Evo2
A full-stack genomic variant analysis platform leveraging the Evo2 DNA language model for zero-shot pathogenicity prediction
Features • Tech Stack • Quick Start • Architecture • API
| Technology | Version | Purpose |
|---|---|---|
| Next.js | 15 | React framework with App Router |
| React | 19 | UI library |
| TailwindCSS | 4 | Utility-first CSS |
| shadcn/ui | Latest | Component library |
| Framer Motion | 12 | Animations |
| TypeScript | 5.8 | Type safety |
| Technology | Version | Purpose |
|---|---|---|
| Python | 3.12 | Runtime |
| Modal | Latest | Serverless GPU infrastructure |
| Evo2 | 7B | DNA language model |
| PyTorch | 2.8 | Deep learning framework |
| Flash Attention | 2.8.3 | Efficient attention |
| CUDA | 12.6 | GPU acceleration |
- UCSC Genome Browser API: Reference sequence data
- NCBI ClinVar API: Clinical variant annotations
- NCBI Gene API: Gene information and coordinates
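
As an illustration of how reference sequence could be requested from the UCSC Genome Browser API, here is a minimal Python sketch that only builds the request URL. The endpoint path and parameter names follow the public UCSC REST API as commonly documented; the helper name `ucsc_sequence_url` and the 0-based, half-open coordinate handling are assumptions, and the project's own `genome-api.ts` may do this differently.

```python
from urllib.parse import urlencode

# Public UCSC REST endpoint for raw sequence (assumed; check the UCSC API docs).
UCSC_SEQUENCE_ENDPOINT = "https://api.genome.ucsc.edu/getData/sequence"


def ucsc_sequence_url(genome: str, chromosome: str, start: int, end: int) -> str:
    """Build a sequence URL for the 0-based, half-open interval [start, end)."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    query = urlencode(
        {"genome": genome, "chrom": chromosome, "start": start, "end": end}
    )
    return f"{UCSC_SEQUENCE_ENDPOINT}?{query}"


# Single base at 1-based position 43119628 on chr17 (hg38).
url = ucsc_sequence_url("hg38", "chr17", 43119627, 43119628)
print(url)
```
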
- Node.js 20+ and npm
- Python 3.12+
- Modal account (sign up)
- NVIDIA GPU with CUDA support (for local development, or use Modal's H100s)
# Navigate to frontend directory
cd genelm-frontend
# Install dependencies
npm install
# Create environment file
cp .env.example .env.local
# Configure your environment variables
# NEXT_PUBLIC_ANALYZE_SINGLE_VARIANT_BASE_URL=<your-modal-endpoint>
# Start development server
npm run dev

The frontend will be available at http://localhost:3000
# Navigate to backend directory
cd genelm-backend
# Install Modal CLI
pip install modal
# Authenticate with Modal
modal setup
# Deploy the application
modal deploy main.py
# Or serve a hot-reloading development endpoint on Modal
modal serve main.py

After deployment, Modal will provide an endpoint URL for the analyze_single_variant API.
┌─────────────────────────────────────────────────────────────────┐
│                       Frontend (Next.js)                        │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │ Gene Browser│  │  Sequence   │  │    Variant Analysis     │  │
│  │  Component  │  │   Viewer    │  │        Dashboard        │  │
│  └──────┬──────┘  └──────┬──────┘  └────────────┬────────────┘  │
│         │                │                      │               │
└─────────┼────────────────┼──────────────────────┼───────────────┘
          │                │                      │
          ▼                ▼                      ▼
   ┌─────────────┐  ┌─────────────┐       ┌───────────────┐
   │  NCBI Gene  │  │ UCSC Genome │       │ Modal Backend │
   │     API     │  │     API     │       │    (H100)     │
   └─────────────┘  └─────────────┘       └───────┬───────┘
                                                  │
                                                  ▼
                                           ┌─────────────┐
                                           │    Evo2     │
                                           │ (7B Model)  │
                                           └─────────────┘
- Gene Selection: User browses or searches genes via the NCBI Gene API
- Sequence Loading: Genomic sequence is fetched from the UCSC API
- Variant Selection: User clicks a nucleotide or selects a ClinVar variant
- Evo2 Analysis: Request is sent to the Modal backend running on an H100 GPU
- Prediction: Model scores the reference and variant sequences
- Results: Delta-likelihood score and pathogenicity prediction are returned
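
The scoring steps above can be sketched in Python as follows. This is a simplified illustration, not the backend's actual code: the helper names, the 1-based `variant_pos`, the centred window, and the sign convention (variant minus reference, chosen so that a less likely variant gives a negative value like the one in the example response) are all assumptions. `toy_score` is a stand-in for a real Evo2 likelihood call.

```python
from typing import Callable

WINDOW_SIZE = 8192  # context window quoted in this README


def window_bounds(variant_pos: int, chrom_length: int,
                  window: int = WINDOW_SIZE) -> tuple[int, int]:
    """Return a 0-based, half-open window around a 1-based position (assumed)."""
    index = variant_pos - 1
    start = max(0, index - window // 2)
    end = min(chrom_length, start + window)
    return start, end


def apply_snv(ref_window: str, offset: int, alt_allele: str) -> str:
    """Substitute one base at `offset` to build the variant sequence."""
    if alt_allele not in {"A", "C", "G", "T"}:
        raise ValueError("alt_allele must be one of A, C, G, T")
    if ref_window[offset].upper() == alt_allele:
        raise ValueError("alt_allele matches the reference base")
    return ref_window[:offset] + alt_allele + ref_window[offset + 1:]


def delta_score(ref_window: str, var_window: str,
                score: Callable[[str], float]) -> float:
    """Variant score minus reference score (sign convention is an assumption)."""
    return score(var_window) - score(ref_window)


def toy_score(seq: str) -> float:
    """Hypothetical scorer used only for this demo; NOT an Evo2 likelihood."""
    return -float(seq.count("G"))


# Window for the example request; the chromosome length is illustrative.
start, end = window_bounds(43119628, chrom_length=83_257_441)
offset = (43119628 - 1) - start  # index of the variant inside the window

toy_ref = "ACTA"
toy_var = apply_snv(toy_ref, 3, "G")
toy_delta = delta_score(toy_ref, toy_var, toy_score)
print(start, end, offset, toy_var, toy_delta)
```

Passing the scorer in as a callable keeps the windowing and substitution logic testable without a GPU.
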
Analyze a single nucleotide variant using Evo2.
Request Body:
{
"variant_pos": 43119628,
"alt_allele": "G",
"genome": "hg38",
"chromosome": "chr17"
}

Response:
{
"position": 43119628,
"reference": "A",
"variant": "G",
"delta_score": -0.00234,
"prediction": "Likely pathogenic",
"confidence": 0.87
}

| Field | Type | Description |
|---|---|---|
| `position` | `int` | Genomic position |
| `reference` | `str` | Reference allele |
| `variant` | `str` | Alternative allele |
| `delta_score` | `float` | Log-likelihood difference (ref - var) |
| `prediction` | `str` | "Likely pathogenic" or "Likely benign" |
| `confidence` | `float` | Confidence score (0-1) |
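
A client might validate the response fields listed above before displaying them. The following Python sketch shows one way to do that; the `VariantResult` class and `parse_variant_result` function are hypothetical names, and the checks simply mirror the documented field types and value ranges.

```python
import json
from dataclasses import dataclass

ALLOWED_PREDICTIONS = {"Likely pathogenic", "Likely benign"}


@dataclass(frozen=True)
class VariantResult:
    position: int
    reference: str
    variant: str
    delta_score: float
    prediction: str
    confidence: float


def parse_variant_result(payload: str) -> VariantResult:
    """Parse a JSON response body and check it against the documented fields."""
    data = json.loads(payload)
    result = VariantResult(
        position=int(data["position"]),
        reference=str(data["reference"]),
        variant=str(data["variant"]),
        delta_score=float(data["delta_score"]),
        prediction=str(data["prediction"]),
        confidence=float(data["confidence"]),
    )
    if result.prediction not in ALLOWED_PREDICTIONS:
        raise ValueError(f"unexpected prediction: {result.prediction!r}")
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return result


# The example response from this README.
example = """{
  "position": 43119628, "reference": "A", "variant": "G",
  "delta_score": -0.00234, "prediction": "Likely pathogenic",
  "confidence": 0.87
}"""
result = parse_variant_result(example)
print(result)
```
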
| Metric | Value |
|---|---|
| AUROC | ~95% |
| Model | Evo2 7B |
| Context Window | 8,192 bp |
| Variants Tested | 500 SNVs |
| Classification | LOF vs FUNC/INT |
The model uses a threshold-based classification derived from Youden's J statistic optimization on the BRCA1 saturation mutagenesis dataset.
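
To illustrate how a cutoff can be chosen with Youden's J statistic (J = sensitivity + specificity - 1), here is a small pure-Python sketch. The scores and labels below are invented toy values, not the BRCA1 data, and the direction of the rule (scores at or below the cutoff are called loss-of-function) is an assumption.

```python
def youden_threshold(scores: list[float], is_lof: list[bool]) -> float:
    """Return the cutoff maximising J, predicting LOF when score <= cutoff."""
    positives = sum(is_lof)
    negatives = len(is_lof) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one LOF and one non-LOF example")

    best_j = -1.0
    best_cut = scores[0]
    for cut in sorted(set(scores)):
        # True positives: LOF variants scored at or below the cutoff.
        tp = sum(1 for s, y in zip(scores, is_lof) if y and s <= cut)
        # True negatives: non-LOF variants scored above the cutoff.
        tn = sum(1 for s, y in zip(scores, is_lof) if not y and s > cut)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_j, best_cut = j, cut
    return best_cut


# Toy delta scores with made-up labels (True = loss of function).
toy_scores = [-0.004, -0.003, -0.002, -0.0005, 0.0001, 0.0003]
toy_labels = [True, True, True, False, False, False]
threshold = youden_threshold(toy_scores, toy_labels)
print(threshold)
```

In practice a library routine such as an ROC curve helper would replace the explicit loop; the loop is kept here to make the definition of J visible.
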
| Configuration | Latency |
|---|---|
| Modal H100 (cold start) | ~30s |
| Modal H100 (warm) | ~2-5s |
| Batch scoring (100 variants) | ~60s |
GeneLM-Evo2/
├── genelm-frontend/          # Next.js frontend application
│   ├── src/
│   │   ├── app/              # App router pages
│   │   ├── components/       # React components
│   │   │   ├── gene-sequence.tsx
│   │   │   ├── known-variants.tsx
│   │   │   └── ui/           # shadcn/ui components
│   │   └── utils/            # API utilities
│   │       ├── variants-api.ts
│   │       ├── genome-api.ts
│   │       └── genes-api.ts
│   ├── package.json
│   └── tailwind.config.ts
│
├── genelm-backend/           # Modal serverless backend
│   ├── main.py               # Evo2 model & API endpoints
│   └── requirements.txt
│
└── README.md
- Batch variant analysis
- VCF file upload support
- Additional gene benchmarks (TP53, BRCA2)
- Variant effect visualization
- Export results to PDF/CSV
- Multi-model comparison (ESM, Nucleotide Transformer)
- Arc Institute: Evo2 DNA language model
- Modal: Serverless GPU infrastructure
- UCSC Genome Browser: Reference genome data
- NCBI ClinVar: Clinical variant database
- shadcn/ui: Beautiful UI components
This project is licensed under the MIT License - see the LICENSE file for details.
Built with 🧬 by Jarvis Zhang