A deterministic, auditable compiler platform for cell-state engineering. Scientists define cell-state transitions, configure biological constraints, and run the compiler to receive ranked candidate intervention designs — each scored by real Evo 2 genome foundation model inference.
Research platform only. All outputs are model-derived research signals. Not biological validation. Not for clinical use. Not for pathogen, toxin, or gain-of-function research.
Engineering cell states is one of the most important and difficult problems in modern medicine. The ability to reprogram a cell — say, converting an exhausted T cell back into a functional memory-like state, or pushing a fibroblast toward a cardiomyocyte — would unlock treatments for cancer, autoimmune disease, aging, and tissue regeneration.
But today the process is largely artisanal:
- Researchers manually search literature to identify candidate transcription factors or CRISPR targets
- Interventions are designed by intuition and prior knowledge
- Screening is expensive: a single experiment can take weeks and cost tens of thousands of dollars
- There is no principled way to rank candidates before committing to wet lab work
- Failed candidates leave no systematic trail — knowledge is lost between labs and experiments
The core bottleneck is the translation gap between a target cell state (what we want) and a ranked set of concrete molecular interventions (what to actually try). That gap is currently filled by expert intuition — brilliant, but slow, unscalable, and hard to audit.
Cell State Compiler treats cell-state engineering as a compilation problem.
Just as a software compiler translates high-level source code into optimized machine instructions, this platform takes a high-level biological specification — starting state, target state, constraints — and compiles it into ranked molecular intervention candidates, each scored against the genome itself using a foundation model.
The workflow:
[Starting Cell State] → [Compiler] → [Ranked Candidates]
[Target Cell State]                  [Evo 2 Scores]
[Constraint Set]                     [Assay Plan]
                                     [Audit Trail]
Each compile job runs a deterministic nine-step pipeline:
- Text screening — safety gate blocks disallowed research domains before any computation
- State encoding — starting cell state encoded as a 384-dimensional vector (marker profile + pathway scores + state labels)
- Candidate generation — systematic enumeration across six intervention modalities (TF payload, CRISPRa, CRISPRi, RNA payload, regulatory context, small molecule context)
- Genome context build — each candidate is grounded to a real DNA context sequence
- Evo 2 scoring — genome foundation model scores sequence plausibility and produces a dense embedding for each candidate
- Trajectory prediction — deterministic model predicts the state transition path from start to target
- Risk assessment — flags hard constraint violations, biosecurity concerns, and high-uncertainty candidates
- Safety filtering — rejects any candidate that fails any gate; never silently degrades
- Ranking — candidates sorted by weighted composite score; assay plan and full report generated
The result is a ranked, explainable, auditable list of intervention candidates grounded in genome-level sequence plausibility — not just literature association.
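The last two pipeline steps (reject on any failed gate, then rank what survives) can be sketched as below. This is a minimal illustration, not the platform's code: the `Candidate` dataclass, its `gate_failures` field, and `filter_and_rank` are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Hypothetical candidate record; field names are assumptions."""
    name: str
    final_score: float
    gate_failures: list[str] = field(default_factory=list)


def filter_and_rank(candidates: list[Candidate]) -> tuple[list[Candidate], list[Candidate]]:
    """Split candidates into (accepted, rejected).

    A candidate with any gate failure is rejected outright; it is never
    kept with a lowered score. Accepted candidates are sorted best-first.
    """
    accepted = [c for c in candidates if not c.gate_failures]
    rejected = [c for c in candidates if c.gate_failures]
    accepted.sort(key=lambda c: c.final_score, reverse=True)
    return accepted, rejected
```

Returning the rejected list alongside the accepted one keeps the rejections visible for auditing instead of dropping them silently.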
Classical computational biology approaches to cell state analysis rely on curated gene regulatory networks, transcription factor binding databases, and pathway enrichment scores. These methods are powerful but limited: they can only reason about what has already been measured and annotated.
Genome foundation models — large neural networks trained on billions of base pairs of DNA — represent a qualitative shift. By learning the statistical structure of genomic sequences at scale, they develop internal representations that capture:
- Sequence plausibility — how likely a given DNA sequence is under the distribution of real genomic sequences
- Functional context — which sequence features associate with gene expression, chromatin accessibility, and regulatory activity
- Variant sensitivity — how a single nucleotide change alters the model's assessment of a locus
- Transferable embeddings — dense vector representations that can be used for downstream prediction tasks with relatively little labeled data
The analogy to large language models is direct. Just as GPT-scale models learn the statistical structure of language and generalize to new tasks, genome foundation models learn the statistical structure of DNA and generalize to regulatory genomics problems they were never explicitly trained on.
Evo 2 is a genome foundation model developed by the Arc Institute, trained on a large corpus of prokaryotic and eukaryotic sequences at single-nucleotide resolution. Its largest checkpoint has 40 billion parameters, making it the largest publicly available genome model at the time of writing; smaller 7B and 1B checkpoints are also supported by this platform.
Key capabilities used in this platform:
| Operation | What It Computes | How It's Used |
|---|---|---|
| Sequence scoring | Mean log-likelihood of a DNA sequence under the model | Measures how "native" a candidate context sequence looks to the genome — high plausibility = the genome can produce this; low plausibility = unusual sequence that may not function as intended |
| Embedding | Dense vector representation of a sequence from an intermediate layer | Used to compute embedding feature scores and stored for future retrieval/comparison |
| Variant effect | Delta log-likelihood between a reference and alternate sequence | Quantifies the effect of a proposed edit relative to the reference context |
These scores enter the ranking formula as explicit weighted terms — Evo 2 is a first-class input to ranking, not an annotation added afterward.
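The sequence-scoring and variant-effect rows of the table reduce to simple arithmetic once per-position log-probabilities are available. The sketch below assumes those log-probabilities have already been obtained from a model; the function names and the sign convention (alternate minus reference) are assumptions made for illustration, not the service's API.

```python
from typing import Sequence


def mean_log_likelihood(token_log_probs: Sequence[float]) -> float:
    """Average per-nucleotide log-probability of one sequence."""
    if not token_log_probs:
        raise ValueError("need at least one scored position")
    return sum(token_log_probs) / len(token_log_probs)


def variant_effect_delta(
    ref_log_probs: Sequence[float],
    alt_log_probs: Sequence[float],
) -> float:
    """Delta log-likelihood, assumed here to be score(alt) - score(ref).

    Under this convention a negative value means the edited sequence
    looks less plausible to the model than the reference.
    """
    return mean_log_likelihood(alt_log_probs) - mean_log_likelihood(ref_log_probs)
```

Using the mean rather than the sum keeps scores comparable across context sequences of different lengths.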
target_state_similarity 0.26
identity_preservation 0.18
safety 0.22
manufacturability 0.10
evo2_sequence_plausibility 0.12 ← Evo 2 score
evo2_context_confidence 0.08 ← derived from Evo 2 uncertainty
evo2_embedding_feature_score 0.04 ← from Evo 2 embedding
uncertainty_penalty -0.15 ← Evo 2 uncertainty penalizes rank
The goal: candidates that look plausible to the genome itself rank higher than candidates that are merely mechanistically appealing on paper.
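As a rough illustration of how the weights above combine, the sketch below computes a composite score. It assumes every component score and the uncertainty value are already normalized to the range 0 to 1, and that the penalty is applied as 0.15 times the uncertainty; both are assumptions, and the signature is hypothetical rather than the pipeline's actual `compute_final_score`.

```python
# Positive weights copied from the table above; they sum to 1.00.
WEIGHTS = {
    "target_state_similarity": 0.26,
    "identity_preservation": 0.18,
    "safety": 0.22,
    "manufacturability": 0.10,
    "evo2_sequence_plausibility": 0.12,
    "evo2_context_confidence": 0.08,
    "evo2_embedding_feature_score": 0.04,
}
UNCERTAINTY_PENALTY = 0.15


def compute_final_score(components: dict[str, float], uncertainty: float) -> float:
    """Weighted sum of component scores minus an uncertainty penalty.

    Raises KeyError if any weighted component is missing, so a partial
    score is never silently produced.
    """
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise KeyError(f"missing score components: {sorted(missing)}")
    weighted = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    return weighted - UNCERTAINTY_PENALTY * uncertainty
```

With these weights the three Evo 2 terms contribute up to 0.24 of the positive score, and a fully uncertain candidate loses 0.15.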
When no CUDA GPU is available, the genome model service automatically falls back to a CPU composition scorer: a 4-mer background frequency model built from human genome composition. This is genuine bioinformatics computation (the same kind of statistical background model used by tools like FIMO and HOMER for sequence background scoring), clearly labeled provider: cpu_composition in all outputs. It is not a mock and it is not silent: every result tells you which scoring method was used.
On GPU hardware with Evo 2 installed, all scoring switches automatically to real neural network inference.
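To make the CPU fallback concrete, here is a minimal sketch of 4-mer background scoring. It is not the service's implementation: the background table is estimated from whatever reference string is passed in (a toy stand-in for human genome composition), and the function names and the add-one pseudocount are assumptions.

```python
import math
from collections import Counter
from itertools import product

K = 4  # k-mer length used by the composition scorer


def kmer_background(reference: str, k: int = K, pseudocount: float = 1.0) -> dict[str, float]:
    """Estimate k-mer frequencies from a reference, smoothed with a pseudocount."""
    reference = reference.upper()
    counts = Counter(reference[i:i + k] for i in range(len(reference) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: (counts[m] + pseudocount) / total for m in kmers}


def score_sequence(seq: str, background: dict[str, float], k: int = K) -> float:
    """Mean log-probability of the sequence's k-mers under the background.

    k-mers containing characters outside ACGT (for example N) are skipped.
    """
    seq = seq.upper()
    log_probs = [
        math.log(background[seq[i:i + k]])
        for i in range(len(seq) - k + 1)
        if seq[i:i + k] in background
    ]
    if not log_probs:
        raise ValueError("sequence has no scorable k-mers")
    return sum(log_probs) / len(log_probs)
```

A sequence whose 4-mers are common in the reference scores closer to zero than one built from rare 4-mers, which is the sense in which this scorer measures composition rather than function.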
┌─────────────────────────────────────────────────────────────┐
│ Browser │
│ Next.js 14 App Router · TypeScript · Tailwind │
│ TanStack Query · Recharts · Radix UI │
│ localhost:3000 │
└───────────────────────┬─────────────────────────────────────┘
│ REST / JSON
┌───────────────────────▼─────────────────────────────────────┐
│ FastAPI (Python 3.11) localhost:8000 │
│ JWT auth · SQLAlchemy 2 · Alembic │
│ Compiler pipeline · Audit logging │
└──────┬──────────────────────────┬────────────────────────────┘
│ RQ job queue │ httpx calls
▼ ▼
┌──────────────┐ ┌─────────────────────────────────────────┐
│ RQ Worker │ │ Genome Model Service :8100 │
│ (Python) │ │ FastAPI · Safety gateway │
│ │ │ ┌─────────────────────────────────┐ │
│ │ │ │ Evo 2 (GPU) OR CPU scorer │ │
│ │ │ │ arcinstitute/evo2_40b │ │
│ │ │ │ arcinstitute/evo2_7b │ │
│ │ │ │ arcinstitute/evo2_1b_base │ │
│ │ │ └─────────────────────────────────┘ │
└──────────────┘ └─────────────────────────────────────────┘
│
▼
┌──────────────┐ ┌──────────────┐
│ PostgreSQL │ │ Redis │
│ + pgvector │ │ (RQ broker) │
│ :5432 │ │ :6379 │
└──────────────┘ └──────────────┘
| Service | Technology | Purpose |
|---|---|---|
| `web` | Next.js 14, TypeScript, Tailwind | Full UI: projects, states, compile, candidates, reports |
| `api` | FastAPI, SQLAlchemy 2, pgvector | REST API, auth, compiler pipeline orchestration |
| `worker` | Python, RQ | Async compile job execution |
| `genome-model-service` | FastAPI | Evo 2 inference: scoring, embedding, variant effect |
| `postgres` | PostgreSQL 16 + pgvector | All relational data + 384-dim vector columns |
| `redis` | Redis 7 | RQ job queue |
users · organizations · organization_members
projects
cell_states (vector(384))
target_states (vector(384))
constraint_sets
compile_jobs
genome_assets
candidate_payloads
state_trajectories
risk_assessments
evo2_model_runs ← one record per Evo 2 API call
assay_plans
reports
experiments
audit_logs
compile_cell_program(request, db):
  1. screen_text_fields()              # biosecurity gate
  2. check_evo2_health()               # hard fail if service down
  3. encode_cell_state() → vec384      # marker + pathway + label encoding
  4. generate_candidates()             # 5-6 modalities × target objectives
  5. for each candidate:
       build_genome_context()          # ground to real DNA sequence
       store GenomeAsset
       score_candidate_with_evo2()     # → ScoreSequenceResponse
       embed_candidate_with_evo2()     # → EmbedSequenceResponse
       store Evo2ModelRun records
       predict_trajectory()            # deterministic state path
       assess_risk()
       apply_safety_filter()
       compute_final_score()           # weighted formula
  6. rank_candidates()
  7. generate_assay_plan()
  8. generate_report()
  9. write_audit_logs()

- Docker Desktop 4.x or Docker Engine 24+ with Compose V2
- 16 GB RAM minimum (CPU-only mode)
GPU requirements for Evo 2:
| Model | VRAM | Notes |
|---|---|---|
| `evo2_1b_base` | ~4 GB | CPU also works (slow, ~60s/sequence) |
| `evo2_7b` | ~16 GB | Single A100/H100 40 GB |
| `evo2_40b` | ~80 GB | Two H100 80 GB, use docker-compose.gpu.yml |
git clone <repo>
cd cell-state-compiler
cp .env.example .env

Key .env settings:
# For CPU-only local development:
EVO2_MODEL_NAME=evo2_1b_base
EVO2_DEVICE=cpu
# For GPU with 40B model:
EVO2_MODEL_NAME=evo2_40b
EVO2_DEVICE=cuda:0
HUGGINGFACE_TOKEN=hf_... # required if repo is gated

# Ubuntu / Debian
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list \
| sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

CPU (development, any machine):

docker compose up --build

GPU with Evo 2 40B:

docker compose -f docker-compose.yml -f docker-compose.gpu.yml up --build

The docker-compose.gpu.yml override:
- Uses `Dockerfile.gpu` (CUDA 12.4 + flash-attn + evo2)
- Sets `EVO2_MODEL_NAME=evo2_40b`
- Allocates all available GPUs to the genome model service
- Evo 2 weights download automatically from HuggingFace on first start (~80 GB for 40B)
# Wait for the api container to be healthy, then:
docker compose exec api python /scripts/seed_demo_data.py

This creates:
- Admin user: `demo@cellcompiler.local` / `password123`
- Demo organization, project, cell state, target state, constraint set
# All services healthy
docker compose ps
# API
curl http://localhost:8000/health
# Evo 2 status
curl http://localhost:8100/v1/evo2/health | python3 -m json.tool
# Frontend
open http://localhost:3000

| URL | Description |
|---|---|
| `http://localhost:3000` | Web application |
| `http://localhost:8000/docs` | FastAPI interactive docs |
| `http://localhost:8100/docs` | Genome model service docs |
- Log in at `http://localhost:3000` with demo credentials or create an account
- Create a project: name, cell type, disease context
- Define the starting state: marker profile (CD8, PD1, TOX, TCF7, ...), pathway scores, state labels
- Define the target state: desired markers, functional objectives
- Configure constraints: allowed modalities, forbidden mechanisms, risk thresholds
- Run compile: click Compile Cell Program; the job runs asynchronously in the worker
- Review candidates: ranked table with Evo 2 scores, plausibility, uncertainty, trajectory
- Inspect each candidate: Overview, Scores, Evo2 Analysis, Trajectory, Risk, Assay Plan, Audit tabs
- Export report: full markdown + JSON compile report
curl http://localhost:8100/v1/evo2/health | python3 -m json.tool

GPU with Evo 2 loaded:
{
"healthy": true,
"provider": "local",
"model_name": "evo2_40b",
"device": "cuda:0",
"cuda_available": true,
"model_loaded": true,
"smoke_test_passed": true
}

CPU-only (4-mer composition scoring):
{
"healthy": true,
"provider": "cpu_composition",
"model_name": "4mer_human_background",
"device": "cpu",
"cuda_available": false,
"model_loaded": true,
"details": {
"evo2_available": false,
"scoring_mode": "cpu_composition"
}
}

If you have access to NVIDIA's hosted Evo 2 NIM endpoint:
EVO2_PROVIDER=nvidia_nim
NVIDIA_NIM_API_KEY=your_key
NVIDIA_NIM_EVO2_URL=https://your-nim-endpoint

The NIM adapter calls the remote endpoint for all scoring and embedding operations. If an operation is unsupported by the endpoint, the adapter returns 501; it never fabricates results.
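Whichever provider is configured, the health payload reports which scoring method is active. A client could summarize it along the lines below; this is a sketch that relies only on the `healthy`, `provider`, `model_name`, and `device` fields shown in the example responses, and the function name and output format are hypothetical.

```python
def describe_scoring_mode(health: dict) -> str:
    """Summarize a /v1/evo2/health payload as a short label.

    Raises RuntimeError when the service reports itself unhealthy, so a
    caller cannot proceed on a broken backend by accident.
    """
    if not health.get("healthy", False):
        raise RuntimeError("genome model service reports unhealthy")
    provider = health.get("provider", "unknown")
    if provider == "cpu_composition":
        return "cpu_composition (4-mer background, no neural inference)"
    model = health.get("model_name", "unknown")
    device = health.get("device", "unknown")
    return f"{provider}:{model} on {device}"
```

Logging this label next to each compile job makes it easy to tell afterwards whether scores came from Evo 2 or from the CPU fallback.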
Safety is structural, not advisory:
- Biosecurity text gate: compile requests are rejected before computation if any text field contains pathogen, virus, toxin, virulence, immune evasion, gain of function, bioweapon, weapon, replication competent, or gain-of-function terms
- Sequence safety gateway: the genome model service screens every DNA sequence before it reaches Evo 2; blocked sequences are rejected, never silently passed
- Candidate safety filter: candidates with a blocked Evo 2 safety status, hard constraint violations, or a biosecurity-flagged risk class are removed from results; they are not scored lower, they are rejected
- Generation disabled by default: the sequence generation endpoint requires `ENABLE_EVO2_GENERATION=true` AND a purpose-specific safety gate pass before any generation occurs
- No mock results: CI has a grep check that fails if `MockEvo2` or `mock_evo2` appears anywhere in the codebase; there is no path through the system that returns fabricated scores
- Full audit log: every action (login, project creation, compile job start/complete/fail, candidate view) is written to `audit_logs` with user ID, timestamp, and entity reference
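The biosecurity text gate in the first bullet can be pictured as a term screen over every text field. The sketch below uses the term list from that bullet; the case-insensitive substring matching, the signature, and the return shape are assumptions for illustration, not the platform's actual `screen_text_fields`.

```python
# Terms copied from the biosecurity text gate description above.
BLOCKED_TERMS = (
    "pathogen", "virus", "toxin", "virulence", "immune evasion",
    "gain of function", "bioweapon", "weapon", "replication competent",
    "gain-of-function",
)


def screen_text_fields(fields: dict[str, str]) -> list[tuple[str, str]]:
    """Return (field_name, matched_term) pairs; an empty list means pass.

    Matching is a case-insensitive substring test, which errs on the side
    of over-blocking (for example, "bioweapon" also matches "weapon").
    """
    hits = []
    for name, text in fields.items():
        lowered = text.lower()
        for term in BLOCKED_TERMS:
            if term in lowered:
                hits.append((name, term))
    return hits
```

A caller would reject the compile request whenever the returned list is non-empty, before any candidate generation or scoring runs.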
Disclaimer shown on every Evo 2 result:
Evo 2 scores are model-derived research signals, not biological validation. Experimental confirmation required before any research decision.
cell-state-compiler/
apps/
api/ FastAPI backend (Python 3.11)
app/
compiler/ Nine-step compile pipeline
models/ SQLAlchemy ORM models
api/routes/ REST endpoints
services/ Evo2 client, audit service
jobs/ RQ task definitions
migrations/ Alembic migrations
worker/ RQ worker process
genome-model-service/
app/
adapters/ LocalEvo2Adapter, CpuCompositionScorer, NvidiaEvo2NimAdapter
services/ Routing and safety application
safety.py SequenceSafetyGateway
model_health.py Health check with fallback logic
web/ Next.js 14 frontend
app/ App Router pages
components/ Shared components
hooks/ TanStack Query hooks
lib/ API client, auth utilities
scripts/
seed_demo_data.py
reset_db.py
check_evo2_runtime.py
data/demo_sequences/ Safe synthetic DNA for smoke tests
docker-compose.yml
docker-compose.gpu.yml
# Backend
cd apps/api
pip install -e ".[test]"
pytest app/tests/ -v
# Genome model service
cd apps/genome-model-service
pip install -e ".[test]"
pytest app/tests/ -v
# Frontend type check
cd apps/web
npm run build

grep -r "MockEvo2\|mock_evo2" --include="*.py" --include="*.ts" . \
&& echo "FAIL: mocks found" || echo "PASS"

# Start infra only
docker compose up postgres redis genome-model-service
# API
cd apps/api
pip install -e .
alembic upgrade head
uvicorn app.main:app --reload --port 8000
# Worker (separate terminal)
cd apps/worker
python worker.py
# Frontend
cd apps/web
npm install
npm run dev

| Variable | Default | Description |
|---|---|---|
| `EVO2_PROVIDER` | `local` | `local` or `nvidia_nim` |
| `EVO2_MODEL_NAME` | `evo2_1b_base` | Evo 2 checkpoint: `evo2_1b_base`, `evo2_7b`, `evo2_40b` |
| `EVO2_DEVICE` | `cpu` | `cpu` or `cuda:0` |
| `EVO2_HEALTH_RUN_SMOKE_TEST` | `true` | Run inference smoke test on health check |
| `EVO2_MAX_CONTEXT_LENGTH` | `8192` | Max tokens per forward pass |
| `HUGGINGFACE_TOKEN` | (none) | HuggingFace token for downloading gated models |
| `ENABLE_EVO2_GENERATION` | `false` | Enable sequence generation endpoint |
| `SEQUENCE_SAFETY_MODE` | `restricted` | Safety screening strictness |
| `MAX_SEQUENCE_LENGTH` | `8192` | Max DNA sequence length accepted |
| `NVIDIA_NIM_API_KEY` | (none) | API key for NVIDIA NIM endpoint |
| `NVIDIA_NIM_EVO2_URL` | (none) | NVIDIA NIM Evo 2 base URL |
| `JWT_SECRET` | (none) | Secret for JWT signing (change in production) |
| `DATABASE_URL` | (none) | PostgreSQL connection string |
| `REDIS_URL` | (none) | Redis connection string |
| `GENOME_MODEL_SERVICE_URL` | `http://genome-model-service:8100` | Internal URL for API → genome service calls |
See .env.example for all variables with comments.
Evo 2 requires a CUDA GPU and flash-attn. Without GPU hardware, the genome model service automatically falls back to CPU composition scoring. To run neural network inference, use the GPU compose override:
docker compose -f docker-compose.yml -f docker-compose.gpu.yml up --build

Evo 2 weights download from HuggingFace on first start. Weights are cached in the hf-model-cache Docker volume, so subsequent starts skip the download. Ensure you have enough disk space:
- evo2_1b_base: ~2 GB
- evo2_7b: ~14 GB
- evo2_40b: ~80 GB
For gated repositories, set HUGGINGFACE_TOKEN in .env.
Switch to a smaller model:
EVO2_MODEL_NAME=evo2_7b # or evo2_1b_base

Or use device_map="auto" across multiple GPUs (enabled automatically when torch.cuda.device_count() > 1).
The worker container must be running and connected to Redis. Check:
docker compose logs worker
docker compose exec redis redis-cli ping

Typically a 422 validation error from the API. Open browser DevTools → Network tab to see the raw error response.
- Retrieval-augmented candidate generation using pgvector similarity search across historical experiments
- Evo 2 variant effect scoring to rank specific nucleotide edits, not just genomic contexts
- Prediction vs. observation comparison with quantitative error analysis once wet lab results are uploaded
- Support for additional foundation models (Nucleotide Transformer, HyenaDNA, DNABERT-2) as alternative or ensemble backends
- Multi-user organizations with role-based access control
- Export candidates as structured protocols for lab automation (Opentrons, Hamilton)