Serverless GPU cloud platform for running ML workloads. Use when you need on-demand GPU access without infrastructure management, deploying ML models as APIs, or running batch jobs with automatic scaling.
version: 1.0.0
author: Orchestra Research
license: MIT
tags: Infrastructure, Serverless, GPU, Cloud, Deployment, Modal
dependencies: modal>=0.64.0
Modal Serverless GPU
Comprehensive guide to running ML workloads on Modal's serverless GPU cloud platform.
When to use Modal
Use Modal when:
Running GPU-intensive ML workloads without managing infrastructure
Deploying ML models as auto-scaling APIs
Running batch processing jobs (training, inference, data processing)
Needing pay-per-second GPU pricing without idle costs
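To make the pay-per-second point concrete, the sketch below compares billing only for active seconds with keeping a dedicated GPU instance up all day. The hourly rate and the workload figures are hypothetical placeholders chosen for easy arithmetic, not Modal's actual prices.

```python
# Hypothetical numbers for illustration only; not Modal's actual pricing.
HOURLY_RATE_USD = 3.00  # assumed price of one GPU-hour
SECONDS_PER_HOUR = 3600


def per_second_cost(active_seconds: float, hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Cost when billed only for the seconds a container is actually running."""
    return active_seconds * hourly_rate / SECONDS_PER_HOUR


def always_on_cost(hours_provisioned: float, hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Cost of a dedicated instance billed for every provisioned hour."""
    return hours_provisioned * hourly_rate


# A workload that is busy for 90 minutes spread over a 24-hour day.
busy_seconds = 90 * 60
serverless = per_second_cost(busy_seconds)
dedicated = always_on_cost(24)

print(f"per-second billing: ${serverless:.2f}")
print(f"always-on instance: ${dedicated:.2f}")
```

The gap narrows as utilization rises, so the comparison is worth redoing with real rates and a realistic duty cycle before choosing a platform.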
# Single GPU
@app.function(gpu="A100")

# Specific memory variant
@app.function(gpu="A100-80GB")

# Multiple GPUs (up to 8)
@app.function(gpu="H100:4")

# GPU with fallbacks
@app.function(gpu=["H100", "A100", "L40S"])

# Any available GPU
@app.function(gpu="any")
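The GPU strings above follow a `TYPE` or `TYPE:COUNT` shape, and a list expresses fallbacks in preference order. The sketch below spells that convention out in plain Python. `parse_gpu_spec` and `normalize_gpu_arg` are hypothetical helpers written for illustration; they are not part of the Modal SDK, which does its own parsing and validation.

```python
from typing import List, Tuple, Union

MAX_GPUS_PER_CONTAINER = 8  # upper bound stated in the text above


def parse_gpu_spec(spec: str) -> Tuple[str, int]:
    """Split a spec such as "H100:4" into (gpu_type, count); count defaults to 1."""
    gpu_type, _, count_text = spec.partition(":")
    if not gpu_type:
        raise ValueError(f"missing GPU type in {spec!r}")
    count = int(count_text) if count_text else 1
    if not 1 <= count <= MAX_GPUS_PER_CONTAINER:
        raise ValueError(f"GPU count must be 1..{MAX_GPUS_PER_CONTAINER}, got {count}")
    return gpu_type, count


def normalize_gpu_arg(gpu: Union[str, List[str]]) -> List[Tuple[str, int]]:
    """Accept one spec or a fallback list; return parsed specs in preference order."""
    specs = [gpu] if isinstance(gpu, str) else list(gpu)
    return [parse_gpu_spec(s) for s in specs]


print(normalize_gpu_arg("A100-80GB"))
print(normalize_gpu_arg("H100:4"))
print(normalize_gpu_arg(["H100", "A100", "L40S"]))
```

A check like this can be handy in a launcher script to catch typos before a job is submitted, though the authoritative list of GPU names is whatever the installed Modal version accepts.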
Container images
# Basic image with pip
image = modal.Image.debian_slim(python_version="3.11").pip_install(
    "torch==2.1.0", "transformers==4.36.0", "accelerate"
)

# From CUDA base
image = modal.Image.from_registry(
    "nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04",
    add_python="3.11"
).pip_install("torch", "transformers")

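# With system packages
# (OpenAI Whisper is published on PyPI as "openai-whisper"; the "whisper" package is unrelated)
image = modal.Image.debian_slim().apt_install("git", "ffmpeg").pip_install("openai-whisper")
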
@app.cls(gpu="A100")
class Model:
    @modal.enter()  # Run once at container start
    def load(self):
        self.model = load_model()  # Load during warm-up

    @modal.method()
    def predict(self, x):
        return self.model(x)
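The value of the class pattern above is that the expensive load runs once per container while `predict` runs once per request. A plain-Python analogue, with no Modal dependency, makes that lifecycle visible. `LocalModel`, its `load_calls` counter, and the doubling "model" are hypothetical stand-ins for illustration.

```python
class LocalModel:
    """Stand-in for the class above; `load` plays the role of the @modal.enter() hook."""

    load_calls = 0  # counts how often the expensive setup ran

    def load(self) -> None:
        # Pretend this is an expensive weight load; here it is a trivial doubling function.
        LocalModel.load_calls += 1
        self.model = lambda x: x * 2

    def predict(self, x: int) -> int:
        # Per-request work reuses the model that load() stored on the instance.
        return self.model(x)


# Simulate one container lifetime: set up once, then serve several requests.
container = LocalModel()
container.load()
outputs = [container.predict(x) for x in range(5)]

print(outputs)
print(LocalModel.load_calls)
```

On Modal the platform calls the enter hook for you when a container starts, so request latency after warm-up excludes the load time.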
Parallel processing
@app.function()
def process_item(item):
    return expensive_computation(item)

@app.function()
def run_parallel():
    items = list(range(1000))
    # Fan out to parallel containers
    results = list(process_item.map(items))
    return results
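The fan-out shape above can be tried locally before deploying. The sketch below is an analogue built on the standard library's thread pool rather than Modal containers, so it demonstrates the map-over-inputs pattern, not Modal's scheduling. The squaring function is a placeholder for `expensive_computation`, and `max_workers=8` is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def process_item(item: int) -> int:
    # Placeholder for expensive_computation from the text above.
    return item * item


def run_parallel(n_items: int = 1000, max_workers: int = 8) -> List[int]:
    items = list(range(n_items))
    # Executor.map applies process_item to every input concurrently
    # and yields the results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_item, items))


results = run_parallel()
print(len(results), results[:5])
```

Keeping `process_item` a pure function of its input, as here, is what makes the same code easy to move between a local pool and remote workers.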
Common configuration
@app.function(
    gpu="A100",
    memory=32768,  # 32GB RAM
    cpu=4,  # 4 CPU cores
    timeout=3600,  # 1 hour max
    container_idle_timeout=120,  # Keep warm 2 min
    retries=3,  # Retry on failure
    concurrency_limit=10,  # Max concurrent containers
)
def my_function():
    pass
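The numeric arguments above are raw integers, which makes unit mistakes easy: `memory=32768` is 32 GB expressed in MiB, and the timeouts are in seconds. A minimal sketch of unit helpers follows. `gib_to_mib`, `minutes`, and `hours` are hypothetical convenience functions, not part of Modal.

```python
MIB_PER_GIB = 1024
SECONDS_PER_MINUTE = 60
SECONDS_PER_HOUR = 3600


def gib_to_mib(gib: float) -> int:
    """Convert GiB to the MiB integer used for the memory argument."""
    return int(gib * MIB_PER_GIB)


def minutes(n: float) -> int:
    """Convert minutes to whole seconds."""
    return int(n * SECONDS_PER_MINUTE)


def hours(n: float) -> int:
    """Convert hours to whole seconds."""
    return int(n * SECONDS_PER_HOUR)


# Reproduces the values used in the configuration above.
settings = {
    "memory": gib_to_mib(32),
    "timeout": hours(1),
    "container_idle_timeout": minutes(2),
}
print(settings)
```

A dictionary like `settings` could be unpacked into the decorator with `**settings`, assuming the keyword names match the Modal version in use.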