Production-ready deployments for DeepSeek-R1 (671B MoE) across multiple backends and hardware configurations.
| Configuration | GPUs | Backend | Mode | Description |
|---|---|---|---|---|
| sglang/disagg-8gpu | 16x H200 | SGLang | Disaggregated WideEP | TP=8 per worker, single-node |
| sglang/disagg-16gpu | 32x H200 | SGLang | Disaggregated WideEP | TP=16 per worker, multi-node |
| trtllm/disagg/wide_ep/gb200 | 36x GB200 | TensorRT-LLM | Disaggregated WideEP | 8 decode + 1 prefill nodes |
| vllm/disagg | 32x H200 | vLLM | Disaggregated DEP16 | Multi-node, data-expert parallel |
- Dynamo Platform installed — See Kubernetes Deployment Guide
- GPU cluster with H200 or GB200 GPUs matching the configuration requirements
- HuggingFace token with access to DeepSeek models
- High-bandwidth networking — InfiniBand or RoCE recommended for multi-node deployments
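Before deploying, it can help to confirm the cluster actually exposes the GPUs you expect. A minimal sanity check, assuming the NVIDIA GPU Operator (or device plugin) is installed and advertises the `nvidia.com/gpu` resource:

```bash
# List nodes with their allocatable GPU count
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
```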
```bash
# Set namespace
export NAMESPACE=dynamo-demo
kubectl create namespace ${NAMESPACE}

# Create HuggingFace token secret
kubectl create secret generic hf-token-secret \
  --from-literal=HF_TOKEN="your-token-here" \
  -n ${NAMESPACE}

# Download model (update storageClassName in model-cache.yaml first!)
# For SGLang deployments:
kubectl apply -f model-cache/model-cache.yaml -n ${NAMESPACE}
kubectl apply -f model-cache/model-download-sglang.yaml -n ${NAMESPACE}

# For vLLM/TRT-LLM deployments:
kubectl apply -f model-cache/model-cache.yaml -n ${NAMESPACE}
kubectl apply -f model-cache/model-download.yaml -n ${NAMESPACE}

# Wait for download (this is a large model - may take 1+ hours)
# For SGLang: kubectl wait --for=condition=Complete job/model-download-sglang ...
# For vLLM/TRT-LLM: kubectl wait --for=condition=Complete job/model-download ...
kubectl wait --for=condition=Complete job/model-download-sglang -n ${NAMESPACE} --timeout=7200s
```
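The wait command blocks silently, so in another terminal you may want to follow the download job itself (shown for the SGLang job; substitute `job/model-download` for vLLM/TRT-LLM):

```bash
# Stream logs from the running download job to watch progress
kubectl logs -f job/model-download-sglang -n ${NAMESPACE}
```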
```bash
# Deploy (choose one configuration)
kubectl apply -f sglang/disagg-8gpu/deploy.yaml -n ${NAMESPACE}
```
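The manifest creates frontend and worker pods that can take several minutes to schedule, pull images, and load weights. A simple way to watch progress:

```bash
# Watch pods until the frontend and all workers are Running/Ready
kubectl get pods -n ${NAMESPACE} -w
```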
```bash
# Port-forward the frontend (service name varies by deployment)
kubectl port-forward svc/sgl-dsr1-8gpu-frontend 8000:8000 -n ${NAMESPACE}

# Send a test request
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'
```
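If the basic request succeeds, a streaming variant is handy for eyeballing token latency. This sketch assumes the frontend honors the OpenAI-compatible `stream` flag (as the `/v1/chat/completions` route suggests):

```bash
# -N disables curl's output buffering so streamed chunks appear as they arrive
curl -N http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [{"role": "user", "content": "Count to five."}],
    "max_tokens": 100,
    "stream": true
  }'
```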
- Model: deepseek-ai/DeepSeek-R1
- Architecture: 671B-parameter Mixture-of-Experts (MoE)
- Active parameters: ~37B per token
- Recommended: FP8 quantization for production deployments
DeepSeek-R1 is a very large model requiring significant GPU memory:
| Configuration | Min GPU Memory | Recommended |
|---|---|---|
| 16x H200 (SGLang TP=8) | 1.1TB total | H200 SXM (141GB each) |
| 32x H200 (SGLang TP=16, vLLM) | 2.2TB total | H200 SXM (141GB each) |
| 36x GB200 (TRT-LLM) | ~2.5TB total | GB200 NVL72 |
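As a back-of-envelope check on these figures (an estimate, assuming FP8 weights at roughly one byte per parameter): 671B parameters come to about 671 GB of weights alone, and a TP=8 worker on H200s has 8 × 141 GB ≈ 1.1 TB of HBM, so weights occupy roughly 60% of it before any KV cache, activations, or CUDA overhead. This is why the minimums above leave limited headroom for long-context workloads.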
- Model download time: DeepSeek-R1 is ~1.3TB; expect 1-2 hours for download
- NCCL errors: Usually indicate OOM. Reduce `--mem-fraction-static` in worker args
- Multi-node: Requires InfiniBand/IBGDA enabled. See vLLM EP docs
- Storage class: Update `storageClassName` in `model-cache/model-cache.yaml` before deploying (see the sketch below)
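One way to handle the storage class update before deploying (a sketch; `your-storage-class` is a placeholder for a class that exists in your cluster):

```bash
# See which storage classes the cluster offers
kubectl get storageclass

# Point the model cache PVC at yours (placeholder name below)
sed -i 's/storageClassName: .*/storageClassName: your-storage-class/' model-cache/model-cache.yaml
```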
- SGLang: Uses WideEP (Wide Expert Parallelism) for efficient MoE inference; see sglang/README.md for SGLang-specific configuration
- TensorRT-LLM (GB200): Requires an FP4-quantized checkpoint and includes GB200-specific optimizations
- vLLM: Uses DEP (Data-Expert Parallelism) with hybrid load balancing; see vllm/disagg/README.md for detailed setup