## 1. Prerequisites: Create a virtual environment and install ONNX Runtime GenAI

```bash
# Install onnxruntime-genai, olive, and dependencies for CPU
python -m venv .venv && source .venv/bin/activate
pip install requests numpy --pre onnxruntime-genai olive-ai
```

```bash
# Install onnxruntime-genai, olive, and dependencies for CUDA GPUs
python -m venv .venv && source .venv/bin/activate
pip install requests numpy --pre onnxruntime-genai-cuda "olive-ai[gpu]"
```
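
To confirm the install worked, you can run a quick import check (an optional sketch, not part of the original steps; it assumes the package exposes `__version__`):

```python
# Optional sanity check: verify the runtime imports after installation.
import onnxruntime_genai as og

print(og.__version__)  # assumes the package exposes __version__
```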

## 2. Acquire model

Choose your model and convert it to ONNX. Many other LLMs work with this flow, so feel free to try other models too:

```bash
# Use Olive auto-opt to pull a Hugging Face model, optimize it for CPU, and quantize it to INT4 using RTN
olive auto-opt --model_name_or_path deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --output_path ./deepseek-r1-distill-qwen-1.5B --device cpu --provider CPUExecutionProvider --precision int4 --use_model_builder --log_level 1
```

```bash
# Use Olive auto-opt to pull a Hugging Face model, optimize it for CUDA GPUs, and quantize it to INT4 using RTN
olive auto-opt --model_name_or_path deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --output_path ./deepseek-r1-distill-qwen-1.5B --device gpu --provider CUDAExecutionProvider --precision int4 --use_model_builder --log_level 1
```
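
For context, RTN (round-to-nearest) quantization maps each group of weights onto a small integer grid with a shared scale, shrinking the model's memory footprint. A toy NumPy illustration of the idea (illustrative only, not Olive's actual implementation):

```python
# Toy round-to-nearest (RTN) INT4 quantization; illustrative only, not Olive's code.
import numpy as np

w = np.random.randn(8).astype(np.float32)  # one group of float weights
scale = np.abs(w).max() / 7.0              # map the largest magnitude into the int4 range
q = np.clip(np.round(w / scale), -8, 7)    # round each weight to the nearest int4 level
w_hat = q * scale                          # dequantize to get the approximation used at runtime
print("max quantization error:", np.abs(w - w_hat).max())
```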

Alternatively, download a pre-converted model directly with the Hugging Face CLI:

```bash
# Download the model directly using the Hugging Face CLI
huggingface-cli download onnxruntime/DeepSeek-R1-Distill-ONNX --include 'deepseek-r1-distill-qwen-1.5B/*' --local-dir .
```
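
Either path should leave an ONNX Runtime GenAI model folder on disk. A quick way to check (a sketch; the exact file layout is an assumption based on the paths used below):

```python
# Hypothetical check that the expected model artifacts exist; adjust the path if needed.
from pathlib import Path

model_dir = Path("deepseek-r1-distill-qwen-1.5B/model")  # path assumed from the commands in this guide
for name in ("genai_config.json", "tokenizer.json"):
    status = "found" if (model_dir / name).exists() else "MISSING"
    print(f"{name}: {status}")
```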

## 3. Play with your model on device!

The `model-chat.py` example script runs an interactive chat loop; the `--chat_template` argument wraps your input in the prompt format the model expects.

```bash
# CPU chat inference. If you pulled the model from Hugging Face, adjust the model directory (-m) accordingly
curl -O https://raw.githubusercontent.com/microsoft/onnxruntime-genai/refs/heads/main/examples/python/model-chat.py
python model-chat.py -m deepseek-r1-distill-qwen-1.5B/model -e cpu --chat_template "<|begin▁of▁sentence|><|User|>{input}<|Assistant|>"
```

```bash
# GPU chat inference. If you pulled the model from Hugging Face, adjust the model directory (-m) accordingly
curl -O https://raw.githubusercontent.com/microsoft/onnxruntime-genai/refs/heads/main/examples/python/model-chat.py
python model-chat.py -m deepseek-r1-distill-qwen-1.5B/model -e cuda --chat_template "<|begin▁of▁sentence|><|User|>{input}<|Assistant|>"
```
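
If you would rather call the runtime from your own script instead of `model-chat.py`, a minimal generation loop looks roughly like this (a sketch against the ONNX Runtime GenAI Python API; method names can differ between releases, so check the version you installed):

```python
# Minimal token-streaming sketch with onnxruntime-genai; API details may vary by release.
import onnxruntime_genai as og

model = og.Model("deepseek-r1-distill-qwen-1.5B/model")  # same model folder as above
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

prompt = "<|begin▁of▁sentence|><|User|>Why is the sky blue?<|Assistant|>"
params = og.GeneratorParams(model)
params.set_search_options(max_length=2048)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode(prompt))

# Generate one token at a time and stream the decoded text to stdout.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```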
