
Commit 23731b8

added cmd instructions into blog.
1 parent 4e9220e commit 23731b8

File tree

  • src/routes/blogs/deepseek-r1-on-device

1 file changed: +24 -2 lines changed

src/routes/blogs/deepseek-r1-on-device/+page.svx

Lines changed: 24 additions & 2 deletions
@@ -13,9 +13,10 @@ Are you a developer looking to harness the power of your users' local compute fo
 
 Building on the recent ability to run models on [Copilot+PCs on NPUs](https://blogs.windows.com/windowsdeveloper/2025/01/29/running-distilled-deepseek-r1-models-locally-on-copilot-pcs-powered-by-windows-copilot-runtime/), you can now efficiently run these models on CPU and GPU devices as well. You can now download and run the ONNX optimized variants of the models from [Hugging Face](https://huggingface.co/onnxruntime/DeepSeek-R1-Distill-ONNX).
 
-The DeepSeek ONNX models enables you to run DeepSeek on any GPU or CPU, achieving performance speeds 1.3 to 6.3 times faster than native PyTorch. To easily get started with the model, you can use our ONNX Runtime `Generate()` API. See instructions for CPU, GPU (CUDA, DML) [here](https://github.com/microsoft/onnxruntime/blob/gh-pages/docs/genai/tutorials/deepseek-python.md).
+
 
 
 ## Download and run your models easily!
+The DeepSeek ONNX models enable you to run DeepSeek on any GPU or CPU, at speeds 1.3 to 6.3 times faster than native PyTorch. To get started with the model quickly, you can use our ONNX Runtime `Generate()` API.
 <!-- Video Embed -->
 <div>
 <iframe
@@ -27,8 +28,29 @@ The DeepSeek ONNX models enables you to run DeepSeek on any GPU or CPU, achievin
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 allowfullscreen
 />
-</div>
+</div>
+
+### Quickstart on CPU
+
+
+Installing onnxruntime-genai, Olive, and their dependencies for CPU in a virtual environment:
+```bash
+python -m venv .venv && source .venv/bin/activate
+pip install requests numpy --pre onnxruntime-genai olive-ai
+```
+
+Download the model directly using the Hugging Face CLI:
+```bash
+huggingface-cli download onnxruntime/DeepSeek-R1-Distill-ONNX --include "deepseek-r1-distill-qwen-1.5B/*" --local-dir ./
+```
+
+CPU chat inference. If you pulled the model from Hugging Face, adjust the model directory (-m) accordingly:
+```bash
+wget https://raw.githubusercontent.com/microsoft/onnxruntime-genai/refs/heads/main/examples/python/model-chat.py
+python model-chat.py -m deepseek-r1-distill-qwen-1.5B/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4 -e cpu
+```
 
+See instructions for GPU (CUDA, DML) [here](https://github.com/microsoft/onnxruntime/blob/gh-pages/docs/genai/tutorials/deepseek-python.md).
 ## ONNX Model Performance Improvements
 
 ONNX enables you to run your models on-device across CPU, GPU, and NPU. With ONNX you can run your models on any machine, across all silicon: Qualcomm, AMD, Intel, NVIDIA. See the table below for key benchmarks on Windows GPU and CPU devices.
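To check which of those execution targets your local ONNX Runtime build can actually see, you can list its execution providers. This is a minimal sketch using the standard onnxruntime Python API; it assumes the onnxruntime package is present in the environment (install it with `pip install onnxruntime` if the packages from the quickstart did not pull it in).

```python
import onnxruntime as ort

# Lists the execution providers this onnxruntime build supports,
# e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider'] on a CUDA build
# or ['DmlExecutionProvider', 'CPUExecutionProvider'] with onnxruntime-directml.
print(ort.get_available_providers())
```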
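If you prefer to drive generation from your own Python code rather than the model-chat.py script in the quickstart above, the sketch below shows the rough shape of the onnxruntime-genai `Generate()` API. The model directory matches the CPU quickstart; the prompt template and the exact method names are assumptions (the API surface has shifted between onnxruntime-genai releases), so check them against the tutorial linked above.

```python
import onnxruntime_genai as og

# CPU int4 model directory from the quickstart; adjust if you downloaded elsewhere.
model_dir = "deepseek-r1-distill-qwen-1.5B/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4"

model = og.Model(model_dir)
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

# Assumed DeepSeek-R1 chat template; verify against the model card on Hugging Face.
prompt = "<｜User｜>Why is the sky blue?<｜Assistant｜>"

params = og.GeneratorParams(model)
params.set_search_options(max_length=2048)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode(prompt))

# Stream tokens to stdout as they are generated.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```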
