src/routes/blogs/deepseek-r1-on-device/+page.svx
Lines changed: 24 additions & 2 deletions
@@ -13,9 +13,10 @@ Are you a developer looking to harness the power of your users' local compute fo
Building on the recent ability to run models on [Copilot+ PCs on NPUs](https://blogs.windows.com/windowsdeveloper/2025/01/29/running-distilled-deepseek-r1-models-locally-on-copilot-pcs-powered-by-windows-copilot-runtime/), you can now efficiently run these models on CPU and GPU devices as well. The ONNX-optimized variants of the models are available to download and run from [Hugging Face](https://huggingface.co/onnxruntime/DeepSeek-R1-Distill-ONNX).
-The DeepSeek ONNX models enable you to run DeepSeek on any GPU or CPU, running 1.3 to 6.3 times faster than native PyTorch. To easily get started with the model, you can use our ONNX Runtime `Generate()` API. See instructions for CPU, GPU (CUDA, DML) [here](https://github.com/microsoft/onnxruntime/blob/gh-pages/docs/genai/tutorials/deepseek-python.md).
+
## Download and run your models easily!
+The DeepSeek ONNX models enable you to run DeepSeek on any GPU or CPU, running 1.3 to 6.3 times faster than native PyTorch. To easily get started with the model, you can use our ONNX Runtime `Generate()` API.
<!-- Video Embed -->
<div>
<iframe
@@ -27,8 +28,29 @@ The DeepSeek ONNX models enables you to run DeepSeek on any GPU or CPU, achievin
python model-chat.py -m deepseek-r1-distill-qwen-1.5B/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4 -e cpu
+
```
+See instructions for GPU (CUDA, DML) [here](https://github.com/microsoft/onnxruntime/blob/gh-pages/docs/genai/tutorials/deepseek-python.md).
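
Reviewer sketch (not part of the diff): the CPU command in this hunk differs from a GPU run only in the model directory passed to `-m` and the provider passed to `-e`. The helper below is a minimal, hypothetical wrapper that assembles that command line. The name `build_chat_command` is invented for illustration, and the provider strings `cuda` and `dml` are assumptions inferred from the "GPU (CUDA, DML)" wording; only `cpu` appears in the post itself.

```python
import shlex
import sys

# Assumption: only "cpu" is confirmed by the command in the post;
# "cuda" and "dml" are guessed from the "GPU (CUDA, DML)" wording.
KNOWN_PROVIDERS = ("cpu", "cuda", "dml")


def build_chat_command(model_dir, provider="cpu", script="model-chat.py"):
    """Return the argv list that launches the chat script for one provider."""
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown execution provider: {provider!r}")
    # -m selects the model directory, -e the execution provider.
    return [sys.executable, script, "-m", model_dir, "-e", provider]


if __name__ == "__main__":
    cmd = build_chat_command(
        "deepseek-r1-distill-qwen-1.5B/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4",
        provider="cpu",
    )
    # Print a shell-safe version of the command instead of running it.
    print(shlex.join(cmd))
```

Returning an argv list rather than a single string keeps the result safe to hand to `subprocess.run` without shell quoting issues.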
## ONNX Model Performance Improvements
ONNX enables you to run your models on-device across CPU, GPU, and NPU. With ONNX, you can run your models on any machine, across silicon from Qualcomm, AMD, Intel, and NVIDIA. See the table below for some key benchmarks on Windows GPU and CPU devices.