Commit c9751ec

Made changes, used new video.
1 parent: 3109b76

src/routes/blogs/deepseek-r1-on-device/+page.svx

Lines changed: 19 additions & 16 deletions

image: 'https://iili.io/2yV40bV.png'
imageSquare: 'https://iili.io/2yV40bV.png'
url: 'https://onnxruntime.ai/blogs/deepseek-r1-on-device'
---

Are you a developer looking to harness the power of your users' local compute for AI inferencing on PCs with NPUs, GPUs, and CPUs? Look no further!

Building on the recent ability to run models on [Copilot+ PCs with NPUs](https://blogs.windows.com/windowsdeveloper/2025/01/29/running-distilled-deepseek-r1-models-locally-on-copilot-pcs-powered-by-windows-copilot-runtime/), you can now efficiently run these models on CPU and GPU devices as well. You can download and run the ONNX-optimized variants of the models from [Hugging Face](https://huggingface.co/onnxruntime/DeepSeek-R1-Distill-ONNX).
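
If you prefer to script the download, one option is the `huggingface_hub` Python package (a minimal sketch; it assumes `pip install huggingface_hub`, with the repo id taken from the link above):

```python
# Sketch: pull the ONNX-optimized DeepSeek R1 distill files to a local folder.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="onnxruntime/DeepSeek-R1-Distill-ONNX")
print(f"Model files downloaded to: {local_dir}")
```
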
The DeepSeek ONNX models enable you to run DeepSeek on any GPU or CPU, achieving speeds 1.3 to 6.3 times faster than native PyTorch. To get started with the model easily, you can use the ONNX Runtime `Generate()` API. See instructions for CPU and GPU (CUDA, DML) [here](https://github.com/microsoft/onnxruntime/blob/gh-pages/docs/genai/tutorials/deepseek-python.md).
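
As a rough sketch of what that generation loop looks like in Python (adapted loosely from the linked tutorial; the model path and chat template below are placeholders, and exact API details may vary across onnxruntime-genai versions):

```python
# Sketch: stream a chat completion from a local DeepSeek R1 distill ONNX model.
# Assumes `pip install onnxruntime-genai` and a downloaded model folder.
import onnxruntime_genai as og

model = og.Model("./DeepSeek-R1-Distill-ONNX/model")  # placeholder path
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

# The R1 distill models expect a chat template; see the linked tutorial
# for the exact template used by each variant.
prompt = "<|user|>What is 1+1?<|assistant|>"  # illustrative template only
input_tokens = tokenizer.encode(prompt)

params = og.GeneratorParams(model)
params.set_search_options(max_length=2048)

generator = og.Generator(model, params)
generator.append_tokens(input_tokens)

# Generate token by token, decoding incrementally for streaming output.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```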

## Download and run your models easily!

<!-- Video Embed -->
<div>
  <iframe
    class="pb-2 w-full"
    height="600px"
    src="https://www.youtube.com/embed/XW-AYw-4oTQ"
    title="YouTube video player"
    frameborder="0"
    allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
    allowfullscreen
  />
</div>

## ONNX Model Performance Improvements

ONNX enables you to run your models on-device across CPU, GPU, and NPU. With ONNX, you can run your models on any machine, across silicon from Qualcomm, AMD, Intel, and Nvidia.
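
Before running anything, it can help to check which execution providers your local ONNX Runtime build actually exposes; a minimal sketch using the standard `onnxruntime` Python API:

```python
# Sketch: list available execution providers on this machine, e.g.
# CPUExecutionProvider, CUDAExecutionProvider, DmlExecutionProvider.
import onnxruntime as ort

print(ort.get_available_providers())
```
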
See the table below for some key benchmarks on Windows GPU and CPU devices.

_[Benchmark table unchanged by this commit; rows not shown in the diff.]_

_CUDA BUILD SPECS: onnxruntime-genai-cuda==0.6.0-dev, transformers==4.46.2, onnxruntime-gpu==1.20.1_ <br/>
_CPU BUILD SPECS: onnxruntime-genai==0.6.0-dev, transformers==4.46.2, onnxruntime==1.20.1_

## Convert, Quantize, and Run your models easily!

Watch the video above and follow along to see how you can convert the DeepSeek model to ONNX using Olive and run it with a chat template on-device using the ONNX Runtime `Generate()` API.
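
If you'd rather script that flow than follow along by hand, Olive workflows can also be driven from Python (a minimal sketch, assuming `pip install olive-ai` and a workflow config authored as shown in the video; `deepseek_config.json` is a placeholder name):

```python
# Sketch: run an Olive workflow that converts and quantizes DeepSeek to ONNX.
# The config file is hypothetical here; author it as demonstrated in the video.
from olive.workflows import run as olive_run

olive_run("deepseek_config.json")
```
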
## Easily Finetune your models with Olive.

This [notebook](https://github.com/microsoft/Olive/blob/main/examples/getting_started/olive-deepseek-finetune.ipynb) provides a step-by-step guide to fine-tuning DeepSeek models with the Olive framework. It covers setting up your environment, preparing your data, and leveraging Azure AI Foundry to optimize and deploy your models, so you can get started with DeepSeek and Olive quickly and efficiently.

## Conclusion

Optimizing DeepSeek R1 distilled models with ONNX Runtime can lead to significant performance improvements. These optimized models are coming soon via Azure AI Foundry and can be easily accessed via the command line or the [VS Code AI Toolkit](https://code.visualstudio.com/docs/intelligentapps/overview).
By leveraging our AI framework solution with Azure AI Foundry, the AI Toolkit, Olive, and ONNX Runtime, you get an end-to-end solution for the model development experience. Stay tuned for more updates and best practices on enhancing AI model performance.

<style>
  a {
    text-decoration: underline;
