Uploaded deepseek blog, ready for post. #23740
Merged

Commits (10), all by MaanavD:

- 3109b76 Uploaded deepseek blog, ready for post.
- c9751ec Made changes, used new video.
- bc4f4db Added new video.
- 4e9220e removed unused section.
- 23731b8 added cmd instructions into blog.
- c5cbe5f removed olive in install.
- c088dc9 Updated blog with feedback.
- 742b968 removed bf16
- 3c786ae formatted blog better.
- 3550d1f Formatting fixed

---
title: Enhancing DeepSeek R1 performance for on-device inference with ONNX Runtime
date: '19th February, 2025'
description: 'Boost DeepSeek R1 performance on-device with ONNX Runtime, achieving faster inference across CPU, GPU, and NPU.'
keywords: 'DeepSeek R1 optimization, ONNX Runtime performance, AI inferencing on-device, GPU and CPU model acceleration, Quantizing AI models with Olive, Azure AI Foundry model catalog, ONNX Generative API, AI development best practices, Faster PyTorch alternatives, Model deployment on Copilot+ PCs'
authors: ['Parinita Rahi', 'Sunghoon Choi', 'Kunal Vaishnavi', 'Maanav Dalal']
authorsLink: ['https://www.linkedin.com/in/parinitaparinita/', 'https://www.linkedin.com/in/sunghoon/', 'https://www.linkedin.com/in/kunal-v-16315b94/', 'https://www.linkedin.com/in/maanavdalal/']
image: 'https://iili.io/2yV40bV.png'
imageSquare: 'https://iili.io/2yV40bV.png'
url: 'https://onnxruntime.ai/blogs/deepseek-r1-on-device'
---
Are you a developer looking to harness the power of your users' local compute for AI inferencing on PCs with NPUs, GPUs, and CPUs? Look no further!

Building on the recent ability to run models on [Copilot+ PCs with NPUs](https://blogs.windows.com/windowsdeveloper/2025/01/29/running-distilled-deepseek-r1-models-locally-on-copilot-pcs-powered-by-windows-copilot-runtime/), you can now run these models efficiently on CPU and GPU devices as well. The ONNX-optimized variants of the models are available for download on [Hugging Face](https://huggingface.co/onnxruntime/DeepSeek-R1-Distill-ONNX).

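For example, here is a minimal sketch of pulling the ONNX variants locally with the `huggingface_hub` Python library (the `local_dir` destination is an illustrative choice):

```python
# Download the ONNX-optimized DeepSeek R1 distill models from Hugging Face.
# Requires: pip install huggingface_hub
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="onnxruntime/DeepSeek-R1-Distill-ONNX",
    local_dir="./deepseek-r1-distill-onnx",  # illustrative destination folder
)
```
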
The DeepSeek ONNX models enable you to run DeepSeek on any GPU or CPU, achieving inference speeds 1.3x to 6.3x faster than native PyTorch. To get started with the models easily, you can use the ONNX Runtime `Generate()` API. See instructions for CPU and GPU (CUDA, DML) [here](https://github.com/microsoft/onnxruntime/blob/gh-pages/docs/genai/tutorials/deepseek-python.md).

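As a hedged sketch of what that looks like in Python with the `onnxruntime-genai` package (the model path, prompt, and search options are placeholder choices; see the linked tutorial for the full flow):

```python
# Stream tokens from a local DeepSeek R1 distill ONNX model.
# Requires: pip install onnxruntime-genai (or onnxruntime-genai-cuda for GPU)
import onnxruntime_genai as og

model = og.Model("./deepseek-r1-distill-onnx")  # placeholder path to the model folder
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=2048)

generator = og.Generator(model, params)
# In practice, wrap the prompt in the model's chat template first.
generator.append_tokens(tokenizer.encode("Why is the sky blue?"))

# Print tokens as they are generated.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```
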
## Download and run your models easily!

<!-- Video Embed -->
<div>
  <iframe
    class="pb-2 w-full"
    height="600px"
    src="https://www.youtube.com/embed/s63vSd8ZI5g"
    title="YouTube video player"
    frameborder="0"
    allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
    allowfullscreen
  ></iframe>
</div>

## ONNX Model Performance Improvements

ONNX Runtime enables you to run your models on-device across CPU, GPU, and NPU, on any machine and across silicon from Qualcomm, AMD, Intel, and NVIDIA.

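To illustrate that portability, here is a hedged sketch using the plain ONNX Runtime Python API, where execution providers are listed in priority order and ONNX Runtime falls back down the list (the `Generate()` API models instead carry their provider configuration in their `genai_config.json`; the model path is a placeholder):

```python
# Run the same ONNX model on whichever silicon is available.
# Requires: pip install onnxruntime-gpu (or onnxruntime for CPU-only)
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # the providers actually in use
```
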
See the table below for some key benchmarks on Windows GPU and CPU devices.

| Model | Precision | Device Type | Execution Provider | Device | Token Generation Throughput (tokens/sec) | Speedup vs. PyTorch |
| --- | --- | --- | --- | --- | --- | --- |
| deepseek-ai_DeepSeek-R1-Distill-Qwen-1.5B | ONNX fp16 | GPU | CUDA | RTX 4090 | 197.195 | 4x |
| deepseek-ai_DeepSeek-R1-Distill-Qwen-1.5B | ONNX Int4 | GPU | CUDA | RTX 4090 | 313.32 | 6.3x |
| deepseek-ai_DeepSeek-R1-Distill-Qwen-7B | ONNX fp16 | GPU | CUDA | RTX 4090 | 57.316 | 1.3x |
| deepseek-ai_DeepSeek-R1-Distill-Qwen-7B | ONNX Int4 | GPU | CUDA | RTX 4090 | 161.00 | 3.7x |
| deepseek-ai_DeepSeek-R1-Distill-Qwen-7B | ONNX Int4/bfloat16 | CPU | CPU | Intel i9 | 3.184 | 20x |
| deepseek-ai_DeepSeek-R1-Distill-Qwen-1.5B | ONNX Int4 | CPU | CPU | Intel i9 | 11.749 | 1.4x |

_CUDA build specs: onnxruntime-genai-cuda==0.6.0-dev, transformers==4.46.2, onnxruntime-gpu==1.20.1_ <br/>
_CPU build specs: onnxruntime-genai==0.6.0-dev, transformers==4.46.2, onnxruntime==1.20.1_

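Throughput numbers like these can be reproduced with a simple timing loop around the `Generate()` API; a hedged sketch (path, prompt, and `max_length` are arbitrary choices, and the first step includes prompt processing):

```python
# Measure token-generation throughput (tokens/sec) for a local ONNX model.
# Requires: pip install onnxruntime-genai
import time
import onnxruntime_genai as og

model = og.Model("./deepseek-r1-distill-onnx")  # placeholder path
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Why is the sky blue?"))

start, tokens = time.perf_counter(), 0
while not generator.is_done():
    generator.generate_next_token()
    tokens += 1
elapsed = time.perf_counter() - start
print(f"{tokens / elapsed:.2f} tokens/sec over {tokens} tokens")
```
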
## Easily Fine-tune Your Models with Olive

This [notebook](https://github.com/microsoft/Olive/blob/main/examples/getting_started/olive-deepseek-finetune.ipynb) provides a step-by-step guide to fine-tuning DeepSeek models with the Olive framework. It covers setting up your environment, preparing your data, and leveraging Azure AI Foundry to optimize and deploy your models, so you can get started with DeepSeek and Olive quickly and efficiently. A minimal sketch of kicking off an Olive workflow is shown below.

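This assumes Olive is installed (`pip install olive-ai`) and that `finetune_config.json` is a workflow config like the one assembled in the notebook; the file name here is a hypothetical placeholder.

```python
# Launch an Olive workflow (e.g., LoRA fine-tuning) from Python.
from olive.workflows import run as olive_run

olive_run("finetune_config.json")  # placeholder config path
```
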
## Conclusion

Optimizing DeepSeek R1 distilled models with ONNX Runtime can lead to significant performance improvements. These optimized models are coming soon to Azure AI Foundry and can be easily accessed via the command line or the [VS Code AI Toolkit](https://code.visualstudio.com/docs/intelligentapps/overview).

By combining Azure AI Foundry, the AI Toolkit, Olive, and ONNX Runtime, you get an end-to-end solution for the model development experience. Stay tuned for more updates and best practices on enhancing AI model performance.

<style>
    a {
        text-decoration: underline;
    }
</style>