2 changes: 1 addition & 1 deletion docs/genai/tutorials/deepseek-python.md
@@ -4,7 +4,7 @@ description: Learn how to chat with DeepSeek-R1-Distill ONNX models on your devi
has_children: false
parent: Tutorials
grand_parent: Generate API (Preview)
nav_order: 4
nav_order: 5
---

# Reasoning in Python with DeepSeek-R1-Distill models
117 changes: 117 additions & 0 deletions docs/genai/tutorials/snapdragon.md
@@ -0,0 +1,117 @@
---
title: Run on Snapdragon devices
description: Learn how to run Phi-3.5 and Llama 3.2 ONNX models on Snapdragon devices
has_children: false
parent: Tutorials
grand_parent: Generate API (Preview)
nav_order: 6
---


# Run models on Snapdragon devices with NPUs

Learn how to run SLMs on Snapdragon devices with ONNX Runtime.

## Models
Devices with Snapdragon NPUs require models of a specific size and format.

The models currently supported are:
* [Phi-3.5 mini instruct](https://github.com/microsoft/ort_npu_samples/releases/tag/v73-phi-3.5-2.31)
* [Llama 3.2 3B](https://github.com/microsoft/ort_npu_samples)

Due to Meta's licensing restrictions, the Llama model cannot be published directly. Instructions for generating the Llama model can be found at the link above.


## Python application

If your device has Python installed, you can run a simple question-and-answer script to query the model.

### Install the runtime

```powershell
pip install onnxruntime-genai
```
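
Optionally, you can confirm the package installed correctly before continuing. This is a minimal sanity check, not part of the tutorial's script; it only assumes the package was installed under its published distribution name, `onnxruntime-genai`.

```python
# Optional sanity check: confirm onnxruntime-genai installed and report its version.
from importlib.metadata import version

import onnxruntime_genai  # the import should succeed on a working install

print(version("onnxruntime-genai"))
```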

### Download the script

```powershell
curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/refs/heads/main/examples/python/model-qa.py -o model-qa.py
```

### Run the script

```powershell
python .\model-qa.py -e cpu -g -v --system_prompt "You are a helpful assistant. Be brief and concise." --chat_template "<|user|>\n{input} <|end|>\n<|assistant|>" -m ..\..\models\microsoft\phi-3.5-mini-instruct-npu-qnn-2.31-v2
```

### A look inside the Python script
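
At its core, `model-qa.py` is a token-generation loop built on the onnxruntime-genai API: it applies the chat template to your prompt, tokenizes it, and then streams generated tokens back one at a time. The sketch below is a simplified illustration rather than the script itself; `model_path` is a placeholder, the chat template is copied from the command above, and the calls reflect recent onnxruntime-genai releases (older versions set `params.input_ids` instead of calling `append_tokens`).

```python
# Simplified sketch of the question-and-answer loop in model-qa.py (illustrative, not the actual script).
import onnxruntime_genai as og

model_path = r"..\..\models\microsoft\phi-3.5-mini-instruct-npu-qnn-2.31-v2"  # example path
model = og.Model(model_path)
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

chat_template = "<|user|>\n{input} <|end|>\n<|assistant|>"

while True:
    text = input("Prompt: ")
    prompt = chat_template.format(input=text)
    input_tokens = tokenizer.encode(prompt)

    params = og.GeneratorParams(model)
    params.set_search_options(max_length=2048)
    generator = og.Generator(model, params)
    generator.append_tokens(input_tokens)

    # Generate and print tokens one at a time until the model finishes.
    while not generator.is_done():
        generator.generate_next_token()
        new_token = generator.get_next_tokens()[0]
        print(tokenizer_stream.decode(new_token), end="", flush=True)
    print()
```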


## C++ Application

To run the models on the Snapdragon NPU within a C++ application, use the code from [here](https://github.com/microsoft/onnxruntime-genai/tree/main/examples/c).

Building and running this application requires a Windows PC with a Snapdragon NPU, as well as:
* cmake
* Visual Studio 2022


1. Clone the repo

```powershell
git clone https://github.com/microsoft/onnxruntime-genai
cd onnxruntime-genai\examples\c
```

2. Install onnxruntime

This currently requires a nightly build of onnxruntime, as it contains the latest changes to QNN support for language models.

Download the nightly version of the ONNX Runtime QNN binaries from [here](https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly/NuGet/Microsoft.ML.OnnxRuntime.QNN/overview/1.22.0-dev-20250225-0548-e46c0d8).


```powershell
mkdir onnxruntime-win-arm64-qnn
move Microsoft.ML.OnnxRuntime.QNN.1.22.0-dev-20250225-0548-e46c0d8.nupkg onnxruntime-win-arm64-qnn
cd onnxruntime-win-arm64-qnn
tar xvzf Microsoft.ML.OnnxRuntime.QNN.1.22.0-dev-20250225-0548-e46c0d8.nupkg
copy runtimes\win-arm64\native\* ..\..\..\lib
cd ..
```


3. Install onnxruntime-genai

```powershell
curl https://github.com/microsoft/onnxruntime-genai/releases/download/v0.6.0/onnxruntime-genai-0.6.0-win-arm64.zip -o onnxruntime-genai-win-arm64.zip
tar xvf onnxruntime-genai-win-arm64.zip
cd onnxruntime-genai-0.6.0-win-arm64
copy include\* ..\include
copy lib\* ..\lib
```

4. Build the sample

```powershell
cmake -A arm64 -S . -B build -DPHI3-QA=ON
cd build
cmake --build . --config Release
```

5. Run the sample

```powershell
cd Release
.\phi3_qa.exe <path_to_model>
```

45 changes: 39 additions & 6 deletions docs/install/index.md
@@ -117,18 +117,16 @@ To build from source on Linux, follow the instructions [here](https://onnxruntim



## C#/C/C++/WinML Installs
## C# Installs

### Install ONNX Runtime

#### Install ONNX Runtime CPU
### Install ONNX Runtime CPU

```bash
# CPU
dotnet add package Microsoft.ML.OnnxRuntime
```

#### Install ONNX Runtime GPU (CUDA 12.x)
### Install ONNX Runtime GPU (CUDA 12.x)

The default CUDA version for ORT is 12.x

@@ -137,7 +135,7 @@
dotnet add package Microsoft.ML.OnnxRuntime.Gpu
```

#### Install ONNX Runtime GPU (CUDA 11.8)
### Install ONNX Runtime GPU (CUDA 11.8)

1. Project Setup

@@ -179,6 +177,41 @@ dotnet add package Microsoft.ML.OnnxRuntime.DirectML
dotnet add package Microsoft.AI.MachineLearning
```

## C++/C Installs

### CPU

Find your release here: https://github.com/microsoft/onnxruntime/releases

Download and unzip the archive.

For example:

```
curl -LO https://github.com/microsoft/onnxruntime/releases/download/v1.20.0/onnxruntime-win-arm64-1.20.0.zip
```

On Windows:

```
tar xvzf onnxruntime-win-arm64-1.20.0.zip
move onnxruntime-win-arm64-1.20.0\include <your application include folder>
move onnxruntime-win-arm64-1.20.0\lib <your application lib folder>
```

### Arm64 QNN

QNN binaries are published as a NuGet package:

```
curl -L -o microsoft.ml.onnxruntime.qnn.1.20.0.nupkg https://www.nuget.org/api/v2/package/Microsoft.ML.OnnxRuntime.QNN/1.20.0
tar xvzf microsoft.ml.onnxruntime.qnn.1.20.0.nupkg
move build\native\include <your application include folder>
move build\native\win-arm64\native <your application lib folder>
```


## Install on web and mobile

The pre-built packages have full support for all ONNX opsets and operators.