2 changes: 1 addition & 1 deletion docs/genai/tutorials/deepseek-python.md
@@ -4,7 +4,7 @@ description: Learn how to chat with DeepSeek-R1-Distill ONNX models on your devi
has_children: false
parent: Tutorials
grand_parent: Generate API (Preview)
nav_order: 4
nav_order: 5
---

# Reasoning in Python with DeepSeek-R1-Distill models
117 changes: 117 additions & 0 deletions docs/genai/tutorials/snapdragon.md
@@ -0,0 +1,117 @@
---
title: Run on Snapdragon devices
description: Learn how to run Phi-3.5 and Llama 3.2 ONNX models on Snapdragon devices
has_children: false
parent: Tutorials
grand_parent: Generate API (Preview)
nav_order: 6
---


# Run models on Snapdragon devices with NPUs

Learn how to run SLMs on Snapdragon devices with ONNX Runtime.

## Models
Devices with Snapdragon NPUs require models of a specific size and format.

The models currently supported are:
* [Phi-3.5 mini instruct](https://github.com/microsoft/ort_npu_samples/releases/tag/v73-phi-3.5-2.31)
* [Llama 3.2 3B](https://github.com/microsoft/ort_npu_samples)

Due to Meta's licensing restrictions, the Llama model cannot be published directly. Instructions for generating the Llama model can be found at the link above.


## Python application

If your device has Python installed, you can run a simple question-and-answer script to query the model.

### Install the runtime

```powershell
pip install onnxruntime-genai
```
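
Optionally, you can confirm the package installed correctly before continuing. This is a minimal sanity check, not part of the tutorial's script; it only assumes the package was installed under its published distribution name, `onnxruntime-genai`.

```python
# Optional sanity check: confirm onnxruntime-genai installed and report its version.
from importlib.metadata import version

import onnxruntime_genai  # the import should succeed on a working install

print(version("onnxruntime-genai"))
```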

### Download the script

```powershell
curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/refs/heads/main/examples/python/model-qa.py -o model-qa.py
```

### Run the script

```powershell
python .\model-qa.py -e cpu -g -v --system_prompt "You are a helpful assistant. Be brief and concise." --chat_template "<|user|>\n{input} <|end|>\n<|assistant|>" -m ..\..\models\microsoft\phi-3.5-mini-instruct-npu-qnn-2.31-v2
```

### A look inside the Python script
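
At its core, `model-qa.py` is a token-generation loop built on the onnxruntime-genai API: it applies the chat template to your prompt, tokenizes it, and then streams generated tokens back one at a time. The sketch below is a simplified illustration rather than the script itself; `model_path` is a placeholder, the chat template is copied from the command above, and the calls reflect recent onnxruntime-genai releases (older versions set `params.input_ids` instead of calling `append_tokens`).

```python
# Simplified sketch of the question-and-answer loop in model-qa.py (illustrative, not the actual script).
import onnxruntime_genai as og

model_path = r"..\..\models\microsoft\phi-3.5-mini-instruct-npu-qnn-2.31-v2"  # example path
model = og.Model(model_path)
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

chat_template = "<|user|>\n{input} <|end|>\n<|assistant|>"

while True:
    text = input("Prompt: ")
    prompt = chat_template.format(input=text)
    input_tokens = tokenizer.encode(prompt)

    params = og.GeneratorParams(model)
    params.set_search_options(max_length=2048)
    generator = og.Generator(model, params)
    generator.append_tokens(input_tokens)

    # Generate and print tokens one at a time until the model finishes.
    while not generator.is_done():
        generator.generate_next_token()
        new_token = generator.get_next_tokens()[0]
        print(tokenizer_stream.decode(new_token), end="", flush=True)
    print()
```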


## C++ Application

To run the models on the Snapdragon NPU within a C++ application, use the code from [here](https://github.com/microsoft/onnxruntime-genai/tree/main/examples/c).

Building and running this application requires a Windows PC with a Snapdragon NPU, as well as:
* cmake
* Visual Studio 2022


1. Clone the repo

```powershell
git clone https://github.com/microsoft/onnxruntime-genai
cd onnxruntime-genai\examples\c
```

2. Install onnxruntime

This currently requires a nightly build of onnxruntime, as it contains the latest changes to QNN support for language models.

Download the nightly version of the ONNX Runtime QNN binaries from [here](https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly/NuGet/Microsoft.ML.OnnxRuntime.QNN/overview/1.22.0-dev-20250225-0548-e46c0d8).


```powershell
mkdir onnxruntime-win-arm64-qnn
move Microsoft.ML.OnnxRuntime.QNN.1.22.0-dev-20250225-0548-e46c0d8.nupkg onnxruntime-win-arm64-qnn
cd onnxruntime-win-arm64-qnn
tar xvzf Microsoft.ML.OnnxRuntime.QNN.1.22.0-dev-20250225-0548-e46c0d8.nupkg
copy runtimes\win-arm64\native\* ..\..\..\lib
cd ..
```


3. Install onnxruntime-genai

```powershell
curl https://github.com/microsoft/onnxruntime-genai/releases/download/v0.6.0/onnxruntime-genai-0.6.0-win-arm64.zip -o onnxruntime-genai-win-arm64.zip
tar xvf onnxruntime-genai-win-arm64.zip
cd onnxruntime-genai-0.6.0-win-arm64
copy include\* ..\include
copy lib\* ..\lib
```

4. Build the sample

```powershell
cmake -A arm64 -S . -B build -DPHI3-QA=ON
cd build
cmake --build . --config Release
```

5. Run the sample

```powershell
cd Release
.\phi3_qa.exe <path_to_model>
```

45 changes: 39 additions & 6 deletions docs/install/index.md
@@ -117,18 +117,16 @@ To build from source on Linux, follow the instructions [here](https://onnxruntim



## C#/C/C++/WinML Installs
## C# Installs

### Install ONNX Runtime

#### Install ONNX Runtime CPU
### Install ONNX Runtime CPU

```bash
# CPU
dotnet add package Microsoft.ML.OnnxRuntime
```

#### Install ONNX Runtime GPU (CUDA 12.x)
### Install ONNX Runtime GPU (CUDA 12.x)

The default CUDA version for ORT is 12.x

@@ -137,7 +135,7 @@
dotnet add package Microsoft.ML.OnnxRuntime.Gpu
```

#### Install ONNX Runtime GPU (CUDA 11.8)
### Install ONNX Runtime GPU (CUDA 11.8)

1. Project Setup

@@ -179,6 +177,41 @@ dotnet add package Microsoft.ML.OnnxRuntime.DirectML
dotnet add package Microsoft.AI.MachineLearning
```

## C++/C Installs

### CPU

Find your release here: https://github.com/microsoft/onnxruntime/releases

Download and unzip the archive.

For example:

```
curl -LO https://github.com/microsoft/onnxruntime/releases/download/v1.20.0/onnxruntime-win-arm64-1.20.0.zip
```

On Windows:

```
tar xvzf onnxruntime-win-arm64-1.20.0.zip
move onnxruntime-win-arm64-1.20.0\include <your application include folder>
move onnxruntime-win-arm64-1.20.0\lib <your application lib folder>
```

### Arm64 QNN

QNN binaries are published as a NuGet package:

```
curl -L -o microsoft.ml.onnxruntime.qnn.1.20.0.nupkg https://www.nuget.org/api/v2/package/Microsoft.ML.OnnxRuntime.QNN/1.20.0
tar xvzf microsoft.ml.onnxruntime.qnn.1.20.0.nupkg
move build\native\include <your application include folder>
move build\native\win-arm64\native <your application lib folder>
```


## Install on web and mobile

The pre-built packages have full support for all ONNX opsets and operators.