In this tutorial we will use the yolov8m model for object detection, covering preparation, quantization, and deployment of BF16 and XINT8 models using the RyzenAI Software.
The following steps outline how to deploy the quantized model on an NPU:
- Download the yolov8m model from Ultralytics and save it as an ONNX (Opset 17) model
- Quantize the model to BF16 or XINT8 using the AMD Quark Quantization API
- Compile and run the model on the NPU using ONNX Runtime with the Vitis AI Execution Provider
Python code to download the model from Ultralytics:

```python
from ultralytics import YOLO

def export_yolov8m_to_onnx():
    model = YOLO("yolov8m.pt")
    print("Number of classes:", model.model.nc)
    model.export(format="onnx", opset=17)  # Exports to yolov8m.onnx
    print("YOLOv8m exported to yolov8m.onnx")

if __name__ == "__main__":
    export_yolov8m_to_onnx()
```

Command to download and export the model from .pt to .onnx:
```shell
cd models
python export_to_onnx.py
```

Note: If prompted to update the Ultralytics package, be aware that the update also upgrades onnxruntime. To run on the NPU, make sure you create a new clone of the environment created by the RyzenAI Software installer:
```shell
set RYZEN_AI_CONDA_ENV_NAME=ryzen-ai-<version>
conda create --name yolov8m_env --clone %RYZEN_AI_CONDA_ENV_NAME%
conda activate yolov8m_env
```

Install the required Python packages for the tutorial:
```shell
pip install -r requirements.txt
```

Model quantization leverages AMD Quark to optimize the model for significant performance gains with minimal accuracy loss.
This tutorial walks through quantizing the model for both the BF16 and XINT8 configurations.
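As background, BF16 keeps float32's 8-bit exponent and truncates the mantissa to 7 bits, so dynamic range is preserved while precision drops — which is why BF16 quantization typically costs little accuracy. A minimal numpy sketch of that conversion (illustrative only, not Quark's implementation):

```python
import numpy as np

def to_bf16(x: np.ndarray) -> np.ndarray:
    """Truncate float32 values to BF16 precision (result stored back as float32).

    Zeroing the low 16 bits of the IEEE-754 float32 representation leaves
    the sign, the full 8-bit exponent, and 7 mantissa bits -- exactly BF16.
    """
    bits = x.astype(np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

# Hypothetical weight values spanning a wide dynamic range:
weights = np.array([0.123456789, 1e-30, 3.14159265], dtype=np.float32)
bf16 = to_bf16(weights)
print(bf16)  # Close to the originals, but only ~2-3 significant decimal digits
```

Note that even the tiny 1e-30 value survives with a small relative error, because BF16 retains float32's exponent range.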
- Model Quantization - The model is quantized using AMD Quark with the BF16 configuration:
```shell
python quantize_quark.py --input_model_path models/yolov8m.onnx \
                         --calib_data_path calib_images \
                         --output_model_path models/yolov8m_BF16.onnx \
                         --config BF16
```

- Compile and Test Model - Run model inference on test_image.jpg from the COCO dataset:

```shell
python run_inference.py --model_input models\yolov8m_BF16.onnx --input_image test_image.jpg --output_image test_output.jpg --device npu-bf16
```

The BF16 quantized model accuracy is evaluated on the COCO dataset.
Use the prepare_data.py script to download the COCO dataset:

```shell
python prepare_data.py
```

Evaluate the accuracy of the model on the COCO dataset; use the --device options cpu or npu-bf16 to measure accuracy metrics on the CPU or NPU respectively.

```shell
python run_inference.py --model_input models\yolov8m_BF16.onnx --evaluate --coco_dataset datasets\coco --device npu-bf16
```

| Yolov8m | mAP (AP@[IoU=0.50:0.95]) | mAP50 (AP@IoU=0.50) | mAP75 (AP@IoU=0.75) |
|---|---|---|---|
| Float 32 | 44.0 | 57.4 | 47.9 |
| BF16 (CPU) | 42.5 | 57.2 | 46.6 |
| BF16 (NPU) | 42.8 | 57.7 | 46.7 |
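The column headers above are keyed to IoU (intersection-over-union) thresholds: a detection only counts as a true positive if its box overlaps the ground-truth box above the threshold. A minimal IoU computation, with hypothetical box coordinates (not taken from the tutorial scripts):

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Overlap 70 / union 130 ~= 0.538: a true positive at AP@0.50 but not at AP@0.75
print(iou((0, 0, 10, 10), (3, 0, 13, 10)))
```

The mAP column averages AP over thresholds from 0.50 to 0.95 in steps of 0.05, so it is the strictest of the three metrics.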
- Sample Output - Sample outputs generated using the BF16 model
- Model Quantization - The model is quantized using AMD Quark with the XINT8 configuration:
```shell
python quantize_quark.py --input_model_path models/yolov8m.onnx \
                         --calib_data_path calib_images \
                         --output_model_path models/yolov8m_XINT8.onnx \
                         --config XINT8
```

- Compile and Test Model - Run model inference on test_image.jpg from the COCO dataset:

```shell
python run_inference.py --model_input models\yolov8m_XINT8.onnx --input_image test_image.jpg --output_image test_output_int8.jpg --device npu-int8
```

- Sample Output - Sample outputs generated using the XINT8 configuration
The XINT8 quantized model accuracy is evaluated on the COCO dataset:

```shell
python run_inference.py --model_input models\yolov8m_XINT8.onnx --evaluate --coco_dataset datasets\coco --device npu-int8
```

Note: The evaluation fails to detect any objects.
The model uses concat operations to combine the confidence scores and bounding boxes, as shown below in the yolov8m ONNX model. This leads to significant degradation of the confidence values, so most of the bounding boxes are missed.
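The effect can be reproduced with a toy per-tensor INT8 fake-quantizer in plain numpy (an illustrative sketch, not Quark's actual quantizer; the values below are hypothetical). Because the concatenated tensor mixes pixel-scale box coordinates with confidences in [0, 1], the shared quantization step becomes larger than the confidences themselves:

```python
import numpy as np

def quant_dequant_int8(x: np.ndarray) -> np.ndarray:
    """Symmetric per-tensor INT8 fake-quantization: quantize, then dequantize."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

boxes = np.array([320.5, 188.2, 641.0, 402.7])  # pixel-space box outputs
conf = np.array([0.91, 0.42, 0.07])             # class confidences

# Quantized on their own, the confidences survive with small error:
print(quant_dequant_int8(conf))

# Quantized after concatenation, the shared scale (~641/127, about 5.05)
# rounds every confidence to 0 -- the detections disappear:
merged = np.concatenate([boxes, conf])
print(quant_dequant_int8(merged)[-3:])  # all zeros
```

This is why excluding the concat sub-graph from quantization, as done below, restores the detections.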
We need to skip the post-processing sub-graph to improve the accuracy of the XINT8 quantized model. The post-processing sub-graph of the yolov8m model is shown below.
After the above modification, the model is quantized using AMD Quark with the XINT8 configuration:

```shell
python quantize_quark.py --input_model_path models/yolov8m.onnx \
                         --calib_data_path calib_images \
                         --output_model_path models/yolov8m_XINT8.onnx \
                         --config XINT8 \
                         --exclude_subgraphs "[/model.22/Concat_3], [/model.22/Concat_5]"
```

Sample outputs generated using the XINT8 quantized model with skipped nodes.
The XINT8 quantized model accuracy is evaluated on the COCO dataset.
Use the prepare_data.py script to download the COCO dataset:

```shell
python prepare_data.py
```

Evaluate the accuracy of the model on the COCO dataset; use the --device options cpu or npu-int8 to measure accuracy metrics on the CPU or NPU respectively.

```shell
python run_inference.py --model_input models\yolov8m_XINT8.onnx --evaluate --coco_dataset datasets\coco --device npu-int8
```

| Yolov8m | mAP (AP@[IoU=0.50:0.95]) | mAP50 (AP@IoU=0.50) | mAP75 (AP@IoU=0.75) |
|---|---|---|---|
| Float 32 | 44.0 | 57.4 | 47.9 |
| XINT8 (CPU) | 38.2 | 52.3 | 41.8 |
| XINT8 (NPU) | 38.1 | 52.2 | 41.6 |




