Skip to content

Latest commit

 

History

History
193 lines (123 loc) · 6.5 KB

File metadata and controls

193 lines (123 loc) · 6.5 KB

Introduction

In this tutorial we will using yolov8m model for object detection, including preparing, quantization and deploying a BF16 and XINT8 model using RyzenAI Software.

Overview

The following steps outline how to deploy the quantized model on an NPU:

  • Download the yolov8m model from the Ultralytics and save it as ONNX (Optset 17) model
  • Quantize the model to BF16 or XINT8 using the AMD Quark Quantization API
  • Compile and run the model on NPU using ONNX Runtime with the Vitis AI Execution Provider

Download and export the model

Python code to download the model from ultralytics

from ultralytics import YOLO

def export_yolov8m_to_onnx():
    model = YOLO("yolov8m.pt")
    print("Number of classes:", model.model.nc)
    model.export(format="onnx", opset=17)  # Exports to yolov8m.onnx
    print("YOLOv8m exported to yolov8m.onnx")

if __name__ == "__main__":
    export_yolov8m_to_onnx()

Command to download and export the model from .pt to .onnx

cd models
python export_to_onnx.py

Note: If asked to update the Ultralytics package, it will upgrade the onnx-runtime. To run on NPU, ensure you create a new clone with existing environment created by RyzenAI software installer.

Create and Activate Conda Environment

set RYZEN_AI_CONDA_ENV_NAME=ryzen-ai-<version>
conda create --name yolov8m_env --clone %RYZEN_AI_CONDA_ENV_NAME%
conda activate yolov8m_env

Install the required python packages for the tutorial:

pip install -r requirements.txt

Model Quantization and Compilation

Model quantization levarages the power of AMD-Quark to optimize the model for significant performance without losing accuracy. This tutorial will guide through quantizing model for BF16 configuration as well as XINT8 configuration

BF16 quantization

  • Model Quantization - Model is quantized using AMD-Quark with BF16 configuration
python quantize_quark.py --input_model_path models/yolov8m.onnx \
                         --calib_data_path calib_images \
                         --output_model_path models/yolov8m_BF16.onnx \
                         --config BF16
  • Compile and Test Model - Run model inference on the test_image.jpg from the COCO dataset
python run_inference.py --model_input models\yolov8m_BF16.onnx --input_image test_image.jpg --output_image test_output.jpg --device npu-bf16

Evaluate the model accuracy

The BF16 quantized model accuracy is evaluated on COCO dataset

Use the prepare_data.py script to download the COCO dataset

python prepare_data.py

Evaluate the accuracy of the model on COCO dataset, use --device options cpu or npu-bf16 to measure accuracy metrics on CPU/NPU respectively.

python run_inference --model_input models\yolov8m_BF16.onnx --evaluate --coco_dataset datasets\coco --device npu-bf16
Yolov8m mAP (AP@[IoU=0.50:0.95]) mAP50 (AP@IoU=0.50) mAP75 (AP@IoU=0.75)
Float 32 44.0 57.4 47.9
BF16 (CPU) 42.5 57.2 46.6
BF16 (NPU) 42.8 57.7 46.7
  • Sample Output - Sample outputs generated using BF16 model

test_output_bf16

XINT8 quantization

  • Model Quantization - Model is quantized using AMD-Quark with XINT8 configuration
python quantize_quark.py --input_model_path models/yolov8m.onnx \
                         --calib_data_path calib_images \
                         --output_model_path models/yolov8m_XINT8.onnx \
                         --config XINT8
  • Compile and Test Model - Run model inference on the test_image.jpg from the COCO dataset
python run_inference.py --model_input models\yolov8m_XINT8.onnx --input_image test_image.jpg --output_image test_output_int8.jpg --device npu-int8
  • Sample Output - Sample outputs generated using XINT8 configuration

test_output_int8

Evaluate quantized model accuracy

The XINT8 quantized model accuracy is evaluated on COCO dataset

python run_inference --model_input models\yolov8m_XINT8.onnx --evaluate --coco_dataset datasets\coco --device npu-int8

Note: The evaluation functions fails to detect any objects

Modification

The model uses concat operations to combine the confidence and bounding boxes as shown in the below in yolov8m ONNX model. This leads to significant degradation in confidence values, missing most of the bounding boxes.

image

We need to skip the post-processing sub-graph to improve the accuracy of the XINT8 quantized model. Shown below in the post-processing sub-graph yolov8m model.

image

Model Quantization

After the above modifications model is quantized using AMD-Quark with XINT8 configuration

python quantize_quark.py --input_model_path models/yolov8m.onnx \
                         --calib_data_path calib_images \
                         --output_model_path models/yolov8m_XINT8.onnx \
                         --config XINT8
                         --exclude_subgraphs "[/model.22/Concat_3], [/model.22/Concat_5]]"

Sample Output

Sample outputs generated using XINT8 quantized model with skipped nodes.

test_output_int8

Evaluate the model accuracy

The XINT8 quantized model accuracy is evaluated on COCO dataset

Use the prepare_data.py script to download the COCO dataset

python prepare_data.py

Evaluate the accuracy of the model on COCO dataset, use --device options cpu or npu-int8 to measure accuracy metrics on CPU/NPU respectively.

python run_inference --model_input models\yolov8m_XINT8.onnx --evaluate --coco_dataset datasets\coco --device npu-int8
Yolov8m mAP (AP@[IoU=0.50:0.95]) mAP50 (AP@IoU=0.50) mAP75 (AP@IoU=0.75)
Float 32 44.0 57.4 47.9
XINT8 (CPU) 38.2 52.3 41.8
XINT8 (NPU) 38.1 52.2 41.6