sidnarsipur/OTX

OTX: Benchmark and test ONNX models

OTX is a Python-based tool for benchmarking and testing ONNX models with different optimization levels using ONNX Runtime.

Features

  • Test model correctness across multiple optimization levels
  • Profile GPU performance metrics (inference time, utilization, memory usage)
  • Compare performance between base and optimized models
  • Support for PYNVML monitoring and NVIDIA Nsight Compute profiling

Requirements

  • Python 3.8+
  • NVIDIA GPU with CUDA 11.0+ support
  • NVIDIA Nsight Compute (optional, for NCU profiling)

Installation

Install required Python packages:

pip install onnx onnxruntime-gpu numpy pillow torchvision requests huggingface-hub nvidia-ml-py

Quick Start

from OTX.implementation.resnet import ResnetModel
import onnxruntime as rt

# Load model from Hugging Face
model = ResnetModel("resnet50-v2-7.onnx", repo_id="onnxmodelzoo/resnet50-v2-7")
model.setup_dataset("path/to/imagenet/images")

# Run inference
outputs, stats = model.inference(device_id=0, capture_stats=True)

# Create optimized variant
optimized_model = model.optimize_model("optimized.onnx", rt.GraphOptimizationLevel.ORT_ENABLE_ALL)
opt_outputs, opt_stats = optimized_model.inference(device_id=0)

# Compare results
comparison = model.compare_outputs(outputs, opt_outputs)
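
The comparison step can be pictured with a small standalone sketch. It is not OTX's actual `compare_outputs` implementation, which is not shown here; the function name `compare_logits`, the `atol` parameter, and the returned keys other than `agreement_ratio` are assumptions made for illustration. It assumes both models produced `(batch, classes)` score arrays for the same inputs.

```python
import numpy as np


def compare_logits(base: np.ndarray, optimized: np.ndarray, atol: float = 1e-4) -> dict:
    """Compare two (batch, classes) score arrays computed from the same inputs.

    Hypothetical helper: reports how often the top-1 class matches and how far
    the raw scores drift apart numerically.
    """
    if base.shape != optimized.shape:
        raise ValueError(f"shape mismatch: {base.shape} vs {optimized.shape}")

    # Largest element-wise deviation between the two outputs.
    max_abs_diff = float(np.max(np.abs(base - optimized)))

    # Fraction of samples whose predicted class is identical in both runs.
    top1_agree = np.argmax(base, axis=1) == np.argmax(optimized, axis=1)

    return {
        "agreement_ratio": float(np.mean(top1_agree)),
        "max_abs_diff": max_abs_diff,
        "within_tolerance": bool(max_abs_diff <= atol),
    }
```

Reporting both numbers is useful because graph optimizations can change floating-point rounding slightly without changing any prediction, so a small `max_abs_diff` with an `agreement_ratio` of 1.0 is usually the expected outcome.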

Optimization Levels

  1. DISABLE_ALL - No optimizations
  2. ENABLE_BASIC - Basic graph optimizations (constant folding, redundant node elimination)
  3. ENABLE_EXTENDED - Complex device-specific node fusions
  4. ENABLE_ALL - All optimizations including layout transformations

import onnxruntime as rt

basic = model.optimize_model("model_basic.onnx", rt.GraphOptimizationLevel.ORT_ENABLE_BASIC)
extended = model.optimize_model("model_ext.onnx", rt.GraphOptimizationLevel.ORT_ENABLE_EXTENDED)
all_opt = model.optimize_model("model_all.onnx", rt.GraphOptimizationLevel.ORT_ENABLE_ALL)
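
For context, ONNX Runtime itself can write an optimized graph to disk through `SessionOptions`. The sketch below shows that mechanism in isolation; whether `optimize_model` does exactly this internally is an assumption, and the helper name `save_optimized_model` and the `LEVELS` tuple are hypothetical. `onnxruntime` is imported inside the function so the module loads even where the package is not installed.

```python
from typing import Sequence

# Names of the members of onnxruntime.GraphOptimizationLevel.
LEVELS = ("ORT_DISABLE_ALL", "ORT_ENABLE_BASIC", "ORT_ENABLE_EXTENDED", "ORT_ENABLE_ALL")


def save_optimized_model(
    src_path: str,
    dst_path: str,
    level_name: str = "ORT_ENABLE_ALL",
    providers: Sequence[str] = ("CPUExecutionProvider",),
) -> str:
    """Write an optimized copy of src_path to dst_path (hypothetical helper)."""
    if level_name not in LEVELS:
        raise ValueError(f"unknown optimization level: {level_name!r}")

    import onnxruntime as rt  # lazy import: only needed when actually optimizing

    options = rt.SessionOptions()
    options.graph_optimization_level = getattr(rt.GraphOptimizationLevel, level_name)
    # Creating the session applies the optimizations and serializes the result here.
    options.optimized_model_filepath = dst_path

    rt.InferenceSession(src_path, sess_options=options, providers=list(providers))
    return dst_path
```

Graphs saved at the extended or full level can contain fusions tied to the execution provider used during optimization, so it is safest to optimize with the same provider (for example `CUDAExecutionProvider`) that will later run the model.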

GPU Profiling

PYNVML Mode (lightweight monitoring for full datasets):

outputs, stats = model.inference(device_id=0, capture_stats=True, ncu_mode=False)
print(f"GPU Utilization: {stats.avg_gpu_util}%")
print(f"Memory Usage: {stats.peak_memory_mb} MB")
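
As a rough picture of what lightweight monitoring involves, the sketch below polls NVML on a background thread while some work runs, then reduces the samples to an average utilization and a peak memory figure. It is an illustration under assumptions, not OTX's implementation: `GpuStats`, `summarize`, `nvml_reader`, and `monitor` are hypothetical names, and only the two field names are taken from the example above. The NVML calls come from the `pynvml` module shipped with `nvidia-ml-py`.

```python
import threading
from dataclasses import dataclass
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (GPU utilization in percent, memory used in MB)


@dataclass
class GpuStats:
    avg_gpu_util: float
    peak_memory_mb: float


def summarize(samples: List[Sample]) -> GpuStats:
    """Reduce raw samples to average utilization and peak memory."""
    if not samples:
        return GpuStats(avg_gpu_util=0.0, peak_memory_mb=0.0)
    utils = [u for u, _ in samples]
    mems = [m for _, m in samples]
    return GpuStats(sum(utils) / len(utils), max(mems))


def nvml_reader(device_id: int = 0) -> Callable[[], Sample]:
    """Return a function that reads one sample from NVML (needs an NVIDIA GPU)."""
    import pynvml  # lazy import; call pynvml.nvmlShutdown() when finished

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_id)

    def read() -> Sample:
        util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
        used_bytes = pynvml.nvmlDeviceGetMemoryInfo(handle).used
        return float(util), used_bytes / (1024 ** 2)

    return read


def monitor(work: Callable[[], object], read: Callable[[], Sample], interval_s: float = 0.05):
    """Run work() while polling read() in the background; return (result, stats)."""
    samples: List[Sample] = []
    stop = threading.Event()

    def poll() -> None:
        # Always take at least one sample, then keep sampling until asked to stop.
        while True:
            samples.append(read())
            if stop.wait(interval_s):
                break

    thread = threading.Thread(target=poll, daemon=True)
    thread.start()
    try:
        result = work()
    finally:
        stop.set()
        thread.join()
    return result, summarize(samples)
```

On a machine with a GPU this could be used as `monitor(run_inference, nvml_reader(0))`, where `run_inference` stands for any zero-argument callable. Passing the reader in as a parameter keeps the sampling loop testable without GPU hardware.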

NCU Mode (detailed kernel profiling):

outputs, ncu_output = model.inference(device_id=0, ncu_mode=True)
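
Nsight Compute profiles a process by wrapping its command line, so NCU-style profiling usually means launching the inference script under the `ncu` CLI. The sketch below only builds such a command. How OTX invokes `ncu` is not documented here, so the helper name `build_ncu_command`, the script name, and the choice of flags are assumptions.

```python
import shlex
from typing import List, Sequence


def build_ncu_command(script: str, report_name: str, script_args: Sequence[str] = ()) -> List[str]:
    """Build an `ncu` command line that profiles a Python script (hypothetical helper)."""
    return [
        "ncu",
        "--set", "full",                # collect the full set of metric sections
        "--target-processes", "all",    # also profile child processes
        "-o", report_name,              # write the report to this file name
        "python", script,
        *script_args,
    ]


# "run_inference.py" is a placeholder script name, not a file from this repository.
cmd = build_ncu_command("run_inference.py", "resnet50_profile", ["--device", "0"])
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where Nsight Compute is installed. Kernel-level profiling slows execution considerably, so it is best suited to a handful of inputs rather than a full dataset.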

Supported Models

  • Image Classification: ResNet, MobileNet, ShuffleNet, SqueezeNet, GoogleNet
  • Object Detection: SSD, YOLO, Faster R-CNN

See OTX/implementation/ for all available models.

Bring Your Own Model

from OTX.core import Model

class MyModel(Model):
    def setup_dataset(self, directory: str):
        """Load your dataset"""
        pass

    def score_output(self, outputs):
        """Score model predictions"""
        return {"accuracy": 0.95}

    def compare_outputs(self, outputs_a, outputs_b):
        """Compare two sets of outputs"""
        return {"agreement_ratio": 1.0}

    def prepare_input_feed(self, data, session):
        """Prepare input for ONNX Runtime"""
        input_name = session.get_inputs()[0].name
        return {input_name: data}
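
The hard-coded return values above are placeholders. As one hedged example of what a real `score_output` body might compute for a classifier, the sketch below derives top-1 accuracy from a `(batch, classes)` score array and integer labels. The standalone function `top1_accuracy` and its arguments are assumptions; how a `Model` subclass stores its ground-truth labels is not shown here.

```python
import numpy as np


def top1_accuracy(logits: np.ndarray, labels: np.ndarray) -> dict:
    """Fraction of samples whose highest-scoring class equals the label.

    Hypothetical helper: `logits` has shape (batch, classes) and `labels`
    has shape (batch,) with integer class indices.
    """
    if logits.shape[0] != labels.shape[0]:
        raise ValueError("logits and labels must have the same batch size")
    predictions = np.argmax(logits, axis=1)
    return {"accuracy": float(np.mean(predictions == labels))}
```

A subclass could call such a helper from `score_output` once it has concatenated the per-batch outputs returned by inference.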

This project was developed as part of CSC 290 at the University of Rochester.
