OTX is a Python-based tool for benchmarking and testing ONNX models with different optimization levels using ONNX Runtime.
- Test model correctness across multiple optimization levels
- Profile GPU performance metrics (inference time, utilization, memory usage)
- Compare performance between base and optimized models
- Support for PYNVML monitoring and NVIDIA Nsight Compute profiling
- Python 3.8+
- NVIDIA GPU with CUDA 11.0+ support
- NVIDIA Nsight Compute (optional, for NCU profiling)
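
The prerequisites above can be checked before installing anything. The following is a minimal sketch using only the Python standard library; `check_requirements` is a hypothetical helper and not part of OTX. It assumes that `nvidia-smi` (shipped with the NVIDIA driver) and `ncu` (the Nsight Compute command-line tool) would be on `PATH` when present, and it does not verify the CUDA version.

```python
import shutil
import sys


def check_requirements(min_python=(3, 8)):
    """Report which prerequisites appear to be present on this machine.

    Hypothetical helper for illustration; it only looks for executables on
    PATH and does not check the CUDA version or GPU capabilities.
    """
    return {
        "python_ok": sys.version_info >= min_python,
        # nvidia-smi ships with the NVIDIA driver, so finding it suggests a driver is installed.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
        # ncu is the Nsight Compute CLI; it is only needed for NCU profiling.
        "ncu_found": shutil.which("ncu") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

A missing `ncu` is not an error, since Nsight Compute is optional.
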
Install required Python packages:
pip install onnx onnxruntime-gpu numpy pillow torchvision requests huggingface-hub nvidia-ml-py

from OTX.implementation.resnet import ResnetModel
import onnxruntime as rt
# Load model from Hugging Face
model = ResnetModel("resnet50-v2-7.onnx", repo_id="onnxmodelzoo/resnet50-v2-7")
model.setup_dataset("path/to/imagenet/images")
# Run inference
outputs, stats = model.inference(device_id=0, capture_stats=True)
# Create optimized variant
optimized_model = model.optimize_model("optimized.onnx", rt.GraphOptimizationLevel.ORT_ENABLE_ALL)
opt_outputs, opt_stats = optimized_model.inference(device_id=0)
# Compare results
comparison = model.compare_outputs(outputs, opt_outputs)

- DISABLE_ALL - No optimizations
- ENABLE_BASIC - Basic graph optimizations (constant folding, redundant node elimination)
- ENABLE_EXTENDED - Complex device-specific node fusions
- ENABLE_ALL - All optimizations including layout transformations
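
These names correspond to members of `onnxruntime.GraphOptimizationLevel`, each prefixed with `ORT_`. For readers who want to see what producing an optimized variant can look like in plain ONNX Runtime, here is a minimal sketch. It is not necessarily how OTX's `optimize_model` is implemented; `LEVEL_ATTRS` and `save_optimized` are hypothetical names, and the default provider list is an assumption based on the GPU requirement.

```python
# Shorthand level names from the list above, mapped to the attribute names
# on onnxruntime.GraphOptimizationLevel.
LEVEL_ATTRS = {
    "DISABLE_ALL": "ORT_DISABLE_ALL",
    "ENABLE_BASIC": "ORT_ENABLE_BASIC",
    "ENABLE_EXTENDED": "ORT_ENABLE_EXTENDED",
    "ENABLE_ALL": "ORT_ENABLE_ALL",
}


def save_optimized(src_path, dst_path, level="ENABLE_ALL",
                   providers=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Hypothetical helper: write an optimized copy of an ONNX model to dst_path.

    This is an illustration, not OTX's optimize_model.
    """
    # Imported lazily so the name mapping stays usable without ONNX Runtime installed.
    import onnxruntime as rt

    options = rt.SessionOptions()
    options.graph_optimization_level = getattr(rt.GraphOptimizationLevel, LEVEL_ATTRS[level])
    # When this path is set, ONNX Runtime serializes the optimized graph
    # while the session is being created.
    options.optimized_model_filepath = dst_path
    rt.InferenceSession(src_path, sess_options=options, providers=list(providers))
    return dst_path
```

Graphs saved at the extended or full level may contain fusions specific to the execution provider used during optimization, so an optimized file is best loaded on the same kind of hardware it was produced on.
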
import onnxruntime as rt
basic = model.optimize_model("model_basic.onnx", rt.GraphOptimizationLevel.ORT_ENABLE_BASIC)
extended = model.optimize_model("model_ext.onnx", rt.GraphOptimizationLevel.ORT_ENABLE_EXTENDED)
all_opt = model.optimize_model("model_all.onnx", rt.GraphOptimizationLevel.ORT_ENABLE_ALL)

PYNVML Mode (lightweight monitoring for full datasets):
outputs, stats = model.inference(device_id=0, capture_stats=True, ncu_mode=False)
print(f"GPU Utilization: {stats.avg_gpu_util}%")
print(f"Memory Usage: {stats.peak_memory_mb} MB")

NCU Mode (detailed kernel profiling):
outputs, ncu_output = model.inference(device_id=0, ncu_mode=True)

- Image Classification: ResNet, MobileNet, ShuffleNet, SqueezeNet, GoogleNet
- Object Detection: SSD, YOLO, Faster R-CNN
See OTX/implementation/ for all available models.
from OTX.core import Model
class MyModel(Model):
    def setup_dataset(self, directory: str):
        """Load your dataset"""
        pass

    def score_output(self, outputs):
        """Score model predictions"""
        return {"accuracy": 0.95}

    def compare_outputs(self, outputs_a, outputs_b):
        """Compare two sets of outputs"""
        return {"agreement_ratio": 1.0}

    def prepare_input_feed(self, data, session):
        """Prepare input for ONNX Runtime"""
        input_name = session.get_inputs()[0].name
        return {input_name: data}

This project was developed as part of CSC 290 at the University of Rochester.