This directory contains scripts to test and compare ONNX models before and after simplification using onnxsim. These scripts help you understand the impact of ONNX model optimization on file size, inference speed, and numerical accuracy.

`test_onnx_simplification.py` is a comprehensive script that creates a complex PyTorch model, exports it to ONNX, simplifies it, and compares:
- Model file sizes
- Number of nodes and operations
- Inference performance
- Output accuracy
- Support for dynamic batch sizes
Usage:

    python test_onnx_simplification.py

Features:
- Creates a complex model with attention, batch norm, and redundant operations
- Exports to ONNX with proper optimization settings
- Uses `onnxsim` for simplification
- Benchmarks inference time with ONNX Runtime
- Validates output accuracy between original and simplified models
- Tests with multiple batch sizes
- Generates detailed comparison reports
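
The node and operator counts in this comparison can be gathered with a few lines. The sketch below is one possible approach, not necessarily what `test_onnx_simplification.py` does; `graph_stats` is a hypothetical helper name. It assumes only that the model exposes `graph.node` entries carrying an `op_type` attribute, which is how an ONNX `ModelProto` returned by `onnx.load` is laid out.

```python
from collections import Counter


def graph_stats(model):
    """Summarize a loaded ONNX model (hypothetical helper).

    `model` is expected to look like an onnx.ModelProto: it must expose
    `model.graph.node`, where every node has an `op_type` string.
    """
    op_counts = Counter(node.op_type for node in model.graph.node)
    return {
        "num_nodes": sum(op_counts.values()),   # total nodes in the graph
        "num_op_types": len(op_counts),         # distinct operator types
        "op_counts": dict(op_counts),           # per-operator breakdown
    }
```

With the `onnx` package installed, calling `graph_stats(onnx.load("model.onnx"))` on both the original and the simplified file should give the two columns needed for the node and operator rows of the report.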

`test_existing_model.py` tests simplification on existing ONNX models in your workspace.
Usage:

    python test_existing_model.py --model path/to/your/model.onnx

Example:

    python test_existing_model.py --model dynamic_shape_model_dynamo.onnx
    python test_existing_model.py --model 290925_model_opset21.onnx

Features:
- Works with any existing ONNX model
- Automatically creates appropriate test inputs
- Compares performance and accuracy
- Saves simplified model for inspection
- Handles dynamic shapes and multiple inputs/outputs
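
Creating test inputs for a model with dynamic shapes means replacing every symbolic dimension with a concrete size. The following is a minimal sketch of that step under stated assumptions: dynamic dimensions show up as `None` or as a string name (as they do in the `shape` of the entries returned by `onnxruntime.InferenceSession.get_inputs()`), and a dynamic first axis is the batch axis. `resolve_shape` is a hypothetical helper, not a function from the scripts.

```python
def resolve_shape(shape, batch_size=1, default_dim=1):
    """Turn a possibly-dynamic ONNX input shape into a concrete tuple.

    Dynamic dimensions are assumed to appear as None, a non-positive int,
    or a symbolic string such as "batch_size". A dynamic first axis is
    treated as the batch axis; any other dynamic axis uses `default_dim`.
    """
    concrete = []
    for axis, dim in enumerate(shape):
        if isinstance(dim, int) and dim > 0:
            concrete.append(dim)          # static dimension, keep as-is
        elif axis == 0:
            concrete.append(batch_size)   # dynamic batch axis
        else:
            concrete.append(default_dim)  # any other dynamic axis
    return tuple(concrete)
```

A random float input could then be built with `np.random.rand(*resolve_shape(inp.shape, batch_size=4)).astype(np.float32)` for each `inp` in `session.get_inputs()`, assuming the input is a float32 tensor. Calling it with several `batch_size` values is one way to exercise dynamic batch support.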

`demo_simplification.py` is a simple demonstration script that finds ONNX models in the current directory and shows the simplification benefits.
Usage:

    python demo_simplification.py

Features:
- Automatically finds ONNX models in current directory
- Shows basic statistics (node count, file size)
- Quick simplification without detailed benchmarking
- Good for getting a quick overview of potential improvements
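
The discovery step of the demo can be approximated with the standard library alone. This is a sketch of one way to list models and their sizes, not the demo's actual code; `find_onnx_models` is a hypothetical name, and it scans only the top level of the given directory.

```python
from pathlib import Path


def find_onnx_models(directory="."):
    """Return (path, size_in_mb) pairs for every .onnx file in `directory`.

    Only the top level of the directory is scanned; results are sorted
    by path so the output order is stable.
    """
    models = []
    for path in sorted(Path(directory).glob("*.onnx")):
        size_mb = path.stat().st_size / (1024 * 1024)
        models.append((path, size_mb))
    return models
```

Looping over `find_onnx_models()` and printing `f"{path.name}: {size_mb:.2f} MB"` gives a quick file-size overview before any simplification is attempted.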
Install required dependencies:

    pip install -r requirements.txt

Or manually:

    pip install torch onnx onnxruntime onnxsim numpy

Example output:

    Metric                    Original  Simplified Improvement
    ----------------------------------------------------------
    File Size (MB)                5.23        3.45       34.0%
    Number of Nodes               1247         892       28.5%
    Operator Types                  23          18       21.7%
    Inference Time (ms)          12.45        8.32       33.2%

Interpreting the metrics:
- File Size Reduction: Smaller models load faster and use less memory
- Node Reduction: Fewer operations usually mean less per-node overhead and faster inference
- Operator Type Reduction: A smaller set of operator types may be better optimized by the runtime
- Inference Time: Direct measurement of the performance improvement
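
The improvement column is a relative reduction: the drop from the original value to the simplified value, divided by the original value. A small sketch of that arithmetic follows; the helper names and column widths are illustrative choices, not taken from the scripts.

```python
def improvement_pct(original, simplified):
    """Percentage reduction from `original` to `simplified`."""
    if original == 0:
        return 0.0  # avoid division by zero for empty metrics
    return (original - simplified) / original * 100.0


def format_row(metric, original, simplified):
    """Format one comparison line: metric, both values, and the reduction."""
    pct = improvement_pct(original, simplified)
    return f"{metric:<22}{original:>12}{simplified:>12}{pct:>11.1f}%"


print(format_row("File Size (MB)", 5.23, 3.45))
print(format_row("Number of Nodes", 1247, 892))
```

A negative percentage from `improvement_pct` would indicate that the simplified model got larger or slower on that metric, which is worth checking before adopting it.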

The scripts check that simplified models produce outputs matching those of the original models within a small tolerance (1e-5). This confirms that simplification has not changed the model's behavior beyond floating-point noise.

ONNX simplification (`onnxsim`) performs several optimizations:
- Constant Folding: Computes constant expressions at optimization time
- Dead Code Elimination: Removes unused operations
- Operator Fusion: Combines multiple operations into single optimized operations
- Shape Inference: Improves shape information for better optimization
- Redundant Operation Removal: Eliminates identity operations and no-ops

Examples of such rewrites:
- Remove `Identity` nodes that don't change data
- Fold constants (e.g., `x + 0` becomes just `x`)
- Merge consecutive operations (e.g., multiple reshapes)
- Eliminate unreachable code
- Simplify mathematical expressions
Typical improvements you might see:
| Model Type | Size Reduction | Speed Improvement | Node Reduction |
|---|---|---|---|
| CNN | 20-40% | 15-30% | 25-45% |
| Transformer | 15-35% | 10-25% | 20-40% |
| MLP | 10-25% | 5-20% | 15-30% |
To test your existing models:
- Single Model Test: `python test_existing_model.py --model your_model.onnx`
- Quick Overview: `python demo_simplification.py`
- Full PyTorch Pipeline: `python test_onnx_simplification.py`

Common issues:
- "Module not found" errors: Install the missing dependencies with `pip install onnx onnxruntime onnxsim`.
- Simplification validation fails: Some models with complex dynamic shapes may not simplify safely. The original model will still work.
- Memory errors with large models: The scripts hold multiple copies of a model in memory. For very large models (>1GB), you may need more RAM.
- Shape inference errors: Models with highly dynamic shapes may fail shape inference. Try smaller batch sizes or simpler models first.

Performance tips:
- Use GPU providers for faster inference: `providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']`
- For very large models, reduce the number of benchmark runs
- Test with realistic input sizes for your use case
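
The number of benchmark runs is easiest to tune when timing lives in one small function. Here is a minimal sketch of such a harness, assuming nothing about how the scripts implement it; `benchmark`, `warmup`, and `runs` are illustrative names. It takes any zero-argument callable, so it works with an ONNX Runtime session wrapped in a lambda.

```python
import statistics
import time


def benchmark(run_once, warmup=5, runs=50):
    """Time `run_once()` and return latency statistics in milliseconds.

    `run_once` is any zero-argument callable, for example
    `lambda: session.run(None, inputs)` for an onnxruntime session.
    Warm-up calls are executed first and excluded from the statistics.
    """
    for _ in range(warmup):
        run_once()
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "min_ms": min(timings_ms),
        "runs": runs,
    }
```

For very large models, lowering `runs` (and `warmup`) keeps the total benchmark time manageable; the median is usually less sensitive to occasional slow iterations than the mean.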
You can modify the scripts to use your own test data instead of random inputs:

    # In test_existing_model.py, replace create_dummy_inputs()
    import numpy as np

    # The key must match the model's actual input name
    test_inputs = {
        'input_name': your_real_data.astype(np.float32)
    }

The scripts automatically test different batch sizes to verify dynamic shape support.
For detailed output comparison, the scripts provide:
- Maximum absolute difference between outputs
- Mean absolute difference
- Relative error percentage
This helps you understand if any numerical differences are significant for your application.
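
These three metrics are straightforward to compute with NumPy. The sketch below is one possible formulation, not the scripts' own code: `compare_outputs` is a hypothetical helper, and the relative error is defined here as the mean absolute difference divided by the mean magnitude of the reference output, which is an assumption since the scripts' exact definition is not stated.

```python
import numpy as np


def compare_outputs(reference, candidate, atol=1e-5, eps=1e-12):
    """Compare two model outputs of the same shape (hypothetical helper)."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    abs_diff = np.abs(reference - candidate)
    max_abs = float(abs_diff.max())
    mean_abs = float(abs_diff.mean())
    # Assumed definition: mean absolute difference relative to the mean
    # magnitude of the reference; eps guards against an all-zero reference.
    rel_pct = float(mean_abs / (np.abs(reference).mean() + eps) * 100.0)
    return {
        "max_abs_diff": max_abs,
        "mean_abs_diff": mean_abs,
        "relative_error_pct": rel_pct,
        "within_tolerance": max_abs <= atol,
    }
```

Passing the original model's output as `reference` and the simplified model's output as `candidate` yields a small dictionary that can be printed or logged; `atol` can be loosened if your application tolerates larger deviations.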
After running the tests, you'll find:
- `simplification_test/` directory with simplified models
- `onnx_simplification_test/` directory with complete test results
- Detailed console output with comparison tables

Next steps:
- Run the demo on your existing models
- Analyze the improvements for your specific use case
- Integrate simplified models into your deployment pipeline
- Consider further optimizations such as quantization or TensorRT conversion
Remember: Always validate that simplified models maintain acceptable accuracy for your specific application!