You can use the run subtool to compare a model between different frameworks.
In the simplest case, you can supply a model, and one or more framework flags.
By default, it will generate synthetic input data, run inference with each of
the specified frameworks, and then compare their outputs.
In this example, we'll outline various common use-cases for the run subtool:
- Comparing TensorRT And ONNX-Runtime Outputs
- Comparing TensorRT Precisions
- Changing Tolerances
- Changing Comparison Metrics
- Comparing Per-Layer Outputs Between ONNX-Runtime And TensorRT
To run the model in Polygraphy with both frameworks and perform an output comparison:
polygraphy run dynamic_identity.onnx --trt --onnxrt

The dynamic_identity.onnx model has dynamic input shapes. By default,
Polygraphy will override any dynamic input dimensions in the model to
constants.DEFAULT_SHAPE_VALUE (defined as 1) and warn you:
[W] Input tensor: X (dtype=DataType.FLOAT, shape=(1, 2, -1, -1)) | No shapes provided; Will use shape: [1, 2, 1, 1] for min/opt/max in profile.
[W] This will cause the tensor to have a static shape. If this is incorrect, please set the range of shapes for this input tensor.
In order to suppress this message and explicitly provide input shapes to
Polygraphy, use the --input-shapes option:
polygraphy run dynamic_identity.onnx --trt --onnxrt \
--input-shapes X:[1,2,4,4]
To build a TensorRT engine with reduced precision layers for comparison against
ONNX-Runtime, use one of the supported precision flags (--tf32, --fp16, or
--int8). For example:
polygraphy run dynamic_identity.onnx --trt --fp16 --onnxrt \
--input-shapes X:[1,2,4,4]
⚠️ Getting acceptable accuracy with INT8 precision typically requires an additional calibration step: see the developer guide and instructions on how to do calibration with Polygraphy on the command line.
The default tolerances used by run are usually appropriate for FP32 precision
but may not be appropriate for reduced precisions. In order to relax tolerances,
you can use the --atol and --rtol options to set absolute and relative
tolerance respectively.
You can use the --check-error-stat option to change the metric used for
comparison. By default, Polygraphy uses an "elementwise" metric
(--check-error-stat elemwise).
Other possible metrics for --check-error-stat are mean, median, and max, which
compare the mean, median, and maximum absolute/relative error across the tensor, respectively.
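As a rough sketch of what these aggregate metrics measure (plain Python for illustration only; Polygraphy's actual implementation operates on whole tensors), each one reduces the per-element error to a single statistic that is then checked against the tolerance. The output values here are hypothetical, not taken from a real run:

```python
import statistics

# Hypothetical outputs from two frameworks (illustration only).
out0 = [1.00, 2.00, 3.00, 4.00]
out1 = [1.01, 2.00, 2.90, 4.00]

# Elementwise absolute error between the two outputs.
absdiff = [abs(a - b) for a, b in zip(out0, out1)]

# Statistics roughly corresponding to --check-error-stat mean / median / max.
mean_err = statistics.mean(absdiff)    # average error across the tensor
median_err = statistics.median(absdiff)  # middle value of the sorted errors
max_err = max(absdiff)                 # single worst element
```

A comparison using `max` is the strictest of the three, since a single outlier element is enough to exceed the tolerance, while `mean` and `median` smooth over isolated outliers.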
To better understand this, suppose we are
comparing two outputs out0 and out1. Polygraphy takes
the elementwise absolute and relative difference of these tensors:
absdiff = abs(out0 - out1)
reldiff = absdiff / abs(out1)
Then, for each index i in the output, Polygraphy checks whether
absdiff[i] > atol and reldiff[i] > rtol. If any index satisfies this,
then the comparison will fail. This is less stringent than comparing the maximum
absolute and relative error across the entire tensor (--check-error-stat max) since if
different indices i and j satisfy absdiff[i] > atol and reldiff[j] > rtol,
then the max comparison will fail but the elemwise comparison may
pass.
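To make the distinction concrete, here is a small pure-Python sketch (an illustration of the logic described above, not Polygraphy's implementation) where the elemwise check passes but the max check fails, because the largest absolute and relative errors occur at different indices:

```python
def elemwise_check(out0, out1, atol, rtol):
    # Fails only if some single index exceeds BOTH tolerances.
    for a, b in zip(out0, out1):
        absdiff = abs(a - b)
        reldiff = absdiff / abs(b)
        if absdiff > atol and reldiff > rtol:
            return False  # mismatch at this index
    return True

def max_check(out0, out1, atol, rtol):
    # Fails if the maximum absolute OR maximum relative error
    # (possibly at different indices) exceeds its tolerance.
    absdiffs = [abs(a - b) for a, b in zip(out0, out1)]
    reldiffs = [d / abs(b) for d, b in zip(absdiffs, out1)]
    return max(absdiffs) <= atol and max(reldiffs) <= rtol

# Index 0: large absolute error but tiny relative error.
# Index 1: large relative error but tiny absolute error.
out0 = [1000.2, 0.0010]
out1 = [1000.0, 0.0008]
atol, rtol = 0.1, 0.1

elemwise_ok = elemwise_check(out0, out1, atol, rtol)  # True
max_ok = max_check(out0, out1, atol, rtol)            # False
```

No single index exceeds both tolerances, so the elemwise comparison passes, while the max comparison fails on both the absolute error at index 0 and the relative error at index 1.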
Putting it all together, the below example runs a median comparison between
TensorRT using FP16 and ONNX-Runtime, using absolute and relative tolerances of 0.001:
polygraphy run dynamic_identity.onnx --trt --fp16 --onnxrt \
--input-shapes X:[1,2,4,4] \
--atol 0.001 --rtol 0.001 --check-error-stat median

You can also specify per-output values for
--atol/--rtol/--check-error-stat. See the help output of the run subtool for more information.
When network outputs do not match, it can be useful to compare per-layer outputs
to see where the error is introduced. To do so, you can use the --trt-outputs
and --onnx-outputs options respectively. These options accept one or more
output names as their arguments. The special value mark all indicates that all
tensors in the model should be compared:
polygraphy run dynamic_identity.onnx --trt --onnxrt \
--trt-outputs mark all \
--onnx-outputs mark all

To find the first mismatched output more easily, you can use the --fail-fast
option which will cause the tool to exit after the first mismatch between
outputs.
Note that use of --trt-outputs mark all can sometimes perturb the generated
engine due to differences in timing, layer fusion choices, and format
constraints, which can hide the failure. In that case, you may have to use a
more sophisticated approach to bisect the failing model and generate a reduced
test case that reproduces the error. See Reducing Failing ONNX
Models for a tutorial on
how to do this with Polygraphy.
- In some cases you may need to do comparisons across multiple Polygraphy runs (for example, when comparing the output of a pre-built TensorRT engine or Polygraphy network script against ONNX-Runtime). See Comparing Across Runs for a tutorial on how to accomplish this.
- For more details on working with dynamic shapes in TensorRT:
  - See Dynamic Shapes in TensorRT for how to specify optimization profiles for use with the engine using the Polygraphy CLI
  - See TensorRT and Dynamic Shapes for details on how to do this with the Polygraphy API
- For details on how to supply real input data, see Comparing with Custom Input Data.
- See Debugging TensorRT Accuracy Issues for a broader tutorial on how to debug accuracy failures using Polygraphy.