Description
I need to run Triton Server with an ONNX model for which the TensorRT engine is generated on-the-fly. I'm aware that I could use the trtexec utility to build the TensorRT engine ahead of time, but I have multiple types of GPUs and would need to run trtexec separately on each kind of host. Having ONNX Runtime generate the TensorRT engine on-the-fly is what I need.
I have an ONNX model with grid, the EfficientNMS plugin, and dynamic batch size.
Building an engine from it with trtexec works fine:
./tensorrt/bin/trtexec --onnx=yolov7.onnx --minShapes=images:1x3x640x640 --optShapes=images:8x3x640x640 --maxShapes=images:8x3x640x640 --fp16 --workspace=4096 --saveEngine=yolov7-fp16-1x8x8.engine --timingCacheFile=timing.cache
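For context, the standalone equivalent of what I want the onnxruntime backend to do is roughly the following (a minimal sketch using the ONNX Runtime TensorRT execution provider; the model path and cache directory are just my repository layout, and the option values mirror the trtexec flags above):

import onnxruntime as ort

# Sketch: let the TensorRT execution provider build and cache the engine on the
# GPU it actually runs on, instead of pre-building engines with trtexec per GPU type.
providers = [
    ("TensorrtExecutionProvider", {
        "trt_fp16_enable": True,                                 # matches --fp16
        "trt_max_workspace_size": 4294967296,                    # ~4 GiB workspace
        "trt_engine_cache_enable": True,                         # reuse the built engine across restarts
        "trt_engine_cache_path": "/models/yolov7/1/trt_cache",   # assumed cache location
    }),
    "CUDAExecutionProvider",  # fallback for nodes TensorRT cannot take
]
session = ort.InferenceSession("/models/yolov7/1/model.onnx", providers=providers)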
Issue description
I0504 16:36:16.981021 1 server.cc:610]
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0504 16:36:16.981056 1 server.cc:653]
+-----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+-----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| yolov7| 1 | UNAVAILABLE: Internal: onnx runtime error 1: Load model from /models/yolov7/1/model.onnx failed:Fatal error: TRT:EfficientNMS_TRT(-1) is not a registered function/op |
+-----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
The model never becomes available: the onnxruntime backend fails at load time with a fatal error stating that "TRT:EfficientNMS_TRT(-1) is not a registered function/op", even though the TensorRT execution accelerator is enabled in the model configuration below.
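For what it's worth, the error does not seem specific to Triton: since EfficientNMS_TRT is a TensorRT plugin op (domain "TRT"), I would expect plain ONNX Runtime without the TensorRT execution provider to hit the same unregistered-op failure at session creation. A minimal check along those lines (model path is just my repository layout):

import onnxruntime as ort

# Without the TensorRT execution provider, ONNX Runtime does not know the
# EfficientNMS_TRT custom op, so session creation should fail with the same
# "not a registered function/op" error seen in the Triton log above.
try:
    ort.InferenceSession("/models/yolov7/1/model.onnx",
                         providers=["CPUExecutionProvider"])
except Exception as exc:
    print(exc)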
Steps to reproduce
Start Triton with the following model configuration (config.pbtxt):
name: "yolov7"
platform: "onnxruntime_onnx"
max_batch_size: 10
input [
  {
    name: "images"
    data_type: TYPE_FP32
    dims: [1, 3, 640, 640]
  }
]
output [
  {
    name: "num_dets"
    data_type: TYPE_INT32
    dims: [1, 1]
  },
  {
    name: "det_boxes"
    data_type: TYPE_FP32
    dims: [1, 300, 4]
  },
  {
    name: "det_scores"
    data_type: TYPE_FP32
    dims: [1, 300]
  },
  {
    name: "det_classes"
    data_type: TYPE_INT32
    dims: [1, 300]
  }
]
optimization {
  execution_accelerators {
    gpu_execution_accelerator : [
      {
        name : "tensorrt"
        parameters { key: "precision_mode" value: "FP16" }
        parameters { key: "max_workspace_size_bytes" value: "4073741824" }
      }
    ]
  }
}
dynamic_batching {
  max_queue_delay_microseconds: 100
}
Docker Run:
docker run --gpus all --rm --name triton_server --ipc=host -p8000:8000 -p8001:8001 -p8002:8002 -v /storage/triton-server/devel/triton-server_23.04/models:/models nvcr.io/nvidia/tritonserver:23.04-py3 tritonserver --model-repository=/models --log-verbose=1
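After starting the container I check the model state with a small script (assuming the tritonclient pip package; host and port come from the docker command above). Because the model stays UNAVAILABLE, the readiness check returns False:

import tritonclient.http as httpclient

# Quick readiness check against the server started by the docker command above.
client = httpclient.InferenceServerClient(url="localhost:8000")
print(client.is_model_ready("yolov7", "1"))  # prints False while the model is UNAVAILABLE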