Description
Describe the bug
I converted a frozen TensorFlow model (in the form of a .pb file) to ONNX using the following command:
python -m tf2onnx.convert --input rfcn_WIDERFACE.pb --inputs image_tensor:0[1,-1,-1,3] --outputs num_detections:0,detection_scores:0,detection_classes:0,detection_boxes:0 --output rfcn_WIDERFACE.onnx --opset=15
The resulting ONNX model is much slower on GPU (almost 2 seconds per image on average) than on CPU (0.35 seconds per image) with ONNX Runtime, in both Python and C++. After analyzing the model, it turned out that it contains many Loop subgraphs, which are most likely the cause of the poor GPU performance.
Could this be a bug in tf2onnx? I installed the latest tf2onnx via pip install.
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 18.04*): Windows Server 2016
- TensorFlow Version: 2.9
- Python version: 3.8
- ONNX version (if applicable, e.g. 1.11*): 1.16
- ONNXRuntime version (if applicable, e.g. 1.11*): 1.16
To Reproduce
python -m tf2onnx.convert --input rfcn_WIDERFACE.pb --inputs image_tensor:0[1,-1,-1,3] --outputs num_detections:0,detection_scores:0,detection_classes:0,detection_boxes:0 --output rfcn_WIDERFACE.onnx --opset=15
The original .pb model: https://evolucare-my.sharepoint.com/:u:/p/a_ducournau/EdHJfemstxxOjNA0uTGfmEUBzHIfEhiSHHQ20jZ-v_zY0w?e=Ew1QSa
The produced ONNX model: https://evolucare-my.sharepoint.com/:u:/p/a_ducournau/ETufzCteZplCjU-ODydjq9QBI9vdXQ-MIE8FthiJdxR2rA?e=Uekq9w
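The CPU/GPU timing comparison can be reproduced with something like the sketch below. It is not the exact script used for the numbers above. It assumes `onnxruntime-gpu` and `numpy` are installed, that the input is named `image_tensor:0` as in the conversion command, and that it takes uint8 images (an assumption, typical for TensorFlow object detection graphs). The image size and the helper names `mean_latency` and `compare_providers` are arbitrary choices for illustration.

```python
import time


def mean_latency(run, warmup=3, repeats=10):
    """Average wall-clock seconds per call of `run`, after warm-up calls."""
    for _ in range(warmup):
        run()
    start = time.perf_counter()
    for _ in range(repeats):
        run()
    return (time.perf_counter() - start) / repeats


def compare_providers(model_path, height=600, width=800):
    """Return mean seconds per image for the CPU and CUDA execution providers."""
    import numpy as np
    import onnxruntime as ort

    # Assumption: uint8 NHWC input, random content is enough for timing.
    image = np.random.randint(0, 256, size=(1, height, width, 3), dtype=np.uint8)
    results = {}
    for provider in ("CPUExecutionProvider", "CUDAExecutionProvider"):
        sess = ort.InferenceSession(model_path, providers=[provider])
        results[provider] = mean_latency(
            lambda: sess.run(None, {"image_tensor:0": image})
        )
    return results
```

Running `compare_providers("rfcn_WIDERFACE.onnx")` returns a dict mapping each provider name to its mean time per image. The warm-up calls keep one-time initialization cost out of the average.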