Description
Describe the bug
A clear and concise description of what the bug is.
- ResNet50 does not run correctly with the ARM Compute Library (ACL) execution provider; the failure occurs at the GEMM node.
- ACL also does not seem to accelerate the operator kernels compared to the CPU execution provider.
Urgency
If there are particular important use cases blocked by this or strict project-related timelines, please share more information and dates. If there are no hard deadlines, please specify none.
I need to use ACL as soon as possible for my graduation project.
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04 aarch64
- ONNX Runtime installed from (source or binary): Python wheel (cross-compiled)
- ONNX Runtime version: 1.6.0
- Python version: 3.5
- Visual Studio version (if applicable): X
- GCC/Compiler version (if compiling from source): aarch64-linux-gnu-gcc version 6.5.0
- CUDA/cuDNN version: X
- GPU model and memory: X
To Reproduce
- Describe steps/code to reproduce the behavior.
- Attach the ONNX model to the issue (where applicable) to expedite the investigation.
Run ResNet50 (model.onnx) with ACLExecutionProvider. The model is available at the link below:
https://drive.google.com/drive/folders/1r7Ii-h0yCuZjGDu-C_-XyJZ-wg3ksfUW?usp=sharing
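A minimal sketch of how the CPU and ACL results could be compared on the same input. It is not the script used for this report: the helper names (`max_abs_diff`, `run_once`, `compare_providers`), the random input, and the tolerance are all assumptions made for illustration. It sticks to Python 3.5 syntax and imports `onnxruntime` lazily so the comparison helper can be used on its own.

```python
def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two flat sequences."""
    ref = list(reference)
    cand = list(candidate)
    if len(ref) != len(cand):
        raise ValueError("output sizes differ: %d vs %d" % (len(ref), len(cand)))
    return max((abs(r - c) for r, c in zip(ref, cand)), default=0.0)


def run_once(model_path, provider):
    """Run the model once on `provider` and return its first output as a flat list."""
    import numpy as np
    import onnxruntime as ort  # needs the ACL-enabled wheel for the ACL provider

    sess = ort.InferenceSession(model_path)
    sess.set_providers([provider])  # e.g. "ACLExecutionProvider"
    inp = sess.get_inputs()[0]
    # Assumption: any symbolic or unknown dimension (such as batch) can be set to 1.
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    rng = np.random.RandomState(0)  # fixed seed, so both providers see the same input
    feed = {inp.name: rng.rand(*shape).astype(np.float32)}
    return sess.run(None, feed)[0].ravel().tolist()


def compare_providers(model_path, tol=1e-3):
    """Return (within_tolerance, max_abs_difference) for CPU vs ACL outputs."""
    cpu = run_once(model_path, "CPUExecutionProvider")
    acl = run_once(model_path, "ACLExecutionProvider")
    diff = max_abs_diff(cpu, acl)
    return diff <= tol, diff
```

Calling `compare_providers("model.onnx")` on the target board would be expected to either raise at the GEMM node or report a large difference if the problem described here is present.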
CMake command:
cmake -Donnxruntime_GCC_STATIC_CPP_RUNTIME=ON -DCMAKE_BUILD_TYPE=Release -Dprotobuf_WITH_ZLIB=OFF -DCMAKE_TOOLCHAIN_FILE=../many-tool-chain.cmake -Donnxruntime_ENABLE_PYTHON=ON -DPYTHON_LIBRARY=dl -Donnxruntime_USE_ACL=ON -Donnxruntime_ACL_HOME=/home/jwlee/ComputeLibrary -Donnxruntime_ACL_LIBS=/home/jwlee/ComputeLibrary/build -DPYTHON_EXECUTABLE=/home/jwlee/anaconda3/envs/tmp/bin/python3 -DONNX_CUSTOM_PROTOC_EXECUTABLE=/home/jwlee/protoc-3/bin/protoc "-DPYTHON_INCLUDE_DIR=/mnt/sdf2/usr/include;/mnt/sdf2/usr/include/python3.5m/" -DNUMPY_INCLUDE_DIR=/mnt/sdf2/usr/local/lib/python3.5/dist-packages/numpy/core/include/numpy ../cmake
Expected behavior
A clear and concise description of what you expected to happen.
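Running on the CPU execution provider gives the correct result, whereas the ACL execution provider does not recognize the parameters at the GEMM node (pred_w_0). ACL should produce the same result as the CPU.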
Screenshots
If applicable, add screenshots to help explain your problem.
Additional context
Add any other context about the problem here. If the issue is about a particular model, please share the model details as well to facilitate debugging.
Using ACL does not seem to make the operators run any faster than on the CPU. Results for both runs:
CPU:
https://drive.google.com/file/d/1cFKMR9KiK4zMSWWPkKZLoNJmVjzaxQs8/view?usp=sharing
ACL:
https://drive.google.com/file/d/13ZVTjSKVb2m9HO1Sd_0ZTUaSZIywUgSv/view?usp=sharing
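To check whether nodes were actually assigned to ACL, one option is to sum kernel time per provider and operator type from an ONNX Runtime profiling trace (produced by setting `enable_profiling = True` on `SessionOptions` and calling `end_profiling()` on the session). The sketch below assumes the trace is a JSON list of events in which node events have `"cat": "Node"`, a name ending in `_kernel_time`, a `dur` in microseconds, and `op_name` and `provider` under `args`. Those field names are assumptions about the trace layout and should be checked against a real trace. Whether the two linked files are such traces is not stated here.

```python
import json
from collections import defaultdict


def summarize_profile(events):
    """Sum kernel time (microseconds) per (provider, op_name) pair."""
    totals = defaultdict(int)
    for ev in events:
        if ev.get("cat") != "Node":
            continue  # skip session-level events
        # A node can emit several events; keep only the kernel execution ones.
        if not ev.get("name", "").endswith("_kernel_time"):
            continue
        args = ev.get("args", {})
        key = (args.get("provider", "unknown"), args.get("op_name", "unknown"))
        totals[key] += ev.get("dur", 0)
    return dict(totals)


def load_profile(path):
    """Load a profiling trace written as a JSON list of events."""
    with open(path) as f:
        return json.load(f)
```

If the summary for the ACL run shows most operators under `CPUExecutionProvider`, the nodes fell back to the CPU kernels, which would explain the lack of speedup.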