[Python API + ARM64] Running ResNet50 on an ARM board using ACL: error and performance issue #7234

Open
@JAEWOOKe

Description

Describe the bug
A clear and concise description of what the bug is.

  1. ResNet50 does not run correctly with the ARM Compute Library (ACL) execution provider; the failure occurs at the GEMM node.
  2. ACL also does not seem to accelerate the operator kernels compared to the CPU.

Urgency
If there are particular important use cases blocked by this or strict project-related timelines, please share more information and dates. If there are no hard deadlines, please specify none.
I need to get ACL working as soon as possible for my graduation project.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04 aarch64
  • ONNX Runtime installed from (source or binary): Python wheel (cross-compiled)
  • ONNX Runtime version: 1.6.0
  • Python version: 3.5
  • Visual Studio version (if applicable): X
  • GCC/Compiler version (if compiling from source): aarch64-linux-gnu-gcc version 6.5.0
  • CUDA/cuDNN version: X
  • GPU model and memory: X

To Reproduce

  • Describe steps/code to reproduce the behavior.
  • Attach the ONNX model to the issue (where applicable) to expedite the investigation.
    Run ResNet50 (model.onnx) with ACLExecutionProvider.
    https://drive.google.com/drive/folders/1r7Ii-h0yCuZjGDu-C_-XyJZ-wg3ksfUW?usp=sharing
    CMake command:
    cmake -Donnxruntime_GCC_STATIC_CPP_RUNTIME=ON \
      -DCMAKE_BUILD_TYPE=Release \
      -Dprotobuf_WITH_ZLIB=OFF \
      -DCMAKE_TOOLCHAIN_FILE=../many-tool-chain.cmake \
      -Donnxruntime_ENABLE_PYTHON=ON \
      -DPYTHON_LIBRARY=dl \
      -Donnxruntime_USE_ACL=ON \
      -Donnxruntime_ACL_HOME=/home/jwlee/ComputeLibrary \
      -Donnxruntime_ACL_LIBS=/home/jwlee/ComputeLibrary/build \
      -DPYTHON_EXECUTABLE=/home/jwlee/anaconda3/envs/tmp/bin/python3 \
      -DONNX_CUSTOM_PROTOC_EXECUTABLE=/home/jwlee/protoc-3/bin/protoc \
      "-DPYTHON_INCLUDE_DIR=/mnt/sdf2/usr/include;/mnt/sdf2/usr/include/python3.5m/" \
      -DNUMPY_INCLUDE_DIR=/mnt/sdf2/usr/local/lib/python3.5/dist-packages/numpy/core/include/numpy \
      ../cmake

Expected behavior
A clear and concise description of what you expected to happen.
Running on the CPU gives the correct result, whereas with ACL the GEMM node does not recognize its parameters (pred_w_0).
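A minimal sketch of the CPU vs. ACL comparison described above, in case it helps others reproduce it. It is not the script used for this report. It assumes the model takes a single float32 input, that the installed onnxruntime accepts the `providers` argument of `InferenceSession`, and that the model file is named model.onnx; the function names are hypothetical.

```python
def run_with_providers(model_path, providers, feed=None):
    """Run one inference with the requested execution providers."""
    # Imported lazily so max_abs_diff() below can be used without onnxruntime.
    import numpy as np
    import onnxruntime as ort

    sess = ort.InferenceSession(model_path, providers=providers)
    if feed is None:
        inp = sess.get_inputs()[0]
        # Assumption: one float32 input; symbolic dimensions are set to 1.
        shape = [d if isinstance(d, int) else 1 for d in inp.shape]
        feed = {inp.name: np.random.rand(*shape).astype(np.float32)}
    # get_providers() reports which providers the session actually registered.
    return sess.get_providers(), feed, sess.run(None, feed)


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(a) != len(b):
        raise ValueError("outputs have different sizes")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def compare_cpu_and_acl(model_path):
    """Feed the same random input to a CPU-only and an ACL session."""
    _, feed, cpu_out = run_with_providers(model_path, ["CPUExecutionProvider"])
    used, _, acl_out = run_with_providers(
        model_path, ["ACLExecutionProvider", "CPUExecutionProvider"], feed
    )
    diff = max_abs_diff(cpu_out[0].ravel().tolist(), acl_out[0].ravel().tolist())
    return used, diff
```

Calling `compare_cpu_and_acl("model.onnx")` on the board should either raise the same GEMM error or return the providers in use together with the largest output difference.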

Screenshots
If applicable, add screenshots to help explain your problem.
image
image

Additional context
Add any other context about the problem here. If the issue is about a particular model, please share the model details as well to facilitate debugging.
Using ACL doesn't seem to accelerate the operators.
CPU:
https://drive.google.com/file/d/1cFKMR9KiK4zMSWWPkKZLoNJmVjzaxQs8/view?usp=sharing
ACL:
https://drive.google.com/file/d/13ZVTjSKVb2m9HO1Sd_0ZTUaSZIywUgSv/view?usp=sharing
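To check which operators actually run on ACL, per-operator kernel time can be summed from an ONNX Runtime profiling trace (written when `SessionOptions.enable_profiling` is set). The sketch below is a rough helper, not a verified tool. It assumes the trace is a JSON array of events in which kernel timings have `cat` equal to "Node", a `name` ending in "_kernel_time", a `dur` in microseconds, and `op_name` and `provider` entries under `args`; these field names may differ between versions, and both function names are hypothetical.

```python
import json
from collections import defaultdict


def load_profile(path):
    """Load a profiling trace (assumed to be a JSON array of events)."""
    with open(path) as f:
        return json.load(f)


def kernel_time_by_op(events):
    """Sum kernel durations in microseconds per (provider, op type)."""
    totals = defaultdict(int)
    for ev in events:
        # Assumption: kernel timings are the "Node" events named *_kernel_time.
        if ev.get("cat") != "Node":
            continue
        if not ev.get("name", "").endswith("_kernel_time"):
            continue
        args = ev.get("args", {})
        key = (args.get("provider", "unknown"), args.get("op_name", "unknown"))
        totals[key] += ev.get("dur", 0)
    return dict(totals)
```

If every key in the ACL run still reports the CPU provider, the nodes were never assigned to ACL, which would explain the lack of speedup.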

Labels: ep:ACL (issues related to ACL execution provider)
