
Slow model loading time for CoreML quantized model #5718

Open · cccclai opened this issue Sep 27, 2024 · 9 comments
Labels: module: coreml (Issues related to Apple's Core ML delegation), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@cccclai (Contributor) commented Sep 27, 2024

🐛 Describe the bug

Check out #5710 and run:

python -m executorch.examples.apple.coreml.scripts.export -m resnet18 --quantize

The FP32 model runs fully resident on the ANE at 0.9 ms on average, with an 11.13 ms cold start (first inference).
The int8 quantized model also runs fully resident on the ANE, at 0.54 ms on average with a 3.10 ms cold start. Looking at the layers, there appear to be many quantize ops followed immediately by dequantize ops.
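
For reference, the quantize→dequantize pairs are inserted by the PT2E quantization step before lowering. A minimal sketch of that path, assuming the example script's CoreMLQuantizer flow (module paths and config values follow the executorch 0.4 / torch 2.5-dev era and may differ in other versions):

```python
# Sketch of the PT2E quantization path behind --quantize; convert_pt2e is
# what materializes the quantize/dequantize pairs seen in the lowered graph.
import torch
import torchvision
from coremltools.optimize.torch.quantization.quantization_config import (
    LinearQuantizerConfig,
    QuantizationScheme,
)
from executorch.backends.apple.coreml.quantizer import CoreMLQuantizer
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()
example_inputs = (torch.randn(1, 3, 224, 224),)

# int8 weights / uint8 activations; assumed to match the example's defaults
config = LinearQuantizerConfig.from_dict(
    {
        "global_config": {
            "quantization_scheme": QuantizationScheme.symmetric,
            "activation_dtype": torch.quint8,
            "weight_dtype": torch.qint8,
            "weight_per_channel": True,
        }
    }
)
quantizer = CoreMLQuantizer(config)

graph = capture_pre_autograd_graph(model, example_inputs)
prepared = prepare_pt2e(graph, quantizer)  # insert observers
prepared(*example_inputs)                  # calibrate on sample data
quantized = convert_pt2e(prepared)         # emit q/dq pairs
```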

Versions

Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 15.0 (arm64)
GCC version: Could not collect
Clang version: 16.0.0 (clang-1600.0.26.3)
CMake version: version 3.29.2
Libc version: N/A

Python version: 3.11.5 (main, Sep 11 2023, 08:31:25) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-15.0-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1 Pro

Versions of relevant libraries:
[pip3] executorch==0.4.0a0+7047162
[pip3] flake8==6.0.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] numpydoc==1.5.0
[pip3] torch==2.5.0.dev20240618
[pip3] torchaudio==2.4.0.dev20240618
[pip3] torchsr==1.0.4
[pip3] torchvision==0.20.0.dev20240618
[conda] executorch                0.4.0a0+7047162          pypi_0    pypi
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] numpydoc                  1.5.0           py311hca03da5_0
[conda] torch                     2.4.0a0+gitae81855           dev_0    <develop>
[conda] torchaudio                2.4.0.dev20240618          pypi_0    pypi
[conda] torchsr                   1.0.4                    pypi_0    pypi
[conda] torchvision               0.20.0.dev20240618          pypi_0    pypi
@cccclai added the "module: coreml" label (Issues related to Apple's Core ML delegation) on Sep 27, 2024
@Olivia-liu added the "triaged" label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Sep 27, 2024
@YifanShenSZ (Collaborator)

Could you please clarify what the goal of this issue is? Is the 3.10 ms cold-start time too much?

@d-findlay commented Sep 27, 2024

Thanks @YifanShenSZ, I'd like to correct the numbers above. The goal of this issue is to resolve the long load time for quantized models using the CoreML delegate. Load times:
fp32: 484 ms
quantized: 1392 ms
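
For anyone reproducing these load-time numbers outside devtools, a hedged sketch of timing the program load through the ExecuTorch Python bindings; the .pte filename is a placeholder, and the bindings must be built with the Core ML backend for a delegated program to load:

```python
# Hedged sketch: wall-clock the .pte program load, which includes the
# Core ML delegate's model preparation.
import time

from executorch.extension.pybindings.portable_lib import _load_for_executorch

start = time.perf_counter()
module = _load_for_executorch("resnet18_coreml_quantized.pte")  # placeholder
print(f"load time: {(time.perf_counter() - start) * 1000:.0f} ms")
```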

@cccclai (Contributor, Author) commented Sep 27, 2024

> Could you please clarify what the goal of this issue is? Is the 3.10 ms cold-start time too much?

Sorry, the description was not clear; David's response above is clearer, and the report originally came from him.

@YifanShenSZ (Collaborator)

Thanks David. So the issue is that the executorch CoreML delegate has a much longer loading time than directly using the CoreML runtime?

Handing it over to @cymbalrush to investigate where the overhead comes from.

@d-findlay
Thanks @YifanShenSZ @cymbalrush. Both models are using the ExecuTorch CoreML delegate. The quantized model takes much longer.

@cymbalrush (Collaborator)

@d-findlay, how are you getting the load time? Is it from the devtools?

@cccclai (Contributor, Author) commented Oct 3, 2024

> @d-findlay, how are you getting the load time? Is it from the devtools?

I just asked @d-findlay, and he said both devtools and Xcode Instruments showed the long load time.

@d-findlay

@cymbalrush, we are using devtools with profiling enabled so we can inspect the run in Instruments.

We can see that the quantized model takes 1.3 seconds to Load (prepare and cache) the model on CoreML, of which 1.14 seconds is spent on the Neural Engine Compile.

By comparison, the unquantized model takes 464 ms to Load (prepare and cache) the model on CoreML, of which 297 ms is spent on the Neural Engine Compile.
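
To separate the Core ML prepare/compile stage from any delegate overhead, one option is to time a direct coremltools load of the .mlpackage produced at export. A sketch; the path is a placeholder, and only the first load pays the Neural Engine compile cost before the cache warms:

```python
# Hedged sketch: time a direct Core ML load of the lowered payload.
import time

import coremltools as ct

start = time.perf_counter()
# compute_units=ALL lets Core ML schedule onto the Neural Engine,
# which is where the compile cost shows up.
mlmodel = ct.models.MLModel(
    "lowered_module.mlpackage", compute_units=ct.ComputeUnit.ALL
)
print(f"Core ML load: {(time.perf_counter() - start) * 1000:.0f} ms")
```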

@d-findlay

@cymbalrush It's also worth noting that when we try to use MODEL_TYPE.COMPILED_MODEL, we get a failure:

`AttributeError: 'NoneType' object has no attribute 'get_compiled_model_path'`

However, this is unrelated to the concern above: with the default MODEL_TYPE we still get longer load times for quantized models.
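
For context, a sketch of how MODEL_TYPE.COMPILED_MODEL would be selected at export time, assuming the CoreML backend's generate_compile_specs / CoreMLPartitioner API as of executorch 0.4 (names may differ across versions):

```python
# Hedged sketch: request a precompiled .mlmodelc payload so the on-device
# load can skip the Neural Engine compile step.
import coremltools as ct

from executorch.backends.apple.coreml.compiler import CoreMLBackend
from executorch.backends.apple.coreml.partition import CoreMLPartitioner

compile_specs = CoreMLBackend.generate_compile_specs(
    compute_unit=ct.ComputeUnit.ALL,
    model_type=CoreMLBackend.MODEL_TYPE.COMPILED_MODEL,
)
partitioner = CoreMLPartitioner(compile_specs=compile_specs)
# ...then pass `partitioner` when lowering, as in the example export script.
```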
