From 4a593c11644a2687922e03f65b7c106d0bc91d1a Mon Sep 17 00:00:00 2001
From: wejoncy
Date: Wed, 27 Nov 2024 15:09:23 +0800
Subject: [PATCH 01/20] update coreml new operators and flags

---
 .../CoreML-ExecutionProvider.md | 69 +++++++++++++++++--
 1 file changed, 63 insertions(+), 6 deletions(-)

diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md
index 6ffa77edc60b5..7c6e0332d411b 100644
--- a/docs/execution-providers/CoreML-ExecutionProvider.md
+++ b/docs/execution-providers/CoreML-ExecutionProvider.md
@@ -41,6 +41,15 @@ The CoreML EP can be used via the C, C++, Objective-C, C# and Java APIs.

The CoreML EP must be explicitly registered when creating the inference session. For example:

+```C++
+Ort::Env env = Ort::Env{ORT_LOGGING_LEVEL_ERROR, "Default"};
+Ort::SessionOptions so;
+so.AppendExecutionProvider("CoreML", {{"MLComputeUnits", "MLProgram"}});
+Ort::Session session(env, model_path, so);
+```
+
+> [!WARNING]
+> Deprecated APIs `OrtSessionOptionsAppendExecutionProvider_CoreML` in ONNX Runtime 1.20.0. Please use `OrtSessionOptionsAppendExecutionProvider` instead.
```C++
Ort::Env env = Ort::Env{ORT_LOGGING_LEVEL_ERROR, "Default"};
Ort::SessionOptions so;
uint32_t coreml_flags = 0;
Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_CoreML(so, coreml_flags));
Ort::Session session(env, model_path, so);
```

-## Configuration Options
+## Configuration Options (NEW API)

There are several run time options available for the CoreML EP.

To use the CoreML EP run time options, create an unsigned integer representing the options, and set each individual option by using the bitwise OR operator.

ProviderOptions can be set by passing the unsigned integer to the `AppendExecutionProvider` method.
```
std::unordered_map<std::string, std::string> provider_options =
{{"MLComputeUnits", "MLProgram"}};
```

### Available Options (New API)
`ModelFormat` can be one of the following values:
- `MLProgram`: Create an MLProgram format model. Requires Core ML 5 or later (iOS 15+ or macOS 12+).
- `NeuralNetwork`: Create a NeuralNetwork format model. Requires Core ML 3 or later (iOS 13+ or macOS 10.15+).

`MLComputeUnits` can be one of the following values:
- `CPUOnly`: Limit CoreML to running on CPU only.
- `CPUAndNeuralEngine`: Enable CoreML EP for Apple devices with a compatible Apple Neural Engine (ANE).
- `CPUAndGPU`: Enable CoreML EP for Apple devices with a compatible GPU.
- `ALL`: Enable CoreML EP for all compatible Apple devices.

`RequireStaticInputShapes` can be one of the following values:
- `0`: Allow the CoreML EP to take nodes with inputs that have dynamic shapes.
- `1`: Only allow the CoreML EP to take nodes with inputs that have static shapes.

`EnableOnSubgraphs` can be one of the following values:
- `0`: Disable CoreML EP to run on a subgraph in the body of a control flow operator.
- `1`: Enable CoreML EP to run on a subgraph in the body of a control flow operator.

## Configuration Options (Old API)
> [!WARNING]
> It's deprecated
```
uint32_t coreml_flags = 0;
coreml_flags |= COREML_FLAG_ONLY_ENABLE_DEVICE_WITH_ANE;
```

-### Available Options
+### Available Options (Deprecated API)

##### COREML_FLAG_USE_CPU_ONLY

@@ -147,28 +185,47 @@ Operators that are supported by the CoreML Execution Provider when a MLProgram m

|Operator|Note|
|--------|------|
|ai.onnx:Add||
+|ai.onnx:ArgMax||
|ai.onnx:AveragePool|Only 2D Pool is supported currently.<br/>3D and 5D support can be added if needed.|
+|ai.onnx:Cast||
|ai.onnx:Clip||
|ai.onnx:Concat||
|ai.onnx:Conv|Only 1D/2D Conv is supported.<br/>Bias if provided must be constant.|
|ai.onnx:ConvTranspose|Weight and bias must be constant.<br/>padding_type of SAME_UPPER/SAME_LOWER is not supported.<br/>kernel_shape must have default values.<br/>output_shape is not supported.<br/>output_padding must have default values.|
-|ai.onnx.DepthToSpace|If 'mode' is 'CRD' the input must have a fixed shape.|
+|ai.onnx:DepthToSpace|If 'mode' is 'CRD' the input must have a fixed shape.|
|ai.onnx:Div||
+|ai.onnx:Erf||
|ai.onnx:Gemm|Input B must be constant.|
+|ai.onnx:Gelu||
|ai.onnx:GlobalAveragePool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.|
|ai.onnx:GlobalMaxPool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.|
|ai.onnx:GridSample|4D input.<br/>'mode' of 'linear' or 'zeros'.<br/>(mode==linear && padding_mode==reflection && align_corners==0) is not supported.|
-|ai.onnx.LeakyRelu||
+|ai.onnx:GroupNormalization||
+|ai.onnx:InstanceNormalization||
+|ai.onnx:LayerNormalization||
+|ai.onnx:LeakyRelu||
|ai.onnx:MatMul|Only support for transA == 0, alpha == 1.0 and beta == 1.0 is currently implemented.|
|ai.onnx:MaxPool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.|
+|ai.onnx:Max||
|ai.onnx:Mul||
|ai.onnx:Pow|Only supports cases when both inputs are fp32.|
+|ai.onnx:PRelu||
+|ai.onnx:Reciprocal|This requires an `epsilon` attribute (default 1e-4) that ONNX does not provide.|
+|ai.onnx:ReduceSum||
+|ai.onnx:ReduceMean||
+|ai.onnx:ReduceMax||
|ai.onnx:Relu||
|ai.onnx:Reshape||
|ai.onnx:Resize|See [resize_op_builder.cc](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/coreml/builders/impl/resize_op_builder.cc) implementation. There are too many permutations to describe the valid combinations.|
+|ai.onnx:Round||
+|ai.onnx:Shape||
-|ai.onnx.Slice|starts/ends/axes/steps must be constant initializers.|
-|ai.onnx.Split|If provided, `splits` must be constant.|
+|ai.onnx:Slice|starts/ends/axes/steps must be constant initializers.|
+|ai.onnx:Split|If provided, `splits` must be constant.|
|ai.onnx:Sub||
|ai.onnx:Sigmoid||
+|ai.onnx:Softmax||
+|ai.onnx:Sqrt||
+|ai.onnx:Squeeze||
|ai.onnx:Tanh||
|ai.onnx:Transpose||
+|ai.onnx:Unsqueeze||

From 1d8f3e991ec3b64d7c2d0a5982731cdbe1ddd7b3 Mon Sep 17 00:00:00 2001
From: wejoncy
Date: Wed, 27 Nov 2024 15:15:04 +0800
Subject: [PATCH 02/20] fix

---
 docs/execution-providers/CoreML-ExecutionProvider.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md
index 7c6e0332d411b..4977d7880b51e 100644
--- a/docs/execution-providers/CoreML-ExecutionProvider.md
+++ b/docs/execution-providers/CoreML-ExecutionProvider.md
@@ -65,7 +65,7 @@ There are several run time options available for the CoreML EP.
 To use the CoreML EP run time options, create an unsigned integer representing the options, and set each individual option by using the bitwise OR operator.

 ProviderOptions can be set by passing the unsigned integer to the `AppendExecutionProvider` method.
-```
+```c++
 std::unordered_map<std::string, std::string> provider_options =
 {{"MLComputeUnits", "MLProgram"}};
 ```

From d5fec11997955ac023ffd0007ed5e900184fbcca Mon Sep 17 00:00:00 2001
From: wejoncy
Date: Wed, 27 Nov 2024 15:29:25 +0800
Subject: [PATCH 03/20] fix bad link

---
 docs/genai/tutorials/finetune.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/genai/tutorials/finetune.md b/docs/genai/tutorials/finetune.md
index 5d0302b896dfc..9c0fb9efecd30 100644
--- a/docs/genai/tutorials/finetune.md
+++ b/docs/genai/tutorials/finetune.md
@@ -65,7 +65,7 @@ Olive generates models and adapters in ONNX format. These models and adapters ca
    Note: this operation requires a system with an NVIDIA GPU, with CUDA installed

-   Use the `olive fine-tune` command: https://microsoft.github.io/Olive/features/cli.html#finetune
+   Use the `olive fine-tune` command: https://microsoft.github.io/Olive/how-to/cli/cli-finetune.html

    Here is an example usage of the command:
2. Optionally, quantize your model

-   Use the `olive quantize` command: https://microsoft.github.io/Olive/features/cli.html#quantize
+   Use the `olive quantize` command: https://microsoft.github.io/Olive/how-to/cli/cli-quantize.html

3. Generate the ONNX model and adapter using the quantized model

-   Use the `olive auto-opt` command for this step: https://microsoft.github.io/Olive/features/cli.html#auto-opt
+   Use the `olive auto-opt` command for this step: https://microsoft.github.io/Olive/how-to/cli/cli-auto-opt.html

    The `--adapter path` can either be a HuggingFace adapter reference, or a path to the adapter you fine-tuned above.

python app.py -m -a <.onnx_adapter files> -t -s

## References
* [Python API docs](../api/python.md#adapter-class)
-* [Olive CLI docs](https://microsoft.github.io/Olive/features/cli.html)
+* [Olive CLI docs](https://microsoft.github.io/Olive/how-to/index.html)

From f21be8fa34f66263f6e960df725a5de476946621 Mon Sep 17 00:00:00 2001
From: wejoncy
Date: Wed, 27 Nov 2024 15:33:30 +0800
Subject: [PATCH 04/20] Revert "fix bad link"

This reverts commit d5fec11997955ac023ffd0007ed5e900184fbcca.

---
 docs/genai/tutorials/finetune.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/genai/tutorials/finetune.md b/docs/genai/tutorials/finetune.md
index 9c0fb9efecd30..5d0302b896dfc 100644
--- a/docs/genai/tutorials/finetune.md
+++ b/docs/genai/tutorials/finetune.md
@@ -65,7 +65,7 @@ Olive generates models and adapters in ONNX format. These models and adapters ca
    Note: this operation requires a system with an NVIDIA GPU, with CUDA installed

-   Use the `olive fine-tune` command: https://microsoft.github.io/Olive/how-to/cli/cli-finetune.html
+   Use the `olive fine-tune` command: https://microsoft.github.io/Olive/features/cli.html#finetune

    Here is an example usage of the command:

2. Optionally, quantize your model

-   Use the `olive quantize` command: https://microsoft.github.io/Olive/how-to/cli/cli-quantize.html
+   Use the `olive quantize` command: https://microsoft.github.io/Olive/features/cli.html#quantize

3. Generate the ONNX model and adapter using the quantized model

-   Use the `olive auto-opt` command for this step: https://microsoft.github.io/Olive/how-to/cli/cli-auto-opt.html
+   Use the `olive auto-opt` command for this step: https://microsoft.github.io/Olive/features/cli.html#auto-opt

    The `--adapter path` can either be a HuggingFace adapter reference, or a path to the adapter you fine-tuned above.

python app.py -m -a <.onnx_adapter files> -t -s

## References
* [Python API docs](../api/python.md#adapter-class)
-* [Olive CLI docs](https://microsoft.github.io/Olive/how-to/index.html)
+* [Olive CLI docs](https://microsoft.github.io/Olive/features/cli.html)

From 47ef5cbc27fb2939aa1633e3fc175d2d4ad4bfc1 Mon Sep 17 00:00:00 2001
From: wejoncy
Date: Wed, 27 Nov 2024 16:13:30 +0800
Subject: [PATCH 05/20] fix

---
 .../CoreML-ExecutionProvider.md | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md
index 4977d7880b51e..c9698f81f0161 100644
--- a/docs/execution-providers/CoreML-ExecutionProvider.md
+++ b/docs/execution-providers/CoreML-ExecutionProvider.md
@@ -44,12 +44,14 @@ The CoreML EP can be used via the C, C++, Objective-C, C# and Java APIs.
 The CoreML EP must be explicitly registered when creating the inference session. For example:
 ```C++
 Ort::Env env = Ort::Env{ORT_LOGGING_LEVEL_ERROR, "Default"};
 Ort::SessionOptions so;
-so.AppendExecutionProvider("CoreML", {{"MLComputeUnits", "MLProgram"}});
+std::unordered_map<std::string, std::string> provider_options;
+provider_options["ModelFormat"] = "MLProgram";
+so.AppendExecutionProvider("CoreML", provider_options);
 Ort::Session session(env, model_path, so);
 ```
-
-> [!WARNING]
-> Deprecated APIs `OrtSessionOptionsAppendExecutionProvider_CoreML` in ONNX Runtime 1.20.0. Please use `OrtSessionOptionsAppendExecutionProvider` instead.
+
+**Deprecated:** The `OrtSessionOptionsAppendExecutionProvider_CoreML` API was deprecated in ONNX Runtime 1.20.0. Please use `OrtSessionOptionsAppendExecutionProvider` instead.
 ```C++
 Ort::Env env = Ort::Env{ORT_LOGGING_LEVEL_ERROR, "Default"};
 Ort::SessionOptions so;

 ProviderOptions can be set by passing the unsigned integer to the `AppendExecutionProvider` method.
 ```c++
-std::unordered_map<std::string, std::string> provider_options =
-{{"MLComputeUnits", "MLProgram"}};
+std::unordered_map<std::string, std::string> provider_options;
+provider_options["ModelFormat"] = "MLProgram";
+provider_options["MLComputeUnits"] = "ALL";
+provider_options["RequireStaticInputShapes"] = "0";
+provider_options["EnableOnSubgraphs"] = "0";
 ```

 ## Configuration Options (Old API)
-> [!WARNING]
-> It's deprecated
 ```
 uint32_t coreml_flags = 0;
 coreml_flags |= COREML_FLAG_ONLY_ENABLE_DEVICE_WITH_ANE;
 ```

From 8304b12b7ab530ab051c8e44b5b71f2dcfddc7f2 Mon Sep 17 00:00:00 2001
From: wejoncy
Date: Thu, 28 Nov 2024 12:54:46 +0800
Subject: [PATCH 06/20] config

---
 .../CoreML-ExecutionProvider.md | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md
index c9698f81f0161..fc53a699cc70d 100644
--- a/docs/execution-providers/CoreML-ExecutionProvider.md
+++ b/docs/execution-providers/CoreML-ExecutionProvider.md
 provider_options["MLComputeUnits"] = "ALL";
 provider_options["RequireStaticInputShapes"] = "0";
 provider_options["EnableOnSubgraphs"] = "0";
 ```

+Python inference example code to use the CoreML EP run time options:
+```python
+import onnxruntime as ort
+model_path = "model.onnx"
+providers = [
+    ('CoreMLExecutionProvider', {
+        "ModelFormat": "MLProgram", "MLComputeUnits": "ALL",
+        "RequireStaticInputShapes": "0", "EnableOnSubgraphs": "0"
+    }),
+]
+
+session = ort.InferenceSession(model_path, providers=providers)
+outputs = session.run(None, input_feed)
+```

 ### Available Options (New API)
 `ModelFormat` can be one of the following values:
 - `MLProgram`: Create an MLProgram format model. Requires Core ML 5 or later (iOS 15+ or macOS 12+).
From 9231742ed74cb0a80f4591adf573112289a028cb Mon Sep 17 00:00:00 2001 From: wejoncy Date: Tue, 3 Dec 2024 11:37:05 +0800 Subject: [PATCH 07/20] address comments --- .../CoreML-ExecutionProvider.md | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md index fc53a699cc70d..2550c45ad7742 100644 --- a/docs/execution-providers/CoreML-ExecutionProvider.md +++ b/docs/execution-providers/CoreML-ExecutionProvider.md @@ -90,23 +90,29 @@ session = ort.InferenceSession(model_path, providers=providers) outputs = ort_sess.run(None, input_feed) ``` -### Available Options (New API) -`ModelFormat` can be one of the following values: +### Available Options (NEW API) +`ModelFormat` can be one of the following values: (`NeuralNetwork` by default ) - `MLProgram`: Create an MLProgram format model. Requires Core ML 5 or later (iOS 15+ or macOS 12+). - `NeuralNetwork`: Create a NeuralNetwork format model. Requires Core ML 3 or later (iOS 13+ or macOS 10.15+). -`MLComputeUnits` can be one of the following values: +`MLComputeUnits` can be one of the following values: (`ALL` by default ) - `CPUOnly`: Limit CoreML to running on CPU only. - `CPUAndNeuralEngine`: Enable CoreML EP for Apple devices with a compatible Apple Neural Engine (ANE). - `CPUAndGPU`: Enable CoreML EP for Apple devices with a compatible GPU. - `ALL`: Enable CoreML EP for all compatible Apple devices. -`RequireStaticInputShapes` can be one of the following values: +`RequireStaticInputShapes` can be one of the following values: (`0` by default ) + +Only allow the CoreML EP to take nodes with inputs that have static shapes. +By default the CoreML EP will also allow inputs with dynamic shapes, however performance may be negatively impacted by inputs with dynamic shapes. + - `0`: Allow the CoreML EP to take nodes with inputs that have dynamic shapes. - `1`: Only allow the CoreML EP to take nodes with inputs that have static shapes. -`EnableOnSubgraphs` can be one of the following values: +`EnableOnSubgraphs` can be one of the following values: (`0` by default ) + +Enable CoreML EP to run on a subgraph in the body of a control flow operator (i.e. a [Loop](https://github.com/onnx/onnx/blob/master/docs/Operators.md#loop), [Scan](https://github.com/onnx/onnx/blob/master/docs/Operators.md#scan) or [If](https://github.com/onnx/onnx/blob/master/docs/Operators.md#if) operator). - `0`: Disable CoreML EP to run on a subgraph in the body of a control flow operator. - `1`: Enable CoreML EP to run on a subgraph in the body of a control flow operator. From 2ec634177a951190832bfa61fde02bc2e08d47df Mon Sep 17 00:00:00 2001 From: wejoncy Date: Mon, 9 Dec 2024 11:16:21 +0000 Subject: [PATCH 08/20] add more flag --- docs/execution-providers/CoreML-ExecutionProvider.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md index 2550c45ad7742..2aa996576e1f9 100644 --- a/docs/execution-providers/CoreML-ExecutionProvider.md +++ b/docs/execution-providers/CoreML-ExecutionProvider.md @@ -116,6 +116,18 @@ Enable CoreML EP to run on a subgraph in the body of a control flow operator (i. - `0`: Disable CoreML EP to run on a subgraph in the body of a control flow operator. - `1`: Enable CoreML EP to run on a subgraph in the body of a control flow operator. 
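As a quick illustration of the options above, here is a minimal sketch in Python. The option names are taken from this page; the model filename and the CPU fallback are illustrative assumptions, not part of the original examples:

```python
import onnxruntime as ort

# Hedged sketch: run a model containing control-flow operators with the CoreML EP
# allowed on their subgraphs, while only taking nodes with static input shapes.
providers = [
    ("CoreMLExecutionProvider", {
        "ModelFormat": "MLProgram",
        "RequireStaticInputShapes": "1",  # only assign nodes whose inputs have static shapes
        "EnableOnSubgraphs": "1",         # also consider subgraphs inside Loop/Scan/If bodies
    }),
    "CPUExecutionProvider",  # fallback for nodes the CoreML EP does not take
]
session = ort.InferenceSession("model_with_loop.onnx", providers=providers)  # placeholder path
```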
+`SpecializationStrategy`: This feature is available since MacOs>=10.15 or IOS>=18.0. This process can affect the model loading time and the prediction latency. Use this option to tailor the specialization strategy for your model. Navigate to [Apple Doc](https://developer.apple.com/documentation/coreml/mloptimizationhints-swift.struct/specializationstrategy-swift.property) for more information. Can be one of the following values: (`Default` by default ) +- `Default`: +- `FastPrediction`: + +`ProfileComputePlan`:Profile the Core ML MLComputePlan. This logs the hardware each operator is dispatched to and the estimated execution time. Intended for developer usage but provide useful diagnostic information if performance is not as expected. can be one of the following values: (`0` by default ) +- `0`: Disable profile. +- `1`: Enable profile. + +`AllowLowPrecisionAccumulationOnGPU`: please refer to https://developer.apple.com/documentation/coreml/mlmodelconfiguration/allowlowprecisionaccumulationongpu. can be one of the following values: (`0` by default ) +- `0`: Use float32 data type to accumulate data. +- `1`: Use low precision data(float16) to accumulate data. + ## Configuration Options (Old API) ``` From 2df72b7cc04ebae25cee46f79742245d1729bf49 Mon Sep 17 00:00:00 2001 From: wejoncy Date: Mon, 9 Dec 2024 11:23:53 +0000 Subject: [PATCH 09/20] format --- docs/execution-providers/CoreML-ExecutionProvider.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md index 2aa996576e1f9..077d72b4a4200 100644 --- a/docs/execution-providers/CoreML-ExecutionProvider.md +++ b/docs/execution-providers/CoreML-ExecutionProvider.md @@ -124,7 +124,7 @@ Enable CoreML EP to run on a subgraph in the body of a control flow operator (i. - `0`: Disable profile. - `1`: Enable profile. -`AllowLowPrecisionAccumulationOnGPU`: please refer to https://developer.apple.com/documentation/coreml/mlmodelconfiguration/allowlowprecisionaccumulationongpu. can be one of the following values: (`0` by default ) +`AllowLowPrecisionAccumulationOnGPU`: please refer to [Apple Doc](https://developer.apple.com/documentation/coreml/mlmodelconfiguration/allowlowprecisionaccumulationongpu). can be one of the following values: (`0` by default ) - `0`: Use float32 data type to accumulate data. - `1`: Use low precision data(float16) to accumulate data. From 93be3ebb8e83ac74e26015265f76ab2c58fd5022 Mon Sep 17 00:00:00 2001 From: wejoncy Date: Mon, 16 Dec 2024 10:01:35 +0800 Subject: [PATCH 10/20] Update docs/execution-providers/CoreML-ExecutionProvider.md Co-authored-by: Scott McKay --- docs/execution-providers/CoreML-ExecutionProvider.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md index 077d72b4a4200..4d8f6d67eb062 100644 --- a/docs/execution-providers/CoreML-ExecutionProvider.md +++ b/docs/execution-providers/CoreML-ExecutionProvider.md @@ -116,7 +116,7 @@ Enable CoreML EP to run on a subgraph in the body of a control flow operator (i. - `0`: Disable CoreML EP to run on a subgraph in the body of a control flow operator. - `1`: Enable CoreML EP to run on a subgraph in the body of a control flow operator. -`SpecializationStrategy`: This feature is available since MacOs>=10.15 or IOS>=18.0. This process can affect the model loading time and the prediction latency. 
Use this option to tailor the specialization strategy for your model. Navigate to [Apple Doc](https://developer.apple.com/documentation/coreml/mloptimizationhints-swift.struct/specializationstrategy-swift.property) for more information. Can be one of the following values: (`Default` by default)
- `Default`: the default specialization strategy.
- `FastPrediction`: prefer fast prediction latency, at the potential cost of longer model load time.

`ProfileComputePlan`: Profile the Core ML MLComputePlan. This logs the hardware each operator is dispatched to and the estimated execution time. Intended for developer usage but provide useful diagnostic information if performance is not as expected. Can be one of the following values: (`0` by default)
- `0`: Disable profile.
- `1`: Enable profile.

`AllowLowPrecisionAccumulationOnGPU`: please refer to https://developer.apple.com/documentation/coreml/mlmodelconfiguration/allowlowprecisionaccumulationongpu. Can be one of the following values: (`0` by default)
- `0`: Use float32 data type to accumulate data.
- `1`: Use low precision data (float16) to accumulate data.

From 2df72b7cc04ebae25cee46f79742245d1729bf49 Mon Sep 17 00:00:00 2001
From: wejoncy
Date: Mon, 9 Dec 2024 11:23:53 +0000
Subject: [PATCH 09/20] format

---
 docs/execution-providers/CoreML-ExecutionProvider.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md
index 2aa996576e1f9..077d72b4a4200 100644
--- a/docs/execution-providers/CoreML-ExecutionProvider.md
+++ b/docs/execution-providers/CoreML-ExecutionProvider.md
-`AllowLowPrecisionAccumulationOnGPU`: please refer to https://developer.apple.com/documentation/coreml/mlmodelconfiguration/allowlowprecisionaccumulationongpu. Can be one of the following values: (`0` by default)
+`AllowLowPrecisionAccumulationOnGPU`: please refer to [Apple Doc](https://developer.apple.com/documentation/coreml/mlmodelconfiguration/allowlowprecisionaccumulationongpu). Can be one of the following values: (`0` by default)
 - `0`: Use float32 data type to accumulate data.
 - `1`: Use low precision data (float16) to accumulate data.

From 93be3ebb8e83ac74e26015265f76ab2c58fd5022 Mon Sep 17 00:00:00 2001
From: wejoncy
Date: Mon, 16 Dec 2024 10:01:35 +0800
Subject: [PATCH 10/20] Update docs/execution-providers/CoreML-ExecutionProvider.md

Co-authored-by: Scott McKay
---
 docs/execution-providers/CoreML-ExecutionProvider.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md
index 077d72b4a4200..4d8f6d67eb062 100644
--- a/docs/execution-providers/CoreML-ExecutionProvider.md
+++ b/docs/execution-providers/CoreML-ExecutionProvider.md
-`SpecializationStrategy`: This feature is available since MacOs>=10.15 or IOS>=18.0. This process can affect the model loading time and the prediction latency. Use this option to tailor the specialization strategy for your model. Navigate to [Apple Doc](https://developer.apple.com/documentation/coreml/mloptimizationhints-swift.struct/specializationstrategy-swift.property) for more information. Can be one of the following values: (`Default` by default)
+`SpecializationStrategy`: This feature is available since macOS>=10.15 or iOS>=18.0. This process can affect the model loading time and the prediction latency. Use this option to tailor the specialization strategy for your model. Navigate to [Apple Doc](https://developer.apple.com/documentation/coreml/mloptimizationhints-swift.struct/specializationstrategy-swift.property) for more information. Can be one of the following values: (`Default` by default)

From cfbda0e1262fc4e823ea1206f41592903279bd3c Mon Sep 17 00:00:00 2001
From: wejoncy
Date: Mon, 16 Dec 2024 10:01:49 +0800
Subject: [PATCH 11/20] Update docs/execution-providers/CoreML-ExecutionProvider.md

Co-authored-by: Scott McKay
---
 docs/execution-providers/CoreML-ExecutionProvider.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md
index 4d8f6d67eb062..bb168b180621c 100644
--- a/docs/execution-providers/CoreML-ExecutionProvider.md
+++ b/docs/execution-providers/CoreML-ExecutionProvider.md
-`ProfileComputePlan`: Profile the Core ML MLComputePlan. This logs the hardware each operator is dispatched to and the estimated execution time. Intended for developer usage but provide useful diagnostic information if performance is not as expected. Can be one of the following values: (`0` by default)
+`ProfileComputePlan`: Profile the Core ML MLComputePlan. This logs the hardware each operator is dispatched to and the estimated execution time. Intended for developer usage but provides useful diagnostic information if performance is not as expected. Can be one of the following values: (`0` by default)
 - `0`: Disable profile.
 - `1`: Enable profile.

From 2b29ed74ccca189fd7d50926f0c55297e029f7d1 Mon Sep 17 00:00:00 2001
From: wejoncy
Date: Mon, 16 Dec 2024 16:36:12 +0800
Subject: [PATCH 12/20] update model cache

---
 .../CoreML-ExecutionProvider.md | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md
index bb168b180621c..87d498129e217 100644
--- a/docs/execution-providers/CoreML-ExecutionProvider.md
+++ b/docs/execution-providers/CoreML-ExecutionProvider.md
 There are several run time options available for the CoreML EP.

 To use the CoreML EP run time options, create an unsigned integer representing the options, and set each individual option by using the bitwise OR operator.

-ProviderOptions can be set by passing the unsigned integer to the `AppendExecutionProvider` method.
+ProviderOptions can be set by passing string key/value pairs to the `AppendExecutionProvider` method.

 `AllowLowPrecisionAccumulationOnGPU`: please refer to [Apple Doc](https://developer.apple.com/documentation/coreml/mlmodelconfiguration/allowlowprecisionaccumulationongpu). Can be one of the following values: (`0` by default)
 - `0`: Use float32 data type to accumulate data.
 - `1`: Use low precision data (float16) to accumulate data.
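A minimal sketch combining the three flags documented above. The option names and values are taken from this page; the model path is an illustrative assumption:

```python
import onnxruntime as ort

# Hedged sketch: flag names/values as documented above; "model.onnx" is a placeholder.
providers = [
    ("CoreMLExecutionProvider", {
        "ModelFormat": "MLProgram",
        "SpecializationStrategy": "FastPrediction",  # trade longer load time for lower latency
        "ProfileComputePlan": "1",                   # log per-operator dispatch and time estimates
        "AllowLowPrecisionAccumulationOnGPU": "1",   # accumulate in float16 on the GPU
    }),
]
session = ort.InferenceSession("model.onnx", providers=providers)
```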
+`ModelCachePath`: The path to the directory where the Core ML model cache is stored. CoreML EP will compile the captured subgraph to CoreML format graph and saved to disk.
+For the same model, if caching is not enabled, CoreML EP will do the compiling and saving to disk every time, this may cost some time(even minutes) for complicated model. By passing a cache path and a model hash (which is different for different model), CoreML format model can be reused. (Cache disabled by default.)
+- `""` : Disable cache.
(empty string by default)
+- `"/path/to/cache"` : Enable cache. (path to cache directory, will be created if not exist)
+
+The model hash is very sensitive and important to a specific model, if the model content is changed, the hash will be changed, and the cache will be invalid. If user didn't provide a model hash, CoreML EP will calculate the hash based on the model Path, and use it as the model hash. Please attention that the model hash calculated by CoreML EP is not reliable if model path is not find or even user used a same model path for different model. In such case, even the model is changed, the cache will be reused, this will produce totally wrong results.
+
+Here is an example of how to fill the model hash in the model's metadata:
+```python
+import onnx
+import hashlib
+
+def hash_file(file_path, algorithm='sha256', chunk_size=8192):
+    hash_func = hashlib.new(algorithm)
+    with open(file_path, 'rb') as file:
+        while chunk := file.read(chunk_size):
+            hash_func.update(chunk)
+    return hash_func.hexdigest()
+
+CACHE_KEY_NAME = "CACHE_KEY"
+model_path = "/a/b/c/model.onnx"
+m = onnx.load(model_path)
+
+cache_key = m.metadata_props.add()
+cache_key.key = CACHE_KEY_NAME
+cache_key.value = str(hash_file(model_path))
+
+for entry in m.metadata_props:
+    print(entry)  # to verify the metadata
+onnx.save_model(m, model_path)
+```

From 5b6f2614dd2f6fdbab236950894e743024773b18 Mon Sep 17 00:00:00 2001
From: wejoncy
Date: Mon, 16 Dec 2024 18:21:18 +0800
Subject: [PATCH 13/20] Update docs/execution-providers/CoreML-ExecutionProvider.md

Co-authored-by: Scott McKay
---
 docs/execution-providers/CoreML-ExecutionProvider.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md
index 87d498129e217..2e185fa30321a 100644
--- a/docs/execution-providers/CoreML-ExecutionProvider.md
+++ b/docs/execution-providers/CoreML-ExecutionProvider.md
 cache_key = m.metadata_props.add()
 cache_key.key = CACHE_KEY_NAME
 cache_key.value = str(hash_file(model_path))

-for entry in m.metadata_props:
-    print(entry)  # to verify the metadata
 onnx.save_model(m, model_path)
 ```

From bdb608eaa7c2216f5f3d1149edda21335400bf84 Mon Sep 17 00:00:00 2001
From: wejoncy
Date: Mon, 16 Dec 2024 18:23:26 +0800
Subject: [PATCH 14/20] Update docs/execution-providers/CoreML-ExecutionProvider.md

Co-authored-by: Scott McKay
---
 docs/execution-providers/CoreML-ExecutionProvider.md | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md
index 2e185fa30321a..52a53e2c7dee3 100644
--- a/docs/execution-providers/CoreML-ExecutionProvider.md
+++ b/docs/execution-providers/CoreML-ExecutionProvider.md
-The model hash is very sensitive and important to a specific model, if the model content is changed, the hash will be changed, and the cache will be invalid. If user didn't provide a model hash, CoreML EP will calculate the hash based on the model Path, and use it as the model hash. Please attention that the model hash calculated by CoreML EP is not reliable if model path is not find or even user used a same model path for different model. In such case, even the model is changed, the cache will be reused, this will produce totally wrong results.
+The cached information for the model is stored under a model hash in the cache directory. There are three ways the hash may be calculated, in order of preference.
+
+1. Read from the model metadata_props. This provides the user a way to directly control the hash, and is the recommended usage. The value must only contain alphanumeric characters.
+2. Hash of the model url the inference session was created with.
+3. Hash of the graph inputs and node outputs if the inference session was created with in memory bytes (i.e. there was no model path).
+
+It is critical that, if the model changes, either the hash value changes or you clear out the previous cache information. e.g. if the model url is being used for the hash (option 2 above) the updated model must be loaded from a different path to change the hash value.
+
+ONNX Runtime does NOT have a mechanism to track model changes and does not delete the cache entries.

 Here is an example of how to fill the model hash in the model's metadata:

From 2c3343f675df5616c2a4e66c3532379cd07b3f28 Mon Sep 17 00:00:00 2001
From: wejoncy
Date: Mon, 16 Dec 2024 18:24:39 +0800
Subject: [PATCH 15/20] Update docs/execution-providers/CoreML-ExecutionProvider.md

Co-authored-by: Scott McKay
---
 docs/execution-providers/CoreML-ExecutionProvider.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md
index 52a53e2c7dee3..c59bbbc18c5b2 100644
--- a/docs/execution-providers/CoreML-ExecutionProvider.md
+++ b/docs/execution-providers/CoreML-ExecutionProvider.md
 `ModelCachePath`: The path to the directory where the Core ML model cache is stored. CoreML EP will compile the captured subgraph to CoreML format graph and saved to disk.
-For the same model, if caching is not enabled, CoreML EP will do the compiling and saving to disk every time, this may cost some time(even minutes) for complicated model. By passing a cache path and a model hash (which is different for different model), CoreML format model can be reused. (Cache disabled by default.)
+For the given model, if caching is not enabled, CoreML EP will compile and save to disk every time, which may cost significant time (even minutes) for a complicated model. By providing a cache path the CoreML format model can be reused. (Cache disabled by default.)
 - `""` : Disable cache. (empty string by default)
 - `"/path/to/cache"` : Enable cache.
(path to cache directory, will be created if not exist)

From a0db694e7899793bf1aed198b49939ba2273d383 Mon Sep 17 00:00:00 2001
From: wejoncy
Date: Mon, 16 Dec 2024 18:30:24 +0800
Subject: [PATCH 16/20] polish doc

---
 docs/execution-providers/CoreML-ExecutionProvider.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md
index c59bbbc18c5b2..f05d76c11c822 100644
--- a/docs/execution-providers/CoreML-ExecutionProvider.md
+++ b/docs/execution-providers/CoreML-ExecutionProvider.md
 ProviderOptions can be set by passing string key/value pairs to the `AppendExecutionProvider` method.
 ```c++
+Ort::Env env = Ort::Env{ORT_LOGGING_LEVEL_ERROR, "Default"};
+Ort::SessionOptions so;
+std::string model_path = "/a/b/c/model.onnx";
 std::unordered_map<std::string, std::string> provider_options;
 provider_options["ModelFormat"] = "MLProgram";
 provider_options["MLComputeUnits"] = "ALL";
 provider_options["RequireStaticInputShapes"] = "0";
 provider_options["EnableOnSubgraphs"] = "0";
+so.AppendExecutionProvider("CoreML", provider_options);
+Ort::Session session(env, model_path, so);
 ```

 Here is an example of how to fill the model hash in the model's metadata:
 ```python
 import onnx
 import hashlib

+# You can use any other hash algorithm to ensure the model and its hash value are a one-to-one mapping.
 def hash_file(file_path, algorithm='sha256', chunk_size=8192):
     hash_func = hashlib.new(algorithm)

From bc8abde74d45107e27573e09cf2e9f2f3575de36 Mon Sep 17 00:00:00 2001
From: wejoncy
Date: Mon, 16 Dec 2024 18:31:21 +0800
Subject: [PATCH 17/20] code format

---
 docs/execution-providers/CoreML-ExecutionProvider.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md
index f05d76c11c822..035902cb5e9fd 100644
--- a/docs/execution-providers/CoreML-ExecutionProvider.md
+++ b/docs/execution-providers/CoreML-ExecutionProvider.md
 Ort::Env env = Ort::Env{ORT_LOGGING_LEVEL_ERROR, "Default"};
 Ort::SessionOptions so;
 std::unordered_map<std::string, std::string> provider_options;
-provider_options["ModelFormat"] = "MLProgram"; 
+provider_options["ModelFormat"] = "MLProgram";
 so.AppendExecutionProvider("CoreML", provider_options);
 Ort::Session session(env, model_path, so);

From dee7ad122395219d6e363c5bac177468738f69ac Mon Sep 17 00:00:00 2001
From: wejoncy
Date: Mon, 16 Dec 2024 18:40:08 +0800
Subject: [PATCH 18/20] ModelCacheDirectory

---
 docs/execution-providers/CoreML-ExecutionProvider.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md
index 035902cb5e9fd..836f8de08550a 100644
--- a/docs/execution-providers/CoreML-ExecutionProvider.md
+++ b/docs/execution-providers/CoreML-ExecutionProvider.md
 `AllowLowPrecisionAccumulationOnGPU`: please refer to [Apple Doc](https://developer.apple.com/documentation/coreml/mlmodelconfiguration/allowlowprecisionaccumulationongpu). Can be one of the following values: (`0` by default)
 - `0`: Use float32 data type to accumulate data.
 - `1`: Use low precision data (float16) to accumulate data.

-`ModelCachePath`: The path to the directory where the Core ML model cache is stored. CoreML EP will compile the captured subgraph to CoreML format graph and saved to disk.
-`ModelCachePath`: The path to the directory where the Core ML model cache is stored. CoreML EP will compile the captured subgraph to CoreML format graph and saved to disk. +`ModelCacheDirectory`: The path to the directory where the Core ML model cache is stored. CoreML EP will compile the captured subgraph to CoreML format graph and saved to disk. For the given model, if caching is not enabled, CoreML EP will compile and save to disk every time, which may cost significant time (even minutes) for a complicated model. By providing a cache path the CoreML format model can be reused. (Cache disbled by default). - `""` : Disable cache. (empty string by default) - `"/path/to/cache"` : Enable cache. (path to cache directory, will be created if not exist) From a66a47321863d8c469892246c2ffff1445c6ecee Mon Sep 17 00:00:00 2001 From: wejoncy Date: Mon, 16 Dec 2024 19:03:09 +0800 Subject: [PATCH 19/20] update doc --- docs/execution-providers/CoreML-ExecutionProvider.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md index 836f8de08550a..1f0d1eb0b63ca 100644 --- a/docs/execution-providers/CoreML-ExecutionProvider.md +++ b/docs/execution-providers/CoreML-ExecutionProvider.md @@ -140,7 +140,7 @@ For the given model, if caching is not enabled, CoreML EP will compile and save The cached information for the model is stored under a model hash in the cache directory. There are three ways the hash may be calculated, in order of preference. -1. Read from the model metadata_props. This provides the user a way to directly control the hash, and is the recommended usage. The value must only contain alphanumeric characters. +1. Read from the model metadata_props. This provides the user a way to directly control the hash, and is the recommended usage. The cache key should satisfy that, (1) The value must only contain alphanumeric characters. (2) len(value) < 32. EP will re-hash the cache-key to satisfy these conditions. 2. Hash of the model url the inference session was created with. 3. Hash of the graph inputs and node outputs if the inference session was created with in memory bytes (i.e. there was no model path). From 210ae635bf70cb05ac5c02ea29f30c070ae10531 Mon Sep 17 00:00:00 2001 From: wejoncy Date: Tue, 17 Dec 2024 15:36:14 +0800 Subject: [PATCH 20/20] update chars limit --- docs/execution-providers/CoreML-ExecutionProvider.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md index 1f0d1eb0b63ca..7551cc37c2179 100644 --- a/docs/execution-providers/CoreML-ExecutionProvider.md +++ b/docs/execution-providers/CoreML-ExecutionProvider.md @@ -140,7 +140,7 @@ For the given model, if caching is not enabled, CoreML EP will compile and save The cached information for the model is stored under a model hash in the cache directory. There are three ways the hash may be calculated, in order of preference. -1. Read from the model metadata_props. This provides the user a way to directly control the hash, and is the recommended usage. The cache key should satisfy that, (1) The value must only contain alphanumeric characters. (2) len(value) < 32. EP will re-hash the cache-key to satisfy these conditions. +1. Read from the model metadata_props. This provides the user a way to directly control the hash, and is the recommended usage. 
The cache key must satisfy two conditions: (1) the value contains only alphanumeric characters, and (2) len(value) < 64. The EP will re-hash the cache key to satisfy these conditions.
 2. Hash of the model url the inference session was created with.
 3. Hash of the graph inputs and node outputs if the inference session was created with in memory bytes (i.e. there was no model path).
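Tying the caching pieces together, here is a minimal sketch using the `ModelCacheDirectory` option and the metadata mechanism described above. The cache directory and model path are illustrative assumptions:

```python
import onnxruntime as ort

# Hedged sketch: "ModelCacheDirectory" as documented above; paths are placeholders.
providers = [
    ("CoreMLExecutionProvider", {
        "ModelFormat": "MLProgram",
        "ModelCacheDirectory": "/tmp/coreml_cache",  # created if it does not exist
    }),
]
# The first session creation compiles the CoreML model and populates the cache;
# subsequent sessions reuse it while the CACHE_KEY metadata (or model url) is unchanged.
session = ort.InferenceSession("/a/b/c/model.onnx", providers=providers)
```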