Cherry-picks into rel-1.22.0 #24580
Merged
Conversation
### Description
Excludes QnnGpu.dll from the Windows x64 NuGet package because it is not available for that architecture.

### Motivation and Context
Fixes a failure in the QNN packaging pipeline:

```shell
CreateNativePackage:
  Generating nuspec for the native Microsoft.ML.OnnxRuntime.QNN nuget package...
  python ..\tools\nuget\generate_nuspec_for_native_nuget.py --package_version 1.22.0-dev-20250421-0439-2abab8d --package_name Microsoft.ML.OnnxRuntime.QNN --target_architecture x64 --build_config RelWithDebInfo --native_build_path D:\a\_work\1\b\RelWithDebInfo\RelWithDebInfo --packages_path D:\a\_work\1\b\packages --ort_build_path D:\a\_work\1\b --sources_path D:\a\_work\1\s --commit_id 2abab8d --is_release_build False --execution_provider None --nuspec_name NativeNuget.nuspec
  1 file(s) copied.
  1 file(s) copied.
  nuspec_name: NativeNuget.nuspec
  Bundling native shared library artifacts into Microsoft.ML.OnnxRuntime nuget package...
  nuget pack NativeNuget.nuspec
  Attempting to build package from 'NativeNuget.nuspec'.
##[error]EXEC(0,0): Error NU5019: File not found: 'D:\a\_work\1\b\RelWithDebInfo\RelWithDebInfo\QnnGpu.dll'.
EXEC : error NU5019: File not found: 'D:\a\_work\1\b\RelWithDebInfo\RelWithDebInfo\QnnGpu.dll'. [D:\a\_work\1\s\csharp\OnnxRuntime.CSharp.proj]
##[error]csharp\OnnxRuntime.CSharp.proj(109,5): Error MSB3073: The command "nuget pack NativeNuget.nuspec" exited with code 1.
```

Introduced by this PR: #24435
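As a hedged illustration of the fix described above, the per-architecture exclusion could look like the following sketch. The function and list names here are hypothetical, not the actual API of `generate_nuspec_for_native_nuget.py`.

```python
# Hypothetical sketch of excluding QnnGpu.dll from the x64 package; the
# function and list names are illustrative, not the packaging script's API.
QNN_RUNTIME_DLLS = ["QnnHtp.dll", "QnnCpu.dll", "QnnGpu.dll"]

def qnn_dlls_for_architecture(target_architecture):
    """Return the QNN runtime DLLs to list in the nuspec.

    QnnGpu.dll is not available for x64, so listing it there makes
    `nuget pack` fail with NU5019 (file not found).
    """
    dlls = list(QNN_RUNTIME_DLLS)
    if target_architecture == "x64":
        dlls.remove("QnnGpu.dll")
    return dlls
```

With a filter like this, the x64 nuspec simply never references the missing DLL, so `nuget pack` has nothing to fail on.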
### Description
For QNN EP, build FP-to-Bool Cast as NotEqual.

### Motivation and Context
HTP currently does not support FP-to-Bool Cast due to some limitations. To unblock CLIP models, replace such a Cast with NotEqual to achieve the same functionality.

Co-authored-by: minfhong-quic <[email protected]>
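The functional equivalence this replacement relies on can be sketched with NumPy (as an illustration only, not the QNN EP's actual graph transform): casting a floating-point tensor to bool yields True exactly where NotEqual(x, 0) does.

```python
import numpy as np

# Float -> bool Cast maps every nonzero value (including NaN and +/-inf)
# to True; NotEqual(x, 0) produces the same boolean tensor.
x = np.array([0.0, -0.0, 1.5, -2.0, np.nan, np.inf], dtype=np.float32)

cast_result = x.astype(bool)   # what the unsupported HTP Cast would compute
notequal_result = x != 0.0     # the NotEqual replacement

assert np.array_equal(cast_result, notequal_result)
```

Note that NaN compares unequal to zero, so it maps to True under both forms, which is what makes the substitution safe.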
### Description
Removes an unnecessary std::move on an r-value expression. This caused a compiler warning/error in the Linux Android QNN pipeline.

### Motivation and Context
Introduced by PR: #24466
`onnx.mapping` was deprecated and is being removed. This PR removes the deprecated usage. @MaanavD it would be good if this can make it into 1.22.0 for forward ONNX release (1.19+) compatibility.
…on Maven) (#24494)

### Description
Updates the Android QNN package to use QNN SDK 2.33.0, which is available on Maven. QNN SDK 2.33.2 is not yet available on Maven: https://mvnrepository.com/artifact/com.qualcomm.qti/qnn-runtime

### Motivation and Context
Previous PR that updated the QNN SDK version: #24440
Increases operator coverage for WebGPU EP.
Reverts #24372.

The above PR removed the `--build_nuget` command-line argument from the `dml-vs-2022.yml` file. This PR reverts that change and adds `--build_nuget` back to the file.

The `--build_nuget` option creates the `csharp\src\Microsoft.ML.OnnxRuntime\bin\RelWithDebInfo` directory structure and stores binaries there. A subsequent task in the yaml file tries to sign DLLs in `csharp\src\Microsoft.ML.OnnxRuntime\bin\RelWithDebInfo`; that task fails because the directory structure is never created once `--build_nuget` is removed.
### Description
Add `python_version < "3.13"` for the `onnxscript` dependency in tools/ci_build/github/linux/python/requirements.txt. `onnxscript` has `onnx` as a dependency, and building the `onnx` wheel fails with Python 3.13.

### Motivation and Context
Fix pipeline build failures.
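The effect of that environment marker can be sketched as follows. This helper is illustrative only: pip itself evaluates PEP 508 markers; it does not use code like this.

```python
import sys

def marker_allows_install(python_version=None, limit=(3, 13)):
    """Sketch of evaluating a `python_version < "3.13"` requirements
    marker (PEP 508 semantics); illustrative, not pip's implementation."""
    if python_version is None:
        python_version = sys.version_info[:2]
    return python_version < limit

# With the marker on the onnxscript line, Python 3.12 installs it
# and Python 3.13 skips it, avoiding the failing onnx wheel build.
assert marker_allows_install((3, 12)) is True
assert marker_allows_install((3, 13)) is False
```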
New EP: currently based on the existing TensorRT EP but meant to be used on RTX GPUs with a lean version of TensorRT.

### Description
Adds a new EP based on the TensorRT EP. It uses a special version of TensorRT optimized for RTX GPUs. In the future we plan to make changes to the EP to streamline it further (e.g., get rid of the dependency on the CUDA EP completely).

### Motivation and Context
The new TensorRT for RTX is going to have:
1. A much smaller footprint.
2. Much faster model compile/load times.
3. Better usability in terms of use of cached models across multiple RTX GPUs.

This effort is also targeting WCR ML workflows.

---------

Co-authored-by: Maximilian Müller <[email protected]>
Co-authored-by: Gaurav Garg <[email protected]>
Co-authored-by: iraut <[email protected]>
Co-authored-by: Hrishikesh Manohar <[email protected]>
Co-authored-by: Maximilian Müller <[email protected]>
…wnstream node is not QuantizeLinear (#24537)

### Description
Updates the WeightBiasQuantization optimizer to skip processing on Conv/Gemm nodes if the downstream child node is not a QuantizeLinear.

#### Before this PR
Original graph:
```
input_0 -> DQ -> Conv -> graph_output (or non-Q node)
                 ^  ^
                 |  |
weights_f32------+  |
bias_f32------------+
```
Becomes:
```
input_0 -> DQ ------> Conv -> graph_output (or non-Q node)
                      ^  ^
                      |  |
weights_quant -> DQ --+  |
bias_quant -> DQ --------+
```
The above is **NOT** a valid QDQ node unit for Conv because the Conv's output is not consumed by a QuantizeLinear node.

#### With this PR
The above example graph remains unchanged after L1 optimizations:
```
input_0 -> DQ -> Conv -> graph_output (or non-Q node)
                 ^  ^
                 |  |
weights_f32------+  |
bias_f32------------+
```

### Motivation and Context
Caused inaccuracy for a customer model. Automatically quantizing the weights and biases of a Conv/Gemm is detrimental if the output of the Conv/Gemm is not consumed by a QuantizeLinear node. In this scenario, the whole node group is not considered a valid QDQ node unit, so the EP has to run the Conv/Gemm as float32/float16 anyway. If the Conv/Gemm is running as float32/float16, then quantizing the weights and biases introduces inaccuracy for no gain.

PR that originally added this optimizer: #22969
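The skip condition described above can be sketched with a toy graph model. The `Node` class and consumer bookkeeping here are illustrative only, not onnxruntime's graph API.

```python
# Toy sketch of the WeightBiasQuantization skip condition; the Node class
# and consumer bookkeeping are illustrative, not onnxruntime's graph API.
class Node:
    def __init__(self, op_type, consumers=()):
        self.op_type = op_type
        self.consumers = list(consumers)  # nodes that consume this node's output

def should_quantize_weights_and_bias(node):
    """Quantize a Conv/Gemm's weights and bias only when its output feeds
    QuantizeLinear; otherwise the group is not a valid QDQ node unit and
    the EP would run the node as float anyway."""
    if node.op_type not in ("Conv", "Gemm"):
        return False
    return bool(node.consumers) and all(
        c.op_type == "QuantizeLinear" for c in node.consumers
    )

conv_in_qdq = Node("Conv", [Node("QuantizeLinear")])
conv_to_output = Node("Conv")  # output goes to a graph output / non-Q node
assert should_quantize_weights_and_bias(conv_in_qdq)
assert not should_quantize_weights_and_bias(conv_to_output)
```

Treating "no consumers" (a graph output) the same as a non-QuantizeLinear consumer mirrors the rationale above: without a downstream Q node, quantized weights add error with no speedup.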
### Description
Add wrappers for the AutoEP C API changes to the C++ API.
Fixes the bug from #24228 that caused incorrect results for Phi models when flash attention is disabled.
### Description
Fixes a segfault that occurs when an EP library is re-loaded in the same process.

### Motivation and Context
A recent [PR](#24430) updated the Environment to unload all EP libraries on destruction of `OrtEnv`. We forgot to properly update the state to mark the EP library as unloaded, which caused a segfault when the EP library was re-loaded.
### Description
This PR updates how the K path is identified in Phi-4 multimodal.

### Motivation and Context
This is needed as part of the updates made to the rewritten modeling code for the speech component of Phi-4 multimodal.
### Description
Fix memleakdbg call stack output. The call stack output was getting clobbered:

`C:\dev\onnxruntime\build\Debug\_deps\googletest-src\googletest\include\gtest\internal\gtest-port.h(1631): l\gtest-port.h(1631): eadLocal<testing::Sequence *>::GetOrCreateValue`

I think the issue is that the aliasing of `buffer` and `symbol` here:
https://github.com/microsoft/onnxruntime/blob/173a11a4e7a2f7a360c9db6abbe601a06a16f004/onnxruntime/core/platform/windows/debug_alloc.cc#L97-L100

does not play nicely with a call to `_snprintf_s` like this:
https://github.com/microsoft/onnxruntime/blob/173a11a4e7a2f7a360c9db6abbe601a06a16f004/onnxruntime/core/platform/windows/debug_alloc.cc#L115

The clobbered output does not match the predefined, ignored patterns, so we see spurious mem leak check output. This change updates the memleakdbg output generation to use C++ ostreams instead of fixed-size buffers and `_snprintf_s`.

### Motivation and Context
Fix spurious mem leak check output. Fix #24535.
### Description
Fix the DML autoep select test. It should only select one device, as that's all the test infrastructure is set up to handle.
jywu-msft approved these changes on Apr 29, 2025
/azp run onnxruntime-binary-size-checks-ci-pipeline

No pipelines are associated with this pull request.
adrianlizarraga approved these changes on Apr 30, 2025
jatinwadhwa921 pushed a commit to intel/onnxruntime that referenced this pull request on Apr 30, 2025
### Description
Cherry pick the following into [rel-1.22.0](https://github.com/microsoft/onnxruntime/tree/rel-1.22.0):
- (microsoft#24487)
- (microsoft#24466)
- (microsoft#24493)
- (microsoft#24484)
- (microsoft#24494)
- (microsoft#24489)
- (microsoft#24504)
- (microsoft#24510)
- (microsoft#24456)
- (microsoft#24537)
- (microsoft#24501)
- (microsoft#24519)
- (microsoft#24513)
- (microsoft#24539)
- (microsoft#24514)
- (microsoft#24542)
- (microsoft#24585)

Not added: planning to cherry pick the CUDA MatMulNBits PRs once the fix for the failing CUDA pipeline is ready:
- (microsoft#24491)
- (microsoft#24509)
- (microsoft#24564)

---------

Co-authored-by: vraspar <[email protected]>
Co-authored-by: Adrian Lizarraga <[email protected]>
Co-authored-by: minfhong-quic <[email protected]>
Co-authored-by: minfhong-quic <[email protected]>
Co-authored-by: Justin Chu <[email protected]>
Co-authored-by: Prathik Rao <[email protected]>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Ankan Banerjee <[email protected]>
Co-authored-by: Maximilian Müller <[email protected]>
Co-authored-by: Gaurav Garg <[email protected]>
Co-authored-by: iraut <[email protected]>
Co-authored-by: Hrishikesh Manohar <[email protected]>
Co-authored-by: Maximilian Müller <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
Co-authored-by: Jiajia Qin <[email protected]>
Co-authored-by: kunal-vaishnavi <[email protected]>
Co-authored-by: xhcao <[email protected]>