Cherry-picks into rel-1.22.0 #24580

Merged: 17 commits merged into rel-1.22.0 on Apr 30, 2025

Conversation

adrianlizarraga and others added 17 commits April 28, 2025 11:44
### Description
Excludes QnnGpu.dll from the Windows x64 NuGet package because it is not
available for that architecture.
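
As a rough illustration (not the actual packaging-script code), the fix amounts to making the list of bundled QNN DLLs architecture-dependent in `tools/nuget/generate_nuspec_for_native_nuget.py`; the helper name below is hypothetical:

```python
# Hypothetical sketch: build the list of QNN DLLs to bundle per architecture.
# QnnCpu.dll / QnnHtp.dll / QnnGpu.dll are QNN SDK backend libraries.
def qnn_dlls_for(target_architecture: str) -> list[str]:
    dlls = ["QnnCpu.dll", "QnnHtp.dll"]
    # QnnGpu.dll is not shipped for Windows x64, so exclude it there.
    if target_architecture != "x64":
        dlls.append("QnnGpu.dll")
    return dlls

assert "QnnGpu.dll" not in qnn_dlls_for("x64")
assert "QnnGpu.dll" in qnn_dlls_for("arm64")
```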



### Motivation and Context
Fix failure in QNN packaging pipeline:
```shell
CreateNativePackage:
  Generating nuspec for the native Microsoft.ML.OnnxRuntime.QNN nuget package...
  python ..\tools\nuget\generate_nuspec_for_native_nuget.py --package_version 1.22.0-dev-20250421-0439-2abab8d --package_name Microsoft.ML.OnnxRuntime.QNN --target_architecture x64 --build_config RelWithDebInfo --native_build_path D:\a\_work\1\b\RelWithDebInfo\RelWithDebInfo --packages_path D:\a\_work\1\b\packages --ort_build_path D:\a\_work\1\b --sources_path D:\a\_work\1\s --commit_id 2abab8d --is_release_build False --execution_provider None --nuspec_name NativeNuget.nuspec
          1 file(s) copied.
          1 file(s) copied.
  nuspec_name: NativeNuget.nuspec
  Bundling native shared library artifacts into Microsoft.ML.OnnxRuntime nuget package...
  nuget pack NativeNuget.nuspec
  Attempting to build package from 'NativeNuget.nuspec'.
##[error]EXEC(0,0): Error NU5019: File not found: 'D:\a\_work\1\b\RelWithDebInfo\RelWithDebInfo\QnnGpu.dll'.
EXEC : error NU5019: File not found: 'D:\a\_work\1\b\RelWithDebInfo\RelWithDebInfo\QnnGpu.dll'. [D:\a\_work\1\s\csharp\OnnxRuntime.CSharp.proj]
##[error]csharp\OnnxRuntime.CSharp.proj(109,5): Error MSB3073: The command "nuget pack NativeNuget.nuspec" exited with code 1.
```

Introduced by this PR:
#24435
### Description
For QNN-EP, rewrite FP-to-Bool Cast into NotEqual.


### Motivation and Context
HTP currently does not support FP-to-Bool Cast due to some limitations.
To unblock CLIP models, replace such Cast with NotEqual to achieve the
same functionality.
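
For intuition, here is a minimal numpy sketch (not the actual QNN-EP rewrite) showing that comparing against a zero constant reproduces FP-to-Bool Cast semantics:

```python
import numpy as np

x = np.array([-1.5, 0.0, 0.25, float("nan")], dtype=np.float32)

# FP-to-Bool Cast: zero maps to False; everything else, including NaN, maps to True.
cast_result = x.astype(np.bool_)

# The NotEqual replacement: compare against a zero constant of the same type.
not_equal_result = x != np.float32(0.0)

assert np.array_equal(cast_result, not_equal_result)
```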

Co-authored-by: minfhong-quic <[email protected]>
### Description
Removes unnecessary std::move on an r-value expression. This caused a
compiler warning/error in the Linux Android QNN pipeline.



### Motivation and Context
Introduced by PR: #24466
`onnx.mapping` was deprecated and is being removed. This PR removes the
deprecated usage.

@MaanavD it would be good if this can make it into 1.22.0 for forward
compatibility with ONNX releases (1.19+).
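
For illustration, the typical migration (assuming the usual `onnx.helper` replacement; the exact call sites are in the PR) looks like this:

```python
import numpy as np
import onnx
from onnx import helper

# Deprecated style, removed from newer onnx releases:
#   np_dtype = onnx.mapping.TENSOR_TYPE_TO_NP_TYPE[onnx.TensorProto.FLOAT]

# Replacement available since onnx 1.13:
np_dtype = helper.tensor_dtype_to_np_dtype(onnx.TensorProto.FLOAT)
assert np_dtype == np.dtype("float32")
```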
…on Maven) (#24494)

### Description
Updates the Android QNN package to use QNN SDK 2.33.0, which is
available on Maven. QNN SDK 2.33.2 is not available yet on Maven:
https://mvnrepository.com/artifact/com.qualcomm.qti/qnn-runtime


### Motivation and Context
Previous PR that updated QNN SDK version:
#24440
Increases operator coverage for WebGPU EP.
Reverts #24372

The above PR removed the `build-nuget` command-line argument from the
`dml-vs-2022.yml` file. This PR reverts that change and adds
`build-nuget` back to the file.


The `--build_nuget` option creates the
`csharp\src\Microsoft.ML.OnnxRuntime\bin\RelWithDebInfo` directory
structure and stores binaries there. A subsequent task in the
yaml file then tries to sign DLLs in
`csharp\src\Microsoft.ML.OnnxRuntime\bin\RelWithDebInfo`; however, this
task fails because the directory structure is never created (due to the
removal of `--build_nuget`).
### Description

Add `python_version < "3.13"` for `onnxscript` dependency in
tools/ci_build/github/linux/python/requirements.txt.

`onnxscript` has `onnx` as a dependency. Building the `onnx` wheel fails
with Python 3.13.
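
The resulting requirement line presumably looks like this (a standard PEP 508 environment marker):

```
onnxscript; python_version < "3.13"
```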

### Motivation and Context

Fix pipeline build failures.
New EP - currently based on existing TensorRT EP but meant to be used on
RTX GPUs with a lean version of TensorRT.

### Description
Adding a new EP based on the TensorRT EP. This is going to use a special
version of TensorRT optimized for RTX GPUs. In the future we plan to
make changes to the EP to streamline it further (e.g., get rid of the
dependency on the CUDA EP completely).

### Motivation and Context
The new TensorRT for RTX is going to have:
1. A much smaller footprint.
2. Much faster model compile/load times.
3. Better usability in terms of reuse of cached models across multiple
RTX GPUs.

This effort is also targeting WCR ML workflows.

---------

Co-authored-by: Maximilian Müller <[email protected]>
Co-authored-by: Gaurav Garg <[email protected]>
Co-authored-by: iraut <[email protected]>
Co-authored-by: Hrishikesh Manohar <[email protected]>
Co-authored-by: Maximilian Müller <[email protected]>
…wnstream node is not QuantizeLinear (#24537)

### Description
Updates the WeightBiasQuantization optimizer to skip processing on
Conv/Gemm nodes if the downstream child node is not a QuantizeLinear.

#### Before this PR
Original graph:
```
input_0 -> DQ -> Conv -> graph_output (or non-Q node)
                 ^  ^
                 |  |
weights_f32------+
                    |
bias_f32------------+
```
Becomes:

```
input_0 -> DQ ------> Conv -> graph_output (or non-Q node)
                      ^  ^
                      |  |
weights_quant -> DQ --+
                         |
bias_quant -> DQ --------+
```
The above is **NOT** a valid QDQ node unit for Conv because the Conv's
output is not consumed by a QuantizeLinear node.

#### With this PR
The above example graph remains unchanged after L1 optimizations:
```
input_0 -> DQ -> Conv -> graph_output (or non-Q node)
                 ^  ^
                 |  |
weights_f32------+
                    |
bias_f32------------+
```


### Motivation and Context
Caused inaccuracy for a customer model. Automatically quantizing the
weights and biases of a Conv/Gemm is detrimental if the output of the
Conv/Gemm is not consumed by a QuantizeLinear node. In this scenario,
the whole node group is not considered a valid QDQ node unit, and so the
EP has to run the Conv/Gemm as float32/float16 anyway. If the Conv/Gemm
is running as float32/float16, then quantizing the weights and biases
introduces inaccuracy for no gain.
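
A minimal sketch of the skip condition, written here as a hypothetical Python helper over an `onnx` graph (the actual optimizer is C++):

```python
import onnx

def should_quantize_weights(graph: onnx.GraphProto, node: onnx.NodeProto) -> bool:
    """Quantize Conv/Gemm weights/bias only if every output feeds a QuantizeLinear."""
    if node.op_type not in ("Conv", "Gemm"):
        return False
    graph_outputs = {o.name for o in graph.output}
    for out in node.output:
        # A graph output is, by definition, not consumed by a QuantizeLinear node.
        if out in graph_outputs:
            return False
        consumers = [n for n in graph.node if out in n.input]
        if any(c.op_type != "QuantizeLinear" for c in consumers):
            return False
    return True
```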

PR that originally added this optimizer:
#22969
### Description
Add wrappers for the AutoEP C API changes to the C++ API.


### Motivation and Context
Fixed the bug in #24228 that caused incorrect results for Phi models
when flash attention is disabled.
### Description
Fixes a segfault that occurs when an EP library is re-loaded in the same
process.


### Motivation and Context
A recent [PR](#24430) updated the Environment to unload all EP libraries
on destruction of `OrtEnv`. We forgot to properly update the state to
mark the EP library as unloaded, which caused a segfault when the EP
library was re-loaded.
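
The bug pattern, in a hypothetical Python sketch (the real code manages OS library handles in C++): if the "loaded" flag is not cleared on unload, a later load wrongly short-circuits and callers end up using a stale handle:

```python
class EpLibrary:
    def __init__(self) -> None:
        self.handle = None
        self.loaded = False

    def load(self) -> None:
        if self.loaded:          # A stale flag makes this wrongly succeed ...
            return               # ... and callers then use a dangling handle.
        self.handle = object()   # Stand-in for dlopen/LoadLibrary.
        self.loaded = True

    def unload(self) -> None:
        self.handle = None
        self.loaded = False      # The fix: clear the flag so a re-load re-opens.
```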
### Description

This PR updates how the K path is identified in Phi-4 multimodal.

### Motivation and Context

This is needed as part of the updates made to the rewritten modeling
code for the speech component of Phi-4 multimodal.
### Description

Fix memleakdbg call stack output.

The call stack output was getting clobbered:

`C:\dev\onnxruntime\build\Debug\_deps\googletest-src\googletest\include\gtest\internal\gtest-port.h(1631):
l\gtest-port.h(1631): eadLocal<testing::Sequence *>::GetOrCreateValue`

I think the issue is that this aliasing of `buffer` and `symbol`:

https://github.com/microsoft/onnxruntime/blob/173a11a4e7a2f7a360c9db6abbe601a06a16f004/onnxruntime/core/platform/windows/debug_alloc.cc#L97-L100

does not play nicely with a call to `_snprintf_s` like this:

https://github.com/microsoft/onnxruntime/blob/173a11a4e7a2f7a360c9db6abbe601a06a16f004/onnxruntime/core/platform/windows/debug_alloc.cc#L115

The clobbered output does not match the predefined, ignored patterns, so
we see spurious mem leak check output.

This change updates the memleakdbg output generation to use C++ ostreams
instead of fixed-size buffers and `_snprintf_s`.

### Motivation and Context

Fix spurious mem leak check output.
Fix #24535.
### Description
Fix the DML autoep select test. It should only select one device, as
that's all the test infrastructure is set up to handle.



### Motivation and Context
@jywu-msft (Member):

/azp run onnxruntime-binary-size-checks-ci-pipeline

No pipelines are associated with this pull request.

@jywu-msft merged commit ef546e9 into rel-1.22.0 on Apr 30, 2025.
182 of 195 checks passed.
@jywu-msft deleted the vraspar/rel1.22/cherry_picks_round1 branch April 30, 2025 03:07.
jatinwadhwa921 pushed a commit to intel/onnxruntime that referenced this pull request Apr 30, 2025
### Description

Cherry pick the following into
[rel-1.22.0](https://github.com/microsoft/onnxruntime/tree/rel-1.22.0)


- (microsoft#24487)
- (microsoft#24466)
- (microsoft#24493)
- (microsoft#24484)
- (microsoft#24494)
- (microsoft#24489)
- (microsoft#24504)
- (microsoft#24510)
- (microsoft#24456)
- (microsoft#24537)
- (microsoft#24501)
- (microsoft#24519)
- (microsoft#24513)
- (microsoft#24539)
- (microsoft#24514)
- (microsoft#24542)
- (microsoft#24585)

Not added:

Planning to cherry-pick the CUDA MatMulNBits PRs once the fix for the
failing CUDA pipeline is ready:
- (microsoft#24491)
- (microsoft#24509)
- (microsoft#24564)

---------

Co-authored-by: vraspar <[email protected]>
Co-authored-by: Adrian Lizarraga <[email protected]>
Co-authored-by: minfhong-quic <[email protected]>
Co-authored-by: minfhong-quic <[email protected]>
Co-authored-by: Justin Chu <[email protected]>
Co-authored-by: Prathik Rao <[email protected]>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Ankan Banerjee <[email protected]>
Co-authored-by: Maximilian Müller <[email protected]>
Co-authored-by: Gaurav Garg <[email protected]>
Co-authored-by: iraut <[email protected]>
Co-authored-by: Hrishikesh Manohar <[email protected]>
Co-authored-by: Maximilian Müller <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
Co-authored-by: Jiajia Qin <[email protected]>
Co-authored-by: kunal-vaishnavi <[email protected]>
Co-authored-by: xhcao <[email protected]>