Cherry-picks into rel-1.22.0 #24580

Merged: 17 commits merged into rel-1.22.0 on Apr 30, 2025

Conversation

adrianlizarraga and others added 17 commits April 28, 2025 11:44
### Description
Excludes QnnGpu.dll from the Windows x64 NuGet package because it is not
available for that architecture.
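
As a rough illustration (not the actual packaging-script code), the fix amounts to making the list of bundled QNN DLLs architecture-dependent in `tools/nuget/generate_nuspec_for_native_nuget.py`; the helper name below is hypothetical:

```python
# Hypothetical sketch: build the list of QNN DLLs to bundle per architecture.
# QnnCpu.dll / QnnHtp.dll / QnnGpu.dll are QNN SDK backend libraries.
def qnn_dlls_for(target_architecture: str) -> list[str]:
    dlls = ["QnnCpu.dll", "QnnHtp.dll"]
    # QnnGpu.dll is not shipped for Windows x64, so exclude it there.
    if target_architecture != "x64":
        dlls.append("QnnGpu.dll")
    return dlls

assert "QnnGpu.dll" not in qnn_dlls_for("x64")
assert "QnnGpu.dll" in qnn_dlls_for("arm64")
```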



### Motivation and Context
Fix failure in QNN packaging pipeline:
```shell
CreateNativePackage:
  Generating nuspec for the native Microsoft.ML.OnnxRuntime.QNN nuget package...
  python ..\tools\nuget\generate_nuspec_for_native_nuget.py --package_version 1.22.0-dev-20250421-0439-2abab8d --package_name Microsoft.ML.OnnxRuntime.QNN --target_architecture x64 --build_config RelWithDebInfo --native_build_path D:\a\_work\1\b\RelWithDebInfo\RelWithDebInfo --packages_path D:\a\_work\1\b\packages --ort_build_path D:\a\_work\1\b --sources_path D:\a\_work\1\s --commit_id 2abab8d --is_release_build False --execution_provider None --nuspec_name NativeNuget.nuspec
          1 file(s) copied.
          1 file(s) copied.
  nuspec_name: NativeNuget.nuspec
  Bundling native shared library artifacts into Microsoft.ML.OnnxRuntime nuget package...
  nuget pack NativeNuget.nuspec
  Attempting to build package from 'NativeNuget.nuspec'.
##[error]EXEC(0,0): Error NU5019: File not found: 'D:\a\_work\1\b\RelWithDebInfo\RelWithDebInfo\QnnGpu.dll'.
EXEC : error NU5019: File not found: 'D:\a\_work\1\b\RelWithDebInfo\RelWithDebInfo\QnnGpu.dll'. [D:\a\_work\1\s\csharp\OnnxRuntime.CSharp.proj]
##[error]csharp\OnnxRuntime.CSharp.proj(109,5): Error MSB3073: The command "nuget pack NativeNuget.nuspec" exited with code 1.
```

Introduced by this PR:
#24435
### Description
For QNN-EP, rewrite FP-to-Bool Cast into NotEqual.


### Motivation and Context
HTP currently does not support FP-to-Bool Cast due to some limitations.
To unblock CLIP models, replace such Cast with NotEqual to achieve the
same functionality.
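
For intuition, here is a minimal numpy sketch (not the actual QNN-EP rewrite) showing that comparing against a zero constant reproduces FP-to-Bool Cast semantics:

```python
import numpy as np

x = np.array([-1.5, 0.0, 0.25, float("nan")], dtype=np.float32)

# FP-to-Bool Cast: zero maps to False; everything else, including NaN, maps to True.
cast_result = x.astype(np.bool_)

# The NotEqual replacement: compare against a zero constant of the same type.
not_equal_result = x != np.float32(0.0)

assert np.array_equal(cast_result, not_equal_result)
```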

Co-authored-by: minfhong-quic <[email protected]>
### Description
Removes unnecessary std::move on an r-value expression. This caused a
compiler warning/error in the Linux Android QNN pipeline.



### Motivation and Context
Introduced by PR: #24466
`onnx.mapping` was deprecated and is being removed. This PR removes the
deprecated usage.

@MaanavD it would be good if this can make it into 1.22.0 for forward
compatibility with ONNX releases (1.19+).
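
For illustration, the typical migration (assuming the usual `onnx.helper` replacement; the exact call sites are in the PR) looks like this:

```python
import numpy as np
import onnx
from onnx import helper

# Deprecated style, removed from newer onnx releases:
#   np_dtype = onnx.mapping.TENSOR_TYPE_TO_NP_TYPE[onnx.TensorProto.FLOAT]

# Replacement available since onnx 1.13:
np_dtype = helper.tensor_dtype_to_np_dtype(onnx.TensorProto.FLOAT)
assert np_dtype == np.dtype("float32")
```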
…on Maven) (#24494)

### Description
Updates the Android QNN package to use QNN SDK 2.33.0, which is
available on Maven. QNN SDK 2.33.2 is not available yet on Maven:
https://mvnrepository.com/artifact/com.qualcomm.qti/qnn-runtime


### Motivation and Context
Previous PR that updated QNN SDK version:
#24440
Increases operator coverage for WebGPU EP.
Reverts #24372

The above PR removed the `build-nuget` command-line argument from the
`dml-vs-2022.yml` file. This PR reverts that change and adds
`build-nuget` back to the file.


The `--build_nuget` option creates the
`csharp\src\Microsoft.ML.OnnxRuntime\bin\RelWithDebInfo` directory
structure and stores binaries there. A subsequent task in the
yaml file then tries to sign DLLs in
`csharp\src\Microsoft.ML.OnnxRuntime\bin\RelWithDebInfo`; however, this
task fails because the directory structure is never created (due to the
removal of `--build_nuget`).
### Description

Add `python_version < "3.13"` for `onnxscript` dependency in
tools/ci_build/github/linux/python/requirements.txt.

`onnxscript` has `onnx` as a dependency. Building the `onnx` wheel fails
with Python 3.13.
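
The resulting requirement line presumably looks like this (a standard PEP 508 environment marker):

```
onnxscript; python_version < "3.13"
```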

### Motivation and Context

Fix pipeline build failures.
New EP - currently based on existing TensorRT EP but meant to be used on
RTX GPUs with a lean version of TensorRT.

### Description
Adding a new EP based on the TensorRT EP. This is going to use a special
version of TensorRT optimized for RTX GPUs. In the future we plan to
make changes to the EP to streamline it further (e.g., get rid of the
dependency on the CUDA EP completely).

### Motivation and Context
The new TensorRT for RTX is going to have:
1. A much smaller footprint.
2. Much faster model compile/load times.
3. Better usability in terms of reuse of cached models across multiple
RTX GPUs.

This effort is also targeting WCR ML workflows.

---------

Co-authored-by: Maximilian Müller <[email protected]>
Co-authored-by: Gaurav Garg <[email protected]>
Co-authored-by: iraut <[email protected]>
Co-authored-by: Hrishikesh Manohar <[email protected]>
Co-authored-by: Maximilian Müller <[email protected]>
…wnstream node is not QuantizeLinear (#24537)

### Description
Updates the WeightBiasQuantization optimizer to skip processing on
Conv/Gemm nodes if the downstream child node is not a QuantizeLinear.

#### Before this PR
Original graph:
```
input_0 -> DQ -> Conv -> graph_output (or non-Q node)
                 ^  ^
                 |  |
weights_f32------+
                    |
bias_f32------------+
```
Becomes:

```
input_0 -> DQ ------> Conv -> graph_output (or non-Q node)
                      ^  ^
                      |  |
weights_quant -> DQ --+
                         |
bias_quant -> DQ --------+
```
The above is **NOT** a valid QDQ node unit for Conv because the Conv's
output is not consumed by a QuantizeLinear node.

#### With this PR
The above example graph remains unchanged after L1 optimizations:
```
input_0 -> DQ -> Conv -> graph_output (or non-Q node)
                 ^  ^
                 |  |
weights_f32------+
                    |
bias_f32------------+
```


### Motivation and Context
Caused inaccuracy for a customer model. Automatically quantizing the
weights and biases of a Conv/Gemm is detrimental if the output of the
Conv/Gemm is not consumed by a QuantizeLinear node. In this scenario,
the whole node group is not considered a valid QDQ node unit, and so the
EP has to run the Conv/Gemm as float32/float16 anyway. If the Conv/Gemm
is running as float32/float16, then quantizing the weights and biases
introduces inaccuracy for no gain.
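
A minimal sketch of the skip condition, written here as a hypothetical Python helper over an `onnx` graph (the actual optimizer is C++):

```python
import onnx

def should_quantize_weights(graph: onnx.GraphProto, node: onnx.NodeProto) -> bool:
    """Quantize Conv/Gemm weights/bias only if every output feeds a QuantizeLinear."""
    if node.op_type not in ("Conv", "Gemm"):
        return False
    graph_outputs = {o.name for o in graph.output}
    for out in node.output:
        # A graph output is, by definition, not consumed by a QuantizeLinear node.
        if out in graph_outputs:
            return False
        consumers = [n for n in graph.node if out in n.input]
        if any(c.op_type != "QuantizeLinear" for c in consumers):
            return False
    return True
```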

PR that originally added this optimizer:
#22969
### Description
Add wrappers for the AutoEP C API changes to the C++ API.


### Motivation and Context
Fixed the bug in #24228 that caused incorrect results for Phi models
when flash attention is disabled.
### Description
Fixes a segfault that occurs when an EP library is re-loaded in the same
process.


### Motivation and Context
A recent [PR](#24430) updated the Environment to unload all EP libraries
on destruction of `OrtEnv`. We forgot to properly update the state to
mark the EP library as unloaded, which caused a segfault when the EP
library was re-loaded.
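
The bug pattern, in a hypothetical Python sketch (the real code manages OS library handles in C++): if the "loaded" flag is not cleared on unload, a later load wrongly short-circuits and callers end up using a stale handle:

```python
class EpLibrary:
    def __init__(self) -> None:
        self.handle = None
        self.loaded = False

    def load(self) -> None:
        if self.loaded:          # A stale flag makes this wrongly succeed ...
            return               # ... and callers then use a dangling handle.
        self.handle = object()   # Stand-in for dlopen/LoadLibrary.
        self.loaded = True

    def unload(self) -> None:
        self.handle = None
        self.loaded = False      # The fix: clear the flag so a re-load re-opens.
```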
### Description

This PR updates how the K path is identified in Phi-4 multimodal.

### Motivation and Context

This is needed as part of the updates made to the rewritten modeling
code for the speech component of Phi-4 multimodal.
### Description

Fix memleakdbg call stack output.

The call stack output was getting clobbered:

`C:\dev\onnxruntime\build\Debug\_deps\googletest-src\googletest\include\gtest\internal\gtest-port.h(1631):
l\gtest-port.h(1631): eadLocal<testing::Sequence *>::GetOrCreateValue`

I think the issue is that this aliasing of `buffer` and `symbol`:

https://github.com/microsoft/onnxruntime/blob/173a11a4e7a2f7a360c9db6abbe601a06a16f004/onnxruntime/core/platform/windows/debug_alloc.cc#L97-L100

does not play nicely with a call to `_snprintf_s` like this:

https://github.com/microsoft/onnxruntime/blob/173a11a4e7a2f7a360c9db6abbe601a06a16f004/onnxruntime/core/platform/windows/debug_alloc.cc#L115

The clobbered output does not match the predefined, ignored patterns, so
we see spurious mem leak check output.

This change updates the memleakdbg output generation to use C++ ostreams
instead of fixed-size buffers and `_snprintf_s`.

### Motivation and Context

Fix spurious mem leak check output.
Fix #24535.
### Description
Fix the DML autoep select test. It should only select one device, as
that's all the test infrastructure is set up to handle.



### Motivation and Context
@jywu-msft (Member):

/azp run onnxruntime-binary-size-checks-ci-pipeline

No pipelines are associated with this pull request.

@jywu-msft merged commit ef546e9 into rel-1.22.0 on Apr 30, 2025.
182 of 195 checks passed.
@jywu-msft deleted the vraspar/rel1.22/cherry_picks_round1 branch April 30, 2025 03:07.
jatinwadhwa921 pushed a commit to intel/onnxruntime that referenced this pull request Apr 30, 2025
### Description

Cherry pick the following into
[rel-1.22.0](https://github.com/microsoft/onnxruntime/tree/rel-1.22.0)


- (microsoft#24487)
- (microsoft#24466)
- (microsoft#24493)
- (microsoft#24484)
- (microsoft#24494)
- (microsoft#24489)
- (microsoft#24504)
- (microsoft#24510)
- (microsoft#24456)
- (microsoft#24537)
- (microsoft#24501)
- (microsoft#24519)
- (microsoft#24513)
- (microsoft#24539)
- (microsoft#24514)
- (microsoft#24542)
- (microsoft#24585)

Not added:

Planning to cherry-pick the CUDA MatMulNBits PRs once the fix for the
failing CUDA pipeline is ready:
- (microsoft#24491)
- (microsoft#24509)
- (microsoft#24564)

---------

Co-authored-by: vraspar <[email protected]>
Co-authored-by: Adrian Lizarraga <[email protected]>
Co-authored-by: minfhong-quic <[email protected]>
Co-authored-by: minfhong-quic <[email protected]>
Co-authored-by: Justin Chu <[email protected]>
Co-authored-by: Prathik Rao <[email protected]>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Ankan Banerjee <[email protected]>
Co-authored-by: Maximilian Müller <[email protected]>
Co-authored-by: Gaurav Garg <[email protected]>
Co-authored-by: iraut <[email protected]>
Co-authored-by: Hrishikesh Manohar <[email protected]>
Co-authored-by: Maximilian Müller <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
Co-authored-by: Jiajia Qin <[email protected]>
Co-authored-by: kunal-vaishnavi <[email protected]>
Co-authored-by: xhcao <[email protected]>