DML EP shape mismatch failure in InferAndVerifyOutputSizes if torch export dynamo=True #26826

@LigaLiao

Creating the inference session with DmlExecutionProvider fails during initialization with the following output:

2025-12-11 23:11:51.7008023 [E:onnxruntime:, inference_session.cc:2545 onnxruntime::InferenceSession::Initialize::<lambda_73d8de3ce9bc7d47058d99ebffb3c8e5>::operator ()] Exception during initialization: E:_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2853)\onnxruntime_pybind11_state.pyd!00007FF9C3BBA59D: (caller: 00007FF9C3BDAA58) Exception(1) tid(c4e4) 80070057 ?
*************** EP Error ***************
EP Error 'utf-8' codec can't decode byte 0xb2 in position 286: invalid start byte when using ['DmlExecutionProvider']
Falling back to ['CPUExecutionProvider'] and retrying.
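For readers trying to interpret the log, here is a minimal sketch that decodes the two opaque parts of it: the HRESULT 80070057 and the 'utf-8' codec failure on byte 0xb2. The HRESULT bit layout is standard. The second half rests on an assumption that is not confirmed by the log: that the unreadable "?" after the HRESULT is a localized Windows message (for example the zh-CN text for "The parameter is incorrect.") encoded in the system code page (GBK) and then decoded as UTF-8 by the Python binding. The helper names are hypothetical.

```python
def split_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into its severity bit, facility, and error code."""
    return {
        "failure": bool((hr >> 31) & 1),   # bit 31: 1 means failure
        "facility": (hr >> 16) & 0x7FF,    # bits 16-26: 7 is FACILITY_WIN32
        "code": hr & 0xFFFF,               # bits 0-15: the underlying error code
    }


def try_decode(data: bytes) -> str:
    """Decode as UTF-8, or describe where and why decoding failed."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"{exc.reason} at position {exc.start}: byte {data[exc.start]:#04x}"


# 0x80070057 is E_INVALIDARG: facility 7 (Win32), code 87 (ERROR_INVALID_PARAMETER).
parts = split_hresult(0x80070057)
print(parts)  # {'failure': True, 'facility': 7, 'code': 87}

# ASSUMPTION: the message text and the code page below are guesses, not taken from the log.
localized = "参数错误。"          # zh-CN text for "The parameter is incorrect."
raw = localized.encode("gbk")   # bytes as a GBK system code page would produce them

print(try_decode(raw))          # invalid start byte at position 0: byte 0xb2
print(raw.decode("gbk"))        # decoding with the matching code page recovers the text
```

If that assumption holds, the UnicodeDecodeError is a secondary symptom of reporting the message, and the underlying failure is the E_INVALIDARG raised in MLOperatorAuthorImpl.cpp.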

Steps to reproduce:

1. Clone the repository:
https://github.com/ladaapp/lada

2. Export the model with lada/scripts/convert_restoration_to_onnx2.py.
Use the exporter from https://github.com/nusu-github/deform_conv2d_onnx_exporter. Do not install it with pip install deform_conv2d_onnx_exporter.

cmd: lada>python scripts/convert_restoration_to_onnx2.py

3. Run the test script lada/model_weights/test2.py:

cmd: lada\model_weights>python test2.py
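The contents of test2.py are not shown in this report. As a rough sketch only, a session that requests DirectML first and lets the CPU provider act as the fallback could be created as below. The function names and the model path in the usage note are hypothetical. The two session options follow the restrictions documented for the DirectML execution provider (no memory pattern optimization, sequential execution).

```python
from typing import Iterable, List, Sequence

# Order matters: ONNX Runtime tries providers in the order given.
PREFERRED: Sequence[str] = ("DmlExecutionProvider", "CPUExecutionProvider")


def pick_providers(available: Iterable[str],
                   preferred: Sequence[str] = PREFERRED) -> List[str]:
    """Keep the preferred providers that this build actually offers, in order."""
    avail = set(available)
    return [p for p in preferred if p in avail]


def make_session(model_path: str):
    """Create an InferenceSession and print requested vs. active providers."""
    import onnxruntime as ort  # imported lazily so the helper above has no dependency

    opts = ort.SessionOptions()
    opts.enable_mem_pattern = False                          # required by the DML EP
    opts.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL   # required by the DML EP

    providers = pick_providers(ort.get_available_providers())
    sess = ort.InferenceSession(model_path, sess_options=opts, providers=providers)

    # If DML initialization failed and a fallback occurred, the active list
    # differs from the requested one.
    print("requested:", providers)
    print("active:   ", sess.get_providers())
    return sess
```

Calling make_session("restoration.onnx") (a placeholder path) and comparing the two printed lists shows whether the session is actually running on DirectML or has fallen back to the CPU provider.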

Output of collect_env:

PyTorch version: 2.9.1+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Pro (10.0.26100 64-bit)
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.30.8
Libc version: N/A

Python version: 3.12.12 | packaged by Anaconda, Inc. | (main, Oct 21 2025, 20:05:38) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-11-10.0.26100-SP0
Is CUDA available: False
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3080
Nvidia driver version: 581.80
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Name: AMD Ryzen 9 9950X 16-Core Processor
Manufacturer: AuthenticAMD
Family: 107
Architecture: 9
ProcessorType: 3
DeviceID: CPU0
CurrentClockSpeed: 4300
MaxClockSpeed: 4300
L2CacheSize: 16384
L2CacheSpeed: None
Revision: 17408

Versions of relevant libraries:
[pip3] deform-conv2d-onnx-exporter==1.2.0
[pip3] numpy==2.3.4
[pip3] onnx==1.20.0
[pip3] onnx-ir==0.1.12
[pip3] onnxruntime-directml==1.23.0
[pip3] onnxscript==0.5.6
[pip3] onnxsim==0.4.36
[pip3] onnxslim==0.1.78
[pip3] torch==2.9.1
[pip3] torchvision==0.24.1
[conda] numpy 2.3.4 pypi_0 pypi
[conda] torch 2.9.1 pypi_0 pypi
[conda] torchvision 0.24.1 pypi_0 pypi

Export report: onnx_export_2025-12-17_18-11-38-222918_success.md

ONNX model: https://mega.nz/file/0RlREAgY#6vBIxOvaZ-MRuRvGbz_9gi-R0tKMWjBuZlRUICTQMUs

Labels: ep:DML (issues related to the DirectML execution provider), ep:Xnnpack (issues related to XNNPACK EP), stale (issues that have not been addressed in a while; categorized by a bot)
