Changes from 19 commits (23 commits in total)
d4943e4
Add an NHWC implementation of convolution to avoid …
orlmon01 Dec 19, 2025
1606a1c
Add a value for channels_last to bench_sconv.cpp
orlmon01 Dec 19, 2025
f80cc39
Merge branch 'microsoft:main' into main
orlmon01 Jan 7, 2026
6045333
Merge branch 'microsoft:main' into main
orlmon01 Jan 12, 2026
2dd199e
Update internal_testing_tests.cc
orlmon01 Jan 12, 2026
eb026d1
Merge branch 'microsoft:main' into main
orlmon01 Jan 14, 2026
4df9cea
Update nhwc_transformer_test.cc
orlmon01 Jan 14, 2026
b133782
Update internal_testing_tests.cc
orlmon01 Jan 14, 2026
0c2d1cd
Update ort_model_only_test.cc
orlmon01 Jan 14, 2026
25c0be7
Lintrunner fixes
orlmon01 Jan 14, 2026
bee0892
Merge branch 'microsoft:main' into main
orlmon01 Jan 15, 2026
a64af7c
Merge branch 'microsoft:main' into main
orlmon01 Jan 16, 2026
bc1ada6
Merge branch 'microsoft:main' into main
orlmon01 Jan 21, 2026
0482150
Update onnxruntime/core/optimizer/nhwc_transformer.cc
orlmon01 Jan 26, 2026
f9606cd
Update onnxruntime/core/framework/kernel_type_str_resolver.cc
orlmon01 Jan 26, 2026
63d9c55
Update onnxruntime/core/providers/cpu/nn/conv.cc
orlmon01 Jan 26, 2026
457513b
Update onnxruntime/contrib_ops/cpu/cpu_contrib_kernels.cc
orlmon01 Jan 26, 2026
b836bd3
Update onnxruntime/test/framework/ort_model_only_test.cc
orlmon01 Jan 26, 2026
d305b8f
Merge branch 'microsoft:main' into main
orlmon01 Jan 26, 2026
891dad5
Additional guards to not include KLEIDIAI specific kernels
orlmon01 Feb 4, 2026
7acbfcf
Merge branch 'microsoft:main' into main
orlmon01 Feb 4, 2026
878dff6
Merge branch 'microsoft:main' into main
orlmon01 Feb 6, 2026
0a04afc
Merge branch 'microsoft:main' into main
orlmon01 Feb 11, 2026
4 changes: 4 additions & 0 deletions onnxruntime/contrib_ops/cpu/cpu_contrib_kernels.cc
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,9 @@
class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, float, EmbedLayerNormalization);
class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, float, ExpandDims);
class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, float, FusedConv);
#ifdef USE_KLEIDIAI
class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, float, NhwcFusedConv);
#endif
class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, float, FusedGemm);
class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, float, GreedySearch);
class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, float, MultiHeadAttention);
Expand Down Expand Up @@ -290,7 +293,7 @@
}

Status RegisterCpuContribKernels(KernelRegistry& kernel_registry) {
static const BuildKernelCreateInfoFn function_table[] = {

Check failure on line 296 in onnxruntime/contrib_ops/cpu/cpu_contrib_kernels.cc
'function_table': const object must be initialized
(the same annotation is reported by build_x64_release_vitisai and 11 other CI jobs)
BuildKernelCreateInfo<void>, // default entry to avoid the list become empty after ops-reducing
BuildKernelCreateInfo<ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, float, SampleOp)>,

Expand All @@ -302,6 +305,7 @@
BuildKernelCreateInfo<ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, float, EmbedLayerNormalization)>,
BuildKernelCreateInfo<ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, float, ExpandDims)>,
BuildKernelCreateInfo<ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, float, FusedConv)>,
BuildKernelCreateInfo<ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, float, NhwcFusedConv)>,

Check failure on line 308 in onnxruntime/contrib_ops/cpu/cpu_contrib_kernels.cc
'kCpuExecutionProvider_NhwcFusedConv_kMSDomain_ver1_float': undeclared identifier
(the same annotation is reported by build_x64_release_vitisai and 11 other CI jobs)
Comment on lines 304 to 309
Copilot AI Jan 26, 2026
The registration of NhwcFusedConv kernel is unconditional in cpu_contrib_kernels.cc (line 308), but the kernel declaration is conditionally compiled with USE_KLEIDIAI guards in the same file (lines 21-23). This creates an inconsistency: when USE_KLEIDIAI is not defined, the declaration is absent but the registration still attempts to register the kernel, which will likely cause a compilation error. The registration on line 308 should also be wrapped with #ifdef USE_KLEIDIAI guards to match the conditional declaration.

Suggested change
BuildKernelCreateInfo<ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, float, FusedConv)>,
BuildKernelCreateInfo<ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, float, NhwcFusedConv)>,
BuildKernelCreateInfo<ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, float, FusedConv)>,
#ifdef USE_KLEIDIAI
BuildKernelCreateInfo<ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, float, NhwcFusedConv)>,
#endif

BuildKernelCreateInfo<ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, float, FusedGemm)>,
BuildKernelCreateInfo<ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, float, GreedySearch)>,
BuildKernelCreateInfo<ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, float, MultiHeadAttention)>,
Expand Down Expand Up @@ -389,8 +393,8 @@

};

for (auto& function_table_entry : function_table) {

Check failure on line 396 in onnxruntime/contrib_ops/cpu/cpu_contrib_kernels.cc
syntax error: missing ';' before ')'
syntax error: missing ';' before ':'
'function_table_entry': a symbol whose type contains 'auto' must have an initializer
'function_table_entry': references must be initialized
'const onnxruntime::BuildKernelCreateInfoFn
(the same five annotations are reported by build_x64_release_vitisai and 11 other CI jobs)
KernelCreateInfo info = function_table_entry();

Check failure on line 397 in onnxruntime/contrib_ops/cpu/cpu_contrib_kernels.cc
term does not evaluate to a function taking 0 arguments
(the same annotation is reported by build_x64_release_vitisai and 11 other CI jobs)
if (info.kernel_def != nullptr) { // filter disabled entries where type is void
ORT_RETURN_IF_ERROR(kernel_registry.Register(std::move(info)));
}
Expand Down
8 changes: 8 additions & 0 deletions onnxruntime/contrib_ops/cpu/fused_conv.cc
Expand Up @@ -26,5 +26,13 @@ ONNX_CPU_OPERATOR_TYPED_MS_KERNEL(
.TypeConstraint("T", DataTypeImpl::GetTensorType<float>()),
FusedConvFloat);

ONNX_CPU_OPERATOR_TYPED_MS_KERNEL(
NhwcFusedConv,
1,
float,
KernelDefBuilder()
Copilot AI Jan 26, 2026
The NhwcFusedConv kernel registration is missing the MayInplace hint that is present in the FusedConv registration. The FusedConv kernel uses .MayInplace(3, 0) to allow the optional "sum" input (index 3) to be reused as the output buffer (index 0) for efficiency. However, NhwcFusedConv does not include this hint. This means that even though the code in conv.cc handles the Sum input for channels_last mode, the allocation planner cannot optimize memory usage by reusing the Sum buffer for the output when using NhwcFusedConv. Consider adding .MayInplace(3, 0) to the NhwcFusedConv kernel builder to maintain consistency and enable the same memory optimization.

Suggested change
KernelDefBuilder()
KernelDefBuilder()
// Allow the optional "sum" input (index 3) to be reused as the output buffer (index 0),
// consistent with the FusedConv kernel registration.
.MayInplace(3, 0)

.TypeConstraint("T", DataTypeImpl::GetTensorType<float>()),
FusedConvFloat);
Comment on lines +29 to +35
Copilot AI Jan 26, 2026
The NhwcFusedConv kernel registration (lines 29-35) is not conditionally compiled with USE_KLEIDIAI guards, but the PR description states this is a "KleidiAi specific implementation" that is "only used with KleidiAi (for now)". This is inconsistent with the conditional registration approach used in cpu_contrib_kernels.cc where the declaration is guarded by #ifdef USE_KLEIDIAI. Either this registration should also be conditionally compiled with USE_KLEIDIAI, or if the kernel is meant to work without KleidiAI (using the fallback path), the guards in cpu_contrib_kernels.cc should be removed.


} // namespace contrib
} // namespace onnxruntime
12 changes: 12 additions & 0 deletions onnxruntime/core/framework/kernel_type_str_resolver.cc
Expand Up @@ -36,6 +36,18 @@ static OpKernelTypeStrMap::const_iterator LookUpOpId(const OpIdentifier& op_id,
}
}

#ifdef USE_KLEIDIAI
// KleidiAI-specific fallback for NhwcFusedConv kernel type-string resolution
if (op_it == map.end() && op_id.domain == kMSDomain && op_id.op_type == "NhwcFusedConv") {
const auto fused_conv_op_id = OpIdentifier{std::string{kMSDomain}, "FusedConv", op_id.since_version};
op_it = map.find(fused_conv_op_id);
if (op_it == map.end()) {
const auto conv_op_id = OpIdentifier{std::string{kOnnxDomain}, "Conv", op_id.since_version};
op_it = map.find(conv_op_id);
}
}
#endif

return op_it;
}

Expand Down
2 changes: 1 addition & 1 deletion onnxruntime/core/graph/contrib_ops/nhwc_schema_defs.cc
Expand Up @@ -403,7 +403,7 @@ Only has fp16 implementation as of 2023/04/15.
.Input(2, "B", "", "T", OpSchema::Optional)
.Input(3, "Z", "Tensor to be added to the output, must be the same shape and format as the output tensor.", "T", OpSchema::Optional)
.Output(0, "Y", "", "T")
.TypeConstraint("T", {"tensor(float16)"}, "Constrain input and output types to float tensors")
.TypeConstraint("T", {"tensor(float16)", "tensor(float)"}, "Constrain input and output types to float tensors")
.TypeAndShapeInferenceFunction([](InferenceContext& ctx) {
ONNX_NAMESPACE::propagateElemTypeFromInputToOutput(ctx, 0, 0);
convPoolShapeInferenceNhwc(ctx, true, false, 0, 1);
Expand Down
2 changes: 2 additions & 0 deletions onnxruntime/core/mlas/inc/mlas.h
Expand Up @@ -851,6 +851,7 @@ struct MLAS_CONV_PARAMETERS {
size_t BatchCount;
size_t GroupCount;
size_t InputChannels;
bool ChannelsLast;
size_t InputShape[3];
size_t KernelShape[3];
size_t DilationShape[3];
Expand Down Expand Up @@ -890,6 +891,7 @@ MlasConvPrepare(MLAS_CONV_PARAMETERS* Parameters,
size_t FilterCount,
const MLAS_ACTIVATION* Activation,
size_t* WorkingBufferSize,
bool ChannelsLast,
float Beta,
MLAS_THREADPOOL* ThreadPool);

4 changes: 3 additions & 1 deletion onnxruntime/core/mlas/lib/convolve.cpp
@@ -1262,6 +1262,7 @@ MlasConvPrepare(
size_t FilterCount,
const MLAS_ACTIVATION* Activation,
size_t* WorkingBufferSize,
bool ChannelsLast,
float Beta,
MLAS_THREADPOOL* ThreadPool
)
@@ -1320,7 +1321,7 @@ Return Value:
if (GetMlasPlatform().MlasConvPrepareOverride != nullptr &&
GetMlasPlatform().MlasConvPrepareOverride(Parameters, Dimensions, BatchCount, GroupCount, InputChannels,
InputShape,KernelShape,DilationShape, Padding, StrideShape, OutputShape, FilterCount,
Activation, WorkingBufferSize, Beta, ThreadPool)){
Activation, WorkingBufferSize, ChannelsLast, Beta, ThreadPool)){
return;
}
//
@@ -1331,6 +1332,7 @@ Return Value:
Parameters->BatchCount = BatchCount;
Parameters->GroupCount = GroupCount;
Parameters->InputChannels = InputChannels;
Parameters->ChannelsLast = ChannelsLast;
Parameters->FilterCount = FilterCount;
Parameters->Beta = Beta;

31 changes: 19 additions & 12 deletions onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp
@@ -452,6 +452,7 @@ static std::shared_ptr<const void*[]> LhsPtrFill(const size_t ci, const size_t i
static std::unique_ptr<std::byte[]> LhsPackImageDataSme(const size_t ci, const size_t ih, const size_t iw,
const size_t kh, const size_t kw, const size_t sh,
const size_t sw, const size_t padding, const float* in,
bool input_is_channels_last,
MLAS_THREADPOOL* ThreadPool)
{
size_t padsize = 256;
@@ -476,7 +477,14 @@ static std::unique_ptr<std::byte[]> LhsPackImageDataSme(const size_t ci, const s
const auto lhs_size = kai_get_lhs_packed_size_lhs_imatmul_pack_x32p2vlx1_x32p_sme(m,kh*kw,ci);
auto lhs = std::make_unique<std::byte[]>(lhs_size);

auto nhwc = NChwToNhwc(1, ci, ih, iw, in, 1, 1, false, ThreadPool);
std::unique_ptr<float[]> nhwc_holder;
const float* activation_src = nullptr;
if (input_is_channels_last) {
activation_src = in;
} else {
nhwc_holder = NChwToNhwc(1, ci, ih, iw, in, 1, 1, false, ThreadPool);
activation_src = nhwc_holder.get();
}

// Cache of computed lhs ptr offsets. thread_local to prevent interference from parallel sessions.
thread_local std::unordered_map<LhsCacheKey, std::shared_ptr<const void*[]>> lhs_ptrs_cache;
Expand All @@ -489,7 +497,7 @@ static std::unique_ptr<std::byte[]> LhsPackImageDataSme(const size_t ci, const s
lhs_ptrs_cache[key] = lhs_ptrs;
}

MultiThreadedLHSPackSme(ThreadPool, ci, m, kh, kw, &lhs_ptrs[0], &lhs[0], &nhwc[0], &pad_ptr[0]);
MultiThreadedLHSPackSme(ThreadPool, ci, m, kh, kw, &lhs_ptrs[0], &lhs[0], activation_src, &pad_ptr[0]);

return lhs;
}
@@ -511,6 +519,7 @@ static void ConvolveSme(const size_t co, //channels out
const float* in, //in image data
float* out, //out image data
float* tmp_mlas_aligned, //intermediate buffer if we need to perform a transpose
bool input_is_channels_last,
MLAS_THREADPOOL* ThreadPool) {

//RhsPackWeightsBiasSme() - to perform dilation increases kernel size and masks unused weights
@@ -550,17 +559,13 @@

for (size_t g = 0; g < groups; ++g) {

auto result{out};
//do we require a post matmul transpose ?
//output is m x n or image_data x co or hw x co
//MLAS require it as n x m (or co x hw), transpose required
if (co > 1) {
//intermediate buffer required, pre-transpose
//Note: because we are calling MlasTranspose() need to ensure we use a MLAS aligned buffer
auto result = out;
const bool need_transpose = (!input_is_channels_last) && (co > 1);
if (need_transpose) {
result = tmp_mlas_aligned;
}

auto lhs = LhsPackImageDataSme(ci, ih, iw, d_kh, d_kw, sh, sw, padding, in, ThreadPool);
auto lhs = LhsPackImageDataSme(ci, ih, iw, d_kh, d_kw, sh, sw, padding, in, input_is_channels_last, ThreadPool);
auto rhs = RhsPackWeightsBiasSme(co, ci, kh, kw, dilationh, dilationw, weights, bias, ThreadPool);

MlasTrySimpleParallel(ThreadPool, static_cast<ptrdiff_t>(dim[0] * dim[1] * dim[2]), [&](ptrdiff_t tid) {
@@ -626,7 +631,7 @@ static void ConvolveSme(const size_t co, //channels out
}
});

if (result == tmp_mlas_aligned) {
if (need_transpose) {
//Note: this could be absorbed into post conv activation
MlasTranspose(tmp_mlas_aligned, out, m, co, ThreadPool);
}
@@ -655,6 +660,7 @@ ArmKleidiAI::MlasConvPrepare(MLAS_CONV_PARAMETERS* Parameters,
size_t FilterCount,
const MLAS_ACTIVATION* Activation,
size_t* WorkingBufferSize,
bool ChannelsLast,
float Beta,
MLAS_THREADPOOL* ThreadPool)
{
@@ -668,6 +674,7 @@ ArmKleidiAI::MlasConvPrepare(MLAS_CONV_PARAMETERS* Parameters,
Parameters->BatchCount = BatchCount;
Parameters->GroupCount = GroupCount;
Parameters->InputChannels = InputChannels;
Parameters->ChannelsLast = ChannelsLast;
Parameters->FilterCount = FilterCount;
Parameters->Beta = Beta;

@@ -733,7 +740,7 @@
Parameters->DilationShape[0], Parameters->DilationShape[1], // kernel dilation
Parameters->Padding[0], // image padding
Parameters->GroupCount, // filter groups
Filter, Bias, Input, Output, WorkingBuffer, ThreadPool);
Filter, Bias, Input, Output, WorkingBuffer, Parameters->ChannelsLast, ThreadPool);

MlasActivation(Parameters->Activation, Output, nullptr, Parameters->FilterCount, Parameters->OutputSize,
Parameters->OutputSize);
1 change: 1 addition & 0 deletions onnxruntime/core/mlas/lib/kleidiai/mlasi_kleidiai.h
@@ -149,6 +149,7 @@ MlasConvPrepare(MLAS_CONV_PARAMETERS* Parameters,
size_t FilterCount,
const MLAS_ACTIVATION* Activation,
size_t* WorkingBufferSize,
bool ChannelsLast,
float Beta,
MLAS_THREADPOOL* ThreadPool);

2 changes: 2 additions & 0 deletions onnxruntime/core/mlas/lib/mlasi.h
@@ -827,6 +827,7 @@ void
size_t FilterCount,
const MLAS_ACTIVATION* Activation,
size_t* WorkingBufferSize,
bool ChannelsLast,
float Beta,
MLAS_THREADPOOL* ThreadPool
);
@@ -847,6 +848,7 @@ bool
size_t FilterCount,
const MLAS_ACTIVATION* Activation,
size_t* WorkingBufferSize,
bool ChannelsLast,
float Beta,
MLAS_THREADPOOL* ThreadPool
);
5 changes: 4 additions & 1 deletion onnxruntime/core/optimizer/conv_activation_fusion.cc
@@ -140,9 +140,12 @@ class FuseConvActivationAction : public ReplaceWithNew {
return "FusedConv";
}
} else if (domain == kMSDomain) {
if (op_type == "NhwcConv") {
if (op_type == "NhwcConv" || op_type == "NhwcFusedConv") {
return "NhwcFusedConv";
}
if (op_type == "FusedConv") {
return "FusedConv";
}
} else if (domain == kMSInternalNHWCDomain) {
if (op_type == "Conv") {
return "Conv";
10 changes: 9 additions & 1 deletion onnxruntime/core/optimizer/conv_add_act_fusion.cc
@@ -211,7 +211,15 @@ class FuseConvAddActivationAction : public ReplaceWithNew {

private:
std::string OpType(const RuntimeState& runtimeState) const override {
return (runtimeState.selected_nodes.Target().OpType() == "Conv") ? "FusedConv" : "NhwcFusedConv";
const auto& target = runtimeState.selected_nodes.Target();
const auto* channels_last_attr = graph_utils::GetNodeAttribute(target, "channels_last");
const bool channels_last = channels_last_attr != nullptr && channels_last_attr->i() != 0;

if (target.OpType() == "Conv") {
return channels_last ? "NhwcFusedConv" : "FusedConv";
}

return "NhwcFusedConv";
Comment on lines +217 to +222

Copilot AI Jan 26, 2026


The OpType method has incomplete logic. When the target node is not "Conv" (line 218), it unconditionally returns "NhwcFusedConv" (line 222). However, if the target is already "FusedConv" without channels_last attribute, it should remain as "FusedConv", not become "NhwcFusedConv". The correct logic should check if channels_last is set for all node types, not just for "Conv". Consider changing line 218 to check for both "Conv" and "FusedConv", or restructure to check channels_last first regardless of node type.

Suggested change
if (target.OpType() == "Conv") {
return channels_last ? "NhwcFusedConv" : "FusedConv";
}
return "NhwcFusedConv";
const std::string& op_type = target.OpType();
// If channels_last is set, use NHWC fused convolution regardless of original op type.
if (channels_last) {
return "NhwcFusedConv";
}
// Without channels_last, convert Conv to FusedConv, and leave other op types unchanged.
if (op_type == "Conv") {
return "FusedConv";
}
return op_type;

}

std::string Domain(const RuntimeState&) const override { return kMSDomain; }
@@ -68,6 +68,7 @@ const std::unordered_set<std::string_view>& GetORTLayoutSensitiveOps() {
// Define a static local string array so we can refer to the elements with string_views.
static const std::string layout_sensitive_contrib_ops[]{
MakeORTLayoutSensitiveOpId(kMSDomain, "FusedConv"),
MakeORTLayoutSensitiveOpId(kMSDomain, "NhwcFusedConv"),
MakeORTLayoutSensitiveOpId(kMSDomain, "GridSample"),
MakeORTLayoutSensitiveOpId(kMSDomain, "QLinearAveragePool"),
MakeORTLayoutSensitiveOpId(kMSDomain, "QLinearGlobalAveragePool"),