[Core] Expose a session option to disable NCHWc layout optimizations #27248
Open
hariharans29 wants to merge 9 commits into main from hari/intel_vs_amd_takeaway
Conversation
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…soft/onnxruntime into hari/intel_vs_amd_takeaway
Pull request overview
This PR introduces a session configuration option to allow users to disable NCHWc layout transformations, addressing performance issues observed on certain hardware platforms (particularly Intel CPUs) where large kernel convolutions with NCHWc layout perform poorly due to memory bandwidth bottlenecks.
Changes:
- Added a new session configuration key kOrtSessionOptionsDisableNchwcLayoutTransformation to control NCHWc layout transformation (usage sketch below)
- Implemented warning logging when Conv nodes with large kernel sizes (>=7x7) are encountered during NCHWc transformation
- Updated the NCHWc transformer implementation to accept and use a logger for warnings
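For reference, a minimal usage sketch of the new option through the ONNX Runtime C++ API, assuming the constant is passed to AddConfigEntry and that a value of "1" disables the transformation (check onnxruntime_session_options_config_keys.h for the exact key string and accepted values):

```cpp
// Minimal sketch (not from the PR): disable the NCHWc layout transformation via
// the new session option. Assumes "1" disables it; "model.onnx" is a placeholder path.
#include "onnxruntime_cxx_api.h"
#include "onnxruntime_session_options_config_keys.h"

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "nchwc-demo");
  Ort::SessionOptions session_options;

  // Keep the full optimization pipeline enabled ...
  session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);

  // ... but opt out of the NCHWc layout transformation only.
  session_options.AddConfigEntry(kOrtSessionOptionsDisableNchwcLayoutTransformation, "1");

  Ort::Session session(env, "model.onnx", session_options);
  return 0;
}
```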
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h | Adds the new session configuration constant for disabling NCHWc layout transformation |
| onnxruntime/core/optimizer/graph_transformer_utils.cc | Checks the new session option and conditionally registers the NCHWc transformer |
| onnxruntime/core/optimizer/nchwc_transformer.cc | Passes logger to transformer implementation and adds warning for large kernel Conv operations |
| onnxruntime/core/mlas/lib/snchwc.cpp | Adds TODO comment documenting the need for alternative Conv implementations for large kernels |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…soft/onnxruntime into hari/intel_vs_amd_takeaway
Description
Sometimes, a model can have an "outlier" convolution operation with a large kernel size (e.g., 11x11) and a good number of filters (e.g., 64). Since NCHWc layout transformations are turned on by default, the data layout for the Conv operation is changed to NCHWc (if the other Conv node parameters fit the bill). The NCHWc Conv implementations in MLAS use a "direct" convolution algorithm, and large kernel sizes combined with a good number of filters make the entire operation heavily memory-bandwidth bound. With this background, NCHWc data layouts for such models are not a good choice on certain platforms, whereas on other platforms (possibly ones with superior memory bandwidth and/or other memory characteristics like cache sizes), they still work well. The work-around today is for users to drop down to a lower graph optimization level that does not include the NCHWc transformer on platforms where perf is poor for such models. This seems like a bad precedent to set because it means users miss out on other L3 and L4 optimizers.

Short term:
- Expose a session option to disable NCHWc layout transformations
- Log a warning when the NCHWc transformer encounters a Conv node that is an outlier in terms of the kernel size (see the illustrative sketch after this list)

Ultimately, the short term fix gives users the data to help them pick the right data layout for their target model on their target hardware to run it most optimally with ORT.
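For illustration only, a rough sketch of the kind of check such a warning could be based on (the helper name, the 7x7 threshold taken from the review summary above, and the message text are placeholders, not the PR's actual code in nchwc_transformer.cc):

```cpp
// Illustrative sketch only: warn when a Conv about to be converted to NCHWc has a
// "large" kernel (>= 7x7), since the direct NCHWc implementation can become
// memory-bandwidth bound there.
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

constexpr int64_t kLargeKernelThreshold = 7;  // assumed threshold from the review summary

void WarnIfLargeKernel(const std::vector<int64_t>& kernel_shape, const std::string& node_name) {
  if (kernel_shape.size() == 2 &&
      kernel_shape[0] >= kLargeKernelThreshold &&
      kernel_shape[1] >= kLargeKernelThreshold) {
    std::cerr << "Conv node '" << node_name << "' has a large kernel ("
              << kernel_shape[0] << "x" << kernel_shape[1]
              << "); the NCHWc direct convolution may be memory-bandwidth bound on some CPUs. "
              << "Consider disabling the NCHWc layout transformation via the session option.\n";
  }
}
```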
Longer term:
- Add complementary convolution implementations in the NCHWc convolution suite in MLAS (like Im2Col + SGemm) to go with the direct convolution implementations, and add heuristics to pick an implementation
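A hedged sketch of what such a heuristic could look like; the thresholds and the Im2Col + SGEMM path are hypothetical (not existing MLAS code) and would need per-architecture tuning:

```cpp
// Hypothetical heuristic sketch: choose between the direct NCHWc convolution and an
// Im2Col + SGEMM fallback based on kernel size and filter count. Thresholds are placeholders.
#include <cstdint>

enum class ConvAlgo { kDirect, kIm2ColSgemm };

ConvAlgo PickConvAlgo(int64_t kernel_h, int64_t kernel_w, int64_t output_channels) {
  // Large kernels with many filters make the direct algorithm re-read activations
  // many times, so an Im2Col + SGEMM path that trades extra memory for
  // GEMM-friendly reuse may win on bandwidth-constrained CPUs.
  const bool large_kernel = (kernel_h >= 7 && kernel_w >= 7);
  const bool many_filters = (output_channels >= 64);
  return (large_kernel && many_filters) ? ConvAlgo::kIm2ColSgemm : ConvAlgo::kDirect;
}
```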
Even longer term:
- Enable online benchmarking infrastructure to help pick the best algo (direct vs. Im2Col + SGemm) for a given data layout and Conv parameters (filter sizes, dilations, strides, etc.) based on data from warm-up runs, and use that for algo selection in future runs
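One possible shape for that selection logic, sketched under the assumption that both candidate algorithms can be timed on real inputs during warm-up and the winner cached per Conv configuration (all names here are illustrative):

```cpp
// Hypothetical online-selection sketch: time each candidate algorithm during warm-up
// runs and reuse the fastest one for later runs of the same Conv configuration.
#include <chrono>
#include <cstdint>
#include <functional>
#include <map>
#include <tuple>

enum class ConvAlgo { kDirect, kIm2ColSgemm };
// Key capturing the Conv parameters that matter for algo choice (simplified here).
using ConvKey = std::tuple<int64_t /*kernel_h*/, int64_t /*kernel_w*/, int64_t /*filters*/>;

class ConvAlgoSelector {
 public:
  ConvAlgo Select(const ConvKey& key,
                  const std::function<void()>& run_direct,
                  const std::function<void()>& run_im2col_sgemm) {
    auto it = cache_.find(key);
    if (it != cache_.end()) return it->second;  // already decided during warm-up

    const double t_direct = Time(run_direct);
    const double t_im2col = Time(run_im2col_sgemm);
    const ConvAlgo best = (t_direct <= t_im2col) ? ConvAlgo::kDirect : ConvAlgo::kIm2ColSgemm;
    cache_[key] = best;
    return best;
  }

 private:
  static double Time(const std::function<void()>& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
  }

  std::map<ConvKey, ConvAlgo> cache_;
};
```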
Motivation and Context
Takeaway from #26992 and possibly #23587 (although the latter needs a repro model to be sure)