Skip to content

[Core] Expose a session option to disable NCHWc layout optimizations#27248

Open
hariharans29 wants to merge 9 commits intomainfrom
hari/intel_vs_amd_takeaway
Open

[Core] Expose a session option to disable NCHWc layout optimizations#27248
hariharans29 wants to merge 9 commits intomainfrom
hari/intel_vs_amd_takeaway

Conversation

@hariharans29
Copy link
Member

@hariharans29 hariharans29 commented Feb 5, 2026

Description

Sometimes, a model could have an "outlier" convolution operation with large kernel sizes (eg) 11x11 and a good amout of filters (eg) 64 filters. Since, NCHWc layout transformations are turned on by default, data layout for the Conv operation is changed to NCHWc (if other Conv node parameters fit the bill). The NCHWc Conv implementations in MLAS use a "direct" convolution implementation and large kernel sizes + a good amount of filters make the entire operation a heavily memory badwidth bound operation. With this background, it makes NCHWc data layouts for such models not a good choice for certain platforms whereas on some other platforms (possibly platforms with superior memory bandwidth characteristics and/or other memory characteristics like cache sizes), it still works well. The work-around today is for users to drop down to a lower graph optimization level that does not include the NCHWc transformer on platforms where perf is poor for such models. This seems like a bad precedence to set because it would mean users will miss out on other L3 and L4 optimizers.

Short term:

  1. Introduce a session option for users to disable NCHWc layout transformation
  2. Log a warning prompting users to consider disabling NCHWc if while ORT is performing the NCHWc layout transofmation, we encounter a Conv node that is an outlier in terms of the kernel size

Ultimately, the short term fix gives users the data to help them pick the right data layout for their target model on their target hardware to run it most optimally with ORT.

Longer term:
Add complementary convolution implementations in the NCHWc convolution suite in MLAS (like Im2Col + SGemm) to go with the direct convolution implementations and add heuristics to pick an implementation

More longer term:
Enable online benchmarking infrastructure to help pick the best algo (direct vs Im2Col + SGemm) for a given data layout and Conv parameters (filter sizes, dilations, strides, etc.) based on the data from warm-up runs and use that for algo selection for future runs

Motivation and Context

Takeaway from #26992 and possibly #23587 (although it needs a repro model to be sure)

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can commit the suggested changes from lintrunner.

hariharans29 and others added 4 commits February 4, 2026 17:30
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@hariharans29 hariharans29 changed the title Expose a session option to disable NCHWc layout optimizations [Core] Expose a session option to disable NCHWc layout optimizations Feb 5, 2026
@hariharans29 hariharans29 requested a review from Copilot February 5, 2026 01:44
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a session configuration option to allow users to disable NCHWc layout transformations, addressing performance issues observed on certain hardware platforms (particularly Intel CPUs) where large kernel convolutions with NCHWc layout perform poorly due to memory bandwidth bottlenecks.

Changes:

  • Added a new session configuration key kOrtSessionOptionsDisableNchwcLayoutTransformation to control NCHWc layout transformation
  • Implemented warning logging when Conv nodes with large kernel sizes (>=7x7) are encountered during NCHWc transformation
  • Updated the NCHWc transformer implementation to accept and use a logger for warnings

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h Adds the new session configuration constant for disabling NCHWc layout transformation
onnxruntime/core/optimizer/graph_transformer_utils.cc Checks the new session option and conditionally registers the NCHWc transformer
onnxruntime/core/optimizer/nchwc_transformer.cc Passes logger to transformer implementation and adds warning for large kernel Conv operations
onnxruntime/core/mlas/lib/snchwc.cpp Adds TODO comment documenting the need for alternative Conv implementations for large kernels

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can commit the suggested changes from lintrunner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant