Fix for AveragePool performance regression by guarding ceil_mode clamp by tengli-alaska · Pull Request #27331 · microsoft/onnxruntime

tengli-alaska · 2026-02-12T19:25:25Z

Description

Guards the std::min bounds clamping in AveragePool1DTask, AveragePool2DTask, and AveragePool3DTask behind a ceil_mode check to restore performance for the default ceil_mode=0 path.

Root Cause

PR #16752 added std::min clamping on hend/wend/dend inside the hot pooling loops to fix correctness when ceil_mode=1. However, these clamps run unconditionally, including when ceil_mode=0 (the default for the vast majority of models). This introduces a data-dependent loop bound that can prevent compiler auto-vectorization of the inner accumulation loop, causing a +17% to +28% kernel slowdown as reported in #27190.

Fix

When ceil_mode=0, the output dimensions are computed with floor division, which guarantees that hstart + kernel_size <= height + pad — the clamp is a no-op. This PR guards each clamp behind if (ceil_mode), restoring the original fast path while preserving the correctness fix for ceil_mode=1.

Changes

pool_functors.h: Added int64_t ceil_mode field to AveragePool1DTask, AveragePool2DTask, AveragePool3DTask and wrapped each std::min clamp in if (ceil_mode)
pool.cc: Pass pool_attrs_.ceil_mode to the three AveragePool task initializers

Benchmark Results

Regression confirmed on AMD EPYC 7543 (x86_64, Linux) using ORT's built-in profiling on averagepool_pool_11_18_average_pool_3d_valid_input:

Version	Kernel Time (ms)	Change
1.20.0	0.1655	baseline
1.21.0	0.1710	+3.3%

The original issue observed +17% to +28% regression. Unable to build ORT from source locally to benchmark the patched version, would appreciate if @junghyunpark2001 could validate the fix on their setup.

Notes

Zero-cost for ceil_mode=0 (default path)
ceil_mode=1 correctness fix from Align AvgPool ceil_mode on last value to torch #16752 is fully preserved
Existing AveragePool tests cover both ceil_mode=0 and ceil_mode=1

tengli-alaska · 2026-02-12T19:36:40Z

@microsoft-github-policy-service agree

Copilot

Pull request overview

This PR addresses a performance regression in the AveragePool operator introduced in PR #16752. The regression (17-28% kernel slowdown) was caused by unconditional std::min clamping operations in the hot pooling loops, which prevented compiler auto-vectorization. The fix guards these clamps behind ceil_mode checks, restoring performance for the default ceil_mode=0 path while preserving correctness for ceil_mode=1.

Changes:

Added ceil_mode field to AveragePool1DTask, AveragePool2DTask, and AveragePool3DTask structs
Conditionally guards all std::min clamps with if (ceil_mode) checks
Updated task initializers in pool.cc to pass pool_attrs_.ceil_mode

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
onnxruntime/core/providers/cpu/nn/pool_functors.h	Added ceil_mode field and conditional clamp guards for 1D, 2D, and 3D average pooling tasks
onnxruntime/core/providers/cpu/nn/pool.cc	Updated task initializations to pass ceil_mode parameter
.vscode/settings.json	Unrelated formatting changes and addition of C_Cpp_Runner.msvcBatchPath configuration

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

.vscode/settings.json

titaiwangms · 2026-02-12T21:33:45Z

/azp run Linux QNN CI Pipeline
/azp run Win_TRT_Minimal_CUDA_Test_CI
/azp run Windows ARM64 QNN CI Pipeline
/azp run Windows GPU Doc Gen CI Pipeline

azure-pipelines · 2026-02-12T21:33:52Z

No pipelines are associated with this pull request.

titaiwangms · 2026-02-12T21:34:26Z

/azp run Windows GPU Doc Gen CI Pipeline

azure-pipelines · 2026-02-12T21:34:37Z

Azure Pipelines successfully started running 1 pipeline(s).

titaiwangms · 2026-02-12T21:36:02Z

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline

azure-pipelines · 2026-02-12T21:36:19Z

Azure Pipelines successfully started running 3 pipeline(s).

titaiwangms

I also only discovered non-significant changes.

{
  "model": "/home/titaiwang/onnxruntime/profiling_avgpool/averagepool_pool_11_18_average_pool_3d_valid_input/averagepool_pool_11_18_average_pool_3d_valid_input.onnx",
  "operator": "AveragePool",
  "old_version": "1.20.0",
  "new_version": "1.21.0",
  "summary": {
    "kernel_old_ms": 0.191,
    "kernel_new_ms": 0.198,
    "kernel_change_ms": 0.007000000000000006,
    "kernel_change_pct": 3.6649214659685896,
    "total_old_ms": 0.201,
    "total_new_ms": 0.204,
    "is_regression": false
  }
}

.vscode/settings.json

tengli-alaska added 2 commits February 12, 2026 10:38

Guard AveragePool ceil_mode clamp to fix perf regression microsoft#27190

10a383b

Removed unrelated vscode settings change

2e66c83

tengli-alaska mentioned this pull request Feb 12, 2026

[Performance] Performance regression in AveragePool operator #27190

Open

titaiwangms requested review from Copilot and titaiwangms February 12, 2026 19:42

Copilot started reviewing on behalf of titaiwangms February 12, 2026 19:43 View session

Copilot AI reviewed Feb 12, 2026

View reviewed changes

.vscode/settings.json Outdated Show resolved Hide resolved

Removed unrelated vscode settings

07a9512

titaiwangms reviewed Feb 12, 2026

View reviewed changes

.vscode/settings.json Show resolved Hide resolved

Restore .vscode/settings.json

401f384

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix for AveragePool performance regression by guarding ceil_mode clamp#27331

Fix for AveragePool performance regression by guarding ceil_mode clamp#27331
tengli-alaska wants to merge 4 commits intomicrosoft:mainfrom
tengli-alaska:fix/averagepool-regression

tengli-alaska commented Feb 12, 2026

Uh oh!

tengli-alaska commented Feb 12, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

titaiwangms commented Feb 12, 2026

Uh oh!

azure-pipelines bot commented Feb 12, 2026

Uh oh!

titaiwangms commented Feb 12, 2026

Uh oh!

azure-pipelines bot commented Feb 12, 2026

Uh oh!

titaiwangms commented Feb 12, 2026

Uh oh!

azure-pipelines bot commented Feb 12, 2026

Uh oh!

titaiwangms left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tengli-alaska commented Feb 12, 2026

Description

Root Cause

Fix

Changes

Benchmark Results

Notes

Uh oh!

tengli-alaska commented Feb 12, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

titaiwangms commented Feb 12, 2026

Uh oh!

azure-pipelines bot commented Feb 12, 2026

Uh oh!

titaiwangms commented Feb 12, 2026

Uh oh!

azure-pipelines bot commented Feb 12, 2026

Uh oh!

titaiwangms commented Feb 12, 2026

Uh oh!

azure-pipelines bot commented Feb 12, 2026

Uh oh!

titaiwangms left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants