[Performance] ORT-WebGPU Average Pooling is working too long in edge case #23614

Open
@grazder

Description

Describe the issue

I'm using ORT with WebGPU for inference. Profiling showed that four AveragePool ops account for almost half of the model's inference time.

[profiler screenshot showing AveragePool cost]

Here is an example of the operation:

[screenshot of the AveragePool node]

AvgPool inference time scales linearly with kernel size:

Shape (element count) // Time

  • (20, 32) = 640 // 0.2 ms
  • (40, 64) = 2560 (x4) // 0.8 ms (x4)
  • (80, 128) = 10240 (x16) // 3.2 ms (x16)
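The timings track the element count exactly, which is consistent with a cost proportional to kernel area. A quick check of the arithmetic (values copied from the list above; purely illustrative, not ORT code):

```python
# Shapes and timings reported above; verify both grow by the same factor.
shapes = [(20, 32), (40, 64), (80, 128)]
times_ms = [0.2, 0.8, 3.2]

base_elems = shapes[0][0] * shapes[0][1]   # 640
base_time = times_ms[0]                    # 0.2 ms
for (h, w), t in zip(shapes, times_ms):
    elems = h * w
    # Element count and time scale together: x1, x4, x16.
    print(f"({h}, {w}) = {elems} (x{elems // base_elems}) // {t}ms (x{t / base_time:.0f})")
```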

The problem is that this is an edge case: when the kernel covers the full spatial extent, AveragePool is literally ReduceMean over dims (2, 3). I replaced the operation and got a giant speed boost.

[profiler screenshot after the change]

I think this edge case (AvgPool == ReduceMean when the kernel covers the full spatial dims) deserves a special handler.
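The equivalence is easy to check numerically. A minimal pure-Python sketch (NCHW layout assumed; `avg_pool_full` models AveragePool with kernel_shape equal to the full spatial extent, no padding, so there is exactly one output window per channel):

```python
def avg_pool_full(x):
    """AveragePool with kernel_shape == (H, W): one window per channel."""
    return [[sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
             for ch in n] for n in x]

def reduce_mean_23(x):
    """ReduceMean over dims (2, 3): mean of all spatial values per channel."""
    return [[sum(v for row in ch for v in row) / (len(ch) * len(ch[0]))
             for ch in n] for n in x]

# Toy input, N=1, C=2, H=2, W=2.
x = [[[[1.0, 2.0], [3.0, 4.0]],
      [[5.0, 6.0], [7.0, 8.0]]]]
print(avg_pool_full(x))    # [[2.5, 6.5]]
print(reduce_mean_23(x))   # [[2.5, 6.5]]
```

Both reductions average the same H*W values per channel, so the outputs are identical up to rounding; the difference is purely in how the WebGPU kernels are dispatched.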

To reproduce

Just create an AveragePool node with a large kernel_shape (e.g. one covering the full spatial dimensions).

Urgency

Not urgent; a workaround was found.

Platform

Mac

OS Version

15.2

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.20.1

ONNX Runtime API

JavaScript

Architecture

ARM64

Execution Provider

Other / Unknown

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

No

Metadata

Assignees

No one assigned

    Labels

    • api:Javascript (issues related to the Javascript API)
    • ep:WebGPU (ort-web webgpu provider)
    • performance (issues related to performance regressions)
    • platform:web (issues related to ONNX Runtime web; typically submitted using template)
    • stale (issues that have not been addressed in a while; categorized by a bot)
