Describe the issue
I'm using ORT with WebGPU for inference. After profiling I found that four AveragePool operations account for almost half of the model's total inference time.
Here is an example of the behavior: AvgPool inference time depends linearly on the kernel size.
| Kernel shape | Elements | Time |
| --- | --- | --- |
| (20, 32) | 640 | 0.2 ms |
| (40, 64) | 2560 (x4) | 0.8 ms (x4) |
| (80, 128) | 10240 (x16) | 3.2 ms (x16) |
The problem is an edge case: the kernel covers the whole spatial extent, so the AveragePool is literally a ReduceMean over dims=(2,3). I replaced the operation and got a giant speed boost.
Since AvgPool == ReduceMean in this situation, a special handler could be added for it; the graph rewrite I used is sketched below.
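For reference, this is roughly the rewrite I applied (a minimal sketch with the Python `onnx` package; file names are hypothetical, and it assumes an opset <= 17, where ReduceMean takes `axes` as an attribute rather than an input):

```python
import onnx
from onnx import helper

# Hypothetical file names; adjust to the real model.
model = onnx.load("model.onnx")
graph = model.graph

for i, node in enumerate(list(graph.node)):
    if node.op_type == "AveragePool":
        # Only valid when kernel_shape equals the input's spatial size
        # (no padding, default strides), i.e. the pool is a global average.
        replacement = helper.make_node(
            "ReduceMean",
            inputs=list(node.input),
            outputs=list(node.output),
            name=(node.name or f"avgpool_{i}") + "_as_reducemean",
            axes=[2, 3],  # spatial dims of an NCHW tensor
            keepdims=1,   # keep the 1x1 spatial dims, matching AvgPool's output
        )
        graph.node.remove(node)
        graph.node.insert(i, replacement)

onnx.save(model, "model_reducemean.onnx")
```

The rewrite is shape-preserving because `keepdims=1` leaves the 1x1 spatial dims in place, so downstream nodes are unaffected.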
To reproduce
Create an AveragePool node with a large kernel_shape; a minimal model is sketched below.
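A minimal repro model (a sketch with the Python `onnx` helper; shapes mirror the largest case in the table above, and the channel count is arbitrary):

```python
import onnx
from onnx import TensorProto, helper

# Input/output shapes: the kernel covers the whole 80x128 spatial extent,
# i.e. exactly the global-average edge case described above.
inp = helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 32, 80, 128])
out = helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, 32, 1, 1])

pool = helper.make_node("AveragePool", ["x"], ["y"], kernel_shape=[80, 128])

graph = helper.make_graph([pool], "big_avgpool", [inp], [out])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 17)])
onnx.checker.check_model(model)
onnx.save(model, "big_avgpool.onnx")
```

Running this model with onnxruntime-web on the WebGPU execution provider and profiling should show the AveragePool time growing with kernel area, as in the table above.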
Urgency
Not urgent; a workaround was found.
Platform
Mac
OS Version
15.2
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.20.1
ONNX Runtime API
JavaScript
Architecture
ARM64
Execution Provider
Other / Unknown
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
No