[Performance] Multithreading for DequantizeLinear #23395
Open
Description
Describe the issue
The current DequantizeLinear CPU operator does not use threads.
I have implemented a quick prototype that shows a 4x speedup on that operator when used with a Qwen 2.5 0.5B model.
I do see a comment about this:
@fajin-corp, is this something you were planning to implement? I'd be happy to help under your guidance.
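For context, here is a minimal sketch of the general idea. It is not the prototype mentioned above and not ONNX Runtime's actual kernel. It assumes per-tensor quantization (a single scale and zero point) and uses the ONNX definition `y = (x - zero_point) * scale`. The names `dequantize_linear_parallel` and `num_threads` are hypothetical, chosen only for illustration. The flattened input is split into contiguous chunks and each chunk is dequantized on a worker thread. Any real speedup in Python depends on NumPy releasing the GIL inside its array loops, so treat this as an illustration of the partitioning, not a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def dequantize_linear(x, scale, zero_point):
    """Single-threaded reference: y = (x - zero_point) * scale (per-tensor)."""
    # Widen to int32 first so the subtraction cannot overflow int8/uint8.
    shifted = x.astype(np.int32) - np.int32(zero_point)
    return shifted.astype(np.float32) * np.float32(scale)


def dequantize_linear_parallel(x, scale, zero_point, num_threads=4):
    """Dequantize contiguous chunks of the flattened input on a thread pool.

    Hypothetical helper for illustration only; assumes per-tensor scale.
    """
    flat = x.reshape(-1)
    out = np.empty(flat.shape, dtype=np.float32)
    # Chunk boundaries: worker i handles flat[bounds[i]:bounds[i + 1]].
    bounds = np.linspace(0, flat.size, num_threads + 1, dtype=np.int64)

    def work(i):
        lo, hi = bounds[i], bounds[i + 1]
        # Each worker writes to a disjoint slice, so no locking is needed.
        out[lo:hi] = dequantize_linear(flat[lo:hi], scale, zero_point)

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() drains the iterator so worker exceptions are re-raised here.
        list(pool.map(work, range(num_threads)))
    return out.reshape(x.shape)
```

Because the operator is elementwise, the chunks are independent and the output matches the single-threaded result exactly. A per-axis or blocked variant would additionally need to pick the right scale and zero point for each chunk.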
To reproduce
n/a
Urgency
No response
Platform
Windows
OS Version
any
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
main
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
Yes