[Performance] Multithreading for DequantizeLinear #23395

Open
@tarekziade

Description

Describe the issue

The current DequantizeLinear CPU operator does not use threads.

I have implemented a quick prototype that shows a 4x speedup on that operator when it is used with a Qwen 2.5 0.5B model.

I do see a comment about this:

https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cpu/quantization/quantize_linear.cc#L302

@fajin-corp, is this something you were planning to implement? I'd be happy to help under your guidance.
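
To make the idea concrete, here is a minimal sketch of splitting a dequantization across threads. It is not the prototype's code and not ONNX Runtime's implementation: it assumes per-tensor quantization of uint8 data, uses plain `std::thread` instead of the runtime's own thread pool, and the names `DequantizeRange` and `ParallelDequantize` are hypothetical. The only part taken from the operator definition is the formula `y = (x - zero_point) * scale`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper (not an ONNX Runtime function): dequantizes the
// half-open slice [begin, end) using y = (x - zero_point) * scale.
void DequantizeRange(const uint8_t* input, float* output, std::size_t begin,
                     std::size_t end, float scale, uint8_t zero_point) {
  for (std::size_t i = begin; i < end; ++i) {
    const int32_t centered =
        static_cast<int32_t>(input[i]) - static_cast<int32_t>(zero_point);
    output[i] = static_cast<float>(centered) * scale;
  }
}

// Hypothetical per-tensor dequantization split into contiguous chunks,
// one chunk per worker thread. Each thread writes a disjoint output range,
// so no locking is needed.
std::vector<float> ParallelDequantize(const std::vector<uint8_t>& input,
                                      float scale, uint8_t zero_point,
                                      unsigned num_threads) {
  assert(num_threads > 0);
  const std::size_t n = input.size();
  std::vector<float> output(n);

  // Ceiling division so that the chunks cover every element.
  const std::size_t chunk = (n + num_threads - 1) / num_threads;

  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    const std::size_t begin = std::min(n, t * chunk);
    const std::size_t end = std::min(n, begin + chunk);
    if (begin == end) break;  // nothing left to hand out
    workers.emplace_back(DequantizeRange, input.data(), output.data(), begin,
                         end, scale, zero_point);
  }
  for (auto& w : workers) w.join();
  return output;
}
```

Spawning threads per call is only reasonable for a sketch like this one; a real change would presumably reuse the runtime's existing thread pool and only parallelize tensors large enough to amortize the scheduling cost.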

To reproduce

n/a

Urgency

No response

Platform

Windows

OS Version

any

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

main

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

Yes


Labels: performance, quantization
