add onnx dequantize linear channel wise pattern for NPU #30265

Open

wants to merge 1 commit into master

Conversation

bopeng1234
Contributor

Details:

  • When the DequantizeLinear op meets the NPU requirements (symmetric INT4, channel-wise quantization), performance improves significantly

      // For NPU optimization
      // Reference: https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/python/text_generation#npu-support
      //  1. The model must be exported with symmetric INT4 quantization.
      //  2. The quantized LLM MatMul ops need to use channel-wise quantization
      //     (block_size = K) for better performance.
      // Under these constraints, the NPU vpux-plugin compiler can use a
      // mixed-precision MatMul with an i4 weight input, which raises NPU
      // performance significantly.
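The channel-wise requirement above (one scale per output channel, i.e. block_size = K) can be illustrated with a minimal numpy sketch. This is not the PR's C++ frontend code; the helper name and shapes are illustrative assumptions:

```python
import numpy as np

def dequantize_channelwise(q_weight, scales):
    """Channel-wise symmetric INT4 dequantization sketch (hypothetical helper).

    q_weight: int8 array of shape (N, K) holding INT4 values in [-8, 7].
    scales:   float32 array of shape (N, 1), one scale per output channel
              (block_size = K means the whole row shares a single scale).
    Symmetric quantization has zero_point = 0, so dequantization is a
    plain per-channel multiply -- a shape the NPU compiler can fuse into
    a mixed-precision MatMul with an i4 weight input.
    """
    return q_weight.astype(np.float32) * scales

# Tiny example: 2 output channels, K = 4
q = np.array([[-8, 3, 7, -1],
              [ 2, -4, 0, 5]], dtype=np.int8)
s = np.array([[0.5], [0.25]], dtype=np.float32)
w = dequantize_channelwise(q, s)  # shape (2, 4), float32
```

By contrast, block-wise quantization (block_size < K) would need a scale per sub-block of each row, producing a reshape/multiply pattern the NPU path described above does not accept.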
    

This PR optimizes the ops that the ONNX frontend decomposes DequantizeLinear into so that they match the NPU-optimized pattern. Before this PR, the INT4 Phi-3 ONNX model running with NPUW took:
1st token 8000 ms, 2nd token 9040 ms, 3rd+ tokens avg 4000 ms

With this PR applied, the same model reaches:
1st token 3400 ms, 2nd token 3400 ms, 3rd+ tokens avg 250 ms

That is a 16x speedup (on 3rd+ tokens) in NPUW for the Phi-3 model.

@bopeng1234 bopeng1234 requested review from a team as code owners April 22, 2025 02:21
@github-actions github-actions bot added the category: ONNX FE (OpenVINO ONNX FrontEnd), category: NPU (OpenVINO NPU plugin), and category: NPUW (NPUW plugin) labels Apr 22, 2025
@bopeng1234
Contributor Author

@gkrivor created this new one for NPU; it runs with ONNX models (QDQ, DequantizeLinear op)

Labels
category: NPU (OpenVINO NPU plugin), category: NPUW (NPUW plugin), category: ONNX FE (OpenVINO ONNX FrontEnd), ExternalIntelPR (External contributor from Intel)