Conversation
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
Pull Request Overview
This PR adds oneDNN w8a16 GEMM support to the vLLM XPU kernels by migrating oneDNN-based w8a16 (weight 8-bit, activation 16-bit) GEMM operations and adding corresponding tests. This enables weight-only quantization for efficient inference on Intel XPU devices.
- Adds oneDNN as a third-party dependency via Git submodule
- Implements FP8 linear layer with weight-only quantization support
- Adds oneDNN-based FP8 GEMM kernel implementation for w8a16 operations
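To make the w8a16 scheme concrete, here is a hedged pure-Python reference (not the kernel's actual API; the function names are illustrative): weights are quantized to 8-bit integers with one symmetric scale per output channel, and the GEMM dequantizes them on the fly against 16-bit activations.

```python
# Illustrative reference for w8a16 GEMM: int8 weights with
# per-output-channel scales, floating-point activations.
# This mirrors the quantization scheme, not the oneDNN kernel.

def quantize_per_channel(w, qmax=127):
    """Symmetric per-output-channel int8 quantization of a weight
    matrix given as a list of rows (one row per output channel)."""
    scales, q = [], []
    for row in w:
        s = max(abs(v) for v in row) / qmax or 1.0
        scales.append(s)
        q.append([round(v / s) for v in row])
    return q, scales

def w8a16_gemm(x, qw, scales):
    """y = x @ dequant(qw).T, applying the per-channel scale once
    per output element instead of materializing a float weight."""
    out = []
    for xi in x:
        out.append([
            scales[o] * sum(xi[k] * qrow[k] for k in range(len(xi)))
            for o, qrow in enumerate(qw)
        ])
    return out
```

Because the scale factors out of the inner dot product, the accumulation can stay in integer or low-precision arithmetic, which is what makes weight-only quantization cheap at inference time.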
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| vllm_xpu_kernels/layers/quantization/utils.py | Defines quantization enums and mappings for different quantization methods and data types |
| vllm_xpu_kernels/layers/quantization/fp8_linear.py | Implements WeightOnlyQuantizedLinear class with FP8 quantization support and GPTQ/AWQ compatibility |
| tests/test_fp8_linear.py | Adds comprehensive tests for FP8 linear layer with various parameter combinations |
| third_party/oneDNN | Adds oneDNN submodule for low-level GEMM operations |
| csrc/xpu/onednn/onednn_ext.h | Provides oneDNN extensions and utilities for XPU kernel integration |
| csrc/xpu/onednn/fp8_gemm_w8a16.h | Header for FP8 GEMM w8a16 implementation |
| csrc/xpu/onednn/fp8_gemm_w8a16.cpp | Core implementation of FP8 GEMM w8a16 operations |
| cmake/Modules/FindoneDNN.cmake | CMake module for finding and configuring oneDNN dependency |
| setup.py | Updates build configuration to use Intel compilers for XPU targets |
| CMakeLists.txt | Integrates oneDNN into the build system |
| csrc/xpu/torch_bindings.cpp | Adds PyTorch bindings for fp8_gemm_w8a16 operation |
| csrc/xpu/ops.h | Declares the fp8_gemm_w8a16 function interface |
| .gitmodules | Configures oneDNN as a Git submodule |
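The FP8 weight format handled by `fp8_gemm_w8a16.cpp` can be illustrated with a small decoder. Assuming the common OCP e4m3fn encoding (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7, no infinities, all-ones pattern reserved for NaN) — the PR does not state the exact variant, so treat this as a sketch:

```python
def fp8_e4m3fn_to_float(byte):
    """Decode one e4m3fn byte to a Python float.
    Layout: [sign | 4-bit exponent | 3-bit mantissa], bias 7."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return float("nan")          # only NaN encoding, no inf
    if exp == 0:
        return sign * man * 2.0 ** -9  # subnormal: (man/8) * 2**-6
    return sign * (1 + man / 8.0) * 2.0 ** (exp - 7)
```

With this layout the largest finite value is 448.0 (`0x7E`), which is why FP8 weight-only schemes pair each tensor or channel with a float scale, as in the w8a16 GEMM above.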
jikunshang
left a comment
Can you add a section in the docs to illustrate oneDNN compile & link issues and best practices?
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
> By linking statically, we avoid potential performance variability introduced by different builds or configurations of DNNL that might be present on the host system.
>
> #### 3. **Avoiding Runtime Errors**
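The static-linking rationale quoted above could be wired up roughly as follows. `DNNL_LIBRARY_TYPE` is a real oneDNN CMake option (it also appears later in this thread); the extension target name `_xpu_C` is an assumption for illustration, not taken from the PR:

```cmake
# Sketch: build the vendored oneDNN submodule as a static library so
# the extension never resolves against a host-installed libdnnl.
set(DNNL_LIBRARY_TYPE STATIC CACHE STRING "" FORCE)
add_subdirectory(third_party/oneDNN)

# _xpu_C is a placeholder for the actual extension target name.
target_link_libraries(_xpu_C PRIVATE dnnl)
```

Static linking trades a larger binary for reproducible behavior and no runtime `libdnnl.so` lookup failures.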
Maybe one more reason: torch-xpu also uses static linking. cc @rogerxfeng8
Force-pushed 9e73b8c to 154a402
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
CMakeLists.txt (outdated)

```cmake
# Import torch cmake configuration.
find_package(Torch REQUIRED)

find_package(oneDNN QUIET)
```
REQUIRED is preferred, since it stops configuration with an error when oneDNN is not found.
Got it, I will change this.
BTW, we also have the env ONEDNN_FOUND to detect whether oneDNN was found.
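The two points in this exchange could look like the following sketch (the source file glob matches the PR's snippet; treat the combination as illustrative):

```cmake
# REQUIRED aborts the configure step with an error if oneDNN is
# missing, instead of silently continuing as QUIET would.
find_package(oneDNN REQUIRED)

# ONEDNN_FOUND can still guard optional sources if the dependency
# is ever made optional again.
if(ONEDNN_FOUND)
  file(GLOB _ONEDNN_SRC csrc/xpu/onednn/*.cpp)
endif()
```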
README.md (outdated)

```diff
@@ -43,3 +43,23 @@ VLLM_TARGET_DEVICE=xpu python3 setup.py bdist_wheel

### how to use in vLLM
```
Fix existing typos:
line 15: Preparation
line 25: Build & installation
line 44: How to
```cmake
if(ONEDNN_FOUND)
  set(_ONEDNN_SRC)
  file(GLOB _ONEDNN_SRC csrc/xpu/onednn/*.cpp)

SET(DNNL_LIBRARY_TYPE STATIC CACHE STRING "" FORCE)

SET(DNNL_CPU_RUNTIME "THREADPOOL" CACHE STRING "oneDNN cpu backend" FORCE)
```
Assuming this is copied from the oneDNN makefile; the CPU runtime setting can be cleaned up.
We need to set this, or oneDNN will default it to OMP, which adds a dependency on libiomp5.so.
rogerxfeng8
left a comment
Code implementation looks good to me.
add third_party/oneDNN
migrating onednn w8a16 gemm and add tests