
add onednn w8a16 gemm #24

Merged
zufangzhu merged 5 commits into vllm-project:main from zufangzhu:zufang/onednn_wf8a16
Sep 3, 2025

Conversation

@zufangzhu
Collaborator

add third_party/oneDNN
migrating onednn w8a16 gemm and add tests

Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
Copilot AI review requested due to automatic review settings August 22, 2025 05:47
Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds oneDNN w8a16 GEMM support to the vLLM XPU kernels by migrating oneDNN-based w8a16 (weight 8-bit, activation 16-bit) GEMM operations and adding corresponding tests. This enables weight-only quantization for efficient inference on Intel XPU devices.

  • Adds oneDNN as a third-party dependency via Git submodule
  • Implements FP8 linear layer with weight-only quantization support
  • Adds oneDNN-based FP8 GEMM kernel implementation for w8a16 operations
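The w8a16 idea (8-bit weights, 16-bit activations) can be illustrated with a minimal NumPy sketch. This is not the PR's kernel: it uses symmetric per-channel int8 quantization as a stand-in for the fp8 weight format, and dequantizes before a plain matmul, whereas the oneDNN kernel fuses the dequantization into the GEMM.

```python
import numpy as np

def quantize_per_channel(w):
    """Symmetric per-output-channel int8 quantization (stand-in for fp8)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def w8a16_gemm(x, q_w, scale):
    """GEMM with 8-bit weights and 16-bit activations.

    Dequantizes the weights to fp16 and runs a plain matmul; a real
    oneDNN kernel would fuse the dequantization into the GEMM itself.
    """
    w = q_w.astype(np.float16) * scale.astype(np.float16)
    return x @ w.T
```

The weight-only scheme keeps activations at full 16-bit precision, so accuracy loss comes only from the weight quantization error.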

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

Summary per file:
  vllm_xpu_kernels/layers/quantization/utils.py: quantization enums and mappings for the supported quantization methods and data types
  vllm_xpu_kernels/layers/quantization/fp8_linear.py: WeightOnlyQuantizedLinear class with FP8 quantization support and GPTQ/AWQ compatibility
  tests/test_fp8_linear.py: tests for the FP8 linear layer across various parameter combinations
  third_party/oneDNN: oneDNN submodule for the low-level GEMM operations
  csrc/xpu/onednn/onednn_ext.h: oneDNN extensions and utilities for XPU kernel integration
  csrc/xpu/onednn/fp8_gemm_w8a16.h: header for the FP8 GEMM w8a16 implementation
  csrc/xpu/onednn/fp8_gemm_w8a16.cpp: core implementation of the FP8 GEMM w8a16 operations
  cmake/Modules/FindoneDNN.cmake: CMake module for finding and configuring the oneDNN dependency
  setup.py: build configuration updated to use Intel compilers for XPU targets
  CMakeLists.txt: oneDNN integrated into the build system
  csrc/xpu/torch_bindings.cpp: PyTorch bindings for the fp8_gemm_w8a16 operation
  csrc/xpu/ops.h: declaration of the fp8_gemm_w8a16 function interface
  .gitmodules: oneDNN configured as a Git submodule
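As a rough illustration of what quantization enums and method-to-dtype mappings in such a utils module might look like (all names here are hypothetical, not the PR's actual identifiers):

```python
from enum import Enum

# Illustrative only: the real module's enum names may differ.
class QuantMethod(Enum):
    GPTQ = "gptq"
    AWQ = "awq"
    FP8 = "fp8"

class QuantDtype(Enum):
    INT8 = "int8"
    FP8_E4M3 = "fp8_e4m3"
    FP8_E5M2 = "fp8_e5m2"

# Map each quantization method to the weight dtypes it supports.
SUPPORTED_DTYPES = {
    QuantMethod.GPTQ: {QuantDtype.INT8},
    QuantMethod.AWQ: {QuantDtype.INT8},
    QuantMethod.FP8: {QuantDtype.FP8_E4M3, QuantDtype.FP8_E5M2},
}
```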


Collaborator

@jikunshang jikunshang left a comment


Can you add a section to the docs illustrating oneDNN compile & link issues and best practices?

Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>

By linking statically, we avoid potential performance variability introduced by different builds or configurations of DNNL that might be present on the host system.

#### 3. **Avoiding Runtime Errors**
Collaborator


Maybe one more reason: torch-xpu also uses static linking. cc @rogerxfeng8

Collaborator Author


updated.

Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
@zufangzhu zufangzhu force-pushed the zufang/onednn_wf8a16 branch from 9e73b8c to 154a402 on August 26, 2025 08:47
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
CMakeLists.txt Outdated
# Import torch cmake configuration.
find_package(Torch REQUIRED)

find_package(oneDNN QUIET)
Collaborator


REQUIRED is preferred, since it stops the configure step with an error.

Collaborator Author


Got it, I will change this.

BTW, we also have the ONEDNN_FOUND variable to detect whether oneDNN was found.
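For reference, the two patterns under discussion look roughly like this (a sketch; the exact variable name set by the project's FindoneDNN.cmake module is assumed to be ONEDNN_FOUND, per the reply above):

```cmake
# Option A: fail the configure step immediately if oneDNN is missing.
find_package(oneDNN REQUIRED)

# Option B: probe quietly, then branch on the result.
find_package(oneDNN QUIET)
if(NOT ONEDNN_FOUND)
  message(FATAL_ERROR "oneDNN not found; did you init the submodule?")
endif()
```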

README.md Outdated
@@ -43,3 +43,23 @@ VLLM_TARGET_DEVICE=xpu python3 setup.py bdist_wheel

### how to use in vLLM
Collaborator

@rogerxfeng8 rogerxfeng8 Aug 27, 2025


Fix existing typos:
line 15: Preparation
line 25: Build & installation
line 44: How to


if(ONEDNN_FOUND)
set(_ONEDNN_SRC)
file(GLOB _ONEDNN_SRC csrc/xpu/onednn/*.cpp)
Collaborator


How about *.h?


SET(DNNL_LIBRARY_TYPE STATIC CACHE STRING "" FORCE)

SET(DNNL_CPU_RUNTIME "THREADPOOL" CACHE STRING "oneDNN cpu backend" FORCE)
Collaborator


Assuming this is copied from the oneDNN makefile; the CPU runtime setting can be cleaned up.

Collaborator Author

@zufangzhu zufangzhu Aug 28, 2025


We need to set this, or oneDNN will default it to OMP, which adds a dependency on libiomp5.so.
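Putting the static-link choice and the thread-runtime point together, the cache settings under discussion would look roughly like this (a sketch, not the PR's exact CMakeLists.txt):

```cmake
# Build oneDNN as a static library so the extension does not depend on
# whatever libdnnl happens to be installed on the host at runtime.
set(DNNL_LIBRARY_TYPE STATIC CACHE STRING "" FORCE)

# Use the THREADPOOL CPU runtime; oneDNN's default (OMP) would pull in
# a dependency on libiomp5.so.
set(DNNL_CPU_RUNTIME "THREADPOOL" CACHE STRING "oneDNN cpu runtime" FORCE)
```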

Collaborator

@rogerxfeng8 rogerxfeng8 left a comment


code implementation looks good to me

Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
@jikunshang jikunshang requested a review from baodii September 2, 2025 07:11
Collaborator

@baodii baodii left a comment


LGTM

@zufangzhu zufangzhu merged commit eb77574 into vllm-project:main Sep 3, 2025
3 checks passed
@zufangzhu zufangzhu deleted the zufang/onednn_wf8a16 branch November 5, 2025 08:14
