[5/n] Migrate CUTLASS MLA, hadamard, awq, allspark and DSV3 fused a gemm to torch stable ABI #38671
mikaylagawarecki wants to merge 10 commits into vllm-project:main
Conversation
Code Review
This pull request migrates several CUDA kernels—including AWQ, AllSpark, DeepSeek V3 fused A GEMM, Hadacore, and CUTLASS MLA—from the standard extension to the stable ABI extension (_C_stable_libtorch). The changes involve updating CMakeLists.txt to reassign source files, replacing standard Torch types and macros with stable ABI equivalents (e.g., torch::stable::Tensor, STD_TORCH_CHECK), and implementing stable ABI-compliant utilities for device property caching and cuBLAS handle retrieval. Feedback highlights critical issues regarding thread safety with global workspace tensors, potential compilation failures when using non-movable types in containers, and the need for better bounds checking and naming consistency in the new utility functions.
```cpp
// Device properties cache for stable ABI compatibility.
// Uses raw CUDA/HIP APIs instead of ATen functions.
// Using inline ensures a single instance across all translation units.
inline std::deque<std::once_flag> device_flags;
```
The use of std::deque<std::once_flag> is problematic because std::once_flag is non-copyable and non-movable. While std::deque generally provides stable pointers to its elements, the resize operation (line 35) requires the type to be MoveInsertable according to the C++ standard, which std::once_flag is not. This will likely lead to compilation errors on many toolchains. A better approach is to initialize all device properties at once during the global initialization phase, removing the need for per-device once_flag containers.
The code here is actually a very slight adaptation of the code in torch https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/cuda/CUDAContext.cpp#L12-L59 to make it stable.
(Granted, torch uses c10::once_flag, but that is also non-copyable and non-movable, so it has the same issue.)
Since the std::deque is only ever resized once, from size 0 to num_devices, I don't think this is actually problematic. However, I can fix it if anyone thinks it is.
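For reference, the once-per-process alternative the review suggests could look roughly like this. This is a minimal self-contained sketch with made-up names (`DeviceProp`, `fake_device_count` standing in for `cudaGetDeviceCount`), not the actual patch:

```cpp
#include <mutex>
#include <vector>

// Hypothetical sketch: instead of a per-device container of
// std::once_flag (non-copyable, non-movable), query the device count
// once and populate every device's properties under a single call_once.
struct DeviceProp {
  int device_id;
  // ... fields filled from cudaGetDeviceProperties in the real code
};

static std::once_flag g_init_flag;
static std::vector<DeviceProp> g_device_props;

// Stand-in for cudaGetDeviceCount in this self-contained sketch.
static int fake_device_count() { return 4; }

const DeviceProp& getDeviceProperties(int device) {
  std::call_once(g_init_flag, [] {
    int n = fake_device_count();
    g_device_props.reserve(n);
    for (int i = 0; i < n; ++i) {
      g_device_props.push_back(DeviceProp{i});
    }
  });
  return g_device_props.at(device);  // bounds-checked, throws on bad index
}
```

This trades per-device lazy initialization for one eager pass, and sidesteps the MoveInsertable question entirely since only a movable value type is ever resized.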
```cpp
#include "core/registration.h"
#include "libtorch_stable/torch_utils.h"

torch::stable::Tensor as_g_workspace;
```
The global variable as_g_workspace of type torch::stable::Tensor introduces a significant race condition. In a multi-threaded or multi-stream environment, concurrent calls to allspark_w8a16_gemm will attempt to check and reallocate this global tensor (lines 991-996), leading to memory corruption or use-after-free errors when one thread overwrites the workspace while another is using it. For stable ABI compatibility and thread safety, workspace memory should be managed via a thread-local cache, a per-device map, or ideally passed as an argument from the Python allocator.
pre-existing
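For illustration, the thread-local variant the review mentions could be sketched as follows. The names here are made up, and `std::vector<unsigned char>` stands in for `torch::stable::Tensor`; this is not the actual vLLM code:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of a thread-local workspace cache. Each thread
// grows its own buffer, so a concurrent caller can never reallocate a
// workspace that another thread is still using.
unsigned char* get_workspace(std::size_t required_bytes) {
  thread_local std::vector<unsigned char> t_workspace;
  if (t_workspace.size() < required_bytes) {
    t_workspace.resize(required_bytes);
  }
  return t_workspace.data();
}
```

Note the caveat: thread-local storage only removes cross-thread races; if a single thread enqueues async work on multiple CUDA streams against the same buffer, a per-stream or caller-allocated workspace (as the review also suggests) would still be needed.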
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Force-pushed from 10e67b6 to 8bd7514
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Pure move, no code changes. Preparatory step for stable ABI migration. Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Pure move, no code changes. Preparatory step for stable ABI migration. Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Force-pushed from 8bd7514 to 2233700
Purpose
#26946
Test Plan
On A100
On H100
On B200
The DeepSeek GEMM kernel does not appear to have a test.
Test Result
A100: (benchmark screenshot)
H100: (benchmark screenshots)
B200: (benchmark screenshot)