
[WIP] Optimize GDN chunk_fwd_o_kernel performance #297

Draft
YangQun1 wants to merge 5 commits into vllm-project:main from YangQun1:dev/fwd_o_opt


@YangQun1
Contributor

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS ABOVE HAVE BEEN CONSIDERED.

Purpose

Optimize chunk_fwd_o_kernel performance by:

  • using the native exp function
  • merging the WS and QS GEMMs so the S tensor is reused from registers
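The second change relies on W @ S and Q @ S sharing the same right-hand operand S. A minimal sketch of why that merge is valid, assuming both products multiply by the same S: stacking W on top of Q and running a single GEMM gives exactly the rows of the two separate products. This is an illustration in plain Python, not the kernel's code; the helpers `matmul`, `separate`, and `merged` are hypothetical names.

```python
# Hypothetical illustration (not the kernel implementation): two GEMMs that
# share the right-hand operand S can be expressed as one GEMM over the
# row-stacked left operand [W; Q].

def matmul(a, b):
    """Naive GEMM on nested lists: (m x k) @ (k x n) -> (m x n)."""
    k = len(b)
    n = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def separate(w, q, s):
    # Two independent GEMMs; in a kernel each would consume S on its own.
    return matmul(w, s), matmul(q, s)


def merged(w, q, s):
    # One GEMM over the stacked operand [W; Q]; in a kernel this is what
    # allows an S tile held in registers to serve both products.
    stacked = w + q  # list concatenation stacks the rows of W above Q
    out = matmul(stacked, s)
    return out[:len(w)], out[len(w):]


w = [[1, 2], [3, 4]]
q = [[5, 6]]
s = [[1, 2], [3, 4]]
print(merged(w, q, s))  # ([[7, 10], [15, 22]], [[23, 34]])
```

The split at `len(w)` recovers the two results because each output row depends only on the corresponding input row of the stacked operand.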

Test Plan

Test Result

(Optional) Documentation Update

BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)

YangQun1 and others added 5 commits April 23, 2026 13:12
Signed-off-by: yangqun <qun.yang@intel.com>