Add chunk_causal_conv1d_opt_kernel in GDN for Qwen3.5 #278

Draft
YangQun1 wants to merge 3 commits into vllm-project:main from YangQun1:dev/qwen3_5-gdn-opt-causal_conv1d

@YangQun1
Contributor

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Add an optimized chunk causal conv1d kernel (chunk_causal_conv1d_opt_kernel) to the GDN module for Qwen3.5.
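
For reviewers who want a mental model of the operation, here is a minimal pure-Python sketch of a depthwise causal 1-D convolution evaluated chunk by chunk with a carried state. It assumes that "chunk causal conv1d" means this; the function names `causal_conv1d_ref` and `causal_conv1d_chunked`, the single-channel layout, and the tap ordering are all hypothetical and are not taken from this PR's kernel.

```python
def causal_conv1d_ref(x, weight, state=None):
    """Reference causal conv for a single channel (illustrative only).

    x      : list of floats, one value per token
    weight : list of K taps; weight[-1] multiplies the current token
    state  : the K-1 inputs that preceded x (zeros when None)
    Returns (y, new_state), where new_state feeds the next chunk.
    """
    k = len(weight)
    if state is None:
        state = [0.0] * (k - 1)
    # Prepend the carried state so each output only sees past and current inputs.
    padded = list(state) + list(x)
    y = [sum(weight[j] * padded[t + j] for j in range(k)) for t in range(len(x))]
    # Keep the last K-1 inputs as the state for the following chunk.
    new_state = padded[len(padded) - (k - 1):]
    return y, new_state


def causal_conv1d_chunked(x, weight, chunk_size):
    """Process x in fixed-size chunks, carrying the conv state between them."""
    state, out = None, []
    for start in range(0, len(x), chunk_size):
        y, state = causal_conv1d_ref(x[start:start + chunk_size], weight, state)
        out.extend(y)
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [0.5, 0.25, 1.0]
full, _ = causal_conv1d_ref(x, w)
chunked = causal_conv1d_chunked(x, w, chunk_size=2)
print(full == chunked)
```

A unit test for an optimized kernel could compare its output against a reference of this shape; whether the PR's unit test does so is not stated here.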

Test Plan

Unit test + model e2e test + performance benchmark

Test Result

Unit test

  • pass

Model e2e test

  • TODO

Performance benchmark

  • bs=1, 4k total tokens:
    • original kernel: 672 us, 20% of GDN total time
    • new kernel: 475 us, 15% of GDN total time
  • bs=4, 16k total tokens:
    • original kernel: 2823 us, 25% of GDN total time
    • new kernel: 1834 us, 17% of GDN total time
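
The latencies reported above can be turned into speedup figures with a few lines of arithmetic. The sketch below uses only the four kernel timings from the list; the helper names `speedup` and `reduction_pct` are illustrative and not part of the PR.

```python
# Kernel latencies in microseconds, copied from the benchmark list above.
results = {
    "bs=1, 4k tokens": {"original_us": 672, "new_us": 475},
    "bs=4, 16k tokens": {"original_us": 2823, "new_us": 1834},
}


def speedup(original_us, new_us):
    """Ratio of original to new kernel time."""
    return original_us / new_us


def reduction_pct(original_us, new_us):
    """Percentage of kernel time saved by the new kernel."""
    return 100.0 * (original_us - new_us) / original_us


for name, r in results.items():
    print(f"{name}: {speedup(**r):.2f}x faster, "
          f"{reduction_pct(**r):.1f}% less kernel time")
```

This works out to roughly 1.41x (29.3% less kernel time) at bs=1 and 1.54x (35.0% less) at bs=4, which suggests the gain grows with the token count in these two configurations.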

(Optional) Documentation Update

BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)

Signed-off-by: yangqun <qun.yang@intel.com>