fix: fix triton kernel tiling and fp8_gemm swizzle #1098

Open
JackeyLove1 wants to merge 1 commit into deepseek-ai:main from JackeyLove1:main


@JackeyLove1

  1. Add contiguous/alignment hints to the act quant and weight dequant kernels.
  2. Add GROUP_SIZE to the fp8_gemm autotune configs and use swizzle2d to improve performance.
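
For reviewers unfamiliar with the swizzle in item 2, here is a minimal pure-Python sketch of the index remapping that `tl.swizzle2d` performs. It is not the code from this PR. The names `swizzle2d_reference` and `swizzled_grid` and the 4x4 grid are illustrative assumptions, and `size_g` stands in for the role I assume `GROUP_SIZE` plays in the kernel. The idea, as I understand it, is that consecutive program ids are walked column-major within each group of `size_g` tile rows, so neighbouring programs tend to reuse the same input tiles.

```python
def swizzle2d_reference(i, j, size_i, size_j, size_g):
    """Pure-Python model of a grouped (swizzled) tile ordering.

    (i, j) is a position in a row-major size_i x size_j grid of tiles.
    The result is the position it is remapped to when each group of
    size_g rows is traversed column-major instead.
    """
    ij = i * size_j + j                  # row-major linear index
    size_gj = size_g * size_j            # number of tiles in one full group
    group_id = ij // size_gj             # group that contains (i, j)
    off_i = group_id * size_g            # first row of that group
    rows = min(size_i - off_i, size_g)   # the last group may have fewer rows
    ij %= size_gj                        # linear index inside the group
    return off_i + ij % rows, ij // rows


def swizzled_grid(size_i, size_j, size_g):
    """Remapped (row, col) for every position of the original grid."""
    return [
        [swizzle2d_reference(i, j, size_i, size_j, size_g) for j in range(size_j)]
        for i in range(size_i)
    ]


if __name__ == "__main__":
    # Hypothetical 4x4 tile grid with groups of 2 rows.
    for row in swizzled_grid(4, 4, 2):
        print(row)
```

The mapping is a permutation of the tile grid, so every output tile is still computed exactly once; only the order in which programs visit the tiles changes. Whether a given `GROUP_SIZE` actually helps depends on the problem shape and cache size, which is presumably why it is exposed to the autotuner.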
