[webgpu] Optimize DP4AMatMulNBitsSmallMProgram for intel #2

Open
jing-bao wants to merge 1 commit into main from dp4a_matmul_nbits_smallm_opt
@jing-bao
Owner

Description

This PR optimizes the Intel path of DP4AMatMulNBitsSmallMProgram by tuning tile_size and tile_size_k_vec.
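
As a rough illustration of what vendor-specific tile tuning can look like, here is a minimal C++ sketch. Everything in it is an assumption made for illustration: `TileConfig`, `SelectTileConfig`, `CeilDiv`, and `NumWorkgroups` are hypothetical names, the comments on what each parameter controls are an assumed reading, and the numeric values are placeholders rather than the values chosen in this PR.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical container for the two tuned parameters named in the description.
struct TileConfig {
  uint32_t tile_size;        // assumed: output columns covered by one workgroup
  uint32_t tile_size_k_vec;  // assumed: vectorized K-dimension steps per tile
};

// Picks a tile configuration per GPU vendor.
// The numbers are placeholders for illustration, NOT the values from this PR.
inline TileConfig SelectTileConfig(std::string_view vendor) {
  if (vendor == "intel") {
    return TileConfig{16, 32};  // placeholder Intel-specific tuning
  }
  return TileConfig{32, 16};    // placeholder default for other vendors
}

// Integer ceiling division: smallest q such that q * b >= a.
inline uint32_t CeilDiv(uint32_t a, uint32_t b) {
  return (a + b - 1) / b;
}

// Workgroup count for an M x N output, assuming each workgroup produces
// tile_size columns of a single output row.
inline uint32_t NumWorkgroups(uint32_t M, uint32_t N, const TileConfig& cfg) {
  return M * CeilDiv(N, cfg.tile_size);
}
```

Under these assumptions, a smaller tile_size spreads the same output across more workgroups, which is one reason such parameters are typically tuned per GPU architecture.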

Motivation and Context

With this change, we achieved a >8% performance boost on Intel iGPUs (Xe-LP and Xe2-LPG) for the phi-4-mini-accuracy4 model.
