
[JSEP/WebGPU] Remove explicit split operator in GQA when QKV packed. #22627

Closed
satyajandhyala wants to merge 3 commits into main from sajandhy/webgpu_gqa_remove_split

Conversation

@satyajandhyala
Contributor

Description

Remove the explicit split operation when the QKV input is packed. Instead, use indexing to read the key and value data directly from the packed Q input.
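
As a rough illustration of the indexing idea (not the shader code from this PR), the sketch below computes where one head of Q, K, or V starts inside a packed buffer. It assumes the packed input is laid out as `[batch, sequence, (numHeads + 2 * kvNumHeads) * headSize]`, with Q first, then K, then V in each token row. The names `PackedQkvLayout`, `packedHeadOffset`, `readHead`, and `kvHeadForQueryHead` are hypothetical and exist only for this example.

```typescript
// Sketch only: names and layout are assumptions, not code from this PR.
interface PackedQkvLayout {
  numHeads: number;   // number of query heads
  kvNumHeads: number; // number of key/value heads (assumed to divide numHeads)
  headSize: number;   // elements per head
  seqLen: number;     // tokens per batch entry
}

type Part = 'q' | 'k' | 'v';

// Flat offset of the first element of one head of Q, K, or V in the packed buffer.
function packedHeadOffset(
  layout: PackedQkvLayout,
  part: Part,
  batch: number,
  token: number,
  head: number,
): number {
  const { numHeads, kvNumHeads, headSize, seqLen } = layout;
  // Each token row holds all Q heads, then all K heads, then all V heads.
  const rowWidth = (numHeads + 2 * kvNumHeads) * headSize;
  const partBase =
    part === 'q' ? 0 : part === 'k' ? numHeads * headSize : (numHeads + kvNumHeads) * headSize;
  return (batch * seqLen + token) * rowWidth + partBase + head * headSize;
}

// Returns a view of one head; no data is copied and no split output is allocated.
function readHead(
  packed: Float32Array,
  layout: PackedQkvLayout,
  part: Part,
  batch: number,
  token: number,
  head: number,
): Float32Array {
  const start = packedHeadOffset(layout, part, batch, token, head);
  return packed.subarray(start, start + layout.headSize);
}

// In grouped-query attention, consecutive query heads share one key/value head.
function kvHeadForQueryHead(layout: PackedQkvLayout, qHead: number): number {
  return Math.floor(qHead / (layout.numHeads / layout.kvNumHeads));
}
```

With `numHeads = 2`, `kvNumHeads = 1`, `headSize = 2`, and `seqLen = 2`, each token row is 8 elements wide, so the key for token 1 starts at offset 12 and the value at offset 14. In a WebGPU kernel the same arithmetic would presumably be done inside the shader, which is what makes a separate split pass unnecessary.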

Motivation and Context

Improve GQA performance by avoiding a separate split pass over packed QKV.

@satyajandhyala satyajandhyala marked this pull request as ready for review October 29, 2024 17:56
@guschmue guschmue added the ep:WebGPU ort-web webgpu provider label Oct 30, 2024
@satyajandhyala satyajandhyala marked this pull request as draft January 8, 2025 16:48
@satyajandhyala satyajandhyala marked this pull request as ready for review January 15, 2025 21:04
@satyajandhyala satyajandhyala marked this pull request as draft March 15, 2025 00:43
@snnn snnn closed this Jul 3, 2025

Labels

ep:WebGPU ort-web webgpu provider


3 participants