
[GPU] Enable multi head size support for KV cache #29936


Merged
merged 1 commit into openvinotoolkit:master from kv_multiheadsize on May 1, 2025

Conversation

@clee30 (Contributor) commented Apr 4, 2025:

In continuous batching, the head sizes for key and value can differ. This change adds support for that in SDPA.

Tickets:

CVS-162339 and CVS-161089
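
For context, here is a minimal NumPy sketch of scaled dot-product attention where the key head size differs from the value head size. This is not the OpenVINO GPU plugin code; shapes and names are illustrative. The point is that the query/key head size drives the score computation, while the output inherits the value head size:

```python
import numpy as np

def sdpa(q, k, v):
    # q, k: [num_heads, seq_len, head_size_k]; v: [num_heads, seq_len, head_size_v]
    head_size_k = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_size_k)  # [H, Lq, Lk]
    scores -= scores.max(axis=-1, keepdims=True)              # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)            # softmax over keys
    return weights @ v                                        # [H, Lq, head_size_v]

q = np.random.randn(8, 4, 128)   # query/key head size 128
k = np.random.randn(8, 16, 128)
v = np.random.randn(8, 16, 64)   # value head size 64, different from K
out = sdpa(q, k, v)
assert out.shape == (8, 4, 64)   # output takes the value head size
```

Kernels that hard-code a single head size for both K and V break on inputs like these, which is what this PR addresses.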

@clee30 requested review from a team as code owners on April 4, 2025 07:24
@github-actions bot added the "category: GPU" (OpenVINO GPU plugin) label on Apr 4, 2025
@sys-openvino-ci added the "ExternalIntelPR" (External contributor from Intel) label on Apr 4, 2025
@p-durandin (Contributor) commented: build_jenkins

@sshlyapn added this to the 2025.2 milestone on Apr 4, 2025
@clee30 force-pushed the kv_multiheadsize branch from 4ed2f07 to 9e5f1a8 on April 4, 2025 08:41
@p-durandin (Contributor) commented: build_jenkins

@clee30 force-pushed the kv_multiheadsize branch from 9e5f1a8 to 0960ea1 on April 7, 2025 02:08
@p-durandin (Contributor) commented: build_jenkins

@clee30 closed this on Apr 8, 2025
@clee30 force-pushed the kv_multiheadsize branch from 3c5d405 to d119656 on April 8, 2025 10:02
@clee30 reopened this on Apr 8, 2025
@p-durandin (Contributor) commented: build_jenkins

@clee30 (Contributor, Author) commented Apr 9, 2025:

Found a regression when running qwen2-7b with paged_attention; need to investigate it.

@clee30 force-pushed the kv_multiheadsize branch from d44fd0a to dd777a0 on April 9, 2025 08:29
@p-durandin (Contributor) commented: build_jenkins

@yeonbok (Contributor) commented Apr 10, 2025:

Is the regression issue resolved? If not, please add a "Do not merge" or "Under perf check" label.

@clee30 closed this on Apr 23, 2025
@clee30 force-pushed the kv_multiheadsize branch from 2e6968e to ec7d46f on April 23, 2025 14:11
@clee30 reopened this on Apr 24, 2025
@clee30 force-pushed the kv_multiheadsize branch from 7e73669 to 3d82d72 on April 25, 2025 08:58
@p-durandin (Contributor) commented: build_jenkins

@clee30 force-pushed the kv_multiheadsize branch from 3d82d72 to ba58dbe on April 25, 2025 14:35
@p-durandin (Contributor) commented: build_jenkins

@clee30 force-pushed the kv_multiheadsize branch 2 times, most recently from b9153fc to d2a58d5 on April 28, 2025 02:15
@p-durandin (Contributor) commented: build_jenkins

@clee30 force-pushed the kv_multiheadsize branch from d2a58d5 to 553897f on April 29, 2025 13:18
@p-durandin (Contributor) commented: build_jenkins

@clee30 closed this on Apr 30, 2025
@clee30 force-pushed the kv_multiheadsize branch from 553897f to e2d33bb on April 30, 2025 10:33
@clee30 reopened this on Apr 30, 2025
@p-durandin (Contributor) commented: build_jenkins

@clee30 force-pushed the kv_multiheadsize branch from a7bf0f7 to f6d4cf7 on April 30, 2025 14:35
@p-durandin (Contributor) commented: build_jenkins

@sshlyapn (Contributor) left a review comment:

Overall, it looks good to me.

In continuous batching, the head sizes for key and value can differ.
Add support for this in SDPA and PagedAttention.

sdpa_opt has been updated to handle mismatched head sizes correctly for SDPA. Additionally, the multi-head-size case on dGPU is forced to use sdpa_opt, since sdpa_micro does not support it yet.
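As a rough illustration of the fallback rule described in that commit message, here is a hypothetical dispatch helper. The function name and parameters are illustrative, not the OpenVINO GPU plugin's actual kernel-selection API:

```python
# Hypothetical sketch of the selection rule above; not real plugin code.
# When key and value head sizes differ, dGPU must fall back to sdpa_opt
# because sdpa_micro does not yet handle mismatched head sizes.
def select_sdpa_kernel(head_size_k: int, head_size_v: int, is_dgpu: bool) -> str:
    multi_head_size = head_size_k != head_size_v
    if is_dgpu and multi_head_size:
        return "sdpa_opt"  # forced: sdpa_micro lacks multi-head-size support
    # Otherwise use the usual preference (assumption: sdpa_micro on dGPU
    # when available, sdpa_opt elsewhere).
    return "sdpa_micro" if is_dgpu else "sdpa_opt"

print(select_sdpa_kernel(128, 64, is_dgpu=True))  # -> "sdpa_opt"
```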
@clee30 force-pushed the kv_multiheadsize branch from f6d4cf7 to b231031 on May 1, 2025 07:54
@p-durandin (Contributor) commented: build_jenkins

@p-durandin enabled auto-merge on May 1, 2025 08:11
@p-durandin added this pull request to the merge queue on May 1, 2025
Merged via the queue into openvinotoolkit:master with commit 307f4ed on May 1, 2025
170 checks passed
Labels: category: GPU (OpenVINO GPU plugin), ExternalIntelPR (External contributor from Intel)
9 participants