[GPU] Enable multi head size support for KV cache #29936
Found a regression when running qwen2-7b with paged_attention. Need to investigate.
Is the regression resolved? If not, please add a "Do not merge" or "Under perf check" label.
Overall, it looks good to me.
In continuous batching, the head sizes for key and value can differ. This change adds support for differing K/V head sizes to both SDPA and PagedAttention. sdpa_opt has been updated to handle this correctly for SDPA. In addition, dGPU is forced to use sdpa_opt for the multi-head-size case, since sdpa_micro does not support it yet.
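For context, the shape change this PR handles can be sketched as scaled dot-product attention in which Q and K share one head size while V (and therefore the output) uses another. The following is an illustrative pure-Python sketch under that assumption, not the plugin's kernel code; the function name and list-of-lists layout are hypothetical.

```python
import math

def sdpa_multi_head_size(q, k, v):
    """q: [n_q][d_qk], k: [n_kv][d_qk], v: [n_kv][d_v] -> [n_q][d_v].

    Sketch of SDPA where the K head size (d_qk) and V head size (d_v)
    differ, as for the KV cache under continuous batching.
    """
    d_qk = len(q[0])
    scale = 1.0 / math.sqrt(d_qk)  # scaling uses the Q/K head size
    out = []
    for qi in q:
        # Attention scores over all cached keys.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        # Numerically stable softmax.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        denom = sum(exps)
        weights = [e / denom for e in exps]
        # The output head size follows V's head size, not K's.
        row = [sum(w * vj[d] for w, vj in zip(weights, v))
               for d in range(len(v[0]))]
        out.append(row)
    return out
```

The point of the sketch is that only the score computation depends on the Q/K head size; the softmax-weighted sum is shaped entirely by V, so the two sizes can be tracked independently in the kernels.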
In continuous batching, the key and value head sizes can differ. Add support for this in SDPA.
Tickets:
CVS-162339 and CVS-161089