[GPU] Add the capability for KV cache to update past KV #33114
Merged: Kotomi-Du merged 43 commits into openvinotoolkit:master from Kotomi-Du:update_kvcache_node on Jan 21, 2026
Commits
5a95cc6 fuse GQA slice node into kvCache for in-place crop (Kotomi-Du)
84d8095 fix conformance issue (Kotomi-Du)
ab5be7c Use RemoteTensor to reorder KV cache (mdvoretc-intel)
a0ed479 Add a kernel to reorder KV cache (mdvoretc-intel)
1c72628 Add KVCache index fusion for reorder (mdvoretc-intel)
4d67227 Fix basic issues (mdvoretc-intel)
ff94cfe Prevent KV reorder execution for cases where it's not required (mdvoretc-intel)
c85841f Fix scalar arguments bug, remove debug prints (mdvoretc-intel)
817c983 Remove unused gather_by_axis code (mdvoretc-intel)
b9d5f30 Fix input offsets (mdvoretc-intel)
07c75d8 Add unit test case (mdvoretc-intel)
1586380 Add feature bounds check (mdvoretc-intel)
d338bcc clean up code (Kotomi-Du)
648a5fc clean up execution stage (Kotomi-Du)
d7043fe use scatterElementUpdate kernel instead of self customized kernel (Kotomi-Du)
145e0f5 delete customized kernel path (Kotomi-Du)
e01cedf clean up code (Kotomi-Du)
4c2d73a fix code style (Kotomi-Du)
9db4cc7 adjust index for compressed KV stage when update_kv stage is existed (Kotomi-Du)
73a739e refactor tests, merge duplicated code (ZackyLake)
7914ddd refactor kvcache stage. (ZackyLake)
8e74647 remove update_kv logic on compress kv. (ZackyLake)
9bfb862 remove indirect support on kv_update due to lack of test. (ZackyLake)
5326963 add debug priont for skipped kernel (ZackyLake)
f321f9b Merge branch 'master' into update_kvcache_node (ZackyLake)
ea925ed Merge branch 'master' into update_kvcache_node (Kotomi-Du)
6cc91a1 fix kv fusion pattern。 (ZackyLake)
561bc54 Merge branch 'master' into update_kvcache_node (ZackyLake)
d836186 fix test (ZackyLake)
6344883 move trim_length to kv_cache_inst.h (Kotomi-Du)
4d06b8e fix kv fusion for test(stridedslice) (ZackyLake)
644cc75 include trim-only support (ZackyLake)
abcdf72 fix concat_axis signness (ZackyLake)
a1ecd57 fix fusion logic (ZackyLake)
f8f1a58 fix signedness (ZackyLake)
3d65d54 allow trim on indirect kvcache. (ZackyLake)
313a748 Merge branch 'master' into update_kvcache_node (Kotomi-Du)
c429feb Make CompressedKV compatible with trim. (ZackyLake)
6781639 fix (ZackyLake)
6a3bbd2 Merge branch 'master' into update_kvcache_node (Kotomi-Du)
3df8bac Merge branch 'master' into update_kvcache_node (Kotomi-Du)
765e167 Merge branch 'master' into update_kvcache_node (Kotomi-Du)
0dc75bc add comment (ZackyLake)
This is not the right place to add such a kv-cache-specific field.
Could you suggest another place for this? Here is what we found on our side.
The table below lists all the available KVCache-related files, but none of them is a suitable home for this parameter, which has to be updated on every iteration at runtime.
Specifically, for kv_cache_inst.h, kv_cache_inst::trim_length cannot be updated in the static function calc_output_layout(). Making kv_cache_inst::trim_length static to work around that does not make sense either, because it would lead to a data race across multiple kv-cache instances or multiple threads.
Furthermore, kernel_imp_params.h also includes other op-specific variables with a TODO comment (prior-box). So it seems acceptable in our case as well.

| Class | File |
|---|---|
| ov::intel_gpu::op::KVCache | src/plugins/intel_gpu/include/intel_gpu/op/kv_cache.hpp |
| cldnn::kv_cache | src/plugins/intel_gpu/include/intel_gpu/primitives/kv_cache.hpp |
| cldnn::typed_primitive_inst<kv_cache> | src/plugins/intel_gpu/src/graph/include/kv_cache_inst.h |
What about introducing a separate method in kv_cache_inst.h? You can add a non-static method and use it to store the information in primitive_inst. This API can then simply be called after shape inference. You should not place this field in kernel_impl_param.
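For readers following the thread, the suggestion above can be sketched roughly as follows. This is a simplified illustration, not the code that was merged: Layout, ImplParams, KVCacheInst, set_trim_length, get_trim_length and update_shape are hypothetical stand-ins for the real intel_gpu types, and the shape rule (past length plus new tokens along the concat axis) is an assumption made only to keep the example small. It shows a static calc_output_layout that cannot touch per-instance state, and a non-static setter called after shape inference so that each instance keeps its own trim length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; these are NOT the real intel_gpu types.
struct Layout {
    std::vector<int64_t> dims;  // e.g. [batch, heads, seq_len, head_size]
};

struct ImplParams {
    Layout past_kv;  // cache contents so far
    Layout new_kv;   // tokens appended in this iteration
};

class KVCacheInst {
public:
    // Static shape inference: there is no `this`, so it cannot write
    // per-instance state such as a trim length.
    static Layout calc_output_layout(const ImplParams& p, std::size_t concat_axis) {
        Layout out = p.past_kv;
        out.dims[concat_axis] += p.new_kv.dims[concat_axis];  // assumed rule
        return out;
    }

    // Non-static accessors: the value lives in the instance, so separate
    // instances (or threads owning separate instances) do not share it.
    void set_trim_length(int64_t value) { trim_length_ = value; }
    int64_t get_trim_length() const { return trim_length_; }

private:
    int64_t trim_length_ = 0;
};

// Hypothetical per-iteration update: run shape inference first, then
// record the runtime trim length on the instance.
inline Layout update_shape(KVCacheInst& inst, const ImplParams& p,
                           std::size_t concat_axis, int64_t trim_length) {
    Layout out = KVCacheInst::calc_output_layout(p, concat_axis);
    inst.set_trim_length(trim_length);
    return out;
}
```

With two KVCacheInst objects, calling update_shape on one leaves the other's trim length untouched, which is the property a static member would lose.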
Updated implementation looks good for trim_length.