[GPU] Add the capability for KV cache to update past KV #33114
Merged: Kotomi-Du merged 43 commits into openvinotoolkit:master from Kotomi-Du:update_kvcache_node on Jan 21, 2026
Commits
5a95cc6 fuse GQA slice node into kvCache for in-place crop (Kotomi-Du)
84d8095 fix conformance issue (Kotomi-Du)
ab5be7c Use RemoteTensor to reorder KV cache (mdvoretc-intel)
a0ed479 Add a kernel to reorder KV cache (mdvoretc-intel)
1c72628 Add KVCache index fusion for reorder (mdvoretc-intel)
4d67227 Fix basic issues (mdvoretc-intel)
ff94cfe Prevent KV reorder execution for cases where it's not required (mdvoretc-intel)
c85841f Fix scalar arguments bug, remove debug prints (mdvoretc-intel)
817c983 Remove unused gather_by_axis code (mdvoretc-intel)
b9d5f30 Fix input offsets (mdvoretc-intel)
07c75d8 Add unit test case (mdvoretc-intel)
1586380 Add feature bounds check (mdvoretc-intel)
d338bcc clean up code (Kotomi-Du)
648a5fc clean up execution stage (Kotomi-Du)
d7043fe use scatterElementUpdate kernel instead of self customized kernel (Kotomi-Du)
145e0f5 delete customized kernel path (Kotomi-Du)
e01cedf clean up code (Kotomi-Du)
4c2d73a fix code style (Kotomi-Du)
9db4cc7 adjust index for compressed KV stage when update_kv stage is existed (Kotomi-Du)
73a739e refactor tests, merge duplicated code (ZackyLake)
7914ddd refactor kvcache stage. (ZackyLake)
8e74647 remove update_kv logic on compress kv. (ZackyLake)
9bfb862 remove indirect support on kv_update due to lack of test. (ZackyLake)
5326963 add debug priont for skipped kernel (ZackyLake)
f321f9b Merge branch 'master' into update_kvcache_node (ZackyLake)
ea925ed Merge branch 'master' into update_kvcache_node (Kotomi-Du)
6cc91a1 fix kv fusion pattern。 (ZackyLake)
561bc54 Merge branch 'master' into update_kvcache_node (ZackyLake)
d836186 fix test (ZackyLake)
6344883 move trim_length to kv_cache_inst.h (Kotomi-Du)
4d06b8e fix kv fusion for test(stridedslice) (ZackyLake)
644cc75 include trim-only support (ZackyLake)
abcdf72 fix concat_axis signness (ZackyLake)
a1ecd57 fix fusion logic (ZackyLake)
f8f1a58 fix signedness (ZackyLake)
3d65d54 allow trim on indirect kvcache. (ZackyLake)
313a748 Merge branch 'master' into update_kvcache_node (Kotomi-Du)
c429feb Make CompressedKV compatible with trim. (ZackyLake)
6781639 fix (ZackyLake)
6a3bbd2 Merge branch 'master' into update_kvcache_node (Kotomi-Du)
3df8bac Merge branch 'master' into update_kvcache_node (Kotomi-Du)
765e167 Merge branch 'master' into update_kvcache_node (Kotomi-Du)
0dc75bc add comment (ZackyLake)
Could you elaborate on why/how it can be optimized?
If read_value is not optimized, we get incorrect results from scatterelementupdate, so some change here is needed.
The original code simply checks whether readvalue is used by a single user. To be honest, I don't know if that proves anything: that user could actually be a no-op with multiple further users.
From the comment in its caller, it looks like it is actually trying to ensure that assign will not impact any following user of readvalue, so the original logic already does not look very sound.
Anyway, in our case, readvalue's users eventually need to pass through kvcache before assign, which makes the kvcache node a dominator of the assign node. It can therefore safely be treated as if readvalue were directly connected to kvcache, and can be optimized.
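The dominance argument in this reply can be illustrated with a small sketch. This is not the plugin's code: the graph representation, the `dominates` helper, and the node names `reshape`, `shape_of`, and `sdpa` are all hypothetical, chosen only to mirror the situation described (a readvalue whose single user is a pass-through node with several users of its own). The check treats kvcache as a dominator of assign if assign cannot be reached from readvalue once kvcache is skipped.

```python
from collections import deque


def dominates(graph, entry, dominator, target):
    """Return True if every path from entry to target passes through dominator.

    graph maps a node name to the list of its users (hypothetical layout).
    If target is not reachable from entry at all, the result is vacuously True.
    """
    if dominator in (entry, target):
        return True
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for user in graph.get(node, ()):
            # Never walk through the candidate dominator.
            if user == dominator or user in seen:
                continue
            if user == target:
                return False  # found a path that bypasses the dominator
            seen.add(user)
            queue.append(user)
    return True


# readvalue has a single user, but that user fans out further, so a
# single-user check alone says little about what assign can affect.
fused = {
    "readvalue": ["reshape"],
    "reshape": ["kvcache", "shape_of"],
    "kvcache": ["assign", "sdpa"],
}

# Counterexample: reshape feeds assign directly, bypassing kvcache.
bypass = {
    "readvalue": ["reshape", "kvcache"],
    "reshape": ["assign"],
    "kvcache": ["assign"],
}

print(dominates(fused, "readvalue", "kvcache", "assign"))   # True
print(dominates(bypass, "readvalue", "kvcache", "assign"))  # False
```

In the first graph every route from readvalue to assign goes through kvcache, which is the condition the reply relies on; in the second graph it does not hold.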
Actually, my ask here was to add a comment on the "why/how". As it is not blocking the merge, could you follow up in a separate PR?