
fix: consider kv cache 64 align to estimate kv cache memory size #417

Open

rebel-wonsubkim wants to merge 1 commit into dev from fix_cache_64align

Conversation

Contributor

@rebel-wonsubkim rebel-wonsubkim commented Feb 27, 2026

Problem: during the OPT model test, a KV cache OOM occurred.

Solution: when calculating the memory available for the KV cache, the device's 64 memory alignment for KV cache entries should be taken into account. If head_size is not aligned to 64 (= 128 B), the memory available for the KV cache should be adjusted accordingly.

+ Consider the device's 64B memory alignment for the KV cache
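The adjustment described above can be sketched as follows. This is an illustrative example only, not the actual PR diff: all names (`KV_ALIGN`, `round_up`, `kv_cache_bytes_per_token`) and the assumption that alignment applies to the head dimension in elements (64 fp16 elements = 128 B) are hypothetical.

```python
# Hypothetical sketch of the fix; names and structure are illustrative,
# not taken from the PR diff.

KV_ALIGN = 64  # assumed: device pads the head dimension to 64 elements


def round_up(x: int, align: int) -> int:
    """Round x up to the nearest multiple of align."""
    return (x + align - 1) // align * align


def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_size: int, dtype_bytes: int) -> int:
    """Per-token KV cache footprint with head_size padded to the alignment.

    The factor of 2 accounts for storing both keys and values.
    """
    padded_head_size = round_up(head_size, KV_ALIGN)
    return 2 * num_layers * num_kv_heads * padded_head_size * dtype_bytes


# A head_size of 80 pads to 128 elements, a 60% larger footprint than a
# naive estimate that ignores alignment would predict -- enough to cause
# an OOM when the naive estimate is used to size the cache.
naive = 2 * 12 * 12 * 80 * 2                          # no padding
padded = kv_cache_bytes_per_token(12, 12, 80, 2)      # with padding
```

With such a helper, the available-memory calculation can divide by the padded per-token size instead of the unpadded one, so the estimated cache capacity never exceeds what the device can actually hold.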

Signed-off-by: wonsub kim <subang0@rebellions.ai>
@rebel-wonsubkim rebel-wonsubkim changed the base branch from main to dev February 27, 2026 01:16
@rebel-wonsubkim rebel-wonsubkim changed the title from "fix : consider kv cache 64 align to estimate kv cache memory size" to "fix: consider kv cache 64 align to estimate kv cache memory size" Feb 27, 2026
Collaborator

@rebel-jiwoopark rebel-jiwoopark left a comment


lgtm

@rebel-jiwoopark rebel-jiwoopark added the torch.compile torch.compile based implementation label Mar 11, 2026
@rebel-jiwoopark
Collaborator

@rebel-wonsubkim If there are no problems, we can go ahead and merge this.


3 participants