feat(Llama GQA): optional shared-KV matmul rewrite for attention. WIP: (env-gated) #905

Draft
vbaddi wants to merge 1 commit into quic:main from vbaddi:feat/enable_llama_gqa_shared_kv

Conversation

vbaddi (Contributor) commented on Apr 3, 2026

This PR adds an optional Llama GQA attention rewrite that avoids explicit KV expansion in the normal eager attention path. The optimization is gated by an environment variable and is OFF by default, so baseline behavior is unchanged unless enabled.

How to enable for testing
export QEFF_LLAMA_GQA_SHARED_KV=1
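For context, a minimal PyTorch sketch of the idea the PR description outlines (illustrative only; the function and variable names below are assumptions, not the PR's actual code): instead of materializing `repeat_kv`-expanded K/V tensors, the query heads are grouped so each group broadcasts its matmuls against a single shared K/V head.

```python
import torch

def gqa_attention_shared_kv(q, k, v):
    # Hypothetical sketch of a shared-KV GQA attention path.
    # q: (B, n_heads, T, D); k, v: (B, n_kv_heads, S, D),
    # with n_heads an integer multiple of n_kv_heads.
    B, n_heads, T, D = q.shape
    n_kv = k.shape[1]
    g = n_heads // n_kv  # query heads sharing one KV head

    # Group query heads: (B, n_kv, g, T, D). Each group of g query
    # heads attends to the same KV head via broadcasting, so the
    # expanded (B, n_heads, S, D) K/V copies are never materialized.
    qg = q.view(B, n_kv, g, T, D)
    scores = torch.matmul(qg, k.unsqueeze(2).transpose(-1, -2)) / D**0.5
    probs = torch.softmax(scores, dim=-1)          # (B, n_kv, g, T, S)
    out = torch.matmul(probs, v.unsqueeze(2))      # (B, n_kv, g, T, D)
    return out.view(B, n_heads, T, D)
```

Under these assumptions, the result matches the baseline eager path that expands K/V with `repeat_interleave` before the attention matmuls; the rewrite only changes how the grouped matmul is expressed, not the math.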

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
vbaddi marked this pull request as draft on April 3, 2026, 10:18


1 participant