[Executorch][llm] Enable leveraging ring kv cache via module swap #10611

kimishpatel · 2025-05-01T17:19:19Z

Stack from ghstack (oldest at bottom):

This allows some of the attention modules to use a sliding-window (ring) KV cache, which will help enable models like gemma3.

Differential Revision: D73891426
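To illustrate the idea in the description, here is a minimal pure-Python sketch of a ring (sliding-window) KV cache and a swap step that installs it on selected attention layers. Every name here (`FullKVCache`, `RingKVCache`, `Attention`, `swap_in_ring_cache`) is hypothetical and chosen for illustration. This is not the ExecuTorch API or the code in this PR, which presumably operates on `torch.nn.Module` instances rather than plain Python objects.

```python
class FullKVCache:
    """Hypothetical baseline cache: keeps one entry per position."""

    def __init__(self, max_context_len):
        self.max_context_len = max_context_len
        self.entries = []

    def update(self, pos, key, value):
        self.entries.append((pos, key, value))


class RingKVCache:
    """Hypothetical ring cache: keeps only the last `window_size` positions."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.keys = [None] * window_size
        self.values = [None] * window_size
        # Absolute token position stored in each slot; -1 means empty.
        self.positions = [-1] * window_size

    def update(self, pos, key, value):
        # Wrap around so the newest token overwrites the oldest one.
        slot = pos % self.window_size
        self.keys[slot] = key
        self.values[slot] = value
        self.positions[slot] = pos

    def window(self):
        """Return (position, key, value) for live slots, oldest first."""
        live = [i for i in range(self.window_size) if self.positions[i] >= 0]
        live.sort(key=lambda i: self.positions[i])
        return [(self.positions[i], self.keys[i], self.values[i]) for i in live]


class Attention:
    """Hypothetical stand-in for an attention module that owns a KV cache."""

    def __init__(self, max_context_len):
        self.kv_cache = FullKVCache(max_context_len)


def swap_in_ring_cache(layers, layer_ids, window_size):
    """Replace the cache on the chosen layers only; other layers are untouched."""
    for i in layer_ids:
        layers[i].kv_cache = RingKVCache(window_size)


layers = [Attention(max_context_len=128) for _ in range(4)]
swap_in_ring_cache(layers, layer_ids=[0, 2], window_size=4)

# Write six tokens into a swapped layer; only the last four survive.
for pos in range(6):
    layers[0].kv_cache.update(pos, key=f"k{pos}", value=f"v{pos}")
```

The swap touches only the listed layers, so a model that mixes sliding-window and global attention layers can keep a full cache where it needs one.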


pytorch-bot · 2025-05-01T17:19:23Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10611

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

CI workflows being skipped on PR

❌ 7 New Failures

As of commit e47dfd9 with merge base cd3b53d:

NEW FAILURES - The following jobs have failed:

pull / test-llama-runner-qnn-linux (fp32, qnn_16a16w, qnn) / linux-job (gh)
RuntimeError: Command docker exec -t 7de0c2b972d3c8b626eca7512dc7fe9845fd98e39595be3875166afa352957e0 /exec failed with exit code 1
pull / test-llama-runner-qnn-linux (fp32, qnn_8a8w, qnn) / linux-job (gh)
RuntimeError: Command docker exec -t 2b1320fe9e0c5560ebf298cde3bb8cccea58ec6eff1d9db361579711b4bbc203 /exec failed with exit code 1
pull / unittest / linux / linux-job (gh)
examples/models/llama/tests/test_simple_sdpa.py::SDPATest::test_simple_sdpa
pull / unittest / macos / macos-job (gh)
examples/models/llama/tests/test_ring_attention.py::TestRingAttention::test_single_token_processing_custom
pull / unittest-arm-backend-with-no-fvp (test_pytest_ops) / linux-job (gh)
backends/arm/test/ops/test_bmm.py::TestBMM::test_bmm_single_input_tosa_MI_2
pull / unittest-editable / linux / linux-job (gh)
examples/models/llama/tests/test_simple_sdpa.py::SDPATest::test_simple_sdpa
pull / unittest-editable / macos / macos-job (gh)
backends/xnnpack/test/ops/test_conv1d.py::TestConv1d::test_qs8_conv1d_batchnorm_seq

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2025-05-01T17:19:56Z

This pull request was exported from Phabricator. Differential Revision: D73891426


facebook-github-bot · 2025-05-05T14:08:06Z

This pull request was exported from Phabricator. Differential Revision: D73891426

kimishpatel requested review from lucylq and jackzhxng as code owners May 1, 2025 17:19

facebook-github-bot added the CLA Signed label May 1, 2025

facebook-github-bot added the fb-exported label May 1, 2025

kimishpatel added the release notes: examples label May 5, 2025

digantdesai approved these changes May 5, 2025
