Skip to content

Conversation

@orozery
Copy link
Contributor

@orozery orozery commented Dec 15, 2025

This PR enables cross-layer blocks for the triton attention backend.
An accuracy test is added to the CPU KV offloading test.

This commits enables cross-layer blocks for the triton attention backend.
An accuracy test is added to the CPU KV offloading test.

Signed-off-by: Or Ozeri <[email protected]>
@orozery orozery requested a review from tdoublep as a code owner December 15, 2025 10:19
@chatgpt-codex-connector
Copy link

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@mergify mergify bot added the v1 label Dec 15, 2025
@orozery
Copy link
Contributor Author

orozery commented Dec 15, 2025

cc @NickLucche @heheda12345
This should allow cross-layers to be used on models such as openai/gpt-oss-20b.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant