In the DeepseekV2AttentionMLA implementation within sglang, is it correct that the prefill phase does not utilize KV cache, and the decode phase is when KV cache is used?

Prefill does not use the KV cache. Extend and decode do use it.
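
To make the distinction concrete, here is a minimal pure-Python sketch. It is not sglang's `DeepseekV2AttentionMLA` code; `attend`, `forward`, and the dict-based cache are hypothetical names for illustration, and the projections are trivial (q = k = v = x). It also assumes "extend" means processing several new tokens on top of an already cached prefix. The point is only that prefill has no cached entries to read, while extend and decode read what earlier steps stored.

```python
import math

def attend(q, keys, values):
    """Softmax attention for one query vector over the given keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]

def forward(new_tokens, cache):
    """Causal attention for new_tokens; appends their K/V to the cache.

    Returns (outputs, prefix_len), where prefix_len is the number of
    entries that were already cached, i.e. read back from earlier steps.
    """
    prefix_len = len(cache["k"])
    outputs = []
    for x in new_tokens:
        # Toy projections (assumption): q = k = v = x.
        cache["k"].append(x)
        cache["v"].append(x)
        outputs.append(attend(x, cache["k"], cache["v"]))
    return outputs, prefix_len

cache = {"k": [], "v": []}
_, prefill_read = forward([[1.0, 0.0], [0.0, 1.0]], cache)  # prefill: nothing cached yet
_, extend_read = forward([[1.0, 1.0], [0.5, 0.5]], cache)   # extend: reads the cached prefix
decode_out, decode_read = forward([[0.2, 0.8]], cache)      # decode: one token, reads the cache
```

In this sketch all three phases write new K/V entries; they differ in how many previously cached entries they read and how many new tokens they process.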

Additionally, when employing MLA, would it be advisable to avoid enabling the mix chunked feature?

Yes. Prefill and decode have different computation and memory access characteristics, so we optimize them with different forward logic. It's better to run them in separate batches.
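
As a rough illustration of "separate batches", the sketch below splits pending requests so that each group can go through its own forward path. This is not sglang's scheduler; `Request`, `is_decode`, and `split_batches` are hypothetical names, and the rule used to detect a decode step (exactly one new token on top of a non-empty cache) is an assumption made for the example.

```python
from dataclasses import dataclass

@dataclass
class Request:
    rid: str
    cached_tokens: int  # tokens already stored in the KV cache
    new_tokens: int     # tokens to process in this step

def is_decode(req):
    """Assumed rule: a decode step adds exactly one token to a non-empty cache."""
    return req.cached_tokens > 0 and req.new_tokens == 1

def split_batches(requests):
    """Return (prefill_or_extend, decode) instead of one mixed batch."""
    prefill_or_extend = [r for r in requests if not is_decode(r)]
    decode = [r for r in requests if is_decode(r)]
    return prefill_or_extend, decode

pending = [
    Request("a", cached_tokens=0, new_tokens=512),   # fresh prompt
    Request("b", cached_tokens=300, new_tokens=1),   # generating
    Request("c", cached_tokens=128, new_tokens=64),  # prompt with a cached prefix
    Request("d", cached_tokens=40, new_tokens=1),    # generating
]
prefill_batch, decode_batch = split_batches(pending)
```

Here requests "a" and "c" land in the first batch and "b" and "d" in the second, so neither forward path has to handle the other's workload shape.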

Answer selected by FL77N