about MLA kv cache #4156

FL77N · 2025-03-07T01:55:56Z

Hello, I'd like to kindly ask for clarification: in the DeepseekV2AttentionMLA implementation in sglang, is it correct that the prefill phase does not use the KV cache, and that the KV cache is used only in the decode phase? Additionally, when using MLA, is it advisable to avoid enabling the mixed chunk feature? Thank you for your guidance.

Answered by ispobock

ispobock (Collaborator) · 2025-03-07T14:14:48Z

> In the DeepseekV2AttentionMLA implementation within sglang, is it correct that the prefill phase does not utilize KV cache, and the decode phase is when KV cache is used?

Prefill will not use KV cache. Extend and decode will use KV cache.
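To make the three terms concrete, here is a simplified routing sketch. It assumes that "prefill" in this reply means an extend-style batch in which no request has anything cached yet, and that "extend" means at least one request already has cached tokens (for example after a prefix-cache hit, or in a later chunk of a chunked prefill). MlaPath, Batch and choose_mla_path are hypothetical names, not sglang's API; the enum values only reuse the forward_normal/forward_absorb method names mentioned later in this thread, and the real dispatch in DeepseekV2AttentionMLA may depend on further conditions such as the attention backend.

```python
from dataclasses import dataclass
from enum import Enum


class MlaPath(Enum):
    NORMAL = "forward_normal"  # MHA-style: expand latents of the current tokens to full K/V
    ABSORB = "forward_absorb"  # weights absorbed: attend directly over cached latent vectors


@dataclass
class Batch:
    is_decode: bool
    prefix_lens: list[int]  # tokens already cached per request before this forward pass


def choose_mla_path(batch: Batch) -> MlaPath:
    """Hypothetical routing rule for illustration, not sglang's actual code."""
    if batch.is_decode:
        # Each new token must attend over everything already in the KV cache.
        return MlaPath.ABSORB
    if sum(batch.prefix_lens) == 0:
        # "Prefill": nothing is cached yet, so attention only needs K/V of the
        # tokens in this batch and never reads previously cached entries.
        return MlaPath.NORMAL
    # "Extend": at least one request has cached tokens that must be read back.
    return MlaPath.ABSORB


print(choose_mla_path(Batch(is_decode=False, prefix_lens=[0, 0])))    # MlaPath.NORMAL
print(choose_mla_path(Batch(is_decode=False, prefix_lens=[0, 512])))  # MlaPath.ABSORB
print(choose_mla_path(Batch(is_decode=True, prefix_lens=[300])))      # MlaPath.ABSORB
```

In this reading, "does not use the KV cache" means a prefill does not need to read cached entries; its own K/V still have to end up in the cache for the decode steps that follow.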

> Additionally, when employing MLA, would it be advisable to avoid enabling the mix chunked feature?

Yes. Prefill and decode have different computation and memory-access characteristics, so we optimize them with different forward logic. It's better to run them in separate batches.
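A back-of-envelope multiply-accumulate (MAC) count can illustrate the "different computation characteristics". This sketch assumes DeepSeek-V2/V3-style MLA sizes (kv_lora_rank 512, RoPE dim 64, non-RoPE dim 128, value dim 128, 128 heads; check your model's config), ignores causal masking, the projections both paths share, and all memory traffic. attention_macs is a hypothetical helper, not sglang code. The normal path pays to expand every attended latent into per-head K/V, so its extra cost scales with the number of keys; the absorbed path folds those weights into the queries and outputs instead, so its extra cost scales with the number of queries, but each query-key pair works in the wider latent space.

```python
# Assumed DeepSeek-V2/V3-style MLA sizes; check your model's config.json.
KV_LORA_RANK = 512  # latent width cached per token (besides the RoPE part)
ROPE_DIM = 64
NOPE_DIM = 128
V_DIM = 128
NUM_HEADS = 128


def attention_macs(n_q: int, n_kv: int, absorbed: bool) -> int:
    """Rough MAC count for one MLA layer's attention core (no causal mask)."""
    # MACs to map one latent to one head's K_nope and V, or, when absorbed,
    # one head's query into latent space and its output back.
    up_proj = KV_LORA_RANK * (NOPE_DIM + V_DIM)
    if absorbed:
        proj = n_q * up_proj  # scales with the number of queries
        per_pair = (KV_LORA_RANK + ROPE_DIM) + KV_LORA_RANK  # scores over 576 dims, values over 512
    else:
        proj = n_kv * up_proj  # every attended latent is expanded to full K/V
        per_pair = (NOPE_DIM + ROPE_DIM) + V_DIM  # scores over 192 dims, values over 128
    return NUM_HEADS * (proj + n_q * n_kv * per_pair)


N = 32 * 1024
decode = attention_macs(1, N, absorbed=False) / attention_macs(1, N, absorbed=True)
prefill = attention_macs(N, N, absorbed=True) / attention_macs(N, N, absorbed=False)
print(f"decode, 1 query over {N} cached tokens: normal/absorb = {decode:.1f}x")  # 120.3x
print(f"prefill, {N} new tokens: absorb/normal = {prefill:.2f}x")  # 3.37x
```

Under these assumptions the absorbed path does about 120x less attention work for one decode step, while the normal path needs roughly 30% of the absorbed path's work for a long fresh prefill. If the path is chosen once per batch, a batch that mixes both kinds of requests must accept the worse path for one of them, which is one way to read the reply's advice to keep them in separate batches.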

2 replies

FL77N (Author) · Mar 10, 2025

Thank you for your reply!

imoisture Oct 30, 2025

What is the difference between Extend and Prefill mode? 🦾
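One way to see the Extend/Prefill distinction in practice is chunked prefill of a long prompt. This sketch assumes a forward pass counts as "prefill" when nothing before its tokens is cached yet and as "extend" otherwise; chunk_schedule and the 8192-token chunk size are hypothetical illustrations, not sglang's scheduler or its defaults.

```python
def chunk_schedule(prompt_len, chunk_size, cached_len=0):
    """Return (start, end, prefix_len, mode) for each chunk of a chunked prefill.

    cached_len: prompt tokens already in the KV cache (e.g. a prefix-cache hit).
    """
    chunks = []
    for start in range(cached_len, prompt_len, chunk_size):
        end = min(start + chunk_size, prompt_len)
        prefix_len = start  # everything before this chunk is already cached
        mode = "prefill" if prefix_len == 0 else "extend"
        chunks.append((start, end, prefix_len, mode))
    return chunks


for start, end, prefix_len, mode in chunk_schedule(32 * 1024, 8192):
    print(f"tokens [{start:5d}, {end:5d})  cached prefix={prefix_len:5d}  -> {mode}")
# tokens [    0,  8192)  cached prefix=    0  -> prefill
# tokens [ 8192, 16384)  cached prefix= 8192  -> extend
# tokens [16384, 24576)  cached prefix=16384  -> extend
# tokens [24576, 32768)  cached prefix=24576  -> extend
```

Under these assumptions only the first chunk of a 32k prompt is a prefill; every later chunk, and any request that starts with a prefix-cache hit, is an extend that must read cached entries.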

wejoncy · 2025-03-17T11:55:10Z

Hi @ispobock,
May I ask whether it is possible to take the normal branch in the prefill/extend phase? I am trying to enable it for a 32k-token prefill, but the output is incorrect. Profiling does show, though, that forward_normal() is faster than forward_absorb().

Could you point me to how to enable forward_normal() with extend support for the subsequent chunks? Many thanks.
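This question turns on what "extend support" for the normal path would have to include. Here is a toy pure-Python check under loudly stated assumptions: one head, no decoupled RoPE part, tiny random weights, and hypothetical helpers (attend_normal, attend_absorb) that are not sglang code. It illustrates that for a later chunk, the normal path matches the absorbed path only if the cached prefix latents are also expanded to K/V and attended over; running the normal path over the new chunk alone, as for a fresh prefill, gives a different answer. Whether this explains the incorrect 32k output is an assumption, not a diagnosis.

```python
import math
import random


def matvec(v, m):
    """Row vector v (length r) times matrix m (r x c), returning length c."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend_normal(queries, latents, w_uk, w_uv, prefix_len):
    """MHA-style: expand every attended latent to a full key and value first."""
    keys = [matvec(c, w_uk) for c in latents]
    vals = [matvec(c, w_uv) for c in latents]
    scale = 1.0 / math.sqrt(len(w_uk[0]))
    outs = []
    for i, q in enumerate(queries):
        visible = prefix_len + i + 1  # causal: cached prefix plus chunk tokens up to i
        p = softmax([dot(q, k) * scale for k in keys[:visible]])
        outs.append([sum(p[j] * vals[j][d] for j in range(visible)) for d in range(len(vals[0]))])
    return outs


def attend_absorb(queries, latents, w_uk, w_uv, prefix_len):
    """Absorbed: fold W_UK into the query and W_UV into the output, attend over latents."""
    w_uk_t = [list(col) for col in zip(*w_uk)]
    scale = 1.0 / math.sqrt(len(w_uk[0]))
    outs = []
    for i, q in enumerate(queries):
        q_lat = matvec(q, w_uk_t)  # query projected into latent space
        visible = prefix_len + i + 1
        p = softmax([dot(q_lat, c) * scale for c in latents[:visible]])
        o_lat = [sum(p[j] * latents[j][d] for j in range(visible)) for d in range(len(latents[0]))]
        outs.append(matvec(o_lat, w_uv))  # apply W_UV after the weighted sum
    return outs


def randn(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


random.seed(0)
D_C, D_H = 4, 3       # toy latent width and head size
PREFIX, CHUNK = 5, 3  # tokens already cached, tokens in the new chunk
w_uk, w_uv = randn(D_C, D_H), randn(D_C, D_H)
latents = randn(PREFIX + CHUNK, D_C)  # cached prefix latents, then the new chunk's
queries = randn(CHUNK, D_H)

reference = attend_absorb(queries, latents, w_uk, w_uv, PREFIX)
with_prefix = attend_normal(queries, latents, w_uk, w_uv, PREFIX)
# Normal path over the new chunk only, as if it were a fresh prefill.
chunk_only = attend_normal(queries, latents[PREFIX:], w_uk, w_uv, 0)

print(max_abs_diff(reference, with_prefix))  # rounding-level difference
print(max_abs_diff(reference, chunk_only))   # a real mismatch
```

The two paths are algebraically equal because q·(c W_UK) = (q W_UKᵀ)·c and the weighted sum commutes with W_UV, so the cost trade-off, not correctness, is what differs once the cached prefix is included.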
