Checklist
- I searched related issues but found no solution.
- The bug persists in the latest version.
- Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
- If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
- Please use English. Otherwise, it will be closed.
Describe the bug
When enable_fused_set_kv_buffer is overridden to return False and test/registered/dllm/test_llada2_mini.py is run, the test fails.
The root cause is that the FlashInfer attention backend skips save_kv_cache, which is incorrect. After removing the following lines, the test passes again:
sglang/python/sglang/srt/layers/attention/flashinfer_backend.py, lines 815 to 816 in d73f06f:

```python
if save_kv_cache and layer.attn_type == AttentionType.ENCODER_ONLY:
    save_kv_cache = False
```
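In other words, the workaround is simply to delete those two lines. A sketch of the diff against d73f06f (indentation approximate):

```diff
-        if save_kv_cache and layer.attn_type == AttentionType.ENCODER_ONLY:
-            save_kv_cache = False
```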
Reproduction
- Modify this function so that it always returns False (a sketch of the change is shown after this list).
sglang/python/sglang/srt/models/utils.py, lines 107 to 114 in d73f06f:

```python
def enable_fused_set_kv_buffer(forward_batch: ForwardBatch):
    """Enable fused set_kv_buffer only on CUDA with bfloat16 KV cache."""
    return (
        _is_cuda
        and hasattr(forward_batch.token_to_kv_pool, "dtype")
        and forward_batch.token_to_kv_pool.dtype == torch.bfloat16
        and not isinstance(forward_batch.token_to_kv_pool, SWAKVPool)
    )
```

- Run:

```
pytest -xss test/registered/dllm/test_llada2_mini.py
```
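For concreteness, a minimal sketch of the step-1 modification (ForwardBatch is already imported by utils.py, since it appears in the original signature, so only the function body changes):

```python
# Sketch of the reproduction change to sglang/python/sglang/srt/models/utils.py:
# hard-code the helper to return False so the fused set_kv_buffer path is never
# taken and the attention backend has to write the KV cache itself.
def enable_fused_set_kv_buffer(forward_batch: ForwardBatch) -> bool:
    """Force-disable the fused set_kv_buffer path (for reproducing the bug only)."""
    return False
```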
Environment
H200