DeepSeek V3.2 support and optimization plan #154
Open
Description
To support the DeepSeek V3.2 model on XPU, we will add a Sparse MLA backend to vLLM, along with several new XPU kernels.
Functionality
- XPU MLA Sparse backend: Add XPU MLA Sparse backend for DeepSeek v3.2 (vllm#33230)
- Fall back to the Triton fp8 MoE backend for w8a8: [XPU] Support block fp8 moe by fallback to TritonExpert on XPU (vllm#36458)
- Enable SYCL kernels for the indexer in vLLM: [XPU] Enable topk_per_row and indexer_quant_cache kernels for DeepSeekV3.2 and GLM5 (vllm#37888)
  - This depends on the next vllm-xpu-kernel release.
Optimizations
- Add a SYCL kernel for indexer_k_quant_and_cache: Support indexer_k_quant_and_cache (#193)
- Add a SYCL kernel for cp_gather_indexer_k_quant_cache: Support cp_gather_indexer_k_quant_cache (#210)
- Add a SYCL kernel for top_k_per_row_prefill/decode: Add Sycl topk per row kernel (#191)
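For readers unfamiliar with the top-k-per-row step, here is a minimal pure-Python reference sketch of what such a kernel might compute. It assumes that each row of indexer scores has a valid prefix length and that the kernel returns the indices of the k largest scores within that prefix. The function name `topk_per_row_reference`, the `row_ends` parameter, and the `-1` padding are hypothetical and are not taken from the linked PRs; the actual SYCL kernel signatures may differ.

```python
import heapq
from typing import List, Sequence


def topk_per_row_reference(
    scores: Sequence[Sequence[float]],
    row_ends: Sequence[int],
    k: int,
    pad: int = -1,
) -> List[List[int]]:
    """Hypothetical reference: indices of the k largest scores in each row.

    Only positions [0, row_ends[i]) of row i are considered valid
    (an assumption made for illustration). Rows with fewer than k
    valid positions are padded with `pad`.
    """
    out: List[List[int]] = []
    for row, end in zip(scores, row_ends):
        valid = range(min(end, len(row)))
        # nlargest returns the selected indices in descending score order.
        top = heapq.nlargest(k, valid, key=lambda j: row[j])
        out.append(top + [pad] * (k - len(top)))
    return out


if __name__ == "__main__":
    scores = [[0.1, 0.9, 0.5, 0.7], [0.3, 0.2, 0.8, 0.6]]
    print(topk_per_row_reference(scores, row_ends=[4, 2], k=3))
```

A slow reference like this can be useful for checking a device kernel's output on small inputs, provided the comparison accounts for any difference in index ordering or tie-breaking.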