DeepSeek V3.2 support and optimization plan #154

To support the DeepSeek V3.2 model on XPU, we will add a Sparse MLA backend to vLLM, along with several new XPU kernels.

## Functionality

- [x] Add the XPU Sparse MLA backend. https://github.com/vllm-project/vllm/pull/33230
- [x] Fall back to the Triton FP8 MoE backend for W8A8. https://github.com/vllm-project/vllm/pull/36458
- [ ] Enable SYCL kernels for the indexer in vLLM. https://github.com/vllm-project/vllm/pull/37888
  - This depends on the next vllm-xpu-kernels release.

## Optimizations

- [x] Add a SYCL kernel for `indexer_k_quant_and_cache`. https://github.com/vllm-project/vllm-xpu-kernels/pull/193
- [x] Add a SYCL kernel for `cp_gather_indexer_k_quant_cache`. https://github.com/vllm-project/vllm-xpu-kernels/pull/210
- [x] Add SYCL kernels for `top_k_per_row_prefill` / `top_k_per_row_decode`. https://github.com/vllm-project/vllm-xpu-kernels/pull/191
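For reviewing the top-k kernels, a plain Python reference of per-row top-k selection can be handy as a correctness oracle. This is a minimal sketch under stated assumptions, not the actual kernel contract: it assumes each row of indexer scores has a valid column window `[row_starts[i], row_ends[i])`, that the kernel returns the column indices of the `k` largest scores in that window, and that short rows are padded with `-1`. The function name `top_k_per_row_reference` and its parameters are hypothetical.

```python
import heapq
from typing import List, Sequence


def top_k_per_row_reference(
    scores: Sequence[Sequence[float]],
    row_starts: Sequence[int],
    row_ends: Sequence[int],
    k: int,
) -> List[List[int]]:
    """Hypothetical reference for a per-row top-k selection.

    Assumption (not taken from the kernel sources): for each row i, only
    columns in [row_starts[i], row_ends[i]) are candidates, the result holds
    the indices of the k largest scores in descending score order, and rows
    with fewer than k candidates are padded with -1.
    """
    result: List[List[int]] = []
    for row, start, end in zip(scores, row_starts, row_ends):
        # Only columns inside the valid window are considered.
        candidates = range(start, end)
        # heapq.nlargest ranks candidate columns by their score in this row.
        best = heapq.nlargest(k, candidates, key=lambda col: row[col])
        # Pad so every output row has exactly k entries.
        best.extend([-1] * (k - len(best)))
        result.append(best)
    return result


if __name__ == "__main__":
    scores = [[0.1, 0.9, 0.5, 0.7], [0.3, 0.2, 0.8, 0.6]]
    print(top_k_per_row_reference(scores, [0, 1], [4, 3], k=3))
```

Under the same assumptions, a decode-style call would simply use `row_starts` of all zeros and `row_ends` equal to each request's sequence length; how the real prefill and decode kernels lay out their inputs may differ.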
