Commit bbe9bc0
[v0.13.0][Feature] Add DeepSeek v4 initial support (#8648)
### What this PR does / why we need it?
**DeepSeek V4 Support**: Added support for DeepSeek V4 by introducing
new operators and infrastructure, including the Compressor operator and
associated tiling logic.
Note: this PR targets v0.13.0. After it is merged, a special release,
v0.13.0rc3, will be published. The vLLM Ascend team will also start
rebasing this work onto the main branch soon.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- CI passed
- Ran an e2e test with
https://modelscope.cn/models/deepseek-ai/DeepSeek-V4-Flash
---------
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: GDzhu01 <809721801@qq.com>
Signed-off-by: LookAround0301 <lixushi@huawei.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
Signed-off-by: WithHades <244036962@qq.com>
Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com>
Signed-off-by: coder-fny <985619145@qq.com>
Signed-off-by: slippersss <slippersss@126.com>
Signed-off-by: yiz-liu <liu_yizhou@outlook.com>
Signed-off-by: maoxx241 <maomaoyu870@gmail.com>
Signed-off-by: zhaozx-cn <zhaozx2116@163.com>
Signed-off-by: wxh571001500 <571001500@qq.com>
Signed-off-by: lcfenglinwan <lcfenglin@qq.com>
Signed-off-by: zhenwenqi_2024 <zhenwenqi_2022@qq.com>
Signed-off-by: anon189Ty <Stari_Falcon@outlook.com>
Signed-off-by: monologue815 <monologue815@qq.com>
Signed-off-by: Liexss <924834690@qq.com>
Signed-off-by: pinfa <1819563383@qq.com>
Signed-off-by: weinachuan <1173732899@qq.com>
Signed-off-by: chenchris2 <1349418798@qq.com>
Signed-off-by: realliujiaxu <realliujiaxu@163.com>
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Co-authored-by: GDzhu01 <809721801@qq.com>
1 parent 2e5f72f commit bbe9bc0
900 files changed
Lines changed: 136446 additions & 1786 deletions
File tree
- .github/workflows
- csrc
- add_rms_norm_bias/op_host
- attention
- compressor
- op_host
- op_kernel
- arch32
- arch35
- vf
- indexer_compress_epilog
- op_host
- op_kernel
- inplace_partial_rotary_mul
- op_host
- op_kernel
- kv_compress_epilog
- op_host
- op_kernel
- lightning_indexer_custom
- op_host
- op_kernel
- lightning_indexer_quant_metadata
- examples
- op_api
- op_graph
- op_host
- op_kernel_aicpu
- quant_lightning_indexer_metadata
- op_api
- op_graph
- op_host
- op_kernel_aicpu
- quant_lightning_indexer
- op_host
- op_kernel
- arch32
- arch35
- vf
- rms_norm_dynamic_quant
- op_host
- op_kernel
- sparse_flash_attention_custom
- op_host
- op_kernel
- cmake
- modules
- scripts
- custom
- examples
- utest
- util
- third_party
- build/modules/patch
- common
- aicpu
- include
- common
- err
- external
- aclnn_kernels
- common
- fallback
- framework
- kernel
- op_graph
- static
- tiling_base
- tiling_sink
- src
- framework
- tiling_base
- tiling_sink
- stub
- inc/framework
- op_api
- aclnn_kernels
- common
- level0
- op_tiling
- register
- dispatch_layout/op_host
- gmm
- grouped_matmul_swiglu_quant_clamp
- op_host
- op_api
- op_kernel
- grouped_matmul_swiglu_quant_weight_nz_tensor_list
- op_host
- op_api
- op_kernel
- grouped_matmul_swiglu_quant_weight_nz_tensor_list/op_host
- matmul_allreduce_add_rmsnorm/op_host
- mc2
- dispatch_ffn_combine
- op_host
- op_api
- op_kernel
- moe_init_routing_quant_v2
- unpermute
- utils
- dispatch_gmm_combine_decode
- op_host
- op_api
- op_kernel
- dispatch_gmm_combine_decode
- epilogue
- block
- tile
- gemm
- block
- kernel
- raw_distributed
- dispatch_layout
- op_host
- op_api
- op_kernel
- kernel
- matmul_allreduce_add_rmsnorm
- op_host
- op_api
- op_kernel
- moe_combine_normal
- op_host
- op_api
- op_kernel
- utils
- moe_dispatch_normal
- op_host
- op_api
- op_kernel
- utils
- notify_dispatch
- op_host
- op_api
- op_kernel
- kernel
- moe_gating_top_k/op_host
- moe_init_routing_custom/op_host
- moe
- add_rms_norm_bias
- op_host
- op_kernel
- hc_post
- op_host
- op_kernel
- arch35
- hc_pre_inv_rms
- op_host
- op_kernel
- arch35
- hc_pre_sinkhorn
- op_host
- op_kernel
- arch35
- moe_gating_top_k_hash
- op_host
- op_kernel
- arch35
- moe_gating_top_k
- op_host
- op_kernel
- tiling_base
- moe_init_routing_custom
- op_host
- op_kernel
- notify_dispatch/op_host
- scripts
- opgen
- template
- add
- examples
- op_host
- op_kernel
- tests/ut
- package
- common
- cfg
- py
- utils
- sh
- latest_manager/scripts
- module/ascend
- ops_transformer
- scripts
- empty_package_scripts
- util
- utils/inc/kernel
- docs/source
- tutorials
- user_guide/support_matrix
- tests
- e2e
- multicard/2-cards
- nightly/single_node/ops/singlecard_ops
- singlecard
- ut/ops
- vllm_ascend
- attention
- compilation
- passes
- core
- distributed
- models
- layer/attention
- ops
- fused_moe
- triton
- patch
- platform
- worker
- quantization
- compressed_tensors
- spec_decode
- transformers_utils/configs
- worker