v0.11.0 #272
xyDong0223
announced in
Announcements
vLLM-Kunlun v0.11.0
vLLM-Kunlun v0.11.0 featured 154 commits from 31 contributors (including new contributors)!
✨ Highlights
🤖 DeepSeek-V3/R1/V3.2 Full Support
vLLM-Kunlun v0.11.0 delivers complete support for the DeepSeek model family on Kunlun hardware:
- --compilation-config is no longer required for DeepSeek-V3.1 ([Bugfix]remove mla patch, server args with no --compilation-config for ds v3.1 #145)
🔀 Multi-LoRA Inference Optimization
🔍 Embedding Model Support
🗜️ Quantization Enhancements
⚡ Kernel Optimizations
- topk_per_row kernel to optimize Top-K index calculation ([Kernel] add topk_per_row to optimize the calculation of topk_indexes #168)
- flashinfer_rotary_embedding and fast_topkv2 kernels; optimized int8_paged_mqa_logits with parallelism ([Feature][DS32] add 2 kernels and optimize the calculation of topk_indices #134, [Feature][DS32] Add kernels to optimize RoPE and the decoding stage #143)
🆕 New Models
🔧 Features
🔍 Embedding
🗜️ Quantization
⚡ Kernels
- topk_per_row to optimize Top-K index calculation ([Kernel] add topk_per_row to optimize the calculation of topk_indexes #168) by @fromck
- flashinfer_rotary_embedding and fast_topkv2 kernels, plus optimized topk_indices calculation ([Feature][DS32] add 2 kernels and optimize the calculation of topk_indices #134) by @fromck
- gemma_rmsnorm, moe_pre_small, and split_norm_rope kernels ([Feature] Add gemma_rmsnorm, moe_pre_small and split_norm_rope. #180) by @Hanyu-Jin
🔀 Multi-LoRA
🔮 MTP (Multi-Token Prediction)
- apply_top_k_top_p ([Model] Support Qwen3-Next MTP #268) by @ldh2020
🏗️ Infrastructure
- torch.ops using OOT method ([Update] 1/N Unified the registration of custom operators to torch.ops and fixed some minor issues #203, [Update] 1/N for v0.15.1 Implement and register Fused MoE Kunlun kernels using OOT method #209) by @xyDong0223
- layernorm, rotary_embedding, and vocab_parallel_embedding via @CustomOp.register_oot ([Feature]Using @CustomOp.register_oot to register layernorm/rotary_embedding/vocab_parallel_embdding #234) by @lishiyong110
- collect_env feature for environment diagnostics ([Misc] add collect_env feat #218) by @Lidang-Jiang
🐛 Bug Fixes
- torch.ops._C instead of _kunlun ([Bugfix] fix error for cudagraph, bind weak_ref_tensor to torch.ops._C instead of _kunlun #220) by @lishiyong110
- kunlun_scale_mm bias bug ([fix]bias bug in kunlun_scale_mm #126) by @liwei109
- cutlass_scaled_mm inference error ([fix] resolve cutlass_scaled_mm inference error #82) by @tangshiwen
- KeyError: ((1, 1, 3), '<i8') ([Bug] Fix InternVL KeyError: ((1, 1, 3), '<i8') #108) by @Lidang-Jiang
- apply_top_k_top_p not applied issue ([Bug] Fix no apply_top_k_top_p issue. #101) by @Hanyu-Jin
- compressed_tensors import error ([Bugfix] fix can not import compressed_tensors #87) by @baoqian426
- cocopod ops not found ([Bugfix] cocopod ops can't be finded #242) by @liwei109
- xspeedgate_ops import in Kunlun ops and FLA chunk ([Bugfix] fix miss import xspeedgate_ops in fla chunk #237, [Bugfix] fix miss import xspeedgate_ops in kunlun ops #238) by @xyDong0223
- transformers 4.57 ([BugFix] Adapt GLM5 config for transformers 4.57 #207) by @tangshiwen
- apply_repetition_penalties_ in custom op (register apply_repetition_penalties_ in custom_op #110) by @roger-lcc
🔬 CI / Build
- .pre-commit-config.yaml, add _pylint.yml ([CI/Build] update .pre-commit-config.yaml && add _pylint.yml && updat… #155) by @WeiJie-520
- PULL_REQUEST_TEMPLATE.md and ISSUE_TEMPLATE (【Docs】add PULL_REQUEST_TEMPLATE.md and ISSUE_TEMPLATE #56) by @tanjunchen
- CODE_OF_CONDUCT.md, MAINTAINERS.md, and contributing guide (【Docs】update readme and contributing guide #55) by @tanjunchen
📝 Documentation
- Replace conda with uv; integrate xpytorch and ops into image ([Doc] update base image url(1.Replace conda with uv; 2.Integrate xpyt… #146) by @WeiJie-520
- xspeedgate_ops documentation ([Doc] update xspeedgate_ops (20260130) #188) by @WeiJie-520
- --compilation-config from all documentation; P800 no longer requires this parameter ([Doc] Remove --compilation-config from all docs #253) by @Lidang-Jiang
📋 What's Changed
🎉 New Contributors
We warmly welcome all first-time contributors to vLLM-Kunlun!
Full Changelog: https://github.com/xyDong0223/vLLM-Kunlun-kunlunops/commits/main
Full Changelog: v0.11.0rc1...v0.11.0rc2
This discussion was created from the release v0.11.0.