## Overview
First release of RTP-LLM, version 0.2.0 (September 2025).
## Features
### Framework Advanced Features
- PD Disaggregation and PD Entrance Transpose
- Additional attention backends: XQA, FlashInfer
- Speculative Decoding
- EPLB
- MicroBatch & Overlapping
- MTP
- DeepEP
- Load Balancing
- 3FS
- FP8 KVCache
- KV Cache Reuse
- Quantization
- MultiLoRA
- Attention FFN Disaggregation
- Frontend/Backend Disaggregation
## New Models
| Model Family (Variants) | Example HuggingFace Identifier | Description | Supported CardType |
|---|---|---|---|
| DeepSeek (v1, v2, v3/R1) | deepseek-ai/DeepSeek-R1 | Series of advanced reasoning-optimized models (including a 671B MoE) trained with reinforcement learning; top performance on complex reasoning, math, and code tasks. RTP-LLM provides DeepSeek v3/R1 model-specific optimizations. | NV ✅ AMD ✅ |
| Kimi (Kimi-K2) | moonshotai/Kimi-K2-Instruct | Moonshot's MoE LLMs with 1 trillion parameters, exceptional at agentic intelligence. | NV ✅ AMD ✅ |
| Qwen (v1, v1.5, v2, v2.5, v3, QWQ, Qwen3-Coder) | Qwen/Qwen3-235B-A22B | Series of advanced reasoning-optimized models with significantly improved performance on logical reasoning, mathematics, science, coding, and academic benchmarks, achieving state-of-the-art results among open-source thinking models; markedly better general capabilities (instruction following, tool usage, text generation, alignment with human preferences); enhanced 256K long-context understanding. | NV ✅ AMD ✅ |
| QwenVL (VL2, VL2.5, VL3) | Qwen/Qwen2-VL-2B | Series of advanced vision-language models based on Qwen2.5/Qwen3. | NV ✅ AMD ❌ |
| Llama | meta-llama/Llama-4-Scout-17B-16E-Instruct | Meta's open LLM series, spanning 7B to 400B parameters (Llama 2, 3, and the new Llama 4) with well-recognized performance. | NV ✅ AMD ✅ |
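The models above are served through RTP-LLM's HTTP interface. Below is a minimal sketch of querying a running server with an OpenAI-style chat completions request; the host, port, endpoint path, and model name are assumptions, so adapt them to your deployment.

```python
import requests

# Assumed address of a locally running RTP-LLM server; adjust host/port
# and endpoint path to match your deployment.
URL = "http://localhost:8080/v1/chat/completions"

payload = {
    # Model name as configured at server start; this identifier is
    # illustrative only.
    "model": "Qwen/Qwen3-235B-A22B",
    "messages": [{"role": "user", "content": "Explain MoE routing in one sentence."}],
    "max_tokens": 128,
}

resp = requests.post(URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```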
## Bug Fixes
- P/D Disaggregation deadlock caused by requests being canceled or failing before remote execution starts
- Raw request streams with stop_words causing a spurious hang
## Known Issues
- In the 3FS case, when using Frontend Disaggregation in P/D mode, more memory is needed; alternatively, set FRONTEND_SERVER_COUNT=1 to reduce frontend_server memory usage (see the sketch after this list).
- Serving many dynamic LoRA adapters requires a larger reserver_runtime_mem_mb.
- MoE models are not supported on AMD.
- MoE models without a shared expert cannot use enable-layer-micro-batch.
- P/D Disaggregation with EPLB and an MTP step > 1 may cause the prefill stage to hang.
- VL model embeddings are broken because position IDs are computed incorrectly.
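A minimal sketch of applying the two memory-related workarounds above when launching the backend. The entry-point module and the environment-variable spelling of reserver_runtime_mem_mb are assumptions; verify them against your own launch scripts.

```python
import os
import subprocess

# Workaround for frontend memory pressure in P/D + 3FS deployments:
# run a single frontend server process (from the known issues above).
os.environ["FRONTEND_SERVER_COUNT"] = "1"

# Workaround for many dynamic LoRA adapters: enlarge reserved runtime
# memory. This is an assumed env-var form of reserver_runtime_mem_mb.
os.environ["RESERVER_RUNTIME_MEM_MB"] = "4096"

# Illustrative launch command; substitute your actual RTP-LLM entry point.
subprocess.run(["python", "-m", "rtp_llm.start_server"], check=True)
```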
## Performance

## Compatibility

## Package
### Docker Image
| CardType | Image | Tag |
|---|---|---|
| CUDA-SM9x | ali-hangzhou-hub-registry.cn-hangzhou.cr.aliyuncs.com/isearch/rtp_llm_sm9x_opensource | 0.2.0_0.2.0_2025_10_31_10_23_d1e93ce12 |
| CUDA-SM8x | ali-hangzhou-hub-registry.cn-hangzhou.cr.aliyuncs.com/isearch/rtp_llm_sm8x_opensource | 0.2.0_0.2.0_2025_10_31_10_23_d1e93ce12 |
| CUDA-SM7x | ali-hangzhou-hub-registry.cn-hangzhou.cr.aliyuncs.com/isearch/rtp_llm_sm7x_opensource | 0.2.0_0.2.0_2025_10_31_10_23_d1e93ce12 |
| Frontend | ali-hangzhou-hub-registry.cn-hangzhou.cr.aliyuncs.com/isearch/rtp_llm_frontend-opensource | 0.2.0_0.2.0_2025_10_31_10_23_d1e93ce12 |
| AMD | ali-hangzhou-hub-registry.cn-hangzhou.cr.aliyuncs.com/isearch/rtp_llm_rocm_opensource | 0.2.0_0.2.0_2025_10_31_10_23_d1e93ce12 |
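To fetch one of the images above, a small helper can assemble the full reference and shell out to Docker (a sketch; it assumes a local Docker daemon and access to the registry):

```python
import subprocess

REGISTRY = "ali-hangzhou-hub-registry.cn-hangzhou.cr.aliyuncs.com/isearch"
IMAGE = "rtp_llm_sm9x_opensource"  # CUDA-SM9x image from the table above
TAG = "0.2.0_0.2.0_2025_10_31_10_23_d1e93ce12"

# Equivalent to: docker pull <registry>/<image>:<tag>
subprocess.run(["docker", "pull", f"{REGISTRY}/{IMAGE}:{TAG}"], check=True)
```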
### Wheel
Images for master are coming soon.