v0.2.0

Latest

@jianglan89 jianglan89 released this 31 Oct 07:54
· 9 commits to release/0.2.0 since this release

Overview

RTP-LLM first release, version 0.2.0 (2025.09)

Features

Framework Advanced Features

New Models

| Model Family (Variants) | Example HuggingFace Identifier | Description | Support (CardType) |
| --- | --- | --- | --- |
| DeepSeek (v1, v2, v3/R1) | deepseek-ai/DeepSeek-R1 | Series of advanced reasoning-optimized models (including a 671B MoE) trained with reinforcement learning; top performance on complex reasoning, math, and code tasks. RTP-LLM provides DeepSeek v3/R1 model-specific optimizations. | NV ✅<br>AMD ✅ |
| Kimi (Kimi-K2) | moonshotai/Kimi-K2-Instruct | Moonshot's MoE LLMs with 1 trillion parameters, exceptional at agentic intelligence. | NV ✅<br>AMD ✅ |
| Qwen (v1, v1.5, v2, v2.5, v3, QwQ, Qwen3-Coder) | Qwen/Qwen3-235B-A22B | Series of advanced reasoning-optimized models. Significantly improved performance on reasoning tasks, including logical reasoning, mathematics, science, coding, and academic benchmarks that typically require human expertise, achieving state-of-the-art results among open-source thinking models; markedly better general capabilities such as instruction following, tool usage, text generation, and alignment with human preferences; enhanced 256K long-context understanding. | NV ✅<br>AMD ✅ |
| QwenVL (VL2, VL2.5, VL3) | Qwen/Qwen2-VL-2B | Vision-language model series based on Qwen2.5/Qwen3. | NV ✅<br>AMD ❌ |
| Llama | meta-llama/Llama-4-Scout-17B-16E-Instruct | Meta's open LLM series, spanning 7B to 400B parameters (Llama 2, 3, and the new Llama 4), with well-recognized performance. | NV ✅<br>AMD ✅ |

Bug Fixes

  • P/D disaggregation deadlock caused by requests being canceled or failing before remote execution starts
  • stop_words in raw streaming requests causing an apparent hang

Known Issues

  • In the 3FS case with frontend disaggregation under P/D, more memory is needed, or set FRONTEND_SERVER_COUNT=1 to reduce frontend_server memory usage.
  • Serving many dynamic LoRAs requires a larger reserver_runtime_mem_mb.
  • AMD does not support MoE models.
  • MoE models without shared experts cannot use enable-layer-micro-batch.
  • P/D disaggregation with EPLB and MTP step > 1 may cause the prefill stage to hang.
  • Embeddings from VL models are incorrect because the position IDs are wrong.
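For the first two items above, a workaround can be sketched as launch-time environment settings. FRONTEND_SERVER_COUNT is taken from the issue text; the runtime-memory variable name below is an assumption derived from the reserver_runtime_mem_mb setting mentioned above, and the value is illustrative only:

```shell
# Reduce frontend_server memory usage in the P/D frontend-disaggregation case
# (setting named in the known issues above):
export FRONTEND_SERVER_COUNT=1

# Reserve more runtime memory when serving many dynamic LoRAs
# (variable name assumed from reserver_runtime_mem_mb; value is illustrative):
export RESERVER_RUNTIME_MEM_MB=4096
```

These would be exported in the environment that starts the server process.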

Performance

Compatibility

Package

Docker Image

| CardType | Image | Tag |
| --- | --- | --- |
| CUDA-SM9x | ali-hangzhou-hub-registry.cn-hangzhou.cr.aliyuncs.com/isearch/rtp_llm_sm9x_opensource | 0.2.0_0.2.0_2025_10_31_10_23_d1e93ce12 |
| CUDA-SM8x | ali-hangzhou-hub-registry.cn-hangzhou.cr.aliyuncs.com/isearch/rtp_llm_sm8x_opensource | 0.2.0_0.2.0_2025_10_31_10_23_d1e93ce12 |
| CUDA-SM7x | ali-hangzhou-hub-registry.cn-hangzhou.cr.aliyuncs.com/isearch/rtp_llm_sm7x_opensource | 0.2.0_0.2.0_2025_10_31_10_23_d1e93ce12 |
| Frontend | ali-hangzhou-hub-registry.cn-hangzhou.cr.aliyuncs.com/isearch/rtp_llm_frontend-opensource | 0.2.0_0.2.0_2025_10_31_10_23_d1e93ce12 |
| AMD | ali-hangzhou-hub-registry.cn-hangzhou.cr.aliyuncs.com/isearch/rtp_llm_rocm_opensource | 0.2.0_0.2.0_2025_10_31_10_23_d1e93ce12 |
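An image can be pulled with the registry path and tag from the table joined by a colon, as in standard `docker pull` usage; the SM9x entry, for example (path and tag copied verbatim from the table above):

```shell
# Pull the CUDA SM9x image for this release
docker pull ali-hangzhou-hub-registry.cn-hangzhou.cr.aliyuncs.com/isearch/rtp_llm_sm9x_opensource:0.2.0_0.2.0_2025_10_31_10_23_d1e93ce12
```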

Wheel

| CardType | Wheel |
| --- | --- |
| CUDA-SM9x | https://rtp-opensource.oss-cn-hangzhou.aliyuncs.com/package/daily/2025_11_03/rtp_llm-0.2.0-0.2.0-2025_10_31_10_23_d1e93ce12-sm9x-cp310-cp310-manylinux1_x86_64.whl |
| CUDA-SM8x | https://rtp-opensource.oss-cn-hangzhou.aliyuncs.com/package/daily/2025_11_03/rtp_llm-0.2.0-0.2.0-2025_10_31_10_23_d1e93ce12-sm8x-cp310-cp310-manylinux1_x86_64.whl |
| CUDA-SM7x | https://rtp-opensource.oss-cn-hangzhou.aliyuncs.com/package/daily/2025_11_03/rtp_llm-0.2.0-0.2.0-2025_10_31_10_23_d1e93ce12-sm7x-cp310-cp310-manylinux1_x86_64.whl |
| Frontend | https://rtp-opensource.oss-cn-hangzhou.aliyuncs.com/package/daily/2025_11_03/rtp_llm-0.2.0-0.2.0-2025_10_31_10_23_d1e93ce12-frontend-cp310-cp310-manylinux1_x86_64.whl |
| AMD | https://rtp-opensource.oss-cn-hangzhou.aliyuncs.com/package/daily/2025_11_03/rtp_llm-0.2.0-0.2.0-2025_10_31_10_23_d1e93ce12-rocm-cp310-cp310-manylinux1_x86_64.whl |
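The wheel names indicate CPython 3.10 on x86_64 Linux (cp310, manylinux1_x86_64), so in a matching environment a wheel can be installed directly from its URL with pip; the SM9x entry, for example (URL copied verbatim from the table above):

```shell
# Install the CUDA SM9x wheel for this release (requires Python 3.10 on x86_64 Linux)
pip install https://rtp-opensource.oss-cn-hangzhou.aliyuncs.com/package/daily/2025_11_03/rtp_llm-0.2.0-0.2.0-2025_10_31_10_23_d1e93ce12-sm9x-cp310-cp310-manylinux1_x86_64.whl
```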

Images of master are coming soon.