v4.0.0

@Jintao-Huang Jintao-Huang released this 03 Mar 08:25
· 88 commits to main since this release


New Features

  1. Architecture Optimization
    a. Refactored the directory structure and streamlined dependencies around a modular design, improving extensibility and customizability.
    b. Decoupled model_type from template, simplifying support for models that have multiple templates under a single model_type.
    c. Rewrote the Megatron-SWIFT training loop, which now depends on megatron-core instead of megatron-lm. (Compatible with Ascend NPU)
  2. Megatron-SWIFT
    a. New model support: Qwen3.5 series, GLM4.7-Flash, MiniMax-M2.1, OLMoE.
    b. Embedding task support. Training example: https://github.com/modelscope/ms-swift/tree/main/examples/megatron/embedding
    c. Reranker task support. Training example: https://github.com/modelscope/ms-swift/tree/main/examples/megatron/reranker
    d. Added the save_total_limit parameter, which automatically removes outdated checkpoints while keeping the best-metric and most recent weights.
    e. Added the apply_wd_to_qk_layernorm parameter for Qwen3-Next/Qwen3.5, which applies weight decay to the qk layernorm.
    f. LoRA for multi-modal MoE models now supports the --target_modules all-router configuration.
  3. RL
    a. Added support for computing advantages with the GDPO algorithm, enabled with --scale_rewards gdpo. (Thanks to @Auraithm)
    b. GKD can compute the KL term from top-k logits to save GPU memory, enabled with --gkd_topk_logits.
    c. GKD can use a teacher server, which avoids loading the teacher model explicitly.
  4. Training
    a. Added MuonClip optimizer support. Training example: https://github.com/modelscope/ms-swift/blob/main/examples/train/optimizer/muonclip.sh (Thanks to @vx120)
    b. Dependency updates: compatible with the latest dependencies, including Python 3.12, transformers 5.2.0, vllm 0.15.1, trl 0.28, and liger-kernel 0.7.0.
    c. Optimized the lm_head computation for generative rerankers to reduce GPU memory usage.
    d. FSDP2 supports activation CPU offload; added DeepSpeed elastic support. (Thanks to @meichangsu1)
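
The save_total_limit behavior described above (remove older checkpoints, but always keep the best-metric and the most recent one) can be illustrated with a small sketch. This is not the ms-swift implementation: the Checkpoint class, the select_checkpoints_to_delete function, and the greater_is_better flag are hypothetical names chosen for illustration, and the tie-breaking rule (fill remaining slots with the next most recent checkpoints) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    step: int      # training step at which the checkpoint was saved
    metric: float  # evaluation metric recorded for this checkpoint


def select_checkpoints_to_delete(checkpoints, save_total_limit, greater_is_better=False):
    """Return the checkpoints that a limit-based cleanup would remove.

    Assumed policy (hypothetical, for illustration only): the best-metric
    and the most recent checkpoint are always kept; any remaining slots
    are filled with the next most recent checkpoints.
    """
    if save_total_limit is None or len(checkpoints) <= save_total_limit:
        return []

    newest_first = sorted(checkpoints, key=lambda c: c.step, reverse=True)
    pick_best = max if greater_is_better else min
    best = pick_best(checkpoints, key=lambda c: c.metric)

    # The latest and the best checkpoint are protected, even if that
    # means keeping more than save_total_limit entries.
    keep = {newest_first[0], best}
    for ckpt in newest_first:
        if len(keep) >= save_total_limit:
            break
        keep.add(ckpt)

    return [c for c in sorted(checkpoints, key=lambda c: c.step) if c not in keep]


if __name__ == "__main__":
    history = [
        Checkpoint(100, 0.90),
        Checkpoint(200, 0.50),
        Checkpoint(300, 0.70),
        Checkpoint(400, 0.60),
        Checkpoint(500, 0.65),
    ]
    # Metric is treated as a loss here (lower is better).
    for ckpt in select_checkpoints_to_delete(history, save_total_limit=2):
        print("would delete checkpoint at step", ckpt.step)
```

Under this assumed policy, a limit of 1 can still leave two checkpoints on disk when the best and the latest differ. Whether ms-swift behaves the same way should be checked against its documentation.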

New Models

  1. Text-only Models
    a. Qwen/Qwen3-Coder-Next
    b. ZhipuAI/GLM-4.7-Flash, ZhipuAI/GLM-5
    c. MiniMaxAI/MiniMax-M2.1
    d. Tencent-YouTu-Research/Youtu-LLM-2B
    e. IQuestLab/IQuest-Coder-V1-40B-Instruct
    f. allenai/OLMoE-1B-7B-0924-Instruct series (Thanks to @qianhao0713)
  2. Multi-modal Models
    a. Qwen/Qwen3.5-35B-A3B, Qwen/Qwen3.5-9B series. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/models/qwen3_5
    b. Qwen3-VL-Embedding, Qwen3-VL-Reranker. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/embedding/qwen3, https://github.com/modelscope/ms-swift/tree/main/examples/train/reranker/qwen3
    c. deepseek-ai/DeepSeek-OCR-2
    d. ZhipuAI/GLM-OCR
    e. PaddlePaddle/PaddleOCR-VL-1.5
    f. OpenBMB/MiniCPM-o-4_5
    g. stepfun-ai/Step3-VL-10B
    h. google/medgemma-4b-it series

Full Changelog: v3.12.6...v4.0.0