Releases: modelscope/ms-swift

Patch release v4.2.3

31 May 12:12

Patch release v4.2.2

24 May 15:18

Patch release v4.2.1

17 May 16:50

v4.2.0

07 May 09:14

English Version

New Features

  1. Megatron-SWIFT
    a. Added model_type support: kimi_k25, hy_v3, llava_onevision. (llava_onevision contributed by @randydl)
    b. Added support for GLM-5 shared-parameter MTP, which can be enabled via the --mtp_shared_weights argument.
    c. Added support for Qwen3.5 FP8 training. Training script reference: https://github.com/modelscope/ms-swift/blob/main/examples/models/qwen3_5/fp8.sh
    d. Custom Megatron model documentation: https://swift.readthedocs.io/en/latest/Megatron-SWIFT/Custom-Model.html
    e. Added the --mtp_decoder_input_detach argument, which controls whether decoder_input is detached (gradient stopped) in the MTP branch, i.e., whether the MTP loss can backpropagate through decoder_input to the Embedding/ViT.
    f. mlp_padding_free is now compatible with Sequence Parallelism.
    g. Added support for FP8 quantization export via the megatron export command. Script reference: https://github.com/modelscope/ms-swift/blob/main/examples/megatron/fp8/quant.sh
    h. Dropped compatibility with megatron-core versions 0.12 through 0.14.
  2. RL
    a. GKD/OPSD now supports the generation_batch_size/steps_per_generation parameters.
    b. GKD/OPSD teacher_server_api is now compatible with multimodal training.
    c. GKD/OPSD is now compatible with padding_free.
    d. Megatron GRPO/GKD weight synchronization now supports syncing LoRA weights only.
    e. Added exception handling to swift rollout to prevent silent process hangs.
    f. GRPO ref_sync_callback now supports layer-wise gather under ZeRO-3 to avoid OOM.
    g. GRPO TRL dependency upgraded to >= 0.26.
  3. Training
    a. Added support for Qwen3.5 Sequence Parallelism, controllable via the --sequence_parallel_size argument. (Contributed by @meichangsu1)
    b. Added support for specifying loss_scale directly in the dataset for more flexible loss control. Documentation: https://swift.readthedocs.io/en/latest/Customization/Custom-dataset.html#supervised-fine-tuning
    c. The datasets dependency is now compatible with version 4.x.
    d. cached_dataset is now compatible with the --truncation_strategy split strategy.
  4. Hardware
    a. NPU now supports Qwen3.5 training with transformers/Megatron backends. When using the Megatron backend, the USE_MCORE_GDN=0 environment variable must be set. (Contributed by @addsubmuldiv, @hazelduan)
    b. Added AMD support documentation: https://swift.readthedocs.io/en/latest/BestPractices/AMD-support.html (Contributed by @Treemann)
    c. Added RL training support for MetaX hardware. (Contributed by @suenphey)
    d. NPU Megatron training is now compatible with megatron-core 0.15.3. (Contributed by @addsubmuldiv)
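
As a rough illustration of how the generation_batch_size and steps_per_generation parameters listed under RL might relate, the sketch below assumes they follow the semantics of the same-named options in TRL's GRPOConfig: the two are mutually exclusive, steps_per_generation falls back to gradient_accumulation_steps, and generation_batch_size falls back to per-device batch size × number of processes × steps_per_generation. Whether ms-swift's GKD/OPSD trainers resolve them exactly this way is an assumption, and resolve_generation_batch_size is a hypothetical helper, not an ms-swift function.

```python
def resolve_generation_batch_size(per_device_train_batch_size,
                                  num_processes,
                                  gradient_accumulation_steps,
                                  generation_batch_size=None,
                                  steps_per_generation=None):
    """Hypothetical helper: derive (generation_batch_size, steps_per_generation).

    Assumes TRL-style semantics; not taken from the ms-swift source.
    """
    if generation_batch_size is not None and steps_per_generation is not None:
        raise ValueError("set only one of generation_batch_size / steps_per_generation")

    # Number of samples consumed by one optimizer micro-step across all processes.
    global_batch = per_device_train_batch_size * num_processes

    if generation_batch_size is None:
        # Default: one generation round feeds a full accumulation cycle.
        if steps_per_generation is None:
            steps_per_generation = gradient_accumulation_steps
        generation_batch_size = global_batch * steps_per_generation
    else:
        # An explicit generation batch must split evenly into training steps.
        if generation_batch_size % global_batch != 0:
            raise ValueError("generation_batch_size must be a multiple of "
                             "per_device_train_batch_size * num_processes")
        steps_per_generation = generation_batch_size // global_batch

    return generation_batch_size, steps_per_generation


if __name__ == "__main__":
    # 4 samples per device on 8 processes, accumulating over 2 steps.
    print(resolve_generation_batch_size(4, 8, 2))
    print(resolve_generation_batch_size(4, 8, 2, steps_per_generation=4))
    print(resolve_generation_batch_size(4, 8, 2, generation_batch_size=96))
```

Under these assumptions, a larger steps_per_generation means each rollout round is reused across more training steps, trading freshness of samples for fewer generation calls.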

New Models

  1. Text-only Models
    a. ZhipuAI/GLM-5.1
    b. MiniMax/MiniMax-M2.7
    c. moonshotai/Kimi-K2.6 (text-only)
    d. Tencent-Hunyuan/Hy3-preview
    e. AIDC-AI/Marco-Nano-Instruct series
  2. Multimodal Models
    a. Qwen/Qwen3.6-35B-A3B, Qwen/Qwen3.6-27B
    b. Qwen3-ASR (Contributed by @xut806)
    c. Added mixed-modality dataset training support for Gemma4 series models.
    d. OpenDataLab/MinerU2.5-Pro-2604-1.2B
    e. OpenBMB/MiniCPM-o-4_5 now supports audio modality. (Contributed by @fanqiNO1)
    f. allenai/Molmo2-4B (Contributed by @Kagura-0001)

Patch release v4.1.3

25 Apr 13:50

Patch release v4.1.2

18 Apr 15:56

Patch release v4.1.1

13 Apr 14:21

v4.1.0

07 Apr 07:54

English Version

New Features

  1. Megatron-SWIFT
    a. mcore-bridge has been split from ms-swift into an independent repository, providing megatron-core model definitions for state-of-the-art models: https://github.com/modelscope/mcore-bridge
    b. Support for GRPO Router Replay via the --router_replay_mode parameter. Thanks to @XianlongLi from the CMB Tech team for the contribution.
    c. For Qwen3.5, the TP size is no longer restricted by num_query_groups, and CP, sequence packing, and multimodal MTP are now supported. Refer to the Qwen3.5 best practices: https://swift.readthedocs.io/zh-cn/latest/BestPractices/Qwen3_5-Best-Practice.html
    d. New model support: GLM-5, DeepSeek-V3.2, and MiniMax-M2.5.
    e. Support for muon and dist_muon optimizers. Training script reference: https://github.com/modelscope/ms-swift/blob/main/examples/megatron/muon.sh
    f. Support for --tuner_type lora_llm, enabling LoRA training on the LLM component and full-parameter training on ViT/Aligner. Training script reference: https://github.com/modelscope/ms-swift/tree/main/examples/megatron/multimodal/lora_llm_vit_full
  2. RL
    a. Support for the OPSD algorithm: the model being trained can serve as its own teacher, and teacher_prompt can be configured. Refer to: https://swift.readthedocs.io/zh-cn/latest/Instruction/GKD.html#opsd-on-policy-self-distillation
    b. Support for the REAL algorithm via the --loss_type real parameter. Thanks to @li2zhi from the CMB Tech team for the contribution.
    c. Support for QLoRA GRPO. Refer to: https://github.com/modelscope/ms-swift/blob/main/examples/train/grpo/internal/qlora.sh
    d. Added a clamp operation to the GRPO K3-KL computation to stabilize training.
    e. Changed the default value of top-k from 50 to -1, and top-p from 0.95 to 1.
  3. Training
    a. Improved support for YAML-based launch configurations. Refer to: https://github.com/modelscope/ms-swift/tree/main/examples/yaml
    b. Added architecture documentation: https://swift.readthedocs.io/zh-cn/latest/Customization/Architecture.html
    c. Added MetaX support best practices: https://swift.readthedocs.io/zh-cn/latest/BestPractices/Metax-support.html
    d. Added support for installing ms-swift via uv.
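
To make the sampling-default change listed under RL concrete, here is a minimal pure-Python sketch of top-k followed by top-p (nucleus) filtering over a toy distribution. It assumes the common convention that top-k = -1 and top-p = 1 disable their respective filters; filter_probs is a hypothetical helper written for illustration, not the sampler that ms-swift or its inference backends actually use.

```python
def filter_probs(probs, top_k=-1, top_p=1.0):
    """Apply top-k then top-p filtering and renormalize (illustrative only)."""
    # Token indices sorted from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # top_k <= 0 (e.g. -1) means "keep every token".
    if top_k > 0:
        order = order[:top_k]

    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize the surviving tokens; filtered tokens get probability 0.
    total = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / total
    return filtered


if __name__ == "__main__":
    probs = [0.5, 0.25, 0.125, 0.09375, 0.03125]
    # Previous defaults: the low-probability tail is cut off.
    print(filter_probs(probs, top_k=50, top_p=0.95))
    # New defaults: the distribution is left untouched.
    print(filter_probs(probs, top_k=-1, top_p=1.0))
```

With the previous defaults the least likely token in this toy example is removed before sampling, while the new defaults sample from the unmodified distribution.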

New Models

  1. Text-Only Models
    a. MiniMax/MiniMax-M2.5
    b. deepseek-ai/DeepSeek-V3.2
    c. Alibaba-AAIG/YuFeng-XGuard-Reason-0.6B series (Thanks to @ciaoyizhen for the contribution)
  2. Multimodal Models
    a. google/gemma-4-E2B-it series. Training script reference: https://github.com/modelscope/ms-swift/blob/main/examples/models/gemma4/train.sh

Patch release v4.0.4

03 Apr 22:36

Patch release v4.0.3

29 Mar 04:21
