Commit fe928a9

[megatron] support megatron tuner_type 'lora_llm' (modelscope#8388)
1 parent 1011139 commit fe928a9

File tree: 18 files changed, +134 −35 lines

README.md

Lines changed: 1 addition & 1 deletion
@@ -141,7 +141,7 @@ Running Environment:
 | python | >=3.9 | 3.11/3.12 | |
 | cuda | | cuda12 | No need to install if using CPU, NPU, MPS |
 | torch | >=2.0 | 2.8.0/2.10.0 | |
-| transformers | >=4.33 | 4.57.6/5.3.0 | |
+| transformers | >=4.33 | 4.57.6/5.2.0 | |
 | modelscope | >=1.23 | | |
 | peft | >=0.11,<0.19 | | |
 | flash_attn | | 2.8.3/3.0.0b1 | |

README_CN.md

Lines changed: 1 addition & 1 deletion
@@ -137,7 +137,7 @@ uv pip install -e . --torch-backend=auto
 | python | >=3.9 | 3.11/3.12 | |
 | cuda | | cuda12 | No need to install if using CPU, NPU, MPS |
 | torch | >=2.0 | 2.8.0/2.10.0 | |
-| transformers | >=4.33 | 4.57.6/5.3.0 | |
+| transformers | >=4.33 | 4.57.6/5.2.0 | |
 | modelscope | >=1.23 | | |
 | peft | >=0.11,<0.19 | | |
 | flash_attn | | 2.8.3/3.0.0b1 | |

docs/source/BestPractices/Qwen3_5-Best-Practice.md

Lines changed: 5 additions & 3 deletions
@@ -1,12 +1,14 @@
 # Qwen3.5 Best Practices
 
-ms-swift 4.0 supports training [Qwen3.5](https://github.com/QwenLM/Qwen3.5) Dense/MoE models with the transformers/Megatron backends. Qwen3.5 is a multimodal model with hybrid thinking, combining linear attention and full attention. This article introduces how to perform inference, instruction fine-tuning, and reinforcement learning with Qwen3.5 Dense/MoE models.
+ms-swift supports training [Qwen3.5](https://github.com/QwenLM/Qwen3.5) Dense/MoE models with the transformers/Megatron backends. Qwen3.5 is a multimodal model with hybrid thinking, combining linear attention and full attention. This article introduces how to perform inference, instruction fine-tuning, and reinforcement learning with Qwen3.5 Dense/MoE models.
 
 ## Environment Setup
 ```shell
 pip install -U ms-swift
-pip install -U "transformers>=5.2.0" "qwen_vl_utils>=0.0.14" peft liger-kernel
+# "transformers==5.2.*" hits a compatibility issue with vllm; see this issue: https://github.com/modelscope/ms-swift/issues/8254
+# "transformers==5.3.*" hits a video-training issue; see this issue: https://github.com/modelscope/ms-swift/issues/8362
+pip install -U "transformers==5.2.*" "qwen_vl_utils>=0.0.14" peft liger-kernel
 
 # flash-linear-attention
 # Please install the fla main branch; if training is slow, see: https://github.com/fla-org/flash-linear-attention/issues/758
@@ -24,7 +26,7 @@ pip install deepspeed
 # vllm (torch2.10) for inference/deployment/RL
 pip install -U "vllm>=0.17.0"
 # For RL training, you need to override vllm's default installed version
-pip install -U "transformers>=5.2.0"
+pip install -U "transformers==5.2.*"
 ```
 
 - Qwen3.5 video training hangs: reading videos with the decord backend may cause hangs; see [this issue](https://github.com/dmlc/decord/issues/269). You can use the torchcodec backend instead; see the [qwen_vl_utils](https://github.com/QwenLM/Qwen3-VL/blob/50068df2334f309979ff05d75f1078c8309c63ed/qwen-vl-utils/src/qwen_vl_utils/vision_process.py#L390-L400) library.

docs/source/GetStarted/SWIFT-installation.md

Lines changed: 1 addition & 1 deletion
@@ -140,7 +140,7 @@ modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu2
 | python | >=3.9 | 3.11/3.12 | |
 | cuda | | cuda12 | No need to install if using CPU, NPU, MPS |
 | torch | >=2.0 | 2.8.0/2.10.0 | |
-| transformers | >=4.33 | 4.57.6/5.3.0 | |
+| transformers | >=4.33 | 4.57.6/5.2.0 | |
 | modelscope | >=1.23 | | |
 | peft | >=0.11,<0.19 | | |
 | flash_attn | | 2.8.3/3.0.0b1 | |

docs/source/Instruction/Command-line-parameters.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@
 ## Base Arguments
 
 - 🔥tuner_backend: Options are 'peft' and 'unsloth'. Default is 'peft'.
-- 🔥tuner_type: Options are 'lora', 'full', 'longlora', 'adalora', 'llamapro', 'adapter', 'vera', 'boft', 'fourierft', 'reft'. Default is 'lora'. (**In ms-swift 3.x, the parameter name is `train_type`**)
+- 🔥tuner_type: Options are 'lora', 'full', 'longlora', 'adalora', 'llamapro', 'adapter', 'vera', 'boft', 'fourierft', 'reft'. Default is 'lora'.
 - 🔥adapters: A list of adapter IDs/paths; default is `[]`. This parameter is typically used in inference/deployment commands, for example: `swift infer --model '<model_id_or_path>' --adapters '<adapter_id_or_path>'`. It is occasionally also used to resume training from a checkpoint; the difference from `resume_from_checkpoint` is that **this parameter only loads the adapter weights**, without restoring the optimizer state or random seed, and it does not skip already-trained portions of the dataset.
 - The difference between `--model` and `--adapters`: `--model` takes a directory containing the complete weights (model/tokenizer/config, e.g. `model.safetensors`); `--adapters` takes a list of directories containing the adapters' incremental weights, e.g. `adapter_model.safetensors`.
 - 🔥external_plugins: A list of external `plugin.py` files that will be additionally loaded (i.e., imported as modules). Default is `[]`. You can pass `.py` file paths that register custom models, chat templates, and datasets, see [here](https://github.com/modelscope/ms-swift/blob/main/examples/custom/sft.sh); or custom GRPO components, see [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/grpo/plugin/run_external_reward_func.sh).

docs/source/Megatron-SWIFT/Command-line-parameters.md

Lines changed: 2 additions & 1 deletion
@@ -198,7 +198,8 @@
 - mtp_loss_scaling_factor: Scaling factor for the multi-token prediction (MTP) loss. The MTP losses over all depths are averaged and then multiplied by this factor to obtain the overall MTP loss, which serves as an additional training objective. Default is 0.1.
 
 **Tuner arguments**:
-- tuner_type: Options are 'lora' and 'full'. Default is 'full'. (**In ms-swift 3.x, the parameter name is `train_type`**)
+- tuner_type: Options are 'lora', 'full', and 'lora_llm'. Default is 'full'.
+  - 'lora_llm' applies LoRA to the LLM part while the vit/aligner parts are trained with 'full'. You can use `vit_lr/aligner_lr` to set their respective learning rates.
 - 🔥freeze_llm: This parameter only takes effect for multimodal models and can be used in both full-parameter and LoRA training, with different effects. In full-parameter training, setting freeze_llm to True freezes the LLM weights; in LoRA training with `target_modules` set to 'all-linear', setting freeze_llm to True stops LoRA modules from being added to the LLM part. Default is False.
 - 🔥freeze_vit: This parameter only takes effect for multimodal models and can be used in both full-parameter and LoRA training, with different effects. In full-parameter training, setting freeze_vit to True freezes the vit weights; in LoRA training with `target_modules` set to 'all-linear', setting freeze_vit to True stops LoRA modules from being added to the vit part. Default is True.
   - Note: **vit here is not limited to vision_tower; it also includes audio_tower**. For Omni models, if you want to add LoRA only to vision_tower and not to audio_tower, you can modify [the code here](https://github.com/modelscope/ms-swift/blob/a5d4c0a2ce0658cef8332d6c0fa619a52afa26ff/swift/llm/model/model_arch.py#L544-L554).

docs/source/Megatron-SWIFT/Quick-start.md

Lines changed: 1 addition & 1 deletion
@@ -70,7 +70,7 @@ modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu2
 | apex | | 0.1 | |
 | megatron_core | >=0.12,<0.16 | 0.15 | |
 | flash_attn | | 2.8.3/3.0.0b1 | |
-| transformers | >=4.33 | 4.57.6/5.3.0 | |
+| transformers | >=4.33 | 4.57.6/5.2.0 | |
 | modelscope | >=1.23 | | |
 | peft | >=0.11,<0.19 | | LoRA |
 | trl | >=0.15,<0.29 | | RLHF |

docs/source_en/BestPractices/Qwen3_5-Best-Practice.md

Lines changed: 5 additions & 3 deletions
@@ -1,12 +1,14 @@
 # Qwen3.5 Best Practices
 
-ms-swift 4.0 supports training [Qwen3.5](https://github.com/QwenLM/Qwen3.5) Dense/MoE models using transformers/Megatron backends. Qwen3.5 is a multimodal model with hybrid thinking, combining linear attention and full attention. This article will introduce how to perform inference, instruction fine-tuning, and reinforcement learning on Qwen3.5 Dense/MoE models.
+ms-swift supports training [Qwen3.5](https://github.com/QwenLM/Qwen3.5) Dense/MoE models using transformers/Megatron backends. Qwen3.5 is a multimodal model with hybrid thinking, combining linear attention and full attention. This article will introduce how to perform inference, instruction fine-tuning, and reinforcement learning on Qwen3.5 Dense/MoE models.
 
 ## Environment Setup
 
 ```shell
 pip install -U ms-swift
-pip install -U "transformers>=5.2.0" "qwen_vl_utils>=0.0.14" peft liger-kernel
+# "transformers==5.2.*" encounters compatibility issues with vllm. See this issue: https://github.com/modelscope/ms-swift/issues/8254
+# "transformers==5.3.*" encounters video training issues. See this issue: https://github.com/modelscope/ms-swift/issues/8362
+pip install -U "transformers==5.2.*" "qwen_vl_utils>=0.0.14" peft liger-kernel
 
 # flash-linear-attention
 # Please install the fla main branch. If you encounter slow training issues, please refer to: https://github.com/fla-org/flash-linear-attention/issues/758
@@ -24,7 +26,7 @@ pip install deepspeed
 # vllm (torch2.10) for inference/deployment/RL
 pip install -U "vllm>=0.17.0"
 # For RL training, you need to override vllm's default installed version
-pip install -U "transformers>=5.2.0"
+pip install -U "transformers==5.2.*"
 ```
 
 - Qwen3.5 video data training hangs: Using the decord backend to read videos may cause hanging issues, refer to [this issue](https://github.com/dmlc/decord/issues/269). You can use the torchcodec backend, specifically refer to the [qwen_vl_utils](https://github.com/QwenLM/Qwen3-VL/blob/50068df2334f309979ff05d75f1078c8309c63ed/qwen-vl-utils/src/qwen_vl_utils/vision_process.py#L390-L400) library.
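The pin change above is deliberate: `>=5.2.0` admits the 5.3.x releases that hit the video-training issue, while `==5.2.*` keeps any 5.2 patch release and nothing newer. As a minimal plain-Python illustration of that wildcard-pin semantics (not part of ms-swift or pip; `matches_wildcard_pin` is a hypothetical helper):

```python
def matches_wildcard_pin(version: str, pin: str) -> bool:
    """Check whether `version` satisfies an ``==X.Y.*`` style wildcard pin,
    by comparing the release components before the ``.*`` suffix."""
    prefix = pin.removesuffix(".*").split(".")
    return version.split(".")[: len(prefix)] == prefix

print(matches_wildcard_pin("5.2.0", "5.2.*"))  # True: allowed by ==5.2.*
print(matches_wildcard_pin("5.2.3", "5.2.*"))  # True: patch releases still match
print(matches_wildcard_pin("5.3.0", "5.2.*"))  # False: excluded, unlike >=5.2.0
```

Real resolvers implement the full PEP 440 rules (pre-releases, zero-padding, etc.); this sketch only shows why the two specifier styles diverge at 5.3.0.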

docs/source_en/GetStarted/SWIFT-installation.md

Lines changed: 1 addition & 1 deletion
@@ -139,7 +139,7 @@ More images can be found [here](https://modelscope.cn/docs/intro/environment-set
 | python | >=3.9 | 3.11/3.12 | |
 | cuda | | cuda12 | No need to install if using CPU, NPU, MPS |
 | torch | >=2.0 | 2.8.0/2.10.0 | |
-| transformers | >=4.33 | 4.57.6/5.3.0 | |
+| transformers | >=4.33 | 4.57.6/5.2.0 | |
 | modelscope | >=1.23 | | |
 | peft | >=0.11,<0.19 | | |
 | flash_attn | | 2.8.3/3.0.0b1 | |

docs/source_en/Instruction/Command-line-parameters.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ The command-line arguments will be introduced in four categories: basic argument
 ## Base Arguments
 
 - 🔥tuner_backend: Optional values are `'peft'` and `'unsloth'`. Default is `'peft'`.
-- 🔥tuner_type: Optional values are `'lora'`, `'full'`, `'longlora'`, `'adalora'`, `'llamapro'`, `'adapter'`, `'vera'`, `'boft'`, `'fourierft'`, `'reft'`. Default is `'lora'`. (**In ms-swift 3.x, the parameter name is `train_type`**)
+- 🔥tuner_type: Optional values are `'lora'`, `'full'`, `'longlora'`, `'adalora'`, `'llamapro'`, `'adapter'`, `'vera'`, `'boft'`, `'fourierft'`, `'reft'`. Default is `'lora'`.
 - 🔥adapters: A list specifying adapter IDs or paths. Default is `[]`. This parameter is typically used in inference/deployment commands, for example: `swift infer --model '<model_id_or_path>' --adapters '<adapter_id_or_path>'`. It can occasionally be used for resuming training from a checkpoint. The difference between this parameter and `resume_from_checkpoint` is that **this parameter only loads adapter weights**, without restoring the optimizer state or random seed, and does not skip already-trained portions of the dataset.
 - The difference between `--model` and `--adapters`: `--model` is followed by the directory path of the complete weights, which contains full weight information such as model/tokenizer/config, for example `model.safetensors`. `--adapters` is followed by a list of incremental adapter weight directory paths, which contain incremental weight information of the adapters, for example `adapter_model.safetensors`.
 - 🔥external_plugins: A list of external `plugin.py` files that will be additionally loaded (i.e., the modules will be imported). Defaults to `[]`. You can pass in `.py` file paths for custom model, template, and dataset registration, see [here](https://github.com/modelscope/ms-swift/blob/main/examples/custom/sft.sh); or for custom GRPO components, see [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/grpo/plugin/run_external_reward_func.sh).

0 commit comments

Comments
 (0)