Commit 67ae607

[v4] refactor ms-swift v4 (modelscope#7238)
1 parent dfaf7e1 commit 67ae607

File tree: 652 files changed, +13222 additions, −13538 deletions


README.md

Lines changed: 8 additions & 3 deletions
````diff
@@ -77,6 +77,7 @@ You can contact us and communicate with us by adding our group:
 
 
 ## 🎉 News
+- 🎁 2025.01.15: The **ms-swift v4.0** major version update is in progress. It is recommended to use the stable branch [release/3.12](https://github.com/modelscope/ms-swift/tree/release/3.12). You can provide your feedback in [this issue](https://github.com/modelscope/ms-swift/issues/7250). Thank you for your support.
 - 🎁 2025.11.14: Megatron GRPO is now available! Check out the [docs](./docs/source_en/Megatron-SWIFT/GRPO.md) and [examples](examples/megatron/grpo).
 - 🎁 2025.11.04: Support for [Mcore-Bridge](docs/source_en/Megatron-SWIFT/Mcore-Bridge.md), making Megatron training as simple and easy to use as transformers.
 - 🎁 2025.10.28: Ray is supported; see [here](docs/source_en/Instruction/Ray.md).
@@ -122,6 +123,8 @@ To install from source:
 
 git clone https://github.com/modelscope/ms-swift.git
 cd ms-swift
+# The main branch is for swift 4.x. To install swift 3.x, please run the following command:
+# git checkout release/3.12
 pip install -e .
 ```
 
@@ -245,9 +248,10 @@ ms-swift also supports training and inference using Python. Below is pseudocode
 Training:
 
 ```python
+from swift import get_model_processor, get_template, Swift, load_dataset, EncodePreprocessor, Seq2SeqTrainer
 # Retrieve the model and template, and add a trainable LoRA module
-model, tokenizer = get_model_tokenizer(model_id_or_path, ...)
-template = get_template(model.model_meta.template, tokenizer, ...)
+model, tokenizer = get_model_processor(model_id_or_path, ...)
+template = get_template(tokenizer, ...)
 model = Swift.prepare_model(model, lora_config)
 
 # Download and load the dataset, and encode the text into tokens
@@ -269,8 +273,9 @@ trainer.train()
 Inference:
 
 ```python
+from swift import TransformersEngine, InferRequest, RequestConfig
 # Perform inference using the native PyTorch engine
-engine = PtEngine(model_id_or_path, adapters=[lora_checkpoint])
+engine = TransformersEngine(model_id_or_path, adapters=[lora_checkpoint])
 infer_request = InferRequest(messages=[{'role': 'user', 'content': 'who are you?'}])
 request_config = RequestConfig(max_tokens=max_new_tokens, temperature=temperature)
````

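For orientation, the updated training and inference snippets read as follows after this commit (assembled from the hunks above; this remains pseudocode, as the README itself notes — `model_id_or_path`, `lora_config`, `lora_checkpoint`, and the `...` arguments are placeholders, not runnable values):

```python
from swift import get_model_processor, get_template, Swift, load_dataset, EncodePreprocessor, Seq2SeqTrainer

# Retrieve the model and template, and add a trainable LoRA module
model, tokenizer = get_model_processor(model_id_or_path, ...)
template = get_template(tokenizer, ...)
model = Swift.prepare_model(model, lora_config)

# ... download the dataset, encode text into tokens, and train with Seq2SeqTrainer ...

# Inference: `PtEngine` has been renamed to `TransformersEngine` in v4
from swift import TransformersEngine, InferRequest, RequestConfig
engine = TransformersEngine(model_id_or_path, adapters=[lora_checkpoint])
infer_request = InferRequest(messages=[{'role': 'user', 'content': 'who are you?'}])
request_config = RequestConfig(max_tokens=max_new_tokens, temperature=temperature)
```
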
README_CN.md

Lines changed: 8 additions & 3 deletions
````diff
@@ -73,6 +73,7 @@
 - **Model quantization**: Supports quantized export with AWQ, GPTQ, FP8, and BNB; exported models can be accelerated for inference with vLLM/SGLang/LmDeploy.
 
 ## 🎉 News
+- 🎁 2025.01.15: The **ms-swift v4.0** major version update is in progress. It is recommended to use the stable branch [release/3.12](https://github.com/modelscope/ms-swift/tree/release/3.12). You can give us feedback in [this issue](https://github.com/modelscope/ms-swift/issues/7250). Thank you for your support.
 - 🎁 2025.11.14: Megatron GRPO is now supported! See the [docs](./docs/source/Megatron-SWIFT/GRPO.md) and [examples](examples/megatron/grpo).
 - 🎁 2025.11.04: Support for [Mcore-Bridge](docs/source/Megatron-SWIFT/Mcore-Bridge.md), making Megatron training as simple and easy to use as transformers.
 - 🎁 2025.10.28: Ray is [supported](docs/source/Instruction/Ray.md).
@@ -117,6 +118,8 @@ pip install ms-swift -U
 
 git clone https://github.com/modelscope/ms-swift.git
 cd ms-swift
+# The main branch is for swift 4.x. To install swift 3.x, run the following command:
+# git checkout release/3.12
 pip install -e .
 ```
 
@@ -233,9 +236,10 @@ ms-swift also supports training and inference in Python. Below is the training
 
 Training:
 ```python
+from swift import get_model_processor, get_template, Swift, load_dataset, EncodePreprocessor, Seq2SeqTrainer
 # Retrieve the model and template, and add a trainable LoRA module
-model, tokenizer = get_model_tokenizer(model_id_or_path, ...)
-template = get_template(model.model_meta.template, tokenizer, ...)
+model, tokenizer = get_model_processor(model_id_or_path, ...)
+template = get_template(tokenizer, ...)
 model = Swift.prepare_model(model, lora_config)
 
 # Download and load the dataset, and encode the text into tokens
@@ -257,8 +261,9 @@ trainer.train()
 
 Inference:
 ```python
+from swift import TransformersEngine, InferRequest, RequestConfig
 # Perform inference using the native PyTorch engine
-engine = PtEngine(model_id_or_path, adapters=[lora_checkpoint])
+engine = TransformersEngine(model_id_or_path, adapters=[lora_checkpoint])
 infer_request = InferRequest(messages=[{'role': 'user', 'content': 'who are you?'}])
 request_config = RequestConfig(max_tokens=max_new_tokens, temperature=temperature)
````

docs/source/BestPractices/Embedding.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -45,7 +45,7 @@ SWIFT已经支持Embedding模型的训练,包括纯文本和多模态两个类
4545
- online_contrastive: 考虑hard negative和hard positive部分的contrastive loss,label仅支持0和1两个值
4646
- infonce: 在同一个batch中不同row两两计算cosine相似度,并使row内部相似度最大,不同row相似度最小,不需要label
4747

48-
loss的源代码可以在[这里](https://github.com/modelscope/ms-swift/blob/main/swift/plugin/loss.py)找到。
48+
loss的源代码可以在[这里](https://github.com/modelscope/ms-swift/blob/main/swift/loss/mapping.py)找到。
4949

5050
## 数据集格式
5151
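
The infonce description in the hunk above can be made concrete with a small self-contained sketch. This is plain NumPy, not the ms-swift implementation (which lives at the linked file): row i of the anchors should match row i of the positives, and every other row in the batch acts as an in-batch negative.

```python
import numpy as np

def infonce_loss(anchors: np.ndarray, positives: np.ndarray, temperature: float = 0.05) -> float:
    """Toy InfoNCE: row i of `anchors` should match row i of `positives`;
    every other row in the batch acts as an in-batch negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal (the matching row) as the target
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
aligned = infonce_loss(x, x)                         # every anchor matches its positive
mismatched = infonce_loss(x, np.roll(x, 1, axis=0))  # positives shifted off by one row
print(aligned < mismatched)  # True
```

Maximizing within-row similarity while minimizing cross-row similarity, as the doc puts it, is exactly this cross-entropy over the similarity matrix with the diagonal as the target.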

docs/source/BestPractices/MLLM-Registration.md

Lines changed: 70 additions & 62 deletions
````diff
@@ -6,7 +6,7 @@
 ## Environment Setup
 ```shell
 # Avoid future incompatibilities with this document
-pip install "ms-swift>=3.10.2,<3.11"
+pip install "ms-swift>=4.0"
 
 pip install "transformers==4.57.*" "qwen_omni_utils==0.0.8"
 ```
@@ -17,15 +17,14 @@ pip install "transformers==4.57.*" "qwen_omni_utils==0.0.8"
 As a first step, we need to register the model in order to obtain the model and processor.
 
 ```python
-from swift.llm import (
-    register_model, ModelMeta, ModelGroup, Model, register_model_arch, MultiModelKeys,
-    get_model_tokenizer_with_flash_attn, get_model_tokenizer
-)
-from swift.llm.model.model.qwen import patch_qwen_vl_utils
-from swift.llm.model.utils import use_submodel_func
-from swift.llm.model.patcher import patch_get_input_embeddings
-from swift.utils import get_env_args
+from transformers import PretrainedConfig, PreTrainedModel
 
+from swift.model import (Model, ModelGroup, ModelMeta, MultiModelKeys, get_model_processor, register_model,
+                         register_model_arch, ModelLoader)
+from swift.model.models.qwen import patch_qwen_vl_utils
+from swift.model.patcher import patch_get_input_embeddings
+from swift.model.utils import use_submodel_func
+from swift.utils import get_env_args, Processor
 
 register_model_arch(
     MultiModelKeys(
@@ -41,33 +40,44 @@ register_model_arch(
         generator=['talker', 'token2wav'],
     ))
 
-def get_model_tokenizer_qwen2_5_omni(model_dir, *args, **kwargs):
-    from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor, Qwen2_5OmniConfig
-    from qwen_omni_utils import vision_process
-    print('Run my_qwen2_5_omni...')
-    kwargs['automodel_class'] = kwargs['automodel_class'] or Qwen2_5OmniForConditionalGeneration
-    # Customize how `get_model_tokenizer_with_flash_attn` obtains the tokenizer and config
-    processor = Qwen2_5OmniProcessor.from_pretrained(model_dir, trust_remote_code=True)
-    kwargs['tokenizer'] = processor.tokenizer
-    kwargs['model_config'] = Qwen2_5OmniConfig.from_pretrained(model_dir, trust_remote_code=True)
-    enable_audio_output = get_env_args('ENABLE_AUDIO_OUTPUT', bool, None)
-    if enable_audio_output is not None:
-        kwargs['model_config'].enable_audio_output = enable_audio_output
-    # Constants in the qwen_omni_utils library can be controlled via environment variables, e.g. `MAX_PIXELS`
-    patch_qwen_vl_utils(vision_process)
-    # Please use this function to obtain the model and tokenizer whenever possible; avoid calling
-    # AutoModelForCausalLM directly (it causes incompatibilities).
-    model, _ = get_model_tokenizer_with_flash_attn(model_dir, *args, **kwargs)
-    if model:
-        # For multimodal model consistency, we replace the model's forward/generate functions
-        # with those of its language_model, and handle the extra parts ourselves.
+class Qwen2_5OmniLoader(ModelLoader):
+
+    def get_config(self, model_dir: str) -> PretrainedConfig:
+        from transformers import Qwen2_5OmniConfig
+        config = Qwen2_5OmniConfig.from_pretrained(model_dir, trust_remote_code=True)
+        enable_audio_output = get_env_args('ENABLE_AUDIO_OUTPUT', bool, None)
+        if enable_audio_output is not None:
+            config.enable_audio_output = enable_audio_output
+        return config
+
+    def get_processor(self, model_dir: str, config: PretrainedConfig) -> Processor:
+        from transformers import Qwen2_5OmniProcessor
+        from qwen_omni_utils import vision_process
+        processor = Qwen2_5OmniProcessor.from_pretrained(model_dir, trust_remote_code=True)
+        # Control constants in the qwen_omni_utils library via environment variables,
+        # e.g., `MAX_PIXELS`, etc.
+        patch_qwen_vl_utils(vision_process)
+        return processor
+
+    def get_model(self, model_dir: str, config: PretrainedConfig, processor: Processor,
+                  model_kwargs) -> PreTrainedModel:
+        from transformers import Qwen2_5OmniForConditionalGeneration
+        print('Run my_qwen2_5_omni...')
+        self.auto_model_cls = self.auto_model_cls or Qwen2_5OmniForConditionalGeneration
+        model = super().get_model(model_dir, config, processor, model_kwargs)
+        # For multimodal model consistency, we replace the model's forward/generate functions
+        # with those of its language_model.
+        # Handle additional parts separately.
         use_submodel_func(model, 'thinker')
-        # Some custom settings for model/config (usually not needed; configure for the specific model if errors occur during training/inference)
+        # Avoid inplace operations on a leaf_variable during training
+        # (replacing parts of input_embeds with images_embeds)
+        patch_get_input_embeddings(model.thinker.visual, 'patch_embed')
+        # Some custom settings for model/config (usually not needed; configure based on
+        # the specific model if errors occur during training/inference)
         model.config.keys_to_ignore_at_inference += ['hidden_states', 'attention_mask']
         model.config.talker_config.pad_token_id = None
-        # Avoid errors from inplace operations on a leaf_variable during training (replacing parts of input_embeds with images_embeds)
-        patch_get_input_embeddings(model.thinker.visual, 'patch_embed')
-    # Finally, return the model and the processor (multimodal) / tokenizer (plain text)
-    return model, processor
+        return model
@@ -79,9 +89,9 @@ register_model(
             Model('Qwen/Qwen2.5-Omni-7B', 'Qwen/Qwen2.5-Omni-7B'),
         ]),
     ],
-    'my_qwen2_5_omni',
     # The function/loader used to obtain the model and processor.
-    get_model_tokenizer_qwen2_5_omni,
+    Qwen2_5OmniLoader,
+    template='my_qwen2_5_omni',
     is_multimodal=True,  # whether this is a multimodal model
     model_arch='my_qwen2_5_omni',  # usually set only for multimodal models
     # used for automatic matching of model_type
@@ -96,7 +106,7 @@ register_model(
 
 if __name__ == '__main__':
     # Test and debug
-    model, processor = get_model_tokenizer('Qwen/Qwen2.5-Omni-7B', model_type='my_qwen2_5_omni')
+    model, processor = get_model_processor('Qwen/Qwen2.5-Omni-7B', model_type='my_qwen2_5_omni')
 ```
 
 ## Registering the Template
@@ -110,18 +120,16 @@ The functions of the template are as follows:
 
 
 ```python
-from swift.llm import (
-    register_template, Template, get_packed_seq_params, to_float_dtype, TemplateMeta,
-    get_template, get_model_tokenizer
-)
-from transformers.integrations import is_deepspeed_zero3_enabled
-from swift.llm.template.template_inputs import StdTemplateInputs
-from swift.llm.template.utils import Context, findall
-from swift.llm.template.vision_utils import load_audio
-from swift.utils import get_env_args, get_logger, is_deepspeed_enabled
 from functools import partial
-from typing import Dict, List, Any, Literal, Optional
+from typing import Any, Dict, List, Literal, Optional
+
 import torch
+from transformers.integrations import is_deepspeed_zero3_enabled
+from swift import get_model_processor
+from swift.template import StdTemplateInputs, Template, TemplateMeta, get_template, register_template
+from swift.template.utils import Context, findall
+from swift.template.vision_utils import load_audio
+from swift.utils import Processor, get_env_args, get_logger, get_packed_seq_params, is_deepspeed_enabled, to_float_dtype
 
 logger = get_logger()
 
@@ -135,7 +143,7 @@ class Qwen2_5OmniTemplate(Template):
     # and will be printed in abbreviated form (via `template.safe_decode`)
     placeholder_tokens = ['<|IMAGE|>', '<|AUDIO|>', '<|VIDEO|>']
 
-    def init_processor(self, processor) -> None:
+    def init_processor(self, processor: Processor) -> None:
         """Initialize some additional required constants when initializing the processor"""
         if processor is None:
             return
@@ -416,7 +424,7 @@ class Qwen2_5OmniTemplate(Template):
         return res
 
     def generate(self, model, *args, **kwargs):
-        """`PtEngine` calls template.generate to perform text generation; override it here for customization."""
+        """`TransformersEngine` calls template.generate to perform text generation; override it here for customization."""
         if kwargs.get('video_grid_thw') is not None:
             kwargs['use_audio_in_video'] = self.use_audio_in_video
         return super().generate(model, *args, **kwargs)
@@ -432,8 +440,8 @@ register_template(
 
 if __name__ == '__main__':
     # Test and debug
-    model, processor = get_model_tokenizer('Qwen/Qwen2.5-Omni-7B', model_type='my_qwen2_5_omni')
-    template = get_template('my_qwen2_5_omni', processor)
+    model, processor = get_model_processor('Qwen/Qwen2.5-Omni-7B', model_type='my_qwen2_5_omni')
+    template = get_template(processor, template_type='my_qwen2_5_omni')
     data = {
         'messages': [
             {'role': 'user', 'content': 'Describe the content of the video <video> and the image <image>.'},
@@ -451,14 +459,14 @@ if __name__ == '__main__':
 
 
 ## Inference Alignment
-Next, you need to align inference between PtEngine and transformers. Usually you need to align `input_ids` and the output content. You can write the following test functions:
+Next, you need to align inference between TransformersEngine and transformers. Usually you need to align `input_ids` and the output content. You can write the following test functions:
 
 ```python
 import os
 from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor
 from qwen_omni_utils import process_mm_info
 from modelscope import snapshot_download
-from swift.llm import PtEngine, InferRequest, RequestConfig
+from swift.infer_engine import TransformersEngine, InferRequest, RequestConfig
 import requests
 
 def infer_hf():
@@ -494,7 +502,7 @@ def infer_hf():
     return inputs['input_ids'][0].tolist(), text[0]
 
 def test_my_qwen2_5_omni():
-    engine = PtEngine('Qwen/Qwen2.5-Omni-7B', model_type='my_qwen2_5_omni', attn_impl='flash_attention_2')
+    engine = TransformersEngine('Qwen/Qwen2.5-Omni-7B', model_type='my_qwen2_5_omni', attn_impl='flash_attention_2')
     infer_request = InferRequest(messages=[{
         "role": "user",
         "content": "<video><image>Describe the video and the image.",
@@ -503,14 +511,14 @@ def test_my_qwen2_5_omni():
         images=["http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png"],
     )
     request_config = RequestConfig(temperature=0, max_tokens=512)
-    input_ids = engine.default_template.encode(infer_request)['input_ids']
+    input_ids = engine.template.encode(infer_request)['input_ids']
     resp_list = engine.infer([infer_request], request_config)
     resp = resp_list[0].choices[0].message.content
     return input_ids, resp
 
 
 if __name__ == '__main__':
-    # Enable debug mode, which prints the input_ids and generate_ids of `PtEngine.infer`
+    # Enable debug mode, which prints the input_ids and generate_ids of `TransformersEngine.infer`
     os.environ['SWIFT_DEBUG'] = '1'
     input_ids_hf, response_hf = infer_hf()
     input_ids_swift, response_swift = test_my_qwen2_5_omni()
@@ -524,13 +532,13 @@ if __name__ == '__main__':
 
 Train with Python code, which is usually easier to debug:
 ```python
-from swift.llm import sft_main, TrainArguments
+from swift import sft_main, SftArguments
 import os
 if __name__ == '__main__':
     os.environ['MAX_PIXELS'] = '1003520'
-    sft_main(TrainArguments(
+    sft_main(SftArguments(
         model='Qwen/Qwen2.5-Omni-7B',
-        dataset='AI-ModelScope/LaTeX_OCR#5000',
+        dataset=['AI-ModelScope/LaTeX_OCR#5000'],
         model_type='my_qwen2_5_omni',
         template='my_qwen2_5_omni',
         load_from_cache_file=True,
@@ -545,7 +553,7 @@ if __name__ == '__main__':
         learning_rate=1e-4,
         lora_rank=8,
         lora_alpha=32,
-        target_modules='all-linear',
+        target_modules=['all-linear'],
         freeze_vit=True,
         freeze_aligner=True,
         gradient_accumulation_steps=1,
@@ -574,7 +582,7 @@ swift sft \
     --model Qwen/Qwen2.5-Omni-7B \
     --model_type my_qwen2_5_omni \
     --template my_qwen2_5_omni \
-    --custom_register_path 'examples/custom/my_qwen2_5_omni/my_register.py' \
+    --external_plugins 'examples/custom/my_qwen2_5_omni/my_register.py' \
    --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#2000' \
              'AI-ModelScope/LaTeX_OCR:human_handwrite#2000' \
              'speech_asr/speech_asr_aishell1_trainsets:validation#2000' \
@@ -586,7 +594,7 @@ swift sft \
     --attn_impl flash_attn \
     --padding_free true \
     --packing true \
-    --num_train_epochs 1 \
+    --num_train_epochs 3 \
     --per_device_train_batch_size 1 \
     --per_device_eval_batch_size 1 \
     --learning_rate 1e-4 \
@@ -618,7 +626,7 @@ MAX_PIXELS=1003520 \
 swift infer \
     --adapters output/vx-xxx/checkpoint-xxx \
     --stream true \
-    --max_new_tokens 2048 \
+    --max_new_tokens 512 \
     --load_data_args true
 ```
````

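The central change in this file, replacing a single `get_model_tokenizer_*` function with a `ModelLoader` subclass that overrides separate `get_config` / `get_processor` / `get_model` hooks, is the template-method pattern. A ms-swift-independent sketch of that shape (all names and signatures here are illustrative stand-ins, not the actual `swift.model` API):

```python
from typing import Any, Dict, Optional, Tuple


class BaseLoader:
    """Template-method base: `load` fixes the order of the three hooks;
    subclasses override only the hooks they need."""

    def __init__(self) -> None:
        self.auto_model_cls: Optional[type] = None

    def get_config(self, model_dir: str) -> Dict[str, Any]:
        # Stand-in for a transformers PretrainedConfig.
        return {'model_dir': model_dir}

    def get_processor(self, model_dir: str, config: Dict[str, Any]) -> str:
        # Stand-in for a tokenizer/processor object.
        return f'processor({model_dir})'

    def get_model(self, model_dir: str, config: Dict[str, Any], processor: str,
                  model_kwargs: Dict[str, Any]) -> Dict[str, Any]:
        cls = self.auto_model_cls or dict  # stand-in for an AutoModel class
        return cls(config)

    def load(self, model_dir: str, **model_kwargs) -> Tuple[Dict[str, Any], str]:
        config = self.get_config(model_dir)
        processor = self.get_processor(model_dir, config)
        model = self.get_model(model_dir, config, processor, model_kwargs)
        return model, processor


class OmniLoader(BaseLoader):
    """Only the config hook is customized, mirroring the Qwen2_5OmniLoader above."""

    def get_config(self, model_dir: str) -> Dict[str, Any]:
        config = super().get_config(model_dir)
        config['enable_audio_output'] = False  # env-var-driven in the real doc
        return config


model, processor = OmniLoader().load('Qwen/Qwen2.5-Omni-7B')
print(model)      # {'model_dir': 'Qwen/Qwen2.5-Omni-7B', 'enable_audio_output': False}
print(processor)  # processor(Qwen/Qwen2.5-Omni-7B)
```

The benefit over the old single-function style is that each stage can be overridden independently: the audio-output override lives in the config hook, the `qwen_vl_utils` patching in the processor hook, and the submodel/forward patching in the model hook.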
docs/source/BestPractices/Qwen3-VL-Best-Practice.md

Lines changed: 4 additions & 4 deletions
````diff
@@ -8,7 +8,7 @@
 ```shell
 pip install "transformers>=4.57" "qwen_vl_utils>=0.0.14"
 
-pip install "ms-swift>=3.9.1"
+pip install "ms-swift>=4.0"
 # pip install "vllm>=0.11.0"  # if using the vLLM inference backend
 ```
 - On slow training: with torch 2.9 you may hit slow training (the conv3d operator); please try torch 2.8 instead, see [this issue](https://github.com/pytorch/pytorch/issues/166122). With ms-swift>=3.11.2, you can work around it by setting `SWIFT_PATCH_CONV3D=1`; see [this issue](https://github.com/modelscope/ms-swift/issues/7108) for details.
@@ -71,7 +71,7 @@ print(output_text[0])
 # 'A baby wearing glasses sits on a bed, engrossed in reading a book. The baby turns the pages with both hands, occasionally looking up and smiling. The room is cozy, with a crib in the background and clothes scattered around. The baby’s focus and curiosity are evident as they explore the book, creating a heartwarming scene of early learning and discovery.'
 ```
 
-Inference using ms-swift's `PtEngine`:
+Inference using ms-swift's `TransformersEngine`:
 ```python
 import os
 # os.environ['SWIFT_DEBUG'] = '1'
@@ -80,8 +80,8 @@ os.environ['VIDEO_MAX_TOKEN_NUM'] = '128'
 os.environ['FPS_MAX_FRAMES'] = '16'
 
 
-from swift.llm import PtEngine, InferRequest, RequestConfig
-engine = PtEngine('Qwen/Qwen3-VL-4B-Instruct', attn_impl='flash_attention_2')
+from swift.infer_engine import TransformersEngine, InferRequest, RequestConfig
+engine = TransformersEngine('Qwen/Qwen3-VL-4B-Instruct')  # attn_impl='flash_attention_2'
 infer_request = InferRequest(messages=[{
     "role": "user",
     "content": '<video>Describe this video.',
````

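Several snippets in this commit configure behavior through typed environment variables (`ENABLE_AUDIO_OUTPUT` via `get_env_args`, plus `VIDEO_MAX_TOKEN_NUM` and `FPS_MAX_FRAMES` above). A minimal sketch of that pattern, a hypothetical stand-in rather than the actual `swift.utils.get_env_args` implementation:

```python
import os
from typing import Optional, Type, TypeVar

T = TypeVar('T')

def get_env_arg(name: str, type_: Type[T], default: Optional[T]) -> Optional[T]:
    """Read an environment variable and coerce it to `type_`, else return `default`."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    if type_ is bool:
        # Accept common truthy spellings; anything else counts as False.
        return raw.strip().lower() in ('1', 'true', 'yes', 'on')
    return type_(raw)

os.environ['VIDEO_MAX_TOKEN_NUM'] = '128'
os.environ.pop('ENABLE_AUDIO_OUTPUT', None)  # ensure it is unset for the demo
print(get_env_arg('VIDEO_MAX_TOKEN_NUM', int, 64))     # -> 128
print(get_env_arg('ENABLE_AUDIO_OUTPUT', bool, None))  # -> None
```

Returning `None` (rather than `False`) when a boolean variable is unset is what lets the loader code above distinguish "not configured" from "explicitly disabled" before touching `config.enable_audio_output`.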