2 changes: 2 additions & 0 deletions docs/source/Instruction/Supported-models-and-datasets.md
@@ -647,6 +647,8 @@
|[PaddlePaddle/ERNIE-4.5-21B-A3B-Thinking](https://modelscope.cn/models/PaddlePaddle/ERNIE-4.5-21B-A3B-Thinking)|ernie_thinking|ernie_thinking|-|✔|-|[baidu/ERNIE-4.5-21B-A3B-Thinking](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking)|
|[meituan-longcat/LongCat-Flash-Chat](https://modelscope.cn/models/meituan-longcat/LongCat-Flash-Chat)|longchat|longchat|transformers>=4.54,<4.56|✘|-|[meituan-longcat/LongCat-Flash-Chat](https://huggingface.co/meituan-longcat/LongCat-Flash-Chat)|
|[meituan-longcat/LongCat-Flash-Chat-FP8](https://modelscope.cn/models/meituan-longcat/LongCat-Flash-Chat-FP8)|longchat|longchat|transformers>=4.54,<4.56|✘|-|[meituan-longcat/LongCat-Flash-Chat-FP8](https://huggingface.co/meituan-longcat/LongCat-Flash-Chat-FP8)|
|[XiaomiMiMo/MiMo-V2-Flash](https://modelscope.cn/models/XiaomiMiMo/MiMo-V2-Flash)|mimo_v2|mimo_v2|-|✘|-|[XiaomiMiMo/MiMo-V2-Flash](https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash)|
|[XiaomiMiMo/MiMo-V2-Flash-Base](https://modelscope.cn/models/XiaomiMiMo/MiMo-V2-Flash-Base)|mimo_v2|mimo_v2|-|✘|-|[XiaomiMiMo/MiMo-V2-Flash-Base](https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash-Base)|
|[answerdotai/ModernBERT-base](https://modelscope.cn/models/answerdotai/ModernBERT-base)|modern_bert|dummy|transformers>=4.48|✘|bert|[answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)|
|[answerdotai/ModernBERT-large](https://modelscope.cn/models/answerdotai/ModernBERT-large)|modern_bert|dummy|transformers>=4.48|✘|bert|[answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large)|
|[iic/gte-modernbert-base](https://modelscope.cn/models/iic/gte-modernbert-base)|modern_bert_gte|dummy|transformers>=4.48|✘|bert, embedding|[Alibaba-NLP/gte-modernbert-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base)|
2 changes: 2 additions & 0 deletions docs/source_en/Instruction/Supported-models-and-datasets.md
@@ -649,6 +649,8 @@ The table below introduces the models integrated with ms-swift:
|[PaddlePaddle/ERNIE-4.5-21B-A3B-Thinking](https://modelscope.cn/models/PaddlePaddle/ERNIE-4.5-21B-A3B-Thinking)|ernie_thinking|ernie_thinking|-|✔|-|[baidu/ERNIE-4.5-21B-A3B-Thinking](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking)|
|[meituan-longcat/LongCat-Flash-Chat](https://modelscope.cn/models/meituan-longcat/LongCat-Flash-Chat)|longchat|longchat|transformers>=4.54,<4.56|✘|-|[meituan-longcat/LongCat-Flash-Chat](https://huggingface.co/meituan-longcat/LongCat-Flash-Chat)|
|[meituan-longcat/LongCat-Flash-Chat-FP8](https://modelscope.cn/models/meituan-longcat/LongCat-Flash-Chat-FP8)|longchat|longchat|transformers>=4.54,<4.56|✘|-|[meituan-longcat/LongCat-Flash-Chat-FP8](https://huggingface.co/meituan-longcat/LongCat-Flash-Chat-FP8)|
|[XiaomiMiMo/MiMo-V2-Flash](https://modelscope.cn/models/XiaomiMiMo/MiMo-V2-Flash)|mimo_v2|mimo_v2|-|✘|-|[XiaomiMiMo/MiMo-V2-Flash](https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash)|
|[XiaomiMiMo/MiMo-V2-Flash-Base](https://modelscope.cn/models/XiaomiMiMo/MiMo-V2-Flash-Base)|mimo_v2|mimo_v2|-|✘|-|[XiaomiMiMo/MiMo-V2-Flash-Base](https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash-Base)|
|[answerdotai/ModernBERT-base](https://modelscope.cn/models/answerdotai/ModernBERT-base)|modern_bert|dummy|transformers>=4.48|✘|bert|[answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)|
|[answerdotai/ModernBERT-large](https://modelscope.cn/models/answerdotai/ModernBERT-large)|modern_bert|dummy|transformers>=4.48|✘|bert|[answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large)|
|[iic/gte-modernbert-base](https://modelscope.cn/models/iic/gte-modernbert-base)|modern_bert_gte|dummy|transformers>=4.48|✘|bert, embedding|[Alibaba-NLP/gte-modernbert-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base)|
1 change: 1 addition & 0 deletions swift/llm/model/constant.py
@@ -141,6 +141,7 @@ class LLMModelType:
    gemma_emb = 'gemma_emb'
    ernie_thinking = 'ernie_thinking'
    longchat = 'longchat'
    mimo_v2 = 'mimo_v2'


class BertModelType:
14 changes: 14 additions & 0 deletions swift/llm/model/model/llm.py
@@ -397,3 +397,17 @@ def get_model_tokenizer_yuan(model_dir: str,
        get_model_tokenizer_with_flash_attn,
        architectures=['BailingMoeV2ForCausalLM'],
    ))

register_model(
    ModelMeta(
        LLMModelType.mimo_v2,
        [
            ModelGroup([
                Model('XiaomiMiMo/MiMo-V2-Flash', 'XiaomiMiMo/MiMo-V2-Flash'),
                Model('XiaomiMiMo/MiMo-V2-Flash-Base', 'XiaomiMiMo/MiMo-V2-Flash-Base'),
            ])
        ],
        TemplateType.mimo_v2,
        get_model_tokenizer_with_flash_attn,
        architectures=['MiMoV2FlashForCausalLM'],
    ))
1 change: 1 addition & 0 deletions swift/llm/template/constant.py
@@ -104,6 +104,7 @@ class LLMTemplateType:
    ernie = 'ernie'
    ernie_thinking = 'ernie_thinking'
    longchat = 'longchat'
    mimo_v2 = 'mimo_v2'

    aya = 'aya'
    c4ai = 'c4ai'
7 changes: 7 additions & 0 deletions swift/llm/template/template/llm.py
@@ -424,3 +424,10 @@ class GptOssTemplateMeta(TemplateMeta):
        is_thinking=True,
        thinking_prefix='<think>\n',
    ))

register_template(
    ChatmlTemplateMeta(
        LLMTemplateType.mimo_v2,
        default_system='You are MiMo, a helpful AI assistant engineered by Xiaomi.',
        response_prefix='<think></think>',
Contributor review comment (severity: medium):

The response_prefix is set to <think></think>, an empty tag pair. This is inconsistent with other "thinking" templates such as ring2 and gpt_oss, which use <think>\n so the model can generate its thoughts. An empty pair may not be the intended behavior: if the model is not meant to produce thoughts, <think>\n</think>\n would mark the empty thinking step explicitly; if it is, <think>\n would be more appropriate. (An illustrative prompt sketch follows the diff below.)

Suggested change (replace the first line with the second):
        response_prefix='<think></think>',
        response_prefix='<think>\n</think>\n',

    ))
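
For illustration only, here is a minimal sketch of how each candidate response_prefix shapes the assembled ChatML-style prompt. This is not ms-swift's actual template rendering, and build_prompt is a hypothetical helper; it simply assumes the prefix is appended right after the assistant turn marker, which is where generation continues.

# Minimal sketch: generation continues right after the prefix, so the prefix
# decides whether the model starts inside an open <think> block or skips
# straight to the final answer. build_prompt is hypothetical, for illustration only.
def build_prompt(user_msg: str, response_prefix: str) -> str:
    system = 'You are MiMo, a helpful AI assistant engineered by Xiaomi.'
    return (f'<|im_start|>system\n{system}<|im_end|>\n'
            f'<|im_start|>user\n{user_msg}<|im_end|>\n'
            f'<|im_start|>assistant\n{response_prefix}')

# '<think>\n'           -> generation starts inside an open think block (model writes thoughts).
# '<think></think>'     -> the think block is already closed; the model answers directly.
# '<think>\n</think>\n' -> same effect, but the empty thinking step is written out explicitly.
print(build_prompt('Hello', '<think>\n'))
print(build_prompt('Hello', '<think></think>'))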