Description
MiMo-V2.5 is a native omnimodal model (text, image, video, audio) from Xiaomi, but when configured as a custom provider in crush.json, Crush returns:
This model (MiMo-V2.5) does not support image data.
MiMo-V2.5 is not listed in Crush's internal model registry, so Crush blocks image input even though the model supports it natively via a 729M-param Vision Transformer encoder.
Expected Behavior
Custom provider models should be able to receive image data, either:
- By adding a vision: true (or capabilities) field to custom model definitions in crush.json
- By adding MiMo-V2.5 to the vision-capable model list in Catwalk
Current Model Config
{
  "providers": {
    "xiaomi": {
      "type": "anthropic",
      "base_url": "https://token-plan-sgp.xiaomimimo.com/anthropic",
      "models": [
        {
          "id": "mimo-v2.5",
          "name": "MiMo-V2.5",
          "context_window": 262144
        }
      ]
    }
  }
}
Suggested Fix
Allow custom provider models to declare capabilities, e.g.:
{
  "id": "mimo-v2.5",
  "name": "MiMo-V2.5",
  "context_window": 262144,
  "vision": true
}
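To illustrate the idea, here is a minimal Go sketch of how a declared capability could be honored. It is not Crush's actual code: the Model struct, the parseModel and supportsImages helpers, and the knownVision map (standing in for the built-in registry lookup) are all hypothetical names, and the vision field is the one proposed in this issue, not an existing option.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Model is a hypothetical custom-model entry from crush.json.
// Vision is the field proposed in this issue, not an existing option.
type Model struct {
	ID            string `json:"id"`
	Name          string `json:"name"`
	ContextWindow int    `json:"context_window"`
	Vision        bool   `json:"vision"`
}

// parseModel decodes a single model definition from JSON.
func parseModel(data []byte) (Model, error) {
	var m Model
	err := json.Unmarshal(data, &m)
	return m, err
}

// supportsImages is a hypothetical gate: image input is allowed when the
// user declared vision support, or when the model ID is already known to be
// vision-capable (knownVision stands in for the built-in registry).
func supportsImages(m Model, knownVision map[string]bool) bool {
	return m.Vision || knownVision[m.ID]
}

func main() {
	raw := []byte(`{"id": "mimo-v2.5", "name": "MiMo-V2.5", "context_window": 262144, "vision": true}`)
	m, err := parseModel(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// The registry is empty here, so the result depends only on the declared field.
	fmt.Println("image input allowed:", supportsImages(m, map[string]bool{}))
}
```

With a check along these lines, a model that omits the field would keep today's behavior, since a missing vision key decodes to false.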