
feat: add a separate vision model configuration with fallback to the main LLM #2

Merged
TangMeng12 merged 2 commits into
open-vela:dev from
whyovo:dev
Apr 16, 2026

Conversation

@whyovo
Contributor

@whyovo whyovo commented Apr 14, 2026

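Overview

  • Adds a set_vision_llm command that lets image analysis use a separate vision model while the text model continues to handle chat. For example, text can use mimo-v2-flash and only images use omni, which lowers cost. Following miloco's approach, a vision model such as miloco-vl-7b can also be deployed on your own server, which keeps images private while text goes to a faster, higher-performance API. When no vision (multimodal) model is configured, or after it is removed with set_vision_llm clear, image understanding continues to use the original text interface, so existing behavior is unchanged.
  • Fixes the mimo preset model name, changing MiMo-v2-Flash to mimo-v2-flash (all lowercase), as required by api.xiaomimimo.com.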

Changed files

| File | Change |
| --- | --- |
| include/agent_config.h | Adds the AGENT_CFG_KEY_VISION_MODEL/HOST/API_KEY config keys |
| src/llm/llm_proxy.c | Adds vision static variables, llm_snapshot_vision_config() (with fallback), and the llm_set_vision_model() setter |
| src/llm/llm_proxy.h | Declares llm_snapshot_vision_config and llm_set_vision_model |
| src/llm/llm_vision.c | llm_chat_vision and llm_chat_vision_raw now use the vision-specific config |
| src/channels/cmd_llm.c | Adds cmd_set_vision_llm (with four presets: mimo/openai/qwen/glm) and fixes the mimo model name |
| src/channels/cmd_llm.h | Declares cmd_set_vision_llm |
| src/channels/nsh_commands.c | Registers the command and updates the help text and config_show output |

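How it works

  • When vision_model is not configured: vision calls (analyze_image, camera_capture) automatically fall back to the main LLM configuration, so the change is fully backward compatible.
  • When vision_model is configured: vision calls use the separate configuration (which can specify a different host, model, and API key).
  • Chat always uses the main LLM and is unaffected.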
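
The fallback rule above can be sketched in a few lines of C. This is an illustration only, not the code merged in llm_proxy.c: the llm_cfg_t struct, its field sizes, and the snapshot_vision_cfg name are assumptions, and inheriting an empty host or key from the main config is also an assumption. The real function additionally reads shared state under a mutex, which the sketch leaves out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical config record; field names and sizes are assumptions. */
typedef struct {
    char model[64];
    char host[128];
    char api_key[128];
} llm_cfg_t;

/* Copy src into dst (capacity cap), always NUL-terminating. */
static void copy_field(char *dst, size_t cap, const char *src)
{
    snprintf(dst, cap, "%s", src);
}

/*
 * Fill `out` with the config a vision call should use.
 * An empty vision model means "not configured": use the main config whole.
 * Otherwise use the vision model, and let empty host/key fields inherit
 * the main values (this per-field inheritance is an assumption).
 */
static void snapshot_vision_cfg(const llm_cfg_t *main_cfg,
                                const llm_cfg_t *vision_cfg,
                                llm_cfg_t *out)
{
    if (vision_cfg->model[0] == '\0') {
        *out = *main_cfg;
        return;
    }
    copy_field(out->model, sizeof(out->model), vision_cfg->model);
    copy_field(out->host, sizeof(out->host),
               vision_cfg->host[0] ? vision_cfg->host : main_cfg->host);
    copy_field(out->api_key, sizeof(out->api_key),
               vision_cfg->api_key[0] ? vision_cfg->api_key : main_cfg->api_key);
}
```

Because the caller always receives a complete config, the vision entry points do not need to know whether a separate vision model was ever set.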

Usage

vela> set_vision_llm mimo <api_key>         # vision uses Xiaomi mimo-v2-omni
vela> set_vision_llm openai <api_key>       # vision uses OpenAI gpt-4o
vela> set_vision_llm qwen <api_key>         # vision uses Tongyi Qianwen qwen-vl-max
vela> set_vision_llm clear                  # clear the configuration and revert to the main LLM
vela> config_show                           # shows Vision Model/Host/Key
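
One way a preset name could map to a model is a small lookup table, sketched below. The vision_preset_t type and vision_preset_model function are hypothetical names, not taken from cmd_llm.c. The table holds only the three preset/model pairs listed in the usage above; the glm preset is omitted because its model name is not stated in this PR, and hosts are left out for the same reason.

```c
#include <stddef.h>
#include <string.h>

/* Preset name -> vision model, limited to the pairs listed in the usage. */
typedef struct {
    const char *name;
    const char *model;
} vision_preset_t;

static const vision_preset_t k_vision_presets[] = {
    { "mimo",   "mimo-v2-omni" },
    { "openai", "gpt-4o"       },
    { "qwen",   "qwen-vl-max"  },
};

/* Return the model for a preset name, or NULL if the name is unknown. */
static const char *vision_preset_model(const char *name)
{
    size_t n = sizeof(k_vision_presets) / sizeof(k_vision_presets[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(k_vision_presets[i].name, name) == 0) {
            return k_vision_presets[i].model;
        }
    }
    return NULL;
}
```

A command handler built this way would treat a NULL result as an unknown preset and print its help text.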

@github-actions

❌ CLA Signature Required

@whyovo Some contributors need to sign the CLA:

  • 1914457309@qq.com: Needs to sign CLA

Please:

  1. Sign the CLA at: https://www.openvela.com/#/community/cla
  2. After signing, comment /check-cla to recheck

📋 View detailed check results: Action Run #24410717863


💡 Tip: All contributors must sign the CLA before the PR can be merged.

@whyovo
Contributor Author

whyovo commented Apr 14, 2026

/check-cla

@github-actions

✅ CLA Verification Complete

@whyovo All contributors have signed the CLA!

  • 1914457309@qq.com

📋 View detailed check results: Action Run #24411063506

Your pull request can now proceed with the review process! 🎉

Add set_vision_llm command allowing users to configure a separate
vision-capable model (e.g. mimo-v2-omni, gpt-4o, qwen-vl-max) for
image analysis while keeping a cheaper text model for chat.

When vision_model is not configured, vision calls automatically fall
back to the main LLM config, maintaining full backward compatibility.

Changes:
- agent_config.h: add AGENT_CFG_KEY_VISION_* config keys
- llm_proxy.c: add vision static vars, snapshot, setter with fallback
- llm_proxy.h: declare llm_snapshot_vision_config, llm_set_vision_model
- llm_vision.c: use vision-specific config in both vision entry points
- cmd_llm.c: add cmd_set_vision_llm with 4 presets (mimo/openai/qwen/glm)
- nsh_commands.c: register command, update help text and config_show

Also fix mimo preset model name from MiMo-v2-Flash to mimo-v2-flash
(lowercase) as required by api.xiaomimimo.com.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread src/llm/llm_proxy.c Outdated
strncpy(s_vision_host, host, sizeof(s_vision_host) - 1);
s_vision_host[sizeof(s_vision_host) - 1] = '\0';
} else {
config_del(AGENT_CFG_KEY_VISION_HOST);
Collaborator

config_del is not a public API; please use claw_config_set instead.

Comment thread src/llm/llm_proxy.c

pthread_mutex_unlock(&s_llm_lock);

syslog(LOG_INFO, "[%s] Vision LLM config updated: model=%s host=%s\n",
Collaborator

The syslog call should be moved before the unlock.

- Replace config_del with claw_config_set(key, "") per review feedback,
  using the public API instead of the non-public config_del function
- Move syslog before pthread_mutex_unlock to avoid reading
  s_vision_model/s_vision_host after lock release

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
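
The two fixes in this commit can be illustrated with a reduced setter. This is a sketch under stated assumptions, not the merged code: claw_config_set_stub stands in for the project's claw_config_set, whose real signature is not shown in this PR, and the "vision_host" key string, the buffer sizes, and the set_vision_host name are hypothetical.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <syslog.h>

static pthread_mutex_t s_lock = PTHREAD_MUTEX_INITIALIZER;
static char s_vision_host[128];

/* Stand-in for the project's claw_config_set; the real signature is not
 * shown in this PR, so (key, value) -> int is an assumption. */
static char s_stored_host[128] = "unset";
static int claw_config_set_stub(const char *key, const char *value)
{
    (void)key;
    snprintf(s_stored_host, sizeof(s_stored_host), "%s", value);
    return 0;
}

/* Set or clear the vision host. NULL or "" clears it by storing "". */
static void set_vision_host(const char *host)
{
    pthread_mutex_lock(&s_lock);
    snprintf(s_vision_host, sizeof(s_vision_host), "%s", host ? host : "");
    /* Clearing writes an empty value instead of deleting the key. */
    claw_config_set_stub("vision_host", s_vision_host);
    /* Log while the lock is still held so the value cannot change mid-read. */
    syslog(LOG_INFO, "Vision LLM config updated: host=%s\n", s_vision_host);
    pthread_mutex_unlock(&s_lock);
}
```

Logging under the lock keeps the critical section slightly longer; copying the value to a local buffer and logging after the unlock would be an alternative with the same safety.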
@whyovo
Contributor Author

whyovo commented Apr 15, 2026

Thank you for your feedback! I've updated the code accordingly.

@TangMeng12 TangMeng12 merged commit 2b48a9d into open-vela:dev Apr 16, 2026
5 of 8 checks passed