41 changes: 14 additions & 27 deletions apps/site/docs/en/model-common-config.mdx
@@ -74,37 +74,21 @@ MIDSCENE_MODEL_FAMILY="doubao-seed" # "doubao-vision" is also supported
# Optional: control reasoning effort (low, medium, high)
# MIDSCENE_MODEL_REASONING_EFFORT="medium"
```
### Qwen3.6 {#qwen36}
### Qwen3.X Series {#qwen3x}

Using Alibaba Cloud's `qwen3.6-plus` as an example. It is recommended to disable the platform's default thinking mode to improve execution speed; the environment variable configuration is as follows:
Qwen3.5 and Qwen3.6 are currently supported in the Qwen3.X series.

```bash
MIDSCENE_MODEL_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
MIDSCENE_MODEL_API_KEY="......"
MIDSCENE_MODEL_NAME="qwen3.6-plus"
MIDSCENE_MODEL_FAMILY="qwen3.6"
MIDSCENE_MODEL_REASONING_ENABLED="false"
```

To enable thinking mode, remove the `MIDSCENE_MODEL_REASONING_ENABLED="false"` line and add `MIDSCENE_MODEL_REASONING_BUDGET="500"` to control thinking cost.

You can also use Qwen3.6 from [OpenRouter](https://openrouter.ai/qwen).

### Qwen3.5 {#qwen35}

Using Alibaba Cloud's `qwen3.5-plus` as an example. It is recommended to disable the platform's default thinking mode to improve execution speed; the environment variable configuration is as follows:
Using Alibaba Cloud's `qwen3.5-plus` as an example. It is recommended to disable the platform's default thinking mode to improve execution speed:

```bash
MIDSCENE_MODEL_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
MIDSCENE_MODEL_API_KEY="......"
MIDSCENE_MODEL_NAME="qwen3.5-plus"
MIDSCENE_MODEL_FAMILY="qwen3.5"
MIDSCENE_MODEL_NAME="qwen3.5-plus" # For Qwen3.6, use "qwen3.6-plus"
MIDSCENE_MODEL_FAMILY="qwen3.5" # For Qwen3.6, use "qwen3.6"
MIDSCENE_MODEL_REASONING_ENABLED="false"
```

To enable thinking mode, remove the `MIDSCENE_MODEL_REASONING_ENABLED="false"` line and add `MIDSCENE_MODEL_REASONING_BUDGET="500"` to control thinking cost.

You can also use Qwen3.5 from [OpenRouter](https://openrouter.ai/qwen).
To enable thinking mode, set `MIDSCENE_MODEL_REASONING_ENABLED="true"` and add `MIDSCENE_MODEL_REASONING_BUDGET="500"` to control thinking cost.
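
As a minimal sketch of that toggle, the two variables named above can be exported together; the `500` budget is the value suggested in this doc, and the `echo` is only a local sanity check:

```shell
# Enable thinking mode and cap its cost, using the variables described above.
export MIDSCENE_MODEL_REASONING_ENABLED="true"
export MIDSCENE_MODEL_REASONING_BUDGET="500"

# Sanity-check that both variables are set in the current environment.
echo "reasoning=${MIDSCENE_MODEL_REASONING_ENABLED} budget=${MIDSCENE_MODEL_REASONING_BUDGET}"
```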

### Qwen3-VL {#qwen3-vl}

@@ -119,20 +103,23 @@ MIDSCENE_MODEL_FAMILY="qwen3-vl"

You can also use Qwen3-VL from [OpenRouter](https://openrouter.ai/qwen).
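
For the OpenRouter route, a hedged sketch of the same variables pointed at OpenRouter's OpenAI-compatible endpoint follows. The model ID `"qwen/qwen3-vl-plus"` is an assumption for illustration; check OpenRouter's model list for the actual identifier:

```shell
# Hypothetical sketch: route Midscene through OpenRouter's OpenAI-compatible API.
# The model ID below is an assumption -- look up the real ID on openrouter.ai.
export MIDSCENE_MODEL_BASE_URL="https://openrouter.ai/api/v1"
export MIDSCENE_MODEL_API_KEY="......"
export MIDSCENE_MODEL_NAME="qwen/qwen3-vl-plus"
export MIDSCENE_MODEL_FAMILY="qwen3-vl"

# Confirm the endpoint and model the SDK will see.
echo "${MIDSCENE_MODEL_BASE_URL} -> ${MIDSCENE_MODEL_NAME}"
```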

### Zhipu GLM-V {#glm-v}
### Zhipu GLM-V Series {#glm-v}

Zhipu GLM-V is an open-source vision model from Zhipu AI. Using `GLM-4.6V` as an example:
Zhipu GLM-V is a vision understanding model from Zhipu AI. The latest versions include `GLM-4.6V` (open-source) and `GLM-5V-Turbo`.

It is recommended to disable the platform's default thinking mode to improve execution speed. Using `GLM-4.6V` as an example:

Obtain an API key from [Z.AI (Global)](https://z.ai/manage-apikey/apikey-list) or [BigModel (CN)](https://bigmodel.cn/usercenter/proj-mgmt/apikeys), and set:

```bash
MIDSCENE_MODEL_BASE_URL="https://api.z.ai/api/paas/v4" # Or https://open.bigmodel.cn/api/paas/v4
MIDSCENE_MODEL_BASE_URL="https://open.bigmodel.cn/api/paas/v4" # Or https://api.z.ai/api/paas/v4
MIDSCENE_MODEL_API_KEY="......"
MIDSCENE_MODEL_NAME="glm-4.6v"
MIDSCENE_MODEL_NAME="glm-4.6v" # For GLM-5V-Turbo, use "glm-5v-turbo"
MIDSCENE_MODEL_FAMILY="glm-v"
MIDSCENE_MODEL_REASONING_ENABLED="false"
```

**Learn more about Zhipu GLM-V**
**Learn more about the open-source Zhipu GLM-V model**

- Github: [https://github.com/zai-org/GLM-V](https://github.com/zai-org/GLM-V)
- Hugging Face: [https://huggingface.co/zai-org/GLM-4.6V](https://huggingface.co/zai-org/GLM-4.6V)
4 changes: 2 additions & 2 deletions apps/site/docs/en/model-strategy.mdx
@@ -6,7 +6,7 @@ import TroubleshootingLLMConnectivity from './common/troubleshooting-llm-connect

If you want to try Midscene right away, pick a model and follow its configuration guide:
* [Doubao Seed Model](./model-common-config#doubao-seed-model)
* Qwen Models: [Qwen3.6](./model-common-config#qwen36), [Qwen3.5](./model-common-config#qwen35), [Qwen3-VL](./model-common-config#qwen3-vl)
* Qwen Models: [Qwen3.X Series](./model-common-config#qwen3x), [Qwen3-VL](./model-common-config#qwen3-vl)
* [Zhipu GLM-V](./model-common-config#glm-v)
* [Zhipu AutoGLM](./model-common-config#auto-glm)
* [Gemini-3-Pro / Gemini-3-Flash](./model-common-config#gemini-3-pro)
@@ -53,7 +53,7 @@ If you are unsure where to start, pick whichever model is easiest to access today
| Model family | Deployment | Midscene notes |
| --- | --- | --- |
| Doubao Seed Model<br />[Quick setup](./model-common-config#doubao-seed-model) | Volcano Engine:<br />[Doubao-Seed-1.6-Vision](https://www.volcengine.com/docs/82379/1799865)<br />[Doubao-Seed-2.0-Lite](https://www.volcengine.com/docs/82379/1799865) | ⭐⭐⭐⭐<br />Strong at UI planning and targeting<br />Slightly slower |
| Qwen3.5<br />[Quick setup](./model-common-config#qwen35) | [Alibaba Cloud](https://help.aliyun.com/zh/model-studio/vision)<br/>[OpenRouter](https://openrouter.ai/qwen) | ⭐⭐⭐⭐<br />Stronger than Qwen3-VL and Qwen2.5-VL |
| Qwen3.5<br />[Quick setup](./model-common-config#qwen3x) | [Alibaba Cloud](https://help.aliyun.com/zh/model-studio/vision)<br/>[OpenRouter](https://openrouter.ai/qwen) | ⭐⭐⭐⭐<br />Stronger than Qwen3-VL and Qwen2.5-VL |
| Zhipu GLM-4.6V<br />[Quick setup](./model-common-config#glm-v) | [Z.AI (Global)](https://docs.z.ai/guides/vlm/glm-4.6v)<br/>[BigModel (CN)](https://docs.bigmodel.cn/cn/guide/models/vlm/glm-4.6v) | Newly integrated, welcome to try it out<br />Weights open-sourced on [HuggingFace](https://huggingface.co/zai-org/GLM-4.6V) |
| Gemini-3-Pro / Gemini-3-Flash<br />[Quick setup](./model-common-config#gemini-3-pro) | [Google Cloud](https://ai.google.dev/gemini-api/docs/models/gemini) | ⭐⭐⭐<br />Gemini-3-Flash is supported<br />Price is higher than Doubao and Qwen |
| UI-TARS<br />[Quick setup](./model-common-config#ui-tars) | [Volcano Engine](https://www.volcengine.com/docs/82379/1536429) | ⭐⭐<br />Strong exploratory ability but results vary by scenario<br />Open-source versions available ([HuggingFace](https://huggingface.co/bytedance-research/UI-TARS-72B-SFT) / [GitHub](https://github.com/bytedance/ui-tars)) |
31 changes: 11 additions & 20 deletions apps/site/docs/zh/model-common-config.mdx
@@ -78,29 +78,17 @@ MIDSCENE_MODEL_FAMILY="doubao-seed" # "doubao-vision" is also supported
# MIDSCENE_MODEL_REASONING_EFFORT="medium"
```

### Qwen3.6 {#qwen36}
### Qwen3.X Series {#qwen3x}

Using [Alibaba Cloud](https://www.aliyun.com/)'s `qwen3.6-plus` model as an example. It is recommended to disable the platform's default thinking mode to improve execution speed; the environment variable configuration is as follows:

```bash
MIDSCENE_MODEL_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
MIDSCENE_MODEL_API_KEY="......"
MIDSCENE_MODEL_NAME="qwen3.6-plus"
MIDSCENE_MODEL_FAMILY="qwen3.6"
MIDSCENE_MODEL_REASONING_ENABLED="false"
```

To enable thinking mode, set `MIDSCENE_MODEL_REASONING_ENABLED="true"` and add `MIDSCENE_MODEL_REASONING_BUDGET="500"` to control thinking cost.

### Qwen3.5 {#qwen35}
Qwen3.5 and Qwen3.6 are currently supported in the Qwen3.X series.

Using [Alibaba Cloud](https://www.aliyun.com/)'s `qwen3.5-plus` model as an example. It is recommended to disable the platform's default thinking mode to improve execution speed; the environment variable configuration is as follows:

```bash
MIDSCENE_MODEL_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
MIDSCENE_MODEL_API_KEY="......"
MIDSCENE_MODEL_NAME="qwen3.5-plus"
MIDSCENE_MODEL_FAMILY="qwen3.5"
MIDSCENE_MODEL_NAME="qwen3.5-plus" # For Qwen3.6, the plus model is "qwen3.6-plus"
MIDSCENE_MODEL_FAMILY="qwen3.5" # For Qwen3.6, the family is "qwen3.6"
MIDSCENE_MODEL_REASONING_ENABLED="false"
```

@@ -117,20 +105,23 @@ MIDSCENE_MODEL_NAME="qwen3-vl-plus"
MIDSCENE_MODEL_FAMILY="qwen3-vl"
```

### Zhipu GLM-V {#glm-v}
### Zhipu GLM-V Series {#glm-v}

Zhipu GLM-V is an open-source vision model from Zhipu AI. Using `GLM-4.6V` as an example:
Zhipu GLM-V is a vision understanding model from Zhipu AI. The latest versions include `GLM-4.6V` (open-source) and `GLM-5V-Turbo`.

It is recommended to disable the platform's default thinking mode to improve execution speed. Using `GLM-4.6V` as an example:

Obtain an API key from [Z.AI (Global)](https://z.ai/manage-apikey/apikey-list) or [BigModel (CN)](https://bigmodel.cn/usercenter/proj-mgmt/apikeys), and set:

```bash
MIDSCENE_MODEL_BASE_URL="https://open.bigmodel.cn/api/paas/v4" # Or https://api.z.ai/api/paas/v4
MIDSCENE_MODEL_API_KEY="......"
MIDSCENE_MODEL_NAME="glm-4.6v"
MIDSCENE_MODEL_NAME="glm-4.6v" # For GLM-5V-Turbo, use "glm-5v-turbo"
MIDSCENE_MODEL_FAMILY="glm-v"
MIDSCENE_MODEL_REASONING_ENABLED="false"
```

**Learn more about Zhipu GLM-V**
**Learn more about the open-source Zhipu GLM-V model**

- Github: [https://github.com/zai-org/GLM-V](https://github.com/zai-org/GLM-V)
- Hugging Face: [https://huggingface.co/zai-org/GLM-4.6V](https://huggingface.co/zai-org/GLM-4.6V)
4 changes: 2 additions & 2 deletions apps/site/docs/zh/model-strategy.mdx
@@ -6,7 +6,7 @@ import TroubleshootingLLMConnectivity from './common/troubleshooting-llm-connect

If you want to try Midscene right away, pick a model and follow its configuration guide:
* [Doubao Seed Model](./model-common-config#doubao-seed-model)
* Qwen Models: [Qwen3.6](./model-common-config#qwen36), [Qwen3.5](./model-common-config#qwen35), [Qwen3-VL](./model-common-config#qwen3-vl)
* Qwen Models: [Qwen3.X Series](./model-common-config#qwen3x), [Qwen3-VL](./model-common-config#qwen3-vl)
* [Zhipu GLM-V](./model-common-config#glm-v)
* [Zhipu AutoGLM](./model-common-config#auto-glm)
* [Gemini-3-Pro / Gemini-3-Flash](./model-common-config#gemini-3-pro)
@@ -53,7 +53,7 @@ The stability of the DOM-based locating approach falls short of expectations; it often fails on Canvas elements, CSS backgro
| Model family | Deployment | Midscene notes |
|---|---|---|
| Doubao Seed Model<br />[Quick setup](./model-common-config#doubao-seed-model) | Volcano Engine:<br />[Doubao-Seed-1.6-Vision](https://www.volcengine.com/docs/82379/1799865)<br />[Doubao-Seed-2.0-Lite](https://www.volcengine.com/docs/82379/1799865) | ⭐⭐⭐⭐<br />Strong at UI planning and targeting<br />Slightly slower |
| Qwen3.5<br />[Quick setup](./model-common-config#qwen35) | [Alibaba Cloud](https://help.aliyun.com/zh/model-studio/vision)<br/>[OpenRouter](https://openrouter.ai/qwen) | ⭐⭐⭐⭐<br />Stronger than Qwen3-VL and Qwen2.5-VL |
| Qwen3.5<br />[Quick setup](./model-common-config#qwen3x) | [Alibaba Cloud](https://help.aliyun.com/zh/model-studio/vision)<br/>[OpenRouter](https://openrouter.ai/qwen) | ⭐⭐⭐⭐<br />Stronger than Qwen3-VL and Qwen2.5-VL |
| Zhipu GLM-4.6V<br />[Quick setup](./model-common-config#glm-v) | [Z.AI (Global)](https://docs.z.ai/guides/vlm/glm-4.6v)<br/>[BigModel (CN)](https://docs.bigmodel.cn/cn/guide/models/vlm/glm-4.6v) | Newly integrated, welcome to try it out<br />Weights open-sourced on [HuggingFace](https://huggingface.co/zai-org/GLM-4.6V) |
| Gemini-3-Pro / Gemini-3-Flash<br />[Quick setup](./model-common-config#gemini-3-pro) | [Google Cloud](https://ai.google.dev/gemini-api/docs/models/gemini) | ⭐⭐⭐<br />Gemini-3-Flash is supported<br />Price is higher than Doubao and Qwen |
| UI-TARS<br />[Quick setup](./model-common-config#ui-tars) | [Volcano Engine](https://www.volcengine.com/docs/82379/1536429) | ⭐⭐<br />Strong exploratory ability but results vary by scenario<br />Open-source versions available ([HuggingFace](https://huggingface.co/bytedance-research/UI-TARS-72B-SFT) / [GitHub](https://github.com/bytedance/ui-tars)) |
Expand Down