diff --git a/.github/workflows/ai-evaluation.yml b/.github/workflows/ai-evaluation.yml
index d80c4749e..c68fe6441 100644
--- a/.github/workflows/ai-evaluation.yml
+++ b/.github/workflows/ai-evaluation.yml
@@ -54,4 +54,12 @@ jobs:
         run: |
           cd packages/evaluation
           pnpm run evaluate:locator
-          pnpm run evaluate:planning
\ No newline at end of file
+          pnpm run evaluate:planning
+
+      - name: Upload Logs
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: evaluation-logs
+          path: ${{ github.workspace }}/packages/evaluation/tests/__ai_responses__/
+          if-no-files-found: ignore
\ No newline at end of file
diff --git a/README.md b/README.md
index 59e140db8..07cf9ae83 100644
--- a/README.md
+++ b/README.md
@@ -43,7 +43,7 @@ Besides the default model *GPT-4o*, we have added two new recommended open-sourc
 - **Natural Language Interaction 👆**: Just describe your goals and steps, and Midscene will plan and operate the user interface for you.
 - **Chrome Extension Experience 🖥️**: Start experiencing immediately through the Chrome extension, no coding required.
 - **Puppeteer/Playwright Integration 🔧**: Supports Puppeteer and Playwright integration, allowing you to combine AI capabilities with these powerful automation tools for easy automation.
-- **Support Private Deployment 🤖**: Supports private deployment of [`UI-TARS`](https://github.com/bytedance/ui-tars) model, which outperforms closed-source models like GPT-4o and Claude in UI automation scenarios while better protecting data security.
+- **Support Open-Source Models 🤖**: Supports private deployment of [`UI-TARS`](https://github.com/bytedance/ui-tars) and [`Qwen2.5-VL`](https://github.com/QwenLM/Qwen2.5-VL), which outperforms closed-source models like GPT-4o and Claude in UI automation scenarios while better protecting data security.
 - **Support General Models 🌟**: Supports general large models like GPT-4o and Claude, adapting to various scenario needs.
 - **Visual Reports for Debugging 🎞️**: Through our test reports and Playground, you can easily understand, replay and debug the entire process.
 - **Support Caching 🔄**: The first time you execute a task through AI, it will be cached, and subsequent executions of the same task will significantly improve execution efficiency.
diff --git a/README.zh.md b/README.zh.md
index 3f5b11432..9828329c2 100644
--- a/README.zh.md
+++ b/README.zh.md
@@ -34,7 +34,7 @@ Midscene.js 让 AI 成为你的浏览器操作员 🤖。只需用自然语言
 | 用 JS 代码驱动编排任务,搜集周杰伦演唱会的信息,并写入 Google Docs |