Commit 3350fc7

feat: update visualizer, and cache info field (#24)
* fix: page size
* feat: update quick start
* feat: update quick start
* feat: update quick start
* feat: update quick start
* feat: update visualizer
* feat: add cache info into task and dump
* feat: add cache info into task and dump
* fix: typo
* fix: typo
* feat: update docs
* feat: update docs
1 parent 8ae566c commit 3350fc7

23 files changed

+197
-110
lines changed

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,5 @@
11
[
22
"introduction",
3-
"quick-start.md"
3+
"quick-start.md",
4+
"demo.md"
45
]
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
# Demo Projects
2+
3+
You can clone a complete demo project in this repo: https://github.com/web-infra-dev/midscene-example/
4+
5+
The repository contains separate folders for different types of projects:
6+
7+
* [Playwright-demo](https://github.com/web-infra-dev/midscene-example/blob/main/playwright-demo)
8+
* [Puppeteer-demo](https://github.com/web-infra-dev/midscene-example/blob/main/puppeteer-demo)

apps/site/docs/en/docs/getting-started/introduction.mdx

Lines changed: 1 addition & 5 deletions
@@ -1,9 +1,5 @@
11
# Introduction
22

3-
<video controls>
4-
<source src="/MidScene_L.mp4" type="video/mp4" />
5-
</video>
6-
73
UI automation can be frustrating, often involving a maze of *#ids*, *data-test-xxx* attributes, and *.selectors* that are difficult to maintain, especially when the page undergoes a refactor.
84

95
Introducing MidScene.js, an innovative SDK designed to bring joy back to programming by simplifying automation tasks.
@@ -38,7 +34,7 @@ With our visualization tool, you can easily debug the prompt and AI response. Al
3834

3935
You may open the [Online Visualization Tool](/visualization/index.html) to see the showcase.
4036

41-
![](/Visualizer.gif)
37+
![](/visualizer.jpg)
4238

4339
## Flow Chart
4440

apps/site/docs/en/docs/getting-started/quick-start.md

Lines changed: 5 additions & 17 deletions
@@ -1,6 +1,6 @@
11
# Quick Start
22

3-
In this example, we use OpenAI GPT-4o to search headphones on ebay, and then get the result items and prices in JSON format.
3+
In this example, we use OpenAI GPT-4o to search headphones on eBay, and then get the result items and prices in JSON format.
44

55
Remember to prepare an API key that is eligible for accessing OpenAI's GPT-4o before running.
66

@@ -13,14 +13,6 @@ Config the API key
1313
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
1414
```
1515

16-
Install Dependencies
17-
18-
```bash
19-
npm install @midscene/webaeb --save-dev
20-
# for demo use
21-
npm install puppeteer ts-node --save-dev
22-
```
23-
2416
## Integrate with Playwright
2517

2618
> [Playwright.js](https://playwright.com/) is an open-source automation library developed by Microsoft, primarily designed for end-to-end testing and web scraping of web applications.
@@ -92,10 +84,11 @@ npx playwright test ./e2e/ebay-search.spec.ts
9284

9385
### Step 5. view test report after running
9486

95-
Follow the instructions in the command line to server the report
87+
Follow the instructions in the command line to serve the report.
9688

9789
```bash
98-
90+
# sample command
91+
npx http-server ./midscene_run/report -p 9888 -o -s
9992
```
10093

10194
## Integrate with Puppeteer
@@ -165,7 +158,7 @@ await mid.aiQuery(
165158

166159
### Step 3. run
167160

168-
Using ts-node to run, you will get the data of Headphones on ebay:
161+
Using ts-node to run, you will get the data of Headphones on eBay:
169162

170163
```bash
171164
# run
@@ -189,8 +182,3 @@ npx ts-node demo.ts
189182
After running, MidScene will generate a log dump, which is placed in `./midscene_run/report/latest.web-dump.json` by default. Then put this file into [Visualization Tool](/visualization/), and you will have a clearer understanding of the process.
190183

191184
Click the 'Load Demo' button in the [Visualization Tool](/visualization/), you will be able to see the results of the previous code as well as some other samples.
192-
193-
194-
## Demo Projects
195-
196-
You can clone a complete demo project in this repo: https://github.com/web-infra-dev/midscene-example/

apps/site/docs/en/docs/more/faq.md

Lines changed: 8 additions & 7 deletions
@@ -23,16 +23,17 @@ MidScene needs a multimodal Large Language Model (LLM) to understand the UI. Cur
2323

2424
### About the token cost
2525

26-
Image resolution and element numbers (i.e., a UI context size created by MidScene) form the token bill.
26+
Image resolution and the number of elements (i.e., the size of the UI context created by MidScene) affect the token bill.
2727

28-
Here are some typical data.
28+
Here is some typical data with GPT-4o.
2929

30-
|Task | Resolution | Input tokens | Output tokens | GPT-4o Price |
31-
|-----|------------|--------------|---------------|----------------|
32-
|Find the download button on the VSCode website| 1920x1080| 2011|54| $0.011|
33-
|Split the Github status page| 1920x1080| 3609|1020| $0.034|
30+
|Task | Resolution | Prompt Tokens / Price | Completion Tokens / Price |
31+
|-----|------------|--------------|---------------|
32+
|Plan the steps to search on eBay homepage| 1280x800 | 6,975 / $0.034875 |150 / $0.00225|
33+
|Locate the search box on the eBay homepage| 1280x800 | 8,004 / $0.04002 | 92 / $0.00138|
34+
|Query the information about the item in the search results| 1280x800 | 13,403 / $0.067015 | 95 / $0.001425|
3435

35-
> The price data was calculated in June 2024.
36+
> The price data was calculated in August 2024.
3637
3738
### The automation process is running more slowly than it did before
3839
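
The prices in the new table appear to follow a simple per-token calculation. The sketch below reproduces the first and last rows under the assumption that prompt tokens cost $5 per million and completion tokens cost $15 per million. These rates are inferred from the table itself (for example, 6,975 prompt tokens priced at $0.034875) and are not stated in the docs; the names `estimateCost`, `TokenUsage`, and `TokenCost` are hypothetical and not part of MidScene.

```typescript
// Assumed rates, inferred from the table rows (not stated in the docs).
const PROMPT_USD_PER_MILLION = 5;
const COMPLETION_USD_PER_MILLION = 15;

interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
}

interface TokenCost {
  promptUsd: number;
  completionUsd: number;
  totalUsd: number;
}

// Round to 6 decimal places to hide floating-point noise.
function roundUsd(value: number): number {
  return Math.round(value * 1e6) / 1e6;
}

// Hypothetical helper: converts token counts into an estimated USD cost.
function estimateCost(usage: TokenUsage): TokenCost {
  const promptUsd = roundUsd((usage.promptTokens * PROMPT_USD_PER_MILLION) / 1e6);
  const completionUsd = roundUsd(
    (usage.completionTokens * COMPLETION_USD_PER_MILLION) / 1e6,
  );
  return {
    promptUsd,
    completionUsd,
    totalUsd: roundUsd(promptUsd + completionUsd),
  };
}

// "Plan the steps to search on eBay homepage" row: 6,975 prompt / 150 completion tokens.
const planCost = estimateCost({ promptTokens: 6975, completionTokens: 150 });

// "Query the information about the item" row: 13,403 prompt / 95 completion tokens.
const queryCost = estimateCost({ promptTokens: 13403, completionTokens: 95 });

console.log(planCost, queryCost);
```

Under these assumed rates, the prompt and completion figures for both rows match the table, which suggests the prompt side (screenshot plus element context) dominates the bill for every task type listed.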

apps/site/docs/public/MidScene_L.mp4

-9.36 MB
Binary file not shown.

apps/site/docs/public/Visualizer.gif

-797 KB
Binary file not shown.

apps/site/docs/public/visualizer.jpg

349 KB
Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
11
[
22
"introduction",
3-
"quick-start.md"
3+
"quick-start.md",
4+
"demo.md"
45
]
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
# Demo Projects
2+
3+
You can clone a complete demo project here: https://github.com/web-infra-dev/midscene-example/
4+
5+
The repository provides integration examples for different types of projects:
6+
7+
* [Playwright-demo](https://github.com/web-infra-dev/midscene-example/blob/main/playwright-demo)
8+
* [Puppeteer-demo](https://github.com/web-infra-dev/midscene-example/blob/main/puppeteer-demo)

apps/site/docs/zh/docs/getting-started/introduction.mdx

Lines changed: 3 additions & 7 deletions
@@ -1,12 +1,8 @@
11
# Introduction
22

3-
<video controls>
4-
<source src="/MidScene_L.mp4" type="video/mp4" />
5-
</video>
3+
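UI automation is too hard to write. Automation scripts are full of selectors, such as `#ids`, `data-test-xxx`, and `.selectors`. When a page is refactored, maintaining those scripts will be a disaster.
64

7-
UI automation is too hard to write. Automation scripts are full of selectors, such as `#ids`, `data-test-xxx`, and `.selectors`. When a page is refactored, maintaining those scripts will will be a disaster.
8-
9-
We are introducing MidScene.js. Powered by AI, it makes automation scripts simple and maintainable, helping you rediscover the joy of coding.
5+
We are introducing MidScene.js to help you rediscover the joy of coding.
106

117
MidScene.js uses a multimodal Large Language Model (LLM) that can intuitively "understand" your user interface and perform the necessary actions. You only need to describe the interaction steps or the expected data format, and the AI will complete the task for you.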
128

@@ -48,7 +44,7 @@ const dataB = await agent.aiQuery('string[], task names in the task list');
4844

4945
You can open the [Visualization Tool](/visualization/index.html) to see the examples.
5046

51-
![Visualization tool example](/Visualizer.gif)
47+
![](/visualizer.jpg)
5248

5349
## Flow Chart
5450

apps/site/docs/zh/docs/getting-started/quick-start.md

Lines changed: 5 additions & 8 deletions
@@ -1,6 +1,6 @@
11
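# Quick Start
22

3-
In this example, we will use OpenAI GPT-4o to search for "headphones" on ebay and return the items and prices in JSON format.
3+
Let's use this task as an example: use OpenAI GPT-4o to search for "headphones" on eBay and return the items and prices in JSON format.
44

55
Before running this example, make sure you have prepared an API key that can call the OpenAI GPT-4o model.
66

@@ -11,7 +11,7 @@
1111
Configure the API key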
1212

1313
```bash
14-
# replace by your own
14+
# replace with your own key
1515
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
1616
```
1717

@@ -87,10 +87,11 @@ npx playwright test ./e2e/ebay-search.spec.ts
8787

8888
### Step 5. View the test report
8989

90-
Follow the instructions in the command line to server the report
90+
Run the command shown in the command-line output to open the visual report.
9191

9292
```bash
93-
93+
# sample
94+
npx http-server ./midscene_run/report -p 9888 -o -s
9495
```
9596

9697
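## Integrate with Puppeteer
@@ -186,7 +187,3 @@ npx ts-node demo.ts
186187
After MidScene runs, it generates a log file, stored by default at `./midscene_run/report/latest.web-dump.json`. You can then load this file into the [Visualization Tool](/visualization/) to get a clearer understanding of the whole process.
187188

188189
In the [Visualization Tool](/visualization/), click the `Load Demo` button to see the results of the code above along with some other examples.
189-
190-
## Complete Demo Project
191-
192-
You can clone a complete demo project here: https://github.com/web-infra-dev/midscene-example/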

apps/site/docs/zh/docs/more/faq.md

Lines changed: 8 additions & 7 deletions
@@ -19,21 +19,22 @@ MidScene has some limitations, and we are still working to improve it.
1919

2020
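### About the token cost
2121

22-
Token consumption consists of two parts: image resolution and the number of elements (i.e., the size of the UI context created by MidScene).
22+
Image resolution and the number of elements (i.e., the size of the UI context created by MidScene) significantly affect token consumption.
2323

2424
Here is some typical data: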
2525

26-
| Task | Resolution | Input tokens | Output tokens | GPT-4o Price |
27-
|-------|----------|----------|----------|--------------|
28-
| Find the download button on the VSCode website | 1920x1080 | 2011 | 54 | $0.011 |
29-
| Split the Github status page | 1920x1080 | 3609 | 1020 | $0.034 |
26+
|Task | Resolution | Prompt Tokens / Price | Completion Tokens / Price |
27+
|-----|------------|--------------|---------------|
28+
|Plan the steps to perform the search| 1280x800| 6,975 / $0.034875 |150 / $0.00225|
29+
|Locate the search box| 1280x800 | 8,004 / $0.04002 | 92 / $0.00138 |
30+
|Query the item information| 1280x800| 13,403 / $0.067015 | 95 / $0.001425 |
3031

31-
> The price data was calculated in June 2024.
32+
> The price data was calculated in August 2024.
3233
3334
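### The script runs slowly?
3435

3536
Because MidScene.js calls the AI every time it performs planning (Planning) and querying (Query), a run may take 3 to 10 times longer than a traditional Playwright case, for example going from 5 seconds to 20 seconds. For now, this cannot be avoided. As large language models (LLMs) improve, performance may get better in the future.
3637

3738
Despite the longer run time, MidScene still performs well in practice. Its unique development experience keeps the codebase easy to maintain. We believe that automation scripts built with MidScene can significantly improve project iteration efficiency, cover more scenarios, and raise overall productivity.
3839

39-
In short, although it is slower, the time invested is well worth it.
40+
In short, although it is slower, the investment is well worth it.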

packages/midscene/src/types.ts

Lines changed: 5 additions & 0 deletions
@@ -209,6 +209,10 @@ export interface ExecutorContext {
209209
element?: BaseElement | null;
210210
}
211211

212+
export interface TaskCacheInfo {
213+
hit: boolean;
214+
}
215+
212216
export interface ExecutionTaskApply<
213217
Type extends ExecutionTaskType = any,
214218
TaskParam = any,
@@ -228,6 +232,7 @@ export interface ExecutionTaskReturn<TaskOutput = unknown, TaskLog = unknown> {
228232
output?: TaskOutput;
229233
log?: TaskLog;
230234
recorder?: ExecutionRecorderItem[];
235+
cache?: TaskCacheInfo;
231236
}
232237

233238
export type ExecutionTask<E extends ExecutionTaskApply<any, any, any> = ExecutionTaskApply<any, any, any>> =
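
The new optional `cache` field lets a consumer of task results tell whether a task was served from cache. The sketch below shows one way that field might be read. `TaskCacheInfo` mirrors the interface added in this hunk, while `TaskReturnLike` is a simplified stand-in for `ExecutionTaskReturn` that keeps only two of the fields visible in the diff, and `summarizeCache` is a hypothetical helper that is not part of MidScene.

```typescript
// Mirrors the interface added in this commit.
interface TaskCacheInfo {
  hit: boolean;
}

// Simplified stand-in for ExecutionTaskReturn (assumption: only the
// fields needed for this illustration are kept).
interface TaskReturnLike<TaskOutput = unknown> {
  output?: TaskOutput;
  cache?: TaskCacheInfo;
}

interface CacheSummary {
  hit: number;
  miss: number;
  unknown: number;
}

// Hypothetical helper: counts cache hits, misses, and results that
// carry no cache info at all (the field is optional).
function summarizeCache(results: TaskReturnLike[]): CacheSummary {
  const counts: CacheSummary = { hit: 0, miss: 0, unknown: 0 };
  for (const result of results) {
    if (result.cache === undefined) {
      counts.unknown += 1;
    } else if (result.cache.hit) {
      counts.hit += 1;
    } else {
      counts.miss += 1;
    }
  }
  return counts;
}

const cacheSummary = summarizeCache([
  { output: 'plan', cache: { hit: true } },
  { output: 'locate', cache: { hit: false } },
  { output: 'query' }, // no cache info recorded
]);

console.log(cacheSummary);
```

Because `cache` is optional, the helper treats a missing field as "unknown" rather than as a miss, so results produced before this change would not be miscounted.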
