Commit

feat: update visualizer, and cache info field (#24)
* fix: page size

* feat: update quick start

* feat: update quick start

* feat: update quick start

* feat: update quick start

* feat: update visualizer

* feat: add cache info into task and dump

* feat: add cache info into task and dump

* fix: typo

* fix: typo

* feat: update docs

* feat: update docs
yuyutaotao authored Aug 2, 2024
1 parent 8ae566c commit 3350fc7
Showing 23 changed files with 197 additions and 110 deletions.
3 changes: 2 additions & 1 deletion apps/site/docs/en/docs/getting-started/_meta.json
@@ -1,4 +1,5 @@
[
"introduction",
"quick-start.md"
"quick-start.md",
"demo.md"
]
8 changes: 8 additions & 0 deletions apps/site/docs/en/docs/getting-started/demo.md
@@ -0,0 +1,8 @@
# Demo Projects

You can clone a complete demo project in this repo: https://github.com/web-infra-dev/midscene-example/

The repo contains a separate folder for each type of project:

* [Playwright-demo](https://github.com/web-infra-dev/midscene-example/blob/main/playwright-demo)
* [Puppeteer-demo](https://github.com/web-infra-dev/midscene-example/blob/main/puppeteer-demo)
6 changes: 1 addition & 5 deletions apps/site/docs/en/docs/getting-started/introduction.mdx
@@ -1,9 +1,5 @@
# Introduction

<video controls>
<source src="/MidScene_L.mp4" type="video/mp4" />
</video>

UI automation can be frustrating, often involving a maze of *#ids*, *data-test-xxx* attributes, and *.selectors* that are difficult to maintain, especially when the page undergoes a refactor.

Introducing MidScene.js, an innovative SDK designed to bring joy back to programming by simplifying automation tasks.
@@ -38,7 +34,7 @@ With our visualization tool, you can easily debug the prompt and AI response. Al

You may open the [Online Visualization Tool](/visualization/index.html) to see the showcase.

![](/Visualizer.gif)
![](/visualizer.jpg)

## Flow Chart

22 changes: 5 additions & 17 deletions apps/site/docs/en/docs/getting-started/quick-start.md
@@ -1,6 +1,6 @@
# Quick Start

In this example, we use OpenAI GPT-4o to search headphones on ebay, and then get the result items and prices in JSON format.
In this example, we use OpenAI GPT-4o to search headphones on eBay, and then get the result items and prices in JSON format.

Remember to prepare an API key that is eligible for accessing OpenAI's GPT-4o before running.

@@ -13,14 +13,6 @@ Config the API key
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
```

Install Dependencies

```bash
npm install @midscene/webaeb --save-dev
# for demo use
npm install puppeteer ts-node --save-dev
```

## Integrate with Playwright

> [Playwright.js](https://playwright.com/) is an open-source automation library developed by Microsoft, primarily designed for end-to-end testing and web scraping of web applications.
@@ -92,10 +84,11 @@ npx playwright test ./e2e/ebay-search.spec.ts

### Step 5. view test report after running

Follow the instructions in the command line to serve the report
Follow the instructions in the command line to serve the report.

```bash

# sample command
npx http-server ./midscene_run/report -p 9888 -o -s
```

## Integrate with Puppeteer
@@ -165,7 +158,7 @@ await mid.aiQuery(

### Step 3. run

Using ts-node to run, you will get the data of Headphones on ebay:
Using ts-node to run, you will get the data of Headphones on eBay:

```bash
# run
@@ -189,8 +182,3 @@ npx ts-node demo.ts
After running, MidScene generates a log dump, placed in `./midscene_run/report/latest.web-dump.json` by default. Load this file into the [Visualization Tool](/visualization/) for a clearer view of the process.

Click the 'Load Demo' button in the [Visualization Tool](/visualization/) to see the results of the previous code as well as some other samples.


## Demo Projects

You can clone a complete demo project in this repo: https://github.com/web-infra-dev/midscene-example/
15 changes: 8 additions & 7 deletions apps/site/docs/en/docs/more/faq.md
@@ -23,16 +23,17 @@ MidScene needs a multimodal Large Language Model (LLM) to understand the UI. Cur

### About the token cost

Image resolution and element numbers (i.e., a UI context size created by MidScene) form the token bill.
Image resolution and element numbers (i.e., a UI context size created by MidScene) will affect the token bill.

Here is some typical data.
Here is some typical data for GPT-4o.

|Task | Resolution | Input tokens | Output tokens | GPT-4o Price |
|-----|------------|--------------|---------------|----------------|
|Find the download button on the VSCode website| 1920x1080| 2011|54| $0.011|
|Split the Github status page| 1920x1080| 3609|1020| $0.034|
|Task | Resolution | Prompt Tokens / Price | Completion Tokens / Price |
|-----|------------|--------------|---------------|
|Plan the steps to search on eBay homepage| 1280x800 | 6,975 / $0.034875 |150 / $0.00225|
|Locate the search box on the eBay homepage| 1280x800 | 8,004 / $0.04002 | 92 / $0.00138|
|Query the information about the item in the search results| 1280x800 | 13,403 / $0.067015 | 95 / $0.001425|

> The price data was calculated in June 2024.
> The price data was calculated in August 2024.
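
The per-row prices in the new table follow from multiplying each token count by a per-token rate. A minimal sketch of that arithmetic is shown here. The rates of $5 per million prompt tokens and $15 per million completion tokens are not stated in the docs; they are inferred from the table's own figures. The helper name `estimateCost` and both interfaces are hypothetical.

```typescript
// Rates in USD per one million tokens.
// Assumption: inferred from the table's figures, not stated in the docs.
const PROMPT_USD_PER_MILLION = 5;
const COMPLETION_USD_PER_MILLION = 15;

interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
}

interface CostEstimate {
  promptCost: number;
  completionCost: number;
  total: number;
}

// Hypothetical helper: converts token counts into an estimated cost in USD.
function estimateCost(usage: TokenUsage): CostEstimate {
  const promptCost = (usage.promptTokens * PROMPT_USD_PER_MILLION) / 1e6;
  const completionCost =
    (usage.completionTokens * COMPLETION_USD_PER_MILLION) / 1e6;
  return { promptCost, completionCost, total: promptCost + completionCost };
}

// The "Plan" row of the table: 6,975 prompt tokens and 150 completion tokens.
const plan = estimateCost({ promptTokens: 6975, completionTokens: 150 });
console.log(plan.promptCost.toFixed(6), plan.completionCost.toFixed(5));
```

Because the prompt side dominates every row, reducing the screenshot resolution or the number of elements in the UI context is likely to lower the bill more than shortening the expected output.
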
### The automation process is running more slowly than it did before

Binary file removed apps/site/docs/public/MidScene_L.mp4
Binary file not shown.
Binary file removed apps/site/docs/public/Visualizer.gif
Binary file not shown.
Binary file added apps/site/docs/public/visualizer.jpg
3 changes: 2 additions & 1 deletion apps/site/docs/zh/docs/getting-started/_meta.json
@@ -1,4 +1,5 @@
[
"introduction",
"quick-start.md"
"quick-start.md",
"demo.md"
]
8 changes: 8 additions & 0 deletions apps/site/docs/zh/docs/getting-started/demo.md
@@ -0,0 +1,8 @@
# Demo Projects

You can clone the complete demo project here: https://github.com/web-infra-dev/midscene-example/

The repo provides integration samples for different types of projects:

* [Playwright-demo](https://github.com/web-infra-dev/midscene-example/blob/main/playwright-demo)
* [Puppeteer-demo](https://github.com/web-infra-dev/midscene-example/blob/main/puppeteer-demo)
10 changes: 3 additions & 7 deletions apps/site/docs/zh/docs/getting-started/introduction.mdx
@@ -1,12 +1,8 @@
# Introduction

<video controls>
<source src="/MidScene_L.mp4" type="video/mp4" />
</video>
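UI automation is too hard to write. Automation scripts are full of selectors such as `#ids`, `data-test-xxx`, and `.selectors`. When a page is refactored, maintaining those scripts will be an even bigger disaster.

UI automation is too hard to write. Automation scripts are full of selectors such as `#ids`, `data-test-xxx`, and `.selectors`. When a page is refactored, maintaining those scripts will will be an even bigger disaster.

We are introducing MidScene.js. Powered by AI, it makes automation scripts simple and maintainable, and helps you rediscover the joy of coding.
We are introducing MidScene.js to help you rediscover the joy of coding.

MidScene.js uses a multimodal large language model (LLM) that can intuitively "understand" your user interface and perform the necessary actions. You only need to describe the interaction steps or the expected data format, and the AI completes the task for you.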

@@ -48,7 +44,7 @@ const dataB = await agent.aiQuery('string[], 任务列表中的任务名');

You can open the [Visualization Tool](/visualization/index.html) to see the examples.

![Visualization tool example](/Visualizer.gif)
![](/visualizer.jpg)

## Flow Chart

13 changes: 5 additions & 8 deletions apps/site/docs/zh/docs/getting-started/quick-start.md
@@ -1,6 +1,6 @@
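# Quick Start

In this example, we will use OpenAI GPT-4o to search for "headphones" on ebay and return the items and prices in JSON format.
Let's take this requirement as an example: use OpenAI GPT-4o to search for "headphones" on eBay and return the items and prices in JSON format.

Before running this example, make sure you have an API key that can call the OpenAI GPT-4o model.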

@@ -11,7 +11,7 @@
Configure the API key

```bash
# replace by your own
# replace with your own key
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
```

@@ -87,10 +87,11 @@ npx playwright test ./e2e/ebay-search.spec.ts

### Step 5. View the test report

Follow the instructions in the command line to serve the report
Run the command printed in the command-line output to open the visual report

```bash

# sample
npx http-server ./midscene_run/report -p 9888 -o -s
```

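## Integrate with Puppeteer
@@ -186,7 +187,3 @@ npx ts-node demo.ts
After MidScene runs, it generates a log file, stored by default at `./midscene_run/report/latest.web-dump.json`. You can then import this file into the [Visualization Tool](/visualization/) to get a clearer picture of the whole process.

In the [Visualization Tool](/visualization/), click the `Load Demo` button to see the results of the code above along with some other samples.

## Complete Demo Project

You can clone the complete demo project here: https://github.com/web-infra-dev/midscene-example/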
15 changes: 8 additions & 7 deletions apps/site/docs/zh/docs/more/faq.md
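@@ -19,21 +19,22 @@ MidScene has some limitations, and we are still working to improve it.

### About the token cost

Token consumption has two parts: image resolution and the number of elements (i.e., the size of the UI context created by MidScene).
Image resolution and the number of elements (i.e., the size of the UI context created by MidScene) significantly affect token consumption

Here is some typical data: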

| Task | Resolution | Input tokens | Output tokens | GPT-4o price |
|-------|----------|----------|----------|--------------|
| Find the download button on the VSCode website | 1920x1080 | 2011 | 54 | $0.011 |
| Split the Github status page | 1920x1080 | 3609 | 1020 | $0.034 |
|Task | Resolution | Prompt Tokens / Price | Completion Tokens / Price |
|-----|------------|--------------|---------------|
|Plan the steps to perform the search| 1280x800| 6,975 / $0.034875 |150 / $0.00225|
|Locate the search box| 1280x800 | 8,004 / $0.04002 | 92 / $0.00138 |
|Query the item information| 1280x800| 13,403 / $0.067015 | 95 / $0.001425 |

> These prices were calculated in June 2024
> These prices were calculated in August 2024
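### Scripts running slowly?

Because MidScene.js calls the AI every time it performs planning (Planning) and querying (Query), a run can take 3 to 10 times longer than a traditional Playwright case, for example going from 5 seconds to 20 seconds. For now this is unavoidable, but performance may improve as large language models (LLMs) advance.

Despite the longer run time, MidScene still performs well in practice. Its distinctive development experience keeps the codebase easy to maintain. We believe that automation scripts built with MidScene can significantly speed up project iteration, cover more scenarios, and raise overall productivity.

In short, although it is slower, the time invested is certainly worth it
In short, although it is slower, the investment is certainly worth it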
5 changes: 5 additions & 0 deletions packages/midscene/src/types.ts
@@ -209,6 +209,10 @@ export interface ExecutorContext {
element?: BaseElement | null;
}

export interface TaskCacheInfo {
hit: boolean;
}

export interface ExecutionTaskApply<
Type extends ExecutionTaskType = any,
TaskParam = any,
@@ -228,6 +232,7 @@ export interface ExecutionTaskReturn<TaskOutput = unknown, TaskLog = unknown> {
output?: TaskOutput;
log?: TaskLog;
recorder?: ExecutionRecorderItem[];
cache?: TaskCacheInfo;
}

export type ExecutionTask<E extends ExecutionTaskApply<any, any, any> = ExecutionTaskApply<any, any, any>> =
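
The new optional `cache` field lets a consumer of task results check whether a step reported a cache hit. The sketch below shows one way to read it, assuming `hit: true` means the task's result came from a cache. The interfaces are trimmed-down local copies of the types in this diff (only `output` and `cache` are kept), and `summarizeCacheHits` is a hypothetical helper, not part of MidScene.

```typescript
// Trimmed local copies of the types touched by this commit.
// Assumption: the real interfaces carry more fields, omitted here.
interface TaskCacheInfo {
  hit: boolean;
}

interface ExecutionTaskReturn<TaskOutput = unknown> {
  output?: TaskOutput;
  cache?: TaskCacheInfo;
}

interface CacheSummary {
  hits: number;
  misses: number;
  unknown: number;
}

// Hypothetical helper: counts how many task results report a cache hit.
// A missing `cache` field is counted as "unknown", not as a miss.
function summarizeCacheHits(results: ExecutionTaskReturn[]): CacheSummary {
  const summary: CacheSummary = { hits: 0, misses: 0, unknown: 0 };
  for (const result of results) {
    if (result.cache === undefined) {
      summary.unknown += 1;
    } else if (result.cache.hit) {
      summary.hits += 1;
    } else {
      summary.misses += 1;
    }
  }
  return summary;
}

const summary = summarizeCacheHits([
  { output: "plan", cache: { hit: true } },
  { output: "locate", cache: { hit: false } },
  { output: "query" }, // no cache information recorded
]);
console.log(summary);
```

Since `cache` is optional, tasks that never consult a cache can keep returning results without it, which is why the sketch tracks the missing case separately.
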