Commit 5417632

feat: update docs, and some tiny bugfix
1 parent 4ec3607 commit 5417632

File tree

25 files changed

+253
-663
lines changed

apps/site/docs/doc/_meta.json

Lines changed: 0 additions & 7 deletions
@@ -14,12 +14,5 @@
1414
"collapsed": false
1515
},
1616
"prompting-tips.md",
17-
{
18-
"type": "dir",
19-
"name": "integration",
20-
"label": "Integration",
21-
"collapsible": false,
22-
"collapsed": false
23-
},
2417
"faq.md"
2518
]

apps/site/docs/doc/faq.md

Lines changed: 5 additions & 14 deletions
@@ -1,4 +1,4 @@
1-
# Q & A
1+
# FAQ
22

33
#### About the token cost
44

@@ -13,19 +13,10 @@ Here are some typical data.
1313

1414
The above price data was calculated in June 2024.
1515

16-
#### How can I do assertions with MidScene ?
16+
#### The automation process is running more slowly than it did before
1717

18-
MidScene.js is an SDK for UI understanding, rather than a testing framework. You should integrate it with a familiar testing framework.
18+
Since MidScene.js will invoke the AI each time it performs planning and querying, the running time may increase by a factor of 5 to 10. This is inevitable for now, but it may improve with advancements in LLMs.
1919

20-
Here are some feasible ways:
21-
* Using Playwright, see [Integrate with Playwright](/doc/integration/playwright)
22-
* Using [Vitest](https://vitest.dev/) + [puppeteer](https://pptr.dev/), see [Integrate with Puppeteer](/doc/integration/puppeteer)
20+
Despite the increased time and cost, MidScene stands out in practical applications due to its unique development experience and easy-to-maintain codebase. We are confident that incorporating automation scripts powered by MidScene will significantly enhance your project’s efficiency, streamline complex tasks, and boost overall productivity. By integrating MidScene, your team can focus on more strategic and innovative activities, leading to faster development cycles and better outcomes.
2321

24-
25-
#### What's the "element" in MidScene ?
26-
27-
An element in MidScene is an object defined by MidScene. Currently, it contains only text elements, primarily consisting of text content and coordinates. It is different from elements in the browser, so you cannot call browser methods on it.
28-
29-
#### Failed to interact with web page ?
30-
31-
The coordinates returned from MidScene only represent their positions at the time they are collected. You should check the latest UI style when interacting with the UI.
22+
In short, it is worth the time and cost.
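The slowdown described in this FAQ entry can be checked per step rather than guessed at. The following is a minimal sketch, assuming each AI-driven step is an async call; `timed` is a hypothetical helper written for illustration and is not part of MidScene.

```typescript
// Hypothetical helper (not a MidScene API): runs one async step and
// reports how long it took in milliseconds.
async function timed<T>(
  label: string,
  step: () => Promise<T>,
): Promise<{ label: string; ms: number; result: T }> {
  const start = Date.now();
  const result = await step();
  return { label, ms: Date.now() - start, result };
}
```

Wrapping a call such as `timed('search', () => agent.aiAction('...'))` would show how much of a run is spent waiting on the AI, which makes it easier to compare against a selector-based version of the same step.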

apps/site/docs/doc/getting-started/introduction.mdx

Lines changed: 37 additions & 161 deletions
@@ -4,168 +4,44 @@
44
<source src="/MidScene_L.mp4" type="video/mp4" />
55
</video>
66

7-
Writing UI automation is often an annoying task. Understanding the characteristics of the DOM while writing code is not easy. Worse, writing tests for an existing web page that lacks predefined `#id` or `data-test-xxx` properties can make your selectors unmanageable and the entire test file impossible to maintain.
8-
9-
### Using high-level understanding of UI to reshape the automation
10-
11-
With MidScene.js, we harness the power of AI’s multi-modality to turn your UI into consistent and well-organized outputs. What you have to do is describe the expected data shape from screenshot, AI will do the magical reasoning for you, and TypeScript will ensure a first-class developing experience. There won't be any `.selector`s in your script any longer.
12-
13-
Finally, the joy of programming will come back!
14-
15-
16-
### Flow Chart
17-
18-
Here is a flowchart illustrating the main process of MidScene.
19-
20-
![](/flow.png)
21-
22-
### Features
23-
24-
#### Locate - Find by natural language
25-
26-
Using GPT-4o, you can now locate the elements by natural language. Just like someone is viewing your page. DOM selectors should no longer be necessary.
27-
28-
```typescript
29-
const downloadBtns = await insight.locate('download buttons on the page', {multi: true});
30-
console.log(downloadBtns);
31-
```
32-
33-
The result would be like
34-
```typescript
35-
[
36-
{ content: 'Download', rect: { left: 1451, top: 78, width: 74, height: 22 } },
37-
{ content: 'Download Mac Universal', rect: { left: 432, top: 328, width: 232, height: 65 } }
38-
]
39-
```
40-
41-
#### Understand - And answer in JSON
42-
43-
Besides basic locator and segmentation, MidScene can help you to understand the UI.
44-
By providing the AI with the data shape you want, you will receive a predictable answer, both in terms of data structure and value. You may have never thought about UI automation in this way.
45-
46-
Use `query` to achieve this.
47-
48-
For example, if you want to understand some properties while locating elements:
49-
50-
```typescript
51-
const downloadBtns = await insight.locate(query('download buttons on the page', {
52-
textsOnButton: 'string',
53-
backgroundColor: 'string, color of text, one of blue / red / yellow / green / white / black / others',
54-
type: '`major` or `minor`. The Bigger one is major and the others are minor',
55-
platform: 'string. Say `unknown` when it is not clear on the element',
56-
}), {multi: true});
57-
```
58-
59-
The result would be like
60-
```typescript
61-
[
62-
{
63-
content: 'Download Mac Universal',
64-
rect: { left: 432, top: 328, width: 232, height: 65 },
65-
textsOnButton: 'Download Mac Universal', // <------ The data mentions in prompt
66-
backgroundColor: 'blue',
67-
type: 'major',
68-
platform: 'Mac'
69-
},
70-
{
71-
content: 'Download',
72-
type: 'minor',
73-
platform: 'unknown'
74-
// ...
75-
}
76-
]
77-
```
78-
79-
You can also extract data from a section by using `query`.
80-
For example, if you want to get the service status from the github status page:
81-
82-
```typescript
83-
const result = await insightStatus.segment({
84-
'services': query( // They are all the prompts being sent to the AI
85-
'a list with service names and status',
86-
{ items: '{service: "service name as string", status: "string, like normal"}[]' },
87-
),
88-
});
89-
```
90-
91-
Here is the return value:
92-
```typescript
93-
[
94-
{ service: 'Git Operations', status: 'Normal' },
95-
{ service: 'API Requests', status: 'Normal' },
96-
{ service: 'Webhooks', status: 'Normal' },
97-
// ...
98-
]
99-
```
100-
101-
#### Typed - Out-of-box TypeScript definitions
102-
103-
The custom data shape you defined can have types assigned automatically. Simply use dot notation to access them.
104-
105-
Let's take the `result` above as a sample. TypeScript will give you the basic type hint:
106-
```typescript
107-
const result: {
108-
services: UISection<{
109-
items: unknown;
110-
}>;
111-
}
112-
```
113-
114-
By providing a generic type parameter, the return value can be explicitly specified.
115-
116-
```typescript
117-
const result = await insight.segment({
118-
'services': query<{items: {service: string, status: string}[]}>(
119-
'a list with service names and status',
120-
{ items: '{service: "service name as string", status: "string, like normal"}[]' },
121-
),
122-
});
123-
124-
const { items } = result.services;
125-
```
126-
127-
TypeScript will give you the following definition:
128-
129-
```typescript
130-
const items: {
131-
service: string;
132-
status: string;
133-
}[]
134-
```
135-
136-
#### Segment - Customized UI splitting
137-
138-
Describing the sections inside a page, and let AI help you to find them out.
139-
140-
```typescript
141-
// The param map is also a prompt being sent to the AI model.
142-
const manySections = await insight.segment({
143-
cookiePrompt: 'cookie prompt with its action buttons on the top of the page',
144-
topRightWidgets: 'widgets on the top right corner',
145-
});
146-
```
147-
148-
The data is as follows.
149-
150-
```typescript
151-
{
152-
cookiePrompt: {
153-
texts: [ [Object], [Object], [Object], [Object], [Object], [Object] ],
154-
rect: { left: 144, top: 8, width: 1655, height: 49 }
155-
},
156-
topRightWidgets: {
157-
texts: [ [Object], [Object] ],
158-
rect: { left: 1241, top: 64, width: 284, height: 50 }
159-
},
160-
}
161-
```
162-
163-
#### Online Visualization - Help to visualize your prompt
164-
165-
With our visualization tool, you can easily debug the prompt and AI response.
166-
167-
All intermediate data, such as the query, coordinates, split reason, and custom data, can be visualized.
7+
UI automation can be quite frustrating. Scripts are often full of *#id*, *data-test-xxx* and *.selectors* that are hard to maintain, and they break easily when the page is refactored.
8+
9+
Introducing MidScene.js, an SDK that aims to restore the joy of programming by automating tasks.
10+
11+
With MidScene.js, we harness the power of multimodal LLMs to make your UI outputs consistent and well-organized. All you need to do is describe the interaction steps or the expected data format based on a screenshot, and the AI will execute these tasks for you. Finally, it will bring back the joy of programming!
12+
13+
## Features
14+
15+
### Public LLMs are Fine
16+
17+
It is fine to use publicly available LLMs such as GPT-4. There is no need for custom training. To experience the out-of-the-box AI-driven automation, token is all you need. 😀
18+
19+
### Execute Actions
20+
21+
Use `.aiAction` to perform a series of actions by describing the steps.
22+
23+
For example `.aiAction('Enter "Learn JS today" in the task box, then press Enter to create')`.
24+
25+
### Extract Data from Page
26+
27+
`.aiQuery` is the method to extract customized data from the UI.
28+
29+
For example, by calling `const dataB = await agent.aiQuery('string[], task names in the list');`, you will get an array of strings containing the task names.
30+
31+
### Perform Assertions
32+
33+
Call `.aiAssert` to perform assertions on the page.
34+
35+
### Visualization Tool
36+
37+
With our visualization tool, you can easily debug the prompt and AI response. All intermediate data, such as queries, plans, and actions, can be visualized.
16838

16939
You may open the [Online Visualization Tool](/visualization/index.html) to see the showcase.
17040

17141
![](/Visualizer.gif)
42+
43+
## Flow Chart
44+
45+
Here is a flowchart illustrating the core process of MidScene.
46+
47+
![](/flow.png)
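
The three methods named in the new introduction (`.aiAction`, `.aiQuery`, `.aiAssert`) can be sketched together as one flow. This is an illustration under stated assumptions, not the library's actual typings: the `AiAgent` interface, the `addTaskAndVerify` function, and the `FakeAgent` stand-in are all hypothetical, and the argument to `aiAssert` is assumed to be a natural-language string because the diff does not show its signature.

```typescript
// Hypothetical minimal shape of an agent, limited to the three methods the
// docs mention. The real PuppeteerAgent in @midscene/web may differ.
interface AiAgent {
  aiAction(steps: string): Promise<void>;
  aiQuery(dataShape: string): Promise<unknown>;
  aiAssert(assertion: string): Promise<void>; // string argument is an assumption
}

// Creates a task, reads the task list back, then asserts on the page,
// with every step described in natural language.
async function addTaskAndVerify(agent: AiAgent, taskName: string): Promise<string[]> {
  await agent.aiAction(`Enter "${taskName}" in the task box, then press Enter to create`);
  const tasks = (await agent.aiQuery('string[], task names in the list')) as string[];
  await agent.aiAssert(`There is a task named "${taskName}" in the list`);
  return tasks;
}

// In-memory stand-in so the sketch runs without a browser or an API key.
class FakeAgent implements AiAgent {
  readonly calls: string[] = [];
  private readonly tasks: string[] = [];

  async aiAction(steps: string): Promise<void> {
    this.calls.push(`action:${steps}`);
    // Treat the first quoted string in the instruction as the new task name.
    const match = steps.match(/"([^"]+)"/);
    if (match) this.tasks.push(match[1]);
  }

  async aiQuery(dataShape: string): Promise<unknown> {
    this.calls.push(`query:${dataShape}`);
    return [...this.tasks];
  }

  async aiAssert(assertion: string): Promise<void> {
    this.calls.push(`assert:${assertion}`);
  }
}
```

Writing the flow against a small interface keeps the script testable offline; in a real run, the agent created in the quick start would be passed in place of `FakeAgent`, provided its methods match these assumed signatures.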
Lines changed: 27 additions & 35 deletions
@@ -1,19 +1,20 @@
11
# Quick Start
22

3-
Currently we use OpenAI GPT-4o as the default engine. So prepare an OpenAI key that is eligible for accessing GPT-4o.
3+
In this example, we use OpenAI GPT-4o and Puppeteer.js to _________. Remember to prepare an OpenAI key that is eligible for accessing GPT-4o before running.
4+
5+
> [Puppeteer](https://pptr.dev/) is a Node.js library which provides a high-level API to control Chrome or Firefox over the DevTools Protocol or WebDriver BiDi. Puppeteer runs in headless mode (no visible UI) by default but can be configured to run in a visible ("headful") browser.
6+
7+
Configure the API key
48

59
```bash
610
# replace by your own
711
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
8-
9-
# optional, if you use a proxy
10-
# export OPENAI_BASE_URL="..."
1112
```
1213

1314
Install
1415

1516
```bash
16-
npm install midscene --save-dev
17+
npm install @midscene/web --save-dev
1718
# for demo use
1819
npm install puppeteer ts-node --save-dev
1920
```
@@ -22,35 +23,26 @@ Write a simple demo to **extract the main download button of vscode website**.
2223
Save the following code as `./demo.ts`.
2324

2425
```typescript
25-
import puppeteer from 'puppeteer';
26-
import Insight, { query } from 'midscene';
27-
28-
Promise.resolve(
29-
(async () => {
30-
// launch vscode website
31-
const browser = await puppeteer.launch();
32-
const page = (await browser.pages())[0];
33-
await page.setViewport({ width: 1920, height: 1080 })
34-
await page.goto('https://code.visualstudio.com/');
35-
// wait for 5s
36-
console.log('Wait for 5 seconds. After that, the demo will begin.');
37-
await new Promise((resolve) => setTimeout(resolve, 5 * 1000));
38-
39-
// ⭐ find the main download button and its backgroundColor ⭐
40-
const insight = await Insight.fromPuppeteerBrowser(browser);
41-
const downloadBtn = await insight.locate(
42-
query('main download button on the page', {
43-
textsOnButton: 'string',
44-
backgroundColor: 'string, color of text, one of blue / red / yellow / green / white / black / others',
45-
}),
46-
);
47-
console.log(`backgroundColor of main download button is: `, downloadBtn!.backgroundColor);
48-
console.log(`text on the button is: `, downloadBtn!.textsOnButton);
49-
50-
// clean up
51-
await browser.close();
52-
})(),
53-
);
26+
import puppeteer, { Viewport } from 'puppeteer';
27+
import { PuppeteerAgent } from '@midscene/web/puppeteer';
28+
29+
// init Puppeteer page
30+
const browser = await puppeteer.launch({
31+
headless: false, // here we use headed mode to help debug
32+
});
33+
34+
const page = await browser.newPage();
35+
await page.goto('https://www.bing.com');
36+
await page.waitForNavigation({
37+
timeout: 20 * 1000,
38+
waitUntil: 'networkidle0',
39+
});
40+
41+
42+
// init MidScene agent
43+
const agent = new PuppeteerAgent(page);
44+
await agent.aiAction('type "how much is the ferry ticket in Shanghai" in search box, hit Enter');
45+
5446
```
5547

5648
Using ts-node to run:
@@ -62,5 +54,5 @@ npx ts-node demo.ts
6254
# it should print '... is blue'
6355
```
6456

65-
After running, MidScene will generate a log dump, which is placed in `./midscene_run/latest.insight.json` by default. Then put this file into [Visualization Tool](/visualization/), and you will have a clearer understanding of the process.
57+
After running, MidScene will generate a log dump, which is placed in `./midscene_run/latest.web-dump.json` by default. Then put this file into [Visualization Tool](/visualization/), and you will have a clearer understanding of the process.
6658

apps/site/docs/doc/integration/_meta.json

Lines changed: 0 additions & 6 deletions
This file was deleted.

apps/site/docs/doc/integration/others.md

Lines changed: 0 additions & 33 deletions
This file was deleted.
