|
4 | 4 | <source src="/MidScene_L.mp4" type="video/mp4" />
|
5 | 5 | </video>
|
6 | 6 |
|
7 |
| -Writing UI automation is often an annoying task. Understanding the characteristics of the DOM while writing code is not easy. Worse, writing tests for an existing web page that lacks predefined `#id` or `data-test-xxx` properties can make your selectors unmanageable and the entire test file impossible to maintain. |
8 |
| - |
9 |
| -### Using high-level understanding of UI to reshape the automation |
10 |
| - |
11 |
| -With MidScene.js, we harness the power of AI’s multi-modality to turn your UI into consistent and well-organized outputs. What you have to do is describe the expected data shape from screenshot, AI will do the magical reasoning for you, and TypeScript will ensure a first-class developing experience. There won't be any `.selector`s in your script any longer. |
12 |
| - |
13 |
| -Finally, the joy of programming will come back! |
14 |
| - |
15 |
| - |
16 |
| -### Flow Chart |
17 |
| - |
18 |
| -Here is a flowchart illustrating the main process of MidScene. |
19 |
| - |
20 |
| - |
21 |
| - |
22 |
| -### Features |
23 |
| - |
24 |
| -#### Locate - Find by natural language |
25 |
| - |
26 |
| -Using GPT-4o, you can now locate elements by natural language, just as a person viewing your page would. DOM selectors should no longer be necessary. |
27 |
| - |
28 |
| -```typescript |
29 |
| -const downloadBtns = await insight.locate('download buttons on the page', {multi: true}); |
30 |
| -console.log(downloadBtns); |
31 |
| -``` |
32 |
| - |
33 |
| -The result would be like |
34 |
| -```typescript |
35 |
| -[ |
36 |
| - { content: 'Download', rect: { left: 1451, top: 78, width: 74, height: 22 } }, |
37 |
| - { content: 'Download Mac Universal', rect: { left: 432, top: 328, width: 232, height: 65 } } |
38 |
| -] |
39 |
| -``` |
40 |
| - |
41 |
| -#### Understand - And answer in JSON |
42 |
| - |
43 |
| -Besides basic locator and segmentation, MidScene can help you to understand the UI. |
44 |
| -By providing the AI with the data shape you want, you will receive a predictable answer, both in terms of data structure and value. You may have never thought about UI automation in this way. |
45 |
| - |
46 |
| -Use `query` to achieve this. |
47 |
| - |
48 |
| -For example, if you want to understand some properties while locating elements: |
49 |
| - |
50 |
| -```typescript |
51 |
| -const downloadBtns = await insight.locate(query('download buttons on the page', { |
52 |
| - textsOnButton: 'string', |
53 |
| - backgroundColor: 'string, color of text, one of blue / red / yellow / green / white / black / others', |
54 |
| - type: '`major` or `minor`. The Bigger one is major and the others are minor', |
55 |
| - platform: 'string. Say `unknown` when it is not clear on the element', |
56 |
| -}), {multi: true}); |
57 |
| -``` |
58 |
| - |
59 |
| -The result would be like |
60 |
| -```typescript |
61 |
| -[ |
62 |
| - { |
63 |
| - content: 'Download Mac Universal', |
64 |
| - rect: { left: 432, top: 328, width: 232, height: 65 }, |
65 |
| - textsOnButton: 'Download Mac Universal', // <------ The data mentions in prompt |
66 |
| - backgroundColor: 'blue', |
67 |
| - type: 'major', |
68 |
| - platform: 'Mac' |
69 |
| - }, |
70 |
| - { |
71 |
| - content: 'Download', |
72 |
| - type: 'minor', |
73 |
| - platform: 'unknown' |
74 |
| - // ... |
75 |
| - } |
76 |
| -] |
77 |
| -``` |
78 |
| - |
79 |
| -You can also extract data from a section by using `query`. |
80 |
| -For example, if you want to get the service status from the github status page: |
81 |
| - |
82 |
| -```typescript |
83 |
| -const result = await insightStatus.segment({ |
84 |
| - 'services': query( // They are all the prompts being sent to the AI |
85 |
| - 'a list with service names and status', |
86 |
| - { items: '{service: "service name as string", status: "string, like normal"}[]' }, |
87 |
| - ), |
88 |
| -}); |
89 |
| -``` |
90 |
| - |
91 |
| -Here is the return value: |
92 |
| -```typescript |
93 |
| -[ |
94 |
| - { service: 'Git Operations', status: 'Normal' }, |
95 |
| - { service: 'API Requests', status: 'Normal' }, |
96 |
| - { service: 'Webhooks', status: 'Normal' }, |
97 |
| - // ... |
98 |
| -] |
99 |
| -``` |
100 |
| - |
101 |
| -#### Typed - Out-of-box TypeScript definitions |
102 |
| - |
103 |
| -The custom data shape you defined can have types assigned automatically. Simply use dot notation to access them. |
104 |
| - |
105 |
| -Let's take the `result` above as a sample. TypeScript will give you the basic type hint: |
106 |
| -```typescript |
107 |
| -const result: { |
108 |
| - services: UISection<{ |
109 |
| - items: unknown; |
110 |
| - }>; |
111 |
| -} |
112 |
| -``` |
113 |
| -
|
114 |
| -By providing a generic type parameter, the return value can be explicitly specified. |
115 |
| -
|
116 |
| -```typescript |
117 |
| -const result = await insight.segment({ |
118 |
| - 'services': query<{items: {service: string, status: string}[]}>( |
119 |
| - 'a list with service names and status', |
120 |
| - { items: '{service: "service name as string", status: "string, like normal"}[]' }, |
121 |
| - ), |
122 |
| -}); |
123 |
| - |
124 |
| -const { items } = result.services; |
125 |
| -``` |
126 |
| - |
127 |
| -TypeScript will give you the following definition: |
128 |
| - |
129 |
| -```typescript |
130 |
| -const items: { |
131 |
| - service: string; |
132 |
| - status: string; |
133 |
| -}[] |
134 |
| -``` |
135 |
| -
|
136 |
| -#### Segment - Customized UI splitting |
137 |
| -
|
138 |
| -Describe the sections inside a page, and let the AI find them for you. |
139 |
| -
|
140 |
| -```typescript |
141 |
| -// The param map is also a prompt being sent to the AI model. |
142 |
| -const manySections = await insight.segment({ |
143 |
| - cookiePrompt: 'cookie prompt with its action buttons on the top of the page', |
144 |
| - topRightWidgets: 'widgets on the top right corner', |
145 |
| -}); |
146 |
| -``` |
147 |
| - |
148 |
| -The data is as follows. |
149 |
| - |
150 |
| -```typescript |
151 |
| -{ |
152 |
| - cookiePrompt: { |
153 |
| - texts: [ [Object], [Object], [Object], [Object], [Object], [Object] ], |
154 |
| - rect: { left: 144, top: 8, width: 1655, height: 49 } |
155 |
| - }, |
156 |
| - topRightWidgets: { |
157 |
| - texts: [ [Object], [Object] ], |
158 |
| - rect: { left: 1241, top: 64, width: 284, height: 50 } |
159 |
| - }, |
160 |
| -} |
161 |
| -``` |
162 |
| - |
163 |
| -#### Online Visualization - Help to visualize your prompt |
164 |
| - |
165 |
| -With our visualization tool, you can easily debug the prompt and AI response. |
166 |
| - |
167 |
| -All intermediate data, such as the query, coordinates, split reason, and custom data, can be visualized. |
| 7 | +UI automation can be frustrating. Scripts tend to be full of *#id*, *data-test-xxx*, and *.selectors* that are hard to maintain, especially when the page is refactored. |
| 8 | + |
| 9 | +Introducing MidScene.js, an SDK that aims to restore the joy of programming by automating tasks. |
| 10 | + |
| 11 | +With MidScene.js, we harness the power of multimodal LLMs to turn your UI into consistent, well-organized outputs. All you need to do is describe the interaction steps or the expected data format based on a screenshot, and the AI will carry out these tasks for you. Finally, it will bring back the joy of programming! |
| 12 | + |
| 13 | +## Features |
| 14 | + |
| 15 | +### Public LLMs are Fine |
| 16 | + |
| 17 | +It is fine to use publicly available LLMs such as GPT-4. There is no need for custom training. To try AI-driven automation out of the box, an API token is all you need. 😀 |
| 18 | + |
| 19 | +### Execute Actions |
| 20 | + |
| 21 | +Use `.aiAction` to perform a series of actions by describing the steps. |
| 22 | + |
| 23 | +For example, `.aiAction('Enter "Learn JS today" in the task box, then press Enter to create')`. |
| 24 | + |
| 25 | +### Extract Data from Page |
| 26 | + |
| 27 | +`.aiQuery` is the method to extract customized data from the UI. |
| 28 | + |
| 29 | +For example, by calling `const dataB = await agent.aiQuery('string[], task names in the list');`, you will get an array of strings containing the task names. |
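
Because the value is produced by a model, it may be worth checking its shape at runtime before using it. The following is a minimal sketch, not MidScene's own API: it assumes `aiQuery` accepts a prompt string and resolves to an untyped value. `QueryAgent`, `isStringArray`, `getTaskNames`, and `fakeAgent` are hypothetical names introduced for illustration, and the stub stands in for a real agent.

```typescript
// Hypothetical minimal agent shape; only the method name `aiQuery` comes from
// the text above. The signature is an assumption.
interface QueryAgent {
  aiQuery(prompt: string): Promise<unknown>;
}

// Runtime type guard: true only for arrays whose items are all strings.
function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === 'string');
}

// Ask for the task names and fail loudly if the answer has an unexpected shape.
async function getTaskNames(agent: QueryAgent): Promise<string[]> {
  const data = await agent.aiQuery('string[], task names in the list');
  if (!isStringArray(data)) {
    throw new Error('aiQuery did not return a string[]');
  }
  return data;
}

// Stand-in agent for illustration only; a real agent would inspect the page.
const fakeAgent: QueryAgent = {
  aiQuery: async () => ['Learn JS today', 'Learn Rust tomorrow'],
};

getTaskNames(fakeAgent).then((names) => console.log(names));
```

The guard keeps the rest of the script typed as `string[]` without trusting the model's output blindly.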
| 30 | + |
| 31 | +### Perform Assertions |
| 32 | + |
| 33 | +Call `.aiAssert` to perform assertions on the page. |
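
An action and an assertion can be chained into one small helper. This is a sketch under stated assumptions: the method names `aiAction` and `aiAssert` come from the sections above, but their signatures here (a single natural-language string, with `aiAssert` rejecting on failure) are assumed. `UiAgent`, `RecordingAgent`, and `createTask` are hypothetical, and the recording stub only logs prompts instead of driving a browser.

```typescript
// Hypothetical agent shape; signatures are assumptions, not the library's API.
interface UiAgent {
  aiAction(steps: string): Promise<void>;
  // Assumed to reject when the assertion does not hold.
  aiAssert(assertion: string): Promise<void>;
}

// In-memory stand-in that records prompts instead of touching a real page.
class RecordingAgent implements UiAgent {
  readonly prompts: string[] = [];

  async aiAction(steps: string): Promise<void> {
    this.prompts.push(`action: ${steps}`);
  }

  async aiAssert(assertion: string): Promise<void> {
    this.prompts.push(`assert: ${assertion}`);
  }
}

// Perform an action, then assert on the resulting page state.
async function createTask(agent: UiAgent, title: string): Promise<void> {
  await agent.aiAction(`Enter "${title}" in the task box, then press Enter to create`);
  await agent.aiAssert(`There is a task named "${title}" in the list`);
}

const demoAgent = new RecordingAgent();
createTask(demoAgent, 'Learn JS today').then(() => console.log(demoAgent.prompts));
```

Writing the helper against an interface lets the same steps run against a real agent or a stub in unit tests.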
| 34 | + |
| 35 | +### Visualization Tool |
| 36 | + |
| 37 | +With our visualization tool, you can easily debug the prompt and AI response. All intermediate data, such as queries, plans, and actions, can be visualized. |
168 | 38 |
|
169 | 39 | You may open the [Online Visualization Tool](/visualization/index.html) to see the showcase.
|
170 | 40 |
|
171 | 41 | 
|
| 42 | + |
| 43 | +## Flow Chart |
| 44 | + |
| 45 | +Here is a flowchart illustrating the core process of MidScene. |
| 46 | + |
| 47 | + |