feat: update docs, and some tiny bugfix

yuyutaotao committed Jul 29, 2024
1 parent 4ec3607 commit 5417632

Showing 25 changed files with 253 additions and 663 deletions.
7 changes: 0 additions & 7 deletions apps/site/docs/doc/_meta.json
@@ -14,12 +14,5 @@
"collapsed": false
},
"prompting-tips.md",
{
"type": "dir",
"name": "integration",
"label": "Integration",
"collapsible": false,
"collapsed": false
},
"faq.md"
]
19 changes: 5 additions & 14 deletions apps/site/docs/doc/faq.md
@@ -1,4 +1,4 @@
# Q & A
# FAQ

#### About the token cost

@@ -13,19 +13,10 @@ Here are some typical data.

The above price data was calculated in June 2024.

#### How can I do assertions with MidScene ?
#### The automation process is running more slowly than it did before

MidScene.js is an SDK for UI understanding, rather than a testing framework. You should integrate it with a familiar testing framework.
Since MidScene.js will invoke the AI each time it performs planning and querying, the running time may increase by a factor of 5 to 10. This is inevitable for now, but it may improve with advancements in LLMs.

Here are some feasible ways:
* Using Playwright, see [Integrate with Playwright](/doc/integration/playwright)
* Using [Vitest](https://vitest.dev/) + [puppeteer](https://pptr.dev/), see [Integrate with Puppeteer](/doc/integration/puppeteer)
Despite the increased time and cost, MidScene stands out in practical applications thanks to its unique development experience and easy-to-maintain codebase. We are confident that automation scripts powered by MidScene will streamline complex tasks and boost productivity, freeing your team to focus on more strategic work and shortening development cycles.


#### What's the "element" in MidScene ?

An element in MidScene is an object defined by MidScene. Currently, it contains only text elements, primarily consisting of text content and coordinates. It is different from elements in the browser, so you cannot call browser methods on it.

#### Failed to interact with web page ?

The coordinates returned from MidScene only represent their positions at the time they are collected. You should check the latest UI style when interacting with the UI.
In short, it is worth the time and cost.
198 changes: 37 additions & 161 deletions apps/site/docs/doc/getting-started/introduction.mdx
@@ -4,168 +4,44 @@
<source src="/MidScene_L.mp4" type="video/mp4" />
</video>

Writing UI automation is often an annoying task. Understanding the characteristics of the DOM while writing code is not easy. Worse, writing tests for an existing web page that lacks predefined `#id` or `data-test-xxx` properties can make your selectors unmanageable and the entire test file impossible to maintain.

### Using high-level understanding of UI to reshape the automation

With MidScene.js, we harness the power of AI’s multi-modality to turn your UI into consistent and well-organized outputs. All you have to do is describe the expected data shape from a screenshot; the AI will do the magical reasoning for you, and TypeScript will ensure a first-class development experience. There won't be any `.selector`s in your script any longer.

Finally, the joy of programming will come back!


### Flow Chart

Here is a flowchart illustrating the main process of MidScene.

![](/flow.png)

### Features

#### Locate - Find by natural language

Using GPT-4o, you can now locate elements by natural language, just as if someone were viewing your page. DOM selectors are no longer necessary.

```typescript
const downloadBtns = await insight.locate('download buttons on the page', {multi: true});
console.log(downloadBtns);
```

The result will look like this:
```typescript
[
  { content: 'Download', rect: { left: 1451, top: 78, width: 74, height: 22 } },
  { content: 'Download Mac Universal', rect: { left: 432, top: 328, width: 232, height: 65 } }
]
```

#### Understand - And answer in JSON

Besides basic locator and segmentation, MidScene can help you to understand the UI.
By providing the AI with the data shape you want, you will receive a predictable answer, both in terms of data structure and value. You may have never thought about UI automation in this way.

Use `query` to achieve this.

For example, if you want to understand some properties while locating elements:

```typescript
const downloadBtns = await insight.locate(query('download buttons on the page', {
  textsOnButton: 'string',
  backgroundColor: 'string, color of text, one of blue / red / yellow / green / white / black / others',
  type: '`major` or `minor`. The Bigger one is major and the others are minor',
  platform: 'string. Say `unknown` when it is not clear on the element',
}), {multi: true});
```

The result will look like this:
```typescript
[
  {
    content: 'Download Mac Universal',
    rect: { left: 432, top: 328, width: 232, height: 65 },
    textsOnButton: 'Download Mac Universal', // <------ the data mentioned in the prompt
    backgroundColor: 'blue',
    type: 'major',
    platform: 'Mac'
  },
  {
    content: 'Download',
    type: 'minor',
    platform: 'unknown'
    // ...
  }
]
```

You can also extract data from a section by using `query`.
For example, if you want to get the service status from the github status page:

```typescript
const result = await insightStatus.segment({
  'services': query( // They are all the prompts being sent to the AI
    'a list with service names and status',
    { items: '{service: "service name as string", status: "string, like normal"}[]' },
  ),
});
```

Here is the return value:
```typescript
[
  { service: 'Git Operations', status: 'Normal' },
  { service: 'API Requests', status: 'Normal' },
  { service: 'Webhooks', status: 'Normal' },
  // ...
]
```

#### Typed - Out-of-box TypeScript definitions

The custom data shape you defined can have types assigned automatically. Simply use dot notation to access them.

Let's take the `result` above as a sample. TypeScript will give you the basic type hint:
```typescript
const result: {
  services: UISection<{
    items: unknown;
  }>;
}
```
By providing a generic type parameter, the return value can be explicitly specified.
```typescript
const result = await insight.segment({
  'services': query<{items: {service: string, status: string}[]}>(
    'a list with service names and status',
    { items: '{service: "service name as string", status: "string, like normal"}[]' },
  ),
});

const { items } = result.services;
```

TypeScript will give you the following definition:

```typescript
const items: {
  service: string;
  status: string;
}[]
```

#### Segment - Customized UI splitting

Describe the sections inside a page, and let AI find them for you.

```typescript
// The param map is also a prompt being sent to the AI model.
const manySections = await insight.segment({
  cookiePrompt: 'cookie prompt with its action buttons on the top of the page',
  topRightWidgets: 'widgets on the top right corner',
});
```

The data is as follows.

```typescript
{
  cookiePrompt: {
    texts: [ [Object], [Object], [Object], [Object], [Object], [Object] ],
    rect: { left: 144, top: 8, width: 1655, height: 49 }
  },
  topRightWidgets: {
    texts: [ [Object], [Object] ],
    rect: { left: 1241, top: 64, width: 284, height: 50 }
  },
}
```

#### Online Visualization - Help to visualize your prompt

With our visualization tool, you can easily debug the prompt and AI response.

All intermediate data, such as the query, coordinates, split reason, and custom data, can be visualized.
UI automation can be quite frustrating. Scripts are always full of *#id*, *data-test-xxx* and *.selectors* that are hard to maintain, and it only gets worse when the page is refactored.

Introducing MidScene.js, an SDK that aims to restore the joy of programming by automating tasks.

With MidScene.js, we harness the power of multimodal LLMs to make your UI outputs consistent and well-organized. All you need to do is describe the interaction steps or the expected data format based on a screenshot, and the AI will execute these tasks for you. Finally, it will bring back the joy of programming!

## Features

### Public LLMs are Fine

It is fine to use publicly available LLMs such as GPT-4. There is no need for custom training. To experience the out-of-the-box AI-driven automation, a token is all you need. 😀

### Execute Actions

Use `.aiAction` to perform a series of actions by describing the steps.

For example, `.aiAction('Enter "Learn JS today" in the task box, then press Enter to create')`.
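
A minimal sketch of what this could look like in a full script (assuming a `PuppeteerAgent` wired to an existing Puppeteer page, as in the Quick Start):

```typescript
import { PuppeteerAgent } from '@midscene/web/puppeteer';

// `page` is an already-initialized Puppeteer page (see the Quick Start)
const agent = new PuppeteerAgent(page);

// describe the steps in natural language; the AI plans and performs them
await agent.aiAction('Enter "Learn JS today" in the task box, then press Enter to create');
```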

### Extract Data from Page

`.aiQuery` is the method to extract customized data from the UI.

For example, by calling `const dataB = await agent.aiQuery('string[], task names in the list');`, you will get an array of strings containing the task names.
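
Reusing the `agent` from the sketch above, a hedged example (the exact values naturally depend on the page):

```typescript
// describe the expected data shape in plain language;
// the AI answers with JSON matching that description
const dataB = await agent.aiQuery('string[], task names in the list');
console.log(dataB); // e.g. [ 'Learn JS today', ... ]
```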

### Perform Assertions

Call `.aiAssert` to perform assertions on the page.
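
As a sketch, assuming `.aiAssert` accepts a natural-language condition and its promise rejects when the condition does not hold:

```typescript
// assert a condition described in natural language;
// we assume a failed assertion rejects the promise
await agent.aiAssert('the task list contains "Learn JS today"');
```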

### Visualization Tool

With our visualization tool, you can easily debug the prompt and AI response. All intermediate data, such as queries, plans, and actions, can be visualized.

You may open the [Online Visualization Tool](/visualization/index.html) to see the showcase.

![](/Visualizer.gif)

## Flow Chart

Here is a flowchart illustrating the core process of MidScene.

![](/flow.png)
62 changes: 27 additions & 35 deletions apps/site/docs/doc/getting-started/quick-start.md
@@ -1,19 +1,20 @@
# Quick Start

Currently we use OpenAI GPT-4o as the default engine. So prepare an OpenAI key that is eligible for accessing GPT-4o.
In this example, we use OpenAI GPT-4o and Puppeteer.js to drive a simple browser automation demo. Remember to prepare an OpenAI key that is eligible for accessing GPT-4o before running.

> [Puppeteer](https://pptr.dev/) is a Node.js library which provides a high-level API to control Chrome or Firefox over the DevTools Protocol or WebDriver BiDi. Puppeteer runs in headless mode (no visible UI) by default, but can be configured to run in a visible ("headful") browser.
Configure the API key:

```bash
# replace by your own
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"

# optional, if you use a proxy
# export OPENAI_BASE_URL="..."
```

Install

```bash
npm install midscene --save-dev
npm install @midscene/web --save-dev
# for demo use
npm install puppeteer ts-node --save-dev
```
@@ -22,35 +23,26 @@ Write a simple demo to **extract the main download button of vscode website**.
Save the following code as `./demo.ts`.

```typescript
import puppeteer from 'puppeteer';
import Insight, { query } from 'midscene';

Promise.resolve(
  (async () => {
    // launch vscode website
    const browser = await puppeteer.launch();
    const page = (await browser.pages())[0];
    await page.setViewport({ width: 1920, height: 1080 });
    await page.goto('https://code.visualstudio.com/');

    // wait for 5s
    console.log('Wait for 5 seconds. After that, the demo will begin.');
    await new Promise((resolve) => setTimeout(resolve, 5 * 1000));

    // ⭐ find the main download button and its backgroundColor ⭐
    const insight = await Insight.fromPuppeteerBrowser(browser);
    const downloadBtn = await insight.locate(
      query('main download button on the page', {
        textsOnButton: 'string',
        backgroundColor: 'string, color of text, one of blue / red / yellow / green / white / black / others',
      }),
    );
    console.log(`backgroundColor of main download button is: `, downloadBtn!.backgroundColor);
    console.log(`text on the button is: `, downloadBtn!.textsOnButton);

    // clean up
    await browser.close();
  })(),
);
import puppeteer from 'puppeteer';
import { PuppeteerAgent } from '@midscene/web/puppeteer';

// init Puppeteer page
const browser = await puppeteer.launch({
  headless: false, // here we use headed mode to help debug
});

const page = await browser.newPage();
await page.goto('https://www.bing.com');
await page.waitForNavigation({
  timeout: 20 * 1000,
  waitUntil: 'networkidle0',
});

// init MidScene agent
const agent = new PuppeteerAgent(page);
await agent.aiAction('type "how much is the ferry ticket in Shanghai" in search box, hit Enter');

// clean up
await browser.close();
```

Using ts-node to run:
@@ -62,5 +54,5 @@ npx ts-node demo.ts
# it should print '... is blue'
```

After running, MidScene will generate a log dump, which is placed in `./midscene_run/latest.insight.json` by default. Then put this file into [Visualization Tool](/visualization/), and you will have a clearer understanding of the process.
After running, MidScene will generate a log dump, which is placed in `./midscene_run/latest.web-dump.json` by default. Then put this file into [Visualization Tool](/visualization/), and you will have a clearer understanding of the process.

6 changes: 0 additions & 6 deletions apps/site/docs/doc/integration/_meta.json

This file was deleted.

33 changes: 0 additions & 33 deletions apps/site/docs/doc/integration/others.md

This file was deleted.

