feat: export yaml runner in javascript (#368)

yuyutaotao · zhoushaw · web-flow · commit efa4263b06de · 2025-02-10T20:00:14.000+08:00
---------

Co-authored-by: zhouxiao.shaw <zhouxiao.shaw@bytedance.com>
diff --git a/apps/site/docs/en/API.md b/apps/site/docs/en/API.md
@@ -21,7 +21,7 @@ And also, puppeteer agent has an extra option:
 
 These are the main methods on all kinds of agents in Midscene.
 
-> In the following documentation, you may see functions called with the `mid.` prefix. If you use destructuring in Playwright, like `async ({ ai, aiQuery }) => { /* ... */}`, you can call the functions without this prefix. It's just a matter of syntax.
+> In the following documentation, you may see functions called with the `agent.` prefix. If you use destructuring in Playwright, like `async ({ ai, aiQuery }) => { /* ... */}`, you can call the functions without this prefix. It's just a matter of syntax.
 
 ### `.aiAction(steps: string)` or `.ai(steps: string)` - Interact with the page
 
@@ -32,11 +32,11 @@ You can use `.aiAction` to perform a series of actions. It accepts a `steps: str
 These are some good samples:
 
 ```typescript
-await mid.aiAction('Enter "Learn JS today" in the task box, then press Enter to create');
-await mid.aiAction('Move your mouse over the second item in the task list and click the Delete button to the right of the second task');
+await agent.aiAction('Enter "Learn JS today" in the task box, then press Enter to create');
+await agent.aiAction('Move your mouse over the second item in the task list and click the Delete button to the right of the second task');
 
 // use `.ai` shortcut
-await mid.ai('Click the "completed" status button below the task list');
+await agent.ai('Click the "completed" status button below the task list');
 ```
 
 Steps should always be clearly and thoroughly described. A very brief prompt like 'Tweet "Hello World"' will result in unstable performance and a high likelihood of failure. 
@@ -62,7 +62,7 @@ You can extract customized data from the UI. Provided that the multi-modal AI ca
 For example, to parse detailed information from page:
 
 ```typescript
-const dataA = await mid.aiQuery({
+const dataA = await agent.aiQuery({
   time: 'date and time, string',
   userInfo: 'user info, {name: string}',
   tableFields: 'field names of table, string[]',
@@ -74,18 +74,18 @@ You can also describe the expected return value format as a plain string:
 
 ```typescript
 // dataB will be a string array
-const dataB = await mid.aiQuery('string[], task names in the list');
+const dataB = await agent.aiQuery('string[], task names in the list');
 
 // dataC will be an array with objects
-const dataC = await mid.aiQuery('{name: string, age: string}[], Data Record in the table');
+const dataC = await agent.aiQuery('{name: string, age: string}[], Data Record in the table');
 ```
 
 ### `.aiAssert(assertion: string, errorMsg?: string)` - do an assertion
 
 `.aiAssert` works just like the normal `assert` method, except that the condition is a prompt string written in natural language. Midscene will call AI to determine if the `assertion` is true. If the condition is not met, an error will be thrown containing `errorMsg` and a detailed reason generated by AI.
 
 ```typescript
-await mid.aiAssert('The price of "Sauce Labs Onesie" is 7.99');
+await agent.aiAssert('The price of "Sauce Labs Onesie" is 7.99');
 ```
 
 :::tip
@@ -94,7 +94,7 @@ Assertions are usually a very important part of your script. To prevent the poss
 For example, to replace the previous assertion,
 
 ```typescript
-const items = await mid.aiQuery(
+const items = await agent.aiQuery(
   '"{name: string, price: number}[], return item name and price of each item',
 );
 const onesieItem = items.find(item => item.name === 'Sauce Labs Onesie');
@@ -110,7 +110,29 @@ expect(onesieItem.price).toBe(7.99);
 When considering the time required for the AI service, `.aiWaitFor` may not be very efficient. Using a simple `sleep` method might be a useful alternative to `waitFor`.
 
 ```typescript
-await mid.aiWaitFor("there is at least one headphone item on page");
+await agent.aiWaitFor("there is at least one headphone item on page");
+```
+
+### `.runYaml(yamlScriptContent: string)` - run a yaml script
+
+`.runYaml` runs the `tasks` part of the yaml script and returns the results of all the `.aiQuery` calls (if any). The `target` part of the yaml script is ignored by this function.
+
+To keep running after a task fails, set the `continueOnError` option in the yaml script. For more details about the yaml script schema, please refer to [Automate with Scripts in YAML](./automate-with-scripts-in-yaml).
+
+```typescript
+const { result } = await agent.runYaml(`
+tasks:
+  - name: search weather
+    flow:
+      - ai: input 'weather today' in input box, click search button
+      - sleep: 3000
+
+  - name: query weather
+    flow:
+      - aiQuery: "the result shows the weather info, {description: string}"
+        name: weather
+`);
+console.log(result);
 ```
 
 ## Properties
diff --git a/apps/site/docs/en/automate-with-scripts-in-yaml.mdx b/apps/site/docs/en/automate-with-scripts-in-yaml.mdx
@@ -206,10 +206,32 @@ You can use the environment variable in the `.yaml` file like this:
 ```
 
 ## Use bridge mode
+
 By using bridge mode, you can utilize YAML scripts to automate the web browser on your desktop. This is particularly useful if you want to reuse cookies, plugins, and page states, or if you want to manually interact with automation scripts.
 
 See [Bridge Mode by Chrome Extension](./bridge-mode-by-chrome-extension) for more details.
 
+## Run YAML scripts with JavaScript
+
+You can also run a YAML script from JavaScript by calling the `runYaml` method of the Midscene agent. Only the `tasks` part of the YAML script will be executed.
+
+```typescript
+const { result } = await agent.runYaml(`
+tasks:
+  - name: search weather
+    flow:
+      - ai: input 'weather today' in input box, click search button
+      - sleep: 3000
+
+  - name: query weather
+    flow:
+      - aiQuery: "the result shows the weather info, {description: string}"
+        name: weather
+`);
+```
+
+For more details about the agent API, please refer to [API](./api).
+
 ## FAQ
 
 **How to get cookies in JSON format from Chrome?**
diff --git a/apps/site/docs/zh/API.md b/apps/site/docs/zh/API.md
@@ -21,7 +21,7 @@ Midscene 中每个 Agent 都有自己的构造函数。
 
 这些是 Midscene 中各类 Agent 的主要 API。
 
-> 在以下文档中，你可能会看到带有 `mid.` 前缀的函数调用。如果你在 Playwright 中使用了解构赋值（object destructuring），如 `async ({ ai, aiQuery }) => { /* ... */}`，你可以不带这个前缀进行调用。这只是语法的区别。
+> 在以下文档中，你可能会看到带有 `agent.` 前缀的函数调用。如果你在 Playwright 中使用了解构赋值（object destructuring），如 `async ({ ai, aiQuery }) => { /* ... */}`，你可以不带这个前缀进行调用。这只是语法的区别。
 
 ### `.aiAction(steps: string)` 或 `.ai(steps: string)` - 控制界面
 
@@ -32,11 +32,11 @@ Midscene 中每个 Agent 都有自己的构造函数。
 以下是一些正确示例：
 
 ```typescript
-await mid.aiAction('在任务框中输入 "Learn JS today"，然后按回车键创建任务');
-await mid.aiAction('将鼠标移动到任务列表中的第二项，然后点击第二个任务右侧的删除按钮');
+await agent.aiAction('在任务框中输入 "Learn JS today"，然后按回车键创建任务');
+await agent.aiAction('将鼠标移动到任务列表中的第二项，然后点击第二个任务右侧的删除按钮');
 
 // 使用 `.ai` 简写
-await mid.ai('点击任务列表下方的 "completed" 状态按钮');
+await agent.ai('点击任务列表下方的 "completed" 状态按钮');
 ```
 
 务必使用清晰、详细的步骤描述。使用非常简略的指令（如 “发一条微博” ）会导致非常不稳定的执行结果或运行失败。
@@ -62,7 +62,7 @@ await mid.ai('点击任务列表下方的 "completed" 状态按钮');
 例如，从页面解析详细信息：
 
 ```typescript
-const dataA = await mid.aiQuery({
+const dataA = await agent.aiQuery({
   time: '左上角展示的日期和时间，string',
   userInfo: '用户信息，{name: string}',
   tableFields: '表格的字段名，string[]',
@@ -72,18 +72,18 @@ const dataA = await mid.aiQuery({
 你也可以用纯字符串描述预期的返回值格式：
 
 // dataB 将是一个字符串数组
-const dataB = await mid.aiQuery('string[]，列表中的任务名称');
+const dataB = await agent.aiQuery('string[]，列表中的任务名称');
 
 // dataC 将是一个包含对象的数组
-const dataC = await mid.aiQuery('{name: string, age: string}[], 表格中的数据记录');
+const dataC = await agent.aiQuery('{name: string, age: string}[], 表格中的数据记录');
 ```
 
 ### `.aiAssert(assertion: string, errorMsg?: string)` - 进行断言
 
 `.aiAssert` 的功能类似于一般的断言（assert）方法，但可以用自然语言编写条件参数 `assertion`。Midscene 会调用 AI 来判断条件是否为真。若条件不满足，SDK 会抛出一个错误并在 `errorMsg` 后附上 AI 生成的错误原因。
 
 ```typescript
-await mid.aiAssert('"Sauce Labs Onesie" 的价格是 7.99');
+await agent.aiAssert('"Sauce Labs Onesie" 的价格是 7.99');
 ```
 
 :::tip
@@ -92,7 +92,7 @@ await mid.aiAssert('"Sauce Labs Onesie" 的价格是 7.99');
 例如你可以这么替代上一个断言代码：
 
 ```typescript
-const items = await mid.aiQuery(
+const items = await agent.aiQuery(
   '"{name: string, price: number}[], 返回商品名称和价格列表)',
 );
 const onesieItem = items.find(item => item.name === 'Sauce Labs Onesie');
@@ -108,7 +108,29 @@ expect(onesieItem.price).toBe(7.99);
 考虑到 AI 服务的时间消耗，`.aiWaitFor` 并不是一个特别高效的方法。使用一个普通的 `sleep` 可能是替代 `waitFor` 的另一种方式。
 
 ```typescript
-await mid.aiWaitFor("界面上至少有一个耳机的信息");
+await agent.aiWaitFor("界面上至少有一个耳机的信息");
+```
+
+### `.runYaml(yamlScriptContent: string)` - 运行一个 yaml 脚本
+
+`.runYaml` 会运行 yaml 脚本中的 `tasks` 部分，并返回所有 `.aiQuery` 调用的结果（如果存在此类调用）。yaml 脚本中的 `target` 部分将被忽略。
+
+如果想要忽略 yaml 脚本运行中的错误，可以在 yaml 脚本中设置 `continueOnError` 选项。更多关于 yaml 脚本的信息，请参考 [Automate with Scripts in YAML](./automate-with-scripts-in-yaml)。
+
+```typescript
+const { result } = await agent.runYaml(`
+tasks:
+  - name: search weather
+    flow:
+      - ai: input 'weather today' in input box, click search button
+      - sleep: 3000
+
+  - name: query weather
+    flow:
+      - aiQuery: "the result shows the weather info, {description: string}"
+        name: weather
+`);
+console.log(result);
 ```
 
 ## 属性
diff --git a/apps/site/docs/zh/automate-with-scripts-in-yaml.mdx b/apps/site/docs/zh/automate-with-scripts-in-yaml.mdx
@@ -211,6 +211,27 @@ topic=weather today
 
 请参阅 [通过 Chrome 扩展桥接模式](./bridge-mode-by-chrome-extension) 了解更多详细信息。
 
+## 使用 JavaScript 运行 YAML 脚本
+
+你也可以使用 JavaScript 运行 YAML 脚本，调用 Agent 上的 `runYaml` 方法即可。注意，这种方法只会执行 YAML 脚本中的 `tasks` 部分。
+
+```typescript
+const { result } = await agent.runYaml(`
+tasks:
+  - name: search weather
+    flow:
+      - ai: input 'weather today' in input box, click search button
+      - sleep: 3000
+
+  - name: query weather
+    flow:
+      - aiQuery: "the result shows the weather info, {description: string}"
+        name: weather
+`);
+```
+
+更多关于 agent 的 API，请参考 [API](./api)。
+
 ## FAQ
 
 **如何从 Chrome 中获取 JSON 格式的 Cookies？**
diff --git a/packages/web-integration/src/common/agent.ts b/packages/web-integration/src/common/agent.ts
@@ -11,6 +11,7 @@ import {
 } from '@midscene/core';
 import { NodeType } from '@midscene/shared/constants';
 
+import { ScriptPlayer, parseYamlScript } from '@/yaml';
 import { MIDSCENE_USE_VLM_UI_TARS, getAIConfig } from '@midscene/core/env';
 import {
   groupedActionDumpFileExt,
@@ -252,6 +253,30 @@ export class PageAgent<PageType extends WebPage = WebPage> {
     );
   }
 
+  async runYaml(yamlScriptContent: string): Promise<{
+    result: Record<string, any>;
+  }> {
+    const script = parseYamlScript(yamlScriptContent, 'yaml', true);
+    const player = new ScriptPlayer(script, async (target) => {
+      return { agent: this, freeFn: [] };
+    });
+    await player.run();
+
+    if (player.status === 'error') {
+      const errors = player.taskStatusList
+        .filter((task) => task.status === 'error')
+        .map((task) => {
+          return `task - ${task.name}: ${task.error?.message}`;
+        })
+        .join('\n');
+      throw new Error(`Error(s) occurred in running yaml script:\n${errors}`);
+    }
+
+    return {
+      result: player.result,
+    };
+  }
+
   async destroy() {
     await this.page.destroy();
   }
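The error branch of `runYaml` above can be tried in isolation. The following is a minimal sketch rather than the Midscene implementation: `TaskStatus` and `summarizeTaskErrors` are hypothetical names introduced for illustration, and the shape of each status entry (including the list of status values) is an assumption based only on the fields the diff reads (`name`, `status`, `error?.message`).

```typescript
// Hypothetical, simplified shape of one entry in `player.taskStatusList`.
// Only the fields read by `runYaml` in the diff are modelled; the set of
// status values is an assumption.
interface TaskStatus {
  name: string;
  status: 'init' | 'running' | 'done' | 'error';
  error?: Error;
}

// Collects failed tasks into one message, mirroring the filter/map/join
// chain in `runYaml`. Returns undefined when no task failed.
function summarizeTaskErrors(tasks: TaskStatus[]): string | undefined {
  const lines = tasks
    .filter((task) => task.status === 'error')
    .map((task) => `task - ${task.name}: ${task.error?.message}`);
  if (lines.length === 0) return undefined;
  return `Error(s) occurred in running yaml script:\n${lines.join('\n')}`;
}

const summary = summarizeTaskErrors([
  { name: 'search weather', status: 'done' },
  { name: 'query weather', status: 'error', error: new Error('timeout') },
]);
console.log(summary);
```

Because every failed task is folded into a single `Error`, a caller that wraps `agent.runYaml(...)` in `try`/`catch` sees one message listing each failing task by name, rather than only the first failure.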
diff --git a/packages/web-integration/src/yaml/player.ts b/packages/web-integration/src/yaml/player.ts
@@ -35,7 +35,7 @@ export class ScriptPlayer {
     public onTaskStatusChange?: (taskStatus: ScriptPlayerTaskStatus) => void,
   ) {
     this.result = {};
-    this.output = script.target.output;
+    this.output = script.target?.output;
     this.taskStatusList = (script.tasks || []).map((task, taskIndex) => ({
       ...task,
       index: taskIndex,
diff --git a/packages/web-integration/src/yaml/utils.ts b/packages/web-integration/src/yaml/utils.ts
@@ -24,19 +24,25 @@ function interpolateEnvVars(content: string): string {
 export function parseYamlScript(
   content: string,
   filePath?: string,
+  ignoreCheckingTarget?: boolean,
 ): MidsceneYamlScript {
   const interpolatedContent = interpolateEnvVars(content);
   const obj = yaml.load(interpolatedContent) as MidsceneYamlScript;
   const pathTip = filePath ? `, failed to load ${filePath}` : '';
-  assert(obj.target, `property "target" is required in yaml script${pathTip}`);
-  assert(
-    typeof obj.target === 'object',
-    `property "target" must be an object${pathTip}`,
-  );
-  assert(obj.tasks, `property "tasks" is required in yaml script${pathTip}`);
+  if (!ignoreCheckingTarget) {
+    assert(
+      obj.target,
+      `property "target" is required in yaml script${pathTip}`,
+    );
+    assert(
+      typeof obj.target === 'object',
+      `property "target" must be an object${pathTip}`,
+    );
+  }
+  assert(obj.tasks, `property "tasks" is required in yaml script${pathTip}`);
   assert(
     Array.isArray(obj.tasks),
-    `property "tasks" must be an array${pathTip}`,
+    `property "tasks" must be an array in yaml script, but got ${obj.tasks}`,
   );
   return obj;
 }
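The effect of the new `ignoreCheckingTarget` flag can be illustrated with a dependency-free sketch. This is not the code in `utils.ts`: `ParsedScript` and `validateScript` are hypothetical names, the input is assumed to be an already-parsed object (YAML loading and env interpolation are left out), and problems are collected into an array instead of being thrown by `assert`.

```typescript
// Hypothetical stand-in for the parsed YAML object. The real type is
// `MidsceneYamlScript`, whose full shape is not shown in this diff.
interface ParsedScript {
  target?: unknown;
  tasks?: unknown;
}

// Validates an already-parsed script and returns a list of problems.
// With `ignoreCheckingTarget` set, only the `tasks` checks are applied,
// which is the mode `runYaml` relies on.
function validateScript(
  obj: ParsedScript,
  ignoreCheckingTarget = false,
): string[] {
  const problems: string[] = [];
  if (!ignoreCheckingTarget) {
    if (!obj.target) {
      problems.push('property "target" is required in yaml script');
    } else if (typeof obj.target !== 'object') {
      problems.push('property "target" must be an object');
    }
  }
  if (!obj.tasks) {
    problems.push('property "tasks" is required in yaml script');
  } else if (!Array.isArray(obj.tasks)) {
    problems.push('property "tasks" must be an array in yaml script');
  }
  return problems;
}

// A script with tasks but no target, as passed to `agent.runYaml`.
const tasksOnly: ParsedScript = {
  tasks: [{ name: 'search weather', flow: [] }],
};
console.log(validateScript(tasksOnly)); // target is checked
console.log(validateScript(tasksOnly, true)); // target check skipped
```

Scripts loaded from `.yaml` files keep the strict default and still need a `target`, while the agent-driven path can pass a `tasks`-only script because the agent already holds the page.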
diff --git a/packages/web-integration/tests/ai/web/puppeteer/agent.test.ts b/packages/web-integration/tests/ai/web/puppeteer/agent.test.ts