feat: export yaml runner in javascript #368

Merged: 3 commits, Feb 10, 2025
42 changes: 32 additions & 10 deletions apps/site/docs/en/API.md
@@ -21,7 +21,7 @@ And also, puppeteer agent has an extra option:

These are the main methods on all kinds of agents in Midscene.

> In the following documentation, you may see functions called with the `mid.` prefix. If you use destructuring in Playwright, like `async ({ ai, aiQuery }) => { /* ... */}`, you can call the functions without this prefix. It's just a matter of syntax.
> In the following documentation, you may see functions called with the `agent.` prefix. If you use destructuring in Playwright, like `async ({ ai, aiQuery }) => { /* ... */}`, you can call the functions without this prefix. It's just a matter of syntax.
### `.aiAction(steps: string)` or `.ai(steps: string)` - Interact with the page

@@ -32,11 +32,11 @@ You can use `.aiAction` to perform a series of actions. It accepts a `steps: str
These are some good samples:

```typescript
await mid.aiAction('Enter "Learn JS today" in the task box, then press Enter to create');
await mid.aiAction('Move your mouse over the second item in the task list and click the Delete button to the right of the second task');
await agent.aiAction('Enter "Learn JS today" in the task box, then press Enter to create');
await agent.aiAction('Move your mouse over the second item in the task list and click the Delete button to the right of the second task');

// use `.ai` shortcut
await mid.ai('Click the "completed" status button below the task list');
await agent.ai('Click the "completed" status button below the task list');
```

Steps should always be clearly and thoroughly described. A very brief prompt like 'Tweet "Hello World"' will result in unstable performance and a high likelihood of failure.
@@ -62,7 +62,7 @@ You can extract customized data from the UI. Provided that the multi-modal AI ca
For example, to parse detailed information from page:

```typescript
const dataA = await mid.aiQuery({
const dataA = await agent.aiQuery({
time: 'date and time, string',
userInfo: 'user info, {name: string}',
tableFields: 'field names of table, string[]',
@@ -74,18 +74,18 @@ You can also describe the expected return value format as a plain string:

```typescript
// dataB will be a string array
const dataB = await mid.aiQuery('string[], task names in the list');
const dataB = await agent.aiQuery('string[], task names in the list');

// dataC will be an array with objects
const dataC = await mid.aiQuery('{name: string, age: string}[], Data Record in the table');
const dataC = await agent.aiQuery('{name: string, age: string}[], Data Record in the table');
```

### `.aiAssert(assertion: string, errorMsg?: string)` - do an assertion

`.aiAssert` works just like the normal `assert` method, except that the condition is a prompt string written in natural language. Midscene will call AI to determine if the `assertion` is true. If the condition is not met, an error will be thrown containing `errorMsg` and a detailed reason generated by AI.

```typescript
await mid.aiAssert('The price of "Sauce Labs Onesie" is 7.99');
await agent.aiAssert('The price of "Sauce Labs Onesie" is 7.99');
```

:::tip
@@ -94,7 +94,7 @@ Assertions are usually a very important part of your script. To prevent the poss
For example, to replace the previous assertion,

```typescript
const items = await mid.aiQuery(
const items = await agent.aiQuery(
  '{name: string, price: number}[], return item name and price of each item',
);
const onesieItem = items.find(item => item.name === 'Sauce Labs Onesie');
@@ -110,7 +110,29 @@ expect(onesieItem.price).toBe(7.99);
Because of the time the AI service takes, `.aiWaitFor` may not be very efficient. A simple `sleep` can be a useful alternative to `.aiWaitFor`.

```typescript
await mid.aiWaitFor("there is at least one headphone item on page");
await agent.aiWaitFor("there is at least one headphone item on page");
```
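A plain `sleep` needs no Midscene API at all. The following is a minimal sketch, assuming a runtime that provides `setTimeout` (Node.js or a browser); the helper name `sleep` is our own and is not part of Midscene:

```typescript
// Hypothetical helper, not part of Midscene: resolves after `ms` milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Usage inside an async test, e.g. after an action that starts loading content:
// await agent.aiAction('click the search button');
// await sleep(3000);
```

A fixed delay costs no AI calls, but it can still fail when the page loads slower than expected, so choose the duration with some margin.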

### `.runYaml(yamlScriptContent: string)` - run a yaml script

`.runYaml` runs the `tasks` part of a YAML script and returns the results of all `.aiQuery` calls in it (if any). The `target` part of the script is ignored. If the script ends in an error state, `.runYaml` throws an error that lists each failed task with its error message.

To keep running when a task fails, set the `continueOnError` option on that task in the YAML script. For more details about the YAML script schema, refer to [Automate with Scripts in YAML](./automate-with-scripts-in-yaml).

```typescript
const { result } = await agent.runYaml(`
  tasks:
    - name: search weather
      flow:
        - ai: input 'weather today' in input box, click search button
        - sleep: 3000
    - name: query weather
      flow:
        - aiQuery: "the result shows the weather info, {description: string}"
          name: weather
`);
console.log(result);
```
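Because `.runYaml` returns the collected query results and throws when the script ends in an error state, a caller typically reads each value by its `aiQuery` step's `name` and wraps the call in `try`/`catch`. In this sketch, `YamlRunner` is a hypothetical minimal interface mirroring the `runYaml` signature added in this PR, and the assumption that results are keyed by `name` (here `weather`) comes from the YAML script docs rather than from the code in this PR:

```typescript
// Hypothetical minimal interface mirroring the `runYaml` signature in this PR.
interface YamlRunner {
  runYaml(yamlScriptContent: string): Promise<{ result: Record<string, any> }>;
}

const weatherScript = `
  tasks:
    - name: query weather
      flow:
        - aiQuery: "the result shows the weather info, {description: string}"
          name: weather
`;

// Returns the weather description, or null if the script failed.
async function readWeather(agent: YamlRunner): Promise<string | null> {
  try {
    const { result } = await agent.runYaml(weatherScript);
    // Assumption: each aiQuery result is stored under its `name` key.
    return result.weather?.description ?? null;
  } catch (err) {
    // runYaml throws a single Error that lists every failed task.
    console.error((err as Error).message);
    return null;
  }
}
```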

## Properties
22 changes: 22 additions & 0 deletions apps/site/docs/en/automate-with-scripts-in-yaml.mdx
@@ -206,10 +206,32 @@ You can use the environment variable in the `.yaml` file like this:
```

## Use bridge mode

By using bridge mode, you can utilize YAML scripts to automate the web browser on your desktop. This is particularly useful if you want to reuse cookies, plugins, and page states, or if you want to manually interact with automation scripts.

See [Bridge Mode by Chrome Extension](./bridge-mode-by-chrome-extension) for more details.

## Run a YAML script with JavaScript

You can also run a YAML script from JavaScript by calling the `runYaml` method of a Midscene agent. Only the `tasks` part of the script is executed; the `target` part is ignored.

```typescript
const { result } = await agent.runYaml(`
  tasks:
    - name: search weather
      flow:
        - ai: input 'weather today' in input box, click search button
        - sleep: 3000

    - name: query weather
      flow:
        - aiQuery: "the result shows the weather info, {description: string}"
          name: weather
`);
```
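If the script already lives in a `.yaml` file, its contents can be read with Node's `fs` module and passed straight to `runYaml`; the file's `target` section is skipped. In this sketch, `YamlRunner` is a hypothetical stand-in for a Midscene agent, and `runYamlFile` and the path in the usage line are illustrative names, not part of Midscene:

```typescript
import { readFileSync } from "node:fs";

// Hypothetical stand-in for a Midscene agent; only `runYaml` is needed here.
interface YamlRunner {
  runYaml(yamlScriptContent: string): Promise<{ result: Record<string, any> }>;
}

// Reads a YAML script from disk and runs its `tasks` on the given agent.
async function runYamlFile(agent: YamlRunner, path: string) {
  const content = readFileSync(path, "utf-8");
  return agent.runYaml(content);
}
```

Usage: `const { result } = await runYamlFile(agent, './search-weather.yaml');`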

For more details about the agent API, please refer to [API](./api).

## FAQ

**How to get cookies in JSON format from Chrome?**
42 changes: 32 additions & 10 deletions apps/site/docs/zh/API.md
@@ -21,7 +21,7 @@ Each agent in Midscene has its own constructor.

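These are the main APIs of all kinds of agents in Midscene.

> In the following documentation, you may see functions called with the `mid.` prefix. If you use object destructuring in Playwright, like `async ({ ai, aiQuery }) => { /* ... */}`, you can call the functions without this prefix. It's just a matter of syntax.
> In the following documentation, you may see functions called with the `agent.` prefix. If you use object destructuring in Playwright, like `async ({ ai, aiQuery }) => { /* ... */}`, you can call the functions without this prefix. It's just a matter of syntax.
### `.aiAction(steps: string)` or `.ai(steps: string)` - Control the interface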

@@ -32,11 +32,11 @@ Each agent in Midscene has its own constructor.
Here are some correct examples:

```typescript
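await mid.aiAction('Enter "Learn JS today" in the task box, then press Enter to create the task');
await mid.aiAction('Move the mouse to the second item in the task list, then click the Delete button to the right of the second task');
await agent.aiAction('Enter "Learn JS today" in the task box, then press Enter to create the task');
await agent.aiAction('Move the mouse to the second item in the task list, then click the Delete button to the right of the second task');

// use the `.ai` shortcut
await mid.ai('Click the "completed" status button below the task list');
await agent.ai('Click the "completed" status button below the task list');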
```

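Always describe the steps clearly and in detail. A very brief instruction (such as "post a Weibo") leads to very unstable execution or outright failure.
@@ -62,7 +62,7 @@ await mid.ai('Click the "completed" status button below the task list');
For example, to parse detailed information from the page: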

```typescript
const dataA = await mid.aiQuery({
const dataA = await agent.aiQuery({
time: 'date and time shown in the top-left corner, string',
userInfo: 'user info, {name: string}',
tableFields: 'field names of the table, string[]',
@@ -72,18 +72,18 @@ const dataA = await mid.aiQuery({
You can also describe the expected return value format as a plain string:

```typescript
// dataB will be a string array
const dataB = await mid.aiQuery('string[], task names in the list');
const dataB = await agent.aiQuery('string[], task names in the list');

// dataC will be an array of objects
const dataC = await mid.aiQuery('{name: string, age: string}[], data records in the table');
const dataC = await agent.aiQuery('{name: string, age: string}[], data records in the table');
```

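### `.aiAssert(assertion: string, errorMsg?: string)` - do an assertion

`.aiAssert` works like a regular assert method, except that the condition `assertion` is written in natural language. Midscene calls AI to determine whether the condition is true. If it is not, the SDK throws an error containing `errorMsg` followed by the reason generated by AI.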

```typescript
await mid.aiAssert('The price of "Sauce Labs Onesie" is 7.99');
await agent.aiAssert('The price of "Sauce Labs Onesie" is 7.99');
```

:::tip
@@ -92,7 +92,7 @@ await mid.aiAssert('The price of "Sauce Labs Onesie" is 7.99');
For example, you can replace the previous assertion code like this:

```typescript
const items = await mid.aiQuery(
const items = await agent.aiQuery(
  '{name: string, price: number}[], return the list of item names and prices',
);
const onesieItem = items.find(item => item.name === 'Sauce Labs Onesie');
@@ -108,7 +108,29 @@ expect(onesieItem.price).toBe(7.99);
Considering the time cost of the AI service, `.aiWaitFor` is not a particularly efficient method. A plain `sleep` may be another way to replace `waitFor`.

```typescript
await mid.aiWaitFor("there is at least one headphone item on the page");
await agent.aiWaitFor("there is at least one headphone item on the page");
```

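### `.runYaml(yamlScriptContent: string)` - run a YAML script

`.runYaml` runs the `tasks` part of the YAML script and returns the results of all `.aiQuery` calls (if there are any). The `target` part of the YAML script is ignored.

To ignore errors while the YAML script runs, set the `continueOnError` option in the YAML script. For more information about YAML scripts, refer to [Automate with Scripts in YAML](./automate-with-scripts-in-yaml).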

```typescript
const { result } = await agent.runYaml(`
  tasks:
    - name: search weather
      flow:
        - ai: input 'weather today' in input box, click search button
        - sleep: 3000
    - name: query weather
      flow:
        - aiQuery: "the result shows the weather info, {description: string}"
          name: weather
`);
console.log(result);
```

## Properties
21 changes: 21 additions & 0 deletions apps/site/docs/zh/automate-with-scripts-in-yaml.mdx
@@ -211,6 +211,27 @@ topic=weather today

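See [Bridge Mode by Chrome Extension](./bridge-mode-by-chrome-extension) for more details.

## Run a YAML script with JavaScript

You can also run a YAML script with JavaScript by calling the `runYaml` method on the agent. Note that this approach only executes the `tasks` part of the YAML script.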

```typescript
const { result } = await agent.runYaml(`
  tasks:
    - name: search weather
      flow:
        - ai: input 'weather today' in input box, click search button
        - sleep: 3000
    - name: query weather
      flow:
        - aiQuery: "the result shows the weather info, {description: string}"
          name: weather
`);
```

For more about the agent API, refer to [API](./api).

## FAQ

**How to get cookies in JSON format from Chrome?**
25 changes: 25 additions & 0 deletions packages/web-integration/src/common/agent.ts
@@ -11,6 +11,7 @@ import {
} from '@midscene/core';
import { NodeType } from '@midscene/shared/constants';

import { ScriptPlayer, parseYamlScript } from '@/yaml';
import { MIDSCENE_USE_VLM_UI_TARS, getAIConfig } from '@midscene/core/env';
import {
groupedActionDumpFileExt,
@@ -252,6 +253,30 @@ export class PageAgent<PageType extends WebPage = WebPage> {
    );
  }

  async runYaml(yamlScriptContent: string): Promise<{
    result: Record<string, any>;
  }> {
    // Parse the script; the third argument skips the `target` checks.
    const script = parseYamlScript(yamlScriptContent, 'yaml', true);
    // Hand the player this agent instead of creating one from `target`.
    const player = new ScriptPlayer(script, async (target) => {
      return { agent: this, freeFn: [] };
    });
    await player.run();

    // Collect every failed task into a single error message.
    if (player.status === 'error') {
      const errors = player.taskStatusList
        .filter((task) => task.status === 'error')
        .map((task) => {
          return `task - ${task.name}: ${task.error?.message}`;
        })
        .join('\n');
      throw new Error(`Error(s) occurred in running yaml script:\n${errors}`);
    }

    return {
      result: player.result,
    };
  }

  async destroy() {
    await this.page.destroy();
  }
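The error handling at the end of `runYaml` collects every task whose status is `error` and joins them into one message. The standalone sketch below reproduces only that formatting step so the resulting message shape is easy to see; `TaskStatus` is a hypothetical type keeping just the fields the method reads (`name`, `status`, `error`), and the `"done"` status value is an illustrative assumption:

```typescript
// Hypothetical subset of the player's per-task status used by runYaml.
interface TaskStatus {
  name: string;
  status: string;
  error?: Error;
}

// Mirrors the message built in runYaml: one line per failed task.
function formatYamlErrors(tasks: TaskStatus[]): string {
  const errors = tasks
    .filter((task) => task.status === "error")
    .map((task) => `task - ${task.name}: ${task.error?.message}`)
    .join("\n");
  return `Error(s) occurred in running yaml script:\n${errors}`;
}

const message = formatYamlErrors([
  { name: "search weather", status: "done" },
  { name: "query weather", status: "error", error: new Error("element not found") },
]);
console.log(message);
// Error(s) occurred in running yaml script:
// task - query weather: element not found
```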
2 changes: 1 addition & 1 deletion packages/web-integration/src/yaml/player.ts
@@ -35,7 +35,7 @@ export class ScriptPlayer {
    public onTaskStatusChange?: (taskStatus: ScriptPlayerTaskStatus) => void,
  ) {
    this.result = {};
    this.output = script.target.output;
    this.output = script.target?.output;
    this.taskStatusList = (script.tasks || []).map((task, taskIndex) => ({
      ...task,
      index: taskIndex,
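The one-character change in `player.ts` matters because scripts passed through `runYaml` may omit `target` entirely: reading `.output` on `undefined` throws a `TypeError`, while optional chaining (`?.`) yields `undefined`. A minimal sketch, using a hypothetical `ScriptLike` type and `outputOf` helper that keep only the relevant fields:

```typescript
// Hypothetical shape: only the fields relevant to this change.
interface ScriptLike {
  target?: { output?: string };
  tasks: unknown[];
}

function outputOf(script: ScriptLike): string | undefined {
  // The old form `script.target.output` would throw at runtime when `target` is missing.
  return script.target?.output;
}

console.log(outputOf({ tasks: [] })); // undefined
console.log(outputOf({ target: { output: "./out.json" }, tasks: [] })); // ./out.json
```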
20 changes: 13 additions & 7 deletions packages/web-integration/src/yaml/utils.ts
@@ -24,19 +24,25 @@ function interpolateEnvVars(content: string): string {
export function parseYamlScript(
  content: string,
  filePath?: string,
  ignoreCheckingTarget?: boolean,
): MidsceneYamlScript {
  const interpolatedContent = interpolateEnvVars(content);
  const obj = yaml.load(interpolatedContent) as MidsceneYamlScript;
  const pathTip = filePath ? `, failed to load ${filePath}` : '';
  assert(obj.target, `property "target" is required in yaml script${pathTip}`);
  assert(
    typeof obj.target === 'object',
    `property "target" must be an object${pathTip}`,
  );
  assert(obj.tasks, `property "tasks" is required in yaml script${pathTip}`);
  if (!ignoreCheckingTarget) {
    assert(
      obj.target,
      `property "target" is required in yaml script${pathTip}`,
    );
    assert(
      typeof obj.target === 'object',
      `property "target" must be an object${pathTip}`,
    );
  }
  assert(obj.tasks, `property "tasks" is required in yaml script${pathTip}`);
  assert(
    Array.isArray(obj.tasks),
    `property "tasks" must be an array${pathTip}`,
    `property "tasks" must be an array in yaml script, but got ${obj.tasks}`,
  );
  return obj;
}
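With the new `ignoreCheckingTarget` flag, `parseYamlScript` skips the `target` checks but still requires `tasks` to be an array. The sketch below reproduces just those checks on an already-parsed object (no YAML parsing, no env-var interpolation, and messages without the path tip), using hypothetical `validateScript` and `isValid` helpers and Node's `assert.ok`, to show which inputs pass with and without the flag:

```typescript
import { ok as assert } from "node:assert";

// Hypothetical re-creation of the checks in parseYamlScript (validation only).
function validateScript(obj: any, ignoreCheckingTarget?: boolean): void {
  if (!ignoreCheckingTarget) {
    assert(obj.target, 'property "target" is required in yaml script');
    assert(typeof obj.target === "object", 'property "target" must be an object');
  }
  assert(obj.tasks, 'property "tasks" is required in yaml script');
  assert(Array.isArray(obj.tasks), 'property "tasks" must be an array');
}

// Returns true if validation passes, false if any assertion throws.
function isValid(obj: any, ignoreCheckingTarget?: boolean): boolean {
  try {
    validateScript(obj, ignoreCheckingTarget);
    return true;
  } catch {
    return false;
  }
}

const tasksOnly = { tasks: [{ name: "search weather", flow: [] }] };
console.log(isValid(tasksOnly)); // false: target is missing
console.log(isValid(tasksOnly, true)); // true: target checks are skipped
```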