
Commit 195eaad

feat(cache): supports ui-tars model caching capability (#361)
1 parent 9d5f2fb · commit 195eaad

File tree

13 files changed: +221, -61 lines

README.md (+1)

```diff
@@ -46,6 +46,7 @@ From version v0.10.0, we support a new open-source model named [`UI-TARS`](https
 - **Support Private Deployment 🤖**: Supports private deployment of [`UI-TARS`](https://github.com/bytedance/ui-tars) model, which outperforms closed-source models like GPT-4o and Claude in UI automation scenarios while better protecting data security.
 - **Support General Models 🌟**: Supports general large models like GPT-4o and Claude, adapting to various scenario needs.
 - **Visual Reports for Debugging 🎞️**: Through our test reports and Playground, you can easily understand, replay and debug the entire process.
+- **Support Caching 🔄**: The first time you execute a task through AI, it will be cached, and subsequent executions of the same task will significantly improve execution efficiency.
 - **Completely Open Source 🔥**: Experience a whole new automation development experience, enjoy!
 - **Understand UI, JSON Format Responses 🔍**: You can specify data format requirements and receive responses in JSON format.
 - **Intuitive Assertions 🤔**: Express your assertions in natural language, and AI will understand and process them.
```

README.zh.md (+1)

```diff
@@ -47,6 +47,7 @@ Midscene.js 让 AI 成为你的浏览器操作员 🤖。只需用自然语言
 - **支持私有化部署 🤖**:支持私有化部署 [`UI-TARS`](https://github.com/bytedance/ui-tars) 模型,相比 GPT-4o、Claude 等闭源模型,不仅在 UI 自动化场景下表现更加出色,还能更好地保护数据安全。
 - **支持通用模型 🌟**:支持 GPT-4o、Claude 等通用大模型,适配多种场景需求。
 - **用可视化报告来调试 🎞️**:通过我们的测试报告和 Playground,你可以轻松理解、回放和调试整个过程。
+- **支持缓存 🔄**:首次通过 AI 执行后任务会被缓存,后续执行相同任务时可显著提升执行效率。
 - **完全开源 🔥**:体验全新的自动化开发体验,尽情享受吧!
 - **理解UI、JSON格式回答 🔍**:你可以提出关于数据格式的要求,然后得到 JSON 格式的预期回应。
 - **直观断言 🤔**:用自然语言表达你的断言,AI 会理解并处理。
```

apps/site/docs/en/caching.md → apps/site/docs/en/caching.mdx (+38, -5)

````diff
@@ -8,14 +8,47 @@ Currently, the caching capability is supported in all scenarios, and Midscene ca
 
 **Usage**
 
-```diff
-- playwright test --config=playwright.config.ts
-+ MIDSCENE_CACHE=true playwright test --config=playwright.config.ts
-```
+
+import { Tab, Tabs } from 'rspress/theme';
+
+<Tabs>
+<Tab label="Playwright">
+```diff
+- playwright test --config=playwright.config.ts
++ MIDSCENE_CACHE=true playwright test --config=playwright.config.ts
+```
+</Tab>
+<Tab label="Puppeteer">
+```diff
+- tsx demo.ts
++ MIDSCENE_CACHE=true tsx demo.ts
+```
+
+```javascript
+const mid = new PuppeteerAgent(originPage, {
+  cacheId: 'puppeteer-swag-sab', // Add cache id
+});
+```
+</Tab>
+<Tab label="Bridge Mode">
+```diff
+- tsx demo.ts
++ MIDSCENE_CACHE=true tsx demo.ts
+```
+
+```javascript
+const agent = new AgentOverChromeBridge({
+  cacheId: 'star-midscene-github', // Add cache id
+});
+```
+</Tab>
+</Tabs>
+
+
 
 **Effect**
 
-After enabling the cache, the execution time is significantly reduced, for example, from 1m16s to 23s.
+After enabling the cache, the execution time is significantly reduced, for example, from 39s to 13s.
 
 * **before**
 
````
*(Two image attachments of ~860 KB each appear at this point in the original diff view; they are not rendered here.)*
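To make the Puppeteer tab above concrete, here is a minimal end-to-end sketch. The import path, target URL, and `cacheId` value are illustrative assumptions rather than content from this commit.

```typescript
// demo.ts — run with: MIDSCENE_CACHE=true tsx demo.ts
// Minimal sketch; assumes PuppeteerAgent is exported from '@midscene/web/puppeteer'.
import puppeteer from 'puppeteer';
import { PuppeteerAgent } from '@midscene/web/puppeteer';

(async () => {
  const browser = await puppeteer.launch();
  const originPage = await browser.newPage();
  await originPage.goto('https://www.saucedemo.com'); // illustrative target page

  const mid = new PuppeteerAgent(originPage, {
    cacheId: 'puppeteer-swag-demo', // hypothetical id; any stable string keyed to this script works
  });

  // The first run asks the model to plan and locate, then persists the results;
  // later runs with MIDSCENE_CACHE=true replay the cached steps instead.
  await mid.aiAction('type "standard_user" in the username field');

  await browser.close();
})();
```

The same `MIDSCENE_CACHE=true` switch drives the Playwright and Bridge Mode tabs; only the `cacheId` wiring differs.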

apps/site/docs/zh/caching.md → apps/site/docs/zh/caching.mdx (+37, -5)

````diff
@@ -8,14 +8,46 @@ Midscene.js 提供了 AI 缓存能力,用于提升整个 AI 执行过程的稳
 
 **使用方式**
 
-```diff
-- playwright test --config=playwright.config.ts
-+ MIDSCENE_CACHE=true playwright test --config=playwright.config.ts
-```
+
+import { Tab, Tabs } from 'rspress/theme';
+
+<Tabs>
+<Tab label="Playwright">
+```diff
+- playwright test --config=playwright.config.ts
++ MIDSCENE_CACHE=true playwright test --config=playwright.config.ts
+```
+</Tab>
+<Tab label="Puppeteer">
+```diff
+- tsx demo.ts
++ MIDSCENE_CACHE=true tsx demo.ts
+```
+
+```javascript
+const mid = new PuppeteerAgent(originPage, {
+  cacheId: 'puppeteer-swag-sab', // Add cache id
+});
+```
+</Tab>
+<Tab label="Bridge Mode">
+```diff
+- tsx demo.ts
++ MIDSCENE_CACHE=true tsx demo.ts
+```
+
+```javascript
+const agent = new AgentOverChromeBridge({
+  cacheId: 'star-midscene-github', // Add cache id
+});
+```
+</Tab>
+</Tabs>
+
 
 **使用效果**
 
-通过引入缓存后,用例的执行时间大幅降低了,例如从1分16秒降低到了23秒
+通过引入缓存后,用例的执行时间大幅降低了,例如从39秒降低到了13秒
 
 * **before**
 
````

packages/web-integration/package.json (+1, -1)

```diff
@@ -106,7 +106,7 @@
     "test:u": "vitest --run -u",
     "test:ai": "AI_TEST_TYPE=web npm run test",
     "test:ai:temp": "AI_TEST_TYPE=web BRIDGE_MODE=true vitest --run tests/ai/bridge/temp.test.ts",
-    "test:ai:bridge": "BRIDGE_MODE=true npm run test --inspect tests/ai/bridge/agent.test.ts",
+    "test:ai:bridge": "MIDSCENE_CACHE=true BRIDGE_MODE=true AI_TEST_TYPE=web npm run test --inspect tests/ai/bridge/temp.test.ts",
     "test:ai:cache": "MIDSCENE_CACHE=true AI_TEST_TYPE=web npm run test",
     "test:ai:all": "npm run test:ai:web && npm run test:ai:native",
     "test:ai:native": "MIDSCENE_CACHE=true AI_TEST_TYPE=native npm run test",
```

packages/web-integration/src/bridge-mode/page-browser-side.ts (+8, -2)

```diff
@@ -103,7 +103,9 @@ export class ChromeExtensionPageBrowserSide extends ChromeExtensionProxyPage {
 
   public async connectNewTabWithUrl(
     url: string,
-    options?: BridgeConnectTabOptions,
+    options: BridgeConnectTabOptions = {
+      trackingActiveTab: true,
+    },
   ) {
     const tab = await chrome.tabs.create({ url });
     const tabId = tab.id;
@@ -117,7 +119,11 @@ export class ChromeExtensionPageBrowserSide extends ChromeExtensionProxyPage {
     }
   }
 
-  public async connectCurrentTab(options?: BridgeConnectTabOptions) {
+  public async connectCurrentTab(
+    options: BridgeConnectTabOptions = {
+      trackingActiveTab: true,
+    },
+  ) {
     const tabs = await chrome.tabs.query({ active: true, currentWindow: true });
     console.log('current tab', tabs);
     const tabId = tabs[0]?.id;
```
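The behavioral change here is that omitting `options` now opts callers into active-tab tracking. A self-contained sketch of the default-parameter pattern (stand-in types, not the real bridge classes):

```typescript
// Stand-in types that mirror the pattern in the diff above; BridgeConnectTabOptions
// and DemoPage are hypothetical, defined only to show the new default behavior.
interface BridgeConnectTabOptions {
  trackingActiveTab?: boolean;
}

class DemoPage {
  // The options parameter now defaults to { trackingActiveTab: true }
  // instead of being optional and therefore undefined when omitted.
  async connectCurrentTab(
    options: BridgeConnectTabOptions = { trackingActiveTab: true },
  ) {
    console.log('trackingActiveTab =', options.trackingActiveTab);
  }
}

(async () => {
  const page = new DemoPage();
  await page.connectCurrentTab(); // logs true: the new default
  await page.connectCurrentTab({ trackingActiveTab: false }); // explicit opt-out
})();
```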

packages/web-integration/src/chrome-extension/page.ts (+3)

```diff
@@ -23,10 +23,13 @@ function sleep(ms: number) {
   return new Promise((resolve) => setTimeout(resolve, ms));
 }
 
+declare const __VERSION__: string;
+
 export default class ChromeExtensionProxyPage implements AbstractPage {
   pageType = 'chrome-extension-proxy';
 
   public trackingActiveTab: boolean;
+  private version: string = __VERSION__;
 
   private viewportSize?: Size;
```
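`declare const __VERSION__: string` only satisfies the type checker; the actual value has to be injected when the bundle is built. As one hedged illustration (not necessarily this repository's real build setup), a bundler define step can supply it:

```typescript
// build.ts — illustrative only: one common way to provide the __VERSION__ global.
import { build } from 'esbuild';
import { readFileSync } from 'node:fs';

const pkg = JSON.parse(readFileSync('./package.json', 'utf8'));

await build({
  entryPoints: ['src/chrome-extension/page.ts'], // entry path assumed for the example
  bundle: true,
  outdir: 'dist',
  // Every occurrence of __VERSION__ is replaced with the package version string,
  // which is what backs `declare const __VERSION__: string` at runtime.
  define: { __VERSION__: JSON.stringify(pkg.version) },
});
```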

packages/web-integration/src/common/task-cache.ts (+59, -9)

```diff
@@ -1,6 +1,7 @@
 import { existsSync, readFileSync } from 'node:fs';
 import { join } from 'node:path';
 import type { AIElementIdResponse, PlanningAIResponse } from '@midscene/core';
+import type { vlmPlanning } from '@midscene/core/ai-model';
 import { getAIConfig } from '@midscene/core/env';
 import {
   getLogDirByType,
@@ -24,6 +25,19 @@ export type PlanTask = {
   response: PlanningAIResponse;
 };
 
+export type UITarsPlanTask = {
+  type: 'ui-tars-plan';
+  prompt: string;
+  pageContext: {
+    url: string;
+    size: {
+      width: number;
+      height: number;
+    };
+  };
+  response: Awaited<ReturnType<typeof vlmPlanning>>;
+};
+
 export type LocateTask = {
   type: 'locate';
   prompt: string;
@@ -37,7 +51,7 @@
   response: AIElementIdResponse;
 };
 
-export type AiTasks = Array<PlanTask | LocateTask>;
+export type AiTasks = Array<PlanTask | LocateTask | UITarsPlanTask>;
 
 export type AiTaskCache = {
   aiTasks: Array<{
@@ -46,6 +60,19 @@
   }>;
 };
 
+export type CacheGroup = {
+  readCache: <T extends 'plan' | 'locate' | 'ui-tars-plan'>(
+    pageContext: WebUIContext,
+    type: T,
+    actionPrompt: string,
+  ) => T extends 'plan'
+    ? PlanTask['response']
+    : T extends 'locate'
+      ? LocateTask['response']
+      : UITarsPlanTask['response'];
+  saveCache: (cache: UITarsPlanTask | PlanTask | LocateTask) => void;
+};
+
 export class TaskCache {
   cache: AiTaskCache;
 
@@ -66,7 +93,7 @@
     };
   }
 
-  getCacheGroupByPrompt(aiActionPrompt: string) {
+  getCacheGroupByPrompt(aiActionPrompt: string): CacheGroup {
     const { aiTasks = [] } = this.cache || { aiTasks: [] };
     const index = aiTasks.findIndex((item) => item.prompt === aiActionPrompt);
     const newCacheGroup: AiTasks = [];
@@ -75,30 +102,43 @@
       tasks: newCacheGroup,
     });
     return {
-      readCache: <T extends 'plan' | 'locate'>(
+      readCache: <T extends 'plan' | 'locate' | 'ui-tars-plan'>(
        pageContext: WebUIContext,
        type: T,
        actionPrompt: string,
      ) => {
        if (index === -1) {
-          return false;
+          return false as any;
        }
        if (type === 'plan') {
          return this.readCache(
            pageContext,
            type,
            actionPrompt,
            aiTasks[index].tasks,
-          ) as T extends 'plan' ? PlanTask['response'] : LocateTask['response'];
+          ) as PlanTask['response'];
+        }
+        if (type === 'ui-tars-plan') {
+          return this.readCache(
+            pageContext,
+            type,
+            actionPrompt,
+            aiTasks[index].tasks,
+          ) as UITarsPlanTask['response'];
        }
+
        return this.readCache(
          pageContext,
          type,
          actionPrompt,
          aiTasks[index].tasks,
-        ) as T extends 'plan' ? PlanTask['response'] : LocateTask['response'];
+        ) as T extends 'plan'
+          ? PlanTask['response']
+          : T extends 'locate'
+            ? LocateTask['response']
+            : UITarsPlanTask['response'];
      },
-      saveCache: (cache: PlanTask | LocateTask) => {
+      saveCache: (cache: PlanTask | LocateTask | UITarsPlanTask) => {
        newCacheGroup.push(cache);
        this.writeCacheToFile();
      },
@@ -127,6 +167,12 @@
     userPrompt: string,
     cacheGroup: AiTasks,
   ): PlanTask['response'];
+  readCache(
+    pageContext: WebUIContext,
+    type: 'ui-tars-plan',
+    userPrompt: string,
+    cacheGroup: AiTasks,
+  ): UITarsPlanTask['response'];
   readCache(
     pageContext: WebUIContext,
     type: 'locate',
@@ -135,10 +181,14 @@
   ): LocateTask['response'];
   readCache(
     pageContext: WebUIContext,
-    type: 'plan' | 'locate',
+    type: 'plan' | 'locate' | 'ui-tars-plan',
     userPrompt: string,
     cacheGroup: AiTasks,
-  ): PlanTask['response'] | LocateTask['response'] | false {
+  ):
+    | PlanTask['response']
+    | LocateTask['response']
+    | UITarsPlanTask['response']
+    | false {
     if (cacheGroup.length > 0) {
       const index = cacheGroup.findIndex((item) => item.prompt === userPrompt);
 
```
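Taken together, `CacheGroup` gives callers a typed read/write pair scoped to one action prompt. A hypothetical consumer might look like this; the instance setup and the `WebUIContext` import path are assumptions, not code from this commit.

```typescript
// Hypothetical consumer of the CacheGroup API introduced above.
import type { TaskCache } from './task-cache';
import type { WebUIContext } from './utils'; // assumed import path

declare const taskCache: TaskCache; // assume an instance created elsewhere
declare const pageContext: WebUIContext; // assume an existing UI context

const cacheGroup = taskCache.getCacheGroupByPrompt('click the login button');

// The generic parameter narrows the return type per task kind: 'ui-tars-plan'
// yields the vlmPlanning result, while 'plan' and 'locate' keep their shapes.
const hit = cacheGroup.readCache(pageContext, 'ui-tars-plan', 'click the login button');

if (hit) {
  // Reuse hit.actions / hit.action_summary without calling the model.
} else {
  // On a miss, compute via vlmPlanning, then persist for the next run:
  // cacheGroup.saveCache({ type: 'ui-tars-plan', prompt, pageContext: { url, size }, response });
}
```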

packages/web-integration/src/common/tasks.ts (+36, -8)

```diff
@@ -591,7 +591,10 @@ export class PageTaskExecutor {
     return task;
   }
 
-  private planningTaskToGoal(userPrompt: string) {
+  private planningTaskToGoal(
+    userPrompt: string,
+    cacheGroup: ReturnType<TaskCache['getCacheGroupByPrompt']>,
+  ) {
     const task: ExecutionTaskPlanningApply = {
       type: 'Planning',
       locate: null,
@@ -621,10 +624,30 @@
         ],
       });
       const startTime = Date.now();
-      const planResult = await vlmPlanning({
-        userInstruction: param.userPrompt,
-        conversationHistory: this.conversationHistory,
-        size: pageContext.size,
+
+      const planCache = cacheGroup.readCache(
+        pageContext,
+        'ui-tars-plan',
+        userPrompt,
+      );
+      let planResult: Awaited<ReturnType<typeof vlmPlanning>>;
+      if (planCache) {
+        planResult = planCache;
+      } else {
+        planResult = await vlmPlanning({
+          userInstruction: param.userPrompt,
+          conversationHistory: this.conversationHistory,
+          size: pageContext.size,
+        });
+      }
+      cacheGroup.saveCache({
+        type: 'ui-tars-plan',
+        pageContext: {
+          url: pageContext.url,
+          size: pageContext.size,
+        },
+        prompt: userPrompt,
+        response: planResult,
       });
       const aiCost = Date.now() - startTime;
       const { actions, action_summary } = planResult;
@@ -643,6 +666,9 @@
             whatHaveDone: '',
           },
         },
+        cache: {
+          hit: Boolean(planCache),
+        },
         aiCost,
       };
     },
@@ -738,15 +764,17 @@
       onTaskStart: options?.onTaskStart,
     });
     this.conversationHistory = [];
-
+    const cacheGroup = this.taskCache.getCacheGroupByPrompt(userPrompt);
     const isCompleted = false;
     let currentActionNumber = 0;
     const maxActionNumber = 20;
 
     while (!isCompleted && currentActionNumber < maxActionNumber) {
       currentActionNumber++;
-      const planningTask: ExecutionTaskPlanningApply =
-        this.planningTaskToGoal(userPrompt);
+      const planningTask: ExecutionTaskPlanningApply = this.planningTaskToGoal(
+        userPrompt,
+        cacheGroup,
+      );
       await taskExecutor.append(planningTask);
       const output = await taskExecutor.flush();
       if (taskExecutor.isInErrorState()) {
```
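Stripped of the surrounding executor plumbing, the planning step now follows a read-through cache pattern and reports whether it hit. A simplified, self-contained sketch of that flow; the types and callbacks are stand-ins, not the committed code:

```typescript
// PlanResult is a simplified stand-in for Awaited<ReturnType<typeof vlmPlanning>>.
type PlanResult = { actions: unknown[]; action_summary: string };

async function planWithCache(
  readCache: () => PlanResult | false, // stands in for cacheGroup.readCache(...)
  computePlan: () => Promise<PlanResult>, // stands in for vlmPlanning({ ... })
  saveCache: (response: PlanResult) => void, // stands in for cacheGroup.saveCache({ ... })
): Promise<{ result: PlanResult; cacheHit: boolean }> {
  const cached = readCache(); // 1. try the persisted plan first
  const result = cached || (await computePlan()); // 2. fall back to the model on a miss
  saveCache(result); // 3. write back, as the diff does on every pass
  return { result, cacheHit: Boolean(cached) }; // 4. the hit flag feeds the task's cache field
}
```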
