Commit 95231aa

feat(gui-agent): add CLI package for GUI Agent automation (#1741)
1 parent 2c2f8c1 commit 95231aa

9 files changed

Lines changed: 656 additions & 3 deletions

multimodal/gui-agent/cli/README.md

Lines changed: 231 additions & 0 deletions
@@ -0,0 +1,231 @@
# @gui-agent/cli

CLI for GUI Agent - A powerful automation tool for desktop, web, and mobile applications.

## Installation

### Global Installation
```bash
npm install -g @gui-agent/cli
```

### Use via npx (without installation)
```bash
npx @gui-agent/cli run [options]
```

### Local Installation
```bash
npm install @gui-agent/cli
```

## Usage

### Basic Usage

```bash
gui-agent run
```

This will start an interactive prompt where you can:
1. Configure your VLM model settings (provider, base URL, API key, model name)
2. Select the target operator (computer, browser, or android)
3. Enter your automation instruction

### Available Commands

#### `gui-agent run`
Run GUI Agent automation with optional parameters.

#### `gui-agent reset`
Reset stored configuration (API keys, model settings, etc.).
```bash
gui-agent reset                    # Reset default configuration file
gui-agent reset -c custom.json     # Reset specific configuration file
```

### Command Line Options

```bash
gui-agent run [options]
```

#### Options:
- `-p, --presets <url>` - Load model configuration from a remote YAML preset file
- `-t, --target <target>` - Specify the target operator:
  - `computer` - Desktop automation (default)
  - `browser` - Web browser automation
  - `android` - Android mobile automation
- `-q, --query <query>` - Provide the automation instruction directly via command line
- `-c, --config <path>` - Path to a custom configuration file (default: `~/.gui-agent-cli.json`)
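
The options and defaults listed above can be modeled roughly as follows. This is a minimal TypeScript sketch, not the CLI's actual implementation: `RunOptions`, `isTarget`, and `resolveRunOptions` are hypothetical names, and the sketch assumes an omitted `--target` falls back to `computer` and an omitted `--config` falls back to `~/.gui-agent-cli.json`, as the list states.

```typescript
// Hypothetical model of the documented `run` options; names are illustrative only.
type Target = "computer" | "browser" | "android";

interface RunOptions {
  presets?: string; // -p, --presets <url>
  target: Target;   // -t, --target <target>
  query?: string;   // -q, --query <query>
  config: string;   // -c, --config <path>
}

const TARGETS: readonly Target[] = ["computer", "browser", "android"];
const DEFAULT_CONFIG_PATH = "~/.gui-agent-cli.json";

// Type guard: narrows an arbitrary string to one of the three documented targets.
function isTarget(value: string): value is Target {
  return (TARGETS as readonly string[]).includes(value);
}

// Apply the documented defaults and reject targets the README does not list.
function resolveRunOptions(raw: {
  presets?: string;
  target?: string;
  query?: string;
  config?: string;
}): RunOptions {
  const target = raw.target ?? "computer";
  if (!isTarget(target)) {
    throw new Error(`Unknown target "${target}"; expected one of: ${TARGETS.join(", ")}`);
  }
  return {
    presets: raw.presets,
    target,
    query: raw.query,
    config: raw.config ?? DEFAULT_CONFIG_PATH,
  };
}
```

Validating the target up front means a typo such as `-t andriod` fails with a clear message before any operator is started.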

### Examples

#### Computer Automation
```bash
gui-agent run -t computer -q "Open Chrome browser and navigate to github.com"
```

#### Android Mobile Automation
Make sure your Android device is connected and USB debugging is enabled:

```bash
gui-agent run -t android -q "Open WhatsApp and send a message to John"
```

#### Browser Automation
```bash
gui-agent run -t browser -q "Search for 'GUI Agent automation' on Google"
```

#### Using Remote Presets
```bash
gui-agent run -p "https://example.com/config.yaml" -q "Automate the login process"
```

## Configuration

### Model Configuration

The CLI requires VLM (Vision Language Model) configuration. You can provide this via:

1. **Interactive setup** - When you first run the CLI, it will prompt for:
   - Model provider (volcengine, anthropic, openai, lm-studio, deepseek, ollama)
   - Model base URL
   - API key
   - Model name

2. **Configuration file** - Settings are saved to `~/.gui-agent-cli.json`:
   ```json
   {
     "provider": "openai",
     "baseURL": "https://api.openai.com/v1",
     "apiKey": "your-api-key",
     "model": "gpt-4-vision-preview",
     "useResponsesApi": false
   }
   ```

3. **Remote presets** - Load configuration from a YAML file:
   ```yaml
   vlmBaseUrl: "https://api.openai.com/v1"
   vlmApiKey: "your-api-key"
   vlmModelName: "gpt-4-vision-preview"
   useResponsesApi: false
   ```

#### Supported Providers
- **volcengine** - VolcEngine (ByteDance) models
- **anthropic** - Anthropic Claude models
- **openai** - OpenAI models (default)
- **lm-studio** - LM Studio local models
- **deepseek** - DeepSeek models
- **ollama** - Ollama local models
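
The JSON configuration file and the YAML preset shown under Model Configuration use different key names (`baseURL` vs. `vlmBaseUrl`, and so on). The TypeScript sketch below shows one way the two shapes could be reconciled. It is an illustration under assumptions, not the CLI's code: `RemotePreset`, `CliConfig`, and `presetToConfig` are hypothetical names, the preset is assumed to be already parsed from YAML, and defaulting `useResponsesApi` to `false` when the preset omits it is a guess.

```typescript
// Shape of a remote preset after YAML parsing (keys taken from the preset example).
interface RemotePreset {
  vlmBaseUrl: string;
  vlmApiKey: string;
  vlmModelName: string;
  useResponsesApi?: boolean;
}

// Shape of the local JSON configuration file (keys taken from the JSON example).
interface CliConfig {
  provider?: string;
  baseURL: string;
  apiKey: string;
  model: string;
  useResponsesApi: boolean;
}

// Hypothetical helper: translate preset keys into config-file keys.
// The preset example carries no provider, so it is passed separately and is optional.
function presetToConfig(preset: RemotePreset, provider?: string): CliConfig {
  return {
    ...(provider !== undefined ? { provider } : {}),
    baseURL: preset.vlmBaseUrl,
    apiKey: preset.vlmApiKey,
    model: preset.vlmModelName,
    useResponsesApi: preset.useResponsesApi ?? false, // assumed default
  };
}

// Usage with the placeholder values from the examples above.
const exampleConfig = presetToConfig({
  vlmBaseUrl: "https://api.openai.com/v1",
  vlmApiKey: "your-api-key",
  vlmModelName: "gpt-4-vision-preview",
});
console.log(JSON.stringify(exampleConfig, null, 2));
```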

## Operators

### Desktop Automation (nut-js)
- Automates desktop applications
- Uses computer vision to identify UI elements
- Supports mouse and keyboard actions
- Works with Windows, macOS, and Linux

### Android Automation (adb)
- Controls Android devices via ADB
- Requires USB debugging enabled
- Can automate mobile apps and system UI
- Supports touch gestures and device interactions

## Configuration Management

### Reset Configuration
To clear all stored configuration and start fresh:
```bash
gui-agent reset
```

This will remove the configuration file (`~/.gui-agent-cli.json`) and the CLI will prompt you to configure settings again on the next run.
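
In effect, a reset amounts to deleting one file. The TypeScript sketch below illustrates that behavior under assumptions: `resetConfig` and `DEFAULT_CONFIG_FILE` are hypothetical names, and the real command may do more (for example, print a confirmation). Only the default path comes from this README.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Default location named in this README, expanded against the user's home directory.
const DEFAULT_CONFIG_FILE = path.join(os.homedir(), ".gui-agent-cli.json");

// Hypothetical helper: delete the configuration file if it exists.
// Returns true when a file was removed and false when there was nothing to remove.
function resetConfig(configPath: string = DEFAULT_CONFIG_FILE): boolean {
  if (!fs.existsSync(configPath)) {
    return false;
  }
  fs.rmSync(configPath);
  return true;
}
```

Returning a boolean instead of throwing on a missing file keeps a repeated `reset` harmless, which suits a command users may run just to be sure they are starting fresh.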

### Custom Configuration File
You can specify a custom configuration file location:
```bash
gui-agent run -c /path/to/custom-config.json
```

To reset a specific configuration file:
```bash
gui-agent reset -c /path/to/custom-config.json
```

## Development

### Building the CLI
```bash
npm run build
```

### Development Mode
```bash
npm run dev
```

### Running Tests
```bash
npm test
```

## License

Apache-2.0

## Contributing

Contributions are welcome! Please read our contributing guidelines and submit pull requests to our repository.
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
#!/usr/bin/env node

function main() {
  try {
    const { run } = require('../dist/cli/commands');
    run();
  } catch (err) {
    console.error(err);
    process.exit(1);
  }
}

main();
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
{
  "name": "@gui-agent/cli",
  "version": "0.3.0",
  "description": "CLI for GUI Agent",
  "repository": {
    "type": "git",
    "url": "https://github.com/bytedance/UI-TARS-desktop"
  },
  "bugs": {
    "url": "https://github.com/bytedance/UI-TARS-desktop/issues"
  },
  "bin": {
    "gui-agent": "./bin/index.js"
  },
  "keywords": [
    "CLI",
    "GUI-Agent",
    "Automation"
  ],
  "scripts": {
    "dev": "rslib build --watch",
    "build": "rslib build",
    "build:watch": "rslib build --watch",
    "cli": "node bin/index.js",
    "test": "vitest"
  },
  "license": "Apache-2.0",
  "files": [
    "dist",
    "bin"
  ],
  "publishConfig": {
    "access": "public",
    "registry": "https://registry.npmjs.org"
  },
  "dependencies": {
    "commander": "^14.0.0",
    "jimp": "1.6.0",
    "js-yaml": "^4.1.0",
    "@clack/prompts": "^0.11.0",
    "@gui-agent/agent-sdk": "workspace:*",
    "@gui-agent/operator-nutjs": "workspace:*",
    "@gui-agent/operator-adb": "workspace:*",
    "@gui-agent/operator-browser": "workspace:*",
    "node-fetch": "^2.7.0"
  },
  "devDependencies": {
    "@rslib/core": "0.10.0",
    "@types/js-yaml": "^4.0.9",
    "@types/node-fetch": "^2.6.2",
    "typescript": "^5.5.3",
    "vitest": "3.2.4"
  }
}
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
/**
 * Copyright (c) 2025 Bytedance, Inc. and its affiliates.
 * SPDX-License-Identifier: Apache-2.0
 */
import { defineConfig } from '@rslib/core';

const BANNER = `/**
* Copyright (c) 2025 Bytedance, Inc. and its affiliates.
* SPDX-License-Identifier: Apache-2.0
*/`;

export default defineConfig({
  source: {
    entry: {
      index: ['./src/**', '!./src/**/*.test.ts'],
    },
  },
  lib: [
    {
      format: 'esm',
      syntax: 'es2021',
      bundle: false,
      autoExternal: false,
      dts: true,
      banner: { js: BANNER },
    },
    {
      format: 'cjs',
      syntax: 'es2021',
      bundle: false,
      dts: true,
      banner: { js: BANNER },
    },
  ],
  output: {
    target: 'node',
    cleanDistPath: false,
    sourceMap: true,
  },
});
