| 1 | +# @gui-agent/cli |
| 2 | + |
| 3 | +Command-line interface for GUI Agent, an automation tool for desktop, web, and mobile applications. |
| 4 | + |
| 5 | +## Installation |
| 6 | + |
| 7 | +### Global Installation |
| 8 | +```bash |
| 9 | +npm install -g @gui-agent/cli |
| 10 | +``` |
| 11 | + |
| 12 | +### Use via npx (without installation) |
| 13 | +```bash |
| 14 | +npx @gui-agent/cli run [options] |
| 15 | +``` |
| 16 | + |
| 17 | +### Local Installation |
| 18 | +```bash |
| 19 | +npm install @gui-agent/cli |
| 20 | +``` |
| 21 | + |
| 22 | +## Usage |
| 23 | + |
| 24 | +### Basic Usage |
| 25 | + |
| 26 | +```bash |
| 27 | +gui-agent run |
| 28 | +``` |
| 29 | + |
| 30 | +This will start an interactive prompt where you can: |
| 31 | +1. Configure your VLM settings (provider, base URL, API key, model name) |
| 32 | +2. Select the target operator (computer, browser, or android) |
| 33 | +3. Enter your automation instruction |
| 34 | + |
| 35 | +### Available Commands |
| 36 | + |
| 37 | +#### `gui-agent run` |
| 38 | +Run GUI Agent automation with optional parameters. |
| 39 | + |
| 40 | +#### `gui-agent reset` |
| 41 | +Reset stored configuration (API keys, model settings, etc.). |
| 42 | +```bash |
| 43 | +gui-agent reset # Reset default configuration file |
| 44 | +gui-agent reset -c custom.json # Reset specific configuration file |
| 45 | +``` |
| 46 | + |
| 47 | +### Command Line Options |
| 48 | + |
| 49 | +```bash |
| 50 | +gui-agent run [options] |
| 51 | +``` |
| 52 | + |
| 53 | +#### Options: |
| 54 | +- `-p, --presets <url>` - Load model configuration from a remote YAML preset file |
| 55 | +- `-t, --target <target>` - Specify the target operator: |
| 56 | + - `computer` - Desktop automation (default) |
| 57 | + - `browser` - Web browser automation |
| 58 | + - `android` - Android mobile automation |
| 59 | +- `-q, --query <query>` - Provide the automation instruction directly via command line |
| 60 | +- `-c, --config <path>` - Path to a custom configuration file (default: `~/.gui-agent-cli.json`) |
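The target values listed above can be checked before the CLI is invoked. The following is a minimal sketch: `is_valid_target` and `preview_run` are hypothetical helper names, not part of the CLI, and the wrapper only prints the command line instead of running it.

```bash
# Hypothetical helper: accept only the three targets documented above.
is_valid_target() {
  case "$1" in
    computer|browser|android) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: print the `gui-agent run` command that would be used.
# It never executes the CLI; copy the printed line to run it.
preview_run() {
  target="$1"
  query="$2"
  if ! is_valid_target "$target"; then
    echo "unknown target: $target" >&2
    return 1
  fi
  printf 'gui-agent run -t %s -q "%s"\n' "$target" "$query"
}
```

For example, `preview_run browser "Search for 'GUI Agent automation' on Google"` prints the matching `gui-agent run -t browser -q ...` line, while an unsupported target returns a non-zero status.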
| 61 | + |
| 62 | +### Examples |
| 63 | + |
| 64 | +#### Computer Automation |
| 65 | +```bash |
| 66 | +gui-agent run -t computer -q "Open Chrome browser and navigate to github.com" |
| 67 | +``` |
| 68 | + |
| 69 | +#### Android Mobile Automation |
| 70 | +Make sure your Android device is connected over USB with USB debugging enabled: |
| 71 | + |
| 72 | +```bash |
| 73 | +gui-agent run -t android -q "Open WhatsApp and send a message to John" |
| 74 | +``` |
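One way to confirm the device is reachable before starting a run is to inspect the output of `adb devices`. The sketch below assumes `adb` is installed and on `PATH`; `count_ready_devices` is a hypothetical helper name, and it reads the command output on stdin so the parsing can be checked without a device attached.

```bash
# Count the devices that `adb devices` lists in the "device" (ready) state.
# Entries marked "unauthorized" or "offline" are not counted.
count_ready_devices() {
  awk '$2 == "device" { n++ } END { print n + 0 }'
}
```

A typical use is `adb devices | count_ready_devices`; a result of `0` suggests the device is not connected or USB debugging has not been authorized.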
| 75 | + |
| 76 | +#### Browser Automation |
| 77 | +```bash |
| 78 | +gui-agent run -t browser -q "Search for 'GUI Agent automation' on Google" |
| 79 | +``` |
| 80 | + |
| 81 | +#### Using Remote Presets |
| 82 | +```bash |
| 83 | +gui-agent run -p "https://example.com/config.yaml" -q "Automate the login process" |
| 84 | +``` |
| 85 | + |
| 86 | +## Configuration |
| 87 | + |
| 88 | +### Model Configuration |
| 89 | + |
| 90 | +The CLI requires VLM (Vision Language Model) configuration. You can provide this via: |
| 91 | + |
| 92 | +1. **Interactive setup** - When you first run the CLI, it will prompt for: |
| 93 | + - Model provider (volcengine, anthropic, openai, lm-studio, deepseek, ollama) |
| 94 | + - Model base URL |
| 95 | + - API key |
| 96 | + - Model name |
| 97 | + |
| 98 | +2. **Configuration file** - Settings are saved to `~/.gui-agent-cli.json`: |
| 99 | + ```json |
| 100 | + { |
| 101 | + "provider": "openai", |
| 102 | + "baseURL": "https://api.openai.com/v1", |
| 103 | + "apiKey": "your-api-key", |
| 104 | + "model": "gpt-4-vision-preview", |
| 105 | + "useResponsesApi": false |
| 106 | + } |
| 107 | + ``` |
| 108 | + |
| 109 | +3. **Remote presets** - Load configuration from a YAML file: |
| 110 | + ```yaml |
| 111 | + vlmBaseUrl: "https://api.openai.com/v1" |
| 112 | + vlmApiKey: "your-api-key" |
| 113 | + vlmModelName: "gpt-4-vision-preview" |
| 114 | + useResponsesApi: false |
| 115 | + ``` |
| 116 | +
|
| 117 | +#### Supported Providers |
| 118 | +- **volcengine** - VolcEngine (ByteDance) models |
| 119 | +- **anthropic** - Anthropic Claude models |
| 120 | +- **openai** - OpenAI models (default) |
| 121 | +- **lm-studio** - LM Studio local models |
| 122 | +- **deepseek** - DeepSeek models |
| 123 | +- **ollama** - Ollama local models |
| 124 | +
|
| 125 | +## Operators |
| 126 | + |
| 175 | +### Desktop Automation (nut-js) |
| 176 | +- Automates desktop applications |
| 177 | +- Uses computer vision to identify UI elements |
| 178 | +- Supports mouse and keyboard actions |
| 179 | +- Works with Windows, macOS, and Linux |
| 180 | +
|
| 181 | +### Android Automation (adb) |
| 182 | +- Controls Android devices via ADB |
| 183 | +- Requires USB debugging enabled |
| 184 | +- Can automate mobile apps and system UI |
| 185 | +- Supports touch gestures and device interactions |
| 186 | +
|
| 187 | +## Configuration Management |
| 188 | + |
| 189 | +### Reset Configuration |
| 190 | +To clear all stored configuration and start fresh: |
| 191 | +```bash |
| 192 | +gui-agent reset |
| 193 | +``` |
| 194 | + |
| 195 | +This removes the configuration file (`~/.gui-agent-cli.json`). The CLI prompts you to configure settings again on the next run. |
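Because a reset discards the stored settings, it may be worth keeping a copy first. This is a minimal sketch, not a feature of the CLI; `backup_config` is a hypothetical helper name, and the default path is the one documented above.

```bash
# Hypothetical helper: copy the config file to <file>.bak before a reset.
# Defaults to the documented location; does nothing if the file is absent.
backup_config() {
  src="${1:-$HOME/.gui-agent-cli.json}"
  [ -f "$src" ] || return 0
  cp -p "$src" "$src.bak"
}
```

Running `backup_config && gui-agent reset` keeps the previous settings in `~/.gui-agent-cli.json.bak`.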
| 196 | + |
| 197 | +### Custom Configuration File |
| 198 | +You can specify a custom configuration file location: |
| 199 | +```bash |
| 200 | +gui-agent run -c /path/to/custom-config.json |
| 201 | +``` |
| 202 | + |
| 203 | +To reset a specific configuration file: |
| 204 | +```bash |
| 205 | +gui-agent reset -c /path/to/custom-config.json |
| 206 | +``` |
| 207 | + |
| 208 | +## Development |
| 209 | + |
| 210 | +### Building the CLI |
| 211 | +```bash |
| 212 | +npm run build |
| 213 | +``` |
| 214 | + |
| 215 | +### Development Mode |
| 216 | +```bash |
| 217 | +npm run dev |
| 218 | +``` |
| 219 | + |
| 220 | +### Running Tests |
| 221 | +```bash |
| 222 | +npm test |
| 223 | +``` |
| 224 | + |
| 225 | +## License |
| 226 | + |
| 227 | +Apache-2.0 |
| 228 | + |
| 229 | +## Contributing |
| 230 | + |
| 231 | +Contributions are welcome! Please read our contributing guidelines and submit pull requests to our repository. |