| 1 | +--- |
| 2 | +name: agent-device |
| 3 | +description: Automates interactions for iOS simulators/devices and Android emulators/devices. Use when navigating apps, taking snapshots/screenshots, tapping, typing, scrolling, or extracting UI info on mobile targets. |
| 4 | +--- |
| 5 | + |
| 6 | +# Mobile Automation with agent-device |
| 7 | + |
| 8 | +Use refs for agent-driven exploration and selectors for deterministic replay scripts.
| 9 | + |
| 10 | +## Quick start |
| 11 | + |
| 12 | +```bash |
| 13 | +agent-device open Settings --platform ios |
| 14 | +agent-device snapshot -i |
| 15 | +agent-device press @e3 |
| 16 | +agent-device wait text "Camera" |
| 17 | +agent-device alert wait 10000 |
| 18 | +agent-device fill @e5 "test" |
| 19 | +agent-device close |
| 20 | +``` |
| 21 | + |
| 22 | +If `agent-device` is not installed, run it through `npx`:
| 23 | + |
| 24 | +```bash |
| 25 | +npx -y agent-device |
| 26 | +``` |
| 27 | + |
| 28 | +## Core workflow |
| 29 | + |
| 30 | +1. Open app or deep link: `open [app|url] [url]` (`open` handles target selection + boot/activation in the normal flow) |
| 31 | +2. Snapshot: `snapshot` to get refs from accessibility tree |
| 32 | +3. Interact using refs (`press @ref`, `fill @ref "text"`; `click` is an alias of `press`) |
| 33 | +4. Re-snapshot after navigation/UI changes |
| 34 | +5. Close session when done |
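
The five steps above can be sketched as one script. This is a minimal sketch, not a definitive flow: `ad` is a hypothetical helper introduced here (it is not part of agent-device), `DRY_RUN` is an assumed variable, and `@e3` is a placeholder ref, since real refs come from the snapshot output.

```bash
#!/usr/bin/env bash
# Core workflow sketch. DRY_RUN=1 (the default here) only prints each command,
# so the sequence can be reviewed without a device or simulator attached.
DRY_RUN="${DRY_RUN:-1}"

# ad is a hypothetical helper for this sketch, not part of agent-device.
ad() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "agent-device $*"
  else
    agent-device "$@"
  fi
}

ad open Settings --platform ios   # 1. open the app; open selects and boots the target
ad snapshot -i                    # 2. snapshot interactive elements to get refs
ad press @e3                      # 3. interact; @e3 is a placeholder ref
ad snapshot -i                    # 4. re-snapshot, since navigation makes old refs stale
ad close                          # 5. close the session
```

Setting `DRY_RUN=0` runs the same sequence against a real target, provided the placeholder ref is replaced with one taken from the latest snapshot.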
| 35 | + |
| 36 | +## Commands |
| 37 | + |
| 38 | +### Navigation |
| 39 | + |
| 40 | +```bash |
| 41 | +agent-device boot # Ensure target is booted/ready without opening app |
| 42 | +agent-device boot --platform ios # Boot iOS target |
| 43 | +agent-device boot --platform android # Boot Android emulator/device target |
| 44 | +agent-device open [app|url] [url] # Boot device/simulator; optionally launch app or deep link URL |
| 45 | +agent-device open [app] --relaunch # Terminate app process first, then launch (fresh runtime) |
| 46 | +agent-device open [app] --activity com.example/.MainActivity # Android: open specific activity (app targets only) |
| 47 | +agent-device open "myapp://home" --platform android # Android deep link |
| 48 | +agent-device open "https://example.com" --platform ios # iOS deep link (opens in browser) |
| 49 | +agent-device open MyApp "myapp://screen/to" --platform ios # iOS deep link in app context |
| 50 | +agent-device close [app] # Close app or just end session |
| 51 | +agent-device reinstall <app> <path> # Uninstall + install app in one command |
| 52 | +agent-device session list # List active sessions |
| 53 | +``` |
| 54 | + |
| 55 | +`boot` requires either an active session or an explicit selector (`--platform`, `--device`, `--udid`, or `--serial`). |
| 56 | +`boot` is a fallback, not a regular step: use it when starting a new session only if `open` cannot find/connect to an available target. |
| 57 | + |
| 58 | +### Snapshot (page analysis) |
| 59 | + |
| 60 | +```bash |
| 61 | +agent-device snapshot # Full XCTest accessibility tree snapshot |
| 62 | +agent-device snapshot -i # Interactive elements only (recommended) |
| 63 | +agent-device snapshot -c # Compact output |
| 64 | +agent-device snapshot -d 3 # Limit depth |
| 65 | +agent-device snapshot -s "Camera" # Scope to label/identifier |
| 66 | +agent-device snapshot --raw # Raw node output |
| 67 | +``` |
| 68 | + |
| 69 | +XCTest is the iOS snapshot engine: it is fast and complete, and it requires no Accessibility permission.
| 70 | + |
| 71 | +### Find (semantic) |
| 72 | + |
| 73 | +```bash |
| 74 | +agent-device find "Sign In" click |
| 75 | +agent-device find text "Sign In" click |
| 76 | +agent-device find label "Email" fill "user@example.com" |
| 77 | +agent-device find value "Search" type "query" |
| 78 | +agent-device find role button click |
| 79 | +agent-device find id "com.example:id/login" click |
| 80 | +agent-device find "Settings" wait 10000 |
| 81 | +agent-device find "Settings" exists |
| 82 | +``` |
| 83 | + |
| 84 | +### Settings helpers |
| 85 | + |
| 86 | +```bash |
| 87 | +agent-device settings wifi on |
| 88 | +agent-device settings wifi off |
| 89 | +agent-device settings airplane on |
| 90 | +agent-device settings airplane off |
| 91 | +agent-device settings location on |
| 92 | +agent-device settings location off |
| 93 | +``` |
| 94 | + |
| 95 | +Note: on iOS, the `wifi` and `airplane` helpers toggle status bar indicators, not the actual network state.
| 96 | +`settings airplane off` clears the status bar overrides.
| 97 | +iOS settings helpers are simulator-only.
| 98 | + |
| 99 | +### App state |
| 100 | + |
| 101 | +```bash |
| 102 | +agent-device appstate |
| 103 | +``` |
| 104 | + |
| 105 | +- Android: `appstate` reports live foreground package/activity. |
| 106 | +- iOS: `appstate` is session-scoped and reports the app tracked by the active session on the target device. |
| 107 | +- For iOS `appstate`, ensure a matching session exists (for example `open --session <name> --platform ios --device "<name>" <app>`). |
| 108 | + |
| 109 | +### Interactions (use @refs from snapshot) |
| 110 | + |
| 111 | +```bash |
| 112 | +agent-device press @e1 # Canonical tap command (`click` is an alias) |
| 113 | +agent-device focus @e2 |
| 114 | +agent-device fill @e2 "text" # Clear then type (Android: verifies value and retries once on mismatch) |
| 115 | +agent-device type "text" # Type into focused field without clearing |
| 116 | +agent-device press 300 500 # Tap by coordinates |
| 117 | +agent-device press 300 500 --count 12 --interval-ms 45 |
| 118 | +agent-device press 300 500 --count 6 --hold-ms 120 --interval-ms 30 --jitter-px 2 |
| 119 | +agent-device press @e1 --count 5 # Repeat taps on the same target |
| 120 | +agent-device press @e1 --count 5 --double-tap # Use double-tap gesture per iteration |
| 121 | +agent-device swipe 540 1500 540 500 120 |
| 122 | +agent-device swipe 540 1500 540 500 120 --count 8 --pause-ms 30 --pattern ping-pong |
| 123 | +agent-device long-press 300 500 800 # Long press (where supported) |
| 124 | +agent-device scroll down 0.5 |
| 125 | +agent-device pinch 2.0 # Zoom in 2x (iOS simulator only) |
| 126 | +agent-device pinch 0.5 200 400 # Zoom out at coordinates (iOS simulator only) |
| 127 | +agent-device back |
| 128 | +agent-device home |
| 129 | +agent-device app-switcher |
| 130 | +agent-device wait 1000 |
| 131 | +agent-device wait text "Settings" |
| 132 | +agent-device is visible 'id="settings_anchor"' # selector assertions for deterministic checks |
| 133 | +agent-device is text 'id="header_title"' "Settings" |
| 134 | +agent-device alert get |
| 135 | +``` |
| 136 | + |
| 137 | +### Get information |
| 138 | + |
| 139 | +```bash |
| 140 | +agent-device get text @e1 |
| 141 | +agent-device get attrs @e1 |
| 142 | +agent-device screenshot out.png |
| 143 | +``` |
| 144 | + |
| 145 | +### Deterministic replay and updating |
| 146 | + |
| 147 | +```bash |
| 148 | +agent-device open App --relaunch # Fresh app process restart in the current session |
| 149 | +agent-device open App --save-script # Save session script (.ad) on close (default path) |
| 150 | +agent-device open App --save-script ./workflows/app-flow.ad # Save to custom file path |
| 151 | +agent-device replay ./session.ad # Run deterministic replay from .ad script |
| 152 | +agent-device replay -u ./session.ad # Update selector drift and rewrite .ad script in place |
| 153 | +``` |
| 154 | + |
| 155 | +`replay` reads `.ad` recordings. |
| 156 | +`--relaunch` controls launch semantics; `--save-script` controls recording. Combine only when both are needed. |
| 157 | +`--save-script` path is a file path; parent directories are created automatically. |
| 158 | +For ambiguous bare values, use `--save-script=workflow.ad` or `./workflow.ad`. |
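
One way to keep CI runs and maintenance runs apart is to choose the replay mode from an environment variable. The sketch below assumes a hypothetical `MAINTENANCE` variable and a hypothetical `replay_cmd` helper, neither of which is part of agent-device. It only prints the command it would run.

```bash
#!/usr/bin/env bash
# Pick the replay mode from a hypothetical MAINTENANCE variable (an assumption
# of this sketch): plain replay in CI, replay -u when refreshing selectors.
replay_cmd() {
  script="$1"
  if [ "${MAINTENANCE:-0}" = "1" ]; then
    echo "agent-device replay -u $script"   # rewrites the .ad script in place
  else
    echo "agent-device replay $script"      # plain replay, suitable for CI
  fi
}

# Print the command for the script path used in the examples above.
replay_cmd ./workflows/app-flow.ad
```

Defaulting to plain `replay` means a CI job never rewrites a recorded script unless the maintenance mode is requested explicitly.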
| 159 | + |
| 160 | +### Fast batching (JSON steps) |
| 161 | + |
| 162 | +Use `batch` when an agent already has a known short sequence and wants fewer orchestration round trips. |
| 163 | + |
| 164 | +```bash |
| 165 | +agent-device batch \ |
| 166 | + --session sim \ |
| 167 | + --platform ios \ |
| 168 | + --udid 00008150-001849640CF8401C \ |
| 169 | + --steps-file /tmp/batch-steps.json \ |
| 170 | + --json |
| 171 | +``` |
| 172 | + |
| 173 | +Inline JSON works for small payloads: |
| 174 | + |
| 175 | +```bash |
| 176 | +agent-device batch --steps '[{"command":"open","positionals":["settings"]},{"command":"wait","positionals":["100"]}]' |
| 177 | +``` |
| 178 | + |
| 179 | +Step format: |
| 180 | + |
| 181 | +```json |
| 182 | +[ |
| 183 | + { "command": "open", "positionals": ["settings"], "flags": {} }, |
| 184 | + { "command": "wait", "positionals": ["label=\"Privacy & Security\"", "3000"], "flags": {} }, |
| 185 | + { "command": "click", "positionals": ["label=\"Privacy & Security\""], "flags": {} }, |
| 186 | + { "command": "get", "positionals": ["text", "label=\"Tracking\""], "flags": {} } |
| 187 | +] |
| 188 | +``` |
| 189 | + |
| 190 | +Batch best practices: |
| 191 | + |
| 192 | +- Batch one screen-local flow at a time. |
| 193 | +- Add sync guards (`wait`, `is exists`) after mutating steps (`open`, `click`, `fill`, `swipe`). |
| 194 | +- Treat prior refs/snapshot assumptions as stale after UI mutations. |
| 195 | +- Prefer `--steps-file` over inline JSON. |
| 196 | +- Keep batches moderate (about 5-20 steps). |
| 197 | +- Use failure context (`step`, `partialResults`) to replan from the failed step. |
| 198 | + |
| 199 | +Stale accessibility tree note: |
| 200 | + |
| 201 | +- Rapid mutations can outrun accessibility tree updates. |
| 202 | +- Mitigate with explicit waits and phase splitting (navigate, verify/extract, cleanup). |
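
The practices above can be combined into a small script that writes a steps file and checks its size before sending it. This is a sketch under stated assumptions: the steps reuse the selectors from the step format example, the extra `wait` on `label="Tracking"` is an illustrative sync guard, and the final `batch` command is printed rather than executed.

```bash
#!/usr/bin/env bash
# Write one screen-local flow to a steps file, with a wait after each mutating step.
steps_file="${TMPDIR:-/tmp}/batch-steps.json"

cat > "$steps_file" <<'JSON'
[
  { "command": "open", "positionals": ["settings"], "flags": {} },
  { "command": "wait", "positionals": ["label=\"Privacy & Security\"", "3000"], "flags": {} },
  { "command": "click", "positionals": ["label=\"Privacy & Security\""], "flags": {} },
  { "command": "wait", "positionals": ["label=\"Tracking\"", "3000"], "flags": {} },
  { "command": "get", "positionals": ["text", "label=\"Tracking\""], "flags": {} }
]
JSON

# Sanity-check the size: the guidance above suggests about 5-20 steps per batch.
step_count="$(grep -c '"command"' "$steps_file")"
if [ "$step_count" -lt 5 ] || [ "$step_count" -gt 20 ]; then
  echo "warning: $step_count steps; consider resizing the batch" >&2
fi

# Printed rather than executed, so the sketch runs without a device attached.
echo "agent-device batch --steps-file $steps_file --json"
```

Counting lines that contain `"command"` only works because each step sits on its own line; a real validator would parse the JSON instead.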
| 203 | + |
| 204 | +### Trace logs (XCTest) |
| 205 | + |
| 206 | +```bash |
| 207 | +agent-device trace start # Start trace capture |
| 208 | +agent-device trace start ./trace.log # Start trace capture to path |
| 209 | +agent-device trace stop # Stop trace capture |
| 210 | +agent-device trace stop ./trace.log # Stop and move trace log |
| 211 | +``` |
| 212 | + |
| 213 | +### Devices and apps |
| 214 | + |
| 215 | +```bash |
| 216 | +agent-device devices |
| 217 | +agent-device apps --platform ios # iOS simulator + iOS device, includes default/system apps |
| 218 | +agent-device apps --platform ios --all # explicit include-all (same as default) |
| 219 | +agent-device apps --platform ios --user-installed |
| 220 | +agent-device apps --platform android # includes default/system apps |
| 221 | +agent-device apps --platform android --all # explicit include-all (same as default) |
| 222 | +agent-device apps --platform android --user-installed |
| 223 | +``` |
| 224 | + |
| 225 | +## Best practices |
| 226 | + |
| 227 | +- `press` is the canonical tap command; `click` is an alias with the same behavior. |
| 228 | +- `press` (and `click`) accepts `x y`, `@ref`, and selector targets. |
| 229 | +- `press`/`click` support gesture series controls: `--count`, `--interval-ms`, `--hold-ms`, `--jitter-px`, `--double-tap`. |
| 230 | +- `--double-tap` cannot be combined with `--hold-ms` or `--jitter-px`. |
| 231 | +- `swipe` supports coordinate + timing controls and repeat patterns: `swipe x1 y1 x2 y2 [durationMs] --count --pause-ms --pattern`. |
| 232 | +- `swipe` timing is platform-safe: Android uses requested duration; iOS uses normalized safe timing to avoid long-press side effects. |
| 233 | +- Pinch (`pinch <scale> [x y]`) is iOS simulator-only; scale > 1 zooms in, < 1 zooms out. |
| 234 | +- Snapshot refs are the core mechanism for interactive agent flows. |
| 235 | +- Use selectors for deterministic replay artifacts and assertions (e.g. in e2e test workflows). |
| 236 | +- Prefer `snapshot -i` to reduce output size. |
| 237 | +- On iOS, snapshots use XCTest and do not require Accessibility permission. |
| 238 | +- If XCTest returns 0 nodes (foreground app changed), treat it as an explicit failure and retry the flow/app state. |
| 239 | +- `open <app|url> [url]` can be used within an existing session to switch apps or open deep links. |
| 240 | +- `open <app>` updates session app bundle context; `open <app> <url>` opens a deep link on iOS. |
| 241 | +- Use `open <app> --relaunch` during React Native/Fast Refresh debugging when you need a fresh app process without ending the session. |
| 242 | +- Use `--session <name>` for parallel sessions; avoid device contention. |
| 243 | +- Use `--activity <component>` on Android to launch a specific activity (e.g. TV apps with LEANBACK); do not combine with URL opens. |
| 244 | +- On iOS devices, `http(s)://` URLs fall back to Safari automatically; custom scheme URLs require an active app in the session. |
| 245 | +- iOS physical-device runner requires Xcode signing/provisioning; optional overrides: `AGENT_DEVICE_IOS_TEAM_ID`, `AGENT_DEVICE_IOS_SIGNING_IDENTITY`, `AGENT_DEVICE_IOS_PROVISIONING_PROFILE`. |
| 246 | +- Default daemon request timeout is `45000` ms. For slow physical-device setup/build, increase `AGENT_DEVICE_DAEMON_TIMEOUT_MS` (for example `120000`).
| 247 | +- For daemon startup troubleshooting, follow stale metadata hints for `~/.agent-device/daemon.json` / `~/.agent-device/daemon.lock`. |
| 248 | +- Use `fill` when you want clear-then-type semantics. |
| 249 | +- Use `type` when you want to append/enter text without clearing. |
| 250 | +- On Android, prefer `fill` for important fields; it verifies entered text and retries once when IME reorders characters. |
| 251 | +- For deterministic replay scripts, run `replay -u` during maintenance to update selector drift, and use plain `replay` in CI.
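
For the daemon timeout mentioned in the list above, a shell profile or CI step might set the variable once before any slow physical-device command. This is a minimal sketch: the value `120000` is the example from the list, and the assumption is that the variable is read from the environment of the shell that invokes `agent-device`.

```bash
#!/usr/bin/env bash
# Use a longer daemon request timeout unless the caller already set one.
export AGENT_DEVICE_DAEMON_TIMEOUT_MS="${AGENT_DEVICE_DAEMON_TIMEOUT_MS:-120000}"
echo "daemon timeout: ${AGENT_DEVICE_DAEMON_TIMEOUT_MS} ms"

# Then run the slow step as usual, for example:
#   agent-device open MyApp --platform ios
```

Using `${VAR:-default}` keeps any value the caller exported earlier, so a longer per-job override is not overwritten.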
| 252 | + |
| 253 | +## References |
| 254 | + |
| 255 | +- [references/snapshot-refs.md](references/snapshot-refs.md) |
| 256 | +- [references/session-management.md](references/session-management.md) |
| 257 | +- [references/permissions.md](references/permissions.md) |
| 258 | +- [references/video-recording.md](references/video-recording.md) |
| 259 | +- [references/coordinate-system.md](references/coordinate-system.md) |
| 260 | +- [references/batching.md](references/batching.md) |