Commit 680e2ed

chore(app): update agent skills
1 parent 3331fdb commit 680e2ed

13 files changed

Lines changed: 536 additions & 2 deletions

Lines changed: 260 additions & 0 deletions
@@ -0,0 +1,260 @@
---
name: agent-device
description: Automates interactions for iOS simulators/devices and Android emulators/devices. Use when navigating apps, taking snapshots/screenshots, tapping, typing, scrolling, or extracting UI info on mobile targets.
---

# Mobile Automation with agent-device

For agent-driven exploration: use refs. For deterministic replay scripts: use selectors.

## Quick start

```bash
agent-device open Settings --platform ios
agent-device snapshot -i
agent-device press @e3
agent-device wait text "Camera"
agent-device alert wait 10000
agent-device fill @e5 "test"
agent-device close
```

If not installed, run:

```bash
npx -y agent-device
```

## Core workflow

1. Open app or deep link: `open [app|url] [url]` (`open` handles target selection + boot/activation in the normal flow)
2. Snapshot: `snapshot` to get refs from accessibility tree
3. Interact using refs (`press @ref`, `fill @ref "text"`; `click` is an alias of `press`)
4. Re-snapshot after navigation/UI changes
5. Close session when done
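
The five steps above can be sketched as a small Python driver. This is only an illustration under stated assumptions: `agent_device_argv`, `core_workflow`, and `run` are hypothetical helpers (not part of agent-device), and the ref `@e3` is a placeholder, since real refs are only known after reading the snapshot output.

```python
import shlex
import subprocess


def agent_device_argv(*args: str) -> list[str]:
    """Build the argv list for one agent-device invocation (hypothetical helper)."""
    return ["agent-device", *args]


def core_workflow(app: str, ref: str) -> list[list[str]]:
    """Return the open -> snapshot -> interact -> re-snapshot -> close sequence."""
    return [
        agent_device_argv("open", app),       # 1. open app
        agent_device_argv("snapshot", "-i"),  # 2. snapshot to obtain refs
        agent_device_argv("press", ref),      # 3. interact using a ref
        agent_device_argv("snapshot", "-i"),  # 4. re-snapshot after the UI change
        agent_device_argv("close"),           # 5. close the session
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command, or execute it when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(core_workflow("Settings", "@e3"))
```

With `dry_run=True` (the default here) nothing is executed; the sequence is only printed so it can be reviewed first.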

## Commands

### Navigation

```bash
agent-device boot # Ensure target is booted/ready without opening app
agent-device boot --platform ios # Boot iOS target
agent-device boot --platform android # Boot Android emulator/device target
agent-device open [app|url] [url] # Boot device/simulator; optionally launch app or deep link URL
agent-device open [app] --relaunch # Terminate app process first, then launch (fresh runtime)
agent-device open [app] --activity com.example/.MainActivity # Android: open specific activity (app targets only)
agent-device open "myapp://home" --platform android # Android deep link
agent-device open "https://example.com" --platform ios # iOS deep link (opens in browser)
agent-device open MyApp "myapp://screen/to" --platform ios # iOS deep link in app context
agent-device close [app] # Close app or just end session
agent-device reinstall <app> <path> # Uninstall + install app in one command
agent-device session list # List active sessions
```

`boot` requires either an active session or an explicit selector (`--platform`, `--device`, `--udid`, or `--serial`).
`boot` is a fallback, not a regular step: use it when starting a new session only if `open` cannot find/connect to an available target.

### Snapshot (page analysis)

```bash
agent-device snapshot # Full XCTest accessibility tree snapshot
agent-device snapshot -i # Interactive elements only (recommended)
agent-device snapshot -c # Compact output
agent-device snapshot -d 3 # Limit depth
agent-device snapshot -s "Camera" # Scope to label/identifier
agent-device snapshot --raw # Raw node output
```

XCTest is the iOS snapshot engine: fast, complete, and no Accessibility permission required.

### Find (semantic)

```bash
agent-device find "Sign In" click
agent-device find text "Sign In" click
agent-device find label "Email" fill "user@example.com"
agent-device find value "Search" type "query"
agent-device find role button click
agent-device find id "com.example:id/login" click
agent-device find "Settings" wait 10000
agent-device find "Settings" exists
```

### Settings helpers

```bash
agent-device settings wifi on
agent-device settings wifi off
agent-device settings airplane on
agent-device settings airplane off
agent-device settings location on
agent-device settings location off
```

Note: on iOS, the wifi/airplane helpers only toggle status bar indicators, not actual network state.
`airplane off` clears status bar overrides.
iOS settings helpers are simulator-only.

### App state

```bash
agent-device appstate
```

- Android: `appstate` reports live foreground package/activity.
- iOS: `appstate` is session-scoped and reports the app tracked by the active session on the target device.
- For iOS `appstate`, ensure a matching session exists (for example `open --session <name> --platform ios --device "<name>" <app>`).
109+
### Interactions (use @refs from snapshot)
110+
111+
```bash
112+
agent-device press @e1 # Canonical tap command (`click` is an alias)
113+
agent-device focus @e2
114+
agent-device fill @e2 "text" # Clear then type (Android: verifies value and retries once on mismatch)
115+
agent-device type "text" # Type into focused field without clearing
116+
agent-device press 300 500 # Tap by coordinates
117+
agent-device press 300 500 --count 12 --interval-ms 45
118+
agent-device press 300 500 --count 6 --hold-ms 120 --interval-ms 30 --jitter-px 2
119+
agent-device press @e1 --count 5 # Repeat taps on the same target
120+
agent-device press @e1 --count 5 --double-tap # Use double-tap gesture per iteration
121+
agent-device swipe 540 1500 540 500 120
122+
agent-device swipe 540 1500 540 500 120 --count 8 --pause-ms 30 --pattern ping-pong
123+
agent-device long-press 300 500 800 # Long press (where supported)
124+
agent-device scroll down 0.5
125+
agent-device pinch 2.0 # Zoom in 2x (iOS simulator only)
126+
agent-device pinch 0.5 200 400 # Zoom out at coordinates (iOS simulator only)
127+
agent-device back
128+
agent-device home
129+
agent-device app-switcher
130+
agent-device wait 1000
131+
agent-device wait text "Settings"
132+
agent-device is visible 'id="settings_anchor"' # selector assertions for deterministic checks
133+
agent-device is text 'id="header_title"' "Settings"
134+
agent-device alert get
135+
```
136+
137+
### Get information
138+
139+
```bash
140+
agent-device get text @e1
141+
agent-device get attrs @e1
142+
agent-device screenshot out.png
143+
```
144+
145+
### Deterministic replay and updating
146+
147+
```bash
148+
agent-device open App --relaunch # Fresh app process restart in the current session
149+
agent-device open App --save-script # Save session script (.ad) on close (default path)
150+
agent-device open App --save-script ./workflows/app-flow.ad # Save to custom file path
151+
agent-device replay ./session.ad # Run deterministic replay from .ad script
152+
agent-device replay -u ./session.ad # Update selector drift and rewrite .ad script in place
153+
```
154+
155+
`replay` reads `.ad` recordings.
156+
`--relaunch` controls launch semantics; `--save-script` controls recording. Combine only when both are needed.
157+
`--save-script` path is a file path; parent directories are created automatically.
158+
For ambiguous bare values, use `--save-script=workflow.ad` or `./workflow.ad`.
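
Because `replay -u` rewrites the `.ad` script in place while plain `replay` does not, it can help to make the choice explicit in tooling. The sketch below is a hypothetical wrapper, not part of agent-device: `replay_argv` and its `maintenance` parameter are illustrative names, and it assumes the CI system exports a non-empty `CI` environment variable (a common convention, not something agent-device defines).

```python
import os


def replay_argv(script_path, *, maintenance=False, env=None):
    """Build the argv for `agent-device replay`, adding -u only for maintenance runs."""
    env = os.environ if env is None else env
    if not script_path.endswith(".ad"):
        raise ValueError("replay expects an .ad recording")
    # Assumption: a non-empty CI variable means we are running in CI.
    if maintenance and env.get("CI"):
        raise RuntimeError("refusing to rewrite the .ad script during a CI run")
    argv = ["agent-device", "replay"]
    if maintenance:
        argv.append("-u")  # update selector drift and rewrite the script in place
    argv.append(script_path)
    return argv
```

A maintenance job would call `replay_argv("./session.ad", maintenance=True)`, while CI keeps the default and never modifies the recording.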

### Fast batching (JSON steps)

Use `batch` when an agent already has a known short sequence and wants fewer orchestration round trips.

```bash
agent-device batch \
  --session sim \
  --platform ios \
  --udid 00008150-001849640CF8401C \
  --steps-file /tmp/batch-steps.json \
  --json
```

Inline JSON works for small payloads:

```bash
agent-device batch --steps '[{"command":"open","positionals":["settings"]},{"command":"wait","positionals":["100"]}]'
```

Step format:

```json
[
  { "command": "open", "positionals": ["settings"], "flags": {} },
  { "command": "wait", "positionals": ["label=\"Privacy & Security\"", "3000"], "flags": {} },
  { "command": "click", "positionals": ["label=\"Privacy & Security\""], "flags": {} },
  { "command": "get", "positionals": ["text", "label=\"Tracking\""], "flags": {} }
]
```

Batch best practices:

- Batch one screen-local flow at a time.
- Add sync guards (`wait`, `is exists`) after mutating steps (`open`, `click`, `fill`, `swipe`).
- Treat prior refs/snapshot assumptions as stale after UI mutations.
- Prefer `--steps-file` over inline JSON.
- Keep batches moderate (about 5-20 steps).
- Use failure context (`step`, `partialResults`) to replan from the failed step.
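
The sync-guard practice above can be applied mechanically before a batch is submitted. The following is a minimal sketch, assuming steps are plain dicts in the step format shown earlier; `with_sync_guards` is a hypothetical helper, and the default guard (a fixed 500 ms `wait`) is only a placeholder. A selector-based `wait` for the next expected element is usually the better guard.

```python
# Commands the best-practice list names as mutating, and as guards.
MUTATING = frozenset({"open", "click", "fill", "swipe"})
GUARDS = frozenset({"wait", "is"})


def with_sync_guards(steps, guard=None):
    """Return a new step list with a guard after each unguarded mutating step."""
    if guard is None:
        # Placeholder guard: fixed delay in milliseconds.
        guard = {"command": "wait", "positionals": ["500"], "flags": {}}
    guarded = []
    for index, step in enumerate(steps):
        guarded.append(step)
        following = steps[index + 1] if index + 1 < len(steps) else None
        already_guarded = following is not None and following["command"] in GUARDS
        if step["command"] in MUTATING and not already_guarded:
            guarded.append(dict(guard))
    return guarded
```

Steps that are already followed by a `wait` or `is` step are left alone, so hand-written guards are not duplicated.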

Stale accessibility tree note:

- Rapid mutations can outrun accessibility tree updates.
- Mitigate with explicit waits and phase splitting (navigate, verify/extract, cleanup).

### Trace logs (XCTest)

```bash
agent-device trace start # Start trace capture
agent-device trace start ./trace.log # Start trace capture to path
agent-device trace stop # Stop trace capture
agent-device trace stop ./trace.log # Stop and move trace log
```

### Devices and apps

```bash
agent-device devices
agent-device apps --platform ios # iOS simulator + iOS device, includes default/system apps
agent-device apps --platform ios --all # explicit include-all (same as default)
agent-device apps --platform ios --user-installed
agent-device apps --platform android # includes default/system apps
agent-device apps --platform android --all # explicit include-all (same as default)
agent-device apps --platform android --user-installed
```

## Best practices

- `press` is the canonical tap command; `click` is an alias with the same behavior.
- `press` (and `click`) accepts `x y`, `@ref`, and selector targets.
- `press`/`click` support gesture series controls: `--count`, `--interval-ms`, `--hold-ms`, `--jitter-px`, `--double-tap`.
- `--double-tap` cannot be combined with `--hold-ms` or `--jitter-px`.
- `swipe` supports coordinate + timing controls and repeat patterns: `swipe x1 y1 x2 y2 [durationMs] --count --pause-ms --pattern`.
- `swipe` timing is platform-safe: Android uses requested duration; iOS uses normalized safe timing to avoid long-press side effects.
- Pinch (`pinch <scale> [x y]`) is iOS simulator-only; scale > 1 zooms in, < 1 zooms out.
- Snapshot refs are the core mechanism for interactive agent flows.
- Use selectors for deterministic replay artifacts and assertions (e.g. in e2e test workflows).
- Prefer `snapshot -i` to reduce output size.
- On iOS, snapshots use XCTest and do not require Accessibility permission.
- If XCTest returns 0 nodes (foreground app changed), treat it as an explicit failure and retry the flow/app state.
- `open <app|url> [url]` can be used within an existing session to switch apps or open deep links.
- `open <app>` updates session app bundle context; `open <app> <url>` opens a deep link on iOS.
- Use `open <app> --relaunch` during React Native/Fast Refresh debugging when you need a fresh app process without ending the session.
- Use `--session <name>` for parallel sessions; avoid device contention.
- Use `--activity <component>` on Android to launch a specific activity (e.g. TV apps with LEANBACK); do not combine with URL opens.
- On iOS devices, `http(s)://` URLs fall back to Safari automatically; custom scheme URLs require an active app in the session.
- iOS physical-device runner requires Xcode signing/provisioning; optional overrides: `AGENT_DEVICE_IOS_TEAM_ID`, `AGENT_DEVICE_IOS_SIGNING_IDENTITY`, `AGENT_DEVICE_IOS_PROVISIONING_PROFILE`.
- Default daemon request timeout is `45000`ms. For slow physical-device setup/build, increase `AGENT_DEVICE_DAEMON_TIMEOUT_MS` (for example `120000`).
- For daemon startup troubleshooting, follow stale metadata hints for `~/.agent-device/daemon.json` / `~/.agent-device/daemon.lock`.
- Use `fill` when you want clear-then-type semantics.
- Use `type` when you want to append/enter text without clearing.
- On Android, prefer `fill` for important fields; it verifies entered text and retries once when IME reorders characters.
- For deterministic replay scripts, run `replay -u` during maintenance runs to repair selector drift, and plain `replay` in CI.
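
The gesture-series rules for `press` in the list above (in particular that `--double-tap` cannot be combined with `--hold-ms` or `--jitter-px`) can be checked before a command is ever sent. This is a sketch only: `press_argv` is a hypothetical helper, and the CLI's own validation remains authoritative.

```python
def press_argv(target, *, count=None, interval_ms=None, hold_ms=None,
               jitter_px=None, double_tap=False):
    """Build a `press` argv, rejecting the documented invalid flag combination."""
    if double_tap and (hold_ms is not None or jitter_px is not None):
        raise ValueError("--double-tap cannot be combined with --hold-ms or --jitter-px")
    if isinstance(target, tuple):
        x, y = target  # coordinate target: (x, y)
        argv = ["agent-device", "press", str(x), str(y)]
    else:
        argv = ["agent-device", "press", target]  # @ref or selector string
    options = (
        ("--count", count),
        ("--interval-ms", interval_ms),
        ("--hold-ms", hold_ms),
        ("--jitter-px", jitter_px),
    )
    for flag, value in options:
        if value is not None:
            argv += [flag, str(value)]
    if double_tap:
        argv.append("--double-tap")
    return argv
```

For example, `press_argv("@e1", count=5, double_tap=True)` builds the repeated double-tap command from the Interactions section, while adding `hold_ms=120` to that call raises `ValueError`.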

## References

- [references/snapshot-refs.md](references/snapshot-refs.md)
- [references/session-management.md](references/session-management.md)
- [references/permissions.md](references/permissions.md)
- [references/video-recording.md](references/video-recording.md)
- [references/coordinate-system.md](references/coordinate-system.md)
- [references/batching.md](references/batching.md)
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
# Batching

## When to use batch

- The agent already knows a short sequence of commands.
- Steps belong to one logical screen flow.
- You want one result object with per-step timing and failure context.

## When not to use batch

- Flows are unrelated and should be retried independently.
- The workflow is highly dynamic and requires replanning after each step.
- You need human approvals between steps.

## CLI patterns

From file:

```bash
agent-device batch --session sim --platform ios --steps-file /tmp/batch-steps.json --json
```

Inline (small payloads only):

```bash
agent-device batch --steps '[{"command":"open","positionals":["settings"]}]'
```

## Step payload contract

```json
[
  { "command": "open", "positionals": ["settings"], "flags": {} },
  { "command": "wait", "positionals": ["label=\"Privacy & Security\"", "3000"], "flags": {} },
  { "command": "click", "positionals": ["label=\"Privacy & Security\""], "flags": {} },
  { "command": "get", "positionals": ["text", "label=\"Tracking\""], "flags": {} }
]
```

Rules:

- `positionals` optional, defaults to `[]`.
- `flags` optional, defaults to `{}`.
- nested `batch` and `replay` are rejected.
- stop-on-first-error is the supported mode (`--on-error stop`).
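
These rules can be mirrored in a client-side pre-check so that malformed payloads fail before they reach the CLI. The sketch below is an assumption-laden illustration: `normalize_steps` is a hypothetical helper, and it only covers the defaults and the nested-command rejection listed above.

```python
# Commands that may not appear inside a batch, per the rules above.
REJECTED_COMMANDS = frozenset({"batch", "replay"})


def normalize_steps(steps):
    """Fill in default `positionals`/`flags` and reject nested batch/replay steps."""
    normalized = []
    for index, step in enumerate(steps):
        command = step.get("command")
        if not isinstance(command, str) or not command:
            raise ValueError(f"step {index}: missing 'command'")
        if command in REJECTED_COMMANDS:
            raise ValueError(f"step {index}: nested '{command}' is rejected")
        normalized.append({
            "command": command,
            "positionals": list(step.get("positionals", [])),  # defaults to []
            "flags": dict(step.get("flags", {})),              # defaults to {}
        })
    return normalized
```

The normalized list can then be written with `json.dump` to the file passed via `--steps-file`.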

## Response handling

Success includes:

- `total`, `executed`, `totalDurationMs`
- `results[]` entries with `step`, `command`, `durationMs`, and optional `data`

Failure includes:

- `details.step`
- `details.command`
- `details.executed`
- `details.partialResults`

Use these fields to replan from the first failing step.
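
A rough sketch of replanning from the failure fields follows. It rests on assumptions the text above does not confirm: that the failure is available as a parsed dict with a `details` object, and that `partialResults` holds exactly one entry per step that completed before the failure. `replan_from_failure` is a hypothetical helper, so check both assumptions against real output before relying on it.

```python
def replan_from_failure(steps, failure):
    """Return the failed command and the steps still to run, starting at the failed one."""
    details = failure["details"]
    # Assumption: one partialResults entry per step that completed successfully.
    completed = len(details.get("partialResults", []))
    return {
        "failed_command": details.get("command"),
        "remaining": steps[completed:],
    }
```

The `remaining` list begins with the step that failed, which is usually where a sync guard or a refined selector needs to be added before retrying.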

## Common error categories and agent actions

- `INVALID_ARGS`: payload/step shape issue; fix payload and retry.
- `SESSION_NOT_FOUND`: open or select the correct session, then retry.
- `UNSUPPORTED_OPERATION`: switch command/target to supported operation.
- `AMBIGUOUS_MATCH`: refine selector/locator, then retry failed step.
- `COMMAND_FAILED`: add sync guard (`wait`, `is exists`) and retry from failed step.
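
An agent can encode the table above as a simple lookup. In this sketch the category names come from the list, while `recovery_action` and the fallback for unknown categories are assumptions added for illustration.

```python
# Category -> suggested next action, taken from the list above.
RECOVERY_ACTIONS = {
    "INVALID_ARGS": "fix payload and retry",
    "SESSION_NOT_FOUND": "open or select the correct session, then retry",
    "UNSUPPORTED_OPERATION": "switch command/target to a supported operation",
    "AMBIGUOUS_MATCH": "refine selector/locator, then retry failed step",
    "COMMAND_FAILED": "add sync guard and retry from failed step",
}


def recovery_action(category):
    """Look up the suggested action; unknown categories fall back to stopping."""
    # Assumption: anything not listed is treated as non-recoverable.
    return RECOVERY_ACTIONS.get(category, "stop and report")
```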

## Reliability guardrails

- Add sync guards after mutating steps.
- Assume snapshot/ref drift after navigation.
- Keep batch size moderate (about 5-20 steps).
- Split long workflows into phases:
  1. navigate
  2. verify/extract
  3. cleanup
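
The size and phase guardrails above can be combined into a small planning helper. This is a minimal sketch: `chunk` and `plan_phases` are hypothetical names, and the default limit of 20 simply reflects the upper end of the suggested 5-20 range.

```python
def chunk(steps, max_steps=20):
    """Split a step list into consecutive batches of at most max_steps."""
    if max_steps < 1:
        raise ValueError("max_steps must be at least 1")
    return [steps[i:i + max_steps] for i in range(0, len(steps), max_steps)]


def plan_phases(navigate, verify, cleanup, max_steps=20):
    """Return (phase, batch) pairs in order: navigate, verify/extract, cleanup."""
    plan = []
    phases = (("navigate", navigate), ("verify/extract", verify), ("cleanup", cleanup))
    for name, steps in phases:
        for batch in chunk(steps, max_steps):
            plan.append((name, batch))
    return plan
```

Each pair can then be submitted as its own `batch` call, so a failure in one phase does not force the earlier phases to be replayed.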
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
# Coordinate System

All coordinate-based actions use device screen coordinates:

- Origin: top-left of the device screen
- Units: device points for iOS, pixels for Android

Use screenshots to reason about coordinates.
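
When a position is read off a screenshot, the unit difference matters. The sketch below assumes the screenshot is captured at native pixel resolution, so iOS positions must be divided by the display scale factor (for example 3.0 on a 3x display) to get points. `to_tap_coords` is a hypothetical helper, and the scale factor has to be supplied by the caller.

```python
def to_tap_coords(px_x, px_y, platform, scale=1.0):
    """Convert screenshot pixel coordinates to the units used by tap commands."""
    if platform == "android":
        return (px_x, px_y)  # Android coordinates are already pixels
    if platform == "ios":
        if scale <= 0:
            raise ValueError("scale must be positive")
        # iOS coordinates are device points: pixels divided by the scale factor.
        return (round(px_x / scale), round(px_y / scale))
    raise ValueError(f"unknown platform: {platform}")
```

For example, pixel (900, 1500) on a 3x iOS display maps to point (300, 500), while the same values pass through unchanged on Android.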
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
# Permissions and Setup

## iOS snapshots

iOS snapshots use XCTest and do not require macOS Accessibility permissions.

## iOS physical device runner

For iOS physical devices, XCTest runner setup requires valid signing/provisioning.
Use Automatic Signing in Xcode, or provide optional overrides:

- `AGENT_DEVICE_IOS_TEAM_ID`
- `AGENT_DEVICE_IOS_SIGNING_IDENTITY`
- `AGENT_DEVICE_IOS_PROVISIONING_PROFILE`

If setup/build takes long, increase:

- `AGENT_DEVICE_DAEMON_TIMEOUT_MS` (default `45000`, for example `120000`)
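
When agent-device is launched from a script, the timeout can be raised for that one process rather than globally. A minimal sketch, where `daemon_env` is a hypothetical helper and only the variable name and example value come from the text above:

```python
import os


def daemon_env(timeout_ms=120000, base=None):
    """Return a copy of the environment with the daemon timeout override set."""
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be positive")
    env = dict(os.environ if base is None else base)  # copy; do not mutate the caller's env
    env["AGENT_DEVICE_DAEMON_TIMEOUT_MS"] = str(timeout_ms)
    return env
```

The result can be passed as `env=daemon_env()` to `subprocess.run` when invoking `agent-device` against a slow physical device.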

If daemon startup fails with stale metadata hints, clean stale files and retry:

- `~/.agent-device/daemon.json`
- `~/.agent-device/daemon.lock`

## iOS: "Allow Paste" dialog

iOS 16+ shows an "Allow Paste" prompt when an app reads the system pasteboard. Under XCUITest (which `agent-device` uses), this prompt is suppressed by the testing runtime. Use `xcrun simctl pbcopy booted` to set clipboard content directly on the simulator instead.

## Simulator troubleshooting

- If snapshots return 0 nodes, restart Simulator and re-open the app.
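
The zero-node recovery can be wrapped in a bounded retry so it is treated as an explicit failure rather than an empty result. In this sketch `take_snapshot` and `recover` are caller-supplied callables (hypothetical: for instance, one that runs `snapshot -i` and parses its nodes, and one that restarts Simulator and re-opens the app); nothing here is part of agent-device itself.

```python
def snapshot_with_recovery(take_snapshot, recover, attempts=2):
    """Return a non-empty snapshot, running `recover` between failed attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        nodes = take_snapshot()
        if nodes:
            return nodes
        if attempt < attempts - 1:
            recover()  # e.g. restart Simulator and re-open the app
    raise RuntimeError(f"snapshot returned 0 nodes after {attempts} attempt(s)")
```

Keeping `attempts` small avoids masking a genuinely broken target behind repeated restarts.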
