android-tester

android-tester is a CLI tool for Q/A testing of Android apps on internal development devices.

To reliably run test cases, it uses a vision-based dual-agent loop:

Operator — selects the next UI action from the current screen context.
Reflector — evaluates before/after screenshots and updates progress.

Test instructions are given in natural language. Progress and results are streamed to stdout as JSON lines.

Prerequisites

python >= 3.11
uv
make (optional)
A reachable Android device or emulator (USB or network ADB)

Installation

From the skill directory:

make install

This installs Python dependencies (via uv) and ADB.

The CLI loads configuration from an optional config.env file in the skill root.

Copy config.env.example to get started:

cp config.env.example config.env

Priority (highest wins): CLI arg > env var > config.env > built-in default.

Usage

The CLI has two subcommands:

android-tester run — run a single test instruction on one device.
android-tester batch — run a set of test cases from a JSON document, sharded across multiple devices (one async worker per device).

android-tester run --device-id <DEVICE_ID> --instructions "<TEST_INSTRUCTION>"

android-tester batch --file <CASES_JSON> --devices <DEVICE_ID> [<DEVICE_ID> ...]

Common arguments (both subcommands)

Argument	Required	Default	Description
`-o`, `--output-dir`	no	`./output/<task-id>` (run) / `./output` (batch)	Output directory. In `batch`, each task gets a `<task-id>` subdirectory under this root.
`--sico-endpoint`	no	`SICO_ENDPOINT` env var	Sico platform base URL
`--sico-app-name`	no	`sico`	Sico application name used to construct API paths
`--sico-agent-instance-id`	no	`SICO_AGENT_INSTANCE_ID` env var	Agent instance ID for X-Sico-Context header
`--llmhub-model`	no	`gpt5.4`	LLM model identifier
`--llmhub-model-image-size`	no	—	LLM perceived image size as `WIDTHxHEIGHT` (e.g. `1024x768`). When set, action coordinates are rescaled to match the actual screenshot size
`--model-auto-resize-width`	no	`768`	Target width (in pixels) for downscaling screenshots before sending them to the LLM. Set to `0` to disable. Only takes effect when `--llmhub-model-image-size` is unset.
`--log-level`	no	`WARNING`	Logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`)
`--telemetry` / `--no-telemetry`	no	enabled	Enable or disable telemetry collection
`--reflector` / `--no-reflector`	no	disabled	Enable or disable the reflector step after each action
`--max-no-progress-steps`	no	`6`	Stop after this many steps without progress
`--max-repetitive-actions`	no	`5`	Stop after this many identical consecutive actions
`--n-retries-if-failed`	no	`0`	Re-run the whole pipeline up to this many additional times on failure

`run` arguments

Argument	Required	Default	Description
`--device-id`	yes	—	ADB device serial or `host:port` (e.g. `10.0.0.5:5555`)
`--instructions`	yes	—	Natural-language test instruction to execute
`--task-id`	no	auto-generated UUID	Unique task identifier
`--task-name`	no	—	Human-readable label for the test run
`--device-name`	no	same as `--device-id`	Friendly device name used in logs

`batch` arguments

Argument	Required	Default	Description
`--file`	one of `--file`/`--test-cases`	—	Path to a JSON file with test cases. Use `-` to read JSON from stdin.
`--test-cases`	one of `--file`/`--test-cases`	—	Inline JSON document with the same shape as `--file`.
`--devices`	yes	—	One or more ADB device serials or `host:port` entries. One async worker is spawned per device; cases are pulled from a shared queue.

Test-cases JSON format

{
  "test-cases": [
    {
      "instruction": "Open Settings and enable Wi-Fi",
      "task-name": "Enable Wi-Fi",
      "task-id": "tc-001"
    },
    {
      "instruction": "Open Microsoft Edge and navigate to bing.com"
    }
  ]
}

instruction is required.
task-name and task-id are optional. Missing task-id is filled with an auto-generated UUID.
The root is a dict so future top-level keys can be added without breaking existing files.

Examples

Single run:

android-tester run \
  --device-id emulator-5554 \
  --instructions "Open Settings and enable Wi-Fi" \
  --task-name "Enable Wi-Fi"

Batch from a file across two devices:

android-tester batch \
  --file ./cases.json \
  --devices 127.0.0.1:5557 127.0.0.1:5559

Batch from inline JSON:

android-tester batch \
  --test-cases '{"test-cases":[{"instruction":"Open Settings"}]}' \
  --devices emulator-5554

Batch from stdin:

cat cases.json | android-tester batch --file - --devices emulator-5554

Environment Variables

Configured via config.env in the skill root (see config.env.example):

Variable	Required	Default	Description
`SICO_ENDPOINT`	yes	—	Sico platform base URL
`SICO_AGENT_INSTANCE_ID`	no	—	Agent instance ID for X-Sico-Context header
`LLMHUB_MODEL`	no	`gpt5.4`	Model name for LLM Hub requests
`LLMHUB_MODEL_IMAGE_SIZE`	no	—	LLM perceived image size as `WIDTHxHEIGHT` (e.g. `1024x768`). When set, action coordinates are rescaled to match the actual screenshot size
`MODEL_AUTO_RESIZE_WIDTH`	no	`0`	Target width (in pixels) for downscaling screenshots before sending them to the LLM, preserving aspect ratio. Screenshots narrower than this value are sent unchanged. Required for some models (e.g. GPT-5) to effectively operate device GUIs; a typical value is `768`. Set to `0` to disable. Only takes effect when `LLMHUB_MODEL_IMAGE_SIZE` is unset.
`LOG_LEVEL`	no	`INFO`	Logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`)
`TELEMETRY_ENABLED`	no	`true`	Enable telemetry collection (`true`, `false`)
`REFLECTOR_ENABLED`	no	`false`	Enable reflector step after each action (`true`, `false`)
`MAX_NO_PROGRESS_STEPS`	no	`6`	Maximum consecutive no-progress steps before stopping
`MAX_REPETITIVE_ACTIONS`	no	`5`	Maximum repetitive actions before stopping

Output

Stdout emits one JSON object per line. Every message includes an event field and an ISO-8601 timestamp.

In run mode, all per-step events are written to stdout. In batch mode, only aggregate progress events (task_started, task_finished, task_error) are written to stdout; per-task per-step events are written to <output-dir>/<task-id>/events.jsonl.

All output files are written to the output directory (logged to stdout at startup):

Per-task events file (batch only): <output-dir>/<task-id>/events.jsonl
Per-task error trace (batch only, on exception): <output-dir>/<task-id>/error.txt
Screenshots: step-NNN-1-before.png, step-NNN-2-after.png
Report: report.html — self-contained HTML report with screenshots, execution summary, and per-step logs

Event types

screenshot — captured before/after each action:

{"event": "screenshot", "timestamp": "2026-04-14T09:15:02.123456+00:00", "task_id": "...", "step": 1, "image_path": "logs/step_001_before.png", "description": "before-action"}

operator_log — operator agent output (thought, action, description):

{"event": "operator_log", "timestamp": "2026-04-14T09:15:03.456789+00:00", "task_id": "...", "step": 1, "message": "PressBack()"}

reflector_log — reflector agent output (outcome, progress, next goal):

{"event": "reflector_log", "timestamp": "2026-04-14T09:15:04.789012+00:00", "task_id": "...", "step": 1, "message": "Wi-Fi toggle is now enabled. Progress: 80%"}

task_result — final execution result:

{"event": "task_result", "timestamp": "2026-04-14T09:15:30.345678+00:00", "task_id": "...", "duration": 32.4, "status": "completed", "reason": "Task completed successfully"}

Batch progress events (stdout, `batch` mode only)

task_started — emitted when a worker picks up a case:

{"event": "task_started", "task_id": "...", "task_name": "...", "device_id": "127.0.0.1:5559", "instruction": "...", "log_file": "output/<task-id>/events.jsonl", "completed": 3, "succeeded": 2, "failed": 1, "total": 10, "remaining": 7, "progress": 0.3}

task_finished — emitted when a case finishes (success or failure):

{"event": "task_finished", "task_id": "...", "task_name": "...", "device_id": "127.0.0.1:5559", "status": "completed", "log_file": "output/<task-id>/events.jsonl", "completed": 4, "succeeded": 3, "failed": 1, "total": 10, "remaining": 6, "progress": 0.4}

task_error — emitted when a case raises an unhandled exception. The full traceback is also written to <output-dir>/<task-id>/error.txt:

{"event": "task_error", "task_id": "...", "task_name": "...", "device_id": "127.0.0.1:5559", "log_file": "output/<task-id>/events.jsonl", "error_file": "output/<task-id>/error.txt", "error": "...", "error_type": "ADBCommandError"}

Exit Codes

Code	Meaning
`0`	All test cases completed successfully
`1`	At least one test case failed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

android-tester

Prerequisites

Installation

Usage

Common arguments (both subcommands)

`run` arguments

`batch` arguments

Test-cases JSON format

Examples

Environment Variables

Output

Event types

Batch progress events (stdout, `batch` mode only)

Exit Codes

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

android-tester

Prerequisites

Installation

Usage

Common arguments (both subcommands)

run arguments

batch arguments

Test-cases JSON format

Examples

Environment Variables

Output

Event types

Batch progress events (stdout, batch mode only)

Exit Codes

`run` arguments

`batch` arguments

Batch progress events (stdout, `batch` mode only)