38 changes: 19 additions & 19 deletions README.md
@@ -1,6 +1,6 @@
# Sid: A Token-Efficient CLI for iOS Automation
# Pippin: A Token-Efficient CLI for iOS Automation

**Sid** (Simulator Driver) is a command-line interface designed to bridge the gap between Large Language Models (LLMs) and the iOS Simulator. It provides a set of stateless, atomic commands to inspect, interact with, and verify the state of iOS applications running in the Simulator.
**Pippin** (Simulator Driver) is a command-line interface designed to bridge the gap between Large Language Models (LLMs) and the iOS Simulator. It provides a set of stateless, atomic commands to inspect, interact with, and verify the state of iOS applications running in the Simulator.

## Features

@@ -10,86 +10,86 @@

## Installation

You can run Sid directly using `uvx` (recommended):
You can run Pippin directly using `uvx` (recommended):

```bash
uvx sid --help
uvx pippin --help
```

Or install it via pip:

```bash
pip install sid
pip install pippin
```

*Note: You must have `idb` (iOS Development Bridge) and Xcode command-line tools installed and configured on your machine.*

## Usage

Sid commands follow the structure: `sid [command] [subcommand] [flags]`
Pippin commands follow the structure: `pippin [command] [subcommand] [flags]`

### Vision (Seeing the Screen)

* **Inspect UI:** Get a JSON representation of the current screen.
```bash
sid inspect --interactive-only
pippin inspect --interactive-only
```
* **Take Screenshot:** Capture the visual state.
```bash
sid screenshot output.png
pippin screenshot output.png
```

### Interaction (Acting on the App)

* **Tap Element:** Tap by accessibility identifier or label text.
```bash
sid tap "Log In"
pippin tap "Log In"
```
* **Type Text:** Enter text into the focused field.
```bash
sid type "user@example.com" --submit
pippin type "user@example.com" --submit
```
* **Scroll:** Scroll in a direction, optionally until an element is found.
```bash
sid scroll down --until-visible "Submit"
pippin scroll down --until-visible "Submit"
```
* **Gestures:** Perform swipes.
```bash
sid gesture swipe 100,200 100,400
pippin gesture swipe 100,200 100,400
```

### System (Controlling the Environment)

* **Launch App:** Launch an app by Bundle ID.
```bash
sid launch com.example.myapp --clean
pippin launch com.example.myapp --clean
```
* **Open URL:** Open a deep link.
```bash
sid open "myapp://settings"
pippin open "myapp://settings"
```
* **Permissions:** Manage privacy permissions.
```bash
sid permission camera grant
pippin permission camera grant
```
* **Location:** Set simulated GPS coordinates.
```bash
sid location 37.7749 -122.4194
pippin location 37.7749 -122.4194
```

### Verification (Checking State)

* **Assert:** Verify UI state (exists, visible, hidden, text matches).
```bash
sid assert "Welcome Message" visible
pippin assert "Welcome Message" visible
```
* **Logs:** Fetch recent app logs.
```bash
sid logs --crash-report
pippin logs --crash-report
```
* **File Tree:** List files in the app's sandbox.
```bash
sid tree documents
pippin tree documents
```

## Contributing
66 changes: 33 additions & 33 deletions SPEC.md
@@ -1,16 +1,16 @@
# Sid: A Token-Efficient CLI for iOS Automation
# Pippin: A Token-Efficient CLI for iOS Automation

## 1. Philosophy & Goals
**Sid** (Simulator Driver) is a command-line interface designed to bridge the gap between Large Language Models (LLMs) and the iOS Simulator.
**Pippin** (Simulator Driver) is a command-line interface designed to bridge the gap between Large Language Models (LLMs) and the iOS Simulator.

* **Token Efficiency (The "Narrow Context" Principle):** Sid’s primary output is a simplified, text-based JSON representation of the UI. This allows LLMs to "see" the screen using minimal tokens, avoiding the high cost and latency of processing raw screenshots.
* **Stateless Atomic Actions:** Each command is independent. Sid does not maintain a complex session, making it easier for an Agent to reason about the state at any given step.
* **Native Wrapper:** Under the hood, Sid orchestrates `xcrun simctl` (for system tasks) and `idb` (for deep accessibility inspection).
* **Token Efficiency (The "Narrow Context" Principle):** Pippin’s primary output is a simplified, text-based JSON representation of the UI. This allows LLMs to "see" the screen using minimal tokens, avoiding the high cost and latency of processing raw screenshots.
* **Stateless Atomic Actions:** Each command is independent. Pippin does not maintain a complex session, making it easier for an Agent to reason about the state at any given step.
* **Native Wrapper:** Under the hood, Pippin orchestrates `xcrun simctl` (for system tasks) and `idb` (for deep accessibility inspection).

---

## 2. Architecture
* **Interface:** `sid [command] [subcommand] [flags]`
* **Interface:** `pippin [command] [subcommand] [flags]`
* **Output Format:** Standard JSON (for machine parsing) or human-readable text.
* **Error Handling:** Returns strictly formatted error codes and descriptive messages to help the LLM self-correct (e.g., `ERR_ELEMENT_NOT_FOUND`, `ERR_APP_CRASHED`).

@@ -21,7 +21,7 @@
### 3.1. Vision (The "See" Commands)
*These commands generate the context for the LLM to understand the current state.*

#### `sid inspect`
#### `pippin inspect`
Returns a simplified JSON tree of the current screen's accessibility hierarchy.
* **Flag:** `--interactive-only` (Default: `true`). Filters out structural containers (`Window`, `Other`) and keeps actionable elements (`Button`, `TextField`, `Cell`, `Switch`, `StaticText`).
* **Flag:** `--depth [n]`. Limits the hierarchy depth to save tokens.
@@ -37,7 +37,7 @@ Returns a simplified JSON tree of the current screen's accessibility hierarchy.
}
```

#### `sid screenshot`
#### `pippin screenshot`
Captures the visual state for verification or multimodal fallback.
* **Args:** `[filename]`
* **Flag:** `--mask-text` (Optional). Redacts text for privacy/security before saving.
@@ -47,57 +47,57 @@ Captures the visual state for verification or multimodal fallback.
### 3.2. Interaction (The "Act" Commands)
*Direct manipulation of the app UI.*

#### `sid tap`
#### `pippin tap`
Taps a UI element.
* **Targeting Logic:** Accepts a string query.
    1. **Exact Match:** Accessibility Identifier.
    2. **Fuzzy Match:** Label text (e.g., "Login" matches "Log In").
    3. **Coordinate Fallback:** `--x [num] --y [num]`.
* **Example:** `sid tap "Sign Up"`
* **Example:** `pippin tap "Sign Up"`
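
The targeting order above could be sketched roughly as follows. This is an illustration only, not Pippin's actual implementation: the `resolve_target` helper, the `identifier` and `label` keys on each element dict, and the 0.8 similarity cutoff are all assumptions made for the example, and the fuzzy step uses the standard library's `difflib` rather than whatever matcher the tool really uses.

```python
import difflib


def _normalize(text):
    # Lowercase and drop whitespace so "Log In" and "Login" compare equal.
    return "".join(text.lower().split())


def resolve_target(elements, query, cutoff=0.8):
    """Return the best element for `query`, or None if nothing matches.

    `elements` is a hypothetical list of dicts with optional
    "identifier" and "label" keys; the real inspect output may differ.
    """
    # 1. Exact match on the accessibility identifier.
    for element in elements:
        if element.get("identifier") == query:
            return element

    # 2. Fuzzy match on the label text; keep the best score above the cutoff.
    wanted = _normalize(query)
    best, best_score = None, 0.0
    for element in elements:
        label = element.get("label")
        if not label:
            continue
        score = difflib.SequenceMatcher(None, wanted, _normalize(label)).ratio()
        if score >= cutoff and score > best_score:
            best, best_score = element, score

    # 3. No match: the caller can fall back to explicit --x/--y coordinates.
    return best
```

Trying the identifier first keeps the common case deterministic; the fuzzy step only runs when no identifier matches.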

#### `sid type`
#### `pippin type`
Inputs text into the currently focused field.
* **Args:** `[text_string]`
* **Flag:** `--submit` (Default: `false`). Hits "Return/Enter" on the keyboard after typing.
* **Example:** `sid type "user@example.com" --submit`
* **Example:** `pippin type "user@example.com" --submit`

#### `sid scroll`
#### `pippin scroll`
* **Args:** `[direction]` (`up`, `down`, `left`, `right`).
* **Flag:** `--until-visible [element_label]`. A specialized loop that scrolls until a specific element appears in the `inspect` tree.
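
A minimal sketch of what the `--until-visible` loop might look like. The `inspect` and `scroll` callables are hypothetical stand-ins for the real commands, and `max_scrolls` is an assumed safety limit that the spec does not mention.

```python
def scroll_until_visible(inspect, scroll, label, direction="down", max_scrolls=10):
    """Scroll until an element labelled `label` shows up in the inspect tree.

    `inspect` returns a list of element dicts and `scroll` performs one
    scroll step; both are hypothetical callables standing in for the real
    commands. `max_scrolls` is an assumed safety limit, not part of the spec.
    """
    for attempt in range(max_scrolls + 1):
        # Re-inspect before every scroll so an already-visible element costs nothing.
        if any(element.get("label") == label for element in inspect()):
            return attempt  # number of scrolls that were needed
        if attempt < max_scrolls:
            scroll(direction)
    return None  # element never appeared
```

Bounding the loop matters for an agent: without a limit, a misspelled label would scroll forever instead of returning a failure the LLM can react to.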

#### `sid gesture`
* **Swipe:** `sid gesture swipe [start_x],[start_y] [end_x],[end_y]`
* **Pinch:** `sid gesture pinch [in|out]`
#### `pippin gesture`
* **Swipe:** `pippin gesture swipe [start_x],[start_y] [end_x],[end_y]`
* **Pinch:** `pippin gesture pinch [in|out]`

---

### 3.3. System & Environment (The "God Mode")
*Developers need to test how the app behaves under different system conditions.*

#### `sid launch`
#### `pippin launch`
* **Args:** `[bundle_id]`
* **Flag:** `--clean`. Wipes the app container (simulates a fresh install).
* **Flag:** `--args "[key]=[value]"`. Passes Launch Arguments (e.g., `-TakingScreenshots YES`).
* **Flag:** `--locale [code]`. Launches the app in a specific language (e.g., `es-MX`).

#### `sid open`
#### `pippin open`
Opens a URL scheme or Universal Link to test routing.
* **Args:** `[url]`
* **Example:** `sid open "myapp://settings/profile?edit=true"`
* **Example:** `pippin open "myapp://settings/profile?edit=true"`

#### `sid permission`
#### `pippin permission`
Manages TCC (Privacy) permissions to test "Happy Path" vs. "Denied Path".
* **Args:** `[service] [status]`
* **Services:** `camera`, `photos`, `location`, `microphone`, `contacts`, `calendar`.
* **Status:** `grant`, `deny`, `reset`.
* **Example:** `sid permission camera deny`
* **Example:** `pippin permission camera deny`

#### `sid location`
#### `pippin location`
Simulates GPS coordinates.
* **Args:** `[lat] [lon]`
* **Example:** `sid location 37.7749 -122.4194` (San Francisco)
* **Example:** `pippin location 37.7749 -122.4194` (San Francisco)

#### `sid network` (Advanced)
#### `pippin network` (Advanced)
* **Args:** `[condition]`
* **Options:** `wifi`, `cellular`, `offline`.

@@ -106,18 +106,18 @@ Simulates GPS coordinates.
### 3.4. Verification & Debugging (The "Check" Commands)
*Tools for the LLM to verify success or diagnose failure.*

#### `sid assert`
#### `pippin assert`
Quick boolean check for LLM usage.
* **Args:** `[element_query] [state]`
* **States:** `exists`, `visible`, `hidden`, `text=[value]`.
* **Output:** `PASS` or `FAIL: Element found but text was 'Cancel', expected 'Submit'`.
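
The state checks could be evaluated along these lines. This is a sketch under assumptions: the `check_state` helper and the `visible` and `text` keys are hypothetical, and only the text-mismatch message is taken from the spec; the other failure messages are invented for illustration.

```python
def check_state(element, state):
    """Evaluate one assertion and return "PASS" or a "FAIL: ..." message.

    `element` is a hypothetical dict with "visible" and "text" keys, or
    None when the query matched nothing.
    """
    if state == "exists":
        return "PASS" if element is not None else "FAIL: Element not found"
    if state == "hidden":
        # Assumption: a missing element also counts as hidden.
        hidden = element is None or not element.get("visible", False)
        return "PASS" if hidden else "FAIL: Element is visible"
    if element is None:
        return "FAIL: Element not found"
    if state == "visible":
        if element.get("visible", False):
            return "PASS"
        return "FAIL: Element found but not visible"
    if state.startswith("text="):
        expected = state[len("text="):]
        actual = element.get("text", "")
        if actual == expected:
            return "PASS"
        return f"FAIL: Element found but text was '{actual}', expected '{expected}'"
    raise ValueError(f"Unknown state: {state}")
```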

#### `sid logs`
#### `pippin logs`
Fetches the tail of the system log for the target app.
* **Flag:** `--crash-report`. Checks if a crash log was generated in the last session and outputs the stack trace.
* **Use Case:** "The app closed unexpectedly. Why?"

#### `sid tree`
#### `pippin tree`
Lists files in the app's sandbox.
* **Args:** `[directory]` (`documents`, `caches`, `tmp`).
* **Use Case:** Verifying that a file download or database export actually occurred.
@@ -128,11 +128,11 @@ Lists files in the app's sandbox.

**Objective:** "Verify that the app handles denied Camera permissions gracefully."

1. `sid launch com.myapp.beta --clean`
2. `sid inspect` -> Finds "Start Scan" button.
3. `sid permission camera deny` (Pre-emptively deny permission).
4. `sid tap "Start Scan"`
5. `sid inspect`
1. `pippin launch com.myapp.beta --clean`
2. `pippin inspect` -> Finds "Start Scan" button.
3. `pippin permission camera deny` (Pre-emptively deny permission).
4. `pippin tap "Start Scan"`
5. `pippin inspect`
    * **Agent Logic:** Looks for an alert with text "Camera Permission Needed" or "Open Settings".
        * **If found:** `sid assert "Open Settings" visible` -> Returns `PASS`.
        * **If found:** `pippin assert "Open Settings" visible` -> Returns `PASS`.
        * **If not found:** Agent marks test as `FAILED` (App likely stalled or crashed).
12 changes: 6 additions & 6 deletions docs/improvements/01-hierarchy.md
@@ -1,8 +1,8 @@
# 01: Preserve UI Hierarchy in Inspect Output

**Impact:** Critical — this is the single biggest reason AI agents struggle with Sid.
**Impact:** Critical — this is the single biggest reason AI agents struggle with Pippin.
**Effort:** Medium
**Files:** `sid/utils/ui.py`, `sid/commands/vision.py`
**Files:** `pippin/utils/ui.py`, `pippin/commands/vision.py`

## Problem

@@ -108,10 +108,10 @@ def simplify_node(node, interactive_only=False, depth=None, current_depth=0):
### 3. Update `inspect_cmd` to use hierarchy by default

```
sid inspect → hierarchical output (new default)
sid inspect --flat → current flat behavior (backward compat)
sid inspect --all → hierarchical, no filtering
sid inspect --depth 3 → limit nesting depth
pippin inspect → hierarchical output (new default)
pippin inspect --flat → current flat behavior (backward compat)
pippin inspect --all → hierarchical, no filtering
pippin inspect --depth 3 → limit nesting depth
```

### Example: Hierarchical Output
22 changes: 11 additions & 11 deletions docs/improvements/02-subcommand-help.md
@@ -2,20 +2,20 @@

**Impact:** High — AI agents (and humans) cannot discover argument syntax.
**Effort:** Small
**Files:** `sid/main.py`
**Files:** `pippin/main.py`

## Problem

Lines 10-50 of `main.py` intercept `-h` / `--help` before argparse processes the subcommand:

```python
if "-h" in sys.argv or "--help" in sys.argv:
    print("Sid: A CLI for iOS Automation")
    print("Pippin: A CLI for iOS Automation")
    print("""...""")
    sys.exit(0)
```

This means `sid tap --help`, `sid inspect -h`, `sid launch --help` all print the same top-level overview. The per-subcommand parsers have detailed argument definitions (e.g., `--interactive-only`, `--depth`, `--submit`, `--until-visible`) but they're completely invisible.
This means `pippin tap --help`, `pippin inspect -h`, `pippin launch --help` all print the same top-level overview. The per-subcommand parsers have detailed argument definitions (e.g., `--interactive-only`, `--depth`, `--submit`, `--until-visible`) but they're completely invisible.

## Proposed Fix

@@ -33,7 +33,7 @@ Remove the manual help interception entirely and let argparse handle it natively

```python
DESCRIPTION = """\
Sid: A Token-Efficient CLI for iOS Automation
Pippin: A Token-Efficient CLI for iOS Automation

Vision:
inspect Inspect UI hierarchy and return a simplified JSON tree
@@ -66,7 +66,7 @@ Utils:
parser = argparse.ArgumentParser(
    description=DESCRIPTION,
    formatter_class=argparse.RawDescriptionHelpFormatter,
    usage="sid [command] [options]",
    usage="pippin [command] [options]",
)
```

@@ -75,10 +75,10 @@ parser = argparse.ArgumentParser(
After this change:

```
$ sid --help → Shows the grouped overview + global options
$ sid tap --help → Shows: "Tap a UI element" + args/flags for tap
$ sid inspect --help → Shows: --interactive-only, --all, --depth flags
$ sid launch --help → Shows: bundle_id, --clean, --args, --locale
$ pippin --help → Shows the grouped overview + global options
$ pippin tap --help → Shows: "Tap a UI element" + args/flags for tap
$ pippin inspect --help → Shows: --interactive-only, --all, --depth flags
$ pippin launch --help → Shows: bundle_id, --clean, --args, --locale
```

### Also fix the `except SystemExit` block
@@ -93,6 +93,6 @@ No try/except needed — argparse will print help and exit cleanly on its own.

## Testing

- Verify `sid -h` still shows the grouped overview.
- Verify `sid <subcommand> -h` shows per-subcommand args.
- Verify `pippin -h` still shows the grouped overview.
- Verify `pippin <subcommand> -h` shows per-subcommand args.
- Verify that invalid arguments produce useful error messages (argparse does this by default).
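
The first two checks could be automated with something like the sketch below. It builds a cut-down parser, not the real one from `pippin/main.py`: `build_parser`, `help_text`, and the two subcommands shown are assumptions for illustration. The `try/except SystemExit` appears only in the test helper, to capture the output argparse prints before it exits; the CLI itself would not need it.

```python
import argparse
import contextlib
import io


def build_parser():
    # Top-level parser; no manual scanning of sys.argv for -h/--help.
    parser = argparse.ArgumentParser(
        prog="pippin",
        description="Pippin: A Token-Efficient CLI for iOS Automation",
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    subparsers = parser.add_subparsers(dest="command")

    # Each subcommand gets its own parser, and therefore its own --help.
    tap = subparsers.add_parser("tap", description="Tap a UI element")
    tap.add_argument("query", nargs="?", help="accessibility identifier or label")
    tap.add_argument("--x", type=int, help="x coordinate fallback")
    tap.add_argument("--y", type=int, help="y coordinate fallback")

    inspect_cmd = subparsers.add_parser("inspect", description="Inspect the UI hierarchy")
    inspect_cmd.add_argument("--depth", type=int, help="limit nesting depth")
    return parser


def help_text(argv):
    """Capture what argparse prints for `argv` (it exits after printing help)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        try:
            build_parser().parse_args(argv)
        except SystemExit:
            pass  # argparse exits after printing help; swallow it for the test
    return buffer.getvalue()
```

With this in place, `help_text(["tap", "--help"])` should mention `--x` and `--y` but not `--depth`, which is exactly the per-subcommand behavior the manual interception was hiding.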