Commit a1d3553

docs: update README and AGENTS.md for accessibility-first find_text
Both macOS and Windows now use the platform accessibility API as the primary search mechanism for find_text, with OCR fallback. Update documentation to reflect this unified behavior and note that macOS accessibility results use semantic names (e.g., "All Clear" not "AC").
1 parent 1c179a0 commit a1d3553

2 files changed

Lines changed: 8 additions & 6 deletions

AGENTS.md

Lines changed: 4 additions & 3 deletions
@@ -61,16 +61,17 @@ Captures pixel data and layout.
 
 #### `find_text`
 Fast-path to get coordinates without image analysis.
-* **Inputs:** `text` (string), `display_id` (number, optional).
+* **Inputs:** `text` (string, case-insensitive substring match against accessibility element names, then OCR), `app_name` (string, optional), `window_id` (number, optional), `display_id` (number, optional).
 * **Returns (JSON array):**
 ```json
 [
 { "text": "Save", "x": 500, "y": 300, "confidence": 1.0, "bounds": { "x": 480, "y": 290, "width": 40, "height": 20 } }
 ]
 ```
 * **Platform behavior:**
-* **Windows:** Uses **UI Automation (UIA)** as the primary mechanism — searches the accessibility tree for elements whose name matches the query. This gives precise element-level coordinates (`confidence: 1.0`). Falls back to OCR automatically if UIA finds no matches.
-* **macOS:** Uses OCR (Vision framework).
+* **Both platforms:** Uses the **platform accessibility API** as the primary mechanism — searches the accessibility tree for elements whose name matches the query. This gives precise element-level coordinates (`confidence: 1.0`). Falls back to OCR automatically if accessibility finds no matches.
+* **macOS:** Accessibility API (primary), Vision OCR (fallback). Note: accessibility results use semantic names (e.g., "All Clear" instead of "AC", "Subtract" instead of "−"), so search by meaning rather than displayed symbol.
+* **Windows:** UI Automation (primary), WinRT OCR (fallback).
 
 ### 2. Input & Interaction (The "Hands")
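
Reviewer note: the behavior documented in the AGENTS.md hunk (case-insensitive substring match on accessibility element names, with OCR used only when that finds nothing) can be sketched roughly as follows. This is not the server's implementation. `Match`, `Backend`, `ax_backend`, and `ocr_backend` are hypothetical stand-ins, and the 0.93 OCR confidence is made-up sample data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Match:
    text: str
    x: int
    y: int
    confidence: float


# Hypothetical backend type: yields every candidate element or text region.
Backend = Callable[[], Iterable[Match]]


def find_text(query: str, accessibility: Backend, ocr: Backend) -> List[Match]:
    """Accessibility-first lookup with OCR fallback (illustrative only)."""
    needle = query.casefold()

    def search(backend: Backend) -> List[Match]:
        # Case-insensitive substring match on the candidate's name or text.
        return [m for m in backend() if needle in m.text.casefold()]

    hits = search(accessibility)
    # OCR is consulted only when the accessibility tree produced no match.
    return hits if hits else search(ocr)


# Sample data, invented for the sketch.
def ax_backend() -> List[Match]:
    return [Match("Save", 500, 300, 1.0), Match("Save As", 560, 300, 1.0)]


def ocr_backend() -> List[Match]:
    return [Match("Cancel", 620, 300, 0.93)]
```

With this sample data, `find_text("save", ax_backend, ocr_backend)` returns both accessibility matches without touching OCR, while `find_text("cancel", ax_backend, ocr_backend)` falls through to the OCR result.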

README.md

Lines changed: 4 additions & 3 deletions
@@ -54,7 +54,7 @@ This MCP server is designed to be **highly discoverable and usable** by AI model
 **Core Capabilities for System Prompts:**
 1. `take_screenshot`: The "eyes". Returns images + layout metadata + text locations (OCR).
 2. `click` / `type_text`: The "hands". Interacts with the system based on visual feedback.
-3. `find_text`: A shortcut to find text on screen and get its coordinates immediately. On Windows, uses **UI Automation** for precise element-level matching, with OCR fallback.
+3. `find_text`: A shortcut to find text on screen and get its coordinates immediately. Uses the platform **accessibility API** (macOS Accessibility / Windows UI Automation) for precise element-level matching, with OCR fallback.
 4. `load_image` / `find_image`: Template matching for non-text UI elements (icons, shapes), returning screen coordinates for clicking.
 
 ## 📦 Installation (macOS + Windows)
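
Reviewer note: since `find_text` is described as a shortcut from text to coordinates, a caller would typically turn the returned array into a click point. The sketch below parses the sample response documented in AGENTS.md. The helper names are hypothetical. In that one sample, `x`/`y` coincide with the center of `bounds`; treating that as a general guarantee is an assumption.

```python
import json
from typing import Optional, Tuple

# Sample response copied from the AGENTS.md example in this commit.
RESPONSE = """[
  { "text": "Save", "x": 500, "y": 300, "confidence": 1.0,
    "bounds": { "x": 480, "y": 290, "width": 40, "height": 20 } }
]"""


def bounds_center(bounds: dict) -> Tuple[int, int]:
    """Center of a bounds rectangle, using integer division."""
    return (bounds["x"] + bounds["width"] // 2,
            bounds["y"] + bounds["height"] // 2)


def best_click_point(response_json: str) -> Optional[Tuple[int, int]]:
    """Pick the highest-confidence match and return its x/y, or None if empty."""
    matches = json.loads(response_json)
    if not matches:
        return None
    best = max(matches, key=lambda m: m["confidence"])
    return (best["x"], best["y"])
```

For the sample above, both `best_click_point(RESPONSE)` and `bounds_center(...)` on the first match give `(500, 300)`.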
@@ -202,11 +202,12 @@ graph TD
 |----|---------|----------|
 | **macOS** | Screenshots | `screencapture` (CLI) |
 | | Input | `CGEvent` (CoreGraphics) |
+| | Text Search (`find_text`) | `Accessibility API` (primary), Vision OCR (fallback) |
 | | OCR | `VNRecognizeTextRequest` (Vision Framework) |
 | **Windows** | Screenshots | `BitBlt` (GDI) |
 | | Input | `SendInput` (Win32) |
+| | Text Search (`find_text`) | `UI Automation` (primary), WinRT OCR (fallback) |
 | | OCR | `Windows.Media.Ocr` (WinRT) |
-| | Text Search (`find_text`) | `UI Automation` (primary), OCR (fallback) |
 
 ### Screenshot Coordinate Precision

@@ -256,7 +257,7 @@ On macOS, you must grant permissions to the **host application** (e.g., Terminal
 
 Works out of the box on **Windows 10/11**.
 * Uses standard Win32 APIs (GDI, SendInput).
-* `find_text` uses **UI Automation (UIA)** as the primary search mechanism, querying the accessibility tree for element names. This is faster and more precise than OCR for standard UI elements (buttons, labels, menus). Falls back to OCR automatically when UIA finds no matches.
+* `find_text` uses **UI Automation (UIA)** as the primary search mechanism, querying the accessibility tree for element names. This is the same accessibility-first approach used on macOS (with the Accessibility API). Falls back to OCR automatically when UIA finds no matches.
 * OCR uses the built-in Windows Media OCR engine (offline).
 * **Note:** Cannot interact with "Run as Administrator" windows unless the MCP server itself is also running as Administrator.
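
Reviewer note: the AGENTS.md change says macOS accessibility results use semantic names, so a caller that only knows the on-screen label may want to try the semantic name first. The sketch below is one possible caller-side helper, not part of the server. Only the two label pairs are taken from the commit; `SEMANTIC_NAMES` and `query_candidates` are hypothetical names.

```python
from typing import Dict, List

# Only these two pairs come from the AGENTS.md note; anything else would
# have to be added per application.
SEMANTIC_NAMES: Dict[str, str] = {
    "AC": "All Clear",
    "−": "Subtract",
}


def query_candidates(label: str) -> List[str]:
    """Queries to try in order: the semantic name (if known), then the raw label."""
    semantic = SEMANTIC_NAMES.get(label)
    return [semantic, label] if semantic else [label]
```

Here `query_candidates("AC")` yields `["All Clear", "AC"]`, so the accessibility name is tried before the displayed symbol, which would otherwise only be found through the OCR fallback.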
