docs: update README and AGENTS.md for accessibility-first find_text
Both macOS and Windows now use the platform accessibility API as the
primary search mechanism for find_text, with OCR fallback. Update
documentation to reflect this unified behavior and note that macOS
accessibility results use semantic names (e.g., "All Clear" not "AC").
- * **Windows:** Uses **UI Automation (UIA)** as the primary mechanism, searching the accessibility tree for elements whose name matches the query. This gives precise element-level coordinates (`confidence: 1.0`). Falls back to OCR automatically if UIA finds no matches.
- * **macOS:** Uses OCR (Vision framework).
+ * **Both platforms:** Uses the **platform accessibility API** as the primary mechanism, searching the accessibility tree for elements whose name matches the query. This gives precise element-level coordinates (`confidence: 1.0`). Falls back to OCR automatically if accessibility finds no matches.
+ * **macOS:** Accessibility API (primary), Vision OCR (fallback). Note: accessibility results use semantic names (e.g., "All Clear" instead of "AC", "Subtract" instead of "−"), so search by meaning rather than displayed symbol.
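The note about semantic names suggests that a caller may want to translate a displayed symbol into its accessibility name before calling `find_text`. The following is a minimal sketch of that idea, not part of this commit: the `SEMANTIC_NAMES` table and the `candidate_queries` helper are hypothetical, and only the two pairs quoted above ("AC" and "−") come from the documentation. A real application may expose different names.

```python
# Hypothetical lookup from displayed symbol to semantic accessibility name.
# Only these two pairs are taken from the documentation text; anything else
# would have to be discovered by inspecting the application's accessibility tree.
SEMANTIC_NAMES = {
    "AC": "All Clear",
    "−": "Subtract",
}


def candidate_queries(label: str) -> list[str]:
    """Return queries to try in order: semantic name first, then the displayed label."""
    semantic = SEMANTIC_NAMES.get(label)
    if semantic is None:
        return [label]
    return [semantic, label]
```

Trying the semantic name first matches the accessibility-first behaviour described above; the displayed label stays in the list as a second query for cases where the element is not exposed under the assumed name.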
README.md (4 additions & 3 deletions)
@@ -54,7 +54,7 @@ This MCP server is designed to be **highly discoverable and usable** by AI model
**Core Capabilities for System Prompts:**
1. `take_screenshot`: The "eyes". Returns images + layout metadata + text locations (OCR).
2. `click` / `type_text`: The "hands". Interacts with the system based on visual feedback.
- 3. `find_text`: A shortcut to find text on screen and get its coordinates immediately. On Windows, uses **UI Automation** for precise element-level matching, with OCR fallback.
+ 3. `find_text`: A shortcut to find text on screen and get its coordinates immediately. Uses the platform **accessibility API** (macOS Accessibility / Windows UI Automation) for precise element-level matching, with OCR fallback.
4. `load_image` / `find_image`: Template matching for non-text UI elements (icons, shapes), returning screen coordinates for clicking.
|| Text Search (`find_text`) | `UI Automation` (primary), OCR (fallback) |
### Screenshot Coordinate Precision
@@ -256,7 +257,7 @@ On macOS, you must grant permissions to the **host application** (e.g., Terminal
Works out of the box on **Windows 10/11**.
* Uses standard Win32 APIs (GDI, SendInput).
- * `find_text` uses **UI Automation (UIA)** as the primary search mechanism, querying the accessibility tree for element names. This is faster and more precise than OCR for standard UI elements (buttons, labels, menus). Falls back to OCR automatically when UIA finds no matches.
+ * `find_text` uses **UI Automation (UIA)** as the primary search mechanism, querying the accessibility tree for element names. This is the same accessibility-first approach used on macOS (with the Accessibility API). Falls back to OCR automatically when UIA finds no matches.
* OCR uses the built-in Windows Media OCR engine (offline).
* **Note:** Cannot interact with "Run as Administrator" windows unless the MCP server itself is also running as Administrator.
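The accessibility-first behaviour with OCR fallback described in these docs can be pictured as a simple two-stage lookup. The sketch below is illustrative only and does not reflect the server's actual implementation: the `Match` type, the `find_text` signature, and the two search callables are assumptions introduced for the example, and the real platform calls (UI Automation, macOS Accessibility, OCR) are left to whatever the caller passes in.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Match:
    """One hit for a text query (hypothetical shape, for illustration only)."""
    text: str
    x: int
    y: int
    confidence: float  # the docs state accessibility hits report confidence 1.0
    source: str        # "accessibility" or "ocr"


# A searcher takes a query string and returns zero or more matches.
Searcher = Callable[[str], List[Match]]


def find_text(query: str, accessibility_search: Searcher, ocr_search: Searcher) -> List[Match]:
    """Try the accessibility tree first; fall back to OCR only when it finds nothing."""
    matches = accessibility_search(query)
    if matches:
        return matches
    return ocr_search(query)
```

Passing the two searchers in as parameters keeps the fallback logic independent of the platform, which mirrors the unified behaviour the updated documentation describes for macOS and Windows.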