Use this flow when the target is not text: icons, glyphs, toggles, shapes, or custom controls.
- OCR cannot identify the target
- `find_text` is not applicable
- You have a reference image for the element you want to click

Steps:

- Take a screenshot of the app window
- Load the reference image with `load_image`
- Search the screenshot with `find_image`
- Click the returned `screen_x` and `screen_y`

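
The four steps can be chained in one helper. What follows is a minimal sketch, not an official client: `call_tool` is a hypothetical callable that sends a tool name plus an arguments dict and returns the parsed JSON result, and `min_score` is an illustrative threshold rather than a documented default. It also assumes `take_screenshot` returns `screenshot_id` at the top level of its result.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: takes a tool name and an arguments dict,
# returns the tool's parsed JSON result as a dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def click_by_image(call_tool: ToolCaller, app_name: str, template_path: str,
                   min_score: float = 0.9) -> Optional[Dict[str, Any]]:
    """Run take_screenshot -> load_image -> find_image -> click.

    Returns the match that was clicked, or None if no match scored
    at least min_score (an assumed threshold, not a tool default).
    """
    shot = call_tool("take_screenshot", {"app_name": app_name})
    template = call_tool("load_image", {"path": template_path})

    found = call_tool("find_image", {
        "screenshot_id": shot["screenshot_id"],
        "template_id": template["image_id"],
    })

    candidates = [m for m in found.get("matches", []) if m["score"] >= min_score]
    if not candidates:
        return None

    best = max(candidates, key=lambda m: m["score"])
    # Click screen_x / screen_y, as the steps specify (not the "center" values).
    call_tool("click", {"x": best["screen_x"], "y": best["screen_y"]})
    return best
```

Returning `None` instead of clicking a weak match keeps a false positive from triggering an unintended click; the caller can then decide whether to retry or report the miss.
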
Take a screenshot:
{
  "tool": "take_screenshot",
  "arguments": {
    "app_name": "MyApp"
  }
}

The screenshot metadata includes `screenshot_id`. Use that value in the next step.

Load a template image:
{
  "tool": "load_image",
  "arguments": {
    "path": "/path/to/icon.png"
  }
}

Example result:
{
  "image_id": "image-0",
  "width": 64,
  "height": 64,
  "channels": 4,
  "mime": "image/png"
}

Find the image:
{
  "tool": "find_image",
  "arguments": {
    "screenshot_id": "screenshot-0",
    "template_id": "image-0"
  }
}

Example result:
{
  "matches": [
    {
      "score": 0.95,
      "center": { "x": 132, "y": 232 },
      "screen_x": 166,
      "screen_y": 216
    }
  ]
}

Click the match:
{
  "tool": "click",
  "arguments": {
    "x": 166,
    "y": 216
  }
}

- Start with the default `fast` mode
- Use `accurate` only when you need a wider or more precise search
- Keep the template image tightly cropped around the visual element
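
One way to apply the mode guidance is to search in fast mode first and retry in accurate mode only on a miss. The sketch below rests on two assumptions that this section does not confirm: that `find_image` takes the mode as an argument named `mode`, and that the accepted values are `"fast"` and `"accurate"`. Check the tool's schema before relying on either. `call_tool` is a hypothetical callable that sends a tool name and arguments and returns the parsed JSON result.

```python
from typing import Any, Callable, Dict, List

# Hypothetical transport: (tool name, arguments) -> parsed JSON result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def find_with_fallback(call_tool: ToolCaller, screenshot_id: str,
                       template_id: str) -> List[Dict[str, Any]]:
    """Search in fast mode first; retry in accurate mode only on a miss."""
    base_args = {"screenshot_id": screenshot_id, "template_id": template_id}
    for mode in ("fast", "accurate"):
        # ASSUMPTION: the mode is passed as a "mode" argument with these values.
        result = call_tool("find_image", {**base_args, "mode": mode})
        matches = result.get("matches", [])
        if matches:
            return matches
    return []
```

If both modes return nothing, the template itself is worth checking: a loosely cropped reference image is a likely cause of misses.
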