Vision coordinate mismatch in Windows 11 due to UI scaling #1199

### Description of the bug:


The Gemini Vision model consistently returns incorrect pixel coordinates for UI elements when Windows 11 display scaling is active (e.g., 125%). The model appears to identify the element correctly, but the coordinates it returns do not map to the actual OS screen coordinates.

Steps to Reproduce:

1. Upload a screenshot of a 1080p desktop with an active window (such as the Windows On-Screen Keyboard).
2. Ask Gemini for the exact x, y coordinates of the "Close" (X) button.
3. Enter these coordinates into an automation tool (Clickermann).

Actual Result:

The click lands far outside the intended target, often shifted significantly to the right or landing on the window border. Manual offset corrections suggested by the AI do not resolve the scaling translation issue.

Visual Proof (TikTok):

I recorded a full session of this "struggle" showing the AI repeatedly missing the target: https://vt.tiktok.com/ZSHcb2YAB/
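
For anyone triaging this, here is a minimal sketch of the two coordinate translations that may be involved. It rests on assumptions not confirmed in this report: that the model's output is on the 0 to 1000 normalized grid that the Gemini docs describe for bounding boxes, and that the automation tool works in DPI-scaled logical units rather than physical pixels. The function names `normalized_to_pixels` and `physical_to_logical` are hypothetical helpers, not part of any Gemini or Clickermann API.

```python
def normalized_to_pixels(x_norm, y_norm, width, height, grid=1000):
    """Map a point on a 0..grid normalized grid to pixel coordinates
    in a width x height screenshot (assumes a 0-1000 grid by default)."""
    return round(x_norm / grid * width), round(y_norm / grid * height)


def physical_to_logical(x, y, scale):
    """Convert physical pixels to DPI-scaled logical units.
    scale is the Windows scaling factor, e.g. 1.25 for 125%."""
    return round(x / scale), round(y / scale)


# Example: centre of a 1920x1080 screenshot on the normalized grid.
print(normalized_to_pixels(500, 500, 1920, 1080))  # (960, 540)

# Example: a physical-pixel point near the top-right corner at 125% scaling.
print(physical_to_logical(1900, 10, 1.25))  # (1520, 8)
```

If the tool is not DPI-aware, Windows would multiply its coordinates by the scaling factor, which could account for clicks drifting to the right. Whether that is what happens here has not been verified.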

### Actual vs expected behavior:


_No response_

### Any other information you'd like to share?


_No response_
