Description of the bug:
The Gemini Vision model consistently returns incorrect pixel coordinates for UI elements when Windows 11 display scaling is active (e.g., 125%). The model appears to identify the element correctly, but the coordinates it returns do not match the actual OS screen coordinates.
Steps to Reproduce:
1. Upload a screenshot of a 1080p desktop with an active window (like the Windows On-Screen Keyboard).
2. Ask Gemini for the exact x, y coordinates of the "Close" (X) button.
3. Input these coordinates into an automation tool (Clickermann).
Actual Result:
The click lands far outside the intended target, often shifted significantly to the right or stuck on the window border. Even when the AI applies manual offset corrections, the scaling translation problem remains.
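One possible explanation, not confirmed by this report, is a mismatch between physical pixels (what the screenshot contains) and logical, DPI-scaled pixels (what some tools use for clicks). The sketch below only illustrates the arithmetic at 125% scaling. The function names `rescale_point` and `logical_size` and the sample point are hypothetical, and which direction of conversion applies depends on how the screenshot was captured and whether the automation tool is DPI-aware.

```python
def rescale_point(x, y, scale, to_physical=True):
    """Convert a point between logical (DPI-scaled) and physical pixels.

    scale is the Windows display scaling as a factor, e.g. 1.25 for 125%.
    """
    if to_physical:
        return round(x * scale), round(y * scale)
    return round(x / scale), round(y / scale)


def logical_size(width, height, scale):
    """Logical desktop size that corresponds to a physical resolution."""
    return round(width / scale), round(height / scale)


# A 1080p desktop at 125% scaling has a smaller logical coordinate space.
print(logical_size(1920, 1080, 1.25))

# Hypothetical point: the same spot expressed in both coordinate spaces.
print(rescale_point(1500, 600, 1.25))
print(rescale_point(1875, 750, 1.25, to_physical=False))
```

If the clicks are consistently off by a factor of 1.25 (or 0.8) in both axes, converting the model's coordinates in one of these directions before passing them to the automation tool may be worth testing.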
Visual Proof (TikTok):
I've recorded a full session of this "struggle" to show how the AI misses the target repeatedly: https://vt.tiktok.com/ZSHcb2YAB/
Actual vs expected behavior:
No response
Any other information you'd like to share?
No response