[Bug]: Click prediction marker is offset from actual cursor position on macOS Tahoe 26 #1876

@e-ndorfin

Version

v0.3.0

Model

UI-TARS-1.5-7B

Deployment Method

Local

Issue Description

Summary

On macOS Tahoe 26, the red click prediction marker shown by UI-TARS Desktop can appear offset from where the cursor actually moves/clicks.

The action execution path appears to be separate from the code that positions the visual marker. In practice, this makes it look as if the model or operator clicked the wrong place, even when the cursor actually moved to a different location than the marker suggests.

Environment

  • OS: macOS Tahoe 26.1
  • Build: 25B78
  • App: UI-TARS Desktop
  • Operator: Local Computer Operator

What I Observed

When the VLM returns a click action, the app shows a red prediction marker on screen. On my machine, that marker is visibly offset from where the cursor actually moves.

This makes it difficult to tell whether a bad click came from:

  • the VLM predicting the wrong coordinate
  • the operator executing the coordinate incorrectly
  • the visual prediction marker being rendered in the wrong coordinate space
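One way to separate these cases would be to log all three positions in the same coordinate space and compare them. The following is only a minimal sketch: `Point`, `classifyDivergence`, and the 2-pixel tolerance are hypothetical names and values for illustration, not anything taken from the UI-TARS codebase.

```typescript
// Hypothetical diagnostic helper; not part of UI-TARS.
interface Point {
  x: number;
  y: number;
}

type Divergence = 'consistent' | 'operator' | 'marker' | 'operator-and-marker';

// Euclidean distance between two points in the same coordinate space.
function distance(a: Point, b: Point): number {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

// Compare where the model said to click (predicted), where the cursor
// ended up (executed), and where the marker was drawn (marker).
// All three points are assumed to already be in the same coordinate space.
function classifyDivergence(
  predicted: Point,
  executed: Point,
  marker: Point,
  tolerancePx: number = 2,
): Divergence {
  const operatorOff = distance(predicted, executed) > tolerancePx;
  const markerOff = distance(predicted, marker) > tolerancePx;
  if (operatorOff && markerOff) return 'operator-and-marker';
  if (operatorOff) return 'operator';
  if (markerOff) return 'marker';
  return 'consistent';
}
```

If this reports `'consistent'` and the click is still wrong, the remaining suspect is the coordinate predicted by the VLM itself.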

Expected Behavior

The red prediction marker should be centered on the same screen location where the operator moves/clicks the cursor.

Actual Behavior

The red prediction marker appears offset from the actual cursor position.

Suspected Area

This may be related to device pixel ratio (DPR) handling or coordinate-space conversion for the prediction overlay.

The marker path appears to go through:

  • apps/ui-tars/src/main/shared/setOfMarks.ts
  • apps/ui-tars/src/main/window/ScreenMarker.ts

The execution path appears to go through:

  • packages/ui-tars/action-parser/src/actionParser.ts
  • packages/ui-tars/sdk/src/utils.ts
  • packages/ui-tars/operators/nut-js/src/index.ts
  • apps/ui-tars/src/main/agent/operator.ts

The issue may be that the marker overlay and the actual cursor execution are not using the same coordinate space on macOS Tahoe 26 / Retina-style displays.
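To illustrate the kind of mismatch I suspect, here is a minimal sketch of converting between physical screenshot pixels and logical points. The helper names and the scale factor of 2 are assumptions for illustration; I have not confirmed that either path in the app performs the conversion this way.

```typescript
// Hypothetical helpers; not taken from the UI-TARS codebase.
interface Point {
  x: number;
  y: number;
}

// Convert a point in physical (screenshot) pixels to logical points.
function physicalToLogical(p: Point, scaleFactor: number): Point {
  if (!(scaleFactor > 0)) {
    throw new RangeError('scaleFactor must be positive');
  }
  return { x: p.x / scaleFactor, y: p.y / scaleFactor };
}

// Convert a point in logical points back to physical pixels.
function logicalToPhysical(p: Point, scaleFactor: number): Point {
  if (!(scaleFactor > 0)) {
    throw new RangeError('scaleFactor must be positive');
  }
  return { x: p.x * scaleFactor, y: p.y * scaleFactor };
}

// Example with an assumed Retina-style scale factor of 2:
// a target at (1600, 900) in screenshot pixels is (800, 450) in logical points.
const target = physicalToLogical({ x: 1600, y: 900 }, 2);
```

If one path applied this conversion and the other did not, the marker and the cursor would diverge by an amount that grows with distance from the screen origin, which would be worth checking against the observed offset.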

Impact

This is more than a debugging issue. The VLM can use the visual click indicator in subsequent screenshots to understand where its previous action landed. If the red marker is offset from the actual clicked location or target button, the model may receive misleading feedback and adjust its next action based on an incorrect visual signal.

That can degrade the agent loop by making recovery and self-correction less reliable, especially for tasks that require precise GUI interaction.

Error Logs

No response

Labels

Bug, UI-TARS
