
feat(agent-tars): strict-typed gui agent protocol #1295

ulivz merged 17 commits into main from feat/gui-agent-strict-types on Aug 29, 2025

ulivz (Member) commented on Aug 29, 2025

Summary

Implements a strict type system for GUI Agent tool responses, providing type safety and extensibility. It addresses the architectural goals of #1292 by using generics to distinguish between action types.

Before

The tool result directly exposed the return value of the Operator's execute call, including several unused fields. Because upper layers depended on that raw shape, any change to the underlying operator could break them.

{
    "id": "248a4fa4-7b88-40ea-a77a-4c1dfc6c8006",
    "type": "tool_result",
    "timestamp": 1756426147199,
    "toolCallId": "call_1756426145455_2fuev",
    "name": "browser_vision_control",
    "content": {
        "action": "click(point='<point>500 678</point>')",
        "status": "success",
        "result": {
            "startX": 640,
            "startY": 542.4,
            "startXPercent": 0.5,
            "startYPercent": 0.6779999999999999,
            "action_inputs": {
                "start_box": "[0.5,0.678]"
            }
        }
    },
    "elapsedMs": 1741
}

After

The tool result now follows a stable data structure that does not depend on the GUI Agent's internal conversion details, so consumers such as the Web UI can read it simply and reliably.

{
    "id": "2f670a16-210d-4b64-9c0e-4c038d848a02",
    "type": "tool_result",
    "timestamp": 1756454122281,
    "toolCallId": "call_1756454120546_fvc2u",
    "name": "browser_vision_control",
    "content": {
        "success": true,
        "action": "click(point='<point>645 587</point>')",
        "normalizedAction": {
            "type": "click",
            "inputs": {
                "startX": 0.645,
                "startY": 0.5870000000000001
            }
        }
    },
    "elapsedMs": 1732
}
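Both examples suggest that the raw action string encodes a point on a 0 to 1000 grid: `<point>645 587</point>` becomes `startX: 0.645` / `startY: 0.587`, and `<point>500 678</point>` matches `startXPercent: 0.5` / `startYPercent: 0.678`. The sketch below, assuming that scale and handling only the `click(point=...)` form, shows how such a string could be mapped onto the `normalizedAction` shape. `parseClickAction`, `NormalizedClick` and `POINT_SCALE` are hypothetical names, not the project's actual API.

```typescript
// Hypothetical shape mirroring the "normalizedAction" field in the After example.
interface NormalizedClick {
  type: "click";
  inputs: { startX: number; startY: number };
}

// Assumption: point coordinates in the raw action string are on a 0 to 1000 scale.
const POINT_SCALE = 1000;

// Parses strings such as "click(point='<point>645 587</point>')".
// Returns null for anything that is not a click with a well-formed point.
function parseClickAction(action: string): NormalizedClick | null {
  const match =
    /^click\(point='<point>(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)<\/point>'\)$/.exec(action.trim());
  if (!match) return null;
  const [, x, y] = match;
  return {
    type: "click",
    inputs: {
      // Divide by the assumed grid size to get fractions of the screen.
      startX: Number(x) / POINT_SCALE,
      startY: Number(y) / POINT_SCALE,
    },
  };
}

const parsed = parseClickAction("click(point='<point>645 587</point>')");
console.log(parsed);
```

A direct division yields `0.587` rather than the `0.5870000000000001` seen in the example payload, which hints that the real conversion takes a different arithmetic path; consumers should not compare these values with strict equality.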

Checklist

  • Added or updated necessary tests (Optional).
  • Updated documentation to align with changes (Optional).
  • Verified no breaking changes, or prepared solutions for any occurring breaking changes (Optional).
  • My change does not involve the above items.

Introduce comprehensive type system for GUI Agent actions with:
- Strict generic types for different action types (click, drag, type, etc.)
- Percentage-based coordinate system for cross-platform compatibility
- Backward compatibility with legacy interfaces
- Type-safe response format with GUIAgentToolResponse<T>
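A minimal sketch of how generics can tie each action `type` to its `inputs`, assuming coordinates are fractions in [0, 1] as in the After example. Only the `GUIAgentToolResponse<T>` name and the `success` / `action` / `normalizedAction` fields come from this PR; `BaseAction`, the individual action aliases, `endX`/`endY`, `content` and `summarizeAction` are illustrative assumptions, not the actual definitions.

```typescript
// Generic base: the type parameters tie an action's "type" tag to its "inputs" shape.
type BaseAction<T extends string, I> = { type: T; inputs: I };

// Coordinates are fractions of the screen size in [0, 1] (assumed from the After example).
type Point = { startX: number; startY: number };

// Illustrative action types; the real set and field names may differ.
type ClickAction = BaseAction<"click", Point>;
type DragAction = BaseAction<"drag", Point & { endX: number; endY: number }>;
type TypeAction = BaseAction<"type", { content: string }>;

type SupportedAction = ClickAction | DragAction | TypeAction;

// Response envelope parameterized by the action type it carries.
type GUIAgentToolResponse<T extends SupportedAction = SupportedAction> = {
  success: boolean;
  action: string; // raw action string produced by the model
  normalizedAction: T;
};

// Narrowing on "type" exposes the matching inputs without any `as` casts.
function summarizeAction(response: GUIAgentToolResponse): string {
  const a = response.normalizedAction;
  switch (a.type) {
    case "click":
      return `click at (${a.inputs.startX}, ${a.inputs.startY})`;
    case "drag":
      return `drag to (${a.inputs.endX}, ${a.inputs.endY})`;
    case "type":
      return `type "${a.inputs.content}"`;
  }
}
```

Because the switch is exhaustive over the union, adding a new action alias without handling it becomes a compile-time error rather than a silent runtime gap.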
netlify Bot commented Aug 29, 2025

Deploy Preview for agent-tars-docs ready!

🔨 Latest commit 2ce968a
🔍 Latest deploy log https://app.netlify.com/projects/agent-tars-docs/deploys/68b16463959ff900084fe2f4
😎 Deploy Preview https://deploy-preview-1295--agent-tars-docs.netlify.app

Review thread on multimodal/agent-tars/core/src/browser/browser-gui-agent.ts (outdated)
Review thread on multimodal/agent-tars/core/src/browser/gui-agent-types.ts (outdated)
- Remove deprecated Coords type
- Move GUI Agent types to @tarko/agent-interface for broader reuse
- Add @tarko/agent-interface dependency to agent-tars/core
- Update imports to use centralized type definitions
Review thread on multimodal/tarko/agent-interface/src/gui-agent-types.ts (outdated)
Review thread on multimodal/agent-tars/core/src/browser/browser-gui-agent.ts (outdated)
Review thread on multimodal/agent-tars/core/src/browser/browser-gui-agent.ts (outdated)
Review thread on multimodal/agent-tars/core/src/browser/browser-gui-agent.ts
ulivz added 7 commits August 29, 2025 14:42
- Wrap all GUI Agent types in GUIAgent namespace
- Remove 'as' type assertions for better type safety
- Use strict BrowserExecuteResult type instead of 'any'
- Update BrowserControlRenderer to support new response format
- Add backward compatibility for legacy and new GUI response structures
- Remove legacy exports that were kept for backward compatibility (new code doesn't need them)
- Update all type usage to GUIAgent namespace format
- Simplify imports to only necessary types from @tarko/agent-interface
- Maintain clean namespace organization for better type clarity
- Allow namespace usage for GUI Agent types organization
- Namespaces are appropriate for this use case where we need
  to group related types under a clear semantic boundary
- Format gui-agent-types.ts
- Format browser-gui-agent.ts
- Format BrowserControlRenderer.tsx
- Ensure consistent code formatting across all modified files
- Create multimodal/tarko/shared-utils/src/gui-agent.ts with reusable utilities
- Move convertToGUIResponse, convertToNormalizedAction, createErrorAction to shared package
- Add GUIExecuteResult interface for standardized execute results
- Update browser-gui-agent.ts to use shared utilities
- Enable other Agents to easily adopt GUI Agent type system
- Remove duplicate code and improve maintainability
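To illustrate how a shared conversion helper lets other agents adopt the same response format, here is a sketch with hypothetical signatures; the real `convertToGUIResponse`, `createErrorAction` and `GUIExecuteResult` in the shared-utils package may look quite different. It assumes the operator's execute result exposes `startXPercent` / `startYPercent` as in the Before example, and groups the types in a type-only `GUIAgent` namespace as the commits describe. The member names, the `"error"` action type and the helper names `toClickResponse` / `toErrorResponse` are all assumptions.

```typescript
// Types grouped under a namespace, as the commits describe (member names are assumptions).
namespace GUIAgent {
  export type Coordinates = { startX: number; startY: number };
  export type NormalizedAction =
    | { type: "click"; inputs: Coordinates }
    | { type: "error"; inputs: { message: string } };
  export type ToolResponse = {
    success: boolean;
    action: string;
    normalizedAction: NormalizedAction;
  };
}

// Hypothetical execute result, modeled on fields visible in the Before example.
interface ExecuteResultSketch {
  startX?: number;
  startY?: number;
  startXPercent?: number;
  startYPercent?: number;
}

// Hypothetical helper: an error response that still carries the raw action string.
function toErrorResponse(action: string, message: string): GUIAgent.ToolResponse {
  return { success: false, action, normalizedAction: { type: "error", inputs: { message } } };
}

// Hypothetical helper: keep only normalized fields and drop operator-specific ones
// such as pixel coordinates, so upper layers never see the raw execute result.
function toClickResponse(action: string, result: ExecuteResultSketch): GUIAgent.ToolResponse {
  const { startXPercent, startYPercent } = result;
  if (startXPercent === undefined || startYPercent === undefined) {
    return toErrorResponse(action, "execute result has no percentage coordinates");
  }
  return {
    success: true,
    action,
    normalizedAction: { type: "click", inputs: { startX: startXPercent, startY: startYPercent } },
  };
}
```

Keeping the conversion in one shared place means only that helper needs updating if an operator's execute result changes shape.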
codecov Bot commented Aug 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 14.26%. Comparing base (5a7200d) to head (2ce968a).
⚠️ Report is 13 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1295      +/-   ##
==========================================
+ Coverage   14.21%   14.26%   +0.04%     
==========================================
  Files         250      250              
  Lines        8562     8562              
  Branches     1672     1672              
==========================================
+ Hits         1217     1221       +4     
+ Misses       7170     7166       -4     
  Partials      175      175              


Comment thread multimodal/tarko/agent-interface/src/gui-agent-types.ts Outdated
ulivz force-pushed the feat/gui-agent-strict-types branch from 0a26312 to f08cf64 on August 29, 2025 07:43
ulivz added 6 commits August 29, 2025 15:45
- Add inputs field to BaseAction to define fundamental action structure
- Remove unnecessary 'as' type assertions in browser-gui-agent.ts
- Use strict GUIExecuteResult typing for better type safety
- Convert BaseAction to generic interface for better type safety
- Change all action types from interfaces to type aliases
- Ensure type and inputs consistency through generics
- Reduce code duplication and improve maintainability
- Rename actionStr to action for raw action string
- Rename action to normalizedAction for parsed action object
- Update convertToGUIResponse and createGUIErrorResponse functions
- Update BrowserControlRenderer to use new field names
- Improve API clarity and consistency
- Use optional chaining to simplify null checks
- Add coordinate-based action type validation
- Log unsupported action types for better debugging
- Reduce code complexity and improve readability
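On the consumer side, the renamed fields (`action` for the raw string, `normalizedAction` for the parsed object) let a renderer extract coordinates with optional chaining and a whitelist of coordinate-based action types. The sketch below illustrates that pattern under stated assumptions; it is not the actual `BrowserControlRenderer` code, and `GUIContent`, `COORDINATE_ACTIONS`, `getStartPoint` and `toPixels` are hypothetical names.

```typescript
// Minimal structural view of the tool-result content; field names from the After example.
interface GUIContent {
  success?: boolean;
  action?: string;
  normalizedAction?: { type: string; inputs?: { startX?: number; startY?: number } };
}

// Hypothetical list of action types whose inputs carry a start point.
const COORDINATE_ACTIONS = new Set(["click", "drag"]);

// Returns the start point as fractions of the viewport, or undefined if there is none.
function getStartPoint(content: GUIContent | undefined): { x: number; y: number } | undefined {
  const normalized = content?.normalizedAction;
  if (!normalized) return undefined;
  if (!COORDINATE_ACTIONS.has(normalized.type)) {
    // Logs action types that have no coordinates to render; message format is illustrative.
    console.warn(`Unsupported action type for coordinates: ${normalized.type}`);
    return undefined;
  }
  const x = normalized.inputs?.startX;
  const y = normalized.inputs?.startY;
  return x !== undefined && y !== undefined ? { x, y } : undefined;
}

// Scales fractional coordinates to pixels for a screenshot of the given size.
function toPixels(point: { x: number; y: number }, width: number, height: number) {
  return { x: point.x * width, y: point.y * height };
}
```

For instance, the Before example's `startXPercent: 0.5` with `startX: 640` and `startYPercent: 0.678` with `startY: 542.4` are consistent with a 1280 x 800 screenshot, which is the kind of pixel value the renderer can recompute from the normalized fractions.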
ulivz force-pushed the feat/gui-agent-strict-types branch from 12aac20 to 01ef52c on August 29, 2025 08:04
ZhaoHeh previously approved these changes Aug 29, 2025
ulivz force-pushed the feat/gui-agent-strict-types branch from 01ef52c to 2ce968a on August 29, 2025 08:27
ulivz changed the title from "feat(agent-tars): implement strict typing for GUI Agent tool responses" to "feat(agent-tars): strict-typed gui agent protocol" on Aug 29, 2025
ulivz merged commit 4aa9d78 into main on Aug 29, 2025
13 checks passed
ulivz deleted the feat/gui-agent-strict-types branch on August 29, 2025 09:06

Labels

None yet

Projects

None yet


2 participants