feat(agent-tars): strict-typed gui agent protocol #1295
Merged
Introduce comprehensive type system for GUI Agent actions with:
- Strict generic types for different action types (click, drag, type, etc.)
- Percentage-based coordinate system for cross-platform compatibility
- Backward compatibility with legacy interfaces
- Type-safe response format with GUIAgentToolResponse<T>
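A minimal sketch of why percentage-based coordinates help across platforms, assuming coordinates are stored as fractions between 0 and 1 (as the `normalizedAction.inputs` values in this PR's summary suggest). `NormalizedPoint`, `ScreenSize`, and `toPixels` are hypothetical names used only for illustration; they are not taken from the PR.

```typescript
// Hypothetical types for illustration only; not the PR's actual definitions.
interface NormalizedPoint {
  x: number; // fraction of the screen width, expected in [0, 1]
  y: number; // fraction of the screen height, expected in [0, 1]
}

interface ScreenSize {
  width: number;
  height: number;
}

// Map a resolution-independent point onto a concrete screen.
function toPixels(point: NormalizedPoint, screen: ScreenSize): { x: number; y: number } {
  if (point.x < 0 || point.x > 1 || point.y < 0 || point.y > 1) {
    throw new RangeError(
      `normalized coordinates must be within [0, 1], got (${point.x}, ${point.y})`
    );
  }
  return { x: point.x * screen.width, y: point.y * screen.height };
}

// The same normalized point resolves to different pixels on different screens.
const onDesktop = toPixels({ x: 0.5, y: 0.25 }, { width: 1280, height: 800 });
const onPhone = toPixels({ x: 0.5, y: 0.25 }, { width: 400, height: 800 });
```

Because the stored value never mentions pixels, a consumer can replay or render the action without knowing the resolution it was recorded at.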
ulivz
commented
Aug 29, 2025
- Remove deprecated Coords type
- Move GUI Agent types to @tarko/agent-interface for broader reuse
- Add @tarko/agent-interface dependency to agent-tars/core
- Update imports to use centralized type definitions
ulivz
commented
Aug 29, 2025
- Wrap all GUI Agent types in GUIAgent namespace
- Remove 'as' type assertions for better type safety
- Use strict BrowserExecuteResult type instead of 'any'
- Update BrowserControlRenderer to support new response format
- Add backward compatibility for legacy and new GUI response structures

- Remove legacy exports for backward compatibility (new code doesn't need it)
- Update all type usage to GUIAgent namespace format
- Simplify imports to only necessary types from @tarko/agent-interface
- Maintain clean namespace organization for better type clarity

- Allow namespace usage for GUI Agent types organization
- Namespaces are appropriate for this use case where we need to group related types under a clear semantic boundary

- Format gui-agent-types.ts
- Format browser-gui-agent.ts
- Format BrowserControlRenderer.tsx
- Ensure consistent code formatting across all modified files

- Create multimodal/tarko/shared-utils/src/gui-agent.ts with reusable utilities
- Move convertToGUIResponse, convertToNormalizedAction, createErrorAction to shared package
- Add GUIExecuteResult interface for standardized execute results
- Update browser-gui-agent.ts to use shared utilities
- Enable other Agents to easily adopt GUI Agent type system
- Remove duplicate code and improve maintainability
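A rough sketch of what a shared conversion utility of this kind might do: take the operator's raw execute result and return a stable response. The shapes are inferred from the Before and After payloads in this PR's summary. `LegacyContentSketch`, `GUIResponseSketch`, `toGUIResponse`, and `toErrorResponse` are hypothetical names; the signatures of the real `convertToGUIResponse` and `createErrorAction` are not shown in this conversation and may differ.

```typescript
// Hypothetical shapes inferred from the example payloads; not the shared-utils API.
interface LegacyContentSketch {
  action: string; // raw action string, e.g. "click(point='<point>500 678</point>')"
  status: string; // "success" in the legacy payload
  result?: { startXPercent?: number; startYPercent?: number };
}

interface GUIResponseSketch {
  success: boolean;
  action: string;
  normalizedAction?: { type: string; inputs: { startX?: number; startY?: number } };
  error?: string;
}

function toErrorResponse(action: string, error: string): GUIResponseSketch {
  return { success: false, action, error };
}

function toGUIResponse(legacy: LegacyContentSketch): GUIResponseSketch {
  if (legacy.status !== "success") {
    return toErrorResponse(legacy.action, `execution status: ${legacy.status}`);
  }
  // Assumption: the action type is the identifier before the first "(".
  const type = /^(\w+)\(/.exec(legacy.action)?.[1];
  if (type === undefined) {
    return toErrorResponse(legacy.action, "could not parse action type");
  }
  // Copy only the percentage fields, so pixel values and other internals stay hidden.
  const inputs: { startX?: number; startY?: number } = {};
  const x = legacy.result?.startXPercent;
  const y = legacy.result?.startYPercent;
  if (typeof x === "number") inputs.startX = x;
  if (typeof y === "number") inputs.startY = y;
  return { success: true, action: legacy.action, normalizedAction: { type, inputs } };
}
```

Keeping the conversion in one shared function means other agents can emit the same response shape without duplicating the mapping.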
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #1295 +/- ##
==========================================
+ Coverage 14.21% 14.26% +0.04%
==========================================
Files 250 250
Lines 8562 8562
Branches 1672 1672
==========================================
+ Hits 1217 1221 +4
+ Misses 7170 7166 -4
Partials 175 175
ulivz
commented
Aug 29, 2025
0a26312 to f08cf64
- Add inputs field to BaseAction to define fundamental action structure
- Remove unnecessary 'as' type assertions in browser-gui-agent.ts
- Use strict GUIExecuteResult typing for better type safety

- Convert BaseAction to generic interface for better type safety
- Change all action types from interfaces to type aliases
- Ensure type and inputs consistency through generics
- Reduce code duplication and improve maintainability
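One way a generic `BaseAction` with type aliases could keep `type` and `inputs` consistent, sketched under assumptions: the real definitions are not shown in this conversation, so the type parameters, the input field names for drag and type actions, and the `ToolResponseSketch` and `describeAction` helpers are all hypothetical.

```typescript
// Hypothetical sketch; the actual GUIAgent types may be shaped differently.
interface BaseAction<T extends string, I extends Record<string, unknown>> {
  type: T;
  inputs: I;
}

// Each action is a type alias that pins the literal type to its inputs.
type ClickAction = BaseAction<"click", { startX: number; startY: number }>;
type DragAction = BaseAction<
  "drag",
  { startX: number; startY: number; endX: number; endY: number }
>;
type TypeAction = BaseAction<"type", { content: string }>;
type Action = ClickAction | DragAction | TypeAction;

// A response that is generic over the action it carries.
interface ToolResponseSketch<T extends Action> {
  success: boolean;
  action: string; // raw action string
  normalizedAction: T;
}

// Narrowing on `type` gives access to the matching `inputs` without 'as' assertions.
function describeAction(action: Action): string {
  switch (action.type) {
    case "click":
      return `click at (${action.inputs.startX}, ${action.inputs.startY})`;
    case "drag":
      return `drag to (${action.inputs.endX}, ${action.inputs.endY})`;
    case "type":
      return `type "${action.inputs.content}"`;
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a variant is added.
      const unreachable: never = action;
      return unreachable;
    }
  }
}
```

With this layout, pairing `type: "click"` with drag-style inputs is rejected by the compiler rather than discovered at runtime.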

- Rename actionStr to action for raw action string
- Rename action to normalizedAction for parsed action object
- Update convertToGUIResponse and createGUIErrorResponse functions
- Update BrowserControlRenderer to use new field names
- Improve API clarity and consistency

- Use optional chaining to simplify null checks
- Add coordinate-based action type validation
- Log unsupported action types for better debugging
- Reduce code complexity and improve readability
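The validation described in the last commit might look roughly like the sketch below. The list of coordinate-based action types, the `ParsedActionSketch` shape, and the `extractPoint` helper are assumptions made for illustration; the logger is passed in so the sketch does not depend on any particular logging API.

```typescript
// Hypothetical list; the PR does not enumerate which action types carry coordinates.
const COORDINATE_ACTION_TYPES: readonly string[] = ["click", "double_click", "right_click", "drag"];

interface ParsedActionSketch {
  type?: string;
  inputs?: { startX?: number; startY?: number };
}

function extractPoint(
  action: ParsedActionSketch | undefined,
  log: (message: string) => void = () => {}
): { x: number; y: number } | null {
  // Optional chaining replaces nested "action && action.type" style checks.
  const type = action?.type;
  if (type === undefined || COORDINATE_ACTION_TYPES.indexOf(type) === -1) {
    log(`unsupported action type: ${type ?? "(missing)"}`);
    return null;
  }
  const x = action?.inputs?.startX;
  const y = action?.inputs?.startY;
  return typeof x === "number" && typeof y === "number" ? { x, y } : null;
}
```

Returning `null` for unsupported or incomplete actions lets a renderer skip drawing a cursor marker while the log line records why.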
12aac20 to 01ef52c
ZhaoHeh
previously approved these changes
Aug 29, 2025
01ef52c to 2ce968a
Summary
Implements a strict typing system for GUI Agent tool responses, with an emphasis on type safety and extensibility. This addresses the architectural goals from #1292 by introducing a type system that uses generics to distinguish between action types.
Before
The tool result directly exposes the return value of the Operator's execute call, including some unused fields. If the underlying layer changes, consumers in the upper layer may break.
{
  "id": "248a4fa4-7b88-40ea-a77a-4c1dfc6c8006",
  "type": "tool_result",
  "timestamp": 1756426147199,
  "toolCallId": "call_1756426145455_2fuev",
  "name": "browser_vision_control",
  "content": {
    "action": "click(point='<point>500 678</point>')",
    "status": "success",
    "result": {
      "startX": 640,
      "startY": 542.4,
      "startXPercent": 0.5,
      "startYPercent": 0.6779999999999999,
      "action_inputs": {
        "start_box": "[0.5,0.678]"
      }
    }
  },
  "elapsedMs": 1741
},
After
The tool result relies on a stable data structure rather than on the GUI Agent's internal conversion details, so consumers such as the Web UI can read it simply and reliably.
{
  "id": "2f670a16-210d-4b64-9c0e-4c038d848a02",
  "type": "tool_result",
  "timestamp": 1756454122281,
  "toolCallId": "call_1756454120546_fvc2u",
  "name": "browser_vision_control",
  "content": {
    "success": true,
    "action": "click(point='<point>645 587</point>')",
    "normalizedAction": {
      "type": "click",
      "inputs": {
        "startX": 0.645,
        "startY": 0.5870000000000001
      }
    }
  },
  "elapsedMs": 1732
},
Checklist