Description
Is your feature request related to a problem? Please describe.
Currently, we have the hover, click, and press_key tools, which are rather low level. To send several of these inputs, the client has to issue multiple tool calls, which might involve multiple LLM round trips. In addition, we support only some of the low-level input commands, not the full set that an API like perform_actions covers.
Describe the solution you'd like
WebDriver defines the perform actions API (https://www.w3.org/TR/webdriver-bidi/#command-input-performActions), which allows sending a sequence of actions in one tool call. We can replace our existing low-level APIs with an API similar to WebDriver's, but integrated with our other concepts such as uids. We should not replace the higher-level APIs, though, such as fill, fill_form, and drag.
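To make the proposal more concrete, here is a rough sketch of what the input of such a tool could look like. It is loosely modeled on the shape of `input.performActions`, with the element origin swapped for a uid. All type names, the `toTicks` helper, and the uid value `1_42` are hypothetical and only meant for discussion, not a proposed final schema.

```typescript
// Hypothetical input shape for a perform_actions-style tool.
// Assumption: element origins reference a snapshot uid instead of a
// WebDriver shared reference. None of these names exist in the project.
type Origin = "viewport" | "pointer" | { type: "element"; uid: string };

type PauseAction = { type: "pause"; duration?: number };
type KeyAction = PauseAction | { type: "keyDown" | "keyUp"; value: string };
type PointerAction =
  | PauseAction
  | { type: "pointerDown" | "pointerUp"; button: number }
  | { type: "pointerMove"; x: number; y: number; origin?: Origin };

type SourceActions =
  | { type: "key"; id: string; actions: KeyAction[] }
  | { type: "pointer"; id: string; actions: PointerAction[] };

type Tick = Array<{ sourceId: string; action: KeyAction | PointerAction }>;

// Groups actions by tick: tick i holds the i-th action of every input
// source that has one, mirroring how WebDriver lines up parallel sources.
function toTicks(sources: SourceActions[]): Tick[] {
  const tickCount = Math.max(0, ...sources.map((s) => s.actions.length));
  const ticks: Tick[] = [];
  for (let i = 0; i < tickCount; i++) {
    const tick: Tick = [];
    for (const source of sources) {
      const action = source.actions[i];
      if (action !== undefined) {
        tick.push({ sourceId: source.id, action });
      }
    }
    ticks.push(tick);
  }
  return ticks;
}

// Example: a Shift+click on the element with (hypothetical) uid "1_42",
// expressed as a single call instead of several separate tool calls.
const shiftClick: SourceActions[] = [
  {
    type: "key",
    id: "keyboard",
    actions: [
      { type: "keyDown", value: "\uE008" }, // WebDriver code point for Shift
      { type: "pause" },
      { type: "pause" },
      { type: "keyUp", value: "\uE008" },
    ],
  },
  {
    type: "pointer",
    id: "mouse",
    actions: [
      { type: "pointerMove", x: 0, y: 0, origin: { type: "element", uid: "1_42" } },
      { type: "pointerDown", button: 0 },
      { type: "pointerUp", button: 0 },
    ],
  },
];

const ticks = toTicks(shiftClick);
```

In this sketch the keyboard source pads with `pause` actions so that Shift stays held while the pointer presses and releases. Whether models can reliably produce this kind of tick alignment is one of the things that would need testing.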
Describe alternatives you've considered
Current approach.
Additional context
We need to test whether the models have any difficulty using a WebDriver-like API.