-
Notifications
You must be signed in to change notification settings - Fork 1.6k
feat: add hover action support #3994
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: add hover action support #3994
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughThis PR introduces comprehensive support for a new "hover" action type across both the frontend and backend systems. The changes span type definitions, UI components, action handlers, exception types, LLM prompts, and DOM utilities to enable hover-based interaction detection and execution. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Frontend
participant LLMPrompt
participant Backend
participant Scraper
participant Browser
participant DOM
User->>Frontend: Perform action
Frontend->>LLMPrompt: Extract action (needs hover?)
LLMPrompt->>LLMPrompt: Detect hover-only elements
LLMPrompt-->>Backend: Return HOVER action + hold_seconds
Backend->>Backend: parse_action → HoverAction
Backend->>Scraper: Resolve element ID → locator
Scraper->>DOM: Get SkyvernElement with hoverOnly flag
Backend->>DOM: Call hover_to_reveal(max_depth=4)
DOM->>Browser: Iterate hover attempts on element/parents
Browser->>Browser: Hover with timeout
Browser->>Browser: Wait settle_delay_s
Browser->>Browser: Check visibility
alt Hover successful
DOM-->>Backend: Returns True
Backend->>Browser: Scroll into view
Backend->>Browser: Perform hover with timeout
opt hold_seconds > 0
Backend->>Browser: Wait hold_seconds
end
Backend-->>Frontend: ActionSuccess
else Hover failed
DOM-->>Backend: Returns False
Backend-->>Frontend: ActionFailure(FailToHover)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Changes requested ❌
Reviewed everything up to 0fda52f in 2 minutes and 50 seconds. Click for details.
- Reviewed
537lines of code in19files - Skipped
1files when reviewing. - Skipped posting
7draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. skyvern/webeye/scraper/scraper.py:114
- Draft comment:
When processing the hoverOnly flag, consider validating or explicitly converting its value to a boolean rather than relying on truthy evaluation. This ensures consistent behavior if the flag isn’t strictly boolean. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 15% vs. threshold = 50% This comment is suggesting a defensive programming practice but doesn't identify an actual bug. The patternif element.get("hoverOnly"):is idiomatic Python and is used consistently throughout this codebase (as seen on line 907). The comment is speculative - it says "consider" and talks about potential inconsistent behavior "if the flag isn't strictly boolean", but there's no evidence that hoverOnly would ever be anything other than a boolean or undefined. This is a code quality suggestion that isn't clearly actionable, and the existing pattern is already consistent with the rest of the codebase. The rules state I should not keep speculative comments and should only keep comments with strong evidence of correctness. Perhaps there's a real issue where hoverOnly could be a string like "true" or "false" from JSON parsing, which would cause truthy evaluation to behave unexpectedly. Maybe the data source for these elements could provide non-boolean values. While it's theoretically possible that hoverOnly could be a non-boolean value, there's no evidence in the diff or surrounding code that this is a real concern. The same pattern is used elsewhere in the file (line 907) without issue, and the codebase consistently uses truthy evaluation for similar flags. Without concrete evidence that this is a problem, this is just speculative defensive programming advice. This comment should be deleted. It's a speculative suggestion about potential issues without evidence of an actual problem. The pattern used is consistent with the rest of the codebase, and the comment doesn't identify a clear bug or required change.
2. skyvern/webeye/utils/dom.py:406
- Draft comment:
In the hover_to_reveal method, you call self.get_locator() multiple times within the loop. Consider caching the locator at the start of each iteration to avoid redundant async calls and potentially improve performance. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment claimsself.get_locator()involves "redundant async calls", but looking at the implementation (line 458-459),get_locator()is a simple synchronous method that returnsself.locator. It's just an attribute access, not an async operation. The performance impact of calling this method twice per loop iteration is negligible. The comment appears to be based on a misunderstanding - the reviewer may have assumedget_locator()was an async method because it's used in an async context, but it's actually just a simple getter. This is not a meaningful performance issue. Could there be other reasons to cache the locator beyond performance? Perhaps for code clarity or to ensure consistency? However, the comment specifically mentions performance and "redundant async calls", which is factually incorrect. While caching could marginally improve code clarity, the comment's justification is factually wrong -get_locator()is not an async call and has negligible performance impact. The comment doesn't provide valid reasoning for the suggested change. This comment should be deleted because it's based on a false premise. Theget_locator()method is a simple synchronous getter that returns an instance variable, not an async operation. The performance concern raised is not valid.
3. skyvern/webeye/utils/dom.py:424
- Draft comment:
In hover_to_reveal, exceptions during hover attempts are caught and only logged at the debug level. Consider whether some failures (especially repeated ones) should be escalated or handled differently to avoid masking underlying issues with hover interactions. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 15% vs. threshold = 50% This comment is speculative and advisory rather than pointing to a concrete issue. It uses phrases like "Consider whether" which is asking the author to think about something rather than identifying a definite problem. The current implementation seems reasonable - it catches exceptions during hover attempts, logs them for debugging, tries parent elements, and returns False if all attempts fail. The caller can then decide how to handle the failure. The comment doesn't identify a specific scenario where the current approach is wrong or would cause issues. It's asking the author to "consider" alternative approaches without strong evidence that the current approach is problematic. The method does catch all exceptions broadly withexcept Exception:, which could potentially mask important errors. However, the method returns a boolean indicating success/failure, so the caller can handle the case appropriately. Without specific evidence of what problems this causes, the comment remains speculative. While broad exception catching can be problematic, the method has a clear contract (returns bool for success/failure) and logs failures for debugging. The comment doesn't provide specific evidence of what issues this causes or what specific exceptions should be handled differently. It's asking the author to "consider" alternatives without demonstrating a concrete problem. This comment should be deleted. It's speculative and advisory ("Consider whether...") rather than identifying a concrete issue. It doesn't provide actionable feedback with clear evidence of a problem. The current implementation appears reasonable with proper error logging and a clear return value indicating success/failure.
4. skyvern/forge/prompts/skyvern/extract-action-static.j2:6
- Draft comment:
Typo found: In the example IDINSTER_CARD_WRAPPER, it appears that 'INSTER' is a misspelling. Consider correcting it to something likeINSERT_CARD_WRAPPERif that's intended. - Reason this comment was not posted:
Comment looked like it was already resolved.
5. skyvern/forge/prompts/skyvern/extract-action-static.j2:7
- Draft comment:
Minor consistency note: The sentence starting after the period should ideally use a capital letter. Consider changing 'prioritize actions with the red information.' to 'Prioritize actions with the red information.' - Reason this comment was not posted:
Comment was on unchanged code.
6. skyvern/forge/prompts/skyvern/extract-action.j2:7
- Draft comment:
Capitalization issue: The sentence starts with a lowercase 'prioritize'. It should begin with a capital letter, i.e. 'Prioritize actions with the red information.' - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =10%<= threshold50%This comment is purely informative and focuses on a minor stylistic issue that doesn't impact the functionality or logic of the code. It doesn't provide a suggestion for code improvement or address a potential issue in the code's behavior.
7. skyvern/webeye/scraper/domUtils.js:449
- Draft comment:
Minor typo: In the comment, consider changing "According this logic" to "According to this logic". - Reason this comment was not posted:
Comment was on unchanged code.
Workflow ID: wflow_qTuuOH4FjNsuAB4A
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
skyvern/webeye/actions/actions.py (1)
144-178: Add missing action type branches toAction.validate()factory method
HoverActionand seven other action types (Extract, Scroll, Keypress, Move, Drag, VerificationCode, LeftMouse) are defined as concrete classes but lack dispatch branches inAction.validate(). SinceAction.model_validate()is called in production code when retrieving actions from the database (db/client.py:452, 3379, 3397, 3419, 3429), any of these action types stored and then retrieved will fail withValueError("Unsupported action type").Add elif branches for all eight missing types in
Action.validate()(lines 144-178):elif action_type is ActionType.SOLVE_CAPTCHA: return SolveCaptchaAction.model_validate(value) + elif action_type is ActionType.HOVER: + return HoverAction.model_validate(value) + elif action_type is ActionType.EXTRACT: + return ExtractAction.model_validate(value) + elif action_type is ActionType.SCROLL: + return ScrollAction.model_validate(value) + elif action_type is ActionType.KEYPRESS: + return KeypressAction.model_validate(value) + elif action_type is ActionType.MOVE: + return MoveAction.model_validate(value) + elif action_type is ActionType.DRAG: + return DragAction.model_validate(value) + elif action_type is ActionType.VERIFICATION_CODE: + return VerificationCodeAction.model_validate(value) + elif action_type is ActionType.LEFT_MOUSE: + return LeftMouseAction.model_validate(value) elif action_type is ActionType.RELOAD_PAGE: return ReloadPageAction.model_validate(value)
🧹 Nitpick comments (8)
skyvern/webeye/scraper/domUtils.js (1)
414-446: Hover-only heuristics work but class-name checks may be too broadThe new hover plumbing is wired correctly end‑to‑end:
isHoverOnlyElementsets a boolean that’s propagated viahoverOnlyinbuildElementObject.isElementVisibletreating zero‑sized but hover‑only elements as visible, andisInteractableallowingpointer-events: nonefor such elements, gives the agent something it can later reveal via hover.The main concern is the breadth of the heuristics:
- Treating any ancestor whose class contains
"card"or"item"as implying hover‑only behavior is quite loose and may mark a large fraction of the DOM as hover‑only on many sites.- Combined with the visibility and interactability overrides, this could surface elements that are permanently hidden or not actually hover‑revealed as “visible & interactable”, leading to noisy action sets and potentially confusing LLM behavior.
Consider tightening
isHoverOnlyElement, for example:
- Require a stronger hover signal (class names that contain both a hover token and a card/item token), or
- Use word‑boundary checks / specific patterns (e.g.
\bcard\b,\bmenu-item\b) instead of simpleincludes("item"), or- Gate the ancestor
"card"/"item"checks on seeing a hover-ish pattern somewhere in the chain.A small refinement here would keep hover support while reducing the chance of over‑detecting hover‑only elements.
Also applies to: 485-492, 863-871, 1561-1567
skyvern/core/script_generations/generate_script.py (1)
92-109: Hover action codegen is correct; tighten hold_seconds type handlingThe new hover wiring is consistent:
"hover"inACTION_MAPandACTIONS_WITH_XPATHensures HOVER actions generatepage.hover(selector="xpath=...")calls.- The hover branch correctly adds an optional
hold_secondskwarg only when present and positive, matching the new page API.One robustness concern:
elif method == "hover": hold_seconds = act.get("hold_seconds") if hold_seconds and hold_seconds > 0: ...If
hold_secondsis ever a string (e.g. JSON‑decoded"1.0"), the>comparison will raise aTypeError. Since the generator is long‑lived infrastructure, it’s safer to guard on numeric types:- elif method == "hover": - hold_seconds = act.get("hold_seconds") - if hold_seconds and hold_seconds > 0: + elif method == "hover": + hold_seconds = act.get("hold_seconds") + if isinstance(hold_seconds, (int, float)) and hold_seconds > 0: args.append( cst.Arg( keyword=cst.Name("hold_seconds"), value=_value(hold_seconds), whitespace_after_arg=cst.ParenthesizedWhitespace( indent=True, last_line=cst.SimpleWhitespace(INDENT), ), ) )This keeps behavior the same for valid floats while avoiding surprising crashes if the upstream payload ever passes a non‑numeric value.
Also applies to: 110-118, 266-294
skyvern/webeye/actions/parse_actions.py (1)
20-45: Hover parsing is fine; consider cleaning up the null-action branch
- The new
ActionType.HOVERhandling looks good: it reusesbase_action_dictand safely defaultshold_secondsto0when omitted or null.- If you ever expect non-numeric
hold_seconds, you might optionally normalize it (e.g.,float(action.get("hold_seconds") or 0)), but the current behavior matches other Pydantic-backed fields.One small cleanup while you’re here:
action_typeis set asActionType[...], but later you haveif action_type == "null":. That condition will never be true because you’re comparing an enum instance to a string; null actions are already handled by the earlierif "action_type" not in action or action["action_type"] is Nonebranch.Consider either dropping that
"null"branch or rewriting it againstActionType.NULL_ACTIONif you intend to support"null"explicitly in model outputs.Also applies to: 169-180
skyvern/forge/prompts/skyvern/extract-action.j2 (1)
6-10: Prompt updates align with hover support; a couple of small clarity nits
- The new guidance about not inventing IDs and choosing the closest real ID is great and directly addresses prior issues with made-up identifiers.
- Adding explicit HOVER semantics (hover to reveal UI, then potentially WAIT before the next action) matches the new backend behavior.
Two optional clarity tweaks you might consider:
- Line 9: “card, tile, or model name” might be intended as “modal name” or “modal window”; if you meant a modal dialog, rewording would avoid confusion.
- The WAIT description now says “WAIT should not be used if there are actions to take such as hovering or clicking,” while earlier you recommend a WAIT after HOVER if new elements need time to appear. You could add a brief qualifier like “after you’ve exhausted immediately-available actions” to make the intended sequence clearer for the model.
Also applies to: 22-22
skyvern/webeye/utils/dom.py (1)
134-135: Hover metadata and reveal loop look correct; consider narrowing exception handling.The
_hover_onlyflag andrequires_hover()+hover_to_reveal()flow are consistent with the scraped element model and avoid interfering with non-hover elements (early return whenrequires_hover()is false). The parent-walking logic is guarded bycount() != 1and logs on failures, which makes the reveal loop reasonably safe.The only minor concern is the two broad
except Exceptionblocks insidehover_to_reveal(). If you want to quiet static-analysis noise and make failures more intentional, you could consider catching specific Playwright errors (e.g.,PlaywrightError/TimeoutError) and letting truly unexpected exceptions propagate, while keeping the same logging structure. This is optional and doesn’t block correctness.Also applies to: 403-443
skyvern/webeye/actions/handler.py (2)
28-36: Hover handler wiring is sound; minor polish around error handling and unused arg.The new
handle_hover_actioncorrectly mirrors the pattern of other web actions: it resolves aSkyvernElementviaDomUtil, reuseshover_to_reveal()for hover-only parents, performs a Playwrighthover(), honorshold_seconds, and surfaces failures viaFailToHover, withActionType.HOVERproperly registered inActionHandler.Two small nits you might want to address:
stepis unused inhandle_hover_action; to appease linters you could either prefix it with_stepor use it in logging/trace context.- Both try/except blocks catch bare
Exception. If you’d like to quiet BLE001 and make behavior more explicit, you could narrow those to expected runtime classes (e.g.,MissingElement,MultipleElementsFound,PlaywrightError) and let unexpected exceptions bubble to the outerhandle_actioncatch-all, which already logs and wraps unknown errors.These are non-blocking and mostly lint/clarity improvements.
Also applies to: 1978-2012, 2221-2221
2264-2305: Pre-click hover reveal integration looks good; consider logging reveal outcome.Calling
await skyvern_element.hover_to_reveal()at the start ofchain_click()is a nice way to reuse the hover-only metadata when clicking elements that require hover to become interactable. The early-return behavior inhover_to_reveal()ensures non-hover elements aren’t penalized.Right now the return value of
hover_to_reveal()is ignored. That’s fine functionally because the click fallbacks still run, but you may want to either:
- Log when
hover_to_reveal()returnsFalsefor elements marked as hover-only, or- Short-circuit to a clearer
FailToClickreason in that case,to make debugging failed interactions on hover-only UIs easier. Not required for correctness, just for observability.
skyvern/forge/prompts/skyvern/extract-action-static.j2 (1)
6-7: Prompt updates for HOVER and ID constraints are consistent and clear.The strengthened ID guidance (no invented IDs, choose closest real ID) and the new HOVER description line up well with the backend
ActionType.HOVERandhandle_hover_actionsemantics. This should help the LLM prefer explicit hover steps before click actions on hover-only UIs.If in the future you want the model to control hover duration (
hold_seconds), you could extend this schema doc to mention an optionalhold_secondsfield for HOVER actions, but that’s an enhancement rather than a requirement for this PR.Also applies to: 9-9, 22-22
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
skyvern-frontend/package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (19)
skyvern-frontend/src/api/types.ts(2 hunks)skyvern-frontend/src/routes/tasks/detail/ActionTypePill.tsx(1 hunks)skyvern-frontend/src/routes/tasks/detail/ActionTypePillMinimal.tsx(1 hunks)skyvern-frontend/src/routes/tasks/detail/hooks/useActions.ts(1 hunks)skyvern/cli/init_command.py(1 hunks)skyvern/core/script_generations/generate_script.py(3 hunks)skyvern/core/script_generations/script_skyvern_page.py(1 hunks)skyvern/core/script_generations/skyvern_page.py(1 hunks)skyvern/exceptions.py(1 hunks)skyvern/forge/prompts/skyvern/extract-action-static.j2(2 hunks)skyvern/forge/prompts/skyvern/extract-action.j2(2 hunks)skyvern/forge/sdk/db/utils.py(2 hunks)skyvern/webeye/actions/action_types.py(2 hunks)skyvern/webeye/actions/actions.py(1 hunks)skyvern/webeye/actions/handler.py(4 hunks)skyvern/webeye/actions/parse_actions.py(2 hunks)skyvern/webeye/scraper/domUtils.js(4 hunks)skyvern/webeye/scraper/scraper.py(2 hunks)skyvern/webeye/utils/dom.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
skyvern/webeye/actions/actions.py (1)
skyvern/webeye/actions/action_types.py (1)
ActionType(4-42)
skyvern/core/script_generations/script_skyvern_page.py (1)
skyvern/webeye/actions/action_types.py (1)
ActionType(4-42)
skyvern-frontend/src/routes/tasks/detail/hooks/useActions.ts (1)
skyvern-frontend/src/api/types.ts (1)
ActionTypes(231-252)
skyvern/core/script_generations/skyvern_page.py (2)
skyvern/webeye/actions/action_types.py (1)
ActionType(4-42)skyvern/library/skyvern_locator.py (1)
locator(128-130)
skyvern/webeye/actions/action_types.py (1)
skyvern-frontend/src/api/types.ts (1)
ActionType(254-254)
skyvern/forge/sdk/db/utils.py (2)
skyvern/webeye/actions/actions.py (1)
HoverAction(285-287)skyvern/webeye/actions/action_types.py (1)
ActionType(4-42)
skyvern/webeye/actions/parse_actions.py (2)
skyvern/webeye/actions/actions.py (1)
HoverAction(285-287)skyvern/webeye/actions/action_types.py (1)
ActionType(4-42)
skyvern/webeye/utils/dom.py (2)
skyvern/core/script_generations/skyvern_page.py (1)
hover(210-227)skyvern/library/skyvern_locator.py (2)
locator(128-130)count(62-64)
skyvern/webeye/actions/handler.py (5)
skyvern/exceptions.py (1)
FailToHover(496-498)skyvern/webeye/actions/actions.py (1)
HoverAction(285-287)skyvern/webeye/actions/responses.py (3)
ActionResult(8-46)ActionFailure(66-83)ActionSuccess(49-63)skyvern/webeye/utils/dom.py (3)
hover_to_reveal(406-443)get_locator(458-459)get_id(394-395)skyvern/webeye/actions/action_types.py (1)
ActionType(4-42)
🪛 Ruff (0.14.4)
skyvern/core/script_generations/skyvern_page.py
216-216: Unused method argument: intention
(ARG002)
221-221: Avoid specifying long messages outside the exception class
(TRY003)
skyvern/webeye/utils/dom.py
423-423: Do not catch blind exception: Exception
(BLE001)
435-435: Do not catch blind exception: Exception
(BLE001)
skyvern/webeye/actions/handler.py
1984-1984: Unused function argument: step
(ARG001)
1989-1989: Do not catch blind exception: Exception
(BLE001)
2004-2004: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (11)
skyvern/cli/init_command.py (1)
55-55: Rich markup disabled correctly for file-exists messagePassing
markup=Falsewhile keepingstyle="yellow"is the right way to avoid Rich mis-parsing[...backend_env_path...]as markup while preserving coloring.skyvern/webeye/actions/action_types.py (1)
15-15: HOVER action wiring into ActionType looks consistentThe new
HOVER = "hover"member and its inclusion inis_web_actionandPOST_ACTION_EXECUTION_ACTION_TYPESis consistent with how other web actions (e.g., CLICK) are treated and should ensure hover actions participate in the normal web handling and post‑action processing.Also applies to: 33-42, 45-55
skyvern/webeye/scraper/scraper.py (1)
107-116: Propagating hover-only metadata through scraping pipeline looks correct
- Adding
data-skyvern-hover-only="true"injson_to_htmlwhenelement.get("hoverOnly")is set cleanly exposes hover-only hints to downstream consumers without mutating the original element.- Early‑returning
Truefrom_should_keep_unique_idforhoverOnlyelements ensures their IDs survive trimming so they remain addressable for hover actions even if otherwise non‑interactable.This is a good, minimal integration of the new hover-only concept into the scraper.
Also applies to: 902-909
skyvern/core/script_generations/script_skyvern_page.py (1)
134-148: Hover emoji mapping integrates cleanly into script-mode outputAdding
ActionType.HOVER: "🖱️"toACTION_EMOJISis consistent with existing mappings and ensures hover actions are clearly labeled in script-mode logs with no side effects on behavior.skyvern-frontend/src/routes/tasks/detail/ActionTypePill.tsx (1)
8-12: Hover icon mapping is consistent with existing click iconThe new
hoverentry iniconsreusesCursorArrowIcon, aligning hover and click visually and integrating cleanly with the existing pill rendering.skyvern-frontend/src/routes/tasks/detail/hooks/useActions.ts (1)
15-27: Hover handling in getActionInput is reasonable and non-breakingAdding the
ActionTypes.Hoverbranch that returns"Hover"keepsgetActionInputaligned with the new action type and only affects the legacy path that uses this helper. The change is safe and consistent with the existing"Click"labeling.skyvern-frontend/src/routes/tasks/detail/ActionTypePillMinimal.tsx (1)
14-19: Hover icon mapping looks correct
hoveris now wired into theiconsmap with the same cursor icon asclick, so minimal pills will render hover actions without impacting existing types.skyvern/exceptions.py (1)
491-499:FailToHoveraligns with existing failure exceptionsThe new
FailToHoverexception mirrorsFailToClick’s pattern and will integrate cleanly with existing error handling and logging.skyvern-frontend/src/api/types.ts (1)
230-252: Frontend now exposes thehoveraction type consistentlyAdding
Hover: "hover"toActionTypesandhover: "Hover"toReadableActionTypeskeeps the frontend action-type union and labels in sync with the backendHOVERenum and the new hover UI affordance.Also applies to: 256-279
skyvern/forge/sdk/db/utils.py (1)
61-85: Hydration mapping forActionType.HOVERlooks correctImporting
HoverActionand wiringActionType.HOVER: HoverActionintoACTION_TYPE_TO_CLASSis the right way to lethydrate_actionhandle hover rows from the DB; this aligns with the existing patterns for other web actions.Also applies to: 90-112
skyvern/core/script_generations/skyvern_page.py (1)
209-227: Markintentionparameter as intentionally unused to resolve Ruff ARG002The
hover()implementation is correct. To satisfy Ruff, the intentionally unusedintentionparameter should be prefixed with an underscore or assigned with underscore. Apply the suggested fix:@action_wrap(ActionType.HOVER) async def hover( self, selector: str, *, timeout: float = settings.BROWSER_ACTION_TIMEOUT_MS, hold_seconds: float = 0.0, intention: str | None = None, **kwargs: Any, ) -> str: """Move the mouse over the element identified by `selector`.""" + # `intention` is accepted for API symmetry with other actions but is not yet used. + _ = intention if not selector: raise ValueError("Hover requires a selector.") locator = self.page.locator(selector, **kwargs) await locator.hover(timeout=timeout) if hold_seconds and hold_seconds > 0: await asyncio.sleep(hold_seconds) return selectorThis keeps the signature consistent with other action methods while silencing the linter and preserving future flexibility.
| Use the user details to fill in necessary values. Always satisfy required fields if the field isn't already filled in. Don't return any action for the same field, if this field is already filled in and the value is the same as the one you would have filled in. | ||
| MAKE SURE YOU OUTPUT VALID JSON. No text before or after JSON, no trailing commas, no comments (//), no unnecessary quotes, etc. | ||
| Each interactable element is tagged with an ID. Avoid taking action on a disabled element when there is an alternative action available. | ||
| Each interactable element is tagged with an ID. Avoid taking action on a disabled element when there is an alternative action available. The `id` you output MUST exactly match one of the IDs from the provided elements list—never invent IDs (e.g., `INSTER_CARD_WRAPPER`). If the user refers to an element that isn't listed, choose the closest matching real element ID or explain why no suitable element exists. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this change? have you run into any issues without the change?
i'm asking becasue:
we've supported the solve_captcha, Wait action, and going to support the actions likereload_page, open_tab in the future. These actions do not need the ID. i think this prompt might bring a side-affect for these actions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah actually without this instruction the model was hovering on the wrong element, to address the future issue you mentioned, what do you think about this change?
| Each interactable element is tagged with an ID. Avoid taking action on a disabled element when there is an alternative action available. The `id` you output MUST exactly match one of the IDs from the provided elements list—never invent IDs (e.g., `INSTER_CARD_WRAPPER`). If the user refers to an element that isn't listed, choose the closest matching real element ID or explain why no suitable element exists. | |
| Each interactable element is tagged with an ID. Avoid taking action on a disabled element when there is an alternative action available. For element-based actions (CLICK, HOVER, INPUT_TEXT, UPLOAD_FILE, SELECT_OPTION), the `id` you output MUST exactly match one of the IDs from the provided elements list—never invent IDs. If the user refers to an element that isn't listed, choose the closest matching real element ID or explain why no suitable element exists. Actions that do not operate on a specific element (e.g., WAIT, SOLVE_CAPTCHA, RELOAD_PAGE, OPEN_TAB) should not provide an `id`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah actually without this instruction the model was hovering on the wrong element
maybe LLM doesn't understand the HTML + description of Hover action, so he just gave the wrong id, wdyt?
| :param css: css of the element to click | ||
| """ | ||
| try: | ||
| await skyvern_element.hover_to_reveal() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hover_to_reveal() simulates moving the mouse over the element before attempting to click or follow its link. Many menus/buttons are hidden until hovered, so clicking them directly fails. We mark some nodes as “hover-only” during scraping, and when an action targets one, hover_to_reveal() expands it so the subsequent navigate_to_a_href/locator.click has something visible and interactable to hit.
| function isHoverOnlyElement(element) { | ||
| // Check for common hover-only patterns in class names | ||
| const className = element.className?.toString() ?? ""; | ||
| const parentClassName = element.parentElement?.className?.toString() ?? ""; | ||
|
|
||
| // Common hover-only class patterns | ||
| if ( | ||
| className.includes("hover-") || | ||
| className.includes("-hover") || | ||
| parentClassName.includes("hover-") || | ||
| parentClassName.includes("-hover") | ||
| ) { | ||
| return true; | ||
| } | ||
|
|
||
| // Check if parent has hover-related attributes or classes that might reveal this element | ||
| let parent = element.parentElement; | ||
| while (parent && parent !== document.body) { | ||
| const parentClass = parent.className?.toString() ?? ""; | ||
| if ( | ||
| parentClass.includes("hover") || | ||
| parentClass.includes("card") || | ||
| parentClass.includes("item") | ||
| ) { | ||
| // This element might be revealed on parent hover | ||
| return true; | ||
| } | ||
| parent = parent.parentElement; | ||
| } | ||
|
|
||
| return false; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i know it's an initial version, but what's the next plan to optimize it?
The scope, like this, might be very wide -- which means it might increase tons of tokens in the LLM call. do you have a plan to observe the change?
parentClass.includes("hover") ||
parentClass.includes("card") ||
parentClass.includes("item")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a great question.
@mohamedmamdouh22 why do you think of using "card" and "item" besides "hover"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wintonzheng
The extra checks for "card" and "item" in isHoverOnlyElement are a heuristic. Many modern UIs place hover-revealed controls inside card or list-item wrappers whose own hover state controls the child’s visibility. By treating ancestors with those class names as possible hover containers, the scraper marks those children as “hover-only” so they remain interactable instead of being discarded as invisible when their own bounding box is 0×0 until the parent is hovered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LawyZheng
Good call. Right now we walk all the way to body, which is broad and can pull in false positives and extra DOM in the prompt. Short term, we can cap the ancestor scan to ~4–5 levels to catch common card/list wrappers without fanning out to the whole page. Next steps I’d take:
-
Add a depth limit to the parent walk (e.g., stop after 4–5 hops).
-
Add a small metric/flag in the scraper to log how many nodes are tagged hover-only and any extra HTML we include, so we can see whether the cap actually trims token usage.
-
Keep the class-name heuristic for now but watch precision: we could tighten matches (e.g., -card, card-, list-item, menu-item) and avoid generic substrings like “item” alone if it proves too noisy.
My view: a shallow cap plus slightly more specific class checks should reduce noise and prompt size without missing typical hover-only quick-actions on cards/list rows. If that’s not enough, we can switch to CSS hover rule detection or sampling actual hover states, but the depth limit is the quickest win.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we at least implement the depth limit in the first version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
|
appreciate the change! Generally speaking, i think the PR is good. At the beginning of the scraping/step, Skyvern will scroll to the top of the page, and scroll down to take screenshots + parse the DOM. Will the hover thing be broken when the page is scrolled up and down? For example: It will be better if you can provide a real case that can be unblocked by adding the hover action. cc @wintonzheng @suchintan for another review. |
The context is here! |
|
|
||
| const icons: Partial<Record<ActionType, React.ReactNode>> = { | ||
| click: <CursorArrowIcon className="h-4 w-4" />, | ||
| hover: <CursorArrowIcon className="h-4 w-4" />, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you pick another icon that makes sense from https://www.radix-ui.com/icons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cant see one which is more suitable from the radix icons, please let me know if you see any one makes more sense
|
|
||
| const icons: Partial<Record<ActionType, React.ReactNode>> = { | ||
| click: <CursorArrowIcon className="h-4 w-4" />, | ||
| hover: <CursorArrowIcon className="h-4 w-4" />, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same: please pick another icon that makes sense from https://www.radix-ui.com/icons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
skyvern/forge/prompts/skyvern/extract-action-static.j2(2 hunks)skyvern/webeye/scraper/scraper.py(5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
skyvern/webeye/scraper/scraper.py (1)
skyvern/library/skyvern_locator.py (1)
get_attribute(78-80)
🪛 Ruff (0.14.4)
skyvern/webeye/scraper/scraper.py
686-686: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (8)
skyvern/forge/prompts/skyvern/extract-action-static.j2 (2)
6-6: LGTM: Clear guidance on ID matching for element-based vs. non-element actions.The instruction effectively distinguishes which actions require exact ID matches from the elements list (element-based: CLICK, HOVER, INPUT_TEXT, UPLOAD_FILE, SELECT_OPTION) versus which should not provide IDs (non-element: WAIT, SOLVE_CAPTCHA, etc.). The example of an invented ID and the fallback strategy for missing elements add helpful guardrails against LLM hallucination.
Based on past review comments showing this was refined to address concerns about future actions.
22-22: LGTM: HOVER action type clearly described.The "HOVER" addition is well-integrated into the action_type enum with a clear description that distinguishes it from CLICK (hover without clicking) and explains its purpose (revealing hover-only menus). The description format is consistent with other action types.
skyvern/webeye/scraper/scraper.py (6)
114-115: LGTM! Good addition of hover marker.The
data-hover="1"attribute provides a clear HTML marker for hover-only elements, making them identifiable in the DOM representation sent to the LLM.
213-213: LGTM! Good documentation update.The comment now correctly references
SKYVERN_ID_ATTR, improving code clarity and consistency with the actual implementation.
678-692: LGTM! Proper refactoring to use SKYVERN_ID_ATTR constant.The changes correctly fetch and validate the Skyvern ID attribute from frame elements, with appropriate early returns and logging when the attribute is missing. The broad exception handler at line 686 (flagged by static analysis) is acceptable in this context since scraping should continue gracefully even if individual frames fail to load, and the error is logged for debugging.
698-698: LGTM! Consistent usage of skyvern_id.The skyvern_id is correctly used as the frame name (line 698) and for matching the frame element with its children (line 702), maintaining consistency with the refactoring to use SKYVERN_ID_ATTR.
Also applies to: 702-702
909-910: LGTM! Critical for hover action targeting.The early return ensures that hover-only elements retain their unique IDs, which is essential for the hover action handler to target these elements. Without this, elements marked with
hoverOnlywould lose their identifiers and become unreachable.
411-486: Hover state persistence behavior is architecturally limited; verify if cross-step use case applies.The concern is valid: scraping at the start of each step uses
scroll=True(line 2316), which displaces the page. However, investigation reveals nuance:
- Within-step hover chains work: All actions in a step share the same
scraped_pageobject andcurrent_pageinstance (line 1157–1285). A hover action sets browser-level state that persists for subsequent actions in the same step.- Cross-step hover chains don't work: Each new step triggers
_scrape_with_type()withscroll=True, which scrolls the page and loses transient hover state. This is pre-existing architectural behavior, not introduced by this PR.The PR only adds hover detection and element ID preservation; it doesn't alter scraping or cross-step state management.
Clarify with reviewers: Do you need hover to persist across steps (step N hover → step N+1 action), or is within-step chaining (step N hover → step N other action) sufficient? If cross-step support is required, that's a separate architectural concern outside this PR's scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (5)
skyvern/webeye/scraper/scraper.py (1)
686-692: Consider catching more specific exceptions.While the bare
Exceptioncatch provides graceful degradation when frame elements can't be accessed, it could hide unexpected errors. Consider catching more specific exceptions likeAttributeError,TimeoutError, or Playwright-specific errors.- except Exception: + except (AttributeError, TimeoutError, Exception) as e: LOG.warning( "Unable to get Skyvern id from frame_element", attr=SKYVERN_ID_ATTR, + error_type=type(e).__name__, exc_info=True, )That said, the current implementation does log the full traceback with
exc_info=True, which aids debugging.skyvern/webeye/actions/parse_actions.py (1)
170-171: Consider validatinghold_secondsparameter.The current implementation accepts any value for
hold_secondswithout validation. Consider adding bounds checking to ensure the value is non-negative and within a reasonable range to prevent issues like:
- Negative values (semantically invalid for a hold duration)
- Excessively large values that could cause actions to hang indefinitely
Example validation:
if action_type == ActionType.HOVER: - return HoverAction(**base_action_dict, hold_seconds=action.get("hold_seconds", 0) or 0) + hold_seconds = action.get("hold_seconds", 0) or 0 + if hold_seconds < 0 or hold_seconds > 300: # Cap at 5 minutes as a safety measure + LOG.warning( + "Invalid hold_seconds for hover action, clamping to valid range", + hold_seconds=hold_seconds, + task_id=base_action_dict.get("task_id"), + ) + hold_seconds = max(0, min(hold_seconds, 300)) + return HoverAction(**base_action_dict, hold_seconds=hold_seconds)skyvern/webeye/scraper/domUtils.js (1)
414-450: Case-sensitive string matching may miss hover-only elements.The function uses case-sensitive
.includes()checks on class names, which means class names like"Hover-button","CARD-item", or"Hover-Menu"won't match. Consider normalizing to lowercase before checking:function isHoverOnlyElement(element) { // Check for common hover-only patterns in class names - const className = element.className?.toString() ?? ""; - const parentClassName = element.parentElement?.className?.toString() ?? ""; + const className = element.className?.toString().toLowerCase() ?? ""; + const parentClassName = element.parentElement?.className?.toString().toLowerCase() ?? ""; // Common hover-only class patterns if ( className.includes("hover-") || className.includes("-hover") || parentClassName.includes("hover-") || parentClassName.includes("-hover") ) { return true; } // Check if parent has hover-related attributes or classes that might reveal this element let parent = element.parentElement; let depth = 0; // Cap recursion to avoid walking the entire tree and bloating prompts const maxDepth = 5; while (parent && parent !== document.body && depth < maxDepth) { - const parentClass = parent.className?.toString() ?? ""; + const parentClass = parent.className?.toString().toLowerCase() ?? ""; if ( parentClass.includes("hover") || parentClass.includes("card") || parentClass.includes("item") ) { // This element might be revealed on parent hover return true; } parent = parent.parentElement; depth += 1; } return false; }skyvern/webeye/actions/handler.py (2)
1994-2028: Standardize hover failure reporting and address lint noise inhandle_hover_action.The overall flow (resolve element → hover_to_reveal → locator.hover → optional hold) looks sound and matches the rest of the handler APIs. Two non-blocking improvements:
Consistent error shaping via
FailToHover:
- Currently, element-resolution failures (
dom.get_skyvern_element_by_id) returnActionFailure(exception=exc), while hover failures returnActionFailure(FailToHover(...)).- If you want metrics/logs to treat all hover failures uniformly, consider wrapping the first failure path in
FailToHoveras well:dom = DomUtil(scraped_page=scraped_page, page=page) try: skyvern_element = await dom.get_skyvern_element_by_id(action.element_id) except Exception as exc: # noqa: BLE001 LOG.warning( "Failed to resolve element for hover action", action=action, workflow_run_id=task.workflow_run_id, exc_info=True, )
return [ActionFailure(exception=exc)]
return [ActionFailure(FailToHover(action.element_id, msg=str(exc)))]
- Lint friendliness (
step+ broadexcept):
stepis unused; if Ruff’sARG001is enforced in CI, you can either rename it to_stepor mark it with# noqa: ARG001to preserve the shared handler signature.- Broad
except Exceptionis the established pattern in this module (handlers want to convert any failure intoActionFailure), so it’s reasonable to keep. If Ruff’sBLE001is enabled, local# noqa: BLE001on these twoexceptlines is enough to quiet it without changing behavior.These are polish-level changes; existing behavior is acceptable as-is.
2317-2317: Pre-clickhover_to_reveal()integration inchain_clickis a good fit; consider optional reuse in fallbacks.Calling
await skyvern_element.hover_to_reveal()before the main click is safe (it’s a no-op unlessrequires_hover()is true) and should unblock hover-only anchors/buttons before navigation or click.If you see real-world cases where the fallback targets (bound labels, blocking elements, etc.) are also hover-only, it might be worth optionally reusing
hover_to_reveal()there as well, but that can be deferred until such cases actually surface.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
skyvern/forge/prompts/skyvern/extract-action-static.j2(2 hunks)skyvern/forge/prompts/skyvern/extract-action.j2(2 hunks)skyvern/webeye/actions/actions.py(1 hunks)skyvern/webeye/actions/handler.py(4 hunks)skyvern/webeye/actions/parse_actions.py(2 hunks)skyvern/webeye/scraper/domUtils.js(4 hunks)skyvern/webeye/scraper/scraper.py(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- skyvern/webeye/actions/actions.py
- skyvern/forge/prompts/skyvern/extract-action.j2
🧰 Additional context used
🧬 Code graph analysis (3)
skyvern/webeye/scraper/scraper.py (1)
skyvern/library/skyvern_locator.py (1)
get_attribute(78-80)
skyvern/webeye/actions/parse_actions.py (2)
skyvern/webeye/actions/actions.py (1)
HoverAction(286-288)skyvern/webeye/actions/action_types.py (1)
ActionType(4-42)
skyvern/webeye/actions/handler.py (5)
skyvern/exceptions.py (1)
FailToHover(496-498)skyvern/webeye/actions/actions.py (1)
HoverAction(286-288)skyvern/webeye/actions/responses.py (3)
ActionResult(8-46)ActionFailure(66-83)ActionSuccess(49-63)skyvern/webeye/utils/dom.py (5)
DomUtil(1023-1090)get_skyvern_element_by_id(1040-1083)hover_to_reveal(406-443)get_locator(458-459)get_id(394-395)skyvern/webeye/actions/action_types.py (1)
ActionType(4-42)
🪛 Ruff (0.14.5)
skyvern/webeye/scraper/scraper.py
686-686: Do not catch blind exception: Exception
(BLE001)
skyvern/webeye/actions/handler.py
2000-2000: Unused function argument: step
(ARG001)
2005-2005: Do not catch blind exception: Exception
(BLE001)
2020-2020: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (13)
skyvern/webeye/scraper/scraper.py (4)
114-116: LGTM! Clear hover hint for LLM.The
hoverable="true"attribute injection provides a clear signal to the LLM about hover-triggerable elements. The attribute name is readable and well-placed in the processing flow.
213-214: LGTM! Comment clarifies the use of SKYVERN_ID_ATTR.The updated comment accurately documents that interactable elements are marked with the SKYVERN_ID_ATTR attribute.
909-910: LGTM! Correctly preserves IDs for hover-only elements.The early return ensures that hover-only elements retain their IDs in the trimmed element tree, which is necessary for the hover action handler to target these elements. The placement before other checks is appropriate.
678-702: Refactoring verified—frame elements are properly tagged with the SKYVERN_ID_ATTR constant.The constant
SKYVERN_ID_ATTRis defined inskyvern/constants.pyas"unique_id", which matches the attribute name used throughout the JavaScript DOM scraping code (domUtils.js setselement.setAttribute("unique_id", element_id)). All usages in the refactored code and across the codebase (dom.py, handler.py, agent_functions.py) are consistent with this constant. The changes are correct.skyvern/forge/prompts/skyvern/extract-action-static.j2 (2)
6-6: LGTM: Clear ID validation rules prevent hallucination.The explicit guidance that element-based actions must use exact IDs from the provided list (no invented IDs) is necessary and well-structured. The distinction between element-based actions (CLICK, HOVER, INPUT_TEXT, etc.) requiring IDs and non-element actions (WAIT, SOLVE_CAPTCHA, etc.) not requiring IDs is clear and addresses the issue raised in past review comments.
22-22: HOVER action type description is clear, but subject to workflow limitation.The HOVER action type is correctly added to the enum with a clear description distinguishing it from CLICK. The description accurately conveys the purpose (reveal hover-only menus without clicking).
However, this action type is subject to the hover state persistence limitation raised in Line 9: hover-revealed state may be cleared when Skyvern's scraping workflow scrolls the page at step boundaries. Ensure the workflow limitation is documented (see Line 9 comment) before this action type can be safely used in multi-step scenarios.
skyvern/webeye/actions/parse_actions.py (1)
30-30: LGTM: Import added correctly.The
HoverActionimport is properly placed and necessary for the hover action handling below.skyvern/webeye/scraper/domUtils.js (4)
490-496: LGTM: Hover-only elements with zero dimensions now treated as visible.This change correctly allows elements that are initially hidden (zero dimensions) but become visible on hover to pass the visibility check, enabling the hover action workflow.
864-875: LGTM: Hover-only elements with pointer-events:none remain interactable.The logic correctly preserves interactability for hover-revealed elements that have
pointer-events: noneset initially, allowing the agent to attempt hover actions on their parents.
1570-1570: LGTM: hoverOnly flag added to element object.The
hoverOnlyfield is properly computed and will be available to the backend action planning logic.
414-450: Verify hover state persistence across step boundaries.As noted in the PR discussion by LawyZheng and suchintan, there's a concern about whether hover state persists when Skyvern's scraping workflow scrolls the page between steps. The current implementation correctly detects and marks hover-only elements, but the interaction between hover actions and the scraping workflow needs verification:
- Step N: Agent performs hover action to reveal element
- Step N+1 starts: Scraping scrolls page (typically to top, then incrementally down)
- Question: Does the revealed element remain visible, or is hover state lost?
Consider generating a test scenario or providing a real-world example (as requested by reviewers) that demonstrates:
- Hover action reveals an element in one step
- Element remains accessible/visible in subsequent steps despite scrolling
- Or: if hover state is lost, how the agent recovers (re-hover before click, etc.)
This will help validate the architectural approach and ensure the feature works as intended in multi-step workflows.
skyvern/webeye/actions/handler.py (2)
35-35: Import ofFailToHoveris consistent with new hover failure semantics.The new exception is correctly imported and used in the hover handler to surface hover-specific failures; no issues here.
2237-2237: HOVER action registration looks correct.Registering
ActionType.HOVERwithhandle_hover_actionkeeps the handler table consistent with other web actions and works withActionType.is_web_action/check_for_invalid_web_actiongating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
skyvern/webeye/scraper/domUtils.js (1)
414-450: Hover-only heuristic is useful butcard/itemremain very broadThe ancestor scan with
parentClass.includes("hover") || parentClass.includes("card") || parentClass.includes("item")plus a depth of 5 will still tag a lot of generic containers as hover-only. That’s probably fine for a first pass, but it may over-mark elements and bloat prompts.Consider tightening the non-
hoverchecks to more specific patterns (e.g.menu-item,list-item,nav-item,*-card) or explicit classList matches, and/or logging hoverOnly counts so we can tune this heuristic from data.
🧹 Nitpick comments (3)
skyvern/webeye/scraper/domUtils.js (1)
868-875: Pointer-events override for hover-only is reasonableLetting elements with
pointer-events: noneremain interactable whenisHoverOnlyElementis true is a good way to keep hover-revealed CTAs in the tree while still filtering true non-interactables. This relies entirely on the heuristic, so once you tightenisHoverOnlyElement, revisit this branch if you see noisy interactables.skyvern/webeye/scraper/scraper.py (1)
666-709: Frame Skyvern ID handling is consistent with the new ID schemeIn
add_frame_interactable_elements:
- Reading
skyvern_idviaframe_element.get_attribute(SKYVERN_ID_ATTR),- Logging clearly when it’s missing,
- Passing
frame_name=skyvern_idintobuild_tree_from_body, and- Attaching the resulting
frame_element_treewhereelement["id"] == skyvern_idall line up with how
build_element_dictandresolve_locatorinterpret frame IDs. This should keep cross-frame resolution working while making the ID semantics explicit.skyvern/webeye/utils/dom.py (1)
404-445: Narrow exception handling and cache element locator for best practicesThe implementation is solid, but two optional improvements align with Playwright best practices:
Narrow broad exceptions: Playwright raises
TimeoutErrorfor timeouts and theErrorbase class for other failures. Replace bothexcept Exception:blocks (lines ~426 and ~433) withexcept Error:to avoid masking unexpected failures while keeping the method best-effort. For timeouts specifically, you could catchTimeoutError.Cache element locator: Avoid recomputing
self.get_locator()by caching it once at the start and reusing it in theis_visible()check:element_locator = self.get_locator() hover_target = element_locator # ... in loop: if await element_locator.is_visible(timeout=timeout):
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
skyvern/webeye/scraper/domUtils.js(4 hunks)skyvern/webeye/scraper/scraper.py(5 hunks)skyvern/webeye/utils/dom.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
skyvern/webeye/scraper/scraper.py (1)
skyvern/library/skyvern_locator.py (1)
get_attribute(78-80)
skyvern/webeye/utils/dom.py (2)
skyvern/core/script_generations/skyvern_page.py (1)
hover(210-227)skyvern/library/skyvern_locator.py (2)
locator(128-130)count(62-64)
🪛 Ruff (0.14.5)
skyvern/webeye/scraper/scraper.py
688-688: Do not catch blind exception: Exception
(BLE001)
skyvern/webeye/utils/dom.py
424-424: Do not catch blind exception: Exception
(BLE001)
436-436: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (6)
skyvern/webeye/scraper/domUtils.js (2)
491-494: Zero-size hover-only elements treated as visible looks correctMarking
rect.width/height <= 0elements as visible whenisHoverOnlyElement(element)is true aligns visibility with your hover-only heuristic and ensures these CTAs survive scraping. Just be aware this also lets their text surface intogetVisibleText, which will expose hover-only labels to the LLM even before hover — that seems desirable given you also tag themhoverOnly.
1571-1571: PropagatinghoverOnlyinto elementObj is necessary for backend hover supportAdding
hoverOnlyto the serialized element object lets Python distinguish elements that require hover, and it’s kept through hashing/ID maps. This wiring looks consistent with howisHoverOnlyElementis used elsewhere.skyvern/webeye/utils/dom.py (1)
127-138: SkyvernElement hover-only flag wiring looks correctStoring
hoverOnlyfrom the static element into_hover_onlyand caching it alongside tag/frame/attributes keeps the Python view in sync with the scraper. No issues here.skyvern/webeye/scraper/scraper.py (3)
114-116:hoverable="true"attribute is a good, LLM-friendly surfaceAdding
hoverable="true"at render time whenhoverOnlyis set cleanly exposes hover semantics to the LLM without polluting the stored attributes (since this lives only in the localattributescopy insidejson_to_html). This should also work consistently for both full and economy trees.
213-223: SKYVERN_ID_ATTR-based CSS selectors maintain ID lookup consistencyUsing
[{SKYVERN_ID_ATTR}='{element_id}']as the canonical CSS for each element, with the comment explaining thatget_interactable_element_treetags interactables with that attribute, keeps the Python side aligned with the JSunique_idbehavior. The build maps (id_to_css_dict,id_to_frame_dict, hashes) remain coherent after this change.
906-930: Preserving IDs for hover-only elements is important and correctUpdating
_should_keep_unique_idto always keep theidwhenelement.get("hoverOnly")is truthy ensures hover-only CTAs remain individually addressable after trimming, even if they’re not otherwise interactable or marked readonly/disabled. This is key for resolving selectors for hover + click sequences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
skyvern/webeye/scraper/domUtils.js (1)
436-440: Consider tightening class name patterns to reduce false positives.The current heuristic checks if parent class names contain "hover", "card", or "item". This may be too broad:
- "item" appears in many contexts unrelated to hover (e.g., "menuitem", "listitem", "item-price")
- "card" similarly appears in non-hover contexts (e.g., "credit-card", "scorecard")
Consider more specific patterns to reduce false positives:
const parentClass = parent.className?.toString() ?? ""; if ( parentClass.includes("hover") || - parentClass.includes("card") || - parentClass.includes("item") + /\b(card|item)-hover\b/.test(parentClass) || + /\bhover-(card|item)\b/.test(parentClass) || + /\bmenu-item\b/.test(parentClass) || + /\blist-item\b/.test(parentClass) ) {This uses word boundaries and more specific patterns to match common hover-related class naming conventions while avoiding overly generic matches.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
skyvern/core/script_generations/skyvern_page.py(1 hunks)skyvern/library/skyvern_locator.py(1 hunks)skyvern/webeye/actions/handler.py(4 hunks)skyvern/webeye/scraper/domUtils.js(4 hunks)skyvern/webeye/utils/dom.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
skyvern/core/script_generations/skyvern_page.py (2)
skyvern/webeye/actions/action_types.py (1)
ActionType(4-42)skyvern/library/skyvern_locator.py (2)
hover(49-52)locator(129-131)
skyvern/webeye/actions/handler.py (5)
skyvern/exceptions.py (1)
FailToHover(496-498)skyvern/forge/sdk/trace/__init__.py (2)
TraceManager(14-109)traced_async(18-56)skyvern/webeye/actions/actions.py (1)
HoverAction(286-288)skyvern/webeye/actions/responses.py (3)
ActionResult(8-46)ActionFailure(66-83)ActionSuccess(49-63)skyvern/webeye/utils/dom.py (5)
DomUtil(1025-1092)get_skyvern_element_by_id(1042-1085)hover_to_reveal(407-445)get_locator(460-461)get_id(395-396)
skyvern/webeye/utils/dom.py (1)
skyvern/library/skyvern_locator.py (4)
hover(49-52)is_visible(88-90)locator(129-131)count(63-65)
🪛 Ruff (0.14.5)
skyvern/core/script_generations/skyvern_page.py
216-216: Unused method argument: intention
(ARG002)
221-221: Avoid specifying long messages outside the exception class
(TRY003)
skyvern/webeye/actions/handler.py
2000-2000: Unused function argument: step
(ARG001)
2005-2005: Do not catch blind exception: Exception
(BLE001)
2021-2021: Do not catch blind exception: Exception
(BLE001)
skyvern/webeye/utils/dom.py
425-425: Do not catch blind exception: Exception
(BLE001)
437-437: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (8)
skyvern/library/skyvern_locator.py (1)
49-52: LGTM! Good defensive programming.Scrolling the element into view before hovering ensures the hover action can execute reliably, consistent with the pattern used in other action methods.
skyvern/webeye/scraper/domUtils.js (1)
414-450: Address hover state persistence concern raised in PR objectives.The PR reviewers (LawyZheng and suchintan) raised an important question: Skyvern's step-start scraping workflow scrolls to the top and then scrolls down between steps. Will hover-revealed states be broken by that scrolling?
For example:
- Step 1 completes a hover action, revealing a dropdown
- Step 2's scraping begins and scrolls the page
- The hover state may be lost, making the revealed elements disappear
This implementation marks elements as
hoverOnlyduring scraping but doesn't address hover state persistence across steps. Consider:
- Short-term: Document this limitation so users understand hover actions may need to be repeated after scrolling.
- Long-term: Investigate maintaining hover state during scraping or re-hovering before subsequent actions on hover-revealed elements.
Can you provide a real-world example (as requested by reviewers) where this hover action successfully unblocks a workflow that previously failed?
skyvern/webeye/utils/dom.py (2)
404-406: LGTM! Clean accessor for hover requirement.The
requires_hover()method provides a clear, public interface to check if an element needs hover-based reveal.
407-445: Remove the hover state persistence concern and correct the exception handling approach.The hover state concern is invalid—
hover_to_reveal()is called immediately before each action on hover-only elements (see handler.py lines 2015–2020 and 2318–2322), so hover state is re-established per-action, not once during scraping. This addresses scrolling between steps.For exception handling, the broad
Exceptioncatching at lines 425 and 437 can be improved with better logging, but the proposed diff has a flaw:except (TimeoutError, Exception)is redundant sinceTimeoutErroris already a subclass ofException. Instead, keepexcept Exception:and add error context logging if desired—the current pattern is acceptable for retry logic.Likely an incorrect or invalid review comment.
skyvern/webeye/actions/handler.py (4)
35-35: LGTM: Exception import for hover action support.The
FailToHoverexception is properly imported and used in the hover action handler.
2238-2238: LGTM: Hover action handler properly registered.The registration follows the established pattern for other action types in the system.
2318-2318: Hover-to-reveal added before click - verify state persistence.Adding
hover_to_reveal()before click/navigation makes sense to reveal hidden interactive elements (as explained in past review comments). However, this is subject to the same hover state persistence concern raised for explicit hover actions.Since this call happens immediately before the click within the same action execution, the risk is lower than for cross-step scenarios. However, if the click triggers navigation or there are timing delays, the hover state could still be lost.
The verification script suggested for the main hover handler (lines 1994-2029) should also cover this usage pattern.
Based on past review comments.
1994-2029: Based on the code review and web search results, I can now verify the major architectural concern raised in the original review.Hover state is lost at step boundaries—this is a confirmed architectural issue requiring verification.
Skyvern workflows chain multiple tasks together, and within each task, the Planner Agent decomposes complex objectives into achievable goals and creates step-by-step execution plans. The concern is that browser sessions persist the real-time state of the browser, but the actual browser state (specifically hover-revealed elements) will be reset.
The script output confirms the critical issue: during scraping at step boundaries, the scraping workflow scrolls to the top and then scrolls down to capture screenshots and parse the DOM. This scroll-to-top operation resets all browser hover states. A hover action performed in step N will have its revealed elements hidden when step N+1 begins its scraping cycle and calls
scroll_to_top().The
handle_hover_actionfunction at lines 1994-2029 provides no mechanism to preserve or reapply hover state across steps. While line 2016 callsscroll_into_view_if_needed()to ensure visibility, this only affects DOM positioning and does not maintain the transient hover-reveal state that depends on the element remaining hovered.This is a valid architectural concern requiring clarification. The reviewers requested real-case examples where hover actions work across step boundaries. Without such examples or architectural documentation explaining how hover state persists, this remains a potential blocker for workflows that depend on multi-step hover interactions.
Additionally, the function calls
hover_to_reveal()(line 2015) then immediatelyhover()again (line 2017)—this redundancy lacks explanation but may be intentional for thehold_secondsfeature.Minor notes: The
stepparameter (line 2000) is unused, and bareExceptioncatching (lines 2005, 2021) could be more specific.
| @action_wrap(ActionType.HOVER) | ||
| async def hover( | ||
| self, | ||
| selector: str, | ||
| *, | ||
| timeout: float = settings.BROWSER_ACTION_TIMEOUT_MS, | ||
| hold_seconds: float = 0.0, | ||
| intention: str | None = None, | ||
| **kwargs: Any, | ||
| ) -> str: | ||
| """Move the mouse over the element identified by `selector`.""" | ||
| if not selector: | ||
| raise ValueError("Hover requires a selector.") | ||
|
|
||
| locator = self.page.locator(selector, **kwargs) | ||
| await locator.scroll_into_view_if_needed() | ||
| await locator.hover(timeout=timeout) | ||
| if hold_seconds and hold_seconds > 0: | ||
| await asyncio.sleep(hold_seconds) | ||
| return selector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused parameter and missing AI support.
Two observations:
-
The
intentionparameter is accepted but never used. If this is reserved for future AI-powered hover, consider either removing it for now or documenting the intent. -
Unlike
click()andfill(), this method lacks AI fallback support (nopromptparameter, no AI mode handling). If hover actions should support natural language prompts like other actions, consider adding that capability for consistency.
Apply this diff if intention should be removed for now:
@action_wrap(ActionType.HOVER)
async def hover(
self,
selector: str,
*,
timeout: float = settings.BROWSER_ACTION_TIMEOUT_MS,
hold_seconds: float = 0.0,
- intention: str | None = None,
**kwargs: Any,
) -> str:📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @action_wrap(ActionType.HOVER) | |
| async def hover( | |
| self, | |
| selector: str, | |
| *, | |
| timeout: float = settings.BROWSER_ACTION_TIMEOUT_MS, | |
| hold_seconds: float = 0.0, | |
| intention: str | None = None, | |
| **kwargs: Any, | |
| ) -> str: | |
| """Move the mouse over the element identified by `selector`.""" | |
| if not selector: | |
| raise ValueError("Hover requires a selector.") | |
| locator = self.page.locator(selector, **kwargs) | |
| await locator.scroll_into_view_if_needed() | |
| await locator.hover(timeout=timeout) | |
| if hold_seconds and hold_seconds > 0: | |
| await asyncio.sleep(hold_seconds) | |
| return selector | |
| @action_wrap(ActionType.HOVER) | |
| async def hover( | |
| self, | |
| selector: str, | |
| *, | |
| timeout: float = settings.BROWSER_ACTION_TIMEOUT_MS, | |
| hold_seconds: float = 0.0, | |
| **kwargs: Any, | |
| ) -> str: | |
| """Move the mouse over the element identified by `selector`.""" | |
| if not selector: | |
| raise ValueError("Hover requires a selector.") | |
| locator = self.page.locator(selector, **kwargs) | |
| await locator.scroll_into_view_if_needed() | |
| await locator.hover(timeout=timeout) | |
| if hold_seconds and hold_seconds > 0: | |
| await asyncio.sleep(hold_seconds) | |
| return selector |
🧰 Tools
🪛 Ruff (0.14.5)
216-216: Unused method argument: intention
(ARG002)
221-221: Avoid specifying long messages outside the exception class
(TRY003)
🖱️ This PR introduces comprehensive hover action support to Skyvern, enabling the automation framework to handle mouse-over interactions that reveal hidden UI elements like dropdown menus, tooltips, and hover-triggered buttons. The implementation spans both frontend and backend components, including LLM prompt updates to intelligently detect and interact with hover-dependent UI elements.
🔍 Detailed Analysis
Key Changes
HOVERas a new action type across the entire codebase, including TypeScript definitions, Python enums, and action handlersTechnical Implementation
sequenceDiagram participant LLM as AI Agent participant DOM as DOM Scanner participant Action as Action Handler participant Browser as Browser/Page DOM->>DOM: Detect hover-only elements DOM->>LLM: Mark elements with hoverOnly flag LLM->>Action: Generate HOVER action Action->>Browser: Execute hover with optional hold_seconds Browser->>Browser: Reveal hidden UI elements Action->>LLM: Continue with next action (e.g., CLICK)Impact
Created with Palmier
Important
Adds hover action support across Skyvern, updating frontend, backend, and script generation to handle hover interactions.
HOVERaction type toActionTypesintypes.tsand updatesActionTypePill.tsxandActionTypePillMinimal.tsxto display hover actions.extract-action-static.j2andextract-action.j2to recognize hover-triggered UI elements.generate_script.pywith configurable hold duration.domUtils.jsto detect hover-only elements and mark them withhoverOnlyflag.dom.pyto includehover_to_reveal()method inSkyvernElementclass.FailToHoverexception inexceptions.py.init_command.pyto prevent Rich console formatting errors.This description was created by
for 0fda52f. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.