
feat(web-integration): support element cache for cross-origin iframes #2491

Open
Postroggy wants to merge 1 commit into web-infra-dev:main from Postroggy:feat/cross-origin-iframe-cache

@Postroggy
Contributor

Summary

When an element lives inside a cross-origin iframe, the browser's same-origin policy blocks DOM traversal from the top-level page context. This caused XPath-based element caching to fail: the write path could only reach the iframe element itself, and the read path returned null when trying to access contentDocument.

This PR fixes element cache support for cross-origin iframes by detecting cross-origin boundaries in the browser-side locator and handling them on the Node side via the automation framework's Frame API (CDP).

Why

Midscene's element caching uses XPath to avoid repeated AI calls for element location. For same-origin iframes this works because the browser-side script can traverse into the iframe's DOM. For cross-origin iframes, contentDocument returns null (or throws), so the XPath can never reach the inner element. As a result, cache writes store only the iframe's XPath, and cache reads fail with null.
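The boundary check described above can be sketched roughly as follows. This is a minimal illustration, not the code in locator.ts: `IframeLike` and `isCrossOriginBoundary` are hypothetical names, and the structural interface stands in for `HTMLIFrameElement` so the snippet runs outside a browser.

```typescript
// Hypothetical structural stand-in for HTMLIFrameElement (illustration only).
interface IframeLike {
  // In a real browser this is null when the iframe's document is cross-origin.
  readonly contentDocument: object | null;
}

// Returns true when the parent context cannot reach the iframe's document.
// Note: contentDocument is also null for detached iframes, so a real
// implementation may need additional checks.
function isCrossOriginBoundary(iframe: IframeLike): boolean {
  try {
    return iframe.contentDocument === null;
  } catch {
    // Some access paths throw a SecurityError instead of returning null.
    return true;
  }
}
```

Treating both the null result and the thrown error as "boundary reached" is what lets the locator return a structured signal rather than silently falling through.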

The fix leverages the fact that both Playwright and Puppeteer can access any frame via CDP (frame.evaluate()), bypassing the browser's same-origin policy.

Changes

  • packages/shared/src/extractor/locator.ts

    • Add CrossOriginIframeSignal interface returned when getXpathsByPoint hits a cross-origin iframe boundary
    • Return the signal (with iframe xpath + translated coordinates) instead of silently falling through
    • Add getElementInfoByXpathInCurrentFrame() for evaluating xpath within a single frame context with coordinate offset
    • Add getIframeRectByXpath() for retrieving iframe bounding rect from parent frame
    • Import SUB_XPATH_SEPARATOR from constants instead of defining locally
  • packages/shared/src/constants/index.ts

    • Move SUB_XPATH_SEPARATOR here to align with project conventions for constants
  • packages/shared/src/extractor/index.ts

    • Export new symbols: getElementInfoByXpathInCurrentFrame, getIframeRectByXpath, CrossOriginIframeSignal
  • packages/web-integration/src/puppeteer/base-page.ts

    • Modify getXpathsByPoint() to detect CrossOriginIframeSignal and delegate to new handleCrossOriginXpathsByPoint()
    • Add handleCrossOriginXpathsByPoint(): finds the Frame object via xpath, re-evaluates inside it, supports recursive nested cross-origin iframes (max depth 5)
    • Add findFrameByXpath() / findSingleFrameByXpath(): resolve compound iframe xpath to Frame objects for both Puppeteer and Playwright
    • Modify getElementInfoByXpath() to split compound xpaths and resolve through multiple frames via resolveMultiFrameElementInfo()
    • Add resolveMultiFrameElementInfo(): accumulates coordinate offsets through iframe chain and evaluates final xpath in deepest frame
    • Add evaluateInContext(): evaluate script in any Page/Frame context
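To illustrate the signal hand-off between the browser-side locator and the Node side, here is a hedged sketch. The PR names `CrossOriginIframeSignal`, but the field names below (`crossOriginIframe`, `iframeXpath`, `localPoint`) and the helpers `makeSignal` and `isCrossOriginIframeSignal` are assumptions made for illustration, not the actual definitions.

```typescript
// Assumed shape: the real interface fields are not shown in this description.
interface CrossOriginIframeSignal {
  crossOriginIframe: true;
  iframeXpath: string;
  // The original point translated into the iframe's own coordinate space.
  localPoint: { x: number; y: number };
}

interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Browser side: translate the point by subtracting the iframe's top-left corner.
// Simplification: iframe borders and padding are ignored here.
function makeSignal(
  iframeXpath: string,
  point: { x: number; y: number },
  iframeRect: Rect,
): CrossOriginIframeSignal {
  return {
    crossOriginIframe: true,
    iframeXpath,
    localPoint: { x: point.x - iframeRect.left, y: point.y - iframeRect.top },
  };
}

// Node side: distinguish a signal from an ordinary list of xpaths.
function isCrossOriginIframeSignal(value: unknown): value is CrossOriginIframeSignal {
  return (
    typeof value === 'object' &&
    value !== null &&
    (value as { crossOriginIframe?: unknown }).crossOriginIframe === true
  );
}
```

A discriminating field keeps the Node-side check cheap and unambiguous, since the normal return value of getXpathsByPoint and the signal travel through the same evaluate call.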

Validation

  • pnpm run lint — passes
  • npx nx build @midscene/shared — passes
  • npx nx build @midscene/web — passes
  • npx nx test @midscene/shared — 28 files, 324 tests passed
  • npx nx test @midscene/web — 36 files, 312 tests passed
  • Manual E2E: ran a YAML script targeting a login form inside a cross-origin iframe twice. The first run generated a cache with compound XPaths (containing |>>|); the second run read the cache and located elements via the Frame API. Execution time dropped from 84s to 50s (about 40%) because the AI calls were skipped.

When an element lives inside a cross-origin iframe, the browser's
same-origin policy blocks DOM traversal from the top-level page context.
This caused XPath-based element caching to fail: the write path could
only reach the iframe element itself, and the read path returned null
when trying to access contentDocument.

Fix by detecting cross-origin boundaries in the browser-side locator
(returning a structured CrossOriginIframeSignal) and handling them on
the Node side via the automation framework's Frame API, which bypasses
same-origin restrictions through CDP.

Write path: when getXpathsByPoint hits a cross-origin iframe, it returns
a signal with the iframe's xpath and translated coordinates. The Node
orchestrator then finds the Frame object and re-evaluates inside it,
producing a compound xpath joined by |>>|.
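
The write path can be sketched as a depth-limited recursion. This is an illustration under stated assumptions, not the code in base-page.ts: `FrameLike`, `locate`, `childFrames`, and `resolveCompoundXpath` are hypothetical stand-ins for the Playwright/Puppeteer Frame and for findFrameByXpath(), the calls are synchronous here whereas real frame evaluation is asynchronous, and SUB_XPATH_SEPARATOR is assumed to hold |>>|.

```typescript
const SUB_XPATH_SEPARATOR = '|>>|'; // assumed value of the shared constant
const MAX_DEPTH = 5; // nesting limit mentioned in the change list

type Point = { x: number; y: number };
type Signal = { iframeXpath: string; localPoint: Point };

// Hypothetical stand-in for a Playwright/Puppeteer Frame.
interface FrameLike {
  // Stands in for evaluating getXpathsByPoint inside this frame:
  // returns a plain xpath, or a signal when a cross-origin iframe is hit.
  locate(point: Point): string | Signal;
  // Stands in for findFrameByXpath(): child frames keyed by iframe xpath.
  childFrames: { [iframeXpath: string]: FrameLike | undefined };
}

function resolveCompoundXpath(frame: FrameLike, point: Point, depth: number = 0): string {
  if (depth >= MAX_DEPTH) {
    throw new Error('cross-origin iframe nesting exceeds depth ' + MAX_DEPTH);
  }
  const result = frame.locate(point);
  if (typeof result === 'string') {
    return result; // the element lives in this frame
  }
  const child = frame.childFrames[result.iframeXpath];
  if (!child) {
    throw new Error('no frame found for ' + result.iframeXpath);
  }
  // Re-evaluate inside the child frame using the translated coordinates.
  const inner = resolveCompoundXpath(child, result.localPoint, depth + 1);
  return result.iframeXpath + SUB_XPATH_SEPARATOR + inner;
}
```

With one cross-origin iframe at /html/body/iframe[1] containing an input, this sketch yields /html/body/iframe[1]|>>|/html/body/form/input[1].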

Read path: compound xpaths are split by |>>|, each iframe segment is
resolved to a Frame via contentFrame(), and the final segment is
evaluated in the deepest frame with coordinate offsets accumulated from
parent frames.
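
The read path can be sketched in two small steps: splitting the compound xpath, then shifting the element's rect by the accumulated iframe offsets. The helper names `splitCompoundXpath` and `toTopLevelRect` are hypothetical, SUB_XPATH_SEPARATOR is assumed to hold |>>|, and each iframe rect is assumed to be measured in its parent frame's coordinate space.

```typescript
const SUB_XPATH_SEPARATOR = '|>>|'; // assumed value of the shared constant

interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Split a compound xpath into the iframe chain and the final element xpath.
function splitCompoundXpath(compound: string): { iframeXpaths: string[]; elementXpath: string } {
  const parts = compound.split(SUB_XPATH_SEPARATOR);
  const elementXpath = parts[parts.length - 1] ?? '';
  return { iframeXpaths: parts.slice(0, -1), elementXpath };
}

// Shift a rect measured inside the deepest frame into top-level page
// coordinates by adding the offset of every ancestor iframe.
function toTopLevelRect(localRect: Rect, iframeRects: Rect[]): Rect {
  let left = localRect.left;
  let top = localRect.top;
  for (const rect of iframeRects) {
    left += rect.left;
    top += rect.top;
  }
  return { left, top, width: localRect.width, height: localRect.height };
}
```

Only the position is accumulated; the element's width and height are unchanged by nesting, assuming no CSS transforms scale the iframes.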