feat(web-integration): support element cache for cross-origin iframes #2491
Open
Postroggy wants to merge 1 commit into
When an element lives inside a cross-origin iframe, the browser's same-origin policy blocks DOM traversal from the top-level page context. This caused XPath-based element caching to fail: the write path could only reach the iframe element itself, and the read path returned null when trying to access `contentDocument`.

Fix by detecting cross-origin boundaries in the browser-side locator (returning a structured `CrossOriginIframeSignal`) and handling them on the Node side via the automation framework's Frame API, which bypasses same-origin restrictions through CDP.

Write path: when `getXpathsByPoint` hits a cross-origin iframe, it returns a signal with the iframe's xpath and translated coordinates. The Node orchestrator then finds the Frame object and re-evaluates inside it, producing a compound xpath joined by `|>>|`.

Read path: compound xpaths are split by `|>>|`, each iframe segment is resolved to a Frame via `contentFrame()`, and the final segment is evaluated in the deepest frame with coordinate offsets accumulated from parent frames.
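To make the compound xpath format concrete, here is a minimal sketch of joining and splitting segments on `|>>|`. The helper names `joinCompoundXpath` and `splitCompoundXpath` are hypothetical and not taken from the PR, and the separator value is assumed to be the `|>>|` token mentioned above.

```typescript
// Assumption: the separator is the "|>>|" token named in the description.
const SUB_XPATH_SEPARATOR = '|>>|';

/** Join per-frame xpath segments (outermost iframe first) into one cache entry. */
function joinCompoundXpath(segments: string[]): string {
  return segments.join(SUB_XPATH_SEPARATOR);
}

/** Split a cached xpath into the iframe chain and the final element xpath. */
function splitCompoundXpath(compound: string): {
  iframeXpaths: string[];
  elementXpath: string;
} {
  const segments = compound
    .split(SUB_XPATH_SEPARATOR)
    .map((segment) => segment.trim());
  // split() always yields at least one segment, so pop() is defined here.
  const elementXpath = segments.pop() ?? '';
  return { iframeXpaths: segments, elementXpath };
}

// Write path: two nested iframes, then the element inside the deepest frame.
const cached = joinCompoundXpath([
  '/html/body/iframe[1]',
  '/html/body/div[2]/iframe[1]',
  '/html/body/form/button[1]',
]);

// Read path: recover the iframe chain and the element xpath.
const parts = splitCompoundXpath(cached);
console.log(parts.iframeXpaths.length, parts.elementXpath);
```

A plain xpath without any separator splits into an empty iframe chain, so the same read path can serve elements in the top-level document.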
Summary
When an element lives inside a cross-origin iframe, the browser's same-origin policy blocks DOM traversal from the top-level page context. This caused XPath-based element caching to fail: the write path could only reach the iframe element itself, and the read path returned null when trying to access `contentDocument`.

This PR fixes element cache support for cross-origin iframes by detecting cross-origin boundaries in the browser-side locator and handling them on the Node side via the automation framework's Frame API (CDP).
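The boundary detection can be illustrated with a small sketch. The PR names `CrossOriginIframeSignal` but this description does not list its fields, so the `SignalSketch` shape, the `IframeLike` stand-in, and the `probeIframe` helper below are all assumptions made for illustration. The sketch treats a null or throwing `contentDocument` as a cross-origin boundary and ignores iframe borders and padding when translating the point.

```typescript
// Hypothetical signal shape; the real CrossOriginIframeSignal may differ.
interface SignalSketch {
  crossOriginIframe: true;
  iframeXpath: string;
  // Point translated into the iframe's own coordinate space.
  x: number;
  y: number;
}

// Structural stand-in for HTMLIFrameElement so the sketch runs outside a browser.
interface IframeLike {
  readonly contentDocument: unknown;
  getBoundingClientRect(): { left: number; top: number };
}

/** True when the iframe's document cannot be reached from this context. */
function isCrossOrigin(iframe: IframeLike): boolean {
  try {
    return iframe.contentDocument == null;
  } catch {
    // Some access paths throw instead of returning null.
    return true;
  }
}

/** Return a signal at a cross-origin boundary, or null to keep traversing. */
function probeIframe(
  iframe: IframeLike,
  iframeXpath: string,
  x: number,
  y: number,
): SignalSketch | null {
  if (!isCrossOrigin(iframe)) return null;
  const rect = iframe.getBoundingClientRect();
  return {
    crossOriginIframe: true,
    iframeXpath,
    x: x - rect.left,
    y: y - rect.top,
  };
}
```

Returning a structured value instead of throwing lets the Node side distinguish "no element found" from "the element is behind a frame boundary" and continue the lookup inside the matching Frame.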
Why
Midscene's element caching uses XPath to avoid repeated AI calls for element location. For same-origin iframes this works because the browser-side script can traverse into the iframe's DOM. For cross-origin iframes, `contentDocument` returns null (or throws), so the XPath can never reach the inner element: cache writes store only the iframe's xpath, and cache reads fail with null.

The fix leverages the fact that both Playwright and Puppeteer can access any frame via CDP (`frame.evaluate()`), bypassing the browser's same-origin policy.

Changes
`packages/shared/src/extractor/locator.ts`
- `CrossOriginIframeSignal` interface, returned when `getXpathsByPoint` hits a cross-origin iframe boundary
- `getElementInfoByXpathInCurrentFrame()` for evaluating an xpath within a single frame context with a coordinate offset
- `getIframeRectByXpath()` for retrieving an iframe's bounding rect from its parent frame
- `SUB_XPATH_SEPARATOR` now comes from constants instead of being defined locally

`packages/shared/src/constants/index.ts`
- `SUB_XPATH_SEPARATOR` is defined here to align with project conventions for constants

`packages/shared/src/extractor/index.ts`
- Exports `getElementInfoByXpathInCurrentFrame`, `getIframeRectByXpath`, `CrossOriginIframeSignal`

`packages/web-integration/src/puppeteer/base-page.ts`
- `getXpathsByPoint()` now detects `CrossOriginIframeSignal` and delegates to the new `handleCrossOriginXpathsByPoint()`
- `handleCrossOriginXpathsByPoint()`: finds the Frame object via xpath, re-evaluates inside it, and supports recursive nested cross-origin iframes (max depth 5)
- `findFrameByXpath()` / `findSingleFrameByXpath()`: resolve a compound iframe xpath to Frame objects for both Puppeteer and Playwright
- `getElementInfoByXpath()` now splits compound xpaths and resolves them through multiple frames via `resolveMultiFrameElementInfo()`
- `resolveMultiFrameElementInfo()`: accumulates coordinate offsets through the iframe chain and evaluates the final xpath in the deepest frame
- `evaluateInContext()`: evaluates a script in any Page/Frame context

Validation
- `pnpm run lint`: passes
- `npx nx build @midscene/shared`: passes
- `npx nx build @midscene/web`: passes
- `npx nx test @midscene/shared`: 28 files, 324 tests passed
- `npx nx test @midscene/web`: 36 files, 312 tests passed
- … `|>>|`), second run successfully reads cache and locates elements via Frame API (execution time dropped from 84s to 50s due to skipped AI calls)
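
The coordinate handling listed under Changes for `resolveMultiFrameElementInfo()` can be sketched as plain arithmetic: each iframe's rect is measured in its parent frame's viewport, so the element's page-level rect is its rect in the deepest frame shifted by the sum of those origins. The `Rect` type and the `accumulateOffset` and `toTopLevelRect` helpers are hypothetical names for illustration, and the sketch assumes no iframe borders, padding, or CSS transforms.

```typescript
interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

/** Sum the origins of every iframe in the chain, outermost first. */
function accumulateOffset(iframeRects: Rect[]): { left: number; top: number } {
  return iframeRects.reduce(
    (offset, rect) => ({
      left: offset.left + rect.left,
      top: offset.top + rect.top,
    }),
    { left: 0, top: 0 },
  );
}

/** Translate a rect measured in the deepest frame into top-level page coordinates. */
function toTopLevelRect(rectInDeepestFrame: Rect, iframeRects: Rect[]): Rect {
  const offset = accumulateOffset(iframeRects);
  return {
    left: rectInDeepestFrame.left + offset.left,
    top: rectInDeepestFrame.top + offset.top,
    width: rectInDeepestFrame.width,
    height: rectInDeepestFrame.height,
  };
}

// Outer iframe at (100, 50) in the page, inner iframe at (10, 20) inside it.
const iframeChain: Rect[] = [
  { left: 100, top: 50, width: 800, height: 600 },
  { left: 10, top: 20, width: 400, height: 300 },
];
const buttonInFrame: Rect = { left: 5, top: 6, width: 30, height: 12 };
console.log(toTopLevelRect(buttonInFrame, iframeChain));
```

Width and height pass through unchanged because only the origin differs between frame-local and page-level coordinates under these assumptions.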