The Browser.AI DOM Processing Service is a sophisticated system that analyzes web pages, extracts interactive elements, and provides AI agents with structured information about the DOM. It combines JavaScript execution in the browser with Python-based processing to create a comprehensive understanding of web page structure and interactivity.
graph TB
subgraph "DOM Processing Pipeline"
DS[DOM Service]
JS[JavaScript Injector]
Builder[DOM Tree Builder]
Parser[Node Parser]
Processor[History Tree Processor]
end
subgraph "Browser Integration"
Page[Playwright Page]
Eval[JavaScript Evaluation]
Elements[DOM Elements]
Viewport[Viewport State]
end
subgraph "Data Structures"
Tree[DOM Tree]
SelectorMap[Selector Map]
Coordinates[Coordinate System]
History[Element History]
end
DS --> JS
JS --> Builder
Builder --> Parser
Parser --> Processor
Page --> Eval
Eval --> Elements
Elements --> Tree
Tree --> SelectorMap
Viewport --> Coordinates
SelectorMap --> History
style DS fill:#fff3e0
style Tree fill:#e8f5e8
style SelectorMap fill:#e1f5fe
class DomService:
def __init__(self, page: Page):
self.page = page
self.xpath_cache = {}
self.js_code = resources.read_text('browser_ai.dom', 'buildDomTree.js')
async def get_clickable_elements(
self,
highlight_elements: bool = True,
focus_element: int = -1,
viewport_expansion: int = 0,
) -> DOMStateThe DOM processing begins with sophisticated JavaScript code executed in the browser that analyzes the page structure:
// buildDomTree.js - Core DOM analysis engine
(args = {
doHighlightElements: true,
focusHighlightIndex: -1,
viewportExpansion: 0,
}) => {
const { doHighlightElements, focusHighlightIndex, viewportExpansion } = args;
let highlightIndex = 0;
const DOM_HASH_MAP = {};
// Element highlighting and analysis functions
function highlightElement(element, index, parentIframe = null) {
// Visual highlighting with color-coded indices
// Coordinate calculation and viewport analysis
}
function extractElementInfo(element) {
// Extract all relevant element properties
// Calculate positions and visibility
}
}sequenceDiagram
participant Python as DOM Service
participant JS as JavaScript Engine
participant Browser as Browser DOM
participant Elements as DOM Elements
Python->>JS: Execute buildDomTree.js
JS->>Browser: Traverse DOM Tree
Browser->>Elements: Query Element Properties
Elements-->>Browser: Element Data
Browser-->>JS: Processed Information
loop For Each Interactive Element
JS->>Browser: Check Visibility
JS->>Browser: Calculate Coordinates
JS->>Browser: Extract Attributes
JS->>Browser: Apply Highlighting
end
JS-->>Python: Return DOM Map + Metadata
Python->>Python: Build Python DOM Tree
The JavaScript engine identifies interactive elements based on:
-
Tag-based Selection
- Input elements (
input,textarea,select) - Interactive elements (
button,a,label) - Form elements (
form,fieldset)
- Input elements (
-
Attribute-based Selection
- Elements with
onclickhandlers - Elements with
hrefattributes - Elements with
tabindex - Elements with ARIA roles
- Elements with
-
Computed Style Analysis
cursor: pointerelements- Non-zero dimensions
- Visible elements (not
display: none)
-
Contextual Analysis
- Parent container interactivity
- Event listener presence
- Focus capability
classDiagram
class DOMBaseNode {
+bool is_visible
+Optional[DOMElementNode] parent
}
class DOMElementNode {
+str tag_name
+Optional[int] highlight_index
+dict[str, str] attributes
+list[DOMBaseNode] children
+str xpath
+Optional[CoordinateSet] viewport_coordinates
+Optional[CoordinateSet] page_coordinates
+Optional[ViewportInfo] viewport_info
+bool shadow_root
+get_advanced_css_selector()
+to_dict()
+from_dict()
}
class DOMTextNode {
+str text
+str type = 'TEXT_NODE'
+has_parent_with_highlight_index()
}
DOMBaseNode <|-- DOMElementNode
DOMBaseNode <|-- DOMTextNode
DOMElementNode --> DOMBaseNode : children
class Coordinates(BaseModel):
x: int
y: int
class CoordinateSet(BaseModel):
top_left: Coordinates
top_right: Coordinates
bottom_left: Coordinates
bottom_right: Coordinates
center: Coordinates
width: int
height: intclass ViewportInfo(BaseModel):
scroll_x: int
scroll_y: int
width: int
height: intclass DOMState(BaseModel):
element_tree: DOMElementNode # Root of the DOM tree
selector_map: SelectorMap # Index -> Element mappinggraph TD
Start[Page Load] --> Validate[Validate JavaScript]
Validate --> Inject[Inject buildDomTree.js]
Inject --> Execute[Execute with Parameters]
Execute --> Parse[Parse JavaScript Results]
Parse --> BuildTree[Build Python DOM Tree]
BuildTree --> ProcessNodes[Process Node Relationships]
ProcessNodes --> MapSelectors[Create Selector Map]
MapSelectors --> Cleanup[Memory Cleanup]
Cleanup --> Return[Return DOM State]
subgraph "Node Processing"
ProcessNodes --> ElementNodes[Process Element Nodes]
ProcessNodes --> TextNodes[Process Text Nodes]
ProcessNodes --> Coordinates[Calculate Coordinates]
ProcessNodes --> Attributes[Extract Attributes]
end
subgraph "Tree Building"
BuildTree --> ParentChild[Establish Parent-Child]
BuildTree --> Highlighting[Apply Highlighting]
BuildTree --> Indexing[Create Index Map]
end
style Start fill:#e8f5e8
style Return fill:#e8f5e8
def _parse_node(self, node_data: dict) -> tuple[Optional[DOMBaseNode], list[int]]:
"""Parse JavaScript node data into Python DOM node"""
# Handle text nodes
if node_data.get('type') == 'TEXT_NODE':
return DOMTextNode(
text=node_data['text'],
is_visible=node_data['isVisible'],
parent=None
), []
# Process element coordinates
viewport_coordinates = None
page_coordinates = None
viewport_info = None
# Extract coordinate information
if 'viewportCoordinates' in node_data:
viewport_coordinates = CoordinateSet(...)
if 'pageCoordinates' in node_data:
page_coordinates = CoordinateSet(...)
if 'viewport' in node_data:
viewport_info = ViewportInfo(...)
# Create element node
return DOMElementNode(...), children_idsThe History Tree Processor provides capabilities for tracking elements across page changes and maintaining element identity:
graph LR
subgraph "History Processing"
Convert[Convert to History Element]
Hash[Generate Element Hash]
Find[Find in Current Tree]
Match[Match Elements]
end
subgraph "Hashing Strategy"
Branch[Branch Path Hash]
Attr[Attributes Hash]
XPath[XPath Hash]
Combined[Combined Hash]
end
Convert --> Hash
Hash --> Branch
Hash --> Attr
Hash --> XPath
Branch --> Combined
Attr --> Combined
XPath --> Combined
Combined --> Find
Find --> Match
style Convert fill:#fff3e0
style Match fill:#e8f5e8
class HistoryTreeProcessor:
"""Operations on DOM elements for history tracking"""
@staticmethod
def convert_dom_element_to_history_element(
dom_element: DOMElementNode
) -> DOMHistoryElement:
"""Convert current DOM element to history-trackable format"""
parent_branch_path = HistoryTreeProcessor._get_parent_branch_path(dom_element)
css_selector = dom_element.get_advanced_css_selector()
return DOMHistoryElement(
dom_element.tag_name,
dom_element.xpath,
dom_element.highlight_index,
parent_branch_path,
dom_element.attributes,
dom_element.shadow_root,
css_selector=css_selector,
page_coordinates=dom_element.page_coordinates,
viewport_coordinates=dom_element.viewport_coordinates,
viewport_info=dom_element.viewport_info,
)
@staticmethod
def find_history_element_in_tree(
dom_history_element: DOMHistoryElement,
tree: DOMElementNode
) -> Optional[DOMElementNode]:
"""Find historical element in current DOM tree"""
hashed_history = HistoryTreeProcessor._hash_dom_history_element(dom_history_element)
def process_node(node: DOMElementNode):
if node.highlight_index is not None:
hashed_node = HistoryTreeProcessor._hash_dom_element(node)
if hashed_node == hashed_history:
return node
for child in node.children:
if isinstance(child, DOMElementNode):
result = process_node(child)
if result is not None:
return result
return None
return process_node(tree)@dataclass
class HashedDomElement:
"""Hash representation of DOM element for identity tracking"""
branch_path_hash: str # Hash of parent element path
attributes_hash: str # Hash of element attributes
xpath_hash: str # Hash of XPath selectorgraph TB
subgraph "Visual Highlighting"
Create[Create Highlight Container]
Generate[Generate Color Scheme]
Position[Calculate Positions]
Overlay[Create Visual Overlay]
end
subgraph "Color System"
Colors[Iron Man Theme Colors]
Index[Index-based Selection]
Focus[Focus Element Emphasis]
end
subgraph "Coordinate Mapping"
Viewport[Viewport Coordinates]
Page[Page Coordinates]
Relative[Relative Positioning]
end
Create --> Generate
Generate --> Position
Position --> Overlay
Colors --> Index
Index --> Focus
Viewport --> Relative
Page --> Relative
style Create fill:#e8f5e8
style Colors fill:#fff3e0
The DOM service provides comprehensive Shadow DOM support:
// Shadow DOM traversal in buildDomTree.js
function processShadowRoots(element) {
if (element.shadowRoot) {
// Process shadow DOM content
const shadowElements = extractFromShadowRoot(element.shadowRoot);
return processShadowElements(shadowElements);
}
return [];
}async def get_clickable_elements(
self,
highlight_elements: bool = True,
focus_element: int = -1,
viewport_expansion: int = 0, # Expand viewport analysis area
) -> DOMState:
"""Extract clickable elements with viewport awareness"""The viewport_expansion parameter allows analysis beyond the visible viewport, useful for:
- Pre-loading elements that will become visible
- Analyzing scroll-triggered content
- Comprehensive page understanding
# Efficient memory cleanup after processing
html_to_dict = node_map[js_root_id]
del node_map
del js_node_map
del js_root_id
gc.collect() # Force garbage collectionclass DomService:
def __init__(self, page: Page):
self.xpath_cache = {} # Cache XPath calculationsThe DOM service supports selective element processing:
- Attribute Filtering: Include only specified attributes
- Element Type Filtering: Focus on specific element types
- Visibility Filtering: Process only visible elements
- Region-based Processing: Analyze specific page regions
sequenceDiagram
participant Agent
participant Browser
participant DOMService
participant JavaScript
Agent->>Browser: get_current_state()
Browser->>DOMService: get_clickable_elements()
DOMService->>JavaScript: Execute buildDomTree.js
JavaScript-->>DOMService: DOM Map + Metadata
DOMService->>DOMService: Build Python Tree
DOMService-->>Browser: DOM State
Browser-->>Agent: Browser State with DOM
Agent->>Agent: Analyze Interactive Elements
Agent->>Browser: Execute Action on Element[index]
# Agent uses selector map to find elements
selector_map = dom_state.selector_map
target_element = selector_map[highlight_index]
# Execute action on specific element
await controller.execute_action({
"click_element": {"index": highlight_index}
})# Validate JavaScript execution capability
if await self.page.evaluate('1+1') != 2:
raise ValueError('The page cannot evaluate javascript code properly')def _parse_node(self, node_data: dict):
if not node_data:
return None, []
# Graceful handling of missing data
# Fallback strategies for malformed DOM
# Type validation for all extracted data- Automatic cleanup of large DOM structures
- Garbage collection after processing
- Cache size limits for XPath storage
- Selective data retention
from browser_ai import DomService
from playwright.async_api import async_playwright
async with async_playwright() as p:
browser = await p.chromium.launch()
page = await browser.new_page()
await page.goto("https://example.com")
dom_service = DomService(page)
dom_state = await dom_service.get_clickable_elements(
highlight_elements=True,
focus_element=-1,
viewport_expansion=100
)
print(f"Found {len(dom_state.selector_map)} interactive elements")
# Access specific element
first_element = dom_state.selector_map[0]
print(f"Element: {first_element.tag_name}")
print(f"Attributes: {first_element.attributes}")from browser_ai.dom.history_tree_processor.service import HistoryTreeProcessor
# Extract DOM state
dom_state = await dom_service.get_clickable_elements()
# Convert elements to history format
history_elements = []
for element in dom_state.selector_map.values():
history_element = HistoryTreeProcessor.convert_dom_element_to_history_element(element)
history_elements.append(history_element)
# Navigate to new page
await page.goto("https://example.com/page2")
# Get new DOM state
new_dom_state = await dom_service.get_clickable_elements()
# Find matching elements
for history_element in history_elements:
current_element = HistoryTreeProcessor.find_history_element_in_tree(
history_element,
new_dom_state.element_tree
)
if current_element:
print(f"Found matching element: {current_element.highlight_index}")# Always clean up resources
try:
dom_state = await dom_service.get_clickable_elements()
# Process DOM state
finally:
# Cleanup handled automatically by garbage collection
passtry:
dom_state = await dom_service.get_clickable_elements()
except ValueError as e:
logger.error(f"DOM processing failed: {e}")
# Implement fallback strategy# For large pages, consider viewport expansion limits
dom_state = await dom_service.get_clickable_elements(
highlight_elements=True, # Enable for visual debugging
focus_element=-1, # No focus highlighting
viewport_expansion=0 # Limit to visible area only
)This documentation provides comprehensive coverage of the DOM processing system. The combination of JavaScript-based extraction and Python-based processing creates a robust foundation for AI-driven browser automation.