Skip to content

Latest commit

 

History

History
652 lines (513 loc) · 17 KB

File metadata and controls

652 lines (513 loc) · 17 KB

DOM Processing and Element Extraction

Overview

The Browser.AI DOM Processing Service is a sophisticated system that analyzes web pages, extracts interactive elements, and provides AI agents with structured information about the DOM. It combines JavaScript execution in the browser with Python-based processing to create a comprehensive understanding of web page structure and interactivity.

Architecture

DOM Service Components

graph TB
    subgraph "DOM Processing Pipeline"
        DS[DOM Service]
        JS[JavaScript Injector]
        Builder[DOM Tree Builder]
        Parser[Node Parser]
        Processor[History Tree Processor]
    end
    
    subgraph "Browser Integration"
        Page[Playwright Page]
        Eval[JavaScript Evaluation]
        Elements[DOM Elements]
        Viewport[Viewport State]
    end
    
    subgraph "Data Structures"
        Tree[DOM Tree]
        SelectorMap[Selector Map]
        Coordinates[Coordinate System]
        History[Element History]
    end
    
    DS --> JS
    JS --> Builder
    Builder --> Parser
    Parser --> Processor
    Page --> Eval
    Eval --> Elements
    Elements --> Tree
    Tree --> SelectorMap
    Viewport --> Coordinates
    SelectorMap --> History
    
    style DS fill:#fff3e0
    style Tree fill:#e8f5e8
    style SelectorMap fill:#e1f5fe
Loading

Core DOM Service Structure

class DomService:
    def __init__(self, page: Page):
        self.page = page
        self.xpath_cache = {}
        self.js_code = resources.read_text('browser_ai.dom', 'buildDomTree.js')
    
    async def get_clickable_elements(
        self,
        highlight_elements: bool = True,
        focus_element: int = -1,
        viewport_expansion: int = 0,
    ) -> DOMState

JavaScript DOM Tree Builder

Client-Side DOM Analysis

The DOM processing begins with sophisticated JavaScript code executed in the browser that analyzes the page structure:

// buildDomTree.js - Core DOM analysis engine
(args = {
    doHighlightElements: true,
    focusHighlightIndex: -1,
    viewportExpansion: 0,
}) => {
    const { doHighlightElements, focusHighlightIndex, viewportExpansion } = args;
    let highlightIndex = 0;
    
    const DOM_HASH_MAP = {};
    
    // Element highlighting and analysis functions
    function highlightElement(element, index, parentIframe = null) {
        // Visual highlighting with color-coded indices
        // Coordinate calculation and viewport analysis
    }
    
    function extractElementInfo(element) {
        // Extract all relevant element properties
        // Calculate positions and visibility
    }
}

Element Analysis Process

sequenceDiagram
    participant Python as DOM Service
    participant JS as JavaScript Engine
    participant Browser as Browser DOM
    participant Elements as DOM Elements
    
    Python->>JS: Execute buildDomTree.js
    JS->>Browser: Traverse DOM Tree
    Browser->>Elements: Query Element Properties
    Elements-->>Browser: Element Data
    Browser-->>JS: Processed Information
    
    loop For Each Interactive Element
        JS->>Browser: Check Visibility
        JS->>Browser: Calculate Coordinates
        JS->>Browser: Extract Attributes
        JS->>Browser: Apply Highlighting
    end
    
    JS-->>Python: Return DOM Map + Metadata
    Python->>Python: Build Python DOM Tree
Loading

Element Selection Criteria

The JavaScript engine identifies interactive elements based on:

  1. Tag-based Selection

    • Input elements (input, textarea, select)
    • Interactive elements (button, a, label)
    • Form elements (form, fieldset)
  2. Attribute-based Selection

    • Elements with onclick handlers
    • Elements with href attributes
    • Elements with tabindex
    • Elements with ARIA roles
  3. Computed Style Analysis

    • cursor: pointer elements
    • Non-zero dimensions
    • Visible elements (not display: none)
  4. Contextual Analysis

    • Parent container interactivity
    • Event listener presence
    • Focus capability

Python DOM Tree Structure

DOM Node Hierarchy

classDiagram
    class DOMBaseNode {
        +bool is_visible
        +Optional[DOMElementNode] parent
    }
    
    class DOMElementNode {
        +str tag_name
        +Optional[int] highlight_index
        +dict[str, str] attributes
        +list[DOMBaseNode] children
        +str xpath
        +Optional[CoordinateSet] viewport_coordinates
        +Optional[CoordinateSet] page_coordinates
        +Optional[ViewportInfo] viewport_info
        +bool shadow_root
        +get_advanced_css_selector()
        +to_dict()
        +from_dict()
    }
    
    class DOMTextNode {
        +str text
        +str type = 'TEXT_NODE'
        +has_parent_with_highlight_index()
    }
    
    DOMBaseNode <|-- DOMElementNode
    DOMBaseNode <|-- DOMTextNode
    DOMElementNode --> DOMBaseNode : children
Loading

Data Structures

Coordinate System

class Coordinates(BaseModel):
    x: int
    y: int

class CoordinateSet(BaseModel):
    top_left: Coordinates
    top_right: Coordinates
    bottom_left: Coordinates
    bottom_right: Coordinates
    center: Coordinates
    width: int
    height: int

Viewport Information

class ViewportInfo(BaseModel):
    scroll_x: int
    scroll_y: int
    width: int
    height: int

DOM State

class DOMState(BaseModel):
    element_tree: DOMElementNode      # Root of the DOM tree
    selector_map: SelectorMap         # Index -> Element mapping

Element Processing Workflow

Complete DOM Processing Pipeline

graph TD
    Start[Page Load] --> Validate[Validate JavaScript]
    Validate --> Inject[Inject buildDomTree.js]
    Inject --> Execute[Execute with Parameters]
    Execute --> Parse[Parse JavaScript Results]
    Parse --> BuildTree[Build Python DOM Tree]
    BuildTree --> ProcessNodes[Process Node Relationships]
    ProcessNodes --> MapSelectors[Create Selector Map]
    MapSelectors --> Cleanup[Memory Cleanup]
    Cleanup --> Return[Return DOM State]
    
    subgraph "Node Processing"
        ProcessNodes --> ElementNodes[Process Element Nodes]
        ProcessNodes --> TextNodes[Process Text Nodes]
        ProcessNodes --> Coordinates[Calculate Coordinates]
        ProcessNodes --> Attributes[Extract Attributes]
    end
    
    subgraph "Tree Building"
        BuildTree --> ParentChild[Establish Parent-Child]
        BuildTree --> Highlighting[Apply Highlighting]
        BuildTree --> Indexing[Create Index Map]
    end
    
    style Start fill:#e8f5e8
    style Return fill:#e8f5e8
Loading

Node Parsing Algorithm

def _parse_node(self, node_data: dict) -> tuple[Optional[DOMBaseNode], list[int]]:
    """Parse JavaScript node data into Python DOM node"""
    
    # Handle text nodes
    if node_data.get('type') == 'TEXT_NODE':
        return DOMTextNode(
            text=node_data['text'],
            is_visible=node_data['isVisible'],
            parent=None
        ), []
    
    # Process element coordinates
    viewport_coordinates = None
    page_coordinates = None
    viewport_info = None
    
    # Extract coordinate information
    if 'viewportCoordinates' in node_data:
        viewport_coordinates = CoordinateSet(...)
        
    if 'pageCoordinates' in node_data:
        page_coordinates = CoordinateSet(...)
        
    if 'viewport' in node_data:
        viewport_info = ViewportInfo(...)
    
    # Create element node
    return DOMElementNode(...), children_ids

History Tree Processing

Element Tracking and Persistence

The History Tree Processor provides capabilities for tracking elements across page changes and maintaining element identity:

graph LR
    subgraph "History Processing"
        Convert[Convert to History Element]
        Hash[Generate Element Hash]
        Find[Find in Current Tree]
        Match[Match Elements]
    end
    
    subgraph "Hashing Strategy"
        Branch[Branch Path Hash]
        Attr[Attributes Hash]
        XPath[XPath Hash]
        Combined[Combined Hash]
    end
    
    Convert --> Hash
    Hash --> Branch
    Hash --> Attr
    Hash --> XPath
    Branch --> Combined
    Attr --> Combined
    XPath --> Combined
    Combined --> Find
    Find --> Match
    
    style Convert fill:#fff3e0
    style Match fill:#e8f5e8
Loading

Element Identity Management

class HistoryTreeProcessor:
    """Operations on DOM elements for history tracking"""
    
    @staticmethod
    def convert_dom_element_to_history_element(
        dom_element: DOMElementNode
    ) -> DOMHistoryElement:
        """Convert current DOM element to history-trackable format"""
        
        parent_branch_path = HistoryTreeProcessor._get_parent_branch_path(dom_element)
        css_selector = dom_element.get_advanced_css_selector()
        
        return DOMHistoryElement(
            dom_element.tag_name,
            dom_element.xpath,
            dom_element.highlight_index,
            parent_branch_path,
            dom_element.attributes,
            dom_element.shadow_root,
            css_selector=css_selector,
            page_coordinates=dom_element.page_coordinates,
            viewport_coordinates=dom_element.viewport_coordinates,
            viewport_info=dom_element.viewport_info,
        )
    
    @staticmethod
    def find_history_element_in_tree(
        dom_history_element: DOMHistoryElement,
        tree: DOMElementNode
    ) -> Optional[DOMElementNode]:
        """Find historical element in current DOM tree"""
        
        hashed_history = HistoryTreeProcessor._hash_dom_history_element(dom_history_element)
        
        def process_node(node: DOMElementNode):
            if node.highlight_index is not None:
                hashed_node = HistoryTreeProcessor._hash_dom_element(node)
                if hashed_node == hashed_history:
                    return node
            
            for child in node.children:
                if isinstance(child, DOMElementNode):
                    result = process_node(child)
                    if result is not None:
                        return result
            return None
        
        return process_node(tree)

Element Hashing Strategy

@dataclass
class HashedDomElement:
    """Hash representation of DOM element for identity tracking"""
    branch_path_hash: str      # Hash of parent element path
    attributes_hash: str       # Hash of element attributes
    xpath_hash: str           # Hash of XPath selector

Advanced DOM Features

Element Highlighting System

graph TB
    subgraph "Visual Highlighting"
        Create[Create Highlight Container]
        Generate[Generate Color Scheme]
        Position[Calculate Positions]
        Overlay[Create Visual Overlay]
    end
    
    subgraph "Color System"
        Colors[Iron Man Theme Colors]
        Index[Index-based Selection]
        Focus[Focus Element Emphasis]
    end
    
    subgraph "Coordinate Mapping"
        Viewport[Viewport Coordinates]
        Page[Page Coordinates]
        Relative[Relative Positioning]
    end
    
    Create --> Generate
    Generate --> Position
    Position --> Overlay
    Colors --> Index
    Index --> Focus
    Viewport --> Relative
    Page --> Relative
    
    style Create fill:#e8f5e8
    style Colors fill:#fff3e0
Loading

Shadow DOM Support

The DOM service provides comprehensive Shadow DOM support:

// Shadow DOM traversal in buildDomTree.js
function processShadowRoots(element) {
    if (element.shadowRoot) {
        // Process shadow DOM content
        const shadowElements = extractFromShadowRoot(element.shadowRoot);
        return processShadowElements(shadowElements);
    }
    return [];
}

Viewport-Aware Processing

async def get_clickable_elements(
    self,
    highlight_elements: bool = True,
    focus_element: int = -1,
    viewport_expansion: int = 0,  # Expand viewport analysis area
) -> DOMState:
    """Extract clickable elements with viewport awareness"""

The viewport_expansion parameter allows analysis beyond the visible viewport, useful for:

  • Pre-loading elements that will become visible
  • Analyzing scroll-triggered content
  • Comprehensive page understanding

Performance Optimization

Memory Management

# Efficient memory cleanup after processing
html_to_dict = node_map[js_root_id]

del node_map
del js_node_map  
del js_root_id

gc.collect()  # Force garbage collection

Caching Strategy

class DomService:
    def __init__(self, page: Page):
        self.xpath_cache = {}  # Cache XPath calculations

Selective Processing

The DOM service supports selective element processing:

  1. Attribute Filtering: Include only specified attributes
  2. Element Type Filtering: Focus on specific element types
  3. Visibility Filtering: Process only visible elements
  4. Region-based Processing: Analyze specific page regions

Integration with Agent System

DOM State in Agent Flow

sequenceDiagram
    participant Agent
    participant Browser
    participant DOMService
    participant JavaScript
    
    Agent->>Browser: get_current_state()
    Browser->>DOMService: get_clickable_elements()
    DOMService->>JavaScript: Execute buildDomTree.js
    JavaScript-->>DOMService: DOM Map + Metadata
    DOMService->>DOMService: Build Python Tree
    DOMService-->>Browser: DOM State
    Browser-->>Agent: Browser State with DOM
    
    Agent->>Agent: Analyze Interactive Elements
    Agent->>Browser: Execute Action on Element[index]
Loading

Element Selection for Actions

# Agent uses selector map to find elements
selector_map = dom_state.selector_map
target_element = selector_map[highlight_index]

# Execute action on specific element
await controller.execute_action({
    "click_element": {"index": highlight_index}
})

Error Handling and Resilience

JavaScript Execution Validation

# Validate JavaScript execution capability
if await self.page.evaluate('1+1') != 2:
    raise ValueError('The page cannot evaluate javascript code properly')

DOM Parsing Robustness

def _parse_node(self, node_data: dict):
    if not node_data:
        return None, []
    
    # Graceful handling of missing data
    # Fallback strategies for malformed DOM
    # Type validation for all extracted data

Memory Safety

  • Automatic cleanup of large DOM structures
  • Garbage collection after processing
  • Cache size limits for XPath storage
  • Selective data retention

Usage Examples

Basic DOM Extraction

from browser_ai import DomService
from playwright.async_api import async_playwright

async with async_playwright() as p:
    browser = await p.chromium.launch()
    page = await browser.new_page()
    await page.goto("https://example.com")
    
    dom_service = DomService(page)
    dom_state = await dom_service.get_clickable_elements(
        highlight_elements=True,
        focus_element=-1,
        viewport_expansion=100
    )
    
    print(f"Found {len(dom_state.selector_map)} interactive elements")
    
    # Access specific element
    first_element = dom_state.selector_map[0]
    print(f"Element: {first_element.tag_name}")
    print(f"Attributes: {first_element.attributes}")

Advanced Processing with History Tracking

from browser_ai.dom.history_tree_processor.service import HistoryTreeProcessor

# Extract DOM state
dom_state = await dom_service.get_clickable_elements()

# Convert elements to history format
history_elements = []
for element in dom_state.selector_map.values():
    history_element = HistoryTreeProcessor.convert_dom_element_to_history_element(element)
    history_elements.append(history_element)

# Navigate to new page
await page.goto("https://example.com/page2")

# Get new DOM state
new_dom_state = await dom_service.get_clickable_elements()

# Find matching elements
for history_element in history_elements:
    current_element = HistoryTreeProcessor.find_history_element_in_tree(
        history_element, 
        new_dom_state.element_tree
    )
    if current_element:
        print(f"Found matching element: {current_element.highlight_index}")

Best Practices

1. Resource Management

# Always clean up resources
try:
    dom_state = await dom_service.get_clickable_elements()
    # Process DOM state
finally:
    # Cleanup handled automatically by garbage collection
    pass

2. Error Handling

try:
    dom_state = await dom_service.get_clickable_elements()
except ValueError as e:
    logger.error(f"DOM processing failed: {e}")
    # Implement fallback strategy

3. Performance Tuning

# For large pages, consider viewport expansion limits
dom_state = await dom_service.get_clickable_elements(
    highlight_elements=True,      # Enable for visual debugging
    focus_element=-1,             # No focus highlighting
    viewport_expansion=0          # Limit to visible area only
)

This documentation provides comprehensive coverage of the DOM processing system. The combination of JavaScript-based extraction and Python-based processing creates a robust foundation for AI-driven browser automation.