feat: Layout inspection commands to reduce screenshot dependency #117

@szymdzum

Vision

Make screenshots the escape hatch for visual debugging, not the default way to "see" a page.

Problem

Agents use screenshots because they want to "see" the page. But most intents don't need pixels:

| Intent | What They Actually Need | Current Solution | Better Solution |
| --- | --- | --- | --- |
| "What's on this page?" | Structure, content | Screenshot (expensive) | DOM / A11y tree |
| "Where is the button?" | Element location | Screenshot (expensive) | Bounding boxes |
| "What's below the fold?" | More content | Full-page screenshot | Scroll + query |
| "Why does this look broken?" | Visual rendering | Screenshot | Screenshot ✓ |

Only visual debugging truly needs pixels.

Proposed Commands

1. bdg dom layout [selector]

Returns element positions, sizes, and visibility without pixels.

bdg dom layout "button.submit"
{
  "selector": "button.submit",
  "count": 1,
  "elements": [{
    "index": 0,
    "tag": "button",
    "text": "Submit",
    "bounds": { "x": 450, "y": 1200, "width": 120, "height": 40 },
    "viewport": { "visible": false, "belowFold": true, "percentVisible": 0 },
    "computed": { "display": "block", "visibility": "visible" }
  }]
}

Use case: An agent needs to know where something is without burning tokens on a screenshot.
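The "viewport" block in the example output could be derived from the bounds alone. Below is a minimal sketch, not the actual bdg implementation. It assumes that "bounds" are page coordinates in CSS pixels, that horizontal scroll is zero, and that the viewport is 1280×800; the type and function names (Bounds, Viewport, classifyViewport) are hypothetical.

```typescript
// Hypothetical types; field names mirror the JSON example above.
interface Bounds { x: number; y: number; width: number; height: number; }
interface Viewport { scrollY: number; width: number; height: number; }
interface ViewportInfo { visible: boolean; belowFold: boolean; percentVisible: number; }

// Assumption: bounds are page coordinates (CSS pixels from the document's
// top-left corner) and the page is not scrolled horizontally.
function classifyViewport(b: Bounds, vp: Viewport): ViewportInfo {
  const top = vp.scrollY;
  const bottom = vp.scrollY + vp.height;
  // Overlap between the element and the viewport on each axis.
  const overlapY = Math.max(0, Math.min(b.y + b.height, bottom) - Math.max(b.y, top));
  const overlapX = Math.max(0, Math.min(b.x + b.width, vp.width) - Math.max(b.x, 0));
  const area = b.width * b.height;
  const percentVisible = area > 0 ? Math.round((overlapX * overlapY * 100) / area) : 0;
  return { visible: percentVisible > 0, belowFold: b.y >= bottom, percentVisible };
}

// The button from the example, with an assumed 1280x800 viewport at scroll position 0.
const info = classifyViewport(
  { x: 450, y: 1200, width: 120, height: 40 },
  { scrollY: 0, width: 1280, height: 800 },
);
console.log(JSON.stringify(info));
// prints {"visible":false,"belowFold":true,"percentVisible":0}
```

Computing this from plain numbers keeps the command cheap: no pixels are rendered or encoded, only a handful of integers are returned.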

2. bdg dom scroll <selector>

Scrolls an element into the viewport, or scrolls the page to an edge or by a pixel offset.

bdg dom scroll "footer"           # Scroll to footer
bdg dom scroll --to "bottom"      # Scroll to page bottom
bdg dom scroll --by 500           # Scroll down 500px
bdg dom scroll --to "top"         # Back to top

Use case: Navigate long pages without full-page screenshots.
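One way the scroll forms above might map to page-side JavaScript is sketched below. This is an illustration under assumptions, not the proposed implementation: the ScrollTarget type and buildScrollExpression function are hypothetical, and the resulting string is only meant to be the kind of expression one could pass to Runtime.evaluate.

```typescript
// Hypothetical union covering the invocation forms shown above.
type ScrollTarget =
  | { kind: "selector"; selector: string }
  | { kind: "to"; edge: "top" | "bottom" }
  | { kind: "by"; pixels: number };

// Builds a JavaScript expression string to be evaluated in the page.
function buildScrollExpression(target: ScrollTarget): string {
  switch (target.kind) {
    case "selector":
      // JSON.stringify quotes and escapes the selector so it is a valid JS string literal.
      return `document.querySelector(${JSON.stringify(target.selector)})?.scrollIntoView({ block: "center" })`;
    case "to":
      return target.edge === "top"
        ? "window.scrollTo(0, 0)"
        : "window.scrollTo(0, document.documentElement.scrollHeight)";
    case "by":
      return `window.scrollBy(0, ${target.pixels})`;
  }
}

console.log(buildScrollExpression({ kind: "selector", selector: "footer" }));
console.log(buildScrollExpression({ kind: "to", edge: "bottom" }));
console.log(buildScrollExpression({ kind: "by", pixels: 500 }));
```

Escaping the selector with JSON.stringify avoids breaking the expression when a selector contains quotes, such as an attribute selector.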

3. Enhanced bdg dom a11y tree output

Add position hints to the accessibility tree:

[Button] "Submit" (below fold, y=1200)
[Link] "Learn more" (visible, y=450)
[Image] "Hero banner" (above fold, 1200×400)

Use case: Agent can understand page layout from a11y tree without screenshots.
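The hint format above could be produced by a small formatter once each node's position is known. The sketch below covers only the "visible" and "below fold" hints with a y coordinate; the size-only variant shown for the image is left out. The AxNodeHint shape and formatA11yLine function are hypothetical, and the 800px viewport height is an assumption.

```typescript
// Hypothetical shape: an accessibility node joined with its page-relative position.
interface AxNodeHint { role: string; name: string; y: number; height: number; }

// Formats one tree line with a position hint relative to the current viewport.
function formatA11yLine(node: AxNodeHint, scrollY: number, viewportHeight: number): string {
  const top = node.y;
  const bottom = node.y + node.height;
  let where: string;
  if (top >= scrollY + viewportHeight) {
    where = "below fold";
  } else if (bottom <= scrollY) {
    where = "above viewport"; // already scrolled past (label is an assumption)
  } else {
    where = "visible";
  }
  return `[${node.role}] "${node.name}" (${where}, y=${node.y})`;
}

console.log(formatA11yLine({ role: "Button", name: "Submit", y: 1200, height: 40 }, 0, 800));
// prints [Button] "Submit" (below fold, y=1200)
console.log(formatA11yLine({ role: "Link", name: "Learn more", y: 450, height: 20 }, 0, 800));
// prints [Link] "Learn more" (visible, y=450)
```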

Workflow Example

# Old way (expensive)
bdg dom screenshot page.png      # 12,000 tokens burned

# New way (cheap)
bdg dom a11y tree               # ~500 tokens, shows structure
bdg dom layout "form"            # ~100 tokens, shows position
bdg dom scroll "form"            # 0 tokens, brings into view
bdg dom screenshot form.png --selector "form"  # ~500 tokens, element only

Implementation Notes

  • layout uses DOM.getBoxModel and DOM.getDocument
  • scroll uses Runtime.evaluate with scrollIntoView()
  • A11y enhancement uses existing Accessibility.getFullAXTree + position data
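DOM.getBoxModel reports each box as a quad: eight numbers giving four (x, y) corner points. A minimal sketch of collapsing a quad into the "bounds" shape used by the layout output follows. The Rect type and quadToBounds function are hypothetical, and whether the coordinates need a scroll offset added to become page-relative is an assumption that should be checked against the protocol documentation.

```typescript
// Hypothetical output shape matching the "bounds" object in the layout example.
interface Rect { x: number; y: number; width: number; height: number; }

// quad: [x1, y1, x2, y2, x3, y3, x4, y4], the corner points of one box.
function quadToBounds(quad: number[]): Rect {
  if (quad.length !== 8) {
    throw new Error(`expected 8 numbers in quad, got ${quad.length}`);
  }
  // Even indices hold x values, odd indices hold y values.
  const xs = quad.filter((_, i) => i % 2 === 0);
  const ys = quad.filter((_, i) => i % 2 === 1);
  const x = Math.min(...xs);
  const y = Math.min(...ys);
  // Taking min/max yields the axis-aligned bounding box even for rotated elements.
  return { x, y, width: Math.max(...xs) - x, height: Math.max(...ys) - y };
}

const rect = quadToBounds([450, 1200, 570, 1200, 570, 1240, 450, 1240]);
console.log(JSON.stringify(rect));
// prints {"x":450,"y":1200,"width":120,"height":40}
```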

Acceptance Criteria

  • bdg dom layout [selector] returns bounding boxes and visibility
  • bdg dom scroll <selector> scrolls element into view
  • bdg dom scroll --to top|bottom for page navigation
  • A11y tree includes position hints (above/below fold)
  • Update skill docs with "screenshots as last resort" guidance

Priority

This is a strategic feature for token efficiency. Complements #116 (smart resize) as the long-term solution.

Labels

  • enhancement
  • agent-friendly
  • strategic

Metadata

    Labels

    agent-friendly (Improves agent workflows), enhancement (New feature or request), high-impact (High value for agents/users)
