Labels
agent-friendly (Improves agent workflows), enhancement (New feature or request), high-impact (High value for agents/users)
Description
Vision
Make screenshots the escape hatch for visual debugging, not the default way to "see" a page.
Problem
Agents use screenshots because they want to "see" the page. But most intents don't need pixels:
| Intent | What They Actually Need | Current Solution | Better Solution |
|---|---|---|---|
| "What's on this page?" | Structure, content | Screenshot (expensive) | DOM / A11y tree |
| "Where is the button?" | Element location | Screenshot (expensive) | Bounding boxes |
| "What's below the fold?" | More content | Full-page screenshot | Scroll + query |
| "Why does this look broken?" | Visual rendering | Screenshot | Screenshot ✓ |
Only visual debugging truly needs pixels.
Proposed Commands
1. bdg dom layout [selector]
Returns element positions, sizes, and visibility without pixels.
bdg dom layout "button.submit"
{
  "selector": "button.submit",
  "count": 1,
  "elements": [{
    "index": 0,
    "tag": "button",
    "text": "Submit",
    "bounds": { "x": 450, "y": 1200, "width": 120, "height": 40 },
    "viewport": { "visible": false, "belowFold": true, "percentVisible": 0 },
    "computed": { "display": "block", "visibility": "visible" }
  }]
}
Use case: The agent needs to know where something is without burning tokens on a screenshot.
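The "viewport" field in the output above could be derived from the element bounds alone. The following is a minimal sketch, not the actual bdg implementation: it assumes bounds and scroll offsets are in page coordinates (CSS pixels), a hypothetical 1280×800 viewport, and that percentVisible means the fraction of the element's area inside the viewport. The names `Bounds` and `viewport_info` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Bounds:
    x: float
    y: float
    width: float
    height: float


def viewport_info(b: Bounds, vw: float, vh: float,
                  scroll_x: float = 0.0, scroll_y: float = 0.0) -> dict:
    """Classify an element rectangle against the current viewport.

    Assumption: all values are CSS pixels in page coordinates.
    """
    # Width and height of the overlap between element and viewport (0 if none).
    overlap_w = max(0.0, min(b.x + b.width, scroll_x + vw) - max(b.x, scroll_x))
    overlap_h = max(0.0, min(b.y + b.height, scroll_y + vh) - max(b.y, scroll_y))
    area = b.width * b.height
    percent = round(100 * overlap_w * overlap_h / area) if area > 0 else 0
    return {
        "visible": percent > 0,
        # The element starts at or below the bottom edge of the viewport.
        "belowFold": b.y >= scroll_y + vh,
        "percentVisible": percent,
    }


# The "button.submit" bounds from the example, with an assumed 1280x800 viewport.
info = viewport_info(Bounds(450, 1200, 120, 40), vw=1280, vh=800)
print(info)
```

With the assumed viewport, the button at y=1200 has no overlap with the visible area, which is consistent with the visible=false, belowFold=true, percentVisible=0 values shown in the example.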
2. bdg dom scroll <selector>
Scroll element into viewport.
bdg dom scroll "footer" # Scroll to footer
bdg dom scroll --to "bottom" # Scroll to page bottom
bdg dom scroll --by 500 # Scroll down 500px
bdg dom scroll --to "top" # Back to top
Use case: Navigate long pages without full-page screenshots.
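One way the scroll variants above might map onto a single Runtime.evaluate call is sketched below. This is an assumption about how the command could be built, not the project's code: `scroll_expression` and `runtime_evaluate` are hypothetical helpers, and the `block: 'center'` option is an arbitrary choice. The browser calls used (`scrollIntoView`, `window.scrollTo`, `window.scrollBy`) are standard DOM APIs.

```python
import json


def scroll_expression(selector=None, to=None, by=None) -> str:
    """Build the JavaScript expression for one scroll variant."""
    if selector is not None:
        # json.dumps yields a safely quoted JavaScript string literal.
        sel = json.dumps(selector)
        return f"document.querySelector({sel})?.scrollIntoView({{block: 'center'}})"
    if to == "top":
        return "window.scrollTo(0, 0)"
    if to == "bottom":
        return "window.scrollTo(0, document.documentElement.scrollHeight)"
    if by is not None:
        return f"window.scrollBy(0, {int(by)})"
    raise ValueError("expected a selector, to='top'|'bottom', or by=<pixels>")


def runtime_evaluate(expression: str, message_id: int = 1) -> dict:
    """Wrap an expression in a CDP Runtime.evaluate message."""
    return {
        "id": message_id,
        "method": "Runtime.evaluate",
        "params": {"expression": expression},
    }


print(runtime_evaluate(scroll_expression(selector="footer")))
print(runtime_evaluate(scroll_expression(to="bottom"), message_id=2))
print(runtime_evaluate(scroll_expression(by=500), message_id=3))
```

Quoting the selector through `json.dumps` keeps selectors containing quotes from breaking the generated expression.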
3. Enhanced bdg dom a11y tree output
Add visual hints to the accessibility tree:
[Button] "Submit" (below fold, y=1200)
[Link] "Learn more" (visible, y=450)
[Image] "Hero banner" (above fold, 1200×400)
Use case: Agent can understand page layout from a11y tree without screenshots.
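A position hint like the ones above could be produced by a small formatter. This is a sketch under stated assumptions: the issue does not define "above fold", so it is taken here to mean the element lies entirely above the current scroll position. The function name `position_hint`, the element heights, and the 800px viewport height are hypothetical.

```python
def position_hint(role: str, name: str, y: int, height: int,
                  viewport_height: int, scroll_y: int = 0) -> str:
    """Format one accessibility-tree line with a position hint."""
    top, bottom = y, y + height
    if top >= scroll_y + viewport_height:
        where = "below fold"
    elif bottom <= scroll_y:
        # Assumption: "above fold" = scrolled out of view above the viewport.
        where = "above fold"
    else:
        where = "visible"
    return f'[{role}] "{name}" ({where}, y={y})'


print(position_hint("Button", "Submit", y=1200, height=40, viewport_height=800))
print(position_hint("Link", "Learn more", y=450, height=20, viewport_height=800))
```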
Workflow Example
# Old way (expensive)
bdg dom screenshot page.png # 12,000 tokens burned
# New way (cheap)
bdg dom a11y tree # ~500 tokens, shows structure
bdg dom layout "form" # ~100 tokens, shows position
bdg dom scroll "form" # 0 tokens, brings into view
bdg dom screenshot form.png --selector "form" # ~500 tokens, element only
Implementation Notes
- layout uses DOM.getBoxModel and DOM.getDocument
- scroll uses Runtime.evaluate with scrollIntoView()
- A11y enhancement uses existing Accessibility.getFullAXTree + position data
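To illustrate the layout note, here is a rough sketch of turning a DOM.getBoxModel result into the "bounds" object. In the Chrome DevTools Protocol, each box in the returned model is a quad of eight numbers (four x, y corner pairs). The `cdp` and `quad_to_bounds` helpers, the node IDs, and the sample result are hand-written for illustration and were not captured from a real session; the coordinate space of the quads should be verified before combining them with scroll offsets.

```python
def cdp(method: str, message_id: int, **params) -> dict:
    """Build a CDP request message (hypothetical helper)."""
    return {"id": message_id, "method": method, "params": params}


def quad_to_bounds(quad: list) -> dict:
    """Convert a CDP quad [x1, y1, x2, y2, x3, y3, x4, y4] to a bounding box."""
    xs, ys = quad[0::2], quad[1::2]
    x, y = min(xs), min(ys)
    return {"x": x, "y": y, "width": max(xs) - x, "height": max(ys) - y}


# Request sequence: fetch the document root, resolve the selector, get the box.
# Node IDs 1 and 7 are placeholders; real IDs come from the previous responses.
requests = [
    cdp("DOM.getDocument", 1),
    cdp("DOM.querySelector", 2, nodeId=1, selector="button.submit"),
    cdp("DOM.getBoxModel", 3, nodeId=7),
]

# Hand-written sample of a DOM.getBoxModel result payload.
result = {
    "model": {
        "border": [450, 1200, 570, 1200, 570, 1240, 450, 1240],
        "width": 120,
        "height": 40,
    }
}

bounds = quad_to_bounds(result["model"]["border"])
print(bounds)
```

Taking the min and max over the corners, rather than reading the first corner and the width, keeps the result an axis-aligned box even when the element is transformed.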
Acceptance Criteria
- bdg dom layout [selector] returns bounding boxes and visibility
- bdg dom scroll <selector> scrolls element into view
- bdg dom scroll --to top|bottom for page navigation
- A11y tree includes position hints (above/below fold)
- Update skill docs with "screenshots as last resort" guidance
Priority
This is a strategic feature for token efficiency. It complements #116 (smart resize) as the long-term solution.
Labels
enhancement, agent-friendly, strategic