
Commit 6050d8f

gsiener and claude committed
Migrate browser automation from Playwright to agent-browser
Replace raw Playwright with agent-browser (Vercel Labs wrapper) for more stable element selection via accessibility tree snapshots instead of CSS selectors.

Key changes:
- Add AgentBrowserClient wrapper around BrowserManager
- Replace CSS selectors with ElementMatchers (role, name, text)
- Add snapshot-helpers.ts with findElementByMatcher/clickByMatcher
- Implement fallback strategy: snapshot refs → CSS selectors → keyboard
- Fix Drive API calls to include required 'fields' parameter
- Add comprehensive tests for matchers and snapshot helpers

Files added:
- src/browser/agent-browser-client.ts
- src/browser/matchers.ts
- src/browser/snapshot-helpers.ts
- tests/matchers.test.ts
- tests/snapshot-helpers.test.ts

Files removed:
- src/browser/page-helpers.ts (replaced by snapshot-helpers.ts)
- src/browser/selectors.ts (replaced by matchers.ts)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent d0c02f1 commit 6050d8f

21 files changed

Lines changed: 1630 additions & 747 deletions

CLAUDE.md

Lines changed: 14 additions & 8 deletions
@@ -8,22 +8,27 @@
 
 ### Why Browser Automation?
 
-Google's Docs API cannot create suggested edits—only read them. This is a known limitation ([issue #287903901](https://issuetracker.google.com/issues/287903901)). We use Playwright browser automation to write suggestions while using the API for reliable reading.
+Google's Docs API cannot create suggested edits—only read them. This is a known limitation ([issue #287903901](https://issuetracker.google.com/issues/287903901)). We use agent-browser (a Vercel Labs CLI wrapping Playwright) for browser automation to write suggestions while using the API for reliable reading.
 
 ### Hybrid Read/Write Strategy
 
 - **Read path**: Google Docs API via service account (fast, reliable)
-- **Write path**: Playwright browser automation (fragile but necessary)
+- **Write path**: agent-browser automation (more stable than raw Playwright)
 
 ### Text Anchoring Over Indexes
 
 Claude's suggestions use text content matching (`findText`) rather than document indexes. Indexes shift as edits are made; text anchoring is more robust for sequential operations.
 
+### Accessibility Tree-Based Element Selection
+
+Instead of CSS selectors, we use agent-browser's accessibility tree snapshots with refs (`@e1`, `@e2`) for more stable element selection. Elements are matched by role, name, and text properties. Fallback to CSS selectors when snapshots don't match.
+
 ## Development Practices
 
 - **TDD**: Write tests before implementation
-- **Isolated Selectors**: All DOM selectors in `src/browser/selectors.ts` for easy maintenance when Google's UI changes
-- **Keyboard Shortcuts**: Prefer keyboard shortcuts over DOM selectors (more stable)
+- **Isolated Matchers**: Element matchers in `src/browser/matchers.ts` for easy maintenance when Google's UI changes
+- **Keyboard Shortcuts**: Prefer keyboard shortcuts over UI selectors (most stable)
+- **Snapshot-First**: Try accessibility tree refs first, fall back to CSS selectors
 - **Retry Logic**: All browser operations use retry with exponential backoff
 
 ## Testing
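
The hunk above says elements are matched by role, name, and text. The diff for `src/browser/matchers.ts` and `src/browser/snapshot-helpers.ts` is not shown on this page, so the following is only a minimal sketch of what such matching could look like. `ElementMatcher`'s fields, `SnapshotEntry`, and `findFirstMatch` are hypothetical shapes chosen for illustration, and the sample entries are made up; none of this is agent-browser's API or the commit's actual implementation.

```typescript
// Hypothetical matcher shape: every field is optional, and an omitted
// field matches anything.
interface ElementMatcher {
  role?: string;
  name?: string | RegExp;
  text?: string | RegExp;
}

// Hypothetical flattened view of one accessibility-tree node.
interface SnapshotEntry {
  ref: string; // e.g. "e1"
  role: string;
  name?: string;
  text?: string;
}

// Compare one field against an exact string or a regular expression.
// Patterns should not use the /g flag, because RegExp.test is stateful then.
function matchesPattern(
  value: string | undefined,
  pattern: string | RegExp | undefined,
): boolean {
  if (pattern === undefined) return true;
  if (value === undefined) return false;
  return typeof pattern === "string" ? value === pattern : pattern.test(value);
}

// Return the first entry that satisfies every specified field, or null.
function findFirstMatch(
  entries: SnapshotEntry[],
  matcher: ElementMatcher,
): SnapshotEntry | null {
  for (const entry of entries) {
    if (matcher.role !== undefined && entry.role !== matcher.role) continue;
    if (!matchesPattern(entry.name, matcher.name)) continue;
    if (!matchesPattern(entry.text, matcher.text)) continue;
    return entry;
  }
  return null;
}

// Usage with made-up snapshot data.
const entries: SnapshotEntry[] = [
  { ref: "e1", role: "button", name: "Share" },
  { ref: "e2", role: "menuitem", name: "Suggesting" },
];
const hit = findFirstMatch(entries, { role: "menuitem", name: /suggest/i });
console.log(hit?.ref); // prints "e2"
```

Matching on role plus a case-insensitive name pattern is one way to stay tolerant of small label changes in the UI, which is the stability argument the hunk makes for matchers over CSS selectors.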
@@ -38,10 +43,11 @@ npm run lint # TypeScript type check
 
 ### Adding a New Browser Operation
 
-1. Add selector to `src/browser/selectors.ts`
-2. Write test in `tests/`
-3. Implement in `src/browser/docs-writer.ts`
-4. Wrap with `withRetry()` for resilience
+1. Add matcher to `src/browser/matchers.ts` (role, name, text patterns)
+2. Add CSS selector fallbacks if needed
+3. Write test in `tests/`
+4. Implement using `clickByMatcher()` / `fillByMatcher()` from `snapshot-helpers.ts`
+5. Wrap with `withRetry()` for resilience
 
 ### Modifying Claude's Response Format
 
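
The CLAUDE.md changes describe a snapshot-first strategy (refs, then CSS selectors, then keyboard per the commit message) wrapped in retry with exponential backoff. The repository's own `withRetry()` is not shown in this diff, so the sketch below uses two hypothetical helpers, `firstSuccessful` and `retryWithBackoff`, to illustrate how the two ideas might compose. The default of 3 attempts and a 250 ms base delay are assumptions, not values taken from the project.

```typescript
// One way of performing an operation, e.g. "ref", "css", or "keyboard".
type Strategy<T> = { label: string; run: () => Promise<T> };

// Try each strategy in order and return the first one that succeeds.
// If all fail, throw one error that lists every failure.
async function firstSuccessful<T>(
  strategies: Strategy<T>[],
): Promise<{ label: string; value: T }> {
  const failures: string[] = [];
  for (const { label, run } of strategies) {
    try {
      return { label, value: await run() };
    } catch (err) {
      failures.push(`${label}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All strategies failed (${failures.join("; ")})`);
}

interface RetryOptions {
  attempts?: number; // total tries, assumed default 3
  baseDelayMs?: number; // first delay, assumed default 250 ms
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

// Retry fn, doubling the delay after each failure: base, 2*base, 4*base, ...
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  options: RetryOptions = {},
): Promise<T> {
  const attempts = Math.max(1, options.attempts ?? 3);
  const baseDelayMs = options.baseDelayMs ?? 250;
  const sleep =
    options.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // No delay after the final attempt; the error is rethrown instead.
      if (attempt < attempts - 1) {
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}
```

In this arrangement a caller would pass the whole fallback chain to `retryWithBackoff`, so each retry starts again from the snapshot-ref strategy. That ordering is a design assumption of the sketch: a fresh snapshot on each attempt seems preferable to retrying a stale ref, but the project may structure it differently.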
package-lock.json

Lines changed: 39 additions & 19 deletions

package.json

Lines changed: 2 additions & 1 deletion
@@ -18,10 +18,11 @@
   },
   "dependencies": {
     "@anthropic-ai/sdk": "^0.30.0",
+    "agent-browser": "^0.5.0",
     "commander": "^12.0.0",
     "dotenv": "^16.0.0",
     "googleapis": "^130.0.0",
-    "playwright": "^1.40.0",
+    "playwright-core": "^1.57.0",
     "zod": "^3.22.0"
   },
   "devDependencies": {
src/browser/agent-browser-client.ts

Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
+/**
+ * Agent Browser Client - Wrapper around BrowserManager for track-changes usage.
+ *
+ * This module provides a clean interface to agent-browser's BrowserManager,
+ * exposing only the functionality we need while providing better error handling
+ * and logging.
+ */
+
+import { BrowserManager } from "agent-browser/dist/browser.js";
+import type { Page, Locator } from "playwright-core";
+import type { RefMap, EnhancedSnapshot } from "agent-browser/dist/snapshot.js";
+import { logger } from "../utils/logger.js";
+
+export interface AgentBrowserOptions {
+  headless?: boolean;
+  viewport?: { width: number; height: number };
+  storageStatePath?: string;
+}
+
+/**
+ * Client wrapper for agent-browser's BrowserManager.
+ * Provides a cleaner interface and maintains compatibility with existing code patterns.
+ */
+export class AgentBrowserClient {
+  private manager: BrowserManager;
+  private launched = false;
+
+  constructor() {
+    this.manager = new BrowserManager();
+  }
+
+  /**
+   * Launch the browser with specified options.
+   */
+  async launch(options: AgentBrowserOptions = {}): Promise<void> {
+    if (this.launched) {
+      logger.debug("Browser already launched");
+      return;
+    }
+
+    logger.info("Launching browser via agent-browser", {
+      headless: options.headless ?? true,
+      viewport: options.viewport,
+    });
+
+    await this.manager.launch({
+      id: "launch",
+      action: "launch",
+      headless: options.headless ?? true,
+      viewport: options.viewport ?? { width: 1280, height: 800 },
+    });
+
+    this.launched = true;
+    logger.info("Browser launched successfully");
+  }
+
+  /**
+   * Get the current Playwright Page object.
+   * This allows using familiar Playwright APIs for interactions.
+   */
+  getPage(): Page {
+    if (!this.launched) {
+      throw new Error("Browser not launched. Call launch() first.");
+    }
+    return this.manager.getPage();
+  }
+
+  /**
+   * Get an accessibility tree snapshot with element refs.
+   * @param options.interactive - Only include interactive elements
+   * @param options.compact - Remove structural elements without content
+   */
+  async getSnapshot(options?: {
+    interactive?: boolean;
+    compact?: boolean;
+    maxDepth?: number;
+    selector?: string;
+  }): Promise<EnhancedSnapshot> {
+    if (!this.launched) {
+      throw new Error("Browser not launched. Call launch() first.");
+    }
+    return this.manager.getSnapshot(options);
+  }
+
+  /**
+   * Get the cached ref map from the last snapshot.
+   */
+  getRefMap(): RefMap {
+    return this.manager.getRefMap();
+  }
+
+  /**
+   * Get a Playwright Locator from a ref (e.g., "e1", "@e1", "ref=e1").
+   * Returns null if ref doesn't exist or is invalid.
+   */
+  getLocatorFromRef(ref: string): Locator | null {
+    if (!this.launched) {
+      throw new Error("Browser not launched. Call launch() first.");
+    }
+    return this.manager.getLocatorFromRef(ref);
+  }
+
+  /**
+   * Get a Playwright Locator - supports both refs and regular selectors.
+   */
+  getLocator(selectorOrRef: string): Locator {
+    if (!this.launched) {
+      throw new Error("Browser not launched. Call launch() first.");
+    }
+    return this.manager.getLocator(selectorOrRef);
+  }
+
+  /**
+   * Check if a selector string looks like a ref (e.g., "@e1").
+   */
+  isRef(selector: string): boolean {
+    return this.manager.isRef(selector);
+  }
+
+  /**
+   * Save the current storage state (cookies, localStorage) to a file.
+   */
+  async saveStorageState(path: string): Promise<void> {
+    if (!this.launched) {
+      throw new Error("Browser not launched. Call launch() first.");
+    }
+    await this.manager.saveStorageState(path);
+    logger.info("Storage state saved", { path });
+  }
+
+  /**
+   * Set the viewport size.
+   */
+  async setViewport(width: number, height: number): Promise<void> {
+    if (!this.launched) {
+      throw new Error("Browser not launched. Call launch() first.");
+    }
+    await this.manager.setViewport(width, height);
+  }
+
+  /**
+   * Check if the browser is currently launched.
+   */
+  isLaunched(): boolean {
+    return this.launched && this.manager.isLaunched();
+  }
+
+  /**
+   * Close the browser and clean up resources.
+   */
+  async close(): Promise<void> {
+    if (this.launched) {
+      await this.manager.close();
+      this.launched = false;
+      logger.info("Browser closed");
+    }
+  }
+}
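
The doc comment on `getLocatorFromRef` lists three accepted spellings of a ref: `"e1"`, `"@e1"`, and `"ref=e1"`. How agent-browser parses them internally is not shown in this diff. As an illustration only, the hypothetical helper `parseRef` below normalizes those three forms to a bare ref, under the assumption that a ref is the letter `e` followed by digits.

```typescript
// Hypothetical helper, not agent-browser's implementation.
// Accepts "e1", "@e1", or "ref=e1" and returns the bare ref ("e1").
// Anything else, such as a CSS selector, yields null.
function parseRef(input: string): string | null {
  const match = /^(?:@|ref=)?(e\d+)$/.exec(input.trim());
  return match?.[1] ?? null;
}

// The first three inputs normalize to "e1"; the CSS selector yields null.
for (const candidate of ["e1", "@e1", "ref=e1", "#submit"]) {
  console.log(candidate, "->", parseRef(candidate));
}
```

A check of this kind is what lets a single method such as `getLocator(selectorOrRef)` accept either a ref or a regular selector: inputs that parse as refs can be resolved through the ref map, and everything else can be treated as a selector.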
