.cursorrules
You are working with Scrapybara, a Python SDK for deploying and managing remote desktop instances for AI agents. Use this guide to properly interact with the SDK.
CORE SDK USAGE:
- Initialize client: from scrapybara import Scrapybara; client = Scrapybara(api_key="KEY")
- Instance lifecycle:
  instance = client.start(instance_type="small", timeout_hours=1)
  instance.pause()  # Pause to save resources
  instance.resume(timeout_hours=1)  # Resume work
  instance.stop()  # Terminate and clean up
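One way to make sure stop() always runs is to wrap the lifecycle calls above in a context manager. This is a minimal sketch, not part of the SDK: running_instance is a hypothetical helper that takes the client as a parameter and relies only on client.start(...) and instance.stop() as shown above.

```python
from contextlib import contextmanager


@contextmanager
def running_instance(client, **start_kwargs):
    """Start an instance and always stop it on exit (hypothetical helper)."""
    # start_kwargs are forwarded unchanged, e.g. instance_type="small", timeout_hours=1
    instance = client.start(**start_kwargs)
    try:
        yield instance
    finally:
        # Runs on normal exit and when the body raises
        instance.stop()
```

Usage would look like: with running_instance(client, instance_type="small", timeout_hours=1) as instance: instance.bash(command="ls -la")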
CORE INSTANCE OPERATIONS:
- Screenshots: instance.screenshot().base_64_image
- Bash commands: instance.bash(command="ls -la")
- Mouse control: instance.computer(action="mouse_move", coordinate=[x, y])
- Click actions: instance.computer(action="left_click")
- File operations: instance.file.read(path="/path/file"), instance.file.write(path="/path/file", content="data")
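Since screenshots come back as a base64 string, a small helper can decode and save them. The sketch below assumes only that instance.screenshot().base_64_image is a standard base64 string; save_screenshot is a hypothetical name, and the file extension is the caller's choice because the image format is not stated here.

```python
import base64
from pathlib import Path


def save_screenshot(instance, path):
    """Decode instance.screenshot().base_64_image and write the bytes to path."""
    encoded = instance.screenshot().base_64_image
    data = base64.b64decode(encoded)
    target = Path(path)
    target.write_bytes(data)
    return target
```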
ACT SDK (Primary Focus):
- Purpose: Enables building computer use agents with unified tools and model interfaces
- Core components:
  1. Model: Handles LLM integration (currently Anthropic)
     from scrapybara.anthropic import Anthropic
     model = Anthropic()  # Or model = Anthropic(api_key="KEY") to use your own key
  2. Tools: Interface for computer interactions
     - BashTool: Run shell commands
     - ComputerTool: Mouse/keyboard control
     - EditTool: File operations
     - BrowserTool: Web automation via Playwright
     tools = [
         BashTool(instance),
         ComputerTool(instance),
         EditTool(instance),
         BrowserTool(instance),
     ]
  3. Prompt:
     - system: system prompt; using SYSTEM_PROMPT is recommended
     - prompt: simple user prompt
     - messages: list of messages
     - Pass either prompt or messages, not both
     response = client.act(
         model=Anthropic(),
         tools=tools,
         system=SYSTEM_PROMPT,
         prompt="Task",
         on_step=handle_step,
     )
     messages = response.messages
     steps = response.steps
     text = response.text
     output = response.output
     usage = response.usage
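The prompt-or-messages rule above can be enforced before calling client.act(). This is a sketch with a hypothetical helper name (build_act_inputs); it also rejects the case where neither is given, which is an assumption rather than something stated above.

```python
def build_act_inputs(prompt=None, messages=None):
    """Return the prompt/messages keyword arguments for client.act()."""
    # Exactly one of the two must be provided (rejecting "neither" is an assumption)
    if (prompt is None) == (messages is None):
        raise ValueError("Pass exactly one of prompt or messages")
    if prompt is not None:
        return {"prompt": prompt}
    return {"messages": list(messages)}
```

The result can be unpacked into the call: client.act(model=Anthropic(), tools=tools, system=SYSTEM_PROMPT, **build_act_inputs(prompt="Task"))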
MESSAGE HANDLING:
- Response Structure: Messages are structured with roles (user/assistant/tool) and typed content
- Content Types:
  - TextPart: Simple text content
    TextPart(type="text", text="content")
  - ImagePart: Base64 or URL images
    ImagePart(type="image", image="base64...", mime_type="image/png")
  - ToolCallPart: Tool invocations
    ToolCallPart(
        type="tool-call",
        tool_call_id="id",
        tool_name="bash",
        args={"command": "ls"},
    )
  - ToolResultPart: Tool execution results
    ToolResultPart(
        type="tool-result",
        tool_call_id="id",
        tool_name="bash",
        result="output",
        is_error=False,
    )
STEP HANDLING:
# Access step information in callbacks
def handle_step(step: Step):
    print(f"Text: {step.text}")
    if step.tool_calls:
        for call in step.tool_calls:
            print(f"Tool: {call.tool_name}")
    if step.tool_results:
        for result in step.tool_results:
            print(f"Result: {result.result}")
    print(f"Tokens: {step.usage.total_tokens if step.usage else 'N/A'}")
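Instead of printing, a callback can also collect what happened for later inspection. The sketch below uses a hypothetical factory, make_step_recorder, and assumes only the step.tool_calls and call.tool_name attributes used above.

```python
def make_step_recorder():
    """Return (callback, log); the callback appends each tool name to log."""
    tool_names = []

    def on_step(step):
        # step.tool_calls may be empty or None, so fall back to an empty list
        for call in step.tool_calls or []:
            tool_names.append(call.tool_name)

    return on_step, tool_names
```

Pass the returned callback as on_step=... to client.act() and read the list once the call returns.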
STRUCTURED OUTPUT:
Use the schema parameter to define a desired structured output. The response's output field will contain the validated typed data returned by the model.
from typing import List
from pydantic import BaseModel

class HNSchema(BaseModel):
    class Post(BaseModel):
        title: str
        url: str
        points: int
    posts: List[Post]

response = client.act(
    model=Anthropic(),
    tools=tools,
    schema=HNSchema,
    system=SYSTEM_PROMPT,
    prompt="Get the top 10 posts on Hacker News",
)
posts = response.output.posts
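Because the output is typed, downstream code can use plain attribute access. As an illustration, the hypothetical helper below ranks posts by points; it assumes only the posts, title, and points fields defined in HNSchema above.

```python
def top_titles(output, limit=3):
    """Return the titles of the highest-scoring posts in output.posts."""
    ranked = sorted(output.posts, key=lambda post: post.points, reverse=True)
    return [post.title for post in ranked[:limit]]
```

For example, top_titles(response.output, limit=5) would return up to five titles, highest points first.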
TOKEN USAGE:
- Track token usage through TokenUsage objects
- Fields: prompt_tokens, completion_tokens, total_tokens
- Available in both Step and ActResponse objects
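The per-step usage objects can be added up to see where tokens were spent. This is a sketch with a hypothetical helper, total_usage; it assumes each step has a usage attribute that is either None or carries the three fields listed above.

```python
def total_usage(steps):
    """Sum prompt, completion, and total tokens over steps that report usage."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for step in steps:
        usage = step.usage
        if usage is None:
            continue  # Step reported no usage
        for field in totals:
            # Treat a missing (None) count as zero
            totals[field] += getattr(usage, field) or 0
    return totals
```

Typical use would be total_usage(response.steps).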
Here's a brief example of how to use the Scrapybara SDK:
from scrapybara import Scrapybara
from scrapybara.anthropic import Anthropic
from scrapybara.prompts import SYSTEM_PROMPT
from scrapybara.tools import BashTool, ComputerTool, EditTool, BrowserTool
client = Scrapybara()
instance = client.start()
instance.browser.start()
response = client.act(
    model=Anthropic(),
    tools=[
        BashTool(instance),
        ComputerTool(instance),
        EditTool(instance),
        BrowserTool(instance),
    ],
    system=SYSTEM_PROMPT,
    prompt="Go to the YC website and fetch the HTML",
    on_step=lambda step: print(f"{step}\n"),
)
messages = response.messages
steps = response.steps
text = response.text
output = response.output
usage = response.usage
instance.browser.stop()
instance.stop()
EXECUTION PATTERNS:
1. Basic agent execution:
   response = client.act(
       model=Anthropic(),
       tools=tools,
       system="System context here",
       prompt="Task description",
   )
2. Browser automation:
   cdp_url = instance.browser.start().cdp_url
   auth_state_id = instance.browser.save_auth(name="default").auth_state_id  # Save auth
   instance.browser.authenticate(auth_state_id=auth_state_id)  # Reuse auth
3. File management:
   instance.file.write(path="/tmp/data.txt", content="content")
   content = instance.file.read(path="/tmp/data.txt").content
IMPORTANT GUIDELINES:
- Always stop instances after use to prevent unnecessary billing
- Use the async client (AsyncScrapybara) for non-blocking operations
- Handle API errors with try/except ApiError blocks
- The default request timeout is 60s; customize it with the timeout parameter or request_options
- Instances auto-terminate after 1 hour by default
- For browser operations, always start the browser before using BrowserTool
- Prefer bash commands over GUI interactions for launching applications
ERROR HANDLING:
from scrapybara.core.api_error import ApiError
try:
    client.start()
except ApiError as e:
    print(f"Error {e.status_code}: {e.body}")
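For transient failures, the try/except above can be extended into a small retry loop. This is only a sketch: call_with_retry is a hypothetical helper, retrying is not something the SDK is stated to recommend, and whether a given ApiError is worth retrying depends on its status_code. The exception class is passed in as a parameter so the helper itself has no SDK dependency.

```python
import time


def call_with_retry(func, retry_on, attempts=3, delay_seconds=1.0, sleep=time.sleep):
    """Call func(); on a retry_on exception, wait and try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error
            sleep(delay_seconds * attempt)  # Linear backoff between tries
```

With the import above, usage might be: instance = call_with_retry(client.start, retry_on=ApiError)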
BROWSER TOOL OPERATIONS:
- Required setup:
cdp_url = instance.browser.start().cdp_url
tools = [BrowserTool(instance)]
- Commands: go_to, get_html, evaluate, click, type, screenshot, get_text, get_attribute
- Always handle browser authentication states appropriately
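To keep the start-before-use rule and the matching cleanup together, the browser calls can be wrapped in a context manager. A minimal sketch with a hypothetical helper name (browser_session), using only instance.browser.start().cdp_url and instance.browser.stop() as they appear in this guide:

```python
from contextlib import contextmanager


@contextmanager
def browser_session(instance):
    """Start the instance's browser, yield its CDP URL, and stop it on exit."""
    cdp_url = instance.browser.start().cdp_url
    try:
        yield cdp_url
    finally:
        # Stop the browser even if the body raised
        instance.browser.stop()
```

Inside the with block, BrowserTool(instance) can be passed to client.act() as usual.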
ENV VARIABLES & CONFIGURATION:
- Set env vars: instance.env.set({"API_KEY": "value"})
- Get env vars: vars = instance.env.get().variables
- Delete env vars: instance.env.delete(["VAR_NAME"])
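A common need is to forward a few local environment variables to the instance. The sketch below is a hypothetical helper (push_env); it calls instance.env.set(...) with a dict exactly as shown above and skips names that are not defined locally.

```python
import os


def push_env(instance, names, source=None):
    """Copy the named variables from source (default: os.environ) to the instance."""
    source = os.environ if source is None else source
    selected = {name: source[name] for name in names if name in source}
    if selected:
        # Same call shape as in the list above
        instance.env.set(selected)
    return sorted(selected)  # Names that were actually sent
```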
Remember to handle resources properly and implement appropriate error handling in your code. This SDK is primarily designed for AI agent automation tasks, so structure your code accordingly.