Commit de30222

L/docs updates agent (#392)
* add columns
* docs updates and small sdk changes
* cleanup
* fx
1 parent 58140f9 commit de30222

32 files changed

Lines changed: 1716 additions & 1421 deletions

docs/advanced/patterns.mdx

Lines changed: 64 additions & 2 deletions
@@ -1,10 +1,53 @@
---
title: "Advanced Patterns"
-description: "Mock mode, scenario internals, and troubleshooting"
+description: "Sandboxing, mocking, scenario internals, and troubleshooting"
icon: "wrench"
---

-## Mock Mode
+## Sandboxing

Agents need isolated state. You can't point an agent at production: it will make real changes, hit real APIs, and affect real users. These patterns keep things safe.

### Database Isolation

**In-memory SQLite** (fastest; the database is discarded when the connection closes):

```python
import sqlite3
from pathlib import Path

# Shared by the scenario and your tools; `env` is the environment object defined elsewhere.
db = sqlite3.connect(":memory:")

@env.scenario("update-order")
async def update_order(order_id: str):
    # Seed the schema and rows before the agent runs.
    db.executescript(Path("fixtures/orders.sql").read_text())
    answer = yield f"Update order {order_id} to shipped"
    # Grade on the final database state, not on the agent's answer.
    row = db.execute("SELECT status FROM orders WHERE id=?", (order_id,)).fetchone()
    yield 1.0 if row and row[0] == "shipped" else 0.0
```
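
The isolation property this pattern relies on can be checked without the SDK. The sketch below uses only the standard library and assumes a hypothetical inline schema and a hypothetical `fresh_db` helper (the original loads `fixtures/orders.sql` instead). It shows that every `sqlite3.connect(":memory:")` call returns an independent database, so opening one connection per run gives each run a clean state.

```python
import sqlite3

# Hypothetical fixture SQL standing in for fixtures/orders.sql.
SCHEMA = """
CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
INSERT INTO orders VALUES ('o-1', 'pending');
"""

def fresh_db() -> sqlite3.Connection:
    """Return a new in-memory database seeded with the fixture rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def order_status(conn: sqlite3.Connection, order_id: str):
    """Return the status of an order, or None if it does not exist."""
    row = conn.execute("SELECT status FROM orders WHERE id=?", (order_id,)).fetchone()
    return row[0] if row else None

first = fresh_db()
first.execute("UPDATE orders SET status='shipped' WHERE id='o-1'")

# A second connection is a separate database and never sees the update.
second = fresh_db()
```

If the connection is shared at module level instead, as in the scenario above, the fixture script itself has to reset the tables on every run.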

**Transaction rollback** (use your real database, then undo the changes):

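```python
import asyncpg

@env.scenario("process-refund")
async def process_refund(order_id: str):
    # DATABASE_URL is defined elsewhere in your configuration.
    conn = await asyncpg.connect(DATABASE_URL)
    tx = conn.transaction()
    await tx.start()
    try:
        # Tools must run their queries on `conn`, or the rollback will not undo them.
        answer = yield f"Process refund for order {order_id}"
        # Placeholder grading: compute the reward from state read through `conn` before the rollback.
        reward = 1.0 if answer else 0.0
        yield reward
    finally:
        await tx.rollback()
        await conn.close()
```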
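
The same rollback idea can be tried locally without a Postgres server. The sketch below swaps in the standard-library `sqlite3` module for `asyncpg`, and the `rolled_back` helper and table contents are hypothetical. It illustrates the two points that matter: changes are visible inside the transaction, so grading has to happen there, and they are gone after the rollback.

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def rolled_back(conn: sqlite3.Connection):
    """Yield the connection, then undo every uncommitted change made on it."""
    try:
        yield conn
    finally:
        conn.rollback()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO orders VALUES ('o-1', 'paid')")
conn.commit()  # the seeded state is committed and survives the rollback

with rolled_back(conn) as c:
    # Stand-in for the agent's tool call; it must use the same connection.
    c.execute("UPDATE orders SET status='refunded' WHERE id='o-1'")
    inside = c.execute("SELECT status FROM orders WHERE id='o-1'").fetchone()[0]

after = conn.execute("SELECT status FROM orders WHERE id='o-1'").fetchone()[0]
```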

**Fixture seeding** (deterministic starting state):

```python
await db.execute("TRUNCATE orders, users CASCADE")
await db.executemany("INSERT INTO users ...", fixtures["users"])
```
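
A runnable version of the seeding step, as a rough sketch: it uses `sqlite3` rather than the Postgres driver implied above, so `DELETE` replaces `TRUNCATE ... CASCADE` (SQLite has no `TRUNCATE`). The table layout, the `FIXTURES` data, and the `seed` helper are all hypothetical. The point is that seeding twice, even with leftover rows in between, yields the same starting state.

```python
import sqlite3

# Hypothetical fixture rows.
FIXTURES = {
    "users": [(1, "ada"), (2, "grace")],
    "orders": [("o-1", 1, "pending")],
}

def seed(conn: sqlite3.Connection) -> None:
    """Wipe both tables and reload the fixture rows."""
    # Delete child rows before parent rows to respect the foreign key.
    conn.execute("DELETE FROM orders")
    conn.execute("DELETE FROM users")
    conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", FIXTURES["users"])
    conn.executemany(
        "INSERT INTO orders (id, user_id, status) VALUES (?, ?, ?)", FIXTURES["orders"]
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id TEXT PRIMARY KEY, user_id INTEGER REFERENCES users(id), status TEXT);
""")

seed(conn)
conn.execute("INSERT INTO users (id, name) VALUES (3, 'stray')")  # leftover from an earlier run
seed(conn)  # reseeding removes the leftover row

users = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```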

### Mocking External Services

`env.mock()` intercepts at the tool layer. Agents only see tools, so this is usually all you need for testing agent logic without hitting real services:

@@ -16,6 +59,25 @@ env.mock_tool("charge_card", {"success": True, "transaction_id": "tx-mock"})

Your agent code stays the same; toggle `env.mock()` for testing.

For stateful mocking (tracking what happened for assertions):

```python
class MockPaymentService:
    def __init__(self):
        self.charges = []

    async def charge(self, amount: int, card_token: str) -> dict:
        self.charges.append({"amount": amount, "token": card_token})
        return {"success": True, "id": f"ch-{len(self.charges)}"}

payments = MockPaymentService()

@env.scenario("checkout")
async def checkout(cart_total: int):
    _ = yield f"Complete checkout for ${cart_total}"
    yield 1.0 if any(c["amount"] == cart_total for c in payments.charges) else 0.0
```
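
Because the mock records every call, a test can assert on exactly what happened, not just on the final reward. The sketch below repeats the `MockPaymentService` class so it runs on its own; `fake_agent` and the `"tok_test"` token are hypothetical stand-ins for an agent invoking the payment tool once.

```python
import asyncio

class MockPaymentService:
    def __init__(self):
        self.charges = []

    async def charge(self, amount: int, card_token: str) -> dict:
        # Record the call so tests can inspect it afterwards.
        self.charges.append({"amount": amount, "token": card_token})
        return {"success": True, "id": f"ch-{len(self.charges)}"}

async def fake_agent(payments: MockPaymentService, total: int) -> dict:
    # Hypothetical stand-in for an agent calling the payment tool once.
    return await payments.charge(total, "tok_test")

payments = MockPaymentService()
receipt = asyncio.run(fake_agent(payments, 4200))
charged_exactly_once = len(payments.charges) == 1
```

Creating a new `MockPaymentService` per test keeps charges from one run from satisfying the check in another.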

## Testing Scenarios Directly

Scenarios are async generators. `hud.eval()` drives them automatically, but you can test the grading logic directly:
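
As a rough illustration of what driving such a generator by hand looks like, the sketch below uses only plain Python async generators and none of the HUD SDK. The undecorated `checkout` function, its `charges` parameter, and the `run_once` helper are assumptions made for this example. The first step runs setup and returns the prompt; sending the agent's answer resumes the generator and returns the reward.

```python
import asyncio

async def checkout(cart_total: int, charges: list):
    # Same two-yield shape as the scenarios above: prompt first, reward second.
    _ = yield f"Complete checkout for ${cart_total}"
    yield 1.0 if any(c["amount"] == cart_total for c in charges) else 0.0

async def run_once(recorded_charges: list):
    charges = []
    gen = checkout(50, charges)
    prompt = await gen.__anext__()     # run setup, receive the prompt
    charges.extend(recorded_charges)   # simulate what the agent's tool calls did
    reward = await gen.asend("done")   # send the agent's answer, receive the reward
    await gen.aclose()                 # let any cleanup in the generator run
    return prompt, reward

prompt, reward = asyncio.run(run_once([{"amount": 50}]))
_, miss = asyncio.run(run_once([{"amount": 49}]))
```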

docs/advanced/testing-environments.mdx

Lines changed: 0 additions & 239 deletions
This file was deleted.
