Commit 3ca722d

jddunn and claude committed
feat: safety primitives — GuardedToolResult rename, tests & docs
Rename ToolExecutionResult → GuardedToolResult to avoid collision with ITool's ToolExecutionResult. Add 57 unit tests across all 5 safety primitives (CircuitBreaker, ActionDeduplicator, StuckDetector, CostGuard, ToolExecutionGuard). Add package-level SAFETY_PRIMITIVES.md reference.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent f7011db commit 3ca722d

8 files changed

Lines changed: 1280 additions & 3 deletions

docs/SAFETY_PRIMITIVES.md

Lines changed: 365 additions & 0 deletions
@@ -0,0 +1,365 @@
# Safety Primitives

Operational safety guards that prevent runaway agent loops, excessive spending, and stuck behavior. These are distinct from [Guardrails](./GUARDRAILS_USAGE.md), which handle content safety (toxicity, PII, prompt injection).

## The Problem

An autonomous agent with LLM access can burn $93 overnight retrying the same failed action 800 times. Without circuit breakers, a flaky API turns your agent into a money furnace. Without stuck detection, it happily generates the same broken output forever. Safety primitives provide six independent layers of defense that compose into a single guard chain.

## Architecture

```
Incoming LLM / Tool call
        |
        v
+-------------------+
| 1. SafetyEngine   |  Killswitches: per-agent pause/stop, network emergency halt
|    canAct()       |  Rate limits: post, comment, vote, dm, browse, proposal
+-------------------+
        |
        v
+-------------------+
| 2. CostGuard      |  Session cap ($1), daily cap ($5), per-operation cap ($0.50)
|    canAfford()    |
+-------------------+
        |
        v
+-------------------+
| 3. CircuitBreaker |  Three-state: closed -> open -> half-open -> closed
|    execute()      |  Opens after N failures in window, cools down, probes
+-------------------+
        |
        v
[Execute the actual LLM call or tool invocation]
        |
        v
+-------------------+
| 4. CostGuard      |  Record actual token cost from usage metadata
|    recordCost()   |
+-------------------+
        |
        v
+-------------------+
| 5. StuckDetector  |  Detects repeated_output, repeated_error, oscillating
|    recordOutput() |  Uses fast djb2 hashing, no crypto overhead
+-------------------+
        |
        v
+-------------------+
| 6. ActionAuditLog |  Ring buffer + optional persistence adapter
|    log()          |  Every action gets a trail entry with outcome + duration
+-------------------+
```

All six layers are independent. You can use any subset. Wunderland uses all six wired together in `WonderlandNetwork.wrapLLMCallback()`.

## CircuitBreaker

Three-state (closed -> open -> half-open) pattern wrapping any async operation. When failures exceed a threshold within a time window, the circuit opens and rejects all calls immediately with a `CircuitOpenError`. After a cooldown period, it transitions to half-open and allows probe calls through. Once enough probes succeed (`halfOpenSuccessThreshold`), it closes again.

### Config

| Option | Default | Description |
|--------|---------|-------------|
| `name` | required | Breaker identifier (used in errors and callbacks) |
| `failureThreshold` | `5` | Failures before opening |
| `failureWindowMs` | `60,000` | Window in ms for counting failures |
| `cooldownMs` | `30,000` | Time in open state before probing |
| `halfOpenSuccessThreshold` | `2` | Successes needed in half-open to close |
| `onStateChange` | `undefined` | Callback: `(from, to, name) => void` |

### Usage

```typescript
import { CircuitBreaker, CircuitOpenError } from '@framers/agentos';

const breaker = new CircuitBreaker({
  name: 'openai-api',
  failureThreshold: 3,
  cooldownMs: 60_000,
  onStateChange: (from, to, name) => {
    console.log(`[${name}] ${from} -> ${to}`);
  },
});

try {
  const response = await breaker.execute(async () => {
    return await openai.chat.completions.create({ model: 'gpt-4o-mini', messages });
  });
} catch (err) {
  if (err instanceof CircuitOpenError) {
    console.log(`Circuit open. Retry after ${err.cooldownRemainingMs}ms`);
  }
}

// Inspect state
const stats = breaker.getStats();
// { name: 'openai-api', state: 'closed', failureCount: 0, totalTripped: 0, ... }
```
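To make the state transitions concrete, here is a minimal sketch of a three-state breaker. It is not the `@framers/agentos` implementation: `MiniBreaker`, its method names, and the injectable `now` clock are hypothetical, and the rule that a failed probe reopens the circuit is an assumption (the text above only describes the success path).

```typescript
// Hypothetical sketch of the closed -> open -> half-open cycle; not the library's CircuitBreaker.
type BreakerState = 'closed' | 'open' | 'half-open';

interface MiniBreakerOptions {
  failureThreshold: number;         // failures inside the window before opening
  failureWindowMs: number;          // window for counting failures
  cooldownMs: number;               // time spent open before probing
  halfOpenSuccessThreshold: number; // probe successes needed to close
}

class MiniBreaker {
  private state: BreakerState = 'closed';
  private failures: number[] = []; // timestamps of recent failures
  private openedAt = 0;
  private probeSuccesses = 0;
  private readonly opts: MiniBreakerOptions;
  private readonly now: () => number;

  constructor(opts: MiniBreakerOptions, now: () => number = Date.now) {
    this.opts = opts;
    this.now = now; // injectable clock so the sketch can be tested without waiting
  }

  /** Current state, promoting open -> half-open once the cooldown has elapsed. */
  getState(): BreakerState {
    if (this.state === 'open' && this.now() - this.openedAt >= this.opts.cooldownMs) {
      this.state = 'half-open';
      this.probeSuccesses = 0;
    }
    return this.state;
  }

  /** Runs fn unless the circuit is open, then records the outcome. */
  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.getState() === 'open') throw new Error('circuit open');
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (err) {
      this.onFailure();
      throw err;
    }
  }

  onSuccess(): void {
    if (this.state !== 'half-open') return;
    this.probeSuccesses += 1;
    if (this.probeSuccesses >= this.opts.halfOpenSuccessThreshold) {
      this.state = 'closed';
      this.failures = [];
    }
  }

  onFailure(): void {
    const t = this.now();
    if (this.state === 'half-open') {
      this.trip(t); // assumption: a failed probe reopens the circuit
      return;
    }
    // Drop failures that have aged out of the window, then count this one.
    this.failures = this.failures.filter((ts) => t - ts < this.opts.failureWindowMs);
    this.failures.push(t);
    if (this.failures.length >= this.opts.failureThreshold) this.trip(t);
  }

  private trip(t: number): void {
    this.state = 'open';
    this.openedAt = t;
    this.failures = [];
  }
}
```

Keeping the clock injectable is a design choice of this sketch: it lets the cooldown and failure window be exercised deterministically in unit tests.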

## ActionDeduplicator

Hash-based recent action tracking with a configurable time window and LRU eviction. The caller computes the key string -- this class is intentionally generic. Use it to prevent duplicate votes, duplicate posts, or any repeated action within a window.

### Config

| Option | Default | Description |
|--------|---------|-------------|
| `windowMs` | `3,600,000` (1 hr) | Time window for dedup tracking |
| `maxEntries` | `10,000` | Maximum tracked entries before LRU eviction |

### Usage

```typescript
import { ActionDeduplicator } from '@framers/agentos';

const dedup = new ActionDeduplicator({ windowMs: 900_000 }); // 15-minute window

const key = `vote:${agentId}:${postId}`;

if (dedup.isDuplicate(key)) {
  console.log('Already voted on this post recently');
} else {
  dedup.record(key);
  await castVote(agentId, postId);
}

// Or use the combined check-and-record method:
const { isDuplicate, entry } = dedup.checkAndRecord(`like:${agentId}:${postId}`);
if (isDuplicate) {
  console.log(`Seen ${entry.count} times since ${new Date(entry.firstSeenAt)}`);
}
```
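The combination of a time window and LRU eviction can be sketched with a plain `Map`, which iterates in insertion order. This is an illustration only: `MiniDeduplicator`, the `SeenEntry` shape, and the choice to measure the window from the last sighting are assumptions, not the library's documented behavior.

```typescript
// Hypothetical sketch of windowed dedup with LRU eviction; not the library source.
interface SeenEntry {
  count: number;
  firstSeenAt: number;
  lastSeenAt: number;
}

class MiniDeduplicator {
  // Map preserves insertion order, so the first key is the least recently used.
  private readonly entries = new Map<string, SeenEntry>();
  private readonly windowMs: number;
  private readonly maxEntries: number;
  private readonly now: () => number;

  constructor(windowMs: number, maxEntries: number, now: () => number = Date.now) {
    this.windowMs = windowMs;
    this.maxEntries = maxEntries;
    this.now = now;
  }

  /** Reports whether key was seen inside the window, then records this sighting. */
  checkAndRecord(key: string): { isDuplicate: boolean; entry: SeenEntry } {
    const t = this.now();
    const prev = this.entries.get(key);
    // Assumption: the window is measured from the most recent sighting.
    const entry: SeenEntry =
      prev !== undefined && t - prev.lastSeenAt < this.windowMs
        ? { count: prev.count + 1, firstSeenAt: prev.firstSeenAt, lastSeenAt: t }
        : { count: 1, firstSeenAt: t, lastSeenAt: t };

    // Delete and re-insert so the key moves to the most-recently-used end.
    this.entries.delete(key);
    this.entries.set(key, entry);

    // Evict least recently used keys once the cap is exceeded.
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest);
    }
    return { isDuplicate: entry.count > 1, entry };
  }
}
```

Because eviction can drop a key that is still inside its window, a small `maxEntries` trades memory for occasional missed duplicates.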

## StuckDetector

Detects agents producing identical outputs or errors repeatedly. Uses fast djb2 hashing (no crypto overhead) to track output history per agent within a sliding window.

Detects three patterns:
- **`repeated_output`** -- The same output appears N times in a row
- **`repeated_error`** -- The same error message appears N times in a row
- **`oscillating`** -- Agent alternates between two outputs (A, B, A, B pattern)
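
The patterns above can be sketched as a djb2 hash plus two checks on the tail of a hash history. The `djb2` function follows the standard algorithm (start at 5381, multiply by 33, add each character code); `detectStuck` and its four-entry oscillation check are hypothetical and may differ from how `StuckDetector` decides internally.

```typescript
// Hypothetical sketch: djb2 hashing plus repeat and oscillation checks over a hash history.

/** djb2: hash = hash * 33 + charCode, kept in the unsigned 32-bit range. */
function djb2(text: string): number {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash << 5) + hash + text.charCodeAt(i)) >>> 0;
  }
  return hash;
}

type StuckReason = 'repeated_output' | 'oscillating';

/** Checks the tail of the history for N identical hashes or an A, B, A, B pattern. */
function detectStuck(hashes: number[], threshold: number): StuckReason | null {
  const n = hashes.length;
  if (threshold > 0 && n >= threshold) {
    const tail = hashes.slice(n - threshold);
    if (tail.every((h) => h === hashes[n - 1])) return 'repeated_output';
  }
  if (n >= 4) {
    // Assumption: four entries alternating between two distinct values count as oscillation.
    const [a, b, c, d] = hashes.slice(n - 4);
    if (a === c && b === d && a !== b) return 'oscillating';
  }
  return null;
}

const history = ['retry', 'retry', 'retry'].map((s) => djb2(s));
console.log(detectStuck(history, 3)); // prints "repeated_output"
```

Comparing 32-bit hashes instead of full strings keeps the per-agent history small, at the cost of a very small chance that two different outputs collide.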

### Config

| Option | Default | Description |
|--------|---------|-------------|
| `repetitionThreshold` | `3` | Identical outputs before flagging stuck |
| `errorRepetitionThreshold` | `3` | Identical errors before flagging stuck |
| `windowMs` | `300,000` (5 min) | Sliding window for history |
| `maxHistoryPerAgent` | `50` | Max entries tracked per agent |

### Usage

```typescript
import { StuckDetector } from '@framers/agentos';

const detector = new StuckDetector({ repetitionThreshold: 3 });

// After each LLM call, check for stuck behavior
const check = detector.recordOutput('agent-1', response.content);

if (check.isStuck) {
  console.log(`Agent stuck: ${check.reason}`);
  // check.reason is 'repeated_output' | 'repeated_error' | 'oscillating'
  // check.details has a human-readable description
  // check.repetitionCount tells you how many repeats were detected
  pauseAgent('agent-1');
}

// Also track errors inside a retry loop
const MAX_ATTEMPTS = 5; // illustrative retry budget
for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
  try {
    await callLLM();
    break; // success, stop retrying
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    const errCheck = detector.recordError('agent-1', message);
    if (errCheck.isStuck) {
      // Same error 3 times in a row -- stop retrying
      break;
    }
  }
}

// Clean up when an agent is removed
detector.clearAgent('agent-1');
```

## CostGuard

Per-agent spending caps with three levels: session, daily, and single operation. Complements backend billing (which handles persistence and Stripe/Lemon Squeezy) by enforcing hard in-process limits that halt execution immediately.

### Config

| Option | Default | Description |
|--------|---------|-------------|
| `maxSessionCostUsd` | `$1.00` | Maximum spend per agent session |
| `maxDailyCostUsd` | `$5.00` | Maximum spend per agent per day |
| `maxSingleOperationCostUsd` | `$0.50` | Maximum spend for a single operation |
| `onCapReached` | `undefined` | Callback: `(agentId, capType, currentCost, limit) => void` |

### Usage

```typescript
import { CostGuard } from '@framers/agentos';

const guard = new CostGuard({
  maxDailyCostUsd: 2.00,
  onCapReached: (agentId, capType, cost, limit) => {
    console.log(`${agentId} hit ${capType} cap: $${cost.toFixed(4)} / $${limit.toFixed(2)}`);
    safetyEngine.pauseAgent(agentId, `Cost cap '${capType}' reached`);
  },
});

// Before each operation, check affordability
const check = guard.canAfford('agent-1', 0.003); // estimated cost
if (!check.allowed) {
  throw new Error(check.reason); // e.g. "Daily cost $2.0031 would exceed limit $2.00"
}

// After the operation, record actual cost
guard.recordCost('agent-1', actualCostUsd, 'llm-call-123');

// Per-agent overrides
guard.setAgentLimits('expensive-agent', { maxDailyCostUsd: 10.00 });

// Inspect spending
const snapshot = guard.getSnapshot('agent-1');
// { sessionCostUsd: 0.42, dailyCostUsd: 1.87, isSessionCapReached: false, ... }

// Daily costs auto-reset at midnight. Manual reset:
guard.resetSession('agent-1');
guard.resetDailyAll();
```
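As a rough illustration of how three cap levels can be checked before an operation, the sketch below projects the estimated cost onto each level and reports the first one that would be exceeded. The function name, the `Caps` and `Spend` shapes, the cap labels, and the check order are all assumptions for this example and are not taken from the `CostGuard` source.

```typescript
// Hypothetical sketch of a three-level spending check; not the library's CostGuard.
interface Caps {
  maxSessionCostUsd: number;
  maxDailyCostUsd: number;
  maxSingleOperationCostUsd: number;
}

interface Spend {
  sessionCostUsd: number;
  dailyCostUsd: number;
}

type CapLabel = 'single_operation' | 'session' | 'daily'; // illustrative labels

interface AffordCheck {
  allowed: boolean;
  capType?: CapLabel;
  reason?: string;
}

function canAffordSketch(spend: Spend, estimatedCostUsd: number, caps: Caps): AffordCheck {
  // Each row: which cap, the projected spend if the operation runs, and the limit.
  const checks: Array<[CapLabel, number, number]> = [
    ['single_operation', estimatedCostUsd, caps.maxSingleOperationCostUsd],
    ['session', spend.sessionCostUsd + estimatedCostUsd, caps.maxSessionCostUsd],
    ['daily', spend.dailyCostUsd + estimatedCostUsd, caps.maxDailyCostUsd],
  ];
  for (const [capType, projected, limit] of checks) {
    if (projected > limit) {
      return {
        allowed: false,
        capType,
        reason: `${capType} cost $${projected.toFixed(4)} would exceed limit $${limit.toFixed(2)}`,
      };
    }
  }
  return { allowed: true };
}

// Defaults from the config table above.
const defaults: Caps = { maxSessionCostUsd: 1, maxDailyCostUsd: 5, maxSingleOperationCostUsd: 0.5 };
console.log(canAffordSketch({ sessionCostUsd: 0.42, dailyCostUsd: 1.87 }, 0.003, defaults).allowed); // prints true
```

Checking the projected total (current spend plus estimate) rather than current spend alone is what lets a guard refuse the operation that would cross the cap, instead of the one after it.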

## ToolExecutionGuard

Wraps tool execution with a timeout and per-tool circuit breaker. Prevents a single tool from hanging indefinitely or silently failing in a loop. Each tool gets its own circuit breaker instance and health tracking.

### Config

| Option | Default | Description |
|--------|---------|-------------|
| `defaultTimeoutMs` | `30,000` | Default timeout per tool execution |
| `toolTimeouts` | `undefined` | Per-tool timeout overrides (`Record<string, number>`) |
| `enableCircuitBreaker` | `true` | Whether each tool gets its own circuit breaker |
| `circuitBreakerConfig` | `undefined` | Config applied to per-tool circuit breakers |

### Usage

```typescript
import { ToolExecutionGuard } from '@framers/agentos';

const guard = new ToolExecutionGuard({
  defaultTimeoutMs: 15_000,
  toolTimeouts: {
    'web-search': 45_000, // Search gets more time
    'calculator': 5_000,  // Calculator should be fast
  },
});

const result = await guard.execute('web-search', async () => {
  return await searchTool.run(query);
});

if (result.success) {
  console.log(result.result);     // The tool's return value
  console.log(result.durationMs); // How long it took
} else {
  console.log(result.error);      // Error message
  console.log(result.timedOut);   // true if it was a timeout
}

// Health monitoring
const health = guard.getToolHealth('web-search');
// { totalCalls: 47, failures: 2, timeouts: 1, avgDurationMs: 3200, circuitState: 'closed' }

// All tools at once
const allHealth = guard.getAllToolHealth();
```
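One common way to build the timeout half of such a guard is to race the tool call against a timer with `Promise.race`. The sketch below shows that technique under stated assumptions: `runWithTimeout` and `TimedResult` are hypothetical names, the result shape only loosely mirrors the fields used above, and nothing here is claimed to match how `ToolExecutionGuard` is implemented.

```typescript
// Hypothetical sketch of a timeout wrapper built on Promise.race; not the library's guard.
interface TimedResult<T> {
  success: boolean;
  result?: T;
  error?: string;
  timedOut: boolean;
  durationMs: number;
}

async function runWithTimeout<T>(
  fn: () => Promise<T>,
  timeoutMs: number,
): Promise<TimedResult<Awaited<T>>> {
  const start = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  let timedOut = false;

  // A promise that only ever rejects, once the timer fires.
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      timedOut = true;
      reject(new Error(`Timed out after ${timeoutMs}ms`));
    }, timeoutMs);
  });

  try {
    // Whichever settles first wins: the tool call or the timer.
    const result = await Promise.race([fn(), timeout]);
    return { success: true, result, timedOut: false, durationMs: Date.now() - start };
  } catch (err) {
    return {
      success: false,
      error: err instanceof Error ? err.message : String(err),
      timedOut,
      durationMs: Date.now() - start,
    };
  } finally {
    clearTimeout(timer); // avoid leaving a pending timer behind
  }
}
```

Note that racing against a timer stops waiting for the tool but does not cancel the underlying work; a tool that supports `AbortSignal` would need the signal passed in separately.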

## How They Work Together

In Wunderland, all six primitives are wired into a single guard chain inside `WonderlandNetwork.wrapLLMCallback()`. Every LLM call passes through all layers in sequence:

```typescript
// Simplified from WonderlandNetwork.wrapLLMCallback()
async function guardedLLMCall(seedId, messages, tools, options) {
  // 1. SafetyEngine killswitch check
  const canAct = safetyEngine.canAct(seedId);
  if (!canAct.allowed) throw new Error(canAct.reason);

  // 2. CostGuard pre-check (estimated cost ~$0.001)
  const affordable = costGuard.canAfford(seedId, 0.001);
  if (!affordable.allowed) throw new Error(affordable.reason);

  // 3. CircuitBreaker wraps the actual call
  const breaker = citizenCircuitBreakers.get(seedId);
  const start = Date.now();
  const response = await breaker.execute(() => originalLLM(messages, tools, options));

  // 4. CostGuard records actual cost from token usage
  if (response.usage) {
    const cost = response.usage.prompt_tokens * 0.000003
      + response.usage.completion_tokens * 0.000006;
    costGuard.recordCost(seedId, cost);
  }

  // 5. StuckDetector checks for repetition
  if (response.content) {
    const stuck = stuckDetector.recordOutput(seedId, response.content);
    if (stuck.isStuck) {
      safetyEngine.pauseAgent(seedId, `Stuck: ${stuck.details}`);
    }
  }

  // 6. AuditLog records the event
  auditLog.log({
    seedId,
    action: 'llm_call',
    outcome: 'success',
    durationMs: Date.now() - start,
    metadata: { tokens: response.usage?.total_tokens },
  });

  return response;
}
```
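The cost formula in step 4 is easier to reason about with numbers plugged in. The short example below reuses the two per-token rates from the snippet above (equivalent to $3 and $6 per million tokens); the `estimateCostUsd` helper and the token counts are made up for illustration, and real pricing depends on the model.

```typescript
// Worked example of the step 4 cost formula; rates copied from the simplified guard chain.
const PROMPT_USD_PER_TOKEN = 0.000003;
const COMPLETION_USD_PER_TOKEN = 0.000006;

interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
}

function estimateCostUsd(usage: Usage): number {
  return (
    usage.prompt_tokens * PROMPT_USD_PER_TOKEN +
    usage.completion_tokens * COMPLETION_USD_PER_TOKEN
  );
}

// 1,000 prompt tokens + 500 completion tokens:
// 1000 * 0.000003 + 500 * 0.000006 = 0.003 + 0.003 = 0.006 USD
const exampleCost = estimateCostUsd({ prompt_tokens: 1000, completion_tokens: 500 });
console.log(exampleCost.toFixed(4)); // prints "0.0060"

// Number of calls of this size that fit under the default $5.00 daily cap.
console.log(Math.floor(5 / 0.006)); // prints 833
```

At these rates the fixed pre-check estimate of $0.001 is lower than the cost of a call this size, which is why recording the actual cost afterwards matters.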

Additionally, `ActionDeduplicator`, `ToolExecutionGuard`, and the Wunderland-specific `ContentSimilarityDedup` are used in other parts of the network:

- **ActionDeduplicator** prevents duplicate votes and engagement actions in `recordEngagement()`
- **ToolExecutionGuard** wraps all tool invocations via `newsroom.setToolGuard()`
- **ContentSimilarityDedup** (Wunderland-specific) catches near-identical posts using Jaccard similarity on trigram shingles
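
For readers unfamiliar with the last technique, Jaccard similarity on trigram shingles can be sketched in a few lines. This is a generic illustration: it assumes character trigrams over lowercased, whitespace-normalized text, and the function names are hypothetical. Whether `ContentSimilarityDedup` shingles by character or by word, and what threshold it applies, is not stated here.

```typescript
// Hypothetical sketch of Jaccard similarity over character trigram shingles.

/** Splits normalized text into its set of overlapping 3-character shingles. */
function trigrams(text: string): Set<string> {
  const normalized = text.toLowerCase().replace(/\s+/g, ' ').trim();
  const shingles = new Set<string>();
  for (let i = 0; i + 3 <= normalized.length; i++) {
    shingles.add(normalized.slice(i, i + 3));
  }
  return shingles;
}

/** Jaccard similarity: |A intersect B| / |A union B|, defined as 1 for two empty sets. */
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let intersection = 0;
  a.forEach((s) => {
    if (b.has(s)) intersection++;
  });
  return intersection / (a.size + b.size - intersection);
}

// "abcd" -> {abc, bcd}, "abce" -> {abc, bce}: one shared shingle out of three distinct.
console.log(jaccard(trigrams('abcd'), trigrams('abce'))); // 1 / 3
```

A score near 1 means two posts share almost all of their shingles, so a dedup layer can treat anything above a chosen threshold as a near-duplicate.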

## Defense Matrix

| Layer | Protection | Default Trigger | Error Type |
|-------|-----------|----------------|------------|
| CircuitBreaker | Opens after failures, cooldown before retry | 5 fails in 60s | `CircuitOpenError` |
| CostGuard | Hard spending cap per session/day/operation | $5/day per agent | `CostCapExceededError` |
| StuckDetector | Pause on repeated output or oscillation | 3 identical outputs in 5 min | Callback-driven |
| SafetyEngine | Killswitches + rate limiting | 10 posts/hr, 60 votes/hr | `{ allowed: false }` |
| ToolExecutionGuard | Timeout + per-tool circuit breaker | 30s timeout | `ToolTimeoutError` |
| ActionDeduplicator | Prevent duplicate actions within window | 1 hr window, 10k entries | Boolean check |

## Imports

All primitives are exported from the `@framers/agentos` package:

```typescript
import {
  CircuitBreaker,
  CircuitOpenError,
  ActionDeduplicator,
  StuckDetector,
  CostGuard,
  CostCapExceededError,
  ToolExecutionGuard,
  ToolTimeoutError,
} from '@framers/agentos';
```

The Wunderland-specific components (`SafetyEngine`, `ActionAuditLog`, `ContentSimilarityDedup`) are in `@framers/wunderland/social`:

```typescript
import { SafetyEngine, ActionAuditLog, ContentSimilarityDedup } from '@framers/wunderland/social';
```

src/core/safety/ToolExecutionGuard.ts

Lines changed: 2 additions & 2 deletions
@@ -18,7 +18,7 @@ export interface ToolExecutionGuardConfig {
   circuitBreakerConfig?: Partial<Omit<CircuitBreakerConfig, 'name'>>;
 }
 
-export interface ToolExecutionResult<T = unknown> {
+export interface GuardedToolResult<T = unknown> {
   success: boolean;
   result?: T;
   error?: string;
@@ -67,7 +67,7 @@ export class ToolExecutionGuard {
     this.config = { ...DEFAULT_CONFIG, ...config };
   }
 
-  async execute<T>(toolName: string, fn: () => Promise<T>): Promise<ToolExecutionResult<T>> {
+  async execute<T>(toolName: string, fn: () => Promise<T>): Promise<GuardedToolResult<T>> {
     const stats = this.getOrCreateStats(toolName);
     stats.totalCalls++;
     const start = Date.now();

src/core/safety/index.ts

Lines changed: 1 addition & 1 deletion
@@ -17,4 +17,4 @@ export { CostGuard, CostCapExceededError } from './CostGuard.js';
 export type { CostGuardConfig, CostCapType, CostRecord, CostSnapshot } from './CostGuard.js';
 
 export { ToolExecutionGuard, ToolTimeoutError } from './ToolExecutionGuard.js';
-export type { ToolExecutionGuardConfig, ToolExecutionResult, ToolHealthReport } from './ToolExecutionGuard.js';
+export type { ToolExecutionGuardConfig, GuardedToolResult, ToolHealthReport } from './ToolExecutionGuard.js';
