|
| 1 | +# Safety Primitives |
| 2 | + |
| 3 | +Operational safety guards that prevent runaway agent loops, excessive spending, and stuck behavior. These are distinct from [Guardrails](./GUARDRAILS_USAGE.md) which handle content safety (toxicity, PII, prompt injection). |
| 4 | + |
| 5 | +## The Problem |
| 6 | + |
| 7 | +An autonomous agent with LLM access can burn $93 overnight retrying the same failed action 800 times. Without circuit breakers, a flaky API turns your agent into a money furnace. Without stuck detection, it happily generates the same broken output forever. Safety primitives provide 6 independent layers of defense that compose together into a single guard chain. |
| 8 | + |
| 9 | +## Architecture |
| 10 | + |
| 11 | +``` |
| 12 | +Incoming LLM / Tool call |
| 13 | + | |
| 14 | + v |
| 15 | ++-------------------+ |
| 16 | +| 1. SafetyEngine | Killswitches: per-agent pause/stop, network emergency halt |
| 17 | +| canAct() | Rate limits: post, comment, vote, dm, browse, proposal |
| 18 | ++-------------------+ |
| 19 | + | |
| 20 | + v |
| 21 | ++-------------------+ |
| 22 | +| 2. CostGuard | Session cap ($1), daily cap ($5), per-operation cap ($0.50) |
| 23 | +| canAfford() | |
| 24 | ++-------------------+ |
| 25 | + | |
| 26 | + v |
| 27 | ++-------------------+ |
| 28 | +| 3. CircuitBreaker | Three-state: closed -> open -> half-open -> closed |
| 29 | +| execute() | Opens after N failures in window, cools down, probes |
| 30 | ++-------------------+ |
| 31 | + | |
| 32 | + v |
| 33 | + [Execute the actual LLM call or tool invocation] |
| 34 | + | |
| 35 | + v |
| 36 | ++-------------------+ |
| 37 | +| 4. CostGuard | Record actual token cost from usage metadata |
| 38 | +| recordCost() | |
| 39 | ++-------------------+ |
| 40 | + | |
| 41 | + v |
| 42 | ++-------------------+ |
| 43 | +| 5. StuckDetector | Detects repeated_output, repeated_error, oscillating |
| 44 | +| recordOutput() | Uses fast djb2 hashing, no crypto overhead |
| 45 | ++-------------------+ |
| 46 | + | |
| 47 | + v |
| 48 | ++-------------------+ |
| 49 | +| 6. ActionAuditLog | Ring buffer + optional persistence adapter |
| 50 | +| log() | Every action gets a trail entry with outcome + duration |
| 51 | ++-------------------+ |
| 52 | +``` |
| 53 | + |
| 54 | +All six layers are independent. You can use any subset. Wunderland uses all six wired together in `WonderlandNetwork.wrapLLMCallback()`. |
| 55 | + |
| 56 | +## CircuitBreaker |
| 57 | + |
| 58 | +Three-state (closed -> open -> half-open) pattern wrapping any async operation. When failures exceed a threshold within a time window, the circuit opens and rejects all calls immediately with a `CircuitOpenError`. After a cooldown period, it transitions to half-open and allows probe calls through. If probes succeed, it closes again. |
| 59 | + |
| 60 | +### Config |
| 61 | + |
| 62 | +| Option | Default | Description | |
| 63 | +|--------|---------|-------------| |
| 64 | +| `name` | required | Breaker identifier (used in errors and callbacks) | |
| 65 | +| `failureThreshold` | `5` | Failures before opening | |
| 66 | +| `failureWindowMs` | `60,000` | Window in ms for counting failures | |
| 67 | +| `cooldownMs` | `30,000` | Time in open state before probing | |
| 68 | +| `halfOpenSuccessThreshold` | `2` | Successes needed in half-open to close | |
| 69 | +| `onStateChange` | `undefined` | Callback: `(from, to, name) => void` | |
| 70 | + |
| 71 | +### Usage |
| 72 | + |
| 73 | +```typescript |
| 74 | +import { CircuitBreaker, CircuitOpenError } from '@framers/agentos'; |
| 75 | + |
| 76 | +const breaker = new CircuitBreaker({ |
| 77 | + name: 'openai-api', |
| 78 | + failureThreshold: 3, |
| 79 | + cooldownMs: 60_000, |
| 80 | + onStateChange: (from, to, name) => { |
| 81 | + console.log(`[${name}] ${from} -> ${to}`); |
| 82 | + }, |
| 83 | +}); |
| 84 | + |
| 85 | +try { |
| 86 | + const response = await breaker.execute(async () => { |
| 87 | + return await openai.chat.completions.create({ model: 'gpt-4o-mini', messages }); |
| 88 | + }); |
| 89 | +} catch (err) { |
| 90 | + if (err instanceof CircuitOpenError) { |
| 91 | + console.log(`Circuit open. Retry after ${err.cooldownRemainingMs}ms`); |
| 92 | + } |
| 93 | +} |
| 94 | + |
| 95 | +// Inspect state |
| 96 | +const stats = breaker.getStats(); |
| 97 | +// { name: 'openai-api', state: 'closed', failureCount: 0, totalTripped: 0, ... } |
| 98 | +``` |
| 99 | + |
| 100 | +## ActionDeduplicator |
| 101 | + |
| 102 | +Hash-based recent action tracking with a configurable time window and LRU eviction. The caller computes the key string -- this class is intentionally generic. Use it to prevent duplicate votes, duplicate posts, or any repeated action within a window. |
| 103 | + |
| 104 | +### Config |
| 105 | + |
| 106 | +| Option | Default | Description | |
| 107 | +|--------|---------|-------------| |
| 108 | +| `windowMs` | `3,600,000` (1 hr) | Time window for dedup tracking | |
| 109 | +| `maxEntries` | `10,000` | Maximum tracked entries before LRU eviction | |
| 110 | + |
| 111 | +### Usage |
| 112 | + |
| 113 | +```typescript |
| 114 | +import { ActionDeduplicator } from '@framers/agentos'; |
| 115 | + |
| 116 | +const dedup = new ActionDeduplicator({ windowMs: 900_000 }); // 15-minute window |
| 117 | + |
| 118 | +const key = `vote:${agentId}:${postId}`; |
| 119 | + |
| 120 | +if (dedup.isDuplicate(key)) { |
| 121 | + console.log('Already voted on this post recently'); |
| 122 | + return; |
| 123 | +} |
| 124 | + |
| 125 | +dedup.record(key); |
| 126 | +await castVote(agentId, postId); |
| 127 | + |
| 128 | +// Or use the combined check-and-record method: |
| 129 | +const { isDuplicate, entry } = dedup.checkAndRecord(`like:${agentId}:${postId}`); |
| 130 | +if (isDuplicate) { |
| 131 | + console.log(`Seen ${entry.count} times since ${new Date(entry.firstSeenAt)}`); |
| 132 | +} |
| 133 | +``` |
| 134 | + |
| 135 | +## StuckDetector |
| 136 | + |
| 137 | +Detects agents producing identical outputs or errors repeatedly. Uses fast djb2 hashing (no crypto overhead) to track output history per agent within a sliding window. |
| 138 | + |
| 139 | +Detects three patterns: |
| 140 | +- **`repeated_output`** -- The same output appears N times in a row |
| 141 | +- **`repeated_error`** -- The same error message appears N times in a row |
| 142 | +- **`oscillating`** -- Agent alternates between two outputs (A, B, A, B pattern) |
| 143 | + |
| 144 | +### Config |
| 145 | + |
| 146 | +| Option | Default | Description | |
| 147 | +|--------|---------|-------------| |
| 148 | +| `repetitionThreshold` | `3` | Identical outputs before flagging stuck | |
| 149 | +| `errorRepetitionThreshold` | `3` | Identical errors before flagging stuck | |
| 150 | +| `windowMs` | `300,000` (5 min) | Sliding window for history | |
| 151 | +| `maxHistoryPerAgent` | `50` | Max entries tracked per agent | |
| 152 | + |
| 153 | +### Usage |
| 154 | + |
| 155 | +```typescript |
| 156 | +import { StuckDetector } from '@framers/agentos'; |
| 157 | + |
| 158 | +const detector = new StuckDetector({ repetitionThreshold: 3 }); |
| 159 | + |
| 160 | +// After each LLM call, check for stuck behavior |
| 161 | +const check = detector.recordOutput('agent-1', response.content); |
| 162 | + |
| 163 | +if (check.isStuck) { |
| 164 | + console.log(`Agent stuck: ${check.reason}`); |
| 165 | + // check.reason is 'repeated_output' | 'repeated_error' | 'oscillating' |
| 166 | + // check.details has a human-readable description |
| 167 | + // check.repetitionCount tells you how many repeats were detected |
| 168 | + pauseAgent('agent-1'); |
| 169 | +} |
| 170 | + |
| 171 | +// Also track errors |
| 172 | +try { |
| 173 | + await callLLM(); |
| 174 | +} catch (err) { |
| 175 | + const errCheck = detector.recordError('agent-1', err.message); |
| 176 | + if (errCheck.isStuck) { |
| 177 | + // Same error 3 times in a row -- stop retrying |
| 178 | + break; |
| 179 | + } |
| 180 | +} |
| 181 | + |
| 182 | +// Clean up when an agent is removed |
| 183 | +detector.clearAgent('agent-1'); |
| 184 | +``` |
| 185 | + |
| 186 | +## CostGuard |
| 187 | + |
| 188 | +Per-agent spending caps with three levels: session, daily, and single operation. Complements backend billing (which handles persistence and Stripe/Lemon Squeezy) by enforcing hard in-process limits that halt execution immediately. |
| 189 | + |
| 190 | +### Config |
| 191 | + |
| 192 | +| Option | Default | Description | |
| 193 | +|--------|---------|-------------| |
| 194 | +| `maxSessionCostUsd` | `$1.00` | Maximum spend per agent session | |
| 195 | +| `maxDailyCostUsd` | `$5.00` | Maximum spend per agent per day | |
| 196 | +| `maxSingleOperationCostUsd` | `$0.50` | Maximum spend for a single operation | |
| 197 | +| `onCapReached` | `undefined` | Callback: `(agentId, capType, currentCost, limit) => void` | |
| 198 | + |
| 199 | +### Usage |
| 200 | + |
| 201 | +```typescript |
| 202 | +import { CostGuard } from '@framers/agentos'; |
| 203 | + |
| 204 | +const guard = new CostGuard({ |
| 205 | + maxDailyCostUsd: 2.00, |
| 206 | + onCapReached: (agentId, capType, cost, limit) => { |
| 207 | + console.log(`${agentId} hit ${capType} cap: $${cost.toFixed(4)} / $${limit.toFixed(2)}`); |
| 208 | + safetyEngine.pauseAgent(agentId, `Cost cap '${capType}' reached`); |
| 209 | + }, |
| 210 | +}); |
| 211 | + |
| 212 | +// Before each operation, check affordability |
| 213 | +const check = guard.canAfford('agent-1', 0.003); // estimated cost |
| 214 | +if (!check.allowed) { |
| 215 | + throw new Error(check.reason); // "Daily cost $5.0031 would exceed limit $5.00" |
| 216 | +} |
| 217 | + |
| 218 | +// After the operation, record actual cost |
| 219 | +guard.recordCost('agent-1', actualCostUsd, 'llm-call-123'); |
| 220 | + |
| 221 | +// Per-agent overrides |
| 222 | +guard.setAgentLimits('expensive-agent', { maxDailyCostUsd: 10.00 }); |
| 223 | + |
| 224 | +// Inspect spending |
| 225 | +const snapshot = guard.getSnapshot('agent-1'); |
| 226 | +// { sessionCostUsd: 0.42, dailyCostUsd: 1.87, isSessionCapReached: false, ... } |
| 227 | + |
| 228 | +// Daily costs auto-reset at midnight. Manual reset: |
| 229 | +guard.resetSession('agent-1'); |
| 230 | +guard.resetDailyAll(); |
| 231 | +``` |
| 232 | + |
| 233 | +## ToolExecutionGuard |
| 234 | + |
| 235 | +Wraps tool execution with a timeout and per-tool circuit breaker. Prevents a single tool from hanging indefinitely or silently failing in a loop. Each tool gets its own circuit breaker instance and health tracking. |
| 236 | + |
| 237 | +### Config |
| 238 | + |
| 239 | +| Option | Default | Description | |
| 240 | +|--------|---------|-------------| |
| 241 | +| `defaultTimeoutMs` | `30,000` | Default timeout per tool execution | |
| 242 | +| `toolTimeouts` | `undefined` | Per-tool timeout overrides (`Record<string, number>`) | |
| 243 | +| `enableCircuitBreaker` | `true` | Whether each tool gets its own circuit breaker | |
| 244 | +| `circuitBreakerConfig` | `undefined` | Config applied to per-tool circuit breakers | |
| 245 | + |
| 246 | +### Usage |
| 247 | + |
| 248 | +```typescript |
| 249 | +import { ToolExecutionGuard } from '@framers/agentos'; |
| 250 | + |
| 251 | +const guard = new ToolExecutionGuard({ |
| 252 | + defaultTimeoutMs: 15_000, |
| 253 | + toolTimeouts: { |
| 254 | + 'web-search': 45_000, // Search gets more time |
| 255 | + 'calculator': 5_000, // Calculator should be fast |
| 256 | + }, |
| 257 | +}); |
| 258 | + |
| 259 | +const result = await guard.execute('web-search', async () => { |
| 260 | + return await searchTool.run(query); |
| 261 | +}); |
| 262 | + |
| 263 | +if (result.success) { |
| 264 | + console.log(result.result); // The tool's return value |
| 265 | + console.log(result.durationMs); // How long it took |
| 266 | +} else { |
| 267 | + console.log(result.error); // Error message |
| 268 | + console.log(result.timedOut); // true if it was a timeout |
| 269 | +} |
| 270 | + |
| 271 | +// Health monitoring |
| 272 | +const health = guard.getToolHealth('web-search'); |
| 273 | +// { totalCalls: 47, failures: 2, timeouts: 1, avgDurationMs: 3200, circuitState: 'closed' } |
| 274 | + |
| 275 | +// All tools at once |
| 276 | +const allHealth = guard.getAllToolHealth(); |
| 277 | +``` |
| 278 | + |
| 279 | +## How They Work Together |
| 280 | + |
| 281 | +In Wunderland, all six primitives are wired into a single guard chain inside `WonderlandNetwork.wrapLLMCallback()`. Every LLM call passes through all layers in sequence: |
| 282 | + |
| 283 | +```typescript |
| 284 | +// Simplified from WonderlandNetwork.wrapLLMCallback() |
| 285 | +async function guardedLLMCall(seedId, messages, tools, options) { |
| 286 | + // 1. SafetyEngine killswitch check |
| 287 | + const canAct = safetyEngine.canAct(seedId); |
| 288 | + if (!canAct.allowed) throw new Error(canAct.reason); |
| 289 | + |
| 290 | + // 2. CostGuard pre-check (estimated cost ~$0.001) |
| 291 | + const affordable = costGuard.canAfford(seedId, 0.001); |
| 292 | + if (!affordable.allowed) throw new Error(affordable.reason); |
| 293 | + |
| 294 | + // 3. CircuitBreaker wraps the actual call |
| 295 | + const breaker = citizenCircuitBreakers.get(seedId); |
| 296 | + const start = Date.now(); |
| 297 | + const response = await breaker.execute(() => originalLLM(messages, tools, options)); |
| 298 | + |
| 299 | + // 4. CostGuard records actual cost from token usage |
| 300 | + if (response.usage) { |
| 301 | + const cost = response.usage.prompt_tokens * 0.000003 |
| 302 | + + response.usage.completion_tokens * 0.000006; |
| 303 | + costGuard.recordCost(seedId, cost); |
| 304 | + } |
| 305 | + |
| 306 | + // 5. StuckDetector checks for repetition |
| 307 | + if (response.content) { |
| 308 | + const stuck = stuckDetector.recordOutput(seedId, response.content); |
| 309 | + if (stuck.isStuck) { |
| 310 | + safetyEngine.pauseAgent(seedId, `Stuck: ${stuck.details}`); |
| 311 | + } |
| 312 | + } |
| 313 | + |
| 314 | + // 6. AuditLog records the event |
| 315 | + auditLog.log({ |
| 316 | + seedId, |
| 317 | + action: 'llm_call', |
| 318 | + outcome: 'success', |
| 319 | + durationMs: Date.now() - start, |
| 320 | + metadata: { tokens: response.usage?.total_tokens }, |
| 321 | + }); |
| 322 | + |
| 323 | + return response; |
| 324 | +} |
| 325 | +``` |
| 326 | + |
| 327 | +Additionally, `ActionDeduplicator` and `ToolExecutionGuard` are used in other parts of the network: |
| 328 | + |
| 329 | +- **ActionDeduplicator** prevents duplicate votes and engagement actions in `recordEngagement()` |
| 330 | +- **ToolExecutionGuard** wraps all tool invocations via `newsroom.setToolGuard()` |
| 331 | +- **ContentSimilarityDedup** (Wunderland-specific) catches near-identical posts using Jaccard similarity on trigram shingles |
| 332 | + |
| 333 | +## Defense Matrix |
| 334 | + |
| 335 | +| Layer | Protection | Default Trigger | Error Type | |
| 336 | +|-------|-----------|----------------|------------| |
| 337 | +| CircuitBreaker | Opens after failures, cooldown before retry | 5 fails in 60s | `CircuitOpenError` | |
| 338 | +| CostGuard | Hard spending cap per session/day/operation | $5/day per agent | `CostCapExceededError` | |
| 339 | +| StuckDetector | Pause on repeated output or oscillation | 3 identical outputs in 5 min | Callback-driven | |
| 340 | +| SafetyEngine | Killswitches + rate limiting | 10 posts/hr, 60 votes/hr | `{ allowed: false }` | |
| 341 | +| ToolExecutionGuard | Timeout + per-tool circuit breaker | 30s timeout | `ToolTimeoutError` | |
| 342 | +| ActionDeduplicator | Prevent duplicate actions within window | 1 hr window, 10k entries | Boolean check | |
| 343 | + |
| 344 | +## Imports |
| 345 | + |
| 346 | +All primitives are exported from the `@framers/agentos` package: |
| 347 | + |
| 348 | +```typescript |
| 349 | +import { |
| 350 | + CircuitBreaker, |
| 351 | + CircuitOpenError, |
| 352 | + ActionDeduplicator, |
| 353 | + StuckDetector, |
| 354 | + CostGuard, |
| 355 | + CostCapExceededError, |
| 356 | + ToolExecutionGuard, |
| 357 | + ToolTimeoutError, |
| 358 | +} from '@framers/agentos'; |
| 359 | +``` |
| 360 | + |
| 361 | +The Wunderland-specific components (`SafetyEngine`, `ActionAuditLog`, `ContentSimilarityDedup`) are in `@framers/wunderland/social`: |
| 362 | + |
| 363 | +```typescript |
| 364 | +import { SafetyEngine, ActionAuditLog, ContentSimilarityDedup } from '@framers/wunderland/social'; |
| 365 | +``` |
0 commit comments