Commit e7ef5d8

Author: ConnorWhelan11
Message: latency benchmark results
Parent: 9dfb58d

7 files changed, +734 -2 lines changed

README.md

Lines changed: 12 additions & 0 deletions
@@ -120,6 +120,18 @@ See `packages/clawdstrike-openclaw/docs/getting-started.md`.
| **Fail-Closed Design** | Invalid policies reject at load time; errors deny access |
| **Signed Receipts** | Tamper-evident audit trail with Ed25519 signatures |

## Performance

Guard checks add **<0.05ms** overhead per tool call. For context, typical LLM API calls take 500-2000ms; the percentages below assume a 1000ms call.

| Operation | Latency | % of LLM call |
|-----------|---------|---------------|
| Single guard check | <0.001ms | <0.0001% |
| Full policy evaluation | ~0.04ms | ~0.004% |
| Jailbreak detection (heuristic+statistical) | ~0.03ms | ~0.003% |

No external API calls required for core detection. [Full benchmarks →](docs/src/reference/benchmarks.md)

## Documentation

- [Design Philosophy](docs/src/concepts/design-philosophy.md): Fail-closed, defense in depth

docs/src/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -54,6 +54,7 @@
- [TypeScript](reference/api/typescript.md)
- [Python](reference/api/python.md)
- [CLI](reference/api/cli.md)
- [Benchmarks](reference/benchmarks.md)

# Recipes

docs/src/reference/benchmarks.md

Lines changed: 152 additions & 0 deletions
@@ -0,0 +1,152 @@
# Performance Benchmarks

Clawdstrike is designed for minimal latency overhead at the tool boundary. This page documents our benchmark methodology and results.

## Summary

| Operation | Average Latency | Context |
|-----------|-----------------|---------|
| Individual guard check | <0.001ms | Pattern matching, allowlist lookup |
| PolicyEngine.evaluate() | ~0.04ms | Full policy evaluation |
| Jailbreak detection (heuristic+statistical) | ~0.03ms | Without ML/LLM layers |
| Combined tool boundary check | ~0.05ms | All guards + jailbreak |

**Bottom line:** Guard overhead is roughly 0.01% or less of typical LLM API latency (500-2000ms).
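
The percentage is simply guard latency divided by LLM call latency. A minimal sketch of that arithmetic, using the ~0.05ms combined figure from the table above; the helper name `overheadPercent` is illustrative and not part of the SDK:

```typescript
// Illustrative helper, not part of @clawdstrike/sdk.
function overheadPercent(guardMs: number, llmCallMs: number): number {
  return (guardMs / llmCallMs) * 100;
}

// "Combined tool boundary check" latency from the summary table.
const combinedCheckMs = 0.05;

// Evaluate at the low end, midpoint-ish, and high end of the quoted LLM latency range.
for (const llmCallMs of [500, 1000, 2000]) {
  const pct = overheadPercent(combinedCheckMs, llmCallMs);
  console.log(`${llmCallMs}ms LLM call: ${pct.toFixed(4)}% overhead`);
}
```

The worst case in the quoted range is the fastest LLM call (500ms), where the overhead works out to about 0.01%; slower calls push the share lower.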

## Running Benchmarks

### TypeScript SDK (`@clawdstrike/sdk`)

```bash
cd packages/hush-ts
npm run bench

# JSON output for CI
npm run bench:json
```

### OpenClaw Plugin (`@clawdstrike/clawdstrike-security`)

```bash
cd packages/clawdstrike-openclaw
npm run bench

# JSON output for CI
npm run bench:json
```

### Rust CLI

```bash
# After building in release mode.
# Note: each iteration launches a new process, so the total includes
# process startup cost, not just the check itself.
time (for i in {1..100}; do ./target/release/hush check --action-type file --ruleset strict /tmp/test.txt; done)
```

## Detailed Results

### Guard Latency (TypeScript)

Benchmarked on Apple M1 Pro, Node.js v20.x:

```
======================================================================
BENCHMARK RESULTS
======================================================================
Benchmark                          Avg (ms) Min (ms) Max (ms)  Ops/sec
----------------------------------------------------------------------
ForbiddenPath (safe)                 0.0003   0.0000   0.0420  3125000
ForbiddenPath (blocked)              0.0003   0.0000   0.0210  3571428
SecretLeak (clean)                   0.0004   0.0000   0.0420  2500000
SecretLeak (detected)                0.0002   0.0000   0.0210  4166666
EgressAllowlist (allowed)            0.0002   0.0000   0.0210  4545454
EgressAllowlist (blocked)            0.0003   0.0000   0.0420  3571428
Jailbreak Heuristic (safe)           0.0005   0.0000   0.0840  2000000
Jailbreak Heuristic (detected)       0.0003   0.0000   0.0420  3333333
Jailbreak Statistical (safe)         0.0089   0.0000   0.0840   112359
Jailbreak Statistical (suspicious)   0.0126   0.0000   0.0840    79365
Combined Tool Check                  0.0004   0.0000   0.0420  2380952
Jailbreak Full Pipeline              0.0067   0.0000   0.0420   149253
======================================================================
```
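
For readers who want to reproduce numbers of this shape, here is a rough sketch of how Avg/Min/Max/Ops-per-second columns could be collected. This is not the project's bench script; the `bench` function, the `BenchResult` type, and the stand-in regex are all assumptions made for illustration:

```typescript
// Hypothetical micro-benchmark harness; the project's actual bench script may differ.
interface BenchResult {
  name: string;
  avgMs: number;
  minMs: number;
  maxMs: number;
  opsPerSec: number;
}

function bench(name: string, fn: () => void, iterations = 10000): BenchResult {
  // Warm up so JIT compilation does not dominate the first samples.
  for (let i = 0; i < 1000; i++) fn();

  let totalMs = 0;
  let minMs = Infinity;
  let maxMs = 0;
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    fn();
    const elapsed = performance.now() - start;
    totalMs += elapsed;
    if (elapsed < minMs) minMs = elapsed;
    if (elapsed > maxMs) maxMs = elapsed;
  }

  const avgMs = totalMs / iterations;
  // Guard against a zero average when the timer is too coarse to see the work.
  const opsPerSec = avgMs > 0 ? 1000 / avgMs : Infinity;
  return { name, avgMs, minMs, maxMs, opsPerSec };
}

// Stand-in for a guard check: a pre-compiled regex tested against a path.
const sshPathPattern = /(^|\/)\.ssh(\/|$)/;
const result = bench("ForbiddenPath (safe)", () => {
  sshPathPattern.test("/tmp/test.txt");
});
console.log(result);
```

With per-call timing like this, an operation faster than the timer's resolution can register as zero elapsed time, which may be why the Min column above reads 0.0000.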

### PolicyEngine Latency (OpenClaw Plugin)

```
================================================================================
POLICYENGINE BENCHMARK RESULTS
================================================================================
Benchmark                          Avg (ms)  p50 (ms)  p95 (ms)  p99 (ms)
--------------------------------------------------------------------------------
File Read (allowed)                  0.0350    0.0330    0.0420    0.0830
File Read (blocked)                  0.0380    0.0330    0.0420    0.1670
Network Egress (allowed)             0.0340    0.0330    0.0420    0.0420
Network Egress (blocked)             0.0360    0.0330    0.0420    0.0830
Command Exec                         0.0320    0.0330    0.0420    0.0420
Rapid Sequential (10 checks)         0.3400    0.3330    0.4170    0.4580
================================================================================

Summary:
  Average single-check overhead: 0.0350ms
  Typical LLM API latency: 500-2000ms
  Guard overhead as % of LLM: 0.0035%
  Verdict: Negligible impact on agent performance
```
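
The p50/p95/p99 columns are percentiles over per-call latency samples. The sketch below uses the nearest-rank method, which is an assumption; the bench script may compute percentiles differently. The sample values are made up for illustration and are not measured data:

```typescript
// Nearest-rank percentile over latency samples (assumed method, for illustration).
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  // Copy before sorting so the caller's array is left untouched.
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Illustrative samples only: mostly fast calls plus one slow outlier.
const samplesMs = [0.033, 0.033, 0.034, 0.033, 0.042, 0.033, 0.035, 0.083, 0.033, 0.034];

console.log({
  p50: percentile(samplesMs, 50),
  p95: percentile(samplesMs, 95),
  p99: percentile(samplesMs, 99),
});
```

Tail percentiles are worth tracking alongside the average because a single slow call, such as a garbage-collection pause, barely moves the mean but shows up clearly at p99.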

## Why It's Fast

1. **No network calls**: Core detection is self-contained, with no external API dependencies
2. **Pattern pre-compilation**: Regex patterns are compiled once at startup
3. **Early exit**: Fail-fast evaluation stops on first violation
4. **Minimal allocations**: Hot paths avoid heap allocations where possible
5. **Optional expensive layers**: ML and LLM-as-judge are opt-in for high-stakes decisions
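
Two of the points above, pattern pre-compilation and early exit, can be sketched in a few lines. The types, guard names, and patterns here are hypothetical and chosen for illustration; they are not the SDK's actual API:

```typescript
// Hypothetical guard shapes for illustration; not the SDK's actual types.
interface Verdict {
  allowed: boolean;
  reason?: string;
}
type Guard = (input: string) => Verdict;

// Patterns are compiled once at module load, not on every call.
const FORBIDDEN_PATHS: RegExp[] = [/(^|\/)\.ssh(\/|$)/, /(^|\/)\.aws\/credentials$/];
const SECRET_PATTERNS: RegExp[] = [/AKIA[0-9A-Z]{16}/];

const forbiddenPathGuard: Guard = (input) =>
  FORBIDDEN_PATHS.some((re) => re.test(input))
    ? { allowed: false, reason: "forbidden path" }
    : { allowed: true };

const secretLeakGuard: Guard = (input) =>
  SECRET_PATTERNS.some((re) => re.test(input))
    ? { allowed: false, reason: "possible secret" }
    : { allowed: true };

// Fail-fast evaluation: stop at the first guard that denies.
function evaluate(guards: Guard[], input: string): Verdict {
  for (const guard of guards) {
    const verdict = guard(input);
    if (!verdict.allowed) return verdict;
  }
  return { allowed: true };
}

const guards = [forbiddenPathGuard, secretLeakGuard];
console.log(evaluate(guards, "/home/user/.ssh/id_rsa"));
console.log(evaluate(guards, "/tmp/test.txt"));
```

The first call is denied by the path guard, so the secret guard never runs; the second passes both guards and is allowed.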

## Latency Budget

For a typical agentic workflow:

| Phase | Latency |
|-------|---------|
| User input processing | 1-5ms |
| **Clawdstrike preflight check** | **<0.1ms** |
| LLM API call | 500-2000ms |
| Tool execution | 10-1000ms |
| **Clawdstrike post-action check** | **<0.1ms** |
| Response formatting | 1-5ms |

Clawdstrike adds <0.2ms to a workflow that typically takes 500-3000ms.
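
The totals can be checked by summing the table's ranges. In this sketch the "<0.1ms" entries are treated as a flat 0.1ms upper bound, which is an assumption made to keep the arithmetic simple:

```typescript
// Phase ranges copied from the table above, in milliseconds.
// The "<0.1ms" guard entries are modeled as 0.1ms (an upper-bound assumption).
interface Phase {
  name: string;
  minMs: number;
  maxMs: number;
  isGuard: boolean;
}

const phases: Phase[] = [
  { name: "User input processing", minMs: 1, maxMs: 5, isGuard: false },
  { name: "Clawdstrike preflight check", minMs: 0.1, maxMs: 0.1, isGuard: true },
  { name: "LLM API call", minMs: 500, maxMs: 2000, isGuard: false },
  { name: "Tool execution", minMs: 10, maxMs: 1000, isGuard: false },
  { name: "Clawdstrike post-action check", minMs: 0.1, maxMs: 0.1, isGuard: true },
  { name: "Response formatting", minMs: 1, maxMs: 5, isGuard: false },
];

const totalMinMs = phases.reduce((sum, p) => sum + p.minMs, 0);
const totalMaxMs = phases.reduce((sum, p) => sum + p.maxMs, 0);
const guardMs = phases.filter((p) => p.isGuard).reduce((sum, p) => sum + p.maxMs, 0);

console.log(`Workflow total: ${totalMinMs.toFixed(1)}-${totalMaxMs.toFixed(1)}ms`);
console.log(`Guard share: ${guardMs.toFixed(1)}ms`);
```

Under these assumptions the workflow total comes out at roughly 512ms to 3010ms, with the two guard checks contributing about 0.2ms of that.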

## CI Integration

Benchmarks can output JSON for tracking performance over time:

```bash
# --silent keeps npm's own script banner out of the redirected file
OUTPUT_JSON=1 npm run --silent bench > benchmark-results.json
```

Example JSON output:

```json
{
  "timestamp": "2026-02-03T12:00:00.000Z",
  "node": "v20.10.0",
  "summary": {
    "avgOverheadMs": 0.035,
    "overheadPercent": 0.0035
  },
  "results": [
    { "name": "File Read (allowed)", "avgMs": 0.035, "p50Ms": 0.033, "p95Ms": 0.042, "p99Ms": 0.083 }
  ]
}
```
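
One way a CI job might consume this output is to compare each benchmark's p99 against a latency budget. The sketch below assumes the JSON shape shown above; the `BenchReport` type, the `findRegressions` function, and the 0.5ms budget are hypothetical choices, not part of the project:

```typescript
// Shape inferred from the example JSON above; treat it as an assumption.
interface BenchReport {
  timestamp: string;
  node: string;
  summary: { avgOverheadMs: number; overheadPercent: number };
  results: { name: string; avgMs: number; p50Ms: number; p95Ms: number; p99Ms: number }[];
}

// Returns the names of benchmarks whose p99 exceeds the given budget.
function findRegressions(report: BenchReport, p99BudgetMs: number): string[] {
  return report.results.filter((r) => r.p99Ms > p99BudgetMs).map((r) => r.name);
}

// In CI this string would be read from benchmark-results.json instead.
const report: BenchReport = JSON.parse(`{
  "timestamp": "2026-02-03T12:00:00.000Z",
  "node": "v20.10.0",
  "summary": { "avgOverheadMs": 0.035, "overheadPercent": 0.0035 },
  "results": [
    { "name": "File Read (allowed)", "avgMs": 0.035, "p50Ms": 0.033, "p95Ms": 0.042, "p99Ms": 0.083 }
  ]
}`);

// 0.5ms is an arbitrary example budget, well above the measured p99 values.
const slow = findRegressions(report, 0.5);
console.log(slow.length === 0 ? "All benchmarks within budget" : `Over budget: ${slow.join(", ")}`);
```

A real job would fail the build when the returned list is non-empty. Setting the budget generously above measured values helps avoid flaky failures on noisy shared runners.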

## Comparison with External Guardrails

Some guardrail solutions call external APIs for every check:

| Approach | Latency | Cost |
|----------|---------|------|
| Clawdstrike (built-in) | <0.1ms | Free |
| External model API (e.g., Gray Swan) | 100-500ms | Per-request |
| LLM-as-judge | 500-2000ms | Per-request |

Clawdstrike's multi-layer approach runs fast heuristic/statistical checks first and invokes expensive layers only when needed.
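
The layering idea can be sketched as a cheap scorer that escalates only when its result is ambiguous. Everything here is hypothetical: the `detect` function, the score thresholds, and the stand-in layers are assumptions for illustration, and the real escalation logic may be quite different:

```typescript
// Hypothetical layered detector; names and thresholds are illustrative only.
type Layer = (text: string) => number; // risk score in [0, 1]

interface LayeredResult {
  score: number;
  layersRun: string[];
}

function detect(
  text: string,
  cheap: Layer,
  expensive: Layer,
  lowThreshold = 0.2,
  highThreshold = 0.8,
): LayeredResult {
  const cheapScore = cheap(text);
  // Clearly benign or clearly malicious: the cheap layer's answer is enough.
  if (cheapScore <= lowThreshold || cheapScore >= highThreshold) {
    return { score: cheapScore, layersRun: ["cheap"] };
  }
  // Ambiguous: escalate. A real ML or LLM layer would likely be asynchronous.
  const expensiveScore = expensive(text);
  return { score: Math.max(cheapScore, expensiveScore), layersRun: ["cheap", "expensive"] };
}

// Stand-in cheap layer: pre-compiled patterns mapped to rough scores.
const overridePattern = /ignore (all )?previous instructions/i;
const roleplayPattern = /pretend|roleplay/i;
const heuristic: Layer = (text) =>
  overridePattern.test(text) ? 0.9 : roleplayPattern.test(text) ? 0.5 : 0;

// Stand-in expensive layer that counts how often it is invoked.
let expensiveCalls = 0;
const judge: Layer = (_text) => {
  expensiveCalls++;
  return 0.3;
};

console.log(detect("What is the weather today?", heuristic, judge));
console.log(detect("Let's pretend you are a pirate", heuristic, judge));
console.log(`Expensive layer calls: ${expensiveCalls}`);
```

Only the second, ambiguous input reaches the expensive layer, so the common case pays just the cheap layer's cost.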
