
Commit ad193a1

feat(patterns): add trust middleware guardrails recipe
1 parent 34022c5 commit ad193a1

3 files changed

Lines changed: 300 additions & 0 deletions


authors.yaml

Lines changed: 4 additions & 0 deletions
@@ -128,6 +128,10 @@ rodrigo-olivares:
   name: Rodrigo Olivares
   website: https://github.com/rodrigo-olivares
   avatar: https://avatars.githubusercontent.com/u/185015001?v=4
+sriram7737:
+  name: Sriram Rampelli
+  website: https://github.com/sriram7737
+  avatar: https://avatars.githubusercontent.com/u/79433129?v=4
 zealoushacker:
   name: Alex Notov
   website: https://github.com/zealoushacker
Lines changed: 282 additions & 0 deletions
@@ -0,0 +1,282 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Adding a trust layer to Claude agent loops\n",
+    "\n",
+    "Claude can reason about tools, but production agent loops still need deterministic controls outside the model. This notebook shows a guardrails-as-code pattern: let Claude propose useful work, then validate tool calls, enforce policy, and emit an inspectable trace before anything consequential executes. It extends the agent-loop ideas in Anthropic's [Building Effective Agents](https://www.anthropic.com/engineering/building-effective-agents) post with a small middleware layer. Pramagent is used as the concrete implementation, but the pattern is the important part: the LLM is not the final authority."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install -q \"anthropic>=0.87.0\" \"pramagent>=0.8.0\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import asyncio\n",
+    "import json\n",
+    "import os\n",
+    "from pprint import pprint\n",
+    "\n",
+    "import anthropic\n",
+    "from pramagent import Pramagent, Verdict\n",
+    "from pramagent.layers import ComplianceLayer, HITLLayer, ReliabilityLayer, Rule, SafetyLayer\n",
+    "from pramagent.layers import ToolGuardLayer, ToolPolicy\n",
+    "from pramagent.layers.tool_guard import SideEffect\n",
+    "from pramagent.providers import AnthropicProvider\n",
+    "\n",
+    "MODEL = \"claude-haiku-4-5\"\n",
+    "api_key = os.environ.get(\"ANTHROPIC_API_KEY\")\n",
+    "if not api_key:\n",
+    "    raise RuntimeError(\"Set ANTHROPIC_API_KEY in your environment before running this notebook.\")\n",
+    "\n",
+    "client = anthropic.Anthropic(api_key=api_key)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## The baseline agent loop\n",
+    "\n",
+    "Start with a tiny Claude tool loop with one mock side-effect tool, `send_email`. In a naive loop, Claude chooses the tool and its arguments and the application executes that proposal directly; the cell below stops at printing the proposal. That is fine for a demo, but it leaves several production gaps: unvalidated tool calls, no tenant policy, no human approval gate, and no audit trace explaining why execution was allowed."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "tools = [\n",
+    "    {\n",
+    "        \"name\": \"send_email\",\n",
+    "        \"description\": \"Send an email to a user. This is a mock tool in the notebook.\",\n",
+    "        \"input_schema\": {\n",
+    "            \"type\": \"object\",\n",
+    "            \"properties\": {\n",
+    "                \"to\": {\"type\": \"string\"},\n",
+    "                \"subject\": {\"type\": \"string\"},\n",
+    "                \"body\": {\"type\": \"string\"},\n",
+    "            },\n",
+    "            \"required\": [\"to\", \"subject\", \"body\"],\n",
+    "        },\n",
+    "    }\n",
+    "]\n",
+    "\n",
+    "baseline_prompt = \"Send an email to sam@example.com saying the deployment report is ready.\"\n",
+    "\n",
+    "message = client.messages.create(\n",
+    "    model=MODEL,\n",
+    "    max_tokens=400,\n",
+    "    tools=tools,\n",
+    "    messages=[{\"role\": \"user\", \"content\": baseline_prompt}],\n",
+    ")\n",
+    "\n",
+    "raw_tool_uses = [block for block in message.content if block.type == \"tool_use\"]\n",
+    "print(\"Claude proposed tool calls:\")\n",
+    "for tool_use in raw_tool_uses:\n",
+    "    print(tool_use.name)\n",
+    "    pprint(tool_use.input)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Wrapping the loop with a trust layer\n",
+    "\n",
+    "The trust layer sits between the model output and tool execution. Tool policy is defined as code, not as a prompt instruction. Here the model may still propose `send_email`, but the middleware first checks the arguments against a schema, the tool's side-effect class, and the tenant scope. Consequential actions escalate to human review, and silence is treated as no approval: if no one approves before the HITL timeout, the action does not run."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "tool_guard = ToolGuardLayer(\n",
+    "    policies=[\n",
+    "        ToolPolicy(\n",
+    "            name=\"send_email\",\n",
+    "            side_effect=SideEffect.EXTERNAL_MESSAGE,\n",
+    "            action=Verdict.ESCALATE,\n",
+    "            allowed_tenants={\"internal_demo\"},\n",
+    "            schema={\n",
+    "                \"type\": \"object\",\n",
+    "                \"properties\": {\n",
+    "                    \"to\": {\"type\": \"string\", \"maxLength\": 200},\n",
+    "                    \"subject\": {\"type\": \"string\", \"maxLength\": 120},\n",
+    "                    \"body\": {\"type\": \"string\", \"maxLength\": 2000},\n",
+    "                },\n",
+    "                \"required\": [\"to\", \"subject\", \"body\"],\n",
+    "                \"additionalProperties\": False,\n",
+    "            },\n",
+    "            detail=\"External message requires explicit approval before execution.\",\n",
+    "        )\n",
+    "    ]\n",
+    ")\n",
+    "\n",
+    "armor = Pramagent(\n",
+    "    provider=AnthropicProvider(model=MODEL, max_tokens=400),\n",
+    "    compliance=ComplianceLayer(),\n",
+    "    safety=SafetyLayer(\n",
+    "        rules=[\n",
+    "            Rule(\n",
+    "                rule_id=\"block_bulk_export\",\n",
+    "                action=Verdict.BLOCK,\n",
+    "                pattern=r\"\\b(dump|export)\\b.*\\b(users?|accounts?|secrets?)\\b\",\n",
+    "                detail=\"Bulk data export is blocked before provider contact.\",\n",
+    "            )\n",
+    "        ]\n",
+    "    ),\n",
+    "    reliability=ReliabilityLayer(max_concurrent=4, timeout_s=30.0),\n",
+    "    hitl=HITLLayer(require_approval_for=[\"send_email\"], timeout_s=2.0),\n",
+    "    tool_guard=tool_guard,\n",
+    "    escalate_policy={\"pre\": \"hitl\", \"post\": \"log\"},\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Same tool proposal, now checked before execution.\n",
+    "proposal = raw_tool_uses[0]\n",
+    "decision = armor.validate_tool(\n",
+    "    proposal.name,\n",
+    "    proposal.input,\n",
+    "    tenant_id=\"internal_demo\",\n",
+    "    session_id=\"cookbook\",\n",
+    ")\n",
+    "print(\"tool verdict:\", decision.verdict)\n",
+    "print(\"reason:\", decision.reason)\n",
+    "\n",
+    "if decision.verdict == Verdict.ALLOW:\n",
+    "    print(\"Would execute mock tool now.\")\n",
+    "elif decision.verdict == Verdict.ESCALATE:\n",
+    "    print(\"Tool execution is held for approval; no side effect runs yet.\")\n",
+    "else:\n",
+    "    print(\"Tool execution is blocked.\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# A cross-tenant attempt is rejected deterministically, regardless of model wording.\n",
+    "cross_tenant_decision = armor.validate_tool(\n",
+    "    \"send_email\",\n",
+    "    {\"to\": \"sam@example.com\", \"subject\": \"Report\", \"body\": \"The deployment report is ready.\"},\n",
+    "    tenant_id=\"external_tenant\",\n",
+    "    session_id=\"cookbook\",\n",
+    ")\n",
+    "print(\"cross-tenant verdict:\", cross_tenant_decision.verdict)\n",
+    "print(\"reason:\", cross_tenant_decision.reason)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Trace as a first-class response field\n",
+    "\n",
+    "Now run an agent request through the full trust stack. This example asks for an email action, so Claude is called, but the final action is held by the human-in-the-loop gate. The important distinction is that this is not a model refusal; it is a deterministic application control saying the action was not executed without approval."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "response = await armor.run(\n",
+    "    \"Send an email to sam@example.com saying the deployment report is ready.\",\n",
+    "    tenant_id=\"internal_demo\",\n",
+    "    session_id=\"cookbook-trace\",\n",
+    "    action=\"send_email\",\n",
+    ")\n",
+    "\n",
+    "print(\"blocked:\", response.blocked)\n",
+    "print(\"hitl:\", response.hitl)\n",
+    "print(\"output:\", response.output)\n",
+    "print(\"trace hash:\", response.trace.this_hash)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "trace_rows = [\n",
+    "    {\n",
+    "        \"layer\": event.layer,\n",
+    "        \"decision\": event.decision,\n",
+    "        \"detail\": event.detail,\n",
+    "        \"latency_ms\": round(event.latency_ms, 2),\n",
+    "    }\n",
+    "    for event in response.trace.layer_events\n",
+    "]\n",
+    "pprint(trace_rows)\n",
+    "print(\"chain valid:\", armor.audit.verify_chain())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## When to use this pattern\n",
+    "\n",
+    "This pattern is useful when an agent can call tools, move data, contact users, modify accounts, or trigger any other side effect. It is especially helpful for RBAC-scoped agents, regulated workflows, human approval queues, and systems where an auditor needs to reconstruct why a decision was allowed, blocked, or held. The tradeoff is extra code and, for HITL paths, extra latency. Keep the policy narrow: guard the consequential paths first instead of wrapping every harmless text-only call."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Recap and resources\n",
+    "\n",
+    "- Keep model reasoning and control boundaries separate.\n",
+    "- Let Claude propose useful actions, but validate tool execution outside the model.\n",
+    "- Treat `ESCALATE` as a real state: held until approval, not silently executed.\n",
+    "- Emit structured traces so reviewers can inspect the decision path later.\n",
+    "\n",
+    "Resources:\n",
+    "\n",
+    "- [Anthropic: Building Effective Agents](https://www.anthropic.com/engineering/building-effective-agents)\n",
+    "- [Pramagent on GitHub](https://github.com/sriram7737/pramagent)\n",
+    "- [Pramagent on PyPI](https://pypi.org/project/pramagent/)\n",
+    "- [Implementation status](https://github.com/sriram7737/pramagent/blob/main/docs/IMPLEMENTATION_STATUS.md)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "pygments_lexer": "ipython3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
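The `block_bulk_export` rule in the notebook's `SafetyLayer` is meant to stop a prompt before any provider call is made. The sketch below shows that idea with only the standard library, assuming the rule is a plain `re.search` over the raw prompt. The notebook does not say whether Pramagent normalizes text or matches case-insensitively, so this version matches the pattern exactly as written. `SafetyRule` and `screen_prompt` are hypothetical names, not Pramagent's API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyRule:
    """Hypothetical regex rule with the rule_id/pattern/detail fields seen in the notebook."""
    rule_id: str
    pattern: str
    detail: str


BULK_EXPORT = SafetyRule(
    rule_id="block_bulk_export",
    pattern=r"\b(dump|export)\b.*\b(users?|accounts?|secrets?)\b",  # copied from the notebook
    detail="Bulk data export is blocked before provider contact.",
)


def screen_prompt(prompt: str, rules: list[SafetyRule]) -> tuple[bool, str]:
    """Return (allowed, reason); meant to run before any model call is made."""
    for rule in rules:
        # Case-sensitive search over the raw prompt (an assumption, see above).
        if re.search(rule.pattern, prompt):
            return False, f"{rule.rule_id}: {rule.detail}"
    return True, "no rule matched"


print(screen_prompt("please export all user accounts to a CSV", [BULK_EXPORT]))
# (False, 'block_bulk_export: Bulk data export is blocked before provider contact.')
print(screen_prompt("Send an email to sam@example.com saying the deployment report is ready.", [BULK_EXPORT]))
# (True, 'no rule matched')
```

Regex screens are cheap and deterministic but coarse in both directions: "export the chart for the accounts team" also matches, while "Export all users" does not, because the capitalized word misses the case-sensitive pattern. Prefixing the pattern with an inline `(?i)` flag closes the second gap at the cost of more false positives.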

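The notebook's `ToolPolicy` combines a tenant allowlist, a JSON schema with `additionalProperties: False` and `maxLength` limits, and a default verdict of `ESCALATE`. The following plain-Python sketch re-implements that kind of decision so the control flow is visible without Pramagent. It rests on assumptions: `Verdict`, `ToolPolicy`, `ToolDecision`, and `validate_tool` here are hypothetical local stand-ins, the check order (unknown tool, tenant, schema, then policy action) is this sketch's own choice, and only the few schema keywords the notebook uses are checked; a real deployment would use a full JSON Schema validator.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class Verdict(Enum):
    """Local stand-in for the three outcomes used in the notebook."""
    ALLOW = "allow"
    ESCALATE = "escalate"
    BLOCK = "block"


@dataclass(frozen=True)
class ToolPolicy:
    """Hypothetical policy: tenant allowlist plus a tiny subset of JSON Schema."""
    name: str
    action: Verdict                   # verdict returned when every check passes
    allowed_tenants: frozenset[str]
    properties: dict[str, int]        # field name -> maxLength (all fields are strings)
    required: tuple[str, ...]


@dataclass(frozen=True)
class ToolDecision:
    verdict: Verdict
    reason: str


def validate_tool(policies: dict[str, ToolPolicy], name: str,
                  args: dict[str, Any], tenant_id: str) -> ToolDecision:
    policy = policies.get(name)
    if policy is None:
        # Default-deny: a tool without a policy never runs.
        return ToolDecision(Verdict.BLOCK, f"no policy for tool {name!r}")
    if tenant_id not in policy.allowed_tenants:
        return ToolDecision(Verdict.BLOCK, f"tenant {tenant_id!r} may not call {name!r}")
    missing = [key for key in policy.required if key not in args]
    if missing:
        return ToolDecision(Verdict.BLOCK, f"missing required fields: {missing}")
    # Equivalent of "additionalProperties": False.
    extra = sorted(set(args) - set(policy.properties))
    if extra:
        return ToolDecision(Verdict.BLOCK, f"unexpected fields: {extra}")
    for key, value in args.items():
        limit = policy.properties[key]
        if not isinstance(value, str) or len(value) > limit:
            return ToolDecision(Verdict.BLOCK, f"{key!r} must be a string of at most {limit} chars")
    # All deterministic checks passed; the policy decides whether that means ALLOW or ESCALATE.
    return ToolDecision(policy.action, "tenant and schema checks passed")


POLICIES = {
    "send_email": ToolPolicy(
        name="send_email",
        action=Verdict.ESCALATE,  # consequential side effect: hold for a human
        allowed_tenants=frozenset({"internal_demo"}),
        properties={"to": 200, "subject": 120, "body": 2000},
        required=("to", "subject", "body"),
    )
}

args = {"to": "sam@example.com", "subject": "Report", "body": "The deployment report is ready."}
print(validate_tool(POLICIES, "send_email", args, "internal_demo").verdict)    # Verdict.ESCALATE
print(validate_tool(POLICIES, "send_email", args, "external_tenant").verdict)  # Verdict.BLOCK
print(validate_tool(POLICIES, "send_email", {**args, "bcc": "x@example.com"}, "internal_demo").verdict)  # Verdict.BLOCK
```

Returning `BLOCK` for unknown tools makes this guard default-deny, in line with the notebook's point that the model is not the final authority; whether Pramagent treats tools without a policy the same way is not shown in the notebook.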
registry.yaml

Lines changed: 14 additions & 0 deletions
@@ -523,6 +523,20 @@
   date: '2026-06-08'
   categories:
   - Agent Patterns
+- title: Adding a trust layer to Claude agent loops
+  description: Wrap a Claude tool loop with deterministic guardrails-as-code for
+    tool validation, HITL escalation, and inspectable audit traces.
+  path: patterns/agents/trust_middleware_guardrails.ipynb
+  authors:
+  - sriram7737
+  date: '2026-06-16'
+  categories:
+  - Agent Patterns
+  - Tools
+  tags:
+  - guardrails
+  - tool-use
+  - safety
 - title: Introduction to Claude Skills
   description: Create documents, analyze data, automate workflows with Claude's Excel,
     PowerPoint, PDF skills.
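The notebook configures `HITLLayer(require_approval_for=["send_email"], timeout_s=2.0)` and states that silence is treated as no approval, which describes a fail-closed approval gate. A minimal asyncio sketch of that behavior follows, assuming a reviewer's answer arrives by resolving a future; `ApprovalGate`, `request_approval`, and `decide` are hypothetical names, not Pramagent's API, and the 0.1-second timeout only keeps the demo fast.

```python
import asyncio


class ApprovalGate:
    """Hypothetical fail-closed gate: no answer before the timeout means 'not approved'."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.pending: dict[str, asyncio.Future] = {}

    async def request_approval(self, request_id: str) -> bool:
        future = asyncio.get_running_loop().create_future()
        self.pending[request_id] = future
        try:
            # wait_for cancels the future on timeout; silence is treated as a denial.
            return await asyncio.wait_for(future, timeout=self.timeout_s)
        except asyncio.TimeoutError:
            return False
        finally:
            self.pending.pop(request_id, None)

    def decide(self, request_id: str, approved: bool) -> None:
        # Late or unknown decisions are ignored: the request already failed closed.
        future = self.pending.get(request_id)
        if future is not None and not future.done():
            future.set_result(approved)


async def demo() -> tuple[bool, bool]:
    gate = ApprovalGate(timeout_s=0.1)
    # Nobody answers, so the held send_email never runs.
    silent = await gate.request_approval("send_email-1")
    # A reviewer approves shortly after the second request is raised.
    asyncio.get_running_loop().call_later(0.01, gate.decide, "send_email-2", True)
    approved = await gate.request_approval("send_email-2")
    return silent, approved


print(asyncio.run(demo()))  # (False, True)
```

Failing closed means an unattended request is dropped rather than executed; the cost is that a slow reviewer can make a legitimate action time out.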

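The trace cells print `response.trace.this_hash` and call `armor.audit.verify_chain()`, which suggests each trace record is hash-linked to the one before it. The notebook does not show how Pramagent computes these hashes, so what follows only sketches the general hash-chain technique using `hashlib.sha256` over canonical JSON. `AuditLog`, `append`, `verify_chain`, and the sample events are hypothetical and made up for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


@dataclass
class AuditLog:
    """Append-only log where each record commits to the previous record's hash."""
    records: list[dict] = field(default_factory=list)

    @staticmethod
    def _digest(prev_hash: str, event: dict) -> str:
        # Canonical JSON (sorted keys, no spaces) so the same event always hashes the same way.
        payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, event: dict) -> str:
        prev_hash = self.records[-1]["this_hash"] if self.records else GENESIS
        this_hash = self._digest(prev_hash, event)
        self.records.append({"event": event, "prev_hash": prev_hash, "this_hash": this_hash})
        return this_hash

    def verify_chain(self) -> bool:
        prev_hash = GENESIS
        for record in self.records:
            if record["prev_hash"] != prev_hash:
                return False
            if self._digest(prev_hash, record["event"]) != record["this_hash"]:
                return False
            prev_hash = record["this_hash"]
        return True


log = AuditLog()
log.append({"layer": "tool_guard", "decision": "escalate", "detail": "External message requires explicit approval before execution."})
log.append({"layer": "hitl", "decision": "block", "detail": "no approval before timeout"})
print(log.verify_chain())  # True

# Rewriting an earlier decision invalidates its stored hash.
log.records[0]["event"]["decision"] = "allow"
print(log.verify_chain())  # False
```

A hash chain proves internal consistency only: whoever can rewrite every record can also recompute every hash, so the newest hash is usually copied somewhere the log writer cannot modify.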