feat(patterns): add trust middleware guardrails recipe

sriram7737 · sriram7737 · commit ad193a150956 · 2026-06-16T14:58:30.000-04:00
diff --git a/authors.yaml b/authors.yaml
@@ -128,6 +128,10 @@ rodrigo-olivares:
   name: Rodrigo Olivares
   website: https://github.com/rodrigo-olivares
   avatar: https://avatars.githubusercontent.com/u/185015001?v=4
+sriram7737:
+  name: Sriram Rampelli
+  website: https://github.com/sriram7737
+  avatar: https://avatars.githubusercontent.com/u/79433129?v=4
 zealoushacker:
   name: Alex Notov
   website: https://github.com/zealoushacker
diff --git a/patterns/agents/trust_middleware_guardrails.ipynb b/patterns/agents/trust_middleware_guardrails.ipynb
@@ -0,0 +1,282 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Adding a trust layer to Claude agent loops\n",
+    "\n",
+    "Claude can reason about tools, but production agent loops still need deterministic controls outside the model. This notebook shows a guardrails-as-code pattern: let Claude propose useful work, then validate tool calls, enforce policy, and emit an inspectable trace before anything consequential executes. It extends the agent-loop ideas in Anthropic's [Building Effective Agents](https://www.anthropic.com/engineering/building-effective-agents) post with a small middleware layer. Pramagent is used as the concrete implementation, but the pattern is the important part: the LLM is not the final authority."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install -q \"anthropic>=0.87.0\" \"pramagent>=0.8.0\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import asyncio\n",
+    "import json\n",
+    "import os\n",
+    "from pprint import pprint\n",
+    "\n",
+    "import anthropic\n",
+    "from pramagent import Pramagent, Verdict\n",
+    "from pramagent.layers import ComplianceLayer, HITLLayer, ReliabilityLayer, Rule, SafetyLayer\n",
+    "from pramagent.layers import ToolGuardLayer, ToolPolicy\n",
+    "from pramagent.layers.tool_guard import SideEffect\n",
+    "from pramagent.providers import AnthropicProvider\n",
+    "\n",
+    "MODEL = \"claude-haiku-4-5\"\n",
+    "api_key = os.environ.get(\"ANTHROPIC_API_KEY\")\n",
+    "if not api_key:\n",
+    "    raise RuntimeError(\"Set ANTHROPIC_API_KEY in your environment before running this notebook.\")\n",
+    "\n",
+    "client = anthropic.Anthropic(api_key=api_key)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## The baseline agent loop\n",
+    "\n",
+    "Start with a tiny Claude tool loop that has one mock side-effect tool, `send_email`. In the baseline, Claude chooses the tool and its arguments, and a typical application would then execute the proposed call directly (the cell below only prints the proposal). That is fine for a demo, but it leaves several production gaps: tool calls are not validated, there is no tenant policy or human approval gate, and no audit trace explains why execution was allowed."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "tools = [\n",
+    "    {\n",
+    "        \"name\": \"send_email\",\n",
+    "        \"description\": \"Send an email to a user. This is a mock tool in the notebook.\",\n",
+    "        \"input_schema\": {\n",
+    "            \"type\": \"object\",\n",
+    "            \"properties\": {\n",
+    "                \"to\": {\"type\": \"string\"},\n",
+    "                \"subject\": {\"type\": \"string\"},\n",
+    "                \"body\": {\"type\": \"string\"},\n",
+    "            },\n",
+    "            \"required\": [\"to\", \"subject\", \"body\"],\n",
+    "        },\n",
+    "    }\n",
+    "]\n",
+    "\n",
+    "baseline_prompt = \"Send an email to sam@example.com saying the deployment report is ready.\"\n",
+    "\n",
+    "message = client.messages.create(\n",
+    "    model=MODEL,\n",
+    "    max_tokens=400,\n",
+    "    tools=tools,\n",
+    "    messages=[{\"role\": \"user\", \"content\": baseline_prompt}],\n",
+    ")\n",
+    "\n",
+    "raw_tool_uses = [block for block in message.content if block.type == \"tool_use\"]\n",
+    "print(\"Claude proposed tool calls:\")\n",
+    "for tool_use in raw_tool_uses:\n",
+    "    print(tool_use.name)\n",
+    "    pprint(tool_use.input)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Wrapping the loop with a trust layer\n",
+    "\n",
+    "The trust layer sits between the model output and tool execution. Tool policy is defined as code, not as a prompt instruction. Here the model may propose `send_email`, but the middleware first checks the argument schema, the side-effect class, and the tenant scope. Consequential actions escalate to human review, and silence from the reviewer is treated as no approval."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "tool_guard = ToolGuardLayer(\n",
+    "    policies=[\n",
+    "        ToolPolicy(\n",
+    "            name=\"send_email\",\n",
+    "            side_effect=SideEffect.EXTERNAL_MESSAGE,\n",
+    "            action=Verdict.ESCALATE,\n",
+    "            allowed_tenants={\"internal_demo\"},\n",
+    "            schema={\n",
+    "                \"type\": \"object\",\n",
+    "                \"properties\": {\n",
+    "                    \"to\": {\"type\": \"string\", \"maxLength\": 200},\n",
+    "                    \"subject\": {\"type\": \"string\", \"maxLength\": 120},\n",
+    "                    \"body\": {\"type\": \"string\", \"maxLength\": 2000},\n",
+    "                },\n",
+    "                \"required\": [\"to\", \"subject\", \"body\"],\n",
+    "                \"additionalProperties\": False,\n",
+    "            },\n",
+    "            detail=\"External message requires explicit approval before execution.\",\n",
+    "        )\n",
+    "    ]\n",
+    ")\n",
+    "\n",
+    "armor = Pramagent(\n",
+    "    provider=AnthropicProvider(model=MODEL, max_tokens=400),\n",
+    "    compliance=ComplianceLayer(),\n",
+    "    safety=SafetyLayer(\n",
+    "        rules=[\n",
+    "            Rule(\n",
+    "                rule_id=\"block_bulk_export\",\n",
+    "                action=Verdict.BLOCK,\n",
+    "                pattern=r\"\\b(dump|export)\\b.*\\b(users?|accounts?|secrets?)\\b\",\n",
+    "                detail=\"Bulk data export is blocked before provider contact.\",\n",
+    "            )\n",
+    "        ]\n",
+    "    ),\n",
+    "    reliability=ReliabilityLayer(max_concurrent=4, timeout_s=30.0),\n",
+    "    hitl=HITLLayer(require_approval_for=[\"send_email\"], timeout_s=2.0),\n",
+    "    tool_guard=tool_guard,\n",
+    "    escalate_policy={\"pre\": \"hitl\", \"post\": \"log\"},\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Same tool proposal, now checked before execution (assumes Claude proposed at least one tool call above).\n",
+    "proposal = raw_tool_uses[0]\n",
+    "decision = armor.validate_tool(\n",
+    "    proposal.name,\n",
+    "    proposal.input,\n",
+    "    tenant_id=\"internal_demo\",\n",
+    "    session_id=\"cookbook\",\n",
+    ")\n",
+    "print(\"tool verdict:\", decision.verdict)\n",
+    "print(\"reason:\", decision.reason)\n",
+    "\n",
+    "if decision.verdict == Verdict.ALLOW:\n",
+    "    print(\"Would execute mock tool now.\")\n",
+    "elif decision.verdict == Verdict.ESCALATE:\n",
+    "    print(\"Tool execution is held for approval; no side effect runs yet.\")\n",
+    "else:\n",
+    "    print(\"Tool execution is blocked.\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# A cross-tenant attempt is rejected deterministically, regardless of model wording.\n",
+    "cross_tenant_decision = armor.validate_tool(\n",
+    "    \"send_email\",\n",
+    "    {\"to\": \"sam@example.com\", \"subject\": \"Report\", \"body\": \"The deployment report is ready.\"},\n",
+    "    tenant_id=\"external_tenant\",\n",
+    "    session_id=\"cookbook\",\n",
+    ")\n",
+    "print(\"cross-tenant verdict:\", cross_tenant_decision.verdict)\n",
+    "print(\"reason:\", cross_tenant_decision.reason)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Trace as a first-class response field\n",
+    "\n",
+    "Now run an agent request through the full trust stack. The request asks for an email action, so Claude is called, but the human-in-the-loop gate holds the final action. This is not a model refusal: it is a deterministic application control that withholds execution until the action is approved."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "response = await armor.run(\n",
+    "    \"Send an email to sam@example.com saying the deployment report is ready.\",\n",
+    "    tenant_id=\"internal_demo\",\n",
+    "    session_id=\"cookbook-trace\",\n",
+    "    action=\"send_email\",\n",
+    ")\n",
+    "\n",
+    "print(\"blocked:\", response.blocked)\n",
+    "print(\"hitl:\", response.hitl)\n",
+    "print(\"output:\", response.output)\n",
+    "print(\"trace hash:\", response.trace.this_hash)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "trace_rows = [\n",
+    "    {\n",
+    "        \"layer\": event.layer,\n",
+    "        \"decision\": event.decision,\n",
+    "        \"detail\": event.detail,\n",
+    "        \"latency_ms\": round(event.latency_ms, 2),\n",
+    "    }\n",
+    "    for event in response.trace.layer_events\n",
+    "]\n",
+    "pprint(trace_rows)\n",
+    "print(\"chain valid:\", armor.audit.verify_chain())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## When to use this pattern\n",
+    "\n",
+    "This pattern is useful when an agent can call tools, move data, contact users, modify accounts, or trigger any other side effect. It is especially helpful for RBAC-scoped agents, regulated workflows, human approval queues, and systems where an auditor needs to reconstruct why a decision was allowed, blocked, or held. The tradeoff is extra code and, on human-in-the-loop (HITL) paths, extra latency. Keep the policy narrow: guard the consequential paths first instead of wrapping every harmless text-only call."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Recap and resources\n",
+    "\n",
+    "- Keep model reasoning and control boundaries separate.\n",
+    "- Let Claude propose useful actions, but validate tool execution outside the model.\n",
+    "- Treat `ESCALATE` as a real state: held until approval, not silently executed.\n",
+    "- Emit structured traces so reviewers can inspect the decision path later.\n",
+    "\n",
+    "Resources:\n",
+    "\n",
+    "- [Anthropic: Building Effective Agents](https://www.anthropic.com/engineering/building-effective-agents)\n",
+    "- [Pramagent on GitHub](https://github.com/sriram7737/pramagent)\n",
+    "- [Pramagent on PyPI](https://pypi.org/project/pramagent/)\n",
+    "- [Implementation status](https://github.com/sriram7737/pramagent/blob/main/docs/IMPLEMENTATION_STATUS.md)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "pygments_lexer": "ipython3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/registry.yaml b/registry.yaml
@@ -523,6 +523,20 @@
   date: '2026-06-08'
   categories:
   - Agent Patterns
+- title: Adding a trust layer to Claude agent loops
+  description: Wrap a Claude tool loop with deterministic guardrails-as-code for
+    tool validation, HITL escalation, and inspectable audit traces.
+  path: patterns/agents/trust_middleware_guardrails.ipynb
+  authors:
+  - sriram7737
+  date: '2026-06-16'
+  categories:
+  - Agent Patterns
+  - Tools
+  tags:
+  - guardrails
+  - tool-use
+  - safety
 - title: Introduction to Claude Skills
   description: Create documents, analyze data, automate workflows with Claude's Excel,
     PowerPoint, PDF skills.