Commit 567cc76

docs: add design patterns for securing agents research analysis
1 parent 03942a1 commit 567cc76

1 file changed

Lines changed: 212 additions & 0 deletions

@@ -0,0 +1,212 @@
# Design Patterns for Securing LLM Agents - Research Analysis

## Paper Summary

**Full Title:** Design Patterns for Securing LLM Agents against Prompt Injections
**Authors:** Luca Beurer-Kellner (Invariant Labs), Beat Buesser (IBM), Ana-Maria Creţu (EPFL), Edoardo Debenedetti (ETH Zurich), Daniel Dobos (Swisscom), Daniel Fabian (Google), Marc Fischer (Invariant Labs), David Froelicher (Swisscom), Kathrin Grosse (IBM), Daniel Naeff (ETH AI Center), Ezinwanne Ozoani (AppliedAI Institute for Europe), Andrew Paverd (Microsoft), Florian Tramèr (ETH Zurich), Václav Volhejn (Kyutai)
**Date:** June 2025 (arXiv:2506.08837v1)
**URL:** https://arxiv.org/html/2506.08837v1

### Key Contribution

This paper presents **six principled design patterns** for building LLM agents with **provable resistance to prompt injection attacks**. The authors argue that general-purpose agents built on current models are unlikely to provide meaningful security guarantees, and instead propose constrained agent designs that trade some utility for security.

### Core Design Patterns

All six patterns follow a common principle: **once an LLM agent has ingested untrusted input, it must be constrained so that the input cannot trigger consequential actions**.

#### 1. Action-Selector Pattern
- **Description:** The agent acts as an LLM-modulated "switch" that selects from a fixed set of predefined actions
- **Security:** Immune to indirect prompt injection, because action results are never fed back to the agent
- **Trade-off:** Limited utility; loses fuzzy search capabilities
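
The switch behaviour can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `choose_action` is a hypothetical callable standing in for the LLM, and the action names and replies are invented for the example.

```python
from typing import Callable, Dict

# Hypothetical fixed action table; the model can only pick a key from it.
ACTIONS: Dict[str, Callable[[], str]] = {
    "show_order_status": lambda: "Your order has shipped.",
    "show_refund_policy": lambda: "Refunds are accepted within 30 days.",
}
FALLBACK = "Sorry, I can't help with that."


def run_action_selector(user_prompt: str, choose_action: Callable[[str], str]) -> str:
    """Let the model pick one predefined action; never feed results back to it."""
    choice = choose_action(user_prompt).strip()
    handler = ACTIONS.get(choice)
    if handler is None:
        # Anything outside the fixed table is rejected, including injected instructions.
        return FALLBACK
    # The handler output goes to the user, not back into the model's context.
    return handler()
```

Because the model's output is only ever used as a dictionary key, a manipulated model can at worst select a different predefined action.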

#### 2. Plan-Then-Execute Pattern
- **Description:** The agent commits to a fixed action plan before processing untrusted data
- **Security:** Provides "control flow integrity": untrusted data cannot change which actions run
- **Limitation:** Untrusted data can still influence action parameters, and the pattern does not prevent injections in the user prompt itself
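
A minimal sketch of the executor side, assuming the plan has already been produced from the trusted user prompt. The `(tool, argument)` step format and the `$prev` placeholder are assumptions made for this example, not notation from the paper.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]
Step = Tuple[str, str]  # (tool name, literal argument or "$prev" for the previous output)


def execute_plan(plan: List[Step], tools: Dict[str, Tool]) -> List[str]:
    """Run a plan that was fixed before any untrusted data was read."""
    frozen = tuple(plan)  # the sequence of tool calls can no longer change
    outputs: List[str] = []
    for tool_name, arg in frozen:
        if arg == "$prev":
            # Untrusted output may flow into a parameter: the known limitation.
            arg = outputs[-1] if outputs else ""
        # Tool output is stored as data and is never parsed for new steps.
        outputs.append(tools[tool_name](arg))
    return outputs
```

An injected instruction in a tool result can end up inside the argument of the next planned step, but it cannot add, remove, or reorder steps.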

#### 3. LLM Map-Reduce Pattern
- **Description:** Dispatches isolated sub-agents to process individual pieces of untrusted data, then aggregates their results safely
- **Security:** Strict isolation keeps an injection in one piece of data from spreading to the others
- **Implementation:** Two approaches: a non-LLM reduce operation, or a constrained LLM reduce step with safety checks
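
The first approach (non-LLM reduce) might look like the sketch below. `classify` is a hypothetical stand-in for an isolated sub-agent call, and restricting its output to "yes"/"no" is one possible constraint chosen for this example.

```python
from typing import Callable, Iterable, List


def map_reduce_filter(documents: Iterable[str], classify: Callable[[str], str]) -> List[int]:
    """Return indexes of documents the isolated sub-agent marked relevant."""
    relevant: List[int] = []
    for index, document in enumerate(documents):
        # Map: each call sees exactly one document and no shared context.
        raw = classify(document)
        # Constrain the map output to a single boolean; anything else counts as "no".
        if raw.strip().lower() == "yes":
            relevant.append(index)
    # Reduce: plain code, no LLM, so map outputs cannot carry instructions forward.
    return relevant
```

A compromised map call can flip its own answer, but free-form text it emits is discarded before aggregation.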

#### 4. Dual LLM Pattern
- **Description:** Separates a privileged LLM (has tools, never sees untrusted data) from a quarantined LLM (no tools, processes untrusted data)
- **Security:** The privileged LLM handles quarantined outputs only as symbolic variables, so injected text never enters its context
- **Architecture:** Requires an orchestrator component that dereferences variables safely
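
The orchestrator's role can be sketched as follows. The class name, the `$VAR<n>` naming scheme, and the `quarantined_llm` callable are assumptions for illustration only.

```python
import re
from typing import Callable, Dict


class Orchestrator:
    """Holds quarantined outputs behind symbolic names such as $VAR1."""

    def __init__(self, quarantined_llm: Callable[[str], str]) -> None:
        self._quarantined_llm = quarantined_llm
        self._variables: Dict[str, str] = {}

    def process_untrusted(self, instruction: str, untrusted: str) -> str:
        """Run the tool-less model on untrusted data; return only a variable name."""
        name = f"$VAR{len(self._variables) + 1}"
        self._variables[name] = self._quarantined_llm(f"{instruction}\n\n{untrusted}")
        return name  # the privileged model sees this token, never the content

    def dereference(self, template: str) -> str:
        """Substitute variables just before a tool call, outside any model context."""
        return re.sub(
            r"\$VAR\d+",
            lambda match: self._variables.get(match.group(0), match.group(0)),
            template,
        )
```

The privileged model would plan with strings like `"Reply: $VAR1"`, and the orchestrator fills in the value only when the tool is invoked.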

#### 5. Code-Then-Execute Pattern
- **Description:** The agent writes a formal program to solve the task, and the program is then executed
- **Security:** Explicit program structure limits the attack surface
- **Generalization:** Extends plan-then-execute with formal programming constructs
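
One piece of this pattern, vetting the generated program before it runs, can be sketched with Python's standard `ast` module. This is a static allowlist check only and is not a sandbox; the function name and the idea of exposing tools as plain function names are assumptions for the example.

```python
import ast
from typing import Set


def calls_only_allowed_tools(program: str, allowed: Set[str]) -> bool:
    """Check that every call in the generated program targets an allowlisted name."""
    try:
        tree = ast.parse(program)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            # Only plain names such as read_email(...) pass; attribute calls are rejected.
            if not isinstance(node.func, ast.Name) or node.func.id not in allowed:
                return False
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            # Generated programs may not pull in extra modules.
            return False
    return True
```

A real deployment would still need to execute the approved program in a restricted interpreter, since a static check alone cannot cover every escape route.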

#### 6. Context-Minimization Pattern
- **Description:** Removes the user prompt from context once the action has been determined
- **Security:** Prevents injections in the user prompt from affecting response generation
- **Application:** Particularly useful for customer service scenarios
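
A minimal sketch of the two-step flow, assuming three hypothetical callables: `to_query` (LLM turns the prompt into a structured query), `run_query` (ordinary data lookup), and `write_reply` (LLM drafts the answer).

```python
from typing import Callable, List


def answer_with_minimized_context(
    user_prompt: str,
    to_query: Callable[[str], str],
    run_query: Callable[[str], List[str]],
    write_reply: Callable[[str], str],
) -> str:
    """Use the user prompt to pick the action, then drop it before replying."""
    # Step 1: the user prompt informs the action (here, a structured query).
    query = to_query(user_prompt)
    results = run_query(query)
    # Step 2: the reply prompt is built from results only; the user prompt is dropped.
    reply_prompt = "Summarize these results for the customer:\n" + "\n".join(results)
    return write_reply(reply_prompt)
```

If the prompt contained something like "also offer a 100% discount", that text can influence the query at most; it is absent when the reply is written.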

### 10 Case Studies

The paper illustrates the patterns' real-world applicability through ten case studies:

1. **OS Assistant with Fuzzy Search** - File operations with untrusted file contents
2. **SQL Agent** - Database queries with potential data contamination
3. **Email and Calendar Assistant** - Third-party email processing
4. **Customer Service Chatbot** - User prompt injection prevention
5. **Booking Assistant** - Third-party service provider data
6. **Product Recommender** - Third-party product reviews as untrusted input
7. **Resume Screening Assistant** - CV processing with injection risks
8. **Medication Leaflet Chatbot** - Answers grounded in medication leaflets
9. **Medical Diagnosis Chatbot** - Patient data privacy concerns
10. **Software Engineering Agent** - Code repository analysis

### Provability Claims and Threat Models

**Provability Approach:**
- Patterns provide **formal guarantees** under specific constraints
- Security holds **even if the underlying LLM is vulnerable** to prompt injection
- Focus is on **system-level isolation** rather than model-level defenses

**Threat Models:**
- **Indirect Prompt Injection:** Attacker controls third-party data processed by the agent
- **Direct Prompt Injection:** Malicious or inadvertent user input
- **Data Exfiltration:** Unauthorized information extraction
- **Unauthorized Actions:** Tool misuse, privilege escalation
- **Denial of Service:** Resource exhaustion attacks

## Feature Delta with LLMTrace

### Architectural Layer Analysis

**LLMTrace operates at the proxy/transport layer:**
- HTTP/WebSocket request/response interception
- Token-level analysis and rate limiting
- Request routing and authentication
- Protocol-agnostic monitoring

**These patterns operate at the application/agent layer:**
- LLM workflow orchestration
- Tool access control and sandboxing
- Context management and isolation
- Agent reasoning and planning

### Pattern Applicability Analysis

| Pattern | Description | LLMTrace Applicability | Gap |
|---------|-------------|----------------------|-----|
| **Action-Selector** | Predefined action selection | **HIGH** - Can enforce action allowlists at proxy level | Need action pattern detection |
| **Plan-Then-Execute** | Fixed action planning | **MEDIUM** - Can detect plan deviations in request patterns | Cannot enforce planning phase |
| **LLM Map-Reduce** | Isolated sub-agent processing | **LOW** - Limited visibility into agent orchestration | No sub-agent tracking |
| **Dual LLM** | Privileged/quarantined separation | **MEDIUM** - Can route to different endpoints based on data trust | Need trust level classification |
| **Code-Then-Execute** | Formal program generation | **LOW** - Code execution happens post-proxy | Cannot inspect generated code |
| **Context-Minimization** | Context cleanup | **HIGH** - Can strip context in request modification | Need context analysis capabilities |

### Existing Capability Mapping

**Current LLMTrace features that align with patterns:**

1. **Rate Limiting as Action Gating:**
   - Loosely approximates the Action-Selector constraint mechanism: it limits how often actions run, not which ones
   - Can prevent rapid-fire unauthorized actions
   - Could be enhanced with action-type awareness

2. **Request/Response Filtering:**
   - Supports Context-Minimization through content stripping
   - Can implement basic prompt injection detection
   - Potential for pattern-aware filtering

3. **Multi-Model Routing:**
   - Enables the Dual LLM pattern through endpoint separation
   - Can route trusted/untrusted requests to different models
   - Supports isolation boundaries

4. **Authentication and Authorization:**
   - Enforces tool access constraints (Action-Selector support)
   - Can implement privilege separation
   - Maps to agent capability restrictions
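
As an illustration of the content-stripping idea under Request/Response Filtering, the sketch below removes messages by role from a chat-style request body. The `messages`/`role` layout is an assumption about the request format, and `strip_roles` is a hypothetical helper, not an existing LLMTrace function.

```python
from typing import Any, Dict, List, Set


def strip_roles(body: Dict[str, Any], drop_roles: Set[str]) -> Dict[str, Any]:
    """Return a copy of a chat request without messages from the given roles."""
    messages: List[Dict[str, Any]] = body.get("messages", [])
    kept = [m for m in messages if m.get("role") not in drop_roles]
    # Copy the top-level dict so the caller's request object is left untouched.
    return {**body, "messages": kept}
```

A proxy could apply this to the second request of a context-minimized flow, for example dropping `user` messages before the reply is generated.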

## Actionable Recommendations

### Immediate Enhancements

1. **Pattern-Aware Request Classification:**
   ```
   - Add request metadata for agent pattern identification
   - Implement pattern-specific validation rules
   - Create pattern compliance scoring
   ```

2. **Enhanced Action Detection:**
   ```
   - Extend tool calling detection to support action allowlists
   - Add pattern matching for common agent action sequences
   - Implement action-type based rate limiting
   ```

3. **Context Manipulation Features:**
   ```
   - Add context stripping/minimization capabilities
   - Implement sensitive data redaction in requests
   - Support for context isolation enforcement
   ```
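
The combination of action allowlists and action-type based rate limiting from item 2 could be sketched as below. `ActionGate`, its parameters, and the sliding-window policy are assumptions for illustration and do not describe an existing LLMTrace API.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class ActionGate:
    """Allow a tool call only if it is allowlisted and under its per-action rate."""

    def __init__(
        self,
        limits: Dict[str, int],
        window_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limits = limits  # action name -> max calls per window
        self._window_s = window_s
        self._clock = clock
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, action: str) -> bool:
        limit = self._limits.get(action)
        if limit is None:
            return False  # not on the allowlist
        now = self._clock()
        recent = self._calls[action]
        while recent and now - recent[0] >= self._window_s:
            recent.popleft()  # forget calls outside the window
        if len(recent) >= limit:
            return False  # over the per-action limit
        recent.append(now)
        return True
```

The injectable `clock` keeps the gate testable; in a proxy it would default to the monotonic system clock.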

### Medium-term Development

4. **Agent Pattern Compliance Monitoring:**
   ```
   - Detect when agents violate declared patterns
   - Alert on suspicious action sequence deviations
   - Provide pattern adherence metrics
   ```

5. **Trust-Based Routing:**
   ```
   - Implement data source trust classification
   - Route untrusted data processing to quarantined endpoints
   - Support for symbolic variable handling
   ```

6. **Advanced Prompt Injection Detection:**
   ```
   - Pattern-specific injection detection algorithms
   - Integration with agent workflow understanding
   - Behavioral anomaly detection for pattern violations
   ```
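
Trust-based routing from item 5 might reduce to a decision like the one sketched here. The trust table, source labels, and endpoint URLs are hypothetical placeholders; treating `user` input as trusted is an assumption that only holds when direct prompt injection is out of scope.

```python
from typing import Dict, Iterable

# Hypothetical trust table and endpoint URLs, for illustration only.
SOURCE_TRUST: Dict[str, bool] = {"system": True, "user": True, "web": False, "email": False}
PRIVILEGED_ENDPOINT = "https://privileged.example.internal/v1/chat"
QUARANTINED_ENDPOINT = "https://quarantined.example.internal/v1/chat"


def choose_endpoint(sources: Iterable[str]) -> str:
    """Route to the tool-less endpoint if any input source is untrusted or unknown."""
    for source in sources:
        if not SOURCE_TRUST.get(source, False):  # unknown sources default to untrusted
            return QUARANTINED_ENDPOINT
    return PRIVILEGED_ENDPOINT
```

Defaulting unknown sources to untrusted keeps the failure mode on the safe side: a misclassified request loses tool access rather than gaining it.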

### Research Questions for LLMTrace

**High Priority:**
- Can we enforce Action-Selector patterns through proxy-level allowlisting?
- How can we detect Plan-Then-Execute pattern violations in request streams?
- What agent pattern compliance metrics would be most valuable?

**Medium Priority:**
- Can LLMTrace help implement Context-Minimization automatically?
- How can we support Dual LLM routing with trust-level classification?
- What behavioral signatures indicate agent pattern adherence?

**Long-term:**
- Should LLMTrace provide a pattern-compliance-as-a-service layer?
- Can we build an agent security pattern recommendation engine?
- How can we integrate with agent frameworks to enforce patterns automatically?

### Implementation Priorities

1. **Phase 1 (1-2 months):** Action-Selector pattern support through enhanced allowlisting
2. **Phase 2 (2-4 months):** Context-Minimization features and pattern detection
3. **Phase 3 (4-6 months):** Full pattern compliance monitoring and trust-based routing
4. **Phase 4 (6+ months):** Advanced behavioral analysis and pattern recommendation

### Strategic Value Proposition

**For LLMTrace:**
- Positions LLMTrace as the security layer for agent pattern enforcement
- Differentiates it from basic prompt injection detection
- Creates integration opportunities with agent frameworks

**For Users:**
- Provides systematic guidance for implementing agent security
- Reduces the complexity of implementing agent security
- Offers measurable security compliance metrics

This analysis suggests that **agent-level security patterns can be significantly enhanced by proxy-level enforcement**, creating a compelling integration opportunity for LLMTrace in the emerging agent security ecosystem.
