Research-backed solutions to the three critical failure modes that break AI agents in production: hallucinations, timeouts, and memory loss.
This repository demonstrates those techniques with working code examples.
| 🚨 Failure Mode | 💡 Solution Approach | 📊 Projects | ⏱️ Total Time |
|---|---|---|---|
| Hallucinations | Detection and mitigation through 4 techniques | 4 demos | 2 hours |
| Getting Stuck | Context offloading, async tool patterns, loop detection | 3 demos | 1.5 hours |
| Memory Loss | Persistent memory and context retrieval | Coming soon | - |
The Problem: Agents fabricate statistics, choose wrong tools, ignore business rules, and claim success when operations fail.
The Solution: 4 research-backed techniques that detect, contain, and mitigate hallucinations before they cause damage.
| 📓 Demo | 🎯 Focus & Key Learning | ⏱️ Time |
|---|---|---|
| 01 - Graph-RAG vs Traditional RAG | Structured data retrieval - compare RAG vs Graph-RAG on 300 hotel FAQs, build a Neo4j knowledge graph with automatic entity extraction, eliminate statistical hallucinations | 30 min |
| 02 - Semantic Tool Selection | Intelligent tool filtering - filter 31 tools down to the top 3 relevant ones, reduce errors and token costs, swap tools dynamically | 45 min |
| 03 - Multi-Agent Validation Pattern | Cross-validation workflows - an Executor → Validator → Critic pattern catches hallucinations, orchestrated with Strands Swarm | 30 min |
| 04 - Neurosymbolic Guardrails for AI Agents | Symbolic validation - compare prompt engineering vs symbolic rules, enforce business rules the LLM cannot bypass | 20 min |
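The tool-filtering idea behind demo 02 can be sketched in a few lines. This is a minimal illustration, not the demo's code: a toy bag-of-words "embedding" stands in for SentenceTransformers, and a brute-force cosine ranking stands in for FAISS; all names and tool descriptions are illustrative.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' standing in for SentenceTransformers."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_tools(query: str, tools: dict[str, str], k: int = 3) -> list[str]:
    """Return the k tool names whose descriptions best match the query.
    In the demo, this ranking is done with FAISS over real embeddings."""
    q = embed(query)
    ranked = sorted(tools, key=lambda name: cosine(q, embed(tools[name])), reverse=True)
    return ranked[:k]

tools = {
    "book_room": "book a hotel room reservation for a guest",
    "cancel_booking": "cancel an existing hotel room reservation",
    "get_weather": "get the current weather forecast for a city",
    "send_invoice": "send an invoice email to a customer",
}
print(select_tools("reserve a room at the hotel", tools, k=2))
# → ['book_room', 'cancel_booking']
```

Only the top-k tools are then handed to the agent, which is where the error and token-cost reductions come from: the model never sees the 28 irrelevant tools.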
| 🎯 Technique | 📈 Improvement | 🔍 Metric |
|---|---|---|
| Graph-RAG | Higher accuracy | Precise queries on 300 hotel FAQs via knowledge graph |
| Semantic Tool Selection | Fewer errors, lower token costs | Tool-selection hallucination detection (research-validated), token cost per query |
| Neurosymbolic Rules | Rule compliance | Business rule enforcement the LLM cannot bypass |
| Multi-Agent Validation | Error detection | Invalid operations caught before reaching users |
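The Executor → Validator → Critic flow from demo 03 reduces to a simple pipeline. This sketch uses plain functions as stand-ins for the Strands Swarm agents, and the refund task, amounts, and $100 limit are all hypothetical:

```python
def executor(task: str) -> dict:
    """Stub 'executor' agent: pretends to run the task and returns a claimed result.
    In the demo, each role is a Strands agent; here they are plain functions."""
    return {"task": task, "claim": "refund of $120 issued", "amount": 120}

def validator(result: dict) -> dict:
    """Checks the executor's claim against a ground-truth limit (hypothetical rule)."""
    limit = 100
    result["valid"] = result["amount"] <= limit
    return result

def critic(result: dict) -> str:
    """Final gate: only validated results reach the user; others are flagged."""
    if result["valid"]:
        return result["claim"]
    return f"BLOCKED: claimed '{result['claim']}' failed validation"

out = critic(validator(executor("refund order #42")))
print(out)  # the $120 refund exceeds the $100 limit, so it is blocked
```

The point is separation of duties: the agent that performs the work never gets to declare its own success.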
→ Explore hallucination prevention demos
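The neurosymbolic idea from demo 04 is that rules live in code, outside the model, so no prompt can talk the agent past them. A minimal sketch, with two made-up business rules (the rule names and thresholds are illustrative, not the demo's actual rules):

```python
import re

# Hypothetical business rules enforced as code, outside the LLM.
# Unlike prompt instructions, the model cannot negotiate its way past them.
RULES = [
    ("no_discount_over_20", lambda reply: not any(
        int(m) > 20 for m in re.findall(r"(\d+)\s*%\s*discount", reply))),
    ("no_refund_promises", lambda reply: "guaranteed refund" not in reply.lower()),
]

def enforce(reply: str) -> tuple[bool, list[str]]:
    """Return (allowed, violated_rule_names) for a candidate agent reply."""
    violated = [name for name, check in RULES if not check(reply)]
    return (not violated, violated)

print(enforce("I can offer you a 15% discount today."))
print(enforce("Sure, a 50% discount with a guaranteed refund!"))
```

The second reply is rejected with both rule names listed, regardless of how persuasively the model phrased it.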
The Problem: Agents get stuck when context windows overflow with large data, MCP tools stop responding on slow APIs, or agents repeat the same tool calls without making progress — burning tokens and blocking workflows.
The Solution: 3 research-backed techniques that prevent context overflow, handle unresponsive APIs, and detect reasoning loops before they waste resources.
| 📓 Demo | 🎯 Focus & Key Learning | ⏱️ Time |
|---|---|---|
| 01 - Context Window Overflow | Memory management — store large data outside context with the Memory Pointer Pattern, 7x token reduction validated by IBM Research | 30 min |
| 02 - MCP Tools Not Responding | Async patterns — handle slow/unresponsive APIs with an async handleId, prevent 424 errors, respond immediately | 20 min |
| 03 - Reasoning Loops | Loop detection — DebounceHook blocks duplicate calls, clear SUCCESS/FAILED states stop retries, 7x fewer tool calls | 25 min |
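The duplicate-blocking idea behind demo 03's DebounceHook fits in a few lines. This is a standalone sketch, not the demo's hook: the class name matches the demo, but the `before_tool_call` interface and the example tool calls are illustrative (the real hook plugs into Strands' hooks system).

```python
class DebounceHook:
    """Blocks exact-duplicate tool calls so a looping agent fails fast
    instead of silently burning tokens on repeats."""
    def __init__(self):
        self.seen = set()

    def before_tool_call(self, tool_name: str, args: tuple) -> bool:
        """Return True to allow the call, False to block a repeat."""
        key = (tool_name, args)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True

hook = DebounceHook()
calls = [("search_faq", ("pool hours",)), ("search_faq", ("pool hours",)),
         ("search_faq", ("gym hours",))]
allowed = [c for c in calls if hook.before_tool_call(*c)]
print(allowed)  # the duplicate "pool hours" call is dropped
```

Blocking the repeat forces the agent to either change its arguments or report failure, which is what breaks the loop.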
→ Explore token waste prevention demos
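The Memory Pointer Pattern from demo 01 can be sketched as "store the payload elsewhere, put only a handle in the context." In this minimal illustration a plain in-memory dict stands in for external storage, and the `mem://` handle format is made up:

```python
import uuid

STORE: dict[str, str] = {}  # stands in for external storage (file, DB, object store)

def put(data: str) -> str:
    """Store a large payload outside the context window; return a short pointer."""
    handle = f"mem://{uuid.uuid4().hex[:8]}"
    STORE[handle] = data
    return handle

def get(handle: str) -> str:
    """Resolve a pointer back to the full payload when a tool actually needs it."""
    return STORE[handle]

big_result = "row," * 50_000  # a large tool result that would overflow the context
pointer = put(big_result)
context_message = f"Tool returned {len(big_result)} chars, stored at {pointer}"
print(len(context_message), "chars enter the context instead of", len(big_result))
assert get(pointer) == big_result
```

The agent reasons over the short pointer message; downstream tools dereference the handle only when they need the full data, which is where the token reduction comes from.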
(Coming soon)
Details
| 🔧 Technology | 🎯 Purpose | ⚡ Key Capabilities |
|---|---|---|
| Strands Agents | AI agent framework | Dynamic tool swapping, multi-agent orchestration, conversation memory, hooks system |
| Amazon Bedrock | LLM access | Claude 3 Haiku/Sonnet for agent reasoning and tool calling |
| Neo4j | Graph database | Relationship-aware queries, precise aggregations, multi-hop traversal |
| FAISS | Vector search | Semantic similarity, tool filtering, efficient nearest neighbor search |
| SentenceTransformers | Embeddings | Text embeddings for semantic tool selection and memory retrieval |
Before You Begin:
- Python 3.9+ installed locally
- LLM access: OpenAI (default), Amazon Bedrock, Anthropic, or Ollama
- `OPENAI_API_KEY` environment variable set (for the default setup)
- AWS CLI configured if using Amazon Bedrock (`aws configure`)
- Basic understanding of AI agents and tool calling
Model Configuration: All demos use OpenAI with GPT-4o-mini by default. You can swap to any provider supported by Strands — see Strands Model Providers for configuration.
AWS Credentials Setup (if using Amazon Bedrock): Follow the AWS credentials configuration guide to configure your environment.
```shell
git clone https://github.com/aws-samples/sample-why-agents-fail
cd sample-why-agents-fail
cd stop-ai-agent-hallucinations
```

Each demo folder contains detailed README files and working code examples.
| 💰 Service | 💵 Approximate Cost | 📊 Usage Pattern | 🔗 Pricing Link |
|---|---|---|---|
| OpenAI GPT-4o-mini | ~$0.15 per 1M input tokens | Agent reasoning and tool calling | OpenAI Pricing |
| Amazon Bedrock (Claude) | ~$0.25 per 1M input tokens | Alternative LLM provider | Bedrock Pricing |
| Neo4j (local) | Free | Graph database for demos | Neo4j Pricing |
| FAISS (local) | Free | Vector search library | FAISS GitHub |
| SentenceTransformers | Free | Local embeddings | SBERT Docs |
💡 All demos can run locally with minimal costs. OpenAI GPT-4o-mini is the most cost-effective option for testing.
- Strands Agents Documentation - Framework documentation and model providers
- Amazon Bedrock Documentation - LLM service guide and model access
- Neo4j Graph Database Guide - Graph database setup and Cypher queries
Contributions are welcome! See CONTRIBUTING for more information.
If you discover a potential security issue in this project, notify AWS/Amazon Security via the vulnerability reporting page. Please do not create a public GitHub issue.
This library is licensed under the MIT-0 License. See the LICENSE file for details.