

🤖 Why AI Agents Fail (And How to Fix Them)


Research-backed solutions to the three critical failure modes that break AI agents in production: hallucinations, timeouts, and memory loss.



🎯 Learning Path: Understand → Prevent → Scale

This repository demonstrates research-backed techniques for preventing AI agent failures with working code examples.

| 🚨 Failure Mode | 💡 Solution Approach | 📊 Projects | ⏱️ Total Time |
| --- | --- | --- | --- |
| Hallucinations | Detection and mitigation through 4 techniques | 4 demos | 2 hours |
| Getting Stuck | Prevent context overflow, handle MCP timeouts, detect reasoning loops | 3 demos | 1.5 hours |
| Memory Loss | Persistent memory and context retrieval | Coming soon | - |

🎭 Stop AI Agent Hallucinations

The Problem: Agents fabricate statistics, choose wrong tools, ignore business rules, and claim success when operations fail.

The Solution: 4 research-backed techniques that detect, contain, and mitigate hallucinations before they cause damage.

📓 Hallucination Prevention Demos

| 📓 Demo | 🎯 Focus & Key Learning | ⏱️ Time | 📊 Level |
| --- | --- | --- | --- |
| 01 - Graph-RAG vs Traditional RAG | Structured data retrieval: compare RAG with Graph-RAG on 300 hotel FAQs, build a Neo4j knowledge graph with automatic entity extraction, eliminate statistical hallucinations | 30 min | Intermediate |
| 02 - Semantic Tool Selection | Intelligent tool filtering: filter 31 tools down to the 3 most relevant, reduce errors and token costs, swap tools dynamically | 45 min | Intermediate |
| 03 - Multi-Agent Validation Pattern | Cross-validation workflows: an Executor → Validator → Critic pattern catches hallucinations, orchestrated with Strands Swarm | 30 min | Intermediate |
| 04 - Neurosymbolic Guardrails for AI Agents | Symbolic validation: compare prompt engineering with symbolic rules for business-rule compliance the LLM cannot bypass | 20 min | Intermediate |

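The tool-filtering idea behind Demo 02 can be sketched in a few lines. This is a minimal stand-in, not the repo's code: the demo uses SentenceTransformers embeddings with FAISS nearest-neighbor search, whereas here plain word overlap stands in for semantic similarity, and the tool names are invented for illustration.

```python
def relevance(query_words, desc_words):
    # crude relevance proxy: count of shared words
    # (the demo instead embeds text and searches a FAISS index)
    return len(set(query_words) & set(desc_words))

def select_tools(query, tools, k=3):
    """Return the k tool names whose descriptions best match the query,
    so the agent only sees a small, relevant subset of its toolbox."""
    q = query.lower().split()
    ranked = sorted(
        tools,
        key=lambda t: relevance(q, t["description"].lower().split()),
        reverse=True,
    )
    return [t["name"] for t in ranked[:k]]

tools = [
    {"name": "book_room",   "description": "book a hotel room for a guest"},
    {"name": "cancel_room", "description": "cancel an existing hotel room booking"},
    {"name": "get_weather", "description": "current weather for a city"},
    {"name": "send_email",  "description": "send an email message"},
]
print(select_tools("cancel my hotel booking", tools, k=2))
# → ['cancel_room', 'book_room']
```

Only the selected subset is passed to the model, which both shrinks the prompt and removes distracting candidates that cause wrong-tool hallucinations.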
📊 Key Results

| 🎯 Technique | 📈 Improvement | 🔍 Metric |
| --- | --- | --- |
| Graph-RAG | Accuracy | Precise queries on 300 hotel FAQs via knowledge graph |
| Semantic Tool Selection | Reduced errors and token costs | Tool-selection hallucination detection (research-validated); token cost per query |
| Neurosymbolic Rules | Compliance | Business-rule enforcement the LLM cannot bypass |
| Multi-Agent Validation | Error detection | Invalid operations caught before reaching users |
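The neurosymbolic idea can be sketched as a hard check applied to the agent's proposed action after generation. The rule and threshold below are invented for illustration (they are not the demo's actual rules); the point is that the check runs outside the LLM, so no prompt wording can bypass it.

```python
MAX_DISCOUNT = 0.15  # hypothetical business rule, chosen for this sketch

def enforce_discount_rule(action):
    """Symbolic validation of an agent-proposed action.
    Runs in ordinary code after the LLM responds, so it cannot be
    talked around the way a prompt-engineered instruction can."""
    if action.get("type") == "apply_discount" and action.get("rate", 0) > MAX_DISCOUNT:
        return {
            "type": "rejected",
            "reason": f"discount {action['rate']:.0%} exceeds the {MAX_DISCOUNT:.0%} limit",
        }
    return action

print(enforce_discount_rule({"type": "apply_discount", "rate": 0.40}))
# rejected regardless of how the model was prompted
print(enforce_discount_rule({"type": "apply_discount", "rate": 0.10}))
# compliant action passes through unchanged
```

Contrast this with prompt engineering, where "never discount more than 15%" is merely a suggestion the model can ignore under adversarial input.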

→ Explore hallucination prevention demos


🔄 Stop Agents from Wasting Tokens

The Problem: Agents get stuck when context windows overflow with large data, MCP tools stop responding on slow APIs, or agents repeat the same tool calls without making progress — burning tokens and blocking workflows.

The Solution: 3 research-backed techniques that prevent context overflow, handle unresponsive APIs, and detect reasoning loops before they waste resources.
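The Memory Pointer Pattern from the context-overflow demo can be sketched as follows: store large payloads in an external store and pass only a short handle through the agent's context. This is an illustrative stand-in, not the repo's implementation; the in-memory dict and `mem://` handle format are assumptions for the sketch.

```python
import uuid

STORE = {}  # stand-in for external storage (a file, S3 bucket, or database)

def stash(data):
    """Keep a large payload out of the context window; return a short pointer."""
    handle = f"mem://{uuid.uuid4().hex[:8]}"
    STORE[handle] = data
    return handle

def resolve(handle):
    """Fetch the payload only when a tool actually needs it."""
    return STORE[handle]

big_result = "x" * 100_000    # e.g. a large tool output
pointer = stash(big_result)   # only 14 characters enter the agent's context
assert resolve(pointer) == big_result
print(pointer, len(pointer))
```

The model reasons over the cheap pointer, and downstream tools dereference it on demand, which is where the large token reductions come from.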

📓 Token Waste & Stuck Agent Demos

| 📓 Demo | 🎯 Focus & Key Learning | ⏱️ Time | 📊 Level |
| --- | --- | --- | --- |
| 01 - Context Window Overflow | Memory management: store large data outside the context with the Memory Pointer Pattern; 7x token reduction validated by IBM Research | 30 min | Intermediate |
| 02 - MCP Tools Not Responding | Async patterns: handle slow or unresponsive APIs with an async handleId, prevent 424 errors, return immediate responses | 20 min | Intermediate |
| 03 - Reasoning Loops | Loop detection: a DebounceHook blocks duplicate calls; clear SUCCESS/FAILED states stop retries; 7x fewer tool calls | 25 min | Intermediate |
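The duplicate-blocking idea behind the DebounceHook can be sketched as a small guard that remembers which (tool, arguments) pairs have already been issued. This is an illustrative stand-in, not the repo's DebounceHook or the Strands hooks API.

```python
class DebounceGuard:
    """Blocks a tool call when the identical (name, args) pair was already
    issued in this run, so a looping agent cannot burn tokens on repeats."""

    def __init__(self):
        self.seen = set()

    def allow(self, tool_name, args):
        # canonicalize the arguments so {"a": 1} and {"a": 1} always match
        key = (tool_name, tuple(sorted(args.items())))
        if key in self.seen:
            return False  # duplicate: block the call
        self.seen.add(key)
        return True

guard = DebounceGuard()
print(guard.allow("search", {"q": "hotel"}))  # True: first call goes through
print(guard.allow("search", {"q": "hotel"}))  # False: exact repeat is blocked
print(guard.allow("search", {"q": "flight"})) # True: different args are fine
```

In the real demo this check runs inside a Strands hook before tool dispatch; a blocked call is turned into an explicit FAILED result so the agent stops retrying.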

→ Explore token waste prevention demos


🧠 Your Agent Doesn't Remember You

(Coming soon)


🔧 Technologies Used

| 🔧 Technology | 🎯 Purpose | ⚡ Key Capabilities |
| --- | --- | --- |
| Strands Agents | AI agent framework | Dynamic tool swapping, multi-agent orchestration, conversation memory, hooks system |
| Amazon Bedrock | LLM access | Claude 3 Haiku/Sonnet for agent reasoning and tool calling |
| Neo4j | Graph database | Relationship-aware queries, precise aggregations, multi-hop traversal |
| FAISS | Vector search | Semantic similarity, tool filtering, efficient nearest-neighbor search |
| SentenceTransformers | Embeddings | Text embeddings for semantic tool selection and memory retrieval |

Prerequisites

Before You Begin:

  • Python 3.9+ installed locally
  • LLM access: OpenAI (default), Amazon Bedrock, Anthropic, or Ollama
  • OPENAI_API_KEY environment variable (for default setup)
  • AWS CLI configured if using Amazon Bedrock (aws configure)
  • Basic understanding of AI agents and tool calling

Model Configuration: All demos use OpenAI with GPT-4o-mini by default. You can swap to any provider supported by Strands — see Strands Model Providers for configuration.

AWS Credentials Setup (if using Amazon Bedrock): Follow the AWS credentials configuration guide to configure your environment.


🚀 Quick Start Guide

1. Clone Repository

```shell
git clone https://github.com/aws-samples/sample-why-agents-fail
cd sample-why-agents-fail
```

2. Start with Hallucinations

```shell
cd stop-ai-agent-hallucinations
```

3. Explore All Techniques

Each demo folder contains detailed README files and working code examples.


💰 Cost Estimation

| 💰 Service | 💵 Approximate Cost | 📊 Usage Pattern | 🔗 Pricing Link |
| --- | --- | --- | --- |
| OpenAI GPT-4o-mini | ~$0.15 per 1M input tokens | Agent reasoning and tool calling | OpenAI Pricing |
| Amazon Bedrock (Claude) | ~$0.25 per 1M input tokens | Alternative LLM provider | Bedrock Pricing |
| Neo4j (local) | Free | Graph database for demos | Neo4j Pricing |
| FAISS (local) | Free | Vector search library | FAISS GitHub |
| SentenceTransformers | Free | Local embeddings | SBERT Docs |

💡 All demos can run locally with minimal costs. OpenAI GPT-4o-mini is the most cost-effective option for testing.


📖 Additional Learning Resources



🤝 Contributing

Contributions are welcome! See CONTRIBUTING for more information.


Security

If you discover a potential security issue in this project, notify AWS/Amazon Security via the vulnerability reporting page. Please do not create a public GitHub issue.


📄 License

This library is licensed under the MIT-0 License. See the LICENSE file for details.
