ReAct agents waste 90.8% of retries. Here's the architecture fix (error taxonomy + circuit breakers + deterministic routing) that drops waste to 0%.
90.8% of retries in a standard ReAct agent are wasted on errors that can never succeed.
This repo contains the benchmark that found it — and the three structural fixes that eliminate it.
📄 Article: Your ReAct Agent Is Wasting 90% of Its Retries — And You Don't Even See It (Towards Data Science)
A deterministic, fully reproducible simulation comparing two agent architectures across 200 tasks:
- ReAct Agent — standard Thought → Action → Observation loop with a global retry counter
- Controlled Workflow — deterministic plan execution with error taxonomy, per-tool circuit breakers, and typed tool routing
The single architectural difference: where tool names are resolved — from LLM output (ReAct) or from a Python dict at plan time (Workflow).
```bash
git clone https://github.com/Emmimal/react-retry-waste-analysis
cd react-retry-waste-analysis
pip install matplotlib   # only dependency beyond stdlib
python app.py --seed 42
```

Every number in the article is reproduced exactly by `--seed 42`.
| Metric | ReAct | Workflow |
|---|---|---|
| Success rate | 89.5% | 100.0% |
| Total retries | 513 | 80 |
| Wasted retries | 466 (90.8%) | 0 (0.0%) |
| Hallucination events | 155 | 0 |
| Step σ | 1.36 | 0.46 |
| P95 latency (ms) | 143.3 | 146.2 |
| Estimated cost ($) | $0.3450 | $0.3222 |
```bash
# Full 200-task benchmark
python app.py --seed 42

# Watch a single task execute in verbose mode
python app.py --replay 7

# Export results to JSON
python app.py --seed 42 --export-json

# Custom task count or seed
python app.py --tasks 500 --seed 99

# Skip plot generation
python app.py --no-plots

# Custom plot output directory
python app.py --plot-dir my_plots
```

Running the benchmark produces:
- Full results table (success rate, retry budget, error taxonomy, latency, cost)
- Sensitivity analysis at hallucination rates of 5%, 15%, and 28%
- 6 figures saved to `plots/`:
  - `fig1_success_hallucinations.png`
  - `fig2_retry_budget.png`
  - `fig3_step_distribution.png`
  - `fig4_error_taxonomy.png`
  - `fig5_latency_cdf.png`
  - `fig6_sensitivity.png`
Classify errors at the point they're raised. Non-retryable errors (`TOOL_NOT_FOUND`, `INVALID_INPUT`) emit `RETRY_SKIPPED` and consume zero retry budget. This fix can be retrofitted onto a ReAct agent without changing its architecture.
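A minimal sketch of this classification, with illustrative names (`ErrorKind`, `RETRYABLE`, `handle_error` are not necessarily the identifiers used in `app.py`):

```python
from enum import Enum

class ErrorKind(Enum):
    TOOL_NOT_FOUND = "TOOL_NOT_FOUND"  # non-retryable: the tool will never exist
    INVALID_INPUT = "INVALID_INPUT"    # non-retryable: same input fails again
    TIMEOUT = "TIMEOUT"                # retryable: transient
    RATE_LIMITED = "RATE_LIMITED"      # retryable: transient

RETRYABLE = {ErrorKind.TIMEOUT, ErrorKind.RATE_LIMITED}

def handle_error(kind: ErrorKind, budget: int) -> tuple:
    """Classify at the point the error is raised.

    Non-retryable errors emit RETRY_SKIPPED and consume zero budget;
    only retryable errors decrement the retry counter.
    """
    if kind not in RETRYABLE:
        return ("RETRY_SKIPPED", budget)
    return ("RETRY", budget - 1)
```

The key property: a retry is only spent when a second attempt could plausibly succeed.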
Each tool gets its own `CircuitBreaker` instance. A degraded tool fails fast without draining retry budget from other tools.
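A sketch of a per-tool breaker in the spirit of Fowler's pattern (the threshold and cooldown values here are illustrative, not the repo's tuned defaults):

```python
import time
from typing import Optional

class CircuitBreaker:
    """Open after N consecutive failures; allow a half-open probe after cooldown."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            # Half-open: let one probe through; a failure re-opens immediately
            self.opened_at = None
            self.failures = self.failure_threshold - 1
            return True
        return False  # open: fail fast, burn no shared budget

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()

# One breaker per tool: a degraded tool trips only its own breaker
breakers = {tool: CircuitBreaker() for tool in ("search", "calculator", "db")}
```

Because each tool owns its breaker, a flaky `search` tool failing fast leaves the full retry budget available to `calculator` and `db`.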
Tool names are resolved from `STEP_TO_TOOL: dict[StepKind, str]` at plan time, never from LLM output. Hallucination at the routing layer becomes structurally impossible.
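A sketch of the routing layer (the `StepKind` members and tool names here are placeholders; `STEP_TO_TOOL` is the mapping named above):

```python
from enum import Enum

class StepKind(Enum):
    SEARCH = "search"
    COMPUTE = "compute"
    LOOKUP = "lookup"

# Tool names come from a static table built at plan time,
# never parsed out of free-form LLM text.
STEP_TO_TOOL = {
    StepKind.SEARCH: "web_search",
    StepKind.COMPUTE: "calculator",
    StepKind.LOOKUP: "db_lookup",
}

def resolve_tool(step: StepKind) -> str:
    # A KeyError here is a programming bug caught in tests,
    # not a runtime hallucination to retry against.
    return STEP_TO_TOOL[step]
```

The contrast with ReAct: instead of string-matching whatever tool name the model emitted, the planner produces typed `StepKind` values and the dict lookup is total over them.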
| Parameter | Value | Notes |
|---|---|---|
| `SEED` | 42 | Global random seed |
| `NUM_TASKS` | 200 | Tasks per experiment |
| `hallucination_rate` | 28% | Conservative estimate from published benchmarks |
| `HALLUCINATION_RETRY_BURN` | 3 | Retry slots burned per hallucination event |
| `MAX_REACT_RETRIES` | 6 | Global retry budget for ReAct |
| `SENSITIVITY_RATES` | 5%, 15%, 28% | Hallucination rates for sensitivity sweep |
Note: The 28% hallucination rate is a calibrated parameter, not a directly reported figure. Your observed rate will vary with model, prompt quality, and tool schema design.
- Latency figures are simulated — do not use for capacity planning
- `HALLUCINATION_RETRY_BURN = 3` influences the exact waste percentage; the structural conclusion (the workflow wastes 0%) holds at all values
- The workflow's zero hallucination count is a simulation design property; hallucinations can still occur upstream of routing in production
- Three tools is a simplified environment; threshold values will need tuning for your workload
- Yao et al. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 2023. arxiv.org/abs/2210.03629
- Shinn et al. (2023). Reflexion: Language Agents with Verbal Reinforcement Learning. NeurIPS 2023. arxiv.org/abs/2303.11366
- Fowler, M. (2014). CircuitBreaker. martinfowler.com
- Python 3.9+
- `matplotlib` (plots only; all other dependencies are stdlib)
MIT