MIRAGE-Bench: LLM Agent is Hallucinating and Where to Find Them

Authors: Weichen Zhang, Yiyou Sun, Pohao Huang, Jiayue Pu, Heyue Lin, Dawn Song

University of California, Berkeley

Abstract

Hallucinations pose critical risks in large language model (LLM)-based agents. When outputs are inconsistent with the contextual or environmental reality, they manifest incorrect or harmful actions. While recent study have exposed such failures, existing evaluations remain fragmented and lack a principled testbed. In this paper, we present MIRAGE-Bench—Measuring Illusions in Risky AGEnt settings—the first unified benchmark for eliciting and evaluating hallucinations in interactive LLM-agent scenarios. We begin by introducing a three-part taxonomy to address agentic hallucinations: actions that are unfaithful to (i) task instructions, (ii) execution history, or (iii) environment observations. To analyze, we first elicit such failures by performing a systematic audit of existing agent benchmarks, then synthesize test cases using a snapshot strategy that isolates decision points in deterministic and reproducible manners. To evaluate hallucination behaviors, we adopt a fine-grained-level LLM-as-a-Judge paradigm with tailored risk-aware prompts, enabling scalable, high-fidelity assessment of agent actions without enumerating full action spaces. MIRAGE-Bench provides actionable insights on failure modes of LLM agents and lays the groundwork for principled progress in mitigating hallucinations in interactive environments.

🚀 Quick Start

Clone the repository

git clone https://github.com/sunblaze-ucb/mirage-bench.git
cd mirage-bench

Create and activate the Conda environment

git clone
conda env create -f environment.yml
conda activate mirage

Run inference for all models

bash inference_all.sh

Verify all inference results using LLM-as-a-Judge

bash verify_all.sh

Compute metrics

python ./script/calculate_utility_score.py
python ./script/calculate_hallucination_rate.py

Map results to unified risk settings in the paper

python ./script/unify_results.py

📚 Citation

If you use MIRAGE in your research, please cite:

@misc{zhang2025miragebenchllmagenthallucinating,
      title={MIRAGE-Bench: LLM Agent is Hallucinating and Where to Find Them}, 
      author={Weichen Zhang and Yiyou Sun and Pohao Huang and Jiayue Pu and Heyue Lin and Dawn Song},
      year={2025},
      eprint={2507.21017},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2507.21017}, 
}

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

For questions or issues, please open a GitHub issue or contact the authors.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
dataset_all		dataset_all
script		script
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml
inference_all.sh		inference_all.sh
verify_all.sh		verify_all.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

MIRAGE-Bench: LLM Agent is Hallucinating and Where to Find Them

Abstract

🚀 Quick Start

📚 Citation

📄 License

About

Uh oh!

Releases

Packages

Languages

License

sunblaze-ucb/mirage-bench

Folders and files

Latest commit

History

Repository files navigation

MIRAGE-Bench: LLM Agent is Hallucinating and Where to Find Them

Abstract

🚀 Quick Start

📚 Citation

📄 License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages