Add agentrial to LLM Evaluation list by alepot55 · Pull Request #297 · Hannibal046/Awesome-LLM

alepot55 · 2026-02-06T21:39:57Z

AI agents pass benchmarks but fail in production. Why? Single-run evaluations hide variance. agentrial runs your agent N times, computes Wilson score confidence intervals for the pass rate, and uses Fisher's exact test to detect regressions in CI/CD. Getting started: pip install agentrial, write a YAML config, and you're done.
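For readers unfamiliar with the two statistics named above, here is a minimal, stdlib-only Python sketch of how they could be computed over N repeated runs. It is not agentrial's code or API. The function names wilson_interval and fisher_regression_pvalue are hypothetical, and the one-sided Fisher test assumes that "regression" means the candidate's pass rate is lower than the baseline's.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial pass rate (z=1.96 gives ~95% coverage)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2))
    # Clamp to [0, 1] to absorb floating-point round-off at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


def fisher_regression_pvalue(base_pass: int, base_fail: int, new_pass: int, new_fail: int) -> float:
    """One-sided Fisher exact test: is the new pass rate lower than the baseline's?

    Conditions on the 2x2 table margins; under the null hypothesis the number of
    baseline passes follows a hypergeometric distribution. The p-value is the
    probability of seeing at least as many baseline passes as observed.
    """
    n_base = base_pass + base_fail
    total = n_base + new_pass + new_fail
    passes = base_pass + new_pass  # total passes across both configurations
    fails = total - passes
    denom = math.comb(total, n_base)
    upper = min(passes, n_base)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    tail = sum(
        math.comb(passes, x) * math.comb(fails, n_base - x)
        for x in range(base_pass, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Hypothetical data: 20 runs each of a baseline and a candidate agent.
    low, high = wilson_interval(12, 20)
    print(f"candidate pass rate 12/20, 95% CI [{low:.3f}, {high:.3f}]")
    p = fisher_regression_pvalue(base_pass=19, base_fail=1, new_pass=12, new_fail=8)
    print(f"one-sided Fisher p-value for regression: {p:.4f}")
```

A small p-value (for example, below 0.05) would flag the candidate as a likely regression. If SciPy is available, scipy.stats.fisher_exact([[base_pass, base_fail], [new_pass, new_fail]], alternative="greater") should return the same one-sided p-value.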

GitHub: https://github.com/alepot55/agentrial

Added agentrial to the LLM Evaluation section of the README.

Add agentrial to LLM Evaluation list

74a07f7

alepot55 wants to merge 1 commit into Hannibal046:main from alepot55:patch-1

alepot55 commented Feb 6, 2026
