Add agentrial to LLM Evaluation list#297

Open: alepot55 wants to merge 1 commit into Hannibal046:main from alepot55:patch-1

alepot55 commented Feb 6, 2026

AI agents often pass benchmarks but fail in production, in part because single-run evaluations hide run-to-run variance. agentrial runs your agent N times, computes Wilson confidence intervals, and uses Fisher's exact test to detect regressions in CI/CD. To get started, run pip install agentrial and write a YAML file.
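For reviewers unfamiliar with the two statistics named above, here is a minimal standard-library sketch of a Wilson score interval and a one-sided Fisher's exact test on pass/fail counts. It is not agentrial's code or API: the function names `wilson_interval` and `fisher_regression_p`, their parameters, and the default `z = 1.96` (roughly a 95% interval) are assumptions made for illustration only.

```python
import math


def wilson_interval(passes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate (hypothetical helper, not agentrial's API)."""
    if trials <= 0 or not 0 <= passes <= trials:
        raise ValueError("need trials > 0 and 0 <= passes <= trials")
    p_hat = passes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    centre = (p_hat + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def fisher_regression_p(base_pass: int, base_n: int, new_pass: int, new_n: int) -> float:
    """One-sided Fisher's exact test: p-value for 'the new version passes less often'.

    With the table margins fixed, the number of passes in the new group follows
    a hypergeometric distribution; we sum the probability of seeing new_pass
    or fewer passes.
    """
    total = base_n + new_n
    total_pass = base_pass + new_pass
    total_fail = total - total_pass
    lowest = max(0, total_pass - base_n)  # smallest feasible pass count in the new group
    ways = sum(
        math.comb(total_pass, k) * math.comb(total_fail, new_n - k)
        for k in range(lowest, new_pass + 1)
    )
    return ways / math.comb(total, new_n)


if __name__ == "__main__":
    # Hypothetical counts: baseline passed 10 of 10 runs, candidate passed 5 of 10.
    print(wilson_interval(5, 10))
    print(fisher_regression_p(10, 10, 5, 10))
```

A CI gate built on this idea might fail the build when the p-value drops below a chosen threshold such as 0.05; that threshold is a policy choice, not something this PR specifies.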

GitHub: https://github.com/alepot55/agentrial

This PR adds agentrial to the LLM Evaluation section of the README.