Open source LLM evaluation framework with cost + latency + hallucination metrics #2730

vignesh2027 · 2026-06-08T05:39:06Z


Hey DeepEval community!

Love what DeepEval is doing with LLM-as-judge evaluation. I built an open-source framework that takes a complementary approach, focused on production metrics.

Key difference from DeepEval: no LLM-as-judge is needed. Every metric is computed locally or from real API responses.

5 metrics tracked simultaneously:

Accuracy: 4-strategy cascade (exact, normalized, MC, fuzzy Levenshtein)
Latency: p50/p75/p90/p95/p99 from real async API calls
Cost per 1K tokens: from actual token counts, not estimates
Hallucination Rate: linguistic signal analysis (hedging/uncertainty/grounding signals)
Reasoning Quality: chain-of-thought depth score 1-10
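
To make the accuracy cascade concrete, here is a minimal sketch of how a four-step check like this could work. It is not the framework's actual code: the function names, the 0.85 fuzzy threshold, and the reading of "MC" as multiple-choice letter matching are all assumptions for illustration.

```python
import re
import string
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def extract_choice(text: str) -> Optional[str]:
    """Return the first standalone A-D letter (naive: the article 'A' also matches)."""
    match = re.search(r"\b([A-D])\b", text.upper())
    return match.group(1) if match else None


def score(prediction: str, reference: str, fuzzy_threshold: float = 0.85) -> Optional[str]:
    """Return the name of the first strategy that matches, or None if none do."""
    if prediction == reference:
        return "exact"
    norm_pred, norm_ref = normalize(prediction), normalize(reference)
    if norm_pred == norm_ref:
        return "normalized"
    ref_letter = reference.strip().upper()
    if len(ref_letter) == 1 and extract_choice(prediction) == ref_letter:
        return "mc"
    longest = max(len(norm_pred), len(norm_ref)) or 1
    similarity = 1 - levenshtein(norm_pred, norm_ref) / longest
    return "fuzzy" if similarity >= fuzzy_threshold else None
```

The cheap checks run first, so the edit-distance comparison only happens when nothing stricter matched. Returning the strategy name rather than a bare boolean makes it easy to see how many answers passed only through the fuzzy step.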

One-command benchmark:

pip install llm-evaluation-framework
llm-eval compare --models gpt-4o-mini --models gemini/gemini-1.5-flash --benchmark mmlu --samples 100

It runs on LiteLLM, so any model that LiteLLM supports should work.
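
For the latency and cost metrics listed earlier, the arithmetic can be sketched roughly as follows. This assumes nearest-rank percentiles over recorded per-request latencies and a total cost already known from the provider's reported usage. The helper names are hypothetical, and the framework may well interpolate percentiles or compute cost differently.

```python
import math
from typing import Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(latencies_ms: List[float]) -> Dict[str, float]:
    """Summarise measured request latencies at the percentiles named in the post."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 75, 90, 95, 99)}


def cost_per_1k_tokens(total_cost_usd: float, prompt_tokens: int, completion_tokens: int) -> float:
    """Blended cost per 1,000 tokens from actual token counts."""
    total_tokens = prompt_tokens + completion_tokens
    if total_tokens == 0:
        raise ValueError("no tokens recorded")
    return total_cost_usd / total_tokens * 1000
```

With only 100 samples, p99 rests on the one or two slowest requests, so tail numbers from a small run are best treated as indicative.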

Live demo (no API key needed): https://huggingface.co/spaces/vigneshwar234/llm-eval-demo
GitHub: https://github.com/vignesh2027/LLM-Evaluation-Framework

71 tests, 82% coverage. The two approaches complement each other well: DeepEval for semantic/behavioral evaluation, this for production metrics. Feedback welcome!
