Commit 9668ba8

Merge pull request #64 from ProfessorSeb/main
Updating the readme.md file
2 parents 8919baf + c3a51ed commit 9668ba8

3 files changed

Lines changed: 80 additions & 3 deletions

README.md

Lines changed: 54 additions & 3 deletions
@@ -1,15 +1,66 @@
 <p align="center">
-<img src="docs/assets/logo-color.png" alt="agentevals" width="420" />
+<picture>
+<source media="(prefers-color-scheme: dark)" srcset="docs/assets/logo-color-on-transparent.svg">
+<source media="(prefers-color-scheme: light)" srcset="docs/assets/logo-dark-on-transparent.svg">
+<img src="docs/assets/logo-color-on-transparent.svg" alt="agentevals" width="420" />
+</picture>
 </p>

-`agentevals` evaluates AI agent behavior from OpenTelemetry traces, without re-running the agent. Record once, score as many times as you want.
+<h1 align="center">Ship Agents Reliably</h1>

-Works with any OTel-instrumented framework (LangChain, Strands, Google ADK, and others). Supports Jaeger JSON and OTLP trace formats, built-in and custom evaluators, and LLM-based judges.
+<p align="center">
+Benchmark your agents before they hit production.<br>
+agentevals scores performance and inference quality from OpenTelemetry traces — no re-runs, no guesswork.
+</p>
+
+<p align="center">
+<a href="https://github.com/agentevals-dev/agentevals/stargazers"><img src="https://img.shields.io/github/stars/agentevals-dev/agentevals?style=social" alt="GitHub Stars"></a>
+&nbsp;
+<a href="https://discord.gg/cpveEn8Ah2"><img src="https://img.shields.io/discord/1435836734666707190?label=Discord&logo=discord&logoColor=white&color=5865F2" alt="Discord"></a>
+&nbsp;
+<a href="https://github.com/agentevals-dev/agentevals/releases"><img src="https://img.shields.io/github/v/release/agentevals-dev/agentevals?label=Release" alt="Release"></a>
+&nbsp;
+<a href="https://github.com/agentevals-dev/agentevals/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-Apache%202.0-green.svg" alt="License"></a>
+&nbsp;
+<a href="https://pypi.org/project/agentevals-cli/"><img src="https://img.shields.io/pypi/v/agentevals-cli?label=PyPI&color=blue" alt="PyPI"></a>
+</p>
+
+<p align="center">
+<a href="#installation">Install</a> · <a href="#quick-start">Quick Start</a> · <a href="https://github.com/agentevals-dev/agentevals/releases">Releases</a> · <a href="CONTRIBUTING.md">Contributing</a> · <a href="https://discord.gg/cpveEn8Ah2">Discord</a>
+</p>
+
+---
+
+## What is agentevals?
+
+agentevals is a framework-agnostic evaluation solution that scores AI agent behavior directly from [OpenTelemetry](https://opentelemetry.io/) traces. Record your agent's actions once, then evaluate as many times as you want — no re-runs, no guesswork.
+
+It works with any OTel-instrumented framework (LangChain, Strands, Google ADK, and others), supports Jaeger JSON and OTLP trace formats, and ships with built-in evaluators, custom evaluator support, and LLM-based judges.

 - **CLI** for scripting and CI pipelines
 - **Web UI** for visual inspection and local developer experience
 - **MCP server** so MCP clients can run evaluations from a conversation

+## Why agentevals?
+
+Most evaluation tools require you to **re-execute your agent** for every test — burning tokens, time, and money on duplicate LLM calls. agentevals takes a different approach:
+
+- **No re-execution** — score agents from existing traces without replaying expensive LLM calls
+- **Framework-agnostic** — works with any agent framework that emits OpenTelemetry spans
+- **Golden eval sets** — compare actual behavior against defined expected behaviors for deterministic pass/fail gating
+- **Custom evaluators** — write scoring logic in Python, JavaScript, or any language
+- **CI/CD ready** — gate deployments on quality thresholds directly in your pipeline
+- **Local-first** — no cloud dependency required; everything runs on your machine
+
+## How It Works
+
+agentevals follows three simple steps:
+
+1. **Collect traces** — Instrument your agent with OpenTelemetry (or export traces from your tracing backend). Point the OTLP exporter at the agentevals receiver, or load trace files directly.
+2. **Define eval sets** — Create golden evaluation sets that describe expected agent behavior: which tools should be called, in what order, and what the output should look like.
+3. **Run evaluations** — Use the CLI, Web UI, or MCP server to score traces against your eval sets. Get per-metric scores, pass/fail results, and detailed span-level breakdowns.
+
+
 > [!IMPORTANT]
 > This project is under active development. Expect breaking changes.

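The three steps in the new "How It Works" section can be illustrated with a minimal Python sketch. It does not use the agentevals API. The span layout, the `tool.name` attribute key, the `expected_tools` field, and the function names are all hypothetical, chosen only to show the idea of scoring an already recorded trace against a golden tool sequence without re-running the agent.

```python
from dataclasses import dataclass

# Step 1 (collect): a hypothetical, simplified recorded trace.
# Real Jaeger JSON or OTLP spans are shaped differently and need a loader.
TRACE = [
    {"name": "call_llm", "start": 1, "attributes": {}},
    {"name": "execute_tool", "start": 2, "attributes": {"tool.name": "search_flights"}},
    {"name": "call_llm", "start": 3, "attributes": {}},
    {"name": "execute_tool", "start": 4, "attributes": {"tool.name": "book_flight"}},
]

# Step 2 (define): a hypothetical golden eval case listing the expected tools in order.
GOLDEN = {"expected_tools": ["search_flights", "book_flight"]}


@dataclass
class EvalResult:
    score: float
    passed: bool


def tool_sequence(spans):
    """Return tool names in call order, taken from spans that carry a tool name."""
    ordered = sorted(spans, key=lambda span: span["start"])
    return [
        span["attributes"]["tool.name"]
        for span in ordered
        if "tool.name" in span["attributes"]
    ]


def score_trajectory(spans, golden, threshold=1.0):
    """Step 3 (run): score 1.0 for an exact in-order tool match, else 0.0."""
    actual = tool_sequence(spans)
    score = 1.0 if actual == golden["expected_tools"] else 0.0
    return EvalResult(score=score, passed=score >= threshold)


result = score_trajectory(TRACE, GOLDEN)
print(result)  # EvalResult(score=1.0, passed=True)
```

Because the trace is plain recorded data, the same `TRACE` can be scored again with a different golden set or threshold at no extra LLM cost, which is the property the README emphasizes.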
Lines changed: 13 additions & 0 deletions