Merge pull request #64 from ProfessorSeb/main

krisztianfekete · web-flow · commit 9668ba854f5a · 2026-03-24T16:14:05.000+01:00
Updating the readme.md file
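
Reviewer note: the "How It Works" section added in this diff describes three steps (collect traces, define eval sets, run evaluations). The idea can be illustrated with a minimal, stdlib-only sketch. Everything here is an assumption made for illustration and is not the agentevals API: the `Span` shape, the `tool.name` attribute key, the positional scoring rule, and the `evaluate` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Simplified stand-in for a recorded OpenTelemetry span (hypothetical shape)."""
    name: str
    start_ns: int
    attributes: dict = field(default_factory=dict)


def tool_calls(spans):
    """Return tool names in start-time order, read from a hypothetical 'tool.name' attribute."""
    ordered = sorted(spans, key=lambda s: s.start_ns)
    return [s.attributes["tool.name"] for s in ordered if "tool.name" in s.attributes]


def trajectory_score(actual, expected):
    """Positional matches divided by the length of the longer sequence."""
    if not expected:
        return 1.0 if not actual else 0.0
    matches = sum(1 for a, e in zip(actual, expected) if a == e)
    return matches / max(len(actual), len(expected))


def evaluate(spans, expected_tools, threshold=1.0):
    """Score an already-recorded trace against a golden tool sequence; the agent is not re-run."""
    actual = tool_calls(spans)
    score = trajectory_score(actual, expected_tools)
    return {"actual": actual, "score": score, "passed": score >= threshold}


# Step 1: a trace recorded once (spans may arrive out of order).
trace = [
    Span("call_llm", 10),
    Span("execute_tool", 30, {"tool.name": "book_flight"}),
    Span("execute_tool", 20, {"tool.name": "search_flights"}),
]

# Step 2: a golden eval set describing the expected tool order.
golden = ["search_flights", "book_flight"]

# Step 3: score the stored trace as many times as needed.
result = evaluate(trace, golden)
print(result)
```

Because the score is computed from stored spans only, the same trace can be re-scored against a revised golden set or a different threshold without any further LLM calls.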
diff --git a/README.md b/README.md
@@ -1,15 +1,66 @@
 <p align="center">
-  <img src="docs/assets/logo-color.png" alt="agentevals" width="420" />
+  <picture>
+    <source media="(prefers-color-scheme: dark)" srcset="docs/assets/logo-color-on-transparent.svg">
+    <source media="(prefers-color-scheme: light)" srcset="docs/assets/logo-dark-on-transparent.svg">
+    <img src="docs/assets/logo-color-on-transparent.svg" alt="agentevals" width="420" />
+  </picture>
 </p>
 
-`agentevals` evaluates AI agent behavior from OpenTelemetry traces, without re-running the agent. Record once, score as many times as you want.
+<h1 align="center">Ship Agents Reliably</h1>
 
-Works with any OTel-instrumented framework (LangChain, Strands, Google ADK, and others). Supports Jaeger JSON and OTLP trace formats, built-in and custom evaluators, and LLM-based judges.
+<p align="center">
+Benchmark your agents before they hit production.<br>
+agentevals scores performance and inference quality from OpenTelemetry traces — no re-runs, no guesswork.
+</p>
+
+<p align="center">
+  <a href="https://github.com/agentevals-dev/agentevals/stargazers"><img src="https://img.shields.io/github/stars/agentevals-dev/agentevals?style=social" alt="GitHub Stars"></a>
+  &nbsp;
+  <a href="https://discord.gg/cpveEn8Ah2"><img src="https://img.shields.io/discord/1435836734666707190?label=Discord&logo=discord&logoColor=white&color=5865F2" alt="Discord"></a>
+  &nbsp;
+  <a href="https://github.com/agentevals-dev/agentevals/releases"><img src="https://img.shields.io/github/v/release/agentevals-dev/agentevals?label=Release" alt="Release"></a>
+  &nbsp;
+  <a href="https://github.com/agentevals-dev/agentevals/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-Apache%202.0-green.svg" alt="License"></a>
+  &nbsp;
+  <a href="https://pypi.org/project/agentevals-cli/"><img src="https://img.shields.io/pypi/v/agentevals-cli?label=PyPI&color=blue" alt="PyPI"></a>
+</p>
+
+<p align="center">
+  <a href="#installation">Install</a> · <a href="#quick-start">Quick Start</a> · <a href="https://github.com/agentevals-dev/agentevals/releases">Releases</a> · <a href="CONTRIBUTING.md">Contributing</a> · <a href="https://discord.gg/cpveEn8Ah2">Discord</a>
+</p>
+
+---
+
+## What is agentevals?
+
+agentevals is a framework-agnostic evaluation solution that scores AI agent behavior directly from [OpenTelemetry](https://opentelemetry.io/) traces. Record your agent's actions once, then evaluate as many times as you want — no re-runs, no guesswork.
+
+It works with any OTel-instrumented framework (LangChain, Strands, Google ADK, and others), supports Jaeger JSON and OTLP trace formats, and ships with built-in evaluators, custom evaluator support, and LLM-based judges.
 
 - **CLI** for scripting and CI pipelines
 - **Web UI** for visual inspection and local developer experience
 - **MCP server** so MCP clients can run evaluations from a conversation
 
+## Why agentevals?
+
+Most evaluation tools require you to **re-execute your agent** for every test — burning tokens, time, and money on duplicate LLM calls. agentevals takes a different approach:
+
+- **No re-execution**: score agents from existing traces without replaying expensive LLM calls
+- **Framework-agnostic**: works with any agent framework that emits OpenTelemetry spans
+- **Golden eval sets**: compare actual behavior against defined expected behavior for deterministic pass/fail gating
+- **Custom evaluators**: write scoring logic in Python, JavaScript, or any other language
+- **CI/CD ready**: gate deployments on quality thresholds directly in your pipeline
+- **Local-first**: no cloud dependency; everything runs on your machine
+
+## How It Works
+
+Evaluating an agent with agentevals takes three steps:
+
+1. **Collect traces** — Instrument your agent with OpenTelemetry (or export traces from your tracing backend). Point the OTLP exporter at the agentevals receiver, or load trace files directly.
+2. **Define eval sets** — Create golden evaluation sets that describe expected agent behavior: which tools should be called, in what order, and what the output should look like.
+3. **Run evaluations** — Use the CLI, Web UI, or MCP server to score traces against your eval sets. Get per-metric scores, pass/fail results, and detailed span-level breakdowns.
+
+
 > [!IMPORTANT]
 > This project is under active development. Expect breaking changes.
 
diff --git a/docs/assets/logo-color-on-transparent.svg b/docs/assets/logo-color-on-transparent.svg
@@ -0,0 +1,13 @@
+<svg width="3302" height="1066" viewBox="0 0 3302 1066" fill="none" xmlns="http://www.w3.org/2000/svg">
+<path d="M518.695 264C560.958 264 595.207 298.274 595.207 340.537C595.207 382.8 560.958 417.048 518.695 417.048C454.983 417.048 403.305 468.548 403 532.184V533.304C403.306 596.94 454.983 648.438 518.695 648.438H518.722C560.985 648.439 595.232 682.687 595.232 724.95C595.232 767.213 560.984 801.461 518.722 801.461C476.459 801.461 442.21 767.213 442.21 724.95V724.67C442.057 661.008 390.482 609.408 326.795 609.255H326.515C284.252 609.255 250.004 575.006 250.004 532.743C250.004 490.48 284.252 456.232 326.515 456.232H326.642C390.431 456.156 442.108 404.453 442.185 340.664V340.512C442.185 298.249 476.432 264 518.695 264ZM492.436 469.353C527.452 454.848 567.596 471.476 582.101 506.492C596.605 541.508 579.976 581.653 544.96 596.157C509.944 610.661 469.8 594.033 455.296 559.017C440.792 524.001 457.42 483.857 492.436 469.353Z" fill="#8023C3"/>
+<path d="M1029.16 401.476V655.084H982.736L976.321 616.93C956.253 644.357 928.75 658.054 893.878 658.054C870.849 658.054 850.254 652.839 832.16 642.443C814.066 632.046 799.887 617.029 789.688 597.359C779.49 577.721 774.391 554.683 774.391 528.247C774.391 501.81 779.589 479.796 789.952 460.125C800.315 440.487 814.56 425.305 832.654 414.546C850.748 403.819 871.178 398.439 893.878 398.439C912.301 398.439 928.454 401.839 942.271 408.605C956.089 415.371 967.274 424.711 975.861 436.593V401.41H1029.19L1029.16 401.476ZM902.76 612.97C924.802 612.97 942.567 605.214 956.089 589.701C969.577 574.189 976.321 554.023 976.321 529.27C976.321 504.516 969.577 483.294 956.089 467.617C942.6 451.94 924.802 444.085 902.76 444.085C880.718 444.085 862.92 451.94 849.432 467.617C835.944 483.294 829.199 503.526 829.199 528.28C829.199 553.033 835.944 573.76 849.432 589.47C862.92 605.148 880.685 613.003 902.76 613.003V612.97Z" fill="white"/>
+<path d="M2723.45 401.476V655.084H2677.03L2670.61 616.93C2650.54 644.357 2623.04 658.054 2588.17 658.054C2565.14 658.054 2544.54 652.839 2526.45 642.443C2508.36 632.046 2494.18 617.029 2483.98 597.359C2473.78 577.721 2468.68 554.683 2468.68 528.247C2468.68 501.81 2473.88 479.796 2484.24 460.125C2494.6 440.487 2508.85 425.305 2526.94 414.546C2545.04 403.819 2565.47 398.439 2588.17 398.439C2606.59 398.439 2622.74 401.839 2636.56 408.605C2650.38 415.371 2661.56 424.711 2670.15 436.593V401.41H2723.48L2723.45 401.476ZM2597.05 612.97C2619.09 612.97 2636.86 605.214 2650.38 589.701C2663.87 574.189 2670.61 554.023 2670.61 529.27C2670.61 504.516 2663.87 483.294 2650.38 467.617C2636.89 451.94 2619.09 444.085 2597.05 444.085C2575.01 444.085 2557.21 451.94 2543.72 467.617C2530.23 483.294 2523.49 503.526 2523.49 528.28C2523.49 553.033 2530.23 573.76 2543.72 589.47C2557.21 605.148 2574.97 613.003 2597.05 613.003V612.97Z" fill="white"/>
+<path d="M1308.72 401.47V644.682C1308.72 680.36 1298.19 707.985 1277.14 727.655C1256.08 747.293 1223.48 757.129 1179.36 757.129C1145.11 757.129 1117.32 749.439 1095.93 734.091C1074.51 718.744 1062.67 697.027 1060.37 668.94H1114.68C1117.97 683.132 1125.54 694.123 1137.38 701.879C1149.23 709.635 1164.52 713.529 1183.31 713.529C1231.7 713.529 1255.88 689.898 1255.88 642.701V614.482C1237.46 642.206 1209.96 656.101 1173.44 656.101C1150.41 656.101 1129.82 650.887 1111.72 640.49C1093.63 630.094 1079.45 615.242 1069.25 595.901C1059.05 576.593 1053.95 553.721 1053.95 527.284C1053.95 500.847 1059.15 479.394 1069.51 459.922C1079.88 440.449 1094.12 425.333 1112.22 414.606C1130.31 403.88 1150.74 398.5 1173.44 398.5C1192.52 398.5 1209 402.296 1222.82 409.887C1236.64 417.478 1247.82 427.907 1256.41 441.109L1262.33 401.47H1308.75H1308.72ZM1182.32 610.984C1204.36 610.984 1222.13 603.294 1235.65 587.947C1249.14 572.6 1255.88 552.698 1255.88 528.274C1255.88 503.851 1249.14 482.794 1235.65 467.084C1222.16 451.406 1204.36 443.551 1182.32 443.551C1160.28 443.551 1142.48 451.307 1128.99 466.82C1115.51 482.332 1108.76 502.498 1108.76 527.251C1108.76 552.005 1115.51 572.17 1128.99 587.683C1142.48 603.195 1160.25 610.951 1182.32 610.951V610.984Z" fill="white"/>
+<path d="M1324.01 528.769C1324.01 502.696 1329.21 479.823 1339.57 460.153C1349.94 440.515 1364.41 425.333 1383.03 414.573C1401.62 403.847 1422.94 398.467 1446.99 398.467C1471.03 398.467 1492.81 403.417 1511.43 413.319C1530.02 423.22 1544.66 437.28 1555.39 455.433C1566.08 473.585 1571.61 494.906 1571.93 519.33C1571.93 525.931 1571.44 532.697 1570.45 539.628H1379.87V542.598C1381.19 564.711 1388.1 582.237 1400.6 595.109C1413.1 607.98 1429.71 614.416 1450.47 614.416C1466.92 614.416 1480.74 610.555 1491.96 602.766C1503.14 595.01 1510.55 584.019 1514.16 569.827H1567.49C1562.89 595.571 1550.45 616.727 1530.22 633.229C1509.99 649.731 1484.72 657.982 1454.42 657.982C1428.07 657.982 1405.14 652.636 1385.53 641.876C1365.96 631.15 1350.79 616.034 1340.1 596.561C1329.41 577.088 1324.04 554.447 1324.04 528.703L1324.01 528.769ZM1517.55 500.517C1515.25 482.035 1507.91 467.579 1495.58 457.182C1483.24 446.786 1467.68 441.571 1448.93 441.571C1431.49 441.571 1416.42 446.951 1403.76 457.677C1391.09 468.404 1383.76 482.695 1381.78 500.517H1517.55Z" fill="white"/>
+<path d="M1587.2 401.47H1633.61L1639.54 434.673C1658.62 410.58 1685.63 398.5 1720.5 398.5C1735.3 398.5 1748.96 400.711 1761.49 405.2C1773.99 409.656 1784.78 416.686 1793.83 426.257C1802.88 435.828 1809.88 447.974 1814.82 462.661C1819.75 477.348 1822.22 494.94 1822.22 515.402V655.078H1768.4V518.373C1768.4 494.28 1763.3 475.929 1753.1 463.387C1742.9 450.845 1727.93 444.574 1708.16 444.574C1687.11 444.574 1670.56 451.935 1658.55 466.622C1646.54 481.309 1640.52 501.541 1640.52 527.317V655.111H1587.2V401.47Z" fill="white"/>
+<path d="M1845.75 401.47V330.609H1899.57V401.437H1960.3V448.502H1899.57V580.752C1899.57 590.653 1901.54 597.683 1905.49 601.809C1909.44 605.934 1916.18 608.014 1925.72 608.014H1966.22V655.078H1914.87C1890.85 655.078 1873.31 649.566 1862.29 638.477C1851.27 627.42 1845.75 609.994 1845.75 586.23V448.535" fill="white"/>
+<path d="M1976.12 528.769C1976.12 502.696 1981.32 479.823 1991.69 460.153C2002.05 440.515 2016.52 425.333 2035.14 414.573C2053.73 403.847 2075.05 398.467 2099.1 398.467C2123.15 398.467 2144.93 403.417 2163.51 413.319C2182.1 423.22 2196.77 437.28 2207.47 455.433C2218.16 473.585 2223.69 494.906 2224.01 519.33C2224.01 525.931 2223.52 532.697 2222.53 539.628H2031.95V542.598C2033.27 564.711 2040.18 582.237 2052.68 595.109C2065.18 607.98 2081.83 614.416 2102.55 614.416C2119 614.416 2132.85 610.555 2144.04 602.766C2155.22 595.01 2162.63 584.019 2166.24 569.827H2219.57C2214.97 595.571 2202.53 616.727 2182.3 633.229C2162.07 649.731 2136.8 657.982 2106.5 657.982C2080.15 657.982 2057.19 652.636 2037.61 641.876C2018.04 631.15 2002.87 616.034 1992.18 596.561C1981.49 577.088 1976.12 554.447 1976.12 528.703V528.769ZM2169.67 500.517C2167.36 482.035 2160.03 467.579 2147.69 457.182C2135.35 446.786 2119.79 441.571 2101.04 441.571C2083.6 441.571 2068.54 446.951 2055.87 457.677C2043.2 468.404 2035.87 482.695 2033.89 500.517H2169.67Z" fill="white"/>
+<path d="M2216.86 401.475H2274.14L2343.75 597.621L2412.38 401.475H2468.67L2375.33 655.082H2310.16L2216.83 401.475H2216.86Z" fill="white"/>
+<path d="M2754.43 308.332H2807.76V655.079H2754.43V308.332Z" fill="white"/>
+<path d="M2882.6 571.352C2883.59 584.554 2889.77 595.379 2901.12 603.796C2912.47 612.212 2927.21 616.436 2945.31 616.436C2961.43 616.436 2974.52 613.4 2984.55 607.261C2994.59 601.155 2999.62 592.97 2999.62 582.739C2999.62 574.157 2997.32 567.721 2992.71 563.431C2988.11 559.14 2981.92 556.104 2974.19 554.256C2966.46 552.44 2954.52 550.559 2938.4 548.546C2916.36 545.905 2898.16 542.341 2883.85 537.885C2869.54 533.43 2857.99 526.334 2849.28 516.597C2840.56 506.861 2836.18 493.725 2836.18 477.223C2836.18 461.711 2840.56 447.915 2849.28 435.868C2857.99 423.821 2870 414.481 2885.33 407.88C2900.63 401.279 2918 397.979 2937.41 397.979C2969.32 397.979 2995.25 405.075 3015.18 419.267C3035.09 433.459 3045.88 453.459 3047.52 479.203H2995.67C2994.36 467.651 2988.6 458.146 2978.4 450.72C2968.2 443.294 2955.37 439.564 2939.88 439.564C2924.38 439.564 2911.92 442.535 2902.34 448.476C2892.8 454.416 2888.03 462.503 2888.03 472.734C2888.03 480.325 2890.4 486.035 2895.2 489.83C2899.97 493.626 2905.99 496.266 2913.23 497.752C2920.47 499.237 2932.15 500.986 2948.3 502.966C2970.01 505.277 2988.31 508.841 3003.11 513.627C3017.91 518.413 3029.76 526.004 3038.67 536.4C3047.56 546.797 3052 560.923 3052 578.745C3052 594.587 3047.39 608.548 3038.18 620.595C3028.97 632.642 3016.27 641.883 3000.15 648.319C2984.03 654.755 2965.9 657.989 2945.83 657.989C2911.92 657.989 2884.51 650.299 2863.62 634.952C2842.73 619.605 2831.94 598.383 2831.28 571.286H2882.64L2882.6 571.352Z" fill="white"/>
+</svg>
diff --git a/docs/assets/logo-dark-on-transparent.svg b/docs/assets/logo-dark-on-transparent.svg