README.md (+16 -2: 16 additions, 2 deletions)
@@ -7,6 +7,22 @@ The tool provides a CLI for local dev work, scripting and CI pipelines, a web UI

> [!IMPORTANT]
> This project is under active development. Expect breaking changes.

## Instrument Your Agent in 3 Lines

```python
from agentevals import AgentEvals

app = AgentEvals()

with app.session(eval_set_id="my-eval"):
    # your agent code (any framework, unchanged)
    agent.invoke("Roll a 20-sided die for me")
```

Wrap your agent code in `app.session()` and every LLM call, tool invocation, and response streams live to the agentevals UI. No OpenTelemetry setup, no WebSocket plumbing, no cleanup: the SDK handles all of it.

This requires the `[streaming]` extra: `pip install "agentevals[streaming]"`. It works with LangChain, Strands, Google ADK, or anything that emits OTel spans. See [examples/sdk_example/](examples/sdk_example/) for framework-specific patterns.

## Installation

Download a release wheel from the [releases page](../../releases). Two variants are available; both share the same filename but differ in contents:

examples/README.md (+53 -1: 53 additions, 1 deletion)
@@ -4,6 +4,53 @@ agentevals evaluates AI agents by consuming their [OpenTelemetry](https://opente

This guide covers the instrumentation patterns agentevals supports, with a recommendation for new projects. Each example in this directory is a working agent you can run and modify.

## SDK (Quick Start)

The `AgentEvals` SDK wraps all OTel boilerplate into a single context manager. Use this for the simplest setup:

```python
from agentevals import AgentEvals

app = AgentEvals()

with app.session(eval_set_id="my-eval"):
    # Your agent code here (any framework, unchanged)
    result = my_agent.invoke("Hello!")
```

Works with LangChain, Strands, Google ADK, and any OTel-instrumented agent. For frameworks that create their own `TracerProvider` (like Strands), pass it explicitly:

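```python
# Import path assumed from the Strands Agents SDK.
from strands.telemetry import StrandsTelemetry

# `app` is the AgentEvals instance created above; `agent` is your Strands agent.
telemetry = StrandsTelemetry()

with app.session(eval_set_id="strands-eval", tracer_provider=telemetry.tracer_provider):
    agent("Roll a die")
```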

For simple prompt→response agents, there's also a decorator shorthand:

```python
app = AgentEvals(eval_set_id="my-eval")

# `llm` is your own model client; it is not defined in this snippet.
@app.agent
def my_agent(prompt):
    return llm.invoke(prompt).content

app.run(["Hello!", "Tell me a joke"])
```
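
To make the decorator pattern concrete, here is a minimal, self-contained sketch of how a register-then-run shorthand of this shape can work. `SketchApp` and `echo_agent` are hypothetical names invented for illustration; this is not the agentevals implementation, and the real `app.agent` and `app.run` may behave differently.

```python
from typing import Callable, List, Optional


class SketchApp:
    """Toy stand-in: registers one prompt -> response function and runs it over prompts."""

    def __init__(self, eval_set_id: str) -> None:
        self.eval_set_id = eval_set_id
        self._agent: Optional[Callable[[str], str]] = None

    def agent(self, fn: Callable[[str], str]) -> Callable[[str], str]:
        # Register the function and return it unchanged, so it stays directly callable.
        self._agent = fn
        return fn

    def run(self, prompts: List[str]) -> List[str]:
        if self._agent is None:
            raise RuntimeError("no agent registered; decorate a function with @app.agent")
        # Assumption: a real SDK would open a traced session around each call here.
        return [self._agent(prompt) for prompt in prompts]


sketch_app = SketchApp(eval_set_id="my-eval")


@sketch_app.agent
def echo_agent(prompt: str) -> str:
    # Placeholder agent: echoes the prompt instead of calling a model.
    return f"echo: {prompt}"


responses = sketch_app.run(["Hello!", "Tell me a joke"])
```

Because the decorator returns the function unchanged, `echo_agent("hi")` can still be called directly outside of `run()`.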

To keep the SDK wired up in your code but skip streaming when the dev server isn't running, set `streaming=False`.

When disabled, `session()` and `session_async()` become no-ops: your agent code runs normally without any WebSocket connection, OTel setup, or background threads.
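
The no-op behavior described above can be sketched with the standard library alone. `SketchEvals` is a hypothetical stand-in, not the agentevals class, and it assumes `streaming` is passed as a constructor keyword, which this excerpt does not confirm.

```python
from contextlib import contextmanager, nullcontext


class SketchEvals:
    """Toy stand-in showing a session that turns into a no-op when streaming is off."""

    def __init__(self, streaming: bool = True) -> None:
        self.streaming = streaming
        self.events = []  # records what a live session would have done

    def session(self, eval_set_id: str):
        if not self.streaming:
            # Disabled: hand back a context manager that does nothing on enter or exit.
            return nullcontext()
        return self._live_session(eval_set_id)

    @contextmanager
    def _live_session(self, eval_set_id: str):
        # Stand-in for connection and tracing setup and teardown.
        self.events.append(f"open:{eval_set_id}")
        try:
            yield
        finally:
            self.events.append(f"close:{eval_set_id}")


disabled = SketchEvals(streaming=False)
with disabled.session(eval_set_id="my-eval"):
    answer = "agent ran"  # the wrapped code still executes normally

enabled = SketchEvals(streaming=True)
with enabled.session(eval_set_id="my-eval"):
    pass
```

The call site is identical in both modes, which is what lets the `with` block stay in place whether or not a dev server is running.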

See [sdk_example/](./sdk_example/) for complete working examples.

## Advanced: Manual OTel Setup

> [!TIP]
> **Prefer OTel GenAI semantic conventions** for new agents. They are framework-agnostic,
> interoperable across observability tools, and benefit from the growing OTel ecosystem.
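
As a rough illustration of what those conventions look like on a span, the sketch below builds the attribute set for a single chat call as a plain dictionary. The attribute keys are taken from the OTel GenAI semantic conventions, which are still evolving, so check the current spec before relying on them. The model name and token counts are made-up placeholder values.

```python
# Span attributes for one LLM chat call, keyed by OTel GenAI semantic-convention names.
chat_span_attributes = {
    "gen_ai.operation.name": "chat",
    "gen_ai.request.model": "example-model",  # hypothetical model name
    "gen_ai.usage.input_tokens": 12,          # placeholder value
    "gen_ai.usage.output_tokens": 34,         # placeholder value
}

# The conventions suggest naming the span "{operation} {model}".
span_name = (
    f'{chat_span_attributes["gen_ai.operation.name"]} '
    f'{chat_span_attributes["gen_ai.request.model"]}'
)
```

Sharing one attribute vocabulary is what lets a consumer read spans from different frameworks without per-framework parsing.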