Commit a8ca8a6

Merge pull request #20 from agentevals-dev/feature/sdk
Add initial SDK implementation
2 parents 4cf9217 + 4b891d2 commit a8ca8a6

10 files changed

Lines changed: 1157 additions & 3 deletions

README.md

Lines changed: 16 additions & 2 deletions
@@ -7,6 +7,22 @@ The tool provides a CLI for local dev work, scripting and CI pipelines, a web UI
 > [!IMPORTANT]
 > This project is under active development. Expect breaking changes.
 
+## Instrument Your Agent in 3 Lines
+
+```python
+from agentevals import AgentEvals
+
+app = AgentEvals()
+
+with app.session(eval_set_id="my-eval"):
+    # your agent code — any framework, unchanged
+    agent.invoke("Roll a 20-sided die for me")
+```
+
+Wrap your agent code in `app.session()` and every LLM call, tool invocation, and response streams live to the agentevals UI. No OpenTelemetry setup, no WebSocket plumbing, no cleanup — the SDK handles all of it.
+
+Requires the `[streaming]` extra: `pip install "agentevals[streaming]"`. Works with LangChain, Strands, Google ADK, or anything that emits OTel spans. See [examples/sdk_example/](examples/sdk_example/) for framework-specific patterns.
+
 ## Installation
 
 Download a release wheel from the [releases page](../../releases). Two variants are available — both share the same filename but differ in contents:
@@ -40,8 +56,6 @@ uv sync
 
 # Using Nix (includes all dependencies)
 nix develop .
-
-
 ```
 
 Run a quick evaluation:

examples/README.md

Lines changed: 53 additions & 1 deletion
@@ -4,6 +4,53 @@ agentevals evaluates AI agents by consuming their [OpenTelemetry](https://opente
 
 This guide covers the instrumentation patterns agentevals supports, with a recommendation for new projects. Each example in this directory is a working agent you can run and modify.
 
+## SDK (Quick Start)
+
+The `AgentEvals` SDK wraps all OTel boilerplate into a single context manager. Use this for the simplest setup:
+
+```python
+from agentevals import AgentEvals
+
+app = AgentEvals()
+
+with app.session(eval_set_id="my-eval"):
+    # Your agent code here — any framework, unchanged
+    result = my_agent.invoke("Hello!")
+```
+
+Works with LangChain, Strands, Google ADK, and any OTel-instrumented agent. For frameworks that create their own `TracerProvider` (like Strands), pass it explicitly:
+
+```python
+telemetry = StrandsTelemetry()
+
+with app.session(eval_set_id="strands-eval", tracer_provider=telemetry.tracer_provider):
+    agent("Roll a die")
+```
+
+For simple prompt→response agents, there's also a decorator shorthand:
+
+```python
+app = AgentEvals(eval_set_id="my-eval")
+
+@app.agent
+def my_agent(prompt):
+    return llm.invoke(prompt).content
+
+app.run(["Hello!", "Tell me a joke"])
+```
+
+To keep the SDK wired up in your code but skip streaming when the dev server isn't running, set `streaming=False`:
+
+```python
+app = AgentEvals(streaming=os.getenv("AGENTEVALS_STREAM", "1") == "1")
+```
+
+When disabled, `session()` and `session_async()` become no-ops — your agent code runs normally without any WebSocket connection, OTel setup, or background threads.
+
+See [sdk_example/](./sdk_example/) for complete working examples.
+
+## Advanced: Manual OTel Setup
+
 > [!TIP]
 > **Prefer OTel GenAI semantic conventions** for new agents. They are framework-agnostic,
 > interoperable across observability tools, and benefit from the growing OTel ecosystem.
@@ -138,7 +185,12 @@ cd ui && npm run dev
 ### 3. Run an Example Agent
 
 ```bash
-# Pick one:
+# SDK examples (recommended starting point):
+python examples/sdk_example/context_manager_example.py
+python examples/sdk_example/decorator_example.py
+python examples/sdk_example/async_example.py
+
+# Manual OTel setup examples:
 python examples/dice_agent/main.py
 python examples/langchain_agent/main.py
 python examples/strands_agent/main.py
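
Note on the `streaming=False` toggle added in examples/README.md above: the README says `session()` becomes a no-op when streaming is disabled. One way such a toggle could be built is sketched below. This is not the SDK's actual implementation; `SketchApp`, `_live_session`, and `streaming_enabled` are hypothetical names invented for the illustration, and the only details taken from the diff are the `AGENTEVALS_STREAM` variable and its default of `"1"`. The README snippet also relies on `os` having been imported by the caller.

```python
import os
from contextlib import contextmanager, nullcontext


class SketchApp:
    """Hypothetical stand-in for the SDK object; not the real AgentEvals class."""

    def __init__(self, streaming=True):
        self.streaming = streaming
        self.events = []  # records setup/teardown so the behaviour is observable

    def session(self, eval_set_id):
        if not self.streaming:
            # Disabled: hand back a context manager that does nothing.
            return nullcontext()
        return self._live_session(eval_set_id)

    @contextmanager
    def _live_session(self, eval_set_id):
        self.events.append(f"open:{eval_set_id}")
        try:
            yield
        finally:
            # Runs even if the agent code raises, so teardown is never skipped.
            self.events.append(f"close:{eval_set_id}")


def streaming_enabled(environ=os.environ):
    """Mirror the README toggle: any value other than "1" disables streaming."""
    return environ.get("AGENTEVALS_STREAM", "1") == "1"
```

With this shape, calling code keeps the same `with app.session(...)` block in both modes, and only the constructor argument decides whether anything is set up.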

examples/sdk_example/async_example.py

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
+"""Async context manager for ADK and other async agents.
+
+Use session_async() when your agent code is async. This avoids the
+background thread used by the sync context manager.
+
+Prerequisites:
+    1. Start agentevals dev server:
+       $ agentevals serve --dev --port 8001
+
+    2. Set your API key:
+       $ export GOOGLE_API_KEY="your-key-here"
+
+Usage:
+    $ python examples/sdk_example/async_example.py
+"""
+
+import asyncio
+import logging
+from pathlib import Path
+
+logging.basicConfig(level=logging.INFO)
+
+from dotenv import load_dotenv
+from google.adk.runners import InMemoryRunner
+from google.genai import types
+
+# Import the dice_agent from the sibling example directory.
+# In a real project this would be a normal package import.
+import importlib.util
+
+_agent_path = Path(__file__).resolve().parent.parent / "dice_agent" / "agent.py"
+_spec = importlib.util.spec_from_file_location("dice_agent_module", _agent_path)
+_mod = importlib.util.module_from_spec(_spec)
+_spec.loader.exec_module(_mod)
+dice_agent = _mod.dice_agent
+
+from agentevals import AgentEvals
+
+load_dotenv(override=True)
+
+app = AgentEvals()
+
+
+async def main():
+    async with app.session_async(
+        eval_set_id="sdk-async-demo",
+        metadata={"model": dice_agent.model},
+    ):
+        runner = InMemoryRunner(agent=dice_agent, app_name="dice_app")
+        session = await runner.session_service.create_session(
+            app_name="dice_app", user_id="demo_user"
+        )
+
+        for query in ["Roll a 20-sided die", "Is that number prime?"]:
+            print(f"User: {query}")
+            content = types.Content(
+                role="user", parts=[types.Part.from_text(text=query)]
+            )
+            async for event in runner.run_async(
+                user_id="demo_user", session_id=session.id, new_message=content
+            ):
+                if event.content.parts and event.content.parts[0].text:
+                    print(f"Agent: {event.content.parts[0].text}")
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
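
Note on the path-based import in the file above: the example loads `dice_agent` from a sibling directory with `importlib.util` instead of a package import. The same standard-library technique is shown below in a form that runs on its own, with a check for the case where no loader can be found. The helper name `load_module_from_path` and the temporary `agent.py` file are made up for the illustration and are not part of the commit.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_module_from_path(module_name, path):
    """Import a Python file by filesystem path and return the module object."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        # No loader could be determined for this path.
        raise ImportError(f"cannot load {module_name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    # Executes the file's top-level code inside the new module's namespace.
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    agent_file = Path(tmp) / "agent.py"
    agent_file.write_text("dice_agent = 'placeholder agent'\n")
    mod = load_module_from_path("dice_agent_module", agent_file)
    loaded_agent = mod.dice_agent
```

A module loaded this way is not registered in `sys.modules`, so a later `import dice_agent_module` elsewhere would not find it. That is fine for a standalone example script, and it matches the file's own comment that a real project would use a normal package import.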

examples/sdk_example/context_manager_example.py

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+"""Drop-in streaming for existing agent code using the AgentEvals SDK.
+
+This is the primary SDK pattern — wrap your existing code in a context manager
+and traces stream to the agentevals UI automatically.
+
+Prerequisites:
+    1. Start agentevals dev server:
+       $ agentevals serve --dev --port 8001
+
+    2. (Optional) Start UI:
+       $ cd ui && npm run dev
+
+    3. Set your API key:
+       $ export OPENAI_API_KEY="your-key-here"
+
+Usage:
+    $ python examples/sdk_example/context_manager_example.py
+"""
+
+import logging
+
+from agentevals import AgentEvals
+from dotenv import load_dotenv
+from langchain_openai import ChatOpenAI
+
+logging.basicConfig(level=logging.INFO)
+load_dotenv(override=True)
+
+app = AgentEvals()
+llm = ChatOpenAI(model="gpt-4o-mini")
+
+with app.session(eval_set_id="sdk-context-manager-demo", metadata={"model": "gpt-4o-mini"}):
+    print(llm.invoke("What is 2 + 2?").content)
+    print(llm.invoke("Is that number prime?").content)

examples/sdk_example/decorator_example.py

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+"""Decorator shorthand for simple prompt-to-response agents.
+
+Use this pattern when your agent is a simple function that takes a prompt
+and returns a result. For more complex agents with multi-turn conversations
+or state, use the context manager pattern instead.
+
+Prerequisites:
+    1. Start agentevals dev server:
+       $ agentevals serve --dev --port 8001
+
+    2. Set your API key:
+       $ export OPENAI_API_KEY="your-key-here"
+
+Usage:
+    $ python examples/sdk_example/decorator_example.py
+"""
+
+import logging
+
+from agentevals import AgentEvals
+from dotenv import load_dotenv
+from langchain_openai import ChatOpenAI
+
+logging.basicConfig(level=logging.INFO)
+load_dotenv(override=True)
+
+app = AgentEvals(eval_set_id="sdk-decorator-demo")
+llm = ChatOpenAI(model="gpt-4o-mini")
+
+
+@app.agent
+def my_agent(prompt):
+    return llm.invoke(prompt).content
+
+
+app.run(["What is 2 + 2?", "Tell me a joke", "Is 17 prime?"])
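
Note on the decorator pattern in the file above: `@app.agent` registers a prompt-to-response function and `app.run([...])` feeds it a list of prompts. The sketch below shows how a register-then-run API of that shape can work in plain Python. It is only an illustration: `SketchApp` is a hypothetical class, and the diff does not show whether the real `run()` returns results or wraps each call in a session, so both behaviours here are assumptions.

```python
class SketchApp:
    """Hypothetical illustration of a register-then-run decorator API."""

    def __init__(self, eval_set_id):
        self.eval_set_id = eval_set_id
        self._agent = None

    def agent(self, func):
        # Register the function and return it unchanged so it stays callable.
        self._agent = func
        return func

    def run(self, prompts):
        if self._agent is None:
            raise RuntimeError("no function registered with @app.agent")
        # One call per prompt; results come back in input order (an assumption).
        return [self._agent(prompt) for prompt in prompts]


app = SketchApp(eval_set_id="sketch")


@app.agent
def echo_agent(prompt):
    return prompt.upper()


results = app.run(["hello", "is 17 prime?"])
```

Returning the function unchanged from the decorator keeps `echo_agent` directly callable in tests, independent of the app object.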

pyproject.toml

Lines changed: 6 additions & 0 deletions
@@ -25,6 +25,7 @@ live = [
 ]
 streaming = [
     "opentelemetry-sdk>=1.20.0",
+    "websockets>=12.0",
 ]
 
 [project.scripts]
@@ -45,3 +46,8 @@ members = []
 [tool.pytest.ini_options]
 testpaths = ["tests"]
 pythonpath = ["src"]
+
+[dependency-groups]
+dev = [
+    "pytest>=9.0.2",
+]

src/agentevals/__init__.py

Lines changed: 7 additions & 0 deletions
@@ -6,3 +6,10 @@
     __version__ = version("agentevals")
 except PackageNotFoundError:
     __version__ = "0.0.0-dev"
+
+
+def __getattr__(name):
+    if name == "AgentEvals":
+        from .sdk import AgentEvals
+        return AgentEvals
+    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
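
Note on the module-level `__getattr__` above: this is the PEP 562 hook, which Python calls only when normal attribute lookup on the module fails. Here it defers importing `.sdk` until `AgentEvals` is first requested, presumably so that `import agentevals` does not need the `[streaming]` dependencies. The sketch below reproduces the mechanism without the package. `make_lazy_module` and `load_heavy` are hypothetical names, and caching the resolved value on the module is an addition of this sketch that the commit's version does not do.

```python
import types


def make_lazy_module(name, loaders):
    """Build a module whose listed attributes are resolved on first access."""
    module = types.ModuleType(name)

    def __getattr__(attr):
        # Called only when normal attribute lookup on the module fails.
        if attr in loaders:
            value = loaders[attr]()
            # Cache on the module so later lookups bypass __getattr__.
            setattr(module, attr, value)
            return value
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    # PEP 562 looks the hook up in the module's own namespace.
    module.__getattr__ = __getattr__
    return module


calls = []


def load_heavy():
    # Stands in for an expensive import such as `from .sdk import AgentEvals`.
    calls.append("loaded")
    return "AgentEvals placeholder"


demo = make_lazy_module("demo_pkg", {"AgentEvals": load_heavy})
```

Nothing is loaded when `demo` is created. The loader runs on the first `demo.AgentEvals` access, and unknown names still raise `AttributeError`, so `hasattr` and `from demo_pkg import missing`-style failures behave as they would for an ordinary module.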
