A simple ADK agent that demonstrates live streaming evaluation with agentevals.
This agent can:
- Roll dice with any number of sides
- Check if numbers are prime
- Stream traces in real time to the agentevals dev server
- Get instant evaluation feedback
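For orientation, here is a minimal sketch of what the two tools could look like. The names roll_die and check_prime come from this example, but the signatures, return shapes, and bodies below are assumptions for illustration and may differ from what agent.py actually contains.

```python
import random


def roll_die(sides: int = 6) -> dict:
    """Roll one die with the given number of sides (hypothetical signature)."""
    if sides < 1:
        raise ValueError("sides must be at least 1")
    return {"sides": sides, "result": random.randint(1, sides)}


def check_prime(number: int) -> dict:
    """Report whether a number is prime, using trial division up to sqrt(number)."""
    if number < 2:
        return {"number": number, "is_prime": False}
    divisor = 2
    while divisor * divisor <= number:
        if number % divisor == 0:
            return {"number": number, "is_prime": False}
        divisor += 1
    return {"number": number, "is_prime": True}
```

Returning a dict keeps the result self-describing for the model; that shape is a design choice of this sketch, not a requirement stated here.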
export GOOGLE_API_KEY="your-google-api-key"
cd /path/to/agentevals
agentevals serve --dev --port 8001
cd /path/to/agentevals/ui
npm run dev
Open http://localhost:5173 and click "I am developing an agent" to see the streaming view.
cd /path/to/agentevals
python examples/dice_agent/main.py
Try making changes and re-running to see how evaluations change:
Edit agent.py line 48:
dice_agent = Agent(
    name="dice_agent",
    model="gemini-2.0-flash-thinking-exp-01-21",  # Try different models!
    instruction=...,
    tools=[roll_die, check_prime],
)
Re-run:
python examples/dice_agent/main.py
Watch in the UI:
- New session appears with model name in the session ID
- Compare tool calling behavior between models
- See if evaluation scores differ
Edit agent.py:
dice_agent = Agent(
    name="dice_agent",
    model="gemini-2.5-flash",
    instruction="""You are a mathematical assistant specializing in dice and prime numbers.
Always explain your reasoning when checking prime numbers.
Use the tools provided to give accurate results.""",
    tools=[roll_die, check_prime],
)
Add a new tool in agent.py:
# Requires `import random` at the top of agent.py.
def roll_multiple(count: int, sides: int = 6) -> dict:
    """Roll multiple dice at once."""
    results = [random.randint(1, sides) for _ in range(count)]
    return {
        "count": count,
        "sides": sides,
        "results": results,
        "total": sum(results),
        # Guard against count == 0 to avoid a ZeroDivisionError.
        "average": sum(results) / count if count else 0.0,
    }
dice_agent = Agent(
    name="dice_agent",
    model="gemini-2.5-flash",
    instruction=...,
    tools=[roll_die, roll_multiple, check_prime],  # Add new tool
)
Update main.py to test the new functionality.
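Before wiring the new tool into a full agent run, it can help to check it on its own. The sketch below is one way to do that; check_roll is a hypothetical helper, and roll_multiple is copied locally so the script runs without importing agent.py.

```python
import random


def roll_multiple(count: int, sides: int = 6) -> dict:
    """Local copy of the tool so this check runs standalone."""
    results = [random.randint(1, sides) for _ in range(count)]
    return {
        "count": count,
        "sides": sides,
        "results": results,
        "total": sum(results),
        "average": sum(results) / count if count else 0.0,
    }


def check_roll(outcome: dict) -> None:
    """Assert the invariants any roll_multiple result should satisfy."""
    assert len(outcome["results"]) == outcome["count"]
    assert all(1 <= r <= outcome["sides"] for r in outcome["results"])
    assert outcome["total"] == sum(outcome["results"])


if __name__ == "__main__":
    random.seed(0)  # Fixed seed so repeated runs roll the same values.
    outcome = roll_multiple(3, sides=20)
    check_roll(outcome)
    print(outcome)
```

Checking invariants rather than exact values keeps the test valid regardless of the random seed.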
🎲 Dice Agent - Live Streaming Example
==================================================
✓ Connected to agentevals dev server
Session: dice-agent-gemini-2.5-flash
Model: gemini-2.5-flash
View live: http://localhost:5173
[1/3] User: Hi! Can you help me?
Agent: Hello! I can help you roll dice and check prime numbers...
[2/3] User: Roll a 20-sided die for me
Agent: I rolled a 20-sided die and got 13
[3/3] User: Is the number you rolled prime?
Agent: Yes, 13 is a prime number!
✓ Agent execution complete
Waiting for evaluation results...
⚡ Evaluation results:
✓ tool_trajectory_avg_score: 1.0
Before running agent:
- Click "I am developing an agent" on welcome screen
- See "No active sessions" message
While agent runs:
- Session card appears immediately with status "ACTIVE"
- Span count increments in real-time as agent executes
- See eval set: "dice_agent_eval"
After agent completes:
- Status changes to "EVALUATED"
- Evaluation results appear as colored badges
- Each metric shows: name and score (e.g., "tool_trajectory_avg_score: 1.00")
Multiple runs:
- Each run creates a new session with model name in ID
- Compare sessions side-by-side
- See how different models affect span counts and scores
- agent.py - Agent definition with tools
- main.py - Main script with streaming setup
- eval_set.json - Evaluation cases for trajectory checking
- README.md - This file
- Keep dev server running - Leave it running across multiple agent runs
- Watch the UI - See how different models/prompts affect the trace structure
- Check evaluations - Use tool_trajectory_avg_score to measure correctness
- Iterate quickly - No need to restart anything except the agent script
"Connection refused"
- Make sure the dev server is running: agentevals serve --dev --port 8001
"GOOGLE_API_KEY not set"
- Export your API key: export GOOGLE_API_KEY="..."
"Module not found: agentevals"
- Install agentevals: pip install -e /path/to/agentevals
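The three checks above can be bundled into a small preflight script to run before the agent. This is a sketch using only the Python standard library; preflight is a hypothetical helper that is not part of agentevals, and the host and port defaults assume the dev server was started with --port 8001 on the local machine.

```python
import importlib.util
import os
import socket


def preflight(host: str = "localhost", port: int = 8001, timeout: float = 1.0) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready to run."""
    problems = []
    if not os.environ.get("GOOGLE_API_KEY"):
        problems.append('GOOGLE_API_KEY not set: export GOOGLE_API_KEY="..."')
    if importlib.util.find_spec("agentevals") is None:
        problems.append("agentevals not importable: pip install -e /path/to/agentevals")
    try:
        # A successful TCP connect only shows that something is listening on the port.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        problems.append(
            f"nothing listening on {host}:{port}: agentevals serve --dev --port {port}"
        )
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("✗", problem)
```

Note that the TCP check cannot tell the agentevals dev server apart from any other process bound to the same port.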
No evaluation results
- The eval set needs to match agent behavior
- Check eval_set.json - it expects roll_die and check_prime to be called
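To build intuition for why a mismatch between the eval set and the agent's behavior affects the result, here is an illustrative sketch of a trajectory-style score. It assumes the metric compares the expected and actual tool names per turn by exact, in-order match and averages across turns; this is a simplification for illustration and not the agentevals implementation, and both function names are hypothetical.

```python
def trajectory_score(expected: list[str], actual: list[str]) -> float:
    """Return 1.0 when the tool names match exactly and in order, otherwise 0.0."""
    return 1.0 if expected == actual else 0.0


def average_trajectory_score(cases: list[tuple[list[str], list[str]]]) -> float:
    """Average the per-turn scores over (expected, actual) pairs."""
    if not cases:
        return 0.0
    return sum(trajectory_score(e, a) for e, a in cases) / len(cases)


if __name__ == "__main__":
    cases = [
        (["roll_die"], ["roll_die"]),  # Agent called the expected tool.
        (["check_prime"], []),         # Agent answered without calling a tool.
    ]
    print(average_trajectory_score(cases))  # prints 0.5
```

Under this simplified model, an agent that answers the prime question from memory instead of calling check_prime would lower the score even if its answer is correct.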