Dice Agent - Live Streaming Example

A simple ADK agent that demonstrates live streaming evaluation with agentevals.

What This Does

This agent can:

Roll dice with any number of sides
Check if numbers are prime
Stream traces in real time to the agentevals dev server
Get instant evaluation feedback
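The two tools live in agent.py, which is not reproduced in this README. As a rough illustration only, here is a minimal sketch of what tools like these could look like. The signatures and the returned dictionary keys are assumptions, not the actual contents of agent.py.

```python
import random


def roll_die(sides: int = 6) -> dict:
    """Roll one die with the given number of sides (hypothetical signature)."""
    if sides < 1:
        raise ValueError("sides must be at least 1")
    return {"sides": sides, "result": random.randint(1, sides)}


def check_prime(number: int) -> dict:
    """Report whether a number is prime, using simple trial division."""
    if number < 2:
        return {"number": number, "is_prime": False}
    divisor = 2
    # Only divisors up to sqrt(number) need to be tested.
    while divisor * divisor <= number:
        if number % divisor == 0:
            return {"number": number, "is_prime": False}
        divisor += 1
    return {"number": number, "is_prime": True}


if __name__ == "__main__":
    roll = roll_die(20)
    print(roll, check_prime(roll["result"]))
```

Plain functions with type hints and docstrings like these can be passed in the agent's tools list, as the Agent(...) snippets in this README do.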

Quick Start

1. Set up your API key

export GOOGLE_API_KEY="your-google-api-key"

2. Start agentevals dev server (Terminal 1)

cd /path/to/agentevals
agentevals serve --dev --port 8001

3. Start the UI (Terminal 2, optional)

cd /path/to/agentevals/ui
npm run dev

Open http://localhost:5173 and click "I am developing an agent" to see the streaming view.

4. Run the agent (Terminal 3)

cd /path/to/agentevals
python examples/dice_agent/main.py

Iterate and Experiment

Try making changes and re-running to see how evaluations change:

Change 1: Switch Models

Edit agent.py line 48:

dice_agent = Agent(
    name="dice_agent",
    model="gemini-2.0-flash-thinking-exp-01-21",  # Try different models!
    instruction=...,
    tools=[roll_die, check_prime],
)

Re-run:

python examples/dice_agent/main.py

Watch in the UI:

New session appears with model name in the session ID
Compare tool calling behavior between models
See if evaluation scores differ

Change 2: Modify Instructions

Edit agent.py:

dice_agent = Agent(
    name="dice_agent",
    model="gemini-2.5-flash",
    instruction="""You are a mathematical assistant specializing in dice and prime numbers.

Always explain your reasoning when checking prime numbers.
Use the tools provided to give accurate results.""",
    tools=[roll_die, check_prime],
)

Change 3: Add More Tools

Add a new tool in agent.py:

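import random  # needed by random.randint below

def roll_multiple(count: int, sides: int = 6) -> dict:
    """Roll multiple dice at once and summarize the results."""
    # Reject inputs that would break randint or divide by zero below.
    if count < 1 or sides < 1:
        raise ValueError("count and sides must both be at least 1")
    # Roll each die independently.
    results = [random.randint(1, sides) for _ in range(count)]
    return {
        "count": count,
        "sides": sides,
        "results": results,
        "total": sum(results),
        "average": sum(results) / count,
    }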

dice_agent = Agent(
    name="dice_agent",
    model="gemini-2.5-flash",
    instruction=...,
    tools=[roll_die, roll_multiple, check_prime],  # Add new tool
)

Update main.py to test the new functionality.
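Before wiring the new tool into main.py, it can help to sanity-check it on its own. The sketch below is one way to do that. It carries a standalone copy of roll_multiple with an input guard so it runs without the rest of the project, and check_roll_multiple is a hypothetical helper name, not part of this example's code.

```python
import random


def roll_multiple(count: int, sides: int = 6) -> dict:
    """Standalone copy of the roll_multiple tool, with an input guard."""
    if count < 1 or sides < 1:
        raise ValueError("count and sides must both be at least 1")
    results = [random.randint(1, sides) for _ in range(count)]
    return {
        "count": count,
        "sides": sides,
        "results": results,
        "total": sum(results),
        "average": sum(results) / count,
    }


def check_roll_multiple(count: int = 3, sides: int = 20) -> dict:
    """Hypothetical helper: verify the invariants of one roll_multiple call."""
    outcome = roll_multiple(count, sides)
    assert len(outcome["results"]) == count
    assert all(1 <= value <= sides for value in outcome["results"])
    assert outcome["total"] == sum(outcome["results"])
    assert outcome["average"] == outcome["total"] / count
    return outcome


if __name__ == "__main__":
    print(check_roll_multiple())
```

Note that eval_set.json expects roll_die and check_prime, so prompts that trigger roll_multiple will likely need matching eval cases.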

What You'll See

In Terminal

🎲 Dice Agent - Live Streaming Example
==================================================

✓ Connected to agentevals dev server
  Session: dice-agent-gemini-2.5-flash
  Model: gemini-2.5-flash
  View live: http://localhost:5173

[1/3] User: Hi! Can you help me?
     Agent: Hello! I can help you roll dice and check prime numbers...

[2/3] User: Roll a 20-sided die for me
     Agent: I rolled a 20-sided die and got 13

[3/3] User: Is the number you rolled prime?
     Agent: Yes, 13 is a prime number!

✓ Agent execution complete
  Waiting for evaluation results...

⚡ Evaluation results:
  ✓ tool_trajectory_avg_score: 1.0

In Browser (Live Streaming View)

Before running agent:

Click "I am developing an agent" on welcome screen
See "No active sessions" message

While agent runs:

Session card appears immediately with status "ACTIVE"
Span count increments in real-time as agent executes
See eval set: "dice_agent_eval"

After agent completes:

Status changes to "EVALUATED"
Evaluation results appear as colored badges
Each metric shows: name and score (e.g., "tool_trajectory_avg_score: 1.00")

Multiple runs:

Each run creates a new session with model name in ID
Compare sessions side-by-side
See how different models affect span counts and scores

Files

agent.py - Agent definition with tools
main.py - Main script with streaming setup
eval_set.json - Evaluation cases for trajectory checking
README.md - This file

Tips

Keep dev server running - Leave it running across multiple agent runs
Watch the UI - See how different models/prompts affect the trace structure
Check evaluations - Use tool_trajectory_avg_score to check whether the agent called the expected tools
Iterate quickly - No need to restart anything except the agent script

Troubleshooting

"Connection refused"

Make sure the dev server is running: agentevals serve --dev --port 8001
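To tell a stopped server apart from a wrong port, a quick TCP probe can help. This is a minimal sketch using only the standard library. dev_server_reachable is a hypothetical helper, and the default host and port assume the serve command shown above.

```python
import socket


def dev_server_reachable(host: str = "localhost", port: int = 8001,
                         timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    if dev_server_reachable():
        print("Dev server port is open")
    else:
        print("Nothing is listening on port 8001; start the dev server first")
```

The probe only confirms that the port is open; it does not verify that the process behind it is agentevals.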

"GOOGLE_API_KEY not set"

Export your API key: export GOOGLE_API_KEY="..."
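If you prefer to fail fast with a clear message, a small check at the top of your own script is one option. This is a sketch, not code from main.py, and require_api_key is a hypothetical helper name.

```python
import os


def require_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Return the key from the environment, or raise with a fix-it hint."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f'{name} is not set. Run: export {name}="..."')
    return value


if __name__ == "__main__":
    try:
        require_api_key()
        print("API key found")
    except RuntimeError as error:
        print(error)
```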

"Module not found: agentevals"

Install agentevals: pip install -e /path/to/agentevals

No evaluation results

The eval set must match the agent's actual behavior
Check eval_set.json - it expects roll_die and check_prime to be called, so a run that skips either tool will not match
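To build intuition for why a mismatched eval set produces low or missing scores, here is a simplified illustration of exact-match trajectory scoring. It is not the agentevals implementation: both function names are hypothetical, and a real evaluator may also compare tool arguments rather than just tool names.

```python
def trajectory_match(expected: list[str], actual: list[str]) -> float:
    """Score one turn: 1.0 if the same tools were called in the same order."""
    return 1.0 if expected == actual else 0.0


def average_trajectory_score(turns: list[tuple[list[str], list[str]]]) -> float:
    """Average the per-turn scores over (expected, actual) pairs."""
    if not turns:
        raise ValueError("at least one turn is required")
    return sum(trajectory_match(exp, act) for exp, act in turns) / len(turns)


if __name__ == "__main__":
    # Three turns modeled on the sample conversation: greeting, roll, prime check.
    matching_run = [
        ([], []),
        (["roll_die"], ["roll_die"]),
        (["check_prime"], ["check_prime"]),
    ]
    print(average_trajectory_score(matching_run))
```

Under this simplified model, a run where the agent answers the prime question without calling check_prime would score below 1.0 for that turn.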
