Commit c04abda

xke and jonahe authored
multi-turn experiment example (#302)
Co-authored-by: Jonah Eisen <jonaheisen1@gmail.com>
1 parent e8d40d6 commit c04abda

7 files changed

Lines changed: 191 additions & 0 deletions
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
GALILEO_API_KEY="your-galileo-api-key"
GALILEO_PROJECT="your-galileo-project"

# Provide the console URL below if you are not using app.galileo.ai
# GALILEO_CONSOLE_URL="your-galileo-console-url"
Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
# Multi-Turn Experiment Example

The example in this folder demonstrates how to use [create_experiment](https://docs.galileo.ai/sdk-api/python/reference/experiments#create_experiment) to compute a session-level metric for a multi-turn conversation.

## Setup Instructions

### 1. Create and Activate Virtual Environment

```bash
# Navigate to the example folder
cd python/experiments/multi-turn

# Create virtual environment
python -m venv venv

# Activate virtual environment
source venv/bin/activate
```

### 2. Install Dependencies

Install the required packages:

```bash
pip install -r requirements.txt
```

### 3. Configure Environment Variables

Create a `.env` file in this folder that looks like the following. You can copy `.env.example` and fill in your credentials.

```bash

# Required: Your Galileo API key
GALILEO_API_KEY="your-galileo-api-key"

# Required: Galileo project name
GALILEO_PROJECT="your-galileo-project"

# Provide the console URL below if you are not using app.galileo.ai
# GALILEO_CONSOLE_URL="your-galileo-console-url"
```
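Before running the example, it can help to confirm that the required variables are actually set. The following is a minimal sketch, not part of this example's code: the helper name `missing_env_vars` and the `REQUIRED_VARS` tuple are hypothetical, and only the two variable names come from the `.env` shown above.

```python
import os
from typing import Mapping, Sequence

# Variable names taken from the .env example above
REQUIRED_VARS = ("GALILEO_API_KEY", "GALILEO_PROJECT")


def missing_env_vars(required: Sequence[str], env: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or blank in env.

    Hypothetical helper for illustration; it is not part of the Galileo SDK.
    """
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    # If the values live in a .env file, load it first (for example with
    # python-dotenv's load_dotenv()) so that they appear in os.environ.
    missing = missing_env_vars(REQUIRED_VARS, os.environ)
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
    print("Galileo environment variables are set.")
```

Passing the environment in as a mapping keeps the check easy to test without touching the real process environment.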

### 4. Add Integration in Galileo Console

The session-level metric in this example uses an LLM.

Make sure that you've configured a valid LLM integration in the Galileo console.

Related documentation: [Configure an LLM integration](https://docs.galileo.ai/getting-started/evaluate-and-improve/evaluate-and-improve#configure-an-llm-integration)

## Basic Example

Run the basic example:

```bash
python basic-example.py
```

The `METRIC_NAME` variable in this script specifies a session-level metric.

Pre-defined session-level metrics include:

- `GalileoMetrics.conversation_quality`
- `GalileoMetrics.action_completion`
- `GalileoMetrics.action_advancement`
- `GalileoMetrics.agent_efficiency`
- `GalileoMetrics.context_adherence`
- `GalileoMetrics.context_relevance`
- `GalileoMetrics.tool_error_rate`

Related documentation: [Metrics Comparison](https://docs.galileo.ai/concepts/metrics/metric-comparison)

Optionally, you can define your own custom session-level metric in the Galileo Console UI, and then set `METRIC_NAME` to the custom metric's name.

![Example custom session-level boolean metric](screenshot-custom-session-level-boolean-metric.png)

## Troubleshooting

Visit the "Sessions" tab of the Experiment in the Galileo Console to confirm the status of the metric computation.

![Troubleshooting auth error](screenshot-session-level-metric-auth-error.png)

If you see an auth error, go to the metric details and make sure that a [valid integration](https://docs.galileo.ai/getting-started/evaluate-and-improve/evaluate-and-improve#configure-an-llm-integration) has been configured.

![Metric details](screenshot-session-level-metric-details.png)
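If the metric never finishes computing, for example because of the auth error described above, the `while True` polling loop in `basic-example.py` never exits. One way to guard against that is a bounded poll. The sketch below is an illustration only: `poll_until` and its parameters are hypothetical names, and it makes no Galileo calls itself.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], Optional[T]],
    interval_s: float = 10.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it returns a non-None value or attempts run out.

    Hypothetical helper for illustration; it is not part of the Galileo SDK.
    """
    for attempt in range(1, max_attempts + 1):
        result = fetch()
        if result is not None:
            return result
        # Wait between attempts, but not after the final one
        if attempt < max_attempts:
            sleep(interval_s)
    raise TimeoutError(f"No result after {max_attempts} attempts")
```

In the basic example, `fetch` could wrap the existing `get_sessions` lookup and return `metric.value` only when the metric is a `MetricSuccess`, returning `None` otherwise. A `TimeoutError` is then a cue to check the "Sessions" tab for the metric status.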
Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
import os
import time

from galileo import galileo_context, GalileoMetrics
from galileo.experiments import create_experiment, get_experiment
from galileo.projects import get_project, create_project
from galileo.search import get_sessions
from galileo.utils.metrics import create_metric_configs
from galileo.resources.models import MetricSuccess

# Provide the name of a session-level metric
METRIC_NAME = GalileoMetrics.conversation_quality

# Example custom metric name (must be set up in advance)
# METRIC_NAME = "multi-turn-session-test-metric-apples"

# Load environment variables from the .env file
from dotenv import load_dotenv

load_dotenv()

# Get the Galileo project

project_name = os.getenv("GALILEO_PROJECT")
project_obj = get_project(name=project_name)
if not project_obj:
    project_obj = create_project(project_name)

print(f"Project name: {project_obj.name}, Project ID: {project_obj.id}")

# Create a unique experiment

time_suffix = time.strftime("%m%d-%H%M")

experiment = create_experiment(experiment_name=f"multi-turn-experiment-{time_suffix}", experiment_group="multi-turn examples")
print(f"Experiment name: {experiment.name}")

galileo_context.init(project=project_obj.name, experiment_id=experiment.id)

# Enable a session-level metric in the created experiment, and get the metric ID

metric_configs, _ = create_metric_configs(
    project_id=project_obj.id,
    run_id=experiment.id,
    metrics=[METRIC_NAME],
)
assert len(metric_configs) == 1
metric_name = metric_configs[0].name
metric_id = metric_configs[0].id
print(f"Metric Name: {metric_name}")
print(f"Metric ID: {metric_id}")

# Log a multi-turn convo using Galileo context and logger

multi_turn_convo = [
    {"user": "What is your favorite fruit?", "assistant": "I like blueberries. What about you?"},
    {"user": "I like strawberries.", "assistant": "Strawberries are great! Do you like blueberries too?"},
    {"user": "Yes, I do!", "assistant": "Awesome! Blueberries are delicious and packed with nutrients."},
]


logger = galileo_context.get_logger_instance(project=project_obj.name, experiment_id=experiment.id)

# Create a session and log traces for each turn in the conversation

logger.start_session()

for turn in multi_turn_convo:

    logger.start_trace(input=turn["user"], name="User turn")
    logger.add_llm_span(
        input=turn["user"],
        output=turn["assistant"],
        model="gpt-5.4-mini",
    )
    logger.conclude(output=turn["assistant"])


galileo_context.flush()

# Poll the session-level metric until it's computed

status = "unknown"
while True:

    sessions = get_sessions(project_id=project_obj.id, experiment_id=experiment.id)
    assert len(sessions.records) > 0, "No sessions found for the experiment"

    session = sessions.records[0]
    metric = session.metric_info[metric_id]

    if isinstance(metric, MetricSuccess):
        print(f"Metric {METRIC_NAME} computed successfully with value: {metric.value}")
        break
    print("Metric is not computed yet, retrying in 10 seconds...")

    time.sleep(10)
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
galileo
python-dotenv