Commit 72d6aa8

committed
update readme
1 parent 3939533 commit 72d6aa8

5 files changed

Lines changed: 152 additions & 80 deletions

.github/workflows/cloud.yml

Lines changed: 2 additions & 2 deletions
@@ -1,9 +1,9 @@
 name: Cloud Evidently Check
 on:
   push:
-    branches: [main]
+    branches: [ main ]
   pull_request:
-    branches: [main]
+    branches: [ main ]
     types:
       - opened
       - reopened

.github/workflows/local.yml

Lines changed: 2 additions & 2 deletions
@@ -1,9 +1,9 @@
 name: Local Evidently Check
 on:
   push:
-    branches: [main]
+    branches: [ main ]
   pull_request:
-    branches: [main]
+    branches: [ main ]
     types:
       - opened
       - reopened

README.md

Lines changed: 136 additions & 67 deletions
@@ -64,7 +64,6 @@ my-llm-agent/
 │ └── evidently.yml
 ├── src/
 │ ├── my_agent.py
-│ ├── run_agent.py
 │ └── evidently_config.py
 ├── requirements.txt
 └── README.md
@@ -114,45 +113,72 @@ Create `src/my_agent.py`:
 ```python
 # src/my_agent.py
 
-```
-
-(*👉 Replace the function logic with your actual agent code later.*)
-
----
+from agents import Agent, WebSearchTool, Runner
 
-## 📑 Step 4: Write the Agent Runner
+my_agent = Agent(
+    name="Assistant",
+    instructions="You are a helpful assistant",
+    model="gpt-4.1",
+    tools=[
+        WebSearchTool(),
+    ],
+)
 
-We also need a way to run your custom agent code on test dataset. See *this section* if you want to do it via custom descriptor.
 
-Create `src/run_agent.py`.
-This will load the dataset from cloud, run your agent on each row, and save the results locally for evidently action to use.
+def answer(question: str) -> str:
+    response = Runner.run_sync(my_agent, question)
+    return response.final_output
 
-```python
-# src/run_agent.py
 
 
 ```
 
+(*👉 Replace the function logic with your actual agent code later.*)
+
 ---
 
-## 📑 Step 5: Define Evidently Config
+## 📑 Step 4: Define Evidently Config
 
 Create `src/evidently_config.py`.
 This config defines:
 
+* A descriptor to run your agent
 * A descriptor to check your agent’s answers
 * Optionally a test summary metric
 
 ```python
 # src/evidently_config.py
 
+from evidently import ColumnType
+from evidently.cli.report import ReportConfig
+from evidently.core.datasets import DatasetColumn
+from evidently.metrics import MinValue
+from evidently.tests import gte, eq
+from evidently.descriptors import NegativityLLMEval, WordCount, CustomColumnDescriptor
+from src.my_agent import answer
+
+
+def answer_descriptor(col: DatasetColumn) -> DatasetColumn:
+    return DatasetColumn(ColumnType.Text, col.data.apply(answer))
+
+
+descr_conf = ReportConfig(descriptors=[
+    CustomColumnDescriptor("question", answer_descriptor, alias="answer"),
+    NegativityLLMEval("answer", provider="openai", model="gpt-4o-mini", alias="answer_negativity",
+                      tests=[eq("POSITIVE", column="answer_negativity")]),
+    WordCount("answer", alias="answer_word_count"),
+],
+    metrics=[MinValue(column="answer_word_count", tests=[gte(2)])]
+)
+
+
 ```
 
 (*👉 Adjust descriptor params to your needs later.*)
 
 ---
 
-## 📦 Step 6: Define Dependencies
+## 📦 Step 5: Define Dependencies
 
 Create `requirements.txt`:
 
@@ -163,25 +189,60 @@ openai-agents
 
 ---
 
-## ⚙️ Step 7: Create GitHub Actions Workflow
+## ⚙️ Step 6: Create GitHub Actions Workflow
 
 Create `.github/workflows/evidently.yml`:
 
 ```yaml
+name: Cloud Evidently Check
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+    branches: [ main ]
+    types:
+      - opened
+      - reopened
+      - synchronize
+      - ready_for_review
+
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  statuses: write
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  evidently-check:
+    name: Evidently Report
+    runs-on: ubuntu-22.04
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+          cache: "pip"
+          cache-dependency-path: requirements.txt
+      - name: "Install requirements"
+        run: pip install -r requirements.txt
+      - uses: evidentlyai/evidently-report-action@main
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+        with:
+          config_path: src/evidently_config.py
+          api_key: ${{ secrets.EVIDENTLY_API_KEY }}
+          input_path: cloud://<dataset_id>
+          output: cloud://<project_id>
 
 ```
 
 **Notes:**
-
-* Replace `YOUR_DATASET_ID` with the dataset ID from Evidently Cloud
-* This workflow:
-
-  * Downloads the base test dataset
-  * Runs your agent to produce answers
-  * Runs Evidently descriptors and metrics
-  * Uploads results to Evidently Cloud
-  * Uploads artifacts with report files
-  * Fails CI if any test fails
+Replace `YOUR_DATASET_ID` with the dataset ID from Evidently Cloud
 
 ---
 
@@ -196,13 +257,10 @@ Every time you push a commit or open a PR:
 * Fail workflow if test fails
 * Attach a link to the report in GitHub Check Annotations
 
----
-
-Excellent — here’s the **Local run** tutorial section in the same clean format. I’ll start with a short intro describing the difference from the Cloud run.
 
 ---
 
-## 📦 Tutorial: Continuous LLM Agent Evaluation with Evidently Local Run
+## 📦 Continuous LLM Agent Evaluation with Evidently Local Run
 
 In this tutorial, you’ll set up a GitHub Actions workflow to evaluate your LLM agent locally using Evidently’s CLI inside CI, without using Evidently Cloud.
 
@@ -251,39 +309,70 @@ question
 
 ---
 
-## 📑 Step 2: Use the Same Agent Code and Runner
+## 📑 Step 2: Use the Same Agent Code and Config
 
 Re-use the same:
 
 * `src/my_agent.py`
 * `src/run_agent.py`
+* `src/evidently_config.py`
+* `requirements.txt`
 
 No changes needed.
 
 ---
 
-## 📑 Step 3: Use the Same Evidently Config
-
-Re-use the same `src/evidently_config.py` from the cloud tutorial.
-
-No changes needed.
-
----
-
-## 📦 Step 4: Use the Same Dependencies
-
-Re-use the same `requirements.txt`.
-
-No changes needed.
-
----
-
-## ⚙️ Step 5: Create Local-Mode GitHub Actions Workflow
+## ⚙️ Step 3: Create Local-Mode GitHub Actions Workflow
 
 Create `.github/workflows/evidently.yml`:
 
 ```yaml
 
+name: Local Evidently Check
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+    branches: [ main ]
+    types:
+      - opened
+      - reopened
+      - synchronize
+      - ready_for_review
+
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  statuses: write
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  evidently-check:
+    name: Evidently Report
+    runs-on: ubuntu-22.04
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+          cache: "pip"
+          cache-dependency-path: requirements.txt
+      - name: "Install requirements"
+        run: pip install -r requirements.txt
+      - uses: evidentlyai/evidently-report-action@main
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+        with:
+          config_path: src/evidently_config.py
+          input_path: data/test_questions.csv
+          output: report.json
+          upload_artifacts: 'true'
+
 ```
 
 **Notes:**
@@ -308,24 +397,6 @@ On every commit:
 * Fails workflow if test fails
 
 
----
-
-## 🔥 Advanced: Custom Descriptor for Generating Responses
-
-If you'd like to integrate inference directly as an Evidently Descriptor:
-
-```python
-
-```
-
-Add it to config:
-
-```python
-
-```
-
-This runs your agent inference as part of the Evidently workflow inside CI.
-
 ---
 
 
@@ -340,5 +411,3 @@ This runs your agent inference as part of the Evidently workflow inside CI.
 ## ✅ Summary
 
 This setup makes it possible to automatically check that changes to your LLM agent’s prompt or logic still produce acceptable, positive results on a test set — in CI, using either local files or integrated Evidently Cloud datasets and storage.
-
-Would you like me to wrap this into a clean `.md` file now? I can draft it immediately.
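
Reviewer note: the config added to the README in this commit attaches two checks, `eq("POSITIVE")` on the negativity label and `gte(2)` on the minimum answer word count. A rough, dependency-free illustration of what those two conditions amount to is sketched below. This is not Evidently's implementation: `word_count`, `check_answers`, and the sample labels are hypothetical, and splitting on whitespace is an assumption about how words are counted.

```python
from typing import Dict, List


def word_count(text: str) -> int:
    # Assumption: words are whitespace-separated tokens.
    return len(text.split())


def check_answers(answers: List[str], labels: List[str]) -> Dict[str, bool]:
    """Evaluate plain-Python analogues of the two conditions in the config."""
    counts = [word_count(a) for a in answers]
    return {
        # Analogue of eq("POSITIVE") applied to every row's label.
        "all_positive": all(label == "POSITIVE" for label in labels),
        # Analogue of MinValue(column="answer_word_count", tests=[gte(2)]).
        # An empty answer list is treated as a failure rather than an error.
        "min_word_count_gte_2": bool(counts) and min(counts) >= 2,
    }


result = check_answers(
    answers=["Paris is the capital of France.", "Yes"],
    labels=["POSITIVE", "POSITIVE"],
)
print(result)
```

With these sample rows the one-word answer "Yes" pulls the minimum word count below 2, so the second condition fails even though every label is positive. That is the kind of regression the `MinValue` test in the config appears intended to catch.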

src/evidently_config.py

Lines changed: 12 additions & 4 deletions
@@ -1,13 +1,21 @@
+from evidently import ColumnType
 from evidently.cli.report import ReportConfig
+from evidently.core.datasets import DatasetColumn
 from evidently.metrics import MinValue
 from evidently.tests import gte, eq
 from evidently.descriptors import NegativityLLMEval, WordCount, CustomColumnDescriptor
-from src.my_agent import answer_descriptor
+from src.my_agent import answer
+
+
+def answer_descriptor(col: DatasetColumn) -> DatasetColumn:
+    return DatasetColumn(ColumnType.Text, col.data.apply(answer))
+
 
 descr_conf = ReportConfig(descriptors=[
-  CustomColumnDescriptor("question", answer_descriptor, alias="answer"),
-  NegativityLLMEval("answer", provider="openai", model="gpt-4o-mini", alias="answer_negativity", tests=[eq("POSITIVE", column="answer_negativity")]),
-  WordCount("answer", alias="answer_word_count"),
+    CustomColumnDescriptor("question", answer_descriptor, alias="answer"),
+    NegativityLLMEval("answer", provider="openai", model="gpt-4o-mini", alias="answer_negativity",
+                      tests=[eq("POSITIVE", column="answer_negativity")]),
+    WordCount("answer", alias="answer_word_count"),
 ],
     metrics=[MinValue(column="answer_word_count", tests=[gte(2)])]
 )
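
Reviewer note: this hunk moves `answer_descriptor` out of `src/my_agent.py`, so the agent module no longer needs to import Evidently. The shape of that split can be sketched without either library. In the sketch below, `FakeColumn` is a hypothetical stand-in for `DatasetColumn`, a plain list stands in for the pandas-style `col.data`, `make_descriptor` is an illustrative helper that does not exist in the commit, and the stubbed `answer` replaces the real agent call.

```python
from dataclasses import dataclass
from typing import Callable, List


# --- agent side: a plain function with no evaluation-library imports ---
def answer(question: str) -> str:
    # Stub standing in for Runner.run_sync(my_agent, question).final_output.
    return f"Echo: {question}"


# --- config side: the column wrapper lives next to the report config ---
@dataclass
class FakeColumn:
    """Hypothetical stand-in for DatasetColumn: a kind tag plus the values."""
    kind: str
    data: List[str]


def make_descriptor(fn: Callable[[str], str]) -> Callable[[FakeColumn], FakeColumn]:
    # Lift a str -> str function into a column -> column function.
    def descriptor(col: FakeColumn) -> FakeColumn:
        return FakeColumn("text", [fn(value) for value in col.data])
    return descriptor


answer_descriptor = make_descriptor(answer)
questions = FakeColumn("text", ["What is 2 + 2?", "Who wrote Hamlet?"])
answers = answer_descriptor(questions)
print(answers.data)
```

Keeping `answer` as a plain `str -> str` function means it can be exercised or stubbed on its own, while anything that knows about columns stays in the config module.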

src/my_agent.py

Lines changed: 0 additions & 5 deletions
@@ -1,8 +1,5 @@
 from agents import Agent, WebSearchTool, Runner
 
-from evidently import ColumnType
-from evidently.core.datasets import DatasetColumn
-
 my_agent = Agent(
     name="Assistant",
     instructions="You are a helpful assistant",
@@ -17,5 +14,3 @@ def answer(question: str) -> str:
     response = Runner.run_sync(my_agent, question)
     return response.final_output
 
-def answer_descriptor(col: DatasetColumn) -> DatasetColumn:
-    return DatasetColumn(ColumnType.Text, col.data.apply(answer))
