(*👉 Adjust descriptor params to your needs later.*)

---

-## 📦 Step 6: Define Dependencies
+## 📦 Step 5: Define Dependencies

Create `requirements.txt`:

@@ -163,25 +189,60 @@ openai-agents

---

-## ⚙️ Step 7: Create GitHub Actions Workflow
+## ⚙️ Step 6: Create GitHub Actions Workflow

Create `.github/workflows/evidently.yml`:

```yaml
+name: Cloud Evidently Check
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+    branches: [ main ]
+    types:
+      - opened
+      - reopened
+      - synchronize
+      - ready_for_review
+
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  statuses: write
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  evidently-check:
+    name: Evidently Report
+    runs-on: ubuntu-22.04
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+          cache: "pip"
+          cache-dependency-path: requirements.txt
+      - name: "Install requirements"
+        run: pip install -r requirements.txt
+      - uses: evidentlyai/evidently-report-action@main
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+        with:
+          config_path: src/evidently_config.py
+          api_key: ${{ secrets.EVIDENTLY_API_KEY }}
+          input_path: cloud://<dataset_id>
+          output: cloud://<project_id>

```

**Notes:**
-
-* Replace `YOUR_DATASET_ID` with the dataset ID from Evidently Cloud
-* This workflow:
-
-  * Downloads the base test dataset
-  * Runs your agent to produce answers
-  * Runs Evidently descriptors and metrics
-  * Uploads results to Evidently Cloud
-  * Uploads artifacts with report files
-  * Fails CI if any test fails
+Replace `YOUR_DATASET_ID` with the dataset ID from Evidently Cloud

---

@@ -196,13 +257,10 @@ Every time you push a commit or open a PR:
* Fail workflow if test fails
* Attach a link to the report in GitHub Check Annotations
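
The "fail workflow if test fails" behavior above comes down to turning test results into a process exit code. The sketch below is only an illustration of that idea: the record fields `name` and `status`, the `SAMPLE_RESULTS` data, and both function names are hypothetical and are not taken from Evidently or from the report action.

```python
from typing import Iterable, Mapping

# Hypothetical result records. The field names "name" and "status" are
# assumptions for illustration, not Evidently's report schema.
SAMPLE_RESULTS = [
    {"name": "answer_is_not_empty", "status": "SUCCESS"},
    {"name": "answer_is_polite", "status": "FAIL"},
]


def failed_tests(results: Iterable[Mapping[str, str]]) -> list[str]:
    """Return the names of all tests whose status is not "SUCCESS"."""
    return [r["name"] for r in results if r.get("status") != "SUCCESS"]


def ci_exit_code(results: Iterable[Mapping[str, str]]) -> int:
    """Return 1 if any test failed, otherwise 0."""
    return 1 if failed_tests(results) else 0
```

A CI step could finish with `sys.exit(ci_exit_code(results))`: GitHub Actions marks a step as failed when its process exits with a non-zero code, which in turn fails the job.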

----
-
-Excellent — here’s the **Local run** tutorial section in the same clean format. I’ll start with a short intro describing the difference from the Cloud run.

---

-## 📦 Tutorial: Continuous LLM Agent Evaluation with Evidently Local Run
+## 📦 Continuous LLM Agent Evaluation with Evidently Local Run

In this tutorial, you’ll set up a GitHub Actions workflow to evaluate your LLM agent locally using Evidently’s CLI inside CI, without using Evidently Cloud.

@@ -251,39 +309,70 @@ question

---

-## 📑 Step 2: Use the Same Agent Code and Runner
+## 📑 Step 2: Use the Same Agent Code and Config

Re-use the same:

* `src/my_agent.py`
* `src/run_agent.py`
+* `src/evidently_config.py`
+* `requirements.txt`

No changes needed.

---

-## 📑 Step 3: Use the Same Evidently Config
-
-Re-use the same `src/evidently_config.py` from the cloud tutorial.
## 🔥 Advanced: Custom Descriptor for Generating Responses
-
-If you'd like to integrate inference directly as an Evidently Descriptor:
-
-```python
-
-```
-
-Add it to config:
-
-```python
-
-```
-
-This runs your agent inference as part of the Evidently workflow inside CI.
-
---


@@ -340,5 +411,3 @@ This runs your agent inference as part of the Evidently workflow inside CI.
## ✅ Summary

This setup makes it possible to automatically check that changes to your LLM agent’s prompt or logic still produce acceptable, positive results on a test set — in CI, using either local files or integrated Evidently Cloud datasets and storage.
-
-Would you like me to wrap this into a clean `.md` file now? I can draft it immediately.