Commit 704e8ee

fix(ci): run agent eval workflows only on PRs to next
Remove nightly and manual dispatch triggers, scope both workflows to path-filtered PRs targeting next, and keep judge graders disabled in CI.

Co-authored-by: Cursor <cursoragent@cursor.com>

Parent: 06e09c5

3 files changed: 9 additions, 11 deletions

.github/workflows/agent-evals.yml

Lines changed: 3 additions & 6 deletions
```diff
@@ -2,14 +2,12 @@ name: Agent evals
 
 on:
   pull_request:
+    branches:
+      - next
     paths:
       - packages/shared/docs/agent-onboarding.md
       - libs/agent-evals/**
       - .github/workflows/agent-evals.yml
-  schedule:
-    # Nightly regression run (06:00 UTC) catches model/playbook drift outside PRs.
-    - cron: '0 6 * * *'
-  workflow_dispatch:
 
 jobs:
   evals:
@@ -36,6 +34,5 @@ jobs:
       - name: Run agent evals
         env:
           ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
-          # Judge graders only on scheduled/manual runs; PRs run deterministic graders.
-          NOVU_EVAL_JUDGE: ${{ (github.event_name == 'schedule' || github.event_name == 'workflow_dispatch') && 'true' || 'false' }}
+          NOVU_EVAL_JUDGE: 'false'
         run: pnpm --filter @novu/agent-evals eval src/suites/agent-onboarding
```
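The removed `NOVU_EVAL_JUDGE` expression uses GitHub Actions' `cond && 'a' || 'b'` idiom: `&&` and `||` return one of their operands, so the pair acts like a ternary as long as the middle value is truthy. The TypeScript sketch below mirrors the old and new values for a given `github.event_name`. The function names are hypothetical and are not part of the Novu codebase.

```typescript
// The two string values the workflow can pass to NOVU_EVAL_JUDGE.
type JudgeFlag = "true" | "false";

// Hypothetical mirror of the removed expression:
//   (event == 'schedule' || event == 'workflow_dispatch') && 'true' || 'false'
// JavaScript's && and || also return an operand, so the same idiom applies.
// It only works because "true" is truthy; a falsy middle value (e.g. "")
// would always fall through to the || branch.
function judgeFlagBefore(eventName: string): JudgeFlag {
  return ((eventName === "schedule" || eventName === "workflow_dispatch") && "true") || "false";
}

// After this commit the workflow hard-codes the value regardless of the event.
function judgeFlagAfter(_eventName: string): JudgeFlag {
  return "false";
}

console.log(judgeFlagBefore("pull_request"), judgeFlagAfter("pull_request"));
```

With `schedule` and `workflow_dispatch` removed, `pull_request` is the only remaining trigger, and for that event the old expression already evaluated to `'false'`. Hard-coding the value therefore leaves PR behavior unchanged.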

.github/workflows/agent-onboarding-webhook.yml

Lines changed: 3 additions & 2 deletions
```diff
@@ -1,11 +1,12 @@
 name: Notify Cursor Automation on Agent Onboarding Change
 
 on:
-  push:
+  pull_request:
     branches:
       - next
     paths:
-      - "packages/shared/docs/agent-onboarding.md"
+      - packages/shared/docs/agent-onboarding.md
+      - .github/workflows/agent-onboarding-webhook.yml
 
 permissions:
   contents: read
```

libs/agent-evals/README.md

Lines changed: 3 additions & 3 deletions
````diff
@@ -154,7 +154,7 @@ pnpm --filter @novu/agent-evals eval:watch
 # Single scenario
 pnpm --filter @novu/agent-evals exec vitest run --config vitest.evals.config.ts -t keyless-slack-secure
 
-# Enable LLM judge graders (also enabled on scheduled CI runs)
+# Enable LLM judge graders locally
 NOVU_EVAL_JUDGE=true pnpm --filter @novu/agent-evals eval
 ```
 
@@ -175,7 +175,7 @@ Scenarios are independent and dominated by live-model latency, so they run concu
 
 Each scenario uses `judgeThreshold: 0.8` — the average judge score for that scenario must be ≥ 80%. This is stricter than the old global `--fail-under 80` (which gated on the average across all scenarios): every scenario must pass individually.
 
-Judge graders run only when `NOVU_EVAL_JUDGE=true` (PR/push CI runs deterministic graders only; scheduled and workflow-dispatch CI enable judges by default).
+Judge graders run only when `NOVU_EVAL_JUDGE=true` (CI runs deterministic graders only).
 
 ## Triage failing scenarios
 
@@ -189,4 +189,4 @@ When a scenario fails, use the Cursor skill `triage-agent-eval-failures` (`.curs
 
 ## CI
 
-GitHub Actions workflow `.github/workflows/agent-evals.yml` runs `pnpm --filter @novu/agent-evals eval` on playbook or harness changes, with `NOVU_EVAL_JUDGE` enabled on schedule and workflow-dispatch.
+GitHub Actions workflow `.github/workflows/agent-evals.yml` runs `pnpm --filter @novu/agent-evals eval` on PRs to `next` that touch the playbook or harness.
````
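The README's point about `judgeThreshold: 0.8` being stricter than a global `--fail-under 80` is easiest to see with numbers. The sketch below assumes each scenario reports a list of judge scores in [0, 1], and that the old global gate compared the mean of per-scenario averages against 80%. It shows a run that passes the global gate while one scenario fails the per-scenario gate. The `ScenarioResult` type and the function names are hypothetical, not the `@novu/agent-evals` API.

```typescript
// Hypothetical result shape: one entry per scenario, judge scores in [0, 1].
interface ScenarioResult {
  name: string;
  judgeScores: number[];
}

// Arithmetic mean; an empty list counts as 0 (assumption: no scores means fail).
function mean(xs: number[]): number {
  if (xs.length === 0) return 0;
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Assumed old-style gate: mean of per-scenario means across the whole run.
function passesGlobalGate(results: ScenarioResult[], threshold: number): boolean {
  return mean(results.map((r) => mean(r.judgeScores))) >= threshold;
}

// Per-scenario gate: names of scenarios whose own mean is below the threshold.
function failingScenarios(results: ScenarioResult[], threshold: number): string[] {
  return results.filter((r) => mean(r.judgeScores) < threshold).map((r) => r.name);
}

const run: ScenarioResult[] = [
  { name: "scenario-a", judgeScores: [1.0, 0.9] }, // mean 0.95
  { name: "scenario-b", judgeScores: [0.7, 0.7] }, // mean 0.70
];

const globalOk = passesGlobalGate(run, 0.8); // true: (0.95 + 0.70) / 2 = 0.825 >= 0.8
const failed = failingScenarios(run, 0.8); // ["scenario-b"]: 0.70 < 0.8
console.log(globalOk, failed);
```

Under the global gate, the strong scenario masks the weak one. The per-scenario gate reports `scenario-b` as a failure even though the run-wide average clears 80%.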
