fix(ci): run agent eval workflows only on PRs to next

djabarovgeorge · cursoragent · djabarovgeorge · commit 704e8ee8ee4c · 2026-06-21T13:33:35.000+03:00
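Remove the nightly schedule and workflow_dispatch triggers from agent-evals, switch the agent-onboarding webhook from push to pull_request, scope both workflows to path-filtered PRs targeting next, and pin NOVU_EVAL_JUDGE to 'false' so judge graders stay disabled in CI. Update the agent-evals README to match.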

Co-authored-by: Cursor <cursoragent@cursor.com>
diff --git a/.github/workflows/agent-evals.yml b/.github/workflows/agent-evals.yml
@@ -2,14 +2,12 @@ name: Agent evals
 
 on:
   pull_request:
+    branches:
+      - next
     paths:
       - packages/shared/docs/agent-onboarding.md
       - libs/agent-evals/**
       - .github/workflows/agent-evals.yml
-  schedule:
-    # Nightly regression run (06:00 UTC) catches model/playbook drift outside PRs.
-    - cron: '0 6 * * *'
-  workflow_dispatch:
 
 jobs:
   evals:
@@ -36,6 +34,5 @@ jobs:
       - name: Run agent evals
         env:
           ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
-          # Judge graders only on scheduled/manual runs; PRs run deterministic graders.
-          NOVU_EVAL_JUDGE: ${{ (github.event_name == 'schedule' || github.event_name == 'workflow_dispatch') && 'true' || 'false' }}
+          NOVU_EVAL_JUDGE: 'false'
         run: pnpm --filter @novu/agent-evals eval src/suites/agent-onboarding
diff --git a/.github/workflows/agent-onboarding-webhook.yml b/.github/workflows/agent-onboarding-webhook.yml
@@ -1,11 +1,12 @@
 name: Notify Cursor Automation on Agent Onboarding Change
 
 on:
-  push:
+  pull_request:
     branches:
       - next
     paths:
-      - "packages/shared/docs/agent-onboarding.md"
+      - packages/shared/docs/agent-onboarding.md
+      - .github/workflows/agent-onboarding-webhook.yml
 
 permissions:
   contents: read
diff --git a/libs/agent-evals/README.md b/libs/agent-evals/README.md
@@ -154,7 +154,7 @@ pnpm --filter @novu/agent-evals eval:watch
 # Single scenario
 pnpm --filter @novu/agent-evals exec vitest run --config vitest.evals.config.ts -t keyless-slack-secure
 
-# Enable LLM judge graders (also enabled on scheduled CI runs)
+# Enable LLM judge graders locally
 NOVU_EVAL_JUDGE=true pnpm --filter @novu/agent-evals eval
 ```
 
@@ -175,7 +175,7 @@ Scenarios are independent and dominated by live-model latency, so they run concu
 
 Each scenario uses `judgeThreshold: 0.8` — the average judge score for that scenario must be ≥ 80%. This is stricter than the old global `--fail-under 80` (which gated on the average across all scenarios): every scenario must pass individually.
 
-Judge graders run only when `NOVU_EVAL_JUDGE=true` (PR/push CI runs deterministic graders only; scheduled and workflow-dispatch CI enable judges by default).
+Judge graders run only when `NOVU_EVAL_JUDGE=true` (CI runs deterministic graders only).
 
 ## Triage failing scenarios
 
@@ -189,4 +189,4 @@ When a scenario fails, use the Cursor skill `triage-agent-eval-failures` (`.curs
 
 ## CI
 
-GitHub Actions workflow `.github/workflows/agent-evals.yml` runs `pnpm --filter @novu/agent-evals eval` on playbook or harness changes, with `NOVU_EVAL_JUDGE` enabled on schedule and workflow-dispatch.
+GitHub Actions workflow `.github/workflows/agent-evals.yml` runs `pnpm --filter @novu/agent-evals eval` on PRs to `next` that touch the playbook or harness.