
Commit 4d74c0c

Author: Chibi Vikram
fix: remove missing custom evaluators from eval set
The eval set was referencing two custom evaluators that don't exist:
- ResumeCompletedEvaluator
- SuspendResumeTrajectoryEvaluator

This caused the eval command to fail with:
'Could not find the following evaluators'

Changes:
- Removed evaluatorRefs and evaluationCriterias
- Removed expectedOutputs (agent suspends, doesn't complete on first run)
- Updated description to clarify suspend/resume testing workflow
- Kept simple inputs for testing suspend phase

The eval set now works for demonstrating suspend behavior without requiring custom evaluators. For full suspend/resume testing with evaluators, use the demo scripts or --resume flag.
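The failure described in the commit message can be illustrated with a small check. This is a minimal sketch only: `find_missing_evaluators` and the `available` set are hypothetical names, and the way the real eval command resolves evaluators is not shown in this commit. It assumes evaluators are referenced both in the top-level `evaluatorRefs` list and as keys of each evaluation's `evaluationCriterias`, as the removed lines suggest.

```python
from typing import Any


def find_missing_evaluators(eval_set: dict[str, Any], available: set[str]) -> list[str]:
    """Return evaluator names referenced by the eval set but absent from `available`.

    Hypothetical helper for illustration; not part of the project's API.
    """
    referenced: set[str] = set(eval_set.get("evaluatorRefs", []))
    for evaluation in eval_set.get("evaluations", []):
        # Per-evaluation criteria are keyed by evaluator name.
        referenced.update(evaluation.get("evaluationCriterias", {}).keys())
    return sorted(referenced - available)


# The eval set roughly as it looked before this commit (fields abridged).
old_eval_set = {
    "id": "test-simple-no-auth",
    "evaluatorRefs": ["ResumeCompletedEvaluator", "SuspendResumeTrajectoryEvaluator"],
    "evaluations": [
        {
            "id": "test_suspend_resume_basic",
            "inputs": {"query": "Test suspend with simple payload"},
            "evaluationCriterias": {
                "ResumeCompletedEvaluator": {"searchText": "Completed with resume data"},
            },
        }
    ],
}

# Assumption: no custom evaluators are registered in the project.
missing = find_missing_evaluators(old_eval_set, available=set())
print(missing)
```

With an empty `available` set, both evaluator names are reported as missing, which matches the two names listed in the commit message.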
1 parent 06061a4 commit 4d74c0c

1 file changed

Lines changed: 1 addition & 18 deletions


samples/tool-calling-suspend-resume/evaluations/eval-sets/test_simple_no_auth.json

@@ -2,37 +2,20 @@
   "version": "1.0",
   "id": "test-simple-no-auth",
   "name": "Simple Suspend/Resume Test (No Auth Required)",
-  "description": "Tests suspend/resume pattern using simple dict payload. Validates that the agent suspends, persists state, and completes after resume.",
-  "evaluatorRefs": [
-    "ResumeCompletedEvaluator",
-    "SuspendResumeTrajectoryEvaluator"
-  ],
+  "description": "Tests suspend/resume pattern using simple dict payload. Note: This eval set demonstrates suspend behavior - evaluations will suspend on first run and complete after resume with --resume flag.",
   "evaluations": [
     {
       "id": "test_suspend_resume_basic",
       "name": "Basic suspend/resume with query",
       "inputs": {
         "query": "Test suspend with simple payload"
-      },
-      "evaluationCriterias": {
-        "ResumeCompletedEvaluator": {
-          "searchText": "Completed with resume data"
-        },
-        "SuspendResumeTrajectoryEvaluator": {
-          "expectedAgentBehavior": "The agent should receive the query, suspend execution at the interrupt() call, and when resumed, complete successfully with the resume data incorporated in the result."
-        }
       }
     },
     {
       "id": "test_suspend_resume_with_different_query",
       "name": "Suspend/resume with different query content",
       "inputs": {
         "query": "What happens during suspension?"
-      },
-      "evaluationCriterias": {
-        "ResumeCompletedEvaluator": {
-          "searchText": "Completed with resume data"
-        }
       }
     }
   ]
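The same cleanup could be applied to other eval sets programmatically. The following is a sketch under stated assumptions, not the project's tooling: `strip_evaluators` is a hypothetical helper, and it assumes the only evaluator references are the top-level `evaluatorRefs` key and each evaluation's `evaluationCriterias` key, as in the diff for this file.

```python
import copy
import json
from typing import Any


def strip_evaluators(eval_set: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the eval set with all evaluator references removed.

    Hypothetical helper for illustration; the input dict is left untouched.
    """
    cleaned = copy.deepcopy(eval_set)
    cleaned.pop("evaluatorRefs", None)
    for evaluation in cleaned.get("evaluations", []):
        # Keep "inputs" and other fields; drop only the evaluator criteria.
        evaluation.pop("evaluationCriterias", None)
    return cleaned


# Abridged version of the pre-commit eval set shown in the diff.
before = {
    "version": "1.0",
    "id": "test-simple-no-auth",
    "evaluatorRefs": ["ResumeCompletedEvaluator", "SuspendResumeTrajectoryEvaluator"],
    "evaluations": [
        {
            "id": "test_suspend_resume_with_different_query",
            "inputs": {"query": "What happens during suspension?"},
            "evaluationCriterias": {
                "ResumeCompletedEvaluator": {"searchText": "Completed with resume data"},
            },
        }
    ],
}

after = strip_evaluators(before)
print(json.dumps(after, indent=2))
```

Working on a deep copy keeps the original data available for comparison, which is convenient when checking that only the evaluator-related keys changed.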
