-
Notifications
You must be signed in to change notification settings - Fork 628
Expand file tree
/
Copy pathevals.json
More file actions
147 lines (147 loc) · 4.52 KB
/
evals.json
File metadata and controls
147 lines (147 loc) · 4.52 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
{
"scenarios": [
{
"id": "hf-cli-smoke",
"skills": ["skills/hf-cli"],
"tests": "evals/hf-cli/smoke.json",
"judge": {
"enabled": true,
"criteria": [
"Uses the modern hf CLI, not deprecated huggingface-cli commands.",
"Provides a concrete command that is directly executable.",
"Stays focused on the user's request."
]
}
},
{
"id": "community-evals-smoke",
"skills": ["skills/huggingface-community-evals"],
"tests": "evals/huggingface-community-evals/smoke.json",
"judge": {
"enabled": true,
"criteria": [
"Chooses the correct local evaluation script for the task.",
"Keeps the workflow local instead of drifting into HF Jobs.",
"Provides a runnable command shape."
]
}
},
{
"id": "datasets-smoke",
"skills": ["skills/huggingface-datasets"],
"tests": "evals/huggingface-datasets/smoke.json",
"judge": {
"enabled": true,
"criteria": [
"Uses the Dataset Viewer API endpoints that match the requested workflow.",
"Covers split discovery, preview, and pagination.",
"Keeps the answer concrete and operational."
]
}
},
{
"id": "gradio-smoke",
"skills": ["skills/huggingface-gradio"],
"tests": "evals/huggingface-gradio/smoke.json",
"judge": {
"enabled": true,
"criteria": [
"Shows a minimal valid Gradio solution.",
"Uses the API shape that matches the user request.",
"Keeps the code concise and directly usable."
]
}
},
{
"id": "jobs-smoke",
"skills": ["skills/huggingface-jobs"],
"tests": "evals/huggingface-jobs/smoke.json",
"judge": {
"enabled": true,
"criteria": [
"Uses the hf_jobs MCP tool as the preferred submission path.",
"Mentions HF_TOKEN handling for Hub access.",
"Gives a concrete submission pattern rather than generic advice."
]
}
},
{
"id": "llm-trainer-smoke",
"skills": ["skills/huggingface-llm-trainer"],
"tests": "evals/huggingface-llm-trainer/smoke.json",
"judge": {
"enabled": true,
"criteria": [
"Uses the training workflow described by the skill.",
"Keeps Trackio or equivalent monitoring enabled.",
"Points to the right script or job pattern for SFT."
]
}
},
{
"id": "paper-publisher-smoke",
"skills": ["skills/huggingface-paper-publisher"],
"tests": "evals/huggingface-paper-publisher/smoke.json",
"judge": {
"enabled": true,
"criteria": [
"Uses the paper_manager script correctly.",
"Provides the right subcommand and required flag.",
"Keeps the answer concrete and executable."
]
}
},
{
"id": "papers-smoke",
"skills": ["skills/huggingface-papers"],
"tests": "evals/huggingface-papers/smoke.json",
"judge": {
"enabled": true,
"criteria": [
"Uses the HF paper metadata and markdown endpoints correctly.",
"Handles the provided paper id directly.",
"Keeps the answer narrow and API-focused."
]
}
},
{
"id": "trackio-smoke",
"skills": ["skills/huggingface-trackio"],
"tests": "evals/huggingface-trackio/smoke.json",
"judge": {
"enabled": true,
"criteria": [
"Uses the Trackio CLI flow that matches the retrieval task.",
"Returns JSON-oriented automation guidance.",
"Keeps the answer concrete and specific."
]
}
},
{
"id": "vision-trainer-smoke",
"skills": ["skills/huggingface-vision-trainer"],
"tests": "evals/huggingface-vision-trainer/smoke.json",
"judge": {
"enabled": true,
"criteria": [
"Emphasizes dataset validation before expensive training.",
"Uses the helper script referenced by the skill.",
"Keeps the answer focused on the requested preflight step."
]
}
},
{
"id": "transformers-js-smoke",
"skills": ["skills/transformers-js"],
"tests": "evals/transformers-js/smoke.json",
"judge": {
"enabled": true,
"criteria": [
"Shows the minimal correct Transformers.js setup for Node.js.",
"Includes proper pipeline usage and cleanup.",
"Keeps the answer directly usable."
]
}
}
]
}