Commit 7a311de

feat: add Claude Code skills for agents
1 parent 9433e49 commit 7a311de

3 files changed

Lines changed: 422 additions & 2 deletions

File tree

src/uipath/_cli/cli_init.py

Lines changed: 37 additions & 2 deletions
```diff
@@ -132,11 +132,46 @@ def generate_agent_md_files(target_directory: str, no_agents_md_override: bool)
         if generate_agent_md_file(agent_dir, file_name, no_agents_md_override):
             any_overridden = True
 
+    # Generate Claude Code skills in .claude/commands/
+    if generate_claude_code_skills(target_directory, no_agents_md_override):
+        any_overridden = True
+
     if any_overridden:
-        console.success(f"Updated {click.style('AGENTS.md', fg='cyan')} related files.")
+        console.success(
+            f"Updated {click.style('AGENTS.md', fg='cyan')} and Claude Code skills."
+        )
         return
 
-    console.success(f"Created {click.style('AGENTS.md', fg='cyan')} related files.")
+    console.success(
+        f"Created {click.style('AGENTS.md', fg='cyan')} and Claude Code skills."
+    )
+
+
+def generate_claude_code_skills(
+    target_directory: str, no_agents_md_override: bool
+) -> bool:
+    """Generate Claude Code skill files in .claude/commands/.
+
+    Args:
+        target_directory: The directory where the .claude folder should be created.
+        no_agents_md_override: Whether to override existing files.
+
+    Returns:
+        True if any file was overridden, False otherwise.
+    """
+    claude_commands_dir = os.path.join(target_directory, ".claude", "commands")
+    os.makedirs(claude_commands_dir, exist_ok=True)
+
+    skill_files = ["new-agent.md", "eval.md"]
+
+    any_overridden = False
+    for file_name in skill_files:
+        if generate_agent_md_file(
+            claude_commands_dir, file_name, no_agents_md_override
+        ):
+            any_overridden = True
+
+    return any_overridden
 
 
 def write_bindings_file(bindings: Bindings) -> Path:
```
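The new helper follows a scaffold-and-report pattern: create the directory idempotently, write each skill file, and report whether anything pre-existing was replaced. A minimal standalone sketch of the same pattern, with a hypothetical `write_skill_file` standing in for the SDK's `generate_agent_md_file` (whose real file-writing and override semantics live elsewhere in the codebase):

```python
import os


def write_skill_file(directory: str, file_name: str, override: bool) -> bool:
    """Write a placeholder skill file; return True if an existing file was replaced.

    Hypothetical stand-in for generate_agent_md_file, for illustration only.
    """
    path = os.path.join(directory, file_name)
    existed = os.path.exists(path)
    if existed and not override:
        return False
    with open(path, "w") as f:
        f.write(f"# {file_name}\n")
    return existed


def generate_skills(target_directory: str, override: bool) -> bool:
    """Scaffold .claude/commands/ and write each skill, like the diff above."""
    commands_dir = os.path.join(target_directory, ".claude", "commands")
    os.makedirs(commands_dir, exist_ok=True)  # idempotent, safe on re-runs
    any_overridden = False
    for file_name in ["new-agent.md", "eval.md"]:
        if write_skill_file(commands_dir, file_name, override):
            any_overridden = True
    return any_overridden
```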

src/uipath/_resources/eval.md

Lines changed: 282 additions & 0 deletions
---
allowed-tools: Bash, Read, Write, Edit, Glob
description: Create and run agent evaluations
---
I'll help you create and run evaluations for your UiPath agent.

## Step 1: Check project setup

Let me check your project structure:

!ls -la evaluations/ entry-points.json 2>/dev/null || echo "NEEDS_SETUP"

# Check if schemas might be stale (main.py newer than entry-points.json)
!if [ -f main.py ] && [ -f entry-points.json ] && [ main.py -nt entry-points.json ]; then echo "SCHEMAS_MAY_BE_STALE"; fi

### If NEEDS_SETUP

If `entry-points.json` doesn't exist, initialize the project first:

!uv run uipath init

Then re-run this skill.

### If SCHEMAS_MAY_BE_STALE

Your `main.py` is newer than `entry-points.json`. Refresh the schemas:

!uv run uipath init --no-agents-md-override
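The shell `-nt` test above compares modification times. If the same check ever needs to run outside a POSIX shell, it can be mirrored in Python; a minimal sketch (file names taken from this skill, logic assumed equivalent):

```python
import os


def schemas_may_be_stale(source: str = "main.py",
                         schema: str = "entry-points.json") -> bool:
    """True when `source` exists and was modified after `schema`.

    Mirrors the shell test `[ main.py -nt entry-points.json ]`.
    """
    if not (os.path.isfile(source) and os.path.isfile(schema)):
        return False
    return os.path.getmtime(source) > os.path.getmtime(schema)
```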
## Step 2: What would you like to do?

1. **Create new eval set** - Set up evaluations from scratch
2. **Add test case** - Add a test to an existing eval set
3. **Run evaluations** - Execute tests and see results
4. **Analyze failures** - Debug failing tests

---
## Creating an Eval Set

First, create the directory structure:

!mkdir -p evaluations/eval-sets evaluations/evaluators

Read the agent's Input/Output schema from `entry-points.json` to understand the data types.

### Evaluator Selection Guide

| If your output is... | Use this evaluator | evaluatorTypeId |
|----------------------|--------------------|-----------------|
| Exact string/number | `ExactMatchEvaluator` | `uipath-exact-match` |
| Contains key phrases | `ContainsEvaluator` | `uipath-contains` |
| Semantically correct | `LLMJudgeOutputEvaluator` | `uipath-llm-judge-output-semantic-similarity` |
| JSON with numbers | `JsonSimilarityEvaluator` | `uipath-json-similarity` |
### Step 1: Create Evaluator Config Files

**Each evaluator needs a JSON config file** in `evaluations/evaluators/`.

**ExactMatchEvaluator** (`evaluations/evaluators/exact-match.json`):
```json
{
  "version": "1.0",
  "id": "ExactMatchEvaluator",
  "name": "ExactMatchEvaluator",
  "description": "Checks for exact output match",
  "evaluatorTypeId": "uipath-exact-match",
  "evaluatorConfig": {
    "name": "ExactMatchEvaluator",
    "targetOutputKey": "*"
  }
}
```

**LLMJudgeOutputEvaluator** (`evaluations/evaluators/llm-judge-output.json`):
```json
{
  "version": "1.0",
  "id": "LLMJudgeOutputEvaluator",
  "name": "LLMJudgeOutputEvaluator",
  "description": "Uses LLM to judge semantic similarity",
  "evaluatorTypeId": "uipath-llm-judge-output-semantic-similarity",
  "evaluatorConfig": {
    "name": "LLMJudgeOutputEvaluator",
    "model": "gpt-4o-mini-2024-07-18"
  }
}
```

**JsonSimilarityEvaluator** (`evaluations/evaluators/json-similarity.json`):
```json
{
  "version": "1.0",
  "id": "JsonSimilarityEvaluator",
  "name": "JsonSimilarityEvaluator",
  "description": "Compares JSON structures",
  "evaluatorTypeId": "uipath-json-similarity",
  "evaluatorConfig": {
    "name": "JsonSimilarityEvaluator",
    "targetOutputKey": "*"
  }
}
```

**ContainsEvaluator** (`evaluations/evaluators/contains.json`):
```json
{
  "version": "1.0",
  "id": "ContainsEvaluator",
  "name": "ContainsEvaluator",
  "description": "Checks if output contains text",
  "evaluatorTypeId": "uipath-contains",
  "evaluatorConfig": {
    "name": "ContainsEvaluator"
  }
}
```
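Since the four config files share one shape, they can also be generated with a short script instead of written by hand. A sketch under stated assumptions: the `evaluatorTypeId` values and config fields come from the templates above, while the output file names and `description` strings here are placeholders of my own choosing:

```python
import json
import os

# name -> (evaluatorTypeId, extra evaluatorConfig fields), from the templates above
EVALUATORS = {
    "ExactMatchEvaluator": ("uipath-exact-match", {"targetOutputKey": "*"}),
    "ContainsEvaluator": ("uipath-contains", {}),
    "LLMJudgeOutputEvaluator": (
        "uipath-llm-judge-output-semantic-similarity",
        {"model": "gpt-4o-mini-2024-07-18"},
    ),
    "JsonSimilarityEvaluator": ("uipath-json-similarity", {"targetOutputKey": "*"}),
}


def write_evaluator_configs(out_dir: str = "evaluations/evaluators") -> None:
    """Write one JSON config per evaluator (file names here are illustrative)."""
    os.makedirs(out_dir, exist_ok=True)
    for name, (type_id, extra) in EVALUATORS.items():
        config = {
            "version": "1.0",
            "id": name,
            "name": name,
            "description": f"{name} config",  # placeholder description
            "evaluatorTypeId": type_id,
            "evaluatorConfig": {"name": name, **extra},
        }
        with open(os.path.join(out_dir, f"{name}.json"), "w") as f:
            json.dump(config, f, indent=2)
```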
### Step 2: Create Eval Set

**Eval Set Template** (`evaluations/eval-sets/default.json`):
```json
{
  "version": "1.0",
  "id": "default-eval-set",
  "name": "Default Evaluation Set",
  "evaluatorRefs": ["ExactMatchEvaluator"],
  "evaluations": [
    {
      "id": "test-1",
      "name": "Test description",
      "inputs": {
        "field": "value"
      },
      "evaluationCriterias": {
        "ExactMatchEvaluator": {
          "expectedOutput": {
            "result": "expected value"
          }
        }
      }
    }
  ]
}
```

**Important notes:**
- `evaluatorRefs` must list ALL evaluators used in any test case
- Each evaluator in `evaluatorRefs` needs a matching JSON config in `evaluations/evaluators/`
- `evaluationCriterias` keys must match entries in `evaluatorRefs`
- Use `expectedOutput` for most evaluators
- LLM evaluators need `model` in their config (e.g., `gpt-4o-mini-2024-07-18`)
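The notes above impose cross-file invariants that are easy to break by hand. A minimal lint sketch that checks them, assuming the directory layout described in this skill and assuming each config's `id` equals the name used in `evaluatorRefs` (true of all the templates above):

```python
import json
import os


def check_eval_set(eval_set_path: str,
                   evaluators_dir: str = "evaluations/evaluators") -> list:
    """Return problems: refs without configs, and criteria keys not in evaluatorRefs."""
    with open(eval_set_path) as f:
        eval_set = json.load(f)
    refs = set(eval_set.get("evaluatorRefs", []))
    problems = []
    # Collect the ids declared by config files in evaluations/evaluators/.
    configured = set()
    if os.path.isdir(evaluators_dir):
        for name in os.listdir(evaluators_dir):
            if name.endswith(".json"):
                with open(os.path.join(evaluators_dir, name)) as f:
                    configured.add(json.load(f).get("id"))
    for ref in refs - configured:
        problems.append(f"no config file for evaluator {ref!r}")
    # Every evaluationCriterias key must appear in evaluatorRefs.
    for test in eval_set.get("evaluations", []):
        for key in test.get("evaluationCriterias", {}):
            if key not in refs:
                problems.append(f"test {test.get('id')!r} uses {key!r} not in evaluatorRefs")
    return problems
```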
---

## Adding a Test Case

When adding a test to an existing eval set:

1. Read the existing eval set
2. Check which evaluators are in `evaluatorRefs`
3. Add the new test to the `evaluations` array
4. If using a new evaluator, add it to `evaluatorRefs`

### Test Case Template

```json
{
  "id": "test-{n}",
  "name": "Description of what this tests",
  "inputs": { },
  "evaluationCriterias": {
    "EvaluatorName": {
      "expectedOutput": { }
    }
  }
}
```
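The four steps above can be sketched as a small script. `add_test_case` is a hypothetical helper of my own, assuming the eval-set layout shown earlier:

```python
import json


def add_test_case(eval_set_path: str, test_case: dict) -> None:
    """Append a test case to an eval set, extending evaluatorRefs as needed."""
    with open(eval_set_path) as f:
        eval_set = json.load(f)                      # 1. read the existing eval set
    refs = eval_set.setdefault("evaluatorRefs", [])  # 2. check current evaluators
    eval_set.setdefault("evaluations", []).append(test_case)  # 3. add the new test
    for evaluator in test_case.get("evaluationCriterias", {}):
        if evaluator not in refs:                    # 4. register any new evaluator
            refs.append(evaluator)
    with open(eval_set_path, "w") as f:
        json.dump(eval_set, f, indent=2)
```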
---

## Running Evaluations

First, read `entry-points.json` to get the entrypoint name (e.g., `main`):

!uv run uipath eval main evaluations/eval-sets/default.json --output-file eval-results.json

**Note:** Replace `main` with your actual entrypoint from `entry-points.json`.
### Analyze Results

After running, read `eval-results.json` and show:
- Pass/fail summary table
- For failures: expected vs actual output
- Suggestions for fixing or changing evaluators

### Results Format

```json
{
  "evaluationSetResults": [{
    "evaluationRunResults": [
      {
        "evaluationId": "test-1",
        "evaluatorId": "ExactMatchEvaluator",
        "result": { "score": 1.0 },
        "errorMessage": null
      }
    ]
  }]
}
```

- Score 1.0 = PASS
- Score < 1.0 = FAIL (show expected vs actual)
- `errorMessage` present = ERROR (show the message)
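The scoring rules above can be applied mechanically. A minimal summarizer sketch using only the field names shown in the results format; everything else (function name, tuple layout) is illustrative:

```python
import json


def summarize(results_path: str = "eval-results.json") -> list:
    """Return (evaluationId, evaluatorId, status) rows per the rules above."""
    with open(results_path) as f:
        data = json.load(f)
    rows = []
    for eval_set in data.get("evaluationSetResults", []):
        for run in eval_set.get("evaluationRunResults", []):
            if run.get("errorMessage"):          # errorMessage present -> ERROR
                status = "ERROR"
            elif run.get("result", {}).get("score") == 1.0:  # score 1.0 -> PASS
                status = "PASS"
            else:                                 # score < 1.0 -> FAIL
                status = "FAIL"
            rows.append((run.get("evaluationId"), run.get("evaluatorId"), status))
    return rows
```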
---

## Evaluator Reference

### Deterministic Evaluators

**ExactMatchEvaluator** - Exact output matching
```json
"ExactMatchEvaluator": {
  "expectedOutput": { "result": "exact value" }
}
```

**ContainsEvaluator** - Output contains a substring
```json
"ContainsEvaluator": {
  "searchText": "must contain this"
}
```

**JsonSimilarityEvaluator** - JSON comparison with tolerance
```json
"JsonSimilarityEvaluator": {
  "expectedOutput": { "value": 10.0 }
}
```

### LLM-Based Evaluators

**LLMJudgeOutputEvaluator** - Semantic correctness
```json
"LLMJudgeOutputEvaluator": {
  "expectedOutput": { "summary": "Expected semantic meaning" }
}
```

**LLMJudgeTrajectoryEvaluator** - Validate agent reasoning
```json
"LLMJudgeTrajectoryEvaluator": {
  "expectedAgentBehavior": "The agent should first fetch data, then process it"
}
```

---
## Common Issues

### "No evaluations found"
- Check that the `evaluations/eval-sets/` directory exists
- Verify the JSON file is valid

### Evaluator not found
- Each evaluator needs a JSON config file in `evaluations/evaluators/`
- The config file must have the correct `evaluatorTypeId` (see templates above)
- The config file must have a `name` field at the root level
- LLM evaluators need `model` in `evaluatorConfig`

### Evaluator skipped
- Ensure the evaluator is listed in the root `evaluatorRefs` array
- Check that the evaluator's config file exists in `evaluations/evaluators/`

### Schema mismatch
- Run `uv run uipath init --no-agents-md-override` to refresh schemas
- Check that `entry-points.json` matches your Input/Output models