Skip to content

Commit 1a0348c

Browse files
manzoorwanijkclaude
andcommitted
WIP: Add agent-experience-eval GitHub Action
Composite action that evaluates a repo's AI agent experience quality by running the claude-md-improver skill and producing a structured JSON report. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 2609d95 commit 1a0348c

6 files changed

Lines changed: 346 additions & 0 deletions

File tree

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,50 @@
1+
name: Agent Experience Evaluation
2+
3+
on:
4+
schedule:
5+
- cron: '0 6 * * 1' # Weekly, Monday 6am UTC
6+
pull_request: # TODO: Remove after initial testing
7+
paths:
8+
- 'projects/github-actions/agent-experience-eval/**'
9+
- '.github/workflows/agent-experience-eval.yml'
10+
- 'tools/score-agent-experience.sh'
11+
push:
12+
paths:
13+
- 'CLAUDE.md'
14+
- '**/CLAUDE.md'
15+
- 'AGENTS.md'
16+
- '**/AGENTS.md'
17+
- '.claude/**'
18+
- '**/.claude/**'
19+
- '.cursorrules'
20+
- '**/.cursorrules'
21+
- '.cursor/**'
22+
- '**/.cursor/**'
23+
- '.windsurfrules'
24+
- '.aider.conf.yml'
25+
- '.codeiumrc'
26+
- '.github/copilot-instructions.md'
27+
- '.codex/**'
28+
workflow_dispatch: {}
29+
30+
jobs:
31+
evaluate:
32+
runs-on: ubuntu-latest
33+
timeout-minutes: 10
34+
steps:
35+
- uses: actions/checkout@v4
36+
37+
- name: Install Claude Code
38+
run: npm install -g @anthropic-ai/claude-code
39+
40+
- name: Evaluate Agent Experience
41+
uses: ./projects/github-actions/agent-experience-eval
42+
with:
43+
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
44+
45+
- name: Upload evaluation artifact
46+
uses: actions/upload-artifact@v4
47+
with:
48+
name: agent-experience-eval
49+
path: agent-experience-eval.json
50+
retention-days: 30
Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
# Changelog
2+
3+
## 1.0.0-alpha - unreleased
4+
5+
### Added
6+
- Initial release of the Agent Experience Evaluation action.
7+
- AI instruction file discovery (CLAUDE.md, AGENTS.md, .cursorrules, etc.).
8+
- Claude Code CLI-based evaluation against 6-criteria rubric.
9+
- JSON report generation with schema validation.
10+
- Artifact upload support for central collection.
Lines changed: 117 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,117 @@
1+
# Agent Experience Evaluation
2+
3+
A GitHub Action that evaluates a repository's AI agent experience quality by scoring its CLAUDE.md, AGENTS.md, and other AI instruction files against a standardized rubric.
4+
5+
## How it works
6+
7+
1. **Discovers** AI instruction files (CLAUDE.md, AGENTS.md, .cursorrules, .cursor/**, etc.)
8+
2. **Evaluates** them using the Claude Code CLI against a 6-criteria, 100-point rubric
9+
3. **Validates** the structured JSON report
10+
4. **Uploads** the report as a GitHub Actions artifact for central collection
11+
12+
## Usage
13+
14+
```yaml
15+
# .github/workflows/agent-experience-eval.yml
16+
name: Agent Experience Evaluation
17+
18+
on:
19+
schedule:
20+
- cron: '0 6 * * 1' # Weekly, Monday 6am UTC
21+
push:
22+
paths:
23+
- 'CLAUDE.md'
24+
- '**/CLAUDE.md'
25+
- 'AGENTS.md'
26+
- '**/AGENTS.md'
27+
- '.claude/**'
28+
- '**/.claude/**'
29+
- '.cursorrules'
30+
- '**/.cursorrules'
31+
- '.cursor/**'
32+
- '**/.cursor/**'
33+
workflow_dispatch: {}
34+
35+
jobs:
36+
evaluate:
37+
runs-on: ubuntu-latest
38+
steps:
39+
- uses: actions/checkout@v4
40+
41+
- name: Install Claude Code
42+
run: npm install -g @anthropic-ai/claude-code
43+
44+
- name: Evaluate Agent Experience
45+
uses: Automattic/action-agent-experience-eval@v1
46+
with:
47+
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
48+
```
49+
50+
## Inputs
51+
52+
| Input | Required | Default | Description |
53+
|-------|----------|---------|-------------|
54+
| `anthropic_api_key` | Yes | — | Anthropic API key for Claude Code CLI |
55+
| `max_turns` | No | `15` | Maximum turns for the CLI evaluation |
56+
| `output_path` | No | `agent-experience-eval.json` | Path for the evaluation JSON file |
57+
| `upload_artifact` | No | `true` | Upload the JSON as a GitHub Actions artifact |
58+
| `artifact_name` | No | `agent-experience-eval` | Name of the uploaded artifact |
59+
| `artifact_retention_days` | No | `30` | Days to retain the artifact |
60+
61+
## Outputs
62+
63+
| Output | Description |
64+
|--------|-------------|
65+
| `score` | Overall evaluation score (0-100) |
66+
| `grade` | Letter grade (A, B, C, D, F) |
67+
| `json_path` | Path to the evaluation JSON file |
68+
| `has_ai_files` | Whether AI instruction files were found |
69+
70+
## Scoring Rubric
71+
72+
| Criterion | Points | Description |
73+
|-----------|--------|-------------|
74+
| Commands/Workflows | 20 | Build, test, lint, deploy commands documented |
75+
| Architecture Clarity | 20 | Codebase map with directories, modules, data flow |
76+
| Non-obvious Patterns | 15 | Gotchas, quirks, workarounds, edge cases |
77+
| Conciseness | 15 | Dense, valuable content with no filler |
78+
| Currency | 15 | Commands work, file refs accurate, stack current |
79+
| Actionability | 15 | Copy-paste ready commands, concrete steps |
80+
81+
**Grade scale:** A (90-100), B (70-89), C (50-69), D (30-49), F (0-29)
82+
83+
## Artifact Schema
84+
85+
The action produces a JSON artifact with this structure:
86+
87+
```json
88+
{
89+
"version": 1,
90+
"score": 72,
91+
"grade": "B",
92+
"files_found": ["CLAUDE.md", ".cursor/rules/testing.mdc"],
93+
"criteria": {
94+
"commands_workflows": { "score": 18, "max": 20, "notes": "..." },
95+
"architecture_clarity": { "score": 15, "max": 20, "notes": "..." },
96+
"non_obvious_patterns": { "score": 12, "max": 15, "notes": "..." },
97+
"conciseness": { "score": 10, "max": 15, "notes": "..." },
98+
"currency": { "score": 8, "max": 15, "notes": "..." },
99+
"actionability": { "score": 9, "max": 15, "notes": "..." }
100+
},
101+
"issues": ["..."],
102+
"recommendations": ["..."]
103+
}
104+
```
105+
106+
## AI Files Detected
107+
108+
The action searches for:
109+
110+
- `CLAUDE.md`, `AGENTS.md`, `AGENTS.override.md` (root and subdirectories)
111+
- `.cursorrules`, `.windsurfrules`, `.aider.conf.yml`, `.codeiumrc`
112+
- `.github/copilot-instructions.md`
113+
- `.claude/**`, `.cursor/**`, `.codex/**` directories
114+
115+
## Prerequisites
116+
117+
The Claude Code CLI (`@anthropic-ai/claude-code`) must be installed on the runner before this action runs. See the usage example above.
Lines changed: 69 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,69 @@
1+
name: "Agent Experience Evaluation"
2+
description: "Evaluate a repository's AI agent experience quality by scoring its CLAUDE.md, AGENTS.md, and other AI instruction files against a standardized rubric."
3+
branding:
4+
icon: 'award'
5+
color: 'blue'
6+
inputs:
7+
anthropic_api_key:
8+
description: "Anthropic API key for Claude Code CLI"
9+
required: true
10+
max_turns:
11+
description: "Maximum turns for the Claude Code CLI evaluation"
12+
required: false
13+
default: "15"
14+
output_path:
15+
description: "Path where the evaluation JSON file will be written"
16+
required: false
17+
default: "agent-experience-eval.json"
18+
outputs:
19+
score:
20+
description: "Overall evaluation score (0-100)"
21+
value: ${{ steps.eval.outputs.score }}
22+
grade:
23+
description: "Letter grade (A, B, C, D, F)"
24+
value: ${{ steps.eval.outputs.grade }}
25+
json_path:
26+
description: "Path to the evaluation JSON file"
27+
value: ${{ steps.eval.outputs.json_path }}
28+
runs:
29+
using: composite
30+
steps:
31+
- name: Run evaluation
32+
id: eval
33+
shell: bash
34+
env:
35+
ANTHROPIC_API_KEY: ${{ inputs.anthropic_api_key }}
36+
MAX_TURNS: ${{ inputs.max_turns }}
37+
OUTPUT_PATH: ${{ inputs.output_path }}
38+
ACTION_PATH: ${{ github.action_path }}
39+
run: |
40+
PROMPT=$(cat "$ACTION_PATH/prompt.md")
41+
42+
echo "::group::Running Claude Code evaluation"
43+
RAW=$(claude --print --output-format text --max-turns "$MAX_TURNS" "$PROMPT" < /dev/null 2>/dev/null)
44+
echo "::endgroup::"
45+
46+
# Extract JSON — direct parse, code fences, or brace extraction
47+
if echo "$RAW" | jq . > "$OUTPUT_PATH" 2>/dev/null; then
48+
:
49+
elif FENCED=$(echo "$RAW" | sed -n '/^```\(json\)\?$/,/^```$/{ /^```/d; p; }') && \
50+
[[ -n "$FENCED" ]] && echo "$FENCED" | jq . > "$OUTPUT_PATH" 2>/dev/null; then
51+
:
52+
elif BRACED=$(echo "$RAW" | sed -n '/^{/,/^}/p') && \
53+
[[ -n "$BRACED" ]] && echo "$BRACED" | jq . > "$OUTPUT_PATH" 2>/dev/null; then
54+
:
55+
else
56+
echo "::error::Could not parse JSON from Claude output"
57+
echo "${RAW:0:500}"
58+
exit 1
59+
fi
60+
61+
# Read score and grade
62+
SCORE=$(jq -r '.score' "$OUTPUT_PATH")
63+
GRADE=$(jq -r '.grade' "$OUTPUT_PATH")
64+
echo "Score: $SCORE/100 (Grade: $GRADE)"
65+
66+
echo "score=$SCORE" >> "$GITHUB_OUTPUT"
67+
echo "grade=$GRADE" >> "$GITHUB_OUTPUT"
68+
echo "json_path=$OUTPUT_PATH" >> "$GITHUB_OUTPUT"
69+
Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,29 @@
1+
Run /claude-md-management:claude-md-improver to audit this repo.
2+
3+
After the skill produces its quality report, convert the results into
4+
a JSON object matching this exact schema:
5+
6+
{
7+
"version": 1,
8+
"score": <number 0-100, the average score from the report>,
9+
"grade": "<A|B|C|D|F, from the report>",
10+
"files_found": ["<paths to all AI instruction files found>"],
11+
"criteria": {
12+
"commands_workflows": { "score": <0-20>, "max": 20, "notes": "<feedback from report>" },
13+
"architecture_clarity": { "score": <0-20>, "max": 20, "notes": "<feedback from report>" },
14+
"non_obvious_patterns": { "score": <0-15>, "max": 15, "notes": "<feedback from report>" },
15+
"conciseness": { "score": <0-15>, "max": 15, "notes": "<feedback from report>" },
16+
"currency": { "score": <0-15>, "max": 15, "notes": "<feedback from report>" },
17+
"actionability": { "score": <0-15>, "max": 15, "notes": "<feedback from report>" }
18+
},
19+
"issues": ["<specific problems from the report>"],
20+
"recommendations": ["<specific recommendations from the report>"]
21+
}
22+
23+
IMPORTANT:
24+
- Run the skill FIRST, then extract the data into JSON.
25+
- Do NOT modify any repo files. Only output the JSON.
26+
- If no AI instruction files exist, score 0, grade F.
27+
- The "score" field must equal the sum of all criteria scores.
28+
- The "grade" must match the score: A (90-100), B (70-89), C (50-69), D (30-49), F (0-29).
29+
- Output ONLY the JSON object. No markdown code fences. No text before or after.

tools/score-agent-experience.sh

Lines changed: 71 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,71 @@
1+
#!/usr/bin/env bash
2+
#
3+
# Evaluate a repository's AI agent experience quality.
4+
# Uses the same prompt as the agent-experience-eval GitHub Action.
5+
#
6+
# Usage:
7+
# ./tools/score-agent-experience.sh
8+
# ./tools/score-agent-experience.sh -o report.json
9+
# ./tools/score-agent-experience.sh -m 10
10+
11+
set -euo pipefail
12+
13+
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
14+
PROMPT_FILE="$SCRIPT_DIR/../projects/github-actions/agent-experience-eval/prompt.md"
15+
OUTPUT="agent-experience-eval.json"
16+
MAX_TURNS=15
17+
18+
while [[ $# -gt 0 ]]; do
19+
case "$1" in
20+
--) shift ;;
21+
-o|--output) OUTPUT="$2"; shift 2 ;;
22+
-m|--max-turns) MAX_TURNS="$2"; shift 2 ;;
23+
-h|--help)
24+
echo "Usage: ./tools/score-agent-experience.sh [-o output.json] [-m max-turns]"
25+
exit 0 ;;
26+
*) echo "Unknown option: $1" >&2; exit 1 ;;
27+
esac
28+
done
29+
30+
if git rev-parse --show-toplevel &>/dev/null; then
31+
cd "$(git rev-parse --show-toplevel)"
32+
fi
33+
34+
[[ -z "${ANTHROPIC_API_KEY:-}" ]] && { echo "ERROR: ANTHROPIC_API_KEY is not set." >&2; exit 1; }
35+
command -v claude &>/dev/null || { echo "ERROR: claude CLI not found." >&2; exit 1; }
36+
command -v jq &>/dev/null || { echo "ERROR: jq not found." >&2; exit 1; }
37+
38+
PROMPT=$(cat "$PROMPT_FILE")
39+
40+
echo "Running evaluation..." >&2
41+
RAW=$(claude --print --output-format text --max-turns "$MAX_TURNS" "$PROMPT" < /dev/null 2>/dev/null)
42+
43+
# Extract JSON
44+
if echo "$RAW" | jq . > "$OUTPUT" 2>/dev/null; then
45+
:
46+
elif FENCED=$(echo "$RAW" | sed -n '/^```\(json\)\?$/,/^```$/{ /^```/d; p; }') && \
47+
[[ -n "$FENCED" ]] && echo "$FENCED" | jq . > "$OUTPUT" 2>/dev/null; then
48+
:
49+
elif BRACED=$(echo "$RAW" | sed -n '/^{/,/^}/p') && \
50+
[[ -n "$BRACED" ]] && echo "$BRACED" | jq . > "$OUTPUT" 2>/dev/null; then
51+
:
52+
else
53+
echo "ERROR: Could not parse JSON from Claude output." >&2
54+
echo "${RAW:0:500}" >&2
55+
exit 1
56+
fi
57+
58+
# Print summary
59+
SCORE=$(jq -r '.score' "$OUTPUT")
60+
GRADE=$(jq -r '.grade' "$OUTPUT")
61+
echo "" >&2
62+
echo "Score: $SCORE/100 (Grade: $GRADE)" >&2
63+
jq -r '.criteria | to_entries[] | "\(.key)\t\(.value.score)\t\(.value.max)"' "$OUTPUT" |
64+
while IFS=$'\t' read -r key score max; do
65+
label=$(echo "$key" | tr '_' ' ' | sed 's/\b\(.\)/\u\1/g')
66+
filled=$(( score * 20 / max ))
67+
bar=$(printf '█%.0s' $(seq 1 "$filled") 2>/dev/null || true)$(printf '░%.0s' $(seq 1 $((20 - filled))) 2>/dev/null || true)
68+
printf " %-22s %s %s/%s\n" "$label" "$bar" "$score" "$max" >&2
69+
done
70+
echo "" >&2
71+
echo "Report written to $OUTPUT" >&2

0 commit comments

Comments
 (0)