WIP: Add agent-experience-eval GitHub Action

manzoorwanijk · claude · manzoorwanijk · commit 1a0348c7b4ac · 2026-03-25T15:25:55.000-04:00
Composite action that evaluates a repo's AI agent experience quality
by running the claude-md-improver skill and producing a structured
JSON report.

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
diff --git a/.github/workflows/agent-experience-eval.yml b/.github/workflows/agent-experience-eval.yml
@@ -0,0 +1,50 @@
+name: Agent Experience Evaluation
+
+on:
+  schedule:
+    - cron: '0 6 * * 1' # Weekly, Monday 6am UTC
+  pull_request: # TODO: Remove after initial testing
+    paths:
+      - 'projects/github-actions/agent-experience-eval/**'
+      - '.github/workflows/agent-experience-eval.yml'
+      - 'tools/score-agent-experience.sh'
+  push:
+    paths:
+      - 'CLAUDE.md'
+      - '**/CLAUDE.md'
+      - 'AGENTS.md'
+      - '**/AGENTS.md'
+      - '.claude/**'
+      - '**/.claude/**'
+      - '.cursorrules'
+      - '**/.cursorrules'
+      - '.cursor/**'
+      - '**/.cursor/**'
+      - '.windsurfrules'
+      - '.aider.conf.yml'
+      - '.codeiumrc'
+      - '.github/copilot-instructions.md'
+      - '.codex/**'
+  workflow_dispatch: {}
+
+jobs:
+  evaluate:
+    runs-on: ubuntu-latest
+    timeout-minutes: 10
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install Claude Code
+        run: npm install -g @anthropic-ai/claude-code
+
+      - name: Evaluate Agent Experience
+        uses: ./projects/github-actions/agent-experience-eval
+        with:
+          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
+
+      - name: Upload evaluation artifact
+        uses: actions/upload-artifact@v4
+        with:
+          name: agent-experience-eval
+          path: agent-experience-eval.json
+          retention-days: 30
diff --git a/projects/github-actions/agent-experience-eval/CHANGELOG.md b/projects/github-actions/agent-experience-eval/CHANGELOG.md
@@ -0,0 +1,10 @@
+# Changelog
+
+## 1.0.0-alpha - unreleased
+
+### Added
+- Initial release of the Agent Experience Evaluation action.
+- AI instruction file discovery (CLAUDE.md, AGENTS.md, .cursorrules, etc.).
+- Claude Code CLI-based evaluation against 6-criteria rubric.
+- JSON report generation with schema validation.
+- Artifact upload support for central collection.
diff --git a/projects/github-actions/agent-experience-eval/README.md b/projects/github-actions/agent-experience-eval/README.md
@@ -0,0 +1,117 @@
+# Agent Experience Evaluation
+
+A GitHub Action that evaluates a repository's AI agent experience quality by scoring its CLAUDE.md, AGENTS.md, and other AI instruction files against a standardized rubric.
+
+## How it works
+
+1. **Discovers** AI instruction files (CLAUDE.md, AGENTS.md, .cursorrules, .cursor/**, etc.)
+2. **Evaluates** them using the Claude Code CLI against a 6-criteria, 100-point rubric
+3. **Validates** the structured JSON report
+4. **Uploads** the report as a GitHub Actions artifact for central collection
+
+## Usage
+
+```yaml
+# .github/workflows/agent-experience-eval.yml
+name: Agent Experience Evaluation
+
+on:
+  schedule:
+    - cron: '0 6 * * 1'  # Weekly, Monday 6am UTC
+  push:
+    paths:
+      - 'CLAUDE.md'
+      - '**/CLAUDE.md'
+      - 'AGENTS.md'
+      - '**/AGENTS.md'
+      - '.claude/**'
+      - '**/.claude/**'
+      - '.cursorrules'
+      - '**/.cursorrules'
+      - '.cursor/**'
+      - '**/.cursor/**'
+  workflow_dispatch: {}
+
+jobs:
+  evaluate:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install Claude Code
+        run: npm install -g @anthropic-ai/claude-code
+
+      - name: Evaluate Agent Experience
+        uses: Automattic/action-agent-experience-eval@v1
+        with:
+          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
+```
+
+## Inputs
+
+| Input | Required | Default | Description |
+|-------|----------|---------|-------------|
+| `anthropic_api_key` | Yes | — | Anthropic API key for Claude Code CLI |
+| `max_turns` | No | `15` | Maximum turns for the CLI evaluation |
+| `output_path` | No | `agent-experience-eval.json` | Path for the evaluation JSON file |
+| `upload_artifact` | No | `true` | Upload the JSON as a GitHub Actions artifact |
+| `artifact_name` | No | `agent-experience-eval` | Name of the uploaded artifact |
+| `artifact_retention_days` | No | `30` | Days to retain the artifact |
+
+## Outputs
+
+| Output | Description |
+|--------|-------------|
+| `score` | Overall evaluation score (0-100) |
+| `grade` | Letter grade (A, B, C, D, F) |
+| `json_path` | Path to the evaluation JSON file |
+| `has_ai_files` | Whether AI instruction files were found |
+
+## Scoring Rubric
+
+| Criterion | Points | Description |
+|-----------|--------|-------------|
+| Commands/Workflows | 20 | Build, test, lint, deploy commands documented |
+| Architecture Clarity | 20 | Codebase map with directories, modules, data flow |
+| Non-obvious Patterns | 15 | Gotchas, quirks, workarounds, edge cases |
+| Conciseness | 15 | Dense, valuable content with no filler |
+| Currency | 15 | Commands work, file refs accurate, stack current |
+| Actionability | 15 | Copy-paste ready commands, concrete steps |
+
+**Grade scale:** A (90-100), B (70-89), C (50-69), D (30-49), F (0-29)
+
+## Artifact Schema
+
+The action produces a JSON artifact with this structure:
+
+```json
+{
+  "version": 1,
+  "score": 72,
+  "grade": "B",
+  "files_found": ["CLAUDE.md", ".cursor/rules/testing.mdc"],
+  "criteria": {
+    "commands_workflows":   { "score": 18, "max": 20, "notes": "..." },
+    "architecture_clarity": { "score": 15, "max": 20, "notes": "..." },
+    "non_obvious_patterns": { "score": 12, "max": 15, "notes": "..." },
+    "conciseness":          { "score": 10, "max": 15, "notes": "..." },
+    "currency":             { "score": 8,  "max": 15, "notes": "..." },
+    "actionability":        { "score": 9,  "max": 15, "notes": "..." }
+  },
+  "issues": ["..."],
+  "recommendations": ["..."]
+}
+```
+
+## AI Files Detected
+
+The action searches for:
+
+- `CLAUDE.md`, `AGENTS.md`, `AGENTS.override.md` (root and subdirectories)
+- `.cursorrules`, `.windsurfrules`, `.aider.conf.yml`, `.codeiumrc`
+- `.github/copilot-instructions.md`
+- `.claude/**`, `.cursor/**`, `.codex/**` directories
+
+## Prerequisites
+
+The Claude Code CLI (`@anthropic-ai/claude-code`) must be installed on the runner before this action runs. See the usage example above.
diff --git a/projects/github-actions/agent-experience-eval/action.yml b/projects/github-actions/agent-experience-eval/action.yml
@@ -0,0 +1,69 @@
+name: "Agent Experience Evaluation"
+description: "Evaluate a repository's AI agent experience quality by scoring its CLAUDE.md, AGENTS.md, and other AI instruction files against a standardized rubric."
+branding:
+  icon: 'award'
+  color: 'blue'
+inputs:
+  anthropic_api_key:
+    description: "Anthropic API key for Claude Code CLI"
+    required: true
+  max_turns:
+    description: "Maximum turns for the Claude Code CLI evaluation"
+    required: false
+    default: "15"
+  output_path:
+    description: "Path where the evaluation JSON file will be written"
+    required: false
+    default: "agent-experience-eval.json"
+outputs:
+  score:
+    description: "Overall evaluation score (0-100)"
+    value: ${{ steps.eval.outputs.score }}
+  grade:
+    description: "Letter grade (A, B, C, D, F)"
+    value: ${{ steps.eval.outputs.grade }}
+  json_path:
+    description: "Path to the evaluation JSON file"
+    value: ${{ steps.eval.outputs.json_path }}
+runs:
+  using: composite
+  steps:
+    - name: Run evaluation
+      id: eval
+      shell: bash
+      env:
+        ANTHROPIC_API_KEY: ${{ inputs.anthropic_api_key }}
+        MAX_TURNS: ${{ inputs.max_turns }}
+        OUTPUT_PATH: ${{ inputs.output_path }}
+        ACTION_PATH: ${{ github.action_path }}
+      run: |
+        PROMPT=$(cat "$ACTION_PATH/prompt.md")
+
+        echo "::group::Running Claude Code evaluation"
+        RAW=$(claude --print --output-format text --max-turns "$MAX_TURNS" "$PROMPT" < /dev/null 2>/dev/null)
+        echo "::endgroup::"
+
+        # Extract JSON — direct parse, code fences, or brace extraction
+        if echo "$RAW" | jq . > "$OUTPUT_PATH" 2>/dev/null; then
+          :
+        elif FENCED=$(echo "$RAW" | sed -n '/^```\(json\)\?$/,/^```$/{ /^```/d; p; }') && \
+             [[ -n "$FENCED" ]] && echo "$FENCED" | jq . > "$OUTPUT_PATH" 2>/dev/null; then
+          :
+        elif BRACED=$(echo "$RAW" | sed -n '/^{/,/^}/p') && \
+             [[ -n "$BRACED" ]] && echo "$BRACED" | jq . > "$OUTPUT_PATH" 2>/dev/null; then
+          :
+        else
+          echo "::error::Could not parse JSON from Claude output"
+          echo "${RAW:0:500}"
+          exit 1
+        fi
+
+        # Read score and grade
+        SCORE=$(jq -r '.score' "$OUTPUT_PATH")
+        GRADE=$(jq -r '.grade' "$OUTPUT_PATH")
+        echo "Score: $SCORE/100 (Grade: $GRADE)"
+
+        echo "score=$SCORE" >> "$GITHUB_OUTPUT"
+        echo "grade=$GRADE" >> "$GITHUB_OUTPUT"
+        echo "json_path=$OUTPUT_PATH" >> "$GITHUB_OUTPUT"
+
diff --git a/projects/github-actions/agent-experience-eval/prompt.md b/projects/github-actions/agent-experience-eval/prompt.md
@@ -0,0 +1,29 @@
+Run /claude-md-management:claude-md-improver to audit this repo.
+
+After the skill produces its quality report, convert the results into
+a JSON object matching this exact schema:
+
+{
+  "version": 1,
+  "score": <number 0-100, the average score from the report>,
+  "grade": "<A|B|C|D|F, from the report>",
+  "files_found": ["<paths to all AI instruction files found>"],
+  "criteria": {
+    "commands_workflows":   { "score": <0-20>, "max": 20, "notes": "<feedback from report>" },
+    "architecture_clarity": { "score": <0-20>, "max": 20, "notes": "<feedback from report>" },
+    "non_obvious_patterns": { "score": <0-15>, "max": 15, "notes": "<feedback from report>" },
+    "conciseness":          { "score": <0-15>, "max": 15, "notes": "<feedback from report>" },
+    "currency":             { "score": <0-15>, "max": 15, "notes": "<feedback from report>" },
+    "actionability":        { "score": <0-15>, "max": 15, "notes": "<feedback from report>" }
+  },
+  "issues": ["<specific problems from the report>"],
+  "recommendations": ["<specific recommendations from the report>"]
+}
+
+IMPORTANT:
+- Run the skill FIRST, then extract the data into JSON.
+- Do NOT modify any repo files. Only output the JSON.
+- If no AI instruction files exist, score 0, grade F.
+- The "score" field must equal the sum of all criteria scores.
+- The "grade" must match the score: A (90-100), B (70-89), C (50-69), D (30-49), F (0-29).
+- Output ONLY the JSON object. No markdown code fences. No text before or after.
diff --git a/tools/score-agent-experience.sh b/tools/score-agent-experience.sh
@@ -0,0 +1,71 @@
+#!/usr/bin/env bash
+#
+# Evaluate a repository's AI agent experience quality.
+# Uses the same prompt as the agent-experience-eval GitHub Action.
+#
+# Usage:
+#   ./tools/score-agent-experience.sh
+#   ./tools/score-agent-experience.sh -o report.json
+#   ./tools/score-agent-experience.sh -m 10
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROMPT_FILE="$SCRIPT_DIR/../projects/github-actions/agent-experience-eval/prompt.md"
+OUTPUT="agent-experience-eval.json"
+MAX_TURNS=15
+
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --)           shift ;;
+    -o|--output)  OUTPUT="$2"; shift 2 ;;
+    -m|--max-turns) MAX_TURNS="$2"; shift 2 ;;
+    -h|--help)
+      echo "Usage: ./tools/score-agent-experience.sh [-o output.json] [-m max-turns]"
+      exit 0 ;;
+    *) echo "Unknown option: $1" >&2; exit 1 ;;
+  esac
+done
+
+if git rev-parse --show-toplevel &>/dev/null; then
+  cd "$(git rev-parse --show-toplevel)"
+fi
+
+[[ -z "${ANTHROPIC_API_KEY:-}" ]] && { echo "ERROR: ANTHROPIC_API_KEY is not set." >&2; exit 1; }
+command -v claude &>/dev/null || { echo "ERROR: claude CLI not found." >&2; exit 1; }
+command -v jq &>/dev/null || { echo "ERROR: jq not found." >&2; exit 1; }
+
+PROMPT=$(cat "$PROMPT_FILE")
+
+echo "Running evaluation..." >&2
+RAW=$(claude --print --output-format text --max-turns "$MAX_TURNS" "$PROMPT" < /dev/null 2>/dev/null)
+
+# Extract JSON
+if echo "$RAW" | jq . > "$OUTPUT" 2>/dev/null; then
+  :
+elif FENCED=$(echo "$RAW" | sed -n '/^```\(json\)\?$/,/^```$/{ /^```/d; p; }') && \
+     [[ -n "$FENCED" ]] && echo "$FENCED" | jq . > "$OUTPUT" 2>/dev/null; then
+  :
+elif BRACED=$(echo "$RAW" | sed -n '/^{/,/^}/p') && \
+     [[ -n "$BRACED" ]] && echo "$BRACED" | jq . > "$OUTPUT" 2>/dev/null; then
+  :
+else
+  echo "ERROR: Could not parse JSON from Claude output." >&2
+  echo "${RAW:0:500}" >&2
+  exit 1
+fi
+
+# Print summary
+SCORE=$(jq -r '.score' "$OUTPUT")
+GRADE=$(jq -r '.grade' "$OUTPUT")
+echo "" >&2
+echo "Score: $SCORE/100 (Grade: $GRADE)" >&2
+jq -r '.criteria | to_entries[] | "\(.key)\t\(.value.score)\t\(.value.max)"' "$OUTPUT" |
+while IFS=$'\t' read -r key score max; do
+  label=$(echo "$key" | tr '_' ' ' | sed 's/\b\(.\)/\u\1/g')
+  filled=$(( score * 20 / max ))
+  bar=$(printf '█%.0s' $(seq 1 "$filled") 2>/dev/null || true)$(printf '░%.0s' $(seq 1 $((20 - filled))) 2>/dev/null || true)
+  printf "  %-22s %s  %s/%s\n" "$label" "$bar" "$score" "$max" >&2
+done
+echo "" >&2
+echo "Report written to $OUTPUT" >&2