
Add skill coverage measurement tool #509

Draft
Evangelink wants to merge 1 commit into main from dev/amauryleve/skill-coverage-tool

@Evangelink
Member

Introduce Measure-SkillCoverage.ps1 under eng/skill-coverage/, a script that measures how much of a SKILL.md file's teaching content is verified by eval.yaml test scenarios. It is analogous to code coverage, but for skill files.

The script extracts testable 'coverage points' from four kinds of skill-file content (the Validation checklist, Common Pitfalls, and Workflow Steps sections, plus code patterns from fenced code blocks), then cross-references them against eval.yaml assertions and rubric criteria using regex matching and keyword-overlap heuristics.
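
The two stages described above (extract points, then match them against eval text) could look roughly like the following minimal Python sketch. It is not the PowerShell implementation: the heading prefixes, the list-item regex, the stopword list, the 0.5 overlap threshold and the names extract_points, keywords and is_covered are all hypothetical, and the regex-matching path the script also uses is left out.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical heading prefixes; the real script's section detection may differ.
SECTIONS = {
    "validation": "validation",
    "pitfalls": "common pitfalls",
    "workflow": "workflow",
}
FENCES = ("`" * 3, "~" * 3)  # fenced code block markers
STOPWORDS = {"that", "this", "with", "from", "into", "when", "your", "should", "must", "will", "have"}
# Bullet ("- x", "* x"), task ("- [ ] x") or numbered ("1. x") list item.
ITEM_RE = re.compile(r"^\s*(?:[-*]\s+(?:\[[ xX]\]\s+)?|\d+\.\s+)(.+)$")


@dataclass
class CoveragePoint:
    category: str  # "validation", "pitfalls", "workflow" or "code"
    text: str
    line: int      # 1-based line number in SKILL.md


def extract_points(skill_md: str) -> list[CoveragePoint]:
    """Collect list items under known headings plus the first line of every fenced block."""
    points: list[CoveragePoint] = []
    category = None
    in_fence = captured = False
    for lineno, line in enumerate(skill_md.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(FENCES):
            in_fence, captured = not in_fence, False
            continue
        if in_fence:
            # Treat the first non-blank line of a code block as its "code pattern".
            if stripped and not captured:
                points.append(CoveragePoint("code", stripped, lineno))
                captured = True
            continue
        if stripped.startswith("#"):
            title = stripped.lstrip("#").strip().lower()
            category = next((c for c, h in SECTIONS.items() if title.startswith(h)), None)
            continue
        match = ITEM_RE.match(line)
        if category and match:
            points.append(CoveragePoint(category, match.group(1).strip(), lineno))
    return points


def keywords(text: str) -> set[str]:
    """Lower-cased words of four or more characters, minus a small stopword list."""
    return {w for w in re.findall(r"[a-z0-9_]+", text.lower()) if len(w) >= 4 and w not in STOPWORDS}


def is_covered(point: CoveragePoint, eval_texts: list[str], min_overlap: float = 0.5) -> bool:
    """Covered if any eval text (assertion or rubric line) shares >= min_overlap of the point's keywords."""
    kw = keywords(point.text)
    if not kw:
        return False
    return any(len(kw & keywords(t)) / len(kw) >= min_overlap for t in eval_texts)
```

Keyword overlap is deliberately fuzzy: "Build succeeds without warnings" would count as covered by a rubric line such as "The build succeeds with no warnings", while a point whose keywords never appear in any assertion or rubric text stays uncovered. The threshold trades false positives against false negatives and would need tuning against real skills.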

Features:

  • Per-skill and aggregate (-All) analysis across all plugins
  • Table output with color-coded pass/fail status and per-category percentages
  • JSON output for machine consumption (single object or array)
  • Distinguishes coverage by deterministic assertions from rubric-only coverage
  • MinCoverage threshold for CI gating
  • Records source line numbers for coverage points in every category
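
As a rough illustration of how the reporting features could fit together (per-category percentages, the deterministic versus rubric-only split, JSON output and a MinCoverage-style gate), here is a hypothetical Python sketch. PointResult, summarize, gate and the JSON field names are assumptions for illustration only, not the script's actual types or output schema.

```python
from __future__ import annotations

import json
import sys
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PointResult:
    category: str       # e.g. "validation", "pitfalls", "workflow", "code"
    line: int           # source line in SKILL.md
    by_assertion: bool  # matched by a deterministic eval.yaml assertion
    by_rubric: bool     # matched by a rubric criterion (grader-judged)


def summarize(skill: str, results: list[PointResult]) -> dict:
    """Overall and per-category coverage (%) plus the deterministic vs rubric-only split."""
    per_cat: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # category -> [covered, total]
    for r in results:
        per_cat[r.category][1] += 1
        if r.by_assertion or r.by_rubric:
            per_cat[r.category][0] += 1
    total = len(results)
    covered = sum(cv for cv, _ in per_cat.values())
    return {
        "skill": skill,
        # A skill with no coverage points reports 0% (and so fails any positive threshold).
        "coverage": round(100 * covered / total, 1) if total else 0.0,
        "deterministic": sum(r.by_assertion for r in results),
        "rubricOnly": sum(r.by_rubric and not r.by_assertion for r in results),
        "categories": {c: round(100 * cv / t, 1) for c, (cv, t) in sorted(per_cat.items())},
    }


def gate(summaries: list[dict], min_coverage: float) -> int:
    """Return a process exit code: 1 if any skill falls below min_coverage, else 0."""
    failing = [s["skill"] for s in summaries if s["coverage"] < min_coverage]
    for name in failing:
        print(f"FAIL {name}: coverage below {min_coverage}%", file=sys.stderr)
    return 1 if failing else 0


if __name__ == "__main__":
    demo = [
        PointResult("validation", 12, True, False),
        PointResult("validation", 13, False, True),
        PointResult("pitfalls", 30, False, False),
        PointResult("code", 45, True, True),
    ]
    summary = summarize("example-skill", demo)
    print(json.dumps(summary, indent=2))
    print("exit code:", gate([summary], min_coverage=50))
```

The demo reports 75.0% overall (3 of 4 points), with 2 points covered deterministically and 1 covered only by a rubric, so a 50% threshold passes. Keeping rubric-only coverage separate matters because rubric criteria are judged by a grader rather than checked mechanically. A CI wrapper would pass the return value of gate to sys.exit; how the real script signals a threshold failure is not stated in this PR.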

cc @ViktorHofer @JanKrivanek