Releases: joeynyc/skillscore
v2.0.2: Anthropic-Aligned Rubric Redesign
What's New since v1.2.1
Breaking: New Scoring Rubric (v2.0.0)
- Completely redesigned scoring to align with Anthropic's official skill documentation
- New rubric categories reflect real-world skill quality signals
Fixes
- Excluded `plm-cli/` from npm package (v2.0.1)
- Removed dead code, simplified async, fixed `scoreScope` budget
- Fixed incorrect `test:ui` command in README
Docs
- Updated README with real-world scores, working GitHub examples, and API field docs
- Full changelog: v1.2.1...v2.0.2
v1.2.1: Slim package
Added .npmignore to exclude source, tests, and assets from the published npm package.
- Package size: 4MB → 41KB (99% reduction)
- Published files: 35 (dist + README + LICENSE only)
Full Changelog: v1.2.0...v1.2.1
v1.2.0: CLI Quality Improvements
What's New
CLI Improvements
- Variadic paths: `skillscore <path...>` now accepts multiple paths natively, auto-entering batch mode when 2+ paths are given
- Explicit GitHub flag: Shorthand GitHub references (e.g. `user/repo/skill`) now require the `-g`/`--github` flag, eliminating false positives from local paths that look like GitHub shorthands
- Testable error handling: A `CliError` class replaces raw `process.exit(1)` calls, enabling programmatic testing and better error messages
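The testable error handling pattern can be sketched roughly as follows. Only the `CliError` name comes from the release notes; the `exitCode` field and the `requirePaths` and `runCli` helpers are hypothetical and are not taken from skillscore's source.

```typescript
// Hypothetical shape of a CLI error type; only the name CliError is from the release notes.
class CliError extends Error {
  readonly exitCode: number;

  constructor(message: string, exitCode: number = 1) {
    super(message);
    this.name = "CliError";
    this.exitCode = exitCode;
    // Keep instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, CliError.prototype);
  }
}

// Illustrative argument check: throws instead of terminating the process.
function requirePaths(args: string[]): string[] {
  const paths = args.filter((arg) => !arg.startsWith("-"));
  if (paths.length === 0) {
    throw new CliError("No skill path provided");
  }
  return paths;
}

// Single place that converts errors into an exit code.
// A real entry point would hand the returned code to process.exit().
function runCli(args: string[]): { code: number; message: string } {
  try {
    const paths = requirePaths(args);
    return { code: 0, message: `Scoring ${paths.length} path(s)` };
  } catch (err) {
    if (err instanceof CliError) {
      return { code: err.exitCode, message: err.message };
    }
    throw err; // Unexpected errors still surface as crashes.
  }
}

console.log(runCli([]));
console.log(runCli(["./my-skill/"]));
```

Because `runCli` returns a value instead of exiting, a test can call it directly and assert on the code and message without spawning a child process.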
Architecture
- Shared Reporter interface: All three reporters (`TerminalReporter`, `JsonReporter`, `MarkdownReporter`) now implement a common `Reporter` interface, exported for programmatic use
- Weight validation: Runtime assertion ensures scoring category weights always sum to 1.0, catching drift immediately on module load
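A load-time weight check of this kind might look like the sketch below. The weights are those listed in the v1.0.0 scoring table, which the v1.1.0 notes say were left unchanged. The `CATEGORY_WEIGHTS` and `assertWeightsSumToOne` names and the floating-point tolerance are assumptions, not skillscore's actual code.

```typescript
// Weights from the v1.0.0 scoring table, expressed as fractions.
const CATEGORY_WEIGHTS: Record<string, number> = {
  safety: 0.2,
  clarity: 0.2,
  structure: 0.15,
  dependencies: 0.1,
  errorHandling: 0.1,
  scope: 0.1,
  documentation: 0.1,
  portability: 0.05,
};

// Throws if the weights drift away from a total of 1.0.
// The tolerance is an assumption: sums of decimal fractions are not exact in floating point.
function assertWeightsSumToOne(
  weights: Record<string, number>,
  epsilon: number = 1e-9,
): void {
  const total = Object.values(weights).reduce((sum, w) => sum + w, 0);
  if (Math.abs(total - 1) > epsilon) {
    throw new Error(`Category weights sum to ${total}, expected 1.0`);
  }
}

// Running the check at module load makes a bad edit fail immediately.
assertWeightsSumToOne(CATEGORY_WEIGHTS);
```

Comparing against a tolerance rather than using `total === 1` avoids false alarms from rounding while still catching a weight that was changed without rebalancing the others.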
Testing
- 6 new programmatic CLI tests — End-to-end tests covering error paths, valid skills, JSON output, verbose mode, and batch mode (60 total tests)
Docs
- Updated README with new CLI options, `-g` flag examples, and `Reporter` interface in API usage section
Full Changelog: v1.1.0...v1.2.0
v1.1.0 — Routing, Templates & Containment Scoring
What's New
5 new scoring sub-criteria inspired by OpenAI's Skills + Shell + Compaction blog and production data from Glean:
🎯 Scope — Routing Quality
- Negative routing examples (2pts) — Does the skill say when not to use it? Glean found triggering dropped ~20% without these.
- Routing-quality description (1pt) — Concrete routing signals (tool names, I/O, "use when") vs vague marketing copy.
📄 Documentation — Embedded Templates
- Embedded templates & worked examples (2pts) — Real output templates inside the skill, not just descriptions. Glean reported biggest quality gains from this pattern.
🔒 Safety — Network Containment
- Network containment (1pt) — If a skill uses HTTP/curl/fetch, does it mention allowlists or scoping? Flags the tools + networking combo risk.
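As a rough illustration of how a deterministic check like this could work, the sketch below scans skill text for networking keywords and then for containment language. The keyword lists, the `scoreNetworkContainment` name, and the choice to award the point when no networking is detected are all assumptions; the release notes do not describe the actual patterns.

```typescript
// Hypothetical keyword patterns; not skillscore's actual rules.
const NETWORK_USE = /\b(?:curl|wget|fetch|https?)\b/i;
const CONTAINMENT =
  /\b(?:allowlist|allow-list|allowed (?:hosts|domains)|restrict(?:ed)? to)\b/i;

type ContainmentResult = {
  usesNetwork: boolean;
  mentionsContainment: boolean;
  points: number;
};

function scoreNetworkContainment(skillText: string): ContainmentResult {
  const usesNetwork = NETWORK_USE.test(skillText);
  const mentionsContainment = CONTAINMENT.test(skillText);
  // Assumption: a skill with no detected network use keeps the point.
  const points = !usesNetwork || mentionsContainment ? 1 : 0;
  return { usesNetwork, mentionsContainment, points };
}

console.log(
  scoreNetworkContainment("Run curl against https://api.example.com and print the result."),
);
console.log(
  scoreNetworkContainment("Use curl, restricted to the allowlist of api.example.com."),
);
```

A keyword scan like this stays deterministic and needs no API keys, at the cost of missing containment rules that are phrased in unexpected ways.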
📁 Structure — Artifact Output
- Artifact output spec (1pt) — Does the skill define where outputs/artifacts go?
No Breaking Changes
- All 8 categories unchanged
- All weights unchanged
- Point rebalancing within categories only
- 56 tests passing
npm install -g skillscore@1.1.0
SkillScore v1.0.0: Initial Release
🚀 SkillScore v1.0.0
The universal quality standard for AI agent skills.
Features
- 8 weighted scoring categories: Structure, Clarity, Safety, Dependencies, Error Handling, Scope, Documentation, Portability
- Deterministic analysis — no API keys required
- GitHub URL support — score skills directly from GitHub repos
- Batch comparison mode — evaluate multiple skills side by side
- 3 output formats — Terminal (colorful), JSON, Markdown
- 56 passing tests with comprehensive coverage
Installation
npm install -g skillscore
Quick Start
# Local skill
skillscore ./my-skill/
# From GitHub
skillscore vercel-labs/skills/find-skills
# Batch compare
skillscore ./skill1 ./skill2 --batch
Scoring Methodology
| Category | Weight |
|---|---|
| Safety | 20% |
| Clarity | 20% |
| Structure | 15% |
| Dependencies | 10% |
| Error Handling | 10% |
| Scope | 10% |
| Documentation | 10% |
| Portability | 5% |
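To make the weighting concrete, the sketch below combines per-category scores into one overall score using the table above. It assumes each category is scored from 0 to 100 and that the overall score is a plain weighted sum; the release notes do not spell out the formula, and the `overallScore` name and the sample scores are made up for illustration.

```typescript
type Category =
  | "safety"
  | "clarity"
  | "structure"
  | "dependencies"
  | "errorHandling"
  | "scope"
  | "documentation"
  | "portability";

// Percent weights copied from the table above.
const WEIGHT_PERCENT: Record<Category, number> = {
  safety: 20,
  clarity: 20,
  structure: 15,
  dependencies: 10,
  errorHandling: 10,
  scope: 10,
  documentation: 10,
  portability: 5,
};

// Assumption: each category score is on a 0-100 scale and the total is a weighted sum.
function overallScore(scores: Record<Category, number>): number {
  let weighted = 0;
  for (const category of Object.keys(WEIGHT_PERCENT) as Category[]) {
    weighted += scores[category] * WEIGHT_PERCENT[category];
  }
  return weighted / 100;
}

// Made-up sample scores, for illustration only.
const sample: Record<Category, number> = {
  safety: 90,
  clarity: 80,
  structure: 100,
  dependencies: 70,
  errorHandling: 60,
  scope: 50,
  documentation: 80,
  portability: 100,
};

console.log(overallScore(sample)); // prints 80
```

With these weights, a 10-point change in Safety or Clarity moves the overall score by 2 points, while the same change in Portability moves it by only 0.5.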
Works with any SKILL.md-based skill: skills.sh, ClawHub, GitHub, or local.