feat(skills): add per-skill evals and improve description trigger phrases #23

Draft
DataBoyTX wants to merge 1 commit into main from feat/skill-evals-audit
@DataBoyTX

Summary

  • 8 skills now have evals/evals.json. All 8 user-facing skills previously lacked per-skill evals: graphistry, pygraphistry, pygraphistry-ai, pygraphistry-connectors, pygraphistry-core, pygraphistry-gfql, pygraphistry-visualization, and graphistry-rest-api. Each now has 3 positive test cases and 1 negative boundary case, with typed assertions (contains/negative).
  • Description frontmatter improved for all 8 skills. The passive "Use for..." phrasing is replaced with explicit quoted trigger phrases ("Use when asked to...", "Also triggers on..."), and each description gains a proactive-suggest clause. The goal, per skill-creator best practices, is to reduce undertriggering.
  • Audit report added at docs/skill-evals-audit-2026-04.md with full findings, priority matrix, and remaining work.
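The trigger-phrase convention described above lends itself to a mechanical check. The sketch below is a hypothetical lint, not something this PR adds: the regex, the two required cues, and the sample descriptions are assumptions drawn from the phrasing quoted in the summary, not the real frontmatter of any skill.

```python
import re

# Hypothetical heuristic based on the PR summary: descriptions should avoid the
# passive "Use for..." phrasing and include quoted trigger phrases plus the
# "Use when asked to" / "Also triggers on" cues.
PASSIVE = re.compile(r"\bUse for\b")
REQUIRED_CUES = ("Use when asked to", "Also triggers on")
QUOTED_PHRASE = re.compile(r'"[^"]+"')


def audit_description(description: str) -> list[str]:
    """Return the problems found in a skill description (empty list if none)."""
    problems = []
    if PASSIVE.search(description):
        problems.append("uses passive 'Use for...' phrasing")
    for cue in REQUIRED_CUES:
        if cue not in description:
            problems.append(f"missing trigger cue: {cue!r}")
    if not QUOTED_PHRASE.search(description):
        problems.append("no quoted trigger phrase")
    return problems


if __name__ == "__main__":
    # Invented examples, for illustration only.
    old = "Use for GFQL graph queries in PyGraphistry."
    new = ('Use when asked to "query a graph with GFQL". '
           'Also triggers on "hop", "chain", "graph pattern match".')
    print(audit_description(old))  # four problems
    print(audit_description(new))  # []
```

A substring check like this only catches the phrasing pattern; whether a description actually triggers well still has to be measured with trigger evals, as the deferred P5 item notes.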

What this does NOT include (deferred to follow-up)

  • P4 - gfql tier sizing: pygraphistry-gfql is 232 lines and could move its quick-reference tables into an examples/ directory. Low-risk, but needs a separate review.
  • P5 - description optimization loop: run scripts/run_loop.py against trigger evals for the two router skills (graphistry, pygraphistry) to auto-optimize their descriptions. Requires the claude CLI and a longer iteration cycle.

Test plan

  • python3 scripts/ci/validate_skills.py passes (18 skills validated - confirmed locally)
  • Spot-check that one eval JSON per skill is well-formed
  • Review audit report at docs/skill-evals-audit-2026-04.md for completeness
  • Optional: run one skill through the skill-creator eval loop to verify that the assertion format is recognized
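The spot-check step could be scripted roughly as follows. This is a sketch under stated assumptions. The evals.json layout shown is hypothetical: a top-level "evals" list whose cases carry "kind", "prompt", and "assertions", and each assertion has a "type" field. So is the skills/ root directory. Neither is the schema that skill-creator or validate_skills.py actually defines. Only the 3-positive/1-negative split and the contains/negative assertion types come from this PR.

```python
import json
import sys
from pathlib import Path

# Assertion types named in the PR summary.
ALLOWED_ASSERTION_TYPES = {"contains", "negative"}


def check_evals(data) -> list[str]:
    """Check one parsed evals.json against the hypothetical layout described above."""
    if not isinstance(data, dict) or not isinstance(data.get("evals"), list):
        return ["top-level 'evals' list is missing"]
    cases = [c for c in data["evals"] if isinstance(c, dict)]
    problems = []
    kinds = [case.get("kind") for case in cases]  # hypothetical "kind" field
    if kinds.count("positive") != 3:
        problems.append(f"expected 3 positive cases, found {kinds.count('positive')}")
    if kinds.count("negative") != 1:
        problems.append(f"expected 1 negative case, found {kinds.count('negative')}")
    for i, case in enumerate(cases):
        if not case.get("prompt"):
            problems.append(f"case {i}: empty prompt")
        assertions = case.get("assertions") or []
        if not assertions:
            problems.append(f"case {i}: no assertions")
        for a in assertions:
            kind = a.get("type") if isinstance(a, dict) else None
            if kind not in ALLOWED_ASSERTION_TYPES:
                problems.append(f"case {i}: unknown assertion type {kind!r}")
    return problems


def spot_check(skills_root: Path) -> dict[str, list[str]]:
    """Parse every <skill>/evals/evals.json under skills_root and collect problems."""
    report = {}
    for path in sorted(skills_root.glob("*/evals/evals.json")):
        skill = path.parent.parent.name
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            report[skill] = [f"invalid JSON: {exc}"]
            continue
        report[skill] = check_evals(data)
    return report


if __name__ == "__main__":
    root = Path(sys.argv[1] if len(sys.argv) > 1 else "skills")  # hypothetical path
    for skill, problems in spot_check(root).items():
        print(f"{skill}: {'OK' if not problems else '; '.join(problems)}")
```

Keeping the structural check separate from the file walk lets the same function run on a single file during review or on every skill in CI.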

…ases

All 8 user-facing skills were missing evals/evals.json. Each now has
3 positive test cases and 1 negative boundary case with assertions,
following skill-creator best practices.

Also updated description frontmatter for all 8 skills to replace passive
"Use for..." phrasing with explicit quoted trigger phrases, secondary trigger
patterns, and proactive-suggest clauses to reduce undertriggering.

Audit report and priority matrix added to docs/skill-evals-audit-2026-04.md.

Skills updated: graphistry, pygraphistry, pygraphistry-ai,
pygraphistry-connectors, pygraphistry-core, pygraphistry-gfql,
pygraphistry-visualization, graphistry-rest-api