feat(skills): add per-skill evals and improve description trigger phrases by DataBoyTX · Pull Request #23 · graphistry/graphistry-skills

DataBoyTX · 2026-04-27T18:48:21Z

Summary

All 8 user-facing skills were missing per-skill evals; each now has evals/evals.json: graphistry, pygraphistry, pygraphistry-ai, pygraphistry-connectors, pygraphistry-core, pygraphistry-gfql, pygraphistry-visualization, graphistry-rest-api. Each file has 3 positive test cases and 1 negative boundary case, with typed assertions (contains/negative).
Description frontmatter updated for all 8 skills: passive "Use for..." phrasing is replaced with explicit quoted trigger phrases ("Use when asked to...", "Also triggers on..."), and proactive-suggest clauses are added. The goal, per skill-creator best practices, is to reduce undertriggering, where a skill is not invoked for requests it should handle.
Audit report added at docs/skill-evals-audit-2026-04.md with full findings, priority matrix, and remaining work.
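A minimal sketch of what one skill's evals file and a structural check for it could look like. The schema below (a top-level `evals` list whose cases carry `kind`, `prompt`, and `assertions` with `type`/`value`) is a hypothetical illustration, not the exact skill-creator format, and `validate_evals` / `check_response` are hypothetical helper names. It encodes the 3 positive + 1 negative convention, assuming `contains` means "the response must include the value" and `negative` means "the response must not include it".

```python
from __future__ import annotations

import json

# Hypothetical evals.json for one skill. Field names are illustrative
# assumptions, not the skill-creator schema.
EXAMPLE_EVALS = """
{
  "skill": "pygraphistry-gfql",
  "evals": [
    {"id": 1, "kind": "positive",
     "prompt": "Write a GFQL query that returns 2-hop neighbors of node 'a'.",
     "assertions": [{"type": "contains", "value": "gfql"}]},
    {"id": 2, "kind": "positive",
     "prompt": "Filter my graph to edges where amount > 100 using GFQL.",
     "assertions": [{"type": "contains", "value": "gfql"}]},
    {"id": 3, "kind": "positive",
     "prompt": "Translate this Cypher MATCH pattern into GFQL.",
     "assertions": [{"type": "contains", "value": "gfql"}]},
    {"id": 4, "kind": "negative",
     "prompt": "Convert this CSV file into a Markdown table.",
     "assertions": [{"type": "negative", "value": "gfql"}]}
  ]
}
"""

# Assumed assertion vocabulary, taken from the "contains/negative" wording.
ALLOWED_ASSERTION_TYPES = {"contains", "negative"}


def validate_evals(data: dict) -> list[str]:
    """Return a list of problems; an empty list means the structure looks right."""
    cases = data.get("evals")
    if not isinstance(cases, list):
        return ["'evals' must be a list"]
    problems: list[str] = []

    # Enforce the 3 positive + 1 negative boundary case convention.
    kinds = [case.get("kind") for case in cases]
    if kinds.count("positive") != 3:
        problems.append(f"expected 3 positive cases, found {kinds.count('positive')}")
    if kinds.count("negative") != 1:
        problems.append(f"expected 1 negative case, found {kinds.count('negative')}")

    # Every case needs a prompt and at least one typed assertion.
    for case in cases:
        cid = case.get("id")
        if not case.get("prompt"):
            problems.append(f"case {cid}: missing prompt")
        assertions = case.get("assertions") or []
        if not assertions:
            problems.append(f"case {cid}: no assertions")
        for assertion in assertions:
            if assertion.get("type") not in ALLOWED_ASSERTION_TYPES:
                problems.append(f"case {cid}: unknown assertion type {assertion.get('type')!r}")
    return problems


def check_response(assertion: dict, response: str) -> bool:
    """Apply one typed assertion to a model response (case-insensitive substring match)."""
    needle = assertion["value"].lower()
    haystack = response.lower()
    if assertion["type"] == "contains":
        return needle in haystack
    if assertion["type"] == "negative":
        return needle not in haystack
    raise ValueError(f"unknown assertion type: {assertion['type']}")
```

Running `validate_evals(json.loads(EXAMPLE_EVALS))` on the sample returns an empty list; dropping a positive case or using an unrecognized assertion type produces a descriptive problem string instead. Substring matching is the simplest possible grader and is chosen here only to keep the sketch short.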

What this does NOT include (deferred to follow-up)

P4 (gfql tier sizing): pygraphistry-gfql is 232 lines and could move its quick-reference tables into an examples/ directory. Low risk, but needs a separate review.
P5 (description optimization loop): run scripts/run_loop.py against trigger evals for the two router skills (graphistry, pygraphistry) to optimize their descriptions automatically. Requires the claude CLI and a longer iteration cycle.

Test plan

python3 scripts/ci/validate_skills.py passes (18 skills validated; confirmed locally)
Spot-check that each skill's eval JSON is well-formed
Review the audit report at docs/skill-evals-audit-2026-04.md for completeness
Optional: run one skill through the skill-creator eval loop to verify that the assertion format is recognized
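A minimal sketch of the spot-check, assuming a hypothetical layout of `<skills_root>/<skill-name>/SKILL.md` with `evals/evals.json` beside it. `spot_check` and `frontmatter` are hypothetical helpers, not part of `scripts/ci/validate_skills.py`. The sketch only confirms that each evals file parses as JSON and that the frontmatter `description` mentions "Use when"; the description test is a rough text heuristic, not a YAML parser.

```python
from __future__ import annotations

import json
from pathlib import Path


def frontmatter(text: str) -> str:
    """Return the text between the opening and closing '---' lines, or '' if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i])
    return ""  # opening '---' without a closing one


def spot_check(skills_root: Path, only: set[str] | None = None) -> dict[str, list[str]]:
    """Map each skill name to the problems found; an empty list means the skill passed."""
    report: dict[str, list[str]] = {}
    for skill_md in sorted(skills_root.glob("*/SKILL.md")):
        name = skill_md.parent.name
        if only is not None and name not in only:
            continue
        problems: list[str] = []

        # Heuristic: look for "Use when" anywhere after the description key.
        fm = frontmatter(skill_md.read_text(encoding="utf-8"))
        desc_at = fm.find("description:")
        if desc_at == -1:
            problems.append("frontmatter has no description")
        elif "use when" not in fm[desc_at:].lower():
            problems.append("description lacks 'Use when' trigger phrasing")

        # Well-formedness only: the file must exist and parse as JSON.
        evals_path = skill_md.parent / "evals" / "evals.json"
        if not evals_path.is_file():
            problems.append("missing evals/evals.json")
        else:
            try:
                json.loads(evals_path.read_text(encoding="utf-8"))
            except json.JSONDecodeError as exc:
                problems.append(f"evals.json is not valid JSON: {exc}")

        report[name] = problems
    return report
```

Assuming the skills live under `skills/`, a call such as `spot_check(Path("skills"), only={"graphistry", "pygraphistry"})` restricts the report to the named skills, which avoids flagging internal skills that intentionally have no evals.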

…ases. All 8 user-facing skills were missing evals/evals.json. Each now has 3 positive test cases and 1 negative boundary case with assertions, following skill-creator best practices. Also updated the description frontmatter for all 8 skills, replacing passive "Use for..." phrasing with explicit quoted trigger phrases, secondary trigger patterns, and proactive-suggest clauses to reduce undertriggering. Audit report and priority matrix added to docs/skill-evals-audit-2026-04.md. Skills updated: graphistry, pygraphistry, pygraphistry-ai, pygraphistry-connectors, pygraphistry-core, pygraphistry-gfql, pygraphistry-visualization, graphistry-rest-api.

DataBoyTX wants to merge 1 commit into main from feat/skill-evals-audit
