feat(skills): add per-skill evals and improve description trigger phrases #23
Draft
…ases

All 8 user-facing skills were missing evals/evals.json. Each now has 3 positive test cases and 1 negative boundary case with assertions, following skill-creator best practices.

Also updated description frontmatter for all 8 skills to replace passive "Use for..." phrasing with explicit quoted trigger phrases, secondary trigger patterns, and proactive-suggest clauses to reduce undertriggering.

Audit report and priority matrix added to docs/skill-evals-audit-2026-04.md.

Skills updated: graphistry, pygraphistry, pygraphistry-ai, pygraphistry-connectors, pygraphistry-core, pygraphistry-gfql, pygraphistry-visualization, graphistry-rest-api
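The description rewrite can be spot-checked mechanically. The sketch below is a minimal illustration under stated assumptions: each skill's frontmatter sits at the top of a SKILL.md file with a single-line description: key, and the two heuristics (no leading "Use for", at least one double-quoted trigger phrase) only approximate the convention described above. extract_description and lint_description are hypothetical helpers, not part of this PR or of scripts/ci/validate_skills.py.

```python
import re
from typing import List, Optional

# Illustrative heuristics (assumptions, not the repo's actual rules).
PASSIVE_PREFIX = re.compile(r"^\s*Use for\b")
QUOTED_PHRASE = re.compile(r'"([^"]+)"')


def extract_description(skill_md: str) -> Optional[str]:
    """Return the single-line `description:` value from leading --- frontmatter, or None."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no frontmatter block at the top of the file
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter reached without a description key
        if line.startswith("description:"):
            value = line[len("description:"):].strip()
            # Strip one layer of YAML single quotes; double quotes are left alone
            # because they may be the quoted trigger phrases themselves.
            if len(value) >= 2 and value[0] == value[-1] == "'":
                value = value[1:-1]
            return value
    return None


def lint_description(description: str) -> List[str]:
    """Flag passive "Use for..." phrasing and missing quoted trigger phrases."""
    problems = []
    if PASSIVE_PREFIX.match(description):
        problems.append('starts with passive "Use for..." phrasing')
    if not QUOTED_PHRASE.search(description):
        problems.append("no quoted trigger phrases")
    return problems


if __name__ == "__main__":
    # "demo-skill" and both descriptions are made-up examples.
    before = "---\nname: demo-skill\ndescription: Use for graph queries.\n---\n"
    after = ('---\nname: demo-skill\ndescription: Use when asked to '
             '"query a graph" or "run a GFQL chain".\n---\n')
    for label, doc in (("before", before), ("after", after)):
        print(label, lint_description(extract_description(doc) or ""))
```

Multi-line YAML descriptions (folded > or literal | blocks) would need a real YAML parser; this sketch deliberately handles only the single-line case.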
Summary
- Added evals/evals.json to all 8 user-facing skills, which were previously missing per-skill evals: graphistry, pygraphistry, pygraphistry-ai, pygraphistry-connectors, pygraphistry-core, pygraphistry-gfql, pygraphistry-visualization, graphistry-rest-api. Each has 3 positive test cases + 1 negative boundary case with typed assertions (contains/negative).
- Rewrote the description frontmatter for the same 8 skills: replaced passive "Use for..." phrasing with explicit quoted trigger phrases ("Use when asked to...", "Also triggers on..."), added secondary trigger patterns, and added proactive-suggest clauses. The goal is to reduce undertriggering, per skill-creator best practices.
- Added docs/skill-evals-audit-2026-04.md with full findings, priority matrix, and remaining work.

What this does NOT include (deferred to follow-up)
- pygraphistry-gfql is 232 lines and could benefit from an examples/ dir to shed the quick-reference tables. Low-risk, but needs a separate review.
- Running scripts/run_loop.py against trigger evals for the two router skills (graphistry, pygraphistry) to auto-optimize descriptions. Requires the claude CLI and a longer iteration cycle.

Test plan
- python3 scripts/ci/validate_skills.py passes (18 skills validated, confirmed locally)
- Review docs/skill-evals-audit-2026-04.md for completeness
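The test plan only confirms that validate_skills.py passes, and the PR does not say whether that script inspects evals.json contents. Below is a minimal sketch of a structural check for the "3 positive + 1 negative boundary case" rule. It assumes a hypothetical schema in which each entry of an "evals" list has a should_trigger boolean and an "assertions" list of {"type", "value"} objects; those field names, check_evals, and check_evals_file are illustrative, and the repo's actual evals.json layout may differ.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union

# Assumed schema (hypothetical; the real evals.json layout may differ):
# {"skill_name": str,
#  "evals": [{"prompt": str,
#             "should_trigger": bool,
#             "assertions": [{"type": "contains" | "negative", "value": str}]}]}
ALLOWED_ASSERTION_TYPES = {"contains", "negative"}
EXPECTED_POSITIVE = 3
EXPECTED_NEGATIVE = 1


def check_evals(doc: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the file passes."""
    problems: List[str] = []
    cases = doc.get("evals", [])
    # Split cases by the (assumed) should_trigger flag.
    positives = [c for c in cases if c.get("should_trigger") is True]
    negatives = [c for c in cases if c.get("should_trigger") is False]
    if len(positives) != EXPECTED_POSITIVE:
        problems.append(f"expected {EXPECTED_POSITIVE} positive cases, found {len(positives)}")
    if len(negatives) != EXPECTED_NEGATIVE:
        problems.append(f"expected {EXPECTED_NEGATIVE} negative boundary case, found {len(negatives)}")
    for i, case in enumerate(cases):
        assertions = case.get("assertions", [])
        if not assertions:
            problems.append(f"case {i}: no assertions")
        for a in assertions:
            # Only the two typed assertion kinds named in the PR are accepted.
            if a.get("type") not in ALLOWED_ASSERTION_TYPES:
                problems.append(f"case {i}: unknown assertion type {a.get('type')!r}")
            if not isinstance(a.get("value"), str) or not a.get("value"):
                problems.append(f"case {i}: assertion is missing a string value")
    return problems


def check_evals_file(path: Union[str, Path]) -> List[str]:
    """Load one skill's evals/evals.json and check it."""
    return check_evals(json.loads(Path(path).read_text(encoding="utf-8")))
```

Pointed at each skill's evals/evals.json through check_evals_file, a check like this would flag any skill that drifts from the 3 + 1 layout or uses an assertion type other than contains/negative.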