feat: improve skill score for generate-agent-skills#45
|
Hey @yogesh-tessl! I've cherry-picked the two changes I think are clear wins, the improved description frontmatter and the Reference Index table, into #46. For the broader rewrite, I want to take a closer look at the tradeoffs rather than land it all at once. Some of the removed content (STOP CONDITIONS, post-validation checklists, the script-vs-checklist decision tree) is there intentionally as LLM behavioral anchoring, and I want to find the right balance between the token optimization your tool identified and the robustness we're aiming for. Your PR is a great reference point for that conversation, so I'll keep it open as I work through it. Thanks again for the contribution! 🙏 |
Thanks again for this @yogesh-tessl,
I've left a few comments on edits I think would be important to keep.
Also, the repo requires signed commits. Could you update the PR commits so that they are signed?
It's also important that you sign the CLA!
| → Use **scripts** for deterministic execution | ||
| → Examples: Schema validation, API calls, file format conversion | ||
|
|
||
| **Real example from this session:** |
The analyze_repo.py → analysis_checklist.md real-world example is a great anchor for LLMs that tend to over-script analysis tasks. Would you be open to keeping that inline example and pointing to BEST_PRACTICES.md §6 after it rather than replacing it? Keeps the token saving on the taxonomy lists while preserving the reasoning anchor.
| - If `SKILL.md` does NOT exist → Scaffolding failed, do NOT proceed | ||
| - If you created files manually → You have violated the workflow, DELETE and re-run script | ||
| - If the script reported errors → Fix errors before proceeding to Step 4 | ||
| Validates naming (`^[a-z0-9][a-z0-9-]*[a-z0-9]$`), creates the directory under `.github/skills/`, and generates SKILL.md with placeholders. Verify with `ls -la .github/skills/<skill-name>/` before proceeding. |
Would you be able to add back a few explicit stop conditions here? Something like: "If SKILL.md does NOT exist after running the script → do NOT proceed." These act as LLM behavioral anchors. They differ from documentation in that they halt the agent on failure rather than just inform it.
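To illustrate what a stop condition checks, here is a minimal sketch of a guard that could run right after scaffolding. It is hypothetical and not the repo's scaffolding script: `check_scaffold` and its messages are illustrative names, and only the naming regex and the `.github/skills/` location are taken from the diff above.

```python
import re
from pathlib import Path

# Naming rule quoted in the diff above.
SKILL_NAME_RE = re.compile(r"[a-z0-9][a-z0-9-]*[a-z0-9]")


def check_scaffold(repo_root: str, skill_name: str) -> list[str]:
    """Hypothetical guard: return stop-condition messages; an empty list means proceed."""
    problems = []
    # fullmatch enforces the ^...$ anchors of the original pattern.
    if not SKILL_NAME_RE.fullmatch(skill_name):
        problems.append(f"invalid skill name: {skill_name!r}")
    skill_md = Path(repo_root) / ".github" / "skills" / skill_name / "SKILL.md"
    if not skill_md.is_file():
        problems.append("SKILL.md does NOT exist: scaffolding failed, do NOT proceed")
    return problems
```

An agent (or a CI step) would halt whenever the returned list is non-empty, which is the "halt rather than inform" behavior described above. Note that the regex as written requires names of at least two characters.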
|
|
||
| **🛑 STOP CONDITION:** | ||
| If you did NOT run the scaffolding script or manually created files, STOP and re-do from Step 3. | ||
| Fix critical errors before proceeding. Confirm scaffolding script was used, frontmatter includes a "Use when..." clause, and no placeholder files remain. |
The original checklist here served as a self-compliance gate the model runs against itself. Would you consider a trimmed checklist of the 3–5 most critical items (e.g. ran scaffold script, frontmatter has "Use when...", no placeholder files remain)?
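As a rough illustration of one checklist item, the sketch below tests whether a SKILL.md frontmatter contains a "Use when" clause. It assumes the frontmatter is a leading block delimited by `---` lines, which this thread does not confirm, and `frontmatter_has_use_when` is a hypothetical helper rather than part of the skill.

```python
import re


def frontmatter_has_use_when(skill_md_text: str) -> bool:
    """Hypothetical check: True if the leading '---' block mentions 'Use when'."""
    # Assumption: frontmatter is the first block, fenced by '---' lines.
    match = re.match(r"---\s*\n(.*?)\n---\s*(\n|$)", skill_md_text, re.DOTALL)
    if match is None:
        return False
    return "use when" in match.group(1).lower()
```

A check like this only covers the mechanical part of the gate. The inline checklist still matters because it is the model, not a script, that reads and applies it.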
| - **Best practices** (context economy, freedom scale): `references/BEST_PRACTICES.md` | ||
| - **Templates** (frontmatter, structure patterns): `references/TEMPLATES.md` | ||
| - **Workflows** (sequential, conditional, iterative): `references/workflows.md` | ||
| - **Output patterns** (templates, validation checklists): `references/output-patterns.md` |
Could you add "**Do not hallucinate answers.** Always consult the authoritative sources." back after the table? It's a direct instruction to the model and meaningfully helps reduce fabricated references.
Hey @srbouffard 👋

I ran your skills through `tessl skill review` at work and found some targeted improvements. Here's the full before/after:

| Skill | Before | After | Change |
|-------|--------|-------|--------|
| documentation-build | 94% | 94% | - |
| documentation-diataxis | 85% | 85% | - |
| documentation-review | 88% | 88% | - |
| documentation-structure | 79% | 79% | - |
| documentation-style | 94% | 94% | - |
| documentation-verify | 94% | 94% | - |
| **generate-agent-skills** | **65%** | **87%** | **+22%** |
| generate-agent | 82% | 82% | - |
| generate-path-instructions | 88% | 88% | - |
| generate-prompt | 78% | 78% | - |
| generate-repo-instructions | 81% | 81% | - |
| migrate-harness-tests-to-state-transition-test | 75% | 75% | - |
| retrospective-artifacts | 88% | 88% | - |
| landscape-jira | 79% | 79% | - |

<details>
<summary>Changes made to <code>generate-agent-skills</code></summary>

- **Rewrote description** to include concrete actions (creates/updates SKILL.md files, scaffolds directories, validates structure) and an explicit "Use when..." clause with natural trigger terms (`new agent skill`, `skill.md`, `skill definitions`, `skill templates`)
- **Removed verbose emoji warning blocks** (🚨, ⚠️, 🛑) and replaced them with concise mandatory constraints: same rules, 75% fewer tokens
- **Condensed Step 2** from 40+ lines to ~5 lines by removing the inline resource type taxonomy (already covered in `references/BEST_PRACTICES.md`) and pointing there instead
- **Simplified Step 3** scaffolding section: removed the redundant verification checkpoint and stop conditions, kept the essential command and naming regex
- **Tightened Step 4** content generation: removed explanations that duplicate reference files, kept frontmatter guidance and structure pattern references
- **Consolidated Step 5** validation: replaced the 25-line checklist with concise confirmation criteria
- **Streamlined Step 6** testing: replaced verbose problem/solution pairs with a compact iteration checklist
- **Added Reference Index** table at bottom for quick navigation to all 5 reference files

Net result: 357 → 138 lines (61% reduction) while preserving all workflow steps, validation checkpoints, and reference file pointers.

</details>

I also stress-tested your `documentation-style` skill against a few real-world task evals and it held up really well on MyST/reST syntax validation with cross-referenced style guide citations. Kudos for that.

Honest disclosure: I work at @tesslio where we build tooling around skills like these. Not a pitch, just saw room for improvement and wanted to contribute.

Want to self-improve your skills? Just point your agent (Claude Code, Codex, etc.) at [this Tessl guide](https://docs.tessl.io/evaluate/optimize-a-skill-using-best-practices) and ask it to optimize your skill. Ping me, [@yogesh-tessl](https://github.com/yogesh-tessl), if you hit any snags.

Thanks in advance 🙏
Address review comments on PR canonical#45 while keeping the token savings:

- Step 2: restore the analyze_repo.py -> analysis_checklist.md reasoning anchor inline, pointing to BEST_PRACTICES.md §6 for the full flowchart
- Step 3: restore explicit STOP CONDITIONS (halt on scaffold failure / manual file creation / script errors)
- Step 5: restore a trimmed self-compliance checklist (ran scaffold script, no critical errors, "Use when..." clause, no placeholders)
- Reference Index: restore "Do not hallucinate answers" instruction

These act as behavioral anchors that halt the agent on failure rather than merely inform it.
5e1dc12 to
95459fa
Compare
|
@srbouffard I've pushed another commit addressing all the comments. Hope that helps. Also, the CLA signing for Tessl is in progress. Thanks! |
|
@DamianReeves The CLA has been signed by my organisation, but it doesn't seem to have been reflected here yet. Could you help me check if there's anything on my end that I might be missing or need to do?
|

Hey @srbouffard 👋
the collections model is a great idea. Being able to subscribe a repo to a set of skills and instructions and keep them synced automatically solves a real pain point when you're managing dozens of repos.
ran your skills through `tessl skill review` at work and found some targeted improvements. Here's the before/after:
Changes made to `generate-agent-skills`:
Net result: 357 → 138 lines (61% reduction) while preserving all workflow steps, validation checkpoints, and reference file pointers.
quick honest disclosure. I work at https://github.com/tesslio where we build tooling around skills like these. Not a pitch, just saw room for improvement and wanted to contribute.
If you want to self-improve your skills, or define your own scenarios to pressure test, just ask your agent (Claude Code, Codex, etc.) to evaluate and optimize your skill with Tessl. Ping me @yogesh-tessl, if you hit any snags.