feat: improve skill score for generate-agent-skills #45

Open
yogesh-tessl wants to merge 2 commits into canonical:main from yogesh-tessl:improve/skill-review-optimization

Conversation

@yogesh-tessl

Hey @srbouffard 👋

The collections model is a great idea. Being able to subscribe a repo to a set of skills and instructions and keep them synced automatically solves a real pain point when you're managing dozens of repos.

I ran your skills through `tessl skill review` at work and found some targeted improvements. Here's the before/after:

| Skill | Before | After | Change |
|-------|--------|-------|--------|
| generate-agent-skills | 65% | 87% | +22% |

Changes made to generate-agent-skills
  • Rewrote description to include concrete actions (creates/updates SKILL.md files, scaffolds directories, validates structure) and an explicit "Use when..." clause with natural trigger terms (new agent skill, skill.md, skill definitions, skill templates)
  • Removed verbose emoji warning blocks (🚨, ⚠️, 🛑) and replaced with concise mandatory constraints - same rules, 75% fewer tokens
  • Condensed Step 2 from 40+ lines to ~5 lines by removing inline resource type taxonomy (already covered in references/BEST_PRACTICES.md) and pointing there instead
  • Simplified Step 3 scaffolding section - removed redundant verification checkpoint and stop conditions, kept the essential command and naming regex
  • Tightened Step 4 content generation - removed explanations that duplicate reference files, kept frontmatter guidance and structure pattern references
  • Consolidated Step 5 validation - replaced 25-line checklist with concise confirmation criteria
  • Streamlined Step 6 testing - replaced verbose problem/solution pairs with compact iteration checklist
  • Added Reference Index table at bottom for quick navigation to all 5 reference files

Net result: 357 → 138 lines (61% reduction) while preserving all workflow steps, validation checkpoints, and reference file pointers.
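The figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: the helper name `percent_reduction` is made up for illustration, and it reads the "+22%" score change as a difference in percentage points (87 minus 65), which is an interpretation rather than something the review tool states.

```python
def percent_reduction(before: int, after: int) -> float:
    """Return the drop from `before` to `after` as a percentage of `before`."""
    return (before - after) / before * 100


# Line counts quoted in the PR description.
line_reduction = percent_reduction(357, 138)

# Review scores quoted in the table; the difference is in percentage points.
score_gain_points = 87 - 65

print(f"{line_reduction:.1f}% fewer lines")
print(f"+{score_gain_points} percentage points")
```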


Quick honest disclosure: I work at https://github.com/tesslio, where we build tooling around skills like these. Not a pitch; I just saw room for improvement and wanted to contribute.

If you want to self-improve your skills, or define your own scenarios to pressure test, just ask your agent (Claude Code, Codex, etc.) to evaluate and optimize your skill with Tessl. Ping me, @yogesh-tessl, if you hit any snags.

@srbouffard

Collaborator

Hey @yogesh-tessl!
Thanks so much for taking the time to run this through your tooling and write up a detailed before/after breakdown. I really appreciate the transparency around your affiliation too. I've been wondering about integrating some efficiency testing of the skills for some time now.

I've cherry-picked the two changes I think are clear wins, the improved description frontmatter and the Reference Index table, into #46.

For the broader rewrite, I want to take a closer look at the tradeoffs rather than land it all at once. Some of the removed content (STOP CONDITIONS, post-validation checklists, the script-vs-checklist decision tree) is there intentionally as LLM behavioral anchoring, and I want to find the right balance between the token optimization your tool identified and the robustness we're aiming for. Your PR is a great reference point for that conversation, so I'll be keeping it open as I work through it.

Thanks again for the contribution! 🙏

@srbouffard srbouffard left a comment

Collaborator

Thanks again for this @yogesh-tessl,
I've left a few comments on edits I think would be important to keep.

Also, the repo requires signed commits. Could you make sure to update the PR commits (how to)?
It's also important that you sign the CLA!

→ Use **scripts** for deterministic execution
→ Examples: Schema validation, API calls, file format conversion

**Real example from this session:**

Collaborator

The `analyze_repo.py` → `analysis_checklist.md` real-world example is a great anchor for LLMs that tend to over-script analysis tasks. Would you be open to keeping that inline example and pointing to BEST_PRACTICES.md §6 after it rather than replacing it? That keeps the token saving on the taxonomy lists while preserving the reasoning anchor.

- If `SKILL.md` does NOT exist → Scaffolding failed, do NOT proceed
- If you created files manually → You have violated the workflow, DELETE and re-run script
- If the script reported errors → Fix errors before proceeding to Step 4
Validates naming (`^[a-z0-9][a-z0-9-]*[a-z0-9]$`), creates the directory under `.github/skills/`, and generates SKILL.md with placeholders. Verify with `ls -la .github/skills/<skill-name>/` before proceeding.

Collaborator

Would you be able to add back a few explicit stop conditions here? Something like: "If SKILL.md does NOT exist after running the script → do NOT proceed." These act as LLM behavioral anchors; they differ from documentation in that they halt the agent on failure rather than just inform it.
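For comparison, the same "halt rather than inform" idea can be written as a programmatic gate. The sketch below is illustrative only and is not the repository's scaffolding script: `StopCondition` and `check_scaffold` are hypothetical names, and the only details taken from the diff are the naming regex and the `.github/skills/<skill-name>/SKILL.md` location.

```python
import re
import tempfile
from pathlib import Path

# Naming pattern quoted in the PR diff; note it needs at least two characters.
NAME_PATTERN = re.compile(r"^[a-z0-9][a-z0-9-]*[a-z0-9]$")


class StopCondition(RuntimeError):
    """Hypothetical exception: raised to halt the workflow, not just warn."""


def check_scaffold(repo_root: Path, skill_name: str) -> Path:
    """Return the path to SKILL.md, or raise StopCondition if a gate fails."""
    if not NAME_PATTERN.fullmatch(skill_name):
        raise StopCondition(f"invalid skill name {skill_name!r}: do NOT proceed")
    skill_md = repo_root / ".github" / "skills" / skill_name / "SKILL.md"
    if not skill_md.is_file():
        raise StopCondition(f"{skill_md} does not exist: do NOT proceed")
    return skill_md


# Demo in a throwaway directory standing in for a repository checkout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    try:
        check_scaffold(root, "my-skill")  # nothing scaffolded yet
    except StopCondition as exc:
        print(f"halted: {exc}")

    skill_dir = root / ".github" / "skills" / "my-skill"
    skill_dir.mkdir(parents=True)
    (skill_dir / "SKILL.md").write_text("placeholder content\n")
    print(check_scaffold(root, "my-skill").name)
```

Raising an exception mirrors the stop condition: later steps cannot run unless the caller deliberately handles the failure, whereas a logged warning would leave the decision to continue open.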

Comment thread skills/generate-agent-skills/SKILL.md Outdated

**🛑 STOP CONDITION:**
If you did NOT run the scaffolding script or manually created files, STOP and re-do from Step 3.
Fix critical errors before proceeding. Confirm scaffolding script was used, frontmatter includes a "Use when..." clause, and no placeholder files remain.

Collaborator

The original checklist here served as a self-compliance gate the model runs against itself. Would you consider a trimmed checklist of the 3–5 most critical items (e.g. ran scaffold script, frontmatter has "Use when...", no placeholder files remain)?
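A trimmed gate covering the three example items might look like the following sketch. It rests on assumptions not confirmed by the PR: the function names are hypothetical, the frontmatter is assumed to sit between leading `---` lines, whether the scaffold script ran is passed in as a flag because it cannot be inferred from the file alone, and placeholder files are assumed to be recognizable by a marker in their name.

```python
def frontmatter_text(skill_md: str) -> str:
    """Return the raw text between the leading '---' lines, or '' if absent."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return "\n".join(lines[1:i])
    return ""  # opening fence without a closing one


def compliance_failures(
    skill_md: str,
    file_names: list[str],
    ran_scaffold_script: bool,
    placeholder_marker: str = "placeholder",  # assumed naming convention
) -> list[str]:
    """Return the checklist items that failed; an empty list means pass."""
    failures = []
    if not ran_scaffold_script:
        failures.append("scaffolding script was not run")
    if "use when" not in frontmatter_text(skill_md).lower():
        failures.append('frontmatter lacks a "Use when..." clause')
    leftovers = [n for n in file_names if placeholder_marker in n.lower()]
    if leftovers:
        failures.append(f"placeholder files remain: {leftovers}")
    return failures


sample = (
    "---\n"
    "name: demo-skill\n"
    "description: Creates SKILL.md files. Use when adding a new agent skill.\n"
    "---\n"
    "# Demo\n"
)
print(compliance_failures(sample, ["SKILL.md"], ran_scaffold_script=True))
```

Returning the list of failed items, rather than a single boolean, lets the agent report exactly which item blocked progress.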

- **Best practices** (context economy, freedom scale): `references/BEST_PRACTICES.md`
- **Templates** (frontmatter, structure patterns): `references/TEMPLATES.md`
- **Workflows** (sequential, conditional, iterative): `references/workflows.md`
- **Output patterns** (templates, validation checklists): `references/output-patterns.md`

Collaborator

Could you add "**Do not hallucinate answers.** Always consult the authoritative sources." back after the table? It's a direct instruction to the model and meaningfully reduces fabricated references.

Hey @srbouffard 👋

I ran your skills through `tessl skill review` at work and found some targeted improvements. Here's the full before/after:

| Skill | Before | After | Change |
|-------|--------|-------|--------|
| documentation-build | 94% | 94% | — |
| documentation-diataxis | 85% | 85% | — |
| documentation-review | 88% | 88% | — |
| documentation-structure | 79% | 79% | — |
| documentation-style | 94% | 94% | — |
| documentation-verify | 94% | 94% | — |
| **generate-agent-skills** | **65%** | **87%** | **+22%** |
| generate-agent | 82% | 82% | — |
| generate-path-instructions | 88% | 88% | — |
| generate-prompt | 78% | 78% | — |
| generate-repo-instructions | 81% | 81% | — |
| migrate-harness-tests-to-state-transition-test | 75% | 75% | — |
| retrospective-artifacts | 88% | 88% | — |
| landscape-jira | 79% | 79% | — |

<details>
<summary>Changes made to <code>generate-agent-skills</code></summary>

- **Rewrote description** to include concrete actions (creates/updates SKILL.md files, scaffolds directories, validates structure) and an explicit "Use when..." clause with natural trigger terms (`new agent skill`, `skill.md`, `skill definitions`, `skill templates`)
- **Removed verbose emoji warning blocks** (🚨, ⚠️, 🛑) and replaced with concise mandatory constraints — same rules, 75% fewer tokens
- **Condensed Step 2** from 40+ lines to ~5 lines by removing inline resource type taxonomy (already covered in `references/BEST_PRACTICES.md`) and pointing there instead
- **Simplified Step 3** scaffolding section — removed redundant verification checkpoint and stop conditions, kept the essential command and naming regex
- **Tightened Step 4** content generation — removed explanations that duplicate reference files, kept frontmatter guidance and structure pattern references
- **Consolidated Step 5** validation — replaced 25-line checklist with concise confirmation criteria
- **Streamlined Step 6** testing — replaced verbose problem/solution pairs with compact iteration checklist
- **Added Reference Index** table at bottom for quick navigation to all 5 reference files

Net result: 357 → 138 lines (61% reduction) while preserving all workflow steps, validation checkpoints, and reference file pointers.

</details>

I also stress-tested your `documentation-style` skill against a few real-world task evals and it held up really well on MyST/reST syntax validation with cross-referenced style guide citations. Kudos for that.

Honest disclosure — I work at @tesslio where we build tooling around skills like these. Not a pitch — just saw room for improvement and wanted to contribute.

Want to self-improve your skills? Just point your agent (Claude Code, Codex, etc.) at [this Tessl guide](https://docs.tessl.io/evaluate/optimize-a-skill-using-best-practices) and ask it to optimize your skill. Ping me — [@yogesh-tessl](https://github.com/yogesh-tessl) — if you hit any snags.

Thanks in advance 🙏
Address review comments on PR canonical#45 while keeping the token savings:

- Step 2: restore the analyze_repo.py -> analysis_checklist.md reasoning
  anchor inline, pointing to BEST_PRACTICES.md §6 for the full flowchart
- Step 3: restore explicit STOP CONDITIONS (halt on scaffold failure /
  manual file creation / script errors)
- Step 5: restore a trimmed self-compliance checklist (ran scaffold
  script, no critical errors, "Use when..." clause, no placeholders)
- Reference Index: restore "Do not hallucinate answers" instruction

These act as behavioral anchors that halt the agent on failure rather
than merely inform it.
@yogesh-tessl yogesh-tessl force-pushed the improve/skill-review-optimization branch from 5e1dc12 to 95459fa on June 3, 2026 09:42
@yogesh-tessl

Author

@srbouffard I've pushed another commit addressing all the comments. Hope that helps.

Also, CLA signing for Tessl is in progress. Thanks!

@yogesh-tessl

Author

@DamianReeves The CLA has been signed by my organisation, but it doesn't seem to have been reflected here yet.

Could you help me check if there's anything on my end that I might be missing or need to do?

[Image: Screenshot 2026-06-15 at 2 23 04 PM]
