feat: improve skill score for generate-agent-skills #45
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,6 @@ | ||
| --- | ||
| name: generate-agent-skills | ||
| description: Architects, generates, and validates Agent Skills. Enforces specification and best practices. Used any time an agent skill must be created or updated. | ||
| description: "Creates and updates SKILL.md files with YAML frontmatter, workflow steps, and bundle resources. Scaffolds directories via scaffold_skill.py and validates via validate_skill.py. Use when creating a new agent skill, writing a skill.md, updating skill definitions, or generating skill templates." | ||
| compatibility: python3 | ||
| allowed-tools: python3 ls grep cat mkdir | ||
| metadata: | ||
|
|
@@ -10,27 +10,13 @@ metadata: | |
|
|
||
| # Agent Skill Architect Workflow | ||
|
|
||
| This skill guides you through creating high-quality Agent Skills following a proven 6-step process. | ||
| Create high-quality Agent Skills following a 6-step process: understand → plan → scaffold → write → validate → test. | ||
|
|
||
| --- | ||
|
|
||
| ## 🚨 **CRITICAL WORKFLOW REQUIREMENTS** 🚨 | ||
|
|
||
| **Before you begin, understand these NON-NEGOTIABLE rules:** | ||
|
|
||
| 1. **You MUST run `scripts/scaffold_skill.py` in Step 3.** | ||
| Manual file creation is PROHIBITED. The scaffolding script ensures consistency. | ||
|
|
||
| 2. **You MUST use the generated templates.** | ||
| After scaffolding, templates exist in `references/`. Use them as your foundation. | ||
|
|
||
| 3. **You MUST run `scripts/validate_skill.py` in Step 5.** | ||
| Validation catches errors before they propagate. | ||
|
|
||
| 4. **You MUST follow all 6 steps in order.** | ||
| Skipping steps leads to non-compliant or broken skills. | ||
|
|
||
| **If you bypass scaffolding scripts, you have FAILED this workflow.** | ||
| **Mandatory constraints:** | ||
| 1. Run `scripts/scaffold_skill.py` in Step 3; manual file creation is prohibited. | ||
| 2. Use the generated templates from `references/` as your foundation. | ||
| 3. Run `scripts/validate_skill.py` in Step 5 before finalizing. | ||
| 4. Follow all 6 steps in order. | ||
|
|
||
| --- | ||
|
|
||
|
|
@@ -62,157 +48,54 @@ If working with an existing skill, analyze: | |
|
|
||
| ## Step 2: Planning Reusable Contents | ||
|
|
||
| Analyze the concrete examples from Step 1 to identify what **reusable resources** would help. | ||
|
|
||
| ### ⚠️ Critical Decision: Script vs. Checklist | ||
|
|
||
| **Before planning scripts, ask:** "Is this task primarily analysis or computation?" | ||
|
|
||
| **Analysis tasks** (reading, synthesizing, pattern recognition): | ||
| → Use **checklists** or **reference docs** for LLM to follow | ||
| → Examples: Repository analysis, code review, documentation synthesis | ||
|
|
||
| **Computation tasks** (math, APIs, precise transformations): | ||
| → Use **scripts** for deterministic execution | ||
| → Examples: Schema validation, API calls, file format conversion | ||
|
|
||
| **Real example from this session:** | ||
| - ❌ Initially planned `analyze_repo.py` script | ||
| - ✅ Corrected to `analysis_checklist.md` reference | ||
| - **Why:** Repository analysis = LLM strength (reading, pattern detection, synthesis) | ||
|
|
||
| **See `references/BEST_PRACTICES.md` §6 for detailed decision flowchart.** | ||
|
|
||
| --- | ||
|
|
||
| ### Ask for each example: | ||
| 1. How would I execute this task from scratch? | ||
| 2. What scripts, references, or assets would make this repeatable? | ||
| 3. **Is this analysis (LLM) or computation (script)?** | ||
| Analyze examples from Step 1 to identify reusable resources. For each, decide: analysis tasks → `references/` (checklists, patterns, domain knowledge); computation tasks → `scripts/` (math, APIs, validation); output artifacts → `assets/` (templates, images, seed data). | ||
|
|
||
| ### Resource Types: | ||
| **Real example:** A repository-analysis task was initially planned as an `analyze_repo.py` script, then corrected to an `analysis_checklist.md` reference, because repository analysis is an LLM strength (reading, pattern detection, synthesis), not deterministic computation. When in doubt, prefer a checklist over a script for analysis work. See `references/BEST_PRACTICES.md` §6 for the full decision flowchart. | ||
|
|
||
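The analysis/computation/output split described in Step 2 can be sketched as a small lookup. This is only an illustration: the `RESOURCE_DIRS` table and the `resource_dir` helper are hypothetical names and are not part of the skill's scripts.

```python
# Hypothetical routing table mirroring the three categories in Step 2.
RESOURCE_DIRS = {
    "analysis": "references",   # checklists, patterns, domain knowledge
    "computation": "scripts",   # math, APIs, validation
    "output": "assets",         # templates, images, seed data
}


def resource_dir(task_kind: str) -> str:
    """Return the bundle directory for a task kind, or raise ValueError."""
    try:
        return RESOURCE_DIRS[task_kind]
    except KeyError:
        raise ValueError(f"unknown task kind: {task_kind!r}") from None


# The repository-analysis example from the text is analysis work, so it
# lands in references/ as a checklist rather than in scripts/.
planned_file = f"{resource_dir('analysis')}/analysis_checklist.md"
```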
| **scripts/** - For deterministic operations only: | ||
| - ✅ Math/computation (calculations, aggregations) | ||
| - ✅ External interactions (API calls, database queries) | ||
| - ✅ Precise transformations (file format conversion, schema validation) | ||
| - ✅ Repetitive generation (boilerplate rendering) | ||
| - ❌ Analysis tasks (use checklists instead) | ||
| - ❌ Pattern recognition (LLM excels at this) | ||
|
|
||
| **references/** - For LLM-driven analysis and knowledge: | ||
| - ✅ Checklists for systematic analysis (e.g., repository discovery) | ||
| - ✅ Pattern libraries (e.g., positive constraint conversions) | ||
| - ✅ API documentation (endpoints, parameters) | ||
| - ✅ Domain knowledge (company policies, industry standards) | ||
| - ✅ Decision trees and workflows | ||
|
|
||
| **assets/** - For files used in output: | ||
| - ✅ Templates (documents, slides, boilerplate code) | ||
| - ✅ Images (logos, icons, diagrams) | ||
| - ✅ Fonts (typography files) | ||
| - ✅ Seed data (sample datasets, fixtures) | ||
|
|
||
| **Output:** A list of specific files to create with correct categorization (script vs reference) | ||
| **Output:** A list of specific files to create with correct categorization. | ||
|
|
||
| --- | ||
|
|
||
| ## Step 3: Skill Scaffolding | ||
|
|
||
| **⚠️ MANDATORY STEP - DO NOT SKIP ⚠️** | ||
| Run the scaffolding script (use `--simple` for SKILL.md only): | ||
|
|
||
| You MUST execute the scaffolding script. Manual file creation is PROHIBITED. | ||
|
|
||
| ### Command: | ||
| ```bash | ||
| python3 scripts/scaffold_skill.py --name <skill-name> | ||
| ``` | ||
|
|
||
| ### Options: | ||
| - **Default mode:** Creates SKILL.md + example files in scripts/, references/, assets/ | ||
| - **Simple mode:** Use `--simple` flag for minimal structure (SKILL.md only) | ||
|
|
||
| **The script will:** | ||
| - ✅ Validate naming conventions (lowercase, hyphens, alphanumeric) | ||
| - ✅ Create skill directory in .github/skills/ | ||
| - ✅ Generate SKILL.md with structuring guidance | ||
| - ✅ Create example files to demonstrate resource organization | ||
| Validates naming (`^[a-z0-9][a-z0-9-]*[a-z0-9]$`), creates the directory under `.github/skills/`, and generates SKILL.md with placeholders. Verify with `ls -la .github/skills/<skill-name>/` before proceeding. | ||
|
Collaborator
Would you be able to add back a few explicit stop conditions here? Something like: "If SKILL.md does NOT exist after running the script → do NOT proceed." These act as LLM behavioral anchors; they differ from documentation in that they halt the agent on failure rather than just informing it. |
||
|
|
||
| **Note:** The script auto-detects `.github/skills` from git root. Naming must match regex: `^[a-z0-9][a-z0-9-]*[a-z0-9]$` | ||
|
|
||
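To make the quoted naming rule concrete, here is a minimal sketch that applies the same regex. The helper name `is_valid_skill_name` is an assumption for illustration; how `scaffold_skill.py` actually performs the check is not shown in this diff.

```python
import re

# Pattern quoted in the skill text; the constant and helper names are hypothetical.
SKILL_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9-]*[a-z0-9]$")


def is_valid_skill_name(name: str) -> bool:
    """Return True when `name` satisfies the quoted naming regex."""
    return SKILL_NAME_RE.fullmatch(name) is not None
```

Two consequences of the pattern as written: a single-character name does not match, because the first and last character classes are separate positions, and consecutive hyphens inside a name are accepted.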
| --- | ||
|
|
||
| ### ✅ **Verification Checkpoint** | ||
|
|
||
| After running the scaffolding script, confirm these files exist: | ||
| ```bash | ||
| ls -la .github/skills/<skill-name>/ | ||
| ``` | ||
|
|
||
| **Expected output:** | ||
| - `SKILL.md` (with "Structuring This Skill" guidance section) | ||
| - `scripts/example.py` (placeholder script) | ||
| - `references/example_reference.md` (placeholder reference) | ||
| - `assets/README.md` (if using default mode) | ||
|
|
||
| **🛑 STOP CONDITIONS:** | ||
| - If `SKILL.md` does NOT exist → Scaffolding failed, do NOT proceed | ||
| - If you created files manually → You have violated the workflow, DELETE and re-run script | ||
| - If the script reported errors → Fix errors before proceeding to Step 4 | ||
| **Stop conditions:** | ||
| - If `SKILL.md` does NOT exist after running the script → do NOT proceed; the scaffolding failed. | ||
| - If you created files manually instead of running the script → delete them and re-run the script. | ||
| - If the script reported errors → fix them before continuing to Step 4. | ||
|
|
||
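The first stop condition can also be enforced mechanically. The sketch below assumes only what the text states, namely that a successful scaffold leaves a `SKILL.md` in the skill directory; the function names are hypothetical.

```python
import sys
from pathlib import Path


def scaffold_succeeded(skill_dir: Path) -> bool:
    """True when the scaffolded directory contains a SKILL.md file."""
    return (skill_dir / "SKILL.md").is_file()


def require_scaffold(skill_dir: Path) -> None:
    """Halt with exit code 1 when SKILL.md is missing (hypothetical helper)."""
    if not scaffold_succeeded(skill_dir):
        print(
            f"SKILL.md not found in {skill_dir}: scaffolding failed, do not proceed",
            file=sys.stderr,
        )
        raise SystemExit(1)
```

Calling `require_scaffold(Path(".github/skills/<skill-name>"))` before Step 4 turns the written stop condition into a hard failure rather than a reminder.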
| --- | ||
|
|
||
| ## Step 4: Content Generation | ||
|
|
||
| Populate the skill with actual content. | ||
|
|
||
| ### 4.1: Implement Reusable Resources First | ||
|
|
||
| Start with scripts/, references/, and assets/ identified in Step 2. | ||
|
|
||
| **For scripts:** | ||
| - Replace `scripts/example.py` with actual implementation | ||
| - Test by running: `python3 scripts/<script_name>.py` | ||
| - Ensure error messages are descriptive (print to stderr) | ||
|
|
||
| **For references:** | ||
| - Replace `references/example_reference.md` with actual docs | ||
| - Keep SKILL.md lean - move details here | ||
| - For large files (>100 lines), add Table of Contents | ||
| ### 4.1: Implement Reusable Resources | ||
|
|
||
| **For assets:** | ||
| - Add actual template files, images, fonts | ||
| - Replace or delete `assets/README.md` | ||
| - Use descriptive filenames | ||
|
|
||
| **Important:** Delete any example files you don't need! | ||
| Replace placeholder files from scaffolding with actual implementations: | ||
| - **scripts/**: Replace `example.py` with real scripts. Test each: `python3 scripts/<name>.py` | ||
| - **references/**: Replace `example_reference.md` with actual docs. Keep SKILL.md lean - move details here. | ||
| - **assets/**: Add template files, images, seed data. Delete unused placeholders. | ||
|
|
||
| ### 4.2: Write SKILL.md Content | ||
|
|
||
| Follow the structuring guidance embedded in the generated SKILL.md template. | ||
|
|
||
| **Choose your structure pattern:** | ||
| - **Workflow-Based:** Sequential processes (see `references/workflows.md`) | ||
| - **Task-Based:** Tool collections with different operations | ||
| - **Reference/Guidelines:** Standards, specifications, coding rules | ||
| - **Capabilities-Based:** Integrated systems with multiple features | ||
|
|
||
| **Key elements:** | ||
| **Frontmatter (YAML):** | ||
| - `name`: Must match directory name exactly. | ||
| - `description`: High-entropy, keyword-rich, 3rd person. Include a "Use when..." clause with trigger scenarios and concrete capabilities. | ||
| - Example: `"Processes PDF documents for form filling, text extraction, and merging. Use when working with PDF files or when user requests document manipulation tasks."` | ||
|
|
||
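As an illustration of the frontmatter guidance, the following sketch renders a `name`/`description` pair and rejects a description that lacks a "Use when" clause. `build_frontmatter` is a hypothetical helper, not something the scaffolding script is known to provide. The description is quoted with `json.dumps`, which relies on JSON strings being valid YAML double-quoted scalars.

```python
import json


def build_frontmatter(name: str, description: str) -> str:
    """Render minimal YAML frontmatter, insisting on a "Use when" clause."""
    if "use when" not in description.lower():
        raise ValueError('description should include a "Use when..." clause')
    # json.dumps escapes embedded quotes and backslashes for us.
    quoted = json.dumps(description, ensure_ascii=False)
    return f"---\nname: {name}\ndescription: {quoted}\n---\n"
```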
| 1. **Frontmatter (YAML):** | ||
| - `name`: Must match directory name exactly | ||
| - `description`: High-entropy, keyword-rich, 3rd person | ||
| - Include WHEN to use this skill (triggers) | ||
| - Include WHAT the skill does (capabilities) | ||
| - Example: "Processes PDF documents for form filling, text extraction, and merging. Use when working with PDF files or when user requests document manipulation tasks." | ||
| **Body (Markdown):** | ||
| - Use imperative form ("Run the script", not "You should run"). | ||
| - Reference scripts/references explicitly by path. | ||
| - Choose a structure pattern: workflow-based, task-based, reference/guidelines, or capabilities-based (see `references/workflows.md`). | ||
| - Consult `references/BEST_PRACTICES.md` for the Freedom Scale, `references/output-patterns.md` for output formatting. | ||
|
|
||
| 2. **Body (Markdown):** | ||
| - Use imperative/infinitive form ("Run the script", not "You should run") | ||
| - Reference scripts/references explicitly by path | ||
| - Consult `references/BEST_PRACTICES.md` for the "Freedom Scale" | ||
| - Consult `references/output-patterns.md` for output formatting | ||
|
|
||
| **Delete the "Structuring This Skill" section** when done - it's guidance only! | ||
| Delete the "Structuring This Skill" guidance section when done. | ||
|
|
||
| ### 4.3: Design Patterns | ||
|
|
||
|
|
@@ -231,126 +114,39 @@ Follow the structuring guidance embedded in the generated SKILL.md template. | |
|
|
||
| ## Step 5: Validation | ||
|
|
||
| **⚠️ MANDATORY STEP - DO NOT SKIP ⚠️** | ||
|
|
||
| Run the validation script to ensure specification compliance. | ||
| Run the validation script to ensure specification compliance: | ||
|
|
||
| ### Command: | ||
| ```bash | ||
| python3 scripts/validate_skill.py --path <path-to-skill-root> | ||
| python3 scripts/validate_skill.py --path .github/skills/<skill-name> | ||
| ``` | ||
|
|
||
| **Example:** | ||
| ```bash | ||
| python3 scripts/validate_skill.py --path .github/skills/diagnose-ci-failure | ||
| ``` | ||
| Checks: directory naming, SKILL.md exists, required frontmatter fields (`name`, `description`), name matches directory. Warnings about missing `references/` or `scripts/` are advisory. | ||
|
|
||
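The checks listed for Step 5 can be approximated as follows. This is a simplified sketch under stated assumptions, not the repository's `validate_skill.py`: the function names are hypothetical, and frontmatter is read with a line-based parser that only understands top-level `key: value` pairs, not full YAML.

```python
import re
from pathlib import Path

NAME_RE = re.compile(r"^[a-z0-9][a-z0-9-]*[a-z0-9]$")


def read_frontmatter(text: str) -> dict:
    """Collect top-level `key: value` pairs between the first two `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if line[:1].isspace() or ":" not in line:
            continue  # skip nested, continuation, or blank lines
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip('"')
    return {}  # closing delimiter never found


def check_skill(skill_dir: Path) -> list[str]:
    """Return a list of critical errors; an empty list means the checks passed."""
    errors = []
    if not NAME_RE.fullmatch(skill_dir.name):
        errors.append("directory name does not match the naming regex")
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        errors.append("SKILL.md is missing")
        return errors
    fields = read_frontmatter(skill_md.read_text(encoding="utf-8"))
    for key in ("name", "description"):
        if not fields.get(key):
            errors.append(f"frontmatter field '{key}' is missing or empty")
    if fields.get("name") and fields["name"] != skill_dir.name:
        errors.append("frontmatter name does not match directory name")
    return errors
```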
| ### What it checks: | ||
| - ✅ Directory naming regex (`^[a-z0-9][a-z0-9-]*[a-z0-9]$`) | ||
| - ✅ SKILL.md exists | ||
| - ✅ YAML frontmatter has required fields (name, description) | ||
| - ✅ Name in YAML matches directory name | ||
| - ⚠️ Advisory: Presence of references/ and scripts/ (warnings only) | ||
| Fix critical errors before proceeding. Then confirm: | ||
| - [ ] Ran `scripts/scaffold_skill.py` (did not create files manually) | ||
| - [ ] Ran `scripts/validate_skill.py` with no critical errors | ||
| - [ ] Frontmatter `description` includes a "Use when..." clause | ||
| - [ ] No placeholder files (`example.py`, `example_reference.md`) remain | ||
|
|
||
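The last checklist item, that no placeholder files remain, is easy to verify with a short script. The sketch below uses the two placeholder paths named in the scaffolding output; the helper name `leftover_placeholders` is hypothetical.

```python
from pathlib import Path

# Placeholder files the text says the scaffold creates in default mode.
PLACEHOLDERS = ("scripts/example.py", "references/example_reference.md")


def leftover_placeholders(skill_dir: Path) -> list[str]:
    """Return the placeholder paths that still exist under `skill_dir`."""
    return [rel for rel in PLACEHOLDERS if (skill_dir / rel).is_file()]
```

An empty list means the item can be ticked; any returned path should be replaced with real content or deleted.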
| **If validation fails:** | ||
| - Read the error output carefully | ||
| - Fix critical violations immediately | ||
| - Warnings are informational (acceptable for simple skills) | ||
|
|
||
| **When valid:** Proceed to testing! | ||
|
|
||
| --- | ||
|
|
||
| ### ✅ **Post-Validation Checklist** | ||
|
|
||
| Before proceeding to Step 6, confirm: | ||
|
|
||
| **Workflow Compliance:** | ||
| - [ ] I RAN `scripts/scaffold_skill.py` (Step 3) | ||
| - [ ] I USED the generated templates from scaffolding | ||
| - [ ] I CONSULTED `references/TEMPLATES.md` and `references/BEST_PRACTICES.md` (Step 4) | ||
| - [ ] I RAN `scripts/validate_skill.py` (Step 5) | ||
| - [ ] Validation script reported SUCCESS (no critical errors) | ||
|
|
||
| **Content Quality:** | ||
| - [ ] YAML frontmatter includes `name` and `description` | ||
| - [ ] Description is high-entropy and keyword-rich | ||
| - [ ] No "Structuring This Skill" guidance section remains in SKILL.md | ||
| - [ ] Example files (`example.py`, `example_reference.md`) are deleted or replaced | ||
| - [ ] Scripts are in `scripts/`, references in `references/`, templates in `assets/` | ||
|
|
||
| **🛑 STOP CONDITION:** | ||
| If you did NOT run the scaffolding script or manually created files, STOP and re-do from Step 3. | ||
| If you did not run the scaffolding script or manually created files, STOP and re-do from Step 3. | ||
|
|
||
| --- | ||
|
|
||
| ## Step 6: Testing and Iteration | ||
|
|
||
| After creating the skill, test and refine based on real usage. | ||
|
|
||
| ### Testing Workflow: | ||
|
|
||
| 1. **Test with real examples** from Step 1 | ||
| - Does the skill trigger on expected queries? | ||
| - Do scripts execute without errors? | ||
| - Is output quality acceptable? | ||
|
|
||
| 2. **Identify friction points:** | ||
| - Are instructions clear enough? | ||
| - Are there missing scripts or references? | ||
| - Is context loaded efficiently? | ||
|
|
||
| 3. **Iterate on improvements:** | ||
| - Update SKILL.md for clarity | ||
| - Add missing examples or edge cases | ||
| - Optimize script error handling | ||
| - Split large references if needed (progressive disclosure) | ||
|
|
||
| 4. **Re-validate** after changes | ||
|
|
||
| ### Common Iteration Patterns: | ||
|
|
||
| **Problem:** Skill isn't triggering when expected | ||
| **Solution:** Enhance description with more keywords and trigger scenarios | ||
|
|
||
| **Problem:** Agent struggles with workflow steps | ||
| **Solution:** Add decision tree or flowchart; consult `references/workflows.md` | ||
|
|
||
| **Problem:** Context feels bloated | ||
| **Solution:** Move content from SKILL.md to references/; add grep hints | ||
|
|
||
| **Problem:** Scripts fail in edge cases | ||
| **Solution:** Add error handling; print descriptive messages to stderr | ||
|
|
||
| **Problem:** Output quality inconsistent | ||
| **Solution:** Add templates or validation checklist; see `references/output-patterns.md` | ||
|
|
||
| ### When to Stop Iterating: | ||
|
|
||
| ✅ Skill triggers reliably on target queries | ||
| ✅ Workflows execute without confusion | ||
| ✅ Output quality meets requirements | ||
| ✅ No critical errors in testing | ||
|
|
||
| --- | ||
|
|
||
| ## Knowledge Retrieval | ||
|
|
||
| If questions arise during skill creation: | ||
|
|
||
| **Specification questions** (naming, structure, required files): | ||
| → Read `references/SPECIFICATION.md` | ||
|
|
||
| **Best practices** (context economy, freedom scale, anti-patterns): | ||
| → Read `references/BEST_PRACTICES.md` | ||
| Test the skill with real examples from Step 1, then iterate: | ||
|
|
||
| **Templates and examples** (frontmatter, structure patterns): | ||
| → Read `references/TEMPLATES.md` | ||
| 1. **Trigger testing**: Does the skill activate on expected queries? If not, add keywords to the description. | ||
| 2. **Execution testing**: Do scripts run without errors? Is output quality acceptable? | ||
| 3. **Context efficiency**: If SKILL.md feels bloated, move content to `references/`. | ||
| 4. **Re-validate** after each round of changes with `scripts/validate_skill.py`. | ||
|
|
||
| **Workflow design** (sequential, conditional, iterative): | ||
| → Read `references/workflows.md` | ||
| ## Reference Index | ||
|
|
||
| **Output formatting** (templates, examples, validation): | ||
| → Read `references/output-patterns.md` | ||
| - **Specification** (naming, structure): `references/SPECIFICATION.md` | ||
| - **Best practices** (context economy, freedom scale): `references/BEST_PRACTICES.md` | ||
| - **Templates** (frontmatter, structure patterns): `references/TEMPLATES.md` | ||
| - **Workflows** (sequential, conditional, iterative): `references/workflows.md` | ||
| - **Output patterns** (templates, validation checklists): `references/output-patterns.md` | ||
|
Collaborator
Could you add |
||
|
|
||
| **Do not hallucinate answers.** Always consult the authoritative sources. | ||
The `analyze_repo.py` → `analysis_checklist.md` real-world example is a great anchor for LLMs that tend to over-script analysis tasks. Would you be open to keeping that inline example and pointing to `BEST_PRACTICES.md` §6 after it rather than replacing it? Keeps the token saving on the taxonomy lists while preserving the reasoning anchor.