| name | skill-maker |
|---|---|
| description | Create, audit, or consolidate agent skills following the Agent Skills open standard (agentskills.io). Interviews the user relentlessly about intent, scope, and edge cases before drafting. Covers SKILL.md structure, frontmatter, progressive disclosure, description optimization, script bundling, sub-command architecture, setup gates, context systems, and review. Use when the user wants to create a skill, write a skill, build a new skill, make a skill, draft a SKILL.md, or mentions "skill-maker". Also use when asked to review a skill, audit a SKILL.md, check why a skill never triggers, improve an existing skill, or fix a skill. Also use when asked to package expertise, workflows, or domain knowledge into a reusable skill. Also use when asked to consolidate skills, merge skills, combine skills, reduce skill count, or refactor multiple skills into one. |
Create agent skills following the Agent Skills open standard.
What do you need to do?
1. Audit an existing skill: review, improve, or debug a SKILL.md
2. Create a new skill: interview, draft, and review from scratch
3. Consolidate skills: merge multiple skills into fewer
Wait for response before proceeding.
| Response | Workflow |
|---|---|
| 1, "audit", "review", "check", "fix", "improve" | Audit Workflow (Steps 1–4 in this file) |
| 2, "create", "write", "build", "new", "draft" | Phases 1–5 (Interview → Draft → Description → Scripts → Review) in this file |
| 3, "consolidate", "merge", "combine" | references/consolidation-guide.md — return to Phase 5 for final checklist |
Use this workflow when reviewing, improving, or debugging an existing skill.
Read the full SKILL.md and list all files in the skill directory (references/, scripts/, templates/, assets/).
Check each category. Note issues as you go.
Frontmatter:
- `name` matches the directory name, lowercase+hyphens, max 64 chars
- `description` is under 1024 chars, non-empty, third person
- `description` includes trigger phrases (not just a summary of what the skill does)
- `description` covers edge phrasings users would actually say
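
The frontmatter checks above are mechanical enough to script. A minimal sketch, assuming the frontmatter has already been parsed into `name` and `description` strings; the function name `check_frontmatter`, the regex, and the "use when" heuristic are illustrative assumptions rather than anything defined by the standard:

```python
import re

# Assumed reading of "lowercase+hyphens": lowercase letters or digits,
# separated by single hyphens. Adjust if the spec is stricter or looser.
NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def check_frontmatter(name: str, description: str, dir_name: str) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if name != dir_name:
        problems.append(f"name {name!r} does not match directory {dir_name!r}")
    if len(name) > 64:
        problems.append(f"name is {len(name)} chars (max 64)")
    if not NAME_RE.match(name):
        problems.append("name must be lowercase letters, digits, and hyphens")
    if not description.strip():
        problems.append("description is empty")
    elif len(description) > 1024:
        problems.append(f"description is {len(description)} chars (max 1024)")
    if "use when" not in description.lower():
        # Heuristic only: trigger phrases can be worded in other ways.
        problems.append("description has no 'Use when ...' trigger phrase")
    return problems
```

The third-person and edge-phrasing checks still need judgment, so they stay with the reviewer.
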
Structure:
- SKILL.md body is under 500 lines
- Essential principles are inline in SKILL.md (not only in a reference file)
- All referenced files exist (check every path in the SKILL.md)
- References are one level deep (no nested chains: A → B → C)
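
Two of the structure checks (line count and file existence) could be automated along these lines. This is a rough sketch: `check_structure` is a hypothetical helper, the regex only catches paths written literally under `references/`, `scripts/`, `templates/`, or `assets/`, and it counts the whole file (frontmatter included) as a conservative stand-in for the body.

```python
import re
from pathlib import Path

# Matches relative paths such as references/spec-guide.md or scripts/run.py.
# A trailing sentence period is excluded because the match must end on \w.
PATH_RE = re.compile(r"\b(?:references|scripts|templates|assets)/[\w./-]*\w")


def check_structure(skill_dir: Path, max_lines: int = 500) -> list[str]:
    """Report an over-long SKILL.md and any referenced path that does not exist."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    problems = []
    line_count = len(text.splitlines())
    if line_count > max_lines:
        problems.append(f"SKILL.md has {line_count} lines (limit {max_lines})")
    for rel in sorted(set(PATH_RE.findall(text))):
        if not (skill_dir / rel).exists():
            problems.append(f"missing referenced file: {rel}")
    return problems
```

Detecting nested reference chains (A → B → C) would need the same scan applied to each reference file, which is left out here.
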
Content quality:
- No rigid ALWAYS/NEVER rules without reasoning (explain WHY)
- No explanations of things the agent already knows from training
- Steps are specific and verifiable (not "handle errors appropriately")
- Success criteria are observable and testable
- Examples use fake data where appropriate
Router pattern (if applicable):
- Intake question asks what the user wants before routing
- Router table maps commands to reference files
- All referenced workflow/reference files exist
- Essential principles are in SKILL.md, not only in sub-command references
- If skill has multiple semantic sections, consider XML tags for structure (see `references/xml-structure-guide.md`)
Scripts (if present):
- Scripts have shebangs, `--help`, and structured output
- No interactive prompts (all input via flags/env/stdin)
- Cross-platform paths (pathlib, no hardcoded separators)
- Error messages explain what went wrong and what to do
Read references/anti-patterns.md for the full catalog of common failures.
Present findings grouped by severity:
- Critical — skill won't trigger or produces wrong output
- Important — structural issues, missing files, spec violations
- Minor — style, conciseness, optimization opportunities
For each finding, state the issue, cite the specific line or section, and recommend a fix.
Ask the user which findings to fix. Apply changes surgically — don't rewrite sections that aren't broken. Run the Phase 5 review checklist on the modified skill before finishing.
Interview the user about every aspect of this skill until reaching shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer.
Ask one question at a time. Wait for the answer before asking the next. Adapt follow-ups based on what you learn. Each question should provide clear benefit toward building a better skill — cut questions the codebase can answer for you.
If a question can be answered by exploring the codebase, explore the codebase instead of asking.
Focus areas, roughly in order:
- Purpose and audience. What task does this skill cover? What specific problem does it solve? What does the user do today without it?
- Scope boundaries. What should this skill NOT do? What adjacent tasks belong to other skills?
- Input/output. What does the user provide? What does the skill produce? Specific formats?
- Edge cases. What goes wrong? Common mistakes? Gotchas for new users?
- Success criteria. How do you know the skill worked correctly?
- What can be scripted? Look for deterministic operations that should be code, not LLM instructions. Scripts are cheaper, faster, and more reliable.
- References needed? Domain knowledge too large for SKILL.md that should live in separate files?
- Existing patterns. Similar skills or workflows to draw from? Check the codebase.
- Platform constraints. macOS, Windows, and Linux? Scripts must handle path separators, temp directories, and shell differences.
- External services and APIs. Does the skill call external APIs or services? If yes, read `references/api-skill-patterns.md`, which covers credential handling, schema discovery, instance-specific values, and error placement.
After the interview questions above, decide the architecture. Most skills are simple — only escalate when the answers demand it.
Question 1: How many distinct things can a user want to do?
- One specific thing → Simple skill (single SKILL.md, under 200 lines)
- Multiple things with shared principles → continue to Q2
Question 2: Is there shared domain knowledge across those operations?
- No, each operation is self-contained → Simple skill (or multiple separate simple skills)
- Yes, multiple operations share knowledge → Router skill (SKILL.md + `references/`)
Question 3: Does it cover a full lifecycle (build, debug, test, ship)?
- No → Router skill is sufficient
- Yes → Domain expertise skill (exhaustive references, full lifecycle workflows)
| What you're building | Pattern |
|---|---|
| "A skill that commits with a conventional message" | Simple |
| "A skill that manages PRs — create, review, merge, close" | Router |
| "A skill for building and shipping macOS apps" | Domain expertise |
| "A skill that audits other skills" | Simple (upgrade to Router if it grows) |
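
The three questions form a small decision tree. One way to make the branching explicit is the sketch below; the `Pattern` enum and `choose_pattern` function are hypothetical names, and the boolean inputs stand in for the interview answers.

```python
from enum import Enum


class Pattern(Enum):
    SIMPLE = "Simple"
    ROUTER = "Router"
    DOMAIN_EXPERTISE = "Domain expertise"


def choose_pattern(multiple_operations: bool, shared_knowledge: bool, full_lifecycle: bool) -> Pattern:
    """Walk Questions 1-3 in order and stop at the first decisive answer."""
    if not multiple_operations:  # Q1: one specific thing
        return Pattern.SIMPLE
    if not shared_knowledge:     # Q2: each operation is self-contained
        return Pattern.SIMPLE
    if not full_lifecycle:       # Q3: no build/debug/test/ship lifecycle
        return Pattern.ROUTER
    return Pattern.DOMAIN_EXPERTISE
```

For example, `choose_pattern(True, True, False)` corresponds to the PR-management row in the table: several operations, shared knowledge, no full lifecycle, so a Router.
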
For Router and Domain expertise patterns, also ask:
- Does the skill need project-level context? If every command needs the same background, design a context file pattern with a loader script.
- Are there mandatory setup gates? Steps that must pass before any work begins. Gates prevent generic output.
- Does behavior vary by task type? If so, design a register/mode system that classifies the task first, then loads different references.
Read references/architecture-patterns.md for implementation details of each pattern.
Consolidation signal check: If the interview reveals the new skill overlaps significantly with existing skills (shared scripts, cross-references, linear pipeline), consider consolidating instead of creating. Read references/consolidation-guide.md for the signals and workflow.
Do not proceed to Phase 2 until the user confirms the scope is complete.
Write the skill following the spec. Read references/spec-guide.md for the full format reference before drafting.
Starter templates: Use templates/simple-skill.md for single-purpose skills, templates/router-skill.md for multi-command skills using markdown headings, or templates/router-skill-xml.md for multi-command skills using XML structure. Copy the template as a starting point, then customize.
```yaml
---
name: skill-name  # lowercase, hyphens, max 64 chars
description: |  # max 1024 chars; this is the ONLY triggering mechanism
  What the skill does. Use when [specific triggers].
  Also use when [additional triggers].
---
```

The description must be slightly "pushy" because agents tend to undertrigger. Include both what the skill does AND specific phrases/contexts that should activate it.
Follow progressive disclosure — three loading levels:
- Metadata (~100 tokens): `name` and `description` loaded at startup for all skills
- Instructions (< 500 lines): Full SKILL.md body loaded when skill activates
- Resources (as needed): `references/`, `scripts/`, `assets/` loaded only when required
Keep the SKILL.md body under 500 lines. If approaching this limit, split domain-specific content into references/ files with clear pointers about when to read them.
Before writing domain knowledge into a new reference file, check if it already exists in another reference. Shared data (exit criteria, field mappings, workflow rules) must live in exactly one file. New references should point to the existing source — not embed a copy.
Common trap: a new sub-command reference duplicates tables from an existing reference because it "needs them for context." Instead, add a one-line pointer: "Load references/workflows.md for exit criteria per status."
Exception: intentional duplication. When two sub-commands need the same query pattern but referencing each other would create a transitive loading chain (A → B → C), duplicate the pattern and add a note: "Same query pattern as X.md Step N — duplicated here to avoid transitive loading." This is cheaper than forcing the agent to load an unrelated file.
- Imperative form: "Run the command" not "You should run the command"
- Explain WHY, not just what: Avoid rigid ALWAYS/NEVER rules without reasoning. Agents generalize from principles better than from rigid rules. Instead of "ALWAYS use pdfplumber. NEVER use PyPDF2," write "Use pdfplumber over PyPDF2 — it handles malformed PDFs more gracefully and preserves layout metadata needed for table extraction." Principles adapt to edge cases; rigid rules break.
- Don't explain what the agent already knows: Skip basic programming concepts, standard library usage, and well-known tool behavior. Only add context the agent doesn't have — project-specific conventions, non-obvious behavior, domain-specific gotchas. A 30-token code example beats a 150-token explanation of what a library is.
- Output templates: Define exact formats when the output structure matters
- Concrete examples: Show input → output for non-obvious workflows
- Gotchas sections: Common mistakes the agent should avoid
- Checklists: Multi-step workflows with validation gates
- Conditional loading: "Read `references/api-errors.md` if the API returns a non-200 status code", not "see references/ for details"
- Absolute bans: When certain patterns are always wrong, use match-and-refuse lists. "If you're about to write X, stop and do Y instead." More effective than vague "be careful" guidance.
- Avoid hardcoded thresholds: Don't write arbitrary numbers as rules (e.g., "when you have 3+ sub-commands" or "if more than 5 issues") unless the threshold comes from a real constraint (API limit, spec requirement). Instead, describe the signal that triggers the behavior (e.g., "when you're copying the same text into another sub-command"). Hardcoded numbers feel authoritative but are usually guesses that don't generalize.
Read references/anti-patterns.md during drafting to avoid known pitfalls.
Agents parse XML tags more reliably than markdown headings when a skill has semantically distinct sections (principles, intake, routing, references). XML tags create unambiguous containers; markdown headings blend together in long prompts.
Read references/xml-structure-guide.md for suggested patterns and anti-patterns.
When XML helps:
- Skills with an intake question + routing table + essential principles
- Skills where an agent needs to quickly locate a specific section
- Skills with inline workflows that need clear start/end boundaries
When markdown is enough:
- Simple skills with a single linear workflow
- Sequential instructional content (phases, steps) where order matters more than section lookup
For skills with multiple distinct operations, use a router table in SKILL.md.
```markdown
<intake>
## What would you like to do?
1. **Craft a feature**: Build end-to-end
2. **Audit code**: Technical quality checks

**Wait for response before proceeding.**
</intake>

<routing>
| Response | Workflow |
|----------|----------|
| 1, "craft", "build" | `references/craft.md` |
| 2, "audit", "check" | `references/audit.md` |
</routing>
```

Back the router with a `scripts/command-metadata.json` as the single source of truth:
```json
{
  "craft": {
    "description": "Full build flow. Use when building a new feature end-to-end.",
    "argumentHint": "[feature description]"
  }
}
```

Non-negotiable checks before any file edits. Gates prevent generic output from missing context.
```markdown
## Setup (non-optional)

| Gate | Required check | If fail |
|---|---|---|
| Context | Project config loaded via `python scripts/load_context.py` | Run the loader first |
| Config | Config file exists and is valid | Run `skill-name setup` |
| Command | Sub-command reference is loaded | Load the reference |
| Mutation | All gates above pass | Do not edit project files |
```

When behavior varies by task type, classify first, then load different references:
```markdown
## Register

Every task is **library** (published, API-stable) or **application** (internal, can break).
Identify before acting. Load the matching reference: [references/library.md] or [references/application.md].
```

Steps that depend on optional environment capabilities (browser automation, specific CLI tools) must degrade gracefully:
```markdown
### Automated Scan (Capability-Gated)
Run the automated scanner when ALL of these are true:
- The target files exist and are readable
- The required CLI tool is installed

If unavailable, state in one line that the step is skipped and why. Do not ask the user to install tooling.
```

When one command produces output that another consumes, define the artifact structure explicitly. The producing command's reference defines the format; the consuming command's reference says what it expects:
```markdown
### Plan Structure
**1. Summary** (2-3 sentences)
**2. Primary Goal**
**3. Approach**
...
```

For build/implementation commands, mandate inspect-and-fix passes with explicit exit bars:
```markdown
### Critique and fix loop
After the first pass, write a short self-critique and patch. Repeat until no material issues remain:
1. Does it match the requirements?
2. Does it pass the [quality test]?
3. Check every expected scenario.
4. Check edge cases.

The exit bar is not "it works." It is: [explicit quality threshold].
```

The description is the only thing agents see at startup. Read references/description-guide.md for the full optimization process.
Quick validation:
- Write 5 should-trigger queries (different phrasings, including ones that don't name the skill directly)
- Write 5 should-not-trigger queries (near-misses that share keywords but need different skills)
- Check: would the description correctly distinguish these?
- Revise if needed — broaden for missed triggers, narrow for false triggers
- Verify under 1024 characters
For skills with sub-commands, the main description covers the skill broadly. Each sub-command's description in command-metadata.json is optimized separately for auto-trigger keyword matching.
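
Because each sub-command description is tuned on its own, a small check over `command-metadata.json` can catch entries that were never written or never given a trigger phrase. The sketch below assumes the file has the shape shown earlier (command name mapped to an object with a `description` key); `check_command_metadata` and the "use when" heuristic are illustrative assumptions.

```python
import json


def check_command_metadata(raw: str) -> dict[str, list[str]]:
    """Map each command to its problems; commands without problems are omitted."""
    problems: dict[str, list[str]] = {}
    for command, meta in json.loads(raw).items():
        issues = []
        description = meta.get("description", "").strip()
        if not description:
            issues.append("missing description")
        elif "use when" not in description.lower():
            # Heuristic only: a trigger phrase may be worded differently.
            issues.append("description has no 'Use when ...' trigger phrase")
        if issues:
            problems[command] = issues
    return problems
```

An empty result only means the entries are present and contain a trigger phrase; whether they distinguish the should-trigger from the should-not-trigger queries still needs the manual validation above.
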
Read references/scripts-guide.md for the full guide.
Bias toward scripts. Every deterministic operation should be a script, not an instruction. Scripts are cheaper (no LLM tokens), faster (no reasoning), and more reliable (no hallucination).
For each piece of the skill's workflow, ask: "Could a script do this?" If yes, write the script.
Should be scripts:
- Validation (input format, required fields, schema compliance)
- File generation from templates
- Data extraction and transformation
- API calls with structured responses
- Setup and environment checks
- Output formatting
- Context loading (read project files, resolve paths, return JSON)
- Pin/unpin shortcuts (create/remove command aliases)
- Cleanup (remove deprecated files after skill updates)
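
As an illustration of the "setup and environment checks" item, the sketch below reports which CLI tools are on `PATH` and returns a non-zero code when any is missing. The function names and the exit-code convention (0 = all present, 1 = something missing) are assumptions for this example.

```python
import json
import shutil
import sys


def check_tools(required: list[str]) -> dict[str, bool]:
    """Report which of the required CLI tools can be found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in required}


def main(tools: list[str]) -> int:
    status = check_tools(tools)
    # Pretty-print for a terminal, emit compact JSON when piped.
    json.dump(status, sys.stdout, indent=2 if sys.stdout.isatty() else None)
    sys.stdout.write("\n")
    return 0 if all(status.values()) else 1
```

The agent can then read the JSON and skip a capability-gated step with a one-line reason instead of reasoning about the environment itself.
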
Should stay as instructions:
- Deciding between architectural approaches
- Reviewing code for quality or style
- Explaining tradeoffs to the user
- Creative writing or design decisions
- Interview/discovery conversations
Key patterns:
- Python without dependencies: stdlib only, `argparse` for CLI parsing
- Python with dependencies: PEP 723 inline metadata with `uv run`
- All scripts: Structured output (JSON when piped), clear exit codes, descriptive `--help`
For skills that need project-level context, write a loader script.
The script should follow all standard patterns: argparse with --help, structured JSON output (pretty when interactive, compact when piped), clear exit codes (0 = found, 1 = missing), pathlib for cross-platform paths, and stdlib-only imports. See the "Context File System" section in references/architecture-patterns.md for a skeleton.
The SKILL.md references it: "Load context via python scripts/load_context.py. Consume the full JSON output. Never pipe through head, tail, or grep."
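
A loader along these lines would satisfy the patterns listed above. It is a sketch under stated assumptions, not the skeleton from references/architecture-patterns.md: the candidate file names in `CANDIDATES` are hypothetical, and a real skill would substitute its own.

```python
#!/usr/bin/env python3
import argparse
import json
import sys
from pathlib import Path
from typing import Optional

# Hypothetical locations, checked in order; the second acts as a fallback.
CANDIDATES = (".skill-context.json", "docs/skill-context.json")


def load_context(root: Path) -> Optional[dict]:
    """Return the first context file found under root, or None if none exists."""
    for rel in CANDIDATES:
        path = root / rel
        if path.is_file():
            return {"source": rel, "context": json.loads(path.read_text(encoding="utf-8"))}
    return None


def main(argv: Optional[list[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Load project context as JSON.")
    parser.add_argument("--root", type=Path, default=Path.cwd(), help="project root to search")
    args = parser.parse_args(argv)

    result = load_context(args.root)
    payload = result if result is not None else {
        "error": "no context file found",
        "searched": list(CANDIDATES),
    }
    # Pretty when interactive, compact when piped.
    print(json.dumps(payload, indent=2 if sys.stdout.isatty() else None))
    return 0 if result is not None else 1  # 0 = found, 1 = missing
```

When saved as `scripts/load_context.py`, the file would end with an `if __name__ == "__main__":` guard that calls `raise SystemExit(main())`; the guard is left out here so the functions can be imported and exercised directly.
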
Before presenting the final skill, verify against this checklist:
- `name` is lowercase, hyphens only, max 64 chars
- `description` is under 1024 chars and includes trigger phrases
- `description` is slightly pushy: it covers edge phrasings that should activate the skill
- SKILL.md body is under 500 lines
- Instructions use imperative form
- Sub-commands have a router table with clear routing rules
- `command-metadata.json` is the single source of truth for command descriptions
- Setup gates are defined with fail actions for each gate
- Register/mode system classifies before loading references
- Capability-gated steps degrade gracefully with one-line skip reasons
- Router/domain skills with distinct sections (intake, routing, principles) consider XML tags for clarity (`references/xml-structure-guide.md`)
- Domain knowledge split into `references/` with clear "when to read" pointers
- Each reference is self-contained, with no transitive loading (see `spec-guide.md` → Reference Architecture)
- Reference loading is conditional, not eager ("Read X if Y happens")
- Shared concerns (auth, config) extracted into their own reference, not embedded in a consumer
- Error handling lives in the reference for the tool that produces the error
- Multi-approach skills include a decision table routing to the correct reference
- No browser-only tools referenced (Postman, API consoles, OAuth login pages)
- Scripts (if any) have shebangs, structured output, and `--help`
- Context loader returns JSON, handles missing files, resolves fallback paths
- Scripts are cross-platform (pathlib, tempfile, no hardcoded paths)
- Scripts are idempotent — safe to re-run
- Credential files are never read into context — passed via shell substitution only
- Credential setup is single-sourced in its own reference file
- Capability gate checks for credentials before attempting API calls
- API schema discovery is documented (OpenAPI download, GraphQL introspection, or live endpoints)
- API examples have been validated against the live endpoint
- Instance-specific values include programmatic discovery methods
- No references to old skill names anywhere in the project (`grep -rn` the entire repo)
- Router intake menus are sequentially numbered (no gaps from removed items)
- Script docstrings and `--help` text reference the new skill name, not the old ones
- Reference paths resolve correctly from each file's location (no `references/references/` nesting)
- All example files from old skills are represented in the consolidated examples
- Scripts in the same skill use consistent patterns (NO_COLOR, shell flags, TTY checks, exit codes)
- README, ADRs, and other docs updated to reflect new skill structure
- New description covers all trigger phrases from all old skills' descriptions
- No time-sensitive information (URLs to specific versions, dates that will go stale)
- Examples use fake data where possible (emails, names, tokens); see `spec-guide.md` → Fake Data in Examples
- Consistent terminology throughout
- Concrete examples included for non-obvious workflows
- Absolute bans defined for patterns that are always wrong
- Self-critique loops defined for build/implementation commands with explicit exit bars
<reference_index>
| Reference | Load when... |
|---|---|
| `references/spec-guide.md` | Drafting a SKILL.md (Phase 2): full format reference |
| `references/description-guide.md` | Optimizing the description (Phase 3) |
| `references/scripts-guide.md` | Writing scripts (Phase 4) |
| `references/anti-patterns.md` | Drafting or auditing: common failures to avoid |
| `references/architecture-patterns.md` | Choosing between simple, router, and domain expertise patterns |
| `references/api-skill-patterns.md` | Skill calls external APIs or services |
| `references/consolidation-guide.md` | Merging multiple skills into fewer |
| `references/xml-structure-guide.md` | Deciding on XML vs markdown structure |
</reference_index>