jvalin17/agent-toolkit

Agent Toolkit

License: Apache 2.0

Skills, guardrails, and structural hooks for AI coding agents. Plan, build, test, debug, and ship — any repo, any language.

Best on Claude Code — hooks enforce rules the model cannot bypass.
Also works on Cursor, Codex, Gemini, Windsurf, Aider — via project rules (setup guide).


What this is

| Piece | Purpose |
| --- | --- |
| Skills | Step-by-step workflows: /explore, /implementation, /precommit, … |
| Guardrails | Safety and quality rules (shared/guardrails.md) |
| Hooks | Structural enforcement on Claude Code: block bad writes, gate commits, route skills |

Prompt rules can be ignored. Hooks cannot. On other LLMs you get skills + guardrails via AGENTS.md; you enforce gates manually.

System overview · Architecture docs


Quick start

git clone https://github.com/jvalin17/agent-toolkit.git
cd agent-toolkit && ./install.sh          # once — needs python3, jq, Claude Code

cd /path/to/your-project && claude        # hooks inject context; look for "AGENT TOOLKIT ACTIVE"
/explore .                                # understand the codebase
/precommit                                # before commit (default gate)

Natural language also works: "fix the login bug" routes to /debug. To chain skills hands-off, use auto mode: /requirements auto my-app.

Auto-continuation — sessions use a two-layer limit: at 70 min a breadcrumb is saved to HANDOFF.md (session continues); at 200 min (or first compaction) a hard stop fires with a restart prompt. Two ways to run long tasks:

agent-toolkit-continue "Build auth system"   # interactive — restarts in same terminal
claude-auto "Build auth system"              # headless — for CI/background tasks

Or set "continue": true in gates.json for in-hook restart (headless only). Set "continue": false to disable (session will warn but keep running).

Install details & updates: docs/install-and-updates.md


Daily workflow

| When | Do this |
| --- | --- |
| Building | /explore or /requirements → /implementation |
| Committing | /precommit → write findings → finalize_report.py → git commit |
| Pushing (guarded) | /evaluate → finalize → git push |

python3 hooks/finalize_report.py precommit .scratch/precommit_<slug>/findings.json

With defaults, only the hook writes reports/ and .gates/ — the agent cannot fake gate files.

→ Full commit/push flows: docs/workflow.md · Gate profiles: shared/gate-unlock.md


Skills

Common
/explore Understand existing code
/requirements Gather requirements
/implementation Build with TDD
/precommit Quality gate before commit
/debug Hypothesis-driven debugging
/evaluate Quality score (push gate)

All 13 skills: docs/skills.md


Configuration

All settings live in gates.json at your project root. Use presets or edit directly.

agent-toolkit-setup --status      # show current config
agent-toolkit-setup --balanced    # daily dev (default)
agent-toolkit-setup --guarded     # production
agent-toolkit-setup --lockdown    # strict + all reviews
agent-toolkit-setup --tdd off     # toggle one setting

Presets

| Preset | Commit requires | Push requires | Use when |
| --- | --- | --- | --- |
| balanced (default) | /precommit | | Daily development |
| guarded | /precommit | /evaluate | Production branches |
| lockdown | /precommit + /evaluate | /evaluate + /reviewer + /assess | High-risk changes |
| quick | | | Local experiments only |
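
The preset table can be read as a lookup from preset name to the skills each git action requires. The following is a minimal Python sketch of that reading, not the toolkit's actual hook code: `PRESETS` and `missing_gates` are hypothetical names, and the empty cells are assumed to mean "nothing required".

```python
# Hypothetical lookup mirroring the preset table; not the toolkit's real code.
# Empty table cells are assumed to mean that no skill is required.
PRESETS = {
    "balanced": {"commit": ["precommit"], "push": []},
    "guarded": {"commit": ["precommit"], "push": ["evaluate"]},
    "lockdown": {
        "commit": ["precommit", "evaluate"],
        "push": ["evaluate", "reviewer", "assess"],
    },
    "quick": {"commit": [], "push": []},
}


def missing_gates(preset, action, passed):
    """Return the skills still required before `action` ("commit" or "push")."""
    required = PRESETS[preset][action]
    return [skill for skill in required if skill not in passed]
```

For example, `missing_gates("lockdown", "push", {"evaluate"})` returns `['reviewer', 'assess']`, the two reviews still outstanding before a push.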

All settings

Gate enforcement

| Setting | Values | Default | What it does |
| --- | --- | --- | --- |
| enforcement | block / warn | block | Whether missing gates block the commit/push or just warn |
| profile | minimal / standard / strict / paranoid | minimal | Which skills are required at commit and push |
| gate_mode | legacy / signed | legacy | How gates are verified; signed uses JWT for teams/CI |
| eval_threshold | 0–100 | 95 | Minimum /evaluate score to pass the push gate |
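
To make the interplay between enforcement and eval_threshold concrete, here is a small Python sketch. It is an illustration under assumptions, not the hook's real logic: `push_gate_decision` is a hypothetical helper, the labels "allow", "warn", and "block" are illustrative, and it assumes a score below the threshold is treated the same way as a missing gate.

```python
def push_gate_decision(score, eval_threshold=95, enforcement="block"):
    """Decide what happens at push time for a given /evaluate score.

    Hypothetical helper: the return values are illustrative labels,
    not strings the toolkit is known to emit.
    """
    if not 0 <= eval_threshold <= 100:
        raise ValueError("eval_threshold must be between 0 and 100")
    if score >= eval_threshold:
        return "allow"
    # Assumption: a failing score is handled like a missing gate, so
    # "warn" lets the push through and "block" stops it.
    return "warn" if enforcement == "warn" else "block"
```

With the defaults, a score of 90 is blocked; lowering eval_threshold to 80 lets the same score pass.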

Examples:

// Block commits that skip /precommit (default behavior)
"enforcement": "block"

// Just warn (useful when rolling out gates on an existing project)
"enforcement": "warn"

// Require /evaluate before push (production branches)
"profile": "standard"

// Require /evaluate + /reviewer + /assess before push (high-risk)
"profile": "paranoid"

// Use JWT-signed gates (team repos with branch protection)
"gate_mode": "signed"

// Lower the bar for evaluate score (e.g. early prototypes)
"eval_threshold": 80

TDD & quality

| Setting | Values | Default | What it does |
| --- | --- | --- | --- |
| tdd | true / false | true | Enable test-first workflow enforcement |
| tdd_mode | remind / strict | remind | remind = advisory nudge; strict = hard-blocks source edits until tests exist |
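
The difference between the two modes can be sketched in a few lines of Python. `check_source_edit` and its messages are hypothetical; how the toolkit actually detects that tests exist is not described here, so the sketch takes that as an input.

```python
def check_source_edit(config, tests_exist):
    """Return (allowed, message) for an attempted source edit.

    Hypothetical sketch of the remind/strict distinction; `tests_exist`
    stands in for whatever detection the real hook performs.
    """
    if not config.get("tdd", True):
        return True, ""  # TDD enforcement disabled
    if tests_exist:
        return True, ""
    if config.get("tdd_mode", "remind") == "strict":
        return False, "Blocked: write a failing test before editing source."
    return True, "Reminder: consider writing a test first."
```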

Examples:

// Nudge to write tests first but don't block (default)
"tdd": true, "tdd_mode": "remind"

// Hard-block: cannot edit src/ until a failing test exists
"tdd": true, "tdd_mode": "strict"

// Disable TDD enforcement entirely (not recommended)
"tdd": false

Security & anti-fake

| Setting | Values | Default | What it does |
| --- | --- | --- | --- |
| gate_protect | true / false | true | Block agent from writing .gates/ files directly |
| report_protect | true / false | true | Block agent from writing reports/ files directly |
| mode | normal / strict | normal | strict enables anti-fake drift detection on fixtures |
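
As a rough illustration of what gate_protect and report_protect guard, the Python sketch below checks whether an agent write should be refused. It is a sketch under assumptions: `is_write_blocked` is a hypothetical function, paths are assumed to be project-relative POSIX paths, and protection is assumed to cover everything under .gates/ and reports/.

```python
from pathlib import PurePosixPath


def is_write_blocked(path, config):
    """Return True if an agent write to `path` should be refused.

    Hypothetical check: assumes project-relative POSIX paths and that
    protection applies to anything under .gates/ or reports/.
    """
    parts = PurePosixPath(path).parts
    top = parts[0] if parts else ""
    if top == ".gates":
        return config.get("gate_protect", True)
    if top == "reports":
        return config.get("report_protect", True)
    return False
```

With an empty config (the defaults), a write to `.gates/precommit.json` is refused while a write to `src/app.py` is allowed.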

Examples:

// Default: agent cannot fake passing gates or reports
"gate_protect": true, "report_protect": true

// Strict anti-fake: detect drift in test fixtures and gate provenance
"mode": "strict"

// Disable protections (only for debugging the toolkit itself)
"gate_protect": false, "report_protect": false

Session behavior

| Setting | Values | Default | What it does |
| --- | --- | --- | --- |
| compact_at_minutes | 0+ | 70 | Layer 1: write HANDOFF.md breadcrumb at this time; session continues |
| max_session_minutes | 0+ | 200 | Layer 2: hard stop; session ends with restart prompt |
| continue | true / false | true | Auto-restart session when context is exhausted (headless) |
| skill_routing | true / false | true | Auto-detect user intent and route to the matching skill |
| auto | true / false | false | Run skills in auto mode (no confirmation prompts) |
| model | auto / model name | auto | Override which model the agent uses |

Sessions use a two-layer limit system. Layer 1 (compact_at_minutes) writes HANDOFF.md as a breadcrumb so the agent can re-orient after compaction — the session keeps running. Layer 2 (max_session_minutes, or 1 compaction, or 700KB output) is a hard stop that writes HANDOFF.md with a restart prompt you can paste into a new session.
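
The two layers can be sketched as a single classification function. This Python example is a hypothetical illustration, not the hook's implementation: `session_action` and its return labels are made up, and treating 0 as "disabled" for max_session_minutes is an assumption (this README only states it for compact_at_minutes).

```python
def session_action(elapsed_min, compactions, output_kb, config):
    """Classify a session as "continue", "breadcrumb", or "hard_stop".

    Hypothetical sketch; 0 is treated as "layer disabled" for both
    time limits, which is documented only for compact_at_minutes.
    """
    compact_at = config.get("compact_at_minutes", 70)
    max_min = config.get("max_session_minutes", 200)

    # Layer 2: time limit, first compaction, or 700KB of output.
    if (max_min and elapsed_min >= max_min) or compactions >= 1 or output_kb >= 700:
        return "hard_stop"
    # Layer 1: write the HANDOFF.md breadcrumb; the session keeps running.
    if compact_at and elapsed_min >= compact_at:
        return "breadcrumb"
    return "continue"
```

With the defaults, a session at 75 minutes gets a breadcrumb, and one at 205 minutes (or after its first compaction) hits the hard stop.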

Examples:

// Two-layer defaults: breadcrumb at 70 min, hard stop at 200 min
"compact_at_minutes": 70, "max_session_minutes": 200

// Shorter sessions (e.g. for cost control)
"compact_at_minutes": 30, "max_session_minutes": 60

// Disable Layer 1 breadcrumb (only hard stop at 200 min)
"compact_at_minutes": 0

// Auto-restart when context runs out (headless mode)
"continue": true

// Keep session alive with warnings only (no restart)
"continue": false

// Disable skill routing (manual /skill invocation only)
"skill_routing": false

// Run skills without asking for confirmation
"auto": true

// Pin to a specific model
"model": "opus"

Project commands

| Setting | Values | Default | What it does |
| --- | --- | --- | --- |
| test_command | shell command | "python3 -m pytest tests/ -q" | Command the toolkit runs to execute tests |
| lint_command | shell command | "python3 -m compileall -q ..." | Command the toolkit runs to lint/check code |
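
For readers wiring these settings into their own scripts, the Python sketch below shows one way to turn a configured command string into a process invocation. `command_argv` and `run_tests` are hypothetical helpers. The sketch splits the string with `shlex` and runs it without a shell, so pipes and `&&` would not work; the toolkit itself may invoke the command differently.

```python
import shlex
import subprocess

# Default documented in the table above; the lint_command default is
# elided there, so no default is assumed for it.
DEFAULT_TEST_COMMAND = "python3 -m pytest tests/ -q"


def command_argv(config, key, default=None):
    """Split the configured command string into an argument list."""
    command = config.get(key, default)
    if not command:
        raise KeyError(f"no {key} configured")
    return shlex.split(command)


def run_tests(config, cwd="."):
    """Run test_command in `cwd` and report whether it exited with status 0."""
    argv = command_argv(config, "test_command", DEFAULT_TEST_COMMAND)
    return subprocess.run(argv, cwd=cwd).returncode == 0
```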

Examples:

// Node.js project
"test_command": "npm test",
"lint_command": "npm run lint"

// Go project
"test_command": "go test ./...",
"lint_command": "golangci-lint run"

// Rust project
"test_command": "cargo test",
"lint_command": "cargo clippy"

// Python with coverage
"test_command": "python3 -m pytest tests/ --cov=src -q",
"lint_command": "ruff check ."

Example: full config

{
  "enforcement": "block",
  "profile": "standard",
  "gate_mode": "legacy",
  "eval_threshold": 95,
  "tdd": true,
  "tdd_mode": "remind",
  "gate_protect": true,
  "report_protect": true,
  "mode": "normal",
  "continue": true,
  "compact_at_minutes": 70,
  "max_session_minutes": 200,
  "skill_routing": true,
  "auto": false,
  "model": "auto",
  "test_command": "npm test",
  "lint_command": "npm run lint"
}
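
A config like the one above can be sanity-checked before use. The Python sketch below validates only the values listed in the settings tables; `validate_gates` is a hypothetical helper, not a command shipped with the toolkit, and real gates.json files may contain keys it does not know about.

```python
import json

# Allowed values as documented in the settings tables above.
ALLOWED = {
    "enforcement": {"block", "warn"},
    "profile": {"minimal", "standard", "strict", "paranoid"},
    "gate_mode": {"legacy", "signed"},
    "tdd_mode": {"remind", "strict"},
    "mode": {"normal", "strict"},
}
BOOLEANS = ["tdd", "gate_protect", "report_protect", "continue", "skill_routing", "auto"]


def validate_gates(text):
    """Return a list of problems found in a gates.json document (empty if none)."""
    config = json.loads(text)
    problems = []
    for key, allowed in ALLOWED.items():
        value = config.get(key)
        if key in config and not (isinstance(value, str) and value in allowed):
            problems.append(f"{key}: {value!r} not in {sorted(allowed)}")
    for key in BOOLEANS:
        if key in config and not isinstance(config[key], bool):
            problems.append(f"{key}: expected true or false")
    threshold = config.get("eval_threshold", 95)
    is_number = isinstance(threshold, (int, float)) and not isinstance(threshold, bool)
    if not is_number or not 0 <= threshold <= 100:
        problems.append("eval_threshold: expected a number from 0 to 100")
    return problems
```

An empty list means every documented setting that is present has a documented value; unknown keys are ignored rather than rejected.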

→ Full reference: docs/configuration.md · Signed gates: shared/gate-unlock.md


Documentation

| Doc | For |
| --- | --- |
| System overview | How skills, hooks, gates, and reports connect |
| Daily workflow | Commit, push, finalize, gate profiles |
| Install & updates | First setup, auto-sync, manual refresh |
| Other LLMs | Cursor, GPT, Gemini, Windsurf, Aider |
| Skills reference | All 13 skills |
| Configuration | gates.json, presets, signed mode |
| Gate unlock | Legacy vs signed, rare options |
| Troubleshooting | Common failures |
| Guardrails | All G-* rules |
| Architecture index | Design docs, requirements |

Advanced

| Feature | Doc |
| --- | --- |
| Auto-continuation (long tasks) | agent-toolkit-continue (interactive) / claude-auto (headless) · architecture/auto-continuation.md |
| TDD strict mode | "tdd_mode": "strict" in gates.json; blocks source edits until tests exist |
| Strict mode (anti-fake) | shared/strict-mode.md |
| Signed gates (teams / CI) | shared/gate-unlock.md |
| Auto mode (/skill auto) | shared/orchestrator.md |

Troubleshooting

finalize_report.py: No such file or directory

The skill tried to run finalize_report.py using a relative path from a different project directory. Make sure each skill's SKILL.md file uses the absolute path:

python3 /path/to/agent-toolkit/hooks/finalize_report.py <skill> .scratch/<skill>_<slug>/findings.json

BLOCKED: git commit requires precommit skill

Older versions of the gate hook block commits when no gates.json is found, because they assume the project is toolkit-managed. Two fixes:

  1. Upgrade the toolkit — the latest gate hook skips enforcement for repos without gates.json
  2. Temporary bypass — set the env var before your commit:
    AGENT_TOOLKIT_ENFORCEMENT=warn git commit -m "your message"
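
The two fixes above imply a precedence order that can be sketched in Python. This is an assumption-laden illustration, not the hook's source: `effective_enforcement` is a hypothetical function, and it assumes the environment variable wins over gates.json and that the latest hook applies no enforcement when gates.json is absent.

```python
import os


def effective_enforcement(config, env=None):
    """Return "block", "warn", or None (no enforcement) for a commit.

    `config` is the parsed gates.json, or None when the file is absent.
    Hypothetical sketch of the precedence implied by the fixes above.
    """
    env = os.environ if env is None else env
    override = env.get("AGENT_TOOLKIT_ENFORCEMENT")
    if override in ("block", "warn"):
        return override  # one-off override for this invocation
    if config is None:
        return None  # assumed latest behavior: no gates.json, no gating
    return config.get("enforcement", "block")
```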

Run install.sh in project root

The legacy fallback triggers when gates.json exists but has no commit_requires. Either:

  • Add "commit_requires": ["precommit"] to gates.json and run /precommit
  • Or remove gates.json to opt out of gating entirely

Contributing

PRs welcome. Open an issue with battle-tested patterns or bugs you caught.

License

Licensed under the Apache License, Version 2.0. See LICENSE (SPDX: Apache-2.0).

About

Production-ready skills for AI coding agents. 13 skills, 9 agents, harness hooks & quality gates (signed optional for long sessions). Plan, build, test, debug, ship. Any repo, any language. Claude Code native, universal LLM compatible.
