Harness is an AI Agent development scaffold Meta-Skill that establishes four layers of enhancement for any project in one command: knowledge management, architecture constraints, feedback loops, and entropy management.
Optimized for Claude Code: Harness leverages Claude Code's unique Hook system (SessionStart / PreToolUse / PostToolUse / Stop) for system-level behavior enhancement. Combined with the experimental Agent Teams feature, you can spin up multi-role collaboration (Architect / Engineer / Tester) in one prompt. Dual-layer enhancement (Hook system-level + instruction file rule-level) + Skill auto-matching are fully active on Claude Code, delivering the most complete enhancement experience.
Compatible with 9 AI coding tools: Cursor, Windsurf, Cline, GitHub Copilot, Aider, Continue, Devin, and any tool supporting project-level instruction files (via AGENT.md as a generic fallback). Instruction file rules and docs/ documentation work universally across all tools, ensuring core enhancements remain effective regardless of your IDE.
AI Agents write code fast, but "fast" brings four core problems:
| Problem | Symptom | Consequence |
|---|---|---|
| Knowledge gaps | Every new session starts from scratch with no project context | Repeated mistakes, violated conventions |
| No constraints | Bad code exists in the codebase, AI copies and produces more bad code | Security vulnerabilities, architecture decay |
| No feedback | "Confidently declares mission accomplished" when it's actually a mess | Production incidents, rework |
| Entropy increase | Writing fast = garbage piles up fast | Technical debt explosion, outdated documentation |
Harness's solution: establish all four layers of enhancement with a single command at project initialization; they then take effect automatically in every subsequent development session.
But it's not just these 4 — we've identified 24 pain points across 7 categories. Here's how Harness addresses each one ↓
24 common pain points in AI-assisted development, organized by category. Each lists Harness's current solution, strength rating, and planned future enhancements. Detailed version with full problem descriptions and solution architecture →
| # | Pain Point | Current Solution | Strength | Future Enhancement |
|---|---|---|---|---|
| 1 | AI codes before thinking — jumps to implementation without understanding requirements | superpowers brainstorming HARD-GATE: no code until design approved | ★★★★★ | — |
| 2 | Plans collapse mid-task — AI forgets the plan halfway through | planning-with-files 4 Hooks: re-read task_plan.md before every tool call | ★★★★☆ | Auto-detect plan drift (compare actions vs plan) |
| 3 | One-shot answers — AI gives a single solution without exploring alternatives | superpowers brainstorming forces 2+ approaches with trade-offs | ★★★★☆ | — |
| 4 | No adversarial review — nobody challenges the AI's design | Challenger (C) agent role: CLAIM/CHALLENGE/VERIFICATION/VERDICT | ★★★☆☆ | Auto-invoke Challenger after Architect produces a plan |
| # | Pain Point | Current Solution | Strength | Future Enhancement |
|---|---|---|---|---|
| 5 | Context loss after /compact — AI forgets decisions and progress | Compact checkpoint rule: must update progress.md + task_plan.md before compact | ★★★★☆ | Auto-checkpoint hook before compact |
| 6 | New session cold start — AI doesn't know the project | CLAUDE.md (≤150 lines) auto-loaded + docs/ B-tree index for on-demand deep reading | ★★★★★ | — |
| 7 | Repeated mistakes across sessions — same pitfall hit multiple times | claudeception extracts pitfalls into reusable Skills; docs/pitfalls/ records | ★★★★☆ | Auto-match pitfall Skills before coding starts |
| 8 | Context window quality degradation — output quality drops as context grows | Context Recovery 4-step protocol + Token Budget rules (offset+limit, structured output) | ★★★☆☆ | Token pressure monitoring + auto-compact suggestion |
| # | Pain Point | Current Solution | Strength | Future Enhancement |
|---|---|---|---|---|
| 9 | "Done" without verification — AI claims completion without evidence | superpowers verification Iron Law + Stop hook checks Phase completion | ★★★★★ | — |
| 10 | No tests — code ships without test coverage | superpowers TDD Iron Law: "NO CODE WITHOUT FAILING TEST FIRST" | ★★★★★ | — |
| 11 | Skips code review — AI produces code nobody reviews | superpowers code-review dispatches reviewer subagent | ★★★★☆ | Auto-trigger review on PR creation |
| 12 | Security vulnerabilities introduced — AI writes insecure code | 3-layer security: CWE defense in CLAUDE.md + secure-coding.md + security-review Skills | ★★★★☆ | Auto-security-scan on every commit (Enterprise hook) |
| # | Pain Point | Current Solution | Strength | Future Enhancement |
|---|---|---|---|---|
| 13 | Dead code accumulation — commented-out code, unused imports pile up | CLAUDE.md 5 MUST NOT hygiene rules + quality gate Check #5 | ★★★★☆ | Lint integration in quality gate |
| 14 | Documentation goes stale — docs don't match code after changes | Three-tier doc sync: Lite (self-check) → Standard (dynamic grep) → Full (quality gate) | ★★★★☆ | PostToolUse hook for real-time doc sync reminder |
| 15 | Root directory pollution — test scripts, debug files accumulate | harness-cleanup interactive archive + harness-audit checks root cleanliness | ★★★★☆ | — |
| 16 | FIXME/HACK debt — temporary fixes become permanent | CLAUDE.md rule: resolve within 1 week; harness-audit flags stale FIXMEs | ★★★☆☆ | Track FIXME age in quality gate |
| # | Pain Point | Current Solution | Strength | Future Enhancement |
|---|---|---|---|---|
| 17 | API hallucination — AI invents non-existent APIs or library functions | Challenger role verifies claims against source/docs; superpowers Red Flags table | ★★★☆☆ | Auto-verify imports against installed packages |
| 18 | Confident but wrong — AI states incorrect facts with high confidence | Challenger VERDICT system (CONFIRMED/REFUTED/UNVERIFIED) + evidence requirement | ★★★☆☆ | Mandatory citation for architectural claims |
| 19 | Blind copy-paste — AI copies existing bad patterns in the codebase | CLAUDE.md MUST NOT rules + security standards block known bad patterns | ★★★☆☆ | Anti-pattern database from pitfall records |
| # | Pain Point | Current Solution | Strength | Future Enhancement |
|---|---|---|---|---|
| 20 | No role separation — same AI does design, coding, testing, review | Agent Team: Architect / Challenger / Engineer / Tester with strict constraints | ★★★★☆ | Workflow orchestration (auto role transitions) |
| 21 | Experience not captured — hard-won knowledge lost after session ends | claudeception continuous learning + UserPromptSubmit hook evaluation | ★★★★☆ | Auto-extract on session end (not just /claudeception) |
| 22 | No project health visibility — don't know if enhancements are working | harness-audit: scan completeness of CLAUDE.md/docs/hooks/skills, output score | ★★★☆☆ | Trend tracking across audits |
| # | Pain Point | Current Solution | Strength | Future Enhancement |
|---|---|---|---|---|
| 23 | Secret leaks in commits — API keys, credentials committed to git | Enterprise Hook: pre-commit secret scan + CLAUDE.md MUST NOT .env/.key/.pem | ★★★★☆ | Default-on secret scanning (not Enterprise-only) |
| 24 | Supply chain attacks — malicious dependencies slip in | supply-chain-audit Skill (8 languages) + sca-ai-denoise for vulnerability triage | ★★★★☆ | Auto-audit on dependency changes |
Overall: 4 fully solved (★★★★★), 13 strong (★★★★☆), 7 partial (★★★☆☆), 0 unsolved. ★★★★★ = system-level enforcement, ★★★★☆ = strong with minor gaps, ★★★☆☆ = partial, enhancement planned.
┌─────────────────────────────────────────────────────────┐
│ Harness Enhancement System │
├──────────────┬──────────────┬──────────────┬────────────┤
│ Layer One │ Layer Two │ Layer Three │ Layer Four │
│ Knowledge │ Architecture│ Feedback │ Entropy │
│ Mgmt 📋 │ Constraints 🚧│ Loops 🔄 │ Mgmt 🧹 │
│ │ │ │ │
│ CLAUDE.md │ Hook-based │ TDD │ Code │
│ docs/ tree │ enhancement │ Code Review │ hygiene │
│ Agent Team │ Security │ Verification│ Doc sync │
│ Skill │ standards │ gates │ Pitfall │
│ ecosystem │ CWE defense │ Security │ records │
│ │ Behavior │ review │ Knowledge │
│ │ red lines │ │ extraction│
└──────────────┴──────────────┴──────────────┴────────────┘
Problem: The AI Agent doesn't know your project's background, conventions, or habits.
Harness's solution:
1. CLAUDE.md — The AI's Onboarding Manual
After automatically analyzing the project, Harness generates a lean CLAUDE.md (≤150 lines) that serves as the AI's first reading material at the start of every session:
# MyProject
One-line description
## Documentation Navigation
| Category | Path | Content |
|----------|------|---------|
| Architecture | docs/architecture/ | System architecture, tech stack, DB, API |
| Dev Conventions | docs/conventions/ | must-follow / must-not / secure-coding |
| Pitfall Records | docs/pitfalls/ | Categorized by tech stack |
## Behavior Rules (MUST FOLLOW)
- MUST brainstorm before writing code (HARD-GATE)
- MUST write tests before implementation (TDD)
- MUST security review before committing
- MUST NOT leave dead code / debug output
Why "slim down"? Because CLAUDE.md is read in full at every session. Stuffing 500 lines into it = wasted tokens + key information buried. Detailed content is split into docs/ sub-documents, loaded on demand.
2. docs/ — Multi-Level Index Tree (AI's B-tree Memory)
AI's memory isn't a pile of documents — it's an index tree. Each level stores only pointers; only leaf nodes store content:
L0: CLAUDE.md (≤150 lines)
→ 5 category pointers, zero actual content
│
L1: docs/xxx/INDEX.md (≤50 lines)
→ Module list + one-line summary + last updated date
│
L2: docs/xxx/module/INDEX.md (≤30 lines)
→ Leaf document list + timeline (when was what added/changed)
│
L3: docs/xxx/module/topic.md (≤150 lines)
→ Actual content (the ONLY level with real content)
AI memory recovery path: read L0 → determine direction → read L1 → locate module → read L2 → find document → read L3. Only one index level is read at a time, minimizing token consumption.
Auto-scaling: In early projects, L1 points directly to leaves (two levels suffice). As the project grows, L2 subdirectories naturally emerge. When an L1 INDEX exceeds 10 entries → upgrade to three levels. When a leaf exceeds 150 lines → auto-split.
docs/
├── architecture/INDEX.md → system-overview.md, tech-stack.md, db-schema.md, api-reference.md
├── implementation/INDEX.md → auth/INDEX.md, payment/INDEX.md, export/INDEX.md ...
│ └── auth/INDEX.md → oauth2-flow.md, rbac-model.md, session-mgmt.md + timeline
├── conventions/INDEX.md → must-follow.md, must-not.md, secure-coding.md
├── pitfalls/INDEX.md → docker/INDEX.md, database/INDEX.md ... + timeline
└── backlog/INDEX.md → optimization.md, features.md
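The line budgets in the index tree can be checked mechanically. The sketch below is one possible way to do that, not Harness's actual implementation: the names `level_of` and `check_budget` are hypothetical, and it assumes the docs/ layout shown above, where every INDEX.md is an index node and any other file is a leaf.

```python
from pathlib import PurePosixPath
from typing import Optional

# Line budgets taken from the index-tree description above.
BUDGETS = {"L0": 150, "L1": 50, "L2": 30, "L3": 150}


def level_of(path: str) -> str:
    """Classify a file by its position in the index tree (hypothetical helper)."""
    p = PurePosixPath(path)
    if p.name == "CLAUDE.md":
        return "L0"
    if p.name == "INDEX.md":
        # docs/xxx/INDEX.md is L1; docs/xxx/module/INDEX.md is L2.
        return "L1" if len(p.parts) == 3 else "L2"
    return "L3"  # any other file is treated as a leaf document


def check_budget(path: str, text: str) -> Optional[str]:
    """Return a warning if the file exceeds its level's line budget, else None."""
    level = level_of(path)
    lines = len(text.splitlines())
    limit = BUDGETS[level]
    if lines > limit:
        action = "split the leaf" if level == "L3" else "move detail down a level"
        return f"{path}: {lines} lines > {limit} ({level}); {action}"
    return None
```

Running `check_budget` over every file under docs/ would give a simple report of which nodes have outgrown their level.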
3. Agent Team — Role-Based Division of Labor
Different roles read different docs and follow different constraints:
| Role | Responsibilities | Constraints |
|---|---|---|
| Architect (A) | Planning, design, interaction, commits, docs | Must brainstorm → user approval → write design doc |
| Challenger (C) | Adversarial review of plans, designs, and claims | Never accept claims without evidence; verify API usage, thread safety, edge cases |
| Engineer (E) | Coding, fixes, refactoring | Must TDD, must not touch architecture-level config |
| Tester (T) | Write tests, verify | Must not modify business code, only report bugs |
Trigger methods:
- Natural language: "Have Agent B implement this feature" / "Have Agent C run tests to verify"
- Agent tool: "You are Agent B (Engineer). Read .harness/agents/agent-b-engineer.md for your responsibilities. Task: ..."
- Isolation mode: Agent tool + isolation: "worktree" (separate branch for complex tasks)
Extensible: Frontend / Backend / DevOps / DBA / Security.
4. Skill Ecosystem — Reusable Capability Library
Harness doesn't reinvent the wheel — it references and orchestrates existing open-source Skills:
~/.claude/skills/
├── superpowers/ ← Behavior control methodology (14 sub-Skills)
├── planning-with-files/ ← Plan persistence
├── claudeception/ ← Knowledge extraction & automatic Skill generation
├── skill-creator/ ← General-purpose Skill generator
├── security-review-skill-creator/ ← Security audit Skill generator
├── frontend-design/ ← Frontend development
├── web-vuln-analyzer/ ← Web vulnerability analysis
├── sca-ai-denoise/ ← SCA vulnerability denoising
├── supply-chain-audit/ ← Supply chain poisoning detection
└── harness/ ← This Skill (for project initialization)
Problem: Written rules alone aren't enough — AI will rationalize skipping them. You need an "access control system."
Harness's solution: Dual-layer enhancement + Skill auto-matching (Open-Source vs Enterprise Mode)
┌──────────────────────────────────────────────────────┐
│ Layer 1: Hooks — Access Control │
│ │
│ ── Core Hooks (enabled by default) ── │
│ SessionStart → Auto-inject superpowers methodology│
│ UserPromptSubmit → Show plan status + knowledge │
│ extraction reminder │
│ PreToolUse → Re-read task_plan.md before every │
│ tool call │
│ PostToolUse → Remind to update progress after │
│ writes │
│ Stop → Check completion status on exit │
│ │
│ ── Security Gate Hooks [Optional/Enterprise] ── │
│ PreToolUse(Bash) → git commit secret/sensitive │
│ file interception │
│ PreToolUse(Bash) → commit format validation │
│ PreToolUse(Bash) → dangerous command interception │
│ PostToolUse(W/E) → code security anti-pattern │
│ scan (WARNING) │
├──────────────────────────────────────────────────────┤
│ Layer 2: CLAUDE.md — Written Rules (instruction- │
│ level, strongly persuasive) │
│ │
│ MUST brainstorm → MUST /plan → MUST TDD │
│ MUST security review → MUST code review │
│ MUST NOT dead code → MUST NOT claim completion │
│ without verification │
│ MUST NOT eval()/exec() with user input — CWE-95 │
│ MUST NOT shell=True with user args — CWE-78 │
│ MUST NOT f-string SQL concatenation — CWE-89 │
├──────────────────────────────────────────────────────┤
│ Skill Auto-matching — Semantic Trigger │
│ (skill description matched to current task) │
│ │
│ Iron Laws: "NO CODE WITHOUT FAILING TEST FIRST" │
│ Red Flags: 13 common "rationalized skip" excuses │
│ → intercepted one by one │
│ HARD-GATE: No code allowed until design is approved │
└──────────────────────────────────────────────────────┘
Open-Source Mode vs Enterprise Mode:
| Mode | Layer 1 | Layer 2 | Skill Auto-matching | Security Assurance |
|---|---|---|---|---|
| Open-Source (default) | Core Hooks (3 Skills) | MUST/MUST NOT text rules | Skill semantic trigger | AI self-discipline + text rules |
| Enterprise (optional) | Core + 4 security gate Hooks | Same + HOOK enforcement layer | Same | System-level technical gates |
- Open-Source Mode (default): Zero additional blocking. Security relies on CLAUDE.md MUST/MUST NOT rules for AI self-discipline. Ideal for individual developers and open-source projects.
- Enterprise Mode (optional): Enables 4 security gate Hook scripts (pre-commit / commit-msg / dangerous-cmd / write-scan) for system-level enforcement. Ideal for enterprise teams and projects with high compliance requirements.
- Enterprise mode Hook scripts are in references/hook-scripts.md; users choose whether to enable them during Harness initialization Step 2.
Why dual-layer enhancement + Skill auto-matching?
- Hooks are the strongest safeguard: system-level enforcement, AI cannot skip them (Claude Code only)
- CLAUDE.md is the fallback: rules remain effective even without Hook configuration (universal across all AI tools)
- Skills are the semantic trigger: superpowers' Iron Laws + Red Flags make the AI "automatically stop when tempted to skip"
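To make the "access control" idea concrete, here is a minimal sketch of what a PreToolUse(Bash) secret gate could look like. It is not the script shipped in references/hook-scripts.md. It assumes the hook receives a JSON payload on stdin with `tool_name` and `tool_input.command` fields, and that exiting with code 2 blocks the tool call and returns stderr to the agent; check both against the current Claude Code hooks documentation. The function names are hypothetical.

```python
import json
import subprocess
import sys
from typing import Optional

# File suffixes the CLAUDE.md rules above forbid committing.
SENSITIVE_SUFFIXES = (".env", ".key", ".pem")


def block_reason(event: dict, staged_files: list) -> Optional[str]:
    """Return a reason to block the tool call, or None to allow it."""
    if event.get("tool_name") != "Bash":
        return None
    command = event.get("tool_input", {}).get("command", "")
    if "git commit" not in command:
        return None
    leaked = [name for name in staged_files if name.endswith(SENSITIVE_SUFFIXES)]
    if leaked:
        return "Refusing to commit sensitive files: " + ", ".join(leaked)
    return None


def main() -> int:
    """Hook entry point: read the event from stdin and inspect the git index."""
    event = json.load(sys.stdin)
    staged = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True,
    ).stdout.splitlines()
    reason = block_reason(event, staged)
    if reason:
        print(reason, file=sys.stderr)
        return 2  # assumed blocking exit code
    return 0
```

A real hook script would end with `sys.exit(main())`. Keeping the decision in `block_reason` makes the rule testable without a git repository.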
Security Standards (hard control, non-negotiable):
docs/conventions/secure-coding.md contains three parts, all mandatory baselines:
Part A: 15 high-risk CWE defenses (SQL injection/command injection/XSS/SSRF/deserialization...)
→ All code changes must comply, no exceptions
Part B: OWASP Top 10 coding standards (13 rules, with code examples)
→ Mandatory baselines when writing code
Part C: AI Agent security red lines:
❌ No reverse shells / C2 callbacks
❌ No intranet tunneling / port forwarding to external
❌ No data exfiltration / credential theft
❌ No backdoor installation / hidden user creation
❌ No privilege escalation / disabling security mechanisms
❌ No code obfuscation / supply chain poisoning
Security standards are written into CLAUDE.md behavior rules to ensure they take effect automatically at every session. For in-depth audits, the security-review skill is triggered for a full check.
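Two of the MUST NOT rules listed earlier (CWE-89 and CWE-78) have well-known safe counterparts in Python. The sketch below illustrates them with the standard library only; the table, the helper names, and the sample data are made up for the example and are not taken from secure-coding.md.

```python
import sqlite3
import subprocess
import sys


def find_user(conn: sqlite3.Connection, name: str) -> list:
    # CWE-89: pass user input as a bound parameter, never via f-string concatenation.
    cursor = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cursor.fetchall()


def echo_argument(user_arg: str) -> str:
    # CWE-78: pass arguments as a list and keep shell=False (the default),
    # so shell metacharacters in user_arg are never interpreted.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_arg],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Illustrative in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
```

With these helpers, an injection attempt such as `alice' OR '1'='1` is treated as a literal name and matches nothing, and `hello; echo pwned` is echoed back as one argument instead of being run as two commands.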
Problem: After finishing work, the AI doesn't know if it did well, and it will "confidently declare mission accomplished."
Harness's solution: Multiple verification mechanisms
Code complete
│
├─ Automatic feedback: Tests
│ TDD Iron Law → Every line of code has a corresponding test
│ Finish code → Run tests immediately → Red/green light instant feedback
│
├─ Agent reviews Agent: Code Review
│ superpowers:requesting-code-review
│ → Dispatch code-reviewer subagent (another AI reviews)
│ → Checks: correctness / security / test coverage / performance / compatibility
│
├─ Security Review
│ → MUST execute security review checklist before committing
│ → Has project-specific security-review skill → auto-audit
│ → Doesn't have one → security-review-skill-creator generates one first
│
├─ Completion Verification
│ superpowers:verification-before-completion
│ Iron Law: NO COMPLETION CLAIMS WITHOUT FRESH EVIDENCE
│ → Tests pass + lint passes + security review passes → only then can claim "done"
│
└─ Debugging Feedback: Systematic Debugging
superpowers:systematic-debugging
Iron Law: NO FIXES WITHOUT ROOT CAUSE
→ 4 stages: root cause investigation → pattern analysis → hypothesis verification → implement fix
superpowers' "anti-rationalization" design:
The AI's biggest problem isn't not knowing the rules — it's being skilled at finding excuses to skip rules. superpowers addresses this with a "Red Flags table" — listing 13 common excuses and intercepting each one:
| AI's Inner Monologue | Reality |
|---|---|
| "This is just a simple problem" | Simple problems still need checking for applicable Skills |
| "Write code first, add tests later" | That's not TDD, that's "tests as an afterthought" |
| "Time is tight, skip Review" | Code shipped without Review takes even longer to fix later |
| "Rewriting from scratch is wasteful" | Sunk cost fallacy — the time has already been spent |
Problem: AI works fast = technical debt accumulates fast. Docs go stale, dead code piles up, experience isn't captured.
Harness's solution: Continuous cleaning + knowledge crystallization
1. Code Hygiene (before every commit)
MUST DO:
✅ Delete unused code (don't comment it out — git has history)
✅ Delete debug print / console.log
✅ Delete unused imports / variables / functions
✅ Delete temporary files (test scripts go in tests/)
✅ Clean up unused dependencies
MUST NOT:
❌ Don't leave "just in case" commented-out code
❌ Don't leave empty except/catch blocks
❌ Don't pile up scripts in the root directory
❌ Don't leave FIXME/HACK unresolved for more than 1 week
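Some of these hygiene rules lend themselves to a quick automated pass. The following is a rough heuristic sketch, not the check that the Harness quality gate runs: it flags lines that start with `print(` or `console.log(` and lines that contain FIXME/HACK markers. The function name is hypothetical, and legitimate `print` calls would need an allowlist in practice.

```python
import re

# Heuristic patterns: a line that starts with a debug call, or any FIXME/HACK marker.
DEBUG_OUTPUT = re.compile(r"^\s*(print\(|console\.log\()")
DEBT_MARKER = re.compile(r"\b(FIXME|HACK)\b")


def hygiene_findings(source: str) -> list:
    """Return (line_number, issue) pairs for debug output and FIXME/HACK markers."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        if DEBUG_OUTPUT.search(line):
            findings.append((number, "debug output"))
        if DEBT_MARKER.search(line):
            findings.append((number, "FIXME/HACK marker"))
    return findings
```

Checking the one-week limit on FIXME/HACK would additionally need the age of each line, for example from `git blame`, which this sketch leaves out.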
2. Documentation Sync (three-tier quality gate)
Documentation sync is enforced at three levels to balance thoroughness with speed:
| Level | Trigger | What Happens |
|---|---|---|
| Lite | After editing source code (CLAUDE.md behavior rule) | Self-check: does a matching doc in docs/ exist? If yes and content affected → update now |
| Standard | Claiming "done" or "complete" | Auto quality gate: dynamic scan of git diff + grep docs/ for references to changed modules |
| Full | "quality gate" or "ready to commit" | All 7 checks including tests, lint, security, doc sync, hygiene, progress, commit format |
The quality gate dynamically detects which docs need updating (no hardcoded file mappings) — it scans docs/ structure and greps for references to changed files.
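The "no hardcoded file mappings" idea can be sketched as a plain text search. The example below is an assumption about how such a scan might work, not the quality gate's real code: `docs_to_review` is a hypothetical name, and the inputs are passed in directly, where a real gate would take them from `git diff --name-only` and a walk of docs/.

```python
from pathlib import PurePosixPath


def docs_to_review(changed_files: list, docs: dict) -> list:
    """Return doc paths that mention the path or module name of any changed file.

    `docs` maps a doc path to its text content.
    """
    needles = set()
    for changed in changed_files:
        needles.add(changed)                      # full path, e.g. src/export/exporter.py
        needles.add(PurePosixPath(changed).stem)  # module name, e.g. exporter
    return sorted(
        path for path, text in docs.items()
        if any(needle in text for needle in needles)
    )
```

Matching on the bare module name is deliberately loose; short names such as `utils` would produce false positives, so a stricter gate might match full paths only.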
3. Pitfall Records (auto-triggered after debugging for >10 minutes)
Choose either approach:
Approach A: Write to docs/pitfalls/
## Problem Title
Symptom → Root Cause → Solution → Prevention
Approach B: /claudeception to generate a Skill
→ Automatically extract reusable knowledge from the pitfall experience
→ Generate .claude/skills/<pitfall-name>/SKILL.md
→ Next time a similar problem occurs, the Skill triggers automatically
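For Approach A, the four-part record format is easy to keep consistent with a small template. This is only an illustrative sketch: the `Pitfall` class and its field names are hypothetical, and the exact markdown layout of docs/pitfalls/ entries is assumed rather than documented here.

```python
from dataclasses import dataclass


@dataclass
class Pitfall:
    title: str
    symptom: str
    root_cause: str
    solution: str
    prevention: str

    def to_markdown(self) -> str:
        """Render the record in Symptom, Root Cause, Solution, Prevention order."""
        return (
            f"## {self.title}\n"
            f"- Symptom: {self.symptom}\n"
            f"- Root Cause: {self.root_cause}\n"
            f"- Solution: {self.solution}\n"
            f"- Prevention: {self.prevention}\n"
        )


# Illustrative record; the details are invented for the example.
record = Pitfall(
    title="Export times out on large datasets",
    symptom="Export request exceeds the gateway timeout",
    root_cause="All rows are loaded into memory before writing",
    solution="Stream rows to the response in chunks",
    prevention="Add a large-data export test",
)
```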
4. Knowledge Crystallization (claudeception — Continuous Learning System)
Work session
│
├─ UserPromptSubmit hook continuously reminds:
│ "Is there any non-obvious knowledge extractable from this task?"
│
├─ Trigger conditions:
│ • Debugged for >10 minutes on a non-documented issue
│ • Discovered a workaround through trial and error
│ • Project-specific non-obvious pattern
│
├─ Quality gate (all 5 criteria must be met):
│ ✅ Reusable (not just useful this one time)
│ ✅ Non-trivial (not something findable in documentation)
│ ✅ Specific (has clear trigger conditions and steps)
│ ✅ Verified (confirmed the solution works)
│ ✅ Actionable (general enough to reuse, specific enough to execute)
│
└─ Output: new SKILL.md
→ Stored in .claude/skills/ (project-level)
→ Or ~/.claude/skills/ (user-level, shared across projects)
→ Automatically matched and triggered on similar problems next time
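The "all 5 criteria must be met" gate is a simple conjunction. A minimal sketch, assuming each criterion has already been judged true or false (the class and function names are hypothetical, and how claudeception actually scores a candidate is not shown in this document):

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Candidate:
    reusable: bool
    non_trivial: bool
    specific: bool
    verified: bool
    actionable: bool


def failed_criteria(candidate: Candidate) -> list:
    """Names of the criteria the candidate does not meet."""
    return [f.name for f in fields(candidate) if not getattr(candidate, f.name)]


def passes_gate(candidate: Candidate) -> bool:
    """A new SKILL.md is produced only when no criterion fails."""
    return not failed_criteria(candidate)
```

Returning the failed criteria, rather than only a boolean, lets the agent explain why a piece of knowledge was not turned into a Skill.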
Complete Workflow: Initialization → Feature Implementation → Code Review → Wrap-up →
A feature goes through 7 phases, each with independent verification:
Phase 0: [explore](bundled-skills/explore/SKILL.md) — Understand existing architecture via knowledge graph
Phase 1: brainstorming → spec (HARD-GATE, no code until approved)
Phase 1.5: [design-review](bundled-skills/design-review/SKILL.md) spec — Independent challenger verifies spec references
Phase 2: writing-plans → plan → [design-review](bundled-skills/design-review/SKILL.md) plan — Challenger verifies plan + spec coverage
Phase 3: subagent-driven-development — TDD execution per task
Phase 4: [ai-implementation-integrity](../../ai-implementation-integrity/SKILL.md) — Dead code / hallucinated API / chain connectivity checks
Phase 5: code-review + security-review
Phase 6: finish branch + claudeception (knowledge extraction)
Every phase has an independent verifier — no phase trusts the previous phase's agent. Read the full workflow →
In any project directory:
You: harness
Harness interactively executes 8 steps:
Step 1: Analyze project (language/framework/structure) → Display project profile → User confirms
Step 2: Install Skill ecosystem + configure Hooks → Dual-layer enhancement ready
Step 3: Deep information gathering (3 parallel Agents read code/history/docs) → Generate CLAUDE.md + docs/
Step 4: Design Agent Team (interactive role selection + trigger methods)
Step 5: Inject development conventions + security standards
Step 6: Create .harness/ planning infrastructure
Step 7: Display summary + getting started guide
Step 8: Scenario integration verification (11 scenario coverage check)
Resulting project structure:
project/
├── CLAUDE.md ← Slim index + behavior rules
├── docs/ ← Multi-level documentation system
│ ├── architecture/
│ ├── implementation/
│ ├── conventions/ ← must-follow + must-not + secure-coding
│ ├── pitfalls/ ← Pitfall records (accumulated during development)
│ └── backlog/
├── .harness/ ← Agent Team + plan templates
│ ├── agents/
│ ├── plans/
│ └── templates/
└── .claude/settings.json ← Hook configuration
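For orientation, the Hook configuration in .claude/settings.json is a JSON object keyed by hook event. The sketch below builds such an object in Python. The nesting (`hooks` → event name → list of entries with an optional `matcher` and a list of `command` hooks) follows the Claude Code settings format as I understand it, so verify it against the current documentation; the script paths are hypothetical placeholders, not files Harness is known to install.

```python
import json


def hook_entry(command: str, matcher: str = "") -> dict:
    """One hook entry; `matcher` restricts it to a tool name such as Bash."""
    entry = {"hooks": [{"type": "command", "command": command}]}
    if matcher:
        entry["matcher"] = matcher
    return entry


def build_settings() -> dict:
    # Script paths below are placeholders for illustration only.
    return {
        "hooks": {
            "PreToolUse": [
                hook_entry("python3 .claude/hooks/reread_plan.py"),
                hook_entry("python3 .claude/hooks/secret_scan.py", matcher="Bash"),
            ],
            "Stop": [hook_entry("python3 .claude/hooks/check_completion.py")],
        }
    }


settings_json = json.dumps(build_settings(), indent=2)
```

Printing `settings_json` shows the structure that would be written to the settings file.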
After initialization, every subsequent development session automatically enters the enhancement system:
Developing a new feature:
You: Help me build a user export feature
Agent (auto-triggers brainstorming HARD-GATE):
→ Don't write code yet, let me understand the requirements
→ Clarifying questions: Export format? Data volume? Access control?
→ Propose 2 approaches + trade-offs
→ You choose an approach → Write design doc
Agent (auto-triggers writing-plans):
→ Split into 5 small tasks (2-5 minutes each)
→ Write to task_plan.md
Agent (auto-triggers TDD):
→ Write tests first: test_export_csv / test_export_permission / test_export_large_data
→ Tests red → Write implementation → Tests green
Agent (auto-triggers code-review):
→ Dispatch reviewer subagent for review
→ Review passes → security review → verification → done
Fixing a bug:
You: Users report the export feature is timing out
Agent (auto-triggers systematic-debugging):
→ Don't rush to change code, find the root cause first
→ 4 stages: investigate → analyze patterns → verify hypothesis → fix
Agent (after fix, claudeception hook triggers):
→ "This debugging session found that exporting large datasets needs streaming — record as a Skill?"
→ Generate .claude/skills/export-streaming-fix/SKILL.md
Security audit:
You: Generate a security audit skill for this project
Agent (triggers security-review-skill-creator):
→ Analyze project tech stack (Python + FastAPI + PostgreSQL)
→ Generate customized audit rules
→ Store in .claude/skills/security-review-skill-for-myproject/
You: Audit the code for security
Agent (triggers the generated audit Skill):
→ Audit item by item using project-specific rules
→ Output findings + remediation recommendations
Harness doesn't reinvent the wheel — it orchestrates existing open-source Skills to build the enhancement system:
┌─────────────┐
│ Harness │ ← Project initialization entry point
│ Meta-Skill │
└──────┬──────┘
│ orchestrates
┌───────────────────┼───────────────────┐
│ │ │
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│ superpowers │ │ planning- │ │claudeception│
│ (obra) │ │ with-files │ │ (blader) │
│ │ │ (OthmanAdi) │ │ │
│ 14 sub-Skills│ │ │ │ Knowledge │
│ brainstorming│ │ /plan cmd │ │ extraction │
│ writing-plans│ │ 4 Hooks │ │ Skill gen │
│ TDD │ │ Session │ │ Quality │
│ debugging │ │ recovery │ │ gates │
│ code-review │ │ │ │ │
│ verification │ │ │ │ │
└──────────────┘ └─────────────┘ └─────────────┘
│
│ generates
┌──────▼──────────────────────────────────────┐
│ Skill Factory │
│ skill-creator → General Skill generation │
│ security-review-skill-creator → Security │
│ audit generation │
│ superpowers:writing-skills → TDD-style │
│ Skill writing │
│ claudeception → Extract Skills from │
│ experience │
└─────────────────────────────────────────────┘
│
│ outputs
┌──────▼──────────────────────────────────────┐
│ Project-Specific Skills (.claude/skills/) │
│ security-review-skill-for-<project> │
│ <pitfall-name> Skill │
│ <workflow-name> Skill │
└──────────────────────────────────────────────┘
| Skill | Source | Hook Mechanism | Enhancement Role |
|---|---|---|---|
| superpowers | obra/superpowers | SessionStart injects methodology | Layer Two (constraints) + Layer Three (feedback) |
| planning-with-files | OthmanAdi/planning-with-files | 4 Hooks continuously inject plans | Layer One (knowledge) + Layer Four (entropy) |
| claudeception | blader/Claudeception | UserPromptSubmit reminder | Layer Four (entropy) + Layer One (knowledge) |
A core design goal of Harness is that you don't need to manually invoke Skills during day-to-day development. The dual-layer enhancement mechanism (Hook + CLAUDE.md + Skill description) makes key methodologies take effect automatically in the background.
Automatic Triggering (Hook + CLAUDE.md rules — no manual action needed)
| Skill | Trigger Mechanism | What You Experience |
|---|---|---|
| superpowers (brainstorming) | SessionStart hook + CLAUDE.md HARD-GATE | Say "build a new feature" → Claude automatically asks requirements, proposes solutions, refuses to write code directly |
| superpowers (TDD) | SessionStart hook Iron Law | Claude automatically writes tests before implementation |
| superpowers (systematic-debugging) | SessionStart hook | Say "this bug" → Claude automatically follows root cause analysis instead of blind fixes |
| superpowers (verification) | SessionStart hook + Stop hook | Claude automatically runs verification before claiming "done" |
| planning-with-files | 4 Hooks full coverage | Every input shows plan status, re-reads plan before tool calls, reminds to update progress after writes |
| claudeception (evaluation) | UserPromptSubmit hook | On every input, Claude internally evaluates "is there extractable knowledge" but won't interrupt you |
| harness-quality-gate | CLAUDE.md commit rules | Claude auto-checks tests/lint/security/doc sync before commits |
Manual Triggering (on-demand)
| Skill | How to Trigger | Notes |
|---|---|---|
| superpowers (code-review) | "Help me review this code" | You must initiate; Claude dispatches a reviewer subagent |
| claudeception (generate Skill) | "/claudeception" or "summarize this as a skill" | Hook only evaluates; actual extraction requires you to say the word |
| security-review-skill-creator | "Generate a security audit skill for this project" | On-demand project-specific audit rule generation |
| skill-creator | "Help me create a skill for XX" | On-demand workflow codification |
| harness-audit | "harness audit" | On-demand project health check |
| harness-guide | "harness guide" or "recommend a skill" | On-demand Skill recommendation |
| harness-cleanup | "harness cleanup" or "clean up temp files" | Interactive archive of temp files to archive/ |
| harness-resume | "harness resume" | Lightweight context recovery after /compact (~3k tokens) |
| harness-handoff | "harness handoff" | Deep handoff for new agent takeover (~8k tokens) |
In short: Hooks handle the automatic, you handle the decisions. During daily coding, brainstorming/TDD/debugging/planning all work automatically — you only need to speak up when you want a review, want to capture experience, or want to generate a new Skill.
After Harness is set up, the following scenarios automatically receive protection:
| # | Scenario | Protection Mechanism |
|---|---|---|
| 1 | Feature Development | SessionStart injects methodology → planning 4 hooks track → CLAUDE.md TDD/Review rules → claudeception knowledge extraction |
| 2 | Bug Debugging | SessionStart injects systematic-debugging → planning tracks → claudeception pitfall reminder |
| 3 | Plan Execution | planning-with-files 4 hooks full coverage (display → re-read → remind to update → completion check) |
| 4 | Architecture Changes | CLAUDE.md HARD-GATE → superpowers brainstorming → docs/ sync rules |
| 5 | Task Completion | superpowers verification → planning Stop hook checks Phase completion |
| 6 | Refactoring | planning 4 hooks + .harness/templates/refactor.md (Invariants constraints) + TDD |
| 11 | Code Hygiene | CLAUDE.md 5 MUST NOT rules + claudeception reminder |
| # | Scenario | Protection Mechanism | Notes |
|---|---|---|---|
| 7 | PR / Code Review | CLAUDE.md review rules + superpowers:requesting-code-review | User-initiated, semantic-level is sufficient |
| 9 | DB Migration | planning tracks + CLAUDE.md doc sync rules | High-risk but low-frequency, plan constraints are sufficient |
| # | Scenario | Coverage Method |
|---|---|---|
| 8 | Dependency Updates / SCA | Security scanning platform SCA scan + sca-ai-denoise denoising |
| 10 | Hotfix / Emergency Fix | Shares mechanism with Scenario 1 + security scanning platform incremental scan |

     New feature request
              │
              ▼
    ┌─────────────────────┐
    │ 1. Brainstorming    │ ← superpowers HARD-GATE
    │   Explore context   │   No code allowed until design is approved
    │   Clarify needs     │
    │   Propose approaches│
    │   + trade-offs      │
    │   User approves     │
    └─────────┬───────────┘
              ▼
    ┌─────────────────────┐
    │ 2. Writing Plans    │ ← superpowers + planning-with-files
    │   Split into small  │   task_plan.md persisted
    │   tasks             │   Hooks continuously inject status
    │   2-5 min each      │
    └─────────┬───────────┘
              ▼
    ┌─────────────────────┐
    │ 3. TDD              │ ← superpowers Iron Law
    │   Write failing test│   "NO CODE WITHOUT FAILING TEST"
    │   Write minimal impl│
    │   Tests pass        │
    └─────────┬───────────┘
              ▼
    ┌─────────────────────┐
    │ 4. Code Review      │ ← superpowers:requesting-code-review
    │   Subagent reviews  │   Agent reviews Agent
    └─────────┬───────────┘
              ▼
    ┌─────────────────────┐
    │ 5. Verification     │ ← superpowers Iron Law
    │   Tests pass?       │   "NO COMPLETION WITHOUT EVIDENCE"
    │   Lint passes?      │
    └─────────┬───────────┘
              ▼
    ┌─────────────────────┐
    │ 6. Knowledge Capture│ ← claudeception + pitfalls
    │   Extractable       │   /claudeception or docs/pitfalls/
    │   knowledge?        │   PostToolUse hook reminder
    │   Docs need update? │
    └─────────────────────┘
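Step 5's rule, "NO COMPLETION WITHOUT EVIDENCE", amounts to running the checks and refusing to report success unless they pass. The following is a minimal shell sketch of that idea. `verify_completion` and its two arguments are hypothetical names chosen for illustration; they are not part of superpowers or Harness, and the commands you pass in depend on your project.

```shell
# Hypothetical sketch: report completion only when tests and lint both pass.
# verify_completion is an illustrative name, not a Harness or superpowers command.
verify_completion() {
  local test_cmd="$1" lint_cmd="$2"

  # Run the project's test command; any non-zero status blocks completion.
  if ! sh -c "$test_cmd"; then
    echo "NOT DONE: tests failed" >&2
    return 1
  fi

  # Run the project's lint command under the same rule.
  if ! sh -c "$lint_cmd"; then
    echo "NOT DONE: lint failed" >&2
    return 1
  fi

  echo "VERIFIED: tests and lint passed"
}
```

In a Node project, for example, this might be called as `verify_completion "npm test" "npm run lint"`; substitute whatever test and lint commands your stack uses.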
Harness is not limited to Claude Code: it also works with the other AI coding tools listed below. During initialization, it detects the current environment and generates the matching instruction file:
| AI Tool | Instruction File | Format |
|---|---|---|
| Claude Code | CLAUDE.md | Markdown (tables/links/code blocks) |
| Cursor | .cursorrules or .cursor/rules/*.md | Plain text (indentation instead of tables) |
| Windsurf | .windsurfrules | Plain text |
| Cline | .clinerules | Plain text |
| GitHub Copilot | .github/copilot-instructions.md | Markdown |
| Aider | CONVENTIONS.md | Markdown |
| Continue | .continuerules | Plain text |
| Devin | devin.md | Markdown |
| Generic / Unknown | AGENT.md | Markdown |
Same content, adapted format: All instruction files contain the same project knowledge, behavior rules, and documentation navigation — only the format is adjusted per tool (Markdown tools get tables/links, plain text tools get indentation/lists).
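The tool-to-file mapping in the table can be written as a simple lookup. This is only a sketch: the function name `instruction_file_for` and the tool keys (`claude-code`, `cursor`, and so on) are hypothetical, and how Harness actually detects the environment is not described here, so the example covers the lookup alone.

```shell
# Hypothetical helper: map a tool key to the instruction file from the table.
# The keys are illustrative; Harness's real detection logic is not shown here.
instruction_file_for() {
  case "$1" in
    claude-code) echo "CLAUDE.md" ;;
    cursor)      echo ".cursorrules" ;;
    windsurf)    echo ".windsurfrules" ;;
    cline)       echo ".clinerules" ;;
    copilot)     echo ".github/copilot-instructions.md" ;;
    aider)       echo "CONVENTIONS.md" ;;
    continue)    echo ".continuerules" ;;
    devin)       echo "devin.md" ;;
    *)           echo "AGENT.md" ;;   # generic fallback for unknown tools
  esac
}
```

Falling through to `AGENT.md` in the default branch mirrors the "Generic / Unknown" row, so an unrecognized tool still gets an instruction file.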
Portability of Dual-Layer Enhancement:
- Layer 1 (Hooks): Claude Code only — other tools skip this layer
- Layer 2 (Instruction file rules): Universal across all tools — only the filename differs
- docs/ documentation: Fully universal — all AI tools can read Markdown docs
Harness provides 7 post-initialization commands, installed as independent Skills in bundled-skills/:
| Command | Trigger Words | Description |
|---|---|---|
| harness help | "harness help", "harness commands", "what commands" | Command index + installed Skill inventory + scenario quick entries |
| harness audit | "harness audit", "project health check", "harness status" | Scan CLAUDE.md/docs/hooks/skill completeness, output score + remediation |
| harness quality gate | "quality gate", "ready to commit", or auto on "done" | 3 levels: Lite (doc sync), Standard (hygiene+docs+progress), Full (all 7 checks) |
| harness guide | "recommend skill", "which skill", "skill recommendation" | Read scenario→Skill recommendation matrix, match the best Skill |
| harness cleanup | "harness cleanup", "clean up temp files", "archive temp files" | Interactive temp file scan, confirm then archive to archive/ (never deletes) |
| harness resume | "harness resume", "resume context after compact" | Lightweight context recovery (same session, ~3k tokens) |
| harness handoff | "harness handoff", "new agent takeover" | Deep context handoff (cross-session/crash recovery, ~8k tokens) |
Harness packages 9 security/development Skills + 7 command Skills, deployed to ~/.claude/skills/ via symlink during installation:

    # Batch install: link every bundled Skill into ~/.claude/skills/
    for skill in ~/.claude/skills/harness-en/bundled-skills/*/; do
      name=$(basename "$skill")
      # -n keeps a re-run from following an existing link into the Skill directory
      ln -sfn "${skill%/}" ~/.claude/skills/"$name"
    done

| Category | Skill | Description | Config Required |
|---|---|---|---|
| 🏭 Factory | skill-creator | General Skill generator | None |
| 🏭 Factory | security-review-skill-creator | Security audit Skill generator | None (lark optional) |
| 🔒 Security | security-review-skill-for-docker | Docker/container security audit | None |
| 🔒 Security | security-review-skill-for-terraform | Terraform/IaC security audit | None |
| 🔒 Security | sca-ai-denoise | SCA vulnerability denoising | None |
| 🔒 Security | supply-chain-audit | Supply chain poisoning detection (8 langs) | None |
| 🔒 Security | skills-audit | Third-party Skill security audit | Optional ANTHROPIC_API_KEY |
| 🔒 Security | web-vuln-analyzer | Web vulnerability analysis | |
| 🔒 Security | android-vuln-analyzer | Android vulnerability analysis | |
| 🛠 Command | harness-help | Help | None |
| 🛠 Command | harness-audit | Health check | None |
| 🛠 Command | harness-quality-gate | Quality gate | None |
| 🛠 Command | harness-guide | Skill recommendation | None |
| 🧹 Command | harness-cleanup | Interactive temp file archive (never deletes) | None |
| 🔄 Command | harness-resume | Lightweight context recovery (after /compact) | None |
| 🔄 Command | harness-handoff | Deep context handoff (new agent takeover) | None |
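The "never deletes" guarantee listed for harness-cleanup can be pictured as moving files into archive/ instead of removing them. The sketch below is an assumption about the approach, not the Skill's actual code: `archive_files` is a hypothetical name, it takes the files to archive as arguments, and it leaves out the interactive confirmation step.

```shell
# Hypothetical sketch of "archive, never delete": move each given file into archive/.
# archive_files is an illustrative name; the real Skill also asks for confirmation first.
archive_files() {
  local dest="archive" f
  mkdir -p "$dest"
  for f in "$@"; do
    [ -e "$f" ] || continue      # skip paths that do not exist
    mv -n -- "$f" "$dest/"       # -n: never overwrite a file already in archive/
  done
}
```

Because `mv -n` refuses to overwrite, a file whose name already exists in archive/ stays where it was, so nothing is lost on either side.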
During initialization Step 2, Harness interactively guides you through any Skill that requires configuration; Skills you skip are marked "Not available".

    ~/.claude/skills/harness-en/
    ├── SKILL.md                                  Main file (8-step workflow + 11 scenario integrations)
    ├── README.md                                 This document
    ├── bundled-skills/                           Bundled Skills (symlink install)
    │   ├── harness-help/SKILL.md                 Command: help index
    │   ├── harness-audit/SKILL.md                Command: project health check
    │   ├── harness-quality-gate/SKILL.md         Command: pre-commit quality gate
    │   ├── harness-guide/SKILL.md                Command: Skill recommendation guide
    │   ├── harness-cleanup/SKILL.md              Command: interactive temp file archive
    │   ├── harness-resume/SKILL.md               Command: lightweight context recovery
    │   ├── harness-handoff/SKILL.md              Command: deep context handoff
    │   ├── skill-creator/                        General Skill generator
    │   ├── security-review-skill-creator/        Security audit Skill generator
    │   ├── security-review-skill-for-docker/     Docker security audit
    │   ├── security-review-skill-for-terraform/  Terraform security audit
    │   ├── sca-ai-denoise/                       SCA vulnerability denoising
    │   ├── supply-chain-audit/                   Supply chain poisoning detection
    │   ├── skills-audit/                         Third-party Skill security audit
    │   ├── web-vuln-analyzer/                    Web vulnerability analysis (lite)
    │   └── android-vuln-analyzer/                Android vulnerability analysis
    ├── references/
    │   ├── skill-ecosystem.md                    Full Skill ecosystem map + installation methods
    │   ├── skill-guide.md                        Scenario → Skill recommendation matrix (harness guide data source)
    │   ├── doc-templates.md                      Documentation system templates (CLAUDE.md / docs/)
    │   ├── agent-teams.md                        Agent Team role framework
    │   ├── secure-coding.md                      Security standards (CWE + OWASP + Agent red lines)
    │   ├── conventions.md                        Dev conventions + Agent behavior rules (9 MUST rules) + Token optimization
    │   ├── lang-patterns.md                      Tech stack coding patterns (6 languages/frameworks)
    │   ├── hook-scripts.md                       Enterprise Hook gate scripts (4 scripts + activation guide)
    │   └── roadmap.md                            AI development pain points & solutions (24 items, detailed)
    └── templates/
        ├── claude-md-index.md                    CLAUDE.md slim template (L0 index)
        ├── sub-index.md                          L1 category index + L2 module index templates
        ├── task-plan.md                          Task plan template
        └── agent-role.md                         Agent role definition template
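Given the layout above, one way to confirm that every directory under bundled-skills/ has been symlinked into the Skills directory is to walk the source and report anything without a link. This is a sketch under stated assumptions: `missing_skill_links` is a hypothetical helper, not a Harness command, and both directories are passed in explicitly.

```shell
# Hypothetical check: print the name of each bundled Skill that has no symlink in the target.
# missing_skill_links is an illustrative helper, not part of Harness.
missing_skill_links() {
  local src="$1" dst="$2" skill name
  for skill in "$src"/*/; do
    [ -d "$skill" ] || continue            # glob matched nothing
    name=$(basename "$skill")
    [ -L "$dst/$name" ] || echo "$name"    # report Skills that are not linked yet
  done
}
```

For the default install location this would be run as `missing_skill_links ~/.claude/skills/harness-en/bundled-skills ~/.claude/skills`; empty output means every bundled Skill is linked.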
Harness's behavior enhancement consists of two layers:
| Layer | Mechanism | Enforcement | Description |
|---|---|---|---|
| Hook scripts | Shell commands, exit 1 blocking | System-level | Pre-commit secret detection, dangerous command interception -- AI cannot bypass |
| Instruction enhancement | CLAUDE.md + Skill description | Depends on LLM | TDD/Review/documentation sync rules -- AI "should" follow but might not |
Enterprise mode (all 4 Hook scripts enabled) enforces its checks at the system level, so it provides stronger security guarantees. Open-source mode (default) relies on the LLM's instruction-following ability; this is prompt engineering, not deterministic access control.
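To make the "exit 1 blocking" idea concrete, here is a minimal sketch of a secret-detection gate. It is not one of the four bundled Hook scripts: `scan_for_secrets` is a hypothetical name, the two patterns are illustrative rather than exhaustive, and wiring it into a Git pre-commit hook is an assumption about how such a gate could be installed.

```shell
# Hypothetical gate: fail with status 1 when the text on stdin looks like it contains a secret.
# Patterns are illustrative only (AWS-style access key ID, PEM private key header).
scan_for_secrets() {
  if grep -Eq -e 'AKIA[0-9A-Z]{16}' -e '-----BEGIN [A-Z ]*PRIVATE KEY-----'; then
    echo "Possible secret detected; commit blocked." >&2
    return 1
  fi
  return 0
}
```

In a pre-commit hook this could be used as `git diff --cached | scan_for_secrets || exit 1`. The non-zero exit status is what stops the commit, which is why this layer does not depend on the model choosing to comply.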
Q: What's the relationship between Harness and superpowers?
A: superpowers is the underlying behavior control framework (14 sub-Skills), while Harness is the higher-level orchestrator: it installs superpowers, configures hooks, generates documentation, and assembles the Agent Team. Analogy: superpowers is the Linux kernel, Harness is the Ubuntu installer.

Q: Do I need to say "harness" at every session?
A: No. Harness runs only once, during project initialization. After that, the dual-layer enhancement (Hooks + CLAUDE.md + Skills) takes effect automatically in every session.

Q: What if I don't want a particular enhancement?
A: Harness is interactive, so every step can be skipped. You can also run only specific steps (e.g., generate docs only without setting up an Agent Team).

Q: Will re-running Harness overwrite my docs?
A: No. Harness is idempotent: existing files are only supplemented, never overwritten, and already-installed Skills are skipped.

Q: Can I add my own conventions?
A: Yes. Edit the files under docs/conventions/ directly, or add custom MUST/MUST NOT entries to the behavior rules section of CLAUDE.md.

Q: How do I share pitfall experience across projects?
A: Skills generated by claudeception can be stored in ~/.claude/skills/ (user-level), where they are shared across all projects. Storing them in .claude/skills/ (project-level) limits them to the current project.
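The idempotency answer above ("existing files are only supplemented, never overwritten") can be illustrated with an append-if-missing helper. This is a sketch of the general technique, not Harness's implementation; `ensure_line` is a hypothetical name.

```shell
# Hypothetical sketch of "supplement, never overwrite": append a line only if it is missing.
# ensure_line is an illustrative helper, not part of Harness.
ensure_line() {
  local file="$1" line="$2"
  touch "$file"    # create the file if absent; existing content is left intact
  # -x matches whole lines, -F treats the text literally rather than as a regex
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

Calling `ensure_line CLAUDE.md "MUST NOT commit secrets"` twice leaves a single copy of the rule, which is the property that makes re-running an initializer safe.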