Harness — AI Agent Development Scaffold & Security Enhancement

Harness is a Meta-Skill for scaffolding AI Agent development. A single command sets up four layers of enhancement for any project: knowledge management, architecture constraints, feedback loops, and entropy management.

Optimized for Claude Code: Harness builds on Claude Code's Hook system (SessionStart / UserPromptSubmit / PreToolUse / PostToolUse / Stop) for system-level behavior enhancement. Combined with the experimental Agent Teams feature, a single prompt can spin up multi-role collaboration (Architect / Engineer / Tester). On Claude Code, dual-layer enhancement (system-level Hooks + rule-level instruction files) and Skill auto-matching are all fully active, delivering the most complete enhancement experience.

Compatible with 9 AI coding tools, including Claude Code, Cursor, Windsurf, Cline, GitHub Copilot, Aider, Continue, and Devin, plus any tool that supports project-level instruction files (via AGENT.md as a generic fallback). Instruction file rules and docs/ documentation work across all of these tools, so the core enhancements remain effective regardless of your IDE.

Why Do You Need Harness?

AI Agents write code fast, but "fast" brings four core problems:

Problem	Symptom	Consequence
Knowledge gaps	Every new session starts from scratch with no project context	Repeated mistakes, violated conventions
No constraints	Bad code exists in the codebase, AI copies and produces more bad code	Security vulnerabilities, architecture decay
No feedback	"Confidently declares mission accomplished" when it's actually a mess	Production incidents, rework
Entropy increase	Writing fast = garbage piles up fast	Technical debt explosion, outdated documentation

Harness's solution: Establish four layers of enhancement with a single command at project initialization, automatically effective in every subsequent development session.

But it's not just these 4 — we've identified 24 pain points across 7 categories. Here's how Harness addresses each one ↓

Roadmap: AI Development Pain Points & Harness Solutions

24 common pain points in AI-assisted development, organized by category. Each lists Harness's current solution, strength rating, and planned future enhancements. Detailed version with full problem descriptions and solution architecture →

Thinking & Planning

#	Pain Point	Current Solution	Strength	Future Enhancement
1	AI codes before thinking — jumps to implementation without understanding requirements	superpowers brainstorming HARD-GATE: no code until design approved	★★★★★	—
2	Plans collapse mid-task — AI forgets the plan halfway through	planning-with-files 4 Hooks: re-read task_plan.md before every tool call	★★★★☆	Auto-detect plan drift (compare actions vs plan)
3	One-shot answers — AI gives a single solution without exploring alternatives	superpowers brainstorming forces 2+ approaches with trade-offs	★★★★☆	—
4	No adversarial review — nobody challenges the AI's design	Challenger (C) agent role: CLAIM/CHALLENGE/VERIFICATION/VERDICT	★★★☆☆	Auto-invoke Challenger after Architect produces a plan

Memory & Context

#	Pain Point	Current Solution	Strength	Future Enhancement
5	Context loss after /compact — AI forgets decisions and progress	Compact checkpoint rule: must update progress.md + task_plan.md before compact	★★★★☆	Auto-checkpoint hook before compact
6	New session cold start — AI doesn't know the project	CLAUDE.md (≤150 lines) auto-loaded + docs/ B-tree index for on-demand deep reading	★★★★★	—
7	Repeated mistakes across sessions — same pitfall hit multiple times	claudeception extracts pitfalls into reusable Skills; docs/pitfalls/ records	★★★★☆	Auto-match pitfall Skills before coding starts
8	Context window quality degradation — output quality drops as context grows	Context Recovery 4-step protocol + Token Budget rules (offset+limit, structured output)	★★★☆☆	Token pressure monitoring + auto-compact suggestion

Quality Control

#	Pain Point	Current Solution	Strength	Future Enhancement
9	"Done" without verification — AI claims completion without evidence	superpowers verification Iron Law + Stop hook checks Phase completion	★★★★★	—
10	No tests — code ships without test coverage	superpowers TDD Iron Law: "NO CODE WITHOUT FAILING TEST FIRST"	★★★★★	—
11	Skips code review — AI produces code nobody reviews	superpowers code-review dispatches reviewer subagent	★★★★☆	Auto-trigger review on PR creation
12	Security vulnerabilities introduced — AI writes insecure code	3-layer security: CWE defense in CLAUDE.md + secure-coding.md + security-review Skills	★★★★☆	Auto-security-scan on every commit (Enterprise hook)

Code Hygiene & Documentation

#	Pain Point	Current Solution	Strength	Future Enhancement
13	Dead code accumulation — commented-out code, unused imports pile up	CLAUDE.md 5 MUST NOT hygiene rules + quality gate Check #5	★★★★☆	Lint integration in quality gate
14	Documentation goes stale — docs don't match code after changes	Three-tier doc sync: Lite (self-check) → Standard (dynamic grep) → Full (quality gate)	★★★★☆	PostToolUse hook for real-time doc sync reminder
15	Root directory pollution — test scripts, debug files accumulate	harness-cleanup interactive archive + harness-audit checks root cleanliness	★★★★☆	—
16	FIXME/HACK debt — temporary fixes become permanent	CLAUDE.md rule: resolve within 1 week; harness-audit flags stale FIXMEs	★★★☆☆	Track FIXME age in quality gate

Hallucination & Reliability

#	Pain Point	Current Solution	Strength	Future Enhancement
17	API hallucination — AI invents non-existent APIs or library functions	Challenger role verifies claims against source/docs; superpowers Red Flags table	★★★☆☆	Auto-verify imports against installed packages
18	Confident but wrong — AI states incorrect facts with high confidence	Challenger VERDICT system (CONFIRMED/REFUTED/UNVERIFIED) + evidence requirement	★★★☆☆	Mandatory citation for architectural claims
19	Blind copy-paste — AI copies existing bad patterns in the codebase	CLAUDE.md MUST NOT rules + security standards block known bad patterns	★★★☆☆	Anti-pattern database from pitfall records

Collaboration & Workflow

#	Pain Point	Current Solution	Strength	Future Enhancement
20	No role separation — same AI does design, coding, testing, review	Agent Team: Architect / Challenger / Engineer / Tester with strict constraints	★★★★☆	Workflow orchestration (auto role transitions)
21	Experience not captured — hard-won knowledge lost after session ends	claudeception continuous learning + UserPromptSubmit hook evaluation	★★★★☆	Auto-extract on session end (not just /claudeception)
22	No project health visibility — don't know if enhancements are working	harness-audit: scan completeness of CLAUDE.md/docs/hooks/skills, output score	★★★☆☆	Trend tracking across audits

Security & Compliance

#	Pain Point	Current Solution	Strength	Future Enhancement
23	Secret leaks in commits — API keys, credentials committed to git	Enterprise Hook: pre-commit secret scan + CLAUDE.md MUST NOT .env/.key/.pem	★★★★☆	Default-on secret scanning (not Enterprise-only)
24	Supply chain attacks — malicious dependencies slip in	supply-chain-audit Skill (8 languages) + sca-ai-denoise for vulnerability triage	★★★★☆	Auto-audit on dependency changes

Overall: 4 fully solved (★★★★★), 13 strong (★★★★☆), 7 partial (★★★☆☆), 0 unsolved. ★★★★★ = system-level enforcement, ★★★★☆ = strong with minor gaps, ★★★☆☆ = partial, enhancement planned.

Four-Layer Enhancement System

┌─────────────────────────────────────────────────────────┐
│               Harness Enhancement System                │
├──────────────┬──────────────┬──────────────┬────────────┤
│  Layer One   │  Layer Two   │  Layer Three │ Layer Four │
│  Knowledge   │  Architecture│  Feedback    │  Entropy   │
│  Mgmt 📋    │  Constraints 🚧│  Loops 🔄  │  Mgmt 🧹  │
│              │              │              │            │
│  CLAUDE.md   │  Hook-based  │  TDD         │  Code      │
│  docs/ tree  │  enhancement │  Code Review │  hygiene   │
│  Agent Team  │  Security    │  Verification│  Doc sync  │
│  Skill       │  standards   │  gates       │  Pitfall   │
│  ecosystem   │  CWE defense │  Security    │  records   │
│              │  Behavior    │  review      │  Knowledge │
│              │  red lines   │              │  extraction│
└──────────────┴──────────────┴──────────────┴────────────┘

Layer One: Knowledge Management 📋

Problem: The AI Agent doesn't know your project's background, conventions, or habits.

Harness's solution:

1. CLAUDE.md — The AI's Onboarding Manual

After automatically analyzing the project, Harness generates a lean CLAUDE.md (≤150 lines) that serves as the AI's first reading material at the start of every session:

# MyProject
One-line description

## Documentation Navigation
| Category | Path | Content |
|----------|------|---------|
| Architecture | docs/architecture/ | System architecture, tech stack, DB, API |
| Dev Conventions | docs/conventions/  | must-follow / must-not / secure-coding |
| Pitfall Records | docs/pitfalls/     | Categorized by tech stack |

## Behavior Rules (MUST FOLLOW)
- MUST brainstorm before writing code (HARD-GATE)
- MUST write tests before implementation (TDD)
- MUST security review before committing
- MUST NOT leave dead code / debug output

Why keep it slim? Because CLAUDE.md is read in full at every session: stuffing 500 lines into it wastes tokens and buries the key information. Detailed content is split into docs/ sub-documents and loaded on demand.

2. docs/ — Multi-Level Index Tree (AI's B-tree Memory)

AI's memory isn't a pile of documents — it's an index tree. Each level stores only pointers; only leaf nodes store content:

L0: CLAUDE.md (≤150 lines)
     → 5 category pointers, zero actual content
     │
L1: docs/xxx/INDEX.md (≤50 lines)
     → Module list + one-line summary + last updated date
     │
L2: docs/xxx/module/INDEX.md (≤30 lines)
     → Leaf document list + timeline (when was what added/changed)
     │
L3: docs/xxx/module/topic.md (≤150 lines)
     → Actual content (the ONLY level with real content)

AI memory recovery path: read L0 → determine direction → read L1 → locate module → read L2 → find document → read L3. Only one index level is read at a time, minimizing token consumption.

Auto-scaling: In early projects, L1 points directly to leaves (two levels suffice). As the project grows, L2 subdirectories emerge naturally: when an L1 INDEX exceeds 10 entries, it is upgraded to three levels, and when a leaf exceeds 150 lines, it is split automatically.
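A minimal sketch of how the line budgets and the 10-entry rule could be checked automatically, assuming each document has already been classified into a level (L0 to L3) and its pointers counted. The `DocNode` type, the `check_budgets` function, and the warning wording are hypothetical illustrations, not part of Harness.

```python
from dataclasses import dataclass

# Budgets from the L0-L3 description above; the data model is hypothetical.
LINE_LIMITS = {"L0": 150, "L1": 50, "L2": 30, "L3": 150}
MAX_L1_ENTRIES = 10  # more entries than this: introduce L2 module indexes


@dataclass
class DocNode:
    path: str
    level: str        # "L0", "L1", "L2" or "L3"
    line_count: int
    entries: int = 0  # pointers listed in an index file (0 for leaves)


def check_budgets(nodes: list) -> list:
    """Return a warning string for every node that breaks the index-tree rules."""
    warnings = []
    for node in nodes:
        limit = LINE_LIMITS[node.level]
        if node.line_count > limit:
            # Leaves are split; index files are trimmed (they hold pointers only).
            action = "split into smaller topics" if node.level == "L3" else "trim"
            warnings.append(f"{node.path}: {node.line_count} lines > {limit}, {action}")
        if node.level == "L1" and node.entries > MAX_L1_ENTRIES:
            warnings.append(
                f"{node.path}: {node.entries} entries > {MAX_L1_ENTRIES}, add L2 module indexes"
            )
    return warnings


nodes = [
    DocNode("CLAUDE.md", "L0", 120, entries=5),
    DocNode("docs/implementation/INDEX.md", "L1", 40, entries=12),
    DocNode("docs/pitfalls/docker/oom-killer.md", "L3", 180),
]
for warning in check_budgets(nodes):
    print(warning)
```

Here the implementation index triggers the L2 upgrade and the 180-line leaf triggers a split, while CLAUDE.md stays within budget.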

docs/
├── architecture/INDEX.md     → system-overview.md, tech-stack.md, db-schema.md, api-reference.md
├── implementation/INDEX.md   → auth/INDEX.md, payment/INDEX.md, export/INDEX.md ...
│   └── auth/INDEX.md         → oauth2-flow.md, rbac-model.md, session-mgmt.md + timeline
├── conventions/INDEX.md      → must-follow.md, must-not.md, secure-coding.md
├── pitfalls/INDEX.md         → docker/INDEX.md, database/INDEX.md ... + timeline
└── backlog/INDEX.md          → optimization.md, features.md

3. Agent Team — Role-Based Division of Labor

Different roles read different docs and follow different constraints:

Role	Responsibilities	Constraints
Architect (A)	Planning, design, interaction, commits, docs	Must brainstorm → user approval → write design doc
Challenger (C)	Adversarial review of plans, designs, and claims	Never accept claims without evidence; verify API usage, thread safety, edge cases
Engineer (E)	Coding, fixes, refactoring	Must TDD, must not touch architecture-level config
Tester (T)	Write tests, verify	Must not modify business code, only report bugs

Trigger methods:

Natural language: "Have the Engineer implement this feature" / "Have the Tester run tests to verify"
Agent tool: "You are the Engineer agent. Read .harness/agents/agent-b-engineer.md for your responsibilities. Task: ..."
Isolation mode: Agent tool + isolation: "worktree" (separate branch for complex tasks)

Extensible: additional roles such as Frontend / Backend / DevOps / DBA / Security can be added.
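The Constraints column can be read as per-role write deny-lists. The sketch below is one assumption about how such a check might look; `BUSINESS_CODE`, `ARCH_CONFIG`, the glob patterns, and `may_write` are hypothetical, and a real project would define its own boundaries.

```python
from fnmatch import fnmatch

# Placeholder boundaries: what counts as "business code" or "architecture-level
# config" is project-specific, so these globs are assumptions.
BUSINESS_CODE = ["src/*"]
ARCH_CONFIG = [".claude/settings.json", ".harness/*", "docs/architecture/*"]

# Deny-lists per role, derived from the Constraints column of the role table.
ROLE_DENY = {
    "engineer": ARCH_CONFIG,   # must not touch architecture-level config
    "tester": BUSINESS_CODE,   # must not modify business code, only report bugs
}


def may_write(role: str, path: str) -> bool:
    """True unless `path` matches one of the role's denied patterns."""
    # fnmatch's "*" also matches "/", so "src/*" covers nested files.
    return not any(fnmatch(path, pattern) for pattern in ROLE_DENY.get(role, []))


print(may_write("tester", "src/export/csv_writer.py"))   # False
print(may_write("tester", "tests/test_export_csv.py"))   # True
print(may_write("engineer", ".claude/settings.json"))    # False
```

Roles without an entry (such as the Architect) are unrestricted in this sketch.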

4. Skill Ecosystem — Reusable Capability Library

Harness doesn't reinvent the wheel — it references and orchestrates existing open-source Skills:

~/.claude/skills/
├── superpowers/                 ← Behavior control methodology (14 sub-Skills)
├── planning-with-files/         ← Plan persistence
├── claudeception/               ← Knowledge extraction & automatic Skill generation
├── skill-creator/               ← General-purpose Skill generator
├── security-review-skill-creator/ ← Security audit Skill generator
├── frontend-design/             ← Frontend development
├── web-vuln-analyzer/           ← Web vulnerability analysis
├── sca-ai-denoise/              ← SCA vulnerability denoising
├── supply-chain-audit/          ← Supply chain poisoning detection
└── harness/                     ← This Skill (for project initialization)

Layer Two: Architecture Constraints 🚧

Problem: Written rules alone aren't enough — AI will rationalize skipping them. You need an "access control system."

Harness's solution: Dual-layer enhancement + Skill auto-matching (Open-Source vs Enterprise Mode)

┌──────────────────────────────────────────────────────┐
│  Layer 1: Hooks (Access Control)                     │
│                                                      │
│  ── Core Hooks (enabled by default) ──               │
│  SessionStart     → Auto-inject superpowers          │
│                     methodology                      │
│  UserPromptSubmit → Show plan status + knowledge     │
│                     extraction reminder              │
│  PreToolUse       → Re-read task_plan.md before      │
│                     every tool call                  │
│  PostToolUse      → Remind to update progress after  │
│                     writes                           │
│  Stop             → Check completion status on exit  │
│                                                      │
│  ── Security Gate Hooks [Optional/Enterprise] ──     │
│  PreToolUse(Bash) → git commit secret/sensitive file │
│                     interception                     │
│  PreToolUse(Bash) → commit format validation         │
│  PreToolUse(Bash) → dangerous command interception   │
│  PostToolUse(W/E) → code security anti-pattern scan  │
│                     (WARNING)                        │
├──────────────────────────────────────────────────────┤
│  Layer 2: CLAUDE.md (Written Rules)                  │
│  instruction-level, strongly persuasive              │
│                                                      │
│  MUST brainstorm → MUST /plan → MUST TDD             │
│  MUST security review → MUST code review             │
│  MUST NOT dead code → MUST NOT claim completion      │
│  without verification                                │
│  MUST NOT eval()/exec() with user input (CWE-95)     │
│  MUST NOT shell=True with user args (CWE-78)         │
│  MUST NOT f-string SQL concatenation (CWE-89)        │
├──────────────────────────────────────────────────────┤
│  Skill Auto-matching (Semantic Trigger)              │
│  skill description matched to current task           │
│                                                      │
│  Iron Laws: "NO CODE WITHOUT FAILING TEST FIRST"     │
│  Red Flags: 13 common "rationalized skip" excuses    │
│             → intercepted one by one                 │
│  HARD-GATE: No code allowed until design is approved │
└──────────────────────────────────────────────────────┘
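A rough illustration of the PostToolUse(W/E) write-scan idea: after a file write, match the new content against patterns for the three CWE rules in the box above and emit warnings without blocking. The regexes and the `scan` function are illustrative assumptions; they are not the Enterprise hook script and will miss obfuscated variants.

```python
import re

# Illustrative patterns for the three MUST NOT rules; real detection needs
# AST or taint analysis, so treat these regexes as assumptions.
ANTI_PATTERNS = {
    "CWE-95": re.compile(r"\b(eval|exec)\s*\("),        # eval()/exec()
    "CWE-78": re.compile(r"\bshell\s*=\s*True\b"),       # shell=True
    "CWE-89": re.compile(r"""\.execute\(\s*f["']"""),    # f-string SQL
}


def scan(source: str) -> list:
    """Return (line_number, cwe_id) pairs; a write-scan hook would only warn."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for cwe, pattern in ANTI_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, cwe))
    return findings


sample = (
    "result = eval(user_expr)\n"
    "subprocess.run(cmd, shell=True)\n"
    'cur.execute(f"SELECT * FROM users WHERE id = {uid}")\n'
    'cur.execute("SELECT * FROM users WHERE id = ?", (uid,))\n'
)
for lineno, cwe in scan(sample):
    print(f"WARNING line {lineno}: {cwe}")
```

The parameterized query on the last line passes, which is the behavior the CWE-89 rule asks for.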

Open-Source Mode vs Enterprise Mode:

Mode	Layer 1	Layer 2	Skill Auto-matching	Security Assurance
Open-Source (default)	Core Hooks (from 3 Skills)	MUST/MUST NOT text rules	Skill semantic trigger	AI self-discipline + text rules
Enterprise (optional)	Core Hooks + 4 security gate Hooks	Same + Hook enforcement layer	Same	System-level technical gates

Open-Source Mode (default): Zero additional blocking; security relies on the CLAUDE.md MUST/MUST NOT rules and the AI's self-discipline. Ideal for individual developers and open-source projects.
Enterprise Mode (optional): Enables 4 security gate Hook scripts (pre-commit / commit-msg / dangerous-cmd / write-scan) for system-level enforcement. Ideal for enterprise teams and projects with strict compliance requirements.
The Enterprise mode Hook scripts are in references/hook-scripts.md; users choose whether to enable them in Step 2 of Harness initialization.
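The actual Enterprise scripts live in references/hook-scripts.md and are not reproduced here. As a hedged sketch of the pre-commit idea: a Claude Code PreToolUse hook receives the pending tool call as JSON on stdin and can block it by exiting with code 2, in which case stderr is fed back to Claude. The file-name pattern, the function names, the `--hook` flag, and the reliance on `git diff --cached` are assumptions for illustration.

```python
import json
import re
import subprocess
import sys

# Hypothetical deny-list mirroring the CLAUDE.md rule against committing .env/.key/.pem.
SENSITIVE = re.compile(r"(^|/)\.env(\.[^/]*)?$|\.(key|pem)$")


def is_git_commit(command: str) -> bool:
    """Rough check for a `git commit` anywhere in the Bash command."""
    return re.search(r"\bgit\s+commit\b", command) is not None


def sensitive_files(paths: list) -> list:
    return [p for p in paths if SENSITIVE.search(p)]


def main() -> int:
    event = json.load(sys.stdin)  # hook input JSON; tool_input holds the Bash command
    command = event.get("tool_input", {}).get("command", "")
    if not is_git_commit(command):
        return 0
    # Only already-staged files are visible here (`git commit -a` is not covered).
    staged = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=False,
    ).stdout.splitlines()
    blocked = sensitive_files(staged)
    if blocked:
        print("Blocked: sensitive files staged: " + ", ".join(blocked), file=sys.stderr)
        return 2  # exit code 2 blocks the tool call; stderr is shown to Claude
    return 0


# The hypothetical --hook flag keeps the module importable for testing.
if __name__ == "__main__" and "--hook" in sys.argv[1:]:
    sys.exit(main())
```

To try it, register the script under `hooks.PreToolUse` in .claude/settings.json with `"matcher": "Bash"` and a `"type": "command"` entry whose command runs the script with `--hook` (the script path is your choice).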

Why dual-layer enhancement + Skill auto-matching?

Hooks are the strongest safeguard: system-level enforcement, AI cannot skip them (Claude Code only)
CLAUDE.md is the fallback: rules remain effective even without Hook configuration (universal across all AI tools)
Skills are the semantic trigger: superpowers' Iron Laws + Red Flags make the AI "automatically stop when tempted to skip"

Security Standards (hard control, non-negotiable):

docs/conventions/secure-coding.md contains three parts, all mandatory baselines:

Part A: 15 high-risk CWE defenses (SQL injection/command injection/XSS/SSRF/deserialization...)
  → All code changes must comply, no exceptions

Part B: OWASP Top 10 coding standards (13 rules, with code examples)
  → Mandatory baselines when writing code

Part C: AI Agent security red lines:
  ❌ No reverse shells / C2 callbacks
  ❌ No intranet tunneling / port forwarding to external
  ❌ No data exfiltration / credential theft
  ❌ No backdoor installation / hidden user creation
  ❌ No privilege escalation / disabling security mechanisms
  ❌ No code obfuscation / supply chain poisoning

Security standards are written into CLAUDE.md behavior rules to ensure they take effect automatically at every session. For in-depth audits, the security-review skill is triggered for a full check.
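To make the Part A defenses concrete, here is a minimal Python sketch of the safe counterparts to two of the CLAUDE.md MUST NOT rules: bound SQL parameters instead of f-string concatenation (CWE-89), and an argument list instead of `shell=True` (CWE-78). The `users` table, the function names, and the use of a child Python process as the "tool" are illustrative assumptions; secure-coding.md's own examples are not reproduced here.

```python
import sqlite3
import subprocess
import sys


def find_user(conn: sqlite3.Connection, name: str) -> list:
    # CWE-89: the value is bound as a parameter, never spliced into the SQL text.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()


def echo_arg(user_arg: str) -> str:
    # CWE-78: an argument list without shell=True is passed to the program as-is,
    # so shell syntax such as ";" or "$(...)" in user_arg is never interpreted.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_arg],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
print(find_user(conn, "alice"))          # [(1, 'alice')]
print(find_user(conn, "x' OR '1'='1"))   # [] (the payload is matched as a plain string)
print(echo_arg("hello; echo pwned"))     # hello; echo pwned
```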

Layer Three: Feedback Loops 🔄

Problem: After finishing work, the AI doesn't know if it did well, and it will "confidently declare mission accomplished."

Harness's solution: Multiple verification mechanisms

Code complete
  │
  ├─ Automatic feedback: Tests
  │   TDD Iron Law → Every line of code has a corresponding test
  │   Finish code → Run tests immediately → Red/green light instant feedback
  │
  ├─ Agent reviews Agent: Code Review
  │   superpowers:requesting-code-review
  │   → Dispatch code-reviewer subagent (another AI reviews)
  │   → Checks: correctness / security / test coverage / performance / compatibility
  │
  ├─ Security Review
  │   → MUST execute security review checklist before committing
  │   → Has project-specific security-review skill → auto-audit
  │   → Doesn't have one → security-review-skill-creator generates one first
  │
  ├─ Completion Verification
  │   superpowers:verification-before-completion
  │   Iron Law: NO COMPLETION CLAIMS WITHOUT FRESH EVIDENCE
  │   → Tests pass + lint passes + security review passes → only then can claim "done"
  │
  └─ Debugging Feedback: Systematic Debugging
      superpowers:systematic-debugging
      Iron Law: NO FIXES WITHOUT ROOT CAUSE
      → 4 stages: root cause investigation → pattern analysis → hypothesis verification → implement fix
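The completion Iron Law reduces to a simple gate: every required check needs passing evidence, and that evidence must be fresh. A hypothetical sketch, assuming check results are collected as `Evidence` records; the check names and the freshness flag are assumptions, not superpowers' actual data model.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    check: str    # e.g. "tests", "lint", "security-review"
    passed: bool
    fresh: bool   # produced after the latest code change, not recalled from earlier


REQUIRED_CHECKS = ("tests", "lint", "security-review")


def may_claim_done(evidence: list) -> tuple:
    """Return (allowed, unmet_checks); 'done' needs fresh, passing evidence for each check."""
    latest = {e.check: e for e in evidence}  # later entries override earlier ones
    unmet = [c for c in REQUIRED_CHECKS
             if c not in latest or not (latest[c].passed and latest[c].fresh)]
    return (len(unmet) == 0, unmet)


print(may_claim_done([
    Evidence("tests", passed=True, fresh=True),
    Evidence("lint", passed=True, fresh=False),  # stale result does not count
]))
# (False, ['lint', 'security-review'])
```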

superpowers' "anti-rationalization" design:

The AI's biggest problem isn't not knowing the rules — it's being skilled at finding excuses to skip rules. superpowers addresses this with a "Red Flags table" — listing 13 common excuses and intercepting each one:

AI's Inner Monologue	Reality
"This is just a simple problem"	Simple problems still need checking for applicable Skills
"Write code first, add tests later"	That's not TDD, that's "tests as an afterthought"
"Time is tight, skip Review"	Code shipped without Review takes even longer to fix later
"Rewriting from scratch is wasteful"	Sunk cost fallacy — the time has already been spent

Layer Four: Entropy Management 🧹

Problem: AI works fast = technical debt accumulates fast. Docs go stale, dead code piles up, experience isn't captured.

Harness's solution: Continuous cleaning + knowledge crystallization

1. Code Hygiene (before every commit)

MUST DO:
  ✅ Delete unused code (don't comment it out — git has history)
  ✅ Delete debug print / console.log
  ✅ Delete unused imports / variables / functions
  ✅ Delete temporary files (test scripts go in tests/)
  ✅ Clean up unused dependencies

MUST NOT:
  ❌ Don't leave "just in case" commented-out code
  ❌ Don't leave empty except/catch blocks
  ❌ Don't pile up scripts in the root directory
  ❌ Don't leave FIXME/HACK unresolved for more than 1 week
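Two of the rules above (no debug output, no empty except/catch blocks) can be detected mechanically for Python code with the standard `ast` module. This is an illustrative sketch, not Harness's quality gate Check #5; note that it flags every `print()` call, so intentional CLI output would need an allow-list.

```python
import ast


def hygiene_findings(source: str) -> list:
    """Flag print() calls and except blocks whose body is only `pass`."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "print"):
            findings.append((node.lineno, "debug print"))
        elif isinstance(node, ast.ExceptHandler) and all(
                isinstance(stmt, ast.Pass) for stmt in node.body):
            findings.append((node.lineno, "empty except block"))
    return sorted(findings)  # ast.walk is breadth-first, so sort by line


code = """
def load(path):
    print("DEBUG loading", path)
    try:
        return open(path).read()
    except OSError:
        pass
"""
for lineno, problem in hygiene_findings(code):
    print(f"line {lineno}: {problem}")
```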

2. Documentation Sync (three-tier quality gate)

Documentation sync is enforced at three levels to balance thoroughness with speed:

Level	Trigger	What Happens
Lite	After editing source code (CLAUDE.md behavior rule)	Self-check: does a matching doc in docs/ exist? If yes and content affected → update now
Standard	Claiming "done" or "complete"	Auto quality gate: dynamic scan of git diff + grep docs/ for references to changed modules
Full	"quality gate" or "ready to commit"	All 7 checks including tests, lint, security, doc sync, hygiene, progress, commit format

The quality gate dynamically detects which docs need updating (no hardcoded file mappings) — it scans docs/ structure and greps for references to changed files.
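A minimal sketch of that dynamic lookup, assuming the changed-file list comes from `git diff --name-only` and that a doc "references" a file if it mentions the file's path or stem. The matching rule and the `docs_to_review` helper are assumptions; stem matching is deliberately loose and will over-report for generic names.

```python
import tempfile
from pathlib import Path


def docs_to_review(changed_files: list, docs_root: Path) -> dict:
    """Map each changed source file to the docs/ Markdown files that mention it.

    A doc "mentions" a file if it contains the file's path or its stem
    (e.g. "export" for src/export.py). This rule is an assumption.
    """
    doc_texts = {p: p.read_text(encoding="utf-8") for p in sorted(docs_root.rglob("*.md"))}
    hits = {}
    for changed in changed_files:
        stem = Path(changed).stem
        refs = [p.relative_to(docs_root).as_posix()
                for p, text in doc_texts.items() if changed in text or stem in text]
        if refs:
            hits[changed] = refs
    return hits


with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp) / "docs"
    (docs / "implementation").mkdir(parents=True)
    (docs / "implementation" / "export.md").write_text(
        "src/export.py streams rows to CSV\n", encoding="utf-8")
    (docs / "architecture").mkdir()
    (docs / "architecture" / "overview.md").write_text("Service layout\n", encoding="utf-8")
    print(docs_to_review(["src/export.py", "src/billing.py"], docs))
    # {'src/export.py': ['implementation/export.md']}
```

No hardcoded file-to-doc mapping is needed; the lookup adapts as docs/ grows.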

3. Pitfall Records (auto-triggered after debugging for >10 minutes)

Choose either approach:

Approach A: Write to docs/pitfalls/
  ## Problem Title
  Symptom → Root Cause → Solution → Prevention

Approach B: /claudeception to generate a Skill
  → Automatically extract reusable knowledge from the pitfall experience
  → Generate .claude/skills/<pitfall-name>/SKILL.md
  → Next time a similar problem occurs, the Skill triggers automatically
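For Approach A, an entry can be generated from the four fields in order. A small sketch; the `pitfall_entry` helper and the bold field labels are hypothetical formatting choices, not a prescribed template.

```python
def pitfall_entry(title: str, symptom: str, root_cause: str,
                  solution: str, prevention: str) -> str:
    """Render one docs/pitfalls/ entry: Symptom -> Root Cause -> Solution -> Prevention."""
    fields = [("Symptom", symptom), ("Root Cause", root_cause),
              ("Solution", solution), ("Prevention", prevention)]
    lines = [f"## {title}"] + [f"- **{name}**: {text}" for name, text in fields]
    return "\n".join(lines) + "\n"


print(pitfall_entry(
    "CSV export times out on large tables",
    symptom="Export requests time out for large datasets",
    root_cause="All rows are loaded into memory before the response starts",
    solution="Stream rows to the client in chunks",
    prevention="Add a large-dataset export test",
))
```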

4. Knowledge Crystallization (claudeception — Continuous Learning System)

Work session
  │
  ├─ UserPromptSubmit hook continuously reminds:
  │   "Is there any non-obvious knowledge extractable from this task?"
  │
  ├─ Trigger conditions:
  │   • Debugged for >10 minutes on a non-documented issue
  │   • Discovered a workaround through trial and error
  │   • Project-specific non-obvious pattern
  │
  ├─ Quality gate (all 5 criteria must be met):
  │   ✅ Reusable (not just useful this one time)
  │   ✅ Non-trivial (not something findable in documentation)
  │   ✅ Specific (has clear trigger conditions and steps)
  │   ✅ Verified (confirmed the solution works)
  │   ✅ Actionable (general enough to reuse, specific enough to execute)
  │
  └─ Output: new SKILL.md
      → Stored in .claude/skills/ (project-level)
      → Or ~/.claude/skills/ (user-level, shared across projects)
      → Automatically matched and triggered on similar problems next time
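The five-criteria gate is an all-or-nothing check. A hypothetical sketch of it as a dataclass; the field names mirror the list above but are not claudeception's actual schema.

```python
from dataclasses import dataclass, fields


@dataclass
class Candidate:
    reusable: bool     # useful beyond this one task
    non_trivial: bool  # not simply findable in documentation
    specific: bool     # clear trigger conditions and steps
    verified: bool     # the solution was confirmed to work
    actionable: bool   # general enough to reuse, specific enough to execute


def failed_criteria(c: Candidate) -> list:
    return [f.name for f in fields(c) if not getattr(c, f.name)]


def passes_gate(c: Candidate) -> bool:
    # All five criteria must hold before a new SKILL.md is written.
    return not failed_criteria(c)


idea = Candidate(reusable=True, non_trivial=False, specific=True,
                 verified=False, actionable=True)
print(passes_gate(idea), failed_criteria(idea))  # False ['non_trivial', 'verified']
```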

Development Workflow

Complete Workflow: Initialization → Feature Implementation → Code Review → Wrap-up →

A feature goes through 7 phases (Phase 0 to Phase 6, plus the Phase 1.5 design review), each with independent verification:

Phase 0: [explore](bundled-skills/explore/SKILL.md) — Understand existing architecture via knowledge graph
Phase 1: brainstorming → spec (HARD-GATE, no code until approved)
Phase 1.5: [design-review](bundled-skills/design-review/SKILL.md) spec — Independent challenger verifies spec references
Phase 2: writing-plans → plan → [design-review](bundled-skills/design-review/SKILL.md) plan — Challenger verifies plan + spec coverage
Phase 3: subagent-driven-development — TDD execution per task
Phase 4: [ai-implementation-integrity](../../ai-implementation-integrity/SKILL.md) — Dead code / hallucinated API / chain connectivity checks
Phase 5: code-review + security-review
Phase 6: finish branch + claudeception (knowledge extraction)

Every phase has an independent verifier — no phase trusts the previous phase's agent. Read the full workflow →
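The "no phase trusts the previous phase's agent" rule can be modeled as a pipeline in which each phase's artifact is judged by a separate verifier that sees only the artifact. A hypothetical sketch; the `Phase` type and `run_pipeline` are illustrative, and in practice the verifiers are the Skills named in the phase list.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Phase(NamedTuple):
    name: str
    produce: Callable[[str], str]   # producing agent: previous artifact -> new artifact
    verify: Callable[[str], bool]   # independent verifier: judges only the artifact


def run_pipeline(phases: List[Phase], request: str) -> Tuple[List[str], Optional[str]]:
    """Run phases in order; stop at the first artifact its verifier rejects."""
    completed: List[str] = []
    artifact = request
    for phase in phases:
        artifact = phase.produce(artifact)
        if not phase.verify(artifact):  # the producer's own claim of success is not consulted
            return completed, phase.name
        completed.append(phase.name)
    return completed, None


phases = [
    Phase("spec", lambda req: f"spec for {req}", lambda a: a.startswith("spec")),
    Phase("plan", lambda spec: f"plan covering {spec}", lambda a: "spec" in a),
    Phase("implementation", lambda plan: "code with TODO stub", lambda a: "TODO" not in a),
]
print(run_pipeline(phases, "user export"))  # (['spec', 'plan'], 'implementation')
```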

Usage

Initialization (One-Time)

In any project directory:

You: harness

Harness interactively executes 8 steps:

Step 1: Analyze project (language/framework/structure) → Display project profile → User confirms
Step 2: Install Skill ecosystem + configure Hooks → Dual-layer enhancement ready
Step 3: Deep information gathering (3 parallel Agents read code/history/docs) → Generate CLAUDE.md + docs/
Step 4: Design Agent Team (interactive role selection + trigger methods)
Step 5: Inject development conventions + security standards
Step 6: Create .harness/ planning infrastructure
Step 7: Display summary + getting started guide
Step 8: Scenario integration verification (coverage check across 11 scenarios)

Resulting project structure:

project/
├── CLAUDE.md                  ← Slim index + behavior rules
├── docs/                      ← Multi-level documentation system
│   ├── architecture/
│   ├── implementation/
│   ├── conventions/           ← must-follow + must-not + secure-coding
│   ├── pitfalls/              ← Pitfall records (accumulated during development)
│   └── backlog/
├── .harness/                  ← Agent Team + plan templates
│   ├── agents/
│   ├── plans/
│   └── templates/
└── .claude/settings.json      ← Hook configuration

Daily Development (Automatically Effective Every Session)

After initialization, every subsequent development session automatically enters the enhancement system:

Developing a new feature:

You: Help me build a user export feature

Agent (auto-triggers brainstorming HARD-GATE):
  → Don't write code yet, let me understand the requirements
  → Clarifying questions: Export format? Data volume? Access control?
  → Propose 2 approaches + trade-offs
  → You choose an approach → Write design doc

Agent (auto-triggers writing-plans):
  → Split into 5 small tasks (2-5 minutes each)
  → Write to task_plan.md

Agent (auto-triggers TDD):
  → Write tests first: test_export_csv / test_export_permission / test_export_large_data
  → Tests red → Write implementation → Tests green

Agent (auto-triggers code-review):
  → Dispatch reviewer subagent for review
  → Review passes → security review → verification → done

Fixing a bug:

You: Users report the export feature is timing out

Agent (auto-triggers systematic-debugging):
  → Don't rush to change code, find the root cause first
  → 4 stages: investigate → analyze patterns → verify hypothesis → fix

Agent (after fix, claudeception hook triggers):
  → "This debugging session found that exporting large datasets needs streaming — record as a Skill?"
  → Generate .claude/skills/export-streaming-fix/SKILL.md

Security audit:

You: Generate a security audit skill for this project

Agent (triggers security-review-skill-creator):
  → Analyze project tech stack (Python + FastAPI + PostgreSQL)
  → Generate customized audit rules
  → Store in .claude/skills/security-review-skill-for-myproject/

You: Audit the code for security

Agent (triggers the generated audit Skill):
  → Audit item by item using project-specific rules
  → Output findings + remediation recommendations

Skill Dependency Graph

Harness doesn't reinvent the wheel — it orchestrates existing open-source Skills to build the enhancement system:

                        ┌─────────────┐
                        │   Harness   │  ← Project initialization entry point
                        │  Meta-Skill │
                        └──────┬──────┘
                               │ orchestrates
           ┌───────────────────┼───────────────────┐
           │                   │                   │
    ┌──────▼──────┐    ┌──────▼──────┐    ┌──────▼──────┐
    │ superpowers  │    │ planning-   │    │claudeception│
    │  (obra)      │    │ with-files  │    │  (blader)   │
    │              │    │ (OthmanAdi) │    │             │
    │ 14 sub-Skills│    │             │    │ Knowledge   │
    │ brainstorming│    │ /plan cmd   │    │ extraction  │
    │ writing-plans│    │ 4 Hooks     │    │ Skill gen   │
    │ TDD          │    │ Session     │    │ Quality     │
    │ debugging    │    │ recovery    │    │ gates       │
    │ code-review  │    │             │    │             │
    │ verification │    │             │    │             │
    └──────────────┘    └─────────────┘    └─────────────┘
           │
           │ generates
    ┌──────▼───────────────────────────────────────┐
    │  Skill Factory                               │
    │  skill-creator → General Skill generation    │
    │  security-review-skill-creator → Security    │
    │                   audit generation           │
    │  superpowers:writing-skills → TDD-style      │
    │                   Skill writing              │
    │  claudeception → Extract Skills from         │
    │                   experience                 │
    └──────────────────────────────────────────────┘
           │
           │ outputs
    ┌──────▼───────────────────────────────────────┐
    │  Project-Specific Skills (.claude/skills/)   │
    │  security-review-skill-for-<project>         │
    │  <pitfall-name> Skill                        │
    │  <workflow-name> Skill                       │
    └──────────────────────────────────────────────┘

Core Skills in Detail

Skill	Source	Hook Mechanism	Enhancement Role
superpowers	obra/superpowers	SessionStart injects methodology	Layer Two (constraints) + Layer Three (feedback)
planning-with-files	OthmanAdi/planning-with-files	4 Hooks continuously inject plans	Layer One (knowledge) + Layer Four (entropy)
claudeception	blader/Claudeception	UserPromptSubmit reminder	Layer Four (entropy) + Layer One (knowledge)

Skill Triggering: Automatic vs Manual

A core design goal of Harness is that you don't need to manually invoke Skills during day-to-day development. The dual-layer enhancement (Hooks + CLAUDE.md rules), together with Skill descriptions for semantic matching, makes key methodologies take effect automatically in the background.

Automatic Triggering (Hook + CLAUDE.md rules — no manual action needed)

Skill	Trigger Mechanism	What You Experience
superpowers (brainstorming)	SessionStart hook + CLAUDE.md HARD-GATE	Say "build a new feature" → Claude automatically asks requirements, proposes solutions, refuses to write code directly
superpowers (TDD)	SessionStart hook Iron Law	Claude automatically writes tests before implementation
superpowers (systematic-debugging)	SessionStart hook	Say "this bug" → Claude automatically follows root cause analysis instead of blind fixes
superpowers (verification)	SessionStart hook + Stop hook	Claude automatically runs verification before claiming "done"
planning-with-files	4 Hooks full coverage	Every input shows plan status, re-reads plan before tool calls, reminds to update progress after writes
claudeception (evaluation)	UserPromptSubmit hook	On every input, Claude internally evaluates "is there extractable knowledge" but won't interrupt you
harness-quality-gate	CLAUDE.md commit rules	Claude auto-checks tests/lint/security/doc sync before commits

Manual Triggering (on-demand)

Skill	How to Trigger	Notes
superpowers (code-review)	"Help me review this code"	You must initiate; Claude dispatches a reviewer subagent
claudeception (generate Skill)	"/claudeception" or "summarize this as a skill"	Hook only evaluates; actual extraction requires you to say the word
security-review-skill-creator	"Generate a security audit skill for this project"	On-demand project-specific audit rule generation
skill-creator	"Help me create a skill for XX"	On-demand workflow codification
harness-audit	"harness audit"	On-demand project health check
harness-guide	"harness guide" or "recommend a skill"	On-demand Skill recommendation
harness-cleanup	"harness cleanup" or "clean up temp files"	Interactive archive of temp files to archive/
harness-resume	"harness resume"	Lightweight context recovery after /compact (~3k tokens)
harness-handoff	"harness handoff"	Deep handoff for new agent takeover (~8k tokens)

In short: Hooks handle the automatic, you handle the decisions. During daily coding, brainstorming/TDD/debugging/planning all work automatically — you only need to speak up when you want a review, want to capture experience, or want to generate a new Skill.

Scenario Integration Coverage (11 Scenarios)

After Harness is set up, the following scenarios automatically receive protection:

Fully Protected Scenarios (Hook + CLAUDE.md + Skill — all layers)

#	Scenario	Protection Mechanism
1	Feature Development	SessionStart injects methodology → planning-with-files' 4 hooks track progress → CLAUDE.md TDD/Review rules → claudeception extracts knowledge
2	Bug Debugging	SessionStart injects systematic-debugging → planning-with-files tracks progress → claudeception pitfall reminder
3	Plan Execution	planning-with-files' 4 hooks give full coverage (display → re-read → remind to update → completion check)
4	Architecture Changes	CLAUDE.md HARD-GATE → superpowers brainstorming → docs/ sync rules
5	Task Completion	superpowers verification → planning-with-files Stop hook checks Phase completion
6	Refactoring	planning-with-files' 4 hooks + .harness/templates/refactor.md (Invariants constraints) + TDD
11	Code Hygiene	CLAUDE.md 5 MUST NOT rules + claudeception reminder
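The Stop-hook half of the "completion check" in Scenarios 3 and 5 could look roughly like this. The sketch assumes a hypothetical task_plan.md layout where each phase is a Markdown checkbox whose text starts with `Phase`. The real template in templates/task-plan.md may differ. Claude Code treats exit status 2 from a Stop hook as "do not stop yet" and shows stderr to Claude. `check_plan_complete` is an illustrative name.

```shell
#!/usr/bin/env bash
# Hypothetical Stop-hook check in the spirit of planning-with-files.
# ASSUMPTION: phases in task_plan.md are Markdown checkboxes, for example:
#   - [x] Phase 1: write failing tests
#   - [ ] Phase 2: implement parser
check_plan_complete() {
  plan=${1:-task_plan.md}
  payload=$(cat)                        # Stop hook input arrives as JSON on stdin
  # stop_hook_active is true when Claude is already continuing because of a
  # Stop hook. Allow the stop in that case to avoid an endless loop.
  case $payload in
    *'"stop_hook_active":true'*|*'"stop_hook_active": true'*) return 0 ;;
  esac
  [ -f "$plan" ] || return 0            # no plan file: nothing to enforce
  open=$(grep -c '^[[:space:]]*- \[ ] Phase' "$plan" || true)
  if [ "$open" -gt 0 ]; then
    echo "$plan still has $open unchecked phase(s); finish or update them." >&2
    return 2                            # exit status 2 from a Stop hook blocks stopping
  fi
}
```

A hook script would end with `check_plan_complete; exit $?` so that the function's status becomes the hook's exit code.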

Partially Protected Scenarios (CLAUDE.md + Skill semantic triggering)

#	Scenario	Protection Mechanism	Notes
7	PR / Code Review	CLAUDE.md review rules + superpowers:requesting-code-review	User-initiated, so semantic-level triggering is sufficient
9	DB Migration	planning-with-files tracks progress + CLAUDE.md doc sync rules	High-risk but low-frequency; plan constraints are sufficient

Scenarios Covered by Security Scanning Platform

#	Scenario	Coverage Method
8	Dependency Updates / SCA	SCA scan on the security scanning platform + sca-ai-denoise denoising
10	Hotfix / Emergency Fix	Same mechanism as Scenario 1 + incremental scan on the security scanning platform

End-to-End Development Workflow

New feature request
  │
  ▼
┌───────────────────────┐
│ 1. Brainstorming      │  ← superpowers HARD-GATE
│    Explore context    │     No code allowed until design is approved
│    Clarify needs      │
│    Propose approaches │
│    + trade-offs       │
│    User approves      │
└──────────┬────────────┘
           ▼
┌───────────────────────┐
│ 2. Writing Plans      │  ← superpowers + planning-with-files
│    Split into small   │     task_plan.md persisted
│    tasks              │     Hooks continuously inject status
│    2-5 min each       │
└──────────┬────────────┘
           ▼
┌───────────────────────┐
│ 3. TDD                │  ← superpowers Iron Law
│    Write failing test │     "NO CODE WITHOUT FAILING TEST"
│    Write minimal impl │
│    Tests pass         │
└──────────┬────────────┘
           ▼
┌───────────────────────┐
│ 4. Code Review        │  ← superpowers:requesting-code-review
│    Subagent reviews   │     Agent reviews Agent
└──────────┬────────────┘
           ▼
┌───────────────────────┐
│ 5. Verification       │  ← superpowers Iron Law
│    Tests pass?        │     "NO COMPLETION WITHOUT EVIDENCE"
│    Lint passes?       │
└──────────┬────────────┘
           ▼
┌───────────────────────┐
│ 6. Knowledge Capture  │  ← claudeception + pitfalls
│    Extractable        │     /claudeception or docs/pitfalls/
│    knowledge?         │     PostToolUse hook reminder
│    Docs need update?  │
└───────────────────────┘
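Step 5's "no completion without evidence" rule can be approximated mechanically: run each check, record the outcome, and refuse to report success unless every check passed. This is a minimal sketch. `verify_before_done`, the `EVIDENCE_LOG` variable, and the tab-separated log format are assumptions. The commands you pass are whatever your project uses for tests and lint.

```shell
#!/usr/bin/env bash
# Sketch of the "NO COMPLETION WITHOUT EVIDENCE" rule from step 5.
# Each argument is one check command. Every result is appended to an
# evidence log, and the function fails if any check fails, so "done" can
# only be claimed alongside a log of passing checks.
verify_before_done() {
  log=${EVIDENCE_LOG:-verification.log}   # hypothetical log location
  failed=0
  for cmd in "$@"; do
    if sh -c "$cmd" >/dev/null 2>&1; then
      status=PASS
    else
      status=FAIL; failed=1
    fi
    # timestamp <TAB> PASS|FAIL <TAB> command
    printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$status" "$cmd" >> "$log"
  done
  return "$failed"
}
```

For example, in an npm project you might run `verify_before_done "npm test" "npm run lint" && echo "ok to report done"`.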

Multi-AI Tool Compatibility

Harness is not limited to Claude Code — it's compatible with all major AI coding tools. During initialization, it auto-detects the current environment and generates the appropriate instruction file:

AI Tool	Instruction File	Format
Claude Code	`CLAUDE.md`	Markdown (tables/links/code blocks)
Cursor	`.cursorrules` or `.cursor/rules/*.md`	Plain text (indentation instead of tables)
Windsurf	`.windsurfrules`	Plain text
Cline	`.clinerules`	Plain text
GitHub Copilot	`.github/copilot-instructions.md`	Markdown
Aider	`CONVENTIONS.md`	Markdown
Continue	`.continuerules`	Plain text
Devin	`devin.md`	Markdown
Generic / Unknown	`AGENT.md`	Markdown

Same content, adapted format: All instruction files contain the same project knowledge, behavior rules, and documentation navigation — only the format is adjusted per tool (Markdown tools get tables/links, plain text tools get indentation/lists).

Portability of Dual-Layer Enhancement:

Layer 1 (Hooks): Claude Code only — other tools skip this layer
Layer 2 (Instruction file rules): Universal across all tools — only the filename differs
docs/ documentation: Fully universal — all AI tools can read Markdown docs
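Choosing a filename and format from the table above amounts to a simple lookup. The sketch below encodes that table. The short tool identifiers (`claude-code`, `copilot`, and so on) and the function name `instruction_target` are hypothetical. The sketch does not attempt the environment auto-detection itself.

```shell
#!/usr/bin/env bash
# Map an AI tool to "<instruction file><TAB><format>", following the
# compatibility table. Unknown tools fall back to the generic AGENT.md.
instruction_target() {
  case $1 in
    claude-code) printf '%s\t%s\n' "CLAUDE.md" markdown ;;
    cursor)      printf '%s\t%s\n' ".cursorrules" plain ;;   # or .cursor/rules/*.md
    windsurf)    printf '%s\t%s\n' ".windsurfrules" plain ;;
    cline)       printf '%s\t%s\n' ".clinerules" plain ;;
    copilot)     printf '%s\t%s\n' ".github/copilot-instructions.md" markdown ;;
    aider)       printf '%s\t%s\n' "CONVENTIONS.md" markdown ;;
    continue)    printf '%s\t%s\n' ".continuerules" plain ;;
    devin)       printf '%s\t%s\n' "devin.md" markdown ;;
    *)           printf '%s\t%s\n' "AGENT.md" markdown ;;   # generic fallback
  esac
}
```

Keeping the format next to the filename reflects the "same content, adapted format" rule: one set of rules is rendered as tables for Markdown tools and as indented lists for plain-text tools.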

Command System

Harness provides 7 post-initialization commands, installed as independent Skills in bundled-skills/:

Command	Trigger Words	Description
`harness help`	"harness help", "harness commands", "what commands"	Command index + installed Skill inventory + scenario quick entries
`harness audit`	"harness audit", "project health check", "harness status"	Scans CLAUDE.md/docs/hooks/Skill completeness; outputs a score + remediation steps
`harness quality gate`	"quality gate", "ready to commit", or automatically on "done"	3 levels: Lite (doc sync), Standard (hygiene + docs + progress), Full (all 7 checks)
`harness guide`	"recommend skill", "which skill", "skill recommendation"	Reads the scenario → Skill recommendation matrix and matches the best Skill
`harness cleanup`	"harness cleanup", "clean up temp files", "archive temp files"	Interactively scans for temp files and, after you confirm, archives them to archive/ (never deletes)
`harness resume`	"harness resume", "resume context after compact"	Lightweight context recovery (same session, ~3k tokens)
`harness handoff`	"harness handoff", "new agent takeover"	Deep context handoff (cross-session/crash recovery, ~8k tokens)

Bundled Skills

Harness packages 9 security/development Skills + 7 command Skills, deployed to ~/.claude/skills/ via symlink during installation:

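# Batch install: symlink each bundled Skill into ~/.claude/skills/
for skill in ~/.claude/skills/harness-en/bundled-skills/*/; do
  [ -d "$skill" ] || continue            # glob matched nothing: skip
  name=$(basename "$skill")
  # -n replaces an existing link on re-runs instead of nesting a new one inside it
  ln -sfn "${skill%/}" ~/.claude/skills/"$name"
done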

Category	Skill	Description	Config Required
🏭 Factory	skill-creator	General Skill generator	None
🏭 Factory	security-review-skill-creator	Security audit Skill generator	None (lark optional)
🔒 Security	security-review-skill-for-docker	Docker/container security audit	None
🔒 Security	security-review-skill-for-terraform	Terraform/IaC security audit	None
🔒 Security	sca-ai-denoise	SCA vulnerability denoising	None
🔒 Security	supply-chain-audit	Supply chain poisoning detection (8 langs)	None
🔒 Security	skills-audit	Third-party Skill security audit	Optional ANTHROPIC_API_KEY
🔒 Security	web-vuln-analyzer	Web vulnerability analysis	⚠️ Docker + API keys
🔒 Security	android-vuln-analyzer	Android vulnerability analysis	⚠️ apktool/jadx/frida
🛠 Command	harness-help	Help	None
🛠 Command	harness-audit	Health check	None
🛠 Command	harness-quality-gate	Quality gate	None
🛠 Command	harness-guide	Skill recommendation	None
🧹 Command	harness-cleanup	Interactive temp file archive (never deletes)	None
🔄 Command	harness-resume	Lightweight context recovery (after /compact)	None
🔄 Command	harness-handoff	Deep context handoff (new agent takeover)	None

Skills that require configuration are set up interactively in Step 2 of Harness initialization; any you skip are marked "Not available".
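One way to decide which optional Skills to mark "Not available" is to probe for the dependencies listed in the Config Required column. This sketch checks only what the table names: Docker; apktool, jadx, and frida; and an optional `ANTHROPIC_API_KEY`. The `skill_status` function and its messages are hypothetical and are not the actual Step 2 logic.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check mirroring the "Config Required" column.
have() { command -v "$1" >/dev/null 2>&1; }

skill_status() {
  case $1 in
    web-vuln-analyzer)
      # Only Docker is checked: which API keys are required is not specified here.
      if have docker; then echo "available"; else echo "Not available: missing docker"; fi ;;
    android-vuln-analyzer)
      missing=""
      for tool in apktool jadx frida; do
        have "$tool" || missing="$missing $tool"
      done
      if [ -z "$missing" ]; then echo "available"; else echo "Not available: missing$missing"; fi ;;
    skills-audit)
      # ANTHROPIC_API_KEY is optional, so the Skill stays available without it.
      if [ -n "${ANTHROPIC_API_KEY:-}" ]; then echo "available"
      else echo "available (optional ANTHROPIC_API_KEY not set)"; fi ;;
    *) echo "available" ;;              # remaining bundled Skills need no config
  esac
}
```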

File Manifest

~/.claude/skills/harness-en/
├── SKILL.md                              Main file (8-step workflow + 11 scenario integrations)
├── README.md                             This document
├── bundled-skills/                       Bundled Skills (symlink install)
│   ├── harness-help/SKILL.md             Command: help index
│   ├── harness-audit/SKILL.md            Command: project health check
│   ├── harness-quality-gate/SKILL.md     Command: pre-commit quality gate
│   ├── harness-guide/SKILL.md            Command: Skill recommendation guide
│   ├── harness-cleanup/SKILL.md          Command: interactive temp file archive
│   ├── harness-resume/SKILL.md           Command: lightweight context recovery
│   ├── harness-handoff/SKILL.md          Command: deep context handoff
│   ├── skill-creator/                    General Skill generator
│   ├── security-review-skill-creator/    Security audit Skill generator
│   ├── security-review-skill-for-docker/ Docker security audit
│   ├── security-review-skill-for-terraform/ Terraform security audit
│   ├── sca-ai-denoise/                   SCA vulnerability denoising
│   ├── supply-chain-audit/               Supply chain poisoning detection
│   ├── skills-audit/                     Third-party Skill security audit
│   ├── web-vuln-analyzer/                Web vulnerability analysis (lite)
│   └── android-vuln-analyzer/            Android vulnerability analysis
├── references/
│   ├── skill-ecosystem.md                Full Skill ecosystem map + installation methods
│   ├── skill-guide.md                    Scenario → Skill recommendation matrix (harness guide data source)
│   ├── doc-templates.md                  Documentation system templates (CLAUDE.md / docs/)
│   ├── agent-teams.md                    Agent Team role framework
│   ├── secure-coding.md                  Security standards (CWE + OWASP + Agent red lines)
│   ├── conventions.md                    Dev conventions + Agent behavior rules (9 MUST rules) + Token optimization
│   ├── lang-patterns.md                  Tech stack coding patterns (6 languages/frameworks)
│   ├── hook-scripts.md                   Enterprise Hook gate scripts (4 scripts + activation guide)
│   └── roadmap.md                        AI development pain points & solutions (24 items, detailed)
└── templates/
    ├── claude-md-index.md                CLAUDE.md slim template (L0 index)
    ├── sub-index.md                      L1 category index + L2 module index templates
    ├── task-plan.md                      Task plan template
    └── agent-role.md                     Agent role definition template

Honesty Statement

Harness's behavior enhancement consists of two layers:

Layer	Mechanism	Enforcement	Description
Hook scripts	Shell commands, exit 1 blocking	System-level	Pre-commit secret detection, dangerous command interception -- AI cannot bypass
Instruction enhancement	CLAUDE.md + Skill description	Depends on LLM	TDD/Review/documentation sync rules -- AI "should" follow but might not

Enterprise mode (all 4 Hook scripts enabled) provides stronger security guarantees. Open-source mode (default) relies on the LLM's instruction-following ability: this is prompt engineering, not deterministic access control.
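To make the system-level layer concrete, here is a sketch of a dangerous-command gate written as a Claude Code PreToolUse hook. Claude Code passes the pending tool call to the hook as JSON on stdin. For its hooks, the blocking status is exit code 2, with stderr fed back to Claude. The pattern list is illustrative only, and `guard_bash_command` is a hypothetical name. Matching raw JSON with grep is cruder than parsing it (for example with jq). This is not one of Harness's 4 enterprise scripts.

```shell
#!/usr/bin/env bash
# Sketch of a PreToolUse gate for Claude Code's Bash tool.
# Reads the tool-call JSON from stdin. Returns 2 (block) when the command
# matches an illustrative denylist pattern, and 0 (allow) otherwise.
guard_bash_command() {
  payload=$(cat)
  case $payload in
    *'"tool_name":"Bash"'*|*'"tool_name": "Bash"'*) ;;
    *) return 0 ;;                      # only inspect Bash tool calls
  esac
  # rm -rf on the filesystem root, force pushes, filesystem formatting
  if printf '%s' "$payload" | grep -Eq 'rm -rf /([" ]|$)|git push (-f|--force)|mkfs\.'; then
    echo "Blocked by PreToolUse gate: dangerous command pattern" >&2
    return 2
  fi
  return 0
}
```

Register a script that ends with `guard_bash_command; exit $?` under `PreToolUse` with the matcher `Bash` in `.claude/settings.json`. Because the block happens in the shell process rather than in the prompt, the model cannot talk its way past it.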

FAQ

Q: What's the relationship between Harness and superpowers? A: superpowers is the underlying behavior control framework (14 sub-Skills), while Harness is the higher-level orchestrator — it installs superpowers, configures hooks, generates documentation, and assembles the Agent Team. Analogy: superpowers is the Linux kernel, Harness is the Ubuntu installer.

Q: Do I need to say "harness" in every session? A: No. Harness only runs once, during project initialization. After that, the dual-layer enhancement (Hooks + CLAUDE.md rules) and Skill auto-matching take effect automatically in every session.

Q: What if I don't want a particular enhancement? A: Harness is interactive — every step can be skipped. You can also run only specific steps (e.g., generate docs only without setting up an Agent Team).

Q: Will re-running Harness overwrite my docs? A: No. Harness is idempotent — existing files are only supplemented, never overwritten; already-installed Skills are skipped.
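The "supplement, never overwrite" behavior can be sketched as follows: create a file when it is missing; otherwise, append a section only if its heading line is not already present. Matching on an exact heading line is an assumption made for illustration. `supplement_file` is a hypothetical helper, not Harness's actual implementation.

```shell
#!/usr/bin/env bash
# Append "<heading>\n<body>" to a file only if that exact heading line is
# absent. Existing content is never rewritten, so re-running is safe.
supplement_file() {
  file=$1 heading=$2 body=$3
  if [ -f "$file" ] && grep -qxF -- "$heading" "$file"; then
    return 0                            # section already there: leave it alone
  fi
  # Separate the new section with a blank line if the file already has content.
  { [ -s "$file" ] && printf '\n'; printf '%s\n%s\n' "$heading" "$body"; } >> "$file"
}
```

Running `supplement_file CLAUDE.md "## Behavior Rules" "- MUST run tests before committing"` twice leaves a single Behavior Rules section.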

Q: Can I add my own conventions? A: Yes. Edit the files under docs/conventions/ directly, or add custom MUST/MUST NOT entries to the behavior rules section of CLAUDE.md.

Q: How do I share pitfall experience across projects? A: Skills generated by claudeception can be stored in ~/.claude/skills/ (user-level), shared across all projects. Storing in .claude/skills/ (project-level) limits them to the current project only.
