Harness Engineering

This repository is an active attempt to formalize and implement Harness Engineering—a mechanical, repo-enforced framework for making large (often agent-driven) code changes safe, reviewable, and incremental.

Inspired by OpenAI's work on Harness Engineering, this project provides the scaffolding, tooling, and conventions necessary for humans to steer while agents execute.

What is Harness Engineering?

Harness Engineering is a set of constraints and feedback loops that enable high-throughput agent-driven development without architectural decay. It treats the repository as a system of record for everything agents need to know.

Key insight from OpenAI's experience:

"What Codex can't see doesn't exist."

Knowledge in Slack threads, Google Docs, or engineers' heads is illegible to agents. The solution: encode everything—architecture, taste, quality standards—into mechanical, verifiable structures in the repo itself.

Core Principles

Principle	What it means
One canonical harness command	`just check` (or equivalent) runs all gates locally and in CI
Architecture as code	Import boundaries enforced mechanically (Import Linter, grimp)
Taste as code	AST rules (ast-grep) ban patterns and prevent regressions
Evidence-based chunks	Every change produces tests, snapshots, or golden diffs
Progressive disclosure	`AGENTS.md` is a map, not an encyclopedia; deeper docs linked
Agent-to-agent review	Council of models votes on risky changes

Repository Structure

.
├── .claude-plugin/
│   └── plugin.json             # Claude Code plugin manifest
├── docs/
│   ├── harness-engineering.md    # Full HES v1 specification
│   └── workflows/
│       └── RPEQ.md               # Research → Plan → Execute → QA workflow
├── agents/                       # Agent definitions for specific tasks
│   ├── analysis/                 # Codebase analysis agents
│   ├── development/              # Development agents (TDD, refactoring)
│   ├── documentation/            # Documentation agents
│   ├── research/                 # Research and synthesis agents
│   ├── performance/              # Profiling and optimization agents
│   └── security/                 # Security review agents
├── commands/                     # Slash commands for Claude Code
│   ├── context engineering/      # Research, plan, execute workflow
│   ├── integration/              # External tool integrations
│   ├── quality-assurance/        # QA and inspection commands
│   └── utilities/                # Helper commands
├── skills/                       # Reusable skill definitions
│   ├── ast-grep-setup/           # TypeScript ast-grep rules setup
│   ├── codebase-research/        # Codebase mapping and research
│   ├── differential-session-runner/ # Durable debugging/evidence sessions
│   ├── harness-map/              # Repository harness layer mapping
│   ├── implementation-planner/   # Plan generation from research
│   ├── plan-executor/            # Execute implementation plans
│   └── qa-from-execute/          # QA review after execution
├── prompt-hooks/                 # Prompt hooks for automation
├── alias/                        # Model alias configurations
└── rules/                        # Structural and taste rules

Installation

As a Claude Code Plugin

Install this as a plugin to get namespaced skills, agents, and commands:

# Install from local directory (for development)
claude --plugin-dir /path/to/harness-engineering

# Or add to your plugin marketplaces for easy installation

Once installed, skills are available as /harness-engineering:<skill-name>:

/harness-engineering:codebase-research - Map and research codebases
/harness-engineering:harness-map - Map a repo's mechanical harness layers
/harness-engineering:differential-session-runner - Create or continue durable debugging/evidence sessions
/harness-engineering:implementation-planner - Generate execution plans
/harness-engineering:plan-executor - Execute implementation plans
/harness-engineering:qa-from-execute - QA review of changes
/harness-engineering:ast-grep-setup - Set up structural linting

Using This Repository

Most of the files in this repo (agents, commands, skills) are designed for Claude Code and OpenAI Codex. However, they can easily be adapted to other harnesses:

For other AI tools: Just prompt your tool to read and adapt the files. The patterns are tool-agnostic.
For a universal sync tool: We plan to build a sync tool that can push these definitions to any harness.
Internal use: Clone this repo and ask Claude or Codex to set it up for your project. They will understand the structure and adapt it to your codebase.

The core principles and gates are universal. The implementation details (slash commands, prompt hooks) are just one expression of the Harness Engineering philosophy.

The Six Gates

A Harness Engineering compliant repository enforces these gates:

Gate A: Formatting + Lint — Deterministic style enforcement (ruff, etc.)
Gate B: Import Boundaries — Architecture constraints via Import Linter
Gate C: Structural Ratchets — AST rules via ast-grep
Gate D: Snapshot Testing — Behavior lock via syrupy
Gate E: Golden Outputs — End-to-end deterministic artifacts
Gate F: Numerical Equivalence — Tolerances for math/ML/sims refactors

Work Chunks

Changes merge only as work chunks—small, independently verifiable units with evidence:

One thing changed
Harness enforcement adjusted so the change stays true
Evidence produced (tests, snapshots, goldens, diffs)
All gates pass

Chunk documentation lives in docs/chunks/NNN-<slug>.md with:

Intent (what changes)
Preconditions (harness status, baselines)
Exact commands
Evidence (snapshots, goldens, numerical equivalence)
Rollback procedure

Current Status

This repository is under active development. We are:

Building out agent definitions for common development tasks
Creating slash commands for the RPEQ workflow
Formalizing skill patterns for research, planning, execution, and QA
Implementing prompt hooks for automated validation

See docs/harness-engineering.md for the full specification.

References

OpenAI: Harness Engineering — The original article describing agent-first development at OpenAI
AGENTS.md — Convention for agent instructions
Import Linter — Architecture boundary enforcement
ast-grep — Structural search and linting
Syrupy — Snapshot testing

License

MIT License — see LICENSE file for details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Harness Engineering

What is Harness Engineering?

Core Principles

Repository Structure

Installation

As a Claude Code Plugin

Using This Repository

The Six Gates

Work Chunks

Current Status

References

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
.claude-plugin		.claude-plugin
agents		agents
alias		alias
commands		commands
docs		docs
prompt-hooks		prompt-hooks
rules/ast-grep		rules/ast-grep
skills		skills
status-line		status-line
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

Harness Engineering

What is Harness Engineering?

Core Principles

Repository Structure

Installation

As a Claude Code Plugin

Using This Repository

The Six Gates

Work Chunks

Current Status

References

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages