Awesome Agent Harness

A curated, implementation-first list of agent harness engineering resources, with GitHub projects as the primary focus.

Total entries: 270
GitHub entries: 243 (90.0%)
GitHub entries in project categories (all categories except Essential Readings & Ecosystem Maps): 238/238 (100.0%)
Categories: 9
Last verified: 2026-06-09
Language: English | 中文

Featured Harness Blogs

Scaling Managed Agents: Decoupling the brain from the hands: Anthropic's meta-harness architecture for decoupling session logs, harness loops, and sandboxes in long-horizon agents.
What We Learned Building Cloud Agents: Cognition's field report on secure cloud-agent infrastructure, VM isolation, full-state snapshots, orchestration, governance, integrations, and enterprise adoption.
Claude Code auto mode: Anthropic's write-up on classifier-backed approval delegation for safer high-autonomy coding-agent runs.
Harness engineering (OpenAI): Field report on building reliable agent-first software via harness constraints and verification.
The next evolution of the Agents SDK: OpenAI's product and engineering post on model-native agent harnesses, native sandbox execution, manifests, memory, and filesystem and shell tools.
Building Effective AI Agents: Anthropic's practical guidance on when to use workflows vs. autonomous agents and how to structure them.
Writing effective tools for AI agents: Best practices for tool interface design so agents call tools safely and reliably.
Effective harnesses for long-running agents: Practical guide to maintaining state, resumability, and reliability over long agent runs.
Harness design for long-running application development: Follow-up article on improving long-running app generation through harness structure.
Improving Deep Agents with harness engineering: Evidence that harness improvements alone can move benchmark performance.
Evaluating Deep Agents: Our Learnings: LangChain's practical lessons on evaluating stateful and long-horizon agents.
Your Agent Needs a Harness, Not a Framework: Argument for reliability-first infrastructure around agents instead of framework-only thinking.

Category Overview

Category	Entries
Harness Architecture & Orchestration	44
Context & Working-State Engineering	16
Execution Substrates & Sandboxing	25
Protocols, Tool Interfaces & Agent Contracts	24
Evaluation Harnesses & Benchmarks	27
Observability & Reliability Operations	15
Guardrails, Security & Governance	19
Reference Harness Implementations	68
Essential Readings & Ecosystem Maps	32
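
The header totals follow from the per-category counts above. As a quick sanity check, the sketch below recomputes them. The dictionary transcribes the table, and the GitHub count is copied from the header. Treating Essential Readings & Ecosystem Maps as the only non-project category is an assumption based on the header's "excluding readings" wording.

```python
# Per-category entry counts transcribed from the Category Overview table.
CATEGORY_COUNTS = {
    "Harness Architecture & Orchestration": 44,
    "Context & Working-State Engineering": 16,
    "Execution Substrates & Sandboxing": 25,
    "Protocols, Tool Interfaces & Agent Contracts": 24,
    "Evaluation Harnesses & Benchmarks": 27,
    "Observability & Reliability Operations": 15,
    "Guardrails, Security & Governance": 19,
    "Reference Harness Implementations": 68,
    "Essential Readings & Ecosystem Maps": 32,
}

# Assumption: this is the only category the header counts as "readings".
READINGS = "Essential Readings & Ecosystem Maps"
GITHUB_ENTRIES = 243  # copied from the header

total = sum(CATEGORY_COUNTS.values())
project_total = total - CATEGORY_COUNTS[READINGS]
github_share = round(100 * GITHUB_ENTRIES / total, 1)

print(total, project_total, github_share)  # 270 238 90.0
```

The 238 project entries match the 238/238 figure in the header, which leaves 5 GitHub entries (243 - 238) in the readings category.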

Catalog

Notes:

Stars are rendered as badges from snapshot values.
Repository update dates are tracked in data/projects.yaml and validation reports.
Entries are sorted by stars (descending) within each category.
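
The ordering rule in the notes above can be sketched as a group-then-sort step. This is only an illustration: the record fields (`name`, `category`, `stars`), the project names, and the star values are all hypothetical, and the actual schema of data/projects.yaml is not shown in this document.

```python
from collections import defaultdict

# Hypothetical records; names, fields, and star snapshots are made up for illustration.
entries = [
    {"name": "project-a", "category": "Evaluation Harnesses & Benchmarks", "stars": 1200},
    {"name": "project-b", "category": "Execution Substrates & Sandboxing", "stars": 300},
    {"name": "project-c", "category": "Evaluation Harnesses & Benchmarks", "stars": 5400},
    {"name": "project-d", "category": "Execution Substrates & Sandboxing", "stars": 9100},
]


def group_and_sort(records):
    """Group records by category, then order each group by stars, highest first."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["category"]].append(record)
    return {
        category: sorted(items, key=lambda r: r["stars"], reverse=True)
        for category, items in grouped.items()
    }


catalog = group_and_sort(entries)
for category, items in catalog.items():
    print(category, [item["name"] for item in items])
```

Because Python's `sorted` is stable, entries with equal star counts keep their original relative order, so a tie does not reshuffle the list between snapshots.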

Harness Architecture & Orchestration

Project	Link	Tags	Summary
Superpowers	GitHub	skills, workflow, cross-agent	Cross-agent software development methodology built from composable skills, mandatory workflows, worktrees, planning, TDD, review, and subagent execution.
ECC	GitHub	cross-harness, hooks, skills	Cross-harness operator system combining skills, hooks, memory optimization, security scanning, and validation workflows for agentic work.
gstack	GitHub	skills, qa, release	Claude Code and cross-agent skill stack that turns product planning, architecture review, QA, security, release, and retrospectives into repeatable agent workflows.
DeerFlow	GitHub	long-horizon, memory, subagents	Long-horizon super-agent harness integrating memory, tools, subagents, and sandboxes.
oh-my-openagent	GitHub	multi-harness, team-mode, skills	Multi-harness agent OS for OpenCode, Codex, Claude Code, and other coding agents with team-mode orchestration, background agents, MCPs, and skills.
AutoGen	GitHub	multi-agent, orchestration, framework	Programming framework for agentic AI with multi-agent interaction and orchestration.
Ruflo	GitHub	multi-agent, swarm, mcp	Multi-agent orchestration platform for Claude Code with swarms, persistent memory, federation, plugins, and MCP hooks.
CrewAI	GitHub	multi-agent, workflows, control-plane	Multi-agent automation framework with production Flows, autonomous Crews, event-driven control, tracing, guardrails, memory, and human review hooks.
Addy's Agent Skills	GitHub	skills, quality-gates, coding-agents	Production-grade engineering skills for coding agents that package lifecycle workflows, quality gates, reviews, testing, debugging, security, and release practices.
Agno	GitHub	scale, runtime, management	Agent software runtime focused on running and managing agentic systems at scale.
LangGraph	GitHub	graph, workflow, runtime	Graph-based runtime for resilient stateful agents and deterministic workflow control.
Semantic Kernel	GitHub	enterprise, orchestration, plugins	Enterprise-grade agentic application framework with orchestration and plugin patterns.
OpenAI Agents SDK (Python)	GitHub	sdk, handoff, workflows	Lightweight framework for multi-agent workflows, handoffs, and production patterns.
Symphony	GitHub	orchestration, control-plane, workflows	Ticket-driven orchestration layer that turns project work into isolated autonomous implementation runs.
deepagents	GitHub	runtime, orchestration, long-running	Open-source harness for long-running, tool-using agents with planning and subagent patterns.
Archon	GitHub	workflow-engine, worktrees, validation	Workflow engine for AI coding agents with YAML-defined phases, isolated worktrees, and validation gates.
Google ADK (Python)	GitHub	toolkit, deployment, evaluation	Code-first toolkit to build, evaluate, and deploy advanced AI agents.
elizaOS	GitHub	agent-os, plugins, benchmarks	Extensible agent runtime and operating system with CLI scaffolding, agent loop, plugins, memory/state primitives, dashboards, connectors, and benchmark suites.
PydanticAI	GitHub	python, typing, schema	Type-safe Python framework for agents with strong schema contracts and tooling.
Gas Town	GitHub	multi-agent, workspaces, coordination	Multi-agent workspace manager for coordinating coding agents with persistent work tracking, git-backed hooks, handoffs, supervision, and merge queues.
Microsoft Agent Framework	GitHub	multi-agent, workflows, observability	Multi-language framework for building, orchestrating, and deploying AI agents with graph workflows and observability.
Hive	GitHub	harness, orchestration, runtime	Outcome-driven agent runtime harness with explicit control loops and orchestration blocks.
VoltAgent	GitHub	typescript, platform, runtime	TypeScript agent engineering platform built around open runtime abstractions.
mcp-agent	GitHub	mcp, runtime, workflow	Practical agent framework centered on MCP tool ecosystems and workflow composition.
PraisonAI	GitHub	multi-agent, workflow, memory	Multi-agent workforce framework with autonomous planning, execution, memory, RAG, dashboards, and multi-provider model support.
Agent Squad	GitHub	routing, multi-agent, context	Multi-agent orchestration framework that routes requests, preserves conversation context, supports Python/TypeScript, and coordinates specialist agents.
Yao	GitHub	single-binary, runtime, autonomous	Single-binary runtime for defining and running autonomous agents.
Open Multi-Agent	GitHub	multi-agent, dag, tracing	TypeScript-native multi-agent orchestrator that turns goals into task DAGs with parallel execution, MCP integration, and live tracing.
Strands Agents	GitHub	sdk, mcp, tools	Model-driven agent SDK and monorepo with Python/TypeScript agent loops, provider adapters, tools, MCP integration, multi-agent systems, and streaming.
Cloudflare Agents	GitHub	platform, deployment, runtime	Platform runtime for building and deploying agents with production infrastructure primitives.
Flue	GitHub	typescript, headless, sandbox	TypeScript harness framework for building headless agents with sessions, tools, skills, and pluggable sandboxes.
Embabel Agent Framework	GitHub	jvm, planning, typed-flows	JVM agent framework for typed agentic flows with goals, actions, conditions, dynamic planning, platform modes, and testability.
OpenAI Agents SDK (JS/TS)	GitHub	typescript, workflows, sandbox-agents	JavaScript/TypeScript framework for multi-agent workflows with handoffs, tools, guardrails, sessions, tracing, and sandbox agents.
Docker Agent	GitHub	docker, runtime, container	Agent builder and runtime stack emphasizing container-native execution.
NeMo Agent Toolkit	GitHub	multi-agent, optimization, toolkit	Open toolkit for connecting and optimizing teams of AI agents.
Apache Burr	GitHub	state-machine, persistence, tracing	State-machine framework for decision-making agents and LLM apps with persistence, telemetry UI, tracing, and framework-agnostic execution.
Scion	GitHub	multi-agent, containers, orchestration	Experimental multi-agent orchestration testbed that runs isolated agent harnesses in containers, worktrees, and remote runtimes.
deepagentsjs	GitHub	typescript, langgraph, subagents	TypeScript agent harness with built-in planning, filesystem tools, subagents, and LangGraph-native runtime hooks.
oh-my-agent	GitHub	multi-agent, skills, cross-runtime	Portable multi-agent harness that projects shared agents, skills, workflows, and rules into multiple coding-agent runtimes.
Chorus	GitHub	ai-dlc, permissions, task-state	AI-human collaboration harness for session lifecycle, task state, sub-agent orchestration, observability, and recovery.
Pydantic AI Harness	GitHub	capabilities, hooks, pydantic	Official Pydantic AI capability library for composing tools, lifecycle hooks, instructions, and model settings into reusable agent harnesses.
Water	GitHub	python, framework, approval-gates	Python agent harness framework for orchestration, resilience, observability, guardrails, approval gates, sandboxing, and deployment.
OmniCoreAgent	GitHub	python, mcp, serving	Python production harness with model loop, tools, MCP, memory, workspace files, guardrails, events, subagents, background tasks, and REST/SSE serving.
hankweave	GitHub	long-horizon, runtime, checkpoints	Headless-first long-horizon runtime that orchestrates existing agent harnesses with sentinels, loops, checkpoints, and event journals.

Context & Working-State Engineering

Project	Link	Tags	Summary
claude-mem	GitHub	memory, context, session	Plugin-style memory layer that captures session history and reinjects relevant context into future coding runs.
Beads	GitHub	memory, issue-tracking, work-state	Agent-optimized distributed issue tracker that stores long-horizon coding work as dependency-aware graph state with memory recall and multi-branch sync.
planning-with-files	GitHub	planning, skills, persistence	Skill package for persistent file-based planning in coding-agent workflows.
agentmemory	GitHub	memory, mcp, hooks	Persistent memory server for coding agents using hooks, MCP/REST integration, hybrid search, and shared session recall.
Context Mode	GitHub	context, mcp, session	MCP context optimization server that sandboxes tool output, indexes session events, and restores continuity across agent compactions.
Agent Skills for Context Engineering	GitHub	skills, context, production	Large skill library oriented around context engineering and production agents.
Trellis	GitHub	specs, memory, workflow	Multi-platform coding-agent workflow framework with task context, project memory, and spec injection.
Context-Engineering Handbook	GitHub	context-engineering, handbook, practices	First-principles handbook focused on practical context engineering for agent systems.
CCPM	GitHub	planning, github-issues, parallel-execution	Spec-driven project-manager skill that turns PRDs and GitHub issues into persistent context and parallel agent execution.
TencentDB Agent Memory	GitHub	memory, context-offloading, openclaw	Local agent memory plugin combining symbolic short-term state, layered long-term memory, traceability, and OpenClaw/Hermes integrations.
Acontext	GitHub	skills, memory, progressive-disclosure	Skill-memory layer that distills agent runs into inspectable skill files and recalls them through agent-controlled tools.
Awesome Context Engineering	GitHub	awesome-list, context, survey	Survey-style list for context engineering resources and frameworks.
agentic-stack	GitHub	cross-harness, memory, skills	Portable memory, skills, protocols, and dashboard layer that keeps state across multiple coding-agent harnesses.
context-space	GitHub	context, infrastructure, mcp	Infrastructure project focused on context engineering building blocks and MCP-centric integrations.
Memorix	GitHub	memory, mcp, cross-agent	Local-first cross-agent memory control plane with MCP support, workspace sync, sessions, and orchestration state.
sd0x-dev-flow	GitHub	hooks, state-machine, claude-code	Claude Code harness layer with hook-enforced dual review, durable state-machine gates, context-compaction recovery, and fail-closed safety.

Execution Substrates & Sandboxing

Project	Link	Tags	Summary
Daytona	GitHub	sandbox, execution, infra	Secure and elastic sandbox infrastructure for running AI-generated code with file, Git, LSP, and execution APIs.
CUA	GitHub	computer-use, sandbox, infra	Infrastructure stack for computer-use agents with sandbox, SDK, and benchmark support.
Browser Harness	GitHub	browser, cdp, self-healing	Thin editable CDP harness that connects LLMs directly to real browsers and lets agents extend helpers in flight.
E2B	GitHub	cloud-sandbox, execution, enterprise	Secure cloud environments with real tools for production-grade agent execution.
OpenSandbox	GitHub	sandbox, security, runtime	Secure and extensible sandbox runtime built for agent workloads.
OpenShell	GitHub	sandbox, policy, runtime	Safe private runtime for autonomous agents with sandbox lifecycle control and declarative filesystem, network, process, and inference policies.
Microsandbox	GitHub	sandbox, vm, mcp	Rootless local VM sandbox runtime with SDKs, detached long-running sessions, agent skills, and MCP server integration.
CubeSandbox	GitHub	microvm, sandbox, e2b-compatible	MicroVM-based sandbox service for AI agents with sub-60ms startup, E2B-compatible APIs, and hardware-level isolation.
Sandcastle	GitHub	sandbox, typescript, branch-strategy	TypeScript library for orchestrating coding agents inside isolated sandboxes with configurable branch strategies.
agent-infra sandbox	GitHub	all-in-one, browser, shell	All-in-one sandbox combining browser, shell, files, MCP, and IDE server.
Judge0	GitHub	code-execution, sandbox, backend	Scalable sandboxed code execution system usable as an agent execution backend.
Agent Sandbox	GitHub	kubernetes, sandbox, stateful	Kubernetes-native sandbox control plane for isolated, stateful agent runtimes with stable identity, persistence, and warm-pool support.
stakpak/agent	GitHub	always-on, autonomous, ops	Always-on open agent that runs on your machines with autonomous operational loops.
Sandbox Agent	GitHub	sandbox, coding-agents, session-schema	HTTP/SSE control server for running coding agents inside sandboxes with normalized sessions, permissions, event streaming, and replay.
E2B Desktop Sandbox	GitHub	desktop, sandbox, computer-use	Secure virtual desktop sandbox for computer-use agents with SDK control and screen streaming.
OSS-Fuzz Gen	GitHub	fuzzing, security, execution	LLM-powered fuzzing workflows integrated with controlled execution contexts.
AgentBay SDK	GitHub	cloud-sandbox, computer-use, sdk	Cloud sandbox SDK for agents spanning browser, desktop, mobile, and code execution environments.
Tensorlake	GitHub	microvm, sandbox, orchestration	Serverless runtime for agent sandboxes with MicroVM isolation, snapshots, suspend-resume, and background orchestration.
AgentScope Runtime	GitHub	runtime, sandbox, deployment	Production runtime for agent apps with secure tool sandboxes, deployment APIs, observability, and state services.
SWE-ReX	GitHub	sandbox, execution, coding-agent	Sandboxed execution infrastructure for AI coding agents at local and cloud scale.
sandboxed.sh	GitHub	self-hosted, isolation, orchestrator	Self-hosted orchestrator running coding agents inside isolated Linux workspaces.
Capsule	GitHub	wasm, sandbox, task-runtime	Durable runtime that coordinates agent tasks inside isolated WebAssembly sandboxes with retries and lifecycle tracking.
agentbox	GitHub	sandbox, coding-agents, network-policy	Locked-down local sandbox for AI coding agents with scoped filesystem access, egress policy, secret injection, firewalling, and persistent agent state.
HexAgent	GitHub	computer-layer, sandbox, runtime	Agent harness that separates the runtime from the computer it operates on through local, VM, and cloud sandbox backends.
terminal-bench-env	GitHub	terminal, benchmark-env, sandbox	Environment layer for terminal-agent benchmark execution.

Protocols, Tool Interfaces & Agent Contracts

Project	Link	Tags	Summary
Anthropic Agent Skills	GitHub	skills, spec, claude	Official Agent Skills repository containing the skills specification, templates, and reference skill implementations for Claude.
GitHub Spec Kit	GitHub	spec-driven, workflows, tooling	Toolkit for spec-driven development to guide deterministic agent execution.
MCP Servers	GitHub	mcp, servers, implementations	Official collection of MCP server implementations across tools and domains.
Chrome DevTools MCP	GitHub	mcp, browser, devtools	Official MCP server that gives coding agents Chrome DevTools access for reliable browser automation, debugging, and performance analysis.
Playwright MCP	GitHub	mcp, browser, playwright	Official Playwright MCP server giving agents structured accessibility snapshots and deterministic browser automation tools.
Claude Code Plugins Directory	GitHub	plugins, claude-code, marketplace	Anthropic-managed Claude Code plugin marketplace defining plugin manifests, MCP configuration, commands, agents, skills, and submission quality gates.
FastMCP	GitHub	mcp, python, framework	Python framework for building MCP servers and clients with generated schemas, validation, documentation, production deployment patterns, and governance hooks.
Serena	GitHub	mcp, coding-agents, semantic-tools	MCP toolkit that gives coding agents IDE-like semantic retrieval, editing, refactoring, debugging, and memory tools.
MCP Python SDK	GitHub	mcp, python, sdk	Official Python implementation of MCP for building clients and servers that expose tools, resources, prompts, protocol lifecycle events, and standard transports.
AGENTS.md	GitHub	spec, agent-file, instructions	Open format for repository-local instructions that coding agents can follow.
Agent Skills Specification	GitHub	skills, spec, progressive-disclosure	Open specification and documentation for packaging reusable agent capabilities, workflows, scripts, references, and assets behind progressive disclosure.
MCP TypeScript SDK	GitHub	mcp, typescript, sdk	Official TypeScript MCP SDK with server and client packages, transports, auth helpers, middleware adapters, and runnable examples.
Model Context Protocol	GitHub	mcp, protocol, interoperability	Core specification and docs for MCP-based tool and context interoperability.
directories (rules and MCP indexes)	GitHub	directories, mcp, rules	Curated directories of agent rules and MCP servers for tool discovery.
Atmosphere	GitHub	jvm, multi-protocol, governance	JVM runtime for streaming governable AI agents across MCP, A2A, AG-UI, and browser-facing transports.
LangChain MCP Adapters	GitHub	mcp, adapters, integration	Adapters connecting LangChain components with MCP servers.
SkillHub	GitHub	skills, registry, governance	Self-hosted enterprise agent skill registry with package publishing, versioning, discovery, namespaces, RBAC, reviews, and audit logs.
Agent Client Protocol	GitHub	acp, protocol, coding-agents	Open protocol that standardizes communication between code editors and coding agents.
Microsoft MCP Servers	GitHub	mcp, enterprise, servers	Microsoft's official MCP server catalog for enterprise data and tools.
ACPX	GitHub	acp, client, sessions	Headless CLI client for stateful Agent Client Protocol sessions.
GitAgentProtocol	GitHub	standard, git-native, workflows	Git-native, framework-agnostic standard for defining agents, skills, workflows, tools, and runtime memory in repositories.
Microsoft Learn MCP	GitHub	mcp, docs, grounding	MCP server and CLI for grounding agents with Microsoft documentation sources.
IBM MCP	GitHub	mcp, clients, tooling	IBM collection of MCP servers, clients, and developer tooling.
AGENT.md	GitHub	standard, agent-file, interoperability	Standardized machine-readable file format for agentic coding tools.

Evaluation Harnesses & Benchmarks

Project	Link	Tags	Summary
Promptfoo	GitHub	eval, red-team, ci	Config-driven prompt/agent/RAG testing, comparison, and red-team evaluation tool.
DeepEval	GitHub	evaluation, framework, testing	LLM evaluation framework supporting agent and workflow quality testing.
RAGAS	GitHub	rag, metrics, evaluation	Open evaluation toolkit for LLM and RAG quality metrics.
lm-evaluation-harness	GitHub	benchmark, harness, llm	Popular benchmark harness for consistent LLM evaluation across tasks.
SWE-bench	GitHub	benchmark, swe, evaluation	Standard benchmark for evaluating issue-fixing software engineering agents.
verifiers	GitHub	verifier, rl, evaluation	Library for RL environments and verifier-based evaluation loops.
AgentBench	GitHub	benchmark, cross-domain, agent	Cross-environment benchmark for evaluating LLM agents as tool-using systems.
LangWatch	GitHub	simulation, evaluation, testing	End-to-end platform for agent simulations, evaluation loops, and production testing.
EvalScope	GitHub	benchmark, framework, llm	Customizable framework for large-model benchmarking and performance evaluation.
Harbor	GitHub	evaluation, harness, rl-env	Framework for running agent evaluations and constructing RL-style environments.
Terminal-Bench	GitHub	terminal, benchmark, long-horizon	Terminal-native benchmark suite for long-horizon, verification-heavy agent tasks.
WebArena	GitHub	web-agent, benchmark, environment	Self-hostable web environment and evaluation harness for autonomous web agents with reproducible end-to-end tasks.
tau2-bench	GitHub	tool-use, interaction, benchmark	Tool-agent-user interaction benchmark emphasizing multi-step execution quality.
Meta-Harness	GitHub	harness-search, optimization, terminal-bench	Framework for automated search over task-specific model harnesses, with reference experiments for memory systems and terminal-agent scaffolds.
NeMo Gym	GitHub	rl-env, training, evaluation	Toolkit for building RL environments suitable for LLM/agent training and eval.
TheAgentCompany	GitHub	benchmark, workplace, multi-step	Agent benchmark with simulated software-company tasks for evaluating multi-step workplace autonomy.
Claw-Eval	GitHub	benchmark, trajectory, safety	Evaluation harness and benchmark for autonomous agents with human-verified tasks, trajectory auditing, and completion, safety, and robustness rubrics.
Inspect Evals	GitHub	inspect, eval-suite, reproducibility	Evaluation suite collection for Inspect AI workflows.
auto-harness	GitHub	optimization, regression, evals	Benchmark-gated optimization loop that mines failures, edits agent code, and guards against regressions overnight.
SWE-Bench Pro	GitHub	swe, benchmark, long-horizon	Long-horizon software-engineering benchmark with reproducible Docker-based evaluation for issue-driven coding agents.
WildClawBench	GitHub	benchmark, harness-comparison, multimodal	In-the-wild benchmark that compares multiple agent harnesses on end-to-end multimodal, coding, safety, and productivity tasks inside a live OpenClaw environment.
ClawBench	GitHub	browser-agent, benchmark, recording	Browser-agent benchmark with live-site tasks, isolated containers, five-layer recording, and agentic scoring.
Agent Evaluation	GitHub	evaluation, testing, ci	AWS framework for testing virtual agents with evaluator-driven multi-turn conversations, hooks, and CI-friendly workflows.
WorkArena	GitHub	browser, benchmark, enterprise	Browser benchmark for practical enterprise-like knowledge work tasks.
OpenHands Benchmarks	GitHub	openhands, eval, harness	Evaluation harness and benchmark definitions for OpenHands systems.
WebArena-Verified	GitHub	web-agent, benchmark, deterministic	Verified web-agent benchmark with deterministic evaluators.
HarnessBench	GitHub	harness-comparison, browser-agent, benchmark	Benchmark for comparing agent harnesses on the same everyday web tasks with fixed models and per-harness containers.

Observability & Reliability Operations

Project	Link	Tags	Summary
Langfuse	GitHub	llmops, tracing, metrics	Open-source LLM engineering platform for traces, metrics, prompts, and evals.
MLflow	GitHub	platform, monitoring, evaluation	Broad AI engineering platform with monitoring and evaluation support for agents.
Opik	GitHub	monitoring, eval, tracing	End-to-end debug/eval/monitoring stack for LLM apps and agent workflows.
RagaAI Catalyst	GitHub	agentops, analytics, monitoring	Agent observability and monitoring framework with timeline and graph analytics.
TensorZero	GitHub	llmops, gateway, optimization	Open LLMOps stack unifying gateway, observability, evaluation, and optimization.
Arize Phoenix	GitHub	observability, tracing, evaluation	Open platform for AI observability, tracing, and evaluation analytics.
OpenLLMetry	GitHub	opentelemetry, instrumentation, tracing	OpenTelemetry-based instrumentation for GenAI and LLM applications.
Helicone	GitHub	monitoring, traffic, production	Lightweight platform for monitoring and evaluating LLM traffic in production.
AgentOps SDK	GitHub	agentops, monitoring, cost	Monitoring and benchmarking SDK for agent workflows with cost and trace tracking.
Latitude	GitHub	platform, eval, observability	Open-source agent engineering platform with eval and observability capabilities.
Laminar	GitHub	observability, tracing, evals	Agent-focused observability stack with tracing, evaluation runs, monitoring, and dashboards.
Desloppify	GitHub	quality-gates, codebase-health, ci	Agent-facing codebase quality harness with scans, scoring, LLM review, prioritized fix loops, persistent state, and CI gates.
claude-code-reverse	GitHub	trace, visualization, debugging	Tooling to visualize and inspect Claude Code LLM interaction traces.
Future AGI	GitHub	observability, evaluation, guardrails	Self-hostable platform that closes the loop across agent tracing, evaluation, simulation, guardrails, and gateway operations.
OpenInference	GitHub	spec, instrumentation, observability	Open instrumentation specification and tooling for AI observability.

Guardrails, Security & Governance

Project	Link	Tags	Summary
LiteLLM	GitHub	gateway, proxy, guardrails	Unified LLM gateway/proxy with cost tracking, load balancing, and guardrails.
Kong	GitHub	gateway, policy, infra	API and AI gateway infrastructure useful for policy enforcement in agent systems.
Parlant	GitHub	interaction-control, guardrails, customer-agents	Interaction-control harness for customer-facing agents focused on consistent, predictable, and governed LLM behavior.
Portkey Gateway	GitHub	gateway, guardrails, routing	AI gateway with routing and guardrails for multi-model production traffic.
CAI (Cybersecurity AI)	GitHub	security, governance, framework	Security-focused agent framework for offensive/defensive AI workflows.
OpenAI Realtime Agents	GitHub	realtime, orchestration, control	Advanced agentic realtime patterns with structured control and interaction loops.
Plano	GitHub	proxy, safety, data-plane	AI-native proxy and data plane with orchestration, safety, and observability.
OpenAI CS Agents Demo	GitHub	demo, handoffs, governance	Customer-service multi-agent demo highlighting handoffs and guardrail-like control points.
Agent Governance Toolkit	GitHub	governance, policy, sandboxing	Runtime governance toolkit that deterministically enforces agent policy, identity, sandboxing, and audit controls before actions execute.
ContextForge	GitHub	gateway, governance, observability	Registry and proxy layer that unifies MCP, A2A, and REST/gRPC endpoints with centralized governance and observability.
Archestra	GitHub	enterprise, guardrails, governance	Enterprise AI platform with guardrails, MCP registry, and orchestration services.
Tracecat	GitHub	security, automation, policy	AI automation platform for security teams with policy and workflow controls.
AgentGateway	GitHub	gateway, mcp, proxy	Agentic proxy gateway for AI agents and MCP server ecosystems.
ClawManager	GitHub	control-plane, governance, runtimes	Kubernetes-native control plane for governing agent runtimes, AI gateway access, and reusable skills across multiple agent backends.
Agent Vault	GitHub	credentials, egress-policy, proxy	Credential proxy and vault that brokers agent API access without exposing real secrets, with egress filtering and request logging.
Haft	GitHub	governance, decisions, mcp	Decision-governance harness that records falsifiable contracts, evidence, and commissions before agents execute.
Sponsio	GitHub	contracts, runtime-safety, guardrails	Runtime enforcement layer that checks every agent action against deterministic contracts before execution.
DashClaw	GitHub	approvals, policy, audit	Governance layer that intercepts risky agent actions, enforces policy, routes approvals, and records audit-ready decision trails.
Tandem	GitHub	runtime-authority, approvals, audit	Governed runtime authority layer for agents with scoped execution, tool visibility, permissioned memory, approval gates, and audit trails.

Reference Harness Implementations

Project	Link	Tags	Summary
OpenClaw	GitHub	gateway, channels, sandboxing	Local-first personal assistant harness with a gateway control plane for sessions, channels, tools, events, skills, and sandboxed non-main agents.
Claw Code	GitHub	rust, cli, sessions	Public Rust implementation of the claw CLI agent harness with auth, sessions, parity checks, container workflows, and terminal execution guidance.
Hermes Agent	GitHub	memory, skills, subagents	Self-improving agent runtime with memory, skill creation, subagents, scheduled automations, and pluggable terminal backends.
OpenCode	GitHub	terminal, coding-agent, subagents	Open-source coding agent with built-in plan/build roles, subagents, LSP support, and a client-server runtime.
Claude Code	GitHub	terminal, coding-agent, git-workflows	Official terminal coding agent that understands codebases and executes editing, debugging, and Git workflows through natural language.
Gemini CLI	GitHub	terminal, coding-agent, mcp	Open-source terminal agent with built-in tools, MCP support, checkpointing, and sandboxing controls.
Browser Use	GitHub	browser-agent, automation, benchmarks	Browser-agent framework that exposes websites to LLMs through browser state, tools, cloud browsers, and benchmarked task runs.
Codex CLI	GitHub	terminal, coding-agent, local-execution	Terminal-native coding agent that runs locally and exposes practical agent workflows for software tasks.
LobeHub	GitHub	operator, multi-agent, scheduling	Chief-agent-operator platform for scheduling, running, and reporting on multi-agent workstreams.
OpenHands	GitHub	coding-agent, software-engineering, repo	Open-source AI software engineer focused on repo-level coding task execution.
Paperclip	GitHub	managed-agents, control-plane, governance	Managed-agent control plane with org charts, ticketing, budgets, heartbeats, and audit trails for coordinating agent teams.
learn-claude-code	GitHub	tutorial, harness, claude-code	Hands-on harness tutorial for building Claude Code-like systems from scratch.
Cline	GitHub	coding-agent, mcp, checkpoints	Open-source coding agent spanning IDE, terminal, SDK, and kanban surfaces with shared approvals, MCP, checkpoints, and agent teams.
pi	GitHub	coding-agent, runtime, monorepo	Agent harness monorepo combining a coding-agent CLI, shared runtime, and multi-provider LLM stack.
OpenManus	GitHub	general-agent, autonomy, workflows	Open foundation for broad autonomous agent workflows with coding-heavy use cases.
aider	GitHub	terminal, repo-map, testing	Terminal coding assistant with repo mapping, git-aware edits, and built-in lint/test feedback loops.
CowAgent	GitHub	reference, skills, multi-channel	Reference agent harness implementation with planning, memory, knowledge, skills, tools, MCP integration, schedulers, browser automation, and multi-channel delivery.
nanobot	GitHub	runtime, memory, multi-channel	Ultra-lightweight agent runtime with WebUI, chat channels, tools, memory, MCP, model routing, deployment, and long-running goal support.
CLI-Anything	GitHub	cli, tool-use, automation	CLI agent system that unifies command-line tool usage in agent loops.
Claude Code Plugins: Orchestration and Automation	GitHub	claude-code, plugins, orchestration	Production-ready Claude Code plugin marketplace bundling agents, skills, tools, and multi-agent workflow orchestrators.
Agent TARS	GitHub	computer-use, browser-agent, mcp	Multimodal computer and browser agent stack with CLI/Web UI, hybrid GUI/DOM browser control, MCP tools, event streams, and sandbox support.
oh-my-claudecode	GitHub	claude-code, multi-agent, worktrees	Team-first orchestration layer for Claude Code with staged multi-agent execution, worktree-aware setup, and persistent session artifacts.
Multica	GitHub	managed-agents, coding-agent, runtimes	Managed-agents platform that assigns issues to coding agents, routes execution through runtimes, and compounds reusable skills.
ZeroClaw	GitHub	runtime, approval-gates, sandboxing	Single-binary agent runtime with providers, channels, tools, memory, SOPs, approval gates, sandboxing, ACP, and tool receipts.
oh-my-codex	GitHub	codex, workflow, worktrees	Workflow layer for OpenAI Codex CLI with stronger session startup, standard planning-to-completion flows, durable state, skills, hooks, and worktree launches.
NanoClaw	GitHub	containers, claude-sdk, scheduling	Container-isolated Claude agent harness with channel routing, scheduled jobs, per-group memory, and small-codebase customization.
Vibe Kanban	GitHub	coding-agent, workspaces, review	Kanban control plane for planning, running, reviewing, and merging work from coding agents in isolated workspaces.
Qwen Code	GitHub	terminal, coding-agent, cli	Terminal-native open-source coding agent tuned for practical dev loops.
SuperClaude Framework	GitHub	config, personas, workflow	Configuration framework adding commands, personas, and method templates to coding agents.
cmux	GitHub	macos, workspace, browser	Native macOS terminal and browser workspace for AI coding agents with notifications, split panes, and scriptable control.
Compound Engineering	GitHub	plugins, worktrees, review	Cross-agent engineering plugin that codifies brainstorming, planning, worktree execution, review, and knowledge compounding loops.
Devika	GitHub	assistant, planning, coding	Open-source coding assistant system for planning and implementing development tasks.
SWE-agent	GitHub	swe, issue-fixing, tooling	Research-grade coding agent that resolves GitHub issues with explicit tooling loops.
OpenFang	GitHub	agent-os, guardrails, rust	Rust agent operating system with autonomous capability packages, manifests, guardrails, tools, memory, sandboxing, audit trails, and channel adapters.
Aperant	GitHub	coding-agent, parallel, memory	Autonomous multi-agent coding framework with parallel execution, isolated workspaces, QA loops, and persistent memory.
Eigent	GitHub	desktop, cowork, productivity	Open-source desktop cowork agent for autonomous task execution and productivity.
OpenHarness	GitHub	tool-use, memory, multi-agent	Open agent harness implementation covering tool use, skills, memory, permissions, and multi-agent coordination.
IronClaw	GitHub	security, wasm, routines	Security-first personal agent harness with WASM sandboxing, routines, tool plugins, and persistent memory.
Agent S	GitHub	computer-use, gui-agent, evaluation	Open-source computer-use agent framework with grounding models, reflection, optional local code execution, and OSWorld-style evaluation support.
Superset	GitHub	worktrees, desktop, parallel	Worktree-based desktop orchestrator for running and reviewing parallel CLI coding agents from one workspace.
oh-my-pi	GitHub	terminal, lsp, subagents	Terminal AI coding agent with edit safety, LSP integration, and subagent support.
GitHub Copilot CLI	GitHub	terminal, coding-agent, mcp	Official terminal coding agent built on GitHub's Copilot harness with MCP extensibility, approval controls, and GitHub-native context.
Open SWE	GitHub	async, coding-agent, swe	Asynchronous open-source coding agent focused on software issue workflows.
Paseo	GitHub	coding-agent, daemon, multi-device	Multi-device coding-agent daemon and client stack for orchestrating local agents, parallel runs, and cross-provider workflows.
Agent Orchestrator	GitHub	worktrees, parallel, dashboard	Worktree-based orchestration layer for parallel coding agents with autonomous CI and review feedback handling.
jcode	GitHub	coding-agent, terminal, rust	Rust coding-agent harness built for multi-session workflows, customization, memory, and terminal performance.
Harness	GitHub	claude-code, meta-factory, agent-teams	Claude Code meta-factory that generates domain-specific agent teams, skills, orchestration patterns, and validation steps from a project description.
OSAURUS	GitHub	macos, local-first, memory	Native macOS harness for autonomous coding agents with persistent memory.
1Code	GitHub	coding-agent, orchestration, worktrees	Desktop-first coding-agent orchestrator with worktree isolation, background sandboxes, MCP tooling, and automation triggers.
holaOS	GitHub	long-horizon, desktop, durable-state	Desktop-first long-horizon agent environment with runtime, memory, tools, apps, and durable state.
Webwright	GitHub	browser-agent, code-as-action, playwright	Minimal browser-agent harness that lets coding models solve web tasks by writing rerunnable Playwright scripts in a workspace.
mini-swe-agent	GitHub	minimal, swe, coding-agent	Minimal coding agent implementation with competitive benchmark results.
HiClaw	GitHub	multi-agent, human-in-the-loop, shared-state	Collaborative multi-agent OS with manager-worker coordination, shared state, and human-in-the-loop oversight via Matrix rooms.
gptme	GitHub	terminal, tools, mcp	Terminal-native personal agent with local tools, shell and web access, provider-agnostic models, plugins, skills, MCP, guardrails, and autonomous loops.
TinyAGI	GitHub	team-orchestration, autonomous, workflows	Team-style agent orchestrator for autonomous, one-person-company-style workflows.
Open Claude Cowork	GitHub	desktop, ui, orchestration	Desktop coding cowork assistant that turns agent orchestration into GUI workflows.
Amazon Bedrock AgentCore Samples	GitHub	aws, runtime, operations	Official sample suite for deploying and operating agents with runtime, gateway, memory, observability, evaluation, and policy layers.
Maestro	GitHub	desktop, worktrees, orchestration	Desktop command center for parallel coding agents with worktree isolation, queued tasks, auto-run playbooks, and reusable sessions.
AI-DLC Workflows	GitHub	workflow-rules, quality-gates, steering	Official AWS workflow ruleset that steers coding agents through adaptive phases, quality gates, and IDE-specific context files.
Google Agents CLI	GitHub	google-cloud, lifecycle, skills	Google Cloud CLI and skill bundle that gives coding agents scaffold, evaluation, deployment, publishing, and observability workflows.
Open Cowork	GitHub	desktop, sandbox, mcp	Desktop agent app with VM-backed sandboxing, MCP connectors, GUI control, and built-in skill workflows.
thClaws	GitHub	rust, workspace, skills	Native Rust agent workspace with shared agent loop, sessions, tools, skills, MCP, memory, hooks, and sandboxing.
mini-coding-agent	GitHub	coding-agent, minimal, approvals	Minimal coding agent harness illustrating approvals, memory, bounded delegation, and durable transcripts.
codex-autorunner	GitHub	meta-harness, tickets, long-running	Meta-harness that treats tickets as the control plane for long-running coding agents, with queue execution, hub UI, and chat notifications.
CheetahClaws	GitHub	coding-agent, python, mcp	Python agent harness infrastructure for long-horizon, multi-model, tool-using coding assistants with MCP, skills, memory, approvals, checkpoints, and bridges.
MateClaw	GitHub	self-hosted, approvals, channels	Self-hosted multi-user agent harness with StateGraph reasoning, skills, MCP/ACP registry, approvals, audit trail, and channel adapters.
OpenClaw.NET	GitHub	dotnet, gateway, governance	NativeAOT-friendly .NET agent runtime and gateway with tools, memory, MCP, governance ledger, evidence bundles, and harness regression tests.
Utah	GitHub	durable-execution, event-driven, multi-channel	Inngest-powered durable agent harness with a think-act-observe loop, step-level retries, singleton concurrency, cancellation, and multi-channel adapters.

Essential Readings & Ecosystem Maps

Project	Link	Stars	Tags	Summary
awesome-claude-code	GitHub		awesome-list, claude-code, skills	Community collection of Claude Code skills, hooks, and orchestrator tooling.
Awesome Agent Skills	GitHub		awesome-list, skills, cross-harness	Curated cross-harness map of official and community Agent Skills for Claude Code, Codex, Gemini CLI, Cursor, OpenCode, Copilot, and related hosts.
awesome-agentic-patterns	GitHub		awesome-list, patterns, design	Catalog of reusable agentic design patterns and implementation motifs.
awesome-mcp-servers	GitHub		awesome-list, mcp, tools	Curated MCP server index for tool interoperability in agent systems.
awesome-harness-engineering	GitHub		awesome-list, curation, harness	Curated list focused on harness engineering articles, benchmarks, and implementations.
12 Factor Agents	Reference	-	reading, operations, principles	Operations-oriented principles for building maintainable production agents.
Agent Frameworks, Runtimes, and Harnesses, oh my!	Reference	-	reading, langchain, architecture	Clear decomposition of framework vs runtime vs harness responsibilities.
An open-source spec for Codex orchestration: Symphony.	Reference	-	reading, openai, orchestration	OpenAI's orchestration write-up on turning issue trackers into always-on control planes for coding agents.
Building agents with the Claude Agent SDK	Reference	-	reading, claude, sdk	Claude blog on production-oriented SDK usage for sessions, tools, and orchestration.
Building Effective AI Agents	Reference	-	reading, anthropic, agents	Anthropic's practical guidance on when to use workflows vs. autonomous agents and how to structure them.
Claude Code auto mode	Reference	-	reading, anthropic, permissions	Anthropic's write-up on classifier-backed approval delegation for safer high-autonomy coding-agent runs.
Code execution with MCP	Reference	-	reading, anthropic, mcp	Anthropic's design notes on controlled code execution via MCP boundaries.
Demystifying Evals for AI Agents	Reference	-	reading, evals, anthropic	Methodology for designing robust agent evals in non-deterministic trajectories.
Effective context engineering for AI agents	Reference	-	reading, context, anthropic	Guidance on context-window budgeting and working-state management for agents.
Effective harnesses for long-running agents	Reference	-	reading, long-running, anthropic	Practical guide to maintaining state, resumability, and reliability over long agent runs.
Evaluating Deep Agents: Our Learnings	Reference	-	reading, langchain, evaluation	LangChain's practical lessons on evaluating stateful and long-horizon agents.
Harness design for long-running application development	Reference	-	reading, app-dev, anthropic	Follow-up article on improving long-running app generation through harness structure.
Harness Engineering (Martin Fowler)	Reference	-	reading, architecture, fowler	Architectural perspective on harness engineering and entropy control.
Harness engineering (OpenAI)	Reference	-	reading, methodology, openai	Field report on building reliable agent-first software via harness constraints and verification.
How we built our multi-agent research system	Reference	-	reading, anthropic, multi-agent	Anthropic architecture write-up on role separation and coordination in multi-agent systems.
Improving Deep Agents with harness engineering	Reference	-	reading, langchain, harness	Evidence that harness improvements alone can move benchmark performance.
Making Claude Code more secure and autonomous with sandboxing	Reference	-	reading, anthropic, sandboxing	How Anthropic uses sandbox boundaries to raise agent autonomy without giving up security controls.
Quantifying infrastructure noise in agentic coding evals	Reference	-	reading, anthropic, evaluation	Analysis of how infrastructure choices impact coding-agent benchmark outcomes.
Scaling Managed Agents: Decoupling the brain from the hands	Reference	-	reading, anthropic, architecture	Anthropic's meta-harness architecture for decoupling session logs, harness loops, and sandboxes in long-horizon agents.
Skill Issue: Harness Engineering for Coding Agents	Reference	-	reading, humanlayer, coding-agents	Practical breakdown of why coding-agent quality depends heavily on harness setup.
Testing Agent Skills Systematically with Evals	Reference	-	reading, openai, evals	OpenAI Developers guide for turning agent traces into repeatable skill evaluations.
The Anatomy of an Agent Harness	Reference	-	reading, architecture, langchain	Conceptual decomposition of agent harness components and their responsibilities.
The next evolution of the Agents SDK	Reference	-	reading, openai, sdk	OpenAI's product and engineering post on model-native agent harnesses, native sandbox execution, manifests, memory, and filesystem and shell tools.
Unrolling the Codex agent loop	Reference	-	reading, openai, architecture	OpenAI engineering deep dive into the Codex harness loop, prompt growth, tool-call replay, and stateless execution tradeoffs.
What We Learned Building Cloud Agents	Reference	-	reading, cognition, cloud-agents	Cognition's field report on secure cloud-agent infrastructure, VM isolation, full-state snapshots, orchestration, governance, integrations, and enterprise adoption.
Writing effective tools for AI agents	Reference	-	reading, anthropic, tools	Best practices for tool interface design so agents call tools safely and reliably.
Your Agent Needs a Harness, Not a Framework	Reference	-	reading, inngest, reliability	Argument for reliability-first infrastructure around agents instead of framework-only thinking.

Maintenance Notes

Source of truth: data/projects.yaml
Regenerate README files: python3 scripts/render_readme.py
Verify catalog and links: python3 scripts/verify_catalog.py
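
The render-and-verify flow above can be pictured with a small sketch. This is an illustration only: the field names (name, link, tags, summary), the helper functions, and the tab-separated row layout are assumptions made for the example, not the actual schema of data/projects.yaml or the logic of scripts/render_readme.py and scripts/verify_catalog.py. Entries are held in memory here rather than loaded from YAML so the sketch needs only the standard library.

```python
# Hypothetical sketch: field names and row layout are assumptions, not the
# repository's real schema or the behavior of its scripts.
from typing import Dict, List

REQUIRED_FIELDS = ("name", "link", "tags", "summary")


def verify_entry(entry: Dict[str, object]) -> List[str]:
    """Return the problems found in one catalog entry (empty list if valid)."""
    problems = []
    for field in REQUIRED_FIELDS:
        # A field counts as missing when it is absent or empty.
        if not entry.get(field):
            problems.append(f"missing field: {field}")
    return problems


def render_row(entry: Dict[str, object]) -> str:
    """Render one entry as a tab-separated row: name, link label, tags, summary."""
    tags = ", ".join(entry["tags"])
    return "\t".join([entry["name"], entry["link"], tags, entry["summary"]])


# Stand-in for entries that would normally come from the source-of-truth file.
entries = [
    {
        "name": "aider",
        "link": "GitHub",
        "tags": ["terminal", "repo-map", "testing"],
        "summary": "Terminal coding assistant with repo mapping.",
    },
    {"name": "broken-entry", "link": "GitHub", "tags": [], "summary": ""},
]

# Verify first, then render only the entries that passed.
valid = [e for e in entries if not verify_entry(e)]
rows = [render_row(e) for e in valid]

for row in rows:
    print(row)
```

Keeping verification separate from rendering means a malformed entry is reported and skipped instead of producing a broken table row.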

Citation

@misc{li2026agentharness,
  title={Agent Harness Engineering: A Survey},
  author={Li, Junjie and Xiao, Xi and Zhang, Yunbei and Liu, Chen and Zhao, Lin and Liao, Xiaoying and Ji, Yingrui and Wang, Janet and Gu, Jianyang and Ge, Yingqiang and Xu, Weijie and Fang, Xi and Xu, Xiang and Zhao, Tianchen and Kim, Youngeun and Wang, Tianyang and Hamm, Jihun and Krishnaswamy, Smita and Huan, Jun and Reddy, Chandan},
  url={https://openreview.net/pdf?id=eONq7FdiHa},
  year={2026}
}
