Picrew/awesome-agent-harness


Awesome Agent Harness

A curated, implementation-first list of agent harness engineering resources, with GitHub projects as the primary focus.

  • Total entries: 270
  • GitHub entries: 243 (90.0%)
  • GitHub in project categories (excluding readings): 238/238 (100.0%)
  • Categories: 9
  • Last verified: 2026-06-09
  • Language: English | 中文

Category Overview

Category Entries
Harness Architecture & Orchestration 44
Context & Working-State Engineering 16
Execution Substrates & Sandboxing 25
Protocols, Tool Interfaces & Agent Contracts 24
Evaluation Harnesses & Benchmarks 27
Observability & Reliability Operations 15
Guardrails, Security & Governance 19
Reference Harness Implementations 68
Essential Readings & Ecosystem Maps 32
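The header figures can be cross-checked against the per-category counts in this table. The sketch below is only an illustrative check, not part of the repository's tooling: the counts and the GitHub entry total are copied from this README, while the variable names are assumptions made for the example.

```python
# Per-category entry counts, copied from the Category Overview table.
category_counts = {
    "Harness Architecture & Orchestration": 44,
    "Context & Working-State Engineering": 16,
    "Execution Substrates & Sandboxing": 25,
    "Protocols, Tool Interfaces & Agent Contracts": 24,
    "Evaluation Harnesses & Benchmarks": 27,
    "Observability & Reliability Operations": 15,
    "Guardrails, Security & Governance": 19,
    "Reference Harness Implementations": 68,
    "Essential Readings & Ecosystem Maps": 32,
}

READINGS = "Essential Readings & Ecosystem Maps"
github_entries = 243  # "GitHub entries" figure stated in the README header

# Total across all categories, and the total once readings are excluded.
total_entries = sum(category_counts.values())
project_entries = total_entries - category_counts[READINGS]

# Share of all entries that are GitHub entries, as a percentage.
github_share = round(100 * github_entries / total_entries, 1)

print(f"Total entries: {total_entries}")
print(f"Project-category entries: {project_entries}")
print(f"GitHub share: {github_share}%")
```

The totals agree with the header: 270 entries overall, 238 in project categories, and a 90.0% GitHub share.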

Catalog

Notes:

  • Stars are rendered as badges from snapshot values.
  • Repository update dates are tracked in data/projects.yaml and validation reports.
  • Entries are sorted by stars (descending) within each category.
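The ordering rule in the notes above (stars descending within each category) can be sketched as follows. This is a minimal illustration under stated assumptions: the record fields (`name`, `category`, `stars`) and all values are hypothetical placeholders, and the actual schema of data/projects.yaml is not shown in this README.

```python
from collections import defaultdict

# Hypothetical records: names, categories, and star counts are placeholders,
# not values taken from the catalog or from data/projects.yaml.
entries = [
    {"name": "proj-a", "category": "Evaluation", "stars": 120},
    {"name": "proj-b", "category": "Sandboxing", "stars": 900},
    {"name": "proj-c", "category": "Evaluation", "stars": 4500},
    {"name": "proj-d", "category": "Sandboxing", "stars": 30},
]


def sort_catalog(records):
    """Group records by category, then sort each group by stars, highest first."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["category"]].append(record)
    return {
        category: sorted(group, key=lambda r: r["stars"], reverse=True)
        for category, group in grouped.items()
    }


catalog = sort_catalog(entries)
for category, group in catalog.items():
    print(category, [r["name"] for r in group])
```

Because the star counts are snapshot values, an ordering produced this way reflects the snapshot rather than live counts.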

Harness Architecture & Orchestration

Project Link Stars Tags Summary
Superpowers GitHub star skills, workflow, cross-agent Cross-agent software development methodology built from composable skills, mandatory workflows, worktrees, planning, TDD, review, and subagent execution.
ECC GitHub star cross-harness, hooks, skills Cross-harness operator system combining skills, hooks, memory optimization, security scanning, and validation workflows for agentic work.
gstack GitHub star skills, qa, release Claude Code and cross-agent skill stack that turns product planning, architecture review, QA, security, release, and retrospectives into repeatable agent workflows.
DeerFlow GitHub star long-horizon, memory, subagents Long-horizon super-agent harness integrating memory, tools, subagents, and sandboxes.
oh-my-openagent GitHub star multi-harness, team-mode, skills Multi-harness agent OS for OpenCode, Codex, Claude Code, and other coding agents with team-mode orchestration, background agents, MCPs, and skills.
AutoGen GitHub star multi-agent, orchestration, framework Programming framework for agentic AI with multi-agent interaction and orchestration.
Ruflo GitHub star multi-agent, swarm, mcp Multi-agent orchestration platform for Claude Code with swarms, persistent memory, federation, plugins, and MCP hooks.
CrewAI GitHub star multi-agent, workflows, control-plane Multi-agent automation framework with production Flows, autonomous Crews, event-driven control, tracing, guardrails, memory, and human review hooks.
Addy's Agent Skills GitHub star skills, quality-gates, coding-agents Production-grade engineering skills for coding agents that package lifecycle workflows, quality gates, reviews, testing, debugging, security, and release practices.
Agno GitHub star scale, runtime, management Agent software runtime focused on running and managing agentic systems at scale.
LangGraph GitHub star graph, workflow, runtime Graph-based runtime for resilient stateful agents and deterministic workflow control.
Semantic Kernel GitHub star enterprise, orchestration, plugins Enterprise-grade agentic application framework with orchestration and plugin patterns.
OpenAI Agents SDK (Python) GitHub star sdk, handoff, workflows Lightweight framework for multi-agent workflows, handoffs, and production patterns.
Symphony GitHub star orchestration, control-plane, workflows Ticket-driven orchestration layer that turns project work into isolated autonomous implementation runs.
deepagents GitHub star runtime, orchestration, long-running Open-source harness for long-running, tool-using agents with planning and subagent patterns.
Archon GitHub star workflow-engine, worktrees, validation Workflow engine for AI coding agents with YAML-defined phases, isolated worktrees, and validation gates.
Google ADK (Python) GitHub star toolkit, deployment, evaluation Code-first toolkit to build, evaluate, and deploy advanced AI agents.
elizaOS GitHub star agent-os, plugins, benchmarks Extensible agent runtime and operating system with CLI scaffolding, agent loop, plugins, memory/state primitives, dashboards, connectors, and benchmark suites.
PydanticAI GitHub star python, typing, schema Type-safe Python framework for agents with strong schema contracts and tooling.
Gas Town GitHub star multi-agent, workspaces, coordination Multi-agent workspace manager for coordinating coding agents with persistent work tracking, git-backed hooks, handoffs, supervision, and merge queues.
Microsoft Agent Framework GitHub star multi-agent, workflows, observability Multi-language framework for building, orchestrating, and deploying AI agents with graph workflows and observability.
Hive GitHub star harness, orchestration, runtime Outcome-driven agent runtime harness with explicit control loops and orchestration blocks.
VoltAgent GitHub star typescript, platform, runtime TypeScript agent engineering platform built around open runtime abstractions.
mcp-agent GitHub star mcp, runtime, workflow Practical agent framework centered on MCP tool ecosystems and workflow composition.
PraisonAI GitHub star multi-agent, workflow, memory Multi-agent workforce framework with autonomous planning, execution, memory, RAG, dashboards, and multi-provider model support.
Agent Squad GitHub star routing, multi-agent, context Multi-agent orchestration framework that routes requests, preserves conversation context, supports Python/TypeScript, and coordinates specialist agents.
Yao GitHub star single-binary, runtime, autonomous Single-binary runtime for defining and running autonomous agents.
Open Multi-Agent GitHub star multi-agent, dag, tracing TypeScript-native multi-agent orchestrator that turns goals into task DAGs with parallel execution, MCP integration, and live tracing.
Strands Agents GitHub star sdk, mcp, tools Model-driven agent SDK and monorepo with Python/TypeScript agent loops, provider adapters, tools, MCP integration, multi-agent systems, and streaming.
Cloudflare Agents GitHub star platform, deployment, runtime Platform runtime for building and deploying agents with production infrastructure primitives.
Flue GitHub star typescript, headless, sandbox TypeScript harness framework for building headless agents with sessions, tools, skills, and pluggable sandboxes.
Embabel Agent Framework GitHub star jvm, planning, typed-flows JVM agent framework for typed agentic flows with goals, actions, conditions, dynamic planning, platform modes, and testability.
OpenAI Agents SDK (JS/TS) GitHub star typescript, workflows, sandbox-agents JavaScript/TypeScript framework for multi-agent workflows with handoffs, tools, guardrails, sessions, tracing, and sandbox agents.
Docker Agent GitHub star docker, runtime, container Agent builder and runtime stack emphasizing container-native execution.
NeMo Agent Toolkit GitHub star multi-agent, optimization, toolkit Open toolkit for connecting and optimizing teams of AI agents.
Apache Burr GitHub star state-machine, persistence, tracing State-machine framework for decision-making agents and LLM apps with persistence, telemetry UI, tracing, and framework-agnostic execution.
Scion GitHub star multi-agent, containers, orchestration Experimental multi-agent orchestration testbed that runs isolated agent harnesses in containers, worktrees, and remote runtimes.
deepagentsjs GitHub star typescript, langgraph, subagents TypeScript agent harness with built-in planning, filesystem tools, subagents, and LangGraph-native runtime hooks.
oh-my-agent GitHub star multi-agent, skills, cross-runtime Portable multi-agent harness that projects shared agents, skills, workflows, and rules into multiple coding-agent runtimes.
Chorus GitHub star ai-dlc, permissions, task-state AI-human collaboration harness for session lifecycle, task state, sub-agent orchestration, observability, and recovery.
Pydantic AI Harness GitHub star capabilities, hooks, pydantic Official Pydantic AI capability library for composing tools, lifecycle hooks, instructions, and model settings into reusable agent harnesses.
Water GitHub star python, framework, approval-gates Python agent harness framework for orchestration, resilience, observability, guardrails, approval gates, sandboxing, and deployment.
OmniCoreAgent GitHub star python, mcp, serving Python production harness with model loop, tools, MCP, memory, workspace files, guardrails, events, subagents, background tasks, and REST/SSE serving.
hankweave GitHub star long-horizon, runtime, checkpoints Headless-first long-horizon runtime that orchestrates existing agent harnesses with sentinels, loops, checkpoints, and event journals.

Context & Working-State Engineering

Project Link Stars Tags Summary
claude-mem GitHub star memory, context, session Plugin-style memory layer that captures session history and reinjects relevant context into future coding runs.
Beads GitHub star memory, issue-tracking, work-state Agent-optimized distributed issue tracker that stores long-horizon coding work as dependency-aware graph state with memory recall and multi-branch sync.
planning-with-files GitHub star planning, skills, persistence Skill package for persistent file-based planning in coding-agent workflows.
agentmemory GitHub star memory, mcp, hooks Persistent memory server for coding agents using hooks, MCP/REST integration, hybrid search, and shared session recall.
Context Mode GitHub star context, mcp, session MCP context optimization server that sandboxes tool output, indexes session events, and restores continuity across agent compactions.
Agent Skills for Context Engineering GitHub star skills, context, production Large skill library oriented around context engineering and production agents.
Trellis GitHub star specs, memory, workflow Multi-platform coding-agent workflow framework with task context, project memory, and spec injection.
Context-Engineering Handbook GitHub star context-engineering, handbook, practices First-principles handbook focused on practical context engineering for agent systems.
CCPM GitHub star planning, github-issues, parallel-execution Spec-driven project-manager skill that turns PRDs and GitHub issues into persistent context and parallel agent execution.
TencentDB Agent Memory GitHub star memory, context-offloading, openclaw Local agent memory plugin combining symbolic short-term state, layered long-term memory, traceability, and OpenClaw/Hermes integrations.
Acontext GitHub star skills, memory, progressive-disclosure Skill-memory layer that distills agent runs into inspectable skill files and recalls them through agent-controlled tools.
Awesome Context Engineering GitHub star awesome-list, context, survey Survey-style list for context engineering resources and frameworks.
agentic-stack GitHub star cross-harness, memory, skills Portable memory, skills, protocols, and dashboard layer that keeps state across multiple coding-agent harnesses.
context-space GitHub star context, infrastructure, mcp Infrastructure project focused on context engineering building blocks and MCP-centric integrations.
Memorix GitHub star memory, mcp, cross-agent Local-first cross-agent memory control plane with MCP support, workspace sync, sessions, and orchestration state.
sd0x-dev-flow GitHub star hooks, state-machine, claude-code Claude Code harness layer with hook-enforced dual review, durable state-machine gates, context-compaction recovery, and fail-closed safety.

Execution Substrates & Sandboxing

Project Link Stars Tags Summary
Daytona GitHub star sandbox, execution, infra Secure and elastic sandbox infrastructure for running AI-generated code with file, Git, LSP, and execution APIs.
CUA GitHub star computer-use, sandbox, infra Infrastructure stack for computer-use agents with sandbox, SDK, and benchmark support.
Browser Harness GitHub star browser, cdp, self-healing Thin editable CDP harness that connects LLMs directly to real browsers and lets agents extend helpers in flight.
E2B GitHub star cloud-sandbox, execution, enterprise Secure cloud environments with real tools for production-grade agent execution.
OpenSandbox GitHub star sandbox, security, runtime Secure and extensible sandbox runtime built for agent workloads.
OpenShell GitHub star sandbox, policy, runtime Safe private runtime for autonomous agents with sandbox lifecycle control and declarative filesystem, network, process, and inference policies.
Microsandbox GitHub star sandbox, vm, mcp Rootless local VM sandbox runtime with SDKs, detached long-running sessions, agent skills, and MCP server integration.
CubeSandbox GitHub star microvm, sandbox, e2b-compatible MicroVM-based sandbox service for AI agents with sub-60ms startup, E2B-compatible APIs, and hardware-level isolation.
Sandcastle GitHub star sandbox, typescript, branch-strategy TypeScript library for orchestrating coding agents inside isolated sandboxes with configurable branch strategies.
agent-infra sandbox GitHub star all-in-one, browser, shell All-in-one sandbox combining browser, shell, files, MCP, and IDE server.
Judge0 GitHub star code-execution, sandbox, backend Scalable sandboxed code execution system usable as an agent execution backend.
Agent Sandbox GitHub star kubernetes, sandbox, stateful Kubernetes-native sandbox control plane for isolated, stateful agent runtimes with stable identity, persistence, and warm-pool support.
stakpak/agent GitHub star always-on, autonomous, ops Always-on open agent that runs on your machines with autonomous operational loops.
Sandbox Agent GitHub star sandbox, coding-agents, session-schema HTTP/SSE control server for running coding agents inside sandboxes with normalized sessions, permissions, event streaming, and replay.
E2B Desktop Sandbox GitHub star desktop, sandbox, computer-use Secure virtual desktop sandbox for computer-use agents with SDK control and screen streaming.
OSS-Fuzz Gen GitHub star fuzzing, security, execution LLM-powered fuzzing workflows integrated with controlled execution contexts.
AgentBay SDK GitHub star cloud-sandbox, computer-use, sdk Cloud sandbox SDK for agents spanning browser, desktop, mobile, and code execution environments.
Tensorlake GitHub star microvm, sandbox, orchestration Serverless runtime for agent sandboxes with MicroVM isolation, snapshots, suspend-resume, and background orchestration.
AgentScope Runtime GitHub star runtime, sandbox, deployment Production runtime for agent apps with secure tool sandboxes, deployment APIs, observability, and state services.
SWE-ReX GitHub star sandbox, execution, coding-agent Sandboxed execution infrastructure for AI coding agents at local and cloud scale.
sandboxed.sh GitHub star self-hosted, isolation, orchestrator Self-hosted orchestrator running coding agents inside isolated Linux workspaces.
Capsule GitHub star wasm, sandbox, task-runtime Durable runtime that coordinates agent tasks inside isolated WebAssembly sandboxes with retries and lifecycle tracking.
agentbox GitHub star sandbox, coding-agents, network-policy Locked-down local sandbox for AI coding agents with scoped filesystem access, egress policy, secret injection, firewalling, and persistent agent state.
HexAgent GitHub star computer-layer, sandbox, runtime Agent harness that separates the runtime from the computer it operates on through local, VM, and cloud sandbox backends.
terminal-bench-env GitHub star terminal, benchmark-env, sandbox Environment layer for terminal-agent benchmark execution.

Protocols, Tool Interfaces & Agent Contracts

Project Link Stars Tags Summary
Anthropic Agent Skills GitHub star skills, spec, claude Official Agent Skills repository containing the skills specification, templates, and reference skill implementations for Claude.
GitHub Spec Kit GitHub star spec-driven, workflows, tooling Toolkit for spec-driven development to guide deterministic agent execution.
MCP Servers GitHub star mcp, servers, implementations Official collection of MCP server implementations across tools and domains.
Chrome DevTools MCP GitHub star mcp, browser, devtools Official MCP server that gives coding agents Chrome DevTools access for reliable browser automation, debugging, and performance analysis.
Playwright MCP GitHub star mcp, browser, playwright Official Playwright MCP server giving agents structured accessibility snapshots and deterministic browser automation tools.
Claude Code Plugins Directory GitHub star plugins, claude-code, marketplace Anthropic-managed Claude Code plugin marketplace defining plugin manifests, MCP configuration, commands, agents, skills, and submission quality gates.
FastMCP GitHub star mcp, python, framework Python framework for building MCP servers and clients with generated schemas, validation, documentation, production deployment patterns, and governance hooks.
Serena GitHub star mcp, coding-agents, semantic-tools MCP toolkit that gives coding agents IDE-like semantic retrieval, editing, refactoring, debugging, and memory tools.
MCP Python SDK GitHub star mcp, python, sdk Official Python implementation of MCP for building clients and servers that expose tools, resources, prompts, protocol lifecycle events, and standard transports.
AGENTS.md GitHub star spec, agent-file, instructions Open format for repository-local instructions that coding agents can follow.
Agent Skills Specification GitHub star skills, spec, progressive-disclosure Open specification and documentation for packaging reusable agent capabilities, workflows, scripts, references, and assets behind progressive disclosure.
MCP TypeScript SDK GitHub star mcp, typescript, sdk Official TypeScript MCP SDK with server and client packages, transports, auth helpers, middleware adapters, and runnable examples.
Model Context Protocol GitHub star mcp, protocol, interoperability Core specification and docs for MCP-based tool and context interoperability.
directories (rules and MCP indexes) GitHub star directories, mcp, rules Curated directories of agent rules and MCP servers for tool discovery.
Atmosphere GitHub star jvm, multi-protocol, governance JVM runtime for streaming governable AI agents across MCP, A2A, AG-UI, and browser-facing transports.
LangChain MCP Adapters GitHub star mcp, adapters, integration Adapters connecting LangChain components with MCP servers.
SkillHub GitHub star skills, registry, governance Self-hosted enterprise agent skill registry with package publishing, versioning, discovery, namespaces, RBAC, reviews, and audit logs.
Agent Client Protocol GitHub star acp, protocol, coding-agents Open protocol that standardizes communication between code editors and coding agents.
Microsoft MCP Servers GitHub star mcp, enterprise, servers Microsoft's official MCP server catalog for enterprise data and tools.
ACPX GitHub star acp, client, sessions Headless CLI client for stateful Agent Client Protocol sessions.
GitAgentProtocol GitHub star standard, git-native, workflows Git-native, framework-agnostic standard for defining agents, skills, workflows, tools, and runtime memory in repositories.
Microsoft Learn MCP GitHub star mcp, docs, grounding MCP server and CLI for grounding agents with Microsoft documentation sources.
IBM MCP GitHub star mcp, clients, tooling IBM collection of MCP servers, clients, and developer tooling.
AGENT.md GitHub star standard, agent-file, interoperability Standardized machine-readable file format for agentic coding tools.

Evaluation Harnesses & Benchmarks

Project Link Stars Tags Summary
Promptfoo GitHub star eval, red-team, ci Config-driven prompt/agent/RAG testing, comparison, and red-team evaluation tool.
DeepEval GitHub star evaluation, framework, testing LLM evaluation framework supporting agent and workflow quality testing.
RAGAS GitHub star rag, metrics, evaluation Open evaluation toolkit for LLM and RAG quality metrics.
lm-evaluation-harness GitHub star benchmark, harness, llm Popular benchmark harness for consistent LLM evaluation across tasks.
SWE-bench GitHub star benchmark, swe, evaluation Standard benchmark for evaluating issue-fixing software engineering agents.
verifiers GitHub star verifier, rl, evaluation Library for RL environments and verifier-based evaluation loops.
AgentBench GitHub star benchmark, cross-domain, agent Cross-environment benchmark for evaluating LLM agents as tool-using systems.
LangWatch GitHub star simulation, evaluation, testing End-to-end platform for agent simulations, evaluation loops, and production testing.
EvalScope GitHub star benchmark, framework, llm Customizable framework for large-model benchmarking and performance evaluation.
Harbor GitHub star evaluation, harness, rl-env Framework for running agent evaluations and constructing RL-style environments.
Terminal-Bench GitHub star terminal, benchmark, long-horizon Terminal-native benchmark suite for long-horizon, verification-heavy agent tasks.
WebArena GitHub star web-agent, benchmark, environment Self-hostable web environment and evaluation harness for autonomous web agents with reproducible end-to-end tasks.
tau2-bench GitHub star tool-use, interaction, benchmark Tool-agent-user interaction benchmark emphasizing multi-step execution quality.
Meta-Harness GitHub star harness-search, optimization, terminal-bench Framework for automated search over task-specific model harnesses, with reference experiments for memory systems and terminal-agent scaffolds.
NeMo Gym GitHub star rl-env, training, evaluation Toolkit for building RL environments suitable for LLM/agent training and eval.
TheAgentCompany GitHub star benchmark, workplace, multi-step Agent benchmark with simulated software-company tasks for evaluating multi-step workplace autonomy.
Claw-Eval GitHub star benchmark, trajectory, safety Evaluation harness and benchmark for autonomous agents with human-verified tasks, trajectory auditing, and completion, safety, and robustness rubrics.
Inspect Evals GitHub star inspect, eval-suite, reproducibility Evaluation suite collection for Inspect AI workflows.
auto-harness GitHub star optimization, regression, evals Benchmark-gated optimization loop that mines failures, edits agent code, and guards against regressions overnight.
SWE-Bench Pro GitHub star swe, benchmark, long-horizon Long-horizon software-engineering benchmark with reproducible Docker-based evaluation for issue-driven coding agents.
WildClawBench GitHub star benchmark, harness-comparison, multimodal In-the-wild benchmark that compares multiple agent harnesses on end-to-end multimodal, coding, safety, and productivity tasks inside a live OpenClaw environment.
ClawBench GitHub star browser-agent, benchmark, recording Browser-agent benchmark with live-site tasks, isolated containers, five-layer recording, and agentic scoring.
Agent Evaluation GitHub star evaluation, testing, ci AWS framework for testing virtual agents with evaluator-driven multi-turn conversations, hooks, and CI-friendly workflows.
WorkArena GitHub star browser, benchmark, enterprise Browser benchmark for practical enterprise-like knowledge work tasks.
OpenHands Benchmarks GitHub star openhands, eval, harness Evaluation harness and benchmark definitions for OpenHands systems.
WebArena-Verified GitHub star web-agent, benchmark, deterministic Verified web-agent benchmark with deterministic evaluators.
HarnessBench GitHub star harness-comparison, browser-agent, benchmark Benchmark for comparing agent harnesses on the same everyday web tasks with fixed models and per-harness containers.

Observability & Reliability Operations

Project Link Stars Tags Summary
Langfuse GitHub star llmops, tracing, metrics Open-source LLM engineering platform for traces, metrics, prompts, and evals.
MLflow GitHub star platform, monitoring, evaluation Broad AI engineering platform with monitoring and evaluation support for agents.
Opik GitHub star monitoring, eval, tracing End-to-end debug/eval/monitoring stack for LLM apps and agent workflows.
RagaAI Catalyst GitHub star agentops, analytics, monitoring Agent observability and monitoring framework with timeline and graph analytics.
TensorZero GitHub star llmops, gateway, optimization Open LLMOps stack unifying gateway, observability, evaluation, and optimization.
Arize Phoenix GitHub star observability, tracing, evaluation Open platform for AI observability, tracing, and evaluation analytics.
OpenLLMetry GitHub star opentelemetry, instrumentation, tracing OpenTelemetry-based instrumentation for GenAI and LLM applications.
Helicone GitHub star monitoring, traffic, production Lightweight platform for monitoring and evaluating LLM traffic in production.
AgentOps SDK GitHub star agentops, monitoring, cost Monitoring and benchmarking SDK for agent workflows with cost and trace tracking.
Latitude GitHub star platform, eval, observability Open-source agent engineering platform with eval and observability capabilities.
Laminar GitHub star observability, tracing, evals Agent-focused observability stack with tracing, evaluation runs, monitoring, and dashboards.
Desloppify GitHub star quality-gates, codebase-health, ci Agent-facing codebase quality harness with scans, scoring, LLM review, prioritized fix loops, persistent state, and CI gates.
claude-code-reverse GitHub star trace, visualization, debugging Tooling to visualize and inspect Claude Code LLM interaction traces.
Future AGI GitHub star observability, evaluation, guardrails Self-hostable platform that closes the loop across agent tracing, evaluation, simulation, guardrails, and gateway operations.
OpenInference GitHub star spec, instrumentation, observability Open instrumentation specification and tooling for AI observability.

Guardrails, Security & Governance

Project Link Stars Tags Summary
LiteLLM GitHub star gateway, proxy, guardrails Unified LLM gateway/proxy with cost tracking, load balancing, and guardrails.
Kong GitHub star gateway, policy, infra API and AI gateway infrastructure useful for policy enforcement in agent systems.
Parlant GitHub star interaction-control, guardrails, customer-agents Interaction-control harness for customer-facing agents focused on consistent, predictable, and governed LLM behavior.
Portkey Gateway GitHub star gateway, guardrails, routing AI gateway with routing and guardrails for multi-model production traffic.
CAI (Cybersecurity AI) GitHub star security, governance, framework Security-focused agent framework for offensive/defensive AI workflows.
OpenAI Realtime Agents GitHub star realtime, orchestration, control Advanced agentic realtime patterns with structured control and interaction loops.
Plano GitHub star proxy, safety, data-plane AI-native proxy and data plane with orchestration, safety, and observability.
OpenAI CS Agents Demo GitHub star demo, handoffs, governance Customer-service multi-agent demo highlighting handoffs and guardrail-like control points.
Agent Governance Toolkit GitHub star governance, policy, sandboxing Runtime governance toolkit that deterministically enforces agent policy, identity, sandboxing, and audit controls before actions execute.
ContextForge GitHub star gateway, governance, observability Registry and proxy layer that unifies MCP, A2A, and REST/gRPC endpoints with centralized governance and observability.
Archestra GitHub star enterprise, guardrails, governance Enterprise AI platform with guardrails, MCP registry, and orchestration services.
Tracecat GitHub star security, automation, policy AI automation platform for security teams with policy and workflow controls.
AgentGateway GitHub star gateway, mcp, proxy Agentic proxy gateway for AI agents and MCP server ecosystems.
ClawManager GitHub star control-plane, governance, runtimes Kubernetes-native control plane for governing agent runtimes, AI gateway access, and reusable skills across multiple agent backends.
Agent Vault GitHub star credentials, egress-policy, proxy Credential proxy and vault that brokers agent API access without exposing real secrets, with egress filtering and request logging.
Haft GitHub star governance, decisions, mcp Decision-governance harness that records falsifiable contracts, evidence, and commissions before agents execute.
Sponsio GitHub star contracts, runtime-safety, guardrails Runtime enforcement layer that checks every agent action against deterministic contracts before execution.
DashClaw GitHub star approvals, policy, audit Governance layer that intercepts risky agent actions, enforces policy, routes approvals, and records audit-ready decision trails.
Tandem GitHub star runtime-authority, approvals, audit Governed runtime authority layer for agents with scoped execution, tool visibility, permissioned memory, approval gates, and audit trails.

Reference Harness Implementations

Project Link Stars Tags Summary
OpenClaw GitHub star gateway, channels, sandboxing Local-first personal assistant harness with a gateway control plane for sessions, channels, tools, events, skills, and sandboxed non-main agents.
Claw Code GitHub star rust, cli, sessions Public Rust implementation of the claw CLI agent harness with auth, sessions, parity checks, container workflows, and terminal execution guidance.
Hermes Agent GitHub star memory, skills, subagents Self-improving agent runtime with memory, skill creation, subagents, scheduled automations, and pluggable terminal backends.
OpenCode GitHub star terminal, coding-agent, subagents Open-source coding agent with built-in plan/build roles, subagents, LSP support, and a client-server runtime.
Claude Code GitHub star terminal, coding-agent, git-workflows Official terminal coding agent that understands codebases and executes editing, debugging, and Git workflows through natural language.
Gemini CLI GitHub star terminal, coding-agent, mcp Open-source terminal agent with built-in tools, MCP support, checkpointing, and sandboxing controls.
Browser Use GitHub star browser-agent, automation, benchmarks Browser-agent framework that exposes websites to LLMs through browser state, tools, cloud browsers, and benchmarked task runs.
Codex CLI GitHub star terminal, coding-agent, local-execution Terminal-native coding agent that runs locally and exposes practical agent workflows for software tasks.
LobeHub GitHub star operator, multi-agent, scheduling Chief-agent-operator platform for scheduling, running, and reporting on multi-agent workstreams.
OpenHands GitHub star coding-agent, software-engineering, repo Open-source AI software engineer focused on repo-level coding task execution.
Paperclip GitHub star managed-agents, control-plane, governance Managed-agent control plane with org charts, ticketing, budgets, heartbeats, and audit trails for coordinating agent teams.
learn-claude-code GitHub star tutorial, harness, claude-code Hands-on harness tutorial for building Claude Code-like systems from scratch.
Cline GitHub star coding-agent, mcp, checkpoints Open-source coding agent spanning IDE, terminal, SDK, and kanban surfaces with shared approvals, MCP, checkpoints, and agent teams.
pi GitHub star coding-agent, runtime, monorepo Agent harness monorepo combining a coding-agent CLI, shared runtime, and multi-provider LLM stack.
OpenManus GitHub star general-agent, autonomy, workflows Open foundation for broad autonomous agent workflows with coding-heavy use cases.
aider GitHub star terminal, repo-map, testing Terminal coding assistant with repo mapping, git-aware edits, and built-in lint/test feedback loops.
CowAgent GitHub star reference, skills, multi-channel Reference agent harness implementation with planning, memory, knowledge, skills, tools, MCP integration, schedulers, browser automation, and multi-channel delivery.
nanobot GitHub star runtime, memory, multi-channel Ultra-lightweight agent runtime with WebUI, chat channels, tools, memory, MCP, model routing, deployment, and long-running goal support.
CLI-Anything GitHub star cli, tool-use, automation CLI agent system that unifies command-line tool usage in agent loops.
Claude Code Plugins: Orchestration and Automation GitHub star claude-code, plugins, orchestration Production-ready Claude Code plugin marketplace bundling agents, skills, tools, and multi-agent workflow orchestrators.
Agent TARS GitHub star computer-use, browser-agent, mcp Multimodal computer and browser agent stack with CLI/Web UI, hybrid GUI/DOM browser control, MCP tools, event streams, and sandbox support.
oh-my-claudecode GitHub star claude-code, multi-agent, worktrees Team-first orchestration layer for Claude Code with staged multi-agent execution, worktree-aware setup, and persistent session artifacts.
Multica GitHub star managed-agents, coding-agent, runtimes Managed-agents platform that assigns issues to coding agents, routes execution through runtimes, and compounds reusable skills.
ZeroClaw GitHub star runtime, approval-gates, sandboxing Single-binary agent runtime with providers, channels, tools, memory, SOPs, approval gates, sandboxing, ACP, and tool receipts.
oh-my-codex GitHub star codex, workflow, worktrees Workflow layer for OpenAI Codex CLI with stronger session startup, standard planning-to-completion flows, durable state, skills, hooks, and worktree launches.
NanoClaw GitHub star containers, claude-sdk, scheduling Container-isolated Claude agent harness with channel routing, scheduled jobs, per-group memory, and small-codebase customization.
Vibe Kanban GitHub star coding-agent, workspaces, review Kanban control plane for planning, running, reviewing, and merging work from coding agents in isolated workspaces.
Qwen Code GitHub star terminal, coding-agent, cli Terminal-native open-source coding agent tuned for practical dev loops.
SuperClaude Framework GitHub star config, personas, workflow Configuration framework adding commands, personas, and method templates to coding agents.
cmux GitHub star macos, workspace, browser Native macOS terminal and browser workspace for AI coding agents with notifications, split panes, and scriptable control.
Compound Engineering GitHub star plugins, worktrees, review Cross-agent engineering plugin that codifies brainstorming, planning, worktree execution, review, and knowledge compounding loops.
Devika GitHub star assistant, planning, coding Open-source coding assistant system for planning and implementing development tasks.
SWE-agent GitHub star swe, issue-fixing, tooling Research-grade coding agent that resolves GitHub issues with explicit tooling loops.
OpenFang GitHub star agent-os, guardrails, rust Rust agent operating system with autonomous capability packages, manifests, guardrails, tools, memory, sandboxing, audit trails, and channel adapters.
Aperant GitHub star coding-agent, parallel, memory Autonomous multi-agent coding framework with parallel execution, isolated workspaces, QA loops, and persistent memory.
Eigent GitHub star desktop, cowork, productivity Open-source desktop cowork agent for autonomous task execution and productivity.
OpenHarness GitHub star tool-use, memory, multi-agent Open agent harness implementation covering tool use, skills, memory, permissions, and multi-agent coordination.
IronClaw GitHub star security, wasm, routines Security-first personal agent harness with WASM sandboxing, routines, tool plugins, and persistent memory.
Agent S GitHub star computer-use, gui-agent, evaluation Open-source computer-use agent framework with grounding models, reflection, optional local code execution, and OSWorld-style evaluation support.
Superset GitHub star worktrees, desktop, parallel Worktree-based desktop orchestrator for running and reviewing parallel CLI coding agents from one workspace.
oh-my-pi GitHub star terminal, lsp, subagents Terminal AI coding agent with edit safety, LSP integration, and subagent support.
GitHub Copilot CLI GitHub star terminal, coding-agent, mcp Official terminal coding agent built on GitHub's Copilot harness with MCP extensibility, approval controls, and GitHub-native context.
Open SWE GitHub star async, coding-agent, swe Asynchronous open-source coding agent focused on software issue workflows.
Paseo GitHub star coding-agent, daemon, multi-device Multi-device coding-agent daemon and client stack for orchestrating local agents, parallel runs, and cross-provider workflows.
Agent Orchestrator GitHub star worktrees, parallel, dashboard Worktree-based orchestration layer for parallel coding agents with autonomous CI and review feedback handling.
jcode GitHub star coding-agent, terminal, rust Rust coding-agent harness built for multi-session workflows, customization, memory, and terminal performance.
Harness GitHub star claude-code, meta-factory, agent-teams Claude Code meta-factory that generates domain-specific agent teams, skills, orchestration patterns, and validation steps from a project description.
OSAURUS GitHub star macos, local-first, memory Native macOS harness for autonomous coding agents with persistent memory.
1Code GitHub star coding-agent, orchestration, worktrees Desktop-first coding-agent orchestrator with worktree isolation, background sandboxes, MCP tooling, and automation triggers.
holaOS GitHub star long-horizon, desktop, durable-state Desktop-first long-horizon agent environment with runtime, memory, tools, apps, and durable state.
Webwright GitHub star browser-agent, code-as-action, playwright Minimal browser-agent harness that lets coding models solve web tasks by writing rerunnable Playwright scripts in a workspace.
mini-swe-agent GitHub star minimal, swe, coding-agent Minimal coding agent implementation with strong benchmark competitiveness.
HiClaw GitHub star multi-agent, human-in-the-loop, shared-state Collaborative multi-agent OS with manager-worker coordination, shared state, and human-in-the-loop oversight via Matrix rooms.
gptme GitHub star terminal, tools, mcp Terminal-native personal agent with local tools, shell and web access, provider-agnostic models, plugins, skills, MCP, guardrails, and autonomous loops.
TinyAGI GitHub star team-orchestration, autonomous, workflows Team-style agent orchestrator for autonomous workflows in one-person companies.
Open Claude Cowork GitHub star desktop, ui, orchestration Desktop coding cowork assistant that turns agent orchestration into GUI workflows.
Amazon Bedrock AgentCore Samples GitHub star aws, runtime, operations Official sample suite for deploying and operating agents with runtime, gateway, memory, observability, evaluation, and policy layers.
Maestro GitHub star desktop, worktrees, orchestration Desktop command center for parallel coding agents with worktree isolation, queued tasks, auto-run playbooks, and reusable sessions.
AI-DLC Workflows GitHub star workflow-rules, quality-gates, steering Official AWS workflow ruleset that steers coding agents through adaptive phases, quality gates, and IDE-specific context files.
Google Agents CLI GitHub star google-cloud, lifecycle, skills Google Cloud CLI and skill bundle that gives coding agents scaffold, evaluation, deployment, publishing, and observability workflows.
Open Cowork GitHub star desktop, sandbox, mcp Desktop agent app with VM-backed sandboxing, MCP connectors, GUI control, and built-in skill workflows.
thClaws GitHub star rust, workspace, skills Native Rust agent workspace with shared agent loop, sessions, tools, skills, MCP, memory, hooks, and sandboxing.
mini-coding-agent GitHub star coding-agent, minimal, approvals Minimal coding agent harness illustrating approvals, memory, bounded delegation, and durable transcripts.
codex-autorunner GitHub star meta-harness, tickets, long-running Meta-harness that treats tickets as the control plane for long-running coding agents, with queue execution, hub UI, and chat notifications.
CheetahClaws GitHub star coding-agent, python, mcp Python agent harness infrastructure for long-horizon, multi-model, tool-using coding assistants with MCP, skills, memory, approvals, checkpoints, and bridges.
MateClaw GitHub star self-hosted, approvals, channels Self-hosted multi-user agent harness with StateGraph reasoning, skills, MCP/ACP registry, approvals, audit trail, and channel adapters.
OpenClaw.NET GitHub star dotnet, gateway, governance NativeAOT-friendly .NET agent runtime and gateway with tools, memory, MCP, governance ledger, evidence bundles, and harness regression tests.
Utah GitHub star durable-execution, event-driven, multi-channel Inngest-powered durable agent harness with a think-act-observe loop, step-level retries, singleton concurrency, cancellation, and multi-channel adapters.

Essential Readings & Ecosystem Maps

Project Link Stars Tags Summary
awesome-claude-code GitHub star awesome-list, claude-code, skills Community collection of Claude Code skills, hooks, and orchestrator tooling.
Awesome Agent Skills GitHub star awesome-list, skills, cross-harness Curated cross-harness map of official and community Agent Skills for Claude Code, Codex, Gemini CLI, Cursor, OpenCode, Copilot, and related hosts.
awesome-agentic-patterns GitHub star awesome-list, patterns, design Catalog of reusable agentic design patterns and implementation motifs.
awesome-mcp-servers GitHub star awesome-list, mcp, tools Curated MCP server index for tool interoperability in agent systems.
awesome-harness-engineering GitHub star awesome-list, curation, harness Curated list focused on harness engineering articles, benchmarks, and implementations.
12 Factor Agents Reference - reading, operations, principles Operations-oriented principles for building maintainable production agents.
Agent Frameworks, Runtimes, and Harnesses, oh my! Reference - reading, langchain, architecture Clear decomposition of framework vs runtime vs harness responsibilities.
An open-source spec for Codex orchestration: Symphony. Reference - reading, openai, orchestration OpenAI's orchestration write-up on turning issue trackers into always-on control planes for coding agents.
Building agents with the Claude Agent SDK Reference - reading, claude, sdk Claude blog on production-oriented SDK usage for sessions, tools, and orchestration.
Building Effective AI Agents Reference - reading, anthropic, agents Anthropic's practical guidance on when to use workflows vs. autonomous agents and how to structure them.
Claude Code auto mode Reference - reading, anthropic, permissions Anthropic's write-up on classifier-backed approval delegation for safer high-autonomy coding-agent runs.
Code execution with MCP Reference - reading, anthropic, mcp Anthropic's design notes on controlled code execution via MCP boundaries.
Demystifying Evals for AI Agents Reference - reading, evals, anthropic Methodology for designing robust agent evals in non-deterministic trajectories.
Effective context engineering for AI agents Reference - reading, context, anthropic Guidance on context-window budgeting and working-state management for agents.
Effective harnesses for long-running agents Reference - reading, long-running, anthropic Practical guide to maintaining state, resumability, and reliability over long agent runs.
Evaluating Deep Agents: Our Learnings Reference - reading, langchain, evaluation LangChain's practical lessons on evaluating stateful and long-horizon agents.
Harness design for long-running application development Reference - reading, app-dev, anthropic Follow-up article on improving long-running app generation through harness structure.
Harness Engineering (Martin Fowler) Reference - reading, architecture, fowler Architectural perspective on harness engineering and entropy control.
Harness engineering (OpenAI) Reference - reading, methodology, openai Field report on building reliable agent-first software via harness constraints and verification.
How we built our multi-agent research system Reference - reading, anthropic, multi-agent Anthropic architecture write-up on role separation and coordination in multi-agent systems.
Improving Deep Agents with harness engineering Reference - reading, langchain, harness Evidence that harness improvements alone can move benchmark performance.
Making Claude Code more secure and autonomous with sandboxing Reference - reading, anthropic, sandboxing How Anthropic uses sandbox boundaries to raise agent autonomy without giving up security controls.
Quantifying infrastructure noise in agentic coding evals Reference - reading, anthropic, evaluation Analysis of how infrastructure choices impact coding-agent benchmark outcomes.
Scaling Managed Agents: Decoupling the brain from the hands Reference - reading, anthropic, architecture Anthropic's meta-harness architecture for decoupling session logs, harness loops, and sandboxes in long-horizon agents.
Skill Issue: Harness Engineering for Coding Agents Reference - reading, humanlayer, coding-agents Practical breakdown of why coding-agent quality depends heavily on harness setup.
Testing Agent Skills Systematically with Evals Reference - reading, openai, evals OpenAI Developers guide for turning agent traces into repeatable skill evaluations.
The Anatomy of an Agent Harness Reference - reading, architecture, langchain Conceptual decomposition of agent harness components and their responsibilities.
The next evolution of the Agents SDK Reference - reading, openai, sdk OpenAI's product and engineering post on model-native agent harnesses, native sandbox execution, manifests, memory, and filesystem and shell tools.
Unrolling the Codex agent loop Reference - reading, openai, architecture OpenAI engineering deep dive into the Codex harness loop, prompt growth, tool-call replay, and stateless execution tradeoffs.
What We Learned Building Cloud Agents Reference - reading, cognition, cloud-agents Cognition's field report on secure cloud-agent infrastructure, VM isolation, full-state snapshots, orchestration, governance, integrations, and enterprise adoption.
Writing effective tools for AI agents Reference - reading, anthropic, tools Best practices for tool interface design so agents call tools safely and reliably.
Your Agent Needs a Harness, Not a Framework Reference - reading, inngest, reliability Argument for reliability-first infrastructure around agents instead of framework-only thinking.

Maintenance Notes

  • Source of truth: data/projects.yaml
  • Regenerate README files: python3 scripts/render_readme.py
  • Verify catalog and links: python3 scripts/verify_catalog.py
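
As a rough illustration of what a render step might do, the sketch below groups entries by category and emits one table row per entry, sorted by stars in descending order. It is not the repository's `scripts/render_readme.py`: the entry fields (`name`, `category`, `stars`, `tags`, `summary`), the function name `render_rows`, and the sample entries are all assumptions made for illustration, and the data is inlined instead of being read from `data/projects.yaml`.

```python
from collections import defaultdict

# Hypothetical entries; the real schema of data/projects.yaml may differ.
ENTRIES = [
    {"name": "alpha", "category": "Reference Harness Implementations",
     "stars": 120, "tags": ["terminal"], "summary": "Example A."},
    {"name": "beta", "category": "Reference Harness Implementations",
     "stars": 950, "tags": ["desktop", "mcp"], "summary": "Example B."},
    {"name": "gamma", "category": "Essential Readings & Ecosystem Maps",
     "stars": 40, "tags": ["awesome-list"], "summary": "Example C."},
]


def render_rows(entries):
    """Group entries by category and return table rows sorted by stars, descending."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry["category"]].append(entry)

    rendered = {}
    for category, items in grouped.items():
        # sorted() is stable, so entries with equal stars keep their input order.
        ordered = sorted(items, key=lambda e: e["stars"], reverse=True)
        rendered[category] = [
            f"| {e['name']} | {e['stars']} | {', '.join(e['tags'])} | {e['summary']} |"
            for e in ordered
        ]
    return rendered


if __name__ == "__main__":
    for category, rows in render_rows(ENTRIES).items():
        print(category)
        for row in rows:
            print(row)
```

Keeping the sort inside the render step, rather than in the data file, means contributors can append entries in any order and the generated tables still come out ordered consistently.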

Citation

@misc{li2026agentharness,
  title={Agent Harness Engineering: A Survey},
  author={Li, Junjie and Xiao, Xi and Zhang, Yunbei and Liu, Chen and Zhao, Lin and Liao, Xiaoying and Ji, Yingrui and Wang, Janet and Gu, Jianyang and Ge, Yingqiang and Xu, Weijie and Fang, Xi and Xu, Xiang and Zhao, Tianchen and Kim, Youngeun and Wang, Tianyang and Hamm, Jihun and Krishnaswamy, Smita and Huan, Jun and Reddy, Chandan},
  url={https://openreview.net/pdf?id=eONq7FdiHa},
  year={2026}
}
