A curated collection of academic papers, industry reports, datasets, and tools for the OpenClaw AI agent ecosystem.
Companion repository for our survey: A Survey of the OpenClaw Ecosystem: From Platform Extensibility to Constraint Design.
OpenClaw, the open-source, self-hosted AI agent platform created by Peter Steinberger (Clawdbot → Moltbot → OpenClaw, January 29, 2026), has generated 74 academic papers, 23 benchmarks, and 18+ major industry reports in under four months. This repository organizes the research landscape using the PSEA (Platform–Security–Societies–Deployment) taxonomy introduced in our survey.
Thesis of the survey. OpenClaw is best understood as a stress test for open personal-agent ecosystems. Its open Skills, persistent Memory, and always-on Heartbeat make capability easy to extend, but the same openness creates governance, security, social, and deployment problems. The literature converges on one recurring tradeoff: extensibility accelerates capability growth, but trustworthy use requires explicit constraints on Skills, Memory, autonomy, domain actions, and evaluation. The repository is organized to make this tradeoff visible at every level.
| Layer | Section | Papers | Sub-topics |
|---|---|---|---|
| P | Platform | 10 | Agent learning → platform improvement; Skill ecosystem governance |
| S | Security | 33 | Threat landscape; attacks; defenses (execution + supply chain) |
| S | Societies | 22 | Statistical sociality & shallow interaction; safety drift |
| D | Deployment | 9 | Robotics; healthcare; scientific research |
| | Total | 74 | |
Separately, the survey catalogs 23 benchmarks as an orthogonal evaluation lens. Many of these are released by papers already counted above (e.g., CIK-Bench, ClawSafety, SkillFortifyBench), so they are tracked in their own Benchmarks section rather than added to the PSEA totals.
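As a quick consistency check, the per-layer counts in the table can be tallied against the stated total. This is only an illustrative sketch: the dictionary and the `psea_total` helper are names made up here, not part of any repository tooling.

```python
# Per-layer paper counts copied from the PSEA table above.
PSEA_COUNTS = {"Platform": 10, "Security": 33, "Societies": 22, "Deployment": 9}

# Benchmarks are tracked separately and are not added to the PSEA total.
BENCHMARK_COUNT = 23


def psea_total(counts):
    """Sum the paper counts over the four PSEA layers."""
    return sum(counts.values())


if __name__ == "__main__":
    print(psea_total(PSEA_COUNTS))
```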
- Platform
- Security
- Societies
- Deployment
- Benchmarks
- Open Problems & Future Directions
- Surveys & Position Papers
- Industry Security Reports
- Open-Source Projects & Tools
- Datasets
- Related Awesome Lists
- Contributing
How OpenClaw is built and how it improves itself. The literature shifts from improving individual agents to governing the Skill ecosystem they depend on. (10 papers)
Continuous improvement runs through Skills and runtimes, not weights alone.
| Title | Date | Key Contribution | Links |
|---|---|---|---|
| OpenClaw-RL: Train Any Agent Simply by Talking | Mar 2026 | Async RL from live interaction signals; combines evaluative and directive rewards | Paper |
| MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild | Mar 2026 | Continual meta-learning; updates both weights and the Skill library from failure trajectories | Paper |
| SemaClaw: General-Purpose Personal AI Agents through Harness Engineering | Apr 2026 | DAG-based orchestration, PermissionBridge safety, three-tier context, agentic-wiki skill | Paper |
| ClawGym: A Scalable Framework for Building Effective Claw Agents | Apr 2026 | Mines raw ClawHub Skills into training tasks and a benchmark; treats the marketplace as a training substrate | Paper |
| OpenCLAW-P2P: Decentralized Framework for Collective AI Intelligence | Apr 2026 | Decentralized agent network with DHT, federated learning, and formal verification | Paper GitHub |
A larger ClawHub is not automatically a better one. Clone inflation, bloat, discoverability, and submission-time risk are all platform-level concerns.
| Title | Date | Key Contribution | Links |
|---|---|---|---|
| SkillClone: Multi-Modal Clone Detection (ASE 2026) | Mar 2026 | 75% of ClawHub Skills are cloned; ecosystem inflated ~3.5×; clones amplify supply-chain risk | Paper |
| SkillReducer: Optimizing LLM Agent Skills for Token Efficiency | Mar 2026 | >60% of Skill body is non-actionable boilerplate; compressing improves downstream performance | Paper |
| Red Skills or Blue Skills? Submission-Time Risk Prediction | Apr 2026 | Simple classifiers can triage ClawHub submissions before publication (11,010-skill study) | Paper |
| How Well Do Agentic Skills Work in the Wild? (Skills-in-the-Wild) | Apr 2026 | Performance drops sharply when agents must locate the right Skill among 34K real candidates | Paper |
| SkillClaw: Let Skills Evolve Collectively with Agentic Evolver | Apr 2026 | Cross-user collective Skill evolution from autonomous trajectory aggregation | Paper |
Key Takeaway. OpenClaw's platform literature reveals the tradeoff between extensibility and governance: openness lets the agent and the Skill ecosystem improve, but turns ClawHub from a feature into a critical dependency. The challenge is not to add more Skills, but to ensure they stay safe, compact, and discoverable.
Open Tools, community Skills, messaging channels, persistent Memory, and Heartbeat expand the attack surface. Research moves from isolated vulnerability reports → execution-path attacks → autonomous/persistent attacks; defenses form a layered stack but leave Memory governance unresolved. (33 papers)
Systemic exposure across components, persistent state, and trajectory-level failures.
| Title | Date | Key Contribution | Links |
|---|---|---|---|
| FASA: Uncovering Security Threats in Autonomous Agents | Mar 2026 | Tri-layered risk taxonomy with full-lifecycle defense architecture | Paper |
| Taming OpenClaw: Security Analysis and Mitigation | Mar 2026 | Five-stage lifecycle threat model; point defenses fail cross-stage attacks | Paper |
| A Systematic Taxonomy of Security Vulnerabilities (OpenClaw Kill Chain) | Mar 2026 | Analysis of 190 advisories across 10 attack surfaces; individually moderate flaws chain into RCE | Paper |
| Don't Let the Claw Grip Your Hand | Mar 2026 | Empirical red-teaming across six LLMs; human-in-the-loop defense layer | Paper |
| From Assistant to Double Agent (PASB) | Feb 2026 | First end-to-end benchmark for personalized agent security | Paper |
| ClawTrap: MITM-Based Red-Teaming Framework | Mar 2026 | First network-layer red-teaming framework for agent systems | Paper |
| A Trajectory-Based Safety Audit of Clawdbot | Feb 2026 | Trajectory-level audit; OpenClaw fails completely on intent misunderstanding | Paper |
| Your Agent, Their Asset (CIK Taxonomy) | Apr 2026 | Capability/Identity/Knowledge taxonomy; ASR 24.6% → 64–74% under single-dimension state poisoning | Paper |
| A Systematic Security Evaluation of OpenClaw and Its Variants (SecEval) | Apr 2026 | 205 tests across OpenClaw/AutoClaw/QClaw/KimiClaw/MaxClaw/ArkClaw | Paper |
| ClawSafety: "Safe" LLMs, Unsafe Agents | Apr 2026 | 120 scenarios × 5 backbones × 2,520 trials; ASR 40–75%; SKILL.md is the highest-trust, highest-risk vector | Paper |
| Forensic Foundations for OpenClaw Agents | Apr 2026 | First agentic-AI forensic study; agent artifact taxonomy; nondeterminism challenges for DFIR | Paper |
| Agents of Chaos | Feb 2026 | Empirical study of failure modes in deployed agent systems | Paper |
From malicious Skills to worms, denial-of-wallet, and memory pollution. The attack surface shifts from malicious commands to ordinary information flows the agent chooses to read, remember, and reuse.
| Title | Date | Key Contribution | Links |
|---|---|---|---|
| Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks | Feb 2026 | 202 injection-task pairs; harmful instructions in trusted Skills are followed at high rates | Paper |
| Clawdrain: Token Exhaustion via Tool-Calling Chains | Mar 2026 | Trojanized Skill triggers massive token amplification → denial-of-wallet | Paper |
| BadSkill: Backdoor Attacks on Agent Skills via Model-in-Skill Poisoning | Apr 2026 | Bundled model artifacts can be backdoored while preserving benign-side behavior | Paper |
| SkillAttack: Automated Red Teaming of Agent Skills | Apr 2026 | Reveals latent vulnerabilities in popular community Skills without modifying them | Paper |
| ClawWorm: Self-Propagating Attacks Across Agent Ecosystems | Mar 2026 | First self-replicating worm for a production agent framework | Paper |
| MissClaw / Mind Your HEARTBEAT: Silent Memory Pollution via Background Execution | Mar 2026 | Zero-click memory pollution: ordinary browsing content becomes persistent context | Paper |
Three boundaries: (a) execution-layer isolation/enforcement around dangerous Tools, (b) supply-chain scanning before Skills enter the marketplace, and (c) a still-missing layer: provenance-aware memory governance.
| Title | Date | Key Contribution | Links |
|---|---|---|---|
| Agent Privilege Separation Against Prompt Injection | Mar 2026 | Structural defense: agent processing untrusted content never holds access to dangerous Tools | Paper |
| SafeClaw-R: Safe and Secure Multi-Agent Personal Assistants | Mar 2026 | System-level invariant enforcement over the execution graph; 97.8% malicious-Skill detection | Paper |
| OpenClaw PRISM: Zero-Fork Runtime Security Layer | Mar 2026 | Defense-in-depth across 10 lifecycle hooks with risk accumulation and decay | Paper |
| Aethelgard: Learned Capability Governance | Apr 2026 | Four-layer adaptive governance with PPO-learned minimum-viable-capability policy | Paper |
| RouteGuard: Internal-Signal Detection of Skill Poisoning | Apr 2026 | Detects Skill poisoning before execution via model-internal signals | Paper |
| Proof-of-Guardrail (PoG) | Mar 2026 | TEE-based cryptographic attestation that guardrails actually execute | Paper |
| OAP: Deterministic Pre-Action Authorization for Autonomous AI Agents | Mar 2026 | Enforces deterministic authorization before each Tool call | Paper |
| VeriGrey: Greybox Agent Validation | Mar 2026 | 33% gain over black-box agent validation on AgentDojo | Paper |
| Governance Architecture for Autonomous Agent Systems (LGA) | Mar 2026 | Threats, framework, and engineering practice for governance layers | Paper |
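The execution-layer idea shared by several entries above (authorize every Tool call deterministically before it runs, and deny by default) can be sketched as follows. This is a minimal illustration under stated assumptions, not the mechanism of OAP or any other listed system: the `ToolCall` type, the `POLICY` table, the tool names, and the `/workspace/` root are all hypothetical.

```python
import posixpath
from dataclasses import dataclass
from typing import Any, Callable, Dict

Rule = Callable[[Dict[str, Any]], bool]


@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: Dict[str, Any]


def _inside_workspace(args: Dict[str, Any]) -> bool:
    # Normalize first so "/workspace/../etc/passwd" cannot slip through.
    path = posixpath.normpath(str(args.get("path", "")))
    return path.startswith("/workspace/")


# Hypothetical allowlist: one deterministic rule per permitted Tool.
POLICY: Dict[str, Rule] = {
    "read_file": _inside_workspace,
    "send_message": lambda args: args.get("channel") == "owner",
}


def authorize(call: ToolCall, policy: Dict[str, Rule] = POLICY) -> bool:
    """Default-deny: a Tool without a rule is never authorized."""
    rule = policy.get(call.tool)
    return rule is not None and bool(rule(call.args))


def execute(call: ToolCall, tools: Dict[str, Callable[..., Any]]) -> Any:
    """Run the Tool only after the pre-action check succeeds."""
    if not authorize(call):
        raise PermissionError(f"denied: {call.tool}")
    return tools[call.tool](**call.args)
```

Because the check is a pure function of the Tool name and arguments, the same call always receives the same decision regardless of what the model says about it.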
| Title | Date | Key Contribution | Links |
|---|---|---|---|
| SkillFortify: Formal Analysis and Supply Chain Security | Mar 2026 | First formal supply-chain framework with Dolev-Yao attacker model for Skills | Paper |
| SkillSieve: Hierarchical Triage for Malicious Agent Skills | Apr 2026 | Three-layer regex → LLM-sub-task → LLM-jury detection on 49,592 ClawHub Skills; 0.800 F1 | Paper |
| SkillProbe: Multi-Agent Security Auditing | Mar 2026 | Multi-agent auditing reveals most popular Skills fail rigorous security checks | Paper |
| Malicious Or Not: Repository Context for Skill Classification | Mar 2026 | 238K Skills across 4 registries; repository context dramatically changes estimated prevalence | Paper |
| HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents? | Apr 2026 | Registry-scale harmfulness measurement of Skill loading | Paper |
| "Elementary, My Dear Watson": Detecting Malicious Skills (MalSkills) | Mar 2026 | Neuro-symbolic reasoning across heterogeneous Skill artifacts | Paper |
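A hierarchical triage pipeline of the kind summarized in the SkillSieve row (cheap pattern matching first, then progressively more expensive model-based checks) might be structured roughly as below. This is a sketch under assumptions, not SkillSieve's implementation: the regex patterns are illustrative, and `subtask_check` and `jury` are hypothetical callables standing in for the LLM stages.

```python
import re
from typing import Callable, Sequence

# Illustrative patterns only; a real scanner would use a far larger rule set.
SUSPICIOUS = [
    re.compile(r"curl\s+[^|]*\|\s*(ba)?sh", re.IGNORECASE),
    re.compile(r"base64\s+(-d|--decode)", re.IGNORECASE),
]

Check = Callable[[str], bool]


def regex_stage(skill_text: str) -> bool:
    """Stage 1: cheap pattern match; True means escalate."""
    return any(p.search(skill_text) for p in SUSPICIOUS)


def triage(skill_text: str, subtask_check: Check, jury: Sequence[Check]) -> str:
    """Escalate a Skill through the stages and return a verdict.

    Assumption in this sketch: only Skills flagged by an earlier stage
    reach the next, more expensive one.
    """
    if not regex_stage(skill_text):
        return "pass"
    if not subtask_check(skill_text):  # Stage 2: single model-based check.
        return "pass"
    votes = [judge(skill_text) for judge in jury]  # Stage 3: jury vote.
    # Strict majority marks the Skill malicious; otherwise send to review.
    return "malicious" if sum(votes) * 2 > len(votes) else "review"
```

The ordering keeps the expensive stages off the bulk of submissions, at the cost of missing anything the first-stage patterns do not flag.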
Key Takeaway. OpenClaw security is expanding from execution control to memory governance. Existing defenses protect Tools, execution traces, and Skill supply chains, but they do not yet control what an autonomous agent reads, stores in Memory, and later acts on. The next defense layer must be provenance-aware memory governance (see Open Problems).
Moltbook, a Reddit-style platform of OpenClaw-powered AI agents, became the first large-scale natural experiment in agent-only social interaction. The literature reveals a consistent gap between looking social and being socially reliable. (22 papers)
At the aggregate level, Moltbook reproduces familiar online-community statistics. At the interaction level, it is dominated by shallow replies, duplicate content, and extreme attention concentration.
| Title | Date | Key Contribution | Links |
|---|---|---|---|
| Collective Behavior of AI Agents: the Case of Moltbook | Feb 2026 | Large-scale statistical analysis; activity heavy-tailed, popularity power-law, attention decay | Paper |
| The Anatomy of the Moltbook Social Graph | Feb 2026 | >93% of comments receive no reply; minimal reciprocity; frequent duplicate posts | Paper |
| Social Simulacra in the Wild: AI vs Human Communities | Mar 2026 | Participation far more unequal than Reddit; communities share authors, not norms | Paper |
| Let There Be Claws: Early SNA of AI Agents on Moltbook | Feb 2026 | Extreme attention concentration; posting volume and content quality decoupled | Paper |
| Exploring Silicon-Based Societies | Feb 2026 | "Data-driven silicon sociology" framework; emergent community archetypes | Paper |
| 'Humans welcome to observe': A First Look at Moltbook | Feb 2026 | First measurement study with topic taxonomy and toxicity analysis | Paper |
| The Rise of AI Agent Communities | Feb 2026 | Discourse analysis showing functional utility drives agent influence | Paper |
| Emergence of Fragility in LLM-based Social Networks | Mar 2026 | Core-periphery structure reveals vulnerability to targeted hub attacks | Paper |
| MoltNet: Understanding Social Behavior of AI Agents | Feb 2026 | Selective mimicry of human behavior; persona drift after social rewards | Paper |
| Fast Response or Silence: Conversation Persistence on Moltbook | Feb 2026 | Two-part persistence decomposition; low incidence is the binding coordination bottleneck | Paper |
| Comparative Analysis of Reddit vs Moltbook | Feb 2026 | First topological comparison; Moltbook operates as broadcast network, not community | Paper |
| Informal Learners at Moltbook | Feb 2026 | Extreme broadcasting inversion; parallel monologues dominate interaction | Paper |
| Peer Learning Patterns in the Moltbook Community | Feb 2026 | Taxonomy of peer response patterns: validation, extension, application | Paper |
| MoltGraph: Temporal Graph for Coordinated-Agent Detection | Feb 2026 | First temporal graph dataset; coordinated posts get massive early engagement | Paper |
| Large-Scale Analysis of Persuasive Content on Moltbook | Mar 2026 | Political/persuasive content disproportionately concentrated in a small post fraction | Paper |
| Scientific Discussions on Moltbook (BERTopic) | Mar 2026 | Self-referential discussion patterns in AI-science discourse | Paper |
| When AI Agents Learn from Each Other (Human-AI Education) (AIED 2026) | Mar 2026 | Emergent peer learning and trust dynamics across agent communities | Paper |
Apparent emergence often turns out to be human-seeded, and isolated self-evolution drifts away from safety. Visible social structure is not the same as trustworthy collective intelligence.
| Title | Date | Key Contribution | Links |
|---|---|---|---|
| The Moltbook Illusion: Human vs Emergent Behavior | Feb 2026 | Temporal fingerprinting: only 15.3% of active agents are clearly autonomous | Paper |
| The Devil Behind Moltbook: Self-Evolution Trilemma | Feb 2026 | Proves trilemma: self-evolution + isolation + invariant safety is impossible | Paper |
| Agents in the Wild: Safety and Sociality on Moltbook | Feb 2026 | Governance and religion emerge spontaneously but interaction is performative | Paper |
| Risky Instruction Sharing and Norm Enforcement (AIRS) | Feb 2026 | Action-inducing posts trigger emergent decentralized norm enforcement | Paper |
| Molt Dynamics: Emergent Coordination on the MoltBook Archive | Mar 2026 | Role specialization emerges but multi-agent cooperation worse than single-agent baselines | Paper |
Also cited in the survey body: Conformity and Social Impact on AI Agents (Bellina et al., Jan 2026), on consensus hallucination and conformity dynamics.
Key Takeaway. Moltbook illustrates that social appearance is not social reliability. Agents reproduce the statistics of online communities, but closer inspection reveals shallow dialogue, unclear autonomy, and safety drift under isolation.
High-stakes domains shift OpenClaw from open-ended extensibility to controlled behavior. Robotics constrains physical action, healthcare grounds clinical context, scientific research limits research authority. Trustworthy deployment comes from limiting unsafe freedom, not expanding capability. (9 papers)
Validated skills, bounded parameters, closed-loop recovery: the constraint layer between model output and physical action.
| Title | Date | Key Contribution | Links |
|---|---|---|---|
| ROSClaw: OpenClaw ROS 2 Framework for Robot Control | Mar 2026 | Executive-layer contract: model proposes actions, validator decides whether they reach the robot | Paper |
| OpenGo: OpenClaw-Based Robotic Dog with Real-Time Skill Switching | Apr 2026 | Pre-validated robot skill library + bounded parameters; Unitree Go2 quadruped | Paper |
| ABot-Claw: Persistent, Cooperative, Self-Evolving Robotic Agents | Apr 2026 | Shared memory, critic feedback, multi-robot coordination on Unitree G1/Go2 + Agilex Piper | Paper |
| RoboClaw: Scalable Long-Horizon Robotic Tasks | Mar 2026 | VLM-driven controller with self-resetting Skills and recovery loops | Paper |
| VisionClaw: Always-On AI Agents Through Smart Glasses | Apr 2026 | Smart-glasses perception → Gemini Live reasoning → OpenClaw execution; bystander privacy boundary | Paper |
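The "model proposes, validator decides" contract that recurs in the robotics rows can be illustrated with a small sketch. It assumes a pre-validated skill library with per-parameter bounds; the skill names, parameter names, and limits are hypothetical and not taken from ROSClaw or OpenGo.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical pre-validated library: skill -> {parameter: (min, max)}.
SKILL_LIBRARY: Dict[str, Dict[str, Tuple[float, float]]] = {
    "walk": {"speed_mps": (0.0, 1.0), "duration_s": (0.0, 10.0)},
    "sit": {},
}


@dataclass
class Proposal:
    """An action proposed by the model; nothing executes until validated."""
    skill: str
    params: Dict[str, float]


def validate(proposal: Proposal) -> Tuple[bool, str]:
    """Accept only known skills whose parameters are all within bounds."""
    bounds = SKILL_LIBRARY.get(proposal.skill)
    if bounds is None:
        return False, "unknown skill"
    if set(proposal.params) != set(bounds):
        return False, "unexpected or missing parameters"
    for name, (low, high) in bounds.items():
        # A NaN value fails this comparison and is therefore rejected.
        if not low <= proposal.params[name] <= high:
            return False, f"{name} out of bounds"
    return True, "ok"
```

Rejecting out-of-range values instead of clamping them is a deliberate choice in this sketch: the validator never silently alters what the model asked for.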
Every action and every claim must be traceable to a role, a record, and an audit trail.
| Title | Date | Key Contribution | Links |
|---|---|---|---|
| When OpenClaw Meets Hospital | Mar 2026 | Role-specific OS users, kernel isolation, append-only docs, manifest-guided clinical memory (MIMIC-IV) | Paper |
| MedOpenClaw: Auditable Medical Imaging Agents | Mar 2026 | 3D Slicer integration with auditable Tool log; reveals the "Tool-Use Paradox" in radiology | Paper |
Role limits, evidence gates, and audit trails before agent disagreement can count as scientific review.
| Title | Date | Key Contribution | Links |
|---|---|---|---|
| ClawdLab: From Agent-Only Networks to Autonomous Science | Feb 2026 | Hard role restrictions (PI/analyst/scout/critic/synthesizer); evidence gates; governance voting | Paper |
| HTC-Claw: High-Throughput Computational Campaigns for Materials Discovery | Apr 2026 | Separates LLM planning from compute execution; 3,000-spinel bandgap scan in DFT | Paper |
Key Takeaway. Deployment turns OpenClaw's extensibility into a constraint problem. In high-stakes domains, trust comes from bounded actions (robotics), traceable context (healthcare), and limited authority (scientific workflows). The central problem is not making OpenClaw more capable, but deciding what it must not be allowed to do.
OpenClaw evaluation grew from zero to 23 benchmarks between January and May 2026. We group them by the three points in the agent lifecycle they target: before installation, during execution, and after deployment.
Can risky Skills be detected before they enter a workspace?
| Benchmark | Focus | Scale | Key finding | Paper |
|---|---|---|---|---|
| SkillFortifyBench | lifecycle model | 540 Skills | formal lifecycle guarantees | Paper |
| SkillSieve | ClawHub triage | 400 Skills | scalable marketplace triage | Paper |
| MalSkills | multi-artifact scan | 200 Skills | multi-artifact risk detection | Paper |
| Red/Blue Skills | submission risk | 11,010 Skills | lightweight submission-time prediction | Paper |
Can poisoned state, injected content, malicious Skills, or vulnerable dependencies compromise behavior?
| Benchmark | Focus | Scale | Key finding | Paper |
|---|---|---|---|---|
| CIK-Bench | state poisoning | 12 scenarios | persistent state amplifies compromise | Paper |
| ClawSafety | injection vectors | 120 cases | Skills are the highest-trust vector | Paper |
| PASB | IPI + memory | 131 Skills | memory makes injection persistent | Paper |
| SkillAttack | real-skill exploits | 171 Skills | popular Skills contain latent exploits | Paper |
| HarmfulSkillBench | registry harm | 200 Skills | Skill loading amplifies harmful behavior | Paper |
| ATBench-Claw | trajectory safety | 11 categories | trajectory audits expose runtime violations | Paper |
| AgentHazard | cross-harness harm | 2,653 cases | dependency hooks create cross-harness risk | Paper |
Can the agent complete useful work under realistic, evolving, or long-horizon conditions?
| Benchmark | Focus | Scale | Key finding | Paper |
|---|---|---|---|---|
| LiveClawBench | live curated tasks | 30 tasks | task complexity needs richer annotation | Paper |
| ClawsBench | cross-harness | 44 tasks | harness choice shapes capability and safety | Paper |
| Claw-Eval-Live | refreshable Skills | 105 tasks | live Skills enable refreshable evaluation | Paper |
| ClawArena | evolving information | 64 tasks | agents must revise beliefs under conflict | Paper |
| ClawBench-153 | production websites | 153 tasks | real websites remain difficult | Paper |
| ClawGym-Bench | ClawHub-mined | 200 tasks | ClawHub can become a training substrate | Paper |
| GTA-2 | checkpoint grading | 361 tasks | checkpoint grading captures long horizons | Paper |
| SEA-Eval | sequential streams | 92 streams | efficiency matters beyond success rate | Paper |
| MetaClaw-Bench | simulated workdays | 934 tasks | self-improvement needs longitudinal tests | Paper |
| ClawEnvKit | generated envs | 1,040 envs | environments can be generated automatically | Paper |
| WildClawBench | in-the-wild traces | n/a | Skill evolution must be tested in the wild | Paper |
| SkillLearnBench | Skill generation | 20 tasks | Skill learning requires continual evaluation | Paper |
Also: SkillTester (Paper) proposes paired utility-and-security scoring for Skill evaluation but does not ship an empirical evaluation set.
Key Takeaway. OpenClaw has many benchmarks but no shared measurement layer for constraint design. Each study tends to define its own threat distribution, harm metric, or task suite, so a stronger scanner / safer model / more robust defense may simply be measured against a different distribution.
Turning open extensibility into trustworthy agents requires systematic constraint design. The survey identifies four concrete directions.
| Direction | What it constrains | Why it matters now |
|---|---|---|
| Memory Provenance | What the agent remembers | MissClaw shows zero-click browsing content can become persistent context. Need provenance tags + multi-hop policies for derived memories. |
| Composable Oversight | What the agent is allowed to do | Self-evolution trilemma: isolation + continuous evolution + safety is impossible. Make oversight policies first-class platform objects (selectable like Skills). |
| Constraint Composition | How limits are declared and enforced | Robotics, healthcare, and science each rediscover the same lesson. Need a policy layer over Skills/Tools/Memory analogous to SELinux/eBPF. |
| Evaluation Convergence | How progress is measured | 23 benchmarks but no shared substrate. Convergence needed at data layer (canonical ClawHub/Moltbook snapshots), benchmark layer, and harness layer. |
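To make the Memory Provenance direction concrete, one possible reading of "provenance tags + multi-hop policies" is that every memory carries a trust tag and any derived memory inherits the least-trusted tag among its sources. The sketch below illustrates that reading only; the `Trust` levels, the `Memory` type, and the `may_drive_action` policy are hypothetical and not an OpenClaw API.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Tuple


class Trust(IntEnum):
    UNTRUSTED_WEB = 0  # e.g. content read while browsing
    SKILL = 1          # content produced by an installed Skill
    USER = 2           # content stated directly by the owner


@dataclass(frozen=True)
class Memory:
    text: str
    trust: Trust
    sources: Tuple["Memory", ...] = ()


def derive(text: str, *sources: Memory) -> Memory:
    """Create a derived memory that inherits its least-trusted source tag."""
    if not sources:
        raise ValueError("a derived memory needs at least one source")
    return Memory(text, min(s.trust for s in sources), sources)


def may_drive_action(memory: Memory, required: Trust = Trust.USER) -> bool:
    """Policy: only sufficiently trusted memories may trigger actions."""
    return memory.trust >= required
```

Because `derive` takes the minimum over its sources, the tag survives any number of hops: a plan built from a summary that once included web content remains tagged as untrusted.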
Earlier surveys focus on one slice of the OpenClaw landscape. Our survey ties them together through OpenClaw's platform design choices.
| Paper | Date | Lens | Link |
|---|---|---|---|
| OpenClaw as Language Infrastructure: A Case-Centered Survey | Mar 2026 | NLP-centered view; GATE and AERO frameworks; 38 papers | DOI |
| A Survey on the Unique Security of LLM Agents | Mar 2026 | Manus (closed) vs OpenClaw (open) as two paradigms | Preprints.org |
| Clippy to MS Office : OpenClaw to the Entire System | Mar 2026 | Privacy Visual Wrapper; Agentic Trust Calibration Model | ResearchGate |
| The Innovator's Dilemma in the Age of Autonomous Agents | Feb 2026 | "SaaSpocalypse"; pincer-disruption framework | ResearchGate |
| Organization | Report | Date | Key Finding |
|---|---|---|---|
| Trend Micro | Viral AI, Invisible Risks | Feb 2026 | TrendAI Digital Assistant Framework mapping |
| Trend Micro | Malicious Skills Distribute AMOS Stealer | Feb 2026 | AMOS stealer via SKILL.md across 39 Skills |
| Trend Micro | CISOs in a Pinch | Feb 2026 | "Lethal Trifecta + Persistence" concept |
| Trend Micro | TrendAI Secures the OpenClaw Era | Mar 2026 | Agentic Governance Gateway announcement |
| Microsoft | Running OpenClaw Safely | Feb 2026 | "Not appropriate for standard workstations" |
| NVIDIA | NemoClaw at GTC 2026 | Mar 2026 | Open-source security wrapper with OpenShell |
| Oasis Security | ClawJacked | Feb 2026 | WebSocket takeover; patched in 24h |
| Koi / Repello AI | ClawHavoc Campaign | Feb 2026 | 824+ malicious Skills via CVE-2026-25253 |
| Kaspersky | OpenClaw Unsafe for Use | Feb 2026 | 512 vulns (8 critical); ~1K exposed instances |
| Cisco AI | OpenClaw Skill Audit | Feb 2026 | 26% of 31K Skills vulnerable |
| Sophos | OpenClaw Security Analysis | 2026 | Exposed instances; sandbox escape |
| Snyk Labs | From SKILL.md to Shell Access | 2026 | 1,467 malicious Skills; 3-line Markdown → shell |
| JFrog | OpenClaw Package Security | 2026 | Malicious package detection |
| SecurityScorecard | OpenClaw Risk Assessment | 2026 | Enterprise deployment risk guidance |
| Hunt.io | OpenClaw Exposure Report | 2026 | 30K-135K+ exposed instances |
| Antiy CERT | ClawHavoc Campaign Analysis | Feb 2026 | 1,184 malicious Skills; ClickFix; AMOS stealer |
| Zenity Labs | OpenClaw or OpenDoor? | Jan 2026 | Backdoor via messaging app integration |
| Giskard | OpenClaw Data Leakage | Feb 2026 | Live exploitation of misconfigured deployments |
Our unique angle: each tool is annotated with [Paper] tags linking to relevant research in our taxonomy.
| Project | Description | Links |
|---|---|---|
| openclaw/openclaw | Official OpenClaw repository | |
| openclaw/skills | Official Skills repository | |
| ClawHub | Official Skill marketplace (49,000+ Skills) | Website |
| Project | Description | Paper | Links |
|---|---|---|---|
| Gen-Verse/OpenClaw-RL | Async RL training framework | Platform | |
| MINT-SJTU/RoboClaw | VLM-driven robotic tasks | Deployment | |
| NVIDIA/NemoClaw | Enterprise security wrapper | Industry | |
| Project | Description | Links |
|---|---|---|
| datawhalechina/hello-claw | Structured Chinese tutorial | |
| centminmod/explain-openclaw | Architecture, security, deployment docs | |
Released datasets backing OpenClaw research. For benchmark suites see Benchmarks.
| Dataset | Source Paper | Scale | Description | Link |
|---|---|---|---|---|
| Moltbook Observatory Archive | SimulaMet | 2M+ rows | 923K posts, 882K comments, 102K agents; backs 14+ Moltbook papers | Dataset |
| ClawHub Corpus (Malicious-or-Not) | Holzbauer et al. | 238,180 Skills | Largest cross-registry Skill dataset (4 registries) | Paper |
| SkillClone Corpus | SkillClone | 20,000 Skills | 258K clone pairs; 75% involved in clone relations | Paper |
| MoltGraph | Mukherjee et al. | 6,159 agents | Temporal graph for coordination detection | Paper |
| SkillFortifyBench | SkillFortify | 540 Skills | Supply-chain security evaluation | Paper |
| Skill-Inject Benchmark | Skill-Inject | 202 pairs | Injection-task pairs for Skill file attacks | Paper |
| PASB | From Assistant to Double Agent | 131 Skills | Personalized Agent Security Bench | Paper |
| LLMail-Inject | (Cited by Privilege Sep.) | 649 attacks | Prompt injection; 0% ASR with structural defense | Paper |
| Repository | Focus | Stars |
|---|---|---|
| VoltAgent/awesome-openclaw-skills | 5,211 curated OpenClaw Skills | |
| hesamsheikh/awesome-openclaw-usecases | 42 verified use cases | |
| ZeroLu/awesome-openclaw | Getting-started guide with Skill packs | |
| alvinreal/awesome-openclaw | Ecosystem tools, dashboards, integrations | |
| mergisi/awesome-openclaw-agents | 162 OpenClaw agent templates | |
Contributions welcome! Please read the contributing guidelines first.
We especially welcome:
- New papers not yet listed
- Code repositories associated with listed papers
- Industry reports and technical analyses
- Datasets and benchmarks
@article{wang2026openclaw-survey,
title={A Survey of the OpenClaw Ecosystem: From Platform Extensibility to Constraint Design},
author={Wang, Ziqing and others},
year={2026},
note={Companion repository: \url{https://github.com/REAL-Lab-NU/Awesome-OpenClaw-Papers}}
}
This work is licensed under a Creative Commons Attribution 4.0 International License.
