# LightHeart OpenClaw

**A methodology for building persistent AI agent teams that actually work.**

Patterns, tools, and battle-tested operational knowledge for running AI agents
that stay up for hours, days, and weeks — coordinating with each other, learning
from failures, and shipping real software. Built from production experience
running 3+ agents 24/7 on local hardware.

About 70% of this repository is framework-agnostic. The patterns for identity,
memory, coordination, autonomy, and observability apply to any agent system —
Claude Code, LangChain, AutoGPT, custom agents, or anything else that runs long
enough to accumulate state. The remaining 30% is a reference implementation
using [OpenClaw](https://openclaw.io) and vLLM that demonstrates the patterns
concretely.

This is the infrastructure layer of a proven multi-agent architecture — the
[OpenClaw Collective](COLLECTIVE.md) — where 3 AI agents coordinate
autonomously on shared projects using local GPU hardware. The companion
repository **Android-Labs** (private) is the proof of work: 3,464 commits from
3 agents over 8 days, producing three shipping products and 50+ technical
research documents. These tools kept them running.

**Start here:** [docs/PHILOSOPHY.md](docs/PHILOSOPHY.md) — the conceptual
foundation, five pillars, complete failure taxonomy, and a reading map based on
what you're building.

| Component | What it does | Requires OpenClaw? | Platform |

## What's Inside

### The Methodology

These docs capture what we learned running persistent agent teams. They apply to
any framework.

| Doc | What It Covers |
|-----|---------------|
| [PHILOSOPHY.md](docs/PHILOSOPHY.md) | **Start here.** Five pillars of persistent agents, failure taxonomy, reading map, framework portability guide |
| [WRITING-BASELINES.md](memory-shepherd/docs/WRITING-BASELINES.md) | How to define agent identity that survives resets and drift |

### The Reference Implementation (OpenClaw + vLLM)

Working tools that implement the methodology. Use them directly or adapt the
patterns to your stack.

**Session Watchdog** — Monitors `.jsonl` session files and cleans up bloated
ones before they hit the context ceiling. The agent doesn't notice — it just
gets a clean context window mid-conversation.
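
The core cleanup pass can be sketched in a few lines of Python. This is a minimal sketch, assuming a flat session directory, a `sessions.json` index keyed by session ID, and an arbitrary size threshold; the real watchdog's paths, limits, and timer live in its own config:

```python
import json
from pathlib import Path

def sweep(sessions_dir: Path, max_bytes: int = 8_000_000) -> list[str]:
    """Delete session files over the size threshold and drop their
    entries from sessions.json. Threshold and index layout here are
    illustrative, not the tool's actual defaults."""
    removed = [f.stem for f in sessions_dir.glob("*.jsonl")
               if f.stat().st_size > max_bytes]
    for sid in removed:
        (sessions_dir / f"{sid}.jsonl").unlink()
    index = sessions_dir / "sessions.json"
    if removed and index.exists():
        sessions = json.loads(index.read_text())
        index.write_text(json.dumps(
            {k: v for k, v in sessions.items() if k not in removed}, indent=2))
    return removed
```

With the index entry gone, the gateway finds no record of the deleted session and opens a fresh one on the next turn.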

**vLLM Tool Call Proxy (v4)** — Transparent proxy between OpenClaw and vLLM
that makes local model tool calling work. Handles SSE re-wrapping, tool call
extraction from text, response cleaning, and loop protection.
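
The hard part is recovering tool calls that local models emit as plain text instead of structured fields. A hedged sketch of that extraction step, assuming the common `<tool_call>` tag convention (the proxy's actual matching rules may differ):

```python
import json
import re

# One common local-model convention; real models vary in tag format.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text: str) -> tuple[str, list[dict]]:
    """Split raw model output into cleaned text and parsed tool-call objects."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            pass  # malformed call: drop it rather than crash the turn
    return TOOL_CALL_RE.sub("", text).strip(), calls
```

The cleaned text goes back to the client as the assistant message; the parsed objects are re-emitted as structured tool-call events.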

**Token Spy** — Transparent API proxy that captures per-turn token usage, cost,
latency, and session health for cloud model calls (Anthropic, OpenAI, Moonshot).
Real-time dashboard with session health cards, cost charts, and auto-kill for
sessions exceeding configurable limits. Works with any OpenAI-compatible or
Anthropic API client.
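
Wiring it in is a one-line change on the client side: point the agent's `baseUrl` at the proxy instead of the upstream API, and Token Spy logs everything before forwarding requests and responses untouched. A sketch with a made-up port and key names that are illustrative only; consult Token Spy's docs for its real defaults:

```json
{
  "anthropic": {
    "baseUrl": "http://127.0.0.1:4040",
    "apiKey": "sk-ant-..."
  }
}
```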

**Memory Shepherd** — Periodic memory reset for persistent agents. Archives
scratch notes and restores MEMORY.md to a curated baseline on a schedule.
Defines the `---` separator convention: operator-controlled identity above,
agent scratch space below.
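
The separator convention is mechanical enough to show directly. A minimal sketch of the reset step, assuming the first `---` line in MEMORY.md marks the boundary:

```python
SEPARATOR = "\n---\n"

def reset_memory(text: str) -> tuple[str, str]:
    """Split MEMORY.md at the first separator: keep the operator-controlled
    identity above it, return the agent scratch below it for archiving."""
    identity, sep, scratch = text.partition(SEPARATOR)
    if not sep:
        return text, ""  # no separator found: nothing to archive
    return identity + sep, scratch
```

The returned scratch half is archived; the kept half is the curated baseline the agent restarts from.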

**Guardian** — Self-healing process watchdog for LLM infrastructure. Runs as a
root systemd service that agents cannot kill or modify. Monitors processes,
systemd services, Docker containers, and file integrity — automatically
restoring from known-good backups when things break. Supports tiered health
checks, recovery cascades, and generational backups. See
[guardian/README.md](guardian/README.md) for full documentation.
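
Guardian is config-driven via an INI file. The section and key names in this sketch are invented for illustration only; the actual schema is documented in [guardian/README.md](guardian/README.md):

```ini
; Hypothetical sketch of one Guardian service entry.
; Key names are illustrative, not Guardian's real schema.
[service:vllm]
check_port   = 8000
check_http   = http://127.0.0.1:8000/health
soft_restart = systemctl restart vllm
backup_dir   = /var/backups/vllm
max_restarts = 3
```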

**Golden Configs** — Battle-tested `openclaw.json` and `models.json` with the
critical `compat` block that prevents silent failures. Workspace templates for
agent personality, identity, tools, and working memory.

**Architecture Docs** — How OpenClaw talks to vLLM, why the proxy exists, how
session files work, and the five failure points that kill local setups.
See [ARCHITECTURE.md](docs/ARCHITECTURE.md) and [SETUP.md](docs/SETUP.md).

---

    LightHeart-OpenClaw/
    │   └── docs/
    │       └── HEALTH-CHECKS.md   # Health check & recovery reference
    │   ├── TOKEN-SPY.md           # Token Spy setup & API reference

Apache 2.0 — see [LICENSE](LICENSE)

---

Built from production experience by [Lightheart Labs](https://github.com/Light-Heart-Labs) and the [OpenClaw Collective](COLLECTIVE.md). The patterns were discovered by the agents. The docs were written by the agents. The lessons were learned the hard way.