⭐ Our stars were reset by mistake. On 2026-06-27 the repo's visibility was accidentally toggled for a few minutes, which clears GitHub's stargazer list — our ~200 stars dropped to near-zero. The code and history are intact; only the count was wiped. If you starred SwarmAI before, a re-star would genuinely help 🙏 — it's how new builders find the project.
SwarmAI is a self-evolving Agent OS — every interaction upgrades the system's cognition, not just its templates.
Your AI team, one human directing.
We finally have software smart enough to reason, write code, and make judgment calls — and it wakes up with amnesia every morning. Every session starts from zero. The context you gave it, the mistake it made yesterday, the correction you taught it — gone. Most "AI tools" are flat: a brilliant model trapped in Groundhog Day.
SwarmAI is built on the opposite bet — that the value should compound. Every interaction should leave the system a little sharper than before, permanently.
Which reframes the obvious question. Ask "why does a desktop app need 220K lines and 13 engines?" and you've mismeasured it: this isn't application complexity — it's the complexity of an agent's cognition. Four things separate a mind from a model: it stays continuous across time, it corrects itself, it forgets what stopped mattering, and its judgment compounds with use. Conventional software has no analog for any of them — a program doesn't get wiser between runs, and it never rewrites its own rules. SwarmAI is an attempt to build that missing layer: not a bigger model, but the cognitive operating system around one.
The design choices only make sense through that lens:
- Evolution is an OS patch, not stored data. Most agent-memory projects pile up entries. We separate cognition (the OS) from knowledge (the disk): one edited line in `SOUL.md` shifts judgment more than a thousand memory rows, and every change is a `git diff`.
- Recurring mistakes are made structurally impossible. When an error class repeats, we don't add another lesson; we add a gate, then a path where the wrong move physically cannot happen. Humans rely on carefulness; an agent should rely on structure.
- Knowledge must be able to die. Unreferenced for 90 days → retired. Accumulation without elimination is how every memory system rots. Decay is natural selection for what an agent knows.
- Sessions are discontinuous. Intelligence shouldn't be. Hooks fire between sessions, so the next one starts warm. Most frameworks accept the cold start; we refuse it.
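The "knowledge must be able to die" rule reduces to a partition over last-reference timestamps. The sketch below assumes each knowledge entry records when a session last cited it; `KnowledgeEntry`, `last_referenced`, and `partition_by_decay` are hypothetical names for illustration, not SwarmAI's schema or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Retirement window from the "unreferenced for 90 days" rule.
DECAY_WINDOW = timedelta(days=90)


@dataclass
class KnowledgeEntry:
    """Hypothetical knowledge record; not SwarmAI's actual schema."""
    key: str
    last_referenced: datetime  # assumed to be refreshed whenever a session cites the entry


def partition_by_decay(entries, now=None):
    """Split entries into (alive, retired) lists under the 90-day rule."""
    if now is None:
        now = datetime.now(timezone.utc)
    alive, retired = [], []
    for entry in entries:
        # 90 or more days without a reference means the entry is retired.
        if now - entry.last_referenced >= DECAY_WINDOW:
            retired.append(entry)
        else:
            alive.append(entry)
    return alive, retired
```

The only input decay needs is "when was this last useful?"; anything that stops being cited falls out on its own, with no manual pruning step.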
The thesis, tested live and in public: can one builder + AI operate at the scale of a whole team? Not by scaling the model — by building the compounding loop around it. The loop is the product; you can't extract one engine and keep the effect.
As of v1.22.0, that loop runs healthily end-to-end: sessions self-heal, knowledge cultivates and decays on its own, and the evolution engine has logged 42 corrections, converting recurring failure classes into structural gates rather than repeated lessons.
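Converting a recurring failure class into a gate can be sketched as a counter with a promotion threshold. This is a hypothetical illustration: `CorrectionLog`, `RECURRENCE_THRESHOLD`, and the threshold of 2 are assumptions, since the README does not say how the evolution engine decides that a class is "recurring".

```python
from collections import Counter

# Hypothetical: a failure class that recurs this many times is promoted to a gate.
RECURRENCE_THRESHOLD = 2


class CorrectionLog:
    """Hypothetical ledger that turns recurring failure classes into gates."""

    def __init__(self, threshold=RECURRENCE_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()  # failure class -> number of corrections logged
        self.gates = set()       # failure classes now blocked structurally

    def record(self, failure_class):
        """Log one correction; return "gate" once the class is gated, else "lesson"."""
        self.counts[failure_class] += 1
        if self.counts[failure_class] >= self.threshold:
            self.gates.add(failure_class)
        return "gate" if failure_class in self.gates else "lesson"

    def is_blocked(self, failure_class):
        """Gated classes are refused up front instead of corrected afterwards."""
        return failure_class in self.gates
```

The design point is the return value: a first occurrence stays a lesson, while a repeat changes what the system is allowed to do.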
This isn't a product demo — it's a living experiment, documented as it happens. Below are 60+ deep-dive discussions: every architecture decision, failure, and post-mortem behind the engines.
| Start here | What you'll find |
|---|---|
| 🗺️ Reading Matrix: 3 Curated Paths | Builder (~45 min) · Architect (~60 min) · Leader (~30 min). Don't read everything; pick your path |
| 💬 All Discussions (68) | Thought leadership, architecture deep-dives, and post-mortems, also mirrored in `docs/discussions/` |
| 🧭 Design Philosophy: Six Pillars | The beliefs that became enforcement, and the failure that earned each one its place |
git clone https://github.com/xg-gh-25/SwarmAI.git && cd SwarmAI
cd backend && uv sync && cp .env.example .env # edit with your API key
cd ../desktop && npm install && npm run tauri:dev

macOS (Apple Silicon): or download the `.dmg` from Releases.
Requires: Node.js 18+, Python 3.11+, Rust, uv, Claude Code CLI
📖 Full setup guide: QUICK_START.md
┌─────────────────────────────────────────────────────────────┐
│ DELIVERY ENGINES Pipeline · Pollinate · Eval │
├─────────────────────────────────────────────────────────────┤
│ KNOWLEDGE LAYER DDD · Memory · Evolution │
├─────────────────────────────────────────────────────────────┤
│ AGENT HARNESS Context · Sessions · Hooks · Jobs │
└─────────────────────────────────────────────────────────────┘
If you're also using AI to write code, make content, or run operations — these 13 engines are that compounding bet, broken down. Each is independently useful; together they form the loop that makes the system sharper with use. (Click code to read the engine itself — the implementation is the documentation.)
| # | Engine | What It Does | Deep Dive |
|---|---|---|---|
| 1 | Context Management | 11-file prompt architecture, 100K budget, 3-tier ownership | docs |
| 2 | Memory Pipeline | 4-tier persistence: DailyActivity → distillation → compound recall | docs |
| 3 | DDD Cultivation | Self-growing domain knowledge, 7-type ontology, Darwinian decay | docs |
| 4 | Autonomous Pipeline | One requirement → push-ready code. 9 stages · 3 gates (framing/plan/build) · 2 modes (Full + Goal Loop) | docs |
| 5 | Pollinate Engine | One message → multi-format brand content. 9 stages · 11 tracks · 3-tier gates · DDD flywheel | docs · diagram |
| 6 | Self-Evolution | Cognitive L0→L3 patching. 42 corrections → recurring classes become structural gates | docs |
| 7 | Self-Healing | Invisible recovery: 5 sensors, auto-respawn, user sees nothing | code |
| 8 | Multi-Tab + MessageStore | Concurrent sessions, phase-gated single-writer, cross-tab isolation | code |
| 9 | Hook System | Runtime + lifecycle hooks. Sessions never cold-start | code |
| 10 | Job System | Background intelligence: 13 signal feeds, cron, budget-gated | code |
| 11 | 4-Platform Backend | macOS daemon · Hive (EC2) · Windows · Linux. Compile-time isolation | code |
| 12 | Skills + Channels | 88 skills (lazy/always), Slack gateway, 3-tier permission | code |
| 13 | Eval (Proprioception) | Decoupled, system-level: golden set + git-bound regression gate. Proves convergence, not vibes | docs · diagram |
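To make the "100K budget" in row 1 concrete, here is a minimal sketch of greedy, priority-ordered prompt assembly under a token cap. It is an assumption-laden illustration, not the Context Management engine: the `PromptFile` type, integer priorities, file names other than `SOUL.md`, and the four-characters-per-token estimate are hypothetical, and the sketch does not model 3-tier ownership.

```python
from dataclasses import dataclass

TOKEN_BUDGET = 100_000  # the "100K budget" from the table


@dataclass
class PromptFile:
    name: str
    priority: int  # hypothetical: lower numbers load first
    text: str


def estimate_tokens(text):
    # Rough heuristic of ~4 characters per token; a real system would use a tokenizer.
    return max(1, len(text) // 4)


def assemble_context(files, budget=TOKEN_BUDGET):
    """Add files in priority order, skipping any file that would overflow the budget."""
    included, skipped, used = [], [], 0
    for pf in sorted(files, key=lambda f: f.priority):
        cost = estimate_tokens(pf.text)
        if used + cost <= budget:
            included.append(pf.name)
            used += cost
        else:
            skipped.append(pf.name)
    return included, skipped, used
```

Skipping an oversized file instead of stopping lets smaller, lower-priority files still fit afterwards.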
The compound loop: Memory → Pipeline judgment → DDD → Evolution → Gates → Memory. Remove one, the rest weaken.
The same DDD-driven pattern powers content, not just code. Pollinate turns one message into any format and writes its lessons back to the DDD, so every run compounds.
Traditional software trusts `assert` plus a green CI light. Agents can't: their outputs are non-deterministic (even temperature 0 isn't bit-reproducible), the prompt is effectively source code yet has no diff, review, or rollback, and dependencies drift on their own (the model updates silently: you shipped nothing, yet behavior changed). So SwarmAI treats Eval as the successor to `assert`: a decoupled, system-level subsystem that measures whether the OS is still correct, not merely alive.
It's proprioception, not external grading. Eval spawns a clean session against the agent's real rules files and scores judgment across 6 dimensions and 15 categories, binding every run to its git commit so a regression can be traced to the change that caused it. And it's wired into the lifecycle as a gate, not a script you have to remember to run: build doesn't block, release does. A regression or a red spine (core) check on CI/deploy stops the ship.
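The "build doesn't block, release does" rule can be sketched as a comparison against a commit-bound baseline. This is a hypothetical illustration: `EvalRun`, `ReleaseBlocked`, the dimension names, and the 0.02 tolerance are invented for the example; the real Eval scores 6 dimensions and 15 categories with its own methodology.

```python
from dataclasses import dataclass, field


class ReleaseBlocked(Exception):
    """Raised when a release-stage eval finds a regression."""


@dataclass
class EvalRun:
    commit: str                                 # each run is bound to the commit it scored
    scores: dict = field(default_factory=dict)  # hypothetical: dimension -> score in [0, 1]


def regressions(baseline, candidate, tolerance=0.02):
    """Dimensions where the candidate fell more than `tolerance` below the baseline.

    A dimension missing from the candidate counts as 0.0, i.e. a regression.
    """
    return sorted(
        dim
        for dim, base in baseline.scores.items()
        if candidate.scores.get(dim, 0.0) < base - tolerance
    )


def gate(stage, baseline, candidate):
    """Build reports regressions without blocking; release blocks on any regression."""
    failed = regressions(baseline, candidate)
    if failed and stage == "release":
        raise ReleaseBlocked(
            f"{candidate.commit} regressed on {failed} vs baseline {baseline.commit}"
        )
    return failed
```

Because both runs carry a commit, a blocked release names the exact change that regressed, which is what makes the regression attributable.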
📖 Full architecture + methodology (mapped to AWS's Eval-First framework): Discussion #83
📊 More diagrams: Flywheel · Context · Memory · DDD · Sessions · Jobs · Evolution
Can one builder + AI operate at team scale? We're testing it live.
- One-shot qualified delivery is the real token optimization. A cheap model that iterates 5× costs more than one correct delivery. Code/content as black box: input → qualified output.
- Division of labor is a compromise for limited human cognitive bandwidth — not an optimal design. One agent, many roles, one knowledge layer. (Sub-agents for adversarial verification ≠ division of labor.)
- Knowledge must eliminate itself. Darwinian decay: 90d unreferenced = retirement. A system that can forget > one that can only remember.
- Evolution is cognitive patching, not data accumulation. We change rules you can `git diff`. "Thinks differently" ≠ "knows more."
- Quality converges; it doesn't just improve. Error classes monotonically decrease. Carefulness doesn't scale. Gates do.
- Sessions are discontinuous. Intelligence shouldn't be. 21 hooks fire between sessions. Gets better through use, not updates.
- If you can't measure it, you didn't build it. OS Eval + golden set + change-triggered. Proves convergence in git.
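The one-shot delivery bullet is simple arithmetic. A sketch, assuming cost scales linearly with attempts; the prices (in integer cents) and the optional per-attempt review cost are hypothetical. Without review overhead, five cheap attempts only win when each costs under a fifth of one strong attempt, and any per-attempt review cost moves the break-even further toward the one-shot model.

```python
def delivery_cost(cost_per_attempt, attempts, review_cost_per_attempt=0):
    """Cost to reach one qualified output: every attempt pays tokens plus any review."""
    return attempts * (cost_per_attempt + review_cost_per_attempt)


def iterating_is_cheaper(cheap_cost, cheap_attempts, strong_cost,
                         strong_attempts=1, review_cost_per_attempt=0):
    """True only if the iterating cheap model beats the (ideally one-shot) strong model."""
    cheap_total = delivery_cost(cheap_cost, cheap_attempts, review_cost_per_attempt)
    strong_total = delivery_cost(strong_cost, strong_attempts, review_cost_per_attempt)
    return cheap_total < strong_total


# Hypothetical prices in cents per attempt, not measured figures.
print(delivery_cost(30, 5), delivery_cost(100, 1))                   # 150 100
print(iterating_is_cheaper(15, 5, 100))                              # True: 75 vs 100
print(iterating_is_cheaper(15, 5, 100, review_cost_per_attempt=50))  # False: 325 vs 150
```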
The compound loop itself is the product. You can't extract one piece and get the same effect.
📖 Full thesis + CLASS A case study + convergence evidence: docs/THESIS.md
📖 Discussion #38: Design Philosophy — Six Pillars
| Layer | LOC | Entry Points |
|---|---|---|
| Core (spine) | ~13K | session_unit.py, prompt_builder.py, session_router.py |
| Core (extensions) | ~60K | core/ — DDD, evolution, proactive, code intel |
| Backend (other) | ~64K | routers, hooks, jobs, channels, main |
| Skills | ~28K | backend/skills/s_*/ (88 modules) |
| Frontend | ~54K | desktop/src/ — React 19, Tailwind, TanStack Query |
| Rust (Tauri) | ~2K | desktop/src-tauri/ |
| Tests | ~150K | pytest + Vitest (backend 117K + frontend 33K) |
Stack: Tauri 2.0 (Rust) · React 19 · FastAPI · Claude Agent SDK + Bedrock · SQLite (WAL + FTS5)
| What | Link |
|---|---|
| Discussions (68) | Reading Matrix — Builder 45min · Architect 60min · Leader 30min · all |
| AI Agent Pitfall Guide | EN PDF · 中文 PDF |
| Design Docs | Platform · Pipeline · Memory · Evolution · Pollinate |
| Contributing | CONTRIBUTING.md |
2,550 commits · 1 human directing · 1 AI delivering. This repo is itself the thesis's minimal verifiable evidence: the human sets direction and makes every judgment call; the AI does the building. See for yourself: `git log`.
| Contributor | Role |
|---|---|
| XG | Creator & Chief Architect |
| Swarm 🐝 | AI Co-Developer (Claude Opus 4) |
SwarmAI — Human directs. AI delivers.
