Self-learning AI development skills for Zo Computer.
Zouroboros is a self-enhancing AI development toolkit. It starts with specification-first development — Socratic interviews, immutable seed specs, and 3-stage evaluation. Then it closes the loop: the system diagnoses its own health, prescribes improvements, executes them autonomously, and verifies the results.
The snake eats its own tail.
Adapted from Q00/ouroboros. Native TypeScript/Bun, zero Python dependencies, designed as Zo Skills.
| Skill | Description |
|---|---|
| spec-first-interview | Socratic interview → ambiguity scoring → immutable seed YAML |
| three-stage-eval | Mechanical → Semantic → Consensus verification pipeline |
| unstuck-lateral | 5 lateral-thinking personas to break through stagnation |
| autoloop | Autonomous single-metric file optimization loop (inspired by karpathy/autoresearch) |
| Skill | Description |
|---|---|
| zouroboros-introspect | Self-diagnostic health scorecard across 7 system metrics |
| zouroboros-prescribe | Auto-generates improvement seeds from scorecard, with governor safety gate |
| zouroboros-evolve | Executes prescriptions, measures delta, reverts regressions |
| Persona | Purpose |
|---|---|
| Zouroboros | The self-enhancement engine — clinical, metric-driven, autonomous |
| Hacker | Break past constraints creatively |
| Researcher | Stop and investigate systematically |
| Simplifier | Cut to MVP ruthlessly |
| Architect | Fix structural problems |
| Contrarian | Question the problem itself |
git clone https://github.com/marlandoj/Zo-Ouroboros.git /tmp/zouroboros
bash /tmp/zouroboros/install.sh
rm -rf /tmp/zouroboros
Or as a one-liner:
git clone https://github.com/marlandoj/Zo-Ouroboros.git /tmp/zouroboros && bash /tmp/zouroboros/install.sh && rm -rf /tmp/zouroboros
| Variable | Default | Description |
|---|---|---|
| `ZOUROBOROS_WORKSPACE` | `$HOME` | Root workspace path |
| `ZOUROBOROS_SKILLS_DIR` | `$HOME/Skills` | Where skills are installed |
| `ZOUROBOROS_IDENTITY_DIR` | `$HOME/IDENTITY` | Where persona files go |
| `ZOUROBOROS_SEEDS_DIR` | `$HOME/Seeds/zouroboros` | Where prescriptions are saved |
| `ZOUROBOROS_MEMORY_DB` | `$HOME/.zo/memory/shared-facts.db` | SQLite memory database |
| `ZOUROBOROS_MEMORY_SCRIPTS` | `$HOME/Skills/zo-memory-system/scripts` | Memory system CLI |
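The table above can be read as "use the environment variable if set, otherwise fall back to the default". The sketch below illustrates that lookup. The `resolvePaths` function and the `ZouroborosPaths` type are hypothetical names chosen for illustration; the actual contents of `zouroboros.config.ts` are not shown in this README and may differ.

```typescript
// Hypothetical helper: resolves the documented variables against their defaults.
// Not taken from zouroboros.config.ts; names and shape are assumptions.
type ZouroborosPaths = {
  workspace: string;
  skillsDir: string;
  identityDir: string;
  seedsDir: string;
  memoryDb: string;
  memoryScripts: string;
};

function resolvePaths(env: Record<string, string | undefined>): ZouroborosPaths {
  // Every documented default is rooted at $HOME.
  const home = env.HOME ?? "";
  return {
    workspace: env.ZOUROBOROS_WORKSPACE ?? home,
    skillsDir: env.ZOUROBOROS_SKILLS_DIR ?? `${home}/Skills`,
    identityDir: env.ZOUROBOROS_IDENTITY_DIR ?? `${home}/IDENTITY`,
    seedsDir: env.ZOUROBOROS_SEEDS_DIR ?? `${home}/Seeds/zouroboros`,
    memoryDb: env.ZOUROBOROS_MEMORY_DB ?? `${home}/.zo/memory/shared-facts.db`,
    memoryScripts:
      env.ZOUROBOROS_MEMORY_SCRIPTS ?? `${home}/Skills/zo-memory-system/scripts`,
  };
}

// Example: override only the seeds directory; everything else uses the defaults.
const examplePaths = resolvePaths({
  HOME: "/home/workspace",
  ZOUROBOROS_SEEDS_DIR: "/data/seeds",
});
console.log(examplePaths.seedsDir);
console.log(examplePaths.skillsDir);
```

In a Bun script, `process.env` could be passed as the argument in place of the literal object.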
- Bun runtime (v1.0+)
- SQLite3 CLI
- zo-memory-system skill (for memory DB, graph, episodes, procedures)
- Optional: Ollama with `qwen2.5:1.5b` + `nomic-embed-text` (for memory gate + embeddings)
- Autoloop skill (bundled, for file-targeting metric optimization)
bun Skills/spec-first-interview/scripts/interview.ts score \
  --request "Add rate limiting to the API"
bun Skills/zouroboros-introspect/scripts/introspect.ts --verbose
Output:
╔════════════════════════════════════════════════════════╗
║ ZOUROBOROS INTROSPECTION SCORECARD ║
╠════════════════════════════════════════════════════════╣
║ ✅ Memory Recall 100.0% → score:100% ║
║ ✅ Graph Connectivity 90.0% → score:100% ║
║ ⚠️ Routing Accuracy N/A → score: 50% ║
║ ⚠️ Eval Calibration N/A → score: 50% ║
║ ⚠️ Procedure Freshness N/A → score: 50% ║
║ ⚠️ Episode Velocity N/A → score: 50% ║
╠════════════════════════════════════════════════════════╣
║ ⚠️ COMPOSITE HEALTH: 70/100 ║
║ Weakest: Routing Accuracy ║
╚════════════════════════════════════════════════════════╝
# Introspect → identify weakest metric
bun Skills/zouroboros-introspect/scripts/introspect.ts --json > /tmp/scorecard.json
# Prescribe → generate improvement seed
bun Skills/zouroboros-prescribe/scripts/prescribe.ts --scorecard /tmp/scorecard.json
# Evolve → execute the prescription
bun Skills/zouroboros-evolve/scripts/evolve.ts --prescription Seeds/zouroboros/rx-*.json
Create a scheduled agent that runs the pipeline daily. The Zouroboros persona handles the rest autonomously: diagnosing, prescribing, executing, and reporting via email.
- Introspect: Measures 7 health metrics across memory, graph, routing, eval, procedures, episode velocity, and skill effectiveness. Outputs a composite score (0–100) and ranks improvement opportunities.
- Prescribe: Maps the weakest metric to one of 14 playbooks. Generates a seed YAML (spec-first format) and optionally a program.md (autoloop format). A governor gate blocks high-risk prescriptions.
- Evolve: Executes the prescription via autoloop (file-targeting) or script mode (procedural). Captures pre/post scorecards. Reverts on regression.
| Metric | Source | Target | Weight |
|---|---|---|---|
| Memory Recall | Continuation eval fixture pass rate | ≥ 85% | 0.22 |
| Graph Connectivity | Knowledge graph orphan fact ratio | ≥ 80% linked | 0.14 |
| Routing Accuracy | Swarm episode success rate | ≥ 85% | 0.18 |
| Eval Calibration | Stage 3 override rate | ≤ 15% | 0.14 |
| Procedure Freshness | Stale procedure ratio (14+ days) | ≤ 30% | 0.14 |
| Episode Velocity | 7-day success trend vs prior 7 days | Positive | 0.08 |
| Skill Effectiveness | Per-skill success rate from skill_executions | ≥ 85% | 0.10 |
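The weights in the table sum to 1.00, so the composite score can be read as a weighted average of per-metric scores. The sketch below shows that arithmetic. It assumes each metric has already been normalized to a 0–100 score (how raw values map to scores is not specified here), and `compositeHealth` and `weakestMetric` are hypothetical names, not functions from `introspect.ts`.

```typescript
// Weights copied from the metric table; they sum to 1.00.
const METRIC_WEIGHTS: Record<string, number> = {
  "Memory Recall": 0.22,
  "Graph Connectivity": 0.14,
  "Routing Accuracy": 0.18,
  "Eval Calibration": 0.14,
  "Procedure Freshness": 0.14,
  "Episode Velocity": 0.08,
  "Skill Effectiveness": 0.10,
};

// Assumption: every metric arrives as a normalized 0-100 score.
function compositeHealth(scores: Record<string, number>): number {
  let total = 0;
  for (const [metric, weight] of Object.entries(METRIC_WEIGHTS)) {
    if (!(metric in scores)) throw new Error(`missing score for ${metric}`);
    total += weight * scores[metric];
  }
  // Round to a whole number on the 0-100 scale.
  return Math.round(total);
}

// Hypothetical helper: the lowest-scoring metric (ties go to the first listed).
function weakestMetric(scores: Record<string, number>): string {
  return Object.keys(METRIC_WEIGHTS).reduce((worst, m) =>
    scores[m] < scores[worst] ? m : worst,
  );
}

// Illustrative input: everything at 100 except Routing Accuracy at 50.
const exampleScores: Record<string, number> = {
  "Memory Recall": 100,
  "Graph Connectivity": 100,
  "Routing Accuracy": 50,
  "Eval Calibration": 100,
  "Procedure Freshness": 100,
  "Episode Velocity": 100,
  "Skill Effectiveness": 100,
};
console.log(compositeHealth(exampleScores)); // 91
console.log(weakestMetric(exampleScores)); // "Routing Accuracy"
```

Because Routing Accuracy carries a weight of 0.18, losing 50 points on it costs 9 points of composite health in this example.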
| ID | Playbook | Metric | Severity |
|---|---|---|---|
| A | Fixture Expansion | Memory Recall | WARNING |
| B | Graph-Boost Weight Tuning | Memory Recall | CRITICAL |
| C | Batch Wikilink Extraction | Graph Connectivity | WARNING |
| D | Entity Consolidation | Graph Connectivity | CRITICAL |
| E | Signal Weight Adjustment | Routing Accuracy | WARNING |
| F | Capability Keyword Expansion | Routing Accuracy | CRITICAL |
| G | Drift Threshold Adjustment | Eval Calibration | WARNING |
| H | Semantic Fixture Addition | Eval Calibration | CRITICAL |
| I | Batch Procedure Evolution | Procedure Freshness | WARNING |
| J | Procedure Regeneration | Procedure Freshness | CRITICAL |
| K | Failure Root-Cause Analysis | Episode Velocity | WARNING |
| L | Executor Health Check | Episode Velocity | CRITICAL |
| M | Skill Error Pattern Fix | Skill Effectiveness | WARNING |
| N | Tool Call Optimization | Skill Effectiveness | CRITICAL |
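Each metric has exactly one WARNING playbook and one CRITICAL playbook, so the table works as a two-key lookup. The sketch below encodes it that way. It is an illustration only: `selectPlaybook` is a hypothetical name and is not the `getPlaybook()` function in `prescribe.ts`, and how a metric's severity is determined is left to the caller.

```typescript
type Severity = "WARNING" | "CRITICAL";
type Playbook = { id: string; name: string };

// Rows copied from the playbook table, keyed by metric and then severity.
const PLAYBOOKS: Record<string, Record<Severity, Playbook>> = {
  "Memory Recall": {
    WARNING: { id: "A", name: "Fixture Expansion" },
    CRITICAL: { id: "B", name: "Graph-Boost Weight Tuning" },
  },
  "Graph Connectivity": {
    WARNING: { id: "C", name: "Batch Wikilink Extraction" },
    CRITICAL: { id: "D", name: "Entity Consolidation" },
  },
  "Routing Accuracy": {
    WARNING: { id: "E", name: "Signal Weight Adjustment" },
    CRITICAL: { id: "F", name: "Capability Keyword Expansion" },
  },
  "Eval Calibration": {
    WARNING: { id: "G", name: "Drift Threshold Adjustment" },
    CRITICAL: { id: "H", name: "Semantic Fixture Addition" },
  },
  "Procedure Freshness": {
    WARNING: { id: "I", name: "Batch Procedure Evolution" },
    CRITICAL: { id: "J", name: "Procedure Regeneration" },
  },
  "Episode Velocity": {
    WARNING: { id: "K", name: "Failure Root-Cause Analysis" },
    CRITICAL: { id: "L", name: "Executor Health Check" },
  },
  "Skill Effectiveness": {
    WARNING: { id: "M", name: "Skill Error Pattern Fix" },
    CRITICAL: { id: "N", name: "Tool Call Optimization" },
  },
};

// Hypothetical lookup: severity classification is supplied by the caller.
function selectPlaybook(metric: string, severity: Severity): Playbook {
  const entry = PLAYBOOKS[metric];
  if (!entry) throw new Error(`no playbook registered for metric: ${metric}`);
  return entry[severity];
}

console.log(selectPlaybook("Routing Accuracy", "WARNING").name); // "Signal Weight Adjustment"
```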
The governor prevents runaway self-modification:
- Approval gate: Playbooks marked ⚠️ require human approval before execution
- Schema protection: Never touches database migrations or structure
- Blast radius limit: Max 3 files modified per cycle
- Weight bounds: Routing/scoring weights can only change ±10% per cycle
- Regression detection: Any metric dropping >2% triggers automatic revert
- Audit trail: Every cycle stored as a memory episode with full metadata
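Several of these rules are simple numeric checks, sketched below. This is not the governor's actual code: the `CycleProposal` shape and function names are hypothetical, "±10%" is assumed to mean a relative change from the previous weight, and ">2%" is assumed to mean a drop of more than 2 points on the metric's score. Schema protection and the audit trail are not covered.

```typescript
// Hypothetical description of one proposed evolution cycle.
type CycleProposal = {
  filesModified: string[];
  requiresApproval: boolean; // the playbook is marked ⚠️
  approved: boolean;
  weightChanges: { name: string; before: number; after: number }[];
};

const MAX_FILES_PER_CYCLE = 3;
const MAX_WEIGHT_CHANGE = 0.10; // assumption: relative to the previous value
const REGRESSION_THRESHOLD = 2; // assumption: points on a 0-100 score
const EPSILON = 1e-9; // tolerance for floating-point rounding

// Returns every rule the proposal breaks; an empty list means it may run.
function governorViolations(p: CycleProposal): string[] {
  const violations: string[] = [];
  if (p.requiresApproval && !p.approved) {
    violations.push("awaiting human approval");
  }
  if (p.filesModified.length > MAX_FILES_PER_CYCLE) {
    violations.push(`modifies ${p.filesModified.length} files (max ${MAX_FILES_PER_CYCLE})`);
  }
  for (const w of p.weightChanges) {
    // A weight that starts at zero has no meaningful relative change.
    const relative =
      w.before === 0
        ? (w.after === 0 ? 0 : Infinity)
        : Math.abs(w.after - w.before) / Math.abs(w.before);
    if (relative > MAX_WEIGHT_CHANGE + EPSILON) {
      violations.push(`weight ${w.name} changes by more than 10%`);
    }
  }
  return violations;
}

// True if any metric present in both scorecards dropped past the threshold.
function shouldRevert(
  pre: Record<string, number>,
  post: Record<string, number>,
): boolean {
  return Object.keys(pre).some(
    (m) => m in post && pre[m] - post[m] > REGRESSION_THRESHOLD,
  );
}

console.log(shouldRevert({ "Memory Recall": 90 }, { "Memory Recall": 87 })); // true
```

Returning a list of violations rather than a single boolean keeps the reason for a blocked cycle available for the audit trail.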
Before building anything complex, the interview skill runs a Socratic dialogue to surface ambiguity. It only asks questions — it never promises to build. Once clarity reaches 80% across three dimensions (goal 40%, constraints 30%, success criteria 30%), it generates an immutable seed YAML.
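The 80% threshold and the 40/30/30 split can be expressed as a weighted sum. The sketch below assumes each dimension is scored between 0 and 1 and that ambiguity is simply `1 - clarity`; both are assumptions, and the names are hypothetical rather than taken from `interview.ts`.

```typescript
// Assumption: each dimension is scored from 0 (unclear) to 1 (fully clear).
type ClarityScores = {
  goal: number;
  constraints: number;
  successCriteria: number;
};

// Weights from the paragraph above: goal 40%, constraints 30%, success criteria 30%.
const CLARITY_WEIGHTS: ClarityScores = {
  goal: 0.4,
  constraints: 0.3,
  successCriteria: 0.3,
};
const CLARITY_THRESHOLD = 0.8;

function clarity(s: ClarityScores): number {
  return (
    s.goal * CLARITY_WEIGHTS.goal +
    s.constraints * CLARITY_WEIGHTS.constraints +
    s.successCriteria * CLARITY_WEIGHTS.successCriteria
  );
}

// Assumption: ambiguity is the complement of clarity.
function ambiguity(s: ClarityScores): number {
  return 1 - clarity(s);
}

// A seed may be generated once weighted clarity reaches the threshold.
function readyForSeed(s: ClarityScores): boolean {
  // Small tolerance so a score of exactly 0.80 is not lost to rounding.
  return clarity(s) >= CLARITY_THRESHOLD - 1e-9;
}

// A clear goal and constraints, but vague success criteria.
const draft: ClarityScores = { goal: 1, constraints: 1, successCriteria: 0.5 };
console.log(readyForSeed(draft)); // true
```

Because the goal carries the largest weight, a request with an unclear goal cannot reach the threshold on constraints and success criteria alone: the other two dimensions contribute at most 0.6.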
In conversation:
You: "I need better logging in the API."
Zo: "What does 'better' mean here — more log volume, structured formats, or centralized collection? Are there specific endpoints that are hard to debug today?"
You: "Structured JSON logs, mainly for the payment endpoints. We're flying blind on failed transactions."
Zo: "What's the retention requirement? And should these logs include PII like customer IDs, or do we need to mask those?"
(5–8 questions later, clarity ≥ 0.80)
Zo: "Requirements are clear. Generating seed spec..." → outputs seed-structured-logging.yaml
Ambiguity scoring:
bun Skills/spec-first-interview/scripts/interview.ts score --request "Make the site faster"
# → Ambiguity: 0.98 — HIGH AMBIGUITY
bun Skills/spec-first-interview/scripts/interview.ts score \
--request "Add Redis caching to /api/products with 5-min TTL, invalidate on product updates"
# → Ambiguity: 0.15 — READY
After building something, the eval pipeline verifies it against the seed spec in three progressive stages. Each stage is a gate: if it fails, you stop and fix before moving on.
In conversation:
You: "I finished the structured logging. Can you verify it?"
Zo: (Stage 1 — Mechanical) "Running lint, tests, type checks... all passing, 91% coverage. Moving to semantic evaluation."
Zo: (Stage 2 — Semantic) "Checking acceptance criteria against the seed: ✓ JSON format with timestamp, level, context fields ✓ Payment endpoints emit structured logs on failure ✗ PII masking not implemented — customer_id appears in plaintext Score: 0.74 — triggering consensus stage."
Zo: (Stage 3 — Consensus, 3 perspectives) Proposer: "Core logging works. PII issue is a one-line fix." Devil's Advocate: "Shipping unmasked PII to log aggregators is a compliance violation. This isn't a minor gap." Synthesizer: "NEEDS WORK. Fix PII masking, then resubmit."
| Stage | Cost | Checks |
|---|---|---|
| 1. Mechanical | $0 | Compile, lint, test, coverage |
| 2. Semantic | Low | AC compliance, goal alignment, drift score |
| 3. Consensus | Medium | 3-perspective deliberation (if drift > 0.3 or score in 0.7–0.8) |
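The trigger for Stage 3 in the last row can be written as a single predicate. The sketch below assumes the 0.7–0.8 band is inclusive at both ends; `needsConsensus` and `SemanticResult` are hypothetical names, not identifiers from `evaluate.ts`.

```typescript
// Hypothetical shape of a Stage 2 result.
type SemanticResult = {
  score: number; // acceptance-criteria score, 0 to 1
  drift: number; // drift from the seed spec, 0 to 1
};

// Stage 3 runs when drift is high or the score is borderline.
// Assumption: the 0.7-0.8 band includes both endpoints.
function needsConsensus(r: SemanticResult): boolean {
  const borderline = r.score >= 0.7 && r.score <= 0.8;
  return r.drift > 0.3 || borderline;
}

// The score from the conversation above falls in the borderline band.
console.log(needsConsensus({ score: 0.74, drift: 0.1 })); // true
// A high score with low drift skips the consensus stage.
console.log(needsConsensus({ score: 0.95, drift: 0.1 })); // false
```

Gating the most expensive stage this way means clear passes and clear failures never pay for the 3-perspective deliberation.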
bun Skills/three-stage-eval/scripts/evaluate.ts \
  --artifact ./my-project/ --seed ./seeds/seed-abc.yaml
When you're stuck (same error after multiple attempts, going in circles, or the eval keeps returning NEEDS WORK), the unstuck skill auto-selects a lateral-thinking persona based on your problem description.
In conversation:
You: "I keep hitting a 403 on the webhook endpoint. I've tried 5 different auth approaches."
Zo (auto-selects Hacker): "You're assuming you need direct API access. What if you delegated auth to a service account that already has the permissions? Or proxied through an internal endpoint that bypasses the external auth layer?"
You: "I don't understand why this returns stale data. The cache should have invalidated."
Zo (auto-selects Researcher): "Let's stop guessing. Have we confirmed the invalidation event actually fires? Let me check the event bus logs for the last 24 hours and compare timestamps against the cache TTL."
You: "This feature is ballooning. I'm touching 12 files just to add one field."
Zo (auto-selects Simplifier): "Drop the persistence layer for MVP. Store it in memory only. That eliminates 8 of those 12 files. Ship that, get feedback, then decide if you need the rest."
You: "Every change to the request handler breaks something else."
Zo (auto-selects Architect): "You're hitting high coupling. The handler mixes routing, validation, and response formatting. Extract validation into a middleware layer. Simple changes stop cascading."
You: "Should we even be building this? Nobody asked for it."
Zo (auto-selects Contrarian): "Then don't build it. Run a 5-question user survey first. If fewer than 30% say they want this, you just saved two weeks."
| Signal | Persona |
|---|---|
| "error", "can't", "constraint" | Hacker |
| "don't understand", "why" | Researcher |
| "too complex", "overwhelming" | Simplifier |
| "keeps breaking", "touching everything" | Architect |
| "wrong approach", "step back" | Contrarian |
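One simple way to implement the signal table is plain substring matching, sketched below. This is an assumption about the mechanism: the unstuck skill may select personas differently (for example, with a model), and `selectPersona` is a hypothetical name.

```typescript
type Persona = "Hacker" | "Researcher" | "Simplifier" | "Architect" | "Contrarian";

// Signal phrases copied from the table above.
const SIGNALS: [Persona, string[]][] = [
  ["Hacker", ["error", "can't", "constraint"]],
  ["Researcher", ["don't understand", "why"]],
  ["Simplifier", ["too complex", "overwhelming"]],
  ["Architect", ["keeps breaking", "touching everything"]],
  ["Contrarian", ["wrong approach", "step back"]],
];

// Picks the persona with the most matching signals; null if nothing matches.
// Ties go to the persona listed first in the table.
function selectPersona(problem: string): Persona | null {
  // Lowercase and normalize curly apostrophes so "can’t" matches "can't".
  const text = problem.toLowerCase().replace(/’/g, "'");
  let best: Persona | null = null;
  let bestHits = 0;
  for (const [persona, signals] of SIGNALS) {
    const hits = signals.filter((s) => text.includes(s)).length;
    if (hits > bestHits) {
      best = persona;
      bestHits = hits;
    }
  }
  return best;
}

console.log(selectPersona("I don't understand why this returns stale data.")); // "Researcher"
```

Substring matching is cheap but blunt: a description that uses none of the listed phrases returns `null`, so a caller would need a fallback.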
Zouroboros/
├── README.md
├── LICENSE
├── install.sh # Installer script
├── zouroboros.config.ts # Portable path configuration
├── skills/
│ ├── spec-first-interview/ # Socratic interview + seed generation
│ │ ├── SKILL.md
│ │ ├── scripts/interview.ts
│ │ └── references/
│ ├── three-stage-eval/ # 3-stage verification pipeline
│ │ ├── SKILL.md
│ │ ├── scripts/evaluate.ts
│ │ └── references/
│ ├── unstuck-lateral/ # 5 lateral-thinking personas
│ │ ├── SKILL.md
│ │ └── references/
│ ├── autoloop/ # Autonomous metric optimization loop
│ │ ├── SKILL.md
│ │ ├── scripts/autoloop.ts
│ │ ├── assets/template.program.md
│ │ └── references/
│ ├── zouroboros-introspect/ # Self-diagnostic scorecard
│ │ ├── SKILL.md
│ │ ├── scripts/introspect.ts
│ │ ├── scripts/skill-tracker.ts # Skill execution recorder
│ │ └── references/metric-thresholds.md
│ ├── zouroboros-prescribe/ # Self-prescription engine
│ │ ├── SKILL.md
│ │ ├── scripts/prescribe.ts
│ │ └── references/playbooks.md
│ └── zouroboros-evolve/ # Evolution executor
│ ├── SKILL.md
│ └── scripts/evolve.ts
└── personas/
├── zouroboros.md # Self-enhancement persona template
├── unstuck-hacker.md
├── unstuck-researcher.md
├── unstuck-simplifier.md
├── unstuck-architect.md
└── unstuck-contrarian.md
┌─────────────────────────────────────────────────────────────┐
│ USER (Approval) │
│ • Reviews high-risk prescriptions │
│ • Receives daily email scorecards │
│ • Can override governor, adjust thresholds │
└────────────────────────┬────────────────────────────────────┘
│
┌───────────────┼───────────────┐
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ INTROSPECT │→│ PRESCRIBE │→│ EVOLVE │
│ (measure) │ │ (plan) │ │ (execute) │
│ │ │ │ │ │
│ 7 metrics │ │ 14 playbooks│ │ Autoloop or │
│ Composite │ │ Governor │ │ Script mode │
│ score 0-100 │ │ Seed YAML │ │ Pre/post │
│ │ │ Program.md │ │ scorecard │
└──────┬──────┘ └─────────────┘ └──────┬──────┘
│ │
│ ┌─────────────┐ │
└───────→│ MEMORY │←────────┘
│ │
│ Facts │
│ Episodes │
│ Procedures │
│ Graph │
└─────────────┘
Zouroboros builds on these Zo Computer subsystems:
| System | Role | Required? |
|---|---|---|
| zo-memory-system | Facts, episodes, procedures, graph, embeddings | Yes |
| zo-swarm-orchestrator | Parallel task execution with 6-signal routing | For routing metrics |
| autoloop | Single-metric file optimization (bundled) | For file-targeting playbooks |
| Ollama | Local inference (memory gate, auto-capture, procedure evolution) | For memory gate |
- Add a collector function in `introspect.ts` (follow the `measureMemoryRecall` pattern)
- Add thresholds in `references/metric-thresholds.md`
- Add playbooks in `references/playbooks.md` (WARNING + CRITICAL variants)
- Register the playbook in `prescribe.ts`'s `getPlaybook()` switch
- Add an executor in `evolve.ts` if the playbook uses script mode
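As a rough illustration of what a collector function might look like, the sketch below computes the Procedure Freshness metric from the metric table (stale ratio at 14+ days, target ≤ 30%). The `MetricResult` shape, the function signature, and the handling of empty input are all assumptions; the real `measureMemoryRecall` pattern in `introspect.ts` may be quite different.

```typescript
// Hypothetical result shape; the collectors in introspect.ts may differ.
type MetricResult = {
  name: string;
  value: number | null; // null means no data (reported as N/A)
  healthy: boolean;
};

const STALE_AFTER_DAYS = 14; // "14+ days" from the metric table
const MAX_STALE_RATIO = 0.3; // target: at most 30% stale
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Takes the last-updated timestamp of each procedure and returns the stale ratio.
function measureProcedureFreshness(
  lastUpdated: Date[],
  now: Date = new Date(),
): MetricResult {
  if (lastUpdated.length === 0) {
    // Assumption: no procedures means the metric cannot be measured.
    return { name: "Procedure Freshness", value: null, healthy: false };
  }
  const stale = lastUpdated.filter(
    (d) => (now.getTime() - d.getTime()) / MS_PER_DAY >= STALE_AFTER_DAYS,
  ).length;
  const ratio = stale / lastUpdated.length;
  return {
    name: "Procedure Freshness",
    value: ratio,
    healthy: ratio <= MAX_STALE_RATIO,
  };
}

// One of four procedures is older than 14 days: ratio 0.25, within target.
const sampleResult = measureProcedureFreshness(
  [
    new Date("2024-01-30T00:00:00Z"),
    new Date("2024-01-28T00:00:00Z"),
    new Date("2024-01-25T00:00:00Z"),
    new Date("2024-01-01T00:00:00Z"),
  ],
  new Date("2024-01-31T00:00:00Z"),
);
console.log(sampleResult.value, sampleResult.healthy); // 0.25 true
```

Accepting `now` as a parameter keeps the collector deterministic, which makes it straightforward to cover with fixed-date fixtures.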
- Define it in `references/playbooks.md` with target file, metric command, and constraints
- Add it to the `getPlaybook()` registry in `prescribe.ts`
- If script mode: add a case in `evolve.ts`'s execution switch
- If file mode: autoloop handles it automatically via program.md
Edit references/metric-thresholds.md and the corresponding constants in introspect.ts. The system will auto-calibrate — if composite stays above 90 for 2+ weeks, tighten targets.
Adapted from Q00/ouroboros by @Q00 — a specification-first AI development system. Also inspired by potentialInc/claude-ooo and karpathy/autoresearch patterns.
Self-enhancement architecture designed and built on Zo Computer.
MIT — see LICENSE.
