marlandoj/zouroboros-seedkit

🐍 Zouroboros

Self-learning AI development skills for Zo Computer.

Zouroboros is a self-enhancing AI development toolkit. It starts with specification-first development — Socratic interviews, immutable seed specs, and 3-stage evaluation. Then it closes the loop: the system diagnoses its own health, prescribes improvements, executes them autonomously, and verifies the results.

The snake eats its own tail.

Zouroboros Self-Enhancement Pipeline

Adapted from Q00/ouroboros. Native TypeScript/Bun, zero Python dependencies, designed as Zo Skills.


What's Included

Foundational Skills

| Skill | Description |
| --- | --- |
| spec-first-interview | Socratic interview → ambiguity scoring → immutable seed YAML |
| three-stage-eval | Mechanical → Semantic → Consensus verification pipeline |
| unstuck-lateral | 5 lateral-thinking personas to break through stagnation |
| autoloop | Autonomous single-metric file optimization loop (inspired by karpathy/autoresearch) |

Self-Enhancement Skills (the closed loop)

| Skill | Description |
| --- | --- |
| zouroboros-introspect | Self-diagnostic health scorecard across 7 system metrics |
| zouroboros-prescribe | Auto-generates improvement seeds from scorecard, with governor safety gate |
| zouroboros-evolve | Executes prescriptions, measures delta, reverts regressions |

Personas

| Persona | Purpose |
| --- | --- |
| Zouroboros | The self-enhancement engine: clinical, metric-driven, autonomous |
| Hacker | Break past constraints creatively |
| Researcher | Stop and investigate systematically |
| Simplifier | Cut to MVP ruthlessly |
| Architect | Fix structural problems |
| Contrarian | Question the problem itself |

Install

Zo Computer (recommended)

git clone https://github.com/marlandoj/Zo-Ouroboros.git /tmp/zouroboros
bash /tmp/zouroboros/install.sh
rm -rf /tmp/zouroboros

One-liner

git clone https://github.com/marlandoj/Zo-Ouroboros.git /tmp/zouroboros && bash /tmp/zouroboros/install.sh && rm -rf /tmp/zouroboros

Environment Variables (optional)

| Variable | Default | Description |
| --- | --- | --- |
| ZOUROBOROS_WORKSPACE | $HOME | Root workspace path |
| ZOUROBOROS_SKILLS_DIR | $HOME/Skills | Where skills are installed |
| ZOUROBOROS_IDENTITY_DIR | $HOME/IDENTITY | Where persona files go |
| ZOUROBOROS_SEEDS_DIR | $HOME/Seeds/zouroboros | Where prescriptions are saved |
| ZOUROBOROS_MEMORY_DB | $HOME/.zo/memory/shared-facts.db | SQLite memory database |
| ZOUROBOROS_MEMORY_SCRIPTS | $HOME/Skills/zo-memory-system/scripts | Memory system CLI |
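
To make the override behavior concrete, here is a minimal sketch of how the variables above could be resolved against their defaults. The variable names and default paths come from the table. The `resolvePaths` function and the `ZouroborosPaths` shape are hypothetical and may differ from what `zouroboros.config.ts` actually does.

```typescript
// Hypothetical resolver: variable names and defaults mirror the table,
// but this is an illustration, not the project's zouroboros.config.ts.
type Env = Record<string, string | undefined>;

interface ZouroborosPaths {
  workspace: string;
  skillsDir: string;
  identityDir: string;
  seedsDir: string;
  memoryDb: string;
  memoryScripts: string;
}

function resolvePaths(env: Env): ZouroborosPaths {
  const home = env["HOME"] ?? "";
  // Use the variable when it is set and non-empty, otherwise the default.
  const pick = (name: string, fallback: string): string => {
    const value = env[name];
    return value !== undefined && value !== "" ? value : fallback;
  };
  return {
    workspace: pick("ZOUROBOROS_WORKSPACE", home),
    skillsDir: pick("ZOUROBOROS_SKILLS_DIR", `${home}/Skills`),
    identityDir: pick("ZOUROBOROS_IDENTITY_DIR", `${home}/IDENTITY`),
    seedsDir: pick("ZOUROBOROS_SEEDS_DIR", `${home}/Seeds/zouroboros`),
    memoryDb: pick("ZOUROBOROS_MEMORY_DB", `${home}/.zo/memory/shared-facts.db`),
    memoryScripts: pick(
      "ZOUROBOROS_MEMORY_SCRIPTS",
      `${home}/Skills/zo-memory-system/scripts`,
    ),
  };
}
```

Under Bun or Node, passing `process.env` as the argument would read the real environment. Taking the environment as a parameter keeps the function easy to test.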

Prerequisites

  • Bun runtime (v1.0+)
  • SQLite3 CLI
  • zo-memory-system skill (for memory DB, graph, episodes, procedures)
  • Optional: Ollama with qwen2.5:1.5b + nomic-embed-text (for memory gate + embeddings)
  • Autoloop skill (bundled — for file-targeting metric optimization)

Quick Start

1. Score a request

bun Skills/spec-first-interview/scripts/interview.ts score \
  --request "Add rate limiting to the API"

2. Run the self-diagnostic

bun Skills/zouroboros-introspect/scripts/introspect.ts --verbose

Output:

╔════════════════════════════════════════════════════════╗
║  ZOUROBOROS INTROSPECTION SCORECARD                     ║
╠════════════════════════════════════════════════════════╣
║  ✅ Memory Recall          100.0% → score:100%  ║
║  ✅ Graph Connectivity      90.0% → score:100%  ║
║  ⚠️  Routing Accuracy        N/A   → score: 50%  ║
║  ⚠️  Eval Calibration        N/A   → score: 50%  ║
║  ⚠️  Procedure Freshness     N/A   → score: 50%  ║
║  ⚠️  Episode Velocity        N/A   → score: 50%  ║
╠════════════════════════════════════════════════════════╣
║  ⚠️  COMPOSITE HEALTH: 70/100                          ║
║     Weakest: Routing Accuracy                           ║
╚════════════════════════════════════════════════════════╝

3. Run the full self-enhancement pipeline

# Introspect → identify weakest metric
bun Skills/zouroboros-introspect/scripts/introspect.ts --json > /tmp/scorecard.json

# Prescribe → generate improvement seed
bun Skills/zouroboros-prescribe/scripts/prescribe.ts --scorecard /tmp/scorecard.json

# Evolve → execute the prescription
bun Skills/zouroboros-evolve/scripts/evolve.ts --prescription Seeds/zouroboros/rx-*.json

4. Schedule it (Zo Computer)

Create a scheduled agent that runs the pipeline daily. The Zouroboros persona handles the rest autonomously — diagnosing, prescribing, executing, and reporting via email.


The Self-Enhancement Loop

How It Works

  1. Introspect — Measures 7 health metrics across memory, graph, routing, eval, procedures, episode velocity, and skill effectiveness. Outputs a composite score (0–100) and ranks improvement opportunities.

  2. Prescribe — Maps the weakest metric to one of 14 playbooks. Generates a seed YAML (spec-first format) and optionally a program.md (autoloop format). A governor gate blocks high-risk prescriptions.

  3. Evolve — Executes the prescription via autoloop (file-targeting) or script mode (procedural). Captures pre/post scorecards. Reverts on regression.

7 Health Metrics

| Metric | Source | Target | Weight |
| --- | --- | --- | --- |
| Memory Recall | Continuation eval fixture pass rate | ≥ 85% | 0.22 |
| Graph Connectivity | Knowledge graph orphan fact ratio | ≥ 80% linked | 0.14 |
| Routing Accuracy | Swarm episode success rate | ≥ 85% | 0.18 |
| Eval Calibration | Stage 3 override rate | ≤ 15% | 0.14 |
| Procedure Freshness | Stale procedure ratio (14+ days) | ≤ 30% | 0.14 |
| Episode Velocity | 7-day success trend vs prior 7 days | Positive | 0.08 |
| Skill Effectiveness | Per-skill success rate from skill_executions | ≥ 85% | 0.10 |
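
The weights above sum to 1.00, so a composite score can be read as a weighted average of per-metric scores. The sketch below illustrates that arithmetic. It assumes each metric is already normalized to a 0-100 score and that a metric with no data scores 50 (as the N/A rows in the sample scorecard suggest). Ranking by weighted shortfall is also an assumption, and `compositeHealth` is a hypothetical name, not a function from `introspect.ts`.

```typescript
// Weights copied from the metrics table; they sum to 1.00.
const WEIGHTS = {
  "Memory Recall": 0.22,
  "Graph Connectivity": 0.14,
  "Routing Accuracy": 0.18,
  "Eval Calibration": 0.14,
  "Procedure Freshness": 0.14,
  "Episode Velocity": 0.08,
  "Skill Effectiveness": 0.10,
} as const;

type MetricName = keyof typeof WEIGHTS;
type Scores = Partial<Record<MetricName, number>>;

// Assumption: metrics without data fall back to a neutral score of 50.
const NO_DATA_SCORE = 50;

function compositeHealth(scores: Scores): { composite: number; weakest: MetricName } {
  const names = Object.keys(WEIGHTS) as MetricName[];
  let total = 0;
  let weakest: MetricName = "Memory Recall";
  let worstShortfall = -1;
  for (const name of names) {
    const score = scores[name] ?? NO_DATA_SCORE;
    total += WEIGHTS[name] * score;
    // Assumption: rank opportunities by how much composite each metric is costing.
    const shortfall = WEIGHTS[name] * (100 - score);
    if (shortfall > worstShortfall) {
      worstShortfall = shortfall;
      weakest = name;
    }
  }
  // Round to one decimal place to hide floating-point noise.
  return { composite: Math.round(total * 10) / 10, weakest };
}

const example = compositeHealth({ "Memory Recall": 100, "Graph Connectivity": 100 });
console.log(example.composite, example.weakest);
```

The sample scorecard earlier in this README lists only six metrics, so its composite figure need not match what this sketch computes.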

14 Playbooks

| ID | Playbook | Metric | Severity |
| --- | --- | --- | --- |
| A | Fixture Expansion | Memory Recall | WARNING |
| B | Graph-Boost Weight Tuning | Memory Recall | CRITICAL |
| C | Batch Wikilink Extraction | Graph Connectivity | WARNING |
| D | Entity Consolidation | Graph Connectivity | CRITICAL |
| E | Signal Weight Adjustment | Routing Accuracy | WARNING |
| F | Capability Keyword Expansion | Routing Accuracy | CRITICAL ⚠️ |
| G | Drift Threshold Adjustment | Eval Calibration | WARNING |
| H | Semantic Fixture Addition | Eval Calibration | CRITICAL ⚠️ |
| I | Batch Procedure Evolution | Procedure Freshness | WARNING |
| J | Procedure Regeneration | Procedure Freshness | CRITICAL |
| K | Failure Root-Cause Analysis | Episode Velocity | WARNING |
| L | Executor Health Check | Episode Velocity | CRITICAL ⚠️ |
| M | Skill Error Pattern Fix | Skill Effectiveness | WARNING ⚠️ |
| N | Tool Call Optimization | Skill Effectiveness | CRITICAL ⚠️ |

⚠️ = Requires human approval (governor blocks autonomous execution)
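
Each metric maps to one WARNING and one CRITICAL playbook, so selection amounts to a lookup on the (metric, severity) pair. The sketch below encodes the table as data. The `selectPlaybook` helper and the `Playbook` shape are hypothetical (the real registry is the `getPlaybook()` switch in `prescribe.ts`), and how severity is derived from a score is not shown here because the README does not specify it.

```typescript
type Severity = "WARNING" | "CRITICAL";

interface Playbook {
  id: string;
  name: string;
  metric: string;
  severity: Severity;
  needsApproval: boolean; // true for rows marked with the warning sign
}

// Rows transcribed from the playbook table.
const PLAYBOOKS: Playbook[] = [
  { id: "A", name: "Fixture Expansion", metric: "Memory Recall", severity: "WARNING", needsApproval: false },
  { id: "B", name: "Graph-Boost Weight Tuning", metric: "Memory Recall", severity: "CRITICAL", needsApproval: false },
  { id: "C", name: "Batch Wikilink Extraction", metric: "Graph Connectivity", severity: "WARNING", needsApproval: false },
  { id: "D", name: "Entity Consolidation", metric: "Graph Connectivity", severity: "CRITICAL", needsApproval: false },
  { id: "E", name: "Signal Weight Adjustment", metric: "Routing Accuracy", severity: "WARNING", needsApproval: false },
  { id: "F", name: "Capability Keyword Expansion", metric: "Routing Accuracy", severity: "CRITICAL", needsApproval: true },
  { id: "G", name: "Drift Threshold Adjustment", metric: "Eval Calibration", severity: "WARNING", needsApproval: false },
  { id: "H", name: "Semantic Fixture Addition", metric: "Eval Calibration", severity: "CRITICAL", needsApproval: true },
  { id: "I", name: "Batch Procedure Evolution", metric: "Procedure Freshness", severity: "WARNING", needsApproval: false },
  { id: "J", name: "Procedure Regeneration", metric: "Procedure Freshness", severity: "CRITICAL", needsApproval: false },
  { id: "K", name: "Failure Root-Cause Analysis", metric: "Episode Velocity", severity: "WARNING", needsApproval: false },
  { id: "L", name: "Executor Health Check", metric: "Episode Velocity", severity: "CRITICAL", needsApproval: true },
  { id: "M", name: "Skill Error Pattern Fix", metric: "Skill Effectiveness", severity: "WARNING", needsApproval: true },
  { id: "N", name: "Tool Call Optimization", metric: "Skill Effectiveness", severity: "CRITICAL", needsApproval: true },
];

// Hypothetical lookup: returns undefined for an unknown metric.
function selectPlaybook(metric: string, severity: Severity): Playbook | undefined {
  return PLAYBOOKS.find((p) => p.metric === metric && p.severity === severity);
}

const chosen = selectPlaybook("Routing Accuracy", "CRITICAL");
console.log(chosen?.id, chosen?.needsApproval);
```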

Governor Safety Rules

The governor prevents runaway self-modification:

  1. Approval gate — Playbooks marked ⚠️ require human approval before execution
  2. Schema protection — Never touches database migrations or structure
  3. Blast radius limit — Max 3 files modified per cycle
  4. Weight bounds — Routing/scoring weights can only change ±10% per cycle
  5. Regression detection — Any metric dropping >2% triggers automatic revert
  6. Audit trail — Every cycle stored as a memory episode with full metadata
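
As a rough illustration of rules 1, 3, 4, and 5, the sketch below checks a proposed cycle before execution and compares scorecards afterwards. All names (`CycleProposal`, `governorPreflight`, `shouldRevert`) are hypothetical. It assumes the ±10% bound is relative to the previous weight and that ">2%" means more than 2 points on a 0-100 metric score. Schema protection and the audit trail are left out because the README does not say how they are implemented.

```typescript
interface WeightChange {
  name: string;
  before: number;
  after: number;
}

interface CycleProposal {
  needsApproval: boolean;   // playbook is marked as approval-gated
  approvedByHuman: boolean;
  filesTouched: string[];
  weightChanges: WeightChange[];
}

const MAX_FILES_PER_CYCLE = 3;
const MAX_WEIGHT_CHANGE = 0.10;  // assumption: relative to the previous value
const MAX_METRIC_DROP = 2;       // assumption: points on a 0-100 scale
const EPSILON = 1e-9;            // tolerance for floating-point comparisons

// Returns a list of violations; an empty list means the cycle may proceed.
function governorPreflight(p: CycleProposal): string[] {
  const violations: string[] = [];
  if (p.needsApproval && !p.approvedByHuman) {
    violations.push("approval gate: human approval required");
  }
  if (p.filesTouched.length > MAX_FILES_PER_CYCLE) {
    violations.push(`blast radius: ${p.filesTouched.length} files exceeds ${MAX_FILES_PER_CYCLE}`);
  }
  for (const w of p.weightChanges) {
    const relative = Math.abs(w.after - w.before) / Math.abs(w.before);
    if (relative > MAX_WEIGHT_CHANGE + EPSILON) {
      violations.push(`weight bounds: ${w.name} changed by more than 10%`);
    }
  }
  return violations;
}

// True when any metric present in both scorecards dropped by more than the limit.
function shouldRevert(pre: Record<string, number>, post: Record<string, number>): boolean {
  return Object.keys(pre).some((name) => {
    const before = pre[name];
    const after = post[name];
    if (before === undefined || after === undefined) return false;
    return before - after > MAX_METRIC_DROP;
  });
}
```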

Foundational Skills

Spec-First Interview

Before building anything complex, the interview skill runs a Socratic dialogue to surface ambiguity. It only asks questions — it never promises to build. Once clarity reaches 80%, computed as a weighted score across three dimensions (goal 40%, constraints 30%, success criteria 30%), it generates an immutable seed YAML.

In conversation:

You: "I need better logging in the API."

Zo: "What does 'better' mean here — more log volume, structured formats, or centralized collection? Are there specific endpoints that are hard to debug today?"

You: "Structured JSON logs, mainly for the payment endpoints. We're flying blind on failed transactions."

Zo: "What's the retention requirement? And should these logs include PII like customer IDs, or do we need to mask those?"

(5–8 questions later, clarity ≥ 0.80)

Zo: "Requirements are clear. Generating seed spec..." → outputs seed-structured-logging.yaml

Ambiguity scoring:

bun Skills/spec-first-interview/scripts/interview.ts score --request "Make the site faster"
# → Ambiguity: 0.98 — HIGH AMBIGUITY

bun Skills/spec-first-interview/scripts/interview.ts score \
  --request "Add Redis caching to /api/products with 5-min TTL, invalidate on product updates"
# → Ambiguity: 0.15 — READY
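
The 40/30/30 weighting described above can be written out as a small calculation. This is only a sketch: it assumes each dimension is already scored between 0 and 1 (how `interview.ts` derives those scores is not covered here) and that ambiguity is simply 1 minus clarity, which is consistent with the 0.15 READY example. The names `ClarityScores`, `weightedClarity`, and `isReady` are hypothetical.

```typescript
interface ClarityScores {
  goal: number;            // 0..1
  constraints: number;     // 0..1
  successCriteria: number; // 0..1
}

const READY_THRESHOLD = 0.8;
const EPSILON = 1e-9; // tolerance for floating-point comparisons

// Weighted clarity using the 40% / 30% / 30% split.
function weightedClarity(s: ClarityScores): number {
  return 0.4 * s.goal + 0.3 * s.constraints + 0.3 * s.successCriteria;
}

// Assumption: ambiguity is the complement of clarity.
function ambiguity(s: ClarityScores): number {
  return 1 - weightedClarity(s);
}

function isReady(s: ClarityScores): boolean {
  return weightedClarity(s) >= READY_THRESHOLD - EPSILON;
}

const request: ClarityScores = { goal: 1.0, constraints: 0.8, successCriteria: 0.7 };
console.log(weightedClarity(request).toFixed(2), isReady(request));
```

Here the goal is fully clear while constraints and success criteria are partly specified, giving 0.4 + 0.24 + 0.21 = 0.85 clarity, which clears the 0.80 threshold.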

Three-Stage Evaluation

After building something, the eval pipeline verifies it against the seed spec in three progressive stages. Each stage is a gate — if it fails, you stop and fix before moving on.

In conversation:

You: "I finished the structured logging. Can you verify it?"

Zo: (Stage 1 — Mechanical) "Running lint, tests, type checks... all passing, 91% coverage. Moving to semantic evaluation."

Zo: (Stage 2 — Semantic) "Checking acceptance criteria against the seed:
  ✓ JSON format with timestamp, level, context fields
  ✓ Payment endpoints emit structured logs on failure
  ✗ PII masking not implemented — customer_id appears in plaintext
Score: 0.74 — triggering consensus stage."

Zo: (Stage 3 — Consensus, 3 perspectives)
  Proposer: "Core logging works. PII issue is a one-line fix."
  Devil's Advocate: "Shipping unmasked PII to log aggregators is a compliance violation. This isn't a minor gap."
  Synthesizer: "NEEDS WORK. Fix PII masking, then resubmit."

| Stage | Cost | Checks |
| --- | --- | --- |
| 1. Mechanical | $0 | Compile, lint, test, coverage |
| 2. Semantic | Low | AC compliance, goal alignment, drift score |
| 3. Consensus | Medium | 3-perspective deliberation (if drift > 0.3 or score in 0.7–0.8) |

bun Skills/three-stage-eval/scripts/evaluate.ts \
  --artifact ./my-project/ --seed ./seeds/seed-abc.yaml
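
The gating between stages can be sketched as a small decision function. It follows the trigger stated for Stage 3 (drift above 0.3, or a semantic score between 0.7 and 0.8), and assumes the score bounds are inclusive. The types and function names are hypothetical and do not describe the internals of `evaluate.ts`.

```typescript
interface SemanticResult {
  score: number; // 0..1, acceptance-criteria compliance
  drift: number; // 0..1, distance from the seed's intent
}

type NextStep = "fix-mechanical" | "run-consensus" | "report-semantic";

// Stage 3 trigger: high drift, or a borderline semantic score.
function needsConsensus(r: SemanticResult): boolean {
  const borderline = r.score >= 0.7 && r.score <= 0.8; // assumption: inclusive bounds
  return r.drift > 0.3 || borderline;
}

// Each stage is a gate: a mechanical failure stops the pipeline before Stage 2.
function nextStep(mechanicalPassed: boolean, semantic: SemanticResult): NextStep {
  if (!mechanicalPassed) return "fix-mechanical";
  return needsConsensus(semantic) ? "run-consensus" : "report-semantic";
}

console.log(nextStep(true, { score: 0.74, drift: 0.1 }));
```

With the 0.74 score from the conversation above, the sketch routes to the consensus stage, matching the dialogue.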

Unstuck Lateral

When you're stuck — same error after multiple attempts, going in circles, or the eval keeps returning NEEDS WORK — the unstuck skill auto-selects a lateral-thinking persona based on your problem description.

In conversation:

You: "I keep hitting a 403 on the webhook endpoint. I've tried 5 different auth approaches."

Zo (auto-selects Hacker): "You're assuming you need direct API access. What if you delegated auth to a service account that already has the permissions? Or proxied through an internal endpoint that bypasses the external auth layer?"

You: "I don't understand why this returns stale data. The cache should have invalidated."

Zo (auto-selects Researcher): "Let's stop guessing. Have we confirmed the invalidation event actually fires? Let me check the event bus logs for the last 24 hours and compare timestamps against the cache TTL."

You: "This feature is ballooning. I'm touching 12 files just to add one field."

Zo (auto-selects Simplifier): "Drop the persistence layer for MVP. Store it in memory only. That eliminates 8 of those 12 files. Ship that, get feedback, then decide if you need the rest."

You: "Every change to the request handler breaks something else."

Zo (auto-selects Architect): "You're hitting high coupling. The handler mixes routing, validation, and response formatting. Extract validation into a middleware layer. Simple changes stop cascading."

You: "Should we even be building this? Nobody asked for it."

Zo (auto-selects Contrarian): "Then don't build it. Run a 5-question user survey first. If fewer than 30% say they want this, you just saved two weeks."

| Signal | Persona |
| --- | --- |
| "error", "can't", "constraint" | Hacker |
| "don't understand", "why" | Researcher |
| "too complex", "overwhelming" | Simplifier |
| "keeps breaking", "touching everything" | Architect |
| "wrong approach", "step back" | Contrarian |

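A naive reading of the signal table is plain keyword matching: count how many signal phrases for each persona appear in the problem description and pick the persona with the most hits. The sketch below does exactly that. It is an assumption about the mechanism, not the skill's actual selection logic, which evidently handles phrasing beyond these keywords (several of the conversation examples contain none of them). `selectPersona` is a hypothetical name.

```typescript
type Persona = "Hacker" | "Researcher" | "Simplifier" | "Architect" | "Contrarian";

// Signal phrases transcribed from the table.
const SIGNALS: Record<Persona, string[]> = {
  Hacker: ["error", "can't", "constraint"],
  Researcher: ["don't understand", "why"],
  Simplifier: ["too complex", "overwhelming"],
  Architect: ["keeps breaking", "touching everything"],
  Contrarian: ["wrong approach", "step back"],
};

// Returns the persona with the most matching signals, or undefined if none match.
// Ties go to the persona listed first in SIGNALS.
function selectPersona(problem: string): Persona | undefined {
  const text = problem.toLowerCase();
  let best: Persona | undefined;
  let bestHits = 0;
  for (const persona of Object.keys(SIGNALS) as Persona[]) {
    const hits = SIGNALS[persona].filter((signal) => text.includes(signal)).length;
    if (hits > bestHits) {
      bestHits = hits;
      best = persona;
    }
  }
  return best;
}

console.log(selectPersona("I don't understand why this returns stale data."));
```

Substring matching is deliberately crude: a short signal such as "why" will also match inside longer words, so a real implementation would likely tokenize or weight the signals.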
File Structure

Zouroboros/
├── README.md
├── LICENSE
├── install.sh                          # Installer script
├── zouroboros.config.ts                # Portable path configuration
├── skills/
│   ├── spec-first-interview/           # Socratic interview + seed generation
│   │   ├── SKILL.md
│   │   ├── scripts/interview.ts
│   │   └── references/
│   ├── three-stage-eval/               # 3-stage verification pipeline
│   │   ├── SKILL.md
│   │   ├── scripts/evaluate.ts
│   │   └── references/
│   ├── unstuck-lateral/                # 5 lateral-thinking personas
│   │   ├── SKILL.md
│   │   └── references/
│   ├── autoloop/                       # Autonomous metric optimization loop
│   │   ├── SKILL.md
│   │   ├── scripts/autoloop.ts
│   │   ├── assets/template.program.md
│   │   └── references/
│   ├── zouroboros-introspect/          # Self-diagnostic scorecard
│   │   ├── SKILL.md
│   │   ├── scripts/introspect.ts
│   │   ├── scripts/skill-tracker.ts   # Skill execution recorder
│   │   └── references/metric-thresholds.md
│   ├── zouroboros-prescribe/           # Self-prescription engine
│   │   ├── SKILL.md
│   │   ├── scripts/prescribe.ts
│   │   └── references/playbooks.md
│   └── zouroboros-evolve/              # Evolution executor
│       ├── SKILL.md
│       └── scripts/evolve.ts
└── personas/
    ├── zouroboros.md                   # Self-enhancement persona template
    ├── unstuck-hacker.md
    ├── unstuck-researcher.md
    ├── unstuck-simplifier.md
    ├── unstuck-architect.md
    └── unstuck-contrarian.md

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      USER (Approval)                         │
│   • Reviews high-risk prescriptions                          │
│   • Receives daily email scorecards                          │
│   • Can override governor, adjust thresholds                 │
└────────────────────────┬────────────────────────────────────┘
                         │
         ┌───────────────┼───────────────┐
         ▼               ▼               ▼
  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
  │  INTROSPECT │→│  PRESCRIBE  │→│   EVOLVE    │
  │  (measure)  │ │  (plan)     │ │  (execute)  │
  │             │ │             │ │             │
  │ 7 metrics   │ │ 14 playbooks│ │ Autoloop or │
  │ Composite   │ │ Governor    │ │ Script mode │
  │ score 0-100 │ │ Seed YAML   │ │ Pre/post    │
  │             │ │ Program.md  │ │ scorecard   │
  └──────┬──────┘ └─────────────┘ └──────┬──────┘
         │                                │
         │        ┌─────────────┐         │
         └───────→│   MEMORY    │←────────┘
                  │             │
                  │ Facts       │
                  │ Episodes    │
                  │ Procedures  │
                  │ Graph       │
                  └─────────────┘

Dependencies

Zouroboros builds on these Zo Computer subsystems:

| System | Role | Required? |
| --- | --- | --- |
| zo-memory-system | Facts, episodes, procedures, graph, embeddings | Yes |
| zo-swarm-orchestrator | Parallel task execution with 6-signal routing | For routing metrics |
| autoloop | Single-metric file optimization (bundled) | For file-targeting playbooks |
| Ollama | Local inference (memory gate, auto-capture, procedure evolution) | For memory gate |

Extending

Adding a New Metric

  1. Add a collector function in introspect.ts (follow the measureMemoryRecall pattern)
  2. Add thresholds in references/metric-thresholds.md
  3. Add playbooks in references/playbooks.md (WARNING + CRITICAL variants)
  4. Register the playbook in prescribe.ts's getPlaybook() switch
  5. Add an executor in evolve.ts if the playbook uses script mode

Adding a New Playbook

  1. Define it in references/playbooks.md with target file, metric command, and constraints
  2. Add it to the getPlaybook() registry in prescribe.ts
  3. If script mode: add a case in evolve.ts's execution switch
  4. If file mode: autoloop handles it automatically via program.md

Adjusting Thresholds

Edit references/metric-thresholds.md and the corresponding constants in introspect.ts. The system will auto-calibrate — if composite stays above 90 for 2+ weeks, tighten targets.
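
The "above 90 for 2+ weeks" rule can be expressed as a check over recent composite scores. The sketch below assumes one composite score is recorded per day (in line with the daily scheduled run suggested earlier) and treats two weeks as 14 consecutive samples. `shouldTightenTargets` is a hypothetical helper, not part of `introspect.ts`.

```typescript
// Returns true when the most recent `days` composite scores are all above `floor`.
// Assumption: `dailyComposite` holds one score per day, oldest first.
function shouldTightenTargets(
  dailyComposite: number[],
  days: number = 14,
  floor: number = 90,
): boolean {
  if (dailyComposite.length < days) return false; // not enough history yet
  return dailyComposite.slice(-days).every((score) => score > floor);
}

const history = [88, 91, 92, 93, 94, 95, 93, 92, 94, 96, 95, 93, 92, 94, 95];
console.log(shouldTightenTargets(history));
```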


Credits

Adapted from Q00/ouroboros by @Q00 — a specification-first AI development system. Also inspired by potentialInc/claude-ooo and karpathy/autoresearch patterns.

Self-enhancement architecture designed and built on Zo Computer.


License

MIT — see LICENSE.

About

Specification-first AI development skills for Zo Computer. Socratic interview → seed spec → 3-stage evaluation → unstuck personas. Adapted from Q00/ouroboros.
