ouroboros-rs

Spec-driven evolutionary workflow engine. Socratic interview, seed crystallization, Double Diamond execution, 3-stage evaluation, and evolutionary loop with convergence detection.

Inspired by ouroboros. Reimplemented from scratch in Rust.

Why This Exists

The original ouroboros (Python) treats specifications as an evolving ontology rather than static documents. It combines Socratic questioning, ambiguity scoring, and an evolutionary loop where each generation's evaluation drives ontology mutations until convergence.

ouroboros-rs reimplements the core algorithmic pipeline in Rust, providing:

  • 10-100x faster specification processing and evaluation cycles
  • Single binary distribution with no Python runtime dependency
  • Type-safe events — 18 typed event variants replace Python's dict[str, Any]
  • Memory safety — no NoneType crashes (original issue #275)
  • Efficient persistence — SQLite event store without async overhead
  • Native Result<T, E> — zero-cost error handling that maps directly from the original's Result monad

Architecture

User Input
    |
[Interview Engine]  -- Multi-perspective Socratic questioning
    |                  (Researcher, Simplifier, Architect, BreadthKeeper, SeedCloser)
    | Ambiguity gate: score <= 0.2
    v
[Seed Generator]    -- LLM extracts structured requirements
    |                  -> immutable Seed with ontology schema
    v
[Evolutionary Loop] -- Up to 30 generations
    |
    +-- [Wonder Engine]     Identify gaps, tensions, assumptions (gen 2+)
    +-- [Reflect Engine]    Propose ontology mutations (add/modify/remove)
    +-- [Seed Evolution]    Apply mutations -> new immutable Seed
    +-- [Double Diamond]    Discover -> Define -> Design -> Deliver
    |     +-- AC Decomposition  (2-5 children, max depth 5)
    |     +-- Topological Sort  (Kahn's algorithm for parallel levels)
    +-- [Evaluation Pipeline]
    |     +-- Stage 1: Mechanical  (lint, build, test, coverage >= 0.7)
    |     +-- Stage 2: Semantic    (LLM compliance, drift, gaming detection)
    |     +-- Stage 3: Consensus   (multi-model voting OR deliberative)
    +-- [Convergence Check]
          +-- Ontology similarity >= 0.95
          +-- Stagnation detection (3-gen window)
          +-- Oscillation detection (period-2 cycling)
          +-- Exit gates (score, AC pass, no regressions)
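
The Topological Sort step in the diagram groups work into levels that can run in parallel. The following is a minimal sketch of Kahn's algorithm producing such levels, not the crate's actual TopoSort API: the function name parallel_levels, the use of plain usize node indices, and the (prerequisite, dependent) pair format are all assumptions made for illustration.

```rust
/// Groups nodes `0..n` into levels using Kahn's algorithm.
/// Every node in a level depends only on nodes from earlier levels,
/// so the nodes of one level could be executed in parallel.
///
/// `deps` holds hypothetical (prerequisite, dependent) index pairs.
/// Returns `None` if an index is out of range or the graph has a cycle.
pub fn parallel_levels(n: usize, deps: &[(usize, usize)]) -> Option<Vec<Vec<usize>>> {
    let mut indegree = vec![0usize; n];
    let mut dependents: Vec<Vec<usize>> = vec![Vec::new(); n];
    for &(before, after) in deps {
        if before >= n || after >= n {
            return None;
        }
        dependents[before].push(after);
        indegree[after] += 1;
    }

    // Level 0: every node with no prerequisites.
    let mut current: Vec<usize> = (0..n).filter(|&i| indegree[i] == 0).collect();
    let mut levels = Vec::new();
    let mut visited = 0;

    while !current.is_empty() {
        let mut next = Vec::new();
        for &node in &current {
            for &child in &dependents[node] {
                indegree[child] -= 1;
                if indegree[child] == 0 {
                    next.push(child);
                }
            }
        }
        visited += current.len();
        next.sort_unstable(); // deterministic order within a level
        levels.push(std::mem::replace(&mut current, next));
    }

    // Nodes that never reached indegree 0 are part of a cycle.
    (visited == n).then_some(levels)
}
```

With four nodes and the pairs (0, 2), (1, 2), (2, 3), nodes 0 and 1 form the first level, followed by 2 and then 3.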

Quick Start

Installation

# From source
git clone https://github.com/JSLEEKR/ouroboros-rs.git
cd ouroboros-rs
cargo build --release

# Binary is at target/release/ouroboros-rs

CLI Usage

# Show default configuration
ouroboros-rs config

# Start an interview session
ouroboros-rs interview --prompt "Build a REST API for task management"

# Start brownfield interview (existing codebase)
ouroboros-rs interview --brownfield --threshold 0.15

# Run evolutionary loop on a seed
ouroboros-rs evolve --seed seed.json --max-generations 10

# Evaluate artifacts against a seed
ouroboros-rs evaluate --seed seed.json --artifacts output/

# Show version and info
ouroboros-rs info

Library Usage

use ouroboros_rs::{
    InterviewEngine, Seed, EvolutionaryLoop, EvalPipeline, OuroborosConfig,
    llm::{LlmAdapter, CompletionConfig, ProviderError},
    seed::{AcceptanceCriterion, OntologySchema, OntologyField},
};
use async_trait::async_trait;

// 1. Implement the LlmAdapter trait for your provider
struct MyLlmProvider;

#[async_trait]
impl LlmAdapter for MyLlmProvider {
    async fn complete(&self, config: CompletionConfig) -> Result<String, ProviderError> {
        // Your LLM API call here
        todo!()
    }
}

// 2. Run an interview
// (steps 2-4 use `.await?`, so they must run inside an async function that returns a Result)
let mut engine = InterviewEngine::new(0.2);
let llm = MyLlmProvider;

let question = engine.generate_question(&llm, Some("Build a parser")).await?;
let ambiguity = engine.record_answer(&question, "A log file parser", &llm).await?;

if engine.is_ready() {
    let result = engine.result();
    // Generate seed from interview
}

// 3. Create a seed directly
let seed = Seed::new(
    "Parse log files efficiently",
    vec!["Handle files up to 1GB".into()],
    vec![
        AcceptanceCriterion::new("AC-1", "Parse syslog format", 1),
        AcceptanceCriterion::new("AC-2", "Handle malformed lines", 2),
    ],
    OntologySchema::new(vec![
        OntologyField::new("log_entry", "object", "A parsed log entry"),
        OntologyField::new("severity", "enum", "Log severity level"),
    ]),
    vec!["Performance > 100MB/s".into()],
);

// 4. Run the evolutionary loop
let config = OuroborosConfig::default();
let mut evloop = EvolutionaryLoop::new(config);
let final_seed = evloop.run(seed, &llm).await?;

Core Concepts

Immutable Seeds

Seeds are the crystallized specification — frozen after creation. Mutations create new Seeds with incremented generation numbers and parent references.

let seed1 = Seed::new("Original goal", ...);
let seed2 = seed1.evolve(
    Some("Evolved goal".into()),
    None,  // keep constraints
    None,  // keep criteria
    Some(new_ontology),
);
assert_eq!(seed2.metadata.generation, 2);
assert_eq!(seed2.metadata.parent_id, Some(seed1.metadata.id));
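
To illustrate the pattern, here is a simplified sketch of how an evolve method of this shape could be written. It is not the crate's implementation: the SeedSketch and MetadataSketch types, the numeric id drawn from a global counter, and the assumption that a fresh seed starts at generation 1 are all hypothetical, and the ontology and acceptance criteria are left out for brevity.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical id source; the real crate may use a different id scheme.
static NEXT_ID: AtomicU64 = AtomicU64::new(1);

#[derive(Debug, Clone, PartialEq)]
pub struct MetadataSketch {
    pub id: u64,
    pub generation: u32,
    pub parent_id: Option<u64>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct SeedSketch {
    pub goal: String,
    pub constraints: Vec<String>,
    pub metadata: MetadataSketch,
}

impl SeedSketch {
    /// Creates a first-generation seed (assumed to be generation 1).
    pub fn new(goal: &str, constraints: Vec<String>) -> Self {
        SeedSketch {
            goal: goal.to_string(),
            constraints,
            metadata: MetadataSketch {
                id: NEXT_ID.fetch_add(1, Ordering::Relaxed),
                generation: 1,
                parent_id: None,
            },
        }
    }

    /// Takes `&self`, so the parent is never modified. `None` keeps the
    /// parent's value; `Some(..)` overrides it in the child.
    pub fn evolve(&self, goal: Option<String>, constraints: Option<Vec<String>>) -> SeedSketch {
        SeedSketch {
            goal: goal.unwrap_or_else(|| self.goal.clone()),
            constraints: constraints.unwrap_or_else(|| self.constraints.clone()),
            metadata: MetadataSketch {
                id: NEXT_ID.fetch_add(1, Ordering::Relaxed),
                generation: self.metadata.generation + 1,
                parent_id: Some(self.metadata.id),
            },
        }
    }
}
```

Because evolve borrows the parent immutably and returns a new value, the whole lineage stays available for comparison between generations.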

Ambiguity Scoring

The interview engine scores clarity across dimensions:

Greenfield: ambiguity = 1.0 - (goal*0.40 + constraints*0.30 + criteria*0.30)
Brownfield: ambiguity = 1.0 - (goal*0.35 + constraints*0.25 + criteria*0.25 + context*0.15)

Gate: ambiguity <= 0.2 to proceed
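
The two formulas and the gate can be written down directly. The sketch below uses the weights listed above; the Clarity struct, the function names, and the choice to signal brownfield mode through an optional context score are assumptions for illustration and may not match AmbiguityScorer.

```rust
/// Clarity scores per interview dimension, each expected in [0.0, 1.0].
#[derive(Debug, Clone, Copy)]
pub struct Clarity {
    pub goal: f64,
    pub constraints: f64,
    pub criteria: f64,
    /// Codebase-context clarity. `Some` selects the brownfield weights,
    /// `None` selects the greenfield weights (an assumption of this sketch).
    pub context: Option<f64>,
}

/// Ambiguity = 1.0 - weighted clarity, clamped to [0.0, 1.0].
pub fn ambiguity(c: Clarity) -> f64 {
    let clarity = match c.context {
        Some(context) => {
            c.goal * 0.35 + c.constraints * 0.25 + c.criteria * 0.25 + context * 0.15
        }
        None => c.goal * 0.40 + c.constraints * 0.30 + c.criteria * 0.30,
    };
    (1.0 - clarity).clamp(0.0, 1.0)
}

/// The interview may proceed once ambiguity is at or below the threshold.
pub fn passes_gate(score: f64, threshold: f64) -> bool {
    score <= threshold
}
```

For example, a greenfield interview with goal 0.9, constraints 0.8, and criteria 0.7 has a weighted clarity of 0.81, so its ambiguity is about 0.19 and it passes the 0.2 gate. The same scores in brownfield mode with a context score of 0.5 give about 0.235, which does not.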

OntologyDelta Similarity

Weighted field comparison for convergence detection:

similarity = name_present * 0.5 + type_match * 0.3 + exact_match * 0.2

where:
  name_present = fields in both / total fields
  type_match   = same-type fields / total fields
  exact_match  = identical fields / total fields
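
A possible reading of this formula in code is sketched below. The formula does not say what "total fields" counts, so this sketch assumes it is the number of distinct field names across both schemas, and that two empty schemas are fully similar. The FieldSketch type and the similarity function are hypothetical names, not the crate's OntologyDelta API.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FieldSketch {
    pub name: String,
    pub field_type: String,
    pub description: String,
}

impl FieldSketch {
    pub fn new(name: &str, field_type: &str, description: &str) -> Self {
        FieldSketch {
            name: name.to_string(),
            field_type: field_type.to_string(),
            description: description.to_string(),
        }
    }
}

/// similarity = name_present * 0.5 + type_match * 0.3 + exact_match * 0.2
/// Assumption: "total fields" is the union of field names in both schemas.
pub fn similarity(old: &[FieldSketch], new: &[FieldSketch]) -> f64 {
    let old_by_name: BTreeMap<&str, &FieldSketch> =
        old.iter().map(|f| (f.name.as_str(), f)).collect();
    let new_by_name: BTreeMap<&str, &FieldSketch> =
        new.iter().map(|f| (f.name.as_str(), f)).collect();
    let all_names: BTreeSet<&str> = old_by_name
        .keys()
        .chain(new_by_name.keys())
        .copied()
        .collect();
    if all_names.is_empty() {
        return 1.0; // assumption: two empty schemas are identical
    }

    let (mut shared, mut same_type, mut identical) = (0usize, 0usize, 0usize);
    for name in &all_names {
        if let (Some(a), Some(b)) = (old_by_name.get(name), new_by_name.get(name)) {
            shared += 1;
            if a.field_type == b.field_type {
                same_type += 1;
            }
            if a == b {
                identical += 1;
            }
        }
    }

    let total = all_names.len() as f64;
    (shared as f64 / total) * 0.5
        + (same_type as f64 / total) * 0.3
        + (identical as f64 / total) * 0.2
}
```

Under these assumptions, changing the type of one field out of two yields 1.0 * 0.5 + 0.5 * 0.3 + 0.5 * 0.2 = 0.75, well below the 0.95 convergence threshold.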

Multi-Signal Convergence

The evolutionary loop stops when any signal fires:

| Signal | Condition |
| --- | --- |
| Converged | Ontology similarity >= 0.95 |
| Stagnated | Similarity unchanged for 3 consecutive generations |
| Oscillating | Score alternates A-B-A-B (period-2 cycling) |
| Gates Met | Eval score >= 0.7 AND all ACs pass |
| Max Generations | 30 generations reached |
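
The signals above can be checked against a per-generation history. The sketch below is one way to do that and makes several assumptions that the table does not state: signals are tested in the order listed, "unchanged" and "alternates" are compared with a small floating-point tolerance, and the GenerationStats type and check function are hypothetical rather than the ConvergenceChecker API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Signal {
    Converged,
    Stagnated,
    Oscillating,
    GatesMet,
    MaxGenerations,
}

#[derive(Debug, Clone, Copy)]
pub struct GenerationStats {
    pub similarity: f64,
    pub eval_score: f64,
    pub all_acs_pass: bool,
}

impl GenerationStats {
    pub fn new(similarity: f64, eval_score: f64, all_acs_pass: bool) -> Self {
        GenerationStats { similarity, eval_score, all_acs_pass }
    }
}

/// Tolerance for treating two floats as equal (an assumption of this sketch).
const EPS: f64 = 1e-9;

fn same(x: f64, y: f64) -> bool {
    (x - y).abs() < EPS
}

/// Returns the first signal that fires for the latest generation, if any.
pub fn check(history: &[GenerationStats], max_generations: usize) -> Option<Signal> {
    let last = history.last()?;
    if last.similarity >= 0.95 {
        return Some(Signal::Converged);
    }
    // Stagnation: similarity unchanged across the last 3 generations.
    if let [.., a, b, c] = history {
        if same(a.similarity, b.similarity) && same(b.similarity, c.similarity) {
            return Some(Signal::Stagnated);
        }
    }
    // Oscillation: the last 4 scores follow an A-B-A-B pattern with A != B.
    if let [.., a, b, c, d] = history {
        if same(a.eval_score, c.eval_score)
            && same(b.eval_score, d.eval_score)
            && !same(a.eval_score, b.eval_score)
        {
            return Some(Signal::Oscillating);
        }
    }
    if last.eval_score >= 0.7 && last.all_acs_pass {
        return Some(Signal::GatesMet);
    }
    if history.len() >= max_generations {
        return Some(Signal::MaxGenerations);
    }
    None
}
```

Returning a single Signal keeps the stop reason explicit, which matches the reason field carried by the ConvergenceDetected event.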

3-Stage Evaluation Pipeline

Stage 1: Mechanical
  - Runs lint, build, test, coverage checks
  - Early termination: pipeline halts on failure
  - Coverage gate: >= 70%

Stage 2: Semantic (LLM)
  - Per-AC compliance assessment
  - Specification drift detection
  - Reward-hacking / gaming signal detection

Stage 3: Consensus
  - Voting mode: N models vote, threshold approval (default 2/3)
  - Deliberative mode: advocate + devil's advocate + judge
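
As an illustration of the voting mode in Stage 3, the sketch below tallies approvals against a threshold. The Vote enum and consensus_reached function are hypothetical names. The threshold is passed as an integer fraction (for example 2 and 3 for the default 2/3) so that exactly two approvals out of three pass without floating-point rounding; this representation is an assumption of the sketch, not necessarily how the crate stores it.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Vote {
    Approve,
    Reject,
}

/// Returns true when approvals / total votes >= num / den.
/// The comparison is cross-multiplied to stay in integer arithmetic.
/// An empty ballot or a zero denominator never reaches consensus.
pub fn consensus_reached(votes: &[Vote], num: usize, den: usize) -> bool {
    if votes.is_empty() || den == 0 {
        return false;
    }
    let approvals = votes.iter().filter(|v| **v == Vote::Approve).count();
    approvals * den >= votes.len() * num
}
```

With three models and a 2/3 threshold, two approvals are enough and a single approval is not.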

Typed Events

All state changes are captured as typed events (not untyped dictionaries):

pub enum EventPayload {
    InterviewStarted { is_brownfield: bool },
    QuestionAsked { round: usize, perspective: String, question: String },
    SeedGenerated { seed_id: String, generation: u32, ... },
    GenerationCompleted { generation: u32, eval_score: f64, ... },
    ConvergenceDetected { generation: u32, reason: String, similarity: f64 },
    // ... 18 variants total
}

Events are persisted to SQLite for session resume and audit trail.
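
Session resume from an event log usually means folding the stored events back into a state value. The sketch below shows that idea with an in-memory slice standing in for the SQLite store. The EventSketch enum is a reduced, hypothetical version of EventPayload, and SessionState and replay are illustrative names that do not come from the crate.

```rust
/// Reduced stand-in for the crate's event payloads (hypothetical subset).
#[derive(Debug, Clone, PartialEq)]
pub enum EventSketch {
    InterviewStarted { is_brownfield: bool },
    QuestionAsked { round: usize },
    GenerationCompleted { generation: u32, eval_score: f64 },
    ConvergenceDetected { generation: u32, reason: String },
}

/// State reconstructed purely from the event log.
#[derive(Debug, Default, PartialEq)]
pub struct SessionState {
    pub is_brownfield: bool,
    pub questions_asked: usize,
    pub last_generation: u32,
    pub last_score: Option<f64>,
    pub stop_reason: Option<String>,
}

/// Applies events in order. Because the match is exhaustive, adding a new
/// variant fails to compile until replay handles it.
pub fn replay(events: &[EventSketch]) -> SessionState {
    let mut state = SessionState::default();
    for event in events {
        match event {
            EventSketch::InterviewStarted { is_brownfield } => {
                state.is_brownfield = *is_brownfield;
            }
            EventSketch::QuestionAsked { .. } => {
                state.questions_asked += 1;
            }
            EventSketch::GenerationCompleted { generation, eval_score } => {
                state.last_generation = *generation;
                state.last_score = Some(*eval_score);
            }
            EventSketch::ConvergenceDetected { generation, reason } => {
                state.last_generation = *generation;
                state.stop_reason = Some(reason.clone());
            }
        }
    }
    state
}
```

Replaying the same log always produces the same state, which is what makes the log usable as both a resume point and an audit trail.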

API Reference

Key Types

| Type | Description |
| --- | --- |
| Seed | Immutable specification with goal, constraints, ACs, ontology |
| OntologySchema | Typed field definitions for the domain model |
| OntologyDelta | Difference between two ontology schemas |
| InterviewEngine | Multi-perspective Socratic interview state machine |
| AmbiguityScorer | Weighted ambiguity scoring (greenfield/brownfield) |
| DoubleDiamond | 4-phase execution engine |
| AcTree | Hierarchical acceptance criteria tree |
| TopoSort | Kahn's algorithm for parallel execution levels |
| EvalPipeline | 3-stage evaluation orchestrator |
| EvolutionaryLoop | Generation loop with convergence detection |
| WonderEngine | Socratic gap analysis |
| ReflectEngine | Ontology mutation proposer |
| SqliteEventStore | Typed event persistence |

Key Traits

/// Implement this for your LLM provider
#[async_trait]
pub trait LlmAdapter: Send + Sync {
    async fn complete(&self, config: CompletionConfig) -> Result<String, ProviderError>;
}

Configuration

{
  "max_generations": 30,
  "ambiguity_threshold": 0.2,
  "convergence_threshold": 0.95,
  "stagnation_window": 3,
  "max_decomposition_depth": 5,
  "min_coverage": 0.7,
  "consensus_threshold": 0.67,
  "consensus_model_count": 3
}
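
For readers who want to see these defaults as a Rust value, the sketch below mirrors the JSON keys in a plain struct with a Default impl and a small sanity check. ConfigSketch is a hypothetical type: the field types and the validate rules are assumptions for illustration and are not taken from OuroborosConfig.

```rust
/// Hypothetical mirror of the JSON configuration shown above.
#[derive(Debug, Clone, PartialEq)]
pub struct ConfigSketch {
    pub max_generations: u32,
    pub ambiguity_threshold: f64,
    pub convergence_threshold: f64,
    pub stagnation_window: usize,
    pub max_decomposition_depth: usize,
    pub min_coverage: f64,
    pub consensus_threshold: f64,
    pub consensus_model_count: usize,
}

impl Default for ConfigSketch {
    /// Same values as the documented defaults.
    fn default() -> Self {
        ConfigSketch {
            max_generations: 30,
            ambiguity_threshold: 0.2,
            convergence_threshold: 0.95,
            stagnation_window: 3,
            max_decomposition_depth: 5,
            min_coverage: 0.7,
            consensus_threshold: 0.67,
            consensus_model_count: 3,
        }
    }
}

impl ConfigSketch {
    /// Assumed sanity rules: ratios lie in [0, 1] and counts are non-zero.
    pub fn validate(&self) -> Result<(), String> {
        let ratios = [
            ("ambiguity_threshold", self.ambiguity_threshold),
            ("convergence_threshold", self.convergence_threshold),
            ("min_coverage", self.min_coverage),
            ("consensus_threshold", self.consensus_threshold),
        ];
        for &(name, value) in &ratios {
            if !(0.0..=1.0).contains(&value) {
                return Err(format!("{name} must be within [0, 1], got {value}"));
            }
        }
        if self.max_generations == 0 {
            return Err("max_generations must be at least 1".to_string());
        }
        if self.consensus_model_count == 0 {
            return Err("consensus_model_count must be at least 1".to_string());
        }
        Ok(())
    }
}
```

Struct update syntax makes it easy to override a single value, for example ConfigSketch { max_generations: 10, ..ConfigSketch::default() }.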

How It Differs from the Original

| Aspect | ouroboros (Python) | ouroboros-rs (Rust) |
| --- | --- | --- |
| Performance | Python 3.12+ with asyncio | Native binary, zero-cost abstractions |
| Events | dict[str, Any] payloads | 18 typed enum variants |
| Error handling | Custom Result monad | Native Result<T, E> |
| Immutability | Pydantic frozen=True | Rust ownership + no mut |
| State | Async SQLite with ORM | rusqlite (sync, no ORM overhead) |
| Distribution | pip install + Python runtime | Single binary |
| LLM interface | Coupled to Claude/Codex SDKs | Provider-agnostic trait |
| Dependencies | 15+ (anthropic, textual, litellm...) | 9 minimal deps |
| NoneType safety | Runtime crashes (issue #275) | Compile-time Option checks |
| Session resume | Fragile (issue #50) | Event replay from SQLite |

Features Not Included

The following are intentionally excluded as non-core:

  • Claude Code / Codex runtime adapters — implement LlmAdapter instead
  • Product management workflows — secondary feature
  • TUI dashboard — UI concern, not core algorithm
  • MCP server — integration concern
  • Plugin system — distribution concern

Project Structure

src/
  lib.rs              -- Public API and module re-exports
  main.rs             -- CLI entry point (clap)
  config/mod.rs       -- Engine configuration
  llm/
    mod.rs            -- LLM module
    traits.rs         -- LlmAdapter trait, CompletionConfig, ProviderError
    mock.rs           -- MockLlmAdapter for testing
  seed/
    mod.rs            -- Seed module
    schema.rs         -- Seed, OntologySchema, OntologyField, AcceptanceCriterion
    generator.rs      -- SeedGenerator (from interview + evolution)
  interview/
    mod.rs            -- Interview module
    engine.rs         -- InterviewEngine state machine
    ambiguity.rs      -- AmbiguityScorer with weighted components
    perspectives.rs   -- 5 interview perspectives
  execution/
    mod.rs            -- Execution module
    double_diamond.rs -- DoubleDiamond 4-phase engine
    ac_tree.rs        -- AcTree hierarchical decomposition
    decomposition.rs  -- AcDecomposer (2-5 children, max depth 5)
    topo_sort.rs      -- TopoSort (Kahn's algorithm)
  evaluation/
    mod.rs            -- Evaluation module
    pipeline.rs       -- EvalPipeline orchestrator
    mechanical.rs     -- Stage 1: MechanicalEvaluator
    semantic.rs       -- Stage 2: SemanticEvaluator
    consensus.rs      -- Stage 3: ConsensusEvaluator
  evolution/
    mod.rs            -- Evolution module
    evloop.rs         -- EvolutionaryLoop (up to 30 generations)
    wonder.rs         -- WonderEngine (gap analysis)
    reflect.rs        -- ReflectEngine (ontology mutations)
    convergence.rs    -- ConvergenceChecker (multi-signal)
  lineage/
    mod.rs            -- Lineage module
    types.rs          -- OntologyLineage, GenerationRecord, OntologyDelta
    events.rs         -- OuroborosEvent, EventPayload (18 variants)
  persistence/
    mod.rs            -- Persistence module
    sqlite.rs         -- SqliteEventStore

Testing

cargo test           # Run all 199 tests
cargo test --lib     # Library tests only
cargo test -- seed   # Run seed module tests

Tests use MockLlmAdapter — no real LLM calls needed. All modules have comprehensive unit tests covering:

  • Seed immutability and serialization (11 tests)
  • Seed generation and JSON extraction (8 tests)
  • Interview state machine transitions (10 tests)
  • Ambiguity scoring with edge cases (12 tests)
  • Perspective selection by round (8 tests)
  • Evolutionary loop and convergence (3 tests)
  • Convergence detection signals (9 tests)
  • Wonder engine gap analysis (6 tests)
  • Reflect engine mutations (7 tests)
  • Double Diamond phases (10 tests)
  • AC tree operations (11 tests)
  • AC decomposition (6 tests)
  • Topological sort (9 tests)
  • Mechanical evaluation (9 tests)
  • Semantic evaluation (6 tests)
  • Consensus voting and deliberation (8 tests)
  • Evaluation pipeline (5 tests)
  • Event types and serialization (4 tests)
  • Event store persistence (11 tests)
  • LLM mock adapter (6 tests)
  • LLM traits (4 tests)
  • Configuration (3 tests)

License

MIT License - see LICENSE for details.
