Spec-driven evolutionary workflow engine. Socratic interview, seed crystallization, Double Diamond execution, 3-stage evaluation, and evolutionary loop with convergence detection.
Inspired by ouroboros. Reimplemented from scratch in Rust.
The original ouroboros (Python) treats specifications as an evolving ontology rather than static documents. It combines Socratic questioning, ambiguity scoring, and an evolutionary loop where each generation's evaluation drives ontology mutations until convergence.
ouroboros-rs reimplements the core algorithmic pipeline in Rust, providing:
- 10-100x faster specification processing and evaluation cycles
- Single binary distribution with no Python runtime dependency
- Type-safe events: 18 typed event variants replace Python's dict[str, Any]
- Memory safety: no NoneType crashes (original issue #275)
- Efficient persistence: SQLite event store without async overhead
- Native Result<T, E>: zero-cost error handling that maps directly from the original's Result monad
User Input
    |
[Interview Engine] -- Multi-perspective Socratic questioning
    |   (Researcher, Simplifier, Architect, BreadthKeeper, SeedCloser)
    |   Ambiguity gate: score <= 0.2
    v
[Seed Generator] -- LLM extracts structured requirements
    |   -> immutable Seed with ontology schema
    v
[Evolutionary Loop] -- Up to 30 generations
    |
    +-- [Wonder Engine] Identify gaps, tensions, assumptions (gen 2+)
    +-- [Reflect Engine] Propose ontology mutations (add/modify/remove)
    +-- [Seed Evolution] Apply mutations -> new immutable Seed
    +-- [Double Diamond] Discover -> Define -> Design -> Deliver
    |     +-- AC Decomposition (2-5 children, max depth 5)
    |     +-- Topological Sort (Kahn's algorithm for parallel levels)
    +-- [Evaluation Pipeline]
    |     +-- Stage 1: Mechanical (lint, build, test, coverage >= 0.7)
    |     +-- Stage 2: Semantic (LLM compliance, drift, gaming detection)
    |     +-- Stage 3: Consensus (multi-model voting OR deliberative)
    +-- [Convergence Check]
          +-- Ontology similarity >= 0.95
          +-- Stagnation detection (3-gen window)
          +-- Oscillation detection (period-2 cycling)
          +-- Exit gates (score, AC pass, no regressions)
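
The pipeline above names Kahn's algorithm as the way acceptance criteria are grouped into parallel levels. The following is a minimal sketch of that technique, not the project's `TopoSort` implementation: the function name `parallel_levels` and the `deps[i]` index layout are assumptions made for illustration.

```rust
/// Groups tasks into levels that can run in parallel using Kahn's algorithm.
/// `deps[i]` lists the indices task `i` depends on (a hypothetical layout).
/// Returns `None` if a dependency index is out of range or the graph has a cycle.
pub fn parallel_levels(deps: &[Vec<usize>]) -> Option<Vec<Vec<usize>>> {
    let n = deps.len();
    // in_degree[i] = number of unfinished dependencies of task i.
    let mut in_degree: Vec<usize> = deps.iter().map(Vec::len).collect();
    // dependents[j] = tasks that are waiting on task j.
    let mut dependents: Vec<Vec<usize>> = vec![Vec::new(); n];
    for (task, task_deps) in deps.iter().enumerate() {
        for &d in task_deps {
            if d >= n {
                return None;
            }
            dependents[d].push(task);
        }
    }

    // Level 0 holds every task with no dependencies.
    let mut current: Vec<usize> = (0..n).filter(|&i| in_degree[i] == 0).collect();
    let mut levels = Vec::new();
    let mut placed = 0;
    while !current.is_empty() {
        placed += current.len();
        let mut next = Vec::new();
        for &task in &current {
            for &waiting in &dependents[task] {
                in_degree[waiting] -= 1;
                if in_degree[waiting] == 0 {
                    next.push(waiting);
                }
            }
        }
        levels.push(current);
        current = next;
    }
    // Tasks left unplaced are part of (or blocked by) a cycle.
    if placed == n { Some(levels) } else { None }
}
```

With four tasks where task 3 depends on tasks 1 and 2, and both of those depend on task 0, the sketch yields three levels: task 0 alone, then tasks 1 and 2 together, then task 3.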
# From source
git clone https://github.com/JSLEEKR/ouroboros-rs.git
cd ouroboros-rs
cargo build --release
# Binary is at target/release/ouroboros-rs

# Show default configuration
ouroboros-rs config
# Start an interview session
ouroboros-rs interview --prompt "Build a REST API for task management"
# Start brownfield interview (existing codebase)
ouroboros-rs interview --brownfield --threshold 0.15
# Run evolutionary loop on a seed
ouroboros-rs evolve --seed seed.json --max-generations 10
# Evaluate artifacts against a seed
ouroboros-rs evaluate --seed seed.json --artifacts output/
# Show version and info
ouroboros-rs info

use ouroboros_rs::{
    InterviewEngine, Seed, EvolutionaryLoop, EvalPipeline, OuroborosConfig,
    llm::{LlmAdapter, CompletionConfig, ProviderError},
    seed::{AcceptanceCriterion, OntologySchema, OntologyField},
};
use async_trait::async_trait;

// 1. Implement the LlmAdapter trait for your provider
struct MyLlmProvider;

#[async_trait]
impl LlmAdapter for MyLlmProvider {
    async fn complete(&self, config: CompletionConfig) -> Result<String, ProviderError> {
        // Your LLM API call here
        todo!()
    }
}

// Steps 2-4 use `.await?`, so they belong inside an async function that returns a Result.

// 2. Run an interview
let mut engine = InterviewEngine::new(0.2);
let llm = MyLlmProvider;
let question = engine.generate_question(&llm, Some("Build a parser")).await?;
let ambiguity = engine.record_answer(&question, "A log file parser", &llm).await?;
if engine.is_ready() {
    let result = engine.result();
    // Generate seed from interview
}

// 3. Create a seed directly
let seed = Seed::new(
    "Parse log files efficiently",
    vec!["Handle files up to 1GB".into()],
    vec![
        AcceptanceCriterion::new("AC-1", "Parse syslog format", 1),
        AcceptanceCriterion::new("AC-2", "Handle malformed lines", 2),
    ],
    OntologySchema::new(vec![
        OntologyField::new("log_entry", "object", "A parsed log entry"),
        OntologyField::new("severity", "enum", "Log severity level"),
    ]),
    vec!["Performance > 100MB/s".into()],
);

// 4. Run the evolutionary loop
let config = OuroborosConfig::default();
let mut evloop = EvolutionaryLoop::new(config);
let final_seed = evloop.run(seed, &llm).await?;

Seeds are the crystallized specification, frozen after creation. Mutations create new Seeds with incremented generation numbers and parent references.
let seed1 = Seed::new("Original goal", ...);
let seed2 = seed1.evolve(
    Some("Evolved goal".into()),
    None, // keep constraints
    None, // keep criteria
    Some(new_ontology),
);
assert_eq!(seed2.metadata.generation, 2);
assert_eq!(seed2.metadata.parent_id, Some(seed1.metadata.id));

The interview engine scores clarity across dimensions:
Greenfield: ambiguity = 1.0 - (goal*0.40 + constraints*0.30 + criteria*0.30)
Brownfield: ambiguity = 1.0 - (goal*0.35 + constraints*0.25 + criteria*0.25 + context*0.15)
Gate: ambiguity <= 0.2 to proceed
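
The two formulas and the gate above can be sketched in a few lines of Rust. This is an illustration under assumptions rather than the project's `AmbiguityScorer`: the `Clarity` struct, the use of `Option` to select brownfield mode, and the clamping of inputs to [0, 1] are all hypothetical choices.

```rust
/// Per-dimension clarity scores, each expected in [0.0, 1.0].
/// Hypothetical struct; `context` is present only for brownfield interviews.
#[derive(Debug, Clone, Copy)]
pub struct Clarity {
    pub goal: f64,
    pub constraints: f64,
    pub criteria: f64,
    pub context: Option<f64>,
}

/// Ambiguity = 1.0 - weighted clarity, with the weights quoted above.
pub fn ambiguity(c: Clarity) -> f64 {
    // Clamping is an assumption of this sketch, to keep the result in [0, 1].
    let unit = |x: f64| x.clamp(0.0, 1.0);
    let clarity = match c.context {
        // Brownfield: goal 0.35, constraints 0.25, criteria 0.25, context 0.15.
        Some(ctx) => {
            unit(c.goal) * 0.35
                + unit(c.constraints) * 0.25
                + unit(c.criteria) * 0.25
                + unit(ctx) * 0.15
        }
        // Greenfield: goal 0.40, constraints 0.30, criteria 0.30.
        None => unit(c.goal) * 0.40 + unit(c.constraints) * 0.30 + unit(c.criteria) * 0.30,
    };
    1.0 - clarity
}

/// The interview may proceed once ambiguity is at or below the threshold.
pub fn passes_gate(ambiguity: f64, threshold: f64) -> bool {
    ambiguity <= threshold
}
```

For example, greenfield clarity scores of 0.9, 0.8 and 0.7 give a weighted clarity of 0.81 and an ambiguity of about 0.19, which passes the 0.2 gate. Adding a brownfield context score of 0.5 to the same inputs raises ambiguity to about 0.235, which does not.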
Weighted field comparison for convergence detection:
similarity = name_present * 0.5 + type_match * 0.3 + exact_match * 0.2
where:
name_present = fields in both / total fields
type_match = same-type fields / total fields
exact_match = identical fields / total fields
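
A small sketch may help make the weighted comparison concrete. It assumes that "total fields" means the number of distinct field names across both schemas, that field names are unique within a schema, and that two fields are "identical" when name, type and description all match. None of these details is confirmed by the text above, and the `Field` type is hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Simplified stand-in for an ontology field (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Field {
    pub name: String,
    pub field_type: String,
    pub description: String,
}

impl Field {
    pub fn new(name: &str, field_type: &str, description: &str) -> Self {
        Field {
            name: name.to_string(),
            field_type: field_type.to_string(),
            description: description.to_string(),
        }
    }
}

/// similarity = name_present * 0.5 + type_match * 0.3 + exact_match * 0.2
pub fn similarity(a: &[Field], b: &[Field]) -> f64 {
    // Assumption: "total fields" is the union of field names in both schemas.
    let names: HashSet<&str> = a.iter().chain(b.iter()).map(|f| f.name.as_str()).collect();
    if names.is_empty() {
        return 1.0; // two empty schemas are treated as identical
    }
    let total = names.len() as f64;
    let by_name: HashMap<&str, &Field> = b.iter().map(|f| (f.name.as_str(), f)).collect();

    let (mut shared, mut same_type, mut identical) = (0.0, 0.0, 0.0);
    for fa in a {
        if let Some(fb) = by_name.get(fa.name.as_str()) {
            shared += 1.0;
            if fa.field_type == fb.field_type {
                same_type += 1.0;
                if fa.description == fb.description {
                    identical += 1.0;
                }
            }
        }
    }
    (shared / total) * 0.5 + (same_type / total) * 0.3 + (identical / total) * 0.2
}
```

Under these assumptions, changing the type of one of two fields and adding a third field gives three total names, two shared, one with the same type and one identical, so the similarity is 2/3 × 0.5 + 1/3 × 0.3 + 1/3 × 0.2 = 0.5.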
The evolutionary loop stops when any signal fires:
| Signal | Condition |
|---|---|
| Converged | Ontology similarity >= 0.95 |
| Stagnated | Similarity unchanged for 3 consecutive generations |
| Oscillating | Score alternates A-B-A-B (period-2 cycling) |
| Gates Met | Eval score >= 0.7 AND all ACs pass |
| Max Generations | 30 generations reached |
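
The signals in the table can be sketched as a single check over the generation history. This is not the project's `ConvergenceChecker`: the `GenerationStats` struct, the order in which signals are tested, and the floating-point tolerance used for "unchanged" are assumptions made for this sketch.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Signal {
    Converged,
    GatesMet,
    Stagnated,
    Oscillating,
    MaxGenerations,
}

/// Per-generation measurements (hypothetical struct).
#[derive(Debug, Clone, Copy)]
pub struct GenerationStats {
    pub similarity: f64,
    pub eval_score: f64,
    pub all_acs_pass: bool,
}

// Tolerance for treating two floats as "unchanged" (an assumption).
fn close(a: f64, b: f64) -> bool {
    (a - b).abs() < 1e-9
}

/// Returns the first signal that fires, or `None` if the loop should continue.
pub fn check(history: &[GenerationStats], max_generations: usize) -> Option<Signal> {
    let last = history.last()?;
    let n = history.len();
    if last.similarity >= 0.95 {
        return Some(Signal::Converged);
    }
    if last.eval_score >= 0.7 && last.all_acs_pass {
        return Some(Signal::GatesMet);
    }
    // Stagnation: similarity unchanged across the last 3 generations.
    if n >= 3 && history[n - 3..].windows(2).all(|w| close(w[0].similarity, w[1].similarity)) {
        return Some(Signal::Stagnated);
    }
    // Oscillation: the last four scores follow an A-B-A-B pattern with A != B.
    if n >= 4 {
        let s: Vec<f64> = history[n - 4..].iter().map(|g| g.eval_score).collect();
        if close(s[0], s[2]) && close(s[1], s[3]) && !close(s[0], s[1]) {
            return Some(Signal::Oscillating);
        }
    }
    if n >= max_generations {
        return Some(Signal::MaxGenerations);
    }
    None
}
```

The table does not say which signal wins when several fire in the same generation. This sketch checks success conditions first, so a converged run is never reported as stagnated.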
Stage 1: Mechanical
- Runs lint, build, test, coverage checks
- Early termination: pipeline halts on failure
- Coverage gate: >= 70%
Stage 2: Semantic (LLM)
- Per-AC compliance assessment
- Specification drift detection
- Reward-hacking / gaming signal detection
Stage 3: Consensus
- Voting mode: N models vote, threshold approval (default 2/3)
- Deliberative mode: advocate + devil's advocate + judge
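
As a rough illustration of Stage 3's voting mode, the sketch below approves when the fraction of approving votes reaches a threshold. The function name `approved` and the treatment of an empty vote list as a rejection are assumptions. Deliberative mode is not covered.

```rust
/// Returns true when the share of approving votes is at least `threshold`.
/// An empty vote list is treated as a rejection (an assumption of this sketch).
pub fn approved(votes: &[bool], threshold: f64) -> bool {
    if votes.is_empty() {
        return false;
    }
    let approvals = votes.iter().filter(|&&v| v).count();
    approvals as f64 / votes.len() as f64 >= threshold
}
```

One detail worth checking against the real implementation: 2/3 is approximately 0.6667, which is slightly below the `consensus_threshold` of 0.67 listed in the configuration. With a plain `>=` comparison like the one here, two approvals out of three would pass a threshold of `2.0 / 3.0` but fail a threshold of `0.67`. How ouroboros-rs resolves this is not stated in this document.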
All state changes are captured as typed events (not untyped dictionaries):
pub enum EventPayload {
    InterviewStarted { is_brownfield: bool },
    QuestionAsked { round: usize, perspective: String, question: String },
    SeedGenerated { seed_id: String, generation: u32, ... },
    GenerationCompleted { generation: u32, eval_score: f64, ... },
    ConvergenceDetected { generation: u32, reason: String, similarity: f64 },
    // ... 18 variants total
}

Events are persisted to SQLite for session resume and audit trail.
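
Session resume from an event log usually amounts to folding the stored events back into an in-memory state. The sketch below shows that idea with a trimmed-down, hypothetical `Event` enum (four variants with reduced fields) and a hypothetical `SessionState`. It does not touch SQLite and is not the project's replay code.

```rust
/// Reduced stand-in for the event payload (hypothetical, fields trimmed).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    InterviewStarted { is_brownfield: bool },
    QuestionAsked { round: usize },
    GenerationCompleted { generation: u32, eval_score: f64 },
    ConvergenceDetected { generation: u32, reason: String },
}

/// State rebuilt purely from the event stream (hypothetical struct).
#[derive(Debug, Default, PartialEq)]
pub struct SessionState {
    pub is_brownfield: bool,
    pub rounds: usize,
    pub last_generation: u32,
    pub last_score: Option<f64>,
    pub stop_reason: Option<String>,
}

/// Replays events in order; the same log always rebuilds the same state.
pub fn replay(events: &[Event]) -> SessionState {
    events.iter().fold(SessionState::default(), |mut state, event| {
        match event {
            Event::InterviewStarted { is_brownfield } => state.is_brownfield = *is_brownfield,
            Event::QuestionAsked { round } => state.rounds = state.rounds.max(*round),
            Event::GenerationCompleted { generation, eval_score } => {
                state.last_generation = *generation;
                state.last_score = Some(*eval_score);
            }
            Event::ConvergenceDetected { generation, reason } => {
                state.last_generation = *generation;
                state.stop_reason = Some(reason.clone());
            }
        }
        state
    })
}
```

Because the state is derived only from the log, replaying a prefix of the events reconstructs the session as it stood at that point, which is also what makes the log usable as an audit trail.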
| Type | Description |
|---|---|
| Seed | Immutable specification with goal, constraints, ACs, ontology |
| OntologySchema | Typed field definitions for the domain model |
| OntologyDelta | Difference between two ontology schemas |
| InterviewEngine | Multi-perspective Socratic interview state machine |
| AmbiguityScorer | Weighted ambiguity scoring (greenfield/brownfield) |
| DoubleDiamond | 4-phase execution engine |
| AcTree | Hierarchical acceptance criteria tree |
| TopoSort | Kahn's algorithm for parallel execution levels |
| EvalPipeline | 3-stage evaluation orchestrator |
| EvolutionaryLoop | Generation loop with convergence detection |
| WonderEngine | Socratic gap analysis |
| ReflectEngine | Ontology mutation proposer |
| SqliteEventStore | Typed event persistence |
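
To illustrate the hierarchical acceptance criteria tree listed above, together with the decomposition limits this README gives (2-5 children, max depth 5), here is a minimal sketch. The `AcNode` type, the error enum, and the convention that the root sits at depth 0 are assumptions. The real `AcTree` and `AcDecomposer` may differ.

```rust
pub const MIN_CHILDREN: usize = 2;
pub const MAX_CHILDREN: usize = 5;
pub const MAX_DEPTH: usize = 5;

/// One acceptance criterion in the tree (hypothetical type).
#[derive(Debug)]
pub struct AcNode {
    pub id: String,
    pub depth: usize, // assumption: the root is at depth 0
    pub children: Vec<AcNode>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DecomposeError {
    /// The number of children was outside 2..=5.
    ChildCount(usize),
    /// The children would sit deeper than the maximum depth.
    TooDeep(usize),
}

impl AcNode {
    pub fn root(id: &str) -> Self {
        AcNode { id: id.to_string(), depth: 0, children: Vec::new() }
    }

    /// Replaces this node's children, enforcing the count and depth limits.
    pub fn decompose(&mut self, child_ids: &[&str]) -> Result<(), DecomposeError> {
        let count = child_ids.len();
        if !(MIN_CHILDREN..=MAX_CHILDREN).contains(&count) {
            return Err(DecomposeError::ChildCount(count));
        }
        let child_depth = self.depth + 1;
        if child_depth > MAX_DEPTH {
            return Err(DecomposeError::TooDeep(child_depth));
        }
        self.children = child_ids
            .iter()
            .map(|id| AcNode { id: (*id).to_string(), depth: child_depth, children: Vec::new() })
            .collect();
        Ok(())
    }

    /// Counts leaves, i.e. criteria that are not decomposed further.
    pub fn leaf_count(&self) -> usize {
        if self.children.is_empty() {
            1
        } else {
            self.children.iter().map(AcNode::leaf_count).sum()
        }
    }
}
```

Rejecting a single-child decomposition keeps the tree from growing chains that add depth without splitting the work, and the depth cap bounds how many leaves the execution phase can ever be asked to schedule.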
/// Implement this for your LLM provider
#[async_trait]
pub trait LlmAdapter: Send + Sync {
    async fn complete(&self, config: CompletionConfig) -> Result<String, ProviderError>;
}

{
    "max_generations": 30,
    "ambiguity_threshold": 0.2,
    "convergence_threshold": 0.95,
    "stagnation_window": 3,
    "max_decomposition_depth": 5,
    "min_coverage": 0.7,
    "consensus_threshold": 0.67,
    "consensus_model_count": 3
}

| Aspect | ouroboros (Python) | ouroboros-rs (Rust) |
|---|---|---|
| Performance | Python 3.12+ with asyncio | Native binary, zero-cost abstractions |
| Events | dict[str, Any] payloads | 18 typed enum variants |
| Error handling | Custom Result monad | Native Result<T, E> |
| Immutability | Pydantic frozen=True | Rust ownership + no mut |
| State | Async SQLite with ORM | rusqlite (sync, no ORM overhead) |
| Distribution | pip install + Python runtime | Single binary |
| LLM interface | Coupled to Claude/Codex SDKs | Provider-agnostic trait |
| Dependencies | 15+ (anthropic, textual, litellm...) | 9 minimal deps |
| NoneType safety | Runtime crashes (issue #275) | Compile-time Option checks |
| Session resume | Fragile (issue #50) | Event replay from SQLite |
The following are intentionally excluded as non-core:
- Claude Code / Codex runtime adapters: implement LlmAdapter instead
- Product management workflows: secondary feature
- TUI dashboard: UI concern, not core algorithm
- MCP server: integration concern
- Plugin system: distribution concern
src/
  lib.rs                 -- Public API and module re-exports
  main.rs                -- CLI entry point (clap)
  config/mod.rs          -- Engine configuration
  llm/
    mod.rs               -- LLM module
    traits.rs            -- LlmAdapter trait, CompletionConfig, ProviderError
    mock.rs              -- MockLlmAdapter for testing
  seed/
    mod.rs               -- Seed module
    schema.rs            -- Seed, OntologySchema, OntologyField, AcceptanceCriterion
    generator.rs         -- SeedGenerator (from interview + evolution)
  interview/
    mod.rs               -- Interview module
    engine.rs            -- InterviewEngine state machine
    ambiguity.rs         -- AmbiguityScorer with weighted components
    perspectives.rs      -- 5 interview perspectives
  execution/
    mod.rs               -- Execution module
    double_diamond.rs    -- DoubleDiamond 4-phase engine
    ac_tree.rs           -- AcTree hierarchical decomposition
    decomposition.rs     -- AcDecomposer (2-5 children, max depth 5)
    topo_sort.rs         -- TopoSort (Kahn's algorithm)
  evaluation/
    mod.rs               -- Evaluation module
    pipeline.rs          -- EvalPipeline orchestrator
    mechanical.rs        -- Stage 1: MechanicalEvaluator
    semantic.rs          -- Stage 2: SemanticEvaluator
    consensus.rs         -- Stage 3: ConsensusEvaluator
  evolution/
    mod.rs               -- Evolution module
    evloop.rs            -- EvolutionaryLoop (up to 30 generations)
    wonder.rs            -- WonderEngine (gap analysis)
    reflect.rs           -- ReflectEngine (ontology mutations)
    convergence.rs       -- ConvergenceChecker (multi-signal)
  lineage/
    mod.rs               -- Lineage module
    types.rs             -- OntologyLineage, GenerationRecord, OntologyDelta
    events.rs            -- OuroborosEvent, EventPayload (18 variants)
  persistence/
    mod.rs               -- Persistence module
    sqlite.rs            -- SqliteEventStore
cargo test # Run all 199 tests
cargo test --lib # Library tests only
cargo test -- seed # Run seed module tests

Tests use MockLlmAdapter, so no real LLM calls are needed. All modules have comprehensive unit tests covering:
- Seed immutability and serialization (11 tests)
- Seed generation and JSON extraction (8 tests)
- Interview state machine transitions (10 tests)
- Ambiguity scoring with edge cases (12 tests)
- Perspective selection by round (8 tests)
- Evolutionary loop and convergence (3 tests)
- Convergence detection signals (9 tests)
- Wonder engine gap analysis (6 tests)
- Reflect engine mutations (7 tests)
- Double Diamond phases (10 tests)
- AC tree operations (11 tests)
- AC decomposition (6 tests)
- Topological sort (9 tests)
- Mechanical evaluation (9 tests)
- Semantic evaluation (6 tests)
- Consensus voting and deliberation (8 tests)
- Evaluation pipeline (5 tests)
- Event types and serialization (4 tests)
- Event store persistence (11 tests)
- LLM mock adapter (6 tests)
- LLM traits (4 tests)
- Configuration (3 tests)
MIT License - see LICENSE for details.