# Ouroboros — Full Model Context Reference
> Specification-first workflow engine for AI coding agents.
> Package: ouroboros-ai | CLI: ouroboros | Claude Code skills: ooo
> Python >= 3.12 | License: MIT
---
## What Ouroboros Does
Ouroboros sits between a human and an AI coding runtime (Claude Code, Codex CLI).
It replaces ad-hoc prompting with a structured loop:
Interview -> Seed -> Execute -> Evaluate -> Evolve (repeat)
The core insight: most AI coding fails at the INPUT, not the output.
Ouroboros forces clarity before code through Socratic questioning and
ontological analysis.
---
## Command Surfaces
Two command surfaces exist. They are NOT a 1:1 mapping.
### ooo (Claude Code skills — run inside a Claude Code session)
    ooo setup       Register MCP server, configure project (one-time)
    ooo interview   Socratic questioning: expose hidden assumptions
    ooo seed        Crystallize interview into immutable spec (auto-invoked by interview; advanced/manual use only)
    ooo run         Execute via Double Diamond decomposition
    ooo evaluate    3-stage verification gate
    ooo evolve      Evolutionary loop until ontology converges
    ooo cancel      Cancel a running or orphaned session
    ooo unstuck     5 lateral thinking personas when stuck
    ooo status      Drift detection + session tracking
    ooo ralph       Persistent loop until verified
    ooo update      Update to latest version
    ooo tutorial    Interactive hands-on learning
    ooo welcome     Onboarding guide
    ooo help        Full reference
### ouroboros (Typer CLI — any terminal)
    ouroboros setup       Detect runtimes, configure Ouroboros
    ouroboros interview   Start interactive interview
    ouroboros run         Execute workflows from a seed file
    ouroboros cancel      Cancel stuck or orphaned executions
    ouroboros status      Check system status and execution history
    ouroboros config      Manage configuration settings
    ouroboros tui         Interactive TUI monitor
    ouroboros monitor     Shorthand for ouroboros tui monitor
    ouroboros mcp         MCP server commands
NOTE: Both `ooo interview` and `ouroboros interview` start the Socratic interview flow.
---
## Architecture Overview
### Source Layout
    src/ouroboros/
      bigbang/        Interview, ambiguity scoring, brownfield explorer
      routing/        PAL Router: 3-tier cost optimization (1x / 10x / 30x)
      execution/      Double Diamond, hierarchical AC decomposition
      evaluation/     Mechanical -> Semantic -> Multi-Model Consensus
      evolution/      Wonder / Reflect cycle, convergence detection
      resilience/     4-pattern stagnation detection, 5 lateral personas
      observability/  3-component drift measurement, auto-retrospective
      persistence/    Event sourcing (SQLAlchemy + aiosqlite), checkpoints
      orchestrator/   Runtime abstraction layer (Claude Code, Codex CLI)
      core/           Types, errors, seed, ontology, security
      providers/      LiteLLM adapter (100+ models)
      mcp/            MCP client/server integration
      plugin/         Plugin system (skill/agent auto-discovery)
      tui/            Terminal UI dashboard (Textual)
      cli/            Typer-based CLI
### Layers
Plugin Layer Skills (14) + Agents (9), hot-reload, magic prefix detection
Core Layer Immutable Seed, AC tree, ontology schema, version tracking
Execution Layer Double Diamond, dependency-aware parallel execution
State Layer SQLite event store, append-only, full replay, checkpoints
Orchestration 6-phase pipeline, PAL Router cost optimization
Presentation TUI dashboard (Textual), CLI (Typer)
---
## The Six Phases
Phase 0: BIG BANG Crystallize requirements into a Seed
Phase 1: PAL ROUTER Select appropriate model tier
Phase 2: DOUBLE DIAMOND Decompose and execute tasks
Phase 3: RESILIENCE Handle stagnation with lateral thinking
Phase 4: EVALUATION Verify outputs at three stages
Phase 5: SECONDARY LOOP Process deferred TODOs
(cycle back as needed)
### Phase 0: Big Bang
Components:
bigbang/interview.py InterviewEngine for Socratic interviews
bigbang/ambiguity.py Ambiguity score calculation
bigbang/seed_generator.py Seed generation from interview results
Process:
1. User provides initial context/idea
2. Engine asks clarifying questions (up to MAX_INTERVIEW_ROUNDS)
3. Ambiguity score calculated after each response
4. Interview completes when ambiguity <= 0.2
5. Immutable Seed generated
Ambiguity = 1 - Sum(clarity_i * weight_i)
Greenfield weights:
Goal Clarity 40%
Constraint Clarity 30%
Success Criteria 30%
Brownfield weights:
Goal Clarity 35%
Constraint Clarity 25%
Success Criteria 25%
Context Clarity 15%
Gate: Ambiguity <= 0.2
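The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code in bigbang/ambiguity.py: the function names, dictionary keys, and sample clarity values are all assumptions made for the example.

```python
# Hypothetical sketch of the Phase 0 ambiguity gate; names are illustrative.
GREENFIELD_WEIGHTS = {"goal": 0.40, "constraint": 0.30, "success": 0.30}
BROWNFIELD_WEIGHTS = {"goal": 0.35, "constraint": 0.25, "success": 0.25, "context": 0.15}
AMBIGUITY_GATE = 0.2


def ambiguity_score(clarity: dict[str, float], weights: dict[str, float]) -> float:
    """Return 1 - sum(clarity_i * weight_i), with each clarity value in [0, 1]."""
    weighted = sum(clarity[name] * weight for name, weight in weights.items())
    return 1.0 - weighted


def interview_complete(clarity: dict[str, float], weights: dict[str, float]) -> bool:
    """The interview may end once ambiguity is at or below the gate."""
    return ambiguity_score(clarity, weights) <= AMBIGUITY_GATE


# Sample greenfield interview state (made-up clarity values).
clarity = {"goal": 0.9, "constraint": 0.8, "success": 0.7}
score = ambiguity_score(clarity, GREENFIELD_WEIGHTS)
```

With these sample values the weighted clarity is 0.36 + 0.24 + 0.21 = 0.81, so the ambiguity is about 0.19 and the gate passes.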
### Phase 1: PAL Router (Progressive Adaptive LLM)
Components:
routing/router.py Main routing logic
routing/complexity.py Task complexity estimation
routing/tiers.py Model tier definitions
routing/escalation.py Escalation logic on failure
routing/downgrade.py Downgrade logic on success
Tiers:
FRUGAL 1x cost complexity < 0.4
STANDARD 10x cost complexity < 0.7
FRONTIER 30x cost complexity >= 0.7 or critical
Complexity scoring:
complexity = 0.30 * norm_tokens + 0.30 * norm_tools + 0.40 * norm_depth
where:
norm_tokens = min(tokens / 4000, 1.0)
norm_tools = min(tools / 5, 1.0)
norm_depth = min(depth / 5, 1.0)
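As a rough sketch, the complexity formula and the tier cut-offs listed above could be combined like this. The function names and the `critical` flag are assumptions for illustration; routing/complexity.py and routing/tiers.py may be organized differently.

```python
# Hypothetical sketch of PAL Router scoring; not the shipped implementation.
def complexity(tokens: int, tools: int, depth: int) -> float:
    """Weighted sum of normalized token count, tool count, and AC depth."""
    norm_tokens = min(tokens / 4000, 1.0)
    norm_tools = min(tools / 5, 1.0)
    norm_depth = min(depth / 5, 1.0)
    return 0.30 * norm_tokens + 0.30 * norm_tools + 0.40 * norm_depth


def select_tier(score: float, critical: bool = False) -> str:
    """Map a complexity score to a tier using the thresholds above."""
    if critical or score >= 0.7:
        return "FRONTIER"
    if score < 0.4:
        return "FRUGAL"
    return "STANDARD"


small_task = complexity(tokens=2000, tools=1, depth=1)   # 0.15 + 0.06 + 0.08
medium_task = complexity(tokens=3000, tools=2, depth=2)  # 0.225 + 0.12 + 0.16
large_task = complexity(tokens=4000, tools=5, depth=2)   # 0.30 + 0.30 + 0.16
```

Under these inputs the small task lands in FRUGAL, the medium task in STANDARD, and the large task in FRONTIER.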
Escalation: 2 consecutive failures at current tier triggers escalation
Frugal -> Standard -> Frontier -> Stagnation Event
Downgrade: 5 consecutive successes triggers downgrade
Frontier -> Standard -> Frugal
Similar task patterns (Jaccard similarity >= 0.80) inherit tier preferences.
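The escalation and downgrade rules can be pictured as a small counter-based state machine. The sketch below is an assumption-laden illustration: `TierTracker`, its attributes, and the choice to reset counters after each tier change are hypothetical, and `jaccard` treats a task pattern as a plain set of string tags.

```python
# Hypothetical sketch of escalation/downgrade tracking; names are illustrative.
TIERS = ["FRUGAL", "STANDARD", "FRONTIER"]
ESCALATE_AFTER_FAILURES = 2
DOWNGRADE_AFTER_SUCCESSES = 5


class TierTracker:
    """Tracks consecutive outcomes at the current tier."""

    def __init__(self, tier: str = "FRUGAL") -> None:
        self.index = TIERS.index(tier)
        self.failures = 0
        self.successes = 0
        self.stagnated = False

    @property
    def tier(self) -> str:
        return TIERS[self.index]

    def record(self, success: bool) -> str:
        """Record one task outcome and return the tier to use next."""
        if success:
            self.successes += 1
            self.failures = 0
            if self.successes >= DOWNGRADE_AFTER_SUCCESSES and self.index > 0:
                self.index -= 1
                self.successes = 0
        else:
            self.failures += 1
            self.successes = 0
            if self.failures >= ESCALATE_AFTER_FAILURES:
                self.failures = 0
                if self.index < len(TIERS) - 1:
                    self.index += 1
                else:
                    # Frontier is exhausted: signal a stagnation event.
                    self.stagnated = True
        return self.tier


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B| between two tag sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

A caller would keep one tracker per task pattern and could reuse a stored tier preference when `jaccard` between the new pattern and a known one reaches 0.80.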
### Phase 2: Double Diamond
Components:
execution/double_diamond.py Four-phase execution cycle
execution/decomposition.py Hierarchical task decomposition
execution/atomicity.py Atomicity detection
execution/subagent.py Isolated subagent execution
Four phases:
1. Discover (divergent) — Explore problem space
2. Define (convergent) — Converge on core problem
3. Design (divergent) — Explore solution approaches
4. Deliver (convergent) — Converge on implementation
Recursive decomposition:
Each AC -> Discover + Define -> atomicity check
Atomic (single-focused, 1-2 files) -> Design + Deliver
Non-atomic -> decompose into 2-5 child ACs, recurse
Constraints:
MAX_DEPTH = 5 hard recursion limit
COMPRESSION_DEPTH = 3 context truncated to 500 chars at depth 3+
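The recursion above can be sketched as follows. This is not the code in execution/decomposition.py: `ACNode`, `decompose`, and the `is_atomic` / `split` callbacks are hypothetical stand-ins, and treating an AC at MAX_DEPTH as a leaf is an assumption about how the hard limit is enforced.

```python
# Hypothetical sketch of recursive AC decomposition with depth limits.
from dataclasses import dataclass, field
from typing import Callable

MAX_DEPTH = 5
COMPRESSION_DEPTH = 3
COMPRESSED_CONTEXT_CHARS = 500


@dataclass
class ACNode:
    description: str
    depth: int
    context_chars: int  # size of the context this node received
    children: list["ACNode"] = field(default_factory=list)


def decompose(
    description: str,
    context: str,
    is_atomic: Callable[[str], bool],
    split: Callable[[str], list[str]],
    depth: int = 0,
) -> ACNode:
    # Context is truncated to 500 chars at depth 3 and deeper.
    if depth >= COMPRESSION_DEPTH:
        context = context[:COMPRESSED_CONTEXT_CHARS]
    node = ACNode(description, depth, len(context))
    # Assumption: at MAX_DEPTH the AC becomes a leaf even if it is not atomic.
    if depth >= MAX_DEPTH or is_atomic(description):
        return node  # a leaf would proceed to Design + Deliver
    for child in split(description):
        node.children.append(decompose(child, context, is_atomic, split, depth + 1))
    return node


def leaves(node: ACNode) -> list[ACNode]:
    """Collect the leaf ACs in left-to-right order."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


tree = decompose(
    "add login and add logout",
    "x" * 1000,
    is_atomic=lambda d: " and " not in d,
    split=lambda d: d.split(" and "),
)
```

In a real run the two callbacks would be backed by the Discover + Define phases and the atomicity check; here they are trivial string rules so the recursion itself is easy to follow.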
### Phase 3: Resilience
Components:
resilience/stagnation.py Stagnation detection (4 patterns)
resilience/lateral.py Persona rotation and lateral thinking
Stagnation patterns:
SPINNING Same output hash repeated (SHA-256), threshold: 3
OSCILLATION A->B->A->B alternating pattern, threshold: 2 cycles
NO_DRIFT Drift score unchanging (epsilon < 0.01), threshold: 3
DIMINISHING_RETURNS Progress rate < 0.01, threshold: 3
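Two of the patterns above, SPINNING and OSCILLATION, can be illustrated with output hashes. The sketch assumes that "threshold: 3" means three identical outputs in a row and that two cycles means an A, B, A, B tail; the function names are hypothetical and resilience/stagnation.py may count differently.

```python
# Hypothetical sketch of two stagnation checks based on SHA-256 output hashes.
import hashlib

SPINNING_THRESHOLD = 3
OSCILLATION_CYCLES = 2


def output_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_spinning(outputs: list[str]) -> bool:
    """True when the last SPINNING_THRESHOLD outputs hash identically."""
    if len(outputs) < SPINNING_THRESHOLD:
        return False
    recent = {output_hash(o) for o in outputs[-SPINNING_THRESHOLD:]}
    return len(recent) == 1


def is_oscillating(outputs: list[str]) -> bool:
    """True when the tail alternates A, B, A, B with A != B."""
    needed = 2 * OSCILLATION_CYCLES
    if len(outputs) < needed:
        return False
    tail = [output_hash(o) for o in outputs[-needed:]]
    return tail[0] != tail[1] and all(tail[i] == tail[i % 2] for i in range(needed))
```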
Lateral thinking personas:
HACKER Unconventional workarounds best for: SPINNING
RESEARCHER Seek more information best for: NO_DRIFT, DIMINISHING_RETURNS
SIMPLIFIER Reduce complexity best for: DIMINISHING_RETURNS, OSCILLATION
ARCHITECT Restructure fundamentally best for: OSCILLATION, NO_DRIFT
CONTRARIAN Challenge all assumptions best for: all patterns
### Phase 4: Evaluation
Components:
evaluation/pipeline.py Pipeline orchestration
evaluation/mechanical.py Stage 1: Mechanical checks
evaluation/semantic.py Stage 2: Semantic verification
evaluation/consensus.py Stage 3: Multi-model consensus
evaluation/trigger.py Consensus trigger matrix
Stage 1: Mechanical ($0)
Lint, build, test, static analysis, coverage (threshold: 70%)
Any check fails -> pipeline stops
Stage 2: Semantic ($$)
AC compliance, goal alignment, drift, uncertainty scoring
Score >= 0.8 and no trigger -> approved without consensus
Uses Standard tier model (temperature: 0.2)
Stage 3: Consensus ($$$)
Triggered by 1 of 6 conditions (checked in priority order):
1. Seed modification (seeds are immutable)
2. Ontology evolution (schema changes)
3. Goal reinterpretation
4. Seed drift > 0.3
5. Stage 2 uncertainty > 0.3
6. Lateral thinking adoption
Simple mode: 3 models vote (GPT-4o, Claude Sonnet 4, Gemini 2.5 Pro)
2/3 majority required
Deliberative mode: Advocate / Devil's Advocate / Judge roles
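The Stage 2 to Stage 3 hand-off can be sketched as a priority-ordered trigger check plus a majority vote. `SemanticResult`, its field names, and the trigger labels are assumptions for this example; the actual trigger matrix lives in evaluation/trigger.py and may carry more detail.

```python
# Hypothetical sketch of the consensus trigger check and simple-mode voting.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SemanticResult:
    score: float
    seed_drift: float
    uncertainty: float
    seed_modified: bool = False
    ontology_changed: bool = False
    goal_reinterpreted: bool = False
    lateral_adopted: bool = False


def consensus_trigger(result: SemanticResult) -> Optional[str]:
    """Return the first matching trigger in priority order, or None."""
    checks = [
        ("seed_modification", result.seed_modified),
        ("ontology_evolution", result.ontology_changed),
        ("goal_reinterpretation", result.goal_reinterpreted),
        ("seed_drift", result.seed_drift > 0.3),
        ("uncertainty", result.uncertainty > 0.3),
        ("lateral_adoption", result.lateral_adopted),
    ]
    for name, fired in checks:
        if fired:
            return name
    return None


def approved_without_consensus(result: SemanticResult) -> bool:
    """Stage 2 approves directly when the score is high and nothing triggers."""
    return result.score >= 0.8 and consensus_trigger(result) is None


def majority_approves(votes: list[bool]) -> bool:
    """Simple mode: at least two thirds of the voting models must approve."""
    return 3 * sum(votes) >= 2 * len(votes)
```

The integer comparison in `majority_approves` avoids floating-point rounding at exactly 2/3.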
### Phase 5: Secondary Loop
Components:
secondary/todo_registry.py Non-blocking TODO capture during execution
secondary/scheduler.py Batch processing after primary goal
TODO Registration:
During execution, discovered improvements are registered asynchronously
via TodoRegistry without disrupting the primary flow.
Each TODO has: description, context (execution ID), priority, status
Priority levels:
HIGH Critical improvements, addressed first
MEDIUM Standard improvements, moderate impact
LOW Nice-to-have, minimal urgency
Batch Processing:
Activates only after primary goal completion (all ACs passed)
Processes TODOs in priority order (HIGH -> MEDIUM -> LOW)
Non-blocking failures: one failed TODO does not stop others
User can skip via --skip-secondary flag
BatchStatus:
COMPLETED All TODOs processed (some may have failed)
PARTIAL Processing stopped early (timeout)
SKIPPED User chose to skip
NO_TODOS No pending TODOs to process
Returns BatchSummary: total, success_count, failure_count, skipped_count
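A minimal sketch of the batch step, assuming a synchronous handler for brevity: the `Todo` shape, `process_batch`, and the `skip` parameter are hypothetical, and only the four BatchSummary counters come from the description above.

```python
# Hypothetical sketch of priority-ordered, non-blocking TODO processing.
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class Priority(IntEnum):
    HIGH = 0
    MEDIUM = 1
    LOW = 2


@dataclass
class Todo:
    description: str
    priority: Priority


@dataclass
class BatchSummary:
    total: int = 0
    success_count: int = 0
    failure_count: int = 0
    skipped_count: int = 0


def process_batch(
    todos: list[Todo], handler: Callable[[Todo], None], skip: bool = False
) -> BatchSummary:
    summary = BatchSummary(total=len(todos))
    if skip:  # corresponds to the user opting out of the secondary loop
        summary.skipped_count = len(todos)
        return summary
    # sorted() is stable, so TODOs of equal priority keep registration order.
    for todo in sorted(todos, key=lambda t: t.priority):
        try:
            handler(todo)
            summary.success_count += 1
        except Exception:
            # One failed TODO does not stop the others.
            summary.failure_count += 1
    return summary
```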
---
## Core Data Models
### Seed (Immutable Specification)
In the happy path, seeds are auto-generated by the interview (Phase 0).
Most users never create or edit seeds manually. Manual seed authoring is an
advanced workflow for power users — see docs/guides/seed-authoring.md.
    class Seed(BaseModel, frozen=True):
        goal: str                               # Primary objective
        constraints: tuple[str, ...]            # Hard requirements
        acceptance_criteria: tuple[str, ...]    # Success criteria
        ontology_schema: OntologySchema         # Output structure
        evaluation_principles: tuple[EvaluationPrinciple, ...]
        exit_conditions: tuple[ExitCondition, ...]
        metadata: SeedMetadata

    class SeedMetadata(BaseModel, frozen=True):
        seed_id: str                            # auto-generated UUID
        version: str                            # default "1.0.0"
        created_at: datetime
        ambiguity_score: float                  # 0.0 to 1.0
        interview_id: str | None

    class OntologySchema(BaseModel, frozen=True):
        name: str
        description: str
        fields: tuple[OntologyField, ...]

    class OntologyField(BaseModel, frozen=True):
        name: str
        field_type: str                         # "string" | "number" | "boolean" | "array" | "object"
        description: str
        required: bool = True

    class EvaluationPrinciple(BaseModel, frozen=True):
        name: str
        description: str
        weight: float                           # 0.0 to 1.0, default 1.0

    class ExitCondition(BaseModel, frozen=True):
        name: str
        description: str
        evaluation_criteria: str
Once generated, a Seed cannot be modified. Any change triggers consensus.
### Result Type
Result[T, E] — generic frozen dataclass for expected failures
Methods: ok(value), err(error), unwrap(), unwrap_or(default),
map(fn), map_err(fn), and_then(fn)
Properties: is_ok, is_err, value, error
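To make the listed methods concrete, here is one way such a type could look. It is a sketch under assumptions, not the class in core/: storing `is_ok` as a field, raising ValueError from `unwrap()`, and the `parse_port` / `check_range` helpers are all choices made for this example.

```python
# Hypothetical sketch of a Result type with the methods listed above.
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")
E = TypeVar("E")


@dataclass(frozen=True)
class Result(Generic[T, E]):
    is_ok: bool
    value: Optional[T] = None
    error: Optional[E] = None

    @classmethod
    def ok(cls, value):
        return cls(True, value=value)

    @classmethod
    def err(cls, error):
        return cls(False, error=error)

    @property
    def is_err(self) -> bool:
        return not self.is_ok

    def unwrap(self):
        if self.is_err:
            raise ValueError(f"called unwrap() on an error: {self.error!r}")
        return self.value

    def unwrap_or(self, default):
        return self.value if self.is_ok else default

    def map(self, fn: Callable):
        return Result.ok(fn(self.value)) if self.is_ok else self

    def map_err(self, fn: Callable):
        return self if self.is_ok else Result.err(fn(self.error))

    def and_then(self, fn: Callable):
        """Chain another Result-returning step; errors short-circuit."""
        return fn(self.value) if self.is_ok else self


def parse_port(raw: str) -> Result[int, str]:
    if not raw.isdigit():
        return Result.err(f"not a number: {raw}")
    return Result.ok(int(raw))


def check_range(port: int) -> Result[int, str]:
    if 0 < port < 65536:
        return Result.ok(port)
    return Result.err(f"out of range: {port}")
```

Chaining with `and_then` keeps expected failures as values: `parse_port("8080").and_then(check_range)` yields an ok result, while a bad input carries its error through without raising.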
### Error Hierarchy
OuroborosError (base)
ProviderError LLM provider failures (provider, status_code)
ConfigError Configuration issues (config_key, config_file)
PersistenceError Database/storage issues (operation, table)
ValidationError Data validation failures (field, value, safe_value)
---
## Event Sourcing
All state changes are immutable events in a single SQLite table (events):
Columns: id (UUID), aggregate_type, aggregate_id, event_type,
payload (JSON), timestamp, consensus_id
Event types use dot-notation past tense:
orchestrator.session.started
execution.ac.completed
Indexes (5): aggregate_type, aggregate_id, composite, event_type, timestamp
Features:
Append-only writes
Unit of Work pattern (events + checkpoint atomic commits)
Full replay capability
3-level rollback depth
5-minute periodic checkpointing
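The append-only table and replay idea can be demonstrated with the standard library. Note the swap: Ouroboros uses SQLAlchemy with aiosqlite, while this sketch uses the synchronous `sqlite3` module to stay self-contained. The column types, `append_event`, and `replay` are assumptions; only the column names and sample event types come from the description above.

```python
# Hypothetical sketch of an append-only event table using stdlib sqlite3.
import json
import sqlite3
import uuid
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id TEXT PRIMARY KEY,
    aggregate_type TEXT NOT NULL,
    aggregate_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    consensus_id TEXT
)
"""


def append_event(conn, aggregate_type, aggregate_id, event_type, payload) -> str:
    """Insert one event; rows are never updated or deleted."""
    event_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            event_id,
            aggregate_type,
            aggregate_id,
            event_type,
            json.dumps(payload),
            datetime.now(timezone.utc).isoformat(),
            None,
        ),
    )
    conn.commit()
    return event_id


def replay(conn, aggregate_id):
    """Return (event_type, payload) pairs for one aggregate in insertion order."""
    rows = conn.execute(
        "SELECT event_type, payload FROM events WHERE aggregate_id = ? ORDER BY rowid",
        (aggregate_id,),
    )
    return [(event_type, json.loads(payload)) for event_type, payload in rows]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
append_event(conn, "session", "s-1", "orchestrator.session.started", {"seed_id": "demo"})
append_event(conn, "session", "s-1", "execution.ac.completed", {"ac": 1})
history = replay(conn, "s-1")
```

Rebuilding state then amounts to folding over `history`; checkpoints would shorten that fold by storing an intermediate snapshot.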
---
## Runtime Abstraction
### AgentRuntime Protocol
    class AgentRuntime(Protocol):
        def execute_task(
            self, prompt, tools, system_prompt, resume_handle
        ) -> AsyncIterator[AgentMessage]: ...

        async def execute_task_to_result(
            self, prompt, tools, system_prompt, resume_handle
        ) -> Result[TaskResult, ProviderError]: ...
Key types:
AgentMessage Normalized streaming message (backend-neutral)
RuntimeHandle Frozen dataclass with session/resume state
TaskResult Collected outcome of completed task
### RuntimeHandle
    @dataclass(frozen=True, slots=True)
    class RuntimeHandle:
        backend: str                    # "claude" | "codex" | custom
        kind: str = "agent_runtime"
        native_session_id: str | None
        conversation_id: str | None
        previous_response_id: str | None
        transcript_path: str | None
        cwd: str | None
        approval_mode: str | None
        updated_at: str | None
        metadata: dict[str, Any]
Computed properties: lifecycle_state, is_terminal, can_resume,
can_observe, can_terminate
Methods: observe(), terminate(), snapshot(), to_dict(), from_dict()
### Shipped Adapters
ClaudeAgentAdapter (backend="claude")
Module: src/ouroboros/orchestrator/adapter.py
Wraps Claude Agent SDK / Claude Code CLI
Streaming via claude_agent_sdk.query()
Auto transient-error retry, session resumption
CodexCliRuntime (backend="codex")
Module: src/ouroboros/orchestrator/codex_cli_runtime.py
Drives OpenAI Codex CLI as session-oriented runtime
Parses newline-delimited JSON from stdout
Skill-command interception for deterministic MCP dispatch
### Runtime Factory
create_agent_runtime(backend, permission_mode, model, cwd)
Backend resolution order:
1. OUROBOROS_AGENT_RUNTIME env var
2. orchestrator.runtime_backend in ~/.ouroboros/config.yaml
3. Explicit backend= parameter
Aliases: claude/claude_code, codex/codex_cli
---
## MCP Integration
Ouroboros is an MCP Hub (both client and server).
### MCP Server Mode
ouroboros mcp serve
Exposed tools:
ouroboros_execute_seed Execute a seed specification
ouroboros_session_status Session status query
ouroboros_query_events Event store query
### MCP Client Mode
ouroboros run --mcp-config mcp.yaml seed.yaml
Tool precedence:
1. Built-in tools always win
2. First MCP server in config wins for duplicates
3. Use --mcp-tool-prefix to namespace
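The first two precedence rules can be sketched as a single pass over the configured servers. `resolve_tools` and its input shapes are hypothetical; prefix-based namespacing is left out because its exact behavior is not described here.

```python
# Hypothetical sketch of tool-name precedence across built-ins and MCP servers.
def resolve_tools(
    builtin: list[str], servers: list[tuple[str, list[str]]]
) -> dict[str, str]:
    """Map each tool name to the source that provides it."""
    owners = {name: "builtin" for name in builtin}  # built-ins always win
    for server_name, tool_names in servers:  # servers in config order
        for tool in tool_names:
            # setdefault keeps the earliest owner, so the first server wins.
            owners.setdefault(tool, server_name)
    return owners


owners = resolve_tools(
    builtin=["read_file"],
    servers=[
        ("alpha", ["read_file", "search"]),
        ("beta", ["search", "fetch"]),
    ],
)
```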
### MCP Types
TransportType: stdio | sse | streamable-http
ContentType: text | image | resource
MCPServerConfig: name, transport, command, args, url, env, timeout, headers
MCPToolDefinition: name, description, parameters, server_name
MCPToolResult: content, is_error, meta
MCPCapabilities: tools, resources, prompts, logging
### MCP Error Hierarchy
MCPError (base, extends OuroborosError)
MCPClientError
MCPConnectionError (transport)
MCPTimeoutError (timeout_seconds, operation)
MCPProtocolError
MCPServerError
MCPAuthError
MCPResourceNotFoundError
MCPToolError (tool_name, error_code)
---
## Drift Control
3-component weighted measurement:
Goal drift 50% weight
Constraint drift 30% weight
Ontology drift 20% weight
Drift score: 0.0 to 1.0
Threshold: drift <= 0.3 is acceptable; higher drift triggers re-examination
Automatic retrospective every N cycles
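A small sketch of the weighted measurement, assuming each component is already scored in [0, 1]. The function names and the sample component values are illustrative; how the individual components are computed is not covered here.

```python
# Hypothetical sketch of the 3-component drift score.
DRIFT_WEIGHTS = {"goal": 0.5, "constraint": 0.3, "ontology": 0.2}
DRIFT_THRESHOLD = 0.3


def drift_score(goal: float, constraint: float, ontology: float) -> float:
    """Weighted sum of the three drift components, each in [0, 1]."""
    return (
        DRIFT_WEIGHTS["goal"] * goal
        + DRIFT_WEIGHTS["constraint"] * constraint
        + DRIFT_WEIGHTS["ontology"] * ontology
    )


def needs_reexamination(score: float) -> bool:
    """Drift above the threshold calls for re-examination."""
    return score > DRIFT_THRESHOLD


on_track = drift_score(goal=0.2, constraint=0.1, ontology=0.0)   # 0.10 + 0.03 + 0.00
off_track = drift_score(goal=0.6, constraint=0.4, ontology=0.2)  # 0.30 + 0.12 + 0.04
```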
---
## Ontology Convergence
Similarity = 0.5 * name_overlap + 0.3 * type_match + 0.2 * exact_match
Convergence threshold: similarity >= 0.95
Hard cap: 30 generations
Pathological pattern detection:
Stagnation: similarity >= 0.95 for 3 consecutive generations
Oscillation: Gen N ~ Gen N-2 (period-2 cycle)
Repetitive: >= 70% question overlap across 3 generations
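The similarity formula and the two stopping rules (threshold and hard cap) can be sketched as below. The three component scores are taken as inputs because their exact definitions are not given here; `generations_until_stop` is a hypothetical helper that only models when the loop would end.

```python
# Hypothetical sketch of ontology similarity and the convergence stop rule.
CONVERGENCE_THRESHOLD = 0.95
MAX_GENERATIONS = 30


def similarity(name_overlap: float, type_match: float, exact_match: float) -> float:
    """Weighted similarity between two consecutive ontology generations."""
    return 0.5 * name_overlap + 0.3 * type_match + 0.2 * exact_match


def generations_until_stop(scores: list[float]) -> int:
    """Return the 1-based generation at which the loop stops."""
    for generation, score in enumerate(scores[:MAX_GENERATIONS], start=1):
        if score >= CONVERGENCE_THRESHOLD:
            return generation
    return min(len(scores), MAX_GENERATIONS)


near_identical = similarity(name_overlap=1.0, type_match=1.0, exact_match=0.8)
```

Here `near_identical` is 0.5 + 0.3 + 0.16 = 0.96, which clears the 0.95 threshold.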
---
## The Nine Agents
Loaded on-demand, never preloaded:
Socratic Interviewer Questions-only, never builds
Ontologist Finds essence, not symptoms
Seed Architect Crystallizes specs from dialogue
Evaluator 3-stage verification
Contrarian Challenges every assumption
Hacker Finds unconventional paths
Simplifier Removes complexity
Researcher Stops coding, starts investigating
Architect Identifies structural causes
---
## Configuration
### File Layout
~/.ouroboros/
config.yaml Main configuration
credentials.yaml API keys (chmod 600)
ouroboros.db SQLite event store
seeds/ Generated seed YAML files
data/ Reserved for future use
logs/ouroboros.log Log output
.env Optional, auto-loaded
### Config Sections
orchestrator Runtime backend selection, agent permissions
llm Model selection, permission mode
economics PAL Router tier definitions, escalation thresholds
clarification Phase 0 interview settings
execution Phase 2 Double Diamond settings
resilience Phase 3 stagnation/lateral thinking
evaluation Phase 4 evaluation pipeline settings
consensus Multi-model consensus settings
persistence SQLite event store settings
drift Drift monitoring thresholds
logging Log level, path, verbosity
### Key Environment Variables
ANTHROPIC_API_KEY Claude API key
OPENAI_API_KEY OpenAI API key
OUROBOROS_AGENT_RUNTIME Runtime backend override (claude | codex)
TERM=xterm-256color TUI terminal compatibility
### Minimal config.yaml
    orchestrator:
      runtime_backend: claude    # claude | codex
    logging:
      level: info                # debug | info | warning | error
    persistence:
      database_path: data/ouroboros.db
---
## Security Limits
Input validation constants (core/security.py):
MAX_INITIAL_CONTEXT_LENGTH 50,000 chars Interview input limit
MAX_USER_RESPONSE_LENGTH 10,000 chars Interview response limit
MAX_SEED_FILE_SIZE 1,000,000 bytes Seed YAML file size cap
MAX_LLM_RESPONSE_LENGTH 100,000 chars LLM response truncation
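How such limits might be applied is sketched below. The constant values come from the table above; the helper functions, and the choice to reject oversized interview input rather than truncate it, are assumptions and may not match core/security.py.

```python
# Hypothetical sketch of applying the input limits; helpers are illustrative.
MAX_INITIAL_CONTEXT_LENGTH = 50_000
MAX_USER_RESPONSE_LENGTH = 10_000
MAX_SEED_FILE_SIZE = 1_000_000
MAX_LLM_RESPONSE_LENGTH = 100_000


def validate_initial_context(text: str) -> str:
    """Assumption: oversized interview input is rejected, not truncated."""
    if len(text) > MAX_INITIAL_CONTEXT_LENGTH:
        raise ValueError(
            f"initial context is {len(text)} chars; limit is {MAX_INITIAL_CONTEXT_LENGTH}"
        )
    return text


def seed_file_allowed(size_bytes: int) -> bool:
    """Check a seed YAML file size against the cap."""
    return size_bytes <= MAX_SEED_FILE_SIZE


def truncate_llm_response(text: str) -> str:
    """LLM responses are cut to the maximum length."""
    return text[:MAX_LLM_RESPONSE_LENGTH]
```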
---
## Performance Characteristics
Event Store:
Append latency: < 10ms p99
Query latency: < 50ms for 1000 events
Storage: ~1KB per event
Compression: 80% reduction at checkpoints
TUI:
Refresh rate: 500ms polling
Event processing: < 100ms per update
Memory:
Base: 50MB
Per session: 10-100MB depending on complexity
Concurrency:
Agent pool: 2-10 parallel agents
Task queue: priority-based async processing
---
## TUI Dashboard
Terminal-based real-time workflow monitor (Textual framework).
Launch: ouroboros tui monitor (or ouroboros monitor)
Screens:
1 Dashboard Phase progress, AC tree, live status
2 Execution Timeline, phase outputs, events
3 Logs Filterable log viewer with level coloring
4 Debug State inspector, raw events, config
s Session Browse and switch sessions
e Lineage Evolutionary lineage across generations
State: TUIState dataclass in events.py, owned by app.py as the single source of truth (SSOT)
Event flow: EventStore -> app._subscribe_to_events() (poll 0.5s)
-> create_message_from_event() -> post_message()
---
## Extension Points
### Adding a New Runtime Adapter
1. Create module in src/ouroboros/orchestrator/
2. Implement AgentRuntime protocol (execute_task, execute_task_to_result)
3. Register in runtime_factory.py (add backend name set, extend resolve)
4. Emit RuntimeHandle with your backend tag
5. Update runtime_backend Literal in config/models.py
6. Write tests verifying AgentRuntime structural subtyping
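Steps 2 and 6 can be illustrated with a toy adapter and a structural-subtyping check. Everything here is a stand-in: the protocol is re-declared locally with `Any` in place of AgentMessage, TaskResult, and Result, and `EchoRuntime` is a hypothetical backend that just echoes the prompt.

```python
# Hypothetical sketch of a minimal adapter satisfying an AgentRuntime-like protocol.
import asyncio
from typing import Any, AsyncIterator, Protocol, runtime_checkable


@runtime_checkable
class AgentRuntime(Protocol):
    def execute_task(
        self, prompt: str, tools: list, system_prompt: Any, resume_handle: Any
    ) -> AsyncIterator[Any]: ...

    async def execute_task_to_result(
        self, prompt: str, tools: list, system_prompt: Any, resume_handle: Any
    ) -> Any: ...


class EchoRuntime:
    """Toy backend: streams a single message that repeats the prompt."""

    backend = "echo"

    async def execute_task(self, prompt, tools, system_prompt=None, resume_handle=None):
        # An async generator: calling it returns an AsyncIterator.
        yield {"backend": self.backend, "text": prompt}

    async def execute_task_to_result(
        self, prompt, tools, system_prompt=None, resume_handle=None
    ):
        messages = [
            message
            async for message in self.execute_task(prompt, tools, system_prompt, resume_handle)
        ]
        return messages[-1]


runtime = EchoRuntime()
result = asyncio.run(runtime.execute_task_to_result("hello", tools=[]))
```

Because the protocol is marked `runtime_checkable`, `isinstance(runtime, AgentRuntime)` verifies that both methods are present without requiring inheritance. It does not check signatures, so a real test suite would still exercise the calls.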
### Custom Skills
Place in skills/ directory with SKILL.md defining:
name, version, description, magic_prefixes, triggers, mode, agents, tools
### Custom Agents
Place in src/ouroboros/agents/ as bundled markdown files, or in an explicit
override directory via OUROBOROS_AGENTS_DIR / .claude-plugin/agents/:
role, capabilities, tools
### MCP Server Integration
Register custom tool/resource handlers via MCPServerAdapter
or use ToolRegistry for the global registry
---
## Design Principles
1. Frugal First Start cheap, escalate only on failure
2. Immutable Seed Direction cannot change; only path adapts
3. Progressive Verification Cheap checks first, consensus at gates
4. Lateral Over Vertical When stuck, change perspective
5. Event-Sourced Every state change is an event; nothing lost
---
## Key File Locations
CLAUDE.md Dev environment setup, ooo command routing
docs/getting-started.md Onboarding guide (single source of truth)
docs/architecture.md Full architecture document
docs/config-reference.md Complete config reference
docs/api/core.md Core module API reference
docs/api/mcp.md MCP module API reference
docs/runtime-capability-matrix.md Runtime feature comparison
docs/runtime-guides/claude-code.md Claude Code backend guide
docs/runtime-guides/codex.md Codex CLI backend guide
docs/guides/seed-authoring.md Advanced seed authoring
docs/guides/evaluation-pipeline.md Evaluation pipeline details
docs/guides/tui-usage.md TUI dashboard reference
docs/contributing/ Contributor guides