Releases: Q00/ouroboros

v0.42.0

15 Jun 11:54

Ouroboros × gajae-code — Together, we build the future. Intention in. Software out.

v0.42.0 — A new kernel, frugal-or-frontier, and Claude without the SDK

v0.40 closed the loop; v0.41 let it run anywhere and trust what it ships. v0.42
adds another runtime kernel, lets you choose how hard it thinks — frugal or
frontier, per stage — and runs Claude with no SDK at all. Much of this release's
frugality and usability work grew out of @deepakdgupta1's direction-check
discussions; the credit, and the mapping, are at the bottom.

The headline

  • A new runtime kernel — GJC. gajae-code (gjc) joins Pi, Claude, Codex,
    Gemini, OpenCode, Goose, and Copilot as a swappable runtime. Ouroboros stays the
    workflow engine; GJC is just another kernel you can drop under it. Select it with
    orchestrator.runtime_backend: gjc (or ouroboros config), and setup --runtime gjc
    installs the ooo bridge so ooo … commands route back through Ouroboros.
  • Frugal or frontier — your call. A new reasoning-effort dial
    (low / medium / high) makes effort, not model family, the primary cost
    lever — dial it up where it counts, down where it doesn't. The dead
    complexity→tier router is gone, and the model floor for structured-extraction
    tasks dropped from Opus to Sonnet. Pick the agent and model per stage in the new
    guided ouroboros config GUI, with an effective-config view that shows what
    actually runs.
  • Claude without the SDK. A new ourocode ACP backend drives
    ourocode as a subprocess — no Anthropic SDK,
    no API key, just your Claude Pro/Max sign-in — for the in-process completion
    path (interview / seed / qa / evaluate). If you've wanted Ouroboros on your
    Claude subscription instead of an API key, this is the path.

ouroboros config — guided settings GUI with per-stage agent and model, and one-click Frugal / Balanced / Frontier presets

ouroboros config — set the runtime and model per stage (interview → execute →
evaluate → reflect), or one-click a Frugal / Balanced / Frontier preset for
every stage. The effective-config view shows what actually runs, with inheritance
and env-override sources. (Note: GJC is selectable as a stage agent right here —
Interview is running on gjc above.)
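
A minimal sketch of how an effective-config view could resolve one setting per stage and report where the value came from. The preset-to-effort table, the EXAMPLE_<STAGE>_EFFORT variable name, the precedence order, and every function name here are assumptions made for illustration; they are not Ouroboros's actual schema.

```python
import os
from dataclasses import dataclass

STAGES = ("interview", "execute", "evaluate", "reflect")

# Hypothetical preset table: the notes name the presets, not their contents.
PRESETS = {"frugal": "low", "balanced": "medium", "frontier": "high"}


@dataclass(frozen=True)
class Resolved:
    stage: str
    effort: str
    source: str  # one of "env", "stage", "preset", "default"


def effective_effort(stage, config, env=None):
    """Resolve the effort for one stage and record which layer supplied it."""
    env = os.environ if env is None else env
    env_key = f"EXAMPLE_{stage.upper()}_EFFORT"  # hypothetical variable name
    if env_key in env:
        return Resolved(stage, env[env_key], "env")
    stage_cfg = config.get("stages", {}).get(stage, {})
    if "effort" in stage_cfg:
        return Resolved(stage, stage_cfg["effort"], "stage")
    preset = config.get("preset")
    if preset in PRESETS:
        return Resolved(stage, PRESETS[preset], "preset")
    return Resolved(stage, "medium", "default")


def effective_view(config, env=None):
    """Return the resolved setting for every stage, in pipeline order."""
    return [effective_effort(stage, config, env) for stage in STAGES]
```

Carrying the source next to each value is what lets a view like this show inheritance and env overrides rather than only the final number.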

What's Changed

Runtimes & Agent OS

  • feat(runtime): GJC (gajae-code) runtime, covering foundation + RPC envelope protocol, GjcRuntime, GjcLLMAdapter, setup --runtime gjc + ooo bridge, docs (#1379–#1383); allow GJC backend validation (#1439)
  • feat(providers): SDK-free Claude via the ourocode ACP backend (#1438)
  • feat(config): guided settings GUI + effective-config view — pick agent & model per stage (#1416)
  • fix(cli): pi backend switching and quiet dispatch logs (#1362)

Frugality & the effort dial

  • feat(providers): reasoning-effort dial — effort-first investment lever (#1435)
  • chore(frugality): remove dead tier-router organs + effort-first model defaults (#1434)
  • fix(frugality): stop Fable-5-style seed AC over-atomization (#1432)

Reliability — jobs, processes & WAL

  • feat(mcp): reconcile zombie jobs whose owning process has died (#1373)
  • fix(#1419): terminal jobs leak codex worker + companion shell processes (#1425)
  • fix(mcp): stop orphaned servers and reclaim WAL; speed up session list (#1359)
  • fix(mcp): detect client death behind uvx, bound shutdown, drain jobs before store close (#1407)
  • fix(mcp): keep terminal job results after handle ttl (#1433); mark run results as unevaluated (#1437)
  • fix(#1422): honor worktrees for dirty delegated execution (#1428); honor Codex stream timeout overrides (#1387)
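
As an illustration of the zombie-job reconciliation idea in this list, the sketch below marks a running job as failed once its owning process no longer exists. The job dictionary shape, the owner_pid field, and the function names are hypothetical; the liveness probe relies on the POSIX convention that signal 0 only checks for existence.

```python
import os


def pid_alive(pid: int) -> bool:
    """POSIX-only liveness probe. Do not use on Windows, where os.kill
    with an arbitrary signal value terminates the target process."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence and permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def reconcile(jobs, is_alive=pid_alive):
    """Return a copy of jobs with orphaned running jobs marked failed."""
    reconciled = []
    for job in jobs:
        if job["status"] == "running" and not is_alive(job["owner_pid"]):
            job = {**job, "status": "failed", "reason": "owner_process_died"}
        reconciled.append(job)
    return reconciled
```

Passing the probe in as a parameter keeps the reconciliation rule testable without spawning real processes.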

Cleanup & refactoring

  • chore(cleanup): remove dead execution/secondary/routing packages — −14k LOC (#1410)
  • refactor(providers): consolidate cli-stream/child-env/kiro duplication, fix hermes unbounded buffer (#1409)
  • refactor(mcp): consolidate tool-layer duplication, fix durable cancel + lost audit rows (#1408)

Config, CLI & orchestrator

  • feat(cli): read-only job event polling (#1436); meaningful job_wait long-poll mode (#1426)
  • fix(config): make web port zero choose a free port (#1440); accept prose exit-condition lists (#1431)
  • feat(orchestrator): pace non-Claude delivery within a rate budget (#1372); plan delivery fan-out within concurrency limits (#1361); capability contracts: map skills to MCP tools, subagent orchestration, code-investigation + lateral persona metadata (#1365–#1371)
  • fix(orchestrator): tighten usage-limit pause classification (#1364); preserve typed error metadata on hermes failures (#1360)
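
For the "web port zero" fix above, the underlying operating-system behaviour is that binding to port 0 asks the kernel for any free port. The sketch below shows that mechanism only; bind_listener is a hypothetical helper, not the function Ouroboros uses.

```python
import socket


def bind_listener(host: str = "127.0.0.1", port: int = 0):
    """Bind a TCP listener. With port=0 the OS assigns a free port.

    Returns the bound socket together with the port actually in use.
    Keeping the socket open avoids the race where a probed port is
    taken by another process before the server starts.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock, sock.getsockname()[1]
```

A caller would report the returned port to the user instead of echoing the configured value of 0.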

Docs & maintenance

  • docs(rfc): frugality control loop, spend estimator, spend actuator (effort dial), atomicity & decomposition, configuration coherence, journey transparency (#1392, #1402–#1406)
  • chore(security): refresh & exact-pin deps, add Dependabot, harden release.yml (#1347); dependency & GitHub Actions bumps (#1348–#1358)

Shaped in the open

The frugality and usability arc of this release started as direction-check
discussions from @deepakdgupta1 — failure analysis and acceptance properties
posted before any code — and landed as the RFC series and the work above. Thank
you.

  • Theme: token frugality — waste-only, goal-subordinate, learned guardrails → RFC #1403
  • Investment case: the complexity→tier mapping behind frugality → RFC #1404 / #1405 — which concluded the answer was not a tier map but an effort dial (#1435)
  • Investment case: atomicity & AC decomposition reliability → RFC #1406, and the over-atomization fix (#1432)
  • Theme: making Ouroboros more usable — transparency → RFC #1402, journey transparency (#1392), and the ouroboros config GUI (#1416)

Welcome also to first-time contributors @Yeachan-Heo (the entire GJC runtime
stack), @sergiobuilds, and @deepakdgupta1.


What's Changed

  • chore(security): refresh & exact-pin deps, add Dependabot, harden release.yml by @Q00 in #1347
  • chore(deps): bump actions/download-artifact from 4 to 8 by @dependabot[bot] in #1348
  • chore(deps): bump actions/setup-python from 5 to 6 by @dependabot[bot] in #1350
  • chore(deps): bump astral-sh/setup-uv from 4 to 8.1.0 by @dependabot[bot] in #1351
  • chore(deps): bump actions/checkout from 4 to 6.0.2 by @dependabot[bot] in #1352
  • chore(deps): bump softprops/action-gh-release from 2 to 3 by @dependabot[bot] in #1353
  • chore(deps): bump actions/upload-artifact from 4 to 7 by @dependabot[bot] in #1354
  • chore(deps-dev): bump types-pyyaml from 6.0.12.20250915 to 6.0.12.20260518 by @dependabot[bot] in #1356
  • chore(deps): bump rich from 14.3.3 to 15.0.0 by @dependabot[bot] in #1357
  • chore(deps-dev): bump mypy from 1.19.1 to 2.1.0 by @dependabot[bot] in #1358
  • fix(orchestrator): preserve typed error metadata on hermes failures (usage-limit pause regression) by @deepakdgupta1 in #1360
  • feat(orchestrator): plan delivery fan-out within backend concurrency limits by @deepakdgupta1 in #1361
  • fix(mcp): stop orphaned servers and reclaim WAL; speed up session list by @Q00 in #1359
  • fix(cli): add pi backend switching and quiet dispatch logs by @Q00 in #1362
  • fix(orchestrator): tighten usage-limit pause classification by @Q00 in #1364
  • feat(orchestrator): [stack 1/7] map skills to MCP tools by @Q00 in #1365
  • feat(backends): [stack 2/7] describe subagent orchestration support by @Q00 in #1366
  • fix(evaluation): [stack 3/7] preserve mechanical diagnostic changes by @Q00 in #1367
  • feat(orchestrator): [stack 4/7] add owned tool capability contracts by @Q00 in #1368
  • feat(mcp): [stack 5/7] emit code-investigation metadata by @Q00 in #1369
  • feat(mcp): [stack 6/7] consume lateral persona metadata by @Q00 in #1370
  • test(orchestrator): [stack 7/7] cover capability review follow-ups by @Q00 in #1371
  • chore(deps): bump codecov/codecov-action from 3 to 6.0.1 by @dependabot[bot] in #1349
  • chore(deps): bump the python-minor-patch group across 1 directory with 10 updates by @dependabot[bot] in #1355
  • feat(orchestrator): pace non-Claude delivery within a configurable rate budget by @deepakdgupta1 in #1372
  • feat(mcp): reconcile zombie jobs whose owning process has died by @deepakdgupta1 in #1373
  • feat(orchestrator): surface non-native execution-parameter handling (#1374 rebased) by @Q00 in #1393
  • docs(rfc): journey transparency — state breadcrumb + TUI surfacing by @Q00 in #1392
  • docs(rfc): configuration coherence — surface config, make reload honest (#1376) by @deepakdgupta1 in #1402
  • docs(rfc): token frugality as one control loop — attribution + advisory guardrails (#1377) by @deepakdgupta1 in #1403
  • docs(rfc): the spend estimator — difficulty + stakes from measured inputs (#1384) by @DeepakDG...

v0.41.0

07 Jun 09:30

v0.41.0 — Run it anywhere, and trust what it ships

A week ago, ooo auto learned to finish the job on its own. This release makes
that autonomy something you can actually rely on: it runs on one more runtime,
it refuses to start building until the goal is unambiguous, and the verdict that
decides "is this actually done?" can no longer be gamed.

The headline

Autonomy is only worth as much as the trust behind it. v0.40.0 closed the loop —
goal in, product out. v0.41.0 spends its week hardening the two ends of that
loop and widening the floor it runs on.

  • Run it anywhere. Pi joins Claude, Codex, Gemini, OpenCode, Goose, and
    Copilot as a first-class runtime. Ouroboros stays the workflow engine; the
    runtime is a swappable kernel. Installing it got more reliable, and every
    default model pin now lives in one place.
  • Trust what it ships — at the input. The Socratic interview no longer thinks
    alone. At every ambiguity milestone it convenes a panel — a researcher, a
    contrarian, a simplifier — to surface hidden assumptions before the question
    reaches you. And ooo auto will not start building until the Seed is genuinely
    low-ambiguity and passes QA.
  • Trust what it ships — at the output. The verifier's verdict is now typed,
    audited, and routed by an explicit admission policy. A test that really ran but
    reported the wrong evidence form is no longer smeared as "fabrication," and a
    faked clean run still doesn't pass.

🖥️ Run it anywhere — the Agent OS gets a new kernel

Pi is now a first-class Ouroboros runtime. Ouroboros owns the workflow engine,
Seed decomposition, checkpointing, evaluation handoff, and ooo skill dispatch;
for each runtime task it shells out to pi --mode json and normalizes Pi's JSONL
events into Ouroboros AgentMessage values. As the new runtime guide puts it:
"Pi is an Ouroboros runtime" means the runtime is selectable — not that Pi is
imported into Ouroboros.
That is the whole Agent OS thesis in one sentence.

  • PiLLMAdapter for --llm-backend pi; pi / pi_cli registered as LLM- and
    interview-driver-capable in the backend registry and provider factory (#1326)
  • Pi backend-aware default-model normalization — default --llm-backend pi uses
    Pi's own backend default instead of forwarding an Anthropic model name (#1326)
  • Align the Pi runtime with documented JSON mode (#1321)
  • Report malformed Pi runtime events as a typed ProviderError instead of
    failing opaquely (#1325)
  • Wire the Pi runtime setup surface — ouroboros setup --runtime pi installs the
    managed Pi bridge (5c674c1)
  • Opt-in native Pi CLI smoke test for end-to-end confidence (#1329)
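
The normalization step described above can be sketched roughly as follows. The real runtime reads the output of pi --mode json; this sketch takes an iterable of lines so it stays self-contained. The event fields ("type", "text") and both class definitions are stand-ins assumed for illustration, since the notes name AgentMessage and ProviderError but do not show their shapes.

```python
import json
from dataclasses import dataclass


class ProviderError(Exception):
    """Stand-in for the typed error the release notes mention."""


@dataclass(frozen=True)
class AgentMessage:
    """Stand-in message type; the real fields are not given in the notes."""
    type: str
    content: str


def normalize_events(lines):
    """Turn JSONL event lines into AgentMessage values.

    Malformed input raises ProviderError with the offending line number
    instead of surfacing a bare JSON or KeyError traceback.
    """
    messages = []
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between events
        try:
            event = json.loads(raw)
        except json.JSONDecodeError as exc:
            raise ProviderError(
                f"malformed event on line {lineno}: {exc.msg}"
            ) from exc
        if not isinstance(event, dict) or "type" not in event:
            raise ProviderError(f"event on line {lineno} has no 'type' field")
        messages.append(
            AgentMessage(type=str(event["type"]), content=str(event.get("text", "")))
        )
    return messages
```

Because the adapter only consumes a line stream, the runtime stays a separate process that can be swapped without importing it.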

Installing and updating got more trustworthy. The week's two same-day releases
surfaced real install-path risk; this closes it.

  • Run setup with the freshly installed ouroboros binary, not a stale one
    left on PATH (#1345)
  • Installer UX improvements; pipx/pip install paths now preserve existing PATH
    precedence (#1343)

One source of truth for model pins. The same default model strings were
hand-copied across three layers, so Opus had silently been frozen at 4.6 since
February. They now live in a single _model_defaults.py.

  • Centralize every default Claude model pin into one source of truth
    (_model_defaults.py) and pin exact snapshots rather than the "default"
    sentinel, so evaluation/consensus grading stays reproducible. Net move: the
    Opus reasoning tier → 4.8 (interview, seed, ontology, evaluation, consensus
    advocate); the Sonnet judgment tier (qa_model) stays pinned at 4.6,
    retiring the dated claude-sonnet-4-20250514 (#1324, #1323)

Roadmap, in the open. A point-in-time AgentOS issue-sequencing graph
(Track A / B / C) is now published so you can see which merged PRs resolved which
roadmap tracks. #961 remains the canonical roadmap SSOT (#1293).

Housekeeping. Prune unused optional packages (#1301); pin typer before the
vendored click to stabilize resolution (#1300).


🧠 Trust what it ships — at the input: the interview stops thinking alone

Ouroboros has always opened with a single questioner. Now that questioner has a
panel. Milestone lateral review is promoted from a non-blocking advisory to a
required lightweight subagent pass at exactly the moments hidden assumptions
start to bite.

  • When an interview crosses an ambiguity milestone — initial → progress,
    progress → refined, refined → ready — the main session dispatches
    ouroboros_lateral_think with researcher, contrarian, and simplifier
    personas (adding architect when the answer changes system shape or ownership)
    before answering or asking the returned question (9d229c4)
  • This is the supported "deep research style" interview experience: multiple
    perspectives visibly help, while the final prompt stays easy to answer. Results
    are folded into 2–3 concrete options or one recommended draft — not dumped
    as a report
  • Lateral review also fires whenever the main session would otherwise compress a
    user's free-text into a decision, or when the question is about tradeoffs,
    priorities, non-goals, risk, success criteria, or rollout
  • run_lateral_review is now a declared interview capability, with per-runtime
    capability/instruction artifacts wired in (9d229c4)
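
The milestone rule in the list above can be sketched as a small pure function. The milestone and persona names come from these notes; the function name, the changes_system_shape flag, and the choice to treat a multi-step jump as a crossing are assumptions.

```python
MILESTONES = ("initial", "progress", "refined", "ready")
BASE_PERSONAS = ("researcher", "contrarian", "simplifier")


def personas_for_transition(previous, current, changes_system_shape=False):
    """Return the personas to consult for a milestone transition.

    An empty tuple means no forward milestone boundary was crossed, so
    no lateral review is needed for this turn.
    """
    if MILESTONES.index(current) <= MILESTONES.index(previous):
        return ()
    personas = list(BASE_PERSONAS)
    if changes_system_shape:
        # The notes add an architect when the answer changes system
        # shape or ownership.
        personas.append("architect")
    return tuple(personas)
```

For example, personas_for_transition("progress", "refined") yields the three base personas, while staying on "refined" yields none.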

ooo auto won't build something underspecified. The interview no longer
closes on ledger completeness alone.

  • Gate auto runs on backend-confirmed low ambiguity (≤ 0.20) plus a pre-run
    Seed QA pass for both the MCP and CLI entrypoints; QA findings feed back into
    bounded Seed-repair attempts before blocking, so failures are actionable and
    resumable (#1302)
  • Normalize natural worktree-policy names (e.g. create_isolated_worktree → always)
    and fail fast when complete_product=true is paired with a too-short timeout,
    instead of burning the budget in the interview and blocking late (#1305)
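
A rough sketch of the gate described above: block on ambiguity first, then run Seed QA with a bounded number of repair attempts. Only the 0.20 threshold comes from these notes. The GateResult type, the callback signatures, and the default of two repair attempts are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

AMBIGUITY_THRESHOLD = 0.20  # stated in the release notes


@dataclass
class GateResult:
    allowed: bool
    seed: Any
    findings: List[str] = field(default_factory=list)


def gate_auto_run(
    seed: Any,
    ambiguity: float,
    run_qa: Callable[[Any], List[str]],
    repair: Callable[[Any, List[str]], Any],
    max_repairs: int = 2,  # hypothetical bound
) -> GateResult:
    """Allow a run only for a low-ambiguity Seed that passes QA."""
    if ambiguity > AMBIGUITY_THRESHOLD:
        reason = f"ambiguity {ambiguity:.2f} exceeds {AMBIGUITY_THRESHOLD:.2f}"
        return GateResult(False, seed, [reason])
    findings = run_qa(seed)
    attempts = 0
    while findings and attempts < max_repairs:
        seed = repair(seed, findings)  # feed QA findings back into repair
        findings = run_qa(seed)
        attempts += 1
    # Remaining findings are returned so the failure is actionable.
    return GateResult(not findings, seed, findings)
```

Returning the last findings alongside the repaired Seed is what makes a blocked run resumable rather than a dead end.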

🛡️ Trust what it ships — at the output: a verdict you can't game

The more autonomous the loop, the more its "done" has to mean done. This release
makes the verifier's decision typed, auditable, and policy-routed (RFC #814,
Verdict Envelope v1).

  • Promote TraceGuard verdict admission into VerifierVerdict: H1 verifier
    output now carries a typed status, evidence refs, and a retry_admission, and
    ACCEPT / RETRY / REDISPATCH / ESCALATE_MODEL / ESCALATE_HUMAN / BLOCK decisions
    are persisted on atomic typed-evidence events (#1330)
    • Benchmark fixtures: accepted → ACCEPT, missing evidence → EVIDENCE_MISSING / RETRY, semantic miss → SCOPE_CREEP / REDISPATCH, repeated fabrication → FABRICATION_SUSPECTED / ESCALATE_MODEL
  • Prefer the verifier's retry-admission policy (H7): re-run the same leaf only
    when retry_admission=RETRY; honor intentional divergence between
    failure_class and retry_admission (e.g. FABRICATION_SUSPECTED +
    REDISPATCH) instead of inferring policy from the failure class alone (#1331)
  • Classify masked test evidence fairly (#1292): a transcript that clearly ran
    the test command but masked its status behind an output filter (… | tail) is
    now EVIDENCE_FORM_MISMATCH — retryable, with actionable feedback (e.g. add
    set -o pipefail) — rather than FABRICATION_SUSPECTED. The #1208 guard holds:
    unprotected output-filter pipelines still don't prove a clean commands_run
    claim. The verifier's evidence boundary is now codified in docs so core stays
    language- and runner-agnostic
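
To make the H7 rule above concrete, here is a minimal sketch of routing on the verifier's own retry_admission rather than on the failure class. The decision names and failure classes are taken from these notes, and the fallback table mirrors the benchmark fixtures. The VerifierVerdict field layout, the "accepted" status value, and the escalate-to-human default are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Admission(Enum):
    ACCEPT = "ACCEPT"
    RETRY = "RETRY"
    REDISPATCH = "REDISPATCH"
    ESCALATE_MODEL = "ESCALATE_MODEL"
    ESCALATE_HUMAN = "ESCALATE_HUMAN"
    BLOCK = "BLOCK"


# Used only when the verifier supplied no explicit admission.
FALLBACK_BY_FAILURE_CLASS = {
    "EVIDENCE_MISSING": Admission.RETRY,
    "EVIDENCE_FORM_MISMATCH": Admission.RETRY,
    "SCOPE_CREEP": Admission.REDISPATCH,
    "FABRICATION_SUSPECTED": Admission.ESCALATE_MODEL,
}


@dataclass(frozen=True)
class VerifierVerdict:
    status: str  # hypothetical values: "accepted" or "rejected"
    failure_class: Optional[str] = None
    retry_admission: Optional[Admission] = None
    evidence_refs: Tuple[str, ...] = ()


def route(verdict: VerifierVerdict) -> Admission:
    """Prefer the verifier's explicit policy over class-based inference."""
    if verdict.retry_admission is not None:
        return verdict.retry_admission
    if verdict.status == "accepted" and verdict.failure_class is None:
        return Admission.ACCEPT
    # Unknown or missing classes escalate to a human (an assumption).
    return FALLBACK_BY_FAILURE_CLASS.get(
        verdict.failure_class, Admission.ESCALATE_HUMAN
    )


def should_rerun_same_leaf(verdict: VerifierVerdict) -> bool:
    """Re-run the same leaf only when the routed decision is RETRY."""
    return route(verdict) is Admission.RETRY
```

With this shape, a verdict carrying FABRICATION_SUSPECTED plus REDISPATCH is redispatched, which matches the intentional divergence the notes describe.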

What's Changed

Runtimes & Agent OS

  • feat(providers): add Pi LLM adapter (#1326)
  • fix(pi): align runtime with documented JSON mode (#1321)
  • fix(pi): report malformed runtime events (#1325)
  • fix(setup): wire Pi runtime setup surface (5c674c1)
  • test(orchestrator): add opt-in Pi CLI smoke test (#1329)
  • fix(installer): prefer freshly installed ouroboros for setup (#1345)
  • feat(installer): improve install script UX (#1343)
  • refactor(config): centralize Claude model pins into a single source of truth (align to 4.8) (#1324)
  • fix(config): replace retiring qa_model default with claude-sonnet-4-6 (#1323)
  • chore(deps): prune unused optional packages (#1301)
  • fix(deps): pin typer before vendored click (#1300)
  • fix(opencode): cover Windows cleanup review blockers (#1320)
  • fix(goose): keep LLM completion calls profile-free (#1303)
  • fix(run): guard home dir in _detect_project_root_from_seed_path (#1313)

Interview (the philosophy layer)

  • feat(interview): dispatch lateral review at milestones (9d229c4)
  • fix(auto): gate runs on low-ambiguity seed QA (#1302)
  • Harden ooo auto policy aliases and timeout preflight (#1305)

Verifier & harness integrity

  • feat(harness): promote TraceGuard verdict admission (#1330, refs #814)
  • fix(h7): prefer verifier retry admission policy (#1331)
  • fix(orchestrator): classify masked test evidence forms (#1292, refs #1234)
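
Why a piped test command counts as masked evidence can be shown with plain shell semantics: a pipeline reports the exit status of its last command unless pipefail is set. The sketch below assumes bash and tail are available on PATH; false stands in for a failing test command, and exit_status is a hypothetical helper.

```python
import subprocess


def exit_status(script: str) -> int:
    """Run a script under bash and return its exit status."""
    return subprocess.run(["bash", "-c", script], capture_output=True).returncode


# The failing command's status is replaced by tail's success.
masked = exit_status("false | tail -n 1")

# With pipefail, the pipeline fails if any stage fails.
protected = exit_status("set -o pipefail; false | tail -n 1")

print(masked, protected)  # prints "0 1"
```

A transcript of the first form shows that a test command ran but cannot show that it passed, which is why the suggested fix is to add set -o pipefail.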

Docs

  • docs(providers): document Pi provider surfaces (#1327)
  • docs(runtime): fix shipped backend wording (#1332)
  • docs(agentos): add issue sequencing graph snapshot (#1293)
  • Verdict Envelope v1 RFC, verifier-evidence-policy, runtime-capability-matrix,
    Pi runtime guide, and contributing/key-patterns updates

What's Changed

  • fix(orchestrator): classify masked test evidence forms by @Q00 in #1292
  • docs(agentos): add issue sequencing graph snapshot by @Q00 in #1293
  • fix(deps): pin typer before vendored click by @Q00 in #1300
  • chore(deps): prune unused optional packages by @Q00 in #1301
  • fix(goose): keep LLM completion calls profile-free by @mdc2122 in #1303
  • fix(run): guard home dir in _detect_project_root_from_seed_path by @kenlin8827 in #1313
  • fix(opencode): cover Windo...

v0.40.1

30 May 13:59
@Q00

What's Changed

Bug Fixes

  • Include click as an installer runtime dependency (#1299)

Full Changelog: v0.40.0...v0.40.1

What's Changed

  • fix(installer): include click runtime dependency by @Q00 in #1299

v0.40.0

30 May 05:25
@Q00

v0.40.0 — ooo auto crosses the line

This is the release where ooo auto stops being a demo and becomes a machine that finishes your work.

You drop in a vague intention. The Socratic interview pins it down into a precise,
machine-checkable goal — and then the engine refuses to stop until that goal is
actually built, verified, and shipped. No babysitting. No "it drew up a plan
and gave up halfway." The loop owns the outcome, end to end.

This is not "generate a plan." This is goal in, product out — autonomously.

The headline

The interview no longer stops at understanding. It stops at done.

ooo auto specifies your goal and then drives the full
Interview → Seed → Execute → Evaluate loop on its own — and keeps going until
the goal is real. Then it goes a little further. This is a feature that runs
beyond the goal.

  • A seamless run, end to end. ooo auto is no longer a convenience wrapper
    around the steps — it's a single closed loop that takes your intent and carries
    it all the way to a shipped product. Interview lifecycle events stream into the
    EventStore, detached job tracking got real UX, and auto.product.emitted fires
    the moment Ralph hits a successful terminal — so you know it actually delivered.

  • An interview that specifies your goal AND finishes the job. The interview
    no longer just clarifies and quits. Closure ladders, auto_fill_remaining and
    partial_seed_from_evidence substrates, and safe-default synthesis mean a
    non-converging conversation still becomes a real, executable product instead of
    a dead end. Deadlines route through a closure ladder, not a terminal BLOCKED.

  • A loop that will not quit until your goal is done. Ralph persistence,
    wall-clock RuntimeControls + Watchdog, checkpoint-committed coding sessions,
    and oscillation detection routed through lateral UNSTUCK escalation keep the
    engine grinding through stalls, recovery, and dead patches until verification
    actually passes — not until it gets tired.

  • Beyond the goal. This is the headline: the system is now built to run past
    the point where you stop watching. You set the goal; it carries the goal to
    completion on its own. That capability did not exist before this release —
    a feature beyond the goal.


What's Changed

ooo auto — autonomous completion

  • Emit auto.product.emitted on Ralph success terminal (#1297)
  • auto_fill_remaining substrate for non-converging interviews (#1296)
  • Interview deadline → closure ladder, not terminal BLOCKED (#1270)
  • partial_seed_from_evidence substrate (#1269)
  • Degraded seed → partial product terminal (#1271)
  • Isolate coding sessions with checkpoint commits (#1281)
  • Wire interview lifecycle events to EventStore (#1260)
  • Make ledger_done the primary interview closure check (#1252)
  • Safe-default lateral escalation substrate + matcher-fire routing (#1250, #1251)
  • Route Ralph oscillation_detected through UNSTUCK_LATERAL (#1175)
  • Propagate closure_mode to seed grading (#1265)
  • Safe-default closure mode + partial-unsafe blocker code (#1167)
  • Ledger-derived task-class inference + task-class catalog (#1173, #1177)
  • Additive assumption_sources provenance surface (#1169)
  • Surface defaulted_sections in AutoPipelineResult (#1146)
  • Canonical stop_reason_code for interview-layer blockers (#1151)
  • Relay auto interview questions in progress output (#1284)
  • Improve detached auto job tracking UX (#1286)

Runtime, orchestrator & acceptance

  • Runtime acceptance evidence (L3-1) (#1181)
  • RuntimeControls + wall-clock Watchdog runner (L2-1) (#1178)
  • Canonical acceptance harness skeleton (L0-a) (#1174)
  • AgentOS health-readiness table + release-readiness triage (#1282)
  • Plugin v0.4 schema + tool-call hook type promotion (#1277)
  • Plumb force flag through ouroboros_generate_seed (#1158)

Bug Fixes

  • Prevent timezone comparison TypeError when merging events (#1298)
  • Stop blocking language/runtime/greenfield interview questions (#1295)
  • Size Ralph per-iteration timeout to pipeline budget, not 1800s default (#1294)
  • Require terminal run evidence before Ralph (#1279)
  • Clean up synchronous complete-product runs (#1280)
  • Close backend-ready fallback resumes (#1278)
  • Synthesize seed when authoring backend is unavailable (#1261)
  • Make L1 CLI predicate match on goal signal alone (#1264)
  • Make watchdog controls replay safe (#1207)
  • Clear stale stop reason codes / expose active task class in MCP meta (#1194, #1196)
  • Honor prompt-declared non_goals in unsafe-context matcher (#1221)
  • Use .ouroboros marker and existence gate for project_dir resolution (#1246)
  • Resolve project dirs for central seeds (#1161)
  • Migrate Codex setup profiles to profile-v2; accept official Rust CLI binaries (#1268, #1162)
  • Correlate verifier diagnostics; credit transcript test commands for tests_passed (#1198, #1166)
  • Match command claims wrapped in output redirection/pager pipes (#1168)
  • Tolerate gradle verifier evidence; extend AC stall watchdog budget (#1238, #1233)
  • Register and wire the qa command (#1230)
  • Preserve MCP tool metadata across transport (#1210)
  • Bound workflow lifecycle reason_code/refs and nesting depth (#1144)
  • Guard workflow_ir aggregate type against raw append (#1147)
  • Sanitize Windows-reserved chars in checkpoint seed_id (#1156)
  • Enforce runtime artifact env; keep plugin artifact hooks self-contained (#1197, #1206)
  • Project blocked plugin invocations as workflow failures (#1215)
  • Prompt required grants during plugin install (#1209)
  • Keep fallback/installer installs stable-only; align plugin version fallback with hatch-vcs (#1217, #1216)
  • Reject status-masking test pipelines (#1208)
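
The first fix in the list above concerns a common Python pitfall: ordering a naive datetime against a timezone-aware one raises TypeError. The sketch below shows one way to merge event streams safely. Treating naive timestamps as UTC is an assumption made here, and as_utc and merge_events are hypothetical names, not the project's functions.

```python
from datetime import datetime, timezone


def as_utc(ts: datetime) -> datetime:
    """Normalize a timestamp to aware UTC.

    Naive values are assumed to already be in UTC (an assumption).
    """
    if ts.tzinfo is None or ts.tzinfo.utcoffset(ts) is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


def merge_events(*streams):
    """Merge event streams into one list ordered by normalized time."""
    merged = [event for stream in streams for event in stream]
    return sorted(merged, key=lambda event: as_utc(event["timestamp"]))
```

Normalizing inside the sort key leaves the stored timestamps untouched, so the merge does not rewrite event data.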

Docs, Tests & Maintenance

  • AgentOS profile taxonomy, ControlJournal, IR ↔ projection mapping contracts locked (#1275, #1274, #1150)
  • Plugin artifact/state and before/after tool-call hook contracts defined (#1276, #1145)
  • Canonical regression coverage for closure ladder, runtime probes, live-run evidence (#1272, #1222, #1195)
  • CI: enforce ooo auto R-run section per RFC #1256 §I5 (#1259)
  • Make Hermes runtime cwd assertions cross-platform (#1288)

Full Changelog: v0.39.1...v0.40.0

What's Changed

  • fix(persistence): sanitize Windows-reserved chars in checkpoint seed_id (fixes #1155) by @Jun-0913 in #1156
  • feat(auto): canonical stop_reason_code for interview-layer blockers by @shaun0927 in #1151
  • fix(auto): close interview on ledger-only consensus at max_rounds by @shaun0927 in #1148
  • feat(auto): surface defaulted_sections in AutoPipelineResult by @shaun0927 in #1146
  • feat(auto): safe-default closure mode + partial-unsafe blocker code (PR-B2) by @shaun0927 in #1167
  • feat(auto): additive assumption_sources provenance surface (PR-C2) by @shaun0927 in #1169
  • fix(hook): treat configured installs without prefs.json as returning users by @lifrary in #1152
  • feat(auto): task-class catalog data (L1-a) by @shaun0927 in #1173
  • feat(mcp): plumb force flag through ouroboros_generate_seed by @hooni0918 in #1158
  • feat(auto): route Ralph oscillation_detected through UNSTUCK_LATERAL (L5-a) by @shaun0927 in #1175
  • fix(orchestrator): credit transcript test commands for tests_passed claims by @nkjunbc in #1166
  • fix(orchestrator): match command claims wrapped in output redirection/pager pipes by @nkjunbc in #1168
  • feat(tests): canonical acceptance harness skeleton (L0-a) by @shaun0927 in #1174
  • feat(auto): ledger-derived task-class inference (L1-b) by @shaun0927 in #1177
  • feat(runtime): RuntimeControls + wall-clock Watchdog runner (L2-1) by @shaun0927 in #1178
  • feat(orchestrator): runtime acceptance evidence (L3-1) by @shaun0927 in #1181
  • fix(plugin): accept command-level AgentOS metadata by @shaun0927 in #1180
  • test(canonical): emit L0 summary and lock fixture failures by @shaun0927 in #1182
  • docs(auto): align L1 task-class follow-up labels by @shaun0927 in #1183
  • docs(auto): align convergence contract with safe-default closure by @shaun0927 in #1184
  • fix(auto): surface assumption_sources through envelope clients by @shaun0927 in #1185
  • fix(orchestrator): preserve shell-preamble command proof after output plumbing by @shaun0927 in #1186
  • fix(auto): route Ralph oscillation replay through UNSTUCK_LATERAL by @shaun0927 in #1187
  • feat(auto): Seed AC injection + active_task_class envelope (L1-d, L1-e) by @shaun0927 in #1188
  • feat(auto): wall-clock watchdog integration in AutoPipeline (L2-2) by @shaun0927 in #1189
  • feat(auto): runtime-probe envelope + advisory probe_runner (L3-2) by @shaun0927 in #1190
  • feat(tests): L0 live-wire + L1 catalog cross-validate (P1) by @shaun0927 in #1191
  • Keep plugin runtime writes outside trusted homes by @shaun0927 in #1193
  • fix(tests): align canonical live-run evidence by @Q00 in #1195
  • fix(auto): expose active ...

v0.39.1

20 May 10:56
@Q00

What's Changed

Features

  • Add ouroboros status run --json projection surface (#1133)
  • Record durable workflow lifecycle events in orchestrator (#1134)
  • Add on_error/on_cancel plugin observability hooks (PR E) (#1137)
  • Expose MCP interview reasoning metadata (#1140)
  • Prompt for required trust grants on plugin install (#1141)
  • Expose Ralph-start alias while preserving runtime ownership
  • Dispatch lifecycle hooks within plugin trust boundaries
  • Make plugin permission waits share the typed HITL contract
  • Expose projection checkpoint anchors safely
  • Expose plugin manifests as harness descriptors
  • Let safe-default synthesis close persisted interviews
  • Surface malformed Claude tool-use turns at the provider boundary

Bug Fixes

  • Defer lateral advisory side effects in interview (#1130)
  • Make plugin workflow ids collision-proof
  • Advise first live milestone crossing in interview
  • Make auto ledger conflicts deterministic
  • Preserve bounded recovery redispatch semantics
  • Validate HITL timeout decisions through replayed state
  • Keep safe defaults tied to persisted interviews

Testing & Hardening

  • Expand workflow IR conformance harness (#1135)
  • Add mechanical-evaluation projection fixture (#1132)
  • Lock plugin lifecycle conformance baseline
  • Lock the short-goal interview convergence matrix against regression
  • Lock projection fixture evidence flow

Instrumentation & Docs

  • Emit structured-log events at safe-default decision points in auto (#1138)
  • Mark completed projection follow-up slots in agentos docs (#1136)
  • Persist init interview HITL telemetry without coupling the renderer
  • Record interview lateral-review design before implementation
  • Update README

Full Changelog: v0.39.0...v0.39.1

What's Changed

  • docs: define interview milestone lateral contract by @honor2030 in #1108
  • feat(plugin): add hook runtime audit schema names by @shaun0927 in #1109
  • fix(runtime): surface malformed tool-use turns by @shaun0927 in #1111
  • feat(hitl): record init interview responses by @shaun0927 in #1112
  • feat(mcp): add start ralph tool alias by @shaun0927 in #1113
  • feat(hitl): validate timeout events from replay by @shaun0927 in #1114
  • Document runtime delegation ownership contract by @shaun0927 in #1115
  • Specify plugin permission HITL contract by @shaun0927 in #1116
  • test(auto): cover #821 short-goal interview convergence matrix by @shaun0927 in #1117
  • feat(plugin): dispatch v1 lifecycle hooks by @shaun0927 in #1110
  • feat(plugin): expose manifest descriptor projection by @shaun0927 in #1118
  • feat(auto): consume lateral recovery plans for Ralph redispatch by @shaun0927 in #1120
  • feat(auto): centralize deterministic ledger conflict policy by @shaun0927 in #1121
  • test(plugin): lock v0.3 lifecycle conformance by @shaun0927 in #1119
  • fix(auto): close safe-defaultable interview gaps at max rounds by @shaun0927 in #1122
  • test(projection): lock mechanical evaluation fixture by @shaun0927 in #1123
  • feat(projection): surface context checkpoint anchors by @shaun0927 in #1124
  • test(workflow): lock projection boundary fixture by @shaun0927 in #1125
  • feat(plugin): classify terminal hook contract by @shaun0927 in #1127
  • feat(interview): surface milestone lateral review advisories by @shaun0927 in #1128
  • feat(workflow): represent plugin actions as planned nodes by @shaun0927 in #1126
  • fix(auto): let safe-default synthesis close interviews by @shaun0927 in #1129
  • fix(interview): defer lateral advisory side effects by @shaun0927 in #1130
  • feat(plugin): prompt for required trust grants on install by @Q00 in #1141
  • docs(agentos): mark completed projection follow-up slots by @shaun0927 in #1136
  • test(harness): add mechanical-evaluation projection fixture by @shaun0927 in #1132
  • instrument(auto): emit structured-log events at safe-default decision points by @shaun0927 in #1138
  • Expose MCP interview reasoning metadata by @Q00 in #1140
  • feat(cli): add ouroboros status run --json projection surface by @shaun0927 in #1133
  • feat(orchestrator): record durable workflow lifecycle events by @shaun0927 in #1134
  • feat(plugin): add on_error/on_cancel observability hooks (PR E) by @shaun0927 in #1137
  • test(orchestrator): expand workflow IR conformance harness by @shaun0927 in #1135

Full Changelog: v0.39.0...v0.39.1

v0.39.0

18 May 08:34
@Q00 Q00

Ouroboros v0.39.0

This release lands a high-severity security fix, flips ooo run to the
fat-harness execution path by default, and completes the AgentOS roadmap
wiring/baseline milestone tracked in #961.

🔒 Security

RCE via untrusted project-directory .env (high severity)

Ouroboros is run inside cloned repositories. config/loader.py loaded
./.env from the working directory into os.environ at import time with the
same trust as the home-directory ~/.ouroboros/.env. Because
OUROBOROS_*_CLI_PATH and the runtime/backend selector env vars decide which
binary the Claude Agent SDK / runtime adapters spawn, a malicious repository
could ship a .env plus an executable script and achieve arbitrary code
execution on the victim's machine as soon as they ran any command that builds
a runtime adapter (e.g. ooo, ouroboros init).

  • Classification: CWE-426 (Untrusted Search Path) + CWE-15 (External
    Control of System or Configuration Setting)
  • Root cause: the project-directory .env travels with whatever
    repository the user cloned and is therefore untrusted input, but it
    was loaded with the same trust as the home config.

Fixes:

  • Denylist for untrusted .env (#1078):
    blocks the 8 OUROBOROS_*_CLI_PATH keys plus the runtime/backend selectors
    (OUROBOROS_AGENT_RUNTIME, OUROBOROS_RUNTIME, OUROBOROS_LLM_BACKEND)
    when loading an untrusted .env.
  • Fail-closed default: _load_env_file now defaults to trusted=False;
    only ~/.ouroboros/.env opts into trust explicitly, so any future caller is
    safe by default.
  • Defense in depth: ClaudeCodeAdapter._resolve_cli_path rejects any
    resolved CLI path inside the current working directory and falls back to the
    SDK bundled CLI — a legitimate Claude CLI is always a global install, never
    shipped inside a repo.
  • Additional hardening: block PATH from untrusted project env
    (#1098) and refuse symlinked
    managed install roots (#1097).
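
The denylist and the fail-closed default listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in config/loader.py: the function name filter_env, the prefix/suffix match standing in for the 8 concrete OUROBOROS_*_CLI_PATH keys, and the example key OUROBOROS_EXAMPLE_CLI_PATH are hypothetical. Only the three selector names and PATH come from the notes above.

```python
from collections.abc import Mapping

# Selector keys named in the release notes, plus PATH (#1098).
_BLOCKED_KEYS = frozenset(
    {
        "OUROBOROS_AGENT_RUNTIME",
        "OUROBOROS_RUNTIME",
        "OUROBOROS_LLM_BACKEND",
        "PATH",
    }
)


def is_blocked(key: str) -> bool:
    """Return True if an untrusted .env must not be allowed to set this key."""
    name = key.upper()
    # Assumption: the real denylist enumerates 8 explicit keys; a
    # prefix/suffix match is used here as a stand-in for that list.
    is_cli_path = name.startswith("OUROBOROS_") and name.endswith("_CLI_PATH")
    return is_cli_path or name in _BLOCKED_KEYS


def filter_env(pairs: Mapping[str, str], *, trusted: bool = False) -> dict[str, str]:
    """Drop denylisted keys unless the caller explicitly opts into trust.

    trusted defaults to False, so a new caller that forgets the flag
    gets the safe behaviour.
    """
    if trusted:
        return dict(pairs)
    return {k: v for k, v in pairs.items() if not is_blocked(k)}


project_env = {
    "OUROBOROS_EXAMPLE_CLI_PATH": "./run-me.sh",  # hypothetical key name
    "OUROBOROS_RUNTIME": "attacker-choice",
    "PATH": "/tmp/evil",
    "LOG_LEVEL": "debug",
}
safe = filter_env(project_env)                # project .env: filtered
full = filter_env(project_env, trusted=True)  # home .env: kept whole
```

With this shape, only the loader call for ~/.ouroboros/.env would pass trusted=True; every other call site inherits the filtered behaviour.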

Trusted sources (shell export, ~/.ouroboros/.env, ~/.ouroboros/config.yaml)
keep full custom-CLI support, so no legitimate workflow regresses. The fix
was adversarially reviewed by a security-focused agent over two rounds
(round 2 returned APPROVED with no remaining bypasses).
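
The defense-in-depth check from the fix list (reject a CLI path that resolves inside the working directory and fall back to the bundled CLI) might look something like the sketch below. The function resolve_cli_path and its parameters are hypothetical and are not the signature of ClaudeCodeAdapter._resolve_cli_path. The sketch also checks the path both before and after symlink resolution, which is an assumption about how strict the real check is.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_cli_path(configured: Optional[str], cwd: Path, bundled: Path) -> Path:
    """Return the CLI to spawn, refusing anything located inside cwd.

    configured: CLI path from the environment or config, if any.
    cwd:        working directory (possibly an untrusted cloned repo).
    bundled:    fallback CLI shipped with the SDK.
    """
    if not configured:
        return bundled

    root = cwd.resolve()
    # Interpret relative paths against cwd; normpath collapses ".." parts.
    lexical = Path(os.path.normpath(root / configured))
    resolved = lexical.resolve()

    # Reject if either the literal location or the symlink target is in cwd.
    if lexical.is_relative_to(root) or resolved.is_relative_to(root):
        return bundled
    return resolved
```

For example, resolve_cli_path("./claude", Path.cwd(), bundled) returns bundled, while an absolute path to a global install outside the working directory is returned unchanged apart from symlink resolution.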

🙏 Reported by @qerogram — thank you for the responsible disclosure.

🚀 AgentOS Roadmap Progress (#961)

The AgentOS substrate wiring + baseline milestone is now complete.

Track A — ooo run fat-harness

  • ooo run CLI now defaults to the fat-harness execution path
  • Verifier-capability, typed blocked evidence, profile-aware decomposition, and
    profile-schema wiring landed; fat-harness AC acceptance now requires verifier
    PASS with typed evidence verification
  • Baseline gate evidence captured and recorded; #961 carries
    baseline-metrics-captured and the agentos-substrate-wiring milestone is
    closed
  • Readable baseline-metrics rendering + semantic-miss baseline metric reporting

Track B — ooo auto self-healing

  • Phase 2 typed recovery plan and Phase 3 DomainProfile merged
  • Hardened auto: Seed goal-drift repair from the ledger, strict grading with
    concrete coding evidence, observation/execution acceptance-criteria
    separation, and complete-product Ralph-loop wiring

Track C — AgentOS substrate dump (#920–#960)

  • Workflow IR v1 lifecycle replay, conformance fixtures, and projection
    hardening against ambiguous run identity
  • Plugin lifecycle hook permission scope, v1 hook vocabulary, and bounded
    Tier 1 hook contract surface
  • HITL state projection, run-snapshot projection, typed HITL resume
    validation, and cancel-confirmation routing through typed events
  • Runtime transition contract validation (fail-closed on incomplete revision
    checks, malformed input rejection, secret-alias detection)
  • Skill runtime guides installable for Hermes/Claude/Codex from backend metadata

✨ Features

  • ooo run CLI flipped to fat-harness by default (with temporary opt-in path
    during rollout)
  • CLI: read-only Workflow IR inspection and status run projection JSON
    (#1063,
    #1064)
  • CLI: status health checks (#1101)
  • Harness: strict projection records, project artifact/verdict records
    (#1061)
  • Codex: live MCP doctor check (#1047),
    missing-MCP-extra detection (#1046),
    JSONL stdio for live MCP doctor (#1052)
  • Orchestrator: workflow lifecycle conformance report
    (#1038),
    HITL state projection (#1036),
    run snapshot projection (#1037)
  • Experimental Goose runtime can be enabled safely

🐛 Bug Fixes

  • Orchestrator: prevent execution workers from recursively invoking auto
    (#1075), recover from invalid
    dependency stages (#1070),
    reconcile sibling ACs from execution evidence
    (#1096)
  • Auto: surface execution terminal failures instead of reporting complete
    (#1076), canonicalize
    observation execution criteria (#1095),
    keep repaired Seed identifiers synchronized
    (#1071)
  • Jobs: preserve runner failure over terminal evidence
    (#1094), fail stalled
    progress-accounting executions (#1085),
    wait for runner cleanup after progress-stall failure
    (#1089)
  • Interview: scope completion-signal heuristic to user-prefix answers
    (#1077)
  • Goose: preserve approval for default permission modes
    (#1106)
  • Evidence scope hardening for observation/docs-only ACs
    (#1072,
    #1073,
    #1093)
  • Bigbang: add force flag to SeedGenerator.generate, replacing the
    FORCED_SCORE_VALUE hack (#1107)

📚 Docs & Maintenance

  • Clarify Windows WSL installation path
  • Align contributing documentation guidance
    (#1102)
  • AgentOS: sequence projection follow-up slots, clarify Workflow IR v1 boundary
  • Remove legacy self-report acceptance fallback
    (#1086) and unreachable verifier
    branch

Full Changelog: v0.38.2...v0.39.0

What's Changed

  • feat(orchestrator): add fat-harness baseline metrics report by @honor2030 in #977
  • feat(plugin): define hook audit event vocabulary by @shaun0927 in #973
  • feat(runtime): classify malformed tool-use turns by @shaun0927 in #972
  • feat(plugin): accept optional hook declarations by @shaun0927 in #970
  • docs(plugin): define lifecycle hook contract by @shaun0927 in #969
  • feat(orchestrator): support typed blocked leaf evidence by @shaun0927 in #927
  • feat(profiles): introduce profile YAML schema + loader by @honor2030 in #976
  • feat(hitl): add typed WAIT/RESUME contract by @shaun0927 in #971
  • feat(plugin): add v1 lifecycle hook contract types (#939) by @shaun0927 in #984
  • feat(plugin): enforce v1 hook contract in manifest validator (#939 PR-2) by @shaun0927 in #985
  • feat(plugin): add schema v0.3 with v1-only hook enum (#939 PR-3) by @shaun0927 in #986
  • feat(plugin): add v1 hook lifecycle permission scope (#939 PR-4) by @shaun0927 in #987
  • feat(orchestrator): add human-readable baseline metrics formatter by @shaun0927 in #988
  • feat(orchestrator): record fat-harness baseline metrics evidence by @shaun0927 in #989
  • feat(harness): add Run/Step/Artifact/Verdict projection records (#946 PR-1a) by @shaun0927 in #980
  • feat(harness): add ProjectionBuilder over the EventStore (#946 PR-1b) by @shaun0927 in #983
  • feat(harness): add journal → evidence-manifest normalizer (#978 P1) by @shaun0927 in #982
  • feat(orchestrator): add typed Workflow IR schema and validator (#956 PR-1) by @shaun0927 in #981
  • feat(harness): expose projection records through MCP query (#946 PR-2) by @shaun0927 in #990
  • feat(orchestrator): add read-only Seed to Workflow IR adapter (#956 PR-2) by @shaun0927 in #991
  • feat(orchestrator): audit profile-aware AC decomposition (#920 PR-1) by @shaun0927 in #992
  • feat(harness): load AC manifests for TraceGuard deliver ga...

v0.38.2

13 May 02:59
@Q00 Q00

What's Changed

Bug Fixes

  • Close residual allowed_tools=[] leak in sub-CLI envelope for interview

Testing

  • Lock empty allowedTools passthrough
  • Cover strict empty allowed-tools envelope (#975)

Full Changelog: v0.38.1...v0.38.2

What's Changed

  • fix(interview): close residual allowed_tools=[] leak in sub-CLI envelope by @Q00 in #974

v0.38.1

12 May 23:42
@Q00 Q00

What's Changed

Features

  • Persist typed recovery plans after QA failure (#928)
  • Let decomposition consume execution profiles (#929)
  • Route verifiers by profile capability (#926)

Bug Fixes

  • Mutual-agreement closure gate for interview driver (#962)

Full Changelog: v0.38.0...v0.38.1

What's Changed

  • fix(auto): mutual-agreement closure gate for interview driver by @Q00 in #962
  • feat(orchestrator): route verifiers by profile capability by @shaun0927 in #926
  • feat(orchestrator): let decomposition consume execution profiles by @shaun0927 in #929
  • feat(auto): persist typed recovery plans after QA failure by @shaun0927 in #928

v0.38.0

12 May 22:27
@Q00 Q00

What's Changed

This release wraps up the #830 Orchestrator stack (9 PRs), the #809 P3 DomainProfile rollout (coding + research profiles wired through ooo auto), and the #518 AgentProcess durability work. It also brings a major round of security/safety hardening across plugin trust, secret redaction, and subprocess bounding.

Features

Orchestrator (#830 stack, PRs 1/9 → 9/9)

  • Profile YAML schema + loader (#881)
  • Typed evidence schema validator (#883)
  • External verifier loop (#884)
  • Profile-aware decomposition params (#885)
  • PRE/POST phase wrappers (#886)
  • Failure taxonomy + recovery policy (#887)
  • Adaptive model/tool routing (#889)
  • Per-dispatch context budget (#890)
  • ProfileBackedStrategy + deprecate code-executor.md (PR 9/9)

DomainProfile (#809 P3 stack)

  • First built-in coding DomainProfile + parity tests (#851)
  • Second built-in research DomainProfile + plurality acceptance (#850)
  • 3-step DomainProfile activation in ooo auto CLI (#852)
  • Route AutoAnswerer through DomainProfile (#854)
  • Route safe_defaults through DomainProfile (#853)
  • Recovery-loop guards (#888)

AgentProcess & Evolution (#518, #578)

  • Durable pause/resume for AgentProcess via CheckpointStore (#844)
  • Wrap evolve_step in AgentProcess (#846)
  • Map watchdog timeouts onto Directive vocabulary (#836)
  • Emit control.directive.emitted from watchdog timeouts (#838)

MCP & Auto

  • ouroboros_start_evaluate fire-and-forget handler (#882)
  • Unified status surface for auto + ralph (#792)

Bug Fixes

Orchestrator (#891 stack)

  • Wire H3 wrappers into ProfileBackedStrategy
  • Per-profile Bash activity semantics
  • Direct executor through every AC, not just first
  • Replace build_post_block reuse with multi-AC directive
  • Preserve legacy domain guidance in system prompt
  • Derive guidance tool list from profile
  • Single consolidated evidence record + blocker in JSON
  • Drop blocker-marker contract until H2 schema lands
  • Strip deprecation banner from live code-executor prompt

MCP & Auto

  • Harden start_auto session exclusivity
  • Bump interview/seed phase timeouts; exempt user_preferences from shell-metachar scan (#894)
  • Restore coding DomainProfile lightweight loading (#879)
  • Restore coding profile lazy import boundary (#875)
  • Bound encoded Seed filenames (#878)

AgentProcess durability

  • Keep AgentProcess cancel durable until restart observes it (#845)
  • Prevent false terminal cancellation for live AgentProcess work (#880)
  • Preserve AgentProcess replay across lifecycle slices (#847)

Security & Hardening

  • Redact secret-shaped event resource payloads (#866)
  • Avoid persisting full Codex auth paths in failure events (#864)
  • Make trust and disable transitions atomic in CLI/plugin (#868)
  • Bound firewall subprocess invocation time (#858)

Other

  • Preserve raw JSON success fallback in copilot (#877)
  • Ignore telemetry JSON in copilot success fallback (#870)

Refactoring

  • Replace hardcoded model strings with config-aware getters in PM (#893)

Testing

  • Register integration pytest marker (#896)
  • Full Interview→Seed→Run→Ralph→QA E2E integration test (#793)
  • Isolate codex_cli profile tests from user config

Documentation

  • RFC: unified runtime timeout contract (#578) (#841)
  • Clarify stable Python source checkout setup (#876, #874)

Full Changelog: v0.37.0...v0.38.0

What's Changed

  • test(orchestrator): isolate codex_cli profile tests from user config by @Q00 in #872
  • docs(rfc): unified runtime timeout contract (#578) by @shaun0927 in #841
  • feat(auto): 3-step DomainProfile activation in ooo auto CLI (#809 P3, PR 3/6) by @shaun0927 in #852
  • fix(plugin): bound firewall subprocess invocation time by @shaun0927 in #858
  • fix(cli/plugin): make trust and disable transitions atomic by @shaun0927 in #868
  • fix(security): redact secret-shaped event resource payloads by @shaun0927 in #866
  • fix(interview): avoid persisting full Codex auth paths in failure events by @shaun0927 in #864
  • feat(evolution): map watchdog timeouts onto Directive vocabulary (#578) by @shaun0927 in #836
  • feat(orchestrator): durable pause/resume for AgentProcess via CheckpointStore (#518) by @shaun0927 in #844
  • feat(evolution): wrap evolve_step in AgentProcess (#518) by @shaun0927 in #846
  • feat(orchestrator): implement AgentProcess.replay() from control directive events (#518) by @shaun0927 in #847
  • feat(auto): route safe_defaults through DomainProfile (#809 P3, PR 5/6) by @shaun0927 in #853
  • feat(auto+jobs): unified status surface for auto + ralph by @shaun0927 in #792
  • docs: clarify source checkout Python defaults by @shaun0927 in #874
  • feat(auto): second built-in research DomainProfile + plurality acceptance (#809 P3, PR 6/6) by @shaun0927 in #850
  • feat(auto): route AutoAnswerer through DomainProfile (#809 P3, PR 4/6) by @shaun0927 in #854
  • fix(copilot): ignore telemetry JSON in success fallback by @shaun0927 in #870
  • feat(evolution): emit control.directive.emitted from watchdog timeouts (#578) by @shaun0927 in #838
  • test(integration): full Interview→Seed→Run→Ralph→QA E2E by @shaun0927 in #793
  • feat(auto): first built-in coding DomainProfile + parity tests (#809 P3, PR 2/6) by @shaun0927 in #851
  • fix(auto): restore coding DomainProfile lightweight loading by @shaun0927 in #879
  • feat(auto): recovery-loop guards (#809 P2.2b, Stack 1/2) by @Q00 in #888
  • docs: clarify stable Python source checkout setup by @honor2030 in #876
  • fix(copilot): preserve raw JSON success fallback by @shaun0927 in #877
  • fix(auto): bound encoded Seed filenames by @shaun0927 in #878
  • fix(auto): restore coding profile lazy import boundary by @shaun0927 in #875
  • fix(orchestrator): keep AgentProcess cancellation owned until work exits by @shaun0927 in #880
  • feat(orchestrator): profile YAML schema + loader (#830 PR 1/9) by @Q00 in #881
  • feat(mcp): add ouroboros_start_evaluate fire-and-forget handler by @Q00 in #882
  • feat(orchestrator): typed evidence schema validator (#830 PR 2/9) by @Q00 in #883
  • feat(orchestrator): durable cancel signal for AgentProcess (#518) by @shaun0927 in #845
  • feat(orchestrator): external verifier loop (#830 PR 3/9) by @Q00 in #884
  • feat(orchestrator): profile-aware decomposition params (#830 PR 4/9) by @Q00 in #885
  • feat(orchestrator): PRE/POST phase wrappers (#830 PR 5/9) by @Q00 in #886
  • feat(orchestrator): failure taxonomy + recovery policy (#830 PR 6/9) by @Q00 in #887
  • feat(orchestrator): adaptive model/tool routing (#830 PR 7/9) by @Q00 in #889
  • feat(orchestrator): per-dispatch context budget (#830 PR 8/9) by @Q00 in #890
  • test: register integration pytest marker by @Q00 in #896
  • fix(auto): bump interview/seed phase timeouts and exempt user_preferences from shell-metachar scan by @Q00 in #894
  • refactor(pm): replace hardcoded model strings with config-aware getters by @cohemm in #893
  • feat(orchestrator): ProfileBackedStrategy + deprecate code-executor.md (#830 PR 9/9) by @Q00 in #891
  • feat(auto): fire-and-forget ouroboros_start_auto + relax user_preferences value types by @Q00 in #895

v0.37.0

11 May 16:19
@Q00 Q00

What's Changed

Features

ooo auto Pipeline

  • DomainProfile and VerifiablePredicate contracts (#849, #809 P3 PR 1/6)
  • UNSTUCK_LATERAL persona advisor on EVALUATE fail (#829)
  • EVALUATE phase verifies run output against seed AC (#825)
  • Formalize run-handoff idempotency contract (#843)
  • Chain RUN→RALPH automatically with --complete-product (#791)
  • user_preference source + deterministic ambiguity floor (#811)
  • Top-level pipeline_timeout_seconds deadline (#790)
  • Steer interviews toward open ledger gaps (#761)
  • Finalize safe-default interview gaps (#763)
  • Classify interview questions by intent (#762)
  • Expose ledger provenance as ledger_provenance in pipeline result meta (#740)
  • CI lint guard for ooo auto product boundary (#753)

Interview & Unstuck

  • Debate mode for ooo lateral (#812)
  • Raise prompt budget caps for richer answers
  • Isolate adapter from plugin MCP servers + hardening RFC

Ralph & Evolution

  • Total wall-clock budget max_total_seconds for Ralph (#789)
  • Oscillation / no-progress detection in Ralph (#788)
  • Pin v0 watchdog cancellation contract (#842)

Plugin & CLI

  • TrustStore concurrency primitives + LockEntry subject helper + manifest tuple ordering (#807)
  • UserLevel program registry: cross-axis collisions + command-name index (#747)
  • argv_summary in firewall audit events (observation-only) (#805)
  • ooo plugin {discover,inspect,list} read-only commands (#750)
  • Warn on stderr when ooo plugin list row has unreadable trust.json (#833)
  • Surface trust_read_error in ooo plugin list --json (#832)
  • Route ooo publish / ooo resume-session keywords via hook (#742)

MCP

  • Diagnostic event for interview response shape (#837)
  • Structured envelope for interview length-guard branch (#834)

Bug Fixes

Interview

  • Close parent-context leaks in sub-CLI envelope (#869)
  • Close Restate gate bypass for short PATH 2 answers (#827)
  • Scope strict MCP isolation
  • Reserve CLI adapter prompt headroom
  • Keep interview prompt budget below CLI failure ceiling
  • Budget interview prompts with serialized CLI framing

Security & Plugin Firewall

  • Contain auto Seed persistence paths (#865)
  • Prevent argv secret leaks across firewall outputs (#857)
  • Fail-closed on tampered plugin home + refuse legacy trust under subject contract (#808)
  • Escape all C0/DEL chars in lockfile TOML basic strings (#795)
  • Deep-copy audit event in unwrap_plugin_event (#796)
  • Defensive name validation + tighten source schema (#746)
  • Degrade row on corrupt trust.json instead of aborting list (#798)
  • Tighten _word_boundary_match to reject hyphen as token edge (#800)
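
As an illustration of the lockfile escaping fix (#795), here is a minimal sketch of a TOML basic-string encoder that escapes every C0 control character and DEL. The function name toml_basic_string is hypothetical and this is not the project's implementation. It follows the TOML rule that quotation marks, backslashes, and control characters must be escaped inside basic strings; tab is escaped too, which TOML permits but does not require.

```python
# Short escapes that TOML defines for basic strings.
_SHORT_ESCAPES = {
    "\b": "\\b",
    "\t": "\\t",
    "\n": "\\n",
    "\f": "\\f",
    "\r": "\\r",
    '"': '\\"',
    "\\": "\\\\",
}


def toml_basic_string(value: str) -> str:
    """Encode value as a double-quoted TOML basic string."""
    out = []
    for ch in value:
        code = ord(ch)
        if ch in _SHORT_ESCAPES:
            out.append(_SHORT_ESCAPES[ch])
        elif code < 0x20 or code == 0x7F:
            # Remaining C0 controls and DEL have no short form: use \uXXXX.
            out.append(f"\\u{code:04X}")
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'
```

Escaping the whole C0 range rather than a hand-picked subset means a name containing, say, NUL or ESC cannot produce a lockfile that a TOML parser rejects or misreads.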

Auto / Ralph

  • Bound retry on run_handoff_status="unknown" with idempotency-key (#787)
  • SeedRepairer.converge() adds max_iterations + outer wait_for (#785)
  • NFKC-normalize unsafe-context input before regex bank (#794)
  • Exact-match the canonical key in safe-default rollback (#804)
  • Per-iteration wall-clock timeout for Ralph (#784)
  • Close tool envelope on max_turns=1 to stop turn starvation (#770)
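
The NFKC fix in the list above (#794) addresses a general problem: compatibility characters such as fullwidth letters can slip past ASCII-oriented regexes. A minimal sketch of normalizing before matching follows. The pattern and the function looks_unsafe are hypothetical stand-ins for the project's actual regex bank.

```python
import re
import unicodedata

# Hypothetical single-entry "regex bank"; the real patterns are not shown
# in these notes.
_UNSAFE_PATTERN = re.compile(r"\brm\s+-rf\b", re.IGNORECASE)


def looks_unsafe(text: str) -> bool:
    """Match against the NFKC-normalized form of the input."""
    normalized = unicodedata.normalize("NFKC", text)
    return _UNSAFE_PATTERN.search(normalized) is not None


# Fullwidth "rm" followed by a fullwidth hyphen-minus and ASCII "rf".
disguised = "\uff52\uff4d \uff0drf /"
raw_hit = _UNSAFE_PATTERN.search(disguised) is not None  # no normalization
normalized_hit = looks_unsafe(disguised)                 # with normalization
```

Here raw_hit is False and normalized_hit is True, because NFKC maps the fullwidth characters to their ASCII equivalents before the pattern runs.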

Providers & Misc

  • Isolate subprocess from host plugin env (#754)
  • Skip symlinks in check-auto-boundary scan (#797)
  • Keep Copilot completions from leaking tool events (#860)

Refactoring

  • Extract material-progress taxonomy module (#839)
  • max_turns=1 envelope sweep across remaining MCP sites (#786)

Testing

  • Pin three-surface AgentProcess acceptance contract (#848)
  • Pin watchdog resume/replay contract (#840)
  • Widen test_ralph_handler_returns_job_id_and_completes_loop deadline to 60s
  • End-to-end contract proof with github-pr-ops fixture (#752)
  • Define interview convergence contract (#760)
  • Guard interview prompt cap against CLI ceiling

Documentation

  • Forward complete_product / pipeline_timeout in skills/auto SKILL.md (#820)
  • Unify interview Step 9 payload schema + define Add-context retry (#828)
  • Add Refine and Restate gates to interview SKILL.md (+ multiple follow-up refinements)
  • Mark interview-hardening RFC as Accepted
  • Broaden uv install guidance for policy-restricted environments (#768)
  • Update version numbers in welcome skill (#810)

Full Changelog: v0.36.0...v0.37.0

What's Changed

  • fix(providers,interview): isolate subprocess from host plugin env by @ASak1104 in #754
  • feat(auto): CI lint guard for ooo auto product boundary by @shaun0927 in #753
  • test(auto): define interview convergence contract by @shaun0927 in #760
  • feat(auto): steer interviews toward open ledger gaps by @shaun0927 in #761
  • feat(cli): ooo plugin {discover,inspect,list} (read-only) by @shaun0927 in #750
  • feat(hook): route 'ooo publish' and 'ooo resume-session' keywords by @shaun0927 in #742
  • feat(auto): expose ledger provenance in pipeline result meta as ledger_provenance (#640) by @shaun0927 in #740
  • feat(plugin): lockfile + per-user trust store by @shaun0927 in #746
  • feat(auto): classify interview questions by intent by @shaun0927 in #762
  • feat(auto): finalize safe default interview gaps by @shaun0927 in #763
  • feat(plugin): UserLevel program registry by @shaun0927 in #747
  • docs(install): broaden uv install guidance for policy-restricted environments by @shaun0927 in #768
  • fix(mcp,interview): close tool envelope on max_turns=1 to stop turn starvation by @shaun0927 in #770
  • feat(plugin): add argv_summary to firewall audit events (observation-only) by @Q00 in #805
  • fix(auto): exact-match the canonical key in safe-default rollback by @Q00 in #804
  • fix(hook): tighten _word_boundary_match to reject hyphen as token edge by @Q00 in #800
  • fix(cli/plugin): degrade row on corrupt trust.json instead of aborting list by @Q00 in #798
  • fix(plugin): deep-copy audit event in unwrap_plugin_event by @Q00 in #796
  • fix(auto): NFKC-normalize unsafe-context input before regex bank by @Q00 in #794
  • fix(ralph): per-iteration wall-clock timeout by @shaun0927 in #784
  • fix(auto): SeedRepairer.converge() add max_iterations + outer wait_for by @shaun0927 in #785
  • refactor(mcp): max_turns=1 envelope sweep across remaining sites by @shaun0927 in #786
  • fix(auto): bound retry on run_handoff_status="unknown" with idempotency-key by @shaun0927 in #787
  • feat(ralph): oscillation / no-progress detection by @shaun0927 in #788
  • feat(ralph): total wall-clock budget max_total_seconds by @shaun0927 in #789
  • fix(plugin): escape all C0/DEL chars in lockfile TOML basic strings by @Q00 in #795
  • fix(scripts): skip symlinks in check-auto-boundary scan by @Q00 in #797
  • feat(auto): top-level pipeline_timeout_seconds deadline by @shaun0927 in #790
  • test(plugin): end-to-end contract proof with github-pr-ops fixture by @shaun0927 in #752
  • Fix stale welcomeVersion hardcoded in welcome skill by @adam0white in #810
  • feat(plugin): TrustStore concurrency primitives + LockEntry subject helper + manifest tuple ordering by @shaun0927 in #807
  • fix(plugin/firewall): fail-closed on tampered plugin home + refuse legacy trust under subject contract by @shaun0927 in #808
  • feat(auto): user_preference source + deterministic ambiguity floor (#809 P1) by @Q00 in #811
  • feat(auto): chain RUN→RALPH automatically with --complete-product by @shaun0927 in #791
  • feat(interview): isolate adapter from plugin MCP servers + RFC by @Q00 in #822
  • feat(interview): raise prompt budget caps for richer answers by @Q00 in #823
  • docs(interview): add Refine and Restate gates to SKILL.md by @Q00 in #824
  • docs(rfc): mark interview-hardening RFC as Accepted by @Q00 in #826
  • feat(auto): EVALUATE phase verifies run output against seed AC (#809 P2.1) by @Q00 in #825
  • feat(auto): UNSTUCK_LATERAL persona advisor on EVALUATE fail (#809 P2.2) by @Q00 in #829
  • docs(interview): unify Step 9 payload schema and define Add context retry by @shaun0927 in #828
  • feat(cli/plugin): surface trust_read_error in ooo plugin list --json (#806) by @shaun0927 in #832
  • feat(mcp): structured envelope for interview length-guard branch (#831) by @shaun0927 in #834
  • docs(skills/auto): forward complete_product/pipeline_timeout in SKILL.md by @shaun0927 in #820
  • refactor(evolution): extract material-progress taxonomy module (#578) by @shaun0927 in #839
  • test(evolution): pin watchdog resume/replay contract (#578) by @shaun0927 in https://github.com/Q...