Prerelease RC artifacts are published on their tag page, for example [`v6.1.0-rc.1`](https://github.com/joi-lab/ouroboros-desktop/releases/tag/v6.1.0-rc.1); `/releases/latest` intentionally stays on the latest stable release.
35
+
34
36
<p align="center">
35
37
<img src="assets/setup.png" width="500" alt="Drag Ouroboros.app to install">
36
38
</p>
@@ -473,6 +475,7 @@ not paraphrase it.
473
475
474
476
| Version | Date | Description |
475
477
|---------|------|-------------|
478
+
| 6.1.0-rc.1 | 2026-05-25 |**rc(runtime): harden live subagent handoff, isolation, and UI lineage.** Adds effective task-status SSOT, real bounded wait tools including `wait_for_tasks`, forged subagent ingress rejection, strict local-readonly constraints, DNS fail-closed browser isolation, child-drive mailbox routing/retention, web_search source attribution, lineage-aware cost observability, threaded child cards, and focused regressions. |
476
479
| 6.0.0 | 2026-05-25 |**major(runtime): add live local-readonly subagents.** Upgrades `schedule_task` to a strict child-task contract, runs leaf subagents through the existing queue and workers with forked memory by default, enforces schema and execute-time local-readonly isolation, preserves full task-result handoff, and documents the delegation review rules. |
477
480
| 5.33.0-rc.6 | 2026-05-24 |**rc(gateway): prevent masking upload connection/parse faults as size-limit errors.** Introduces a typed ChatUploadPayloadTooLarge exception class to isolate file-size 413 blocks from connection cuts and form-parse faults, returning a standard 400 with original message for ASGI/socket errors. Includes focused test coverage. |
478
481
| 5.33.0-rc.5 | 2026-05-24 |**rc(gateway): prevent masking upload connection/parse faults as size-limit errors.** Refactors the chat upload ASGI stream wrapper to verify if caught exceptions are indeed the 'oversized' signal before returning a 413, returning a 400 with the original error message for connection cuts and malformed formats. |
`drive_root`, `budget_drive_root`, and `task_constraint`.
255
+
`drive_root`, `child_drive_root`, `budget_drive_root`, and `task_constraint`.
256
+
`task_status.py` is the effective-status SSOT for gateway and tool reads: a
257
+
child terminal result overrides a stale parent `requested`/`scheduled`/`running`
258
+
result, while authoritative parent terminal failures/cancellations stay
259
+
authoritative. Workspace artifact tasks stay nonterminal while
260
+
`artifact_status` is `pending`/`finalizing`; only `ready`/`failed` artifact
261
+
states make the effective workspace result terminal. `wait_for_task` performs a
262
+
bounded wait (default 180s), and `wait_for_tasks` performs batch waits (default
263
+
600s) with full per-child result, trace, and cost output preserved untruncated.
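
The effective-status rules above can be sketched roughly as follows. This is a minimal illustration, not the contents of `task_status.py`: the function name `effective_status`, the terminal status names (`completed`, `failed`, `cancelled`), and the choice to report `running` while an artifact is still pending are all assumptions made for the example.

```python
from typing import Optional

# Status names below are partly assumed: only requested/scheduled/running
# and the artifact states are named in the text above.
STALE_PARENT = {"requested", "scheduled", "running"}
TERMINAL = {"completed", "failed", "cancelled"}
PARENT_AUTHORITATIVE = {"failed", "cancelled"}
ARTIFACT_NONTERMINAL = {"pending", "finalizing"}


def effective_status(
    parent_status: str,
    child_status: Optional[str] = None,
    artifact_status: Optional[str] = None,
) -> str:
    """Hypothetical resolver combining parent, child, and artifact state."""
    # A terminal parent failure/cancellation is never overridden.
    if parent_status in PARENT_AUTHORITATIVE:
        return parent_status

    status = parent_status
    # A terminal child result wins over a stale, nonterminal parent record.
    if child_status in TERMINAL and parent_status in STALE_PARENT:
        status = child_status

    # Workspace artifact tasks stay nonterminal until the artifact is
    # ready or failed (reporting "running" here is an assumption).
    if artifact_status in ARTIFACT_NONTERMINAL and status in TERMINAL:
        return "running"
    return status
```

Under these assumptions, a parent record stuck at `running` with a finished child resolves to the child's result, unless a workspace artifact is still `pending` or `finalizing`.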
253
264
254
265
Live subagents run with deterministic
255
266
`task_constraint.mode="local_readonly_subagent"`. The registry filters their
@@ -261,21 +272,33 @@ is unchanged for normal tasks, but subagents additionally deny known
261
272
secret/control files such as `settings.json`, token/credential/key files, and
262
273
secret-like owner-state paths. Browser tools remain available for remote-page
263
274
inspection, but subagents fail closed instead of auto-installing browser
264
-
dependencies and cannot browse or act on loopback/local or non-HTTP URLs, make
265
-
browser subrequests to loopback/local URLs, or run arbitrary browser JavaScript.
275
+
dependencies and cannot browse or act on non-HTTP(S), loopback, private,
276
+
link-local, reserved, or unresolved hosts. The guard checks literal IPs and DNS
277
+
results before navigation, after redirects, and in route handlers, so hostnames
278
+
resolving to blocked addresses are denied. This is a URL/DNS-layer guard, not a
279
+
connect-time proxy; hostile DNS rebinding would need a future resolver-pinning
280
+
or proxy design if stronger network isolation is required. Subagents also
281
+
cannot run arbitrary browser JavaScript.
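
A URL/DNS-layer guard of the kind described above might look like the sketch below. It is an illustration under stated assumptions, not the project's implementation: `is_url_allowed`, `_ip_is_blocked`, and the injectable `resolve` parameter are hypothetical names, and the exact set of blocked address classes is inferred from the paragraph. Only standard-library calls (`ipaddress`, `socket.getaddrinfo`, `urllib.parse.urlsplit`) are used.

```python
import ipaddress
import socket
from typing import Callable, List, Optional
from urllib.parse import urlsplit


def _ip_is_blocked(ip: str) -> bool:
    """True for loopback, private, link-local, reserved, or similar addresses."""
    addr = ipaddress.ip_address(ip.split("%", 1)[0])  # drop any IPv6 zone id
    return (
        addr.is_loopback
        or addr.is_private
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_unspecified
        or addr.is_multicast
    )


def _resolve(host: str) -> List[str]:
    """Resolve a hostname to address strings using the system resolver."""
    infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


def is_url_allowed(
    url: str, resolve: Optional[Callable[[str], List[str]]] = None
) -> bool:
    """Fail closed: any doubt about scheme, host, or DNS result denies."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False
    if parts.scheme not in ("http", "https") or not host:
        return False
    try:
        return not _ip_is_blocked(host)  # literal IP: decide immediately
    except ValueError:
        pass  # not a literal IP, so fall through to DNS
    try:
        addresses = (resolve or _resolve)(host)
    except OSError:
        return False  # unresolved hosts are denied
    if not addresses:
        return False
    return not any(_ip_is_blocked(a) for a in addresses)
```

As the text notes, a check like this runs before the connection is made, so it would need to be repeated after redirects and in route handlers, and it does not by itself defend against hostile DNS rebinding.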
266
282
267
283
`memory_mode=forked` is the default and uses the same child-drive mechanism as
268
284
headless workspaces: copy stable memory seed files only (`identity.md`,
269
285
`WORLD.md`, `registry.md`, `knowledge/`) into
270
286
`data/state/headless_tasks/<task_id>/data`, without dialogue history, scratchpad
271
287
blocks, task history, or auto-merge. `empty` creates a blank child drive.
272
-
`shared` keeps the parent drive and should be used only when shared local state
273
-
is an explicit parent decision. On completion, only the child task result is
274
-
copied back to the parent drive; identity, scratchpad, registry, knowledge,
275
-
dialogue blocks, and `memory_export` are never merged or exported
288
+
`shared` is rejected for live local subagents and external workspace tasks; a
289
+
future sanitized shared mode must be designed separately. On completion, only
290
+
the child task result is copied back to the parent drive; identity, scratchpad,
291
+
registry, knowledge, dialogue blocks, and `memory_export` are never merged or exported
276
292
automatically. v1 subagents are leaf workers: the schema and execute-time gate
277
293
hide and block `schedule_task`, while the supervisor keeps a structural depth
278
-
cap of 2 and a maximum of 3 active child tasks per `root_task_id`.
294
+
cap of 2 and a maximum of 3 active child tasks per `root_task_id`. External
295
+
`/api/tasks` and CLI `run` requests may not forge
296
+
`delegation_role=subagent` or parent/root lineage; only the internal
297
+
`schedule_task` event path can create live subagents. Startup performs a
298
+
best-effort prune of terminal copied
299
+
back child drives under `state/headless_tasks/` after the retention window
300
+
(default 7 days, env/settings override), and skips nonterminal or artifact
301
+
finalization states.
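
The forged-ingress rule above could be enforced with a check along these lines. This is a hedged sketch only: `reject_forged_subagent` and `ForgedSubagentRequest` are hypothetical names, and treating `parent_task_id` and `root_task_id` as the lineage fields is an assumption based on the identifiers mentioned elsewhere in this document.

```python
from typing import Any, Mapping

# Assumed lineage field names; external callers may not set them.
LINEAGE_FIELDS = ("parent_task_id", "root_task_id")


class ForgedSubagentRequest(ValueError):
    """Raised when an external request claims subagent identity or lineage."""


def reject_forged_subagent(payload: Mapping[str, Any]) -> None:
    """Validate a payload from an external entry point (API or CLI).

    Only the internal scheduling path is allowed to create subagents,
    so any externally supplied role or lineage is treated as forged.
    """
    if payload.get("delegation_role") == "subagent":
        raise ForgedSubagentRequest("delegation_role=subagent is internal-only")
    forged = [name for name in LINEAGE_FIELDS if payload.get(name) is not None]
    if forged:
        raise ForgedSubagentRequest(
            "external requests may not set lineage: " + ", ".join(forged)
        )
```

Rejecting the request outright, rather than silently stripping the fields, keeps a forged attempt visible to the caller and to logs.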
279
302
280
303
### Two-process model
281
304
@@ -411,7 +434,7 @@ Shown when `settings.json` does not contain any supported remote provider key an
411
434
Web onboarding uses `/api/claude-code/status` and `/api/claude-code/install`.
412
435
- The wizard blocks progression if nothing runnable is configured.
413
436
- When OpenRouter is absent and official OpenAI is the only configured remote runtime, untouched default model values are auto-remapped to `openai::gpt-5.5` / `openai::gpt-5.5-mini` so first-run startup does not strand the app on OpenRouter-only defaults.
414
-
- `web_search` uses the official OpenAI Responses API only. It requires `OPENAI_API_KEY` and treats any non-empty `OPENAI_BASE_URL` as an incompatible custom runtime configuration rather than a fallback.
437
+
- `web_search` uses the official OpenAI Responses API only. It requires `OPENAI_API_KEY` and treats any non-empty `OPENAI_BASE_URL` as an incompatible custom runtime configuration rather than a fallback. Results are JSON with `answer` and `sources[]` when citation annotations are available; usage events include task/root/parent/delegation attribution and `source=web_search`.
415
438
- When Cloud.ru is the only configured remote runtime, first-run model defaults use explicit `cloudru::...` IDs from `provider_models.CLOUDRU_DIRECT_DEFAULTS`; OpenAI-compatible remains an explicit model-selection flow from the full Settings page because there is no single safe universal default model ID for arbitrary compatible endpoints.
416
439
- Closing the wizard without saving is non-fatal: the main app still launches and the user can finish configuration in Settings.
417
440
@@ -514,7 +537,7 @@ Rationale: frontend work should not require understanding supervisor, worker, ma
514
537
515
538
### Chat
516
539
517
-
`web/modules/chat.js` owns the message timeline, input, attachment staging, input recall, budget pill, runtime controls, and live task card. It loads persisted history from `/api/chat/history`, merges echoed local messages by `client_message_id`, and collapses task/progress/tool chatter into expandable cards rather than transcript spam. Mobile keyboard handling lives in `web/app.js` + CSS `keyboard-open` classes so only the message pane scrolls while the visual viewport changes.
540
+
`web/modules/chat.js` owns the message timeline, input, attachment staging, input recall, budget pill, runtime controls, and live task cards. It loads persisted history from `/api/chat/history`, merges echoed local messages by `client_message_id`, and collapses task/progress/tool chatter into expandable cards rather than transcript spam. Subagent progress uses separate child cards keyed by `subagent_task_id`/`task_id`; parent cards receive lineage references (`parent_task_id`, `root_task_id`, child id, role) without duplicating child bubbles on reload/reconnect. Mobile keyboard handling lives in `web/app.js` + CSS `keyboard-open` classes so only the message pane scrolls while the visual viewport changes.
518
541
519
542
History sync is intentionally two-pass: progress/system entries are replayed first to build live-card timelines, then regular user/assistant messages call `finishLiveCard`. This prevents `taskState.completed` from being set before progress events apply, which previously discarded thinking-bubble/live-card state.
docs/DEVELOPMENT.md (23 additions & 5 deletions)
@@ -123,7 +123,7 @@ Derived from P7 (Minimalism): entire codebase fits in one context window.
123
123
- Module hard gate: 1600 lines for non-grandfathered modules in `tests/test_smoke.py`. Grandfathered (`GRANDFATHERED_OVERSIZED_MODULES` in `ouroboros/review.py`): `llm.py`, `claude_advisory_review.py`, `review_state.py`, `server.py`, and temporary v5.7.1 debt `git.py` — split deferred until each surface stabilises, with `git.py` expected to pay down in the next tools pass.
124
124
- Method target: <150 lines. Crossing that line is a decomposition signal, not an automatic failure by itself.
125
125
- Method hard gate: 300 lines in `tests/test_smoke.py`.
126
-
- Codebase-wide function-count hard gate: enforced by `tests/test_smoke.py` against the value defined in `ouroboros/review.py::MAX_TOTAL_FUNCTIONS` (currently 2250; single source of truth — bump the constant when adding a feature with an explicit comment justifying the increase).
126
+
- Codebase-wide function-count hard gate: enforced by `tests/test_smoke.py` against the value defined in `ouroboros/review.py::MAX_TOTAL_FUNCTIONS` (currently 2275; single source of truth — bump the constant when adding a feature with an explicit comment justifying the increase).
127
127
- Function parameters: <8.
128
128
- Net complexity growth per cycle approaches zero.
129
129
- If a feature is not used in the current cycle — it is premature.
@@ -252,7 +252,7 @@ Before every commit, verify the following:
252
252
#### Module Size & Complexity
253
253
- [ ] Module stays near one context window (~1000 lines target; 1600 hard gate unless explicitly grandfathered debt)
254
254
- [ ] No method exceeds the practical target (150 lines) or the hard gate (300 lines)
255
-
- [ ] Total Python function count stays under the current smoke hard gate (currently 2250; consult `ouroboros/review.py::MAX_TOTAL_FUNCTIONS` for the active value; bump with a comment if a feature requires more headroom)
255
+
- [ ] Total Python function count stays under the current smoke hard gate (currently 2275; consult `ouroboros/review.py::MAX_TOTAL_FUNCTIONS` for the active value; bump with a comment if a feature requires more headroom)
256
256
- [ ] No function has more than 8 parameters
257
257
- [ ] No gratuitous abstract layers (Bible P7)
258
258
@@ -275,19 +275,37 @@ Before every commit, verify the following:
275
275
`role`, `context`, `constraints`, and `memory_mode` are optional. Do not
276
276
reintroduce public `parent_task_id` or `description` arguments; lineage comes
277
277
from `ToolContext`.
278
+
- Live `memory_mode=shared` is disabled. Keep `forked` and `empty` as the only
279
+
live subagent modes unless a later design adds sanitized shared-context v2.
280
+
- External `/api/tasks` and CLI requests must reject forged
281
+
`delegation_role=subagent`; only `schedule_task` may create subagents.
278
282
- `task_constraint.mode="local_readonly_subagent"` must be enforced twice:
279
283
schema discovery exposes only the local-readonly allowlist, and registry
280
284
execution rejects forbidden calls even when invoked manually.
285
+
- `task_constraint` boolean parsing must be strict; strings such as `"false"`
286
+
are false, never truthy through Python's `bool("false")`.
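
The strict-parsing rule in the last bullet guards against a common Python pitfall: `bool("false")` is `True` because any non-empty string is truthy. A minimal sketch of a strict parser follows; the name `parse_strict_bool` and the decision to accept only real booleans and the strings `"true"`/`"false"` are assumptions for illustration, not the project's actual helper.

```python
from typing import Any


def parse_strict_bool(value: Any) -> bool:
    """Parse a constraint flag without relying on Python truthiness.

    Accepts real booleans and the case-insensitive strings "true" and
    "false" (an assumed whitelist); everything else is rejected so a
    malformed flag can never silently enable a permission.
    """
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered == "true":
            return True
        if lowered == "false":
            return False
    raise ValueError(f"not a strict boolean: {value!r}")
```

Raising on unknown input, instead of defaulting to `False` or `True`, is one possible design choice; it surfaces bad constraint payloads early rather than guessing the caller's intent.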