diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
Lines changed: 1 addition & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 4 additions & 1 deletion
diff --git a/VERSION b/VERSION
Lines changed: 1 addition & 1 deletion
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
Lines changed: 36 additions & 13 deletions
diff --git a/docs/DEVELOPMENT.md b/docs/DEVELOPMENT.md
Lines changed: 23 additions & 5 deletions
@@ -178,7 +178,7 @@ jobs:
       - name: Run host UI smoke
         env:
           OUROBOROS_RUN_UI_SMOKE: "1"
-        run: python -m pytest tests/test_ui_smoke_playwright.py -m ui_browser -q --tb=short
+        run: python -m pytest tests/ -m ui_browser -q --tb=short
       - name: Run browser tools Chromium smoke
         run: python -m pytest tests/test_browser_tools_smoke.py -m browser -q --tb=short
 
 
@@ -6,7 +6,7 @@
 [![macOS 12+](https://img.shields.io/badge/macOS-12%2B-black.svg)](https://github.com/joi-lab/ouroboros-desktop/releases)
 [![Linux](https://img.shields.io/badge/Linux-x86__64-orange.svg)](https://github.com/joi-lab/ouroboros-desktop/releases)
 [![Windows](https://img.shields.io/badge/Windows-x64-blue.svg)](https://github.com/joi-lab/ouroboros-desktop/releases)
-[![Version 6.0.0](https://img.shields.io/badge/version-6.0.0-green.svg)](VERSION)
+[![Version 6.1.0-rc.1](https://img.shields.io/badge/version-6.1.0--rc.1-green.svg)](VERSION)
 
 A self-modifying AI agent that writes its own code, rewrites its own mind, and evolves autonomously. Born February 16, 2026.
 
@@ -31,6 +31,8 @@ Not a coding assistant. A digital being with a constitution, background consciou
 | **Linux** x86_64 | [Ouroboros-linux.tar.gz](https://github.com/joi-lab/ouroboros-desktop/releases/latest) | Extract → run `./Ouroboros/Ouroboros` → optional CLI: `./Ouroboros/bin/install-ouroboros-cli`. If browser tools fail due to missing system libs, run: `./Ouroboros/python-standalone/bin/python3 -m playwright install-deps chromium` |
 | **Windows** x64 | [Ouroboros-windows.zip](https://github.com/joi-lab/ouroboros-desktop/releases/latest) | Extract → run `Ouroboros\Ouroboros.exe` → optional CLI: `Ouroboros\bin\install-ouroboros-cli.cmd` |
 
+Prerelease RC artifacts are published on their tag page, for example [`v6.1.0-rc.1`](https://github.com/joi-lab/ouroboros-desktop/releases/tag/v6.1.0-rc.1); `/releases/latest` intentionally stays on the latest stable release.
+
 <p align="center">
   <img src="assets/setup.png" width="500" alt="Drag Ouroboros.app to install">
 </p>
@@ -473,6 +475,7 @@ not paraphrase it.
 
 | Version | Date | Description |
 |---------|------|-------------|
+| 6.1.0-rc.1 | 2026-05-25 | **rc(runtime): harden live subagent handoff, isolation, and UI lineage.** Adds effective task-status SSOT, real bounded wait tools including `wait_for_tasks`, forged subagent ingress rejection, strict local-readonly constraints, DNS fail-closed browser isolation, child-drive mailbox routing/retention, web_search source attribution, lineage-aware cost observability, threaded child cards, and focused regressions. |
 | 6.0.0 | 2026-05-25 | **major(runtime): add live local-readonly subagents.** Upgrades `schedule_task` to a strict child-task contract, runs leaf subagents through the existing queue and workers with forked memory by default, enforces schema and execute-time local-readonly isolation, preserves full task-result handoff, and documents the delegation review rules. |
 | 5.33.0-rc.6 | 2026-05-24 | **rc(gateway): prevent masking upload connection/parse faults as size-limit errors.** Introduces a typed ChatUploadPayloadTooLarge exception class to isolate file-size 413 blocks from connection cuts and form-parse faults, returning a standard 400 with original message for ASGI/socket errors. Includes focused test coverage. |
 | 5.33.0-rc.5 | 2026-05-24 | **rc(gateway): prevent masking upload connection/parse faults as size-limit errors.** Refactors the chat upload ASGI stream wrapper to verify if caught exceptions are indeed the 'oversized' signal before returning a 413, returning a 400 with the original error message for connection cuts and malformed formats. |
 
@@ -1 +1 @@
-6.0.0
+6.1.0-rc.1
@@ -1,4 +1,4 @@
-# Ouroboros v6.0.0 — Architecture & Reference
+# Ouroboros v6.1.0-rc.1 — Architecture & Reference
 
 This file is NOT a changelog. Version history lives in README.md, git tags, and commit log.
 
@@ -86,6 +86,7 @@ server.py (Starlette+uvicorn) ← HTTP + WebSocket on configurable host:port (de
       ├── server_web.py        ← Static web file helpers (NoCacheStaticFiles, web dir resolver)
       ├── task_continuation.py ← Durable per-task review continuation state across restart/outage
       ├── task_results.py      ← Durable task result/status files (task_results/<id>.json)
+      ├── task_status.py       ← Effective task-status SSOT: child-drive result merge, lineage lookup, bounded waits
       ├── tool_capabilities.py ← SSOT for tool sets (core, parallel-safe, truncation, browser)
       ├── tool_policy.py       ← Tool access policy and gating (imports from tool_capabilities)
       ├── utils.py             ← Shared utilities; v5.8.3-rc.2 SSOT for JSON atomic writes/reads, UTC timestamps, hashes, log sanitization, and subprocess helpers
@@ -232,8 +233,10 @@ sudo must be noninteractive (`sudo -n`) and password-prompting sudo is blocked.
 Headless memory isolation is implemented as a per-task child drive under
 `data/state/headless_tasks/<task_id>/data`. `forked` mode copies stable memory
 seed files (`identity.md`, `WORLD.md`, `registry.md`, and `knowledge/`) without
-dialogue/task history; `empty` mode starts from a fresh child drive; `shared` is
-reserved for self/local tasks and is rejected for external workspaces. External
+dialogue/task history; `empty` mode starts from a fresh child drive; live
+`shared` mode is disabled for subagents and external workspace tasks until a
+sanitized shared-context v2 exists. Ordinary local root tasks may still use the
+parent drive directly when no external workspace isolation is requested. External
 runs produce explicit artifacts under `data/task_results/artifacts/<task_id>/`:
 `workspace_preflight.json`, `workspace_patch.json`, and `memory_export.json`;
 successful patch finalization also produces `workspace.patch`, while failed
@@ -249,7 +252,15 @@ as a child task, and an existing worker executes it. There is no separate
 scheduler, dashboard, endpoint, or settings surface. Child lineage is inferred
 from the active `ToolContext` and persisted as `parent_task_id`, `root_task_id`,
 `session_id`, `actor_id`, `delegation_role`, `role`, `memory_mode`,
-`drive_root`, `budget_drive_root`, and `task_constraint`.
+`drive_root`, `child_drive_root`, `budget_drive_root`, and `task_constraint`.
+`task_status.py` is the effective-status SSOT for gateway and tool reads: a
+child terminal result overrides a stale parent `requested`/`scheduled`/`running`
+result, while authoritative parent terminal failures/cancellations stay
+authoritative. Workspace artifact tasks stay nonterminal while
+`artifact_status` is `pending`/`finalizing`; only `ready`/`failed` artifact
+states make the effective workspace result terminal. `wait_for_task` performs a
+bounded wait (default 180s), and `wait_for_tasks` performs batch waits (default
+600s) with full per-child result, trace, and cost output preserved untruncated.
 
 Live subagents run with deterministic
 `task_constraint.mode="local_readonly_subagent"`. The registry filters their
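The effective-status rule added to `docs/ARCHITECTURE.md` in the hunk above can be illustrated with a minimal sketch. It is not the code in `ouroboros/task_status.py`: the function name `merge_effective_result`, the `effective_terminal` key, and the terminal status names `completed`/`failed`/`cancelled` are assumptions made for illustration. Only the stale parent statuses and the artifact states come from the text.

```python
from typing import Optional

STALE_PARENT = {"requested", "scheduled", "running"}  # named in the text above
TERMINAL = {"completed", "failed", "cancelled"}       # assumed status names
ARTIFACT_PENDING = {"pending", "finalizing"}          # named in the text above


def merge_effective_result(parent: dict, child: Optional[dict]) -> dict:
    """Pick the authoritative result and flag whether it is terminal.

    Hypothetical helper: a terminal child result replaces a stale parent
    result, while a parent that is already terminal stays authoritative.
    """
    use_child = (
        child is not None
        and child.get("status") in TERMINAL
        and parent.get("status") in STALE_PARENT
    )
    result = dict(child if use_child else parent)
    # Workspace tasks stay nonterminal while their artifacts are unfinished.
    artifact_open = result.get("artifact_status") in ARTIFACT_PENDING
    result["effective_terminal"] = (
        result.get("status") in TERMINAL and not artifact_open
    )
    return result
```

A caller that polls until `effective_terminal` is true, with a deadline, would approximate the bounded waits described for `wait_for_task` and `wait_for_tasks`.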
@@ -261,21 +272,33 @@ is unchanged for normal tasks, but subagents additionally deny known
 secret/control files such as `settings.json`, token/credential/key files, and
 secret-like owner-state paths. Browser tools remain available for remote-page
 inspection, but subagents fail closed instead of auto-installing browser
-dependencies and cannot browse or act on loopback/local or non-HTTP URLs, make
-browser subrequests to loopback/local URLs, or run arbitrary browser JavaScript.
+dependencies and cannot browse or act on non-HTTP(S), loopback, private,
+link-local, reserved, or unresolved hosts. The guard checks literal IPs and DNS
+results before navigation, after redirects, and in route handlers, so hostnames
+resolving to blocked addresses are denied. This is a URL/DNS-layer guard, not a
+connect-time proxy; hostile DNS rebinding would need a future resolver-pinning
+or proxy design if stronger network isolation is required. Subagents also
+cannot run arbitrary browser JavaScript.
 
 `memory_mode=forked` is the default and uses the same child-drive mechanism as
 headless workspaces: copy stable memory seed files only (`identity.md`,
 `WORLD.md`, `registry.md`, `knowledge/`) into
 `data/state/headless_tasks/<task_id>/data`, without dialogue history, scratchpad
 blocks, task history, or auto-merge. `empty` creates a blank child drive.
-`shared` keeps the parent drive and should be used only when shared local state
-is an explicit parent decision. On completion, only the child task result is
-copied back to the parent drive; identity, scratchpad, registry, knowledge,
-dialogue blocks, and `memory_export` are never merged or exported
+`shared` is rejected for live local subagents and external workspace tasks; a
+future sanitized shared mode must be designed separately. On completion, only
+the child task result is copied back to the parent drive; identity, scratchpad,
+registry, knowledge, dialogue blocks, and `memory_export` are never merged or exported
 automatically. v1 subagents are leaf workers: the schema and execute-time gate
 hide and block `schedule_task`, while the supervisor keeps a structural depth
-cap of 2 and a maximum of 3 active child tasks per `root_task_id`.
+cap of 2 and a maximum of 3 active child tasks per `root_task_id`. External
+`/api/tasks` and CLI `run` requests may not forge
+`delegation_role=subagent` or parent/root lineage; only the internal
+`schedule_task` event path can create live subagents. Startup performs a
+best-effort prune of terminal, copied-back child drives under
+`state/headless_tasks/` once the retention window has passed (default
+7 days, env/settings override), and skips child drives in nonterminal or
+artifact-finalization states.
 
 ### Two-process model
 
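The DNS fail-closed browser guard described in the hunk above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's implementation: `url_allowed`, `is_blocked_ip`, and the injectable `resolve` parameter are hypothetical names. The blocked categories mirror the ones listed in the text (non-HTTP(S), loopback, private, link-local, reserved, unspecified, unresolved).

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def _resolve(host):
    """Resolve a hostname to address strings; raises OSError on failure."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def _parse_ip(text):
    """Return an ip_address object, or None if text is not a literal IP."""
    try:
        return ipaddress.ip_address(text.split("%", 1)[0])  # drop IPv6 zone id
    except ValueError:
        return None


def is_blocked_ip(ip):
    """True for loopback, private, link-local, reserved, or unspecified IPs."""
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped  # judge ::ffff:a.b.c.d by its embedded IPv4 address
    return (ip.is_loopback or ip.is_private or ip.is_link_local
            or ip.is_reserved or ip.is_unspecified)


def url_allowed(url, resolve=_resolve):
    """Fail closed: any parse error, lookup failure, or blocked IP denies."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False
    if parts.scheme not in ("http", "https") or not host:
        return False
    literal = _parse_ip(host)
    if literal is not None:
        return not is_blocked_ip(literal)
    try:
        addresses = [_parse_ip(a) for a in resolve(host)]
    except OSError:
        return False  # unresolved hostnames are denied
    if not addresses or any(a is None or is_blocked_ip(a) for a in addresses):
        return False
    return True
```

Per the text, the same check would run before navigation, after redirects, and in route handlers. As the hunk notes, a check at this layer does not pin the resolved address at connect time, so it does not by itself defend against DNS rebinding.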
@@ -411,7 +434,7 @@ Shown when `settings.json` does not contain any supported remote provider key an
   Web onboarding uses `/api/claude-code/status` and `/api/claude-code/install`.
 - The wizard blocks progression if nothing runnable is configured.
 - When OpenRouter is absent and official OpenAI is the only configured remote runtime, untouched default model values are auto-remapped to `openai::gpt-5.5` / `openai::gpt-5.5-mini` so first-run startup does not strand the app on OpenRouter-only defaults.
-- `web_search` uses the official OpenAI Responses API only. It requires `OPENAI_API_KEY` and treats any non-empty `OPENAI_BASE_URL` as an incompatible custom runtime configuration rather than a fallback.
+- `web_search` uses the official OpenAI Responses API only. It requires `OPENAI_API_KEY` and treats any non-empty `OPENAI_BASE_URL` as an incompatible custom runtime configuration rather than a fallback. Results are JSON with `answer` and `sources[]` when citation annotations are available; usage events include task/root/parent/delegation attribution and `source=web_search`.
 - When Cloud.ru is the only configured remote runtime, first-run model defaults use explicit `cloudru::...` IDs from `provider_models.CLOUDRU_DIRECT_DEFAULTS`; OpenAI-compatible remains an explicit model-selection flow from the full Settings page because there is no single safe universal default model ID for arbitrary compatible endpoints.
 - Closing the wizard without saving is non-fatal: the main app still launches and the user can finish configuration in Settings.
 
@@ -514,7 +537,7 @@ Rationale: frontend work should not require understanding supervisor, worker, ma
 
 ### Chat
 
-`web/modules/chat.js` owns the message timeline, input, attachment staging, input recall, budget pill, runtime controls, and live task card. It loads persisted history from `/api/chat/history`, merges echoed local messages by `client_message_id`, and collapses task/progress/tool chatter into expandable cards rather than transcript spam. Mobile keyboard handling lives in `web/app.js` + CSS `keyboard-open` classes so only the message pane scrolls while the visual viewport changes.
+`web/modules/chat.js` owns the message timeline, input, attachment staging, input recall, budget pill, runtime controls, and live task cards. It loads persisted history from `/api/chat/history`, merges echoed local messages by `client_message_id`, and collapses task/progress/tool chatter into expandable cards rather than transcript spam. Subagent progress uses separate child cards keyed by `subagent_task_id`/`task_id`; parent cards receive lineage references (`parent_task_id`, `root_task_id`, child id, role) without duplicating child bubbles on reload/reconnect. Mobile keyboard handling lives in `web/app.js` + CSS `keyboard-open` classes so only the message pane scrolls while the visual viewport changes.
 
 History sync is intentionally two-pass: progress/system entries are replayed first to build live-card timelines, then regular user/assistant messages call `finishLiveCard`. This prevents `taskState.completed` from being set before progress events apply, which previously discarded thinking-bubble/live-card state.
 
 
@@ -123,7 +123,7 @@ Derived from P7 (Minimalism): entire codebase fits in one context window.
 - Module hard gate: 1600 lines for non-grandfathered modules in `tests/test_smoke.py`. Grandfathered (`GRANDFATHERED_OVERSIZED_MODULES` in `ouroboros/review.py`): `llm.py`, `claude_advisory_review.py`, `review_state.py`, `server.py`, and temporary v5.7.1 debt `git.py` — split deferred until each surface stabilises, with `git.py` expected to pay down in the next tools pass.
 - Method target: <150 lines. Crossing that line is a decomposition signal, not an automatic failure by itself.
 - Method hard gate: 300 lines in `tests/test_smoke.py`.
-- Codebase-wide function-count hard gate: enforced by `tests/test_smoke.py` against the value defined in `ouroboros/review.py::MAX_TOTAL_FUNCTIONS` (currently 2250; single source of truth — bump the constant when adding a feature with an explicit comment justifying the increase).
+- Codebase-wide function-count hard gate: enforced by `tests/test_smoke.py` against the value defined in `ouroboros/review.py::MAX_TOTAL_FUNCTIONS` (currently 2275; single source of truth — bump the constant when adding a feature with an explicit comment justifying the increase).
 - Function parameters: <8.
 - Net complexity growth per cycle approaches zero.
 - If a feature is not used in the current cycle — it is premature.
@@ -252,7 +252,7 @@ Before every commit, verify the following:
 #### Module Size & Complexity
 - [ ] Module stays near one context window (~1000 lines target; 1600 hard gate unless explicitly grandfathered debt)
 - [ ] No method exceeds the practical target (150 lines) or the hard gate (300 lines)
-- [ ] Total Python function count stays under the current smoke hard gate (currently 2250; consult `ouroboros/review.py::MAX_TOTAL_FUNCTIONS` for the active value; bump with a comment if a feature requires more headroom)
+- [ ] Total Python function count stays under the current smoke hard gate (currently 2275; consult `ouroboros/review.py::MAX_TOTAL_FUNCTIONS` for the active value; bump with a comment if a feature requires more headroom)
 - [ ] No function has more than 8 parameters
 - [ ] No gratuitous abstract layers (Bible P7)
 
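The function-count gate referenced in the checklist above could be approximated with the standard `ast` module. This is a sketch under assumptions: how `tests/test_smoke.py` actually counts functions is not shown in this diff, and `count_functions` and `count_functions_in_tree` are hypothetical helper names.

```python
import ast
from pathlib import Path


def count_functions(source: str) -> int:
    """Count sync and async function definitions, including nested ones."""
    tree = ast.parse(source)
    return sum(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        for node in ast.walk(tree)
    )


def count_functions_in_tree(root: Path) -> int:
    """Sum function counts over every .py file under root."""
    return sum(
        count_functions(path.read_text(encoding="utf-8"))
        for path in root.rglob("*.py")
    )
```

A smoke test could then compare `count_functions_in_tree(Path("ouroboros"))` against `MAX_TOTAL_FUNCTIONS`; whether lambdas or test files count toward the real gate is not stated here, and this sketch ignores lambdas.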
@@ -275,19 +275,37 @@ Before every commit, verify the following:
   `role`, `context`, `constraints`, and `memory_mode` are optional. Do not
   reintroduce public `parent_task_id` or `description` arguments; lineage comes
   from `ToolContext`.
+- Live `memory_mode=shared` is disabled. Keep `forked` and `empty` as the only
+  live subagent modes unless a later design adds sanitized shared-context v2.
+- External `/api/tasks` and CLI requests must reject forged
+  `delegation_role=subagent`; only `schedule_task` may create subagents.
 - `task_constraint.mode="local_readonly_subagent"` must be enforced twice:
   schema discovery exposes only the local-readonly allowlist, and registry
   execution rejects forbidden calls even when invoked manually.
+- `task_constraint` boolean parsing must be strict; strings such as `"false"`
+  are false, never truthy through Python's `bool("false")`.
 - Subagent changes must keep writes, commits, review mutation, runtime control,
   tool expansion, skills, MCP/extensions, shell, and further `schedule_task`
   recursion blocked unless a later accepted design explicitly changes the
   permission model.
 - `data_read` and `data_list` secret/control-file denials are subagent-scoped.
   Do not broaden generic data-tool behavior for normal tasks while fixing
   subagent isolation.
+- Browser isolation for local-readonly subagents is DNS fail-closed: block
+  non-HTTP(S), loopback/private/link-local/reserved/unspecified literal IPs,
+  unresolved hostnames, and hostnames resolving to any blocked IP before goto,
+  after redirects, and in route handlers.
+- Effective task status belongs in `ouroboros/task_status.py`. Do not duplicate
+  child-drive result merge or terminal-status logic in gateways/tools; use
+  `load_effective_task_result`, `effective_task_result`, and bounded wait
+  helpers. `wait_for_task` and `wait_for_tasks` results must remain untruncated.
+- `forward_to_worker` may write only to validated running tasks whose lineage
+  belongs to the current task/root, and must route forked/empty child subagents
+  to the child-drive mailbox.
-- Handoff is the full task result. Do not add shared ledgers, automatic memory
-  merges, or new settings/endpoints unless the accepted plan explicitly calls
-  for them.
+- The pre-final handoff reminder is a compact effective-status snapshot. Full
+  untruncated child handoff belongs to `get_task_result`, `wait_for_task`, and
+  `wait_for_tasks`. Do not add shared ledgers, automatic memory merges, or new
+  settings/endpoints unless the accepted plan explicitly calls for them.
 
 #### Page Header Layout
 - Top-level page chrome (`renderPageHeader`, tab strips, primary actions) must sit outside the scrolling content region.
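The strict boolean rule for `task_constraint` in the checklist hunk above guards against a common Python pitfall: `bool("false")` is `True` because any non-empty string is truthy. A minimal sketch of a strict parser follows. The name `parse_strict_bool` and the choice to accept only real booleans and the strings `true`/`false` are assumptions for illustration, not the project's actual parser.

```python
def parse_strict_bool(value):
    """Parse a boolean without relying on Python truthiness.

    Hypothetical helper: accepts real bools and the strings "true"/"false"
    (case-insensitive, surrounding whitespace ignored); rejects the rest.
    """
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()
        if text == "true":
            return True
        if text == "false":
            return False
    raise ValueError(f"not a strict boolean: {value!r}")
```

Rejecting unknown values, rather than defaulting them, keeps a malformed constraint from silently loosening or tightening the permission model.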