This repository was archived by the owner on May 30, 2026. It is now read-only.

Commit 5a8af38

Anton authored and committed
v4.5.0: context quality, provenance repair, prompt discipline
- Fix provenance: system summaries (task_summary, direction="system") are now correctly classified across memory.py, consolidator.py, server API, and chat UI, so they no longer masquerade as user messages. New amber system bubbles in the UI.
- Restore execution reflections (task_reflections.jsonl) in live LLM context.
- Move Health Invariants to the top of the dynamic context block in both the main task path and background consciousness.
- Task-scope recent progress/tools/events when task_id is available, keeping dialogue continuity broad.
- Harden run_shell: detect literal $VAR env-ref misuse in argv-form commands.
- Harden claude_code_edit: retry transient first-run Claude CLI auth failure; structured error classification via _format_claude_code_error.
- Full SYSTEM.md editorial rewrite: terminology normalized to 'creator', new sections (Methodology Check, Anti-Reactivity, Diagnostics Discipline, Knowledge Retrieval Triggers), stronger Health Invariant reactions, compressed inventory sections, balanced KB preflight gate.

16 files changed. New test_chat_provenance.py, updated regression tests.

Made-with: Cursor
1 parent 2389f2a commit 5a8af38

16 files changed

Lines changed: 605 additions & 169 deletions
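
The run_shell hardening mentioned in the commit message is not part of the hunks shown on this page. As a rough illustration only: when a command is passed in argv form (a list, no shell), nothing expands `$VAR`, so the program receives the literal text. A check for that misuse might look like the sketch below. The helper name `find_literal_env_refs` and the regular expression are assumptions for illustration, not the commit's actual code.

```python
import re
from typing import List

# Matches $NAME or ${NAME}; this pattern is an illustrative assumption.
_ENV_REF = re.compile(r"\$(?:[A-Za-z_][A-Za-z0-9_]*|\{[A-Za-z_][A-Za-z0-9_]*\})")


def find_literal_env_refs(argv: List[str]) -> List[str]:
    """Return the argv items that contain an unexpanded $VAR or ${VAR}."""
    return [arg for arg in argv if _ENV_REF.search(arg)]


# Without a shell, "$HOME/project" would reach `ls` as literal text.
suspicious = find_literal_env_refs(["ls", "-la", "$HOME/project"])
print(suspicious)
```

A caller could reject the command or return a hint to resolve the variable first (for example with `os.environ`) whenever the returned list is non-empty.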

README.md

Lines changed: 2 additions & 1 deletion
@@ -6,7 +6,7 @@
 [![macOS 12+](https://img.shields.io/badge/macOS-12%2B-black.svg)](https://github.com/joi-lab/ouroboros-desktop/releases)
 [![Linux](https://img.shields.io/badge/Linux-x86__64-orange.svg)](https://github.com/joi-lab/ouroboros-desktop/releases)
 [![Windows](https://img.shields.io/badge/Windows-x64-blue.svg)](https://github.com/joi-lab/ouroboros-desktop/releases)
-[![Version 4.4.0](https://img.shields.io/badge/version-4.4.0-green.svg)](VERSION)
+[![Version 4.5.0](https://img.shields.io/badge/version-4.5.0-green.svg)](VERSION)

 A self-modifying AI agent that writes its own code, rewrites its own mind, and evolves autonomously. Born February 16, 2026.

@@ -238,6 +238,7 @@ Full text: [BIBLE.md](BIBLE.md)

 | Version | Date | Description |
 |---------|------|-------------|
+| 4.5.0 | 2026-03-19 | Context quality and prompt discipline release: fix provenance — system summaries now correctly marked as system, not user, across memory, consolidation, server API, and chat UI (amber system bubbles); restore execution reflections (task_reflections.jsonl) in live LLM context; move Health Invariants to the top of dynamic context block (both task and consciousness paths); task-scope recent progress/tools/events when task_id is available; harden run_shell against literal $VAR env-ref misuse in argv; add Claude CLI first-run retry and structured error classification; full SYSTEM.md editorial rewrite — terminology normalized to 'creator', new Methodology Check / Anti-Reactivity / Diagnostics Discipline / Knowledge Retrieval Triggers sections, stronger Health Invariant reactions, compressed inventory sections. 12 files changed, new regression tests. |
 | 4.4.0 | 2026-03-19 | Safe editing release: `str_replace_editor` tool for surgical edits to existing files, `repo_write` shrink guard blocks accidental truncation of tracked files (>30% shrinkage), full task lifecycle statuses (failed/interrupted/cancelled) with honest status tracking, rescue snapshot discoverability via health invariants, `provider_incomplete_response` classification for OpenRouter glitches, default review enforcement changed to advisory, fix progress bubble opacity and duplicate emoji. |
 | 4.3.1 | 2026-03-19 | Fix: remove semi-transparent dimming from progress chat bubbles and remove duplicate `💬` emoji that appeared in both sender label and message text. |
 | 4.3.0 | 2026-03-19 | Reliability and continuity release: remove silent truncation from critical task/memory paths, persist honest subtask lifecycle states and full task results, restore transient chat wake banner, replace local-model hard prompt slicing with explicit non-core compaction plus fail-fast overflow, route Anthropic/OpenRouter calls without hard provider pinning while keeping parameter guarantees, and align async review calls with shared LLM routing/usage observability. |

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-4.4.0
+4.5.0

docs/ARCHITECTURE.md

Lines changed: 7 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
-# Ouroboros v4.4.0 — Architecture & Reference
+# Ouroboros v4.5.0 — Architecture & Reference

 This document describes every component, page, button, API endpoint, and data flow.
 It is the single source of truth for how the system works. Keep it updated.
@@ -179,14 +179,15 @@ Navigation is a left sidebar with 8 pages.
 - **Status badge** (top-right): "Online" (green) / "Thinking..." (amber pulse) / "Reconnecting..." (red).
   Driven by WebSocket connection state and typing events.
 - **Message input**: textarea + send button. Shift+Enter for newline, Enter to send.
-- **Messages**: user bubbles (right, blue-tinted) and assistant bubbles (left, crimson). Assistant messages render markdown.
+- **Messages**: user bubbles (right, blue-tinted), assistant bubbles (left, crimson), and system-summary bubbles (left, amber). Non-user bubbles render markdown.
 - **Timestamps**: smart relative formatting (today: "HH:MM", yesterday: "Yesterday, HH:MM", older: "Mon DD, HH:MM"). Shown on hover.
 - **Progress messages**: background consciousness thinking shown as dimmed bubbles with 💬 prefix.
+- **System summaries**: `direction="system"` entries from `chat.jsonl` are shown in the same timeline with a 📋 label instead of being hidden or treated as user text.
 - **Typing indicator**: animated "thinking dots" bubble appears when the agent is processing.
 - **Persistence**: chat history loaded from server on page load (`/api/chat/history`), survives app restarts. Fallback to sessionStorage.
 - **Empty-chat init**: if neither server history nor sessionStorage has messages, the UI shows a transient assistant bubble: `Ouroboros has awakened`. This is visual-only and is not written to chat history.
 - Messages sent via WebSocket `{type: "chat", content: text}`.
-- Responses arrive via WebSocket `{type: "chat", role: "assistant", content: text, ts: "ISO"}`.
+- Responses arrive via WebSocket `{type: "chat", role: "assistant", content: text, ts: "ISO"}`. On page-load history sync, `/api/chat/history` can also return `role: "system"` entries for internal summaries.
 - Supports slash commands: `/status`, `/evolve`, `/review`, `/bg`, `/restart`, `/panic`.

 ### 3.2 Dashboard
@@ -294,7 +295,7 @@ Navigation is a left sidebar with 8 pages.
 | POST | `/api/local-model/stop` | Stop local model server |
 | GET | `/api/local-model/status` | Local model status and readiness |
 | GET | `/api/evolution-data` | Evolution metrics per git tag (LOC, prompt sizes, memory) |
-| GET | `/api/chat/history` | Merged chat + progress messages (chronological, limit param) |
+| GET | `/api/chat/history` | Merged chat + system summaries + progress messages (chronological, limit param) |
 | POST | `/api/local-model/test` | Local model sanity test (chat + tool calling) |
 | WS | `/ws` | WebSocket: chat messages, commands, log streaming |
 | GET | `/static/*` | Static files from `web/` directory (NoCacheStaticFiles wrapper forces revalidation) |
@@ -471,6 +472,8 @@ the constitutional guard is that the file itself must remain non-deletable.
 - As of v3.16.0, the Memory Registry digest (from `memory/registry.md`) is injected into every LLM context to enable source-of-truth awareness.
 - As of v3.20.0, `patterns.md` (Pattern Register) is injected into semi-stable context, and execution reflections from `task_reflections.jsonl` are injected into dynamic context.
 - As of v3.22.0, all docs are always in static context: BIBLE.md (180k), ARCHITECTURE.md (60k), DEVELOPMENT.md (30k), README.md (10k), CHECKLISTS.md (5k).
+- `Health Invariants` are placed at the start of the dynamic context block, before drive state/runtime/recent sections, so warnings influence planning before the model reads the noisier tail sections.
+- `build_recent_sections()` keeps recent dialogue broad, but task-scopes recent progress/tools/events when `task_id` is available.
 - `build_health_invariants()` is split into focused helpers and now also surfaces recent provider/routing errors plus local context overflows.
 - Local-model path no longer silently slices the live system prompt. It compacts non-core sections explicitly and raises an overflow error if core context still cannot fit.
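
The task-scoping rule in the notes above can be read as a simple filter: with a non-empty `task_id`, keep only log entries whose `task_id` matches; with an empty one, keep everything. The sketch below illustrates that rule under the assumption that entries are plain dicts. The helper name `scope_to_task` and the sample entries are hypothetical; the commit itself inlines equivalent list comprehensions in `build_recent_sections()`.

```python
from typing import Any, Dict, List


def scope_to_task(entries: List[Dict[str, Any]], task_id: str = "") -> List[Dict[str, Any]]:
    """Return entries for one task, or all entries when task_id is empty."""
    if not task_id:
        return list(entries)
    return [e for e in entries if str(e.get("task_id", "")).strip() == task_id]


# Hypothetical log tail: two tasks plus one entry with no task id.
log_tail = [
    {"task_id": "a1", "text": "cloned repo"},
    {"task_id": "b2", "text": "ran tests"},
    {"text": "entry without a task id"},
]
print(len(scope_to_task(log_tail, "a1")))  # 1
print(len(scope_to_task(log_tail)))        # 3
```

Note that entries lacking a `task_id` are dropped whenever a task is in scope, which is why the background consciousness path passes an empty `task_id` to see all tasks.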

ouroboros/consciousness.py

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -410,18 +410,18 @@ def _build_context(self) -> str:
         if patterns_text.strip():
             parts.append("## Pattern Register\n\n" + clip_text(patterns_text, 30000))

+        # Health invariants
+        health_section = build_health_invariants(env)
+        if health_section:
+            parts.append(health_section)
+
         # Drive state
         state_json = safe_read(env.drive_path("state/state.json"), fallback="{}")
         parts.append("## Drive state\n\n" + clip_text(state_json, 90000))

         # Runtime section (same as main agent)
         parts.append(build_runtime_section(env, bg_task))

-        # Health invariants
-        health_section = build_health_invariants(env)
-        if health_section:
-            parts.append(health_section)
-
         # Recent sections — empty task_id so we get ALL tasks' progress/tools/events
         parts.extend(build_recent_sections(memory, env, task_id=""))
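
The reordering in this hunk amounts to one rule: if a health section exists, it goes first in the dynamic block, ahead of drive state, runtime, and the recent sections. A minimal sketch of that ordering follows, assuming each section is already rendered to a string. The function `assemble_dynamic_parts` and its parameters are hypothetical names for illustration, not code from the repository.

```python
from typing import List, Optional


def assemble_dynamic_parts(
    health_section: Optional[str],
    drive_state: str,
    runtime_section: str,
    recent_sections: List[str],
) -> List[str]:
    """Order dynamic context so health warnings are read first."""
    parts: List[str] = []
    if health_section:  # skipped entirely when there is nothing to report
        parts.append(health_section)
    parts.append("## Drive state\n\n" + drive_state)
    parts.append(runtime_section)
    parts.extend(recent_sections)
    return parts


parts = assemble_dynamic_parts("## Health Invariants\n\nexample warning", "{}", "## Runtime", ["## Recent chat"])
print(parts[0].splitlines()[0])  # ## Health Invariants
```

When no health section is produced, the list simply starts with drive state, so the change does not add an empty heading to the context.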

ouroboros/consolidator.py

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -432,6 +432,9 @@ def _format_entries_for_block(entries: List[Dict[str, Any]]) -> str:
         if dir_raw in ("out", "outgoing"):
             direction_prefix = "-> "
             author = "Ouroboros"
+        elif dir_raw == "system":
+            direction_prefix = "[system] "
+            author = "Ouroboros"
         else:
             direction_prefix = ""
             author = e.get("username") or e.get("author") or "User"

ouroboros/context.py

Lines changed: 61 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -222,30 +222,75 @@ def build_memory_sections(memory: Memory) -> List[str]:
     return sections


-def build_recent_sections(memory: Memory, env: Any, task_id: str = "") -> List[str]:
-    """Build recent chat, recent progress, recent tools, recent events sections.
+def _format_recent_reflections(entries: List[Dict[str, Any]], limit: int = 10) -> str:
+    """Format recent execution reflections for dynamic context."""
+    if not entries:
+        return ""

-    Legacy note: older process-memory used task_reflections.jsonl and an
-    "Execution reflections" section; task summaries in chat.jsonl are now the
-    primary continuity layer.
-    """
+    blocks: List[str] = []
+    for entry in entries[-limit:]:
+        ts_full = str(entry.get("ts", ""))
+        ts = ts_full[:16] if len(ts_full) >= 16 else ts_full
+        header_bits = [bit for bit in [
+            ts,
+            str(entry.get("task_type", "")).strip(),
+            str(entry.get("task_id", "")).strip(),
+        ] if bit]
+        header = " | ".join(header_bits) or "unknown reflection"
+
+        lines = [f"### {header}"]
+
+        goal = str(entry.get("goal", "")).strip()
+        if goal:
+            lines.append(f"- Goal: {goal}")
+
+        markers = [str(m).strip() for m in (entry.get("key_markers") or []) if str(m).strip()]
+        if markers:
+            lines.append(f"- Markers: {', '.join(markers)}")
+
+        rounds = entry.get("rounds")
+        if rounds not in (None, ""):
+            lines.append(f"- Rounds: {rounds}")
+
+        cost_usd = entry.get("cost_usd")
+        if cost_usd not in (None, ""):
+            lines.append(f"- Cost: ${cost_usd}")
+
+        reflection = str(entry.get("reflection", "")).strip()
+        if reflection:
+            lines.append("")
+            lines.append(reflection)
+
+        blocks.append("\n".join(lines).strip())
+
+    return "\n\n".join(blocks)
+
+
+def build_recent_sections(memory: Memory, env: Any, task_id: str = "") -> List[str]:
+    """Build recent dialogue and process-memory sections."""
     sections = []

     chat_summary = memory.summarize_chat(memory.read_jsonl_tail("chat.jsonl", 1000))
     if chat_summary:
         sections.append("## Recent chat\n\n" + chat_summary)

     progress_entries = memory.read_jsonl_tail("progress.jsonl", 200)
+    if task_id:
+        progress_entries = [e for e in progress_entries if str(e.get("task_id", "")).strip() == task_id]
     progress_summary = memory.summarize_progress(progress_entries, limit=50)
     if progress_summary:
         sections.append("## Recent progress\n\n" + progress_summary)

     tools_entries = memory.read_jsonl_tail("tools.jsonl", 200)
+    if task_id:
+        tools_entries = [e for e in tools_entries if str(e.get("task_id", "")).strip() == task_id]
     tools_summary = memory.summarize_tools(tools_entries)
     if tools_summary:
         sections.append("## Recent tools\n\n" + tools_summary)

     events_entries = memory.read_jsonl_tail("events.jsonl", 200)
+    if task_id:
+        events_entries = [e for e in events_entries if str(e.get("task_id", "")).strip() == task_id]
     events_summary = memory.summarize_events(events_entries)
     if events_summary:
         sections.append("## Recent events\n\n" + events_summary)
@@ -254,6 +299,11 @@ def build_recent_sections(memory: Memory, env: Any, task_id: str = "") -> List[s
     if supervisor_summary:
         sections.append("## Supervisor\n\n" + supervisor_summary)

+    reflections_entries = memory.read_jsonl_tail("task_reflections.jsonl", 20)
+    reflections_text = _format_recent_reflections(reflections_entries, limit=10)
+    if reflections_text:
+        sections.append("## Execution reflections\n\n" + reflections_text)
+
     return sections


@@ -732,14 +782,14 @@ def build_llm_messages(

     semi_stable_text = "\n\n".join(semi_stable_parts)

-    dynamic_parts = [
-        "## Drive state\n\n" + state_json,
-        build_runtime_section(env, task),
-    ]
-
     health_section = build_health_invariants(env)
+    dynamic_parts = []
     if health_section:
         dynamic_parts.append(health_section)
+    dynamic_parts.extend([
+        "## Drive state\n\n" + state_json,
+        build_runtime_section(env, task),
+    ])

     dynamic_parts.extend(build_recent_sections(memory, env, task_id=task.get("id", "")))
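
To make the header rule in `_format_recent_reflections` concrete, the sketch below isolates just that step: the timestamp is clipped to its first 16 characters, empty fields are dropped, and the remaining parts are joined with ` | `. The standalone function `reflection_header` and the sample entry are hypothetical; only the field names (`ts`, `task_type`, `task_id`) come from the diff. Slicing with `[:16]` is used directly because it already leaves shorter strings unchanged.

```python
from typing import Any, Dict


def reflection_header(entry: Dict[str, Any]) -> str:
    """Build the '### ...' header text for one reflection entry."""
    ts = str(entry.get("ts", ""))[:16]  # e.g. keep 'YYYY-MM-DDTHH:MM'
    bits = [
        ts,
        str(entry.get("task_type", "")).strip(),
        str(entry.get("task_id", "")).strip(),
    ]
    return " | ".join(b for b in bits if b) or "unknown reflection"


# Hypothetical entry, shaped like a task_reflections.jsonl record.
sample = {"ts": "2026-03-19T14:05:33Z", "task_type": "task", "task_id": "abc123"}
print(reflection_header(sample))  # 2026-03-19T14:05 | task | abc123
print(reflection_header({}))      # unknown reflection
```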

ouroboros/memory.py

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -301,6 +301,9 @@ def chat_history(self, count: int = 100, offset: int = 0, search: str = "") -> s
             raw_text = str(e.get("text", ""))
             if dir_raw in ("out", "outgoing"):
                 lines.append(f"→ [{ts}] {raw_text}")
+            elif dir_raw == "system":
+                entry_type = str(e.get("type", "")).strip() or "system"
+                lines.append(f"📋 [{ts}] [{entry_type}] {raw_text}")
             else:
                 username = e.get("username") or e.get("author") or "User"
                 lines.append(f"← [{ts}] [{username}] {raw_text}")
@@ -347,6 +350,9 @@ def summarize_chat(self, entries: List[Dict[str, Any]]) -> str:
             raw_text = str(e.get("text", ""))
             if dir_raw in ("out", "outgoing"):
                 lines.append(f"→ {ts_hhmm} {raw_text}")
+            elif dir_raw == "system":
+                entry_type = str(e.get("type", "")).strip() or "system"
+                lines.append(f"📋 {ts_hhmm} [{entry_type}] {raw_text}")
             else:
                 username = e.get("username") or e.get("author") or "User"
                 lines.append(f"← {ts_hhmm} [{username}] {raw_text}")
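
The provenance fix in `summarize_chat` can be illustrated as a three-way branch on the entry's direction. The hunk does not show how `dir_raw` is derived, so the sketch below assumes it comes from a lowercased `direction` field (the commit message refers to `direction="system"`). The function name `format_chat_line` and the sample entries are hypothetical.

```python
from typing import Any, Dict


def format_chat_line(entry: Dict[str, Any], ts: str) -> str:
    """Render one chat.jsonl entry with a marker that reflects who produced it."""
    # Assumption: direction is stored under "direction"; the diff only shows dir_raw.
    direction = str(entry.get("direction", "")).strip().lower()
    text = str(entry.get("text", ""))
    if direction in ("out", "outgoing"):
        return f"→ {ts} {text}"
    if direction == "system":
        # Fall back to the generic label when the entry has no type.
        entry_type = str(entry.get("type", "")).strip() or "system"
        return f"📋 {ts} [{entry_type}] {text}"
    username = entry.get("username") or entry.get("author") or "User"
    return f"← {ts} [{username}] {text}"


print(format_chat_line({"direction": "system", "type": "task_summary", "text": "done"}, "14:05"))
print(format_chat_line({"direction": "in", "text": "hello"}, "14:06"))
```

Before this change, a `system` entry would have fallen through to the final branch and been rendered as a message from `User`, which is the misattribution the commit describes.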
