From 4504a159cc4af28e00096312b35620c84e82b704 Mon Sep 17 00:00:00 2001
From: SongSong <209874400+thesongzhu@users.noreply.github.com>
Date: Mon, 1 Jun 2026 00:25:04 -0700
Subject: [PATCH] docs: record chat secure-fact rejection proof

---
 docs/current-source-of-truth.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/current-source-of-truth.md b/docs/current-source-of-truth.md
index f46b9d1e..4499956a 100644
--- a/docs/current-source-of-truth.md
+++ b/docs/current-source-of-truth.md
@@ -232,7 +232,7 @@ This document is the current architecture reference for steady-state Friday runt
 - Focused product API proof covers `/v1/uix/preferences` materializing selected safe communication learned facts with evidence/boundary metadata, learned-fact correction through `/v1/uix/learned-facts/:factKey`, synthetic memory visibility/delete, restart durability, and invalid/sensitive-shaped UIX preference rejection with zero sensitive learned facts persisted. Focused service proof additionally enumerates every current canonical communication key, UIX key, and Reflex key under generic UIX write semantics: ordinary communication/UIX/non-high-impact Reflex keys persist, high-impact Reflex keys fail closed behind Review Center, and invalid canonical `persona.mbti` values are rejected instead of being stored as silent fallback. This is still not exhaustive proof for every UI rendering state, live model-visible use of every key, cross-device behavior, or every high-impact Review Center UI path.
 - Learned-fact deletion and non-resurrection are additionally live-proven for bounded route surfaces: two service-seeded learned facts (`pref:persona_tone`, `pref:persona_verbosity`) were written through Friday's self-learning runtime pipeline against isolated state while the HTTP hub was stopped; one learned fact deleted via `/v1/uix/learned-facts/:factKey` and one deleted via synthetic memory ID `learned-fact:` both stayed absent from learned-facts, memory search, and memory list across one isolated runtime restart.
 - `/v1/uix/preferences` remains the explicit communication-preference persistence surface. Claims about learned facts must distinguish explicit preference storage, learning-event emission, learned-fact materialization, learned-fact deletion/revocation evidence, and the exact route/delete surfaces proven in live or focused product API tests.
-- Sensitive-learning guardrails are focused rather than universal: merged guard coverage blocks sensitive/high-risk preference candidates across extraction/runtime/tool/world-model paths, case-insensitive English terms, and a normalized driver's-license (`driver_s_license`) caller gap found by a 27-category/6-surface harness. Live DeepSeek agent-route, first-party `/ws/chat`, and targeted real-browser `/chat` medical/privacy no-persistence proofs passed. PR #450 additionally proves a focused explicit `secure_fact` path: redacted candidate generation for one sensitive fact-shaped run, encrypted staging, Review Center API approval, seeded `/reflex` approve/reject rendering, one live-generated candidate approved through `/reflex`, and one real-browser `/chat` generated candidate approved through `/reflex`. Broader third-party channels, document/attachment ingestion, every sensitive category through live UI, chat-generated secure-fact rejection behavior, and full end-to-end secure-storage review UX remain proof-pending.
+- Sensitive-learning guardrails are focused rather than universal: merged guard coverage blocks sensitive/high-risk preference candidates across extraction/runtime/tool/world-model paths, case-insensitive English terms, and a normalized driver's-license (`driver_s_license`) caller gap found by a 27-category/6-surface harness. Live DeepSeek agent-route, first-party `/ws/chat`, and targeted real-browser `/chat` medical/privacy no-persistence proofs passed. PR #450 additionally proves a focused explicit `secure_fact` path: redacted candidate generation for one sensitive fact-shaped run, encrypted staging, Review Center API approval, seeded `/reflex` approve/reject rendering, one live-generated candidate approved through `/reflex`, one real-browser `/chat` generated candidate approved through `/reflex`, and one real-browser `/chat` generated candidate rejected through `/reflex` with staged-secret deletion. Broader third-party channels, document/attachment ingestion, every sensitive category through live UI, broad generated-candidate rejection across every kind/status, and full end-to-end secure-storage review UX remain proof-pending.
 - False-recall guardrails are focused rather than universal: the agent prompt truth-labels memory search as scoped, runtime memory-recall tasks enforce or retry toward `memory_search` evidence, and compaction replay is labeled as unconfirmed context rather than durable memory. This reduces hallucinated memory risk, but docs and UI must not claim end-to-end hallucinated-memory detection across live channels, documents, tool output, or public-web flows.
 
 ## Runtime admin and security surfaces