Commit ae1546f

wehos, Hongzhi Wen, and claude authored
[Memory] PR-1 evidence mechanism foundation (RFC Project-N-E-K-O#928) (Project-N-E-K-O#929)
* feat(memory): PR-1 evidence mechanism foundation (RFC Project-N-E-K-O#928)

  Implements PR-1 per memory-evidence-rfc §4.1: signal detection + evidence data layer + one-shot migration. PR-2/3/4 (decay/archive, render-budget/merge-on-promote, funnel analytics) follow separately.

  Schema + infra:
  - memory/evidence.py: pure-function module (effective_rein/disp with independent clocks, evidence_score, derive_status, maybe_mark_sub_zero)
  - memory/event_log.py: +3 event types (reflection/persona.evidence_updated, persona.entry_updated) + ALL_EVIDENCE_SOURCES enum
  - PersonaManager + ReflectionEngine: +evidence fields in entry schemas, new aapply_signal with full-snapshot record_and_save contract
  - §3.4.4 _texts_may_contradict call chain keeps aadd_fact as the sole entry point

  Signal detection:
  - FactStore.aextract_facts_and_detect_signals: Stage-1 extract (no existing-obs context, avoids self-cycling) + Stage-2 map signals (defensive target_id validation)
  - importance<5 no longer hard-dropped at extract (§3.1.3)
  - tags schema kept but new facts default to [] (§2.7)

  Prompts (config/prompts_memory.py):
  - FACT_EXTRACTION_PROMPT rewritten (drop tags + importance filter)
  - SIGNAL_DETECTION_PROMPT new (5-lang i18n + watermark)
  - NEGATIVE_TARGET_CHECK_PROMPT new (5-lang + 情感分析专家 ("sentiment analysis expert") / "sends some useful information" watermark)
  - NEGATIVE_KEYWORDS_I18N: 5-language frozenset dict + scan helper

  Auto-promote refactor:
  - _aauto_promote_stale_locked is now score-driven pending→confirmed only; confirmed→promoted deferred to PR-3 (_apromote_with_merge)
  - AUTO_CONFIRM_DAYS / AUTO_PROMOTE_DAYS deleted
  - aget_followup_topics filters evidence_score >= 0 (§3.8.6)

  memory_server.py wiring:
  - Registers 3 reconciler handlers (full-snapshot apply + sha256 guard for persona.entry_updated)
  - _periodic_signal_extraction_loop runs Stage-1+Stage-2 + signal dispatch on idle/N-turn trigger
  - _amaybe_trigger_negative_keyword_hook: conversation-path fast trigger → Layer-2 LLM target check → disputation signals
  - _aone_shot_migration_if_needed: seeds evidence fields on legacy reflection/persona, marker-guarded, resumable on crash
  - check_feedback emits confirmed→rein+1 / denied→disp+1 / ignored→rein -0.2 signals

  Config (§Appendix A):
  - All new constants land in config/__init__.py + __all__
  - Gate-1/2/4 draft values: rein half-life 30d, disp 180d, archive_days 14, confirmed/promoted threshold 1.0/2.0, archive max 500
  - Gate 3 LLM tier choices: summary/correction/emotion/correction (TO CONFIRM before merge)

  Dependencies:
  - pyproject.toml: +tiktoken>=0.7.0 (PR-3 uses; ship now to avoid double-bumping the dep in a later PR)

  Tests (29 new, covering S1/S1b/S2/S3/S4/S6/S7/S8/S18):
  - test_evidence_math.py: decay math + independent clocks + tier mapping + migration seed arithmetic + sub_zero accumulation
  - test_evidence_apply_signal.py: aapply_signal updates views, emits events, preserves independent clocks, reconciler handler is idempotent on replay
  - test_evidence_extraction.py: Stage-1 no existing-obs leakage, Stage-2 drops hallucinated ids, failure semantics preserve facts, low-importance facts now persisted
  - test_manager_locks.py updated to drop AUTO_PROMOTE_DAYS dependency while keeping lock-ordering coverage

  Pre-merge reviewer gates (RFC §6.5), still need sign-off:
  - Gate 1: EVIDENCE_REIN_HALF_LIFE_DAYS=30 / DISP=180
  - Gate 2: REFLECTION_SYNTHESIS_CONTEXT_ABSORBED_COUNT=10 / DAYS=14
  - Gate 3: 4 LLM tier selections (currently summary/correction/emotion/correction; confirm before PR-3 merges)
  - Gate 4: ARCHIVE_FILE_MAX_ENTRIES=500, EVIDENCE_PROMOTE_RETRY_BACKOFF_MINUTES=30, MAX_RETRIES=5

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(memory): PR Project-N-E-K-O#929 review nits: unused imports + empty except

  Round-1 feedback from github-code-quality[bot]:
  - memory/reflection.py: drop unused EVIDENCE_PROMOTED_THRESHOLD (only used in PR-3 _apromote_with_merge) + IGNORED_REINFORCEMENT_DELTA (consumed in memory_server.py, not this file)
  - memory_server.py: remove unused `ttype` local in
    negative-keyword dispatch loop: `obs['target_type']` always wins downstream
  - memory_server.py: replace empty `except Exception: pass` in signal-check turn counter with `except Exception as e: logger.debug(...)`; best-effort failure is now audit-visible

  No behavior change.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(memory): PR Project-N-E-K-O#929 Codex + CodeRabbit round-2 review

  Codex round-2:
  - [P1] Stage-1 terminal failure now raises FactExtractionFailed instead of silently returning ([], []); the memory_server signal loop catches it and keeps the cursor untouched so the next cycle retries the same message window (§3.4.3 "if Stage-1 fails, the cursor does not advance")
  - [P2] Signal-extraction SQL window uses last successful check_ts as cursor (new _signal_check_window_start helper); long active sessions no longer drop user messages older than the fixed 10-min window

  CodeRabbit round-1 (7 inline + 1 outside):
  - Negative-target-check prompt JSON examples were invalid ("reflection" or "persona" literals inside JSON); split schema doc into prose + one clean JSON example across all 5 languages
  - memory_server: mark_done moved after dispatch so transient event_log failures leave the cursor untouched and the next cycle retries
  - aapply_signal downstream failures (check_feedback path) now warn explicitly so operators can spot lost signals; full retry semantics (surfaced feedback rollback / outbox) deferred to follow-up
  - persona.aapply_signal / reflection.aapply_signal _sync_save now honour assert_cloudsave_writable; evidence writes respect the same read-only/maintenance gate as save_persona / save_reflections
  - _filter_reflections + _aauto_promote_stale_locked now reference REFLECTION_TERMINAL_STATUSES instead of hardcoded tuples, so promote_blocked / merged stay consistently excluded
  - test_evidence_apply_signal: dropped tautological `!= x or True` and rewrote S4 idempotency test to exercise the real production handler from memory.evidence_handlers (new module, factored out of memory_server so tests can register the same apply function the reconciler runs)
  - Added S5 test for persona.entry_updated sha256 mismatch → raise
  - FactStore._apersist_new_facts clamps importance to 1..10 and whitelists entity to {master, neko, relationship}; dirty LLM output (-3 / 999 / "user") no longer leaks into facts.json

  CodeRabbit round-2 (1 outside):
  - _prepare_save_reflections was silently dropping `merged` and `promote_blocked` entries from the main file when _aauto_promote_stale_locked filtered its active set through REFLECTION_TERMINAL_STATUSES. Fix: keep non-archivable terminal states (merged / promote_blocked) in keep_in_main; archive only promoted/denied after _REFLECTION_ARCHIVE_DAYS
  - Added regression test covering this path

  test_manager_locks: stronger assertion; reload reflections after aauto_promote_stale to verify `status == "confirmed"` persisted

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(memory): PR Project-N-E-K-O#929 round-3 nit: log on bad last_check_ts parse

  github-code-quality flagged the `except (ValueError, TypeError): pass` in `_signal_check_window_start` as an empty handler. Replace with a debug-level log documenting the "corrupt cursor" fall-through; same control flow, audit-visible.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(memory): PR Project-N-E-K-O#929 round-3 review: prompt cleanup + test coverage

  CodeRabbit round-3:
  - Drop stray English fragment "sends some useful information..." from ja/ko/ru NEGATIVE_TARGET_CHECK_PROMPT versions. The `======以上为` section markers already satisfy RFC §3.4.6 watermark requirement for those languages, so the English carried no semantic weight and only risked confusing the LLM. Keep the line on `en` where it reads as natural English.
  - test_reflection_apply_unknown_id_returns_false / test_persona_apply_signal_unknown_entry_returns_false now assert `ev.read_since(name, None) == []`; unknown target_id must not produce an event (tightens the "append before mutate" discipline from the fingerprint on the file's docstring).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(memory): PR Project-N-E-K-O#929 round-4: harden evidence event assertions

  CodeRabbit round-4:
  - S4 replay loop now asserts each of the 9 replays returns False (no-op). Strengthens the idempotence guarantee; previously only the first replay return value was checked.
  - test_reflection_apply_emits_evidence_event now locks the full-snapshot payload: rein_last_signal_at/disp_last_signal_at/sub_zero_days in addition to reinforcement/disputation/source. Catches regressions where the payload degrades to a partial snapshot.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(memory): PR Project-N-E-K-O#929 round-5 review: prompt clarity + test helper + dead code

  CodeRabbit round-5:
  - NEGATIVE_TARGET_CHECK_PROMPT en/ja/ko/ru: prose said "empty array / 空配列 / 빈 배열 / пустой массив" but example shows {"targets": []} (object with empty array field). LLM could reasonably return just `[]` and break parsing. Rewritten to explicitly call for `{"targets": []}` across all 4 languages.
  - scan_negative_keywords: drop the unreachable `if not kws: return False` branch; `NEGATIVE_KEYWORDS_I18N['zh']` is always a non-empty frozenset so the fallback via dict.get(lang, ['zh']) is guaranteed to yield one.
  - tests/unit/test_evidence_apply_signal.py: extracted `_find_by_id` helper for the `[x for x in rows if x["id"] == rid][0]` pattern. Failures now report which id was missing rather than IndexError.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(memory): PR Project-N-E-K-O#929 round-6: lock persona evidence payload snapshot

  CodeRabbit round-6: the persona.evidence_updated event test only checked `entry_id`. Extended to match the reflection-side full-snapshot contract; it now locks entity_key, reinforcement, disputation, both last_signal_at clock fields, sub_zero_days, and source. A future regression that drops any of these degrades the test clearly instead of silently passing.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(memory): PR Project-N-E-K-O#929 round-7: both-sides signal + unused-var rename

  CodeRabbit round-7:
  - test_reflection_apply_reinforcement_updates_fields: unused `ev` → `_ev` to match the `_fs`/`_pm`/`_cm` unused-fixture convention
  - New test_reflection_apply_both_sides_updates_both_clocks: locks RFC §3.4.1 forward-compat contract; a combined delta `{'reinforcement': 1.0, 'disputation': 0.5}` updates both counters AND both last_signal_at timestamps, and emits a single event with both clock fields set in the payload

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf(memory): parallelize per-character startup/signal loops

  Sequential `for name in catgirl_names: await ...` loops wasted wall-clock on independent per-character work. Mirror the _periodic_rebuttal_loop / _periodic_auto_promote_loop pattern (inner async helper + asyncio.gather with return_exceptions=True) across:
  - Startup reconciler replay: N × event_log replay → max(replay)
  - Startup evidence migration: N × seed → max(seed)
  - _periodic_signal_extraction_loop inner per-char body: N × (Stage-1 + Stage-2 + dispatch + auto_promote_stale) → max(...)

  Per-character isolation is already baked in (per-char event_log lock, asyncio.Lock on PersonaManager/ReflectionEngine, per-char file paths), so gather is safe. Failures stay isolated via return_exceptions plus the inner try/except.
  Intentionally LEFT sequential:
  - _periodic_idle_maintenance_loop: re-checks _is_idle() between chars and breaks on busy, which only works as a serial gate

  main_server / agent_server audited: main_server already gathers per-character init / teardown; agent_server has no per-character async fan-out to parallelize.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(memory): PR Project-N-E-K-O#929 round-8: reflection filter + importance seed

  Four tightenings in response to user design review:

  1. `aget_confirmed_reflections` now filters `score > 0 AND not suppress`. Earlier the function only checked `status='confirmed'`, so a confirmed reflection that later dropped to score=0 (rein offsets disp) or even negative would keep rendering in the "fairly certain impressions" ("比较确定的印象") region, a language-drift bug. score==0 no longer renders; archive countdown still uses strict `< 0` (natural drift will trip it soon enough).

  2. `_filter_followup_candidates` now excludes derived-confirmed (pending + score >= EVIDENCE_CONFIRMED_THRESHOLD). Closes the drift window where a pending reflection had score>=1 but hadn't yet been flipped to status='confirmed' by the periodic loop; the AI was still probing with a "not quite sure yet" ("还不太确定") tone even though the user had already confirmed.

  3. `ReflectionEngine` gains the same 5h-window `recent_mentions` / `suppress` machinery that `PersonaManager` has (RFC §2.6). Scoped to `status='confirmed'` only; pending is meant to be probed, and suppressing it defeats the purpose. memory_server wires:
     - `arecord_mentions(name, ai_response)` after each turn (parallel to persona.arecord_mentions)
     - `aupdate_suppressions(name)` before render in the two call sites
     - `aget_confirmed_reflections` also skips `suppress=True`

  4. Importance-based initial reinforcement seed. Reflections synthesized from a batch containing a high-importance fact (nicknames, IDs, user "please remember X" flags) start with a head-start rein instead of 0, so they fast-track through pending→confirmed:

       max importance 10  → rein 0.8
       max importance 9   → rein 0.6
       max importance 8   → rein 0.4
       max importance 7   → rein 0.2
       max importance 5-6 → 0

     MAX-based (not avg) because one critical fact in the batch is enough to flag the whole reflection as important. FACT_EXTRACTION_PROMPT rubric rewritten across 5 languages to guide the LLM to calibrate importance ratings (previous prompt had terse "1-10" with no guidance, so every fact drifted to 7).

  New tests: 10 units covering importance curve, confirmed-filter, followup derived-confirmed exclusion, mention suppress mechanism for confirmed reflection, synth high/low importance initial rein seed.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* revert(memory): PR Project-N-E-K-O#929 round-9: restore derived-confirmed pending as followup candidate

  User design call: `_filter_followup_candidates` should NOT gate on `score >= CONFIRMED_THRESHOLD`. A pending reflection whose score has drifted into derived-confirmed range is still a worthwhile followup candidate; surfacing it gives the user a natural chance to re-affirm or push back before the periodic loop finally flips the stored status.

  Rolls back the round-8 addition in `_filter_followup_candidates`. The other three round-8 changes (aget_confirmed filter, recent_mentions for confirmed reflections, importance-based initial rein seed) stay.

  Test renamed: `test_followup_excludes_derived_confirmed` → `test_followup_includes_derived_confirmed_pending` with inverted assertion.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(memory-evidence-rfc): v1.2 amendments from PR Project-N-E-K-O#929 review

  Fold four behavior-affecting changes surfaced during the PR-1 implementation review thread back into the design doc so the RFC reflects the shipped semantics.

  - Revision log: new v1.2 entry summarizing the four changes + the cleanup refactors (FactExtractionFailed exception, cursor-based signal window, evidence_handlers module extraction, parallelized startup loops)
  - §2.6: recent_mentions scope extended from persona-only to include confirmed reflection; pending stays out by design
  - §3.1.7 (new): importance → initial reinforcement seed. MAX-based mapping, bounded <1.0 so still can't bypass CONFIRMED_THRESHOLD on its own, works in tandem with the polished FACT_EXTRACTION_PROMPT rubric to route user-flagged "please remember X" signals into a fast-track
  - §3.4.7: FACT_EXTRACTION_PROMPT text updated with the importance rubric (10 = nickname/ID/"please remember X", 8-9 = long-term stable preference, 6-7 = routine, 5 = minor, 1-4 = weak)
  - §3.8.4: ReflectionEngine arecord_mentions / aupdate_suppressions API additions documented
  - §3.8.6: new aget_confirmed_reflections render filter (score > 0 AND not suppress) alongside the existing followup filter; explicit note that followup does NOT gate the derived-confirmed upper bound (design decision)
  - §4.1: PR-1 scope updated to list v1.2 additions and refactors; test plan augmented with the v1.2 test items; added evidence_handlers.py to module list
  - §8.10 (new): S26-S33 success criteria for v1.2 features

  No code changes; documentation only.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(memory): PR Project-N-E-K-O#929 round-10: signal weight differentiation + user_fact combo

  Splits signal weights by directness:
  - Direct signals (user_confirm / user_rebut / user_keyword_rebut) stay at 1.0
  - Indirect signal user_fact NEGATES stays at 1.0 (the semantic target is still clear)
  - Indirect signal user_fact REINFORCES drops to 0.5 (silver standard; the Stage-2 LLM verdict is heavily inferential, so down-weighting offsets the risk of mis-association)

  All weights moved into named constants in config/__init__.py:

    USER_FACT_REINFORCE_DELTA = 0.5
    USER_FACT_NEGATE_DELTA = 1.0
    USER_CONFIRM_DELTA = 1.0
    USER_REBUT_DELTA = 1.0
    USER_KEYWORD_REBUT_DELTA = 1.0
    IGNORED_REINFORCEMENT_DELTA = -0.2 (pre-existing, renamed use site)
    USER_FACT_REINFORCE_COMBO_THRESHOLD = 2
    USER_FACT_REINFORCE_COMBO_BONUS = 0.5

  Combo bonus for user_fact reinforces: counter on each entry ticks on every user_fact reinforce; once count > threshold (= 3rd signal onward), each new reinforce adds base + bonus = 1.0 instead of 0.5. Counter never resets and decay doesn't apply to it (it is an auditable cumulative fact).
  Implementation:
  - memory/evidence.py: new `compute_evidence_snapshot(entry, delta, now_iso, source)` shared helper; PersonaManager.aapply_signal and ReflectionEngine.aapply_signal both delegate to it so combo semantics are single-sourced
  - Entry schema (persona + reflection): +user_fact_reinforce_count: int
  - Event payload `reflection.evidence_updated` / `persona.evidence_updated`: +user_fact_reinforce_count field (full-snapshot contract)
  - evidence_handlers._EVIDENCE_SNAPSHOT_KEYS: +user_fact_reinforce_count so reconciler replay preserves counter
  - memory_server._adispatch_evidence_signals: uses USER_FACT_*_DELTA / USER_KEYWORD_REBUT_DELTA constants; _extract_facts_and_check_feedback uses USER_CONFIRM_DELTA / USER_REBUT_DELTA / IGNORED_REINFORCEMENT_DELTA

  Tests (4 new):
  - Combo curve: 1st/2nd → +0.5; 3rd/4th+ → +1.0
  - user_confirm / user_rebut / user_fact negates do NOT tick counter
  - Existing event-payload assertions extended to check user_fact_reinforce_count

  RFC v1.2.1:
  - Revision log entry
  - §3.1.8 new subsection: differentiated weights + combo design rationale
  - §3.4.1 mapping table updated with constants
  - Appendix A.1: all new constants listed
  - §8.10: S34-S36 new success criteria

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(memory): PR Project-N-E-K-O#929 round-11: review findings

  Addresses CodeRabbit + github-code-quality feedback on the v1.2.1 commit (28268fc):

  🔴 Critical (evidence_handlers.py)
  - All three handlers silently swallowed JSONDecodeError/OSError on load, letting Reconciler.areconcile advance the sentinel past events that never actually applied. Let those exceptions propagate so replay pauses and the operator gets alerted. File-not-exists stays a normal no-op (first boot / new character).

  🟠 Major (memory_server migration marker)
  - `_aone_shot_migration_if_needed` wrote the marker unconditionally, even when individual `aapply_signal` calls raised. This silently turned "fully resumable" into "fail-once, skip forever". Now track `seed_failures`; if > 0, log warning and return WITHOUT writing the marker → next boot resumes from where it stopped (already-seeded entries still skip via field check).

  🟠 Major (_adispatch_evidence_signals)
  - Used to swallow per-signal aapply_signal exceptions → caller (_periodic_signal_extraction_loop) advanced its cursor even when evidence writes failed. Now returns bool (`all_ok`); signal loop skips `_signal_check_mark_done` when any dispatch failed, so Stage-2 re-generates the signals next tick. Negative-keyword hook doesn't have a cursor, so it discards the return.

  🟡 Minor (reflection.py empty except)
  - `_apply_update_reflection_suppressions` `except (ValueError, TypeError): pass` now logs at debug level with the offending timestamp value, same discipline as round-3 fix on `_signal_check_window_start`.

  🟡 Nitpick (test_reflection_filters.py)
  - Removed unused `import asyncio` and `import os`.

  🟡 Minor (RFC doc)
  - §4.1 status line "已合并进 PR Project-N-E-K-O#929" ("merged into PR Project-N-E-K-O#929") was premature; changed to "已在 PR Project-N-E-K-O#929 中实现 ... 待合并" ("implemented in PR Project-N-E-K-O#929 ... pending merge").

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(memory): PR Project-N-E-K-O#929 round-12: shape-mismatch raise + misc nits

  Three valid findings from latest CodeRabbit review; one rejected.

  ✅ 🟠 Major (evidence_handlers): top-level JSON type mismatch now RAISES instead of silently coercing to empty list/dict. The previous tolerant path let Reconciler.areconcile advance the sentinel past events it couldn't actually apply. Applied in all three handlers (reflection, persona evidence, persona entry). Same rationale as round-11 JSONDecodeError propagation: a handler that can't find a usable view cannot let replay skip the event.

  ✅ 🟡 Minor (RFC): fenced code block in §3.1.7 now tagged ```text so markdownlint MD040 stops flagging it.

  ✅ 🟡 Minor (test_reflection_filters.py): file header comment claimed `aget_followup_topics` "also excludes derived-confirmed"; that was the round-8 behavior which round-9 reverted per user design call. Rewrote to reflect current behavior: derived-confirmed pending IS a valid followup candidate.

  ❌ 🟡 Rejected (USER_KEYWORD_REBUT_DELTA value): CodeRabbit claimed config has 0.5 but RFC has 1.0. Verified: config/__init__.py, RFC §3.1.8 table, RFC §3.4.1 table, RFC Appendix A.1, and memory_server.py all use 1.0 consistently. CodeRabbit hallucinated the mismatch; no code change.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Hongzhi Wen <cartabio.coder1@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
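
The round-8 entry in the commit message gives the importance-to-seed table but not the code. The following is a minimal sketch of that mapping, not the shipped implementation: the helper name `initial_reinforcement`, the clamping step, and the choice to return 0 for importance below 5 are assumptions made for illustration. Only the table values and the MAX-based rule come from the message.

```python
from typing import Iterable

# Table from the round-8 commit message: max importance in the batch -> initial rein.
_IMPORTANCE_SEED = {10: 0.8, 9: 0.6, 8: 0.4, 7: 0.2}


def initial_reinforcement(importances: Iterable[int]) -> float:
    """Return a head-start reinforcement for a freshly synthesized reflection.

    Hypothetical helper. Uses the MAX importance of the source facts (not the
    average), as the commit message describes. Importance 6 or lower maps to 0;
    treating values below 5 the same way is an assumption.
    """
    # Clamp to 1..10 defensively, mirroring the round-2 clamp on persisted facts.
    clamped = [max(1, min(10, int(i))) for i in importances]
    if not clamped:
        return 0.0
    return _IMPORTANCE_SEED.get(max(clamped), 0.0)


print(initial_reinforcement([3, 10, 5]))  # one critical fact flags the whole batch
print(initial_reinforcement([5, 6]))      # routine batch: no head start
```

Every seed stays below `EVIDENCE_CONFIRMED_THRESHOLD = 1.0`, which matches the RFC v1.2 note that the seed alone cannot push a reflection to confirmed.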
1 parent 4d27cdb commit ae1546f

17 files changed

Lines changed: 4220 additions & 261 deletions

config/__init__.py

Lines changed: 99 additions & 0 deletions
@@ -786,6 +786,73 @@ def translate_value(val):
 TIME_COMPRESSED_TABLE_NAME = "time_indexed_compressed"
 
 
+# ── Memory evidence mechanism (docs/design/memory-evidence-rfc.md) ────
+# Constants for the user-driven evidence counters. All scoring is expressed in
+# units of "net user confirmations" (§3.1.2 departs from the original task-spec formula by dropping the importance term).
+# Changing a threshold changes real behavior; see RFC §6.5 pre-merge reviewer gates.
+
+# §3.1.4 derived-status thresholds
+EVIDENCE_CONFIRMED_THRESHOLD = 1.0  # score ≥ 1 → confirmed
+EVIDENCE_PROMOTED_THRESHOLD = 2.0  # score ≥ 2 → promoted
+EVIDENCE_ARCHIVE_THRESHOLD = -2.0  # score ≤ -2 → archive_candidate
+
+# §3.5.3 archiving (sub_zero_days counter + shard size cap)
+EVIDENCE_ARCHIVE_DAYS = 14  # archive for real once sub_zero accumulates to this many days
+ARCHIVE_FILE_MAX_ENTRIES = 500  # max entries per archive shard file
+
+# §3.1.5 penalty for ignored feedback
+IGNORED_REINFORCEMENT_DELTA = -0.2  # check_feedback ignored → reinforcement += delta
+
+# §3.1.8 delta weight per signal source (v1.2.1: direct vs indirect)
+# Direct signals (the user explicitly responds to a surfaced reflection, or hits a negative keyword) weigh 1.0;
+# indirect signals (the Stage-2 LLM infers how a fact relates to a reflection) weigh 0.5, so that
+# LLM mis-associations do not pollute the evidence too quickly.
+USER_FACT_REINFORCE_DELTA = 0.5  # Stage-2 reinforces (indirect, silver standard)
+USER_FACT_NEGATE_DELTA = 1.0  # Stage-2 negates (negation keeps full weight even when indirect,
+# because an LLM "negates" verdict is usually semantically clearer)
+USER_CONFIRM_DELTA = 1.0  # check_feedback confirmed (direct, gold standard)
+USER_REBUT_DELTA = 1.0  # check_feedback denied (direct)
+USER_KEYWORD_REBUT_DELTA = 1.0  # keyword + LLM target check (direct + explicit)
+
+# Combo bonus for user_fact reinforces: once the cumulative count exceeds the threshold, each new signal
+# gets an extra bonus, so repeated indirect expressions can still catch up with one direct confirmation.
+# Default: the first 2 signals add 0.5 each; from the 3rd on, each adds 0.5 + 0.5 bonus = 1.0.
+USER_FACT_REINFORCE_COMBO_THRESHOLD = 2  # active when count > threshold
+USER_FACT_REINFORCE_COMBO_BONUS = 0.5  # extra weight per signal above the threshold
+
+# §3.4.3 trigger conditions for the background signal-extraction loop
+EVIDENCE_SIGNAL_CHECK_ENABLED = True  # independent switch
+EVIDENCE_SIGNAL_CHECK_EVERY_N_TURNS = 10  # trigger after N accumulated turns
+EVIDENCE_SIGNAL_CHECK_IDLE_MINUTES = 5  # or after N idle minutes
+EVIDENCE_SIGNAL_CHECK_INTERVAL_SECONDS = 40  # polling interval (aligned with IDLE_CHECK_INTERVAL)
+EVIDENCE_DETECT_SIGNALS_MAX_OBSERVATIONS = 200  # cap on existing observations included in the Stage-2 prompt
+
+# §3.6 render budget (used by PR-3; placeholder for now)
+PERSONA_RENDER_TOKEN_BUDGET = 2000  # budget for non-protected persona
+REFLECTION_RENDER_TOKEN_BUDGET = 1000  # reflection render budget
+PERSONA_RENDER_ENCODING = "o200k_base"  # tiktoken encoding
+
+# §3.9 merge-on-promote throttling (used by PR-3)
+EVIDENCE_PROMOTE_RETRY_BACKOFF_MINUTES = 30  # throttle window after consecutive failures
+EVIDENCE_PROMOTE_MAX_RETRIES = 5  # dead-letter threshold
+
+# §6.5 pre-merge reviewer gates: draft values, kept until reviewers finalize them
+# Gate 1: half-lives (§3.5.2)
+EVIDENCE_REIN_HALF_LIFE_DAYS = 30  # reinforcement half-life
+EVIDENCE_DISP_HALF_LIFE_DAYS = 180  # disputation half-life (longer than rein)
+
+# Gate 2: amount of context for reflection synthesis (§3.4.3 stage 2)
+REFLECTION_SYNTHESIS_CONTEXT_ABSORBED_COUNT = 10  # use the most recent N absorbed facts as reference
+REFLECTION_SYNTHESIS_CONTEXT_ABSORBED_DAYS = 14  # and only those within N days
+
+# Gate 3: LLM tier selection (candidates in the RFC §6.5 Gate 3 table)
+# "summary" = qwen-plus class; "correction" = qwen-max class; "emotion" = qwen-flash class
+EVIDENCE_EXTRACT_FACTS_MODEL_TIER = "summary"  # Stage-1 fact extraction
+EVIDENCE_DETECT_SIGNALS_MODEL_TIER = "correction"  # Stage-2 signal mapping
+EVIDENCE_NEGATIVE_TARGET_MODEL_TIER = "emotion"  # keyword second-pass check (latency-sensitive)
+EVIDENCE_PROMOTION_MERGE_MODEL_TIER = "correction"  # promote merge decision
+
+
 # Provider-related config has been consolidated into config.providers; it is re-exported here only for backward compatibility
 from config.providers import (  # noqa: E402, F401
     EXTRA_BODY_OPENAI,
@@ -907,4 +974,36 @@ def translate_value(val):
     # OpenFang
     'OPENFANG_PORT',
     'OPENFANG_BASE_URL',
+    # Memory evidence mechanism (RFC: docs/design/memory-evidence-rfc.md)
+    'EVIDENCE_CONFIRMED_THRESHOLD',
+    'EVIDENCE_PROMOTED_THRESHOLD',
+    'EVIDENCE_ARCHIVE_THRESHOLD',
+    'EVIDENCE_ARCHIVE_DAYS',
+    'ARCHIVE_FILE_MAX_ENTRIES',
+    'IGNORED_REINFORCEMENT_DELTA',
+    'USER_FACT_REINFORCE_DELTA',
+    'USER_FACT_NEGATE_DELTA',
+    'USER_CONFIRM_DELTA',
+    'USER_REBUT_DELTA',
+    'USER_KEYWORD_REBUT_DELTA',
+    'USER_FACT_REINFORCE_COMBO_THRESHOLD',
+    'USER_FACT_REINFORCE_COMBO_BONUS',
+    'EVIDENCE_SIGNAL_CHECK_ENABLED',
+    'EVIDENCE_SIGNAL_CHECK_EVERY_N_TURNS',
+    'EVIDENCE_SIGNAL_CHECK_IDLE_MINUTES',
+    'EVIDENCE_SIGNAL_CHECK_INTERVAL_SECONDS',
+    'EVIDENCE_DETECT_SIGNALS_MAX_OBSERVATIONS',
+    'PERSONA_RENDER_TOKEN_BUDGET',
+    'REFLECTION_RENDER_TOKEN_BUDGET',
+    'PERSONA_RENDER_ENCODING',
+    'EVIDENCE_PROMOTE_RETRY_BACKOFF_MINUTES',
+    'EVIDENCE_PROMOTE_MAX_RETRIES',
+    'EVIDENCE_REIN_HALF_LIFE_DAYS',
+    'EVIDENCE_DISP_HALF_LIFE_DAYS',
+    'REFLECTION_SYNTHESIS_CONTEXT_ABSORBED_COUNT',
+    'REFLECTION_SYNTHESIS_CONTEXT_ABSORBED_DAYS',
+    'EVIDENCE_EXTRACT_FACTS_MODEL_TIER',
+    'EVIDENCE_DETECT_SIGNALS_MODEL_TIER',
+    'EVIDENCE_NEGATIVE_TARGET_MODEL_TIER',
+    'EVIDENCE_PROMOTION_MERGE_MODEL_TIER',
 ]
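
The thresholds and half-lives in this diff are consumed by `memory/evidence.py`, which this page does not show. The sketch below illustrates how such constants could combine. It rests on two loudly stated assumptions: that decay is plain exponential half-life decay measured from each side's own last-signal clock, and that the score is decayed reinforcement minus decayed disputation. The function bodies are illustrative; only the constant values, the field names `rein_last_signal_at` / `disp_last_signal_at`, and the threshold-to-status comments come from the commit.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Values copied from the config/__init__.py diff.
EVIDENCE_CONFIRMED_THRESHOLD = 1.0
EVIDENCE_PROMOTED_THRESHOLD = 2.0
EVIDENCE_ARCHIVE_THRESHOLD = -2.0
EVIDENCE_REIN_HALF_LIFE_DAYS = 30
EVIDENCE_DISP_HALF_LIFE_DAYS = 180


def decayed(value: float, last_signal_at: Optional[datetime],
            now: datetime, half_life_days: float) -> float:
    """ASSUMPTION: exponential half-life decay since the last signal on this side."""
    if last_signal_at is None:
        return value
    elapsed_days = max(0.0, (now - last_signal_at).total_seconds() / 86400.0)
    return value * 0.5 ** (elapsed_days / half_life_days)


def evidence_score(entry: dict, now: datetime) -> float:
    """ASSUMPTION: score = decayed reinforcement - decayed disputation.

    Each side decays on its own clock, following the "independent clocks"
    wording in the commit message. Timestamps are datetime objects here.
    """
    rein = decayed(entry.get("reinforcement", 0.0), entry.get("rein_last_signal_at"),
                   now, EVIDENCE_REIN_HALF_LIFE_DAYS)
    disp = decayed(entry.get("disputation", 0.0), entry.get("disp_last_signal_at"),
                   now, EVIDENCE_DISP_HALF_LIFE_DAYS)
    return rein - disp


def derive_status(score: float) -> str:
    """Map a score to a status using the thresholds commented in the diff."""
    if score >= EVIDENCE_PROMOTED_THRESHOLD:
        return "promoted"
    if score >= EVIDENCE_CONFIRMED_THRESHOLD:
        return "confirmed"
    if score <= EVIDENCE_ARCHIVE_THRESHOLD:
        return "archive_candidate"
    return "pending"


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
entry = {
    "reinforcement": 3.0, "rein_last_signal_at": now - timedelta(days=30),
    "disputation": 0.5, "disp_last_signal_at": now - timedelta(days=180),
}
print(derive_status(evidence_score(entry, now)))  # confirmed (1.5 - 0.25 = 1.25)
```

With a 30-day half-life on reinforcement and 180 days on disputation, an old objection fades far more slowly than an old confirmation, which is the asymmetry the Gate 1 draft values encode.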

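
The combo bonus described for `user_fact` reinforces (first two signals add 0.5 each, the third onward adds 1.0) can be sketched as follows. The real logic lives in `compute_evidence_snapshot` in `memory/evidence.py`, whose body is not part of this page, so the helper name `apply_user_fact_reinforce` and its in-place dict update are hypothetical. The constants and the `user_fact_reinforce_count` field name are taken from the commit.

```python
# Values copied from the config/__init__.py diff.
USER_FACT_REINFORCE_DELTA = 0.5
USER_FACT_REINFORCE_COMBO_THRESHOLD = 2
USER_FACT_REINFORCE_COMBO_BONUS = 0.5


def apply_user_fact_reinforce(entry: dict) -> float:
    """Tick the per-entry counter and apply the (possibly boosted) delta.

    Hypothetical helper. The counter never resets and is not decayed, as the
    commit message states; the bonus activates once count > threshold.
    """
    count = entry.get("user_fact_reinforce_count", 0) + 1
    entry["user_fact_reinforce_count"] = count

    delta = USER_FACT_REINFORCE_DELTA
    if count > USER_FACT_REINFORCE_COMBO_THRESHOLD:
        delta += USER_FACT_REINFORCE_COMBO_BONUS

    entry["reinforcement"] = entry.get("reinforcement", 0.0) + delta
    return delta


entry = {"reinforcement": 0.0}
deltas = [apply_user_fact_reinforce(entry) for _ in range(4)]
print(deltas)  # [0.5, 0.5, 1.0, 1.0]
```

Direct signals such as `user_confirm` or `user_rebut` would bypass this helper entirely, since the commit message says they do not tick the counter.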