Commit fea9d9e
refactor(memory): IdleMaint scheduling cleanup + LLM timeout safety net + review snapshot redesign with thinking enabled (Project-N-E-K-O#977)
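* refactor(memory): split the IdleMaint gate + stagger background loops + switch Stage-2 tier to summary
A1. Outbox replay concurrency 4→2, softening the load spike on the LLM backend at startup after 24h of downtime.
A2. IdleMaint subtask 2 (persona contradiction review) is no longer gated by recent_memory_auto_review
or REVIEW_SKIP_HISTORY_LEN. resolve_corrections does not read recent history; it is
an independent contradiction-resolution pipeline and should never have shared a gate with recent.review. The review
gate moves to the head of subtask 3.
A3. Add _INITIAL_DELAY_* staggering to the 5 background loops so their first rounds do not all fire at the same
startup + interval instant:
- IdleMaint: 20s (replaces the old startup_phase high-frequency polling mechanism)
- Signal extraction: 60s
- Rebuttal: 100s
- Auto-promote: 150s (50s apart from rebuttal)
- Archive sweep: 250s (far below INTERVAL=3600s, so users with short sessions still get one run)
Also fixed a latent busy loop where the except: continue path did not sleep (every except
branch now does await asyncio.sleep(INTERVAL)).
A4. Stage-2 signal detection tier changed from correction to summary (aligned with the PR Project-N-E-K-O#972 docstring);
also moved promotion merge from the summary list to the correction list in memory/__init__.py and neko-guide.md
(aligned with the actual value of EVIDENCE_PROMOTION_MERGE_MODEL_TIER).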
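The staggering and the busy-loop fix from A3 can be sketched as below. This is a minimal illustration, not the repository's code: `run_loop`, its parameters, and the `rounds` bound (used here only so the demo terminates; a real loop would presumably run forever) are hypothetical, and the delays are scaled down from seconds to fractions of a second.

```python
import asyncio

async def run_loop(name, initial_delay, interval, tick, log, rounds=1):
    """Run `tick` a bounded number of times after a one-off initial delay."""
    await asyncio.sleep(initial_delay)  # stagger the first round per loop
    for _ in range(rounds):
        try:
            await tick()
            log.append((name, "ok"))
        except Exception:
            log.append((name, "error"))
            # Sleep on the failure path too: a bare `continue` here would
            # retry immediately and spin while the backend stays down.
            await asyncio.sleep(interval)
            continue
        await asyncio.sleep(interval)

async def demo():
    log = []

    async def ok():
        pass

    async def boom():
        raise RuntimeError("backend down")

    # Initial delays keep the same ordering as 20s / 60s / 250s, scaled down.
    await asyncio.gather(
        run_loop("idle_maint", 0.00, 0.01, ok, log),
        run_loop("signal_extraction", 0.05, 0.01, boom, log),
        run_loop("archive_sweep", 0.10, 0.01, ok, log),
    )
    return log
```

Running `asyncio.run(demo())` shows each loop firing once in its staggered order, with the failing loop recorded as an error rather than spinning.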
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
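* refactor(memory): explicit timeout at every LLM call site + disable SDK auto-retry
Previously no memory/ LLM call passed a timeout, so the worst case was the OpenAI SDK default of 600s ×
3 attempts (1 initial + SDK default max_retries=2) = 30 minutes per call. recent.py and facts.py
also have a business-layer max_retries=3; stacked, the worst case for a single call reaches 1.5 hours.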
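The worst-case figures above can be checked with a few lines of arithmetic. One assumption is needed to reproduce the 1.5-hour number: the business-layer max_retries=3 is read as 3 attempts in total.

```python
SDK_TIMEOUT_S = 600      # OpenAI SDK default per-request timeout
SDK_MAX_RETRIES = 2      # OpenAI SDK default; retries come on top of the first attempt
BUSINESS_ATTEMPTS = 3    # assumption: business-layer max_retries=3 means 3 attempts total

# One SDK call: the initial attempt plus two retries, each allowed to hit the timeout.
sdk_worst_s = SDK_TIMEOUT_S * (1 + SDK_MAX_RETRIES)

# The business layer repeats that whole SDK call for each of its own attempts.
stacked_worst_s = sdk_worst_s * BUSINESS_ATTEMPTS

print(sdk_worst_s / 60, "minutes")      # 30.0 minutes
print(stacked_worst_s / 3600, "hours")  # 1.5 hours
```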
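Each site gets an explicit timeout + max_retries=0, sized by the nature of its call path:
- recall._fine_rank: 8s (request path; upstream query_memory cuts off at 5s)
- recent.compress_history / further_compress (_get_llm): 30s (request path)
- recent.review_history (_get_review_llm): 120s (background, long prompt)
- persona._resolve_corrections_locked: 90s (holds a lock, which would stall the /process path)
- fact_dedup._aresolve_locked: 60s (holds a lock but only blocks the background worker)
- facts._allm_call_with_retries (Stage-1/2/negative-keyword): 60s default
- reflection._synthesize_reflections_locked: 90s (holds a lock; multi-field JSON output)
- reflection._check_feedback_locked: 60s (background classification)
- reflection.check_feedback_for_confirmed: 60s (periodic rebuttal scan)
- reflection._allm_call_promotion_merge: 45s (short decision prompt)
max_retries=0 funnels all retries into the business layer (the existing _allm_call_with_retries
and similar helpers), so they no longer stack with the SDK default max_retries=2 and triple the attempt count. An SDK
timeout goes straight to the business-layer retry or to the outer try/except fallback.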
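The pattern described above, one hard timeout per attempt with retries owned by a single layer, can be sketched without the SDK. In this illustration `asyncio.wait_for` stands in for the SDK's per-request timeout, and `call_with_retries`, `flaky`, and `always_hangs` are hypothetical names; the real `_allm_call_with_retries` is not shown in the commit.

```python
import asyncio

async def call_with_retries(make_call, *, timeout_s, max_attempts=3):
    """Business-layer retry where every attempt has its own hard timeout.

    The worst case is bounded by timeout_s * max_attempts, because no lower
    layer retries behind our back (the max_retries=0 idea).
    """
    last_exc = None
    for attempt in range(1, max_attempts + 1):
        try:
            return await asyncio.wait_for(make_call(attempt), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_exc = exc  # remember the failure and try again
    raise last_exc

async def flaky(attempt):
    """Hangs on the first attempt, answers on the second."""
    if attempt == 1:
        await asyncio.sleep(1.0)  # simulated hung backend, cut off by the timeout
    return f"ok on attempt {attempt}"

async def always_hangs(attempt):
    await asyncio.sleep(1.0)
```

With `timeout_s=0.05`, `asyncio.run(call_with_retries(flaky, timeout_s=0.05))` recovers on the second attempt, while `always_hangs` surfaces a timeout to the caller after all attempts are used up.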
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(memory): review snapshot+capacity redesign + enable thinking for Stage-2 / review_history
Phase C: review scheduling changes from "cancel-and-restart on every /process" to "one shared spawn
gate, no interruption":
- /process /renew /settle no longer cancel a running review; they call the shared
maybe_spawn_review(name), which skips the spawn when it sees a review in flight.
- IdleMaint subtask 3 also calls maybe_spawn_review; all inline gates are removed.
- maybe_spawn_review serializes gate+spawn with a per-name asyncio.Lock and runs 5 gates:
in-flight / review_enabled / history_len / min_interval (×2 when active) /
user messages accumulated since the last cutoff ≥ MIN_NEW_MSGS_FOR_REVIEW (5).
- REVIEW_MIN_INTERVAL 300s → 30s (rate limiting now comes from both MIN_NEW_MSGS=5 and the active ×2 multiplier).
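The five gates can be sketched as a single function guarded by a per-name lock. This is an illustration under assumptions, not the actual implementation: `ReviewState` is a hypothetical container, the value of `REVIEW_SKIP_HISTORY_LEN` and the direction of the history_len check are guesses, and only the gate order and the constants 30 and 5 come from the commit message.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Optional

REVIEW_MIN_INTERVAL = 30       # seconds, from the commit message
MIN_NEW_MSGS_FOR_REVIEW = 5    # from the commit message
REVIEW_SKIP_HISTORY_LEN = 4    # hypothetical value; only the name is in the commit

@dataclass
class ReviewState:             # hypothetical per-name state
    review_enabled: bool = True
    history_len: int = 0
    last_review_ts: Optional[float] = None
    new_user_msgs: float = float("inf")  # infinity when no cutoff fingerprint is stored
    active: bool = False
    task: Optional[asyncio.Task] = None

_spawn_locks = {}

async def maybe_spawn_review(name, state, run_review, now=None):
    """Return the name of the gate that blocked the spawn, or 'spawned'."""
    lock = _spawn_locks.setdefault(name, asyncio.Lock())
    async with lock:  # gate evaluation and spawn form one step per name
        now = time.time() if now is None else now
        if state.task is not None and not state.task.done():
            return "in_flight"            # gate 1: never interrupt a running review
        if not state.review_enabled:
            return "disabled"             # gate 2
        if state.history_len < REVIEW_SKIP_HISTORY_LEN:
            return "history_len"          # gate 3 (assumed direction)
        min_interval = REVIEW_MIN_INTERVAL * (2 if state.active else 1)
        if state.last_review_ts is not None and now - state.last_review_ts < min_interval:
            return "min_interval"         # gate 4
        if state.new_user_msgs < MIN_NEW_MSGS_FOR_REVIEW:
            return "new_msgs"             # gate 5
        state.task = asyncio.create_task(run_review(name))
        return "spawned"

async def demo():
    started = []

    async def run_review(name):
        started.append(name)
        await asyncio.sleep(0.05)

    state = ReviewState(history_len=10)
    first = await maybe_spawn_review("neko", state, run_review)
    second = await maybe_spawn_review("neko", state, run_review)  # first still running
    await state.task
    state.last_review_ts, state.new_user_msgs = 1000.0, 6
    third = await maybe_spawn_review("neko", state, run_review, now=1010.0)
    fourth = await maybe_spawn_review("neko", state, run_review, now=1040.0)
    await state.task
    return first, second, third, fourth, started
```

In the demo the second call is skipped because the first review is still in flight, the third is blocked by the 30s interval, and the fourth passes every gate.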
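review_history accepts a snapshot parameter (a copy of history taken at spawn time):
- The LLM input is the snapshot, leaving the current history untouched, so /process can keep appending, compressing, etc. in the meantime
- On completion, the fingerprint of the last K=3 snapshot messages locates cutoff_idx in the current history
- Walk backwards to find capacity (the length of the contiguous match), then replace the slot
[cutoff_idx-capacity+1, cutoff_idx] with the last min(capacity, len(corrected)) messages of corrected;
new messages after cutoff_idx are left untouched
- Review output shorter than capacity = the review decided to delete messages → the result is shorter than before
- The cutoff no longer matches in the current history (compressed away / cleared by /new_dialog) → return 'white'
→ the caller sets last_reviewed_cutoff_tail to None → the next gate evaluation treats the count as ∞ and lets it through →
an immediate re-review rebuilds the fingerprint
- Any SystemMessage (summary memo) in the review LLM output is force-dropped to protect the compression boundary
New persisted field: _maint_state[name].last_reviewed_cutoff_tail (K=3 fingerprint).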
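The snapshot patching steps above can be sketched as follows. This is a simplified model under stated assumptions: messages are plain `(role, content)` tuples rather than message objects, the fingerprint is taken to be role plus `content[:50]` for the last K messages, and `apply_review`, `locate_cutoff`, and `fingerprint` are hypothetical names.

```python
K = 3  # fingerprint length, from the commit message

def _key(msg):
    role, content = msg
    return (role, content[:50])  # assumed fingerprint granularity

def fingerprint(history, k=K):
    return tuple(_key(m) for m in history[-k:])

def locate_cutoff(current, fp):
    """Index in `current` of the last fingerprinted message, or None."""
    k = len(fp)
    if k == 0:
        return None
    for end in range(len(current), k - 1, -1):  # search from the tail
        if tuple(_key(m) for m in current[end - k:end]) == fp:
            return end - 1
    return None

def apply_review(current, snapshot, corrected):
    """Return (status, history); `current` itself is never mutated."""
    # Drop system messages from the review output to protect the summary boundary.
    corrected = [m for m in corrected if m[0] != "system"]
    cutoff = locate_cutoff(current, fingerprint(snapshot))
    if cutoff is None or not corrected:
        return "white", current
    # Walk backwards while current and snapshot still agree.
    capacity = 0
    while (capacity <= cutoff and capacity < len(snapshot)
           and _key(current[cutoff - capacity]) == _key(snapshot[-1 - capacity])):
        capacity += 1
    keep = corrected[-min(capacity, len(corrected)):]
    # Replace the reviewed slot; anything after cutoff is kept as-is.
    return "patched", current[:cutoff - capacity + 1] + keep + current[cutoff + 1:]

snapshot = [("user", "hi"), ("ai", "hello"), ("user", "my cat is Tom"), ("ai", "nice dog")]
current = snapshot + [("user", "arrived during review")]
corrected = [("system", "memo")] + snapshot[:3] + [("ai", "nice cat")]
status, patched = apply_review(current, snapshot, corrected)
```

Here `patched` keeps the message that arrived during the review at the end, carries the corrected reply in the reviewed slot, and contains no system message from the review output.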
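Phase D: enable thinking:
- Stage-2 signal detection ([memory/facts.py](memory/facts.py)
_allm_detect_signals): explicitly pass extra_body=None to turn off auto-resolution, so thinking
models respond with their default behavior; timeout raised to 90s. The task is judging the relation between
new_fact × existing_observation and choosing target_id; the existing defensive code is already compensating for
LLM hallucination, and thinking should reduce target_id mismatches. It runs fully in the background with nobody waiting.
- review_history (recent.py:_get_review_llm): explicitly pass extra_body=None to enable
thinking. After the Phase C redesign, review holds no manager lock, does not block the user path,
and can safely run concurrently, so enabling thinking is pure upside (rewriting history is judgment-dense).
timeout stays at 120s.
The _allm_call_with_retries helper gains an extra_body parameter (the default sentinel means "the caller
did not specify, use create_chat_llm auto-resolution"; an explicit None means "enable thinking"), keeping
Stage-1 fact extract / negative keyword check behavior unchanged.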
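The sentinel default is what lets an explicit None carry meaning. A minimal sketch of that pattern follows; `build_llm_kwargs` and `resolve_extra_body` are hypothetical stand-ins, and the `{"enable_thinking": False}` payload is invented purely for illustration (the commit does not show what create_chat_llm's auto-resolution produces).

```python
_UNSET = object()  # sentinel: "caller did not say anything about extra_body"

def resolve_extra_body(model):
    """Hypothetical stand-in for the auto-resolution done by create_chat_llm."""
    return {"enable_thinking": False}  # illustrative payload only

def build_llm_kwargs(model, timeout, extra_body=_UNSET):
    kwargs = {"model": model, "timeout": timeout, "max_retries": 0}
    if extra_body is _UNSET:
        # Default path: behavior unchanged for callers that pass nothing.
        kwargs["extra_body"] = resolve_extra_body(model)
    else:
        # Explicit value, including None, is passed through untouched.
        kwargs["extra_body"] = extra_body
    return kwargs

default_call = build_llm_kwargs("some-model", 60)
thinking_call = build_llm_kwargs("some-model", 90, extra_body=None)
```

A plain `extra_body=None` default could not distinguish "not specified" from "explicitly None", which is why a dedicated sentinel object is compared with `is`.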
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
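* docs(memory): add a semantic comment to the empty except in maybe_spawn_review
Responds to the github-code-quality bot's inline note on PR Project-N-E-K-O#977: the purpose of the pass when
last_review_ts fails to parse (treat it as "never reviewed") needs to be stated explicitly.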
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
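* fix(memory): address three CodeRabbit review comments (review snapshot self-healing / identity comparison / patched fingerprint)
Issue #1 (a white review must not refresh last_review_ts, Major):
When review_history returned 'white', the old code also updated last_review_ts. That left gate 4
(min_interval) blocking for another 30/60s, against the user's original intent: a white review is itself a strong
signal that "the cutoff is stale and the anchor should be rebuilt as soon as possible". Now a white review only clears the
fingerprint and leaves last_review_ts alone; on the next round gate 4 uses the old ts (usually already past the interval) and
gate 5 treats the count as ∞ and passes → immediate re-review.
Issue #2 (finally cleanup could delete the handle of a newer spawn, Critical):
The old finally did an unconditional pop()/clear(), which under concurrency could delete the
correction_tasks / correction_cancel_flags entries that maybe_spawn_review had just written. In theory the spawn lock +
asyncio's synchronous finally semantics already rule out this race (done() does not return True until finally
completes, so maybe_spawn_review's in-flight check would not pass), but an identity comparison is
cheap defense. Now the cleanup compares identity against asyncio.current_task() / cancel_event before
popping, so it only removes its own entries.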
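The identity-checked cleanup from Issue #2 can be sketched like this. The dict names and `_run_review_in_background` follow the commit message, but the function body and the `work` parameter are assumptions, and the demo forces an overwrite artificially (the commit notes the real race should not occur) just to show what the check protects.

```python
import asyncio

correction_tasks = {}
correction_cancel_flags = {}

async def _run_review_in_background(name, cancel_event, work):
    try:
        await work()
    finally:
        # Remove an entry only if it still belongs to this run.
        if correction_tasks.get(name) is asyncio.current_task():
            correction_tasks.pop(name, None)
        if correction_cancel_flags.get(name) is cancel_event:
            correction_cancel_flags.pop(name, None)

async def demo():
    async def work():
        await asyncio.sleep(0.01)

    # Normal case: the run cleans up its own entries.
    solo_event = asyncio.Event()
    solo = asyncio.create_task(_run_review_in_background("solo", solo_event, work))
    correction_tasks["solo"], correction_cancel_flags["solo"] = solo, solo_event
    await solo

    # Forced overwrite: a newer owner replaced the entries before the old run ended.
    old_event, new_event, new_owner = asyncio.Event(), asyncio.Event(), object()
    old = asyncio.create_task(_run_review_in_background("neko", old_event, work))
    correction_tasks["neko"], correction_cancel_flags["neko"] = new_owner, new_event
    await old

    return (
        "solo" not in correction_tasks and "solo" not in correction_cancel_flags,
        correction_tasks["neko"] is new_owner,
        correction_cancel_flags["neko"] is new_event,
    )
```

With an unconditional `pop()`, the last two checks in the demo would fail because the old run would delete entries it no longer owns.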
Issue #3 (the success path must return the post-patch fingerprint, Major):
review_history used to return True, and the caller ran build_review_fingerprint on the snapshot.
But the review may have rewritten any of the last K messages, so the stored old fingerprint can never be located in the
new history again; the next _count_new_user_msgs_since_last_review degenerates
to ∞ (always true), gate 5 becomes meaningless, and every /process triggers a review. Now:
- review_history returns a (status, fingerprint) tuple:
('patched', new_fp) / ('white', None) / ('failed', None)
- new_fp is computed inside review_history from the tail of the patched new_history
- the caller writes it straight into maint_state and no longer uses build_review_fingerprint(snapshot)
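On the caller side, the three statuses could be handled roughly as below. `record_review_result` is a hypothetical helper, `maint_state` is modeled as a plain dict, and leaving the state untouched on 'failed' is an assumption, since the commit message does not say what happens in that case.

```python
import time

def record_review_result(maint_state, name, result, now=None):
    """Persist the outcome of review_history for `name` (illustrative only)."""
    status, new_fp = result
    entry = maint_state.setdefault(name, {})
    now = time.time() if now is None else now
    if status == "patched":
        # Store the fingerprint of the patched history, not of the snapshot.
        entry["last_reviewed_cutoff_tail"] = new_fp
        entry["last_review_ts"] = now
    elif status == "white":
        # Clear the anchor so the new-message gate sees infinity next round,
        # and deliberately leave last_review_ts as it was.
        entry["last_reviewed_cutoff_tail"] = None
    # 'failed': nothing is changed (assumption).
    return entry

state = {"neko": {"last_review_ts": 100.0, "last_reviewed_cutoff_tail": ("old",)}}
record_review_result(state, "neko", ("white", None), now=500.0)
after_white = dict(state["neko"])
record_review_result(state, "neko", ("patched", ("new",)), now=600.0)
after_patched = dict(state["neko"])
```

After the white result the timestamp is still the old one, so the interval gate does not delay the follow-up review that rebuilds the anchor.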
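Also handled:
- An empty corrected (the rare case where the LLM returns an empty "corrected conversation") is treated as a white review,
so the anchor does not drift into an un-reviewed region
- _run_review_in_background takes cancel_event as an explicit parameter (no longer read from the dict),
to go with the identity comparison
- Verified the algorithm by hand: the OLD fingerprint (snapshot tail) is not found in the patched history;
the NEW fingerprint (patched tail) is found in the patched history ✓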
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
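* fix(memory): normalize review LLM output content to str (CodeRabbit Issue #4)
The review prompt asks the LLM to return {role, content} JSON, but thinking models occasionally emit
content as a list/dict (multimodal-segment style). The old code passed it straight into
HumanMessage(content=...), and downstream consumers (recall / prompt build / the content[:50] slice in
fingerprint comparison) break on a non-string.
Reuses the normalization strategy that compress_history already has: list → join each dict's text or str(item);
anything else → str(item).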
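The normalization rule above can be sketched as a small function. `normalize_content` is a hypothetical name, and joining the parts with an empty string is an assumption; the commit only says that list items contribute their `text` field or `str(item)`.

```python
def normalize_content(content):
    """Coerce LLM-returned message content to a plain string."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for item in content:
            if isinstance(item, dict) and "text" in item:
                parts.append(str(item["text"]))  # multimodal-style text segment
            else:
                parts.append(str(item))
        return "".join(parts)  # separator is an assumption
    return str(content)  # dicts and anything else fall back to str()

segments = [{"type": "text", "text": "Hello, "}, "world"]
normalized = normalize_content(segments)
```

After normalization, string-only operations such as the `content[:50]` slice used for fingerprints are safe on every message.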
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Hongzhi Wen <cartabio.coder1@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 6cd35ca commit fea9d9e
10 files changed
Lines changed: 608 additions & 270 deletions
File tree
- .agent/rules
- config
- memory