Commit 4b349b6
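* fix(memory): liveness dead-letter fallback for 7 terminal LLM failure sites (#1409)
Poison input (safety filter, output that never parses, prompt too long, etc.)
permanently exhausts the LLM ``_allm_call_with_retries`` → the caller never
advances its progress marker → the same input hits the LLM again next round and
fails the same way → that role's pipeline is stuck forever. Issue #1409
diagnosed 7 gaps with the same root cause. This adds a uniform per-input attempt
counter bounded by ``MEMORY_LIVENESS_MAX_ATTEMPTS=5``. Past the threshold, it
force-advances the marker or drops the input to dead-letter, mirroring the
existing ``_bump_fact_recheck_attempts`` pattern used for schema re-checks.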
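A minimal sketch of the per-input attempt counter. It assumes an in-memory dict keyed by cursor (as for sites 0a/0b/1) and an integer cursor. `LivenessCounter` and `process_once` are hypothetical names, not the project's code.

```python
from collections.abc import Hashable

# Assumed to mirror config.MEMORY_LIVENESS_MAX_ATTEMPTS from this commit.
MEMORY_LIVENESS_MAX_ATTEMPTS = 5


class LivenessCounter:
    """Hypothetical in-memory per-input failure counter."""

    def __init__(self, max_attempts: int = MEMORY_LIVENESS_MAX_ATTEMPTS) -> None:
        self.max_attempts = max_attempts
        self._attempts: dict[Hashable, int] = {}

    def record_failure(self, key: Hashable) -> bool:
        """Count one terminal failure; True means "give up on this input now"."""
        count = self._attempts.get(key, 0) + 1
        if count >= self.max_attempts:
            self._attempts.pop(key, None)  # forget it once it is dead-lettered
            return True
        self._attempts[key] = count
        return False

    def record_success(self, key: Hashable) -> None:
        """Success clears whatever failures had accumulated for this input."""
        self._attempts.pop(key, None)


def process_once(counter: LivenessCounter, cursor: int, call_llm) -> int:
    """Run one round and return the (possibly advanced) cursor."""
    result = call_llm(cursor)  # None stands in for "retries exhausted"
    if result is None:
        if counter.record_failure(cursor):
            return cursor + 1  # force-advance past the poison input
        return cursor  # keep the marker so the input is retried next round
    counter.record_success(cursor)
    return cursor + 1
```

With `max_attempts=5`, rounds 1 to 4 on a poison input return the same cursor and round 5 returns `cursor + 1`. That is the N-1 / N behavior described for the tests.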
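Fixed sites:
- 0a/0b: signal extract paths A/B repeatedly hit ``FactExtractionFailed`` /
``aextract_facts_with_known_pool returns None`` on the same cursor → force-advance
the cursor (in-memory counter)
- 1: rebuttal loop repeatedly gets ``check_feedback_for_confirmed returns None``
on the same cursor → force-advance and persist ``CURSOR_REBUTTAL_CHECKED_UNTIL``
(counter in memory, cursor persisted)
- 2: ``PersonaManager.resolve_corrections`` batch LLM failure → bump the persisted
``resolve_attempts`` of every entry in the batch; past the threshold, drop the
entry from the queue to dead-letter
- 3: ``FactDedupResolver.aresolve``, same as above (each pair persists
``resolve_attempts``)
- 4: refine cluster LLM failure → bump the persisted ``refine_attempts`` of the
non-fact members; entries at the limit are filtered out at the next cluster
gather; apply_refine_actions resets the counter to 0 when the stamp succeeds
- 7: outbox handler raises permanently → an append_attempt row is persisted;
once the cumulative count reaches the limit, append_done serves as the
dead-letter, which also unblocks compact, previously blocked forever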
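For the persisted sites (2/3/4), the counter lives on each entry. A rough sketch follows, assuming queue entries are dicts with a hypothetical `id` key. `bump_attempts_and_dead_letter` and `gather_refine_candidates` are illustrative names only.

```python
# Assumed to mirror config.MEMORY_LIVENESS_MAX_ATTEMPTS.
MEMORY_LIVENESS_MAX_ATTEMPTS = 5


def _attempts(entry: dict, field: str) -> int:
    """Read a counter; non-int values count as 0 (a deliberately strict reading)."""
    value = entry.get(field, 0)
    return value if isinstance(value, int) and not isinstance(value, bool) else 0


def bump_attempts_and_dead_letter(queue, batch, field="resolve_attempts",
                                  max_attempts=MEMORY_LIVENESS_MAX_ATTEMPTS):
    """Bump `field` on each batch entry; split exhausted entries off the queue.

    Entries are mutated in place so the caller can persist the returned queue.
    Returns (remaining_queue, dead_letters).
    """
    batch_ids = {entry["id"] for entry in batch}
    remaining, dead = [], []
    for entry in queue:
        if entry["id"] in batch_ids:
            entry[field] = _attempts(entry, field) + 1
            if entry[field] >= max_attempts:
                dead.append(entry)
                continue
        remaining.append(entry)
    return remaining, dead


def gather_refine_candidates(entries, max_attempts=MEMORY_LIVENESS_MAX_ATTEMPTS):
    """Site 4: entries that hit the limit are skipped at the next cluster gather."""
    return [e for e in entries if _attempts(e, "refine_attempts") < max_attempts]
```

Entries outside the failed batch are not touched, so a single poison batch cannot push healthy entries toward dead-letter.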
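Adds ``config.MEMORY_LIVENESS_MAX_ATTEMPTS=5``. Existing safeguards used as
reference (``_bump_fact_recheck_attempts`` + the ``_apromote_with_merge``
dead-letter) are left unchanged. ``recent.review_history`` /
``synthesize_reflections`` / the embedding worker either move on naturally as
their inputs evolve or run against a local service. They are not in this class
of bug and are out of scope.
Tests: ``tests/unit/test_memory_liveness_dead_letter.py``, 17 cases covering
every site: N-1 failures leave the marker in place, the Nth triggers
dead-letter, and the success path clears the counter.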
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(tests): clean up 3 redundancies flagged by lint in test_memory_liveness_dead_letter
- Remove `import os` / `import tempfile` (unused anywhere in the file)
- `op_id = outbox.append_pending(...)` in test_run_outbox_op_dead_letters_at_threshold
was never used afterwards (the pending op is re-read via outbox.pending_ops), so
the return value is no longer bound
The "unused global" warning on config's `MEMORY_LIVENESS_MAX_ATTEMPTS` is left
alone. The constant is imported across modules in 4 places (persona / fact_dedup /
refine driver / memory_server). CodeQL only scans module-local scope and cannot
see cross-module references, so this is a false positive.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
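* fix(memory): 3 review-feedback fixes (Codex P1 + CodeRabbit P1×2)
1. Outbox `_run_outbox_op` dead-letter now guards against attempt-persistence failure (Codex P1)
- Old logic: when `aappend_attempt` raised, it still incremented to
`total_attempts = prior+1` and could immediately `aappend_done`, even though the
attempt was never persisted → after a restart the disk shows only prior_attempts
(=N-1) attempts + 1 done → the op is lost for good while the record claims
"done after only N-1 failures", violating the contract of giving up only after
≥ N failures
- Fix: treat a failed attempt write as transient and keep the op pending for the next replay
- Add regression test: simulate N-1 attempts + an injected IOError, verify no dead-letter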
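A simplified synchronous sketch of that ordering follows. It assumes a hypothetical in-memory outbox whose `append_attempt` can fail with `OSError` (`IOError` is an alias of `OSError` in Python 3). The real code is async (`aappend_attempt` / `aappend_done`), and `InMemoryOutbox` is an illustrative stand-in.

```python
class InMemoryOutbox:
    """Hypothetical outbox: per-op persisted attempt counts plus a done set."""

    def __init__(self) -> None:
        self.attempts: dict[str, int] = {}
        self.done: set[str] = set()
        self.fail_next_attempt_write = False  # test hook simulating a disk error

    def append_attempt(self, op_id: str) -> None:
        if self.fail_next_attempt_write:
            self.fail_next_attempt_write = False
            raise OSError("attempt row not written")
        self.attempts[op_id] = self.attempts.get(op_id, 0) + 1

    def append_done(self, op_id: str) -> None:
        self.done.add(op_id)


def run_outbox_op(outbox: InMemoryOutbox, op_id: str, handler, max_attempts: int = 5) -> str:
    prior_attempts = outbox.attempts.get(op_id, 0)
    try:
        handler()
    except Exception:
        try:
            outbox.append_attempt(op_id)
        except OSError:
            # The failure was not recorded durably, so it must not count toward
            # the threshold: stay pending and let the next replay try again.
            return "pending"
        if prior_attempts + 1 >= max_attempts:
            outbox.append_done(op_id)  # dead-letter only after N persisted failures
            return "dead-letter"
        return "pending"
    outbox.append_done(op_id)
    return "done"
```

The invariant is that `append_done` for a failing op is reached only after the Nth attempt row is on disk. A restart therefore always observes at least N attempts before the done marker.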
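2. Site 0a cursor_key uses start_time instead of the literal 'cold' (CodeRabbit P1)
- Old `cursor_key = state.get('last_check_ts') or 'cold'` lumped the failures of
every cold-start round into the same bucket. On the Nth failure the cursor was
force-advanced to that round's now, dead-lettering the normal messages that
arrived in the meantime as well
- Fix: `cursor_key = start_time.isoformat(...)`. With a cursor, start_time ==
cursor (_signal_check_window_start returns the cursor directly); on cold start,
start_time differs every round → no false aggregation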
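A sketch contrasting the two keying schemes follows. `window_start` is a hypothetical stand-in for `_signal_check_window_start`, and the one-hour cold-start lookback is an assumption: the commit only says the cold-start start_time differs every round.

```python
from datetime import datetime, timedelta
from typing import Optional

COLD_START_LOOKBACK = timedelta(hours=1)  # hypothetical cold-start window


def window_start(last_check_ts: Optional[datetime], now: datetime) -> datetime:
    """Stand-in for _signal_check_window_start: return the cursor when present."""
    if last_check_ts is not None:
        return last_check_ts
    return now - COLD_START_LOOKBACK  # cold start: anchored to this round's now


def old_cursor_key(last_check_ts: Optional[datetime]) -> str:
    # Every cold-start round collapses into the single bucket "cold".
    return last_check_ts.isoformat() if last_check_ts else "cold"


def new_cursor_key(last_check_ts: Optional[datetime], now: datetime) -> str:
    # Keyed on the actual window start, so cold-start rounds never share a key.
    return window_start(last_check_ts, now).isoformat()
```

With a real cursor, both schemes give a stable key across rounds. Only the cold-start case changes: each round now counts failures against its own window instead of a shared "cold" bucket.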
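3. dedup + corrections: "LLM returns a list but consumes 0 items" also counts as an attempt (CodeRabbit P1 + mirror)
- Old logic: when the LLM returned a valid list whose actions were all invalid
(unknown action / invalid index / malformed), `_aapply_decisions` /
`_apply_correction_results` returned 0 → the queue stayed unchanged → the same
poisoned batch at the head was fed again next time → stuck forever
- Fix: both dedup and corrections call `_abump_*_attempts_and_dead_letter` to
count an attempt when `processed_keys` is empty / `resolved=0` (mirror: CodeRabbit
only flagged dedup; corrections got the symmetric fix)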
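A sketch of the "valid list, nothing consumed" check follows. The action vocabulary, the decision shape (`action` + `index`) and the function names are all hypothetical. The commit only says invalid actions, indexes or formats result in zero processed items.

```python
VALID_ACTIONS = {"merge", "keep_both"}  # hypothetical action vocabulary


def apply_decisions(pairs: list, decisions: list) -> list:
    """Return keys of the pairs a decision list actually consumed."""
    processed = []
    for d in decisions:
        if not isinstance(d, dict) or d.get("action") not in VALID_ACTIONS:
            continue  # unknown action or malformed item
        idx = d.get("index")
        if not isinstance(idx, int) or not 0 <= idx < len(pairs):
            continue  # invalid index
        processed.append(pairs[idx]["key"])
    return processed


def resolve_batch(pairs: list, llm_output, bump_attempts) -> str:
    if not isinstance(llm_output, list):
        bump_attempts(pairs)  # unusable output: counts as an attempt
        return "failed"
    if not apply_decisions(pairs, llm_output):
        # A well-formed list that consumed nothing leaves the queue unchanged,
        # so it must count as an attempt too or the same batch loops forever.
        bump_attempts(pairs)
        return "no-progress"
    return "progress"
```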
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
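* fix(memory): 2 CodeRabbit P1 round-2 fixes
1. _run_outbox_op short-circuits already dead-lettered ops on entry
- Edge case: the previous round pushed attempts to N but aappend_done failed →
the op stays pending → replay after restart runs the handler one more time. For a
non-idempotent handler (the outbox contract requires idempotency but does not
enforce it), that is a real duplicate side effect
- Fix: on entry, short-circuit when `prior_attempts >= MAX` and call append_done
directly, so the handler never runs again. MEMORY_LIVENESS_MAX_ATTEMPTS /
prior_attempts are hoisted to the top of the function, removing the duplicate
import / computation in the except block
- Add regression test test_run_outbox_op_short_circuits_when_already_dead_letter:
a mock handler fails if it ever runs, verifying the short-circuit path never reaches it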
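A minimal sketch of the entry short-circuit follows. It assumes a plain dict of persisted attempt counts and a set of done op ids. Both are hypothetical stand-ins for the real outbox storage.

```python
MEMORY_LIVENESS_MAX_ATTEMPTS = 5  # assumed to mirror the config value


def run_outbox_op(attempt_log: dict, done: set, op_id: str, handler) -> str:
    # Hoisted: read the persisted attempt count once, before anything runs.
    prior_attempts = attempt_log.get(op_id, 0)
    if prior_attempts >= MEMORY_LIVENESS_MAX_ATTEMPTS:
        # A previous run already reached N but its done write was lost;
        # finish the dead-letter without invoking the handler again.
        done.add(op_id)
        return "dead-letter"
    try:
        handler()
    except Exception:
        attempt_log[op_id] = prior_attempts + 1
        if prior_attempts + 1 >= MEMORY_LIVENESS_MAX_ATTEMPTS:
            done.add(op_id)
            return "dead-letter"
        return "pending"
    done.add(op_id)
    return "done"
```

Checking before the handler runs means that losing a done write costs at most one extra bookkeeping write, never an extra side effect.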
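2. _abump_refine_attempts (reflection) filters terminal statuses before saving
- `_aload_reflections_full` loads the full set, terminal entries included. If that
set is passed to `asave_reflections` → `_prepare_save_reflections` treats
active_ids as "the set that should stay alive", so every on-disk entry with the
same id hits `continue` and is never archived → old promoted/denied entries can
never be archived
- Fix: filter out REFLECTION_TERMINAL_STATUSES before saving, the same way
arecord_mentions / aupdate_suppressions / _aauto_promote_stale_locked do. The
persona side has no archive flow and does not need this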
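A toy model of why the filter matters follows. `prepare_save` only approximates the archive rule described above (on-disk ids absent from the save set get archived). The terminal status set is an assumption based on the promoted/denied example in the text.

```python
# Assumed members; the commit only names promoted/denied as terminal examples.
REFLECTION_TERMINAL_STATUSES = frozenset({"promoted", "denied"})


def prepare_save(on_disk: list, to_save: list):
    """Toy model of the save step: on-disk ids missing from to_save are archived."""
    active_ids = {r["id"] for r in to_save}
    archived = [r for r in on_disk if r["id"] not in active_ids]
    return to_save, archived


def bump_refine_attempts(full: list, ids: set) -> list:
    """Bump refine_attempts on `ids`, then return only entries safe to save."""
    for r in full:
        if r["id"] in ids:
            r["refine_attempts"] = r.get("refine_attempts", 0) + 1
    # Saving terminal entries would keep their ids "active" and block archiving.
    return [r for r in full if r.get("status") not in REFLECTION_TERMINAL_STATUSES]
```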
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
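* fix(memory): refine exception path no longer counts refine_attempts (Codex P1 round-3)
`await apply_fn(...)` (manager-side persistence) inside `_resolve_cluster` is not
wrapped in a try. When apply_fn raises (cloudsave maintenance mode / atomic_write
IO / lock contention), the exception propagates straight into the
`except Exception as e:` block of `refine_pass` and is counted as a cluster
liveness failure. An apply failure is a transient disk/lock-layer problem that
has nothing to do with the cluster's content. It should not accumulate the
entries' refine_attempts and trigger an unnecessary dead-letter.
Fix: refine_pass calls failure_fn only when `_resolve_cluster` explicitly returns
False (= empty LLM output / parse failure / non-list and other persistent problems
tied to the cluster's content). The exception path only does failed++ plus a
warning, with no bump.
Transient LLM network errors (ainvoke raising TimeoutError, etc.) are likewise
classified as transient and not bumped; persistent LLM problems are already
covered by the return-False path.
Add regression test `test_refine_pass_exception_does_not_invoke_failure_fn`:
mock `_resolve_cluster` to raise RuntimeError and verify failure_fn is not called.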
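A sketch of the split between the two failure paths follows, assuming `resolve_cluster` and `failure_fn` are async callables. The signatures and the stats dict are hypothetical, not the project's `refine_pass`.

```python
import logging
from typing import Any, Awaitable, Callable

logger = logging.getLogger(__name__)


async def refine_pass(
    clusters: list,
    resolve_cluster: Callable[[Any], Awaitable[bool]],
    failure_fn: Callable[[Any], Awaitable[None]],
) -> dict:
    stats = {"ok": 0, "failed": 0}
    for cluster in clusters:
        try:
            ok = await resolve_cluster(cluster)
        except Exception as e:
            # Transient (apply_fn IO, lock contention, LLM timeout): no bump.
            stats["failed"] += 1
            logger.warning("refine cluster failed transiently: %s", e)
            continue
        if ok is False:
            # Persistent, content-related failure: count it against the entries.
            stats["failed"] += 1
            await failure_fn(cluster)
        else:
            stats["ok"] += 1
    return stats
```

Both paths increment `failed`, but only the explicit `False` path feeds the liveness counter.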
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
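* docs(refine): sync the FailureFn contract description with the previous round's implementation
42edea7 changed the implementation (the exception path neither calls failure_fn
nor counts refine_attempts). However, the FailureFn type comment and the
refine_pass docstring still said "called when it returns False or raises", which
no longer matches. A later manager-side implementer could follow the old contract
and count exceptions toward refine_attempts again. Doc-only change to align with
the code.
CodeRabbit Minor feedback.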
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(memory): guard against dirty values when reading liveness attempt fields (Codex P2 round-4)
Patterns such as `int(d.get('refine_attempts', 0) or 0)` raise ValueError/TypeError
when manual edits, legacy data or migration noise leave dirty values such as `""`,
`"unknown"`, a list or a dict in the field. That crashes the upstream list
comprehension (candidate gather / batch selection) and with it the whole refine
pass / resolve loop, so the liveness fallback itself becomes a new liveness gap.
Add a shared helper `memory.facts.safe_int_field` (next to the existing
`safe_importance`) that returns the default for any dirty value instead of
raising. All 9 call sites switch to it:
- memory/fact_dedup.py: 2 (resolve_attempts gather + bump)
- memory/persona.py: 3 (resolve_attempts gather/bump + refine_attempts bump)
- memory/reflection.py: 1 (refine_attempts bump, function-local import)
- app/memory_server.py: 3 (outbox _attempt_count + the candidate gather filter in
2 refine driver functions)
Add regression test `test_safe_int_field_handles_dirty_values` covering '',
'unknown', 'high', list, dict and other dirty values, ensuring every case falls
back without raising.
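The commit does not show the helper's body or signature. The following is one hypothetical way to write it, with an assumed `(d, key, default=0)` signature and an assumed choice to reject booleans.

```python
import math
from typing import Any


def safe_int_field(d: Any, key: str, default: int = 0) -> int:
    """Read d[key] as an int; any dirty value yields `default` instead of raising."""
    value = d.get(key, default) if isinstance(d, dict) else default
    if isinstance(value, bool):
        return default  # True/False are not meaningful attempt counts
    if isinstance(value, int):
        return value
    if isinstance(value, float) and math.isfinite(value):
        return int(value)
    if isinstance(value, str):
        try:
            return int(value.strip())
        except ValueError:
            return default  # "", "unknown", "high", ...
    return default  # None, list, dict, NaN, anything else
```

A gather filter can then be written as `[e for e in entries if safe_int_field(e, "refine_attempts") < MEMORY_LIVENESS_MAX_ATTEMPTS]`. With this helper, one corrupt entry can no longer abort the whole comprehension.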
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Hongzhi Wen <cartabio.coder1@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 21677b3 commit 4b349b6
9 files changed
Lines changed: 1498 additions & 22 deletions
Directories changed: app, config, memory, tests/unit