Fall back to guarded background segmented compression when compression keeps failing, to prevent unbounded context growth #1632
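#1629 covers transient compression failures (a rate-limit blip that recovers on the next turn: skip this round on failure, keep the full history, retry next turn). Persistent failures, however (constant 429s, constant timeouts, or results overwritten by concurrent later turns), leave the history permanently uncompressed and growing without bound. When main-path compression fails, this change starts a guarded one-shot background compression as a best-effort fallback and cancels it as soon as the main path succeeds on some turn. If that still does not work, the oldest uncompressed raw messages are dropped once the history exceeds a very large hard cap, which keeps it bounded.

- recent.py: compress_history is refactored into reusable helpers (_render_messages_to_text / _build_summary_prompt / _invoke_summary_llm) with the single-call path unchanged; oversized input goes through segmented map-reduce compression, which shrinks each LLM input and avoids timeouts caused by oversized input. update_history gains an on_compress_done callback hook; new merge_backup_memo (merges and writes back by aligning on a fingerprint snapshot, reusing _compute_review_capacity) and _enforce_hard_cap (last-resort trimming).
- memory_server.py: _on_compress_done callback (start the background task on failure / cancel it and clear the backoff on success); _run_backup_compress orchestration (compress outside the lock, merge inside _get_settle_lock); compress_backup_tasks in-flight de-duplication; reuses the Gate6 failure backoff so a persistently failing summary model does not burn calls for nothing; wired into the four compression call sites: /process, /renew, /settle, and IdleMaint.
- config: RECENT_COMPRESS_INPUT_BUDGET_TOKENS (segmentation, 8000) and RECENT_HARD_CAP_TOKENS (fallback, 60000; set very high so it does not trigger in normal operation).
- Tests: segment splitting/paths, hard-cap trimming, snapshot merge (merged/moot), callback with ok true/false, background backoff/in-flight/dead-letter/reset.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>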
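The segmented path mentioned in the commit message can be pictured as a split by token budget followed by a map step and a reduce step. The following is only a minimal sketch under stated assumptions: `count_tokens` is a crude whitespace counter rather than a real tokenizer, `summarize` is a stand-in for the LLM call, and `split_by_budget` / `map_reduce_summary` are hypothetical names, not the PR's actual helpers.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def split_by_budget(texts: List[str], budget: int) -> List[List[str]]:
    """Greedily pack texts into batches whose token total stays within budget.

    A single text larger than the budget still gets a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for text in texts:
        cost = count_tokens(text)
        if current and used + cost > budget:
            batches.append(current)  # close the full batch
            current, used = [], 0
        current.append(text)
        used += cost
    if current:
        batches.append(current)
    return batches


def map_reduce_summary(
    texts: List[str], budget: int, summarize: Callable[[str], str]
) -> str:
    """Summarize each batch (map), then summarize the partial summaries (reduce)."""
    batches = split_by_budget(texts, budget)
    if len(batches) == 1:
        return summarize("\n".join(batches[0]))  # small input: a single call
    partials = [summarize("\n".join(batch)) for batch in batches]  # map
    return summarize("\n".join(partials))  # reduce (one level only in this sketch)
```

A real implementation would also have to handle the case where the joined partial summaries are themselves over budget, which is what the later review comments on the reduce loop discuss.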
No actionable comments were generated in the recent review.
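Walkthrough
This PR adds a background fallback task chain for main-path history-compression failures and refactors compress_history into a modular, segmented map-reduce flow. It also adds hard-cap trimming, fingerprint-based backoff, and merge write-back, and injects callbacks into IdleMaint and the conversation endpoints to drive the fallback, meow.
Changes
Recent-history compression fallback and flow refactor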
Estimated code review effort: 4 (Complex), ~45 minutes
Pre-merge checks: 4 passed
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 781e66405f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    if await acount_tokens(summary) > MAX_SUMMARY_TOKENS:
        reduced = await self.further_compress(summary)
        if reduced is None:
            logger.warning(f"[RecentHistory] {lanlan_name} second-stage compression failed, skipping this round")
            return None
Keep retrying after second-stage compression fails
When Stage 1 returns an oversized summary and further_compress() exhausts its own retries once, this now returns None immediately and skips the whole compression. Before this refactor, the outer summary loop continued and retried Stage 1 up to its remaining attempts, which matters when the second-stage call is transiently failing or when another Stage-1 response would be short enough to avoid Stage 2. In that scenario long histories remain uncompressed and fall into the new backup/hard-cap path unnecessarily.
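The behaviour this comment asks for, retrying Stage 1 when Stage 2 fails while still counting every attempt, could look roughly like the sketch below. It is only an illustration: `compress_with_retry`, `stage1`, `stage2`, and `is_oversized` are hypothetical names, and the attempt limit of 3 is an assumed value, not one taken from the repository.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def compress_with_retry(
    stage1: Callable[[], Awaitable[Optional[str]]],
    stage2: Callable[[str], Awaitable[Optional[str]]],
    is_oversized: Callable[[str], bool],
    max_attempts: int = 3,  # assumed limit for this sketch
) -> Optional[str]:
    """Retry stage 1 when stage 2 fails; every pass through the loop is counted."""
    for _ in range(max_attempts):
        summary = await stage1()
        if summary is None:
            continue  # stage 1 failed; the for-loop still consumes an attempt
        if not is_oversized(summary):
            return summary  # short enough, stage 2 is not needed
        reduced = await stage2(summary)
        if reduced is not None:
            return reduced
        # stage 2 failed: fall through and try stage 1 again
    return None  # attempts exhausted; the caller skips this round
```

Because the loop is a bounded `for`, a `continue` cannot spin forever, which is the risk with an uncounted `continue` inside a `while` loop.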
    async with _get_settle_lock(lanlan_name):
        status = await recent_history_manager.merge_backup_memo(lanlan_name, snapshot, result[0])
Serialize backup merges with /process writes
This lock only protects callers that also use _get_settle_lock, but /process still writes recent history without that lock at app/memory_server.py:3937. If a backup merge reaches the awaited disk-write section while a /process update reloads and writes the same recent.json, the last writer can drop either the newly generated memo or the just-added turn, so the backup merge is not actually protected during active chat.
    def _trim():
        if count_tokens(self._render_messages_to_text(history, lanlan_name)) <= RECENT_HARD_CAP_TOKENS:
            return None  # under the cap, leave untouched
Count the full history when enforcing the hard cap
_render_messages_to_text() truncates every message to RECENT_PER_MESSAGE_MAX_TOKENS for summary input, but the context path later appends the full message content in app/memory_server.py:4363-4369. With sustained compression failures and long pasted turns, the hard-cap check can count hundreds of >500-token messages as under RECENT_HARD_CAP_TOKENS while the actual injected recent history is far above the cap, so the new fallback does not reliably bound prompt size.
    except Exception as e:
        logger.error(f"[RecentHistory] {lanlan_name} background compression merge failed to write to disk: {e}", exc_info=True)
    logger.info(
        f"[RecentHistory] {lanlan_name} background compression merge done: history {len(current)}→{len(new_history)}"
    )
    return 'merged'
Report failed backup writes as failures
If the atomic write fails here (for example a transient filesystem/cloud-save error), this logs the exception but still falls through to return 'merged'; _run_backup_compress() then clears the backup failure state and reports success. That leaves only the in-memory memo while recent.json remains uncompressed, so a restart or reload loses the successful backup compression and the retry budget has already been reset.
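One way to picture the fix is a merge step that reports a distinct status when the write fails, and a caller that only clears the retry counter on a real success. This is a sketch under assumptions: `merge_and_persist`, `after_backup`, and the `fail_attempts` key are hypothetical, and only `OSError` is caught here, whereas the quoted code catches `Exception`.

```python
from typing import Callable, Dict, List


def merge_and_persist(new_history: List[str], write: Callable[[List[str]], None]) -> str:
    """Return 'merged' only if the write actually succeeded."""
    try:
        write(new_history)
    except OSError:
        return "failed"  # do not fall through to the success path
    return "merged"


def after_backup(status: str, state: Dict[str, int]) -> None:
    """Clear the failure counter on success, bump it when the write failed."""
    if status in ("merged", "moot"):
        state["fail_attempts"] = 0
    else:
        state["fail_attempts"] = state.get("fail_attempts", 0) + 1
```

With this shape a transient filesystem error keeps the retry budget intact instead of resetting it.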
Actionable comments posted: 4
🧹 Nitpick comments (1)
tests/unit/test_recent_compress_backup.py (1)
126-150: ⚡ Quick win
The test name says "resets", but there is no assertion that the state is actually reset; suggest adding one, meow.
Right now it only verifies that a background task is started again. It should also assert that the dead-letter state is cleared and that the persistence save was triggered, so a future regression is not missed, meow.
Example assertions that can be added directly, meow
- with patch.object(memory_server, "recent_history_manager", fake_mgr), \
-         patch.object(memory_server, "_asave_maint_state", AsyncMock()):
+ with patch.object(memory_server, "recent_history_manager", fake_mgr), \
+         patch.object(memory_server, "_asave_maint_state", AsyncMock()) as save_state:
      await memory_server._on_compress_done(name, new_snapshot, ok=False, detailed=False)
      # input changed → counters reset and let through, background task started
      task = memory_server.compress_backup_tasks.get(name)
      assert task is not None
+     assert memory_server._maint_state[name]["compress_backup_fail_attempts"] == 0
+     assert memory_server._maint_state[name]["compress_backup_fail_fp"] is None
+     save_state.assert_awaited_once()
      await _cleanup_task(task)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/test_recent_compress_backup.py` around lines 126 - 150, Add assertions after awaiting memory_server._on_compress_done to verify the dead-letter state was cleared and persisted: assert memory_server._maint_state[name]["compress_backup_fail_attempts"] == 0 and that memory_server._maint_state[name]["compress_backup_fail_fp"] is falsy (e.g. None or empty string) to confirm the fingerprint was cleared, and assert the patched AsyncMock memory_server._asave_maint_state was awaited (e.g. _asave_maint_state.assert_awaited()) so persistence was triggered; keep existing checks that a background task was started via memory_server.compress_backup_tasks.get(name).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@memory/recent.py`:
- Around line 665-667: Remove the incorrect early-return fast-path that assumes
few messages cannot exceed token hard cap: delete the len(history) <=
self.max_history_length + 1 check in the block handling
self.user_histories[lanlan_name] and instead compute the actual token usage for
`history` (using the existing token-counting utility in this module/class) and
only return when the computed token count is safely <= RECENT_HARD_CAP_TOKENS;
keep references to `history`, `self.user_histories`, `self.max_history_length`
and `RECENT_HARD_CAP_TOKENS` so the logic enforces token-based truncation rather
than message-count heuristics.
- Around line 573-586: The loop in reduce (using _split_texts_by_budget,
_invoke_summary_llm, _build_summary_prompt) currently breaks when len(batches)
>= len(partials) but then returns "\n\n".join(partials), potentially handing
back already-over-budget partials to compress_history; change that behavior so
when the reduction cannot shrink further (len(batches) >= len(partials)) the
function returns None (or another explicit failure signal) immediately instead
of breaking and returning partials, so upstream (compress_history) can handle
the over-budget case instead of re-sending an oversized chunk.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro Plus
Run ID: e367eb4a-1715-4817-99da-3f1fbaf2724f
📒 Files selected for processing (5)
- app/memory_server.py
- config/__init__.py
- memory/recent.py
- tests/unit/test_recent_compress_backup.py
- tests/unit/test_recent_compression_failure.py
        # 2) Merge and write back (inside the lock, fast). merge_backup_memo aligns on the fingerprint; if the backlog
        #    was already compressed by the main path or cleared, it returns 'moot' and the result is discarded (wasted work).
        async with _get_settle_lock(lanlan_name):
            status = await recent_history_manager.merge_backup_memo(lanlan_name, snapshot, result[0])
        # Both 'merged' and 'moot' mean this backlog is handled or stale, so clear the backoff counter.
        await _clear_compress_backup_failure(lanlan_name)
        logger.info(f"[CompressBackup] {lanlan_name} background compression done: {status}")
    except asyncio.CancelledError:
        logger.info(f"[CompressBackup] {lanlan_name} background compression cancelled (main path already succeeded)")
    except Exception as e:
        logger.error(f"[CompressBackup] {lanlan_name} background compression post-processing error: {e}")
Count exceptions in the merge/write-back stage toward the backoff too, meow
Currently only a compress_history() failure bumps compress_backup_fail_attempts. If merge_backup_memo() or the subsequent _clear_compress_backup_failure() keeps raising, the task exits without recording a failure; the next main-path compression failure then starts a background compression for the same snapshot again, which effectively bypasses the Gate6 backoff, meow. A persistent disk/merge-side fault would therefore still burn summary calls over and over, meow.
😼 Suggested fix, meow
-        async with _get_settle_lock(lanlan_name):
-            status = await recent_history_manager.merge_backup_memo(lanlan_name, snapshot, result[0])
-        # Both 'merged' and 'moot' mean this backlog is handled or stale, so clear the backoff counter.
-        await _clear_compress_backup_failure(lanlan_name)
-        logger.info(f"[CompressBackup] {lanlan_name} background compression done: {status}")
+        try:
+            async with _get_settle_lock(lanlan_name):
+                status = await recent_history_manager.merge_backup_memo(
+                    lanlan_name, snapshot, result[0]
+                )
+            # Both 'merged' and 'moot' mean this backlog is handled or stale, so clear the backoff counter.
+            await _clear_compress_backup_failure(lanlan_name)
+            logger.info(f"[CompressBackup] {lanlan_name} background compression done: {status}")
+        except Exception as e:
+            attempts = await _record_compress_backup_failure(lanlan_name, snapshot)
+            logger.warning(
+                f"[CompressBackup] {lanlan_name} background compression post-processing failed, "
+                f"backoff count → {attempts}: {e}"
+            )
+            return
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/memory_server.py` around lines 3399 - 3409, compress/merge-stage
exceptions from recent_history_manager.merge_backup_memo and the subsequent
_clear_compress_backup_failure are currently not counted toward
compress_backup_fail_attempts; wrap the merge+clear steps in their own
try/except that, on any Exception (but not asyncio.CancelledError), calls the
same failure-counter helper used when compress_history() fails to bump
compress_backup_fail_attempts for lanlan_name (reuse the existing helper that
increments compress_backup_fail_attempts), then re-raise or log consistently;
keep the existing asyncio.CancelledError handling separate and ensure you
reference recent_history_manager.merge_backup_memo,
_clear_compress_backup_failure, compress_history(), and
compress_backup_fail_attempts when making the change.
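For readers unfamiliar with the backoff being discussed, the gate that decides whether to start another background compression can be sketched as a small pure function. This is an illustration only: `should_spawn`, the `fail_attempts` / `fail_fp` keys, and the limit of 3 are assumptions for the sketch and stand in for the PR's `compress_backup_fail_attempts`, `compress_backup_fail_fp`, and `MEMORY_LIVENESS_MAX_ATTEMPTS`.

```python
from typing import Any, Dict

MAX_ATTEMPTS = 3  # assumed value; stands in for MEMORY_LIVENESS_MAX_ATTEMPTS


def should_spawn(state: Dict[str, Any], in_flight: bool, snapshot_fp: str) -> bool:
    """Decide whether a new background compression may start.

    state holds 'fail_attempts' and 'fail_fp' for one character (hypothetical keys).
    """
    if in_flight:
        return False  # a background task is already running for this character
    attempts = int(state.get("fail_attempts", 0) or 0)
    if attempts >= MAX_ATTEMPTS:
        if state.get("fail_fp") == snapshot_fp:
            return False  # dead-letter: same input keeps failing, stop retrying
        # input changed, so the old count is stale: reset and allow a new attempt
        state["fail_attempts"] = 0
        state["fail_fp"] = None
    return True
```

The point of the review comment is that every failure mode, including merge and write-back errors, needs to feed `fail_attempts`; otherwise this gate never closes for those faults.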
    async def _on_compress_done(lanlan_name: str, snapshot: list, ok: bool, detailed: bool):
        """Callback invoked when update_history finishes compressing (injected by recent.py).
        ok=True (main path compressed successfully) → cancel the running background fallback + clear the backoff;
        ok=False (main path failed) → start a guarded background fallback compression (if none is running and the backoff allows it).

        This callback only spawns / cancels tasks and never awaits the background LLM: it may be called
        inside _get_settle_lock (/renew, /settle) and must never block."""
        if ok:
            task = compress_backup_tasks.get(lanlan_name)
            if task is not None and not task.done():
                task.cancel()
            await _clear_compress_backup_failure(lanlan_name)
            return
        # ok=False: main-path compression failed → start the background fallback
        if not snapshot:
            return
        existing = compress_backup_tasks.get(lanlan_name)
        if existing is not None and not existing.done():
            return  # in-flight: a background compression is already running for this character, do not start another
        # Failure backoff (Gate 6 pattern): ≥ N consecutive failures with unchanged input → dead-letter, stop spawning,
        # so a persistently failing summary model does not get a doomed background task on every turn.
        from config import MEMORY_LIVENESS_MAX_ATTEMPTS
        from memory.recent import build_review_fingerprint
        state = _maint_state.setdefault(lanlan_name, {})
        fail_attempts = state.get('compress_backup_fail_attempts', 0) or 0
        if fail_attempts >= MEMORY_LIVENESS_MAX_ATTEMPTS:
            cur_fp = build_review_fingerprint(snapshot)
            if state.get('compress_backup_fail_fp') == cur_fp:
                logger.debug(
                    f"[CompressBackup] {lanlan_name} failure backoff dead-letter "
                    f"({fail_attempts} consecutive failures, input unchanged), skipping"
                )
                return
            # input changed → the old count is stale, reset and let it through
            state['compress_backup_fail_attempts'] = 0
            state['compress_backup_fail_fp'] = None
            await _asave_maint_state()
        task = _spawn_background_task(_run_backup_compress(lanlan_name, list(snapshot), detailed))
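This callback still writes to disk inside the critical section, meow
The comment says it "only spawns / cancels tasks", but the ok=True branch awaits _clear_compress_backup_failure(), and the backoff-reset branch awaits _asave_maint_state(). Both /renew and /settle call update_history() inside _get_settle_lock(), so these awaits pull the write of idle_maintenance_state.json into the user request's serialized window; on a slow disk this directly lengthens the request and additionally blocks /new_dialog for the same character, meow. It would be better to change only the in-memory state here and persist fire-and-forget, meow.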
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/memory_server.py` around lines 3416 - 3453, _on_compress_done currently
performs blocking awaits (_clear_compress_backup_failure and _asave_maint_state)
while running inside the settle/renew critical section; change it to only mutate
in-memory _maint_state and fire-and-forget the persistence: in the ok=True
branch update/reset the in-memory compress-backup failure counters in
_maint_state (do not await _clear_compress_backup_failure) and spawn a
background task to call _clear_compress_backup_failure (or a small wrapper that
does the save) via _spawn_background_task; likewise, when resetting fail
counters after input change set state['compress_backup_fail_attempts']=0 and
state['compress_backup_fail_fp']=None and call _spawn_background_task to run
_asave_maint_state (do not await); keep spawn/cancel logic for
_run_backup_compress as-is so no awaits happen inside the critical path.
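…disk failure reporting

- compress_history: restore the behaviour of retrying stage-1 when stage-2 (further_compress) fails (the refactor had degraded it to an immediate return None), reproduced with a bounded counting loop to avoid the infinite loop of the original continue, which did not count attempts.
- _enforce_hard_cap: count real content tokens (no longer via the compression rendering _render_messages_to_text, which truncates every message to ≤500 tokens, so long raw messages were underestimated and the hard cap did not apply to them); drop the "return early if message count ≤ N+1" assumption (a few very long raw messages are enough to exceed the token cap).
- _segmented_compress: when the reduce step cannot shrink further or the depth is exhausted, hard-truncate merged to the budget as a fallback instead of passing over-budget text on to the final summary.
- merge_backup_memo: return 'failed' when the disk write fails (instead of falsely reporting 'merged'); _run_backup_compress uses this to bump the backoff rather than clear the counter.
- Tests: add merge disk-write failure → 'failed'; fix the test import style (consistently use monkeypatch string targets) and the except scope in _cleanup_task (BaseException → CancelledError+Exception).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>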
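Thanks to @chatgpt-codex-connector and @coderabbitai for the reviews. Handled item by item (pushed …)
Codex
CodeRabbit
The overly broad except and the test import style flagged by @github-code-quality are fixed as well.
To use Codex here, create an environment for this repo.
Tip: For best results, initiate chat on the files or code changes.
Confirming item by item, meow: the four CodeRabbit items
The overall logic is more robust now, meow; the three-layer defence of background fallback + hard cap is solid, meow~ I... I-I suppose I'll reluctantly praise you, meow! (´•ω•`)ノ🐇✨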
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7e23916532
    await self._enforce_hard_cap(lanlan_name)
    # best-effort: notify the upper layer to start a guarded background compression task (main path failed).
    await self._notify_compress_done(on_compress_done, lanlan_name, snapshot, False, detailed)
Run backup compression before pruning the snapshot
When a single main compression attempt fails after recent history is already above RECENT_HARD_CAP_TOKENS, this prunes the oldest messages before scheduling the background fallback. The fallback then compresses the pre-prune snapshot, but merge_backup_memo() requires that whole snapshot to still be present at the head of current history, so a later successful backup becomes moot and the pruned conversation is permanently lost instead of being summarized. This turns the intended final fallback for sustained failures into data loss on any transient failure once the cap is exceeded.
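The data-loss path described here follows from how a snapshot-aligned merge behaves: the memo can only replace the snapshot if the snapshot is still the head of the current history. The sketch below illustrates that rule under assumptions; `fingerprint` is a plain SHA-256 stand-in for the PR's review fingerprint, and `merge_backup` is a hypothetical simplification of `merge_backup_memo`.

```python
import hashlib
from typing import List, Tuple


def fingerprint(messages: List[str]) -> str:
    """Stable digest of a message list (stand-in for the PR's fingerprint)."""
    h = hashlib.sha256()
    for msg in messages:
        h.update(msg.encode("utf-8"))
        h.update(b"\x00")  # separator so ["ab", "c"] differs from ["a", "bc"]
    return h.hexdigest()


def merge_backup(
    current: List[str], snapshot: List[str], memo: str
) -> Tuple[str, List[str]]:
    """Replace snapshot with memo if it is still the head of current, else discard."""
    head = current[: len(snapshot)]
    if (
        not snapshot
        or len(head) < len(snapshot)
        or fingerprint(head) != fingerprint(snapshot)
    ):
        return "moot", current  # backlog changed, compressed, or cleared
    # keep every turn that was appended while the compression was running
    return "merged", [memo] + current[len(snapshot):]
```

If the oldest messages are pruned before the background task finishes, the head no longer matches the snapshot, the merge returns `"moot"`, and the pruned turns are never summarized, which is exactly the ordering problem raised above.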
    if kept and kept_tok + mtok > RECENT_HARD_CAP_TOKENS and len(kept) >= self.max_history_length:
        break
    kept.append(msg)
    kept_tok += mtok
Allow the hard cap to trim oversized recent turns
When the newest max_history_length messages alone exceed RECENT_HARD_CAP_TOKENS (for example a few large pasted turns while compression is failing), this loop still appends them all because it refuses to stop until len(kept) >= self.max_history_length. The resulting new_history can remain above the hard cap, and /new_dialog later injects those full message contents, so the fallback still does not bound prompt size in exactly the long-message failure case it is meant to protect.
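A trim that bounds the token total regardless of message count, which is what this comment asks for, might look like the sketch below. It is not the PR's code: `trim_to_cap` is a hypothetical name, `count_tokens` is a whitespace stand-in for the real tokenizer, and always keeping the newest message is an assumption made here so the history never becomes empty.

```python
from typing import List


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def trim_to_cap(messages: List[str], cap: int) -> List[str]:
    """Keep the newest messages whose full content fits within cap tokens.

    The newest message is always kept, even if it alone exceeds the cap.
    """
    kept: List[str] = []
    used = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(msg)  # full content, not a truncated rendering
        if kept and used + cost > cap:
            break  # stop on the token budget alone, with no minimum message count
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Compared with the quoted loop, the `len(kept) >= self.max_history_length` condition is dropped, so a handful of very long turns can no longer push the result above the cap (apart from the single newest message).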
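Codex pointed out that trimming on **every** main-path compression failure means that, once the history is over the cap, any single transient failure immediately drops the oldest raw messages, while the background compression works on the pre-trim snapshot → the merge no longer aligns (moot) → that batch of conversation is lost permanently without ever being summarized. This turns "the last resort for persistent failure" into "data loss on a transient failure".

- recent.py: the failure branch of update_history no longer trims; _enforce_hard_cap is renamed to enforce_hard_cap (so memory_server can call it) and persists on its own (it no longer relies on update_history's subsequent write).
- memory_server: trimming is now triggered by _run_backup_compress after the best-effort background compression also fails (serialized write inside the lock), and by the dead-letter branch (without taking the lock, to avoid a re-entrant deadlock on the settle lock). On a transient failure the background task succeeds and trimming is never reached, so that batch of conversation is kept as a summary.
- Tests: add an enforce_hard_cap AsyncMock and assert that trimming is triggered on background failure / dead-letter.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>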
The two items from @chatgpt-codex-connector's re-review (pushed …)
Codex Review: Didn't find any major issues. More of your lovely PRs please.
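Background
#1629 fixed the data-loss bug where a failed compression overwrote the history with an empty memo. It now skips the round on failure, keeps the full history, and retries on the next turn, which covers transient failures (for example an RPM rate-limit blip that recovers on the next turn).
Persistent failures are not covered: if compression never succeeds, the history is never compressed and keeps accumulating, so the prompt injected into the main model grows without bound (eventually blowing the context window or driving up cost). Several causes are independent of each other and all valid:
- `/process` calls `update_history` without holding `_get_settle_lock` (`/renew`, `/settle`, and `/cache` all hold it), so during the tens of seconds that compression awaits the LLM, later turns reload from disk and overwrite the result, and the compression is wasted.
Approach (best effort → drop only as a last resort)
- …`review_history`'s `_compute_review_capacity`): if the backlog is still in place, it is replaced with the memo and the turns added in the meantime are kept; if it was already compressed by the main path or cleared, the result is discarded (moot).
- `compress` runs outside `_get_settle_lock` (the LLM latency does not block other endpoints), and `merge` runs inside the lock (fast, serialized writes).
- …`RECENT_COMPRESS_INPUT_BUDGET_TOKENS`, segmented map-reduce is used to shrink each LLM input.
- …`RECENT_HARD_CAP_TOKENS` (set very high so it does not trigger normally; it only covers cases such as persistent 429s that even best effort cannot rescue) → drop the oldest uncompressed raw messages, keeping the most recent few plus the memo, so the history stays bounded.
Changes
- `memory/recent.py`: `compress_history` is refactored into reusable helpers (`_render_messages_to_text` / `_build_summary_prompt` / `_invoke_summary_llm`, single-call path unchanged) plus segmented compression for oversized input; `update_history` gains an `on_compress_done` hook; new `merge_backup_memo` (snapshot-aligned merge) and `_enforce_hard_cap` (last-resort trimming).
- `app/memory_server.py`: `_on_compress_done` callback (start the background task on failure / cancel it and clear the backoff on success) + `_run_backup_compress` orchestration + `compress_backup_tasks` in-flight de-duplication + Gate6 failure backoff; wired into the 4 compression call sites.
- `config/__init__.py`: `RECENT_COMPRESS_INPUT_BUDGET_TOKENS` (8000), `RECENT_HARD_CAP_TOKENS` (60000).
Tests
`tests/unit/test_recent_compression_failure.py` (extended) + `tests/unit/test_recent_compress_backup.py` (new): segment splitting/paths, hard-cap trimming, snapshot merge (merged/moot), callback with ok true/false, background backoff/in-flight/dead-letter/reset. `uv run pytest tests/unit/ -k "compress or recent or backoff or review or stale or temporal or memo or summary"` → 416 passed.
Relates to #1629.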
🤖 Generated with Claude Code
Summary by CodeRabbit
Bug fixes
New features
Tests