feat: Add memory diagnostics tool #1686
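Walkthrough
This PR adds long-running memory diagnostics: configuration and dependencies, optional registration of MemoryDiagnosticsTask in the main process, periodic snapshot collection (process, GC, asyncio, runtime, message binaries, and so on), tracemalloc diffing, JSONL persistence with rotation, estimation utilities, tests, and a Chinese-language operations guide.
Changes
Memory Diagnostics Service Implementation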
Sequence diagram (high-level interaction):
sequenceDiagram
participant Task as MemoryDiagnosticsTask.run()
participant Collector as _collect_snapshot()
participant Heartflow as Heartflow
participant Tracemalloc as tracemalloc
participant Writer as _write_snapshot()
participant JSONL as JSONL file
participant Logger as Logger
Task->>Collector: Start collection
Collector->>Heartflow: Pull session/runtime metrics
Collector->>Tracemalloc: Optional sampling and diff
Collector-->>Task: Return snapshot
Task->>Writer: Write asynchronously
Writer->>JSONL: Append line, then rotate/clean up
Task->>Logger: Emit summary and warnings
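The flow in the diagram can be sketched roughly as follows. This is a minimal illustration and not the PR's implementation: the function names `collect_snapshot`, `write_snapshot`, and `run`, the record fields, and the single-backup rotation scheme are all assumptions made for the example. Only `tracemalloc`, `asyncio`, `json`, and `pathlib` calls from the standard library are used.

```python
import asyncio
import json
import time
import tracemalloc
from pathlib import Path
from typing import Optional


def collect_snapshot(previous: Optional[tracemalloc.Snapshot]) -> tuple:
    """Build one record; include the top tracemalloc diffs when a previous snapshot exists."""
    current = tracemalloc.take_snapshot()
    top_diffs = []
    if previous is not None:
        for stat in current.compare_to(previous, "lineno")[:5]:
            frame = stat.traceback[0]
            top_diffs.append({"where": f"{frame.filename}:{frame.lineno}", "size_diff": stat.size_diff})
    traced_current, traced_peak = tracemalloc.get_traced_memory()
    record = {
        "ts": time.time(),
        "traced_bytes": traced_current,
        "traced_peak_bytes": traced_peak,
        "asyncio_tasks": len(asyncio.all_tasks()),  # requires a running event loop
        "tracemalloc_top_diffs": top_diffs,
    }
    return record, current


def write_snapshot(path: Path, record: dict, max_bytes: int) -> None:
    """Append one JSON line; move the file to '<name>.1' once it reaches max_bytes."""
    if path.exists() and path.stat().st_size >= max_bytes:
        path.replace(path.with_name(path.name + ".1"))
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")


async def run(path: Path, interval: float, rounds: int, max_bytes: int = 1_000_000) -> None:
    """Collect `rounds` snapshots, sleeping `interval` seconds after each one."""
    tracemalloc.start()
    previous = None
    try:
        for _ in range(rounds):
            record, previous = collect_snapshot(previous)
            # File I/O runs in a worker thread so the event loop is not blocked.
            await asyncio.to_thread(write_snapshot, path, record, max_bytes)
            await asyncio.sleep(interval)
    finally:
        tracemalloc.stop()
```

A call such as `asyncio.run(run(Path("memory.jsonl"), interval=60.0, rounds=10))` would write ten lines. A real long-running task would loop until cancelled and would likely keep more than one rotated file.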
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/memory_diagnostics_guide.md`:
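- Line 167: Reword the sentence describing `top_sessions` so it reads as more natural Chinese: change “`top_sessions`
可以看到占用最高的会话。” to “`top_sessions` 可以看到占用最高的会话列表。” or to “`top_sessions`
可查看占用最高的会话”, so the phrasing is less stilted and stays consistent with the surrounding text; locate and update the sentence that uses `top_sessions`.
- Line 614: Change the description of the field chat_manager.last_message_binary_mb
from “最后一条图片、表情、语音被保留” to a more explicit wording such as “最后一条包含图片/表情/语音的消息被保留”; modify only the natural-language description of this field, make sure it means "the last message containing binary content is retained", and leave the formatting of the other table columns unchanged.
- Line 238: One-line summary: replace “最可疑的会话” with smoother wording to fix the modifier relationship. Change the sentence “`top_binary_sessions` 会列出二进制占用最可疑的会话。”
to a more natural expression, for example “`top_binary_sessions`
会列出可疑程度最高的会话” or “`top_binary_sessions`:最可疑会话列表”, keeping the meaning unchanged and the sentence fluent.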
In `@src/services/memory_diagnostics_service.py`:
- Around line 1009-1018: The diagnostic snapshot currently writes raw child
process command lines via _safe_process_cmdline into the JSONL (seen where the
dict includes "cmdline": _safe_process_cmdline(child)), which can leak secrets;
modify the data collection to sanitize/redact sensitive tokens before persisting
by implementing or reusing a sanitizer (e.g., sanitize_cmdline or
redact_sensitive_parts) and call it either inside _safe_process_cmdline or
immediately before assigning "cmdline"; ensure the sanitizer strips or replaces
common secret patterns (token, password, api_key, --secret=, bearer tokens, long
hex/base64 strings) and apply the same change to the other occurrence mentioned
(around lines with similar dict at 1053-1056) so no raw cmdline values are
written to disk.
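One possible shape for such a sanitizer is sketched below. It is an illustration only: the name `sanitize_cmdline` comes from the suggestion above, while the key list, the `<redacted>` placeholder, and the 32-character threshold for opaque strings are assumptions, not values from the PR. The sketch errs toward over-redaction, so a long extension-less path or a flag that directly follows `--token` would also be masked.

```python
import re

# Hypothetical key list and placeholder; adjust to what the project treats as sensitive.
SENSITIVE_KEYS = ("token", "password", "passwd", "secret", "api_key", "api-key", "apikey")
REDACTED = "<redacted>"

_KEY_VALUE = re.compile(r"^(-{0,2}[\w.-]+)=(.*)$")
# Long runs of hex/base64-like characters; this can also match long plain paths.
_LONG_OPAQUE = re.compile(r"^[A-Za-z0-9+/_=-]{32,}$")


def _is_sensitive_key(name: str) -> bool:
    lowered = name.lower()
    return any(key in lowered for key in SENSITIVE_KEYS)


def sanitize_cmdline(args: list) -> list:
    """Return a copy of args with likely secrets replaced by a placeholder."""
    result = []
    redact_next = False
    for arg in args:
        if redact_next:
            # Value belonging to a preceding "--token value" or "Bearer value" pair.
            result.append(REDACTED)
            redact_next = False
            continue
        match = _KEY_VALUE.match(arg)
        if match and _is_sensitive_key(match.group(1)):
            result.append(f"{match.group(1)}={REDACTED}")
        elif arg.startswith("-") and _is_sensitive_key(arg):
            result.append(arg)
            redact_next = True
        elif arg.lower() == "bearer":
            result.append(arg)
            redact_next = True
        elif _LONG_OPAQUE.match(arg):
            result.append(REDACTED)
        else:
            result.append(arg)
    return result
```

Calling this inside `_safe_process_cmdline`, rather than at each call site, would cover both dict constructions mentioned in the comment with a single change.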
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 32c0ac01-a735-4587-ab67-7440aa4ef749
⛔ Files ignored due to path filters (1)
- uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
- docs/memory_diagnostics_guide.md
- pyproject.toml
- requirements.txt
- src/config/config.py
- src/config/official_configs.py
- src/main.py
- src/services/memory_diagnostics_service.py
- tests/test_memory_diagnostics_service.py
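- `runtime_count` 持续增长,而且不回落:可能是会话 runtime 没有淘汰。 (`runtime_count` keeps growing and never drops back: session runtimes may not be getting evicted.)
- `lock_count` 跟着 session 数增长:说明会话创建锁也在累积。 (`lock_count` grows along with the session count: session-creation locks are accumulating too.)
- `top_sessions` 可以看到占用最高的会话。 (`top_sessions` shows the sessions with the highest usage.)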
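Improve the word order so the Chinese reads more naturally.
Consider changing “占用最高的会话” ("the sessions with the highest usage") to “占用最高的会话列表” ("the list of sessions with the highest usage") so the sentence reads less stiffly.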
🧰 Tools
🪛 LanguageTool
[uncategorized] ~167-~167: A verb is normally modified as 'adjective (adverb) + 地 + verb'. Did you mean: 最高"地"会话
Context: ...长:说明会话创建锁也在累积。 - top_sessions 可以看到占用最高的会话。 常见原因: - 很多群聊或私聊陆续触发 MaiSaka。 - 会话长...
(wb4)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
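In `@docs/memory_diagnostics_guide.md` at line 167, reword the sentence describing `top_sessions`
so it reads as more natural Chinese: change “`top_sessions` 可以看到占用最高的会话。” to “`top_sessions`
可以看到占用最高的会话列表。” or to “`top_sessions` 可查看占用最高的会话”, so the phrasing is less stilted and stays consistent with the surrounding text; locate and update the sentence that uses
`top_sessions`.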
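| `chat_manager.loaded` | ChatManager 模块是否已加载 (whether the ChatManager module is loaded) | 未进入消息接收运行态或模块未初始化 (message-receiving runtime not entered, or module not initialized) |
| `chat_manager.sessions` | ChatManager 当前 session 数 (current number of ChatManager sessions) | 聊天 session 对象滞留 (chat session objects lingering) |
| `chat_manager.last_messages` | ChatManager 记录的最后消息数量 (number of last messages recorded by ChatManager) | 最后一条消息缓存增长 (last-message cache growing) |
| `chat_manager.last_message_binary_mb` | 最后消息缓存中的二进制估算 (estimated binary size in the last-message cache) | 最后一条图片、表情、语音被保留 (the last image, emoji, voice clip is retained) |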
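This field description is slightly ambiguous; consider rewording it.
“最后一条图片、表情、语音被保留” ("the last image, emoji, voice clip is retained") could be changed to “最后一条包含图片/表情/语音的消息被保留” ("the last message containing an image/emoji/voice clip is retained"), so that the scope of “最后一条” ("the last one") is clear.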
🧰 Tools
🪛 LanguageTool
[uncategorized] ~614-~614: A quantifier modifying a coordinated phrase may be ambiguous
Context: ...message_binary_mb| 最后消息缓存中的二进制估算 | 最后一条图片、表情、语音被保留 | |chat_manager.last_message...
(s5)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
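In `@docs/memory_diagnostics_guide.md` at line 614, change the description of the field
chat_manager.last_message_binary_mb
from “最后一条图片、表情、语音被保留” to a more explicit wording such as “最后一条包含图片/表情/语音的消息被保留”; modify only the natural-language description of this field, make sure it means "the last message containing binary content is retained", and leave the formatting of the other table columns unchanged.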
♻️ Duplicate comments (3)
docs/memory_diagnostics_guide.md (3)
238-238: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Adjust the modifier relationship so the sentence reads more smoothly.
The modifier relationship in "二进制占用最可疑的会话" ("the sessions whose binary usage is most suspicious") is slightly awkward. Consider "二进制占用可疑程度最高的会话" or "二进制占用最高的可疑会话".
📝 Suggested wording
-- `top_binary_sessions` 会列出二进制占用最可疑的会话。
+- `top_binary_sessions` 会列出二进制占用可疑程度最高的会话。
or
-- `top_binary_sessions` 会列出二进制占用最可疑的会话。
+- `top_binary_sessions` 会列出二进制占用最高的可疑会话。
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/memory_diagnostics_guide.md` at line 238, change the description of top_binary_sessions from “二进制占用最可疑的会话” to smoother wording such as “二进制占用可疑程度最高的会话” (or “二进制占用最高的可疑会话”); replace the phrase in the sentence containing top_binary_sessions to fix the modifier relationship while keeping the meaning unchanged.
614-614: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Remove the ambiguity of a quantifier modifying a coordinated phrase.
In "最后一条图片、表情、语音被保留", the scope of "最后一条" ("the last one") is not clear enough and may be misread. Consider the explicit wording "最后一条包含图片/表情/语音的消息被保留".
📝 Suggested wording
-| `chat_manager.last_message_binary_mb` | 最后消息缓存中的二进制估算 | 最后一条图片、表情、语音被保留 |
+| `chat_manager.last_message_binary_mb` | 最后消息缓存中的二进制估算 | 最后一条包含图片/表情/语音的消息被保留 |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/memory_diagnostics_guide.md` at line 614, The phrase for chat_manager.last_message_binary_mb is ambiguous; update the table cell text "最后一条图片、表情、语音被保留" to an explicit wording such as "最后一条包含图片/表情/语音的消息被保留" so "最后一条" clearly modifies "消息"; locate the row with the symbol `chat_manager.last_message_binary_mb` and replace the existing Chinese fragment accordingly.
167-167: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Improve the Chinese wording for clarity.
"可以看到占用最高的会话" ("you can see the sessions with the highest usage") reads slightly stiffly in Chinese. Consider "可以看到占用最高的会话列表" or "可查看占用最高的会话".
📝 Suggested wording
-- `top_sessions` 可以看到占用最高的会话。
+- `top_sessions` 可以看到占用最高的会话列表。
or
-- `top_sessions` 可以看到占用最高的会话。
+- `top_sessions` 可查看占用最高的会话。
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/memory_diagnostics_guide.md` at line 167, the sentence describing `top_sessions` is not fluent; replace the current “`top_sessions` 可以看到占用最高的会话。” with a more natural wording such as “`top_sessions` 可查看占用最高的会话” or “`top_sessions` 可以看到占用最高的会话列表” to improve readability while keeping the reference to the identifier `top_sessions` consistent; only this sentence in docs/memory_diagnostics_guide.md needs updating.
🧹 Nitpick comments (2)
src/services/memory_diagnostics_service.py (2)
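1160-1173: Operational note: the "interesting task" keyword list needs to be updated as the system evolves.
The current hard-coded keyword list (learn, description, heartflow, maisaka, memory, websocket, reply_effect, embedding) is used to filter "tasks worth watching". If important subsystems are added later (such as new automation services or plugins), this list may need updating so that diagnostics keep covering them.
When adding a core async task, consider whether its keywords should be added to this list.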
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/memory_diagnostics_service.py` around lines 1160 - 1173, The hardcoded keyword list in _is_interesting_task should be made configurable so operators can update monitored task keywords without changing code; extract the tuple currently in _is_interesting_task into a centralized, editable source (e.g., a settings constant or config loader) and have _is_interesting_task read that list at runtime (fallback to the current defaults if config absent); reference the function name _is_interesting_task and the current keywords (learn, description, heartflow, maisaka, memory, websocket, reply_effect, embedding) so you replace the inline tuple with a call or import (e.g., from MEMORY_DIAGNOSTICS_KEYWORDS or load_interesting_task_keywords()) that reads from application config / environment / feature flags and document the config key for operators to update.
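A configurable keyword list along the lines of this suggestion might look like the sketch below. The default tuple copies the keywords quoted in the comment. `load_interesting_task_keywords` is the hypothetical name floated in the prompt above, and substring matching on a lowercased task name is an assumption about how `_is_interesting_task` works, not something the review confirms.

```python
from collections.abc import Iterable
from typing import Optional

# Defaults copied from the keyword list quoted in the review comment.
DEFAULT_INTERESTING_TASK_KEYWORDS = (
    "learn", "description", "heartflow", "maisaka",
    "memory", "websocket", "reply_effect", "embedding",
)


def load_interesting_task_keywords(configured: Optional[Iterable]) -> tuple:
    """Normalize operator-supplied keywords; fall back to the defaults when none are usable."""
    cleaned = tuple(k.strip().lower() for k in (configured or ()) if k and k.strip())
    return cleaned or DEFAULT_INTERESTING_TASK_KEYWORDS


def is_interesting_task(task_name: str, keywords: tuple) -> bool:
    """Assumed rule: a task is interesting if any keyword occurs in its lowercased name."""
    lowered = task_name.lower()
    return any(keyword in lowered for keyword in keywords)
```

In this sketch an empty or missing config value keeps the current behavior, so operators only need to set the key when they want to change the list.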
341-352: 💤 Low value
Optional: remove the unused return value.
The method is declared to return int and returns scan_count at Line 352, but the call site (Line 245) does not use the return value. Consider returning None so the type signature matches actual usage.
♻️ Optional type signature adjustment
-    def _fill_runtime_binary_metrics(self, item: dict[str, Any], runtime: Any, scan_budget: int) -> int:
+    def _fill_runtime_binary_metrics(self, item: dict[str, Any], runtime: Any, scan_budget: int) -> None:
         message_cache = getattr(runtime, "message_cache", []) or []
         scan_count = min(len(message_cache), max(0, scan_budget))
         binary_summary = _estimate_messages_binary(_iter_spread(message_cache, scan_count))
@@ -349,7 +349,6 @@
         item["binary_scan_strategy"] = "spread"
         item["binary_scan_skipped_messages"] = max(0, len(message_cache) - scan_count)
         item["binary_lower_bound"] = bool(item["binary_scan_truncated"])
-        return scan_count
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/memory_diagnostics_service.py` around lines 341 - 352, change the signature and implementation of _fill_runtime_binary_metrics to return None instead of int: change the return annotation from int to None (or drop the return annotation) and delete the final `return scan_count`, since callers do not use the return value; keep the internal logic (computing scan_count, updating item, and so on) unchanged so behavior stays the same. Referenced symbols: function `_fill_runtime_binary_metrics`, local variable `scan_count`, dict `item`.
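For context, the "spread" scan strategy visible in the diff samples a bounded number of messages from across the cache instead of scanning all of them. The sketch below shows one way such sampling could work. `iter_spread` and `scan_summary` are hypothetical stand-ins, and both the index formula and the way `binary_scan_truncated` is derived are assumptions; the PR's `_iter_spread` may behave differently.

```python
from collections.abc import Iterator, Sequence
from typing import Any


def iter_spread(items: Sequence, count: int) -> Iterator:
    """Yield up to `count` items taken at evenly spaced indices across `items`."""
    total = len(items)
    count = min(total, max(0, count))
    for i in range(count):
        # Integer arithmetic maps sample i to a distinct index in [0, total).
        yield items[i * total // count]


def scan_summary(message_cache: Sequence, scan_budget: int) -> dict:
    """Report how much of the cache a budgeted scan would skip."""
    scan_count = min(len(message_cache), max(0, scan_budget))
    skipped = max(0, len(message_cache) - scan_count)
    return {
        "binary_scan_strategy": "spread",
        "binary_scan_skipped_messages": skipped,
        # When messages are skipped, any measured total is only a lower bound.
        "binary_scan_truncated": skipped > 0,
    }
```

Sampling across the whole cache rather than only its head or tail makes it less likely that a cluster of large binary messages in one region is missed entirely.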
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@docs/memory_diagnostics_guide.md`:
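- Line 238: Change the description of top_binary_sessions
from “二进制占用最可疑的会话” to smoother wording such as “二进制占用可疑程度最高的会话” (or “二进制占用最高的可疑会话”); replace the phrase in the sentence containing
top_binary_sessions to fix the modifier relationship while keeping the meaning unchanged.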
- Line 614: The phrase for chat_manager.last_message_binary_mb is ambiguous;
update the table cell text "最后一条图片、表情、语音被保留" to an explicit wording such as
"最后一条包含图片/表情/语音的消息被保留" so "最后一条" clearly modifies "消息"; locate the row with the
symbol `chat_manager.last_message_binary_mb` and replace the existing Chinese
fragment accordingly.
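- Line 167: The sentence describing `top_sessions` is not fluent; replace the current “`top_sessions`
可以看到占用最高的会话。” with a more natural wording such as “`top_sessions` 可查看占用最高的会话” or “`top_sessions`
可以看到占用最高的会话列表” to improve readability while keeping the reference to the identifier `top_sessions` consistent; only this sentence in
docs/memory_diagnostics_guide.md needs updating.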
---
Nitpick comments:
In `@src/services/memory_diagnostics_service.py`:
- Around line 1160-1173: The hardcoded keyword list in _is_interesting_task
should be made configurable so operators can update monitored task keywords
without changing code; extract the tuple currently in _is_interesting_task into
a centralized, editable source (e.g., a settings constant or config loader) and
have _is_interesting_task read that list at runtime (fallback to the current
defaults if config absent); reference the function name _is_interesting_task and
the current keywords (learn, description, heartflow, maisaka, memory, websocket,
reply_effect, embedding) so you replace the inline tuple with a call or import
(e.g., from MEMORY_DIAGNOSTICS_KEYWORDS or load_interesting_task_keywords())
that reads from application config / environment / feature flags and document
the config key for operators to update.
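- Around line 341-352: Change the signature and implementation of _fill_runtime_binary_metrics to return
None instead of int: change the return annotation from int to None (or drop the return annotation) and delete the final `return
scan_count`, since callers do not use the return value; keep the internal logic (computing scan_count, updating item, and so on) unchanged so behavior stays the same. Referenced symbols: function
`_fill_runtime_binary_metrics`, local variable `scan_count`, dict `item`.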
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f63a2a59-9293-4cfc-815d-60283d01844e
⛔ Files ignored due to path filters (1)
- uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
- docs/memory_diagnostics_guide.md
- pyproject.toml
- requirements.txt
- src/config/config.py
- src/config/official_configs.py
- src/main.py
- src/services/memory_diagnostics_service.py
- tests/test_memory_diagnostics_service.py
🚧 Files skipped from review as they are similar to previous changes (5)
- pyproject.toml
- src/main.py
- requirements.txt
- src/config/official_configs.py
- tests/test_memory_diagnostics_service.py
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/test_memory_diagnostics_service.py (1)
293-299: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Do not lock plaintext retention of sensitive arguments into the test expectations.
This test treats
--token=secret-value, kept verbatim, as the correct result, which cements a potential credential leak (the value can escape through the JSONL written to disk and through the issue-reporting workflow). Change the expectation to a redacted argument value and update the implementation to match.
🔧 Suggested change (example)
 def test_safe_process_cmdline_keeps_raw_arguments_with_limit() -> None:
     process = SimpleNamespace(cmdline=lambda: ["python.exe", "--token=secret-value", "D:\\private\\script.py"])
     cmdline = diagnostics._safe_process_cmdline(process)
-    assert cmdline == ["python.exe", "--token=secret-value", "D:\\private\\script.py"]
+    assert cmdline == ["python.exe", "--token=***", "D:\\private\\script.py"]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_memory_diagnostics_service.py` around lines 293 - 299, The test test_safe_process_cmdline_keeps_raw_arguments_with_limit currently asserts the raw sensitive flag "--token=secret-value"; update the expectation to a sanitized token value (e.g. "--token=<redacted>" or masked form) and adjust the implementation in diagnostics._safe_process_cmdline to produce that sanitized form for sensitive flags/keys (tokens, passwords, keys) instead of returning secrets verbatim; locate and update the test assertion and ensure diagnostics._safe_process_cmdline's masking rule covers both "--key=value" and separate "--key", "value" patterns so the test and function remain consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@tests/test_memory_diagnostics_service.py`:
- Around line 293-299: The test
test_safe_process_cmdline_keeps_raw_arguments_with_limit currently asserts the
raw sensitive flag "--token=secret-value"; update the expectation to a sanitized
token value (e.g. "--token=<redacted>" or masked form) and adjust the
implementation in diagnostics._safe_process_cmdline to produce that sanitized
form for sensitive flags/keys (tokens, passwords, keys) instead of returning
secrets verbatim; locate and update the test assertion and ensure
diagnostics._safe_process_cmdline's masking rule covers both "--key=value" and
separate "--key", "value" patterns so the test and function remain consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f57d240d-2b6e-4cae-a1b3-5e1da12f8ff3
⛔ Files ignored due to path filters (1)
- uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
- docs/memory_diagnostics_guide.md
- pyproject.toml
- requirements.txt
- src/config/config.py
- src/config/official_configs.py
- src/main.py
- src/services/memory_diagnostics_service.py
- tests/test_memory_diagnostics_service.py
🚧 Files skipped from review as they are similar to previous changes (6)
- requirements.txt
- pyproject.toml
- src/main.py
- src/config/config.py
- src/config/official_configs.py
- src/services/memory_diagnostics_service.py
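zh-CN target translations as a regular GitHub editing surface; regular translations follow the Crowdin -> l10n_* PR backflow, see docs/i18n.md
Please fill in the following
(Remove the space inside the square brackets and replace it with a lowercase x)
- Modifying the main branch is prohibited; please confirm that the branch for this submission is not main
- src/A_memorix: I confirm that I have read src/A_memorix/MODIFICATION_POLICY.md (no need to check this if not applicable)
Other information
Related issue: Close #
Screenshots/GIF:
Additional information: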
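Adds a long-running memory diagnostics tool for investigating memory growth after extended uptime.
The diagnostics task can be enabled through a switch in the [debug] config section. It periodically collects relevant state and writes JSONL snapshot logs, which makes it easier to compare trends and locate the source of anomalies. A beginner-friendly usage guide was also written with AI.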
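As an illustration of how JSONL snapshots support trend comparison, the sketch below loads a snapshot file and reports how much a numeric field changed between the first and last record. The helper names and the field name `rss_mb` in the usage note are assumptions made for the example; the actual snapshot schema is defined by the PR and described in its guide.

```python
import json
from pathlib import Path
from typing import Optional


def load_snapshots(path: Path) -> list:
    """Parse one JSON object per non-empty line, skipping lines that fail to parse."""
    snapshots = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # for example, a partially written final line
        if isinstance(record, dict):
            snapshots.append(record)
    return snapshots


def growth(snapshots: list, field: str) -> Optional[float]:
    """Return last minus first value of a numeric field, or None with fewer than two values."""
    values = [s[field] for s in snapshots if isinstance(s.get(field), (int, float))]
    if len(values) < 2:
        return None
    return values[-1] - values[0]
```

For example, `growth(load_snapshots(Path("memory.jsonl")), "rss_mb")` would give the net change of that field over the capture window, assuming the snapshots carry a top-level field with that name.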
Summary by CodeRabbit
New features
Documentation
Tests
Chores