Study Companion Batch A: performance optimization and stability hardening #1598

Open
MomiJiSan wants to merge 6 commits into Project-N-E-K-O:main from MomiJiSan:feat/study-companion-perf-optimize

@MomiJiSan MomiJiSan commented Jun 2, 2026

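Main changes

  • Optimize asynchronous dispatch in the study companion event bus: add backlog-drop statistics, worker exception recovery, and a reset of the restart counter.
  • Optimize the study companion status payload, knowledge tracking, and batch write paths to cut repeated deep copies and multiple database round trips.
  • Add WAL/read-connection configuration, cleanup on open failure, and per-step logging for batch writes to StudyStore.
  • Put non-critical candidate-knowledge writes inside a savepoint so that a candidate-knowledge failure no longer rolls back critical data such as user answers, QA, and FSRS.
  • Fix lightweight OCR screenshots failing when imagehash is missing; pHash is now disabled and the snapshot is still generated.
  • Cancel the future when an OCR worker times out or fails, and record diagnostics when backend resolution fails.
  • Fix the KnowledgeTracker batch fallback calling the legacy path while holding the write lock.
  • Add SQLite identifier validation to the memory schema.
  • Split the KnowledgeGraph topic id and name indexes so that a name colliding with an ID no longer produces a false match.
  • Add unit tests covering the regression points above.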

Verification

  • uv run pytest -q on the 6 related study companion test files: 197 passed
  • uv run pytest -q plugin/tests/unit/plugins/test_study_companion_knowledge_quality.py plugin/tests/unit/plugins/test_study_companion_knowledge_contribution.py: 15 passed
  • uv run ruff check ...: All checks passed
  • uv run python -m compileall ...: passed
  • git diff --check: passed
  • GitNexus detect_changes: risk low, no affected processes

Summary by CodeRabbit

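Release notes

  • New features

    • Add batch answer writes and a write-mutex interface to improve write throughput and consistency.
    • Switch the event bus to a background worker model with start/stop support and queue fault tolerance.
  • Performance improvements

    • Introduce read-only database connections to support concurrent reads.
    • Parallelize JPEG encoding and text extraction on the lightweight OCR path for better responsiveness and stability; extend the vision snapshot TTL to 30s.
  • Bug fixes

    • Improve startup/shutdown cleanup and exception tolerance; make alerting and resource closing more robust.
    • Add SQLite identifier validation to reject illegal input.
  • Tests

    • Substantially expand unit test coverage of batch writes, concurrent reads and writes, the event bus, and OCR behavior.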

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 2ac5f355-0fe5-47b2-b148-ca211c6625c7

📥 Commits

Reviewing files that changed from the base of the PR and between 654bbfa and 7c19c41.

📒 Files selected for processing (2)
  • plugin/plugins/study_companion/store.py
  • plugin/tests/unit/plugins/test_study_companion.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • plugin/tests/unit/plugins/test_study_companion.py
  • plugin/plugins/study_companion/store.py

Walkthrough

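This PR switches StudyCompanion's event bus to a bounded queue with a single worker, adds thread-local read-only connections for read/write separation plus batch answer transactions, introduces an in-memory topic index and batch candidate writes in KnowledgeTracker, adds thread-pool parallelism to the OCR pipeline, completes resource cleanup and review_due future management in the plugin lifecycle, and substantially expands unit test coverage, meow.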

Changes

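Event bus queue refactor

Layer / File(s) Summary
Queue and worker configuration initialization
plugin/plugins/study_companion/_event_bus.py
Adds bounded-queue, failure-threshold, and backoff constants and initializes the queue and worker state.
Event enqueueing and scheduling refactor
plugin/plugins/study_companion/_event_bus.py
schedule_emit now enqueues (dropping the oldest event when the queue is full), lazily starts or reuses a single worker, and returns the worker task or None.
Worker consume loop and failure recovery
plugin/plugins/study_companion/_event_bus.py
Adds _consume_queue() to process queued events, call emit, track consecutive failures, and either stop at the threshold or back off exponentially; adds a task_done guard and discards leftover items after shutdown.
Worker lifecycle stop and cleanup
plugin/plugins/study_companion/_event_bus.py
Adds stop_worker() to cancel and await the worker, removes the old per-event completion callback implementation, and updates the tests to match.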

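Database connection layering and batch writes

Layer / File(s) Summary
Read-connection infrastructure
plugin/plugins/study_companion/store.py
Adds _require_read_conn() to provide thread-local read-only connections; open/close manage the set of read connections; journal configuration prefers WAL and falls back to DELETE when WAL is unavailable; exposes the answer_write_lock() write lock.
Batch answer write transaction handling
plugin/plugins/study_companion/store.py
Adds batch_write_answer_data(...), which writes session/qa/mastery/wrong/fsrs/review_log step by step under BEGIN IMMEDIATE; candidate writes are isolated in a savepoint so the main transaction can still commit.
Candidate rescoring and status derivation
plugin/plugins/study_companion/store.py
Adds _batch_recompute_candidate_score and the module-level _next_candidate_status to derive candidate status from evidence/score_parts.
Interaction trimming and FSRS helper
plugin/plugins/study_companion/store.py
append_interaction() now keeps a counter and triggers trimming of interactions at intervals; adds _fsrs_card_from_row() to convert a DB row into an FSRS card dict.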
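
The savepoint isolation described for batch_write_answer_data can be illustrated with a small sqlite3 sketch. The table names (`answers`, `candidates`), the helper names, and the two-table schema are assumptions made for this example and do not reflect the real StudyStore schema.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    # isolation_level=None puts the connection in autocommit mode so the
    # explicit BEGIN/COMMIT statements below control the transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE answers(text TEXT NOT NULL)")
    conn.execute("CREATE TABLE candidates(name TEXT NOT NULL UNIQUE)")
    return conn


def write_answer_with_candidates(conn, answer, candidates) -> bool:
    """Commit the answer row even when the candidate rows fail.

    Returns True when the candidate rows were kept, False when they were
    rolled back to the savepoint.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute("INSERT INTO answers(text) VALUES (?)", (answer,))
        conn.execute("SAVEPOINT candidate_writes")
        try:
            conn.executemany(
                "INSERT INTO candidates(name) VALUES (?)",
                [(name,) for name in candidates],
            )
            kept = True
        except sqlite3.Error:
            # Undo only the candidate rows; the answer row stays pending.
            conn.execute("ROLLBACK TO SAVEPOINT candidate_writes")
            kept = False
        conn.execute("RELEASE SAVEPOINT candidate_writes")
        conn.execute("COMMIT")
        return kept
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

In this sketch a duplicate candidate name violates the UNIQUE constraint, which stands in for any non-critical candidate failure.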

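Read-only query migration

Layer / File(s) Summary
FSRS / QA / Topics read-only migration
plugin/plugins/study_companion/store_fsrs.py, plugin/plugins/study_companion/store_qa.py, plugin/plugins/study_companion/store_topics.py, plugin/plugins/study_companion/store_knowledge.py
Many query methods now run read-only SELECTs through _require_read_conn(), dropping their dependency on the write connection and the global lock while keeping ordering/limit semantics and result parsing.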

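Knowledge tracking batch writes

Layer / File(s) Summary
Knowledge graph index and topic resolution
plugin/plugins/study_companion/knowledge_tracker.py
KnowledgeGraph gains an in-memory topic index, a dirty flag, mark_dirty(), and resolve_known_topic(); discover_candidate prefers the in-memory index and falls back to store-side resolution when the index is truncated.
Batch answer writes and fallback strategy
plugin/plugins/study_companion/knowledge_tracker.py
Adds _BatchAnswerWriteFailed; KnowledgeTracker.on_answer takes the locked batch path when the store supports it and falls back to legacy writes on failure.
Candidate and topic data construction
plugin/plugins/study_companion/knowledge_tracker.py
Adds methods that build the write data for topics, candidates, and evidence (including uuid ids, dedupe_key, evidence, and so on).
Log noise reduction and exception reporting
plugin/plugins/study_companion/knowledge_tracker.py
Adds failure counting and threshold-based escalation for quality alerts; adds _log_exception, which prefers the instance's logger.exception.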

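OCR pipeline thread pool and parallel processing

Layer / File(s) Summary
Thread pool and resource initialization
plugin/plugins/study_companion/study_ocr_pipeline.py
Introduces ThreadPoolExecutor and initializes the executor in init; makes imagehash optional; extends VISION_SNAPSHOT_TTL to 30s; adds close/_retire/_require_executor for executor management.
Parallel JPEG encoding and OCR extraction
plugin/plugins/study_companion/study_ocr_pipeline.py
capture_lightweight submits JPEG encoding and OCR extraction in parallel when conditions are met (with timeouts); a JPEG failure falls back to synchronous encoding; the vision snapshot is remembered only when the OCR status is ok/empty; diagnostics include the OCR status.
Pixel-driven encoding and signature extension
plugin/plugins/study_companion/study_ocr_pipeline.py
_encode_lightweight_jpeg uses target-pixel scaling plus a quality loop; _calculate_thumbnail_phash returns None when imagehash is missing; _extract_image gains a _skip_vision_snapshot parameter.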

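Plugin lifecycle and review_due management

Layer / File(s) Summary
Review-due payload future management
plugin/plugins/study_companion/__init__.py, plugin/plugins/study_companion/entry_communication_review_events.py
Adds the _review_due_payload_future field; _emit_review_due_if_needed uses loop.run_in_executor and caches the future; on cancellation it awaits the future under asyncio.shield and clears it only if it is still the current reference.
Startup-failure and normal-shutdown cleanup
plugin/plugins/study_companion/__init__.py
In _cleanup_after_failed_startup and shutdown, takes references stage by stage and calls event_bus.stop_worker() and ocr_pipeline.close() (when callable), catches exceptions and logs a warning, and clears the corresponding fields.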

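Data safety and payload copying

Layer / File(s) Summary
SQLite identifier validation
plugin/plugins/study_companion/memory_schema.py
Adds the _SQLITE_IDENTIFIER_RE regex check and _validate_sqlite_identifier(); _ensure_column validates the table/column names before building PRAGMA/ALTER statements and raises ValueError when they are illegal.
Status payload deep copy
plugin/plugins/study_companion/service.py
Removes json_copy and uses copy.deepcopy on the exposed knowledge subfields; the outer knowledge dict skips an unnecessary deepcopy.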
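
As a rough illustration of the identifier validation summarized above: SQLite cannot bind table or column names as parameters, so they end up interpolated into PRAGMA/ALTER text and need an allow-list check first. The regex below is an assumed pattern, not the PR's actual _SQLITE_IDENTIFIER_RE, and both function names are hypothetical.

```python
import re

# Assumed pattern: ASCII letter or underscore, then letters, digits, underscores.
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_identifier(name: str) -> str:
    """Return `name` unchanged if it is a plain identifier, else raise ValueError."""
    if not isinstance(name, str) or not _IDENTIFIER_RE.fullmatch(name):
        raise ValueError(f"invalid SQLite identifier: {name!r}")
    return name


def add_column_sql(table: str, column: str, declaration: str) -> str:
    """Build an ALTER TABLE statement only from validated names."""
    return (
        f'ALTER TABLE "{validate_identifier(table)}" '
        f'ADD COLUMN "{validate_identifier(column)}" {declaration}'
    )
```

Using `fullmatch` rather than `match` matters here: a trailing `; DROP TABLE ...` would otherwise pass a prefix-only check.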

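Expanded unit test coverage

Layer / File(s) Summary
Store / EventBus / Shutdown tests
plugin/tests/unit/plugins/test_study_companion.py, plugin/tests/unit/plugins/test_study_event_bus.py
Adds test cases for WAL, read connections, fallback, thread-local read connections, interaction trimming, and batch rollback, plus EventBus worker failure, drop, and stop.
KnowledgeTracker and Graph tests
plugin/tests/unit/plugins/test_study_companion_knowledge_tracker.py
Adds assertions for the batch write path, fallback, fallback outside the lock, candidate isolation, graph index reuse, and the dirty flag.
OCR / safety / service tests
plugin/tests/unit/plugins/test_study_companion_study_ocr_pipeline.py, plugin/tests/unit/plugins/test_study_companion_memory_schema.py, plugin/tests/unit/plugins/test_study_companion_service_ui_api.py
Covers JPEG encoding parameters, degradation when imagehash is missing, timeout cancellation, executor reuse, identifier validation, and payload deep-copy behavior.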

🎯 Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly Related PRs

Poem

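Queues line up as the bell rings, meow,
Reads and writes split and run steady, meow,
OCR threads stay busy in parallel, meow,
Batch writes quietly complete, meow,
Tests stand guard so all stays well, meow.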

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
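Title check ✅ Passed The PR title accurately summarizes the main focus of the change (performance optimization and stability hardening of the study companion system) and matches the core changes in the changeset (event bus, store read connections, batch writes, OCR pipeline, and so on).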
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 7

🧹 Nitpick comments (5)
plugin/tests/unit/plugins/test_study_event_bus.py (1)

584-613: 💤 Low value

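The test logic is correct and covers the edge case of dropping the oldest event when the backlog is full, meow~

However, line 609 reaches into the internal deque via bus._queue._queue, which depends on a CPython implementation detail. That is acceptable in a unit test, but it may turn brittle if the internals of asyncio.Queue ever change.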

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugin/tests/unit/plugins/test_study_event_bus.py` around lines 584 - 613,
The test currently peeks into the private deque via bus._queue._queue which
relies on CPython internals; instead, drain the queue using public APIs then
restore it to preserve state: loop while not bus._queue.empty() and call
bus._queue.get_nowait() to collect items into a list, perform your assertions on
that list, then put the items back into bus._queue in the same order with
put_nowait so the worker state is unchanged; update
test_schedule_emit_drops_when_backlog_is_full to use this drain-and-restore
approach rather than accessing bus._queue._queue directly.
plugin/plugins/study_companion/service.py (1)

33-72: ⚡ Quick win

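The behavioral difference between json_copy and deepcopy is smaller than expected, but the inconsistent strategy needs tidying up, meow~

On inspection, json_copy is not a real JSON round trip (dumps→loads); it only copies dict/list/tuple recursively, normalizes dict keys to str, and leaves every other type unchanged. So it does not coerce datetime or custom objects into strings.

Judging from the StudyState type annotations, these state fields are all JSON-friendly types such as list/dict/float/str, and a tuple or a non-str dict key is unlikely to show up at runtime. The actual risk of switching from json_copy to deepcopy is therefore much smaller than initially feared.

That said, build_tutor_payload at Line 200 still uses json_copy while build_status_payload has moved to deepcopy, and that inconsistency is awkward. Align the two: either switch both to deepcopy or use json_copy in both, which is clearer and easier to maintain.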
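
To make the comparison concrete, here is a sketch of a copier with the behavior described for json_copy, next to copy.deepcopy. `json_copy_like` is a stand-in written for this example, not the project's json_copy; in particular, turning tuples into lists is an assumption.

```python
import copy


def json_copy_like(value):
    """Recursively copy containers, stringify dict keys, pass other types through.

    Assumption: tuples come back as lists, as a JSON-style copy would do.
    """
    if isinstance(value, dict):
        return {str(key): json_copy_like(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [json_copy_like(item) for item in value]
    return value


# JSON-friendly state: both strategies produce equal, independent copies.
friendly = {"topics": [{"name": "limits", "mastery": 0.4}]}
assert json_copy_like(friendly) == copy.deepcopy(friendly)

# Non-JSON-friendly state: this is where the two strategies diverge.
odd = {1: (1, 2)}
assert json_copy_like(odd) == {"1": [1, 2]}
assert copy.deepcopy(odd) == {1: (1, 2)}
```

Under these assumptions the two only differ for tuples and non-str keys, which is why the switch is low risk for JSON-friendly state.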

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugin/plugins/study_companion/service.py` around lines 33 - 72, The diff
shows build_status_payload using copy.deepcopy while build_tutor_payload still
uses json_copy, causing inconsistent copying semantics; pick one strategy and
apply it consistently. Decide whether to prefer deepcopy or json_copy (given
StudyState fields are JSON-friendly deepcopy is acceptable), then replace calls
to json_copy in build_tutor_payload with copy.deepcopy (or vice versa replace
deepcopy uses with json_copy) and update imports/usages accordingly (look for
functions build_status_payload, build_tutor_payload, and the json_copy symbol)
so both payload builders use the same copy function across the file.
plugin/tests/unit/plugins/test_study_companion_knowledge_tracker.py (1)

295-338: 💤 Low value

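This duplicates the _store helper, meow~

To inject a custom logger, the test copies the seed path construction and the StudyStore construction again (lines 300-308), nearly identical to _store (lines 36-46). Letting _store accept an optional logger parameter would allow reuse, so don't shy away from the small effort, silly.

♻️ A small change so that _store accepts a logger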
-def _store(tmp_path: Path) -> StudyStore:
+def _store(tmp_path: Path, logger: object | None = None) -> StudyStore:
     seed = (
         Path(__file__).resolve().parents[3]
         / "plugins"
         / "study_companion"
         / "static"
         / "knowledge_graph_seed.json"
     )
-    store = StudyStore(tmp_path / "study.db", tmp_path / "seed.json", _Logger(), seed)
+    store = StudyStore(
+        tmp_path / "study.db", tmp_path / "seed.json", logger or _Logger(), seed
+    )
     store.open()
     return store

Then this test can simply call store = _store(tmp_path, logger).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugin/tests/unit/plugins/test_study_companion_knowledge_tracker.py` around
lines 295 - 338, The test
test_study_store_batch_write_answer_data_keeps_answer_when_candidates_fail
duplicates seed path construction and StudyStore instantiation that already
exists in the helper function _store; modify _store to accept an optional logger
parameter (e.g., def _store(tmp_path: Path, logger: Optional[Logger]=None) or
similar) and use it when constructing StudyStore, then update this test to call
store = _store(tmp_path, logger) instead of reassembling the seed path and
calling StudyStore directly; ensure the unique symbols referenced are _store and
StudyStore so callers keep the same behavior while allowing injection of the
custom logger.
plugin/tests/unit/plugins/test_study_companion.py (1)

3884-3895: ⚡ Quick win

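Assert the final state of the worker task object as well, meow.

This only checks that bus._worker_task is None. If shutdown() merely cleared the field without actually waiting for the task to finish, this regression test would still pass by mistake. The task is already captured above, so add assert task.done(), and if needed refine it to cancelled()/done(), to lock in that the worker really stopped.

A small change for reference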
     await plugin.shutdown()

     assert bus._worker_task is None
+    assert task.done()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugin/tests/unit/plugins/test_study_companion.py` around lines 3884 - 3895,
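In the test, besides asserting that bus._worker_task is None, also assert that
the previously saved task object has finished, to make sure the worker actually
stopped; in the existing snippet, add an assertion on the variable task (the
task returned by bus.schedule_emit), for example assert task.done(); for a
stricter check, assert task.cancelled() or check both done() and not
cancelled(), so that plugin.shutdown() is verified to wait for and terminate the
actual worker task (referenced symbols: task, bus._worker_task,
plugin.shutdown).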
plugin/plugins/study_companion/knowledge_tracker.py (1)

457-479: ⚡ Quick win

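The batch capability probe does not match the actual call surface, meow.

_can_batch_answer_data() currently checks only _supports_batch_answer and batch_write_answer_data, but the batch branch immediately goes on to call answer_write_lock() and load_answer_write_state(). If a store or a test stub implements only the first two, this raises AttributeError before the fallback is reached. Include these two methods in the gate as well, or fall back to legacy directly when they are missing.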

Also applies to: 608-611
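
A fuller gate along the lines suggested here could be sketched as below. It assumes `_supports_batch_answer` is a truthy attribute on the store (it may well be something else in the real code); the function and the stub classes are hypothetical.

```python
_REQUIRED_BATCH_METHODS = (
    "batch_write_answer_data",
    "answer_write_lock",
    "load_answer_write_state",
)


def can_batch_answer_data(store: object) -> bool:
    """True only when the store exposes every method the batch path calls."""
    if not getattr(store, "_supports_batch_answer", False):
        return False
    return all(
        callable(getattr(store, name, None)) for name in _REQUIRED_BATCH_METHODS
    )


class PartialStore:
    """Stub that implements only part of the batch surface."""

    _supports_batch_answer = True

    def batch_write_answer_data(self, *args, **kwargs):
        return None


class FullStore(PartialStore):
    def answer_write_lock(self):
        return None

    def load_answer_write_state(self, *args, **kwargs):
        return None
```

With this gate, `PartialStore` is routed to the legacy path up front instead of failing with AttributeError partway through the batch branch.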

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugin/plugins/study_companion/knowledge_tracker.py` around lines 457 - 479,
The batch-capability gate in _can_batch_answer_data is incomplete: it only
checks _supports_batch_answer and batch_write_answer_data but the batch path
also expects store.answer_write_lock() and load_answer_write_state(), which can
cause AttributeError before falling back; update _can_batch_answer_data to also
require answer_write_lock and load_answer_write_state on the store (and any test
stubs) or, if those methods are missing, ensure the call sites that invoke
answer_write_lock() and load_answer_write_state() (the branch that calls
_on_answer_batch from the code around _on_answer_batch/_on_answer_legacy and the
similar logic at lines ~608-611) immediately fall back to _on_answer_legacy
instead of attempting the batch flow.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plugin/plugins/study_companion/__init__.py`:
- Around line 405-409: The call to close_ocr() on self._ocr_pipeline can raise
and currently isn't caught, which may prevent clearing self._ocr_pipeline and
subsequent shutdown steps; wrap the close_ocr() invocation in a try/except
(mirroring _cleanup_after_failed_startup) to catch any Exception, log the error
(e.g., self.logger.exception or self.logger.error with the exception), and
ensure self._ocr_pipeline is set to None and remaining
shutdown/state-save/store-close logic runs (use finally or set the attribute
after the try/except) so the plugin state remains consistent.

In `@plugin/plugins/study_companion/_event_bus.py`:
- Around line 180-203: The worker currently does an immediate return when
_worker_failure_count >= _MAX_WORKER_FAILURES which leaves queued events and
_scheduled_emit_count stuck; instead of returning directly from the inner loop,
replace the immediate return with logic that preserves/backfills the backlog:
when should_stop is true, if self._scheduled_emit_count > 0 call
self.schedule_emit() (so queued emits will trigger a fresh worker) and then
break out of the loop (allowing the outer finally to clear self._worker_task),
ensuring you do not silently drop or leave events; reference symbols:
_worker_failure_count, _MAX_WORKER_FAILURES, should_stop, _scheduled_emit_count,
schedule_emit(), _safe_task_done, _worker_task.

In `@plugin/plugins/study_companion/knowledge_tracker.py`:
- Around line 263-282: The cached index (_ensure_index) is truncated to
_TOPIC_INDEX_LIMIT and _topic_name_index/_topic_id_index can't be treated as the
full set; modify discover_candidate() and upsert_candidate() to, when the index
was truncated (detect via len(topics) >= _TOPIC_INDEX_LIMIT or count_topics() >
_TOPIC_INDEX_LIMIT), perform an exact store-side lookup instead of relying only
on _topic_name_index: call the store APIs (e.g., list_topics with filters or a
get_topic_by_name/get_topic_by_id if available) to verify existence by name or
id, and use that authoritative result for matching (also apply the same change
for full-text / first-line fallback logic so it queries the store when the index
is truncated to avoid creating duplicate candidate records).

In `@plugin/plugins/study_companion/store_knowledge.py`:
- Around line 303-319: candidate_status_counts currently runs two separate
SELECTs (grouped counts and total) outside of _lock which can yield inconsistent
snapshots; fix by computing total from the grouped result or by issuing a single
aggregated SQL so both values come from the same snapshot. Specifically, inside
candidate_status_counts use the rows returned by the first query on
candidate_knowledge_items (via _require_read_conn().execute(...).fetchall()) to
derive total (sum of COUNT values) or replace the two queries with one query
that returns grouped counts plus the overall count in one result; ensure the
computation remains protected by the existing _lock usage in this class so the
returned by_status/by_type and total are consistent.

In `@plugin/plugins/study_companion/store.py`:
- Around line 209-227: The PRAGMA journal_mode=WAL call in
_configure_connection_journal currently ignores the returned journal mode string
(SQLite often returns "delete" instead of raising), so change
_configure_connection_journal to capture and return the PRAGMA response (actual
mode string or a boolean like is_wal), log the returned value on both success
and fallback, and ensure it attempts the DELETE fallback only if WAL wasn't
returned; then update _require_read_conn to check the returned mode from
_configure_connection_journal and disable the independent read connection (or
fall back to the locked read path) when the mode is not "wal", using the
existing _log_warning for any non-wal cases.
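
The point about reading the PRAGMA result can be shown with a short sqlite3 sketch. `configure_journal` and `supports_read_connection` are hypothetical names; the in-memory database is used because SQLite keeps its journal mode at MEMORY or OFF and ignores a request for WAL without raising.

```python
import sqlite3


def configure_journal(conn: sqlite3.Connection) -> str:
    """Request WAL and return the journal mode SQLite actually reports."""
    mode = str(conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]).lower()
    if mode != "wal":
        # WAL was refused silently; try the DELETE fallback and report the result.
        mode = str(conn.execute("PRAGMA journal_mode=DELETE").fetchone()[0]).lower()
    return mode


def supports_read_connection(conn: sqlite3.Connection) -> bool:
    """Only use an independent read connection when WAL is really active."""
    return configure_journal(conn) == "wal"


if __name__ == "__main__":
    memory_conn = sqlite3.connect(":memory:")
    print(configure_journal(memory_conn))  # in-memory databases never report "wal"
```

The returned string, not the absence of an exception, is what tells the caller whether concurrent readers are safe.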

In `@plugin/plugins/study_companion/study_ocr_pipeline.py`:
- Around line 361-363: The current _calculate_thumbnail_phash(image:
Image.Image) returns an empty string when the optional imagehash dependency is
missing, which the caller then treats as a real phash and flags every frame as
changed; change the function to return None (adjust signature to ->
Optional[str]) when imagehash is unavailable, and update caller logic that reads
its return value (the code that computes has_content_change) to treat None as
"detection disabled" (skip phash comparison and do not set
has_content_change=True). Ensure all uses of _calculate_thumbnail_phash handle
the Optional return safely.
- Around line 116-121: The current timeout handling in capture_lightweight
(where jpeg_future.result(timeout=3.0) / ocr_future.result(timeout=5.0) leads to
jpeg_future.cancel()/ocr_future.cancel()) and close() (which only does
executor.shutdown(wait=False)) doesn't stop already-started worker tasks and
leaves self._executor reusable, causing resource contention; update the flow so
that on any timeout or cancel path you mark the current ThreadPoolExecutor
(self._executor) as unusable and replace it with a fresh executor (set
self._executor = None or create a new ThreadPoolExecutor) to isolate background
tasks, and/or add an interrupt mechanism to worker functions
(_encode_lightweight_jpeg and _extract_image) by accepting a stop_event or
periodic timeout checks so long-running tasks can exit early; ensure close()
also marks self._executor unusable (shutdown then set to None) and
capture_lightweight recreates or lazily creates a new executor before submitting
new futures.
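
For the _calculate_thumbnail_phash item above, the optional-dependency handling could be sketched like this. The function names are hypothetical, and treating a missing previous hash as "no change" is an assumption of this example rather than documented pipeline behavior.

```python
from typing import Optional

try:
    import imagehash  # optional dependency
except ImportError:
    imagehash = None


def calculate_phash(image) -> Optional[str]:
    """Return a hex pHash string, or None when imagehash is unavailable."""
    if imagehash is None:
        return None
    return str(imagehash.phash(image))


def has_content_change(previous: Optional[str], current: Optional[str]) -> bool:
    """Compare hashes; None on either side means detection is disabled."""
    if previous is None or current is None:
        return False  # do not flag a change when there is nothing to compare
    return previous != current
```

Returning None instead of an empty string keeps "no hash available" distinguishable from a real hash, so the caller does not flag every frame as changed.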

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: cd0b6919-d7e7-492d-a801-7411d3a302a4

📥 Commits

Reviewing files that changed from the base of the PR and between 93c4562 and 4df5ce4.

📒 Files selected for processing (18)
  • plugin/plugins/study_companion/__init__.py
  • plugin/plugins/study_companion/_event_bus.py
  • plugin/plugins/study_companion/entry_communication_review_events.py
  • plugin/plugins/study_companion/knowledge_tracker.py
  • plugin/plugins/study_companion/memory_schema.py
  • plugin/plugins/study_companion/service.py
  • plugin/plugins/study_companion/store.py
  • plugin/plugins/study_companion/store_fsrs.py
  • plugin/plugins/study_companion/store_knowledge.py
  • plugin/plugins/study_companion/store_qa.py
  • plugin/plugins/study_companion/store_topics.py
  • plugin/plugins/study_companion/study_ocr_pipeline.py
  • plugin/tests/unit/plugins/test_study_companion.py
  • plugin/tests/unit/plugins/test_study_companion_knowledge_tracker.py
  • plugin/tests/unit/plugins/test_study_companion_memory_schema.py
  • plugin/tests/unit/plugins/test_study_companion_service_ui_api.py
  • plugin/tests/unit/plugins/test_study_companion_study_ocr_pipeline.py
  • plugin/tests/unit/plugins/test_study_event_bus.py

Comment thread plugin/plugins/study_companion/__init__.py Outdated
Comment thread plugin/plugins/study_companion/_event_bus.py
Comment thread plugin/plugins/study_companion/knowledge_tracker.py
Comment thread plugin/plugins/study_companion/store_knowledge.py Outdated
Comment thread plugin/plugins/study_companion/store.py Outdated
Comment thread plugin/plugins/study_companion/study_ocr_pipeline.py
Comment thread plugin/plugins/study_companion/study_ocr_pipeline.py Outdated
@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugin/plugins/study_companion/study_ocr_pipeline.py (1)

261-284: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

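The synchronous JPEG fallback here propagates encoding exceptions straight to the caller, meow

After jpeg_future times out or fails, the code takes the synchronous _encode_lightweight_jpeg() path at Line 280-284, but that call is not inside a try/except. _encode_lightweight_jpeg() itself raises RuntimeError at Line 368-370, so instead of returning LightweightSnapshot(status="capture_failed") like the earlier branches, the exception escapes. Since this method as a whole implements failure degradation, this path needs the same safety net.

🐾 A local safety net could be added like this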
-        if jpeg_bytes is None:
-            jpeg_bytes = self._encode_lightweight_jpeg(
-                thumbnail,
-                max_bytes=self._config.awareness.image_max_bytes,
-            )
+        if jpeg_bytes is None:
+            try:
+                jpeg_bytes = self._encode_lightweight_jpeg(
+                    thumbnail,
+                    max_bytes=self._config.awareness.image_max_bytes,
+                )
+            except Exception as exc:
+                return LightweightSnapshot(
+                    status="capture_failed",
+                    captured_at=captured_at,
+                    diagnostic=f"jpeg encode failed: {exc}",
+                    window_title=window_title,
+                    app_type=app_type,
+                    thumbnail_phash=thumbnail_phash,
+                    has_content_change=has_content_change,
+                )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugin/plugins/study_companion/study_ocr_pipeline.py` around lines 261 - 284,
The fallback path that calls _encode_lightweight_jpeg(thumbnail, ...) when
jpeg_bytes is None can raise RuntimeError and currently escapes the surrounding
failure-handling; wrap that call in a try/except and on exception create and
return the same kind of capture-failed LightweightSnapshot (or set jpeg_bytes to
None and construct LightweightSnapshot(status="capture_failed", ...) consistent
with the ocr failure branch), while also calling
jpeg_future.cancel()/_retire_executor(executor) as done elsewhere; update the
block that currently sets jpeg_bytes to instead handle exceptions from
_encode_lightweight_jpeg and produce the appropriate LightweightSnapshot or
error diagnostic.
♻️ Duplicate comments (1)
plugin/plugins/study_companion/knowledge_tracker.py (1)

396-411: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

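After index truncation, body-text matching still splits an existing topic into a new candidate, meow

This round added the exact store lookup for normalized and first, but the path at Line 400-402 that checks whether the body contains a topic name still scans only the truncated _topic_name_index. So when the real topic falls outside _TOPIC_INDEX_LIMIT, input such as "... Late Topic ..." still reaches the later upsert_candidate() and keeps writing an existing topic as a new candidate. The root cause is the same as the one raised earlier; the substring match is the one remaining leak. In truncated mode, either add a store-side name lookup here or simply disable substring auto-discovery based on a partial index.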

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugin/plugins/study_companion/knowledge_tracker.py` around lines 396 - 411,
The substring match over _topic_name_index (the for loop that checks "if name
and name in normalized") still runs when self._index_truncated, causing existing
topics beyond _TOPIC_INDEX_LIMIT to be treated as new candidates; update this
path to either (a) bypass substring-based discovery when self._index_truncated
(i.e. skip the name-in-normalized check) or (b) perform a store-side resolution
via self._resolve_store_topic(name or normalized substring) before returning
topic_id; ensure the change references _index_truncated, _topic_name_index,
_resolve_store_topic and prevents upsert_candidate from being invoked for
already-existing topics when the index is truncated.
🧹 Nitpick comments (1)
plugin/tests/unit/plugins/test_study_companion.py (1)

3951-3977: ⚡ Quick win

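The graceful-degradation test for a failing OCR pipeline cleanup is great, meow~

This test verifies the fault tolerance when close fails and ensures that shutdown() does not fail as a whole because the OCR pipeline cleanup failed. That matches the PR goal of degrading gracefully and recording diagnostics.

One small suggestion: consider verifying that the error message in logger.warnings contains the original exception text "ocr close failed", which would confirm that the diagnostics are complete.

💡 Optional assertion enhancement

After the assertion at line 3975-3977, add an assertion that checks whether the original exception text was logged: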

     assert any(
         "study shutdown OCR pipeline cleanup failed" in str(item[0][0])
         for item in ctx.logger.warnings
     )
+    assert any(
+        "ocr close failed" in str(item)
+        for item in ctx.logger.warnings
+    )

This ensures the diagnostics contain not only the framework-level message but also the specific underlying error.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugin/tests/unit/plugins/test_study_companion.py` around lines 3951 - 3977,
Add an assertion to test_shutdown_clears_ocr_pipeline_when_close_fails that
verifies the original exception text is present in the logged warning: after
calling plugin.shutdown() and checking plugin._ocr_pipeline is None, scan
ctx.logger.warnings for an entry that includes the substring "ocr close failed"
(thrown by _FailingClosePipeline.close()) in addition to the existing "study
shutdown OCR pipeline cleanup failed" check so the test confirms the original
exception message is preserved in the logger output from plugin.shutdown().

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 70f5e1e7-c227-4860-a216-8d60b3ae2b10

📥 Commits

Reviewing files that changed from the base of the PR and between 4df5ce4 and da3ab1a.

📒 Files selected for processing (12)
  • plugin/plugins/study_companion/__init__.py
  • plugin/plugins/study_companion/_event_bus.py
  • plugin/plugins/study_companion/knowledge_tracker.py
  • plugin/plugins/study_companion/service.py
  • plugin/plugins/study_companion/store.py
  • plugin/plugins/study_companion/store_knowledge.py
  • plugin/plugins/study_companion/study_ocr_pipeline.py
  • plugin/tests/unit/plugins/test_study_companion.py
  • plugin/tests/unit/plugins/test_study_companion_knowledge_quality.py
  • plugin/tests/unit/plugins/test_study_companion_knowledge_tracker.py
  • plugin/tests/unit/plugins/test_study_companion_study_ocr_pipeline.py
  • plugin/tests/unit/plugins/test_study_event_bus.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • plugin/plugins/study_companion/service.py
  • plugin/plugins/study_companion/__init__.py
  • plugin/tests/unit/plugins/test_study_companion_study_ocr_pipeline.py
  • plugin/tests/unit/plugins/test_study_event_bus.py
  • plugin/plugins/study_companion/_event_bus.py
  • plugin/plugins/study_companion/store.py


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 654bbfa31a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +196 to +201
            self.open()
            return self._require_conn()
        conn = getattr(self._read_local, "conn", None)
        if conn is None:
            with self._lock:
                if self._conn is None:

P2: Serialize fallback reads on the writer connection

When WAL is unavailable, this branch returns the shared write connection, but the read call sites then execute queries after this method has released _lock (for example list_topics() and get_raw() call _require_read_conn().execute(...) directly). In DELETE-journal fallback environments, a background status/read can therefore use the same sqlite connection concurrently with a locked batch write, which defeats the previous serialization and can surface intermittent sqlite transaction/connection errors. Keep the lock held for fallback reads or provide a read helper that executes under _lock when _read_connections_enabled is false.
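The second option in the comment above (a read helper that executes under `_lock` when `_read_connections_enabled` is false) could look roughly like the sketch below. `SketchStore`, `_fetch_all`, `write_topics`, and the `topics` table are hypothetical names used only for illustration, and the use of `threading.RLock` is an assumption; the real StudyStore may be structured differently.

```python
import sqlite3
import threading


class SketchStore:
    """Hypothetical stand-in for StudyStore; only the read path is modeled."""

    def __init__(self, path=":memory:", read_connections_enabled=False):
        self._path = path
        self._lock = threading.RLock()  # assumption: a re-entrant lock
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._read_connections_enabled = read_connections_enabled
        self._read_local = threading.local()

    def _fetch_all(self, sql, params=()):
        if not self._read_connections_enabled:
            # Fallback: the writer connection is shared, so hold the lock for
            # the whole execute + fetch, not only while looking up the handle.
            with self._lock:
                return self._conn.execute(sql, params).fetchall()
        # WAL path: each thread lazily opens its own read connection.
        # (This only makes sense for a file-backed database, not ":memory:".)
        conn = getattr(self._read_local, "conn", None)
        if conn is None:
            conn = sqlite3.connect(self._path, check_same_thread=False)
            self._read_local.conn = conn
        return conn.execute(sql, params).fetchall()

    def write_topics(self, names):
        # Writes take the same lock, so fallback reads cannot interleave.
        with self._lock, self._conn:
            self._conn.execute("CREATE TABLE IF NOT EXISTS topics (name TEXT)")
            self._conn.executemany(
                "INSERT INTO topics (name) VALUES (?)", [(n,) for n in names]
            )

    def list_topics(self):
        rows = self._fetch_all("SELECT name FROM topics ORDER BY name")
        return [row[0] for row in rows]
```

Routing every read call site through one helper keeps the locking decision in a single place, instead of relying on each caller to remember the fallback case.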


@MomiJiSan
Contributor Author

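What this PR does

This PR completes the study companion performance optimization and fixes the stability issues found during review.

Core changes:

  • The event bus now uses a more robust asynchronous dispatch path, with backlog drop statistics and worker exception recovery; it also fixes the failure count not being reset after a worker restart.
  • Optimized the status payload and knowledge-tracking write paths to reduce repeated deep copies and redundant database access.
  • StudyStore gains WAL/read connection configuration, and a half-initialized connection is cleaned up when open fails during initialization.
  • In batch answer writes, critical data such as QA, mastery, and FSRS is isolated from candidate knowledge writes; a candidate knowledge failure no longer rolls back the user's answer records.
  • The OCR lightweight screenshot no longer fails when imagehash is missing; pHash is disabled and the snapshot is still generated.
  • When the OCR background worker times out or fails, the future is cancelled and OCR backend resolve failure diagnostics are recorded.
  • The KnowledgeTracker batch fallback now runs outside the write lock, so the legacy path is no longer wrongly wrapped in the same write lock.
  • The memory schema adds SQLite identifier validation.
  • KnowledgeGraph splits the topic id and name indexes to avoid false hits when a name collides with an ID.
  • Added regression tests covering the issues above.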
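As an illustration of the SQLite identifier validation item in the list above, here is a minimal sketch. The helper names `validate_identifier` and `count_rows_sql` and the allowed character set are assumptions for illustration; the actual rule used by the memory schema is not shown in this thread.

```python
import re

# Assumed rule: ASCII letters, digits, and underscores, not starting with a digit.
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_identifier(name: str) -> str:
    """Return the name unchanged if it is a safe identifier, else raise ValueError."""
    if not isinstance(name, str) or not _IDENTIFIER_RE.fullmatch(name):
        raise ValueError(f"invalid SQLite identifier: {name!r}")
    return name


def count_rows_sql(table: str) -> str:
    # Identifiers cannot be bound as "?" parameters, so they are validated
    # before being interpolated into the statement text.
    return f'SELECT COUNT(*) FROM "{validate_identifier(table)}"'
```

Values still go through normal parameter binding; only table and column names need this kind of allow-list check.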

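Why

The goal of this PR is not to add major features but to first stabilize the high-frequency paths in the study companion pipeline: event dispatch, lightweight screenshots, status building, answer writes, and knowledge tracking are all triggered repeatedly while the study companion is running.

This batch of changes prioritizes three kinds of risk:

  • Performance risk: reduce repeated deep copies and redundant database access in write paths, lowering the cost of high-frequency status refreshes and answer-record writes.
  • Data consistency risk: isolate critical learning data from non-critical candidate knowledge writes, so a candidate knowledge failure cannot roll back the user's answer records.
  • Recoverability risk: when paths such as OCR, the event worker, or SQLite WAL initialization fail, they no longer widen the blast radius; they degrade, record diagnostics, and keep the main flow running.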
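The data consistency point can be sketched with a SQLite savepoint, which the PR description names as the isolation mechanism. The `answers` and `candidates` tables and the `write_batch` function are hypothetical and much simpler than the real StudyStore schema; this is an illustration of the idea, not the PR's implementation.

```python
import sqlite3


def write_batch(conn, answers, candidates):
    """Write critical answers; isolate candidate writes inside a savepoint.

    Returns True when the candidate rows were kept, False when they were
    rolled back. The answers are committed in both cases.
    """
    conn.execute("BEGIN")
    try:
        conn.executemany(
            "INSERT INTO answers (question, correct) VALUES (?, ?)", answers
        )
        conn.execute("SAVEPOINT candidates")
        try:
            conn.executemany(
                "INSERT INTO candidates (name) VALUES (?)",
                [(name,) for name in candidates],
            )
            candidates_ok = True
        except sqlite3.Error:
            # Undo only the candidate rows; the answers stay in the transaction.
            conn.execute("ROLLBACK TO SAVEPOINT candidates")
            candidates_ok = False
        conn.execute("RELEASE SAVEPOINT candidates")
        conn.execute("COMMIT")
        return candidates_ok
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN/COMMIT above fully control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE answers (question TEXT NOT NULL, correct INTEGER NOT NULL)")
conn.execute("CREATE TABLE candidates (name TEXT NOT NULL UNIQUE)")

# The duplicate candidate violates the UNIQUE constraint and fails.
kept = write_batch(conn, [("q1", 1), ("q2", 0)], ["limits", "limits"])
```

After this call both answer rows are committed while the candidate table stays empty, which is the behavior the bullet describes: the non-critical write fails without taking the critical data down with it.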

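Prerequisites for Batch B

Batch B should not be stacked directly on an unstable Batch A. It should start only after the following prerequisites are met:
Note: wait until the notebook branch is merged before starting; otherwise Batch B may be built on old interfaces, and the later notebook merge would cause interface drift and rework.

  • This PR has been merged, or the Batch B branch is explicitly based on the latest commit of this PR.
  • CI and the relevant local tests pass, including at least the study companion unit tests covered by this PR and the knowledge quality/contribution tests.
  • The GitNexus detect_changes result stays at low risk, with no unexplained HIGH/CRITICAL items.
  • Batch A review issues are closed, with no remaining blockers affecting data consistency, locking, transactions, or OCR failure recovery.
  • Run the key manual study companion flows once: status page load, lightweight screenshot, answer recording, wrong-answer/FSRS writes, and review queue query, and confirm there are no obvious regressions.
  • Redo the impact analysis before starting Batch B, especially if it will keep changing the shared paths Store, KnowledgeTracker, OCR pipeline, and event bus.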

@coderabbitai
Contributor

coderabbitai Bot commented Jun 4, 2026

Caution

Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.

Error details
{}
