fix(galgame): 让 RapidOCR 模型配置实际生效#1194
Conversation
本次更改为 RapidOCR 运行时补充模型路径解析逻辑,按用户模型缓存优先、包内 models 目录兜底的顺序查找 det/cls/rec ONNX 模型,并在构造 RapidOCR 实例时传入解析后的模型路径,使 lang_type、ocr_version、 model_type 配置能够实际影响模型加载。同时将默认 OCR 版本从 PP-OCRv5 调整为与当前内置模型一致的 PP-OCRv4,并更新手动 OCR pipeline 验证脚本以复用统一默认值。 新增单元测试覆盖模型路径解析、用户缓存优先级、缺失模型回退以及从实际导入包定位 models 目录等场景,防止配置再次失效。
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
Walkthrough将默认 RapidOCR OCR 版本从 "PP-OCRv5" 改为 "PP-OCRv4",并在 galgame 插件中添加包内与用户模型缓存的动态模型路径解析与注入逻辑,调整运行时构造参数传递与加载调用,新增单元测试覆盖模型优先级与加载行为喵。 ChangesOCR 版本与动态模型路径解析
Sequence Diagram(s)sequenceDiagram
participant Config as 配置层
participant Loader as 运行时加载
participant Resolver as 路径解析
participant FS as 文件系统
participant RapidOCR as RapidOCR包
Config->>Loader: load_rapidocr_runtime(config)
Loader->>Loader: 派生 package_models_dir(module.parent / "models")
Loader->>Resolver: _build_runtime_constructor_kwargs(..., package_models_dir)
Resolver->>Resolver: _resolve_rapidocr_model_paths(lang, ocr_version, ...)
loop 对每种模型类型 det/cls/rec
Resolver->>FS: 检查 user_cache 中候选文件
FS-->>Resolver: 存在/不存在
alt 存在
Resolver->>Resolver: 选用 user_cache 中的模型路径
else
Resolver->>FS: 检查 package_models_dir 中候选文件
FS-->>Resolver: 存在/不存在
Resolver->>Resolver: 如存在则选用 package 中的路径
end
end
Resolver-->>Loader: 返回 det/cls/rec 路径与 engine_type
Loader->>RapidOCR: 构造 runtime(模型路径, engine_type, ...)
RapidOCR-->>Loader: 返回 runtime 实例与 metadata
Loader-->>Config: 返回 runtime 与元信息
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
💡 Codex ReviewN.E.K.O/plugin/plugins/galgame_plugin/rapidocr_support.py Lines 110 to 114 in e14d2a9 The new resolver explicitly discards ℹ️ About Codex in GitHubCodex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback". |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
plugin/plugins/galgame_plugin/rapidocr_support.py (2)
222-253: 💤 Low value
model_search_dirs这个 API 有点别扭,可以再瘦一瘦身喵~第 221-222 行:
model_cache_dir已经作为命名参数传进来了,调用方又把同一个对象塞到model_search_dirs[0],然后这里又靠search_dirs[1] if len(search_dirs) > 1 else Path()这种魔术索引把 package 目录抠出来——顺序约定全靠默契,谁多塞一个或者顺序换了都会爆出奇奇怪怪的 bug 喵。要么直接收一个具名的package_models_dir,要么彻底用model_search_dirs单一来源(去掉model_cache_dir重复传入),二选一就好啦~顺带 244-250 行那几个
if use_resolved_model_paths and det_path:里det_path/rec_path的判空在use_resolved_model_paths = bool(det_path and rec_path)之后已经是冗余的,可以顺手清掉喵。♻️ 建议的简化喵~
def _build_runtime_constructor_kwargs( runtime_class: type[Any], *, engine_type: str, lang_type: str, model_type: str, ocr_version: str, model_cache_dir: Path, - model_search_dirs: list[Path] | None = None, + package_models_dir: Path | None = None, ) -> dict[str, Any]: ... if has_var_kwargs: - search_dirs = list(model_search_dirs or []) - package_models_dir = search_dirs[1] if len(search_dirs) > 1 else Path() det_path, cls_path, rec_path = _resolve_rapidocr_model_paths( model_cache_dir=model_cache_dir, - package_models_dir=package_models_dir, + package_models_dir=package_models_dir or Path(), lang_type=lang_type, ocr_version=ocr_version, model_type=model_type, ) kwargs: dict[str, Any] = {} - use_resolved_model_paths = bool(det_path and rec_path) - if use_resolved_model_paths and det_path: - kwargs["det_model_path"] = det_path - if use_resolved_model_paths and cls_path: - kwargs["cls_model_path"] = cls_path - if use_resolved_model_paths and rec_path: - kwargs["rec_model_path"] = rec_path + if det_path and rec_path: + kwargs["det_model_path"] = det_path + kwargs["rec_model_path"] = rec_path + if cls_path: + kwargs["cls_model_path"] = cls_path if engine_type: kwargs["engine_type"] = engine_type return kwargs调用点(
load_rapidocr_runtime里)相应改成package_models_dir=package_models_dir就行啦喵~🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/plugins/galgame_plugin/rapidocr_support.py` around lines 222 - 253, The code is fragile because load_rapidocr_runtime mixes model_cache_dir and a positional model_search_dirs list and extracts package_models_dir via search_dirs[1]; change the API to accept an explicit package_models_dir named parameter (e.g., add package_models_dir: Path | None to load_rapidocr_runtime signature) and stop relying on model_search_dirs indexing — pass that package_models_dir into _resolve_rapidocr_model_paths instead of reading search_dirs[1]; also remove redundant per-path emptiness checks after computing use_resolved_model_paths = bool(det_path and rec_path) (i.e., drop "and det_path"/"and rec_path" guards in the blocks that set det_model_path/cls_model_path/rec_model_path) so only the use_resolved_model_paths flag controls adding those kwargs.
349-360: 💤 Low value包内 models 目录的兜底路径已有保护喵,但还可以更完善~
代码在第 121-122 行有对
search_dir的验证逻辑:if not search_dir or not search_dir.is_dir(): continue,所以即使package_models_dir不存在也会被安全跳过。但从可观测性角度,当包内 models 目录找不到而被跳过时,缺少日志记录。建议在_resolve_rapidocr_model_paths函数中增加一条 debug 级日志,记录搜索路径的查找失败或跳过情况,方便后续排错的时候有线索可循喵~这样即使用户遇到模型加载问题,也能从日志里看出是否命中了兜底路径。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/plugins/galgame_plugin/rapidocr_support.py` around lines 349 - 360, The function _resolve_rapidocr_model_paths should log when a candidate search_dir is skipped; update the logic that iterates search_dir (used via model_search_dirs including package_models_dir constructed in rapidocr_support.py) to emit a debug-level message whenever "if not search_dir or not search_dir.is_dir(): continue" evaluates true, including the path value and reason (empty vs not a directory) so package_models_dir misses are observable; add this debug log next to that check (use the module's logger or existing logging facility) without changing control flow.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugin/plugins/galgame_plugin/rapidocr_support.py`:
- Around line 102-135: The function _resolve_rapidocr_model_paths currently
deletes model_type and hardcodes mobile class filename, so server/mobile config
is ignored; restore and use the model_type value when building det_name,
rec_name and cls_name (e.g. include "_server" vs "_infer" variants or the
correct naming convention for server vs mobile), and add a simple
validation/clamp for model_type (accept only "mobile" or "server" and emit a
warning/log if unsupported) so users who set rapidocr_model_type="server" will
have the correct filenames searched (update references to det_name, rec_name,
cls_name and the initial del model_type usage).
---
Nitpick comments:
In `@plugin/plugins/galgame_plugin/rapidocr_support.py`:
- Around line 222-253: The code is fragile because load_rapidocr_runtime mixes
model_cache_dir and a positional model_search_dirs list and extracts
package_models_dir via search_dirs[1]; change the API to accept an explicit
package_models_dir named parameter (e.g., add package_models_dir: Path | None to
load_rapidocr_runtime signature) and stop relying on model_search_dirs indexing
— pass that package_models_dir into _resolve_rapidocr_model_paths instead of
reading search_dirs[1]; also remove redundant per-path emptiness checks after
computing use_resolved_model_paths = bool(det_path and rec_path) (i.e., drop
"and det_path"/"and rec_path" guards in the blocks that set
det_model_path/cls_model_path/rec_model_path) so only the
use_resolved_model_paths flag controls adding those kwargs.
- Around line 349-360: The function _resolve_rapidocr_model_paths should log
when a candidate search_dir is skipped; update the logic that iterates
search_dir (used via model_search_dirs including package_models_dir constructed
in rapidocr_support.py) to emit a debug-level message whenever "if not
search_dir or not search_dir.is_dir(): continue" evaluates true, including the
path value and reason (empty vs not a directory) so package_models_dir misses
are observable; add this debug log next to that check (use the module's logger
or existing logging facility) without changing control flow.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 621e055f-edc1-4054-914c-58a08c3392b0
📒 Files selected for processing (5)
plugin/plugins/galgame_plugin/models.pyplugin/plugins/galgame_plugin/plugin.tomlplugin/plugins/galgame_plugin/rapidocr_support.pyplugin/plugins/galgame_plugin/test_ocr_pipeline.pyplugin/tests/unit/plugins/test_galgame_rapidocr_support.py
- _resolve_rapidocr_model_paths 真正使用 model_type,server 变体走 PaddleOCR 的 `_server_infer` 命名(cls 跨变体共享 mobile) - _build_runtime_constructor_kwargs 把 model_search_dirs 列表换成具名 package_models_dir 参数,去掉魔术索引和冗余的 per-path 判空 - 新增 server 变体回归测试,更新现有测试切到新参数名 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@coderabbitai @codex 综合回一下 review 反馈(at 5fd9684): Major / P2: Nitpick (222-253): Nitpick (349-360):包内 models 目录 miss 时加 debug 日志 — 不采纳。 |
|
Tip For best results, initiate chat on the files or code changes.
Major / P2 Nitpick (222-253) Nitpick (349-360) debug 日志 🚫 — 不采纳的理由本喵能接受喵~ 总体来看这次修改把主要的问题都解决掉了,设计也变得更整洁了喵~辛苦了,做得不错嘛……才、才不是在夸你呢喵! |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5fd96846a8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| det_name = f"{lang}_{version}{server_infix}_det_infer.onnx" | ||
| rec_name = f"{lang}_{version}{server_infix}_rec_infer.onnx" |
There was a problem hiding this comment.
Build server model filenames using RapidOCR naming
When model_type == "server", this now constructs ..._server_det_infer.onnx/..._server_rec_infer.onnx, but RapidOCR 1.4.4 (the version pinned in this repo) uses ..._det_server_infer.onnx and ..._rec_server_infer.onnx. In that configuration the resolver never finds server models, so model_type=server silently falls back to non-server behavior and the new regression test passes only because it encodes the same wrong pattern.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
你说得对,我把 infix 位置放错了。查 SWHL/RapidOCR HuggingFace repo 确认实际命名是 ch_PP-OCRv4_det_server_infer.onnx / ch_PP-OCRv4_rec_server_infer.onnx(_server 在 _det/_rec 之后、_infer 之前),不是我之前写的 ..._server_det_infer。已修:infix 移到正确位置,回归测试同步更新。
at 432318d
There was a problem hiding this comment.
🧹 Nitpick comments (1)
plugin/plugins/galgame_plugin/rapidocr_support.py (1)
350-351: 💤 Low value小喵唠叨:
module_file为空时回退到Path()会变成 CWD 搜索喵。
Path()等于Path("."),然后_resolve_rapidocr_model_paths里的search_dir.is_dir()在当前工作目录下会判 True,理论上会去 CWD 翻*_det_infer.onnx。虽然rapidocr_onnxruntime是常规包,__file__缺失的几率几乎为零,但留个空哨兵会更明确喵~♻️ 一种更显式的写法(可选)喵~
- module_file = getattr(module, "__file__", "") or "" - package_models_dir = Path(module_file).resolve().parent / "models" if module_file else Path() + module_file = getattr(module, "__file__", "") or "" + package_models_dir: Path | None = ( + Path(module_file).resolve().parent / "models" if module_file else None + )
_build_runtime_constructor_kwargs(... package_models_dir=package_models_dir)已经能接None了,所以不用动调用点喵。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/plugins/galgame_plugin/rapidocr_support.py` around lines 350 - 351, The code falls back to Path() when module.__file__ is missing which equals the CWD and can cause _resolve_rapidocr_model_paths to inadvertently search the working directory for ONNX files; change the sentinel to None instead: set package_models_dir to Path(module_file).resolve().parent / "models" if module_file else None so callers like _resolve_rapidocr_model_paths and _build_runtime_constructor_kwargs receive None (the latter already accepts None) and will skip CWD scanning.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@plugin/plugins/galgame_plugin/rapidocr_support.py`:
- Around line 350-351: The code falls back to Path() when module.__file__ is
missing which equals the CWD and can cause _resolve_rapidocr_model_paths to
inadvertently search the working directory for ONNX files; change the sentinel
to None instead: set package_models_dir to Path(module_file).resolve().parent /
"models" if module_file else None so callers like _resolve_rapidocr_model_paths
and _build_runtime_constructor_kwargs receive None (the latter already accepts
None) and will skip CWD scanning.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 96fea7d2-75b6-4bb5-ac8a-586d2de4e793
📒 Files selected for processing (2)
plugin/plugins/galgame_plugin/rapidocr_support.pyplugin/tests/unit/plugins/test_galgame_rapidocr_support.py
…odule-file fallback - 修正 server 变体文件名:infix 应在 `_det`/`_rec` 之后(PaddleOCR 实际命名是 `ch_PP-OCRv4_det_server_infer.onnx`,参见 SWHL/RapidOCR HuggingFace repo), 上一版误把 infix 放到了 lang/version 后面 - module_file 缺失时 package_models_dir 从 Path()(= CWD)改为 None, 避免 _resolve_rapidocr_model_paths 误扫工作目录 - 同步修正 server 变体回归测试的文件名 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@coderabbitai @codex 跟进 review(at 432318d): Codex P2 — server filename infix 位置错 ✅ 修了。我和 CodeRabbit 上一轮的建议都把 CodeRabbit nitpick (350-351) — |
|
Tip For best results, initiate chat on the files or code changes.
Codex P2 — server filename infix 位置 ✅ — 已在 SWHL/RapidOCR HuggingFace repo 里亲自核实过喵!PP-OCRv4 目录下确实是 CodeRabbit nitpick (350-351) — 两条修复都很漂亮,辛苦了喵! |
|
Codex Review: Didn't find any major issues. Bravo. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
…config-not-applied # Conflicts: # plugin/plugins/galgame_plugin/rapidocr_support.py
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 81c80625aa
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Summary by CodeRabbit
发布说明
改进
测试