Fix memory resolution regression for multimodal Gemini models #14209
yingfeng merged 4 commits into infiniflow:main
Conversation
📝 Walkthrough
Tenant-model lookup gained a keyword-only …
Changes
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 1
🧹 Nitpick comments (2)
api/db/services/tenant_llm_service.py (1)
109-116: Consider logging when a fallback compatibility match is used. When the exact `model_type` lookup fails but a compatible model is found via fallback, this is a significant behavioral divergence from strict matching. Adding a debug log here would help with troubleshooting resolution issues.

📝 Suggested logging:

```diff
 candidate = compatible_objs[0]
 if cls.model_supports_type(candidate.llm_name, model_type_val, candidate.llm_factory):
+    logging.debug(
+        "Model %s with stored type %s resolved via compatibility fallback for requested type %s",
+        candidate.llm_name, candidate.model_type, model_type_val,
+    )
     return candidate
 return None
```

As per coding guidelines:
`**/*.py`: Add logging for new flows.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/db/services/tenant_llm_service.py` around lines 109-116: The current flow returns a compatible candidate when exact model_type lookup fails without any logging; modify the method that calls cls._query_model_records to emit a debug log when a fallback compatibility match is used: after selecting candidate = compatible_objs[0] and before returning it, call the module logger to log that a fallback match was chosen, including identifiers like candidate.llm_name, candidate.llm_factory and the requested model_type_val so operators can trace why the non-exact model was returned; keep the log at debug level and do not change return behavior.

api/db/joint_services/tenant_model_service.py (1)
24-26: Consider extracting `_normalize_model_type` to avoid duplication. This helper is identical to `TenantLLMService._normalize_model_type` in `api/db/services/tenant_llm_service.py` (lines 47-49). Consider extracting it to a shared utility module (e.g., `api/utils/model_utils.py`) to maintain DRY principles.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/db/joint_services/tenant_model_service.py` around lines 24 - 26, The helper _normalize_model_type is duplicated between tenant_model_service.py and TenantLLMService._normalize_model_type in tenant_llm_service.py; extract this logic into a shared utility (e.g., create api/utils/model_utils.py with a function normalize_model_type) and replace both usages to call that new utility function, updating imports in tenant_model_service.py and tenant_llm_service.py to import normalize_model_type so the shared logic is centralized and DRY.
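For illustration, the extracted helper might look like the following sketch; the function name, module path, and the accepted input shapes are assumptions for this example, not the actual RAGFlow code:

```python
# Hypothetical shared helper, e.g. api/utils/model_utils.py.
# Assumes model types may arrive as plain strings or enum-like objects
# with a .value attribute, and that trailing qualifiers after a comma
# (e.g. "chat,function") should be stripped -- an illustration only.
def normalize_model_type(model_type):
    """Return a lowercase, bare model-type string, or None."""
    if model_type is None:
        return None
    value = getattr(model_type, "value", model_type)
    return str(value).split(",")[0].strip().lower()
```

Both services would then import this one function instead of carrying private copies.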
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/apps/restful_apis/memory_api.py`:
- Around line 35-37: The call to ensure_tenant_model_id_for_params (with
strict=True) can raise ArgumentException but is currently executed before the
try/except that handles ArgumentException; move that call into the existing try
block that begins after req is parsed (or wrap just that call in a small
try/except) so that any ArgumentException is caught and handled by the same
logic that returns get_error_argument_result(); reference
ensure_tenant_model_id_for_params, get_request_json, current_user.id and the
existing ArgumentException handler to locate where to move or wrap the call.
---
Nitpick comments:
In `@api/db/joint_services/tenant_model_service.py`:
- Around line 24-26: The helper _normalize_model_type is duplicated between
tenant_model_service.py and TenantLLMService._normalize_model_type in
tenant_llm_service.py; extract this logic into a shared utility (e.g., create
api/utils/model_utils.py with a function normalize_model_type) and replace both
usages to call that new utility function, updating imports in
tenant_model_service.py and tenant_llm_service.py to import normalize_model_type
so the shared logic is centralized and DRY.
In `@api/db/services/tenant_llm_service.py`:
- Around line 109-116: The current flow returns a compatible candidate when
exact model_type lookup fails without any logging; modify the method that calls
cls._query_model_records to emit a debug log when a fallback compatibility match
is used: after selecting candidate = compatible_objs[0] and before returning it,
call the module logger to log that a fallback match was chosen, including
identifiers like candidate.llm_name, candidate.llm_factory and the requested
model_type_val so operators can trace why the non-exact model was returned; keep
the log at debug level and do not change return behavior.
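The debug-logging suggestion above can be illustrated with a minimal, self-contained sketch; the `ModelRecord` shape and the `supports` callback are stand-ins for the real service internals, not the project's actual method signatures:

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)

@dataclass
class ModelRecord:
    llm_name: str
    llm_factory: str
    model_type: str

def resolve_with_fallback(exact_matches, compatible_objs, model_type_val, supports):
    """Prefer an exact model_type match; otherwise fall back to the first
    compatible record, logging the divergence at debug level."""
    if exact_matches:
        return exact_matches[0]
    if compatible_objs:
        candidate = compatible_objs[0]
        if supports(candidate.llm_name, model_type_val, candidate.llm_factory):
            logger.debug(
                "Model %s with stored type %s resolved via compatibility "
                "fallback for requested type %s",
                candidate.llm_name, candidate.model_type, model_type_val,
            )
            return candidate
    return None
```

With this shape, a Gemini row stored as `image2text` still resolves for a `chat` request, and the debug log records that the non-exact path was taken.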
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 28e1cee5-54ca-4887-afb3-269b18629c22
📒 Files selected for processing (6)
- api/apps/restful_apis/memory_api.py
- api/db/joint_services/memory_message_service.py
- api/db/joint_services/tenant_model_service.py
- api/db/services/dialog_service.py
- api/db/services/tenant_llm_service.py
- api/utils/tenant_utils.py
@yingfeng Could you please assign the review of this PR?
Thanks for working on this @spider-yamet! I agree that the PR fixes the regression in #14206. That said, I think this specific bug can be fixed in a smaller and more targeted way. The failure starts during memory creation, in `ensure_tenant_model_id_for_params`. Because the bad state is introduced there, I think the narrowest fix is to handle it there.
That solves the issue at the exact failure point, with a very small patch. It also avoids broadening shared lookup behavior in other layers, and it leaves existing behavior elsewhere untouched.

Suggested change:

```diff
diff --git a/api/utils/tenant_utils.py b/api/utils/tenant_utils.py
index 83da91f1c..2d1baf66a 100644
--- a/api/utils/tenant_utils.py
+++ b/api/utils/tenant_utils.py
@@ -30,6 +30,8 @@ def ensure_tenant_model_id_for_params(tenant_id: str, param_dict: dict) -> dict:
     if param_dict.get(key) and not param_dict.get(f"tenant_{key}"):
         model_type = _KEY_TO_MODEL_TYPE.get(key)
         tenant_model = TenantLLMService.get_api_key(tenant_id, param_dict[key], model_type)
+        if not tenant_model and model_type == LLMType.CHAT:
+            tenant_model = TenantLLMService.get_api_key(tenant_id, param_dict[key])
         if tenant_model:
             param_dict.update({f"tenant_{key}": tenant_model.id})
         else:
```
else:I think this is a better fit for #14206 because it fixes the regression where it starts, with the fewest line changes possible. |
Codecov Report: ✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main   #14209      +/-   ##
==========================================
- Coverage   98.11%   96.66%   -1.45%
==========================================
  Files          10       10
  Lines         690      690
  Branches      108      108
==========================================
- Hits          677      667      -10
- Misses          4        8       +4
- Partials        9       15       +6
```

☔ View full report in Codecov by Sentry.
@6ba3i Thanks for your detailed feedback. I updated the fix to match your suggestion and kept it as narrow as possible. Instead of changing the shared model resolution behavior, I only handle it at memory creation time: we still try the typed CHAT lookup first, and if that misses, we retry once without model_type just for llm_id, so the existing multimodal tenant row can still be reused. I also fixed the error handling in memory_api.create_memory(): if strict model resolution fails, it now goes through the existing argument-error path instead of surfacing as a 500. Other broader compatibility changes were removed so the patch stays focused on the exact regression point you called out. I would appreciate your feedback now. :)
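The described error-path fix can be sketched as follows; `ArgumentException`, `get_error_argument_result`, and the handler shape are simplified stand-ins for the actual API code, shown only to make the control flow concrete:

```python
class ArgumentException(ValueError):
    """Stand-in for the project's argument-validation error."""

def get_error_argument_result(message):
    # Stand-in: the real helper returns a structured 4xx response.
    return {"code": 400, "message": message}

def create_memory(ensure_model, params):
    """Run strict model resolution inside the same try/except that
    already handles argument errors, so a miss becomes a 400, not a 500."""
    try:
        ensure_model(params)  # may raise ArgumentException under strict=True
        return {"code": 0, "data": params}
    except ArgumentException as exc:
        return get_error_argument_result(str(exc))
```

The point is placement: the strict resolution call sits inside the existing handler's scope, so a resolution miss surfaces as an argument error rather than an unhandled exception.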
6ba3i
left a comment
I think your addition of the strict rule might've broken the CI, @spider-yamet, especially because it only happens in the memory tests in the SDK. Do you mind trying to remove them or fixing the CI errors? Thanks!
@6ba3i Could you please check again? The PR is passing CI now. :)
@yingfeng Could you please share your opinion?
What problem does this PR solve?
Fixes #14206.
This issue is a regression. PR #9520 previously changed Gemini models from `image2text` to `chat` to fix chat-side resolution, but PR #13073 later restored those Gemini entries to `image2text` during model-list updates, which reintroduced the bug.

The underlying problem is that Gemini models are multimodal and advertise both `CHAT` and `IMAGE2TEXT`, while tenant model resolution still depends on a single stored `model_type`. That makes chat-only flows such as memory extraction fragile when a compatible model is stored as `image2text`.

This PR fixes the issue at the model resolution layer instead of changing `llm_factories.json` again:
- `model_type` lookup first
- … `tenant_llm_id=0` …

This preserves existing multimodal and `image2text` behavior while restoring chat compatibility for memory-related flows.

Type of change
Testing
- A model stored as `image2text` but tagged with `CHAT` can still be resolved for `chat`
- `get_model_config_by_type_and_name(..., CHAT, ...)` returns a chat-compatible runtime config
- `get_model_config_by_id(..., CHAT)` also returns a chat-compatible runtime config
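The first testing bullet can be expressed as a small self-contained check; the in-memory row table and resolver below are illustrative stand-ins for the tested behavior, not the project's actual test suite:

```python
# In-memory stand-in for tenant model rows: the stored model_type
# versus the capabilities the model catalog advertises ("tags").
ROWS = [
    {"llm_name": "gemini-1.5-pro", "model_type": "image2text",
     "tags": {"chat", "image2text"}},
]

def resolve(llm_name, requested_type):
    """Exact stored-type match first, then a tag-based compatibility pass."""
    for row in ROWS:
        if row["llm_name"] == llm_name and row["model_type"] == requested_type:
            return row
    for row in ROWS:
        if row["llm_name"] == llm_name and requested_type in row["tags"]:
            return row
    return None

def test_image2text_row_resolves_for_chat():
    # The multimodal row stored as image2text is still usable for chat.
    row = resolve("gemini-1.5-pro", "chat")
    assert row is not None and row["model_type"] == "image2text"
```

A request for a type the model does not advertise (e.g. `embedding`) still resolves to nothing, which is what keeps the fallback from loosening matching across the board.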