Fix memory resolution regression for multimodal Gemini models #14209

Merged: yingfeng merged 4 commits into infiniflow:main from spider-yamet:fix/memory-message-fail on Apr 20, 2026

Conversation

spider-yamet (Contributor) commented Apr 18, 2026

What problem does this PR solve?

Fixes #14206.

This issue is a regression. PR #9520 previously changed Gemini models from image2text to chat to fix chat-side resolution, but PR #13073 later restored those Gemini entries to image2text during model-list updates, which reintroduced the bug.

The underlying problem is that Gemini models are multimodal and advertise both CHAT and IMAGE2TEXT, while tenant model resolution still depends on a single stored model_type. That makes chat-only flows such as memory extraction fragile when a compatible model is stored as image2text.

This PR fixes the issue at the model resolution layer instead of changing llm_factories.json again:

  • keep the stored tenant model type unchanged
  • try exact model_type lookup first
  • if no exact match is found, fall back only when the model metadata shows the requested capability is supported
  • coerce the runtime config to the requested type for chat callers
  • fail fast in memory creation instead of silently persisting tenant_llm_id=0

This preserves existing multimodal and image2text behavior while restoring chat compatibility for memory-related flows.
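A minimal sketch of the resolution flow described above, with illustrative helper names (query_tenant_models and model_supports_type are stand-ins, not the project's actual service API):

```python
# Minimal sketch of the capability-gated fallback described above.
# query_tenant_models and model_supports_type are illustrative stand-ins.
def resolve_tenant_model(tenant_id, llm_name, requested_type):
    # 1. Exact model_type lookup first; stored tenant data stays unchanged.
    exact = query_tenant_models(tenant_id, llm_name, model_type=requested_type)
    if exact:
        return exact[0]
    # 2. Fall back only when the model metadata advertises the requested
    #    capability (e.g. a Gemini row stored as image2text that also
    #    supports CHAT).
    for candidate in query_tenant_models(tenant_id, llm_name, model_type=None):
        if model_supports_type(candidate.llm_name, requested_type, candidate.llm_factory):
            # 3. Coerce the runtime config to the requested type so chat
            #    callers receive a chat-shaped config.
            candidate.model_type = requested_type
            return candidate
    # 4. No compatible model: return None so callers can fail fast
    #    instead of silently persisting tenant_llm_id=0.
    return None
```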

Type of change

  • Bug Fix (non-breaking change which fixes an issue)

Testing

  • Re-checked the current memory creation and memory message extraction paths against the updated resolution logic
  • Verified locally that a Gemini-style tenant model stored as image2text but tagged with CHAT can still be resolved for chat
  • Verified get_model_config_by_type_and_name(..., CHAT, ...) returns a chat-compatible runtime config
  • Verified get_model_config_by_id(..., CHAT) also returns a chat-compatible runtime config
  • Verified strict resolution still fails when the model metadata does not advertise chat capability
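As a rough illustration, the last three checks could be expressed as assertions along these lines (the model name, the dict-like config shape, and the import path are assumptions; only the function names and argument order come from the bullets above):

```python
# Illustrative checks only: "gemini-1.5-pro", the dict-like config shape,
# and the LLMType import path are assumptions.
from api.db import LLMType  # assumed location of the LLMType enum

def verify_chat_resolution(tenant_id: str, model_id: str) -> None:
    cfg = get_model_config_by_type_and_name(tenant_id, LLMType.CHAT, "gemini-1.5-pro")
    assert cfg["model_type"] == LLMType.CHAT  # coerced for chat callers

    cfg_by_id = get_model_config_by_id(model_id, LLMType.CHAT)
    assert cfg_by_id["model_type"] == LLMType.CHAT
```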

dosubot (Bot) added the size:L (This PR changes 100-499 lines, ignoring generated files), 🐖 api (the modified files are located under directory 'api/apps/sdk'), and 🐞 bug (something isn't working; pull request that fixes a bug) labels on Apr 18, 2026
coderabbitai (Bot) commented Apr 18, 2026

📝 Walkthrough

Tenant-model lookup gained a keyword-only strict flag and a CHAT-type fallback lookup without model_type filtering; when strict=True, unresolved models now raise an ArgumentException. The memory creation handler now invokes the resolver inside its existing try block and explicitly errors if tenant_llm_id is missing.

Changes

  • Memory API Handler (api/apps/restful_apis/memory_api.py): moved the ensure_tenant_model_id_for_params(..., strict=True) call into the existing try block and added an explicit check that raises an ArgumentException when tenant_llm_id is absent.
  • Tenant Model Resolution (api/utils/tenant_utils.py): added a keyword-only strict: bool = False parameter to ensure_tenant_model_id_for_params; for CHAT keys, a fallback lookup without model_type is attempted when the initial lookup fails. If strict=True and the model is still unresolved, an ArgumentException is raised; otherwise tenant_{key} defaults to 0.
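A simplified sketch of the merged resolver shape, assuming placeholder names where the summary does not spell them out (the key table iteration and the error message are illustrative; the real function lives in api/utils/tenant_utils.py):

```python
# Simplified sketch of the behavior summarized above; the iteration over
# _KEY_TO_MODEL_TYPE and the error message text are placeholders.
def ensure_tenant_model_id_for_params(tenant_id: str, param_dict: dict, *, strict: bool = False) -> dict:
    for key, model_type in _KEY_TO_MODEL_TYPE.items():
        if not param_dict.get(key) or param_dict.get(f"tenant_{key}"):
            continue
        tenant_model = TenantLLMService.get_api_key(tenant_id, param_dict[key], model_type)
        # CHAT-only fallback: retry once without a model_type filter so a
        # multimodal row stored as image2text can still be reused.
        if not tenant_model and model_type == LLMType.CHAT:
            tenant_model = TenantLLMService.get_api_key(tenant_id, param_dict[key])
        if tenant_model:
            param_dict[f"tenant_{key}"] = tenant_model.id
        elif strict:
            raise ArgumentException(f"cannot resolve a tenant model for {key!r}")
        else:
            param_dict[f"tenant_{key}"] = 0
    return param_dict
```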

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 I hopped through code and chased a clue,
When types hid models from my view,
I tried a fallback, soft and spry,
Now memories find their model—hooray, reply! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (4 passed)
  • Title check: the title clearly identifies the primary fix, a memory resolution regression for multimodal Gemini models, matching the core objective of issue #14206.
  • Linked Issues check: the PR directly addresses issue #14206 by implementing the recommended narrowed fix: strict-mode fallback retry logic for CHAT-only lookups in ensure_tenant_model_id_for_params, plus validation to fail fast in memory creation.
  • Out of Scope Changes check: all changes are scoped to #14206; the modifications to tenant model resolution (tenant_utils.py) and memory creation error handling (memory_api.py) follow the recommended narrowed approach without unrelated refactoring.
  • Description check: the description identifies the problem (a regression with Gemini multimodal models), provides context, lists the fix strategy, and includes testing verification.


spider-yamet changed the title from "Fix memory chat model resolution for multimodal tenant LLMs" to "Fix memory resolution regression for multimodal Gemini models" on Apr 18, 2026
coderabbitai (Bot) left a review comment:

Actionable comments posted: 1

🧹 Nitpick comments (2)
api/db/services/tenant_llm_service.py (1)

109-116: Consider logging when a fallback compatibility match is used.

When the exact model_type lookup fails but a compatible model is found via fallback, this is a significant behavioral divergence from strict matching. Adding a debug log here would help with troubleshooting resolution issues.

📝 Suggested logging
         candidate = compatible_objs[0]
         if cls.model_supports_type(candidate.llm_name, model_type_val, candidate.llm_factory):
+            logging.debug(
+                "Model %s with stored type %s resolved via compatibility fallback for requested type %s",
+                candidate.llm_name, candidate.model_type, model_type_val
+            )
             return candidate
         return None

As per coding guidelines: **/*.py: Add logging for new flows.

api/db/joint_services/tenant_model_service.py (1)

24-26: Consider extracting _normalize_model_type to avoid duplication.

This helper is identical to TenantLLMService._normalize_model_type in api/db/services/tenant_llm_service.py (lines 47-49). Consider extracting it to a shared utility module (e.g., api/utils/model_utils.py) to maintain DRY principles.
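A minimal sketch of the suggested extraction (the module path follows the suggestion; the normalization body is illustrative, since the original helper isn't shown in this thread):

```python
# api/utils/model_utils.py (new shared module per the suggestion).
# The body below is illustrative; the real logic should be moved
# verbatim from TenantLLMService._normalize_model_type.
def normalize_model_type(model_type: str | None) -> str | None:
    if not model_type:
        return model_type
    # e.g. reduce a composite "image2text,chat" annotation to its base type
    return model_type.split(",")[0].strip().lower()
```

Both call sites would then import normalize_model_type from the shared module instead of keeping private copies.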

Actionable comment:

In api/apps/restful_apis/memory_api.py (around lines 35-37): the call to ensure_tenant_model_id_for_params(..., strict=True) can raise ArgumentException, but it currently executes before the try/except that handles ArgumentException. Move the call into the existing try block that begins after req is parsed (or wrap just that call in a small try/except) so the exception is caught by the same logic that returns get_error_argument_result().
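A sketch of the handler shape after the move (the route decorator and success path are omitted, and everything not named in the comment above is an assumption):

```python
# Illustrative handler shape only; decorators and the surrounding module
# are omitted, and the success path is elided.
def create_memory():
    req = get_request_json()
    try:
        # Moved inside the try so a strict-resolution failure is handled
        # by the same path as other argument errors.
        ensure_tenant_model_id_for_params(current_user.id, req, strict=True)
        if not req.get("tenant_llm_id"):
            raise ArgumentException("tenant_llm_id could not be resolved")
        ...  # create and persist the memory
    except ArgumentException as e:
        return get_error_argument_result(str(e))
```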


📥 Commits

Reviewing files that changed from the base of the PR, between commits 6712b50 and 862c06f.

📒 Files selected for processing (6)
  • api/apps/restful_apis/memory_api.py
  • api/db/joint_services/memory_message_service.py
  • api/db/joint_services/tenant_model_service.py
  • api/db/services/dialog_service.py
  • api/db/services/tenant_llm_service.py
  • api/utils/tenant_utils.py

spider-yamet (Author) commented:

@yingfeng Could you please assign a reviewer for this PR?

6ba3i self-assigned this on Apr 20, 2026

6ba3i (Contributor) commented Apr 20, 2026

Thanks for working on this @spider-yamet! I agree that the PR fixes the regression in #14206.

That said, I think this specific bug can be fixed in a smaller and more targeted way.

The failure starts during memory creation, in ensure_tenant_model_id_for_params(...). For llm_id, we resolve the tenant model as LLMType.CHAT. That lookup is strict, so it misses multimodal Gemini rows that are stored as image2text. Once that happens, memory creation persists an invalid tenant_llm_id, and the later memory extraction path fails because it no longer has a valid tenant model id to use.

Because the bad state is introduced there, I think the narrowest fix is to handle it there:

  • keep the current typed lookup first
  • only for LLMType.CHAT, if that lookup misses, retry once without model_type
  • if the untyped lookup finds the existing tenant row, persist that id and leave the rest of the flow unchanged

That solves the issue at the exact failure point, with a very small patch. It also avoids broadening shared lookup behavior in other layers, and it leaves existing image2text / PDF parsing behavior alone.

Suggested change:

diff --git a/api/utils/tenant_utils.py b/api/utils/tenant_utils.py
index 83da91f1c..2d1baf66a 100644
--- a/api/utils/tenant_utils.py
+++ b/api/utils/tenant_utils.py
@@ -30,6 +30,8 @@ def ensure_tenant_model_id_for_params(tenant_id: str, param_dict: dict) -> dict:
         if param_dict.get(key) and not param_dict.get(f"tenant_{key}"):
             model_type = _KEY_TO_MODEL_TYPE.get(key)
             tenant_model = TenantLLMService.get_api_key(tenant_id, param_dict[key], model_type)
+            if not tenant_model and model_type == LLMType.CHAT:
+                tenant_model = TenantLLMService.get_api_key(tenant_id, param_dict[key])
             if tenant_model:
                 param_dict.update({f"tenant_{key}": tenant_model.id})
             else:

I think this is a better fit for #14206 because it fixes the regression where it starts, with the fewest line changes possible.

yingfeng added the ci (Continuous Integration) label on Apr 20, 2026
yingfeng marked this pull request as draft on April 20, 2026 05:33
yingfeng marked this pull request as ready for review on April 20, 2026 05:33
codecov (Bot) commented Apr 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.66%. Comparing base (6712b50) to head (0c9ab77).
⚠️ Report is 11 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #14209      +/-   ##
==========================================
- Coverage   98.11%   96.66%   -1.45%     
==========================================
  Files          10       10              
  Lines         690      690              
  Branches      108      108              
==========================================
- Hits          677      667      -10     
- Misses          4        8       +4     
- Partials        9       15       +6     


dosubot (Bot) added the size:S label (This PR changes 10-29 lines, ignoring generated files) and removed the size:L label on Apr 20, 2026
spider-yamet (Author) commented Apr 20, 2026

@6ba3i Thanks for your detailed feedback. I updated the fix to match your suggestion and kept it as narrow as possible. Instead of changing the shared model resolution behavior, I only handle it at memory creation time: we still try the typed CHAT lookup first, and if that misses, we retry once without model_type just for llm_id, so the existing multimodal tenant row can still be reused.

I also fixed the error handling in memory_api.create_memory(), so if strict model resolution fails, it now goes through the existing argument-error path instead of surfacing as a 500. Other broader compatibility changes were removed so the patch stays focused on the exact regression point you called out.

I would appreciate your feedback now. :)

6ba3i (Contributor) left a review comment:

I think your addition of the strict rule might have broken the CI, @spider-yamet, especially since the failures only occur in the SDK memory tests. Do you mind removing it or fixing the CI errors? Thanks!

spider-yamet (Author) commented:

@6ba3i Could you please check again? The PR is now passing CI. :)

spider-yamet requested a review from 6ba3i on April 20, 2026 08:05
6ba3i (Contributor) left a review comment:

LGTM!

spider-yamet (Author) commented:

@yingfeng Could you please share your opinion?

yingfeng merged commit 78c3583 into infiniflow:main on Apr 20, 2026
2 checks passed

Labels

🐖 api (the modified files are located under directory 'api/apps/sdk'), 🐞 bug (something isn't working; pull request that fixes a bug), ci (Continuous Integration), size:S (This PR changes 10-29 lines, ignoring generated files)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Python SDK - Memory Message Processing Fails Due to Incorrect model_type in llm_factories.json

3 participants