Fix memory resolution regression for multimodal Gemini models #14209

Merged: yingfeng merged 4 commits into infiniflow:main from spider-yamet:fix/memory-message-fail on Apr 20, 2026

Conversation

spider-yamet (Contributor) commented Apr 18, 2026

What problem does this PR solve?

Fixes #14206.

This issue is a regression. PR #9520 previously changed Gemini models from image2text to chat to fix chat-side resolution, but PR #13073 later restored those Gemini entries to image2text during model-list updates, which reintroduced the bug.

The underlying problem is that Gemini models are multimodal and advertise both CHAT and IMAGE2TEXT, while tenant model resolution still depends on a single stored model_type. That makes chat-only flows such as memory extraction fragile when a compatible model is stored as image2text.

This PR fixes the issue at the model resolution layer instead of changing llm_factories.json again:

  • keep the stored tenant model type unchanged
  • try exact model_type lookup first
  • if no exact match is found, fall back only when the model metadata shows the requested capability is supported
  • coerce the runtime config to the requested type for chat callers
  • fail fast in memory creation instead of silently persisting tenant_llm_id=0

This preserves existing multimodal and image2text behavior while restoring chat compatibility for memory-related flows.
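A minimal sketch of the resolution flow described above, with illustrative helper names (query_tenant_models and model_supports_type are stand-ins, not the project's actual service API):

```python
# Minimal sketch of the capability-gated fallback described above.
# query_tenant_models and model_supports_type are illustrative stand-ins.
def resolve_tenant_model(tenant_id, llm_name, requested_type):
    # 1. Exact model_type lookup first; stored tenant data stays unchanged.
    exact = query_tenant_models(tenant_id, llm_name, model_type=requested_type)
    if exact:
        return exact[0]
    # 2. Fall back only when the model metadata advertises the requested
    #    capability (e.g. a Gemini row stored as image2text that also
    #    supports CHAT).
    for candidate in query_tenant_models(tenant_id, llm_name, model_type=None):
        if model_supports_type(candidate.llm_name, requested_type, candidate.llm_factory):
            # 3. Coerce the runtime config to the requested type so chat
            #    callers receive a chat-shaped config.
            candidate.model_type = requested_type
            return candidate
    # 4. No compatible model: return None so callers can fail fast
    #    instead of silently persisting tenant_llm_id=0.
    return None
```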

Type of change

  • Bug Fix (non-breaking change which fixes an issue)

Testing

  • Re-checked the current memory creation and memory message extraction paths against the updated resolution logic
  • Verified locally that a Gemini-style tenant model stored as image2text but tagged with CHAT can still be resolved for chat
  • Verified get_model_config_by_type_and_name(..., CHAT, ...) returns a chat-compatible runtime config
  • Verified get_model_config_by_id(..., CHAT) also returns a chat-compatible runtime config
  • Verified strict resolution still fails when the model metadata does not advertise chat capability
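As a rough illustration, the last three checks could be expressed as assertions along these lines (the model name, the dict-like config shape, and the import path are assumptions; only the function names and argument order come from the bullets above):

```python
# Illustrative checks only: "gemini-1.5-pro", the dict-like config shape,
# and the LLMType import path are assumptions.
from api.db import LLMType  # assumed location of the LLMType enum

def verify_chat_resolution(tenant_id: str, model_id: str) -> None:
    cfg = get_model_config_by_type_and_name(tenant_id, LLMType.CHAT, "gemini-1.5-pro")
    assert cfg["model_type"] == LLMType.CHAT  # coerced for chat callers

    cfg_by_id = get_model_config_by_id(model_id, LLMType.CHAT)
    assert cfg_by_id["model_type"] == LLMType.CHAT
```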

dosubot (Bot) added the size:L (This PR changes 100-499 lines, ignoring generated files), 🐖 api (the modified files are located under directory 'api/apps/sdk'), and 🐞 bug (something isn't working; pull request that fixes a bug) labels on Apr 18, 2026
coderabbitai (Bot) commented Apr 18, 2026

📝 Walkthrough

Tenant-model lookup gained a keyword-only strict flag and a CHAT-type fallback lookup without model_type filtering; when strict=True, unresolved models now raise an ArgumentException. The memory creation handler now invokes the resolver inside its existing try block and explicitly errors if tenant_llm_id is missing.

Changes

  • Memory API Handler (api/apps/restful_apis/memory_api.py): moved the ensure_tenant_model_id_for_params(..., strict=True) call into the existing try block and added an explicit check that raises an ArgumentException when tenant_llm_id is absent.
  • Tenant Model Resolution (api/utils/tenant_utils.py): added a keyword-only strict: bool = False parameter to ensure_tenant_model_id_for_params; for CHAT keys, a fallback lookup without model_type is attempted when the initial lookup fails. If strict=True and the model is still unresolved, an ArgumentException is raised; otherwise tenant_{key} defaults to 0.
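A simplified sketch of the merged resolver shape, assuming placeholder names where the summary does not spell them out (the key table iteration and the error message are illustrative; the real function lives in api/utils/tenant_utils.py):

```python
# Simplified sketch of the behavior summarized above; the iteration over
# _KEY_TO_MODEL_TYPE and the error message text are placeholders.
def ensure_tenant_model_id_for_params(tenant_id: str, param_dict: dict, *, strict: bool = False) -> dict:
    for key, model_type in _KEY_TO_MODEL_TYPE.items():
        if not param_dict.get(key) or param_dict.get(f"tenant_{key}"):
            continue
        tenant_model = TenantLLMService.get_api_key(tenant_id, param_dict[key], model_type)
        # CHAT-only fallback: retry once without a model_type filter so a
        # multimodal row stored as image2text can still be reused.
        if not tenant_model and model_type == LLMType.CHAT:
            tenant_model = TenantLLMService.get_api_key(tenant_id, param_dict[key])
        if tenant_model:
            param_dict[f"tenant_{key}"] = tenant_model.id
        elif strict:
            raise ArgumentException(f"cannot resolve a tenant model for {key!r}")
        else:
            param_dict[f"tenant_{key}"] = 0
    return param_dict
```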

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 I hopped through code and chased a clue,
When types hid models from my view,
I tried a fallback, soft and spry,
Now memories find their model—hooray, reply! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (4 passed)
  • Title check: the title clearly identifies the primary fix, a memory resolution regression for multimodal Gemini models, matching the core objective of issue #14206.
  • Linked Issues check: the PR directly addresses issue #14206 by implementing the recommended narrowed fix: strict-mode fallback retry logic for CHAT-only lookups in ensure_tenant_model_id_for_params, plus validation to fail fast in memory creation.
  • Out of Scope Changes check: all changes are scoped to #14206; the modifications to tenant model resolution (tenant_utils.py) and memory creation error handling (memory_api.py) follow the recommended narrowed approach without unrelated refactoring.
  • Description check: the description identifies the problem (a regression with Gemini multimodal models), provides context, lists the fix strategy, and includes testing verification.


spider-yamet changed the title from "Fix memory chat model resolution for multimodal tenant LLMs" to "Fix memory resolution regression for multimodal Gemini models" on Apr 18, 2026
coderabbitai (Bot) left a review comment:

Actionable comments posted: 1

🧹 Nitpick comments (2)
api/db/services/tenant_llm_service.py (1)

109-116: Consider logging when a fallback compatibility match is used.

When the exact model_type lookup fails but a compatible model is found via fallback, this is a significant behavioral divergence from strict matching. Adding a debug log here would help with troubleshooting resolution issues.

📝 Suggested logging
         candidate = compatible_objs[0]
         if cls.model_supports_type(candidate.llm_name, model_type_val, candidate.llm_factory):
+            logging.debug(
+                "Model %s with stored type %s resolved via compatibility fallback for requested type %s",
+                candidate.llm_name, candidate.model_type, model_type_val
+            )
             return candidate
         return None

As per coding guidelines: **/*.py: Add logging for new flows.

api/db/joint_services/tenant_model_service.py (1)

24-26: Consider extracting _normalize_model_type to avoid duplication.

This helper is identical to TenantLLMService._normalize_model_type in api/db/services/tenant_llm_service.py (lines 47-49). Consider extracting it to a shared utility module (e.g., api/utils/model_utils.py) to maintain DRY principles.
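A minimal sketch of the suggested extraction (the module path follows the suggestion; the normalization body is illustrative, since the original helper isn't shown in this thread):

```python
# api/utils/model_utils.py (new shared module per the suggestion).
# The body below is illustrative; the real logic should be moved
# verbatim from TenantLLMService._normalize_model_type.
def normalize_model_type(model_type: str | None) -> str | None:
    if not model_type:
        return model_type
    # e.g. reduce a composite "image2text,chat" annotation to its base type
    return model_type.split(",")[0].strip().lower()
```

Both call sites would then import normalize_model_type from the shared module instead of keeping private copies.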

Actionable comment:

In api/apps/restful_apis/memory_api.py (around lines 35-37): the call to ensure_tenant_model_id_for_params(..., strict=True) can raise ArgumentException, but it currently executes before the try/except that handles ArgumentException. Move the call into the existing try block that begins after req is parsed (or wrap just that call in a small try/except) so the exception is caught by the same logic that returns get_error_argument_result().
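A sketch of the handler shape after the move (the route decorator and success path are omitted, and everything not named in the comment above is an assumption):

```python
# Illustrative handler shape only; decorators and the surrounding module
# are omitted, and the success path is elided.
def create_memory():
    req = get_request_json()
    try:
        # Moved inside the try so a strict-resolution failure is handled
        # by the same path as other argument errors.
        ensure_tenant_model_id_for_params(current_user.id, req, strict=True)
        if not req.get("tenant_llm_id"):
            raise ArgumentException("tenant_llm_id could not be resolved")
        ...  # create and persist the memory
    except ArgumentException as e:
        return get_error_argument_result(str(e))
```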


📥 Commits

Reviewing files that changed from the base of the PR, between commits 6712b50 and 862c06f.

📒 Files selected for processing (6)
  • api/apps/restful_apis/memory_api.py
  • api/db/joint_services/memory_message_service.py
  • api/db/joint_services/tenant_model_service.py
  • api/db/services/dialog_service.py
  • api/db/services/tenant_llm_service.py
  • api/utils/tenant_utils.py

spider-yamet (Author) commented:

@yingfeng Could you please assign a reviewer for this PR?

6ba3i self-assigned this on Apr 20, 2026

6ba3i (Contributor) commented Apr 20, 2026

Thanks for working on this @spider-yamet! I agree that the PR fixes the regression in #14206.

That said, I think this specific bug can be fixed in a smaller and more targeted way.

The failure starts during memory creation, in ensure_tenant_model_id_for_params(...). For llm_id, we resolve the tenant model as LLMType.CHAT. That lookup is strict, so it misses multimodal Gemini rows that are stored as image2text. Once that happens, memory creation persists an invalid tenant_llm_id, and the later memory extraction path fails because it no longer has a valid tenant model id to use.

Because the bad state is introduced there, I think the narrowest fix is to handle it there:

  • keep the current typed lookup first
  • only for LLMType.CHAT, if that lookup misses, retry once without model_type
  • if the untyped lookup finds the existing tenant row, persist that id and leave the rest of the flow unchanged

That solves the issue at the exact failure point, with a very small patch. It also avoids broadening shared lookup behavior in other layers, and it leaves existing image2text / PDF parsing behavior alone.

Suggested change:

diff --git a/api/utils/tenant_utils.py b/api/utils/tenant_utils.py
index 83da91f1c..2d1baf66a 100644
--- a/api/utils/tenant_utils.py
+++ b/api/utils/tenant_utils.py
@@ -30,6 +30,8 @@ def ensure_tenant_model_id_for_params(tenant_id: str, param_dict: dict) -> dict:
         if param_dict.get(key) and not param_dict.get(f"tenant_{key}"):
             model_type = _KEY_TO_MODEL_TYPE.get(key)
             tenant_model = TenantLLMService.get_api_key(tenant_id, param_dict[key], model_type)
+            if not tenant_model and model_type == LLMType.CHAT:
+                tenant_model = TenantLLMService.get_api_key(tenant_id, param_dict[key])
             if tenant_model:
                 param_dict.update({f"tenant_{key}": tenant_model.id})
             else:

I think this is a better fit for #14206 because it fixes the regression where it starts, with the fewest line changes possible.

yingfeng added the ci (Continuous Integration) label on Apr 20, 2026
yingfeng marked this pull request as draft on April 20, 2026 05:33
yingfeng marked this pull request as ready for review on April 20, 2026 05:33
codecov (Bot) commented Apr 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.66%. Comparing base (6712b50) to head (0c9ab77).
⚠️ Report is 11 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #14209      +/-   ##
==========================================
- Coverage   98.11%   96.66%   -1.45%     
==========================================
  Files          10       10              
  Lines         690      690              
  Branches      108      108              
==========================================
- Hits          677      667      -10     
- Misses          4        8       +4     
- Partials        9       15       +6     


dosubot (Bot) added the size:S label (This PR changes 10-29 lines, ignoring generated files) and removed the size:L label on Apr 20, 2026
spider-yamet (Author) commented Apr 20, 2026

@6ba3i Thanks for your detailed feedback. I updated the fix to match your suggestion and kept it as narrow as possible. Instead of changing the shared model resolution behavior, I only handle it at memory creation time: we still try the typed CHAT lookup first, and if that misses, we retry once without model_type just for llm_id, so the existing multimodal tenant row can still be reused.

I also fixed the error handling in memory_api.create_memory(), so if strict model resolution fails, it now goes through the existing argument-error path instead of surfacing as a 500. Other broader compatibility changes were removed so the patch stays focused on the exact regression point you called out.

I would appreciate your feedback now. :)

6ba3i (Contributor) left a review comment:

I think your addition of the strict rule might have broken the CI, @spider-yamet, especially since the failures only occur in the SDK memory tests. Do you mind removing it or fixing the CI errors? Thanks!

spider-yamet (Author) commented:

@6ba3i Could you please check again? The PR is now passing CI. :)

spider-yamet requested a review from 6ba3i on April 20, 2026 08:05
6ba3i (Contributor) left a review comment:

LGTM!

spider-yamet (Author) commented:

@yingfeng Could you please share your opinion?

yingfeng merged commit 78c3583 into infiniflow:main on Apr 20, 2026
2 checks passed

Labels

🐖 api (the modified files are located under directory 'api/apps/sdk'), 🐞 bug (something isn't working; pull request that fixes a bug), ci (Continuous Integration), size:S (This PR changes 10-29 lines, ignoring generated files)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Python SDK - Memory Message Processing Fails Due to Incorrect model_type in llm_factories.json

3 participants