fix: add doc_type to Weaviate properties and default Vector attributes #33398
RickDamon wants to merge 2 commits into langgenius:main
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses a critical issue where multimodal image retrieval failed on Docker deployments utilizing Weaviate as the vector database. The problem stemmed from Weaviate's search methods not returning the
Code Review
This pull request effectively resolves a bug where doc_type was missing from Weaviate search results, causing failures in multimodal image retrieval. The changes are well-implemented by updating the default vector attributes and the Weaviate collection schema to include doc_type. The addition of comprehensive unit tests is a great way to ensure the fix is robust and prevents future regressions. I have one minor suggestion to improve the new test code's clarity.
api/tests/unit_tests/core/rag/datasource/vdb/weaviate/test_weaviate_vector.py
Fixes langgenius#33388.

Weaviate's search_by_vector only returns properties listed in return_properties. The doc_type field was missing from both the default attributes list and Weaviate's schema definition, causing multimodal image retrieval to fail on Docker deployments (which use Weaviate by default).

Changes:
- Add doc_type to Vector class default attributes list
- Add doc_type property in Weaviate _create_collection schema
- Add doc_type check in Weaviate _ensure_properties for existing collections
- Add unit tests for doc_type handling in Weaviate vector operations
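The failure mode described in this commit message can be illustrated in plain Python, without a Weaviate client. This is only a sketch under stated assumptions: the in-memory store, the `search` and `route` helpers, the sample IDs, and the `"image"` value for doc_type are hypothetical stand-ins, not the project's actual code. It shows how leaving doc_type out of the requested properties makes every result look like a non-image document.

```python
from typing import Any

# Hypothetical in-memory stand-in for a stored collection (not the real client).
STORED_OBJECTS: list[dict[str, Any]] = [
    {"text": "a cat photo", "doc_id": "img-1", "doc_type": "image"},
    {"text": "intro paragraph", "doc_id": "seg-1", "doc_type": "text"},
]


def search(return_properties: list[str]) -> list[dict[str, Any]]:
    """Return only the requested properties of each stored object."""
    return [
        {key: obj[key] for key in return_properties if key in obj}
        for obj in STORED_OBJECTS
    ]


def route(results: list[dict[str, Any]]) -> tuple[list[str], list[str]]:
    """Split results into image IDs and segment IDs based on doc_type.

    The "image" literal is an assumption made for this sketch.
    """
    image_doc_ids: list[str] = []
    index_node_ids: list[str] = []
    for meta in results:
        # A missing doc_type yields None, so the result falls through to segments.
        if meta.get("doc_type") == "image":
            image_doc_ids.append(meta["doc_id"])
        else:
            index_node_ids.append(meta["doc_id"])
    return image_doc_ids, index_node_ids


# Without doc_type in the requested properties, the image is misrouted.
before = route(search(["text", "doc_id"]))
# With doc_type requested, the image lands in image_doc_ids.
after = route(search(["text", "doc_id", "doc_type"]))
```

In this toy model, `before` places both IDs in the segment list, while `after` separates the image from the text segment, which mirrors the behaviour change the commit aims for.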
154b7fe to 59961a3
Pyrefly Diff
base → PR
--- /tmp/pyrefly_base.txt 2026-03-13 12:33:05.424910580 +0000
+++ /tmp/pyrefly_pr.txt 2026-03-13 12:32:56.247946792 +0000
@@ -385,7 +385,7 @@
ERROR Object of class `list` has no attribute `fields` [missing-attribute]
--> core/rag/datasource/vdb/vikingdb/vikingdb_vector.py:143:55
ERROR Class member `WeaviateVector._get_uuids` overrides parent class `BaseVector` in an inconsistent manner [bad-param-name-override]
- --> core/rag/datasource/vdb/weaviate/weaviate_vector.py:237:9
+ --> core/rag/datasource/vdb/weaviate/weaviate_vector.py:240:9
ERROR `response` may be uninitialized [unbound-name]
--> core/rag/extractor/firecrawl/firecrawl_app.py:134:16
ERROR `response` may be uninitialized [unbound-name]
Summary
Fixes #33388
Problem
When using multimodal embedding (e.g., Tongyi multimodal-embedding-v1) to index documents containing images, image retrieval fails with empty results on Docker deployments (which use Weaviate as the default vector database).

Root Cause: Weaviate's search_by_vector and search_by_full_text methods only return properties explicitly listed in the return_properties parameter. The doc_type field was missing from:
- The default attributes list in the Vector class (vector_factory.py)
- Weaviate's schema definition (_create_collection and _ensure_properties)

This caused doc_type to be None in search results, so image documents were incorrectly routed to index_node_ids (which queries DocumentSegment by segment index node ID) instead of image_doc_ids (which queries UploadFile), leading to empty retrieval results.

Note: This issue only affects Weaviate users. Other vector databases (Qdrant, Milvus, PgVector, Chroma, Elasticsearch, etc.) return all stored metadata by default, so doc_type is preserved in their search results.

Changes
- Add "doc_type" to the Vector class default attributes list, so Weaviate includes it in return_properties during search
- Add a doc_type (TEXT type) property definition to the Weaviate _create_collection() schema for new collections
- Add a doc_type check in Weaviate _ensure_properties() to automatically add the property to existing collections

Screenshots
Not applicable (backend-only fix, no UI changes).
Checklist
- Ran make lint and make type-check (backend) and cd web && npx lint-staged (frontend) to appease the lint gods
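The migration step for existing collections listed under Changes can be sketched as an idempotent "add what is missing" pass. This is a minimal illustration, assuming a collection schema can be modelled as a dict of property name to data type. `REQUIRED_PROPERTIES`, `ensure_properties`, and every property name other than doc_type are hypothetical and do not reflect the real `_ensure_properties()` implementation or the Weaviate client API.

```python
# Hypothetical schema model: property name -> data type.
# Only doc_type (TEXT) comes from the PR description; the others are placeholders.
REQUIRED_PROPERTIES: dict[str, str] = {
    "document_id": "text",
    "doc_id": "text",
    "doc_type": "text",
    "chunk_index": "int",
}


def ensure_properties(existing: dict[str, str]) -> list[str]:
    """Add every required property missing from `existing`; return the names added."""
    added: list[str] = []
    for name, data_type in REQUIRED_PROPERTIES.items():
        if name not in existing:
            existing[name] = data_type
            added.append(name)
    return added


# A collection created before the fix lacks doc_type.
legacy_schema = {"document_id": "text", "doc_id": "text", "chunk_index": "int"}

first_run = ensure_properties(legacy_schema)   # adds the missing property
second_run = ensure_properties(legacy_schema)  # nothing left to add
```

Running the check twice is safe in this sketch: the first pass adds doc_type and the second pass finds nothing to change, which is the property one would want from a check that runs against collections of unknown age.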