feat(embedding): add multimodal support to Gemini backend #380

Closed
murasame-desu-ai wants to merge 6 commits into NevaMind-AI:main from murasame-desu-ai:feat/gemini-multimodal-embedding

Conversation

@murasame-desu-ai

Summary

  • Extend EmbeddingBackend.build_embedding_payload() input type from list[str] to list[str | dict] for multimodal support
  • Add _build_parts() helper in Gemini backend to convert text, image, or text+image dict inputs into Gemini content parts format
  • Support bytes (auto base64-encoded) and str (pre-encoded) image inputs with configurable mime_type
  • Add 9 unit tests covering text-only, image-only, multimodal, mixed inputs, and response parsing
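The conversion described in the summary can be sketched roughly as follows. This is a minimal illustration, not the PR's actual `_build_parts()` code: the function name `build_parts`, the `image/png` default, and the error on empty dicts are assumptions made for the example. The `inline_data` part shape with `mime_type` and `data` follows the content parts format of the Gemini REST API.

```python
import base64
from typing import Any, Union


def build_parts(
    item: Union[str, dict],
    default_mime_type: str = "image/png",  # assumed default, not taken from the PR
) -> list:
    """Convert one embedding input into a list of Gemini-style content parts."""
    # Plain strings stay text-only, which keeps existing callers working.
    if isinstance(item, str):
        return [{"text": item}]

    parts: list = []

    text = item.get("text")
    if text is not None:
        parts.append({"text": text})

    image: Any = item.get("image")
    if image is not None:
        if isinstance(image, (bytes, bytearray)):
            # Raw bytes are base64-encoded automatically.
            data = base64.b64encode(image).decode("ascii")
        else:
            # A str is assumed to be base64-encoded already and is passed through.
            data = image
        mime_type = item.get("mime_type", default_mime_type)
        parts.append({"inline_data": {"mime_type": mime_type, "data": data}})

    if not parts:
        # Hypothetical choice: reject dicts that carry neither text nor image.
        raise ValueError("input dict needs a 'text' or 'image' field")
    return parts
```

With this sketch, `build_parts("hello")` yields a single text part, while `build_parts({"text": "a cat", "image": raw_bytes, "mime_type": "image/jpeg"})` yields a text part followed by an inline image part.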

Test plan

  • All 9 new tests pass (pytest tests/test_gemini_embedding.py -v)
  • Verify existing embedding tests still pass (no existing tests found)
  • Manual test with real Gemini API endpoint (optional)

🤖 Generated with Claude Code

murasame-desu-ai and others added 6 commits February 10, 2026 00:00
- Add Anthropic LLM provider (Claude API support with Bearer/x-api-key auth)
- Add Gemini LLM and embedding provider (Google AI Studio API)
- Improve SQLite repository with better embedding search
- Add embed_api_key for separate embedding authentication
- Fix max_tokens handling for different providers

- Add context that this is a personal memory DB, not general knowledge
- Remove 'general knowledge questions → NO_RETRIEVE' rule that caused
  false negatives for personal terms/slang
- Add rules to RETRIEVE for unknown names, nicknames, slang
- Add 'when in doubt, RETRIEVE' principle

Fixes issue where queries like group chat nicknames or personal events
were incorrectly classified as general knowledge and skipped retrieval.

Extend GeminiEmbeddingBackend.build_embedding_payload() to accept
dict inputs with text, image (bytes or base64 str), and mime_type
fields alongside plain strings. Backward compatible with existing
text-only callers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
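To illustrate the backward-compatibility claim in the last commit message, here is a hedged sketch of a payload builder that accepts mixed `str` and `dict` inputs. The request envelope (`requests`, `model`, `content.parts`) is modeled on the Gemini batch embedding request shape and is an assumption about this backend; the model name is a placeholder, and image handling is reduced to pre-encoded strings to keep the example short.

```python
from typing import Union

# Placeholder model name, purely illustrative.
EXAMPLE_MODEL = "models/example-embedding-model"


def build_embedding_payload(inputs: list, model: str = EXAMPLE_MODEL) -> dict:
    """Build one batch request from a mix of plain strings and dict inputs."""
    requests = []
    for item in inputs:
        # Normalise plain strings to the dict form so both paths share code.
        entry: Union[dict, None] = {"text": item} if isinstance(item, str) else item

        parts = []
        if entry.get("text") is not None:
            parts.append({"text": entry["text"]})
        if entry.get("image") is not None:
            # Only pre-encoded base64 strings are handled in this sketch.
            parts.append(
                {
                    "inline_data": {
                        "mime_type": entry.get("mime_type", "image/png"),
                        "data": entry["image"],
                    }
                }
            )
        requests.append({"model": model, "content": {"parts": parts}})
    return {"requests": requests}


# A text-only caller gets the same request whether it passes a str or a dict.
payload = build_embedding_payload(["hello", {"text": "hello"}])
```

Because strings are normalised before any part is built, text-only callers do not need to change how they call the method.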