Prerequisites
What are you trying to do that currently feels hard or impossible?
I am building AI agent workflows that require searching, clustering, and operating over a large collection of images using semantic similarity. Currently, genai-toolbox supports semantic search and vector operations for text and tabular data, but lacks any image ingestion or image-to-vector pipeline. It is difficult or impossible to enable AI agents to retrieve or analyze images based on content (visual similarity, embedding queries, multimodal prompts, etc.) in a unified toolbox setup.
Suggested Solution(s)
Introduce a new database source and/or toolset for image ingestion and vectorization.
- Allow users to add (or point to) a directory/bucket of images
- Use state-of-the-art embedding models or vision APIs (e.g., CLIP via Hugging Face, Google Vision) to extract vector embeddings
- Store embeddings in supported vector DBs (BigQuery, Neo4j, Elasticsearch, Cloud SQL, etc.)
- Provide query, retrieval, and filtering tools for image-based semantic search
- Leverage existing patterns from text/vector tool implementations for seamless integration
- Expose standard toolbox/agent APIs for multimodal queries (text + image)
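To make the proposed pipeline concrete, here is a minimal sketch of the ingest-and-embed step. All names (`embed_image`, `ingest_directory`) are hypothetical, and the byte-histogram embedder is a trivial stand-in for a real CLIP/vision-API call, used only so the shape of the pipeline is visible:

```python
import math
from pathlib import Path

def embed_image(data: bytes, dim: int = 64) -> list[float]:
    """Stand-in embedder: an L2-normalized byte histogram.
    A real implementation would call CLIP or a vision API here."""
    hist = [0.0] * dim
    for b in data:
        hist[b % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]

def ingest_directory(root: Path) -> dict[str, list[float]]:
    """Walk a directory of images and build an id -> embedding index.
    In the proposed feature this would instead INSERT rows into one of
    the supported vector DBs (BigQuery, Cloud SQL, etc.)."""
    index: dict[str, list[float]] = {}
    for p in root.rglob("*"):
        if p.suffix.lower() in {".png", ".jpg", ".jpeg", ".webp"}:
            index[str(p)] = embed_image(p.read_bytes())
    return index
```

The point is the interface, not the embedder: a source points at a directory/bucket, each image becomes a unit-length vector, and the vectors land in an existing vector-DB source so the current query tools apply unchanged.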
Alternatives Considered
Workarounds involve building a parallel image search stack outside of genai-toolbox (e.g., using standalone FAISS, Milvus, or cloud vision vector DBs), which leads to fragmented agent workflows, duplicate effort, and poor integration with toolbox tools, query language, and agent APIs.
Additional Details
Related PRs and issues: #2415 (Support all vector databases), PR #2909 (Cloud SQL Postgres vector tools), PR #2890 (BigQuery semantic search).
This feature would unlock:
- Visual similarity search for AI agents
- Multimodal (text+image) analytics
- Competitive position vs other open source agent/AI stacks
- Unified dev experience for vision + text + data AI tasks
Happy to send a PR 😀