Prerequisites
What are you trying to do that currently feels hard or impossible?
I am building AI agent workflows that require searching, clustering, and operating over a large collection of images using semantic similarity. Currently, genai-toolbox supports semantic search and vector operations for text and tabular data, but lacks any image ingestion or image-to-vector pipeline. It is difficult or impossible to enable AI agents to retrieve or analyze images based on content (visual similarity, embedding queries, multimodal prompts, etc.) in a unified toolbox setup.
Suggested Solution(s)
Introduce a new database source and/or toolset for image ingestion and vectorization.
- Allow users to add (or point to) a directory/bucket of images
- Use state-of-the-art embedding models or vision APIs (e.g., CLIP via Hugging Face, Google Vision) to extract vector embeddings
- Store embeddings in supported vector DBs (BigQuery, Neo4j, Elasticsearch, Cloud SQL, etc.)
- Provide query, retrieval, and filtering tools for image-based semantic search
- Leverage existing patterns from text/vector tool implementations for seamless integration
- Expose standard toolbox/agent APIs for multimodal queries (text + image)
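To make the proposed pipeline concrete, here is a minimal sketch of the ingest-and-embed step. All names (`embed_image`, `ingest_directory`) are hypothetical, and the byte-histogram embedder is a trivial stand-in for a real CLIP/vision-API call, used only so the shape of the pipeline is visible:

```python
import math
from pathlib import Path

def embed_image(data: bytes, dim: int = 64) -> list[float]:
    """Stand-in embedder: an L2-normalized byte histogram.
    A real implementation would call CLIP or a vision API here."""
    hist = [0.0] * dim
    for b in data:
        hist[b % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]

def ingest_directory(root: Path) -> dict[str, list[float]]:
    """Walk a directory of images and build an id -> embedding index.
    In the proposed feature this would instead INSERT rows into one of
    the supported vector DBs (BigQuery, Cloud SQL, etc.)."""
    index: dict[str, list[float]] = {}
    for p in root.rglob("*"):
        if p.suffix.lower() in {".png", ".jpg", ".jpeg", ".webp"}:
            index[str(p)] = embed_image(p.read_bytes())
    return index
```

The point is the interface, not the embedder: a source points at a directory/bucket, each image becomes a unit-length vector, and the vectors land in an existing vector-DB source so the current query tools apply unchanged.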
Alternatives Considered
Workarounds involve building a parallel image search stack outside of genai-toolbox (e.g., using standalone FAISS, Milvus, or cloud vision vector DBs), which leads to fragmented agent workflows, duplicate effort, and poor integration with toolbox tools, query language, and agent APIs.
Additional Details
Related PRs and issues: #2415 (Support all vector databases), PR #2909 (Cloud SQL Postgres vector tools), PR #2890 (BigQuery semantic search).
This feature would unlock:
- Visual similarity search for AI agents
- Multimodal (text+image) analytics
- Competitive position vs other open source agent/AI stacks
- Unified dev experience for vision + text + data AI tasks
Happy to send a PR 😀