feat(genkit_vertexai): support Gemini and multimodal embedders#261

Open
CorieW wants to merge 17 commits into genkit-ai:main from invertase:@invertase/vertexai-improve-embedders

CorieW (Member) commented Apr 28, 2026

Handle Vertex AI embedding models with the correct request shape for Gemini, legacy text embedding, and multimodal embedding APIs. Add fallback behavior for older Gemini models and expand tests around embedder listing, request routing, and mock HTTP responses.

Experienced Problem

Currently, the multimodal embedder can only process one document at a time, because of the following limitation:

The current embedder response shape in Genkit Dart (and possibly other Genkit implementations) does not fit multimodal batch embedding well. embedMany returns List<Embedding>, which works well when each input document produces exactly one embedding, but becomes ambiguous when a single document can produce multiple modality-specific embeddings.

For example:

  • Document 1: image + text
  • Document 2: text
  • Document 3: text

A flat result may contain four embeddings, but it does not indicate which embeddings belong to which input document or modality. A clearer response shape for multimodal batch embedding would be List<List<Embedding>>, where the outer list maps to input documents and the inner list contains that document's modality-specific embeddings.
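
As a rough illustration of the proposed shape (a sketch only: `Embedding` here is a simplified stand-in rather than Genkit's actual class, and `regroup` is a hypothetical helper), regrouping a flat result requires per-document counts that the flat list itself does not carry:

```dart
/// Simplified stand-in for an embedding; only the vector is modelled.
class Embedding {
  final List<double> embedding;
  const Embedding(this.embedding);
}

/// Regroups a flat result into one inner list per input document.
///
/// [countsPerDocument] has to come from somewhere else (for example the
/// request), because a flat List<Embedding> does not record it.
List<List<Embedding>> regroup(
  List<Embedding> flat,
  List<int> countsPerDocument,
) {
  final total = countsPerDocument.fold<int>(0, (sum, n) => sum + n);
  if (total != flat.length) {
    throw ArgumentError('Expected $total embeddings, got ${flat.length}.');
  }
  final grouped = <List<Embedding>>[];
  var offset = 0;
  for (final count in countsPerDocument) {
    grouped.add(flat.sublist(offset, offset + count));
    offset += count;
  }
  return grouped;
}
```

For the three-document example above, the counts would be `[2, 1, 1]`, which turns four flat embeddings into three inner lists.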

Solutions

  1. Allow any number of single-modality documents OR a single document with any number of modalities.
  2. Add a multimodal-specific embedding to Genkit to avoid breaking changes with non-multimodal embedders.
  3. Use the metadata property to associate embeddings with documents.

Implemented Solution

Used the metadata property to associate embeddings with documents.
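
One way such an association could look, as a minimal sketch: the `documentIndex` and `modality` metadata keys, the `TaggedEmbedding` class, and `groupByDocument` are all assumptions made for illustration, not names confirmed by this PR.

```dart
/// Hypothetical embedding that carries a free-form metadata map.
class TaggedEmbedding {
  final List<double> embedding;
  final Map<String, Object?> metadata;
  const TaggedEmbedding(this.embedding, this.metadata);
}

/// Groups a flat result by the assumed `documentIndex` metadata key, so the
/// caller can recover which embeddings belong to which input document.
Map<int, List<TaggedEmbedding>> groupByDocument(List<TaggedEmbedding> flat) {
  final byDocument = <int, List<TaggedEmbedding>>{};
  for (final e in flat) {
    final index = e.metadata['documentIndex'];
    if (index is! int) {
      throw const FormatException('Embedding has no integer documentIndex.');
    }
    byDocument.putIfAbsent(index, () => <TaggedEmbedding>[]).add(e);
  }
  return byDocument;
}
```

The return type of embedMany stays a flat list in this approach, so existing single-modality callers would not need to change.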

Before these changes, the embedding models behaved as follows:

  • Model Parity
    • text-embedding-005 - “No input variables specified for this action”.
    • text-multilingual-embedding-002 - “No input variables specified for this action”.
    • multimodalembedding@001 - Doesn't work.
    • gemini-embedding-001 - “No input variables specified for this action”.
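
The request routing mentioned in the summary could key off the model names listed above. The following is only a sketch: the `EmbedderKind` enum, the `classifyEmbedder` function, and the prefix rules are assumptions, and the PR's actual routing (including its fallback for older Gemini models) may differ.

```dart
/// The three request shapes named in the PR summary.
enum EmbedderKind { gemini, legacyText, multimodal }

/// Picks a request shape from the model name using assumed prefix rules.
EmbedderKind classifyEmbedder(String modelName) {
  if (modelName.startsWith('gemini-embedding')) {
    return EmbedderKind.gemini;
  }
  if (modelName.startsWith('multimodalembedding')) {
    return EmbedderKind.multimodal;
  }
  // Everything else is treated as a legacy text embedding model here.
  return EmbedderKind.legacyText;
}
```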

Testing

  • Embedders registered (screenshot)
  • Gemini embedder (screenshot)
  • Text embedder (screenshot)
  • Multimodal embedder (screenshot)

gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request refactors and expands Vertex AI embedder support by introducing a dedicated embedders.dart module. It adds support for Gemini embedding models (including batch requests), multimodal embeddings, and legacy text embedding models, each with specific request/response handling. A review comment identifies that for text embedding models using the predict endpoint, the task type field should be moved from the parameters object to each instance object and named task_type (snake_case) for REST API compatibility.
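
To make the review point concrete, here is a hedged sketch of a predict request body with the task type on each instance. `buildTextEmbeddingRequest` is a hypothetical helper, not the function in embedders.dart, and `autoTruncate` is shown only as an example of a request-wide parameter.

```dart
/// Builds a predict request body for a legacy text embedding model.
///
/// `task_type` sits on each instance (snake_case), while `parameters`
/// holds request-wide options only.
Map<String, Object?> buildTextEmbeddingRequest(
  List<String> texts, {
  String? taskType,
  bool autoTruncate = true,
}) {
  return {
    'instances': [
      for (final text in texts)
        {
          'content': text,
          if (taskType != null) 'task_type': taskType,
        },
    ],
    'parameters': {'autoTruncate': autoTruncate},
  };
}
```

Calling it with `taskType: 'RETRIEVAL_DOCUMENT'` yields instances of the form `{content: ..., task_type: RETRIEVAL_DOCUMENT}` and leaves `parameters` free of any task type key.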

6 comment threads on packages/genkit_vertexai/lib/src/embedders.dart (3 marked Outdated)
cabljac (Collaborator) commented May 11, 2026

The multimodal media-input logic (embedders.dart:323-434) has 7 throw sites, but only 2 are covered by tests. Could you add coverage for the following cases?

  • Data-URI image input → asserts bytesBase64Encoded + mimeType in the request body.
  • gs:// image input → asserts gcsUri + mimeType in the request body.
  • Mixed text + media in one document → throws (constraint called out in the PR description, currently unpinned).
  • Multiple media parts in one document → throws.
  • Unsupported MIME type (e.g. audio/wav) → throws.
  • Missing MIME (no contentType, non-data: URL) → throws.
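
For reference while writing those tests, the behaviour listed above might be sketched roughly as follows. This is not the code in embedders.dart: `buildMediaPart`, `supportedMimePrefixes`, and the exact error types are assumptions, and only the single-media-part cases are modelled.

```dart
import 'dart:convert';

/// MIME prefixes accepted by this sketch (an assumption, not the PR's list).
const supportedMimePrefixes = ['image/', 'video/'];

/// Converts one media URL into a request fragment, or throws on bad input.
Map<String, String> buildMediaPart(String url, {String? contentType}) {
  final uri = Uri.tryParse(url);
  if (uri == null) {
    throw ArgumentError('Invalid media URL: $url');
  }
  final data = uri.data; // Non-null only for data: URIs.
  final mimeType = contentType ?? data?.mimeType;
  if (mimeType == null) {
    throw ArgumentError('Missing MIME type for $url');
  }
  if (!supportedMimePrefixes.any((prefix) => mimeType.startsWith(prefix))) {
    throw ArgumentError('Unsupported MIME type: $mimeType');
  }
  if (data != null) {
    // Inline data is sent as base64 together with its MIME type.
    return {
      'bytesBase64Encoded': base64Encode(data.contentAsBytes()),
      'mimeType': mimeType,
    };
  }
  if (uri.scheme == 'gs') {
    // Cloud Storage objects are passed by reference.
    return {'gcsUri': url, 'mimeType': mimeType};
  }
  throw ArgumentError('Unsupported media URL scheme: ${uri.scheme}');
}
```

Each bullet then maps to one call: a data URI or gs:// URL should return the corresponding map, while an audio/wav content type or a plain URL without contentType should throw.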

cabljac (Collaborator) commented May 11, 2026

/gemini review

gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces comprehensive support for Vertex AI embedders, including batch embedding and multimodal capabilities for text, image, and video. It refactors the Vertex AI plugin to use a dedicated embedder module and adds extensive unit tests. Feedback highlights opportunities to improve code robustness by using defensive parsing for API responses and safer URL parsing with Uri.tryParse to avoid potential runtime exceptions.
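
A minimal sketch of what defensive response parsing could mean here, assuming a decoded text-embedding predict response shaped like `{"predictions": [{"embeddings": {"values": [...]}}]}`. `parsePredictions` is a hypothetical name, and the choice of FormatException is an assumption rather than the PR's behaviour.

```dart
/// Extracts embedding vectors from a decoded predict response.
///
/// Every level is type-checked, so an unexpected payload produces a
/// descriptive FormatException rather than a cast error at runtime.
List<List<double>> parsePredictions(Object? decoded) {
  if (decoded is! Map) {
    throw const FormatException('Response is not a JSON object.');
  }
  final predictions = decoded['predictions'];
  if (predictions is! List) {
    throw const FormatException('Response has no "predictions" list.');
  }
  final vectors = <List<double>>[];
  for (final Object? prediction in predictions) {
    final embeddings = prediction is Map ? prediction['embeddings'] : null;
    final values = embeddings is Map ? embeddings['values'] : null;
    if (values is! List) {
      throw const FormatException('Prediction has no "embeddings.values".');
    }
    final vector = <double>[];
    for (final Object? value in values) {
      if (value is! num) {
        throw const FormatException('Embedding value is not a number.');
      }
      vector.add(value.toDouble());
    }
    vectors.add(vector);
  }
  return vectors;
}
```

The same idea applies to URLs: `Uri.tryParse` returns null for malformed input, where `Uri.parse` would throw a FormatException.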

3 comment threads on packages/genkit_vertexai/lib/src/embedders.dart (all marked Outdated)
CorieW marked this pull request as ready for review May 12, 2026 15:14
CorieW requested a review from pavelgj May 12, 2026 15:14