Skip to content

Fix(embeddings): Wrap each separate input in a Content+Part to fix batching#4873

Open
yorickvP wants to merge 1 commit intopydantic:mainfrom
datakami:embed-proper-contentlistunion
Open

Fix(embeddings): Wrap each separate input in a Content+Part to fix batching#4873
yorickvP wants to merge 1 commit intopydantic:mainfrom
datakami:embed-proper-contentlistunion

Conversation

@yorickvP
Copy link
Copy Markdown

@yorickvP yorickvP commented Mar 27, 2026

gemini-embedding-2-preview was interpreting an array as a single multi-part embedding request, causing only one embedding to be returned.

This commit wraps each separate input in a Content+Part object to fix that issue.

Pre-Review Checklist

  • Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it.
  • No breaking changes in accordance with the version policy.
  • Linting and type checking pass per make format and make typecheck.
  • PR title is fit for the release changelog.

Pre-Merge Checklist

  • New tests for any fix or new behavior, maintaining 100% coverage.
  • Updated documentation for new features and behaviors, including docstrings for API docs.

gemini-embedding-2-preview was interpreting an array as a single
multi-part embedding request, causing only one embedding to be
returned.

This commit wraps each separate input in a Content+Part object to fix
that issue.
@github-actions github-actions bot added size: S Small PR (≤100 weighted lines) bug Report that something isn't working, or PR implementing a fix labels Mar 27, 2026
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

Open in Devin Review

title=settings.get('google_title'),
)

contents: ContentListUnion = [Content(parts=[Part(text=text)]) for text in inputs]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 Content role defaults to None instead of 'user'

The old code passed raw strings (list[str]) which the Google SDK internally converted to Content objects with role='user'. The new code at line 166 creates Content(parts=[Part(text=text)]) without specifying role, which defaults to role=None. I verified this by inspecting the SDK: Content(parts=[Part(text='hello')]) yields role=None.

The existing VCR cassette (tests/cassettes/test_embeddings/TestGoogle.test_query.yaml:22) shows role: user in the recorded request body. However, VCR matching is configured in tests/conftest.py to only match on method and path (not body), so tests still pass.

For the embedding API specifically, the role field is semantically irrelevant — the API extracts text from parts regardless of role. This is not a bug, but if strict request parity with the old behavior is desired, role='user' could be added explicitly. The cassettes should ideally be re-recorded to reflect the actual new request format.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Report that something isn't working, or PR implementing a fix size: S Small PR (≤100 weighted lines)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug when using gemini-embedding-2-preview

1 participant