Fix(embeddings): Wrap each separate input in a Content+Part to fix batching#4873
Fix(embeddings): Wrap each separate input in a Content+Part to fix batching#4873yorickvP wants to merge 1 commit intopydantic:mainfrom
Conversation
gemini-embedding-2-preview was interpreting an array as a single multi-part embedding request, causing only one embedding to be returned. This commit wraps each separate input in a Content+Part object to fix that issue.
| title=settings.get('google_title'), | ||
| ) | ||
|
|
||
| contents: ContentListUnion = [Content(parts=[Part(text=text)]) for text in inputs] |
There was a problem hiding this comment.
🚩 Content role defaults to None instead of 'user'
The old code passed raw strings (list[str]) which the Google SDK internally converted to Content objects with role='user'. The new code at line 166 creates Content(parts=[Part(text=text)]) without specifying role, which defaults to role=None. I verified this by inspecting the SDK: Content(parts=[Part(text='hello')]) yields role=None.
The existing VCR cassette (tests/cassettes/test_embeddings/TestGoogle.test_query.yaml:22) shows role: user in the recorded request body. However, VCR matching is configured in tests/conftest.py to only match on method and path (not body), so tests still pass.
For the embedding API specifically, the role field is semantically irrelevant — the API extracts text from parts regardless of role. This is not a bug, but if strict request parity with the old behavior is desired, role='user' could be added explicitly. The cassettes should ideally be re-recorded to reflect the actual new request format.
Was this helpful? React with 👍 or 👎 to provide feedback.
gemini-embedding-2-preview was interpreting an array as a single multi-part embedding request, causing only one embedding to be returned.
This commit wraps each separate input in a Content+Part object to fix that issue.
gemini-embedding-2-preview#4872Pre-Review Checklist
make formatandmake typecheck.Pre-Merge Checklist