Commit 6a66a6e

feat(gemini): multimodal content embedding via shared Content API (#459)
* docs(06): capture phase context
* docs(state): record phase 6 context session
* docs(06): research gemini multimodal adoption phase
* docs(phase-6): add validation strategy
* docs(06): create phase plan
* feat(06-01): add content.go with conversion helpers and capability derivation
  - neutralIntentToTaskType map for 5 shared neutral intents to Gemini task types
  - extToMIME map for file extension MIME inference
  - capabilitiesForModel: full 5-modality caps for gemini-embedding-2-preview, text-only for others
  - resolveBytes: handles bytes, base64, file, and URL source kinds
  - resolveMIME: falls back from BinarySource.MIMEType to file extension
  - validateMIMEModality: rejects mismatched MIME/modality combinations
  - convertToGenaiContent / convertToGenaiContents: converts shared Content to genai.Content
  - resolveTaskTypeForContent: ProviderHints > intent mapper > default
* feat(06-01): implement ContentEmbeddingFunction+CapabilityAware+IntentMapper on GeminiEmbeddingFunction
  - DefaultEmbeddingModel updated to gemini-embedding-2-preview; LegacyEmbeddingModel added for gemini-embedding-001
  - Compile-time assertions for ContentEmbeddingFunction, CapabilityAware, IntentMapper
  - Client.CreateContentEmbedding: per-item task type resolution for single items, batch uses default
  - GeminiEmbeddingFunction.EmbedContent and EmbedContents: validate against caps then delegate to CreateContentEmbedding
  - GeminiEmbeddingFunction.Capabilities: delegates to capabilitiesForModel
  - GeminiEmbeddingFunction.MapIntent: translates 5 neutral intents, rejects non-neutral with escape-hatch hint
  - RegisterContent("google_genai") added to init() alongside existing RegisterDense
* docs(06-01): complete gemini multimodal content interface plan
  - 06-01-SUMMARY.md: full plan summary with task commits, decisions, and patterns
  - STATE.md: advanced to plan 2, recorded metrics, added key decisions
  - ROADMAP.md: updated phase 6 plan progress (1/2 summaries)
  - REQUIREMENTS.md: marked GEM-01, GEM-02, GEM-03 complete
* test(06-02): add unit tests for gemini multimodal content implementation
  - TestCapabilitiesForModel: 3 model variants (preview, legacy, unknown)
  - TestGeminiCapabilities: interface method on GeminiEmbeddingFunction
  - TestMapIntent: all 5 neutral intents map to correct Gemini task types
  - TestMapIntentRejectsNonNeutral: non-neutral intents rejected with ProviderHints hint
  - TestResolveMIME: explicit MIME, extension fallback, error cases
  - TestValidateMIMEModality: valid and invalid MIME-modality combinations
  - TestResolveBytesKinds: bytes, base64, file kinds (URL skipped)
  - TestConvertToGenaiContent*: text, binary, mixed, error cases
  - TestResolveTaskTypeForContent: priority chain (hints > intent > default)
  - TestEmbedContentLegacyModelRejectsMultimodal: D-03/D-04 negative case
  - TestGeminiContentRegistration: GEM-03 HasContent + BuildContent round-trip
  - TestGeminiContentConfigRoundTrip: Name + GetConfig -> BuildContent
  - TestDefaultModelChanged: D-01 constant verification
* docs(06-02): complete gemini multimodal content tests plan
  - 06-02-SUMMARY.md: 19-function unit test suite proving GEM-01/GEM-02/GEM-03
  - STATE.md: advanced to last-plan, 16/16 plans complete, decisions recorded
  - ROADMAP.md: phase 06 marked Complete (2/2 summaries)
* docs(phase-06): complete phase execution
* docs(phase-06): evolve PROJECT.md after phase completion
* test(gemini): add multimodal content integration tests and mock-based coverage
  - Add testdata/ with 5 modality assets (PNG, MP3, MP4, PDF, text)
  - Add gemini_content_integration_test.go (ef tag): text, image, audio, video, PDF, mixed-part, batch, intent, ProviderHints, dimension tests
  - Add mock-based unit tests for CreateContentEmbedding, EmbedContent, EmbedContents defensive paths (empty response, API error, validation)
  - Add convertToGenaiContents, resolveBytes URL, unsupported kind tests
  - Trim MP3 to 60s (Gemini 80s audio limit)
  - Fix lint: gci import ordering on compile-time assertions
  - Coverage: 57.9% → 84.7% (unit-only: 72.9%)
* docs: add phase 8 for Gemini/Nemotron documentation
* fix(gemini): add nil source guards, structural validation, and configurable file size limit
* refactor(gemini): use genai.NewPartFromURI for URL sources instead of client-side fetch
* feat(gemini): honor Content.Dimension in single-content embedding requests
* fix(gemini): derive capabilities from effective model when context override is set
* fix(gemini): enforce MaxBatchSize on EmbedContents and default to 250
* fix(gemini): validate IntentMapper result with IsValid() for consistency
* fix(gemini): add path traversal guard, use LimitReader for file reads, validate in CreateContentEmbedding
  - Reject file paths containing ".." to prevent path traversal
  - Replace os.Stat+os.ReadFile with os.Open+io.LimitReader to eliminate TOCTOU race
  - Add ValidateContents call in CreateContentEmbedding for defense-in-depth
* fix(gemini): reject per-item overrides in batch requests and check nil embedding values
  - Error when batch contents have per-item Intent, Dimension, or ProviderHints since Gemini applies one config per batch (prevents silent data corruption)
  - Check for nil embedding Values in API response (aligns with Voyage pattern)
* fix(gemini): enforce MaxFileSize on bytes and base64 payloads

  The genai SDK performs no client-side size validation. Enforce the configurable limit (default 100MB) on all inline payload types consistently, not just file reads.
* docs(gemini): document batch limitations for per-item Content overrides
* docs(phase-06): update validation strategy with Nyquist audit results
* fix(gemini): pre-check base64 payload size before decoding to avoid OOM allocation
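The task type priority chain named in the commit message (ProviderHints > intent mapper > default) might look roughly like the sketch below. This is not the code from content.go: the `Intent` and `Content` types, the intent constant names, the `task_type` hint key, and the `resolveTaskType` helper are all hypothetical stand-ins. Only the Gemini task type strings are taken from the Gemini API.

```go
package main

import "fmt"

// Intent is a hypothetical stand-in for the shared neutral intent type.
type Intent string

// Hypothetical neutral intent names; the real constants live in the shared Content API.
const (
	IntentRetrievalQuery     Intent = "retrieval_query"
	IntentRetrievalDocument  Intent = "retrieval_document"
	IntentClassification     Intent = "classification"
	IntentClustering         Intent = "clustering"
	IntentSemanticSimilarity Intent = "semantic_similarity"
)

// neutralIntentToTaskType pairs each assumed neutral intent with a Gemini task type.
var neutralIntentToTaskType = map[Intent]string{
	IntentRetrievalQuery:     "RETRIEVAL_QUERY",
	IntentRetrievalDocument:  "RETRIEVAL_DOCUMENT",
	IntentClassification:     "CLASSIFICATION",
	IntentClustering:         "CLUSTERING",
	IntentSemanticSimilarity: "SEMANTIC_SIMILARITY",
}

// Content is a reduced, hypothetical view of the shared Content value.
type Content struct {
	Intent        Intent
	ProviderHints map[string]any
}

// resolveTaskType applies the priority chain: an explicit provider hint wins,
// then the neutral intent mapping, then the caller's default.
func resolveTaskType(c Content, defaultTaskType string) (string, error) {
	// 1. Escape hatch: a native task type passed via ProviderHints (assumed key "task_type").
	if v, ok := c.ProviderHints["task_type"]; ok {
		s, isString := v.(string)
		if !isString || s == "" {
			return "", fmt.Errorf("provider hint task_type must be a non-empty string")
		}
		return s, nil
	}
	// 2. Neutral intent: translate it, or fail explicitly when it is not one of the known five.
	if c.Intent != "" {
		taskType, ok := neutralIntentToTaskType[c.Intent]
		if !ok {
			return "", fmt.Errorf("unsupported intent %q: pass a native task type via ProviderHints instead", c.Intent)
		}
		return taskType, nil
	}
	// 3. Nothing specified: fall back to the default.
	return defaultTaskType, nil
}

func main() {
	cases := []Content{
		{Intent: IntentClassification, ProviderHints: map[string]any{"task_type": "CODE_RETRIEVAL_QUERY"}},
		{Intent: IntentClassification},
		{},
	}
	for _, c := range cases {
		taskType, err := resolveTaskType(c, "RETRIEVAL_DOCUMENT")
		fmt.Println(taskType, err)
	}
}
```

Returning an error for an unknown intent, rather than silently using the default, mirrors the "explicit errors for unsupported combinations" wording of GEM-02.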
1 parent fce8437 commit 6a66a6e

26 files changed

Lines changed: 3930 additions & 25 deletions

.planning/PROJECT.md

Lines changed: 1 addition & 1 deletion
@@ -60,4 +60,4 @@ Go applications can use Chroma and embedding providers through a stable, portabl
 | Add vLLM/Nemotron as Phase 7 | Second provider (nvidia/omni-embed-nemotron-3b via vLLM) proves portability beyond a single provider | ✓ Good |

 ---
-*Last updated: 2026-03-20 — Phase 5 complete, Content API documented, DOCS-02 test coverage verified*
+*Last updated: 2026-03-20 — Phase 6 complete, Gemini natively implements ContentEmbeddingFunction+CapabilityAware+IntentMapper for 5 modalities, registered in content factory*

.planning/REQUIREMENTS.md

Lines changed: 6 additions & 6 deletions
@@ -34,9 +34,9 @@

 ### Gemini Multimodal Adoption

-- [ ] **GEM-01**: Gemini implements `SharedContentEmbeddingFunction` and `CapabilityAware` for text, image, audio, video, and PDF modalities
-- [ ] **GEM-02**: Neutral intents map to Gemini task types with explicit errors for unsupported combinations
-- [ ] **GEM-03**: Gemini is registered in the multimodal factory/registry path with config round-trip support
+- [x] **GEM-01**: Gemini implements `SharedContentEmbeddingFunction` and `CapabilityAware` for text, image, audio, video, and PDF modalities
+- [x] **GEM-02**: Neutral intents map to Gemini task types with explicit errors for unsupported combinations
+- [x] **GEM-03**: Gemini is registered in the multimodal factory/registry path with config round-trip support

 ### vLLM/Nemotron Provider Validation

@@ -83,9 +83,9 @@
 | MAP-02 | Phase 4 | Complete |
 | DOCS-01 | Phase 5 | Complete |
 | DOCS-02 | Phase 5 | Complete |
-| GEM-01 | Phase 6 | Pending |
-| GEM-02 | Phase 6 | Pending |
-| GEM-03 | Phase 6 | Pending |
+| GEM-01 | Phase 6 | Complete |
+| GEM-02 | Phase 6 | Complete |
+| GEM-03 | Phase 6 | Complete |
 | VLLM-01 | Phase 7 | Pending |
 | VLLM-02 | Phase 7 | Pending |

.planning/ROADMAP.md

Lines changed: 16 additions & 3 deletions
@@ -19,7 +19,7 @@ This roadmap initializes GSD planning for the current brownfield milestone focus
 - [x] **Phase 3: Registry and Config Integration** - Extend registry/build-from-config and collection auto-wiring for richer multimodal interfaces. (completed 2026-03-20)
 - [x] **Phase 4: Provider Mapping and Explicit Failures** - Define neutral intent mapping and surface unsupported combinations explicitly. (completed 2026-03-20)
 - [x] **Phase 5: Documentation and Verification** - Update docs, examples, and tests around portable multimodal usage and compatibility. (completed 2026-03-20)
-- [ ] **Phase 6: Gemini Multimodal Adoption** - Wire Gemini into the shared multimodal contract with full modality support. (issue #443)
+- [x] **Phase 6: Gemini Multimodal Adoption** - Wire Gemini into the shared multimodal contract with full modality support. (issue #443) (completed 2026-03-20)
 - [ ] **Phase 7: vLLM/Nemotron Provider Validation** - Add vLLM OpenAI-compatible provider targeting nvidia/omni-embed-nemotron-3b to validate the foundation end-to-end.

 ## Phase Details
@@ -110,8 +110,11 @@ Plans:
 3. Existing `EmbedDocuments`/`EmbedQuery` behavior remains unchanged.
 4. Gemini is registered in the multimodal factory/registry path with config round-trip support.
 5. Unit tests cover request construction, intent mapping, and backward-compatible wrappers.
+**Plans**: 2 plans

-Plans: TBD during planning
+Plans:
+- [x] 06-01-PLAN.md — Implement content helpers, interface methods, CreateContentEmbedding, registration, and default model update
+- [x] 06-02-PLAN.md — Add unit tests for capability derivation, intent mapping, MIME resolution, content conversion, negative cases, and config round-trip

 ### Phase 7: vLLM/Nemotron Provider Validation
 **Goal:** Add a vLLM OpenAI-compatible embedding provider targeting nvidia/omni-embed-nemotron-3b to validate the shared multimodal contract against a second real multimodal model beyond Gemini.
@@ -134,5 +137,15 @@ Plans: TBD during planning
 | 3. Registry and Config Integration | 3/3 | Complete | 2026-03-20 |
 | 4. Provider Mapping and Explicit Failures | 2/2 | Complete | 2026-03-20 |
 | 5. Documentation and Verification | 2/2 | Complete | 2026-03-20 |
-| 6. Gemini Multimodal Adoption | - | Not started | - |
+| 6. Gemini Multimodal Adoption | 2/2 | Complete | 2026-03-20 |
 | 7. vLLM/Nemotron Provider Validation | - | Not started | - |
+
+### Phase 8: Document Gemini and Nemotron multimodal embedding functions
+
+**Goal:** [To be planned]
+**Requirements**: TBD
+**Depends on:** Phase 7
+**Plans:** 0 plans
+
+Plans:
+- [ ] TBD (run /gsd:plan-phase 8 to break down)

.planning/STATE.md

Lines changed: 20 additions & 12 deletions
@@ -1,15 +1,15 @@
 ---
 gsd_state_version: 1.0
-milestone: v0.4
-milestone_name: milestone
+milestone: v0.4.1
+milestone_name: Provider-Neutral Multimodal Foundations
 status: unknown
-stopped_at: Completed 05-01-PLAN.md
-last_updated: "2026-03-20T16:55:29.356Z"
+stopped_at: Completed 06-02-PLAN.md
+last_updated: "2026-03-20T20:11:45.881Z"
 progress:
   total_phases: 7
-  completed_phases: 5
-  total_plans: 14
-  completed_plans: 14
+  completed_phases: 6
+  total_plans: 16
+  completed_plans: 16
 ---

 # Project State
@@ -19,12 +19,12 @@ progress:
 See: .planning/PROJECT.md (updated 2026-03-18)

 **Core value:** Go applications can use Chroma and embedding providers through a stable, portable API that minimizes provider-specific friction.
-**Current focus:** Phase 05documentation-and-verification
+**Current focus:** Phase 06gemini-multimodal-adoption

 ## Current Position

-Phase: 05 (documentation-and-verification) — EXECUTING
-Plan: 1 of 2
+Phase: 7
+Plan: Not started

 ## Performance Metrics

@@ -58,6 +58,8 @@ Plan: 1 of 2
 | Phase 04 P02 | 4 | 2 tasks | 2 files |
 | Phase 05 P02 | 2 | 1 tasks | 1 files |
 | Phase 05 P01 | 2 | 2 tasks | 2 files |
+| Phase 06 P01 | 5 | 2 tasks | 2 files |
+| Phase 06 P02 | 10min | 1 tasks | 1 files |

 ## Accumulated Context

@@ -90,13 +92,19 @@ Recent decisions affecting current work:
 - [Phase 05-01]: Show mixed-part Roboflow example with separate Content items via EmbedContents (one Part per Content due to adapter constraint)
 - [Phase 05-01]: Frame both EmbedDocuments and Content API as coexisting indefinitely — no deprecation signal in docs
 - [Phase 05-01]: Escape-hatch admonition for ProviderHints references godoc rather than documenting mechanism inline
+- [Phase 06-01]: Default model updated to gemini-embedding-2-preview; LegacyEmbeddingModel constant added for gemini-embedding-001
+- [Phase 06-01]: Batch requests use default task type for all items; single-item requests allow per-item ProviderHints override
+- [Phase 06-01]: resolveMIME falls back from BinarySource.MIMEType to file extension; fails explicitly when neither resolves
+- [Phase 06-02]: Construct GeminiEmbeddingFunction via struct literal in unit tests to avoid genai.NewClient network calls while keeping tests hermetic
+- [Phase 06-02]: EmbedContentLegacyModelRejectsMultimodal uses dual-string check because ValidateContentSupport produces message with 'does not support' not 'unsupported'

 ### Roadmap Evolution

 - Project initialized around provider-neutral multimodal embedding foundations (#442).
 - Rebranded milestone v0.5 → v0.4.1 (all changes additive, no public API breakage).
 - Added Phase 6: Gemini Multimodal Adoption (#443).
 - Added Phase 7: vLLM/Nemotron Provider Validation (nvidia/omni-embed-nemotron-3b).
+- Added Phase 8: Document Gemini and Nemotron multimodal embedding functions.

 ### Pending Todos

@@ -122,6 +130,6 @@ None yet.

 ## Session

-**Last Date:** 2026-03-20T16:37:59.612Z
-**Stopped At:** Completed 05-01-PLAN.md
+**Last Date:** 2026-03-20T20:05:03.640Z
+**Stopped At:** Completed 06-02-PLAN.md
 **Resume File:** None

.planning/config.json

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
   "granularity": "standard",
   "parallelization": true,
   "commit_docs": true,
-  "model_profile": "balanced",
+  "model_profile": "quality",
   "workflow": {
     "research": true,
     "plan_check": true,
