
Commit 9556690

feat: multimodal documentation, examples, and changelog for v0.4.1 (#464)
* docs(08): capture phase context
* docs(state): record phase 8 context session
* docs(08): research phase domain
* docs(phase-8): add validation strategy
* docs(08): create phase plan
* fix(08): revise plans based on checker feedback
* docs(08-01): update Gemini and VoyageAI sections in embeddings.md
  - Add VoyageAI option functions list (11 options)
  - Add VoyageAI Multimodal (Content API) subsection with image and video examples
  - Update Gemini default model to gemini-embedding-2-preview
  - Add WithMaxFileSize to Gemini option list
  - Add Gemini Multimodal (Content API) subsection with image and video examples
  - Both multimodal subsections cross-reference embeddings/multimodal.md
* docs(08): capture phase context
* docs(state): record phase 8 context session
* docs(08): research phase domain
* docs(phase-8): add validation strategy
* docs(08): create phase plan
* fix(08): revise plans based on checker feedback
* docs(08-02): update README with multimodal capability mentions and example rows
  - Add Content API feature bullet in additional support features
  - Update Gemini line with multimodal modalities (text, images, audio, video, PDF)
  - Update VoyageAI line with multimodal modalities (text, images, video)
  - Add gemini_multimodal and voyage_multimodal example table rows
* feat(08-01): add runnable multimodal example programs
  - Add examples/v2/gemini_multimodal/main.go with EmbedContent and EmbedContents
  - Add examples/v2/voyage_multimodal/main.go with EmbedContent and EmbedContents
  - Both examples demonstrate image and video modalities via Content API
  - Follow established example pattern with log.Fatalf error handling
* docs(08-02): create CHANGELOG.md and correct ROADMAP.md naming
  - Create CHANGELOG.md with v0.4.1 release notes in Keep a Changelog format
  - Remove Nemotron parenthetical from Phase 7 description in ROADMAP.md
  - Mark Phase 7 as complete in ROADMAP.md
  - Reword Phase 8 success criteria to remove stale Nemotron reference
* docs(08-01): complete provider documentation plan
  - Add 08-01-SUMMARY.md with execution results
  - Update STATE.md with plan progress and decisions
  - Update ROADMAP.md with plan completion status
* docs(08-02): complete README/CHANGELOG/ROADMAP updates plan
  - Add 08-02-SUMMARY.md with execution results
  - Update STATE.md with progress, metrics, and session info
  - Update ROADMAP.md Phase 7 completion and naming correction
* fix(08): use Embedding.Len() instead of non-existent ArrayOfFloat32 field
* refactor(08): share testdata across providers and add VoyageAI integration tests
  - Move test assets from pkg/embeddings/gemini/testdata/ to shared pkg/embeddings/testdata/ for reuse by all multimodal providers
  - Add VoyageAI Content API integration tests mirroring Gemini's coverage (text, image, video, mixed parts, batch, intent)
  - Align example programs to use identical URLs and descriptions
  - Align VoyageAI doc section to match Gemini's content examples
* docs(08): add phase verification report
* docs(phase-08): complete phase execution
* docs(phase-08): evolve PROJECT.md after phase completion
* docs(08): rewrite multimodal.md as concept-first guide
  Replace reference-style docs with a guide that explains Content, Part, BinarySource, and Intent in plain terms before showing code. Add ASCII concept diagram, intent decision table, common recipes section, and provider support matrix. Use Gemini as primary example provider.
* docs(09): add Phase 9 — convenience constructors and documentation polish
* fix(08): use absolute paths in integration tests to avoid path traversal rejection
  The shared testdata at ../testdata/ triggers the containsDotDot path traversal check in resolveBytes. Using filepath.Abs resolves the relative path to an absolute one before it reaches the security check.
* fix(08): replace panic with require.NoError in testdataPath helper
  Pass *testing.T to testdataPath so filepath.Abs errors use require.NoError instead of panic, consistent with project guidelines.
* fix(08): skip VoyageAI video integration test — asset exceeds context window
  The 5.3MB test video base64-encodes to ~7MB which exceeds VoyageAI's 32K token context window. Video conversion logic is covered by unit tests with mock server.
* fix(08): add small video asset for VoyageAI integration test
  VoyageAI tokenizes video at 1120 pixels/token with a 32K token limit. The original 1280x720 8s video (~197K tokens) far exceeds this. Create a 480x480 2s 15fps copy (~6K tokens) that fits within the limit. Original asset preserved for Gemini which handles large files natively.
* fix(08): address PR review — remove placeholder URLs, add ef.Close()
  - Remove fake example.com/lecture.mp4 video URL from batch examples (both Gemini and VoyageAI) — only use real Wikimedia image URLs
  - Add explicit ef.Close() to Gemini example for proper resource cleanup
* fix(08): use local testdata assets in examples instead of external URLs
  Examples should be fully self-contained and not rely on external resources. Replace Wikimedia image URLs and example.com placeholders with local testdata files (lioness.png, the_pounce.mp4). VoyageAI example uses the_pounce_small.mp4 to fit within 32K token limit.
* fix(08): address PR review — run() pattern, defer Close(), add webp/gif MIME support
  - Refactor examples to use main() → run() error pattern for proper defer cleanup
  - Add .webp and .gif to Gemini extToMIME map (API supports them, map was missing)
  - Add run-from-repo-root comment to both multimodal examples
* fix(08): resolve CWD dependency in examples, use placeholder paths in docs
  Examples now locate repo root via go.mod lookup so they run from any directory. Doc snippets use generic /path/to/ placeholders instead of internal testdata paths.
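The token figures quoted for the VoyageAI video asset can be reproduced with a short calculation. This sketch assumes the cost is simply total pixels (width times height times frames) divided by the 1120 pixels/token stated in the commit, with no per-frame rounding or request overhead, and treats "32K" as 32,000. The commit does not give the original clip's frame rate; 30 fps is an assumption chosen because it reproduces the quoted ~197K tokens.

```go
package main

import "fmt"

// Figures from the commit message. Reading "32K" as 32_000 is an assumption;
// the provider may count the window differently.
const (
	pixelsPerToken = 1120
	tokenLimit     = 32_000
)

// estimateVideoTokens approximates cost as total pixels / pixelsPerToken.
// Frame sampling, rounding, and any fixed overhead are ignored.
func estimateVideoTokens(width, height, fps int, seconds float64) int {
	frames := float64(fps) * seconds
	pixels := float64(width*height) * frames
	return int(pixels / pixelsPerToken)
}

func main() {
	// Original asset: 1280x720, 8 s; 30 fps is assumed (not stated in the commit).
	orig := estimateVideoTokens(1280, 720, 30, 8)
	// Small asset: 480x480, 2 s, 15 fps, as described in the commit.
	small := estimateVideoTokens(480, 480, 15, 2)

	fmt.Println(orig, orig <= tokenLimit)   // 197485 false
	fmt.Println(small, small <= tokenLimit) // 6171 true
}
```

Under these assumptions the numbers line up with the commit's ~197K and ~6K estimates, and show why shrinking resolution, duration, and frame rate together cuts the cost by roughly 32x.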
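The `extToMIME` change adds `.webp` and `.gif` to an extension lookup. A minimal sketch of such a lookup follows; the map contents other than the two entries named in the commit are illustrative assumptions, and `mimeForPath` is a hypothetical helper rather than the Gemini provider's actual function.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// extToMIME is a hypothetical stand-in for the provider's extension map.
// Only .webp and .gif are named in the commit; other entries are illustrative.
var extToMIME = map[string]string{
	".png":  "image/png",
	".jpg":  "image/jpeg",
	".jpeg": "image/jpeg",
	".webp": "image/webp", // added in this commit
	".gif":  "image/gif",  // added in this commit
	".mp4":  "video/mp4",
	".pdf":  "application/pdf",
}

// mimeForPath looks up a MIME type by lowercase file extension and fails
// explicitly for anything not in the map.
func mimeForPath(p string) (string, error) {
	ext := strings.ToLower(filepath.Ext(p))
	if mt, ok := extToMIME[ext]; ok {
		return mt, nil
	}
	return "", fmt.Errorf("unsupported file extension %q for %s", ext, p)
}

func main() {
	for _, p := range []string{"cat.WEBP", "loop.gif", "notes.txt"} {
		mt, err := mimeForPath(p)
		fmt.Println(p, mt, err)
	}
}
```

Lowercasing the extension keeps `photo.WEBP` and `photo.webp` equivalent, and a missing entry surfaces as an error instead of silently sending an empty MIME type.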
1 parent 4532be7 commit 9556690

28 files changed

Lines changed: 2539 additions & 93 deletions


.planning/PROJECT.md

Lines changed: 9 additions & 5 deletions
@@ -22,13 +22,17 @@ Go applications can use Chroma and embedding providers through a stable, portabl
 
 ### Active
 
-- [ ] Add a provider-neutral multimodal input model that supports mixed-part requests across text, image, audio, video, and PDF.
-- [ ] Add provider-neutral intent semantics and per-request multimodal options without breaking current text-only and image-only flows.
+None — all v0.4.1 milestone requirements validated.
+
+### Recently Validated
+
+- ✓ Add a provider-neutral multimodal input model that supports mixed-part requests across text, image, audio, video, and PDF — Validated in Phases 1-2
+- ✓ Add provider-neutral intent semantics and per-request multimodal options without breaking current text-only and image-only flows — Validated in Phases 3-4
 - ✓ Public docs explain portable intent usage, escape hatches, and compatibility; tests cover validation, adapters, registry round-trips, and unsupported-combination failures — Validated in Phase 5: Documentation and Verification
 
 ### Out of Scope
 
-- Shipping every provider on the new multimodal contract in this milestone — Gemini and vLLM/Nemotron validate the foundation, remaining providers adopt later
+- Shipping every provider on the new multimodal contract in this milestone — Gemini and VoyageAI validate the foundation, remaining providers adopt later
 - Replacing or removing existing `EmbeddingFunction` and image-only multimodal APIs — backwards compatibility is an explicit acceptance criterion
 - Changing collection/query semantics outside the embedding abstraction boundary — keep the milestone scoped to shared embedding foundations
 
@@ -57,7 +61,7 @@ Go applications can use Chroma and embedding providers through a stable, portabl
 | Use the existing codebase map as brownfield context instead of re-running codebase mapping | `.planning/codebase/` already captures architecture, concerns, structure, and testing | ✓ Good |
 | Rebrand milestone from `v0.5` to `v0.4.1` | All changes since v0.4.0 are purely additive with no public API breakage — patch bump is correct semver | ✓ Good |
 | Add Gemini multimodal as Phase 6 (issue #443) | First concrete provider adoption validates the shared contract end-to-end | ✓ Good |
-| Add vLLM/Nemotron as Phase 7 | Second provider (nvidia/omni-embed-nemotron-3b via vLLM) proves portability beyond a single provider | ✓ Good |
+| Pivot Phase 7 from vLLM/Nemotron to VoyageAI | vLLM lacks NVOmniEmbedModel support; VoyageAI multimodal validates portability with text/image/video | ✓ Good |
 
 ---
-*Last updated: 2026-03-22Phase 7 complete, VoyageAI implements ContentEmbeddingFunction+CapabilityAware+IntentMapper for text/image/video, validating the shared multimodal contract is portable across providers*
+*Last updated: 2026-03-23v0.4.1 milestone complete. All 8 phases executed: shared contract, capabilities, registry, intent mapping, docs, Gemini multimodal, VoyageAI multimodal, provider documentation and changelog.*
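The two requirements moved to "Recently Validated" describe a provider-neutral input model with mixed parts and an optional intent, and the commit message names its concepts as Content, Part, BinarySource, and Intent. The sketch below is a hypothetical shape for such a model, written only to make the idea concrete; the type names echo those concepts, but the fields, constants, and `Validate` rules are assumptions, not the library's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// Modality lists the part kinds named in the milestone requirement.
type Modality string

const (
	ModalityText  Modality = "text"
	ModalityImage Modality = "image"
	ModalityAudio Modality = "audio"
	ModalityVideo Modality = "video"
	ModalityPDF   Modality = "pdf"
)

// Intent is a provider-neutral hint an adapter would map to a provider task type.
type Intent string

const (
	IntentRetrievalQuery    Intent = "retrieval_query"
	IntentRetrievalDocument Intent = "retrieval_document"
)

// BinarySource points at non-text data; exactly one field is expected to be set.
type BinarySource struct {
	URL      string
	FilePath string
	Bytes    []byte
}

// Part is one piece of a mixed-modality input.
type Part struct {
	Modality Modality
	Text     string        // used when Modality == ModalityText
	Source   *BinarySource // used for all other modalities
}

// Content groups parts that should produce a single embedding.
type Content struct {
	Parts  []Part
	Intent Intent // optional
}

// setCount returns how many of the source fields are populated.
func (s *BinarySource) setCount() int {
	n := 0
	if s.URL != "" {
		n++
	}
	if s.FilePath != "" {
		n++
	}
	if len(s.Bytes) > 0 {
		n++
	}
	return n
}

// Validate enforces the shape rules assumed in this sketch.
func (c Content) Validate() error {
	if len(c.Parts) == 0 {
		return errors.New("content has no parts")
	}
	for i, p := range c.Parts {
		switch {
		case p.Modality == ModalityText:
			if p.Text == "" {
				return fmt.Errorf("part %d: empty text", i)
			}
		case p.Source == nil || p.Source.setCount() != 1:
			return fmt.Errorf("part %d (%s): need exactly one binary source", i, p.Modality)
		}
	}
	return nil
}

func main() {
	c := Content{
		Intent: IntentRetrievalDocument,
		Parts: []Part{
			{Modality: ModalityText, Text: "a lioness mid-pounce"},
			{Modality: ModalityImage, Source: &BinarySource{FilePath: "/path/to/lioness.png"}},
			{Modality: ModalityVideo, Source: &BinarySource{FilePath: "/path/to/the_pounce.mp4"}},
		},
	}
	fmt.Println(c.Validate()) // prints <nil>
}
```

Keeping intent on `Content` rather than on a provider client is what makes it portable: each provider adapter decides how (or whether) to translate it.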

.planning/ROADMAP.md

Lines changed: 33 additions & 7 deletions
@@ -6,11 +6,11 @@ This roadmap initializes GSD planning for the current brownfield milestone focus
 
 ## Milestones
 
-- 🚧 **v0.4.1 Provider-Neutral Multimodal Foundations** - Phases 1-7 (current planning milestone)
+- 🚧 **v0.4.1 Provider-Neutral Multimodal Foundations** - Phases 1-9 (current planning milestone)
 
 ## v0.4.1 Provider-Neutral Multimodal Foundations
 
-**Milestone Goal:** Add provider-neutral multimodal embedding foundations that support richer modalities and portable intents while preserving existing text-only and image-only APIs, then validate with Gemini and vLLM/Nemotron provider adoptions.
+**Milestone Goal:** Add provider-neutral multimodal embedding foundations that support richer modalities and portable intents while preserving existing text-only and image-only APIs, then validate with Gemini and VoyageAI provider adoptions.
 
 ## Phases
 
@@ -20,7 +20,9 @@ This roadmap initializes GSD planning for the current brownfield milestone focus
 - [x] **Phase 4: Provider Mapping and Explicit Failures** - Define neutral intent mapping and surface unsupported combinations explicitly. (completed 2026-03-20)
 - [x] **Phase 5: Documentation and Verification** - Update docs, examples, and tests around portable multimodal usage and compatibility. (completed 2026-03-20)
 - [x] **Phase 6: Gemini Multimodal Adoption** - Wire Gemini into the shared multimodal contract with full modality support. (issue #443) (completed 2026-03-20)
-- [ ] **Phase 7: Voyage Multimodal Adoption** - Wire VoyageAI into the shared multimodal contract with text, image, and video support to validate the foundation end-to-end. (pivoted from vLLM/Nemotron — vLLM lacks NVOmniEmbedModel support)
+- [x] **Phase 7: Voyage Multimodal Adoption** - Wire VoyageAI into the shared multimodal contract with text, image, and video support to validate the foundation end-to-end.
+- [x] **Phase 8: Document Gemini and VoyageAI multimodal embedding functions** - Update provider docs, add runnable examples, update README, create changelog. (completed 2026-03-23)
+- [ ] **Phase 9: Convenience Constructors and Documentation Polish** - Add shorthand constructors to reduce Content API verbosity and update docs.
 
 ## Phase Details
 
@@ -132,6 +134,24 @@ Plans:
 - [x] 07-01-PLAN.md — Implement content.go with multimodal types, conversion helpers, capabilities, intent mapping, and wire interface implementations + registration into voyage.go
 - [x] 07-02-PLAN.md — Add unit tests for capability derivation, intent mapping, content conversion, batch rejection, config round-trip, and registration
 
+### Phase 8: Document Gemini and VoyageAI multimodal embedding functions
+**Goal:** Update provider-specific documentation for Gemini and VoyageAI to show Content API multimodal usage, add runnable examples, update README and changelog to close the v0.4.1 milestone.
+**Depends on:** Phase 7
+**Requirements**: [D-01, D-02, D-03, D-04, D-05, D-06, D-07, D-08, D-09, D-10, D-11]
+**Success Criteria** (what must be TRUE):
+1. Gemini and VoyageAI sections in embeddings.md have "Multimodal (Content API)" subsections with EmbedContent examples.
+2. Gemini default model references updated to gemini-embedding-2-preview throughout docs.
+3. VoyageAI section lists all available option functions.
+4. Runnable multimodal examples exist for both Gemini and VoyageAI.
+5. README mentions multimodal Content API capabilities and lists new examples.
+6. CHANGELOG.md documents v0.4.1 release.
+7. ROADMAP.md references VoyageAI consistently throughout all phase headings and descriptions.
+**Plans**: 2 plans
+
+Plans:
+- [ ] 08-01-PLAN.md — Update embeddings.md provider sections and add runnable multimodal examples
+- [ ] 08-02-PLAN.md — Update README, create CHANGELOG, correct ROADMAP naming
+
 ## Progress
 
 | Phase | Plans Complete | Status | Completed |
@@ -143,13 +163,19 @@ Plans:
 | 5. Documentation and Verification | 2/2 | Complete | 2026-03-20 |
 | 6. Gemini Multimodal Adoption | 2/2 | Complete | 2026-03-20 |
 | 7. Voyage Multimodal Adoption | 0/2 | Planning complete | - |
+| 8. Document Gemini and VoyageAI | 0/2 | Planning complete | - |
 
-### Phase 8: Document Gemini and Nemotron multimodal embedding functions
+### Phase 9: Convenience Constructors and Documentation Polish
 
-**Goal:** [To be planned]
+**Goal:** Add shorthand constructors (NewImageURL, NewImageFile, NewVideoURL, etc.) to reduce Content API verbosity, update multimodal docs and examples to use them, and verify the simplified surface end-to-end.
 **Requirements**: TBD
-**Depends on:** Phase 7
+**Depends on:** Phase 8
+**Success Criteria** (what must be TRUE):
+1. Convenience constructors exist for common modality+source combinations (at minimum: NewImageURL, NewImageFile, NewVideoURL, NewVideoFile, NewAudioFile, NewPDFFile).
+2. Existing tests and examples continue to work — constructors are additive sugar, not replacements.
+3. Multimodal docs and provider examples are updated to show the shorthand forms alongside the verbose forms.
+4. All new constructors have unit tests.
 **Plans:** 0 plans
 
 Plans:
-- [ ] TBD (run /gsd:plan-phase 8 to break down)
+- [ ] TBD (run /gsd:plan-phase 9 to break down)

.planning/STATE.md

Lines changed: 14 additions & 9 deletions
@@ -2,14 +2,14 @@
 gsd_state_version: 1.0
 milestone: v0.4.1
 milestone_name: Provider-Neutral Multimodal Foundations
-status: unknown
-stopped_at: Completed 07-02-PLAN.md
-last_updated: "2026-03-22T18:59:25.986Z"
+status: Milestone complete
+stopped_at: Completed 08-02-PLAN.md
+last_updated: "2026-03-23T14:09:11.460Z"
 progress:
   total_phases: 8
-  completed_phases: 7
-  total_plans: 18
-  completed_plans: 18
+  completed_phases: 8
+  total_plans: 20
+  completed_plans: 20
 ---
 
 # Project State
@@ -19,7 +19,7 @@ progress:
 See: .planning/PROJECT.md (updated 2026-03-18)
 
 **Core value:** Go applications can use Chroma and embedding providers through a stable, portable API that minimizes provider-specific friction.
-**Current focus:** Phase 07voyage-multimodal-adoption
+**Current focus:** Phase 08document-gemini-and-nemotron-multimodal-embedding-functions
 
 ## Current Position
 
@@ -62,6 +62,8 @@ Plan: Not started
 | Phase 06 P02 | 10min | 1 tasks | 1 files |
 | Phase 07 P01 | 3min | 2 tasks | 2 files |
 | Phase 07 P02 | 4min | 1 tasks | 1 files |
+| Phase 08 P01 | 2min | 2 tasks | 3 files |
+| Phase 08 P02 | 4min | 2 tasks | 3 files |
 
 ## Accumulated Context
 
@@ -103,6 +105,8 @@ Recent decisions affecting current work:
 - [Phase 07]: Batch requests reject per-item Intent/Dimension/ProviderHints with explicit errors matching Gemini pattern
 - [Phase 07]: multimodalURL derives endpoint by replacing /v1/embeddings suffix, falling back to constant for custom base URLs
 - [Phase 07]: Used struct literal construction for hermetic VoyageAI unit tests, matching Gemini Phase 06-02 pattern
+- [Phase 08]: Follow plan as specified - no deviations required for provider documentation updates
+- [Phase 08]: Reworded ROADMAP Phase 8 success criteria to eliminate last Nemotron text reference
 
 ### Roadmap Evolution
 
@@ -111,6 +115,7 @@ Recent decisions affecting current work:
 - Added Phase 6: Gemini Multimodal Adoption (#443).
 - Added Phase 7: Originally vLLM/Nemotron, pivoted to Voyage Multimodal Adoption (vLLM lacks NVOmniEmbedModel support).
 - Added Phase 8: Document Gemini and Nemotron multimodal embedding functions.
+- Added Phase 9: Convenience Constructors and Documentation Polish — reduce Content API verbosity with shorthand constructors.
 
 ### Pending Todos
 
@@ -136,6 +141,6 @@ None yet.
 
 ## Session
 
-**Last Date:** 2026-03-22T16:18:22.574Z
-**Stopped At:** Completed 07-02-PLAN.md
+**Last Date:** 2026-03-23T12:45:15.049Z
+**Stopped At:** Completed 08-02-PLAN.md
 **Resume File:** None
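The STATE.md decision that `multimodalURL` derives the endpoint by replacing the `/v1/embeddings` suffix, with a constant fallback for custom base URLs, can be sketched as follows. This is a hypothetical reimplementation: the default URL is VoyageAI's public multimodal endpoint as assumed here (check the provider's API reference), and trimming a trailing slash first is an extra assumption not stated in the decision log.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultMultimodalURL is assumed to be VoyageAI's public multimodal endpoint.
const defaultMultimodalURL = "https://api.voyageai.com/v1/multimodalembeddings"

// multimodalURL derives the multimodal endpoint from a text-embeddings URL.
// If the URL ends in /v1/embeddings, the suffix is swapped; any other shape
// falls back to the default constant.
func multimodalURL(embeddingsURL string) string {
	const suffix = "/v1/embeddings"
	// Tolerating a trailing slash is an assumption of this sketch.
	trimmed := strings.TrimRight(embeddingsURL, "/")
	if strings.HasSuffix(trimmed, suffix) {
		return strings.TrimSuffix(trimmed, suffix) + "/v1/multimodalembeddings"
	}
	return defaultMultimodalURL
}

func main() {
	fmt.Println(multimodalURL("https://api.voyageai.com/v1/embeddings"))
	fmt.Println(multimodalURL("http://localhost:8080/v1/embeddings/"))
	fmt.Println(multimodalURL("http://localhost:8080/custom"))
}
```

The suffix swap keeps a self-hosted proxy that mirrors the `/v1/...` layout on the same host, while an unrecognized base URL goes to the public endpoint rather than to a guessed path.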
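The decision that batch requests reject per-item Intent, Dimension, and ProviderHints can be sketched as a validation pass run before any request is built. The `Content` fields below are hypothetical stand-ins named after that decision log entry. The rationale assumed here is that such settings are request-level parameters in the provider's batch call, so a per-item value could not be honored and failing loudly is safer than silently ignoring it.

```go
package main

import "fmt"

// Content is a hypothetical batch item; field names mirror the STATE.md entry.
type Content struct {
	Text          string
	Intent        string
	Dimension     *int
	ProviderHints map[string]any
}

// validateBatch returns an explicit error for the first item that carries a
// per-item override, instead of silently dropping it.
func validateBatch(items []Content) error {
	for i, c := range items {
		switch {
		case c.Intent != "":
			return fmt.Errorf("batch item %d: per-item Intent is not supported; set it once for the request", i)
		case c.Dimension != nil:
			return fmt.Errorf("batch item %d: per-item Dimension is not supported", i)
		case len(c.ProviderHints) > 0:
			return fmt.Errorf("batch item %d: per-item ProviderHints are not supported", i)
		}
	}
	return nil
}

func main() {
	dim := 256
	batch := []Content{{Text: "a"}, {Text: "b", Dimension: &dim}}
	fmt.Println(validateBatch(batch)) // batch item 1: per-item Dimension is not supported
}
```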
