- Shipping every provider on the new multimodal contract in this milestone — Gemini and VoyageAI validate the foundation, remaining providers adopt later
- Shipping every provider on the new multimodal contract in this milestone — Gemini, VoyageAI, and Twelve Labs are in scope; remaining providers adopt later
- Replacing or removing existing `EmbeddingFunction` and image-only multimodal APIs — backwards compatibility is an explicit acceptance criterion
- Changing collection/query semantics outside the embedding abstraction boundary — keep the milestone scoped to shared embedding foundations
@@ -23,6 +23,14 @@ This roadmap initializes GSD planning for the current brownfield milestone focus
- [x] **Phase 7: Voyage Multimodal Adoption** - Wire VoyageAI into the shared multimodal contract with text, image, and video support to validate the foundation end-to-end.
| 9. Convenience Constructors | 0/0 | Not started | - |
| 10. Code Cleanups | 0/0 | Not started | - |
| 11. Fork Double-Close Bug | 0/0 | Not started | - |
| 12. SDK Auto-Wiring Research | 0/0 | Not started | - |
| 13. Collection.ForkCount | 0/0 | Not started | - |
| 14. Delete with Limit | 0/0 | Not started | - |
| 15. OpenRouter Embeddings | 0/0 | Not started | - |
| 16. Twelve Labs EF | 0/0 | Not started | - |
| 17. Cloud RRF/GroupBy Tests | 0/0 | Not started | - |

### Phase 9: Convenience Constructors and Documentation Polish
@@ -179,3 +196,117 @@ Plans:
Plans:

- [ ] TBD (run /gsd:plan-phase 9 to break down)

### Phase 10: Code Cleanups

**Goal:** Consolidate duplicated path safety utilities into a shared internal package, fix the `*context.Context` pointer-to-interface anti-pattern across embedding providers, and add registry test cleanup to prevent global state leaks.

**Depends on:** Phase 9

**Issues**: #456, #461, #466

**Success Criteria** (what must be TRUE):

1. A shared `pkg/internal/pathutil` package provides `ContainsDotDot`, `ValidateFilePath`, and `SafePath` utilities.
2. Gemini, Voyage, and default_ef use the shared path utilities instead of local duplicates.
3. Gemini, Nomic, and Mistral use `context.Context` (not `*context.Context`) for `DefaultContext`.
4. Registry tests use `t.Cleanup` with unregister helpers to prevent global state leaks.
5. All existing tests pass without modification.

**Plans:** 0 plans

Plans:

- [ ] TBD (run /gsd:plan-phase 10 to break down)
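
To make two of the Phase 10 criteria concrete, here is a minimal sketch. The `ContainsDotDot` signature and body are assumptions (the roadmap only names the function), and `client` with its `contextOrBackground` helper is a hypothetical stand-in for a provider client, not code from the repository.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// ContainsDotDot reports whether any slash- or backslash-separated element
// of p is exactly "..". The signature is an assumption for illustration.
func ContainsDotDot(p string) bool {
	if !strings.Contains(p, "..") {
		return false
	}
	isSep := func(r rune) bool { return r == '/' || r == '\\' }
	for _, elem := range strings.FieldsFunc(p, isSep) {
		if elem == ".." {
			return true
		}
	}
	return false
}

// client is a hypothetical provider client. DefaultContext stores the
// interface value directly: context.Context is already an interface, so a
// *context.Context only adds a nil-pointer hazard.
type client struct {
	DefaultContext context.Context
}

// contextOrBackground returns the configured context, falling back to
// context.Background() when none was set.
func (c *client) contextOrBackground() context.Context {
	if c.DefaultContext == nil {
		return context.Background()
	}
	return c.DefaultContext
}

func main() {
	fmt.Println(ContainsDotDot("../etc/passwd"))            // traversal element present
	fmt.Println(ContainsDotDot("models/v1..2/weights.bin")) // ".." inside a name only
	c := &client{}
	fmt.Println(c.contextOrBackground() != nil)
}
```

Checking whole path elements rather than the raw substring keeps file names such as `v1..2` valid while still rejecting traversal.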

### Phase 11: Fork Double-Close Bug

**Goal:** Fix EF pointer sharing in `Fork()` that causes the same underlying embedding function resource to be closed twice when `client.Close()` iterates cached collections.

**Depends on:** None (independent, but should precede ForkCount work)

**Issues**: #454

**Success Criteria** (what must be TRUE):

1. Forked collections do not double-close shared EF resources when `client.Close()` is called.
2. Both `embeddingFunction` and `contentEmbeddingFunction` ownership is handled correctly.
3. Tests cover Fork + Close lifecycle without panics or use-after-close errors.
4. Existing fork tests continue to pass.

**Plans:** 0 plans

Plans:

- [ ] TBD (run /gsd:plan-phase 11 to break down)
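
One possible way to avoid the double close is to track ownership explicitly, sketched below. Every type here (`fakeEF`, `collection`, `closeAll`) is a hypothetical stand-in; the actual fix may well use a different mechanism, such as reference counting or a close-once wrapper.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// fakeEF stands in for an embedding function holding a closable resource.
// It returns an error if Close is called more than once.
type fakeEF struct{ closed int }

func (e *fakeEF) Close() error {
	e.closed++
	if e.closed > 1 {
		return errors.New("embedding function closed twice")
	}
	return nil
}

// collection is a hypothetical cached collection. ownsEF records whether
// this collection is responsible for closing ef.
type collection struct {
	ef     io.Closer
	ownsEF bool
}

// Fork shares the parent's EF but does not take ownership of it.
func (c *collection) Fork() *collection {
	return &collection{ef: c.ef, ownsEF: false}
}

// Close releases the EF only when this collection owns it.
func (c *collection) Close() error {
	if c.ef == nil || !c.ownsEF {
		return nil
	}
	return c.ef.Close()
}

// closeAll mimics a client closing every cached collection.
func closeAll(cols ...*collection) error {
	var errs []error
	for _, c := range cols {
		if err := c.Close(); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

func main() {
	ef := &fakeEF{}
	parent := &collection{ef: ef, ownsEF: true}
	fork := parent.Fork()
	err := closeAll(parent, fork)
	fmt.Println(err, ef.closed)
}
```

An ownership flag is the simplest option, but it assumes the owner outlives its forks. If a fork can be used after its parent is closed, reference counting would be the safer design.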

### Phase 12: SDK Auto-Wiring Research

**Goal:** Trace `contentEmbeddingFunction` auto-wiring behavior in official Chroma SDKs (Python, JavaScript) to verify chroma-go's approach is consistent or document deliberate differences.

1. Python SDK auto-wiring behavior documented for `get_collection`, `list_collections`, and `create_collection`.
2. JavaScript SDK auto-wiring behavior documented for equivalent operations.
3. Comparison with chroma-go behavior written up with any recommended changes or documented differences.

**Plans:** 0 plans

Plans:

- [ ] TBD (run /gsd:plan-phase 12 to break down)

### Phase 13: Collection.ForkCount

**Goal:** Add `ForkCount(ctx) (int, error)` to the V2 Collection interface with HTTP transport support, matching upstream Chroma's `/fork_count` endpoint.

**Depends on:** Phase 11, Phase 12 (benefits from fork bug fix and SDK research)

**Issues**: #460

**Success Criteria** (what must be TRUE):

1. `pkg/api/v2.Collection` includes `ForkCount(ctx context.Context) (int, error)`.
2. HTTP implementation issues `GET .../fork_count` and decodes `{"count": n}`.
3. Embedded/local behavior returns an explicit unsupported error.
**Goal:** Add limit parameter support to collection delete operations, matching upstream Chroma PRs #6573/#6582.

**Depends on:** None (independent)

**Issues**: #439

**Success Criteria** (what must be TRUE):

1. Delete operations accept an optional limit parameter.
2. HTTP transport sends the limit when specified.
3. Tests cover delete-with-limit happy path and edge cases.

**Plans:** 0 plans

Plans:

- [ ] TBD (run /gsd:plan-phase 14 to break down)
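
A rough sketch of how an optional limit could be threaded through a functional option so that it is only serialized when set. The names `deleteRequest`, `withDeleteIDs`, `withDeleteLimit`, and `buildDeleteBody`, the JSON field names, and the rule that a limit must be positive are all assumptions for illustration, not the existing chroma-go API or the upstream wire format.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// deleteRequest is a hypothetical request body. Limit is a pointer so that
// "not specified" is distinguishable from any numeric value.
type deleteRequest struct {
	IDs   []string `json:"ids,omitempty"`
	Limit *int     `json:"limit,omitempty"`
}

// deleteOption mutates a deleteRequest and may reject invalid input.
type deleteOption func(*deleteRequest) error

// withDeleteIDs selects records by ID.
func withDeleteIDs(ids ...string) deleteOption {
	return func(r *deleteRequest) error {
		r.IDs = append(r.IDs, ids...)
		return nil
	}
}

// withDeleteLimit caps how many matching records are deleted.
// Rejecting non-positive values is an assumption made for this sketch.
func withDeleteLimit(n int) deleteOption {
	return func(r *deleteRequest) error {
		if n <= 0 {
			return errors.New("delete limit must be positive")
		}
		r.Limit = &n
		return nil
	}
}

// buildDeleteBody applies the options and serializes the request.
func buildDeleteBody(opts ...deleteOption) ([]byte, error) {
	req := &deleteRequest{}
	for _, opt := range opts {
		if err := opt(req); err != nil {
			return nil, err
		}
	}
	return json.Marshal(req)
}

func main() {
	withLimit, _ := buildDeleteBody(withDeleteIDs("a", "b"), withDeleteLimit(1))
	fmt.Println(string(withLimit)) // {"ids":["a","b"],"limit":1}

	withoutLimit, _ := buildDeleteBody(withDeleteIDs("a"))
	fmt.Println(string(withoutLimit)) // {"ids":["a"]}
}
```

Because the field is omitted when unset, existing delete calls would produce the same request body as before.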

### Phase 15: OpenRouter Embeddings Compatibility

**Goal:** Extend the OpenAI embedding function to support OpenRouter-specific fields (`encoding_format`, `input_type`, provider preferences) and relax model validation for provider-prefixed IDs.

**Depends on:** None (independent)

**Issues**: #438

**Success Criteria** (what must be TRUE):

1. `CreateEmbeddingRequest` supports `encoding_format`, `input_type`, and `provider` fields.
2. `WithModel` accepts provider-prefixed model IDs (e.g. `openai/text-embedding-3-small`).
3. Provider preferences struct covers documented OpenRouter fields with extensibility.
4. Existing OpenAI behavior and tests remain unchanged.
5. Docs include OpenRouter usage example with `WithBaseURL`.

**Plans:** 0 plans

Plans:

- [ ] TBD (run /gsd:plan-phase 15 to break down)

### Phase 16: Twelve Labs Embedding Function

**Goal:** Add a new Twelve Labs multimodal embedding provider supporting text, image, and audio embeddings via the Twelve Labs API.

**Depends on:** Phase 9 (benefits from Content API foundations)

**Issues**: #190

**Success Criteria** (what must be TRUE):

1. `pkg/embeddings/twelvelabs` implements dense embedding and Content API interfaces.
2. Supports text, image, and audio modalities per Twelve Labs API docs.
3. Registered in factory/registry with config round-trip support.
4. Tests cover request construction, modality validation, and config persistence.
5. Docs and examples added for Twelve Labs provider.

**Plans:** 0 plans

Plans:

- [ ] TBD (run /gsd:plan-phase 16 to break down)

### Phase 17: Cloud RRF and GroupBy Test Coverage

**Goal:** Add end-to-end cloud integration tests that exercise Search API RRF and GroupBy primitives against live Chroma Cloud.

**Depends on:** None (independent, but best run last as test hardening)

**Issues**: #462

**Success Criteria** (what must be TRUE):

1. RRF smoke test using dense + sparse KNN ranks with `WithKnnReturnRank`.
2. RRF weighted/custom-k test proves request acceptance and ordering changes.
3. GroupBy MinK/MaxK tests assert per-group caps and flattened limits.
4. All tests tagged `cloud` and use existing cloud test infrastructure.