Skip to content

Commit f72b854

Browse files
authored
Phase 10: Code Cleanups (#470)
* docs(10): capture phase context * docs(state): record phase 10 context session * docs(10): research phase domain * docs(phase-10): add validation strategy * docs(10): create phase plan * refactor(10-01): extract shared pathutil package from duplicated provider code - Create pkg/internal/pathutil with ContainsDotDot, ValidateFilePath, SafePath - Replace local containsDotDot in gemini/content.go and voyage/content.go - Replace local safePath in default_ef/download_utils.go - Add comprehensive unit tests for all three functions * fix(10-01): replace *context.Context pointer-to-interface with context.Context value - Fix Gemini DefaultContext from *context.Context to context.Context - Fix Nomic DefaultContext from *context.Context to context.Context - Fix Mistral DefaultContext from *context.Context to context.Context - Remove unnecessary pointer indirection and address-of operations * docs(10-01): complete path safety and context anti-pattern plan - Add 10-01-SUMMARY.md with execution results - Update STATE.md with phase 10 progress - Update ROADMAP.md plan progress - Mark CLN-01, CLN-02, CLN-03, CLN-06 requirements complete * feat(10-02): add URL path extension fallback to resolveMIME in Gemini and Voyage - Add net/url import and URL path parsing as 3rd fallback in resolveMIME - URL query strings and fragments are stripped by url.Parse before extension check - Update error message to mention file/URL with known extension - Add 4 new test cases per provider (URL fallback, query string, fragment, no-ext) - Remove dead TestVoyageContainsDotDot test referencing removed containsDotDot function * refactor(10-02): add registry unregister helpers and t.Cleanup to all tests - Add 4 unexported unregister helpers (dense, sparse, multimodal, content) - Add t.Cleanup with unregister calls to all 22 registration test sites - Replace inline mu.Lock/delete/mu.Unlock with unregister helper calls - Tests pass with -count=2 proving no global state leaks * docs(10-02): complete URL MIME inference and registry test cleanup plan - SUMMARY.md documents URL path fallback and registry cleanup changes - STATE.md updated with position, metrics, and decisions - ROADMAP.md reflects Phase 10 complete (2/2 plans) - REQUIREMENTS.md marks CLN-04, CLN-05, CLN-06 complete * docs(phase-10): complete phase execution and verification * fix(10): harden input validation, error handling, and test coverage - Add empty-string guards to ValidateFilePath and SafePath - Surface URL parse errors in resolveMIME instead of silently discarding - Wrap ValidateFilePath/SafePath errors with caller context - Make containsDotDot unexported, remove dead SafePath branch - Remove stale TODO(#469), use defer for unregister mutex consistency - Add tests for malformed URLs, multi-level traversal, edge cases * fix(10): handle root destPath in SafePath, strip URL from error messages - Special-case root "/" in SafePath prefix check to avoid false rejection - Remove raw URL from resolveMIME parse error to prevent credential leaks - Add test for SafePath with root destPath * fix(10): add MIME format validation, Gemini modality default case, clarify ValidateFilePath scope - Clarify ValidateFilePath doc: only rejects relative ".." traversal, not absolute path manipulation; point callers to SafePath for confinement - Add default case to Gemini validateMIMEModality matching Voyage's pattern, rejecting unknown modalities instead of silently passing - Add MIME type format validation in BinarySource.Validate() using RFC 2045 type/subtype regex to block injection via malformed MIME strings - Add tests for all three changes * refactor(10): simplify MIME regex init to regexp.Compile with nolint * fix(10): remove weak MIME validation fallback, reject all if regex unavailable
1 parent c2f4229 commit f72b854

27 files changed

Lines changed: 2151 additions & 101 deletions

.planning/REQUIREMENTS.md

Lines changed: 18 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -51,6 +51,15 @@
5151
- [x] **CONV-03**: All convenience constructors have unit tests and existing tests/examples remain green
5252
- [x] **CONV-04**: Multimodal docs and provider examples show shorthand constructors as the primary examples with verbose forms linked from the generic Content API page
5353

54+
### Code Cleanups
55+
56+
- [x] **CLN-01**: Duplicated path safety functions (`containsDotDot`, `safePath`) are extracted into a shared `pkg/internal/pathutil` package with unit tests
57+
- [x] **CLN-02**: Gemini, Voyage, and default_ef providers import path safety utilities from the shared package instead of maintaining local copies
58+
- [x] **CLN-03**: Gemini, Nomic, and Mistral providers use `context.Context` (value type) for `DefaultContext` instead of `*context.Context` (pointer-to-interface anti-pattern)
59+
- [x] **CLN-04**: Registry tests use `t.Cleanup` with unexported unregister helpers to prevent global state leaks between test runs
60+
- [x] **CLN-05**: Gemini and VoyageAI `resolveMIME` functions infer MIME type from URL path extensions as a fallback, making URL constructors work end-to-end
61+
- [x] **CLN-06**: All existing tests pass without modification after cleanup changes
62+
5463
## v2 Requirements
5564

5665
### Provider Adoption
@@ -101,12 +110,18 @@
101110
| CONV-02 | Phase 9 | Planned |
102111
| CONV-03 | Phase 9 | Planned |
103112
| CONV-04 | Phase 9 | Planned |
113+
| CLN-01 | Phase 10 | Planned |
114+
| CLN-02 | Phase 10 | Planned |
115+
| CLN-03 | Phase 10 | Planned |
116+
| CLN-04 | Phase 10 | Planned |
117+
| CLN-05 | Phase 10 | Planned |
118+
| CLN-06 | Phase 10 | Planned |
104119

105120
**Coverage:**
106-
- v1 requirements: 25 total
107-
- Mapped to phases: 25
121+
- v1 requirements: 31 total
122+
- Mapped to phases: 31
108123
- Unmapped: 0
109124

110125
---
111126
*Requirements defined: 2026-03-18*
112-
*Last updated: 2026-03-25 — added CONV-01/02/03/04 for phase 9*
127+
*Last updated: 2026-03-26 -- added CLN-01/02/03/04/05/06 for phase 10*

.planning/ROADMAP.md

Lines changed: 7 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -23,7 +23,7 @@ This roadmap initializes GSD planning for the current brownfield milestone focus
2323
- [x] **Phase 7: Voyage Multimodal Adoption** - Wire VoyageAI into the shared multimodal contract with text, image, and video support to validate the foundation end-to-end.
2424
- [x] **Phase 8: Document Gemini and VoyageAI multimodal embedding functions** - Update provider docs, add runnable examples, update README, create changelog. (completed 2026-03-23)
2525
- [ ] **Phase 9: Convenience Constructors and Documentation Polish** - Add shorthand constructors to reduce Content API verbosity and update docs.
26-
- [ ] **Phase 10: Code Cleanups** - Extract shared path safety utilities, fix *context.Context anti-pattern, add registry test cleanup, fix resolveMIME for URL-backed sources. (issues #456, #461, #466, #469)
26+
- [x] **Phase 10: Code Cleanups** - Extract shared path safety utilities, fix *context.Context anti-pattern, add registry test cleanup, fix resolveMIME for URL-backed sources. (issues #456, #461, #466, #469) (completed 2026-03-26)
2727
- [ ] **Phase 11: Fork Double-Close Bug** - Fix EF pointer sharing in Fork() that causes double-close on client.Close(). (issue #454)
2828
- [ ] **Phase 12: SDK Auto-Wiring Research** - Trace contentEmbeddingFunction auto-wiring behavior in official Chroma SDKs. (issue #455)
2929
- [ ] **Phase 13: Collection.ForkCount** - Add ForkCount endpoint support for upstream /fork_count API. (issue #460)
@@ -172,8 +172,8 @@ Plans:
172172
| 6. Gemini Multimodal Adoption | 2/2 | Complete | 2026-03-20 |
173173
| 7. Voyage Multimodal Adoption | 2/2 | Complete | 2026-03-22 |
174174
| 8. Document Gemini and VoyageAI | 2/2 | Complete | 2026-03-23 |
175-
| 9. Convenience Constructors | 0/2 | Not started | - |
176-
| 10. Code Cleanups | 0/0 | Not started | - |
175+
| 9. Convenience Constructors | 2/2 | Complete | - |
176+
| 10. Code Cleanups | 2/2 | Complete | 2026-03-26 |
177177
| 11. Fork Double-Close Bug | 0/0 | Not started | - |
178178
| 12. SDK Auto-Wiring Research | 0/0 | Not started | - |
179179
| 13. Collection.ForkCount | 0/0 | Not started | - |
@@ -202,17 +202,19 @@ Plans:
202202
**Goal:** Consolidate duplicated path safety utilities into a shared internal package, fix the *context.Context pointer-to-interface anti-pattern across embedding providers, add registry test cleanup to prevent global state leaks, and fix resolveMIME for URL-backed sources.
203203
**Depends on:** Phase 9
204204
**Issues**: #456, #461, #466, #469
205+
**Requirements**: [CLN-01, CLN-02, CLN-03, CLN-04, CLN-05, CLN-06]
205206
**Success Criteria** (what must be TRUE):
206207
1. A shared `pkg/internal/pathutil` package provides `ContainsDotDot`, `ValidateFilePath`, and `SafePath` utilities.
207208
2. Gemini, Voyage, and default_ef use the shared path utilities instead of local duplicates.
208209
3. Gemini, Nomic, and Mistral use `context.Context` (not `*context.Context`) for DefaultContext.
209210
4. Registry tests use `t.Cleanup` with unregister helpers to prevent global state leaks.
210211
5. All existing tests pass without modification.
211212
6. Gemini and VoyageAI `resolveMIME` infer MIME type from URL path extensions, making URL constructors work end-to-end.
212-
**Plans:** 0 plans
213+
**Plans:** 2/2 plans complete
213214

214215
Plans:
215-
- [ ] TBD (run /gsd:plan-phase 10 to break down)
216+
- [x] 10-01-PLAN.md — Create shared pathutil package, replace local implementations, fix *context.Context anti-pattern
217+
- [x] 10-02-PLAN.md — Add resolveMIME URL fallback, registry unregister helpers, and t.Cleanup to all tests
216218

217219
### Phase 11: Fork Double-Close Bug
218220
**Goal:** Fix EF pointer sharing in Fork() that causes the same underlying embedding function resource to be closed twice when client.Close() iterates cached collections.

.planning/STATE.md

Lines changed: 14 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -2,14 +2,14 @@
22
gsd_state_version: 1.0
33
milestone: v0.4.1
44
milestone_name: Provider-Neutral Multimodal Foundations
5-
status: "Phase 9 shipped — PR #468"
6-
stopped_at: Completed 09-02-PLAN.md
7-
last_updated: "2026-03-26T08:17:06.350Z"
5+
status: Ready to plan
6+
stopped_at: Completed 10-02-PLAN.md
7+
last_updated: "2026-03-26T10:37:37.709Z"
88
progress:
99
total_phases: 17
10-
completed_phases: 9
11-
total_plans: 22
12-
completed_plans: 22
10+
completed_phases: 10
11+
total_plans: 24
12+
completed_plans: 24
1313
---
1414

1515
# Project State
@@ -19,11 +19,11 @@ progress:
1919
See: .planning/PROJECT.md (updated 2026-03-18)
2020

2121
**Core value:** Go applications can use Chroma and embedding providers through a stable, portable API that minimizes provider-specific friction.
22-
**Current focus:** Phase 09convenience-constructors-and-documentation-polish
22+
**Current focus:** Phase 10code-cleanups
2323

2424
## Current Position
2525

26-
Phase: 10
26+
Phase: 11
2727
Plan: Not started
2828

2929
## Performance Metrics
@@ -66,6 +66,8 @@ Plan: Not started
6666
| Phase 08 P02 | 4min | 2 tasks | 3 files |
6767
| Phase 09 P01 | 1min | 2 tasks | 2 files |
6868
| Phase 09 P02 | 5min | 2 tasks | 4 files |
69+
| Phase 10 P01 | 4min | 2 tasks | 8 files |
70+
| Phase 10 P02 | 7min | 2 tasks | 6 files |
6971

7072
## Accumulated Context
7173

@@ -111,6 +113,8 @@ Recent decisions affecting current work:
111113
- [Phase 08]: Reworded ROADMAP Phase 8 success criteria to eliminate last Nemotron text reference
112114
- [Phase 09]: Return Content by value and use ContentOption func(*Content) with no error return for convenience constructors
113115
- [Phase 09]: Shorthand-first doc pattern: provider multimodal sections show NewTextContent/NewImageFile, link to multimodal.md for verbose
116+
- [Phase 10]: Follow plan as specified - no deviations required for path safety consolidation and context anti-pattern fix
117+
- [Phase 10]: Remove dead TestVoyageContainsDotDot referencing function eliminated in Plan 01
114118

115119
### Roadmap Evolution
116120

@@ -143,6 +147,6 @@ None.
143147

144148
## Session
145149

146-
**Last Date:** 2026-03-25T16:12:32.516Z
147-
**Stopped At:** Completed 09-02-PLAN.md
150+
**Last Date:** 2026-03-26T10:32:36.242Z
151+
**Stopped At:** Completed 10-02-PLAN.md
148152
**Resume File:** None

0 commit comments

Comments
 (0)