
Commit 7f52522

Phase 21: RrfRank arithmetic methods compute correct expressions (#496)
* docs(21): capture phase context

* docs(state): record phase 21 context session

* docs(21): create phase plan

* docs: cross-AI review for phase 21

* docs(21): replan with cross-AI review feedback

Strengthen test assertions from top-level key checks to exact JSON matching via require.JSONEq, add receiver immutability verification, chained composition test, and WithKnnReturnRank() in test setup.

* docs(21): replan with cross-AI review feedback

* test(21-01): add failing tests for RrfRank arithmetic methods

- Add mustNewRrfRank test helper
- Add TestRrfRankArithmetic with 11 subtests covering all 10 arithmetic methods plus chained composition
- Each subtest asserts: pointer inequality, exact JSON via require.JSONEq, and receiver immutability
- All tests fail because RrfRank methods currently return the receiver

* fix(21-01): implement RrfRank arithmetic methods to build expression trees

- Replace all 10 no-op methods that silently returned the receiver
- Each method now returns the correct expression node (MulRank, SubRank, SumRank, DivRank, AbsRank, ExpRank, LogRank, MaxRank, MinRank)
- Pattern matches KnnRank/ValRank implementations exactly
- Fixes #481

* docs(21-01): complete RrfRank arithmetic fix plan

- Add execution summary with task results, verification, and self-check

* docs(21): add code review report

* docs(phase-21): complete phase execution

* docs(phase-21): evolve PROJECT.md after phase completion

* fix(21): address PR review nice-to-haves

Fix LogRank.Log() silent no-op: log(log(x)) != log(x), so the method must wrap in a new LogRank rather than return the receiver. Same bug class as the RrfRank arithmetic methods fixed earlier in this phase; uncovered while reviewing for related silent failures.

Add two regression tests:
- TestRrfRankArithmetic subtest "wrappers remain independent across sequential calls" pins down the aliasing contract: multiple wrappers built from the same RrfRank receiver must not cross-contaminate.
- TestLogRankLogComposition asserts log(log(x)) marshals to a nested {"$log":{"$log":{"$val":10}}} expression.

* docs(21): insert phase 21.1 for cloud integration coverage

Phase 21 verification revealed a gap: the RrfRank arithmetic fix has unit-test coverage but no cloud integration tests exercising the new expression trees end-to-end. Insert Phase 21.1 as an urgent follow-up to close this gap before the milestone lands.

* docs(21): ship phase 21 — PR #496

* test(21): drop tautological receiver-identity assertion in RrfRank test

The `result == Rank(rrf)` check became structurally guaranteed false once every arithmetic method returns a distinct concrete type (MulRank, SumRank, etc.), making it a tautology rather than a real regression pin. The per-case JSONEq and receiver-immutability assertions already cover the original bug class behaviorally — if a method regressed to `return r`, the receiver's RRF JSON would fail JSONEq against every expected wrapper shape.

Raised by claude bot review on PR #496.

* docs(21.1): capture phase context

* docs(state): record phase 21.1 context session

* docs(21.1): research RRF cloud integration test coverage phase

* docs(21.1): add validation strategy

* docs(21.1): create phase plan

* docs(21.1): resolve plan-checker blocker and warnings

* docs(21.1): cross-AI review from gemini and codex

* docs(21.1): replan phase incorporating cross-AI review feedback

* test(21.1): scaffold cloud arithmetic tests for all 10 RrfRank methods

* docs(21.1-01): complete cloud RRF arithmetic scaffolding plan

* chore(phase-21.1): mark phase as executing in STATE

* test(21.1): tighten cloud arithmetic assertions from empirical run

Pass 2 of Phase 21.1 two-pass ship workflow. Pass 1 scaffolding shipped in the previous commit with observe-only deferred t.Logf for the 6 non-safe RrfRank arithmetic methods. This commit translates the user's empirical Pass 1 run observations into regression-pin assertions on Negate, Abs, Exp, Log, Max(0), and Min(0).

Safe-bucket methods (Add/Sub/Multiply/Div) retain the shape-guarded strict differential from Pass 1 (label changed from `pass1 safe` to `pass2 safe`).

Per-row empirical pins:
- Negate: order-flip pin (NotEqual baseline.IDs) — inverts lower-is-better
- Abs: order-flip pin (NotEqual baseline.IDs) — equivalent to Negate on all-negative baselines because abs(x)=-x for x<=0
- Exp: monotonic-transform pin (Equal baseline.IDs + NotEqual Scores)
- Log: inner-empty-Scores pin + default-insertion-order IDs pin
- Max(0): all-zero scores pin + default-insertion-order IDs pin
- Min(0): no-op pin (Equal baseline.IDs + Equal baseline.Scores) — mathematical identity on all-negative baselines since min(x,0)=x for x<=0

Every sr.IDs[0]/sr.Scores[0] dereference is preceded by a require.NotEmpty outer-slice guard (M2). The deferred pass2 logger is kept as a regression audit trail.

A consolidated corpus-limitation comment block documents why Negate==Abs, Max(0) collapses to zero, Log degenerates, and Min(0) is an identity on the current 5-doc seed corpus — corpus improvement is out of scope for Phase 21.1 per D-20.

Issues filed from empirical Pass 1 observations (L1 classification applied):
- #497 ([BUG] Log)
- #498 ([ENH] Max(0))

Server-behavior defects tagged [BUG]; client-API-contract defects tagged [ENH]. Min(0) intentionally receives no issue (mathematical identity on all-negative corpus is correct, not a defect — corpus limitation documented inline).

Client-side guards and server-side fixes are deferred per D-20 — each becomes its own phase.

* docs(21.1-02): complete Pass 2 empirical tightening plan

Document Phase 21.1 Plan 02 (Pass 2) execution:
- Per-row final assertion summary for all 10 RrfRank arithmetic methods
- Pass 1 commit e5788f8 + Pass 2 commit 5a39719 both referenced
- Two issues filed per L1 rubric:
  - #497 ([BUG] Log)
  - #498 ([ENH] Max(0))
- Min(0) intentionally receives no issue (mathematical identity on all-negative corpus, not a defect)
- Task 4 (D-21 checkpoint:human-verify) awaits user confirmation that both `go test -tags="basicv2 cloud" -v -run TestCloudClientSearchRRFArithmetic` and `make test-cloud` are green against a real Chroma Cloud instance

* docs(21.1): mark plan 01 complete in ROADMAP

* docs(21.1): mark plan 02 complete in ROADMAP

* docs(21.1): add code review report

* docs(phase-21.1): complete phase execution

* test(21.1): address PR review feedback on RrfRank arithmetic tests

- rank_test.go: extend TestRrfRankArithmetic with IntOperand and Rank-as-operand rows, exercising the previously-uncovered branches of operandToRank for RrfRank.
- client_cloud_test.go: strip planning-artifact comments (phase IDs, pass labels, decision IDs, task IDs, file line references) while preserving the durable substance of every explanation. Log cleanup errors in t.Cleanup instead of swallowing them.

* docs(roadmap): add Phase 29 for rank expression composition robustness

Captures three follow-up issues discovered during PR review of #496:
- #499 [BUG] operandToRank silently substitutes Val(0) for nil operand
- #500 [BUG] RrfRank.Log and Max(0) silently degenerate on non-positive fused scores
- #501 [ENH] Reject mathematically meaningless RrfRank arithmetic compositions at build time

Phase 29 groups these under a single robustness theme with three sub-plans (one per issue). Depends on Phase 21 since arithmetic methods must build expression trees before they can be validated.

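The bug class this phase fixes, a method that returns its receiver instead of building a new expression node, can be sketched in isolation. The types below (`Rank`, `ValRank`, `LogRank`) are simplified stand-ins written for this illustration and are not the chroma-go definitions; `LogNoOp` is a hypothetical name for the buggy variant. Only the `$log`/`$val` JSON shape is taken from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Rank is a hypothetical stand-in for a rank expression node.
type Rank interface {
	json.Marshaler
}

// ValRank is a constant leaf that marshals to {"$val": v}.
type ValRank struct{ V float64 }

func (v *ValRank) MarshalJSON() ([]byte, error) {
	return json.Marshal(map[string]float64{"$val": v.V})
}

// LogRank wraps an inner expression and marshals to {"$log": inner}.
type LogRank struct{ Inner Rank }

func (l *LogRank) MarshalJSON() ([]byte, error) {
	return json.Marshal(map[string]Rank{"$log": l.Inner})
}

// LogNoOp reproduces the bug class: it returns the receiver, so the
// requested operation silently disappears from the expression.
func (l *LogRank) LogNoOp() Rank { return l }

// Log builds a new node around the receiver and leaves the receiver untouched.
func (l *LogRank) Log() Rank { return &LogRank{Inner: l} }

// mustJSON marshals a rank expression and panics on error (illustration only).
func mustJSON(r Rank) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	x := &LogRank{Inner: &ValRank{V: 10}}

	fmt.Println(mustJSON(x.LogNoOp())) // {"$log":{"$val":10}}
	fmt.Println(mustJSON(x.Log()))     // {"$log":{"$log":{"$val":10}}}
	fmt.Println(mustJSON(x))           // {"$log":{"$val":10}}
}
```

Comparing the marshaled JSON exactly, as the tests in this commit do with require.JSONEq, catches the no-op variant because its output is indistinguishable from the receiver's.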
* docs(rank): warn against post-composition mutation of RrfRank fields

RrfRank has public fields (Ranks, K, Normalize). Because arithmetic methods like Add/Multiply embed *RrfRank as a pointer into the returned SumRank/MulRank/etc., a later write to rrf.K is observed by the already-composed expression at marshal time — silently changing a query the caller believed was built. Pre-existing pattern shared with KnnRank, but Phase 21's arithmetic fix is the first change that makes the aliasing observable for RrfRank. Add a short godoc note so SDK users avoid the footgun.

Addresses reviewer observation #1 on PR #496.

* test(21.1): strengthen cloud RRF arithmetic assertions

Two test-robustness improvements from PR review feedback on #496:

1. Front-load the non-positive-baseline corpus assumption. Negate, Abs, and Min_0 assertions all depend on every baseline RRF score being <= 0. Previously a corpus regression would surface as three cascading failures; now it surfaces as one clear "corpus assumption violated" failure at the top of the test.
2. Safe-bucket now asserts ID-order preservation. All safe-bucket operands are positive constants (Add(+1), Sub(+1), Multiply(+2), Div(+2)), so each transform is strictly monotonic increasing and the ID order must match the baseline. The previous NotEqual-on-scores differential missed a class of regression where a server-side path might silently reorder results.

* docs(rank): clarify RrfRank aliasing scope and arithmetic semantics

Addresses two PR review observations on #496:

1. The previous warning said "Do not mutate fields on a *RrfRank" but the most likely real-world footgun is mutating the Ranks slice contents (e.g. rrf.Ranks[0].Weight = 5.0), which a strict reading of "fields" wouldn't cover. Reword to explicitly name the slice-contents case.
2. RrfRank.MarshalJSON negates the fused sum client-side so that the resulting rank score follows the "higher is better" convention. Arithmetic composes with that final score, not the raw fusion sum. Add a short note so users don't mentally apply arithmetic to the wrong intermediate.

* docs(rank): correct RrfRank arithmetic semantics and fix operandToRank drift

Two doc fixes from PR review on #496:

1. The RrfRank arithmetic docstring I added in 571a8be was factually WRONG — it said arithmetic operates on a "higher-is-better" score, but Chroma's canonical convention is lower-is-better. Verified against:
   - Official docs (trychroma.com/cloud/search-api/overview): "Results are ordered by score (ascending - lower is better)"
   - chroma-core/chroma Rust server at rust/types/src/execution/operator.rs: rrf() returns `-sum` to align with the lower-is-better convention.
   - The literal comment above this client's own MarshalJSON at rank.go:1226: "RRF gives higher scores for better, Chroma needs lower for better".

   Rewrite the docstring to accurately state the lower-is-better convention and explicitly name the degenerate transforms (Log, Max(0), Abs) so users don't reach for them expecting positive-domain behavior.
2. operandToRank doc drift: the comment claimed both nil AND unknown types return Val(0), but only the nil branch does that. Unknown types return *UnknownRank which errors at MarshalJSON time. Correct the comment to match reality — particularly ironic to leave incorrect in a PR whose theme is eliminating silent failures in rank composition.

* docs(rank): polish RrfRank arithmetic docstring and test comments

Four nits from PR review on #496:

1. Remove a stale inline comment inside operandToRank's default branch ("return zero to maintain chaining") that contradicted the code — the branch returns *UnknownRank, not zero. The top-level doc already covers this.
2. The Log bullet described internal math (NaN) rather than what the user actually observes. Rewrite to match the cloud test's empirical pin: empty inner Scores slice, IDs in insertion order.
3. Add a small qualifier to the Max and Abs bullets so they can be read in isolation without losing the "RRF's non-positive output" context from the surrounding paragraph.
4. Document the Min_0 cloud-test assertion's reliance on bit-exact float equality (server uses f32::min, which per IEEE 754 passes x through unchanged when x <= 0). Makes the implicit assumption explicit so a future server implementation change that drifts by 1 ULP is traceable to the assertion style.

* docs(rank): warn that RrfRank.Negate flips result ordering

RrfRank.MarshalJSON auto-negates the fusion sum, so on any non-empty corpus the score is <= 0. Negate on that input produces a >= 0 value and pushes the best match to the bottom of the result set. The doc block previously listed Negate under "behave as expected" while the cloud test asserted the order flip — this fix moves Negate into the footgun bullets alongside Abs, matching what the test pins.
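The aliasing footgun described in the godoc warning above is a general property of embedding an operand by pointer and marshaling later. A minimal sketch, assuming simplified stand-in types (`fusion`, `sum`) and a hypothetical `AddCopy` helper that do not exist in chroma-go; the commit itself only adds a documentation warning and does not introduce copying.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fusion is a hypothetical stand-in for a rank type with a public field.
type fusion struct {
	K int `json:"k"`
}

// sum holds its left operand by pointer, the way an arithmetic wrapper
// might embed the rank it was built from.
type sum struct {
	Left  *fusion `json:"left"`
	Right float64 `json:"right"`
}

// Add keeps a pointer to the receiver, so later writes to the receiver
// are visible when the wrapper is marshaled.
func (f *fusion) Add(v float64) *sum { return &sum{Left: f, Right: v} }

// AddCopy (hypothetical) snapshots the receiver at composition time.
// The copy is shallow, so it would not protect slice contents.
func (f *fusion) AddCopy(v float64) *sum {
	snapshot := *f
	return &sum{Left: &snapshot, Right: v}
}

// encode marshals v and panics on error (illustration only).
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	f := &fusion{K: 60}
	aliased := f.Add(1)
	copied := f.AddCopy(1)

	f.K = 5 // mutation after composition

	fmt.Println(encode(aliased)) // {"left":{"k":5},"right":1}
	fmt.Println(encode(copied))  // {"left":{"k":60},"right":1}
}
```

The aliased wrapper reflects the later write because marshaling reads through the pointer at encode time; treating a rank as immutable once it has been composed avoids the surprise without any copying.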
1 parent 0f5962b commit 7f52522

25 files changed

Lines changed: 4670 additions & 43 deletions

.planning/PROJECT.md

Lines changed: 2 additions & 3 deletions
@@ -29,6 +29,7 @@ Go applications can use Chroma and embedding providers through a stable, portabl
 - ✓ Cloud integration tests for Search API RRF and GroupBy — v0.4.1
 - ✓ Code cleanups: shared pathutil, context.Context fix, registry test cleanup — v0.4.1
 - ✓ SDK auto-wiring behavior documented across Python, JS, Rust, Go — v0.4.1
+- ✓ RrfRank arithmetic methods build correct expression trees instead of silent no-ops — v0.4.2 Phase 21

 ## Current Milestone: v0.4.2 Bug Fixes and Robustness

@@ -45,8 +46,6 @@ Go applications can use Chroma and embedding providers through a stable, portabl
 - Add Twelve Labs async embedding support (#479)

 ### Active
-
-- RrfRank arithmetic methods silently return self without computing — #481
 - WithGroupBy(nil) accepted as no-op instead of error — #482
 - Embedded GetOrCreateCollection passes closed EFs to CreateCollection fallback — #493
 - Default ORT EF leaked when CreateCollection finds existing collection — #494
@@ -107,4 +106,4 @@ This document evolves at phase transitions and milestone boundaries.
 4. Update Context with current state

 ---
-*Last updated: 2026-04-08 milestone v0.4.2 started*
+*Last updated: 2026-04-09 Phase 21 (RrfRank arithmetic fix) complete*

.planning/ROADMAP.md

Lines changed: 42 additions & 7 deletions
@@ -3,7 +3,7 @@
 ## Milestones

 - **v0.4.1 Provider-Neutral Multimodal Foundations** — Phases 1-20 (shipped 2026-04-08)
-- 🚧 **v0.4.2 Bug Fixes and Robustness** — Phases 21-28 (in progress)
+- 🚧 **v0.4.2 Bug Fixes and Robustness** — Phases 21-29 (in progress)

 ## Phases

@@ -22,14 +22,15 @@ See: [v0.4.1 Archived Roadmap](milestones/v0.4.1-ROADMAP.md)
 - Integer phases (21, 22, ...): Planned milestone work
 - Decimal phases (21.1, 21.2): Urgent insertions (marked with INSERTED)

-- [ ] **Phase 21: RrfRank Arithmetic Fix** - RrfRank arithmetic methods compute correct results instead of silently returning self
+- [x] **Phase 21: RrfRank Arithmetic Fix** - RrfRank arithmetic methods compute correct results instead of silently returning self (completed 2026-04-09)
 - [ ] **Phase 22: WithGroupBy Validation** - WithGroupBy(nil) returns an error instead of silently skipping grouping
 - [ ] **Phase 23: ORT EF Leak Fix** - Default ORT EF is properly closed when CreateCollection finds an existing collection
 - [ ] **Phase 24: GetOrCreateCollection EF Safety** - GetOrCreateCollection does not pass closed EFs to CreateCollection fallback
 - [ ] **Phase 25: Error Body Truncation** - Embedding provider error messages truncate raw HTTP bodies to safe display lengths
 - [ ] **Phase 26: Twelve Labs Async Embedding** - Twelve Labs provider handles async task responses for long-running media
 - [ ] **Phase 27: Download Stack Consolidation** - default_ef download code uses shared downloadutil instead of its own HTTP implementation
 - [ ] **Phase 28: Morph Test Fix** - Morph EF integration test handles upstream 404 gracefully
+- [ ] **Phase 29: Rank Expression Composition Robustness** - Reject silent footguns in rank composition (nil operands, degenerate RRF compositions)

 ## Phase Details

@@ -41,10 +42,26 @@ See: [v0.4.1 Archived Roadmap](milestones/v0.4.1-ROADMAP.md)
 1. Calling Multiply, Sub, Add, Div, or Negate on an RrfRank returns a new rank value reflecting the computation, not the original receiver
 2. The computed rank values marshal to valid JSON that Chroma accepts
 3. Tests confirm each arithmetic method produces distinct output from its input
-**Plans**: TBD
+**Plans**: 1 plan
+
+Plans:
+- [x] 21-01-PLAN.md — Fix RrfRank arithmetic methods and add test coverage
+
+### Phase 21.1: RRF cloud integration test coverage including arithmetic compositions (INSERTED)
+
+**Goal:** Add Chroma Cloud integration test coverage for all 10 RrfRank arithmetic methods (Add, Sub, Multiply, Div, Negate, Abs, Exp, Log, Max, Min) end-to-end against a real Chroma Cloud instance, closing the cloud-test-bar gap left by Phase 21 (which shipped structural unit tests only).
+**Requirements**: D-01..D-22 (CONTEXT.md decision IDs — phase has no REQ-IDs because it's an inserted urgent-work phase)
+**Depends on:** Phase 21
+**Success Criteria** (what must be TRUE):
+1. `TestCloudClientSearchRRFArithmetic` exists in `pkg/api/v2/client_cloud_test.go` exercising all 10 methods in a single table-driven function under build tag `basicv2 && cloud`
+2. Safe-bucket methods (Add, Sub, Multiply, Div) assert strict differential against an RRF baseline
+3. Semflip + degenerate methods (Negate, Abs, Exp, Log, Max(0), Min(0)) have empirically pinned assertions reflecting actual server behavior
+4. `make test-cloud -run TestCloudClientSearchRRFArithmetic` passes against a real Chroma Cloud instance (D-21, user-run gate per D-22)
+**Plans**: 2 plans

 Plans:
-- [ ] 21-01: TBD
+- [x] 21.1-01-PLAN.md — Pass 1 scaffolding: TestCloudClientSearchRRFArithmetic with all 10 rows, safe-bucket strict differential, semflip+degenerate observe-only
+- [x] 21.1-02-PLAN.md — Pass 2 empirical tightening: per-row pinned assertions from user observations + [BUG] issues + D-21 user-run gate

 ### Phase 22: WithGroupBy Validation
 **Goal**: WithGroupBy rejects nil input with a clear error
@@ -134,20 +151,38 @@ Plans:
 Plans:
 - [ ] 28-01: TBD

+### Phase 29: Rank Expression Composition Robustness
+**Goal**: Rank expression composition fails loud on programmer errors and rejects mathematically meaningless RRF compositions before sending to the server
+**Depends on**: Phase 21 (arithmetic methods must build expression trees before they can be validated)
+**Requirements**: COMP-01, COMP-02, COMP-03
+**Issues**: amikos-tech/chroma-go#499, amikos-tech/chroma-go#500, amikos-tech/chroma-go#501
+**Success Criteria** (what must be TRUE):
+1. Passing nil to any `*Rank.Add/Sub/Multiply/Div/Max/Min` produces a rank whose `MarshalJSON` reports a clear error instead of silently substituting `Val(0)` (#499)
+2. `RrfRank.Log()` and `RrfRank.Max(Val(0))` reject the composition at build time with a descriptive error instead of producing a degenerate query (#501)
+3. Client detects and reports result-shape mismatch (empty inner `Scores` with populated inner `IDs`) from `Search` responses so callers see silent server-side degeneration (#500)
+4. `TestCloudClientSearchRRFArithmetic` is updated to assert the new client-side errors on degenerate rows instead of pinning the current fallback behavior
+**Plans**: TBD
+
+Plans:
+- [ ] 29-01: TBD — Fix `operandToRank` nil handling (#499)
+- [ ] 29-02: TBD — Client-side rejection of degenerate RRF compositions (#501)
+- [ ] 29-03: TBD — Result-shape validation in `Search` response handling (#500)
+
 ## Progress

 **Execution Order:**
-Phases execute in numeric order: 21 -> 22 -> 23 -> 24 -> ... -> 28.
+Phases execute in numeric order: 21 -> 22 -> 23 -> 24 -> ... -> 29.
 Phases 21, 22, 25, 27, 28 are independent and may execute in any order relative to each other.
-Phase 24 depends on Phase 23. Phase 26 depends on Phase 25.
+Phase 24 depends on Phase 23. Phase 26 depends on Phase 25. Phase 29 depends on Phase 21.

 | Phase | Milestone | Plans Complete | Status | Completed |
 |-------|-----------|----------------|--------|-----------|
-| 21. RrfRank Arithmetic Fix | v0.4.2 | 0/0 | Not started | - |
+| 21. RrfRank Arithmetic Fix | v0.4.2 | 1/1 | Complete | 2026-04-09 |
 | 22. WithGroupBy Validation | v0.4.2 | 0/0 | Not started | - |
 | 23. ORT EF Leak Fix | v0.4.2 | 0/0 | Not started | - |
 | 24. GetOrCreateCollection EF Safety | v0.4.2 | 0/0 | Not started | - |
 | 25. Error Body Truncation | v0.4.2 | 0/0 | Not started | - |
 | 26. Twelve Labs Async Embedding | v0.4.2 | 0/0 | Not started | - |
 | 27. Download Stack Consolidation | v0.4.2 | 0/0 | Not started | - |
 | 28. Morph Test Fix | v0.4.2 | 0/0 | Not started | - |
+| 29. Rank Expression Composition Robustness | v0.4.2 | 0/3 | Not started | - |

.planning/STATE.md

Lines changed: 22 additions & 16 deletions
@@ -2,15 +2,16 @@
 gsd_state_version: 1.0
 milestone: v0.4.2
 milestone_name: Bug Fixes and Robustness
-status: Ready to plan
-stopped_at: Roadmap created with 8 phases (21-28)
-last_updated: "2026-04-08T18:00:00.000Z"
+status: executing
+stopped_at: Phase 21.1 context gathered
+last_updated: "2026-04-09T14:32:02.308Z"
+last_activity: 2026-04-09
 progress:
-total_phases: 8
-completed_phases: 0
-total_plans: 0
-completed_plans: 0
-percent: 0
+total_phases: 9
+completed_phases: 2
+total_plans: 3
+completed_plans: 3
+percent: 100
 ---

 # Project State
@@ -20,21 +21,22 @@ progress:
 See: .planning/PROJECT.md (updated 2026-04-08)

 **Core value:** Go applications can use Chroma and embedding providers through a stable, portable API that minimizes provider-specific friction.
-**Current focus:** Phase 21 - RrfRank Arithmetic Fix
+**Current focus:** Phase 21.1 — rrf-cloud-integration-test-coverage-including-arithmetic-com

 ## Current Position

-Phase: 21 of 28 (RrfRank Arithmetic Fix)
-Plan: --
-Status: Ready to plan
-Last activity: 2026-04-08 -- Roadmap created for v0.4.2 (8 phases, 15 requirements)
+Phase: 22
+Plan: Not started
+Status: Executing Phase 21.1
+Last activity: 2026-04-09

 Progress: [░░░░░░░░░░] 0%

 ## Performance Metrics

 **Velocity:**
-- Total plans completed: 0
+
+- Total plans completed: 3
 - Average duration: --
 - Total execution time: 0 hours

@@ -44,11 +46,15 @@ Progress: [░░░░░░░░░░] 0%

 Decisions are logged in PROJECT.md Key Decisions table.

+### Roadmap Evolution
+
+- Phase 21.1 inserted after Phase 21: RRF cloud integration test coverage including arithmetic compositions (URGENT) — post-fix cloud coverage gap for Phase 21 arithmetic methods
+
 ### Blockers/Concerns

 - Phase 28 (Morph): upstream URL may be permanently moved -- need to verify before coding

 ## Session

-**Last Date:** 2026-04-08
-**Stopped At:** Roadmap created, ready to plan Phase 21
+**Last Date:** 2026-04-09T11:41:08.829Z
+**Stopped At:** Phase 21.1 context gathered
