Commit 7f52522
authored
Phase 21: RrfRank arithmetic methods compute correct expressions (#496)
* docs(21): capture phase context
* docs(state): record phase 21 context session
* docs(21): create phase plan
* docs: cross-AI review for phase 21
* docs(21): replan with cross-AI review feedback
Strengthen test assertions from top-level key checks to exact JSON
matching via require.JSONEq, add receiver immutability verification,
chained composition test, and WithKnnReturnRank() in test setup.
* docs(21): replan with cross-AI review feedback
* test(21-01): add failing tests for RrfRank arithmetic methods
- Add mustNewRrfRank test helper
- Add TestRrfRankArithmetic with 11 subtests covering all 10 arithmetic
methods plus chained composition
- Each subtest asserts: pointer inequality, exact JSON via require.JSONEq,
and receiver immutability
- All tests fail because RrfRank methods currently return the receiver
* fix(21-01): implement RrfRank arithmetic methods to build expression trees
- Replace all 10 no-op methods that silently returned the receiver
- Each method now returns the correct expression node (MulRank, SubRank,
SumRank, DivRank, AbsRank, ExpRank, LogRank, MaxRank, MinRank)
- Pattern matches KnnRank/ValRank implementations exactly
- Fixes #481
* docs(21-01): complete RrfRank arithmetic fix plan
- Add execution summary with task results, verification, and self-check
* docs(21): add code review report
* docs(phase-21): complete phase execution
* docs(phase-21): evolve PROJECT.md after phase completion
* fix(21): address PR review nice-to-haves
Fix LogRank.Log() silent no-op: log(log(x)) != log(x), so the method
must wrap in a new LogRank rather than return the receiver. Same bug
class as the RrfRank arithmetic methods fixed earlier in this phase;
uncovered while reviewing for related silent failures.
Add two regression tests:
- TestRrfRankArithmetic subtest "wrappers remain independent across
sequential calls" pins down the aliasing contract: multiple wrappers
built from the same RrfRank receiver must not cross-contaminate.
- TestLogRankLogComposition asserts log(log(x)) marshals to a nested
{"$log":{"$log":{"$val":10}}} expression.
* docs(21): insert phase 21.1 for cloud integration coverage
Phase 21 verification revealed a gap: the RrfRank arithmetic fix has
unit-test coverage but no cloud integration tests exercising the new
expression trees end-to-end. Insert Phase 21.1 as an urgent follow-up
to close this gap before the milestone lands.
* docs(21): ship phase 21 — PR #496
* test(21): drop tautological receiver-identity assertion in RrfRank test
The `result == Rank(rrf)` check became structurally guaranteed false
once every arithmetic method returns a distinct concrete type (MulRank,
SumRank, etc.), making it a tautology rather than a real regression pin.
The per-case JSONEq and receiver-immutability assertions already cover
the original bug class behaviorally — if a method regressed to
`return r`, the receiver's RRF JSON would fail JSONEq against every
expected wrapper shape.
Raised by claude bot review on PR #496.
* docs(21.1): capture phase context
* docs(state): record phase 21.1 context session
* docs(21.1): research RRF cloud integration test coverage phase
* docs(21.1): add validation strategy
* docs(21.1): create phase plan
* docs(21.1): resolve plan-checker blocker and warnings
* docs(21.1): cross-AI review from gemini and codex
* docs(21.1): replan phase incorporating cross-AI review feedback
* test(21.1): scaffold cloud arithmetic tests for all 10 RrfRank methods
* docs(21.1-01): complete cloud RRF arithmetic scaffolding plan
* chore(phase-21.1): mark phase as executing in STATE
* test(21.1): tighten cloud arithmetic assertions from empirical run
Pass 2 of Phase 21.1 two-pass ship workflow. Pass 1 scaffolding shipped
in the previous commit with observe-only deferred t.Logf for the 6
non-safe RrfRank arithmetic methods. This commit translates the user's
empirical Pass 1 run observations into regression-pin assertions on
Negate, Abs, Exp, Log, Max(0), and Min(0). Safe-bucket methods
(Add/Sub/Multiply/Div) retain the shape-guarded strict differential
from Pass 1 (label changed from `pass1 safe` to `pass2 safe`).
Per-row empirical pins:
- Negate: order-flip pin (NotEqual baseline.IDs) — inverts lower-is-better
- Abs: order-flip pin (NotEqual baseline.IDs) — equivalent to Negate
on all-negative baselines because abs(x)=-x for x<=0
- Exp: monotonic-transform pin (Equal baseline.IDs + NotEqual Scores)
- Log: inner-empty-Scores pin + default-insertion-order IDs pin
- Max(0): all-zero scores pin + default-insertion-order IDs pin
- Min(0): no-op pin (Equal baseline.IDs + Equal baseline.Scores) —
mathematical identity on all-negative baselines since
min(x,0)=x for x<=0
Every sr.IDs[0]/sr.Scores[0] dereference is preceded by a
require.NotEmpty outer-slice guard (M2). The deferred pass2 logger is
kept as a regression audit trail. A consolidated corpus-limitation
comment block documents why Negate==Abs, Max(0) collapses to zero, Log
degenerates, and Min(0) is an identity on the current 5-doc seed
corpus — corpus improvement is out of scope for Phase 21.1 per D-20.
Issues filed from empirical Pass 1 observations (L1 classification
applied):
- #497 ([BUG] Log)
- #498 ([ENH] Max(0))
Server-behavior defects tagged [BUG]; client-API-contract defects
tagged [ENH]. Min(0) intentionally receives no issue (mathematical
identity on all-negative corpus is correct, not a defect — corpus
limitation documented inline). Client-side guards and server-side
fixes are deferred per D-20 — each becomes its own phase.
* docs(21.1-02): complete Pass 2 empirical tightening plan
Document Phase 21.1 Plan 02 (Pass 2) execution:
- Per-row final assertion summary for all 10 RrfRank arithmetic methods
- Pass 1 commit e5788f8 + Pass 2 commit 5a39719 both referenced
- Two issues filed per L1 rubric:
- #497 ([BUG] Log)
- #498 ([ENH] Max(0))
- Min(0) intentionally receives no issue (mathematical identity on all-
negative corpus, not a defect)
- Task 4 (D-21 checkpoint:human-verify) awaits user confirmation that
both `go test -tags="basicv2 cloud" -v -run TestCloudClientSearchRRFArithmetic`
and `make test-cloud` are green against a real Chroma Cloud instance
* docs(21.1): mark plan 01 complete in ROADMAP
* docs(21.1): mark plan 02 complete in ROADMAP
* docs(21.1): add code review report
* docs(phase-21.1): complete phase execution
* test(21.1): address PR review feedback on RrfRank arithmetic tests
- rank_test.go: extend TestRrfRankArithmetic with IntOperand and
Rank-as-operand rows, exercising the previously-uncovered branches
of operandToRank for RrfRank.
- client_cloud_test.go: strip planning-artifact comments (phase IDs,
pass labels, decision IDs, task IDs, file line references) while
preserving the durable substance of every explanation. Log cleanup
errors in t.Cleanup instead of swallowing them.
* docs(roadmap): add Phase 29 for rank expression composition robustness
Captures three follow-up issues discovered during PR review of #496:
- #499 [BUG] operandToRank silently substitutes Val(0) for nil operand
- #500 [BUG] RrfRank.Log and Max(0) silently degenerate on non-positive
fused scores
- #501 [ENH] Reject mathematically meaningless RrfRank arithmetic
compositions at build time
Phase 29 groups these under a single robustness theme with three sub-plans
(one per issue). Depends on Phase 21 since arithmetic methods must build
expression trees before they can be validated.
* docs(rank): warn against post-composition mutation of RrfRank fields
RrfRank has public fields (Ranks, K, Normalize). Because arithmetic
methods like Add/Multiply embed *RrfRank as a pointer into the returned
SumRank/MulRank/etc., a later write to rrf.K is observed by the already-
composed expression at marshal time — silently changing a query the
caller believed was built.
Pre-existing pattern shared with KnnRank, but Phase 21's arithmetic fix
is the first change that makes the aliasing observable for RrfRank.
Add a short godoc note so SDK users avoid the footgun.
Addresses reviewer observation #1 on PR #496.
* test(21.1): strengthen cloud RRF arithmetic assertions
Two test-robustness improvements from PR review feedback on #496:
1. Front-load the non-positive-baseline corpus assumption. Negate, Abs,
and Min_0 assertions all depend on every baseline RRF score being
<= 0. Previously a corpus regression would surface as three cascading
failures; now it surfaces as one clear "corpus assumption violated"
failure at the top of the test.
2. Safe-bucket now asserts ID-order preservation. All safe-bucket
operands are positive constants (Add(+1), Sub(+1), Multiply(+2),
Div(+2)), so each transform is strictly monotonic increasing and the
ID order must match the baseline. The previous NotEqual-on-scores
differential missed a class of regression where a server-side path
might silently reorder results.
* docs(rank): clarify RrfRank aliasing scope and arithmetic semantics
Addresses two PR review observations on #496:
1. The previous warning said "Do not mutate fields on a *RrfRank" but the
most likely real-world footgun is mutating the Ranks slice contents
(e.g. rrf.Ranks[0].Weight = 5.0), which a strict reading of "fields"
wouldn't cover. Reword to explicitly name the slice-contents case.
2. RrfRank.MarshalJSON negates the fused sum client-side so that the
resulting rank score follows the "higher is better" convention.
Arithmetic composes with that final score, not the raw fusion sum.
Add a short note so users don't mentally apply arithmetic to the
wrong intermediate.
* docs(rank): correct RrfRank arithmetic semantics and fix operandToRank drift
Two doc fixes from PR review on #496:
1. The RrfRank arithmetic docstring I added in 571a8be was factually
WRONG — it said arithmetic operates on a "higher-is-better" score,
but Chroma's canonical convention is lower-is-better. Verified
against:
- Official docs (trychroma.com/cloud/search-api/overview):
"Results are ordered by score (ascending - lower is better)"
- chroma-core/chroma Rust server at
rust/types/src/execution/operator.rs: rrf() returns `-sum` to
align with the lower-is-better convention.
- The literal comment above this client's own MarshalJSON at
rank.go:1226: "RRF gives higher scores for better, Chroma needs
lower for better".
Rewrite the docstring to accurately state the lower-is-better
convention and explicitly name the degenerate transforms (Log, Max(0),
Abs) so users don't reach for them expecting positive-domain behavior.
2. operandToRank doc drift: the comment claimed both nil AND unknown
types return Val(0), but only the nil branch does that. Unknown types
return *UnknownRank which errors at MarshalJSON time. Correct the
comment to match reality — particularly ironic to leave incorrect in
a PR whose theme is eliminating silent failures in rank composition.
* docs(rank): polish RrfRank arithmetic docstring and test comments
Four nits from PR review on #496:
1. Remove a stale inline comment inside operandToRank's default branch
("return zero to maintain chaining") that contradicted the code —
the branch returns *UnknownRank, not zero. The top-level doc
already covers this.
2. The Log bullet described internal math (NaN) rather than what the
user actually observes. Rewrite to match the cloud test's empirical
pin: empty inner Scores slice, IDs in insertion order.
3. Add a small qualifier to the Max and Abs bullets so they can be
read in isolation without losing the "RRF's non-positive output"
context from the surrounding paragraph.
4. Document the Min_0 cloud-test assertion's reliance on bit-exact
float equality (server uses f32::min, which per IEEE 754 passes
x through unchanged when x <= 0). Makes the implicit assumption
explicit so a future server implementation change that drifts by
1 ULP is traceable to the assertion style.
* docs(rank): warn that RrfRank.Negate flips result ordering
RrfRank.MarshalJSON auto-negates the fusion sum, so on any non-empty
corpus the score is <= 0. Negate on that input produces a >= 0 value
and pushes the best match to the bottom of the result set.
The doc block previously listed Negate under "behave as expected"
while the cloud test asserted the order flip — this fix moves Negate
into the footgun bullets alongside Abs, matching what the test pins.1 parent 0f5962b commit 7f52522
25 files changed
Lines changed: 4670 additions & 43 deletions
File tree
- .planning
- phases
- 21-rrfrank-arithmetic-fix
- 21.1-rrf-cloud-integration-test-coverage-including-arithmetic-com
- pkg/api/v2
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
29 | 29 | | |
30 | 30 | | |
31 | 31 | | |
| 32 | + | |
32 | 33 | | |
33 | 34 | | |
34 | 35 | | |
| |||
45 | 46 | | |
46 | 47 | | |
47 | 48 | | |
48 | | - | |
49 | | - | |
50 | 49 | | |
51 | 50 | | |
52 | 51 | | |
| |||
107 | 106 | | |
108 | 107 | | |
109 | 108 | | |
110 | | - | |
| 109 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
3 | 3 | | |
4 | 4 | | |
5 | 5 | | |
6 | | - | |
| 6 | + | |
7 | 7 | | |
8 | 8 | | |
9 | 9 | | |
| |||
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
25 | | - | |
| 25 | + | |
26 | 26 | | |
27 | 27 | | |
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
31 | 31 | | |
32 | 32 | | |
| 33 | + | |
33 | 34 | | |
34 | 35 | | |
35 | 36 | | |
| |||
41 | 42 | | |
42 | 43 | | |
43 | 44 | | |
44 | | - | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
45 | 61 | | |
46 | 62 | | |
47 | | - | |
| 63 | + | |
| 64 | + | |
48 | 65 | | |
49 | 66 | | |
50 | 67 | | |
| |||
134 | 151 | | |
135 | 152 | | |
136 | 153 | | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
137 | 171 | | |
138 | 172 | | |
139 | 173 | | |
140 | | - | |
| 174 | + | |
141 | 175 | | |
142 | | - | |
| 176 | + | |
143 | 177 | | |
144 | 178 | | |
145 | 179 | | |
146 | | - | |
| 180 | + | |
147 | 181 | | |
148 | 182 | | |
149 | 183 | | |
150 | 184 | | |
151 | 185 | | |
152 | 186 | | |
153 | 187 | | |
| 188 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2 | 2 | | |
3 | 3 | | |
4 | 4 | | |
5 | | - | |
6 | | - | |
7 | | - | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
8 | 9 | | |
9 | | - | |
10 | | - | |
11 | | - | |
12 | | - | |
13 | | - | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
14 | 15 | | |
15 | 16 | | |
16 | 17 | | |
| |||
20 | 21 | | |
21 | 22 | | |
22 | 23 | | |
23 | | - | |
| 24 | + | |
24 | 25 | | |
25 | 26 | | |
26 | 27 | | |
27 | | - | |
28 | | - | |
29 | | - | |
30 | | - | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
31 | 32 | | |
32 | 33 | | |
33 | 34 | | |
34 | 35 | | |
35 | 36 | | |
36 | 37 | | |
37 | | - | |
| 38 | + | |
| 39 | + | |
38 | 40 | | |
39 | 41 | | |
40 | 42 | | |
| |||
44 | 46 | | |
45 | 47 | | |
46 | 48 | | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
47 | 53 | | |
48 | 54 | | |
49 | 55 | | |
50 | 56 | | |
51 | 57 | | |
52 | 58 | | |
53 | | - | |
54 | | - | |
| 59 | + | |
| 60 | + | |
0 commit comments