Commit 70e3d52
authored
Revert pruning PRs (#331)
# Revert #282 and #327 to restore green CI for VecSim tests (via Claude)
## Problem
Private CI `build-share-lib.yml` workflow's `test-shared-libs` jobs
(VecSim tests) — reduced, gcc-11, and gcc-14 on rockylinux8 — started
failing between **Apr 29** (last green: `0ce34e1`) and **May 6** (first
red: `8050bec`).
### Failing tests
- `SVSTest.quant_modes`
- `SVSTest.save_load`
from `VectorSimilarity/tests/unit/test_svs.cpp:2975`, with assertions
like:
```
Expected equality of these values:
id Which is: 55
(idx + 45) Which is: 54
```
## Root cause
The `test-shared-libs` (reduced) job clones `RedisAI/VectorSimilarity`
and runs its test suite against the SVS shared library. The failure
window coincided with SVS commit **`629a79c`** — *"Enhance heuristic
pruning to handle duplicate clusters (#282)"* — which added a
post-pruning step to both `IterativePruneStrategy` and
`ProgressivePruneStrategy` in `include/svs/index/vamana/prune.h`:
> If `all_duplicates && anchor_set && !result.empty()`, overwrite
`result.back()` with the closest candidate from the pool whose distance
≠ `anchor_dist`.
On the VectorSimilarity test fixture (`v[i] = (i,i,i,i)`, dim=4, n=100),
every node has **symmetric pairs of equidistant neighbors** (e.g. nodes
`i-1` and `i+1` both at `d²=4` from node `i`). These look like
"duplicate clusters" to the heuristic, but geometrically they are
symmetric pairs *in opposite directions*, not redundant edges.
The post-prune step:
1. Overwrote a genuine nearest-neighbor edge with a strictly farther
candidate.
2. Biased replacement toward one direction (the first non-tied pool
entry), which systematically removed edges on one side of every node.
**Result:** the constructed Vamana graph is no longer traversable to
certain neighbors for certain queries. For `query = (50,50,50,50)`, the
exact top-10 by ID (`[45..54]`) can't all be reached; search returns
`55` in place of `54`.
## Attempted fixes
### Attempt 1 — `9cee0f1` (#327, "Fix duplicate-cluster trigger to
require >=2 kept candidates")
Tightened the guard from `!result.empty()` to `result.size() >= 2`.
Correctly prevented the branch from firing when only one neighbor was
retained, but did **not** address the core bug: the heuristic still
overwrites a genuine nearest edge with a farther candidate when 2+
retained neighbors tie in distance.
**Result:** the `test-shared-libs` failures persisted.
### Attempt 2 — append-only variant
Branch:
[`dev/eglaser-sharedci-fix`](https://github.com/intel/ScalableVectorSearch/tree/dev/eglaser-sharedci-fix)
Changed both strategies in `include/svs/index/vamana/prune.h`:
- `result.back() = ...` → `result.push_back(...)` (append instead of
overwrite).
- Added `result.size() < max_result_size` guard (only fire when there's
room).
**Rationale:** if every kept neighbor ties at one distance, each one is
a legitimate nearest edge and must not be replaced. Append a diversity
edge only when capacity allows.
**Result:** the failure pattern changed but the test still failed:
| | Top-10 by ID | Missing |
|---|---|---|
| Before attempt 2 | `[45..53, 55]` | vector **54** |
| After attempt 2 | `[46..55]` | vector **45** (entire set shifted +1) |
The same graph-asymmetry problem reappeared on the opposite side because
the appended "diversity" candidate is still systematically biased
(always the first pool entry past the tie band, which stably falls on
one side of the symmetric ring). The heuristic gains an asymmetric edge
toward lower IDs instead of losing an asymmetric edge toward higher IDs.
## Conclusion
Both #327 and the append-only variant confirm that the diversity
heuristic's **detection criterion is fundamentally wrong**: it uses
distance equality to infer directional redundancy, but same-distance
neighbors can be (and in symmetric data, typically are) in *opposite
directions*.
No local tweak to the guard or the insertion method fixes this — the
criterion itself needs redesign to detect **directional collapse**
rather than **distance collapse**. That's a nontrivial design task and
out of scope for an immediate unblock.
## This PR
Reverts:
- **#282** (`629a79c`) — *Enhance heuristic pruning to handle duplicate
clusters*
- **#327** (`9cee0f1`) — *Fix duplicate-cluster trigger to require >=2
kept candidates* (only exists to patch #282)
This restores the pre-April pruning behavior, which restores green CI
for VecSim tests.
## Follow-up
Issue **#80** (the original motivation for #282) should be reopened with
a note that the fix needs to detect **directional** rather than
**distance** redundancy.1 parent 80e4a6c commit 70e3d52
2 files changed
Lines changed: 0 additions & 167 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
130 | 130 | | |
131 | 131 | | |
132 | 132 | | |
133 | | - | |
134 | | - | |
135 | | - | |
136 | 133 | | |
137 | 134 | | |
138 | 135 | | |
| |||
148 | 145 | | |
149 | 146 | | |
150 | 147 | | |
151 | | - | |
152 | | - | |
153 | | - | |
154 | | - | |
155 | | - | |
156 | | - | |
157 | | - | |
158 | | - | |
159 | | - | |
160 | | - | |
161 | 148 | | |
162 | 149 | | |
163 | 150 | | |
| |||
184 | 171 | | |
185 | 172 | | |
186 | 173 | | |
187 | | - | |
188 | | - | |
189 | | - | |
190 | | - | |
191 | | - | |
192 | | - | |
193 | | - | |
194 | | - | |
195 | | - | |
196 | | - | |
197 | | - | |
198 | | - | |
199 | | - | |
200 | | - | |
201 | | - | |
202 | | - | |
203 | | - | |
204 | | - | |
205 | | - | |
206 | | - | |
207 | | - | |
208 | | - | |
209 | | - | |
210 | | - | |
211 | | - | |
212 | | - | |
213 | | - | |
214 | | - | |
215 | | - | |
216 | | - | |
217 | | - | |
218 | | - | |
219 | | - | |
220 | | - | |
221 | | - | |
222 | | - | |
223 | | - | |
224 | | - | |
225 | 174 | | |
226 | 175 | | |
227 | 176 | | |
| |||
254 | 203 | | |
255 | 204 | | |
256 | 205 | | |
257 | | - | |
258 | | - | |
259 | | - | |
260 | 206 | | |
261 | 207 | | |
262 | 208 | | |
| |||
272 | 218 | | |
273 | 219 | | |
274 | 220 | | |
275 | | - | |
276 | | - | |
277 | | - | |
278 | | - | |
279 | | - | |
280 | | - | |
281 | | - | |
282 | | - | |
283 | | - | |
284 | | - | |
285 | 221 | | |
286 | 222 | | |
287 | 223 | | |
| |||
300 | 236 | | |
301 | 237 | | |
302 | 238 | | |
303 | | - | |
304 | | - | |
305 | | - | |
306 | | - | |
307 | | - | |
308 | | - | |
309 | | - | |
310 | | - | |
311 | | - | |
312 | | - | |
313 | | - | |
314 | | - | |
315 | | - | |
316 | | - | |
317 | | - | |
318 | | - | |
319 | | - | |
320 | | - | |
321 | | - | |
322 | | - | |
323 | | - | |
324 | | - | |
325 | | - | |
326 | | - | |
327 | | - | |
328 | | - | |
329 | | - | |
330 | | - | |
331 | | - | |
332 | | - | |
333 | | - | |
334 | | - | |
335 | | - | |
336 | | - | |
337 | | - | |
338 | | - | |
339 | | - | |
340 | | - | |
341 | 239 | | |
342 | 240 | | |
343 | 241 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
17 | 17 | | |
18 | 18 | | |
19 | 19 | | |
20 | | - | |
21 | | - | |
22 | | - | |
23 | | - | |
24 | 20 | | |
25 | 21 | | |
26 | 22 | | |
| |||
50 | 46 | | |
51 | 47 | | |
52 | 48 | | |
53 | | - | |
54 | | - | |
55 | | - | |
56 | | - | |
57 | | - | |
58 | | - | |
59 | | - | |
60 | | - | |
61 | | - | |
62 | | - | |
63 | | - | |
64 | | - | |
65 | | - | |
66 | | - | |
67 | | - | |
68 | | - | |
69 | | - | |
70 | | - | |
71 | | - | |
72 | | - | |
73 | | - | |
74 | | - | |
75 | | - | |
76 | | - | |
77 | | - | |
78 | | - | |
79 | | - | |
80 | | - | |
81 | | - | |
82 | | - | |
83 | | - | |
84 | | - | |
85 | | - | |
86 | | - | |
87 | | - | |
88 | | - | |
89 | | - | |
90 | | - | |
91 | | - | |
92 | | - | |
93 | | - | |
94 | | - | |
95 | | - | |
96 | | - | |
97 | | - | |
98 | | - | |
99 | | - | |
100 | | - | |
101 | | - | |
102 | | - | |
103 | | - | |
104 | | - | |
105 | | - | |
106 | | - | |
107 | | - | |
108 | | - | |
109 | | - | |
110 | | - | |
111 | | - | |
112 | | - | |
113 | | - | |
114 | 49 | | |
0 commit comments