Skip to content

Commit e3ddfed

Browse files
Pratyaksh Sharmafacebook-github-bot
authored andcommitted
Refactor binary HNSW stats to use OpenMP reduction instead of destructor sync (facebookresearch#4910)
Summary: ## Problem Follow-up to the previous fix (replacing `#pragma omp critical` with `#pragma omp atomic` in `FlatHammingDis::~FlatHammingDis`). This commit implements the architecturally correct fix by matching the pattern already used in the float HNSW path (`IndexHNSW.cpp`). The previous approach had two issues: 1. Stats accumulation in a destructor is fragile — the timing and frequency of destructor calls depends on the caller's object lifetime management, not on the search algorithm 2. `FlatHammingDis` maintained a redundant `ndis` counter (incremented on every `operator()` call) that duplicated what `HNSW::search()` already tracks internally via its returned `HNSWStats` ## Changes ### `IndexBinaryHNSW::search()` — use `#pragma omp for reduction` - Capture the `HNSWStats` returned by `hnsw.search()` per query - Accumulate `n1, n2, ndis, nhops` via `#pragma omp for reduction(+: ...)`, which uses lock-free per-thread copies merged at barrier — zero contention - Call `hnsw_stats.combine()` once, single-threaded, after the parallel region - This exactly matches the pattern in `hnsw_search()` in `IndexHNSW.cpp` (lines 259-291) ### `FlatHammingDis` — remove stats accumulation entirely - Remove `size_t ndis` field (redundant with `HNSW::search()` internal tracking) - Remove `ndis++` from `operator()` (eliminates per-distance-computation overhead) - Remove destructor (no stats to accumulate → no sync needed) ### `IndexBinaryHNSWCagra::search()` — fix missing stats aggregation - The `base_level_only` path's second parallel block (`search_level_0`) accumulated stats into a per-thread `HNSWStats search_stats` but never aggregated it into `hnsw_stats` — this was a pre-existing bug where stats were silently lost - Added `#pragma omp critical { hnsw_stats.combine(search_stats); }` once per thread at the end of the parallel block, matching `IndexHNSW::search_level_0_wrapper()` (line 461) ## Performance Impact - **Zero lock contention** in the main `IndexBinaryHNSW::search()` path (reduction is lock-free) - Removes per-distance-computation `ndis++` overhead from `FlatHammingDis::operator()` - `IndexBinaryHNSWCagra` path: `#pragma omp critical` once per thread (acceptable, matches float HNSW) - No change to search results — only stats accumulation mechanism changes Reviewed By: mnorris11 Differential Revision: D95911440
1 parent b023178 commit e3ddfed

1 file changed

Lines changed: 15 additions & 13 deletions

File tree

faiss/IndexBinaryHNSW.cpp

Lines changed: 15 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -217,25 +217,33 @@ void IndexBinaryHNSW::search(
217217
using RH = HeapBlockResultHandler<HNSW::C>;
218218
RH bres(n, distances_f, labels, k);
219219

220+
size_t n1 = 0, n2 = 0, ndis = 0, nhops = 0;
221+
220222
#pragma omp parallel
221223
{
222224
VisitedTable vt(ntotal);
223225
std::unique_ptr<DistanceComputer> dis(get_distance_computer());
224226
RH::SingleResultHandler res(bres);
225227

226-
#pragma omp for
228+
#pragma omp for reduction(+ : n1, n2, ndis, nhops)
227229
for (idx_t i = 0; i < n; i++) {
228230
res.begin(i);
229231
dis->set_query((float*)(x + i * code_size));
230232
// Given that IndexBinaryHNSW is not an IndexHNSW, we pass nullptr
231233
// as the index parameter. This state does not get used in the
232234
// search function, as it is merely there to to enable Panorama
233235
// execution for IndexHNSWFlatPanorama.
234-
hnsw.search(*dis, nullptr, res, vt, params_in);
236+
HNSWStats stats = hnsw.search(*dis, nullptr, res, vt, params_in);
237+
n1 += stats.n1;
238+
n2 += stats.n2;
239+
ndis += stats.ndis;
240+
nhops += stats.nhops;
235241
res.end();
236242
}
237243
}
238244

245+
hnsw_stats.combine({n1, n2, ndis, nhops});
246+
239247
#pragma omp parallel for
240248
for (int i = 0; i < n * k; ++i) {
241249
distances[i] = std::round(distances_f[i]);
@@ -267,11 +275,9 @@ template <class HammingComputer>
267275
struct FlatHammingDis : DistanceComputer {
268276
const int code_size;
269277
const uint8_t* b;
270-
size_t ndis;
271278
HammingComputer hc;
272279

273280
float operator()(idx_t i) override {
274-
ndis++;
275281
return hc.hamming(b + i * code_size);
276282
}
277283

@@ -281,21 +287,13 @@ struct FlatHammingDis : DistanceComputer {
281287
}
282288

283289
explicit FlatHammingDis(const IndexBinaryFlat& storage)
284-
: code_size(storage.code_size),
285-
b(storage.xb.data()),
286-
ndis(0),
287-
hc() {}
290+
: code_size(storage.code_size), b(storage.xb.data()), hc() {}
288291

289292
// NOTE: Pointers are cast from float in order to reuse the floating-point
290293
// DistanceComputer.
291294
void set_query(const float* x) override {
292295
hc.set((uint8_t*)x, code_size);
293296
}
294-
295-
~FlatHammingDis() override {
296-
#pragma omp atomic
297-
hnsw_stats.ndis += ndis;
298-
}
299297
};
300298

301299
struct BuildDistanceComputer {
@@ -405,6 +403,10 @@ void IndexBinaryHNSWCagra::search(
405403

406404
res.end();
407405
}
406+
#pragma omp critical
407+
{
408+
hnsw_stats.combine(search_stats);
409+
}
408410
}
409411

410412
#pragma omp parallel for

0 commit comments

Comments
 (0)